Supported LLM Providers
Savine operates on a Bring Your Own Key (BYOK) model: you keep your billing relationship with the AI labs and retain ownership of your data, while Savine securely orchestrates the API calls on your behalf.
Provider Comparison
| Provider | Models Available | Speed | Strength | Cost Tier | Features |
|---|---|---|---|---|---|
| OpenAI | GPT-4o, o1, o3-mini | Medium | Reasoning, Tools | High | Function Calling, Vision |
| Anthropic | Claude 3.5 Sonnet | Medium | Coding, Instructions | High | Function Calling, Vision |
| Google | Gemini 2.0 Flash | Fast | Multimodal, Speed | Low | Function Calling, Vision |
| Groq | Llama 3.3, Mixtral | Blazing | Latency | Low | Function Calling |
| Mistral | Large, Codestral | Medium | Code Generation | Medium | Function Calling |
Recommendation Guide
- Speed & High Volume: Groq (Llama 3.3) or Google (Gemini 2.0 Flash)
- Complex System Prompts & Coding: Anthropic (Claude 3.5 Sonnet)
- General Enterprise Use: OpenAI (GPT-4o)
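The recommendation guide above can be sketched as a small lookup helper. This is an illustrative snippet, not part of the Savine API: the workload labels and the `pick_llm` function are hypothetical names, and the defaults simply mirror the guide.

```python
# Hypothetical mapping from workload category to an agent.json-style
# "llm" block, following the recommendation guide on this page.
RECOMMENDED = {
    "high_volume": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "coding": {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    "general": {"provider": "openai", "model": "gpt-4o"},
}

def pick_llm(workload: str) -> dict:
    """Return a default provider/model pair; fall back to general use."""
    return RECOMMENDED.get(workload, RECOMMENDED["general"])
```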
OpenAI
The industry standard. Note that reasoning models (o1, o3-mini) handle tool calling differently than the standard gpt-4o line.
agent.json Config:
"llm": {
"provider": "openai",
"model": "gpt-4o",
"key_ref": "OPENAI_API_KEY"
}Get your API key at platform.openai.com.
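The configs on this page reference keys by name via key_ref rather than embedding them. Assuming key_ref names an environment variable holding the raw key (an assumption about Savine's convention, not documented behavior), resolution can be sketched as:

```python
import os

def resolve_key(llm_config: dict) -> str:
    # Assumption: key_ref is the name of an environment variable that
    # holds the provider API key, as the examples on this page suggest.
    key = os.environ.get(llm_config["key_ref"])
    if key is None:
        raise RuntimeError(f"Missing environment variable {llm_config['key_ref']}")
    return key
```

Keeping the key out of agent.json means the file can be committed to version control without leaking credentials.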
Anthropic
Claude 3.5 Sonnet is arguably the best model for following strict, complex system prompts and adhering to XML structural outputs.
agent.json Config:
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"key_ref": "ANTHROPIC_API_KEY"
}Get your API key at console.anthropic.com.
Google
Gemini models provide very large context windows (up to 1M–2M tokens, depending on the model) and very high throughput, particularly the Flash variants.
agent.json Config:
"llm": {
"provider": "google",
"model": "gemini-2.0-flash",
"key_ref": "GOOGLE_API_KEY"
}Get your API key at aistudio.google.com.
Groq
Groq uses LPUs (Language Processing Units) to achieve 500+ tokens per second. Ideal for fast-feedback systems where latency is the primary bottleneck.
agent.json Config:
"llm": {
"provider": "groq",
"model": "llama-3.3-70b-versatile",
"key_ref": "GROQ_API_KEY"
}Custom OpenAI-Compatible Endpoints
If you serve open weights via Ollama, vLLM, Azure OpenAI, or LiteLLM, you can route Savine to your custom endpoint.
agent.json Config:
"llm": {
"provider": "openai",
"model": "custom-model-name",
"key_ref": "CUSTOM_API_KEY",
"base_url": "https://api.yourdomain.com/v1"
}For local Ollama instances running alongside the Savine CLI, set base_url to http://host.docker.internal:11434/v1.
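A common mistake with custom endpoints is a base_url missing its scheme. As a minimal sketch (the `custom_llm_block` helper is hypothetical, not a Savine function), the block above can be assembled with a quick sanity check:

```python
import json
from urllib.parse import urlparse

def custom_llm_block(model: str, key_ref: str, base_url: str) -> str:
    # Reject URLs without an explicit http/https scheme before writing
    # the "llm" block, since OpenAI-compatible clients require one.
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("base_url must include http:// or https://")
    return json.dumps({
        "llm": {
            "provider": "openai",
            "model": model,
            "key_ref": key_ref,
            "base_url": base_url,
        }
    }, indent=2)
```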
Cost Estimation
When building multi-agent systems, it is highly recommended to use cheaper, faster models (Gemini Flash, Groq) for routing/planning nodes, and reserve expensive models (GPT-4o, Claude 3.5 Sonnet) only for the final synthesis node.
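That routing pattern can be expressed as a small dispatch function. This is a sketch under stated assumptions: the node role names and the `llm_for_node` helper are illustrative, not part of Savine.

```python
# Route cheap, fast models to planning/routing nodes and reserve a
# premium model for the final synthesis node, as recommended above.
CHEAP = {"provider": "google", "model": "gemini-2.0-flash"}
PREMIUM = {"provider": "openai", "model": "gpt-4o"}

def llm_for_node(role: str) -> dict:
    """Pick an "llm" block by node role; only synthesis gets premium."""
    return PREMIUM if role == "synthesis" else CHEAP
```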
Visit the Dashboard > Observability > Cost Analysis tab to run volume projection matrices against the real pricing grids of every supported provider.