Supported LLM Providers
Savine operates on a Bring Your Own Key (BYOK) model: you keep your billing relationship with the AI labs and retain ownership of your data, while Savine securely orchestrates the API calls on your behalf.
Provider Comparison
| Provider | Models Available | Speed | Strength | Cost Tier | Features |
|---|---|---|---|---|---|
| OpenAI | GPT-4o, o1, o3-mini | Medium | Reasoning, Tools | High | Function Calling, Vision |
| Anthropic | Claude 3.5 Sonnet | Medium | Coding, Instructions | High | Function Calling, Vision |
| Google | Gemini 2.0 Flash | Fast | Multimodal, Speed | Low | Function Calling, Vision |
| Groq | Llama 3.3, Mixtral | Blazing | Latency | Low | Function Calling |
| Mistral | Large, Codestral | Medium | Code Generation | Medium | Function Calling |
Recommendation Guide
- Speed & High Volume: Groq (Llama 3.3) or Google (Gemini 2.0 Flash)
- Complex System Prompts & Coding: Anthropic (Claude 3.5 Sonnet)
- General Enterprise Use: OpenAI (GPT-4o)
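The recommendation guide above can be sketched as a small lookup helper. This is an illustrative snippet, not part of the Savine API: the workload labels and the `pick_llm` function are hypothetical names, and the defaults simply mirror the guide.

```python
# Hypothetical mapping from workload category to an agent.json-style
# "llm" block, following the recommendation guide on this page.
RECOMMENDED = {
    "high_volume": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "coding": {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    "general": {"provider": "openai", "model": "gpt-4o"},
}

def pick_llm(workload: str) -> dict:
    """Return a default provider/model pair; fall back to general use."""
    return RECOMMENDED.get(workload, RECOMMENDED["general"])
```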
OpenAI
The industry standard. Note that reasoning models (o1, o3-mini) handle tool calling differently than the standard gpt-4o line.
agent.json Config:
"llm": {
"provider": "openai",
"model": "gpt-4o",
"key_ref": "OPENAI_API_KEY"
}Get your API key at platform.openai.com.
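The configs on this page reference keys by name via key_ref rather than embedding them. Assuming key_ref names an environment variable holding the raw key (an assumption about Savine's convention, not documented behavior), resolution can be sketched as:

```python
import os

def resolve_key(llm_config: dict) -> str:
    # Assumption: key_ref is the name of an environment variable that
    # holds the provider API key, as the examples on this page suggest.
    key = os.environ.get(llm_config["key_ref"])
    if key is None:
        raise RuntimeError(f"Missing environment variable {llm_config['key_ref']}")
    return key
```

Keeping the key out of agent.json means the file can be committed to version control without leaking credentials.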
Anthropic
Claude 3.5 Sonnet is arguably the best model for following strict, complex system prompts and adhering to XML structural outputs.
agent.json Config:
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"key_ref": "ANTHROPIC_API_KEY"
}Get your API key at console.anthropic.com.
Google
Gemini models provide very large context windows (up to 1M–2M tokens, depending on the model) and very high throughput, particularly the Flash variants.
agent.json Config:
"llm": {
"provider": "google",
"model": "gemini-2.0-flash",
"key_ref": "GOOGLE_API_KEY"
}Get your API key at aistudio.google.com.
Groq
Groq uses LPUs (Language Processing Units) to achieve 500+ tokens per second. Ideal for fast-feedback systems where latency is the primary bottleneck.
agent.json Config:
"llm": {
"provider": "groq",
"model": "llama-3.3-70b-versatile",
"key_ref": "GROQ_API_KEY"
}Custom OpenAI-Compatible Endpoints
If you serve open weights via Ollama, vLLM, Azure OpenAI, or LiteLLM, you can route Savine to your custom endpoint.
agent.json Config:
"llm": {
"provider": "openai",
"model": "custom-model-name",
"key_ref": "CUSTOM_API_KEY",
"base_url": "https://api.yourdomain.com/v1"
}For local Ollama instances running alongside the Savine CLI, set base_url to http://host.docker.internal:11434/v1.
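A common mistake with custom endpoints is a base_url missing its scheme. As a minimal sketch (the `custom_llm_block` helper is hypothetical, not a Savine function), the block above can be assembled with a quick sanity check:

```python
import json
from urllib.parse import urlparse

def custom_llm_block(model: str, key_ref: str, base_url: str) -> str:
    # Reject URLs without an explicit http/https scheme before writing
    # the "llm" block, since OpenAI-compatible clients require one.
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("base_url must include http:// or https://")
    return json.dumps({
        "llm": {
            "provider": "openai",
            "model": model,
            "key_ref": key_ref,
            "base_url": base_url,
        }
    }, indent=2)
```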
Cost Estimation
When building multi-agent systems, it is highly recommended to use cheaper, faster models (Gemini Flash, Groq) for routing/planning nodes, and reserve expensive models (GPT-4o, Claude 3.5 Sonnet) only for the final synthesis node.
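That routing pattern can be expressed as a small dispatch function. This is a sketch under stated assumptions: the node role names and the `llm_for_node` helper are illustrative, not part of Savine.

```python
# Route cheap, fast models to planning/routing nodes and reserve a
# premium model for the final synthesis node, as recommended above.
CHEAP = {"provider": "google", "model": "gemini-2.0-flash"}
PREMIUM = {"provider": "openai", "model": "gpt-4o"}

def llm_for_node(role: str) -> dict:
    """Pick an "llm" block by node role; only synthesis gets premium."""
    return PREMIUM if role == "synthesis" else CHEAP
```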
Visit the Dashboard > Observability > Cost Analysis tab to run volume projection matrices against the real pricing grids of every supported provider.