Supported LLM Providers

Savine operates on a Bring Your Own Key (BYOK) model. You maintain your billing relationship with the AI labs, retain ownership of your data, and we securely orchestrate the API calls.

Provider Comparison

Provider    Models Available      Speed    Strength               Cost Tier  Features
OpenAI      GPT-4o, o1, o3-mini   Medium   Reasoning, Tools       High       Function Calling, Vision
Anthropic   Claude 3.5 Sonnet     Medium   Coding, Instructions   High       Function Calling, Vision
Google      Gemini 2.0 Flash      Fast     Multimodal, Speed      Low        Function Calling, Vision
Groq        Llama 3.3, Mixtral    Blazing  Low Latency            Low        Function Calling
Mistral     Large, Codestral      Medium   Code Generation        Medium     Function Calling

Recommendation Guide

  • Speed & High Volume: Groq (Llama 3.3) or Google (Gemini 2.0 Flash)
  • Complex System Prompts & Coding: Anthropic (Claude 3.5 Sonnet)
  • General Enterprise Use: OpenAI (GPT-4o)

OpenAI

The industry standard. Note that the reasoning models (o1, o3-mini) handle tool calling differently from the standard gpt-4o line.

agent.json Config:

```json
"llm": {
  "provider": "openai",
  "model": "gpt-4o",
  "key_ref": "OPENAI_API_KEY"
}
```
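To target one of the reasoning models instead, swap the model string; a minimal sketch, assuming Savine accepts any model id the provider exposes:

```json
"llm": {
  "provider": "openai",
  "model": "o3-mini",
  "key_ref": "OPENAI_API_KEY"
}
```

Keep the tool-calling caveat above in mind when switching an existing agent to a reasoning model.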

Get your API key at platform.openai.com.


Anthropic

Claude 3.5 Sonnet is arguably the best model for following strict, complex system prompts and producing well-formed XML-structured output.

agent.json Config:

```json
"llm": {
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "key_ref": "ANTHROPIC_API_KEY"
}
```

Get your API key at console.anthropic.com.


Google

Gemini models provide an enormous context window (1M+ tokens) and extreme speed, particularly the flash variants.

agent.json Config:

```json
"llm": {
  "provider": "google",
  "model": "gemini-2.0-flash",
  "key_ref": "GOOGLE_API_KEY"
}
```

Get your API key at aistudio.google.com.


Groq

Groq uses LPUs (Language Processing Units) to achieve 500+ tokens per second. Ideal for fast-feedback systems where latency is the primary bottleneck.

agent.json Config:

```json
"llm": {
  "provider": "groq",
  "model": "llama-3.3-70b-versatile",
  "key_ref": "GROQ_API_KEY"
}
```

Get your API key at console.groq.com.

Custom OpenAI-Compatible Endpoints

If you serve models behind an OpenAI-compatible endpoint (for example Ollama, vLLM, Azure OpenAI, or a LiteLLM proxy), you can route Savine to it.

agent.json Config:

```json
"llm": {
  "provider": "openai",
  "model": "custom-model-name",
  "key_ref": "CUSTOM_API_KEY",
  "base_url": "https://api.yourdomain.com/v1"
}
```

For local Ollama instances running alongside the Savine CLI, set base_url to http://host.docker.internal:11434/v1.
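As a concrete sketch, a local Llama model served through Ollama's OpenAI-compatible endpoint might look like this. The model name is illustrative (use whichever model you have pulled locally); Ollama does not validate API keys, so key_ref can point at any placeholder variable:

```json
"llm": {
  "provider": "openai",
  "model": "llama3.3",
  "key_ref": "CUSTOM_API_KEY",
  "base_url": "http://host.docker.internal:11434/v1"
}
```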


Cost Estimation

When building multi-agent systems, it is highly recommended to use cheaper, faster models (Gemini Flash, Groq) for routing/planning nodes, and reserve expensive models (GPT-4o, Claude 3.5 Sonnet) only for the final synthesis node.
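As an illustration of this split, the llm blocks for a cheap routing node and an expensive synthesis node might look like the following. Only the llm blocks follow the format documented above; the surrounding agent layout and node names here are hypothetical:

```json
"agents": {
  "router": {
    "llm": { "provider": "google", "model": "gemini-2.0-flash", "key_ref": "GOOGLE_API_KEY" }
  },
  "synthesis": {
    "llm": { "provider": "openai", "model": "gpt-4o", "key_ref": "OPENAI_API_KEY" }
  }
}
```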

Cost Calculator
Visit the Dashboard > Observability > Cost Analysis tab to run volume projection matrices against the live pricing grids of every supported provider.