
LiteLLM

Core Stack

Unified interface for 100+ LLMs

Version: 1.0.0
Last Updated: 2024-01-05
Difficulty: Intermediate
Reading Time: 3 min

LiteLLM

LiteLLM provides a unified interface for over 100 different LLM providers, making it easy to switch between models, implement fallbacks, and optimize costs across multiple providers.

Key Features

  • Unified API: Single interface for 100+ LLM providers
  • Cost Optimization: Built-in cost tracking and optimization features
  • Fallback Mechanisms: Automatic failover between providers
  • Load Balancing: Distribute requests across multiple models
  • Easy Provider Switching: Swap providers by changing only the model string

Installation

pip install litellm

Quick Start

from litellm import completion
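# Provider API keys are read from environment variables
# (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).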

# OpenAI
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Anthropic Claude
response = completion(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Google Gemini
response = completion(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Supported Providers

LiteLLM supports major providers including:

  • OpenAI: GPT-3.5, GPT-4, GPT-4 Turbo
  • Anthropic: Claude 3 (Opus, Sonnet, Haiku)
  • Google: Gemini Pro, PaLM
  • Cohere: Command, Command Light
  • Hugging Face: Open source models
  • Azure OpenAI: Enterprise OpenAI models
  • AWS Bedrock: Amazon’s managed LLM service
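
For providers beyond plain OpenAI-style model names, LiteLLM routes on a provider prefix in the model string. A minimal sketch of that convention; the deployment and repo names below are illustrative placeholders, so check each provider's page in LiteLLM's docs for exact identifiers and required environment variables:

from litellm import completion

messages = [{"role": "user", "content": "Hello!"}]

# Azure OpenAI: "azure/<deployment-name>", with AZURE_API_KEY,
# AZURE_API_BASE and AZURE_API_VERSION set in the environment
response = completion(model="azure/my-gpt4-deployment", messages=messages)

# AWS Bedrock: "bedrock/<model-id>", using your AWS credentials
response = completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages
)

# Hugging Face Inference: "huggingface/<repo-id>"
response = completion(
    model="huggingface/mistralai/Mistral-7B-Instruct-v0.2",
    messages=messages
)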

Use Cases

  • Multi-Provider Applications: Use different models for different tasks
  • Cost Optimization: Route requests to the most cost-effective provider
  • Provider Redundancy: Implement fallbacks for high availability
  • A/B Testing: Compare performance across different models (a quick sketch follows this list)
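
As a concrete illustration of the A/B-testing use case, the same prompt can be sent to several models through the single completion() interface and the answers compared side by side (the model list is illustrative):

from litellm import completion

prompt = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# Send the identical prompt to each candidate model and print the answers
for model in ["gpt-3.5-turbo", "claude-3-sonnet-20240229"]:
    response = completion(model=model, messages=prompt)
    print(f"{model}: {response.choices[0].message.content}")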

Best Practices

  1. Set Up Fallbacks: Configure multiple providers for reliability
  2. Monitor Costs: Use built-in cost tracking features
  3. Choose Models Wisely: Match model capabilities to your use case
  4. Handle Rate Limits: Implement proper retry logic
  5. Cache Responses: Cache common responses to reduce costs (retries and caching are sketched after this list)
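
The retry and caching practices can be wired up in a few lines. A minimal sketch, assuming the num_retries parameter and the in-process Cache described in LiteLLM's docs; the exact import path for Cache can vary between versions, so verify it against the release you install:

import litellm
from litellm import completion
from litellm.caching import Cache

# Enable in-process response caching so identical prompts are served
# from the cache instead of being re-billed
litellm.cache = Cache()

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    num_retries=3,   # retry transient failures such as rate-limit errors
    caching=True     # opt this call into the cache configured above
)
print(response.choices[0].message.content)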

Advanced Features

Fallback Configuration

from litellm import completion

# Configure fallback models
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=["gpt-3.5-turbo", "claude-3-sonnet-20240229"]
)

Load Balancing

from litellm import Router

# Set up load balancing: deployments that share a model_name form one
# group, and requests to that alias are distributed across them
router = Router(
    model_list=[
        {
            "model_name": "chat-model",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
                "api_key": "your-openai-key"
            }
        },
        {
            "model_name": "chat-model",
            "litellm_params": {
                "model": "claude-3-sonnet-20240229",
                "api_key": "your-anthropic-key"
            }
        }
    ]
)

response = router.completion(
    model="chat-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

Cost Tracking

from litellm import completion, cost_per_token

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

# cost_per_token returns a (prompt_cost, completion_cost) pair in USD
prompt_cost, completion_cost = cost_per_token(
    model="gpt-3.5-turbo",
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens
)

print(f"Request cost: ${prompt_cost + completion_cost:.4f}")

Integration with FastAPI

from fastapi import FastAPI
from litellm import acompletion, cost_per_token
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    model: str = "gpt-3.5-turbo"

@app.post("/chat")
async def chat(request: ChatRequest):
    # acompletion is the async variant of completion, so the event loop
    # is not blocked while the provider responds
    response = await acompletion(
        model=request.model,
        messages=[{"role": "user", "content": request.message}]
    )

    # cost_per_token returns a (prompt_cost, completion_cost) pair in USD
    prompt_cost, completion_cost = cost_per_token(
        model=request.model,
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens
    )

    return {
        "response": response.choices[0].message.content,
        "model": request.model,
        "cost": prompt_cost + completion_cost
    }
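
Assuming the file is saved as main.py and the relevant provider key is exported, the endpoint can be served with uvicorn main:app and exercised by POSTing JSON such as {"message": "Hello!", "model": "gpt-3.5-turbo"} to /chat.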

Resources

Alternatives

  • OpenAI API
  • Anthropic SDK

Quick Decision Guide

Choose LiteLLM when you want the recommended stack's gateway pattern: one API across every provider listed above, with fallbacks, load balancing, and cost tracking built in. Reach for the OpenAI API or Anthropic SDK directly only when you are committed to a single provider and want its native SDK features.