AI Gateway Patterns
Master AI Gateway patterns: intelligent routing, load balancing, fallback strategies, and unified AI API management.
What are AI Gateway Patterns?
AI Gateway patterns provide a unified entry point for managing multiple AI models and services. They handle intelligent routing, load balancing, fallback strategies, rate limiting, authentication, and observability across diverse AI providers and models.
Key Benefits: Unified API interface, intelligent model routing, cost optimization, improved reliability through fallbacks, and centralized governance and monitoring.
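As a rough illustration of the "unified API interface" benefit, the sketch below normalizes two differently shaped provider payloads into one response type. Everything in it is hypothetical: the provider names, the payload shapes, and the `complete` entry point are assumptions made for the example, not any real vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ChatResult:
    """Provider-independent response shape returned to callers."""
    text: str
    model: str
    tokens_used: int


# Hypothetical raw payloads: each fake provider answers in its own shape.
def fake_provider_a(prompt: str) -> dict:
    return {"choices": [{"text": f"A says: {prompt}"}], "usage": {"total": 7}}


def fake_provider_b(prompt: str) -> dict:
    return {"output": f"B says: {prompt}", "token_count": 5}


# One adapter per provider converts its payload into ChatResult.
def call_a(prompt: str) -> ChatResult:
    raw = fake_provider_a(prompt)
    return ChatResult(raw["choices"][0]["text"], "provider-a", raw["usage"]["total"])


def call_b(prompt: str) -> ChatResult:
    raw = fake_provider_b(prompt)
    return ChatResult(raw["output"], "provider-b", raw["token_count"])


ADAPTERS: Dict[str, Callable[[str], ChatResult]] = {
    "provider-a": call_a,
    "provider-b": call_b,
}


def complete(model: str, prompt: str) -> ChatResult:
    """Single entry point: callers never see provider-specific payloads."""
    if model not in ADAPTERS:
        raise KeyError(f"unknown model: {model}")
    return ADAPTERS[model](prompt)
```

Adding a provider then means writing one adapter and registering it, while callers keep using `complete` unchanged.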
AI Gateway Performance Calculator
Example output for 1000 RPS spread across 3 models:
- Efficiency: 92%
- Avg Latency: 90 ms
- Requests/Model: 333 RPS
- Cost Multiplier: 0.99x
- Availability: 99.95%
- Complexity: Medium
Core AI Gateway Patterns
Model Router Pattern
- Route requests based on model capabilities
- Version-based routing for A/B testing
- Request classification and smart routing
- Dynamic model selection
- Context-aware model assignment
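Capability-based routing from the list above can be sketched in a few lines. The model names, capability tags, and preference order here are placeholders chosen for illustration; a real gateway would load them from configuration.

```python
from typing import Dict, List, Set

# Hypothetical catalog: model name -> capability tags.
MODEL_CAPABILITIES: Dict[str, Set[str]] = {
    "large-model": {"reasoning", "coding", "analysis"},
    "small-model": {"general"},
    "vision-model": {"multimodal", "general"},
}

# Preference order used when several models qualify (assumed cheapest first).
PREFERENCE: List[str] = ["small-model", "vision-model", "large-model"]


def route(required: Set[str]) -> str:
    """Return the first preferred model that covers every required capability."""
    for name in PREFERENCE:
        if required <= MODEL_CAPABILITIES[name]:
            return name
    raise LookupError(f"no model supports {sorted(required)}")
```

Here `route({"multimodal"})` skips the small model and picks `vision-model`, while a request needing both `coding` and `reasoning` can only be served by `large-model`.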
Load Balancer Pattern
- Distribute load across model instances
- Health check and failover management
- Weighted routing based on performance
- Queue management and backpressure
- Auto-scaling based on demand
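One possible way to combine health checks with backpressure is a least-loaded balancer that refuses work once every healthy instance is saturated. This is a simplified sketch: the `Instance` fields and the per-instance cap of 2 are assumptions, and health is set by hand instead of by a real probe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool = True
    in_flight: int = 0
    max_in_flight: int = 2  # per-instance concurrency cap (assumed value)


class LeastLoadedBalancer:
    def __init__(self, instances: List[Instance]):
        self.instances = instances

    def acquire(self) -> Instance:
        """Pick the healthy instance with the fewest in-flight requests."""
        candidates = [i for i in self.instances
                      if i.healthy and i.in_flight < i.max_in_flight]
        if not candidates:
            # Backpressure: reject instead of queueing without bound.
            raise RuntimeError("all instances unhealthy or saturated")
        chosen = min(candidates, key=lambda i: i.in_flight)
        chosen.in_flight += 1
        return chosen

    def release(self, instance: Instance) -> None:
        """Call when a request finishes, whether it succeeded or failed."""
        instance.in_flight = max(0, instance.in_flight - 1)
```

Callers would wrap each model call in `acquire()` and `release()` (for example with `try`/`finally`) so the in-flight counts stay accurate.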
Fallback Pattern
- Cascade to backup models on failure
- Quality-based fallback strategies
- Cross-provider failover
- Graceful degradation patterns
- Circuit breaker implementation
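The circuit breaker item can be shown in isolation as a small state machine. The sketch below is one common formulation, with assumed defaults (3 failures, 30-second cooldown) and an injectable clock so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = 0.0
        self.state = "closed"

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown_s:
                # Cooldown elapsed: allow trial traffic until a result is recorded.
                self.state = "half_open"
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        # A failed trial in half-open reopens the circuit immediately.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A fallback cascade would consult `allow_request()` for each candidate model in priority order and move on to the next one whenever it returns `False`.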
Request Aggregator
- Batch multiple requests for efficiency
- Multi-model ensemble patterns
- Response merging and consensus
- Parallel request processing
- Result ranking and selection
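Parallel fan-out with a simple consensus step might look like the following. The three `model_*` coroutines are hypothetical stand-ins for real model calls, and majority voting on exact string equality is only one of several possible merge strategies.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Dict

ModelFn = Callable[[str], Awaitable[str]]


async def ensemble(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, object]:
    """Query every model concurrently and return the most common answer."""
    names = list(models)
    results = await asyncio.gather(
        *(models[n](prompt) for n in names), return_exceptions=True
    )
    # Drop failed calls so one bad model cannot sink the whole request.
    answers = {n: r for n, r in zip(names, results)
               if not isinstance(r, BaseException)}
    if not answers:
        raise RuntimeError("all models failed")
    winner, votes = Counter(answers.values()).most_common(1)[0]
    return {"answer": winner, "votes": votes, "responders": sorted(answers)}


# Hypothetical stand-ins for real model calls.
async def model_a(prompt: str) -> str:
    return "42"


async def model_b(prompt: str) -> str:
    return "42"


async def model_c(prompt: str) -> str:
    raise TimeoutError("simulated timeout")
```

Running `asyncio.run(ensemble("q", {"a": model_a, "b": model_b, "c": model_c}))` collects two matching answers and ignores the simulated timeout.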
Implementation Examples
AI Gateway Service Architecture
from fastapi import FastAPI, HTTPException, Depends
from typing import List, Dict, Any, Optional
import asyncio
import time
import random
from enum import Enum
from pydantic import BaseModel


class RoutingStrategy(str, Enum):
    ROUND_ROBIN = "round_robin"
    WEIGHTED = "weighted"
    LATENCY_BASED = "latency_based"
    COST_OPTIMIZED = "cost_optimized"


class ModelConfig(BaseModel):
    model_id: str
    endpoint_url: str
    weight: float = 1.0
    max_tokens: int = 4096
    cost_per_token: float = 0.0001
    avg_latency_ms: float = 100
    capabilities: List[str] = []
    fallback_priority: int = 1
class AIGateway:
    def __init__(self):
        self.models: Dict[str, ModelConfig] = {}
        self.routing_strategy = RoutingStrategy.WEIGHTED
        self.fallback_enabled = True
        self.circuit_breakers: Dict[str, Dict[str, Any]] = {}
        self.request_stats: Dict[str, Dict[str, float]] = {}
        self._rr_index = 0  # cursor for round-robin selection

    def register_model(self, config: ModelConfig):
        """Register a new model with the gateway"""
        self.models[config.model_id] = config
        self.circuit_breakers[config.model_id] = {
            'failures': 0,
            'last_failure': 0,
            'state': 'closed',  # closed, open, half_open
            'failure_threshold': 5
        }
        self.request_stats[config.model_id] = {
            'requests': 0, 'failures': 0, 'total_latency_ms': 0.0
        }

    async def route_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Main request routing logic"""
        selected_model = None
        try:
            # 1. Select a model based on routing strategy. _select_model only
            #    returns models whose circuit breaker allows traffic, so no
            #    separate breaker check is needed here.
            selected_model = self._select_model(request)

            # 2. Execute request with monitoring
            start_time = time.perf_counter()
            response = await self._execute_request(selected_model, request)
            latency = (time.perf_counter() - start_time) * 1000

            # 3. Update stats and circuit breaker
            self._update_stats(selected_model.model_id, latency, success=True)
            self._reset_circuit_breaker(selected_model.model_id)
            return {
                'response': response,
                'model_used': selected_model.model_id,
                'latency_ms': latency,
                'routing_strategy': self.routing_strategy.value
            }
        except HTTPException:
            # Preserve deliberate status codes such as 503 "No models available"
            raise
        except Exception as e:
            # Handle failures and circuit breaker logic
            if selected_model is not None:
                self._handle_failure(selected_model.model_id, str(e))
                # Try fallback if enabled
                if self.fallback_enabled:
                    return await self._try_fallback(selected_model.model_id, request)
            raise HTTPException(500, f"AI Gateway error: {str(e)}")

    def _select_model(self, request: Dict[str, Any]) -> ModelConfig:
        """Select best model based on routing strategy"""
        available_models = [m for m in self.models.values()
                            if self._is_circuit_closed(m.model_id)]
        if not available_models:
            raise HTTPException(503, "No models available")
        if self.routing_strategy == RoutingStrategy.WEIGHTED:
            return self._weighted_selection(available_models)
        elif self.routing_strategy == RoutingStrategy.LATENCY_BASED:
            return min(available_models, key=lambda m: m.avg_latency_ms)
        elif self.routing_strategy == RoutingStrategy.COST_OPTIMIZED:
            return min(available_models, key=lambda m: m.cost_per_token)
        else:  # ROUND_ROBIN
            return self._round_robin_selection(available_models)

    def _weighted_selection(self, models: List[ModelConfig]) -> ModelConfig:
        """Weighted random selection based on model weights"""
        total_weight = sum(m.weight for m in models)
        rand_value = random.uniform(0, total_weight)
        current_weight = 0
        for model in models:
            current_weight += model.weight
            if rand_value <= current_weight:
                return model
        return models[0]  # Fallback

    def _round_robin_selection(self, models: List[ModelConfig]) -> ModelConfig:
        """Cycle through the available models in order"""
        model = models[self._rr_index % len(models)]
        self._rr_index += 1
        return model

    def _is_circuit_closed(self, model_id: str) -> bool:
        """Check if circuit breaker allows requests"""
        breaker = self.circuit_breakers.get(model_id, {})
        if breaker.get('state') == 'open':
            # Check if we should try half-open
            if time.time() - breaker.get('last_failure', 0) > 30:  # 30 sec cooldown
                breaker['state'] = 'half_open'
                return True
            return False
        return True

    def _reset_circuit_breaker(self, model_id: str):
        """Close the breaker again after a successful request"""
        breaker = self.circuit_breakers[model_id]
        breaker['failures'] = 0
        breaker['state'] = 'closed'

    def _update_stats(self, model_id: str, latency_ms: float, success: bool):
        """Accumulate per-model request counters"""
        stats = self.request_stats.setdefault(
            model_id, {'requests': 0, 'failures': 0, 'total_latency_ms': 0.0})
        stats['requests'] += 1
        if success:
            stats['total_latency_ms'] += latency_ms
        else:
            stats['failures'] += 1

    async def _execute_request(self, model: ModelConfig, request: Dict[str, Any]):
        """Execute request against selected model"""
        # Simulated model API call
        await asyncio.sleep(model.avg_latency_ms / 1000)
        # Simulate occasional failures
        if random.random() < 0.05:  # 5% failure rate
            raise Exception(f"Model {model.model_id} request failed")
        return {
            'text': f"Response from {model.model_id}",
            'tokens_used': random.randint(50, 200),
            'model_version': '1.0'
        }

    def _handle_failure(self, model_id: str, error: str):
        """Handle model failure and update circuit breaker"""
        self._update_stats(model_id, 0.0, success=False)
        breaker = self.circuit_breakers[model_id]
        breaker['failures'] += 1
        breaker['last_failure'] = time.time()
        if breaker['failures'] >= breaker['failure_threshold']:
            breaker['state'] = 'open'
            print(f"Circuit breaker opened for {model_id}")

    async def _try_fallback(self, failed_model_id: str, request: Dict[str, Any]):
        """Try fallback models in priority order"""
        fallback_models = sorted(
            [m for m in self.models.values()
             if m.model_id != failed_model_id and self._is_circuit_closed(m.model_id)],
            key=lambda m: m.fallback_priority
        )
        for model in fallback_models:
            try:
                response = await self._execute_request(model, request)
                self._reset_circuit_breaker(model.model_id)
                return {
                    'response': response,
                    'model_used': model.model_id,
                    'fallback_from': failed_model_id,
                    'routing_strategy': 'fallback'
                }
            except Exception as e:
                # Count the failure so a broken fallback can also trip its breaker
                self._handle_failure(model.model_id, str(e))
                continue
        raise HTTPException(503, "All fallback models failed")
# FastAPI Application
app = FastAPI(title="AI Gateway Service")
gateway = AIGateway()

# Register models
gateway.register_model(ModelConfig(
    model_id="gpt-4",
    endpoint_url="https://api.openai.com/v1/chat/completions",
    weight=2.0,
    cost_per_token=0.00003,
    avg_latency_ms=800,
    capabilities=["reasoning", "coding", "analysis"],
    fallback_priority=1
))
gateway.register_model(ModelConfig(
    model_id="claude-3",
    endpoint_url="https://api.anthropic.com/v1/messages",
    weight=1.8,
    cost_per_token=0.000025,
    avg_latency_ms=650,
    capabilities=["reasoning", "writing", "analysis"],
    fallback_priority=2
))


@app.post("/chat/completions")
async def chat_completions(request: Dict[str, Any]):
    return await gateway.route_request(request)


@app.get("/gateway/status")
async def gateway_status():
    return {
        'models': len(gateway.models),
        'routing_strategy': gateway.routing_strategy.value,
        'fallback_enabled': gateway.fallback_enabled,
        'circuit_breakers': {
            model_id: breaker['state']
            for model_id, breaker in gateway.circuit_breakers.items()
        }
    }
Intelligent Request Classification
import re


class RequestClassifier:
    def __init__(self):
        # Illustrative capability scores per model (0.0 to 1.0)
        self.model_capabilities = {
            'gpt-4': {'reasoning': 0.95, 'coding': 0.9, 'creative': 0.85, 'analysis': 0.9},
            'claude-3': {'reasoning': 0.9, 'writing': 0.95, 'analysis': 0.85, 'safety': 0.95},
            'llama-2': {'coding': 0.8, 'general': 0.85, 'cost_efficient': 0.95},
            'gemini-pro': {'multimodal': 0.9, 'reasoning': 0.85, 'coding': 0.8}
        }
        # Regex patterns that hint at each task type
        self.task_patterns = {
            'coding': [r'write.*code', r'implement.*function', r'debug', r'refactor'],
            'reasoning': [r'explain.*why', r'analyze', r'compare', r'evaluate'],
            'creative': [r'write.*story', r'generate.*content', r'brainstorm'],
            'analysis': [r'summarize', r'extract.*data', r'analyze.*document']
        }

    def classify_request(self, request_text: str,
                         context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Classify request and recommend best model"""
        text = request_text.lower()

        # Score each task type by the fraction of its patterns that match
        task_scores = {}
        for task_type, patterns in self.task_patterns.items():
            score = 0
            for pattern in patterns:
                if re.search(pattern, text):
                    score += 1
            task_scores[task_type] = score / len(patterns)

        # Find dominant task type; use 'general' when no pattern matched at all
        best_task = max(task_scores, key=task_scores.get) if task_scores else None
        primary_task = best_task if best_task and task_scores[best_task] > 0 else 'general'

        # Score models for this task
        model_scores = {}
        for model_id, capabilities in self.model_capabilities.items():
            score = capabilities.get(primary_task, 0.5)
            # Apply context-based adjustments
            if context:
                if context.get('budget') == 'low':
                    score *= capabilities.get('cost_efficient', 0.8)
                if context.get('latency_critical'):
                    # The capability tables above carry no latency figures, so the
                    # 500 ms default applies to every model unless one is added.
                    latency_ms = capabilities.get('avg_latency_ms', 500)
                    score *= max(0.0, 1.0 - latency_ms / 1000)
                if context.get('safety_critical'):
                    score *= capabilities.get('safety', 0.7)
            model_scores[model_id] = score

        # Recommend top models
        ranked_models = sorted(model_scores.items(), key=lambda x: x[1], reverse=True)
        return {
            'primary_task': primary_task,
            'task_confidence': max(task_scores.values()) if task_scores else 0,
            'recommended_models': ranked_models[:3],
            'routing_reason': f"Best for {primary_task} tasks"
        }
# Usage in gateway
class EnhancedAIGateway(AIGateway):
    def __init__(self):
        super().__init__()
        self.classifier = RequestClassifier()

    def _select_model(self, request: Dict[str, Any]) -> ModelConfig:
        """Enhanced model selection with intelligent classification"""
        # Classify the last message; tolerate a missing or empty message list
        messages = request.get('messages') or [{}]
        classification = self.classifier.classify_request(
            messages[-1].get('content', ''),
            request.get('context', {})
        )
        # Get recommended model
        recommended_model_id = classification['recommended_models'][0][0]
        # Check if recommended model is available
        if (recommended_model_id in self.models and
                self._is_circuit_closed(recommended_model_id)):
            return self.models[recommended_model_id]
        # Fallback to standard routing
        return super()._select_model(request)
Real-World AI Gateway Implementations
OpenAI Gateway
- Routes between GPT-3.5, GPT-4, and specialized models
- Handles 100M+ API requests daily
- Cost-based routing saves 30% on inference costs
- Multi-region failover with <50ms switching
- Used by ChatGPT, GitHub Copilot, Microsoft products
Anthropic Claude Gateway
- Intelligent routing based on safety requirements
- Handles constitutional AI filtering at gateway level
- Context-aware model selection (Claude vs Claude Instant)
- Real-time quality monitoring and fallbacks
- Powers enterprise Claude deployments
Hugging Face Inference API
- Routes across 50,000+ open-source models
- Dynamic scaling based on model popularity
- Automatic fallback to similar models
- Serves 1B+ inferences monthly
- Powers startups and enterprise ML workflows
Google Vertex AI Gateway
- Unified access to PaLM, Gemini, and custom models
- Intelligent routing based on model capabilities
- Enterprise-grade security and compliance
- Integrated with Google Cloud services
- Used by major enterprise customers
AI Gateway Best Practices
✅ Do
- Implement comprehensive health checks for all models
- Use circuit breakers to prevent cascade failures
- Monitor latency, cost, and quality metrics continuously
- Implement graceful fallback strategies
- Use request classification for intelligent routing
- Implement proper authentication and rate limiting
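For the rate limiting item above, a token bucket is one widely used approach. The sketch below assumes one bucket per API key or per model; the `rate` and `capacity` values are illustrative and the clock is injectable for testing.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically answer with HTTP 429 when `allow()` returns `False`, and could pass a larger `cost` for requests expected to consume more tokens.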
❌ Don't
- Route blindly without understanding request context
- Ignore model-specific rate limits and quotas
- Create single points of failure in the gateway
- Skip monitoring and observability implementation
- Hard-code model endpoints or configurations
- Neglect security and access control
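To avoid hard-coded endpoints, model definitions can be loaded from external configuration and validated at startup. This is a minimal sketch: the `ModelEntry` fields, the JSON layout, and the `GATEWAY_MODELS` variable name in the usage note are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    endpoint_url: str
    weight: float = 1.0


def load_models(raw_json: str) -> Dict[str, ModelEntry]:
    """Parse a JSON list of model entries, rejecting duplicates and bad weights."""
    models: Dict[str, ModelEntry] = {}
    for item in json.loads(raw_json):
        entry = ModelEntry(**item)
        if entry.weight <= 0:
            raise ValueError(f"{entry.model_id}: weight must be positive")
        if entry.model_id in models:
            raise ValueError(f"duplicate model_id: {entry.model_id}")
        models[entry.model_id] = entry
    return models
```

At startup the gateway could call something like `load_models(os.environ.get("GATEWAY_MODELS", "[]"))`, so endpoints and weights change per environment without a code change.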