System Designer - Learn System Design & Software Architecture

What is Azure Cosmos DB?

Azure Cosmos DB is Microsoft's globally distributed, multi-model database service designed for modern applications requiring massive scale, low latency, and high availability. It offers turnkey global distribution across 50+ Azure regions, five well-defined consistency levels, and comprehensive SLAs for availability, latency, throughput, and consistency.

With support for multiple APIs including SQL, MongoDB, Cassandra, Azure Table, and Gremlin, Cosmos DB allows you to use familiar tools and skills while gaining the benefits of a globally distributed database. Storage scales automatically, throughput can scale with demand when autoscale is enabled, and performance stays predictable because the cost of every operation is expressed in Request Units (RUs).

Cosmos DB Performance Calculator

Example inputs:

• Request Units/sec: 1,000
• Storage: 100 GB
• Global Regions: 2

Estimated results:

• Point Reads/sec: 1,000
• Point Writes/sec: 200
• Read Latency: 5 ms
• Monthly Cost: $150
• Queries: 100/sec
• Global Storage: 250 GB
• Partitions: 5
• RU Utilization: 2.0%
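
The calculator's throughput figures (1,000 RU/s giving 1,000 point reads/sec or 200 point writes/sec) are consistent with a common rule of thumb: roughly 1 RU per 1 KB point read and roughly 5 RUs per 1 KB point write. The sketch below reproduces that arithmetic. The two cost constants are assumptions inferred from those figures, and real charges vary with item size, indexing, and consistency level.

```python
# Assumed per-operation costs for 1 KB items (rule-of-thumb values, not guarantees).
READ_RU_PER_KB = 1.0    # assumption: ~1 RU per 1 KB point read
WRITE_RU_PER_KB = 5.0   # assumption: ~5 RU per 1 KB point write


def max_ops_per_second(provisioned_ru_per_sec, ru_per_operation):
    """Upper bound on operations/sec if the whole RU budget goes to one operation type."""
    return int(provisioned_ru_per_sec // ru_per_operation)


def required_ru_per_second(reads_per_sec, writes_per_sec):
    """RU/s needed for a mixed workload of 1 KB point reads and point writes."""
    return reads_per_sec * READ_RU_PER_KB + writes_per_sec * WRITE_RU_PER_KB


print(max_ops_per_second(1000, READ_RU_PER_KB))   # 1000
print(max_ops_per_second(1000, WRITE_RU_PER_KB))  # 200
print(required_ru_per_second(500, 100))           # 1000.0
```

A mixed workload of 500 reads/sec and 100 writes/sec would, under these assumptions, use the full 1,000 RU/s budget.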

Five Consistency Levels

Strong Consistency

Linearizability guarantee - reads always return the most recent committed value. Highest latency, strongest consistency.

Bounded Staleness

Reads lag behind writes by at most K versions or a time interval T, whichever is reached first. Both staleness bounds are configurable.

Session Consistency (Default)

Guarantees read-your-writes and monotonic reads within a client session. A good balance of consistency and performance for most applications.

Consistent Prefix

Reads see writes in order, but may lag behind. No out-of-order reads, good for collaborative scenarios.

Eventual Consistency

No ordering guarantee, lowest latency and highest availability. Good for counters and non-critical data.
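
To make the difference between session and eventual consistency concrete, here is a toy model in Python. It is a sketch, not how the service is implemented: the `Replica` and `SessionClient` classes, the single primary/secondary pair, and the fallback-to-primary rule are all assumptions made for illustration. The idea it shows is that a session token (the highest log sequence number the client has seen) lets the client refuse a replica that has not yet caught up with its own writes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: a key-value store plus the last log sequence number (LSN) applied."""
    lsn: int = 0
    data: dict = field(default_factory=dict)

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class SessionClient:
    """Hypothetical client that tracks a session token (highest LSN it has seen)."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.session_lsn = 0

    def write(self, key, value):
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.session_lsn = lsn  # remember our own write

    def read(self, key):
        # Use the secondary only if it has caught up to the session token;
        # otherwise fall back to the primary (read-your-writes).
        if self.secondary.lsn >= self.session_lsn:
            replica = self.secondary
        else:
            replica = self.primary
        # Never move the token backwards (monotonic reads).
        self.session_lsn = max(self.session_lsn, replica.lsn)
        return replica.data.get(key)


primary, secondary = Replica(), Replica()
client = SessionClient(primary, secondary)
client.write("score", 10)

stale_read = secondary.data.get("score")   # eventual-style read from a lagging replica
session_read = client.read("score")        # session read sees the client's own write
```

In this model `stale_read` is `None` because the secondary has not replicated the write yet, while `session_read` is `10`.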

Multi-API Support

SQL API (Native, now named API for NoSQL)

• JSON document model

• SQL query syntax

• JavaScript stored procedures

• ACID transactions (scoped to a single logical partition)

MongoDB API

• MongoDB wire protocol

• Existing MongoDB drivers

• Aggregation pipeline

• GridFS support

Cassandra API

• CQL (Cassandra Query Language)

• Wide-column data model

• Existing Cassandra drivers

• Keyspace and table concepts

Azure Table API

• Azure Table Storage compatible

• Key-value data model

• Premium performance

• Global distribution

Gremlin API

• Apache TinkerPop Gremlin

• Graph data model

• Vertices and edges

• Graph traversals

Real-World Cosmos DB Implementations

Xbox Live

Powers gaming profiles, achievements, and social features for 100+ million gamers worldwide.

• Global gaming profile consistency
• Real-time leaderboards and achievements
• Session consistency for gaming sessions
• Multi-region low-latency access

Progressive Insurance

Uses Cosmos DB for real-time insurance quote calculations and customer data management.

• Real-time insurance quotes
• Customer profile management
• Claims processing workflows
• Regulatory compliance across states

Jet.com

Leverages Cosmos DB for e-commerce catalog, pricing, and recommendation systems.

• Product catalog and inventory
• Dynamic pricing algorithms
• Customer recommendation engine
• Order processing and tracking

Symantec

Utilizes Cosmos DB for global threat intelligence and security data analytics.

• Global threat intelligence database
• Real-time security event processing
• Malware signature distribution
• Customer security dashboard analytics

Cosmos DB Code Examples

SQL API Query

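Query documents with SQL syntax. JOIN in Cosmos DB is an intra-document self-join: here it pairs each user with the entries of its own nested orders array and does not join separate containers. Combining GROUP BY with ORDER BY on an aggregate alias may not be accepted by every SDK or service version; if the query is rejected, drop the ORDER BY and sort the results in the client.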

SELECT 
    u.id,
    u.name,
    u.email,
    COUNT(o.id) as order_count,
    SUM(o.total) as total_spent
FROM users u 
JOIN orders o IN u.orders
WHERE u.city = 'Seattle' 
    AND o.date >= '2024-01-01'
GROUP BY u.id, u.name, u.email
ORDER BY total_spent DESC
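
Because the JOIN above ranges over an array nested inside each document, its result can be reproduced in a few lines of plain Python. The sketch below assumes documents shaped like the query implies (a `city` field and an `orders` array with `date` and `total`); the sample users and the `summarize_orders` helper are hypothetical.

```python
# Hypothetical sample documents, shaped as the query above assumes.
users = [
    {"id": "u1", "name": "Ana", "email": "ana@example.com", "city": "Seattle",
     "orders": [{"id": "o1", "date": "2023-12-30", "total": 40.0},
                {"id": "o2", "date": "2024-02-10", "total": 25.0},
                {"id": "o3", "date": "2024-03-05", "total": 35.0}]},
    {"id": "u2", "name": "Ben", "email": "ben@example.com", "city": "Seattle",
     "orders": [{"id": "o4", "date": "2024-01-15", "total": 90.0}]},
    {"id": "u3", "name": "Cy", "email": "cy@example.com", "city": "Portland",
     "orders": [{"id": "o5", "date": "2024-01-20", "total": 500.0}]},
]


def summarize_orders(users, city, since):
    """Per-user order count and spend, mirroring the JOIN / WHERE / GROUP BY above."""
    rows = []
    for user in users:
        if user["city"] != city:
            continue
        # The intra-document JOIN pairs the user with each of its own orders;
        # ISO date strings compare correctly as plain strings.
        matching = [o for o in user.get("orders", []) if o["date"] >= since]
        if not matching:
            continue  # no (user, order) pair survives the WHERE clause
        rows.append({
            "id": user["id"],
            "name": user["name"],
            "email": user["email"],
            "order_count": len(matching),
            "total_spent": sum(o["total"] for o in matching),
        })
    # ORDER BY total_spent DESC
    return sorted(rows, key=lambda r: r["total_spent"], reverse=True)


result = summarize_orders(users, "Seattle", "2024-01-01")
```

With this sample data, `result` lists Ben first (one order, 90.0) and Ana second (two qualifying orders, 60.0). Ana's 2023 order is excluded, and the Portland user never appears.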

MongoDB API

Use MongoDB drivers and aggregation pipeline:

MongoDB API Aggregation

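db.users.aggregate([
    // Pre-filter: Seattle users with at least one order on or after 2024-01-01
    {
        $match: { 
            city: "Seattle",
            "orders.date": { $gte: "2024-01-01" }
        }
    },
    // Emit one document per order
    {
        $unwind: "$orders"
    },
    // Re-apply the date filter so older orders are excluded from the totals
    {
        $match: { "orders.date": { $gte: "2024-01-01" } }
    },
    // Aggregate the remaining orders per user
    {
        $group: {
            _id: "$_id",
            name: { $first: "$name" },
            total_spent: { $sum: "$orders.total" },
            order_count: { $sum: 1 }
        }
    },
    { $sort: { total_spent: -1 } }
])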

Gremlin Graph Traversal

Graph traversals for social networks and recommendations:

Gremlin Traversal

// Find friends of friends who like similar products
g.V('user123')
  .out('follows')
  .out('follows')
  .where(
    out('likes')
    .in('likes')
    .hasId('user123')
  )
  .dedup()
  .values('name')
  .limit(10)
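
For readers less familiar with Gremlin, the same traversal can be sketched over plain Python dictionaries. The `follows`, `likes`, and `names` structures and all sample values are hypothetical stand-ins for the graph's edges and vertex properties; the function follows the traversal step by step rather than any particular driver API.

```python
# Hypothetical adjacency data standing in for 'follows' and 'likes' edges.
follows = {
    "user123": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d", "e"],
}
likes = {
    "user123": {"p1", "p2"},
    "c": {"p2"},
    "d": {"p9"},
    "e": {"p1", "p3"},
}
names = {"c": "Cara", "d": "Dev", "e": "Eve"}


def friends_of_friends_with_shared_likes(user, limit=10):
    """Two hops along 'follows', keeping vertices that share a liked product with user."""
    result, seen = [], set()
    for friend in follows.get(user, []):          # first .out('follows')
        for fof in follows.get(friend, []):       # second .out('follows')
            if fof in seen:                       # .dedup()
                continue
            seen.add(fof)
            # .where(out('likes').in('likes').hasId(user)): any product in common
            if likes.get(fof, set()) & likes.get(user, set()):
                result.append(names[fof])         # .values('name')
                if len(result) == limit:          # .limit(10)
                    return result
    return result


recommended = friends_of_friends_with_shared_likes("user123")
```

With this sample graph, `recommended` is `["Cara", "Eve"]`: Dev is reachable in two hops but shares no liked product with user123.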

Cosmos DB Best Practices

✅ Do

• Choose partition keys with high cardinality
• Use session consistency for most applications
• Implement proper retry logic with backoff
• Monitor RU consumption and optimize queries
• Use autoscale for variable workloads
• Use cross-partition queries sparingly; design so that common queries include the partition key
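
The retry advice in the list above can be sketched as follows. Cosmos DB signals throttling with HTTP 429 and a suggested wait time, and the official SDKs typically retry such responses on their own, so a wrapper like this is mainly useful for custom clients or for retries beyond the SDK's limits. `ThrottledError` and `call_with_backoff` are hypothetical names, not part of any SDK.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (request rate too large) response."""

    def __init__(self, retry_after=None):
        super().__init__("request rate too large")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Run operation(), retrying on ThrottledError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint when present
            else:
                # Full jitter: random wait up to the capped exponential bound.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
```

The `sleep` parameter is injectable so that the wrapper can be exercised in tests without actually waiting.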

❌ Don't

• Use sequential or timestamp partition keys
• Ignore hot partition warnings
• Over-provision RUs for steady-state workloads
• Store large documents (>100KB) without considering their RU cost (items are limited to 2 MB)
• Use strong consistency unless absolutely required
• Mix transactional and analytical queries
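
The partition key advice can be illustrated with a small simulation. It assumes five partitions and uses MD5 as a stand-in hash (the service uses its own hash function), and the two workloads are hypothetical. The point it shows is that a coarse timestamp key such as the current date sends every concurrent write to the same partition, while a high-cardinality key spreads them out.

```python
import hashlib
from collections import Counter


def physical_partition(partition_key, partition_count):
    """Map a partition key value to a partition index with a stable hash.

    MD5 is only a stand-in here; the service uses its own hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def busiest_partition_share(keys, partition_count=5):
    """Fraction of writes that land on the most heavily loaded partition."""
    counts = Counter(physical_partition(k, partition_count) for k in keys)
    return max(counts.values()) / len(keys)


# 1,000 writes arriving on the same day (hypothetical workload).
by_date = ["2024-01-01"] * 1000                # coarse timestamp as partition key
by_user = [f"user-{i}" for i in range(1000)]   # high-cardinality partition key

date_share = busiest_partition_share(by_date)
user_share = busiest_partition_share(by_user)
```

Here `date_share` is 1.0, meaning one partition absorbs every write and becomes hot, while `user_share` stays close to the ideal one fifth.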
