Apache Cassandra: Wide-Column NoSQL Database

Master Cassandra's wide-column architecture, tunable consistency, and distributed system patterns for massive scale


What is Apache Cassandra?

Apache Cassandra is a highly scalable, distributed wide-column NoSQL database designed to handle massive amounts of data across many commodity servers with no single point of failure. Originally developed at Facebook, it combines the distributed architecture described in Amazon's Dynamo paper with the data model of Google's Bigtable.

Cassandra excels in scenarios requiring high write throughput, linear scalability, and always-on availability. It's particularly well-suited for time-series data, IoT applications, and globally distributed systems where eventual consistency is acceptable.


Core Architecture Concepts

Ring Architecture

Nodes are organized in a ring with no master/slave relationships.

Node1 → Node2 → Node3 → Node1
Each node owns a token range

Consistent Hashing

Data is distributed based on partition key hash values.

hash(partition_key) → token
token determines node placement
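
The ring and hashing ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 stands in for the Murmur3 hash used by Cassandra's default partitioner, each node holds a single token, and the `TokenRing` class and its method names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List

TOKEN_SPACE = 2 ** 64  # illustrative token space, not Cassandra's actual range


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in ring]
        self._nodes = [node for _, node in ring]

    def _index_for(self, token: int) -> int:
        # First node whose token is >= the key's token, wrapping past the end.
        return bisect.bisect_left(self._tokens, token) % len(self._tokens)

    def owner_of_token(self, token: int) -> str:
        return self._nodes[self._index_for(token)]

    def replicas_for_token(self, token: int, rf: int) -> List[str]:
        # Simplified placement: the owner plus the next nodes clockwise.
        start = self._index_for(token)
        count = min(rf, len(self._nodes))
        return [self._nodes[(start + i) % len(self._nodes)] for i in range(count)]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token_for(partition_key))


if __name__ == "__main__":
    ring = TokenRing({
        "Node1": TOKEN_SPACE // 3,
        "Node2": 2 * TOKEN_SPACE // 3,
        "Node3": TOKEN_SPACE - 1,
    })
    print(ring.owner("sensor-42"))
```

Because placement depends only on the key's token and the sorted node tokens, any node can compute where a row lives without consulting a coordinator or master.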

Tunable Consistency

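Choose a consistency level per operation (for example ONE, QUORUM, or ALL).

R + W > RF gives strong consistency, where R and W are the numbers of replicas that must acknowledge a read and a write
Example: QUORUM reads + QUORUM writes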
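
The arithmetic behind the rule can be checked with a small sketch. The function names here are hypothetical helpers for illustration; the only facts assumed are that a quorum is a majority of the RF replicas and that a read is guaranteed to overlap the latest write when R + W > RF.

```python
def quorum(rf: int) -> int:
    """Majority of replicas for a given replication factor."""
    return rf // 2 + 1


def is_strongly_consistent(r: int, w: int, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return r + w > rf


def tolerated_failures(required_replicas: int, rf: int) -> int:
    """Replicas that can be down while an operation still gets enough acks."""
    return rf - required_replicas


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("QUORUM for RF=3:", q)
    print("QUORUM/QUORUM strong:", is_strongly_consistent(q, q, rf))
    print("ONE/ONE strong:", is_strongly_consistent(1, 1, rf))
    print("Failures tolerated at QUORUM:", tolerated_failures(q, rf))
```

With RF=3, QUORUM requires 2 replicas, so QUORUM operations keep working with one replica down, while ONE operations keep working with two down at the cost of the overlap guarantee.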

Wide-Column Model

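CQL tables declare their columns up front, but storage is sparse: each row stores only the columns it sets, so rows in the same table can carry different columns.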

Row: user_id
Cols: name, email, last_login, ...
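
As a rough mental model only, a sparse row store can be pictured as nested dictionaries. The sample data and the `read_column` helper are made up for illustration and say nothing about Cassandra's on-disk format.

```python
from typing import Any, Dict, Optional

# Row key -> only the columns that row actually sets.
users: Dict[str, Dict[str, Any]] = {
    "user-1": {"name": "Ada", "email": "ada@example.com"},
    "user-2": {"name": "Lin", "last_login": "2024-01-15T08:30:00Z"},
}


def read_column(table: Dict[str, Dict[str, Any]], row_key: str, column: str) -> Optional[Any]:
    """Return the column value, or None if the row never set that column."""
    return table.get(row_key, {}).get(column)


if __name__ == "__main__":
    print(read_column(users, "user-1", "email"))
    print(read_column(users, "user-2", "email"))
```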

Data Modeling Patterns

Time-Series Pattern

Time Series Data Model
CREATE TABLE sensor_data (
  sensor_id text,
  timestamp timestamp,
  temperature decimal,
  humidity decimal,
  PRIMARY KEY (sensor_id, timestamp)
);

Partition by sensor_id, cluster by timestamp for efficient time-range queries.
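
Why this layout makes time-range queries cheap can be sketched with an in-memory model: rows inside one partition stay sorted by the clustering column, so a range read is a binary search plus a contiguous slice. The `SensorTable` class is a hypothetical stand-in, not a driver API, and it assumes a half-open range (start inclusive, end exclusive).

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


class SensorTable:
    """Toy model of sensor_data: one sorted run of timestamps per sensor_id."""

    def __init__(self) -> None:
        self._keys: Dict[str, List[datetime]] = defaultdict(list)
        self._rows: Dict[str, Dict[datetime, Tuple[float, float]]] = defaultdict(dict)

    def insert(self, sensor_id: str, ts: datetime, temperature: float, humidity: float) -> None:
        # Same primary key overwrites the row (upsert); new keys stay sorted.
        if ts not in self._rows[sensor_id]:
            bisect.insort(self._keys[sensor_id], ts)
        self._rows[sensor_id][ts] = (temperature, humidity)

    def range_query(self, sensor_id: str, start: datetime, end: datetime):
        """Rows with start <= timestamp < end, in clustering order."""
        keys = self._keys.get(sensor_id, [])
        rows = self._rows.get(sensor_id, {})
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(ts, *rows[ts]) for ts in keys[lo:hi]]


if __name__ == "__main__":
    table = SensorTable()
    table.insert("s1", datetime(2024, 1, 1, 10), 21.5, 40.0)
    table.insert("s1", datetime(2024, 1, 1, 11), 22.0, 41.0)
    table.insert("s1", datetime(2024, 1, 1, 12), 22.5, 42.0)
    for row in table.range_query("s1", datetime(2024, 1, 1, 10, 30), datetime(2024, 1, 1, 12)):
        print(row)
```

The query touches a single partition and never scans rows outside the requested window, which is the property the clustering key provides.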

Bucketing Pattern

Composite Partition Key Model
CREATE TABLE user_activity (
  user_id text,
  year_month text,
  activity_date timestamp,
  activity_type text,
  PRIMARY KEY ((user_id, year_month), activity_date)
);

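Adding year_month to the partition key limits each partition to one month of a user's activity, which bounds partition size. Queries must supply both user_id and year_month, and can then filter by activity_date within that bucket.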
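
On the application side, bucketing means computing the bucket for every write and enumerating the buckets a read spans. The sketch below assumes a "YYYY-MM" format for year_month, which the schema above does not specify; both helper names are hypothetical.

```python
from datetime import date
from typing import List


def year_month_bucket(d: date) -> str:
    """Bucket value for a given day, assuming a 'YYYY-MM' format."""
    return d.strftime("%Y-%m")


def buckets_between(start: date, end: date) -> List[str]:
    """All year_month buckets touched by the inclusive range [start, end]."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


if __name__ == "__main__":
    # One query per bucket, each hitting exactly one partition.
    for bucket in buckets_between(date(2023, 11, 20), date(2024, 2, 3)):
        print("user_id = ? AND year_month =", repr(bucket))
```

A read that spans several months issues one query per bucket and merges the results, trading a little client-side work for bounded partitions.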

Real-World Cassandra Implementations

Netflix

Operates 2,500+ Cassandra instances across multiple regions for global streaming.

  • User viewing history and preferences
  • Content metadata and recommendations
  • A/B testing and analytics data
  • 1 trillion operations per day

Instagram

Uses Cassandra for user feed generation and photo metadata storage.

  • User activity feeds and timelines
  • Photo and media metadata
  • User relationship graphs
  • 400+ million photos per day

Apple

Runs one of the largest Cassandra deployments for iCloud services.

  • iCloud data synchronization
  • App Store analytics
  • iTunes metadata storage
  • 75,000+ nodes across data centers

Uber

Leverages Cassandra for real-time location tracking and trip analytics.

  • Driver and rider location data
  • Trip history and analytics
  • Real-time pricing calculations
  • Fraud detection patterns

Cassandra Best Practices

✅ Do

  • Design queries first, then model your data
  • Use composite partition keys to distribute load
  • Denormalize data for read performance
  • Use appropriate consistency levels per use case
  • Monitor compaction and repair processes
  • Use TimeWindowCompactionStrategy for time-series

❌ Don't

  • Use SELECT * or large result sets
  • Create wide partitions (>100MB)
  • Use secondary indexes on high-cardinality columns
  • Rely on ALLOW FILTERING in production
  • Use null values for clustering columns
  • Ignore tombstone accumulation
