What is Apache Cassandra?
Apache Cassandra is a highly scalable, distributed wide-column NoSQL database designed to handle massive amounts of data across many commodity servers with no single point of failure. Originally developed at Facebook, it combines the distributed architecture of Amazon's Dynamo with the data model of Google's Bigtable.
Cassandra excels in scenarios requiring high write throughput, linear scalability, and always-on availability. It's particularly well-suited for time-series data, IoT applications, and globally distributed systems where eventual consistency is acceptable.
Core Architecture Concepts
Ring Architecture
Nodes are organized in a ring with no master/slave relationships.
Each node owns one or more token ranges (many small ones when virtual nodes are enabled).
Consistent Hashing
Data is distributed based on partition key hash values.
A key's token (a Murmur3 hash by default) determines which nodes store it.
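The ring and hashing ideas above can be sketched as a toy token ring. This is a minimal illustration, not Cassandra's implementation: it hashes keys with MD5 (the scheme of the legacy RandomPartitioner) because Python's standard library has no Murmur3, and its replica walk only roughly mimics SimpleStrategy by collecting the next distinct nodes clockwise. `TokenRing` and `token_for` are hypothetical names.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    # MD5 stands in for Murmur3 here (assumption): Cassandra's legacy
    # RandomPartitioner used MD5; the default Murmur3Partitioner is not in the stdlib.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # token in [0, 2**128)


class TokenRing:
    """Toy ring: a node holding token T owns the range (previous token, T]."""

    def __init__(self, node_tokens: dict[str, list[int]]) -> None:
        # Several tokens per node model virtual nodes (vnodes).
        self._ring = sorted((t, node) for node, tokens in node_tokens.items() for t in tokens)
        self._tokens = [t for t, _ in self._ring]
        self._node_count = len({node for _, node in self._ring})

    def _start(self, token: int) -> int:
        # Index of the first ring token >= token, wrapping past the highest token.
        return bisect.bisect_left(self._tokens, token) % len(self._ring)

    def owner(self, token: int) -> str:
        return self._ring[self._start(token)][1]

    def replicas(self, token: int, rf: int) -> list[str]:
        # SimpleStrategy-like placement: walk clockwise, keeping distinct nodes only.
        wanted = min(rf, self._node_count)
        chosen: list[str] = []
        i = self._start(token)
        while len(chosen) < wanted:
            node = self._ring[i][1]
            if node not in chosen:
                chosen.append(node)
            i = (i + 1) % len(self._ring)
        return chosen


ring = TokenRing({"node-a": [2**126], "node-b": [2**127], "node-c": [3 * 2**126]})
key_token = token_for("sensor-42")
print(ring.owner(key_token), ring.replicas(key_token, rf=3))
```

Because only the neighbours of a token are affected, adding or removing a node moves just the ranges adjacent to its tokens rather than rehashing every key.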
Tunable Consistency
Choose consistency level per operation (ONE, QUORUM, ALL).
QUORUM reads + QUORUM writes give strong consistency (R + W > RF).
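The arithmetic behind QUORUM can be checked in a few lines: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed RF. The sketch models only the three levels named above, and the helper names are hypothetical. It also shows that fault tolerance depends on the level: with RF=3, ONE still succeeds with 2 replicas of a partition down, while QUORUM tolerates only 1.

```python
def quorum(rf: int) -> int:
    # QUORUM = floor(RF / 2) + 1 replicas must acknowledge.
    return rf // 2 + 1


def required_replicas(level: str, rf: int) -> int:
    # Replicas that must respond at each level (only ONE, QUORUM, ALL modeled).
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # R + W > RF guarantees the read set intersects the set that took the write.
    return required_replicas(read_level, rf) + required_replicas(write_level, rf) > rf


def tolerated_down_replicas(level: str, rf: int) -> int:
    # Replicas of one partition that may be down while the level can still be met.
    return rf - required_replicas(level, rf)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, "with RF=3 tolerates", tolerated_down_replicas(level, 3), "down replica(s)")
```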
Wide-Column Model
Flexible schema with variable columns per row.
Example: one row may store name, email, and last_login while another stores only name; unset columns take no space.
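A rough mental model of the wide-column layout is a set of nested maps: each partition holds rows sorted by clustering key, and each row stores only the cells that were actually written. This is a hypothetical in-memory sketch (`upsert`, `read_partition` and the `team-eng` partition are made up for illustration), not how Cassandra lays data out on disk.

```python
# Hypothetical in-memory model: partition key -> clustering key -> {column: value}.
# Only cells that were written are stored, so rows in one partition can differ.
table: dict[str, dict[str, dict[str, object]]] = {}


def upsert(partition_key: str, clustering_key: str, **cells: object) -> None:
    # CQL INSERT and UPDATE are both upserts: new cells merge into the existing row.
    row = table.setdefault(partition_key, {}).setdefault(clustering_key, {})
    row.update(cells)


def read_partition(partition_key: str) -> list[tuple[str, dict[str, object]]]:
    # Rows come back ordered by clustering key, as within a Cassandra partition.
    return sorted(table.get(partition_key, {}).items(), key=lambda item: item[0])


upsert("team-eng", "bob", name="Bob")
upsert("team-eng", "alice", name="Alice", email="alice@example.com")
upsert("team-eng", "alice", last_login="2024-05-01")  # adds a cell to alice's row

for clustering_key, cells in read_partition("team-eng"):
    print(clustering_key, cells)
```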
Data Modeling Patterns
Time-Series Pattern
CREATE TABLE sensor_data (
sensor_id text,
timestamp timestamp,
temperature decimal,
humidity decimal,
PRIMARY KEY (sensor_id, timestamp)
);
Partition by sensor_id, cluster by timestamp for efficient time-range queries.
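To see why clustering by timestamp makes time-range queries cheap, here is a small Python model of the `sensor_data` table: each `sensor_id` partition keeps its rows sorted by `timestamp`, so a range query becomes two binary searches and one contiguous slice. The helper names are hypothetical, floats stand in for CQL `decimal`, and the CQL in the docstring only shows the kind of query being imitated.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical stand-in for sensor_data: sensor_id -> rows sorted by timestamp.
partitions: dict[str, list[tuple[datetime, float, float]]] = {}


def insert_reading(sensor_id: str, ts: datetime, temperature: float, humidity: float) -> None:
    rows = partitions.setdefault(sensor_id, [])
    keys = [row[0] for row in rows]  # clustering keys in order (rebuilt for simplicity)
    i = bisect.bisect_left(keys, ts)
    if i < len(rows) and rows[i][0] == ts:
        rows[i] = (ts, temperature, humidity)  # same primary key: the write overwrites
    else:
        rows.insert(i, (ts, temperature, humidity))  # keep clustering order


def readings_between(sensor_id: str, start: datetime, end: datetime) -> list[tuple[datetime, float, float]]:
    """Imitates: SELECT timestamp, temperature, humidity FROM sensor_data
    WHERE sensor_id = ? AND timestamp >= ? AND timestamp < ?"""
    rows = partitions.get(sensor_id, [])
    keys = [row[0] for row in rows]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return rows[lo:hi]  # one contiguous slice of a single partition


base = datetime(2024, 1, 1)
for minute in range(10):
    insert_reading("sensor-1", base + timedelta(minutes=minute), 20.0 + minute, 40.0)

print(readings_between("sensor-1", base + timedelta(minutes=2), base + timedelta(minutes=5)))
```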
Bucketing Pattern
CREATE TABLE user_activity (
user_id text,
year_month text,
activity_date timestamp,
activity_type text,
PRIMARY KEY ((user_id, year_month), activity_date)
);
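The composite partition key (user_id, year_month) caps each partition at one month of a user's activity, preventing unbounded partition growth, while clustering by activity_date keeps time-range queries within a month efficient.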
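With the bucketing pattern, the application computes `year_month` on every write and, for a query spanning several months, reads one partition per bucket. A minimal Python sketch, assuming a hypothetical `YYYY-MM` string format for `year_month` (the schema does not fix one); the function names are illustrative.

```python
from datetime import date


def year_month_bucket(day: date) -> str:
    # Hypothetical "YYYY-MM" format for the year_month column; also accepts datetime.
    return f"{day.year:04d}-{day.month:02d}"


def buckets_between(start: date, end: date) -> list[str]:
    """Every year_month bucket touched by [start, end], oldest first."""
    buckets: list[str] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


def partition_keys(user_id: str, start: date, end: date) -> list[tuple[str, str]]:
    # One (user_id, year_month) partition to query per bucket, e.g. with
    # SELECT activity_date, activity_type FROM user_activity
    # WHERE user_id = ? AND year_month = ? AND activity_date >= ? AND activity_date < ?
    return [(user_id, bucket) for bucket in buckets_between(start, end)]


print(partition_keys("user-17", date(2023, 11, 20), date(2024, 2, 3)))
```

The trade-off is read fan-out: a range covering N months costs N partition reads, which is why the bucket size should match typical query windows.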
Real-World Cassandra Implementations
Netflix
Operates 2,500+ Cassandra instances across multiple regions for global streaming.
- User viewing history and preferences
- Content metadata and recommendations
- A/B testing and analytics data
- 1 trillion operations per day
Instagram
Uses Cassandra for user feed generation and photo metadata storage.
- User activity feeds and timelines
- Photo and media metadata
- User relationship graphs
- 400+ million photos per day
Apple
Runs one of the largest Cassandra deployments for iCloud services.
- iCloud data synchronization
- App Store analytics
- iTunes metadata storage
- 75,000+ nodes across data centers
Uber
Leverages Cassandra for real-time location tracking and trip analytics.
- Driver and rider location data
- Trip history and analytics
- Real-time pricing calculations
- Fraud detection patterns
Cassandra Best Practices
✅ Do
- Design queries first, then model your data
- Use composite partition keys to distribute load
- Denormalize data for read performance
- Use appropriate consistency levels per use case
- Monitor compaction and repair processes
- Use TimeWindowCompactionStrategy for time-series
❌ Don't
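- Run unbounded queries (e.g. SELECT * with no partition key restriction) or fetch large result sets without paging
- Let partitions grow wide (beyond ~100 MB)
- Use secondary indexes on high-cardinality columns
- Rely on ALLOW FILTERING in production
- Write explicit nulls, which create tombstones (primary key columns, including clustering columns, cannot be null at all)
- Ignore tombstone accumulation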
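The ~100 MB partition guideline is easier to apply with a back-of-the-envelope estimate. The sketch below multiplies rows by an assumed average row size (50 bytes, a made-up figure for a `sensor_data` row) and ignores per-cell metadata and compression, so treat its numbers as rough; the helper names and the decimal reading of "100 MB" are assumptions.

```python
PARTITION_LIMIT_BYTES = 100_000_000  # the ~100 MB guideline, read as decimal MB (assumption)
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_partition_bytes(rows: int, avg_row_bytes: int) -> int:
    # Rough estimate: ignores per-cell metadata, clustering-key overhead and compression.
    return rows * avg_row_bytes


def days_until_limit(rows_per_day: int, avg_row_bytes: int) -> float:
    # How long an unbucketed partition takes to reach the guideline.
    return PARTITION_LIMIT_BYTES / (rows_per_day * avg_row_bytes)


AVG_ROW_BYTES = 50  # assumed size of one sensor_data row; measure your own data

print(f"unbucketed: limit reached in {days_until_limit(SECONDS_PER_DAY, AVG_ROW_BYTES):.1f} days")
for label, days in (("daily", 1), ("monthly", 30)):
    size = estimated_partition_bytes(SECONDS_PER_DAY * days, AVG_ROW_BYTES)
    print(label, "bucket ~", size // 1_000_000, "MB", "(over limit)" if size > PARTITION_LIMIT_BYTES else "")
```

Under these assumptions, one reading per second per sensor fills an unbucketed partition in a few weeks, and even a monthly bucket overshoots, so the bucket size should follow the write rate.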