Database Fundamentals
Master the foundation of data storage and retrieval. Learn when to use SQL vs NoSQL, understand database types, and make informed decisions for your system design.
Database Types & When to Use Them
Different database types excel at different problems. Understanding their strengths and weaknesses helps you choose the right tool for your specific use case.
Relational (SQL)
- • ACID compliance
- • Complex queries
- • Data consistency
- • Mature ecosystem
- • Vertical scaling limits
- • Schema rigidity
- • JOIN performance
- • Financial transactions
- • User accounts
- • Inventory management
- • Reporting systems
Document (NoSQL)
- • Flexible schema
- • JSON-like structure
- • Horizontal scaling
- • Developer friendly
- • No complex queries
- • Data duplication
- • Eventual consistency
- • Content management
- • Catalogs
- • User profiles
- • Real-time analytics
Key-Value
- • Ultra-fast reads
- • Simple model
- • Massive scale
- • High availability
- • No complex queries
- • Limited relationships
- • No transactions
- • Caching
- • Session storage
- • Shopping carts
- • Real-time recommendations
Graph
- • Relationship queries
- • Complex connections
- • Path finding
- • Pattern matching
- • Complex setup
- • Limited tools
- • Steep learning curve
- • Social networks
- • Fraud detection
- • Recommendation engines
- • Network analysis
CAP Theorem in Practice
CAP Theorem states you can only guarantee 2 out of 3: Consistency, Availability, Partition tolerance. In distributed systems, network partitions are inevitable, so you must choose between consistency and availability.
Consistency (C)
All nodes see the same data simultaneously
Availability (A)
System remains operational at all times
Partition Tolerance (P)
System continues despite network failures
CP Systems (Consistency + Partition Tolerance)
AP Systems (Availability + Partition Tolerance)
Database Scaling Strategies
Vertical Scaling (Scale Up)
Horizontal Scaling (Scale Out)
Common Horizontal Scaling Techniques
Read Replicas
Create read-only copies of your database to handle read traffic
Sharding
Split data across multiple databases based on shard key
Federation
Split databases by function (users, products, orders)
Database Selection Framework
Use this decision tree to choose the right database type for your specific requirements.
Step 1: Data Structure
- • Complex relationships between entities
- • Need complex queries and aggregations
- • ACID compliance is critical
- • Structured data with fixed schema
- • Flexible or evolving schema
- • Simple queries, key-based access
- • Horizontal scaling requirements
- • Semi-structured or unstructured data
Step 2: Scale Requirements
Any database will work. Choose based on team expertise.
SQL with proper indexing and read replicas usually sufficient.
Consider NoSQL, sharding, or distributed SQL systems.
Database Quick Reference
When in Doubt
- • Start with PostgreSQL (best general-purpose DB)
- • Add Redis for caching and sessions
- • Consider read replicas before sharding
- • Monitor before optimizing
- • Avoid premature optimization
Red Flags
- • Multiple database types without clear justification
- • Choosing NoSQL only for "web scale"
- • Ignoring data consistency requirements
- • Not planning for growth patterns
- • Choosing unfamiliar technology under pressure
🎯 Database Selection in the Real World
Learn from actual database decisions made by major tech companies
Metrics
Outcome
Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.
Lessons Learned
- Photo metadata doesn't require strong consistency - eventual consistency is acceptable
- Cassandra's peer-to-peer architecture eliminated single points of failure
- Linear scaling allowed Instagram to handle massive growth without complex sharding
- Trade-off: Lost complex query capabilities but gained operational simplicity
ScenariosClick to explore
Context
Instagram moved from MySQL to Cassandra for photo metadata storage
Metrics
Outcome
Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.
Key Lessons
- •Photo metadata doesn't require strong consistency - eventual consistency is acceptable
- •Cassandra's peer-to-peer architecture eliminated single points of failure
- •Linear scaling allowed Instagram to handle massive growth without complex sharding
- •Trade-off: Lost complex query capabilities but gained operational simplicity
1. Instagram Photo Storage Migration
Context
Instagram moved from MySQL to Cassandra for photo metadata storage
Metrics
Outcome
Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.
Key Lessons
- •Photo metadata doesn't require strong consistency - eventual consistency is acceptable
- •Cassandra's peer-to-peer architecture eliminated single points of failure
- •Linear scaling allowed Instagram to handle massive growth without complex sharding
- •Trade-off: Lost complex query capabilities but gained operational simplicity
2. Discord Message Storage Architecture
Context
Discord uses both MongoDB and Cassandra for different aspects of messaging
Metrics
Outcome
Hybrid approach: MongoDB for structured guild data requiring consistency, Cassandra for message storage requiring massive scale and availability.
Key Lessons
- •Different data types often require different database technologies
- •MongoDB's flexible schema worked well for evolving guild/channel features
- •Cassandra's time-series capabilities excel for message history storage
- •Polyglot persistence: use the best tool for each specific data pattern
3. Netflix Recommendation Engine Data
Context
Netflix uses multiple databases for their recommendation system
Metrics
Outcome
Three-database strategy: MySQL for transactional data, Cassandra for time-series viewing data, graph DB for content relationships.
Key Lessons
- •ACID compliance crucial for billing and account management (MySQL)
- •Time-series viewing data benefits from Cassandra's partition key design
- •Graph databases excel at "users who watched X also watched Y" queries
- •Each database optimized for specific access patterns and consistency requirements
4. Airbnb Search and Booking System
Context
Airbnb evolved from MySQL to a multi-database architecture
Metrics
Outcome
Kept MySQL for transactional integrity, added Elasticsearch for complex search, Redis for performance. Horizontal sharding enabled massive scale.
Key Lessons
- •Booking transactions require ACID compliance - kept MySQL for core business logic
- •Geographic and amenity search requires full-text capabilities - added Elasticsearch
- •Aggressive caching with Redis reduced database load during traffic spikes
- •Careful sharding strategy based on geographic regions improved performance
5. Uber Real-time Location Tracking
Context
Uber built custom database solutions for real-time location data
Metrics
Outcome
Built Schemaless (custom NoSQL) for location history, Redis for real-time data, MySQL for transactions. Each optimized for specific use case.
Key Lessons
- •Real-time location data has different requirements than historical location data
- •Redis in-memory storage perfect for current driver locations with TTL
- •Custom database built when existing solutions couldn't meet specific requirements
- •Financial transactions stayed on MySQL for ACID compliance and regulatory requirements
6. GitHub Repository and Code Storage
Context
GitHub uses MySQL for metadata and Git for actual code storage
Metrics
Outcome
Leveraged Git's distributed nature for code storage, MySQL for relational metadata, Elasticsearch for search. Each tool optimized for its purpose.
Key Lessons
- •Git is already a distributed database for code - don't reinvent it
- •User accounts, permissions, and metadata fit well in traditional relational model
- •Code search across millions of repositories requires specialized search technology
- •File storage and metadata storage have very different scaling and consistency needs