Database Fundamentals

20 min read•Beginner

Not Started

Master the foundation of data storage and retrieval. Learn when to use SQL vs NoSQL, understand database types, and make informed decisions for your system design.

Database Types & When to Use Them

Different database types excel at different problems. Understanding their strengths and weaknesses helps you choose the right tool for your specific use case.

Relational (SQL)

Examples

PostgreSQL

MySQL

Oracle

SQL Server

Strengths

• ACID compliance
• Complex queries
• Data consistency
• Mature ecosystem

Weaknesses

• Vertical scaling limits
• Schema rigidity
• JOIN performance

Best Use Cases

• Financial transactions
• User accounts
• Inventory management
• Reporting systems

Document (NoSQL)

Examples

MongoDB

CouchDB

Amazon DocumentDB

Strengths

• Flexible schema
• JSON-like structure
• Horizontal scaling
• Developer friendly

Weaknesses

• No complex queries
• Data duplication
• Eventual consistency

Best Use Cases

• Content management
• Catalogs
• User profiles
• Real-time analytics

Key-Value

Examples

Redis

DynamoDB

Cassandra

Riak

Strengths

• Ultra-fast reads
• Simple model
• Massive scale
• High availability

Weaknesses

• No complex queries
• Limited relationships
• No transactions

Best Use Cases

• Caching
• Session storage
• Shopping carts
• Real-time recommendations

Graph

Examples

Neo4j

Amazon Neptune

ArangoDB

Strengths

• Relationship queries
• Complex connections
• Path finding
• Pattern matching

Weaknesses

• Complex setup
• Limited tools
• Steep learning curve

Best Use Cases

• Social networks
• Fraud detection
• Recommendation engines
• Network analysis

CAP Theorem in Practice

CAP Theorem states you can only guarantee 2 out of 3: Consistency, Availability, Partition tolerance. In distributed systems, network partitions are inevitable, so you must choose between consistency and availability.

Consistency (C)

All nodes see the same data simultaneously

Example: Banking systems - account balance must be accurate across all systems

Availability (A)

System remains operational at all times

Example: Social media - users can always post, even if some data is temporarily inconsistent

Partition Tolerance (P)

System continues despite network failures

Example: Must handle when servers in different data centers cannot communicate

CP Systems (Consistency + Partition Tolerance)

Examples

MongoDB, Redis Cluster, HBase

Trade-off

System may become unavailable during network partitions to maintain data consistency

Use Cases

Financial systems, inventory management, critical data integrity

AP Systems (Availability + Partition Tolerance)

Examples

Cassandra, DynamoDB, CouchDB

Trade-off

Different nodes may return different data temporarily (eventual consistency)

Use Cases

Social media, content delivery, real-time analytics

Database Scaling Strategies

Vertical Scaling (Scale Up)

Approach

Add more CPU, RAM, or storage to existing server

Cost

Exponentially expensive at scale

$$$$

Complexity

No code changes needed

Low

Limits

Hardware limitations

High

Best for: Small to medium applications, when you need immediate performance boost

Horizontal Scaling (Scale Out)

Approach

Add more servers to distribute the load

Cost

Linear cost scaling

Complexity

Requires architectural changes

High

Limits

Nearly unlimited scale

Low

Best for: Large-scale applications, when unlimited growth is expected

Common Horizontal Scaling Techniques

Read Replicas

Create read-only copies of your database to handle read traffic

Good for: Read-heavy workloads (blogs, catalogs)

Sharding

Split data across multiple databases based on shard key

Good for: Large datasets, write-heavy workloads

Federation

Split databases by function (users, products, orders)

Good for: Microservices, domain separation

Database Selection Framework

Use this decision tree to choose the right database type for your specific requirements.

Step 1: Data Structure

Choose SQL if:

• Complex relationships between entities
• Need complex queries and aggregations
• ACID compliance is critical
• Structured data with fixed schema

Choose NoSQL if:

• Flexible or evolving schema
• Simple queries, key-based access
• Horizontal scaling requirements
• Semi-structured or unstructured data

Step 2: Scale Requirements

Small Scale (< 1M records)

Any database will work. Choose based on team expertise.

Medium Scale (1M - 100M records)

SQL with proper indexing and read replicas usually sufficient.

Large Scale (> 100M records)

Consider NoSQL, sharding, or distributed SQL systems.

Database Quick Reference

When in Doubt

• Start with PostgreSQL (best general-purpose DB)
• Add Redis for caching and sessions
• Consider read replicas before sharding
• Monitor before optimizing
• Avoid premature optimization

Red Flags

• Multiple database types without clear justification
• Choosing NoSQL only for "web scale"
• Ignoring data consistency requirements
• Not planning for growth patterns
• Choosing unfamiliar technology under pressure

🎯 Database Selection in the Real World

Learn from actual database decisions made by major tech companies

Metrics

Photo Uploads

95M photos/day

MySQL Performance

Struggling with sharding

Cassandra Performance

99.99% availability

Migration Result

Linear scaling achieved

Outcome

Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.

Lessons Learned

Photo metadata doesn't require strong consistency - eventual consistency is acceptable
Cassandra's peer-to-peer architecture eliminated single points of failure
Linear scaling allowed Instagram to handle massive growth without complex sharding
Trade-off: Lost complex query capabilities but gained operational simplicity

ScenariosClick to explore

Instagram Photo Storage Migration

Instagram moved from MySQL to Cassandra for photo metadata storage

Discord Message Storage Architecture

Discord uses both MongoDB and Cassandra for different aspects of messaging

Netflix Recommendation Engine Data

Netflix uses multiple databases for their recommendation system

Airbnb Search and Booking System

Airbnb evolved from MySQL to a multi-database architecture

Uber Real-time Location Tracking

Uber built custom database solutions for real-time location data

GitHub Repository and Code Storage

GitHub uses MySQL for metadata and Git for actual code storage

Context

Instagram moved from MySQL to Cassandra for photo metadata storage

Metrics

Photo Uploads

95M photos/day

MySQL Performance

Struggling with sharding

Cassandra Performance

99.99% availability

Migration Result

Linear scaling achieved

Outcome

Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.

Key Lessons

•Photo metadata doesn't require strong consistency - eventual consistency is acceptable
•Cassandra's peer-to-peer architecture eliminated single points of failure
•Linear scaling allowed Instagram to handle massive growth without complex sharding
•Trade-off: Lost complex query capabilities but gained operational simplicity

1. Instagram Photo Storage Migration

Context

Instagram moved from MySQL to Cassandra for photo metadata storage

Metrics

Photo Uploads

95M photos/day

MySQL Performance

Struggling with sharding

Cassandra Performance

99.99% availability

Migration Result

Linear scaling achieved

Outcome

Cassandra's AP properties (availability + partition tolerance) perfectly matched Instagram's need for global photo storage with eventual consistency.

Key Lessons

•Photo metadata doesn't require strong consistency - eventual consistency is acceptable
•Cassandra's peer-to-peer architecture eliminated single points of failure
•Linear scaling allowed Instagram to handle massive growth without complex sharding
•Trade-off: Lost complex query capabilities but gained operational simplicity

2. Discord Message Storage Architecture

Context

Discord uses both MongoDB and Cassandra for different aspects of messaging

Metrics

Messages/Day

1+ billion messages

MongoDB Usage

Guild/channel metadata

Cassandra Usage

Message storage & history

Read Latency

<50ms globally

Outcome

Hybrid approach: MongoDB for structured guild data requiring consistency, Cassandra for message storage requiring massive scale and availability.

Key Lessons

•Different data types often require different database technologies
•MongoDB's flexible schema worked well for evolving guild/channel features
•Cassandra's time-series capabilities excel for message history storage
•Polyglot persistence: use the best tool for each specific data pattern

3. Netflix Recommendation Engine Data

Context

Netflix uses multiple databases for their recommendation system

Metrics

User Profiles

230M+ subscribers

MySQL Usage

User accounts & billing

Cassandra Usage

Viewing history & preferences

Graph Database

Content relationships

Outcome

Three-database strategy: MySQL for transactional data, Cassandra for time-series viewing data, graph DB for content relationships.

Key Lessons

•ACID compliance crucial for billing and account management (MySQL)
•Time-series viewing data benefits from Cassandra's partition key design
•Graph databases excel at "users who watched X also watched Y" queries
•Each database optimized for specific access patterns and consistency requirements

4. Airbnb Search and Booking System

Context

Airbnb evolved from MySQL to a multi-database architecture

Metrics

Property Listings

6M+ listings

MySQL Shards

1000+ shards

Elasticsearch

Search & filtering

Redis

Session & caching

Outcome

Kept MySQL for transactional integrity, added Elasticsearch for complex search, Redis for performance. Horizontal sharding enabled massive scale.

Key Lessons

•Booking transactions require ACID compliance - kept MySQL for core business logic
•Geographic and amenity search requires full-text capabilities - added Elasticsearch
•Aggressive caching with Redis reduced database load during traffic spikes
•Careful sharding strategy based on geographic regions improved performance

5. Uber Real-time Location Tracking

Context

Uber built custom database solutions for real-time location data

Metrics

Location Updates

1M+ updates/sec

Redis Usage

Current driver locations

MySQL Usage

Trip records & payments

Custom DB

Historical location data

Outcome

Built Schemaless (custom NoSQL) for location history, Redis for real-time data, MySQL for transactions. Each optimized for specific use case.

Key Lessons

•Real-time location data has different requirements than historical location data
•Redis in-memory storage perfect for current driver locations with TTL
•Custom database built when existing solutions couldn't meet specific requirements
•Financial transactions stayed on MySQL for ACID compliance and regulatory requirements

6. GitHub Repository and Code Storage

Context

GitHub uses MySQL for metadata and Git for actual code storage

Metrics

Repositories

200M+ repositories

MySQL Usage

Repo metadata & users

Git Storage

Actual code & history

Elasticsearch

Code search

Outcome

Leveraged Git's distributed nature for code storage, MySQL for relational metadata, Elasticsearch for search. Each tool optimized for its purpose.

Key Lessons

•Git is already a distributed database for code - don't reinvent it
•User accounts, permissions, and metadata fit well in traditional relational model
•Code search across millions of repositories requires specialized search technology
•File storage and metadata storage have very different scaling and consistency needs

No quiz questions available

Quiz ID "database-fundamentals" not found