Database Fundamentals

20 min read · Beginner

Master the foundation of data storage and retrieval. Learn when to use SQL vs NoSQL, understand database types, and make informed decisions for your system design.

Database Types & When to Use Them

Different database types excel at different problems. Understanding their strengths and weaknesses helps you choose the right tool for your specific use case.

1. Relational (SQL)

Examples
PostgreSQL
MySQL
Oracle
SQL Server
Strengths
  • ACID compliance
  • Complex queries
  • Data consistency
  • Mature ecosystem
Weaknesses
  • Scaling writes usually means scaling vertically (single primary)
  • Schema rigidity
  • JOIN performance at very large scale
Best Use Cases
  • Financial transactions
  • User accounts
  • Inventory management
  • Reporting systems
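To make "ACID compliance" concrete for the financial-transaction use case, here is a minimal sketch using Python's built-in sqlite3 module (chosen only because it needs no server; the same idea applies to PostgreSQL or MySQL). The `accounts` table and `transfer` function are hypothetical. Either both balance updates are committed or, if the debit would overdraw the account, neither is.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst` atomically: both updates commit, or neither does."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here rolls back the debit above (atomicity).
            raise ValueError(f"insufficient funds in {src!r}")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # would overdraw, so it is rolled back
except ValueError as err:
    print("rejected:", err)

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances["alice"], balances["bob"])  # 70 80
```

The failed transfer leaves no trace: the debit it had already applied is undone, so the total money in the system stays at 150.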

2. Document (NoSQL)

Examples
MongoDB
CouchDB
Amazon DocumentDB
Strengths
  • Flexible schema
  • JSON-like structure
  • Horizontal scaling
  • Developer friendly
Weaknesses
  • Limited JOIN support
  • Data duplication (denormalization)
  • Eventual consistency in some configurations
Best Use Cases
  • Content management
  • Catalogs
  • User profiles
  • Real-time analytics
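The sketch below models documents as plain Python dicts to show two points from the lists above: documents in one collection can have different fields, and embedding related data avoids joins at the cost of duplicating it. No database driver is used; the `find` helper is a hypothetical stand-in for a document-store query, and the product and order data are made up.

```python
from typing import Any

Document = dict[str, Any]


def find(collection: list[Document], **criteria: Any) -> list[Document]:
    """Return documents whose fields equal every given criterion (a toy equality filter)."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in criteria.items())]


# Flexible schema: documents in the same collection need not share fields.
products: list[Document] = [
    {"_id": 1, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "E-book", "price": 9, "format": "PDF"},
]

# Embedding (denormalization): the order carries a copy of the product's name and
# price, so reading it needs no join, but a later price change is not reflected here.
order: Document = {
    "_id": 100,
    "customer": {"name": "Alice", "email": "alice@example.com"},
    "items": [{"product_id": 1, "name": "T-shirt", "price": 15, "qty": 2}],
}

print(find(products, format="PDF"))  # [{'_id': 2, 'name': 'E-book', 'price': 9, 'format': 'PDF'}]
order_total = sum(item["price"] * item["qty"] for item in order["items"])
print(order_total)  # 30
```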

3. Key-Value

Examples
Redis
DynamoDB
Cassandra (wide-column)
Riak
Strengths
  • Ultra-fast reads
  • Simple model
  • Massive scale
  • High availability
Weaknesses
  • No complex queries
  • Limited relationships
  • Limited or no multi-key transactions
Best Use Cases
  • Caching
  • Session storage
  • Shopping carts
  • Real-time recommendations
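Caching and session storage usually follow the cache-aside pattern: check the key-value store first, fall back to the primary database on a miss, then write the result back with an expiry. A minimal sketch, assuming an in-process `TTLCache` class as a stand-in for a store such as Redis (where `SET key value EX seconds` plays the same role); `get_user` and `load_from_db` are hypothetical names.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal in-process key-value store with per-key expiry (a stand-in for Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[Any, float]] = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired keys when they are read
            return None
        return value


def get_user(cache: TTLCache, user_id: str, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    cache.set(key, user, ttl_seconds=300)
    return user


# Usage with a fake clock and a fake database that records each call.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls: list[str] = []


def load_from_db(user_id: str) -> dict:
    db_calls.append(user_id)
    return {"id": user_id, "name": "Alice"}


get_user(cache, "42", load_from_db)  # miss: hits the database
get_user(cache, "42", load_from_db)  # hit: served from the cache
now[0] = 301.0                       # advance past the 300 s TTL
get_user(cache, "42", load_from_db)  # expired: hits the database again
print(len(db_calls))  # 2
```

The TTL bounds how stale a cached entry can get; shorter TTLs mean fresher data but more database reads.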

4. Graph

Examples
Neo4j
Amazon Neptune
ArangoDB
Strengths
  • Relationship queries
  • Complex connections
  • Path finding
  • Pattern matching
Weaknesses
  • Complex setup
  • Smaller tooling ecosystem
  • Steep learning curve
Best Use Cases
  • Social networks
  • Fraud detection
  • Recommendation engines
  • Network analysis
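Path finding is the kind of query graph databases run natively. To show what such a query computes, here is the same idea in plain Python: a breadth-first search over an adjacency list that returns the path with the fewest hops. The `follows` data and the `shortest_path` name are hypothetical, and this is not how any particular graph database implements traversal.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return the path with the fewest hops, or None if unreachable."""
    previous: dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
}
print(shortest_path(follows, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```

In a relational schema the same question needs one self-JOIN per hop (or a recursive query), which is why relationship-heavy workloads such as social networks favor graph stores.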

CAP Theorem in Practice

The CAP theorem states that a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at the same time. Because network partitions are inevitable in distributed systems, the practical choice is between consistency and availability while a partition is in progress.

Consistency (C)

Every read sees the most recent write; all nodes agree on the current data

Example: Banking systems - account balance must be accurate across all systems

Availability (A)

Every request to a working node gets a response, even if that response may not reflect the latest write

Example: Social media - users can always post, even if some data is temporarily inconsistent

Partition Tolerance (P)

The system keeps operating even when messages between nodes are lost or delayed

Example: The system must keep working when servers in different data centers cannot communicate

CP Systems (Consistency + Partition Tolerance)

Examples
MongoDB, HBase, ZooKeeper
Trade-off
System may become unavailable during network partitions to maintain data consistency
Use Cases
Financial systems, inventory management, critical data integrity

AP Systems (Availability + Partition Tolerance)

Examples
Cassandra, DynamoDB, CouchDB
Trade-off
Different nodes may return different data temporarily (eventual consistency)
Use Cases
Social media, content delivery, real-time analytics
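The CP/AP difference only shows up while a partition is in progress. The toy simulation below makes it visible; it is not how any of the listed databases is implemented. The `Cluster` class, its two in-memory replicas, and the rule that a CP write must reach every replica are simplifying assumptions (real CP systems typically require a majority quorum instead).

```python
from typing import Optional


class Cluster:
    """Toy two-replica store. mode is "CP" or "AP"; partitioned=True cuts the replica link."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.replicas: list[dict[str, str]] = [{}, {}]  # e.g. one per data center
        self.partitioned = False

    def write(self, node: int, key: str, value: str) -> None:
        if not self.partitioned:
            # Link is up: replicate to every replica before acknowledging.
            for replica in self.replicas:
                replica[key] = value
        elif self.mode == "CP":
            # Cannot reach the other replica, so refuse instead of letting data diverge.
            raise RuntimeError("unavailable during partition")
        else:
            # AP: accept locally; the other replica stays stale until the partition
            # heals and they reconcile (reconciliation is not modelled here).
            self.replicas[node][key] = value

    def read(self, node: int, key: str) -> Optional[str]:
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.write(0, "post:1", "v1")
    cluster.partitioned = True
    try:
        cluster.write(0, "post:1", "v2")
    except RuntimeError as err:
        print(mode, "write rejected:", err)
    print(mode, cluster.read(0, "post:1"), cluster.read(1, "post:1"))
# CP write rejected: unavailable during partition
# CP v1 v1
# AP v2 v1
```

The CP cluster stays consistent by refusing the write; the AP cluster stays available but two readers now see different values.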

Database Scaling Strategies

Vertical Scaling (Scale Up)

Approach: Add more CPU, RAM, or storage to an existing server
Cost: $$$$ (high-end hardware becomes disproportionately expensive)
Complexity: Low (no code changes needed)
Scaling ceiling: Low (capped by the largest server you can buy)
Best for: Small to medium applications, or when you need an immediate performance boost

Horizontal Scaling (Scale Out)

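Approach: Add more servers to distribute the load
Cost: $$ (grows roughly linearly with the number of servers)
Complexity: High (requires architectural changes)
Scaling ceiling: High (capacity keeps growing as servers are added, though coordination overhead grows too)
Best for: Large-scale applications where growth beyond a single server is expected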

Common Horizontal Scaling Techniques

Read Replicas

Create read-only copies of your database to handle read traffic (replicas may lag slightly behind the primary)

Good for: Read-heavy workloads (blogs, catalogs)
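With read replicas, the application (or a proxy) has to decide where each query goes. A minimal sketch, assuming a hypothetical `ReplicaRouter` class and made-up host names: writes go to the primary, reads rotate across replicas, and reads that must see the latest write can opt back into the primary.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and spread reads across read replicas round-robin."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str, require_fresh: bool = False) -> str:
        # Naive classification: anything starting with SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or require_fresh:
            # Writes, and reads that must see the latest write, go to the primary,
            # because replicas are often updated asynchronously and may lag behind.
            return self.primary
        return next(self._replicas)


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM posts"))                 # db-replica-1
print(router.route("SELECT * FROM posts"))                 # db-replica-2
print(router.route("INSERT INTO posts VALUES (1, 'hi')"))  # db-primary
print(router.route("SELECT * FROM posts WHERE id = 1", require_fresh=True))  # db-primary
```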

Sharding

Split data across multiple databases based on a shard key

Good for: Large datasets, write-heavy workloads
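The simplest way to turn a shard key into a shard is hash-then-modulo. A minimal sketch, with a hypothetical `shard_for` function and made-up shard names; it also measures the main drawback of plain modulo, which is that changing the shard count remaps most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index with a stable hash, then modulo."""
    # Python's built-in hash() is randomized per process for strings,
    # so use a cryptographic digest to get the same answer on every server.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]
print(shards[shard_for("user:1234", len(shards))])

# Caveat: with plain modulo, changing the shard count remaps most keys.
keys = [f"user:{i}" for i in range(10_000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when going from 4 to 5 shards")
```

Expect roughly 80% in that last line: a key stays put only when its hash has the same remainder modulo 4 and modulo 5, which happens for 4 of every 20 residues. Schemes such as consistent hashing exist to reduce this reshuffling.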

Federation

Split databases by function (users, products, orders)

Good for: Microservices, domain separation
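With federation, each functional area owns its own database, so the application must know where each kind of data lives, and a "join" across areas happens in application code. A minimal sketch; the connection strings and function names are hypothetical.

```python
from typing import Any

# Hypothetical connection strings: each functional area owns its own database.
FEDERATED_DATABASES = {
    "users": "postgresql://users-db.internal/users",
    "products": "postgresql://products-db.internal/catalog",
    "orders": "postgresql://orders-db.internal/orders",
}


def database_for(domain: str) -> str:
    """Return the database that owns a functional area."""
    try:
        return FEDERATED_DATABASES[domain]
    except KeyError:
        raise ValueError(f"no database owns {domain!r}") from None


def orders_with_user_names(
    orders: list[dict[str, Any]], users: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Join orders to users in application code, since they live in different databases."""
    names = {user["id"]: user["name"] for user in users}
    return [{**order, "user_name": names.get(order["user_id"])} for order in orders]


users = [{"id": 1, "name": "Alice"}]              # fetched from database_for("users")
orders = [{"id": 10, "user_id": 1, "total": 30}]  # fetched from database_for("orders")
print(orders_with_user_names(orders, users))
# [{'id': 10, 'user_id': 1, 'total': 30, 'user_name': 'Alice'}]
```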

Database Selection Framework

Use this decision tree to choose the right database type for your specific requirements.

Step 1: Data Structure

Choose SQL if:
  • Complex relationships between entities
  • Complex queries and aggregations
  • ACID compliance is critical
  • Structured data with a fixed schema
Choose NoSQL if:
  • Flexible or evolving schema
  • Simple queries, key-based access
  • Horizontal scaling requirements
  • Semi-structured or unstructured data

Step 2: Scale Requirements

Small Scale (< 1M records)

Any database will work. Choose based on team expertise.

Medium Scale (1M - 100M records)

SQL with proper indexing and read replicas is usually sufficient.

Large Scale (> 100M records)

Consider NoSQL, sharding, or distributed SQL systems, particularly when write volume or data size outgrows a single primary server.
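The two steps can be encoded as a small helper. This is a minimal sketch of one way to do it, not a rule: the `Requirements` fields are hypothetical, and the tie-breaking choice (ties go to SQL, in line with the advice to start with PostgreSQL) is an assumption. Real decisions also weigh factors this ignores, such as team expertise and access patterns.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs mirroring the questions in Steps 1 and 2."""
    complex_relationships: bool
    needs_acid: bool
    fixed_schema: bool
    key_based_access: bool
    expected_records: int


def suggest_database(req: Requirements) -> str:
    # Step 1: count the signals pointing toward each family; ties go to SQL.
    sql_signals = sum([req.complex_relationships, req.needs_acid, req.fixed_schema])
    nosql_signals = sum([not req.fixed_schema, req.key_based_access])
    family = "SQL" if sql_signals >= nosql_signals else "NoSQL"

    # Step 2: apply the scale tiers from the text.
    if req.expected_records < 1_000_000:
        scale = "any database will work; choose based on team expertise"
    elif req.expected_records <= 100_000_000:
        scale = "indexing and read replicas are usually sufficient"
    else:
        scale = "consider NoSQL, sharding, or distributed SQL"
    return f"{family}: {scale}"


print(suggest_database(Requirements(True, True, True, False, 5_000_000)))
# SQL: indexing and read replicas are usually sufficient
```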

Database Quick Reference

When in Doubt

  • Start with PostgreSQL (a strong general-purpose default)
  • Add Redis for caching and sessions
  • Consider read replicas before sharding
  • Monitor before optimizing
  • Avoid premature optimization

Red Flags

  • Multiple database types without clear justification
  • Choosing NoSQL only for "web scale"
  • Ignoring data consistency requirements
  • Not planning for growth patterns
  • Choosing unfamiliar technology under pressure

🎯 Database Selection in the Real World

Learn from actual database decisions made by major tech companies

Scenarios

Instagram Photo Storage Migration
Instagram moved from MySQL to Cassandra for photo metadata storage
Discord Message Storage Architecture
Discord moved message storage from MongoDB to Cassandra, and later to ScyllaDB, as message volume grew
Netflix Recommendation Engine Data
Netflix uses multiple databases for their recommendation system
Airbnb Search and Booking System
Airbnb evolved from MySQL to a multi-database architecture
Uber Real-time Location Tracking
Uber built custom database solutions for real-time location data
GitHub Repository and Code Storage
GitHub uses MySQL for metadata and Git for actual code storage

Context

Instagram moved from MySQL to Cassandra for photo metadata storage

Metrics

Photo uploads: 95M photos/day
MySQL performance: Struggling with sharding
Cassandra performance: 99.99% availability
Migration result: Linear scaling achieved

Outcome

Cassandra's AP properties (availability + partition tolerance) matched Instagram's need for globally available photo storage where eventual consistency is acceptable.

Key Lessons

  • Photo metadata doesn't require strong consistency - eventual consistency is acceptable
  • Cassandra's peer-to-peer architecture eliminated single points of failure
  • Linear scaling allowed Instagram to handle massive growth without complex sharding
  • Trade-off: Lost complex query capabilities but gained operational simplicity

📝 Database Fundamentals Quiz


According to CAP theorem, which combination is impossible to achieve simultaneously in a distributed system?