Design a URL Shortener (TinyURL/bit.ly)

Practice designing a scalable URL shortening service like TinyURL or bit.ly. Focus on data modeling, URL generation strategies, and handling scale.

System Requirements

Functional Requirements

  • Generate short URLs from long URLs
  • Redirect short URLs to original URLs
  • Custom short URL aliases (optional)
  • URL expiration (optional)
  • Analytics and click tracking

Non-Functional Requirements

  • Handle 100M URLs shortened per day
  • 100:1 read:write ratio (more redirects than creates)
  • Latency < 100ms for redirects
  • 99.9% availability
  • URLs should not be predictable

URL Generation Strategies

Base62 Encoding

Convert unique ID to base62 string

Pros

  • Simple implementation
  • Guaranteed uniqueness
  • Predictable length

Cons

  • Sequential (predictable)
  • Requires database counter

Example

ID 125 → "cb" in base62 (125 = 2×62 + 1, using the alphabet a-z, A-Z, 0-9)
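The conversion above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the alphabet order (a-z, A-Z, 0-9) is an assumption chosen so that 125 maps to "cb", and the function names are hypothetical.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (so that 125 encodes to "cb").
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Decode a base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(base62_encode(125))   # cb
print(base62_decode("cb"))  # 125
```

With this scheme a 7-character code covers 62^7 (about 3.5 trillion) IDs.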

Hash + Collision Resolution

Hash URL and handle collisions

Pros

  • Random-looking URLs
  • No central ID counter required (a lookup is still needed to detect collisions)
  • Fast generation

Cons

  • Collision handling complexity
  • Length can vary if collisions are resolved by extending the hash prefix

Example

MD5(url)[0:7] → "a1b2c3d"
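One possible way to combine the truncated hash with collision resolution is to re-hash with a salt when the 7-character prefix is already taken by a different URL. The sketch below assumes an in-memory dict standing in for the database, and the salting scheme (appending "#attempt") is an illustrative choice, not something the design prescribes.

```python
from __future__ import annotations

import hashlib

CODE_LENGTH = 7


def shorten_with_hash(long_url: str, store: dict[str, str], max_attempts: int = 10) -> str:
    """Return a 7-character code for long_url, retrying with a salt on collision.

    `store` maps short code -> long URL and stands in for the database.
    """
    for attempt in range(max_attempts):
        # First try hashes the URL itself; later tries add a salt (assumed scheme).
        payload = long_url if attempt == 0 else f"{long_url}#{attempt}"
        code = hashlib.md5(payload.encode("utf-8")).hexdigest()[:CODE_LENGTH]
        existing = store.get(code)
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:
            # The same URL was shortened before; reuse its code.
            return code
    raise RuntimeError("could not find a free short code")


store: dict[str, str] = {}
code = shorten_with_hash("https://example.com/some/long/path", store)
print(code, len(code))
```

Because the code is a prefix of a hex digest, it uses only 16 symbols per character; encoding the digest in base62 before truncating would pack more combinations into the same length.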

UUID + Base62

Generate UUID then encode to base62

Pros

  • Practically guaranteed uniqueness (random 128-bit IDs)
  • No collision handling needed
  • Distributed generation

Cons

  • Longer URLs
  • More storage

Example

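UUID (128 bits) → base62 → up to 22 characters (a 7-character code such as "2fK9mNq" would require truncation, which reintroduces collisions)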
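To see how long the resulting codes are, the following sketch encodes a random UUID's 128-bit integer value in base62. The alphabet order and the function name are assumptions for illustration.

```python
import string
import uuid

# Assumed alphabet order: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def uuid_to_base62(u: uuid.UUID) -> str:
    """Encode the 128-bit integer value of a UUID as a base62 string."""
    n = u.int
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


code = uuid_to_base62(uuid.uuid4())
print(code, len(code))  # length is at most 22
```

Since 62^21 < 2^128 < 62^22, a full UUID needs up to 22 base62 characters, which is the "Longer URLs" cost listed above.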

Capacity Estimation

Traffic & Storage

  • Read vs Write: 1x write, 100x read
  • Daily Requests: 100M creates, 10B redirects
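  • Storage Growth: 36 TB in Year 1, 180 TB by Year 5 (at 500 bytes per URL the raw rows total about 18 TB per year, so these figures imply roughly 2x overhead for indexes and replicas)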

Key Metrics

  • Daily URL Creates: 100M (peak: 2,000 QPS)
  • Daily Redirects: 10B (peak: 200K QPS)
  • Storage per URL: 500 bytes (URL + metadata)
  • Cache Hit Rate: 80% (popular URLs cached)

Infrastructure Sizing

  • Web Servers: 100+ servers for 200K QPS
  • Cache Layer: 1TB Redis for hot URLs
  • Database: 36TB storage, sharded MySQL
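The estimates in this section follow from a few lines of arithmetic. The sketch below recomputes them from the stated inputs (100M creates per day, 100:1 read:write ratio, 500 bytes per URL, 80% cache hit rate, 200K peak redirect QPS); the 2x storage overhead factor is an assumption used only to relate the raw row size to the 36 TB figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60

creates_per_day = 100_000_000
read_write_ratio = 100
bytes_per_url = 500
cache_hit_rate = 0.80
peak_redirect_qps = 200_000
storage_overhead = 2.0  # assumed factor for indexes and replicas

redirects_per_day = creates_per_day * read_write_ratio

# Average request rates; the peak figures allow headroom above these.
avg_create_qps = creates_per_day / SECONDS_PER_DAY
avg_redirect_qps = redirects_per_day / SECONDS_PER_DAY

# Raw row storage per year, in decimal terabytes.
raw_tb_per_year = creates_per_day * bytes_per_url * 365 / 1e12
planned_tb_per_year = raw_tb_per_year * storage_overhead

# Redirects that miss the cache and reach the database at peak.
peak_db_read_qps = peak_redirect_qps * (1 - cache_hit_rate)

print(f"average creates:   {avg_create_qps:,.0f} QPS")
print(f"average redirects: {avg_redirect_qps:,.0f} QPS")
print(f"raw storage:       {raw_tb_per_year:.2f} TB/year")
print(f"with overhead:     {planned_tb_per_year:.1f} TB/year")
print(f"peak DB reads:     {peak_db_read_qps:,.0f} QPS")
```

Under these inputs roughly 40K redirects per second still reach the database at peak, which is why the read path also relies on replicas and sharding.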

System Architecture

Client Apps → CDN → Load Balancer
↓
Web Servers (URL Shortening Service)
↓ ← Cache Layer (Redis)
Application Layer (Base62 Encoder, Validator)
↓
Database Cluster (Sharded MySQL/PostgreSQL)
↓
Analytics Pipeline (Kafka → ClickHouse)

API Layer

• POST /shorten - Create short URL
• GET /:shortCode - Redirect
• GET /analytics/:id - Stats
• DELETE /:shortCode - Delete
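The four endpoints can be sketched as a single dispatch function over an in-memory store. This is a simplified stand-in rather than a real web framework: the class name, the JSON field names (long_url, short_code, clicks), the placeholder code generator, and the choice of status codes are all assumptions for illustration.

```python
from __future__ import annotations

import itertools


class ShortenerAPI:
    """In-memory stand-in for the four endpoints (hypothetical, no real HTTP)."""

    def __init__(self) -> None:
        self._urls: dict[str, str] = {}    # short code -> long URL
        self._clicks: dict[str, int] = {}  # short code -> click count
        self._next_id = itertools.count(1)

    def handle(self, method: str, path: str, body: dict | None = None) -> tuple[int, dict]:
        """Return (status code, headers or JSON payload) for one request."""
        if method == "POST" and path == "/shorten":
            if not body or not body.get("long_url"):
                return 400, {"error": "long_url is required"}
            # Placeholder code generation; a real service would use one of
            # the URL generation strategies.
            code = f"u{next(self._next_id)}"
            self._urls[code] = body["long_url"]
            self._clicks[code] = 0
            return 201, {"short_code": code}

        # Must be checked before the generic GET /:shortCode route.
        if method == "GET" and path.startswith("/analytics/"):
            code = path[len("/analytics/"):]
            if code not in self._urls:
                return 404, {"error": "not found"}
            return 200, {"short_code": code, "clicks": self._clicks[code]}

        code = path.lstrip("/")
        if code not in self._urls:
            return 404, {"error": "not found"}
        if method == "GET":
            self._clicks[code] += 1
            return 302, {"Location": self._urls[code]}
        if method == "DELETE":
            del self._urls[code]
            del self._clicks[code]
            return 204, {}
        return 405, {"error": "method not allowed"}


api = ShortenerAPI()
status, payload = api.handle("POST", "/shorten", {"long_url": "https://example.com/a"})
print(status, payload)
print(api.handle("GET", "/" + payload["short_code"]))
```

The sketch answers redirects with 302 rather than 301 so that browsers do not cache the redirect permanently and every click still reaches the service for tracking; a design that values lower server load over analytics might prefer 301.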

Caching Strategy

• Popular URLs (80% traffic)
• TTL: 24 hours
• Cache-aside pattern
• Separate cache for analytics
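A minimal sketch of the cache-aside pattern with a TTL follows. It uses plain dicts in place of Redis and the database, and the class name, the injectable clock, and the hit/miss counters are hypothetical details added so the behavior is easy to observe.

```python
import time


class CacheAsideResolver:
    """Resolve short codes with cache-aside: check cache, fall back to DB, then fill cache."""

    def __init__(self, db, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._db = db            # dict standing in for the database
        self._cache = {}         # short code -> (long_url, expires_at)
        self._ttl = ttl_seconds  # 24-hour TTL by default
        self._clock = clock
        self.hits = 0
        self.misses = 0

    def resolve(self, code):
        now = self._clock()
        entry = self._cache.get(code)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        long_url = self._db.get(code)
        if long_url is not None:
            self._cache[code] = (long_url, now + self._ttl)
        return long_url


db = {"cb": "https://example.com/long"}
resolver = CacheAsideResolver(db)
resolver.resolve("cb")  # miss, loaded from the database
resolver.resolve("cb")  # hit, served from the cache
print(resolver.hits, resolver.misses)  # 1 1
```

Unknown codes are not cached in this sketch, so repeated lookups of a nonexistent code always reach the database; caching negative results with a short TTL is one possible mitigation.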

Scaling Strategy

• Horizontal web server scaling
• Database sharding by hash
• Read replicas for analytics
• CDN for global distribution

Database Schema

urls table

  • id (Primary Key)
  • short_url (Unique Index)
  • long_url
  • created_at
  • expires_at (nullable)
  • user_id (nullable)
  • click_count (default 0)

clicks table

  • id (Primary Key)
  • url_id (Foreign Key)
  • clicked_at
  • ip_address
  • user_agent
  • referrer
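The two tables can be tried out locally with Python's built-in sqlite3 module. The column names come from the schema above; the column types, the NOT NULL constraints, and the extra index on clicks.url_id are assumptions, and a production MySQL or PostgreSQL schema would use different types.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    id          INTEGER PRIMARY KEY,
    short_url   TEXT NOT NULL UNIQUE,          -- UNIQUE creates the lookup index
    long_url    TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at  TEXT,                          -- nullable
    user_id     INTEGER,                       -- nullable
    click_count INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE clicks (
    id         INTEGER PRIMARY KEY,
    url_id     INTEGER NOT NULL REFERENCES urls(id),
    clicked_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ip_address TEXT,
    user_agent TEXT,
    referrer   TEXT
);

-- Assumed index so per-URL click queries do not scan the whole table.
CREATE INDEX idx_clicks_url_id ON clicks(url_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO urls (short_url, long_url) VALUES (?, ?)",
    ("cb", "https://example.com/long"),
)
row = conn.execute(
    "SELECT long_url, click_count FROM urls WHERE short_url = ?", ("cb",)
).fetchone()
print(row)  # ('https://example.com/long', 0)
```

The unique constraint on short_url is also what makes custom aliases safe: a second insert with the same alias fails instead of silently overwriting the first.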

Database Sharding Strategy

Sharding Approaches

1. Hash-based Sharding

Shard = hash(short_url) % num_shards. Ensures even distribution.

  • Pros: Even distribution, simple logic
  • Cons: Resharding is difficult

2. Range-based Sharding

Partition by short_url ranges: [a-f], [g-m], [n-s], [t-z].

  • Pros: Easy range queries, predictable
  • Cons: Hotspots, uneven distribution
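To make the resharding cost of the hash-based approach concrete, the sketch below assigns 10,000 sample codes to 4 shards, then to 5, and counts how many change shard. The choice of MD5 as the hash and the sample key names are assumptions; a stable hash is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib


def shard_for(short_url: str, num_shards: int) -> int:
    """Shard = hash(short_url) % num_shards, using a stable hash."""
    digest = hashlib.md5(short_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


codes = [f"code{i}" for i in range(10_000)]

# Count how many keys land on a different shard after adding a fifth shard.
moved = sum(1 for c in codes if shard_for(c, 4) != shard_for(c, 5))
fraction_moved = moved / len(codes)
print(f"{fraction_moved:.0%} of keys move when going from 4 to 5 shards")
```

A key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about 1 in 5 keys, so roughly 80% of the data has to be migrated.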

Recommended: Consistent Hashing

  • Hash Ring: virtual nodes for even distribution
  • Easy Scaling: add/remove shards with minimal rehashing
  • Fault Tolerance: automatic failover to adjacent nodes
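A hash ring with virtual nodes can be sketched with a sorted list and binary search. This is an illustrative sketch under assumptions: MD5 as the ring hash, 100 virtual nodes per shard, and hypothetical shard names.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, shard)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard owns many points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        # Keys of the removed shard fall through to the next points clockwise.
        self._ring = [(pos, s) for pos, s in self._ring if s != shard]

    def shard_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
codes = [f"code{i}" for i in range(10_000)]
before = {c: ring.shard_for(c) for c in codes}

ring.add_shard("shard-4")
moved = [c for c in codes if ring.shard_for(c) != before[c]]
print(f"{len(moved) / len(codes):.0%} of keys moved after adding a shard")
```

Only keys that now belong to the new shard move, roughly one fifth of the total here, compared with about 80% under plain modulo sharding.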

Replication Strategy

  • Master-Slave Setup: each shard has 2 read replicas (1:2 ratio)
  • Write Latency: < 10ms, with async replication to slaves
  • Read Distribution: 70% of redirect traffic served by read replicas (slaves)
  • Failover Time: < 30 seconds, with automatic master promotion
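The read/write split for one shard might look like the sketch below: writes always go to the master, about 70% of reads go to a replica, and failover promotes a replica. The class, the node names, and the random routing policy are assumptions for illustration; real deployments usually delegate this to a proxy or driver.

```python
import random


class ShardRouter:
    """Route queries for one shard with one master and its read replicas."""

    def __init__(self, master, replicas, replica_read_fraction=0.7, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self.replica_read_fraction = replica_read_fraction
        self._rng = rng or random.Random()

    def for_write(self):
        # All writes go to the master; replicas catch up asynchronously.
        return self.master

    def for_read(self):
        # Send the configured share of reads to a randomly chosen replica.
        if self.replicas and self._rng.random() < self.replica_read_fraction:
            return self._rng.choice(self.replicas)
        return self.master

    def promote_replica(self):
        """Failover: the first replica becomes the new master."""
        if not self.replicas:
            raise RuntimeError("no replica available for promotion")
        self.master = self.replicas.pop(0)
        return self.master


router = ShardRouter("db-0-master", ["db-0-replica-a", "db-0-replica-b"])
print(router.for_write())
print(router.for_read())
```

Because replication is asynchronous, a redirect served by a replica immediately after a create can miss the new row; falling back to the master on a replica miss is one way to handle that.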

Practice Questions

1. How would you handle 100M URLs being created per day? What are the database scaling strategies?

2. Design a caching strategy for redirects. What data should be cached and for how long?

3. How would you prevent abuse (spam URLs, malicious redirects)? Design rate limiting and content filtering.

4. Implement a custom alias feature. How do you handle conflicts and reservations?

5. Design real-time analytics. How do you track clicks without impacting redirect latency?