Design a URL Shortener (TinyURL/bit.ly)
Practice designing a scalable URL shortening service like TinyURL or bit.ly. Focus on data modeling, URL generation strategies, and handling scale.
System Requirements
Functional Requirements
- Generate short URLs from long URLs
- Redirect short URLs to original URLs
- Custom short URL aliases (optional)
- URL expiration (optional)
- Analytics and click tracking
Non-Functional Requirements
- Handle 100M URLs shortened per day
- 100:1 read:write ratio (more redirects than creates)
- Latency < 100ms for redirects
- 99.9% availability
- URLs should not be predictable
URL Generation Strategies
Base62 Encoding
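Convert a unique numeric ID (e.g., from an auto-increment counter) to a base62 string
Pros
- Simple implementation
- Guaranteed uniqueness
- Short, predictable length (7 characters cover 62^7 ≈ 3.5 trillion IDs)
Cons
- Sequential, so codes are predictable (conflicts with the "URLs should not be predictable" requirement)
- Requires a central database counter, which can become a bottleneck
Example
ID 125 → "cb" in base62 (125 = 2 × 62 + 1, using the alphabet a-z, A-Z, 0-9)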
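A minimal sketch of counter-based base62 encoding and decoding. The alphabet order (a-z, A-Z, 0-9) is an assumption, chosen because it is the ordering under which ID 125 encodes to "cb". Other orderings produce different strings for the same ID.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (the order under which 125 -> "cb").
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Convert a non-negative integer ID to a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)   # peel off the least significant digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(s: str) -> int:
    """Convert a base62 string back to its integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(125))              # cb
print(decode("cb"))             # 125
print(len(encode(62**7 - 1)))   # 7
```

Consecutive IDs produce consecutive codes, so anyone can enumerate them. That is the predictability drawback listed under Cons.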
Hash + Collision Resolution
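Hash the long URL, keep a fixed-length prefix, and resolve collisions
Pros
- Random-looking URLs
- No central ID counter needed
- Fast generation
Cons
- Collision handling complexity: each candidate code must be checked against the database
- Small key space when truncating a hex digest (7 hex characters = 16^7 ≈ 268M codes, under three days of writes at 100M/day)
Example
MD5(url)[0:7] → "a1b2c3d"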
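The sketch below shows hash-then-truncate with collision resolution by rehashing. It rests on several assumptions. A plain dict `store` stands in for the urls table. On a collision the URL is salted with a hypothetical `#<attempt>` suffix and hashed again. In a real deployment the check-and-insert would need to be atomic, for example by relying on the unique index on short_url.

```python
import hashlib


def shorten(url: str, store: dict, length: int = 7) -> str:
    """Return a short code for url, rehashing with a counter suffix on collision.

    `store` maps short code -> long URL and stands in for the urls table.
    """
    attempt = 0
    while True:
        # Hypothetical salting scheme: append "#<attempt>" before rehashing.
        key = url if attempt == 0 else f"{url}#{attempt}"
        code = hashlib.md5(key.encode("utf-8")).hexdigest()[:length]
        existing = store.get(code)
        if existing is None:
            store[code] = url      # free slot: claim it
            return code
        if existing == url:
            return code            # this URL was already shortened: reuse its code
        attempt += 1               # a different URL owns this code: try again


store = {}
code = shorten("https://example.com/some/very/long/path", store)
print(code, len(code))  # a 7-character hex string
```

Because the salt sequence is deterministic, shortening the same URL again walks the same candidates and returns the code it already owns.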
UUID + Base62
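Generate a UUID, then encode it in base62
Pros
- Collisions are negligible in practice (UUIDv4 has 122 random bits)
- No collision check or central counter needed
- Distributed generation: any server can create codes independently
Cons
- Longer URLs (a 128-bit UUID needs up to 22 base62 characters)
- More storage
Example
UUID (128 bits) → base62 → a string of up to 22 characters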
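Here is a quick check of why UUID-based codes are long. It encodes a random UUIDv4 (via Python's `uuid` module) and the largest 128-bit value in base62, using the same assumed a-z, A-Z, 0-9 alphabet as the counter example.

```python
import string
import uuid

# Assumed alphabet order: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62(n: int) -> str:
    """Encode a non-negative integer in base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


code = base62(uuid.uuid4().int)   # different on every run
print(code, len(code))
print(len(base62(2**128 - 1)))    # 22: worst case for a 128-bit value
```

Truncating the result to 7 characters would discard most of the randomness and bring back the collision problem of the hash approach.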
Capacity Estimation
Traffic & Storage
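A back-of-envelope estimate derived from the stated requirements (100M writes/day, 100:1 read:write). The 500-byte average record size is an assumption for illustration, not a requirement. Peak traffic is typically higher than these daily averages.

```python
# Back-of-envelope capacity estimate.
# Assumption (not from the requirements): ~500 bytes per stored URL record.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
WRITES_PER_DAY = 100_000_000            # 100M URLs shortened per day
READ_WRITE_RATIO = 100                  # 100:1 reads to writes
BYTES_PER_RECORD = 500                  # assumed average row size

write_qps = WRITES_PER_DAY / SECONDS_PER_DAY
read_qps = write_qps * READ_WRITE_RATIO
storage_per_day_gb = WRITES_PER_DAY * BYTES_PER_RECORD / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1e3

keyspace_7 = 62 ** 7                    # distinct 7-character base62 codes
years_of_codes = keyspace_7 / (WRITES_PER_DAY * 365)

print(f"writes/s ≈ {write_qps:,.0f}")   # writes/s ≈ 1,157
print(f"reads/s  ≈ {read_qps:,.0f}")    # reads/s  ≈ 115,741
print(f"storage  ≈ {storage_per_day_gb:.0f} GB/day, {storage_per_year_tb:.2f} TB/year")  # 50 GB/day, 18.25 TB/year
print(f"7-char codes last ≈ {years_of_codes:.0f} years")  # ≈ 96 years
```

The read rate of roughly 115K/s is what drives the caching and replication decisions; the write rate alone is modest.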
Key Metrics
Infrastructure Sizing
System Architecture
API Layer
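A minimal sketch of the redirect endpoint's decision logic (`GET /{code}`), written as a pure function so it can be tested without a web framework. The names `UrlRecord` and `resolve`, and the use of 410 Gone for expired links, are illustrative assumptions. The 302-vs-301 trade-off in the comments is standard HTTP caching behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class UrlRecord:
    long_url: str
    expires_at: Optional[datetime] = None  # mirrors the nullable expires_at column


def resolve(code: str, store: Dict[str, UrlRecord], now: datetime) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header) for GET /{code}."""
    record = store.get(code)
    if record is None:
        return 404, None                   # unknown code
    if record.expires_at is not None and now >= record.expires_at:
        return 410, None                   # expired link (assumed policy: 410 Gone)
    # 302 keeps every click flowing through the service, which analytics needs;
    # a 301 lets browsers cache the redirect and skip the server on repeat visits.
    return 302, record.long_url


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = {
    "cb": UrlRecord("https://example.com/article"),
    "old": UrlRecord("https://example.com/promo",
                     expires_at=datetime(2023, 12, 31, tzinfo=timezone.utc)),
}
print(resolve("cb", store, now))   # (302, 'https://example.com/article')
print(resolve("old", store, now))  # (410, None)
print(resolve("zz", store, now))   # (404, None)
```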
Caching Strategy
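One way to cache redirects is a cache-aside read path in front of the database, with LRU eviction and a per-entry TTL. The sketch below is in-process only. The capacity, TTL, and injectable `clock` are assumptions for illustration; a production system would more likely use a shared cache service.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Bounded LRU cache with a per-entry time-to-live."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expiry time)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if self.clock() >= expires:
            del self._data[key]        # stale entry: treat as a miss
            return None
        self._data.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def lookup(code: str, cache: LRUCache, db: dict) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    url = cache.get(code)
    if url is None:
        url = db.get(code)
        if url is not None:
            cache.put(code, url)
    return url
```

Because redirects are read-heavy (100:1), a modest cache of popular codes can absorb much of the read traffic. Expired or deleted URLs should also be evicted so stale redirects do not outlive their database rows.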
Scaling Strategy
Database Schema
urls table
- id (Primary Key)
- short_url (Unique Index)
- long_url
- created_at
- expires_at (nullable)
- user_id (nullable)
- click_count (default 0)
clicks table
- id (Primary Key)
- url_id (Foreign Key)
- clicked_at
- ip_address
- user_agent
- referrer
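The two tables above can be expressed as DDL and exercised with Python's built-in `sqlite3` module. Several details are assumptions chosen for illustration rather than part of the listed schema: the column types, the NOT NULL constraints, and the extra `(url_id, clicked_at)` index on clicks. The example shows the unique index on short_url rejecting a duplicate code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE urls (
        id          INTEGER PRIMARY KEY,
        short_url   TEXT NOT NULL UNIQUE,        -- unique index
        long_url    TEXT NOT NULL,
        created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expires_at  TEXT,                        -- nullable
        user_id     INTEGER,                     -- nullable
        click_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE clicks (
        id         INTEGER PRIMARY KEY,
        url_id     INTEGER NOT NULL REFERENCES urls(id),
        clicked_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ip_address TEXT,
        user_agent TEXT,
        referrer   TEXT
    );
    -- Assumed index for per-URL, time-ordered analytics queries.
    CREATE INDEX idx_clicks_url_time ON clicks (url_id, clicked_at);
    """
)

conn.execute("INSERT INTO urls (short_url, long_url) VALUES (?, ?)",
             ("cb", "https://example.com/a"))

duplicate_rejected = False
try:
    conn.execute("INSERT INTO urls (short_url, long_url) VALUES (?, ?)",
                  ("cb", "https://example.com/b"))
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected duplicate short_url:", exc)
```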
Database Sharding Strategy
Sharding Approaches
Hash-based Sharding
Shard = hash(short_url) % num_shards. Ensures even distribution, but changing num_shards remaps most keys to a different shard.
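A small experiment on modulo sharding measures what fraction of keys move when a fifth shard is added to four. It uses MD5 as a stable hash, because Python's built-in `hash()` is randomized per process for strings. The key names are arbitrary test data.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Modulo sharding: shard = hash(short_url) % num_shards."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    return h % num_shards


keys = [f"code{i}" for i in range(10_000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shard going from 4 to 5 shards")
```

A key stays put only when h % 4 == h % 5, which holds for 4 of every 20 residues. That means about 80% of keys move, and every move is a data migration.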
Range-based Sharding
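Partition by short_url ranges, e.g. [a-f], [g-m], [n-s], [t-z]. Simple to reason about, but the ranges must cover the full base62 alphabet (A-Z and 0-9 as well), and skewed keys (e.g. sequential IDs) can overload a single shard.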
Recommended: Consistent Hashing
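A minimal consistent-hashing ring with virtual nodes, using `bisect` over a sorted list of ring points. The shard names, the 100 virtual nodes per shard, and the `shard#i` vnode labels are assumptions for illustration. Each key maps to the first ring point at or after its hash, wrapping around at the end.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (MD5 prefix) so placement survives process restarts."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, shard)
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        # Place several virtual nodes per shard to smooth out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        point = _hash(key)
        idx = bisect.bisect(self._ring, (point, ""))  # first vnode at or after point
        return self._ring[idx % len(self._ring)][1]   # wrap around the ring


ring = HashRing([f"shard{i}" for i in range(4)])
print(ring.shard_for("cb"))
```

With this design, adding a shard moves only about 1/(N+1) of the keys, and all of them land on the new shard. Under hash(short_url) % num_shards, by contrast, most keys move.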
Replication Strategy
Practice Questions
How would you handle 100M URLs being created per day? What are the database scaling strategies?
Design a caching strategy for redirects. What data should be cached and for how long?
How would you prevent abuse (spam URLs, malicious redirects)? Design rate limiting and content filtering.
Implement the custom aliases feature. How do you handle conflicts and reservations?
Design real-time analytics. How do you track clicks without impacting redirect latency?