What is Riak?
Riak is a distributed key-value database designed for high availability, fault tolerance, and operational simplicity. Built on the principles of Amazon's Dynamo paper, it provides tunable consistency levels and automatic data distribution across nodes.
Key Features
- Masterless architecture with no single point of failure
- Tunable CAP trade-offs via N, R, and W values
- Automatic data distribution via consistent hashing (see the sketch after this list)
- Built-in conflict detection with vector clocks (resolution is left to the application)
- Multi-datacenter replication
- MapReduce for distributed computation
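To make the consistent-hashing point concrete, here is a minimal illustrative sketch of how a bucket/key pair lands on a hash ring and on a preference list of N partitions. Riak does hash keys with SHA-1 onto a 2^160 ring divided into partitions, but the partition count and helper names below are simplified assumptions, not the client API.

import hashlib

RING_SIZE = 2 ** 160        # Riak's ring spans the SHA-1 hash space
NUM_PARTITIONS = 64         # simplified; matches the default ring_creation_size
N_VAL = 3                   # replicas per key

def partition_for(bucket: bytes, key: bytes) -> int:
    # Hash the bucket/key pair onto the ring, then find its partition.
    h = int.from_bytes(hashlib.sha1(bucket + key).digest(), 'big')
    return h // (RING_SIZE // NUM_PARTITIONS)

def preference_list(bucket: bytes, key: bytes) -> list[int]:
    # Replicas live on the next N consecutive partitions around the ring.
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]

print(preference_list(b'users', b'user123'))  # e.g. [41, 42, 43]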
CAP Theorem Configuration
Riak tunes the consistency/availability trade-off with three values, set per bucket or per request: N (number of replicas), R (replicas that must answer a read), and W (replicas that must acknowledge a write). A typical starting point is N=3, R=2, W=2; the sketch after the configurations below shows the quorum arithmetic.
High Availability
R=1, W=1 (fast, but only eventually consistent)
- Fastest operations
- Tolerates the most node failures
- Risk of stale reads
Strong Consistency
R + W > N (read and write quorums always overlap)
- Always-consistent reads
- Higher latency
- Less fault tolerant
Balanced
N=3, R=2, W=2 (a common production choice)
- Good consistency
- Reasonable performance
- Tolerates one node failure
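A quick worked check of the arithmetic behind these presets: when R + W > N, any read quorum must overlap any write quorum, so a read always sees at least one replica that took the latest write; meanwhile writes survive N - W replica failures and reads survive N - R. The helper below is purely illustrative and not part of any Riak client.

def quorum_check(n, r, w):
    # R + W > N guarantees the read set intersects the write set.
    overlap = r + w > n
    print(f"N={n} R={r} W={w}: "
          f"{'quorum-consistent reads' if overlap else 'eventual consistency'}; "
          f"writes tolerate {n - w} down replica(s), reads tolerate {n - r}")

quorum_check(3, 2, 2)  # balanced: consistent, survives one failure
quorum_check(3, 1, 1)  # high availability: fast, may serve stale reads
quorum_check(3, 3, 3)  # strict: consistent, but any failure blocks operations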
Real-World Examples
EA Sports
EA uses Riak for game statistics and player data across multiple game franchises.
- 50M+ player profiles globally distributed
- 99.99% uptime requirement for live games
- Multi-datacenter replication for low latency
Comcast
Comcast uses Riak for customer data and service provisioning across their network infrastructure.
- 30M+ customer records
- Geographic data distribution
- Integration with legacy billing systems
NHS (UK Healthcare)
NHS uses Riak for patient data storage requiring high availability and data sovereignty.
- 65M+ patient records
- Strict data locality requirements
- 24/7 availability for emergency services
Basic Operations
import riak
# Connect to Riak cluster
client = riak.RiakClient(host='127.0.0.1', pb_port=8087)
# Create a bucket with custom properties
bucket = client.bucket('users')
bucket.set_properties({
    'n_val': 3,  # number of replicas
    'r': 2,      # read quorum
    'w': 2,      # write quorum
    'pr': 1,     # primary read quorum
    'pw': 1      # primary write quorum
})
# Store an object
user_data = {
    'name': 'John Doe',
    'email': 'john@example.com',
    'created': '2024-01-15T10:00:00Z'
}
user_obj = bucket.new('user123', data=user_data)
user_obj.store()
# Retrieve an object
retrieved_user = bucket.get('user123')
print(retrieved_user.data['name']) # 'John Doe'
# Update an existing object; store() sends the fetched vector clock back
# so Riak can track causality (see the sibling-resolution sketch below)
retrieved_user.data['last_login'] = '2024-01-16T14:30:00Z'
retrieved_user.store()
# Search using Secondary Indexes (2i); requires the leveldb or memory backend
bucket.new('user124', data={
    'name': 'Jane Smith',
    'email': 'jane@example.com',
    'department': 'engineering'
}).add_index('department_bin', 'engineering').store()
# Query by index
engineering_users = bucket.get_index('department_bin', 'engineering')
for key in engineering_users:
    user = bucket.get(key)
    print(f"User: {user.data['name']}")
# MapReduce example (runs across the cluster; too slow for real-time request paths)
mr = client.add('users')
mr.map('function(v) { var data = JSON.parse(v.values[0].data); return [data.department]; }')
mr.reduce('function(values) { return values.sort(); }')
departments = mr.run()
print(departments)
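When allow_mult is enabled on a bucket, concurrent writes are kept as siblings and the application must pick a winner, which is what the "implement application-level conflict resolution" advice below means in practice. A minimal sketch using the same Python client; the newest-wins policy here is just one possible choice (the library ships an equivalent riak.resolver.last_written_resolver):

bucket.set_properties({'allow_mult': True})  # keep conflicting writes as siblings

def newest_wins(riak_object):
    # A resolver must leave exactly one sibling on the object.
    if len(riak_object.siblings) > 1:
        newest = max(riak_object.siblings, key=lambda s: s.last_modified)
        riak_object.siblings = [newest]

bucket.resolver = newest_wins   # applied automatically on get()
obj = bucket.get('user123')     # any siblings are resolved here
obj.store()                     # write the winner back with its vector clock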
Best Practices
✅ Do
- Choose N, R, and W values based on your consistency needs
- Use meaningful bucket names and consistent key naming conventions
- Implement application-level conflict resolution (see the sibling sketch above)
- Use secondary indexes (2i) for simple queries
- Monitor cluster health and ring status regularly (a minimal probe follows this list)
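For the monitoring item above, the client can at least verify node liveness from application code; ring membership and transfer status come from riak-admin member-status and riak-admin ring-status on the nodes themselves. A minimal probe, reusing the client from Basic Operations:

try:
    alive = client.ping()  # round-trips to the connected node
except Exception:
    alive = False
print('node reachable' if alive else 'node unreachable')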
❌ Don't
- Use Riak for complex analytical queries
- Store large objects (>50MB) without chunking (see the sketch after this list)
- Ignore sibling conflicts in your application
- Set R=1, W=1 for data that needs consistent reads
- Use MapReduce for real-time operations
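For the chunking item above, one common application-level pattern is to split a large value into fixed-size pieces and store a manifest under the logical key. The key scheme below is a hypothetical convention, not a Riak feature, again reusing the client and bucket objects from Basic Operations:

CHUNK_SIZE = 1024 * 1024  # 1 MB pieces keep individual Riak objects small

def store_chunked(bucket, key, blob):
    # Write each chunk under its own key, then a manifest listing them.
    chunk_keys = []
    for i in range(0, len(blob), CHUNK_SIZE):
        chunk_key = f"{key}:chunk:{i // CHUNK_SIZE}"
        bucket.new(chunk_key, encoded_data=blob[i:i + CHUNK_SIZE],
                   content_type='application/octet-stream').store()
        chunk_keys.append(chunk_key)
    bucket.new(key, data={'chunks': chunk_keys}).store()

def fetch_chunked(bucket, key):
    # Reassemble the value by following the manifest.
    manifest = bucket.get(key).data
    return b''.join(bucket.get(ck).encoded_data for ck in manifest['chunks'])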