
Riak

Master Riak, a distributed key-value database built for high availability, fault tolerance, and operational simplicity.


What is Riak?

Riak is a distributed key-value database designed for high availability, fault tolerance, and operational simplicity. Based on the principles of Amazon's Dynamo paper, it replicates every object across several nodes, distributes data automatically, and lets you tune consistency per bucket or per request.

Key Features

  • Masterless architecture with no single point of failure
  • Tunable consistency and availability trade-offs (N, R, W values)
  • Automatic data distribution via consistent hashing
  • Conflict detection with vector clocks; conflicting versions are kept as siblings for resolution
  • Multi-datacenter replication
  • MapReduce for distributed computation
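To make the consistent hashing feature concrete, here is a minimal sketch of how a key can be mapped to a partition and then to N replica owners. It is an illustration under stated assumptions, not Riak's actual implementation: the `"bucket/key"` hash input, the round-robin partition ownership, and the names `partition_for` and `preference_list` are all hypothetical simplifications.

```python
import hashlib

RING_SIZE = 64          # number of partitions on the ring (assumed for this sketch)
HASH_SPACE = 2 ** 160   # SHA-1 produces 160-bit values


def partition_for(bucket: str, key: str) -> int:
    """Hash a bucket/key pair onto the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    # Each partition covers an equal slice of the hash space.
    return position // (HASH_SPACE // RING_SIZE)


def preference_list(bucket: str, key: str, nodes: list, n_val: int = 3) -> list:
    """Return (partition, node) pairs for the n_val replicas of a key.

    Partitions are assumed to be owned round-robin by the nodes, so
    consecutive partitions land on different nodes.
    """
    first = partition_for(bucket, key)
    replicas = []
    for i in range(n_val):
        partition = (first + i) % RING_SIZE
        replicas.append((partition, nodes[partition % len(nodes)]))
    return replicas


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4", "node5"]
    for partition, node in preference_list("users", "user123", cluster):
        print(f"partition {partition} -> {node}")
```

Because the partition is derived from the key alone, any node can compute where an object lives without consulting a master, which is what makes the masterless design workable.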

Riak Cluster Calculator

Cluster size: 5 nodes
N: 3 replicas per object
R: 2 nodes must respond for reads
W: 2 nodes must acknowledge writes
Object size: 10 KB
Throughput: 10,000 ops/sec

Cluster Metrics

Raw Storage/Month: 70,313 GB
With Replication: 210,939 GB
Availability: 99.94%
Consistency: Strong
Partition Tolerance: High
Node Failures Tolerated: 2
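The relationship between the two storage figures is just the replication factor: every object is stored N times. The sketch below takes the raw monthly figure as given (how the calculator derives it is not stated) and assumes replicas are spread evenly across nodes; the function names are hypothetical.

```python
def replicated_storage_gb(raw_gb: float, n_val: int) -> float:
    """Total storage once every object is stored n_val times."""
    return raw_gb * n_val


def per_node_storage_gb(raw_gb: float, n_val: int, nodes: int) -> float:
    """Average storage per node, assuming an even spread of replicas."""
    return replicated_storage_gb(raw_gb, n_val) / nodes


if __name__ == "__main__":
    print(replicated_storage_gb(70313, 3))      # total across the cluster
    print(per_node_storage_gb(70313, 3, 5))     # average share per node
```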

CAP Theorem Configuration

Current Settings: N=3, R=2, W=2

High Availability

R=1, W=1 (fast, but only eventually consistent)

  • Fastest operations
  • Tolerates more replica failures
  • Risk of stale reads

Strong Consistency

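R + W > N (read and write quorums overlap)

  • Every read quorum includes at least one replica that acknowledged the latest write
  • Higher latency
  • Less fault tolerant
  • Concurrent writers can still produce siblings that the application must resolve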

Balanced

N=3, R=2, W=2 (common in production)

  • Good consistency
  • Reasonable performance
  • Tolerates 1 replica failure per request
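The three profiles above follow from simple arithmetic on N, R, and W. The sketch below checks quorum overlap and how many replicas may be unavailable per request. It assumes strict quorums and ignores sloppy quorums and hinted handoff; the `QuorumConfig` class is a hypothetical helper, not part of any Riak client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per object
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self):
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    @property
    def quorums_overlap(self) -> bool:
        """True when every read quorum shares a replica with every write quorum."""
        return self.r + self.w > self.n

    @property
    def read_failures_tolerated(self) -> int:
        return self.n - self.r

    @property
    def write_failures_tolerated(self) -> int:
        return self.n - self.w


if __name__ == "__main__":
    for cfg in (QuorumConfig(3, 1, 1), QuorumConfig(3, 2, 2), QuorumConfig(3, 3, 3)):
        print(cfg, cfg.quorums_overlap,
              cfg.read_failures_tolerated, cfg.write_failures_tolerated)
```

With N=3, R=2, W=2 the quorums overlap and one replica can be down for both reads and writes, which matches the "Balanced" profile.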

Real-World Examples

EA Sports

EA uses Riak for game statistics and player data across multiple game franchises.

  • 50M+ player profiles globally distributed
  • 99.99% uptime requirement for live games
  • Multi-datacenter replication for low latency

Comcast

Comcast uses Riak for customer data and service provisioning across their network infrastructure.

  • 30M+ customer records
  • Geographic data distribution
  • Integration with legacy billing systems

NHS (UK Healthcare)

NHS uses Riak for patient data storage requiring high availability and data sovereignty.

  • 65M+ patient records
  • Strict data locality requirements
  • 24/7 availability for emergency services

Basic Operations

Python Client Example
import riak

# Connect to Riak cluster
client = riak.RiakClient(host='127.0.0.1', pb_port=8087)

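# Get a handle to a bucket and set its replication properties
bucket = client.bucket('users')
bucket.set_properties({
    'n_val': 3,  # Keep 3 replicas of each object
    'r': 2,      # Reads wait for 2 replica responses
    'w': 2,      # Writes wait for 2 replica acknowledgements
    'pr': 1,     # At least 1 read response must come from a primary replica
    'pw': 1      # At least 1 write acknowledgement must come from a primary replica
})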

# Store an object
user_data = {
    'name': 'John Doe',
    'email': 'john@example.com',
    'created': '2024-01-15T10:00:00Z'
}

user_obj = bucket.new('user123', data=user_data)
user_obj.store()

# Retrieve an object
retrieved_user = bucket.get('user123')
print(retrieved_user.data['name'])  # 'John Doe'

# Update the fetched object; store() sends back its vector clock so Riak can detect conflicting writes
retrieved_user.data['last_login'] = '2024-01-16T14:30:00Z'
retrieved_user.store()

# Store an object with a secondary index (2i) entry; 2i requires the LevelDB or memory backend
bucket.new('user124', data={
    'name': 'Jane Smith',
    'email': 'jane@example.com',
    'department': 'engineering'
}).add_index('department_bin', 'engineering').store()

# Query by index
engineering_users = bucket.get_index('department_bin', 'engineering')
for key in engineering_users:
    user = bucket.get(key)
    print(f"User: {user.data['name']}")

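# MapReduce over the whole bucket (a full scan, so keep it off the request path)
mr = client.add('users')
# Emit each object's department, skipping objects that have none (such as user123)
mr.map('function(v) { var data = JSON.parse(v.values[0].data); return data.department ? [data.department] : []; }')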
mr.reduce('function(values) { return values.sort(); }')
departments = mr.run()
print(departments)

Best Practices

✅ Do

  • Choose N, R, W values based on your consistency needs
  • Use meaningful bucket names and key naming conventions
  • Implement application-level conflict resolution
  • Use secondary indexes (2i) for simple queries
  • Monitor cluster health and ring status regularly
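As an illustration of application-level conflict resolution, the sketch below merges sibling versions of a user record into one value. It is only one possible policy, chosen for this example: siblings are ordered by their `last_login` timestamp and fields from later logins win. The function name `merge_user_siblings` is hypothetical, the siblings are plain dictionaries, and wiring the function into a Riak client is left out.

```python
def merge_user_siblings(siblings: list) -> dict:
    """Merge conflicting versions of a user record into a single dict.

    Policy (an assumption for this sketch): apply siblings in order of
    their 'last_login' value so fields from the most recent login win,
    while fields present in only one sibling are preserved.
    """
    if not siblings:
        raise ValueError("at least one sibling is required")
    # ISO-8601 UTC timestamps in the same format sort correctly as strings.
    ordered = sorted(siblings, key=lambda s: s.get("last_login", ""))
    merged = {}
    for sibling in ordered:
        merged.update(sibling)
    return merged


if __name__ == "__main__":
    a = {"name": "John Doe", "email": "john@example.com",
         "last_login": "2024-01-16T14:30:00Z"}
    b = {"name": "John Doe", "email": "john.doe@example.com",
         "department": "engineering", "last_login": "2024-01-17T09:00:00Z"}
    print(merge_user_siblings([a, b]))
```

A timestamp-based policy can silently drop a concurrent change to the same field, so data that cannot tolerate that usually needs a field-specific merge instead.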

❌ Don't

  • Use Riak for complex analytical queries
  • Store large objects (>50MB) without chunking
  • Ignore sibling conflicts in your application
  • Set R=1, W=1 for critical data that needs consistent reads
  • Use MapReduce for real-time operations
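For the advice on large objects, one way to chunk is to split the payload into fixed-size pieces stored under derived keys, plus a small manifest that lists them. The sketch below works on in-memory bytes only; the 1 MiB chunk size, the `key:chunk:NNNNNN` naming scheme, and the function names are assumptions made for illustration, and storing each piece in Riak is left to the caller.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk (assumed; tune for your workload)


def chunk_object(key: str, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Split payload into chunks and build a manifest describing them."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = {}
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        chunks[f"{key}:chunk:{index:06d}"] = payload[offset:offset + chunk_size]
    manifest = {
        "key": key,
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "chunk_keys": list(chunks),  # insertion order equals chunk order
    }
    return manifest, chunks


def reassemble(manifest: dict, chunks: dict) -> bytes:
    """Rebuild the payload from its chunks and verify its checksum."""
    payload = b"".join(chunks[k] for k in manifest["chunk_keys"])
    if hashlib.sha256(payload).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch while reassembling object")
    return payload


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    manifest, parts = chunk_object("video42", data, chunk_size=1000)
    print(len(parts), reassemble(manifest, parts) == data)
```

Writing the manifest last means a reader never sees a manifest that points at chunks that have not been stored yet.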