
Amazon Redshift

Master Amazon Redshift: cloud data warehouse, performance tuning, distribution styles, and cost optimization.

45 min read · Intermediate

What is Amazon Redshift?

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service in the cloud. It uses columnar storage, data compression, and zone maps to deliver fast query performance on datasets ranging from hundreds of gigabytes to exabytes. Redshift's massively parallel processing (MPP) architecture distributes and parallelizes queries across multiple nodes for exceptional performance.

Built for analytics workloads, Redshift's columnar storage can achieve up to 10x better compression than traditional row-based systems, and the service integrates seamlessly with the broader AWS ecosystem. It's used by companies like Netflix for petabyte-scale analytics, by McDonald's for customer insights, and by NASA's Jet Propulsion Laboratory for space mission data analysis.

Redshift Cluster Calculator

[Interactive calculator widget: estimates monthly cluster cost, total vCPUs and memory, and compressed-storage savings from inputs such as queries per hour, concurrent users, and compressed data size. Example output: ~$1,702/month for a cluster with 12 vCPUs, 96 GB memory, and 3.33 TB of compressed data.]

Redshift Architecture & Features

Columnar Storage

Data stored by column for analytical query optimization.

• 10x better compression ratios
• Query only necessary columns
• Advanced compression algorithms
• Zone maps for data pruning
• Optimized for aggregations
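As a sketch, column encodings are declared per column at table creation, and Redshift can recommend encodings from a data sample (the `page_views` table and its columns here are hypothetical):

```sql
-- Hypothetical table illustrating per-column compression encodings
CREATE TABLE page_views (
    view_time TIMESTAMP     ENCODE az64,    -- numeric/date types compress well with AZ64
    user_id   BIGINT        ENCODE az64,
    url       VARCHAR(2048) ENCODE lzo,     -- general-purpose encoding for long text
    country   CHAR(2)       ENCODE bytedict -- low-cardinality text suits byte-dictionary
);

-- Ask Redshift to recommend encodings based on a sample of existing rows
ANALYZE COMPRESSION page_views;
```

In practice, many teams let Redshift pick encodings automatically (the default) and use `ANALYZE COMPRESSION` to audit the choices.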

Distribution Styles

Strategic data distribution across cluster nodes.

• KEY: Co-locate related data
• ALL: Copy small tables to all nodes
• EVEN: Balanced distribution
• AUTO: Redshift chooses and adjusts the style automatically
• Minimize network traffic
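The explicit styles are set in the table DDL. A minimal sketch, with hypothetical table and column names:

```sql
-- KEY: rows with the same customer_id land on the same node,
-- so joins between these two tables avoid network redistribution
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY DISTKEY (customer_id);

CREATE TABLE customers (
    customer_id BIGINT,
    region      VARCHAR(64)
)
DISTSTYLE KEY DISTKEY (customer_id);

-- ALL: replicate a small dimension table to every node
CREATE TABLE currency_rates (
    currency CHAR(3),
    rate     DECIMAL(18,8)
)
DISTSTYLE ALL;

-- EVEN: no dominant join key, so spread rows round-robin
CREATE TABLE click_log (
    event_time TIMESTAMP,
    payload    VARCHAR(4096)
)
DISTSTYLE EVEN;
```

Omitting `DISTSTYLE` entirely gives AUTO, letting Redshift switch styles as the table grows.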

Massively Parallel Processing

Distribute queries across up to 128 nodes.

• Leader node coordinates queries
• Compute nodes process data
• Parallel execution across slices
• Network-optimized architecture
• Linear performance scaling
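Two quick ways to see the MPP layout at work (the `orders` table in the second query is hypothetical):

```sql
-- Each compute node is divided into slices that scan data in parallel;
-- STV_SLICES maps slices to nodes on the current cluster
SELECT node, COUNT(*) AS slices
FROM stv_slices
GROUP BY node
ORDER BY node;

-- EXPLAIN shows how the leader node plans distributed execution,
-- including any data redistribution steps (DS_DIST_*)
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
```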

Redshift Spectrum

Query exabytes of data in S3 without loading.

• Serverless S3 querying
• Petabyte-scale analytics
• Multiple data formats
• Partition-aware processing
• Cost-effective storage
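A sketch of querying S3 data in place via Spectrum; the schema name, Glue database, IAM role ARN, and S3 path are placeholders:

```sql
-- External schema backed by the AWS Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'analytics'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Partitioned Parquet data registered for in-place querying, without loading
CREATE EXTERNAL TABLE spectrum.events (
    event_id   BIGINT,
    event_type VARCHAR(64)
)
PARTITIONED BY (event_date DATE)
STORED AS PARQUET
LOCATION 's3://my-data-lake/events/';

-- Partition pruning: only the matching S3 partition is scanned,
-- and external tables join transparently with local Redshift tables
SELECT event_type, COUNT(*)
FROM spectrum.events
WHERE event_date = '2024-01-01'
GROUP BY event_type;
```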

Real-World Redshift Implementations

Netflix

Analyzes 1+ petabyte of viewing data daily to power recommendation algorithms and content decisions.

• 1+ PB of data analyzed daily
• 100+ billion events processed
• Real-time recommendation updates
• A/B testing and experimentation

McDonald's

Processes data from 40,000+ restaurants worldwide for customer insights and operational optimization.

• 40,000+ restaurant locations
• Customer journey analytics
• Menu optimization insights
• Operational efficiency analysis

NASA JPL

Analyzes space mission data including Mars rover telemetry and deep space observation data.

• Space mission telemetry data
• Mars rover operational analysis
• Deep space observation processing
• Scientific research analytics

Philips Healthcare

Processes healthcare data from millions of medical devices for population health insights.

• Medical device data aggregation
• Population health analytics
• Clinical outcomes research
• Healthcare quality metrics

Advanced Redshift Capabilities

Workload Management (WLM)

Intelligent query queue management with automatic resource allocation.

WLM Configuration Example
-- Check current WLM configuration
SELECT * FROM STV_WLM_CLASSIFICATION_CONFIG;

-- Summarize query performance by WLM queue
-- (STL_WLM_QUERY times are in microseconds)
SELECT
    service_class,
    AVG(total_exec_time) / 1000000.0 AS avg_exec_seconds,
    MAX(total_exec_time) / 1000000.0 AS max_exec_seconds,
    SUM(total_queue_time) / 1000000.0 AS total_queue_seconds
FROM stl_wlm_query
WHERE queue_start_time >= current_date - 1
GROUP BY service_class
ORDER BY service_class;

Materialized Views

Pre-computed results that automatically accelerate similar queries.

Materialized View Example
-- Create materialized view for common aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT 
    DATE_TRUNC('day', order_date) as sale_date,
    product_category,
    SUM(amount) as total_revenue,
    COUNT(*) as order_count
FROM sales
GROUP BY 1, 2;

-- Refresh on demand (or declare AUTO REFRESH YES at creation
-- to have Redshift refresh the view automatically)
REFRESH MATERIALIZED VIEW daily_sales_summary;

Concurrency Scaling

Automatically add compute capacity during peak workload periods.

Features:
• Automatic cluster scaling during peak usage
• Transparent to applications and users
• Pay only for additional compute time
• Maintains consistent query performance
• Handles thousands of concurrent queries
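Concurrency scaling is enabled per WLM queue (via the cluster's parameter group rather than SQL), and usage can be audited afterwards. One way to check accrued scaling-cluster time, assuming the monitoring view is available on your cluster:

```sql
-- Audit time accrued on concurrency scaling clusters
-- (billed per second beyond the free daily credits)
SELECT *
FROM svcs_concurrency_scaling_usage
ORDER BY end_time DESC;
```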

Redshift Best Practices

✅ Do

• Use appropriate distribution keys for large tables
• Choose compound sort keys for time-series data
• Leverage columnar compression (ANALYZE COMPRESSION)
• Use COPY command for bulk data loading
• Implement materialized views for frequently accessed aggregations
• Monitor WLM queue performance and adjust
• Use VACUUM to maintain sort order
• Leverage Spectrum for cold/archival data

❌ Don't

• Use Redshift for OLTP/transactional workloads
• Create too many small tables (use fewer, wider tables)
• Use single-column distribution keys with low cardinality
• Ignore table maintenance (VACUUM and ANALYZE)
• Load data row-by-row (use bulk COPY operations)
• Over-normalize data (denormalization is often better)
• Use reserved words as column names
• Forget to set table statistics after data loading
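The load-and-maintain practices above can be sketched as a short sequence; the table name, S3 path, and IAM role ARN are placeholders:

```sql
-- Bulk-load in parallel from S3 instead of row-by-row INSERTs
COPY sales
FROM 's3://my-bucket/sales/2024/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;

-- Restore sort order and reclaim space left by deletes,
-- then refresh the statistics the query planner relies on
VACUUM sales;
ANALYZE sales;
```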

Performance Optimization Guidelines

• Data Distribution — choose an appropriate DISTKEY to minimize data movement between nodes
• Sort Keys — optimize for query patterns, especially time-series ranges and common filters
• Maintenance — run VACUUM and ANALYZE regularly to keep performance optimal