What is Amazon Redshift?
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service in the cloud. It uses columnar storage, data compression, and zone maps to deliver fast query performance on datasets ranging from hundreds of gigabytes to petabytes, and to exabytes in Amazon S3 via Redshift Spectrum. Redshift's massively parallel processing (MPP) architecture distributes and parallelizes queries across multiple nodes for consistently high performance.
Built for analytics workloads, Redshift's columnar format and compression typically shrink storage several-fold compared with traditional row-based systems, and the service integrates seamlessly with the broader AWS ecosystem. It is used by companies such as Netflix for petabyte-scale analytics, McDonald's for customer insights, and NASA's Jet Propulsion Laboratory for space mission data analysis.
Redshift Architecture & Features
Columnar Storage
Data is stored by column rather than by row, optimizing analytical queries (see the DDL sketch after this list).
• Query only necessary columns
• Advanced compression algorithms
• Zone maps for data pruning
• Optimized for aggregations
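As a minimal sketch (the table and column names are hypothetical), per-column encodings can be declared at creation time, and ANALYZE COMPRESSION asks Redshift to recommend encodings based on sampled data:
-- Illustrative table; each column gets an encoding suited to its data type
CREATE TABLE web_events (
    event_id     BIGINT       ENCODE az64,
    event_time   TIMESTAMP    ENCODE az64,
    user_id      INTEGER      ENCODE az64,
    page_url     VARCHAR(500) ENCODE lzo,
    country_code CHAR(2)      ENCODE bytedict
);
-- Ask Redshift to recommend encodings from a sample of loaded data
ANALYZE COMPRESSION web_events;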
Distribution Styles
Strategic data distribution across cluster nodes to minimize data movement during joins (illustrative DDL below).
• ALL: Copy small tables to every node
• EVEN: Round-robin rows evenly across slices
• KEY: Co-locate rows that share a distribution key
• AUTO: Redshift picks and adapts the style automatically
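A hedged sketch of the three explicit styles (these table definitions are illustrative, not a reference schema):
-- Small dimension: replicate to every node so joins never move it
CREATE TABLE dim_date (
    date_key       DATE,
    fiscal_quarter SMALLINT
) DISTSTYLE ALL;
-- Large fact table: co-locate rows on the most common join column
CREATE TABLE fact_sales (
    order_id    BIGINT,
    customer_id INTEGER,
    order_date  DATE,
    amount      DECIMAL(12,2)
) DISTSTYLE KEY DISTKEY (customer_id);
-- Staging table with no dominant join: spread rows evenly
CREATE TABLE staging_raw (
    payload VARCHAR(65535)
) DISTSTYLE EVEN;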
Massively Parallel Processing
Queries are distributed across up to 128 compute nodes (see the slice query below).
• Compute nodes process data
• Parallel execution across slices
• Network-optimized architecture
• Near-linear performance scaling as nodes are added
Redshift Spectrum
Query exabytes of data in S3 without loading it (external table sketch below).
• Petabyte-scale analytics
• Multiple data formats
• Partition-aware processing
• Cost-effective storage
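As a hedged sketch (the schema name, IAM role ARN, bucket path, and columns are all placeholders), an external schema maps to the AWS Glue Data Catalog and external tables point at files in S3:
-- Register an external schema backed by the Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum_demo
FROM DATA CATALOG
DATABASE 'demo_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
-- Define an external table over Parquet files in S3; no data is loaded
CREATE EXTERNAL TABLE spectrum_demo.clickstream (
    event_time TIMESTAMP,
    user_id    BIGINT,
    url        VARCHAR(500)
)
STORED AS PARQUET
LOCATION 's3://my-bucket/clickstream/';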
Real-World Redshift Implementations
Netflix
Analyzes 1+ petabyte of viewing data daily to power recommendation algorithms and content decisions.
• 1+ PB of data analyzed daily
• 100+ billion events processed
• Real-time recommendation updates
• A/B testing and experimentation
McDonald's
Processes data from 40,000+ restaurants worldwide for customer insights and operational optimization.
• 40,000+ restaurant locations
• Customer journey analytics
• Menu optimization insights
• Operational efficiency analysis
NASA JPL
Analyzes space mission data including Mars rover telemetry and deep space observation data.
• Space mission telemetry data
• Mars rover operational analysis
• Deep space observation processing
• Scientific research analytics
Philips Healthcare
Processes healthcare data from millions of medical devices for population health insights.
• Medical device data aggregation
• Population health analytics
• Clinical outcomes research
• Healthcare quality metrics
Advanced Redshift Capabilities
Workload Management (WLM)
Intelligent query queue management with automatic resource allocation.
-- Check current WLM configuration
SELECT * FROM STV_WLM_CLASSIFICATION_CONFIG;
-- Summarize query performance by queue over the last day
-- (times in STL_WLM_QUERY are recorded in microseconds)
SELECT
    w.service_class,
    c.name AS service_class_name,
    AVG(w.total_exec_time) / 1000000.0 AS avg_exec_seconds,
    MAX(w.total_exec_time) / 1000000.0 AS max_exec_seconds,
    SUM(w.total_queue_time) / 1000000.0 AS total_queue_seconds
FROM stl_wlm_query w
JOIN stv_wlm_service_class_config c
    ON w.service_class = c.service_class
WHERE w.queue_start_time >= current_date - 1
GROUP BY 1, 2
ORDER BY 1;
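Queries can also be routed to a specific queue at run time by setting a query group that matches one configured in WLM (the group name 'reporting' here is an assumption about your WLM setup):
-- Route this session's next queries to the matching WLM queue
SET query_group TO 'reporting';
SELECT COUNT(*) FROM sales;  -- runs in the 'reporting' queue
RESET query_group;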
Materialized Views
Pre-computed results that automatically accelerate similar queries.
-- Create materialized view for common aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
DATE_TRUNC('day', order_date) as sale_date,
product_category,
SUM(amount) as total_revenue,
COUNT(*) as order_count
FROM sales
GROUP BY 1, 2;
-- Refresh on demand (or declare AUTO REFRESH YES at creation; see below)
REFRESH MATERIALIZED VIEW daily_sales_summary;
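Redshift can also keep a materialized view current automatically. A minimal sketch, assuming the same hypothetical sales table as above:
-- Redshift refreshes this view as the base table changes
CREATE MATERIALIZED VIEW daily_sales_summary_auto
AUTO REFRESH YES
AS
SELECT
    DATE_TRUNC('day', order_date) AS sale_date,
    product_category,
    SUM(amount) AS total_revenue,
    COUNT(*) AS order_count
FROM sales
GROUP BY 1, 2;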
Concurrency Scaling
Automatically adds compute capacity during peak workload periods (usage query below).
• Automatic cluster scaling during peak usage
• Transparent to applications and users
• Pay only for additional compute time
• Maintains consistent query performance
• Handles thousands of concurrent queries
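Usage can be audited from the system view SVCS_CONCURRENCY_SCALING_USAGE; since its exact columns vary by release, SELECT * keeps this sketch safe:
-- Inspect when concurrency-scaling capacity was used and for how long
SELECT *
FROM svcs_concurrency_scaling_usage
ORDER BY start_time DESC
LIMIT 20;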
Redshift Best Practices
✅ Do
• Use appropriate distribution keys for large tables
• Choose compound sort keys for time-series data
• Leverage columnar compression (ANALYZE COMPRESSION)
• Use the COPY command for bulk data loading (sketch at the end of this section)
• Implement materialized views for frequently accessed aggregations
• Monitor WLM queue performance and adjust
• Use VACUUM to maintain sort order
• Leverage Spectrum for cold/archival data
❌ Don't
• Use Redshift for OLTP/transactional workloads
• Create too many small tables (use fewer, wider tables)
• Use single-column distribution keys with low cardinality
• Ignore table maintenance (VACUUM and ANALYZE)
• Load data row-by-row (use bulk COPY operations)
• Over-normalize data (denormalization is often better)
• Use reserved words as column names
• Forget to run ANALYZE after loads, leaving table statistics stale
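A minimal load-and-maintain sketch tying several of the "Do" items together (the bucket, prefix, and IAM role ARN are placeholders):
-- Bulk load from S3 in one parallel operation instead of row-by-row inserts
COPY sales
FROM 's3://my-bucket/sales/2024/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;
-- Routine maintenance after large loads
VACUUM sales;   -- re-sort rows and reclaim space from deletes
ANALYZE sales;  -- refresh planner statistics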