Snowflake

Master Snowflake: cloud data warehouse, virtual warehouses, data sharing, and performance optimization.

45 min read · Intermediate

What is Snowflake?

Snowflake is a cloud-native data platform that delivers the performance, flexibility, and scalability of the cloud to enable every organization to be data-driven. Built from the ground up for the cloud, Snowflake's unique multi-cluster shared data architecture separates compute and storage to eliminate resource contention and enable unlimited scalability.

Unlike traditional data warehouses, Snowflake runs entirely in the cloud with zero infrastructure management required. It provides instant elasticity, pay-per-second billing, and revolutionary features like zero-copy cloning, Time Travel, and secure data sharing. Snowflake operates across AWS, Azure, and GCP, enabling true multi-cloud flexibility for modern data architectures.

Snowflake Performance & Cost Calculator

Example scenario: 16 credits/hour of compute, 40 queries/min, 250 GB of compressed data, 32 max concurrent queries.

Monthly Storage Cost: ~$8 (includes Time Travel)

Est. Compute Cost: ~$6,912/month (at 30% utilization)
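The estimates above follow from simple arithmetic; a sketch of the model, assuming ~$2 per credit (a typical Standard-edition list price) and ~$23 per TB-month of storage — both rates vary by edition, region, and contract:

```sql
-- 16 credits/hour x ~720 hours/month x 30% utilization x $2/credit = $6,912
-- 0.25 TB compressed x $23/TB-month = ~$5.75 (closer to $8 once Time Travel
-- and Fail-safe storage overhead are included)
SELECT 16 * 720 * 0.30 * 2.00 AS est_compute_usd_per_month,
       0.25 * 23.00           AS est_storage_usd_per_month;
```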

Snowflake Core Features

Multi-Cluster Architecture

Separates compute and storage for independent scaling without resource contention.

• Unlimited concurrent users
• Auto-scaling virtual warehouses
• Pay-per-second billing
• Zero-maintenance cloud-native
• Cross-cloud compatibility

Zero-Copy Cloning

Instantly create full copies of databases without duplicating storage.

• Instant database/table clones
• No additional storage cost
• Copy-on-write technology
• Point-in-time cloning
• Development environment setup

Time Travel & Fail-safe

Built-in data protection with historical querying for up to 90 days (Enterprise edition; Standard edition allows 1 day).

• Query historical data states
• Recover from accidental changes
• UNDROP functionality
• Automatic data protection
• 7-day Fail-safe period
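Time Travel retention is configurable per object; a short sketch (the table name is illustrative):

```sql
-- Extend Time Travel retention to 90 days (requires Enterprise edition)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Verify the current retention setting
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE orders;
```

Longer retention increases storage cost, since changed data is kept for the full window.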

Secure Data Sharing

Live data sharing across organizations without copying or moving data.

• Real-time data sharing
• Cross-cloud sharing
• Secure access controls
• Data marketplace integration
• Reader accounts for consumers
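Sharing is configured with a few statements on the provider side and one on the consumer side; a sketch (database, share, and account names are illustrative):

```sql
-- Provider side: create a share and expose objects to it
CREATE SHARE sales_share COMMENT = 'Live sales data for partners';
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Add consumer accounts to the share
ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;

-- Consumer side: create a read-only database from the share
CREATE DATABASE shared_sales FROM SHARE provider_org.provider_account.sales_share;
```

Because the consumer queries the provider's storage directly, no data is copied and updates are visible immediately.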

Real-World Snowflake Implementations

Capital One

Modernized data architecture serving millions of customers with real-time analytics.

• 150+ petabytes of data processed
• Real-time fraud detection
• Customer 360-degree analytics
• Multi-cloud deployment strategy

DoorDash

Powers real-time logistics optimization and marketplace analytics.

• Real-time delivery optimization
• Driver and restaurant analytics
• Dynamic pricing algorithms
• Multi-region data replication

Nielsen

Monetizes data assets through Snowflake Data Marketplace for media measurement.

• Global TV and digital measurement
• Data marketplace revenue streams
• Cross-border data sharing
• Consumer behavior analytics

Western Union

Processes global financial transactions with advanced fraud detection.

• 500M+ annual transactions
• Real-time compliance monitoring
• Cross-border regulatory reporting
• Machine learning fraud prevention

Snowflake Configuration Examples

Multi-Cluster Warehouse Setup

warehouse-configuration.sql
-- Create auto-scaling warehouse for analytics workloads
CREATE WAREHOUSE analytics_wh WITH
    WAREHOUSE_SIZE = 'LARGE'
    AUTO_SUSPEND = 60                    -- Auto-suspend after 1 minute
    AUTO_RESUME = TRUE                   -- Auto-resume on query
    MIN_CLUSTER_COUNT = 1                -- Minimum clusters
    MAX_CLUSTER_COUNT = 10               -- Scale to 10 clusters
    SCALING_POLICY = 'STANDARD'          -- Auto-scale based on queue
    COMMENT = 'Analytics warehouse with auto-scaling';

-- Resource monitor for cost governance (create it before any warehouse references it)
CREATE RESOURCE MONITOR monthly_budget WITH
    CREDIT_QUOTA = 1000                  -- 1,000-credit monthly limit
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 75 PERCENT DO NOTIFY
             ON 90 PERCENT DO SUSPEND
             ON 100 PERCENT DO SUSPEND_IMMEDIATE;

-- ETL warehouse with cost controls
CREATE WAREHOUSE etl_wh WITH
    WAREHOUSE_SIZE = 'X-LARGE'
    AUTO_SUSPEND = 300                   -- 5-minute suspend for batch jobs
    RESOURCE_MONITOR = monthly_budget
    INITIALLY_SUSPENDED = TRUE;

Time Travel & Cloning

time-travel-cloning.sql
-- Query historical data using Time Travel
SELECT * FROM orders 
    AT(TIMESTAMP => '2024-01-15 14:30:00'::timestamp)
WHERE order_date = '2024-01-15';

-- Zero-copy clone for development environment
CREATE DATABASE dev_analytics CLONE prod_analytics
    COMMENT = 'Development copy of production analytics';

-- Clone to specific point in time
CREATE TABLE customer_backup CLONE customers
    AT(OFFSET => -3600);  -- 1 hour ago (in seconds)

-- Undrop accidentally dropped objects
UNDROP TABLE customers;
UNDROP DATABASE analytics;

-- Streams and Tasks for real-time processing
CREATE STREAM order_changes ON TABLE raw.orders;

CREATE TASK process_orders
    WAREHOUSE = etl_wh
    SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('order_changes')
AS
    INSERT INTO analytics.order_summary
    SELECT order_date, COUNT(*), SUM(total)
    FROM order_changes
    WHERE metadata$action = 'INSERT'
    GROUP BY order_date;

-- Start the task
ALTER TASK process_orders RESUME;
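Once a task is running, its executions can be audited with the TASK_HISTORY table function; a sketch (the two-hour window is illustrative):

```sql
-- Inspect recent runs of the task, including failures
SELECT name, state, scheduled_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START => DATEADD('hour', -2, CURRENT_TIMESTAMP()),
    TASK_NAME => 'PROCESS_ORDERS'))
ORDER BY scheduled_time DESC;
```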

Snowflake Best Practices

✅ Do

• Use auto-suspend to minimize compute costs
• Leverage zero-copy cloning for dev/test environments
• Implement resource monitors for cost governance
• Use clustering keys for large, frequently queried tables
• Take advantage of result caching and partition pruning
• Use Streams and Tasks for real-time data pipelines
• Monitor credit usage and query performance
• Optimize warehouse sizing for workload patterns

❌ Don't

• Keep warehouses running when not in use
• Use overly large warehouses for simple queries
• Ignore clustering for large tables with poor performance
• Over-engineer with too many small warehouses
• Forget to set Time Travel retention appropriately
• Skip resource monitoring and cost alerts
• Use SELECT * on wide tables unnecessarily
• Mix transactional and analytical workloads on one warehouse
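Several of these practices come down to watching what your warehouses actually run. A sketch of a monitoring query against the ACCOUNT_USAGE schema (note that this view lags real time by up to ~45 minutes):

```sql
-- Find the most expensive queries from the last 7 days
SELECT query_id, warehouse_name, user_name,
       total_elapsed_time / 1000        AS elapsed_s,
       bytes_scanned / POWER(1024, 3)   AS gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

Queries that scan far more data than they return are candidates for clustering keys or narrower SELECT lists.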