Skip to main contentSkip to user menuSkip to navigation

Amazon Athena

Master Amazon Athena: serverless SQL analytics, S3 querying, partitioning, and cost optimization.

35 min readIntermediate
Not Started
Loading...

What is Amazon Athena?

Amazon Athena is a serverless interactive query service that enables SQL analysis directly on data stored in Amazon S3. Built on the open-source Trino (formerly PrestoSQL) engine, Athena allows you to analyze petabytes of data without managing infrastructure, paying only for queries you run.

Perfect for ad-hoc data analysis, log analytics, business intelligence, and data lake exploration. Athena integrates seamlessly with AWS Glue Data Catalog for metadata management and supports federated queries across multiple data sources including RDS, DynamoDB, and third-party databases.

Athena Cost Calculator

Cost Model

  • • Pay-per-query: $5 per TB scanned
  • • No upfront costs or minimum fees
  • • Columnar formats reduce scan volume
  • • Partition pruning minimizes costs

Optimization Tips

  • • Use Parquet format (10x cheaper)
  • • Implement date partitioning
  • • Compress data with GZIP/Snappy
  • • Select only needed columns

Example Costs

1TB CSV scan:$5.00
1TB Parquet scan:$0.50
Partitioned query (100GB):$0.05
Monthly savings with optimization:90%+

Core Concepts

Serverless Architecture

Amazon Athena operates as a fully managed, serverless interactive query service

Purpose:

Enable ad-hoc SQL analysis on S3 data without infrastructure management or capacity planning

Key Features:
  • No servers to provision, patch, or manage
  • Automatic scaling based on query complexity
  • Pay-per-query pricing model (per TB scanned)
  • Integration with AWS Glue Data Catalog
  • JDBC/ODBC connectivity for BI tools
Example:

Analyze years of log files stored in S3 with a simple SQL query, paying only for data scanned

Code Example:
-- Query web logs directly from S3
SELECT 
    DATE(timestamp) as date,
    COUNT(*) as page_views,
    COUNT(DISTINCT ip_address) as unique_visitors
FROM "web_logs"."access_logs"
WHERE year = '2024' 
    AND month = '03'
    AND status_code = 200
GROUP BY DATE(timestamp)
ORDER BY date DESC
LIMIT 30;

-- Cost: Only pay for data scanned (compressed files = lower cost)

Presto Query Engine

Built on open-source Trino (formerly PrestoSQL) for distributed SQL processing

Data Catalog Integration

Seamless integration with AWS Glue Data Catalog for schema and metadata management

Query Optimization

Built-in optimization techniques to minimize data scanning and reduce query costs

Data Formats & Performance

Support for multiple data formats with significant performance differences

Real-World Use Cases

Log Analytics & Monitoring

Technology, E-commerce, SaaS

Analyze application logs, web server logs, and CloudTrail data for insights and troubleshooting

Key Benefits

  • Real-time analysis of streaming log data
  • Cost-effective storage of historical logs in S3
  • Integration with CloudWatch and X-Ray
  • Custom dashboards and alerting

Implementation

CloudTrail → S3 → Glue Crawler → Athena queries → QuickSight dashboards

Business Intelligence & Reporting

Retail, Finance, Healthcare

Enable self-service analytics for business users with familiar SQL interface

Key Benefits

  • Direct connection to popular BI tools (Tableau, PowerBI)
  • No ETL required for ad-hoc analysis
  • Governed access through IAM and Lake Formation
  • Cost predictability with query-based pricing

Implementation

Data Lake → Glue Catalog → Athena → BI Tools (Tableau/PowerBI)

Data Lake Exploration

Research, Analytics, Data Science

Discover and explore raw data in data lakes before building formal pipelines

Key Benefits

  • Schema-on-read for flexible data exploration
  • No upfront infrastructure investment
  • Support for nested and complex data types
  • Integration with Jupyter notebooks and data science tools

Implementation

Raw data → S3 → Athena exploration → Validated schemas → Production pipelines

Cost Optimization Analytics

Cloud Operations, FinOps, DevOps

Analyze AWS billing and usage data to identify cost optimization opportunities

Key Benefits

  • Direct analysis of Cost and Usage Reports (CUR)
  • Custom cost allocation and chargeback reports
  • Trend analysis and forecasting
  • Integration with cost management tools

Implementation

CUR → S3 → Athena queries → Cost dashboards and alerts

Security & Compliance Analysis

Financial Services, Healthcare, Government

Analyze security logs and compliance data across AWS services

Key Benefits

  • Centralized security log analysis
  • Compliance reporting and auditing
  • Threat detection and investigation
  • Integration with security tools (GuardDuty, Security Hub)

Implementation

Security logs → S3 → Athena analysis → Security dashboards and alerts

No quiz questions available
Quiz ID "athena" not found