What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit originally built at SoundCloud. It features a multi-dimensional data model with time series data identified by metric name and key/value pairs, a flexible query language (PromQL), and autonomous single server nodes with no reliance on distributed storage.
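Unlike push-based monitoring systems, where agents send data to a central collector, Prometheus pulls metrics: the server periodically scrapes an HTTP endpoint exposed by each target. Combined with service discovery, this suits dynamic, containerized environments where services appear and disappear frequently.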
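To make the pull model concrete, here is a minimal sketch in Python using only the standard library. It is not a Prometheus client library: the `SAMPLES` data, the `render_metrics` helper, and the `demo` function are hypothetical names chosen for illustration. The target exposes its current values as plain text on `/metrics`, and the scraper fetches them over HTTP.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory metrics: (metric name, label pairs) -> current value.
SAMPLES = {
    ("http_requests_total", (("method", "GET"), ("status", "200"))): 1027,
    ("http_requests_total", (("method", "GET"), ("status", "500"))): 3,
}


def render_metrics(samples):
    """Render samples as lines of the form name{label="value",...} value."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """The target side: answers GET /metrics with the current sample values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def scrape(url):
    """The server side of the pull model: fetch one snapshot from a target."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Start a target on a free local port, scrape it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()


print(render_metrics(SAMPLES), end="")
```

Calling `demo()` returns the same text that `render_metrics(SAMPLES)` prints. The target never needs to know where the monitoring server lives; it only has to answer HTTP requests.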
Prometheus Architecture Components
Prometheus Server
Core component that scrapes, stores, and serves metrics.
• Store in local TSDB
• Serve PromQL queries
Alertmanager
Receives alerts fired by the Prometheus server, then deduplicates, groups, and routes them to receivers such as email or a paging service.
• Grouping & routing
• Silencing & inhibition
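Grouping is easiest to see with a small example. The sketch below is a conceptual illustration in Python, not Alertmanager's implementation; `group_alerts` and the sample alerts are hypothetical. Alerts that share the same values for the chosen labels end up in one group, which could then be sent as a single notification.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in `group_by`."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts missing a grouping label are keyed with an empty value.
        key = tuple((label, alert["labels"].get(label, "")) for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HighCPU", "cluster": "eu", "instance": "a:9100"}},
    {"labels": {"alertname": "HighCPU", "cluster": "eu", "instance": "b:9100"}},
    {"labels": {"alertname": "HighCPU", "cluster": "us", "instance": "c:9100"}},
]

grouped = group_alerts(alerts, ["alertname", "cluster"])
for key, members in grouped.items():
    print(dict(key), "->", len(members), "alert(s)")
```

Here three firing alerts collapse into two groups, one per cluster, so an outage affecting many instances produces a handful of notifications rather than one per instance.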
Exporters
Collect metrics from third-party systems that do not expose Prometheus metrics natively and serve them over HTTP for scraping.
• MySQL exporter
• Custom application metrics
Service Discovery
Automatically discover targets in dynamic environments.
• Consul, EC2, DNS
• File-based discovery
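File-based discovery is the simplest mechanism to script yourself: Prometheus reads target lists from JSON or YAML files referenced under `file_sd_configs` and picks up changes when the files are rewritten. The Python sketch below generates such a JSON document; the helper name `build_file_sd`, the addresses, and the label values are assumptions for illustration.

```python
import json


def build_file_sd(groups):
    """Build file_sd JSON: a list of {"targets": [...], "labels": {...}} objects."""
    doc = []
    for targets, labels in groups:
        doc.append({"targets": sorted(targets), "labels": dict(labels)})
    return json.dumps(doc, indent=2)


payload = build_file_sd([
    (["10.0.0.5:9100", "10.0.0.4:9100"], {"env": "prod", "team": "infra"}),
])
print(payload)
```

In practice a script like this would write `payload` to a file matched by the `files` pattern of a scrape job, ideally via a temporary file and an atomic rename so Prometheus never reads a half-written list.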
PromQL Query Examples
Basic Metrics
# CPU usage for a specific instance
cpu_usage_percent{instance="server1"}
# Per-second change in memory usage (a gauge, so use deriv() rather than rate())
deriv(memory_usage_bytes[5m])
Aggregation Queries
# Average CPU usage per job
avg(cpu_usage_percent) by (job)
# 95th percentile response time, aggregated across all series
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
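It helps to know that the percentile query above returns an estimate, not an exact value: the quantile is interpolated linearly inside the bucket where the target rank falls. The Python function below is a simplified sketch of that idea under stated assumptions (cumulative buckets sorted by upper bound, a lowest bucket starting at 0, and `q` between 0 and 1). It is not the Prometheus source code and skips several of its edge cases.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if bound == math.inf or in_bucket == 0:
                # Open-ended or empty bucket: fall back to the last known bound.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# 100 requests: 50 took <= 0.1s, 90 took <= 0.5s, all took <= 1.0s.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(f"estimated p95: {p95:.2f}s")
```

The 95th request falls halfway through the 0.5s to 1.0s bucket, so the estimate lands at 0.75s. The accuracy therefore depends on bucket boundaries being placed close to the latencies you care about.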
Alerting Rules
# High CPU usage
avg(cpu_usage_percent) by (instance) > 80
# High error rate (more than 10% of requests returning 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1
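The error-rate condition compares the rate of 5xx responses with the rate of all responses. The Python sketch below walks through that arithmetic on made-up counter samples. `simple_rate` is a deliberately simplified stand-in for `rate()`: it ignores counter resets and the extrapolation Prometheus applies at window edges, and all names and numbers are hypothetical.

```python
import re


def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def error_ratio(series):
    """`series` maps an HTTP status code to that counter's samples in the window."""
    total = sum(simple_rate(s) for s in series.values())
    errors = sum(
        simple_rate(s) for status, s in series.items() if re.fullmatch("5..", status)
    )
    return errors / total if total else 0.0


# A 5-minute (300 s) window of http_requests_total, one series per status code.
window = {
    "200": [(0, 1000), (300, 1900)],  # 3.0 req/s
    "500": [(0, 10), (300, 130)],     # 0.4 req/s
    "503": [(0, 0), (300, 30)],       # 0.1 req/s
}

ratio = error_ratio(window)
print(f"error ratio: {ratio:.3f}, firing: {ratio > 0.1}")
```

Both sides are summed across series before dividing. Dividing per series would pair each 5xx series only with itself, because the two sides of a PromQL binary operator are matched on identical label sets.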
Real-World Prometheus Implementations
SoundCloud
Original creators of Prometheus for monitoring their microservices architecture.
• 200+ microservices monitoring
• Real-time alerting for audio streaming
• Custom metrics for user engagement
• 500k time series, 50k samples/sec
DigitalOcean
Uses Prometheus for infrastructure monitoring and customer-facing metrics.
• Droplet performance monitoring
• Load balancer health checks
• Database cluster monitoring
• 1M+ time series across regions
GitLab
Monitors their DevOps platform with Prometheus and Grafana.
• CI/CD pipeline monitoring
• Git repository performance
• Application performance metrics
• Container orchestration monitoring
CoreOS
Integrated Prometheus into Container Linux for system monitoring.
• Kubernetes cluster monitoring
• etcd performance tracking
• Container resource utilization
• System-level metrics collection
Prometheus Best Practices
✅ Do
• Use meaningful metric and label names
• Keep cardinality low (avoid user IDs in labels)
• Use histograms for latency measurements
• Set appropriate scrape intervals (15s is a common choice; the built-in default is 1m)
• Use recording rules for expensive queries
• Monitor Prometheus itself
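The advice on cardinality follows from simple multiplication: every distinct combination of label values is its own time series. The Python sketch below estimates an upper bound for a single metric; the label names and counts are made up for illustration.

```python
from math import prod


def estimated_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_values.values())


bounded = {"method": 5, "status": 8, "instance": 20}
with_user_id = {**bounded, "user_id": 100_000}

print(estimated_series(bounded))       # 800
print(estimated_series(with_user_id))  # 80000000
```

Adding one unbounded label turns a metric with hundreds of series into one with tens of millions, which is why identifiers such as user IDs belong in logs or traces rather than in labels.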
❌ Don't
• Put unbounded values in labels
• Use Prometheus for detailed logging
• Rely on local storage alone for long-term retention
• Scrape more often than needed (adds load on targets and storage)
• Use too many labels per metric
• Ignore storage and memory requirements
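Storage requirements can be estimated before deployment. The Python sketch below applies the rule of thumb from the Prometheus storage documentation: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample, with about 1 to 2 bytes per sample after compression. The function names are hypothetical, and the 10-second scrape interval is an assumption chosen so the ingestion rate matches the 500k series and 50k samples/sec figures quoted above.

```python
def samples_per_second(active_series, scrape_interval_s):
    """Each active series yields one sample per scrape interval."""
    return active_series / scrape_interval_s


def disk_bytes(retention_days, samples_per_s, bytes_per_sample=2.0):
    """Rough disk need: retention (s) * ingestion rate * bytes per sample."""
    return retention_days * 86_400 * samples_per_s * bytes_per_sample


rate = samples_per_second(500_000, 10)
needed = disk_bytes(15, rate)  # 15 days is the default local retention
print(f"{rate:.0f} samples/s, ~{needed / 1e9:.1f} GB for 15 days")
```

Treat the result as an order-of-magnitude figure. Actual usage varies with how well the samples compress and with series churn, and memory use grows with the number of active series rather than with retention.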