Prometheus: Monitoring & Metrics Collection

Master Prometheus monitoring architecture, PromQL querying, and observability patterns for modern systems


What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit originally built at SoundCloud. It features a multi-dimensional data model with time series data identified by metric name and key/value pairs, a flexible query language (PromQL), and autonomous single server nodes with no reliance on distributed storage.

Unlike push-based monitoring systems, Prometheus pulls metrics: the server scrapes an HTTP endpoint on each target at a regular interval. Combined with service discovery, this suits dynamic, containerized environments where services appear and disappear frequently.

Prometheus Sizing Calculator

Total Series: 50,000
Samples/sec: 3,333
Storage (30d): 12GB
Memory Usage: 51GB
Daily Storage: 412MB/day
Query Latency: ~200ms
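The storage figures can be approximated with the capacity-planning rule from the Prometheus storage documentation: disk space is roughly retention time × ingested samples per second × bytes per sample, where a sample typically compresses to 1 to 2 bytes. The sketch below is a minimal illustration of that arithmetic. The function name and the 1.5 bytes-per-sample default are assumptions chosen for the example, and memory usage is not modeled.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(series: int, scrape_interval_s: float,
                        retention_days: float,
                        bytes_per_sample: float = 1.5) -> float:
    """Estimate on-disk TSDB size in bytes.

    bytes_per_sample=1.5 is an assumed midpoint of the documented
    1-2 bytes per sample; measure your own server for a real figure.
    """
    # Each series yields one sample per scrape.
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * SECONDS_PER_DAY
    return samples_per_second * bytes_per_sample * retention_seconds


if __name__ == "__main__":
    daily = estimate_disk_bytes(50_000, 15, 1)
    monthly = estimate_disk_bytes(50_000, 15, 30)
    print(f"{daily / 1e6:.0f} MB/day, {monthly / 1e9:.2f} GB over 30 days")
```

With 50,000 series scraped every 15 seconds, this gives a result in the same range as the daily and 30-day storage figures listed above.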

Prometheus Architecture Components

Prometheus Server

Core component that scrapes, stores, and serves metrics.

• Scrape targets via HTTP
• Store in local TSDB
• Serve PromQL queries

Alertmanager

Receives alerts from the Prometheus server and routes them to notification receivers.

• Deduplication
• Grouping & routing
• Silencing & inhibition
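Deduplication and grouping can be pictured as two passes over incoming alerts. The following is a conceptual sketch only, not Alertmanager's actual implementation: it assumes each alert is a dict with a "labels" mapping, treats alerts with identical label sets as duplicates, and then groups the survivors by a chosen list of label names (similar in spirit to the group_by setting of an Alertmanager route). The function name is hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by full label set, then group by selected labels."""
    unique = {}
    for alert in alerts:
        # Assumption: the sorted label pairs identify an alert.
        fingerprint = tuple(sorted(alert["labels"].items()))
        unique[fingerprint] = alert  # a repeated alert replaces the earlier copy

    groups = defaultdict(list)
    for alert in unique.values():
        # Missing labels are treated as empty strings for grouping.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    incoming = [
        {"labels": {"alertname": "HighCPU", "instance": "a"}},
        {"labels": {"alertname": "HighCPU", "instance": "a"}},  # duplicate
        {"labels": {"alertname": "HighCPU", "instance": "b"}},
    ]
    for key, members in group_alerts(incoming, ["alertname"]).items():
        print(key, len(members))
```

Grouping by alertname means one notification can cover every affected instance instead of one message per instance.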

Exporters

Collect metrics from third-party systems and expose them.

• Node exporter (system metrics)
• MySQL exporter
• Custom application metrics
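What an exporter ultimately serves is plain text in the Prometheus exposition format: a HELP line, a TYPE line, and one line per sample. The sketch below renders that text with the standard library only, as an illustration of the format rather than a replacement for an official client library. The function names and the app_queue_depth metric are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical application metric, for illustration only.
    print(render_metric(
        "app_queue_depth", "Jobs waiting in the queue.", "gauge",
        [({"queue": "email"}, 7), ({"queue": "reports"}, 2)],
    ), end="")
```

A real exporter would serve this text over HTTP, conventionally on a /metrics path, so the Prometheus server can scrape it.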

Service Discovery

Automatically discover targets in dynamic environments.

• Kubernetes API
• Consul, EC2, DNS
• File-based discovery
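File-based discovery is the easiest of these mechanisms to try by hand: Prometheus watches files containing a list of target groups, each with a "targets" list and optional "labels". The sketch below generates such a JSON document from an in-memory inventory. The function name, host names, and the "service" label are assumptions made for the example.

```python
import json


def build_file_sd(targets_by_service):
    """Build a file_sd JSON document from {service_name: [host:port, ...]}."""
    groups = []
    for service, targets in sorted(targets_by_service.items()):
        groups.append({
            "targets": sorted(targets),
            # "service" is a hypothetical label attached to every target.
            "labels": {"service": service},
        })
    return json.dumps(groups, indent=2)


if __name__ == "__main__":
    inventory = {
        "api": ["10.0.0.5:9100", "10.0.0.4:9100"],
        "db": ["10.0.1.7:9104"],
    }
    print(build_file_sd(inventory))
```

Writing the output to a file referenced by a file_sd_configs entry lets an external script or configuration tool add and remove targets without restarting Prometheus.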

PromQL Query Examples

Basic Metrics

# CPU usage
cpu_usage_percent{instance="server1"}

# Memory usage change per second (a gauge, so use deriv() rather than rate())
deriv(memory_usage_bytes[5m])

Aggregation Queries

# Average CPU across instances, per job
avg by (job) (cpu_usage_percent)

# 95th percentile response time, aggregated across instances
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
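To see why histogram_quantile returns an estimate rather than an exact value, it helps to work through the interpolation by hand. The sketch below is a simplified Python version of the idea: find the bucket that contains the target rank, then interpolate linearly inside it. It assumes cumulative bucket counts ending in a +Inf bucket and a lower bound of 0 for the first bucket, and it leaves out edge cases that the real function handles.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with math.inf as the last bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumed lower bound of first bucket
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the overflow bucket: report the highest
                # finite bound, since nothing better is known.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            # Linear interpolation inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, latency_buckets))
```

With 100 observations, the 95th falls halfway through the 0.5s to 1.0s bucket, so the estimate lands near 0.75s. The accuracy depends entirely on how well the bucket boundaries match the real latency distribution.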

Alerting Rules

# High CPU alert
avg(cpu_usage_percent) by (instance) > 80

# High error rate (more than 10% of requests return 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1
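The rate() function in the error-rate expression works on counters, which only increase except when a process restarts and the counter drops back to zero. The sketch below shows the core idea in simplified form: sum the increases between consecutive samples, treat any drop as a reset, and divide by the elapsed time. Unlike the real rate(), it does not extrapolate to the edges of the range window. The function names are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_s, value) samples."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter reset: everything since the restart counts as increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def error_ratio(error_samples, total_samples):
    """Fraction of requests that failed, from two counters' samples."""
    errors = simple_rate(error_samples)
    total = simple_rate(total_samples)
    if errors is None or not total:
        return None
    return errors / total


if __name__ == "__main__":
    errors = [(0, 0), (60, 6)]
    total = [(0, 0), (60, 60)]
    ratio = error_ratio(errors, total)
    print(f"error ratio: {ratio:.2f}, alert: {ratio > 0.1}")
```

Dividing two rates gives a unitless ratio, which is why the alert threshold is a fraction such as 0.1 rather than a request count.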

Real-World Prometheus Implementations

SoundCloud

Original creators of Prometheus for monitoring their microservices architecture.

• 200+ microservices monitoring
• Real-time alerting for audio streaming
• Custom metrics for user engagement
• 500k time series, 50k samples/sec

DigitalOcean

Uses Prometheus for infrastructure monitoring and customer-facing metrics.

• Droplet performance monitoring
• Load balancer health checks
• Database cluster monitoring
• 1M+ time series across regions

GitLab

Monitors their DevOps platform with Prometheus and Grafana.

• CI/CD pipeline monitoring
• Git repository performance
• Application performance metrics
• Container orchestration monitoring

CoreOS

Integrated Prometheus into their Container Linux for system monitoring.

• Kubernetes cluster monitoring
• etcd performance tracking
• Container resource utilization
• System-level metrics collection

Prometheus Best Practices

✅ Do

• Use meaningful metric and label names
• Keep cardinality low (avoid user IDs in labels)
• Use histograms for latency measurements
• Set an appropriate scrape interval (the built-in default is 1m; many setups use 15s)
• Use recording rules for expensive queries
• Monitor Prometheus itself

❌ Don't

• Put unbounded values in labels
• Use Prometheus for detailed logging
• Keep long-term data on local storage alone (use remote storage for long retention)
• Scrape more often than you need (each scrape adds load and storage)
• Attach too many labels to a metric
• Ignore storage and memory requirements
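The advice on cardinality comes down to multiplication: in the worst case, one metric produces a separate time series for every combination of label values. The sketch below makes that explicit. The function name and the label counts are illustrative assumptions, not measurements.

```python
from math import prod


def worst_case_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct label values."""
    return prod(label_value_counts.values())


if __name__ == "__main__":
    # Hypothetical label counts for an HTTP request counter.
    bounded = {"method": 5, "status": 10, "instance": 20}
    print(worst_case_series(bounded))

    # Adding an unbounded label such as a user ID multiplies the total.
    with_user_id = dict(bounded, user_id=100_000)
    print(worst_case_series(with_user_id))
```

Three bounded labels stay at about a thousand series, while a single user ID label pushes the same metric to a hundred million, which is why unbounded values do not belong in labels.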
