Apache Pulsar: Cloud-Native Messaging

Master Pulsar for next-generation messaging, multi-tenancy, and serverless stream processing at scale


What is Apache Pulsar?

Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now developed under the Apache Software Foundation. It provides a unified model for both queuing and streaming use cases with multi-tenancy, geo-replication, and tiered storage as core features.

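Unlike messaging systems in which each broker stores the data it serves, Pulsar separates serving from storage. Stateless brokers handle client traffic, while Apache BookKeeper stores messages durably. Because brokers hold no persistent message data, they can be added or replaced without copying data, storage scales independently by adding bookies, and retention is limited mainly by storage capacity, which tiered storage can extend. This makes Pulsar well suited to cloud-native applications that need both low-latency and high-throughput messaging.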


Pulsar Architecture Components

Brokers (Serving Layer)

Stateless service layer that handles client connections and message routing.

• HTTP and binary protocol support
• Message routing and filtering
• Load balancing and topic assignment
• Schema validation and evolution
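Brokers do not take ownership of topics one at a time. Pulsar splits each namespace's hash range into "bundles", and the load manager assigns each bundle, together with every topic that hashes into it, to a single broker. The minimal sketch below models that lookup. The four equal bundles, the broker names, and the round-robin placement are hypothetical, and zlib.crc32 is only a stand-in for the hash function Pulsar actually uses.

```python
import zlib
from bisect import bisect_right

HASH_SPACE = 2**32  # bundle boundaries partition the 32-bit hash space


def bundle_boundaries(num_bundles: int) -> list[int]:
    """Lower bound of each bundle when the hash space is split evenly."""
    step = HASH_SPACE // num_bundles
    return [i * step for i in range(num_bundles)]


def bundle_for_topic(topic: str, boundaries: list[int]) -> int:
    """Index of the bundle whose hash range contains this topic."""
    h = zlib.crc32(topic.encode("utf-8"))  # stand-in hash, unsigned 32-bit value
    return bisect_right(boundaries, h) - 1


def assign_bundles(num_bundles: int, brokers: list[str]) -> dict[int, str]:
    """Hypothetical round-robin placement; real load managers use broker load reports."""
    return {b: brokers[b % len(brokers)] for b in range(num_bundles)}


boundaries = bundle_boundaries(4)
owners = assign_bundles(4, ["broker-1", "broker-2"])
topic = "persistent://company/orders/checkout-events"
bundle = bundle_for_topic(topic, boundaries)
print(topic, "-> bundle", bundle, "owned by", owners[bundle])
```

Ownership moves one whole bundle at a time, and the messages themselves live in BookKeeper. Reassigning a bundle to another broker therefore changes only who serves those topics and copies no message data.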

BookKeeper (Storage Layer)

Distributed storage system providing durability and consistency.

• Segment-based storage
• Write-ahead logging
• Horizontal scalability
• Data replication and recovery
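Each ledger (segment) is written to an ensemble of E bookies. Every entry goes to a write quorum of Qw of those bookies, and the write counts as durable once Qa of them acknowledge it. When E is larger than Qw, consecutive entries land on different subsets of bookies, which spreads load across the storage cluster.

The sketch below models BookKeeper's round-robin entry placement. The bookie names, the E=5 and Qw=3 values, and the can_acknowledge helper are illustrative. The helper also ignores ensemble changes: a real client replaces a failed bookie and keeps writing.

```python
def write_set(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Bookies that store a given entry under round-robin striping."""
    e = len(ensemble)
    if not 1 <= write_quorum <= e:
        raise ValueError("write quorum must be between 1 and the ensemble size")
    return [ensemble[(entry_id + i) % e] for i in range(write_quorum)]


def can_acknowledge(entry_id: int, ensemble: list[str], write_quorum: int,
                    ack_quorum: int, down: set[str]) -> bool:
    """True if enough live bookies in the entry's write set can reach the ack quorum.

    Simplification: ignores ensemble changes, where the client swaps in a new bookie.
    """
    live = [b for b in write_set(entry_id, ensemble, write_quorum) if b not in down]
    return len(live) >= ack_quorum


bookies = ["bookie-1", "bookie-2", "bookie-3", "bookie-4", "bookie-5"]  # E = 5
for entry in range(3):
    print(entry, write_set(entry, bookies, write_quorum=3))  # Qw = 3
```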

ZooKeeper (Coordination)

Metadata storage and cluster coordination service.

• Topic metadata management
• Broker discovery and load balancing
• Configuration storage
• Leader election

Pulsar Proxy

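Optional gateway for clients that cannot connect to brokers directly, for example inside Kubernetes or behind a firewall.

• Single entry point that forwards client connections to brokers
• Authentication and authorization of clients
• Topic lookups routed to the broker that owns each topic
• Forwarding of both binary-protocol and HTTP requests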

Pulsar Unique Features

Multi-Tenancy

Built-in isolation at tenant and namespace levels with authentication and resource quotas.

# Topic naming hierarchy
persistent://tenant/namespace/topic

# Example: E-commerce company structure
persistent://company/orders/checkout-events
persistent://company/analytics/user-behavior
persistent://company/notifications/email-queue
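Clients and admin tools resolve names against this hierarchy, and short names fall back to defaults: a bare name such as my-topic means persistent://public/default/my-topic. The parser below is a simplified sketch of those rules. TopicName and parse_topic are hypothetical helpers, not the Pulsar client API, and the sketch does not handle legacy cluster-qualified names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    domain: str       # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    local_name: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.local_name}"


def parse_topic(name: str) -> TopicName:
    """Split a topic name into its parts, filling in defaults for short names."""
    if "://" in name:
        domain, rest = name.split("://", 1)
        parts = rest.split("/")
    else:
        domain, parts = "persistent", name.split("/")
        if len(parts) == 1:                  # bare topic name -> public/default
            parts = ["public", "default", parts[0]]
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)


print(parse_topic("persistent://company/orders/checkout-events"))
print(parse_topic("my-topic"))
```

Retention, offload, and replication policies are set on tenant/namespace. Extracting those two parts is therefore usually the first step in any tooling that reasons about a topic.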

Tiered Storage

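Offloads older, sealed topic segments (ledgers) from BookKeeper to cheaper long-term storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. Consumers keep reading the offloaded data transparently through the same topic.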

Tiered Storage Configuration
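# Configure a tiered storage (offload) policy for a namespace
# --driver selects the offload backend; --bucket takes a bucket name, not an s3:// URI
bin/pulsar-admin namespaces set-offload-policies \
  --driver aws-s3 \
  --bucket my-bucket \
  --region us-west-2 \
  --offloadAfterElapsed 7d \
  my-tenant/my-namespace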
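Offloading operates on ledgers, the segments that make up a topic's log. Only sealed ledgers are moved, and the ledger currently being written always stays in BookKeeper. Besides policies like the one above, offloading can be triggered by a size threshold (pulsar-admin namespaces set-offload-threshold). In that case, roughly the newest threshold-worth of data stays in BookKeeper and older sealed ledgers are offloaded.

The sketch below is a simplified model of that size-based selection. The Ledger record and ledgers_to_offload function are hypothetical, and the broker's real scheduling has more conditions than shown here.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    ledger_id: int
    size_bytes: int
    sealed: bool             # only closed ledgers are eligible for offload
    offloaded: bool = False


def ledgers_to_offload(ledgers: list[Ledger], threshold_bytes: int) -> list[int]:
    """IDs to offload, oldest first; `ledgers` must be ordered oldest -> newest."""
    cumulative = 0           # bytes in this ledger plus everything newer
    selected = []
    for ledger in reversed(ledgers):         # walk newest -> oldest
        cumulative += ledger.size_bytes
        if ledger.offloaded or not ledger.sealed:
            continue
        if cumulative > threshold_bytes:     # beyond the BookKeeper size budget
            selected.append(ledger.ledger_id)
    return selected[::-1]


topic_ledgers = [
    Ledger(1, 400_000_000, sealed=True),
    Ledger(2, 400_000_000, sealed=True),
    Ledger(3, 400_000_000, sealed=True),
    Ledger(4, 100_000_000, sealed=False),   # currently being written
]
print(ledgers_to_offload(topic_ledgers, threshold_bytes=600_000_000))  # [1, 2]
```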

Geo-Replication

Asynchronous, namespace-level replication across clusters in different data centers or regions, supporting active-active deployments and disaster recovery.

Geo-Replication Setup
# Set up geo-replication
bin/pulsar-admin namespaces set-clusters \
  --clusters us-west,us-east,eu-central \
  my-tenant/my-namespace

# Messages published in any listed cluster are replicated asynchronously to the others
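Replication runs per topic and asynchronously. For each remote cluster in the namespace's list, the local cluster forwards the messages that were published locally. Replicated copies are marked with the cluster they came from, and a marked copy is never forwarded again, which keeps a full mesh of clusters from looping messages indefinitely.

The sketch below models only that forwarding rule. The Message class and replicate function are hypothetical, and the sketch leaves out ordering, acknowledgements, and per-message replication overrides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    payload: str
    replicated_from: Optional[str] = None  # origin cluster, set only on replicated copies


def replicate(msg: Message, local: str, clusters: list[str]) -> dict[str, Message]:
    """Copies the local cluster forwards to each remote cluster."""
    if msg.replicated_from is not None:
        return {}  # already a replica: forwarding it again would create a loop
    return {
        remote: Message(msg.payload, replicated_from=local)
        for remote in clusters
        if remote != local
    }


clusters = ["us-west", "us-east", "eu-central"]
copies = replicate(Message("order-42"), "us-west", clusters)
print(sorted(copies))                                   # remote clusters receiving a copy
print(replicate(copies["us-east"], "us-east", clusters))  # a replica is not re-forwarded
```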

Real-World Pulsar Implementations

Yahoo! / Verizon Media

Original creators of Pulsar, using it for massive-scale advertising and content delivery.

  • 3M+ topics across 10+ data centers
  • 100+ billion messages per day
  • Real-time ad bidding and analytics
  • Content recommendation pipelines

Splunk

Uses Pulsar for log aggregation and real-time data pipeline processing.

  • High-throughput log ingestion
  • Real-time search indexing
  • Multi-tenant data isolation
  • Tiered storage for cost optimization

Tencent

Leverages Pulsar for gaming, social media, and fintech applications.

  • Gaming event streaming
  • Social media activity feeds
  • Payment transaction processing
  • Real-time recommendation systems

Narvar

Implements Pulsar for e-commerce logistics and package tracking systems.

  • Package tracking events
  • Delivery notification systems
  • Returns and exchanges processing
  • Customer communication workflows

Pulsar vs Apache Kafka

Pulsar Advantages

  • Native multi-tenancy with isolation
  • Effectively unbounded message retention with tiered storage
  • Unified queuing and streaming model through subscription types
  • Built-in geo-replication
  • Serverless functions (Pulsar Functions)
  • Built-in schema registry and schema evolution
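The unified model comes from subscription types:

  • Exclusive or Failover: delivers a topic in order to one active consumer (streaming semantics).
  • Shared: spreads messages across all consumers with no ordering guarantee (queue semantics).
  • Key_Shared: keeps all messages with the same key on the same consumer.

The sketch below simulates dispatch for a single subscription. Round-robin for Shared and hash-modulo for Key_Shared are simplifications of the broker's real dispatchers, which account for consumer permits and hash ranges. The consumer names are made up.

```python
import zlib
from collections import defaultdict


def dispatch(messages: list[tuple[str, int]], consumers: list[str],
             sub_type: str) -> dict[str, list[int]]:
    """Assign (key, value) messages to consumers of one subscription."""
    out: dict[str, list[int]] = defaultdict(list)
    for i, (key, value) in enumerate(messages):
        if sub_type in ("Exclusive", "Failover"):
            target = consumers[0]  # single active consumer sees the full ordered stream
        elif sub_type == "Shared":
            target = consumers[i % len(consumers)]  # round-robin, like a work queue
        elif sub_type == "Key_Shared":
            # same key -> same consumer; crc32 stands in for Pulsar's key hashing
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription type: {sub_type}")
        out[target].append(value)
    return dict(out)


msgs = [("user-a", 1), ("user-b", 2), ("user-a", 3), ("user-c", 4)]
for sub in ("Exclusive", "Shared", "Key_Shared"):
    print(sub, dispatch(msgs, ["consumer-1", "consumer-2"], sub))
```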

When to Choose Pulsar

  • Multi-tenant cloud environments
  • Long-term data retention requirements
  • Mixed queuing and streaming workloads
  • Global/multi-region deployments
  • Cloud-native, serverless architectures
  • Need to scale serving and storage independently, without rebalancing data

Pulsar Best Practices

✅ Do

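  • Design tenant and namespace structure carefully, since most policies apply per namespace
  • Choose subscription types that match your ordering needs (Exclusive or Failover for strict order, Key_Shared for per-key order)
  • Handle backpressure by sizing producer pending-message limits and consumer receiver queues
  • Configure tiered storage for cost optimization
  • Monitor BookKeeper cluster health, such as disk usage and write latency
  • Use the built-in schema registry with an explicit compatibility strategy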
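Backpressure in Pulsar clients comes from bounded queues. A producer limits how many messages may be in flight without a broker acknowledgement, and a consumer limits how many messages it prefetches. In the Java client, these limits are maxPendingMessages on the producer builder (with blockIfQueueFull choosing between waiting and failing fast) and receiverQueueSize on the consumer builder.

The sketch below models only the fail-fast producer case. BoundedProducer and ProducerQueueFull are hypothetical names, not client classes.

```python
from collections import deque


class ProducerQueueFull(Exception):
    """Raised when the pending-message limit is reached and blocking is disabled."""


class BoundedProducer:
    """Hypothetical model of a producer's pending-message limit."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.pending: deque = deque()  # sent but not yet acknowledged by the broker

    def send_async(self, msg: str) -> None:
        if len(self.pending) >= self.max_pending:
            raise ProducerQueueFull(f"{len(self.pending)} messages awaiting ack")
        self.pending.append(msg)

    def on_broker_ack(self) -> str:
        """The broker acknowledges the oldest in-flight message, freeing a slot."""
        return self.pending.popleft()


producer = BoundedProducer(max_pending=2)
producer.send_async("order-1")
producer.send_async("order-2")
try:
    producer.send_async("order-3")  # exceeds the limit while two sends are in flight
except ProducerQueueFull as exc:
    print("back off:", exc)
producer.on_broker_ack()            # order-1 confirmed
producer.send_async("order-3")      # now fits
```

Failing fast pushes the decision (retry, shed, or slow down) to the application. Blocking instead couples the caller's speed directly to broker acknowledgements.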

❌ Don't

  • Create large numbers of topics without planning for their metadata and memory overhead
  • Ignore deduplication requirements when producers may retry sends
  • Undersize BookKeeper storage capacity
  • Mix workloads with very different latency or throughput profiles without isolating them, for example in separate namespaces
  • Forget to set appropriate retention policies
  • Ignore subscription backlog monitoring
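Broker-side deduplication is enabled broker-wide with brokerDeduplicationEnabled or per namespace. It relies on producer sequence IDs: each producer name has its own increasing sequence, and the broker tracks the highest sequence ID it has persisted for that producer. Any message whose ID is not higher is dropped, so a producer that retries after a timeout cannot create a duplicate.

The sketch below models that check in memory. Deduplicator is a hypothetical class, and the real broker also snapshots this state so that it survives restarts.

```python
class Deduplicator:
    """Hypothetical model of per-producer sequence-ID deduplication."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = {}  # producer name -> highest persisted sequence ID
        self.log: list[tuple[str, int, str]] = []

    def publish(self, producer: str, seq_id: int, payload: str) -> bool:
        """Persist the message unless it is a duplicate; return True if stored."""
        if seq_id <= self.last_seq.get(producer, -1):
            return False  # a retry or replay of something already persisted
        self.last_seq[producer] = seq_id
        self.log.append((producer, seq_id, payload))
        return True


broker = Deduplicator()
broker.publish("producer-1", 0, "order-1")
broker.publish("producer-1", 1, "order-2")
broker.publish("producer-1", 1, "order-2")  # retried after a timeout: dropped
print(len(broker.log))
```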

📝 Apache Pulsar Knowledge Quiz


How does Apache Pulsar's architecture differ from traditional messaging systems?