What is Apache Pulsar?
Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now developed under the Apache Software Foundation. It supports both queuing and streaming through a single messaging model, with multi-tenancy, geo-replication, and tiered storage as core features.
Unlike messaging systems that store data on the same brokers that serve it, Pulsar separates serving from storage: stateless brokers handle client traffic, while Apache BookKeeper stores messages durably. Because the two layers scale independently, brokers can be added without rebalancing stored data, and retention can be very long (effectively unbounded once tiered storage is used). This makes Pulsar a good fit for cloud-native applications that need both low latency and high throughput.
Pulsar Architecture Components
Brokers (Serving Layer)
Stateless service layer that handles client connections and message routing.
• Message routing and filtering
• Load balancing and topic assignment
• Schema validation and evolution
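Brokers do not own topics one at a time: Pulsar groups the topics of a namespace into bundles, each covering a slice of a 32-bit hash range, and the load manager assigns whole bundles to brokers. The sketch below shows that lookup under stated assumptions: four equal bundles, `zlib.crc32` standing in for the hash Pulsar actually uses, and a hypothetical `bundle_owner` table in place of the load manager's decisions.

```python
import bisect
import zlib

HASH_SPACE = 2**32  # bundles partition the 32-bit hash range [0x00000000, 0xFFFFFFFF]


def bundle_boundaries(num_bundles: int) -> list[int]:
    """Split the hash range into equal slices; returns num_bundles + 1 boundaries."""
    step = HASH_SPACE // num_bundles
    bounds = [i * step for i in range(num_bundles)]
    bounds.append(HASH_SPACE)  # the last bundle absorbs any remainder
    return bounds


def bundle_for_topic(topic: str, bounds: list[int]) -> int:
    """Index of the bundle whose [lower, upper) range contains the topic's hash."""
    h = zlib.crc32(topic.encode("utf-8"))  # stand-in hash (assumption); 0 <= h < 2**32
    return bisect.bisect_right(bounds, h) - 1


# Hypothetical bundle -> broker assignment, as a load manager might produce it.
bundle_owner = {0: "broker-1", 1: "broker-2", 2: "broker-3", 3: "broker-1"}

bounds = bundle_boundaries(4)
topic = "persistent://company/orders/checkout-events"
idx = bundle_for_topic(topic, bounds)
print(f"{topic} -> bundle {idx} -> {bundle_owner[idx]}")
```

Because ownership is tracked per bundle, the load manager can move a bundle to another broker (or split a hot bundle) by changing ownership metadata only; the messages themselves stay in BookKeeper.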
BookKeeper (Storage Layer)
Distributed storage system providing durability and consistency.
• Write-ahead logging
• Horizontal scalability
• Data replication and recovery
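BookKeeper's replication is governed by three numbers per ledger: the ensemble size E (bookies the ledger is spread over), the write quorum Qw (copies of each entry), and the ack quorum Qa (acknowledgements needed before a write is confirmed). The sketch below models the round-robin striping BookKeeper uses to choose each entry's bookies; `LedgerConfig`, `write_set`, and `is_acknowledged` are illustrative names, not BookKeeper APIs. In Pulsar, the broker-wide defaults are set with `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum`, and `managedLedgerDefaultAckQuorum` in `broker.conf`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerConfig:
    ensemble_size: int  # E: bookies holding pieces of the ledger
    write_quorum: int   # Qw: copies written for each entry
    ack_quorum: int     # Qa: acks required before the write is confirmed

    def __post_init__(self) -> None:
        if not (self.ensemble_size >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("require E >= Qw >= Qa >= 1")


def write_set(entry_id: int, cfg: LedgerConfig) -> list[int]:
    """Bookie indices (0..E-1) that receive entry_id under round-robin striping."""
    return [(entry_id + k) % cfg.ensemble_size for k in range(cfg.write_quorum)]


def is_acknowledged(acks_received: int, cfg: LedgerConfig) -> bool:
    """An entry is confirmed once Qa of its Qw bookies have acknowledged it."""
    return acks_received >= cfg.ack_quorum


cfg = LedgerConfig(ensemble_size=5, write_quorum=3, ack_quorum=2)
for entry in range(3):
    print(entry, write_set(entry, cfg))
# 0 [0, 1, 2]
# 1 [1, 2, 3]
# 2 [2, 3, 4]
```

With E larger than Qw, consecutive entries land on different bookies, spreading write load; with Qa smaller than Qw, a write can complete even if one bookie in its write set is slow.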
ZooKeeper (Coordination)
Metadata storage and cluster coordination service.
• Broker discovery and load balancing
• Configuration storage
• Leader election
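The classic ZooKeeper leader-election recipe has each candidate create an ephemeral, sequential child node; the candidate with the lowest sequence number leads, and when its session ends the node vanishes and the next candidate takes over. The in-memory sketch below illustrates that generic recipe with a hypothetical `FakeElectionZnode` class; it is not necessarily the exact mechanism Pulsar uses internally, and no ZooKeeper client is involved.

```python
import itertools
from typing import Dict, Optional


class FakeElectionZnode:
    """In-memory stand-in (hypothetical) for the parent znode of a ZooKeeper election."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._children: Dict[int, str] = {}  # sequence number -> candidate id

    def join(self, candidate: str) -> int:
        """Mimics creating an EPHEMERAL_SEQUENTIAL child: returns a unique, increasing number."""
        seq = next(self._counter)
        self._children[seq] = candidate
        return seq

    def session_expired(self, seq: int) -> None:
        """Ephemeral znodes disappear when the owning session ends (crash, partition)."""
        self._children.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The candidate holding the lowest sequence number is the leader."""
        if not self._children:
            return None
        return self._children[min(self._children)]


election = FakeElectionZnode()
s1 = election.join("broker-1")
election.join("broker-2")
election.join("broker-3")
print(election.leader())  # broker-1
election.session_expired(s1)  # broker-1's session is lost
print(election.leader())  # broker-2
```

In a real deployment each candidate watches only the node just ahead of it, so a leader failure wakes a single successor instead of every candidate.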
Pulsar Proxy
Optional gateway for clients that cannot reach brokers directly (for example, clients outside a Kubernetes cluster).
• Authentication and authorization
• Topic lookup and routing of connections to the owning broker
• Protocol translation
Pulsar's Unique Features
Multi-Tenancy
Built-in isolation at the tenant and namespace levels, with per-tenant authentication and authorization and namespace-level resource quotas. Every topic name encodes this hierarchy:
persistent://tenant/namespace/topic
# Example: an e-commerce tenant "company" with orders, analytics, and notifications namespaces
persistent://company/orders/checkout-events
persistent://company/analytics/user-behavior
persistent://company/notifications/email-queue
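Because the tenant and namespace are part of every topic name, tooling can recover the isolation boundary from the name alone. The minimal parser sketched below splits a name into its parts and expands a bare name such as `email-queue` to the built-in `public/default` namespace, as Pulsar does for short names. `TopicName` and `parse_topic` are illustrative names, not the client library's API, and legacy four-part names that include a cluster are rejected rather than handled.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str  # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    local_name: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.local_name}"


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name; expand a bare name to public/default."""
    if "://" not in name:
        # A bare topic name resolves to the built-in "public" tenant and "default" namespace.
        name = f"persistent://public/default/{name}"
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    tenant, namespace, local_name = parts
    return TopicName(domain, tenant, namespace, local_name)


t = parse_topic("persistent://company/orders/checkout-events")
print(t.tenant, t.namespace, t.local_name)  # company orders checkout-events
print(parse_topic("email-queue"))  # persistent://public/default/email-queue
```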
Tiered Storage
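Automatic offloading of older topic data from BookKeeper to cheaper long-term storage (such as Amazon S3 or Google Cloud Storage), while recent data stays in BookKeeper for low-latency reads.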
# Configure tiered storage: offload older topic data to an S3 bucket
bin/pulsar-admin namespaces set-offload-policies \
--driver aws-s3 \
--bucket my-bucket \
--region us-west-2 \
--offloadAfterElapsed 7d \
my-tenant/my-namespace
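Pulsar can also trigger offload by size: once a topic's data in BookKeeper exceeds the namespace offload threshold (settable with `pulsar-admin namespaces set-offload-threshold`), older ledgers are copied to the long-term store and later removed from BookKeeper. The function below is a simplified model of which ledgers get selected, assuming the broker keeps the newest data in BookKeeper and never offloads the ledger still open for writes; `ledgers_to_offload` is a hypothetical helper, not broker code.

```python
def ledgers_to_offload(sizes_oldest_first: list[int], threshold: int) -> list[int]:
    """Simplified model of size-triggered offload.

    sizes_oldest_first: byte sizes of a topic's ledgers, oldest first; the last
    element is the currently open ledger, which is never offloaded.
    Returns indices (oldest first) of closed ledgers to move to tiered storage,
    so that at most `threshold` bytes of the newest data remain in BookKeeper.
    """
    running = 0
    selected = []
    last = len(sizes_oldest_first) - 1
    for i in range(last, -1, -1):  # walk newest -> oldest
        running += sizes_oldest_first[i]
        if i != last and running > threshold:
            selected.append(i)
    return sorted(selected)


GB = 1024**3
# Hypothetical topic: five closed 4 GB ledgers plus a 1 GB open ledger, 10 GB threshold.
print(ledgers_to_offload([4 * GB] * 5 + [1 * GB], threshold=10 * GB))  # [0, 1, 2]
```

Here the three oldest ledgers are offloaded, leaving 9 GB (two closed ledgers plus the open one) in BookKeeper, just under the 10 GB threshold.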
Geo-Replication
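Asynchronous cross-datacenter replication, configured per namespace, for disaster recovery and global data access.
# Set up geo-replication (each cluster must already be registered and allowed for the tenant)
bin/pulsar-admin namespaces set-clusters \
--clusters us-west,us-east,eu-central \
my-tenant/my-namespace
# Messages published in any listed cluster are replicated asynchronously to the others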
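Replication in Pulsar is asynchronous: in each cluster, a replicator reads a topic's locally published messages and republishes them to the other clusters, and messages that arrived through replication are marked so they are not forwarded again. The in-memory sketch below shows why that marking prevents loops in a full mesh of clusters. All names are hypothetical, and real replicators work per topic, over the network, and continuously rather than in discrete passes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    payload: str
    replicated_from: Optional[str] = None  # None means a client published it locally


@dataclass
class Cluster:
    name: str
    log: List[Message] = field(default_factory=list)
    cursors: Dict[str, int] = field(default_factory=dict)  # remote cluster -> next index to ship


def replicate(clusters: List[Cluster]) -> None:
    """One asynchronous replication pass over every (source, destination) pair."""
    for src in clusters:
        for dst in clusters:
            if dst is src:
                continue
            start = src.cursors.get(dst.name, 0)
            for msg in src.log[start:]:
                # Only locally published messages are forwarded; copies that arrived
                # via replication are not sent onward, which prevents loops.
                if msg.replicated_from is None:
                    dst.log.append(Message(msg.payload, replicated_from=src.name))
            src.cursors[dst.name] = len(src.log)


west, east, eu = Cluster("us-west"), Cluster("us-east"), Cluster("eu-central")
west.log.append(Message("order-42"))  # a producer publishes in us-west
replicate([west, east, eu])
print([(c.name, len(c.log)) for c in (west, east, eu)])
# [('us-west', 1), ('us-east', 1), ('eu-central', 1)]
```

Running `replicate` again leaves every log unchanged, because the only new entries are replicated copies.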
Real-World Pulsar Implementations
Yahoo! / Verizon Media
Pulsar's original creator, which runs it at massive scale for advertising and content delivery.
• 3M+ topics across 10+ data centers
• 100+ billion messages per day
• Real-time ad bidding and analytics
• Content recommendation pipelines
Splunk
Uses Pulsar for log aggregation and real-time data pipeline processing.
• High-throughput log ingestion
• Real-time search indexing
• Multi-tenant data isolation
• Tiered storage for cost optimization
Tencent
Leverages Pulsar for gaming, social media, and fintech applications.
• Gaming event streaming
• Social media activity feeds
• Payment transaction processing
• Real-time recommendation systems
Narvar
Implements Pulsar for e-commerce logistics and package tracking systems.
• Package tracking events
• Delivery notification systems
• Returns and exchanges processing
• Customer communication workflows
Pulsar vs Apache Kafka
Pulsar Advantages
• Native multi-tenancy with isolation
• Effectively unbounded message retention with tiered storage
• Unified queuing and streaming model
• Built-in geo-replication
• Lightweight serverless compute (Pulsar Functions)
• Built-in schema registry with schema evolution
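The unified model comes from Pulsar's subscription types: the same topic behaves like a stream for an Exclusive or Failover subscription and like a work queue for a Shared one, while Key_Shared spreads load but keeps all messages with the same key on one consumer. The toy dispatcher below illustrates those delivery patterns; it is not Pulsar's dispatcher. Real Shared dispatch depends on consumer permits rather than strict round-robin, and `zlib.crc32` stands in for the hash Pulsar uses to assign keys.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Msg = Tuple[str, str]  # (key, value)


def dispatch(messages: List[Msg], consumers: List[str], sub_type: str) -> Dict[str, List[Msg]]:
    """Toy dispatcher showing how one topic serves both streaming and queuing."""
    out: Dict[str, List[Msg]] = defaultdict(list)
    if sub_type in ("Exclusive", "Failover"):
        # One active consumer gets every message in publish order (streaming).
        # With Failover, a standby consumer would take over if it disconnected.
        out[consumers[0]].extend(messages)
    elif sub_type == "Shared":
        # Messages are spread across consumers (work queue); round-robin here.
        for i, msg in enumerate(messages):
            out[consumers[i % len(consumers)]].append(msg)
    elif sub_type == "Key_Shared":
        # All messages with the same key go to the same consumer, preserving per-key order.
        for msg in messages:
            out[consumers[zlib.crc32(msg[0].encode()) % len(consumers)]].append(msg)
    else:
        raise ValueError(f"unknown subscription type: {sub_type!r}")
    return dict(out)


msgs = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d")]
print(dispatch(msgs, ["c1", "c2"], "Shared"))
# {'c1': [('user-1', 'a'), ('user-1', 'c')], 'c2': [('user-2', 'b'), ('user-3', 'd')]}
```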
When to Choose Pulsar
• Multi-tenant cloud environments
• Long-term data retention requirements
• Mixed queuing and streaming workloads
• Global/multi-region deployments
• Cloud-native, serverless architectures
• Simpler scaling operations (adding brokers requires no data rebalancing)
Pulsar Best Practices
✅ Do
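• Design tenant and namespace structure carefully
• Choose subscription types that match your ordering needs (Exclusive or Failover for strict ordering, Key_Shared for per-key ordering)
• Implement backpressure handling (bound pending producer messages and consumer receiver queues)
• Configure tiered storage for cost optimization
• Monitor BookKeeper cluster health
• Use the schema registry to manage schema evolution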
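Backpressure in a Pulsar producer usually means bounding how many messages can wait for broker acknowledgement. The Python client's `create_producer` exposes this through `max_pending_messages` and `block_if_queue_full`, and consumers bound prefetching with `receiver_queue_size`. The sketch below models the producer side with a semaphore; `BoundedSender` and `ProducerQueueIsFull` are hypothetical names, and the network is simulated by calling `on_ack` by hand.

```python
import threading
from typing import Optional


class ProducerQueueIsFull(Exception):
    """Raised when the pending-message limit is hit and blocking is disabled (or times out)."""


class BoundedSender:
    """Hypothetical wrapper that caps unacknowledged (in-flight) sends."""

    def __init__(self, max_pending: int, block_if_full: bool) -> None:
        self._slots = threading.BoundedSemaphore(max_pending)
        self._block = block_if_full

    def send(self, msg: bytes, timeout: Optional[float] = None) -> None:
        # Reserve a slot before handing the message to the (simulated) network.
        if self._block:
            acquired = self._slots.acquire(timeout=timeout)  # None waits until an ack frees a slot
        else:
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            raise ProducerQueueIsFull(f"pending limit reached; rejected {msg!r}")

    def on_ack(self) -> None:
        # The broker acknowledged one message, so its slot becomes free again.
        self._slots.release()


sender = BoundedSender(max_pending=2, block_if_full=False)
sender.send(b"m1")
sender.send(b"m2")
try:
    sender.send(b"m3")  # a third in-flight message exceeds the limit
except ProducerQueueIsFull as err:
    print("backpressure:", err)
sender.on_ack()  # m1 acknowledged
sender.send(b"m3")  # succeeds now
```

Failing fast (non-blocking) pushes the decision to the caller, while blocking slows the caller down to the broker's pace; which is appropriate depends on whether the application can tolerate stalls.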
❌ Don't
• Create large numbers of topics without planning
• Ignore message deduplication requirements
• Undersize BookKeeper cluster storage
• Mix workloads with very different latency and throughput profiles without isolating them
• Forget to set appropriate retention policies
• Neglect subscription backlog monitoring
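Broker-side deduplication (enabled per namespace, for example with `pulsar-admin namespaces set-deduplication my-tenant/my-namespace --enable`) relies on producers attaching increasing sequence IDs: the broker remembers the highest sequence ID it has persisted for each producer name and discards anything at or below it, so a retry after a lost acknowledgement is not stored twice. The class below is a simplified single-topic model of that rule; `DedupLog` is a hypothetical name, and the sketch ignores how the real broker recovers this state after failures.

```python
from typing import Dict, List, Tuple


class DedupLog:
    """Simplified model of broker-side deduplication for one topic."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, int, bytes]] = []
        self._last_seq: Dict[str, int] = {}  # producer name -> highest persisted sequence id

    def publish(self, producer: str, sequence_id: int, payload: bytes) -> bool:
        """Persist the message unless it repeats an already-seen sequence id."""
        if sequence_id <= self._last_seq.get(producer, -1):
            return False  # duplicate (e.g., a retry after a lost ack): not stored again
        self.entries.append((producer, sequence_id, payload))
        self._last_seq[producer] = sequence_id
        return True


log = DedupLog()
log.publish("orders-producer", 0, b"order-1")
log.publish("orders-producer", 1, b"order-2")
log.publish("orders-producer", 1, b"order-2")  # retried send after a timeout
print(len(log.entries))  # 2
```

Because state is keyed by producer name, a producer that restarts must reuse the same name for its retries to be recognized as duplicates.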