What is Apache YARN?
Apache YARN (Yet Another Resource Negotiator) is the resource management and job scheduling layer of Hadoop 2.0 and later. It lets multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, work on data stored in a single platform rather than requiring a separate cluster for each workload.
YARN separates resource management from the programming model by splitting the responsibilities of the Hadoop 1 JobTracker across separate components: a cluster-wide ResourceManager for resource management, a NodeManager on each node for launching containers and monitoring the node, and a per-application ApplicationMaster for application lifecycle management. This architecture allows multiple applications to run simultaneously while sharing cluster resources.
YARN Core Components
ResourceManager
Central authority managing cluster resources and scheduling applications globally.
• Application scheduling
• Queue management
• High availability support
• Web UI and REST APIs
NodeManager
Per-node agent managing containers and monitoring local resources.
• Local resource monitoring
• Log aggregation
• Health checking
• Security enforcement
ApplicationMaster
Per-application coordinator managing task execution and resource negotiation.
• Task coordination
• Progress monitoring
• Failure handling
• Application-specific logic
Containers
Resource allocation units providing isolated execution environments for tasks.
• Process isolation
• Environment setup
• Resource monitoring
• Cleanup on completion
Real-World YARN Implementations
Yahoo
Where YARN was originally developed; runs large clusters for web search and advertising analytics.
• 40,000+ nodes across its clusters
• Mixed MapReduce and Spark workloads
• Multi-tenant resource sharing
• Real-time and batch processing
eBay
Manages complex e-commerce analytics and recommendation systems.
• User behavior analysis
• Fraud detection systems
• Recommendation engines
• Financial reporting pipelines
LinkedIn
Powers professional networking analytics and machine learning workflows.
• Member connection analysis
• Job recommendation algorithms
• Content feed optimization
• A/B testing frameworks
Financial Services
Banks use YARN for risk analytics, regulatory reporting, and fraud detection.
• Risk calculation workflows
• Regulatory compliance reports
• Real-time fraud detection
• Customer analytics pipelines
YARN Configuration Examples
Resource Configuration
<configuration>
  <!-- ResourceManager Configuration -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.com:8088</value>
  </property>

  <!-- NodeManager Resource Configuration -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value>
    <description>Total memory available for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
    <description>Total CPU cores available for containers</description>
  </property>

  <!-- Container Memory Limits -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>

  <!-- Virtual Memory Settings -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Disable virtual memory checking</description>
  </property>
</configuration>
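The scheduler limits above only bound what applications may ask for; each framework still has to request container sizes of its own. As a rough sketch, assuming the cluster runs MapReduce jobs, a mapred-site.xml fragment along these lines keeps requests inside the 512 MB to 8192 MB window. The specific sizes are illustrative assumptions, not values from the configuration above, and the JVM heap is set to roughly 80% of each container as a common rule of thumb.

```xml
<!-- mapred-site.xml: illustrative values, chosen only to fit the limits above -->
<configuration>
  <!-- Container size requested for each map task (MB) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <!-- JVM heap for map tasks: about 80% of 1024 MB, leaving headroom
       for non-heap memory inside the container -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx819m</value>
  </property>

  <!-- Container size requested for each reduce task (MB) -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- JVM heap for reduce tasks: about 80% of 2048 MB -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1638m</value>
  </property>

  <!-- Container size for the MapReduce ApplicationMaster (MB) -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
</configuration>
```

With 12288 MB per NodeManager, at most twelve 1024 MB map containers fit on a node by memory (12288 / 1024 = 12). If the scheduler is also configured to account for CPU, the 8 vcores cap that at eight single-vcore containers.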
Fair Scheduler Configuration
<allocations>
  <!-- Production Queue -->
  <queue name="production">
    <weight>3.0</weight>
    <minResources>4096 mb,4 vcores</minResources>
    <maxResources>16384 mb,16 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <schedulingPolicy>fair</schedulingPolicy>
    <aclSubmitApps>prod_users</aclSubmitApps>
  </queue>

  <!-- Development Queue -->
  <queue name="development">
    <weight>1.0</weight>
    <minResources>2048 mb,2 vcores</minResources>
    <maxResources>8192 mb,8 vcores</maxResources>
    <maxRunningApps>5</maxRunningApps>
    <schedulingPolicy>drf</schedulingPolicy>
    <aclSubmitApps>dev_users</aclSubmitApps>
  </queue>

  <!-- Queue Placement Policy: rules are tried in order until one matches -->
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <!-- create must be false here: with create="true" this rule would
         always match and the default rule below could never be reached -->
    <rule name="user" create="false" />
    <rule name="default" queue="development"/>
  </queuePlacementPolicy>

  <!-- Global Settings (preemption timeouts are in seconds) -->
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
  <queueMaxAppsDefault>15</queueMaxAppsDefault>
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>
</allocations>
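An allocation file like the one above only takes effect when the ResourceManager is actually running the Fair Scheduler and can find the file. The following yarn-site.xml fragment is a minimal sketch of that wiring. The file path is a hypothetical placeholder, and preemption is switched on here on the assumption that the preemption timeouts in the allocation file are meant to be enforced.

```xml
<!-- yarn-site.xml: the allocation file path is a placeholder -->
<configuration>
  <!-- Select the Fair Scheduler as the ResourceManager's scheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <!-- Location of the allocation file that defines the queues -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>

  <!-- Preemption is off by default; without it the preemption
       timeouts in the allocation file are never acted on -->
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
</configuration>
```

With weights of 3.0 and 1.0, the production queue's fair share is three quarters of the cluster whenever both queues have pending demand, subject to each queue's maxResources cap. Note also that an aclSubmitApps value is read as comma-separated users, then a space, then comma-separated groups, so a bare `prod_users` is treated as a user name; if it is meant as a group, it needs a leading space.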
YARN Best Practices
✅ Do
• Configure appropriate memory and CPU allocations per node
• Use Fair or Capacity Scheduler for multi-tenant environments
• Enable ResourceManager high availability for production
• Monitor resource utilization and queue performance
• Set up log aggregation for centralized log management
• Configure preemption policies for resource guarantees
• Use node labels for heterogeneous hardware
• Implement health checks for NodeManager nodes
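Two of the recommendations above, ResourceManager high availability and log aggregation, come down to a handful of yarn-site.xml properties. The fragment below is a sketch under stated assumptions: the host names, cluster id, ZooKeeper quorum, log directory, and seven-day retention period are all placeholders, and it assumes a ZooKeeper ensemble is already available for leader election.

```xml
<!-- yarn-site.xml: host names, cluster id and paths are placeholders -->
<configuration>
  <!-- ResourceManager high availability (active/standby pair) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>example-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.com</value>
  </property>
  <!-- ZooKeeper quorum used for leader election; older releases
       use yarn.resourcemanager.zk-address for the same purpose -->
  <property>
    <name>hadoop.zk.address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>

  <!-- Log aggregation: copy container logs to shared storage
       once an application finishes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <!-- Keep aggregated logs for 7 days (7 * 24 * 3600 seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
```

Once both ResourceManagers are up, `yarn rmadmin -getServiceState rm1` reports whether rm1 is active or standby, and `yarn logs -applicationId <application ID>` retrieves the aggregated logs of a finished application.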
❌ Don't
• Allocate all node memory to YARN (leave 15-20% for the OS and other daemons)
• Leave virtual memory checking enabled without testing it (it often kills containers whose processes reserve far more virtual memory than they use)
• Use the FIFO scheduler in multi-tenant environments
• Ignore container memory leaks and resource violations
• Run the ResourceManager without high availability in production
• Set container memory limits too small (tasks thrash in garbage collection or are killed for exceeding their allocation)
• Forget to configure queue ACLs and user limits
• Neglect monitoring and alerting for cluster health
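To make the first point concrete, here is a sketch of NodeManager memory sizing for a hypothetical worker node with 64 GB of RAM. The node size is an assumption for illustration only, and the right reserve depends on what else runs on the node.

```xml
<!-- yarn-site.xml on a hypothetical 64 GB (65536 MB) worker node -->
<configuration>
  <!-- Give YARN 53 GB = 54272 MB and leave 11 GB (about 17% of the
       node) for the OS, the NodeManager itself and other daemons.
       54272 is also a multiple of a 512 MB minimum allocation
       (106 * 512), so no memory is stranded by rounding. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>54272</value>
  </property>
</configuration>
```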