application priority with fairscheduler - cloudera - hadoop

Our Hadoop cluster uses the Fair Scheduler, and I have jobs submitted by two different users: some jobs are submitted by the aaa user and other jobs are submitted by the bbb user.
yarn-site.xml
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
</property>
<property>
<name>yarn.resourcemanager.max-completed-applications</name>
<value>10000</value>
</property>
I don't see any queues created as described in the following document, and it seems that a new queue is created under root.users for each user.
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
If I set the priority of jobs submitted by the aaa user higher than that of the bbb user, what would that mean?
Would it mean that all jobs submitted by the bbb user wait until the jobs submitted by the aaa user have completed?
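For reference, here is a rough sketch of what expressing such a preference could look like in a Fair Scheduler allocation file (fair-scheduler.xml). The queue names root.users.aaa and root.users.bbb and the weights are assumptions based on the per-user queues described above, not settings from this cluster. With the Fair Scheduler, a higher weight gives the aaa queue a larger share of the cluster when both queues are busy, but it does not starve the bbb queue: bbb's jobs still run with a smaller share instead of waiting for aaa's jobs to finish.
<allocations>
  <!-- illustrative weights only: aaa gets roughly twice bbb's share under contention -->
  <queue name="users">
    <queue name="aaa">
      <weight>2.0</weight>
    </queue>
    <queue name="bbb">
      <weight>1.0</weight>
    </queue>
  </queue>
</allocations>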

Related

HBase's RegionServer crashes

I'm trying to create about 589 tables and make random insertions. I process table by table: I create one table, make all of my insertions into it, then create the next one, until all of the data is ingested.
The architecture of this solution is:
A Python client located on one machine, which ingests data into HBase.
A Cloudera server hosting HBase in stand-alone mode, which is a VM located on the same machine as the client and identified by its IP address. The characteristics of this server are as follows: 64GB of storage, 4GB of RAM and 1 CPU.
The client communicates with an HBase Thrift Server.
So the problem is that when I try to ingest that amount of data, the client is only able to create and insert about 300MB before the RegionServer shuts down (about 45 tables created with their respective rows inserted, and then the server crashes during the 46th table's data ingestion). I have tested this with different machine characteristics, and the size of the ingested data varies from machine to machine (if the machine has more memory, more data gets inserted; I have tested this with different VM hardware characteristics). I suspect the problem comes from the management of the Java heap memory, so I have tried different configurations, but it didn't make things better. Here is my main HBase configuration:
hbase-site.xml
<property>
<name>hbase.rest.port</name>
<value>8070</value>
<description>The port for the HBase REST server.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://quickstart.cloudera:8020/hbase</value>
</property>
<property>
<name>hbase.regionserver.ipc.address</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.master.ipc.address</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.thrift.info.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>10737418240</value> <!-- 10 GB -->
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>33554432</value> <!-- 32 MB -->
</property>
<property>
<name>hbase.client.write.buffer</name>
<value>8388608</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>64</value>
</property>
hbase-env.sh
# The maximum amount of heap to use. Default is left to JVM default.
export HBASE_HEAPSIZE=4G
# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=4g -XX:MaxPermSize=4g"
Here is the error that I get from the Master Server's log:
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC):
pause of approximately 1016ms
No GCs detected
and nothing appears in the RegionServer's log.
On the other hand, when I try to create only one table and insert a greater amount of data, it works!
Any brilliant idea about how to fix this, please?
Thanks in advance.
Your VM's memory is way too low. Try bumping it up to at least 12GB. You're forgetting that a Java process's heap is only one part of the memory footprint. By setting HBASE_HEAPSIZE=4G you're saying you want HBase to allocate all your VM's memory. The VM also needs to run Linux daemons and your Cloudera services besides HBase.

How to configure monopolistic FIFO application queue in YARN?

I need to disable parallel execution of YARN applications in a Hadoop cluster. Right now YARN has its default settings, so several jobs can run in parallel, and I see no advantage in this because both jobs run slower.
I found the setting yarn.scheduler.capacity.maximum-applications, which limits the maximum number of applications, but it affects both submitted and running apps (as stated in the docs). I'd like to keep submitted apps in the queue until the currently running application is finished. How can this be done?
1) Change Scheduler to FairScheduler
Hadoop distributions use the CapacityScheduler by default (Cloudera uses the FairScheduler as its default scheduler). Add this property to yarn-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
2) Set default Queue
The Fair Scheduler creates a queue per user, i.e., if three different users submit jobs, three individual queues will be created and the resources will be shared among those three queues. Disable this by adding the following property in yarn-site.xml:
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
This ensures that all jobs go into a single default queue.
3) Restrict Maximum Applications
Now that the jobs have been limited to a single default queue, restrict the maximum number of applications that can run in that queue to 1.
Create a file named fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:
<allocations>
<queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
Also, add this property in yarn-site.xml
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>$HADOOP_CONF_DIR/fair-scheduler.xml</value>
</property>
Restart YARN services after adding these properties.
On submitting multiple applications, the application ACCEPTED first will be considered as the Active application and the remaining will be queued as Pending applications. These pending applications will continue to be in ACCEPTED state until the RUNNING application is FINISHED. The Active application will be allowed to utilise all the available resources.
Reference: Hadoop: Fair Scheduler
As per my understanding of your question, the settings above alone may not help you. Can you check the configuration below against your existing setup? It may give you a solution.
<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
  <queue name="<<Your Queue Name>>">
    <weight>40</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <queue name="<<Your Queue Name>>">
    <weight>60</weight>
    <queue name="<<Your Queue Name>>" />
    <queue name="<<Your Queue Name>>" />
  </queue>
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="<<Your Queue Name>>" />
  </queuePlacementPolicy>
</allocations>

changing default scheduler in hadoop 1.2.1

As FIFO has been the default scheduler in Hadoop 1.2.1, where exactly do I need to make changes to switch the default scheduler from FIFO to capacity or fair? I recently checked mapred-default.xml, which is present inside hadoop-core-1.2.1.jar as directed in this answer, but I couldn't figure out where to change the scheduling criteria. Please provide guidance. Thanks in advance.
Where exactly do I need to make changes to change the default scheduler from FIFO to capacity or fair?
In the mapred-site.xml
Fair Scheduler
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
Capacity Scheduler
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
Note, you may want to actually read the documentation from those links because they tell you how to set them up in detail.
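As a rough sketch of what that extra setup involves for the Capacity Scheduler on Hadoop 1.x (the single queue named default and the 100 percent capacity below are illustrative assumptions, and the capacity-scheduler jar from the contrib directory may also need to be on the JobTracker's classpath): queues are declared in mapred-site.xml and their capacities in capacity-scheduler.xml.
In mapred-site.xml:
<property>
<name>mapred.queue.names</name>
<value>default</value>
</property>
In capacity-scheduler.xml:
<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>100</value>
</property>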

MapReduce jobs get stuck in Accepted state

I have my own MapReduce code that I'm trying to run, but it just stays in the ACCEPTED state. I tried running another sample MR job that I had run previously and which was successful, but now both jobs stay in the ACCEPTED state. I tried changing various properties in mapred-site.xml and yarn-site.xml as mentioned here and here, but that didn't help either. Can someone please point out what could possibly be going wrong? I'm using hadoop-2.2.0.
I've tried many values for the various properties; here is one set of values:
In mapred-site.xml
<property>
<name>mapreduce.job.tracker</name>
<value>localhost:54311</value>
</property>
<property>
<name>mapreduce.job.tracker.reserved.physicalmemory.mb</name>
<value></value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>256</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>256</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>400</value>
<source>mapred-site.xml</source>
</property>
In yarn-site.xml
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>400</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>.3</value>
</property>
I've had the same effect and found that making more memory available per worker node and reducing the memory required for an application helped.
The settings I have (on my very small experimental boxes) in my yarn-site.xml:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
I had the same issue, and for me it was a nearly full hard drive (>90% full) that was the problem. Cleaning up space saved me.
A job stuck in the ACCEPTED state on YARN is usually because free resources are not sufficient. You can check it at http://resourcemanager:port/cluster/scheduler:
if Memory Used + Memory Reserved >= Memory Total, memory is not enough
if VCores Used + VCores Reserved >= VCores Total, VCores is not enough
It may also be limited by parameters such as maxAMShare.
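If maxAMShare is what is limiting you, and the cluster runs the Fair Scheduler, it can be adjusted in the allocation file; the sketch below uses a purely illustrative value of 0.8, not a recommendation:
<allocations>
  <!-- illustrative value: let ApplicationMasters use up to 80% of a queue's fair share -->
  <queueMaxAMShareDefault>0.8</queueMaxAMShareDefault>
</allocations>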
I am using Hadoop 3.0.1. I faced the same issue, where submitted MapReduce jobs were shown as stuck in the ACCEPTED state in the ResourceManager web UI. Also, in the same ResourceManager web UI, under Cluster metrics the Memory used was 0 and the Total Memory was 0, and under Cluster Node Metrics the Active Nodes count was 0, although the NameNode web UI listed the data nodes perfectly. Running yarn node -list on the cluster did not display any NodeManagers. It turned out that my NodeManagers were not running. After starting the NodeManagers, the newly submitted MapReduce jobs could proceed; they were no longer stuck in the ACCEPTED state and reached the RUNNING state.
I faced the same issue. I changed every configuration mentioned in the answers above, but it was still no use. After this, I re-checked the health of my cluster and observed that my one and only node was in an unhealthy state. The issue was due to a lack of disk space in my /tmp/hadoop-hadoopUser/nm-local-dir directory. The same can be checked via the node health status in the ResourceManager web UI at port 8032. To resolve this, I added the property below in yarn-site.xml.
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>98.5</value>
</property>
After restarting my Hadoop daemons, the node status changed to healthy and jobs started to run.
Setting the property yarn.resourcemanager.hostname to the master node's hostname in yarn-site.xml and copying this file to all the nodes in the cluster to reflect this configuration solved the issue for me.
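For reference, a minimal sketch of that property (the value below is a placeholder for your actual master node's hostname):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-node-hostname</value>
</property>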

Hadoop HA Namenode remote access

I'm configuring the Hadoop 2.2.0 stable release with an HA NameNode, but I don't know how to configure remote access to the cluster.
I have the HA NameNode configured with manual failover and I defined dfs.nameservices, and I can access HDFS via the nameservice from all the nodes included in the cluster, but not from outside.
I can perform operations on HDFS by contacting the active NameNode directly, but I don't want that; I want to contact the cluster and then be redirected to the active NameNode. I think this is the normal configuration for an HA cluster.
Does anyone know how to do that?
(Thanks in advance...)
You have to add more values to hdfs-site.xml:
<property>
<name>dfs.ha.namenodes.myns</name>
<value>machine-98,machine-99</value>
</property>
<property>
<name>dfs.namenode.rpc-address.myns.machine-98</name>
<value>machine-98:8100</value>
</property>
<property>
<name>dfs.namenode.rpc-address.myns.machine-99</name>
<value>machine-145:8100</value>
</property>
<property>
<name>dfs.namenode.http-address.myns.machine-98</name>
<value>machine-98:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.myns.machine-99</name>
<value>machine-145:50070</value>
</property>
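In addition, and assuming the nameservice is called myns as in the snippet above, clients usually also need a failover proxy provider configured so the HDFS client can locate the active NameNode by itself; this is a sketch of the standard setting rather than part of the original answer:
<property>
<name>dfs.client.failover.proxy.provider.myns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>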
You need to contact one of the NameNodes (as you're currently doing); there is no separate cluster endpoint to contact.
The Hadoop client code knows the addresses of the two NameNodes (in core-site.xml) and can identify which is the active and which is the standby. There might be a way to interrogate a ZooKeeper node in the quorum to identify the active / standby (maybe, I'm not sure), but you might as well check one of the NameNodes; you have a 50/50 chance it's the active one.
I'd have to check, but you might be able to query either if you're just reading from HDFS.
For the active NameNode you can always ask ZooKeeper.
You can get the active NameNode from the following ZK path:
/hadoop-ha/namenodelogicalname/ActiveStandbyElectorLock
There are two ways to resolve this situation (code with Java):
1. Use core-site.xml and hdfs-site.xml in your code and load the configuration via addResource.
2. Set the Hadoop configuration via conf.set in your code.
An example using conf.set is sketched below.
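Here is a minimal sketch of that approach (the nameservice myns, the NameNode IDs machine-98 and machine-99, and the hosts and ports are taken from the answer above and are assumptions about your setup, so adjust them to your cluster):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the logical nameservice, not at a single NameNode host
        conf.set("fs.defaultFS", "hdfs://myns");
        conf.set("dfs.nameservices", "myns");
        conf.set("dfs.ha.namenodes.myns", "machine-98,machine-99");
        conf.set("dfs.namenode.rpc-address.myns.machine-98", "machine-98:8100");
        conf.set("dfs.namenode.rpc-address.myns.machine-99", "machine-145:8100");
        // The failover proxy provider lets the client discover and follow the active NameNode
        conf.set("dfs.client.failover.proxy.provider.myns",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Alternative to conf.set: load the cluster's own config files instead
        // conf.addResource(new Path("/path/to/core-site.xml"));
        // conf.addResource(new Path("/path/to/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}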
