ResourceManager web UI does not show job status - Hadoop

I have just configured a Hadoop cluster using CDH5. I can successfully run test jobs from the command line and get results, but the ResourceManager UI does not show the job status, even after completion. If I set mapreduce.framework.name to yarn in mapred-site.xml, the job fails, and the failure status does show up in the ResourceManager UI.
The test job I used:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.1.jar pi 16 10000
Here is my yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>rhel2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>rhel3</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>localhost:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>rhel2.had.com:2181,rhel3.had.com:2181,rhel4.had.com:2181</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>rhel2:9046</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Node Config -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/pseudo-dist/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/pseudo-dist/yarn/log</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
</configuration>
I haven't set any parameters in mapred-site.xml; the file is empty.
Please let me know what changes need to be made to mapred-site.xml or yarn-site.xml to get the web UI working.
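For reference, a minimal mapred-site.xml sketch that actually submits jobs to YARN looks like the following (the hostname rhel2 is only an assumption based on the RM hosts above; point the JobHistory addresses at whichever node runs that daemon):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- optional, but the RM UI links for finished jobs go through the JobHistory server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>rhel2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>rhel2:19888</value>
</property>
</configuration>
Once a job is really submitted to YARN and then fails, the output of yarn logs -applicationId <application id> is the place to look for the actual cause.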

Related

Hadoop history logs gone after reboot on ResourceManager UI page

I can see the logs on the ResourceManager UI page after running MR tasks.
But they are gone after I reboot the Hadoop cluster.
My configs are below. Any help is much appreciated; this has been broken for a long time.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop201:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop201:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/his_log/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/his_log</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/mr-stage-his</value>
<description></description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/resource_manager_logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop201:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>5184000</value>
</property>
I have reviewed the configs in mapred-site.xml and yarn-site.xml, but it still doesn't work.
I expect the logs to still be visible after a cluster reboot.
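One thing worth checking (an assumption, since the post doesn't say how the cluster is brought back up): the RM UI hands finished applications off to the JobHistory server, so that daemon has to be started again after every reboot, and the done-dir and aggregated-log paths above, having no hdfs:// scheme, resolve against the default filesystem. A quick check on Hadoop 3.1.3 might look like:
# on hadoop201 (assumed to be the history-server host, per the addresses above)
mapred --daemon start historyserver
jps    # should now list JobHistoryServer
# confirm the aggregated logs actually survived on the default filesystem
hdfs dfs -ls /opt/module/hadoop-3.1.3/logs/resource_manager_logs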

Yarn High Availability: ZKResourceManagerStateStore not found

I'm trying to set up Yarn to run in the HA configuration on Hadoop 2.7.3. When starting I receive the following error in the resource manager log file:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKResourceManagerStateStore not found
My yarn-site.xml is below:
<configuration>
<!-- Resource Manager Configs -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKResourceManagerStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>master:2181,slave1:2181,slave2:2181</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
<!-- ResourceManager1 configs -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>master:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>master:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>master:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>master:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>master:23141</value>
</property>
<!-- ResourceManager2 configs -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>slave1:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>slave1:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>slave1:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slave1:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>slave1:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>slave1:23141</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>master:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>master:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/pseudo-dist/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/pseudo-dist/yarn/log</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
</configuration>
replace
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKResourceManagerStateStore</value>
</property>
with
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
in yarn-site.xml and try again. ZKResourceManagerStateStore is not a real class name; the ZooKeeper-backed state store is ZKRMStateStore.
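Once the class name is fixed, a quick way to confirm both ResourceManagers come back up in HA mode is the rmadmin tool (rm1 and rm2 are the ids from the config above):
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
One should report active and the other standby.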

Dynamic Allocation on Hive on Spark

I configured spark engine in hive-site.xml using:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>4</value>
</property>
<property>
<name>spark.dynamicAllocation.initialExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>8</value>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.io.compression.codec</name>
<value>lzf</value>
</property>
<property>
<name>spark.yarn.jar</name>
<value>hdfs://VCluster1/user/spark/share/lib/spark-assembly-1.3.1-hadoop2.7.1.jar</value>
</property>
<property>
<name>spark.kryo.referenceTracking</name>
<value>false</value>
</property>
<property>
<name>spark.kryo.classesToRegister</name>
<value>org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch</value>
</property>
In yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
When I run a Hive on Spark job, dynamic allocation does not work. Spark sets spark.executor.instances to whatever number I set for spark.dynamicAllocation.initialExecutors and it never changes. Can anyone help me figure out the problem?
Thanks
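One thing worth verifying (an assumption, since the question doesn't show it): registering spark_shuffle in yarn-site.xml only helps if the Spark YARN shuffle jar is actually on every NodeManager's classpath; otherwise YarnShuffleService cannot be loaded and dynamic allocation never releases or requests executors. A rough check on each NodeManager host might look like:
# the path below is illustrative - use wherever your NodeManager picks up extra jars
ls /usr/lib/hadoop-yarn/lib/spark-1.3.1-yarn-shuffle.jar
# then restart the NodeManager and check its startup log for the spark_shuffle aux service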

HBase acl commands are too slow

I have configured HBase-1.1.5 and Hadoop-2.7.2 with Kerberos security.
I have enabled authorization for HBase.
When executing any authorization command such as user_permission, grant, or revoke, it takes more than 40 seconds to complete.
Below are my hbase-site.xml configuration properties:
<property>
<name>hbase.master</name>
<value>IP:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://IP:9000/HBase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>IP1:2181,IP2:2181,IP3:2181</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60020</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>60030</value>
</property>
<property>
<name>hbase.security.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hbase.master.keytab.file</name>
<value>masterkeytab</value>
</property>
<property>
<name>hbase.regionserver.keytab.file</name>
<value>regionserverkeytab</value>
</property>
<property>
<name>hbase.master.kerberos.principal</name>
<value>masterprincipal</value>
</property>
<property>
<name>hbase.regionserver.kerberos.principal</name>
<value>regionserverprincipal</value>
</property>
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
<name>hbase.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>hbase.superuser</name>
<value>#HadoopUser</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
Please help me improve the performance of the HBase ACL commands.
Thanks in advance
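To narrow down where those 40 seconds go, it can help to time one ACL call against a plain read from the shell; a rough sketch (the table name, principal, and keytab below are placeholders, and a valid Kerberos ticket is assumed):
kinit -kt /path/to/user.keytab user@EXAMPLE.COM    # placeholder principal and keytab
time echo "user_permission 't1'" | hbase shell
time echo "scan 't1', {LIMIT => 1}" | hbase shell
If the plain scan is fast while only the ACL commands are slow, the bottleneck is likely specific to the ACL path rather than to Kerberos authentication in general.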

Hadoop's Capacity Scheduler - Setting up multiple queues

I tried to set up two queues - queue1 and queue2.
I added the names of these queues to mapred-site.xml:
<property>
<name>mapred.queue.names</name>
<value>queue1,queue2</value>
</property>
I configured CapacityScheduler.xml as shown below.
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>3000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.init-accept-jobs-factor</name>
<value>10</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.init-accept-jobs-factor</name>
<value>10</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name>
<value>10</value>
</property>
<!-- Capacity scheduler Job Initialization configuration parameters -->
<property>
<name>mapred.capacity-scheduler.init-poll-interval</name>
<value>5000</value>
</property>
<property>
<name>mapred.capacity-scheduler.init-worker-threads</name>
<value>5</value>
</property>
</configuration>
Running bin/start-all.sh starts the following services:
17083 DataNode
17557 TaskTracker
17373 JobTracker
16902 NameNode
17279 SecondaryNameNode
17703 Jps
I am able to view the JobTracker web UI at
http://localhost:50030/
The TaskTracker web UI at
http://localhost:50060/
shows "Unable to Connect". After a few seconds, the JobTracker and TaskTracker shut down, and jps on the terminal then only shows:
17083 DataNode
16902 NameNode
17279 SecondaryNameNode
17703 Jps
What might be the solution?
Both of your queues have a capacity of 100, which makes the capacity scheduler think there are two queues that each have 100% of the capacity. I suggest you change the settings to:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>3000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.capacity</name>
<value>80</value> <!-- change here -->
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.capacity</name>
<value>20</value> <!-- change here -->
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-capacity</name>
<value>-1</value>
</property>
The sum of the capacities of all your queues must always add up to exactly 100 (i.e. 100%); you can have two queues with 100 and 0 percent respectively - that is valid.
Also, I think it is good practice to always have a "default" queue with at least some allocation; I don't know what the scheduler will do with a job that doesn't specify a queue name when there is no default. A sketch of such a setup follows below.
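Here is what that could look like with a default queue included (the 60/20/20 split is only an example):
In mapred-site.xml:
<property>
<name>mapred.queue.names</name>
<value>default,queue1,queue2</value>
</property>
And in CapacityScheduler.xml:
<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.capacity</name>
<value>60</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.capacity</name>
<value>20</value>
</property>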
