Hadoop history logs gone after reboot on ResourceManager UI page - hadoop

I can see the logs on the ResourceManager UI page after running MR tasks, but they are gone after I reboot the Hadoop cluster.
Configs are below. I'd much appreciate any help; this has not been fixed for a long time.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop201:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop201:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/his_log/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/his_log</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/mr-stage-his</value>
<description></description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/module/hadoop-3.1.3/logs/resource_manager_logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop201:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>5184000</value>
</property>
I have reviewed the configs in mapred-site.xml and yarn-site.xml, but it still doesn't work.
I expect the logs to still be visible after a cluster reboot.
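One thing worth checking (an assumption, since the question doesn't say how the cluster is restarted): the JobHistoryServer is not started by start-dfs.sh or start-yarn.sh, so the history links on the ResourceManager UI go dead after a reboot until it is started again. A minimal sketch for Hadoop 3.1.3, assuming fs.defaultFS points at HDFS:
# Start the JobHistoryServer on hadoop201 after every cluster restart
mapred --daemon start historyserver
# (older releases use: mr-jobhistory-daemon.sh start historyserver)
# The done-dir / intermediate-done-dir / remote-app-log-dir values above are
# resolved against the default filesystem (HDFS), so the aggregated logs
# themselves should survive a reboot; these list them to confirm:
hdfs dfs -ls /opt/module/hadoop-3.1.3/logs/his_log/done
hdfs dfs -ls /opt/module/hadoop-3.1.3/logs/resource_manager_logs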

Related

Dynamic Allocation on Hive on Spark

I configured the Spark engine in hive-site.xml using:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>4</value>
</property>
<property>
<name>spark.dynamicAllocation.initialExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>8</value>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.io.compression.codec</name>
<value>lzf</value>
</property>
<property>
<name>spark.yarn.jar</name>
<value>hdfs://VCluster1/user/spark/share/lib/spark-assembly-1.3.1-hadoop2.7.1.jar</value>
</property>
<property>
<name>spark.kryo.referenceTracking</name>
<value>false</value>
</property>
<property>
<name>spark.kryo.classesToRegister</name>
<value>org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch</value>
</property>
In yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
When I run a Hive on Spark job, dynamic allocation is not working: Spark sets spark.executor.instances to whatever number I set for spark.dynamicAllocation.initialExecutors and never changes it. Can anyone help me figure out the problem?
Thanks
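One common cause (an assumption here, since no NodeManager logs are shown): dynamic allocation needs Spark's external shuffle service running inside each NodeManager, and registering spark_shuffle in yarn-site.xml is not enough if the YarnShuffleService class cannot be found on the NodeManager classpath. A sketch, with the jar and destination paths as examples to adjust:
# Copy the Spark YARN shuffle jar onto every NodeManager's classpath
# (it ships with the Spark 1.3.1 binary distribution)
cp $SPARK_HOME/lib/spark-1.3.1-yarn-shuffle.jar /usr/lib/hadoop-yarn/lib/
# Restart the NodeManager so YarnShuffleService is actually loaded
yarn-daemon.sh stop nodemanager
yarn-daemon.sh start nodemanager
# The NodeManager log should then show the shuffle service listening on
# port 7337 (the default for spark.shuffle.service.port)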

Hadoop Cluster. Map reduce job stuck at map 100% and reduce 0%

I am new to Hadoop. I tried to create a Hadoop cluster based on the example given on the Apache Hadoop site.
However, when I run the MapReduce example, the application gets stuck at map 100% and reduce 0%.
Please help.
I have setup the environment using Vagrant and Virtual box. Created two instances.
I am running name node and a data node in one instance and resource manager and node manager in the other instance.
mapred-site.xml configuration
<configuration>
<!-- Map Reduce applications configuration -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<!-- Map Reduce Job History Server -->
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Resource Manager -->
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<!-- Node Manager -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/hadoop-2.6.2/tempData</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/hadoop-2.6.2/logDir</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- History Server -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
</configuration>
I was able to run the application now. As I suspected, it was a problem with the memory available to the system. I changed the following properties as shown below:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<!-- Node Manager -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
and repeated the process. It's working fine now.
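For context: the reduce containers request 3072 MB (mapreduce.reduce.memory.mb) while each NodeManager originally advertised only 2048 MB (yarn.nodemanager.resource.memory-mb), so the reducer could never be scheduled and the job sat at map 100% / reduce 0%. To confirm the new limits are in effect, a quick check with the standard yarn CLI (node IDs come from the first command):
yarn node -list                 # list NodeManagers and their Node IDs
yarn node -status <Node ID>     # Memory-Capacity should now report 8192 MB,
                                # i.e. at least mapreduce.reduce.memory.mb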

resourcemanager web ui does not show job status

I have just configured a Hadoop cluster using CDH5. I can successfully run test jobs on the command line and get the results, but the ResourceManager UI does not show job status, even after completion. If I set mapreduce.framework.name to yarn in mapred-site.xml, the job fails and shows a failure status in the ResourceManager UI.
The test job I used:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.1.jar pi 16 10000
Here is my yarn-site.xml
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>rhel2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>rhel3</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>localhost:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>rhel2.had.com:2181,rhel3.had.com:2181,rhel4.had.com:2181</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>rhel2:9046</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Node Config -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/pseudo-dist/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/pseudo-dist/yarn/log</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
</configuration>
I didn't set any parameters in mapred-site.xml; the file is empty.
Please let me know what changes need to be made to mapred-site.xml or yarn-site.xml to get the web UI working.
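An assumption worth stating, since only the symptoms are described: with mapred-site.xml empty, jobs typically run through the local (or MRv1) job runner, so nothing is ever submitted to YARN and the ResourceManager UI stays empty; when mapreduce.framework.name is set to yarn and the job then fails, that failure is what needs debugging rather than reverting the setting. A minimal mapred-site.xml sketch (the rhel2 hostname is borrowed from the rm1 entry above and may need adjusting):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- only needed if a JobHistoryServer is running; host is an assumption -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>rhel2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>rhel2:19888</value>
</property>
</configuration>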

ACLs not working in Capacity Scheduler in YARN (CDH5)

ACLs are not working in the Capacity Scheduler in CDH5. Please see the config below. Only user1 should be able to submit to queue1 and only user2 to queue2, but all users are able to access all queues.
Let me know if there is a solution
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>batch,default</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queues</name>
<value>queue1,queue2</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.capacity</name>
<value>80</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>20</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue1.capacity</name>
<value>70</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue2.capacity</name>
<value>30</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue1.acl_submit_applications</name>
<value>user1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue2.acl_submit_applications</name>
<value>user2</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue1.acl_administer_queue</name>
<value>*</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.queue2.acl_administer_queue</name>
<value>*</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>20000</value>
</property>
</configuration>
As stated in the CDH5 documentation, the Capacity Scheduler is not supported.
Reference
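For completeness (this describes the stock YARN Capacity Scheduler, independent of whether CDH5 supports it): submit ACLs are only enforced when yarn.acl.enable is true in yarn-site.xml, and a queue's effective ACL is the union of its own ACL and its ancestors'; since root's acl_submit_applications defaults to *, every user keeps access unless the parent queues are restricted too. A sketch of the extra properties that would be needed:
<!-- yarn-site.xml -->
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
<!-- capacity-scheduler.xml: a single-space value means "no one" -->
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<value> </value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.acl_submit_applications</name>
<value> </value>
</property>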

Hadoop's Capacity Scheduler - Setting up multiple queues

I tried to set up two queues: queue1 and queue2.
I added the names of these queues to mapred-site.xml:
<property>
<name>mapred.queue.names</name>
<value>queue1,queue2</value>
</property>
I configured CapacityScheduler.xml as shown below.
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>3000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.init-accept-jobs-factor</name>
<value>10</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.init-accept-jobs-factor</name>
<value>10</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name>
<value>10</value>
</property>
<!-- Capacity scheduler Job Initialization configuration parameters -->
<property>
<name>mapred.capacity-scheduler.init-poll-interval</name>
<value>5000</value>
</property>
<property>
<name>mapred.capacity-scheduler.init-worker-threads</name>
<value>5</value>
</property>
</configuration>
Running bin/start-all.sh starts the following services:
17083 DataNode
17557 TaskTracker
17373 JobTracker
16902 NameNode
17279 SecondaryNameNode
17703 Jps
I'm able to view the web UI for the JobTracker at
http://localhost:50030/
The TaskTracker's web UI at
http://localhost:50060/
shows "Unable to Connect". After a few seconds, the JobTracker and TaskTracker shut down, and the jps command on the terminal only shows
17083 DataNode
16902 NameNode
17279 SecondaryNameNode
17703 Jps
What might be the solution?
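When the JobTracker and TaskTracker die a few seconds after starting, the JobTracker log usually records the reason (with this CapacityScheduler.xml, a scheduler initialization error is a likely candidate). A quick way to look, assuming the default hadoop-<user>-jobtracker-<host>.log naming of Hadoop 1.x:
tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log
tail -n 100 $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log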
Both of your queues have a capacity of 100, which makes the Capacity Scheduler think there are two queues that each claim 100% of the cluster. I suggest you change the settings to:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>3000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.capacity</name>
<value>80</value> <!-- change here -->
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.capacity</name>
<value>20</value> <!-- change here -->
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue1.maximum-capacity</name>
<value>-1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.queue2.maximum-capacity</name>
<value>-1</value>
</property>
The sum of all your queue capacities must always be exactly 100 (i.e. 100%); you can have two queues with 100 and 0 percent respectively - that is valid.
Also, I think it's good practice to always have a "default" queue with at least some allocation. I don't know what the scheduler will do if you don't specify a queue name when there is no default.
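Once the capacities add up to 100 and the daemons stay up, it's worth checking that a job actually lands in the intended queue; with the MR1 Capacity Scheduler the queue is chosen via mapred.job.queue.name (the example jar and paths below are placeholders):
hadoop jar hadoop-examples.jar wordcount \
  -Dmapred.job.queue.name=queue1 \
  /input /output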
