Hadoop Yarn container logs missing

We can usually see YARN container logs under the /var/log/hadoop-yarn/containers path. While I can see the logs for successful jobs, I cannot see the logs for failed jobs. The NodeManager log shows the logs being deleted.
Log:
2017-07-13 14:16:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (DeletionService #1): Deleting path : /var/log/hadoop-yarn/containers/application_1234567890_12345/container_11234567890_12345_11_000001/stdout
2017-07-13 14:16:04,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl (LogAggregationService #6093): renaming /var/log/hadoop-yarn/apps/hadoop/logs/application_1234567890_12345/xx.xx.xx.xx_8041.tmp to /var/log/hadoop-yarn/apps/hadoop/logs/application_1234567890_12345/xx.xx.xx.xx_8041
2017-07-13 14:16:04,181 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (DeletionService #3): Deleting path : /var/log/hadoop-yarn/containers/application_1234567890_12345
2017-07-13 14:16:06,048 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Stopping resource-monitoring for container_11234567890_12345_11_0000
Here's a snippet of my yarn-site.xml.
Can someone please advise on what config needs to be modified to retain logs for failed jobs?
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://ip-XX.XX.XX.XX:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mnt/yarn</value>
<final>true</final>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.log-aggregation.enable-local-cleanup</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>32</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>

The logs get moved to HDFS when log aggregation completes; usually this is /app-logs on HDFS.
Check the settings below in the documentation:
yarn.nodemanager.remote-app-log-dir
Normally this is /app-logs on HDFS, but in your case it is set to /var/log/hadoop-yarn/apps. Does this directory exist on HDFS? It looks like a local directory value was put here by mistake.
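To verify (a quick check, assuming the HDFS client is configured on the node), list the directory on HDFS:
hdfs dfs -ls /var/log/hadoop-yarn/apps
If the path does not exist on HDFS, create it, or point yarn.nodemanager.remote-app-log-dir at an existing HDFS path such as /app-logs, so aggregation has somewhere to upload.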
Other settings that may be useful:
yarn.log-aggregation-enable:
If ${yarn.log-aggregation-enable} is enabled, then the NodeManager will immediately concatenate all of the container logs into one file, upload them into HDFS under ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/, and delete them from the local userlogs directory.
yarn.nodemanager.delete.debug-delay-sec:
Number of seconds after an application finishes before the NodeManager's DeletionService will delete the application's localized file directory and log directory. To diagnose YARN application problems, set this property's value large enough (for example, to 600 = 10 minutes) to permit examination of these directories. After changing the property's value, you must restart the NodeManager for it to take effect.
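For example, a minimal yarn-site.xml sketch to keep the local container directories around for ten minutes (the property is standard; the 600-second value is just an illustration):
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
Once aggregation succeeds, the same logs can also be retrieved from HDFS with yarn logs -applicationId <application id>.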

Related

Set up Hadoop multi-node cluster on 2 Windows 10 machines

I am trying to set up a multi-node Hadoop cluster between 2 Windows devices. I am using Hadoop 2.9.2.
How can I achieve that, please?
After a lot of trial and error, the following did the job for me.
Do the same configuration as in the previous answer by @AbsoluteBeginner.
Disable the Windows firewall on all machines (I think you could keep it on and just adjust the rules, but that's for you to find out).
Run hdfs namenode -format on all nodes (master and slaves).
Make sure that the datanode folder is empty on all 3 nodes (just Shift+Del).
On the master node, run start-all.cmd. All of the following should appear when you run jps:
50436 NameNode
54696 NodeManager
54744 DataNode
60028 Jps
7340 ResourceManager
On the slave nodes, run start-all.cmd. All of the following should appear:
6116 DataNode
2408 Jps
3208 NodeManager
Note: the reason the NameNode and ResourceManager aren't appearing is that they are running on the master node and already occupy those ports; you only need the master's ResourceManager and NameNode running.
Note: if you have seen a multi-node tutorial for Linux, the master node also shows a SecondaryNameNode when executing jps. I'm not really sure why it's not appearing on Windows.
Go to master:50070 and navigate to the Datanodes tab; you should see the slave DataNodes listed there.
Go to master:8088 and navigate to Nodes; you should see the NodeManagers listed there.
Install an OpenSSH server on both of your systems using this guide. Generating a new SSH public and private key pair on your local computer is the first step towards authenticating with a remote server without a password. Add the public key to authorized_keys and add your hostname to the list of known hosts. You can find guides on how to do this by searching the internet.
Add your Hadoop master and slave IPs to your hosts file. Open "C:\Windows\System32\drivers\etc\hosts"
and add
your-master-ip hadoopMaster
your-slave-ip hadoopSlave
You can use these names in your configuration files.
Much like on Linux systems, these are the steps you have to follow in order to run a Hadoop cluster on Windows:
First, you need to have Java installed on your system, and JAVA_HOME must be added to your environment variables. You can download Java from the Oracle website and install it.
Download the Hadoop binary files from the Apache website and extract them.
Note that you shouldn't have spaces in your folder names or you might encounter problems.
Next, you have to add the Java and Hadoop home and bin folders to your environment variables. Just open the Start menu, type "environment variable", and open the Edit environment variables window from Control Panel.
Add
HADOOP_HOME="root of your hadoop extracted folder\hadoop-2.9.2"
HADOOP_BIN="root of hadoop extracted folder\hadoop-2.9.2\bin"
JAVA_HOME=<Root of your JDK installation>
Edit your "Path" environment variable and add %JAVA_HOME%, %HADOOP_HOME%, %HADOOP_BIN%, and %HADOOP_HOME%\sbin to your PATH one by one.
You can validate your additions by opening cmd and typing:
echo %HADOOP_HOME%
echo %HADOOP_BIN%
echo %PATH%
CONFIGURING HADOOP:
Open "your hadoop root\hadoop-2.9.2\etc\hadoop\hadoop-env.cmd" and add the following lines to the bottom of the file:
set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
Open "your-hadoop-root\hadoop-2.9.2\etc\hadoop\hdfs-site.xml" and add the content below:
<property>
<name>dfs.name.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoopMaster:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopMaster:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
Edit your core-site.xml and add:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopMaster:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>your-temp-directory</value>
<description>A base for other temporary directories.</description>
</property>
Open "root to hadoop\hadoop-2.9.2\etc\hadoop\mapred-site.xml" and add below content within tags. If you don’t see mapred-site.xml then open mapred-site.xml.template file and rename it to mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>hadoopMaster:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Edit your yarn-site.xml and add:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Long running service which executes on Node Manager(s) and provides MapReduce Sort and Shuffle functionality.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation so application logs are moved onto HDFS and are viewable via the web UI after the application has completed. The default location on HDFS is '/log' and can be changed via the yarn.nodemanager.remote-app-log-dir property.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopMaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopMaster:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopMaster:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoopMaster:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoopMaster:8088</value>
</property>
In your slaves file in "root-hadoop-directory\hadoop-2.9.2\etc\hadoop", add
hadoopSlave
Do these steps on your slave nodes too.
Open cmd and cd to the sbin folder in your Hadoop directory.
Format your NameNode:
hadoop namenode -format
Run the following command:
start-dfs.cmd
then run:
start-yarn.cmd
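As a quick sanity check (the exact daemon list depends on your setup), you can confirm the cluster is up from the master node:
jps
hdfs dfsadmin -report
yarn node -list
hdfs dfsadmin -report should list both DataNodes, and yarn node -list should show the NodeManagers registered with the ResourceManager.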

Hadoop Compression ERROR: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

While running Apache Kylin on Hadoop, I encountered the following error related to Hadoop MapReduce:
2019-03-20 08:06:00,193 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:136)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1304)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1192)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1552)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:289)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:542)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat$LazyRecordWriter.write(LazyOutputFormat.java:113)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:468)
I guess the reason is that Hadoop cannot find the libsnappy.so* native library. I searched for a solution online. Following this link, I added the following properties to the corresponding XML files and restarted the services:
# For HDFS core-site.xml
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
# For MapReduce2 mapred-site.xml
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
However, it didn't work. So I dug into the YARN logs. I found that the launch_container.sh part has the following commands:
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1553049994285_0013/container_e04_1553049994285_0013_01_000005"
# ...omit other commands
export LD_LIBRARY_PATH="$PWD"
I think this command is wrong, since instead of $PWD, the true path to libsnappy.so* is:
LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native
Also, as you can see, I have already set LD_LIBRARY_PATH to point to the true path in the mapred-site.xml file. Why does YARN still use $PWD?
Besides, I added a log message, shown below, which confirms that the LD_LIBRARY_PATH environment variable really is set incorrectly. So how can I solve this problem?
2019-03-20 08:06:00,044 INFO [main] org.apache.kylin.engine.mr.KylinMapper: linyanwen[from map]: /hadoop/yarn/local/usercache/root/appcache/application_1553049994285_0039/container_e04_1553049994285_0039_01_000005
Thanks!
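One way to confirm whether the native Snappy library is loadable at all, independent of the container environment YARN sets up, is Hadoop's built-in native-library check:
hadoop checknative -a
If the snappy line reports false, the native library installation itself is the problem rather than the LD_LIBRARY_PATH passed to containers.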

Hadoop-Apache Ranger: StackOverflowError on namenode restart

I am getting this error after enabling the HDFS plugin in Apache Ranger.
When I run enable-hdfs-plugin.sh, Ranger adds the following configuration to hdfs-site.xml:
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
But if I remove the above properties and restart my NameNode, it starts with no error. Also, when I try to format the NameNode, it gives me the same error.
This is the install.properties of Ranger's hdfs-plugin.
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-impl to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-impl
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar
Follow these instructions, adjusting them to your file paths. The problem is that the Ranger plugin classloader is not found on your Hadoop classpath.
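A sketch of creating those links on the NameNode host (the Ranger install root /path/to is a placeholder; the target paths are taken from the answer above):
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-impl /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-impl
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar
Then restart the NameNode.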

Access yarn logs via Web UI

How can I access YARN job logs via the web UI?
I can view the job logs via the YARN manager web UI, but every time YARN restarts, the application list in the YARN manager is empty (the screenshot was taken before the restart).
I can access the application logs via the CLI, even after I restart YARN:
$HADOOP_HOME/bin/yarn logs -applicationId application_1499949542308_0020
The JobHistory server web UI is empty all the time.
My log settings in yarn-site.xml and mapred-site.xml:
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/hadoop/nodemanager-logs</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hdp03.hp.sp.prd.bmsre.com:19888/jobhistory/logs</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdp03.hp.sp.prd.bmsre.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdp03.hp.sp.prd.bmsre.com:19888</value>
</property>
On the DataNode, you can check the folder:
${HADOOP_HOME}/logs/userlogs
You just need to go to the folder that has the same name as the application id.
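For example, using the application id from the question (assuming the default directory layout):
ls ${HADOOP_HOME}/logs/userlogs/application_1499949542308_0020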
Yes, you can access retired YARN jobs from the web UI.
Access the URL http://<jobtracker>:50030 to get the retired jobs.
With respect to your question: you restarted YARN, which means a new log thread wakes up and uploads the logs to the configured location.
But does the /app-logs path from your question exist in your file system? Please check.
There is a retention period for how long the logs are stored in that path, defined by the yarn.log-aggregation.retain-seconds property.
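A quick check, assuming the settings from the question (remote dir /app-logs, suffix logs):
hdfs dfs -ls /app-logs
hdfs dfs -ls /app-logs/<user>/logs/application_1499949542308_0020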
To my understanding, the JobTracker UI, available by default at http://<jobtracker>:50030, exposes information on all currently running as well as retired MapReduce jobs, and YARN has a JobHistory REST service that exposes details on finished applications.

Hadoop Mapreduce fails with permission management enabled

I enabled permission management in my Hadoop cluster, but I'm facing a problem submitting jobs with Pig. This is the scenario:
1 - I have a hadoop/hadoop user
2 - I have a myuserapp/myuserapp user that runs the PIG script.
3 - We set up the path /myapp to be owned by myuserapp
4 - We set pig.temp.dir to /myapp/pig/tmp
But when Pig tries to run the jobs, we get the following error:
job_201303221059_0009 all_actions,filtered,raw_data DISTINCT Message: Job failed! Error - Job initialization failed: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=realtime, access=EXECUTE, inode="system":hadoop:supergroup:rwx------
The Hadoop JobTracker requires this permission to start up its server.
My hadoop-policy.xml looks like:
<property>
<name>security.client.datanode.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.inter.tracker.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.job.submission.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
My hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>755</value>
</property>
<property>
<name>dfs.web.ugi</name>
<value>hadoop,supergroup</value>
</property>
My core-site.xml:
...
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
...
And finally, my mapred-site.xml:
...
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred</value>
</property>
<property>
<name>mapreduce.jobtracker.jobhistory.location</name>
<value>/opt/logs/hadoop/history</value>
</property>
Is there a missing configuration? How can I deal with multiple users running jobs in a restricted HDFS cluster?
Your problem is probably the staging directory. Try adding this property to mapred-site.xml:
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
Then make sure that the submitting user (e.g. 'realtime') has a home directory (e.g. '/user/realtime') and that they own it.
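A sketch of creating it, run as the HDFS superuser (the user name 'realtime' comes from the error in the question):
hadoop fs -mkdir /user/realtime
hadoop fs -chown realtime:realtime /user/realtime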
The Fair Scheduler is designed to run MapReduce jobs as the submitting user, and it creates separate pools for users/groups while sharing resources. At first look, there are some issues with this scheduler related to permissions: certain directories do not allow other users to execute/write in places that are necessary for the job to run.
So, one solution is to use the Capacity Scheduler:
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
The Capacity Scheduler uses a number of named queues, where each queue has a configurable number of map and reduce slots. One good thing about the Capacity Scheduler is the ability to place a limit on the percentage of running tasks per user, so that users share the cluster with a quota.
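For example, a minimal capacity-scheduler.xml sketch (property names from the Hadoop 1.x Capacity Scheduler; the values are only illustrative):
<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
<value>25</value>
</property>
With minimum-user-limit-percent set to 25, each active user is guaranteed at least a quarter of the queue's slots when several users compete for it.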
