Why does my yarn application not have logs even with logging enabled? - hadoop

I have enabled logs in the xml file: yarn-site.xml, and I restarted yarn by doing:
sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-yarn-nodemanager restart
I ran my application, and then I see the applicationID in yarn application -list. So, I do this: yarn logs -applicationId <application ID>, and I get the following:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
Do I need to change some other configuration? Or am I accessing the logs the wrong way?
Thank you.

yarn application -list
lists only applications that are in the SUBMITTED, ACCEPTED or RUNNING state.
Log aggregation collects each container's logs and moves them to the directory configured in yarn.nodemanager.remote-app-log-dir only after the application has completed. See the description of the yarn.log-aggregation-enable property in yarn-default.xml.
So the application ID listed by the command belongs to an application that has not completed yet, and its logs have not been collected. That is why trying to access the logs of a running application returns:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
You can try the same command yarn logs -applicationId <application ID> to view the logs once the application has completed.
To list all the FINISHED applications, use
yarn application -list -appStates FINISHED
Or to list all the applications
yarn application -list -appStates ALL
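The rule described above (aggregated logs only exist once the application has reached a final state) can be checked before calling yarn logs. The following is a minimal Python sketch, not part of the original answer. It assumes that the report printed by yarn application -status <application ID> contains a line of the form "State : RUNNING"; the helper names and that report layout are illustrative assumptions.

```python
import re

# YARN application states after which no more containers will run.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def parse_state(status_report):
    """Return the value of the 'State' line of a status report, or None.

    Assumption: the report contains a line such as '    State : RUNNING'.
    The 'Final-State' line is deliberately not matched.
    """
    match = re.search(r"^[ \t]*State[ \t]*:[ \t]*(\w+)[ \t]*$",
                      status_report, re.MULTILINE)
    return match.group(1) if match else None


def aggregated_logs_expected(status_report):
    """True when the application has completed, so aggregation can have run."""
    return parse_state(status_report) in TERMINAL_STATES
```

If aggregated_logs_expected returns False for the report of your application, an empty result from yarn logs is expected rather than a sign of misconfiguration.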

Enable Log Aggregation
Log aggregation is enabled in the yarn-site.xml file by setting the yarn.log-aggregation-enable property to true. By default, the logs are aggregated once the application has finished.
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

In HDP 2.3.2 and later (the release covered by the link below), you can have log aggregation run hourly for running jobs with this configuration in yarn-site.xml:
<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>
See this for further details: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html
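To confirm that the yarn-site.xml actually read by the NodeManager carries these two properties, the file can be parsed directly. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, and treating a missing or non-positive interval as "no rolling aggregation" is an assumption about the default behaviour, not something stated in the answers above.

```python
import xml.etree.ElementTree as ET

ENABLE_KEY = "yarn.log-aggregation-enable"
ROLL_KEY = "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds"


def read_properties(xml_text):
    """Map property name -> value for a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def describe_log_aggregation(props):
    """Summarise the aggregation settings found in a property mapping."""
    if props.get(ENABLE_KEY, "false").lower() != "true":
        return "log aggregation disabled"
    interval = props.get(ROLL_KEY)
    # Assumption: a missing or non-positive interval means no rolling aggregation.
    if interval is None or int(interval) <= 0:
        return "aggregation when the application completes"
    return "rolling aggregation every %s seconds" % interval
```

Pass the contents of the yarn-site.xml from the configuration directory your NodeManager uses; the location of that file depends on your installation.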

The logs were probably saved under a different application owner. Try specifying the owner in your command:
yarn logs -applicationId <application ID> -appOwner <owner>

ROOT CAUSE: When log aggregation is enabled, each user's application logs are, by default, placed in the directory hdfs:///app-logs/<USERNAME>/logs/<APPLICATION_ID>. By default, only the user who submitted the job and members of the hadoop group can read the log files. In the example directory listing below, the permissions are 770: no one other than the owner and members of the hadoop group has access.
[root@mycluster ~]$ hdfs dfs -ls /app-logs
Found 3 items
drwxrwx--- - hive hadoop 0 2017-03-10 15:33 /app-logs/hive
drwxrwx--- - user1 hadoop 0 2017-03-10 15:37 /app-logs/user1
drwxrwx--- - spark hadoop 0 2017-03-10 15:39 /app-logs/spark
SOLUTION: The message above can be deceiving and does not necessarily indicate that log aggregation has not been enabled. To obtain yarn logs for an application the 'yarn logs' command must be executed as the user that submitted the application. In the example below the application was submitted by user1. If we execute the same command as above as the user 'user1' we should get the following output if log aggregation has been enabled.
yarn logs -applicationId application_1473860344791_0001
16/09/19 23:10:33 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:10:33 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
16/09/19 23:10:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/09/19 23:10:34 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e03_1473860344791_0001_01_000001 on mycluster.somedomain.com_45454
LogType:stderr
Log Upload Time:Wed Sep 14 09:44:15 -0400 2016
LogLength:0
Log Contents:
End of LogType:stderr
REFERENCE: The following document describes how to use log aggregation to collect logs for long-running YARN applications.
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_yarn-resource-management/content/ch_log_a...
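The 770 check described under ROOT CAUSE can be worked through by hand or with a few lines of code. The following Python sketch applies only the basic owner/group/other bits of a mode string as printed by hdfs dfs -ls; it ignores HDFS superusers and ACLs, and the function name is a hypothetical illustration.

```python
def can_list_directory(mode, owner, group, user, user_groups=()):
    """Check the r and x bits that apply to `user` for a mode like 'drwxrwx---'.

    Exactly one permission class applies: owner first, then group, then other.
    Superuser privileges and ACL entries are not modelled in this sketch.
    """
    bits = mode[-9:]  # drop the leading file-type character if present
    if user == owner:
        triplet = bits[0:3]
    elif group in user_groups:
        triplet = bits[3:6]
    else:
        triplet = bits[6:9]
    return triplet[0] == "r" and triplet[2] == "x"
```

For the /app-logs/user1 entry in the listing, the call can_list_directory("drwxrwx---", "user1", "hadoop", "guest") returns False for a user named guest who is outside the hadoop group, which matches the behaviour described in the answer.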

Related

Running spark shell on yarn client error

I have Spark 1.6.1 and I have set
export HADOOP_CONF_DIR=/folder/location
Now if I run spark shell:
$ ./spark-shell --master yarn --deploy-mode client
I get this type of error (relevant part)
16/09/18 15:49:18 INFO impl.TimelineClientImpl: Timeline service address: http://URL:PORT/ws/v1/timeline/
16/09/18 15:49:18 INFO client.RMProxy: Connecting to ResourceManager at URL/IP:PORT
16/09/18 15:49:18 INFO yarn.Client: Requesting a new application from cluster with 9 NodeManagers
16/09/18 15:49:19 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (14336 MB per container)
16/09/18 15:49:19 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/09/18 15:49:19 INFO yarn.Client: Setting up container launch context for our AM
16/09/18 15:49:19 INFO yarn.Client: Setting up the launch environment for our AM container
16/09/18 15:49:19 INFO yarn.Client: Preparing resources for our AM container
16/09/18 15:49:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/09/18 15:49:19 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=Mnemosyne, access=WRITE, inode="/user/Mnemosyne/.sparkStaging/application_1464874056768_0040":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
However when I run simply
$ ./spark-shell
(without specifying the master) I get a lot more configuration output on the screen than usual (i.e. it presumably loads the configuration from the Hadoop folder). So if I don't specify that the master is yarn, do my Spark jobs still get submitted to the YARN cluster or not?
The default master in Spark is local, which means the application runs locally on your machine and not on the cluster.
YARN applications in general (Hive, MapReduce, Spark, etc.) need to create temporary folders to store partial data and/or the current process configuration. Normally this temporary data is written inside the user's HDFS home directory (in your case /user/Mnemosyne).
Your problem is that your home folder was created by the user hdfs, and your user Mnemosyne doesn't have permission to write to it.
As a result, the Spark job cannot create the temporary structure in HDFS that it needs to launch the application.
My suggestion is to change the owner of the home folder (each user should own their home directory) and validate that the owner has full access to it.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#chown
The permissions on the home directory for Mnemosyne are incorrect. It is owned by the hdfs user and not Mnemosyne.
Run: hdfs dfs -chown -R Mnemosyne /user/Mnemosyne/
See the hdfs chown docs here: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#chown
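Both answers above read the same facts out of the AccessControlException line: who tried what, on which inode, and who owns it. A minimal Python sketch that extracts those fields is shown here. The regular expression is written against the message format quoted in the question, and the function names are hypothetical.

```python
import re

# Pattern for messages like:
#   Permission denied: user=U, access=WRITE, inode="/path":owner:group:drwxr-xr-x
_DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>\S+)'
)


def parse_denial(message):
    """Return the fields of a 'Permission denied' message as a dict, or None."""
    match = _DENIED.search(message)
    return match.groupdict() if match else None


def explain_denial(message):
    """Produce a one-line summary of why the access was refused."""
    info = parse_denial(message)
    if info is None:
        return None
    return ("%(user)s needs %(access)s on %(inode)s, "
            "which is owned by %(owner)s:%(group)s with mode %(mode)s" % info)
```

When the reported owner differs from the user, as in the question, the chown shown in the answers is the fix to consider.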
I just fixed this issue with Spark 1.6.2 on a Hadoop 2.6.0 cluster:
1. Copy spark-assembly-1.6.2-hadoop2.6.0.jar from the local filesystem to HDFS:
hdfs://Master:9000/spark/spark-assembly-1.6.2-hadoop2.6.0.jar
2. In spark-defaults.conf, add the parameter (spark.yarn.jar is the Spark 1.x name; spark.yarn.jars applies from Spark 2.0):
spark.yarn.jar hdfs://Master:9000/spark/spark-assembly-1.6.2-hadoop2.6.0.jar
Then run spark-shell --master yarn-client and everything works.
One more thing: if you want to run Spark in YARN mode, do not start the Spark cluster in local mode.

Running an Oozie job

I'm trying to configure Oozie to work on my hadoop-2.7.1 cluster. Everything else seems to work fine: YARN, Hue, MapReduce and Spark. Jobs submitted with the yarn jar ... command finish correctly, but when I submit a job with Oozie, either from the CLI with oozie job ... -run or from Hue, the job gets stuck at 33% and the node logs show this:
2015-11-06 06:08:56,121 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18030
2015-11-06 06:08:57,165 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:18030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
I don't use port 18030 anywhere in my configuration. I should probably change the hostname from localhost to the network hostname, but where do I configure it? I've tried changing yarn.resourcemanager.scheduler.address, but that wasn't it.
EDIT:
I run oozie job -config examples/apps/shell/job.properties -run with job.properties containing:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.libpath=/data/shared/hadoop-2.7.1/etc/hadoop
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell
The error is occurring while trying to contact the Resource Manager.
The above mentioned log line is being printed in RMProxy.java:
LOG.info("Connecting to ResourceManager at " + rmAddress);
When you use Oozie with MRv1, the value of jobTracker in the "job.properties" file is set to the JobTracker's address:
jobTracker={JobTracker Host}:{JobTracker Port}
But when you migrate your Oozie job to MRv2, you need to change "job.properties" so that the jobTracker value points to the ResourceManager address:
jobTracker={RM Host}:{RM Port}
Please refer to this article, which describes the variable as follows: https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
jobTracker = Variable to define the resource manager address in case of Yarn implementation. Format: <resourcemanager_hostname>:<port>
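A quick way to see what jobTracker actually resolves to is to read it out of job.properties and split it into host and port, then compare that with the ResourceManager address of your cluster. This is a minimal Python sketch with hypothetical function names. It handles only simple key=value lines, not the full Java properties syntax, and it does not contact the cluster.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def split_host_port(address):
    """Split 'host:port' into (host, port); raise ValueError on any other form."""
    match = re.fullmatch(r"([^:\s]+):(\d+)", address)
    if match is None:
        raise ValueError("expected <hostname>:<port>, got %r" % address)
    return match.group(1), int(match.group(2))
```

With the job.properties from the question, split_host_port(parse_properties(text)["jobTracker"]) gives the host master and port 8032, which is then the pair to compare against the ResourceManager address configured for the cluster.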
EDIT:
I went through the Hadoop source code. The only place where port "18030" is being used is in "SLS" (Yarn Scheduler Load Simulator).
SLS has a yarn-site.xml file (present at location: \hadoop-tools\hadoop-sls\src\main\sample-conf\yarn-site.xml), which has the following configuration:
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:18030</value>
</property>
From your description, it seems that the yarn-site.xml being used is similar to the one used by SLS.

Running JAR in Hadoop on Google Cloud using Yarn-client

I want to run a JAR in Hadoop on Google Cloud using yarn-client.
I use this command on the master node of Hadoop:
spark-submit --class find --master yarn-client find.jar
but it returns this error:
15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What is the problem? In case it is useful this is my yarn-site.xml
<?xml version="1.0" ?>
<!--
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/yarn-logs/</value>
<description>
The remote path, on the default FS, to store logs.
</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-m-on8g</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5999</value>
<description>
In your case, it looks like the YARN ResourceManager may be unhealthy for unknown reasons; you can try to fix yarn with the following:
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
However, it looks like you're using the Click-to-Deploy solution; Click-to-Deploy's Spark + Hadoop 2 deployment actually doesn't support Spark on YARN at the moment, due to some bugs and lack of memory configs. You'd normally run into something like this if you just try to run it with --master yarn-client out-of-the-box:
15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1434561664937
yarnAppState: RUNNING
15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You'd run something like:
./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d \
-e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy
# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell
# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy
# When done, delete your cluster
./bdutil -e my_custom_env.sh delete
With spark_on_yarn_env.sh Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries just for the flags I included above:
-b, --bucket
Google Cloud Storage bucket used in deployment and by the cluster.
-d, --use_attached_pds
If true, uses additional non-boot volumes, optionally creating them on
deploy if they don't exist already and deleting them on cluster delete.
-e, --env_var_files
Comma-separated list of bash files that are sourced to configure the cluster
and installed software. Files are sourced in order with later files being
sourced last. bdutil_env.sh is always sourced first. Flag arguments are
set after all sourced files, but before the evaluate_late_variable_bindings
method of bdutil_env.sh. see bdutil_env.sh for more information.
-P, --prefix
Common prefix for cluster nodes.
-p, --project
The Google Cloud Platform project to use to create the cluster.
-z, --zone
Specify the Google Compute Engine zone to use.
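Independently of how the cluster is deployed, the "Retrying connect to server" lines in the question can be narrowed down by checking whether anything is listening on the ResourceManager address at all. The following Python sketch makes repeated TCP connection attempts with a fixed sleep, loosely modelled on the retry policy named in the log. The function name, its defaults and the injectable connect parameter are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host, port, max_retries=10, sleep_seconds=1.0,
                  timeout=3.0, connect=socket.create_connection):
    """Try to open a TCP connection, retrying with a fixed sleep.

    Makes one initial attempt plus up to max_retries retries. Returns True as
    soon as a connection succeeds and False if every attempt fails.
    `connect` can be replaced in tests to avoid real network access.
    """
    for attempt in range(max_retries + 1):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Refused, unreachable or timed out: sleep unless this was the last try.
            if attempt < max_retries:
                time.sleep(sleep_seconds)
        else:
            conn.close()
            return True
    return False
```

Run from the machine that executes spark-submit, wait_for_port("hadoop-m-on8g", 8032) returning False would suggest that the ResourceManager is not running or not reachable at that address, which is the situation the restart commands above are meant to address.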

Cloudera hdfs another namenode already locked the storage directory

I am running CDH-5.3.2-1.cdh5.3.2.p0.10 with ClouderaManager on Centos 6.6.
My HDFS service was working on a cluster, but I wanted to change the mount point for the Hadoop data. That didn't succeed, so I decided to roll back all the changes, but discouragingly the previous configuration no longer works.
I have two nodes in the cluster. One of the DataNodes is reported as bad (DataNodes Health: Bad).
The log contains a few errors:
1:40:10.821 PM ERROR org.apache.hadoop.hdfs.server.common.Storage
It appears that another namenode 931@spark1.xxx.xx has already locked the storage directory
1:40:10.821 PM INFO org.apache.hadoop.hdfs.server.common.Storage
Cannot lock storage /dfs/nn. The directory is already locked
1:40:10.821 PM WARN org.apache.hadoop.hdfs.server.common.Storage
java.io.IOException: Cannot lock storage /dfs/nn. The directory is already locked
1:40:10.822 PM FATAL org.apache.hadoop.hdfs.server.datanode.DataNode
Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to spark1.xxx.xx/10.10.10.10:8022. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:463)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:745)
I have tried many possible solutions, without any luck:
formatting with hadoop namenode -format
stopping the cluster and running rm -rf /dfs/* [and reformatting]
making some adjustments to the /dfs/nn/current/VERSION file
removing the in_use.lock file and starting only the missing node
removing the file in /tmp/hsperfdata_hdfs/ named after the PID that is locking the directory.
There are files in the directory
[root@spark1 dfs]# ll
total 8
drwxr-xr-x 3 hdfs hdfs 4096 Apr 28 13:39 nn
drwx------ 3 hdfs hadoop 4096 Apr 28 13:40 snn
There is no dn directory, which is a bit odd.
All operations on hdfs files I perform as an hdfs user.
In the file /etc/hadoop/conf/hdfs-site.xml there is
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
Here is a similar thread from the CDH users Google group that might help you: https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/FYu0gZcdXuE
Also, did you format the NameNode from Cloudera Manager or from the command line? Ideally you should do it through Cloudera Manager, not the command line.

Running hadoop job using java org.apache.hadoop.util.RunJar command

I want to submit a job to the JobTracker using java (instead of hadoop) so that I can debug a classpath issue.
export HADOOP_CLASSPATH=hbase-util-0.0.1-SNAPSHOT.jar:/etc/hadoop/conf:hbase-util-0.0.1-SNAPSHOT.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hbase/*:/usr/lib/hadoop/etc/hadoop/mapred-site.xml:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.0.1.jar:/usr/lib/hbase/hbase-0.92.1-cdh4.0.1-security.jar:/usr/lib/hbase/lib/zookeeper.jar:/usr/lib/hbase/lib:/etc/hbase/conf:/usr/lib/hbase/lib/guava-11.0.2.jar:/usr/lib/hbase/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hbase/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hbase:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
java -cp ${HADOOP_CLASSPATH} org.apache.hadoop.util.RunJar hbase-util-0.0.1-SNAPSHOT.jar hbase.util.RowDiffCounter SRM hdfs://dchilcmsnn01:8020/tmp/hadoop/mapred/temp/job1-temp-1491763074 /tmp/hadoop/mapred/temp/job1-temp-1491763075D SOURCE_MANAGEMENT SOURCE_MANAGEMENT
I get an error
ERROR [main] (UserGroupInformation.java:1235) - PriviledgedActionException as:devuser (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Adding the following properties does not help. I checked the job configuration page on the jobtracker to get the correct value.
-D mapreduce.framework.name=local
-D mapred.job.tracker=host101:8021
Do I need to pass in the user info as well?
