hadoop distcp fails due to missing yarn log directory

I am trying to run a distcp command on an EMR cluster:
hadoop distcp s3a://... hdfs://host/data/...
When I run this, it gives the following error:
Exit code: 1
Exception message: /bin/bash: /mnt/yarn/logs/application_1524773139099_0003/container_1524773139099_0003_02_000001/stdout: No such file or directory
Stack trace: ExitCodeException exitCode=1: /bin/bash: /mnt/yarn/logs/application_1524773139099_0003/container_1524773139099_0003_02_000001/stdout: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I've checked all the nodes in the cluster, and they all have a /mnt/yarn/logs directory which I created. What is going on here?

Make sure the user that runs the job has sufficient privileges to create temporary directories such as application_******* under /mnt/yarn/logs (preferably the hive user). Also pull the YARN logs for application_1524773139099_0003 to view errors that might explain the actual failure.
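For example, on each node you could verify ownership of the NodeManager log directory and then pull the application logs. This is a sketch only: the yarn:yarn owner is an assumption that depends on which user your NodeManagers run as.
# Check who owns the directory YARN writes container stdout/stderr into
ls -ld /mnt/yarn/logs
# Give it to the user the NodeManagers run as (yarn:yarn is an assumption) so containers can create application_* subdirectories
sudo chown -R yarn:yarn /mnt/yarn/logs
sudo chmod -R 755 /mnt/yarn/logs
# Pull the aggregated logs for the failing application to find the underlying error
yarn logs -applicationId application_1524773139099_0003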

Related

Container exited with a non-zero exit code 1 during wordcount

When I am executing the wordcount program in hadoop-mapreduce-examples using the command below:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /wordcount/input/test_input.txt /wordcount/output
it throws the following exception:
Exception from container-launch.
Container id: container_1540539176003_003_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
How do I fix it?
Sorry, I'm new here.
Does it mean there is some memory problem?
You need to start by getting the correct logs.
Look at the "URL to track the job" line for the address of the YARN UI.
If that address is not available, you can pass the full application ID to the logs command:
yarn logs -applicationId application_1540...
From there, you can search for a stack trace generated by your code.
If you've just set up Hadoop, I would guess that hdfs dfs -ls /wordcount/input/ throws some error about the path not existing or about permission denied.
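For example, a quick sanity check along those lines (paths taken from the question; adjust to your layout):
# Verify the input path exists and is readable
hdfs dfs -ls /wordcount/input/
# If it is missing, create it and upload the input file before re-running the job
hdfs dfs -mkdir -p /wordcount/input
hdfs dfs -put test_input.txt /wordcount/input/
# The output directory must not already exist, or the job will fail for a different reason
hdfs dfs -rm -r -f /wordcount/output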

Hadoop FS File system error - copyToLocal([class org.apache.hadoop.fs.Path, class org.apache.hadoop.fs.Path]) does not exist

Inside a PySpark session, I want to copy a file from S3 to a local directory on the Hadoop cluster. While doing this, I got the following error. Please help.
file_system.copyToLocal(false, java_path_src, java_path_dst)
Parameters:
java_path_src - s3://sandbox/metadata/2018-06-07T183915/test.jsonl
java_path_dst - /home/hadoop/output/
Error:
py4j.protocol.Py4JError: An error occurred while calling o144.copyToLocal. Trace:
py4j.Py4JException: Method copyToLocal([class org.apache.hadoop.fs.Path, class org.apache.hadoop.fs.Path]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
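No answer was posted here, but the trace itself points at the likely cause: org.apache.hadoop.fs.FileSystem exposes copyToLocalFile, not copyToLocal, so py4j finds no method matching (Path, Path). A minimal sketch of the corrected call, assuming an active SparkSession named spark; the variable names and the use of the private _jvm/_jsc gateway are illustrative, and the FileSystem should come from the s3 source URI rather than the default HDFS one:
# Sketch: obtain the JVM gateway and Hadoop configuration from an assumed SparkSession `spark`
jvm = spark.sparkContext._jvm
conf = spark.sparkContext._jsc.hadoopConfiguration()
src = jvm.org.apache.hadoop.fs.Path("s3://sandbox/metadata/2018-06-07T183915/test.jsonl")
dst = jvm.org.apache.hadoop.fs.Path("/home/hadoop/output/")
# Resolve the FileSystem that owns the source URI, then call copyToLocalFile
# (the method is copyToLocalFile, not copyToLocal, hence the Py4JError)
fs = src.getFileSystem(conf)
fs.copyToLocalFile(False, src, dst)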

Flink job on yarn Not starting

I have written a simple Flink word count job. I am trying to run the job on YARN and am getting this error:
2017-10-04 13:15:19,037 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Diagnostics for container ResourceID{resourceId='container_e27_1506324726020_9534_01_000002'} in state COMPLETE : exitStatus=1 diagnostics=Exception from container-launch.
Container id: container_e27_1506324726020_9534_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code
When I remove "logback.xml" from the Flink configuration directory /hdfs/flink-1.2.1/conf, the same job works fine.
Please help me understand what the issue with logback.xml is; I am not able to understand the cause of the problem.

Container exited with a non-zero exit code 1 error during mapreduce task

On executing jar in hadoop, I get the following error:
16/11/04 18:32:59 INFO mapreduce.Job: Task Id : attempt_1478261728730_0005_m_000000_2, Status : FAILED
Exception from container-launch.
Container id: container_1478261728730_0005_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/11/04 18:33:09 INFO mapreduce.Job: map 100% reduce 0%
This is application log:
Native code library failed to load.
java.lang.UnsatisfiedLinkError: no opencv_java2411 in java.library.path
I don't know what this means; can anybody help with this, please?
You are missing OpenCV on your cluster nodes.
See here for all the details on how to handle this.
Long story short, you need to install OpenCV on your executors. You cannot really compile it into your job's .jar in a portable way, since it is native (C++) code rather than Java.
Update:
Note that the environment on your Hadoop executors is set by your hadoop-env.sh. So it needs to contain a line like:
JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:/etc/opencv/lib
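A quick way to confirm the native library is actually reachable on a node; the library file name follows from the error message above, while both paths below are assumptions that vary by distribution:
# Does the shared library exist where hadoop-env.sh points? (install location is an assumption)
ls /etc/opencv/lib/libopencv_java2411.so
# Is the variable actually set in the hadoop-env.sh your nodes use? (config location is an assumption)
grep JAVA_LIBRARY_PATH /etc/hadoop/conf/hadoop-env.sh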

Spark jobs failing because HDFS is caching jars

I upload Scala / Spark jars to HDFS to test them on our cluster. After running, I frequently realize there are changes that need to be made, so I make the changes locally and push the new jar back up to HDFS. However, often (not always) when I do this, Hadoop throws an error essentially saying that this jar is not the same as the old jar (duh).
I tried clearing my Trash, .staging, and .sparkStaging directories, but that doesn't do anything. I tried renaming the jar, which sometimes works and sometimes doesn't (it's still ridiculous that I have to do this in the first place).
Does anyone know why this is happening and how I can prevent it? Thanks for any help. Here are some logs, if that helps (I edited out some paths):
Application application_1475165877428_124781 failed 2 times due to AM Container for appattempt_1475165877428_124781_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://examplelogsite/ Then, click on links to logs of each attempt.
Diagnostics: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
java.io.IOException: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
I haven't seen that exit code before, so to me it doesn't say anything. I would suggest you check the logs, like this:
yarn logs -applicationId <your_application_ID>
According to your log, I'm sure it comes from the YARN side.
As a workaround, you can modify YARN yourself to skip this exception.
I ran into this thread because of the "changed on src filesystem" error; I met the same issue and skipped it by modifying the YARN source code.
For more details, you can refer to how-to-fix-resource-changed-on-src-filesystem-issue
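As a lighter-weight alternative to patching YARN, you can sidestep the timestamp check by never overwriting a jar in place, which matches the asker's observation that renaming sometimes helps. A sketch with illustrative paths and class name:
# Upload each build under a unique, timestamped name so YARN never sees a resource change in place
TS=$(date +%Y%m%d%H%M%S)
hdfs dfs -put target/EXAMPLE.jar /user/me/jars/EXAMPLE-$TS.jar
spark-submit --master yarn --class com.example.Main hdfs:///user/me/jars/EXAMPLE-$TS.jar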
