Flink job on YARN not starting - hadoop

I have written a simple Flink word count job. I am trying to run the job on YARN and I am getting this error:
2017-10-04 13:15:19,037 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Diagnostics for container ResourceID{resourceId='container_e27_1506324726020_9534_01_000002'} in state COMPLETE : exitStatus=1 diagnostics=Exception from container-launch.
Container id: container_e27_1506324726020_9534_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code
When I remove "logback.xml" from the Flink configuration directory (/hdfs/flink-1.2.1/conf), the same job works fine.
Please help me understand what the issue with logback.xml is; I am not able to figure out the cause of the problem.
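One way to dig further, as a minimal sketch (assuming YARN log aggregation is enabled; the application id is derived from the container id in the log above), is to pull the container logs and look for the actual logback failure:
yarn logs -applicationId application_1506324726020_9534 | grep -i -B 2 -A 20 logback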

Related

Container exited with a non-zero exit code 1 during wordcount

When I execute the wordcount program in hadoop-mapreduce-examples using the command below
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /wordcount/input/test_input.txt /wordcount/output
it throws the following exception:
Exception from container-launch.
Container id: container_1540539176003_003_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
How to fix it?
Sorry, I'm new here.
Does it mean there is a memory problem?
You need to start by getting the correct logs.
Look at the "url to track the job" address printed when the job is submitted; that is the YARN UI.
If that address is not available, you can pass the full application id to the logs command:
yarn logs -applicationId application_1540...
From there, you can search for a stacktrace generated by the code.
If you've just set up Hadoop, I would guess that hdfs dfs -ls /wordcount/input/ throws an error about the path not existing or about permission being denied.
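As a minimal sanity-check sketch (the paths are taken from the command in the question, and the application id is a placeholder you would copy from the YARN UI):
hdfs dfs -ls /wordcount/input/test_input.txt    # the input file must exist and be readable
hdfs dfs -ls /wordcount/output                  # the output directory must NOT exist before the job runs
yarn logs -applicationId <applicationId> | grep -A 20 Exception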

hadoop distcp fails due to missing yarn log directory

I am trying to run a distcp command on an EMR cluster:
hadoop distcp s3a://... hdfs://host/data/...
When I run this, it gives the following error:
Exit code: 1
Exception message: /bin/bash: /mnt/yarn/logs/application_1524773139099_0003/container_1524773139099_0003_02_000001/stdout: No such file or directory
Stack trace: ExitCodeException exitCode=1: /bin/bash: /mnt/yarn/logs/application_1524773139099_0003/container_1524773139099_0003_02_000001/stdout: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I've checked all the nodes in the cluster, and they all have a /mnt/yarn/logs directory which I created. What is going on here?
Make sure the user that runs the job has sufficient privileges to create temporary directories such as application_******* under /mnt/yarn/logs (preferably the hive user). Also pull the YARN logs for application_1524773139099_0003 to view errors that might explain the actual failure.
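A sketch of what that check could look like (the yarn user and hadoop group are assumptions; substitute whatever account runs the NodeManager on your EMR nodes):
sudo chown -R yarn:hadoop /mnt/yarn/logs     # let the NodeManager create per-application directories
sudo chmod -R 755 /mnt/yarn/logs
yarn logs -applicationId application_1524773139099_0003    # inspect the aggregated container logs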

Spark job succeeds but with ERROR CoarseGrainedExecutorBackend: Driver disassociated

My Spark version is 1.6.2 and it runs on YARN.
The log of the driver container reports SUCCEEDED as follows:
16/11/17 17:25:56 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
16/11/17 17:25:56 INFO ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
However, the log of one executor container records an ERROR:
16/11/17 17:25:56 WARN CoarseGrainedExecutorBackend: An unknown (xxx-xxx-xxx-xxx:xxxx) driver disconnected.
16/11/17 17:25:56 ERROR CoarseGrainedExecutorBackend: Driver xxx-xxx-xxx-xxx:xxxx disassociated! Shutting down.
I think the job succeeded because the output results are as expected, but I want to know why the error was thrown and whether the job really succeeded.
I found more info in the YARN NodeManager log:
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_xxxxxxxxx and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Has anyone encountered the same issue?
Thanks.

Container exited with a non-zero exit code 1 error during mapreduce task

On executing the jar in Hadoop, I get the following error:
16/11/04 18:32:59 INFO mapreduce.Job: Task Id : attempt_1478261728730_0005_m_000000_2, Status : FAILED
Exception from container-launch.
Container id: container_1478261728730_0005_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/11/04 18:33:09 INFO mapreduce.Job: map 100% reduce 0%
This is the application log:
Native code library failed to load.
java.lang.UnsatisfiedLinkError: no opencv_java2411 in java.library.pathopencv_java2411
I don't know what it means; can anybody help with this, please?
You are missing OpenCV on your cluster nodes.
See here for all the details on how to handle this.
Long story short though, you need to install OpenCV on your executor nodes. You cannot really compile it into your job's .jar in a portable way, since it is native (C/C++) code and not Java code.
Update:
Note that the environment on your Hadoop executors is set by your hadoop-env.sh. So it needs to contain a line like:
JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:/etc/opencv/lib
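As a quick verification sketch (the /etc/opencv/lib path follows the line above, the lib<name>.so file name follows standard JNI naming, and HADOOP_CONF_DIR is assumed to point at your Hadoop configuration directory):
ls -l /etc/opencv/lib/libopencv_java2411.so              # the native library must exist on every node
grep JAVA_LIBRARY_PATH "$HADOOP_CONF_DIR"/hadoop-env.sh  # the line above must be present on every node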

Issue running Spark job on YARN cluster

I want to run my Spark job in Hadoop YARN cluster mode, and I am using the following command:
spark-submit --master yarn-cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --class com.dc.analysis.jobs.AggregationJob \
  sparkanalitic.jar param1 param2 param3
I am getting the error below. Kindly suggest what is going wrong and whether the command is correct or not. I am using CDH 5.3.1.
Diagnostics: Application application_1424284032717_0066 failed 2 times due
to AM Container for appattempt_1424284032717_0066_000002 exited with
exitCode: 15 due to: Exception from container-launch.
Container id: container_1424284032717_0066_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1424699723648
final status: FAILED
tracking URL: http://myhostname:8088/cluster/app/application_1424284032717_0066
user: hdfs
2015-02-23 19:26:04 DEBUG Client - stopping client from cache: org.apache.hadoop.ipc.Client@4085f1ac
2015-02-23 19:26:04 DEBUG Utils - Shutdown hook called
2015-02-23 19:26:05 DEBUG Utils - Shutdown hook called
Any help would be greatly appreciated.
It can mean a lot of things. For us, we got a similar error message because of an unsupported Java class version, and we fixed the problem by deleting the referenced Java class from our project.
Use this command to see the detailed error message:
yarn logs -applicationId application_1424284032717_0066
You should remove ".setMaster("local")" from the code.
The command looks correct.
What I've come across is that the "exit code 15" normally indicates a TableNotFound Exception. That usually means there's an error in the code you're submitting.
You can check this by visiting the tracking URL.
For me, the exit code issue was solved by placing hive-site.xml in the spark/conf directory.
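For reference, a minimal sketch of that fix (the /etc/hive/conf source path is an assumption based on a typical CDH layout):
cp /etc/hive/conf/hive-site.xml "$SPARK_HOME"/conf/hive-site.xml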
Remove the line "spark.master":"local[*]" from the Spark configuration file if you are running your Spark jobs on a cluster.
If you run them on your local PC, include it.
Mani
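A quick way to check for such a setting, as a sketch (the SPARK_HOME-relative path is an assumption):
grep -n "spark.master" "$SPARK_HOME"/conf/spark-defaults.conf   # remove or comment out any local[*] entry before submitting to YARN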
