Hive query using Tez gets stuck at "Launching Job 1 out of 1" - hadoop

I have set up a single-node Hadoop cluster and tested a Hive query on 10 GB of data using the Tez engine, which worked fine. After loading 6 TB of data, the query gets stuck at "Launching Job 1 out of 1", and the Hive log says:
Tez system stage directory hdfs://127.0.0.1:9000/tmp/hive/hadoop/_tez_session_dir/0716a629-1a1d-4500-86d9-6041aaf61596/.tez/application_1675755644314_0003 doesn't exist and is created
2023-02-07T13:12:01,670 INFO [Tez session start thread] impl.YarnClientImpl: Killed application application_1675755644314_0002
2023-02-07T13:12:01,671 ERROR [Tez session start thread] tez.TezSessionState: Failed to start Tez session
java.io.IOException: java.lang.InterruptedException: sleep interrupted
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:456) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.access$100(TezSessionState.java:101) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState$1.call(TezSessionState.java:376) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState$1.call(TezSessionState.java:371) ~[hive-exec-3.1.2.jar:3.1.2]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_352]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_352]
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:1020) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:982) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:453) ~[hive-exec-3.1.2.jar:3.1.2]
    ... 5 more
2023-02-07T13:12:01,833 INFO [93c56d8b-6e20-4b06-a5ba-1101e67ac8d7 main] impl.YarnClientImpl: Submitted application application_1675755644314_0003
2023-02-07T13:12:01,839 INFO [93c56d8b-6e20-4b06-a5ba-1101e67ac8d7 main] client.TezClient: The url to track the Tez Session: http://hadoop-node:8088/proxy/application_1675755644314_0003/
Any advice would be helpful.
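In case it helps narrow things down, the final state of the two YARN applications from the log, and the application master log of the one that was submitted, can be checked with the standard YARN CLI (sketch only; the application IDs are taken from the log above):
yarn application -status application_1675755644314_0002
yarn application -status application_1675755644314_0003
yarn logs -applicationId application_1675755644314_0003    # needs yarn.log-aggregation-enable=true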

Related

Spark 2.0 Thrift server not started in yarn mode

I started the spark-2.0 Thrift server in a local environment and it works fine. When I tried it in the cluster environment, the following exception was thrown.
16/06/02 10:21:06 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended!
It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2246)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:749)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:57)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:81)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/06/02 10:21:06 INFO util.ShutdownHookManager: Shutdown hook called
When checking the application master logs, I see:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
Spark-defaults configuration:
spark.executor.memory 2g
spark.driver.memory 4g
spark.executor.cores 1
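For what it's worth, ExecutorLauncher lives in the spark-yarn module, so one hedged sanity check (assuming a standard Spark 2.0 layout under $SPARK_HOME; the HDFS path below is only an example) is to confirm the jar is present locally and that any staged jars on HDFS are what spark-defaults.conf points at:
jar tf $SPARK_HOME/jars/spark-yarn_*.jar | grep ExecutorLauncher    # the class should be listed
# if the Spark jars are staged on HDFS, spark-defaults.conf must point at the same location, e.g.:
# spark.yarn.jars hdfs:///user/spark/jars/*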

Yarn MapReduce approximate-pi example fails exit code 1 when run as non-hadoop user

I am running a small private cluster of Linux machines with Hadoop 2.6.2 and YARN. I launch YARN jobs from a Linux edge node. The canned YARN example to approximate the value of pi works perfectly when run by the hadoop user (the superuser and owner of the cluster), but fails when run from my personal account on the edge node. In both cases (hadoop, me) I run the job exactly like this:
clott@edge: /home/hadoop/hadoop-2.6.2/bin/yarn jar /home/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar pi 2 5
It fails; the full output is below. I think the file-not-found exception is totally bogus. I think something causes the launch of the container to fail, and so there's no output to be found. What causes container launches to fail, and how can this be debugged?
Because the identical command works perfectly when run by the hadoop user but not when run by a different account on the same edge node, I suspect a permission or other YARN configuration problem; I don't suspect a missing-jar problem. My personal account uses the same environment variables as the hadoop account, for what that's worth.
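As a quick first check (assuming YARN log aggregation is enabled; if it isn't, the steps further down show another way to keep the logs around), the container logs of the failed attempt can be fetched with the standard CLI, using the application ID from the output below:
yarn logs -applicationId application_1450815437271_0002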
These questions are similar but I didn't find a solution:
https://issues.cloudera.org/browse/DISTRO-577
Running a map reduce job as a different user
Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0
I have tried these remedies without any success:
In core-site.xml, set the value of hadoop.tmp.dir to /tmp/temp-${user.name} (see the snippet after this list)
Add my personal user account to every node in the cluster
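For reference, the first remedy above amounts to a core-site.xml entry like this (exactly the value described above):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/temp-${user.name}</value>
</property>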
I guess that many installations run with just a single user, but I'm trying to allow two people to work together on the cluster without trashing each other's intermediate results. Am I totally nuts?
Full output:
Number of Maps = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
15/12/22 15:29:18 INFO client.RMProxy: Connecting to ResourceManager at ac1.mycompany.com/1.2.3.4:8032
15/12/22 15:29:18 INFO input.FileInputFormat: Total input paths to process : 2
15/12/22 15:29:19 INFO mapreduce.JobSubmitter: number of splits:2
15/12/22 15:29:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1450815437271_0002
15/12/22 15:29:19 INFO impl.YarnClientImpl: Submitted application application_1450815437271_0002
15/12/22 15:29:19 INFO mapreduce.Job: The url to track the job: http://ac1.mycompany.com:8088/proxy/application_1450815437271_0002/
15/12/22 15:29:19 INFO mapreduce.Job: Running job: job_1450815437271_0002
15/12/22 15:29:31 INFO mapreduce.Job: Job job_1450815437271_0002 running in uber mode : false
15/12/22 15:29:31 INFO mapreduce.Job: map 0% reduce 0%
15/12/22 15:29:31 INFO mapreduce.Job: Job job_1450815437271_0002 failed with state FAILED due to: Application application_1450815437271_0002 failed 2 times due to AM Container for appattempt_1450815437271_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://ac1.mycompany.com:8088/proxy/application_1450815437271_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1450815437271_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/12/22 15:29:31 INFO mapreduce.Job: Counters: 0
Job Finished in 13.489 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ac1.mycompany.com/user/clott/QuasiMonteCarlo_1450816156703_163431099/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1817)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1841)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Yes, Manjunath Ballur, you were right: it was a permissions problem! I finally learned how to preserve the YARN application logs, which clearly revealed the problem. Here are the steps:
Edit yarn-site.xml and add a property to delay yarn log deletion:
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
Push yarn-site.xml to all nodes (ARGH I forgot this for a long time) and restart cluster.
Run the YARN example to estimate pi as shown above; it fails. Look at http://namenode:8088/cluster/apps/FAILED to see the failed application, click the link for the most recent failure, and look at the bottom to see which nodes in the cluster were used.
Open a window on one of the nodes in the cluster where the app failed. Find the job directory, which in my case was
~hadoop/hadoop-2.6.2/logs/userlogs/application_1450815437271_0004/container_1450815437271_0004_01_000001/
Et voila, I saw files stdout (only log4j bitching), stderr (nearly empty) and syslog (winner winner chicken dinner). In the syslog file I found this gem:
2015-12-23 08:31:42,376 INFO [main] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=clott, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/history":hadoop:supergroup:drwxrwx---
So the problem was permissions on hdfs:///tmp/hadoop-yarn/staging/history. A simple chmod 777 put me right; I'm not fighting the group perms anymore. Now a non-hadoop, non-superuser account can run a YARN job.
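The exact command isn't shown above, but the fix amounts to something like the following, run as the HDFS superuser (hadoop in this setup); -R is optional if only the history directory itself needs opening up:
sudo -u hadoop hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging/history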

Unauthorized request to start container. This token is expired

I am getting "Unauthorized request to start container. This token is expired."
How do I resolve it? The problem is reported on different forums, but I could not find a solution to it.
Below is the execution log:
15/02/26 16:41:02 INFO impl.YarnClientImpl: Submitted application application_1424968835929_0001
15/02/26 16:41:02 INFO mapreduce.Job: The url to track the job: http://101-master15:8088/proxy/application_1424968835929_0001/
15/02/26 16:41:02 INFO mapreduce.Job: Running job: job_1424968835929_0001
15/02/26 16:41:04 INFO mapreduce.Job: Job job_1424968835929_0001 running in uber mode : false
15/02/26 16:41:04 INFO mapreduce.Job: map 0% reduce 0%
15/02/26 16:41:04 INFO mapreduce.Job: Job job_1424968835929_0001 failed with state FAILED due to: Application application_1424968835929_0001 failed 2 times due to Error launching appattempt_1424968835929_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1424969604829 found 1424969463686
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.
15/02/26 16:41:04 INFO mapreduce.Job: Counters: 0
Time taken: 0 days, 0 hours, 0 minutes, 9 seconds.
This exception occurs when your nodes have different time settings. Make sure that all 3 of your nodes have the same time and timezone settings, and then restart them.
This worked for me. Hope it helps you as well!
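In practice that means checking the clock and timezone on every node and syncing them against a common source, for example (assuming ntpdate is installed; use whatever NTP server your site prefers):
date                           # run on every node and compare
sudo ntpdate pool.ntp.org      # one-off sync; a running ntpd/chronyd is the longer-term fix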

HBase distributed log-splitting keeps failing because unable to get a lease

We used up all the free space on our test HDFS cluster, so HBase crashed. After cleaning up some space, we were able to restart HBase, but after startup a distributed log-splitting job keeps failing.
The job looks like this:
Splitting log file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 into a temporary staging area.
The RegionServers try to get a lease on the file for some time:
2013-10-24 11:50:47,662 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
2013-10-24 11:50:47,671 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker host-4,60020,1382614844870 acquired task /hbase/splitlog/hdfs%3A%2F%2F192.168.249.1%3A9000%2Fhdfs%2Fhbase%2F.logs%2Fhost-3%2C60020%2C1382113928374-splitting%2Fhost-3%252C60020%252C1382113928374.1382523937002
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002, length=41274332
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering lease on dfs file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002
2013-10-24 11:50:47,673 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 1ms
2013-10-24 11:50:50,674 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=1 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 3002ms
2013-10-24 11:50:51,674 DEBUG org.apache.hadoop.hbase.util.FSHDFSUtils: isFileClosed not available
2013-10-24 11:51:51,680 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=2 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 64008ms
Then the Master aborts the job:
2013-10-24 11:55:48,685 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-10-24 11:55:48,687 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 interrupted, resigning
java.io.InterruptedIOException
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
... 9 more
It seems to me that the problem is that the RegionServers are unable to get a lease on this file because it is already open, so I checked with sudo -u hdfs hadoop fsck /hdfs/hbase/.logs/ -openforwrite, and it confirms:
OPENFORWRITE: /hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 41274332 bytes, 1 block(s), OPENFORWRITE:
/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002: Under replicated blk_1073337163743094520_3534698. Target Replicas is 3 but found 2 replica(s).
I tried to shut down HBase, but the file stays OPENFORWRITE. How could I remove this flag?
PS: Hadoop 1.0.1, HBase 0.94.12
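For completeness, the same fsck check can be made more verbose to show exactly which block and which datanodes still hold the open file (these are all standard hadoop fsck flags):
sudo -u hdfs hadoop fsck /hdfs/hbase/.logs/ -openforwrite -files -blocks -locations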

Oozie map-reduce example fails with ClassNotFoundException when using Bigtop 0.5.0

I'm using a relatively clean installation of CentOS 6.3 minimal with the Bigtop 0.5.0 repo and Sun Java 1.6. I added the Bigtop repo as per the instructions here.
I installed Hadoop common and Oozie using yum. I configured Oozie by running sudo service oozie init, then set up the HDFS paths using the commands in the init-hdfs.sh file from Bigtop 0.6.0.
I can run Java and streaming MapReduce jobs without any problems. I can also run the Oozie streaming example that comes bundled with Bigtop. Unfortunately, when I try to run the map-reduce example, I get a java.lang.ClassNotFoundException.
I can see from the HDFS audit logs that the oozie-examples-3.3.0.jar file gets inspected, but never opened. These are the only four entries for the jar file in the audit log for the time the workflow is running:
2013-03-12 14:42:07,394 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
2013-03-12 14:42:07,399 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
<snip>
2013-03-12 14:42:07,547 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
2013-03-12 14:42:07,550 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
The container logs I get from the web console on port 8088 show the following exception, but offer no further clues:
2013-03-12 15:10:18,681 FATAL [IPC Server handler 2 on 57310] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363061307536_0002_m_000000_0 - exited : java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1611)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:979)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1579)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1603)
... 16 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1485)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1577)
... 17 more
I managed to grab the job.xml file out of the temp directory while the failing stage of the workflow was running, and I can see that the jar file gets added to the classpath property:
<property><name>mapreduce.job.classpath.files</name><value>/user/user/user-oozi/0000001-130312141058075-oozie-oozi-W/mr-node--map-reduce/map-reduce-launcher.jar,/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar</value><source>programatically</source></property>
... but the class is still apparently not found. I've set logging to DEBUG for all components and can find no more clues.
Have I simply misconfigured something, or is this actually a bug? I don't really know what to do next.
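One more diagnostic that could be tried (a sketch only; the HDFS path is the one from the audit log above) is to pull the jar back out of HDFS and confirm the mapper class is actually inside it:
hdfs dfs -get /user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar /tmp/
jar tf /tmp/oozie-examples-3.3.0.jar | grep SampleMapper    # expect org/apache/oozie/example/SampleMapper.class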
