Oozie map-reduce example fails with ClassNotFoundException when using Bigtop 0.5.0 - hadoop

I'm using a relatively clean installation of CentOS 6.3 minimal with the Bigtop 0.5.0 repo and Sun Java 1.6. I add the Bigtop repo as per the instructions here.
I have installed Hadoop common and Oozie using yum. I configured oozie by running sudo service oozie init, then set up the HDFS paths using the commands in the init-hdfs.sh file in Bigtop 0.6.0.
I can run Java and streaming map reduce jobs without any problems. I can also run the Oozie streaming example that comes bundled with Bigtop. Unfortunately, when I try to run the map-reduce example, I get a java.lang.ClassNotFoundException
I can see from the HDFS audit logs that the oozie-examples-3.3.0.jar file gets inspected, but never opened. These are the only four entries for the jar file in the audit log for the time the workflow is running:
2013-03-12 14:42:07,394 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
2013-03-12 14:42:07,399 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
<snip>
2013-03-12 14:42:07,547 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
2013-03-12 14:42:07,550 INFO FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) via oozie (auth:SIMPLE) ip=/192.168.56.12 cmd=getfileinfo src=/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar dst=null perm=null
The container logs I get from the webconsole on port 8088 show the following exception, but offer no further clues:
2013-03-12 15:10:18,681 FATAL [IPC Server handler 2 on 57310] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363061307536_0002_m_000000_0 - exited : java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1611)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:979)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1579)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1603)
... 16 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.example.SampleMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1485)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1577)
... 17 more
I managed to grab the job.xml file out of the temp directory while the failing stage of the workflow was running, and I can see that the jar file gets added to the classpath property:
<property><name>mapreduce.job.classpath.files</name><value>/user/user/user-oozi/0000001-130312141058075-oozie-oozi-W/mr-node--map-reduce/map-reduce-launcher.jar,/user/user/examples/apps/map-reduce/lib/oozie-examples-3.3.0.jar</value><source>programatically</source></property>
... but the class is still apparently not found. I've set all debugging up to DEBUG for all components and can find no more clues.
Have I simply misconfigured something, or is this actually a bug? I don't really know what to do next.

Related

YARN application fails after submission while running any mapreduce jar

Here is the error snapshot :
[hduser#secondary ~]$ yarn jar test_word_count.jar com.test.wordc.WordCount /temp /test_output
18/10/11 16:13:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/11 16:13:02 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.70.148:8032
18/10/11 16:13:02 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/10/11 16:13:03 INFO input.FileInputFormat: Total input paths to process : 1
18/10/11 16:13:03 INFO mapreduce.JobSubmitter: number of splits:1
18/10/11 16:13:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539254507997_0001
18/10/11 16:13:04 INFO impl.YarnClientImpl: Submitted application application_1539254507997_0001
18/10/11 16:13:04 INFO mapreduce.Job: The url to track the job: http://secondary:8088/proxy/application_1539254507997_0001/
18/10/11 16:13:04 INFO mapreduce.Job: Running job: job_1539254507997_0001
18/10/11 16:13:05 INFO mapreduce.Job: Job job_1539254507997_0001 running in uber mode : false
18/10/11 16:13:05 INFO mapreduce.Job: map 0% reduce 0%
18/10/11 16:13:05 INFO mapreduce.Job: Job job_1539254507997_0001 failed with state FAILED due to: Application application_1539254507997_0001 failed 2 times due to Error launching appattempt_1539254507997_0001_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.io.EOFException; Host Details : local host is: "secondary/10.0.70.149"; destination host is: "slave4":38102;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy32.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.EOFException
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
... 12 more
. Failing the application.
18/10/11 16:13:05 INFO mapreduce.Job: Counters: 0
Here is the /etc/hosts file
[hduser#secondary ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.70.149 secondary
10.0.70.148 master
10.0.70.143 slave1
10.0.70.144 slave2
10.0.70.145 slave3
10.0.70.146 slave4
10.0.70.147 slave5
all the services are running fine. It is a 6 node cluster with 5 DN and 1 NN.
After submitting the job, I get an error which is listed above. NN hadoop version is hadoop 2.6.0 while hadoop version in DN is 2.5.2. It is a maven build with hadoop version 2.6.0 in pom.xml
Fixed the error by installing hadoop 2.5.2 on the namemode iteslf. Seems like the whole cluster requires the same hadoop version.

geomesa add-attribute-index fails

I am trying to add an index to my existing table using the following command (run inside accumulo-master docker image)
geomesa add-attribute-index -u root -p secret -i gis -z SERVER_IP -c posiciones -f posicion -a id_posicion --coverage join
But it does not work and produce this output:
INFO Running map reduce index job for attributes: [id_posicion] with coverage: join...
ERROR Error encountered running attribute index command. Check hadoop's job history logs for more information.
The hadoop job log is the following:
2017-09-17 20:39:48,253 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1505353025896_0020_000002
2017-09-17 20:39:48,706 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-17 20:39:48,757 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2017-09-17 20:39:49,079 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 20 cluster_timestamp: 1505353025896 } attemptId: 2 } keyId: -1893920016)
2017-09-17 20:39:49,094 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2017-09-17 20:39:49,095 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config org.apache.hadoop.mapred.DirectFileOutputCommitter
2017-09-17 20:39:49,173 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:519)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:499)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1594)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:499)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:284)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1552)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1549)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1482)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:223)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:516)
... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 13 more
Any idea?
This is probably a bug - the jars to load are defined in this file. Likely the file needs to be updated for newer versions of accumulo - the missing class now appears to be in the accumulo-core jar. You should be able to fix it by adding the line accumulo-core to that file, which ends up in lib/geomesa-accumulo-jobs-<version>.jar in the tools distribution.
Is $ACCUMULO_HOME set? And are other geomesa commands working?
Setting $ACCUMULO_HOME to point to a copy of the Accumulo distribution would likely help. If you are using the GeoMesa tools from a machine which is not part of the cluster, then you can use the install-hadoop-accumulo.sh script in the tools distribution to download a copy of the necessary dependencies to $GEOMESA_HOME/lib.

Hadoop Streaming Error No such file or directory

I study Hadoop and test Hadoop Streaming using Ruby whether my MapReduce algorithm could work without error.
So, I did next command.
hadoop jar hadoop-streaming-2.7.2.jar -files mapper.rb,reducer.rb -mapper mapper.rb -reducer reducer.rb -input test.json -output test
But, next error occurred.
dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory
Also, next is one job all error.
16/07/21 19:22:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/hadoop-unjar8708804571112102348/] [] /var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/streamjob5933629893966751936.jar tmpDir=null
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:06 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: number of splits:2
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469093819516_0005
16/07/21 19:22:06 INFO impl.YarnClientImpl: Submitted application application_1469093819516_0005
16/07/21 19:22:06 INFO mapreduce.Job: The url to track the job: http://hatanokaoruakira-no-MacBook-Air.local:8088/proxy/application_1469093819516_0005/
16/07/21 19:22:06 INFO mapreduce.Job: Running job: job_1469093819516_0005
16/07/21 19:22:13 INFO mapreduce.Job: Job job_1469093819516_0005 running in uber mode : false
16/07/21 19:22:13 INFO mapreduce.Job: map 0% reduce 0%
16/07/21 19:22:18 INFO mapreduce.Job: Task Id : attempt_1469093819516_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/private/tmp/hadoop-Kuma/nm-local-dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
mapper.rb and reducer.rb exist at current directory which executed the command.
wordcount MapReduce for testing is running without error, so I think that Hadoop setting is ok.
Environment
Mac El Capitan
Hadoop 2.7.2(presudo distributed mode)
files specified with -files options must be in hdfs:
command:
hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1
Error:
Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/f1.txt
After copying files to hdfs (in your case you will need to copy to hdfs root dir - (according to error in your log)):
hadoop fs -put f1.txt /user/cloudera
hadoop fs -put f2.txt /user/cloudera
job ran with NO errors:
[cloudera#quickstart hadoop-mapreduce]$ hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1
packageJobJar: [] [/usr/jars/hadoop-streaming-2.6.0-cdh5.7.0.jar] /tmp/streamjob5729321067745308196.jar tmpDir=null
16/07/23 11:36:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:17 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:18 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/23 11:36:18 INFO mapreduce.JobSubmitter: number of splits:1
HadoopStreaming - Making_Files_Available_to_Tasks:
The -files and -archives options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Note: The -files and -archives options are generic options. Be sure to place the generic options before the command options, otherwise the command will fail.

Debugging hadoop Wordcount program in eclipse in windows

Trying to run Wordcount program in hadoop in eclipse (windows 7). and passing these argument in eclipse only
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created input file in project only like input folder and inside it word.txt file
But it is throughing below excption
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)
I doubt if Hadoop is installed correctly. Check in your machine if all the daemons are running or not.If not, then consider re-checking or re-installing what you are missing.
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH
to point at /etc/hadoop where core-site.xml, hdfs-site.xml, and mapred-site.xml are defined. Hadoop cluster is functional and works fine with non-Pig jobs. Based on the error descriptions below, I understand that Pig cannot find the JobContextImpl class it is looking for.
Based on the Hadoop 1.1.2 API documentation, I don't believe "task" is a sub-package of the "mapreduce" package. I have tried adding hadoop-core-1.1.2.jar directly to $PIG_CLASSPATH
and that did not work. (After looking at the contents of hadoop-core-1.1.2.jar, and the Hadoop 1.1.2 API documentation, I don't believe JobContextImpl is defined in the package Pig is attempting to load it from). How do I get Pig 0.13 to work with Hadoop 1.1.2?
=======Error follows as below=======
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
2014-08-03 14:01:06,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.localdomain:8020/
2014-08-03 14:01:06,388 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.localdomain:8021
2014-08-03 14:01:06,440 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
===Contents of pig_1407088865958.log ===
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:79)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
Though it is unclear how well this works for everyone, it appears that the asker mentioned how he solved the problem:
In my searching for help I saw posts stating that it needs to be
recompiled with a parameter that indicates version. The parameter
values I saw were 23,24. I did not know how that parameter mapped to
the version of hadoop that I am using 1.1.2. I hacked the bin/pig
script to point to hadoop-core-1.1.2.jar. The script requires
HADOOP_HOME be set (which is deprecated).

Resources