Oozie launching MR jobs as YARN user instead of given user name - hadoop

I have designed a Oozie workflow to run a Sqoop script.
I'm submitting the workflow using the user name given by hadoop admin team.
Script is failing because oozie is launching MR jobs as YARN user which is not able to access my userid directory in HDFS i.e /user/cv1100.
I checked the MR log there I can see the property as "user.name=yarn"
How can I change this? I have mentioned "user.name" in job.properties file of Oozie.
Below is the error I’m getting in logs, check if it helps
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
error: error reading /usr/lib/hadoop/lib/smore.jar; /usr/lib/hadoop/lib/smore.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs.jar; /usr/lib/hadoop/lib/adfs.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aftp.jar; /usr/lib/hadoop/lib/aftp.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/janusclient.jar; /usr/lib/hadoop/lib/janusclient.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs-api-loader.jar; /usr/lib/hadoop/lib/adfs-api-loader.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aster-networking.jar; /usr/lib/hadoop/lib/aster-networking.jar (Permission denied)
Note: /tmp/sqoop-yarn/compile/a9988b6ea5448f4cc962b625361feb1a/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
31075 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Loading data to table ecsdashboard.test_oozie
31175 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Failed with exception Unable to move sourcehdfs:///user/qjdht93/test/_SUCCESS to destination hdfs://apps/hive/warehouse/ecsdashboard.db/test_oozie/_SUCCESS
31176 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
31728 [main] ERROR org.apache.sqoop.tool.ImportTool - Encountered IOException running import job: java.io.IOException: Hive exited with status 1
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:425)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:206)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:174)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/action-data.seq
Oozie Launcher ends
Log Type: syslog
Log Length: 2914
2015-04-14 02:49:11,912 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-04-14 02:49:11,943 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1428177121154_1938, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#45d1f40c)
2015-04-14 02:49:12,048 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.3:8050, Ident: (owner=qjdht93, renewer=oozie mr token, realUser=oozie, issueDate=1428994143490, maxDate=1429598943490, sequenceNumber=253, masterKeyId=11)
2015-04-14 02:49:12,100 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-04-14 02:49:12,383 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data2/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data3/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data4/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data5/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data6/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data7/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data8/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data9/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data10/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data11/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data12/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938
2015-04-14 02:49:12,821 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-14 02:49:13,273 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-04-14 02:49:13,596 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/input/dummy.txt:0+5
2015-04-14 02:49:13,622 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-04-14 02:49:13,659 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

Related

ERROR streaming.StreamJob: Job not successful

I install Hadoop 2.9.0 and I have 4 nodes. The namenode and resourcemanager services are running on master and datanodes and nodemanagers are running on slaves. Now I wanna run a python MapReduce job. But Job not successful!
Please tell me what should I do?
Log of running of job in terminal:
hadoop#hadoopmaster:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /user/hadoop/* -output /user/hadoop/output
18/06/17 04:26:28 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
18/06/17 04:26:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [mapper.py, /tmp/hadoop-unjar3316382199020742755/] [] /tmp/streamjob4930230269569102931.jar tmpDir=null
18/06/17 04:26:28 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
18/06/17 04:26:29 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 INFO mapred.FileInputFormat: Total input files to process : 4
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 INFO mapreduce.JobSubmitter: number of splits:4
18/06/17 04:26:29 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/17 04:26:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529233655437_0004
18/06/17 04:26:30 INFO impl.YarnClientImpl: Submitted application application_1529233655437_0004
18/06/17 04:26:30 INFO mapreduce.Job: The url to track the job: http://hadoopmaster.png.com:8088/proxy/application_1529233655437_0004/
18/06/17 04:26:30 INFO mapreduce.Job: Running job: job_1529233655437_0004
18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 running in uber mode : false
18/06/17 04:45:12 INFO mapreduce.Job: map 0% reduce 0%
18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 failed with state FAILED due to: Application application_1529233655437_0004 failed 2 times due to Error launching appattempt_1529233655437_0004_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:774)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
at org.apache.hadoop.ipc.Client.call(Client.java:1349)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy82.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:687)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
... 19 more
. Failing the application.
18/06/17 04:45:13 INFO mapreduce.Job: Counters: 0
18/06/17 04:45:13 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
OK. I found the reason of problem. In fact, the following error should be resolved:
hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569
failed on socket timeout exception
So just did:
sudo ufw allow 40569

Error when trying to execute pig statment

I am trying to execute a pig statement that shows me the data in a txt file and I am running in mapreduce mode, but I am getting an error please can somebody help me to resolve this!!
[root#master ~]# pig -x mapreduce
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-04-19 17:42:34,853 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r1746530) compiled Jun 01 2016, 23:10:49
2017-04-19 17:42:34,853 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1492603954851.log
2017-04-19 17:42:34,907 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-04-19 17:42:36,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost
2017-04-19 17:42:37,130 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-f60d05c3-9fee-4624-9aa8-07f1584e6165
2017-04-19 17:42:37,130 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> dump b;
2017-04-19 17:42:41,135 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/tmp/temp1549818457":dead:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1692)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3894)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2017-04-19 17:42:41,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias b
Details at logfile: /root/pig_1492603954851.log
You can try this :-
pig -x mapreduce -p 'pig.temp.dir'='<temp_location_hdfs>'
'temp_location_hdfs' should have either 775 or 777 permissions.
Then you can try :-
hadoop fs -chmod -R 777 /tmp/*
It seems like you do not have proper permission to pig.temp.dir setting and hence this issue. By default pig writes the intermediate results in /tmp on HDFS. Overwrite it by using -Dpig.temp.dir.

Oozie workflow giving error on Hive job when underlying job completes successfully

Part of self-learning I am exploring Oozie, and I am practicing on Hortonworks Sandbox VM. The problem is that Oozie workflow is getting error and getting killed as a result when underlying job given by the link in Oozie UI shows success.
I have looked at this question and have included
<job-xml>hive-site.xml</job-xml>
in the job description, and have copied hive-site.xml to HDFS to the correct folder but to no avail. Additionally, I have double checked all URLs and everything is right.
I am running the Oozie job from command line. I have no idea where to start debugging or how to get a more detailed error. Following are screenshots:
Oozie Error
Underlying Hive job indicates successful completion.
I do not see the final result as hive table as I am supposed to see.
Following is the log output of the Map task:
<<< Invocation of Hive command completed <<<
Hadoop Job IDs executed by Hive:
Intercepting System.exit(12)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [12]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/action-data.seq
2016-07-12 05:30:57,817 INFO [main] zlib.ZlibFactory (ZlibFactory.java:<clinit>(49)) - Successfully loaded & initialized native-zlib library
2016-07-12 05:30:57,818 INFO [main] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
Oozie Launcher ends
2016-07-12 05:30:57,836 INFO [main] mapred.Task (Task.java:done(1038)) - Task:attempt_1468271868299_0037_m_000000_0 is done. And is in the process of committing
2016-07-12 05:30:57,878 INFO [main] mapred.Task (Task.java:commit(1199)) - Task attempt_1468271868299_0037_m_000000_0 is allowed to commit now
2016-07-12 05:30:57,887 INFO [main] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(582)) - Saved output of task 'attempt_1468271868299_0037_m_000000_0' to hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/output/_temporary/1/task_1468271868299_0037_m_000000
2016-07-12 05:30:57,936 INFO [main] mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_1468271868299_0037_m_000000_0' done.
Log Type: syslog
Log Upload Time: Tue Jul 12 05:31:05 +0000 2016
Log Length: 2781
2016-07-12 05:30:48,083 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-07-12 05:30:48,151 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-12 05:30:48,152 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1468271868299_0037, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#1fbe7534)
2016-07-12 05:30:48,212 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 10.0.2.15:8050, Ident: (owner=root, renewer=oozie mr token, realUser=oozie, issueDate=1468301434802, maxDate=1468906234802, sequenceNumber=22, masterKeyId=90)
2016-07-12 05:30:48,257 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-07-12 05:30:48,496 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/root/appcache/application_1468271868299_0037
2016-07-12 05:30:48,955 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-07-12 05:30:49,423 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-07-12 05:30:49,475 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 4558
2016-07-12 05:30:49,647 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit#1f16b6e6
2016-07-12 05:30:49,654 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-07-12 05:30:49,700 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2016-07-12 05:30:50,069 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2016-07-12 05:30:50,253 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
To find the error check:
Check the "Child Job URLs" check each child job.
Check stdout and stderr
Sometime the error is in the child job and you will need to do some drill down on every one of them.

How to resolve the following apache pig error?

I am executing the following commands:
A= load 'user/cloudera' using PigStorage(':');
foreach A generate $0,$4,$5;
dump B;
On executing the last command I get the following error which I am unable to resolve.Being a newbie to bigdata and apache hadoop stack,I am unable to comprehend this error.Please help ASAP.Also searching on StackOverflow for similar errors did not help:
2015-11-13 06:36:46,170 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-11-13 06:36:46,208 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2015-11-13 06:36:46,212 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-11-13 06:36:46,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-11-13 06:36:46,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-11-13 06:36:46,404 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:36:46,415 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-11-13 06:36:46,445 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-11-13 06:36:49,232 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job306801006066349255.jar
2015-11-13 06:37:04,185 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job306801006066349255.jar created
2015-11-13 06:37:04,223 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-11-13 06:37:04,274 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-11-13 06:37:04,274 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-11-13 06:37:04,283 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:37:04,363 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-11-13 06:37:05,416 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1447417089361_0004
2015-11-13 06:37:05,420 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
2015-11-13 06:37:05,420 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 18 more
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1447417089361_0004
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,3],B[4,3] C: R:
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1447417089361_0004
2015-11-13 06:37:05,440 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-11-13 06:37:10,463 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-11-13 06:37:10,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1447417089361_0004 has failed! Stop running all dependent jobs
2015-11-13 06:37:10,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-11-13 06:37:10,620 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,844 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,849 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-11-13 06:37:10,850 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.4.2 0.12.0-cdh5.4.2 cloudera 2015-11-13 06:36:46 2015-11-13 06:37:10 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1447417089361_0004 A,B MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 18 more
hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528,
Input(s):
Failed to read data from "hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1447417089361_0004
2015-11-13 06:37:10,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-11-13 06:37:10,853 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
Details at logfile: /home/cloudera/pig_1447424730804.log
Solution:
Change line 1 of your code to:
A= load '/user/cloudera' using PigStorage(':');
Having said that, are you sure your input data is in your home area? That seems unlikely. It's more likely to be in a folder within your home area, i.e. /user/cloudera/input-data.
Before running your job do:
hdfs dfs -ls /user/cloudera
to confirm that input data is actually in that folder. If it's not, work out where it actually is, and make sure it is on the HDFS and not locally.
Explanation:
The relevant part of the logs is
ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
This suggests it's related to the input path. The part of your code that deals with the input path is
A= load 'user/cloudera' using PigStorage(':');
By not adding a forward slash to /user then it assumes that everything is relative to your home area, so for example writing load 'input' would lead the Pig job to read in hdfs://quickstart.cloudera:8028/user/cloudera/input. In your case then the missing slash means it adds it to your user area.
Please check where are you running pig statements in local/linux mode or mapreduce mode , if you are running local mode you cant read HDFS file
run this cmd $ pig -x mapreduce
after this
Grunt> A = load '/;
Grunt>Dump/store /
Input(s):
Failed to read data from
"hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
If you look at the above error code (copied from question that was posted), input file path is taken by pig as below
'hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera'
So when running pig in mapreduce mode and using quickstart VM, you have use
'hdfs://quickstart.cloudera:8020' before your input file path....
Is the input a folder?...if so pig script will read all the files in the folder....
Example:
say your input file path which is located in hdfs is '/usr/cloudera/inputdata.txt' then your query has to be
A= load 'hdfs://quickstart.cloudera:8020/user/cloudera/inputdata.txt' using PigStorage(':');
foreach A generate $0,$4,$5;
dump B;
The only thing that u have to check is the file path.
hdfs dfs -ls /
check how the path structure comes.
like ..../input/data/file.
file u are going to use.
A = load '/input/data/file';
dump A;
only thing is the proper path of your file that u want to load.
I had a similar issue and I was able to fix by deleting the passwd file from the target folder and then copying it again using the correct relative path:
A= load '/user/cloudera' using PigStorage(':');
The instruction in the video is missing the '/' before the user
Step 1:
hdfs dfs -rm /user/cloudera/passwd
Step 2:
A= load '/user/cloudera' using PigStorage(':');
Then followed the instructions from the Coursera video.

Loading data from HDFS does not work with Elephantbird

I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force pig to local mode (''-x local'') I get the more obvious error
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess.
This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Resources