Oozie workflow giving error on Hive job when underlying job completes successfully - debugging

Part of self-learning I am exploring Oozie, and I am practicing on Hortonworks Sandbox VM. The problem is that Oozie workflow is getting error and getting killed as a result when underlying job given by the link in Oozie UI shows success.
I have looked at this question and have included
<job-xml>hive-site.xml</job-xml>
in the job description, and have copied hive-site.xml to HDFS to the correct folder but to no avail. Additionally, I have double checked all URLs and everything is right.
I am running the Oozie job from command line. I have no idea where to start debugging or how to get a more detailed error. Following are screenshots:
Oozie Error
Underlying Hive job indicates successful completion.
I do not see the final result as hive table as I am supposed to see.
Following is the log output of the Map task:
<<< Invocation of Hive command completed <<<
Hadoop Job IDs executed by Hive:
Intercepting System.exit(12)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [12]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/action-data.seq
2016-07-12 05:30:57,817 INFO [main] zlib.ZlibFactory (ZlibFactory.java:<clinit>(49)) - Successfully loaded & initialized native-zlib library
2016-07-12 05:30:57,818 INFO [main] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
Oozie Launcher ends
2016-07-12 05:30:57,836 INFO [main] mapred.Task (Task.java:done(1038)) - Task:attempt_1468271868299_0037_m_000000_0 is done. And is in the process of committing
2016-07-12 05:30:57,878 INFO [main] mapred.Task (Task.java:commit(1199)) - Task attempt_1468271868299_0037_m_000000_0 is allowed to commit now
2016-07-12 05:30:57,887 INFO [main] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(582)) - Saved output of task 'attempt_1468271868299_0037_m_000000_0' to hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/output/_temporary/1/task_1468271868299_0037_m_000000
2016-07-12 05:30:57,936 INFO [main] mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_1468271868299_0037_m_000000_0' done.
Log Type: syslog
Log Upload Time: Tue Jul 12 05:31:05 +0000 2016
Log Length: 2781
2016-07-12 05:30:48,083 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-07-12 05:30:48,151 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-12 05:30:48,152 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1468271868299_0037, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#1fbe7534)
2016-07-12 05:30:48,212 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 10.0.2.15:8050, Ident: (owner=root, renewer=oozie mr token, realUser=oozie, issueDate=1468301434802, maxDate=1468906234802, sequenceNumber=22, masterKeyId=90)
2016-07-12 05:30:48,257 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-07-12 05:30:48,496 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/root/appcache/application_1468271868299_0037
2016-07-12 05:30:48,955 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-07-12 05:30:49,423 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-07-12 05:30:49,475 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 4558
2016-07-12 05:30:49,647 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit#1f16b6e6
2016-07-12 05:30:49,654 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-07-12 05:30:49,700 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2016-07-12 05:30:50,069 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2016-07-12 05:30:50,253 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050

To find the error check:
Check the "Child Job URLs" check each child job.
Check stdout and stderr
Sometime the error is in the child job and you will need to do some drill down on every one of them.

Related

MapReduce job is failing with an error failed to write data

I'm trying to export data from teradata to hadoop. but my export query is failing by giving an error "Failed to write data".Please look at the Mapreduce and application logs below:
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 4931
2016-03-08 22:47:07,414 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-03-08 22:47:07,509 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-03-08 22:47:07,510 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1457504560070_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#175b9425)
2016-03-08 22:47:07,556 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.2:8032,39.7.48.3:8032, Ident: (owner=hive, renewer=oozie mr token, realUser=oozie, issueDate=1457506410968, maxDate=1458111210968, sequenceNumber=908, masterKeyId=280)
2016-03-08 22:47:07,599 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-03-08 22:47:07,848 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data1/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data2/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data3/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data4/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data5/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data6/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data7/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data8/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data9/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data10/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data12/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004
2016-03-08 22:47:08,132 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-03-08 22:47:08,623 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-03-08 22:47:08,840 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataInputSplit#2ece4966
2016-03-08 22:47:08,844 INFO [main] com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReader: recordreader class com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReaderinitialize time is: 1457506028844
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 300417020(1201668080)
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1146
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 841167680
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1201668096
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020; length = 75104256
2016-03-08 22:47:09,515 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-03-08 22:47:09,848 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.local does not exist
2016-03-08 22:47:09,914 INFO [main] hive.metastore: Trying to connect to metastore with URI thrift://apus2.labs.teradata.com:9083
2016-03-08 22:47:09,951 INFO [main] hive.metastore: Connected to metastore.
2016-03-08 22:47:10,407 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2016-03-08 22:47:10,452 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-03-08 22:47:10,457 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
APPLICATION Master LOGS:
Log Type: stderr
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 240
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Log Type: stdout
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 0
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 66959
Showing 4096 bytes of 66959 total. Click here for the full log.
ILED
2016-03-08 22:59:19,325 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event JOB_FAILED
2016-03-08 22:59:19,456 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1.jhist to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,550 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,562 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1_conf.xml to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,614 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,645 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary
2016-03-08 22:59:19,654 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2016-03-08 22:59:19,671 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1457504560070_0004_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-03-08 22:59:19,672 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: History url is http://apus2.labs.teradata.com:19888/jobhistory/job/job_1457504560070_0004
2016-03-08 22:59:19,680 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Waiting for application to be successfully unregistered.
2016-03-08 22:59:20,682 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:7 ContRel:0 HostLocal:6 RackLocal:1
2016-03-08 22:59:20,684 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://C423A /user/hive/.staging/job_1457504560070_0004
2016-03-08 22:59:20,711 INFO [Thread-89] org.apache.hadoop.ipc.Server: Stopping server on 46067
2016-03-08 22:59:20,712 INFO [IPC Server listener on 46067] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 46067
2016-03-08 22:59:20,712 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-08 22:59:20,714 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted.
Plese help me in resolving the issue.
You must be using sqoop to bring data to hadoop. Please paste command you are running. For "Failed to write data" , there can be multiple issues. destination parent directory is not avialble, space is not there at cluster etc.Only command can give explanation.

Error while doing bulkload in HBase

Im trying to do bulkload in HBase but below exception is coming while loading the data...
Application application_1439213972129_0080 initialization failed (exitCode=255) with output: Requested user root is not whitelisted and has id 0,which is below the minimum allowed 500
Failing this attempt. Failing the application.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary -Dimporttsv.separator=',' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv
2015-08-13 18:24:33,076 INFO [main] mapreduce.TableMapReduceUtil: Setting speculative execution off for bulkload operation
2015-08-13 18:24:33,123 INFO [main] mapreduce.TableMapReduceUtil: Configured 'hbase.mapreduce.mapr.tablepath' to /tables/emp_salary_new1
2015-08-13 18:24:33,220 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,372 INFO [main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2015-08-13 18:24:33,735 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,770 INFO [main] mapreduce.TableOutputFormat: Created table instance for /tables/emp_salary_new1
2015-08-13 18:24:34,252 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-08-13 18:24:34,294 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-08-13 18:24:34,535 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1439213972129_0055
2015-08-13 18:24:34,792 INFO [main] security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2015-08-13 18:24:35,031 INFO [main] impl.YarnClientImpl: Submitted application application_1439213972129_0055
2015-08-13 18:24:35,114 INFO [main] mapreduce.Job: The url to track the job: http://hadoop-c02n02.ss.sw.ericsson.se:8088/proxy/application_1439213972129_0055/
2015-08-13 18:24:35,115 INFO [main] mapreduce.Job: Running job: job_1439213972129_0055
2015-08-13 18:24:53,253 INFO [main] mapreduce.Job: Job job_1439213972129_0055 running in uber mode : false
2015-08-13 18:24:53,256 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-08-13 18:24:53,281 INFO [main] mapreduce.Job: Job job_1439213972129_0055 failed with state FAILED due to: Application application_1439213972129_0055 failed 2 times due to AM Container for appattempt_1439213972129_0055_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop-c02n02.ss.sw.ericsson.se:8088/cluster/app/application_1439213972129_0055Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1439213972129_0055_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is mapradm
main : requested yarn user is mapradm
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2015-08-13 18:24:53,320 INFO [main] mapreduce.Job: Counters: 0
Looks like you are loading data in MapR DB not in hbase.But its fine hbase commands are compatible with MarDB. I just a small change in your command and see if that works for you.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary '-Dimporttsv.separator=,' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv

Oozie launching MR jobs as YARN user instead of given user name

I have designed a Oozie workflow to run a Sqoop script.
I'm submitting the workflow using the user name given by hadoop admin team.
Script is failing because oozie is launching MR jobs as YARN user which is not able to access my userid directory in HDFS i.e /user/cv1100.
I checked the MR log there I can see the property as "user.name=yarn"
How can I change this? I have mentioned "user.name" in job.properties file of Oozie.
Below is the error I’m getting in logs, check if it helps
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
error: error reading /usr/lib/hadoop/lib/smore.jar; /usr/lib/hadoop/lib/smore.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs.jar; /usr/lib/hadoop/lib/adfs.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aftp.jar; /usr/lib/hadoop/lib/aftp.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/janusclient.jar; /usr/lib/hadoop/lib/janusclient.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs-api-loader.jar; /usr/lib/hadoop/lib/adfs-api-loader.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aster-networking.jar; /usr/lib/hadoop/lib/aster-networking.jar (Permission denied)
Note: /tmp/sqoop-yarn/compile/a9988b6ea5448f4cc962b625361feb1a/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
31075 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Loading data to table ecsdashboard.test_oozie
31175 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Failed with exception Unable to move sourcehdfs:///user/qjdht93/test/_SUCCESS to destination hdfs://apps/hive/warehouse/ecsdashboard.db/test_oozie/_SUCCESS
31176 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
31728 [main] ERROR org.apache.sqoop.tool.ImportTool - Encountered IOException running import job: java.io.IOException: Hive exited with status 1
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:425)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:206)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:174)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/action-data.seq
Oozie Launcher ends
Log Type: syslog
Log Length: 2914
2015-04-14 02:49:11,912 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-04-14 02:49:11,943 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1428177121154_1938, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#45d1f40c)
2015-04-14 02:49:12,048 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.3:8050, Ident: (owner=qjdht93, renewer=oozie mr token, realUser=oozie, issueDate=1428994143490, maxDate=1429598943490, sequenceNumber=253, masterKeyId=11)
2015-04-14 02:49:12,100 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-04-14 02:49:12,383 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data2/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data3/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data4/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data5/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data6/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data7/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data8/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data9/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data10/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data11/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data12/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938
2015-04-14 02:49:12,821 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-14 02:49:13,273 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-04-14 02:49:13,596 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/input/dummy.txt:0+5
2015-04-14 02:49:13,622 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-04-14 02:49:13,659 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

Loading data from HDFS does not work with Elephantbird

I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force pig to local mode (''-x local'') I get the more obvious error
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess.
This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Hadoop MapReduce - Pig/Cassandra - Unable to create input splits

I'm trying to run a MapReduce Job with Pig and Cassandra and I always get the error:
ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
[SOLVED]
There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS,
PIG_PARTITIONER
/opt/cassandra-0.7.0-beta3/contrib/pig$ bin/pig_cassandra example-script.pig
10/11/15 17:38:26 INFO pig.Main: Logging error messages to: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
2010-11-15 17:38:27,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-master-1.dkd.lan:8020
2010-11-15 17:38:29,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-master-1.dkd.lan:8021
2010-11-15 17:38:32,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://hadoop-master-1.dkd.lan/tmp/temp657556636/tmp-375431593:org.apache.pig.builtin.BinStorage) - 1-82 Operator Key: 1-82)
2010-11-15 17:38:32,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-11-15 17:38:33,364 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-11-15 17:38:38,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-11-15 17:38:38,999 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-11-15 17:38:39,055 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-11-15 17:38:39,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-11-15 17:38:40,340 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://hadoop-master-1.dkd.lan/var/lib/hadoop-0.20/cache/mapred/mapred/staging/dkd-sprenger/.staging/job_201011101636_0011
2010-11-15 17:38:40,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-11-15 17:38:40,357 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-11-15 17:38:40,402 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-11-15 17:38:40,517 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
Details at logfile: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
Anyone who has an idea -> SOLVED
There were some environment variables I missed to set them.
enviroment:
Ubuntu Server 10.4
Versions:
hadoop: 0.20
pig: 0.7
cassandra: 0.7.0 beta3
The asker already updated the question to include the answer:
[SOLVED] There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS, PIG_PARTITIONER

Resources