Cannot write to log - pig error in local mode - hadoop

Folks
I am new to hadoop eco system , was trying to install pig .
got stuck with the below error will trying to execute the tutorial script.
while installing the pig in hadoop cluster i did not change any permission
please let me know if have to do or the below is pertain to that.
any information is highly appreciable
2016-01-08 07:19:20,655 [main] WARN org.apache.pig.Main - Cannot write to
log file: /usr/lib/pig-0.6.0/tutorial/scripts/pig_1452266360654.log
2016-01-08 07:19:20,723 [main] WARN org.apache.pig.Main - Cannot write to log file: /usr/lib/pig-0.6.0/tutorial/scripts//script1-hadoop.pig1452266360723.log
2016-01-08 07:19:20,740 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. null
2016-01-08 07:19:20,747 [main] WARN org.apache.pig.Main - There is no log file to write to.
2016-01-08 07:19:20,747 [main] ERROR org.apache.pig.Main - java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:394)
at java.util.Properties.setProperty(Properties.java:143)
at org.apache.pig.Main.main(Main.java:373)

Related

Not able to export Hbase table into CSV file using HUE Pig Script

I have installed Apache Amabari and configured the Hue. I want to export hbase table data into csv file using pig script but I am getting following error.
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s).
Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-03 10:27:45,709 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/usr/lib/hbase/lib/hbase-common-1.2.0-cdh5.11.0.jar' does not exist.
2017-06-03 10:27:45,899 [main] INFO org.apache.pig.Main - Pig script completed in 4 seconds and 532 milliseconds (4532 ms)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code 2
Oozie Launcher failed, finishing Hadoop job gracefully
Please help me and where I am doing wrong.
Let me know your concerns.

pig script does not exists error , even if I can see it in hdfs

I am trying to run the pig script using the -f usecatalog option but it is giving me issue.
it says script does not exist, while I can see the file is present in hdfs file system. see below.
[hdfs#ip-xx-xx-xx-x-xx ec2-user]$ pig -useHCatalog -f /user/admin/pig/scripts/hcat1.pig
WARNING: Use "yarn jar" to launch YARN applications.
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.3.4.0-3485 (rexported) compiled Dec 16 20 15, 04:30:33
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,184 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /user/admin/pig/scripts/hca t1.pig does not exist
Details at logfile: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,203 [main] INFO org.apache.pig.Main - Pig script completed in 753 milliseconds (753 ms)
[hdfs#ip-xxx-xx-xx-xx ec2-user]$ hadoop fs -cat /user/admin/pig/scripts/hcat1.pig
a = load 'trucks' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by truckid == 'A1';
store b INTO '/user/admin/pig/scritps/outputb1';
You need to specify the complete HDFS URI to run the scripts that are stored in HDFS.
Here is what you need:
$pig -useHCatalog hdfs://namenode_hostname:port/user/admin/pig/scripts/hcat1.pig

PIG setup throwing error

I was trying to install PIG v0.13.0 in my Fedora 20 system. After extracting the tar.gz contents, I did the PATH setup for JAVA_HOME and PIG/bin. Then I type the command pig in the console and this is what I got: Unable to understand what went wrong:
[root#localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Please let me know why did the ls command in grunt shell throw the error?
Please guide.
When you type pig in console, by default it will go to MAPREDUCE mode, for that you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode in pig.
It looks like your hadoop cluster is not configured properly that is the reason you are getting the connection refunded error. Please follow up this link to solve this connect-refused problem.http://wiki.apache.org/hadoop/ConnectionRefused.
As a workaround use LOCAL mode, this doesn't need hadoop installation.
In the console type pig -x local this will bring the grunt shell and type ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
Ok I got this one working. if I connect to the pig mapreduce mode the the ls command will change to ls hdfs:/. Hence changing the above command from ls to ls hdfs:/ resolves my problem. But again, if I am connecting to the local mode then the ls command works fine.

How to Import/Load .csv file in PIG?

lets suppose there is a text file tab limited (datetemp.txt) I want to load this text file in pig for processing but when I am typing below line its giving me error as :
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.1.1 0.10.0-cdh4.1.1 training 2014-09-06 08:41:23 2014-09-06 08:42:40 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A inputfile MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,
Input(s):
Failed to read data from "/training/pig/datetemp.txt"
Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log
Please help me here..!!
PigStorage is case sensitive. Use PigStorage and not pigstorage.
Your question headliner said you were trying to load a CSV file. For that, I've had good luck with using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statements as demonstrated at https://martin.atlassian.net/wiki/x/WYBmAQ.
Why don't you write PigStorage('\t') as you have mentioned already you have tab delimited file instead of PigStorage()
Mentioned code -
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage()
As (EventID: chararray,eventdate: chararray,count:int);
May be this might solve your problem.
let me know if it is something else.
hdfs://192.168.195.130:8020/training/pig/datetemp.txt
file wat not found in your hdfs!! make sure the input file to be placed in the above location.
Have you checked whether the input path exists?
Try:
fs -ls /training/pig/ in Grunt Shell
if it displays datetemp.txt in the list then it will work otherwise give proper input path
Log is telling the ERROR clearly.
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
Can you check the file exists in HDFS or not?
You can also check your pig is running in mapreduce mode or local mode.
You can specify ',' in PigStorage Class to read CSV file.
Query looks like :
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
And make sure that you have file '/training/pig/datetemp.txt' on HDFS.
To test run : hadoop fs -ls /training/pig/datetemp.txt

Loading data from HDFS does not work with Elephantbird

I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force pig to local mode (''-x local'') I get the more obvious error
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess.
This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Resources