How to use hadoop pig streaming a compiled c program? - hadoop

I have testing the hadoop pig on a small cluster.
I have successfully using pig to stream perl, python, shell script and even jars but not c binaries!
I just build a simple Hello World program in c
And compiled it as test
And run it using ./test under ubuntu11.04 and g++ compiler is up to date.
The program run perfectly in the OS.
However when I tried to stream it in pig, it is always failed!
This is the pig script:
a = load ('test.txt');
define p `./test` ship('/home/clouduser/test');
b = stream a through p;
dump p;
test.txt contains only a space
and I have test the same configuration with perl, python, shell script and java successfully.
grunt> a = load 'test.txt';
grunt> define p `./1.sh` ship('/home/clouduser/1.sh');
grunt> b = stream a through p;
grunt> dump b
2011-09-08 23:53:33,940 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2011-09-08 23:53:33,940 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-09-08 23:53:34,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: b: Store(hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919:org.apache.pig.impl.io.InterStorage) - scope-2 Operator Key: scope-2)
2011-09-08 23:53:34,026 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-09-08 23:53:34,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-09-08 23:53:34,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-09-08 23:53:34,111 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-09-08 23:53:34,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-09-08 23:53:35,938 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-09-08 23:53:35,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-09-08 23:53:36,312 [Thread-9] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-09-08 23:53:36,313 [Thread-9] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-09-08 23:53:36,324 [Thread-9] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-09-08 23:53:36,324 [Thread-9] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2011-09-08 23:53:36,326 [Thread-9] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-09-08 23:53:36,494 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-09-08 23:53:37,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201109051400_0283
2011-09-08 23:53:37,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://172.19.1.4:50030/jobdetails.jsp?jobid=job_201109051400_0283
2011-09-08 23:54:01,755 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201109051400_0283 has failed! Stop running all dependent jobs
2011-09-08 23:54:01,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-09-08 23:54:01,774 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
2011-09-08 23:54:01,774 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-09-08 23:54:01,776 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u1 0.8.1-cdh3u1 clouduser 2011-09-08 23:53:34 2011-09-08 23:54:01 STREAMING
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201109051400_0283 a,b STREAMING,MAP_ONLY Message: Job failed! Error - NA hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919,
Input(s):
Failed to read data from "hdfs://cloudlab-namenode/user/clouduser/test.txt"
Output(s):
Failed to produce result in "hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201109051400_0283
2011-09-08 23:54:01,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-09-08 23:54:01,793 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
Details at logfile: /home/clouduser/pig_1315540364239.log
I even tried to run this file in shell script and ship the shell script and this c binary
but still failed!
Anyone has idea?!
StackOverflow Seems not allowing the original c code but the code runs fine

From the given log:
Failed to read data from "hdfs://cloudlab-namenode/user/clouduser/test.txt"
Make sure you have the file test.txt in the cluster path "hdfs://cloudlab-namenode/user/clouduser/test.txt"
From the log line:
2011-09-08 23:54:01,793 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
Check if ./1.sh can be executed?

Related

Can we access the entire hadoop job logs in a oozie workflow xml?

oozie #emailAction #hadoop
Im running a hadoop pig job using a oozie workflow. How Can I access the entire log for the hadoop job in the workflow xml, so that I can use it in the success/failure email action?
Thanks
Sample Log I need in email:
2016-10-26 13:58:30,385 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-10-26 13:58:30,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-10-26 13:58:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-10-26 13:58:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-10-26 13:58:30,608 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2016-10-26 13:58:30,639 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-10-26 13:58:30,640 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-10-26 13:58:30,647 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=2369469310
2016-10-26 13:58:30,648 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 3
2016-10-26 13:58:30,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5719456061273645490.jar
2016-10-26 13:58:33,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5719456061273645490.jar created
2016-10-26 13:58:33,834 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-10-26 13:58:33,865 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-10-26 13:58:33,896 [JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2016-10-26 13:58:34,053 [JobControl] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-10-26 13:58:34,053 [JobControl] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2016-10-26 13:58:34,115 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-10-26 13:58:34,166 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 18
2016-10-26 13:58:34,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201610241241_0117
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4] C: R:
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX/jobdetails.jsp?jobid=job_201610241241_0117
2016-10-26 13:58:45,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 6% complete
2016-10-26 13:58:46,865 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 8% complete
2016-10-26 13:58:48,907 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 12% complete
2016-10-26 13:58:51,982 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 17% complete
2016-10-26 13:58:55,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 21% complete
2016-10-26 13:58:58,098 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 25% complete
2016-10-26 13:59:01,120 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 26% complete
2016-10-26 13:59:42,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 32% complete
2016-10-26 13:59:44,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-10-26 13:59:45,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 35% complete
2016-10-26 13:59:49,351 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 39% complete
2016-10-26 13:59:53,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 42% complete
2016-10-26 14:01:04,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-10-26 14:01:04,728 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.1 0.11.0-cdh4.7.1 hadoop 2016-10-26 13:58:30 2016-10-26 14:01:04 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201610241241_0117 18 0 138 24 76 79 0 0 0 0 A MAP_ONLY /home/hadoop/xx/xx/xx/20161015/00,
Input(s):
Successfully read 116235853 records (2369955422 bytes) from: "/home/hadoop/xx/data/xx/20161015/00/part*"
Output(s):
Successfully stored 116235853 records (5855768014 bytes) in: "/home/hadoop/xx/xx/xx/20161015/00"
Counters:
Total records written : 116235853
Total bytes written : 5855768014
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201610241241_0117
2016-10-26 14:01:04,747 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
Based on the question and comments I would recommend you to do the following:
Once a job fails, don't transition it directly to the OK node. Instead route it either to a fail node (if you just want to see faillures when looking on the cluster) or route it first to a mail node, and then to either OK or fail depending on your preference.
In the mail sent in the mail node, you could add the job id. Then people know that they need to look at this job on the server because something failed in it.
Optionally you could always send a mail, in that case use a setup with transition to either a mailOK or a mailFail node, so people know that the process ran at all, and whether they need to look at a faillure.

ERROR 2997: Unable to recreate exception from backed error: while using CSVExcelStorage

ERROR 2997: Unable to recreate exception from backed error.
Here i have parsed apache log file but when i am trying to export it into csv format this error occurs. Code and error :
grunt> STORE logs INTO '/home/cloudera/workspace/Test_log.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE','NOCHANGE');
2015-12-24 10:50:44,821 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2015-12-24 10:50:44,830 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2015-12-24 10:50:44,937 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-24 10:50:49,055 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is
false, will not generate code.
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Starting process to move
generated code to distributed cache
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Setting key
[pig.schematuple.classes] with classes to deserialize []
2015-12-24 10:50:49,158 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2015-12-24 10:50:49,158 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker
is deprecated. Instead, use mapreduce.jobtracker.address
2015-12-24 10:50:49,159 [JobControl] INFO
org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager
at /0.0.0.0:8032
2015-12-24 10:50:49,177 [JobControl] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-12-24 10:50:49,428 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1
2015-12-24 10:50:49,431 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 2
2015-12-24 10:50:49,467 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - number of splits:2
2015-12-24 10:50:49,518 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job:
job_1450979216927_0004
2015-12-24 10:50:49,578 [JobControl] INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted
application application_1450979216927_0004
2015-12-24 10:50:49,581 [JobControl] INFO
org.apache.hadoop.mapreduce.Job - The url to track the job:
http://quickstart.cloudera:8088/proxy/application_1450979216927_0004/
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_1450979216927_0004
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases logs
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: logs[7,7],null[-1,-1] C: R:
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1450979216927_0004
2015-12-24 10:50:49,702 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2015-12-24 10:51:17,030 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2015-12-24 10:52:04,848 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-12-24 10:52:04,848 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_1450979216927_0004 has failed! Stop running all dependent jobs
2015-12-24 10:52:04,848 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2015-12-24 10:52:05,039 [main] ERROR
org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
recreate exception from backed error:
AttemptID:attempt_1450979216927_0004_m_000001_3 Info:Error:
org.apache.pig.data.Tuple.isNull()Z
2015-12-24 10:52:05,040 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s)
failed! 2015-12-24
10:52:05,040[main]INFOorg.apache.pig.tools.pigstats.SimplePigStats -
Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.4.0 0.12.0-cdh5.4.0 cloudera 2015-12-24 10:50:44 2015-12-24 10:52:05 UNKNOWN
Failed!
Failed Jobs: JobId Alias Feature Message Outputs
job_1450979216927_0004 logs MAP_ONLY Message: Job
failed! /home/cloudera/workspace/Test_log.csv,
Input(s): Failed to read data from "/myhdfs/project/TestLog.txt"
Output(s): Failed to produce result in
"/home/cloudera/workspace/Test_log.csv"
Counters: Total records written : 0 Total bytes written : 0 Spillable
Memory Manager spill count : 0 Total bags proactively spilled: 0 Total
records proactively spilled: 0
Job DAG: job_1450979216927_0004
2015-12-24 10:52:05,040 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
Seems to be an error in your dataprocessing chain. This log entry points out that a tuple is null. Just an idea: This can happen, eg when you use customized UDFs, which return Null in some cases.
2015-12-24 10:52:05,039 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1450979216927_0004_m_000001_3 Info:Error: org.apache.pig.data.Tuple.isNull()Z

How to Import/Load .csv file in PIG?

lets suppose there is a text file tab limited (datetemp.txt) I want to load this text file in pig for processing but when I am typing below line its giving me error as :
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.1.1 0.10.0-cdh4.1.1 training 2014-09-06 08:41:23 2014-09-06 08:42:40 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A inputfile MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,
Input(s):
Failed to read data from "/training/pig/datetemp.txt"
Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log
Please help me here..!!
PigStorage is case sensitive. Use PigStorage and not pigstorage.
Your question headliner said you were trying to load a CSV file. For that, I've had good luck with using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statements as demonstrated at https://martin.atlassian.net/wiki/x/WYBmAQ.
Why don't you write PigStorage('\t') as you have mentioned already you have tab delimited file instead of PigStorage()
Mentioned code -
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage()
As (EventID: chararray,eventdate: chararray,count:int);
May be this might solve your problem.
let me know if it is something else.
hdfs://192.168.195.130:8020/training/pig/datetemp.txt
file wat not found in your hdfs!! make sure the input file to be placed in the above location.
Have you checked whether the input path exists?
Try:
fs -ls /training/pig/ in Grunt Shell
if it displays datetemp.txt in the list then it will work otherwise give proper input path
Log is telling the ERROR clearly.
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
Can you check the file exists in HDFS or not?
You can also check your pig is running in mapreduce mode or local mode.
You can specify ',' in PigStorage Class to read CSV file.
Query looks like :
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
And make sure that you have file '/training/pig/datetemp.txt' on HDFS.
To test run : hadoop fs -ls /training/pig/datetemp.txt

Loading data from HDFS does not work with Elephantbird

I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force pig to local mode (''-x local'') I get the more obvious error
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess.
This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Hadoop MapReduce - Pig/Cassandra - Unable to create input splits

I'm trying to run a MapReduce Job with Pig and Cassandra and I always get the error:
ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
[SOLVED]
There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS,
PIG_PARTITIONER
/opt/cassandra-0.7.0-beta3/contrib/pig$ bin/pig_cassandra example-script.pig
10/11/15 17:38:26 INFO pig.Main: Logging error messages to: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
2010-11-15 17:38:27,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-master-1.dkd.lan:8020
2010-11-15 17:38:29,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-master-1.dkd.lan:8021
2010-11-15 17:38:32,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://hadoop-master-1.dkd.lan/tmp/temp657556636/tmp-375431593:org.apache.pig.builtin.BinStorage) - 1-82 Operator Key: 1-82)
2010-11-15 17:38:32,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-11-15 17:38:33,364 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-11-15 17:38:38,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-11-15 17:38:38,999 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-11-15 17:38:39,055 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-11-15 17:38:39,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-11-15 17:38:40,340 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://hadoop-master-1.dkd.lan/var/lib/hadoop-0.20/cache/mapred/mapred/staging/dkd-sprenger/.staging/job_201011101636_0011
2010-11-15 17:38:40,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-11-15 17:38:40,357 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-11-15 17:38:40,402 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-11-15 17:38:40,517 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
Details at logfile: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
Anyone who has an idea -> SOLVED
There were some environment variables I missed to set them.
enviroment:
Ubuntu Server 10.4
Versions:
hadoop: 0.20
pig: 0.7
cassandra: 0.7.0 beta3
The asker already updated the question to include the answer:
[SOLVED] There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS, PIG_PARTITIONER

Resources