Error in pig on spark while executing - hadoop

I'm trying to run a script but I'm getting the error. Here is my script:
X = load 'hdfs://localhost:54310/testing/abcd.txt' USING PigStorage('\t')AS(user,time,query);
Y = LIMIT X 10;
dump Y;
This is the error I'm getting while executing the above script.
2014-06-15 14:31:42,438 [main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run saveAsNewAPIHadoopFile at StoreConverter.java:58
2014-06-15 14:31:42,491 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

Related

Pig: After join dump is throwing ERROR 1066: Unable to open iterator for alias C

Below is my requirement:
Input:
0104919 ,08476,48528,2016,2016-08-29
00104919 ,08476,48528,2016,2016-09-05
00104919 ,08476,48528,2016,2016-09-12
00104919 ,08476,48528,2017,2016-08-29
Output after join should be:
2,00104919 ,08476,48528,2016,2016-09-05,2016-09-12
3,00104919 ,08476,48528,2016,2016-09-12,2016-08-29
Below is my code:
TABL = LOAD '/TABL/part-r-00000' using PigStorage('~') AS (a,b,c,d,e,f);
pre_Q1 = FOREACH TABL generate a,b,c,d,e;
DIST = DISTINCT pre_Q1;
ORDR = ORDER DIST BY *;
Q1 = rank ORDR;
Q2 = FOREACH Q1 GENERATE rank_ORDR + 1 AS rank_Q2, a, b, c, d, e;
Q_join = join Q2 by (rank_Q2, a, b, c, d), Q1 by (rank_ORDR, a, b, c, d);
C = limit Q_join 100;
dump C;
I am getting the below error.
Can someone point out what must be causing the below error.
Failed Jobs:
JobId Alias Feature Message Outputs
job_1474127474437_528208 C,Q2,Q_join HASH_JOIN Message: Job failed!
Input(s):
Successfully read 5235587 records (1516199217 bytes) from: "/TABL/part-r-00000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1474127474437_528166 -> job_1474127474437_528185,
job_1474127474437_528185 -> job_1474127474437_528190,
job_1474127474437_528190 -> job_1474127474437_528204,
job_1474127474437_528204 -> job_1474127474437_528206,
job_1474127474437_528206 -> job_1474127474437_528208,
job_1474127474437_528208 -> null,
null
2017-01-04 04:02:37,407 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,569 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,729 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,887 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2017-01-04 04:02:37,945 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C
Details at logfile: /var/log/gphd/pig/pig.log
Try to modify the first line as below :
TABL = LOAD '/TABL/part-r-00000' using PigStorage(',') AS (a,b,c,d,e,f);
And watch out to the space at the end of the column a, it may affect the join !

Pig filter fails due to unexpected data

I am running Cassandra and have about 20k records in it to play with. I am trying to run a filter in pig on this data but am getting the following message back:
2015-07-23 13:02:23,559 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:260)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:205)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:263)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:44)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.(CqlRecordReader.java:259)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
... 7 more
You would think this is an obvious error, and believe me there are a ton of results on google for this. It's clear that some piece of my data isn't conforming to the expected type of a given column. What I don't understand is 1.) why this is happening, and 2.) how to debug it. If I try to insert invalid data into Cassandra from my nodejs app, it will throw this kind of error if my data type doesn't match the columns data type, which means that this shouldn't be possible? I've read that data validation using UTF8 is wonky and that setting a different kind of validation is the answer, but I don't know how to do that. Here are my steps to reproduce:
grunt> define CqlNativeStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();
grunt> test = load 'cql://blah/blahblah' USING CqlNativeStorage();
grunt> describe test;
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - Found ksDef name: blah
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - partition keys: ["ad_id"]
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster keys: []
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - row key validator: org.apache.cassandra.db.marshal.UTF8Type
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster key validator: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)
blahblah: {ad_id: chararray,address: chararray,city: chararray,date_created: long,date_listed: long,fireplace: bytearray,furnished: bytearray,garage: bytearray,neighbourhood: chararray,num_bathrooms: int,num_bedrooms: int,pet_friendly: bytearray,postal_code: chararray,price: double,province: chararray,square_feet: int,url: chararray,utilities_included: bytearray}
grunt> query1 = FILTER blahblah BY city == 'New York';
grunt> dump query1;
Then it runs for awhile and dumps out tons of logs and the error appears.
Discovered my problem: the pig partioner did not match CQL3, and therefore the data was being parsed incorrectly. Previously the environment variable was PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner. After I changed it to PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner it started working.

Pig Dump command throwing error

Unable to fetch data form join.
Data:
Jorge Posada |Yankees| {(Catcher,2000),(Designated_hitter,2001)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell |Oakland|{(Catcher,2000),(First_baseman,2001)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003),(Left_fielder,2001)}|[games#258,hit_by_pitch#3]
**Code:**
bfile= LOAD 'basketball1.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:tuple(point:chararray,year:int)},bat:map[]);
bfile1= foreach bfile generate name,pos.year as year;
bfile2= foreach bfile1 generate name,flatten(year) as play_year ;
bfile3= group bfile2 by play_year;
bfile4= foreach bfile3 generate group,COUNT($1) as count;
bfile5= foreach bfile generate flatten(pos.year) as year,bat#'games' as games_cnt;
bfile6= group bfile5 by year;
bjoin= join bfile3 by group ,bfile6 by group;
bjoin1= foreach bjoin generate bfile3.group,bfile3::bfile2.name as name,
bfile6::bfile5.games_cnt as tot_games;
**Describe bjoin1:**
bjoin: {bfile3::group: int,bfile3::bfile2: {(name: chararray,play_year: int)},
bfile6::group: int,bfile6::bfile5: {(year: int,games_cnt: bytearray)}}
While doing dump bjoin1 I face the following issue:
2014-11-15 07:31:42,318 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2014-11-15 07:31:42,321 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias bjoin1
Details at logfile: /home/cloudera/pig_1416065344409.log
grunt> 2014-11-15 07:31:47,857 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce

Bulk loading using pig

STORE A INTO 'hbase://xyz' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(' id:id zip:zip desc:desc1 desc:desc2 income:income ')
AS (id:id zip:zip desc:desc1 desc:desc2 income:income);
i am executing above pig script for storing data in HBase and i am getting following error
2013-09-23 05:34:44,676 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "as" "AS "" at line 1, column 138.
Was expecting one of:
"parallel" ...
";" ...
2013-09-23 05:34:44,676 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2013-09-23 05:34:44,676 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Encountered " "as" "AS "" at line 1, column 138.
Was expecting one of:
"parallel" ...
";" ...
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1618)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1562)
at org.apache.pig.PigServer.registerQuery(PigServer.java:534)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:871)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
at org.apache.pig.Main.run(Main.java:455)
at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered " "as" "AS "" at line 1, column 138.
Was expecting one of:
"parallel" ...
";" ...
at org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:9599)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_consume_token(QueryParser.java:9475)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:826)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1612)
... 9 more
please help
STORE function cannot have 'AS'. Refer documentation: http://pig.apache.org/docs/r0.10.0/basic.html#store

Pig: error while union the result of mapreduce

Pig script
base = load 'u.base' as (uid:long, gid:long, pref:double);
sim1 = mapreduce 'mahout-core-0.7-job.jar'
store base into 'input'
load 'output' as (gid1:long, gid2:long, sim:double)
`org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE`;
sim2 = foreach sim1 generate gid2 as gid1, gid1 as gid2, sim;
sim3 = union sim1,sim2;
dump sim3;
Pig output
2013-03-28 09:21:32,564 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNION,NATIVE
2013-03-28 09:21:32,676 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-03-28 09:21:32,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
2013-03-28 09:21:32,702 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2127: Internal Error: Cloning of plan failed for optimization.
Details at logfile: /home/chenwl/logs/pig_1364433685680.log
Pig log
Pig Stack Trace
---------------
ERROR 2127: Internal Error: Cloning of plan failed for optimization.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sim3
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.loadScript(GruntParser.java:531)
at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:480)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:804)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:449)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias sim3
at org.apache.pig.PigServer.storeEx(PigServer.java:935)
at org.apache.pig.PigServer.store(PigServer.java:898)
at org.apache.pig.PigServer.openIterator(PigServer.java:811)
... 16 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2127: Internal Error: Cloning of plan failed for optimization.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:304)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:617)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:146)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.storeEx(PigServer.java:931)
... 18 more
Caused by: java.lang.CloneNotSupportedException: Unable to find clone for op 1-36: Native('hadoop jar mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE ') - scope-12
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
... 29 more
================================================================================
Environment
OS: ubuntu 12.04
Hadoop: 1.0.4 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Pig: 0.11.0 (r1446324)
P.S.:
It works if sim1 was loaded from hdfs, e.g. sim1 = load 'sim' as (gid1:long, gid2:long, sim:double).

Resources