Pig Job is unable to read data from hdfs (Error: 1066) - hadoop

I'm currently trying to set up a BinaryPig cluster (see https://github.com/endgameinc/binarypig for more information) to analyze malware binaries with Hadoop and Pig. I used Cloudera CDH to install Hadoop and Pig.
My Pig script is as follows:
SET debug 'on';
register '/home/myuser/binarypig-1.0-SNAPSHOT-jar-with-dependencies.jar';
SET mapred.cache.files /tmp/scripts#scripts;
SET mapred.create.symlink yes;
%default INPUT 'hdfs://namenode1:8020/bla/test/malware.archive.seq'
%default TIMEOUT_MS '180000'
%default USE_DEVSHM 'true'
data = load '$INPUT' using com.endgame.binarypig.loaders.ExecutingTextLoader('scripts/strings.sh', '$TIMEOUT_MS', '$USE_DEVSHM');
DUMP data;
The bash script strings.sh just executes the unix "strings" command to collect all the strings of each file within the malware.archive.seq container. I'm running the script on my namenode with:
pig -f strings.pig
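For reference, strings.sh is essentially just a wrapper around that command; a minimal sketch (assuming the loader passes the path of the extracted binary as the first argument, which is how ExecutingTextLoader is described) would be:
#!/bin/bash
# Minimal sketch of strings.sh: print the printable strings of the binary
# whose path is assumed to be passed in as $1 by ExecutingTextLoader.
strings "$1"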
For some reason the job always fails with the following error messages:
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1440074864855_0058 data MAP_ONLY Message: Job failed! hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164,
Input(s):
Failed to read data from "hdfs://namenode1:8020/bla/test/malware.zip.seq"
Output(s):
Failed to produce result in "hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1440074864855_0058
2015-08-25 17:07:21,616 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-08-25 17:07:21,616 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2015-08-25 17:07:21,622 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias data
The file hdfs://namenode1:8020/bla/test/malware.zip.seq does exist, and its permissions are set to 777 just to rule out permission errors.
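(Existence and permissions were checked with the standard HDFS CLI, along the lines of:)
hdfs dfs -ls hdfs://namenode1:8020/bla/test/malware.zip.seq
hdfs dfs -chmod 777 hdfs://namenode1:8020/bla/test/malware.zip.seq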
Since my guess is that it has something to do with the load command within the pig script, here are the debug messages for the load command:
2015-08-25 17:07:06,639 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
Does anyone have an idea how to fix this or even how to debug this?
Edit (pig_log added):
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias data
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias data
at org.apache.pig.PigServer.openIterator(PigServer.java:892)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:884)
... 13 more
================================================================================

Related

Sentiment Analysis of twitter data using hadoop and pig

Tweets from Twitter are stored in HDFS in Hadoop.
The tweets need to be processed for sentiment analysis. The tweets in HDFS are in Avro format, so they need to be processed using a JSON loader, but in the Pig script the tweets from HDFS are not getting read. After changing jar files, the Pig script shows a failed message.
With the following jar files the Pig script fails:
REGISTER '/home/cloudera/Desktop/elephant-bird-hadoop-compat-4.17.jar';
REGISTER '/home/cloudera/Desktop/elephant-bird-pig-4.17.jar';
REGISTER '/home/cloudera/Desktop/json-simple-3.1.0.jar';
With this other set of jar files it does not fail, but the data is also not getting read:
REGISTER '/home/cloudera/Desktop/elephant-bird-hadoop-compat-4.17.jar';
REGISTER '/home/cloudera/Desktop/elephant-bird-pig-4.17.jar';
REGISTER '/home/cloudera/Desktop/json-simple-1.1.jar';
Here are all the Pig commands I have used:
tweets = LOAD '/user/cloudera/OutputData/tweets' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH tweets GENERATE myMap#'id' as id ,myMap#'tweets' as tweets;
tokens = foreach B generate id, tweets, FLATTEN(TOKENIZE(tweets)) As word;
dictionary = load ' /user/cloudera/OutputData/AFINN.txt' using PigStorage('\t') AS(word:chararray,rating:int);
word_rating = join tokens by word left outer, dictionary by word using 'replicated';
describe word_rating;
rating = foreach word_rating generate tokens::id as id,tokens::tweets as tweets, dictionary::rating as rate;
word_group = group rating by (id,tweets);
avg_rate = foreach word_group generate group, AVG(rating.rate) as tweet_rating;
positive_tweets = filter avg_rate by tweet_rating>=0;
DUMP positive_tweets;
negative_tweets = filter avg_rate by tweet_rating<=0;
DUMP negative_tweets;
Error when dumping the tweets alias with the first set of jar files:
Input(s):
Failed to read data from "/user/cloudera/OutputData/tweets"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-1614543351/tmp37889715"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1556902124324_0001
2019-05-03 09:59:09,409 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2019-05-03 09:59:09,427 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias tweets. Backend error : org.json.simple.parser.ParseException
Details at logfile: /home/cloudera/pig_1556902594207.log
Result of dumping the tweets alias with the second set of jar files:
Input(s):
Successfully read 0 records (5178477 bytes) from: "/user/cloudera/OutputData/tweets"
Output(s):
Successfully stored 0 records in: "hdfs://quickstart.cloudera:8020/tmp/temp-1614543351/tmp479037703"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1556902124324_0002
2019-05-03 10:01:05,417 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-05-03 10:01:05,418 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-05-03 10:01:05,418 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-05-03 10:01:05,428 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-05-03 10:01:05,428 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
The expected output was sorted positive and negative tweets, but I am getting errors.
Please do help. Thank you.
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias tweets. Backend error : org.json.simple.parser.ParseException This usually indicates a syntax error in the Pig script.
The AS keyword in a LOAD statement requires a valid schema, and myMap on its own in your LOAD statement is not a valid schema.
See https://stackoverflow.com/a/12829494/8886552 for an example of JsonLoader.
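For example, declaring the result as a map type would look roughly like this (a sketch; the alias and field names are taken from your script):
tweets = LOAD '/user/cloudera/OutputData/tweets' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (myMap:map[]);
B = FOREACH tweets GENERATE myMap#'id' AS id, myMap#'tweets' AS tweets;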

Pig filter fails due to unexpected data

I am running Cassandra and have about 20k records in it to play with. I am trying to run a filter in pig on this data but am getting the following message back:
2015-07-23 13:02:23,559 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:260)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:205)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:263)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:44)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.<init>(CqlRecordReader.java:259)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
... 7 more
You would think this is an obvious error, and believe me, there are a ton of results on Google for this. It's clear that some piece of my data isn't conforming to the expected type of a given column. What I don't understand is 1) why this is happening, and 2) how to debug it. If I try to insert invalid data into Cassandra from my nodejs app, it throws this kind of error when my data type doesn't match the column's data type, which means this shouldn't be possible? I've read that data validation using UTF8 is wonky and that setting a different kind of validation is the answer, but I don't know how to do that. Here are my steps to reproduce:
grunt> define CqlNativeStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();
grunt> test = load 'cql://blah/blahblah' USING CqlNativeStorage();
grunt> describe test;
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - Found ksDef name: blah
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - partition keys: ["ad_id"]
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster keys: []
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - row key validator: org.apache.cassandra.db.marshal.UTF8Type
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster key validator: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)
blahblah: {ad_id: chararray,address: chararray,city: chararray,date_created: long,date_listed: long,fireplace: bytearray,furnished: bytearray,garage: bytearray,neighbourhood: chararray,num_bathrooms: int,num_bedrooms: int,pet_friendly: bytearray,postal_code: chararray,price: double,province: chararray,square_feet: int,url: chararray,utilities_included: bytearray}
grunt> query1 = FILTER blahblah BY city == 'New York';
grunt> dump query1;
Then it runs for a while, dumps out tons of logs, and the error appears.
Discovered my problem: the Pig partitioner did not match CQL3, and therefore the data was being parsed incorrectly. Previously the environment variable was PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner. After I changed it to PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner it started working.
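In other words, before launching Pig, something like:
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
pig -f myscript.pig   # myscript.pig is a placeholder for the steps shown above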

Pig Dump command throwing error

Unable to fetch data from a join.
Data:
Jorge Posada |Yankees| {(Catcher,2000),(Designated_hitter,2001)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell |Oakland|{(Catcher,2000),(First_baseman,2001)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003),(Left_fielder,2001)}|[games#258,hit_by_pitch#3]
Code:
bfile= LOAD 'basketball1.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:tuple(point:chararray,year:int)},bat:map[]);
bfile1= foreach bfile generate name,pos.year as year;
bfile2= foreach bfile1 generate name,flatten(year) as play_year ;
bfile3= group bfile2 by play_year;
bfile4= foreach bfile3 generate group,COUNT($1) as count;
bfile5= foreach bfile generate flatten(pos.year) as year,bat#'games' as games_cnt;
bfile6= group bfile5 by year;
bjoin= join bfile3 by group ,bfile6 by group;
bjoin1= foreach bjoin generate bfile3.group,bfile3::bfile2.name as name,
bfile6::bfile5.games_cnt as tot_games;
Describe bjoin1:
bjoin: {bfile3::group: int,bfile3::bfile2: {(name: chararray,play_year: int)},
bfile6::group: int,bfile6::bfile5: {(year: int,games_cnt: bytearray)}}
When I dump bjoin1, I get the following error:
2014-11-15 07:31:42,318 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2014-11-15 07:31:42,321 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias bjoin1
Details at logfile: /home/cloudera/pig_1416065344409.log
grunt> 2014-11-15 07:31:47,857 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce

ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint

I'm using Hadoop 2.2.0 in a cluster setup and I repeatedly get the following error. The exception is produced on the name node olympus, in the file /opt/dev/hadoop/2.2.0/logs/hadoop-deploy-secondarynamenode-olympus.log, e.g.:
2014-02-12 16:19:59,013 INFO org.mortbay.log: Started SelectChannelConnector@olympus:50090
2014-02-12 16:19:59,013 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Web server init done
2014-02-12 16:19:59,013 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Secondary Web-server up at: olympus:50090
2014-02-12 16:19:59,013 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min)
2014-02-12 16:19:59,013 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Log Size Trigger :1000000 txns
2014-02-12 16:20:59,161 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -47 namespaceID = 291272852 cTime = 0 ; clusterId = CID-e3e4ac32-7384-4a1f-9dce-882a6e2f4bd4 ; blockpoolId = BP-166254569-192.168.92.21-1392217748925.
Expecting respectively: -47; 431978717; 0; CID-85b65e19-4030-445b-af8e-5933e75a6e5a; BP-1963497814-192.168.92.21-1392217083597.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:519)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:380)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:346)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:342)
at java.lang.Thread.run(Thread.java:744)
2014-02-12 16:21:59,183 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -47 namespaceID = 291272852 cTime = 0 ; clusterId = CID-e3e4ac32-7384-4a1f-9dce-882a6e2f4bd4 ; blockpoolId = BP-166254569-192.168.92.21-1392217748925.
Expecting respectively: -47; 431978717; 0; CID-85b65e19-4030-445b-af8e-5933e75a6e5a; BP-1963497814-192.168.92.21-1392217083597.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:519)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:380)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:346)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:342)
at java.lang.Thread.run(Thread.java:744)
Can anyone advise what's wrong here?
I had the same error, and it went away when I deleted the [hadoop temporary directory]/dfs/namesecondary directory.
For me, [hadoop temporary directory] is the value of hadoop.tmp.dir in core-site.xml.
We need to stop the Hadoop services first, and then delete the secondary namenode's tmp directory (hadoop.tmp.dir gives the path to the secondary namenode data directory). After that, start the services again and the issue will be fixed.
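A rough sequence (the directory is a placeholder; check hadoop.tmp.dir in core-site.xml for the actual path):
stop-dfs.sh                                   # or stop the services via your cluster manager
rm -rf <hadoop.tmp.dir>/dfs/namesecondary     # run on the secondary namenode host
start-dfs.sh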

Pig: error while union the result of mapreduce

Pig script
base = load 'u.base' as (uid:long, gid:long, pref:double);
sim1 = mapreduce 'mahout-core-0.7-job.jar'
store base into 'input'
load 'output' as (gid1:long, gid2:long, sim:double)
`org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE`;
sim2 = foreach sim1 generate gid2 as gid1, gid1 as gid2, sim;
sim3 = union sim1,sim2;
dump sim3;
Pig output
2013-03-28 09:21:32,564 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNION,NATIVE
2013-03-28 09:21:32,676 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-03-28 09:21:32,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
2013-03-28 09:21:32,702 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2127: Internal Error: Cloning of plan failed for optimization.
Details at logfile: /home/chenwl/logs/pig_1364433685680.log
Pig log
Pig Stack Trace
---------------
ERROR 2127: Internal Error: Cloning of plan failed for optimization.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sim3
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.loadScript(GruntParser.java:531)
at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:480)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:804)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:449)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias sim3
at org.apache.pig.PigServer.storeEx(PigServer.java:935)
at org.apache.pig.PigServer.store(PigServer.java:898)
at org.apache.pig.PigServer.openIterator(PigServer.java:811)
... 16 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2127: Internal Error: Cloning of plan failed for optimization.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:304)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:617)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:146)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.storeEx(PigServer.java:931)
... 18 more
Caused by: java.lang.CloneNotSupportedException: Unable to find clone for op 1-36: Native('hadoop jar mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE ') - scope-12
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
... 29 more
================================================================================
Environment
OS: ubuntu 12.04
Hadoop: 1.0.4 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Pig: 0.11.0 (r1446324)
P.S.:
It works if sim1 is loaded from HDFS instead, e.g. sim1 = load 'sim' as (gid1:long, gid2:long, sim:double).
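(So a workaround consistent with that observation is to materialize sim1 first and read it back before the union, roughly:)
store sim1 into 'sim';
-- then, in a separate run:
sim1 = load 'sim' as (gid1:long, gid2:long, sim:double);
sim2 = foreach sim1 generate gid2 as gid1, gid1 as gid2, sim;
sim3 = union sim1, sim2;
dump sim3;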
