apache nifi Stateless - not able to set parm of controller service (DBCPConnectionPool 1.10.0) - apache-nifi

I am following the NiFi 1.10 stateless guildeline to create a simple process group of executing a sql in mysql db. I have put necessary parm of db controller service to parameter context.
it works well in nifi canvas. Then i add it to registry and prepare a json parm file: stateless-simpledb.json
{
"registryUrl": "http://localhost:18080",
"bucketId": "cac8f127-e328-45c1-a4cb-0e03dc837ceb",
"flowId": "cc2753f2-78f3-4449-a2fd-343dfeaafe15",
"flowVersion": "3",
"parameters": {
"lastIngestId" : "20000",
"mysql-jdbc-driver-name" : "com.mysql.jdbc.Driver",
"db-user" : "root",
"db-password" : "password",
"db-con-url" : "jdbc:mysql://localhost:3306/mms",
"jdbc-jar-path" : "/program/jdbc/mysql-connector-java.jar"
}
}
and run the one-off command:
/program/nifi/bin/nifi.sh stateless RunFromRegistry Once --file /app/poc/nifi-stateless/conf/stateless-simpledb.json
It raise error:
=== FlowFileRepository Type ===
org.apache.nifi.controller.repository.RocksDBFlowFileRepository
org.apache.nifi:nifi-framework-nar:1.10.0 || /program/nifi-1.10.0/work/stateless-nars/nifi-framework-nar-1.10.0.nar-unpacked
org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
org.apache.nifi:nifi-framework-nar:1.10.0 || /program/nifi-1.10.0/work/stateless-nars/nifi-framework-nar-1.10.0.nar-unpacked
org.apache.nifi.controller.repository.VolatileFlowFileRepository
org.apache.nifi:nifi-framework-nar:1.10.0 || /program/nifi-1.10.0/work/stateless-nars/nifi-framework-nar-1.10.0.nar-unpacked
=== End FlowFileRepository types ===
23:32:32.626 [main] INFO org.apache.nifi.stateless.bootstrap.ExtensionDiscovery - Successfully discovered extensions in 4411 milliseconds
23:32:32.633 [main] DEBUG org.apache.nifi.stateless.core.ComponentFactory - Setting context class loader to org.apache.nifi.nar.InstanceClassLoader#50fa5938 (parent = org.apache.nifi.nar.NarClassLoader[/program/nifi-1.10.0/work/stateless-nars/nifi-dbcp-service-nar-1.10.0.nar-unpacked]) to create org.apache.nifi.dbcp.DBCPConnectionPool
23:32:32.647 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input #{jdbc-jar-path} found 1 Parameter references: [org.apache.nifi.parameter.StandardParameterReference#2d3eecda]
23:32:32.650 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input /program/jdbc/mysql-connector-java.jar found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input 500 millis found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input 8 found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input 0 found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input 8 found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input -1 found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input -1 found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input 30 mins found 0 Parameter references: []
23:32:32.651 [main] DEBUG org.apache.nifi.parameter.ExpressionLanguageAwareParameterParser - For input -1 found 0 Parameter references: []
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.bootstrap.RunStatelessNiFi.main(RunStatelessNiFi.java:69)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.StatelessNiFi.main(StatelessNiFi.java:103)
... 5 more
Caused by: java.lang.RuntimeException: Failed to enable Controller Service {id=691ecc97-ff46-3a5e-8aad-37dc568bc247, name=MYSQL-MMS-stateless-test, type=class org.apache.nifi.dbcp.DBCPConnectionPool} because validation failed: ['Database Connection URL' is invalid because Database Connection URL is required, 'Database Driver Class Name' is invalid because Database Driver Class Name is required]
at org.apache.nifi.stateless.core.StatelessControllerServiceLookup.enableControllerServices(StatelessControllerServiceLookup.java:133)
at org.apache.nifi.stateless.core.StatelessFlow.<init>(StatelessFlow.java:153)
at org.apache.nifi.stateless.core.StatelessFlow.createAndEnqueueFromJSON(StatelessFlow.java:469)
at org.apache.nifi.stateless.runtimes.Program.runLocal(Program.java:133)
at org.apache.nifi.stateless.runtimes.Program.launch(Program.java:67)
... 10 more
Seems the apache nifi stateless function failed to set controller service even it's in "process group" scope.
Would anyone has any advice?

As mentioned in the comments, this appears to be a known problem with the validation of controller services.
This can be avoided by using Nifi 1.12 and above as it got fixed in the following jira: https://issues.apache.org/jira/plugins/servlet/mobile#issue/NIFI-7380
Though I am not entirely sure of this, it may also be possible that this simply indicates that your controller service is not configured correctly. This would be worth double checking.

Related

SQLExceptionHelper : Input parameter not set, Index : 0

I am using spring-data-rest & hibernate to expose a table "File(Id, Name)" from a SAP IQ(Sybase IQ) database. The below error occurs when I do a "GET" on the "File" table using "curl http://localhost:8080/files/1".
2022-07-20 20:01:03 WARN [http-nio-8081-exec-1] o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: JZ0SA
2022-07-20 20:01:03 ERROR [http-nio-8081-exec-1] o.h.e.jdbc.spi.SqlExceptionHelper - JZ0SA: Prepared Statement: Input parameter not set, index: 0.
2022-07-20 20:01:03 INFO [http-nio-8081-exec-1] o.h.e.i.DefaultLoadEventListener - HHH000327: Error performing load command
org.hibernate.exception.GenericJDBCException: could not extract ResultSet
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:67)
I have the same table in "Sybase 16" but it is working perfectly with the same code base.
Anyone faced similar issue?
Thanks in advance

Spring-Batch - MultiFileResourcePartitioner - Caused by: java.net.SocketException: Connection reset

I am using FlatFileItemReader and have extended the AbstractResource to return a stream from Amazon S3 object.
S3Object amazonS3Object = s3client.getObject(new GetObjectRequest(bucket,file));
InputStream stream = null;
stream = amazonS3Object.getObjectContent();
return stream;
In my batch job I have also implemented MultiFileResourcePartitioner in which i gave the bucket to partition all the files. I am able to read only part of few files and after which i get a socket reset error.see below pieces of error
.ResourcelessTransactionManager$ResourcelessTransaction#122ba881]
2015-08-24 23:24:03 DEBUG RepeatTemplate:366 - Repeat operation about to start at count=9
2015-08-24 23:24:03 DEBUG StepContextRepeatCallback:68 - Preparing chunk execution for StepContext: org.springframework.batch.core.scope.context.StepContext#252ce07a
2015-08-24 23:24:03 DEBUG StepContextRepeatCallback:76 - Chunk execution starting: queue size=0
2015-08-24 23:24:03 DEBUG ResourcelessTransactionManager:367 - Creating new transaction with name [null]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT
2015-08-24 23:24:03 DEBUG RepeatTemplate:464 - Starting repeat context.
2015-08-24 23:24:03 DEBUG RepeatTemplate:366 - Repeat operation about to start at count=1
2015-08-24 23:24:03 DEBUG RepeatTemplate:366 - Repeat operation about to start at count=2
2015-08-24 23:24:03 DEBUG RepeatTemplate:366 - Repeat operation about to start at count=3
2015-08-24 23:24:03 DEBUG RepeatTemplate:366 - Repeat operation about to start at count=4
2015-08-24 23:24:03 DEBUG DefaultClientConnection:160 - Connection 0.0.0.0:58171<->10.37.135.39:8099 shut down
2015-08-24 23:24:03 DEBUG DefaultClientConnection:176 - Connection 0.0.0.0:58171<->10.37.135.39:8099 closed
Caused by: org.springframework.batch.item.file.NonTransientFlatFileException: Unable to read from resource: [null]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:220)
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:173)
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:83)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:133)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:121)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy17.read(Unknown Source)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:91)
at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
... 22 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)</i>
My requirement is to process files with millions of records out of an S3 bucket and the application runs on AWS. I have passed the S3 client configurations with retry's and open connections which didn't help much.
As #Michael Minella mentioned, it might be a choice for you to use Spring Cloud AWS project to get resources:
#Autowired
private ResourcePatternResolver resourcePatternResolver;
public void resolveAndLoad() throws IOException {
Resource[] allTxtFilesInFolder = this.resourcePatternResolver.getResources("s3://bucket/name/*.txt");
Resource[] allTxtFilesInBucket = this.resourcePatternResolver.getResources("s3://bucket/**/*.txt");
Resource[] allTxtFilesGlobally = this.resourcePatternResolver.getResources("s3://**/*.txt");
}
And then pass the resources to your MultiFileResourcePartitioner to see the exception can be solved.

Pig filter fails due to unexpected data

I am running Cassandra and have about 20k records in it to play with. I am trying to run a filter in pig on this data but am getting the following message back:
2015-07-23 13:02:23,559 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:260)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:205)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:263)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:44)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.(CqlRecordReader.java:259)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
... 7 more
You would think this is an obvious error, and believe me there are a ton of results on google for this. It's clear that some piece of my data isn't conforming to the expected type of a given column. What I don't understand is 1.) why this is happening, and 2.) how to debug it. If I try to insert invalid data into Cassandra from my nodejs app, it will throw this kind of error if my data type doesn't match the columns data type, which means that this shouldn't be possible? I've read that data validation using UTF8 is wonky and that setting a different kind of validation is the answer, but I don't know how to do that. Here are my steps to reproduce:
grunt> define CqlNativeStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();
grunt> test = load 'cql://blah/blahblah' USING CqlNativeStorage();
grunt> describe test;
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - Found ksDef name: blah
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - partition keys: ["ad_id"]
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster keys: []
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - row key validator: org.apache.cassandra.db.marshal.UTF8Type
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster key validator: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)
blahblah: {ad_id: chararray,address: chararray,city: chararray,date_created: long,date_listed: long,fireplace: bytearray,furnished: bytearray,garage: bytearray,neighbourhood: chararray,num_bathrooms: int,num_bedrooms: int,pet_friendly: bytearray,postal_code: chararray,price: double,province: chararray,square_feet: int,url: chararray,utilities_included: bytearray}
grunt> query1 = FILTER blahblah BY city == 'New York';
grunt> dump query1;
Then it runs for awhile and dumps out tons of logs and the error appears.
Discovered my problem: the pig partioner did not match CQL3, and therefore the data was being parsed incorrectly. Previously the environment variable was PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner. After I changed it to PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner it started working.

java.io.IOException: Job status not available about hive

When I use hive with select * from table_name;, it works.
When I use select t.a from table_name t OR select * from table_name where ..., the following error happens :
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414949555870_118360, Tracking URL = N/A
Kill Command = /usr/local/hadoop-2.5.1/bin/hadoop job -kill job_1414949555870_118360
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:347)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:295)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.isJobPreparing(HadoopShimsSecure.java:104)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:541)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1414949555870_118360 with exception java.io.IOException(Job status not available )
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask`
So, what's wrong about hive?
DEBUG info as follow, maybe useful!! Please help me!!!
15/04/14 16:53:48 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client#60cb201b
15/04/14 16:53:48 DEBUG mapred.ClientServiceDelegate: Failed to contact AM/History for job job_1414949555870_118441 retrying..
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "slave109":43759; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost

Pig: error while union the result of mapreduce

Pig script
base = load 'u.base' as (uid:long, gid:long, pref:double);
sim1 = mapreduce 'mahout-core-0.7-job.jar'
store base into 'input'
load 'output' as (gid1:long, gid2:long, sim:double)
`org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE`;
sim2 = foreach sim1 generate gid2 as gid1, gid1 as gid2, sim;
sim3 = union sim1,sim2;
dump sim3;
Pig output
2013-03-28 09:21:32,564 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNION,NATIVE
2013-03-28 09:21:32,676 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-03-28 09:21:32,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
2013-03-28 09:21:32,702 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2127: Internal Error: Cloning of plan failed for optimization.
Details at logfile: /home/chenwl/logs/pig_1364433685680.log
Pig log
Pig Stack Trace
---------------
ERROR 2127: Internal Error: Cloning of plan failed for optimization.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sim3
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.loadScript(GruntParser.java:531)
at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:480)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:804)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:449)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias sim3
at org.apache.pig.PigServer.storeEx(PigServer.java:935)
at org.apache.pig.PigServer.store(PigServer.java:898)
at org.apache.pig.PigServer.openIterator(PigServer.java:811)
... 16 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2127: Internal Error: Cloning of plan failed for optimization.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:304)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:617)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:146)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.storeEx(PigServer.java:931)
... 18 more
Caused by: java.lang.CloneNotSupportedException: Unable to find clone for op 1-36: Native('hadoop jar mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i input -o output -s SIMILARITY_EUCLIDEAN_DISTANCE ') - scope-12
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:273)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
... 29 more
================================================================================
Environment
OS: ubuntu 12.04
Hadoop: 1.0.4 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Pig: 0.11.0 (r1446324)
P.S.:
It works if sim1 was loaded from hdfs, e.g. sim1 = load 'sim' as (gid1:long, gid2:long, sim:double).

Resources