ERROR 1066: Unable to open iterator for alias in pig - user-defined-functions

I am running a pig script which is as follows
REGISTER '/home/vishal/FirstUdf.jar';
DEFINE UPPER com.first.UPPER();
A = LOAD '/home/vishal/exampleforPIG1' AS (exchange: chararray, symbol: chararray, date: int,value:float);
B= FOREACH A GENERATE com.first.UPPER(exchange);
DUMP B;
Following is my UDF in java
package com.first;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
#SuppressWarnings("deprecation")
public class UPPER extends EvalFunc<String> {
public String exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
try {
String str = (String) input.get(0);
return str.toLowerCase();
} catch (Exception e) {
throw WrappedIOException.wrap(
"Caught exception processing input row ", e);
}
}
}
Now when i try to run that ,it gives me the following error
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias B
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:866)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:683)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:430)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:858)
... 12 more
Whys is that in pig script it is not able to open an iterator for B ie (it is not able to assign an iterator for the following line)
B = FOREACH A GENERATE com.first.UPPER(exchange);
'exampleforPIG1' file has following data
NYSE CPO 2009-12-30 0.14
NYSE CPO 2009-09-28 0.14
NYSE CPO 2009-06-26 0.14
NYSE CPO 2009-03-27 0.14
NYSE CPO 2009-01-06 0.14
NYSE CCS 2009-10-28 0.414
NYSE CCS 2009-07-29 0.414
..
..
etc

Well two things,
If all you want to do is typecast to upper/lower case, why not use the inbuilt functions UPPER/LOWER.
You can find the usage in the reference manuals.
If you want to continue in the same method,
it should be
B = FOREACH A GENERATE UPPER(exchange);
You have already defined it as DEFINE UPPER com.first.UPPER();

I faced this issue and after breaking my head I found that the flaw was in the input data, even though I was cleaning by replacing null. I had a single record which had fields like '(null)', and it was causing everything to fail. Just check this once, if you have bad records like this.

Its the avro version which caused this error for me.
I was using avro-1.7.6.jar. Changing it to avro-1.4.0.jar solved my issue.

Are you running a pig 0.12.0 or earlier jar against hadoop 2.2, if this is the case then I managed to get around this error by recompiling the pig jar from src, here is a summary of the steps involved on a debian type box
download the pig-0.12.0.tar.gz
unpack the jar and set permissions
then inside the unpacked directory compile the src with 'ant clean jar -Dhadoopversion=23'
then you need to get the jar on your class-path in maven, for example, in the same directory
mvn install:install-file -Dfile=pig.jar -DgroupId={set a groupId}-
DartifactId={set a artifactId} -Dversion=1.0 -Dpackaging=jar
or if in eclipse then add jar as external libary/dependency
I was getting your exact trace trying to run pig 12 in a hadoop 2.2.0 and the above steps worked for me
UPDATE
I posted my issue on the pig jira and they responded. They have a pig jar already compiled for hadoop2 pig-h2.jar here http://search.maven.org/#artifactdetails|org.apache.pig|pig|0.12.0|jar
a maven tag for this jar is
org.apache.pig
pig
h2
0.12.0
provided

Safe mode is also on of reason for this exception
run the below command
hadoop dfsadmin -safemode leave

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create hive table on top of phoenix table in emr.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as in hive-site.xml .
Restarted the hive service systemctl restart hive-server2.service
Restarted the metastore systemctl restart hive-hcatalog-server.service
Executed create table command from hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Anybody has an idea what can be the issue?
Update
I followed the advice from #leftjoin and used ADD JAR from hue to add phoenix-hive jar to classpath. Then I faced jar compatibility issue caused by phoenix hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of phoenix connectors are not archived into single bundle that could be downloaded from phoenix website . Instead
the connectors are located now in github repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues. But no clue what is exactly.
As a workaround put jar into hdfs and execute ADD JAR command before create table and query:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

Unable to resolve Over() in Apache Pig

I'm getting the following error when using Over() in Pig:
Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
The error occurs upon execution of the closing brace for C:
A = load 'data/watch*.txt' as (id,ts,watch);
B= GROUP A BY id;
C= FOREACH B {
C1 = ORDER A BY ts;
GENERATE FLATTEN(Stitch(C1,Over(C1.watch,'lag',-1,0)));
}
It seems to me that Over() is not included in my Pig but I'm not sure why because I believe my versions of pig and hadoop should be sufficiently up to date.
$ pig -version
Apache Pig version 0.12.1-SNAPSHOT (rexported)
compiled Feb 19 2014, 16:31:42
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
Any insight would be much appreciated. I'm wondering at this point if I should just use an Over() UDF from PiggyBank.
I believe there is no OVER function in the built-ins for Pig v12. You need to use the OVER function in piggybank.
REGISTER piggybank.jar
DEFINE Over org.apache.pig.piggybank.evaluation.Over();
A = load 'data/watch*.txt' as (id,ts,watch);
B= GROUP A BY id;
C= FOREACH B {
C1 = ORDER A BY ts;
GENERATE FLATTEN(Stitch(C1,Over(C1.watch,'lag',-1,0)));
}

Integrating Hbase with Hive: Register Hbase table

I am using Hortonworks Sandbox 2.0 which contains the following version of Hbase and Hive
Component Version
------------------------
Apache Hadoop 2.2.0
Apache Hive 0.12.0
Apache HBase 0.96.0
Apache ZooKeeper 3.4.5
...and
I am trying to register my hbase table into hive using the following query
CREATE TABLE IF NOT EXISTS Document_Table_Hive (key STRING, author STRING, category STRING) STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’ WITH SERDEPROPERTIES (‘hbase.columns.mapping’ = ‘:key,metadata:author,categories:category’) TBLPROPERTIES (‘hbase.table.name’ = ‘Document’);
This does not work, I get the following Exception:
2014-03-26 09:14:57,341 ERROR exec.DDLTask (DDLTask.java:execute(435)) – java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.setConf(HBaseStorageHandler.java:249)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
2014-03-26 09:14:57,368 ERROR ql.Driver (SessionState.java:printError(419)) – FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration
I have already created the Hbase Table “Document” and the describe command gives the following description
‘Document’,
{NAME => ‘categories’,..},
{NAME => ‘comments’,..},
{NAME => ‘metadata’,..}
I have tried the following things
add hive.aux.jars.path in hive-site.xml
hive.aux.jars.path
file:///etc/hbase/conf/hbase-site.xml,file:///usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.12.0.2.0.6.0-76.jar,file:///usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar,file:///usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
add jars using hive add jar command
add jar /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar;
add jar /usr/lib/hive/lib/hive-hbase-handler-0.12.0.2.0.6.0-76.jar;
add jar /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar;
add jar /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar;
add file /etc/hbase/conf/hbase-site.xml
specify the hadoop_classpath
export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2:/usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
And it is still not working!
How can I add the jars in the hive classpath so that it finds the hbaseConfiguration class,
or is it an entirely different issue?
No need to copy the entire jars. Just hbase-*.jar , zookeeper*.jar, hive-hbase-handler*.jar would be enough. By default all hadoop related jars would be added to hadoop classpath, Since hive internally uses hadoop command to execute.
Or
Instead of copying hbase jars to hive library by specifying HIVE_AUX_JARS_PATH environment variable to /usr/lib/hbase/lib/ in /etc/hive/conf/hive-env.sh will also do.
The second approach is more suggested than first

pig + hbase + hadoop2 integration

has anyone had successful experience loading data to hbase-0.98.0 from pig-0.12.0 on hadoop-2.2.0 in an environment of hadoop-2.20+hbase-0.98.0+pig-0.12.0 combination without encountering this error:
ERROR 2998: Unhandled internal error.
org/apache/hadoop/hbase/filter/WritableByteArrayComparable
with a line of log trace:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArra
I searched the web and found a handful of problems and solutions but all of them refer to pre-hadoop2 and base-0.94-x which were not applicable to my situation.
I have a 5 node hadoop-2.2.0 cluster and a 3 node hbase-0.98.0 cluster and a client machine installed with hadoop-2.2.0, base-0.98.0, pig-0.12.0. Each of them functioned fine separately and I got hdfs, map reduce, region servers , pig all worked fine. To complete an "loading data to base from pig" example, i have the following export:
export PIG_CLASSPATH=$HADOOP_INSTALL/etc/hadoop:$HBASE_PREFIX/lib/*.jar
:$HBASE_PREFIX/lib/protobuf-java-2.5.0.jar:$HBASE_PREFIX/lib/zookeeper-3.4.5.jar
and when i tried to run : pig -x local -f loaddata.pig
and boom, the following error:ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable (this should be the 100+ times i got it dying countless tries to figure out a working setting).
the trace log shows:lava.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable
the following is my pig script:
REGISTER /usr/local/hbase/lib/hbase-*.jar;
REGISTER /usr/local/hbase/lib/hadoop-*.jar;
REGISTER /usr/local/hbase/lib/protobuf-java-2.5.0.jar;
REGISTER /usr/local/hbase/lib/zookeeper-3.4.5.jar;
raw_data = LOAD '/home/hdadmin/200408hourly.txt' USING PigStorage(',');
weather_data = FOREACH raw_data GENERATE $1, $10;
ranked_data = RANK weather_data;
final_data = FILTER ranked_data BY $0 IS NOT NULL;
STORE final_data INTO 'hbase://weather' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:date info:temp');
I have successfully created a base table 'weather'.
Has anyone had successful experience and be generous to share with us?
ant clean jar-withouthadoop -Dhadoopversion=23 -Dhbaseversion=95
By default it builds against hbase 0.94. 94 and 95 are the only options.
If you know which jar file contains the missing class, e.g. org/apache/hadoop/hbase/filter/WritableByteArray, then you can use the pig.additional.jars property when running the pig command to ensure that the jar file is available to all the mapper tasks.
pig -D pig.additional.jars=FullPathToJarFile.jar bulkload.pig
Example:
pig -D pig.additional.jars=/usr/lib/hbase/lib/hbase-protocol.jar bulkload.pig

Hadoop Hive integration INSERT query

i'am hadoop newbie,and i'am trying this tutorial:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
1.Starting hive is done successfully with the parameter:
hive --auxpath /cygdrive/c/Hadoop/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/cygdrive/c/javaHBase/hbase-0.94.6/hbase-0.94.6.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/zookeeper-3.4.3.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/guava-r09.jar -hiveconf hbase.master=localhost:60010
2.starting hbase is successuful.
3."CREATE TABLE hbase_table_1" is done successfully
4.I verified with commands list and show tables, all is ok
here is my problem
"INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;"
i get this error message searching in "htp://localhost:50060/tasklog?attemptid..."
java.lang.ClassNotFoundException: org/apache/hadoop/hive/hbase/HBaseSerDe
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:280)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
...
i tried to copy hive jars to hbase install and vice vesa...
note: i added necessary JARS with hive command: ADD JAR C:...\hive-hbase-handler-0.9.0.jar etc.
hbase version: 0.94.6
hive version: 0.9.0
any additional export? or configuration?
I need help please!
Thanks a lot!
Try adding these jars in ditributed cache by running following commands in hive CLI or include these lines in $HOME/.hiverc file. That should resolve the ClassNotFoundException.
ADD JAR ...../hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR ...../hbase-0.94.1/hbase-0.94.6.jar;
ADD JAR ...../hbase-0.94.1/lib/zookeeper-3.4.3.jar;
ADD JAR ...../hbase-0.94.1/lib/guava-11.0.2.jar;
ADD JAR ...../hbase-0.94.1/lib/protobuf-java-2.4.0a.jar;
This is required when running hive queries in mapred mode (not local).
Brother do following things. your problem will be solved. ( change according to your version )
copy these files in hadoop lib directory
$HIVE_HOME/lib/hive-hbase-handler-0.8.1.jar,$HIVE_HOME/lib/hbase-0.90.0.jar,$HIVE_HOME/lib/zookeeper-3.3.1.jar,$HIVE_HOME/lib/guava-r06.jar,
and hive-serde jar

Resources