Unable to resolve Over() in Apache Pig - hadoop

I'm getting the following error when using Over() in Pig:
Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
The error occurs upon execution of the closing brace for C:
A = LOAD 'data/watch*.txt' AS (id, ts, watch);
B = GROUP A BY id;
C = FOREACH B {
    C1 = ORDER A BY ts;
    GENERATE FLATTEN(Stitch(C1, Over(C1.watch, 'lag', -1, 0)));
}
It seems that Over() is not included in my Pig build, but I'm not sure why, because I believe my versions of Pig and Hadoop are sufficiently up to date.
$ pig -version
Apache Pig version 0.12.1-SNAPSHOT (rexported)
compiled Feb 19 2014, 16:31:42
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
Any insight would be much appreciated. I'm wondering at this point if I should just use an Over() UDF from PiggyBank.

I believe there is no Over function among the built-ins in Pig 0.12; you need to use the Over function from PiggyBank:
REGISTER piggybank.jar;
DEFINE Over org.apache.pig.piggybank.evaluation.Over();
A = LOAD 'data/watch*.txt' AS (id, ts, watch);
B = GROUP A BY id;
C = FOREACH B {
    C1 = ORDER A BY ts;
    GENERATE FLATTEN(Stitch(C1, Over(C1.watch, 'lag', -1, 0)));
}
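For what it's worth, standard lag semantics pair each row with the previous row's watch value within its id group, so the first row in each group gets a null lagged value. A rough sketch with invented data, assuming my reading of the PiggyBank Over arguments is correct:
-- Hypothetical input rows (id, ts, watch) for one group:
--   (1,1,a) (1,2,b) (1,3,c)
-- Expected output of FLATTEN(Stitch(C1, Over(C1.watch, 'lag', -1, 0))):
--   (1,1,a,)    -- no previous row, lagged watch is null
--   (1,2,b,a)
--   (1,3,c,b)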

Related

Unable to Store a Pig Relation using Parquet Storer

I am trying the Pig statements below in the grunt shell.
Pig version: Apache Pig 0.12.1
grunt> register /home/user/surender/mapreducejars/parquet-pig-1.0.1.jar;
grunt> A = LOAD '/user/user/inputfiles/parquet.txt' USING PigStorage(',') AS (id:int,name:chararray);
grunt> STORE A into '/user/user/outputfiles/pig' USING parquet.pig.ParquetStorer;
2016-09-27 07:09:18,509 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. parquet/io/ParquetEncodingException
Details at logfile: /home/user/surender/localinputfiles/pig_1474973730264.log
I want to know what went wrong here. Can someone help me store the Pig relation using ParquetStorer?
You need to add the Parquet bundle jar, e.g. parquet-pig-bundle-1.5.0.jar, and register it:
REGISTER '/path_for_jar/parquet-pig-bundle-1.5.0.jar';
Please check the linked page, which explains this in more detail.
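A minimal end-to-end sketch using the statements from the question (the jar path is illustrative):
REGISTER '/path_for_jar/parquet-pig-bundle-1.5.0.jar';
A = LOAD '/user/user/inputfiles/parquet.txt' USING PigStorage(',') AS (id:int, name:chararray);
STORE A INTO '/user/user/outputfiles/pig' USING parquet.pig.ParquetStorer;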

MultiStorage in pig

I have run the Pig script below in the grunt shell:
Register D:\Pig\contrib\piggybank\java\piggybank.jar;
a = load '/part' using PigStorage(',') as (uuid:chararray,timestamp:chararray,Name:chararray,EmailID:chararray,CompanyName:chararray,Location:chararray);
store a into '/output/multistorage' USING MultiStorage('/output/multistorage','2', 'none', ',');
While running this, it throws the error shown below:
2015-11-03 05:47:36,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve MultiStorage using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Can anyone help me with this?
You did not import your function, as the log says. If the jar is actually accessible to you, you can try the following code (there was one missing line):
REGISTER D:\Pig\contrib\piggybank\java\piggybank.jar;
DEFINE MULTISTORAGE org.apache.pig.piggybank.storage.MultiStorage();
a = LOAD '/part' USING PigStorage(',') AS (uuid:chararray, timestamp:chararray, Name:chararray, EmailID:chararray, CompanyName:chararray, Location:chararray);
STORE a INTO '/output/multistorage' USING MULTISTORAGE('/output/multistorage', '2', 'none', ',');
You are then partitioning by Name.
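Since '2' is a zero-based field index, this partitions on the Name field: MultiStorage should write one subdirectory per distinct Name value under the output path, roughly like this for two invented names:
-- Hypothetical output layout for Name values 'Alice' and 'Bob':
-- /output/multistorage/Alice/Alice-0000
-- /output/multistorage/Bob/Bob-0000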

pig + hbase + hadoop2 integration

Has anyone successfully loaded data into hbase-0.98.0 from pig-0.12.0 on hadoop-2.2.0 (a hadoop-2.2.0 + hbase-0.98.0 + pig-0.12.0 combination) without encountering this error:
ERROR 2998: Unhandled internal error.
org/apache/hadoop/hbase/filter/WritableByteArrayComparable
with a line of log trace:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable
I searched the web and found a handful of problems and solutions, but all of them refer to pre-Hadoop-2 and hbase-0.94.x setups, which were not applicable to my situation.
I have a 5-node hadoop-2.2.0 cluster, a 3-node hbase-0.98.0 cluster, and a client machine with hadoop-2.2.0, hbase-0.98.0, and pig-0.12.0 installed. Each of them functions fine separately: HDFS, MapReduce, the region servers, and Pig all work. To complete a "loading data into HBase from Pig" example, I have the following export:
export PIG_CLASSPATH=$HADOOP_INSTALL/etc/hadoop:$HBASE_PREFIX/lib/*.jar:$HBASE_PREFIX/lib/protobuf-java-2.5.0.jar:$HBASE_PREFIX/lib/zookeeper-3.4.5.jar
When I tried to run: pig -x local -f loaddata.pig
boom, the following error: ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable (this must be the 100th time I've gotten it over countless tries to figure out a working setup).
The trace log shows: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable
the following is my pig script:
REGISTER /usr/local/hbase/lib/hbase-*.jar;
REGISTER /usr/local/hbase/lib/hadoop-*.jar;
REGISTER /usr/local/hbase/lib/protobuf-java-2.5.0.jar;
REGISTER /usr/local/hbase/lib/zookeeper-3.4.5.jar;
raw_data = LOAD '/home/hdadmin/200408hourly.txt' USING PigStorage(',');
weather_data = FOREACH raw_data GENERATE $1, $10;
ranked_data = RANK weather_data;
final_data = FILTER ranked_data BY $0 IS NOT NULL;
STORE final_data INTO 'hbase://weather' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:date info:temp');
I have successfully created an HBase table 'weather'.
Has anyone gotten this working, and would you be generous enough to share?
Rebuild Pig from source against Hadoop 2 and HBase 0.95:
ant clean jar-withouthadoop -Dhadoopversion=23 -Dhbaseversion=95
By default it builds against HBase 0.94; 94 and 95 are the only options.
If you know which jar file contains the missing class (org/apache/hadoop/hbase/filter/WritableByteArrayComparable in this case), then you can use the pig.additional.jars property when running the pig command to ensure that the jar file is available to all the mapper tasks.
pig -D pig.additional.jars=FullPathToJarFile.jar bulkload.pig
Example:
pig -D pig.additional.jars=/usr/lib/hbase/lib/hbase-protocol.jar bulkload.pig
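If the job needs more than one extra jar, pig.additional.jars takes a colon-separated list (the second jar path here is illustrative):
pig -D pig.additional.jars=/usr/lib/hbase/lib/hbase-protocol.jar:/usr/lib/hbase/lib/hbase-client.jar bulkload.pig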

ERROR 1066: Unable to open iterator for alias in pig

I am running a Pig script, which is as follows:
REGISTER '/home/vishal/FirstUdf.jar';
DEFINE UPPER com.first.UPPER();
A = LOAD '/home/vishal/exampleforPIG1' AS (exchange:chararray, symbol:chararray, date:int, value:float);
B = FOREACH A GENERATE com.first.UPPER(exchange);
DUMP B;
Following is my UDF in Java:
package com.first;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
@SuppressWarnings("deprecation")
public class UPPER extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toLowerCase();
        } catch (Exception e) {
            throw WrappedIOException.wrap(
                    "Caught exception processing input row ", e);
        }
    }
}
Now when I try to run that, it gives me the following error:
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias B
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:866)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:683)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:430)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:858)
... 12 more
Why is the Pig script unable to open an iterator for B, i.e. unable to assign an iterator for the following line?
B = FOREACH A GENERATE com.first.UPPER(exchange);
'exampleforPIG1' file has following data
NYSE CPO 2009-12-30 0.14
NYSE CPO 2009-09-28 0.14
NYSE CPO 2009-06-26 0.14
NYSE CPO 2009-03-27 0.14
NYSE CPO 2009-01-06 0.14
NYSE CCS 2009-10-28 0.414
NYSE CCS 2009-07-29 0.414
..
..
etc
Well, two things.
If all you want to do is convert to upper/lower case, why not use the built-in functions UPPER/LOWER? You can find the usage in the reference manuals; see the sketch below.
If you want to continue with the same approach, it should be
B = FOREACH A GENERATE UPPER(exchange);
since you have already defined it with DEFINE UPPER com.first.UPPER();
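For the first option, a minimal sketch using the built-in (drop the DEFINE so that UPPER resolves to org.apache.pig.builtin.UPPER):
A = LOAD '/home/vishal/exampleforPIG1' AS (exchange:chararray, symbol:chararray, date:int, value:float);
B = FOREACH A GENERATE UPPER(exchange);
DUMP B;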
I faced this issue, and after breaking my head over it I found that the flaw was in the input data, even though I was cleaning it by replacing nulls. I had a single record with fields like '(null)', and it was causing everything to fail. Check whether you have bad records like this.
It's the Avro version that caused this error for me: I was using avro-1.7.6.jar, and changing it to avro-1.4.0.jar solved my issue.
Are you running a Pig 0.12.0 or earlier jar against Hadoop 2.2? If so, I managed to get around this error by recompiling the Pig jar from source. Here is a summary of the steps involved on a Debian-type box:
download pig-0.12.0.tar.gz
unpack the tarball and set permissions
inside the unpacked directory, compile the source with 'ant clean jar -Dhadoopversion=23'
then get the jar onto your classpath in Maven, for example, in the same directory:
mvn install:install-file -Dfile=pig.jar -DgroupId={set a groupId} -DartifactId={set an artifactId} -Dversion=1.0 -Dpackaging=jar
or, if in Eclipse, add the jar as an external library/dependency.
I was getting your exact trace trying to run Pig 0.12 on Hadoop 2.2.0, and the above steps worked for me.
UPDATE
I posted my issue on the Pig JIRA and they responded: there is a Pig jar already compiled for Hadoop 2 (pig-h2.jar) here: http://search.maven.org/#artifactdetails|org.apache.pig|pig|0.12.0|jar
A Maven dependency for this jar is:
<dependency>
  <groupId>org.apache.pig</groupId>
  <artifactId>pig</artifactId>
  <classifier>h2</classifier>
  <version>0.12.0</version>
  <scope>provided</scope>
</dependency>
Safe mode being on is also one of the reasons for this exception.
Run the command below to leave safe mode:
hadoop dfsadmin -safemode leave
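To confirm whether safe mode is the cause, you can first check the current state (the standard dfsadmin flag):
hadoop dfsadmin -safemode get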

Loading Hbase table with Pig. Float gives FIELD_DISCARDED_TYPE_CONVERSION_FAILED

I've got an HBase table that is loaded via the HBase Java API like so:
put.add(Bytes.toBytes(HBaseConnection.FAMILY_NAME), Bytes.toBytes("value"), Bytes.toBytes(value));
(Where the variable value is a normal Java float.)
I proceed to load this with Pig as follows:
raw = LOAD 'hbase://tableName' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:value', '-loadKey true -limit 5') AS (id:chararray, value:float);
However when I dump this with:
dump raw;
I get:
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 5 time(s).
for each float value. The IDs are printed fine.
I'm running:
Apache Hadoop 0.20.2.05
Pig 0.9.2
Hbase 0.92.0
My question: why can't Pig handle these float values? What am I doing wrong?
Turns out you have to add a caster. Like so:
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:value', '-loadKey true -limit 5 -caster HBaseBinaryConverter')
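Applied to the LOAD from the question, that becomes:
raw = LOAD 'hbase://tableName' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:value', '-loadKey true -limit 5 -caster HBaseBinaryConverter') AS (id:chararray, value:float);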
Please try it the following way:
test = load '/user/training/user' using PigStorage(',')
as (user_id, name, age:int, country, gender);
The default delimiter for loading is tab.
