Error while using python udf in Pig - hadoop

I am trying to use a Python UDF, but it is throwing the error below. I am using CDH 5.2.
cat /home/spanda20/pig_data/panda1.py
def get_length(data):
    return len(data)
REGISTER '/home/spanda20/pig_data/panda1.py' USING jython as my_udf;
grunt> A = LOAD 'hdfs://itsusmpl00509.jnj.com:8020/user/spanda20/pig_1.dat' USING PigStorage(',') AS (name:chararray, id:int);
grunt> B = FOREACH A GENERATE name, id,my_udf.get_length(name) as name_len;
2015-01-25 20:47:15,243 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
my_udf.get_length using imports: [, java.lang.,
org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at
logfile: /home/spanda20/pig_1422230028021.log

Sometimes, after a REGISTER command fails for a UDF, you have to restart the Pig client so that Pig reloads the UDF.
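For reference, a Jython UDF is usually declared with an output schema so Pig knows the return type. A minimal sketch of the UDF above, assuming Pig's jython runtime provides the `outputSchema` decorator from `pig_util` (a no-op stub is defined so the file also runs under plain Python):

```python
try:
    # Available when the script is executed by Pig's jython engine
    from pig_util import outputSchema
except ImportError:
    # Stub so the module also works outside Pig, e.g. for unit testing
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema('length:int')
def get_length(data):
    # Return the number of characters in the input chararray
    return len(data)
```

With the schema declared, `my_udf.get_length(name)` yields an int rather than a bytearray.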

Related

Error in loading the csv file in Apache Pig

I tried to load the data using the following command in Apache Pig in HDFS mode:
test = LOAD /user/swap/done2.csv using PigStorage (',')as (ID:long, Country:chararray, Carrier:float, ClickDate:chararray, Device:chararray, OS:chararray, UserIp:chararray, PublisherId:float, advertiserCampaignId:float, Fraud:float);
It gives the error below:
2017-12-12 13:49:10,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '/' expecting QUOTEDSTRING
Details at logfile: /home/matlab/Documents/pig_1513066708530.log
Surprisingly, my dataset does not have 13 columns.
The file path must be enclosed in single quotes for LOAD:
test = LOAD '/user/swap/done2.csv' using PigStorage (',')as (ID:long, Country:chararray, Carrier:float, ClickDate:chararray, Device:chararray, OS:chararray, UserIp:chararray, PublisherId:float, advertiserCampaignId:float, Fraud:float);
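Once the path is quoted, PigStorage(',') simply splits each line on the delimiter and casts the raw fields to the declared schema. A rough Python sketch of that per-line behavior (the sample row is invented for illustration, not from the dataset):

```python
def pigstorage_split(line, delimiter=','):
    # PigStorage splits each input line on the delimiter into raw string fields
    return line.split(delimiter)

# Hypothetical sample row matching the first three fields of the AS (...) schema
fields = pigstorage_split('1001,India,12.5')
record = (int(fields[0]), fields[1], float(fields[2]))  # cast per ID:long, Country:chararray, Carrier:float
```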

Creating schema for Tuple in Apache Pig

How can I create Pig schema for the below tuple data while loading the relation?
$ cat data
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)
I tried the below statement in local mode
A = LOAD '/home/cloudera/data' AS (t1:tuple(t1a:int,t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
When I dump the data, I expect the result:
DUMP A;
((3,8,9),(4,5,6))
((1,4,7),(3,7,5))
((2,5,8),(9,5,8))
But what I got was:
((3,8,9),)
((1,4,7),)
((2,5,8),)
I am using Apache Pig version 0.11.0-cdh4.7.0
The following works; the fix is to specify the space delimiter explicitly with PigStorage(' '):
A = load '$input' using PigStorage(' ') AS (t1:tuple(t1a:int,t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
describe A;
dump A;
The dump:
((3,8,9),(4,5,6))
((1,4,7),(3,7,5))
((2,5,8),(9,5,8))
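For intuition, here is a rough Python sketch of what PigStorage(' ') combined with the tuple schema does per line (an illustration of the semantics, not Pig's actual implementation):

```python
def parse_line(line):
    # PigStorage(' ') splits the line on the space delimiter into two raw fields
    raw1, raw2 = line.split(' ')
    # Each field of the form (a,b,c) is then cast to a tuple of ints per the schema
    to_tuple = lambda s: tuple(int(x) for x in s.strip('()').split(','))
    return (to_tuple(raw1), to_tuple(raw2))
```

Without the explicit delimiter, the default tab delimiter leaves the second tuple field empty, which is why the original DUMP showed `((3,8,9),)`.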

Pig job failed. Need suggestions

Please help me understand why the join below is failing.
Code:
nyse_div= load '/home/cloudera/NYSE_daily_dividends' using PigStorage(',')as(exchange:chararray, symbol:chararray, date:chararray, dividends:double);
nyse_div1= foreach nyse_div generate symbol,SUBSTRING(date,0,4) as year,dividends;
nyse_div2= group nyse_div1 by (symbol,year);
nyse_div3= foreach nyse_div2 generate group,AVG(nyse_div1.dividends);
nyse_price= load '/home/cloudera/NYSE_daily_prices' using PigStorage(',')as(exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj:double);
nyse_price1= foreach nyse_price generate symbol,SUBSTRING(date,0,4) as year,open..;
nyse_price2= group nyse_price1 by (symbol,year);
nyse_price3= foreach nyse_price2 generate group,MAX(nyse_price1.high),MIN(nyse_price1.low);
nyse_final= join nyse_div3 by group,nyse_price3 by group;
--store nyse_div3 into 'home/cloudera/NYSE_daily_dividends/output' using PigStorage(',');
--store nyse_price3 into 'home/cloudera/NYSE_daily_dividends/output1' using PigStorage(',');
store nyse_final into '/home/cloudera/NYSE_daily_dividends/output' using PigStorage(',');
**Failed Jobs:**
JobId Alias Feature Message Outputs
job_local766969553_0008 nyse_final HASH_JOIN Message: Job failed! /home/cloudera/NYSE_daily_dividends/output,
Input(s):
Successfully read records from: "/home/cloudera/NYSE_daily_dividends"
Successfully read records from: "/home/cloudera/NYSE_daily_prices"
Output(s):
Failed to produce result in "/home/cloudera/NYSE_daily_dividends/output"
Job DAG:
job_local1308827629_0006 -> job_local766969553_0008,
job_local241929118_0007 -> job_local766969553_0008,
job_local766969553_0008
2014-11-12 17:00:35,263 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
Your code works perfectly for me. I think you have to paste the complete error log to understand the error. Please see the output that I got for the input below.
NYSE_daily_dividends
NYSE,AIT,2009-11-12,0.15
NYSE,AIT,2009-08-12,0.15
NYSE,AIT,2009-05-13,0.15
NYSE,AIT,2009-02-11,0.15
NYSE_daily_prices
NYSE,AEA,2010-02-08,4.42,4.42,4.21,4.24,205500,4.24
NYSE,AEA,2010-02-05,4.42,4.54,4.22,4.41,194300,4.41
NYSE,AEA,2010-02-04,4.55,4.69,4.39,4.42,233800,4.42
NYSE,AIT,2009-02-11,0.15,4.87,4.55,4.55,234444,4.56
Output from your code
((AIT,2009),0.15,(AIT,2009),4.87,4.55)
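The script's group/aggregate/join pipeline can be sketched in Python to check the expected shape of the result (the sample rows below are invented for illustration):

```python
from collections import defaultdict

# Dividends: (symbol, year, dividend); prices: (symbol, year, high, low)
dividends = [('AIT', '2009', 0.15), ('AIT', '2009', 0.15)]
prices = [('AIT', '2009', 4.87, 4.55), ('AIT', '2009', 4.69, 4.39)]

# group nyse_div1 by (symbol, year); then AVG(nyse_div1.dividends)
div_groups = defaultdict(list)
for sym, year, d in dividends:
    div_groups[(sym, year)].append(d)
avg_div = {k: sum(v) / len(v) for k, v in div_groups.items()}

# group nyse_price1 by (symbol, year); then MAX(high), MIN(low)
price_groups = defaultdict(list)
for sym, year, high, low in prices:
    price_groups[(sym, year)].append((high, low))
minmax = {k: (max(h for h, _ in v), min(l for _, l in v))
          for k, v in price_groups.items()}

# join nyse_div3 by group, nyse_price3 by group: keep keys present on both sides
joined = {k: (avg_div[k],) + minmax[k] for k in avg_div.keys() & minmax.keys()}
```

If this logic matches your intent, the failure is likely environmental (paths, permissions, or an existing output directory) rather than in the script itself, which is why the complete log matters.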

string concatenation not working in pig

I have a table in HCatalog which has 3 string columns. When I try to concatenate strings, I get the following error:
A = LOAD 'default.temp_table_tower' USING org.apache.hcatalog.pig.HCatLoader() ;
B = LOAD 'default.cdr_data' USING org.apache.hcatalog.pig.HCatLoader();
c = FOREACH A GENERATE CONCAT(mcc,'-',mnc) as newCid;
Could not resolve concat using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Could not infer the matching function for org.apache.pig.builtin.CONCAT as multiple or none of them fit. Please use an explicit cast
What might be the root cause of the problem?
Maybe this will help with concatenation in Pig. In older Pig versions, the built-in CONCAT accepts exactly two arguments, which is why the three-argument call above cannot be matched; nest the CONCAT calls instead.
data1 contain:
(Maths,abc)
(Maths,def)
(Maths,ef)
(Maths,abc)
(Science,ac)
(Science,bc)
(Chemistry,xc)
(Telugu,xyz)
Considering the schema as (sub:chararray, name:chararray), where sub holds Maths, Science, etc. and name holds abc, def, etc.:
X = FOREACH data1 GENERATE CONCAT(sub,CONCAT('#',name));
O/P of X is:
(Maths#abc)
(Maths#def)
(Maths#ef)
(Maths#abc)
(Science#ac)
(Science#bc)
(Chemistry#xc)
(Telugu#xyz)
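The nested call works because each CONCAT joins exactly two strings; a minimal Python sketch of the equivalence:

```python
def concat(a, b):
    # Pig's two-argument CONCAT: plain string concatenation
    return a + b

# Equivalent of CONCAT(sub, CONCAT('#', name)) for one record
result = concat('Maths', concat('#', 'abc'))
```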

How to use string functions in pig

I am trying to convert a string to upper case in Pig using one of its built-in functions. I am using Pig in local mode.
emps.csv
1,John,35,M,101,50000.00,03/03/79
2,Jack,30,F,201,3540000.00,09/10/84
Commands for loading data (WORKS FINE)
empdata = load 'emps.csv' using PigStorage(',') as (id:int,name:chararray,age:int,gender:chararray,deptId:int,sal:double);
dump empdata
Convert to upper case and print it (FAILS WITH ERROR)
empnameucase = foreach empdata generate id,upper(name);
But I am getting the following exception after executing the above command:
Error Log:
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve upper using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:769)
at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1491)
... 28 more
Please guide.
Try this: Pig's built-in function names are case-sensitive, so the function name must be written in upper case:
UPPER(name)
Hopefully it will work.
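With the case fixed, the FOREACH projection is equivalent to mapping id and UPPER(name) over each record; a small Python sketch using the names from the sample file:

```python
# Records as (id, name), taken from emps.csv above
empdata = [(1, 'John'), (2, 'Jack')]

# foreach empdata generate id, UPPER(name);
empnameucase = [(emp_id, name.upper()) for emp_id, name in empdata]
```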
