initialData = load 'Weather_Report.log' using PigStorage('|') as (cityid:int,cityname:chararray,currentWeather:chararray,weatherCode:int);
SPLIT initialData INTO noRainsCities IF weatherCode ==10;
STORE noRainCities INTO 'WEATHER_ANALYTICS/TEST_OUT/NoRainCititesData';
PLZ HElp me out guys
This is the error
2016-09-28 11:03:14,597 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 52> Syntax error, unexpected symbol at or near '='
iniData = LOAD 'empdetail.log' using PigStorage('|') as (id:int,x:chararray,city:chararray,tech:chararray);
split iniData into a if tech=='Java',b if city=='Pune';
dump a;
dump b;
GUYS : SPLIT DONT WORK UNTIL 2 OR MORE CONDITIONS ARE GIVEN ;
PROBLEM SOLVED THANKS
Related
This is the code I'm running:
bigrams = LOAD 's3://******' AS (bigram:chararray, year:int, occurrences:int, books:int);
bg_tmp = filter bigrams BY (occurrences >= 300) AND (books >= 12);
bg_tmp_2 = GROUP bg_tmp ALL;
occ_cnt = FOREACH bg_tmp_2 GENERATE bigram, SUM(bg_tmp_2.occurrences);
x = LIMIT occ_cnt 100;
DUMP x;
This is the error I'm getting when I'm computing occ_cnt
81201 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 5, column 48> Invalid scalar projection: bg_tmp_218/10/26 16:05:07 ERROR grunt.Grunt: ERROR 1200: Pig script failed to parse: <line 5, column 48> Invalid scalar projection: bg_tmp_2
Details at logfile: /mnt/var/log/pig/pig_1540569826316.log
I have no idea why this is happening. I'm using Apache Pig 0.17.0 and Hadoop 2.8.4 on AWS EMR
I would rewrite your query as
bg_tmp_2 = GROUP bg_tmp by (bigram);
occ_cnt = FOREACH bg_tmp_2 GENERATE group, SUM(bg_tmp.occurrences);
Replacing GROUP ALL since I think you want the SUM per bigram entry.
Replacing bg_tmp2 with bg_tmp since you want to reference the bg_tmp BAG inside bg_tmp_2 relation.
(If you run "describe bg_tmp_2", you'll see the following schema)
bg_tmp_2: {group: chararray,bg_tmp: {(bigram: chararray,year: int,occurrences: int,books: int)}}
This script working fine
data1 = LOAD '/user/maria_dev/ml-100k/test/u3.data' AS (usesrID:int, movieID:int, rating:int, ratingTime:int);
DUMP data1;
and the output is
When i used FILTER then PIG through error
data1 = LOAD '/user/maria_dev/ml-100k/test/u3.data' AS (usesrID:int, movieID:int, rating:int, ratingTime:int);
filterRowData1=filter data1 by (int)movieID == 556;
DUMP filterRowData1;
Error screen-shot
Error Detail:
2018-10-20 23:20:24,653 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1000: Error during parsing. Encountered " "filterRowData1=filter "" at line 2, column 1.
I have also tried
data1 = LOAD '/user/maria_dev/ml-100k/test/u3.data' AS (usesrID:int, movieID:int, rating:int, ratingTime:int);
filterRowData1=filter data1 by movieID == 556; (i have tried: '556'; but no luck)
DUMP filterRowData1;
filterRowData1 = filter data1 by movieID == 556;
you should space between alias name and query.
I tried to load the data using following command in apache pig on hdfs mode:
test = LOAD /user/swap/done2.csv using PigStorage (',')as (ID:long, Country:chararray, Carrier:float, ClickDate:chararray, Device:chararray, OS:chararray, UserIp:chararray, PublisherId:float, advertiserCampaignId:float, Fraud:float);
it gives the error as below:
2017-12-12 13:49:10,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '/' expecting QUOTEDSTRING
Details at logfile: /home/matlab/Documents/pig_1513066708530.log
surprisingly My dataset does not have the 13 columns.
file path should be in quotes '' to LOAD
test = LOAD '/user/swap/done2.csv' using PigStorage (',')as (ID:long, Country:chararray, Carrier:float, ClickDate:chararray, Device:chararray, OS:chararray, UserIp:chararray, PublisherId:float, advertiserCampaignId:float, Fraud:float);
I need to get the average age in each gender group...
Here is my data set:
01::F::21::0001
02::M::31::21345
03::F::22::33323
04::F::18::123
05::M::31::14567
Basically this is
userid::gender::age::occupationid
Since there is multiple delimiter, i read somewhere here in stackoverflow to load it first via TextLoader()
loadUsers = LOAD '/user/cloudera/test/input/users.dat' USING TextLoader() as (line:chararray);
testusers = FOREACH loadusers GENERATE FLATTEN(STRSPLIT(line,'::')) as (user:int, gender:chararray, age:int, occupation:int);
grunt> DESCRIBE testusers;
testusers: {user: int,gender: chararray,age: int,occupation: int}
grouped_testusers = GROUP testusers BY gender;
average_age_of_testusers = FOREACH grouped_testusers GENERATE group, AVG(testusers.age);
after running
dump average_age_of_testusers
this is the error in hdfs
2016-10-31 13:39:22,175 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats -
ERROR 0: Exception while executing (Name: grouped_testusers: Local Rearrange[tuple]{chararray}(false) - scope-284 Operator Key: scope-284): org.apache.pig.backend.executionengine.ExecException:
ERROR 2106: Error while computing average in Initial 2016-10-31 13:39:22,175 [main]
ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
Input(s):
Failed to read data from "/user/cloudera/test/input/users.dat"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-169204712/tmp-1755697117"
This is my first try in programming in pig, so forgive me if the solution is very obvious.
Analyzing it further, it seems it has trouble computing the average, i thought i made a mistake in data type but age is int.
if you can help me, thank you.
I figured out the problem in this one. Please refer to How can correct data types on Apache Pig be enforced? for a better explanation.
But then, just to show what I did... I had to cast my data
FOREACH loadusers GENERATE FLATTEN((tuple(int,chararray,int,int)) STRSPLIT(line,'::')) as (user:int, gender:chararray, age:int, occupation:int);
AVG is failing because loadusers.age is being treated as string instead of int.
I have input text file( name multidelimiter) with followings records
1,Mical,2000;10
2,Smith,3000;20
I have written pig code as follows
A =LOAD '/user/input/multidelimiter' AS line;
B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)[,](.*)[,](.*)[;]')) AS (f1,f2,f3,f4);
But this code in not work given following error
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 78. Encountered: <EOF> after : "\'(.*)[,](.*)[,](.*)[;"
I refereed following links but not able to resolve my error
how to load files with different delimiter each time in piglatin
Please help me get out from this error.
Thanks.
Solution for your input example:
LOAD as comma separated, than STRSPLIT by ';' and FLATTEN
Finally got solution.
Here is my solution:
A =LOAD '/user/input/multidelimiter' using PigStorage(',') as (empid,ename,line);
B = FOREACH A GENERATE empid,ename, FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)\\u003B(.*)')) AS (sal:int,deptno:int);