How to read JSON data in Pig?

I have a JSON file of the following form:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
I am trying to execute the following Pig script to load the JSON data:
A = load 'pigdemo/employeejson.json' using JsonLoader ('employees:{(firstName:chararray)},{(lastName:chararray)}');
but I get this error:
Unable to recreate exception from backed error: Error:
org.codehaus.jackson.JsonParseException: Unexpected end-of-input:
expected close marker for ARRAY (from [Source:
java.io.ByteArrayInputStream#1553f9b2; line: 1, column: 1]) at
[Source: java.io.ByteArrayInputStream#1553f9b2; line: 1, column: 29]

First, you see Unexpected end-of-input because each record must be on a single line, like this:
{"employees":[{"firstName":"John", "lastName":"Doe"}, {"firstName":"Anna", "lastName":"Smith"}, {"firstName":"Peter", "lastName":"Jones"}]}
Now, since each row is a list of employees, run the following commands (here $flurryData is a Pig parameter holding the input path):
A = load '$flurryData' using JsonLoader ('employees:bag {t:tuple(firstName:chararray, lastName:chararray)}');
describe A;
dump A;
This gives the following output:
A: {employees: {t: (firstName: chararray,lastName: chararray)}}
({(John,Doe),(Anna,Smith),(Peter,Jones)})
Hope this helps!
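If the file is pretty-printed across several lines, as in the question, it can be collapsed to one line before loading. Below is a minimal Python sketch of that preprocessing step. It assumes the whole file holds a single JSON document that fits in memory; the function name compact_json is illustrative and not part of Pig.

```python
import json


def compact_json(text):
    """Parse a JSON document and re-serialize it on a single line."""
    # json.dumps without an indent argument emits no newlines, which gives
    # the one-record-per-line layout described in the answer above.
    return json.dumps(json.loads(text))


# The multi-line sample from the question.
pretty = """{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}"""

one_line = compact_json(pretty)
print(one_line)
```

To convert a real file, read its contents, pass them through compact_json, and write the result followed by a newline to the path that the load statement reads.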

Related

Getting anonymous error on JoinEnrichment processor: org.apache.calcite.sql.validate.SqlValidatorException

I am getting an error on the JoinEnrichment processor and don't know what is causing it.
My query in JoinEnrichment:
SELECT original.*, enrichment.*
FROM original
FULL JOIN enrichment
ON original.name = enrichment.name
Error:
JoinEnrichment[id=a17b1f06-f80c-3641-945f-c2ff331f8028] Failed to join 'original' FlowFile FlowFile[filename=cmdbci-mtaas] and 'enrichment' FlowFile FlowFile[filename=cmdbci-mtaas]; routing to failure: java.sql.SQLException: Error while preparing statement [SELECT original.*, enrichment.*
FROM original
FULL JOIN enrichment
ON original.name = enrichment.name]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 4, column 4 to line 4, column 34: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <JAVATYPE(CLASS JAVA.LANG.STRING)>'. Supported form(s): '<COMPARABLE_TYPE> = <COMPARABLE_TYPE>'
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <JAVATYPE(CLASS JAVA.LANG.STRING)>'. Supported form(s): '<COMPARABLE_TYPE> = <COMPARABLE_TYPE>'
Sample input from original:
Location,Environment,ip_address,category,name,dv_sys_updated_on
"",,,Hardware,Ndiggan,2022-12-17 22:37:28
"",,,Hardware,class,2022-12-31 22:37:38
"",,,,Vlan2,2022-12-27 02:17:13
Sample input from enrichment:
Location,Environment,ip_address,category,name,dv_sys_updated_on
"",,,Hardware,vpna,2022-12-17 22:36:02
"",,,Hardware,dlcccno,2022-12-17 22:37:04
"",,,Hardware,Ndiggan,2022-12-17 22:37:28

PIG: Unable to open iterator for alias AliasName. Scalar has more than one row in the output

I am new to Pig and trying to learn it on my own.
I have written a script that appends the epoch time to each word read from the words.txt file.
Here is the script:
A = LOAD 'words.txt' AS word:chararray;
B = FOREACH A GENERATE CONCAT(CONCAT(A.word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;
The issue is that if words.txt has only one word, the script gives the proper output.
If it has multiple words, like
word1
word2
word3
word4
then it gives the following error:
ERROR 1066: Unable to open iterator for alias B
java.lang.Exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 0:
Scalar has more than one row in the output. 1st : (word1 ), 2nd :(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar"
should be "foo::bar" ) at
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
0: Scalar has more than one row in the output. 1st : (word1 ), 2nd
:(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar"
should be "foo::bar" ) at
org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:122) at
o
Please suggest how to solve this issue.
Thank you.

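I solved it on my own.
I just removed the A. from the inner CONCAT, and it worked. Inside FOREACH A, writing A.word treats the relation A as a scalar, which is only valid when A has exactly one row; the bare field name word refers to the field of the current row.
Script: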
A = LOAD 'words.txt' AS (word:chararray);
B = FOREACH A GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;

SPLIT command not working in apache-pig

initialData = load 'Weather_Report.log' using PigStorage('|') as (cityid:int,cityname:chararray,currentWeather:chararray,weatherCode:int);
SPLIT initialData INTO noRainsCities IF weatherCode ==10;
STORE noRainsCities INTO 'WEATHER_ANALYTICS/TEST_OUT/NoRainCititesData';
Please help me out.
This is the error:
2016-09-28 11:03:14,597 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 52> Syntax error, unexpected symbol at or near '='

iniData = LOAD 'empdetail.log' using PigStorage('|') as (id:int,x:chararray,city:chararray,tech:chararray);
split iniData into a if tech=='Java',b if city=='Pune';
dump a;
dump b;
SPLIT does not work unless two or more conditions are given. For a single condition, use FILTER instead.
Problem solved, thanks.

How to process a multi-delimiter file in Pig 0.8

I have an input text file (named multidelimiter) with the following records:
1,Mical,2000;10
2,Smith,3000;20
I have written Pig code as follows:
A =LOAD '/user/input/multidelimiter' AS line;
B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)[,](.*)[,](.*)[;]')) AS (f1,f2,f3,f4);
But this code does not work and gives the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 78. Encountered: <EOF> after : "\'(.*)[,](.*)[,](.*)[;"
I referred to the following link but was not able to resolve my error:
how to load files with different delimiter each time in piglatin
Please help me resolve this error.
Thanks.

A solution for your input example:
LOAD the file as comma-separated, then STRSPLIT the last field by ';' and FLATTEN the result.

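I finally found a solution.
The error message shows the statement being cut off at the ';' inside the quoted pattern, so I load with ',' as the delimiter and write the semicolon as the Unicode escape \\u003B: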
A =LOAD '/user/input/multidelimiter' using PigStorage(',') as (empid,ename,line);
B = FOREACH A GENERATE empid,ename, FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)\\u003B(.*)')) AS (sal:int,deptno:int);
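To check the expected result outside Pig, the same two-step split can be sketched in Python. This is only an illustration that mirrors the script above: split on ',' first, then apply the pattern (.*);(.*) to the last field. The function name parse_record is hypothetical, and empid is left as a string because the Pig script does not declare a type for it.

```python
import re


def parse_record(line):
    """Split 'empid,ename,sal;deptno' into its four fields."""
    # Step 1: comma-separated load, like PigStorage(',').
    empid, ename, rest = line.split(",", 2)
    # Step 2: same pattern as the REGEX_EXTRACT_ALL call, with a literal ';'.
    match = re.fullmatch(r"(.*);(.*)", rest)
    if match is None:
        raise ValueError(f"no ';' found in third field: {rest!r}")
    sal, deptno = match.groups()
    return empid, ename, int(sal), int(deptno)


# The two sample records from the question.
records = [parse_record(line) for line in ["1,Mical,2000;10", "2,Smith,3000;20"]]
print(records)
```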

Calabash-android: getting error when trying to press on object with coordinates and using 'perform_action' command

I’m trying to use the rect output of a query with the perform_action command.
For example:
query("* text:'Hello'", :y)
[
[0] 226.0
]
Trying:
perform_action('long_press_coordinate',200,y)
And getting the error:
RuntimeError: Action 'long_press_coordinate' unsuccessful: Can not deserialize instance of java.lang.String[] out of END_OBJECT token
at [Source: java.io.StringReader#412a8480; line: 1, column: 61] (through reference chain: sh.calaba.instrumentationbackend.Command["arguments"])
Is this a syntax issue, or is it something more?
How do I turn the y value into a regular number?

I found code that works. query returns an array, so take its first element before passing it to perform_action:
y=query("* text:'Hello'", :y)
perform_action('long_press_coordinate',200,y[0])
Hope this helps.
