Cannot compute MAX - hadoop

Setup data
mkdir data
echo -e "1\n2\n3\n4\n8\n4\n3\n6" > data/data.txt
Launch Pig in local mode
pig -x local
Script
a = load 'data' Using PigStorage() As (value:int);
b = foreach a generate MAX(value);
dump b;
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.MAX as multiple or none of them fit. Please use an explicit cast.

Just found the answer: it just takes a GROUP ALL before calling the function. MAX is an aggregate, so it needs a bag of values to operate on, and grouping everything into a single group provides one. Kind of feel the error message could be a little clearer...
a = load 'data' Using PigStorage() As (value:int);
b = GROUP a ALL;
c = foreach b generate MAX(a.value);
dump c;
(8)
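For what it's worth, the same pattern extends to a per-key maximum by grouping on a column instead of ALL; a minimal untested sketch, assuming a hypothetical comma-separated input of key,value pairs:
-- hypothetical input 'data2': key,value pairs such as a,1 a,5 b,2
a = load 'data2' using PigStorage(',') as (key:chararray, value:int);
b = group a by key;                          -- one bag of rows per key
c = foreach b generate group, MAX(a.value);  -- MAX over each bag
dump c;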

Related

I can't run a correlation with ggcorrmat

I am getting this error when running a correlation matrix with the R package ggstatsplot:
ggcorrmat(data = d, type = "nonparametric", p.adjust.method = "hochberg",pch = "")
Error: 'data_to_numeric' is not an exported object from 'namespace:datawizard'
Could somebody help me?
I expected the script to run normally, as I had used it before (around July) without any errors.
Has anything changed in the ggstatsplot package?

Pig script scheduled by crontab not giving result

I have a Pig script that gives the proper result when I run it from Pig (MapReduce mode), but when it is scheduled from crontab it does not store the output as the script specifies.
Pig script is,
a1 = load '/user/training/abhijit_hdfs/id' using PigStorage('\t') as (id:int,name:chararray,desig:chararray);
a2 = load '/user/training/abhijit_hdfs/trips' using PigStorage('\t') as (id:int,place:chararray,no_trips:int);
j = join a1 by id,a2 by id;
g = group j by (a1::id,a1::name,a1::desig);
su = foreach g generate group,SUM(j.a2::no_trips) as tripsum;
ord = order su by tripsum desc;
f2 = foreach ord generate $0.$0,$0.$1,$0.$2,$1;
store f2 into '/user/training/abhijit_hdfs/results/trip_output' using PigStorage(' ');
Crontab is,
[training@localhost ~]$ crontab -l
40 3 * * * /home/training/Abhijit_Local/trip_crontab.pig
Please Guide.
Your crontab is attempting to treat the Pig script as an executable file and run it directly. Instead, you will likely need to pass it through the pig command explicitly, as described in the Apache Pig documentation on Batch Mode. You may also find it helpful to redirect stdout and stderr output to a log file somewhere in case you need to troubleshoot failures.
40 3 * * * pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1
Depending on PATH environment variable settings, you might find it necessary to specify the absolute path to the pig command (running which pig as the same user will print it).
40 3 * * * /full/path/pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1

Getting #rid while Update-Upsert in OrientDB without searching again

I am currently using OrientDB to build a graph model, and PyOrient to send the commands that create the nodes and edges.
Whenever I use the INSERT command, I get back a result that includes the #rid.
result = db.command("INSERT INTO CNID SET connected_id = {0}".format(somevalue))
print result
OUTPUT: {'#CNID':{'connected_id': '10000'},'version':1,'rid':'#12:1221'}
However, if I use the UPDATE ... UPSERT command, I only get back a single value, which is not the #rid.
result = db.command("UPDATE CNID SET connected_id={0} UPSERT WHERE connected_id={0}".format(cn_value))
print result
OUTPUT: 1
Is it possible to get the #rid as well when doing an UPDATE ... UPSERT operation?
A useful way to retrieve the #rid from an UPDATE / UPSERT operation is the RETURN AFTER $current syntax in your SQL command. I created the following example in PyOrient:
PyOrient Code:
import pyorient

db_name = 'Stack37308500'

print("Connecting to the server...")
client = pyorient.OrientDB("localhost", 2424)
session_id = client.connect("root", "root")
print("OK - sessionID:", session_id, "\n")

if client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL):
    client.db_open(db_name, "root", "root")
    # RETURN AFTER $current.#rid makes the UPSERT return the #rid of the affected record
    result = client.command("UPDATE CNID SET connected_id = 20000 UPSERT RETURN AFTER $current.#rid WHERE connected_id = 20000")
    for idx, val in enumerate(result):
        print(val)
    client.db_close()
By specifying $current.#rid you'll be able to retrieve the #rid of the resulting record (in this case a new record).
Code Output:
Connecting to the server...
OK - sessionID: 25
##12:1
You can also modify the query to retrieve the whole resulting record by using only $current, without specifying #rid (in this case I updated the record #12:1).
Query:
UPDATE CNID SET connected_id = 30000 UPSERT RETURN AFTER $current WHERE connected_id = 20000
Code Output:
Connecting to the server...
OK - sessionID: 26
{'#CNID':{'connected_id': 30000},'version':2,'rid':'#12:1'}
Hope it helps

Replace character in pig

My data is in the following format..
{"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"}
I need it to be in this format:
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
I'm trying to use Pig's REPLACE function to get it into the format I need.
So, I tried:
"LOGS = LOAD 'inputloc' USING TextStorage() as unparsedString:chararray;;" +
"REPL1 = foreach LOGS REPLACE($0, '"{', '{');" +
"REPL2 = foreach REPL1 REPLACE($0, '}"', '}');"
"STORE REPL2 INTO 'outputlocation';"
It throws an error: Unexpected token '{' in expression or statement.
So based on an answer here, I tried:
"REPL1 = foreach LOGS REPLACE($0, '"\\{', '\\{');"
Now it gives an error: Unexpected token '\\' in expression or statement.
Any help is sincerely appreciated.
Thanks
Works for me:
REPL1 = FOREACH LOGS GENERATE REPLACE($0, '"\\{', '\\{');
In your code you are missing the GENERATE, and the double quotes at the beginning and end are wrong. Note that the second argument to REPLACE is a Java regular expression, so the { has to be escaped, and the backslash itself must be doubled inside a Pig string literal.
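For what it's worth, the two passes can also be collapsed into a single statement by nesting the REPLACE calls; a minimal untested sketch against the same LOGS relation:
-- strip the quote before { and after } in one pass
REPL = FOREACH LOGS GENERATE REPLACE(REPLACE($0, '"\\{', '{'), '}"', '}');
STORE REPL INTO 'outputlocation';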
Please check the code below.
LOGS = load 'inputlocation' as unparsedString:chararray;
REPL1 = foreach LOGS generate REPLACE($0, '"\\{', '\\{');
REPL2 = foreach REPL1 generate REPLACE($0, '}"', '}');
STORE REPL2 INTO 'outputlocation';
Hope it will work.
Load the data using the delimiter as shown below:
sam = load 'sampledata' using PigStorage(',');
sam1 = foreach sam generate $0,$1,CONCAT(REPLACE($2,'([^A-Za-z0-9:"{]+)',''),REPLACE($3,'([^A-Za-z0-9:"}]+)',''));
This produces the following, which is close to, though not exactly, the desired format (the quotes around the Quux value remain and the comma after 1234 is dropped):
({"Foo":"ABC","Bar":"20090101100000","Quux":"{"QuuxId":1234"QuuxName":"Sam"}"})

How to load data without text qualifiers using Pig/Hive/HBase?

I have one CSV file which contains text-qualified data (each field is wrapped in double quotes). I want to load the data into HDFS using Pig/Hive/HBase without the text qualifiers. Please help.
my file input.CSV
"Id","Name"
"1","Raju"
"2","Anitha"
"3","Rakesh"
I want output like:
Id,Name
1,Raju
2,Anitha
3,Rakesh
Try this Pig script.
Suppose your input file name is input.csv.
1. First move the input file to HDFS using the copyFromLocal command.
2. Run the Pig script below.
PigScript:
HDFS mode:
A = LOAD 'hdfs://<hostname>:<port>/user/test/input.csv' AS line;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'"(.*)","(.*)"')) AS (id:int,name:chararray);
STORE B INTO '/user/test/output' USING PigStorage(',');
Local mode:
A = LOAD 'input.csv' AS line;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'"(.*)","(.*)"')) AS (id:int,name:chararray);
STORE B INTO 'output' USING PigStorage(',');
Output:
Id,Name
1,Raju
2,Anitha
3,Rakesh
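If the fields never contain embedded commas or quotes, a simpler alternative is to strip the quote characters in one pass with REPLACE. A minimal untested sketch (the output path 'output_noquotes' is just a placeholder):
A = LOAD 'input.csv' AS (line:chararray);
B = FOREACH A GENERATE REPLACE(line, '"', '');  -- drop every double quote
STORE B INTO 'output_noquotes' USING PigStorage(',');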
