Grunt shell error on spaces in string - hadoop

I am trying to register a UDF jar in the Pig grunt shell (Pig 0.13.0). The register statement below errors due to what I believe is the space in the path:
register '/home/hadoop/Eclipse Projects/pigudfs/target/pigudfs-0.0.1-SNAPSHOT.jar';
The following error is generated:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <QUOTEDSTRING> "\'/home/hadoop/Eclipse Projects/pigudfs/target/pigudfs-0.0.1-SNAPSHOT.jar\' "" at line 7, column 10
I have tried an array of options to try to escape the space without any luck...

Try giving the path without the single quotes, like this:
register /home/hadoop/Eclipse Projects/pigudfs/target/pigudfs-0.0.1-SNAPSHOT.jar;

Related

Using subprocess.call from python that contains escaped characters

I am triggering SQL*Loader from a Python (2.7) script.
The password contains a # sign. If I call SQL*Loader from the command line and escape the password (username/\"p#ssword\"#database), the process works. However, when I apply what I believe is the same logic within a Python script, I get an error:
SQL*Loader-704: Internal error: ulconnect: OCIServerAttach [0]
ORA-12154: TNS:could not resolve the connect identifier specified
Since I can run the same command in the cmd prompt successfully, I don't believe this is an issue with the TNSNAMES.ORA file containing any incorrect or missing parameters. I'm pretty confident this is an issue with calling SQL loader from the subprocess command and the escape characters.
Python Logic:
subprocess.call("sqlldr userid=" + config.ddw["user"] + "/\"" +
    config.ddw["password"] + "\"#" + config.ddw["connection"] +
    " control=C:/projects/controlFile.ctl log=C:/logFile.log")
If I print this statement the string looks like:
sqlldr userid=USERNAME/"p#ssw0rd"#connection/db
control=C:/projects/controlFile.ctl log=C:/logFile.log
When I load the string directly in the command line it works:
sqlldr userid=USERNAME/\"p#ssw0rd\"#connection/db
control=C:/projects/controlFile.ctl log=C:/logFile.log
You need those double quotes escaped so that sqlldr sees them. I don't know Python, but it appears you need to change that code so that a backslash ends up in front of each double quote. You may need to escape the backslash too, since it is most likely a special character in Python strings.
Perhaps something like this?
subprocess.call("sqlldr userid=" + config.ddw["user"] + "/\\\"" +
    config.ddw["password"] + "\\\"#" + config.ddw["connection"] +
    " control=C:/projects/controlFile.ctl log=C:/logFile.log")
This is a SWAG so your mileage may vary a little :-)
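To make the escaping levels in that suggestion concrete, here is a minimal sketch that only builds and prints the command string, so you can compare it with what works at the prompt. The function name build_sqlldr_command and its parameters are hypothetical; the values are the placeholders from the question. Whether sqlldr then accepts the string still depends on how the command is launched.

```python
def build_sqlldr_command(user, password, connection, control, log):
    """Build the sqlldr command line as one string.

    The password is wrapped in backslash-escaped double quotes, the form
    that worked at the command prompt in the question.
    """
    # In Python source, "\"" is one character (a quote), while "\\\"" is
    # two characters (a backslash followed by a quote).
    quoted_password = "\\\"" + password + "\\\""
    userid = user + "/" + quoted_password + "#" + connection
    return "sqlldr userid=" + userid + " control=" + control + " log=" + log


if __name__ == "__main__":
    cmd = build_sqlldr_command(
        "USERNAME", "p#ssw0rd", "connection/db",
        "C:/projects/controlFile.ctl", "C:/logFile.log",
    )
    # Printing shows the exact characters that would be handed on.
    print(cmd)
```

If the printed string matches the one that works when typed by hand, the remaining difference lies in how the process is started rather than in the Python string itself.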

bash: syntax error near unexpected token `(' - PIG, CentOs

I am trying to execute the following command in Pig. Sample data:
7369,SMITH,CLERK,800.00,null,20
7499,ALLEN,SALESMAN,1600.00,300.00,30
Script
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',') AS (eno:int, ename:chararray, job:chararray, sal:int, comm:int, deptno:int);
I get the error below:
bash: syntax error near unexpected token `('
Please help to resolve this.
Are you running your Pig command in bash?
If so, start the Pig console first and then run it.
Just type pig and press Enter.
Most likely the issue is the float-typed data. You need to change the datatype of the 4th and 5th fields from int to float.
Also, if null is a literal string, you will have to handle it by using a chararray field and replacing 'null' with ''.
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',') AS (eno:int, ename:chararray, job:chararray, sal:float, comm:float, deptno:int);
Alternatively, you can check whether the issue is with the datatype by not specifying the schema, in which case the default datatype will be bytearray.
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',');
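As a rough way to check the datatype point outside Pig, the following Python sketch reports which kinds of values each column of the sample data holds. It is an illustration only, not how Pig itself casts values; the helper names classify and column_types are hypothetical.

```python
import csv
import io


def classify(value):
    """Return 'int', 'float', or 'chararray' for one raw CSV field."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "chararray"


def column_types(text, delimiter=","):
    """Return, for each column, the set of value kinds seen in all rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    width = max(len(row) for row in rows)
    seen = [set() for _ in range(width)]
    for row in rows:
        for index, value in enumerate(row):
            seen[index].add(classify(value))
    return seen


if __name__ == "__main__":
    sample = (
        "7369,SMITH,CLERK,800.00,null,20\n"
        "7499,ALLEN,SALESMAN,1600.00,300.00,30\n"
    )
    for index, kinds in enumerate(column_types(sample), start=1):
        print(index, sorted(kinds))
```

For the two sample rows, the 4th column contains only float-like values and the 5th mixes a float with the literal string null, which is what the answer above is pointing at.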
You probably haven't activated the grunt shell.

Hive error when declaring hivevar

Trying to declare a variable in Hive using Hue online. Using the following code:
SET hivevar:TABLE1=location.tablename;
I am getting the following error message:
Error while compiling statement: FAILED: ParseException line 1:12 missing KW_ROLE at 'hivevar' near 'hivevar' line 1:19 missing EOF at ':' near 'hivevar'.
Can anyone tell me what this error message means or even what the KW_ROLE statement means?
Do you by any chance have a comment above that instruction? Are you running that line and that line only?
For example, the following will raise a similar exception:
--This is a comment
SET hivevar:TABLE1=location.tablename;
But it works fine without the comment.
I guess you are making changes on Mac/Windows and then moving the script to the server. A double dash "--" typed on a Mac can differ from a double dash "--" on the Linux server; make the changes on the server itself and run the script.
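Both suggestions can be checked before the statement is submitted. The Python sketch below drops full-line "--" comments and lists any non-ASCII characters in a script. The helper names are hypothetical, and the idea that a look-alike non-ASCII dash is what differs between machines is an assumption; the answers above do not say which character is involved.

```python
def strip_comment_lines(script):
    """Drop every line whose first non-blank characters are '--'."""
    kept = [
        line for line in script.splitlines()
        if not line.lstrip().startswith("--")
    ]
    return "\n".join(kept)


def find_non_ascii(script):
    """Return (line_number, character) pairs for non-ASCII characters."""
    found = []
    for number, line in enumerate(script.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                found.append((number, ch))
    return found


if __name__ == "__main__":
    script = "--This is a comment\nSET hivevar:TABLE1=location.tablename;"
    print(strip_comment_lines(script))
    print(find_non_ascii(script))
```

An empty list from find_non_ascii means the dashes are plain ASCII hyphens, which would point back to the comment line itself as the trigger.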

whitespace character in case of parameter substitution

I want to pass a filter statement within my Pig script using parameter substitution.
For that I have tried
exec -param flt='a1==1 AND a2=2' filterscript.pig
But sadly it is throwing an exception message
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: Local file 'AND' does not exist.
Pig version - 0.9.2
I have tried flt='\'a1==1 AND a2=2\'' and flt="a1==1 AND a2==2" suggested by pig users in apache forum as well as seen a similar post in SO.
Any help will be appreciated
I think you are using the parameter, as passed, directly as a condition. If so, you will get an error like this. Instead, you can pass the values as separate parameters and form the condition string inside the Pig script.
exec -p p1=1 -p p2=2 filterscript.pig
Inside your filterscript.pig script you can use these parameter values in condition clauses. For example:
a1==$p1 AND a2==$p2
If you run your script outside the grunt shell, you can do the following:
pig -param flt="a1\=\=1 AND a2\=\=2" -f filterscript.pig
where filterscript.pig is something like this:
A = load ...
...
B = filter A by $flt;
...
Note that the '=' is also escaped; otherwise the filter condition won't be evaluated as a boolean.
If you want to use the filter substitution within the grunt shell, as you tried with exec, then you'll encounter the whitespace problem. Since escaping the whitespace character doesn't work, as a workaround you can create a parameter file:
cat params.txt
flt="a1\=\=1 AND a2\=\=2"
Then issue:
exec -param_file params.txt filterscript.pig
Note: I use Pig 0.12
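If the filter expression changes often, the parameter file can be generated instead of edited by hand. This is a minimal Python sketch under the assumption that escaping every '=' and wrapping the value in double quotes, as in the params.txt shown above, is all that is needed; param_file_line and write_param_file are hypothetical helper names.

```python
import os
import tempfile


def param_file_line(name, expression):
    """Return one name="value" line with every '=' in the value escaped."""
    escaped = expression.replace("=", "\\=")
    return '{0}="{1}"'.format(name, escaped)


def write_param_file(path, params):
    """Write one parameter per line to the file at path."""
    with open(path, "w") as handle:
        for name, expression in params.items():
            handle.write(param_file_line(name, expression) + "\n")


if __name__ == "__main__":
    # A temporary directory keeps the demo from touching real files.
    path = os.path.join(tempfile.mkdtemp(), "params.txt")
    write_param_file(path, {"flt": "a1==1 AND a2==2"})
    with open(path) as handle:
        print(handle.read())
```

The resulting file can then be passed to exec -param_file in the same way as the hand-written one.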

Exit pig shell command safely

When I enter some erroneous command in a Pig interactive shell environment, it enters into listening mode (>>) like below. How do I safely come out of this command, but still stay in the pig shell environment?
Ctrl + C takes me out of the pig shell and I lose my environment setup till that point.
grunt> Test_group = group Block2_Prep_filter by (page_visit_id as grp_page_visit_id, page_user_guid as grp_page_user_guid);
>> ;
>>
>>
I've looked in the pig source code. This is called the secondary_prompt (found in PigScriptParser.jj, a context-free parser grammar file for JavaCC). To my eye it looks like it can't be gotten out of. I tried a lot of combinations of things I saw in that code and nothing worked. Also tried all the exit type words I could think of, to no avail.
When I did Ctrl + D, it exited and displayed:
>> 2013-06-19 12:51:43,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000:
Error during parsing. Lexical error at line 83, column 0. Encountered: <EOF> after : ""
Looking in the Grunt class, at that point, it does:
parser.setInteractive(false);
return parser.parseStopOnError();
This suggests to me that interactivity is over at this point.
