bash: syntax error near unexpected token `(' - Pig, CentOS - Hadoop

I am trying to execute the following command in Pig.
Sample data (emp.csv):
7369,SMITH,CLERK,800.00,null,20
7499,ALLEN,SALESMAN,1600.00,300.00,30
Script:
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',') AS (eno:int, ename:chararray, job:chararray, sal:int, comm:int, deptno:int);
I get the following error:
bash: syntax error near unexpected token `('
Please help me resolve this.

Are you running your Pig command in bash?
If so, start the Pig console (the grunt shell) first and run the statement there.
Just type pig and press Enter.

Most likely the issue is the float data. You need to change the datatype of the 4th and 5th fields from int to float.
Also, if null is a literal string, you will have to load that field as chararray and replace 'null' with ''.
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',') AS (eno:int, ename:chararray, job:chararray, sal:float, comm:float, deptno:int);
Alternatively, you can check whether the issue is with the datatype by omitting the schema, in which case every field defaults to bytearray.
emp_bag = LOAD '/home/training/dvs/emp.csv' using PigStorage(',');

You probably haven't activated the grunt shell.

Related

Selecting id attribute using xpath/scrapy

I am trying to select the user name from the following forum url.
However, when I use the following in the scrapy shell:
admin:~/workspace/scrapper (master) $ scrapy shell "https://bitcointalk.org/index.php?action=profile;u=22232"
In [1]: response.xpath('//*[@id='bodyarea']/table/tbody/tr/td/table/tbody/tr[2]/td[1]/table/tbody/tr[1]/td[2]')
File "<ipython-input-4-abe70514018b>", line 1
response.xpath('//*[@id='bodyarea']/table/tbody/tr/td/table/tbody/tr[2]/td[1]/table/tbody/tr[1]/td[2]')
^
SyntaxError: invalid syntax
However, in Chrome the selector works fine.
Any suggestions as to what I am doing wrong?
I appreciate your replies!
This is caused by inconsistent quoting. You are using single quotes both to delimit the Python string and for the string literal inside the XPath, so the Python string ends early.
Use either
'//*[@id="bodyarea"]/table...'
or
"//*[@id='bodyarea']/table..."

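The quoting rule can be checked in plain Python. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree rather than Scrapy, a made-up one-line HTML fragment, and the standard XPath attribute test @id. Both quoting styles are valid Python strings and select the same element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the real page.
page = ET.fromstring(
    '<html><body><div id="bodyarea"><table/></div></body></html>'
)

# Outer single quotes, inner double quotes.
xpath_a = './/*[@id="bodyarea"]/table'
# Outer double quotes, inner single quotes.
xpath_b = ".//*[@id='bodyarea']/table"

matches_a = page.findall(xpath_a)
matches_b = page.findall(xpath_b)
print(len(matches_a), len(matches_b))
```

Mixing the same quote character inside and outside, as in the question, never reaches the XPath engine at all: Python rejects the line while parsing it.
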
Correcting a hive script

Suppose I am writing a script, for example to create a table:
hive (test)> create TABLE tlocal
> (id int,
> name string
> addr string);
FAILED: ParseException line 4:5 mismatched input 'addr' expecting ) near 'string' in create table statement.
Here I forgot to add a comma after name string, so I got the error. I want to add the comma after name string and run it again. But, like SQL, Hive does not allow me to correct only the wrong part of the script; I have to rewrite the script from the beginning.
How can I do this?
As Andrew suggested, you can write your query in a file and run it using
hive -f <your query file>
Alternatively, you can use Hue, an open-source web interface that provides SQL editors for Apache Hive.
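A minimal sketch of the file-based workflow, assuming Python is available on the machine. The file name create_tlocal.hql and the helper build_hive_command are hypothetical, and hive is only invoked if the executable is found on PATH. Keeping the statement in a file means a typo such as the missing comma can be fixed in an editor and the file rerun.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The statement from the question, with the comma added after "name string".
QUERY = """CREATE TABLE tlocal
(id int,
 name string,
 addr string);
"""


def build_hive_command(query_file):
    """Return the argv for running a query file with `hive -f`."""
    return ["hive", "-f", str(query_file)]


# Hypothetical file name, written to a temporary directory.
query_file = Path(tempfile.mkdtemp()) / "create_tlocal.hql"
query_file.write_text(QUERY)

command = build_hive_command(query_file)
print(" ".join(command))

# Run only when a hive executable is actually installed.
if shutil.which("hive"):
    subprocess.run(command, check=True)
```
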

Error while exporting the results of a HiveQL query to CSV?

I am a beginner in Hadoop/Hive. I did some research to find out a way to export results of HiveQL query to CSV.
I am running the command line below in PuTTY:
Hive -e ‘use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;’ > /home/temp.csv;
However, this is the error I get:
ParseException line 1:0 cannot recognize input near 'Hive' '-' 'e'
I would appreciate any input on this.
Run your command from outside the Hive shell, directly from the Linux shell.
Run it with 'hive' instead of 'Hive'.
Just redirecting the output into a .csv file won't produce CSV, because the fields come out tab-separated. You can do:
hive -e 'YOUR QUERY HERE' | sed 's/[\t]/,/g' > sample.csv
as suggested here: How to export a Hive table into a CSV file?
AkashNegi's answer will also work for you, though it is a bit longer.
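The sed one-liner replaces every tab with a comma but does not quote fields that themselves contain commas. As a hedged alternative, the sketch below does the same conversion with Python's csv module, which adds quoting where needed. The function name tsv_to_csv and the sample rows are made up for illustration.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV, quoting fields that contain commas."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in tsv_text.splitlines():
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Made-up sample resembling tab-separated query output.
sample = "1\tSmith, John\n2\tAllen\n"
print(tsv_to_csv(sample), end="")
```

To use it in a pipeline, the text read from standard input (sys.stdin.read()) could be passed to the function in place of the sample.
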
One way I do such things is to create an external table with the schema you want. Then do INSERT INTO TABLE target_table ... Look at the example below:
CREATE EXTERNAL TABLE isvaliddomainoutput (email_domain STRING, `count` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/user/cloudera/am/member_email/isvaliddomain";
INSERT INTO TABLE isvaliddomainoutput
SELECT * FROM member_email WHERE isvalid = 1;
Now go to "/user/cloudera/am/member_email/isvaliddomain" and find your data.
Hope this helps.

Hive error when declaring hivevar

I am trying to declare a variable in Hive using Hue online, with the following code:
SET hivevar:TABLE1=location.tablename;
I am getting the following error message:
Error while compiling statement: FAILED: ParseException line 1:12 missing KW_ROLE at 'hivevar' near 'hivevar' line 1:19 missing EOF at ':' near 'hivevar'.
Can anyone tell me what this error message means or even what the KW_ROLE statement means?
Do you by any chance have a comment above that instruction? Are you running that line and only that line?
For example, the following will raise a similar exception:
--This is a comment
SET hivevar:TABLE1=location.tablename;
But it works fine without the comment.
I guess you are editing the script on Mac/Windows and then moving it to the server. A double dash "--" typed on a Mac can differ from the double dash "--" on the Linux server. Make the changes on the server itself and run the script there.
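One way to test this guess is to scan the script for non-ASCII characters, since an editor may silently replace "--" with a typographic dash. The helper below is a hypothetical sketch (the name find_non_ascii and the sample text are not from the original post); it reports the line, column, and Unicode name of each suspicious character.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, unicode_name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Sample script whose comment starts with U+2014 instead of two hyphens.
script = "\u2014This is a comment\nSET hivevar:TABLE1=location.tablename;\n"
for hit in find_non_ascii(script):
    print(hit)
```

An empty result means the script contains only ASCII, in which case the dash theory can be ruled out.
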

whitespace character in case of parameter substitution

I want to pass a filter statement into my Pig script using parameter substitution.
For that I have tried
exec -param flt='a1==1 AND a2=2' filterscript.pig
But sadly it throws an exception:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: Local file 'AND' does not exist.
Pig version - 0.9.2
I have tried flt='\'a1==1 AND a2=2\'' and flt="a1==1 AND a2==2", as suggested by Pig users on the Apache forum and in a similar post on SO.
Any help will be appreciated.
I think you are using the parameter, exactly as passed, as the condition. If so, you will get an error like this. Instead, you can pass the values as separate parameters and build the condition string inside the Pig script.
exec -p p1=1 -p p2=2 filterscript.pig
Inside your filterscript.pig script you can use these parameter values in condition clauses. For example:
a1==$p1 AND a2==$p2
If you run your script outside the grunt shell, you can do the following:
pig -param flt="a1\=\=1 AND a2\=\=2" -f filterscript.pig
where filterscript.pig is something like this:
A = load ...
...
B = filter A by $flt;
...
Note that the '=' is also escaped; otherwise the filter condition won't evaluate to a boolean.
If you want to use the filter substitution within the grunt shell, as you tried with exec, you will run into the whitespace problem. Since escaping the whitespace character doesn't work, you can create a parameter file as a workaround:
cat params.txt
flt="a1\=\=1 AND a2\=\=2"
Then issue:
exec -param_file params.txt filterscript.pig
Note: I use Pig 0.12
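The parameter file can also be generated by a script. This is a sketch under assumptions: the helper name write_param_file is hypothetical, and it simply reproduces the format shown above (name="value", with each '=' in the value escaped as '\=').

```python
import tempfile
from pathlib import Path


def write_param_file(path, params):
    """Write name="value" lines, escaping '=' in values as in the answer above."""
    lines = []
    for name, value in params.items():
        escaped = value.replace("=", "\\=")
        lines.append(f'{name}="{escaped}"')
    Path(path).write_text("\n".join(lines) + "\n")


# Written to a temporary directory for the purpose of the example.
param_path = Path(tempfile.mkdtemp()) / "params.txt"
write_param_file(param_path, {"flt": "a1==1 AND a2==2"})
print(param_path.read_text(), end="")
```

The resulting file can then be passed to exec with -param_file, as in the command above.
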
