Passing date as command line arguments in Hive - hadoop

I have my below query in test1.hql file. I am trying to pass the date (dt) as the command line argument.
select * from lip_data_quality where dt = '${hiveconf: start_date}';
So whenever I try to run the above test1.hql file from shell prompt like this-
hive -f hivetest1.hql -hiveconf start_date=20120709
I get zero records back, but the table does contain data for that particular date. Why is that? Am I doing something wrong?
Can anyone help me out here? I was following Bejoy's article.
I am working with Hive 0.6.

Eliminate the space between hiveconf: and start_date, so that the reference reads ${hiveconf:start_date}.
This may only matter for string types, but Hive is picky in this respect.
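For reference, here is a minimal sketch of the corrected setup. It reuses the query and the start_date value from the question; writing the file into a temporary directory is only my assumption so the sketch runs anywhere, and the hive call is skipped when the Hive CLI is not installed.

```bash
# Write the query file with no space after "hiveconf:".
# The quoted 'EOF' stops the shell from touching ${...} itself.
workdir=$(mktemp -d)
cat > "$workdir/test1.hql" <<'EOF'
select * from lip_data_quality where dt = '${hiveconf:start_date}';
EOF

# Pass the date on the command line, as in the question.
if command -v hive >/dev/null 2>&1; then
  hive -f "$workdir/test1.hql" -hiveconf start_date=20120709
fi
```

The single quotes around the reference stay in the query, because dt is compared against a string literal.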

Related

Passing a variable to a Hive script file -- works with integer but not string

I need to pass a variable to an hql file in Hive using putty. I've set up a test scenario. Basically I want to select a row from a table where a value equals the variable. It will work when the variable is an integer but not a string.
The hql file /home_dir_users/username/smb_bau/testy.hql has this code in it:
drop table if exists tam_seg.tbl_ppp;
create table tam_seg.tbl_ppp as
select
*
from
tam_seg.1_testy as b
where
b.column_a = ${hivevar:my_var};
tam_seg.1_testy looks like this:
column_a
A
B
C
D
ZZZ
123
I want to use PuTTY to pass the variable my_var to the hql file. It works if I try 123 using this:
hive --hivevar my_var=123 -f /home_dir_users/username/smb_bau/testy.hql
But it doesn't work if I try to select one of the strings. I have tried the below:
hive --hivevar my_var=ZZZ -f /home_dir_users/username/smb_bau/testy.hql
hive --hivevar my_var='ZZZ' -f /home_dir_users/username/smb_bau/testy.hql
my_var='ZZZ'
hive --hivevar my_var=$my_var -f /home_dir_users/username/smb_bau/testy.hql
But every time I get this error message:
FAILED: SemanticException [Error 10004]: Line 9:14 Invalid table alias or column reference 'ZZZ': (possible column names are: column_a)
I have also tried hiveconf, using one dash before it instead of two, and leaving out the hiveconf or hivevar prefix before the variable in the code file.
Any ideas what I am doing wrong?
Many thanks.
OK so it looks like I have found the answer below through trial and error. I am leaving the post here in case any other users new to Hive find this useful.
I put single quotes round the variable in the hql file so it looks like this:
select
*
from
tam_seg.1_testy as b
where
b.column_a = '${hivevar:my_var}';
In a way this maybe seems obvious -- I would put single quotes round a string if I weren't using a variable. I guess I had my VBA/SQL Server hat on where a variable would not have quotes round it even if it were a string e.g. = strMyVar or = #STR_MY_VAR (otherwise the result would literally be "${hivevar:my_var}" as a string).
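The reason the quotes matter is that Hive substitutes the variable as plain text before it parses the statement. The sketch below only imitates that substitution with sed, so it runs without Hive; the render helper is a made-up name for illustration and is not part of Hive.

```bash
# The WHERE clause without and with single quotes round the variable.
unquoted='where b.column_a = ${hivevar:my_var};'
quoted="where b.column_a = '\${hivevar:my_var}';"

# Hypothetical helper: replace the placeholder in $1 with the value in $2,
# roughly the way Hive pastes the value into the query text.
render() {
  printf '%s\n' "$1" | sed 's/\${hivevar:my_var}/'"$2"'/'
}

render "$unquoted" 123   # where b.column_a = 123;
render "$unquoted" ZZZ   # where b.column_a = ZZZ;
render "$quoted" ZZZ     # where b.column_a = 'ZZZ';
```

After substitution, the unquoted ZZZ is a bare identifier, which is why Hive reports an invalid column reference, while 123 is a valid numeric literal.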

Error while exporting the results of a HiveQL query to CSV?

I am a beginner in Hadoop/Hive. I did some research to find a way to export the results of a HiveQL query to CSV.
I am running the command line below in PuTTY:
Hive -e ‘use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;’ > /home/temp.csv;
However, this is the error I am getting:
ParseException line 1:0 cannot recognize input near 'Hive' '-' 'e'
I would appreciate any input on this.
Run your command from outside the Hive shell, directly from the Linux shell.
Run it with 'hive' instead of 'Hive'.
Just redirecting the output into a .csv file won't give you CSV, because the Hive CLI separates columns with tabs. You can convert the tabs to commas:
hive -e 'YOUR QUERY HERE' | sed 's/[\t]/,/g' > sample.csv
as was suggested here: How to export a Hive table into a CSV file?
AkashNegi's answer will also work for you, though it is a bit longer.
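To see what the tab-to-comma step does without a Hive installation, here is a small sketch. The sample_rows function and its two rows are invented stand-ins for Hive CLI output, and I use tr instead of sed only because it behaves the same on every platform.

```bash
# Hypothetical stand-in for the tab-separated rows the Hive CLI prints.
sample_rows() {
  printf 'a@example.com\t10\n'
  printf 'b@example.com\t25\n'
}

# Replace every tab with a comma, as the sed command above does.
sample_rows | tr '\t' ','
```

Note that this is a plain character swap: a field that itself contains a comma will not be quoted, so the result is only valid CSV when the data has no embedded commas.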
One way I do such things is to create an external table with the schema you want. Then do INSERT INTO TABLE target_table ... Look at the example below:
CREATE EXTERNAL TABLE isvaliddomainoutput (email_domain STRING, `count` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/user/cloudera/am/member_email/isvaliddomain";
INSERT INTO TABLE isvaliddomainoutput
SELECT * FROM member_email WHERE isvalid = 1;
Now go to "/user/cloudera/am/member_email/isvaliddomain" and find your data.
Hope this helps.

Set date function as variable and use in beeline and hql file (hive)

Could anyone please explain to me how to solve this issue.
I want to use from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd') as the value for a variable and use it in the WHERE clause of a query that is stored in an hql file. I have tried:
beeline --hiveconf date=from_unixtime(unix_timestamp(), 'yyyyMMdd') -f path/file.hql (in .hql file: WHERE date <= '${hiveconf:date}';)
It does not work because of the date function. Is there any way to first get the date value in some script and then use it together with the hql file? I have only seen examples with hive cli but not beeline and I have tried some different ways but can't get it to work. Would really appreciate some help. The query works with hardcoded dates.
Thanks!
You can use the Unix date command to produce the date in the required format and then pass it to the Hive variable. Below is a sample command you could use:
cur_date=`date +%Y%m%d`
beeline --hiveconf date=${cur_date} -f path/file.hql
Then in your Hive query, reference it as ${hiveconf:date} wherever required, since the value was passed with --hiveconf.
Hope this helps.
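Since the question asks for yesterday's date (unix_timestamp() - 86400), a variant of the same idea might look like the sketch below. It assumes GNU date, whose -d option accepts "yesterday"; the beeline call mirrors the one in the question, leaves out connection options such as -u because they depend on your environment, and is skipped when beeline is not installed.

```bash
# Yesterday's date as yyyyMMdd (assumes GNU date for the -d option).
yesterday=$(date -d yesterday +%Y%m%d)
echo "date passed to the query: $yesterday"

# The .hql file keeps: WHERE date <= '${hiveconf:date}';
if command -v beeline >/dev/null 2>&1; then
  beeline --hiveconf date="$yesterday" -f path/file.hql
fi
```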

Hive - How to store a query result in a variable in a Bash script

I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All this is part of a bash shell script. How to form the query so as to get the desired result?
Hive should provide command-line support for this. I am not familiar with Hive, but I found this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli; you can check whether it works.
Personally, I have used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"
I used the method shown here and got it! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
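For anyone landing here, running the query directly and capturing its output can be sketched roughly as below. The run_hive_query wrapper and the my_table name are placeholders of mine; -S (silent mode) and -e (run a quoted statement) are Hive CLI options, and the call is skipped when hive is not installed.

```bash
# Hypothetical wrapper: run one statement and print only its result rows.
run_hive_query() {
  hive -S -e "$1"
}

if command -v hive >/dev/null 2>&1; then
  # Command substitution stores whatever the query prints.
  var=$(run_hive_query "select col1 from my_table limit 1;")
  var_to_use_later=$var
  echo "captured: $var_to_use_later"
fi
```

If the query returns several rows, the variable holds all of them separated by newlines, so limit the query or loop over the lines as needed.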

Hive Parameter in hive query

I have two files.
One is named testing.hql:
select dt, '${hiveconf:var}' from temp_table;
The other is named testing.sh:
temp= date --date='yesterday' +%y%m%d
hive -f testing.hql -hiveconf var=$temp
So basically I'm trying to pass a date value to the query so that I can filter the data it processes based on the current date.
I'm running it with this command:
./testing.sh
This doesn't work. Can someone quickly check and see where I am making a mistake?
So basically what I want to do is
select jobid from temp_table where dt >= '${hiveconf:var}';
so that the jobids I get are only the ones that were done since yesterday, because the
shell script sets the parameter to yesterday's date.
Thanks!
Currently this outputs an empty space after the dt value.
Figured it out.
There were two simple bugs.
1) In a shell script, a command whose output you want to store in a variable has to be wrapped in backticks (command substitution).
So I did:
temp=`date --date='yesterday' +%y%m%d`
hive -f testing.hql -hiveconf var=$temp
and it works like a charm.
2) In the query, the parameter must be quoted as a string literal; I used double quotes.
select jobid from temp_table where dt >= "${hiveconf:var}";
Hope this question can help others who had this issue.
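There is a space after temp=, so the shell runs date with an empty temp in its environment instead of storing the command's output:
temp=<blank>date ...
Removing the space alone is not enough; the command also has to be wrapped in command substitution. The temp variable should be declared as below:
temp=$(date --date='yesterday' +%y%m%d)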
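The difference between the two forms can be checked in any shell with a quick sketch like this one. It uses date without the --date option so it does not depend on GNU date, and first_try is just a name I picked to record the result of the first attempt.

```bash
unset temp

# With a space after "=": date runs with temp="" in its own environment,
# and nothing is stored in the current shell.
temp= date +%y%m%d >/dev/null
first_try=${temp-unset}

# With command substitution: the output of date is stored in temp.
temp=$(date +%y%m%d)

echo "with the space: $first_try"
echo "with \$(...):   $temp"
```

The first line prints "unset" and the second prints today's date as yymmdd, which is why the original script passed an empty value to -hiveconf.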
You can use BeeTamer for that. It lets you store a result (or part of it) in a variable and use this variable later in your code.
BeeTamer is a macro extension for Hive and Impala that extends the functionality of the Apache Hive and Cloudera Impala engines.
select avg(a) from abc;
%capture MY_AVERAGE;
select * from abc2 where avg_var=#MY_AVERAGE#;
Here you save the average value from your query into the macro variable MY_AVERAGE and then reuse it in the second query.
