Pass strings to Hive using a shell script (bash)

I have a shell script that passes a string variable into Hive so that I can filter my observations. Both the script and the Hive code are below.
In the script, a variable holds a string value that I try to pass to Hive, as in this example:
Shell Script:
name1='"Maria Nash"'  # single quotes on the outside, double quotes inside
hive --hiveconf name=${name1} -f t2.hql
Hive code (t2.hql)
create table db.mytable as
SELECT *
FROM db.employees
WHERE emp_name='${hivevar:name}';
Conclusion
The final table is created, but it contains no observations, even though the employees table does contain rows whose emp_name is "Maria Nash".
I suspect that I am either not passing the string correctly from the shell or not using the correct syntax for the passed variable in the Hive query.
I would appreciate your help!

You are passing the variable in the hiveconf namespace, but the SQL script reads it from hivevar. Use hiveconf in the script as well:
WHERE emp_name=${hiveconf:name} --hiveconf, not hivevar
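A related detail that may be worth checking alongside the namespace fix: the value in the question contains a space, and an unquoted ${name1} is split by the shell into two arguments before hive ever sees it. The sketch below does not call hive; count_args is a hypothetical helper that only reports how many arguments it received.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

name1='"Maria Nash"'

# Unquoted: the shell splits the expanded value at the space, producing
# the two words  name="Maria  and  Nash"
unquoted_count=$(count_args name=${name1})

# Quoted: the whole value stays one argument, inner double quotes included.
quoted_count=$(count_args "name=${name1}")

echo "unquoted: ${unquoted_count} argument(s)"
echo "quoted:   ${quoted_count} argument(s)"
```

With the quoting applied, the call from the question would read hive --hiveconf "name=${name1}" -f t2.hql.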

The Hive CLI is deprecated. You can use beeline from a shell script instead. It should look something like this:
beeline << EOF
!connect jdbc:hive2://host:port/db username password
select *
from db.employees
where emp_name = "${1}"
EOF
assuming that $1 is the first argument passed to the script.
This is an example of how to do it rather than a production implementation. Generally, Kerberos would be enabled, so the username and password wouldn't be there and a valid token would be available. You should also validate the input parameters.
Given that, you can do it in a single line:
beeline -u jdbc:hive2://hostname:10000 -f {full Path to Script} --hivevar {variable}={value}
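The advice to validate the input parameters could be sketched as follows. This is only one possible approach, an allow-list check: validate_name is a hypothetical helper, the allowed character set is an assumption to adapt to your data, and script.hql is a placeholder file name.

```shell
#!/usr/bin/env bash
# validate_name is a hypothetical helper. It accepts only letters, digits,
# spaces, dots, underscores and hyphens, so the value cannot contain a quote
# that would end the SQL string literal it is later embedded in.
validate_name() {
  case "$1" in
    '') return 1 ;;                      # reject empty input
    *[!A-Za-z0-9\ ._-]*) return 1 ;;     # reject any character outside the allow-list
    *) return 0 ;;
  esac
}

for candidate in "Maria Nash" "x' OR '1'='1"; do
  if validate_name "$candidate"; then
    echo "accepted: $candidate"
    # A real script would now call beeline, for example:
    # beeline -u jdbc:hive2://hostname:10000 -f script.hql --hivevar "name=$candidate"
  else
    echo "rejected: $candidate" >&2
  fi
done
```

Rejecting unexpected characters up front is simpler than trying to escape them, at the cost of refusing some legitimate values (an apostrophe in a surname, for instance).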

Related

Passing a variable to a Hive script file -- works with integer but not string

I need to pass a variable to an hql file in Hive from a PuTTY session. I've set up a test scenario. Basically, I want to select a row from a table where a value equals the variable. It works when the variable is an integer but not when it is a string.
The hql file /home_dir_users/username/smb_bau/testy.hql has this code in it:
drop table if exists tam_seg.tbl_ppp;
create table tam_seg.tbl_ppp as
select
*
from
tam_seg.1_testy as b
where
b.column_a = ${hivevar:my_var};
tam_seg.1_testy looks like this:
column_a
A
B
C
D
ZZZ
123
I want to use PuTTY to pass the variable my_var to the hql file. It works if I try 123 using this:
hive --hivevar my_var=123 -f /home_dir_users/username/smb_bau/testy.hql
But it doesn't work if I try to select one of the strings. I have tried the below:
hive --hivevar my_var=ZZZ -f /home_dir_users/username/smb_bau/testy.hql
hive --hivevar my_var='ZZZ' -f /home_dir_users/username/smb_bau/testy.hql
my_var='ZZZ'
hive --hivevar my_var=$my_var -f /home_dir_users/username/smb_bau/testy.hql
But every time I get this error message:
*FAILED: SemanticException [Error 10004]: Line 9:14 Invalid table alias or column reference 'ZZZ': (possible column names are: column_a)*
I have also tried hiveconf, using one dash before it instead of two, and leaving out the hiveconf or hivevar prefix before the variable in the code file.
Any ideas what I am doing wrong?
Many thanks.
OK so it looks like I have found the answer below through trial and error. I am leaving the post here in case any other users new to Hive find this useful.
I put single quotes round the variable in the hql file so it looks like this:
select
*
from
tam_seg.1_testy as b
where
b.column_a = '${hivevar:my_var}';
In a way this maybe seems obvious -- I would put single quotes round a string if I weren't using a variable. I guess I had my VBA/SQL Server hat on where a variable would not have quotes round it even if it were a string e.g. = strMyVar or = #STR_MY_VAR (otherwise the result would literally be "${hivevar:my_var}" as a string).
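The reason the quotes matter is that Hive replaces the variable textually before it parses the query. The sketch below imitates that step with sed so the resulting query text can be inspected. render_query is a hypothetical stand-in, not Hive itself, and it only handles simple values without / or & characters.

```shell
#!/usr/bin/env bash
# render_query is a hypothetical stand-in for Hive's variable substitution:
# it performs a plain textual replace of ${hivevar:my_var}.
# $1 = query text, $2 = value of my_var
render_query() {
  printf '%s\n' "$1" | sed 's/[$]{hivevar:my_var}/'"$2"'/'
}

unquoted='where b.column_a = ${hivevar:my_var};'
quoted="where b.column_a = '\${hivevar:my_var}';"

render_query "$unquoted" ZZZ   # ZZZ is left bare, so it reads as a column name
render_query "$quoted" ZZZ     # ZZZ sits inside quotes, so it is a string literal
render_query "$unquoted" 123   # a bare 123 is a valid numeric literal
```

The first rendered line matches the error message in the question: a bare ZZZ is looked up as a column reference.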

Set date function as variable and use in beeline and hql file (hive)

Could anyone please explain to me how to solve this issue?
I want to use from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd') as the value of a variable and use it in the WHERE clause of a query that is stored in an hql file. I have tried:
beeline --hiveconf date=from_unixtime(unix_timestamp(), 'yyyyMMdd') -f path/file.hql (in .hql file: WHERE date <= '${hiveconf:date}';)
It does not work because of the date function. Is there any way to first get the date value in some script and then use it together with the hql file? I have only seen examples with the Hive CLI, not beeline, and I have tried a few different ways but can't get it to work. I would really appreciate some help. The query works with hardcoded dates.
Thanks!
You can use the Unix date command to produce the date in the required format and then pass it as a Hive variable. Here is a sample command:
cur_date=$(date +%Y%m%d)
beeline --hiveconf date=${cur_date} -f path/file.hql
Then in your Hive query, use ${hiveconf:date} wherever required.
Hope this helps
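The question asks for the previous day (unix_timestamp() - 86400), while the command above yields today's date. A minimal sketch for yesterday, assuming GNU date as found on most Linux systems (the -d option is not available in BSD or macOS date):

```shell
#!/usr/bin/env bash
# Assumes GNU date: -d "yesterday" moves the date back by one day.
yesterday=$(date -d "yesterday" +%Y%m%d)
echo "date to pass: ${yesterday}"

# The value would then be handed to beeline in the same way as above
# (left commented out because it needs a reachable HiveServer2):
# beeline --hiveconf "date=${yesterday}" -f path/file.hql
```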

Hive - How to store a query result in a variable in a Bash script

I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All this is part of a bash shell script. How to form the query so as to get the desired result?
Hive should provide command-line support for this. I am not familiar with Hive, but I found this page: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli. You can check whether it works for you.
Personally, I have used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"
I used the method shown here and got it! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
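For readers who want the Hive equivalent spelled out, running the query directly and capturing its output could look like the sketch below. It relies on the Hive CLI options -S (silent) and -e (inline query) and on ordinary command substitution. run_query is a hypothetical wrapper name, and the table and column names are the placeholders from the question.

```shell
#!/usr/bin/env bash
# run_query is a hypothetical wrapper around the Hive CLI.
# -S suppresses informational output; -e runs the query given on the command line.
run_query() {
  hive -S -e "$1"
}

# Usage (needs a working Hive installation, so it is left commented out):
# var=$(run_query "select col1 from table limit 1")
# var_to_use_later=$var
# echo "value from Hive: ${var_to_use_later}"
```

If the query returns several rows or columns, the variable holds all of them as tab- and newline-separated text, so it is usually easiest to restrict the query to a single value.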

Hadoop Hcatalog -How to pass key value pair

I have a create table script where the table name will be decided at runtime. How do I pass the value to sql script?
I'm trying something like this
hcat -e "create table ${D:TAB_NAME} (name string)" -DTAB_NAME=person
But I keep getting errors.
Can I get the correct syntax?
Try this:
hcat -e 'create table ${hiveconf:TAB_NAME} (name string);' -DTAB_NAME=person2
Here are two things to note:
In the shell, $ triggers variable expansion inside double quotes, so your ${D:TAB_NAME} is expanded to nothing before it is even passed to the hcat parser. Either escape the $ or use strong quoting with single quotes ('').
Use hiveconf instead of D for variable substitution, because hcat still uses Hive under the hood to parse commands.
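The quoting point can be checked without hcat by printing what the shell would hand over. This sketch assumes bash, where ${D:TAB_NAME} is treated as a substring expansion of the unset variable D:

```shell
#!/usr/bin/env bash
# Make sure neither name is set, as in a fresh shell.
unset D TAB_NAME

# Double quotes: bash expands ${D:TAB_NAME} itself, and the result is empty.
weak="create table ${D:TAB_NAME} (name string)"

# Single quotes: the text reaches the command untouched.
strong='create table ${hiveconf:TAB_NAME} (name string);'

# Double quotes with an escaped $ give the same result as single quotes.
escaped="create table \${hiveconf:TAB_NAME} (name string);"

printf '%s\n' "$weak" "$strong" "$escaped"
```

The first printed line has no table name at all, which is what hcat received in the original command.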

Passing date as command line arguments in Hive

I have my below query in test1.hql file. I am trying to pass the date (dt) as the command line argument.
select * from lip_data_quality where dt = '${hiveconf: start_date}';
So whenever I try to run the above test1.hql file from shell prompt like this-
hive -f hivetest1.hql -hiveconf start_date=20120709
I get zero records back, even though the table has data for that date. Why is that? Am I doing something wrong?
Can anyone help me out here? I was following Bejoy's article.
I am working with Hive 0.6.
Eliminate the space between hiveconf: and start_date.
This may only be for string types, but Hive is picky in this respect.
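Because a stray space like this is easy to miss, a quick check of the query text before running it may help. The sketch below is one possible lint, not part of Hive: has_spaced_var is a hypothetical helper that looks for whitespace directly after the hiveconf: or hivevar: prefix.

```shell
#!/usr/bin/env bash
# has_spaced_var is a hypothetical lint helper: it succeeds when the text
# contains a reference such as ${hiveconf: name} with whitespace after the colon.
has_spaced_var() {
  printf '%s\n' "$1" | grep -Eq '[$][{](hiveconf|hivevar):[[:space:]]'
}

broken="select * from lip_data_quality where dt = '\${hiveconf: start_date}';"
fixed="select * from lip_data_quality where dt = '\${hiveconf:start_date}';"

if has_spaced_var "$broken"; then echo "broken: space found after the colon"; fi
if ! has_spaced_var "$fixed"; then echo "fixed: no space after the colon"; fi
```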

Resources