Hive Passing Property/Text File to read Multiple Variable values in -hiveconf - hadoop

Currently I am able to use the below command:
hive -f hive-job.hql -hiveconf city='CA' -hiveconf country='US'
Here I am passing only 2 variable values, but I have around 15 to 20 values that I need to pass through -hiveconf. These values are stored in a properties/text file.
Is there a way to read the file through -hiveconf?

There is no direct way to load a properties file into Hive variables, but I know of two approaches that might help:
1.) Keep all the variables in a file, hive-job-variables.hql, as
set x=1;
set y=2;
...
Then load this file from the main file (the one you run with hive -f hive-job.hql) using the source command, before the queries that use the variables:
source hive-job-variables.hql;
select ... from ..
...
2.) Use Java code to read the property file, convert the property values to Hive variable format, and run your queries in the order you want over a Hive JDBC connection to Hive Server.
For your requirement I would suggest the second option.
Hope it helps!

You can do this using shell tools pretty easily.
Assuming your properties file is in typical "key=val" format, e.g.
a=1
b=some_value
c=foo
Then you can do:
sed 's/^/-hiveconf\n/g' my_properties_file | xargs hive -f hive-job.hql
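The one-liner above depends on sed accepting \n in the replacement text (GNU sed does) and on xargs splitting its input on whitespace, so a value containing a space or a quote would likely be mangled. A more defensive sketch, assuming a simple key=val file with optional # comment lines, appends the arguments one line at a time. The function name run_with_hiveconf and the file name my.properties are made up for illustration.

```shell
# run_with_hiveconf PROPS_FILE CMD [ARGS...]   (hypothetical helper)
# Appends "-hiveconf key=val" for every non-blank, non-comment line of
# PROPS_FILE to CMD's arguments, then runs CMD.
run_with_hiveconf() {
  props=$1
  shift
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    set -- "$@" -hiveconf "$line"
  done < "$props"
  "$@"
}

# Usage (assumes hive is on PATH and both files exist):
#   run_with_hiveconf my.properties hive -f hive-job.hql
```

Each line is passed as a single argument, so a value such as b=some value reaches hive intact.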

Related

Parse Strings in HIVE using Shell

I have a shell script that I use to pass a string variable into Hive, in order to filter my observations. I provide both the script and the Hive code below.
In the script, a variable holds a string value that I try to pass into Hive, as in the example below:
Shell Script:
name1='"Maria Nash"' *(I use a single quote first and then a double)*
hive --hiveconf name=${name1} -f t2.hql
Hive code (t2.hql)
create table db.mytable as
SELECT *
FROM db.employees
WHERE emp_name='${hivevar:name}';
Conclusion
To be accurate, the final table is created but it does not contain any observations, even though the employees table contains observations with emp_name "Maria Nash".
I think that I might not be passing the string correctly from the shell, or that I am not using the correct syntax for the passed variable in the Hive query.
I would appreciate your help!
You are passing the variable in the hiveconf namespace, but the SQL script uses hivevar. Use hiveconf there as well:
WHERE emp_name=${hiveconf:name} --hiveconf, not hivevar
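One more thing worth checking on the shell side: ${name1} is unquoted in the hive command, so the shell splits "Maria Nash" at the space before hive ever sees it. The sketch below shows the difference, using a hypothetical helper count_args as a stand-in for hive.

```shell
# count_args is a hypothetical stand-in for hive: it only reports how many
# arguments it received.
count_args() { echo "$#"; }

name1='"Maria Nash"'

# Unquoted: the value is split into two arguments (name="Maria and Nash").
count_args --hiveconf name=${name1} -f t2.hql      # prints 5

# Quoted: the whole assignment stays one argument.
count_args --hiveconf "name=${name1}" -f t2.hql    # prints 4
```

With the quoted form, the call in the script would become hive --hiveconf "name=${name1}" -f t2.hql.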
The Hive CLI is deprecated; you can use Beeline from a shell script instead.
It should look something like this:
beeline << EOF
!connect jdbc:hive2://host:port/db username password
select *
from db.employees
where emp_name = "${1}"
EOF
assuming that $1 is the input to the script.
This is an example of how to do it rather than a production implementation. Generally, Kerberos would be enabled, so the username and password wouldn't be there and a valid token would be available. You should also validate the input parameters.
Alternatively, you can do it in a single line:
beeline -u jdbc:hive2://hostname:10000 -f {full Path to Script} --hivevar {variable}={value}
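For the input validation mentioned above, one possible sketch is a whitelist check that runs before the value reaches Beeline. The helper name is_safe_value and the allowed character set are assumptions for illustration; adjust them to the values you actually expect.

```shell
# is_safe_value VALUE   (hypothetical helper)
# Succeeds only if VALUE is non-empty and consists solely of letters,
# digits, spaces, underscores, dots and hyphens.
is_safe_value() {
  [ -n "$1" ] || return 1
  # Delete every allowed character and count what is left over.
  leftover=$(printf '%s' "$1" | tr -d 'A-Za-z0-9 _.-' | wc -c)
  [ "$leftover" -eq 0 ]
}

# Usage inside the script, before calling beeline:
#   is_safe_value "$1" || { echo "invalid input: $1" >&2; exit 1; }
```

Rejecting quotes and semicolons up front matters here because the value is substituted textually into the query.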

Unable to resolve $proc_date from the script

I have multiple HQL files located at /home/ganesh/CopyJobs/hql/. Below is one example:
insert into XYZ.exttbl_form_data PARTITION (load_date="$proc_date") select FORM_DATA_ID,FORM_ID,USER_ID,INTERACTIONS_ID,SUBMISSION_DATETIME,FILEDS from PQR.exttbl_form_data where load_date="$proc_date"
In the main script I'm reading the above-mentioned HQL files as
export proc_date=2018-05-07
while read line
do
export hql=`cat /home/ganesh/CopyJobs/hql/$table_name.hql`
export hql_final=$(`eval echo"$hql"`)
echo "Final HQL: $hql_final"
hive -e "$hql_final;"
done < /home/ganesh/CopyJobs/config/tables.txt
where tables.txt has the list of all HQL files.
I want $proc_date to be resolved, but that is not happening.
Use Hive variable substitution (hiveconf variables). I have fixed your script a little.
HQL file should look like this:
insert into XYZ.exttbl_form_data PARTITION (load_date='${hiveconf:proc_date}')
select FORM_DATA_ID,FORM_ID,USER_ID,INTERACTIONS_ID,SUBMISSION_DATETIME,FILEDS
from PQR.exttbl_form_data where load_date='${hiveconf:proc_date}'
${hiveconf:proc_date} is a variable to be passed to Hive.
The main script:
proc_date=2018-05-07
echo "proc_date is $proc_date"
while read line
do
hql_file=/home/ganesh/CopyJobs/hql/"$line".hql
echo "current hql_file is $hql_file"
hive -hiveconf proc_date="$proc_date" -f "$hql_file"
done < /home/ganesh/CopyJobs/config/tables.txt

HADOOP HIVE - Is there a command for setting csv output

Is there a command line in HIVE that can be used to define the format of the output file to CSV?
Something similar to the below example?
set hive.resultset.use.unique.column.names=false;
EDIT - Added the following for further context 12/18.
A terminal window I'm using has predefined settings for the command line when it runs an 'export' through a script. The following are its commands:
set hive.metastore.warehouse.dir=/idn/home/user;
set mapred.job.queue.name=root.gmis;
set hive.exec.scratchdir=/axp/hivescratch/user;
set hive.resultset.use.unique.column.names=false;
set hive.cli.print.header=true;
set hive.groupby.orderby.position.alias=true;
Is there another command I could add instead of the lengthy strings below? I'm using the following in the other Hive terminal, but its SQL is different(?).
cloak-hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/adshome/user/VS_PMD' ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
You can set the output format to CSV with the --outputformat option; see the following example command. Note that --outputformat is a Beeline option; the legacy Hive CLI does not accept it.
beeline -u jdbc:hive2://localhost:10000/default --silent=true --outputformat=csv2 -e "select * from sample_07 limit 10" > out.txt
From the Apache documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format]
SELECT ... FROM ...
INSERT OVERWRITE LOCAL DIRECTORY directory1
ROW FORMAT DELIMITED
STORED AS TEXTFILE
SELECT ... FROM ...;
Some work may be needed on ROW FORMAT to achieve the expected result (for CSV, for example, FIELDS TERMINATED BY ',').
Note also that LOCAL refers to the local filesystem.

Set date function as variable and use in beeline and hql file (hive)

Could anyone please explain to me how to solve this issue?
I want to use from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd') as the value for a variable and use it in the where clause of a query that is stored in an hql file. I have tried:
beeline --hiveconf date=from_unixtime(unix_timestamp(), 'yyyyMMdd') -f path/file.hql (in .hql file: WHERE date <= '${hiveconf:date}';)
It does not work because of the date function. Is there any way to first get the date value in some script and then use it together with the hql file? I have only seen examples with hive cli but not beeline and I have tried some different ways but can't get it to work. Would really appreciate some help. The query works with hardcoded dates.
Thanks!
Using the Unix date command, you can produce the date in the required format and then pass it to the Hive variable. Below is a sample command you could use:
cur_date=`date +%Y%m%d`
beeline --hiveconf date=${cur_date} -f path/file.hql
Then in your Hive query just use ${hiveconf:date} wherever required.
Hope this helps
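The question asks for the previous day (unix_timestamp() - 86400), whereas date +%Y%m%d gives today's date. A sketch for yesterday, assuming GNU date (the -d option used here is not available in BSD or macOS date); the variable name prev_date is made up for illustration.

```shell
# Yesterday's date as yyyyMMdd; requires GNU date for the -d option.
prev_date=$(date -d "yesterday" +%Y%m%d)
echo "prev_date is $prev_date"

# Then pass it to Beeline as before (connection options omitted here):
#   beeline --hiveconf date="$prev_date" -f path/file.hql
```

Computing the value in the shell keeps the .hql file unchanged: it still reads the value as ${hiveconf:date}.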

Hive - How to store a query result in a variable in a Bash script

I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All this is part of a bash shell script. How should I form the query to get the desired result?
Hive should provide command-line support for this. I am not familiar with Hive, but I found this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli, you can check whether that works.
Personally, I have used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"
I used the method shown here and got it! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
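For reference, running the query directly and capturing its output might look like the sketch below. It assumes the Hive CLI is on PATH and uses its -S (silent) and -e (query string) options; the wrapper name query_result is made up, and the table and column names are the placeholders from the question.

```shell
# query_result QUERY   (hypothetical wrapper)
# Runs QUERY through the Hive CLI in silent mode and prints the result rows.
query_result() {
  hive -S -e "$1"
}

# Usage: capture the output with command substitution.
#   var=$(query_result "select col1 from table;")
#   var_to_be_used_later=$var
```

If the query returns several rows, the variable holds them newline-separated, so a single-value query is the simplest case.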
