command line arguments in hive ( .hql) files from a bash script - bash

I have a main bash script that runs several other bash scripts and .hql files. The .hql files contain Hive queries, each with a WHERE clause on a date field. I am trying to automate a process, and I need the WHERE clause to change based on today's date (which is obtained in the main bash script).
For example the .hql file looks like this:
This is selectrows.hql
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '2015-11-02 00:00:00' AND origintime < '2015-11-03 00:00:00';
Since today is 2015-11-11, I want to be able to pass today's date minus 9 days and minus 8 days to the .hql script from the bash script. Is there a way to pass these two variables from the bash script to the .hql file?
So the main bash script looks like this:
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded=`date -d "$prodate - 8 days" +%Y-%m-%d`
echo $dateneeded
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -f /home/automation/tv/selectrows.hql
echo "created table"
Thanks in advance.

You can use the beeline -e option to execute queries passed as strings, and interpolate the date parameters into those strings.
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded8=`date -d "$prodate - 8 days" +%Y-%m-%d`
dateneeded9=`date -d "$prodate - 9 days" +%Y-%m-%d`
echo $dateneeded8
echo $dateneeded9
hql="
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '"
echo "$hql""$dateneeded9""' AND origintime < '""$dateneeded8""';"
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -e "${hql}${dateneeded9}' AND origintime < '${dateneeded8}';"
echo "created table"

An alternate way is to pass arguments as hivevar variables.
Create a Hive .hql file that references the variables:
vi multi_var_file.hql
SELECT * FROM TEST_DB.TEST_TB WHERE TEST1='${var_1}' AND TEST2='${var_2}';
Then pass values for the same variables when running the script:
hive -hivevar var_1='TEST1' -hivevar var_2='TEST2' -f multi_var_file.hql
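Since the original question uses beeline, the same technique applies there: beeline also accepts --hivevar. A minimal sketch, assuming the .hql is rewritten to use hypothetical variable names start_dt and end_dt (e.g. WHERE origintime >= '${start_dt} 00:00:00' AND origintime < '${end_dt} 00:00:00'):

```shell
#!/bin/bash
# Sketch: compute the date window in bash, then hand it to beeline as
# hivevars. The variable names start_dt/end_dt are assumptions; the .hql
# file is expected to reference them as '${start_dt}' / '${end_dt}'.
prodate=$(date +%Y-%m-%d)
start_dt=$(date -d "$prodate - 9 days" +%Y-%m-%d)
end_dt=$(date -d "$prodate - 8 days" +%Y-%m-%d)
echo "filtering window: $start_dt to $end_dt"

# Commented out because it needs a live HiveServer2:
# beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' \
#   -d org.apache.hive.jdbc.HiveDriver \
#   --hivevar start_dt="$start_dt" --hivevar end_dt="$end_dt" \
#   -f /home/automation/tv/selectrows.hql
```

This avoids building the SQL by string concatenation in the shell, which is easy to get wrong with nested quotes.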

Related

How can I call query in variable while declaring it in Hive?

How can I call a query in a variable while declaring it in Hive? I am creating a shell script to drop partitions for the previous date, so in file.hql I am using:
Alter table table_name drop partition column >= 'datesub(current_date-1)';
But it is not working, so I tried to declare the condition in a variable and then call it. So I first declare the variable, then call it in the query:
set var1= Select date_sub(current_date, 1)
Alter table table_name drop partition column >= '${hiveconf:var1}';
But this is not working because the variable is not declared correctly. So how do I declare the query in a variable?
Hive does not evaluate variables before substitution; variables are substituted as-is. Also, functions and sub-queries are not allowed in a partition specification.
The solution is to calculate the variable in the shell and pass it to the script:
bash$ dt=$(date -d '-1 day' +%Y-%m-%d)
bash$ hive -e "ALTER TABLE table_name drop partition (column >='$dt')"
Or, if you prefer to call a script file, pass a hiveconf variable:
bash$ dt=$(date -d '-1 day' +%Y-%m-%d)
bash$ hive -hiveconf dt="$dt" -f script_name
#In the script use '${hiveconf:dt}':
ALTER TABLE table_name drop partition (column >='${hiveconf:dt}')

Export hql output to csv in beeline

I am trying to export my HQL output to CSV in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes the data of that column into the next column's value.
For eg:
| abcd | as per data,outage fault,xxxx.
| xyz |as per the source,ghfg,hjhjg.
The above data will get saved as 4 columns instead of 2.
Need help!
Try the approach with a local directory:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This will create several CSV files under your local /tmp/local_csv_report directory; a simple cat after that will merge the results into a single file.
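The full flow can be sketched as below. The beeline call is commented out because it needs a live HiveServer2 (the connection string is a placeholder); the part files it would produce are simulated so the merge step can be shown:

```shell
#!/bin/bash
# Hypothetical end-to-end sketch: Hive writes one or more delimited part
# files into a local directory, and cat merges them into one CSV.
report_dir=/tmp/local_csv_report

# beeline -u "jdbc:hive2://<host>/..." -e "
#   insert overwrite local directory '${report_dir}'
#   row format delimited fields terminated by ',' escaped by '\\'
#   select * from table_name
#   where open_time_new >= '2020-07-13' and open_time_new < '2020-07-22';"

# Simulate the part files Hive would produce (note the escaped commas,
# which is what the 'escaped by' clause buys you):
mkdir -p "$report_dir"
printf 'abcd,as per data\\,outage fault\n' > "$report_dir/000000_0"
printf 'xyz,as per the source\\,ghfg\n'    > "$report_dir/000001_0"

# Merge all part files into a single CSV:
cat "$report_dir"/* > /tmp/report.csv
```

Because commas inside values are backslash-escaped, downstream parsers that honor the escape character keep the original two columns intact.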

Replace an array of strings (passed as an argument to a script) in an HQL file using a Bash shell script?

I have a script which accepts 3 arguments: $1, $2, and $3,
but $3 is an array-like string, e.g. ("2018" "01"),
so I am executing my script as:
sh script.sh Employee IT "2018 01"
and there is an HQL file (emp.hql) in which I want to replace the partition columns with the passed values, like below:
"select deptid , employee_name from {TBL_NM} where year={par_col[i]} and month={par_col[i]}"
so below is the code I have tried :
Table=$1
dept=$2
Par_cols=($3)
for i in "${par_cols[#]}" ;do
sed -i "/${par_col[i]}/${par_col[i]}/g" /home/hk/emp.hql
done
Error:
sed: -e expression #1, char 0: no previous regular expression
sed: -e expression #2, char 0: no previous regular expression
But I think my logic to replace the partition columns is wrong; could you please help me with this?
Desired Output in HQL file :
select deptid ,employee_name from employee where year=2018 and month=01
This is somewhat related to:
Shell script to find, search and replace array of strings in a file
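For reference, a hedged sketch of a corrected loop: the sed expression was missing the s command (s/old/new/g), the array names were inconsistent (Par_cols vs par_cols vs par_col), and ${par_cols[#]} should be ${par_cols[@]}. The placeholder tokens {TBL_NM}, {year}, and {month} are assumptions about what emp.hql contains:

```shell
#!/bin/bash
# Hypothetical fix: substitute the table name and partition values into
# an HQL template file in place. Placeholder tokens are assumptions.
substitute_partitions() {
  local table=$1 hql_file=$2
  shift 2
  local par_cols=("$@")            # remaining args: year, month
  sed -i \
    -e "s/{TBL_NM}/${table}/g" \
    -e "s/{year}/${par_cols[0]}/g" \
    -e "s/{month}/${par_cols[1]}/g" \
    "$hql_file"
}

# Usage, with $3 word-split into an array as in the question:
# par_cols=($3)
# substitute_partitions "$1" /home/hk/emp.hql "${par_cols[@]}"
```

Note each -e expression starts with s, so sed performs a substitution instead of treating the text as an address with an empty regex (which is what produced the "no previous regular expression" error).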

How to export a Hive table into a CSV file including header?

I used this Hive query to export a table into a CSV file.
hive -f mysql.sql
row format delimited fields terminated by ','
select * from Mydatabase,Mytable limit 100"
cat /LocalPath/* > /LocalPath/table.csv
However, it does not include the table column names.
How can I export the column names to the CSV as well? With show tablename?
You should add set hive.cli.print.header=true; before your select query to get column names as the first row of your output. The output would then look like Mytable.col1, Mytable.col2 ....
If you don't want the table name with the column names, use set hive.resultset.use.unique.column.names=false;. The first row of your output would then look like col1, col2 ...
Invoking the hive command line with the parameters suggested in the other answer here works for a plain select. So you can extract the column names and create the CSV to start with, as follows:
hive -S --hiveconf hive.cli.print.header=true --hiveconf hive.resultset.use.unique.column.names=false --database Mydatabase -e 'select * from Mytable limit 0;' > /LocalPath/table.csv
After that, you can run the actual data-extraction part, this time remembering to append to the CSV:
cat /LocalPath/* >> /LocalPath/table.csv ## From your question with >> for append

Hive table creation error through Bash Shell

Can anyone tell me why I am getting an error while creating a partitioned table from the Bash shell?
[cloudera#localhost ~]$ hive -e "create table peoplecountry (
name1 string,
name2 string,
salary int,
country string
)
partitioned by (country string)
row format delimited
column terminated by '\n'";
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.7.0.jar!/hive-log4j.properties
Hive history file=/tmp/cloudera/hive_job_log_0fdf7083-8ab4-499f-8048-a85f162d1357_376056456.txt
FAILED: ParseException line 8:0 missing EOF at 'column' near 'delimited'
If you meant a newline at the end of each row of your data, then you need to use:
lines terminated by '\n'
instead of column terminated by.
In case you meant each column in the row to be separated by a delimiter, then specify it as, e.g.:
fields terminated by ','
Refer to:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
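Putting this together, a corrected statement might look like the following (assuming comma-separated fields, which is a guess at the intended delimiter). Note also that country appears both in the column list and the partition clause; Hive rejects that, since partition columns must not duplicate table columns, so it is removed from the column list here:

```sql
create table peoplecountry (
  name1 string,
  name2 string,
  salary int
)
partitioned by (country string)
row format delimited
fields terminated by ','
lines terminated by '\n';
```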
