Not able to pass parameters to hql from sh file - shell

I have a .sh file from which I am passing values to a .hql file, but it's giving me errors
sm=1
XXXXX=""
while read -r line
do
name="$line"
XXXXX="hive$name(${XXXX[$sm]%?})"
echo $XXXXX
hive -hiveconf var1=$XXXXX -hiveconf var2=/user/cloudera/project -hiveconf var3=$name -f test1.hql
sm=$((sm + 1))
done < "$filename"
CREATE EXTERNAL TABLE IF NOT EXISTS ${hiveconf:var1}
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
location ${hiveconf:var2/hiveconf:var3};
Please note that $XXXXX builds a table name with schema after reading from the file and applying some logic. When I echo it there is no problem, but the problem comes in the .hql file. The error is something like this:
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 2:4 cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in column type
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

Try quoting the variable:
hive -hiveconf var1="$XXXXX"
All such variables should be quoted.
Use this command inside the .hql script to check the value being passed:
! echo "${hiveconf:var1}";
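To see why the quoting matters, here is a minimal sketch (no Hive needed; count_args is a hypothetical stand-in for any command receiving the value):

```shell
#!/bin/bash
# A value shaped like the generated $XXXXX: spaces and parentheses inside.
val='hivetable(col1 string,col2 string)'

count_args() { echo $#; }

count_args $val     # unquoted: the shell splits the value into 3 words
count_args "$val"   # quoted: the value arrives as a single argument
```

Unquoted, hive receives the fragments after the first space as extra arguments, so the .hql file never sees the full table definition.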

An alternative to the above is using hivevar:
sm=1
XXXXX=""
while read -r line
do
name="$line"
XXXXX="hive$name(${XXXX[$sm]%?})"
echo "${XXXXX}"
hive -hivevar var1="${XXXXX}" -hivevar var2="/user/cloudera/project" -hivevar var3="${name}" -f test1.hql
sm=$((sm + 1))
done < "$filename"
CREATE EXTERNAL TABLE IF NOT EXISTS ${var1}
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
location '${var2}/${var3}';
Also, here is a link on the difference between hiveconf and hivevar, in case the curiosity bug bites :)
What is the difference between -hivevar and -hiveconf?

Related

How can I call query in variable while declaring it in Hive?

How can I call query in variable while declaring it in Hive? I am creating a shell script to drop partitions for previous date, so in the file.hql I am using :
Alter table table_name drop partition column >= 'datesub(current_date-1)';
But it is not working, so I have tried to declare the condition in variable and then call here. So I first try to declare the variable then call it in query :
set var1= Select date_sub(current_date, 1)
Alter table table_name drop partition column >= '${hoveconf:var1}';
But this is not working because the variable is not declared correctly. So how to declare the query under variable?
Hive does not evaluate variables before substitution; they are substituted as-is. Also, functions and sub-queries are not allowed in a partition specification.
The solution is to calculate variable in a shell and pass it to the script:
bash$ dt=$(date -d '-1 day' +%Y-%m-%d)
bash$ hive -e "ALTER TABLE table_name drop partition (column >='$dt')"
Or if you prefer to call script file, then pass hiveconf variable:
bash$ dt=$(date -d '-1 day' +%Y-%m-%d)
bash$ hive -hiveconf dt="$dt" -f script_name
#In the script use '${hiveconf:dt}':
ALTER TABLE table_name drop partition (column >='${hiveconf:dt}')
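The substitution can be sanity-checked without launching Hive; in this sketch, sed stands in for Hive's ${hiveconf:...} replacement so the final statement can be eyeballed:

```shell
#!/bin/bash
# Compute yesterday's date exactly as above:
dt=$(date -d '-1 day' +%Y-%m-%d)

# The statement as written in the script file:
sql="ALTER TABLE table_name drop partition (column >='\${hiveconf:dt}')"

# sed plays the role of Hive's variable substitution here:
echo "$sql" | sed "s/\${hiveconf:dt}/$dt/"
```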

Parsing through a CSV file

I have a CSV files like this:
2015-12-10,22:45:00,205,5626,85
2015-12-10,23:00:01,79,5625,85
2015-12-13,13:00:01,4410,5629,85
2015-12-13,13:15:00,4244,5627,85
2015-12-13,13:30:00,4082,5627,85
I tried this script to generate an SQL statement:
#!/bin/bash
inputfile=${1}
echo $inputfile
OLDIFS=$IFS
IFS=,
while read date time current full cycle
do
echo --$date --$time --$current --$full --$cycle
echo insert into table values($date,$time,$current,$full,$cycle)
sleep 1
done < $inputfile
IFS=$OLDIFS
But on execution I get this error and it doesn't run as expected:
/Scripts/CreateSql.sh: line 10: syntax error near unexpected token `('
/Scripts/CreateSql.sh: line 10: `echo insert into table values(\$date,$time,$current,$full,$cycle)'
I need the statement generated like this:
insert into table values($date,$time,$current,$full,$cycle)
Please kindly suggest a fix for this.
Use double quotes, since to the shell an unquoted ( means "spawn a new process":
echo "insert into table values($date,$time,$current,$full,$cycle)"
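A runnable sketch of the quoted version on one sample row from the question:

```shell
#!/bin/bash
line="2015-12-10,22:45:00,205,5626,85"

# Comma-split the row, then emit the statement; the double quotes keep
# the parentheses away from the shell:
echo "$line" | {
    IFS=, read date time current full cycle
    echo "insert into table values($date,$time,$current,$full,$cycle)"
}
# → insert into table values(2015-12-10,22:45:00,205,5626,85)
```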
All, I fixed this:
echo 'insert into table values ('$date','$time','$current','$full','$cycle')'

command line arguments in hive ( .hql) files from a bash script

I have a main bash script that runs several other bash scripts and .hql files. The .hql files contain Hive queries with a WHERE clause on the date field. I am trying to automate a process, and I need the WHERE clause to change based on today's date (which is obtained in the main bash script).
For example the .hql file looks like this:
This is selectrows.hql
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '2015-11-02 00:00:00' AND origintime < '2015-11-03 00:00:00';
Since today is 2015-11-11, I want to be able to pass date - 9 days and date - 8 days to the .hql script from the bash script. Is there a way to pass these two variables from the bash script to the .hql file?
So the main bash script looks like this:
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded=`date -d "$prodate - 8 days" +%Y-%m-%d`
echo $dateneeded
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -f /home/automation/tv/selectrows.hql
echo "created table"
thanks in advance.
You can use the beeline -e option to execute queries passed as strings, then splice the date parameters into the string.
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded8=`date -d "$prodate - 8 days" +%Y-%m-%d`
dateneeded9=`date -d "$prodate - 9 days" +%Y-%m-%d`
echo $dateneeded8
echo $dateneeded9
hql="
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '"
echo "$hql""$dateneeded9""' AND origintime < '""$dateneeded8""';"
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -e "$hql""$dateneeded9""' AND origintime < '""$dateneeded8""';"
echo "created table"
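The concatenation above is easy to get wrong, so it helps to verify the assembled string first. A sketch with fixed dates standing in for the date -d output, showing only the tail of the $hql string:

```shell
#!/bin/bash
dateneeded9=2015-11-02
dateneeded8=2015-11-03

# Tail of the $hql string from the answer, ending in an open quote:
hql="WHERE origintime >= '"

full="$hql$dateneeded9' AND origintime < '$dateneeded8';"
echo "$full"
# → WHERE origintime >= '2015-11-02' AND origintime < '2015-11-03';
```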
An alternative way to pass arguments is to create the .hql file with variable placeholders:
vi multi_var_file.hql
SELECT * FROM TEST_DB.TEST_TB WHERE TEST1='${var_1}' AND TEST2='${var_2}';
Then pass values for those variables when running the script:
hive -hivevar var_1='TEST1' -hivevar var_2='TEST2' -f multi_var_file.hql

How to extract the sybase sql query output in a shell script

I am trying to execute a SQL query on a Sybase database using a shell script: a simple query to count the number of rows in a table.
#!/bin/sh
[ -f /etc/bash.bashrc.local ] && . /etc/bash.bashrc.local
. /gi/base_environ
. /usr/gi/bin/environ
. /usr/gi/bin/path
ISQL="isql <username> guest"
count() {
VAL=$( ${ISQL} <<EOSQL
set nocount on
go
set chained off
go
select count(*) from table_name
go
EOSQL
)
echo "VAL : $VAL"
echo $VAL | while read line
do
echo "line : $line"
done
}
count
The above code gives the following output:
VAL : Password:
-----------
35
line : Password: ----------- 35
Is there a way to get only the value '35'? What am I missing here? Thanks in advance.
The "select count(*)" prints a result set as output, i.e. a column header (here blank), a line of dashes for each column, and the column value for every row. Here you have only one column and one row.
If you want to get rid of the dashes, you can do various things:
select the count(*) into a variable and just PRINT the variable. This will remove the dashes from the output
perform some additional filtering with things like grep and awk on the $VAL variable before using it
As for the 'Password:' line: you are not specifying a password in the 'isql' command, so 'isql' will prompt for it (since it works, it looks like there is no password). Best specify a password flag to avoid this prompt -- or filter out that part as mentioned above.
Incidentally, it looks like you may be using the 'isql' from the Unix/Linux ODBC installation, rather than the 'isql' utility that comes with Sybase. Best use the latter (check with 'which isql').
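The additional-filtering suggestion can be sketched like this, with $VAL faked to match the output shown above:

```shell
#!/bin/sh
# Fake what the isql capture produced: prompt, dashes, then the count.
VAL='Password:
-----------
         35'

# Keep only the line whose first field is purely numeric (the count itself):
count=$(echo "$VAL" | awk '$1 ~ /^[0-9]+$/ {print $1}')
echo "count : $count"
# → count : 35
```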

Bash script to convert a date and time column to unix timestamp in .csv

I am trying to create a script to convert two columns in a .csv file, date and time, into unix timestamps. So I need to get the date and time columns from each row, convert them, and insert the timestamp into an additional column at the end.
Could anyone help me? So far I have discovered the unix command to convert any given date and time to a unix timestamp:
date -d "2011/11/25 10:00:00" "+%s"
1322215200
I have no experience with bash scripting; could anyone get me started?
Examples of my columns and rows:
Columns: Date, Time,
Row 1: 25/10/2011, 10:54:36,
Row 2: 25/10/2011, 11:15:17,
Row 3: 26/10/2011, 01:04:39,
Thanks so much in advance!
You don't provide an excerpt from your CSV file, so I'm using this one:
[foo.csv]
2011/11/25;12:00:00
2010/11/25;13:00:00
2009/11/25;19:00:00
Here's one way to solve your problem:
$ cat foo.csv | while read line ; do echo $line\;$(date -d "${line//;/ }" "+%s") ; done
2011/11/25;12:00:00;1322218800
2010/11/25;13:00:00;1290686400
2009/11/25;19:00:00;1259172000
(EDIT: Removed an uneccessary variable.)
(EDIT2: Altered the date command so the script actually works.)
this should do the job:
awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}' yourCSV.csv
note
You didn't give an example file, and you mentioned CSV, so I assume that the column separator in your file is a comma.
test
kent$ echo "2011/11/25, 10:00:00"|awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}'
2011/11/25, 10:00:00, 1322211600
Now two improvements:
First: no need for cat foo.csv; just stream it via < foo.csv into the while loop.
Second: no need for echo & tr to create the date string format; just use bash's internal pattern substitution and do it in place:
while read line ; do echo ${line}\;$(date -d "${line//;/ }" +'%s'); done < foo.csv
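One caveat for the question's actual data: the rows use DD/MM/YYYY with comma separators, and GNU date reads 25/10/2011 as month 25, which fails. A sketch that rearranges the fields first (assumes GNU date; TZ is pinned to UTC so the output is deterministic):

```shell
#!/bin/sh
# One sample row from the question: DD/MM/YYYY, HH:MM:SS,
line="25/10/2011, 10:54:36,"

d=${line%%,*}               # date part: 25/10/2011
t=${line#*, }; t=${t%%,*}   # time part: 10:54:36

# Rearrange DD/MM/YYYY into the unambiguous YYYY-MM-DD:
day=${d%%/*}
year=${d##*/}
month=${d#*/}; month=${month%/*}

ts=$(TZ=UTC date -d "$year-$month-$day $t" +%s)
echo "$line $ts"
# → 25/10/2011, 10:54:36, 1319540076
```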
