I have multiple HQL files; below is one example.
They are located at: /home/ganesh/CopyJobs/hql/
insert into XYZ.exttbl_form_data PARTITION (load_date="$proc_date") select FORM_DATA_ID,FORM_ID,USER_ID,INTERACTIONS_ID,SUBMISSION_DATETIME,FILEDS from PQR.exttbl_form_data where load_date="$proc_date"
In the main script I'm reading the above-mentioned HQL files as follows:
export proc_date=2018-05-07
while read line
do
export hql=`cat /home/ganesh/CopyJobs/hql/$table_name.hql`
export hql_final=$(`eval echo"$hql"`)
echo "Final HQL: $hql_final"
hive -e "$hql_final;"
done < /home/ganesh/CopyJobs/config/tables.txt
where tables.txt contains the list of all HQL files.
I want $proc_date to be resolved, but that is not happening.
Use Hive variable substitution (hiveconf variables). I have fixed your script a little.
HQL file should look like this:
insert into XYZ.exttbl_form_data PARTITION (load_date='${hiveconf:proc_date}')
select FORM_DATA_ID,FORM_ID,USER_ID,INTERACTIONS_ID,SUBMISSION_DATETIME,FILEDS
from PQR.exttbl_form_data where load_date='${hiveconf:proc_date}'
${hiveconf:proc_date} is a variable whose value is passed to Hive on the command line.
The main script:
proc_date=2018-05-07
echo "proc_date is $proc_date"
while read line
do
hql_file=/home/ganesh/CopyJobs/hql/"$line".hql
echo "current hql_file is $hql_file"
hive -hiveconf proc_date="$proc_date" -f "$hql_file"
done < /home/ganesh/CopyJobs/config/tables.txt
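A slightly more defensive variant of the same loop, as a sketch only. The function name run_copy_jobs and the HIVE_BIN override are my own assumptions (HIVE_BIN just lets you dry-run the loop with echo); the hive options are the same ones used above. Redirecting stdin from /dev/null keeps hive from swallowing the rest of tables.txt.

```shell
# Hypothetical helper: run one HQL file per table name listed in a file.
# HIVE_BIN is an assumed override so the loop can be dry-run with "echo".
run_copy_jobs() {
    proc_date=$1   # e.g. 2018-05-07
    hql_dir=$2     # e.g. /home/ganesh/CopyJobs/hql
    tables=$3      # e.g. /home/ganesh/CopyJobs/config/tables.txt
    while IFS= read -r table || [ -n "$table" ]; do
        [ -n "$table" ] || continue            # skip blank lines
        hql_file="$hql_dir/$table.hql"
        if [ ! -f "$hql_file" ]; then
            echo "missing HQL file: $hql_file" >&2
            continue
        fi
        # stdin comes from /dev/null so the command cannot consume the table list
        ${HIVE_BIN:-hive} -hiveconf proc_date="$proc_date" -f "$hql_file" </dev/null
    done < "$tables"
}
```

For a dry run, something like HIVE_BIN=echo run_copy_jobs 2018-05-07 /home/ganesh/CopyJobs/hql /home/ganesh/CopyJobs/config/tables.txt prints the arguments hive would receive instead of running anything.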
I have a requirement that can be framed in two ways:
I need to pass an argument from a shell script to a Hive script,
OR
within one shell script, I need to include a variable's value in a Hive statement.
I'll explain both with an example:
1) Passing an argument from a shell script to HiveQL:
My test Hive QL:
select count(*) from demodb.demo_table limit ${hiveconf:num}
My test shell script:
cnt=1
sh -c 'hive -hiveconf num=$cnt -f countTable.hql'
So basically I want to include the value of 'cnt' in the HQL, which is not happening in this case. I get the error as:
FAILED: ParseException line 2:0 mismatched input '<EOF>' expecting Number near 'limit' in limit clause
I'm sure the error means that the variable's value isn't getting passed on.
2) Passing the argument directly within the shell script:
cnt=1
hive -e 'select count(*) from demodb.demo_table limit $cnt'
In both the above cases, I couldn't pass the argument value. Any ideas??
PS: I know that including 'limit' in a count query seems absurd, but I have simplified the problem I actually have. The requirement of passing the argument remains the same.
Any ideas, anyone?
Thanks in advance.
Set the variable this way:
#!/bin/bash
cnt=3
echo "Executing the hive query - starts"
hive -hiveconf num=$cnt -e ' set num; select * from demodb.demo_table limit ${hiveconf:num}'
echo "Executing the hive query - ends"
This works, if put in a file named hivetest.sh, then invoked with sh hivetest.sh:
cnt=2
hive -e "select * from demodb.demo_table limit $cnt"
You are using single quotes instead of double.
Using double quotes for OPTION #1 also works fine.
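To see the difference without running Hive at all, here is a small sketch that only prints what the shell would hand over. The variable names single and double are mine, for illustration.

```shell
cnt=2

# Single quotes: the shell passes $cnt through literally.
single='select * from demodb.demo_table limit $cnt'
# Double quotes: the shell substitutes the value of cnt before hive sees it.
double="select * from demodb.demo_table limit $cnt"

printf '%s\n' "$single" "$double"
```

The same reasoning explains why the -hiveconf variant keeps ${hiveconf:num} in single quotes: there you want the text to reach Hive untouched, so that Hive does the substitution rather than the shell.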
hadoop@osboxes:~$ export val=2;
hadoop@osboxes:~$ hive -e "select * from bms.bms1 where max_seq=$val";
or
vi test.sh
#########
export val=2
hive -e "select * from bms.bms1 where max_seq=$val";
#####################
Try this
cnt=1
hive -hiveconf number=$cnt -e 'select * from demodb.demo_table limit ${hiveconf:number}'
I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All this is part of a bash shell script. How to form the query so as to get the desired result?
Hive should provide command line support for you. I am not familiar with hive but I found this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli, you can check whether that works.
Personally, I have used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"
I used the method shown here and got it! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
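For reference, running the query directly and keeping its output usually comes down to command substitution. A minimal sketch, assuming the hive CLI is on the PATH; the helper name hive_scalar is made up for illustration, while -S (silent mode) and -e are standard Hive CLI options.

```shell
# Hypothetical helper: print the result of a single-value Hive query.
# -S (silent mode) keeps progress messages out of the captured output.
hive_scalar() {
    hive -S -e "$1"
}

# Usage (table and column names are the placeholders from the question):
#   var=$(hive_scalar "select col1 from table limit 1")
#   var_to_be_used_later=$var
```

If the query can return several rows, the variable will hold all of them separated by newlines, so it is worth limiting the query to the one value you need.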
I'm trying to make use of the DESCRIBE function via Hive to output the column descriptions of each of the tables out to individual files. I've discovered the -f option so I can just read from a file and write the output back out:
hive -f nameOfSqlQueryFile.sql > out.txt
However, if I open the output file, it throws all the descriptions back to back and it's unclear where one description starts for a table and where it ends.
So, I've tried making a batch file that uses -e to describe each of the tables individually and output to a file:
#!/bin/bash
nameArr=( $(hive -e 'show tables;') )
count=0
for i in "${nameArr[@]}"
do
echo 'Working on table('$count'): '$i
hive -e 'describe '$i > $i'_.txt';
count=$(($count+1))
done
However, because this needs to reconnect for each query, it's remarkably slow, taking hours to process several hundred queries.
Does anyone have an idea of how else I might run each of these DESCRIBE functions, and ideally output to separate files?
You can probably use one of these, depending on how you process the output:
Just use the OK line as a separator and search for it using a script.
Use DESCRIBE EXTENDED which adds a line at the end with info on the table, including its location, which can be used to extract the table name (using sed, for example)
If you're just using the output file as a manual reference, insert a SQL statement that prints a separator of your choice between each table, e.g.:
DESCRIBE table;
SELECT '-----------------' FROM table LIMIT 1;
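Combining the separator idea with a single Hive session could look roughly like the sketch below. It is untested against a real cluster and rests on assumptions: both function names and the "### " marker are my own, and the marker query relies on a Hive version that accepts SELECT without a FROM clause.

```shell
# Hypothetical helpers; the "### " marker prefix is an arbitrary choice.

# Write one marker SELECT plus one DESCRIBE per table name read from stdin.
make_describe_script() {
    while IFS= read -r table; do
        [ -n "$table" ] || continue
        printf "SELECT '### %s';\nDESCRIBE %s;\n" "$table" "$table"
    done
}

# Split combined output on marker lines into <dir>/<table>_.txt files.
split_describe_output() {
    awk -v dir="$1" '
        /^### / { if (out) close(out); out = dir "/" $2 "_.txt"; next }
        out     { print > out }
    '
}

# Usage (needs a working hive CLI):
#   hive -S -e 'show tables;' | make_describe_script > describe_all.hql
#   hive -S -f describe_all.hql | split_describe_output .
```

This starts hive twice in total instead of once per table, which is where the original script loses most of its time.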
Currently I am able to use the below command:
hive -f hive-job.hql -hiveconf city='CA' -hiveconf country='US'
Here I am passing only 2 variable values, but I have around 15 to 20 values that I need to pass through -hiveconf. These values are stored in a properties/text file.
Is there a way to read the file through -hiveconf?
There is no direct way to load a properties file into Hive variables, but I know of two approaches that might help:
1.) Keep all the variables in a hive-job-variables.hql file as
set x=1;
set y=2;
...
Then load this file at the top of the main file (the one you run with hive -f hive-job.hql) using the Hive CLI source command:
source hive-job-variables.hql;
select ... from ..
...
2.) Use Java code to read the property file, convert the property values to Hive variable format, and use a Hive JDBC connection to connect to HiveServer and run your queries in the order you want.
For your requirement I would suggest the second option.
Hope it helps!
You can do this using shell tools pretty easily.
Assuming your properties file is in typical "key=val" format, e.g.
a=1
b=some_value
c=foo
Then you can do:
sed 's/^/-hiveconf\n/g' my_properties_file | xargs hive -f hive-job.hql
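If the properties file may contain blank lines or # comments, or if your sed does not turn \n in the replacement into a newline, a plain shell loop is an alternative. This is a sketch under those assumptions; run_with_hiveconf is a made-up name, and values containing spaces are passed through as a single argument.

```shell
# Hypothetical helper: run a command with one -hiveconf option
# per key=value line of a properties file.
run_with_hiveconf() {
    props_file=$1
    shift                          # the rest is the command, e.g. hive -f hive-job.hql
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        set -- "$@" -hiveconf "$line"
    done < "$props_file"
    "$@"
}

# Usage:
#   run_with_hiveconf my_properties_file hive -f hive-job.hql
```

Replacing hive with echo in the usage line shows the final command line without executing it.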
I'm trying to write a script that lists a directory and creates an SQL script to insert the directories it finds. The problem is that I only want to insert new directories. Here is what I have so far:
#If file doesn't exist add the search path test
if [ ! -e /home/aydin/movies.sql ]
then
echo "SET SEARCH_PATH TO noti_test;" >> /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
#for each directory escape any single quotes
movie=$(echo $i | sed "s:':\\\':g" )
#build sql insert string
insertString="INSERT INTO movies (movie) VALUES (E'$movie');";
#if sql string exists in file already
if grep -Fxq "$insertString" /home/aydin/movies.sql
then
#comment out string
sed -i "s/$insertString/--$insertString/g" /home/aydin/movies.sql
else
#add sql string
echo $insertString >> /home/aydin/movies.sql;
fi
done;
#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;
It seems to work apart from one thing, the script doesn't recognise entries with single quotes in them, so upon running the script again with no new dirs, this is what the file looks like:
--INSERT INTO movies (movie) VALUES (E'007, Moonraker (1979)');
--INSERT INTO movies (movie) VALUES (E'007, Octopussy (1983)');
INSERT INTO movies (movie) VALUES (E'007, On Her Majesty\'s Secret Service (1969)');
I'm open to suggestions on a better way to do this also, my process seems pretty elongated and inefficient :)
Script looks generally good to me. Consider the revised version (untested):
#! /bin/bash
#If file doesn't exist add the search path test
if [ ! -e /home/aydin/movies.sql ]
then
echo 'SET search_path=noti_test;' > /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
#build sql insert string - single quotes work fine inside dollar-quoting
insertString="INSERT INTO movies (movie) SELECT \$x\$$i\$x\$
WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = \$x\$$i\$x\$);"
#no need for grep. SQL is self-contained.
echo "$insertString" >> /home/aydin/movies.sql
done
#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;
To start a new file, use > instead of >>
Use single quotes ' for string constants without variables to expand
Use PostgreSQL dollar-quoting so you don't have to worry about single-quotes in the strings. You'll have to escape the $ character in the shell to remove its special meaning in the shell.
Use an "impossible" string for the dollar-quote, so it cannot appear in the string. If you don't have one, you can test for the quote-string and alter it in the unlikely case it should be matched, to be absolutely sure.
Use SELECT .. WHERE NOT EXISTS for the INSERT to automatically prevent existing entries from being re-inserted. This prevents duplicate entries in the table completely, not just among the new entries.
An index on movies.movie (possibly, but not necessarily UNIQUE) would speed up the INSERTs.
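The point about testing for the quote string can be sketched in a few lines of shell. The function name dollar_quote is hypothetical; it lengthens the tag until "$tag" no longer occurs anywhere in the value, which also rules out "$tag$" and the case where the value ends in "$tag".

```shell
# Hypothetical helper: wrap a value in PostgreSQL dollar quotes,
# choosing a tag that cannot clash with the value itself.
dollar_quote() {
    value=$1
    tag=x
    while :; do
        case $value in
            *"\$$tag"*) tag="${tag}x" ;;   # "$tag" occurs in the value: try a longer tag
            *) break ;;
        esac
    done
    printf '$%s$%s$%s$' "$tag" "$value" "$tag"
}

# Usage:
#   quoted=$(dollar_quote "$i")
#   echo "INSERT INTO movies (movie) SELECT $quoted WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = $quoted);"
```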
Why bother with grep and sed and not just let the database detect duplicates?
Add a unique index on movie, create a new (temporary) insert script on each run, and then execute it with autocommit (the default) or with psql's -v ON_ERROR_ROLLBACK=1 option. To get a full insert script of your movie database, dump it with pg_dump's --column-inserts option.
Hope this helps.
There's a utility daemon called incron, which will fire your script whenever a file is written in a watched directory. It uses kernel events, no loops. It is Linux only.
In its config (full file path):
/media/htpc IN_CLOSE_WRITE /home/aydin/adder.sh $@/$#
Then the simplest adder.sh script, without any parameter checks:
#!/bin/bash
cat <<-EOsql | psql -U "aydin.hassan" -d "aydin_1.0"
INSERT INTO movies (movie) VALUES (E'$1');
EOsql
You can have thousands of files in one directory without the issues you can face with your original script.
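One caveat with interpolating $1 straight into E'...' is that a name containing a single quote breaks the statement. A possible variant, sketched here and not tested against the setup above, passes the name as a psql variable and lets psql quote it with :'movie'. The function name add_movie is mine; the connection options are copied from the script above, and the NOT EXISTS guard is borrowed from the earlier answer.

```shell
# Hypothetical variant of adder.sh: psql quotes the value itself via :'movie'.
add_movie() {
    psql -U "aydin.hassan" -d "aydin_1.0" -v movie="$1" <<'EOsql'
INSERT INTO movies (movie)
SELECT :'movie'
WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = :'movie');
EOsql
}

# Usage inside adder.sh:
#   add_movie "$1"
```

The heredoc delimiter is quoted so the shell leaves the SQL alone; psql only interpolates variables in input read from stdin or a file, not in a -c string.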