Return status of a hive script - bash

I have two questions about capturing the return status/exit status of a hive script.
Capture the return status in a unix script
try2.hql
select from_unixtime(unix_timestamp(),'yyyy-MM-dd');
This is called in the shell script try1.sh
echo "Start of script"
hive -f try2.hql
echo "End of script"
Now, I need to capture the return status of try2.hql. How can I do this?
Control flow when multiple queries are available
There are a couple of hive queries in a script try3.hql
select stockname, stock_date from mystocks_stg;
select concat('Top10_Stocks_High_OP_',sdate,'_',srnk) as rowkey, sname, sdate, sprice, srnk from (
select stockname as sname, stock_date as sdate, stock_price_open as sprice,rank() over(order by stock_price_open desc) as srnk
from mystocks
where from_unixtime(unix_timestamp(stock_date,'yyyy-MM-dd'),'yyyyMMdd') = '${hiveconf:batch_date}') tab
where tab.srnk <= 10;
try3.hql is called in the script try4.sh by passing the relevant parameters.
My question: if there is an error in the first query of try3.hql, control must return to the shell script and abort the program, without executing the second query.
Please suggest.

For part 1 of your problem, you can change your script to exit with the status of hive:
echo "Start of script"
hive -f try2.hql; hive_status=$?
echo "End of script"
exit $hive_status
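A minimal end-to-end sketch of the same pattern, with a stub function standing in for `hive -f try2.hql` so it runs without a Hive installation (the stub and its status code 3 are invented for illustration):

```shell
#!/bin/sh
# Stand-in for `hive -f try2.hql`; swap in the real command in your script.
run_hive() { return 3; }

echo "Start of script"
run_hive
hive_status=$?        # capture immediately; any later command would overwrite $?
echo "End of script"
echo "hive exited with status $hive_status"
```

The key detail is capturing `$?` right after the hive invocation; even an intervening `echo` resets it.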

And I have a solution for part 2.
Note that the "hive" CLI is deprecated in favor of "beeline", according to the documentation:
HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline,
which is a JDBC client based on SQLLine. Due to new development being
focused on HiveServer2, Hive CLI will soon be deprecated in favor of
Beeline (HIVE-10511).
In beeline, by default, your script will stop as soon as there is an error in it. This is controlled by the "force" parameter.
--force=[true/false] continue running script even after errors
BTW, the solution provided by codeforester for part 1 still works with beeline.

echo "Start of script"
hive -f try2.hql
hive_status=$?
echo "End of script"
echo "$hive_status" >> "$HOME/exit_status.log"
In the home directory, you'll find the exit_status.log file created, in which you'll have the exit status of the script.
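Sketched out with a stub in place of the real `hive` call (the stub and the log path under `$TMPDIR` are assumptions for illustration; the answer above logs to `$HOME/exit_status.log`):

```shell
#!/bin/sh
# Stand-in for `hive -f try2.hql`; replace with the real invocation.
run_hive() { return 0; }

log="${TMPDIR:-/tmp}/exit_status.log"

echo "Start of script"
run_hive
hive_status=$?
echo "End of script"
echo "$hive_status" >> "$log"     # append this run's status to the log
tail -n 1 "$log"                  # the last line is the most recent status
```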

Related

how to use Big SQL commands to automate synchronization with Hive through a shell script?

I have written a small shell script to automate the Big SQL and HIVE synchronization.
Code is as below
echo "Login to BigSql"
<path to>/jsqsh bigsql --user=abc --password=pwd
echo "login successful"
echo "Syncing hive table <tbl_name> to Big SQL"
call syshadoop.hcat_sync_objects('DB_name','tbl_name','a','REPLACE','CONTINUE');
echo "Syncing hive table TRAINING_TRACKER to Big SQL Successfully"
Unfortunately, I am getting the message:
Login to BigSql
Welcome to JSqsh 4.8
Type "\help" for help topics. Using JLine.
And then it enters the Big SQL command prompt. Now when I type "quit" and press enter, it gives me the following messages:
login successful
Syncing hive table <tbl_name> to Big SQL
./script.sh: line 10: call syshadoop.hcat_sync_objects(DB_name,tbl_name,a,REPLACE,CONTINUE): command not found
What am I doing wrong?
You need to redirect your later commands into the jsqsh command's standard input. E.g. see this example:
You can start JSqsh and run the script at the same time with this command:
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql < /home/bigsql/mySQL.sql
from here https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.2/com.ibm.swg.im.bigsql.doc/doc/bsql_jsqsh.html
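The same stdin-redirection idea can be demonstrated with any interpreter; here `sh` stands in for `jsqsh`, and the file name and commands are invented for illustration:

```shell
#!/bin/sh
# The commands you would otherwise type at the interactive prompt go in a file...
cat > /tmp/batch_cmds.sh <<'EOF'
echo "step 1: sync objects"
echo "step 2: verify"
EOF

# ...and the file is fed to the interpreter's stdin, so it never shows a prompt.
# With jsqsh this becomes:  jsqsh bigsql < /home/bigsql/mySQL.sql
sh < /tmp/batch_cmds.sh
```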
There is already an auto-hcat sync job in Big SQL that does exactly what you're trying to do.
Check whether the job is running with:
su - bigsql (or whatever instance owner)
db2 connect to bigsql
db2 "select NAME, BEGIN_TIME, END_TIME, INVOCATION, STATUS from
SYSTOOLS.ADMIN_TASK_STATUS where BEGIN_TIME > (CURRENT TIMESTAMP - 60 minutes)
and name ='Synchronise MetaData Changes from Hive' "
If you don't see any output, simply enable it through Ambari:
Enable Automatic Metadata Sync

hive query inside shell script

I want to run a hive query inside a shell script, and I want the script to exit and raise an error if the hive query fails.
Right now, even if the hive query fails, the next steps still execute. Can someone help with this:
val=$(hive -e "
select col1 from table_name;")
(assuming the table has only one row)
echo "don't run if hive fails"
hive -e "select col1 from table_name"
if test $? -ne 0
then
    exit 1
fi
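Putting both pieces together: a sketch that captures the query result in a variable and aborts on failure. Here `run_query` is a stand-in for `hive -e "select col1 from table_name"`, and its output of 42 is invented for illustration:

```shell
#!/bin/sh
# Stand-in for: hive -e "select col1 from table_name"
run_query() { echo "42"; }

val=$(run_query)          # command substitution: $? is the command's exit status
status=$?
if [ "$status" -ne 0 ]; then
    echo "hive query failed with status $status" >&2
    exit 1
fi
echo "query succeeded, val=$val"
```

The exit status of an assignment with command substitution is the substituted command's status, so checking `$?` immediately after the assignment catches the failure.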

how to invoke shell script in hive

Could someone please explain how to invoke a shell script from hive? I explored this and found that we have to use the source FILE command to invoke a shell script from hive, but I am not sure how exactly to call my shell script using it. Can someone help me with this? Thanks in advance.
Use ! <command>, which executes a shell command from the Hive shell.
test_1.sh:
#!/bin/sh
echo "This message is from $0 file"
hive-test.hql:
! echo showing databases... ;
show databases;
! echo showing tables...;
show tables;
! echo running shell script...;
! /home/cloudera/test_1.sh
output:
$ hive -v -f hive-test.hql
showing databases...
show databases
OK
default
retail_edw
sqoop_import
Time taken: 0.997 seconds, Fetched: 3 row(s)
showing tables...
show tables
OK
scala_departments
scaladepartments
stack
stackover_hive
Time taken: 0.062 seconds, Fetched: 4 row(s)
running shell script...
This message is from /home/cloudera/test_1.sh file
To invoke a shell script through the Hive CLI, please look at the example below.
!sh file.sh;
or
!./file.sh;
Please go through the Hive Interactive Shell Commands section in the link below for more information.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
I don't know if it suits you, but you can invert your problem by launching the hive commands from the bash shell and working with the hive query results there. You can even create a single bash script that combines your hive queries with bash commands:
#!/bin/bash
hive -e 'SELECT count(*) from table' > temp.txt
cat temp.txt
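Extending that inverted approach, the count written to the temp file can drive bash logic directly. In this sketch an `echo` stands in for the `hive -e` call, and the count of 7 is invented for illustration:

```shell
#!/bin/bash
# Stand-in for: hive -e 'SELECT count(*) from table' > temp.txt
echo "7" > temp.txt

count=$(cat temp.txt)
if [ "$count" -gt 0 ]; then
    echo "table has $count rows"
else
    echo "table is empty"
fi
```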

Table not found exception when running hive query via an Oozie shell script

I'm trying to run a hive count query on a table from a shell action in an Oozie workflow, but I always get a "table not found" exception.
#!/bin/bash
COUNT=$(hive -S -e "SELECT COUNT(*) FROM <table_name> where <condition>;")
echo $COUNT
The idea is to get the count stored in a variable for further analysis. This works absolutely fine if I run it directly from a local file in the shell.
I can work around this by splitting it into 2 separate actions, where I first write the hive query result to a temp directory and then read the file in the bash script.
Any help appreciated. Thanks!
Fixed it. I had a user-permissions issue in accessing the table, and also had to add the following property to do the trick:
SET mapreduce.job.credentials.binary = ${HADOOP_TOKEN_FILE_LOCATION}

Write a report of what the shell would have done

In my UNIX shell script I need to pass a parameter when starting it. This parameter can take two values (test and production). Inside the code I make an insert into an Oracle db. After this insert I need a condition: if the parameter is test, write the spool to another file and don't connect to the db; otherwise connect to the db and perform the insert normally. Fundamentally there are two modes: in test I just want to see what the shell would do, while production performs the normal insert and its operations. I tried this after the insert but I get an error:
if [[ "$choice" = "test" ]];
then
${TMP_PART2DAT} > ${TMP_REPORT}
else
SP_SQLLOGIN="$ORACLE_DB_OWN/$ORACLE_PWD#$ORACLE_SID"
sqlplus -S -L ${SP_SQLLOGIN} #${TMP_PART2SQL}
fi
Any ideas?
Try running your shell script with "bash -x"; you will be able to trace the command execution.
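For example, bash's -x mode prints each expanded command (prefixed with "+") to stderr before executing it, which pinpoints the failing line. The traced script here is a throwaway created just for the demonstration:

```shell
#!/bin/sh
# A tiny throwaway script to trace; path and contents are illustrative.
cat > /tmp/trace_me.sh <<'EOF'
msg="hello"
echo "$msg"
EOF

# -x echoes each expanded command to stderr before running it.
bash -x /tmp/trace_me.sh 2> /tmp/trace.log
grep '^+' /tmp/trace.log
```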
Try
cat ${TMP_PART2DAT} > ${TMP_REPORT}
for line 3 of your script.
This will overwrite everything in TMP_REPORT with the contents of TMP_PART2DAT.
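One way to sketch the test/production switch itself, with a stub standing in for the real sqlplus insert (the function, file names, and messages are all invented for illustration):

```shell
#!/bin/sh
# Stand-in for: sqlplus -S -L "$SP_SQLLOGIN" with the insert script.
run_insert() { echo "insert executed against the database"; }

do_work() {
    choice="$1"
    if [ "$choice" = "test" ]; then
        # Dry run: record what would happen instead of touching the db.
        echo "DRY RUN: would run the sqlplus insert" >> /tmp/report.txt
    else
        run_insert
    fi
}

do_work test          # only writes to the report
do_work production    # performs the (stubbed) insert
cat /tmp/report.txt
```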
