How to invoke a shell script from Hive

Could someone please explain how to invoke a shell script from Hive? I explored this and found that we have to use the source FILE command to invoke a shell script from Hive, but I am not sure exactly how to call my shell script from Hive using source FILE. Can someone help me with this? Thanks in advance.

Use ! <command>, which executes a shell command from the Hive shell.
test_1.sh:
#!/bin/sh
echo "This message is from $0 file"
hive-test.hql:
! echo showing databases...;
show databases;
! echo showing tables...;
show tables;
! echo running shell script...;
! /home/cloudera/test_1.sh;
test_1.sh must be executable (chmod +x /home/cloudera/test_1.sh) because it is called directly by its path.
Output:
$ hive -v -f hive-test.hql
showing databases...
show databases
OK
default
retail_edw
sqoop_import
Time taken: 0.997 seconds, Fetched: 3 row(s)
showing tables...
show tables
OK
scala_departments
scaladepartments
stack
stackover_hive
Time taken: 0.062 seconds, Fetched: 4 row(s)
running shell script...
This message is from /home/cloudera/test_1.sh file

To invoke a shell script through the Hive CLI, use either of the following:
!sh file.sh;
or
!./file.sh;
See the Hive Interactive Shell Commands section at the link below for more information.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
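The two forms above differ in how the script is run. The sketch below is plain bash run outside Hive, and it assumes the command after ! behaves the same way inside the CLI: "sh file.sh" has sh read the file, so read permission is enough, while "./file.sh" executes the file directly, so it needs the execute bit and is resolved relative to the current working directory. The file name file.sh and the temporary directory are illustrative only.

```shell
#!/bin/bash
# Sketch (plain bash, outside Hive): compare "sh file.sh" with "./file.sh".
# "sh file.sh" has sh read the file, so only read permission is needed.
# "./file.sh" executes the file directly, so it needs the execute bit and
# is looked up relative to the current working directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a small script like test_1.sh, without the execute bit.
printf '#!/bin/sh\necho "This message is from $0 file"\n' > file.sh

sh_result=$(sh file.sh)

if ./file.sh 2>/dev/null; then
  direct_before_chmod=ran
else
  direct_before_chmod=failed   # typically "Permission denied"
fi

chmod +x file.sh
direct_result=$(./file.sh)

echo "sh file.sh               -> $sh_result"
echo "./file.sh (before chmod) -> $direct_before_chmod"
echo "./file.sh (after chmod)  -> $direct_result"

# Return to the previous directory and remove the demo files.
cd - > /dev/null && rm -rf "$demo_dir"
```

Inside Hive, a relative path such as ./file.sh presumably depends on the directory the CLI was started from, so an absolute path like the /home/cloudera/test_1.sh example is the less fragile choice.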

I don't know whether this suits your case, but you can invert the problem: launch the Hive commands from the bash shell and combine them with the query results there. You can even write a single bash script that mixes your Hive queries with bash commands:
#!/bin/bash
hive -e 'SELECT count(*) from table' > temp.txt
cat temp.txt
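A variation on the script above keeps the query result in a shell variable instead of a temporary file. This is a minimal sketch: HIVE_CMD is a hypothetical override added so the hive binary can be replaced by a stub, and the functions run_query and query_value are illustrative names, not part of Hive.

```shell
#!/bin/bash
# Sketch: keep a Hive query result in a shell variable instead of temp.txt.
# HIVE_CMD is a hypothetical override so the hive binary can be replaced
# (for example, by a stub); it defaults to "hive".
HIVE_CMD=${HIVE_CMD:-hive}

# Run one HiveQL statement in silent mode (-S), which suppresses Hive's
# informational messages so they do not end up in the captured text.
run_query() {
  "$HIVE_CMD" -S -e "$1"
}

# Print the first line of the result; fail if hive fails or prints nothing.
query_value() {
  local out
  out=$(run_query "$1") || return 1
  [ -n "$out" ] || return 1
  printf '%s\n' "${out%%$'\n'*}"
}
```

For example, row_count=$(query_value 'SELECT count(*) FROM my_table;') stores the count, where my_table is a placeholder table name; checking the exit status of query_value tells you whether the query produced a value at all.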

Related

setFilterIfMissing not working in HBase shell script

I have an HBase shell query that works correctly in PowerShell using hbase shell:
scan 'table', {FILTER=>"SingleColumnValueFilter('ids','author_ids','=','regexstring:.*USER123.*',true,true)"}
I'm trying to rewrite it as a shell script that can be run from the command line in the background using &, but setFilterIfMissing is not working: records where the specific column is missing also appear in the output, and I'm not sure why.
echo scan "'table', {FILTER=>org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(Bytes.toBytes('ids'),Bytes.toBytes('author_ids'),org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'),org.apache.hadoop.hbase.filter.RegexStringComparator.new('.*USER123.*')).setFilterIfMissing(true)}" | hbase shell -n > output.out
Your help will be much appreciated.
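One possible cause, not confirmed in the question: in the HBase Java API, SingleColumnValueFilter.setFilterIfMissing returns void, so chaining it onto .new(...) hands nil to FILTER and the scan may run without any filter. A workaround sketch is to send the filter-language string (the form that works interactively) through a quoted heredoc, so the shell passes every quote through unchanged. The table, column family, qualifier and regex come from the question; the compare operator is written unquoted as in the HBase filter-language documentation; HBASE_CMD and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: run the filtered scan non-interactively via "hbase shell -n".
# HBASE_CMD is a hypothetical override (defaults to "hbase") so the command
# can be stubbed; table/column names and the regex come from the question.
HBASE_CMD=${HBASE_CMD:-hbase}

# Print the scan command. The quoted heredoc delimiter ('EOF') stops the shell
# from expanding or stripping anything inside it. The last two filter
# arguments are filterIfColumnMissing=true and latest_version=true.
scan_command() {
  cat <<'EOF'
scan 'table', {FILTER => "SingleColumnValueFilter('ids', 'author_ids', =, 'regexstring:.*USER123.*', true, true)"}
EOF
}

# Feed the command to the HBase shell and write the results to the given file.
run_scan() {
  scan_command | "$HBASE_CMD" shell -n > "$1"
}
```

Usage: run_scan output.out & runs the scan in the background, as in the question.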

I do not want my Bash script to stop if a Hive command fails

I have a bash script that sends many HiveQL commands to Hive. The problem is that I do not want it to stop if one of these commands fails. I tried the usual Bash command:
set +e
but it does not work (the script stops running if one of the Hive commands fails). Do you know where the problem is? Is it an option in my Hive config or something?
Thank you!
EDIT: I use the Hive shell, doing something like this:
#Send my command to hive ...
hive -S -e "\"$MyCommand\""
#... but I want my script continue running if the command fails :-).
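A minimal sketch, assuming each statement is sent through hive -S -e as in the EDIT: checking each exit status explicitly with || lets the script report a failure and move on, and a command on the left side of || does not trigger an exit under set -e. HIVE_CMD and run_all are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch: send several HiveQL statements one at a time and keep going when one fails.
# HIVE_CMD is a hypothetical override (defaults to "hive") so hive can be stubbed.
HIVE_CMD=${HIVE_CMD:-hive}

# Run every statement passed as an argument. A failed statement is reported on
# stderr and counted, but the loop continues. Because the hive call sits on
# the left of "||", a failure there does not make the script exit even when
# "set -e" is active.
run_all() {
  local cmd failures=0
  for cmd in "$@"; do
    "$HIVE_CMD" -S -e "$cmd" || {
      echo "Hive command failed with status $?: $cmd" >&2
      failures=$((failures + 1))
    }
  done
  echo "$failures of $# Hive command(s) failed" >&2
  # A non-zero status tells the caller that at least one statement failed.
  [ "$failures" -eq 0 ]
}
```

Usage: run_all "show databases;" "show tables;" || echo "some statements failed". The trailing || matters if the script uses set -e, because run_all itself returns a non-zero status when anything failed.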

Unable to exit Hive

I've just installed Hive on my Ubuntu machine (14.04). When I run hive in the terminal, it prints Logging initialized using configuration in jar:file:/home/nkhl/Documents/apachehive/lib/hive-common-1.2.1.jar!/hive-log4j.properties, which is fine, I guess. Then the Hive shell opens. I haven't learned Hive (yet), so when I run quit to leave the shell, it does nothing.
Here's the version of Hive I am on:
Hive 1.2.1
Subversion git://localhost.localdomain/home/sush/dev/hive.git -r 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
From source with checksum ab480aca41b24a9c3751b8c023338231
I have to close the terminal to quit the shell. Please help!
Thanks in advance.
I guess you forgot the semicolon at the end of quit.
Use quit or exit to leave the interactive shell, as shown below. Notice the semicolon (;):
hive> quit;
OR
hive> exit;
There are three commands for exiting the Hive shell:
1. hive> exit;
2. hive> quit;
3. !exit
HiveServer2 can be reached from Beeline, JDBC/ODBC, and the Thrift API. When you are using the Beeline shell, the first two commands will not work, so use !exit to leave Beeline.
Do not add a semicolon after !exit.
You should also be able to use Ctrl+C to exit.
This is the right way to quit or exit from a Hive session:
hive> quit;
or
hive> exit;
Note the ;
You can quit using Ctrl+C or quit; at the Hive shell prompt.
That should work.
Use Ctrl+C to exit Hive
or hive> exit;
If you type exit without the ';', Ctrl+C won't work; in that case, quit the shell by closing the terminal directly.

Need to pass a variable from a shell action to Oozie using Hive

All,
I'm looking to pass a variable from a shell action back to Oozie. I am running commands such as this in my script:
#!/bin/sh
evalDate="hive -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalBaais)
echo "evaldate=$evalPartition"
The catch is that it is a Hive command inside the shell script.
Then I use this to read the value in Oozie:
${wf:actionData('getPartitions')['evaldate']}
But it returns a blank value every time! When I run those commands in my shell they seem to work, but under Oozie they do not. Likewise, if I run the commands on the other nodes of the cluster, they run fine as well. Any ideas?
The issue was my cluster configuration. When running as the oozie user, I had write-permission issues on /tmp/yarn, so I changed the command to:
baais="export HADOOP_USER_NAME=functionalid; hive yarn -hiveconf hive.execution.engine=mr -e 'select max(cast(create_date as int)) from db.table;'"
With this, Hive lets me run as yarn.
The solution to your problem is to use the -S switch in the hive command for silent output (see below).
Also, what is "evalBaais"? You probably need to replace it with "evalDate". So your code should look like this:
#!/bin/sh
evalDate="hive -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalDate)
echo "evaldate=$evalPartition"
Now you should be able to capture the output.
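A sketch of the same idea without storing the command in a string and calling eval: the script runs hive directly, trims whitespace from the result, and prints it as a key=value line, which is the Java properties format that Oozie reads when the shell action declares the capture-output element and the workflow uses ${wf:actionData('getPartitions')['evaldate']}. The query is the one from the question; HIVE_CMD, get_max_date and print_eval_date are hypothetical names.

```shell
#!/bin/sh
# Sketch: emit a Hive query result as "evaldate=<value>" for an Oozie shell
# action that declares <capture-output/>.
# HIVE_CMD is a hypothetical override (defaults to "hive") for stubbing.
HIVE_CMD=${HIVE_CMD:-hive}

get_max_date() {
  # -S keeps informational messages out of the captured output.
  "$HIVE_CMD" -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'
}

print_eval_date() {
  eval_partition=$(get_max_date) || return 1
  # Remove spaces, tabs and newlines so the property value is clean.
  eval_partition=$(printf '%s' "$eval_partition" | tr -d '[:space:]')
  # An empty value would show up as a blank in Oozie, so treat it as an error.
  [ -n "$eval_partition" ] || return 1
  echo "evaldate=$eval_partition"
}
```

Calling print_eval_date as the last step of the script means a failed or empty query makes the action fail visibly instead of passing a blank value to the workflow.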

How to call Pig scripts from a shell script sequentially

I have a sequence of Pig scripts in a file, and I want to execute them sequentially from a shell script.
For example:
sh script.sh /it/provider/file_name PIGddl.txt
Suppose PIGddl.txt contains Pig scripts such as:
Record count
Null validation, etc.
If all the Pig queries are in one file, how do I execute them from a shell script?
The idea below works, but if you need a conditional flow (for example, if script 1 succeeds run script 2, otherwise run script 3), you may want to use Oozie to run and schedule the jobs.
#!/bin/sh
x=1
while [ $x -le 3 ]
do
echo "pig_dcnt$x.pig will be run"
pig -f /home/Scripts/PigScripts/pig_dcnt$x.pig --param timestamp=$timestamp1
x=$(( $x + 1 ))
done
I haven't tested this, but I'm fairly sure it will work.
Let's assume you have two Pig files that you want to run from a shell script. You would write a shell script file with the following:
#!/bin/bash
pig
exec pig_script_file1.pig
exec pig_script_file2.pig
When you run this shell script, it first starts pig, enters the Grunt shell, and there executes your Pig files in the order you listed them.
Update:
The above solution doesn't work. Please refer to the one below, which is tested.
Update your script file with the following so that it runs your Pig files in the order you have defined:
#!/bin/bash
pig pig_script_file1.pig
pig pig_script_file2.pig
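For the conditional flow mentioned in the first answer (if script 1 succeeds run script 2, otherwise run script 3), plain shell exit codes may be enough before reaching for Oozie. A minimal sketch, assuming pig exits with a non-zero status when a script fails; the pig_dcnt<N>.pig names, the /home/Scripts/PigScripts directory and the timestamp1 variable follow the first answer's loop, while PIG_CMD, SCRIPT_DIR, run_pig and run_flow are hypothetical.

```shell
#!/bin/sh
# Sketch: "if script 1 succeeds run script 2, otherwise run script 3",
# relying on pig's exit status. PIG_CMD and SCRIPT_DIR are hypothetical
# overrides; script names and timestamp1 follow the earlier loop.
PIG_CMD=${PIG_CMD:-pig}
SCRIPT_DIR=${SCRIPT_DIR:-/home/Scripts/PigScripts}

# Run pig_dcnt<N>.pig and return pig's exit status.
run_pig() {
  echo "pig_dcnt$1.pig will be run"
  "$PIG_CMD" -f "$SCRIPT_DIR/pig_dcnt$1.pig" --param timestamp="$timestamp1"
}

run_flow() {
  if run_pig 1; then
    run_pig 2
  else
    echo "pig_dcnt1.pig failed; running pig_dcnt3.pig instead" >&2
    run_pig 3
  fi
}
```

For anything beyond a couple of branches, retries, or scheduling, the Oozie route suggested in the first answer is still the better fit.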
Here is what you have to do:
1. Keep the xxx.pig file at some location.
2. To execute this Pig script from the shell, use the command below, adding one -p name=value for each argument you need to pass (omit -p if there are none):
pig -p xx=date -p xyz=value -f /path/xxx.pig
-f executes the Pig statements in the given .pig file.
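A sketch of the command above wrapped in a shell function, passing today's date as one parameter. Each -p name=value becomes a parameter the Pig script can reference as $name. The parameter names run_date and input_path, the function run_with_params, and PIG_CMD are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: call a Pig script with parameters from a shell script.
# run_date and input_path are hypothetical parameter names; inside the .pig
# file they would be referenced as $run_date and $input_path.
# PIG_CMD is a hypothetical override (defaults to "pig") for stubbing.
PIG_CMD=${PIG_CMD:-pig}

run_with_params() {
  pig_file=$1
  input_path=$2
  run_date=$(date +%Y-%m-%d)
  # Each -p sets one parameter; -f names the script file to execute.
  "$PIG_CMD" -p run_date="$run_date" -p input_path="$input_path" -f "$pig_file"
}
```

Usage: run_with_params /path/xxx.pig /data/input, where both paths are placeholders. Quoting each value keeps the shell from splitting values that contain spaces.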
