running hive script in a shell script via oozie shell action - shell

I have shell script " test.sh " as below
#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : "$UDR_START_DT
hive -f tst_tab.hql
the above shell script is saved in a folder in hadoop
/scripts/Linux/test.sh
the tst_tab.hql contains a simple create table statement, as I am just testing to get the hive working. This hive file is saved in the My documents folder in hue (same folder where my workflow is saved)
I have created an Oozie workflow that calls test.sh in a shell action.
Issue I am facing:
the above shell script runs successfully until line 3.
but when I add line 4 (hive -f tst_tab.hql), it generates the error
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I verified YARN logs too. Nothing helpful there.

Related

how to invoke shell script in hive

Could someone please explain me how to invoke a shell script from hive?. I explored on this and found that we have to use source FILE command to invoke a shell script from hive. But I am not sure how exactly I can call my shell script from hive using source File command. So can someone help me on this? Thanks in Advance.
using ! <command> - Executes a shell command from the Hive shell.
test_1.sh:
#!/bin/sh
echo "This massage is from $0 file"
hive-test.hql:
! echo showing databases... ;
show databases;
! echo showing tables...;
show tables;
! echo runing shell script...;
! /home/cloudera/test_1.sh
output:
$ hive -v -f hive-test.hql
showing databases...
show databases
OK
default
retail_edw
sqoop_import
Time taken: 0.997 seconds, Fetched: 3 row(s)
showing tables...
show tables
OK
scala_departments
scaladepartments
stack
stackover_hive
Time taken: 0.062 seconds, Fetched: 4 row(s)
runing shell script...
This massage is from /home/cloudera/test_1.sh file
To invoke a shell script through HIVE CLI, please look at the example below.
!sh file.sh;
or
!./file.sh;
Please go though Hive Interactive Shell Commands section in the link below for more information.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
Don't know if it would suit you, but you can inverse your problem by launching the hive commands from the bash shell in combination with the hive queries results. You can even create a single bash script for this to combine your hive queries with bash commands in a single script:
#!/bin/bash
hive -e 'SELECT count(*) from table' > temp.txt
cat temp.txt

Run pig in oozie shell action

I have created a simple pigscript which loads 10 records and stores in a table.
When I invoke this pig(stored in HDFS) file using oozie Shell action,I get and error as follows:
>>> Invoking Shell command line now >>
Exit code of the Shell command 5
<<< Invocation of Shell command completed <<<
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have put the shell file in the lib folder in the workspace and added all the required jar files in the same lib folder. Please help me to solve this issue.
I solved this this issue by the following steps:
1)Created a workflow in hue placing a pig action to invoke pigscript.
2)Generated the workflow.xml file by clicking the run button.
3)Ran the workflow.xml through commandline by adding a shell wrapper class which iterates and gives dates as input parameters.
JOB.PROPERTIES file:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=<jobtracker>
nameNode=<nameNode>
oozie.wf.application.path = /user/hue/oozie/workspaces/hue-oozie-1470122057.79/workflow.xml
shell file:
for date in 20160101 20160102 20160103
oozie job -oozie http://<serverip>:11000/oozie -config job.properties run

Need to pass Variable from Shell Action to Oozie Shell using Hive

All,
Looking to pass variable from shell action to the oozie shell. I am running commands such as this, in my script:
#!/bin/sh
evalDate="hive -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalBaais)
echo "evaldate=$evalPartition"
Trick being that it is a hive command in the shell.
Then I am running this to get it in oozie:
${wf:actionData('getPartitions')['evaldate']}
But it pulls a blank every time! I can run those commands in my shell fine and it seems to work but oozie does not. Likewise, if I run the commands on the other boxes of the cluster, they run fine as well. Any ideas?
The issue was configuration regarding to my cluster. When I ran as oozie user, I had write permission issues to /tmp/yarn. With that, I changed the command to run as:
baais="export HADOOP_USER_NAME=functionalid; hive yarn -hiveconf hive.execution.engine=mr -e 'select max(cast(create_date as int)) from db.table;'"
Where hive allows me to run as yarn.
The solution to your problem is to use "-S" switch in hive command for silent output. (see below)
Also, what is "evalBaais"? You might need to replace this with "evalDate". So your code should look like this -
#!/bin/sh
evalDate="hive -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalDate)
echo "evaldate=$evalPartition"
Now you should be able to capture the out.

how to call Pig scripts from shell script sequentially

I have squence of Pig scripts in a file and I want to execute it from Shell script
which execute pig scripts sqeuenciatly.
For Ex:
sh script.sh /it/provider/file_name PIGddl.txt
Suppose PIGddl.txt has Pig scripts like
Record Count
Null validation e.t.c
If all the Pig queries are in one file then how to execute the pig scripts from Shell scripts?
below idea works ,but if you want sequential process like if 1 execute then execute 2 else execute 3 kind of flow,you may go with Oozie for running and scheduling the jobs.
#!/bin/sh
x=1
while [ $x -le 3 ]
do
echo "pig_dcnt$x.pig will be run"
pig -f /home/Scripts/PigScripts/pig_dcnt$x.pig --param timestamp=$timestamp1
x=$(( $x + 1 ))
done
I haven't tested this but I'm pretty sure this will work fine.
Lets assume you have two pig files which you want to run using shell script then you would write a shell script file with following:
#!/bin/bash
pig
exec pig_script_file1.pig
exec pig_script_file2.pig
so when you run this shell script, initially it will execute pig command and goes into grunt shell and there it will execute your pig files in the order that you have mentioned
Update:
The above solution doesn't work. Please refer the below one which is
tested
Update your script file with the following so that it can run your pig files in the order that you have defined
#!/bin/bash
pig pig_script_file1.pig
pig pig_script_file2.pig
Here is what you have to do
1. Keep xxx.pig file at some location #
2. to execute this pig script from shell use the below command
pig -p xx=date(if you have some arguments to pass) -p xyz=value(if there is another arguments to be passed) -f /path/xxx.pig
-f is used to execute the pig lines of code from .pig file.

cront not working with hadoop command in shell script

I'm trying to schedule a cronjob using crontab to execute a shell script which executes a list of hadoop commands sequentially, but when i look at the hadoop folder the folders are not created or dropped. The hadoop connectivity on our cluster is pretty slow. so these hadoop command might take sometime to execute due to number of retries.
Cron expression
*/5 * * * * sh /test1/a/bin/ice.sh >> /test1/a/run.log
shell script
#!/bin/sh
if [ $# == 1 ]
then
TODAY=$1
else
TODAY=`/bin/date +%m%d%Y%H%M%S`
fi
# define seed folder here
#filelist = "ls /test1/a/seeds/"
#for file in $filelist
for file in `/bin/ls /test1/a/seeds/`
do
echo $file
echo $TODAY
INBOUND="hadoop fs -put /test1/a/seeds/$file /apps/hdmi-set/inbound/$file.$TODAY/$file"
echo $INBOUND
$INBOUND
SEEDDONE="hadoop fs -put /test1/a/seedDone /apps/hdmi-set/inbound/$file.$TODAY/seedDone"
echo $SEEDDONE
$SEEDDONE
done
echo "hadoop Inbound folders created for job1 ..."
Since there are no output that has been captured that could be used to debug the output, I can only speculate.
But from my past experience, one of the common reason hadoop jobs fail when they are spawned through scripts is that HADOOP_HOME is not available when these commands are executed.
Usually that is not the case when working directly from the terminal. Try adding the following to both ".bashrc" and ".bash_profile" or ".profile":
export HADOOP_HOME=/usr/lib/hadoop
You may have to change the path based on your specific installation.
And yes as comment says, don't just redirect standard output but error too in the file.

Resources