Oozie shell action - running Hive from a shell script

Based on a condition being true, I am executing hive -e in a shell script, and it works fine. When I put this script in a shell action in Oozie and run it, I get a scriptName.sh: line 42: hive: command not found error.
I tried passing <env-var>PATH=/usr/lib/hive</env-var> in the shell action, but I guess I am making some mistake there, because I get the same error: scriptName.sh: line 42: hive: command not found
Edited:
I used which hive in the shell script. Its output is not consistent; I get two variations:
1. /usr/bin/hive, along with a "Delegation token can be issued only with kerberos or web authentication" Java IOException.
2. which: hive not in {.:/sbin:/usr/bin:/usr/sbin:...}

Ok, finally I figured it out. It might be a trivial thing for shell experts, but it can help someone starting out.
1. hive: command not found - It was not a classpath issue; it was a shell issue. The environment I am running in is a Korn shell (run echo $SHELL to find out), but the Hive script (/usr/lib/hive/bin/hive.sh) is a Bash script. So I changed the shebang in my script to #!/bin/bash and it worked.
2. Delegation Token can only be issued with kerberos or web authentication - In my Hive script I added SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}. HADOOP_TOKEN_FILE_LOCATION is a variable that holds the location of the job token. This token needs to be passed to authenticate access to HDFS data (in my case, an HDFS read through a Hive SELECT query) on a secure cluster. See the Hadoop documentation on delegation tokens for more.
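Putting both fixes together, a minimal sketch of the corrected script (the table name is a placeholder, not from the thread):

#!/bin/bash
# Bash shebang, per fix 1: the launcher's default shell was ksh,
# while the hive wrapper script is Bash.
# Fix 2: the shell expands ${HADOOP_TOKEN_FILE_LOCATION} (set by Oozie)
# before Hive runs, passing the job token needed on a secure cluster.
hive -e "
SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION};
SELECT count(*) FROM some_db.some_table;
"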

Obviously, you are missing shell environment variables.
To confirm it, run export in the shell called by Oozie and inspect the output.
If you call the shell from Oozie, a simple fix is to invoke it as /bin/bash -l your_script, so that a login environment is loaded.
PS: PATH is a list of directories, so you need to append ${HIVE_HOME}/bin to your PATH, not ${HIVE_HOME}/bin/hive.
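For example, a minimal sketch of the top of the script (assuming Hive is installed under /usr/lib/hive):

#!/bin/bash
# Append the directory that contains the hive binary, not the binary itself.
export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}   # assumption: adjust to your install
export PATH="${PATH}:${HIVE_HOME}/bin"
which hive    # should now print /usr/lib/hive/bin/hive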

Related

Proper syntax for bash script line

Writing a script to retrieve various environment parameters back from a list of servers. My script returns no value when run, but the same command returns the desired value outside of a script.
I have tried a couple of variations to retrieve the same data. One of the commands fails because of restrictions placed on the accounts I have access to. The second command works, but only if executed in an elevated mode.
This fails with access denied (pwdx is restricted):
dzdo pgrep -f /some/path | xargs pwdx
This works outside of a script but returns no value within a script:
dzdo /bin/readlink -e /proc/"$(pgrep -f /some/path)"/cwd
When using "bash -x" to execute my script, I see the "readlink" output is blank.
Ideally, I would like to return the PID and path of the running process, as the "pgrep" command does, but I can work with the path alone as returned by the "readlink" version. The end goal is to gather the information from several servers for audit purposes (version, etc.).
Am I using the wrong syntax for the "readlink" command? I'm fairly new to writing bash scripts, so I appreciate any guidance on what changes when a command runs in a script vs. on the command line.
If pwdx is the restricted program, you need to run that with dzdo, not pgrep:
pgrep -f /some/path | dzdo xargs pwdx
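To also get the PID alongside the path, one option is a loop along these lines (a sketch; /some/path is the placeholder from the question, and it assumes your dzdo policy permits readlink):

#!/bin/bash
# Looping keeps each /proc/<pid>/cwd lookup separate; if pgrep matches
# several PIDs, a single command substitution would mangle the path and
# readlink would print nothing, which may be why the script came back blank.
for pid in $(pgrep -f /some/path); do
    cwd=$(dzdo /bin/readlink -e "/proc/${pid}/cwd")
    echo "${pid} ${cwd}"
done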

sqlplus does not execute the query if it is called by an external SSH connection

I have a script on a Unix server which looks like this:
mainScript.sh
#some stuff here
emailScript.sh $PARAM_1 $PARAM_2
#some other stuff here
As you can see, mainScript.sh is calling another script called emailScript.sh.
emailScript.sh is supposed to run a query via sqlplus, parse the results, and email them if there are any.
The interesting part of the code in emailScript.sh is this:
DB_SERVER=$1
USERNAME=$2
PASSWORD=$3
EVENT_DATE=$4
LIST_AUTHORIZED_USERS=$5
ENVID=$6
INTERESTED_PARTY=$7
RAW_LIST=$(echo "select distinct M_OS_USER from MX_USER_CONNECTION_DBF where M_EVENT_DATE >= to_date('$EVENT_DATE','DD-MM-YYYY') and M_OS_USER is not null and M_OS_USER not in $LIST_AUTHORIZED_USERS;" | sqlplus -s $USERNAME/$PASSWORD@$DB_SERVER)
As you can see, all I do is create the variable RAW_LIST by executing a query with sqlplus.
The problem is the following:
If I call the script mainScript.sh from the command line (PuTTY / KiTTY), the sqlplus command works fine and returns something.
If I call mainScript.sh via an external job (an SSH connection opened on the server by a Jenkins job), sqlplus returns nothing and takes 0 seconds, meaning it doesn't even try to execute.
In order to debug, I've printed all the variables and the query itself to check whether something wasn't properly set: everything is correctly set.
It really seems that the sqlplus command is not recognized, or something like that.
Would you have any idea how I can debug this? Where should I look for the issue?
There are a few things to consider here. While you are running the script by hand, which directory are you executing it from? And when your external application runs it, which directory is it executing from? It is better to use the full path to the script, like /path/to/the/script/script.sh, or to cd /path/to/the/script/ first and then execute the script.
Also check the execute permission for your application. You, as a user, might have permission to execute the script or the sql command, but your application might not. Check the user id your application runs as and add it to the proper group.
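In the same spirit, a non-interactive SSH session (such as the one Jenkins opens) usually does not source your login profile, so sqlplus may simply not be on PATH. A sketch of making the environment explicit at the top of the script (the ORACLE_HOME value is an assumption to adjust for your install):

#!/bin/bash
# Non-interactive SSH skips ~/.profile, so set the Oracle environment here.
export ORACLE_HOME=/opt/oracle/client       # assumption: your client install path
export PATH="${ORACLE_HOME}/bin:${PATH}"
command -v sqlplus >/dev/null || { echo "sqlplus not found" >&2; exit 1; }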

I do not want my Bash script to stop if a Hive command fails

I have a bash script sending a lot of HiveQL commands to hive. The problem is that I do not want it to stop if one of these commands fails. I tried the usual Bash command:
set +e
but it does not work (the script stops running if one of the Hive commands fails). Do you know where the problem is? An option in my Hive config, or something :-)?
Thank you!
EDIT: I use the Hive shell, doing something like this:
#Send my command to hive ...
hive -S -e "\"$MyCommand\""
#... but I want my script continue running if the command fails :-).
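Not an answer from the thread, but a common shell-level pattern is to absorb each command's failure explicitly, which keeps the script going regardless of set -e/+e:

#Send my command to hive, but never let a failure abort the script
hive -S -e "$MyCommand" || echo "Hive command failed, continuing..." >&2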

Need to pass a variable from a shell action to the Oozie workflow using Hive

All,
Looking to pass a variable from the shell action back to the Oozie workflow. I am running commands such as this in my script:
#!/bin/sh
evalDate="hive -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalBaais)
echo "evaldate=$evalPartition"
The trick being that it is a Hive command inside the shell script.
Then I am running this to read the value in Oozie:
${wf:actionData('getPartitions')['evaldate']}
But it pulls a blank every time! I can run those commands in my shell fine and it seems to work, but Oozie does not. Likewise, if I run the commands on the other boxes of the cluster, they run fine as well. Any ideas?
The issue was a configuration problem with my cluster. When I ran as the oozie user, I had write-permission issues on /tmp/yarn. Because of that, I changed the command to run as:
baais="export HADOOP_USER_NAME=functionalid; hive yarn -hiveconf hive.execution.engine=mr -e 'select max(cast(create_date as int)) from db.table;'"
Where hive allows me to run as yarn.
The solution to your problem is to use the "-S" switch in the hive command for silent output (see below).
Also, what is "evalBaais"? You probably need to replace it with "evalDate". So your code should look like this:
#!/bin/sh
evalDate="hive -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalDate)
echo "evaldate=$evalPartition"
Now you should be able to capture the output.

How to write sqoop jobs in a shell script and run them sequentially?

I need to run a set of sqoop jobs one after another inside a shell script. How can I achieve that? By default, it runs all the jobs in parallel, which results in performance taking a hit. Should I remove the "-m" parameter and run?
The -m parameter is used to run multiple map-only tasks within a single sqoop command, not across all the commands that you issue, so removing the -m parameter will not solve the problem.
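For illustration (connection string and table are placeholders, not from the thread), -m only tunes parallelism inside one import:

# -m/--num-mappers sets how many mappers THIS import uses; it has no
# effect on how separate sqoop commands are sequenced.
sqoop import --connect jdbc:mysql://dbhost/salesdb --table orders -m 1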
First, you need to write a shell script file with your sqoop commands:
#!/bin/bash
sqoop_command_1
sqoop_command_2
sqoop_command_3
Save the above script with a name like sqoop_jobs.sh.
Then give execute permission on the shell file:
chmod 777 sqoop_jobs.sh
Now you can run your shell file by issuing the following command in your terminal:
./sqoop_jobs.sh
I hope this helps.
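For concreteness, a fleshed-out sketch of such a script (connection strings, tables, and target directories are placeholders, not from the question):

#!/bin/bash
# Each sqoop command blocks until it finishes, so the imports run
# sequentially by construction; no extra synchronization is needed.
# Credentials are omitted; add --username/--password-file as required.
set -e    # optional: abort the whole sequence if any import fails

sqoop import --connect jdbc:mysql://dbhost/salesdb --table orders \
    --target-dir /data/orders
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
    --target-dir /data/customers
sqoop import --connect jdbc:mysql://dbhost/salesdb --table products \
    --target-dir /data/products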
