Run Pig in an Oozie shell action - hadoop

I have created a simple Pig script which loads 10 records and stores them in a table.
When I invoke this Pig file (stored in HDFS) using an Oozie shell action, I get an error as follows:
>>> Invoking Shell command line now >>
Exit code of the Shell command 5
<<< Invocation of Shell command completed <<<
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have put the shell file in the lib folder of the workspace and added all the required jar files to the same lib folder. Please help me solve this issue.

I solved this issue with the following steps:
1) Created a workflow in Hue with a Pig action to invoke the Pig script.
2) Generated the workflow.xml file by clicking the run button.
3) Ran the workflow.xml from the command line through a shell wrapper script which iterates over dates and passes them as input parameters.
job.properties file:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=<jobtracker>
nameNode=<nameNode>
oozie.wf.application.path=/user/hue/oozie/workspaces/hue-oozie-1470122057.79/workflow.xml
Shell wrapper file:
#!/bin/bash
for date in 20160101 20160102 20160103; do
  oozie job -oozie http://<serverip>:11000/oozie -Ddate=$date -config job.properties -run
done
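For reference, the Pig action that Hue generates in workflow.xml for step 1 looks roughly like the following sketch (the script name myscript.pig and the date parameter name are illustrative assumptions, not taken from the original workflow):

<workflow-app name="pig-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>myscript.pig</script>
            <param>date=${date}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

The <param> element is what forwards the -Ddate value from the wrapper loop into the Pig script, where it can be referenced as $date.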

Related

Running a Hive script in a shell script via an Oozie shell action

I have a shell script "test.sh" as below:
#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : "$UDR_START_DT
hive -f tst_tab.hql
The above shell script is saved at the following HDFS path:
/scripts/Linux/test.sh
The tst_tab.hql file contains a simple CREATE TABLE statement, as I am just testing to get Hive working. This Hive file is saved in the My Documents folder in Hue (the same folder where my workflow is saved).
I have created an Oozie workflow that calls test.sh in a shell action.
The issue I am facing:
The above shell script runs successfully through line 3 (the echo),
but when I add line 4 (hive -f tst_tab.hql), it generates the error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I verified YARN logs too. Nothing helpful there.
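One detail worth checking in this setup: the shell action executes test.sh on an arbitrary cluster node, and hive -f tst_tab.hql can only work if tst_tab.hql has been shipped to that node alongside the script. A minimal sketch of a shell action that ships both files via <file> elements (the HDFS path to the .hql file is a placeholder, and a Hue-generated workflow.xml would differ in detail):

<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>test.sh</exec>
        <file>/scripts/Linux/test.sh#test.sh</file>
        <file>/user/yourname/tst_tab.hql#tst_tab.hql</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>

With both files shipped, hive -f tst_tab.hql finds the .hql file in the action's working directory on whichever node runs it.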

How to write Sqoop jobs in a shell script and run them sequentially?

I need to run a set of Sqoop jobs one after another inside a shell script. How can I achieve this? By default, it runs all the jobs in parallel, which causes performance to take a hit. Should I remove the "-m" parameter and run?
The -m parameter controls the number of map tasks (and hence the parallelism) of a single Sqoop command; it has no effect across the separate commands that you issue.
So removing the -m parameter will not solve the problem.
First, write a shell script file with your Sqoop commands:
#!/bin/bash
sqoop_command_1
sqoop_command_2
sqoop_command_3
Save the above commands under a name like sqoop_jobs.sh.
Then make the file executable:
chmod +x sqoop_jobs.sh
Now you can execute the shell file from your terminal:
./sqoop_jobs.sh
I hope this helps.
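As a concrete illustration (the connection string, table names, and target directories below are hypothetical), each command in such a script starts only after the previous one finishes, and adding set -e makes the script stop at the first failure instead of continuing:

#!/bin/bash
set -e  # abort the script if any Sqoop command exits with a non-zero status

# these imports run strictly one after another
sqoop import --connect jdbc:mysql://dbhost/salesdb --table orders --target-dir /data/orders -m 4
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers --target-dir /data/customers -m 4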

Handling fs (Hadoop shell) command errors in a Pig script

I have a Pig script with a couple of statements.
Sample script:
register x.jar;
fs -rmr <file-path>;
A = LOAD 'X' AS (uuid:chararray, value:chararray);
I'm invoking the fs shell command to delete a file on HDFS:
fs -rmr <file-path>
This should delete the path if it is present and continue otherwise.
If the file/directory is not present, the script exits and throws an error: No such file or directory.
I run it using the following command:
pig -f filename.pig -param parameter1=value
"-f" is stopping it forcefully.
If I avoid "-f", I get the below error:
2015-02-02 02:50:15,388 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : parameter1
How can I continue irrespective of this error?
If you want to use parameter substitution in a Pig script, you need to reference the parameters in the script with the $ sign:
fs -rmr $parameter1;
If you call the script with -param parameter1=value, Pig will replace $parameter1 with value.
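For example, a parameterized version of the script from the question (the parameter name inputpath is illustrative):

register x.jar;
fs -rmr $inputpath;
A = LOAD '$inputpath' AS (uuid:chararray, value:chararray);

invoked as pig -param inputpath=/some/hdfs/path -f filename.pig. As for continuing when the path does not exist: if your Pig version supports the grunt-style rmf command, replacing fs -rmr $inputpath; with rmf $inputpath; removes the path without raising an error when it is absent.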
If you need to provide parameter values while executing the Pig script, the syntax is:
pig -param parameter1=value filename.pig
Try it and let me know if you run into any issues.

Command in .bashrc file is not executed correctly when submitting a PBS job

I have a bash script to submit a job, which looks like:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
#PBS -N xxxxx
However, after I submitted my job, I got the following error message in the xxxxx.e8980 file:
/home/xxxxx/.bashrc: line 1: /etc/ini.modules: No such file or directory
But the file /etc/ini.modules is there. Why can't the system find it?
Thank you very much!
The file exists on the login node where you submitted the job, but your .bashrc runs on a compute node, and /etc is local to each machine, so /etc/ini.modules may simply not be installed there. When referencing files in a job that will be submitted to a cluster, you must either force the job to the specific node(s) that have the file or make sure the file is present on all compute nodes in the cluster.
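If the file is genuinely installed only on some machines, one defensive option is to guard the line in ~/.bashrc so it is skipped on nodes where the file is absent (a sketch; adjust the path to whatever your .bashrc actually sources):

# source the modules init file only on nodes where it exists
[ -f /etc/ini.modules ] && source /etc/ini.modules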

Trigger Oozie from a shell script

I am trying to run a shell script which submits an Oozie job, and to trigger this shell script from crontab. Oozie is not getting triggered.
The shell script myshell.sh contains:
#!/bin/bash
oozie job -run -config $1
The crontab entry:
*/5 * * * * /path/myshell.sh example.properties
Is there something I need to set in my environment, or am I missing something?
Thanks
It looks like you're missing the -oozie argument, which specifies the Oozie API URL:
oozie job -oozie http://ooziehost:11000/oozie -run -config $1
You could also set the OOZIE_URL environment variable; note that it must be exported for the oozie client to pick it up:
#!/bin/bash
export OOZIE_URL=http://ooziehost:11000/oozie
oozie job -run -config $1
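Two cron-specific pitfalls are also worth ruling out, because cron starts jobs in your home directory with a minimal environment: the oozie binary may not be on cron's PATH, and the relative path example.properties may not resolve. A more defensive crontab entry would use absolute paths and capture output for debugging (paths are placeholders):

*/5 * * * * /path/myshell.sh /path/example.properties >> /tmp/myshell_cron.log 2>&1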
