I want to create some output files using a shell script as part of my project. The script works fine when run independently, but it fails with the error below when I try to integrate it with an Oozie workflow:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Sample workflow:
<action name="scr-filecreation">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${SimpleFileCreation}</exec>
<argument>${filedir}</argument>
<file>${SimpleFileCreation}#${SimpleFileCreation}</file>
<capture-output/>
</shell>
<ok to="income-success" />
<error to="income-failure" />
</action>
Below is the simple script, fileCreate.sh, which creates a file with a few lines:
#!/bin/bash
# Write the target directory name, then two line counts, into file.txt.
echo $1 > $1/file.txt
count_val1=$(cat dir1/* | wc -l)
count_val2=$(cat dir2/* | wc -l)
echo $count_val1 >> $1/file.txt
echo $count_val2 >> $1/file.txt
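A hedged debugging sketch, not from the original post: a shell action runs inside a YARN container on an arbitrary cluster node, so relative paths like dir1/dir2 resolve against the container's working directory, and a local directory named in ${filedir} may not exist on that node. Tracing the script and failing fast usually surfaces the broken step in the launcher's stderr:
#!/bin/bash
# Trace each command and stop at the first failure so the Oozie launcher
# log shows exactly which step breaks on the cluster node.
set -ex
# Assumption: $1 must be a directory that exists locally on whichever
# node runs the action; create it to be safe.
mkdir -p "$1"
echo "$1" > "$1/file.txt"
# dir1 and dir2 are resolved in the container's working directory, not in
# the directory where the script was tested by hand.
count_val1=$(cat dir1/* | wc -l)
count_val2=$(cat dir2/* | wc -l)
echo "$count_val1" >> "$1/file.txt"
echo "$count_val2" >> "$1/file.txt"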
I'm trying to make a script that checks whether any file is missing in an HDFS path. The idea is to include it in an Oozie workflow so that when a file is not found, the workflow fails and does not continue with the flow.
ALTRAFI="/input/ALTrafi*.txt"
ALTDATOS="/ALTDatos*.txt"
ALTARVA="/ALTarva*.txt"
TRAFCIER="/TrafCier*.txt"
if hdfs dfs -test -e $ALTRAFI; then
echo "[$ALTRAFI] Archive not found"
exit 1
fi
if hdfs dfs -test -e $ALTDATOS; then
echo "[$ALTDATOS] Archive not found"
exit 2
fi
if hdfs dfs -test -e $ALTARVA; then
echo "[$ALTARVA] Archive not found"
exit 3
fi
if hdfs dfs -test -e $TRAFCIER; then
echo "[$TRAFCIER] Archive not found"
exit 4
fi
But the script does not fail when a file is missing, and the Oozie workflow continues.
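A likely explanation, offered as a hedged reading rather than a confirmed answer: hdfs dfs -test -e exits 0 when the path exists, so each if branch fires when the files ARE present; when a file is missing, the branch is skipped and the script ends with exit status 0, which Oozie treats as success. Negating the test makes the script exit non-zero exactly when a file is missing, which fails the shell action and takes the error transition. A minimal sketch for the first check:
#!/bin/bash
ALTRAFI="/input/ALTrafi*.txt"
# -test -e exits 0 if the pattern matches an existing path; negate it so a
# MISSING file produces the non-zero exit that fails the action.
if ! hdfs dfs -test -e $ALTRAFI; then
  echo "[$ALTRAFI] Archive not found"
  exit 1
fi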
Oozie flow:
<start to="ValidateFiles"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="ValidateFiles">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>/hdp/apps/2.5.0.0-1245/tez/tez.tar.gz</value>
</property>
</configuration>
<exec>/produccion/apps/traficotasado/ValidateFiles.sh</exec>
<file>/produccion/apps/traficotasado/ValidateFiles.sh#ValidateFiles.sh</file> <!--Copy the executable to compute node's current working directory -->
</shell>
<ok to="CopyFiles"/>
<error to="Kill"/>
</action>
<action name="CopyFiles">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>/hdp/apps/2.5.0.0-1245/tez/tez.tar.gz</value>
</property>
</configuration>
<exec>/produccion/apps/traficotasado/CopyFiles.sh</exec>
<file>/produccion/apps/traficotasado/CopyFiles.sh#CopyFiles.sh</file> <!--Copy the executable to compute node's current working directory -->
</shell>
<ok to="DepuraFilesStage"/>
<error to="Kill"/>
</action>
Thanks for the help
If you want to know how I solved it, go here.
My workflow looks like this.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="${PROJECT_NAME}">
<global>
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>${REDUCE_TASKS}</value>
</property>
<property>
<name>today_without_dash</name>
<value>${today_without_dash}</value>
</property>
<property>
<name>yesterday_with_dash</name>
<value>${yesterday_with_dash}</value>
</property>
</configuration>
</global>
<start to="start_fair_usage"/>
<action name="start_fair_usage">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<exec>${copy_file}</exec>
<argument>${today_without_dash}</argument>
<argument>${mta}</argument>
<!-- <file>${path}#${start_fair_usage}</file> -->
<file>${path}${copy_file}#${copy_file}</file>
<capture-output/>
</shell>
<ok to="END"/>
<error to="KILL"/>
</action>
copy_file.sh
# directories in comverse where sub mtr rcr files are kept
echo "directories"
dirs=(
/user/comverse/data/${today_without_dash}_B
)
# clear the hdfs directory of old files and copy new files from comverse
echo "remove old files "${mta}
hadoop fs -rm -skipTrash /apps/hive/warehouse/db.db/fair_usage/fct_evkuzmin/file_${mta}/*
for i in $(hadoop fs -ls "${dirs[@]}" | egrep "${mta}\.gz" | awk -F " " '{print $8}')
do
hadoop fs -cp $i /apps/hive/warehouse/db.db/fair_usage/fct_evkuzmin/file_${mta}
echo "copy file - "${i}
done
echo "end copy "${mta}" files"
${today_without_dash} comes from the coordinator:
<property>
<name>today_without_dash</name>
<value>${coord:formatTime(coord:nominalTime(),'yyyyMMdd')}</value>
</property>
There are no errors, but it does not work either, and I don't understand why.
There are some files in the
/apps/hive/warehouse/db.db/fair_usage/fct_evkuzmin/file_${mta}
directory, but they don't get deleted. Nothing gets copied there either. There are files in this directory:
/user/comverse/data/${today_without_dash}_B
${mta} comes from job.properties; its value is mta.
copy_file.sh works when I start it through the CLI, without Oozie.
EDIT
I asked around and was told that this might happen because the files I try to copy are too big.
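Another thing worth ruling out, an observation based on the workflow above rather than on the thread: the action passes ${today_without_dash} and ${mta} to the script as <argument> values, but copy_file.sh references them as shell variables, and Oozie does not export workflow parameters as environment variables on the node, so inside the script they may expand to nothing. A hedged sketch that reads the positional arguments at the top of copy_file.sh:
#!/bin/bash
# Assumption: argument order matches the workflow, i.e.
# <argument>${today_without_dash}</argument> then <argument>${mta}</argument>.
today_without_dash="$1"
mta="$2"
echo "running with today_without_dash=${today_without_dash} mta=${mta}"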
As part of an Oozie batch, i.e. in my workflow, I have scheduled one shell action; it contains a number of shell commands that process the data in a customized form.
Suppose an exception/error occurs in one of the commands during processing: is it possible to handle exceptions/errors on the Oozie side?
OR
How can I handle this scenario?
I want to handle it on the Oozie side. Please help me with this.
Thanks in advance. :)
I tried the code below but I am getting an error.
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>java</exec>
<argument>${nameNode}/user/cloudera/shelltest.sh</argument>
<file>${nameNode}/user/cloudera/shelltest.sh#shelltest.sh</file>
<capture-output/>
</shell>
<ok to="check-output"/>
<error to="fail"/>
</action>
<decision name="check-output">
<switch>
<case to="1">
${wf:actionData("shell1")["error.on"]}
</case>
<default to="fail"/>
</switch>
</decision>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
My shelltest.sh:
#!/bin/sh
cp /Volumes/Documents/criticalfile.txt /Volumes/BackUp/.
if [ "$?" != "0" ]; then
echo "[Error] copy failed!" 1>&2
exit 1
fi
When I run the Oozie job, I get the error below:
Error: E0701 : E0701: XML schema error, cvc-pattern-valid: Value '1' is not facet-valid with respect to pattern '([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}' for type 'IDENTIFIER'
Please let me know if I missed anything.
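A hedged reading of that E0701, not a confirmed answer: the to attribute of a <case> must name a workflow node, and node names must match the IDENTIFIER pattern quoted in the error, which cannot begin with a digit, so to="1" is rejected at schema validation. Below is a sketch of the decision node with validly named targets; routing to the existing end and fail nodes is an assumption about the intended flow, and error.on would only have a value if shelltest.sh printed an error.on=... pair on stdout for <capture-output/> to collect:
<decision name="check-output">
    <switch>
        <!-- transition targets must be node names, never bare numbers -->
        <case to="end">${wf:actionData('shell1')['error.on'] eq 'false'}</case>
        <default to="fail"/>
    </switch>
</decision>
Separately, <exec>java</exec> with a shell script as its argument looks unintended; the other workflows in this thread use <exec>script.sh</exec> to run the script directly.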
Hi, I am running Oozie with a shell script. In that shell script I am using SparkR jobs. Whenever I run the Oozie job, I get an error with the library.
Here is my error:
Stdoutput Running /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark/bin/spark-submit --class edu.berkeley.cs.amplab.sparkr.SparkRRunner --files pi.R --master yarn-client /SparkR-pkg/lib/SparkR/sparkr-assembly-0.1.jar pi.R yarn-client 4
Stdoutput Error in library(SparkR) : there is no package called ‘SparkR’
Stdoutput Execution halted
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
My job.properties file:
nameNode=hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020
jobTracker=ip-172-31-41-199.us-west-2.compute.internal:8032
queueName=default
oozie.libpath=hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020/SparkR-pkg/lib/
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=shell_example
oozie.wf.application.path=${oozieProjectRoot}/apps/shell
My workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.1" name="Test">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>script.sh</exec>
<file>oozie-oozi/script.sh#script.sh</file>
<file>/user/karun/examples/pi.R</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Incorrect output</message>
</kill>
<end name="end"/>
</workflow-app>
My shell script file:
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark
export YARN_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_CMD=/usr/bin/hadoop
/SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client pi.R yarn-client 4
I don't know how to resolve the issue. Any help will be appreciated.
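One plausible cause, offered as an assumption rather than a confirmed diagnosis: the shell action may land on a node where R cannot find the SparkR package because its installation directory is not on R's library path. Since the script already calls /SparkR-pkg/lib/SparkR/sparkR-submit, pointing R_LIBS at the parent directory is a minimal sketch worth trying, assuming /SparkR-pkg/lib exists locally on every node:
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark
export YARN_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_CMD=/usr/bin/hadoop
# Assumption: the SparkR R package directory lives under /SparkR-pkg/lib on
# every node; R_LIBS tells R where library(SparkR) should look.
export R_LIBS=/SparkR-pkg/lib
/SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client pi.R yarn-client 4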
I have an Oozie workflow that contains a shell action that invokes a Python script that is failing with the following error.
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
The Python script (hello.py) is simple.
print("Hello, World!")
Here is my Oozie workflow.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
<start to="shell-check-hour"/>
<action name="shell-check-hour">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hello.py</exec>
<file>hdfs://localhost:8020/user/test/hello.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Can someone see anything wrong with what I am doing? If I replace the Python script with a shell script, the shell script executes fine (both files are in the same directory). This leads me to believe that the problem is that for whatever reason, Python isn't being recognised by Oozie.
Add a hash-bang (shebang) line to your script. For example, my script starts with:
#!/usr/bin/env python
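With that line added, the complete hello.py from this question is just:
#!/usr/bin/env python
print("Hello, World!")
The likely mechanism, as an inference rather than something confirmed here: without a shebang, the node's default shell tries to interpret the file itself, fails on the Python syntax, and exits non-zero, which surfaces as the ShellMain exit code 1.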