Oozie - Run workflow by command line with configuration file in HDFS - hadoop

as a newbie with Oozie, I tried to run some tutorials by command line. My stepByStep:
upload my Oozie project (workflow xml file, job.properties file, jar and data) to HDFS via HUE interface. In my job.properties files, I've indicated every information like data name node, path to my application, ...
running via HUE interface, simply, I check on check box of workflow xml file and submit
I would like to run my Oozie project by command line:
with job.properties file in local, I run:
oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
How can I run my Oozie project instead of with the job.properties in local (instead of the config file in local, I want to run my job with the configuration file in HDFS)?
Thanks for any suggestion and feel free to comment if my question is not clear!

I don't know if there is a direct way, but you certainly could do something like
oozie job -oozie http://localhost:11000/oozie -config <(hdfs dfs -cat examples/apps/map-reduce/job.properties) -run

Related

oozie fails with Could not load db driver class: oracle.jdbc.OracleDriver

I am getting below error while executing sqoop export command(in shell script) with oozie.
"java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver"
sqoop export from cli(edge node) works fine.
I have added the ojdbc6.jar to below locations.
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/sqoop/lib/
(HDFS locations)
/user/oozie/share/lib/sqoop/ and
/user/oozie/share/lib/lib_20161215195933/sqoop
i have also set oozie.use.system.libpath=true in my oozie job.properties file
Please guide me if i am missing any setting.
log content
Thanks & Regards,
Sonali
Make sure that you upload a file to a directory /user/oozie/share/lib/sqoop (it could looks like /user/oozie/share/lib/lib_${timestamp}/sqoop for Cloudera and HDP).
Check if ojdbc6.jar file is correct - check if it contains OracleDriver.class and make sure size of the file is ok. It could be error while downloading.
Check permissions to ojdbc6.jar file (eventually, you can try to give 755 permissions to this file). Check who is the owner of the file - it should be oozie by default.
Update Oozie sharelib by execute below command (run this command on the host where Oozie Server is located):
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -sharelibupdate
Verify sharelib for sqoop:
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -shareliblist sqoop*
You can always restart Oozie service. It should update sharelib.
Create a directory named lib next to your workflow.xml in HDFS and put jars in there. Oozie will automatically make those jars available to all actions in that workflow.
Cloudera users should check this article. Especially paragraph 'One Last Thing'.

Unable to deploy Spark jobs using Oozie

I need to keep a spark job running 24/7 and for this I am using Oozie. To do this I have written a workflow.xml and job.properties files, containing the needful information to invoke it.
However when I try to send the oozie job using this:
oozie job –config /home/oozie/tst/job.properties -run
I get the following error message, which is very clear:
java.io.IOException: configuration is not specified
at org.apache.oozie.cli.OozieCLI.getConfiguration(OozieCLI.java:816)
at org.apache.oozie.cli.OozieCLI.jobCommand(OozieCLI.java:1055)
at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:686)
at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:639)
at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:225)
configuration is not specified
The problem here is that the configuration file (job.properties) exists locally on the path specified. I also PUT the directory containing both files and .jar in the HDFS.
Any idea why is this failing?
Is Oozie the best tool for this task I have?
The config parameter takes local path not HDFS. check job.properties present in /home/oozie/tst/job.properties
check job.properties contain oozie.wf.application.path=PATH_TO_HDFS_PATH_WHERE_WORKFLOW.XML_IS_PRESENT
Plus I see the dash(-) given in config parameter is different then dash(-) in run parameter
Specify the host in your command
oozie job --oozie http://your_host:11000/oozie -config /home/oozie/tst/job.properties -run
11000 is deafult port

error while running example of oozie job

I tried running my first oozie job by following a blog post.
I used oozie-examples.tar.gz, after extracting, placed examples in hdfs.
I tried running map-reduce job in it but unfortunately got an error.
Ran below command:
oozie job -oozie http://localhost:11000/oozie -config /examples/apps/map-reduce/job.properties -run
Got the error:
java.io.IOException: configuration is not specified at
org.apache.oozie.cli.OozieCLI.getConfiguration(OozieCLI.java:787) at
org.apache.oozie.cli.OozieCLI.jobCommand(OozieCLI.java:1026) at
org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:662) at
org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:615) at
org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:218) configuration is
not specified
I don't know which configuration it is asking for as I am using Cloudera VM and it has by default got all the configurations set in it.
oozie job -oozie http://localhost:11000/oozie -config /examples/apps/map-reduce/job.properties -run
The -config parameter takes an local path not an HDFS path. The workflow.xml needs to be present in the HDFS and path is defined in the job.properties file with the property:
oozie.wf.application.path=<path to the workflow.xml>

Running Spark Jobs via Oozie

Is it possible to run Spark Jobs e.g. Spark-sql jobs via Oozie?
In the past we have used Oozie with Hadoop. Since we are now using Spark-Sql on top of YARN, looking for a way to use Oozie to schedule jobs.
Thanks.
Yup its possible ... The procedure is also same, that you have to provide Oozia a directory structure having coordinator.xml, workflow.xml and a lib directory containing your Jar files.
But remember Oozie starts the job with java -cp command, not with spark-submit, so if you have to run it with Oozie, Here is a trick.
Run your jar with spark-submit in background.
Look for that process in process list. It will be running under java -cp command but with some additional Jars, that are added by spark-submit. Add those Jars in CLASS_PATH. and that's it. Now you can run your Spark applications through Oozie.
1. nohup spark-submit --class package.to.MainClass /path/to/App.jar &
2. ps aux | grep '/path/to/App.jar'
EDITED: You can also use latest Oozie, which has Spark Action also.
To run Spark SQL by Oozie you need to use Oozie Spark Action.
You can locate oozie.gz on your distribution. Usually in cloudera you can find this oozie examples directory at below path.
]$ locate oozie.gz
/usr/share/doc/oozie-4.1.0+cdh5.7.0+267/oozie-examples.tar.gz
Spark SQL need hive-site.xml file for execution which you need to provide in workflow.xml
< spark-opts>--file /hive-site.xml < /spark-opts>

How can I check Oozie logs

My coordinator failed with Error : E0301 invalid resource [filename]
when I do hadoop fs -ls [filename] the file is listed.
how can I debug what is wrong.
how can I check log files???
oozie job -log requires jobId. in my case i dont have job id. how can I see logs in that case. appreciate responses.
thank you
If you are looking for a command line way to do this, you can run the following:
oozie job -oozie http://localhost:11000 -info <wfid>
oozie job -oozie http://localhost:11000 -log <wfid>
If you have the $OOZIE_URL set, then you do not need the -oozie parm in the above statements. This first command will show you the status of the job and each action. The second command will dig into the oozie log and display the part in the log that pertains to the workflow id that was passed in.
cd /var/log/oozie/
ls
You should see the log file there.
I highly recommend using the oozie webconsole when new to oozie. If you are using Cloudera it's under "Enabling the Oozie Web Console" here http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_17_6.html for CDH4. CDH3 link is similar.
Also the jobid is printed when you submit the job.

Resources