Set hadoop user to launch spark-submit via oozie shell action - shell

I want to set the Hadoop user for spark-submit when launching an Oozie workflow via a shell action: the Oozie MR job (which launches the shell) should run as user A, but spark-submit (which is started from the shell script) should run as user B.
I tried setting user.name=A (in job.properties) together with 'export HADOOP_USER_NAME=B' (in the shell script), but it doesn't work unless A=B.
Can anyone help?
P.S. I'm using oozie 4.0.0 with CDH 5.3.1 and spark 1.2.0 on yarn.

I'm surprised exporting the HADOOP_USER_NAME in the shell script isn't working, but you might try adding a
<shell ...>
...
<env-var>HADOOP_USER_NAME=B</env-var>
...
</shell>
to the shell action in the xml.
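For comparison, the script-side variant from the question can be sketched as below. This is only a sketch under assumptions: run_as_hadoop_user is a hypothetical helper name, and the class and jar in the usage comment are placeholders. Hadoop honors HADOOP_USER_NAME only when the cluster uses simple (non-Kerberos) authentication, so this may have no effect on a secured cluster.

```shell
#!/bin/sh
# Hypothetical helper: run one command with HADOOP_USER_NAME set only for
# that command, leaving the rest of the script's environment untouched.
run_as_hadoop_user() {
  user="$1"
  shift
  HADOOP_USER_NAME="$user" "$@"
}

# Usage inside the script started by the Oozie shell action
# (class name and jar path are placeholders):
# run_as_hadoop_user B spark-submit --class package.to.MainClass /path/to/App.jar
```

Scoping the variable to a single command keeps any other hadoop calls in the same script running as the original user.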

Related

What does "moveToLocal: Option '-moveToLocal' is not implemented yet." mean?

I'm running an Oozie workflow with some bash scripts in a Hadoop environment (Hadoop 2.7.3), but my workflow is failing because my shell action gets an error. After saving the commands' output to a file as a log, I found the following entry in it:
moveToLocal: Option '-moveToLocal' is not implemented yet.
Does my shell action fail because it treats this message as an error and fails the entire action?
Also, does that line mean that my current version of Hadoop (2.7.3) doesn't support that command?
According to the documentation for 2.7.3 hadoop version:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#moveToLocal
It says this command is not supported yet. My shell action treats it as an exception and terminates the shell script, so I'm replacing that command with an equivalent.
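One possible equivalent, sketched here as an assumption rather than what the poster actually used, is to copy the file with -get and remove the HDFS copy only if the copy succeeded. The helper name move_to_local and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emulate "hadoop fs -moveToLocal" by copying the file
# to the local filesystem and deleting the HDFS copy only if the copy worked.
move_to_local() {
  src="$1"   # HDFS path
  dst="$2"   # local path
  hadoop fs -get "$src" "$dst" && hadoop fs -rm "$src"
}

# Usage (both paths are placeholders):
# move_to_local /user/123/test.txt /tmp/test.txt
```

Because the two commands are chained with &&, a failed copy returns a non-zero status and leaves the HDFS file in place.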

Getting "No such file or directory" error when I use shell from Oozie

I am trying to run a shell script from Oozie. When I use hadoop commands inside the shell script, it works fine, but when I try to run local commands, I get a "No such file or directory" exception.
Example:
sample.sh
hadoop fs -touchz /user/123/test.txt
This script works. When I use an NFS path or a local path, I get a
"No such file or directory" exception.
Example:
sample.sh
touch /HDFS/user/123/test.txt
Is there anything I am missing? Please let me know. '/HDFS' is an NFS path.
The thing is, all Oozie workflows are executed by the Oozie server, so if the directory /HDFS/user/123 already exists on the Oozie server, it will work.
So the solution would be to configure (attach) the NFS mount on the Oozie server.
Update
After clarifying some of my own unknowns, what I had mentioned above is not entirely correct. Here is my updated answer:
When you, the client, submit the Oozie job with YARN, it goes to the ResourceManager, which then negotiates and routes it to any of the NodeManagers. So for your case to work, you would need the NFS mount configured on all of the NodeManagers.
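A quick way to see which node the script landed on, and whether the mount is visible there, is to check the directory before touching the file. This is a minimal sketch; require_local_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a local (or NFS-mounted) directory is
# visible on the node where the shell action actually runs.
require_local_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    echo "found $dir on $(uname -n)"
  else
    echo "missing $dir on $(uname -n)" >&2
    return 1
  fi
}

# Usage in sample.sh (path taken from the question):
# require_local_dir /HDFS/user/123 && touch /HDFS/user/123/test.txt
```

The host name in the message shows up in the action's log, which makes it easier to tell which NodeManager lacks the mount.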

oozie fails with Could not load db driver class: oracle.jdbc.OracleDriver

I am getting the error below while executing a sqoop export command (in a shell script) with Oozie.
"java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver"
sqoop export from the CLI (edge node) works fine.
I have added ojdbc6.jar to the locations below.
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/sqoop/lib/
(HDFS locations)
/user/oozie/share/lib/sqoop/ and
/user/oozie/share/lib/lib_20161215195933/sqoop
I have also set oozie.use.system.libpath=true in my Oozie job.properties file.
Please guide me if I am missing any setting.
log content
Thanks & Regards,
Sonali
Make sure that you upload the file to the directory /user/oozie/share/lib/sqoop (it could look like /user/oozie/share/lib/lib_${timestamp}/sqoop for Cloudera and HDP).
Check that the ojdbc6.jar file is correct: check that it contains OracleDriver.class and make sure the size of the file is right. There could have been an error while downloading.
Check the permissions on the ojdbc6.jar file (if needed, you can try giving 755 permissions to this file). Check who the owner of the file is; it should be oozie by default.
Update the Oozie sharelib by executing the command below (run it on the host where the Oozie server is located):
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -sharelibupdate
Verify sharelib for sqoop:
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -shareliblist sqoop*
You can always restart the Oozie service. It should update the sharelib.
Create a directory named lib next to your workflow.xml in HDFS and put the jars in there. Oozie will automatically make those jars available to all actions in that workflow.
Cloudera users should check this article. Especially paragraph 'One Last Thing'.
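The lib directory option above could be scripted roughly as follows. This is a sketch under assumptions: stage_workflow_jar is a hypothetical helper name and the application directory in the usage comment is a placeholder for wherever your workflow.xml lives.

```shell
#!/bin/sh
# Hypothetical helper: copy a jar into the lib directory next to workflow.xml.
stage_workflow_jar() {
  app_dir="$1"   # HDFS directory that contains workflow.xml
  jar="$2"       # local path to the jar
  hadoop fs -mkdir -p "$app_dir/lib" && hadoop fs -put -f "$jar" "$app_dir/lib/"
}

# Usage (the application directory is a placeholder):
# stage_workflow_jar /user/someuser/apps/sqoop-export ojdbc6.jar
```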

Spark setAppName doesn't appear in Hadoop running applications UI

I am running a Spark Streaming job, and when I set the app name (a more readable string) for it, the name doesn't appear in the Hadoop running applications UI. I always see the class name as the name in the Hadoop UI.
val sparkConf = new SparkConf().setAppName("BetterName")
How do I set the job name in Spark so that it appears in this Hadoop UI?
Hadoop URL for running applications is - http://localhost:8088/cluster/apps/RUNNING
[update]
Looks like this is an issue only with Spark Streaming jobs; I couldn't find a solution for it, though.
When a job is submitted via spark-submit, the SparkContext it creates can't set the name of the app, because YARN has already been configured for the job before Spark starts. For the app name to appear in the Hadoop running jobs UI, you have to set it on the spark-submit command line with "--name BetterName". I kick off my job with a shell script that calls spark-submit, so I added the name to the command in my shell script.
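A launcher script along those lines might look like the sketch below. The submit_named helper and the SPARK_SUBMIT override variable are hypothetical names introduced for illustration; only the --name option comes from the answer.

```shell
#!/bin/sh
# SPARK_SUBMIT is an assumed override hook; it defaults to spark-submit on PATH.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

# Hypothetical helper: always pass the application name on the command line
# so that YARN sees it at submission time.
submit_named() {
  name="$1"
  shift
  "$SPARK_SUBMIT" --name "$name" "$@"
}

# Usage (class name and jar path are placeholders):
# submit_named BetterName --class package.to.MainClass /path/to/App.jar
```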

Running Spark Jobs via Oozie

Is it possible to run Spark Jobs e.g. Spark-sql jobs via Oozie?
In the past we have used Oozie with Hadoop. Since we are now using Spark-Sql on top of YARN, looking for a way to use Oozie to schedule jobs.
Thanks.
Yup, it's possible. The procedure is the same: you have to provide Oozie a directory structure containing coordinator.xml, workflow.xml and a lib directory containing your jar files.
But remember that Oozie starts the job with the java -cp command, not with spark-submit, so if you have to run it with Oozie, here is a trick.
Run your jar with spark-submit in the background.
Look for that process in the process list. It will be running under a java -cp command, but with some additional jars that are added by spark-submit. Add those jars to CLASS_PATH, and that's it. Now you can run your Spark applications through Oozie.
1. nohup spark-submit --class package.to.MainClass /path/to/App.jar &
2. ps aux | grep '/path/to/App.jar'
EDITED: You can also use the latest Oozie, which has a Spark action as well.
To run Spark SQL with Oozie you need to use the Oozie Spark action.
You can locate oozie.gz in your distribution. On Cloudera, the Oozie examples archive is usually found at the path below.
$ locate oozie.gz
/usr/share/doc/oozie-4.1.0+cdh5.7.0+267/oozie-examples.tar.gz
Spark SQL needs the hive-site.xml file for execution, which you need to provide in workflow.xml:
<spark-opts>--files /hive-site.xml</spark-opts>

Resources