Oozie job submission fails - hadoop

I am trying to submit an example map-reduce Oozie job. All the properties are configured properly with regard to the path, name node, job-tracker port, etc., and I validated the workflow.xml too. When I deploy the job I get a job ID, but when I check the status I see KILLED, and the details basically say that
/var/tmp/oozie/oozie-oozi7188507762062318929.dir/map-reduce-launcher.jar does not exist.

In order to resolve this error, just create the HDFS folders and give them the appropriate permissions.
http://kadirsert.blogspot.com.tr/2014/03/oozie-says-jar-does-not-exist.html
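For example, a minimal sketch assuming the Oozie service user is 'oozie' and the default /user/oozie layout (adjust the user and paths to your installation):
# create the HDFS home directory for the oozie user and hand over ownership
sudo -u hdfs hadoop fs -mkdir -p /user/oozie/share/lib
sudo -u hdfs hadoop fs -chown -R oozie:oozie /user/oozie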

The local file system (not HDFS) should have a '/var/tmp/oozie' directory.
If the directory doesn't exist, create it and restart the Oozie server. A number of files will then appear under /var/tmp/oozie, including the *-launcher.jar files.
'/var/tmp/oozie' is the value of the -Djava.io.tmpdir variable on the Oozie server start-up command line. You can check the value with 'ps -ef | grep oozie' on the host where the Oozie server is running.
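A quick sketch of that check and fix (the 'oozie' service user is an assumption; use whatever user your Oozie server runs as):
# confirm which directory -Djava.io.tmpdir points to on the Oozie host
ps -ef | grep oozie | grep java.io.tmpdir
# create the directory if it is missing, make it writable by the Oozie server user, then restart Oozie
sudo mkdir -p /var/tmp/oozie
sudo chown oozie:oozie /var/tmp/oozie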

Related

Getting "No such file or directory" error when I use shell from Oozie

I am trying to run a shell script from Oozie. When I use Hadoop commands inside the shell script it works fine, but when I try to run local commands I get a "No such file or directory" exception.
Example:
sample.sh
hadoop fs -touchz /user/123/test.txt
This script works. When I use an NFS path or a local path I get the "No such file or directory" exception.
Example:
sample.sh
touch /HDFS/user/123/test.txt
Is there anything I am missing? Please let me know. '/HDFS' is an NFS path.
The thing is, all the Oozie workflows are executed by the Oozie server, so if the directory /HDFS/user/123 already exists on the Oozie server, it will work.
So the solution would be to configure the NFS mount to be attached to the Oozie server.
Update
After clarifying some of my own unknowns, what I had mentioned above is not entirely correct. Here is my updated answer:
When you, the client, submit the Oozie job with YARN, it goes to the ResourceManager, which then negotiates and routes it to any of the NodeManagers. So for your case to work, you would have to have the NFS mount configured on all of the NodeManagers.
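One way to see this in practice is a small debugging sketch (not part of the original workflow) that makes the shell action report where it actually runs before touching the NFS path:
# sample.sh - print the executing host, then check whether the NFS mount is visible there
echo "running on $(hostname)"
ls -ld /HDFS/user/123 || echo "/HDFS mount is not visible on this node"
touch /HDFS/user/123/test.txt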

Unable to deploy Spark jobs using Oozie

I need to keep a Spark job running 24/7, and for this I am using Oozie. I have written workflow.xml and job.properties files containing the information needed to invoke it.
However, when I try to submit the Oozie job using this:
oozie job –config /home/oozie/tst/job.properties -run
I get the following error message, which is very clear:
java.io.IOException: configuration is not specified
at org.apache.oozie.cli.OozieCLI.getConfiguration(OozieCLI.java:816)
at org.apache.oozie.cli.OozieCLI.jobCommand(OozieCLI.java:1055)
at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:686)
at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:639)
at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:225)
configuration is not specified
The problem here is that the configuration file (job.properties) does exist locally at the path specified. I also put the directory containing both files and the .jar into HDFS.
Any idea why this is failing?
Is Oozie the best tool for the task I have?
The -config parameter takes a local path, not an HDFS path. Check that job.properties is present at /home/oozie/tst/job.properties.
Check that job.properties contains oozie.wf.application.path=PATH_TO_HDFS_PATH_WHERE_WORKFLOW.XML_IS_PRESENT.
Also, I see that the dash (–) given for the config parameter is different from the dash (-) given for the run parameter.
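For reference, a minimal job.properties sketch (the host names, ports, and application path are placeholders, not values from the question):
nameNode=hdfs://namenode_host:8020
jobTracker=resourcemanager_host:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/tst/apps/spark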
Specify the host in your command
oozie job --oozie http://your_host:11000/oozie -config /home/oozie/tst/job.properties -run
11000 is the default port.

Why is the MR2 map task running under the 'yarn' user and not under the user I ran the Hadoop job as?

I'm trying to run a MapReduce job on MR2, Hadoop ver. 2.6.0-cdh5.8.0. The job has a relative path to a directory with a lot of files to be compressed based on some criteria (not really relevant to this question). I'm running my job as follows:
sudo -u my_user hadoop jar my_jar.jar com.example.Main
There is a folder on HDFS under the path /user/my_user/ with the files. But when I run my job I get the following exception:
java.io.FileNotFoundException: File /user/yarn/<path_from_job> does not exist.
I'm migrating this job from MR1, where it works correctly. My guess is that this is happening because of YARN, since each container is started under the 'yarn' user. In my job configuration I've tried to set mapreduce.job.user.name="my_user", but this didn't help.
I've found a ${user.home} usage in my job configuration, but I don't know where it is set or whether it is possible to change it.
The only solution I have found so far is to provide an absolute path to the folder. Is there any other way around this? I feel this is not the correct approach.
Thank you
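For reference, a sketch of the absolute-path workaround mentioned above (the trailing path argument is hypothetical; pass the absolute location however your job expects it):
# verify where the files actually live, as an absolute HDFS path
sudo -u my_user hadoop fs -ls /user/my_user/
# invoke the job with the explicit absolute path instead of a relative one
# (a relative path gets resolved against the container user's home, /user/yarn)
sudo -u my_user hadoop jar my_jar.jar com.example.Main /user/my_user/files_to_compress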

Oozie runs shell scripts on random nodes

I wrote something like a custom Oozie FTP action (a simple example is described in "Professional Hadoop Solutions" by Boris Lublinsky, Kevin T. Smith, and Alexey Yakubovich). We have HDFS on node1 and the Oozie server on node2. Node2 also has an HDFS client.
My problem:
The Oozie job is started from node1 (all needed files are located on HDFS on node1).
The custom Oozie FTP action successfully downloads CSV files from the FTP server onto node2 (where the Oozie server is located).
I need to put the files into HDFS and create an external table from the CSV on node1.
I tried to use a Java action and call the fileSystem.moveFromLocalFile(...) method. I also tried a Shell action like /usr/bin/hadoop fs -moveFromLocal /tmp/import_folder/filename.csv /user/user_for_import/imported/filename.csv, but it had no effect. All actions seem to look for the files on node1. The result is the same if I start the Oozie job from node2.
Question: can I set the node for the FTP action so that it downloads the files from FTP onto node1? Or is there any other way to get the downloaded files into HDFS than the ones described?
Oozie runs all its actions as MR jobs on nodes from the configured MapReduce cluster. There is no way to make Oozie run some actions on a particular node.
Basically, you should use Flume to ingest files into HDFS. Set up a Flume agent on your FTP node.
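A minimal sketch of what such a Flume agent configuration could look like (the agent name, spool directory, and HDFS path are assumptions):
# flume.conf - watch a local spool directory on the FTP node and write the files to HDFS
ftp_agent.sources = spool
ftp_agent.channels = mem
ftp_agent.sinks = to_hdfs
ftp_agent.sources.spool.type = spooldir
ftp_agent.sources.spool.spoolDir = /tmp/import_folder
ftp_agent.sources.spool.channels = mem
ftp_agent.channels.mem.type = memory
ftp_agent.sinks.to_hdfs.type = hdfs
ftp_agent.sinks.to_hdfs.hdfs.path = hdfs://node1:8020/user/user_for_import/imported
ftp_agent.sinks.to_hdfs.hdfs.fileType = DataStream
ftp_agent.sinks.to_hdfs.channel = mem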
Oozie allows the user to run a shell script on a particular node via the Oozie ssh action extension.
https://oozie.apache.org/docs/4.2.0/DG_SshActionExtension.html
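A minimal sketch of such an ssh action in workflow.xml (the host, script path, and transition names are placeholders):
<action name="ftp-to-hdfs">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>user_for_import@node1</host>
        <command>/home/user_for_import/scripts/load_to_hdfs.sh</command>
        <capture-output/>
    </ssh>
    <ok to="create-external-table"/>
    <error to="fail"/>
</action>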

Hadoop on Mesos fails with "Could not find or load main class org.apache.hadoop.mapred.MesosExecutor"

I have a Mesos cluster setup -- I have verified that the master can see the slaves -- but when I attempt to run a Hadoop job, all tasks wind up with a status of LOST. The same error is present in all the slave stderr logs:
Error: Could not find or load main class org.apache.hadoop.mapred.MesosExecutor
and that is the only line in the stderr logs.
Following the instructions on http://mesosphere.io/learn/run-hadoop-on-mesos/, I have put a modified Hadoop distribution on HDFS which each slave can access.
In the lib directory of the Hadoop distribution, I have added hadoop-mesos-0.0.4.jar and mesos-0.14.2.jar.
I have verified that each slave does in fact download this Hadoop distribution, and that hadoop-mesos-0.0.4.jar contains the class org.apache.hadoop.mapred.MesosExecutor, so I cannot figure out why the class cannot be found.
I am using Hadoop from CDH4.4.0 and mesos-0.15.0-rc4.
Does anyone have any suggestions as to what might be the problem? I know I would normally start by suspecting a CLASSPATH problem, but in this case the mesos-slave is downloading, unpacking, and attempting to run a Hadoop TaskTracker, so I would imagine any CLASSPATH would be set up by the mesos-slave.
In the stdout of the slave logs, the environment is printed. There is a MESOS_HADOOP_HOME which is empty. Should this be set to something? If it is supposed to be set to the downloaded Hadoop distribution, I cannot set it in advance because the Hadoop distribution is downloaded to a new location every time.
In case this is related (some permissions issue, maybe): when attempting to browse the slave logs via the master UI, I get the error "Error browsing path: ...".
The user running mesos-slave can browse to the correct directory when I do so manually.
I found the problem. bin/hadoop of the downloaded Hadoop distribution attempts to find its own location by running which $0. However, that finds an existing Hadoop installation if one is present (e.g. /usr/lib/hadoop) and loads the jars under that installation's lib directory instead of the downloaded one's lib directory.
I had to modify bin/hadoop of the downloaded distribution to find its own location with dirname $0 instead of which $0.
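Roughly, the change amounts to something like this (a sketch of the idea; the exact variable names in bin/hadoop may differ):
# before: `which $0` resolves to an already-installed hadoop (e.g. /usr/lib/hadoop) if one is on the PATH
# this=`which $0`
# after: resolve the script's own location inside the downloaded, unpacked distribution
this=`dirname "$0"`/`basename "$0"`
bin=`dirname "$this"`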
