Getting "No such file or directory" error when I use shell from Oozie - shell

I am trying to run a shell script from Oozie. When I use Hadoop commands inside the shell script, it works fine, but when I try to run local commands, I get a "No such file or directory" exception.
Example:
sample.sh
hadoop fs -touchz /user/123/test.txt
This script works. But when I use an NFS path or a local path, I get a
"No such file or directory" exception.
Example:
sample.sh
touch /HDFS/user/123/test.txt
Is there anything I am missing? Please let me know. '/HDFS' is an NFS path.

The thing is, all Oozie workflows are executed by the Oozie server, so if the directory /HDFS/user/123 already exists on the Oozie server, it will work.
So the solution would be to configure the NFS mount to work with (be attached to) the Oozie server.
Update
After clarifying some of my own unknowns, what I had mentioned above is not entirely correct. Here is my updated answer:
When you, the client, submit the Oozie job with YARN, it goes to the ResourceManager, which then negotiates and routes it to any of the NodeManagers. So for your case to work, you would have to have the NFS mount configured on all of the NodeManagers.
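As a rough sketch of what that means in practice, assuming the export comes from a host called nfs-server and is mounted at /HDFS (both names are placeholders for your environment), you would add the mount on every NodeManager host:
# Run on each NodeManager host; nfs-server:/export/HDFS and /HDFS are placeholders.
sudo mkdir -p /HDFS
sudo mount -t nfs nfs-server:/export/HDFS /HDFS
# With the mount in place on every NodeManager, the shell action's local command can succeed:
touch /HDFS/user/123/test.txt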

Related

Error in formatting the namenode in a Hadoop single-node cluster

I am trying to install and configure Hadoop on an Ubuntu 16.04 system, as per the guidelines of https://data-flair.training/blogs/installation-of-hadoop-3-x-on-ubuntu/
All the steps ran successfully, but when I try to run the command hdfs namenode -format, I get an error message.
There is some problem with your bashrc file. Just check the variables inside your bashrc. I faced the same problem when I started with Hadoop. Set the correct path for each and every variable, and afterwards use source ~/.bashrc to apply the changes made to your bashrc file.
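For reference, a typical set of Hadoop-related ~/.bashrc entries looks roughly like the following; the install paths here are assumptions and must be adjusted to match your own setup:
# Illustrative ~/.bashrc entries; adjust every path to your installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload the file in the current shell, then retry the format.
source ~/.bashrc
hdfs namenode -format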

Why is the MR2 map task running under the 'yarn' user and not under the user I ran the Hadoop job as?

I'm trying to run a MapReduce job on MR2, Hadoop ver. 2.6.0-cdh5.8.0. The job uses a relative path to a directory that contains a lot of files to be compressed based on some criteria (not really relevant to this question). I'm running my job as follows:
sudo -u my_user hadoop jar my_jar.jar com.example.Main
There is a folder on HDFS under the path /user/my_user/ with the files. But when I run my job I get the following exception:
java.io.FileNotFoundException: File /user/yarn/<path_from_job> does not exist.
I'm migrating this job from MR1, where it works correctly. My guess is that this is happening because of YARN, since each container is started under the yarn user. In my job configuration I've tried to set mapreduce.job.user.name="my_user", but this didn't help.
I've found ${user.home} being used in my job configuration, but I don't know where it is set or whether it is possible to change it.
The only solution I have found so far is to provide an absolute path to the folder. Is there any other way around this? I feel like this is not the correct approach.
Thank you
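To illustrate the behaviour being described: a relative HDFS path is resolved against the home directory of whichever user resolves it, so the same relative name points to different locations depending on who runs the code. A rough sketch (my_dir and the argument handling are hypothetical):
# A relative HDFS path resolves against the caller's HDFS home directory.
sudo -u my_user hadoop fs -ls my_dir    # resolves to /user/my_user/my_dir
sudo -u yarn hadoop fs -ls my_dir       # resolves to /user/yarn/my_dir
# Passing an absolute path removes the ambiguity (assuming the job takes the directory as an argument).
sudo -u my_user hadoop jar my_jar.jar com.example.Main /user/my_user/my_dir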

How can you resolve Oozie error JA009?

I am running a simple Oozie workflow on Cloudera VM. The sub-workflow calls a shell script which sends a test email. However, I am getting the JA009 error:
(JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.).
I have already changed mapreduce.framework.name from yarn to classic in the following places:
/etc/oozie/conf/hadoop-conf/core-site.xml
/etc/oozie/conf/hadoop-config.xml
/etc/hadoop/conf/mapred-site.xml
Also, in /etc/hadoop/conf/hadoop-env.sh I changed:
"export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce"
to:
"export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce"
Is there anything I am missing? yarn is not showing up in hadoop fs -ls /user (hive, pig, spark, etc. are), so I am assuming YARN is not pre-installed here.
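I can't tell which copy of the configuration Oozie is actually picking up, but as a sanity check (assuming the paths from the question, and that the Oozie service is managed through service, which may differ on your VM) you could verify the property in each edited file and then restart Oozie:
# Verify the property in every file that was edited.
grep -A1 mapreduce.framework.name /etc/oozie/conf/hadoop-conf/core-site.xml
grep -A1 mapreduce.framework.name /etc/oozie/conf/hadoop-config.xml
grep -A1 mapreduce.framework.name /etc/hadoop/conf/mapred-site.xml
# Restart the Oozie server so it re-reads its copy of the Hadoop configs (service name may differ).
sudo service oozie restart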

nutch 1.7 keeps changing the filesystem to local when run from oozie

I built and ran nutch 1.7 from the command line just fine:
hadoop jar apache-nutch-1.7.job org.apache.nutch.crawl.Crawl hdfs://myserver/nutch/urls -dir hdfs://myserver/nutch/crawl -depth 5 -topN 100
but when I run the same thing from Oozie, I keep getting
Wrong FS: hdfs://myserver/nutch/crawl/crawldb/current, expected: file:///
I checked the source; every time the code does
FileSystem fs = new JobClient(job).getFs();
the fs gets changed back to the local fs.
I overrode all instances of these statements; the job then dies in the fetch stage, simply saying
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
It really appears that running from Oozie causes the wrong version of the JobClient class (from hadoop-core.jar) to be loaded.
Has anyone seen this before?
It seems the Oozie conf directory was missing the proper *-site.xml files. I added mapred-site.xml to the /etc/oozie/conf/hadoop-conf directory, and this problem went away.
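For anyone hitting the same thing, the fix amounted to something like the following on the Oozie server host (the source path assumes a standard CDH layout, and the restart command may differ on your distribution):
# Copy the cluster's mapred-site.xml into the Hadoop conf directory that Oozie reads.
sudo cp /etc/hadoop/conf/mapred-site.xml /etc/oozie/conf/hadoop-conf/
# Restart Oozie so it picks up the added file.
sudo service oozie restart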

Oozie job submission fails

I am trying to submit an example map-reduce Oozie job, and all the properties are configured properly with regard to the path, namenode, job-tracker port, etc. I validated the workflow.xml too. When I deploy the job I get a job id, and when I check the status I see the status KILLED, and the details basically say that
/var/tmp/oozie/oozie-oozi7188507762062318929.dir/map-reduce-launcher.jar does not exist.
In order to resolve this error, just create the HDFS folders and give appropriate permissions to them.
http://kadirsert.blogspot.com.tr/2014/03/oozie-says-jar-does-not-exist.html
The local file system (not HDFS) should have a '/var/tmp/oozie' directory.
If the directory doesn't exist, create it and restart the Oozie server. A lot of files will then appear under /var/tmp/oozie, including the *-launcher.jar files.
'/var/tmp/oozie' is the value of the -Djava.io.tmpdir variable in the Oozie server start-up command line. You can check the value using 'ps -ef | grep oozie' on the host where the Oozie server is running.
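Putting that together, a minimal sketch of the fix on the Oozie server host (the oozie user/group and the service command are assumptions that depend on your install):
# Confirm which tmpdir the running Oozie server was started with.
ps -ef | grep oozie | grep -o 'java.io.tmpdir=[^ ]*'
# Create the directory on the local filesystem and make it writable by the Oozie user.
sudo mkdir -p /var/tmp/oozie
sudo chown oozie:oozie /var/tmp/oozie
# Restart the Oozie server; the *-launcher.jar files should then appear under /var/tmp/oozie.
sudo service oozie restart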
