Oozie java-action does not include core-site.xml - hadoop

When running an Oozie Java action on a freshly installed Hadoop HDP 2.2.2.4 and the action, for example, tries to access HDFS, it accesses the wrong filesystem:
java.lang.IllegalArgumentException: Wrong FS: hdfs:/tmp/text.txt, expected: file:///
It can be fixed by including the core-site.xml in the Oozie action:
<file>hdfs:/path-to-core-site.xml-on-hdfs</file>
But what is the reason and what is the proper fix?

The reason that core-site.xml is not included in the classpath of the Java action is that the property mapreduce.application.classpath points to the wrong directory:
<snip>/etc/hadoop/conf/secure
It should point to
<snip>/etc/hadoop/conf
i.e., the full property, in mapred-site.xml, should be something like:
<property>
<name>mapreduce.application.classpath</name>
<value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf</value>
</property>
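A quick way to verify the fix from inside the Java action is to print where core-site.xml is actually loaded from on the classpath. A minimal sketch (the class name is just for illustration):
import org.apache.hadoop.conf.Configuration;

public class ClasspathCheck {
    public static void main(String[] args) {
        // Prints the URL of the core-site.xml visible on the classpath,
        // or "null" if it is not on the classpath at all.
        System.out.println(
                Configuration.class.getClassLoader().getResource("core-site.xml"));
    }
}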

Those files are included in the Hadoop classpath. As far as I know, since HDP 2.2 you need to add
// loading action conf prepared by Oozie
Configuration actionConf = new Configuration(false);
actionConf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
to use the *-site.xml files; you can find the details in the Oozie documentation:
https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.7_Java_Action
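Putting it together, a minimal sketch of a Java action main class that loads the action conf this way and then talks to HDFS (the class name and the /tmp/text.txt path are only examples):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MyOozieJavaAction {
    public static void main(String[] args) throws Exception {
        // Start from an empty configuration and load the action conf
        // that Oozie materializes for the container.
        Configuration actionConf = new Configuration(false);
        actionConf.addResource(new Path("file:///",
                System.getProperty("oozie.action.conf.xml")));

        // fs.defaultFS now comes from the action conf, so the path
        // resolves against HDFS instead of the local filesystem.
        FileSystem fs = FileSystem.get(actionConf);
        System.out.println("Exists: " + fs.exists(new Path("/tmp/text.txt")));
    }
}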

Related

oozie java.io.IOException: No FileSystem for scheme: hdfs

I have set up Oozie 4.3.1 with Hadoop 2.7.3.
Oozie is set up and running successfully: I can see the web console at http://localhost:11000/oozie/ and also confirm it with the oozie status command.
Issue 1:
While running the Oozie examples after changing job.properties with the relevant details, I get the error below.
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
Error: E0902 : E0902: Exception occured: [No FileSystem for scheme: hdfs]
Issue 2: oozie admin -sharelibupdate
[ShareLib update status]
host = http://f091403isdpbato05:11000/oozie
status = java.io.IOException: No FileSystem for scheme: hdfs
The HDFS path and the other Oozie-related .xml files have also been updated with the proper configuration.
Please let me know of any solution to move ahead.
You can try adding the following to your core-site.xml:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
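To check whether the hdfs scheme now resolves, a small sketch along these lines can help (it uses the nameNode address from the question and assumes the Hadoop client jars, including hadoop-hdfs, are on the classpath):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsSchemeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Should print org.apache.hadoop.hdfs.DistributedFileSystem once the
        // hdfs scheme is resolvable (via fs.hdfs.impl or the hadoop-hdfs jar).
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
        System.out.println(fs.getClass().getName());
    }
}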

How to configure HBase in HA mode?

I don't understand one parameter from hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfsHost:8020/hbase</value>
</property>
What do we have to put in that parameter if we configured the HDFS cluster in HA mode? I mean, we have 2 namenodes (nn1, nn2) and 2 datanodes (dn1, dn2); which node do we have to use in the "hbase.rootdir" parameter?
The most logical answer is the namenode which is currently active. But if we use the active namenode and it fails, the HBase cluster becomes unavailable even though nn2 changes its status to active. The HBase cluster will not understand that we have changed our active NN.
Moreover, I have configured the HBase cluster with the following parameter:
<property>
<name>hbase.rootdir</name>
<value>hdfs://nn1:8020/hbase</value>
</property>
It doesn't work.
1. HMaster starts
2. I put "http://nn1:16010" into browser
3. HMaster disappears
Here is my logs/hbase-hadoop-master-nn1.log :
http://paste.openstack.org/show/549232/
I couldn't find the answer in the documentation. Please help me figure out how to configure it.
You should put the whole nameservice there instead of a concrete namenode. I'm assuming that you have only one nameservice configured. Look at the dfs.nameservices property in hdfs-site.xml; there should be something like "nameservice1" in there. Then change hbase.rootdir like so:
<property>
<name>hbase.rootdir</name>
<value>hdfs://nameservice1:8020/hbase</value>
</property>
(fs.defaultFS property in core-site.xml also uses the same notation)
One thing to watch out for is that HBase should have access to the latest HDFS configuration with HA; otherwise it will complain about the nameservice name.
Copy hdfs-site.xml and core-site.xml to the hbase/conf folder; this way you won't see the error about the unknown name of the HA nameservice that you created.
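To verify that the nameservice is visible with those files in place, a small sketch like the following can be used (the /usr/local/hbase/conf path is just an example install location):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameserviceCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load the HA-enabled files that were copied into hbase/conf.
        conf.addResource(new Path("/usr/local/hbase/conf/core-site.xml"));
        conf.addResource(new Path("/usr/local/hbase/conf/hdfs-site.xml"));

        System.out.println("dfs.nameservices = " + conf.get("dfs.nameservices"));
        // Resolves fs.defaultFS; with a correct HA config this does not
        // throw an UnknownHostException for the logical nameservice name.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Resolved filesystem: " + fs.getUri());
    }
}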

Oozie on YARN - oozie is not allowed to impersonate hadoop

I'm trying to use Oozie from Java to start a job on a Hadoop cluster. I have very limited experience with Oozie on Hadoop 1, and now I'm struggling to do the same thing on YARN.
I've been given a machine that doesn't belong to the cluster, so when I try to start my job I get the following exception:
E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hadoop
Why is that, and what should I do?
I read a bit about core-site.xml properties that need to be set:
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>users</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>master</value>
</property>
Does it seem that this is the problem? Should I contact the people responsible for the cluster to fix it?
Could there be problems because I'm using the same code for YARN as I did for Hadoop 1? Should something be changed? For example, I'm setting nameNode and jobTracker in workflow.xml; should jobTracker exist, since there is now a ResourceManager? I have set the address of the ResourceManager but left the property name as jobTracker; could that be the error?
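For reference, the way I submit the job from Java is roughly the following (simplified sketch; the Oozie URL, application path, and hostnames are just placeholders):
import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        OozieClient client = new OozieClient("http://master:11000/oozie");

        Properties props = client.createConfiguration();
        props.setProperty(OozieClient.APP_PATH, "hdfs://master:8020/user/hadoop/my-app");
        props.setProperty("nameNode", "hdfs://master:8020");
        // On YARN this is the ResourceManager address, even though the
        // property referenced in workflow.xml is still called jobTracker.
        props.setProperty("jobTracker", "master:8032");

        String jobId = client.run(props);
        System.out.println("Submitted workflow: " + jobId);
    }
}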
Maybe I should also mention that Ambari is used...
Please update core-site.xml as follows:
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
The jobTracker address is the ResourceManager address; that will not be the problem. Once you update the core-site.xml file, it will work.
Reason:
The cause of this type of error is that you run the Oozie server as the hadoop user but define oozie as the proxy user in the core-site.xml file.
Solution:
Change the ownership of the Oozie installation directory to the oozie user and run the Oozie server as the oozie user; the problem will be solved.

Use of core-site.xml in mapreduce program

I have seen MapReduce programs using/adding core-site.xml as a resource in the program. What is it for, and how can core-site.xml be used in MapReduce programs?
From documentation,
Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:
core-default.xml: Read-only defaults for Hadoop
core-site.xml: Site-specific configuration for a given Hadoop installation
Configuration config = new Configuration();
config.addResource(new Path("/user/hadoop/core-site.xml"));
config.addResource(new Path("/user/hadoop/hdfs-site.xml"));
core-site.xml and hdfs-site.xml describe the Hadoop installation and its HDFS, so that your MapReduce program can find out which cluster it should point to and where the work should be performed.
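For example, a minimal sketch of a driver that loads the site files shown above and then asks which cluster it points at (the /user/hadoop/... paths are the ones from the snippet; adjust them to wherever the files live on your machine):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConfigResourceExample {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Explicitly load the cluster's site files instead of relying
        // on whatever happens to be on the classpath.
        config.addResource(new Path("/user/hadoop/core-site.xml"));
        config.addResource(new Path("/user/hadoop/hdfs-site.xml"));

        // fs.defaultFS from core-site.xml tells the program which cluster to use.
        System.out.println("fs.defaultFS = " + config.get("fs.defaultFS"));
        FileSystem fs = FileSystem.get(config);
        System.out.println("Working against: " + fs.getUri());
    }
}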

How to use JobClient in Hadoop 2 (YARN)

(Solved) I want to contact the Hadoop cluster and get some job/task information.
In Hadoop 1, I was able to use JobClient (local pseudo-distributed mode, using Eclipse):
JobClient jobClient = new JobClient(new InetSocketAddress("127.0.0.1",9001),new JobConf(config));
JobID job_id = JobID.forName("job_xxxxxx");
RunningJob job = jobClient.getJob(job_id);
.....
Today I set up a pseudo-distributed Hadoop 2 YARN cluster; however, the above code doesn't work. I use the port of the ResourceManager (8032).
JobClient jobClient = new JobClient(new InetSocketAddress("127.0.0.1",8032),new JobConf(config));
This line gives exception:
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
I searched for this exception but none of the solutions worked. I use Eclipse, and I have added all the Hadoop jars, including hadoop-mapreduce-client-xxx. Also, I can successfully run the example programs on my cluster.
Any suggestions on how to use JobClient on Hadoop 2 YARN?
Update: I was able to solve this issue by compiling with the same Hadoop libraries as the RM server. In Eclipse it still gives this exception, but after I compiled and deployed my project it works fine (not sure why, as in Hadoop 1 it works in Eclipse). There is no need to change the API; JobClient still functions well in Hadoop 2.
Have you configured the mapred-site.xml file as follows? It is located in $HADOOP_HOME/etc/hadoop/ in Hadoop 2.x:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit: also make sure that your yarn-site.xml (same location) contains the following property:
<property>
<name>yarn.resourcemanager.address</name>
<value>host:port</value>
</property>
One last thing: I strongly advise you to work with hostnames instead of IPs. There are known cases of failures with Hadoop when IPs are set in the configuration files.
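Putting the pieces together, a minimal sketch of reading job information through JobClient on YARN might look like this (hostnames, ports, and the job id are placeholders for a pseudo-distributed setup):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobInfoOnYarn {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the client to talk to YARN instead of the old JobTracker protocol.
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "localhost:8032");
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        JobClient jobClient = new JobClient(conf);
        // Replace with a real job id, e.g. one shown in the ResourceManager UI.
        RunningJob job = jobClient.getJob(JobID.forName("job_1400000000000_0001"));
        if (job != null) {
            System.out.println(job.getJobName() + " -> state " + job.getJobState());
        }
        jobClient.close();
    }
}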
