How to make Hbase resilient to name node failures in Hadoop 2 - hadoop

There is an HA solution for the Hadoop + HBase stack on Hadoop 1, but I can't find any mention of such a solution for Hadoop 2.
Hadoop 2 has NameNode high availability, but you still need to set a hostname in the Hadoop setup, so if the active NameNode goes down, HBase is left blind.
What solutions can you suggest for making HBase resilient to NameNode failures?

You need to configure a nameservice and use it instead of specifying a specific IP.
For example, here "mycluster" is the nameservice name.
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
Then configure the NameNodes for HA:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
In hbase-site.xml you can also use the "mycluster" nameservice to refer to the cluster.
For more details, please refer here.
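As a minimal sketch of what the hbase-site.xml entry could look like with the "mycluster" nameservice from above (the /hbase directory is an assumption; use whatever root directory your cluster already has). Note that a logical nameservice URI carries no port:

```xml
<!-- hbase-site.xml: sketch only; /hbase is an assumed root directory -->
<property>
<name>hbase.rootdir</name>
<!-- "mycluster" is the nameservice from dfs.nameservices, not a hostname -->
<value>hdfs://mycluster/hbase</value>
</property>
```

HBase also needs to see the HA settings from hdfs-site.xml in its configuration, otherwise it cannot resolve the logical name.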

Related

What would happen if nodes in Hadoop change their IP address?

My Hadoop cluster does not work well because of network conditions. What if I change the entire network, for example by switching to another router, so that the IP addresses change? Could the cluster still work after updating some configuration, or must I tear it down and rebuild everything?
Thanks in advance
It works once you change the IP addresses in the configuration. Why didn't you use DNS?
OK, that was not a good answer; let me apologize and give a better one.
If you need to change the configuration of a running cluster, you can decommission and commission the data nodes.
Simply switching off a data node is not a good idea.
Data Node Decommissioning
The first step is to tell YARN that you are going to remove some nodes; then you have to tell HDFS the same.
I don't know whether your system is configured for decommissioning. If it is, you have the key yarn.resourcemanager.nodes.exclude-path in yarn-site.xml and dfs.hosts.exclude in hdfs-site.xml.
hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>$YOUR_PATH/dfs.exclude</value>
<final>true</final>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>$YOUR_PATH/dfs.exclude</value>
<final>true</final>
</property>
Open the file $YOUR_PATH/dfs.exclude and add the hostnames / IP addresses of the nodes you need to stop.
Then execute:
yarn rmadmin -refreshNodes
hdfs dfsadmin -refreshNodes
Check in the web interface that the data nodes are being decommissioned.
Data Node Commissioning
This works the same way as decommissioning.
yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>$YOUR_PATH/dfs.include</value>
<final>true</final>
</property>
hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>$YOUR_PATH/dfs.include</value>
<final>true</final>
</property>
Open the file $YOUR_PATH/dfs.include and add the hostnames / IP addresses of the nodes you need to add.
yarn rmadmin -refreshNodes
hdfs dfsadmin -refreshNodes
Wait some time, then run:
hdfs dfsadmin -report
Now the hosts you added appear in the list.
If your configuration is missing the keys above, you need to restart the NameNode and the ResourceManager after adding them.
Using this procedure, you can halt data nodes safely.

How to configure HBase in a HA mode?

I don't understand one parameter from hbase-site.xml :
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfsHost:8020/hbase</value>
</property>
What should we put in that parameter if the HDFS cluster is configured in HA mode? I mean, we have 2 name nodes (nn1, nn2) and 2 data nodes (dn1, dn2); which node should we use in the "hbase.rootdir" parameter?
The most logical answer is the name node that is currently active. But if we use the active name node and it fails, the HBase cluster becomes unavailable even after nn2 changes its status to active, because HBase will not know that the active NN has changed.
Moreover, I have configured HBase cluster with the following parameter:
<property>
<name>hbase.rootdir</name>
<value>hdfs://nn1:8020/hbase</value>
</property>
It doesn't work.
1. HMaster starts
2. I put "http://nn1:16010" into browser
3. HMaster disappears
Here is my logs/hbase-hadoop-master-nn1.log :
http://paste.openstack.org/show/549232/
I couldn't find answers in documentation. Please, help me to find out how to configure it
You should put the nameservice there instead of a concrete namenode. I'm assuming that you have only one nameservice configured. Look at the dfs.nameservices property in hdfs-site.xml. There should be something like "nameservice1" in there. Then change hbase.rootdir like so:
<property>
<name>hbase.rootdir</name>
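<value>hdfs://nameservice1/hbase</value>
</property>
(The fs.defaultFS property in core-site.xml uses the same notation. A logical nameservice URI takes no port; the NameNode hosts and ports come from the dfs.namenode.rpc-address.* properties.)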
One thing to watch for is that HBase must have access to the latest HDFS configuration with HA; otherwise it will complain about the nameservice name.
Copy hdfs-site.xml and core-site.xml into the hbase/conf folder; this way you won't see the error about an unknown HA nameservice name.
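For illustration, the matching core-site.xml entry might look like the sketch below. It assumes the nameservice is called nameservice1, as in the answer above; check dfs.nameservices in your own hdfs-site.xml for the real name.

```xml
<!-- core-site.xml: sketch assuming a nameservice named nameservice1 -->
<property>
<name>fs.defaultFS</name>
<!-- logical nameservice name; clients resolve it via hdfs-site.xml -->
<value>hdfs://nameservice1</value>
</property>
```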

Namenode high availability client request

Can anyone please tell me: if I use a Java application to request file upload/download operations on HDFS with a NameNode HA setup, where does the request go first? I mean, how does the client know which namenode is active?
It would be great if you could provide a workflow-type diagram or something that explains the request steps in detail (start to end).
Please check the NameNode HA architecture and the key entities involved in handling HDFS client requests.
Where this request go first? I mean how would client know that which
namenode is active?
For the client/driver it doesn't matter which namenode is active, because we address HDFS by nameservice ID rather than by a namenode hostname. The client's failover proxy provider automatically directs requests to the active namenode.
Example: hdfs://nameservice_id/rest/of/the/hdfs/path
Explanation:
How does hdfs://nameservice_id/ work, and which configuration settings are involved?
In the hdfs-site.xml file:
Create a nameservice by giving it an ID (here the nameservice_id is mycluster):
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
Now specify the namenode IDs that identify the namenodes in the cluster:
dfs.ha.namenodes.[$nameservice ID]:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>Unique identifiers for each NameNode in the nameservice</description>
</property>
Then link the namenode IDs to the namenode hosts:
dfs.namenode.rpc-address.[$nameservice ID].[$name node ID]
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>machine1.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>machine2.example.com:8020</value>
</property>
After that, specify the Java class that the DFS client uses to determine which NameNode is currently active and serving client requests.
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Finally, after these configuration changes the HDFS URL looks like this:
hdfs://mycluster/<file_location_in_hdfs>
To answer your question I have shown only a few configuration settings. Please check the detailed documentation for how the NameNodes, JournalNodes and ZooKeeper machines form NameNode HA in HDFS.
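The JournalNodes and ZooKeeper mentioned above are configured separately. As a rough sketch only, with hypothetical host names (jn1..jn3 and zk1..zk3 under example.com are placeholders, not taken from this setup), the related entries typically look like this:

```xml
<!-- hdfs-site.xml: where the NameNodes write and read the shared edit log -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<!-- hdfs-site.xml: enable automatic failover between nn1 and nn2 -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- core-site.xml: ZooKeeper ensemble used for automatic failover -->
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```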
If the Hadoop cluster is configured with HA, it will have namenode IDs in hdfs-site.xml like this:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>namenode1,namenode2</value>
</property>
Whichever NameNode is started first will become active. You may choose to start the cluster in a specific order such that your preferred node starts first.
If you want to determine the current state of a namenode, you can use the -getServiceState command with its namenode ID (for example namenode1):
hdfs haadmin -getServiceState <namenode-id>
When writing the driver class, you need to set the following properties on the Configuration object:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The class name is illustrative; the original snippet showed only main().
public class HdfsHaCopy {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: pgm <hdfs:///path/to/copy> </local/path/to/copy/from>");
            System.exit(1);
        }
        // loadDefaults=false: no default resources are read, so every HA setting is set explicitly here
        Configuration conf = new Configuration(false);
        conf.set("fs.defaultFS", "hdfs://nameservice1");
        // fs.default.name is the older, deprecated name of fs.defaultFS
        conf.set("fs.default.name", conf.get("fs.defaultFS"));
        conf.set("dfs.nameservices", "nameservice1");
        conf.set("dfs.ha.namenodes.nameservice1", "namenode1,namenode2");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode1", "hadoopnamenode01:8020");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode2", "hadoopnamenode02:8020");
        conf.set("dfs.client.failover.proxy.provider.nameservice1",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        Path srcPath = new Path(args[1]);
        Path dstPath = new Path(args[0]);
        // delSrc=false keeps the local file; overwrite=true replaces an existing remote file
        fs.copyFromLocalFile(false, true, srcPath, dstPath);
    }
}
The request goes to nameservice1 and is then handled by the Hadoop cluster according to namenode status (active/standby).
For more details, please refer to the HDFS High Availability documentation.

Oozie on YARN - oozie is not allowed to impersonate hadoop

I'm trying to use Oozie from Java to start a job on a Hadoop cluster. I have very limited experience with Oozie on Hadoop 1 and now I'm struggling trying out the same thing on YARN.
I'm given a machine that doesn't belong to the cluster, so when I try to start my job I get the following exception:
E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hadoop
Why is that and what to do?
I read a bit about core-site properties that need to be set
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>users</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>master</value>
</property>
Does it seem that this is the problem? Should I contact people responsible for cluster to fix that?
Could there be problems because I'm using same code for YARN as I did for Hadoop 1? Should something be changed? For example, I'm setting nameNode and jobTracker in workflow.xml, should jobTracker exist, since there is now ResourceManager? I have set the address of ResourceManager, but left the property name as jobTracker, could that be the error?
Maybe I should also mention that Ambari is used...
Hi, please update core-site.xml:
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
Using the ResourceManager address as the jobTracker value is not the problem. Once you update the core-site.xml file, it will work.
Reason:
This type of error occurs when you run the Oozie server as the hadoop user but define oozie as the proxy user in core-site.xml.
Solution:
Change the ownership of the Oozie installation directory to the oozie user and run the Oozie server as the oozie user, and the problem will be solved.
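If the Oozie server does run as the oozie user, the proxy-user entries in core-site.xml have to name that user and the host the Oozie server runs on. A hedged sketch, where oozie-server.example.com is a hypothetical host name and the wildcard group is only meant for testing:

```xml
<!-- core-site.xml on the cluster: allow user "oozie" to impersonate other users -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<!-- hypothetical: host(s) from which the Oozie server connects -->
<value>oozie-server.example.com</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<!-- "*" allows impersonating members of any group; narrow this in production -->
<value>*</value>
</property>
```

The NameNode and ResourceManager pick up such changes only after a restart or a proxy-user refresh, so this usually has to be done by whoever administers the cluster.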

Hive/HBase Integration - Zookeeper Session Closes Immediately

We have an 8-node cluster using CDH3u2, configured with Cloudera Manager. We have a dedicated master node running our only instance of ZooKeeper. When I configure Hive to run local Hadoop, executed from the master node, I have no problem retrieving the data from HBase. When I run distributed map/reduce via Hive, I get the following error when the slave nodes connect to ZooKeeper:
HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default).
We have tried setting max connections higher (we even tried removing the limit). This is a development cluster with very few users, and I know the problem is not too many connections (I am able to connect to ZooKeeper from the slave nodes using ./zkCli).
Server side logs indicate that the session was terminated by the client.
Client side hadoop log says:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
Any idea why I am unable to maintain a connection to ZooKeeper via Hive map/reduce?
Configs for hbase and zookeeper are:
# Autogenerated by Cloudera SCM on Wed Dec 28 08:42:23 CST 2011
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
maxClientCnxns=1000
minSessionTimeout=4000
maxSessionTimeout=40000
HBase Site-XML is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://alnnimb01:8020/hbase</value>
<description>The directory shared by region servers. Should be fully-qualified to include the filesystem to use. E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
<description>The port for the hbase master web UI Set to -1 if you do not want the info server to run.</description>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
<description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper files that are configured with a relative path will go under this node. By default, all of HBase's ZooKeeper file path are configured with a relative path, so they will all go under this directory unless changed.</description>
</property>
<property>
<name>zookeeper.znode.rootserver</name>
<value>root-region-server</value>
<description>Path to ZNode holding root region location. This is written by the master and read by clients and region servers. If a relative path is given, the parent folder will be ${zookeeper.znode.parent}. By default, this means the root location is stored at /hbase/root-region-server.</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>The ZooKeeper client port to which HBase clients will connect</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>alnnimb01.aln.experian.com</value>
<description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".</description>
</property>
It turns out that the map/reduce job submitted by Hive tries to connect to ZooKeeper at 'localhost', regardless of how the ZooKeeper quorum is set in the config file. I changed /etc/hosts so that the alias 'localhost' points to the IP of my master node, and the connection to ZooKeeper is maintained. I'm still looking for a better resolution, but this will work for now.
I figured it out. It was a configuration issue (as I suspected all along). The solution was to:
- set ‘hbase.zookeeper.quorum’ within ‘hive-site.xml’ and place that file in the ‘hadoop-conf’ directory
What threw me off was that there is no 'hbase.zookeeper.quorum' in hive-default.xml. I had been playing with 'hive.zookeeper.quorum' which was not the correct configuration to change.
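A sketch of what that hive-site.xml entry might look like, assuming the single ZooKeeper host from the HBase configuration shown in the question:

```xml
<!-- hive-site.xml: sketch; host taken from the hbase-site.xml in the question -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>alnnimb01.aln.experian.com</value>
</property>
```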
I'm sorry for posting a new answer. I wanted to comment on the previous answer but the commenting UI seems to have disappeared >.< ...
Anyway, I wanted to say that I am experiencing the same problem, and it is solved by doing the /etc/hosts hack, but that seems like a very dirty solution...
Did anyone figure out a way of fixing this cleanly...??
Thanks :) !
I ran into exactly the same problem. What I did was start the Hive CLI with the following setting, and it works fine:
hive --hiveconf hbase.zookeeper.quorum={zk-host}
You should configure HBase to use the external ZooKeeper and replace {zk-host} with the ZooKeeper host.
I'm still looking for a way to resolve this when using JDBC to access Hive.
