what would happen if nodes in hadoop change their IP address? - hadoop

My Hadoop cluster does not work well because of the network conditions. What if I change the entire network, e.g. switch to another router, and thus change the IP addresses? Could the cluster still work after updating some configuration, or must I tear it down and rebuild everything?
Thanks in advance

It works once you change the IP addresses in the configuration. Why didn't you use DNS?
OK, that was not a good answer; let me apologize and give a better one.
If you need to change the configuration on a running cluster you can decommission and recommission the data nodes.
Simply switching off a data node is not a good idea.
Data Node Decommissioning
The first step is to tell YARN (the ResourceManager) that you are going to remove some nodes, then you have to tell the HDFS NameNode the same.
I don't know whether your system is already configured for decommissioning; if it is, you have the key yarn.resourcemanager.nodes.exclude-path in yarn-site.xml and dfs.hosts.exclude in hdfs-site.xml.
hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>$YOUR_PATH/dfs.exclude</value>
<final>true</final>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>$YOUR_PATH/dfs.exclude</value>
<final>true</final>
</property>
Open the file $YOUR_PATH/dfs.exclude and add the hostnames / IP addresses of the nodes you need to stop.
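For example (these hostnames are just placeholders for your own data nodes), $YOUR_PATH/dfs.exclude is a plain text file with one host per line:
datanode-03.example.com
datanode-04.example.com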
Then execute:
yarn rmadmin -refreshNodes
hdfs dfsadmin -refreshNodes
Check whether the data nodes are decommissioning via the web interface.
Data Node Commissioning
This works the same way as decommissioning.
yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>$YOUR_PATH/dfs.include</value>
<final>true</final>
</property>
hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>$YOUR_PATH/dfs.include</value>
<final>true</final>
</property>
Open the file $YOUR_PATH/dfs.include and add the hostnames / IP addresses of the nodes you need to add.
yarn rmadmin -refreshNodes
hdfs dfsadmin -refreshNodes
Wait some time, then run:
hdfs dfsadmin -report
Now the hosts you added appear in the list.
If your configuration is missing the above keys, you need to restart the NameNode and the YARN ResourceManager after adding them.
Using this procedure you can halt data nodes in a safe way.
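As a side note on the DNS remark above: if the configuration references nodes by hostname instead of raw IP addresses, changing the router or subnet only requires updating DNS (or /etc/hosts on every node), not the Hadoop configuration itself. A minimal sketch, using made-up hostnames and addresses:
/etc/hosts (on every node, or the equivalent DNS records):
192.168.1.10 namenode-01
192.168.1.11 datanode-01
core-site.xml then refers to the hostname, which stays the same when the network changes:
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode-01:9000</value>
</property>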

Related

httpfs error Operation category READ is not supported in state standby

I am working on hadoop apache 2.7.1 and I have a cluster that consists of 3 nodes
nn1
nn2
dn1
nn1 is the dfs.default.name, so it is the master name node.
I have installed httpfs and started it (after restarting all the services, of course). When nn1 is active and nn2 is standby I can send this request
http://nn1:14000/webhdfs/v1/aloosh/oula.txt?op=open&user.name=root
from my browser, and an open-or-save dialog for this file appears. But when I kill the name node running on nn1 and start it again, then because of high availability nn1 becomes standby and nn2 becomes active.
So httpfs should still work even though nn1 is now standby, but sending the same request now
http://nn1:14000/webhdfs/v1/aloosh/oula.txt?op=open&user.name=root
gives me the error
{"RemoteException":{"message":"Operation category READ is not supported in state standby","exception":"RemoteException","javaClassName":"org.apache.hadoop.ipc.RemoteException"}}
Shouldn't httpfs overcome nn1's standby status and fetch the file? Is this because of a wrong configuration, or is there some other reason?
My core-site.xml is
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
It looks like HttpFs is not High Availability aware here yet. This could be due to missing configuration required for clients to connect to the current active NameNode.
Ensure the fs.defaultFS property in core-site.xml is configured with the correct nameservice ID.
If you have the below in hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
then in core-site.xml, it should be
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
Also configure the Java class that the DFS client uses to determine which NameNode is currently active and serving client requests.
Add this property to hdfs-site.xml
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Restart the NameNodes and HttpFs after adding the properties on all nodes.
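To check which NameNode is currently active after a failover (nn1 and nn2 here are the IDs from the question; use your own dfs.ha.namenodes values), you can run:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2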

How to configure HBase in HA mode?

I don't understand one parameter from hbase-site.xml :
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfsHost:8020/hbase</value>
</property>
What do we have to put in that parameter if we configured the HDFS cluster in HA mode? I mean, we have 2 name nodes (nn1, nn2) and 2 data nodes (dn1, dn2); which node do we have to use in the "hbase.rootdir" parameter?
The most logical answer is the name node which is currently active. But if we use the active name node and it fails, then the HBase cluster becomes unavailable even though nn2 changes its status to active; the HBase cluster will not understand that the active NN has changed.
Moreover, I have configured HBase cluster with the following parameter:
<property>
<name>hbase.rootdir</name>
<value>hdfs://nn1:8020/hbase</value>
</property>
It doesn't work.
1. HMaster starts
2. I put "http://nn1:16010" into browser
3. HMaster disappears
Here is my logs/hbase-hadoop-master-nn1.log :
http://paste.openstack.org/show/549232/
I couldn't find the answer in the documentation. Please help me figure out how to configure it.
You should put the whole nameservice there instead of a concrete NameNode. I'm assuming that you have only one nameservice configured. Look at the dfs.nameservices property in hdfs-site.xml; there should be something like "nameservice1" in there. Then change hbase.rootdir like so:
<property>
<name>hbase.rootdir</name>
<value>hdfs://nameservice1/hbase</value>
</property>
(Note that no port is specified when a nameservice ID is used; the fs.defaultFS property in core-site.xml uses the same notation.)
One thing to watch for is that HBase must have access to the latest HDFS configuration for HA; otherwise it will complain about the nameservice name.
Copy hdfs-site.xml and core-site.xml to the hbase/conf folder; this way you won't see the error about the unknown name of the HA nameservice you created.
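A minimal sketch of that copy step, assuming HADOOP_CONF_DIR points at your Hadoop configuration directory and HBASE_HOME at your HBase installation:
# make the HA nameservice definition visible to HBase
cp $HADOOP_CONF_DIR/core-site.xml $HBASE_HOME/conf/
cp $HADOOP_CONF_DIR/hdfs-site.xml $HBASE_HOME/conf/
# restart HBase so it picks up the new configuration
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh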

Do I have to run the history server on all nodes to get job history in the Hadoop Cluster WebUI?

I am facing an issue in my Hadoop cluster. I have a Hadoop cluster with 5 datanodes and one edge/gateway node.
My issue is that I have had to start the history server on each of those nodes (1 namenode and 5 datanodes) to get any job history from the Hadoop web UI for any submitted job.
I have added mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address in mapred-site.xml,
but it's not working properly, I guess.
If I start the history server on the name node (or on any other single node) only, the Hadoop Cluster Web-UI is unable to show me the job history and ends up with an error.
My Mapred-site XML
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoopmaster:8021</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoopmaster:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoopmaster:19888</value>
</property>
</configuration>
For the time being, as a workaround, I start the history server on each node (namenode and all data nodes) manually. But I think this is not the right way.
Right now I have only 5 data nodes, so it is still feasible to start the history server on every node manually, but with many nodes (say 100/200) it will no longer be feasible. There should be some standard solution for this issue...
Please help me out if anyone knows how to resolve this issue.
Thanks in advance…
Finally I was able to solve the issue.
With mapreduce.jobhistory.address configured, the history server needs to run on that one node only (confirmed with jps).
It's working properly now...
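For reference, on Hadoop 2.x the history server only needs to be started once, on the host named in mapreduce.jobhistory.address (hadoopmaster in the configuration above), assuming HADOOP_HOME points at your installation:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# confirm it is running
jps | grep JobHistoryServer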

hadoop no data node started

I am following this tutorial.
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
I got to this point and started the nodes.
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
But then when I run the next steps, it looks like no data node is running (as I get errors saying so).
Why is the data node down? And how can I fix this?
Here is the log from my data node.
hduser#test02:/usr/local/hadoop$ jps
3792 SecondaryNameNode
3929 Jps
3258 NameNode
hduser#test02:/usr/local/hadoop$ cat /usr/local/hadoop/logs/hadoop-hduser-datanode-test02.out
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
-m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 3781
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
hduser#test02:/usr/local/hadoop$
EDIT:
Seems I had this port number wrong.
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Now that I have made it right (i.e. equal to 9000), the name node no longer starts up.
hduser#test02:/usr/local/hadoop$ jps
10423 DataNode
10938 Jps
10703 SecondaryNameNode
and I cannot browse:
http://my-server-name:50070/
any more.
Hope this gives you some hint about what is happening.
I am a total beginner with Hadoop and am kind of lost now.
[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
[hdfs-site.xml]
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
In mapred-site.xml I have nothing.
1. First stop all the entities like the namenode, datanode, etc. (you will have some script or command to do that).
2. Format the tmp directory:
3. Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents of the directory manually.
4. Now format your namenode again.
5. Start all the entities, then use the jps command to confirm that the datanode has been started.
6. Now run whichever application you like or have (a rough command sketch of these steps follows).
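A rough command sketch of these steps, assuming a Hadoop 2.x layout and the storage path used in this answer (double-check your own dfs.datanode.data.dir / hadoop.tmp.dir before deleting anything):
# stop everything
sbin/stop-dfs.sh
# clear the old storage directories
rm -rf /var/cache/hadoop-hdfs/hdfs/dfs/*
# re-format the namenode (this erases HDFS metadata)
bin/hdfs namenode -format
# start again and confirm the DataNode is up
sbin/start-dfs.sh
jps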
Hope this helps.
Add this configuration
conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Stop Hadoop:
bin/stop-all.sh
Change permissions and remove the temp directory data:
chmod 755 /var/lib/hadoop/tmp
rm -Rf /var/lib/hadoop/tmp/*
Format the name node:
bin/hadoop namenode -format
After 1 day of struggle, I just removed version 2.4 and installed Hadoop 2.2 (as I realized 2.2 is the latest stable version). Then I got it all working by following this nice tutorial.
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
Something is not right with that document about 2.4 which I was reading.
Not to mention that it's not suitable for beginners, and it's usually beginners who stumble upon it.
Maybe your slave's data and the master's data are not synced. Delete the data and name folders in ./hadoop/hdfs and recreate them, re-format the namenode, then start DFS.
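A hedged sketch of that suggestion, assuming the folders really live under ./hadoop/hdfs as described (verify your dfs.namenode.name.dir / dfs.datanode.data.dir first, since this destroys existing HDFS data):
# remove and recreate the name/data directories
rm -rf ./hadoop/hdfs/name ./hadoop/hdfs/data
mkdir -p ./hadoop/hdfs/name ./hadoop/hdfs/data
# re-format and restart
bin/hdfs namenode -format
sbin/start-dfs.sh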

get "ERROR: Can't get master address from ZooKeeper; znode data == null" when using Hbase shell

I installed Hadoop 2.2.0 and HBase 0.98.0, and here is what I do:
$ ./bin/start-hbase.sh
$ ./bin/hbase shell
2.0.0-p353 :001 > list
then I got this:
ERROR: Can't get master address from ZooKeeper; znode data == null
Why am I getting this error? Another question:
do I need to run ./sbin/start-dfs.sh and ./sbin/start-yarn.sh before I run HBase?
Also, what are ./sbin/start-dfs.sh and ./sbin/start-yarn.sh used for?
Here are some of my config files:
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://127.0.0.1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/apple/Documents/tools/hbase-tmpdir/hbase-data</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/apple/Documents/tools/hbase-zookeeper/zookeeper</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/micmiu/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.native.lib.available</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
If you just want to run HBase without going into ZooKeeper management for standalone HBase, then remove all the property blocks from hbase-site.xml except the property block named hbase.rootdir.
Now run /bin/start-hbase.sh. HBase comes with its own ZooKeeper, which gets started when you run /bin/start-hbase.sh; this will suffice if you are just trying things out for the first time. Later you can add the distributed-mode configuration for ZooKeeper.
You only need to run /sbin/start-dfs.sh for running HBase, since the value of hbase.rootdir is set to hdfs://127.0.0.1:9000/hbase in your hbase-site.xml. If you change it to some location on the local filesystem using file:///some_location_on_local_filesystem, then you don't even need to run /sbin/start-dfs.sh.
hdfs://127.0.0.1:9000/hbase says it's a location on HDFS, and /sbin/start-dfs.sh starts the namenode and datanode, which provide the underlying API for accessing the HDFS filesystem. To learn about YARN, please look at http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html.
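If you do go the local-filesystem route mentioned above, hbase.rootdir would look something like this (the path is only an example):
<property>
<name>hbase.rootdir</name>
<value>file:///Users/apple/Documents/tools/hbase-rootdir/hbase</value>
</property>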
This can also happen if the VM or the host machine is put to sleep; ZooKeeper will not stay live.
Restarting the VM should solve the problem.
You need to start ZooKeeper and then run the HBase shell:
{HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
and you may want to check this property in hbase-env.sh
# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
One quick solution could be to restart HBase:
1) stop-hbase.sh
2) start-hbase.sh
I had the exact same error. The Linux firewall was blocking connectivity. One can test ports via telnet. A quick fix is to turn off the firewall and see if it fixes it:
Completely disable the firewall on all of your nodes. Note: this command will not survive a reboot of your machines.
systemctl stop firewalld
The long-term fix is to configure the firewall to allow the HBase ports.
Note that your version of HBase may use different ports:
https://issues.apache.org/jira/browse/HBASE-10123
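A sketch of the longer-term firewalld fix (the port numbers below are the defaults for recent HBase versions; as noted above, yours may differ):
# master, master UI, regionserver, regionserver UI, zookeeper
firewall-cmd --permanent --add-port=16000/tcp
firewall-cmd --permanent --add-port=16010/tcp
firewall-cmd --permanent --add-port=16020/tcp
firewall-cmd --permanent --add-port=16030/tcp
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --reload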
The output from the HBase shell is quite high-level, and many misconfigurations can cause this message. To help yourself debug, it is much better to look at the HBase log in
/var/log/hbase
to figure out the root cause of the issue.
I had the same problem too. For me, the root cause was hadoop-kms having a port number conflict with my hbase-master. Both of them were using port 16000, so my HMaster didn't even get started when I invoked the hbase shell. After I fixed that, my HBase worked.
Again, a KMS port conflict might not be your root cause. I strongly suggest looking into /var/log/hbase to find the root cause.
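One way to spot that kind of port conflict (assuming ss or netstat is available on the machine) is to check what is already listening on the HMaster port before starting HBase:
# 16000 is the default HMaster port; adjust if yours differs
ss -ltnp | grep 16000
# or, on older systems
netstat -ltnp | grep 16000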
In my case, with the same error when running HBase, I had not included the ZooKeeper properties in hbase-site.xml and still got the above error messages (per the Apache HBase guide, only the two properties rootdir and distributed are essential).
Tracing back through the output of the jps command, I found that my HRegionServer and HMaster were not properly up and running.
After a stop and start (like a reset), I had both of them up and running and could run HBase properly.
If it's happening in VMware or VirtualBox, please restart Cloudera by the command init1, check that you have root privileges, and retry:
hbase shell
Hope it will help :)
