Hadoop: swap DataNode & NameNode without losing any HDFS data

I have a cluster of 5 machines:
1 big NameNode
4 standard DataNodes
I want to swap my current NameNode with one of the DataNodes without losing the data stored in HDFS, so that my cluster would become:
1 standard NameNode
3 standard DataNodes
1 big DataNode
Does anyone know a simple way to do this?
Thank you very much

Decommission the DataNode that the NameNode will be moved to.
Stop the cluster.
Create a tarball of dfs.name.dir on the current NameNode.
Copy all Hadoop config files from the current NN to the target NN.
In core-site.xml, replace the hostname/IP of the NameNode with that of the target NameNode.
Restore the tarball of dfs.name.dir on the target NameNode. Make sure the full path is the same.
Start the cluster with the new NameNode and one fewer DataNode.
Verify that everything is working correctly.
Add the old NameNode back to the cluster by configuring it as a DataNode.
I would suggest uninstalling and reinstalling Hadoop on both nodes so that the previous configuration does not cause any problems.
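The core-site.xml step above can be scripted. Below is a minimal Python sketch, assuming the NameNode address is stored as an hdfs://host:port URI in fs.default.name (Hadoop 1.x) or fs.defaultFS (Hadoop 2.x and later), and that only the hostname changes while the port stays the same. The function name point_to_new_namenode is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Property names that hold the NameNode URI: fs.default.name in Hadoop 1.x,
# fs.defaultFS in Hadoop 2.x and later (the old name is deprecated there).
NAMENODE_KEYS = ("fs.default.name", "fs.defaultFS")


def point_to_new_namenode(core_site_xml: str, new_host: str) -> str:
    """Return core-site.xml text with the NameNode host replaced by new_host.

    The scheme, port and path of the existing URI are kept as they are.
    """
    root = ET.fromstring(core_site_xml)
    updated = False
    for prop in root.iter("property"):
        value = prop.find("value")
        if prop.findtext("name", "").strip() in NAMENODE_KEYS and value is not None:
            old = urlsplit((value.text or "").strip())
            netloc = new_host if old.port is None else f"{new_host}:{old.port}"
            value.text = f"{old.scheme or 'hdfs'}://{netloc}{old.path}"
            updated = True
    if not updated:
        raise ValueError("no fs.default.name / fs.defaultFS property found")
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops XML comments and the XML declaration when it re-serializes, so diff the output against the original before overwriting the real file. The DataNodes' copies of core-site.xml need the same change, since they also have to find the new NameNode.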

Related

Hadoop HDFS startup fails and requires formatting

I have a multi-node standalone Hadoop cluster for HDFS. I am able to load data into HDFS; however, every time I reboot my computer and start the cluster with start-dfs.sh, I don't see the dashboard until I run hdfs namenode -format, which erases all my data.
How do I start hadoop cluster without having to go through hdfs namenode -format?
You need to shut down HDFS and the NameNode cleanly (stop-dfs.sh) before you shut down your computer. Otherwise you can corrupt the NameNode, and you will have to format it to get back to a clean state.
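To make the clean shutdown routine, the two steps can be chained so that the machine only powers off after stop-dfs.sh has exited successfully. This is a minimal Python sketch; the command lists (stop-dfs.sh on the PATH, sudo shutdown -h now) are assumptions about a typical Linux setup, and the run parameter is injectable so the ordering can be checked without Hadoop installed. clean_shutdown is a hypothetical helper.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Assumed commands for a typical Linux install with Hadoop's scripts on PATH.
STOP_HDFS = ["stop-dfs.sh"]
POWER_OFF = ["sudo", "shutdown", "-h", "now"]


def _run_checked(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(list(cmd), check=True)


def clean_shutdown(run: Optional[Callable[[Sequence[str]], object]] = None) -> List[List[str]]:
    """Stop HDFS, then power off; never powers off if stopping HDFS fails."""
    run = run or _run_checked
    executed = []
    for cmd in (STOP_HDFS, POWER_OFF):
        run(cmd)  # an exception here aborts the remaining steps
        executed.append(list(cmd))
    return executed
```

In Hadoop 2.x and later stop-dfs.sh lives in sbin/ rather than bin/, so adjust STOP_HDFS if it is not on the PATH.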

How to read Hadoop HDFS 64b compressed files from one Hadoop cluster on another Hadoop cluster

I have 64b compressed Hadoop HDFS files (FSImage, edit logs, blk_*.meta files, etc.) from one Hadoop cluster, and I want to read them on my local VM. One option I am considering:
Take a backup of my DataNode, NameNode and SecondaryNameNode.
Format the DataNode, NameNode and SecondaryNameNode.
Copy the files into their respective locations.
Map the configuration properties.
Restart HDFS.
This might be one way to do it. Please help me figure out whether there is any alternative to the above approach. My apologies that I cannot share the data.
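For the step of copying the files into their respective locations, a first pass is to sort them by the daemon that owns them. This is a minimal Python sketch assuming the usual on-disk names: fsimage*, edits*, fstime (Hadoop 1.x) and seen_txid (Hadoop 2.x) belong in the NameNode metadata directory, while blk_* block files and their .meta checksum files belong in a DataNode data directory. VERSION files exist on both sides, so they are deliberately left for manual placement. The function name classify is hypothetical.

```python
import fnmatch
from pathlib import PurePath
from typing import Dict, Iterable, List

# Assumed name patterns for HDFS on-disk files and the daemon that owns them.
ROLE_PATTERNS = {
    "namenode": ["fsimage*", "edits*", "fstime", "seen_txid"],
    "datanode": ["blk_*"],  # block files and their blk_*.meta checksums
}


def classify(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by the daemon whose storage directory they belong in."""
    result = {"namenode": [], "datanode": [], "unknown": []}
    for p in paths:
        name = PurePath(p).name
        for role, patterns in ROLE_PATTERNS.items():
            if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
                result[role].append(p)
                break
        else:
            # e.g. VERSION, which exists in both NameNode and DataNode dirs
            result["unknown"].append(p)
    return result
```

Sorting is only the first step: a DataNode whose VERSION file carries a different namespaceID from the NameNode will refuse to register with it, so the VERSION files have to travel with the directories they came from.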

How to bring down your namenode?

How do I bring down the NameNode in Hadoop 1.2.1 on CentOS and swap it with a DataNode instance? I also have to make sure no data is lost during the process.
I am using Hadoop 1.2.1 with master, slave 1 and slave 2 nodes.
I am looking for the Unix commands or the changes I need to make in the configuration files.
Please ask for any particular details if needed!
You can take a backup of the NameNode metadata and then shut down the NameNode. Install the NameNode packages on the other node of interest and put the backup copy of the metadata in the NameNode data directory (dfs.name.dir). Now start the NameNode; it should pick up your old metadata. Remember to change the NameNode details in all config files.
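The backup step can be sketched in Python. The assumptions: the path passed in is the directory configured as dfs.name.dir, it contains a current/ subdirectory, and the NameNode has already been stopped so the files do not change while they are archived. backup_name_dir and the archive naming scheme are hypothetical.

```python
import tarfile
import time
from pathlib import Path


def backup_name_dir(name_dir: str, dest_dir: str) -> Path:
    """Archive a stopped NameNode's metadata directory into dest_dir.

    Returns the path of the .tar.gz that was written.
    """
    src = Path(name_dir)
    # A formatted dfs.name.dir has a current/ subdirectory.
    if not (src / "current").is_dir():
        raise FileNotFoundError(f"{src / 'current'} not found; is this dfs.name.dir?")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = Path(dest_dir) / f"namenode-meta-{stamp}.tar.gz"
    with tarfile.open(str(out), "w:gz") as tar:
        # Store entries under the directory's own name, e.g. name/current/...
        tar.add(str(src), arcname=src.name)
    return out
```

Copy the archive to the new node and extract it so the directory ends up at exactly the path that dfs.name.dir names in that node's hdfs-site.xml.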

How to Switch between namenodes in hadoop?

Pseudo-distributed cluster:
Suppose I first created a NameNode on machine "A" with the name "Root1".
This creates an HDFS on that machine.
Then I copy some files to HDFS using copyFromLocal and run some MapReduce jobs.
Next I need to change some /conf files.
I change the config files and, to make the changes take effect, format the NameNode with the name "Root2".
If I browse HDFS now, it is empty (it does not contain the files copied earlier for "Root1").
If I want to see the old files (for "Root1"), is there any way to switch back to that HDFS or NameNode (from Root2 to Root1)?
To be clear: did you launch another NameNode on your machine?
Type sudo jps in a console, or open http://localhost:50070 in a browser, and check whether you have more than one NameNode. If there is just one, you have lost your data from HDFS. If you have two NameNodes, you can check the file system in a web browser at http://localhost:50070.
Here are instructions on how to launch more than one DataNode on one machine.
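The jps check can be scripted. This is a minimal Python sketch that counts HDFS daemons in the output of plain jps (one "<pid> <class name>" pair per line); hdfs_daemon_counts is a hypothetical helper, and capturing the text, for example with subprocess.run(["jps"], capture_output=True, text=True).stdout, is left to the caller.

```python
from collections import Counter

# Short class names printed by plain `jps` for the HDFS daemons.
HDFS_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode"}


def hdfs_daemon_counts(jps_output: str) -> Counter:
    """Count running HDFS daemons in `jps` output ("<pid> <name>" per line)."""
    counts = Counter()
    for line in jps_output.splitlines():
        parts = line.split()
        # Lines without a recognised class name are skipped.
        if len(parts) >= 2 and parts[1] in HDFS_DAEMONS:
            counts[parts[1]] += 1
    return counts
```

A result where counts["NameNode"] is greater than 1 means a second NameNode is running on the machine.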

How to remove a hadoop node from DFS but not from Mapred?

I am fairly new to Hadoop. For running some benchmarks, I need a variety of Hadoop configurations for comparison.
I want to know a method to remove a Hadoop slave from DFS (no longer running the DataNode daemon) but not from MapRed (still running the TaskTracker), or vice versa.
AFAIK, there is a single slaves file for such Hadoop nodes, not separate slaves files for DFS and MapRed.
Currently, I start both DFS and MapRed on the slave node and then kill the DataNode on that slave. But it takes a while for that node to show up under 'Dead Nodes' in the HDFS web UI. Is there a parameter I can tune to make this timeout shorter?
Thanks
Try using dfs.hosts and dfs.hosts.exclude in hdfs-site.xml, and mapred.hosts and mapred.hosts.exclude in mapred-site.xml. These allow or exclude hosts from connecting to the NameNode and the JobTracker.
Once the lists of nodes in these files have been updated, the NameNode and the JobTracker have to be refreshed with the hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes commands, respectively.
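The DFS side of this can be sketched in Python. It assumes the path passed in is the file that dfs.hosts.exclude points to in hdfs-site.xml and that the file lists one hostname per line; exclude_datanode is a hypothetical helper that only edits the file and returns the refresh command instead of running it.

```python
from pathlib import Path
from typing import List


def exclude_datanode(exclude_file: str, host: str) -> List[str]:
    """Add host to the dfs.hosts.exclude file (idempotent) and return the
    command that makes the NameNode re-read its include/exclude lists."""
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if host not in hosts:
        hosts.append(host)
        path.write_text("\n".join(hosts) + "\n")
    # Only the DFS list is touched, so the TaskTracker on host keeps running.
    return ["hadoop", "dfsadmin", "-refreshNodes"]
```

The MapReduce side is symmetric: the file named by mapred.hosts.exclude, followed by hadoop mradmin -refreshNodes.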
Instead of using the slaves file to start all processes on your cluster, you can start only the required daemons on each machine if you have only a few nodes.
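Starting only the required daemons per machine could look like this Python sketch for Hadoop 1.x, where bin/hadoop-daemon.sh start datanode and bin/hadoop-daemon.sh start tasktracker start a single daemon on the local host. The host names, the per-host plan and the start_commands helper are hypothetical, and the sketch assumes passwordless ssh and HADOOP_HOME set in the remote non-interactive shell.

```python
from typing import Dict, List

# Hypothetical per-host plan for a small Hadoop 1.x cluster: which daemons
# each slave should run instead of everything listed in the slaves file.
EXAMPLE_PLAN = {
    "slave1": ["datanode", "tasktracker"],
    "slave2": ["tasktracker"],  # MapReduce only, not part of DFS
}


def start_commands(plan: Dict[str, List[str]],
                   hadoop_bin: str = "$HADOOP_HOME/bin") -> List[List[str]]:
    """Build one ssh command per (host, daemon) pair, hosts in sorted order.

    $HADOOP_HOME is passed through literally and expanded by the remote
    shell, so it must be set for non-interactive ssh sessions there.
    """
    cmds = []
    for host in sorted(plan):
        for daemon in plan[host]:
            cmds.append(["ssh", host, f"{hadoop_bin}/hadoop-daemon.sh", "start", daemon])
    return cmds
```

Each command list can be passed to subprocess.run(cmd, check=True); the matching hadoop-daemon.sh stop <daemon> call takes a single daemon down again without touching the others.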

Resources