I have an HDFS setup running in pseudo-distributed mode. If I add an additional datanode, do I need to run the HDFS format on the new node before or after adding it as an additional slave?
Refer to this link: there is no need to format any node. Just follow the instructions at the mentioned link.
There is no command to format a datanode. After adding a datanode, you should just start it.
You can only format a namenode.
There is no need to format the namenode after adding a new datanode. You just need to refresh the nodes with:
hadoop dfsadmin -refreshNodes
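The full sequence for bringing a new datanode online might look like the sketch below. It assumes the classic Hadoop 1.x daemon scripts; adapt the script names if your distribution differs.

```shell
# On the new slave node: start the datanode daemon.
# Do NOT format anything here -- formatting applies only to the namenode.
hadoop-daemon.sh start datanode

# On the namenode: re-read the include/exclude files so the new node is picked up.
hadoop dfsadmin -refreshNodes

# Verify the new node appears in the cluster report.
hadoop dfsadmin -report
```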
I have found many solutions for backing up the metadata on the namenode, but I would like to know how to back up a datanode. Leaving the replication factor aside, what is the detailed process for backing up datanodes at production level on a 20-node cluster?
The distcp command in Hadoop can copy data from a source cluster to a target cluster.
For example:
hadoop distcp hftp://cdh57-namenode:50070/hbase hdfs://CDH59-nameservice/hbase
This command copies the hbase folder from cdh57-namenode to CDH59-nameservice.
More information can be obtained from this link:
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_admin_distcp_data_cluster_migrate.html
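For repeated migrations, distcp also supports an incremental mode. A hedged sketch (the cluster addresses and map count below are placeholders, not taken from the question):

```shell
# -update copies only files that are missing on the target or whose
# size/checksum differ; -m caps the number of parallel map tasks.
hadoop distcp -update -m 20 hdfs://source-nn:8020/hbase hdfs://target-nn:8020/hbase
```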
I'm a novice. I have a 3-node cluster. The namenode, job tracker, and secondary namenode run on one node, and the two datanodes (HData1, HData2) run on the other two nodes. If I store data from my local system into HDFS, how can I find out which node it resides on? Is there a way to explicitly specify which datanode it should be stored on?
Thanks in advance!
Yes, you can find it using hadoop fsck <path>.
You can refer to the links below:
how does hdfs choose a datanode to store
How to explicitly define datanodes to store a particular given file in HDFS?
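To see which datanodes hold each block of a file, run fsck with the -files -blocks -locations flags. The snippet below parses an illustrative, made-up fsck output (the file name, block ID, and addresses are placeholders); each bracketed list is the set of datanodes storing one block:

```shell
# Illustrative output shape of:
#   hadoop fsck /user/data.txt -files -blocks -locations
# (file name, block ID, and datanode addresses below are made up)
fsck_output='/user/data.txt 134217728 bytes, 1 block(s):  OK
0. blk_1073741825_1001 len=134217728 repl=2 [192.168.1.11:50010, 192.168.1.12:50010]'

# Pull out the datanode locations for each block.
echo "$fsck_output" | grep -o '\[[^]]*\]'
```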
Does the cluster ID of a Hadoop node change over time? If yes, when does it change, and what precautions do I need to take when using the cluster ID in Java code?
The cluster ID does not change in your landscape. It changes only when you reformat the namenode, so do not reformat the namenode.
I want to know how Hadoop handles the data on disk after the namenode has been reformatted.
The namenode stores the cluster's metadata, mapping HDFS files to the data on disk.
When the namenode is reformatted, how does Hadoop carry out the deletion of the disk data?
I will appreciate any help!
If you reformat the namenode, all the metadata in the namenode is deleted and the namenode starts fresh with a new cluster ID. All the other nodes still carry the old cluster ID, so it is always a bad idea to reformat the namenode while the cluster is active.
If you do this accidentally, you need to restore the namenode metadata from the secondary namenode.
If you want to delete the data on your datanodes, you can simply execute hdfs dfs -rm -r /
Should we copyFromLocal/put a file into HDFS before running a MapReduce job? When I ran the MapReduce example, I was taught to format HDFS on the master node and copyFromLocal files into that HDFS space on the master.
Then why do some tutorials say that the master node just returns metadata to the client, and that the laptop (client) copies the file blocks to the datanodes, not to the master? E.g. http://www.youtube.com/watch?v=ziqx2hJY8Hg at 25:50. My understanding based on that tutorial is that the file (split into blocks) is copied to the slave nodes, so we do not need to copyFromLocal/put files to the master node. I was confused. Can anybody explain where the files are copied/replicated to?
Blocks will not be copied to the master node.
The master (namenode) sends metadata to the client containing the datanode locations where the client should place each block.
No actual block data is transferred to the namenode.
I found this comic to be a good HDFS explanation.
The master node (namenode) in Hadoop deals only with metadata (the datanode <-> data mapping). It does not handle the actual files; those are stored only on the datanodes.