Why does the datanode send block location information to the namenode? - hadoop

On the page https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html there are the words:
the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.
But why is this information sent to the namenode and its standby counterpart? I thought this information was already contained in the namenode's fsimage. The namenode should know where it put the blocks.

The Name Node holds the metadata of the entire cluster: the details of each folder and file, the replication factor, block names, etc. The Name Node also keeps the location of the blocks of each file in memory; this information is not read from the fsimage but is constructed from the Block Reports sent by the Data Nodes.
Data Nodes store the following information for each block:
The actual data stored in the block
Metadata for the data stored in the block, mainly checksums of the stored data
Data Nodes periodically send heartbeats and block reports to the Name Node.
Heartbeat:
The heartbeat interval is determined by the configuration parameter dfs.heartbeat.interval (in hdfs-site.xml). By default it is set to 3 seconds.
Some of the information contained in the heartbeat is:
Registration: Data node registration information
Capacity: Total storage capacity available at Data Node
dfsUsed: Storage used by HDFS
remaining: Remaining storage available for HDFS
blockPoolUsed: Storage used by the block pool
xmitsInProgress: Number of transfers from this Data Node to others
xceiverCount: Number of active transceiver threads
cacheCapacity: Total cache capacity available at Data Node
cacheUsed: Amount of cache used
This information is used by the Name Node in the following ways:
Health of the Data Node: Should this data node be marked as dead or alive?
Registration of new Data Node: If this is a newly added Data Node, its information is registered
Update the metrics of the Data Node: The information sent in the heartbeat is used to update the metrics of the node
Issue commands to the Data Node: The Name Node can issue the following commands to the Data Node, based on the information received in the heartbeat: BlockRecoveryCommand (to recover specified blocks), BlockCommand (to transfer blocks to another Data Node or to invalidate certain blocks), Cache/Uncache (commands for caching/uncaching blocks)
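As an illustration, the cluster-wide view the Name Node builds from these heartbeats (capacity, used and remaining space per Data Node) can be queried from a client. Below is a minimal sketch using the Java FileSystem API; it assumes the default file system is HDFS (so the cast succeeds) and that fs.defaultFS points at the namenode:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DataNodeReport {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS in core-site.xml points at the namenode (e.g. hdfs://namenode:8020)
        FileSystem fs = FileSystem.get(new Configuration());
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // Each entry reflects what that datanode last reported through its heartbeats
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.printf("%s capacity=%d dfsUsed=%d remaining=%d%n",
                    dn.getHostName(), dn.getCapacity(), dn.getDfsUsed(), dn.getRemaining());
        }
        fs.close();
    }
}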
Block Reports:
The block report interval is determined by the configuration parameter dfs.blockreport.intervalMsec (in hdfs-site.xml). By default it is set to 21600000 milliseconds (6 hours).
Some of the information contained in the block report is:
Registration: Data node registration information
blocks: Information about the blocks, containing: block ID, block length, block generation timestamp, state of the block replica (e.g. the replica is finalized or waiting to be recovered)
This information is used by the Name Node for:
Processing the first block report: if it is the first report from a newly registered Data Node, the Name Node simply adds all the valid replicas and ignores the invalid blocks until the next block report.
Updating the information about blocks: the (Data Node -> blocks) map is updated in the Name Node. The new block report is compared with the old one, and the information about added, corrupted, invalidated blocks etc. is updated.
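The block-to-location map that the Name Node builds from these block reports is exactly what a client gets back when it asks where a file's blocks live. A minimal sketch with the Java FileSystem API; the path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/some/path/on/hadoop/somefile.txt"); // hypothetical path
        FileStatus status = fs.getFileStatus(file);
        // The namenode answers this from the in-memory map it built out of block reports
        for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
        fs.close();
    }
}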

The DataNodes are not directly accessible from outside the cluster; they are in a private network. A Hadoop cluster is prone to node failures, and the NameNode keeps track of all the data on the different DataNodes. So any query to the cluster is addressed to the NN, and it provides the block addresses on the DNs.


Why can't the metadata be stored in HDFS?

Why can't the metadata be stored in HDFS with a replication factor of 3? Why is it stored on the local disk?
Because the name node would take more time for resource allocation, due to the extra I/O operations involved. So it is better to store the metadata in the name node's memory.
There are multiple reasons:
If it were stored on HDFS, there would be network I/O, which would be slower.
The Name Node would depend on the Data Nodes for its metadata.
There would also need to be metadata about the metadata, so that the Name Node could identify where the metadata itself lives on HDFS.
Metadata is data about the data, such as which rack and node a block is stored on, so that it can be located. If the metadata were stored in HDFS and those datanodes failed, you would lose all your data, because you would no longer know how to reach the blocks where your data was stored.
Even if you kept the replication factor high, every change on the datanodes would have to be made both in the data node replicas and in the namenode's edit log.
With 3 replicas of the namenode metadata, every change on a datanode would first have to be applied to:
1. its own replica blocks
2. the namenode and the replicas of the namenode (the edit log would be written 3 times)
This would mean writing more data than before, but storage is not the main problem; the main problem is the time required to do all these operations.
Therefore the namenode metadata is backed up on a remote disk, so that even if your whole cluster fails (the chances are small) you can always recover your data.
To protect against namenode failure, Hadoop comes with:
Primary Namenode -> holds the namespace image and edit logs.
Secondary Namenode -> periodically merges the namespace image and edit logs so that the edit logs do not grow too large.
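To make "metadata on the local disk" concrete: the directories where the namenode persists its fsimage and edit logs are plain local paths, configured via dfs.namenode.name.dir. A minimal sketch for printing those settings with the Java Configuration API; the printed values (possibly null) depend on whether hdfs-site.xml is on the classpath and how your cluster is configured:

import org.apache.hadoop.conf.Configuration;

public class ShowMetadataDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml"); // loaded from the classpath if present
        // Local-disk directories where the namenode persists its fsimage and edit logs
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
        // Local-disk directories where each datanode keeps its block files
        System.out.println("dfs.datanode.data.dir = " + conf.get("dfs.datanode.data.dir"));
    }
}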

In the Hadoop 2.0 era, are the terms name node and data node still valid?

Hadoop 2.0 brought in YARN, which replaced the Job Tracker and Task Tracker. YARN consists of the Resource Manager (Scheduler, Application Manager, ...), Node Managers and Application Masters.
Does the terminology of data node and name node still exist in a Hadoop 2.0 environment? If so, what do they mean, what are the functions of these nodes, and who manages them? Please feel free to add any other useful information.
(PS: it might be that data node and name node are part of HDFS only and have nothing to do directly with job processing, which is handled by YARN.)
Yes, as you said, name node and data node are related to the storage layer of Hadoop (HDFS) and not to the processing layer (MapReduce/YARN). Name node and data nodes are structured in a master/slave architecture, where the name node is the master and the data nodes are the slaves. In summary, their functions are:
Name node: stores all the metadata of the file system, including file names, locations, permissions, sizes, the mapping of files to blocks, and the available blocks.
Data node: the component responsible for storing the data itself.
So when you load data into Hadoop, it is stored on the data nodes, and the corresponding metadata (file names, locations, permissions, creation dates, etc.) is stored and indexed in memory on the name node.
Pretty much, although some may call them Master/Worker. In short, the Name node is responsible for managing the file system namespace (metadata, through the EditLog and FsImage) and regulates access to files by clients. Clients contact the Name node when writing files (where to write, block size) but write the data directly onto the data nodes. Data nodes actually store the data locally.
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
And there is a Name node HA feature available, with active/hot-standby support and seamless failover (as well as Resource Manager HA).
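One simple way to see that the storage layer (HDFS) and the processing layer (YARN) are separate is that clients reach them through different configuration keys. A minimal, hypothetical sketch; the printed values depend entirely on your cluster's configuration files:

import org.apache.hadoop.conf.Configuration;

public class ShowLayerEndpoints {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("yarn-site.xml"); // loaded from the classpath if present
        // Storage layer (HDFS): clients ask the namenode named here about files and blocks
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // Processing layer (YARN): applications are submitted to the resource manager named here
        System.out.println("yarn.resourcemanager.address = " + conf.get("yarn.resourcemanager.address"));
    }
}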

Storing a file on Hadoop when not all of its replicas can be stored on the cluster

Can somebody let me know what will happen if my Hadoop cluster (replication factor = 3) is only left with 15GB of space and I try to save a file which is 6GB in size?
hdfs dfs -put 6gbfile.txt /some/path/on/hadoop
Will the put operation fail with an error (probably cluster full), or will it save two replicas of the 6GB file, mark the blocks it cannot save as under-replicated, and thereby occupy the whole of the 15GB left over?
You should be able to store the file.
It will try to accommodate as many replicas as possible. When it fails to store all the replicas, it will throw a warning but not fail. As a result, you will end up with under-replicated blocks.
The warning that you would see is
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas
Whenever you fire the put command:
The dfs utility behaves like a client here.
The client contacts the namenode first; the namenode then tells the client where to write the blocks and maintains the metadata for that file. It is then the client's responsibility to break the data into blocks as per the configured block size.
The client then makes direct connections to the different data nodes where it has to write the different blocks, as per the namenode's reply.
The first copy of the data is written by the client directly on the data nodes; subsequent copies the data nodes create on each other, with guidance from the namenode.
So you should be able to put the 6 GB file if 15 GB of space is there, because initially only the original copy is created on the cluster; the problem only arises later, once the replication process starts.
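To check afterwards whether the file really ended up with under-replicated blocks, you can compare its requested replication factor with the number of hosts actually holding each block. A minimal sketch with the Java FileSystem API, reusing the path from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/some/path/on/hadoop/6gbfile.txt");
        FileStatus status = fs.getFileStatus(file);
        short target = status.getReplication(); // requested replication factor, e.g. 3
        for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
            int actual = b.getHosts().length; // datanodes that actually hold this block
            if (actual < target) {
                System.out.printf("block at offset %d is under-replicated: %d of %d replicas%n",
                        b.getOffset(), actual, target);
            }
        }
        fs.close();
    }
}

hdfs fsck on the same path also lists under-replicated blocks, and if the missing replicas will never fit you can lower the file's replication factor, for example with fs.setReplication(file, (short) 2) or hdfs dfs -setrep 2 /some/path/on/hadoop/6gbfile.txt.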

How is data written to HDFS?

I'm trying to understand how data writing is managed in HDFS by reading the hadoop-2.4.1 documentation.
According to the following schema:
whenever a client writes something to HDFS, it has no contact with the namenode and is in charge of chunking and replication. I assume that in this case the client is a machine running an HDFS shell (or equivalent).
However, I don't understand how this is managed.
Indeed, according to the same documentation :
The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
Is the schema presented above correct? If so,
is the namenode only informed of new files when it receives a Blockreport (which can take time, I suppose)?
why does the client write to multiple nodes?
If this schema is not correct, how does file creation work in HDFS?
As you said DataNodes are responsible for serving read/write requests and block creation/deletion/replication.
They also send, on a regular basis, “HeartBeats” (a state-of-health report) and “BlockReports” (the list of blocks on the DataNode) to the NameNode.
According to this article:
Data Nodes send heartbeats to the Name Node every 3 seconds via a TCP
handshake, ... Every tenth heartbeat is a Block Report,
where the Data Node tells the Name Node about all the blocks it has.
So, according to that article, block reports are sent every 30 seconds; I don't think this affects Hadoop jobs, because in general they are independent jobs.
For your question:
why does the client write to multiple nodes ?
I'd say that actually the client writes to just one datanode and tells it to send the data on to the other datanodes (see the linked picture: CLIENT START WRITING DATA), but this is transparent. That's why your schema considers the client to be the one writing to multiple nodes.
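To make the write path concrete, here is a minimal sketch of a client writing a file through the Java FileSystem API; the path and contents are made up. create() asks the namenode for the file entry and block placements, and the stream the client writes to pipelines each block through the chosen datanodes, so none of the replication is visible in client code:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteToHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration()); // talks to the namenode from fs.defaultFS
        Path out = new Path("/user/demo/hello.txt"); // hypothetical path
        // create() asks the namenode for the file entry and block placements;
        // the returned stream pipelines each block through the chosen datanodes
        try (FSDataOutputStream stream = fs.create(out, true /* overwrite */)) {
            stream.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}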

Where input data gets stored initially?

The first step in MapReduce is to copy the input file(s) to HDFS.
I want to know where exactly this gets stored: on the name node, on data nodes, or somewhere else?
When we say copy to HDFS, where exactly do we store the input files initially?
(I know that later we split them and store them on data nodes.)
Or do we directly copy chunks from the source/input machine to the data nodes? (I am sure that is not the case.)
Putting files into HDFS is a coordinated effort between the client, the Name node and the Data nodes. At a very high level, the client talks to the name node to identify the data nodes where the file needs to be stored; the client then sends each block to the first data node and transfers the data, and the subsequent transfers for replicating that particular block happen from that data node.
Read the detailed protocol from here.
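In other words, the file's bytes never land on the name node: they are split into blocks and streamed to data nodes, while only the metadata goes to the name node. A minimal sketch of the "copy input to HDFS" step with the Java FileSystem API, equivalent to hdfs dfs -put; both paths are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyInputToHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path local = new Path("/tmp/input.txt");        // file on the client machine (example)
        Path remote = new Path("/user/demo/input.txt"); // destination in HDFS (example)
        // The bytes are split into blocks and streamed to datanodes chosen by the name node;
        // only the metadata (file name, blocks, locations, permissions) ends up on the name node
        fs.copyFromLocalFile(local, remote);
        fs.close();
    }
}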
