Why can't the metadata be stored in HDFS - hadoop

Why can't the metadata be stored in HDFS with a replication factor of 3? Why is it stored on the local disk instead?

Because the NameNode would need several I/O operations every time it allocated resources, which would slow it down. So it's better to store the metadata in the NameNode's memory.

There are multiple reasons:
If the metadata were stored on HDFS, every lookup would involve network I/O, which is much slower than a memory access.
The NameNode would depend on the DataNodes for its own metadata.
And there is a chicken-and-egg problem: the NameNode would need metadata about the metadata, so that it could identify where the metadata lives on HDFS.

Metadata is data about the data, such as which rack and which DataNode a block is stored on, so that the block can be located. If the metadata were stored in HDFS and the DataNodes holding it failed, you would lose all your data, because you would no longer know how to reach the blocks where your data was stored.
Even with a high replication factor, every change on a DataNode would have to be propagated to the replicas of the metadata as well as to the NameNode's edit log.
With 3 replicas of the metadata, every change on a DataNode would first have to be applied to:
1. Its own replica blocks
2. The metadata on the NameNode and its replicas (the edit log would be written 3 times)
This means writing more data than before. But storage is not the main problem; the main problem is the time all of these operations would take.
Therefore the NameNode's metadata is backed up to a remote disk, so that even if your whole cluster fails (unlikely as that is), you can still recover your data.
To protect against NameNode failure, Hadoop comes with:
Primary Namenode -> holds the namespace image and edit logs.
Secondary Namenode -> periodically merges the namespace image and edit logs so that the edit logs don't become too large.
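
To make "metadata" concrete, here is a minimal Java sketch (the path is hypothetical, and a standard Hadoop client classpath is assumed) that asks the NameNode for exactly this kind of metadata: the block locations of a file. The NameNode answers from its in-memory block map; no DataNode is contacted for the lookup itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationLookup {
    public static void main(String[] args) throws Exception {
        // Connects to the NameNode configured in core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical path, for illustration only
        Path file = new Path("/some/path/on/hadoop/data.txt");
        FileStatus status = fs.getFileStatus(file);

        // Pure metadata: the NameNode returns this from memory
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}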

From where does the namenode get information about the datanodes

When saving a file to HDFS, it splits the file into blocks, stores them accordingly, and records the change in the edit log; that much is clear.
My question is: when I send a read request to the NameNode, where does it look up the DataNode details?
From the fsimage or the edit log?
If it reads from the fsimage, a new fsimage is only generated at one-hour intervals.
What would happen if I issued the request before that interval had elapsed?
Let's break down where each bit of information about the filesystem is stored on the NameNode.
The filesystem namespace (hierarchy of directories and files) is stored entirely in memory on the NameNode. There is no on-disk caching. Everything is in memory at all times. The FsImage is used only for persistence in case of failure. It is read only on startup. The EditLog stores changes to the FsImage; again, the EditLog is read only on startup. The active NameNode will never read an FsImage or EditLog during normal operation. However, a BackupNode or Standby NameNode (depending on your configuration) will periodically combine new EditLog entries with an old FsImage to produce a new FsImage. This is done to make startup more rapid and to reduce the size of on-disk data structures (if no compaction was done, the size of the EditLog would grow indefinitely).
The namespace discussed above includes the mapping from a file to the blocks contained within that file. This information is persisted in the FsImage / EditLog. However, the location of those blocks is not persisted into the FsImage. This information lives only transiently in the memory of the NameNode. On startup, the location of the blocks is reconstructed using the block reports received from all of the DataNodes. Each DataNode essentially tells the NameNode, "I have block ID AAA, BBB, CCC, ..." and so on, and the NameNode uses these reports to construct the location of all blocks.
To answer your question simply, when you request a read operation from the NameNode, all information is read from memory. Disk I/O is only performed on a write operation, to persist the change to the EditLog.
Primary Source: HDFS Architecture Guide; also I am a contributor to HDFS core code.
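
As a rough illustration of that read/write asymmetry, here is a small Java sketch (a sketch only; the paths are hypothetical). The listing call is answered entirely from the NameNode's in-memory namespace, while the mkdirs call is a namespace mutation that the NameNode persists to the EditLog before acknowledging it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadVsWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Read: served from the NameNode's in-memory namespace, no disk I/O
        for (FileStatus st : fs.listStatus(new Path("/some/path"))) {
            System.out.println(st.getPath());
        }

        // Write: the NameNode appends this namespace change to its EditLog
        // (disk I/O) before the call returns, then updates its in-memory state
        fs.mkdirs(new Path("/some/path/new-dir"));

        fs.close();
    }
}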

Understanding how HBase uses HDFS

I'm trying to understand how HBase uses HDFS.
Here is what I understand (please correct me if I'm wrong):
I know that HBase uses HDFS to store data, and that the data is split into regions, and that each region server may serve many regions. So I guess that one region (exclusively) may communicate with many DataNodes to get and put data. If that is correct, then if that region server fails, the data stored on those DataNodes will not be accessible anymore.
thank you in advance :)
In general, a Regionserver runs on a datanode.
Due to how HDFS works, the Regionserver will perform its reads and writes to the local datanode when possible, and then HDFS will ensure that the data is replicated onto two other random datanodes. So at all times, the data written by that regionserver is stored on 3 nodes in HDFS.
While a regionserver is serving a region, only it will read / write the data for that region, but if the regionserver process crashes, the HBase master will select another regionsever to serve that region. The data will be unavailable for a few minutes, but HBase will recover quickly.
If the entire host fails, then since HDFS ensured the data was written onto two other nodes, the scenario is the same: the master will select a new regionserver to open the failed region, and the data will not be lost.
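
To see why this failover is invisible to applications, consider a minimal HBase client sketch (the table, row key, and column names below are hypothetical). The client library looks up which regionserver currently serves the region via the hbase:meta table, and if that regionserver has crashed and the master has reassigned the region, it simply re-resolves and retries.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionFailoverClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
            // The caller never learns which regionserver answered, or
            // whether the region moved between retries
            Result result = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println("value = " + Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}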

Hive Tables in multiple nodes - Processing

I have a conceptual doubt in Hive. I know that Hive is a data warehouse tool that runs on top of Hadoop, and that Hadoop has a distributed file system, HDFS.
Suppose I have one master and three slaves, and I have created a table employees in HiveQL. The table is so huge that it can't be stored on one machine, so it must be spread across all four machines. How can I load such data? Should it be done manually? Or can I just type "LOAD DATA ..." on the master and it will automatically get distributed among all the machines?
Hive uses HDFS as its warehouse to store the data, so the HDFS storage concepts apply here.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
Please refer to the HDFS architecture guide for more detail.
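
To answer the "manually or automatically" part directly: distribution is automatic. Whether you run Hive's LOAD DATA or write into the warehouse directory yourself, HDFS splits the file into blocks and the NameNode places them across the DataNodes. A minimal Java sketch of the same idea at the HDFS level (the local path is hypothetical; /user/hive/warehouse is the typical default Hive warehouse location):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadIntoWarehouse {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // One logical copy: HDFS splits the file into blocks, and the
        // NameNode decides which DataNodes hold each block and its replicas.
        // Nothing is placed manually, no matter how many slaves there are.
        fs.copyFromLocalFile(
                new Path("/local/path/employees.csv"),
                new Path("/user/hive/warehouse/employees/"));

        fs.close();
    }
}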

Storing a file on Hadoop when not all of its replicas can be stored on the cluster

Can somebody let me know what will happen if my Hadoop cluster (replication factor = 3) has only 15 GB of space left and I try to save a file that is 6 GB in size?
hdfs dfs -put 6gbfile.txt /some/path/on/hadoop
Will the put operation fail with an error (probably "cluster full"), or will it save two replicas of the 6 GB file, mark the blocks it cannot store as under-replicated, and thereby occupy the whole of the 15 GB that is left?
You should be able to store the file.
It will try to accommodate as many replicas as possible. When it fails to store all the replicas, it throws a warning but does not fail. As a result, you will end up with under-replicated blocks.
The warning that you would see is
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas
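
One way to confirm this afterwards is to compare the replication factor the file asked for with the replicas the NameNode actually placed; hdfs fsck reports the same condition as under-replicated blocks. A hedged Java sketch (the path is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnderReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/some/path/on/hadoop/6gbfile.txt"));

        short wanted = status.getReplication(); // e.g. 3
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            int actual = block.getHosts().length; // replicas actually placed
            if (actual < wanted) {
                System.out.println("Under-replicated block at offset "
                        + block.getOffset() + ": " + actual + "/" + wanted);
            }
        }
        fs.close();
    }
}
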
Whenever you fire the put command:
The dfs utility behaves like a client here.
The client contacts the NameNode first; the NameNode then tells the client where to write the blocks and maintains the metadata for that file. It is the client's responsibility to break the data into blocks according to the configured block size.
The client then makes a direct connection to the different DataNodes where it has to write the blocks, as per the NameNode's reply.
The first copy of the data is written by the client directly to the DataNodes; the subsequent copies the DataNodes create on each other, with guidance from the NameNode.
So you should be able to put the 6 GB file if 15 GB of space is there, because the original copy is written first; the problem only arises later, once replication starts.
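
This flow is exactly what the FileSystem API triggers under the hood. A minimal sketch (the path is hypothetical) that makes the requested replication and block size explicit:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PipelinedWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // create() asks the NameNode where to write; the client then streams
        // each block to the first DataNode, which forwards it along the
        // replication pipeline to the others.
        short replication = 3;
        long blockSize = 128L * 1024 * 1024; // 128 MB, the common default
        try (FSDataOutputStream out = fs.create(
                new Path("/some/path/on/hadoop/6gbfile.txt"),
                true,   // overwrite
                4096,   // buffer size
                replication,
                blockSize)) {
            out.writeBytes("example payload\n");
        }
        fs.close();
    }
}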

Where does Name node store fsImage and edit Log?

I am a java programmer, learning Hadoop.
I read that the NameNode in HDFS stores its information in two files, namely the fsImage and the editLog. On startup it reads this data from disk and performs a checkpoint operation.
But in many places I also read that the NameNode stores the data in RAM, and that this is why Apache recommends a machine with plenty of RAM for the NameNode server.
Please enlighten me on this.
What data does it store in RAM & where does it store fsImage and edit Log ?
Sorry if I asked anything obvious.
Let me first answer
What data does it store in RAM & where does it store fsImage and edit Log ?
In RAM -- the file-to-block and block-to-DataNode mappings.
In persistent storage (which includes both the edit log and the fsimage) -- file-related metadata (permissions, names, and so on).
Regarding the storage location of the fsimage and editlog, #mashuai's answer is spot on.
For a more detailed discussion you can read up on this.
When the NameNode starts, it loads the fsimage from persistent storage (disk); its location is specified by the property dfs.name.dir (Hadoop 1.x) or dfs.namenode.name.dir (Hadoop 2.x) in hdfs-site.xml. The fsimage is loaded into main memory, and, as you noted, the NameNode performs a checkpoint operation during startup. The NameNode keeps the fsimage in RAM in order to serve requests fast.
Apart from the initial checkpoint, subsequent checkpoints can be controlled by tuning the following parameters in hdfs-site.xml:
dfs.namenode.checkpoint.period # in seconds; 3600 s by default
dfs.namenode.checkpoint.txns # number of NameNode transactions between checkpoints
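
As a quick illustration, a small Java sketch that reads these two knobs from the loaded configuration (the fallback values passed to getLong are the stock defaults):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckpointSettings {
    public static void main(String[] args) {
        // HdfsConfiguration also pulls in hdfs-site.xml
        Configuration conf = new HdfsConfiguration();

        long periodSecs = conf.getLong("dfs.namenode.checkpoint.period", 3600);
        long txns = conf.getLong("dfs.namenode.checkpoint.txns", 1000000);

        System.out.println("Checkpoint every " + periodSecs + " s or every "
                + txns + " transactions, whichever comes first.");
    }
}
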
It stores the fsimage and editlog in dfs.name.dir, which is set in hdfs-site.xml. When you start the cluster, the NameNode loads the fsimage and editlog into memory.
When the NameNode starts, it goes into safe mode. It loads the FSImage from persistent storage and replays the edit logs to create an updated view of the HDFS storage (the FILE TO BLOCK mapping). Then it writes this updated FSImage back to persistent storage. Now the NameNode waits for block reports from the DataNodes; from these block reports it creates the BLOCK TO DATA NODE mapping. When the NameNode has received a certain threshold of block reports, it leaves safe mode and can start serving client requests. Whenever a client changes any metadata, the NameNode (NN) first writes the change to an edit log segment, with an increasing transaction ID, on persistent storage (hard disk). Then it updates the FSImage held in its RAM.
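
You can observe this safe-mode phase from a client. A hedged sketch, assuming the Hadoop 2.x Java API (SAFEMODE_GET only queries the state; it does not change it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // true while the NameNode is still replaying logs and
            // waiting for enough block reports
            boolean safe = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
            System.out.println(safe
                    ? "NameNode is in safe mode"
                    : "NameNode is serving client requests normally");
        }
        fs.close();
    }
}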
