Is there a programmatic interface to find out, given a block ID, which data node the block resides on?
I.e. the ability to read the fsImage and return this info.
One crude way I know of is to look for a file named after the block in the DFS data dir.
This is, however, an O(n) solution, and I am pretty sure there must be an O(1) solution.
Similar to "how to find file from blockName in HDFS hadoop": there is no public interface to the NameNode that lets you look up information from a block ID (only by file name).
You can look at opening the fsImage, but this will only give you a mapping between file names and block IDs, as the actual locations (DataNodes) that host the blocks are not stored in this file - the DataNodes tree-walk their data directories and report to the NameNode which blocks they have.
I guess if you could attach a debugger to the NameNode, you might be able to inspect the block map, but because there is no map from block ID to filename, it's still going to be an O(n) operation.
I used the 'hdfs oiv' command to dump the fsimage into an XML file.
hdfs oiv -p XML -i /../dfs/nn/current/fsimage_0000000003132155181 -o fsimage.out
Based on my understanding, the fsimage is supposed to store the "blockmap", i.e. how the files were broken into blocks and where each block is stored. However, here is what an inode record looks like in the output file.
<inode>
<id>37749299</id>
<type>FILE</type>
<name>a4467282506298f8-e21f864f16b2e7c1_468511729_data.0.</name>
<replication>3</replication>
<mtime>1442259468957</mtime>
<atime>1454539092207</atime>
<perferredBlockSize>134217728</perferredBlockSize>
<permission>impala:hive:rw-r--r--</permission>
<blocks>
<block>
<id>1108336288</id>
<genstamp>35940487</genstamp>
<numBytes>16187048</numBytes>
</block>
</blocks>
</inode>
However, I was expecting something like the HDFS path to a file, how that file was broken down into smaller pieces, and where each piece is stored (which machine, which local filesystem path, etc.).
Is there a mapping anywhere on the name server containing:
the HDFS path to inode mapping
the blockid to local file system path / disk location mapping?
A bit late, but I am looking into this now and stumbled across your question.
First of all, a bit of context.
(I am working with Hadoop 2.6)
The NameNode is responsible for maintaining the INodes, an in-memory representation of the (virtual) filesystem structure, while the blocks themselves are maintained by the DataNodes. I believe there are several reasons why the NameNode does not store the rest of the information, such as links to the DataNodes holding the data, within each INode:
It would require more memory to represent all that information (memory is the resource that actually limits the number of files that can be written to an HDFS cluster, since the whole structure is kept in RAM for faster access).
It would put more load on the NameNode, for example when a file is moved from one node to another, or a new node is installed and the file needs to be replicated to it. Each time this happened, the NameNode would need to update its state.
Flexibility: the INode is an abstraction, so adding the link would bind it to a particular technology and communication protocol.
Now coming back to your questions:
The fsimage file already contains the mapping to the HDFS path. If you look more carefully at the XML, each INode, regardless of its type, has an ID (in your case it is 37749299). Further down in the file you can find the <INodeDirectorySection> section, which holds the mapping between parents and children, and it is this ID field that is used to express the relation. Combined with the <name> element, this lets you reconstruct the structure you see, for example, in the HDFS explorer.
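To illustrate the parent/child lookup described above, here is a minimal sketch that rebuilds a path from two maps one could fill while parsing the XML: inode ID to name, and child ID to parent ID. The class name, method name, and the directory IDs (1, 100, 200) and names are hypothetical; only the file inode 37749299 and its name come from the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class INodePathSketch {
    /** Walks child -> parent links up to the root and joins the names into a path. */
    public static String resolvePath(long inodeId,
                                     Map<Long, String> names,
                                     Map<Long, Long> parentOf) {
        Deque<String> parts = new ArrayDeque<>();
        Long current = inodeId;
        // The root has no entry in parentOf, so the loop stops there.
        while (current != null && parentOf.containsKey(current)) {
            parts.addFirst(names.get(current));
            current = parentOf.get(current);
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        Map<Long, String> names = new HashMap<>();
        Map<Long, Long> parentOf = new HashMap<>();
        // Hypothetical directory inodes; only 37749299 is taken from the question.
        names.put(1L, "");
        names.put(100L, "user");
        names.put(200L, "hive");
        names.put(37749299L, "a4467282506298f8-e21f864f16b2e7c1_468511729_data.0.");
        parentOf.put(100L, 1L);
        parentOf.put(200L, 100L);
        parentOf.put(37749299L, 200L);
        System.out.println(resolvePath(37749299L, names, parentOf));
    }
}
```

With these made-up directories the sketch prints /user/hive/a4467282506298f8-e21f864f16b2e7c1_468511729_data.0. ; on a real image the directory names would of course come from the parsed inode records.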
Furthermore, you have the <blocks> section, which contains the block ID (in your case it is 1108336288). If you look carefully into the Hadoop sources, you can find the method idToBlockDir in DatanodeUtil, which gives you a hint about how the files are organized on disk and how the block ID mapping is performed.
Basically, the original ID is shifted right twice (by 16 and by 8 bits), and each result is masked to its lowest 8 bits:
int d1 = (int)((blockId >> 16) & 0xff);
int d2 = (int)((blockId >> 8) & 0xff);
And the final directory is built using obtained values:
String path = DataStorage.BLOCK_SUBDIR_PREFIX + d1 + SEP + DataStorage.BLOCK_SUBDIR_PREFIX + d2;
The block itself is stored in that directory, in a file that uses the blk_<block_id> naming format.
I am not a Hadoop expert, so if someone who understands this better could correct any flaws in my logic, please do so. Hope this helps.
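As a worked example, the same shift-and-mask logic can be run standalone on the block ID from the question. This is a sketch, not the Hadoop class: it assumes that DataStorage.BLOCK_SUBDIR_PREFIX is "subdir" and uses "/" for SEP, and the resulting path is relative to whatever root directory the DataNode passes to idToBlockDir.

```java
public class BlockDirSketch {
    // Assumed value of DataStorage.BLOCK_SUBDIR_PREFIX.
    static final String BLOCK_SUBDIR_PREFIX = "subdir";
    // Assumed separator; Hadoop uses the platform file separator.
    static final String SEP = "/";

    /** Two-level directory derived from bits 16-23 and 8-15 of the block ID. */
    public static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return BLOCK_SUBDIR_PREFIX + d1 + SEP + BLOCK_SUBDIR_PREFIX + d2;
    }

    /** Block file name in the blk_<block_id> format. */
    public static String blockFileName(long blockId) {
        return "blk_" + blockId;
    }

    public static void main(String[] args) {
        long blockId = 1108336288L; // 0x420FDEA0
        System.out.println(idToBlockDir(blockId) + SEP + blockFileName(blockId));
        // prints subdir15/subdir222/blk_1108336288
    }
}
```

Since 1108336288 is 0x420FDEA0, the two masked bytes are 0x0F (15) and 0xDE (222), which is where subdir15/subdir222 comes from. Given a block ID, this makes the on-disk lookup a direct path computation rather than a directory scan.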
Consider I have a single File which is 300MB. The block size is 128MB.
So the input file is divided into the following chunks and placed in HDFS.
Block1: 128MB
Block2: 128MB
Block3: 44MB.
Does each block's data contain byte-offset information?
That is, do the blocks have the following offset information?
Block1: 0-128MB of the file
Block2: 128-256MB of the file
Block3: 256-300MB of the file
If so, how can I get the byte-offset information for Block2 (that is, that it starts at 128MB) in Hadoop?
This is for understanding purposes only. Any hadoop command-line tools to get this kind of meta data about the blocks?
EDIT
If the byte-offset info is not present, a mapper performing its map job on a block will start consuming lines from the beginning. If the offset information is present, the mapper will skip till it finds the next EOL and then starts processing the records.
So I guess byte offset information is present inside the blocks.
Disclaimer: I might be wrong on this one; I have not read that much of the HDFS source code.
Basically, DataNodes manage blocks, which are just large blobs to them. They know the block ID, but that's it. The NameNode knows everything, in particular the mapping between a file path and all the block IDs of that file, and where each block is stored. Each block can be stored in one or more locations, depending on its replication settings.
I don't think you will find a public API to get the information you want from a block ID, because HDFS does not need to do the mapping in that direction. Conversely, you can easily find the blocks of a file and their locations. You can try exploring the source code, especially the blockmanager package.
If you want to learn more, this article about the HDFS architecture could be a good start.
You can run hdfs fsck /path/to/file -files -blocks to get the list of blocks.
A Block does not contain offset info, only its length. But you can use LocatedBlocks to get all the blocks of a file, and from that you can easily reconstruct the offset at which each block starts.
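The reconstruction mentioned above is plain arithmetic. The sketch below assumes that every block except the last is completely full, which is the usual case for a file written in one go; the class and method names are made up for illustration. On a real cluster, the public FileSystem.getFileBlockLocations call returns BlockLocation objects that expose the same offset and length per block.

```java
public class BlockOffsetSketch {
    /** Returns one {offset, length} pair per block, assuming all but the last block are full. */
    public static long[][] blockRanges(long fileLength, long blockSize) {
        int count = (int) ((fileLength + blockSize - 1) / blockSize); // ceiling division
        long[][] ranges = new long[count][2];
        for (int i = 0; i < count; i++) {
            long offset = i * blockSize;
            ranges[i][0] = offset;
            ranges[i][1] = Math.min(blockSize, fileLength - offset);
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // The 300MB file with 128MB blocks from the question.
        long[][] ranges = blockRanges(300 * mb, 128 * mb);
        for (int i = 0; i < ranges.length; i++) {
            System.out.println("Block" + (i + 1)
                    + ": offset=" + ranges[i][0] / mb + "MB"
                    + ", length=" + ranges[i][1] / mb + "MB");
        }
    }
}
```

For the example file this yields three blocks starting at 0MB, 128MB and 256MB, with lengths of 128MB, 128MB and 44MB.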
Introduction
Follow-up question to this question.
A File has been provided to HDFS and has been subsequently replicated to three DataNodes.
If the same file is going to be provided again, HDFS indicates that the file already exists.
Based on this answer, a file will be split into blocks of 64MB (depending on the configuration settings). A mapping of the filename to the blocks will be created in the NameNode. The NameNode knows on which DataNodes the blocks of a certain file reside. If the same file is provided again, the NameNode knows that blocks of this file exist on HDFS and will indicate that the file already exists.
If the content of a file is changed and provided again does the NameNode update the existing file or is the check restricted to mapping of filename to blocks and in particular the filename? Which process is responsible for this?
Which process is responsible for splitting a file into blocks?
Example Write path:
According to this documentation the Write Path of HBase is as follows:
Possible Write Path HDFS:
file provided to HDFS e.g. hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /
FileName checked in FSImage whether it already exists. If this is the case the message file already exists is displayed
file split into blocks of 64MB (depending on configuration setting). Question: Name of the process which is responsible for block splitting?
blocks replicated on DataNodes (replication factor can be configured)
Mapping of FileName to blocks (MetaData) stored in EditLog located in NameNode
Question
What does the HDFS write path look like?
If the content of a file is changed and provided again does the NameNode update the existing file or is the check restricted to mapping of filename to blocks and in particular the filename?
No, it does not update the file. The name node only checks if the path (file name) already exists.
What does the HDFS write path look like?
This is explained in detail in this paper: "The Hadoop Distributed File System" by Shvachko et al. In particular, read Section 2.C (and check Figure 1):
"When a client writes, it first asks the NameNode to choose DataNodes to host replicas of the first block of the file. The client organizes a pipeline from node-to-node and sends the data. When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block. A new pipeline is organized, and the client sends the further bytes of the file. Choice of DataNodes for each block is likely to be different. The interactions among the client, the NameNode and the DataNodes are illustrated in Fig. 1."
NOTE: A book chapter based on this paper is available online too. A direct link to the corresponding figure (Fig. 1 in the paper and 8.1 in the book) is here.
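To make the quoted description concrete, here is a toy simulation of the client-side loop: the client fills one block at a time and asks for a fresh set of DataNodes for each block. Everything in it is an assumption made for illustration. In particular, chooseTargets is a hypothetical stand-in for the NameNode that simply picks nodes round-robin, whereas the real placement policy is rack-aware, and no data is actually transferred.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WritePathSketch {
    /** Stand-in for the NameNode: picks DataNodes for one block (round-robin, an assumption). */
    static List<String> chooseTargets(List<String> dataNodes, int replication, int blockIndex) {
        List<String> pipeline = new ArrayList<>();
        for (int r = 0; r < replication; r++) {
            pipeline.add(dataNodes.get((blockIndex + r) % dataNodes.size()));
        }
        return pipeline;
    }

    /** Client side: one pipeline request per block until the whole file is covered. */
    public static List<String> planWrite(long fileLength, long blockSize,
                                         List<String> dataNodes, int replication) {
        List<String> plan = new ArrayList<>();
        int blockIndex = 0;
        for (long written = 0; written < fileLength; written += blockSize) {
            long length = Math.min(blockSize, fileLength - written);
            List<String> pipeline = chooseTargets(dataNodes, replication, blockIndex);
            plan.add("block " + blockIndex + " (" + length + " bytes) -> "
                    + String.join(" -> ", pipeline));
            blockIndex++;
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical DataNode names and a tiny file, to keep the output readable.
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        for (String line : planWrite(300, 128, nodes, 3)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is only the shape of the loop: the splitting into blocks happens on the client as it writes, and each block gets its own pipeline, so consecutive blocks of the same file usually end up on different DataNodes.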
In HDFS, the blocks are distributed among the active nodes/slaves. The content of the blocks is simple text, so is there any way to read or access the blocks present on each data node?
As an entire file or to read a single block (say block number 3) out of sequence?
You can read the file via various mechanisms including the Java API but you cannot start reading in the middle of the file (for example at the start of block 3).
Hadoop reads a block of data and feeds each line to the mapper for further processing. Also, the Hadoop client gets the blocks belonging to a file from different DataNodes before concatenating them. So it should be possible to get the data from a particular block.
Hadoop Client might be a good place to start with to look at the code. But, HDFS provides file system abstraction. Not sure what the requirement would be for reading the data from a particular block.
Assuming you have ssh access (and appropriate permissions) to the datanodes, you can cd to the path where the blocks are stored and read the blocks stored on that node (e.g., do a cat blk_XXXX). The configuration parameter that tells you where the blocks are stored is dfs.datanode.data.dir, which defaults to file://${hadoop.tmp.dir}/dfs/data. More details here.
Caveat: the block names are coded by HDFS depending on their internal block ID. Just by looking at their names, you cannot know to which file a block belongs.
Finally, I assume you want to do this for debugging purposes or just to satisfy your curiosity. Normally, there is no reason to do this and you should just use the HDFS web-UI or command-line tools to look at the contents of your files.
I use the command line and I want to know from which host I get a file (or which replica I get).
Normally it should be the nearest to me. But I changed a policy for the project. Thus I want to check the final results to see if my new policy works correctly.
Following command does not give any information:
hadoop dfs -get /file
And the next one gives me only the replica's position, but not which one is preferred for the get:
hadoop fsck /file -files -blocks -locations
HDFS abstracts this information away, as it is not very useful for users to know where they are reading from (the filesystem is designed to stay out of your way as much as possible). Typically, the DFSClient picks up the data in the order of the hosts returned to it (moving on to an alternative in case of a failure). The hosts returned to it are sorted by the NameNode for appropriate data or rack locality, and that is how the default scenario works.
While the proper answer for your question would be to write good test cases that can both simulate and assert this, you can also run your program with the Hadoop logger set to DEBUG, to check the IPC connections made to various hosts (including DNs) when reading the files - and go through these to assert manually that your host-picking is working as intended.
Another way would be to run your client through a debugger and observe the parts around the connections made finally to retrieve data (i.e. after NN RPCs).
Thanks,
In the end, we used network statistics with a simple test case to find out which replicas Hadoop reads from.
But the easiest way is to print the array of nodes modified by this method:
org.apache.hadoop.net.NetworkTopology pseudoSortByDistance( Node reader, Node[] nodes )
As we expected, the choice of replica for a get is based on the result of this method. The first items are preferred. Normally the first item is taken, unless there is an error with that node. For more information about this method, see Replication
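To show the kind of ordering being checked here, below is a simplified model of locality-based replica sorting. It is not the Hadoop implementation of pseudoSortByDistance: the Node class, the method name and the tie-breaking are assumptions. It only moves a replica on the reader's own host to the front, or failing that the first replica on the reader's rack, and leaves the rest in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplicaOrderSketch {
    /** Hypothetical node description: host name plus rack path. */
    public static final class Node {
        final String host;
        final String rack;
        public Node(String host, String rack) { this.host = host; this.rack = rack; }
        @Override public String toString() { return host + "(" + rack + ")"; }
    }

    /** Returns a copy of replicas with the closest candidate swapped to index 0. */
    public static List<Node> sortForReader(Node reader, List<Node> replicas) {
        List<Node> ordered = new ArrayList<>(replicas);
        int best = -1;
        for (int i = 0; i < ordered.size(); i++) {
            Node n = ordered.get(i);
            if (n.host.equals(reader.host)) { best = i; break; }   // local replica wins outright
            if (best < 0 && n.rack.equals(reader.rack)) best = i;  // else first same-rack replica
        }
        if (best > 0) Collections.swap(ordered, 0, best);
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical cluster layout.
        List<Node> replicas = Arrays.asList(
                new Node("dn1", "/rack1"), new Node("dn2", "/rack2"), new Node("dn3", "/rack1"));
        System.out.println(sortForReader(new Node("client", "/rack2"), replicas));
        System.out.println(sortForReader(new Node("dn3", "/rack1"), replicas));
    }
}
```

Printing the list before and after such a call, as suggested above for the real method, is a quick way to see which replica a custom policy would put first.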