I used the 'hdfs oiv' command to dump the fsimage into an XML file.
hdfs oiv -p XML -i /../dfs/nn/current/fsimage_0000000003132155181 -o fsimage.out
Based on my understanding, fsimage is supposed to store the "blockmap": how the files are broken into blocks and where each block is stored. However, here is what an inode record looks like in the output file.
<inode>
<id>37749299</id>
<type>FILE</type>
<name>a4467282506298f8-e21f864f16b2e7c1_468511729_data.0.</name>
<replication>3</replication>
<mtime>1442259468957</mtime>
<atime>1454539092207</atime>
<perferredBlockSize>134217728</perferredBlockSize>
<permission>impala:hive:rw-r--r--</permission>
<blocks>
<block>
<id>1108336288</id>
<genstamp>35940487</genstamp>
<numBytes>16187048</numBytes>
</block>
</blocks>
</inode>
However, I was expecting something like: the HDFS path to a file, how that file was broken down into smaller pieces, and where each piece is stored (which machine, which local fs path, etc.).
Is there a mapping anywhere on the name server containing:
the HDFS path to inode mapping
the blockid to local file system path / disk location mapping?
A bit late, but I am looking into this myself now and stumbled across your question.
First of all, a bit of context.
(I am working with Hadoop 2.6)
The Name server is responsible for maintaining the INodes, which are the in-memory representation of the (virtual) filesystem structure, while the blocks are maintained by the data nodes. I believe there are several reasons for the Name node not to keep the rest of the information, such as the links to the data nodes where the data is stored, inside each INode:
It would require more memory to represent all that information (memory is the resource that actually limits the number of files which can be written into an HDFS cluster, since the whole structure is kept in RAM for faster access).
It would put more workload on the name node, for example when a file is moved from one node to another, or a new node is installed and the file needs to be replicated to it. Each time that happened, the Name node would need to update its state.
Flexibility: since the INode is an abstraction, adding such a link would bind it to a particular technology and communication protocol.
Now coming back to your questions:
The fsimage file already contains the mapping to the HDFS path. If you look more carefully at the XML, each INode, regardless of its type, has an ID (in your case it is 37749299). If you look further in the file, you can find the section <INodeDirectorySection>, which holds the mapping between parents and children, and it is this ID field which is used to establish the relation. Through the <name> attribute you can then easily reconstruct the structure you see, for example, in the HDFS explorer.
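To make that concrete, here is a small hypothetical sketch (not Hadoop code; the IDs, names and maps are made up, and the actual parsing of the XML is left out) showing how full HDFS paths can be rebuilt once you have extracted two maps from the fsimage XML: inode ID to name, and child ID to parent ID.

import java.util.HashMap;
import java.util.Map;

public class PathFromInodes {
    // Walk from an inode up to the root, prepending each name on the way.
    static String buildPath(long id, Map<Long, String> names, Map<Long, Long> parents) {
        StringBuilder path = new StringBuilder();
        Long cur = id;
        while (cur != null && names.containsKey(cur)) {
            String name = names.get(cur);
            if (!name.isEmpty()) {          // the root directory is assumed to have an empty name
                path.insert(0, "/" + name);
            }
            cur = parents.get(cur);         // climb towards the root
        }
        return path.length() == 0 ? "/" : path.toString();
    }

    public static void main(String[] args) {
        // Hypothetical entries, loosely based on the inode excerpt above.
        Map<Long, String> names = new HashMap<>();
        Map<Long, Long> parents = new HashMap<>();
        names.put(16385L, "");                      // root "/"
        names.put(16386L, "user");
        parents.put(16386L, 16385L);
        names.put(37749299L, "a4467282506298f8-e21f864f16b2e7c1_468511729_data.0.");
        parents.put(37749299L, 16386L);
        System.out.println(buildPath(37749299L, names, parents));
        // prints /user/a4467282506298f8-e21f864f16b2e7c1_468511729_data.0.
    }
}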
Furthermore, you have the <blocks> section, which contains the block ID (in your case it is 1108336288). If you look carefully into the Hadoop sources, you can find the method idToBlockDir in DatanodeUtil, which gives you a hint how the block files are organized on disk and how the block ID mapping is performed.
Basically the original ID is shifted twice (by 16 and by 8 bits):
int d1 = (int)((blockId >> 16) & 0xff);
int d2 = (int)((blockId >> 8) & 0xff);
And the final directory path is built from the obtained values:
String path = DataStorage.BLOCK_SUBDIR_PREFIX + d1 + SEP + DataStorage.BLOCK_SUBDIR_PREFIX + d2;
The block itself is stored in a file whose name follows the blk_<block_id> format.
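Here is a minimal standalone sketch of that computation (not the actual Hadoop class), assuming DataStorage.BLOCK_SUBDIR_PREFIX is "subdir", applied to the block ID from the fsimage excerpt above.

import java.io.File;

public class BlockDirSketch {
    // Assumption: DataStorage.BLOCK_SUBDIR_PREFIX is "subdir" in Hadoop 2.6.
    private static final String BLOCK_SUBDIR_PREFIX = "subdir";
    private static final String SEP = File.separator;

    static String idToBlockSubdir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);   // first-level directory index
        int d2 = (int) ((blockId >> 8) & 0xff);    // second-level directory index
        return BLOCK_SUBDIR_PREFIX + d1 + SEP + BLOCK_SUBDIR_PREFIX + d2;
    }

    public static void main(String[] args) {
        long blockId = 1108336288L;   // block ID from the inode above
        // The block itself would then live in a file named blk_1108336288 inside this directory.
        System.out.println(idToBlockSubdir(blockId));   // prints subdir15/subdir222
    }
}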
I am not a Hadoop expert, so if someone who understands this better can correct any flaws in my logic, please do so. Hope this helps.
Related
Consider I have a single file which is 300 MB. The block size is 128 MB.
So the input file is divided into the following chunks and placed in HDFS.
Block1: 128MB
Block2: 128MB
Block3: 64MB.
Now, does each block's data contain byte-offset information?
That is, do the blocks have the following offset information?
Block1: 0-128 MB of the file
Block2: 128-256 MB of the file
Block3: 256-300 MB of the file (the last 64 MB)
If so, how can I get the byte-offset information for Block2 (that is, that it starts at 128 MB) in Hadoop?
This is for understanding purposes only. Are there any Hadoop command-line tools to get this kind of metadata about the blocks?
EDIT
If the byte-offset info is not present, a mapper performing its map job on a block will start consuming lines from the beginning. If the offset information is present, the mapper will skip ahead until it finds the next EOL and then start processing the records.
So I guess byte offset information is present inside the blocks.
Disclaimer: I might be wrong on this one; I have not read that much of the HDFS source code.
Basically, datanodes manage blocks, which are just large blobs to them. They know the block ID, but that is it. The namenode knows everything, especially the mapping between a file path and all the block IDs of that file and where each block is stored. Each block can be stored in one or more locations depending on its replication settings.
I don't think you will find a public API to get the information you want from a block ID, because HDFS does not need to do the mapping in that direction. Conversely, you can easily get the blocks and their locations for a given file. You can try exploring the source code, especially the blockmanager package.
If you want to learn more, this article about the HDFS architecture could be a good start.
You can run hdfs fsck /path/to/file -files -blocks to get the list of blocks.
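Adding the -locations flag (hdfs fsck /path/to/file -files -blocks -locations) should also print which datanodes currently hold each block.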
A block does not contain offset information, only its length. But you can use LocatedBlocks to get all blocks of a file, and from that you can easily reconstruct the offset at which each block starts.
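As an illustration, here is a small sketch using the public FileSystem API (which is backed by LocatedBlocks under the hood); each BlockLocation carries the block's offset, length, and hosts. The path argument is just an assumed example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockOffsets {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);                        // e.g. /user/hadoop/myfile.txt
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per block, covering the whole length of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                + " length=" + b.getLength()
                + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}

For the 300 MB file with 128 MB blocks from the question, this would print offsets 0, 134217728 and 268435456.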
I have started learning Hadoop and just completed setting up a single node as demonstrated in the Hadoop 1.2.1 documentation.
Now I was wondering about a few things:
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Is it possible to add new nodes to the single-node setup if, say, somebody were to use it in a production environment? In other words, can a single node be converted to a cluster without loss of data, simply by adding more nodes and editing the configuration?
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Yes, use directories to your advantage. Generally, when you run jobs in Hadoop, if you pass in a path to a directory, the job will process all files in that directory. So you really have to use them anyway.
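For illustration, here is a minimal map-only job driver (written against the newer mapreduce API; the paths are assumed examples); because the input path is a directory, FileInputFormat picks up every file directly inside it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DirInputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "dir-input-example");
        job.setJarByClass(DirInputJob.class);
        job.setMapperClass(Mapper.class);   // identity mapper: passes records straight through
        job.setNumReduceTasks(0);           // map-only job
        // A directory as input path: every file in it becomes job input.
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input-dir"));
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output-dir"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}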
Is it possible to add new nodes to the single-node setup if, say, somebody were to use it in a production environment? In other words, can a single node be converted to a cluster without loss of data, simply by adding more nodes and editing the configuration?
You can add/remove nodes as you please (unless by single-node you mean pseudo-distributed - that's different).
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
Lots
To expand on climbage's answer:
The maximum number of files is a function of the amount of memory available to your Name Node server. There is some loose guidance that each metadata entry in the Name Node requires somewhere between 150 and 200 bytes of memory (it varies by version).
From this you'll need to extrapolate to the number of files and the number of blocks each file has (which can vary depending on file and block size), and then you can estimate, for a given memory allocation (2 GB / 4 GB / 20 GB etc.), how many metadata entries (and therefore files) you can store.
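For a rough back-of-envelope illustration (assuming 150 bytes per object): a Name Node with a 4 GB heap could hold roughly 4 * 1024^3 / 150 ≈ 28 million namespace objects. A small single-block file costs at least two objects (one inode plus one block), so such a heap supports on the order of 14 million files, and fewer as files span more blocks.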
In HDFS, the blocks are distributed among the active nodes/slaves. The content of the blocks is simple text, so is there any way to see, read, or access the blocks present on each data node?
As an entire file or to read a single block (say block number 3) out of sequence?
You can read the file via various mechanisms including the Java API but you cannot start reading in the middle of the file (for example at the start of block 3).
Hadoop reads a block of data and feeds each line to the mapper for further processing. Also, the Hadoop client gets the blocks belonging to a file from the different Data Nodes before concatenating them. So it should be possible to get the data from a particular block.
The Hadoop client code might be a good place to start looking. But HDFS provides a file system abstraction, so I am not sure what the requirement would be for reading the data from a particular block.
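If the goal really is to read the bytes that make up one particular block through the client API, a sketch like this should work: look up the block's offset and length with getFileBlockLocations, then do a positioned read of exactly that range (the path argument is an assumed example).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadThirdBlock {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        BlockLocation third = blocks[2];                 // block number 3 (0-based index 2)
        byte[] buf = new byte[(int) third.getLength()];  // a single block fits in an int-sized buffer
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(third.getOffset(), buf);        // positioned read of just that block's range
        }
        System.out.println("Read " + buf.length + " bytes starting at offset " + third.getOffset());
    }
}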
Assuming you have ssh access (and appropriate permissions) to the datanodes, you can cd to the path where the blocks are stored and read the blocks stored on that node (e.g., do a cat BLOCK_XXXX). The configuration parameter that tells you where the blocks are stored is dfs.datanode.data.dir, which defaults to file://${hadoop.tmp.dir}/dfs/data. More details here.
Caveat: the block names are coded by HDFS depending on their internal block ID. Just by looking at their names, you cannot know to which file a block belongs.
Finally, I assume you want to do this for debugging purposes or just to satisfy your curiosity. Normally, there is no reason to do this and you should just use the HDFS web-UI or command-line tools to look at the contents of your files.
I am not sure if this question belongs here. If not, then I apologize. I am reading the HDFS paper and am finding it difficult to understand a few of its terms. Please find my questions below.
1) As per the paper, "The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas."
What exactly does namespace information mean in an inode? Does it mean the complete path of the file? I ask because the previous statement says "The HDFS namespace is a hierarchy of files and directories".
2) As per the paper "The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes
(the physical location of file data)." Are both namespace tree and namespace the same? Please refer to point 1 about definition of the namespace. How is the namespace tree information stored? Is it stored as part of inodes where each inode will also have a parent inode pointer?
3) As per the paper, "HDFS keeps the entire namespace in RAM. The inode data and the list of blocks belonging to each file comprise the metadata of the name system called the image." Does the image also contain the namespace?
4) What is the use of a namespace id? Is it used to distinguish between two different file system instances?
Thanks,
Venkat
What exactly does namespace information mean in inode. Does it mean the complete path of the file? Because, the previous statement says "The HDFS namespace is a hierarchy of files and directories
It means that you can browse your files like you do on your local system (via commands like hadoop dfs -ls) and you will see results like /user/hadoop/myFile.txt, but physically this file is distributed across your cluster in several blocks, each replicated according to your replication factor.
Are both namespace tree and namespace the same? Please refer to point 1 about definition of the namespace. How is the namespace tree information stored? Is it stored as part of inodes where each inode will also have a parent inode pointer?
When you copy a file onto HDFS with commands like hadoop dfs -copyFromLocal myfile.txt /user/hadoop/myfile.txt, the file is split according to the dfs.block.size value (the default is 64 MB). The blocks are then distributed over your datanodes (the nodes used for storage). The namenode keeps a map of all blocks in order to verify your data integrity when it starts (or with commands like hadoop fsck /).
Does the image also contain the namespace?
For this one I am not sure, but I think the namespace is kept in RAM too.
What is the use of a namespace id? Is it used to distinguish between two different file system instances?
Yes, the namespace ID is just an ID; it ensures the coherence of the data stored on the datanodes.
I hope that helps you, even if it is far from an exhaustive explanation.
Is there a programmatic interface to find out, given a block ID, which data node it exists on?
I.e., the ability to read the fsimage and return this info.
One of the crude ways I know of is to look for a file with the block name in the dfs data dir.
This is, however, an O(n) solution, and I am pretty sure there would be an O(1) solution to this.
Similar to how to find file from blockName in HDFS hadoop, there is no public interface to the namenode which will allow you to look up information from a block ID (only by file name).
You can look at opening the fsimage, but this will only give you a mapping from block ID to filename, as the actual locations (DataNodes) which host the blocks are not stored in this file - the data nodes walk their data directories and report to the NameNode which blocks they have.
I guess if you could attach a debugger to the name node, you might be able to inspect the block map, but because there is no map from ID to filename, it is still going to be an O(n) operation.