Do blocks in HDFS have byte-offset information stored in Hadoop? - hadoop

Consider I have a single File which is 300MB. The block size is 128MB.
So the input file is divided into the following chunks and placed in HDFS.
Block1: 128MB
Block2: 128MB
Block3: 64MB.
Now Does each block's data has byte offset information contained in it.
That is, do the blocks have the following offset information?
Block1: 0-128MB of File
Block2 129-256MB of File
Block3: 257MB-64MB of file
If so, how can I get the byte-offset information for Block2 (That is it starts at 129MB) in Hadoop.
This is for understanding purposes only. Any hadoop command-line tools to get this kind of meta data about the blocks?
EDIT
If the byte-offset info is not present, a mapper performing its map job on a block will start consuming lines from the beginning. If the offset information is present, the mapper will skip till it finds the next EOL and then starts processing the records.
So I guess byte offset information is present inside the blocks.

Disclaimer: I might be wrong on this one I have not read that much of the HDFS source code.
Basically, datanodes manage blocks which are just large blobs to them. They know the block id but that its. The namenode knows everything, especially the mapping between a file path and all the block ids of this file and where each block is stored. Each block id can be stored in one or more locations depending of its replication settings.
I don't think you will find public API to get the information you want from a block id because HDFS does not need to do the mapping this way. On the opposite you can easily know the blocks and their locations of a file. You can try explore the source code, especially the blockmanager package.
If you want to learn more, this article about the HDFS architecture could be a good start.

You can run hdfs fsck /path/to/file -files -blocks to get the list of blocks.
A Block does not contain offset info, only length. But you can use LocatedBlocks to get all blocks of a file and from this you can easily reconstruct each block what offset it starts at.

Related

Hadoop : Why using FileSplit in the RecordReader Implementation

In Hadoop, Considering a scenario if a bigfile is already loaded into the hdfs filesystem, using either hdfs dfs put or hdfs dfs CopyFromLocal command, the bigfile will be splitted into blocks(64 MB).
In this case, When a customRecordReader has to be created to read the bigfile, Pls explain the reason for using FileSplit, when the bigfile is already splitted during the file loading progress and available in the forms of splitted blocks.
Pls explain the reason for using FileSplit, when the bigfile is already splitted during the file loading progress and available in the forms of splitted blocks.
I think you might be confused about what a FileSplit actually is. Let's say your bigfile is 128MB and your block size is 64MB. bigfile will take up two blocks. You know this already. You will also (usually) get two FileSplits when the file is being processed in MapReduce. Each FileSplit maps to a block as it was previously loaded.
Keep in mind that the FileSplit class does not contain any of the file's actual data. It is simply a pointer to data within the file.
HDFS splits the files in blocks for storage purpose and may split the data across multiple blocks based on actual file size.
So in case you are writing a customRecordReader then you will have to tell your record reader where to start and stop reading the block so that you can process the data. Reading the data from the beginning of each block or stopping your read at the end of each block may give your mapper incomplete records.
You are comparing apples and Oranges. The full name of the FileSplit is org.apache.hadoop.mapred.FileSplit, emphasis on mapred. Is a MapReduce conspet, not a file system one. FileSplit is simply a specialization of an InputSplit:
InputSplit represents the data to be processed by an individual Mapper.
You are unnecessarily adding to the discussion HDFS concepts like blocks. MapReduce is unrelated to HDFS (they have synergy together, true). MapReduce can run on many other file systems, like local raw, S3, Azure blobs etc.
Whether a FileSplit happens to coincide with an HDFS block is pure coincidence, from your point of view (is not coincidence as the job is split to take advantage of HDFS block locality, but that is a detail).

What metadata is stored on a datanode in HDFS?

While reading about the metadata that is stored on datanodes in HDFS.I came through these options but not sure whether all are correct or some or correct.
It stores a file with the checksum of the blocks that it stored.
It stores the version of the hadoop used for creating the blocks and
the namespaceid.
It stores information about the other blocks in the same namespace.
What is the correct answer.?
As per definitive guide:
HDFS blocks are stored in files with a blk_ prefix; they consist of the raw bytes of a portion of the file being stored. Each block has an associated metadata file with a .meta suffix. It is made up of a header with version and type information, followed by a series of checksums for sections of the block.
Too late to give an answer. But will be useful for some one.
Option 1 is correct.
It stores a file with the checksum of the blocks that it stored.
The .meta file in datanode will contain the checksum information for taht block which would be cross-checked when a client reads that block from datanode, if the checksum is not matched it throws an error.

Write Path HDFS

Introduction
Follow-up question to this question.
A File has been provided to HDFS and has been subsequently replicated to three DataNodes.
If the same file is going to be provided again, HDFS indicates that the file already exists.
Based on this answer a file will be split into blocks of 64MB (depending on the configuration settings). A mapping of the filename and the blocks will be created in the NameNode. The NameNode knows in which DataNodes the blocks of a certain file reside. If the same file is provided again the NameNode knows that blocks of this file exists on HDFS and will indicate that the file already exits.
If the content of a file is changed and provided again does the NameNode update the existing file or is the check restricted to mapping of filename to blocks and in particular the filename? Which process is responsible for this?
Which process is responsible for splitting a file into blocks?
Example Write path:
According to this documentation the Write Path of HBase is as follows:
Possible Write Path HDFS:
file provided to HDFS e.g. hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /
FileName checked in FSImage whether it already exists. If this is the case the message file already exists is displayed
file split into blocks of 64MB (depending on configuration
setting). Question: Name of the process which is responsible for block splitting?
blocks replicated on DataNodes (replication factor can be
configured)
Mapping of FileName to blocks (MetaData) stored in EditLog located in NameNode
Question
How does the HDFS' Write Path look like?
If the content of a file is changed and provided again does the NameNode update the existing file or is the check restricted to mapping of filename to blocks and in particular the filename?
No, it does not update the file. The name node only checks if the path (file name) already exists.
How does the HDFS' Write Path look like?
This is explained in detail in this paper: "The Hadoop Distributed File System" by Shvachko et al. In particular, read Section 2.C (and check Figure 1):
"When a client writes, it first asks the NameNode to choose DataNodes to host replicas of the first block of the file. The client organizes a pipeline from node-to-node and sends the data. When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block. A new pipeline is organized, and the client sends the further bytes of the file. Choice of DataNodes for each block is likely to be different. The interactions among the client, the NameNode and the DataNodes are illustrated in Fig. 1."
NOTE: A book chapter based on this paper is available online too. And a direct link to the corresponding figure (Fig. 1 on the paper and 8.1 on the book) is here.

Reading contents of blocks directly in a datanode

In HDFS , the blocks are distributed among the active nodes/slaves. The content of the blocks are simple text so is there any way to see read or access the blocks present in each data node ?
As an entire file or to read a single block (say block number 3) out of sequence?
You can read the file via various mechanisms including the Java API but you cannot start reading in the middle of the file (for example at the start of block 3).
Hadoop reads a block of data and feeds each line to the mapper for further processing. Also, the Hadoop clients gets the blocks related to a file from different Data Nodes before concatenating them. So, it should be possible to get the data from a particular block.
Hadoop Client might be a good place to start with to look at the code. But, HDFS provides file system abstraction. Not sure what the requirement would be for reading the data from a particular block.
Assuming you have ssh access (and appropriate permissions) to the datanodes, you can cd to the path where the blocks are stored and read the blocks stored on that node (e.g., do a cat BLOCK_XXXX). The configuration parameter that tells you where the blocks are stored is dfs.datanode.data.dir, which defaults to file://${hadoop.tmp.dir}/dfs/data. More details here.
Caveat: the block names are coded by HDFS depending on their internal block ID. Just by looking at their names, you cannot know to which file a block belongs.
Finally, I assume you want to do this for debugging purposes or just to satisfy your curiosity. Normally, there is no reason to do this and you should just use the HDFS web-UI or command-line tools to look at the contents of your files.

How does HDFS with append works

Let's assume one is using default block size (128 MB), and there is a file using 130 MB ; so using one full size block and one block with 2 MB. Then 20 MB needs to be appended to the file (total should be now of 150 MB). What happens?
Does HDFS actually resize the size of the last block from 2MB to 22MB? Or create a new block?
How does appending to a file in HDFS deal with conccurency?
Is there risk of dataloss ?
Does HDFS create a third block put the 20+2 MB in it, and delete the block with 2MB. If yes, how does this work concurrently?
According to the latest design document in the Jira issue mentioned before, we find the following answers to your question:
HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult because HDFS just uses a normal filesystem to write these block-files as normal files. Normal file systems have mechanisms for appending new data. Of course, if you fill up the last block, you will create a new block.
Only one single write or append to any file is allowed at the same time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file if you want someone else to begin writing to it.
If the last block in a file is not replicated, the append will fail. The append is written to a single replica, who pipelines it to the replicas, similar to a normal write. It seems to me like there is no extra risk of dataloss as compared to a normal write.
Here is a very comprehensive design document about append and it contains concurrency issues.
Current HDFS docs gives a link to that document, so we can assume that it is the recent one. (Document date is 2009)
And the related issue.
Hadoop Distributed File System supports appends to files, and in this case it should add the 20 MB to the 2nd block in your example (the one with 2 MB in it initially). That way you will end up with two blocks, one with 128 MB and one with 22 MB.
This is the reference to the append java docs for HDFS.

Resources