Disk block size and Hadoop block size

I have read many posts saying that Hadoop's 64 MB block size reduces metadata and improves performance compared with a 4 KB block size. But why is the data block size exactly 4 KB on an OS disk and 64 MB in Hadoop?
Why not 100 MB or some other bigger number?

But why is the data block size exactly 4 KB on an OS disk and 64 MB in Hadoop?
In HDFS we store huge amounts of data compared to a single OS filesystem, so it doesn't make sense to have small block sizes. With small block sizes there would be many more blocks, and the NameNode would have to store more metadata about them. Fetching data would also be slower, because data from a larger number of blocks dispersed across many machines has to be fetched.
Why not 100 MB or some other bigger number?
Initially the HDFS block size was 64 MB, and now it is 128 MB by default; see the dfs.blocksize property in hdfs-site.xml. This is because of bigger and better storage capacities and speeds (HDD and SSD). We shouldn't be surprised if it is later changed to 256 MB.
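If it helps, here is a minimal sketch (mine, not part of the original answer) of how the configured block size can be inspected through the Hadoop FileSystem API; dfs.blocksize is the real property name, while the class name and the use of the root path are just for illustration.

```java
// Minimal sketch: read the effective HDFS block size via the FileSystem API.
// Assumes hdfs-site.xml / core-site.xml are on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up dfs.blocksize from hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        long blockSize = fs.getDefaultBlockSize(new Path("/"));
        System.out.println("Default block size: " + blockSize + " bytes"); // 134217728 = 128 MB by default
    }
}
```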
Check this HDFS comic to get a quick overview about HDFS.

In addition to the existing answers, the following is also relevant:
Blocks at the OS level and blocks at the HDFS level are different concepts. When you have a 10 KB file on the OS, that essentially means 3 blocks of 4 KB get allocated, and the result is that you consume 12 KB.
Obviously you don't want to allocate a large fraction of your space to blocks that are not full, so you need a small block size.
On HDFS, however, the content of the block determines the size of the block.
So 129 MB could be stored in 1 block of 128 MB and 1 block of 1 MB (I'm not sure whether it would be spread out differently).
As a result you don't 'lose' the 127 MB which is not allocated.
With this in mind, you will want a comparatively large block size to optimize block management.
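To make the arithmetic concrete, here is a small sketch (mine, under the same assumptions as the answer: 4 KB OS blocks, 128 MB HDFS blocks) contrasting the two allocation styles.

```java
// Sketch: an OS filesystem rounds a file up to whole blocks, while HDFS only
// stores the bytes the last block actually contains.
public class BlockMath {
    // space consumed on a local filesystem: whole blocks, rounded up
    static long osAllocation(long fileBytes, long blockBytes) {
        long blocks = (fileBytes + blockBytes - 1) / blockBytes;
        return blocks * blockBytes;
    }

    public static void main(String[] args) {
        long kb = 1024, mb = 1024 * 1024;
        System.out.println(osAllocation(10 * kb, 4 * kb));   // 12288 bytes for a 10 KB file (3 x 4 KB)
        // HDFS: a 129 MB file becomes one 128 MB block plus one 1 MB block,
        // consuming only ~129 MB of underlying storage (ignoring replication).
        System.out.println(129 * mb);                        // 135266304 bytes
    }
}
```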

Related

HDFS and small files - part 2

This is with reference to the question Small files and HDFS blocks, where the answer quotes Hadoop: The Definitive Guide:
Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
I completely agree with this because, as per my understanding, blocks are just a way for the namenode to map which piece of a file is where in the entire cluster. And since HDFS is an abstraction over our regular filesystems, there is no way a 140 MB file will consume 256 MB of space on HDFS if the block size is 128 MB; in other words, the remaining space in the block will not get wasted.
However, I stumbled upon another answer, in Hadoop block size and file size issue, which says:
There are limited number of blocks available dependent on the capacity of the HDFS. You are wasting blocks as you will run out of them before utilizing all the actual storage capacity.
Does that mean that if I have 1280 MB of HDFS storage and I try to load 11 files of 1 MB each (considering a 128 MB block size and a replication factor of 1 per block), HDFS will throw an error regarding the storage?
Please correct me if I am assuming anything wrong in the entire process. Thanks!
No, HDFS will not throw an error, because
1. the 1280 MB storage limit is not exhausted, and
2. 11 meta entries won't cross the memory limits on the namenode.
For example, say we have 3 GB of memory available on the namenode. The namenode needs to store a meta entry for each file and each block, and each of these entries takes approximately 150 bytes. Thus, you can store metadata for roughly 10 million files at most, each having one block. So even if you have much more storage capacity, you will not be able to utilize it fully if you have many small files that reach the memory limit of the namenode.
But the specific example mentioned in the question does not reach this memory limit, so there should not be any error.
Consider a hypothetical scenario where the available memory on the namenode is just 300 bytes * 10. In this case, it should give an error for the request to store the 11th file.
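As a rough check of the arithmetic above, here is a back-of-the-envelope sketch (my own; the ~150 bytes per namenode object figure is the one from the Cloudera article referenced below).

```java
// Sketch: estimate how many one-block files fit in a given namenode heap,
// assuming ~150 bytes per metadata object (one object for the file, one for its block).
public class NameNodeMemory {
    public static void main(String[] args) {
        long heapBytes = 3L * 1024 * 1024 * 1024;   // 3 GB assumed available for metadata
        long bytesPerObject = 150;
        long objectsPerFile = 2;                    // file entry + single block entry
        System.out.println(heapBytes / (bytesPerObject * objectsPerFile)); // ~10.7 million files
        // The question's 11 one-block files need only about 11 * 300 = 3300 bytes,
        // nowhere near this limit, so no error is thrown.
    }
}
```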
References:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
https://www.mail-archive.com/core-user@hadoop.apache.org/msg02835.html

How will the free space of a block in a data node be utilized by HDFS in Hadoop?

My file has a size of 10 MB. I stored it in Hadoop, but the default block size in HDFS is 64 MB, so my file uses 10 MB out of 64 MB. How will HDFS utilize the remaining 54 MB of free space in the same block?
Logically, if your files are smaller than the block size, then HDFS will reduce the block size for those particular files to the size of the file. So HDFS will only use 10 MB for storing a 10 MB file; it will not waste 54 MB or leave it blank.
Small files in HDFS are described in detail here: http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
The remaining 54 MB would be utilized for some other file. This is how it works: assume you do a put or copyFromLocal of 2 small files, each of size 20 MB, and your block size is 64 MB. HDFS now calculates the available space in the filesystem (suppose you previously saved a 10 MB file into a 64 MB block; the remaining 54 MB is included as well), not the available blocks, and gives a report in terms of blocks. Since you have 2 files with a replication factor of 3, a total of 6 blocks would be allocated for your files even though each file is smaller than the block size. If the cluster doesn't have room for 6 blocks (6 * 64 MB) then the put process would fail. Since the report is fetched in terms of space, not in terms of blocks, you will never run out of blocks. The only time files are measured in blocks is at block allocation time.
Read this blog for more information.
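As a hedged illustration of "the report is in terms of space, not blocks", the following sketch (mine; the path is hypothetical) prints the byte-based cluster status and the space a single file actually consumes, using the standard FsStatus and ContentSummary APIs.

```java
// Sketch: HDFS reports capacity and usage in bytes; a small file's spaceConsumed
// reflects its actual length (times replication), not a full block.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class SpaceReport {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        FsStatus status = fs.getStatus();            // cluster-wide byte counts
        System.out.println("Capacity:  " + status.getCapacity());
        System.out.println("Used:      " + status.getUsed());
        System.out.println("Remaining: " + status.getRemaining());

        // Hypothetical 10 MB file stored with a 64 MB block size:
        // length is ~10 MB and spaceConsumed is ~10 MB x replication, not 64 MB.
        ContentSummary cs = fs.getContentSummary(new Path("/user/example/smallfile"));
        System.out.println("Length:         " + cs.getLength());
        System.out.println("Space consumed: " + cs.getSpaceConsumed());
    }
}
```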

Query regarding the block size

With regard to HDFS, I read on their site, under the Data Replication section (link below), that
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication
'all blocks in a file except the last block are the same size'
Could you please let me know why the last block would not be of the same size?
Could it be that total memory allocation plays a part here?
However, if memory size is not an issue, would the last block still not be of the same size as the rest of the blocks of a file?
And if yes, could you please elaborate a bit on this?
Any link to the JIRA for the development effort for this would be greatly appreciated.
Actually this is not an issue at all; it is simply not guaranteed that the last block of a file will be the same size as the others.
Consider a file of size 1000 MB with a block size of 128 MB. The file will be split into 8 blocks, where the first 7 blocks are each exactly 128 MB.
The total size of those 7 blocks is 896 MB (7 * 128 MB), so the remaining size is 104 MB (1000 - 896). The last block's actual size will therefore be 104 MB, while the other 7 blocks are 128 MB each.
The namenode allocates data blocks for every chunk of the file being stored on HDFS. It makes no special case for a final chunk whose size is less than the data block size.
HDFS is designed to store chunks of data in equally sized data blocks so that the blocks available on the data nodes can be easily calculated and maintained by the namenode.
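The split described above can be reproduced with a couple of lines of arithmetic; this is just a sketch of the 1000 MB / 128 MB example, not HDFS code.

```java
// Sketch: a 1000 MB file with a 128 MB block size yields 7 full blocks
// plus a final 104 MB block.
public class LastBlock {
    public static void main(String[] args) {
        long fileMB = 1000, blockMB = 128;
        long fullBlocks = fileMB / blockMB;                        // 7
        long lastBlockMB = fileMB % blockMB;                       // 104
        long totalBlocks = fullBlocks + (lastBlockMB > 0 ? 1 : 0); // 8
        System.out.println(totalBlocks + " blocks: " + fullBlocks + " x " + blockMB
                + " MB + a last block of " + lastBlockMB + " MB");
    }
}
```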

How does HDFS calculate the available blocks?

Assuming that the block size is 128MB, the cluster has 10GB (so ~80 available blocks). Suppose that I have created 10 small files which together take 128MB on disk (block files, checksums, replication...) and 10 HDFS blocks. If I want to add another small file to HDFS, then what does HDFS use, the used blocks or the actual disk usage, to calculate the available blocks?
80 blocks - 10 blocks = 70 available blocks or (10 GB - 128 MB)/128 MB = 79 available blocks?
Thanks.
Block size is just an indication to HDFS of how to split up and distribute files across the cluster; there is not a physically reserved number of blocks in HDFS (and you can change the block size for each individual file if you wish).
For your example, you also need to take into consideration the replication factor and the checksum files, but essentially adding lots of small files (smaller than the block size) does not mean that you have wasted 'available blocks'. The files take up as much room as they need (again, remember that replication increases the physical footprint required to store a file), and the number of 'available blocks' will be closer to your second calculation.
A final note: having lots of small files means your namenode will require more memory to track them (block sizes, locations, etc.), and it's generally less efficient to process 128 x 1 MB files than a single 128 MB file (although that depends on how you're processing them).
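For what it's worth, here is the second calculation from the question written out as a sketch (numbers taken straight from the question; replication and checksum overhead are ignored for simplicity).

```java
// Sketch: availability is derived from remaining bytes, not from a block counter.
public class AvailableBlocks {
    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockBytes = 128 * mb;
        long capacityBytes = 10L * 1024 * mb;   // 10 GB cluster
        long usedBytes = 128 * mb;              // the ten small files, ~128 MB on disk in total
        long remainingBytes = capacityBytes - usedBytes;
        System.out.println(remainingBytes / blockBytes + " more full blocks could still be written"); // 79
    }
}
```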

Hadoop block size and file size issue?

This might seem like a silly question, but in Hadoop, suppose the block size is X (typically 64 or 128 MB) and a local file size is Y (where Y is less than X). Now, when I copy the file of size Y to HDFS, will it consume one block, or will Hadoop create smaller-sized blocks?
One block is consumed by Hadoop. That does not mean that storage capacity will be consumed in an equivalent manner.
The output while browsing HDFS from the web interface looks like this (name, type, size, replication, block size, modification time):
filename1   file   48.11 KB    3   128 MB   2012-04-24 18:36
filename2   file   533.24 KB   3   128 MB   2012-04-24 18:36
filename3   file   303.65 KB   3   128 MB   2012-04-24 18:37
You can see that each file size is less than the block size, which is 128 MB; these files are in the KB range.
HDFS capacity is consumed based on the actual file size but a block is consumed per file.
There are a limited number of blocks available depending on the capacity of the HDFS. You are wasting blocks, as you will run out of them before utilizing all the actual storage capacity. Remember that the Unix filesystem also has a concept of block size, but it is a very small number, around 512 bytes. This concept is inverted in HDFS, where the block size is kept bigger, around 64-128 MB.
The other issue is that when you run map/reduce programs, they will try to spawn one mapper per block, so in this case, when you are processing three small files, it may end up spawning three mappers to work on them.
This wastes resources when the files are small. You also add latency, as each mapper takes time to spawn and then ultimately works on a very small file. You have to compact the files into files closer to the block size to take advantage of mappers working on a smaller number of files.
Yet another issue with numerous small files is that they load the namenode, which keeps the mapping (metadata) of each block and chunk in main memory. With smaller files, you fill up this table faster, and more main memory will be required as the metadata grows.
Read the following for reference:
http://www.cloudera.com/blog/2009/02/the-small-files-problem/
http://www.ibm.com/developerworks/web/library/wa-introhdfs/
Oh! There is also a discussion on SO: Small files and HDFS blocks
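To see the same thing on a real cluster, a short sketch like the following (mine; the directory path is hypothetical) lists files together with their length, configured block size, actual block count and replication, using only standard FileSystem calls.

```java
// Sketch: each small file still occupies (at least) one block, but its reported
// length stays at the actual file size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus st : fs.listStatus(new Path("/user/example"))) {  // hypothetical directory
            if (!st.isFile()) continue;                                   // skip subdirectories
            BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
            System.out.printf("%s  len=%d  blockSize=%d  blocks=%d  replication=%d%n",
                    st.getPath().getName(), st.getLen(), st.getBlockSize(),
                    blocks.length, st.getReplication());
        }
    }
}
```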
