HDFS and small files - part 2 - hadoop

This is with reference to the question : Small files and HDFS blocks where the answer quotes Hadoop: The Definitive Guide:
Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
Which I completely agree with because as per my understanding, blocks are just a way for the namenode to map which piece of file is where in the entire cluster. And since HDFS is an abstraction over our regular filesystems, there is no way a 140 MB will consume 256 MB of space on HDFS if the block size is 128MB, or in other words, the remaining space in the block will not get wasted.
However, I stumbled upon another answer here in Hadoop Block size and file size issue which says:
There are limited number of blocks available dependent on the capacity of the HDFS. You are wasting blocks as you will run out of them before utilizing all the actual storage capacity.
Does that mean if I have 1280 MB of HDFS storage and I try to load 11 files with size 1 MB each ( considering 128 MB block size and 1 replication factor per block ), the HDFS will throw an error regarding the storage?
Please correct if I am assuming anything wrong in the entire process. Thanks!

No. HDFS will not throw error because
1280 MB of storage limit is not exhausted.
11 meta entries won't cross memory limits on the namenode.
For example, say we have 3GB of memory available on namenode. Namenode need to store meta entries for each file, each block. Each of this entries take approx. 150 bytes. Thus, you can store roughly max. 1 million files with each having one block. Thus, even if you have much more storage capacity, you will not be able to utilize it fully if you have multiple small files reaching the memory limit of namenode.
But, specific example mentioned in the question does not reach this memory limit. Thus, there should not be any error.
Consider, hypothetical scenario having available memory in the namenode is just 300 bytes* 10. In this case, it should give an error for request to store 11th block.
References:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
https://www.mail-archive.com/core-user#hadoop.apache.org/msg02835.html

Related

Disk block size and hadoop block size

I have read many posts saying Hadoop block size of 64 MB reduces metadata and helps in performance improvement over 4 kb block size. But, why data block size is exactly 4kb in OS Disk and 64 MB in Hadoop.
Why not 100 or some other bigger number?
But, why data block size is exactly 4kb in OS Disk and 64 MB in Hadoop.
In HDFS we store huge amounts of data as compared to a single OS filesystem. So, it doesn't make sense to have small block sizes for HDFS. By having small block sizes, there will be more blocks and the NameNode has to store more metadata about the blocks. And also fetching of the data will be slow as data from higher number of blocks dispersed across many machines has to fetched.
Why not 100 or some other bigger number?
Initially the HDFS block size was 64MB and now it's 128MB by default. Check the dfs.blocksize property in hdfs-site.xml here. This is because of the bigger and better storage capacities and speed (HDD and SSD). We shouldn't be surprised when later it's changed to 256MB.
Check this HDFS comic to get a quick overview about HDFS.
In addition to the existing answers, the following is also relevant:
Blocks on an OS level and blocks on a HDFS level are different concepts. When you have a 10kb file on the OS, then that essentially means 3 blocks of 4kb get allocated, and the result is that you consume 12kb.
Obviously you don't want to allocate a large fraction of your space to blocks that are not full, so you need a small blocksize.
On HDFS however, the content of the block determines the size of the block.
So if you have 129MB that could be stored in 1 block of 128MB and 1 block of 1MB. (I'm not sure if it will spread it out differently).
As a result you don't 'lose' the 127 mb which is not allocated.
With this in mind you will want to have a comparatively large blocksize to optimize block management.

Will the replicates of Hadoop occupy NameNode's memory

We know each file in HDFS will occupy about 300 bytes memory in NameNode, because each file has 2 other replicates, so will one file totally occupy 900 bytes memory in NameNode, or the replicates don't occupy memory in NameNode.
Looking at optimisation for name node memory usage and performance done at HADOOP-1687 can see that the memory usage for blocks is multiplied by the replication factor. However, the memory usage for files and directories does not have increased cost based on replication.
The number of bytes used for a block prior to that change (i.e. in Hadoop 0.13) was 152 + 72 * replication, giving a figure of 368 bytes per block with the default replication setting of 3. Files were typically using 250 bytes, and directories 290 bytes, both regardless of replication setting.
The improvements were included from 0.15 (which did include some per-replication saving, but there was still a per-replication cost).
I haven't seen any other references indicating the per-replication memory usage has been removed.
From the Hadoop Wiki: "It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself."
The NameNode stores only file and directory information. With a replication factor of 3, a 300 MB file placed into HDFS will use a total of 900 MB of raw disk space. Each DataNode will get one copy of the 300 MB file, stored on disk not in memory.

How will the free space of a block in a data node be utilized by hdfs in hadoop?

My file has a size of 10MB, I stored this in hadoop, but the default block size in hdfs is 64 MB. Thus, my file uses 10 MB out-of 64 MB. How will HDFS utilize the remaining 54 MB of free space in the same block?
Logically, if you files are smaller than block size than HDFS will reduce the block size for that particular files to the size of file. So HDFS will only use 10MB for storing 10MB of small files.It will not waste 54MB or leave it blank.
Small file sin HDFS are desribed in detail here : http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
The remaining 54MB would be utilized for some other file. So this is how it works, assume you do a put or copyFromLocal of 2 small files each with size 20MB and your block size is 64MB. Now HDFS calculates the available space (suppose previously you have saved a file of 10 MB in a 64MB block it includes these remaining 54MB as well)in the filesystem(not available blocks) and gives a report in terms of block. Since you have 2 files, with replication factor as 3, so a total of 6 blocks would be allocated for your files even if your file size is less than the block size. If the cluster doesn't have 6 blocks(6*64MB) then the put process would fail. Since the report is fetched in terms of space not in terms of blocks, you would never run out of blocks. The only time files are measured in blocks is at block allocation time.
Read this blog for more information.

Is there any memory loss in HDFS if we use small files?

I have taken below Quoting from Hadoop - The Definitive Guide:
Note, however, that small files do not take up any more disk space than is required to store the raw contents of the file. For example, a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB,
Here my questions
1) 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.) How does hdfs use the remaining 127M in this block?
2)Is there any chance to store another file in same block?
1 MB file stored in 128MB block with 3 replication. Then the file will be stored in 3 blocks and uses 3*1=3 MB only not 3*128=384 MB. But it shows each the block size as 128 MB. It is just an abstraction to store the metadata in the namenode, but not an actual memory size used.
No way to store more than a file in a single block. Each file will be stored in a separate block.
Reference:
https://stackoverflow.com/a/21274388/3496666
https://stackoverflow.com/a/15065274/3496666
https://stackoverflow.com/a/14109147/3496666
NameNode Memory Usage:
Every file, directory and block in HDFS is represented as an object. i.e. each entry i the namenode is reflected to a item.
in the namenode’s memory, and each of object/item occupies 150 to 200 bytes of namenode memory.memorandums prefer fewer large files as a result of the metadata that needs to be stored.
Consider a 1 GB file with the default block size of 64MB.
-Stored as a single file 1 GB file
Name: 1 item
Block=16
Total Item = 16*3( Replication factor=3) = 48 + 1(filename) = 49
Total NameNode memory: 150*49
-Stored as 1000 individual 1 MB files
Name: 1000
Block=1000
Total Item = 1000*3( Replication factor=3) = 3000 + 1000(filename) = 4000
Total NameNode memory: 150*4000
Above results clarify that large number of small files is a overhead of naemnode memory as it takes more space of NameNode memory.
Block Name and Block ID is a unique ID of a particular block of data.This uniue ID is getting used to identified
the block during reading of the data when client make a request to read data.Hence it can not be shared.
HDFS is designed to handle large files. Lets say you have a 1000Mb file. With a 4k block size, you'd have to make 256,000
requests to get that file (1 request per block). In HDFS, those requests go across a network and come with a lot of overhead.
Each request has to be processed by the Name Node to figure out where that block can be found. That's a lot of traffic!
If you use 64Mb blocks, the number of requests goes down to 16, greatly reducing the cost of overhead and load on the Name Node.
To keep these things in mind hadoop recommend large block size.
HDFS block size is a logical unit of splitting a large file into small chunks. This chunks is basically called a block.
These chunks/block is used during further parallel processing of the data.i.e. MapReduce Programming or other model
to read/process of that within HDFS.
If a file is small enough to fit in this logical block then one block will get assigned for the file and it will
take disk space according to file size and Unix file system you are using.The detail about, how file gets stored in disk is available on this link.
HDFS block size Vs actual file size
As HDFS block size is a logical unit not a physical unit of the memory, so there is no waste of memory.
These link will be useful to understand the problem with small file.
Link1,
Link2
See Kumar's Answer
You could look into SequenceFiles or HAR Files depending on your use case. HAR files are analogous to the Tar command. MapReduce can act upon each HAR files with a little overhead. As for SequenceFiles, they are in a way a container of Key/Value pairs. The benefit of this is a Map task can act upon each of these pairs.
HAR Files
Sequence Files
More About Sequence Files

How a small file is stored in HDFS

In hadoop definitive guide :
a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not128 MB.
what does this mean ?
does it use 1MB of size in a block of 128MB or 1MB is used and reamining 127MB is free to occupy by some other file ?
This is often a misconception about HDFS - the block size is more about how a single file is split up / partitioned, not about some reserved part of the file system.
Behind the schemes, each block is stored on the DataNodes underlying files system as a plain file (and an associated checksum). If you look into the data node folder on your disks you should be able to find the file (if you know the file's block ID and data node allocations - which you can discover from the NameNode Web UI).
So back to your question, a 1MB file with a block size of 16MB/32MB/128MB/512MB/1G/2G (you get the idea) will still only be a 1MB file on the data nodes disk. The difference between the block size and the amount of data stored in that block is then free for the underlying file system to use as it sees fit (by HDFS, or something else).
Hadoop Block size is Hadoop Storage Concept. Every Time When you store a File in Hadoop it will divided into the block sizes and based on the replication factor and data locality it will be distributed over the cluster.
For Details you can find my answer here
Small files and HDFS blocks

Resources