Does anyone know how many bytes each file occupies in the HDFS NameNode?
I want to estimate how many files a single NameNode with 32 GB of memory can store.
Each file, directory, or block occupies about 150 bytes of NameNode memory. [1] So a cluster whose NameNode has 32 GB of RAM can support at most about 38 million files, assuming the NameNode is the bottleneck. (Each file also takes up at least one block, so each file costs 300 bytes in effect. I am also assuming 3x replication, so each file takes up 900 bytes.)
In practice, however, the number will be much lower, because not all of the 32 GB will be available to the NameNode for keeping the mapping. You can raise the limit by allocating more heap space to the NameNode on that machine.
Replication also affects this, to a lesser degree. Each additional replica adds about 16 bytes to the memory requirement. [2]
[1] https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
[2] http://search-hadoop.com/c/HDFS:/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java%7C%7CBlockInfo
Cloudera recommends 1 GB of NameNode heap space per million blocks. 1 GB for every million files is less conservative but should work too.
Also, you don't need to multiply by the replication factor; the accepted answer is wrong on that point.
Using the default block size of 128 MB, a file of 192 MB is split into two block files, one 128 MB file and one 64 MB file. On the NameNode, namespace objects are measured by the number of files and blocks. The same 192 MB file is represented by three namespace objects (1 file inode + 2 blocks) and consumes approximately 450 bytes of memory.
One data file of 128 MB is represented by two namespace objects on the NameNode (1 file inode + 1 block) and consumes approximately 300 bytes of memory. By contrast, 128 files of 1 MB each are represented by 256 namespace objects (128 file inodes + 128 blocks) and consume approximately 38,400 bytes.
Replication affects disk space but not memory consumption. Replication changes the amount of storage required for each block but not the number of blocks. If one block file on a DataNode, represented by one block on the NameNode, is replicated three times, the number of block files is tripled but not the number of blocks that represent them.
Examples:
1 x 1024 MB file
1 file inode
8 blocks (1024 MB / 128 MB)
Total = 9 objects * 150 bytes = 1,350 bytes of heap memory
8 x 128 MB files
8 file inodes
8 blocks
Total = 16 objects * 150 bytes = 2,400 bytes of heap memory
1,024 x 1 MB files
1,024 file inodes
1,024 blocks
Total = 2,048 objects * 150 bytes = 307,200 bytes of heap memory
More examples are in the original article from Cloudera.
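The object counting above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, assuming the 150 bytes per namespace object and the 128 MB default block size quoted in this answer; the function names are made up for the example, and real heap usage will differ.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb figure quoted in this answer, not a measured value


def blocks_for(file_size_mb, block_size_mb=128):
    """Number of blocks needed for one file (ceiling division)."""
    return -(-file_size_mb // block_size_mb)


def namenode_heap_bytes(num_files, file_size_mb, block_size_mb=128):
    """Estimated heap for num_files files of equal size: 1 inode + N blocks per file."""
    objects = num_files * (1 + blocks_for(file_size_mb, block_size_mb))
    return objects * BYTES_PER_OBJECT


print(namenode_heap_bytes(1, 1024))   # 1 x 1024 MB file
print(namenode_heap_bytes(8, 128))    # 8 x 128 MB files
print(namenode_heap_bytes(1024, 1))   # 1,024 x 1 MB files
```

Replication does not appear as a parameter, in line with the point that it changes disk usage but not the number of namespace objects.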
(File metadata = 150 bytes) + (block metadata for the file = 150 bytes) = 300 bytes per file,
so 1 million files, each with 1 block, will consume 300 * 1,000,000 = 300,000,000 bytes
= 300 MB for a replication factor of 1. With a replication factor of 3 it requires 900 MB.
So as a rule of thumb, 1 GB of heap can hold about 1 million files.
Disk capacity is 80 MB, block size is 512 bytes, and pointer size is 4 bytes.
How many entries are required in the FAT?
What is the table size?
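One way to work this out, as a rough sketch: assume 1 MB = 2^20 bytes, one FAT entry per disk block, and one pointer per entry. These assumptions are not stated in the question, and the function name is made up for the example.

```python
def fat_table(disk_bytes, block_bytes, pointer_bytes):
    """Return (number of FAT entries, table size in bytes).

    Assumes one entry per disk block and one pointer per entry.
    """
    entries = disk_bytes // block_bytes
    return entries, entries * pointer_bytes


# 80 MB disk (taking 1 MB = 2**20 bytes), 512-byte blocks, 4-byte pointers
entries, table_bytes = fat_table(80 * 2**20, 512, 4)
print(entries, table_bytes, table_bytes // 1024)  # entries, bytes, KB
```

If the exercise intends 1 MB = 10^6 bytes instead, the same function applies with a different first argument.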
A file of size 12,500 bytes is to be stored on a hard disk drive where the sector size is 512 bytes, and a cluster consists of 8 sectors. How much slack space is there once the file has been saved?
A few things to know:
Slack space is the leftover sectors and bytes from a cluster allocation.
Hard drives understand sectors, which are nearly universally 512 bytes as far as logical block addressing goes.
Operating systems (OS) understand clusters, which are groups of sectors. The number of sectors per cluster can vary between OSes and file systems.
The OS DOES NOT understand sectors. The hard drive DOES NOT understand clusters. But they are both terms needed for calculating slack space.
Example 1
Given:
Sector Size: 512 bytes (basically the ONLY sector size these days)
cluster size: 8 sectors
file size: 2560 bytes
In this scenario, when disk space is allocated to store the file, the smallest amount that the OS can read/write MUST be 8 sectors or 4096 bytes.
To find slack space:
first find your cluster size in bytes. Cluster == 8 sectors.
∴ allotment is 4096.
then find if the file size is larger or smaller than that allotment size.
2560 bytes < 4096.
∴ only one cluster is needed to save this file
subtract the file size from the cluster size and you have slack space.
4096 - 2560 == 1536 bytes (or 3 sectors) of slack space.
Example 2
Given:
Sector Size: 512 bytes
cluster size: 16 sectors
file size: 61440 bytes
In this scenario, when disk space is allocated to store the file, the smallest amount that the OS can read/write MUST be 16 sectors or 8192 bytes.
Let's work through the same process:
first find your cluster size in bytes. Cluster == 16 sectors.
∴ allotment is 8192.
then find if the file size is larger or smaller than that allotment size.
61440 bytes > 8192.
∴ several clusters are needed to save this file.
since this file is larger, divide it by the cluster size in bytes.
61440 / 8192 == 7.5 clusters required to save this file.
This isn't a nice round number, so we will have to round up. Recall that the OS cannot write LESS than a whole cluster, and if we allot fewer whole clusters than necessary, we won't be able to save the file.
∴ we require 8 clusters.
find the size in bytes of your allotment size.
8 clusters * 8192 == 65536.
subtract the file size from the cluster allotment and you have slack space.
65536 - 61440 == 4096 bytes (or 8 sectors) of slack space.
Try it.
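The procedure from both examples can be written as a small Python function if you want to check your own numbers. This is a minimal sketch; the function name and the default values are just chosen for illustration.

```python
def slack_space(file_size, sector_size=512, sectors_per_cluster=8):
    """Slack space in bytes left in the last cluster allotted to a file."""
    cluster_size = sector_size * sectors_per_cluster
    # Round up: the OS cannot allot less than a whole cluster.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size - file_size


print(slack_space(2560))                                # Example 1
print(slack_space(61440, sectors_per_cluster=16))       # Example 2
print(slack_space(61440, sectors_per_cluster=16) // 512)  # Example 2, in sectors
```

Calling slack_space(12500) checks the original question, which uses the same 512-byte sectors and 8-sector clusters as Example 1.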
I have a cluster of 13 machines, each with 4 physical CPUs and 24 GB of RAM.
I started a Spark cluster with one driver and 12 slaves.
I set the number of cores per slave to 12, which gives me the following cluster:
Alive Workers: 12
Cores in use: 144 Total, 110 Used
Memory in use: 263.9 GB Total, 187.0 GB Used
I started an application with the following configuration:
[('spark.driver.cores', '4'),
('spark.executor.memory', '15G'),
('spark.executor.id', 'driver'),
('spark.driver.memory', '5G'),
('spark.python.worker.memory', '1042M'),
('spark.cores.max', '96'),
('spark.rdd.compress', 'True'),
('spark.serializer.objectStreamReset', '100'),
('spark.executor.cores', '8'),
('spark.default.parallelism', '48')]
As I understand it, each executor has 15 GB of RAM and 8 task slots, with a parallelism of 48 (48 = 6 task slot * 12 slaves).
Then I have two big files on HDFS, 6 GB each (each from a directory of 12 files of 5 blocks of 128 MB each), with a 3x replication factor.
I union these two files, so I expect one DataFrame of 12 GB, but the UI shows 37 GB of input read:
That is my first question: why 37 GB?
Then, as the execution time is too long for me, I try to cache the data so that I can go faster. But the caching never finishes; here you can see it has already been running for 45 minutes (vs. 6 min when not cached!):
So I try to understand why, and I look at the Memory/Disk usage in the Storage section of the UI:
So some parts of the RDD are staying on disk.
Furthermore, I see that the executors may still have free memory:
And I notice on the same "storage" page that the size of the RDD has jumped :
Storage Level: Disk Serialized 1x Replicated
Cached Partitions: 72
Total Partitions: 72
Memory Size: 42.7 GB
Disk Size: 73.3 GB
=> I understand: Memory Size: 42.7 GB + Disk Size: 73.3 GB = 116 GB!
=> So my 6 GB file has turned into 37 GB and then into 116 GB???
But I try to understand why there is still some memory left on my executors, so I go to the "err" dump of one of them, and I see:
18/02/08 11:04:08 INFO MemoryStore: Will not store rdd_50_46
18/02/08 11:04:09 WARN MemoryStore: Not enough space to cache rdd_50_46 in memory! (computed 1134.1 MB so far)
18/02/08 11:04:09 INFO MemoryStore: Memory use = 1641.6 KB (blocks) + 7.7 GB (scratch space shared across 6 tasks(s)) = 7.7 GB. Storage limit = 7.8 GB.
18/02/08 11:04:09 WARN BlockManager: Persisting block rdd_50_46 to disk instead.
And here I see that the executor wants to cache a 1641.6 KB block (only about 1.6 MB!) and can't, because there is a "scratch space" of 7.7 GB "shared across 6 tasks".
=> What is "scratch space"?
=> The 6 tasks => comes from the parallelism of 48 / 12 = 6
And then I come back to the app information, and I see that the count that lasted 48 min read only 37 GB of data! (The 48 min are clearly spent caching the data too.)
When I do a count on the cached DataFrame, I see 116 GB of input read:
And at the end of the day, the time saved by the cached count is not that impressive. Here are the 3 durations:
4.8 ' : count on cached df
48' : count while caching
5.8' : count on not cached df (read directly from hdfs)
So why is that?
Because the cached DataFrame is not really cached:
That is, roughly 40 GB in memory and 60 GB on disk.
I am surprised, because 15 GB per executor * 12 slaves gives 180 GB of memory, and I can cache only 40 GB... But then I remember that the memory is split:
30% for spark
54% for storage
16% for shuffle
So I understand that I have 54% * 15 GB for storage, i.e. 8.1 GB per executor, meaning that out of my 180 GB I only have 97 GB for storage. Why do I have about 90 - 40 = 50 GB unused, then?
Oops... This is a long post!
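For reference, the arithmetic behind those figures is sketched below. It only reproduces the 54% storage share assumed in this post; it is not how Spark itself computes its limits, and the executor log above reports a storage limit of 7.8 GB rather than 8.1 GB, so the real figure depends on the Spark version and memory settings. The function name is made up for the example.

```python
def storage_memory_gb(executor_memory_gb, num_executors, storage_fraction=0.54):
    """Storage memory per executor and in total, given an ASSUMED storage fraction."""
    per_executor = executor_memory_gb * storage_fraction
    return per_executor, per_executor * num_executors


per_executor, total = storage_memory_gb(15, 12)
print(f"{per_executor:.1f} GB per executor, {total:.1f} GB in total")

# Same calculation using the storage limit actually shown in the executor log
print(f"{7.8 * 12:.1f} GB in total with the logged 7.8 GB limit")
```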
Plenty of questions... Sorry...
I use NtQueryInformationFile with the FILE_STANDARD_INFORMATION struct to retrieve the allocation size of a file. But for small files it returns an incorrect [1] result. For example, a text file of size 1 byte returns an allocation size of 8 bytes instead of 4096 bytes. Where is the problem?
[1] I'm assuming that this value is incorrect because Explorer (on a Windows XP Checked Build in my case) reports a higher size on disk (4096 bytes for a file of size 1).
The file size is in the EndOfFile member. AllocationSize is how much disk space is allocated for the file:
"Usually, this value is a multiple of the sector or cluster size of the underlying physical device."
This is an example from the book Computer Organization and Architecture by Stallings
The cache can hold 64 Kbytes
Data are transferred between main memory and the cache in blocks of 4 bytes each. This means that the cache is organized as 16K = 2^14 lines of 4 bytes each.
The main memory holds 16 Mbytes, that is, 2^24 byte-sized words, so 4M blocks of 4 bytes each.
My confusion is with the second point. It says that each block is 4 bytes, that is, 4 words of 8 bits, so one block is 32 bits = 2^5 bits. Now I want the number of blocks in the cache, so I divide the size of the cache by the size of one block: 2^16 (64K) / 2^5 (4 bytes) = 2^11 lines of 4 bytes each. But the answer is 2^14. What am I doing wrong? Thanks!
It's 64 Kbytes, so it is 2^16 bytes, not 2^16 bits.
You have to use the same unit on both sides of the division. In bits: (2^16 * 2^3 bits) / 2^5 bits = 2^14. Equivalently, in bytes: 2^16 bytes / 2^2 bytes = 2^14.
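A quick way to check this is to do the division in Python, once in bytes and once in bits. This is only a sketch of the arithmetic with names chosen for the example; the sizes are the ones from the question.

```python
def num_blocks(total_bytes, block_bytes):
    """Number of fixed-size blocks (or cache lines) that fit in total_bytes."""
    return total_bytes // block_bytes


cache_bytes = 64 * 1024   # 64 Kbytes = 2**16 bytes
block_bytes = 4           # 4 bytes = 2**2 bytes = 2**5 bits

lines_from_bytes = num_blocks(cache_bytes, block_bytes)
lines_from_bits = (cache_bytes * 8) // (block_bytes * 8)  # same ratio, both sides in bits

print(lines_from_bytes, lines_from_bits)       # both give the number of cache lines
print(lines_from_bytes == 2**14)               # True
print(num_blocks(2**24, block_bytes) == 2**22) # main memory: 4M blocks -> True
```

The 2^11 result comes from dividing a size in bytes (2^16) by a size in bits (2^5).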