The memory consumption of Hadoop's NameNode?

Can anyone give a detailed analysis of the memory consumption of the namenode? Or is there some reference material? I cannot find any material on the web. Thank you!

I suppose the memory consumption depends on your HDFS setup, i.e. on the overall size of the HDFS, and is relative to the block size.
From the Hadoop NameNode wiki:
Use a good server with lots of RAM. The more RAM you have, the bigger the file system, or the smaller the block size.
From https://twiki.opensciencegrid.org/bin/view/Documentation/HadoopUnderstanding:
Namenode: The core metadata server of Hadoop. This is the most critical piece of the system, and there can only be one of these. This stores both the file system image and the file system journal. The namenode keeps all of the filesystem layout information (files, blocks, directories, permissions, etc) and the block locations. The filesystem layout is persisted on disk and the block locations are kept solely in memory. When a client opens a file, the namenode tells the client the locations of all the blocks in the file; the client then no longer needs to communicate with the namenode for data transfer.
The same site recommends the following:
Namenode: We recommend at least 8GB of RAM (minimum is 2GB RAM), preferably 16GB or more. A rough rule of thumb is 1GB per 100TB of raw disk space; the actual requirement is around 1GB per million objects (files, directories, and blocks). The CPU requirements are any modern multi-core server CPU. Typically, the namenode will only use 2-5% of your CPU.
As this is a single point of failure, the most important requirement is reliable hardware rather than high performance hardware. We suggest a node with redundant power supplies and at least 2 hard drives.
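As a quick sanity check of why those two rules of thumb line up, here is a small sketch in Python (the 128 MB block size and 3x replication are my assumptions, not part of the quote):

# 100 TB of raw disk, assuming 128 MB blocks and 3x replication
raw_tb = 100
block_mb, replication = 128, 3
blocks = raw_tb * 2**20 // (block_mb * replication)
print(f"{blocks:,} unique blocks")   # ~273,000
# even with file and directory objects on top, 100 TB raw stays well under
# 1 million objects, so 1 GB per 100 TB is the more generous of the two rules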
For a more detailed analysis of memory usage, check this link out:
https://issues.apache.org/jira/browse/HADOOP-1687
You also might find this question interesting: Hadoop namenode memory usage

There are several technical limits of the NameNode (NN), and hitting any of them will limit your scalability.
Memory. The NN consumes about 150 bytes per block. From this you can calculate how much RAM you need for your data. There is a good discussion here: Namenode file quantity limit.
IO. The NN performs one IO for each change to the filesystem (create, delete a block, etc.), so your local IO has to keep up. It is harder to estimate how much you need, but since the number of blocks is already capped by memory, you will not hit this limit unless your cluster is very big. If it is, consider an SSD.
CPU. The NN carries a considerable load keeping track of the health of all blocks on all datanodes. Each datanode periodically reports the state of all its blocks. Again, unless the cluster is very big, this should not be a problem.

Example calculation
200 node cluster
24TB/node
128MB block size
Replication factor = 3
How much namenode memory is required?
# blocks = 200 * 24 * 2^20 / (128 * 3) = 13,107,200
≈ 13 million blocks
≈ 13,000 MB of namenode memory (at 1 GB per million objects)
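The same arithmetic as a runnable sketch (the 1 GB per million objects figure is the rule of thumb quoted earlier):

# Numbers from the example above
nodes = 200
tb_per_node = 24
block_mb = 128
replication = 3

raw_mb = nodes * tb_per_node * 2**20         # total raw capacity in MB
blocks = raw_mb // (block_mb * replication)  # unique blocks after 3x replication
print(f"{blocks:,} blocks")                  # 13,107,200
print(f"~{blocks / 1e6:.0f} GB of namenode heap (1 GB per million objects)")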

I guess we should make the distinction between how namenode memory is consumed by each namenode object and the general recommendations for sizing the namenode heap.
For the first case (consumption), AFAIK, each namenode object occupies about 150 bytes of memory on average. Namenode objects are files, blocks (not counting the replicated copies), and directories. So a file spanning 3 blocks accounts for 4 objects (1 file and 3 blocks) x 150 bytes = 600 bytes.
For the second case, the recommended heap size for a namenode is generally 1GB per 1 million blocks. If you calculate the raw consumption (150 bytes per block), 1 million blocks come to only about 150MB of memory. That is much less than the 1GB per 1 million blocks, but you should also take into account the file and directory objects on top of the blocks.
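To see the gap, a quick sketch of the raw arithmetic (using the 150-bytes-per-object figure from above):

# 1 million blocks at ~150 bytes per namenode object
blocks = 1_000_000
consumed_mb = blocks * 150 / 1e6   # ~150 MB for the block objects alone
reserved_mb = 1024                 # the 1 GB per million blocks guideline
print(f"{consumed_mb:.0f} MB consumed vs {reserved_mb} MB reserved")
# the difference leaves room for file and directory objects, transient
# state, and garbage-collection headroom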
I guess it is a safe-side recommendation. Check the following two links for a more general discussion and examples:
Sizing NameNode Heap Memory - Cloudera
Configuring NameNode Heap Size - Hortonworks
Namenode Memory Structure Internals

Related

HDFS and small files - part 2

This is with reference to the question Small files and HDFS blocks, where the answer quotes Hadoop: The Definitive Guide:
Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
I completely agree with this, because as per my understanding, blocks are just a way for the namenode to map which piece of a file is where in the cluster. And since HDFS is an abstraction over our regular filesystems, a 140 MB file cannot consume 256 MB of space on HDFS when the block size is 128 MB; in other words, the remaining space in the block will not get wasted.
However, I stumbled upon another answer in Hadoop Block size and file size issue which says:
There are a limited number of blocks available, dependent on the capacity of the HDFS. You are wasting blocks, as you will run out of them before utilizing all the actual storage capacity.
Does that mean that if I have 1280 MB of HDFS storage and I try to load 11 files of 1 MB each (considering a 128 MB block size and a replication factor of 1 per block), HDFS will throw an error regarding the storage?
Please correct me if I am assuming anything wrong in the entire process. Thanks!
No, HDFS will not throw an error, because:
the 1280 MB storage limit is not exhausted;
11 metadata entries won't cross the memory limits of the namenode.
For example, say we have 3GB of memory available on the namenode. The namenode needs to store metadata entries for each file and each block, and each of these entries takes approximately 150 bytes. A file with one block therefore costs about 300 bytes, so you can store roughly 10 million single-block files at most. Thus, even if you have much more storage capacity, you will not be able to utilize it fully if you have many small files pushing the namenode to its memory limit.
But the specific example mentioned in the question does not come close to this memory limit, so there should not be any error.
Consider a hypothetical scenario where the namenode has only 300 bytes x 10 of memory available for these entries. In that case, the request to store the 11th file would give an error.
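To make the arithmetic concrete, a minimal sketch of this check using the numbers from the question (150 bytes per object is the usual approximation):

# 1280 MB of HDFS capacity, 128 MB blocks, replication 1, 11 files of 1 MB
capacity_mb = 1280
files, mb_per_file = 11, 1

used_mb = files * mb_per_file   # small files only occupy their real size: 11 MB
objects = files * 2             # one file entry + one block entry per file
heap_bytes = objects * 150      # ~3.3 KB of namenode heap

print(used_mb <= capacity_mb)   # True -> storage is fine, no error
print(heap_bytes)               # 3300 bytes, nowhere near a real heap limit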
References:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
https://www.mail-archive.com/core-user@hadoop.apache.org/msg02835.html

What's the ideal NameNode memory size when there are a lot of files in HDFS?

I will have 200 million files in my HDFS cluster. We know each file occupies 150 bytes in NameNode memory, plus 3 blocks, so that is 600 bytes per file in the NN in total.
So I plan to give my NN 250GB of memory to handle the 200 million files comfortably. My question is: will such a big 250GB heap cause too much pressure on the GC? Is it feasible to give the NN 250GB of memory?
Can someone say something? Why is nobody answering?
The ideal namenode memory size is roughly the total space used by the metadata, plus the OS, plus the size of the daemons, plus 20-30% headroom for processing-related data.
You should also consider the rate at which data comes into your cluster. If you have data coming in at 1TB/day, you must plan for more memory or you will soon run out.
It's always advised to keep at least 20% of memory free at any point in time; this helps avoid the namenode going into a full garbage collection.
As Marco specified earlier, you may refer to NameNode Garbage Collection Configuration: Best Practices and Rationale for the GC configuration.
In your case 256 GB looks good if you aren't going to ingest a lot of new data and aren't going to do lots of operations on the existing data.
Refer: How to Plan Capacity for Hadoop Cluster?
Also refer: Select the Right Hardware for Your New Hadoop Cluster
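As a sketch, plugging the question's numbers into the sizing advice above (the 30% headroom comes from the free-memory recommendation):

# 200 million files, each costing ~600 bytes of NameNode heap
# (1 file object + 3 block objects at ~150 bytes apiece)
files = 200_000_000
bytes_per_file = 600

meta_gb = files * bytes_per_file / 2**30
print(f"metadata alone: ~{meta_gb:.0f} GB")           # ~112 GB
print(f"with 30% headroom: ~{meta_gb * 1.3:.0f} GB")  # ~145 GB, so 250 GB is comfortable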
You can have a physical memory of 256 GB in your namenode. If your data grows in huge volumes, consider HDFS federation. I assume you already have multiple cores (with or without hyperthreading) in the namenode host. The link below should address your GC concerns:
https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra.html

Hadoop put performance - large file (20 GB)

I'm using hdfs dfs -put to load a large 20 GB file into HDFS. Currently the process runs in about 4 minutes. I'm trying to improve the write time for loading data into HDFS. I tried different block sizes to improve write speed, but got the results below:
512M blocksize = 4mins;
256M blocksize = 4mins;
128M blocksize = 4mins;
64M blocksize = 4mins;
Does anyone know what the bottleneck could be, and what other options I could explore to improve the performance of the -put command?
20 GB / 4 minutes comes out to about 85 MB/sec. That's pretty reasonable throughput to expect from a single drive with all the overhead of the HDFS protocol and the network. I'm betting that is your bottleneck. Without changing your ingest process, you're not going to be able to make this magically faster.
The core problem is that 20 GB is a decent amount of data, and that data is getting pushed into HDFS as a single stream. You are limited by disk I/O, which is pretty lame given that you have a large number of disks in the Hadoop cluster. You've got a while to go before saturating a 10 GigE network (and probably a 1 GigE, too).
Changing the block size shouldn't change this behavior, as you saw. It's still the same amount of data coming off disk into HDFS.
I suggest you split the file up into 1 GB files and spread them over multiple disks, then push them up with -put in parallel, as in the sketch below. You might even want to consider splitting these files over multiple nodes if the network becomes a bottleneck. Can you change the way you receive your data to make this faster? Obviously, splitting the file and moving it around takes time, too.
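For illustration, a minimal sketch of the split-and-parallel-put idea in Python (the paths, chunk size, and worker count are made up, and it assumes the hdfs CLI is on the PATH):

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Chunks prepared beforehand, e.g. with: split -b 1G big.dat chunk_
chunks = sorted(Path(".").glob("chunk_*"))

def put(path):
    # each -put is an independent stream, so several can run concurrently
    subprocess.run(["hdfs", "dfs", "-put", str(path), "/data/"], check=True)

# four concurrent uploads; tune to your disks and network
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(put, chunks))

Whether this helps depends on the source disks being the bottleneck rather than the network; if a single drive is feeding all four streams, they will just contend with each other.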
It depends a lot on the details of your setup. First, know that 20 GB in 4 minutes is about 85 MB/s.
The bottleneck is most likely your local machine's hardware or its ethernet connection. I doubt playing with the block size will improve your throughput by much.
If your local machine has a typical 7200 rpm hard drive, its disk-to-buffer transfer rate is about 128 MB/s, meaning that it could load that 20 GB file into memory in roughly 2.5 minutes, assuming you have 20 GB to spare. However, you're not just copying it to memory, you're streaming it from memory into network packets, so it's understandable that you incur additional overhead processing these tasks.
Also see the Wikipedia entry on wire speed, which puts a Fast Ethernet setup at 100 Mbit/s (~12 MB/s). Note that in this case Fast Ethernet is a term for a particular group of Ethernet standards; you are clearly getting a faster rate than that. Wire speed is a good measure, because it accounts for all the factors on your local machine.
So let's break down the different steps in the streaming process on your local machine:
Read a chunk from file and load it into memory. Components: hard drive, memory
Split and translate that chunk into packets. Last I heard Hadoop doesn't use DMA features out of the box, so these operations will be performed by your CPU rather than the NIC. Components: Memory, CPU
Transmit packets to hadoop file servers. Components: NIC, Network
Without knowing more about your local machine, it is hard to specify which of these components is the bottleneck. However, these are the places to start investigating bitrate.
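To put rough numbers on that breakdown (the disk and wire rates are the ballpark figures cited above):

size_mb = 20 * 1024            # the 20 GB file
observed = size_mb / (4 * 60)  # ~85 MB/s over 4 minutes
disk_rate = 128                # 7200 rpm disk-to-buffer rate, MB/s
gige_rate = 125                # 1 GigE line rate, MB/s
print(f"observed {observed:.0f} MB/s vs disk {disk_rate} MB/s, 1 GigE {gige_rate} MB/s")
# the observed rate already sits near both ceilings, which is why changing
# the HDFS block size makes no measurable difference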
You may want to use distcp to perform a parallel copy:
hadoop distcp -Ddfs.block.size=$[256*1024*1024] /path/to/inputdata /path/to/outputdata

Hadoop namenode disk size

Are there any suggestions about the size of the HDD on the namenode's physical machine? Sure, it does not store any HDFS data the way a datanode does, but what should I base the sizing on while creating the cluster?
Physical disk space on the NameNode does not really matter unless you run a DataNode on the same node. However, it is very important to allocate enough memory (RAM) to the NameNode, because the NameNode stores all the HDFS metadata (block allocations, block locations, etc.) in memory. If sufficient memory is not allocated, the NameNode might run out of memory and fail.
You might need some space to actually store the NameNode's FSImage, edit logs, and other relevant files.
It is actually recommended to configure the NameNode to use multiple directories (one local and another on an NFS mount, via dfs.name.dir in Hadoop 1.x), so that multiple copies of the file system metadata are stored. That way, as long as the directories are on separate disks, a single disk failure will not corrupt the metadata.
Please see this link for more details.
We're hearing from Cloudera that they recommend namenodes have faster disks, a combination of SSDs and 10k RPM SAS drives, over typical 2TB 7200 RPM SAS drives. Does this sound reasonable, or overkill, since everything else I've read suggests that you don't really need expensive high-speed storage for Hadoop?

Hadoop single node configuration on a high-memory machine

I have a single-node instance of Apache Hadoop 1.1.1 with default parameter values (see e.g. [1] and [2]) on a machine with a lot of RAM and very limited free disk space. I notice that this Hadoop instance wastes a lot of disk space during map tasks. What configuration parameters should I pay attention to in order to take advantage of the high RAM capacity and decrease disk space usage?
You can use several of the mapred.* params to compress map output, which will greatly reduce the amount of disk space needed to store mapper output. See this question for some good pointers.
Note that different compression codecs will have different issues (i.e. GZip needs more CPU than LZO, but you have to install LZO yourself). This page has a good discussion of compression issues in Hadoop, although it is a bit dated.
The amount of RAM you need depends on what you are doing in your map-reduce jobs, although you can increase your heap size via mapred.map.child.java.opts in conf/mapred-site.xml.
See cluster setup for more details on this.
You can use dfs.datanode.du.reserved in hdfs-site.xml to specify an amount of disk space that HDFS won't use. I don't know whether Hadoop is able to compensate for that with higher memory usage.
You'll have a problem, though, if you run a mapreduce job that's disk-I/O intensive. I don't think any amount of configuring will help you then.
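Putting the suggestions from both answers together, here is a minimal sketch that emits the relevant Hadoop 1.x property blocks (the values are illustrative placeholders, not recommendations):

# Hadoop 1.x property names discussed above; values are placeholders
props = {
    # mapred-site.xml: compress intermediate map output to save disk
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec":
        "org.apache.hadoop.io.compress.GzipCodec",
    # mapred-site.xml: per-task heap, as mentioned above
    "mapred.map.child.java.opts": "-Xmx2048m",
    # hdfs-site.xml: bytes per volume the datanode must leave free
    "dfs.datanode.du.reserved": str(10 * 1024**3),
}
for name, value in props.items():
    print(f"<property>\n  <name>{name}</name>\n  <value>{value}</value>\n</property>")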
