I just installed HDFS and launched the service,
and there is already more than 800MB of used space.
What does it represent?
$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://quickstart.cloudera:8020 54.5 G 823.7 M 43.4 G 1%
Is there a command I can use to find the disk space of each datanode, or the total cluster disk space?
I tried the command
hdfs dfs -du -h /
but it seems that I do not have permission to run it on many directories, so I cannot get the actual disk usage.
From the UI:
http://namenode:50070/dfshealth.html#tab-datanode
---> gives you all the details about the datanodes.
From the command line, to get the disk space of each datanode:
sudo -u hdfs hdfs dfsadmin -report
---> gives you the details of the entire HDFS and of each individual datanode, OR
sudo -u hdfs hdfs dfs -du -h /
---> gives you the total disk usage of each folder under the root / directory
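If you just want a quick per-datanode summary at the shell, you can also filter the report output; a small sketch, based on the report format shown further down in this thread, where a Name: line and a DFS Used%: line appear once per datanode (note the first DFS Used%: match is the cluster-wide figure):
# print each datanode's address and its DFS usage percentage
sudo -u hdfs hdfs dfsadmin -report | grep -E '^(Name|DFS Used%):'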
You can view information about all datanodes and their disk usage in the Datanodes tab of the namenode UI. Total cluster disk space is shown in the Summary section of the main page:
http://namenode-ip:50070
If your Hadoop cluster is configured with simple security, you can run the commands below to get the usage of the datanodes:
export HADOOP_USER_NAME=hdfs
hadoop dfsadmin -report
* The export above grants admin privileges under simple security. If you use a different user as the HDFS admin, replace hdfs with that user.
An alternative is to log in to each datanode and run the following Unix command to get that server's disk utilization:
df -h
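If you have SSH access from an admin host, you can gather that across the cluster in one loop; a minimal sketch, assuming passwordless SSH and hypothetical hostnames datanode1 and datanode2 (replace with your own):
# run df -h on every datanode and label each block of output
for host in datanode1 datanode2; do
  echo "=== $host ==="
  ssh "$host" df -h
done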
Hadoop 3.2.0:
hduser@hadoop-node1:~$ hdfs dfs -df
Filesystem Size Used Available Use%
hdfs://hadoop-node1:54310 3000457228288 461352007680 821808787456 15%
hduser@hadoop-node1:~$
For human-readable numbers, use:
hduser@hadoop-node1:~$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://hadoop-node1:54310 2.7 T 429.7 G 765.4 G 15%
hduser@hadoop-node1:~$
I am running a Spark SQL job on a small dataset (25 GB) and I always end up filling up the disks and eventually crashing my executors.
$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://x.x.x.x:8020 138.9 G 3.9 G 14.2 G 3%
$ hdfs dfs -du -h
2.5 G .sparkStaging
0 archive
477.3 M checkpoint
When this happens, I have to leave safe mode: hdfs dfsadmin -safemode leave
Looking at the Spark job itself, it is obviously a problem of shuffle or cached dataframes. However, any idea why df reports such irregular Used/Available sizes? And why du does not list the files?
I understand that it's related to the "non DFS usage" I can see in the namenode overview. But why does Spark use up so much "hidden" space, to the point that it makes my job crash?
Configured Capacity: 74587291648 (69.46 GB)
DFS Used: 476200960 (454.14 MB)
Non DFS Used: 67648394610 (63.00 GB)
DFS Remaining: 6462696078 (6.02 GB)
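For what it's worth, those numbers are internally consistent: Non DFS Used is simply Configured Capacity minus DFS Used minus DFS Remaining, i.e. space on the datanode disks occupied by files outside HDFS, which is why hdfs dfs -du never lists it. A quick shell check of the arithmetic:
# Non DFS Used = Configured Capacity - DFS Used - DFS Remaining
echo $(( 74587291648 - 476200960 - 6462696078 ))   # prints 67648394610 (63.00 GB)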
I am looking for a command that shows, in human-readable form, the space left on a Hadoop cluster. I found a command on this forum:
hdfs dfsadmin -report
I heard that there is another command in Hortonworks that gives a more human-readable output, and that command is hdfs dfsadmin -report.
That command doesn't seem to work on cloudera.
Is there any equivalent command in cloudera?
Thanks much
It shouldn't matter whether you're using Cloudera or Hortonworks. If you're using an older version of Hadoop, the command might be hadoop dfsadmin -report.
Other options you have are:
hadoop fs -df -h
$ hadoop fs -df -h
Filesystem Size Used Available Use%
hdfs://<IP>:8020 21.8 T 244.2 G 21.6 T 1%
Shows the capacity, free and used space of the filesystem. If the filesystem has multiple partitions, and no path to a particular partition is specified, then the status of the root partitions will be shown.
hadoop fs -du -h /
$ hadoop fs -du -h /
772 /home
437.3 M /mnt
0 /tmp
229.2 G /user
9.3 G /var
Shows the amount of space, in bytes, used by the files that match the specified file pattern.
Is there a way to find out how much space is consumed in HDFS?
I used
hdfs dfs -df
but it doesn't seem adequate, because after deleting a huge amount of data with
hdfs dfs -rm -r -skipTrash
the previous command shows the change not immediately but only after several minutes (I need up-to-date disk usage info).
To see the space consumed by a particular folder try:
hadoop fs -du -s /folder/path
And if you want to see the usage, space consumed, space available, etc. of the whole HDFS:
hadoop dfsadmin -report
The hadoop dfsadmin CLI is deprecated; use hdfs dfsadmin instead.
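If you want to keep an eye on the cluster totals while the asynchronous block deletions complete, a simple polling loop works; a sketch (adjust the interval and grep pattern to taste):
# refresh the cluster-wide usage lines every 10 seconds
watch -n 10 "hdfs dfsadmin -report | grep 'DFS'"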
Folder-wise:
sudo -u hdfs hdfs dfs -du -h /
Cluster-wise:
sudo -u hdfs hdfs dfsadmin -report
To check quota and space usage for a particular directory, you can also use:
hadoop fs -count -q /path/to/directory
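For reference, the output of -count -q has eight columns, documented as QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME; CONTENT_SIZE is the figure comparable to what -du reports. On recent releases you can add -h for human-readable sizes:
hadoop fs -count -q -h /path/to/directory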
Is there an HDFS command to see the available free space in HDFS? We can see it through the browser at master:hdfsport, but for some reason I can't access it and I need a command.
I can see my disk usage with the command ./bin/hadoop fs -du -h but cannot see the free space available.
Thanks for the answers in advance.
Try this:
hdfs dfsadmin -report
With older versions of Hadoop, try this:
hadoop dfsadmin -report
Methods
1. dfsadmin
In newer versions of HDFS the hadoop CLI for dfsadmin is deprecated:
$ sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
So you should be using only hdfs at this point. Additionally, on systems where sudo is required, you run it like so:
$ sudo -u hdfs hdfs dfsadmin -report
2. fs -df
You have an additional method available via the fs module to hadoop as well:
$ hadoop fs -df -h
Example output
dfsadmin
Also, to provide a more thorough answer, here's what the output would look like on a single-node installation.
$ sudo -u hdfs hdfs dfsadmin -report
Configured Capacity: 7504658432 (6.99 GB)
Present Capacity: 527142912 (502.72 MB)
DFS Remaining: 36921344 (35.21 MB)
DFS Used: 490221568 (467.51 MB)
DFS Used%: 93.00%
Under replicated blocks: 128
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.114.48:50010 (host-192-168-114-48.td.local)
Hostname: host-192-168-114-48.td.local
Decommission Status : Normal
Configured Capacity: 7504658432 (6.99 GB)
DFS Used: 490221568 (467.51 MB)
Non DFS Used: 6977515520 (6.50 GB)
DFS Remaining: 36921344 (35.21 MB)
DFS Used%: 6.53%
DFS Remaining%: 0.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Thu Feb 04 13:35:04 EST 2016
In the above example, the datanode's disk is effectively full: only about 35 MB of DFS Remaining out of roughly 7 GB of configured capacity, most of it consumed by non-DFS usage.
fs -df
That same system with the -df subcommand from the fs module:
$ hadoop fs -df -h
Filesystem Size Used Available Use%
hdfs://host-192-168-114-48.td.local:8020 7.0 G 467.5 M 18.3 M 7%
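If you need a single machine-readable number for scripting rather than the formatted table, the plain (non -h) byte counts are easy to pick out; a sketch using awk (column 4 is Available, per the header above):
# print the available HDFS space in bytes
hadoop fs -df / | awk 'NR==2 {print $4}'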
Hadoop version 1:
hadoop fs -df -h
OR
hadoop dfsadmin -report
Hadoop version 2:
hdfs dfs -df -h
OR
hdfs dfsadmin -report
Try this command:
hdfs dfsadmin -report