Is there an HDFS command to see the available free space in HDFS? We can see it through the browser at master:hdfsport, but for some reason I can't access that and I need a command.
I can see my disk usage with the command ./bin/hadoop fs -du -h, but I cannot see the free space available.
Thanks in advance.
Try this:
hdfs dfsadmin -report
With older versions of Hadoop, try this:
hadoop dfsadmin -report
Methods
1. dfsadmin
In newer versions of HDFS, the hadoop CLI for dfsadmin is deprecated:
$ sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
So you should be using only hdfs at this point. Additionally, on systems where sudo is required, you run it like so:
$ sudo -u hdfs hdfs dfsadmin -report
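If you only need the headline numbers, you can filter the report; a minimal sketch, assuming the field labels shown in the sample output below:
$ hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|DFS Remaining'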
2. fs -df
An additional method is available via hadoop's fs module as well:
$ hadoop fs -df -h
Example output
dfsadmin
Also, to provide a more thorough answer, here's what the output looks like on a single-node installation.
$ sudo -u hdfs hdfs dfsadmin -report
Configured Capacity: 7504658432 (6.99 GB)
Present Capacity: 527142912 (502.72 MB)
DFS Remaining: 36921344 (35.21 MB)
DFS Used: 490221568 (467.51 MB)
DFS Used%: 93.00%
Under replicated blocks: 128
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.114.48:50010 (host-192-168-114-48.td.local)
Hostname: host-192-168-114-48.td.local
Decommission Status : Normal
Configured Capacity: 7504658432 (6.99 GB)
DFS Used: 490221568 (467.51 MB)
Non DFS Used: 6977515520 (6.50 GB)
DFS Remaining: 36921344 (35.21 MB)
DFS Used%: 6.53%
DFS Remaining%: 0.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Thu Feb 04 13:35:04 EST 2016
In the above example, the disk is essentially 100% utilized: only 35 MB (0.49%) of DFS space remains, with non-DFS usage accounting for most of the configured capacity.
fs -df
That same system with the -df subcommand from the fs module:
$ hadoop fs -df -h
Filesystem Size Used Available Use%
hdfs://host-192-168-114-48.td.local:8020 7.0 G 467.5 M 18.3 M 7%
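If you need the free-space figure in a script, you can pull the Available column out of the raw (byte) output; a sketch, assuming the header-plus-one-data-row layout shown above:
$ hadoop fs -df / | awk 'NR==2 {print $4}'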
Hadoop version 1:
hadoop fs -df -h
OR
hadoop dfsadmin -report
Hadoop version 2:
hdfs dfs -df -h
OR
hdfs dfsadmin -report
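If you are not sure which major version you are running, the version subcommand will tell you:
$ hadoop version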
Try this command:
hdfs dfsadmin -report
Related
Is there a way, or any command, with which I can find out the disk space of each datanode or the total cluster disk space?
I tried the command
hdfs dfs -du -h /
but it seems that I do not have permission to execute it for many directories, and hence I cannot get the actual disk space.
From the UI:
http://namenode:50070/dfshealth.html#tab-datanode
---> which will give you all the details about the datanodes.
From command line:
To get disk space of each datanode:
sudo -u hdfs hdfs dfsadmin -report
---> which will give you the details of the entire HDFS and the individual datanodes, OR
sudo -u hdfs hdfs dfs -du -h /
---> which will give you the total disk usage of each folder under the root / directory
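To find the biggest consumers quickly, you can sort the raw byte output of -du; a sketch, assuming the size is the first whitespace-separated column, as in standard -du output:
sudo -u hdfs hdfs dfs -du / | sort -n -r | head -n 10
---> which will list the ten largest folders under / first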
You can view the information about all datanodes and their disk usage in the namenode UI's Datanodes tab.
Total cluster disk space can be seen in the summary part of the main page.
http://namenode-ip:50070
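Note that on Hadoop 3.x the default NameNode web UI port changed from 50070 to 9870, so the equivalent URL there is:
http://namenode-ip:9870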
If you are using a Hadoop cluster configured with simple security, you can execute the commands below to get the usage of the datanodes.
export HADOOP_USER_NAME=hdfs ;
* The above command can be used to get admin privileges under simple security. If you are using another user for HDFS administration, replace hdfs with the respective HDFS admin user.
hadoop dfsadmin -report
An alternative option is to log in to the respective datanode and execute the following Unix command to get the disk utilization of that server.
df -h
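To check only the disks that actually back HDFS rather than every mount, you can first look up the DataNode's data directories; a sketch, where the example path is illustrative (the property value may also carry a file:// prefix and several comma-separated paths):
hdfs getconf -confKey dfs.datanode.data.dir
df -h /hadoop/hdfs/data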
Hadoop 3.2.0:
hduser@hadoop-node1:~$ hdfs dfs -df
Filesystem Size Used Available Use%
hdfs://hadoop-node1:54310 3000457228288 461352007680 821808787456 15%
hduser@hadoop-node1:~$
For human-readable numbers, use:
hduser@hadoop-node1:~$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://hadoop-node1:54310 2.7 T 429.7 G 765.4 G 15%
hduser@hadoop-node1:~$
I am running a Spark SQL job on a small dataset (25 GB) and I always end up filling up the disks and eventually crashing my executors.
$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://x.x.x.x:8020 138.9 G 3.9 G 14.2 G 3%
$ hdfs dfs -du -h
2.5 G .sparkStaging
0 archive
477.3 M checkpoint
When this happens, I have to leave safemode: hdfs dfsadmin -safemode leave
Looking at the Spark job itself, it is obviously a problem of shuffling or caching dataframes. However, any idea why df reports such irregular Used/Available sizes? And why does du not list the files?
I understand that it's related to the "non DFS usage" I can see in the namenode overview. But why does Spark use up so much "hidden" space, to the point that it makes my job crash?
Configured Capacity: 74587291648 (69.46 GB)
DFS Used: 476200960 (454.14 MB)
Non DFS Used: 67648394610 (63.00 GB)
DFS Remaining: 6462696078 (6.02 GB)
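In other words, the space is being written outside HDFS by the executors, into YARN's local directories, which is what shows up as Non DFS Used. A diagnostic sketch to confirm where it is going on a worker node (the path is an assumption; check yarn.nodemanager.local-dirs in your yarn-site.xml):
du -sh /hadoop/yarn/local/usercache/*/appcache/* 2>/dev/null | sort -h | tail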
I am looking for a command that shows a human-readable form of the space left on the Hadoop cluster. I found a command on this forum; the output is in the image below.
hdfs dfsadmin -report
[image: output of the dfsadmin command]
I heard that there is another command in Hortonworks that gives a more human-readable output, and that command is hdfs dfsadmin -report.
That command doesn't seem to work on Cloudera.
Is there any equivalent command in cloudera?
Thanks much
It shouldn't matter whether you're using Cloudera or Hortonworks. If you're using an older version of Hadoop, the command might be hadoop dfsadmin -report.
Other options you have are:
hadoop fs -df -h
$ hadoop fs -df -h
Filesystem Size Used Available Use%
hdfs://<IP>:8020 21.8 T 244.2 G 21.6 T 1%
Shows the capacity, free and used space of the filesystem. If the filesystem has
multiple partitions, and no path to a particular partition is specified, then
the status of the root partitions will be shown.
hadoop fs -du -h /
$ hadoop fs -du -h /
772 /home
437.3 M /mnt
0 /tmp
229.2 G /user
9.3 G /var
Shows the amount of space, in bytes, used by the files that match the specified file pattern.
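If you only want a single total per path instead of the per-child breakdown, add the -s (summary) flag:
$ hadoop fs -du -s -h /user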
First, I have read this post: Is there an equivalent to `pwd` in hdfs?. It says there is no such 'pwd' in HDFS.
However, as I progressed with the instructions of Hadoop: Setting up a Single Node Cluster, I failed on this command:
$ bin/hdfs dfs -put etc/hadoop input
put: 'input': No such file or directory
It's weird that I succeeded with this command the first time I went through the instructions, but failed the second time. It's also weird that I succeeded with this command on my friend's computer, which has the same system (Ubuntu 14.04) and Hadoop version (2.7.1) as mine.
Can anyone explain what happened here? Is there some 'pwd' in HDFS after all?
Firstly, you are trying to run the command $ bin/hdfs dfs -put etc/hadoop input as a user that doesn't exist in the VM/HDFS.
Let me explain clearly with the following example in the HDP VM.
[root@sandbox hadoop-hdfs-client]# bin/hdfs dfs -put /etc/hadoop input
put: `input': No such file or directory
Here I executed the command as the root user, which doesn't exist in the HDP VM. Check with the following command to list the users:
[root@sandbox hadoop-hdfs-client]# hadoop fs -ls /user
Found 8 items
drwxrwx--- - ambari-qa hdfs 0 2015-08-20 08:33 /user/ambari-qa
drwxr-xr-x - guest guest 0 2015-08-20 08:47 /user/guest
drwxr-xr-x - hcat hdfs 0 2015-08-20 08:36 /user/hcat
drwx------ - hive hdfs 0 2015-09-04 09:52 /user/hive
drwxr-xr-x - hue hue 0 2015-08-20 09:05 /user/hue
drwxrwxr-x - oozie hdfs 0 2015-08-20 08:37 /user/oozie
drwxr-xr-x - solr hdfs 0 2015-08-20 08:41 /user/solr
drwxrwxr-x - spark hdfs 0 2015-08-20 08:34 /user/spark
In HDFS, if you want to copy a file and do not mention an absolute path for the destination argument, it will take the home directory of the logged-in user and place your file there. Here the root user's home directory was not found.
Now let's switch to the hive user and test:
[root@sandbox hadoop-hdfs-client]# su hive
[hive@sandbox hadoop-hdfs-client]$ bin/hdfs dfs -put /etc/hadoop input
[hive@sandbox hadoop-hdfs-client]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x - hive hdfs 0 2015-09-04 10:07 /user/hive/input
Yay..Successfully Copied..
Hope it helps..!!!
It means that we need to move the input files to an HDFS location.
Suppose you have an input file named input.txt that we need to move to HDFS; then follow the commands below.
Command: hdfs dfs -put /input_location /hdfs_location
If you don't need a specific target directory in HDFS:
hdfs dfs -put /home/Desktop/input.txt /
If you want a specific target directory in HDFS (note: we need to create the directory before proceeding, as shown next):
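For example, to create the directory used in the next command:
hdfs dfs -mkdir /MR_input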
hdfs dfs -put /home/Desktop/input.txt /MR_input
After that you can run the examples
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
Here /input and /output are paths that should be in HDFS.
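Once the job finishes, you can inspect the result; the part file pattern assumes the default reducer output naming of the wordcount example:
bin/hadoop fs -cat /output/part-r-*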
Hope this helps.
Is there a way to find out how much space is consumed in HDFS?
I used
hdfs dfs -df
but it does not seem to be up to date, because after deleting a huge amount of data with
hdfs dfs -rm -r -skipTrash
the previous command displays the changes not at once but only after several minutes (I need up-to-date disk usage info).
To see the space consumed by a particular folder try:
hadoop fs -du -s /folder/path
And if you want to see the usage, space consumed, space available, etc. of the whole HDFS:
hadoop dfsadmin -report
The hadoop CLI is deprecated for dfsadmin commands. Use hdfs instead.
Folder-wise:
sudo -u hdfs hdfs dfs -du -h /
Cluster-wise:
sudo -u hdfs hdfs dfsadmin -report
To check quotas along with the space consumed for a directory:
hadoop fs -count -q /path/to/directory
The columns show the name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size, and path name.