How to free Non DFS Used space from the Hortonworks HDP SSH client? - hadoop

I'm using HDP for self-study to learn Big Data basics. Today I ran into the following: HDFS disk usage is at 91%, with Non DFS Used at 31.2 GB / 41.6 GB (74.96%).
What exactly should I do to free disk space? Is it possible to do this from the sandbox HDP SSH client? I'm running HDP in VirtualBox.
From the sandbox HDP SSH client I've executed hdfs dfs -du -h /, but this obviously only shows HDFS data usage:
12.2 M /app-logs
1.5 G /apps
0 /ats
860.9 K /demo
724.4 M /hdp
0 /livy2-recovery
0 /mapred
0 /mr-history
479.6 M /ranger
176.6 K /spark2-history
0 /tmp
4.0 G /user
0 /webhdfs

Just treat this like any other disk-almost-full issue.
Log in to the sandbox and run du -s /*/* to see what is using up disk space. I suspect it's probably the log files (under /var/log/*).
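For example, a minimal sketch of that clean-up, assuming the usual sandbox layout where service logs pile up under /var/log (the exact paths and the 7-day retention are assumptions, not from the answer above):

# Find the biggest consumers of non-DFS space (run as root inside the sandbox).
du -sh /*/* 2>/dev/null | sort -h | tail -20

# Logs are usually the main culprit; list the largest log directories.
du -sh /var/log/* 2>/dev/null | sort -h | tail -10

# Example clean-up: delete rotated/compressed logs older than 7 days
# (7 days is an arbitrary choice -- adjust to taste).
find /var/log -type f \( -name "*.gz" -o -name "*.log.[0-9]*" \) -mtime +7 -delete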

Related

hadoop + how to rebalance the hdfs

We have an HDP cluster, version 2.6.5, with 8 data nodes; all machines run RHEL 7.6.
The HDP cluster is based on the Ambari platform, version 2.6.1.
Each data node (worker machine) includes two disks, and each disk is 1.8T in size.
When we access the data-node machines we can see differences in usage between the disks.
For example, on the first data node the usage is (from df -h):
/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb
On the second data node the usage is:
/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb
On the third data node the usage is:
/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb
and so on.
The big question is: why doesn't HDFS rebalance the HDFS disks?
The expected result would be roughly the same used size on all disks across all data-node machines.
Why does the used size differ between datanode1, datanode2, datanode3, and so on?
Any advice about HDFS tuning parameters that could help us?
This is very critical, because one disk can reach 100% usage while the others are only around 50% full.
This is known behaviour of the HDFS re-balancer in HDP 2.6; there are many possible reasons for unbalanced block distribution (see the linked list of causes).
With HDFS-1312, a disk balancer option has been introduced to address this issue.
The following articles should help you tune it more efficiently:
HDFS Balancer (1): 100x Performance Improvement
HDFS Balancer (2): Configurations & CLI Options
HDFS Balancer (3): Cluster Balancing Algorithm
I would suggest upgrading to HDP 3.x, as HDP 2.x is no longer supported by Cloudera Support.
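As a rough sketch of how those tools are invoked (hostnames, paths and thresholds are placeholders, and the intra-node disk balancer assumes a release that ships HDFS-1312, e.g. HDP 3.x / Hadoop 3):

# Inter-datanode balancing: move blocks between datanodes until each is
# within 10% of the cluster average utilization (the threshold is a placeholder).
hdfs balancer -threshold 10

# Intra-datanode balancing (HDFS-1312 disk balancer): even out the disks inside
# a single datanode. Requires dfs.disk.balancer.enabled=true in hdfs-site.xml
# (it may already be the default on newer releases).
hdfs diskbalancer -plan datanode1.example.com      # writes a plan JSON and prints its path
hdfs diskbalancer -execute <path-to-plan>.plan.json
hdfs diskbalancer -query datanode1.example.com     # check progress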

Datanode is in dead state as DFS used is 100 percent

I have a standalone setup of Apache Hadoop with the Namenode and Datanode running on the same machine.
I am currently running Apache Hadoop 2.6 (I cannot upgrade it) on Ubuntu 16.04.
Although my system has more than 400 GB of hard disk left, my Hadoop dashboard shows DFS used as 100%.
Why is Apache Hadoop not consuming the rest of the disk space available to it? Can anybody help me figure out a solution?
There can be several reasons for this.
You can try the following steps:
Go to $HADOOP_HOME/sbin (where hadoop-daemon.sh lives in a stock Hadoop 2.x install) and start the datanode:
./hadoop-daemon.sh --config $HADOOP_HOME/etc/hadoop start datanode
Then you can try the following things:
If any directory other than your namenode and datanode directories is taking up too much space, you can start cleaning it up.
You can also run hadoop fs -du -s -h /user/hadoop to see the usage of the directories.
Identify all the unnecessary directories and start cleaning up with hadoop fs -rm -R /user/hadoop/raw_data (-rm deletes, -R deletes recursively; be careful when using -R).
Run hadoop fs -expunge to empty the trash immediately (sometimes you need to run it multiple times).
Run hadoop fs -du -s -h / to see the HDFS usage of the entire file system, or run hdfs dfsadmin -report to confirm whether storage has been reclaimed.
Many times the report also shows missing blocks (with replication 1).
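Putting the steps above together, a minimal sketch of that check-clean-verify sequence (/user/hadoop/raw_data is just the example path from the answer; -skipTrash is an addition that frees space immediately instead of going through the trash):

# 1. Confirm the datanode process is actually running.
jps | grep -i datanode

# 2. See which HDFS directories are eating space.
hadoop fs -du -h /user/hadoop

# 3. Remove what you don't need (example path from above).
hadoop fs -rm -R -skipTrash /user/hadoop/raw_data

# 4. Empty the trash for anything deleted without -skipTrash, then confirm the reclaim.
hadoop fs -expunge
hdfs dfsadmin -report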

Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 containing 1 master node and 2 worker nodes that have 2.7 GB of memory each.
However, when I tried to put a 3 GB file onto HDFS with the command below,
/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin
it returned the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload files of smaller size, but not when they exceed a certain size (about 2.2 GB).
If the file exceeds the memory size of a node, wouldn't Hadoop split it across the other nodes?
Edit: a summary of my understanding of the issue you are facing:
1) Total HDFS free space is 5.32 GB
2) HDFS free space on each node is 2.6 GB
Note: you have bad blocks (4 blocks with corrupt replicas)
The following Q&A mentions similar issues:
Hadoop put command throws - could only be replicated to 0 nodes, instead of 1
In that case, running jps showed that the datanodes were down.
These Q&As suggest ways to restart the datanode:
What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker
Please try to restart your data-node, and let us know if it solved the problem.
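A minimal sketch of such a restart, assuming the stock Hadoop daemon scripts (their location is an assumption; on the spark-ec2 layout shown in the question they would presumably live under /root/ephemeral-hdfs/bin):

# Restart just the datanode on the affected worker.
./hadoop-daemon.sh stop datanode
./hadoop-daemon.sh start datanode

# Or bounce the whole HDFS layer from the master.
./stop-dfs.sh && ./start-dfs.sh

# Verify the datanode registered again.
jps
hadoop dfsadmin -report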
When using HDFS you have one shared file system,
i.e. all nodes share the same file system.
From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put a 3 GB file there.
Execute the following commands to get the HDFS free space:
hdfs dfs -df -h
hdfs dfsadmin -report
or (for older versions of HDFS)
hadoop fs -df -h
hadoop dfsadmin -report

hortonworks : start datanode failed

I have installed a new HDP 2.3 cluster using Ambari 2.2. The problem is that the namenode service can't be started, and each time I try I get the following error. When I tried to dig into the problem I found another, more explicit error: port 50070 is already in use (and I think the namenode uses this port). Has anyone solved this problem before? Thanks.
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh
su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config
/usr/hdp/current/hadoop-client/conf start namenode'' returned 1.
starting namenode, logging to
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-ip-10-8-23-175.eu-west-2.compute.internal.out
In order to install a Hortonworks cluster, Ambari tries to set the core file size limit to unlimited if it is not set initially. It seems the Linux user installing the cluster doesn't have the privileges to set ulimits.
Just set the core file size to unlimited in
/etc/security/limits.conf and the namenode should come up:
* soft core unlimited
* hard core unlimited
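To verify the change took effect, a quick check (assuming the service runs as the hdfs user, which is what the ambari-sudo.sh command in the error switches to):

# Current core file size limit for your shell ("unlimited" is what Ambari wants).
ulimit -c

# Check it for the hdfs user as well.
su - hdfs -c 'ulimit -c'

# Confirm the limits.conf entries are in place.
grep core /etc/security/limits.conf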

copyFromLocalFile doesn't work in CDH4

I've successfully installed CDH4 on an Ubuntu 12 LTS server in the Amazon cloud (1 server). I used Cloudera Manager Free Edition to install the software and had no errors.
I have a program that uses the Java API to load a file from my home computer to HDFS in the cloud. I would like to know why this program fails and how to fix it.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://node01:8020");
FileSystem fs = FileSystem.get(conf);
Path targetPath = new Path("/users/<username>/myfile.txt");
Path sourcePath = new Path("/home/<username>/myfile.txt");
// copyFromLocalFile(delSrc = false, overwrite = true, src, dst)
fs.copyFromLocalFile(false, true, sourcePath, targetPath);
I get the following error (namenode log):
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/<username>/myfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
When I upload my code to the cloud and run it locally there (uploading a file from the local fs to HDFS), there aren't any errors. The problem only happens when I run the code from my personal computer.
When I use the CLI 'hadoop fs -put' command on my cloud server, I get no errors writing to HDFS. I can also upload files using Hue. I've done some reading and found that this problem can occur when there isn't enough disk space, but I have plenty for both DFS and non-DFS (see the report below). I can successfully read the Hadoop file system with the Java API from my home computer, and I can even connect and read/write to HBase using the API. All ports are open to my IP on this server. File permissions have been checked. After the program fails, I see the file I tried to upload in HDFS, but its contents are blank (similar to this post: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/XWA-3H0ekYY ).
Here is the output of hdfs dfsadmin -report:
Configured Capacity: 95120474112 (88.59 GB)
Present Capacity: 95120474112 (88.59 GB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used: 81465344 (77.69 MB)
DFS Used%: 0.09%
Under replicated blocks: 177
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: privateip:port (node01)
Hostname: node01
Rack: /default
Decommission Status : Normal
Configured Capacity: 95120474112 (88.59 GB)
DFS Used: 81465344 (77.69 MB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used%: 0.09%
DFS Remaining%: 99.91%
Last contact: Sun Jan 27 03:01:53 UTC 2013
I've resolved the problem.
I was connecting to Hadoop from my home machine, which is not on Hadoop's local network. Apparently when you do this, the namenode tells my home machine to write to the datanode using the datanode's private IP. Not being on the same network, my home machine couldn't connect to the datanode, causing this error.
I resolved the problem by creating a VPN connection from my home network to the Hadoop network, and now everything works.
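For reference, a commonly cited alternative to a VPN for this situation (not what was done above, and it only helps if the datanodes' hostnames resolve and are reachable from your machine) is to ask the HDFS client to connect to datanodes by hostname instead of the private IP the namenode reports; the same property can also be set on the Configuration object in the Java client:

# Client-side only: resolve datanodes by hostname rather than their private IPs.
hadoop fs -D dfs.client.use.datanode.hostname=true \
    -put /home/<username>/myfile.txt /users/<username>/myfile.txt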
