Hadoop namenode can't get out of safemode

I use Hadoop 2.6.0.
When I force Hadoop to leave safe mode with hdfs dfsadmin -safemode leave, it reports Safe mode is OFF, but I still can't delete a file in the directory. The result is:
rm: Cannot delete /mei/app-20151013055617-0001-614d554c-cc04-4800-9be8-7d9b3fd3fcef. Name node is in safe mode.
I tried the solutions I found on the Internet, but none of them worked.
Running 'hdfs dfsadmin -report' shows:
Safe mode is ON
Configured Capacity: 52710469632 (49.09 GB)
Present Capacity: 213811200 (203.91 MB)
DFS Remaining: 0 (0 B)
DFS Used: 213811200 (203.91 MB)
DFS Used%: 100.00%
Under replicated blocks: 39
Blocks with corrupt replicas: 0
Missing blocks: 0
Live datanodes (1):
Name: (bdrhel6)
Hostname: bdrhel6
Decommission Status : Normal
Configured Capacity: 52710469632 (49.09 GB)
DFS Used: 213811200 (203.91 MB)
Non DFS Used: 52496658432 (48.89 GB)
DFS Remaining: 0 (0 B)
DFS Used%: 0.41%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Oct 14 03:30:33 EDT 2015
Does anyone have the same problem?
Any help on this please.

Safemode is an HDFS state in which the file system is mounted read-only: no replication is performed, and files cannot be created or deleted. The NameNode enters it automatically at startup, to give all DataNodes time to check in with the NameNode and announce which blocks they hold before the NameNode determines which blocks are under-replicated. The NameNode waits until a specific percentage of the blocks are present and accounted for; this is controlled in the configuration by the dfs.safemode.threshold.pct parameter (named dfs.namenode.safemode.threshold-pct in Hadoop 2.x, where the older name is deprecated). Once this threshold is met, safemode is exited automatically and HDFS allows normal operations.
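To make the threshold rule concrete, here is a minimal Python sketch of the check described above. It is a simplification and not the NameNode's actual code: the function name is made up for illustration, the default of 0.999 is assumed to be the stock value of the threshold parameter, and other conditions the NameNode may apply before leaving safemode are ignored.

```python
def can_leave_safemode(reported_blocks, total_blocks, threshold_pct=0.999):
    """Simplified check: have enough blocks been reported by DataNodes?"""
    # Number of blocks that must be reported before safemode may end.
    required = int(total_blocks * threshold_pct)
    return reported_blocks >= required


print(can_leave_safemode(reported_blocks=100, total_blocks=200, threshold_pct=0.5))  # True
print(can_leave_safemode(reported_blocks=99, total_blocks=200, threshold_pct=0.5))   # False
```

Note that this rule only covers the automatic exit at startup; it does not explain a NameNode that falls back into safemode for other reasons, such as low resources.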
1. The command below forces the NameNode to exit safemode:
hdfs dfsadmin -safemode leave
2. Run hdfs fsck / -move or hdfs fsck / -delete to move or delete corrupted files.
Based on the report, resources appear to be low on the NameNode host: DFS Remaining is 0 B. Add or free up resources first, then turn off safe mode manually. If you turn off safe mode before doing so, the NameNode will immediately return to safe mode.
Hadoop Tutorial-YDN
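The figures in the question's report show where the space went. The sketch below redoes the arithmetic in Python, assuming that Present Capacity is Configured Capacity minus Non DFS Used, and that DFS Remaining is Present Capacity minus DFS Used. The numbers in this particular report satisfy both relationships, but other Hadoop versions may account for space differently. The function name is made up for illustration.

```python
def remaining_space(configured, non_dfs_used, dfs_used):
    """Return (present_capacity, dfs_remaining) in bytes."""
    # Assumption: space taken by non-HDFS data is not available to HDFS.
    present = configured - non_dfs_used
    # Assumption: what HDFS has not used of its share is still free.
    return present, present - dfs_used


# Values copied from the `hdfs dfsadmin -report` output in the question.
present, remaining = remaining_space(
    configured=52710469632,
    non_dfs_used=52496658432,
    dfs_used=213811200,
)
print(present)    # 213811200
print(remaining)  # 0
```

In other words, almost the entire 49.09 GB disk is occupied by non-HDFS data, so freeing local disk space is what gives HDFS room again.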

hdfs dfsadmin -safemode forceExit
did the trick for me.

I faced the same problem. It occurred because there was no disk space left for Hadoop to run new commands that manipulate files.
Since Hadoop was in safemode, I could not even delete files inside HDFS.
I am using the Cloudera distribution of Hadoop, so I first deleted a few files from the Cloudera machine's local file system. This freed up some space. Then I executed the following command:
[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave | hadoop fs -rm -r <file on hdfs to be deleted>
This worked for me!


Unable to write to HDFS: WARN hdfs.DataStreamer - Unexpected EOF

I'm following a tutorial, and while running in a single-node test cluster I suddenly cannot run any MR jobs or write data to HDFS. It worked fine before, and now I keep getting the error below (rebooting didn't help).
I can read and delete files from HDFS, but not write.
$ hdfs dfs -put war-and-peace.txt /user/hands-on/
19/03/25 18:28:29 WARN hdfs.DataStreamer: Exception for BP-1098838250-
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:399)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1020)
put: All datanodes [DatanodeInfoWithStorage[,DS-b90326de-a499-4a43-a66a-cc3da83ea966,DISK]] are bad. Aborting...
"hdfs dfsadmin -report" shows me everything is fine, enough disk space. I barely ran any jobs, just some test MRs and little test data.
$ hdfs dfsadmin -report
Configured Capacity: 52710469632 (49.09 GB)
Present Capacity: 43335585007 (40.36 GB)
DFS Remaining: 43334025216 (40.36 GB)
DFS Used: 1559791 (1.49 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Live datanodes (1):
Name: (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 52710469632 (49.09 GB)
DFS Used: 1559791 (1.49 MB)
Non DFS Used: 6690530065 (6.23 GB)
DFS Remaining: 43334025216 (40.36 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.21%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Mon Mar 25 18:30:45 EDT 2019
The NameNode web UI (port 50070) also shows that everything is fine, and the logs do not report any errors. What could it be, and how can I troubleshoot it properly?
CentOS Linux 6.9 minimal
Apache Hadoop 2.8.1

Hadoop: two datanodes but UI shows one and Spark: two workers UI shows one

I have seen lots of answers on SO and Quora, along with many other websites. Some people solved the problem by configuring the firewall for the slaves' IPs; others said it's a UI glitch. I am confused. I have two datanodes: one is a pure datanode and the other is a namenode + datanode. The problem is that <master-ip>:50075 shows only one datanode (the one on the machine that also runs the namenode), but hdfs dfsadmin -report shows two datanodes, and after starting Hadoop on the master, jps on the pure datanode (slave) machine shows the datanode running.
The firewall is off on both machines: sudo ufw status verbose returns Status: inactive. The same happens with Spark: the Spark UI shows only the worker on the master node, not the one on the pure worker node, even though a worker is running on the pure worker machine. Again, is this a UI glitch, or am I missing something?
hdfs dfsadmin -report
Configured Capacity: 991216451584 (923.14 GB)
Present Capacity: 343650484224 (320.05 GB)
DFS Remaining: 343650418688 (320.05 GB)
DFS Used: 65536 (64 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Live datanodes (2):
Name: (ekbana)
Hostname: ekbana
Decommission Status : Normal
Configured Capacity: 24690192384 (22.99 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 7112691712 (6.62 GB)
DFS Remaining: 16299675648 (15.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.02%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jul 25 04:27:36 EDT 2017
Name: (saque-slave-ekbana)
Hostname: ekbana
Decommission Status : Normal
Configured Capacity: 966526259200 (900.15 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 590055215104 (549.53 GB)
DFS Remaining: 327350743040 (304.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 33.87%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jul 25 04:27:36 EDT 2017
/etc/hadoop/masters file on master node
/etc/hadoop/slaves file on master node
/etc/hadoop/masters file on slave node
Note: saque-master on the slave machine and ekbana on the master machine are mapped to the same IP.
Also UI looks similar to this question's UI
It's because both datanodes report the same hostname (ekbana).
The UI shows only one entry per hostname.
To confirm this, start only the datanode that is not on the master; you will see an entry for it in the UI.
If you start the other datanode too, the second entry for the same hostname will be masked.
Change the hostname of one of the machines and try again.
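A quick way to spot this situation is to look for repeated Hostname: lines in the report. The Python sketch below does that on pasted report text; the function name is made up for illustration, and it only assumes the "Hostname: <name>" line format seen in the report above.

```python
from collections import Counter


def duplicate_hostnames(report_text):
    """Return hostnames that appear on more than one 'Hostname:' line."""
    hostnames = [
        line.split(":", 1)[1].strip()
        for line in report_text.splitlines()
        if line.strip().startswith("Hostname:")
    ]
    counts = Counter(hostnames)
    return sorted(name for name, n in counts.items() if n > 1)


# Excerpt of the report from the question.
sample = """\
Live datanodes (2):
Name: (ekbana)
Hostname: ekbana
Name: (saque-slave-ekbana)
Hostname: ekbana
"""
print(duplicate_hostnames(sample))  # ['ekbana']
```

An empty list means every datanode reports a distinct hostname.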
I also faced a similar issue, where I couldn't see datanode information on the dfshealth.html page. I had two hosts named master and slave.
etc/hadoop/masters (on master machine)
etc/hadoop/masters (slave machine)
After that, I was able to see the datanodes in the UI.

Hadoop error while using this command hadoop fs -mkdir /in

The hadoop fs -mkdir /in command, run from the folder C:\hwork, did not work properly. Please help me solve this.
You should be able to visit the namenode UI before doing any filesystem operations to verify your HDFS cluster is actually working. The error message you have given indicates that it is not.
The namenode UI is typically available on http://localhost:50070/dfshealth.jsp.
You can also verify that HDFS is working properly by running the following command:
hdfs dfsadmin -report
If "Safe Mode is ON", then things are not running properly. You should also have a non-zero configured capacity and at least one datanode which is available.
A healthy pseudo-distributed HDFS environment should give a report that looks something like this:
Configured Capacity: 41746268160 (38.88 GB)
Present Capacity: 34658451456 (32.28 GB)
DFS Remaining: 34655678464 (32.28 GB)
DFS Used: 2772992 (2.64 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: (datanode)
Hostname: datanode
Decommission Status : Normal
Configured Capacity: 41746268160 (38.88 GB)
DFS Used: 2772992 (2.64 MB)
Non DFS Used: 7087816704 (6.60 GB)
DFS Remaining: 34655678464 (32.28 GB)
DFS Used%: 0.01%
DFS Remaining%: 83.02%
Last contact: Thu May 07 16:51:50 UTC 2015
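The three checks above (safe mode off, non-zero configured capacity, at least one available datanode) can be automated on saved report text. This is only a sketch: the function name is made up, and the regular expressions assume the line formats quoted on this page, which may differ in other Hadoop versions.

```python
import re


def report_problems(report_text):
    """Return a list of problems found in `hdfs dfsadmin -report` output."""
    problems = []
    if re.search(r"safe mode is on", report_text, re.IGNORECASE):
        problems.append("safe mode is ON")

    # The first "Configured Capacity" line is the cluster-wide total.
    capacity = re.search(r"^Configured Capacity:\s*(\d+)", report_text, re.MULTILINE)
    if capacity is None or int(capacity.group(1)) == 0:
        problems.append("configured capacity is zero or missing")

    # Two header styles appear in the reports quoted on this page:
    #   "Datanodes available: 1 (1 total, 0 dead)" and "Live datanodes (1):"
    live = (re.search(r"Datanodes available:\s*(\d+)", report_text)
            or re.search(r"Live datanodes\s*\((\d+)\)", report_text))
    if live is None or int(live.group(1)) == 0:
        problems.append("no live datanodes")
    return problems


healthy = """\
Configured Capacity: 41746268160 (38.88 GB)
Datanodes available: 1 (1 total, 0 dead)
"""
print(report_problems(healthy))  # []
```

To use it on a real cluster, save the output of hdfs dfsadmin -report to a file and pass the file's contents to the function.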

HBase: newly added regionserver is not serving requests

I'm setting up an HBase cluster on a cloud infrastructure.
HBase version: 0.94.11
Hadoop version: 1.0.4
Currently I have 4 nodes in my cluster (1 master, 3 regionservers), and I'm using YCSB (Yahoo benchmarks) to create a table (500.000 rows) and send READ requests (asynchronous READ requests).
Everything works fine with this setup (I'm monitoring the whole process with Ganglia and getting lambda, throughput, and latency, combined with YCSB's output), but a problem occurs when I add a new regionserver on the fly: it doesn't get any requests.
What "on-the-fly" means:
While YCSB is sending requests to the cluster, I add new regionservers using Python scripts.
Addition process (while the cluster is serving requests):
I create a new VM that will act as the new regionserver and configure everything it needs (HBase, Hadoop, /etc/hosts, connection to the private network, etc.)
Stopping the **hbase** balancer
Configuring every node in the cluster with the new node's information: adding the hostname to the regionservers file, adding the hostname to Hadoop's slaves file, adding the hostname and IP to the /etc/hosts file of every node, etc.
Executing on the master node:
(I've also tried running `hbase start regionserver` on the newly added node, and it does exactly the same as the last command: it starts the regionserver)
Once the newly added node is up and running, I execute the **hadoop** load balancer
When the hadoop load balancer stops, I start the **hbase** load balancer again
I connect over ssh to the master node and check that the load balancers (hbase/hadoop) did their job: both the blocks and the regions are uniformly spread across all the regionservers/slaves, including the new ones.
But when I run status 'simple' in the hbase shell, I see that the new regionservers are not getting any requests. (Below is the output of the command after adding 2 new regionservers, "okeanos-nodes-4/5".)
hbase(main):008:0> status 'simple'
5 live servers
okeanos-nodes-1:60020 1380865800330
requestsPerSecond=5379, numberOfOnlineRegions=4, usedHeapMB=175, maxHeapMB=3067
okeanos-nodes-2:60020 1380865800738
requestsPerSecond=5674, numberOfOnlineRegions=4, usedHeapMB=161, maxHeapMB=3067
okeanos-nodes-5:60020 1380867725605
requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=27, maxHeapMB=3067
okeanos-nodes-3:60020 1380865800162
requestsPerSecond=3871, numberOfOnlineRegions=5, usedHeapMB=162, maxHeapMB=3067
okeanos-nodes-4:60020 1380866702216
requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=29, maxHeapMB=3067
0 dead servers
Aggregate load: 14924, regions: 19
That they don't serve any requests is also evidenced by the CPU usage: on a serving regionserver it is about 70%, while on these 2 regionservers it is about 2%.
Below is the output of hadoop dfsadmin -report; as you can see, the blocks are evenly distributed (according to hadoop balancer -threshold 2).
root@okeanos-nodes-master:~# /opt/hadoop-1.0.4/bin/hadoop dfsadmin -report
Configured Capacity: 105701683200 (98.44 GB)
Present Capacity: 86440648704 (80.5 GB)
DFS Remaining: 84188446720 (78.41 GB)
DFS Used: 2252201984 (2.1 GB)
DFS Used%: 2.61%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 5 (5 total, 0 dead)
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 309166080 (294.84 MB)
Non DFS Used: 3851579392 (3.59 GB)
DFS Remaining: 16979591168(15.81 GB)
DFS Used%: 1.46%
DFS Remaining%: 80.32%
Last contact: Fri Oct 04 11:30:31 EEST 2013
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 531652608 (507.02 MB)
Non DFS Used: 3852300288 (3.59 GB)
DFS Remaining: 16756383744(15.61 GB)
DFS Used%: 2.51%
DFS Remaining%: 79.26%
Last contact: Fri Oct 04 11:30:32 EEST 2013
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 502910976 (479.61 MB)
Non DFS Used: 3853029376 (3.59 GB)
DFS Remaining: 16784396288(15.63 GB)
DFS Used%: 2.38%
DFS Remaining%: 79.4%
Last contact: Fri Oct 04 11:30:32 EEST 2013
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 421974016 (402.43 MB)
Non DFS Used: 3852365824 (3.59 GB)
DFS Remaining: 16865996800(15.71 GB)
DFS Used%: 2%
DFS Remaining%: 79.78%
Last contact: Fri Oct 04 11:30:29 EEST 2013
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 486498304 (463.96 MB)
Non DFS Used: 3851759616 (3.59 GB)
DFS Remaining: 16802078720(15.65 GB)
DFS Used%: 2.3%
DFS Remaining%: 79.48%
Last contact: Fri Oct 04 11:30:29 EEST 2013
I've tried stopping YCSB, restarting the HBase master, and restarting YCSB, but with no luck. These 2 nodes don't serve any requests!
As there are many log and conf files, I have created a zip file with logs and confs (both hbase and hadoop) of the master, a healthy regionserver serving requests and a regionserver not serving requests.
Thank you in advance!!
I found out what was going on, and it had nothing to do with HBase. I had forgotten to add the hostname and IP of the new regionservers to the /etc/hosts file of the YCSB server VM. :-(
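Since HBase clients connect to regionservers directly by hostname, the client machine must be able to resolve every regionserver's name. The Python sketch below checks hosts-file text for missing names. The function name is made up, and the sample file and its IP addresses are hypothetical; only the okeanos-nodes-N hostnames come from the question.

```python
def missing_from_hosts(hosts_text, required_hostnames):
    """Return the required hostnames that no /etc/hosts-style line defines."""
    known = set()
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, split fields
        known.update(entry[1:])                # fields after the IP are names
    return sorted(name for name in required_hostnames if name not in known)


# Hypothetical hosts file of the YCSB client; the IP addresses are made up.
client_hosts = """\
127.0.0.1   localhost
192.0.2.11  okeanos-nodes-1
192.0.2.12  okeanos-nodes-2
192.0.2.13  okeanos-nodes-3
"""
regionservers = ["okeanos-nodes-%d" % i for i in range(1, 6)]
print(missing_from_hosts(client_hosts, regionservers))
# ['okeanos-nodes-4', 'okeanos-nodes-5']
```

On a real client, the first argument would be the contents of /etc/hosts, read with open("/etc/hosts").read().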

Hadoop node taking a long time to decommission

EDIT: I finally figured out what the issue was. Some files had very high replication factor set, and I was reducing my cluster to 2 nodes. Once I reduced my replication factor on those files, the decommissioning successfully ended quickly.
I've added the node to be decommissioned to the dfs.hosts.exclude and mapred.hosts.exclude files, and executed this command:
bin/hadoop dfsadmin -refreshNodes
In the NameNode UI, I see this node under Decommissioning Nodes, but it's taking too long, and I don't have much data on the node being decommissioned.
Does it always take a very long time to decommission nodes, or is there some place I should be looking? I'm not sure what exactly is going on.
I also don't see any corrupted blocks on this node:
$ ./hadoop/bin/hadoop fsck -blocks /
Total size: 157254687 B
Total dirs: 201
Total files: 189 (Files currently being written: 6)
Total blocks (validated): 140 (avg. block size 1123247 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 140 (100.0 %)
Over-replicated blocks: 6 (4.285714 %)
Under-replicated blocks: 12 (8.571428 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.9714285
Corrupt blocks: 0
Missing replicas: 88 (31.884058 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jul 22 14:42:45 IST 2013 in 33 milliseconds
The filesystem under path '/' is HEALTHY
$ ./hadoop/bin/hadoop dfsadmin -report
Configured Capacity: 25357025280 (23.62 GB)
Present Capacity: 19756299789 (18.4 GB)
DFS Remaining: 19366707200 (18.04 GB)
DFS Used: 389592589 (371.54 MB)
DFS Used%: 1.97%
Under replicated blocks: 14
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 3 (3 total, 0 dead)
Decommission Status : Decommission in progress
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 54947840 (52.4 MB)
Non DFS Used: 1786830848 (1.66 GB)
DFS Remaining: 6610563072(6.16 GB)
DFS Used%: 0.65%
DFS Remaining%: 78.21%
Last contact: Mon Jul 22 14:29:37 IST 2013
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167412428 (159.66 MB)
Non DFS Used: 1953377588 (1.82 GB)
DFS Remaining: 6331551744(5.9 GB)
DFS Used%: 1.98%
DFS Remaining%: 74.91%
Last contact: Mon Jul 22 14:29:37 IST 2013
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167232321 (159.49 MB)
Non DFS Used: 1860517055 (1.73 GB)
DFS Remaining: 6424592384(5.98 GB)
DFS Used%: 1.98%
DFS Remaining%: 76.01%
Last contact: Mon Jul 22 14:29:38 IST 2013
Decommissioning is not an instant process, even if you don't have much data.
First, decommissioning means that every block on the node has to be re-replicated elsewhere (how many blocks depends on your block size), and this could easily overwhelm your cluster and cause operational issues, so I believe it is somewhat throttled.
Also, depending on which Hadoop version you use, the thread that monitors decommissions only wakes up every so often. It used to be around 5 minutes in earlier versions of Hadoop, but I believe it is now every minute or less.
Decommission in progress means that the blocks are being replicated, so I guess this really depends on how much data you have, and you just have to wait, since this task won't use your cluster's full capacity.
While decommissioning is in progress, temporary or staging files get cleaned up automatically. Those files are now missing, and Hadoop does not recognize how they went missing. So the decommissioning process keeps waiting until that is resolved, even though the actual decommissioning is done for all the other files.
In the Hadoop GUI, if you notice that "Number of Under-Replicated Blocks" is not decreasing over time, or is almost constant, this is likely the reason.
So list the files using the command below:
hadoop fsck / -files -blocks -racks
If you see that those files are temporary and not required, delete those files or folders.
Example: hadoop fs -rmr /var/local/hadoop/hadoop/.staging/* (give the correct path here)
This should solve the problem immediately. Decommissioned nodes will move to Dead Nodes within 5 minutes.
Please note that the status won't change, or will take ages (and eventually fail), if you do not have more active datanodes than the replication factor, at either the file level or the default level.
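This matches the asker's own finding that a few files with a very high replication factor held up the decommission. As a rough illustration, the Python sketch below flags files whose replication factor cannot be met by the datanodes that will remain. The function name, paths, and replication values are hypothetical; in practice the per-file replication would be read from the fsck listing.

```python
def files_blocking_decommission(replication_by_file, datanodes_remaining):
    """Return files whose replication factor exceeds the remaining datanodes."""
    return sorted(
        path
        for path, replication in replication_by_file.items()
        if replication > datanodes_remaining
    )


# Hypothetical paths and replication factors, for illustration only.
replication_by_file = {
    "/data/a.log": 2,
    "/tmp/job.jar": 10,
    "/data/b.log": 1,
}
print(files_blocking_decommission(replication_by_file, datanodes_remaining=2))
# ['/tmp/job.jar']
```

Files flagged this way can have their replication factor lowered with hadoop fs -setrep before the decommission is retried.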
