HBase/HDFS goes to safe mode on process restart (probably under-replication reported?) - Hadoop

UPDATE
You need to copy hdfs-site.xml into hbase/conf so HBase picks up the correct target replication; otherwise it uses the default of 3.
That fixes the message, but my NameNode still enters safe mode on every process restart.
fsck comes back clean, with no errors and nothing under-replicated.
I see no logs after:
2012-10-17 13:15:13,278 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode
will be turned off automatically.
2012-10-17 13:15:14,228 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node:
/default-rack/127.0.0.1:50010
2012-10-17 13:15:14,238 INFO org.apache.hadoop.hdfs.StateChange: BLOCK
NameSystem.processReport: from 127.0.0.1:50010, blocks: 20, processing time: 0 msecs
Any suggestions?
I have dfs.replication set to 1.
HBase is in distributed mode.
The first write goes through, but whenever I restart, the NameNode reports the blocks as under-replicated.
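For reference, a minimal sketch of the replication setup described above, assuming $HADOOP_HOME and $HBASE_HOME point at the respective installs (these paths are assumptions, not from the original post):
# hdfs-site.xml on the Hadoop side should carry the replication target:
#   <property>
#     <name>dfs.replication</name>
#     <value>1</value>
#   </property>
# Make the same file visible to HBase so it stops assuming the default of 3.
ln -s $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf/hdfs-site.xml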
Output from hadoop fsck /hbase
/hbase/tb1/.tableinfo.0000000003: Under replicated blk_-6315989673141511716_1029. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.oldlogs/hlog.1350414672838: Under replicated blk_-7364606700173135939_1027. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.regioninfo: Under replicated blk_788178851601564156_1027. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 8731 B
Total dirs: 34
Total files: 25 (Files currently being written: 1)
Total blocks (validated): 25 (avg. block size 349 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 25 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 50 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Oct 16 13:23:55 PDT 2012 in 0 milliseconds
Why does it say the target replica count is 3 when the default replication factor is clearly 1?
Can anyone please advise?
My versions are Hadoop 1.0.3 and HBase 0.94.1.
Thanks!

To force HDFS to exit safe mode, run:
hadoop dfsadmin -safemode leave
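Note that sharing hdfs-site.xml with HBase only affects files written afterwards; files already created with a target of 3 keep that value in their metadata, which is presumably why fsck still prints "Target Replicas is 3". A hedged sketch of cleaning those up, assuming /hbase is the HBase root shown in the fsck output above:
# Lower the stored replication target of the existing files to 1 and wait for it to apply.
hadoop fs -setrep -w 1 -R /hbase
# Re-check; the under-replicated warnings should be gone.
hadoop fsck /hbase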

Related

Hadoop 3.2: No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster)

I have a local Hadoop 3.2 installation: 1 master + 1 worker, both running on my laptop. This is an experimental setup for quick tests before submitting to a real cluster.
Everything is in good health:
$ jps
22326 NodeManager
21641 DataNode
25530 Jps
22042 ResourceManager
21803 SecondaryNameNode
21517 NameNode
$ hdfs fsck /
Connecting to namenode via http://master:9870/fsck?ugi=david&path=%2F
FSCK started by david (auth:SIMPLE) from /127.0.0.1 for path / at Wed Sep 04 13:54:59 CEST 2019
Status: HEALTHY
Number of data-nodes: 1
Number of racks: 1
Total dirs: 1
Total symlinks: 0
Replicated Blocks:
Total size: 0 B
Total files: 0
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Wed Sep 04 13:54:59 CEST 2019 in 0 milliseconds
The filesystem under path '/' is HEALTHY
When I run the provided Pi example, I get the following error:
$ yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 16 1000
Number of Maps = 16
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
2019-09-04 13:55:47,665 INFO client.RMProxy: Connecting to ResourceManager at master/0.0.0.0:8032
2019-09-04 13:55:47,887 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001
2019-09-04 13:55:48,020 INFO input.FileInputFormat: Total input files to process : 16
2019-09-04 13:55:48,450 INFO mapreduce.JobSubmitter: number of splits:16
2019-09-04 13:55:48,508 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-09-04 13:55:49,000 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1567598091808_0001
2019-09-04 13:55:49,003 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-09-04 13:55:49,164 INFO conf.Configuration: resource-types.xml not found
2019-09-04 13:55:49,164 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-09-04 13:55:49,375 INFO impl.YarnClientImpl: Submitted application application_1567598091808_0001
2019-09-04 13:55:49,411 INFO mapreduce.Job: The url to track the job: http://cyclimse:8088/proxy/application_1567598091808_0001/
2019-09-04 13:55:49,412 INFO mapreduce.Job: Running job: job_1567598091808_0001
2019-09-04 13:55:55,477 INFO mapreduce.Job: Job job_1567598091808_0001 running in uber mode : false
2019-09-04 13:55:55,480 INFO mapreduce.Job: map 0% reduce 0%
2019-09-04 13:55:55,509 INFO mapreduce.Job: Job job_1567598091808_0001 failed with state FAILED due to: Application application_1567598091808_0001 failed 2 times due to AM Container for appattempt_1567598091808_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-09-04 13:55:54.458]Exception from container-launch.
Container id: container_1567598091808_0001_02_000001
Exit code: 1
[2019-09-04 13:55:54.464]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2019-09-04 13:55:54.465]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://cyclimse:8088/cluster/app/application_1567598091808_0001 Then click on links to logs of each attempt.
. Failing the application.
2019-09-04 13:55:55,546 INFO mapreduce.Job: Counters: 0
Job job_1567598091808_0001 failed!
It seems there is something wrong with the Log4j configuration: No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). However, it is using the default configuration ($HADOOP_CONF_DIR/log4j.properties).
After the execution, HDFS state looks like this:
$ hdfs fsck /
Connecting to namenode via http://master:9870/fsck?ugi=david&path=%2F
FSCK started by david (auth:SIMPLE) from /127.0.0.1 for path / at Wed Sep 04 14:01:43 CEST 2019
/tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001/job.jar: Under replicated BP-24234081-0.0.0.0-1567598050928:blk_1073741841_1017. Target Replicas is 10 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
/tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001/job.split: Under replicated BP-24234081-0.0.0.0-1567598050928:blk_1073741842_1018. Target Replicas is 10 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
Status: HEALTHY
Number of data-nodes: 1
Number of racks: 1
Total dirs: 11
Total symlinks: 0
Replicated Blocks:
Total size: 510411 B
Total files: 20
Total blocks (validated): 20 (avg. block size 25520 B)
Minimally replicated blocks: 20 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 2 (10.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 18 (47.36842 %)
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Wed Sep 04 14:01:43 CEST 2019 in 5 milliseconds
The filesystem under path '/' is HEALTHY
As I didn't find any solution on the Internet, here I am :).
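No answer is recorded here; as a starting point, a diagnostic sketch (an assumption about where to look, not a confirmed fix) for checking whether the ApplicationMaster container can actually see a log4j.properties, using the application id from the failed run above:
# Pull whatever the failed ApplicationMaster managed to log.
yarn logs -applicationId application_1567598091808_0001 | head -n 100
# The first classpath entry should be the conf directory that holds log4j.properties.
hadoop classpath | tr ':' '\n' | head -n 5
ls $HADOOP_CONF_DIR/log4j.properties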

ERROR: hbase:meta table is not consistent

Error while creating a table in HBase: "ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later."
hbck -fix shows ERROR: hbase:meta is not found on any region.
The error appeared after a fresh start of the hbase shell session; no errors were reported in the master log during startup.
The last HBase session was closed properly, but ZooKeeper was not (I suspect this is the reason for the meta table corruption).
I am able to list the tables created earlier:
hbase(main):001:0> list
TABLE
IDX_STOCK_SYMBOL
Patient
STOCK_SYMBOL
STOCK_SYMBOL_BKP
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
8 row(s) in 1.7930 seconds
Creating a table named custmaster
hbase(main):002:0> create 'custmaster', 'customer'
ERROR: java.io.IOException: Table Namespace Manager not ready yet, try
again later
at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3179)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1735)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1774)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40470)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Workaround: running hbck to identify inconsistencies
[hduser@master ~]$ hbase hbck
>Version: 0.98.4-hadoop2
>Number of live region servers: 2
>Number of dead region servers: 0
>Master: master,60000,1538793456542
>Number of backup masters: 0
>Average load: 0.0
>Number of requests: 11
>Number of regions: 0
>Number of regions in transition: 1
>
>ERROR: META region or some of its attributes are null.
>ERROR: hbase:meta is not found on any region.
>ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...
.
.
.
>Summary: >
>3 inconsistencies detected.
>Status: INCONSISTENT
Ran hbck with the -details option to identify the tables involved:
[hduser@master ~]$ hbase hbck -details
>ERROR: META region or some of its attributes are null.
>ERROR: hbase:meta is not found on any region.
>ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...
>Summary:
>3 inconsistencies detected.
>Status: INCONSISTENT
The output of -details clearly shows that hbase:meta is not found on any region.
I tried running hbase hbck -fixMeta, but it returned the same result as above.
Hence I tried hbase hbck -fix.
This command ran for some time with the prompt "Trying to fix a problem with hbase:meta.." and resulted in the error below:
[hduser@master ~]$ hbase hbck -fix
Version: 0.98.4-hadoop2
Number of live region servers: 2
Number of dead region servers: 0
Master: master,60000,1538793456542
Number of backup masters: 0
Average load: 0.0
Number of requests: 19
Number of regions: 0
Number of regions in transition: 1
ERROR: META region or some of its attributes are null.
ERROR: hbase:meta is not found on any region.
Trying to fix a problem with hbase:meta..
2018-10-06 09:01:03,424 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2018-10-06 09:01:03,425 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x166473bbe720005
2018-10-06 09:01:03,432 INFO [main] zookeeper.ZooKeeper: Session: 0x166473bbe720005 closed
2018-10-06 09:01:03,432 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Sat Oct 06 08:52:13 IST 2018, org.apache.hadoop.hbase.client.RpcRetryingCaller#18920cc, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2416)
at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:2472)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40456)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Sat Oct 06 08:52:13 IST 2018, org.apache.hadoop.hbase.client.RpcRetryingCaller#18920cc, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2416)
at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:2472)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40456)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Can you help me resolve this issue?
Thanks in advance!!
I had not checked the NameNode and DataNode logs. When I did, the real issue turned out to be corrupt files in HDFS.
I ran hadoop fsck / to check the health of the file system:
[hduser@master ~]$ hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/10/06 09:52:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=hduser&path=%2F
FSCK started by hduser (auth:SIMPLE) from /192.168.1.11 for path / at Sat Oct 06 09:52:02 IST 2018
.................................................................................
/user/hduser/hbase/.hbck/hbase-1538798774320/data/hbase/meta/1588230740/info/359783d4cd07419598264506bac92dcf: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744002
/user/hduser/hbase/.hbck/hbase-1538798774320/data/hbase/meta/1588230740/info/359783d4cd07419598264506bac92dcf: MISSING 1 blocks of total size 3934 B.........
/user/hduser/hbase/data/default/IDX_STOCK_SYMBOL/a27db76f84487a05f3e1b8b74c13fa78/0/c595bf49443f4daf952df6cdaad79181: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744000
/user/hduser/hbase/data/default/IDX_STOCK_SYMBOL/a27db76f84487a05f3e1b8b74c13fa78/0/c595bf49443f4daf952df6cdaad79181: MISSING 1 blocks of total size 1354 B............
/user/hduser/hbase/data/default/SYSTEM.CATALOG/d63574fdd00e8bf3882fcb6bd53c3d83/0/dcb68bbb5e394d19b06db7f298810de0: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744001
/user/hduser/hbase/data/default/SYSTEM.CATALOG/d63574fdd00e8bf3882fcb6bd53c3d83/0/dcb68bbb5e394d19b06db7f298810de0: MISSING 1 blocks of total size 2283 B...........................
Status: CORRUPT
Total size: 4232998 B
Total dirs: 109
Total files: 129
Total symlinks: 0
Total blocks (validated): 125 (avg. block size 33863 B)
********************************
UNDER MIN REPL'D BLOCKS: 3 (2.4 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 3
MISSING BLOCKS: 3
MISSING SIZE: 7571 B
CORRUPT BLOCKS: 3
********************************
Minimally replicated blocks: 122 (97.6 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.952
Corrupt blocks: 3
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Oct 06 09:52:02 IST 2018 in 66 milliseconds
The filesystem under path '/' is CORRUPT
Then I ran hdfs fsck with the -delete option to remove the corrupt files, which fixed the issue.
A detailed explanation of cleaning up the HDFS file system is available here --> How to fix corrupt HDFS files
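For completeness, a sketch of that cleanup flow; note that the -delete pass permanently removes the affected files, so confirm they are expendable first:
# List only the files with corrupt blocks.
hdfs fsck / -list-corruptfileblocks
# Remove the corrupt files, then verify the filesystem is healthy again.
hdfs fsck / -delete
hdfs fsck /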

Hadoop MultiNode Cluster Set up - Data not getting loaded to HDFS

I am trying to set up a multinode cluster (Hadoop 1.0.4) and all the daemons are coming up. I have a 2-node cluster with a master and one slave, and I am configuring only the slave as a datanode.
I changed core-site.xml, mapred-site.xml and hdfs-site.xml, plus the masters file (with the master IP) and the slaves file (with the slave IP).
I configured passwordless SSH and copied the master's public key to the slave's authorized_keys.
I formatted the NameNode.
I can see all the daemons running: NameNode, JobTracker and SecondaryNameNode on the master, and TaskTracker and DataNode on the slave machine.
But when I try to load data into HDFS using the hadoop fs -put command, I get the following error:
15/09/26 08:43:33 ERROR hdfs.DFSClient: Exception closing file /Hello : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /Hello could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
I ran an fsck command and got the output below:
FSCK started by hadoop from /172.31.18.149 for path / at Sat Sep 26 08:46:00 EDT 2015
Status: HEALTHY
Total size: 0 B
Total dirs: 5
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 0
Number of racks: 0
Somehow the DataNode is unavailable to the NameNode, but I couldn't figure out why.
Any help is appreciated. Thanks!
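The fsck output shows Number of data-nodes: 0, i.e. the DataNode never registered with the NameNode. A diagnostic sketch, assuming Hadoop 1.x default log locations and that fs.default.name points at port 9000 on the master (both are assumptions):
# On the master: what does the NameNode think is connected?
hadoop dfsadmin -report
# On the slave: look for registration errors in the DataNode log
# (an "Incompatible namespaceIDs" error after re-formatting the NameNode is a common cause).
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
# On the slave: confirm the NameNode RPC port named in fs.default.name is reachable.
telnet <master-ip> 9000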

Hadoop node taking a long time to decommission

EDIT: I finally figured out what the issue was. Some files had a very high replication factor set, and I was reducing my cluster to 2 nodes. Once I reduced the replication factor on those files (see the sketch below), the decommissioning finished quickly.
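A minimal sketch of that fix; the grep pattern and the example path are illustrative, and the target of 2 matches the default replication factor shown in the fsck output below:
# Find blocks whose stored replication factor is higher than 2.
hadoop fsck / -files -blocks | grep -E "repl=([3-9]|[0-9]{2,})"
# Lower the replication factor on the offending files so decommissioning can finish.
hadoop fs -setrep -w 2 /path/to/over-replicated/file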
I've added the node to be decommissioned in the dfs.hosts.exclude and mapred.hosts.exclude files, and executed this command:
bin/hadoop dfsadmin -refreshNodes.
In the NameNode UI, I see this node under Decommissioning Nodes, but it is taking a very long time, and I don't have much data on the node being decommissioned.
Does it always take a very long time to decommission nodes, or is there somewhere I should be looking? I'm not sure what exactly is going on.
I also don't see any corrupt blocks on this node:
$ ./hadoop/bin/hadoop fsck -blocks /
Total size: 157254687 B
Total dirs: 201
Total files: 189 (Files currently being written: 6)
Total blocks (validated): 140 (avg. block size 1123247 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 140 (100.0 %)
Over-replicated blocks: 6 (4.285714 %)
Under-replicated blocks: 12 (8.571428 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.9714285
Corrupt blocks: 0
Missing replicas: 88 (31.884058 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jul 22 14:42:45 IST 2013 in 33 milliseconds
The filesystem under path '/' is HEALTHY
$ ./hadoop/bin/hadoop dfsadmin -report
Configured Capacity: 25357025280 (23.62 GB)
Present Capacity: 19756299789 (18.4 GB)
DFS Remaining: 19366707200 (18.04 GB)
DFS Used: 389592589 (371.54 MB)
DFS Used%: 1.97%
Under replicated blocks: 14
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 10.40.11.107:50010
Decommission Status : Decommission in progress
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 54947840 (52.4 MB)
Non DFS Used: 1786830848 (1.66 GB)
DFS Remaining: 6610563072(6.16 GB)
DFS Used%: 0.65%
DFS Remaining%: 78.21%
Last contact: Mon Jul 22 14:29:37 IST 2013
Name: 10.40.11.106:50010
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167412428 (159.66 MB)
Non DFS Used: 1953377588 (1.82 GB)
DFS Remaining: 6331551744(5.9 GB)
DFS Used%: 1.98%
DFS Remaining%: 74.91%
Last contact: Mon Jul 22 14:29:37 IST 2013
Name: 10.40.11.108:50010
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167232321 (159.49 MB)
Non DFS Used: 1860517055 (1.73 GB)
DFS Remaining: 6424592384(5.98 GB)
DFS Used%: 1.98%
DFS Remaining%: 76.01%
Last contact: Mon Jul 22 14:29:38 IST 2013
Decommissioning is not an instant process, even if you don't have much data.
First, decommissioning means the node's data has to be re-replicated elsewhere, which can be quite a few blocks (depending on how large your block size is). This could easily overwhelm your cluster and cause operational issues, so I believe it is somewhat throttled.
Also, depending on which Hadoop version you use, the thread that monitors decommissions only wakes up every so often. It used to be around 5 minutes in earlier versions of Hadoop, but I believe it is now every minute or less.
Decommission in progress means that the blocks are being replicated, so it really depends on how much data you have, and you just have to wait, since the cluster won't be fully dedicated to this task.
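One way to watch that progress is the same dfsadmin report shown in the question above; a small sketch:
# The "Under replicated blocks" counter should drain toward 0 as decommissioning proceeds.
hadoop dfsadmin -report | grep -i "under replicated"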
While decommissioning is in progress, temporary and staging files get cleaned up automatically. Those files are now missing, and Hadoop cannot tell how they went missing, so the decommissioning process keeps waiting for them to be resolved even though the actual decommissioning is done for all the other files.
In the Hadoop GUI, if you notice that the "Number of Under-Replicated Blocks" value is not decreasing over time or is almost constant, then this is likely the reason.
So list the files using the command below:
hadoop fsck / -files -blocks -racks
If those files are temporary and not required, delete those files or folders.
Example: hadoop fs -rmr /var/local/hadoop/hadoop/.staging/* (give the correct path here)
This should solve the problem immediately; decommissioned nodes will move to Dead Nodes within 5 minutes.
Please note that the status won't change (or will take ages and eventually fail) if you do not have more active datanodes than the replication factor, whether set at the file level or as the default.

datanode available: 0 when installing hadoop

I want to install hadoop-0.23.5 on a single node, but after starting the namenode and datanode, it shows that the number of datanodes available is 0:
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
I checked the datanode log file and this is the error:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
I set dfs.namenode.rpc-address in hdfs-site.xml and I don't understand what the problem is. Does anybody know how I could fix this?
You are probably hitting this issue, which affected the 0.23 releases.
What you need to do is set fs.default.name in core-site.xml.
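A minimal sketch of the relevant setting, assuming a single-node setup on localhost and that $HADOOP_CONF_DIR points at your configuration directory (the host and port are assumptions):
# core-site.xml names the default filesystem; the datanode derives the
# namenode address from this when dfs.namenode.rpc-address is not picked up.
cat > $HADOOP_CONF_DIR/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Restart the namenode and datanode so they pick up the change.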
