Hadoop error: not able to place enough replicas

I am using Hadoop 1.2.1. It has been active for about 2 years. Now the following error has started to appear in the logs, and HBase 0.94.14 cannot connect to it.
NameNode Error:
2016-03-09 11:57:23,965 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1 to reach 2
Not able to place enough replicas
2016-03-09 11:57:23,965 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1 to reach 2
Not able to place enough replicas
2016-03-09 11:57:23,965 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1 to reach 2
Not able to place enough replicas
2016-03-09 11:57:26,966 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1 to reach 2
Not able to place enough replicas
And in the HBase master log file the error looks like the following:
2016-03-09 11:16:31,192 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: node1,12000,1457504177336.timeoutMonitor exiting
2016-03-09 11:16:31,193 INFO org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: node1,12000,1457504177336.splitLogManagerTimeoutMonitor exiting
2016-03-09 11:16:31,192 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: node1,12000,1457504177336.timerUpdater exiting
2016-03-09 11:16:31,218 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2535a0114bb0001 closed
2016-03-09 11:16:31,218 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-03-09 11:16:31,218 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
2016-03-09 11:16:31,218 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2120)
Wed Mar 9 11:22:26 PKT 2016 Stopping hbase (via master)
Where is the problem? I have found a post suggesting that you should delete all data and format the NameNode, but I cannot do that, as I cannot back up the data.
This is the cluster summary:
Configured Capacity: 3293363527680 (3 TB)
Present Capacity: 2630143946752 (2.39 TB)
DFS Remaining: 1867333337088 (1.7 TB)
DFS Used: 762810609664 (710.42 GB)
DFS Used%: 29%
Under replicated blocks: 35
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 10.11.21.44:50010
Decommission Status : Normal
Configured Capacity: 441499058176 (411.18 GB)
DFS Used: 246261780480 (229.35 GB)
Non DFS Used: 194947321856 (181.56 GB)
DFS Remaining: 289955840(276.52 MB)
DFS Used%: 55.78%
DFS Remaining%: 0.07%
Last contact: Thu Mar 10 15:20:15 PKT 2016
Name: 10.11.21.42:50010
Decommission Status : Normal
Configured Capacity: 2410365411328 (2.19 TB)
DFS Used: 304959569920 (284.02 GB)
Non DFS Used: 238646935552 (222.26 GB)
DFS Remaining: 1866758905856(1.7 TB)
DFS Used%: 12.65%
DFS Remaining%: 77.45%
Last contact: Thu Mar 10 15:20:15 PKT 2016
Name: 10.11.21.43:50010
Decommission Status : Normal
Configured Capacity: 441499058176 (411.18 GB)
DFS Used: 211589259264 (197.06 GB)
Non DFS Used: 229625323520 (213.86 GB)
DFS Remaining: 284475392(271.3 MB)
DFS Used%: 47.93%
DFS Remaining%: 0.06%
Last contact: Thu Mar 10 15:20:16 PKT 2016
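For reference, on a Hadoop 1.x cluster the files behind the 35 under-replicated blocks reported above can be listed with fsck, along the lines of the following (a generic check, not specific to this setup):
hadoop fsck / -files -blocks -locations | grep -i "under replicated"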

Related

Hadoop: adding a new datanode fails when building a cluster

I'm building a Hadoop cluster of about two nodes, step by step, following the official documentation.
But the appended datanode does not join the cluster in the Web UI at http://{host address}:50070/dfshealth.html#tab-datanode.
With the command:
[az-user@AZ-TEST1-SPARK-SLAVE ~]$ yarn node --list
17/11/27 09:16:04 INFO client.RMProxy: Connecting to ResourceManager at /10.0.4.12:8032
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
AZ-TEST1-SPARK-MASTER:37164 RUNNING AZ-TEST1-SPARK-MASTER:8042 0
AZ-TEST1-SPARK-SLAVE:42608 RUNNING AZ-TEST1-SPARK-SLAVE:8042 0
It shows there are two nodes, but another command shows just one live datanode:
[az-user@AZ-TEST1-SPARK-SLAVE ~]$ hdfs dfsadmin -report
Configured Capacity: 1081063493632 (1006.82 GB)
Present Capacity: 1026027008000 (955.56 GB)
DFS Remaining: 1026026967040 (955.56 GB)
DFS Used: 40960 (40 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 10.0.4.12:50010 (10.0.4.12)
Hostname: AZ-TEST1-SPARK-MASTER
Decommission Status : Normal
Configured Capacity: 1081063493632 (1006.82 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 97816576 (93.29 MB)
DFS Remaining: 1026026967040 (955.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Nov 27 09:22:36 UTC 2017
The command shows the same result on the master node.
Thanks for any advice.
Other messages
The problem is similar to number-of-nodes-in-hadoop-cluster, but that solution did not work in my case.
I'm using bare IPs and have not configured hostnames in the hosts file as usual.
Fixed
Use hostnames on every node and in their configuration files.
In cluster mode, you must use hostnames rather than bare IPs.
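A minimal sketch of what that looks like in practice, assuming the hostnames from this question (the slave's IP and the NameNode port are placeholders, not taken from the post):
On every node, /etc/hosts maps hostnames to IPs:
10.0.4.12  AZ-TEST1-SPARK-MASTER
10.0.4.13  AZ-TEST1-SPARK-SLAVE    # placeholder IP for the slave
and core-site.xml refers to the NameNode by hostname rather than a bare IP:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://AZ-TEST1-SPARK-MASTER:9000</value>  <!-- port is an assumption -->
</property>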

Hadoop - 3 data nodes are alive, up and running, but the report/URL is not showing the live data nodes

I have one Name Node (master node) and 3 Data Nodes (slave nodes). I have configured a single data node on the Name Node itself, which is working fine and showing up in the report. All the daemons are up and running individually, but the 3 Data Nodes (slave nodes) are not listed in the hadoop dfsadmin -report output.
When jps is run, everything looks good:
Name Node
[hadoop@master ~]$ jps
4338 Jps
2114 NameNode
2420 SecondaryNameNode
2696 NodeManager
2584 ResourceManager
2220 DataNode
Slave Node
[hadoop@slave1 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
Slave Node
[hadoop@slave2 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
Slave Node
[hadoop@slave3 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
[hadoop@master ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/07/14 21:27:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 7092494336 (6.61 GB)
Present Capacity: 1852854272 (1.73 GB)
DFS Remaining: 1852821504 (1.73 GB)
DFS Used: 32768 (32 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Live datanodes (1):
Name: 192.168.1.160:50010 (nn1)  (comment: this is the data node configured on the name node itself)
Hostname: nn1
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 5239640064 (4.88 GB)
DFS Remaining: 1852821504 (1.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 26.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jul 14 21:27:46 IST 2016
This issue is resolved. The problem was simply that the Data Nodes/Slave Nodes were not able to communicate with the Master Node, because the firewall on the master node was not accepting incoming connections from the data nodes. There are two ways to react to the situation:
Allow incoming connections from the IPs of the slave nodes on the master node.
Disable the firewall.
I went with the 2nd option:
Run the following commands on the master node to disable the firewall:
service iptables save
service iptables stop
chkconfig iptables off
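If you prefer the first option (keeping the firewall on), a rough sketch is to whitelist each slave node's IP on the master instead of turning iptables off (the address below is a placeholder, not taken from the question):
# run on the master node, once per slave IP (placeholder address)
iptables -I INPUT -s 192.168.1.161 -j ACCEPT
service iptables save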

Hadoop: Conflict in JPS and hdfs admin report for checking the number of available data nodes

I am working on a five-node Hadoop multi-node cluster. After setting up the cluster, I used the jps command to check whether all of the nodes were properly connected. Following are the results after running jps on the master node and on each of the other four slave nodes.
master node
8825 SecondaryNameNode
8647 DataNode
9105 NodeManager
9418 Jps
8493 NameNode
8971 ResourceManager
slave nodes
1816 NodeManager
1711 DataNode
2154 Jps
But when I checked with the command hdfs dfsadmin -report, I got the following result:
Configured Capacity: 242317230080 (225.68 GB)
Present Capacity: 224333357056 (208.93 GB)
DFS Remaining: 224333332480 (208.93 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 242317230080 (225.68 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 17983873024 (16.75 GB)
DFS Remaining: 224333332480 (208.93 GB)
DFS Used%: 0.00%
DFS Remaining%: 92.58%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
I am unable to understand why the number of available data nodes is shown as 1 in the above report. Also, my program is running very slowly, so I guess only one of the data nodes is active. Kindly mention the cause of this anomaly.

Hadoop datanodes cannot find namenode in standalone setup

There are no errors in any log but I believe my datanode cannot find my namenode.
This is the error that leads me to this conclusion (according to what I've found online):
[INFO ]: org.apache.hadoop.ipc.Client - Retrying connect to server: /hadoop.server:9000. Already tried 4 time(s).
jps output:
7554 Jps
7157 NameNode
7419 SecondaryNameNode
7251 DataNode
Please can someone offer some advice?
Result of dfsadmin
Configured Capacity: 13613391872 (12.68 GB)
Present Capacity: 9255071744 (8.62 GB)
DFS Remaining: 9254957056 (8.62 GB)
DFS Used: 114688 (112 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 192.172.1.49:50010 (Hadoop)
Hostname: Hadoop
Decommission Status : Normal
Configured Capacity: 13613391872 (12.68 GB)
DFS Used: 114688 (112 KB)
Non DFS Used: 4358320128 (4.06 GB)
DFS Remaining: 9254957056 (8.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 67.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Aug 08 17:25:57 SAST 2014
Give a hostname to your machines and make entries for them in the /etc/hosts file, like this:
# hostname hdserver.example.com
# vim /etc/hosts
192.168.0.25 hdserver.example.com
192.168.0.30 hdclient.example.com
and save it. (Use the correct IP addresses.)
On the client, also set the hostname to hdclient.example.com and make the above entries in /etc/hosts. This will help the name server locate the machines by their hostnames.
Delete all contents from the tmp folder: rm -Rf path/of/tmp/directory
Format the namenode: bin/hadoop namenode -format
Start all processes again: bin/start-all.sh
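For completeness: the datanode locates the namenode through fs.default.name in core-site.xml, so with the hostnames above it should look roughly like this (a sketch; port 9000 matches the error message in the question, everything else is assumed):
<property>
  <name>fs.default.name</name>
  <value>hdfs://hdserver.example.com:9000</value>  <!-- this hostname must resolve on every node -->
</property>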

HBase: Newly added regionserver is not serving requests

I'm setting up an HBase cluster on a cloud infrastructure.
HBase version: 0.94.11
Hadoop version: 1.0.4
Currently I have 4 nodes in my cluster (1 master, 3 regionservers) and I'm using YCSB (Yahoo benchmarks) to create a table (500,000 rows) and send READ requests (asynchronous READ requests).
Everything works fine with this setup (I'm monitoring the whole process with Ganglia, and I'm getting lambda, throughput, and latency combined with YCSB's output), but the problem occurs when I add a new regionserver on the fly, as it doesn't get any requests.
What "on-the-fly" means:
While YCSB is sending requests to the cluster, I'm adding new regionservers using Python scripts.
Addition Process (while the cluster is serving requests):
I'm creating a new VM which will act as the new regionserver and configuring every needed aspect (HBase, Hadoop, /etc/hosts, connecting to the private network, etc.)
Stopping the HBase balancer
Configuring every node in the cluster with the new node's information: adding the hostname to the regionservers file, adding the hostname to Hadoop's slaves file, adding the hostname and IP to the /etc/hosts file of every node, etc.
Executing on the master node:
`hadoop/bin/start-dfs.sh`
`hadoop/bin/start-mapred.sh`
`hbase/bin/start-hbase.sh`
(I've also tried running `hbase start regionserver` on the newly added node; it does exactly the same as the last command: it starts the regionserver)
Once the newly added node is up and running, I'm executing the Hadoop load balancer
When the Hadoop load balancer stops, I'm starting the HBase load balancer again
I'm connecting over SSH to the master node and checking that the load balancers (HBase/Hadoop) did their job, as both the blocks and the regions are uniformly spread across all the regionservers/slaves, including the new one (a sketch of the balancer commands follows below).
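Roughly, the balancer steps above correspond to commands like these (a sketch based on the paths and the -threshold 2 value mentioned elsewhere in this question, not a verbatim transcript):
# in the hbase shell: stop the HBase balancer before adding the node
balance_switch false
# on the master, once the new datanode is up: rebalance HDFS blocks
hadoop/bin/hadoop balancer -threshold 2
# in the hbase shell again: re-enable the HBase balancer
balance_switch true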
But when I run status 'simple' in the hbase shell, I see that the new regionservers are not getting any requests. (Below is the output of the command after adding 2 new regionservers, "okeanos-nodes-4/5".)
hbase(main):008:0> status 'simple'
5 live servers
okeanos-nodes-1:60020 1380865800330
requestsPerSecond=5379, numberOfOnlineRegions=4, usedHeapMB=175, maxHeapMB=3067
okeanos-nodes-2:60020 1380865800738
requestsPerSecond=5674, numberOfOnlineRegions=4, usedHeapMB=161, maxHeapMB=3067
okeanos-nodes-5:60020 1380867725605
requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=27, maxHeapMB=3067
okeanos-nodes-3:60020 1380865800162
requestsPerSecond=3871, numberOfOnlineRegions=5, usedHeapMB=162, maxHeapMB=3067
okeanos-nodes-4:60020 1380866702216
requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=29, maxHeapMB=3067
0 dead servers
Aggregate load: 14924, regions: 19
The fact that they don't serve any requests is also evidenced by the CPU usage: on a serving regionserver it is about 70%, while on these 2 regionservers it is about 2%.
Below is the output of hadoop dfsadmin -report; as you can see, the blocks are evenly distributed (according to hadoop balancer -threshold 2).
root@okeanos-nodes-master:~# /opt/hadoop-1.0.4/bin/hadoop dfsadmin -report
Configured Capacity: 105701683200 (98.44 GB)
Present Capacity: 86440648704 (80.5 GB)
DFS Remaining: 84188446720 (78.41 GB)
DFS Used: 2252201984 (2.1 GB)
DFS Used%: 2.61%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 5 (5 total, 0 dead)
Name: 10.0.0.11:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 309166080 (294.84 MB)
Non DFS Used: 3851579392 (3.59 GB)
DFS Remaining: 16979591168(15.81 GB)
DFS Used%: 1.46%
DFS Remaining%: 80.32%
Last contact: Fri Oct 04 11:30:31 EEST 2013
Name: 10.0.0.3:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 531652608 (507.02 MB)
Non DFS Used: 3852300288 (3.59 GB)
DFS Remaining: 16756383744(15.61 GB)
DFS Used%: 2.51%
DFS Remaining%: 79.26%
Last contact: Fri Oct 04 11:30:32 EEST 2013
Name: 10.0.0.5:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 502910976 (479.61 MB)
Non DFS Used: 3853029376 (3.59 GB)
DFS Remaining: 16784396288(15.63 GB)
DFS Used%: 2.38%
DFS Remaining%: 79.4%
Last contact: Fri Oct 04 11:30:32 EEST 2013
Name: 10.0.0.4:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 421974016 (402.43 MB)
Non DFS Used: 3852365824 (3.59 GB)
DFS Remaining: 16865996800(15.71 GB)
DFS Used%: 2%
DFS Remaining%: 79.78%
Last contact: Fri Oct 04 11:30:29 EEST 2013
Name: 10.0.0.10:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 486498304 (463.96 MB)
Non DFS Used: 3851759616 (3.59 GB)
DFS Remaining: 16802078720(15.65 GB)
DFS Used%: 2.3%
DFS Remaining%: 79.48%
Last contact: Fri Oct 04 11:30:29 EEST 2013
I've tried stopping YCSB, restarting the HBase master, and restarting YCSB, but with no luck... these 2 nodes don't serve any requests!
As there are many log and conf files, I have created a zip file with logs and confs (both hbase and hadoop) of the master, a healthy regionserver serving requests and a regionserver not serving requests.
https://dl.dropboxusercontent.com/u/13480502/hbase_hadoop_logs__conf.zip
Thank you in advance!!
I found what was going on, and it had nothing to do with HBase... I had forgotten to add the hostname and IP of the new RS to the YCSB server VM (/etc/hosts file)... :-(
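In other words, the YCSB client VM needed /etc/hosts entries for the new regionservers, along the lines of the following (the IP-to-hostname pairing here is hypothetical; only the hostnames appear in the status output above):
# hypothetical mapping for the two new regionservers
10.0.0.10  okeanos-nodes-4
10.0.0.11  okeanos-nodes-5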
