I am trying to set up a multi-node cluster (Hadoop 1.0.4) and all the daemons come up. I have a 2-node cluster with a master and one slave, and I am configuring only the slave as a datanode.
I changed core-site.xml, mapred-site.xml and hdfs-site.xml, and set the masters file (with the master IP) and the slaves file (with the slave IP).
I configured passwordless SSH and copied the master's public key to the slave's authorized_keys file.
I formatted the namenode.
I can see all the daemons running: NameNode, JobTracker and SecondaryNameNode on the master, and TaskTracker and DataNode on the slave.
But when I try to load data into HDFS using the hadoop fs -put command, I get the following error:
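For reference, a minimal configuration sketch for this kind of 2-node Hadoop 1.x setup might look like the following (the hostname master and port 9000 are assumptions; substitute your actual master address, and make sure it is identical on both nodes):

```xml
<!-- core-site.xml (on both master and slave; hostname/port are assumed values) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

If the slave's copy points at localhost instead of the master's address, the datanode starts but never registers with the namenode.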
15/09/26 08:43:33 ERROR hdfs.DFSClient: Exception closing file /Hello : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /Hello could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
I did an fsck command and got the below message.
FSCK started by hadoop from /172.31.18.149 for path / at Sat Sep 26 08:46:00 EDT 2015
Status: HEALTHY
Total size: 0 B
Total dirs: 5
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 0
Number of racks: 0
Somehow the datanode is not registering with the namenode, but I couldn't figure out why.
Any help is appreciated. Thanks!
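A sketch of how this is usually diagnosed (the log path assumes a default Hadoop 1.x layout, and port 9000 is the port assumed in fs.default.name; adjust both to your install):

```shell
# On the slave: look for registration errors in the datanode log
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

# On the slave: check that the namenode's RPC port is reachable
telnet master 9000
```

A common cause on 1.x is fs.default.name pointing at localhost on the slave, or an /etc/hosts entry mapping the master hostname to 127.0.0.1, so the namenode listens only on the loopback interface.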
Related
I have one NameNode (master node) and 3 DataNodes (slave nodes). I have configured a single datanode on the name node itself, which works fine and shows up in the report. All the daemons are up and running individually, but the 3 DataNodes (slave nodes) are not listed in hadoop dfsadmin -report.
When jps is run, everything looks good:
Name Node
[hadoop@master ~]$ jps
4338 Jps
2114 NameNode
2420 SecondaryNameNode
2696 NodeManager
2584 ResourceManager
2220 DataNode
Slave Node
[hadoop@slave1 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
Slave Node
[hadoop@slave2 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
Slave Node
[hadoop@slave3 ~]$ jps
2114 NodeManager
2229 Jps
2015 DataNode
[hadoop@master ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/07/14 21:27:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 7092494336 (6.61 GB)
Present Capacity: 1852854272 (1.73 GB)
DFS Remaining: 1852821504 (1.73 GB)
DFS Used: 32768 (32 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Live datanodes (1):
Name: 192.168.1.160:50010 (nn1)    (note: this is the datanode configured on the name node itself)
Hostname: nn1
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 5239640064 (4.88 GB)
DFS Remaining: 1852821504 (1.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 26.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jul 14 21:27:46 IST 2016
This issue is resolved. The problem is just that the data nodes/slave nodes are not able to communicate with the master node, because the firewall on the master node is not accepting incoming connections from the data nodes. There are two ways to handle the situation:
Allow incoming connections from the IPs of the slave nodes on the master node.
Disable the firewall.
I went with the 2nd option:
Run the following commands on the master node to disable the firewall.
service iptables save
service iptables stop
chkconfig iptables off
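For completeness, a sketch of the 1st option (the slave IP 192.168.1.161 is a made-up example; substitute your own slave addresses):

```shell
# On the master: accept all traffic from a specific slave node's IP
iptables -I INPUT -s 192.168.1.161 -j ACCEPT
# Persist the rule across reboots
service iptables save
```

This is the safer choice on a machine exposed to an untrusted network, since it keeps the firewall up for everything else.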
hduser@master-virtual-machine:/usr/local/hadoop/etc/hadoop$ jps
5934 Jps
3490 SecondaryNameNode
3678 ResourceManager
5108 NameNode
hduser@master-virtual-machine:/usr/local/hadoop/etc/hadoop$ hdfs dfsadmin -report
15/02/28 22:35:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
Can you check your datanode logs for any errors, and also see if the datanode config files are configured properly?
I want to install hadoop-0.23.5 on single node, but after starting namenode and datanode, it shows that the datanode available is 0:
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
I checked the datanode log file and this is the error:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
I set dfs.namenode.rpc-address in hdfs-site.xml and I don't understand what the problem is. Does anybody know how I could fix this problem?
You are probably hitting this issue, which affected the 0.23 versions.
What you need to do is update fs.default.name in core-default.xml
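For what it's worth, on a typical install this property is set in the user-editable core-site.xml (core-default.xml holds the shipped read-only defaults); a sketch, where the hostname and port are assumptions for a single-node setup:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```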
UPDATE
You need to copy hdfs-site.xml into hbase/conf so HBase can use the correct target replication; otherwise it uses the default of 3.
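Concretely, the copy in hbase/conf just needs to carry the replication setting so HBase writes with 1 replica instead of the default 3 (a sketch; adjust the value to your cluster):

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```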
That fixes the message. But my namenode is always in safemode during every process restart.
The fsck is all fine with no errors, no under replicated etc.
I see no logs after:
2012-10-17 13:15:13,278 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode
will be turned off automatically.
2012-10-17 13:15:14,228 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node:
/default-rack/127.0.0.1:50010
2012-10-17 13:15:14,238 INFO org.apache.hadoop.hdfs.StateChange: BLOCK
NameSystem.processReport: from 127.0.0.1:50010, blocks: 20, processing time: 0 msecs
Any suggestions ?
I have dfs.replication set to 1.
hbase is in distributed mode.
The first write goes through, but when I restart, the namenode always reports the blocks as under-replicated.
Output from hadoop fsck /hbase
/hbase/tb1/.tableinfo.0000000003: Under replicated blk_-6315989673141511716_1029. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.oldlogs/hlog.1350414672838: Under replicated blk_-7364606700173135939_1027. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.regioninfo: Under replicated blk_788178851601564156_1027. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 8731 B
Total dirs: 34
Total files: 25 (Files currently being written: 1)
Total blocks (validated): 25 (avg. block size 349 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 25 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 50 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Oct 16 13:23:55 PDT 2012 in 0 milliseconds
Why does it say the target replica count is 3 when the default replication factor is clearly 1?
Anyone, please advise.
My versions are Hadoop 1.0.3 and HBase 0.94.1.
Thanks!
To force HDFS to exit safemode, type this:
hadoop dfsadmin -safemode leave
When I try to copy a directory of 3 files into HDFS, I get the following errors:
hduser@saket-K53SM:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
12/08/01 23:48:46 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hduser/gutenberg/gutenberg/pg20417.txt could only be replicated to 0 nodes, instead of 1
12/08/01 23:48:46 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/08/01 23:48:46 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hduser/gutenberg/gutenberg/pg20417.txt" - Aborting...
copyFromLocal: java.io.IOException: File /user/hduser/gutenberg/gutenberg/pg20417.txt could only be replicated to 0 nodes, instead of 1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hduser/gutenberg/gutenberg/pg20417.txt could only be replicated to 0 nodes, instead of 1
My fsck output is:
hduser@saket-K53SM:/usr/local/hadoop$ bin/hadoop fsck -blocks
FSCK started by hduser from /127.0.0.1 for path / at Wed Aug 01 23:50:49 IST 2012
Status: HEALTHY
Total size: 0 B
Total dirs: 10
Total files: 0 (Files currently being written: 2)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 0
Number of racks: 0
FSCK ended at Wed Aug 01 23:50:49 IST 2012 in 3 milliseconds
The filesystem under path '/' is HEALTHY
Also, when I try to format the namenode, I get the following error:
hduser@saket-K53SM:/usr/local/hadoop$ bin/hadoop namenode -format
12/08/01 23:53:07 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = saket-K53SM/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
Re-format filesystem in /app/hadoop/tmp/dfs/name ? (Y or N) y
Format aborted in /app/hadoop/tmp/dfs/name
12/08/01 23:53:09 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at saket-K53SM/127.0.1.1
Any help would be appreciated.
I believe it's a very silly issue. Enter "Y" instead of lowercase "y" (it's supposed to be uppercase).
Have you tried to:
stop namenode
stop datanode
delete /app/hadoop*
format namenode
start datanode and namenode again
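With a Hadoop 1.x layout, those steps might look like this (the /app/hadoop/tmp path follows the question's setting; note this wipes all HDFS data):

```shell
stop-all.sh                  # stop the namenode, datanode and other daemons
rm -rf /app/hadoop/tmp/*     # delete the old namenode/datanode state (destroys HDFS data)
hadoop namenode -format      # re-format; type an uppercase Y at the prompt
start-all.sh                 # bring the daemons back up
```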
After stopping all the daemons (i.e. stop-all.sh), delete the data directory which holds the namenode's temporary files. After deleting the data directory, start the Hadoop daemons again (i.e. start-all.sh).
The path of the "data" directory is the value of the hadoop.tmp.dir property in conf/core-site.xml.
I think this will fix your problem.
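That property looks like this in conf/core-site.xml (the /app/hadoop/tmp path is the one used in the question above; yours may differ):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
```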