While setting up a multi-node Hadoop cluster I ran into several issues.
While going through different web portals for the correct setup, some fundamental questions arose.
I am using Hadoop 2.8.5 to set up a 2-node cluster in a master-slave configuration.
On the first machine, I formatted the namenode using hdfs namenode -format
The clusterID and blockpoolID were assigned as below:
#Fri Mar 29 11:14:41 IST 2019
namespaceID=576041649
clusterID=CID-98480e8d-f7a9-4e1a-8997-400a7aa150c3
cTime=1553838281164
storageType=NAME_NODE
blockpoolID=BP-954411427-x.x.x.y-1553838281164
layoutVersion=-63
Now on the 2nd machine, I ran the command hdfs namenode -format -clusterId CID-98480e8d-f7a9-4e1a-8997-400a7aa150c3
#Fri Mar 29 11:15:38 IST 2019
namespaceID=304822257
clusterID=CID-98480e8d-f7a9-4e1a-8997-400a7aa150c3
cTime=1553838338130
storageType=NAME_NODE
blockpoolID=BP-1421744029-x.x.x.x-1553838338130
layoutVersion=-63
I assumed that the slave and master should have the same clusterID. Correct me if I am wrong.
The configuration seems to be working correctly, but I am getting errors in the logs at logs/hadoop-cassandra-datanode-localnosql1.log and logs/hadoop-cassandra-datanode-localnosql2.log
2019-03-29 11:25:44,009 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-954411427-x.x.x.y-1553838281164 (Datanode Uuid 4b90bebb-3c34-43d5-8285-6ec6dfefc0a7) service to localnosql1/x.x.x.x:8020 Blockpool ID mismatch: previously connected to Blockpool ID BP-954411427-x.x.x.y-1553838281164 but now connected to Blockpool ID BP-1421744029-x.x.x.x-1553838338130
2019-03-29 11:25:49,010 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-954411427-x.x.x.y-1553838281164 (Datanode Uuid 4b90bebb-3c34-43d5-8285-6ec6dfefc0a7) service to localnosql1/x.x.x.x:8020 Blockpool ID mismatch: previously connected to Blockpool ID BP-954411427-x.x.x.y-1553838281164 but now connected to Blockpool ID BP-1421744029-x.x.x.x-1553838338130
2019-03-29 11:25:54,012 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-954411427-x.x.x.y-1553838281164 (Datanode Uuid 4b90bebb-3c34-43d5-8285-6ec6dfefc0a7) service to localnosql1/x.x.x.x:8020 Blockpool ID mismatch: previously connected to Blockpool ID BP-954411427-x.x.x.y-1553838281164 but now connected to Blockpool ID BP-1421744029-x.x.x.x-1553838338130
2019-03-29 11:25:59,013 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-954411427-x.x.x.y-1553838281164 (Datanode Uuid 4b90bebb-3c34-43d5-8285-6ec6dfefc0a7) service to localnosql1/x.x.x.x:8020 Blockpool ID mismatch: previously connected to Blockpool ID BP-954411427-x.x.x.y-1553838281164 but now connected to Blockpool ID BP-1421744029-x.x.x.x-1553838338130
2019-03-29 11:26:04,014 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-954411427-x.x.x.y-1553838281164 (Datanode Uuid 4b90bebb-3c34-43d5-8285-6ec6dfefc0a7) service to localnosql1/x.x.x.x:8020 Blockpool ID mismatch: previously connected to Blockpool ID BP-954411427-x.x.x.y-1553838281164 but now connected to Blockpool ID BP-1421744029-x.x.x.x-1553838338130
What are these error logs suggesting?
Does the blockpool ID on all master and slave nodes need to be the same, like the clusterID? If yes, how do I do that?
Why are you trying to format the namenode twice? Ideally, in a multi-node configuration, there is one namenode and many datanodes.
When setting up for the first time, you initialize the namenode with "hdfs namenode -format", then you start the datanodes, and it works fine.
If you are trying a multi-master configuration (multiple namenodes running at the same time), then I am not sure this will work.
If you are trying an active-standby configuration for the namenode, you may try the steps below:
Hadoop Namenode HA setup
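To see why the second format produced a different blockpool, it can help to take the two blockpool IDs from the question apart. The sketch below assumes the layout BP-<random number>-<address>-<creation time in milliseconds>, which is inferred from the VERSION files above (the trailing number equals the cTime field); the function name parse_blockpool_id is made up for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout, inferred from the VERSION files in the question:
#   BP-<random number>-<address>-<creation time in ms since the epoch>
BLOCKPOOL_RE = re.compile(r"^BP-(\d+)-(.+)-(\d+)$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_blockpool_id(bpid):
    """Split a blockpool ID into (random part, address, creation time in UTC)."""
    match = BLOCKPOOL_RE.match(bpid)
    if match is None:
        raise ValueError("unexpected blockpool ID: " + bpid)
    rand, address, millis = match.groups()
    created = EPOCH + timedelta(milliseconds=int(millis))
    return int(rand), address, created


# The two IDs quoted in the question, one per format run.
first = parse_blockpool_id("BP-954411427-x.x.x.y-1553838281164")
second = parse_blockpool_id("BP-1421744029-x.x.x.x-1553838338130")

print(first[2].isoformat())   # prints 2019-03-29T05:44:41.164000+00:00
print(second[2] - first[2])   # time between the two format runs
```

The first timestamp is 11:14:41 IST, the same time as the header of the first VERSION file, so each ID appears to record where and when one particular format was run. Passing -clusterId only pins the clusterID; the second format still created its own blockpool, which is the mismatch the datanode reports.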
Our cluster is running with 2 core nodes that have little DFS capacity, and it needs to be increased.
I added a new 500GB volume to the core node instance, mounted it at /mnt1, and updated hdfs-site.xml on both the master and core nodes.
<property>
<name>dfs.datanode.dir</name>
<value>/mnt/hdfs,/mnt/hdfs1</value>
</property>
Then I restarted both the hadoop-hdfs-namenode and hadoop-hdfs-datanode services. But the datanode shuts down because of the new volume.
2018-06-19 11:25:05,484 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode (DataNode: [[[DISK]file:/mnt/hdfs/, [DISK]file:/mnt/hdfs1]] heartbeating to ip-10-60-12-232.ap-south-1.compute.internal/10.60.12.232:8020): Initialization
failed for Block pool (Datanode Uuid unassigned) service
to ip-10-60-12-232.ap-south-1.compute.internal/10.60.12.232:8020.
Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed
volumes - current valid volumes: 1, volumes configured: 2, volumes
failed: 1, volume failures tolerated: 0
Upon searching, I see that people suggest formatting the namenode so that a blockpool ID gets assigned to both volumes. How can I fix this issue?
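Before formatting anything, it may be worth checking which of the two configured directories the datanode could not use. The sketch below is only a rough approximation of such a check, not the DataNode's own disk checker: it assumes it is run on the core node as the same user that runs the datanode, and the function name check_data_dirs is made up for this example. Note that the question mounts the volume at /mnt1 while the config lists /mnt/hdfs1, so that path is a natural first suspect.

```python
import os


def check_data_dirs(value):
    """Return {path: problem} for every configured directory that looks unusable."""
    problems = {}
    for entry in value.split(","):
        path = entry.strip()
        if not path:
            continue
        if not os.path.isdir(path):
            problems[path] = "missing or not a directory"
        elif not os.access(path, os.R_OK | os.W_OK | os.X_OK):
            problems[path] = "not readable/writable/executable by this user"
    return problems


# Value copied from the hdfs-site.xml snippet in the question.
for path, problem in check_data_dirs("/mnt/hdfs,/mnt/hdfs1").items():
    print(path, "->", problem)
```

If a directory is reported here, fixing the mount point, the path in hdfs-site.xml, or the ownership of the directory is a much less destructive first step than formatting the namenode.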
When I started the Hadoop cluster, the following exception was thrown. I don't have any idea how to solve it. Can anyone help me? Thanks
2017-07-10 09:40:58,960 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tools/hadoop/hadoop_storage/hdfs/datanode: namenode clusterID = CID-47191263-b5b7-4a4d-b8b5-a78b782e66bb; datanode clusterID = CID-79a53373-9652-4c08-9735-b5972e0450ca
2017-07-10 09:40:58,960 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:54310. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
at java.lang.Thread.run(Thread.java:745)
2017-07-10 09:40:58,961 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:54310
2017-07-10 09:40:58,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2017-07-10 09:41:00,962 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-07-10 09:41:00,964 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-07-10 09:41:00,966 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Perhaps you formatted your cluster more than once, so a different cluster ID was generated on the master node than the one stored on the data node.
Your namenode and datanode cluster IDs do not match; you need to make them the same.
On the name node, the cluster ID is in the file located at:
$ nano HADOOP_FILE_SYSTEM/namenode/current/VERSION
On the data node, the cluster ID is stored in the file:
$ nano HADOOP_FILE_SYSTEM/datanode/current/VERSION
Whichever way you change the ID, make sure that the ID is the same on all of the cluster's nodes.
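A small sketch of that comparison, assuming the VERSION files are plain key=value text with '#' comment lines as shown in the other posts here. The function names read_version and cluster_ids_match are made up for this example, and the sample strings only carry the two IDs from the log in the question.

```python
def read_version(text):
    """Parse the key=value lines of a VERSION file, skipping '#' comment lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def cluster_ids_match(namenode_text, datanode_text):
    """True when both VERSION texts carry the same clusterID."""
    return read_version(namenode_text)["clusterID"] == read_version(datanode_text)["clusterID"]


# Sample texts built from the two IDs in the log above.
namenode_version = "clusterID=CID-47191263-b5b7-4a4d-b8b5-a78b782e66bb\nstorageType=NAME_NODE\n"
datanode_version = "clusterID=CID-79a53373-9652-4c08-9735-b5972e0450ca\nstorageType=DATA_NODE\n"

print(cluster_ids_match(namenode_version, datanode_version))  # prints False
```

On a real node you would read the two files with open(path).read() instead of using the sample strings.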
@VanThaoNguyen is correct.
In my case:
/installation directory/hdata/dfs/name/current
/installation directory/hdata/dfs/data/current
clusterID=xxxx-xxxx-xxxx-xxxx
should be the same for the name node and the data node.
I am installing Hadoop 2.7.2 (1 master NN, 1 secondary NN, 3 datanodes) and cannot start the datanodes!
After troubleshooting with the logs (see below), the fatal error is due to a clusterID mismatch... easy! Just change the IDs.
WRONG... when I check the VERSION files on the NameNode and the DataNodes, they are identical.
So the question is simple: in the log file, where is the clusterID of the NameNode coming from?
Log file:
WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /home/hduser/mydata/hdfs/datanode: namenode clusterID = **CID-8e09ff25-80fb-4834-878b-f23b3deb62d0**; datanode clusterID = **CID-cd85e59a-ed4a-4516-b2ef-67e213cfa2a1**
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to master/172.XX.XX.XX:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
at java.lang.Thread.run(Thread.java:745)
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to master/172.XX.XX.XX:9000
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
COPY of THE VERSION FILE
the master
storageID=DS-f72f5710-a869-489d-9f52-40dadc659937
clusterID=CID-cd85e59a-ed4a-4516-b2ef-67e213cfa2a1
cTime=0
datanodeUuid=54bc8b80-b84f-4893-8b96-36568acc5d4b
storageType=DATA_NODE
layoutVersion=-56
THE DataNode
storageID=DS-f72f5710-a869-489d-9f52-40dadc659937
clusterID=CID-cd85e59a-ed4a-4516-b2ef-67e213cfa2a1
cTime=0
datanodeUuid=54bc8b80-b84f-4893-8b96-36568acc5d4b
storageType=DATA_NODE
layoutVersion=-56
Just to summarize (and close) this issue, I would like to share how I fixed it.
On the master and the second namenode, the namenode VERSION file is under ~/.../namenode/current/VERSION.
But for the datanodes the path is different. It should look something like ~/.../datanode/current/VERSION
The clusterIDs in the 2 VERSION files should be identical.
Hope it helps!
I also faced the same issue while installing 2.7.2. The data node does not come up. The error shown in the datanode log file is:
java.io.IOException: Incompatible clusterIDs in
/home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode: namenode
clusterID = CID-XXX; datanode
clusterID = CID-YYY
What I did was:
HADOOP_DIR/bin/hadoop namenode -format -clusterID CID-YYY
(No quotes required for cluster id)
Just to add one more thing.
First, stop the DFS and delete the namenode and datanode directories as specified in hdfs-site.xml.
After that, go to the ../namenode/current/VERSION file, copy the clusterID, and replace the clusterID in the ../datanode/current/VERSION file with the copied clusterID.
I get the following error trying to start datanodes in HA HDFS cluster
2016-01-06 22:54:58,064 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/home/data/hdfs/dn/ has already been used.
2016-01-06 22:54:58,082 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1354640905-10.146.52.232-1452117061014
2016-01-06 22:54:58,083 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-1354640905-10.146.52.232-1452117061014
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /home/data/hdfs/dn/current/BP-1354640905-10.146.52.232-1452117061014
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:396)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1338)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1304)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:226)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:867)
at java.lang.Thread.run(Thread.java:745)
2016-01-06 22:54:58,084 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: BP-1354640905-10.146.52.232-1452117061014 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /home/data/hdfs/dn/current/BP-1354640905-10.146.52.232-1452117061014
2016-01-06 22:54:58,084 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to master3/10.146.52.232:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1338)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1304)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:226)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:867)
at java.lang.Thread.run(Thread.java:745)
2016-01-06 22:54:58,084 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to master3/10.146.52.232:8020
2016-01-06 22:54:58,084 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to master2/10.146.52.231:8020. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 3
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:261)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1351)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1304)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:226)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:867)
at java.lang.Thread.run(Thread.java:745)
2016-01-06 22:54:58,085 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to master2/10.146.52.231:8020
2016-01-06 22:54:58,185 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid unassigned)
I have already checked the cluster IDs on the namenode and the datanode, and they are the same...
I tried reformatting everything several times...
Thanks for your help!
I have seen messages like this in the log file when the file system for the DataNode is corrupt. Perhaps try running fsck -y on each of the disks used by the DataNode. In your case:
fsck -y /home/data/hdfs
Once the disks are clean, you should be able to start the DataNode. The NameNode will work to ensure that the replication factor is restored for any lost blocks.
I had a similar problem (though I can't be sure without more logs, and mine didn't say "Datanode Uuid unassigned"), and fsck didn't solve it.
In my case, I had moved a subset of disks from one node to another node that already had disks, and disabled the old node, so the moved disks did not match the DatanodeUuid of the new machine.
Above those lines in the log, there were entries like:
2016-04-11 19:32:02,991 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /archive14/dfs/data is in an inconsistent state: Root /archive14/dfs/data: DatanodeUuid=5ba6418e-2c24-4582-8225-3e7f7fff9feb, does not match 519c1e34-a573-41f7-9e80-dca606fce704 from other StorageDirectory.
To solve that, I ran:
sed -i -r "s/${olduuid}/${newuuid}/" /mountpoints*/dfs/data/current/VERSION
This replaces the old UUID in the VERSION files with the new one. After that, starting the datanode works.
Maybe in your case, you had a missing UUID rather than an incorrect one.
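The same rewrite can be sketched in Python if you prefer to check the result before touching the file. This assumes the VERSION file has exactly one datanodeUuid= line, as in the files shown elsewhere on this page; the function name set_datanode_uuid is made up for this example. The two UUIDs are taken from the log entry above, on my reading that 5ba6418e... belongs to the moved disk and 519c1e34... to the disks already on the new node.

```python
import re


def set_datanode_uuid(version_text, new_uuid):
    """Return version_text with its single datanodeUuid line set to new_uuid."""
    updated, count = re.subn(
        r"(?m)^datanodeUuid=.*$",
        lambda match: "datanodeUuid=" + new_uuid,
        version_text,
    )
    if count != 1:
        raise ValueError("expected exactly one datanodeUuid line, found %d" % count)
    return updated


# UUIDs from the InconsistentFSStateException message above.
moved_disk = "datanodeUuid=5ba6418e-2c24-4582-8225-3e7f7fff9feb\nstorageType=DATA_NODE\n"
print(set_datanode_uuid(moved_disk, "519c1e34-a573-41f7-9e80-dca606fce704"))
```

Stop the datanode and keep a backup copy of each VERSION file before writing the updated text back.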
Deleting the name node directory and the data node directory and then creating new directories worked for me. Use this technique only if you accept that you will lose the data.
In my case, I reinstalled HDFS through CM 6.2.0 and set up two namenodes for HA.
I then formatted each of the two namenodes separately, and this caused the error below.
Initialization failed for Block pool BP-666417012-10.253.76.213-1557044865448 (Datanode Uuid 5132035c-8d6a-4617-af7e-7d07355a905b) service to hzd-t-vbdl-02/10.253.76.222:8022 Blockpool ID mismatch: previously connected to Blockpool ID BP-666417012-10.253.76.213-1557044865448 but now connected to Blockpool ID BP-1262695848-10.253.76.222-1557045124181
How I handled it:
ansible all -m shell -a " more /XXX/hdfs/dfs/nn/current/VERSION "
hzd-t-vbdl-01 | CHANGED | rc=0 >>
Sun May 05 16:27:45 CST 2019
namespaceID=732385684
clusterID=cluster54
cTime=1557044865448
storageType=NAME_NODE
blockpoolID=BP-666417012-10.253.76.213-1557044865448
layoutVersion=-64
hzd-t-vbdl-02 | CHANGED | rc=0 >>
Sun May 05 16:32:04 CST 2019
namespaceID=892287385
clusterID=cluster54
cTime=1557045124181
storageType=NAME_NODE
blockpoolID=BP-1262695848-10.253.76.222-1557045124181
layoutVersion=-64
Finally, copy the contents from hzd-t-vbdl-01 (the one formatted earlier) to hzd-t-vbdl-02, and restart the namenodes and datanodes.
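To confirm the copy worked, the two VERSION outputs can be compared field by field. This is a minimal sketch that treats every line containing "=" as a field and ignores the rest (such as the date line); the function names parse_fields and differing_fields are made up for this example, and the sample texts are the two outputs shown above.

```python
def parse_fields(text):
    """Collect key=value pairs; lines without '=' or starting with '#' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and not key.startswith("#"):
            fields[key] = value
    return fields


def differing_fields(text_a, text_b):
    """Return {field: (value in a, value in b)} for fields that are not equal."""
    a, b = parse_fields(text_a), parse_fields(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


nn1 = """Sun May 05 16:27:45 CST 2019
namespaceID=732385684
clusterID=cluster54
cTime=1557044865448
storageType=NAME_NODE
blockpoolID=BP-666417012-10.253.76.213-1557044865448
layoutVersion=-64"""

nn2 = """Sun May 05 16:32:04 CST 2019
namespaceID=892287385
clusterID=cluster54
cTime=1557045124181
storageType=NAME_NODE
blockpoolID=BP-1262695848-10.253.76.222-1557045124181
layoutVersion=-64"""

for field, (left, right) in differing_fields(nn1, nn2).items():
    print(field, left, right)
```

With the outputs above this lists blockpoolID, cTime and namespaceID, which is consistent with the Blockpool ID mismatch in the datanode log. After the copy and restart, the comparison should come back empty.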
I am trying to start the Hadoop 2.2.0 official release on my Mac (OS X 10.9).
But the datanode fails to start. Please refer to the error message below:
2013-11-05 00:29:45,381 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1261650833-192.168.5.102-1383580328383 (storage id DS-195316454-192.168.5.102-50010-1383582585181) service to ning/192.168.5.102:9010
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-195316454-192.168.5.102-50010-1383582585181, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-e22d5b33-0f57-47bb-ae7f-73f7393833b7;nsid=1896158194;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:739)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3929)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:948)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:24079)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)