Hadoop 3.3.0 on WSL 2 - Failed to start NameNode, but DataNode works fine

I'm running Hadoop 3.3.0 on WSL 2 in Windows 11, and I followed this guide to set up my configuration: Install Hadoop 3.3.0 on WSL 2
When I start namenode and datanode with:
start-dfs.sh
It shows no errors, but in jps the NameNode is not listed:
8805 DataNode
9034 SecondaryNameNode
9212 Jps
Looking at the NameNode log file, it shows that the NameNode failed to start:
2023-01-02 20:50:51,714 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2023-01-02 20:50:51,715 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2023-01-02 20:50:51,715 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2023-01-02 20:50:51,719 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Could not parse line: Filesystem 1024-blocks Used Available Capacity Mounted on
at org.apache.hadoop.fs.DF.parseOutput(DF.java:195)
at org.apache.hadoop.fs.DF.getFilesystem(DF.java:76)
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker$CheckedVolume.<init>(NameNodeResourceChecker.java:70)
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirToCheck(NameNodeResourceChecker.java:166)
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.<init>(NameNodeResourceChecker.java:135)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1266)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:862)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:783)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:987)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1756)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1821)
2023-01-02 20:50:51,722 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Could not parse line: Filesystem 1024-blocks Used Available Capacity Mounted on
2023-01-02 20:50:51,725 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
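The stack trace points at org.apache.hadoop.fs.DF, the helper the NameNode resource checker uses to shell out to df and parse its output for the NameNode storage directory. A hedged way to see exactly what Hadoop is being given to parse on WSL 2 (assuming the default storage directory under /tmp/hadoop-$USER, since dfs.namenode.name.dir is not set below):
# Print df for the NameNode storage directory in POSIX mode; the header should be the
# "Filesystem 1024-blocks Used Available Capacity Mounted on" line quoted in the error.
df -k -P /tmp/hadoop-$USER/dfs/name
# Also inspect the full mount table; on WSL 2 the Windows drives show up as drvfs/9p
# mounts, which can produce lines the parser does not expect.
df -k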
Here is my configuration file core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
And for hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
I've been stuck with this problem for several days, thank you for all the help!

Related

The NameNode process isn't present when executing jps

I'm new to the Hadoop ecosystem.
I installed Hadoop 3.3.0 in pseudo-distributed mode.
The All Applications page at http://localhost:8088/ is working, but I couldn't view the NameNode UI at http://localhost:9870/ (This site can't be reached).
$ jps
24553 Jps
20537 NodeManager
20429 ResourceManager
and
$ hadoop version
Hadoop 3.3.0
Source code repository https://github.com/apache/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:21Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
I tried restarting the processes, but in vain:
$ stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as mhannani in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [HP]
Stopping datanodes
Stopping secondary namenodes [HP]
2021-01-06 16:42:07,540 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
Stopping resourcemanager
Then I formatted the NameNode:
$ hdfs namenode -format
2021-01-06 16:44:14,683 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = HP/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.3.0
and then started everything again:
$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as mhannani in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [HP]
Starting datanodes
Starting secondary namenodes [HP]
HP: ERROR: Cannot set priority of secondarynamenode process 29847
2021-01-06 16:45:38,266 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
How can I fix that, so that I can access my HDFS file system from the browser at http://localhost:9870/50075, as I could on earlier versions of Hadoop?
Any help or advice would be appreciated. Thanks, folks.
The issue was that the NameNode and DataNode paths of my local file system were not set correctly,
in $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file://AbsolutePATH/TO/WHERE/THE/namenode/Should/be/stored</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file://The/same/for/dataNode</value>
  </property>
</configuration>
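As a follow-up, after pointing dfs.name.dir and dfs.data.dir at real directories like the above, the NameNode storage usually has to be (re)initialized. A minimal sketch, assuming hypothetical paths under ~/hadoopdata and a disposable cluster (formatting wipes any existing HDFS metadata):
# Hypothetical locations; substitute the absolute paths used in hdfs-site.xml above.
mkdir -p ~/hadoopdata/namenode ~/hadoopdata/datanode
# Re-create the NameNode metadata under the new dfs.name.dir (destroys existing HDFS metadata).
hdfs namenode -format
# Restart HDFS and confirm that NameNode, DataNode and SecondaryNameNode all appear.
start-dfs.sh
jps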

ERROR in DataNode execution while running Hadoop for the first time in Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all of the following files:
hdfs-site.xml
mapred-site.xml
core-site.xml
yarn-site.xml
Then, I executed the following command:
C:\hadoop-3.1.1\bin> hdfs namenode -format
The format ran correctly, so I moved to C:\hadoop-3.1.1\sbin to execute the following command:
C:\hadoop-3.1.1\sbin> start-dfs.cmd
The command prompt opens 2 new windows: one for datanode and another for namenode.
The namenode window keeps running:
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server Responder: starting
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server listener on 9000: starting
2018-09-02 21:37:06,247 INFO namenode.NameNode: NameNode RPC up at: localhost/127.0.0.1:9000
2018-09-02 21:37:06,247 INFO namenode.FSNamesystem: Starting services required for active state
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Initializing quota with 4 thread(s)
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Quota initialization completed in 3 milliseconds
name space=1
storage space=0
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0
2018-09-02 21:37:06,279 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
Meanwhile, the datanode gives the following error:
ERROR: datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-02 21:37:04,250 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-02 21:37:04,250 INFO datanode.DataNode: SHUTDOWN_MSG:
And then the datanode shuts down! I have tried several ways to overcome this error, but this is the first time I am installing Hadoop on Windows and I can't figure out what to do next!
I got things working after I removed the file system reference for the datanode in hdfs-site.xml. I found that this let the software create and initialise its own datanode directory, which then popped up in sbin. After that I could use HDFS without a hitch. Here is what worked for me for Hadoop 3.1.3 on Windows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/Users/myusername/hadoop/hadoop-3.1.3/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>datanode</value>
  </property>
</configuration>
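If the datanode already failed once against the old directories, a hedged follow-up after editing hdfs-site.xml (Windows cmd, run from the Hadoop installation directory; the format step wipes any existing HDFS metadata):
rem Run from the Hadoop install directory, e.g. C:\hadoop-3.1.3 (adjust to your layout).
bin\hdfs namenode -format
sbin\start-dfs.cmd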
Cheers,
MV
I had the same problem and what worked for me was editing hdfs-site.xml as follows:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///C:/Hadoop/hadoop-3.1.2/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/C:/Hadoop/hadoop-3.1.2/data/datanode</value>
</property>

Run HDFS in pseudo-distributed mode in a Docker container

I'm trying to run HDFS in pseudo-distributed mode in a Docker container, configured following this page: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation. I didn't use the start-all.sh script, since the container isn't supposed to be able to do SSH, so I manually ran bin/hdfs --daemon start namenode|datanode to start them one by one. The problem is that the namenode started successfully, but the datanode quit without any error message. The last piece of the datanode log is:
...
2018-04-09 21:04:03,830 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/apps/hadoop/hdfs/data
2018-04-09 21:04:04,188 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-04-09 21:04:04,296 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-04-09 21:04:04,296 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2018-04-09 21:04:04,665 INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2018-04-09 21:04:04,667 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2018-04-09 21:04:04,671 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hdfs
2018-04-09 21:04:04,671 INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2018-04-09 21:04:04,677 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2018-04-09 21:04:04,733 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:9866
2018-04-09 21:04:04,735 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwidth is 10485760 bytes/s
2018-04-09 21:04:04,735 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
core-site.xml file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost</value>
  </property>
</configuration>
And hdfs-site.xml is
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/apps/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/apps/hadoop/hdfs/data</value>
  </property>
</configuration>
Did I miss anything there?
I think it is a base image issue. I was using Alpine; once I changed to CentOS, the datanode works! Something must be missing from Alpine, and I'd appreciate it if anyone knows what it is, as the CentOS-based image will end up much bigger than the Alpine one.
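Since --daemon mode sends everything to log files, one hedged way to surface why the datanode dies on the Alpine image (assuming you are inside the container, in the Hadoop install directory) is to run it in the foreground and check the daemon's out file:
# Run the DataNode in the foreground; startup errors print straight to the console.
bin/hdfs datanode
# Otherwise, check the files that --daemon start writes under the log directory.
tail -n 100 logs/hadoop-*-datanode-*.log logs/hadoop-*-datanode-*.out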

The directory is already locked (Hadoop)

I am getting the below error while starting Hadoop:
2015-09-04 08:49:05,648 ERROR org.apache.hadoop.hdfs.server.common.Storage: It appears that another node 854#ip-1-2-3-4 has already locked the storage directory: /mnt/xvdb/tmp/dfs/namesecondary
java.nio.channels.OverlappingFileLockException
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:712)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:678)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:499)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:962)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:243)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
2015-09-04 08:49:05,650 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /mnt/xvdb/tmp/dfs/namesecondary. The directory is already locked
2015-09-04 08:49:05,650 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode
java.io.IOException: Cannot lock storage /mnt/xvdb/tmp/dfs/namesecondary. The directory is already locked
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:683)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:499)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:962)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:243)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
2015-09-04 08:49:05,652 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-09-04 08:49:05,653 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down SecondaryNameNode at ip-#ip-1-2-3-4/#ip-1-2-3-4
************************************************************/
Hadoop version: 2.7.1 (3-node cluster)
hdfs-site.xml configuration file:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/xvdb/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/xvdb/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
I have tried formatting the NameNode as well, but it didn't help. Can anyone help me with this?
I found a solution to the above problem here: http://misconfigurations.blogspot.in/2014/10/hadoop-initialization-failed-for-block.html
If there is any other solution, I would like to have a look.
P.S.: I deleted the directory pointed to by "dfs.datanode.data.dir", which erased all data on HDFS but fixed the issue for me. So use an alternate way, if there is one, to fix this issue.
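Before deleting data, it may also be worth checking whether a stale daemon or a leftover lock file is holding the checkpoint directory from the log. A hedged sketch, assuming the /mnt/xvdb/tmp/dfs/namesecondary path from the error:
# Is another NameNode/SecondaryNameNode still running and holding the lock?
jps
# Hadoop marks a storage directory as in use with an in_use.lock file.
ls -l /mnt/xvdb/tmp/dfs/namesecondary/in_use.lock
# If jps shows no daemon owning it, stop HDFS cleanly and remove the stale lock.
stop-dfs.sh
rm -f /mnt/xvdb/tmp/dfs/namesecondary/in_use.lock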

Unable to create default DataNode in Hadoop cluster

I am using Ubuntu 10.04. I installed Hadoop in my local directory as a standalone installation.
~-desktop:~$ hadoop/bin/hadoop version
Hadoop 1.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473
Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013
From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405
This command was run using /home/circar/hadoop/hadoop-core-1.2.0.jar
conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/circar/hadoop/dataFiles</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
I've formatted the namenode twice:
~-desktop:~$ hadoop/bin/hadoop namenode -format
Then I start Hadoop with:
~-desktop:~$ hadoop/bin/start-all.sh
which shows the following result:
starting namenode, logging to /home/circar/hadoop/libexec/../logs/hadoop-circar-namenode-circar-desktop.out
circar@localhost's password:
localhost: starting datanode, logging to /home/circar/hadoop/libexec/../logs/hadoop-circar-datanode-circar-desktop.out
circar@localhost's password:
localhost: starting secondarynamenode, logging to /home/circar/hadoop/libexec/../logs/hadoop-circar-secondarynamenode-circar-desktop.out
starting jobtracker, logging to /home/circar/hadoop/libexec/../logs/hadoop-circar-jobtracker-circar-desktop.out
circar@localhost's password:
localhost: starting tasktracker, logging to /home/circar/hadoop/libexec/../logs/hadoop-circar-tasktracker-circar-desktop.out
But in
/logs/hadoop-circar-datanode-circar-desktop.log
it shows the following error:
2013-06-24 17:32:47,183 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = circar-desktop/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.6.0_26
************************************************************/
2013-06-24 17:32:47,315 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-06-24 17:32:47,324 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-06-24 17:32:47,325 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-06-24 17:32:47,325 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-06-24 17:32:47,447 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-06-24 17:32:47,450 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-06-24 17:32:49,265 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/circar/hadoop/dataFiles/dfs/data: namenode namespaceID = 186782509; datanode namespaceID = 1733977738
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:412)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:319)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1698)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1637)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1655)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1781)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1798)
2013-06-24 17:32:49,266 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at circar-desktop/127.0.1.1
************************************************************/
jps shows:
~-desktop:~$ jps
8084 Jps
7458 JobTracker
7369 SecondaryNameNode
7642 TaskTracker
6971 NameNode
When I try to stop it, it shows:
~-desktop:~$ hadoop/bin/stop-all.sh
stopping jobtracker
circar@localhost's password:
localhost: stopping tasktracker
stopping namenode
circar@localhost's password:
localhost: *no datanode to stop*
circar@localhost's password:
localhost: stopping secondarynamenode
What am I doing wrong? Can anyone help me?
Ravi is correct. But also make sure that the cluster_id in ${dfs.data.dir}/current/VERSION matches the one in ${dfs.name.dir}/current/VERSION. If not, change the datanode's cluster_id to match the namenode's. After making the changes, follow the steps Ravi mentioned.
The NameNode generates a new namespaceID every time you format HDFS. DataNodes bind themselves to the NameNode through this namespaceID.
Follow the steps below to fix the problem (a shell sketch of the same steps appears after the list):
a) Stop the problematic DataNode.
b) Edit the value of namespaceID in ${dfs.data.dir}/current/VERSION to match the corresponding value of the current NameNode in ${dfs.name.dir}/current/VERSION.
c) Restart the DataNode.
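A minimal shell sketch of steps (a) to (c), assuming dfs.name.dir and dfs.data.dir resolve to /home/circar/hadoop/dataFiles/dfs/name and /home/circar/hadoop/dataFiles/dfs/data as in the log above (adjust the paths to your configuration):
# (a) Stop the problematic DataNode.
hadoop/bin/hadoop-daemon.sh stop datanode
# (b) Copy the NameNode's namespaceID into the DataNode's VERSION file.
NAME_ID=$(grep namespaceID /home/circar/hadoop/dataFiles/dfs/name/current/VERSION | cut -d= -f2)
sed -i "s/^namespaceID=.*/namespaceID=${NAME_ID}/" /home/circar/hadoop/dataFiles/dfs/data/current/VERSION
# (c) Restart the DataNode and check that it stays up.
hadoop/bin/hadoop-daemon.sh start datanode
jps
On releases whose VERSION files also record a clusterID (as the previous answer mentions), the same check applies to that field.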
