Namenode stops working after Hadoop restart

I have a server with Hadoop installed on it.
I wanted to change some configuration (mapreduce.map.output.compress), so I edited the configuration file and restarted Hadoop with:
stop-all.sh
start-all.sh
After that, I could not use it again because it was stuck in safe mode:
The reported blocks is only 0 but the threshold is 0.9990 and the total blocks 11313. Safe mode will be turned off automatically.
Note that the number of reported blocks is 0 and was not increasing at all.
Therefore, I forced it to leave safe mode with:
bin/hadoop dfsadmin -safemode leave
Now, I get errors like this:
2014-03-09 18:16:40,586 [Thread-1] ERROR org.apache.hadoop.hdfs.DFSClient - Failed to close file /tmp/temp-39739076/tmp2073328134/GQL.jar
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/temp-39739076/tmp2073328134/GQL.jar could only be replicated to 0 nodes, instead of 1
If it helps, my hdfs-site.xml is:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hduser/hadoop/name/data</value>
</property>
</configuration>

I've run into this problem many times. Whenever you get the error stating that x could only be replicated to 0 nodes, instead of 1, the following steps should fix the problem (a shell sketch of these steps follows the list):
Stop all Hadoop services with: stop-all.sh
Delete the dfs/name and dfs/data directories
Format the NameNode with: hadoop namenode -format
Start Hadoop again with: start-all.sh
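A minimal shell sketch of those steps; the directory paths are assumptions (dfs.name.dir is taken from the hdfs-site.xml above, the data directory is hypothetical), and note that this wipes all HDFS data:
stop-all.sh
rm -rf /home/hduser/hadoop/name/data   # dfs.name.dir from the hdfs-site.xml above
rm -rf /home/hduser/hadoop/dfs/data    # hypothetical dfs.data.dir; use your configured value
hadoop namenode -format
start-all.sh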

Related

No datanode running in hadoop 2.9.2

I'm very new to Hadoop, so I've started following the Hadoop 2.9.2 getting started guide. When I run the command
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
it reports success, but when I look at the output/part-r-00000.txt file, which is meant to show the result, it is empty, even though the input directory contains the .xml files from etc/hadoop as it is supposed to.
I've started the whole process over and over again, reading all the logs, in order to understand where the error might be. When I run bin/hdfs namenode -format, it shows me this error:
ERROR common.Util: Syntax error in URI file://path to temp_directory/dfs/name. Please check hdfs configuration.
java.net.URISyntaxException: Illegal character in authority at index 7: file://path to temp_directory/dfs/name
at java.base/java.net.URI$Parser.fail(URI.java:2915)
at java.base/java.net.URI$Parser.parseAuthority(URI.java:3249)
at java.base/java.net.URI$Parser.parseHierarchical(URI.java:3160)
at java.base/java.net.URI$Parser.parse(URI.java:3116)
at java.base/java.net.URI.<init>(URI.java:600)
at org.apache.hadoop.hdfs.server.common.Util.stringAsURI(Util.java:49)
at org.apache.hadoop.hdfs.server.common.Util.stringCollectionAsURIs(Util.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:1466)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceEditsDirs(FSNamesystem.java:1511)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceEditsDirs(FSNamesystem.java:1480)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1137)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1614)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
This also occurs when I run bin/hdfs dfs -put etc/hadoop input:
WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/federico/input/hadoop/capacity-scheduler.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
It seems pretty clear that there are no datanodes running. Given this situation, how can I initialize a datanode to make things work, and how do I know whether my datanode is running as expected?
EDIT: I've tried to follow some suggestions from other users experiencing a similar problem, and this error came up:
WARN org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/dfs/data
java.io.FileNotFoundException: File file:/dfs/data does not exist
and thus the datanode creation fails. How do I deal with it?
Please update your hdfs-site.xml as follows; the dfs.datanode.data.dir value should be set to a path of your choosing. You can find this file in /etc/hadoop under the Hadoop installation directory.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/myname/data/hdfs/data</value>
</property>
</configuration>
On Linux, use a similar path such as /home/myname/data/hdfs/data.
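Once the data directory exists and the daemons have been restarted, a quick sanity check (a sketch, assuming the standard Hadoop commands are on your PATH) is:
jps                     # should list DataNode alongside NameNode
hdfs dfsadmin -report   # "Live datanodes" should be at least 1 before -put can succeed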

Hadoop: Secondary NameNode Permission Denied

I'm attempting to run Hadoop in pseudo-distributed mode to learn how the system works. To install it, I downloaded Hadoop-3.0.0 from the site and untarred it. I've done my configuration as follows (leaving out the configuration tags for brevity):
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
After doing this, I've formatted my hdfs using
hdfs namenode -format
I've also set up passwordless ssh using the following:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa2
cat ~/.ssh/id_rsa2.pub >> ~/.ssh/authorized_keys
(I've also added id_rsa2.pub as the default for localhost using a config file, since I was already using id_rsa.pub for something else and didn't want to mix and match in case I broke something.)
I'm able to ssh into localhost. All looks well.
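For reference, a sketch of the kind of ~/.ssh/config entry described above; the key path is the one generated earlier, and appending like this assumes no conflicting Host entry already exists:
cat >> ~/.ssh/config <<'EOF'
Host localhost
    IdentityFile ~/.ssh/id_rsa2
EOF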
Then I run start-dfs.sh, and I see this error:
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [zm.local]
zm.local: zm@zm.local: Permission denied (publickey,password,keyboard-interactive).
2018-01-16 17:31:35,807 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
If I run jps (after starting yarn and mapreduce history server), I have the following:
37921 NodeManager
38070 Jps
37434 NameNode
38060 JobHistoryServer
37821 ResourceManager
Noticeably, the SecondaryNameNode is missing; my assumption is that this is due to the error above.
I can then use Hadoop's fs command, and I'm able to create a folder and look it up. But if I try to copy any data over, I get notified that the NameNode is in safe mode. If I turn off safe mode using:
hdfs dfsadmin -safemode leave
It immediately turns back on. By going to the namenode port on localhost, I see the following message:
Safe mode is ON. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
However, I have plenty of resources. The single datanode is using less than 8% of its allotted space, and the namenode has almost 100 GB of space. The datanode and namenode are both reporting as healthy. Thus, I think the problem is the lack of a secondary namenode. With that in mind, is anyone aware of what might be causing the SecondaryNameNode to have different permission issues from the PrimaryNameNode? It seems to be trying to put the sNN somewhere on the local machine instead, but when I check in /tmp/hadoop*, all of the file permissions seem to be normal.
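For reference, a sketch of checking free space where the namenode keeps its metadata; the path assumes the default /tmp/hadoop-* location mentioned above, so adjust it to your configured name directory:
df -h /tmp/hadoop-*/dfs/name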
Thanks for any help.

Installing Hadoop on NFS

As a start, I've installed Hadoop (0.15.2) and set up a cluster of 3 nodes: one each for the NameNode, DataNode, and JobTracker. All the daemons are up and running, but when I issue any command I get an error. For instance, when I do a copyFromLocal, I get the error shown further below.
Am I missing something?
More details:
I am trying to install Hadoop on an NFS file system. I installed version 1.0.4 and tried running it, but to no avail: version 1.0.4 doesn't start the datanode, and the log files for the datanode are empty. Hence I switched back to version 0.15, which at least started all the daemons.
I believe the problem is due to the underlying NFS file system, i.e. all the datanodes and masters using the same files and folders, but I am not sure whether that is actually the case.
However, I don't see any reason why I shouldn't be able to run Hadoop on NFS (after appropriately setting the configuration parameters).
Currently I am trying to figure out whether I could set the name and data directories differently for different machines based on the individual machine names.
Configuration file (hadoop-site.xml):
<property>
<name>fs.default.name</name>
<value>mumble-12.cs.wisc.edu:9001</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>mumble-13.cs.wisc.edu:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.secondary.info.port</name>
<value>9002</value>
</property>
<property>
<name>dfs.info.port</name>
<value>9003</value>
</property>
<property>
<name>mapred.job.tracker.info.port</name>
<value>9004</value>
</property>
<property>
<name>tasktracker.http.port</name>
<value>9005</value>
</property>
Error using Hadoop 1.0.4 (DataNode doesn't get started):
2013-04-22 18:50:50,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call addBlock(/tmp/hadoop-akshar/mapred/system/jobtracker.info, DFSClient_502734479, null) from 128.105.112.13:37204: error: java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Error using Hadoop 0.15.2:
[akshar@mumble-12] (38)$ bin/hadoop fs -copyFromLocal lib/junit-3.8.1.LICENSE.txt input
13/04/17 03:22:11 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
(the same "Error while writing" warning and Connection reset stack trace are repeated twice more)
copyFromLocal: Connection reset
I was able to get Hadoop to run over NFS using version 1.1.2. It might work for other versions, but I can't guarantee anything.
If you have an NFS file system, then each node should have access to the filesystem. fs.default.name tells Hadoop the filesystem URI to use, so it should point to the local disk. I'll assume that your NFS directory is mounted on each node at /nfs.
In core-site.xml you should define:
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/nfs/tmp</value>
</property>
In mapred-site.xml you should define:
<property>
<name>mapred.job.tracker</name>
<value>node1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred-local</value>
</property>
Since hadoop.tmp.dir points to the NFS drive, the default locations of mapred.system.dir and mapreduce.jobtracker.staging.root.dir also point to locations on the NFS drive. It might run if you leave the default value for mapred.local.dir, but it is supposed to point to the local filesystem, so to be safe you can put it under /tmp.
You don't have to worry about hdfs-site.xml. This configuration file is used when you start the namenode, but with everything on the NFS drive you shouldn't run HDFS.
Now you can run start-mapred.sh on the jobtracker node and run a Hadoop job. Don't run start-all.sh or start-dfs.sh, because those will start HDFS. If you run multiple DataNodes that point to the same NFS directory, one DataNode will lock that directory and the others will shut down because they are unable to obtain the lock.
I tested the configuration with:
bin/hadoop jar hadoop-examples-1.1.2.jar wordcount /nfs/data/test.text /nfs/out
Note that you need to specify full paths to the input and output locations.
I also tried:
bin/hadoop jar hadoop-examples-1.1.2.jar grep /nfs/data/loremIpsum.txt /nfs/out2 lorem
It gave me the same output as when I ran it in standalone mode, so I assume it is performing correctly.
Here is more information on fs.default.name:
http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem

"Connection refused" Error for Namenode-HDFS (Hadoop Issue)

All my nodes are up and running when I check with the jps command, but I am still unable to connect to the HDFS filesystem. Whenever I click "Browse the filesystem" on the Hadoop NameNode localhost:8020 page, the error I get is Connection Refused. I have also tried formatting and restarting the namenode, but the error persists. Can anyone please help me solve this issue?
Check whether all your services (JobTracker, NameNode, DataNode, TaskTracker) are running by using the jps command.
If not, try starting them one by one:
./bin/stop-all.sh
./bin/hadoop-daemon.sh start namenode
./bin/hadoop-daemon.sh start jobtracker
./bin/hadoop-daemon.sh start tasktracker
./bin/hadoop-daemon.sh start datanode
If you're still getting the error, stop them again and clean your temp storage directory. The directory details are in the config file ./conf/core-site.xml. Then run:
./bin/stop-all.sh
rm -rf /tmp/hadoop*
./bin/hadoop namenode -format
Check the logs in the ./logs folder.
tail -200 hadoop*jobtracker*.log
tail -200 hadoop*namenode*.log
tail -200 hadoop*datanode*.log
Hope it helps.
HDFS may use port 9000 under certain distributions/builds.
Please double-check your namenode port.
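A sketch for double-checking which port the namenode RPC server is actually listening on; the port numbers and the availability of netstat/telnet are assumptions about your system:
sudo netstat -tlnp | grep -E ':(8020|9000)'
telnet localhost 8020    # "Connection refused" here means nothing is listening on 8020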
Change the core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopvm:8020</value>
<final>true</final>
</property>
Change it to the IP address:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.132.129:8020</value>
<final>true</final>
</property>

Namenode not getting started

I was using Hadoop in pseudo-distributed mode and everything was working fine. But then I had to restart my computer for some reason. Now when I try to start the NameNode and DataNode, only the DataNode is running. Could anyone tell me the possible reason for this problem? Or am I doing something wrong?
I tried both bin/start-all.sh and bin/start-dfs.sh.
I was facing the issue of the namenode not starting. I found a solution using the following:
First, delete all contents from the temporary folder: rm -Rf <tmp dir> (mine was /usr/local/hadoop/tmp)
Format the namenode: bin/hadoop namenode -format
Start all processes again: bin/start-all.sh
You may also consider rolling back using a checkpoint (if you had one enabled).
hadoop.tmp.dir in core-site.xml defaults to /tmp/hadoop-${user.name}, which is cleaned after every reboot. Change this to some other directory that doesn't get cleaned on reboot.
The following steps worked for me with Hadoop 2.2.0.
STEP 1: stop Hadoop
hduser@prayagupd$ /usr/local/hadoop-2.2.0/sbin/stop-dfs.sh
STEP 2: remove the tmp folder
hduser@prayagupd$ sudo rm -rf /app/hadoop/tmp/
STEP 3: create /app/hadoop/tmp/
hduser@prayagupd$ sudo mkdir -p /app/hadoop/tmp
hduser@prayagupd$ sudo chown hduser:hadoop /app/hadoop/tmp
hduser@prayagupd$ sudo chmod 750 /app/hadoop/tmp
STEP 4: format the namenode
hduser@prayagupd$ hdfs namenode -format
STEP 5: start dfs
hduser@prayagupd$ /usr/local/hadoop-2.2.0/sbin/start-dfs.sh
STEP 6: check jps
hduser@prayagupd$ jps
11342 Jps
10804 DataNode
11110 SecondaryNameNode
10558 NameNode
In conf/hdfs-site.xml, you should have a property like
<property>
<name>dfs.name.dir</name>
<value>/home/user/hadoop/name/data</value>
</property>
The property "dfs.name.dir" allows you to control where Hadoop writes NameNode metadata.
Giving it a directory other than /tmp ensures the NameNode data isn't deleted when you reboot.
Open a new terminal and start the namenode using path-to-your-hadoop-install/bin/hadoop namenode
Then check using jps; the namenode should be running.
Why do most answers here assume that all data needs to be deleted and reformatted before restarting Hadoop?
How do we know the namenode is not making progress but simply taking a lot of time?
It will do this when there is a large amount of data in HDFS.
Check progress in the logs before assuming anything is hung or stuck.
[kadmin@hadoop-node-0 logs]$ tail hadoop-kadmin-namenode-hadoop-node-0.log
...
2016-05-13 18:16:44,405 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 117/141 transactions completed. (83%)
2016-05-13 18:16:56,968 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 121/141 transactions completed. (86%)
2016-05-13 18:17:06,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 122/141 transactions completed. (87%)
2016-05-13 18:17:38,321 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 123/141 transactions completed. (87%)
2016-05-13 18:17:56,562 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 124/141 transactions completed. (88%)
2016-05-13 18:17:57,690 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 127/141 transactions completed. (90%)
This was after nearly an hour of waiting on a particular system.
It is still progressing each time I look at it.
Have patience with Hadoop when bringing up the system and check logs before assuming something is hung or not progressing.
In core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yourusername/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>
and format the namenode with:
hdfs namenode -format
This worked for Hadoop 2.8.1.
If anyone is using Hadoop 1.2.1 and is not able to run the namenode, go to core-site.xml and change dfs.default.name to fs.default.name.
Then format the namenode using hadoop namenode -format.
Finally, run HDFS using start-dfs.sh and check the services using jps.
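As a sketch, the corrected property in core-site.xml would look like the following; the hdfs://localhost:9000 URI is only an assumption, so keep whatever namenode address you already use:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>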
Did you change conf/hdfs-site.xml dfs.name.dir?
Format the namenode after you change it.
$ bin/hadoop namenode -format
$ bin/start-all.sh
If you are facing this issue after rebooting the system, then the steps below will work.
As a workaround:
1) Format the namenode: bin/hadoop namenode -format
2) Start all processes again: bin/start-all.sh
For a permanent fix:
1) Go to /conf/core-site.xml and change fs.default.name to your custom value.
2) Format the namenode: bin/hadoop namenode -format
3) Start all processes again: bin/start-all.sh
I faced the same problem.
(1) Always check for typing mistakes in the .xml configuration files, especially the XML tags.
(2) Go to the bin directory and type ./start-all.sh
(3) Then type jps to check whether the processes are running.
Add the hadoop.tmp.dir property in core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yourname/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>
and format hdfs (hadoop 2.7.1):
$ hdfs namenode -format
The default value in core-default.xml is /tmp/hadoop-${user.name}, which will be deleted after reboot.
Try this:
1) Stop all Hadoop processes: stop-all.sh
2) Remove the tmp folder manually
3) Format the namenode: hadoop namenode -format
4) Start all processes: start-all.sh
If you kept the default configuration when running Hadoop, the port for the namenode would be 50070. You will need to find any processes running on this port and kill them first.
Stop all running Hadoop processes with: bin/stop-all.sh
Check for any processes running on port 50070:
sudo netstat -tulpn | grep :50070    # if any process is using the port, its PID/program name appears at the right-hand side of the output
sudo kill -9 <process_id>            # kill the process
sudo rm -r /app/hadoop/tmp           # delete the temp folder
sudo mkdir /app/hadoop/tmp           # recreate it
sudo chmod -R 777 /app/hadoop/tmp    # 777 is used for this example's purpose only
bin/hadoop namenode -format          # format the hadoop namenode
bin/start-all.sh                     # start all hadoop services
For me, the following worked after I changed the directories of the namenode and datanode in hdfs-site.xml.
Before executing the following steps, stop all services with stop-all.sh (in my case I used stop-dfs.sh to stop the DFS).
In the newly configured directory, for every node (namenode and datanode), delete every folder/file inside it (in my case a 'current' directory).
Delete the Hadoop temporary directory: rm -rf /tmp/hadoop-$USER
Format the NameNode: hadoop/bin/hdfs namenode -format
start-dfs.sh
After I followed those steps, my namenode and datanodes were alive using the newly configured directories.
I ran hadoop namenode to start the namenode manually in the foreground.
From the logs I figured out that port 50070 was occupied, which is the default port used by dfs.namenode.http-address. After configuring dfs.namenode.http-address to a different port in hdfs-site.xml, everything went well.
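For reference, a sketch of such an hdfs-site.xml entry; the port 50071 is an arbitrary choice for illustration:
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:50071</value>
</property>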
I found the solution and am sharing it here for anyone who gets these errors:
1. First check hdfs-site.xml under the /home/hadoop/etc/hadoop path and check the paths of the namenode and datanode:
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
2. Check the permissions, group, and user of the namenode and datanode paths (e.g. /home/hadoop/hadoopdata/hdfs/datanode) and correct any mismatches. For example, chown -R hadoop:hadoop in_use.lock changes the user and group, and chmod -R 755 <file_name> changes the permissions; a sketch follows.
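A sketch of that check and fix; the path is the datanode directory from the config above, and the hadoop:hadoop user/group is an assumption:
ls -ld /home/hadoop/hadoopdata/hdfs/datanode
sudo chown -R hadoop:hadoop /home/hadoop/hadoopdata/hdfs/datanode
sudo chmod -R 755 /home/hadoop/hadoopdata/hdfs/datanode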
After deleting the resource manager's data folder, the problem was gone. Even formatting cannot solve this problem.
If your namenode is stuck in safe mode, you can ssh to the namenode, su to the hdfs user, and run the following command to turn off safe mode:
hdfs dfsadmin -fs hdfs://server.com:8020 -safemode leave
Instead of formatting the namenode, you may be able to use the commands below to restart the namenode. They worked for me:
sudo service hadoop-master restart
hadoop dfsadmin -safemode leave
I was facing the same issue of the namenode not starting with Hadoop 3.2.1. I did the following steps to resolve the issue:
Delete the contents of the temporary folder from the namenode directory. In my case the "current" directory had been made by the root user: rm -rf (dir name)
Format the namenode: hdfs namenode -format
Start the processes again: start-dfs.sh
The directory in point #1 is the one configured in the hdfs-site.xml file:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/node-data/hdfs/namenode</value>
</property>
I ran into the same thing after a restart.
For hadoop-2.7.3, all I had to do was format the namenode:
<HadoopRootDir>/bin/hdfs namenode -format
Then a jps command shows
6097 DataNode
755 RemoteMavenServer
5925 NameNode
6293 SecondaryNameNode
6361 Jps
