While I was trying to start HDFS on my localhost server, something went wrong.
After ssh localhost and hdfs namenode -format, I tried to run start-dfs.sh. Everything seemed to be on track because my zsh did not flag any Java error and showed a green prompt. However, when I checked the daemons with jps, none of my nodes were running.
Then I dug deeper into the logs. The datanode, namenode, and secondary namenode all logged errors similar to the ones below (I picked the datanode as an example):
2021-11-13 23:44:46,791 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2021-11-13 23:44:46,791 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
I then looked at some existing solutions for the first error (signal 15); they all told me to simply restart the daemons. However, after running stop-all.sh and reformatting the namenode, these errors remained unchanged. Now I suspect I made a mistake in one of the steps while setting up Hadoop.
The files I changed are listed below:
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
PS: My hadoop-env seems to be configured properly because Java runs normally, and I don't think there is an environment-variable problem in this case. Also, ssh localhost appeared to be working fine the whole time; at least no error was reported while using ssh.
However, I'm happy to post those related files if the problem turns out to lie there.
I was trying to install Hadoop on Windows.
The namenode works fine, but the datanode does not. The following error is displayed again and again, even after several retries.
This is the error shown in CMD for the datanode:
2021-12-16 20:24:32,624 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/C:/Users/mtalha.umair/datanode
2021-12-16 20:24:32,624 ERROR datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value configured for dfs.datanode.failed.volumes.tolerated - 1. Value configured is >= to the number of configured volumes (1).
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:176)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2799)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2714)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2756)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2900)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2924)
2021-12-16 20:24:32,640 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value configured for dfs.datanode.failed.volumes.tolerated - 1. Value configured is >= to the number of configured volumes (1).
2021-12-16 20:24:32,640 INFO datanode.DataNode: SHUTDOWN_MSG:
I have referred to many different articles, but to no avail. I have tried another version of Hadoop, but the problem remains, and as I am just starting out I can't fully understand it, so I need help.
These are my configurations:
For core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
For mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
For yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
For hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/D:/big-data/hadoop-3.1.3/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>datanode</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Well unfortunately the reason this is failing is exactly what the message says. Let me try to say it another way.
dfs.datanode.failed.volumes.tolerated = 1
The number of (dfs.datanode.data.dir) folders you have configured is 1.
You are effectively saying you will tolerate having no data drives at all (1 drive configured, and you'll tolerate that one breaking). That makes no sense, which is why it is raised as an error.
You need a gap of at least 1 between the number of configured volumes and the tolerated failures, so that you can still have a running datanode.
Here are your options (the second is sketched below):
Configure more data volumes (2) with dfs.datanode.failed.volumes.tolerated set to 1. For example, store data on both your C and D drives.
Or set dfs.datanode.failed.volumes.tolerated to 0 and keep your data volumes as they are (1).
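For instance, a minimal hdfs-site.xml fragment for the second option could look like this (the datanode path is only illustrative; keep whatever directory you actually use):
<property>
<name>dfs.datanode.data.dir</name>
<!-- illustrative absolute path; one configured volume -->
<value>/D:/big-data/hadoop-3.1.3/data/datanode</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<!-- zero failed volumes tolerated out of the one configured volume -->
<value>0</value>
</property>
With one volume configured and zero failures tolerated, the startup check passes and the datanode can come up.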
I am trying to set up a multi-node Hadoop cluster between 2 Windows devices. I am using Hadoop 2.9.2.
How can I achieve that, please?
After a lot of trial and error, the following did the job for me:
Do the same configuration as in the previous answer by #AbsoluteBeginner.
Disable the Windows firewall on all machines (I think you could keep it on and just adjust the rules, but that's for you to find out).
Run hdfs namenode -format on all nodes (master and slaves).
Make sure that the datanode folder is empty on all 3 nodes (just shift+del).
On the master node, run start-all.cmd. All of the following should appear:
50436 NameNode
54696 NodeManager
54744 DataNode
60028 Jps
7340 ResourceManager
On the slave nodes, run start-all.cmd. All of the following should appear:
6116 DataNode
2408 Jps
3208 NodeManager
Note: the reason the namenode and resource manager aren't appearing on the slaves is that they are running on the master node and already occupy their ports; you only need the master's resource manager and namenode running.
Note: if you followed a multi-node tutorial for Linux, the master node also shows a SecondaryNameNode when executing jps. I'm not really sure why it isn't appearing on Windows.
Go to master:50070 and navigate to Datanodes; your datanodes should be listed there.
Go to master:8088 and navigate to Nodes; your node managers should be listed there.
Much like on Linux systems, these are the steps you have to follow in order to run a Hadoop cluster on Windows:
1. Install an OpenSSH server on both of your systems. Generating a new SSH public and private key pair on your local computer is the first step towards authenticating with a remote server without a password. Add the public key to authorized_keys and add your hostname to the list of known hosts. You can find guides on how to do this by searching the internet; a minimal sketch follows below.
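A rough sketch of that key setup from cmd, assuming a default Windows OpenSSH install ("user" and the slave name are placeholders):
ssh-keygen -t rsa
:: this creates %USERPROFILE%\.ssh\id_rsa (private key) and id_rsa.pub (public key)
:: append the contents of id_rsa.pub to the .ssh\authorized_keys file on the other machine,
:: then verify that the login no longer asks for a password:
ssh user@hadoopSlave hostname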
2. Add your Hadoop master and slave IPs to your hosts file. Open "C:\Windows\System32\drivers\etc\hosts"
and add
your-master-ip hadoopMaster
your-slave-ip hadoopSlave
You can then use these names in your configuration files.
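As a quick sanity check (assuming the two entries above), make sure each machine can resolve the other before continuing:
ping hadoopMaster
ping hadoopSlave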
3. First you need to have Java installed on your system, and JAVA_HOME must be added to your environment variables. You can download Java from the Oracle website and install it.
Download the Hadoop binary files from the Apache website and extract them.
Note that you shouldn't have spaces in your folder names, or you might encounter problems.
Next you have to add the Java and Hadoop home and bin folders to your environment variables. Just open the Start menu, type "environment variable", and open the "Edit environment variables" window from the Control Panel.
Add
HADOOP_HOME="root of your hadoop extracted folder\hadoop-2.9.2"
HADOOP_BIN="root of your hadoop extracted folder\hadoop-2.9.2\bin"
JAVA_HOME="root of your JDK installation"
Edit your "Path" environment variable and add %JAVA_HOME%, %HADOOP_HOME%, %HADOOP_BIN%, and %HADOOP_HOME%\sbin to your PATH one by one.
You can validate your additions by opening cmd and typing:
echo %HADOOP_HOME%
echo %HADOOP_BIN%
echo %PATH%
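If you prefer to script this rather than use the GUI, something like the following should also work from cmd (the install paths and JDK version here are just examples):
setx HADOOP_HOME "D:\hadoop-2.9.2"
setx HADOOP_BIN "D:\hadoop-2.9.2\bin"
setx JAVA_HOME "C:\Java\jdk1.8.0_202"
:: open a new cmd window afterwards so the new variables are picked up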
CONFIGURING HADOOP:
10. Open "your hadoop root\hadoop-2.9.2\etc\hadoop\hadoop-env.cmd" and add the following lines to the bottom of the file:
set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
11. Open "your-hadoop-root\hadoop-2.9.2\etc\hadoop\hdfs-site.xml" and add the below content:
<property>
<name>dfs.name.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoopMaster:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopMaster:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
12. Edit your core-site.xml and add:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopMaster:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>your-temp-directory</value>
<description>A base for other temporary directories.</description>
</property>
13. Open "root to hadoop\hadoop-2.9.2\etc\hadoop\mapred-site.xml" and add the content below within the <configuration> tags. If you don't see mapred-site.xml, then open the mapred-site.xml.template file and rename it to mapred-site.xml.
<property>
<name>mapred.job.tracker</name>
<value>hadoopMaster:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
14. Edit your yarn-site.xml and add:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Long running service which executes on Node Manager(s) and provides MapReduce Sort and Shuffle functionality.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation so application logs are moved onto hdfs and are viewable via web ui after the application completed. The default location on hdfs is '/log' and can be changed via yarn.nodemanager.remote-app-log-dir property</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopMaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopMaster:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopMaster:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoopMaster:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoopMaster:8088</value>
</property>
In your slaves file in "root-hadoop-directory\hadoop-2.9.2\etc\hadoop" add
hadoopSlave
Do these steps on your slave nodes too.
Open cmd and cd into the sbin folder of your Hadoop directory.
18. Format your NameNode:
hadoop namenode -format
19. Run the following command:
start-dfs.cmd
then run:
start-yarn.cmd
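To confirm that everything came up, you can run jps on each machine, just like in the answer above:
jps
:: the master should show NameNode and ResourceManager (plus DataNode and NodeManager if it also acts as a worker)
:: each slave should show DataNode and NodeManager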
I want to test whether my Hadoop works after configuration, but after I enter the command start-all.sh, the terminal shows the error below:
WARN hdfs.DFSUtil: Namenode for null remains unresolved for ID null.
Check your hdfs-site.xml file to ensure namenodes are configured
properly.
Starting namenodes on [master]
master: ssh: Could not resolve hostname master: Name or service not known
I checked my hdfs-site.xml file and revised it as others suggested, like this:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/lidekanfa/tools/hadoop-2.7.7/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/lidekanfa/tools/hadoop-2.7.7/hdfs/data</value>
</property>
</configuration>
It still doesn't work. Then I checked my hosts file: I have entered the IP and name, and moreover I can log in to the slave without a password. What is the problem?
Thanks a lot!
I have found the answer. There are 2 points.
First, my master's hostname is lidekanfa, not master. But in hdfs-site.xml and the other configuration files, where I should have used the master's real name (lidekanfa), I wrote master instead. That is why it warns "Namenode for null remains unresolved for ID null".
Second, there was another hidden problem for me. The beginner installation tutorials use the same user name (such as root) on every machine, but I didn't notice that. As a result, after I fixed the problem above, it asked me for a password, but the user name and the key identity didn't match, so Hadoop didn't work. To solve this, I regenerated the keys and started Hadoop as root; you may also need to edit sshd_config to allow root login. Alternatively, you can solve this by using the same user name on all machines.
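For the first point, the fix is simply to use the real hostname wherever the tutorial says master; for example, in my hdfs-site.xml the secondary namenode address becomes:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>lidekanfa:50090</value>
</property>
For the second point, the sshd_config directive that allows root login is PermitRootLogin yes (restart the sshd service after changing it), although using one common non-root user on all machines is the cleaner fix.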
I too had the same problem. The problem was with my core-site.xml. After correcting the localhost entry, it worked fine and the namenode was able to connect to localhost.
In my case
incorrect core-site.xml: <value>hdfs://localhosts:9000</value>
corrected core-site.xml: <value>hdfs://localhost:9000</value>
I've recently been trying to build and configure an (8-Pi) Raspberry Pi 3 Hadoop cluster (as a personal project over the summer). Please bear with me (unfortunately I am a little new to Hadoop). I am using Hadoop version 2.9.2. I think it's important to note that right now I am trying to get just one namenode and one datanode completely functional with one another, before moving ahead and replicating the same procedure on the remaining seven Pis.
The issue: my namenode (alias: master) is the only node being displayed as a 'Live Datanode', both in the dfs-health interface and in the output of:
dfsadmin -report
This is despite the datanode being displayed as an 'Active Node' (within the Nodes page of the cluster Hadoop UI) and 'master' not being listed in the slaves file. The configuration I am aiming for is that the namenode should not perform any datanode operations. Additionally, I am trying to configure the cluster so that the command above displays my datanode (alias: slave-01) as a 'Live Datanode'.
I suspect that my issue is caused by the fact that both my Namenode and Datanode make use of the same host-name (raspberrypi), however am unsure of the configuration changes I am required to make in order to correct the issue. After having looked into the documentation, I unfortunately couldn't find a conclusive answer as to whether this is allowed or not.
If someone could please help me solve this issue it would be extremely appreciated! I have provided any relevant file-information below (which I thought may be useful for solving the issue). Thank you :)
PS: All files are identical within the Namenode and Datanode unless otherwise specified.
===========================================================================
Update 1
I have removed localhost from the slaves file on both the Namenode and Datanode, and changed their respective hostnames to 'master' and 'slave-01' as well.
After running jps, I noticed that all of the correct processes are running on the master node; however, there is an error on the datanode, whose log shows:
ExitCodeException exitCode=1: chmod: changing permissions of '/opt/hadoop_tmp/hdfs/datanode': Operation not permitted.
Unfortunately the issue persists despite changing permissions using 'chmod 777'. If someone could please help me solve this, it would be extremely appreciated! Thanks in advance :)
===========================================================================
Hosts File
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.1.1 raspberrypi
192.168.1.2 master
192.168.1.3 slave-01
Master File
master
Slaves File
localhost
slave-01
Core-Site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>fs.default.FS</name>
<value>hdfs://master:9000/</value>
</property>
</configuration>
HDFS-Site.xml
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop_tmp/hdfs/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop_tmp/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Mapred-Site.xml
<configuration>
<property>
<name>mapreduce.job.tracker</name>
<value>master:5431</value>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Yarn-Site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
You could let your local router serve up the host names rather than manipulating /etc/hosts yourself, but in order to change each Pi's name, edit /etc/hostname and reboot.
Before and after rebooting, check by running hostname -f.
Note: "master" is really meaningless once you have a "YARN master", "HDFS master", "Hive master", etc. It's best to literally say namenode, data{1,2,3}, yarn-rm, and so on.
Regarding the permissions issue, you could run everything as root, but that's insecure outside a homelab. Instead, run a few adduser commands to create at least an hduser (as documented elsewhere, but it can be any name) and a yarn user, then run the daemons as those users after chown -R'ing the data and log directories so they are owned by those users and the Unix groups they belong to. A rough sketch follows below.
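A minimal sketch of that setup, assuming the /opt/hadoop_tmp layout from the question and a Hadoop install under /opt/hadoop (names and paths are only examples):
sudo adduser hduser                  # user that will run the HDFS daemons
sudo adduser yarn                    # user that will run the YARN daemons
sudo addgroup hadoop
sudo usermod -aG hadoop hduser
sudo usermod -aG hadoop yarn
# hand the HDFS data directories over to the HDFS user
sudo chown -R hduser:hadoop /opt/hadoop_tmp/hdfs
# the log directory location is an assumption; point this at wherever your logs actually live
sudo chown -R hduser:hadoop /opt/hadoop/logs
Then start the daemons as those users (for example su - hduser -c "start-dfs.sh"); the chmod 'Operation not permitted' error should disappear once the daemon user owns its own directories.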
I tried to run a simple program in Hadoop using Windows/Cygwin.
I am able to start the namenode.
The jobtracker, however, fails to start with this exception:
FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
I tried all possible methods to resolve this, but in vain. Any pointers would greatly help me.
hdfs-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The problem is that the following lines should go into mapred-site.xml and NOT hdfs-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
By the way, why are you trying to run Hadoop on Windows? For development? Do you not have a Linux machine, or are you reluctant to install one?
One more thing: you usually put this property in core-site.xml, not hdfs-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
I faced the same issue when working on the "Pseudo-Distributed" examples at this page: http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html#PseudoDistributed
It turned out that Hadoop simply wasn't picking up my conf files. The examples at the link above assume you are running from inside your Hadoop install (i.e. /Usr/jane/hadoop-1.1.2). I was trying to run the examples from another directory. I'm sure you could configure Hadoop to recognize another 'conf' directory, but I took the easy route and just started running from my Hadoop directory.
This thread helped me figure it out: https://issues.apache.org/jira/browse/HDFS-2515
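If you do want Hadoop to pick up a conf directory outside the install, the usual ways (as far as I know) are the --config generic option or the HADOOP_CONF_DIR environment variable; the path below is just an example:
# point a single command at an alternate conf directory
hadoop --config /Usr/jane/my-conf fs -ls /
# or export it for the whole session before starting the daemons
export HADOOP_CONF_DIR=/Usr/jane/my-conf
start-dfs.sh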