Hadoop: Incorrect configuration - macos

Hi stackoverflow community,
I've been wanting to install Hadoop, but I've run into a problem.
I've looked at other approaches, but I still keep getting the same error. I am completely new to Hadoop, so I don't really know where to go from here. I am on a MacBook Pro with El Capitan, if relevant. When I run sbin/start-dfs.sh I receive this:
sbin/start-dfs.sh
16/05/10 11:09:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
Password:
localhost: /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/hadoop-daemon.sh: line 69: [: MacBook: integer expression expected
localhost: starting namenode, logging to /usr/local/Cellar/hadoop/2.7.2/libexec/logs/hadoop-name-namenode-name’s
localhost: Error: Could not find or load main class MacBook
The hadoop-daemon.sh is:
The relevant XMLs are as follows:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
If anything else is needed, I will gladly provide it. Thank you for all the help; I truly appreciate it, since I really want to start using Hadoop. My environment exports are:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home
export HADOOP_PREFIX=/usr/local/Cellar/hadoop
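Looking at the log above, the install itself seems to live under /usr/local/Cellar/hadoop/2.7.2/libexec, so I suspect HADOOP_PREFIX should point at that versioned libexec directory rather than at /usr/local/Cellar/hadoop. This is only a sketch of what I plan to try (paths taken from the log, not verified):
# hadoop-env.sh -- sketch assuming the Homebrew 2.7.2 layout shown in the log above
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home
export HADOOP_PREFIX=/usr/local/Cellar/hadoop/2.7.2/libexec
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop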
Hey, so this is an update if anyone is interested: I now get this:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [myIP#]
New note: I am redoing the process and following this guide again. Whether or not I succeed, I will post my update here :)!
zhongyaonan.com/hadoop-tutorial/…

Looks like your conf directory is not set properly. Try the following steps:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
hdfs namenode -format
hdfs getconf -namenodes
./start-dfs.sh
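If the namenode address still comes back empty after that, the hdfs command is probably reading a different conf directory from the one you edited (on a Homebrew install it would normally be /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop, but that path is an assumption). A quick sanity check:
# confirm which configuration directory is actually being picked up
echo $HADOOP_CONF_DIR
# should print the hdfs://localhost:9000 value from your core-site.xml
hdfs getconf -confKey fs.defaultFS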

Related

The process of NameNode isn't present when executing jps

I'm new to the Hadoop ecosystem.
I installed Hadoop 3.3.0 in pseudo-distributed mode.
The YARN applications page at http://localhost:8088/ is working, but I can't reach the NameNode web UI at http://localhost:9870/ (This site can't be reached).
$ jps
24553 Jps
20537 NodeManager
20429 ResourceManager
and
$ hadoop version
Hadoop 3.3.0
Source code repository https://github.com/apache/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:21Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
I tried to restart the processes, but in vain:
$ stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as mhannani in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [HP]
Stopping datanodes
Stopping secondary namenodes [HP]
2021-01-06 16:42:07,540 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
Stopping resourcemanager
Then I formatted the namenode:
$ hdfs namenode -format
2021-01-06 16:44:14,683 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = HP/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.3.0
and then
$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as mhannani in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [HP]
Starting datanodes
Starting secondary namenodes [HP]
HP: ERROR: Cannot set priority of secondarynamenode process 29847
2021-01-06 16:45:38,266 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
Please, how can I fix that, so that I can access my HDFS file system from the browser (http://localhost:9870, or 50075 on earlier versions of Hadoop)?
Any help or advice would be appreciated. Thanks, folks.
The issue was that the namenode and datanode paths on my local file system were not set correctly,
in $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///absolute/path/where/the/namenode/should/be/stored</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///absolute/path/where/the/datanode/data/should/be/stored</value>
</property>
</configuration>
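After changing those paths you normally have to reformat the namenode and restart the daemons, otherwise the NameNode still won't come up. Roughly (note that reformatting wipes any existing HDFS data):
stop-all.sh
# WARNING: this erases the existing HDFS metadata
hdfs namenode -format
start-all.sh
# NameNode, DataNode and SecondaryNameNode should now show up
jps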

Unwanted secondary namenode on my hadoop cluster

I'm learning Hadoop 2.9.2, especially NameNode HA.
When I first set up the cluster, I put dfs.namenode.secondary.http-address into hdfs-site.xml and it worked properly.
After reading this, I found out there is no need for a secondary namenode in a Hadoop HA cluster.
Note that, in an HA cluster, the Standby NameNode also performs
checkpoints of the namespace state, and thus it is not necessary to
run a Secondary NameNode, CheckpointNode, or BackupNode in an HA
cluster. In fact, to do so would be an error. This also allows one who
is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to
reuse the hardware which they had previously dedicated to the
Secondary NameNode.
Finally, I removed dfs.namenode.secondary.http-address from hdfs-site.xml and restarted the cluster, but the secondarynamenode on 0.0.0.0 is still there.
Here is my hdfs-site.xml below.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The hdfs command shows me the same, as below.
$ hdfs getconf -secondarynamenodes
18/12/21 00:35:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0.0.0.0
I ensure that all config files are synchronized across all nodes.
According to my understanding, I need to get rid of the secondarynamenode before moving to the HA setup.
If I'm not wrong, how can I handle this?
I do know that I can run hadoop-daemon.sh stop secondarynamenode, but I don't want it to be started by start-dfs.sh in the first place.
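For reference, the 0.0.0.0 appears to be just the default of dfs.namenode.secondary.http-address (0.0.0.0:50090 in Hadoop 2.x), so deleting the property only makes it fall back to that default. The effective value can be checked with:
hdfs getconf -confKey dfs.namenode.secondary.http-address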

Hadoop: Secondary NameNode Permission Denied

I'm attempting to run Hadoop in pseudo-distributed mode to learn how the system works. To install it, I downloaded Hadoop 3.0.0 from the site and untarred it. I've done my configuration as follows (leaving out the configuration tags for brevity):
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
After doing this, I've formatted my hdfs using
hdfs namenode -format
I've also set up passwordless ssh using the following:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa2
cat ~/.ssh/id_rsa2.pub >> ~/.ssh/authorized_keys
(I've also added id_rsa2.pub as the default for localhost using a config file, since I already was using id_rsa.pub for something else and didn't want to mix-and-match in case I broke something)
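(Roughly, the entry in ~/.ssh/config looks like this; treat it as a sketch, since the exact host names and key file are specific to my setup:)
# ~/.ssh/config
Host localhost
    HostName localhost
    IdentityFile ~/.ssh/id_rsa2
    IdentitiesOnly yes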
I'm able to ssh into localhost. All looks well.
Then I run start-dfs.sh, and I see this error:
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [zm.local]
zm.local: zm#zm.local: Permission denied (publickey,password,keyboard-interactive).
2018-01-16 17:31:35,807 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
If I run jps (after starting yarn and mapreduce history server), I have the following:
37921 NodeManager
38070 Jps
37434 NameNode
38060 JobHistoryServer
37821 ResourceManager
Noticeably, the SecondaryNameNode is missing; my assumption is that this is due to the error above.
I can then use Hadoop's fs command, and I'm able to create a folder and look it up. But if I try to copy any data over, I get notified that the NameNode is in safe mode. If I turn off safe mode using:
hdfs dfsadmin -safemode leave
It immediately turns back on. By going to the namenode port on localhost, I see the following message:
Safe mode is ON. Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
However, I have plenty of resources. The single datanode is using less than 8% of its allotted space, and the namenode has almost 100GB of space. The datanode and namenode are both reporting as healthy. Thus, I think the problem is the lack of a secondary namenode. With that in mind, is anyone aware of what might be causing the SecondaryNameNode to have different permission issues from the primary NameNode? It seems to be trying to put the sNN somewhere on the local machine instead - but when I check in /tmp/hadoop*, all of the file permissions seem to be normal.
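(For what it's worth, here is how I'm double-checking the space figures; these are standard commands, and the /tmp path is just the hadoop.tmp.dir default, which I assume I'm still using:)
# capacity, DFS used and DFS remaining per datanode
hdfs dfsadmin -report
# free space on the disk holding the default /tmp/hadoop-<user> directories
df -h /tmp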
Thanks for any help.

How to configure hadoop to use non-default port: "0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused"

When I run start-dfs.sh I get the error below, and it looks like I need to tell Hadoop to use a different port, since that is what I require when I ssh into localhost. In other words, the following works successfully: ssh -p 2020 localhost.
[Wed Jan 06 16:57:34 root#~]# start-dfs.sh
16/01/06 16:57:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: namenode running as process 85236. Stop it first.
localhost: datanode running as process 85397. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
16/01/06 16:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/hdfs/datanode</value>
</property>
</configuration>
If your Hadoop cluster nodes run sshd listening on a non-standard port, then it is possible to tell the Hadoop scripts to initiate ssh connections to that port. In fact, it's possible to customize any of the options passed to the ssh command.
This is controlled by an environment variable named HADOOP_SSH_OPTS. You can edit your hadoop-env.sh file and define it there. (By default this environment variable is not defined.)
For example:
export HADOOP_SSH_OPTS="-p 2020"
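The variable just needs to be visible to the start scripts, so defining it in hadoop-env.sh and restarting is usually enough. A short sketch, assuming the usual $HADOOP_HOME/etc/hadoop/hadoop-env.sh location:
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HADOOP_SSH_OPTS="-p 2020"

# restart so the scripts pick up the new ssh options
stop-dfs.sh
start-dfs.sh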

Getting Exception on "hadoop fs -ls /"

I run hadoop-2.0.5-alpha.
When I list hdfs files, I get this Exception:
bin/hadoop fs -ls /
13/07/07 18:47:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status;
My core-site.xml looks like that:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
</configuration>
What could be wrong?
If you have multiple versions of Hadoop installed on your system, verify your PATH. You may be using the wrong version of hadoop as the client.
I ran into this problem when I had two versions of hadoop installed: hadoop-1.1.2 and hadoop-2.1.0-beta. It turned out that my path was incorrect and I was attempting to run the hadoop command from hadoop-1.1.2 against hadoop 2.1.0-beta.
In addition to your PATH, check the settings of your HADOOP_CONF_DIR or even HADOOP_HOME environment variables to be sure they are pointing to the correct directory for your hadoop 2 installation.
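A quick way to see which installation the client commands actually resolve to (plain shell; nothing here is specific to your setup):
# which launcher is first on the PATH
which hadoop
# should report the Hadoop 2 build you expect
hadoop version
# both should point at the Hadoop 2 installation and its conf directory
echo $HADOOP_HOME
echo $HADOOP_CONF_DIR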
