Hadoop namenode can't start the datanode - hadoop

I have a multi-node setup on separate machines. The namenode can't start the datanode and the tasktracker; the namenode, secondary namenode, and jobtracker start fine.
The namenode machine is named namenode (namenode@namenode, IP 192.168.1.1).
The datanode machine is named datanode2 (datanode2@datanode2, IP 192.168.1.2).
The SSH server is set up and id_rsa.pub has been copied to the datanode.
But when I run the start-all.sh command
and it tries to start the datanode, it asks for a password for namenode@datanode2.
When I provide the password, it says permission denied.

You need a core-site.xml with your namenode address. This needs to be the same across the cluster.
<property>
<name>fs.default.name</name>
<value>hdfs://$namenode.full.hostname:8020</value>
<description>Enter your NameNode hostname</description>
</property>
You can use a script to start individual daemons. Follow this SO post.
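For example, a minimal sketch of starting just the missing daemons on the datanode machine itself, assuming a Hadoop 1.x layout and that $HADOOP_HOME is set (the paths are assumptions):
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker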

Change permissions for .ssh folder and authorized_keys file as follows:
sudo chmod 700 ~/.ssh
sudo chmod 640 ~/.ssh/authorized_keys
or
sudo chmod 700 /home/hadoop/.ssh
sudo chmod 640 /home/hadoop/.ssh/authorized_keys
Refer to this for more details.
UPDATE I:
Try 600 instead of 640 like this:
sudo chmod 600 $HOME/.ssh/authorized_keys
sudo chown 'hadoop' $HOME/.ssh/authorized_keys
If this did not work, try this one:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopusrname@HOSTNAME.local
Replace HOSTNAME with your local hostname and hadoopusrname with your Hadoop username.
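Once the key is in place, it is worth confirming both the passwordless login and the permissions before rerunning start-all.sh; a quick check using the same placeholder names as above:
ssh hadoopusrname@HOSTNAME.local 'hostname && ls -ld ~/.ssh ~/.ssh/authorized_keys'
This should log you in without a password and show drwx------ on .ssh and -rw------- (or -rw-r-----) on authorized_keys.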

Related

I keep getting "Permission Denied" in Google Cloud terminal when trying to open Hadoop

I am trying to run Hadoop on GCP. Whenever I type in the command
start-dfs.sh && start-yarn.sh
I get the following:
localhost: chuckpryorjr@localhost: Permission denied (publickey).
localhost: chuckpryorjr@localhost: Permission denied (publickey).
Starting secondary namenodes [0.0.0.0]
0.0.0.0: chuckpryorjr@0.0.0.0: Permission denied (publickey).
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-ecosystem/hadoop-2.9.2/logs/yarn-chuckpryorjr-resourcemanager-hadoopmasters.out
localhost: chuckpryorjr@localhost: Permission denied (publickey).
I don't get it. Before, it used to prompt me for a password (which I don't recall ever creating); now it's just outright denying me. How can I make this passwordless? Also, the very first time I installed Hadoop on GCP and ran it, it worked fine. Sometimes I can get through and complete my work, sometimes I can't.
How can I make this passwordless?
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Then update your local authorized keys file for localhost
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
And if you have other servers, you can use ssh-copy-id to place the key into those
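A hedged example of that last step, with the server name as a placeholder and using the key generated above:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub chuckpryorjr@<other-server>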
In my case, when I added my Hadoop user to the sudo group, it worked fine.
sudo adduser hadoop sudo

Must the username of NameNode be equal to DataNode's?

I run the namenode as hduser@master; the datanodes run as user1@slave1 and user1@slave2. Setting up SSH keys works fine, and I can ssh remotely to my DataNode machines from the master.
However, when I try to run hadoop-daemons.sh for my datanodes, it fails because it tries to ssh with the wrong user:
hduser@master:~$ hadoop-daemons.sh start datanode
hduser@slave3's password: hduser@slave1's password: hduser@slave2's password:
slave1: Permission denied (publickey,password).
slave2: Permission denied (publickey,password).
slave3: Permission denied (publickey,password).
I tried resetting the public and private keys on my master and copying them to the data nodes:
$ ssh-keygen -t rsa -P ""
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user1@slave1
But it gives me the same error.
Does the user on the NameNode need to be the same as for the DataNodes?
Answer: After resetting the VMs, adding the same user, and installing Hadoop on the DataNodes with the same user as on the NameNode, it worked. So I guess the answer is yes...
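A quick way to sanity-check that setup: with a common user on every node (hduser is assumed here, matching the master above) and the keys distributed, each of the following should log in without a password and list the expected daemons:
ssh hduser@slave1 'whoami && jps'
ssh hduser@slave2 'whoami && jps'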

Hadoop Multinode Cluster, slave permission denied

I'm trying to set up a multinode cluster (actually with 2 nodes: 1 master and 1 slave) on Hadoop. I followed the instructions in Multinode Cluster for Hadoop 2.x.
When I execute the command:
./sbin/start-all.sh
I got the error message for my slave node:
slave: Permission denied (publickey)
I already modified the .ssh/authorized_keys files on both master and slave and added the keys from .ssh/id_rsa.pub of master and slave.
Finally, I restarted SSH with sudo service ssh restart on both nodes (master and slave).
When executing ./sbin/start-all.sh I don't have a problem with the master node, but the slave node gives me back the permission denied error.
Does anybody have an idea why I cannot see the slave node?
Running jps currently gives me the following result:
master
18339 Jps
17717 SecondaryNameNode
18022 NodeManager
17370 NameNode
17886 ResourceManager
slave
2317 Jps
I think the master is OK, but I have trouble with the slave.
After running ssh-keygen on the master, copy id_rsa.pub into authorized_keys using cat id_rsa.pub >> authorized_keys on all the slaves. Test the password-less ssh using:
ssh <slave_node_IP>
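An equivalent, hedged alternative for getting the master's key onto each slave (it assumes the same username exists on the slave and that password logins are allowed for the copy itself):
ssh-copy-id -i ~/.ssh/id_rsa.pub <slave_node_IP>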
If you have copied the whole hadoop folder from the master to the slave nodes (for easy replication), make sure that the slave node's hadoop folder is owned by the correct user on the slave system:
sudo chown -R <slave's username> </path/to/hadoop>
I ran this command on my slave system and it solved my problem.
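For instance, if the slave's Hadoop user were hduser and Hadoop lived in /usr/local/hadoop (both names are assumptions, not from the original post), that would be:
sudo chown -R hduser:hduser /usr/local/hadoop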

unable to start start-dfs.sh in Hadoop Multinode cluster

I have created a Hadoop multinode cluster and also configured SSH on both the master and slave nodes; now I can connect to the slave without a password from the master node.
But when I try to run start-dfs.sh on the master node, I'm unable to connect to the slave node; the execution stops at the line below.
log:
HNname@master:~$ start-all.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-namenode-master.out
HDnode@slave's password: master: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-datanode-master.out
I pressed Enter
slave: Connection closed by 192.168.0.2
master: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-secondarynamenode-master.out
jobtracker running as process 10396. Stop it first.
HDnode@slave's password: master: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-tasktracker-master.out
slave: Permission denied, please try again.
HDnode@slave's password:
after entering the slave password the connection is closed
Things I have tried, with no results:
formatted the namenode on both the master and slave nodes
created a new ssh key and configured it on both nodes
overrode the default HADOOP_LOG_DIR (from this post); a sketch of that override follows this list
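For reference, the HADOOP_LOG_DIR override is just an environment variable exported from hadoop-env.sh; a minimal sketch, where the log path and the conf location are assumptions based on the Hadoop 1.x layout shown in the log above:
export HADOOP_LOG_DIR=/var/log/hadoop
(placed in $HADOOP_HOME/conf/hadoop-env.sh on both nodes)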
I think you missed this step: "Add the SSH Public Key to the authorized_keys file on your target hosts".
Just redo the password-less ssh setup correctly. Follow this:
Generate public and private SSH keys
ssh-keygen
Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts:
.ssh/id_rsa
.ssh/id_rsa.pub
Add the SSH Public Key to the authorized_keys file on your target hosts:
cat id_rsa.pub >> authorized_keys
Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Check the connection:
ssh root@<remote.target.host>
where <remote.target.host> has the value of each host name in your cluster.
If the following warning message displays during your first connection: "Are you sure you want to continue connecting (yes/no)?", enter yes.
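To run that check against every node in one pass, a small loop works; the hostnames below are placeholders for your own cluster's hosts:
for host in master slave1 slave2; do ssh root@$host hostname; done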
Refer: Set Up Password-less SSH
Note: no password will be asked for if your passwordless ssh is set up properly.
Make sure to start hadoop services with a new user called hadoop.
Then make sure to add the public key to the slaves with that new user.
If this doesn't work, check your firewall or iptables rules.
I hope it helps.
That means you haven't created the public key properly.
Follow the sequence below:
Create User
Give all required permissions to that user
Generate public key with same user
Format Name Node
Start hadoop services.
Now it should not ask for a password.
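One hedged way those steps could look as commands (the user name hadoop, the key options, and the use of start-dfs.sh/start-yarn.sh are assumptions, not part of the original answer):
sudo adduser hadoop
sudo adduser hadoop sudo
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
hdfs namenode -format
start-dfs.sh && start-yarn.sh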

How to execute a command on a running docker container?

I have a container running Hadoop. I have another Dockerfile which contains MapReduce job commands like creating an input directory, processing a default example, and displaying the output. The base image for the second Dockerfile is hadoop_image, created from the first Dockerfile.
EDIT
Dockerfile - for hadoop
#base image is ubuntu:precise
#cdh installation
#hadoop-0.20-conf-pseudo installation
#CMD to start-all.sh
start-all.sh
#start all the services under /etc/init.d/hadoop-*
The hadoop base image is created from this.
Dockerfile2
#base image is hadoop
#flume-ng and flume-ng agent installation
#conf change
#flume-start.sh
flume-start.sh
#start flume services
I am running both containers separately. It works fine. But if I run
docker run -it flume_service
it starts Flume and shows me a bash prompt [/bin/bash is the last line of flume-start.sh]. Then I execute
hadoop fs -ls /
in the second running container, and I get the following error:
ls: Call From 514fa776649a/172.17.5.188 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I understand I am getting this error because the hadoop services are not started yet. But my doubt is: my first container is running, and I am using it as the base image for the second container. So why am I getting this error? Do I need to change anything in the hdfs-site.xml file on the flume container?
This is a pseudo-distributed mode installation.
Any suggestions?
Or do I need to expose any ports or something like that? If so, please provide me an example.
EDIT 2
When I run sudo iptables -t nat -L -n I see:
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-
Chain DOCKER (2 references)
target prot opt source destination
This is on the Docker host (docker@domain), not inside a container.
EDIT
See the last comment under surazj's answer.
Have you tried linking the containers?
For example, your container named hadoop is running in pseudo-distributed mode and you want to bring up another container that contains flume. You could link the containers like this:
docker run -it --link hadoop:hadoop --name flume ubuntu:14.04 bash
When you get inside the flume container, type the env command to see the IP and port exposed by the hadoop container.
From the flume container you should then be able to do something like the following (the ports on the hadoop container should be exposed):
$ hadoop fs -ls hdfs://<hadoop containers IP>:8020/
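A hedged sketch of exposing that port, reusing the image names from the question (hadoop_image and flume_service; the port assumes the default NameNode RPC port 8020):
docker run -d --name hadoop --expose 8020 hadoop_image
docker run -it --link hadoop:hadoop --name flume flume_service bash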
The error you are getting might be related to some hadoop services not running on the flume container. Run jps to check which services are running. But I think if you have the hadoop classpath set up correctly on the flume container, then you can run the above hdfs command (-ls hdfs://<hadoop containers IP>:8020/) without starting anything. But if you want
hadoop fs -ls /
to work on the flume container, then you need to start the hadoop services on the flume container as well.
In your core-site.xml, add dfs.namenode.rpc-address like this so the namenode listens for connections from all IPs:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>0.0.0.0:8020</value>
</property>
Make sure to restart the namenode and datanode
sudo /etc/init.d/hadoop-hdfs-namenode restart && sudo /etc/init.d/hadoop-hdfs-datanode restart
Then you should be able to do this from your hadoop container without a connection error, e.g.
hadoop fs -ls hdfs://localhost:8020/
hadoop fs -ls hdfs://172.17.0.11:8020/
On the linked container, type env to see the ports exposed by your hadoop container:
env
You should see something like
HADOOP_PORT_8020_TCP=tcp://172.17.0.11:8020
Then you can verify the connection from your linked container.
telnet 172.17.0.11 8020
I think I hit the same problem: I couldn't start the hadoop namenode and datanode with the "start-all.sh" command inside docker.
That is because it launches the namenode and datanode through "hadoop-daemons.sh", which fails. The real problem is that "ssh" does not work inside docker.
So, you can do either of the following.
(Solution 1):
Replace all occurrences of "daemons.sh" with "daemon.sh" in start-dfs.sh,
then run start-dfs.sh
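A hedged one-liner for that replacement (the script path under $HADOOP_PREFIX is an assumption; it keeps a .bak backup of the original file):
sed -i.bak 's/hadoop-daemons\.sh/hadoop-daemon.sh/g' $HADOOP_PREFIX/sbin/start-dfs.sh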
(Solution 2): do
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
You can see that the datanode and namenode are working fine with the "jps" command.
Regards.
