Ambari can't start namenode with a connection exception - hadoop

I have been stuck on this for a few days. I tried many of the fixes suggested online, but none of them worked, so I registered on Stack Overflow to ask my first question.
My environment is HDP 2.4 on Ubuntu 14.04 LTS.
The log shows:
safemode: Call From master/9.119.131.105 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2016-04-21 15:59:57,796 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://master:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From master/9.119.131.105 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
(1). At first I tried to work around safe mode, but that is not the root of this issue.
(2). Then I looked at port 8020 on master. master is the NameNode of my cluster, and I am sure the hosts file is correct:
9.119.131.105 master
(3). SSH works: the four nodes in the cluster can log in to each other without passwords.
(4). I'm sure the firewall is off:
# /etc/init.d/iptables stop
# ufw status
Status: inactive
(5). I also tried to open port 8020 manually:
# iptables -A INPUT -p tcp --dport 8020 -j ACCEPT
It still didn't work.
(6). I tried:
# telnet master 8020
but got:
telnet: Unable to connect to remote host: Connection refused
(7). I found that not only port 8020 but also port 50070 refuses connections.
(8). I'm sure the port in the configuration files is 8020:
fs.defaultFS
hdfs://master:8020
dfs.namenode.rpc-address
master:8020
I hope somebody can help me; I would really appreciate it.
Thank you!
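A refused connection on both 8020 and 50070 usually means that nothing is listening on those ports on master (in other words, the NameNode process may not be up at all), rather than that a firewall is dropping traffic. As a rough sketch of the telnet check from step (6), the Python below tries a plain TCP connection to each port. The function names port_is_open and check_ports are made up for this illustration; the host name and ports are the ones from the question.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_ports("master", [8020, 50070]) returns {8020: False, 50070: False} when nothing is listening on either port. If that is the result even when run on master itself, the NameNode's own log is probably the next place to look, since the Ambari message only reports the failed safe mode check.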

Related

Kerberos HBase Zookeeper fails

I'm trying to Kerberize my HBase cluster and I'm running into problems with ZooKeeper. When I start HBase I get this error in the Master log:
ERROR [main-SendThread(X.X.X.X:2181)] client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
ERROR [main-SendThread(X.X.X.X:2181)] zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
DEBUG [main-EventThread] zookeeper.ZKWatcher: master:16000-0x16c236187be0000, quorum=Y.Y.Y.Y:2181,X.X.X.X:2181, baseZNode=/hbase Received ZooKeeper Event, type=None, state=AuthFailed, path=null
DEBUG [main] zookeeper.ZooKeeper: Close called on already closed client
In the ZooKeeper log, I get:
WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to /X.X.X.X:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
I checked my firewall; the ports are open.
For the configuration, I followed the HBase Reference Guide:
http://hbase.apache.org/book.html#zk.sasl.auth
At first I thought it was a problem with my keytab, but Hadoop works fine with it.
I run HBase 2.0.5, Hadoop 3.1.2, and the ZooKeeper bundled with HBase.
Following @SamsonScharfrichter's comment, I've tried a few things:
Added the FQDNs of my servers to /etc/hosts and updated my configuration to reflect this change.
Changed the hostnames of my servers to their FQDNs.
Tried to nslookup my hostnames, which didn't work since they are only defined in /etc/hosts.
None of this helped; I'm still getting the error. My guess is that Kerberos does its DNS lookups through my public NIC rather than my private one. I don't know why it struggles so hard to find my servers, since Hadoop has absolutely no problem with them.
EDIT - I set up a private DNS server on my network. DNS works fine, but I'm still getting the error. I'm about to give up.
EDIT 2 - I installed tshark on the node with the error. Apparently I get a frame with the message:
Error: KRB5KDC_ERR_C_PRINCIPAL_UNKNOWN
which is weird, because I verified my keytab and the principals listed in kadmin. Maybe there are default principals that I'm not using?
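"Server not found in Kerberos database" generally means the KDC has no service principal matching the name the client asked for, and the client derives that name from however it resolves the server's host name. One thing that may be worth checking (an assumption, not a confirmed diagnosis) is whether forward and reverse lookups agree on every node. A minimal Python sketch, with hypothetical helper names forward_reverse and report:

```python
import socket


def forward_reverse(hostname):
    """Resolve hostname to an IPv4 address, then resolve that address back to a name."""
    ip = socket.gethostbyname(hostname)
    reverse_name, _aliases, _addresses = socket.gethostbyaddr(ip)
    return ip, reverse_name


def report(hostname):
    """Return a one-line summary of the forward and reverse lookup for hostname."""
    try:
        ip, reverse_name = forward_reverse(hostname)
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return f"{hostname}: lookup failed ({exc})"
    status = "consistent" if reverse_name.lower() == hostname.lower() else "MISMATCH"
    return f"{hostname} -> {ip} -> {reverse_name} [{status}]"
```

Running report() with each quorum member's FQDN (for example a hypothetical zk1.example.com) on every node shows whether the name resolves to one address and back to the same name. A MISMATCH line would suggest that the principal being requested may differ from the one stored in the keytab.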

Spark I/O error constructing remote block

I want to create a home-made Spark cluster with two computers on the same network. The setup is the following:
A) 192.168.1.9: Spark master with Hadoop HDFS installed
Hadoop has this core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
B) 192.168.1.6: Spark only (slave)
From B I want to access a file in A's HDFS using the following Spark code:
...
# Load files
file_1 = "input_1.pkl"
file_2 = "input_2.pkl"
hdfs_base_path = "hdfs://192.168.1.9:9000/folderx/"
sc.addFile(hdfs_base_path + file_1)
sc.addFile(hdfs_base_path + file_2)
# Get files back
with open(SparkFiles.get(file_1), 'rb') as fw:
    pass  # use fw
However, when I test the program on B with the command:
./spark-submit --master local program.py
The output is the following:
17/07/25 19:02:51 INFO SparkContext: Added file hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl at hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl with timestamp 1501002171301
17/07/25 19:02:51 INFO Utils: Fetching hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl to /tmp/spark-838c3774-36ec-4db1-ab01-a8a8c627b100/userFiles-b4973f80-be6e-4f2e-8ba1-cd64ddca369a/fetchFileTemp1979399086141127743.tmp
17/07/25 19:02:51 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
And later:
17/07/25 19:02:51 WARN DFSClient: Failed to connect to /127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
The program tries to access 127.0.0.1:50010, which is wrong. Should I also install Hadoop on B? If that is not necessary, what is the correct configuration? Thank you!
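The 127.0.0.1:50010 address in the log looks like a DataNode address handed back to the client, so one plausible line of inquiry is any address in A's configuration that only makes sense on A itself, such as the 0.0.0.0 in fs.defaultFS. Whether that is the actual cause is not confirmed here. As a small sketch, the Python below pulls fs.defaultFS out of the text of a core-site.xml and flags wildcard or loopback hosts; default_fs, is_local_only, and LOCAL_ONLY_HOSTS are names invented for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hosts that a remote client cannot use to reach this machine.
LOCAL_ONLY_HOSTS = {"0.0.0.0", "localhost", "127.0.0.1"}


def default_fs(core_site_xml):
    """Return the fs.defaultFS value from the text of a core-site.xml, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            return prop.findtext("value", "").strip()
    return None


def is_local_only(uri):
    """True if the URI's host is a wildcard or loopback address."""
    return urlparse(uri).hostname in LOCAL_ONLY_HOSTS
```

For the core-site.xml in the question, default_fs returns hdfs://0.0.0.0:9000 and is_local_only returns True for it, whereas a value such as hdfs://192.168.1.9:9000 would not be flagged.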
By the way, in case anyone comes here looking for a solution: I fixed my issue by pointing quickstart.cloudera to the VM's real IP address instead of 127.0.0.1.
The default /etc/hosts is
127.0.0.1 quickstart.cloudera quickstart localhost localhost.domain
What you want is
127.0.0.1 localhost localhost.domain
xxxIP_Address_oF_YOUR_VM quickstart.cloudera quickstart
You may want to modify /usr/bin/cloudera-quickstart-ip as well, because the hosts file may get reset every time you restart your VM.
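To spot this kind of mapping quickly, here is a minimal Python sketch that scans the text of a hosts file and lists host names (other than the usual localhost aliases) that point at a loopback address. The function names loopback_names and suspicious_names are hypothetical, and the "localhost"/"loopback" name filter is just a simple heuristic.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return {name: address} for every host name mapped to a loopback address."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a valid IP address; skip the line
        if is_loopback:
            for name in names:
                found[name] = address
    return found


def suspicious_names(hosts_text):
    """Loopback-mapped names that are not conventional localhost aliases."""
    return {
        name: address
        for name, address in loopback_names(hosts_text).items()
        if "localhost" not in name.lower() and "loopback" not in name.lower()
    }
```

Applied to the default hosts line above, suspicious_names reports quickstart.cloudera and quickstart as mapped to 127.0.0.1; applied to the corrected file, it reports nothing.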

Hadoop localhost connection failed

I am trying to run the following command:
hadoop fs -ls /data
It is giving me the following error:
Call From elkd02/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I do not understand what is going wrong. I have checked the /etc/hosts file and it contains the following:
127.0.0.1 Localhost
127.0.1.1 elkd02
How should I resolve the issue?

How to connect to localhost as both client and server in Hadoop (same user, Ubuntu)

When the client and server are on the same Ubuntu machine, the client is not able to connect.
It gives this error:
Call From ashish/127.0.0.1 to localhost:54310 failed on connection
exception: java.net.ConnectException: Connection refused; For more
details see: http://wiki.apache.org/hadoop/ConnectionRefused
By server do you mean namenode? By client do you mean hadoop client, datanode, nodemanager? Are you sure namenode is running and exposed on localhost:54310? Could you try
> nc -vz localhost 54310
What does your /etc/hosts look like? What do your core-site.xml and hdfs-site.xml look like? What do you get for (as your hadoop user):
> jps -ml
What do you get for:
> sudo iptables -L
Also take a look at:
Hadoop cluster setup - java.net.ConnectException: Connection refused
Hadoop Datanodes cannot find NameNode

Something is wrong with my Hadoop cluster

I have built a Hadoop cluster on ECS on Aliyun from Alibaba.com (it's like AWS). The OS is Ubuntu 12.04 and the Hadoop version is 2.7.1.
The cluster consists of one master and two slaves.
I can start it successfully. Every node starts and works well, and I can use ssh to access the two slave nodes from the master node.
But when I run the wordcount program, something goes wrong. The error is as follows:
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:38635 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
After adding Port 38635 to the file /etc/ssh/sshd_config, I ran the wordcount program again. The error persisted; the only difference is that the port number changed:
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:46656 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
How can I fix this problem? Ports 38635 and 46656 are both in /etc/ssh/sshd_config now, but each run of the wordcount program fails with a new port in the error message.

Resources