Weird DNS server causes Hadoop and HBase to malfunction - hadoop

I have a network with a DNS server that, as far as I can tell, behaves strangely and causes Hadoop or HBase to malfunction.
It resolves my hostname to an address that my machine doesn't know about (i.e. no local interface has that address).
Hadoop does work if I have the following entries in /etc/hosts:
127.0.0.1 localhost
127.0.1.1 myhostname
If the entry "127.0.1.1 myhostname" is not present, uploading a file to HDFS fails with a complaint that the file could only be replicated to 0 datanodes instead of 1.
But with this entry present, HBase does not work: creating a table from the HBase shell causes NotAllMetaRegionsOnlineException (actually caused by HMaster trying to bind to the wrong address returned by the DNS server for myhostname).
On another network, I am using the following /etc/hosts:
127.0.0.1 localhost
192.168.1.1 myhostname
And both Hadoop and HBase work.
The problem is that on the second network the address is dynamic, so I can't list it in /etc/hosts to override the result returned by the weird DNS server.
Hadoop runs in pseudo-distributed mode. HBase also runs on a single node.
Changing the behavior of the DNS server is not an option.
Changing "localhost" to 127.0.0.1 in hbase/conf/regionservers doesn't change anything.
Can somebody suggest a way to override its behavior while retaining the internet connection (I actually work on a client's machine through TeamViewer)? Or some way to configure HBase (or the ZooKeeper it manages) not to use the hostname to determine the address to bind to?

Luckily, I've found a workaround for this DNS server problem.
The DNS server returned an invalid address when queried for the local hostname.
By default, HBase looks up the local hostname through DNS to determine the address to bind to.
Because the address returned by the DNS server was invalid, HMaster wasn't able to bind.
Workaround:
In hbase/conf/hbase-site.xml, explicitly specify the interfaces to be used for the master and the regionserver:
<configuration>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo</value>
  </property>
</configuration>
In this case, I specified the loopback interface (lo) for both the master and the regionserver.
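To confirm this kind of failure before starting HBase, it can help to check whether the address that the local hostname resolves to can actually be bound on this machine. The following is a minimal Python sketch, not part of HBase: the helper names can_bind and check_local_hostname are made up for illustration, and the check only tries to bind an ephemeral TCP port.

```python
import socket


def can_bind(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on this machine.

    Hypothetical helper for illustration; port 0 asks the OS for any free port.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((address, 0))
        return True
    except OSError:
        # Typically "cannot assign requested address" when no local
        # interface owns the address.
        return False


def check_local_hostname() -> None:
    """Print what the local hostname resolves to and whether it is bindable."""
    hostname = socket.gethostname()
    try:
        resolved = socket.gethostbyname(hostname)
    except OSError as exc:
        print(f"{hostname} does not resolve: {exc}")
        return
    print(f"{hostname} -> {resolved}, bindable: {can_bind(resolved)}")
```

Calling check_local_hostname() on the affected machine should report "bindable: False" if the DNS server hands back an address that no local interface has.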

A simple tool I wrote to check for DNS issues:
https://github.com/sujee/hadoop-dns-checker

Related

hadoop fs -ls : Call From server/127.0.1.1 to localhost failed

I have hadoop installed in pseudo-distributed mode.
When running the command
hadoop fs -ls
I am getting the following error:
ls: Call From kali/127.0.1.1 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused;
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Any suggestions?
If you read the link in the error, there are two immediate points that need to be addressed.
If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
You should treat pseudodistributed mode as a remote system, even if it is only running locally.
For HDFS, you can resolve that by putting your computer's hostname (preferably the FQDN for your domain) as the HDFS address in core-site.xml. For your case, hdfs://kali:9000 should be enough.
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I'm not completely sure why it needs to be removed, but the general answer I can think of is that Hadoop is a distributed system and, as I mentioned, you should treat pseudo-distributed mode as if it were a remote HDFS server. Therefore, no loopback address should map to your computer's hostname.
For example, remove the second line of this
127.0.0.1 localhost
127.0.1.1 kali
Or remove the hostname from this
127.0.0.1 localhost kali
Most importantly (emphasis added)
None of these are Hadoop problems, they are hadoop, host, network and firewall configuration issues
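The /etc/hosts check described above can be automated. Here is a small Python sketch that scans hosts-file text and reports every name other than the usual localhost aliases that is mapped to a loopback address. The function name loopback_hostnames and the list of ignored aliases are assumptions made for this example.

```python
import ipaddress


def loopback_hostnames(hosts_text: str) -> list[tuple[str, str]]:
    """Return (address, name) pairs where a non-localhost name maps to loopback."""
    # Aliases that are expected to point at loopback (an assumed list).
    ignored = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}
    findings = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not a plain IP address, skip the line
        if not address.is_loopback:
            continue
        for name in fields[1:]:
            if name not in ignored:
                findings.append((fields[0], name))
    return findings
```

Run against the two examples above, it flags kali in both cases; on a real machine you would pass it the contents of /etc/hosts.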

Datanode is not connected to the Namenode cloudera

I want to access Cloudera from a remote machine via Talend for Big Data. To do that, I changed the IP of the host in Cloudera by editing the files /etc/hosts and /etc/sysconfig/network.
I can access Cloudera from Talend. However, the problem is that my DataNode and NameNode do not seem to be connected. When I check the log of my DataNode, I get the following errors:
Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 beginning handshake with NN
Initialization failed for Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=5802ab81-2c28-4beb-b155-cac31567d23a, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-83500942-6c65-4ecd-a0c2-a448be86bcb0;nsid=975591446;c=0)
The DataNode still uses the wrong IP address (127.0.0.1), even though I modified core-site.xml, hdfs-site.xml and mapred-site.xml, replacing the previous IP address with the new one.
I followed the steps given in this tutorial to do so:
https://www.youtube.com/watch?v=fSGpYHjGIRY
How can I fix this error?
On Debian 8, /etc/hosts will contain an entry for 127.0.1.1 with the hostname you gave during Linux installation. Cloudera will use this IP address for some of its services.
A regular HDFS cluster will contain multiple servers with different hostnames/IP addresses and will list those IPs as allowed. As your log says, the traffic is originating from 127.0.0.1, which is not the IP address of your hostname.
For a Cloudera single-server setup, the only way I found was to do the initial setup so that /etc/hosts doesn't have the 127.0.1.1 entry in it.
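The block pool ID in the log above (BP-1183376682-127.0.0.1-1433878467265) itself contains an IP address. It is 127.0.0.1 here, which suggests that the NameNode host resolved to loopback when the storage was initialized. The Python sketch below pulls that address out of a log line. It assumes the ID layout BP-<number>-<IPv4>-<number>, inferred from the log lines in the question, and the function name is made up for illustration.

```python
import re

# Assumed layout, inferred from the log above: BP-<number>-<IPv4>-<number>
BLOCK_POOL_RE = re.compile(r"BP-(\d+)-(\d{1,3}(?:\.\d{1,3}){3})-(\d+)")


def block_pool_address(log_line: str):
    """Return the IPv4 address embedded in the first block pool ID, or None."""
    match = BLOCK_POOL_RE.search(log_line)
    return match.group(2) if match else None
```

If this returns a loopback address for your DataNode log, the hostname mapping in /etc/hosts is the first thing to check.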

Hadoop NameNode Ip Address

I'm using the Hadoop MapReduce paradigm, and I need to get the NameNode IP address from the DataNode. Can anyone give me an idea of how to do this?
Thanks.
The easiest way would be to open the core-site.xml file under the HADOOP_HOME/conf directory. The value of the fs.default.name property will tell you the host and port where the NN is running.
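If you want to do the same lookup programmatically, here is a rough Python sketch that reads the property out of the XML text. The function name namenode_endpoint is made up for this example. It also accepts fs.defaultFS, the newer name for the same setting, and leaves reading the file from disk to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_endpoint(core_site_xml: str):
    """Return (host, port) from fs.default.name / fs.defaultFS, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in ("fs.default.name", "fs.defaultFS"):
            # The value is a URI such as hdfs://host:port
            uri = urlparse((prop.findtext("value") or "").strip())
            return uri.hostname, uri.port
    return None
```

Passing the returned host to socket.gethostbyname would then give the IP address, as resolved from the machine where the script runs.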
Delete the line 127.0.0.1 localhost in your /etc/hosts and put in the IPs and names of all your machines. Hadoop resolves all the IPs and names of the machines in the cluster as 127.0.0.1 localhost if you leave the file at its default.

hadoop api configuration on the client machine

Ultra-noob here. I have a server machine with cdh3u1 in pseudo-distributed mode, and a client machine with a Java application using the cdh3u1 API.
How do I configure the client to talk to the server? I've been googling for hours and couldn't find where the "client configuration" file is. The "hdfs-default", "core-default" and "mapred-default" files and their "-site" counterparts all look like server (namenode and datanode) config to me.
Is it just a "multipurpose client/server" config, and should I cherry-pick the attributes in these files that are appropriate to the client? Which are they? I'm probably missing something big here...
Thanks, Ido
Make sure that the client machine can reach the Hadoop server machine's IP. If you use VirtualBox for the Hadoop server (the cdh3 VM), then add a "host-only" network interface (see the details on host-only networking with VirtualBox). I'm assuming that the static IP of your Hadoop server is 192.168.56.101 and that you're able to ping it from your client.
Configure a hostname for your Hadoop server machine on both the server and the client machine. If you want to name your Hadoop server "local-elephant", add the following line to /etc/hosts on both machines: 192.168.56.101 local-elephant
On the server machine, go to /etc/hadoop/conf and change the values of the following properties from "localhost" to "local-elephant": in core-site.xml, the value of fs.default.name, and in mapred-site.xml, the value of mapred.job.tracker.
On the client machine, create core-site.xml and mapred-site.xml in the classpath of your Java application. In those files, put only the fs.default.name and mapred.job.tracker properties.
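As an illustration of how small the client-side files from the last step can be, this Python sketch generates them. The helper site_xml is made up for this example, and the ports 8020 and 8021 are assumptions: use whatever host and port values the server's own core-site.xml and mapred-site.xml contain.

```python
import xml.etree.ElementTree as ET


def site_xml(properties: dict) -> str:
    """Build a Hadoop *-site.xml document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Ports 8020 and 8021 are assumed; copy the real values from the server's files.
core_site = site_xml({"fs.default.name": "hdfs://local-elephant:8020"})
mapred_site = site_xml({"mapred.job.tracker": "local-elephant:8021"})
```

Writing core_site and mapred_site to core-site.xml and mapred-site.xml in a directory on the application's classpath gives the client the only two properties it needs.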

Hadoop job tracker only accessible from localhost

I'm setting up Hadoop (0.20.2). For starters, I just want it to run on a single machine - I'll probably need a cluster at some point, but I'll worry about that when I get there. I got it to the point where my client code can connect to the job tracker and start jobs, but there's one problem: the job tracker is only accessible from the same machine that it's running on. I actually did a port scan with nmap, and it shows port 9001 open when scanning from the Hadoop machine, and closed when it's from somewhere else.
I tried this on three machines (one Mac, one Ubuntu, and an Ubuntu VM running in VirtualBox), it's the same. None of them have any firewalls set up, so I'm pretty sure it's a Hadoop problem. Any suggestions?
In your Hadoop configuration files, do fs.default.name and mapred.job.tracker refer to localhost?
If so, then Hadoop will only listen on ports 9000 and 9001 on the loopback interface, which is inaccessible from any other host. Make sure fs.default.name and mapred.job.tracker refer to your machine's externally accessible hostname.
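A quick way to test a candidate hostname before putting it into fs.default.name or mapred.job.tracker is to see what it resolves to on the Hadoop machine itself. The Python sketch below (the function name is made up for illustration) reports whether a name resolves only to loopback addresses, in which case a daemon configured with that name would not be reachable from other hosts.

```python
import ipaddress
import socket


def resolves_only_to_loopback(host: str) -> bool:
    """Return True if every address `host` resolves to is a loopback address."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = {info[4][0] for info in infos}
    # Strip any IPv6 zone suffix ("%eth0") before parsing the address.
    return all(ipaddress.ip_address(a.split("%", 1)[0]).is_loopback for a in addresses)
```

Run it on the machine that hosts the daemons, since /etc/hosts there decides the answer.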
Make sure that you have not listed your master twice in the /etc/hosts file.
I had the following, which only allowed the master to listen on 127.0.1.1:
127.0.1.1 hostname master
192.168.x.x hostname master
192.168.x.x slave-1
192.168.x.x slave-2
The above configuration caused the problem. I changed my /etc/hosts file to the following to make it work:
127.0.1.1 hostname
192.168.x.x hostname master
192.168.x.x slave-1
192.168.x.x slave-2
Use the command netstat -an | grep :9000 to verify your connections are working!
In addition to the above answer, I found that /etc/hosts on the master (running Ubuntu) had the line:
127.0.1.1 master
Which meant that running nslookup master on the master returned a local address - so in spite of using master in mapred-site.xml I suffered the same problem. My solution (there are probably better ones) was to create an alias in my DNS server and use that instead. I guess you could probably also change the IP address in /etc/hosts to the external one, but I haven't tried this - I'm not sure what implications it would have for other services.
