How to configure ports for hostname and localhost? - hadoop

I am running a browser on a single-node Hortonworks Hadoop cluster (HDP 2.3.4) on CentOS 6.7.
With localhost:8000 and <hostname>:8000 I can access Hue. The same works for Ambari at 8080.
However, several other ports I can only reach via the hostname. For example, <hostname>:50070 opens the NameNode service, but with localhost:50070 I cannot establish a connection. So I assume localhost is blocked for these ports while the hostname is not.
How can I set things up so that localhost and <hostname> work with the same port configuration?

This likely indicates that the NameNode HTTP server socket is bound to a single network interface, but not the loopback interface. The NameNode HTTP server address is controlled by configuration property dfs.namenode.http-address in hdfs-site.xml. Typically this specifies a host name or IP address, and this maps to a single network interface. You can tell it to bind to all network interfaces by setting property dfs.namenode.http-bind-host to 0.0.0.0 (the wildcard address, matching all network interfaces). The NameNode must be restarted for this change to take effect.
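As a minimal sketch (assuming the stock HDP 2.3.x setup with the default HTTP port 50070), the entry in hdfs-site.xml would look something like this:
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
  <!-- wildcard address: the NameNode HTTP server accepts connections on every interface, including loopback -->
</property>
With that in place and the NameNode restarted, both localhost:50070 and <hostname>:50070 should reach the NameNode web UI.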
There are similar properties for other Hadoop daemons. For example, YARN has a property named yarn.resourcemanager.bind-host for controlling how the ResourceManager binds to a network interface for its RPC server.
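A hedged sketch of the corresponding entry in yarn-site.xml (property name as documented in yarn-default.xml):
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
  <!-- bind the ResourceManager server sockets to all network interfaces -->
</property>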
More details are in the Apache Hadoop documentation for hdfs-default.xml and yarn-default.xml. There is also full coverage of multi-homed deployments in HDFS Support for Multihomed Networks.

Related

How to configure HDFS to listen to 0.0.0.0

I have an HDFS cluster listening on 192.168.50.1:9000, which means it only accepts connections via that IP. I would like it to listen on 0.0.0.0:9000. When I add 127.0.0.1 localhost master to /etc/hosts, it starts on 127.0.0.1:9000 instead, which prevents all other nodes from connecting.
This question is similar to How to make Hadoop servers listening on all IPs, but that one is about YARN and this one is about HDFS.
Is there an equivalent setting for core-site.xml, like yarn.resourcemanager.bind-host, or some other way to configure this? If not, what is the reasoning behind that? Is it a security feature?
For the NameNode you need to set these to 0.0.0.0 in your hdfs-site.xml (a full sketch follows after the list):
dfs.namenode.rpc-bind-host
dfs.namenode.servicerpc-bind-host
dfs.namenode.lifeline.rpc-bind-host
dfs.namenode.http-bind-host
dfs.namenode.https-bind-host
The DataNodes use 0.0.0.0 by default.
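A sketch of the resulting hdfs-site.xml block (restart the NameNode afterwards for the changes to take effect):
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.lifeline.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
</property>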
If you ever need to find a config variable for HDFS, refer to hdfs-default.xml.
Also very useful: on any page of the official Hadoop docs, the bottom-left corner links to the default values for the various XML files.
So you can go to the Apache Hadoop 2.8.0 documentation, or the docs for your specific version, and find the settings you're looking for.
The question is quite old already; however, you usually do not need to configure the bind address 0.0.0.0 because it is the default value. The more likely culprit is an entry in /etc/hosts such as 127.0.0.1 hostname, which Hadoop resolves to 127.0.0.1.
Consequently, you need to remove that entry, and Hadoop will bind to all interfaces (0.0.0.0) without any additional entries in the config files.
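As a sketch, using the hostname master and the 192.168.50.1 address from the question above:
# /etc/hosts before: the hostname resolves to loopback, so the NameNode binds to 127.0.0.1
127.0.0.1   localhost master
# /etc/hosts after: keep localhost on the loopback line and map the hostname to the real interface
127.0.0.1   localhost
192.168.50.1   master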

Cannot open localhost:8088. Trying to install Hadoop 3 on Windows 10

localhost:9870 is working fine. The problem is localhost:8088. Did they move it as well, like 9870?
No. As stated in Apache Hadoop 3.0.0:
Default ports of multiple services have been changed.
Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, services would sometimes fail to bind to the port due to a conflict with another application.
These conflicting ports have been moved out of the ephemeral range, affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our documentation has been updated appropriately, but see the release notes for HDFS-9427 and HADOOP-12811 for a list of port changes.
Since the YARN ports were never in the ephemeral port range, they didn't need to be changed.
This is confirmed by looking at the yarn-default.xml for Hadoop 3.0.0.
yarn.resourcemanager.webapp.address
Default value: ${yarn.resourcemanager.hostname}:8088
Description: The http address of the RM web application. If only a host is provided as the value, the webapp will be served on a random port.
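A quick way to confirm where the ResourceManager web application is actually listening (a sketch for Windows 10; assumes curl is available and the ResourceManager is running on the default port):
:: check whether anything is bound to port 8088
netstat -ano | findstr :8088
:: if the ResourceManager is up, its REST endpoint answers on the same port
curl http://localhost:8088/ws/v1/cluster/info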

Access Hadoop nodes' web UI from multiple links

I am using the following setting for access to the Hadoop nodes' web UI:
dfs.namenode.http-address : 127.0.0.1:50070
With this I am able to access the web UI link only from the local machine, as:
http://127.0.0.1:50070
Is there any way to make it accessible from outside as well? For example:
http://<Machine-IP>:50070
Thanks in advance!
You can use the hostname or IP address instead of localhost/127.0.0.1.
Make sure you can ping the hostname or IP from the remote machine. If you can ping it, then you should be able to access the web UI.
To ping it:
Open cmd/terminal
Type the below command on the remote machine
ping hostname/ip
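A sketch of the corresponding hdfs-site.xml change (MACHINE_IP is a placeholder for the machine's own IP or hostname; 0.0.0.0 would also work if you want the UI reachable on every interface, as discussed earlier on this page). Restart the NameNode afterwards:
<property>
  <name>dfs.namenode.http-address</name>
  <value>MACHINE_IP:50070</value>
  <!-- was 127.0.0.1:50070, which only accepts connections from the local machine -->
</property>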
From http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html
The following table lists web interfaces that you can view on the core
and task nodes. These Hadoop interfaces are available on all clusters.
To access the following interfaces, replace slave-public-dns-name in
the URI with the public DNS name of the node. For more information
about retrieving the public DNS name of a core or task node instance,
see Connecting to Your Linux/Unix Instances Using SSH in the Amazon
EC2 User Guide for Linux Instances. In addition to retrieving the
public DNS name of the core or task node, you must also edit the
ElasticMapReduce-slave security group to allow SSH access over TCP
port 22. For more information about modifying security group rules,
see Adding Rules to a Security Group in the Amazon EC2 User Guide for
Linux Instances.
YARN ResourceManager
YARN NodeManager
Hadoop HDFS NameNode
Hadoop HDFS DataNode
Spark HistoryServer
Because there are several application-specific interfaces available on
the master node that are not available on the core and task nodes, the
instructions in this document are specific to the Amazon EMR master
node. Accessing the web interfaces on the core and task nodes can be
done in the same manner as you would access the web interfaces on the
master node.
There are several ways you can access the web interfaces on the master
node. The easiest and quickest method is to use SSH to connect to the
master node and use the text-based browser, Lynx, to view the web
sites in your SSH client. However, Lynx is a text-based browser with a
limited user interface that cannot display graphics. The following
example shows how to open the Hadoop ResourceManager interface using
Lynx (Lynx URLs are also provided when you log into the master node
using SSH).
lynx http://ip-###-##-##-###.us-west-2.compute.internal:8088/
There are two remaining options for accessing web interfaces on the
master node that provide full browser functionality. Choose one of the
following:
Option 1 (recommended for more technical users): Use an SSH client to connect to the master node, configure SSH tunneling with local port
forwarding, and use an Internet browser to open web interfaces hosted
on the master node. This method allows you to configure web interface
access without using a SOCKS proxy.
To do this, use the command
$ ssh -gnNT -L 9002:localhost:8088 user@example.com
where user@example.com is your username and the master node's address. Note the use of -g to allow connections from external IP addresses (beware: this is a security risk).
You can check that the tunnel is running using
nmap localhost
To close the SSH tunnel when done, use
ps aux | grep 9002
to find the PID of your running ssh process and kill it.
Option 2 (recommended for new users): Use an SSH client to connect to the master node, configure SSH tunneling with dynamic port
forwarding, and configure your Internet browser to use an add-on such
as FoxyProxy or SwitchySharp to manage your SOCKS proxy settings. This
method allows you to automatically filter URLs based on text patterns
and to limit the proxy settings to domains that match the form of the
master node's DNS name. The browser add-on automatically handles
turning the proxy on and off when you switch between viewing websites
hosted on the master node, and those on the Internet. For more
information about how to configure FoxyProxy for Firefox and Google
Chrome, see Option 2, Part 2: Configure Proxy Settings to View
Websites Hosted on the Master Node.
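A sketch of the dynamic-forwarding command behind Option 2 (the key path, the local SOCKS port 8157, and the hadoop login user are assumptions; substitute your own key and the master node's public DNS name):
ssh -i ~/mykeypair.pem -N -D 8157 hadoop@master-public-dns-name
# -D 8157 opens a local SOCKS proxy; point FoxyProxy/SwitchySharp at localhost:8157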
This seems like insanity to me, but I have been unable to find how to configure access in core-site.xml to override the web interface for the ResourceManager, which by default is available at localhost:8088/. If Amazon thinks this is the way, then I tend to go along with it.

Hadoop namenode web UI not opening in CDH4

I recently installed the CDH distribution of Cloudera to create a 2 node cluster. From the Cloudera Manager UI, all services are running fine.
All the command line tools (hive etc ) are also working fine and I am able to read and write data to hdfs.
However, the NameNode (and DataNode) web UI alone is not opening. Checking with netstat -a | grep LISTEN, the processes are listening on the assigned ports, and there are no firewall rules blocking the connections (I already disabled iptables).
I initially thought it could be a DNS issue, but even the IP address is not working. The Cloudera Manager, installed on the same machine on another port, opens fine.
Any pointers on how to debug this problem?
I had faced the same issue.
First it was because the NameNode was in safe mode.
Then it was because of two IP addresses (I have two NICs configured on the CDH cluster: one for internal connectivity between the servers (10.0.0.1) and the other to reach the servers from the Internet (192.168.0.1)).
When I try to open the NameNode GUI from any server connected to the cluster on the 10.0.0.1 network, the GUI opens and works fine, but from any other machine connected to the servers via the 192.168.0.1 network it fails.
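Two quick checks that correspond to those two causes (a sketch, run on the NameNode host; 50070 is the usual CDH4-era NameNode HTTP port):
# is the NameNode still in safe mode?
hdfs dfsadmin -safemode get
# which local address is the NameNode web server actually bound to?
netstat -tlnp | grep 50070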

Configuring a slave's hostname using internal IP - Multiple NICs

In my Hadoop environment, I need to configure my slave nodes so that when they communicate in the middle of a map/reduce job, they use the internal IP instead of the external IP that they pick up from the hostname.
Is there any way to set up my Hadoop config files to specify that the nodes should communicate using the internal IPs instead of the external IPs? I've already used the internal IPs in my core-site.xml, master, and slave files.
I've done some research and I've seen people mention the "slave.host.name" parameter, but which config file would I place this parameter in? Are there any other solutions to this problem?
Thanks!
The IP routing tables would have to be changed so that the network between the Hadoop nodes uses a particular gateway; I don't think Hadoop has any setting to choose which gateway to use.
You can configure slave.host.name in mapred-site.xml for each slave node (a sketch follows below).
Also remember to use that host name (instead of IP) consistently for all other configurations (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves) and also /etc/hosts file.
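A minimal sketch of the per-node entry in mapred-site.xml (the value shown is a placeholder for each slave's own internal hostname or IP):
<property>
  <name>slave.host.name</name>
  <value>INTERNAL_HOSTNAME_OF_THIS_NODE</value>
  <!-- placeholder: set each slave's own internal hostname or IP here -->
</property>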
