Access Hadoop nodes' web UI from multiple links - hadoop

I am using the following setup for Hadoop's node web UI access:
dfs.namenode.http-address : 127.0.0.1:50070
With this I am able to access the node's web UI link only from the local machine, as:
http://127.0.0.1:50070
Is there any way I can make it accessible from outside as well? Say, like:
http://<Machine-IP>:50070
Thanks in advance!

You can use the hostname or IP address instead of localhost/127.0.0.1.
Make sure you can ping the hostname or IP from the remote machine. If you can ping it, then you should be able to access the web UI.
To ping it:
Open cmd/terminal
Type the below command on the remote machine:
ping hostname/ip
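For example, a minimal hdfs-site.xml sketch (assuming Hadoop 2.x defaults; 0.0.0.0 binds the UI to all interfaces, or use the machine's own IP to restrict it):
<!-- hdfs-site.xml: make the NameNode web UI reachable from other machines -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>
Restart the NameNode after changing this, and make sure no firewall is blocking port 50070.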

From http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html
The following table lists web interfaces that you can view on the core
and task nodes. These Hadoop interfaces are available on all clusters.
To access the following interfaces, replace slave-public-dns-name in
the URI with the public DNS name of the node. For more information
about retrieving the public DNS name of a core or task node instance,
see Connecting to Your Linux/Unix Instances Using SSH in the Amazon
EC2 User Guide for Linux Instances. In addition to retrieving the
public DNS name of the core or task node, you must also edit the
ElasticMapReduce-slave security group to allow SSH access over TCP
port 22. For more information about modifying security group rules,
see Adding Rules to a Security Group in the Amazon EC2 User Guide for
Linux Instances.
YARN ResourceManager
YARN NodeManager
Hadoop HDFS NameNode
Hadoop HDFS DataNode
Spark HistoryServer
Because there are several application-specific interfaces available on
the master node that are not available on the core and task nodes, the
instructions in this document are specific to the Amazon EMR master
node. Accessing the web interfaces on the core and task nodes can be
done in the same manner as you would access the web interfaces on the
master node.
There are several ways you can access the web interfaces on the master
node. The easiest and quickest method is to use SSH to connect to the
master node and use the text-based browser, Lynx, to view the web
sites in your SSH client. However, Lynx is a text-based browser with a
limited user interface that cannot display graphics. The following
example shows how to open the Hadoop ResourceManager interface using
Lynx (Lynx URLs are also provided when you log into the master node
using SSH).
lynx http://ip-###-##-##-###.us-west-2.compute.internal:8088/
There are two remaining options for accessing web interfaces on the
master node that provide full browser functionality. Choose one of the
following:
Option 1 (recommended for more technical users): Use an SSH client to connect to the master node, configure SSH tunneling with local port
forwarding, and use an Internet browser to open web interfaces hosted
on the master node. This method allows you to configure web interface
access without using a SOCKS proxy.
To do this, use the command
$ ssh -gnNT -L 9002:localhost:8088 user@example.com
where user@example.com is your username on the master node. Note the use of -g to open access to external IP addresses (beware: this is a security risk).
You can check this is running using
nmap localhost
To close this SSH tunnel when done, use
ps aux | grep 9002
to find the PID of your running ssh process and kill it.
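A shorter alternative, assuming pkill is available and no other process on the machine matches this pattern:
$ pkill -f "ssh -gnNT -L 9002:localhost:8088"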
Option 2 (recommended for new users): Use an SSH client to connect to the master node, configure SSH tunneling with dynamic port
forwarding, and configure your Internet browser to use an add-on such
as FoxyProxy or SwitchySharp to manage your SOCKS proxy settings. This
method allows you to automatically filter URLs based on text patterns
and to limit the proxy settings to domains that match the form of the
master node's DNS name. The browser add-on automatically handles
turning the proxy on and off when you switch between viewing websites
hosted on the master node, and those on the Internet. For more
information about how to configure FoxyProxy for Firefox and Google
Chrome, see Option 2, Part 2: Configure Proxy Settings to View
Websites Hosted on the Master Node.
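As a sketch of the dynamic port forwarding step (the port 8157 and key path are placeholders; hadoop is the default login user on an EMR master):
$ ssh -i ~/mykey.pem -N -D 8157 hadoop@master-public-dns-name
FoxyProxy or SwitchySharp would then be pointed at the SOCKS proxy on localhost:8157.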
This seems like insanity to me, but I have been unable to find out how to configure access in core-site.xml to override the web interface for the ResourceManager, which by default is available at localhost:8088/. If Amazon thinks this is the way, then I tend to go along with it.

Related

Connect to YARN UI web interface from ssh

I have to run some benchmarks for a Spark/Scala application.
I'm working on a cluster with 9 nodes. I access my account with ssh -i "mykey" user@host.com (through Ubuntu on a VM, or the Bitvise SSH client on Windows).
Now I have to test my application's runtime performance, but I don't know how to browse the "master:8088/cluster" link from the terminal or VM. I need to reach the YARN web UI and HDFS. I have read a lot of articles about SSH tunneling and port forwarding.
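The usual pattern those articles describe is local port forwarding, roughly like this (key, user, and hostnames are placeholders; master must be resolvable from the gateway host):
$ ssh -i "mykey" -N -L 8088:master:8088 user@host.com
after which http://localhost:8088/cluster should open in a local browser.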

How to access Hadoop from another machine on same network?

I have one machine running Hadoop and I do want to access Ambari by another machine on the same network.
How can I do that?
For security reasons, ports used by Hadoop cannot be accessed over a public IP address. To connect to Hadoop from a different machine, you must open the port of the service you want to access remotely. I believe OpenStack provides the option of assigning floating IPs; that's one option for enabling remote access. You must create a port-redirect rule in your VM software.
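If you'd rather not open the port at all, an SSH tunnel is a sketchable alternative (8080 is Ambari's default web port; user and host are placeholders):
$ ssh -N -L 8080:localhost:8080 user@hadoop-machine
then browse http://localhost:8080 from the other machine.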

Pull data from icinga satellite to master behind firewall

I have the following situation:
A private enterprise network with an Icinga2 master, monitoring the internal servers. The firewall blocks all inbound access; however, all servers do have outbound internet access (multiple protocols, such as SSH, HTTP, HTTPS).
We also have an environment in Azure with one publicly accessible VM (nginx) and, behind that, in a private network, application servers. I'd also like to monitor these servers. I read that I can set up an Icinga2 satellite (in Azure) that monitors the Azure environment and sends the data to the master.
This would be a great solution. However, my master is in our private enterprise network, so the Icinga2 satellite can't push any data to the master. The only option would be for the master to pull the data periodically from the satellite(s). It is possible for the master to log in via SSH agent forwarding to the servers in Azure. Is this possible, or is there a better solution? I'd rather not create a reverse SSH tunnel.
You might just use the Icinga2 client and let the master connect to the client (ensure that the endpoint object contains host/port attributes). Once the connection is established, the client will send its check results (and historical replay logs, if there are any).
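A minimal sketch of such an endpoint definition on the master (names and address are placeholders; 5665 is Icinga2's default cluster port):
object Endpoint "azure-satellite.example.com" {
  host = "azure-public.example.com"  // address the master can reach
  port = 5665                        // Icinga2 cluster default
}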

Job tracking URL in Google Compute engine not working

I am using Google Compute Engine to run MapReduce jobs on Hadoop (pretty much all default configs). While running a job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/application_X_Y/, but it fails to open. Did I forget to configure something?
To elaborate on the option Amal mentioned in the other answer of using the "external ip address" of your Google Compute Engine VM, you can obtain the external IP address by running gcloud compute instances describe --zone <your zone> <your master hostname> and looking for natIP.
To open port 8088, you'll have to set up a firewall rule opening that port, likely on your default Google Compute Engine network. You'll want to specify a your.ip.address.here/32 address in --source-ranges to restrict incoming traffic to just your local machine dialing into your VM; otherwise anyone in the IP source-ranges would be able to access your Hadoop pages.
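A sketch of such a rule with gcloud (the rule name is a placeholder; substitute your own address):
$ gcloud compute firewall-rules create allow-yarn-8088 \
    --network default --allow tcp:8088 \
    --source-ranges your.ip.address.here/32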
If you had used bdutil to turn up your cluster, there's an alternative way which is much easier and more secure; simply run
bdutil <your flags used in deployment, like -e hadoop2, --prefix, etc.> socksproxy
to open SSH with dynamic port forwarding to use as a SOCKS5 proxy that your browser can point to. If you're running on Linux or Mac and have Chrome or Firefox installed, bdutil should also print out a copy/paste command for starting a fresh isolated browser pre-configured to use the socks proxy so that you can click through all the useful links.
If bdutil didn't print out a browser command or you didn't use bdutil, you can also run and configure your SSH socks proxy using these instructions. An SSH-based socks proxy is more secure than opening up firewall ports, and also allows the Hadoop page links to work (otherwise you have to keep manually replacing the hostnames with the external IP addresses).
One correction: you are using YARN, so there is no JobTracker. The JobTracker is present in Hadoop 1.x; in YARN, the processing layer became a generic framework and the JobTracker was replaced with the ResourceManager and ApplicationMaster. The UI that you mentioned in the question is the ResourceManager's.
For your problem, try the following tips.
Use the public IP address of the ResourceManager instance instead of PROJECT_NAME.
Check whether port 8088 is open for access from outside.
Another (more secure) way to do this is to use gcloud compute to make an ssh tunnel to your deployment, and then launch Chrome though it.
$ gcloud compute ssh clustername --zone=us-central1-a --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n"
You will need to replace clustername with the name of your deployment, and change the --zone if necessary.
From there, you can launch Chrome through it and then reach the hadoop job tracking URL.
$ chrome --proxy-server="socks5://localhost:1080" \
--host-resolver-rules="MAP * 0.0.0.0 , \
EXCLUDE localhost" --user-data-dir=/tmp/clustername

Amazon EMR Application Master web UI?

I have started running Pig jobs on Amazon EMR using Hadoop YARN (AMI 3.3.1). However, as there is no longer a JobTracker in YARN, I can't seem to find a web UI that lets me track the number of mappers and reducers for a MapReduce job. When I try to access the Application Master link provided on the ResourceManager UI page, I am told that the page doesn't exist.
Does anyone know how I can access a UI through my web browser that will show me the current job status in terms of the number of mappers, reducers, the % completed for each, etc.?
Thanks
Once you click the ApplicationMaster link from the ResourceManager webpage, you'll be redirected to the ApplicationMaster web UI. EMR uses EC2 instances, and each EC2 instance has two IP addresses associated with it: one used for private communication and another for public. Because EMR uses private IP addresses (private DNS) to set up Hadoop, you'll be redirected to a URL like this:
http://10.204.137.136:9046/proxy/application_1423027388806_0003/
which, as you can see, points to the instance's private IP address, so your browser cannot resolve it. You just have to replace the private IP address with the public IP address (or public DNS name) of that instance:
Obtaining the public IP address of an instance
Using the EC2 web interface:
You can log in to the AWS EC2 console and find the instance's IP addresses.
From the instance itself:
If you are logged into the instance and want to know its public IP address, issue the following command, which will return the public IP address of that instance.
curl http://169.254.169.254/latest/meta-data/public-ipv4
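The same metadata service can also return the public DNS name, if one is assigned:
curl http://169.254.169.254/latest/meta-data/public-hostname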
Also take a look at this AWS documentation page on how to view web interfaces; it provides other options, like setting up SSH tunneling and using a SOCKS proxy.
