Yarn UI shows no information about applications - hadoop

I know that a similar question was asked, Applications not shown in yarn UI when running mapreduce hadoop job?,
but the answers did not solve my problem.
I am running Hadoop streaming on Linux 17.01. I set up a cluster with 3 nodes and 1 master node.
When I start Hadoop, I can access localhost:50070 and see the other nodes (all nodes are alive).
However, I see no information under "Applications" on localhost:8088,
nor from the command "yarn application -list -appStates ALL".
Here is my configuration.
My yarn-site.xml (for all nodes)
Here are all the processes on the master node
The problem may be due to the YARN services running on IPv6. However, I followed this thread
https://askubuntu.com/questions/440649/how-to-disable-ipv6-in-ubuntu-14-04
to switch all YARN services to IPv4. Still, there are no tasks displayed in the YARN UI, even though I can see all nodes in my cluster marked as "active" there.
So, I do not know why this happened. Do you have any suggestion?
Thank you very much.

I haven't typically seen YARN being configured for IPv4, but this property is added to hadoop-env.sh:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
I'm fairly sure you would also add a similar variable to yarn-env.sh for YARN_OPTS.
However, it's not really clear from your question when, or whether, you've even submitted an application for anything to appear.
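As a minimal sketch of the env-file change (the exact files and whether YARN_OPTS is honored depend on your distribution, so treat this as an assumption; the snippet writes to /tmp only so the effect can be demonstrated, while the real lines belong in hadoop-env.sh and yarn-env.sh):

```shell
# Sketch: force IPv4 for the Hadoop and YARN JVMs.
# In a real install these two export lines go into
# hadoop-env.sh and yarn-env.sh respectively.
cat > /tmp/hadoop-env-snippet.sh <<'EOF'
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export YARN_OPTS="$YARN_OPTS -Djava.net.preferIPv4Stack=true"
EOF

# Source the snippet and confirm the flag is present:
. /tmp/hadoop-env-snippet.sh
echo "$HADOOP_OPTS"
```

After restarting the daemons, submitting any job (even a trivial streaming one) should make an entry appear under `yarn application -list`.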

Related

YARN applications getting launched as soon as the Hadoop services come up, on a 4-node Hadoop HA cluster

Hadoop HA cluster - 4 nodes
As soon as I start the Hadoop services, unnecessary YARN applications get launched, and no application logs get generated. I am not able to debug the problem without logs. Can anyone help me resolve this issue?
https://i.stack.imgur.com/RjvkB.png
I've never come across such an issue, but it seems that some script, or maybe an Oozie job, is triggering these apps. Try Yarn-Clean and see if it helps.
Yarn-Clean
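If you need to clear the stray applications by hand in the meantime, a hedged sketch follows. The here-string stands in for real `yarn application -list` output (the column layout is an approximation); on a live cluster you would pipe the command itself instead, and uncomment the kill line:

```shell
# Extract application IDs from (sample) `yarn application -list` output
# and show what would be killed.
sample='Total number of applications (application-types: [] and states: [RUNNING]):2
                Application-Id      Application-Name      State
application_1520000000000_0001      launcher-job          RUNNING
application_1520000000000_0002      launcher-job          RUNNING'

# Each YARN application ID has the form application_<clusterTimestamp>_<seq>:
app_ids=$(printf '%s\n' "$sample" | grep -o 'application_[0-9]*_[0-9]*')

for id in $app_ids; do
  echo "would kill: $id"
  # yarn application -kill "$id"   # uncomment on a real cluster
done
```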

How can I tell how many nodes are in my YARN cluster

I want to find out how many nodes are in the YARN cluster I'm working on.
I don't have any Ambari admin privileges, but I can see the YARN UI page, including the nodes page,
where I can see there are 4 nodes.
However, does that mean there are 4 nodes running the YARN NodeManager, or 4 nodes that already have YARN containers spun up?
Don't direct me to the yarn documentation!
I've already been looking through it.
You could just ask the people who do have access to Ambari (maybe even ask if they'll give you a read-only account).
The YARN UI tells you both, though. It tells you how many nodes are available, the sum of all available node memory, and the number of running, pending, and allocated containers.
If no containers are up, the nodes are still listed.
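You can also confirm the NodeManager count from the command line with `yarn node -list`, which prints one row per live NodeManager regardless of whether any containers are running on it. A sketch against sample output (the column layout below is an approximation of the real command's; on a real cluster replace the here-string with the command itself):

```shell
# Count NodeManagers by counting RUNNING rows in `yarn node -list` output.
sample='Total Nodes:4
         Node-Id    Node-State  Node-Http-Address  Number-of-Running-Containers
node1:45454         RUNNING     node1:8042         0
node2:45454         RUNNING     node2:8042         0
node3:45454         RUNNING     node3:8042         2
node4:45454         RUNNING     node4:8042         1'

# Only data rows contain the state RUNNING surrounded by whitespace:
nm_count=$(printf '%s\n' "$sample" | grep -c ' RUNNING ')
echo "NodeManagers: $nm_count"
```

Note that the last column counts containers, so in this sample 4 NodeManagers are up even though only two of them are running containers.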

Legacy UI in Hadoop 2.7.0

How can we enable Legacy UI in Hadoop 2.7.0?
http://localhost:50070/dfshealth.html#tab-overview
I assume you are playing with a pseudo-distributed (single-node) cluster. The namenode hosts that web service, so you should be able to open the UI as long as the namenode is started, even with only a minimal number of properties set in hdfs-site.xml and core-site.xml.
So it mostly looks like your namenode is not running properly. The first thing to do is check the namenode log, hdfs-hdfs-namenode-...log, and see what happened.
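For reference, a minimal sketch of those two files for a pseudo-distributed setup (the port 9000 and the replication value are common choices, not requirements; adjust for your install):

```xml
<!-- core-site.xml: where clients and daemons find the namenode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single-node setup cannot replicate beyond 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place and the namenode formatted and started, http://localhost:50070/dfshealth.html should respond on Hadoop 2.7.0.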

How to submit a job to specific nodes in Hadoop?

I have a Hadoop cluster with 1 master and 5 slaves. Is there any way of submitting jobs to a specific set of slaves? Basically, I am trying to benchmark my application under many configurations, so after testing with 5 slaves I would like to run it with 4 slaves, then 3 slaves, and so on.
Currently the only way I know of is decommissioning a slave and removing it from the Hadoop cluster, but that seems tedious. I was wondering if there is an easier approach that avoids removing a node from the cluster.
Thanks.
In hadoop/conf there is a file called 'slaves'. Here you can simply add or remove nodes, then restart your dfs and mapred.
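A sketch of that workflow (writing to /tmp only for illustration; the real file is $HADOOP_HOME/conf/slaves in this layout, and the hostnames are placeholders):

```shell
# Shrink the cluster to 4 workers by rewriting the slaves file,
# then restart dfs and mapred (restart lines commented out,
# since they need a real Hadoop install).
printf '%s\n' slave1 slave2 slave3 slave4 > /tmp/slaves

# Sanity check: one hostname per line.
wc -l < /tmp/slaves

# $HADOOP_HOME/bin/stop-all.sh && $HADOOP_HOME/bin/start-all.sh
```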
There is a setting in mapred-site.xml that points to a file with a list of excluded hosts. Though also a bit cumbersome, changing a single configuration value might be preferable to physically decommissioning and recommissioning multiple nodes. You could prepare multiple host-exclusion files in advance, change the setting, and restart the MapReduce service; restarting it is pretty quick.
In 0.23 this setting is named mapreduce.jobtracker.hosts.exclude.filename. The feature was introduced in 0.21, though I believe the setting was named mapred.hosts.exclude then. Check what the setting is called for the version of Hadoop you are using.
For those who encounter this problem, the comments from Alex and the linked Stack Overflow question will help in successfully decommissioning a node from a Hadoop cluster.
EDIT: Just editing hdfs-site.xml and mapred-site.xml and executing hadoop dfsadmin -refreshNodes might leave your datanode in "decommissioning" status for a long time, so it is also necessary to change dfs.replication to an appropriate value.
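A sketch of the exclude-file approach (the property name is version-dependent, as noted; the older mapred.hosts.exclude naming is shown here, and the file path is an assumption):

```xml
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.hosts.exclude</name>
    <value>/etc/hadoop/conf/mapred.exclude</value>
  </property>
</configuration>
```

List the hosts to drop, one per line, in the exclude file, then refresh the node lists (e.g. hadoop mradmin -refreshNodes on the MapReduce side and hadoop dfsadmin -refreshNodes on the HDFS side) rather than restarting everything.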

Can Hadoop Yarn run a grid?

Can a Hadoop YARN instance manage nodes from different places on Earth, on different networks? Can it manage nodes that use different platforms?
Everything I have read about YARN says it manages clusters, but if the app I deploy is written in Java, then it should probably work on the nodes regardless of the nodes' hardware.
Similarly, YARN seems general enough to support more than just a LAN.
YARN is not platform-aware. It is also not aware of how application processes on different hosts communicate with each other to perform the work.
At the same time, a YARN application master is launched as a command line, and therefore any node in the cluster with enough resources should be able to run it.
If not every platform is capable of running a specific application master, then YARN would need to be aware of that. Today it cannot, but I can imagine platform being treated as a special kind of resource, with YARN then selecting an appropriate node.
Regarding the LAN: if you have an application master that knows how to manage a job across several LANs, it should be fine with YARN.
