How can I tell how many nodes are in my YARN cluster - hadoop

I want to find out how many nodes are in a yarn cluster that I'm working on.
I don't have any admin Ambari privileges and I can see the yarn UI page and I can see the nodes page.
I can see there are 4 nodes.
However, can someone tell me does that mean there are 4 nodes running yarn Node Manager, or is that 4 nodes that have yarn containers already spun up?
Don't direct me to the yarn documentation!
I've already been looking through it.

You could just ask the people who do have access to Ambari (maybe even ask if they let you get a read only account).
The YARN UI tells you both, though. It tells you how many nodes are available, the sum of all available node memory, and the number of running, pending, and allocated containers.
If no containers are up, the nodes are still listed

Related

Yarn UI shows no information about applications

I know that the similar question was asked Applications not shown in yarn UI when running mapreduce hadoop job?
but the answers did not solve my problems.
I am running Hadoop streaming on Linux 17.01. I setup a cluster with 3 nodes and 1 master node.
When I start Hadoop, I can access localhost:50070 to see other nodes (all nodes are alive).
However, I see no information in "Application" of localhost:8088
as well as by command "yarn application -list -appStates ALL".
Here is my configuration.
My yarn-site.xml (for all nodes)
Here is all processes on master node
The problems may due to yarn services are running on ipv6. However, I followed I followed this thread
https://askubuntu.com/questions/440649/how-to-disable-ipv6-in-ubuntu-14-04
to change all Yarn services to ipv4. However, still there is no tasks displayed on Yarn UI, even I can see all nodes in my cluster marked as "active" on Yarn UI.
So, I do not know why this happened. Do you have any suggestion?
Thank you very much.
I haven't typically seen YARN being configured for IPv4, but this property is added into the hadoop-env.sh
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
I'm sure you also add a similar variable into the yarn-env.sh for YARN_OPTS, I think
However, it's not really clear from the your question when / if you've even submitted an application for anything to appear

Can we run two 'slave' nodes on the same machines?

We are running a 3 node mesos cluster and mesos master is running on each node. Also, 2 slaves are running on each node. Is this a good practice? 2 slaves on each cluster won't be sending too much offer and end up being overloaded? What is the recommended config for 3 nodes cluster?
Thread from Mesos User Mailing List
It depends on your isolation setting (mainly cgroup, or any node level
resources). In general, we don't recommend folks use multiple agents on a
node.
It's possible to make it work by setting cgroup_root separately for
MesosContainerizer. For DockerContainerizer, currently, we hard code
DOCKER_NAME_PREFIX, making it not possible to use two agents on a node
properly.
Running Docker containers won't work properly because restarting one agent
will cause Docker containers managed by the other agent to be deleted.

How to allocate physical resources for a big data cluster?

I have three servers and I want to deploy Spark Standalone Cluster or Spark on Yarn Cluster on that servers.
Now I have some questions about how to allocate physical resources for a big data cluster. For example, i want to know whether i can deploy Spark Master Process and Spark Worker Process on the same node. Why?
Server Details:
CPU Cores: 24
Memory: 128GB
I need your help. Thanks.
Of course you can, just put host with Master in slaves. On my test server I have such configuration, master machine is also worker node and there is one worker-only node. Everything is ok
However be aware, that is worker will fail and cause major problem (i.e. system restart), then you will have problem, because also master will be afected.
Edit:
Some more info after question edit :) If you are using YARN (as suggested), you can use Dynamic Resource Allocation. Here are some slides about it and here article from MapR. It a very long topic how to configure memory properly for given case, I think that these resources will give you much knowledge about it
BTW. If you have already intalled Hadoop Cluster, maybe try YARN mode ;) But it's out of topic of question

Could not determine the current leader

I'm in this situation in which I got two masters and four slaves in mesos. All of them are running fine. But when I'm trying to access marathon I'm getting the 'Could not determine the current leader' error. I got marathon in both masters (117 and 115).
This is basically what I'm running to get marathon up:
java -jar ./bin/../target/marathon-assembly-0.11.0-SNAPSHOT.jar --master 172.16.50.117:5050 --zk zk://172.16.50.115:2181,172.16.50.117:2181/marathon
Could anyone shed some light over this?
First, I would double-check that you're able to talk to Zookeeper from the Marathon hosts.
Next, there are a few related points to be aware of:
Per the Zookeeper administrator's guide (http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup) you should have an odd number of Zookeeper instances for HA. A cluster size of two is almost certainly going to turn out badly.
For a highly available Mesos cluster, you should run an odd number of masters and also make sure to set the --quorum flag appropriately based on that number. See the details of how to set the --quorum flag (and why it's important) in the operational guide on the Apache Mesos website here: http://mesos.apache.org/documentation/latest/operational-guide
In a highly-available Mesos cluster (#masters > 1) you should let both the Mesos agents and the frameworks discover the leading master using Zookeeper. This lets them rediscover the leading master in case a failover occurs. In your case assuming canonical ZK ports you would set the --zk flag on the Mesos masters to --zk=zk://172.16.50.117:2181,172.16.50.115:2181/mesos (add a third ZK instance, see the first point above). The same value should be used for the --master flags in both the Mesos agents and Marathon, instead of specifying a single master.
It's best to run an odd number of masters in your cluster. To do so, either add another master so you have three or remove one so you have only one.

Can Hadoop Yarn run a grid?

Can a Hadoop Yarn instance manage nodes from different places on Earth, networks? Can it manage nodes that use different platforms?
Every note about Yarn I found tells that Yarn manages clusters, but if the app I deploy is written in Java then it should probably work on the nodes regardless of the nodes' hardware.
Similarly, Yarn seems general enough to support more than just a LAN.
YARN is not platform aware. It is also not aware about how application processes on different hosts communicate with each other to perform the work.
In the same time for YARN application master should be run as a command line - and thereof any node on the cluster with enough resources should be able to run it.
If not every platform is capable to run specific app master- then YARN should be aware on it. Today it can not, but I can imegine platform to be special kind of resource - and then YARN will select appropriate node
Regarding LAN if you have application master which knows how to manage job over several LAN - it is should be fine with YARN.

Resources