Could not determine the current leader - mesos

I'm in a situation where I have two masters and four slaves in Mesos. All of them are running fine, but when I try to access Marathon I get the 'Could not determine the current leader' error. I have Marathon on both masters (117 and 115).
This is basically what I'm running to get marathon up:
java -jar ./bin/../target/marathon-assembly-0.11.0-SNAPSHOT.jar --master 172.16.50.117:5050 --zk zk://172.16.50.115:2181,172.16.50.117:2181/marathon
Could anyone shed some light over this?

First, I would double-check that you're able to talk to Zookeeper from the Marathon hosts.
Next, there are a few related points to be aware of:
Per the Zookeeper administrator's guide (http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup) you should have an odd number of Zookeeper instances for HA. A cluster size of two is almost certainly going to turn out badly.
For a highly available Mesos cluster, you should run an odd number of masters and also make sure to set the --quorum flag appropriately based on that number. See the details of how to set the --quorum flag (and why it's important) in the operational guide on the Apache Mesos website here: http://mesos.apache.org/documentation/latest/operational-guide
In a highly available Mesos cluster (#masters > 1) you should let both the Mesos agents and the frameworks discover the leading master using Zookeeper. This lets them rediscover the leading master if a failover occurs. In your case, assuming canonical ZK ports, you would set the --zk flag on the Mesos masters to --zk=zk://172.16.50.117:2181,172.16.50.115:2181/mesos (and add a third ZK instance, see the first point above). The same value should be used for the --master flag in both the Mesos agents and Marathon, instead of specifying a single master.
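For example, a minimal sketch of what the commands might look like with a hypothetical third ZooKeeper/master node at 172.16.50.116 (that address, the quorum value and the work_dir are assumptions, adjust to your setup):

# On each Mesos master: discover peers via ZooKeeper, quorum of 2 out of 3 masters
mesos-master --zk=zk://172.16.50.115:2181,172.16.50.116:2181,172.16.50.117:2181/mesos --quorum=2 --work_dir=/var/lib/mesos

# Marathon: point --master at ZooKeeper instead of a single master host
java -jar ./bin/../target/marathon-assembly-0.11.0-SNAPSHOT.jar --master zk://172.16.50.115:2181,172.16.50.116:2181,172.16.50.117:2181/mesos --zk zk://172.16.50.115:2181,172.16.50.116:2181,172.16.50.117:2181/marathon

This way, when the leading master changes, both the agents and Marathon can rediscover the new leader through ZooKeeper.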

It's best to run an odd number of masters in your cluster. To do so, either add another master so you have three or remove one so you have only one.

Related

Can we run two 'slave' nodes on the same machine?

We are running a 3-node Mesos cluster, and a Mesos master is running on each node. Also, 2 slaves are running on each node. Is this a good practice? Won't 2 slaves on each node send too many offers and end up being overloaded? What is the recommended configuration for a 3-node cluster?
Thread from Mesos User Mailing List
It depends on your isolation setting (mainly cgroup, or any node level resources). In general, we don't recommend folks use multiple agents on a node.
It's possible to make it work by setting cgroup_root separately for MesosContainerizer. For DockerContainerizer, currently, we hard code DOCKER_NAME_PREFIX, making it not possible to use two agents on a node properly.
Running Docker containers won't work properly because restarting one agent will cause Docker containers managed by the other agent to be deleted.
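If you really do need two agents on one node with the Mesos containerizer, the kind of separation the quote refers to would look roughly like this (flag values are illustrative, not a recommendation; the binary is mesos-slave on older releases):

# Agent 1
mesos-agent --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos --port=5051 --work_dir=/var/lib/mesos/agent1 --cgroups_root=mesos_agent1

# Agent 2: separate port, work_dir and cgroup hierarchy
mesos-agent --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos --port=5052 --work_dir=/var/lib/mesos/agent2 --cgroups_root=mesos_agent2

Even then, the Docker containerizer caveat above still applies.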

Is it possible to create a Hadoop cluster by connecting three single-node machines?

If I have three virtual machines, each with a Cloudera Hadoop single-node install, is it possible to create a cluster by connecting the three of them?
For example, one as the namenode and the other two as datanodes.
I am following this documentation...
Of course you can connect them, and things should be easy once all hosts run in pseudo-distributed mode (all the daemons on the same host).
In theory, all you have to do is change the configuration on all 3 hosts.
In practice you also have to read this, because things are a bit different.
The first external datanode is hard work; any others will follow with no problems.
This tutorial provides exactly what you need. HTH.
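As a rough sketch of the configuration change (hostnames, port and paths below are placeholders, not taken from the question): on all three hosts, point HDFS at the machine you pick as the namenode, and on the namenode list the other two machines as workers.

# core-site.xml on all three hosts: fs.defaultFS must point at the namenode, e.g.
#   <property><name>fs.defaultFS</name><value>hdfs://master-host:8020</value></property>

# slaves file on the namenode host: one datanode hostname per line
printf 'worker-host-1\nworker-host-2\n' > $HADOOP_HOME/etc/hadoop/slaves

# then restart the HDFS/YARN daemons on all hosts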

Determine whether slave nodes in a Hadoop cluster have been assigned tasks

I'm new to Hadoop and MapReduce. I just deployed a Hadoop cluster with one master machine and 32 slave machines. However, when I run an example program, it seems to run very slowly. How can I determine whether a map/reduce task has really been assigned to a slave node for execution?
The example program is executed like this:
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 32 100
Okay, there are lots of possibilities there. Hadoop is there to help with distributed tasks.
If your code is written in a way that everything depends on everything else, then there is no benefit to 32 slaves; you will just pay the overhead of managing the connections.
Check http://hadoopMasterIp:50070 to see whether all the datanodes (slaves) are running, assuming you did not change dfs.http.address (configured in hdfs-site.xml) from its default.
The easiest way is to take a look at the YARN Web UI. By default it uses port 8088 on your master node (replace master in the URI with your own IP address):
http://master:8088/cluster
There you can see total resources of your cluster and list of all applications. For every application you can find out how many mappers/reducers were used and where (on what machine) they were executed.
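If you prefer the command line to the web UI, a couple of standard YARN commands run on the master give similar information (output details vary slightly between versions):

# list all applications and their states
yarn application -list -appStates ALL

# list the registered node managers and how many containers each is running
yarn node -list -all

If the node list shows all 32 slaves and the running application reports containers spread across them, tasks are being assigned; if everything runs in one or two containers on the master, the job is not being distributed.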

Provision to start group of applications on same Mesos slave

I have a cluster of 3 Mesos slaves, where I have two applications: “redis” and “memcached”. redis depends on memcached, and the requirement is that both applications/services should start on the same node rather than on different slave nodes.
So I created the application group and added the dependency properly in the JSON file. After launching the JSON file via the “v2/groups” REST API, I observe that sometimes both applications in the group start on the same node, but sometimes they start on different slaves, which breaks our requirement.
So the intent/requirement is: if either application fails to start on a slave, both applications should fail over to another slave node. Also, can I configure the JSON file to tell Marathon to start the application group on slave-1 (a specific slave) first if it is available, and otherwise start it on another slave in the cluster? And if for some reason the application group starts on another slave, can Marathon relaunch it on slave-1 once slave-1 is available to serve requests?
Thanks in advance for help.
Edit/Update (2):
Mesos, Marathon, and DC/OS support for PODs is available now:
DC/OS: https://dcos.io/docs/1.9/usage/pods/using-pods/
Mesos: https://github.com/apache/mesos/blob/master/docs/nested-container-and-task-group.md
Marathon: https://github.com/mesosphere/marathon/blob/master/docs/docs/pods.md
I assume you are talking about marathon apps.
Marathon application groups don't have any semantics concerning co-location on the same node and the same is the case for dependencies.
You seem to be looking for a Kubernetes-like Pod abstraction in Marathon, which is on the roadmap but not yet available (see update above :-)).
Hope this helps!
I think this should be possible (as a workaround) if you specify the correct app constraints within the group's JSON.
Have a look at the example request at
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_groups_post
and the constraints syntax at
https://mesosphere.github.io/marathon/docs/constraints.html
e.g.
"constraints": [["hostname", "CLUSTER", "slave-1"]]
should do. Downside is that there will be no automatic failover to another slave that way. Still, I'd be curious why both apps need to specifically run on the same slave node...
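For illustration, a group definition along those lines might look like the following (the app IDs, commands and the marathon-host address are made up for this example; the constraints and dependencies fields are the point):

curl -X POST http://marathon-host:8080/v2/groups -H 'Content-Type: application/json' -d '{
  "id": "/cache",
  "apps": [
    { "id": "memcached", "cmd": "memcached -p 11211", "instances": 1,
      "constraints": [["hostname", "CLUSTER", "slave-1"]] },
    { "id": "redis", "cmd": "redis-server", "instances": 1,
      "dependencies": ["/cache/memcached"],
      "constraints": [["hostname", "CLUSTER", "slave-1"]] }
  ]
}'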

Set up mesosphere chronos cluster

I am using Chronos as a timer service and need to set up a cluster in case one node goes down unexpectedly. I set up Mesos master/slaves and ZooKeeper, and added the Mesos master/ZooKeeper addresses to each Chronos node. What I got in the end:
1. each Chronos node shared the same job data
2. one Chronos node was registered to the Mesos master as a framework
3. I ran curl -IL against each node but didn't get redirected to the leading node. According to the doc (https://mesos.github.io/chronos/docs/faq.html#which-node), I should be redirected.
Following the clustering guide (https://github.com/Metaswitch/chronos/blob/dev/doc/clustering.md), I created chronos_cluster.conf and restarted all nodes, but nothing changed. I guess I failed to get the Chronos cluster running correctly. Did I miss something or do something wrong? I didn't find a guide on http://mesos.github.io/chronos/docs/. Thanks!
Resolved. In fact, once all nodes share the same ZooKeeper, they run as a cluster. I saw the log message saying "INFO Proxying request to ip-xxx-xxx-xxx-xxx:4400 . (org.apache.mesos.chronos.scheduler.api.RedirectFilter:37)"
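For reference, the important part is that every Chronos node is started against the same ZooKeeper ensemble for both the Mesos master discovery and Chronos's own state (addresses below are illustrative; the exact launch command depends on how Chronos was packaged/installed):

java -jar chronos.jar --master zk://zk1:2181,zk2:2181,zk3:2181/mesos --zk_hosts zk1:2181,zk2:2181,zk3:2181 --http_port 4400

# a request to a non-leading node should then be proxied/redirected to the leader
curl -IL http://chronos-node-2:4400/scheduler/jobs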
