how to specify weights for different roles in apache mesos

I would like to understand the concept of weights for different roles in Mesos and where to specify them. I used the --weights parameter while starting the mesos-master, but it didn't work. I want to know, if we give a particular role a weight of 2.0, how the resource allocation changes compared to the default weight of 1.
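For reference (hedged, since I haven't verified it against a specific Mesos version): the --weights master flag is documented as taking a comma-separated list of role=weight pairs, and newer Mesos releases manage weights dynamically through the master's /weights HTTP endpoint rather than the startup flag. As far as I understand weighted DRF, a role with weight 2.0 is targeted at roughly twice the fair share of a role left at the default weight of 1.0, and the difference only shows up when roles actually compete for the same resources. A sketch with invented role names:

    # assumed syntax: comma-separated role=weight pairs for hypothetical roles
    mesos-master --roles="analytics,batch" --weights="analytics=2.0,batch=1.0"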

Related

consistent hashing where you want a key mapped to multiple servers

I'm wondering if I'm missing a concept here somewhere, and wondering if someone can explain how this might work.
My understanding of consistent hashing makes perfect sense where I want to map a particular key to one server. I can just map the key to a single server or virtual node clockwise or counterclockwise from the key.
How does consistent hashing work if I want to specify that each key should be mapped to some quorum of servers I define? For example, what if I have 5 servers and want each key mapped to at least two servers? Would I just choose the two unique servers clockwise on the ring, or is some other strategy needed? Could you equivalently choose one server clockwise and one counterclockwise? What if you want a key mapped to an arbitrary number of servers?
My issue may be also that I just don't know the right terminology to search for. My particular use case is that I want to have a cluster of Prometheus collectors, say 7, and say I have a pool of 150 exporters. I want each exporter to be collected by at least 3 of the collectors. What's a good way to think about this problem? Appreciate thoughts, thanks!
It turns out that consistent hashing is actually a special case of rendezvous hashing, which is what I was looking for.
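A minimal sketch of rendezvous (highest-random-weight) hashing under that framing: each key is assigned to the k servers with the highest score(key, server), so with 7 collectors and k = 3 every exporter lands on exactly 3 of them. The hash mixing and the collector names below are made up for illustration; a real deployment would likely use a stronger hash such as MurmurHash.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    public class RendezvousHashing {

        // Score a (key, server) pair with a simple mixed hash; placeholder only.
        static long score(String key, String server) {
            long h = 1125899906842597L;
            for (char c : (key + "@" + server).toCharArray()) {
                h = 31 * h + c;
            }
            h ^= (h >>> 33);
            h *= 0xff51afd7ed558ccdL;
            h ^= (h >>> 33);
            return h;
        }

        // Return the k servers with the highest scores for this key.
        static List<String> topK(String key, List<String> servers, int k) {
            return servers.stream()
                .sorted(Comparator.comparingLong((String s) -> score(key, s)).reversed())
                .limit(k)
                .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<String> collectors = List.of("c1", "c2", "c3", "c4", "c5", "c6", "c7");
            // Each exporter is scraped by its 3 highest-scoring collectors.
            System.out.println(topK("exporter-42", collectors, 3));
        }
    }

The useful property is that removing a server only reassigns the keys whose top-k set contained it, which is the behaviour the question is after for the Prometheus collectors.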

Efficient node traffic allocation

Users can be assigned to one experiment on my site. I have an API that developers use to trigger logic for each experiment. They call ExperimentEngine.run() to trigger the code logic for the experiment.
I would like to allocate traffic to each experiment at exposure, i.e. at the point where a user might be exposed to the logic for that experiment, and to allocate it so that experiments that users usually see last don't get starved.
For example, if user A is exposed to experiment A at login and then goes to page B and gets exposed to experiment B, user A should be assigned to either experiment A or B at the point of exposure. That means they will see at most one of the experiments (either A or B, or neither), never both. I would like to figure out the right algorithm so that experiment B (which is downstream and shown to the user after they've seen experiment A) does not get starved of traffic. I don't want all traffic going to experiment A.
So the flow is as follows:
User visits page A where experiment A is implemented
We decide whether to assign the user to experiment A. If the user is assigned to A, they will be able to see experiment A.
User visits page B, where experiment B is implemented, and we decide whether to assign the user to experiment B.
Users can only see experiments that they are assigned to.
I want to come up with an algorithm that lets me assign traffic to experiments regardless of where they are implemented, so that the traffic distribution is efficient and experiments implemented downstream don't get starved (even though the user sees experiment B last, they still get a good chance of being assigned to B).
Can someone please point me toward an algorithm I can use to efficiently allocate traffic to experiments so that they reach sample size and statistical significance in good time, in a system where experiments are allocated traffic at the point of exposure, where experiments are "exposed" to the user at different points of the flow (earlier or later on), and in a way that ensures experiments exposed later on are not starved of traffic?
A possible algorithm:
For each experiment, we decide whether to assign based on the experiment's location using a coin flip.
If we get heads, a list of experiments that match the user's criteria and that are implemented at that location is selected.
An experiment is chosen from that list based on a priority system.
At every location, a % of users are assigned to one of the experiments implemented at that location.
When we decide to assign or not to assign to any experiments at that location, that decision is not made again for the user.
What I am struggling with is what that priority algorithm should be, and whether this is the most efficient way to assign users to experiments that are implemented at different points of the flow.
How do we decide whether to assign users to an experiment at a specific location? Right now we use a coin flip, but that means 50% of users will be assigned to an experiment at each location, which does not work.
If you can collect lists of page visits per user, then for any choice of the per-page probabilities of running an experiment when a user visits its page, you can work out the probability with which each experiment is run.
Given this, you need to work out which collection of probability settings will achieve the desired result. If a user track visits pages A, B, C, each running a different experiment with probabilities p, q, r, then for that track the probability of running A is p, the probability of running B is q(1-p), and the probability of running C is r(1-q)(1-p); the overall probabilities are the sums over all of the user tracks. So you can work out not only the probabilities as functions of p, q, r but also the derivatives of these probabilities with respect to p, q, r.
This means that you should be able to find a numerical optimization routine that will find values of p, q, r, ... to minimize the sum of the squared differences between the probabilities of running particular experiments implied by those values and whatever target values you have for those probabilities.
(Actually the maths might be nicer if you optimize some linear function of the probabilities of the user running the various experiments, probably varying the linear function until you get a result that appeals to you.)
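A minimal sketch of that fit, assuming a single hypothetical user track that visits pages A, B, C in that order and equal target shares of 20% per experiment (the class name, targets, and step size are all invented for illustration); it recovers p, q, r by numerical gradient descent on the squared error:

    public class ExposureProbabilityFit {

        // Implied run-probabilities for a track A -> B -> C given per-page
        // assignment probabilities p, q, r (the formulas from the answer above).
        static double[] implied(double p, double q, double r) {
            return new double[] {
                p,                      // P(run A)
                q * (1 - p),            // P(run B)
                r * (1 - p) * (1 - q)   // P(run C)
            };
        }

        // Sum of squared differences between implied and target probabilities.
        static double error(double[] x, double[] target) {
            double[] got = implied(x[0], x[1], x[2]);
            double sum = 0;
            for (int i = 0; i < got.length; i++) {
                double d = got[i] - target[i];
                sum += d * d;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] target = {0.2, 0.2, 0.2}; // hypothetical target share per experiment
            double[] x = {0.5, 0.5, 0.5};      // initial guess for p, q, r
            double step = 0.05, eps = 1e-6;

            for (int iter = 0; iter < 10_000; iter++) {
                for (int i = 0; i < x.length; i++) {
                    // Central-difference estimate of d(error)/dx[i].
                    double[] hi = x.clone(); hi[i] += eps;
                    double[] lo = x.clone(); lo[i] -= eps;
                    double grad = (error(hi, target) - error(lo, target)) / (2 * eps);
                    x[i] = Math.min(1.0, Math.max(0.0, x[i] - step * grad));
                }
            }
            System.out.printf("p=%.3f q=%.3f r=%.3f%n", x[0], x[1], x[2]);
        }
    }

For this single track it should converge to roughly p = 0.2, q = 0.25, r = 0.33; with several tracks you would sum the implied probabilities over all tracks (weighted by how often each track occurs) inside error() and optimize the full vector of per-page probabilities the same way.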

Is there a strongly consistent group membership protocol?

I'm looking for an algorithm where groups of connected nodes can be merged together to form a new group (by creating links between nodes from different groups), and where a group can be partitioned to form new groups.
Unlike with consensus style of membership protocols (e.g. the one described in the Raft paper) where only one group can remain after the partition, I'd like each new partition to form a new group.
Also I'd like that for each partition, each of its members is going to agree on who belongs to that partition with a strong consistency guarantee.
Or put differently, I'd like the following property to hold: After a group undergoes a membership change, if two nodes that belonged to the original group can still communicate (there is a path between the two), they should agree on the sequence of changes that happened to the group.
My understanding is that the fact that each new partition is going to agree on a different set of members in a sense implies that the Consistency part of the CAP theorem is relaxed, which gives me hope that such a protocol may exist(?)
No consensus protocol (such as Paxos, Raft, etc.) can be leveraged to develop a multi-group membership protocol. This is because all consensus protocols are based on the fundamental idea that a decision can be taken only if a majority of members have agreed to it. This avoids the "split-brain" phenomenon, because there cannot be two partitions, each of size at least a majority ((n/2)+1), that have agreed on different leaders (and thus different member sets): at least one member would belong to both partitions and would have voted for only one of them (the one that asked for its vote first). For example, with n = 5 the majority is 3, and any two sets of 3 nodes out of 5 must share at least one node.
One protocol that could possibly be leveraged to create a multi-group membership protocol is Virtual Synchrony. However, note that virtual synchrony is a protocol for sending messages to (statically) predefined process groups, i.e. to the currently existing members of those groups. As a result, it is not designed for cases where new process groups should be created (dynamically) at each new partition. Also note that virtual synchrony does not scale to larger groups, since the message latency is proportional to the group's size.
I believe that by using the virtual synchrony protocol, you could develop such a membership protocol, which could satisfy the condition
After a group undergoes a membership change, if two nodes that belonged to the original group can still communicate (there is a path between the two), they should agree on the sequence of changes that happened to the group
However, note that this membership is not strongly consistent in a strict sense, because the failure of a node might only be propagated inside the group eventually. Nonetheless, the message deliveries (which are what matters most) will happen in a way that obeys the membership views of the group. This is achieved by imposing an order on message deliveries on the members' side.
Another alternative approach is gossip-based membership protocols, with real-life implementations integrated in various industry tools, such as Consul. In order to leverage this approach, you could emit multiple different classes of messages from each member, depending on the different groups that you would like to monitor. However, again these groups are statically defined inside the protocol, and the membership is eventually consistent (meaning that every failure will eventually be detected by all live members).
In conclusion, I think that a strongly consistent membership protocol is not feasible in the strict sense, since you cannot distinguish between a member that has failed and a member that is responding very slowly (the basis of the FLP result and the CAP theorem).

Algorithm for comparator with multi product results

I would like to implement an application that creates customized packages starting from a user's search.
For example, if the user has a sum of money and wants to assemble a computer, the system must return the best possible combination of components within that price. For example, the user indicates that he requires a processor, a mainboard, RAM and an HDD, and must spend at most $1,000.
Then, the application will search for and display the best combination for the user (each product has a weight, i.e. some kind of ranking, and a value).
Which algorithm would you advise me to use for this search?
Thank you

does the k-means clusterer of Apache Commons Math contain a means method?

I have to get the means of a k-means clustering. Currently I'm using the Apache Commons Math library, which implements a k-means++ clustering algorithm. Does anybody know if there is a simple way to get the means after the clustering with this library, or do I have to implement it myself?
If not, can you explain how to calculate them or give me a code example?
The output of the clustering algorithm must at least contain the cluster assignments, i.e. which cluster each point belongs to. If you have that, then the k-means cluster centers are simply given by the mean of the points that belong to each cluster.
The KMeansPlusPlusClusterer (package org.apache.commons.math3.ml.clustering, version 3.2+ ) returns a List of CentroidCluster objects. From a CentroidCluster you can get the cluster center (= mean of cluster points) by calling the getCenter() method.
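A minimal usage sketch against that Commons Math 3 API (the 2-D points and the class name are made up for illustration):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.commons.math3.ml.clustering.CentroidCluster;
    import org.apache.commons.math3.ml.clustering.DoublePoint;
    import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer;

    public class KMeansMeansExample {
        public static void main(String[] args) {
            // Made-up 2-D points, just to have something to cluster.
            List<DoublePoint> points = Arrays.asList(
                new DoublePoint(new double[] {1.0, 1.0}),
                new DoublePoint(new double[] {1.5, 2.0}),
                new DoublePoint(new double[] {8.0, 8.0}),
                new DoublePoint(new double[] {9.0, 8.5})
            );

            // k = 2 clusters, at most 100 iterations.
            KMeansPlusPlusClusterer<DoublePoint> clusterer =
                new KMeansPlusPlusClusterer<>(2, 100);

            List<CentroidCluster<DoublePoint>> clusters = clusterer.cluster(points);

            for (CentroidCluster<DoublePoint> cluster : clusters) {
                // getCenter() is the centroid, i.e. the mean of the cluster's points.
                double[] mean = cluster.getCenter().getPoint();
                System.out.println("cluster mean: " + Arrays.toString(mean));
            }
        }
    }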

Resources