Comparison-based ranking algorithm

I would like to rank or sort a collection of items (with size potentially greater than 100,000) where the items have no intrinsic (comparable) value. Instead, all I have are comparisons between pairs of items, which users have provided subjectively.
Example: Consider a collection with elements [a, b, c, d] and comparisons by users b > a, a > d, d > c. The correct order of this collection would be [b, a, d, c].
This example is simple, however there could be more complicated cases:
Since the comparisons are subjective, a user could also say that c > b, which would conflict with the ordering above.
Also, you may not have comparisons that "connect" all the items, e.g. only b > a and d > c. In that case the ordering is ambiguous: it could be [b, a, d, c] or [d, c, b, a], and either ordering is acceptable.
If possible it would be nice to somehow take into account multiple instances of the same comparison and give those with higher occurrences more weight. But a solution without this condition would still be acceptable.
A similar application of this algorithm was used by Zuckerberg's FaceMash application where he ranked people based on comparisons (if I understood it correctly), but I have not been able to find what that algorithm actually was.
Is there an existing algorithm that solves the problem above? I would rather not spend effort coming up with one if that is the case. If there is no specific algorithm, are there perhaps certain types of algorithms or techniques you can point me to?

This is a problem that has already occurred in another arena: competitive games! Here, too, the goal is to assign each player a global "rank" on the basis of a series of 1 vs. 1 comparisons. The difficulty, of course, is that the comparisons are not transitive (I take "subjective" to mean "provided by a human being" in your question). Kasparov could beat Fischer, who beats Bob (I don't know another chess player!), who in turn beats Kasparov.
This renders useless algorithms that rely on transitivity (i.e. a > b and b > c => a > c) as you end up with (likely) a highly cyclic graph.
Several rating systems have been devised to tackle this problem.
The most well-known system is probably the Elo rating system for competitive chess players. Its descendants (for instance, the Glicko rating system) are more sophisticated and take into account statistical properties of the win/loss record. In other words, how reliable is a rating? This is similar to your idea of weighting more heavily records with more "games" played. The TrueSkill system used on Xbox Live for multiplayer video games follows a similar idea, tracking both a skill estimate and the uncertainty of that estimate.
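As a rough illustration of the Elo idea applied to the question's example, here is a minimal sketch. The starting rating of 1500 and the K-factor of 32 are conventional but arbitrary assumptions, and the names expected_score and elo_rank are made up for this example. Repeated comparisons simply appear several times in the input, so they automatically carry more weight.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_rank(items, comparisons, k=32.0, start=1500.0):
    """Rank items from (winner, loser) pairs, processed in the order given.

    k and start are assumed tuning values, not part of the question.
    """
    ratings = {item: start for item in items}
    for winner, loser in comparisons:
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_win)  # surprising wins move ratings more
        ratings[winner] += delta
        ratings[loser] -= delta
    return sorted(items, key=lambda item: ratings[item], reverse=True)


# The example from the question: b > a, a > d, d > c
comparisons = [("b", "a"), ("a", "d"), ("d", "c")]
ranking = elo_rank(["a", "b", "c", "d"], comparisons)
```

Conflicting comparisons (such as c > b) do not break anything here; they just pull the two ratings toward each other. Note that the result depends on the order in which comparisons are processed.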

You may be interested in the minimum feedback arc set problem. Essentially, the problem is to find a linear ordering of the elements that minimizes the number of comparisons that "go the wrong way". This is the same as finding the minimum number of edges that must be removed to make the graph acyclic. Unfortunately, solving the problem exactly is NP-hard.
A couple of links that discuss the problem:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.8157&rep=rep1&type=pdf
http://en.wikipedia.org/wiki/Feedback_arc_set

I found this through a search; look for chapter 12.3, "Topological sorting and Depth-first Search":
http://www.cs.cmu.edu/~avrim/451f09/lectures/lect1006.pdf
Your set of relations describes a directed graph (hopefully acyclic), so topological sorting of that graph is exactly what you need.
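A minimal sketch of topological sorting for this setting, using Kahn's algorithm rather than the depth-first approach from the linked notes. The function name topological_order is made up for this example, and it assumes every compared item also appears in the items list. It raises an error when the comparisons contain a cycle, which is the conflicting case described in the question.

```python
from collections import defaultdict, deque


def topological_order(items, comparisons):
    """Order items so that every (winner, loser) pair has the winner first.

    Raises ValueError if the comparisons contain a cycle.
    """
    successors = defaultdict(set)
    indegree = {item: 0 for item in items}
    for winner, loser in comparisons:
        if loser not in successors[winner]:  # ignore duplicate comparisons
            successors[winner].add(loser)
            indegree[loser] += 1

    ready = deque(item for item in items if indegree[item] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(successors[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(items):
        raise ValueError("comparisons contain a cycle")
    return order


order = topological_order(["a", "b", "c", "d"],
                          [("b", "a"), ("a", "d"), ("d", "c")])
```

When the comparisons do not connect all items, this returns one of the valid orderings, which the question says is acceptable.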

Related

Team creation algorithm via player preference

I'm making a matchmaking client that matches 10 people together into two teams:
Each person chooses four people they would like to play with, ranked from highest to lowest.
Two teams are then formed out of the strongest relationships in that set.
How would you create an algorithm that solves this problem?
Example:
Given players [a, b, c, d, e, f, g, h, i, j], '->' meaning a preference pick.
a -> b (weight: 4)
a -> c (weight: 3)
a -> d (weight: 2)
a -> e (weight: 1)
b -> d (weight: 4)
b -> h (weight: 3)
b -> a (weight: 2)
...and so on
This problem seemed simple on the surface (after all it is only just a matchmaking client), but after thinking about it for a while it seems that there needs to be quite a lot of relationships taken into account.
Edit (pasted from a comment):
Ideally, I would avoid a brute-force approach to scale to larger games which require 100 players and 25 teams, where picking your preferred teammates would be done through a search function. I understand that this system may not be the best for its purpose - however, it is an interesting problem and I would like to find an efficient solution while learning something along the way.
A disclaimer first.
If your user suggested this, there are two possibilities.
Either they can provide the exact details of the algorithm, so ask them.
Or they most probably don't know what they are talking about, and just generated a partial idea on the spot, in which case, it's sadly not worth much on average.
So, one option is to search how matchmaking works in other projects, disregarding the idea completely.
Another is to explore the user's idea.
Probably it won't turn into a good system, but there is a chance it will.
In any case, you will have to do some experiments yourself.
Now, to the case where you are going to have fun exploring the idea.
First, for separating ten items into two groups of five, there are just choose(10,5)=252 possibilities (126 if the two teams are interchangeable), so, unless the system has to do it millions of times per second, you can just calculate some score for all of them, and choose the best one.
The most straightforward way is perhaps to consider all 2^{10} = 1024 subsets of the 10 elements, and then explore the ones where the size of the subset is 5.
But there may be better, more to-the-point, tools readily available, depending on the language or framework.
The 10-choose-5 combination is one group, the items not taken are the other group.
So, what would be the score of a combination?
Now we look at our preferences.
For each preference satisfied, we can add its weight, or its weight squared, or otherwise, to the score.
Which works best would surely need some experimentation.
Similarly, for each preference not satisfied, we can add a penalty depending on its weight.
Next, we can consider all players, and maybe add more penalty for each of the players which has none of their preferences satisfied.
Another thing to consider is team balance.
Since the only data so far are preferences (which may well turn out to be insufficient), an imbalance means that one team has many of their preferences satisfied, and the other has only few, if any at all.
So, we add yet another penalty depending on the absolute difference of (satisfaction sum of the first team) and (satisfaction sum of the second team).
Surely there are other things to factor in...
Based on all this, construct a system which at least looks plausible on the surface, and then experiment and experiment again, tweaking it so that it better fits the matchmaking goals.
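The brute-force idea above could be sketched as follows. The scoring rule (sum of satisfied weights minus a balance penalty of 0.5 times the difference between teams) is only one of the many variants mentioned, and the names split_score and best_split and the sample preferences are invented for illustration. Fixing the first player on the first team avoids scoring each split twice.

```python
from itertools import combinations


def split_score(team, other, prefs, balance_penalty=0.5):
    """Satisfied preference weights of both teams, minus an imbalance penalty."""
    def satisfaction(members):
        group = set(members)
        return sum(weight for (picker, picked), weight in prefs.items()
                   if picker in group and picked in group)

    s1, s2 = satisfaction(team), satisfaction(other)
    return s1 + s2 - balance_penalty * abs(s1 - s2)


def best_split(players, prefs):
    """Try every split into two equal teams and return (score, team, other)."""
    players = list(players)
    half = len(players) // 2
    first = players[0]  # pin one player to avoid mirrored duplicates
    best = None
    for rest in combinations(players[1:], half - 1):
        team = (first,) + rest
        other = tuple(p for p in players if p not in team)
        score = split_score(team, other, prefs)
        if best is None or score > best[0]:
            best = (score, team, other)
    return best


# Hypothetical preferences: (picker, picked) -> weight
prefs = {("a", "b"): 4, ("b", "a"): 2, ("c", "d"): 4, ("d", "c"): 3, ("a", "c"): 1}
result = best_split(["a", "b", "c", "d"], prefs)
```

For ten players this loop runs 126 times, so the cost of the scoring function matters far more than the enumeration itself.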
I would think of a way to score proposed teams against the selections from people, such as scoring proposed teams against the weights.
I would try and optimise this by hill-climbing (e.g. swapping a pair of people and looking to see if that improves the score) if only because people could look at the final solution and try this themselves - so you don't want to miss improvements of this sort.
I would hill-climb multiple times, from different starting points, and pick the answer found with the best score, because hill-climbing will probably end at local optima, not global optima.
At least some of the starting points should be based on people's original selections. This would be easiest if you got people's selections to amount to an entire team's worth of choices, but you can probably build up a team from multiple suggestions if you say that you will follow person A's suggestions, and then person B's selection if needed, and then person C's selection if needed, and so on.
If you include as starting points everybody's selections, or selections based on priority ABCDE.. and then priority BCDE... and then priority CDEF... then you have the property that if anybody submits a perfect selection your algorithm will recognise it as such.
If your hill-climbing algorithm tries swapping all pairs of players to improve, and continues until it finds a local optimum and then stops, then you also have the property that if anybody submits a selection which is only one swap away from perfection, your algorithm will recognise it as such.
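A minimal sketch of the swap-based hill-climbing with restarts described above. It assumes a simple score (total weight of preferences whose two players end up on the same team) and uses random starting points; seeding some starts from players' own selections, as suggested, is left out for brevity. All names and the sample data are hypothetical.

```python
import random


def team_score(teams, prefs):
    """Total weight of preferences whose two players share a team."""
    team_of = {p: i for i, team in enumerate(teams) for p in team}
    return sum(w for (a, b), w in prefs.items() if team_of[a] == team_of[b])


def hill_climb(teams, prefs):
    """Swap players between teams until no single swap improves the score."""
    teams = [list(t) for t in teams]
    best = team_score(teams, prefs)
    improved = True
    while improved:
        improved = False
        for i in range(len(teams)):
            for j in range(i + 1, len(teams)):
                for x in range(len(teams[i])):
                    for y in range(len(teams[j])):
                        teams[i][x], teams[j][y] = teams[j][y], teams[i][x]
                        score = team_score(teams, prefs)
                        if score > best:
                            best, improved = score, True
                        else:  # undo a swap that did not help
                            teams[i][x], teams[j][y] = teams[j][y], teams[i][x]
    return best, teams


def best_of_restarts(players, team_size, prefs, restarts=20, seed=0):
    """Hill-climb from several random starts and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(restarts):
        shuffled = list(players)
        rng.shuffle(shuffled)
        start = [shuffled[k:k + team_size]
                 for k in range(0, len(shuffled), team_size)]
        results.append(hill_climb(start, prefs))
    return max(results, key=lambda r: r[0])


prefs = {("a", "b"): 4, ("b", "a"): 2, ("c", "d"): 4, ("d", "c"): 3, ("a", "c"): 1}
score, teams = best_of_restarts(["a", "b", "c", "d"], 2, prefs)
```

Because each accepted swap strictly increases the score, the loop always terminates at a local optimum.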

0/1 knapsack with dependent item weight?

The standard 0/1 knapsack requires that the weight of every item is independent of the others, in which case DP is an efficient algorithm for finding the solution. But now I have met an extension of this problem in which the weight of a new item depends on the items already in the knapsack.
For example, we have 5 items a, b, c, d and e with weights w_a, ..., w_e. Items b and c have a weight dependency.
When b is already in the knapsack, the weight of item c will be smaller than w_c because it can share some space with b, i.e. weight(b&c) < w_b + w_c. Symmetrically, when c is already in the knapsack, the weight of b will be smaller than w_b.
This uncertainty makes the original DP algorithm fail, since it depends on the correctness of previous iterations, which may no longer be correct. I have read some papers about knapsack, but they either have dependencies on profit (quadratic knapsack problem) or have variable weights that follow a random distribution (stochastic knapsack problem). I am also aware of the previous question 1/0 Knapsack Variation with Weighted Edges, but it has only a very generic answer, and no answer about what this knapsack variant is called.
One existing solution:
I have also read one approximate solution in a paper about DBMS optimizations, where they group the related items as one combined item for the knapsack. Applying this technique to our example, the items for the knapsack will be a, bc, d, e, so there are no more dependencies between any two of these four items. However, it is easy to construct an example that does not get the optimal result, such as when an item with "small weight and benefit" is grouped with another item with "large weight and benefit". In this example, the "small" item should not be selected in the solution, but it is selected together with the "large" item.
Question:
Are there any efficient solving techniques that can get the optimal result, or at least offer some error guarantee? Or am I taking the wrong direction in modelling this problem?
Could you not have items a, b, c, bc, d and e? Possibly with a constraint that b and bc can't be both in the knapsack and similarly so with c and bc? My understanding is that that would be a correct solution since any solution that has b and c can be improved by substituting both by bc (by definition). The constraints on membership should take care of any other cases.
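One way to sketch this suggestion is to treat b, c and bc as mutually exclusive options of a single group and run the usual DP over groups (often called a multiple-choice knapsack). This is an illustration under assumptions, not a tested solution to the general problem: it only works when dependencies stay inside small groups whose combinations can be listed explicitly, and all weights and values below are made up.

```python
def grouped_knapsack(groups, capacity):
    """0/1 knapsack where at most one option per group may be chosen.

    groups: list of option lists, each option being (name, weight, value).
    Returns (best value, tuple of chosen option names).
    """
    # best[c] = (value, chosen names) achievable with capacity at most c
    best = [(0, ())] * (capacity + 1)
    for options in groups:
        new_best = list(best)  # start from "take nothing from this group"
        for name, weight, value in options:
            for c in range(weight, capacity + 1):
                candidate = best[c - weight][0] + value
                if candidate > new_best[c][0]:
                    new_best[c] = (candidate, best[c - weight][1] + (name,))
        best = new_best
    return best[capacity]


# Hypothetical data: b and c share space, so "bc" weighs 5 instead of 3 + 4.
groups = [
    [("a", 2, 3)],
    [("b", 3, 4), ("c", 4, 5), ("bc", 5, 9)],
    [("d", 3, 4)],
]
value, chosen = grouped_knapsack(groups, capacity=8)
```

The number of options per group grows exponentially with the number of mutually dependent items, so this stays practical only for small dependency groups.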
This is a very interesting problem and I have been working on this for a while. The first thing to consider is that binary knapsack problem with dependent item weights/value is not trivial at all. You may consider using Bayesian networks, Markov models, and other similar techniques for solving this problem. Nonetheless, any practical approach to this problem has to make some assumptions either about the optimization model or its input. Here is an example of formulating the binary knapsack problem with value-dependent items. https://arxiv.org/pdf/1702.06662.pdf
In this work, authors have proposed modeling the input (value-related dependencies) using fuzzy graphs and then using the proposed integer linear programming model to solve the optimization problem. An extended version of the work has been accepted for publication and will be soon available online.
Please do not hesitate to contact me if you need further information. I can also provide you with the source code of the model if needed.
In the end I managed to solve the problem with the B&B method proposed by @Holt. Here are the key settings:
(0) Before running the B&B algorithm, partition the items into groups according to their dependencies. All items in one group have a weight dependency with all other items in the same group, but not with items in other groups.
Settings for B&B:
(1) Upper-bound: assume that the current item has the minimum weight, i.e. assume all dependencies exist.
(2) Lower-bound: assume that the current item has the maximum weight, i.e. assume all dependencies do not exist.
(3) Current weight: Calculate the real current weight.
All the above calculations can be done in linear time by using the groups obtained in step 0. Specifically, when computing those weights, it is enough to scan only the items in the current group (the group containing the current item): items in other groups have no dependencies with the current one, so they do not change its real weight.

Algorithm for building tree depending on node attributes

I am trying to solve a programming problem where I need to implement the following algorithm (roughly):
There are a couple of nodes, e.g. A, B, C, etc.
Every node can have multiple items in it, e.g. a, b, c, x, y, z, etc. For example,
A [a, b, c, x, y, z]
B [a, b, c]
C [x, y, z]
There can be any number of nodes and items, and a node can contain any number of items (but the same item won't repeat within a node).
What I have to do is create a hierarchy among the nodes depending on the common items inside the nodes. So, in the above example, A should be higher in the hierarchy than B and C. In other words, A is the master and B and C are the slaves.
So, I was thinking that if I could make a tree from the nodes depending on common items, it would be easier for me. But I don't know which algorithm to use. Does anybody know which would be suitable for my case? Building a tree is not mandatory; if there are other ways to achieve the same thing, that is okay. Thanks.
Try using AVL trees.
Note that the worst case for AVL trees may look something like this. You can read more about the worst case here.
Most importantly, given two 'nodes' does the logic to compare them and determine which is higher exist? If not then that needs to be built first!
Once you know how to compare, then AVL trees can be used to build and maintain the 'hierarchy'.
I have adapted the algorithm proposed in the paper "Data Mining for Path Traversal Patterns in a Web Environment" by Ming-Syan Chen, Jong Soo Park and Philip S. Yu. This paper is available here. Though the algorithm does not directly solve my problem, I adapted it a little so that it fits my situation. Now it works fine and I get the result I need.
I would like to thank everyone who took the time to read my question and propose a solution.

genetic algorithm crossover operation

I am trying to implement a basic genetic algorithm in MATLAB. I have some questions regarding the cross-over operation. I was reading materials on it and I found that always two parents are selected for cross-over operation.
What happens if I happen to have an odd number of parents?
Suppose I have parent A, parent B and parent C, and I cross parent A with B and then parent B with C to produce offspring; even then I get 4 offspring. What are the criteria for rejecting one of them, as my population size should always remain the same? Should I just reject the offspring with the lowest fitness value?
Can a logical operation between parents, such as OR or AND, be considered a good crossover operation? I found some sites listing them as crossover operations but I am not sure.
How can I do crossover between multiple parents?
"Crossover" isn't so much a well-defined operator as the generic idea of taking aspects of parents and using them to produce offspring similar to each parent in some ways. As such, there's no real right answer to the question of how one should do crossover.
In practice, you should do whatever makes sense for your problem domain and encoding. With things like two parent recombination of binary encoded individuals, there are some obvious choices -- things like n-point and uniform crossover, for instance. For real-valued encodings, there are things like SBX that aren't really sensible if viewed from a strict biological perspective. Rather, they are simply engineered to have some predetermined properties. Similarly, permutation encodings offer numerous well-known operators (Order crossover, Cycle crossover, Edge-assembly crossover, etc.) that, again, are the result of analysis of what features in parents make sense to make heritable for particular problem domains.
You're free to do the same thing. If you have three parents (with some discrete encoding like binary), you could do something like the following:
child = new chromosome(L)
for i = 1 to L
    switch (rand(3))
        case 0:
            child[i] = parentA[i]
        case 1:
            child[i] = parentB[i]
        case 2:
            child[i] = parentC[i]
Whether that is a good operator or not will depend on several factors (problem domain, the interpretation of the encoding, etc.), but it's a perfectly legal way of producing offspring. You could also invent your own more complex method, e.g., taking a weighted average of each allele value over multiple parents, doing boolean operations like AND and OR, etc. You can also build a more "structured" operator if you like, in which different parents have specific roles. The basic Differential Evolution algorithm selects three parents, a, b, and c, and computes an update like a + F(b - c) (with a scale factor F), roughly corresponding to an offspring.
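Both ideas can be sketched in a few lines of Python. This is only an illustration: the function names are invented, the value F = 0.8 is an arbitrary assumption, and the second function shows only the a + F(b - c) combination step, not a complete Differential Evolution algorithm.

```python
import random


def three_parent_uniform(parent_a, parent_b, parent_c, rng=random):
    """Copy each gene of the child from one of the three parents at random."""
    return [rng.choice(genes) for genes in zip(parent_a, parent_b, parent_c)]


def de_combination(a, b, c, f=0.8):
    """Differential Evolution style combination a + F * (b - c), F a scalar."""
    return [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]


child = three_parent_uniform([0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1],
                             rng=random.Random(42))
mutant = de_combination([1.0, 2.0], [3.0, 5.0], [1.0, 1.0], f=0.5)
```

Passing a seeded random.Random instance makes the crossover reproducible, which helps when debugging a genetic algorithm.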
Consider reading the following academic articles:
DEB, Kalyanmoy et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, v. 6, n. 2, p. 182-197, 2002.
DEB, Kalyanmoy; AGRAWAL, Ram Bhushan. Simulated binary crossover for continuous search space. Complex systems, v. 9, n. 2, p. 115-148, 1995.
For SBX, the crossover method (with mutation of the children) mentioned by @deong, see the answer simulated-binary-crossover-sbx-crossover-operator-example
There is no single, definitive form of a genetic algorithm; many variants have been proposed. Generally, though, they all share the following steps:
1. Generate a random initial population
2. Cross parents to produce children
3. Mutate
4. Evaluate the children and parents
5. Generate the new population from the children only, or from children and parents (different approaches exist)
6. Return to item 2
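The steps above could be sketched as follows for bit-string individuals. This is a toy example under assumptions: random parent selection, one-point crossover, a single bit-flip mutation, and survivor selection that keeps the fittest of parents plus children. Producing one child per crossover sidesteps the odd-population question, and truncating to pop_size keeps the population size constant.

```python
import random


def evolve(fitness, length=20, pop_size=7, generations=30, seed=1):
    """Toy genetic algorithm on bit strings; returns the final population."""
    rng = random.Random(seed)
    # 1. random initial population
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # 2. cross two distinct parents (one-point crossover, one child)
            mom, dad = rng.sample(population, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # 3. mutate by flipping one random bit
            i = rng.randrange(length)
            child[i] = 1 - child[i]
            children.append(child)
        # 4-5. evaluate parents and children, keep the fittest pop_size
        population = sorted(population + children,
                            key=fitness, reverse=True)[:pop_size]
    return population  # 6. the loop above returns to step 2


final = evolve(fitness=sum)  # "OneMax": fitness is the number of 1 bits
```

Rejecting the lowest-fitness individuals, as the question proposes, is exactly what the truncation step does here; other schemes (tournaments, replacing only the parents) are equally legitimate.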
NSGA-II, described in the Deb et al. paper cited above, is one of the most widely used and well-known genetic algorithms. The article includes a diagram of the algorithm's flow.

Algorithm for best suiting people's choices from a definite list of items where there is only one of each available?

Ladies and Gents,
My best friends and I do a "Secret Santa" type gift exchange every year, and this year I've been trying to think of a couple of ways to make it interesting. There are six of us involved and I want to design a small program that lets each of us rank our preferred gift-recipients from 1 to 5, as well as our preferred gift-givers.
So, let's say we're called A, B, C, D, E and F.
A submits two lists:
List 1 - People I would most like to give a present to: B, D, C, F, E
List 2 - People I would most like to receive a present from: F, D, E, B, C
All six of us will submit both these lists, so I'll have 12 lists altogether. I suppose my question is: what is the best algorithm to now go ahead and assign each person a gift recipient?
I thought of something like this:
If two people have both selected each other in their opposing lists (i.e. A most wants to give to B, B most wants to get from A) then I immediately assign A to B. So now A is removed from our pool of gift-givers and B is removed from our list of gift-recipients.
Once I've assigned the "perfect matches" I'm kind of lost though. Is there an established algorithm for situations like this? Obviously it's only for entertainment value, but surely there must be a "real" application of something similar? Perhaps timetabling or something?
My Google-fu has failed me but I have a feeling it might just be due to my own lack of precision in search terms.
Cheers,
(and Happy Holidays I guess),
Rob
Update / Part 2
Okay, Ying Xiao came to the rescue by recommending the Gale Shapley Algorithm for the Stable Marriage Problem and I've implemented that in Python and it works a treat. However, this is just a thought that occurred to me. I guess within our group of six best friends there are three pairings of "extra-best" friends so I have a feeling we'll just end up with three pairs of AB, CD, EF and BA, DC, FE in terms of gift giving and receiving.
Is there an algorithm we could design that takes people's rankings into account but also prevents two people from forming a "closed group"? That is, if A is assigned to buy a gift for B, B cannot be assigned to buy a gift for A? Perhaps I need to solve the Stable roommates problem?
Related questions:
Secret santa algorithm.
What is the best low-tech protocol to simulate drawing names out of a hat and ensure secrecy?
The Gale-Shapley algorithm (for the Stable Marriage problem) applies only when each person has a ranked list of all other participants -- you may or may not be able to convert your problem to that form (make everyone rank everyone).
Also, note that the thing it is optimizing for is something different: it tries to find a set of stable marriages, where no pair of people will "elope" because they prefer each other to their current partners. This is not something you care about in your Secret Santa application.
What you want (depending on your definition of "best") is a maximum-weight bipartite matching, which fixes both the above objections: put the "givers" on one side, the "receivers" on the other (so two copies of each person, in this case), give each edge a weight corresponding to how highly that giver ranks that receiver, and it is now the assignment problem. You can use the Hungarian algorithm for this, or simpler (slower) ones. You can also vary how you assign the weights to optimize for different things (e.g. maximize the number of people who get their first choice, or minimize the worst choice that anyone gets, etc.)
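For six people, one of the "simpler (slower)" approaches is plain enumeration of all 720 permutations, which also makes it easy to forbid closed pairs (A gives to B while B gives to A). The sketch below is brute force, not the Hungarian algorithm; it assumes the cost of an assignment is the sum of each giver's rank position for their recipient, it ignores the "receive from" lists for brevity, and the names and sample rankings are made up.

```python
from itertools import permutations


def best_assignment(people, give_rank, forbid_pairs=True):
    """Return (cost, {giver: receiver}) minimising the summed rank positions.

    give_rank[p] lists the recipients p would like to give to, best first.
    """
    best = None
    for perm in permutations(people):
        mapping = dict(zip(people, perm))
        if any(giver == receiver for giver, receiver in mapping.items()):
            continue  # nobody gives to themselves
        if forbid_pairs and any(mapping[r] == g for g, r in mapping.items()):
            continue  # no A -> B together with B -> A
        cost = sum(give_rank[g].index(r) for g, r in mapping.items())
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Hypothetical three-person example
give_rank = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
cost, mapping = best_assignment(["A", "B", "C"], give_rank)
```

Changing the cost function (for example, squaring the rank position) shifts the optimisation toward avoiding anyone's worst choice, as described above.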
If you do use the Gale-Shapley stable marriage algorithm, note that it is optimal for the "proposers" (male-optimal and female-pessimal), so be sure to put the "givers" as the "proposers", and not vice versa.
For each person, create two virtual people, a "giver" and a "receiver". Now match the set of givers against the set of receivers using the Gale Shapley Algorithm. Runs in O(n^2) time.
http://en.wikipedia.org/wiki/Stable_marriage_problem
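A compact sketch of Gale-Shapley with givers as the proposers. It assumes complete preference lists on both sides; in the Secret Santa setting people cannot rank themselves, so the lists would need extra handling that is not shown here. The names in the example are placeholders.

```python
def gale_shapley(giver_prefs, receiver_prefs):
    """Return the giver-optimal stable matching as {giver: receiver}.

    giver_prefs[g]: all receivers in g's preferred order.
    receiver_prefs[r]: all givers in r's preferred order.
    """
    rank = {r: {g: i for i, g in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {g: 0 for g in giver_prefs}  # next receiver to propose to
    holder = {}                                # receiver -> tentative giver
    free = list(giver_prefs)
    while free:
        g = free.pop()
        r = giver_prefs[g][next_choice[g]]
        next_choice[g] += 1
        current = holder.get(r)
        if current is None:
            holder[r] = g                      # r was unmatched, accept
        elif rank[r][g] < rank[r][current]:
            holder[r] = g                      # r prefers the new proposer
            free.append(current)
        else:
            free.append(g)                     # rejected, g tries again later
    return {g: r for r, g in holder.items()}


giver_prefs = {"A": ["X", "Y", "Z"], "B": ["X", "Z", "Y"], "C": ["Y", "X", "Z"]}
receiver_prefs = {"X": ["B", "A", "C"], "Y": ["A", "C", "B"], "Z": ["A", "B", "C"]}
matching = gale_shapley(giver_prefs, receiver_prefs)
```

Each giver proposes to each receiver at most once, which is where the O(n^2) bound comes from.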

Resources