(Sorry for the poor English.)
Hi, I'm trying to apply the Gale–Shapley algorithm to groups of different sizes. I found a trick that consists of duplicating "offers" for a given man (i.e., the man's preference list is repeated in the men's preference lists), and this second offer is added to the women's lists.
Example:
Let the men be {1,2} and the women {3,4}. I want some of the men to have more than one marriage (let 1 be this lucky/unlucky guy).
Initially, the men's preference lists are:
1:[3,4]
2:[4,3]
The women's preference lists are:
3:[2,1]
4:[1,2]
It is a one-to-one matching problem.
To handle polygamy, I use the trick.
I create a man 1' who holds the second offer of man 1 and has the same preferences as 1:
1':[3,4]
and I update the women's lists to add 1':
3:[2,1,1']
4:[1,1',2]
It becomes a many-to-one matching problem.
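The duplication trick can be sketched as a small helper. This is a minimal illustration, assuming preferences are stored as dicts of lists keyed by name strings; the function name `add_clones` and the `capacity` parameter (how many slots each man gets) are hypothetical. As in the example, each clone is inserted right after the original in every woman's list.

```python
def add_clones(men_prefs, women_prefs, capacity):
    """Clone each man m into capacity[m] slots named m, m', m'', ...

    Hypothetical helper: every clone copies m's preference list, and in each
    woman's list the clones are placed immediately after the original man.
    Men missing from `capacity` keep a single slot.
    """
    slots = {m: [m + "'" * k for k in range(capacity.get(m, 1))] for m in men_prefs}
    men = {name: list(prefs) for m, prefs in men_prefs.items() for name in slots[m]}
    women = {w: [name for m in prefs for name in slots[m]]
             for w, prefs in women_prefs.items()}
    return men, women

men, women = add_clones({"1": ["3", "4"], "2": ["4", "3"]},
                        {"3": ["2", "1"], "4": ["1", "2"]},
                        capacity={"1": 2})
print(men)    # {'1': ['3', '4'], "1'": ['3', '4'], '2': ['4', '3']}
print(women)  # {'3': ['2', '1', "1'"], '4': ['1', "1'", '2']}
```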
However, it is now possible for one man (1) to get two marriages while another is still single. How can I prevent this?
If you rank all of the duplicates below all of the non-duplicates, then Gale–Shapley should match everyone, e.g., 3:[2,1,1'] and 4:[1,2,1']. The reasoning is that no woman will reject a proposal from a non-duplicate man if she's matched with a duplicate man, hence all non-duplicate men will be matched.
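A minimal sketch of this fix, assuming the men-proposing variant of Gale–Shapley and complete preference lists (every woman ranks every man slot); the names `duplicates_last` and `gale_shapley` are hypothetical. Each woman keeps her original order for the original men and lists all duplicates after them.

```python
from collections import deque

def duplicates_last(women_prefs, clones):
    """Rank every original man above every duplicate, e.g. 4:[1,2] -> 4:[1,2,1'].

    clones maps an original man to his duplicates, e.g. {"1": ["1'"]}.
    """
    return {w: list(prefs) + [c for m in prefs for c in clones.get(m, [])]
            for w, prefs in women_prefs.items()}

def gale_shapley(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley; returns {man: woman}. Leftover men stay single."""
    # rank[w][m]: position of m in w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman m proposes to
    engaged_to = {}                          # woman -> man
    free = deque(men_prefs)
    while free:
        m = free.popleft()
        if next_choice[m] == len(men_prefs[m]):
            continue  # m has proposed to everyone on his list; he stays single
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m
            free.append(current)  # w drops her current partner
        else:
            free.append(m)        # w rejects m; he will try his next choice
    return {m: w for w, m in engaged_to.items()}

men = {"1": ["3", "4"], "2": ["4", "3"], "1'": ["3", "4"]}
women = duplicates_last({"3": ["2", "1"], "4": ["1", "2"]}, {"1": ["1'"]})
print(women)                     # {'3': ['2', '1', "1'"], '4': ['1', '2', "1'"]}
print(gale_shapley(men, women))  # {'1': '3', '2': '4'}
```

Because a woman never leaves an original man for a duplicate under this ranking, every original man ends up matched as long as there are at least as many women as original men; here the duplicate 1' is the one left single.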
Related
I have an Elasticsearch query which is fairly big. In short, this query returns matching "consultants" and sorts them in the following fashion:
1. distance (based on their office: consultants can share offices and have exactly the same distance value, and consultants at the nearest office show first)
2. office leaders (a boolean field: a consultant either is a leader or is not; this sort pushes office leaders to the top of their office "grouping")
3. all remaining consultants
By way of example, for me (based in London) I might get something like this:
1. [London office leader]
2. [London consultant]
3. [London consultant]
4. [London consultant]
5. [Paris office leader]
6. [Paris consultant]
7. ...etc
My challenge here is to ensure that the "all remaining consultants" section contains a balanced mix of men and women (i.e., we don't have a group of men followed by a group of women, but rather have them more evenly spread).
I have tried using a script sort to "boost" consultants with a `gender === F || id is odd` calculation; the idea was that all women would be boosted, and a percentage of men would also be boosted. However, this doesn't work very well and results in "clumping" of women (i.e., a man, followed by nine women, then two men, and then the remaining women).
I am open to suggestions, but my current idea as to how to approach this is to manually specify a "pattern", e.g. something like:
MFMFMMFMFFFMFMMFFMFMMFMFFFMFMF
and have ES sort the results so that the pattern is satisfied.
My end goal is to sort something like this:
London
leaders (sorted by M/F pattern)
others (sorted by M/F pattern)
Paris
leaders (sorted by M/F pattern)
others (sorted by M/F pattern)
This is where I'm a little stuck: I'm sure I can write a script sort, but I think I need context outside the document currently being sorted, e.g., a global reference to what has already been sorted.
The result set is paginated so my solution needs to ensure that the order is the same every time (or at least consistent for the session).
Any help appreciated, even if it's just pointing me in the direction of a specific algorithm which I've not thought of.
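One way to realize the pattern idea is a post-processing pass in application code rather than inside Elasticsearch, since a script sort computes a value for each document on its own and has no view of what has already been placed. This is only a sketch, assuming each group (e.g. London leaders, London others) is fetched in full and already in relevance order, and that gender is stored as "M" or "F"; the function `apply_pattern` is hypothetical. The output depends only on the input order and the fixed pattern, so it is the same on every request, which matters for pagination.

```python
from collections import deque

def apply_pattern(consultants, pattern):
    """Reorder one group so genders follow `pattern` (cycled), e.g. "MFMFMMF...".

    consultants: list of (id, gender) tuples, gender "M" or "F", in relevance order.
    Within each gender the incoming order is kept, so the result is deterministic.
    If the gender the pattern asks for has run out, the other gender is used.
    """
    queues = {"M": deque(), "F": deque()}
    for c in consultants:
        queues[c[1]].append(c)
    result = []
    i = 0
    while queues["M"] or queues["F"]:
        wanted = pattern[i % len(pattern)]
        i += 1
        source = wanted if queues[wanted] else ("F" if wanted == "M" else "M")
        result.append(queues[source].popleft())
    return result

group = [(1, "M"), (2, "M"), (3, "M"), (4, "F"), (5, "F")]
print(apply_pattern(group, "MF"))  # [(1, 'M'), (4, 'F'), (2, 'M'), (5, 'F'), (3, 'M')]
```

Applied separately to each leaders/others group, with the groups then concatenated in distance order, this gives the London/Paris layout described above; the cost is fetching each whole group rather than one page at a time.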
This is a rather long-winded explanation of what I'm looking at, so I apologize in advance.
Let's consider a recipe:
Take the bacon and weave it ...blahblahblah...
This recipe has 3 tags:
author (most important) - Chandler Bing
category (medium importance) - Meat recipe (out of meat/vegan/raw/etc categories)
subcategory (lowest importance) - Fast food (out of fast food / haute cuisine etc)
I am a new user who sees a list of randomly sorted recipes (my palate/profile isn't formed yet). I start interacting with different recipes (reading, saving, and sharing them), and each interaction adds to my profile (each time I read a recipe, a point gets added to the respective category/author/subcategory). After a while my profile starts to look something like this:
Chandler Bing - 100 points
Gordon Ramsey - 49 points
Haute cuisine - 12 points
Fast food - 35 points
... and so on
Now, the point of this whole exercise is to sort the recipe list based on the individual user's preferences. For example, in this case I will always see Chandler Bing's recipes at the top (regardless of category), then Ramsey's recipes. At the same time, Bing's recipes will be sorted by my preferred categories and subcategories, so his fast food recipes appear above his haute cuisine ones.
What am I looking at here in terms of a sorting algorithm?
I hope that my question has enough information, but if anything is unclear, please let me know and I'll try to add to it.
I would give the most important "Tags" the widest range of possible points. Example: give author a starting value of 50 points, with a range of 0-100 points. Give category a starting value of 25 points, with a range of 0-50 points, and give subcategory a starting value of 12.5 points, with a range of 0-25 points. That way, if the user's palate changes over time, s/he will only have to work down from the maximum, or work up from the minimum.
From there, you can simply add up the points for each "Tag" and use the built-in sort() method that most languages provide to order the recipes.
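A minimal sketch of this scheme, assuming each recipe has one author, one category, and one subcategory, and that the profile stores points per (tag type, tag value); the names `CAPS`, `START`, `record_interaction`, and `recipe_score` are hypothetical. Points are clamped to each tag type's range, so the more important tag types can move the total further.

```python
# Point range per tag type: the more important the tag, the wider the range.
CAPS = {"author": 100.0, "category": 50.0, "subcategory": 25.0}
START = {tag: cap / 2 for tag, cap in CAPS.items()}  # 50, 25, 12.5

def record_interaction(profile, recipe, delta=1.0):
    """Add `delta` points (negative to decrease) to each of the recipe's tags, clamped to [0, cap]."""
    for tag, cap in CAPS.items():
        key = (tag, recipe[tag])
        current = profile.get(key, START[tag])
        profile[key] = min(cap, max(0.0, current + delta))

def recipe_score(profile, recipe):
    """Sum of the user's points for the recipe's author, category and subcategory."""
    return sum(profile.get((tag, recipe[tag]), START[tag]) for tag in CAPS)

recipes = [
    {"name": "Bacon weave", "author": "Chandler Bing", "category": "Meat", "subcategory": "Fast food"},
    {"name": "Lentil soup", "author": "Gordon Ramsey", "category": "Vegan", "subcategory": "Haute cuisine"},
]
profile = {}
for _ in range(10):
    record_interaction(profile, recipes[0])
ranked = sorted(recipes, key=lambda r: recipe_score(profile, r), reverse=True)
print([r["name"] for r in ranked])  # ['Bacon weave', 'Lentil soup']
```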
You can write a comparison function and pass it to your sort(). The point is that when you compare two recipes, you just add up each one's points based on its tags and do a simple comparison. That, plus whatever sorting algorithm you choose, should do just fine.
You can use a recursively subdividing MSD (most-significant-digit-first) radix sort. It works as follows:
Take the most significant category of each recipe.
Sort the list of elements based on that category, grouping elements with the same category into one bucket (Ramsay bucket, Bing bucket etc).
Recursively sort each bucket, starting with the next category of importance (Meat bucket etc).
Concatenate the buckets together in order.
Complexity: O(kn), where k is the number of category types and n is the number of recipes.
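A sketch of the recursive bucketing, assuming recipes are dicts and the user's profile maps tag values to points (values the user has never touched count as 0); `msd_sort` is a hypothetical name. Buckets at each level are ordered by the user's points rather than alphabetically, which is what makes the result personalized.

```python
from collections import defaultdict

def msd_sort(recipes, keys, points, depth=0):
    """Bucket recipes by keys[depth], order buckets by the user's points, recurse."""
    if depth == len(keys) or len(recipes) <= 1:
        return list(recipes)
    buckets = defaultdict(list)  # tag value -> recipes, in input order
    for r in recipes:
        buckets[r[keys[depth]]].append(r)
    # Highest-scoring tag value first; unseen values count as 0 points.
    order = sorted(buckets, key=lambda value: points.get(value, 0), reverse=True)
    result = []
    for value in order:
        result.extend(msd_sort(buckets[value], keys, points, depth + 1))
    return result

points = {"Chandler Bing": 100, "Gordon Ramsey": 49, "Haute cuisine": 12, "Fast food": 35}
recipes = [
    {"name": "A", "author": "Gordon Ramsey", "category": "Meat", "subcategory": "Fast food"},
    {"name": "B", "author": "Chandler Bing", "category": "Meat", "subcategory": "Haute cuisine"},
    {"name": "C", "author": "Chandler Bing", "category": "Meat", "subcategory": "Fast food"},
]
ranked = msd_sort(recipes, ["author", "category", "subcategory"], points)
print([r["name"] for r in ranked])  # ['C', 'B', 'A']
```

Ordering the buckets is itself a small sort at each level, so the O(kn) figure assumes the number of distinct values per category is small compared with n.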
I think what you're looking for is not a sorting algorithm, but a rating scheme.
You say you want to sort by preferences. Let's assume these preferences have different “dimensions”, like level of complexity, type of cuisine, etc.
These dimensions have different levels of measurement: they can be, e.g., numeric scales or simple categories/tags. It would be your job to:
Create a scheme of dimensions and scales that can represent a user's preferences.
Operationalize real-world data to fit into this scheme.
Create a profile for the users which reflects their preferences. Same for the chefs; treat them just like normal users here.
To actually match a user to a chef (or, even to another user), create a sorting callback that matches all your dimensions against each other and makes sure that in each of the dimension the compared users have a similar value (on a numeric scale), or an overlapping set of properties (on a nominal scale, like tags). Then you sort the result by the best match.
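A small sketch of such a callback, assuming numeric dimensions with a known range and tag dimensions stored as sets. For the nominal dimensions it uses Jaccard overlap (shared tags divided by all tags), which is one common choice rather than anything the answer prescribes; `match_score` and the dimension names are hypothetical.

```python
def match_score(a, b, numeric_ranges):
    """Higher is better. Numeric dims: 1 - normalized distance. Tag dims: Jaccard overlap."""
    score = 0.0
    for dim, value in a.items():
        if dim not in b:
            continue  # dimension unknown for one side: contributes nothing
        other = b[dim]
        if isinstance(value, set):
            union = value | other
            score += len(value & other) / len(union) if union else 0.0
        else:
            score += 1.0 - abs(value - other) / numeric_ranges[dim]
    return score

ranges = {"complexity": 4}  # e.g. complexity rated 1..5, so the largest distance is 4
user = {"complexity": 3, "cuisine": {"italian", "thai"}}
chefs = {
    "chef_a": {"complexity": 3, "cuisine": {"italian"}},
    "chef_b": {"complexity": 1, "cuisine": {"french"}},
}
ranked = sorted(chefs, key=lambda c: match_score(user, chefs[c], ranges), reverse=True)
print(ranked)  # ['chef_a', 'chef_b']
```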
I am trying to solve a problem for a dating site. Here is the problem:
Each user of the app will have some attributes, like the books he reads, the movies he watches, music, TV shows, etc. These are predefined top-level attribute categories. Each category can have any number of values, e.g., in books: Fountain Head, Love Story, ...
Now, I need to match users based on profile attributes. Here is what I am planning to do:
Store the data in a reverse (inverted) index, i.e., each of Fountain Head, Love Story, etc. is an index key mapping to the set of users with that attribute.
When a new user joins, get this user's attributes, find the matching index keys, get all the users for these keys, and bucket sort (or radix sort, or a similar sort) them by how many times each user appears in this merged list.
Is this good, bad, worse? Any other suggestions?
Thanks
Ajay
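The plan above can be sketched in a few lines, assuming attributes are stored as (category, value) pairs so equal titles in different categories don't collide; `build_index` and `rank_by_overlap` are hypothetical names. Because the overlap count is bounded by the new user's number of attributes, the bucket sort by count runs in time linear in the number of index entries touched.

```python
from collections import defaultdict

def build_index(profiles):
    """Inverted index: attribute -> set of users who have it."""
    index = defaultdict(set)
    for user, attributes in profiles.items():
        for attribute in attributes:
            index[attribute].add(user)
    return index

def rank_by_overlap(new_user, attributes, index):
    """Users sharing at least one attribute, most shared attributes first."""
    attributes = set(attributes)
    counts = defaultdict(int)
    for attribute in attributes:
        for user in index.get(attribute, ()):
            if user != new_user:
                counts[user] += 1
    # Bucket sort: buckets[k] holds the users who share exactly k attributes
    # (users within one bucket are in no particular order).
    buckets = [[] for _ in range(len(attributes) + 1)]
    for user, k in counts.items():
        buckets[k].append(user)
    return [user for k in range(len(attributes), 0, -1) for user in buckets[k]]

profiles = {
    "ann": {("books", "Fountain Head"), ("books", "Love Story")},
    "bob": {("books", "Love Story"), ("movies", "Love Story")},
    "cy": {("music", "Jazz")},
}
index = build_index(profiles)
print(rank_by_overlap("dee", {("books", "Fountain Head"), ("books", "Love Story")}, index))
# ['ann', 'bob']
```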
The algorithm you described is not bad, although it uses a very simple notion of similarity between people.
Let us make it more adjustable without creating complicated matching criteria. Let's say people who like the same book are more similar than people who listen to the same music. The same goes for every interest. That is, similarity in different fields has different weights.
Like you said, you can keep, for each interest (a book, a song, etc.), a list of the people who have it in their profile. Then, say you want to find matches for guy g:
for each interest i in g's interests:
    for each person p in list of i:
        if p is g, or p and g have mismatching sexual preferences:
            continue
        if p is already in g's match list:
            g->match_list[p].score += i->match_weight
        else:
            add p to g->match_list with score i->match_weight
sort g->match_list based on score
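A runnable version of the pseudocode, assuming the per-interest lists and weights are plain dicts and that the sexual-preference check is supplied as a function; `find_matches`, `compatible`, and the sample data are hypothetical. A dict whose missing scores default to 0 removes the need for the "already in g's match list" branch.

```python
from collections import defaultdict

def find_matches(g, interests, people_by_interest, match_weight, compatible):
    """Score every person who shares an interest with g; best match first."""
    scores = defaultdict(float)  # person -> accumulated weight
    for i in interests[g]:
        for p in people_by_interest.get(i, ()):
            if p == g or not compatible(g, p):
                continue
            scores[p] += match_weight[i]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

interests = {"g": ["book:Dune", "music:Jazz"]}
people_by_interest = {"book:Dune": ["g", "p1"], "music:Jazz": ["g", "p1", "p2"]}
match_weight = {"book:Dune": 3.0, "music:Jazz": 1.0}  # same book counts more than same music
print(find_matches("g", interests, people_by_interest, match_weight, lambda a, b: True))
# [('p1', 4.0), ('p2', 1.0)]
```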
The choice of weights is not a simple task, though. You would need a lot of psychology to get it right. Using common sense, however, you could get values that are not too far off.
In general, matching people is much more complicated than summing some scores. For example, a certain set of matching interests may have more (or in some cases less) effect than the sum of them individually. Also, an interest held by one person may result in outright rejection by the other, no matter what other matching interests exist (take, for example, two very similar people, one of whom loves Twilight while the other hates it).
Suppose I have a list of, e.g., restaurants. Many users are shown pairs of restaurants and select the one of the two they prefer (à la hotornot).
I would like to convert these results into absolute ratings: For each restaurant, 1-5 stars (rating can be non-integer, if necessary).
What are the general approaches to this problem?
Thanks
I would treat each pairwise decision as an upvote for the preferred restaurant and a downvote for the other one. Count the votes across all users and restaurants, then sort the restaurants and divide them evenly into star levels (so that each star "weighs" a fixed number of votes).
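A sketch of the vote-counting idea, assuming comparisons arrive as (preferred, other) pairs; `star_ratings` is a hypothetical name. It maps net votes linearly onto 1 to 5 stars, so each star spans the same number of net votes. Raw counts favour restaurants that are shown more often, so dividing by the number of comparisons each one took part in may be worth considering.

```python
from collections import defaultdict

def star_ratings(comparisons):
    """comparisons: iterable of (preferred, other) pairs -> {restaurant: stars in [1, 5]}."""
    net = defaultdict(int)
    for preferred, other in comparisons:
        net[preferred] += 1  # upvote
        net[other] -= 1      # downvote
    if not net:
        return {}
    low, high = min(net.values()), max(net.values())
    if high == low:
        return {r: 3.0 for r in net}  # no information to separate them
    # Linear map: lowest net score -> 1 star, highest -> 5 stars.
    return {r: 1 + 4 * (v - low) / (high - low) for r, v in net.items()}

print(star_ratings([("A", "B"), ("A", "C"), ("B", "C")]))
# {'A': 5.0, 'B': 3.0, 'C': 1.0}
```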
Elo ratings come to mind. It's how the chess world computes a rating from your win/loss/draw record. Losing a matchup against an already-high-scoring restaurant gets penalized less than against a low-scoring one, a little like how PageRank cares more about a link from a website it also ranks highly. There's no upper bound to your possible score; you'd have to renormalize somehow for a 1-5 star system.
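A minimal Elo sketch using the standard logistic expected-score formula, assuming every restaurant starts at 1500 with K = 32 (common chess defaults, not tuned for this use); `elo_ratings` and `to_stars` are hypothetical names. Sequential Elo updates depend on the order in which comparisons are processed, and `to_stars` is one simple way to do the renormalization onto 1 to 5 stars.

```python
from collections import defaultdict

def elo_ratings(comparisons, k=32.0, initial=1500.0):
    """comparisons: (winner, loser) pairs processed in order -> {restaurant: rating}."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        rw, rl = ratings[winner], ratings[loser]
        expected_win = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_win)  # small when the winner was already favoured
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return dict(ratings)

def to_stars(ratings):
    """Renormalize ratings linearly onto [1, 5]."""
    low, high = min(ratings.values()), max(ratings.values())
    if high == low:
        return {r: 3.0 for r in ratings}
    return {r: 1 + 4 * (v - low) / (high - low) for r, v in ratings.items()}

r = elo_ratings([("A", "B"), ("A", "C")])
print(r["B"] < r["C"])  # True: C lost to the higher-rated A, so it lost fewer points
print(to_stars(r))
```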
A sort is said to be stable if it maintains the relative order of elements with equal keys. I guess my question is really, what is the benefit of maintaining this relative order? Can someone give an example? Thanks.
It enables your sort to 'chain' through multiple conditions.
Say you have a table with first and last names in random order. If you sort by first name and then by last name, a stable algorithm (at least for the second pass) ensures that people with the same last name stay sorted by first name.
For example:
Smith, Alfred
Smith, Zed
These are guaranteed to be in the correct order.
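Python's built-in `sorted` is guaranteed to be stable, so the two passes can be shown directly (the names are just sample data). Note that the secondary key is sorted first and the primary key last.

```python
people = [("Smith", "Zed"), ("Adams", "Bea"), ("Smith", "Alfred")]  # (last, first)

by_first = sorted(people, key=lambda p: p[1])   # pass 1: secondary key
by_last = sorted(by_first, key=lambda p: p[0])  # pass 2: primary key, stable
print(by_last)  # [('Adams', 'Bea'), ('Smith', 'Alfred'), ('Smith', 'Zed')]

# The same result in a single pass with a compound key:
print(sorted(people, key=lambda p: (p[0], p[1])))
```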
A sorting algorithm is stable if it preserves the order of duplicate keys.
OK, fine, but why should this be important? Well, the question of "stability" in a sorting algorithm arises when we wish to sort the same data more than once according to different keys.
Sometimes data items have multiple keys. For example, perhaps a (unique) primary key such as a social insurance number, or a student identification number, and one or more secondary keys, such as city of residence, or lab section. And we may very well want to sort such data according to more than one of the keys. The trouble is, if we sort the same data according to one key, and then according to a second key, the second key may destroy the ordering achieved by the first sort. But this will not happen if our second sort is a stable sort.
From Stable Sorting Algorithms
A priority queue is an example of this. Say you have this:
(1, "bob")
(3, "bill")
(1, "jane")
If you sort this from smallest to largest number, an unstable sort might do this.
(1, "jane")
(1, "bob")
(3, "bill")
...but then "jane" got ahead of "bob" even though it was supposed to be the other way around.
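Python's `heapq` module does not keep equal-priority entries in insertion order on its own; its documentation suggests storing an increasing entry count as a tie-breaker. A minimal priority queue along those lines (the class name is hypothetical):

```python
import heapq
from itertools import count

class StablePriorityQueue:
    """Min-priority queue that returns equal-priority items in insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = count()  # increasing sequence number used as tie-breaker

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        priority, _, item = heapq.heappop(self._heap)
        return priority, item

q = StablePriorityQueue()
for priority, name in [(1, "bob"), (3, "bill"), (1, "jane")]:
    q.push(priority, name)
print([q.pop() for _ in range(3)])  # [(1, 'bob'), (1, 'jane'), (3, 'bill')]
```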
Generally, they are useful for sorting multiple entries in multiple steps.
Not all sorting is based upon the entire value. Consider a list of people. I may only want to sort them by their names, rather than all of their information. With a stable sorting algorithm, I know that if I have two people named "John Smith", then their relative order is going to be preserved.
Last    First    Phone
-----------------------------
Wilson  Peter    555-1212
Smith   John     123-4567
Smith   John     012-3456
Adams   Gabriel  533-5574
Since the two "John Smith"s are already "sorted" (they're in the order I want them), I don't want them to change positions. If I sort these items by last name, then first name, with an unstable sorting algorithm, I could end up either with this:
Last    First    Phone
-----------------------------
Adams   Gabriel  533-5574
Smith   John     123-4567
Smith   John     012-3456
Wilson  Peter    555-1212
Which is what I want, or I could end up with this:
Last    First    Phone
-----------------------------
Adams   Gabriel  533-5574
Smith   John     012-3456
Smith   John     123-4567
Wilson  Peter    555-1212
(Note that the two "John Smith"s have switched places.) This is NOT what I want.
If I used a stable sorting algorithm, I would be guaranteed to get the first option, which is what I'm after.
An example:
Say you have a data structure that contains pairs of phone numbers and employees who called them. A number/employee record is added after each call. Some phone numbers may be called by several different employees.
Furthermore, say you want to sort the list by phone number and give a bonus to the first 2 people who called any given number.
If you sort with an unstable algorithm, you may not preserve the order of callers of a given number, and the wrong employees could be given the bonus.
A stable algorithm makes sure that the right 2 employees per phone number get the bonus.
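A sketch of the bonus example with made-up phone numbers and names: because `sorted` is stable, each number's callers stay in call order after sorting, and `itertools.groupby` plus `islice` picks the first two per number. `bonus_winners` is a hypothetical name.

```python
from itertools import groupby, islice

def bonus_winners(calls, per_number=2):
    """calls: (phone_number, employee) records in the order the calls were made."""
    by_number = sorted(calls, key=lambda call: call[0])  # stable: call order kept per number
    return {
        number: [employee for _, employee in islice(group, per_number)]
        for number, group in groupby(by_number, key=lambda call: call[0])
    }

calls = [("555-0100", "alice"), ("555-0199", "bob"), ("555-0100", "carol"),
         ("555-0100", "dave"), ("555-0199", "erin")]
print(bonus_winners(calls))  # {'555-0100': ['alice', 'carol'], '555-0199': ['bob', 'erin']}
```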
It means that if you want to sort by Album AND by Track Number, you can click Track Number first, and it's sorted; then click Album Name, and the track numbers remain in the correct order for each album.
One case is when you want to sort by multiple keys. For example, to sort a list of first name / surname pairs, you might sort first by the first name, and then by the surname.
If your sort was not stable, then you would lose the benefit of the first sort.
The advantage of stable sorting for multiple keys is dubious; you can always use a comparison that compares all the keys at once. It's only an advantage if you're sorting one field at a time, as when clicking on a column heading; Joe Koberg gives a good example.
Any sort can be turned into a stable sort if you can afford to add a sequence number to the record, and use it as a tie-breaker when presented with equivalent keys.
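To illustrate the sequence-number trick, here is a deliberately unstable sort (classic swap-based selection sort) and the same sort made stable by comparing (key, original index) pairs; both function names are hypothetical.

```python
def selection_sort(items, key):
    """Classic swap-based selection sort: O(n^2) and NOT stable."""
    items = list(items)
    for i in range(len(items)):
        m = i
        for j in range(i + 1, len(items)):
            if key(items[j]) < key(items[m]):
                m = j
        items[i], items[m] = items[m], items[i]  # this swap can reorder equal keys
    return items

def stable_selection_sort(items, key):
    """Same algorithm, stabilized by using the original index as a tie-breaker."""
    decorated = list(enumerate(items))
    result = selection_sort(decorated, key=lambda pair: (key(pair[1]), pair[0]))
    return [item for _, item in result]

data = [(2, "a"), (2, "b"), (1, "c")]
print(selection_sort(data, key=lambda r: r[0]))         # [(1, 'c'), (2, 'b'), (2, 'a')]
print(stable_selection_sort(data, key=lambda r: r[0]))  # [(1, 'c'), (2, 'a'), (2, 'b')]
```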
The biggest advantage comes when the original order has some meaning in and of itself. I couldn't come up with a good example, but I see JeffH did so while I was thinking about it.
Let's say you are sorting an input set which has two fields, and you only sort on the first. The '|' character divides the fields.
The input set has many entries, but 3 of them look like this:
.
.
.
AAA|towing
.
.
.
AAA|car rental
.
.
.
AAA|plumbing
.
.
.
Now, when you are done sorting, you expect all the records with AAA in them to be together.
A stable sort will give you:
.
.
.
AAA|towing
AAA|car rental
AAA|plumbing
.
.
.
i.e., the three records which had the same sort key, AAA, are in the same order in the output as they were in the input. Note that they are not sorted on the second field, because you didn't sort on the second field of the record.
An unstable sort will give you:
.
.
.
AAA|plumbing
AAA|car rental
AAA|towing
.
.
.
Note that the records are still sorted only on the first field, but the order of the second-field values differs from the input order.
An unstable sort has the potential to be faster. A stable sort tends to mimic what non-computer-scientist/non-math folks have in mind when they sort something, i.e., if you did an insertion sort with index cards, you would most likely get a stable sort.
You can't always compare all the fields at once. A couple of examples: (1) memory limits, where you are sorting a large disk file and there isn't room for all the fields of all records in main memory; (2) sorting a list of base class pointers, where some of the objects may be derived subclasses (you only have access to the base class fields).
Also, stable sorts have deterministic output given the same input, which can be important for debugging and testing.