Algorithm for meta-ranking of ranked results

Imagine you ask your teammates to hold an election on who should organize the next barbecue. Your team has about 120 people, and you want to select 3 people out of a pool of 6 candidates for the job.
Each of the 120 people can vote for up to 3 candidates by ranking them: 1st choice is X, 2nd choice is Y, 3rd choice is Z.
At the end, all votes should be aggregated into a ranked result list.
| Candidate | Voter 1 | Voter 2 | Voter 3 |
|-----------|---------|---------|---------|
| A         | 1st     |         | 2nd     |
| B         | 3rd     | 1st     | 3rd     |
| C         | 2nd     | 2nd     |         |
| D         |         | 3rd     |         |
| E         |         |         |         |
| F         |         |         | 1st     |
If the voters had done no ranking and each vote counted equally, aggregation would be straightforward: B got 3 votes, A and C got 2 votes each, and everyone else got fewer. The winners would be A, B and C.
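For that simple, rank-free case a plain tally is enough. A minimal Python sketch, with the ballots transcribed from the example table above (best choice listed first, although order is ignored here):

from collections import Counter

# Each ballot lists the candidates a voter marked; ranking is ignored for a plain tally.
ballots = [
    ["A", "C", "B"],   # Voter 1
    ["B", "C", "D"],   # Voter 2
    ["F", "A", "B"],   # Voter 3
]

tally = Counter(candidate for ballot in ballots for candidate in ballot)
print(tally.most_common(3))  # B leads with 3 votes; A and C follow with 2 each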
I do not know what algorithms exist to aggregate ranked data, and I do not know what the result should look like. F got a 1st-place vote, which is good, but A and B got such a vote too. From my point of view A and B are better, because they got more votes. But is A better than B? A got a 2nd place, but B got two 3rd places; which should rank higher? Is two 2nd places better than one 1st place plus two 3rd places?
This sounds like implementing a meta search engine ranking algorithm. What algorithms exist? Which one should I use?

Since you asked "What should I use?", I can recommend the group of methods called "Condorcet methods", as Terje D. mentioned. If you do not want to dig deeper into the theory of election methods, I recommend one particular Condorcet method: the "Schulze method" (also known as path winner or beatpath winner). It is used, for example, by Debian, KDE and the Pirate Party of Germany.
You can use this online ballot to get an ad hoc solution to your problem: https://modernballots.com/elections/qm65cnts/vote/
If you want to implement it on your company website (intranet, or whatever), I recommend you contribute to an existing project. If you are a PHP developer, check this out: https://bitbucket.org/robla/electowidget/src/14581ac7a5f2/lib/methods/SchulzeMethod.php
Electowidget started out as a plugin for MediaWiki. It may be a good starting point, and maybe you want to contribute some changes to turn it into a library.
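If you would rather roll your own than adopt Electowidget, the core of the Schulze method is short. Below is a minimal, unoptimized Python sketch (the candidate names and the ballots format are assumptions for illustration): it counts pairwise preferences, computes strongest-path strengths with a Floyd-Warshall-style pass, and orders candidates by how many opponents they beat on strongest paths.

from itertools import combinations

def schulze_ranking(candidates, ballots):
    """ballots: list of rankings, each a list of candidates, best first;
    unlisted candidates are treated as ranked below all listed ones."""
    # d[x][y] = number of voters preferring x over y
    d = {x: {y: 0 for y in candidates if y != x} for x in candidates}
    for ballot in ballots:
        rank = {c: ballot.index(c) if c in ballot else len(candidates)
                for c in candidates}
        for x, y in combinations(candidates, 2):
            if rank[x] < rank[y]:
                d[x][y] += 1
            elif rank[y] < rank[x]:
                d[y][x] += 1

    # p[x][y] = strength of the strongest path from x to y
    p = {x: {y: (d[x][y] if d[x][y] > d[y][x] else 0)
             for y in candidates if y != x} for x in candidates}
    for i in candidates:
        for j in candidates:
            if j == i:
                continue
            for k in candidates:
                if k == i or k == j:
                    continue
                p[j][k] = max(p[j][k], min(p[j][i], p[i][k]))

    # Score each candidate by the number of opponents it beats on strongest paths
    wins = {x: sum(1 for y in candidates if y != x and p[x][y] > p[y][x])
            for x in candidates}
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

# Ballots from the example table, best candidate first
ballots = [["A", "C", "B"], ["B", "C", "D"], ["F", "A", "B"]]
print(schulze_ranking(list("ABCDEF"), ballots))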

Maybe just do it like this: assign 3 points to each first place, 2 points to each second place, and 1 point to each third place, then see which candidates have the most points. (This is essentially a Borda count on truncated ballots.)
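A minimal sketch of that point scheme, reusing the example ballots from the table above (best choice first):

from collections import defaultdict

ballots = [["A", "C", "B"], ["B", "C", "D"], ["F", "A", "B"]]  # best first
points = defaultdict(int)
for ballot in ballots:
    for position, candidate in enumerate(ballot):
        points[candidate] += 3 - position  # 3 points for 1st, 2 for 2nd, 1 for 3rd

print(sorted(points.items(), key=lambda kv: kv[1], reverse=True))
# A and B tie at 5 points, C has 4, F has 3, D has 1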

Related

Optimal way to transfer account balances between people

I am looking for an optimal way to transfer funds between accounts so that everyone ends up with the same amount in their account. I have already calculated the target balance for every account, but I am looking for an algorithm that decides who should transfer how much to whom (without a central bank) so that everyone ends up with the same balance.
Example
+--------+---------+
| person | balance |
+--------+---------+
| A | 7 500 |
| B | -2 500 |
| C | -10 000 |
| D | 15 000 |
+--------+---------+
In this example everyone should end up with a balance of 2 500. To achieve that:
Person A should transfer 5 000 to person B
Person D should transfer 12 500 to person C
To sum up, the data I have is:
the number of people (>= 2)
the starting balance of every person
the balance every person should have after the transfers
Is there any algorithm for that? Is that an NP-complete problem?
This problem is NP-hard. It is essentially equivalent to this other problem:
Given a group of people with credits and debits between them, find the minimum number of transfers needed to settle all credits and debits.
Here, your goal is to make everyone's total equal to the average, which becomes the same problem as the one above once you subtract the average from everyone's account.
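Minimizing the number of transfers is the hard part; if you just need some valid set of transfers, a simple greedy pairing of the largest creditor with the largest debtor settles everything in at most n - 1 transfers. A minimal sketch, using the balances from the example (2 500 is the computed average):

def settle(balances):
    """balances: dict person -> current balance. Returns a list of
    (payer, payee, amount) transfers that equalizes everyone at the average."""
    average = sum(balances.values()) / len(balances)
    deltas = {p: b - average for p, b in balances.items()}  # >0 means surplus to give away
    transfers = []
    while any(abs(d) > 1e-9 for d in deltas.values()):
        payer = max(deltas, key=deltas.get)   # largest surplus
        payee = min(deltas, key=deltas.get)   # largest deficit
        amount = min(deltas[payer], -deltas[payee])
        deltas[payer] -= amount
        deltas[payee] += amount
        transfers.append((payer, payee, amount))
    return transfers

print(settle({"A": 7500, "B": -2500, "C": -10000, "D": 15000}))
# e.g. [('D', 'C', 12500.0), ('A', 'B', 5000.0)]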

Building an attribution model

My problem involves measuring the impact of four activities undertaken by a population of 1,000 individuals that is attempting to lose weight.
The four activities are: (a) Eating healthy food; (b) Walking for an hour daily; (c) Meditation daily for 20-minutes; and (d) In-house physical exercise for 30-minutes.
For the sake of simplicity, let us assume that there are 1,000 participants who signed up on January 1, 2015, and their weights were measured then. Furthermore, let us assume that they will diligently commit to doing the same activity for the entirety of a quarter (i.e., if they commit to eating healthy food in Q1 they don't undertake any other activity; however they may change the activity at the start of Q2 or choose to continue what they did in Q1).
Finally, on December 31, 2015, I catch them before they head to Times Square to celebrate the onset of the new year, and weigh them.
So my table looks something like:
Individual | Initial Weight | Q1 | Q2 | Q3 | Q4 | Final Weight
A-1        | 183            | A  | B  | A  | C  | 176
A-2        | 265            | D  | C  | B  | B  | 223
A-3        | 331            | A  | A  | A  | D  | 322
...
A-1000     | 257            | D  | B  | C  | A  | 228
My goal is to measure each activity's contribution to the weight loss across the population, keeping in mind that there is a distinct possibility that the order of activities undertaken has an impact.
(In my real problem, one of the complexities I haven't spelt out is that instead of doing the same activity throughout a quarter, individuals could actually have done any of those activities on a daily basis.)
Any thoughts would be appreciated.
Stack Overflow is mainly for problems that involve code, e.g. a piece of code in a specific language that doesn't work.
Try posting this question on another Stack Exchange site; users there would love to answer problems like these.
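If you want a quick starting point before asking elsewhere, one common first pass is to regress the weight change on how many quarters each person spent on each activity; the coefficients then act as a rough per-quarter attribution (it ignores ordering effects entirely). A minimal sketch, with illustrative rows in the format of the table above (numpy's least-squares solver is used only for illustration; any regression tool works):

import numpy as np

# (initial_weight, [Q1..Q4 activities], final_weight) -- illustrative rows only
rows = [
    (183, ["A", "B", "A", "C"], 176),
    (265, ["D", "C", "B", "B"], 223),
    (331, ["A", "A", "A", "D"], 322),
    (257, ["D", "B", "C", "A"], 228),
]

activities = ["A", "B", "C", "D"]
# Feature = number of quarters spent on each activity; target = weight change
X = np.array([[quarters.count(a) for a in activities] for _, quarters, _ in rows])
y = np.array([final - initial for initial, _, final in rows])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for activity, c in zip(activities, coef):
    print(f"activity {activity}: {c:+.1f} weight change per quarter (rough attribution)")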

LibreOffice Calc sort rows by column comparison

I'm trying to sort rows based on two columns matching.
For example, in the following table, two users rate the same books. In sorting the example table below, Book 2 should come first and Book 4 second, because in those rows the users' ratings match.
BOOK USER A USER B
Book 1 4.5 3.5
Book 2 2.0 2.0
Book 3 5.0 3.5
Book 4 3.0 3.0
The remaining rows, which did not match, would be in ascending order based on USER A's ratings (although this isn't the important bit, really).
I can use the basic Sort: sort BOOK and USER A by USER A ascending, then sort USER B ascending separately, and everything matches up again with the correct ratings for the correct books, just as I want. But I need a more functional way of doing this, mainly so I can copy the sorted data to a new sheet.
I am not certain whether this is "a more functional way of doing this", but assuming a layout like:
  |   A    |   B    |   C
--+--------+--------+--------
1 | BOOK   | USER A | USER B
2 | Book 1 | 4.5    | 3.5
3 | Book 2 | 2.0    | 2.0
4 | Book 3 | 5.0    | 3.5
5 | Book 4 | 3.0    | 3.0
If the maximum rating is 5, it can be solved easily with a very simple formula in column D:
=IF(B2-C2=0;-5+B2;B2)
Basically it checks the difference between columns B and C. If they are equal, it returns a negative value (the rating minus the maximum of 5), so matching rows sort first, ordered by the rating itself; if not, it uses the rating from USER A.
You can then sort the whole range (ascending) based on column D. You should get the result you want.
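If you end up building the new sheet outside Calc (scripting, export, etc.), the same ordering is just a sort key. A minimal Python sketch with the example data, under the same maximum-rating-of-5 assumption as the formula:

rows = [
    ("Book 1", 4.5, 3.5),
    ("Book 2", 2.0, 2.0),
    ("Book 3", 5.0, 3.5),
    ("Book 4", 3.0, 3.0),
]

MAX_RATING = 5  # same assumption as the Calc formula
# Matching rows sort first (rating - 5 is <= 0), the rest by USER A rating
rows.sort(key=lambda r: r[1] - MAX_RATING if r[1] == r[2] else r[1])
print(rows)  # Book 2, Book 4, then Book 1 and Book 3 by USER A ascending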

Fast way to search based on non-literal comparison

I am developing a small search over rather large data sets, basically all strings. The relations between the table fields are simple enough, but the comparison must not be literal, i.e. it should be able to correlate "filippo", "philippo", "filipo" and so forth.
I have found a few ways it could be done, very frequently stumbling on Levenshtein distance (this, here and here), though I am not sure it is practical in my specific case.
In a nutshell, I have two tables: a small one with "search keys" and a more massive one in which the search should be performed. Both tables have the same fields and the fields have the same "meaning". E.g.
KEYS_TABLE
# | NAME | MIDNAME | SURNAME | ADDRESS | PHONE
1 | John | Fake | Doe | Sesame St. | 333-12-32
2 | Ralph | Stue | Michel | Bart. Ghost St. | 778-13000
...
and
SEARCH_TABLE
# | NAME | MIDNAME | SURNAME | ADDRESS | PHONE
...
532 | Jhon | F. | Doe | Sesame Street | 3331232
...
999 | Richard | Dalas | Doe | Sesame St. | 333-12-32
All I want is to obtain some sort of metric, or rank, for each given record in KEYS_TABLE, and to report all records from SEARCH_TABLE above a certain relevance (defined either by the metric or simply by some "KNN"-like method).
I say that Levenshtein distance might not be practical because it would have to be computed for every field of every row in KEYS_TABLE x SEARCH_TABLE. With about 400 million records in SEARCH_TABLE and 100k to 1 million in KEYS_TABLE, that is on the order of 4x10^13 to 4x10^14 pairwise comparisons, which is far too many.
I was hoping there was some way I could pre-process (enrich) both tables beforehand, or some simpler (cheaper) way to perform the search.
Worth mentioning that I am allowed to transform the data at will, e.g. normalise "St." to "st", "Street" to "st", remove special chars and so on.
What would be my options?
One approach (heuristic!) I can think of is the following (a sketch of the pipeline is given after the steps):
In addition to the original fields in the table, for each field also store its normalized form, obtained by some stemming algorithm. If you are using Java, Lucene's EnglishAnalyzer might help you with this step.
Do an exact comparison using standard methods to find, for each entry in KEYS_TABLE, a list of candidates. An entry e2 in SEARCH_TABLE is a candidate for entry e1 in KEYS_TABLE if they have some common field where the normalized forms match. That can be done efficiently using a data structure that allows quick string lookups - there are plenty of these.
For each entry e1, find the "best" candidate(s) in its list, using the exact metric you chose (for example your suggested Levenshtein distance).
You might want to do some post-processing to make sure you don't have two entries of KEYS_TABLE mapped to the same entry of SEARCH_TABLE, if that's an issue.
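A minimal Python sketch of that pipeline, assuming the tables are in-memory lists of dicts keyed by the field names from the question (tables of the real size would live in a database or an inverted index, and the normalize rules here are illustrative only). It buckets SEARCH_TABLE rows by normalized field values, then scores only the candidates that share a bucket, using Levenshtein distance:

import re
from collections import defaultdict

def normalize(value):
    """Illustrative normalization: lowercase, strip punctuation, common abbreviations."""
    value = re.sub(r"[^a-z0-9 ]", "", value.lower())
    return re.sub(r"\b(street|st)\b", "st", value).strip()

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

FIELDS = ["NAME", "MIDNAME", "SURNAME", "ADDRESS", "PHONE"]

def match(keys_table, search_table, max_total_distance=6):
    # Steps 1-2: bucket SEARCH_TABLE rows by (field, normalized value) for exact lookups
    buckets = defaultdict(list)
    for row in search_table:
        for f in FIELDS:
            buckets[(f, normalize(row[f]))].append(row)

    # Step 3: for each key row, score only the candidates that share at least one bucket
    results = {}
    for key in keys_table:
        candidates = {id(r): r for f in FIELDS for r in buckets[(f, normalize(key[f]))]}
        scored = [(sum(levenshtein(normalize(key[f]), normalize(r[f])) for f in FIELDS), r)
                  for r in candidates.values()]
        results[key["#"]] = [r for score, r in sorted(scored, key=lambda s: s[0])
                             if score <= max_total_distance]
    return results

On real data the in-memory buckets would be replaced by an index over the normalized columns, but the candidate-generation-then-exact-scoring structure stays the same.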
Depending on what misspellings are likely, you might be able to use Soundex or Metaphone for your searches.
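If phonetic matching fits your misspellings, the bucketing step can key on a phonetic code instead of (or in addition to) the normalized string. A tiny sketch using the third-party jellyfish package (an assumption; any Soundex/Metaphone implementation works):

from collections import defaultdict
import jellyfish  # pip install jellyfish -- assumed third-party dependency

names = ["filippo", "philippo", "filipo", "richard"]
buckets = defaultdict(list)
for name in names:
    # Metaphone maps e.g. "ph" and "f" to the same sound; names whose codes
    # collide become fuzzy-match candidates for each other
    buckets[jellyfish.metaphone(name)].append(name)
print(dict(buckets))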

Open space seating optimization algorithm

As a result of changes in the company, we have to rearrange our seating plan: one room with 10 desks in it. Some desks are more popular than others for a number of reasons. One solution would be to draw desk numbers from a hat. We think there is a better way to do it.
We have 10 desks and 10 people. Let's give every person in this contest 50 hypothetical tokens to bid on the desks. There is no limit on how much you bid on one desk; you can put all 50 on one, which would be saying "I want to sit only here, period". You can also say "I don't care" by giving every desk 5 tokens.
Important note: nobody knows what other people are doing. Everyone has to decide based only on his/her own best interest (sounds familiar?).
Now let's say we obtained these hypothetical results:
# | Desk# >| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | Alise | 30 | 2 | 2 | 1 | 0 | 0 | 0 | 15 | 0 | 0 | = 50
2 | Bob | 20 | 15 | 0 | 10 | 1 | 1 | 1 | 1 | 1 | 0 | = 50
...
10 | Zed | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | = 50
Now, what we need to find is the one (or more) configuration(s) that gives us maximum satisfaction, i.e. people get the desks they wanted, taking all the bids into account and maximizing the total over the group. (Naturally the assumption is that the more someone bid on a desk, the more he/she wants it.)
Since there are only 10 people, I think we can brute force it by looking at all possible configurations (10! is about 3.6 million assignments), but I was wondering if there is a better algorithm for solving this kind of problem.
You seem to be looking at the Assignment Problem, which can be solved using the Hungarian Algorithm. This is a well-researched problem and you will probably find code on the web, ready to use.
In your case you can use cost = 50 - bid and apply any solution to the assignment problem, as in the sketch below.
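A minimal sketch of that cost transformation using SciPy's scipy.optimize.linear_sum_assignment. The 3x3 bid matrix is made up for brevity (each row still sums to the 50-token budget); the real one would be 10x10:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative 3-person / 3-desk bid matrix (made up; the real one is 10x10)
bids = np.array([
    [30, 15, 5],   # Alise
    [20, 25, 5],   # Bob
    [10, 10, 30],  # Carol (hypothetical third person)
])

cost = 50 - bids                       # turn "maximize total bids" into a min-cost problem
people, desks = linear_sum_assignment(cost)
for p, d in zip(people, desks):
    print(f"person {p} -> desk {d} (bid {bids[p, d]})")
print("total satisfaction:", bids[people, desks].sum())

With 10 people this solves instantly; the brute-force 10! enumeration would also work, but it scales much worse.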
Even faster: if you have Excel, you should have a version of SOLVER available as well. Just set up your bid matrix (10x10 with bids) and an assignment matrix (10x10 with 0/1 assignments), use SUMPRODUCT(bids, assignments) to calculate the value of an assignment, make that your objective function, and add constraints so that each person gets exactly one desk and each desk exactly one person. Make sure the "linear model" and "assume non-negative" options are checked, and solve away! I just set up a sample 10x10 problem; it seems to work OK.
