Building an attribution model - measurement

My problem involves measuring the impact of four activities undertaken by a population of 1,000 individuals who are attempting to lose weight.
The four activities are: (a) eating healthy food; (b) walking for an hour daily; (c) meditating for 20 minutes daily; and (d) in-house physical exercise for 30 minutes.
For the sake of simplicity, let us assume that 1,000 participants signed up on January 1, 2015, and were weighed then. Furthermore, let us assume that each participant diligently commits to a single activity for an entire quarter (i.e., if they commit to eating healthy food in Q1, they don't undertake any other activity that quarter; however, at the start of Q2 they may switch to another activity or continue with the one they did in Q1).
Finally, on December 31, 2015, I catch them before they head to Times Square to celebrate the New Year, and weigh them again.
So my table looks something like this:
Individual | Initial Weight | Q1 | Q2 | Q3 | Q4 | Final Weight
A-1 | 183 | A | B | A | C | 176
A-2 | 265 | D | C | B | B | 223
A-3 | 331 | A | A | A | D | 322
...
A-1000 | 257 | D | B | C | A | 228
My goal is to measure each activity's contribution to the weight loss across the population, keeping in mind that the order in which the activities are undertaken could well have an impact.
(One complexity of my real problem that I haven't spelt out is that, instead of doing the same activity for a whole quarter, individuals may switch between any of these activities on a daily basis.)
Any thoughts would be appreciated.
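One way to start framing this is as a regression problem. The sketch below is only a starting point under a loudly stated assumption: each quarter spent on an activity contributes a fixed, additive amount of weight loss regardless of order (exactly the assumption the question says may not hold). It estimates those per-activity amounts by least squares on the number of quarters each person spent on each activity. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

ACTIVITIES = "ABCD"  # activity codes as used in the Q1..Q4 columns of the table


def design_matrix(schedules):
    """One row per individual, one column per activity: quarters spent on that activity."""
    X = np.zeros((len(schedules), len(ACTIVITIES)))
    for row, quarters in enumerate(schedules):
        for activity in quarters:
            X[row, ACTIVITIES.index(activity)] += 1
    return X


def estimate_effects(schedules, initial, final):
    """Least-squares estimate of the weight lost per quarter spent on each activity.

    ASSUMPTION: total loss is the sum of per-quarter effects (additive,
    order-independent, no intercept).
    """
    X = design_matrix(schedules)
    loss = np.asarray(initial, dtype=float) - np.asarray(final, dtype=float)
    effects, _residuals, _rank, _singular_values = np.linalg.lstsq(X, loss, rcond=None)
    return {a: float(e) for a, e in zip(ACTIVITIES, effects)}
```

A call would look like `estimate_effects(["ABAC", "DCBB", "AAAD", ...], [183, 265, 331, ...], [176, 223, 322, ...])` over all 1,000 rows. To test whether order matters, one option is to replace the four count columns with sixteen activity-by-quarter indicator columns and compare how well the two models fit; for the daily version of the real problem, the same design matrix works with day counts instead of quarter counts.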

Stack Overflow is mainly for problems that involve code, e.g. a piece of code that doesn't work in a specific language.
Try posting this question on a Stack Exchange site; users there would love to answer problems like these.

Related

Optimal way to transfer account's balance between people

I am looking for an optimal way to transfer funds between accounts so that everyone ends up with the same amount. I've correctly calculated the balance every account should end up with, but I am looking for an algorithm that determines who should transfer how much to whom (without a central bank) so that everyone ends up with the same balance.
Example
+--------+---------+
| person | balance |
+--------+---------+
| A | 7 500 |
| B | -2 500 |
| C | -10 000 |
| D | 15 000 |
+--------+---------+
In this example everyone should end up with a balance of 2 500. To achieve that:
Person A should transfer 5 000 to person B
Person D should transfer 12 500 to person C
To sum up, the data I have is:
number of people (>= 2)
starting balance of every person
what balance should every person have after transfers
Is there any algorithm for that? Is that an NP-complete problem?
Finding the minimum number of transfers is NP-hard. It's essentially equivalent to this other problem:
Given a group of people and an amount of credits and debits, find the minimum number of transfers between those people to settle all credits and debits.
Here, your goal is to make everyone's balance equal to the average; once you subtract the average from every account, your problem becomes exactly the one above.
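Finding some valid set of transfers is easy; only minimising their number is hard. A minimal greedy sketch, assuming whole-unit amounts whose total divides evenly: sort the people who must pay out and the people who must receive by amount, then walk both lists, each time moving as much as possible. Every transfer settles at least one person, so it never uses more than n - 1 transfers, but it is not guaranteed to be minimal. `settle` is a hypothetical name.

```python
def settle(balances):
    """Greedy list of (payer, payee, amount) transfers that brings everyone to the average.

    ASSUMPTION: balances are whole units and their total divides evenly by the
    number of people. Uses at most n - 1 transfers, but not necessarily the minimum.
    """
    total, n = sum(balances.values()), len(balances)
    if total % n:
        raise ValueError("total does not divide evenly between people")
    target = total // n
    # (amount, person), largest first: who must pay out, and who must receive
    payers = sorted(((b - target, p) for p, b in balances.items() if b > target), reverse=True)
    payees = sorted(((target - b, p) for p, b in balances.items() if b < target), reverse=True)
    transfers, i, j = [], 0, 0
    while i < len(payers) and j < len(payees):
        surplus, payer = payers[i]
        shortfall, payee = payees[j]
        amount = min(surplus, shortfall)
        transfers.append((payer, payee, amount))
        payers[i] = (surplus - amount, payer)
        payees[j] = (shortfall - amount, payee)
        if payers[i][0] == 0:
            i += 1
        if payees[j][0] == 0:
            j += 1
    return transfers


print(settle({"A": 7500, "B": -2500, "C": -10000, "D": 15000}))
# [('D', 'C', 12500), ('A', 'B', 5000)]
```

On the example from the question this reproduces the two transfers listed there.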

How to space neighbouring wireless transmissions temporally?

I have a rather interesting problem that has probably been solved before, but I'm not really sure where to look.
The situation
I am developing a system that consists of:
mobile wireless nodes that move around
the nodes broadcast a beacon every 1000 milliseconds
the nodes can hear the beacons from other such nodes
the beacon includes a unique identifier for the sender (like a MAC address)
the beacon can include other information, up to a size limit (such as info about other neighbours)
The requirements
I am trying to develop an algorithm & implementation that:
gets neighbouring nodes to align themselves on 'virtual slots' within the 1000 ms period (with a +/- 10 ms tolerance)
tolerates up to 4 neighbouring nodes within radio range of each other (i.e. there would be 4 slots of 250 ms +/- 10 ms each)
can work through observation alone (i.e. is fully decentralised)
can tolerate sets of already aligned neighbours coming in range of each other (i.e. 2 + 2 = 4)
gets nodes to converge on the slots gradually by lengthening or shortening the beacon period to 1001 ms or 999 ms respectively
makes no assumptions about the bidirectionality of the radio link
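The gradual-convergence requirement can be sketched as a per-cycle decision. This sketch assumes each node already knows its own beacon phase relative to a shared time base and which slot it is aiming for; choosing the slot is the hard part and is not shown. The constants mirror the requirements (4 slots of 250 ms, +/- 10 ms), and `next_period` is a hypothetical name.

```python
PERIOD_MS = 1000
SLOTS = 4
SLOT_MS = PERIOD_MS // SLOTS   # 250 ms per slot
TOLERANCE_MS = 10


def next_period(phase_ms, slot):
    """Beacon period to use for the next cycle.

    phase_ms: when this node currently beacons, as an offset in [0, 1000) from a
    shared time base (ASSUMED to exist); slot: index of the slot the node targets.
    """
    target = slot * SLOT_MS
    ahead = (target - phase_ms) % PERIOD_MS  # how much later than now the target lies
    if min(ahead, PERIOD_MS - ahead) <= TOLERANCE_MS:
        return PERIOD_MS        # close enough: keep the nominal 1000 ms period
    if ahead <= PERIOD_MS / 2:
        return PERIOD_MS + 1    # target is later: lengthen to 1001 ms (drift right)
    return PERIOD_MS - 1        # target is earlier: shorten to 999 ms (drift left)


# Each cycle the phase moves by (period - 1000) ms, so a node at 100 ms aiming for
# slot 1 (250 ms) drifts right by 1 ms per beacon until it is within tolerance.
phase = 100
for _ in range(200):
    phase = (phase + next_period(phase, 1) - PERIOD_MS) % PERIOD_MS
print(phase)  # 240
```

Moving by only 1 ms per cycle keeps each step small compared with the 10 ms tolerance, so a node settles inside the tolerance window rather than overshooting it.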
For example, these topologies should work:
4 nodes that can all see each other
A-B
|X|
C-D
long lines that can see at most 4 neighbours
A-B-C-D-E-F-G-H-I
|
X-Y-Z
Examples of how the algorithm COULD work
Example 1 - A & B are close to each other; the algorithm decides B should move towards the right
1000 2000 3000 4000
0 250 500 750 0 250 500 750 0 250 500 750 0 250 500 750 0
| | | | | | | | | | | | | | | | |
A A A A A
>> >>>>
B B B B
Time ------------------------------------------------------------------------------>
Example 2 - A & B are aligned; the algorithm decides C should move towards the left
1000 2000 3000 4000
0 250 500 750 0 250 500 750 0 250 500 750 0 250 500 750 0
| | | | | | | | | | | | | | | | |
A B A B A B A B A
<< <<<<
C C C C
Time ------------------------------------------------------------------------------>
Current ideas / guides
The radio communications are unacknowledged/broadcast beacons, so confirmation of the RF link is impossible. However, if the nodes included in their own beacon the identifiers of their neighbours they can kind of share enough information.
For example, in Example 2 above, if A can hear B, then A's beacon could include "B is 250ms after me". Likewise, if B can hear A, then B's beacon could include "A is 750ms after me".
With this style of information this problem starts to look a lot like a network graph problem, where each node can build outwards from itself based on the nodes it can hear and the nodes they can hear in turn. I have looked into things like Spanning Tree Protocol for inspiration but haven't had much luck yet.
The problem
Although it looks like a network graph problem, the issue is really how to shift the timing of the beacons.
In essence, the algorithm answers the question "Should I move my own beacon? If yes, in which direction?", taking into account:
How hard it is for neighbours to move (i.e. if they already have a well optimised neighbourhood)?
How will my neighbours be moving (i.e. will we both move apart, or should just one of us move)?
Actual questions
Sooooo, after quite a lot of text, my questions really are:
Are there any examples of this behaviour anyone knows about?
Do you think that the graph creation is a good/bad idea?
Do you think the graph will just get in the way of what I'm really trying to do - space temporally?
Is gradual convergence a good idea?
Is the virtual slot idea a good one?
Does this have parallels to STP, except that there could be multiple root nodes?
BTW: I'm not interested in Radio/MAC layer solutions to this, it really is an application issue!
EDIT: The End Result
I never got around to solving this "slotting" idea. Instead, our nodes simply moved away from any conflicts: rather than developing a slotting algorithm, we built something that tries to space beacons equally but can oscillate.
I suggest that you break this up into two distributed algorithms that you can find written up.
1) Distributed clock synchronization. Have all nodes agree on a common time base, using algorithms such as those used by NTP. Until the nodes agree closely enough to keep different transmissions apart, which may take some time, you can fall back on the ALOHA protocol: http://en.wikipedia.org/wiki/ALOHAnet#The_ALOHA_protocol
2) Distributed graph colouring. One starting point is http://en.wikipedia.org/wiki/Graph_coloring#Parallel_and_distributed_algorithms. I don't even recognise the algorithms named there, but it at least looks like there is work you can refer to. Under "Decentralized algorithms" they also reference applications to Wireless Channel Allocation, so some of this might be quite close to what you want.
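To make the graph-colouring suggestion concrete, here is a minimal centralised simulation of a simple priority-based distributed colouring, in the spirit of Jones-Plassmann but using node IDs as priorities instead of random numbers: in each round, every node that outranks all of its still-unslotted neighbours takes the lowest slot no neighbour has taken. Each round only needs neighbour information, which is what makes it distributable via beacons. Assumptions: links are symmetric (which the question explicitly does not guarantee), and node names stand in for unique IDs such as MAC addresses; `assign_slots` is a hypothetical name.

```python
def assign_slots(neighbours, num_slots=4):
    """Simulate priority-based distributed slot (colour) assignment.

    neighbours: dict mapping each node ID to the set of node IDs it conflicts with.
    ASSUMPTIONS: links are symmetric, and node IDs are unique and comparable
    (e.g. MAC addresses), so they can serve as priorities.
    """
    slot = {}
    while len(slot) < len(neighbours):
        # A node acts in this round if it outranks every neighbour still without a slot.
        # With unique IDs, two neighbours can never both act in the same round.
        movers = [
            node for node in neighbours
            if node not in slot
            and all(node > other for other in neighbours[node] if other not in slot)
        ]
        for node in movers:
            taken = {slot[other] for other in neighbours[node] if other in slot}
            free = [s for s in range(num_slots) if s not in taken]
            if not free:
                raise ValueError(f"node {node} has no free slot")
            slot[node] = free[0]
    return slot


clique = {n: set("ABCD") - {n} for n in "ABCD"}  # 4 nodes that can all see each other
print(assign_slots(clique))  # {'D': 0, 'C': 1, 'B': 2, 'A': 3}
```

Nodes two hops apart can still collide at a common neighbour that hears both (the hidden-terminal case); a common refinement is to colour the graph in which nodes within two hops of each other are connected.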

Algorithm to meta ranking of ranked results

Imagine you ask your teammates to hold an election on who should organize the next barbecue. Your team has about 120 people, and you want to select 3 people out of a pool of 6 candidates to do that job.
Each of the 120 people can vote for up to 3 candidates by ranking them: 1st choice X, 2nd choice Y, 3rd choice Z.
At the end, all votes should be aggregated into a ranked result list.
| Candidate | Voter 1 | Voter 2 | Voter 3 |
-------------------------------------------
| A | 1. Pos | | 2. Pos |
| B | 3. Pos | 1. Pos | 3. Pos |
| C | 2. Pos | 2. Pos | |
| D | | 3. Pos | |
| E | | | |
| F | | | 1. Pos |
-------------------------------------------
If the voters did not rank and every vote counted equally, aggregating the result would be easy: B got 3 votes, A and C got 2 votes each, and everyone else got fewer. The winners would be A, B and C.
I do not know what algorithms exist to aggregate ranked data, and I do not know what the result should look like. F got a 1st-place vote, which is good, but A and B got one too. From my point of view A and B are better, because they got more votes. But is A better than B? A also got a 2nd place, while B got two 3rd places; which should rank higher? Are two 2nd places (like C) better than one 1st place and two 3rd places (like B)?
This sounds like implementing a meta search engine ranking algorithm. What algorithms exist? Which one should I use?
As you asked "What should I use?", I can recommend the group of methods called "Condorcet methods", as Terje D. mentioned. If you do not want to dig into the complex theory of election methods, I recommend one of the Condorcet methods: the "Schulze method" (also known as path winner or beatpath winner). It is used, for example, by Debian, KDE and the Pirate Party of Germany.
You can use this online ballot for an ad hoc solution to your problem: https://modernballots.com/elections/qm65cnts/vote/
If you want to integrate it into your company website (intranet, or whatever), I recommend contributing to an existing project. If you are a PHP developer, check this out: https://bitbucket.org/robla/electowidget/src/14581ac7a5f2/lib/methods/SchulzeMethod.php
Electowidget was initially a plugin for MediaWiki. It may be a good starting point, and you might want to contribute some changes to turn it into a library.
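For a feel of how the Schulze method works, here is a minimal sketch applied to the three ballots in the table above. It counts pairwise preferences, keeps only the pairwise wins, computes the strongest (widest) path between every pair with a Floyd-Warshall-style loop, and ranks candidates by how many others they beat on strongest paths. Assumption: candidates a voter did not rank are treated as tied below all the candidates that voter did rank. The function names are hypothetical; for real use, an existing, tested implementation such as the one linked above is preferable.

```python
def pairwise_preferences(candidates, ballots):
    """d[x, y] = number of voters who rank x above y.

    ASSUMPTION: candidates missing from a ballot are tied below all ranked ones.
    """
    d = {(x, y): 0 for x in candidates for y in candidates if x != y}
    for ballot in ballots:
        position = {c: i for i, c in enumerate(ballot)}
        unranked = len(ballot)
        for x, y in d:
            if position.get(x, unranked) < position.get(y, unranked):
                d[x, y] += 1
    return d


def schulze_ranking(candidates, ballots):
    d = pairwise_preferences(candidates, ballots)
    # p[x, y]: strength of the strongest (widest) path from x to y, starting
    # from the direct pairwise wins only.
    p = {(x, y): d[x, y] if d[x, y] > d[y, x] else 0 for (x, y) in d}
    for k in candidates:
        for i in candidates:
            for j in candidates:
                if len({i, j, k}) == 3:
                    p[i, j] = max(p[i, j], min(p[i, k], p[k, j]))
    # x beats y if its strongest path to y is stronger than y's path back to x.
    wins = {x: sum(p[x, y] > p[y, x] for y in candidates if y != x) for x in candidates}
    return sorted(candidates, key=lambda c: -wins[c])


ballots = [["A", "C", "B"], ["B", "C", "D"], ["F", "A", "B"]]  # Voter 1, 2, 3
print(schulze_ranking(list("ABCDEF"), ballots))  # ['A', 'B', 'C', 'D', 'F', 'E']
```

On these three ballots, A beats B, and D and F tie (they stay in input order), so A, B and C would organize the barbecue.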
Maybe just do it like this: assign 3 points for each first place, 2 points for each second place, and 1 point for each third place, then check which candidates have the most points.
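The points scheme above is essentially the Borda count, here a truncated version that only scores the top three places. A minimal sketch, run on the three ballots from the table; `borda_scores` is a hypothetical name.

```python
def borda_scores(ballots, points=(3, 2, 1)):
    """Truncated Borda count: 1st place earns points[0], 2nd points[1], and so on."""
    scores = {}
    for ballot in ballots:
        for place, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + points[place]
    return scores


ballots = [["A", "C", "B"], ["B", "C", "D"], ["F", "A", "B"]]  # Voter 1, 2, 3
scores = borda_scores(ballots)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('A', 5), ('B', 5), ('C', 4), ('F', 3), ('D', 1)]
```

Here A and B tie on 5 points and E gets none, which shows that a points scheme answers the "is A better than B?" question only by the weights you choose.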

Fast way to search based on non-literal comparison

I am developing a small search over rather large data sets, basically all strings. The relations between the table fields are simple enough, though the comparison mustn’t be literal, i.e. it should be able to correlate “filippo”, “philippo”, “filipo” and so forth.
I have found a few ways it could be done, very frequently stumbling on Levenshtein distance, though I am not sure it is practical in my specific case.
In a nutshell, I have two tables: a small one with “search keys” and a much larger one in which the search should be performed. Both tables have the same fields, and the fields have the same "meaning". E.g.
KEYS_TABLE
# | NAME | MIDNAME | SURNAME | ADDRESS | PHONE
1 | John | Fake | Doe | Sesame St. | 333-12-32
2 | Ralph | Stue | Michel | Bart. Ghost St. | 778-13000
...
and
SEARCH_TABLE
# | NAME | MIDNAME | SURNAME | ADDRESS | PHONE
...
532 | Jhon | F. | Doe | Sesame Street | 3331232
...
999 | Richard | Dalas | Doe | Sesame St. | 333-12-32
All I want to do is obtain some sort of metric, or rank, for each record in KEYS_TABLE, and report all records from SEARCH_TABLE above a certain relevance (defined either by the metric or simply by some "KNN"-like method).
I say that Levenshtein distance might not be practical because it would have to be computed for every field of every pair in KEYS_TABLE x SEARCH_TABLE. Since SEARCH_TABLE has about 400 million records and KEYS_TABLE varies from 100k to 1 million, that is 4 x 10^13 to 4 x 10^14 record pairs, which is way too many.
I was hoping there was some way I could pre-process (enrich) both tables beforehand, or some simpler (cheaper) way to perform the search.
It's worth mentioning that I am allowed to transform the data at will, e.g. normalise St. to st, Street to st, remove special characters and so on.
What would be my options?
One approach (heuristic!) I can think about is:
In addition to the original fields in the table, also store for each field its normalized form, obtained by some stemming algorithm. If you are using Java, Lucene's EnglishAnalyzer might help you with this step.
Do an exact comparison using standard methods to find, for each entry in table1, a list of candidates. An entry e2 in table2 is a candidate for entry e1 in table1 if they share some field whose normalized forms match exactly. This can be done efficiently using any data structure that allows quick exact string lookups (e.g. a hash index); there are plenty of these.
For each entry e1, find the "best" candidate(s) in its list using the exact metric you chose (for example, your suggested Levenshtein distance).
You might want to do some post-processing to make sure you don't have two elements in table1 mapped to the same element in table2, if that's an issue.
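The steps above can be sketched as follows, assuming records are dicts keyed by the field names from the question and using a deliberately crude normaliser (lowercase, keep only letters and digits, map "street" to "st") in place of a real stemmer. Blocking on exact normalised matches means the expensive Levenshtein distance is computed only against each key's candidate list, not the whole table. `normalise`, `build_index`, `best_matches` and the abbreviation table are hypothetical names.

```python
import re
from collections import defaultdict

FIELDS = ("NAME", "MIDNAME", "SURNAME", "ADDRESS", "PHONE")
ABBREVIATIONS = {"street": "st"}  # hypothetical normalisation table; extend as needed


def normalise(value):
    """Crude stand-in for a stemmer: lowercase, map abbreviations, drop spaces/punctuation."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return "".join(ABBREVIATIONS.get(w, w) for w in words)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_index(table):
    """Map (field, normalised value) -> ids of records in `table` with that value."""
    index = defaultdict(set)
    for rec_id, rec in table.items():
        for field in FIELDS:
            value = normalise(rec[field])
            if value:
                index[field, value].add(rec_id)
    return index


def best_matches(key_rec, table, index, top=1):
    """Candidates share at least one exactly-matching normalised field;
    only those are ranked by summed Levenshtein distance."""
    key = {field: normalise(key_rec[field]) for field in FIELDS}
    candidates = set()
    for field in FIELDS:
        candidates |= index.get((field, key[field]), set())

    def distance(rec_id):
        rec = table[rec_id]
        return sum(levenshtein(key[f], normalise(rec[f])) for f in FIELDS)

    return sorted(candidates, key=distance)[:top]


keys = {1: {"NAME": "John", "MIDNAME": "Fake", "SURNAME": "Doe",
            "ADDRESS": "Sesame St.", "PHONE": "333-12-32"}}
search = {
    532: {"NAME": "Jhon", "MIDNAME": "F.", "SURNAME": "Doe",
          "ADDRESS": "Sesame Street", "PHONE": "3331232"},
    999: {"NAME": "Richard", "MIDNAME": "Dalas", "SURNAME": "Doe",
          "ADDRESS": "Sesame St.", "PHONE": "333-12-32"},
}
print(best_matches(keys[1], search, build_index(search)))  # [532]
```

With the example rows, both 532 and 999 become candidates (same surname, address and phone), and 532 wins with a summed distance of 5. At the scale in the question the index would live in the database rather than in a Python dict.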
Depending on what misspellings are likely, you might be able to use Soundex or Metaphone for your searches.
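For reference, a minimal sketch of American Soundex: keep the first letter, map the remaining consonants to digit classes, collapse repeated codes (h and w do not separate them, vowels do), and pad to four characters. It also shows a limitation for the names in the question: "filippo" and "filipo" get the same code, but "philippo" does not, because Soundex keeps the first letter as-is. Metaphone, which maps "PH" to "F", handles that case better. `soundex` is a hypothetical name.

```python
# American Soundex digit classes; vowels, y, h and w have no code.
CODES = {letter: digit
         for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                "4": "l", "5": "mn", "6": "r"}.items()
         for letter in letters}


def soundex(name):
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


for name in ("filippo", "filipo", "philippo"):
    print(name, soundex(name))
# filippo F410
# filipo F410
# philippo P410
```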

Open space sitting optimization algorithm

As a result of changes in the company, we have to rearrange our seating plan: one room with 10 desks in it. Some desks are more popular than others for a number of reasons. One solution would be to draw desk numbers from a hat, but we think there is a better way to do it.
We have 10 desks and 10 people. Let's give every person in this contest 50 hypothetical tokens to bid on the desks. There is no limit on how much you can bid on one desk: you can put all 50 on it, which would be saying "I want to sit only here, period". You can also say "I don't care" by giving every desk 5 tokens.
Important note: nobody knows what other people are doing. Everyone has to decide based only on his/her own best interest (sounds familiar?)
Now let's say we obtained these hypothetical results:
# | Desk# >| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | Alise | 30 | 2 | 2 | 1 | 0 | 0 | 0 | 15 | 0 | 0 | = 50
2 | Bob | 20 | 15 | 0 | 10 | 1 | 1 | 1 | 1 | 1 | 0 | = 50
...
10 | Zed | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | = 50
Now, what we need to find is the configuration (or configurations) that gives maximum satisfaction, i.e. people get the desks they wanted, taking into account all the bids and maximizing the total for the group. Naturally, the assumption is that the more someone bid on a desk, the more he/she wants it.
Since there are only 10 people, I think we could brute-force it by looking at all possible configurations, but I was wondering if there is a better algorithm for solving this kind of problem?
You seem to be looking at the Assignment Problem, which can be solved using the Hungarian algorithm. This is a well-researched problem, and you will probably find ready-to-use code on the web.
In your case you can use cost = 50 - bid and feed that to any solver for the assignment problem.
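With only 10 people, an exact answer is cheap even without the Hungarian algorithm. As a swapped-in alternative (this is not the Hungarian algorithm), here is a minimal sketch of dynamic programming over subsets of desks, which takes about n^2 * 2^n steps (roughly 100,000 for n = 10) instead of checking all 10! = 3,628,800 permutations. For larger n, a Hungarian-style solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`, or applied to the cost = 50 - bid matrix) is the usual tool. `best_assignment` is a hypothetical name, and the 3x3 bids in the usage line are made up.

```python
def best_assignment(bids):
    """Exact maximum-total assignment of n people to n desks by DP over subsets.

    bids[p][d] is person p's bid on desk d. best[mask] is the highest total when
    people 0..k-1 (k = number of set bits in mask) occupy exactly the desks in mask.
    """
    n = len(bids)
    NEG = float("-inf")
    best = [NEG] * (1 << n)
    last_desk = [-1] * (1 << n)  # desk given to the last person placed to reach mask
    best[0] = 0
    for mask in range(1 << n):
        person = bin(mask).count("1")
        if best[mask] == NEG or person == n:
            continue
        for desk in range(n):
            if mask & (1 << desk):
                continue
            new_mask = mask | (1 << desk)
            total = best[mask] + bids[person][desk]
            if total > best[new_mask]:
                best[new_mask] = total
                last_desk[new_mask] = desk
    # Walk back from the full set of desks to recover who sits where.
    assignment, mask = [0] * n, (1 << n) - 1
    for person in range(n - 1, -1, -1):
        assignment[person] = last_desk[mask]
        mask ^= 1 << last_desk[mask]
    return best[(1 << n) - 1], assignment


print(best_assignment([[30, 15, 5], [20, 25, 5], [10, 10, 30]]))  # (85, [0, 1, 2])
```

The result is the maximum total bid and, for each person in row order, the index of the desk they get.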
Even faster: if you have Excel, you should have a version of Solver available as well. Just set up your bid matrix (10x10 with bids) and an assignment matrix (10x10 with 0/1 assignments), use SUMPRODUCT(bids, assignments) to calculate the value of an assignment, make that your objective function, and add constraints so that each person gets exactly one desk and each desk gets exactly one person. Make sure the "Assume Linear Model" and "Assume Non-Negative" options are checked, and solve away! I just set up a sample 10x10 problem, and it seems to work OK.
