As a result of changes in the company, we have to rearrange our seating plan: one room with 10 desks in it. Some desks are more popular than others for a number of reasons. One solution would be to draw a desk number from a hat. We think there is a better way to do it.
We have 10 desks and 10 people. Let's give every person in this contest 50 hypothetical tokens to bid on the desks. There is no limit on how much you can bid on one desk: you can put all 50 on it, which would be saying "I want to sit only here, period". You can also say "I do not care" by giving every desk 5 tokens.
Important note: nobody knows what other people are doing. Everyone has to decide based only on his/her best interest (sounds familiar?)
Now let's say we obtained these hypothetical results:
# | Desk# >| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | Alise | 30 | 2 | 2 | 1 | 0 | 0 | 0 | 15 | 0 | 0 | = 50
2 | Bob | 20 | 15 | 0 | 10 | 1 | 1 | 1 | 1 | 1 | 0 | = 50
...
10 | Zed | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | = 50
Now, what we need to find is the one (or more) configuration(s) that gives us maximum satisfaction, i.e. people get the desks they wanted, taking into account all the bids and maximizing the total for the group. Naturally, the assumption is that the more someone bid on a desk, the more he/she wants it.
Since there are only 10 people, I think we can brute force it by looking at all possible configurations, but I was wondering if there is a better algorithm for solving this kind of problem?
You seem to be looking at the Assignment Problem, which can be solved using the Hungarian algorithm. This is a well-researched problem and you will probably find ready-to-use code on the web.
In your case you can set cost = 50 - bid and feed that to any solver for the assignment problem, since minimizing the total cost is then the same as maximizing the total bid.
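As a rough illustration of solving the assignment exactly without trying all 10! orderings, here is a minimal sketch that uses dynamic programming over subsets of desks. It is not the Hungarian algorithm itself, just a simpler exact method that is fast enough for 10 people. The function name `best_assignment` and the 3x3 bid matrix are made up for the example, and it maximizes bids directly, which is equivalent to minimizing cost = 50 - bid.

```python
from functools import lru_cache


def best_assignment(bids):
    """Return (total, desks): the maximum total bid and, for each person,
    the 0-based index of the desk they receive."""
    n = len(bids)

    @lru_cache(maxsize=None)
    def solve(person, used):
        # Best result for persons person..n-1, given the bitmask of taken desks.
        if person == n:
            return 0, ()
        best = (-1, ())
        for desk in range(n):
            if not used & (1 << desk):
                total, rest = solve(person + 1, used | (1 << desk))
                total += bids[person][desk]
                if total > best[0]:
                    best = (total, (desk,) + rest)
        return best

    return solve(0, 0)


# Hypothetical 3x3 example; every row sums to 50, as in the question.
bids = [
    [30, 15, 5],
    [20, 25, 5],
    [40, 5, 5],
]
total, desks = best_assignment(bids)
print(total, desks)
```

For 10 people this visits about 2^10 desk subsets, so it finishes instantly; for much larger groups a real Hungarian algorithm implementation would be the better choice.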
Even faster, if you have Excel you should have a version of SOLVER available as well. Just set up your bid matrix (10x10 with bids) and an assignment matrix (10x10 with 0/1 assignments), use sumproduct(bids,assignments) to calculate the value of an assignment, make that your objective function, and add constraints so that each person gets exactly one desk and each desk gets exactly one person. Make sure you have the "linear model" and "assume non-negative" boxes checked under options, and solve away! I just set up a sample 10x10 problem and it seems to work OK.
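Written out, the Solver setup described above amounts to the following linear program. The symbols are my own labels, not anything from the spreadsheet: $b_{ij}$ is person $i$'s bid on desk $j$ and $x_{ij}$ is the matching cell of the assignment matrix.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{10} \sum_{j=1}^{10} b_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{10} x_{ij} = 1 \quad \text{for every person } i, \\
                       & \sum_{i=1}^{10} x_{ij} = 1 \quad \text{for every desk } j, \\
                       & x_{ij} \ge 0 .
\end{aligned}
```

The 0/1 requirement does not have to be stated explicitly here: for this constraint structure the corner solutions of the linear program are already 0/1, which is why the "linear model" option is sufficient.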
I am looking for an optimal way to transfer funds between accounts so that everyone ends up with the same amount in their account. I have correctly calculated the amount that should end up in every account, but I am searching for an algorithm that tells me who should transfer how much to whom (without a central bank) so that everyone ends up with the same balance.
Example
+--------+---------+
| person | balance |
+--------+---------+
| A | 7 500 |
| B | -2 500 |
| C | -10 000 |
| D | 15 000 |
+--------+---------+
In this example everyone should end up with a balance of 2 500. To achieve that:
Person A should transfer 5 000 to person B
Person D should transfer 12 500 to person C
To sum up, the data I have is:
number of people (>= 2)
starting balance of every person
what balance should every person have after transfers
Is there any algorithm for that? Is that an NP-complete problem?
If "optimal" means using the fewest transfers, this problem is NP-hard. It's essentially equivalent to this other problem:
Given a group of people and an amount of credits and debits, find the minimum number of transfers between those people to settle all credits and debits.
Here, your goal is to make everyone's total equal to the average, which basically is the same problem as the one above once you subtract the average total from everyone's account.
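If any valid set of transfers is good enough, a simple greedy heuristic works: subtract the average from every balance, then repeatedly let the person with the largest surplus pay the person with the largest deficit. The sketch below assumes integer balances whose total divides evenly. The name `settle` is hypothetical. It needs at most n - 1 transfers but does not guarantee the true minimum, which is the NP-hard part.

```python
def settle(balances):
    """Return a list of (payer, payee, amount) transfers that equalise balances."""
    target, remainder = divmod(sum(balances.values()), len(balances))
    if remainder:
        raise ValueError("total cannot be split evenly")
    # Split people into those above and those below the target balance.
    surplus = {p: b - target for p, b in balances.items() if b > target}
    deficit = {p: target - b for p, b in balances.items() if b < target}
    transfers = []
    while surplus and deficit:
        payer = max(surplus, key=surplus.get)
        payee = max(deficit, key=deficit.get)
        amount = min(surplus[payer], deficit[payee])
        transfers.append((payer, payee, amount))
        surplus[payer] -= amount
        deficit[payee] -= amount
        # Every round settles at least one person completely.
        if surplus[payer] == 0:
            del surplus[payer]
        if deficit[payee] == 0:
            del deficit[payee]
    return transfers


# The balances from the question.
transfers = settle({"A": 7500, "B": -2500, "C": -10000, "D": 15000})
print(transfers)
```

On the example from the question this yields the same two transfers listed there (D pays C 12 500, A pays B 5 000).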
I'll preface this by saying I am extremely new to neural networks and how they work. I've done a fair bit of reading and played with a few cloud-based tools (Cortana and AWS), but beyond that I am not well versed in the algorithms, the kinds of neural networks, etc.
I'm looking for advice on what systems / tools / kinds of algorithms I can use to achieve the below.
Problem Description
I have a data set that contains time series data for a number of users. The data set can contain a variable number of unique users (prob max out at 150), and each user has 4 different sets of time series data for four different variables. Example data set below
V = Variable
User | Time | V1 | V2 | V3 | V4
1 | 12.00am | 13 | 1045 | 12.2 | 52.4
1 | 12.01am | 12 | 1565 | 11.9 | 50.3
2 | 12.00am | 2 | 15434 | 1.93 | 47.2
2 | 12.01am | 2.02 | 17434 | 1.98 | 43.1
And so on for x users and hundreds of data points for each user.
Required Output
By parsing the data, I want to be able to train the system to either give back a binary TRUE or FALSE for a user based on the input, or alternatively, a probability % of the user being TRUE.
The binary is effectively a TRUE or FALSE result. There can only be one TRUE of all 10 users. I think getting back a % of chance of being TRUE is probably the simplest form? I may be wrong.
Input Format
End point is to have an API that I can send the data set to and it returns user and their probability (or the binary TRUE | FALSE result).
Systems
I would prefer to be able to do this on a 3rd party service as opposed to having to build my own systems to do the processing, but not a necessity.
Training Data
I have years of data to be able to train the system, hundreds of thousands of real user sets and so on.
To Wrap It Up
Looking for advice on the what and the how to predict a binary outcome from multiple time series data sets.
Really appreciate any assistance and guidance here.
Thanks
Russ
I'm working on a similar problem (I am no expert either) but I'll share my approach in case it answers your "what" part of the question.
My solution was to transform the dataset so I ended up with a problem that could be solved with traditional classification algorithms (Random Forest, boosting, ...)
This approach requires that the data is labeled. Each row of the transformed dataset will represent the information associated with one TRUE or FALSE outcome in the training dataset. Each row will be a unique event and will have:
1 column with the response
p sets of columns (one set for each of the p original variables)
k variables to indicate seasonality
Each set of the p sets of columns will consist of the variable at time t (time when you recorded the response of that row), the variable at time t-1 (lag1), ..., and the variable at time t-T (lagT).
Example:
Original dataset (I've retained only V1 and V2 and added an outcome variable)
User | Time | V1 | V2 | outcome
1 | 12.00am | 13 | 1045 | FALSE
1 | 12.01am | 12 | 1565 | TRUE
Transformed dataset
ID | V1_lag1 | V1_lag0 | V2_lag1 | V2_lag0 | outcome
event_id | 13 | 12 | 1045 | 1565 | TRUE
With this set up you could fit a model that would predict the probability of TRUE at time t for a new observation, based on V1 and V2 evaluated at time t and V1 and V2 evaluated at lag1 (t-1min).
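To make the transformation concrete, here is a minimal sketch in plain Python that builds the lagged rows for a single user. The function name `make_lag_rows` and the dict-based record layout are assumptions for illustration. In practice you would probably do this with a dataframe library, once per user, and then feed the rows to a classifier.

```python
def make_lag_rows(records, variables, max_lag):
    """Build one row per outcome from time-ordered records of ONE user.

    Each row holds every variable at lag 0..max_lag plus the outcome at time t.
    """
    rows = []
    # The first max_lag records have no full history, so they are skipped.
    for t in range(max_lag, len(records)):
        row = {}
        for var in variables:
            for lag in range(max_lag, -1, -1):
                row[f"{var}_lag{lag}"] = records[t - lag][var]
        row["outcome"] = records[t]["outcome"]
        rows.append(row)
    return rows


# User 1 from the example, with V1 and V2 only.
user1 = [
    {"time": "12.00am", "V1": 13, "V2": 1045, "outcome": False},
    {"time": "12.01am", "V1": 12, "V2": 1565, "outcome": True},
]
rows = make_lag_rows(user1, ["V1", "V2"], max_lag=1)
print(rows)
```

With `max_lag=1` this reproduces the single transformed row of the example (13, 12, 1045, 1565, TRUE).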
You could also create new features that would describe the variables better (See Features for time series classification).
And you should incorporate the seasonality somehow if the variables show a seasonality pattern:
ID | V1_lag1 | V1_lag0 | V2_lag1 | V2_lag0 | day | hour | outcome
event_id | 13 | 12 | 1045 | 1565 | wed | 12am | TRUE
My problem involves measuring the impact of four activities undertaken by a population of 1,000 individuals that is attempting to lose weight.
The four activities are: (a) Eating healthy food; (b) Walking for an hour daily; (c) Meditation daily for 20-minutes; and (d) In-house physical exercise for 30-minutes.
For the sake of simplicity, let us assume that there are 1,000 participants who signed up on January 1, 2015, and their weights were measured then. Furthermore, let us assume that they will diligently commit to doing the same activity for the entirety of a quarter (i.e., if they commit to eating healthy food in Q1 they don't undertake any other activity; however they may change the activity at the start of Q2 or choose to continue what they did in Q1).
Finally, on December 31, 2015, I catch them before they head to Times Square to celebrate the onset of new year, and weigh them.
So my table looks something like :
Individual | Initial Weight | Q1 | Q2 | Q3 | Q4 | Final Weight
A-1 | 183 | A | B | A | C | 176
A-2 | 265 | D | C | B | B | 223
A-3 | 331 | A | A | A | D | 322
.
.
A-1000 | 257 | D | B | C | A | 228
My goal is to measure the impact of each activity's contribution to the weight loss across the population, keeping in mind that there is a distinct possibility that the order of activities undertaken could have an impact.
(In my real problem, one of the complexities that I haven't spelt out is that instead of doing the same activity throughout the quarter, individuals would actually have done any of those activities on a daily basis.)
Any thoughts would be appreciated.
Stack Overflow is mainly for problems that involve code, for example a piece of code in a specific language that doesn't work.
Try posting this question on Stack Exchange; users there would love to answer problems like these.
I have a rather interesting problem that has probably been solved before, but I'm not really sure where to look.
The situation
I am developing a system that consists of:
mobile wireless nodes that move around
the nodes broadcast a beacon every 1000 milliseconds
the nodes can hear the beacons from other such nodes
the beacon includes information about the sender (that is unique, like a MAC address)
the beacon can include other information to a limit (such as info about other neighbours)
The requirements
I am trying to develop an algorithm & implementation that:
gets neighbouring nodes to align themselves on 'virtual slots' in the 1000ms (given +/- 10ms tolerance)
tolerates up to 4 neighbouring nodes within radio range of each other (i.e. the number of slots would be 4, i.e. 250ms +/-10ms)
can work through observation alone (i.e. is fully decentralised)
can tolerate sets of already aligned neighbours coming in range of each other (i.e. 2 + 2 = 4)
gets nodes to converge on the slots gradually by lengthening or shortening beacon period to 1001/999ms respectively
makes no assumptions about the bidirectionality of the radio link
E.g. some example topologies that should work are:
4 nodes that can all see each other
A-B
|X|
C-D
long lines that can see at most 4 neighbours
A-B-C-D-E-F-G-H-I
|
X-Y-Z
Examples of how the algorithm COULD work
Example 1 - A & B close to each other, algorithm decides B should move towards the right
1000 2000 3000 4000
0 250 500 750 0 250 500 750 0 250 500 750 0 250 500 750 0
| | | | | | | | | | | | | | | | |
A A A A A
>> >>>>
B B B B
Time ------------------------------------------------------------------------------>
Example 2 - A & B aligned, algorithm decides C should move towards the left
1000 2000 3000 4000
0 250 500 750 0 250 500 750 0 250 500 750 0 250 500 750 0
| | | | | | | | | | | | | | | | |
A B A B A B A B A
<< <<<<
C C C C
Time ------------------------------------------------------------------------------>
Current ideas / guides
The radio communications are unacknowledged/broadcast beacons, so confirmation of the RF link is impossible. However, if the nodes included in their own beacon the identifiers of their neighbours they can kind of share enough information.
e.g. In example 2 above, if A can hear B, then A's beacon could include "B is 250ms after me". Likewise, if B can hear A, then B's beacon could include "A is 750ms after me".
With this style of information this problem starts to look a lot like a network graph problem, where each node can build outwards from itself based on the nodes it can hear and the nodes they can hear in turn. I have looked into things like Spanning Tree Protocol for inspiration but haven't had much luck yet.
The problem
Although it looks like a network graph problem, the issue is really how to shift the timing of the beacons.
In essence, the algorithm answers the question "Should I move my own beacon? If yes, which direction?", but it takes into account:
How hard it is for neighbours to move (i.e. if they already have a well optimised neighbourhood)?
How will my neighbours be moving (i.e. will we both move apart, or should just one of us move)?
Actual questions
Sooooo, after quite a lot of text, my questions really are:
Are there any examples of this behaviour anyone knows about?
Do you think that the graph creation is a good/bad idea?
Do you think the graph will just get in the way of what I'm really trying to do - space temporally?
Is gradual convergence a good idea?
Is the virtual slot idea a good one?
Does this have parallels to STP, but that we could have multiple root nodes?
BTW: I'm not interested in Radio/MAC layer solutions to this, it really is an application issue!
EDIT: The End Result
I never got around to solving this "slotting" idea. Instead our nodes just moved away from any conflicts. I.e. instead of trying to develop a solution to the slotting algorithm we just made something that would attempt to space equally but that could result in oscillations.
I suggest that you break this up into two distributed algorithms that you can find written up.
1) Distributed clock synchronization. Have all nodes agree on a common time base, using algorithms such as those used by NTP. At this point, and maybe for some time, the nodes may not yet agree on how to separate their transmissions in time, but you can fall back on the ALOHA protocol (http://en.wikipedia.org/wiki/ALOHAnet#The_ALOHA_protocol) while this is a problem.
2) Distributed graph colouring. One starting point is http://en.wikipedia.org/wiki/Graph_coloring#Parallel_and_distributed_algorithms. I don't even recognise the algorithms named there, but it at least looks like there is work you can refer to. Under "Decentralized algorithms" they also reference applications to Wireless Channel Allocation, so some of this might be quite close to what you want.
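To show what the colouring view buys you, here is a small centralised sketch: each node is a vertex, an edge means "in radio range", and a colour is one of the four 250 ms slots. This is only an illustration of the idea with a plain greedy colouring. It is not one of the distributed algorithms from the linked article, and the function name `assign_slots` and its parameters are made up for the example.

```python
def assign_slots(neighbours, num_slots=4, period_ms=1000):
    """neighbours maps each node to the set of nodes in radio range.

    Returns a dict mapping each node to its slot offset in milliseconds.
    """
    slot_of = {}
    # Colour the most constrained (highest degree) nodes first.
    for node in sorted(neighbours, key=lambda n: (-len(neighbours[n]), n)):
        taken = {slot_of[m] for m in neighbours[node] if m in slot_of}
        free = [s for s in range(num_slots) if s not in taken]
        if not free:
            raise ValueError(f"no free slot for node {node}")
        slot_of[node] = free[0]
    width = period_ms // num_slots
    return {node: slot * width for node, slot in slot_of.items()}


# The "4 nodes that can all see each other" topology from the question.
square = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
}
offsets = assign_slots(square)
print(offsets)
```

In a real deployment each node would only know what it hears in beacons, so the same rule ("pick a slot none of my known neighbours uses") would have to run locally and repeatedly as neighbours appear and disappear.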
Imagine you ask your teammates to hold an election on who should organize the next barbecue. Your team is about 120 people and you want to select 3 people out of a pool of 6 to do that job.
Each of the 120 persons can vote for up to 3 persons by ranking them: 1st best person is X, 2nd best person Y, 3rd best person is Z.
At the end all votes should be aggregated in a ranked result listing.
| Candidate | Voter 1 | Voter 2 | Voter 3 |
-------------------------------------------
| A | 1. Pos | | 2. Pos |
| B | 3. Pos | 1. Pos | 3. Pos |
| C | 2. Pos | 2. Pos | |
| D | | 3. Pos | |
| E | | | |
| F | | | 1. Pos |
-------------------------------------------
If the voters did no ranking and every vote counted equally, it would be easy to aggregate the result. B got 3 votes, A and C got 2 votes each. All the others got fewer votes. The winners are A, B and C.
I do not know what algorithms exist to aggregate ranked data and I do not know what the result should look like. F got a vote for pos. 1, which is good, but A and B got such a vote too. From my point of view A and B are better, because they got more votes. But is A better than B? A got a pos. 2 but B got pos. 3 twice, so which should be ranked higher? Is pos. 2 twice better than pos. 1 once and pos. 3 twice?
This sounds like implementing a ranking algorithm for a meta search engine. What algorithms exist? What algorithm should I use?
As you asked "What should I use?", I can recommend the group of methods called "Condorcet methods", as Terje D. mentioned. If you do not want to learn more about the complex theory of election methods, I can recommend one of the Condorcet methods: the "Schulze method" (also known as path winner or beatpath winner). It is used, for example, by Debian, KDE and the Pirate Party of Germany.
You can use this online ballot to get an ad hoc solution to your problem: https://modernballots.com/elections/qm65cnts/vote/
If you want to integrate it into your company website (intranet or whatever), I recommend you contribute to an existing project. If you are a PHP developer, check this out: https://bitbucket.org/robla/electowidget/src/14581ac7a5f2/lib/methods/SchulzeMethod.php
Electowidget was initially a plugin for MediaWiki. Maybe it is a good point to start and maybe you want to contribute some changes to make it a library.
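For a feel of what the Schulze method computes, here is a compact sketch applied to the three ballots from the question. It counts pairwise preferences, finds the strongest "beatpaths" with a Floyd-Warshall style loop, and ranks candidates by how many rivals they beat. Treating unranked candidates as tied for last place is my assumption, and the function name `schulze_ranking` is hypothetical. It is not taken from Electowidget.

```python
def schulze_ranking(candidates, ballots):
    """ballots: lists of candidates, best first. Unranked ones tie for last."""
    # d[a][b] = number of voters who prefer a to b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    worst = len(candidates)
    for ballot in ballots:
        rank = {c: i for i, c in enumerate(ballot)}
        for a in candidates:
            for b in candidates:
                if a != b and rank.get(a, worst) < rank.get(b, worst):
                    d[a][b] += 1
    # p[a][b] = strength of the strongest path from a to b.
    p = {a: {b: d[a][b] if d[a][b] > d[b][a] else 0 for b in candidates}
         for a in candidates}
    for k in candidates:
        for i in candidates:
            if i == k:
                continue
            for j in candidates:
                if j != i and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # A candidate beats a rival if its strongest path is the stronger one.
    wins = {a: sum(p[a][b] > p[b][a] for b in candidates if b != a)
            for a in candidates}
    return sorted(candidates, key=lambda c: -wins[c])


ballots = [
    ["A", "C", "B"],  # Voter 1
    ["B", "C", "D"],  # Voter 2
    ["F", "A", "B"],  # Voter 3
]
ranking = schulze_ranking("ABCDEF", ballots)
print(ranking[:3])
```

With only three ballots many pairs end up tied, so the order among the lower-ranked candidates depends on the tie handling; with 120 real ballots this matters far less.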
Maybe just do it like this: assign 3 points for each first place, 2 points for each second place, and 1 point for each third place. Then check which candidates have the most points.
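A minimal sketch of this point scheme, applied to the three ballots from the question (the function name `points_ranking` is just for illustration):

```python
def points_ranking(candidates, ballots, points=(3, 2, 1)):
    """Score each candidate: 3 points per 1st place, 2 per 2nd, 1 per 3rd."""
    score = {c: 0 for c in candidates}
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            score[candidate] += points[position]
    # Highest score first; ties keep the original candidate order.
    return sorted(score.items(), key=lambda item: -item[1])


ballots = [
    ["A", "C", "B"],  # Voter 1
    ["B", "C", "D"],  # Voter 2
    ["F", "A", "B"],  # Voter 3
]
result = points_ranking("ABCDEF", ballots)
print(result)
```

Here A and B tie on 5 points and C follows with 4, so you would still need a tie-break rule (for example, the number of first places) if the order among the winners matters.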