I am required to solve a specific problem.
I'm given a representation of a social network.
Each node is a person, each edge is a connection between two persons. The graph is undirected (as you would expect).
Each person has a personal "affinity" for buying a product (to simplify things, let's say there's only one product involved in this whole problem).
In each "step" in time, each person, independently, chooses whether to buy the product or not.
There's probability involved here. A few parameters are taken into account:
His personal affinity for the product,
The percentage of his friends who have already bought the product.
The gain for a person buying the product is 1 dollar.
The problem is to pick X persons (let's say, 5 persons) who will receive the product in step 0, such that the total expected gain after Y steps (let's say, 10 steps) is maximized.
The network is very large. It's not possible to simulate all the options in a naive way.
What tool / library / algorithm should I be using?
Thank you.
P.S.
When investigating this matter in google and wikipedia, a few terms kept popping up:
Dynamic network analysis
Epidemic model
but it didn't help me to find an answer
Generally, people who have the most neighbours have the most influence when they buy something.
So my heuristic would be to order people first by the number of neighbours they have (in decreasing order), then by the number of neighbours that each of those neighbours has (in order from highest to lowest), and so on. You will need at most Y levels of neighbour counts, though fewer may suffice in practice. Then simply take the first X people on this list.
This is only a heuristic, because e.g. if a person has many neighbours but most or all of them are likely to have already bought the product through other connections, then it may give a higher expectation to select a different person having fewer neighbours, but whose neighbours are less likely to already own the product.
You do not need to construct the entire list and then sort it; you can construct the list and then insert each item into a heap, and then just extract the highest-scoring X people. This will be much faster if X is small.
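As a rough sketch of the heap idea, `heapq.nlargest` keeps only X candidates around instead of sorting everyone. The `graph` dictionary and the two-level score (own neighbour count, then the summed neighbour counts of those neighbours) are my own assumptions about how to read the heuristic, not something fixed by the description above.

```python
import heapq

def top_x_by_degree(graph, x):
    """Return the x people with the highest (degree, neighbours' degrees) score.

    graph: dict mapping each person to the set of their neighbours (assumed format).
    """
    def score(person):
        neighbours = graph[person]
        first_level = len(neighbours)
        # Assumption: the second level is the sum of the neighbours' degrees.
        second_level = sum(len(graph[n]) for n in neighbours)
        return (first_level, second_level)

    # nlargest uses a bounded heap internally, so no full sort is needed.
    return heapq.nlargest(x, graph, key=score)

# Hypothetical toy network.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat", "fay"},
    "fay": {"eve"},
}
print(top_x_by_degree(graph, 2))
```

Extending the score tuple with further levels would follow the same pattern, at the cost of more work per person.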
If X and Y are as low as you suggest then this calculation will be pretty fast, so it would be worth doing repeated runs in which instead of starting with the first X people owning the product, for each run you randomly select the initial X owners according to a probability that depends on their position in the list (the further down the list, the lower the probability).
Check out the concept of submodularity, a pretty powerful mathematical concept. In particular, check out slide 19, where submodularity is used to answer the question "Given a social graph, who should get free cell phones?". If you have access, also read the corresponding paper. That should get you started.
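For a feel of how this is usually put to work, here is a minimal sketch of the greedy selection that submodularity arguments justify: add one seed at a time, each time taking the person with the largest estimated marginal gain. Everything about the purchase model is an assumption on my part (buy probability = affinity times the fraction of friends who own the product, estimated by Monte Carlo runs), and the well-known (1 - 1/e) guarantee for greedy only holds if the gain function really is monotone and submodular, which I have not checked for this model.

```python
import random

def simulate(graph, affinity, seeds, steps, rng):
    """Run one random cascade and return the number of owners at the end."""
    owners = set(seeds)
    for _ in range(steps):
        new_owners = set()
        for person, friends in graph.items():
            if person in owners or not friends:
                continue
            fraction = sum(f in owners for f in friends) / len(friends)
            # Assumed model: probability = affinity * fraction of owning friends.
            if rng.random() < affinity[person] * fraction:
                new_owners.add(person)
        owners |= new_owners  # everyone decides simultaneously within a step
    return len(owners)

def expected_gain(graph, affinity, seeds, steps, runs=200, seed=0):
    """Monte Carlo estimate; the fixed seed makes candidates comparable."""
    rng = random.Random(seed)
    total = sum(simulate(graph, affinity, seeds, steps, rng) for _ in range(runs))
    return total / runs

def greedy_seeds(graph, affinity, x, steps):
    """Pick x seeds greedily by estimated expected gain."""
    chosen = []
    for _ in range(x):
        candidates = (p for p in graph if p not in chosen)
        best = max(candidates,
                   key=lambda p: expected_gain(graph, affinity, chosen + [p], steps))
        chosen.append(best)
    return chosen

# Hypothetical star-shaped network: one hub with four friends.
graph = {
    "leaf1": {"hub"}, "leaf2": {"hub"}, "leaf3": {"hub"}, "leaf4": {"hub"},
    "hub": {"leaf1", "leaf2", "leaf3", "leaf4"},
}
affinity = {person: 1.0 for person in graph}
print(greedy_seeds(graph, affinity, 2, steps=10))
```

The count includes the seeds themselves, which only adds a constant X and does not change which set wins. On a very large network the repeated simulations are the bottleneck, so in practice you would restrict the candidates (for example to high-degree people) or reuse estimates between rounds.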
My problem is the following:
Me and my team are moving to another part of the office and we have to decide everybody's place to sit. However, everybody has priorities. I would like to find an algorithm which helps us to distribute the seats in a way that everybody is satisfied. (Or the most of them at least.)
I've started to implement my own algorithm where I ask everybody for 3 preferred options (the team consists of 10 people and there are 10 places) and use their "seniority" (the length of time they have spent in the team) as a rank between them.
However, I'm stuck. I tried to browse the internet for an algorithm which solves a similar problem but didn't find any.
What would be the best way to solve this? Is there any generally known algorithm which solves this or a similar problem?
Thank you!
What first comes to mind for me is the stable marriage problem. Here's the problem statement for the original algorithm:
Given n men and n women, where each person has ranked all members of the opposite sex in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. When there are no such pairs of people, the set of marriages is deemed stable.
Please read up on the Gale–Shapley algorithm, which is what I'll adapt for this problem.
Have each worker make a list of their rankings for all the spots. These will be the "men". Then, each spot will use the seniority ranking as their rankings for the "men". The spots will be the "women" in the Gale-Shapley algorithm.
You will get a seat assignment that has no "unstable marriage". Here's what an unstable marriage is:
There is an element A of the first matched set which prefers some given element B of the second matched set over the element to which A is already matched, and
B also prefers A over the element to which B is already matched.
In this context, an unstable marriage means that there is a worker-seat assignment between W1 and S1 such that another worker, W2, has ranked S1 higher than the seat W2 was given. Not only that, S1 has also ranked W2 higher than W1. Since the seats made their lists based on the seniority list, it means that W2 has higher seniority.
In effect, this means that you'll get a seating assignment such that no worker has a seat that someone else with higher seniority wants "more".
The bottom of that Wiki article mentions packages in R and Python that have already implemented the algorithm, so it's just up to you to input the preference lists.
Disclaimer: This is probably not the most efficient algorithm. All the seats have the same ranking list, so there's probably a shortcut somewhere. However, it's easier to use a cannon to kill a fly, if the cannon is already written in R/Python for you. Also, this is the only algorithm I remember from uni, so this is the only hammer I have for any nail.
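If you'd rather not pull in a package, the adaptation fits in a few lines. This is only a sketch of worker-proposing Gale–Shapley under the setup above; the function name, the input shapes, and the assumption that every worker ranks every seat (with at least as many seats as workers) are mine.

```python
def assign_seats(preferences, seniority):
    """Worker-proposing Gale-Shapley where every seat ranks workers by seniority.

    preferences: worker -> list of all seats, most preferred first (assumed complete).
    seniority: list of workers, most senior first.
    """
    rank = {worker: i for i, worker in enumerate(seniority)}  # lower = more senior
    next_choice = {worker: 0 for worker in preferences}
    holder = {}                      # seat -> worker currently holding it
    free = list(preferences)
    while free:
        worker = free.pop()
        seat = preferences[worker][next_choice[worker]]
        next_choice[worker] += 1
        current = holder.get(seat)
        if current is None:
            holder[seat] = worker            # seat was empty
        elif rank[worker] < rank[current]:
            holder[seat] = worker            # seat "prefers" the more senior worker
            free.append(current)
        else:
            free.append(worker)              # rejected, will try the next seat
    return {worker: seat for seat, worker in holder.items()}

# Hypothetical example data.
preferences = {
    "ana": ["window", "door", "middle"],
    "ben": ["window", "middle", "door"],
    "cy": ["window", "door", "middle"],
}
seniority = ["ben", "ana", "cy"]
print(assign_seats(preferences, seniority))
```

Because all seats share one ranking, the outcome is the same as letting people pick in seniority order, which is the shortcut hinted at in the disclaimer.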
I decided to implement a brute force solution as lots of the comments suggested.
So:
I asked everybody on the team to rank the seats in order of preference (10 down to 1; I use these numbers as the scores of the "teamMember-seat" pairings, 10 being the highest score)
collected all of the "teamMember-seat" pairings with scores, e.g. name:Steve, seat:seat1, score:5 (the score comes from the order given in the previous step)
generated all the possible seating combinations from these
e.g.
List1: [name:Steve seat:seat1 score:5], [name:John seat:seat2 score:3] ... [name:X seat:seatY score:X]
List2: [name:Steve seat:seat2 score:4], [name:John seat:seat1 score:4] ... [name:X seat:seatY score:X]
...
ListX: [],[]...
chose the "teamMember-seat" list(s) with the highest score (the score of a list is the sum of the scores of its "teamMember-seat" pairings)
if two lists have equal scores, the algorithm chooses the one in which the most senior team members get their most preferred seats
if there is still more than one list (combination), the algorithm chooses one at random
I'm sure there are some better algorithms to do this as some of you suggested but I've run out of time.
I didn't post the code since it is really long and not too complicated to implement. However, if you need it, don't hesitate to drop a private message.
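For anyone who wants the gist without the full code, the steps above can be sketched roughly as follows. This is not my original implementation; the names and the exact seniority tie-break (comparing the per-member scores in seniority order) are assumptions made for illustration. With 10 people it walks through 10! = 3,628,800 seatings, which is slow but feasible.

```python
import random
from itertools import permutations

def best_seatings(scores, seniority):
    """Return every seating that maximises the total score.

    scores: member -> {seat: score}; seniority: members, most senior first.
    Ties on the total are broken in favour of senior members (assumed rule).
    """
    members = list(seniority)
    seats = list(scores[members[0]])
    best_key, best = None, []
    for perm in permutations(seats):
        per_member = tuple(scores[m][s] for m, s in zip(members, perm))
        # Compare the total first, then the scores of members in seniority order.
        key = (sum(per_member), per_member)
        if best_key is None or key > best_key:
            best_key, best = key, [dict(zip(members, perm))]
        elif key == best_key:
            best.append(dict(zip(members, perm)))
    return best

# Hypothetical three-person example.
scores = {
    "steve": {"s1": 3, "s2": 2, "s3": 1},
    "john": {"s1": 3, "s2": 1, "s3": 2},
    "mary": {"s1": 2, "s2": 3, "s3": 1},
}
seniority = ["steve", "john", "mary"]
candidates = best_seatings(scores, seniority)
print(random.choice(candidates))  # final random pick among remaining ties
```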
Say you have a class with 5 sections: A,B,C,D,E. Each section meets at different times, thus students registering for the course will have preference for which section they will take (they can only take one section). When students register for the course, they list 3 sections they would prefer to take, in order of preference.
Each section has n students. Let's say for simplicity that exactly n*5 students have registered for the course.
So, the question is: How do you efficiently match students to their preferred section?
I've seen some questions with similar matching scenario questions, but none quite fit and I'm afraid I don't know enough about algorithms to make up my own. BTW, this is a real problem and I know the department in question takes a few days to do it by hand.
To determine whether each student can be assigned to a preferred section, construct an integer-valued maximum flow in the following network, where the three Xs stand for capacity-1 arcs from students to the sections they prefer (solvable in polynomial time via, e.g., the push-relabel algorithm). There's a solution if and only if the maximum flow moves m = n*5 units; the assignments are then determined by which arc from each student is saturated.
      capacity-1 arcs              capacity-n arcs
             |                            |
             v                            v
            student 1
          /  student 2          section1
         /      .         X     section2 \
source <        .         X     section3  > sink
         \      .         X     section4 /
          \  student m-1        section5
            student m
To take the order of preference into account, switch to solving a min-cost flow problem, which is still poly-time solvable (though you may find the network simplex mode of a general-purpose LP solver easier to use) and which allows a cost to be specified for each arc. Choose a score for each preference level depending on what you think is fair.
I'm positive that this has been asked before, but scheduling problems are like snowflakes, and I can't find the old question by keywords alone.
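For the feasibility check, a small augmenting-path routine is enough on this kind of network. The sketch below is a Ford–Fulkerson-style max flow specialised to the student/section graph, not push-relabel, and the function name and input shapes are assumptions of mine.

```python
def assign_sections(preferences, capacity):
    """Maximum assignment of students to acceptable sections.

    preferences: student -> list of sections the student listed.
    capacity: section -> number of places.
    Returns student -> section; every student appears iff a full solution exists.
    """
    members = {section: [] for section in capacity}
    assigned = {}

    def try_place(student, visited):
        for section in preferences[student]:
            if section in visited:
                continue
            visited.add(section)
            if len(members[section]) < capacity[section]:
                members[section].append(student)
                assigned[student] = section
                return True
            # Section is full: try to move one of its students elsewhere.
            for other in list(members[section]):
                if try_place(other, visited):
                    members[section].remove(other)
                    members[section].append(student)
                    assigned[student] = section
                    return True
        return False

    for student in preferences:
        try_place(student, set())
    return assigned

# Hypothetical example with one place per section.
capacity = {"A": 1, "B": 1, "C": 1}
preferences = {"s1": ["A", "B"], "s2": ["A"], "s3": ["B", "C"]}
result = assign_sections(preferences, capacity)
print(result, len(result) == len(preferences))
```

This ignores the order of preference entirely; it only answers whether everyone can get one of their listed sections.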
Maybe you could randomly distribute them into sections. Next you select random pairs of student and consider if swapping them improves the distribution (does it increase the match with their preferences?). You can iterate until there is no improvement possible for X iterations.
This is obviously a very naive approach, but if your sample is small it might converge quickly. You cannot guarantee that you have the optimal solution; for that you'd need a brute force approach, which is probably not feasible.
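A minimal sketch of that swap loop, assuming each student lists sections best first and scoring a match as 3, 2 or 1 points for first, second or third choice (0 otherwise). The scoring, the names, and the stopping parameter are illustrative assumptions.

```python
import random

def local_search(preferences, sections, size, max_stale=1000, seed=0):
    """Random start, then keep any pairwise swap that raises the total score.

    preferences: student -> list of preferred sections, best first.
    Assumes len(preferences) == len(sections) * size.
    """
    rng = random.Random(seed)
    students = list(preferences)
    rng.shuffle(students)
    assignment = {s: sections[i // size] for i, s in enumerate(students)}

    def score(student, section):
        prefs = preferences[student]
        return len(prefs) - prefs.index(section) if section in prefs else 0

    stale = 0
    while stale < max_stale:          # stop after max_stale failed attempts in a row
        a, b = rng.sample(students, 2)
        before = score(a, assignment[a]) + score(b, assignment[b])
        after = score(a, assignment[b]) + score(b, assignment[a])
        if after > before:
            assignment[a], assignment[b] = assignment[b], assignment[a]
            stale = 0
        else:
            stale += 1
    return assignment

# Hypothetical example: six students, three sections of two.
preferences = {
    "s1": ["A", "B", "C"], "s2": ["A", "C", "B"], "s3": ["B", "A", "C"],
    "s4": ["B", "C", "A"], "s5": ["C", "A", "B"], "s6": ["C", "B", "A"],
}
print(local_search(preferences, ["A", "B", "C"], size=2))
```

Swapping two students never changes section sizes, so the capacities stay respected throughout.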
Is there a scoring system in which, say, student 1 in section A scores 20, while student 2 in section A scores 15?
I'm asking because if there's only one spot left in section A, and both student 1 and student 2 have section A as their most preferred, then whoever registers first gets the spot, rather than whoever is the best fit (higher score).
If there is no scoring, you can just loop through the students and put them in the section they prefer. If the first one is full, try their second preference, then the next. If all three sections the student prefers are filled, just enroll them to one that isn't filled.
(It'd be different if there is scoring since you have to go with a priority queue for each section and maximize that.)
I was out buying groceries the other day and needed to search through my wallet to find my credit card, my customer rewards (loyalty) card, and my photo ID. My wallet has dozens of other cards in it (work ID, other credit cards, etc.), so it took me a while to find everything.
My wallet has six slots in it where I can put cards, with only the first card in each slot initially visible at any one time. If I want to find a specific card, I have to remember which slot it's in, then look at all the cards in that slot one at a time to find it. The closer it is to the front of a slot, the easier it is to find it.
It occurred to me that this is pretty much a data structures question. Suppose that you have a data structure consisting of k linked lists, each of which can store an arbitrary number of elements. You want to distribute elements into the linked lists in a way that minimizes looking up. You can use whatever system you want for distributing elements into the different lists, and can reorder lists whenever you'd like. Given this setup, is there an optimal way to order the lists, under any of the assumptions:
You are given the probabilities of accessing each element in advance and accesses are independent, or
You have no knowledge in advance what elements will be accessed when?
The informal system I use in my wallet is to "hash" cards into different slots based on use case (IDs, credit cards, loyalty cards, etc.), then keep elements within each slot roughly sorted by access frequency. However, maybe there's a better way to do this (for example, storing the k most frequently-used elements at the front of each slot regardless of their use case).
Is there a known system for solving this problem? Is this a well-known problem in data structures? If so, what's the optimal solution?
(In case this doesn't seem programming-related: I could imagine an application in which the user has several drop-down lists of commonly-used items, and wants to keep those items ordered in a way that minimizes the time required to find a particular item.)
Although not a full answer for general k, this 1985 paper by Sleator and Tarjan gives a helpful analysis of the amortised complexity of several dynamic list update algorithms for the case k=1. It turns out that move-to-front is very good: assuming fixed access probabilities for each item, it never requires more than twice the number of steps (moves and swaps) that would be required by the optimal (static) algorithm, in which all elements are listed in nonincreasing order of probability.
Interestingly, a couple of other plausible heuristics -- namely swapping with the previous element after finding the desired element, and maintaining order according to explicit frequency counts -- don't share this desirable property. OTOH, on p. 2 they mention that an earlier paper by Rivest showed that the expected amortised cost of any access under swap-with-previous is <= the corresponding cost under move-to-front.
I've only read the first few pages, but it looks relevant to me. Hope it helps!
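For concreteness, move-to-front on a single list (k=1) looks like the sketch below. The function name and the cost convention (cost = 1-based position of the item found) are my own choices for illustration.

```python
def access_move_to_front(items, target):
    """Find target, move it to the front, and return its 1-based position (the cost)."""
    position = items.index(target)
    items.insert(0, items.pop(position))
    return position + 1

# Hypothetical wallet slot and access sequence.
cards = ["work id", "credit", "loyalty", "photo id"]
requests = ["loyalty", "loyalty", "credit", "loyalty"]
costs = [access_move_to_front(cards, card) for card in requests]
print(costs, cards)
```

Frequently requested cards drift to the front without any bookkeeping of counts.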
You need to look at skip lists. There is a similar problem with arranging stations for a train system where there are express trains and regular trains. An express train stops only at express stations while regular trains stop at regular stations and express stations. Where should the express stops be placed so that one can minimize the average number of stops when travelling from a start station to any station.
The solution is to use stations at triangular numbers (i.e., at 1, 3, 6, 10 etc where T_n = n * (n + 1) / 2).
This is assuming all stops (or cards) are equally likely to be accessed.
If you know the access probabilities of your n cards in advance and you have k wallet slots and accesses are independent, isn't it fairly clear that the greedy solution is optimal? That is, the most frequently-accessed k cards go at the front of the pockets, next-most-frequently accessed k go immediately behind, and so forth? (You never want a lower-probability card ranked before a higher-probability card.)
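A small sketch of that greedy layout, assuming the cost of an access is simply the card's depth within its slot (choosing the right slot is taken to be free). The names and the cost model are assumptions for illustration.

```python
def fill_slots(probabilities, k):
    """Deal cards into k slots in decreasing order of access probability."""
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    slots = [[] for _ in range(k)]
    for i, card in enumerate(ordered):
        slots[i % k].append(card)     # top k cards land at the front of the slots
    return slots

def expected_cost(slots, probabilities):
    """Expected number of cards looked at per access (depth is 1-based)."""
    return sum(probabilities[card] * depth
               for slot in slots
               for depth, card in enumerate(slot, start=1))

# Hypothetical access probabilities.
probabilities = {"credit": 0.4, "id": 0.3, "loyalty": 0.2, "work": 0.1}
slots = fill_slots(probabilities, 2)
print(slots, expected_cost(slots, probabilities))
```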
If you don't know the access probabilities, but you do know they exist and that card accesses are independent, I imagine sorting the cards similarly, but by number-of-accesses-seen-so-far instead is asymptotically optimal. (Move-to-front is cool too, but I don't see an obvious reason to use it here.)
Perhaps you get something interesting if you penalise card moves as well; if I have any known probability distribution on card accesses, independent or not, I just greedily re-sort the cards every time I do an access.
Hi I am building a program wherein students are signing up for an exam which is conducted at several cities through out the country. While signing up students provide a list of three cities where they would like to give the exam in order of their preference. So a student may say his first preference for an exam centre is New York followed by Chicago followed by Boston.
Now, keeping in mind that the exam centres have limited capacity, they cannot accommodate each student's first choice. We would, however, try to give as many students as possible either their first or second choice of centre, and as far as possible avoid having to give a student their third-choice centre.
Now, any ideas for an algorithm that would make this process more efficient? The simple way to do this would be to first go through the list of students' first choices and allot as many as possible, then go through the list of second choices and allot. However, this may lead to the students who are first in the list getting their first centre and the last students getting their third choice or, worse, none of their choices. Is there anything that could make this fairer and more efficient?
Sounds like a variant of the classic stable marriages problem or the college admission problem. Wikipedia describes a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
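If you don't have a solver at hand, a textbook successive-shortest-path routine is enough for moderate sizes. The sketch below is an assumption-laden illustration: it adds a super-source and super-sink (instead of treating students and centres as sources and sinks directly), uses Bellman-Ford style relaxation rather than anything tuned, and all names are made up.

```python
from collections import deque

class MinCostFlow:
    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(self, u, v, capacity, cost):
        self.graph[u].append([v, capacity, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, source, sink):
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            dist[source] = 0
            parent = [None] * self.n
            in_queue = [False] * self.n
            queue = deque([source])
            while queue:                      # queue-based Bellman-Ford
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        parent[v] = (u, i)
                        if not in_queue[v]:
                            queue.append(v)
                            in_queue[v] = True
            if parent[sink] is None:          # no augmenting path left
                return flow, cost
            push, v = float("inf"), sink
            while v != source:                # bottleneck along the path
                u, i = parent[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = sink
            while v != source:                # update residual capacities
                u, i = parent[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[sink]

def assign_centres(choices, capacity):
    """choices: student -> [first, second, third]; capacity: centre -> seats."""
    students, centres = list(choices), list(capacity)
    n = len(students)
    source, sink = 0, 1
    s_id = {s: 2 + i for i, s in enumerate(students)}
    c_id = {c: 2 + n + i for i, c in enumerate(centres)}
    net = MinCostFlow(2 + n + len(centres))
    for s in students:
        net.add_edge(source, s_id[s], 1, 0)
        for arc_cost, c in zip([0, 1, n + 1], choices[s]):  # costs from the answer
            net.add_edge(s_id[s], c_id[c], 1, arc_cost)
    for c in centres:
        net.add_edge(c_id[c], sink, capacity[c], 0)
    flow, total = net.solve(source, sink)
    result = {}
    for s in students:
        for v, cap, _, _ in net.graph[s_id[s]]:
            if v >= 2 + n and cap == 0:       # saturated student -> centre arc
                result[s] = centres[v - 2 - n]
    return result, total

# Hypothetical example: three students, one seat per centre.
choices = {"amy": ["NY", "CHI", "BOS"], "bo": ["NY", "CHI", "BOS"], "cal": ["NY", "BOS", "CHI"]}
assignment, total_cost = assign_centres(choices, {"NY": 1, "CHI": 1, "BOS": 1})
print(assignment, total_cost)
```

If fewer than N students end up in the result, no assignment within the listed choices exists for everyone at the given capacities.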
There is a class of algorithms that address this allocation of limited resources, called auctions. Basically, in this case each student would get a certain amount of money (a number they can spend), then your software would make bids on behalf of those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.
I'm trying to write a program to automate a ticket draft.
We have a certain number of season ticket passes and want to split up the tickets among a group of people. There are X number of games, Y number of season passes, and Z number of people. Each of Z people has ranked the X games.
My code basically goes through the draft order and back picking out the tickets from their ranking if available, otherwise, picking the next highest ranking. For the most part it works. The problem is, there's a point where most of the tickets are taken and the remaining tickets left are ones you already have so you just don't pick them. People therefore have different numbers of tickets. Is there a good way to get around this?
If you have X games and Y season passes, presumably there are X*Y tickets available to give to the Z people, right?
This sounds like it could be treated as an optimization problem, but to do so you have to identify your main goals. I'm guessing you want each person to receive X*Y / Z tickets (split them evenly), but maybe not. I'm guessing you also want to maximize the aggregate satisfaction (defined in some way according to the rankings) in tickets. You would probably want to give a large penalty in satisfaction for a person if he receives more than 1 ticket for the same game. I believe this last aspect might be why the straight draft approach is not the best, but I could be mistaken.
Once you are clear on what you are trying to optimize (if this is indeed an optimization problem), then you can consider the best approach to the problem. This could be your own custom-built solution, or you could try an existing technique (genetic algorithm, etc.). Before doing so though it is important that you frame the problem properly.
If there were no preferences involved, this would be a straightforward maximum flow problem (http://en.wikipedia.org/wiki/Maximum_flow_problem), set up as follows:
Create a source vertex A. Link A to Z vertices, one for each person; the capacity of these arcs can be infinite (or very, very large). Create a sink B, and create X vertices, one for each game, each linked to B with capacity Y (you have Y tickets per game). From each person, link to each game they've ranked, with capacity 1.
If you look at the wiki link above, there are about 10 algorithms to solve this basic problem. Find one you understand and can implement yourself, because you'll need to modify it slightly. I'm not familiar with all of them, but the ones I know about have a step 'pick an edge' or 'pick a path.' You should modify the 'how you pick an edge' logic to take the priority ordering of the games into account. I'm not sure exactly what the ordering should be (you'll probably need to experiment), but if you say the lowest ranked game is 1, the next is 2, up to X, then a score like 'ranking of the edge - number of games the person is already signed up for' might work.
I think this is a variant of the Stable Marriage Problem or the Stable Roommates Problem for which there are known algorithms for solving.