What class of algorithms can be used to solve this? - algorithm

EDIT: Just to make sure someone is not breaking their head on the problem... I am not looking for the best optimal algorithm. Some heuristic that makes sense is fine.
I made a previous attempt at formulating this and realized I did not do a great job at it so I removed that question. I have taken another shot at formulating my problem. Please feel free to provide any constructive criticism that can help me improve this.
Input:
N people
k announcements that I can make
Distance that my voice can be heard (say 5 meters) i.e. I may decide to announce or not depending on the number of people within these 5 meters
Goal:
Maximize the total number of people who have heard my k announcements and (optionally) minimize the time in which I can finish announcing all k announcements
Constraints:
Once a person hears my announcement, he is be removed from the total i.e. if he had heard my first announcement, I do not count him even if he hears my second announcement
I can see the same person as well as the same set of people within my proximity
Example:
Let us consider 10 people numbered from 1 to 10 and the following pattern of arrival:
Time slot 1: 1 (payoff = 1)
Time slot 2: 2 3 4 5 (payoff = 4)
Time slot 3: 5 6 7 8 (payoff = 4 if no announcement was made previously in time slot 2, 3 if an announcement was made in time slot 2)
Time slot 4: 9 10 (payoff = 2)
and I am given 2 announcements to make. Now if I were an oracle, I would choose time slots 2 and time slots 3 because then 7 people would have heard (because 5 already heard my announcement in Time slot 2, I do not consider him anymore). I am looking for an online algorithm that will help me make these decisions on whether or not to make an announcement and if so based on what factors. Does anyone have any ideas on what algorithms can be used to solve this or a simpler version of this problem?

There should be an approach relying upon a max-flow algorithm. In essence, you're trying to push the maximum amount of messages from start->end. Though it would be multidimensional, you could have a super-sink, which connects to each value of t, then have each value of t connect to the people you can reach at this time and then have a super-sink. This way, you simply have to compute a max-flow (with the added constraint of no more than k shouts, which should be solvable with a bit of dynamic programming). It's a terrifically dirty way to solve it, but it should get the job done deterministically and without the use of heuristics.

I don't know that there is really a way to solve this or an algorithm to do it the way you have formulated it.
It seems like basically you are trying to reach the maximum number of people with exactly 2 announcements. But without knowing any information about the groups of people in advance, you can't really make any kind of intelligent decision about whether or not to use your first announcement. Your second one at least has the benefit of knowing when not to be used (i.e. if the group has no new members then you can know its not worth wasting the announcement). But it still has basically the same problem.
The only real way to solve this is to use knowledge about the type of data or the desired outcome to make guesses. If you know that groups average 100 people with a standard deviation of 10, then you could just refuse to announce if less than 90 people are present. Or, if you know you need to reach at least 100 people with two announcements, you could choose never to announce to less than 50 at once. Obviously those approaches risk never announcing at all if the actual data does not meet what you would expect. But that's always going to be a risk, since you could get 1 person in the first group and then 0 in all of the rest, no matter what you do.
Or, you could try more clearly defining the problem, I have a hard time figuring out how to relate this to computers.

Lets start my trying to solve the simplest possible variant of the problem: Lets assume N people and K timeslots, but only one possible announcement. Lets also assume that each person will only ever stay for one timeslot and that each person who hasn't yet shown up has an equally probable chance of showing up at any future timeslot.
Given these simplifications, at each timeslot you look at the payoff of announcing at the current timeslot and compare to the chance of a future timeslot having a higher payoff, eg, lets assume 4 people 3 timeslots:
Timeslot 1: Person 1 shows up, so you know you could get a payoff of 1 by announcing, but then you have 3 people to show up in 2 remaining timeslots, so at least one of those timeslots is guaranteed to have 2 people, so don't announce..
So at each timeslot, you can calculate the chance that a later timeslot will have a higher payoff than the current by treating the remaining (N) people and (K) timeslots as being N independent random numbers each from 1..k, and calculate the chance of at least one value k being hit more than or equal to the current-payoff times. (Similar to the Birthday problem, but for more than 1 collision) and then you need to decide hwo much to discount based on expected variances. (bird in the hand, etc)
Generalization of this solution to the original problem is left as an exercise for the reader.

Related

Maximise subset selection under constraints

I am working on a scheduling problem for a team of volunteers. I have boiled my problem down to the following algorithmic problem:
I have a matrix with ~60 rows representing volunteers and ~14 columns representing days. Each entry is an integer in the range 0 to 3 inclusive representing how free the volunteer is on that day. I want to choose exactly 4 entries from each column (4 volunteers a day) such that (in order of importance)
A 0-entry is never chosen.
The workload is as spread out as possible (first give everyone one shift, then start giving out second shifts, etc. We can expect that most volunteers will only have one shift per 2-week period, and some may even have none.)
The sum over selected entries is maximised (volunteers get days that they prefer).
I want to output a decision matrix that has a 1 whenever a volunteer is chosen for a day, and 0 otherwise. I believe this is an instance of the nurse-rostering problem, so I'm not expecting a fast solution, but I just want to make a brute force algorithm that will work in a reasonable time for my ~60 person team. I'm just really not sure how to start tackling this problem. Is it suited for backtracking, or is there some way to calculate the best placement of each volunteer based on the distribution of his/her day-scores?

How to design an algorithm to put elements into groups with constraints?

I was given a task of putting students into groups (to prepare a coding camp), but with several constraints. Though I've finished the task by hand, I'd like to know is there already exist some algorithms for tasks like this, or how can I design such an algorithm.
Background: 40 students in total, with these attributes:
gender: F/M
grade: Year 1/2
school: School 1/School 2/...
early assessment result: Rank from 1 to 40
Constraints: All of them needs to be satisfied.
Exactly 4 people per group
Each group needs to have at least a girl
Each group needs to have at least a Year 2 student
4 group members needs to come from 4 different schools
Each group needs to have at least a student who ranked top 10 in early assessment
What I'm expecting:
The Best: An existing algorithm/program for these kind of problems
Or, An algorithm for this specific problem
Or at least, Some ideas of creating an algorithm for this specific problem
My thoughts:
Since I've successed in making groups by hand, I know that such a solution indeed exists for my current dataset. But if I need an algorithm to find a solution for me, it should first try to check whether a solution even exists, by check if the number of girl / Year 2 students is greater than 10 (with pigeonhole principle), and some other conditions. And obviously, Constraint 5 is the easiest, and can provide a base solution for the rest. However, I still can not find a systematic way of doing it. Perhaps bruteforce and randomization can help? I'm not sure.
And sorry, since the data is confidential, I can not post it.
Update: After consulting a friend, here is a possible method:
First put the top 1 to 10 into 10 different groups.
Then iterate through groups. If the only person in the group is a boy/girl, try to add a girl/boy from a different school.
Then the problem size is reduced from 2^40 to 2^20, making bruthforce a viable solution.

Any idea for an Algorithm to distribute work between users, ensuring that all have at least "nice work"?

I have been struggling for the past months for an algorithm of something I usually do by hand. I'm pretty sure there must be some ideas there, but I haven't found any.
The problem is the following:
Lets say we have X users and X work to do (the same amount of users and work). Some of this work is nasty, boring, or just exhausting, while other is nice, creative and open. I'd like to be able to generate a work-list to give to each user for a week, having a different position every day (for example). Of course, all the users deserve to have "nice times" and the "boring tasks" have to be done too...
In a really small example:
Tasks = X (boring) / Y (cool)
Users = A / B
Day 1:
A -> X
B -> Y
Day 2:
A -> Y
B -> X
The main idea is to have an even workflow between all the users (so all have bad and good works).
Extra points if it is possible to define that some users are more "special" and deserve "better treats". Also, the tasks may be categorized not just as "good/bad" but with a numbered scale.
What I have thought so far:
The best algorithm or idea I came with is to sort the users and give the prefered ones first the best places, until I distribute all the work. Then for the second day I swap the worst with the best and so on (1 with X, 2 with X-1...). Here my concerns:
Dont know how to continue for the third day (If I repeat the same idea, 1 will be back to 1, X to X and then they will only do two tasks in their whole week...
The preferred one, number one, gets to do the "worst task" X. Also, the ones in the middle may not change of task (just bad & bad).
Please let me know if I can explain my idea better or if you have any hints in mind. (Graphs, other possible ideas, etc, included).
I belive in addition to assign a status of boring and cool, you could find out the (average) amount of time that every activity takes so that way you could make a balance of work. That's how asembly lines are balance (based on the amount of time), all cells of work must have sort of same amount of time of work assigged every day, so that way some cells of work will have for example 12 to 20 activities while some just one or two, but the amout of time will be aproximatly the same.

Algorithm to find the best possible available times

Here is my scenario,
I run a Massage Place which offers various type of massages. Say 30 min Massage, 45 min massage, 1 hour massage, etc. I have 50 rooms, 100 employees and 30 pieces of equipment.When a customer books a massage appointment, the appointment requires 1 room, 1 employee and 1 piece of equipment to be available.
What is a good algorithm to find available resources for 10 guests for a given day
Resources:
Room – 50
Staff – 100
Equipment – 30
Business Hours : 9AM - 6PM
Staff Hours: 9AM- 6PM
No of guests: 10
Services
5 Guests- (1 hour massages)
3 Guests - (45mins massages)
2 Guests - (1 hour massage).
They are coming around the same time. Assume there are no other appointment on that day
What is the best way to get ::
Top 10 result - Fastest search which meets all conditions gets the top 10 result set. Top ten is defined by earliest available time. 9 – 11AM is best result set. 9 – 5pm is not that good.
Exhaustive search (Find all combinations) - all sets – Every possible combination
First available met (Only return the first match) – stop after one of the conditions have been met
I would appreciate your help.
Thanks
Nick
First, it seems the number of employees, rooms, and equipment are irrelevant. It seems like you only care about which of those is the lowest number. That is your inventory. So in your case, inventory = 30.
Next, it sounds like you can service all 10 people at the same time within the first hour of business. In fact, you can service 30 people at the same time.
So, no algorithm is necessary to figure that out, it's a static solution. If you take #Mario The Spoon's advice and weight the different duration massages with their corresponding profits, then you can start optimizing when you have more than 30 customers at a time.
Looks like you are trying to solve a problem for which there are quite specialized software applications. If your problem is small enough, you could try to do a brute force approach using some looping and backtracking, but as soon as the problem becomes too big, it will take too much time to iterate through all possibilities.
If the problem starts to get big, look for more specialized software. Things to look for are "constraint based optimization" and "constraint programming".
E.g. the ECLIPSe tool is an open-source constraint programming environment. You can find some examples on http://eclipseclp.org/examples/index.html. One nice example you can find there is the SEND+MORE=MONEY problem. In this problem you have the following equation:
S E N D
+ M O R E
-----------
= M O N E Y
Replace every letter by a digit so that the sum is correct.
This also illustrates that although you can solve this brute-force, there are more intelligent ways to solve this (see http://eclipseclp.org/examples/sendmore.pl.txt).
Just an idea to find a solution:
You might want to try to solve it with a constraint satisfaction problem (CSP) algorithm. That's what some people do if they have to solve timetable problems in general (e.g. room reservation at the University).
There are several tricks to improve CSP performance like forward checking, building a DAG and then do a topological sort and so on...
Just let me know, if you need more information about CSP :)

Algorithm for most recently/often contacts for auto-complete?

We have an auto-complete list that's populated when an you send an email to someone, which is all well and good until the list gets really big you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete
I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.
What I want to know is if there's any known good algorithms for this kind of search, or if anyone has any suggestions.
I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.
Other than just arbitrarily picked numbers does anyone have any input? Other numbers also welcome if you can give a reason why you think they're better than mine
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list, Such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.
It looks like the move to front implementation would best suit your needs.
EDIT: When an address is selected, add one to its frequency, and move to the front of the group of nodes with the same weight (or (weight div x) for courser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.
This kind of thing seems similar to what is done by firefox when hinting what is the site you are typing for.
Unfortunately I don't know exactly how firefox does it, point system seems good as well, maybe you'll need to balance your points :)
I'd go for something similar to:
NoM = Number of Mail
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you did not write during the last month (it could be changed) will have 0 points. You could start sorting them for NoM sent in total (since it is on the contact list :). These will be showed after contacts with points > 0
It's just an idea, anyway it is to give different importance to the most and just mailed contacts.
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Etc
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.
Once you have the set of measurements you want to use, you could create an interactive apoplication to test out different weights, and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache, when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.
I would account for frequency by incrementing a counter each use, but by some larger-than-one value, like 10 (To add precision to the second point).
I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo#bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.

Resources