I'm looking for a good algorithm or technique to find the best solution to the following problem. First, I'll introduce the context and then the problem.
I work for a company with more than 2000 employees, all of whom work on shift patterns: each employee has a pattern that specifies the sequence of workdays and days off. We have these patterns:
5-2-5-2 (5 days work, 2 free, 5 days work, 2 free) and so on.
At the moment we have all these patterns with different starting points; that is, a pattern can start on a given date at a specific position inside the pattern. For example, the pattern 5-4-5-3 has 17 possible starting positions, one for each day of its 5+4+5+3 = 17-day cycle.
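To make the starting positions concrete, here is a minimal sketch with a hypothetical helper `coverage` that expands a pattern and a start offset into a 0/1 work calendar. It assumes a pattern is written as alternating work/free block lengths beginning with a work block, as in 5-4-5-3:

```python
def coverage(pattern, start, horizon):
    """Return [1 if working else 0 for each day 0 .. horizon-1].

    pattern: alternating work/free block lengths, e.g. (5, 4, 5, 3).
    start:   position inside the cycle on day 0, 0 .. sum(pattern) - 1.
    """
    cycle = []
    for idx, length in enumerate(pattern):
        # Even-indexed blocks are workdays, odd-indexed blocks are days off.
        cycle.extend([1 if idx % 2 == 0 else 0] * length)
    return [cycle[(start + t) % len(cycle)] for t in range(horizon)]


print(coverage((5, 4, 5, 3), 0, 17))
# [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

Each of the 17 offsets of 5-4-5-3 yields a different calendar, which matches the count above.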
Now the problem:
Every 6 months, each employee can change their pattern and start at any position in it.
But we must analyze all the requests and accept or reject them in order to obtain the best combination for the company's operations. Ideally every day would have the same workforce; we understand that this is impossible, but the algorithm should help us find a good solution, not a perfect one.
I was reading about the "Nurse scheduling problem" with Google OR-Tools, but I don't understand how to model the pattern sequences to create a solution for this problem. I also read some opinions about GAs (genetic algorithms), and all of them said that this kind of approach is not good for this kind of problem.
Does anyone have a similar problem? Can someone give me a more detailed example with Google OR-Tools than the one on GitHub?
It's not necessary to find a strictly optimal solution; the roster is currently done manually, and I'm pretty sure the result is considerably sub-optimal most of the time.

Does anyone have a similar problem?
Sounds a lot like OptaWeb Employee Rostering, which is a vertical on top of OptaPlanner, the constraint solver. Take a look at the source code. It's all open source.

I think this can be modeled as a MIP model.
Thinking out loud:
Introduce a binary decision variable:
δ(i,p) = 1 if pattern i is selected for person p, 0 otherwise
This includes the current pattern (say i=0). This would allow the cases:
an employee does not submit a new pattern (then we only have i=0 for this employee)
an employee submits one or more preferable patterns
We have the constraints:
sum(i, δ(i,p)) = 1 ∀p
sum((i,p), pattern(i,p,t)*δ(i,p)) ≈ requiredlevel(t) ∀t
δ(i,p) ∈ {0,1}
here pattern(i,p,t) describes pattern i: it is 1 if period t is covered when pattern (i,p) is used and 0 otherwise. Here I use ≈ to indicate "approximately". (This is easily modeled using slacks and possibly penalty terms in the objective).
Now we maximize
maximize sum((i,p), weight(i,p) * δ(i,p))
where weight(i,p) indicates the preference for a pattern (e.g. weight(0,p)=0, i.e. no bonus points when a newly preferred pattern is not selected).
Something like this should not be too difficult to set up. Of course, many refinements are possible. These types of models tend to solve quite quickly.
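For a handful of employees the model can be sanity-checked without a solver. The sketch below is an exhaustive stand-in for the MIP, not a MIP solver: it tries every combination of one option per person, adds the weight(i,p) bonuses, and penalises the absolute deviation from requiredlevel(t) in place of slack variables. The names `options`, `required` and `penalty` are illustrative assumptions; with 2000 employees the same objective would have to go to a real MIP/CP solver such as OR-Tools.

```python
from itertools import product


def coverage(pattern, start, horizon):
    """0/1 work calendar for a pattern of alternating work/free block lengths."""
    cycle = []
    for idx, length in enumerate(pattern):
        cycle.extend([1 if idx % 2 == 0 else 0] * length)
    return [cycle[(start + t) % len(cycle)] for t in range(horizon)]


def best_assignment(options, required, penalty=10.0):
    """Try every choice of one option per person (tiny instances only).

    options[p]: list of (pattern, start, weight) for person p; index 0 is the
    current pattern with weight 0. Score = sum of weights minus
    penalty * total |staff(t) - required(t)|, the slack-variable penalty.
    """
    horizon = len(required)
    best_choice, best_score = None, None
    for choice in product(*(range(len(opts)) for opts in options)):
        staff = [0] * horizon
        bonus = 0.0
        for person, i in enumerate(choice):
            pattern, start, weight = options[person][i]
            bonus += weight
            for t, works in enumerate(coverage(pattern, start, horizon)):
                staff[t] += works
        slack = sum(abs(s - r) for s, r in zip(staff, required))
        score = bonus - penalty * slack
        if best_score is None or score > best_score:
            best_choice, best_score = choice, score
    return best_choice, best_score


options = [
    [((5, 2), 0, 0.0)],                     # person 0: keeps current pattern
    [((5, 2), 0, 0.0), ((5, 2), 5, 1.0)],   # person 1: current, or a preferred shift
]
print(best_assignment(options, [1] * 7))    # ((0, 1), -29.0)
```

Here person 1's preferred shift earns the +1 bonus and also fills the two uncovered days, so option 1 wins.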

What is the workflow?
If you have a fixed roster and one person proposes a new pattern, just remove this person's contribution, test all (17) starting points of the new pattern, and score them.
If patterns or starting points can change for more than one employee, create an integer variable for each employee's starting point. From this starting point, it is easy to compute the person's contribution on each day of the shifted pattern. Then you can optimize the quality of service with respect to the starting points of all patterns, summing the potential contributions per day of the week for each employee.
Is that clear?
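A sketch of the single-request case, assuming `others[t]` already holds the staffing on day t with this person's contribution removed, and that quality is measured as total absolute deviation from the required level (both are assumptions):

```python
def coverage(pattern, start, horizon):
    """0/1 work calendar for a pattern of alternating work/free block lengths."""
    cycle = []
    for idx, length in enumerate(pattern):
        cycle.extend([1 if idx % 2 == 0 else 0] * length)
    return [cycle[(start + t) % len(cycle)] for t in range(horizon)]


def best_start(others, required, pattern):
    """Score every starting point of `pattern` against the rest of the roster.

    others[t]: staff on day t with this person's contribution removed.
    Lower score = closer to the required level (total absolute deviation).
    """
    horizon = len(required)
    scores = {}
    for start in range(sum(pattern)):          # e.g. 17 starts for 5-4-5-3
        cov = coverage(pattern, start, horizon)
        scores[start] = sum(abs(o + c - r) for o, c, r in zip(others, cov, required))
    best = min(scores, key=scores.get)         # first start with the lowest score
    return best, scores


best, scores = best_start([1, 1, 1, 1, 1, 0, 0], [1] * 7, (5, 2))
print(best, scores[best])  # 2 3
```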


How to design an algorithm to put elements into groups with constraints?

I was given the task of putting students into groups (to prepare a coding camp), but with several constraints. Though I've finished the task by hand, I'd like to know whether there are existing algorithms for tasks like this, or how I could design such an algorithm.
Background: 40 students in total, with these attributes:
gender: F/M
grade: Year 1/2
school: School 1/School 2/...
early assessment result: Rank from 1 to 40
Constraints: All of them need to be satisfied.
Exactly 4 people per group
Each group needs to have at least one girl
Each group needs to have at least one Year 2 student
The 4 group members need to come from 4 different schools
Each group needs to have at least one student who ranked in the top 10 in the early assessment
What I'm expecting:
The Best: An existing algorithm/program for this kind of problem
Or, An algorithm for this specific problem
Or at least, Some ideas of creating an algorithm for this specific problem
My thoughts:
Since I've succeeded in making the groups by hand, I know that a solution does exist for my current dataset. But if an algorithm is to find a solution for me, it should first check whether a solution exists at all, e.g. by checking that there are at least 10 girls and at least 10 Year 2 students (by the pigeonhole principle), plus some other conditions. Obviously, Constraint 5 is the easiest, and can provide a base solution for the rest. However, I still cannot find a systematic way of doing it. Perhaps brute force and randomization can help? I'm not sure.
And sorry, since the data is confidential, I cannot post it.
Update: After consulting a friend, here is a possible method:
First put the top 1 to 10 into 10 different groups.
Then iterate through groups. If the only person in the group is a boy/girl, try to add a girl/boy from a different school.
Then the problem size is reduced from 2^40 to 2^20, making brute force a viable solution.
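One systematic option is plain backtracking with early pruning. The sketch below is an illustration under assumptions, not an existing tool: it seats students best rank first (in the spirit of the friend's suggestion), refuses any school clash, checks the "at least one girl / Year 2 / top-ranked" rules as soon as a group is full, and treats empty groups as interchangeable so symmetric arrangements are not re-explored. It returns None when no valid grouping exists, which doubles as the feasibility check. The `Student` fields and `top_rank` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str
    girl: bool
    year: int      # 1 or 2
    school: str
    rank: int      # early-assessment rank, 1 = best


def make_groups(students, group_size=4, top_rank=10):
    """Backtracking search; returns a list of groups, or None if none exists."""
    if len(students) % group_size:
        return None
    n_groups = len(students) // group_size
    order = sorted(students, key=lambda s: s.rank)   # best-ranked first
    groups = [[] for _ in range(n_groups)]

    def full_group_ok(g):
        return (any(s.girl for s in g)
                and any(s.year == 2 for s in g)
                and any(s.rank <= top_rank for s in g))

    def place(k):
        if k == len(order):
            return True
        s = order[k]
        tried_empty = False
        for g in groups:
            if len(g) == group_size or any(m.school == s.school for m in g):
                continue                      # full, or would repeat a school
            if not g:
                if tried_empty:
                    continue                  # empty groups are interchangeable
                tried_empty = True
            g.append(s)
            if (len(g) < group_size or full_group_ok(g)) and place(k + 1):
                return True
            g.pop()                           # undo and try the next group
        return False

    return groups if place(0) else None
```

Because the search is exhaustive, its speed depends on how much the constraints prune; on harder instances a constraint-programming solver may be the more robust choice.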

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on a web or phone app that is responsible for scheduling. I want students to input the courses they took, and I'll give them possible combinations of courses they should take that fit their requirements.
However, let's say there are 150 courses that fit their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?
Would it be feasible to run something like this in browser or a mobile device?
First of all you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and perhaps precomputing a feasible data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
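As a sketch of the pruning idea, suppose (hypothetically) that each course is described by the set of time slots it meets in, and a valid selection must not clash. Building selections one course at a time lets you abandon a partial selection as soon as it clashes, so no combination containing that clash is ever generated:

```python
def compatible_sets(courses, k):
    """Yield k-course selections whose meeting slots never overlap.

    courses: dict name -> set of time slots (a hypothetical data model).
    """
    names = sorted(courses)

    def extend(start, chosen, used):
        if len(chosen) == k:
            yield tuple(chosen)
            return
        # Stop early when too few courses remain to reach k.
        for i in range(start, len(names) - (k - len(chosen)) + 1):
            slots = courses[names[i]]
            if used & slots:
                continue  # clash: this whole subtree is pruned
            chosen.append(names[i])
            yield from extend(i + 1, chosen, used | slots)
            chosen.pop()

    yield from extend(0, [], frozenset())


courses = {"A": {1}, "B": {1}, "C": {2}, "D": {3}}
print(list(compatible_sets(courses, 2)))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```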
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
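A minimal sketch of the dependency-tree view, assuming a hypothetical `prereqs` mapping from each course to the set of courses it requires: offer only courses whose prerequisites are already taken, and list first those that (transitively) unlock the most later courses, so that Course 201 surfaces before unrelated electives.

```python
def recommend(prereqs, taken):
    """Courses the student can take now, most 'unlocking' first.

    prereqs: course -> set of courses it requires (hypothetical data model).
    taken:   set of courses already completed.
    """
    children = {}
    for course, reqs in prereqs.items():
        for r in reqs:
            children.setdefault(r, set()).add(course)

    def unlocks(course):
        # Every course that depends on `course`, directly or transitively.
        found, stack = set(), [course]
        while stack:
            for nxt in children.get(stack.pop(), ()):
                if nxt not in found:
                    found.add(nxt)
                    stack.append(nxt)
        return found

    ready = [c for c, reqs in prereqs.items() if c not in taken and reqs <= taken]
    return sorted(ready, key=lambda c: (-len(unlocks(c)), c))


prereqs = {"201": set(), "301": {"201"}, "401": {"301"}, "105": set()}
print(recommend(prereqs, set()))  # ['201', '105']
```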
Yay, mathematics I think I can handle!
If there are 150 courses and you have to choose 3, then the number of possibilities is (150*149*148)/(3*2) = 551,300 (correction per jerry), which is certainly better than 150 factorial, which has a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have a way of randomly choosing an element from an array, so you keep the courses in an array and request 3 random unique entries from it.
While the number of potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. This task of randomly selecting k items from an n-sized list is delightfully trivial, even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
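In Python, for example, both the count and the random pick are one-liners (`math.comb` needs Python 3.8+; the course codes are made up):

```python
import math
import random

courses = [f"COURSE-{n:03d}" for n in range(150)]   # hypothetical course codes

print(math.comb(150, 3))                 # 551300, i.e. 150*149*148 // (3*2)
suggestion = random.sample(courses, 3)   # 3 distinct courses, nothing enumerated
print(suggestion)
```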
Option 1 (time/space costly): let the user browse the list of (150*149*148) possible choices on their phone, page by page, with the processing done on the server side.
Option 2 (simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag; when the user chooses an item from the bag, remove it from the bag.
Option 3 (complex): expand your decision tree (possible choices) using a dependency tree (parent course requires child courses), the list of courses already taken by the student, and their track/level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

What class of algorithms can be used to solve this?

EDIT: Just to make sure nobody racks their brains over this problem... I am not looking for the optimal algorithm. Any heuristic that makes sense is fine.
I made a previous attempt at formulating this and realized I did not do a great job at it so I removed that question. I have taken another shot at formulating my problem. Please feel free to provide any constructive criticism that can help me improve this.
N people
k announcements that I can make
Distance at which my voice can be heard (say 5 meters), i.e. I may decide whether or not to announce depending on the number of people within these 5 meters
Maximize the total number of people who have heard my k announcements and (optionally) minimize the time in which I can finish announcing all k announcements
Once a person hears my announcement, he is removed from the total, i.e. if he heard my first announcement, I do not count him even if he hears my second announcement
The same person, or the same set of people, may come within my proximity more than once
Let us consider 10 people numbered from 1 to 10 and the following pattern of arrival:
Time slot 1: 1 (payoff = 1)
Time slot 2: 2 3 4 5 (payoff = 4)
Time slot 3: 5 6 7 8 (payoff = 4 if no announcement was made previously in time slot 2, 3 if an announcement was made in time slot 2)
Time slot 4: 9 10 (payoff = 2)
and I am given 2 announcements to make. If I were an oracle, I would choose time slots 2 and 3, because then 7 people would have heard (since 5 already heard my announcement in time slot 2, I do not count him again). I am looking for an online algorithm that will help me decide whether or not to make an announcement, and if so, based on what factors. Does anyone have any ideas on what algorithms can be used to solve this, or a simpler version of this problem?
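For small instances the oracle answer can be computed by brute force, which is useful as a benchmark for any online rule (it does not solve the online problem itself). A sketch, where `slots` lists who is within earshot in each time slot:

```python
from itertools import combinations


def oracle(slots, k):
    """Best k announcement slots chosen with hindsight (exhaustive search)."""
    best = (0, ())
    for chosen in combinations(range(len(slots)), k):
        heard = set().union(*(slots[i] for i in chosen))   # each person counted once
        if len(heard) > best[0]:
            best = (len(heard), tuple(i + 1 for i in chosen))  # 1-based slot numbers
    return best


slots = [{1}, {2, 3, 4, 5}, {5, 6, 7, 8}, {9, 10}]
print(oracle(slots, 2))  # (7, (2, 3))
```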
There should be an approach relying upon a max-flow algorithm. In essence, you're trying to push the maximum amount of messages from start to end. Though it would be multidimensional, you could have a super-source, which connects to each value of t, then have each value of t connect to the people you can reach at that time, and then have each person connect to a super-sink. This way, you simply have to compute a max-flow (with the added constraint of no more than k shouts, which should be solvable with a bit of dynamic programming). It's a terrifically dirty way to solve it, but it should get the job done deterministically and without the use of heuristics.
I don't know that there is really a way to solve this or an algorithm to do it the way you have formulated it.
It seems like basically you are trying to reach the maximum number of people with exactly 2 announcements. But without knowing any information about the groups of people in advance, you can't really make any kind of intelligent decision about whether or not to use your first announcement. Your second one at least has the benefit of knowing when not to be used (i.e. if the group has no new members, then you know it's not worth wasting the announcement). But it still has basically the same problem.
The only real way to solve this is to use knowledge about the type of data or the desired outcome to make guesses. If you know that groups average 100 people with a standard deviation of 10, then you could just refuse to announce if less than 90 people are present. Or, if you know you need to reach at least 100 people with two announcements, you could choose never to announce to less than 50 at once. Obviously those approaches risk never announcing at all if the actual data does not meet what you would expect. But that's always going to be a risk, since you could get 1 person in the first group and then 0 in all of the rest, no matter what you do.
Or, you could try more clearly defining the problem, I have a hard time figuring out how to relate this to computers.
Let's start by trying to solve the simplest possible variant of the problem: let's assume N people and K timeslots, but only one possible announcement. Let's also assume that each person will only ever stay for one timeslot and that each person who hasn't yet shown up has an equal chance of showing up at any future timeslot.
Given these simplifications, at each timeslot you look at the payoff of announcing at the current timeslot and compare it to the chance of a future timeslot having a higher payoff. For example, let's assume 4 people and 3 timeslots:
Timeslot 1: Person 1 shows up, so you know you could get a payoff of 1 by announcing, but then you have 3 people to show up in 2 remaining timeslots, so at least one of those timeslots is guaranteed to have 2 people; so don't announce.
So at each timeslot, you can calculate the chance that a later timeslot will have a higher payoff than the current one by treating the remaining N people and K timeslots as N independent random numbers, each from 1..K, and calculating the chance of at least one value being hit at least as many times as the current payoff (similar to the Birthday problem, but for more than one collision). Then you need to decide how much to discount based on the expected variance (bird in the hand, etc.).
Generalization of this solution to the original problem is left as an exercise for the reader.
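The single-announcement calculation can be sketched exactly for tiny inputs by enumerating every way the remaining people can arrive. This follows the simplified model above (each remaining person appears in exactly one remaining slot, uniformly and independently); `threshold` is the number of arrivals a future slot needs, so for "strictly better than a payoff of p" pass p + 1. It assumes threshold >= 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def prob_some_slot_reaches(people_left, slots_left, threshold):
    """Exact P(at least one remaining slot gets >= threshold arrivals)."""
    if slots_left == 0:
        return Fraction(0)
    hits = total = 0
    # Every assignment of the remaining people to the remaining slots is equally likely.
    for outcome in product(range(slots_left), repeat=people_left):
        total += 1
        if outcome and max(Counter(outcome).values()) >= threshold:
            hits += 1
    return Fraction(hits, total)


# Timeslot 1 of the 4-people / 3-slots example: announcing now pays 1,
# a strictly better slot needs >= 2 of the 3 remaining people in 2 slots.
print(prob_some_slot_reaches(3, 2, 2))  # 1
```

For larger N and K the same probability would have to be estimated by simulation rather than enumeration.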

Shuffle and deal a deck of cards with constraints

Here are the facts first.
In the game of bridge there are 4 players named North, South, East and West.
All 52 cards are dealt, 13 cards to each player.
There is an honour-counting system: Ace = 4 points, King = 3 points, Queen = 2 points and Jack = 1 point.
I'm creating a "card dealer" with constraints, where for example you might say that the hand dealt to North has to have exactly 5 spades and between 13 and 16 honour points; the rest of the hands are random.
How do I accomplish this in the best way without affecting the "randomness", while also keeping the code efficient?
I'm coding in C# and .NET, but some ideas in pseudocode would be nice!
Since somebody already mentioned my Deal 3.1, I'd like to point out some of the optimizations I made in that code.
First of all, to get the most flexible constraints, I wanted to add a complete programming language to my dealer, so you could generate whole libraries of constraints with different types of evaluators and rules. I used Tcl for that language because I was already learning it for work, and in 1994, when Deal 0.0 was released, Tcl was the easiest language to embed inside a C application.
Second, I needed the constraint language to run fairly fast. The constraints are running deep inside the loop. Quite a lot of code in my dealer is little optimizations with lookup tables and the like.
One of the most surprising and simple optimizations was to not deal cards to a seat until a constraint is checked on that seat. For example, if you want north to match constraint A and south to match constraint B, and your constraint code is:
match constraint A to north
match constraint B to south
Then only when you get to the first line do you fill out the north hand. If it fails, you reject the complete deal. If it passes, next fill out the south hand and check its constraint. If it fails, throw out the entire deal. Otherwise, finish the deal and accept it.
I found this optimization when doing some profiling and noticing that most of the time was spent in the random number generator.
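The lazy-dealing trick can be sketched with a partial Fisher-Yates shuffle: only the positions for the seat being checked are randomised, and the rest of the deck is finished only after every constraint has passed. Cards are encoded as integers 0-51 with suit = card // 13 (0 = spades) and rank = card % 13 (0 = ace); that encoding and the function names are assumptions for illustration, not Deal's internals.

```python
import random

HCP = {0: 4, 1: 3, 2: 2, 3: 1}   # rank index: 0 = ace, 1 = king, 2 = queen, 3 = jack


def hcp(hand):
    return sum(HCP.get(card % 13, 0) for card in hand)


def spades(hand):
    return sum(1 for card in hand if card // 13 == 0)   # suit 0 = spades


def draw(deck, start, count, rng):
    """Run only the Fisher-Yates steps for positions start .. start+count-1."""
    for i in range(start, start + count):
        j = rng.randrange(i, len(deck))
        deck[i], deck[j] = deck[j], deck[i]
    return deck[start:start + count]


def lazy_deal(north_ok, south_ok, rng=random):
    """Fill a seat only when its constraint is checked; reject the deal early."""
    while True:
        deck = list(range(52))
        north = draw(deck, 0, 13, rng)
        if not north_ok(north):
            continue             # rejected after 13 random numbers, not a full shuffle
        south = draw(deck, 13, 13, rng)
        if not south_ok(south):
            continue
        draw(deck, 26, 26, rng)  # all constraints passed: finish the deal
        return {"N": north, "S": south, "E": deck[26:39], "W": deck[39:52]}


deal = lazy_deal(lambda h: spades(h) == 5 and 13 <= hcp(h) <= 16, lambda h: True)
```

Doing the Fisher-Yates steps in chunks gives the same distribution as one full shuffle, so the early rejection does not bias the accepted deals.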
There is one fancy optimization, which can work in some instances, called "smart stacking."
deal::input smartstack south balanced hcp 20 21
This generates a "factory" for the south hand which takes some time to build but which can then very quickly fill out the one hand to match this criteria. Smart stacking can only be applied to one hand per deal at a time, because of conditional probability problems. [*]
Smart stacking takes a "shape class" - in this case, "balanced," a "holding evaluator", in this case, "hcp", and a range of values for the holding evaluator. A "holding evaluator" is any evaluator which is applied to each suit and then totaled, so hcp, controls, losers, and hcp_plus_shape, etc. are all holding evaluators.
For smartstacking to be effective, the holding evaluator needs to take a fairly limited set of values. How does smart stacking work? That might be a bit more than I have time to post here, but it's basically a huge set of tables.
One last comment: If you really only want this program for bidding practice, and not for simulations, a lot of these optimizations are probably unnecessary. That's because the very nature of practicing makes it unworthy of the time to practice bids that are extremely rare. So if you have a condition which only comes up once in a billion deals, you really might not want to worry about it. :)
[Edit: Add smart stacking details.]
Okay, there are exactly 8192=2^13 possible holdings in a suit. Group them by length and honor count:
Holdings(length,points) = { set of holdings with this length and honor count }
Holdings(3,7) = {AK2, AK3,...,AKT,AQJ}
and let
h(length,points) = |Holdings(length,points)|
Now list all shapes that match your shape condition (spades=5):
Note that the collection of all possible hand shapes has size 560, so this list is not huge.
For each shape, list the ways you can get the total honor points you are looking for by listing the honor points per suit. For example,
Shape Points per suit
5-4-4-0 10-3-0-0
5-4-4-0 10-2-1-0
5-4-4-0 10-1-2-0
5-4-4-0 10-0-3-0
5-4-4-0 9-4-0-0
Using our sets Holdings(length,points), we can compute the number of ways to get each of these rows.
For example, for the row 5-4-4-0 10-3-0-0, you'd have:
h(5,10) * h(4,3) * h(4,0) * h(0,0)
So, pick one of these rows at random, with relative probability based on the count, and then, for each suit, choose a holding at random from the correct Holdings() set.
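A small sketch of the smart-stacking tables, assuming holdings are written as rank strings such as "AKT". It builds Holdings(length, points) for all 8192 holdings, computes each row's count as the product of h() over the four suits, picks a row with probability proportional to that count, then picks one holding per suit. `ROWS` contains only the five example rows from the table above, so this is not a complete factory for the 13-16 point condition, and assigning the remaining cards to other seats is left out.

```python
import random
from collections import defaultdict
from itertools import combinations

RANKS = "AKQJT98765432"
POINTS = {"A": 4, "K": 3, "Q": 2, "J": 1}


def build_holdings():
    """Holdings(length, points): all 2**13 one-suit holdings grouped by key."""
    table = defaultdict(list)
    for length in range(14):
        for cards in combinations(RANKS, length):
            table[(length, sum(POINTS.get(c, 0) for c in cards))].append("".join(cards))
    return table


def row_weight(table, shape, points):
    """Number of hands for one row: the product of h(length, points) per suit."""
    weight = 1
    for length, pts in zip(shape, points):
        weight *= len(table.get((length, pts), ()))
    return weight


def smart_stack(table, rows, rng=random):
    """Pick a row with probability proportional to its count, then one holding per suit."""
    weights = [row_weight(table, shape, pts) for shape, pts in rows]
    shape, pts = rng.choices(rows, weights=weights)[0]
    return [rng.choice(table[(length, p)]) for length, p in zip(shape, pts)]


ROWS = [((5, 4, 4, 0), (10, 3, 0, 0)), ((5, 4, 4, 0), (10, 2, 1, 0)),
        ((5, 4, 4, 0), (10, 1, 2, 0)), ((5, 4, 4, 0), (10, 0, 3, 0)),
        ((5, 4, 4, 0), (9, 4, 0, 0))]

table = build_holdings()
print(row_weight(table, (5, 4, 4, 0), (10, 3, 0, 0)))  # 136080 = 9 * 120 * 126 * 1
print(smart_stack(table, ROWS))                        # one holding per suit
```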
Obviously, the wider the range of hand shapes and points, the more rows you will need to precompute. With a little more code, you can still do this with some cards predetermined, e.g. if you know where the ace of spades is, or West's whole hand, or whatever.
[*] In theory, you can solve these conditional probability issues for smart stacking with multiple hands, but the solution to the problem would make it effective only for extremely rare types of deals. That's because the number of rows in the factory's table is roughly the product of the number of rows for stacking one hand times the number of rows for stacking the other hand. Also, the h() table has to be keyed on the number of ways of dividing the n cards amongst hand 1, hand 2, and other hands, which changes the number of values from roughly 2^13 to 3^13 possible values, which is about two orders of magnitude bigger.
Since the numbers are quite small here, you could just take the heuristic approach: Randomly deal your cards, evaluate the constraints and just deal again if they are not met.
Depending on how fast your computer is, it might be enough to do this:
do a random deal
Until the board meets all the constraints
As with all performance questions, the thing to do is try it and see!
edit I tried it and saw:
done 1000000 hands in 12914 ms, 4424 ok
This is without giving any thought to optimisation - and it produces 342 hands per second meeting your criteria of "North has 5 spades and 13-16 honour points". I don't know the details of your application but it seems to me that this might be enough.
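The loop above, written as a sketch of rejection sampling (the card encoding as integers 0-51, with suit = card // 13 where 0 = spades and rank = card % 13 where 0 = ace, is an assumption):

```python
import random

HCP = {0: 4, 1: 3, 2: 2, 3: 1}   # rank index: 0 = ace, 1 = king, 2 = queen, 3 = jack


def hcp(hand):
    return sum(HCP.get(card % 13, 0) for card in hand)


def spades(hand):
    return sum(1 for card in hand if card // 13 == 0)   # suit 0 = spades


def deal_until(constraint, rng=random):
    """Rejection sampling: deal complete boards until constraint(board) holds."""
    deck = list(range(52))
    tries = 0
    while True:
        tries += 1
        rng.shuffle(deck)
        board = {seat: sorted(deck[13 * i:13 * i + 13]) for i, seat in enumerate("NESW")}
        if constraint(board):
            return board, tries


board, tries = deal_until(lambda b: spades(b["N"]) == 5 and 13 <= hcp(b["N"]) <= 16)
print(tries, board["N"])
```

Because every try is an independent uniform shuffle, the accepted boards are uniformly distributed among the boards that meet the constraint.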
I would go for this flow, which I think does not affect the randomness (other than by pruning solutions that do not meet constraints):
List in your program all possible combinations of "valued" cards whose total Honour points count is between 13 and 16. Then pick randomly one of these combinations, removing the cards from a fresh deck.
Count how many spades you already have among the valued cards, and pick randomly among the remaining spades of the deck until you meet the count.
Now pick from the deck as much non-spades, non-valued cards as you need to complete the hand.
Finally pick the other hands among the remaining cards.
You can write a program that generates the combinations from my first point, or simply hardcode them while accounting for suit symmetries to reduce the number of lines of code :)
Since you want to practise bidding, I guess you will likely be having various forms of constraints (and not just 1S opening, as I guess for this current problem) coming up in the future. Trying to come up with the optimal hand generation tailored to the constraints could be a huge time sink and not really worth the effort.
I would suggest you use rejection sampling: Generate a random deal (without any constraints) and test if it satisfies your constraints.
In order to make this feasible, I suggest you concentrate on making the random deal generation (without any constraints) as fast as you can.
To do this, map each deal to a 12-byte integer (the total number of bridge deals, 52!/(13!)^4 ≈ 5.36 × 10^28, fits in 12 bytes). Generating a random 12-byte integer can be done with just three 4-byte random number calls. Of course, since the number of deals does not exactly fill 12 bytes, you might have a bit of processing to do here, but I expect it won't be too much.
Richard Pavlicek has an excellent page (with algorithms) to map a deal to a number and back.
See here: http://www.rpbridge.net/7z68.htm
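The number-to-deal idea can be sketched with generic lexicographic unranking: walk through the cards in order and, for each card, skip over the blocks of deals that give it to an earlier seat. This is a sketch of the general technique, not necessarily Pavlicek's exact algorithm; Python's arbitrary-precision integers make the 96-bit arithmetic trivial.

```python
import random
from math import factorial

SEATS = "NESW"


def count_deals(counts):
    """Ways to give counts[s] of the remaining cards to seat s (multinomial)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


TOTAL_DEALS = count_deals([13, 13, 13, 13])   # 53644737765488792839237440000 < 2**96


def index_to_deal(index):
    """Map 0 <= index < TOTAL_DEALS to a deal, in lexicographic order."""
    counts = [13, 13, 13, 13]
    hands = {seat: [] for seat in SEATS}
    for card in range(52):
        for s in range(4):
            if counts[s] == 0:
                continue
            counts[s] -= 1
            block = count_deals(counts)      # deals that give this card to seat s
            if index < block:
                hands[SEATS[s]].append(card)
                break
            index -= block                   # skip that whole block
            counts[s] += 1
    return hands


def deal_to_index(hands):
    """Inverse of index_to_deal."""
    owner = {card: SEATS.index(seat) for seat, cards in hands.items() for card in cards}
    counts = [13, 13, 13, 13]
    index = 0
    for card in range(52):
        for s in range(owner[card]):         # add the blocks of earlier seats
            if counts[s]:
                counts[s] -= 1
                index += count_deals(counts)
                counts[s] += 1
        counts[owner[card]] -= 1
    return index


deal = index_to_deal(random.randrange(TOTAL_DEALS))   # a uniformly random deal
```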
I would also suggest you look at the existing bridge hand dealing software (like Deal 3.1, which is freely available) too. Deal 3.1 also supports doing double dummy analysis. Perhaps you could make it work for you without having to roll one of your own.
Hope that helps.

Algorithm for most recently/often contacts for auto-complete?

We have an auto-complete list that's populated when you send an email to someone, which is all well and good until the list gets really big: you need to type more and more of an address to get to the one you want, which defeats the purpose of auto-complete.
I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.
What I want to know is whether there are any known good algorithms for this kind of search, or if anyone has any suggestions.
I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.
Other than just arbitrarily picked numbers, does anyone have any input? Other numbers are also welcome if you can give a reason why you think they're better than mine.
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.
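For reference, the point scheme from the question can be sketched as follows; reading "last month" as 30 days and "last 6 months" as 182 days, and breaking ties alphabetically, are assumptions:

```python
from datetime import date

# Thresholds from the question; "last month" = 30 days and
# "last 6 months" = 182 days are assumptions.
RECENCY = ((0, 5), (3, 4), (7, 3), (30, 2), (182, 1))     # (max days ago, points)
FREQUENCY = ((25, 5), (15, 4), (10, 3), (5, 2), (2, 1))   # (min emails, points)


def recency_points(days_ago):
    return next((pts for limit, pts in RECENCY if days_ago <= limit), 0)


def frequency_points(count):
    return next((pts for minimum, pts in FREQUENCY if count >= minimum), 0)


def rank(contacts, today):
    """contacts: (address, date last emailed, number of emails sent) tuples."""
    def score(contact):
        _, last_sent, count = contact
        return recency_points((today - last_sent).days) + frequency_points(count)
    return [c[0] for c in sorted(contacts, key=lambda c: (-score(c), c[0]))]


print(rank([("old@x", date(2023, 1, 1), 40),
            ("new@x", date(2024, 5, 10), 1),
            ("mid@x", date(2024, 5, 8), 6)], today=date(2024, 5, 10)))
# ['mid@x', 'new@x', 'old@x']
```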
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list, such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.
It looks like the move to front implementation would best suit your needs.
EDIT: When an address is selected, add one to its frequency, and move it to the front of the group of nodes with the same weight (or (weight div x) for coarser groupings). I see aging as a real problem with your proposed implementation, in that it requires recalculating a weight for each and every item. A self-organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.
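A sketch of the tweaked self-organizing list described in the EDIT, using a Python list of [address, uses] pairs in place of a linked list (an assumption for brevity). Only the selected entry is updated, and it is moved to the front of its weight class (`uses // coarseness`), so the list stays ordered by weight class without touching any other entry:

```python
def select(contacts, address, coarseness=1):
    """Record one use of `address` in a list of [address, uses] pairs.

    The list is kept ordered by weight class (uses // coarseness), highest
    first. Only the selected entry changes: its count goes up and it moves to
    the front of its (new) weight class.
    """
    idx = next(i for i, (a, _) in enumerate(contacts) if a == address)
    entry = contacts.pop(idx)
    entry[1] += 1
    weight = entry[1] // coarseness
    pos = next((i for i, (_, n) in enumerate(contacts) if n // coarseness <= weight),
               len(contacts))
    contacts.insert(pos, entry)


contacts = [["a@x", 3], ["b@x", 2], ["c@x", 2], ["d@x", 0]]
select(contacts, "c@x")
print([a for a, _ in contacts])  # ['c@x', 'a@x', 'b@x', 'd@x']
```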
This kind of thing seems similar to what Firefox does when suggesting the site you are typing.
Unfortunately I don't know exactly how Firefox does it; a point system seems good as well, though maybe you'll need to balance your points :)
I'd go for something similar to:
NoM = Number of Mail
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you have not written to during the last month (the window could be changed) will have 0 points. You could sort those by total NoM sent (since they are in the contact list :). These will be shown after the contacts with points > 0.
It's just an idea; anyway, the point is to give different importance to the most-mailed and the just-mailed contacts.
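A sketch of that formula, assuming "during the last week" and "during the last month" mean the last 7 and 30 days including today, with zero-score contacts falling back to total NoM:

```python
from datetime import date


def nom_score(send_dates, today):
    """(NoM today) + 1/2 * (NoM last 7 days)/7 + 1/3 * (NoM last 30 days)/30."""
    ages = [(today - d).days for d in send_dates]
    today_n = sum(1 for a in ages if a == 0)
    week_n = sum(1 for a in ages if 0 <= a < 7)
    month_n = sum(1 for a in ages if 0 <= a < 30)
    return today_n + (week_n / 7) / 2 + (month_n / 30) / 3


def rank(history, today):
    """history: address -> list of send dates. Zero scores fall back to total NoM."""
    return sorted(history,
                  key=lambda a: (-nom_score(history[a], today), -len(history[a]), a))


today = date(2024, 3, 10)
history = {
    "a@x": [date(2024, 3, 10), date(2024, 3, 7), date(2024, 2, 19)],
    "b@x": [date(2023, 1, 1)] * 5,
    "c@x": [date(2023, 1, 1)],
}
print(rank(history, today))  # ['a@x', 'b@x', 'c@x']
```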
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.
Once you have the set of measurements you want to use, you could create an interactive application to test out different weights, and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache, when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
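A sketch of the combined recency-frequency (CRF) value from that family of policies (LRFU), assuming the weighting function F(x) = (1/2)^(λx) with x the time since a reference; the time units and the per-contact `state` dict are illustrative assumptions. Each reference updates the value incrementally, so no email history needs to be kept:

```python
def lrfu_weight(age, lam):
    """LRFU weighting function: F(x) = (1/2) ** (lam * x)."""
    return 0.5 ** (lam * age)


def touch(state, key, now, lam):
    """A new reference at time `now`: CRF = F(0) + F(now - last) * CRF_old."""
    crf, last = state.get(key, (0.0, now))
    state[key] = (lrfu_weight(0, lam) + lrfu_weight(now - last, lam) * crf, now)


def value(state, key, now, lam):
    """CRF of `key` as seen at time `now`, without a new reference."""
    crf, last = state[key]
    return lrfu_weight(now - last, lam) * crf


for lam in (0.0, 1.0):
    state = {}
    for t in (1, 2):
        touch(state, "often@x", t, lam)   # used twice, a while ago
    touch(state, "recent@x", 3, lam)      # used once, just now
    print(lam, value(state, "often@x", 3, lam), value(state, "recent@x", 3, lam))
# 0.0 2.0 1.0
# 1.0 0.75 1.0
```

With lambda = 0 the twice-used contact stays ahead (frequency wins); with lambda = 1 the most recent one wins.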
In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.
I would account for frequency by incrementing a counter each use, but by some larger-than-one value, like 10 (To add precision to the second point).
I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo#bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.
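The same scheme as an in-memory sketch (a Python dict standing in for the `addresslist` table; it mirrors the two UPDATE statements rather than replacing them):

```python
import math


def use(favor, address, increment=10):
    """Each use: favor = favor + increment."""
    favor[address] = favor.get(address, 0) + increment


def decay(favor, factor=0.9):
    """Each interval: favor = FLOOR(favor * factor) for every address."""
    for address in favor:
        favor[address] = math.floor(favor[address] * factor)


def ranked(favor):
    """Auto-complete order: highest favor first, ties alphabetical."""
    return sorted(favor, key=lambda a: (-favor[a], a))


favor = {}
for _ in range(3):
    use(favor, "often@x")      # 30
use(favor, "new@x")            # 10
decay(favor)                   # often@x 27, new@x 9
use(favor, "new@x")
use(favor, "new@x")            # new@x 29
print(ranked(favor))           # ['new@x', 'often@x']
```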
