Minimum nurses by shift - job-scheduling

I'm experimenting with the first example from this link: https://developers.google.com/optimization/scheduling/employee_scheduling
I would like to have at least two nurses per shift.
So I modified the line:
model.Add(sum(shifts[(n, d, s)] for n in all_nurses) == 1)
to:
model.Add(sum(shifts[(n, d, s)] for n in all_nurses) == 2)
and I increased the number of nurses to 8.
Unfortunately, the program doesn't find any solution.
But it should: there are enough nurses for this.
Do you have any ideas? Thanks!
PS: If I set the number of days to 1, it works, but for any value greater than 1 it finds nothing.

Actually, this example is problematic and needs to be updated. The problem is that it assumes that the number of possible shifts is equal to the number of nurses, and thus that each shift is performed by exactly one nurse.
For a more interesting starting point, I suggest you look at:
https://github.com/google/or-tools/blob/stable/examples/python/shift_scheduling_sat.py
This example is more complex and includes specific constraints that are better suited to shift scheduling.
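If you would rather patch the original example, a likely culprit is its fairness constraint, which distributes shifts evenly among nurses and computes its per-nurse bounds assuming exactly one nurse per shift. Below is a minimal sketch of the adjustment, assuming the variable names used in the linked example (num_days, num_shifts, num_nurses, all_nurses, all_days, all_shifts, shifts, model); treat it as an untested illustration, not a verified patch:

nurses_per_shift = 2
# The total number of assignments now scales with the nurses required per shift.
total_assignments = nurses_per_shift * num_shifts * num_days
min_shifts_per_nurse = total_assignments // num_nurses
if total_assignments % num_nurses == 0:
    max_shifts_per_nurse = min_shifts_per_nurse
else:
    max_shifts_per_nurse = min_shifts_per_nurse + 1
for n in all_nurses:
    num_shifts_worked = sum(
        shifts[(n, d, s)] for d in all_days for s in all_shifts)
    model.Add(min_shifts_per_nurse <= num_shifts_worked)
    model.Add(num_shifts_worked <= max_shifts_per_nurse)

If the example computes the bounds from num_shifts * num_days alone (as it did at the time of writing), requiring two nurses per shift over several days makes the model infeasible, which matches the symptom above: with 3 shifts, 3 days, and 8 nurses, the old bounds allow at most 2 shifts per nurse, i.e. 16 assignments, but 3 x 3 x 2 = 18 are needed. With a single day, only 6 assignments are needed, which is why it still works.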


Finding minimum number of days

I got this question as part of an interview and I am still unable to solve it.
It goes like this:
A person has to complete N units of work; the nature of the work is the same.
In order to get the hang of the work, he completes only one unit of work on the first day.
He wishes to celebrate the completion of the work, so he decides to complete one unit of work on the last day.
He is only allowed to complete x, x+1, or x-1 units of work in a day, where x is the number of units completed on the previous day.
What is the minimum number of days he will take to complete N units of work?
Sample Input:
6
-1
0
2
3
9
13
Here, line 1 represents the number of input test cases.
Sample Output:
0
0
2
3
5
7
Each number represents the minimum days required for each input in the sample input.
I tried doing it using the coin-change approach but was not able to make it work.
In 2k days, it's possible to do at most 2T(k) units of work (where T(k) is the k-th triangular number); in 2k+1 days, at most T(k)+T(k+1). That's because with an even number of days (2k), the most work is 1+2+3+...+k + k+(k-1)+...+3+2+1; with an odd number of days (2k+1), the most work is 1+2+3+...+k+(k+1)+k+...+3+2+1.
Given this pattern, it's possible to reduce the total to any smaller value down to one unit per day: simply reduce the work done on the day with the most work, never picking the start or end day. This never violates the rule that the work on one day differs by at most 1 from that on an adjacent day.
Call this function F. That is:
F(2k) = 2T(k)
F(2k+1) = T(k)+T(k+1)
Recall that T(k) = k(k+1)/2, so the equations simplify:
F(2k) = k(k+1)
F(2k+1) = k(k+1)/2 + (k+1)(k+2)/2 = (k+1)^2
Armed with these observations, you can solve the original problem by finding the smallest number of days where it's possible to do at least N units of work. That is, the smallest d such that F(d) >= N.
You can, for example, use binary search to find d, or solve the inequalities directly: the minimal even solution has (d/2)(d/2 + 1) >= N, which you can solve as a quadratic equation, and the minimal odd solution has (d+1)^2/4 >= N, which gives d = ceil(2*sqrt(N) - 1). Once you've found the minimal even and odd solutions, pick the smaller of the two.
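To make this concrete, here is a small sketch in Python (function names are mine; a linear scan stands in for the binary search or the closed form):

def T(k):
    # k-th triangular number
    return k * (k + 1) // 2

def F(d):
    # Most work achievable in d days with the 1, 2, ..., peak, ..., 2, 1 shape.
    k = d // 2
    return 2 * T(k) if d % 2 == 0 else T(k) + T(k + 1)

def min_days(n):
    # Smallest d with F(d) >= n. The linear scan keeps the sketch short;
    # binary search or the closed form above is faster for large n.
    if n <= 0:
        return 0
    d = 1
    while F(d) < n:
        d += 1
    return d

for n in (-1, 0, 2, 3, 9, 13):
    print(min_days(n))   # 0 0 2 3 5 7, matching the sample output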
Since you want the minimum number of days, you should keep increasing the daily amount (x+1) for as long as possible. But the last day has to be 1 unit, so at some point you have to start decreasing (x-1). So we have to determine the breakpoint.
The breakpoint is located in the middle of the days, since you start at 1 and want to end at 1.
For example, if you have to do 16 units, you distribute your days like:
Work done:
1 2 3 4 3 2 1
7 days worked.
When you can't make a clean peak like above, repeat some values:
5 = 1 2 1 1
Samples:
2: 1 1
3: 1 1 1
9: 1 2 3 2 1
13: 1 2 3 2 2 2 1 (7 days; note that 1 2 3 4 3 2 1 sums to 16, not 13, so interior days must be lowered)
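A sketch of this construction in Python (function name mine): build the pyramid for the minimal number of days, then lower interior days until the total is exactly N.

def build_schedule(n):
    # Most work in d days: F(d) = k(k+1) for d = 2k, (k+1)^2 for d = 2k+1.
    F = lambda d: (d // 2) * (d // 2 + 1) if d % 2 == 0 else (d // 2 + 1) ** 2
    d = 1
    while F(d) < n:      # minimal number of days, for n >= 1
        d += 1
    k = d // 2
    ramp = list(range(1, k + 1))
    days = ramp + ([k + 1] if d % 2 else []) + ramp[::-1]   # 1..peak..1
    for _ in range(sum(days) - n):
        # Lower the highest interior day; this never breaks the +/-1 rule,
        # since an interior maximum is never exceeded by its neighbours.
        i = max(range(1, d - 1), key=days.__getitem__)
        days[i] -= 1
    return days

print(build_schedule(13))   # [1, 2, 2, 2, 3, 2, 1] -- 7 days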
If you need to do exactly N units, not more, then you could use dynamic programming on a matrix a[x][y], where x is the amount of work done on the last day, y is the total amount of work so far, and a[x][y] is the minimum number of days needed. You could use Dijkstra's algorithm to minimize a[1][N]. Begin with a[1][1] = 1.
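Since every transition costs exactly one day, Dijkstra degenerates to breadth-first search here. A sketch (Python, names mine):

from collections import deque

def min_days_exact(n):
    # State (x, y): x units done on the most recent day, y units in total.
    # Start with one unit on day 1; stop when the last day did 1 unit and
    # the total is exactly n. All edges cost one day, so BFS suffices.
    if n < 1:
        return 0
    dist = {(1, 1): 1}
    q = deque([(1, 1)])
    while q:
        x, y = q.popleft()
        if x == 1 and y == n:
            return dist[(x, y)]
        for nx in (x - 1, x, x + 1):
            if nx >= 1 and y + nx <= n and (nx, y + nx) not in dist:
                dist[(nx, y + nx)] = dist[(x, y)] + 1
                q.append((nx, y + nx))

print([min_days_exact(n) for n in (2, 3, 9, 13)])   # [2, 3, 5, 7]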

Algorithm complexity

Suppose an oracle knows a natural number n that you wish to know.
The oracle answers only yes/no to the following three types of queries:
Is the number greater than x?
Is the number less than x?
Is the number equal to x?
(where x is an arbitrary natural number that can change from query to query).
Describe a method for posing queries to the oracle, which is asymptotically efficient in the number of queries posed.
Perform the analysis and write a proof of correctness. Note that the number of queries posed will be a function of n.
This question is not entirely fair, as it requires asymptotic efficiency without giving any hint of the goal. We can use an informal information-theoretic bound: the answer conveys about lg n bits of information, and each query yields at most one bit, so Omega(lg n) queries are needed.
The algorithm
Phase 1: find the number of significant bits.
Ask x<1b, x<10b, x<100b, x<1000b, x<10000b, x<100000b... (all powers of 2)
until you get a yes.
Phase 2: find all bits.
Take the value of the last bound from phase 1 and divide it by 2; that gives the most significant bit of x.
Then, going from the second most significant bit to the least significant bit,
set the next bit in the candidate Q and ask whether x < Q. Keep the bit set if you get a no.
Example
Let us assume x=10110b, your questions will go as follows:
x<1b ? no
x<10b ? no
x<100b ? no
x<1000b ? no
x<10000b ? no
x<100000b ? yes
Q=10000b
x<11000b ? yes
Q=10000b
x<10100b ? no
Q=10100b
x<10110b ? no
Q=10110b
x<10111b ? yes
Q=10110b
For 5 bits, 10 questions.
Correctness
In phase 1, the search intervals form a partition of the integers, so the search will stop sooner or later. When it stops, P <= x < 2P holds, where P is a power of 2; that is, 2^k <= x < 2^(k+1).
In phase 2, we keep the invariant Q <= x < Q + 2^(k+1) by iterative halving (initially Q = 0): given Q <= x < Q + 2^(k+1), we ask whether x < Q + 2^k and conclude either Q <= x < Q + 2^k or Q + 2^k <= x < Q + 2^(k+1); the latter becomes Q' <= x < Q' + 2^k by setting Q' = Q + 2^k. In the end, Q <= x < Q + 1, so x = Q.
Efficiency
Phase 1 takes one query per power of 2 up to and including the first one above x: one more than the number of significant bits.
Phase 2 takes one query per remaining bit: one fewer than the number of significant bits.
In total, that is 2b queries for a b-bit number (10 questions for 5 bits, as above), i.e. O(lg n), matching the lower bound.
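A sketch of both phases in Python, with the oracle modeled as a single "less than" predicate (names mine):

def guess(less_than):
    # less_than(q) answers: is the hidden number less than q?
    # Phase 1: find k with 2^k <= n < 2^(k+1) by asking about powers of 2.
    i = 0
    while not less_than(1 << i):
        i += 1
    if i == 0:
        return 0                 # n < 1, so n = 0 (if 0 counts as natural here)
    q = 1 << (i - 1)             # the most significant bit of n
    # Phase 2: decide the remaining bits, most significant first.
    for b in range(i - 2, -1, -1):
        if not less_than(q | (1 << b)):
            q |= 1 << b          # "no" means n >= q with this bit set
    return q

n = 22                           # 10110b, as in the example above
print(guess(lambda x: n < x))    # 22, after 10 questions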
Check out the Wikipedia article on the binary search algorithm. It can be a starting point for you.
Binary Search Algorithm

Guessing a number knowing only if the number proposed is lower or higher?

I need to guess a number. I can only see if the number I'm proposing is lower or higher. Performance matters a whole lot, so I thought of the following algorithm:
Let's say the number I'm trying to guess is 600.
I start out with the number 1000 (or for even higher performance, the average result of previous numbers).
I then check if 1000 is higher or lower than 600. It is higher.
I then divide the number by 2 (so that it is now 500), and check if it is lower or higher than 600. It is lower.
I then take the midpoint between the two bounds to retrieve a new number: (1000 + 500) / 2 = 750. I then check that number.
And so on.
Is this the best approach, or is there a smarter way of doing this? For my case, every guess takes approximately 500 milliseconds, and I need to guess quite a lot of numbers in as little time as possible.
I can roughly assume that the average result of previous guesses is close to the upcoming numbers too, so there's a pattern there which I can use for my own advantage.
Yes, binary search is the most effective way of doing this; binary search is what you described. For a number between 1 and N, binary search runs in O(log n) time.
So here is the algorithm to find a number between 1 and N:
int a = 1, b = n, guess = average of previous answers;
while (guess is wrong) {
    if (guess is lower than the answer)       { a = guess; }
    else if (guess is higher than the answer) { b = guess; }
    guess = (a + b) / 2;
}
Well, you're taking the best possible approach without the extra information - it's a binary search, basically.
Exactly how you use the "average result of previous guesses" is up to you; I would suggest biasing the guesses towards that average, but you'd need to analyze just how indicative previous results are in order to work out the best approach. Don't just use the average: use the complete distribution.
For example, if all the results have been in the range 600-700 (even though the hypothetical range is up to 1000) with an average of 670, you might start with 670 but if it says "guess higher" then you would probably want to choose a value between 670 and 700 as your next guess, rather than 835 (which is very likely to be higher than the real result).
I suggest you log all the results from previous enquiries, so you can then use that as test data for alternative approaches.
In general, binary search starting at the middle point of the range is the optimal strategy. However, you have additional specific information which may make this a suboptimal strategy. It depends critically on what exactly "close to the average of the previous results" means.
If numbers are close to the previous average then dividing by 2 in the second step is not optimal.
Example: Previous numbers 630, 650, 620, 660. You start with 640.
The actual number is close to those. Imagine it is 634.
The number is lower than your guess. If in the second step you divide by 2, you get 320, thus losing any advantage from the previous numbers.
You should analyze the behaviour further. It may be optimal, in your specific case, to start at the mean of the N previous numbers and then add or subtract some quantity related to the standard deviation of the previous numbers.
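A rough sketch of that idea in Python (the widening scheme and all names are my own assumptions, not part of the answer above):

import statistics

def guess_with_prior(compare, history):
    # compare(g): negative if g is below the target, positive if above, 0 if equal.
    # Seed at the mean of previous results, widen the bracket in growing
    # steps scaled by the standard deviation, then bisect inside it.
    mu = round(statistics.mean(history))
    step = max(1, round(statistics.stdev(history)))
    lo = hi = mu
    while compare(hi) < 0:       # target above the mean: push the upper bound out
        lo, hi = hi, hi + step
        step *= 2
    while compare(lo) > 0:       # target below the mean: push the lower bound out
        lo, hi = lo - step, lo
        step *= 2
    while lo <= hi:              # plain bisection inside the bracket
        mid = (lo + hi) // 2
        c = compare(mid)
        if c == 0:
            return mid
        lo, hi = (mid + 1, hi) if c < 0 else (lo, mid - 1)

target = 634
print(guess_with_prior(lambda g: g - target, [630, 650, 620, 660]))   # 634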
Yes, binary search (your algorithm) is correct here. However there is one thing missing in the standard binary search:
For binary search you normally need to know the maximum and minimum between which you are searching. If you do not know them, you first have to find bounds iteratively, like so (see the sketch below):
Start with zero.
If it is higher than the number searched for, zero is your maximum and you have to find a minimum.
If it is lower than the number searched for, zero is your minimum and you have to find a maximum.
You can search for your maximum/minimum by starting at 1 or -1 and repeatedly multiplying by two until you find a number that is greater/smaller.
Doubling like this is much faster than searching linearly.
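A sketch of just that bound-finding phase (Python, names mine); an ordinary binary search then runs inside the returned bracket:

def find_bracket(compare):
    # compare(g): negative if g is below the target, positive if above, 0 if equal.
    # Doubling (galloping) search for lo <= target <= hi, with no known range.
    if compare(0) == 0:
        return 0, 0
    if compare(0) < 0:           # 0 is below the target: double upward
        hi = 1
        while compare(hi) < 0:
            hi *= 2
        return hi // 2, hi
    lo = -1                      # 0 is above the target: double downward
    while compare(lo) > 0:
        lo *= 2
    return lo, lo // 2

target = 600
print(find_bracket(lambda g: g - target))   # (512, 1024)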
Do you know the range of possible values? If yes, always start in the middle and do exactly what you describe.
A standard binary search between 0 and N (where N is the given number) will give you the answer in O(log N) time.
int a = 1, b = n + 1, guess = average of previous answers;
while (guess is wrong) {
    if (guess is lower than the answer)       { a = guess; }
    else if (guess is higher than the answer) { b = guess; }
    guess = (a + b) / 2;
}
You have to add +1 to n, or else you can never reach n itself, since the midpoint is computed with integer division.
I gave an answer to a similar question, "Optimal algorithm to guess any random integer without limits?"
The algorithm provided there doesn't just search for the chosen number: it estimates the median of the distribution of a number that may be re-chosen at each step. And the number can even be drawn from the real domain ;)

Decision Tree learning algorithm

I want to preface this by saying that this is a homework assignment.
I am given a set of Q binary input variables that will be used to classify an output Y, which is also binary.
The first part of the question is: at most how many examples do I need in order to enumerate all possible combinations of the Q inputs? I currently think that, since it asks for "at most", I will need Q, as it is possible that all values up to Q-1 are the same (for instance 1) and the item at Q is 0.
The second part of the question is: at most how many leaf nodes can the tree have given Z examples?
My current answer is that the tree would have at most 2 leaf nodes, one representing true and one representing false, since it is dealing with binary inputs and binary outputs.
Is this the correct way of examining this problem, or am I overgeneralizing my answers?
Edit
After looking at Cameron's response, I would now turn my first answer into 2^Q; to build on his example of Q = 3, I would get 2^3 = 8 (2*2*2). Please correct me if that is incorrect thinking.
Edit #2
For the second part of the question, it appears as though the answer should be (2^Q) * Z. To provide an example: (2^3) * 3 = 8 * 3 = 24 leaf nodes. To recap: if I have 3 binary inputs, I initially take 2^3 and get 8; now I want to go over 3 examples, therefore I should get 8 * 3 = 24.
Edit #3
In hindsight, it seems that no matter how many examples I use, the number of leaf nodes should never increase, as it is determined on a per-tree basis.
I'd suggest you approach the problem by working out small example cases by hand.
For the first part, choose a small value for Q, say 3, and write down all possible combinations of Q. Then you can figure out how many examples you need. Increase Q and do it again.
For the second part of your question, pick a small Z and run the decision tree algorithm by hand. See how many leaves you get. Then pick another Z and see if/how it changes. Try generating different examples (with the same Z) and see if you can change the number of leaves.
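For the first part, you can also let the machine do the enumeration while you study the pattern; a tiny check in plain Python (not part of the assignment, just a sanity test):

from itertools import product

Q = 3
combos = list(product([0, 1], repeat=Q))   # all assignments of Q binary inputs
print(len(combos))                         # 8, i.e. 2**Q
for row in combos:
    print(row)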

Solving N-Queens Problem... How far can we go?

The N-Queens Problem:
The problem states that, given a chess board of size N by N, you must find the different arrangements in which N queens can be placed on the board without any queen threatening another.
My question is:
What is the maximum value of N for which a program can calculate the answer in reasonable amount of time? Or what is the largest N we have seen so far?
Here is my program in CLP(FD) (Prolog):
generate([], _).
generate([H|T], N) :-
    H in 1..N,
    generate(T, N).

lenlist(L, N) :-
    lenlist(L, 0, N).

lenlist([], N, N).
lenlist([_|T], P, N) :-
    P1 is P + 1,
    lenlist(T, P1, N).

queens(N, L) :-
    generate(L, N),
    lenlist(L, N),
    safe(L),
    !,
    labeling([ffc], L).

notattack(X, Xs) :-
    notattack(X, Xs, 1).

notattack(_, [], _).
notattack(X, [Y|Ys], N) :-
    X #\= Y,
    X #\= Y - N,
    X #\= Y + N,
    N1 is N + 1,
    notattack(X, Ys, N1).

safe([]).
safe([F|T]) :-
    notattack(F, T),
    safe(T).
This program works just fine, but the time it takes keeps on increasing with N.
Here is a sample execution:
?- queens(4,L).
L = [2, 4, 1, 3] ;
L = [3, 1, 4, 2] ;
No
This means you place the 4 queens at row 2 in column 1, row 4 in column 2, row 1 in column 3, and row 3 in column 4 (on a 4-by-4 chess board).
Now let's see how this program performs (time taken to calculate the first solution):
For N = 4, 5, ..., 10: computes within a second.
For N = 11-30: takes between 1 and 3 seconds.
For N = 40-50: still calculates within a minute.
At N = 60: it runs out of global stack (the search space being enormous).
This was a past homework problem. (The original problem was just to code N-Queens.)
I am also interested in seeing alternate implementations in other languages (which perform better than mine), or whether there is room for improvement in my algorithm/program.
This discussion conflates three different computational problems: (1) Finding a solution to the N queens problem, (2) Listing all solutions for some fixed N, and (3) counting all of the solutions for some fixed N. The first problem looks tricky at first for a size of board such as N=8. However, as Wikipedia suggests, in some key ways it is easy when N is large. The queens on a large board don't communicate all that much. Except for memory constraints, a heuristic repair algorithm has an easier and easier job as N increases.
Listing every solution is a different matter. That can probably be done with a good dynamic programming code up to a size that is large enough that there is no point in reading the output.
The most interesting version of the question is to count the solutions. The state of the art is summarized in a fabulous reference known as the On-Line Encyclopedia of Integer Sequences. The count has been computed up to N=26. I would guess that this also uses dynamic programming, but unlike the case of listing every solution, the algorithmic problem is much deeper and open to further advances.
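For the first of those three problems, here is a minimal sketch of the heuristic-repair (min-conflicts) idea in Python; the details (counters, restart policy) are my own choices, not taken from a specific paper:

import random

def min_conflicts(n, max_steps=100_000):
    # One queen per column; rows[c] is its row. Row and diagonal counters
    # let us count a queen's conflicts in O(1).
    rows = list(range(n))
    random.shuffle(rows)
    row_cnt = [0] * n
    d1 = [0] * (2 * n - 1)   # indexed by r + c
    d2 = [0] * (2 * n - 1)   # indexed by r - c + n - 1
    for c, r in enumerate(rows):
        row_cnt[r] += 1; d1[r + c] += 1; d2[r - c + n - 1] += 1

    def score(r, c):
        return row_cnt[r] + d1[r + c] + d2[r - c + n - 1]

    for _ in range(max_steps):
        # A placed queen counts itself three times, so 3 means "no conflicts".
        bad = [c for c in range(n) if score(rows[c], c) > 3]
        if not bad:
            return rows
        c = random.choice(bad)
        r = rows[c]
        row_cnt[r] -= 1; d1[r + c] -= 1; d2[r - c + n - 1] -= 1
        r = min(range(n), key=lambda rr: score(rr, c))   # least-conflicted row
        rows[c] = r
        row_cnt[r] += 1; d1[r + c] += 1; d2[r - c + n - 1] += 1
    return None   # step budget exhausted; restart in practice

print(min_conflicts(1000) is not None)   # usually True within a few seconds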
Loren Pechtel said: "Now for some real insanity: 29 took 9 seconds. 30 took almost 6 minutes!"
This fascinating lack of predictability in backtrack-complexity for different board sizes was the part of this puzzle that most interested me. For years I've been building a list of the 'counts' of algorithm steps needed to find the first solution for each board size - using the simple and well known depth-first algorithm, in a recursive C++ function.
Here's a list of all those 'counts' for boards up to N=49 ... minus N=46 and N=48 which are still work-in-progress:
http://queens.cspea.co.uk/csp-q-allplaced.html
(I've got that listed in the Encyclopedia of Integer Sequences (OEIS) as A140450)
That page includes a link to a list of the matching first solutions.
(My list of First Solutions is OEIS Sequence number A141843)
I don't primarily record how much processing time each solution demands; rather, I record how many failed queen placements were needed prior to the discovery of each board's algorithmically-first solution. Of course the rate of queen placements depends on CPU performance, but given a quick test run on a particular CPU and a particular board size, it's an easy matter to calculate how long it took to solve one of these 'found' solutions.
For example, on an Intel Pentium D 3.4GHz CPU, using a single CPU thread -
For N=35 my program 'placed' 24 million queens per second and took just 6 minutes to find the first solution.
For N=47 my program 'placed' 20.5 million queens per second and took 199 days.
My current 2.8GHz i7-860 is thrashing through about 28.6 million queens per second, trying to find the first solution for N=48. So far it has taken over 550 days (theoretically, if it had never been interrupted) to UNsuccessfully place 1,369,331,731,000,000 (and rapidly climbing) queens.
My web site doesn't (yet) show any C++ code, but I do give a link on that web page to my simple illustration of each one of the 15 algorithm steps needed to solve the N=5 board.
It's a delicious puzzle indeed!
Which Prolog system are you using? For example, with recent versions of SWI-Prolog, you can readily find solutions for N=80 and N=100 within fractions of a second, using your original code. Many other Prolog systems will be much faster than that.
The N-queens problem is even featured in one of the online examples of SWI-Prolog, available as CLP(FD) queens in SWISH.
Example with 100 queens:
?- time((n_queens(100, Qs), labeling([ff], Qs))).
Qs = [1, 3, 5, 57, 59 | ...] .
2,984,158 inferences, 0.299 CPU in 0.299 seconds (100% CPU, 9964202 Lips)
SWISH also shows you nice images of the solutions.
Here is an animated GIF showing the complete solution process for N=40 queens with SWI-Prolog:
A short solution presented by Raymond Hettinger at PyCon: "Easy AI in Python".
#!/usr/bin/env python
from itertools import permutations

n = 12
cols = range(n)
for vec in permutations(cols):
    # vec[i] is the row of the queen in column i; a permutation already
    # guarantees one queen per row and per column, so it only remains to
    # check that all diagonals (row+col) and anti-diagonals (row-col) differ.
    if (n == len(set(vec[i] + i for i in cols))
            == len(set(vec[i] - i for i in cols))):
        print(vec)
Computing all permutations is not scalable, though (O(n!)).
As to the largest N solved by computers, there are references in the literature in which a solution for N around 3*10^6 has been found using a conflict-repair algorithm (i.e. local search). See for example the classical paper of [Sosic and Gu].
As to exact solving with backtracking, there exist some clever branching heuristics which achieve correct configurations with almost no backtracking. These heuristics can also be used to find the first k solutions to the problem: after finding an initial correct configuration, the search backtracks to find other valid configurations in the vicinity.
References for these almost perfect heuristics are [Kale 90] and [San Segundo 2011]
What is the maximum value of N for which a program can calculate the answer in reasonable amount of time? Or what is the largest N we have seen so far?
There is no limit. That is, checking the validity of a solution is more costly than constructing one solution plus seven symmetrical ones.
See Wikipedia:
"Explicit solutions exist for placing n queens on an n × n board for all n ≥ 4, requiring no combinatorial search whatsoever‌​.".
I dragged out an old Delphi program that counted the number of solutions for any given board size, did a quick modification to make it stop after one hit, and I'm seeing an odd pattern in the data:
The first board that took over 1 second to solve was n = 20; 21 solved in 62 milliseconds, though. (Note: the timing is based on Delphi's Now, not on any high-precision timer.) 22 took 10 seconds, something not repeated until 28.
I don't know how good the optimization is, as this was originally a highly optimized routine from back when the rules of optimization were very different. I did do one thing very differently from most implementations, though: it has no board. Rather, I track which columns and diagonals are attacked and add one queen per row. This means 3 array lookups per cell tested and no multiplication at all. (As I said, from when the rules were very different.)
Now for some real insanity: 29 took 9 seconds. 30 took almost 6 minutes!
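A minimal sketch of that board-less tracking idea in Python (recursive, unlike the original optimized Delphi routine; names are mine):

def first_solution(n):
    # Track attacked lines instead of a board: cols[c], diag1[r + c],
    # diag2[r - c + n - 1]. One queen per row; 3 array lookups per cell.
    cols = [False] * n
    diag1 = [False] * (2 * n - 1)
    diag2 = [False] * (2 * n - 1)
    pos = [0] * n

    def place(r):
        if r == n:
            return True
        for c in range(n):
            if not (cols[c] or diag1[r + c] or diag2[r - c + n - 1]):
                cols[c] = diag1[r + c] = diag2[r - c + n - 1] = True
                pos[r] = c
                if place(r + 1):
                    return True
                cols[c] = diag1[r + c] = diag2[r - c + n - 1] = False
        return False

    return pos if place(0) else None

print(first_solution(8))   # [0, 4, 7, 5, 2, 6, 1, 3]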
Actually, a constrained random walk (generate and test) like the one bakore outlined is the way to go if you just need a handful of solutions, because these can be generated rapidly. I did this for a class when I was 20 or 21 and published the solution in the Journal of Pascal, Ada & Modula-2, March 1987, "The Queens Problem Revisited". I just dusted off the code from that article today (and this is very inefficient code), and after fixing a couple of problems I have been generating N=26 ... N=60 solutions.
If you only want one solution, it can be found greedily in linear time, O(N). My code in Python:
import numpy as np

n = int(input("Enter n: "))
rs = np.zeros(n, dtype=np.int64)        # rs[k] = row (0-based) of the queen in column k
board = np.zeros((n, n), dtype=np.int64)
k = 0
if n % 6 == 2:
    for i in range(2, n + 1, 2):        # even rows first
        rs[k] = i - 1
        k += 1
    rs[k] = 3 - 1
    k += 1
    rs[k] = 1 - 1
    k += 1
    for i in range(7, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 5 - 1
elif n % 6 == 3:
    rs[k] = 4 - 1
    k += 1
    for i in range(6, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 2 - 1
    k += 1
    for i in range(5, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 1 - 1
    k += 1
    rs[k] = 3 - 1
else:
    for i in range(2, n + 1, 2):        # the simple case: evens, then odds
        rs[k] = i - 1
        k += 1
    for i in range(1, n + 1, 2):
        rs[k] = i - 1
        k += 1
for i in range(n):
    board[rs[i]][i] = 1
print()
for i in range(n):
    print(" ".join(str(board[i][j]) for j in range(n)))
Here, however, printing takes O(N^2) time; also, Python being a slower language, anyone can try implementing it in other languages like C/C++ or Java. But even in Python it will get the first solution for n=1000 within 1 or 2 seconds.
