Hot and Cold Binary Search Game - algorithm

Hot or cold.
I think you have to do some sort of binary search but I'm not sure how.
Your goal is to guess a secret integer between 1 and N. You
repeatedly guess integers between 1 and N. After each guess you learn
if it equals the secret integer (and the game stops); otherwise
(starting with the second guess), you learn if the guess is hotter
(closer to) or colder (farther from) the secret number than your
previous guess. Design an algorithm that finds the secret number in lg
N + O(1) guesses.
Hint: Design an algorithm that solves the problem in lg N + O(1)
guesses assuming you are permitted to guess integers in the range -N
to 2N.
I've been racking my brain and I can't seem to come up with a lg N + O(1) solution.
I found this: http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=riddles_cs;action=display;num=1316188034 but could not understand the diagram and it did not describe other possible cases.

Suppose you know that your secret integer is in [a,b], and that your last guess is c.
You want to divide your interval in two, and to know whether your secret integer lies in [a,m] or in [m,b], with m = (a+b)/2.
The trick is to guess d, such that (c+d)/2 = (a+b)/2.
Without loss of generality, we can suppose that d is bigger than c. Then, if d is hotter than c, your secret integer will be bigger than (c+d)/2 = (a+b)/2 = m, and so your secret integer will lie in [m,b]. If d is cooler than c, your secret integer will belong to [a,m].
You need to be able to guess between -N and 2N because you can't guarantee that c and d as defined above will always fall within [1, N]. Your first two guesses can be 1 and N.
So you are dividing your interval by two at each guess, and thus the complexity is log(N) + O(1).
A short example to illustrate this (results chosen randomly):
Guess   Result   Interval of the secret number
1       ***      [1, N]           // d = a + b - c
N       cooler   [1, N/2]         // N = 1 + N - 1
-N/2    cooler   [N/4, N/2]       // -N/2 = 1 + N/2 - N
5N/4    hotter   [3N/8, N/2]      // 5N/4 = N/4 + N/2 + N/2
-3N/8   hotter   [3N/8, 7N/16]    // -3N/8 = 3N/8 + N/2 - 5N/4
...
Edit, suggested by @tmyklebu:
We still need to prove that our guesses always fall within [-N, 2N].
By induction, suppose that c (our previous guess) is in [a-(a+b), b+(a+b)] = [-b, a+2b].
Then d = a+b-c <= a+b-(-b) = a+2b, and d = a+b-c >= a+b-(a+2b) = -b.
Base case: a = 1, b = N, c = 1, and c is indeed in [-b, a+2b].
QED
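To make the strategy concrete, here is a minimal Python sketch of the answer above (the oracle, the names, and the integer rounding at the halving step are my own assumptions; the mirrored guesses stay within [-N, 2N]):

def find_secret(N, secret):
    def hotter(guess, prev):
        # the oracle: is this guess strictly closer to the secret than the previous one?
        return abs(secret - guess) < abs(secret - prev)

    a, b = 1, N        # invariant: the secret lies in [a, b]
    c = 1              # first guess, only establishes a reference
    guesses = 1
    if c == secret:
        return guesses
    while a < b:
        d = a + b - c  # mirror c around the midpoint (a + b) / 2
        guesses += 1
        if d == secret:
            return guesses
        if (d > c) == hotter(d, c):
            a = (a + b) // 2 + 1   # the secret is in the upper half
        else:
            b = (a + b) // 2       # the secret is in the lower half (or the midpoint)
        c = d
    return guesses + 1             # one last guess: the single remaining value a

# every secret in 1..N is found in about lg N + 2 guesses
assert all(find_secret(1000, s) <= 13 for s in range(1, 1001))

The first guess establishes a reference; each later guess d = a + b - c halves [a, b], so the total stays near lg N + 2.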

This was a task at IOI 2010, for which I sat on the Host Scientific Committee. (We asked for an optimal solution instead of simply lg N + O(1), and what follows is not quite optimal.)
Not swinging outside -N .. 2N and using lg N + 2 guesses is straightforward; all you need to do is show that the obvious translation of binary search works.
Once you have something that doesn't swing outside -N .. 2N and takes lg N + 2 guesses, do this:
Guess N/2, then N/2+1. This tells you which half of the range the answer is in. Then guess the end of that half. You're either in one of the two "middle" quarters or in one of the two "end" quarters. If you're in a middle quarter, apply the procedure above and you win in lg N + 4 guesses. The ends are slightly trickier.
Suppose I need to guess a number in 1 .. K without straying outside 1 .. N and my last guess was 1. If I guess K/2 and I'm colder, then I next guess 1; I spent two guesses to get a similar subproblem that's 1/4 the size. If K/2 is hotter, I know the answer is in K/4 .. K. Guess K/2-1 next. The two subcases are K/4 .. K/2-1 and K/2 .. K, both of which are nice. But it took me three guesses to (in the worst-case) halve the size of the problem; if I ever do this, I wind up doing lg N + 6 guesses.

The solution is close to binary search. At each step you have an interval that the number can be in. Start with the whole interval [1, N]. First guess both ends, that is, the numbers 1 and N. One of them will be closer, so you will know whether the number you are searching for is in [1, N/2] or in [N/2 + 1, N] (taking N even for simplicity). Now repeat the same approach on an interval half the size. Keep in mind that you've already probed one of the ends, though it may not be your last guess.
I am not sure what you mean by lg N + O(1), but the approach I suggest performs O(log N) probes; since each halving costs two guesses, it is about 2 * log2(N) probes in the worst case.

Here are my two cents for this problem, since I got obsessed with it for two days. I'm not going to say anything new beyond what others have already said, but I'll explain it in a way that might help some people understand the solution easily (or at least the way I managed to understand it).
Drawing from the ~2 lg N solution: if I knew the secret existed in [a, b], I'd want to know whether it lies in the left half [a, (a + b) / 2] or the right half [(a + b) / 2, b], with the point (a + b) / 2 separating the two halves. So what do I do? I guess a then b; if I get colder with b I know I'm in the first (left) half, and if I get hotter I know I'm in the second (right) one. So guessing a and b is a way to learn the secret integer's position with respect to the midpoint (a + b) / 2. However, a and b aren't the only points I can guess to learn that. (a - 1, b + 1), (a - 2, b + 2), ... etc. are all valid pairs of points to guess, as the midpoint of each of these pairs is (a + b) / 2, the midpoint of the original interval [a, b]. In fact any two numbers c and d such that (c + d) / 2 = (a + b) / 2 can be used.
So, taking [a, b] as the interval we know the secret integer lies in, let c be the last number we guessed. We want to determine the position of the secret with respect to the midpoint (a + b) / 2, so we need a new number d to guess that reveals the secret's position relative to (a + b) / 2. How do we find such a number d? By solving the equation (c + d) / 2 = (a + b) / 2, which yields d = a + b - c. Guessing that d, we shrink the range [a, b] appropriately based on the answer (colder or hotter) and then repeat the process, taking d as our last guess and trying a new guess e, for example, under the same conditions.
To establish the initial conditions, we start with a = 1, b = N, and c = 1. We guess c first to establish a reference (the first guess can't tell you anything useful, as there were no prior guesses). We then proceed with new guesses, adjusting the enclosing interval as appropriate after each one. The table in @R2B2's answer explains it all.
You have to be vigilant, however, when coding this solution. When I tried to code it in Python, I first ran into the mistake of [a, b] getting stuck once it was small enough (like [a, a + 1]), where neither a nor b would move inwards. I had to move the cases where the interval size was 2 outside the loop and handle them separately (as I did with intervals of size 1 as well).

The task is quite a brain-cracker, but I'll try to keep it simple.
To solve the problem in 2 lg N guesses, make the following guesses:
1 and N: to decide which half is hotter, (1, N/2) or (N/2, N). Suppose (N/2, N) is the hotter half; then make the next guesses
N/2 and N: again to decide which half is hotter, (N/2, 3N/4) or (3N/4, N), and so on.
So we need 2 guesses for every halving, and therefore make 2 lg N guesses in total.
Notice that each time we have to repeat one of the previous interval's borders one more time; in our example the point N is repeated.
To solve the problem in lg N guesses instead of 2 lg N, we need a way to spend only one guess per halving. A good illustration of such a method is the image at http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=riddles_cs;action=display;num=1316188034
The idea is to avoid repeating a border point of an interval in subsequent steps, and instead to spend one guess per halving step by mirroring points.
The smart idea is that when we want to decide which half of the current interval is hotter, we don't have to probe its borders; we can just as well probe points located outside them. The two probe points only need to be at equal distances from (in other words, mirrored around) the center of the interval of interest. For example, for the interval [3, 7] with last guess 10, the mirrored guess is 3 + 7 - 10 = 0: the pair (10, 0) is centered on the midpoint 5. Nothing bad happens even if these probe points are negative (i.e. < 0) or greater than N (this is not prohibited by the conditions of the task).
So to guess the secret we can freely make guesses using points in the interval [-N, 2N].

public class HotOrCold {
    static int num = 5000;       // the secret number (known only to the oracle below)
    private static int prev = 0; // the previous guess, remembered by the oracle

    public static void main(String[] args) {
        System.out.println(guess(1, Integer.MAX_VALUE));
    }

    // Probes both ends of the current interval each round, so this is the
    // ~2 lg N strategy rather than the lg N + O(1) one.
    public static int guess(int lo, int hi) {
        while (hi >= lo) {
            if (hi == lo) {
                return lo;
            }
            boolean loHotter = isHot(lo); // lo compared with the previous guess
            boolean hiHotter = isHot(hi); // hi compared with lo
            if (!(loHotter || hiHotter)) {
                return lo + (hi - lo) / 2; // equidistant: the secret is the midpoint
            }
            if (hiHotter) {
                lo = hi - (hi - lo) / 2;   // hi is hotter than lo, so drop the lower half
            } else {
                hi = lo + (hi - lo) / 2;   // lo is hotter, so drop the upper half
            }
        }
        return 0; // unreachable for a valid range
    }

    // The oracle: reports whether curr is closer to the secret than the previous guess.
    public static boolean isHot(int curr) {
        boolean hot = Math.abs(num - curr) < Math.abs(num - prev);
        prev = curr;
        return hot;
    }
}

Related

Search a word in a matrix runtime complexity

Trying to analyze the runtime complexity of the following algorithm:
Problem: We have an m * n array A consisting of lower-case letters and a target string s. The goal is to examine whether the target string appears in A or not.
algorithm:
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        if (A[i][j] is equal to the starting character of s) search(i, j, s)
    }
}

boolean search(int i, int j, target s) {
    if (the current position relative to s is the length of s) then we have found the target
    loop through the four possible directions starting from (i, j): {p, q} = {i+1, j} or {i-1, j} or {i, j+1} or {i, j-1}; if the coordinate has never been visited before:
        search(p, q, target s)
}
One runtime complexity analysis that I read is the following:
At each position in the array A, we are first presented with 4 possible directions to explore. After the first round, we are only given 3 possible choices because we can never go back. So the worst runtime complexity is O(m * n * 3**len(s))
However, I disagree with this analysis: even though we are only presented with 3 possible choices each round, we still need to spend one operation to check whether that direction has been visited before. For instance, in Java you would probably use a boolean array to track whether a spot has been visited, so knowing whether a spot has been visited requires a conditional check, and that costs one operation. The analysis I mentioned does not seem to take this into account.
What should be the runtime complexity?
update:
Let us suppose that the length of the target string is l and the runtime at a given position in the matrix is T(l). The first call branches 4 ways and each deeper call branches 3 ways, so with T'(l) = 3 T'(l-1) + 4 for the deeper levels we have:
T(l) = 4 T'(l-1) + 4 = 4 (3 T'(l-2) + 4) + 4 = 4 (3 (3 T'(l-3) + 4) + 4) + 4 = ... = 4 * 3**(l-1) + 4 + 4*4 + 4*3*4 + ...
The +4 comes from looping through four directions in each round, besides the (up to) three recursive calls.
What should be the runtime complexity?
The mentioned analysis is correct and the complexity is indeed O(m * n * 3**len(s)).
For instance, in java you probably just use a boolean array to track whether one spot has been visited before, so in order to know whether a spot has been visited or not, one needs a conditional check, and that costs one operation.
That is correct and does not contradict the analysis.
The worst case we can construct is a matrix filled with a single letter a and the string aaaa....aaaax (many letters a and one x at the end). If m, n and len(s) are large enough, almost every call of the search function generates 3 recursive calls of itself. Each of those calls generates another 3 calls (which gives us 9 calls of depth 2), each of which generates another 3 calls (27 calls of depth 3), and so on. Checking the current string character, the conditional checks, and spawning a recursive call are all O(1), so the complexity of the whole search function is O(3**len(s)).
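To see the 3**len(s) growth concretely, here is a small hypothetical Python experiment (the board construction and the call counter are mine): it counts search invocations on an all-'a' board for the worst-case word 'a'*L + 'x', and the count grows by roughly a factor of 3 per extra letter.

def calls_on_worst_case(rows, cols, word_len):
    # count dfs invocations on an all-'a' board for the word 'a'*word_len + 'x'
    word = 'a' * word_len + 'x'
    calls = 0

    def dfs(r, c, i, visited):
        nonlocal calls
        calls += 1
        if i == len(word):
            return True
        if not (0 <= r < rows and 0 <= c < cols) or (r, c) in visited:
            return False
        if word[i] != 'a':   # the board holds only 'a', so the final 'x' never matches
            return False
        visited.add((r, c))
        found = any(dfs(r + dr, c + dc, i + 1, visited)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
        visited.discard((r, c))
        return found

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, 0, set())
    return calls

for L in range(1, 6):
    print(L, calls_on_worst_case(5, 5, L))  # the ratio between rows approaches 3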
The solution is brute force. We have to touch each point on the board. That makes O(m*n) operations.
Now for each point, we have to run dfs() to check if the word exists. So we get
O(m * n * timeComplexityOfDfs)
This is a dfs written in Python. Let's examine the time complexity:
# board, word, ROWS, COLS and the set `path` are defined in the enclosing scope
def dfs(r, c, i):
    # O(1)
    if i == len(word):
        return True
    # O(1): a set is implemented as a hash table,
    # so the membership test on a set is O(1)
    if r < 0 or c < 0 or r >= ROWS or c >= COLS or word[i] != board[r][c] or (r, c) in path:
        return False
    # O(1)
    path.add((r, c))
    res = (dfs(r + 1, c, i + 1) or
           dfs(r - 1, c, i + 1) or
           dfs(r, c + 1, i + 1) or
           dfs(r, c - 1, i + 1))
    # O(1)
    path.remove((r, c))
    return res
Since dfs recursively calls itself, think about how many dfs calls can be on the call stack: in the worst case it will be the length of the word. That's why
O(m * n * word.length)

How can I solve this coding problem efficiently which involves the 'modulo' operation?

We are given an integer N. We can choose any 2 numbers a and b in the range 1 to z. The value of L is given by
L = Max(( (N%a) %b) %N)
We have to calculate the number of pairs (a, b) which give the value L.
I know the brute-force O(n^2) solution.
Is there any more efficient way to solve this problem?!
The only way I can decipher Max(( (N%a) %b) %N) is that the max is taken over all a, b pairs. If I am wrong, please disregard the rest.
In case z > N/2:
First, observe that if both a and b are greater than N, then (N%a) % b yields N, so ((N%a) %b) %N yields 0, which is unsatisfyingly small. Therefore at least one of them must be less than N.
Second, observe (better yet, prove) that the maximal value of N % a is achieved when a is N/2 + 1 for even N, and (N + 1)/2 for odd N (important note: it is half of the next multiple of 2 after N). Call it the maximizer.
Finally, observe that any b greater than that residue leaves it untouched. Prove that this is indeed the desired maximum.
Now you have enough facts to come up with effectively a one-line program (don't forget the a > N, b = maximizer case).
The same logic works for z < N/2. Finding the maximizer is a bit trickier, but still possible in O(1) (see the important note above).
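As a sanity check on these observations, here is a small brute-force sketch in Python (assuming the max is taken over all pairs (a, b) in [1, z], and taking z = N; the helper name is mine). For z > N/2 the maximum always equals N % (N//2 + 1):

def max_L_bruteforce(N, z):
    # try every pair (a, b) in [1, z]
    return max(((N % a) % b) % N
               for a in range(1, z + 1)
               for b in range(1, z + 1))

# a = N//2 + 1 maximizes N % a; any b greater than that residue,
# and the final % N, leave the value untouched.
for N in range(2, 60):
    assert max_L_bruteforce(N, N) == N % (N // 2 + 1)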

Stuck at Algorithm pseudocode generation

I do not know what to do next (or even whether my approach is correct) in the following problem:
Part 1
Part 2
I have just figured out that a possible MNT (for part a) is to get a jar, test whether it breaks from height h; if so, that's the answer; if not, increase the height by 1 and keep looping.
For part b, my approach is the following. Since we know the max height equals n, we start from n (current height = n). We go from top to bottom, adding to our broken-jar count (the jars are supposed to break if you start from the top), until the jars stop breaking. The answer would then be current height + 1 (because we need to go back one step).
For part c, I don't even know what my approach would be, since I am assuming that the order of the algorithm is O(n^c) where c is a fraction. I also know that O(n^c) is faster than O(n).
I also noted that there is a problem similar to this one online, but it talks about rungs instead of a robotic arm. Maybe it is similar? Here is the link
Do you have any recommendations/clues? Any help will be appreciated.
Thank you for your time and help in advance.
Cheers!
This is an answer for part (c).
The idea is to find some number k and apply the following scheme:
Drop a jar from height k:
If it breaks, drop the other one from k-1 down to 1 until we find the height at which it breaks, for no more than k tries in total.
If it doesn't break, drop it again from height k + (k-1). Again, if it breaks, drop the other one from k+(k-1)-1 down to k+1; otherwise continue to k + (k-1) + (k-2).
Continue this until you find the height
(of course if at some point you need to jump to a height greater than n, you just jump to n).
This scheme ensures we'll use at most k tries. So now the question is how to find a minimal k (as a function of n) for which the scheme will work. Since, at every step, we reduce our height advancement by 1, the following inequality must hold:
k + (k-1) + (k-2) + ... + 1 >= n
Otherwise we'll "run out" of steps before reaching n. We want to find the smallest k for which the inequality holds.
There's a formula for the sum:
1 + 2 + ... + k = k(k+1)/2
Using that we get the equation:
k(k+1)/2 = n ===> k^2 + k - 2n = 0
Solving this (and taking the ceiling if it isn't integral) will give us k. A quadratic equation may have two solutions, but ignoring the negative one you get:
k = (-1 + sqrt(1 + 8n))/2
Looking at the complexity, we can ignore everything but the n, which has an exponent of 1/2 (since we're taking its square root). That is actually better than the requested complexity of n to the power of 2/3.
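A minimal Python sketch of that computation (the function name is mine):

import math

def min_tries(n):
    # smallest k with k(k+1)/2 >= n, from k = (-1 + sqrt(1 + 8n)) / 2
    return math.ceil((-1 + math.sqrt(1 + 8 * n)) / 2)

print(min_tries(100))  # 14, since 14*15/2 = 105 >= 100 but 13*14/2 = 91 < 100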
For part (a) you can use binary search over the height. Pseudocode for it is below; note that the midpoint must round up, otherwise the loop never terminates once hi = lo + 1:
lo = 0
hi = n
while (lo < hi) {
    mid = lo + (hi - lo + 1) / 2;
    if (glass_breaks(mid)) {
        hi = mid - 1;
    } else {
        lo = mid;
    }
}
'lo' will contain the maximum safe height, found in the minimum possible number of trials. It takes log(n) steps in the worst case, whereas your approach may take n steps in the worst case.
For part (b),
you can use your approach from (a): start from the minimum height and increase the height by 1 until the glass breaks. This breaks at most 1 glass to determine the required height.

Number of Positive Solutions to a1 x1 + a2 x2 + ... + an xn = k (k <= 10^18)

The question is: find the number of solutions to a1 x1 + a2 x2 + ... + an xn = k with the constraints: 1) ai > 0 and ai <= 15; 2) n > 0 and n <= 15; 3) xi >= 0. I was able to formulate a dynamic programming solution, but it runs too long for k > 10^10. Please guide me to a more efficient solution.
The code
// dp[j] = number of ways to make the amount (seen + j), as a sliding window of 16 offsets;
// arr[0] holds the number of coefficients, arr[1..arr[0]] the coefficients themselves.
int dp[] = new int[16];
dp[0] = 1;
BigInteger seen = new BigInteger("0");
while (true) {
    if (dp[0] != 0) {
        for (int i = 0; i < arr[0]; i++) {
            dp[arr[i + 1]] = (dp[arr[i + 1]] + dp[0]) % 1000000007;
        }
    }
    // shift the window forward by one
    for (int i = 1; i < 16; i++) {
        dp[i - 1] = dp[i];
    }
    dp[15] = 0;
    seen = seen.add(BigInteger.ONE);
    if (seen.compareTo(n) == 0) {
        break;
    }
}
System.out.println(dp[0]);
arr is the array containing the coefficients, and the answer should be taken mod 1000000007, as the number of ways does not fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ... , Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that 1. for all i (1 <= i <= r), ai is one of the Lj, and 2. a1 + ... + ar = n. (In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n, you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest case is when only one L is given: then evidently there is one admissible sequence if n is a multiple of L, and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest case, two given L values. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1); thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course: there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of admissible sequences summing to n and ending with 1 is N(n-1), and the number of those summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
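For the curious, the standard O(log n) technique is exponentiation of the recurrence's companion matrix by repeated squaring. Here is a self-contained Python sketch of that textbook method (all names are mine, and this is not necessarily the exact solution the answer has in mind):

MOD = 1_000_000_007

def mat_mul(A, B):
    # multiply two square matrices modulo MOD
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % MOD
             for j in range(n)] for i in range(n)]

def mat_pow(A, e):
    # fast exponentiation: A**e in O(log e) multiplications
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        e >>= 1
    return R

def count_compositions(parts, n):
    # number of compositions of n using the distinct parts L1..Lk, mod MOD,
    # via the recurrence N(n) = N(n-L1) + ... + N(n-Lk)
    m = max(parts)
    # base values N(0..m-1) by direct dynamic programming
    N = [0] * m
    N[0] = 1
    for t in range(1, m):
        N[t] = sum(N[t - L] for L in parts if L <= t) % MOD
    if n < m:
        return N[n]
    # companion matrix: row 0 holds the recurrence coefficients
    A = [[0] * m for _ in range(m)]
    for L in parts:
        A[0][L - 1] = 1
    for i in range(1, m):
        A[i][i - 1] = 1
    P = mat_pow(A, n - m + 1)
    # the state vector is (N(m-1), N(m-2), ..., N(0))
    return sum(P[0][j] * N[m - 1 - j] for j in range(m)) % MOD

print(count_compositions([1, 2], 5))  # 8, matching the worked example above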
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least not before I've googled around a bit to check whether it's a contest question). I hope no restriction has been omitted from the description, such as that permutations of such representations should contribute only one to the count; that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m=1}^{15} C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, and 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6; d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4}; and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2, and d_3 is periodic with period 6. Generally, similar reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
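In fact, the period divides lcm(1, ..., B), which is far smaller than B! (divisibility by each a <= B repeats with period a). A quick Python spot check (helper names mine):

from functools import reduce
from math import lcm

B = 15
P = reduce(lcm, range(1, B + 1))  # 360360, a period of d_B
print(P)

def d(k):
    # number of divisors of k not exceeding B
    return sum(k % a == 0 for a in range(1, B + 1))

assert all(d(k) == d(k + P) for k in range(1, 5000))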
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.

Quadratic testing in hash tables

During an assignment, I was asked to show that in a hash table of size m (m > 3, m prime) that is less than half full and uses quadratic probing (hash(k, i) = (h(k) + i^2) mod m), we will always find a free spot.
I've checked and arrived at the conclusion that the spots probed (when h(k) = 0) are 0 mod m, 1 mod m, 4 mod m, 9 mod m, ...
My problem is that I can't figure out a way to show that we will always find a free spot. I've tested it myself with different values of m, and have also convinced myself that if the hash table is more than half full, we might never find a free spot.
Can anyone please hint me towards a way to solve this?
Thanks!
0, 1, 4, ..., ((m-1)/2)^2 are all distinct mod m. Why?
Suppose two numbers from that range, i^2 and j^2, are equivalent mod m.
Then i^2 - j^2 = (i-j)(i+j) = 0 (mod m). Since m is prime, m must divide one of those factors. It cannot divide (i+j): for distinct i, j in that range, 0 < i+j < m. And since |i-j| < m, m dividing (i-j) forces i - j = 0, that is, i = j.
Since i starts at 0, that gives (m+1)/2 distinct probed slots, which is more than half the table. If fewer than m/2 of them are filled, at least one probed slot remains open.
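A quick empirical check of the distinctness claim in Python (a sketch assuming c(i) = i^2 and h(k) = 0):

def probe_offsets(m):
    # quadratic probe offsets i^2 mod m for i = 0 .. (m-1)//2
    return [i * i % m for i in range((m - 1) // 2 + 1)]

for m in (5, 7, 11, 13, 23):                  # small primes
    offsets = probe_offsets(m)
    assert len(set(offsets)) == len(offsets)  # all (m+1)/2 offsets are distinct
    print(m, sorted(offsets))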
Let's break the proof down.
Setup
First, some background.
With a hash table, we define a probe sequence P. For any item q, following P will eventually lead to the right item in the hash table. The probe sequence is just a series of functions {h_0, ..., h_{M-1}} where h_i is a hash function.
To insert an item q into the table, we look at h_0(q), h_1(q), and so on, until we find an empty spot. To find q later, we examine the same sequence of locations.
In general, the probe sequence is of the form h_i(q) = [h(q) + c(i)] mod M, for a hash table of size M, where M is a prime number. The function c(i) is the collision-resolution strategy, which must have two properties:
First, c(0) = 0. This means that the first probe in the sequence must be equal to just performing the hash.
Second, the values {c(0) mod M, ..., c(M-1) mod M} must contain every integer between 0 and M-1. This means that if you keep trying to find empty spots, the probe sequence will eventually probe every array position.
Applying quadratic probing
Okay, we've got the setup of how the hash table works. Let's look at quadratic probing. This just means that for our c(i) we're using a general quadratic equation of the form ai^2 + bi + c, though for most implementations you'll usually just see c(i) = i^2 (that is, a = 1 and b, c = 0).
Does quadratic probing meet the two properties we talked about before? Well, it's certainly true that c(0) = 0 here, since (0)^2 is indeed 0, so it meets the first property. What about the second property?
It turns out that in general, the answer is no.
Theorem. When quadratic probing is used in a hash table of size M, where M is a prime number, only the first floor[M/2] probes in the probe sequence are distinct.
Let's see why this is the case, using a proof by contradiction.
1. Say that the theorem is wrong. Then there are two values a and b, with 0 <= a < b < floor[M/2], that probe the same position.
2. h_a(q) and h_b(q) must probe the same position, by (1), so h_a(q) = h_b(q).
3. h_a(q) = h_b(q) ==> h(q) + c(a) = h(q) + c(b), mod M.
4. The h(q) on both sides cancel. Our c(i) is just c(i) = i^2, so we have a^2 = b^2, mod M.
5. Rearranging (4) gives us a^2 - b^2 = 0, mod M. This is a difference of two squares, so it factors as (a - b)(a + b) = 0, mod M.
6. But remember, we said M was a prime number. The only way that (a - b)(a + b) can be zero mod M is if [case I] (a - b) is zero mod M, or [case II] (a + b) is zero mod M.
7. Case I can't be right, because we said that a != b, and |a - b| < M, so a - b cannot be a nonzero multiple of M, and it isn't zero.
8. The only way for (a + b) to be zero mod M is for a + b to be zero or a multiple of M. It can't be zero, since b > a >= 0. And since both are less than floor[M/2], their sum must be less than M. So case II can't be right either.
Thus, if the theorem were wrong, one of two quantities would have to be zero, neither of which can possibly be zero -- a contradiction! QED: quadratic probing produces only floor[M/2] distinct probes when the table size is a prime number, so it doesn't satisfy property two, and insertions may fail once your table is more than half full. The proof is complete!
From Wikipedia:
For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0,(m − 1) / 2]. Such choices include c1 = c2 = 1/2, c1 = c2 = 1, and c1 = 0,c2 = 1. Because there are only about m/2 distinct probes for a given element, it is difficult to guarantee that insertions will succeed when the load factor is > 1/2.
See the quadratic probing section in Data Structures and Algorithms with Object-Oriented Design Patterns in C++ for a proof that m/2 elements are distinct when m is prime.
