This is a 2016 entrance exam question:
We have n balls with distinct and unknown weights, labelled 1 to n. We are given a two-pan balance and want to use it to weigh these balls in pairs and write the results down on paper in order to sort all of these balls. In the worst case, how many weighing operations are needed? Choose the best answer.
a) Ceil[ n log2 n ]
b) Floor[ n log2 n ]
c) n − 1
d) Ceil[ log2 n! ]
According to the answer sheet, the correct solution is: Ceil[ log2 n! ]
My question is: how is this solution achieved (how does this algorithm work, is there any pseudocode)?
If you look at Number of Comparisons in Merge-Sort you will find my answer there arguing that the total number of comparisons for mergesort (which is known to have good asymptotic behavior) is
n ⌈log2 n⌉ − 2^⌈log2 n⌉ + 1
Since 2^⌈log2 n⌉ ≥ n, this count is at most n ⌈log2 n⌉ − n + 1, which for n ≥ 1 can be shown to be at most ⌈n log2 n⌉, so this confirms answer (a) as an upper bound.
Is (b) a tighter upper bound? If you write ⌈log2 n⌉ = log2 n + d for some 0 ≤ d < 1 then you get
n (log2 n + d) − 2^d n + 1 = n (log2 n + d − 2^d) + 1 = (n log2 n) + n (d − 2^d + 1/n)
and if you write m := ⌈log2 n⌉ and n = 2^(m − d) that last parenthesis becomes (d − 2^d + 2^(d − m)).
Plotting this for some values of m shows that for integers m ≥ 1 this will very likely be negative. You get m = 0 for n = 1, which means d = 0, so the whole parenthesis becomes zero. So once you work out the details of the proof, this will show that (b) is indeed an upper bound for mergesort.
How about (c)? There is an easy counterexample for n = 3. If you know that ball 1 is lighter than 2 and lighter than 3, this doesn't tell you how to sort 2 and 3. And you can't blame this on a suboptimal choice of comparisons: due to the symmetry of the problem, comparing 1 to both 2 and 3 is a generic situation. So (c) is not an upper bound. Can it be a lower bound? Sure: even to confirm that the balls are already ordered you have to weigh each consecutive pair, resulting in n − 1 comparisons. Even with the best algorithm you can't do better than guessing the correct order and then confirming your guess.
Is (d) a tighter lower bound? Plots again suggest that it is at least as great as (c), with the exception of a small region that contains no integers. So if it is a lower bound, it will be a tighter one. Now think of a decision tree. Every algorithm to order these n balls can be written as a binary decision tree: you compare the two balls named in a given node, and depending on the result of the comparison you proceed with one of two possible next steps. That decision tree has to have n! leaves, since every permutation has to be a distinct leaf so that you know the exact permutation once you have reached a leaf. And a binary tree with n! leaves has to have a depth of at least ⌈log2 n!⌉. So yes, this is a lower bound as well.
Summarizing all of this you have (c) ≤ (d) ≤ x ≤ (b) ≤ (a), where x denotes the number of comparisons an optimal algorithm would need to order all the balls. As a comment by Mark Dickinson pointed out, A036604 on OEIS gives explicit lower bounds for some few n, and for n = 12 the inequality (d) < x is strict. So (d) does not describe the optimal algorithm exactly either.
By the way (and to answer your "how does this algorithm work"), finding the optimal algorithm for a given n is fairly easy, at least in theory: compute all possible decision trees for those n! orderings, and choose one with minimal depth. Of course this approach becomes impractical fairly quickly.
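(Not part of the original answer.) Here is one way to carry out that exhaustive search, as a sketch in Python: a memoized minimax over the sets of permutations that are still consistent with the answers so far. The last column printed is answer (d), Ceil[ log2 n! ], for comparison.

from functools import lru_cache
from itertools import combinations, permutations
from math import ceil, factorial, log2

def optimal_worst_case_comparisons(n):
    # A "state" is the set of permutations still consistent with the weighings so far;
    # its value is the depth of the best decision tree that distinguishes all of them.
    items = tuple(range(n))

    @lru_cache(maxsize=None)
    def solve(perms):
        if len(perms) <= 1:
            return 0  # the order is fully determined
        best = None
        for a, b in combinations(items, 2):
            below = frozenset(p for p in perms if p.index(a) < p.index(b))
            above = perms - below
            if not below or not above:
                continue  # this weighing yields no new information
            cost = 1 + max(solve(below), solve(above))
            if best is None or cost < best:
                best = cost
        return best

    return solve(frozenset(permutations(items)))

for n in range(1, 6):
    print(n, optimal_worst_case_comparisons(n), ceil(log2(factorial(n))))

Already for n = 6 or 7 the number of reachable states becomes very large, which is the impracticality mentioned above.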
Now that we know that none of the answers gives the exact count for the optimal sorting algorithm, which answer is "best"? That depends a lot on context. In many applications, knowing an upper bound on the worst-case behavior is more valuable than knowing a lower bound, so (b) would be superior to (d). But apparently the person creating the solution sheet had a different opinion, and went for (d), either because it is closer to the optimum (which I assume but have not proven) or because a lower bound is more useful to the application at hand. If you wanted to, you could likely challenge the whole question on the grounds that "best" wasn't adequately defined in the scope of the question.
I have been stuck on this exercise from my professor for two days now:
"Consider the ordered binary tree over a set S ⊆ ℕ, built by repeated insertion, where the elements of S are inserted by a permutation picked uniformly at random. Prove that the height of the tree is O(logn) with high propability."
My work so far has been to study the probabilistic analysis of randomized algorithms. For example, the CLRS book has a chapter "12.4 Randomly built binary search trees" where it is proven that the expected height of a binary search tree built by repeated insertion of a random permutation is O(log n). Many other books prove this bound. But this is not what we are looking for. We want to prove a much stronger statement: that the height is O(log n) with high probability. I've studied the classic paper "A Note on the Height of Binary Search Trees" (Luc Devroye, 1986), where he proves that the height is ≈ 4.31107... log n with high probability. But the analysis is way out of my league; I couldn't understand the logic of key points in the paper.
Every book and article I've seen cites Devroye's paper and says "it can also be proven that the height is O(log n) with high probability".
How should I proceed further?
Thanks in advance.
I will outline my best idea based on well-known probability results. You will need to add details and make it rigorous.
First let's consider the process of descending through pivots to a random node in a binary search tree. Suppose that your random node is known to be somewhere in a range of m values, say positions i to i+m-1. At the next step which adds to the length of the path, the pivot becomes the j-th smallest value in that range, for some j. With probability (j-1)/m our random node is below the pivot and is now in a range of length j-1. With probability 1/m it was the pivot itself. And with probability (m-j)/m it was above the pivot and is now in a range of length m-j. Within those ranges, the unknown node is still uniformly distributed.
The obvious continuous approximation drops the discreteness. We pick a random real number x from 0 to m to be the next pivot. With probability x/m we end up in a range of size x, and with probability (m-x)/m we end up in a range of size m-x. We have therefore shrunk our range by a random factor X that is either x/m or (m-x)/m. The distribution of X is known. And the successive factors we sample in the continuous approximation are independent.
One more note: log(X) has both an expected value E and a variance V that can be calculated. Since X is always between 0 and 1, E is negative.
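(Not part of the original outline.) A quick Monte Carlo sanity check of E and V in Python, sampling X directly from the construction described above and using the natural logarithm; the values in the final comment are simply what the simulation suggests.

import math, random

# Pick a pivot position u uniformly in (0, 1); with probability u the random node
# falls in the part of size u, otherwise in the part of size 1 - u.
def sample_log_x():
    u = random.random()
    x = u if random.random() < u else 1.0 - u
    return math.log(x)

samples = [sample_log_x() for _ in range(1_000_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"E ~ {mean:.3f}, V ~ {var:.3f}")  # comes out around -0.5 and 0.25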
Now pick ε with 0 < ε. The outline of the proof is as follows.
Show that at each step, the expected error from the discrete to the continuous approximation increases by at most O(1).
Show that the probability that the sum of (ε - 1/E) log(n) samples of log(X) fails to be below -log(n) is O(1/n).
Show that the probability that a random node is at depth (2ε - 1/E) log(n) or more is O(1/n).
Show that the probability that a random permutation has ANY node at depth (3ε - 1/E) log(n) or more is O(1/log(n)).
Let's go.
1. Show that at each step, the error from the discrete to the continuous approximation increases by at most O(1).
Any errors carried over from the previous step shrink by a random factor, so they do not increase in the next step. The two roundoffs in each step are each at most 1. So the error increases by at most 2 per step.
2. Show that the probability that the sum of (ε - 1/E) log(n) samples of log(X) fails to be below -log(n) is O(1/n).
The expected value of the sum of (ε - 1/E) log(n) samples of log(X) is (ε E - 1) log(n). Since E is negative, this is below our target of -log(n). Using the Bernstein inequalities we can put a bound on the probability of the sum being that far above its mean; the bound is exponentially small in the number of samples. Since we have Θ(log(n)) samples, this works out to O(1/n).
3. Show that the probability that a random node is at depth (2ε - 1/E) log(n) or more is O(1/n).
With probability 1 - O(1/n), in (ε - 1/E) log(n) steps the continuous approximation has converged to a range of size at most 1. There were O(log(n)) steps, and therefore the error between continuous and discrete is at most O(log(n)). (Look back at step 1 to see that.) So we just have to show that the odds of failing to go from O(log(n)) possibilities down to 1 in another ε log(n) steps is at most O(1/n).
This would be implied if we could show that, for any given constant a, the odds of failing to go from a·k possibilities down to 1 in at most k steps is a negative exponential in k. (Here k is ε log(n).)
For that, let's record a 1 every time the next pivot cuts the remaining search space at least in half, and a 0 otherwise; note that each time, with odds at least 1/2, you cut the search space by at least 1/2. In k steps there are 2^k possible sequences of 0s and 1s. But for any given a, if k is large enough, then you can't cut the space in half more than k/4 times without reducing the search space to 1. A little playing around with the binomial formula and Stirling's approximation gives an upper limit on the likelihood of failing to halve often enough, of the form O(k (3/4)^k), which is sufficient for our purposes.
4. Show that the probability that a random permutation has ANY node at depth at least (3ε - 1/E) log(n) is O(1/log(n)).
By step 3, the proportion of random nodes in random binary trees that are at depth at least (2ε - 1/E) log(n) is at most p/n for some constant p.
Any tree with a node at depth (3ε - 1/E) log(n) has n nodes, and at least ε log(n) of them (the ancestors along that path) are at depth at least (2ε - 1/E) log(n). If the odds of having a node at depth (3ε - 1/E) log(n) exceeded p / (ε log(n)), then those trees alone would contain too many nodes at depth at least (2ε - 1/E) log(n). Therefore, by a pigeonhole (averaging) argument, we have our upper bound on the likelihood.
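(Not part of the original answer.) To get a feel for the concentration the outline is arguing for, here is a small Python simulation: it builds binary search trees by inserting random permutations and reports the largest height seen, together with its ratio to log n.

import math, random

def random_bst_height(n):
    # Height (number of edges on the longest root-to-leaf path) of the BST obtained
    # by inserting a uniformly random permutation of 0..n-1, built iteratively.
    keys = list(range(n))
    random.shuffle(keys)
    children = {keys[0]: [None, None]}  # key -> [left child, right child]
    root, height = keys[0], 0
    for k in keys[1:]:
        node, depth = root, 1
        while True:
            side = 0 if k < node else 1
            nxt = children[node][side]
            if nxt is None:
                children[node][side] = k
                children[k] = [None, None]
                height = max(height, depth)
                break
            node, depth = nxt, depth + 1
    return height

for n in (10**3, 10**4, 10**5):
    worst = max(random_bst_height(n) for _ in range(10))
    print(n, worst, round(worst / math.log(n), 2))  # the ratio stays bounded as n grows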
Say we have an algorithm that needs to list all possibilities of choosing k elements from n elements (k <= n). Is the time complexity of this algorithm exponential, and why?
No.
There are n choose k = n!/(k!(n-k)!) possibilities [1].
Consider that n choose k <= n^k / k! [2].
Assuming you keep k constant, as n grows, the number of possibilities grows polynomially in n.
For this example, ignore the 1/k! factor because it is constant. If k = 2 and you increase n from 2 to 3, the count goes from 2^2 to 3^2. An exponential change would be going from 2^2 to 2^3. This is not the same.
Keeping k constant and varying n results in O(n^k) possibilities (the 1/k! factor is a constant and can be ignored).
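(My addition, not part of the original answer.) A tiny Python check of that growth rate, using the exact binomial coefficient:

from math import comb

k = 3
for n in (10, 20, 40, 80):
    # For fixed k, C(n, k) <= n**k / k!, so doubling n multiplies the count by a
    # factor approaching 2**k = 8 here -- polynomial, not exponential, growth.
    print(n, comb(n, k), n**k // 6)  # 6 = 3!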
Thinking carefully about the size of the input instance is required since the input instance contains numbers - a basic familiarity with weak NP-hardness can also be helpful.
Assume that we fix k=1 and encode n in binary. Since the algorithm must visit n choose 1 = n numbers, it takes at least n steps. Since the magnitude of the number n may be exponential in the size of the input (the number of bits used to encode n), the algorithm in the worst case consumes exponential time.
You can get a feel for this exponential-time behavior by writing a simple C program that prints all the numbers from 1 to n with n = 2^64 and seeing how far you get in a minute. While the input is only 64 bits long, it would take you about 600,000 years to print all the numbers, assuming that your device can print a million numbers per second.
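(My addition.) The arithmetic behind that estimate, in Python, assuming 10^6 numbers printed per second:

seconds = 2**64 / 10**6
print(seconds / (60 * 60 * 24 * 365))  # roughly 5.8e5, i.e. about 585,000 years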
An algorithm that lists all possibilities of choosing k elements from n unique elements (k <= n) does NOT have an exponential time complexity O(k^n); the number of possibilities is given by the binomial coefficient, which is bounded by the factorial, O(n!). The relevant formula is:
p = n!/(k!(n-k)!)
I know that a certain algorithm I am using does 2Nk - 4k^2 operations, with parameters N and k. Now, the first derivative of that function with respect to k is 2N - 8k (I know that N and k can only be positive integers here, but bear with me). That derivative is positive when k < N/4 and negative when k > N/4. So the operation count actually decreases if we increase k past a certain point. How do I express this in big O notation? Also note that k <= (N - 1)/2, so there is an upper bound on k.
NOTE:
I know that a similar question has been asked here, but it does not consider the case where the first derivative changes sign if one of the variables reaches a certain point.
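(Not part of the original question.) A quick numeric check, in Python, of where 2Nk - 4k^2 peaks over the allowed range of k:

N = 1000
best_k = max(range(1, (N - 1) // 2 + 1), key=lambda k: 2 * N * k - 4 * k * k)
print(best_k, 2 * N * best_k - 4 * best_k**2, N * N // 4)  # peak near k = N/4, value about N^2/4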
Today I was reading a great article from Julienne Walker about sorting - Eternally Confuzzled - The Art of Sorting - and one thing caught my eye. I don't quite understand the part where the author proves that for sorting by comparison we are limited by the Ω(N·log N) lower bound:
Lower bounds aren't as obvious. The lowest possible bound for most sorting algorithms is Ω(N·log N). This is because most sorting algorithms use item comparisons to determine the relative order of items. Any algorithm that sorts by comparison will have a minimum lower bound of Ω(N·log N) because a comparison tree is used to select a permutation that's sorted. A comparison tree for the three numbers 1, 2, and 3 can be easily constructed:
1 < 2
1 < 3 1 < 3
2 < 3 3,1,2 2,1,3 2 < 3
1,2,3 1,3,2 2,3,1 3,2,1
Notice how every item is compared with every other item, and that each path results in a valid permutation of the three items. The height of the tree determines the lower bound of the sorting algorithm. Because there must be as many leaves as there are permutations for the algorithm to be correct, the smallest possible height of the comparison tree is log N!, which is equivalent to Ω(N·log N).
It seems very reasonable until the last part (bolded), which I don't quite understand: how is log N! equivalent to Ω(N·log N)? I must be missing something from my CompSci courses and can't get the last transition. I'm looking forward to help with this, or to a link to some other proof that we are limited to Ω(N·log N) if we sort by comparison.
You didn't miss anything from CompSci class. What you missed was math class. The Wikipedia page for Stirling's Approximation shows that log n! is asymptotically n log n + lower order terms.
N! < N^N
∴ log N! < log (N^N)
∴ log N! <N * log N
With this, you can prove log N! = O(N log N). Proving the corresponding Ω bound is left as an exercise for the reader, or a question for Mathematics Stack Exchange or Theoretical Computer Science Stack Exchange.
My favorite proof of this is very elementary.
N! = 1 * 2 * ... * (N - 1) * N
We can get a very easy lower bound by pretending the first half of those factors don't exist, and then pretending that the factors in the second half are all just N/2.
(N/2)^(N/2) <= N!
log((N/2)^(N/2)) = (N/2) * log(N/2) = (N/2) * (log(N) - 1) = Ω(N log N)
So even when you keep only the second half of the product, and pretend that each of those factors is just N/2 (which is no bigger than any of them), you still get an Ω(N log N) lower bound, and this is super elementary. I could convince an average high school student of this. I can't even derive Stirling's formula by myself.
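(My addition.) A small Python check that log2(N!) really does sit between the elementary (N/2)·log2(N/2) lower bound above and the N·log2(N) upper bound from the previous answer:

from math import log2

for N in (8, 64, 1024):
    log2_factorial = sum(log2(i) for i in range(2, N + 1))  # log2(N!) as a sum of logs
    print(N, round((N / 2) * log2(N / 2)), round(log2_factorial), round(N * log2(N)))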
I was just answering a question about different approaches for picking the pivot in a quicksort implementation and came up with a question that I honestly don't know how to answer. It's a bit math-heavy, and this may be the wrong site on which to ask this, so if this needs to move please let me know and I'll gladly migrate it elsewhere.
It's well-known that a quicksort implementation that picks its pivots uniformly at random will end up running in expected O(n lg n) time (there's a nice proof of this on Wikipedia). However, due to the cost of generating random numbers, many quicksort implementations don't pick pivots randomly, but instead rely on a "median-of-three" approach in which three elements are chosen deterministically and the median of those is chosen as the pivot. This is known to degenerate to O(n^2) in the worst case (see this great paper on how to generate those worst-case inputs, for example).
Now, suppose that we combine these two approaches by picking three random elements from the sequence and using their median as the choice of pivot. I know that this also guarantees O(n lg n) average-case runtime using a slightly different proof than the one for the regular randomized quicksort. However, I have no idea what the constant factor in front of the n lg n term is in this particular quicksort implementation. For regular randomized quicksort, Wikipedia lists the expected number of comparisons as about 1.39 n lg n (using lg as the binary logarithm).
My question is this: does anyone know of a way to derive the constant factor for the number of comparisons made using a "median-of-three" randomized quicksort? If we go even more generally, is there an expression for the constant factor on quicksort using a randomized median-of-k approach? I'm curious because I think it would be fascinating to see if there is some "sweet spot" of this approach that makes fewer comparisons than other randomized quicksort implementations. I mean, wouldn't it be cool to be able to say that randomized quicksort with a randomized median-of-six pivot choice makes the fewest comparisons? Or be able to conclusively say that you should just pick a pivot element at random?
Here's a heuristic derivation of the constant. I think it can be made rigorous, with a lot more effort.
Let P be a continuous random variable with values in [0, 1]. Intuitively, P is the fraction of values less than the pivot. We're looking to find the constant c such that
c n lg n = E[n + c P n lg (P n) + c (1 - P) n lg ((1 - P) n)].
A little bit of algebra later, we have
c = 1/E[-P lg P - (1 - P) lg (1 - P)].
In other words, c is the reciprocal of the expected entropy of the Bernoulli distribution with mean P. Intuitively, for each element, we need to compare it to pivots in a way that yields about lg n bits of information.
When P is uniform, the pdf of P is 1. The constant is
In[1]:= -1/NIntegrate[x Log[2, x] + (1 - x) Log[2, 1 - x], {x, 0, 1}]
Out[1]= 1.38629
When the pivot is a median of 3, the pdf of P is 6 x (1 - x). The constant is
In[2]:= -1/NIntegrate[6 x (1 - x) (x Log[2, x] + (1 - x) Log[2, 1 - x]), {x, 0, 1}]
Out[2]= 1.18825
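(Not part of the original answer.) The same computation in plain Python for general odd k, under the assumption that the median of k independent uniform values has the Beta((k+1)/2, (k+1)/2) density; for k = 1 and k = 3 this should reproduce the two Mathematica results above.

from math import comb, log2

def quicksort_constant(k):
    # 1 / E[-P lg P - (1 - P) lg(1 - P)] with P ~ Beta(t + 1, t + 1), t = (k - 1) // 2.
    t = (k - 1) // 2
    norm = (2 * t + 1) * comb(2 * t, t)  # 1 / B(t + 1, t + 1)
    steps = 200_000
    expected_entropy = 0.0
    for i in range(1, steps):  # simple Riemann sum; the integrand vanishes at 0 and 1
        x = i / steps
        pdf = norm * x**t * (1 - x)**t
        expected_entropy += pdf * (-x * log2(x) - (1 - x) * log2(1 - x)) / steps
    return 1 / expected_entropy

for k in (1, 3, 5, 7, 9):
    print(k, round(quicksort_constant(k), 3))  # k = 1 and k = 3 give about 1.386 and 1.188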
The constant for the usual randomized quicksort is easy to compute because the probability that two elements k locations apart are compared is exactly 2/(k+1): the probability that one of those two elements is chosen as a pivot before any of the k-1 elements between them. Unfortunately nothing so clever applies to your algorithm.
I'm hesitant to attempt your bolded question because I can answer your "underlying" question: asymptotically speaking, there is no "sweet spot". The total added cost of computing medians of k elements, even O(n^(1 - ε)) elements, is linear, and the constant for the n log n term decreases with the array being split more evenly. The catch is of course constants on the linear term that are spectacularly impractical, highlighting one of the drawbacks of asymptotic analysis.
Based on my comments below, I guess k = O(n^α) for 0 < α < 1 is the "sweet spot".
If the initial state of the set is randomly ordered, you will get the exact same constant factor for randomly picking three items to calculate the median as when picking three items deterministically.
The motive for picking items at random would be that the deterministic method could give a result that is worse than the average. If the deterministic method gives a good median, you can't improve on it by picking items at random.
So, whichever method gives the best result depends on the input data; it can't be determined for every possible set.
The only sure way to lower the constant factor is to increase the number of items that you use to calculate the median, but at some point calculating the median will be more expensive than what you gain by getting a better median value.
Yes, it does. Bentley and McIlroy, authors of the C standard library's qsort function, wrote in their paper "Engineering a Sort Function" the following numbers:
1.386 n lg n average comparisons using first, middle or a randomized pivot
1.188 n lg n average comparisons using a median of 3 pivot
1.094 n lg n average comparisons using a median of 3 medians pivot
According to the above paper:
Our final code therefore chooses the middle element of smaller arrays,
the median of the first, middle and last elements of a mid-sized
array, and the pseudo-median of nine evenly spaced elements of a large
array.
Just a thought: if you use the median-of-three approach and you find it to be better, why not use a median-of-five or median-of-eleven approach? And while you are at it, maybe one can think of a median-of-n optimization... hmm... OK, that is obviously a bad idea (since you would have to sort your sequence for that...).
Basically, to choose your pivot element as the median of m elements, you sort those m elements, right? Therefore, I'd simply guess that one of the constants you are looking for is "2": by first sorting 3 elements to choose your pivot, how many additional comparisons do you execute? Let's say it's 2. You do this inside the quicksort over and over again. A basic conclusion would be that the median-of-3 is therefore 2 times slower than the simple random quicksort.
But what is working for you here? That you get a better divide-and-conquer distribution, and you are better protected against the degenerate case (a bit).
So, back to my infamous question at the beginning: why not choose the pivot element from a median-of-m, m being 5, 7, n/3, or so? There must be a sweet spot where sorting the m elements costs more than what you gain from the better divide-and-conquer behavior of quicksort. I guess this sweet spot comes very early -- you first have to fight against the constant factor of 2 comparisons if you choose median-of-3. It is worth an experiment, I admit, but I would not expect too much from the result :-) But if I am wrong, and the gain is huge: don't stop at 3!
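(Not part of the original answer.) Since an experiment is suggested, here is a rough Python sketch: a comparison-counting quicksort whose pivot is the median of m elements sampled at random (with replacement) from the current range. The counting model is crude (one comparison per element per partitioning pass, plus a crude bound for sorting the m-element sample), so the printed ratios only give a feel for how m affects the count, not the exact asymptotic constants.

import random
from math import log2

def quicksort_comparisons(arr, m):
    comparisons = 0

    def sort(lo, hi):
        nonlocal comparisons
        while hi - lo > 1:
            # Pick the pivot as the median of up to m randomly sampled elements.
            sample = [arr[random.randrange(lo, hi)] for _ in range(min(m, hi - lo))]
            comparisons += len(sample) * (len(sample) - 1) // 2  # crude cost of sorting the sample
            pivot = sorted(sample)[len(sample) // 2]
            # Three-way partition around the pivot value, one comparison per element.
            lt, eq, gt = [], [], []
            for i in range(lo, hi):
                comparisons += 1
                x = arr[i]
                if x < pivot:
                    lt.append(x)
                elif x > pivot:
                    gt.append(x)
                else:
                    eq.append(x)
            arr[lo:hi] = lt + eq + gt
            # Recurse on the smaller side, loop on the larger one (keeps the stack shallow).
            if len(lt) < len(gt):
                sort(lo, lo + len(lt))
                lo = lo + len(lt) + len(eq)
            else:
                sort(lo + len(lt) + len(eq), hi)
                hi = lo + len(lt)

    sort(0, len(arr))
    return comparisons

n = 50_000
for m in (1, 3, 5, 9):
    data = list(range(n))
    random.shuffle(data)
    print(m, round(quicksort_comparisons(data, m) / (n * log2(n)), 3))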