Algorithm to compute k fractions of form 1/r summing up to 1

Given k, we need to write 1 as a sum of k fractions of the form 1/r.
For example,
For k=2, 1 can uniquely be written as 1/2 + 1/2.
For k=3, 1 can be written as 1/3 + 1/3 + 1/3 or 1/2 + 1/4 + 1/4 or 1/6 + 1/3 + 1/2
Now, we need to consider all such sets of k fractions that sum up to 1 and return the highest denominator among all such sets; for instance, for the second sample case (k=3), our algorithm should return 6.
I came across this problem in a coding competition and couldn't come up with an algorithm for it. A bit of Google searching later revealed that such fractions are called Egyptian fractions, but those are apparently sets of distinct fractions summing to a particular value (not like 1/2 + 1/2). Also, I couldn't find an algorithm to compute Egyptian fractions (if they are at all helpful for this problem) when their number is restricted to k.

If all you want to do is find the largest denominator, there's no reason to find all the possibilities. You can do this very simply:
public long largestDenominator(int k){
    long denominator = 1;
    for(int i = 1; i < k; i++){
        denominator *= denominator + 1;
    }
    return denominator;
}
For you recursive types:
public long largestDenominator(int k){
    if(k == 1)
        return 1;
    long last = largestDenominator(k-1);
    return last * (last + 1); // or (last * last) + last
}
Why is it that simple?
To create the set, you need to insert the largest fraction that will keep the sum under 1 at each step (except the last). By "largest fraction", I mean largest by value, meaning the smallest denominator.
For the simple case k=3, that means you start with 1/2. You can't fit another half, so you go with 1/3. Then 1/6 is left over, giving you three terms.
For the next case k=4, you take that 1/6 off the end, since it won't fit under one, and we need room for another term. Replace it with 1/7, since that's the biggest value that fits. The remainder is 1/42.
Repeat as needed.
For example:
2 : [2,2]
3 : [2,3,6]
4 : [2,3,7,42]
5 : [2,3,7,43,1806]
6 : [2,3,7,43,1807,3263442]
As you can see, it rapidly becomes very large. Rapidly enough that you'll overflow a long if k>7. If you need to go beyond that, you'll need to find an appropriate container (i.e. BigInteger in Java/C#).
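If you want the whole lists of denominators shown above (not just the largest one), the same greedy rule is easy to write down; here is a rough Python sketch (the function name is mine), using Python's arbitrary-precision integers so the overflow issue doesn't arise:

from fractions import Fraction

def greedy_denominators(k):
    # At every step except the last, take the largest 1/d that stays strictly
    # below the remaining amount; the final term is the remainder itself,
    # which at that point is always a unit fraction.
    denoms = []
    remaining = Fraction(1)
    for _ in range(k - 1):
        d = remaining.denominator // remaining.numerator + 1   # smallest d with 1/d < remaining
        denoms.append(d)
        remaining -= Fraction(1, d)
    denoms.append(remaining.denominator)
    return denoms

for k in range(2, 8):
    print(k, ":", greedy_denominators(k))
# prints 2 : [2, 2], 3 : [2, 3, 6], 4 : [2, 3, 7, 42], and so on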
It maps perfectly to this sequence:
a(n) = a(n-1)^2 + a(n-1), a(0)=1.
You can also see the relationship to Sylvester's sequence:
a(n+1) = a(n)^2 - a(n) + 1, a(0) = 2
Wikipedia has a very nice article explaining the relationship between the two, as pointed out by Peter in the comments.

I never heard of Egyptian fractions before but here are some thoughts:
Idea
You can think of them geometrically:
Start with a unit square (1x1)
Draw either vertical or horizontal lines dividing the square into equal parts.
Repeat optionally the drawing of lines inside any of the sub-boxes evenly.
Stop any time you want.
The rectangles present will form a set of fractions of the form 1/n that add to 1.
You can count them and they might equal your 'k'.
Depending on how many equal sections you divided a rectangle into, it will tell you whether you have 1/2 or 1/3 or whatever. 1/6 is 1/2 of 1/3 or 1/3 of 1/2. (i.e. You divided by 2 and then divided one of the sub-boxes by 3, OR the other way around.)
Idea 2
You start with 1 box. This is the fraction 1/1 with k=1.
When you sub-divide a box into n parts, you add n to the count of boxes (k, the number of fractions summed) and subtract 1.
When you sub-divide any of those boxes, again, subtract 1 and add n, the number of divisions. Note that n-1 is the number of lines you drew to divide them.
More
You are going to start searching for the answer with k. Obviously k * 1/k = 1 so you have one solution right there.
How about k-1?
There's a solution there: (k-2) * 1/(k-1) + 2 * (1/((k-1)*2))
How did I get that? I made k-1 equal sections (with k-2 vertical lines) and then divided the last one in half horizontally.
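If you want to sanity-check that identity, a quick test with exact rationals does it (this snippet and its function name are just mine for illustration):

from fractions import Fraction

def check(k):
    # (k-2) copies of 1/(k-1) plus 2 copies of 1/(2(k-1)) should give
    # exactly k unit fractions summing to 1.
    terms = [Fraction(1, k - 1)] * (k - 2) + [Fraction(1, 2 * (k - 1))] * 2
    return len(terms) == k and sum(terms) == 1

print(all(check(k) for k in range(3, 50)))   # True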
Each solution is going to consist of:
taking a prior solution
using j fewer lines at some stage and dividing one of the boxes or sub-boxes into j+1 equal sections.
I don't know if all solutions can be formed by repeating this rule starting from k * 1/k
I do know you can get effective duplicates this way. For example: k * 1/k with j = 1 => (k-2) * 1/(k-1) + 2 * (1/((k-1)*2)) [from above] but k * 1/k with j = (k-2) => 2 * (1/((k-1)*2)) + (k-2) * 1/(k-1) [which just reverses the order of the parts]
Interesting
k = 7 can be represented by 1/2 + 1/4 + 1/8 + ... + 1/(2^6) + 1/(2^6) and the general case is 1/2 + ... + 1/(2^(k-1)) + 1/(2^(k-1)).
Similarly, any odd k = 2m+1 can be represented by two copies each of 1/3, 1/9, ..., 1/(3^(m-1)) plus three copies of 1/(3^m); for k=3 this is just 3 * (1/3).
I suspect there are similar patterns for all integers up to k.

What is the time complexity of this BFS algorithm?

I looked at LeetCode question 279. Perfect Squares:
Given an integer n, return the least number of perfect square numbers that sum to n.
A perfect square is an integer that is the square of an integer; in other words, it is the product of some integer with itself. For example, 1, 4, 9, and 16 are perfect squares while 3 and 11 are not.
Example 1:
Input: n = 12
Output: 3
Explanation: 12 = 4 + 4 + 4.
I solved it using the following algorithm:
def numSquares(n):
    squares = [i**2 for i in range(1, int(n**0.5)+1)]
    step = 1
    queue = {n}
    while queue:
        tempQueue = set()
        for node in queue:
            for square in squares:
                if node-square == 0:
                    return step
                if node < square:
                    break
                tempQueue.add(node-square)
        queue = tempQueue
        step += 1
It basically tries to go from the goal number to 0 by subtracting each possible perfect square, which are [1, 4, 9, ..., floor(sqrt(n))^2], and then does the same work for each of the numbers obtained.
Question
What is the time complexity of this algorithm? The branching factor at every level is sqrt(n), but some branches are destined to end early... which makes me wonder how to derive the time complexity.
If you think about what you're doing, you can imagine that you're doing a breadth-first search over a graph with n + 1 nodes (all the natural numbers between 0 and n, inclusive) and some number of edges m, which we'll determine later on. Your graph is essentially represented as an adjacency list, since at each point you iterate over all the outgoing edges (squares less than or equal to your number) and stop as soon as you consider a square that's too large. As a result, the runtime will be O(n + m), and all we have to do now is work out what m is.
(There's another cost here in computing all the perfect squares up to and including n, but that takes time O(n^(1/2)), which is dominated by the O(n) term.)
If you think about it, the number of outgoing edges from each number k will be given by the number of perfect squares less than or equal to k. That value is equal to ⌊√k⌋ (check this for a few examples - it works!). This means that the total number of edges is upper-bounded by
√0 + √1 + √2 + ... + √n
We can show that this sum is Θ(n^(3/2)). First, we'll upper-bound this sum at O(n^(3/2)), which we can do by noting that
√0 + √1 + √2 + ... + √n
≤ √n + √n + √n + ... + √n    ((n+1) times)
= (n + 1)√n
= O(n^(3/2)).
To lower-bound this at Ω(n^(3/2)), notice that
√0 + √1 + √2 + ... + √n
≥ √(n/2) + √(n/2 + 1) + ... + √(n)    (drop the first half of the terms)
≥ √(n/2) + √(n/2) + ... + √(n/2)
= (n / 2)√(n / 2)
= Ω(n^(3/2)).
So overall, the number of edges is Θ(n^(3/2)), so using a regular analysis of breadth-first search we can see that the runtime will be O(n^(3/2)).
This bound is likely not tight, because this assumes that you visit every single node and every single edge, which isn't going to happen. However, I'm not sure how to tighten things much beyond this.
As a note - this would be a great place to use A* search instead of breadth-first search, since you can fairly easily come up with heuristics to underestimate the remaining total distance (say, take the number and divide it by the largest perfect square less than it). That would cause the search to focus on extremely promising paths that jump rapidly toward 0 before less-good paths, like, say, always taking steps of size one.
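For what it's worth, here is a rough sketch of that A* idea (names and details are mine, not a tuned implementation). The heuristic is the one suggested above: the remaining value divided by the largest perfect square not exceeding it, rounded up, which can never overestimate the number of steps left:

import heapq
import math

def num_squares_astar(n):
    def h(x):
        # Admissible heuristic: each remaining step removes at most the largest
        # square <= x, so at least ceil(x / s) steps are still needed.
        if x == 0:
            return 0
        s = math.isqrt(x) ** 2
        return -(-x // s)

    best = {n: 0}                     # best known number of steps to reach each value
    heap = [(h(n), 0, n)]             # entries are (g + h, g, remaining value)
    while heap:
        f, g, x = heapq.heappop(heap)
        if x == 0:
            return g
        if g > best.get(x, g):
            continue                  # stale queue entry
        for i in range(math.isqrt(x), 0, -1):
            y = x - i * i
            if g + 1 < best.get(y, float("inf")):
                best[y] = g + 1
                heapq.heappush(heap, (g + 1 + h(y), g + 1, y))

print(num_squares_astar(12))          # 3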
Hope this helps!
Some observations:
The number of squares up to n is √n (floored to the nearest integer)
After the first iteration of the while loop, tempQueue will have √n entries
tempQueue can never have more than n entries, since all these values are positive, less than n and unique.
Every natural number can be written as the sum of four integer squares. So that means your BFS algorithm's while loop will iterate at the most 4 times. If the return statement did not get executed during any of the first 3 iterations, it is guaranteed it will in the 4th.
Every statement (except for the initialisation of squares) runs in constant time, even the call to .add().
The initialisation of squares has a list comprehension loop that has √n iterations, and range runs in constant time, so that initialisation has a time complexity of O(√n).
Now we can set a ceiling to the number of times the if node-square == 0 statement is executed (or any other statement in the innermost loop's body):
1⋅√n + √n⋅√n + n⋅√n + n⋅√n
Each of the 4 terms corresponds to an iteration of the while loop. The left factor of each product corresponds to the maximum size of queue in that particular iteration, and the factor at the right corresponds to the size of squares (always the same). This simplifies to:
√n + n + 2n^(3/2)
In terms of time complexity this is:
O(n^(3/2))
This is the worst case time complexity. When the while loop only has to iterate twice, it is O(n), and when only once (when n is a square), it is O(√n).

Largest divisor such that two numbers divided by it round to the same value?

I've got an algorithm that can be interpreted as dividing up the number line into some number of equal chunks. For simplicity, I'll stick with [0,1); it will be divided up like so:
0|----|----|----|----|1
What I need to do is take a range of numbers [j,k) and find the largest number of chunks, N, up to some maximum M, that will divide up the number line so that [j,k) still all fall into the same "bin". This is trickier than it sounds, as the range can straddle a bin like so:
j|-|k
0|----|----|----|----|1
So that you may have to get to quite a low number before the range is entirely contained. What's more, as the number of bins goes up, the range may move in and out of a single bin, so that there's local minima.
The obvious answer is to start with M bins, and decrease the number until the range falls into a single bin. However, I'd like to know if there's a faster way than enumerating all possible divisions, as my maximum number can be reasonably large (80 million or so).
Is there a better algorithm for this?
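For reference, the obvious scan described above (start at M and count down) can be written exactly with rational arithmetic; the sketch below, including the function name, is just my illustration of that naive baseline:

import math
from fractions import Fraction

def largest_single_bin(j, k, M):
    # Largest N <= M such that [j, k) fits inside one of the N equal bins of
    # [0, 1), i.e. some m satisfies m/N <= j and k <= (m+1)/N.  O(M) checks.
    j, k = Fraction(j), Fraction(k)
    for N in range(M, 0, -1):
        m = math.floor(j * N)             # index of the bin containing j
        if Fraction(m + 1, N) >= k:       # that bin also contains all of [j, k)
            return N
    return None

print(largest_single_bin(Fraction(6, 10), Fraction(65, 100), 10))   # 10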
Here I would like to give another heuristic, which is different from btilly's.
The task is to find integers m and n such that m / n <= j < k <= (m + 1) / n, with n as big as possible (but still under M).
Intuitively, it is preferable that the fraction m / n is close to j. This leads to the idea of using continued fractions.
The algorithm that I propose is quite simple:
calculate all the continued fractions of j using minus signs (so that the fractions are always approaching j from above), until the denominator exceeds M;
for each such fraction m / n, find the biggest integer i >= 0 such that k <= (m * i + 1) / (n * i) and n * i <= M, and replace the fraction m / n with (m * i) / (n * i);
among all the fractions from step 2, find the one with the biggest denominator.
The algorithm is not symmetric in j and k. Hence there is a similar k-version, which in general should not give the same answer, so that you can choose the bigger one from the two results.
Example: Here I will take btilly's example: j = 0.6 and k = 0.65, but I will take M = 10.
I will first go through the j-procedure. To calculate the continued fraction expansion of j, we compute:
0.6
= 0 + 0.6
= 0 + 1 / (2 - 0.3333)
= 0 + 1 / (2 - 1 / (3 - 0))
Since 0.6 is a rational number, the expansion terminates in finitely many steps. The corresponding fractions are:
0 = 0 / 1
0 + 1 / 2 = 1 / 2
0 + 1 / (2 - 1 / 3) = 3 / 5
Computing the corresponding i values in step 2, we replace the three fractions with:
0 / 1 = 0 / 1
1 / 2 = 3 / 6
3 / 5 = 6 / 10
The biggest denominator is given by 6 / 10.
Continuing with the example above, the corresponding k-procedure goes as follows:
0.65
= 1 - 0.35
= 1 - 1 / (3 - 0.1429)
= 1 - 1 / (3 - 1 / (7 - 0))
Hence the corresponding fractions:
1 = 1 / 1
1 - 1 / 3 = 2 / 3
1 - 1 / (3 - 1 / 7) = 13 / 20
Passing step 2, we get:
1 / 1 = 2 / 2
2 / 3 = 6 / 9
13 / 20 = 0 / 0 (this is because 20 is already bigger than M = 10)
The biggest denominator is given by 6 / 9.
EDIT: experimental results.
To my surprise, the algorithm works better than I thought.
I did the following experiment, with the bound M ignored (equivalently, one can take M big enough).
In every round, I generate a pair (j, k) of uniformly distributed random numbers in the interval [0, 1) with j < k. If the difference k - j is smaller than 1e-4, I discard this pair, making this round ineffective. Otherwise I calculate the true result trueN using the naive algorithm, and calculate the heuristic result heurN using my algorithm, and add them to the statistics. This goes for 1e6 rounds.
Here is the result:
effective round = 999789
sum of trueN = 14013312
sum of heurN = 13907575
correct percentage = 99.2262 %
average quotient = 0.999415
The correct percentage is the percentage of effective rounds such that trueN is equal to heurN, and the average quotient is the average of the quotient heurN / trueN for all effective rounds.
Thus the method gives the correct answer in 99%+ cases.
I also did experiments with smaller M values, and the results are similar.
The best case for the bin size must be larger than k-j.
Consider the number line segments [0..j] and [k..1). If we can divide both of the partial segments into parts using the same bin size, we should be able to solve the problem.
So if we consider gcd((j-0)/(k-j), (1-k)/(k-j)), (where we use the greatest-integer-function after the division), we should be able to get a good estimate, or the best value. There are corner cases: if (k-j) > j or (k-j) > (1-k), the best value is 1 itself.
So a very good estimate should be min(1, (k-j) * gcd((j-0)/(k-j), (1-k)/(k-j)))
Let's turn this around a bit.
You'd like to find m, n as large as you can (though n < M) with m/n close to but less than j and k <= (m+1)/n.
All promising candidates will be on the Stern-Brocot tree (https://en.wikipedia.org/wiki/Stern%E2%80%93Brocot_tree). Indeed you'll get a reasonably good answer just walking the Stern-Brocot tree to find the last "large rational" fitting your limit just below j whose top is at k or above.
There is a complication. Usually the tree converges quickly. But sometimes the Stern-Brocot tree has long sequences with very small gaps. For example the sequence to get to 0.49999999 will include 1/3, 2/5, 3/7, 4/9, ... We always fall into those sequences when a/b < c/d, then we take the mediant (a+c)/(b+d) and then we walk towards one side, so (a+i*c)/(b+i*d). If you're clever, then rather than walking the whole sequence you can just do a binary search for the right power of i to use.
The trick to that cleverness is to view your traversal as:
Start with 2 "equal" fractions.
Take their mediant. If its denominator exceeds M then I'm done. Else figure out which direction I am going from that.
Try powers of 2 in (a+i*c)/(b+i*d) until I know what range i is in for my range and M conditions.
Do binary search to find the last i that I can use.
(a+i*c)/(b+i*d) and (a+i*c+c)/(b+i*d+d) are my two new equal fractions. Go back to the first step.
The initial equal fractions are, of course, 0/1 and 1/1.
This will always find a decent answer in O(log(M)) operations. Unfortunately this reasonably good answer is NOT always correct. Consider the case where M = 3, j=0.6 and k=0.65. In this case the heuristic would stop at 1/2 while the actual best answer is 1/3.
Another way that it can fail is that it only finds reduced answers. In the above example if M was 4 then it still thinks that the best answer is 1/2 when it is actually 1/4. It is easy to handle this by testing whether a multiple of your final answer will work. (That step will improve your answer a fixed, but reasonably large, fraction of the time.)

Reversed Huffman coding

Suppose I have a collection of words with a predefined binary prefix code. Given a very large random binary chunk of data, I can parse this chunk into words using the prefix code.
I want to determine, at least approximately (for random chunks of very large lengths) the expectation values of number of hits for each word (how many times it is mentioned in the decoded text).
At first glance, the problem appears trivial - the probability of each word being scanned from the random pool of bits is completely determined by its length (since each bit can be either 0 or 1). But I suspect this to be an incorrect answer to the problem above since words have different lengths and thus this probability is not the same as the expected number of hits (divided by the length of the data chunk).
UPD: I was asked (in comments below) to state this problem mathematically, so here it goes.
Let w be a list of words written with only zeros and ones (our alphabet consists of only two letters). Furthermore, no word in w is a prefix of any other word. Thus w forms a legitimate binary prefix code. I want to know (at least approximately) the mean value of hits, for each word in w, averaged over all possible binary chunks of data with fixed size n. n can be taken very large, much much larger than any of the lengths of our words. However, words have different lengths and this can not be neglected.
I would appreciate any references to attempts to solve this.
My brief answer: the expected number of hits (or rather the expected proportion of hits) can be calculated for every given list of words.
I will not describe the full algorithm, but just do the following example in detail for illustration: let us fix the following very simple list of three words: 0, 10, 11.
For every n, there are 2^n different data chunks of length n (I mean n bits), each occur with the same probability 2^(-n).
The first observation is that not all data chunks can be decoded exactly - e.g. for the data 0101, when you decode it, a single 1 remains at the end.
Let us write U(n) for the number of length n data chunks that CAN be decoded exactly, and write V(n) for the others (i.e. those with an extra 1 in the end). The following recurrence relations are clear:
U(n) + V(n) = 2^n
V(n) = U(n - 1)
with the initial values U(0) = 1 and V(0) = 0.
A simple calculation then yields:
U(n) = (2^(n + 1) + (- 1)^n) / 3.
Now let A(n) (resp. B(n), C(n)) be the sum of the number of hits on the word 0 (resp. 10, 11) for all the U(n) exact data chunks, and let a(n) (resp. b(n), c(n)) be the same sum for all the V(n) inexact data chunks (the last 1 does not count in this case).
Then we have the following relations:
a(n) = A(n - 1), b(n) = B(n - 1), c(n) = C(n - 1)
A(n) = A(n - 1) + U(n - 1) + A(n - 2) + A(n - 2)
B(n) = B(n - 1) + B(n - 2) + U(n - 2) + B(n - 2)
C(n) = C(n - 1) + C(n - 2) + C(n - 2) + U(n - 2)
Explanation for relations 2, 3 and 4:
If D is an exact data chunk of length n, then there are three possibilities:
D ends with 0, and deleting this 0 yields an exact data chunk of length n - 1;
D ends with 10, and deleting this 10 yields an exact data chunk of length n - 2;
D ends with 11, and deleting this 11 yields an exact data chunk of length n - 2.
Thus, for example, when we sum up all the hit numbers for 0 in all exact data chunks of length n, the contributions of the three cases are respectively A(n - 1) + U(n - 1), A(n - 2), A(n - 2). Similarly for the other two equalities.
Now, solving these recurrence relations, we get:
A(n) = 2/9 * n * 2^n + (smaller terms)
B(n) = C(n) = 1/9 * n * 2^n + (smaller terms)
Since U(n) = 2/3 * 2^n + (smaller terms), our conclusion is that there are approximately n/3 hits on 0, n/6 hits on 10, n/6 hits on 11.
Note that the same proportions hold if we take also the V(n) inexact data chunks into account, because of the relations between A(n), B(n), C(n), U(n) and a(n), b(n), c(n), V(n).
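If you want to sanity-check these proportions numerically, a brute-force enumeration over all chunks of a modest length is enough (everything below, including the names, is just my illustration):

from itertools import product

words = ["0", "10", "11"]

def decode(bits):
    # Greedy parse with the prefix code; an undecodable tail (here at most a
    # trailing "1") is simply ignored.
    counts = dict.fromkeys(words, 0)
    i = 0
    while i < len(bits):
        for w in words:
            if bits.startswith(w, i):
                counts[w] += 1
                i += len(w)
                break
        else:
            break
    return counts

n = 16
totals = dict.fromkeys(words, 0)
for chunk in product("01", repeat=n):
    for w, c in decode("".join(chunk)).items():
        totals[w] += c

for w in words:
    print(w, totals[w] / 2**n / n)   # tends to 1/3, 1/6, 1/6 as n grows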
This method generalizes to any list of words. It's the same idea as if you were to solve this problem using dynamic programming - create states, find the recurrence relations, and establish the transition matrix.
To go further
I think the following might also be true, which will simplify the answer further.
Let w_1, ..., w_k be the words in the list, and let l_1, ..., l_k be their lengths.
For every i = 1, ..., k, let a_i be the proportion of hits of w_i, i.e. for length n data chunks the expected number of hits for w_i is a_i * n + (smaller terms).
Then, my feeling (conjecture) is that a_i * 2^(l_i) is the same for all i, i.e. if one word is one bit longer than another, then its hit number is a half of that of the other.
This conjecture, if correct, is probably not very difficult to prove. But I'm too lazy to think now...
If this is true, then we can calculate those a_i very easily, because we have the identity:
sum (a_i * l_i) = 1.
Let me illustrate this with the above example.
We have w_1 = 0, w_2 = 10, w_3 = 11, hence l_1 = 1, l_2 = l_3 = 2.
According to the conjecture, we should have a_1 = 2 * a_2 = 2 * a_3. Thus a_2 = a_3 = x and a_1 = 2x. The above equality becomes:
2x * 1 + x * 2 + x * 2 = 1
Hence x = 1 / 6, and we have a_1 = 1 / 3, a_2 = a_3 = 1 / 6, as can be verified by the above calculation.
Let's make a simple machine that can recognize words: a DFA with an accepting state for each word. To construct this DFA, start with a binary tree with each left-child-edge labeled 0 and each right-child-edge labeled 1. Each leaf is either a word-accepter (if the path to that leaf down the tree is the word's spelling) or is garbage (a string of letters that isn't a prefix for any valid word). We wire up "restart" edges from the leaves back to the root of the tree*.
Let's find out what the frequency of matching each word would be, if we had a string of infinite length. To do this, treat the graph of the DFA as a Markov state transition diagram, initialize the starting state to be at the root with probability 1 and all other states 0, and find the steady state distribution (by finding the dominant eigenvector of the transition diagram's corresponding matrix).
Our string is not of infinite length. But since n is large, I expect "edge effects" to not matter so much. We can approximate the matching frequency by word by taking the matching rate by word and multiplying by n. If we want to be more precise, instead of taking the eigenvector we could just take the transition matrix to the nth power and multiply that with the starting distribution to get the resulting distribution after n letters.
*This isn't quite precise, because this Markov system would spend some nonzero amount of time at the root, when after recognizing a word or skipping garbage it should immediately go to the 0-child or 1-child depending. So we don't actually wire up our "restart" edges to a root: from a word-accepting node we wire up two restart edges (one to the 0-child and one to the 1-child of the root); we replace garbage nodes that are left-children with an edge to the 0-child; and we replace garbage nodes that are right-children with an edge to the 1-child. In fact, if we set our initial state to 0 with probability 0.5 and 1 with probability 0.5, we don't even need the root.
EDIT: To use @WhatsUp's example, we start with a DFA that looks like this:
We rewire it a little bit to restart after a word is accepted and get rid of the root node:
The corresponding Markov transition matrix is:
0.5 0 0.5 0.5
0.5 0 0.5 0.5
0 0.5 0 0
0 0.5 0 0
whose first eigenvector is:
0.333
0.333
0.167
0.167
Which is to say that it spends 1/3 of its time in the 0 node, 1/3 in 1, 1/6 in 10, and 1/6 in 11. This is in agreement with @WhatsUp's results for that example.
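For what it's worth, a few lines of plain Python reproduce that steady state by power iteration on the matrix above (variable names are mine):

# States are ordered (0-child, 1-child, "10", "11"); columns are "from", rows are "to".
T = [
    [0.5, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5, 0.5],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
]

p = [0.5, 0.5, 0.0, 0.0]          # start in the 0-child or 1-child with probability 1/2 each
for _ in range(200):              # iterate the chain until the distribution settles
    p = [sum(T[r][c] * p[c] for c in range(4)) for r in range(4)]

print([round(x, 3) for x in p])   # [0.333, 0.333, 0.167, 0.167]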

Probabilty based on quicksort partition

I have come across this question:
Let 0<α<.5 be some constant (independent of the input array length n). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in lecture. What is the probability that, with a randomly chosen pivot element, the Partition subroutine produces a split in which the size of the smaller of the two subarrays is ≥α times the size of the original array?
Its answer is 1-2*α.
Can anyone explain how this answer was derived? Please help.
The choice of the pivot element is random, with uniform distribution.
There are N elements in the array, and we will assume that N is large (or we won't get the answer we want).
If 0≤α≤1, the probability that the number of elements smaller than the pivot is less than αN is α. The probability that the number of elements greater than the pivot is less than αN is the same. If α≤ 1/2, then these two possibilities are exclusive.
To say that the smaller subarray is of length ≥αN, is to say that neither of these conditions holds, therefore the probability is 1-2α.
The other answers didn't quite click with me so here's another take:
If the smaller of the 2 subarrays must have size at least αn, you can deduce that the pivot must land at position αn or later. This is obvious by contradiction: if the pivot landed before position αn, the left subarray would be smaller than αn. By the same reasoning the pivot must also land at position (1-α)n or earlier; any larger position for the pivot yields a subarray smaller than αn on the "right hand side".
This means the pivot has to land somewhere in the range [αn, (1-α)n].
What we want to calculate then is the probability of that event (call it A), i.e. P(A) = P(pivot lands in [αn, (1-α)n]).
The way we calculate the probability of an event is to sum the probabilities of the constituent outcomes, i.e. that the pivot lands at αn, at αn + 1, ..., or at (1-α)n.
That sum is expressed as:
P(A) = sum of P(pivot lands at i) for i from αn to (1-α)n, where each term is 1/n
Which easily simplifies to:
P(A) = ((1-α)n - αn) / n
With some cancellation we get:
P(A) = (1 - 2α)n / n = 1 - 2α
Just one more approach for solving the problem (for those who have uneasy time understanding it, like I have).
First.
Since we are talking about "the smaller of the two subarrays", then its length is less than 1/2 * n (n - the number of elements in original array).
Second.
If 0 < a < 0.5, it means that a * n is also less than 1/2 * n.
And thus we are talking from now about two randomly chosen integers bounded by 0 at lowest and 1/2 * n at highest.
Third.
Let's imagine a dice with numbers from 1 to 6 on its sides. Let's choose a number from 1 to 6, for example 4. Now roll the dice. Each number has a probability 1/6 of being the outcome of this roll. Thus for the event "outcome is less than or equal to 4" we have probability equal to the sum of the probabilities of each of these outcomes, and we have the numbers 1, 2, 3 and 4. Altogether p(x <= 4) = 4 * 1/6 = 4/6 = 2/3. So the probability of the event "outcome is bigger than 4" is p(x > 4) = 1 - p(x <= 4) = 1 - 2/3 = 1/3.
Fourth.
Let's go back to our problem. The "chosen number" is now a * n. And we are going to roll a dice with the numbers from 0 to (1/2 * n) on it to get k - the number of elements in the smaller of the subarrays. The probability that the outcome is bounded by (a * n) at highest is equal to the sum of the probabilities of all outcomes from 0 to (a * n). And the probability of any particular outcome k is p(k) = 1 / (1/2 * n).
Therefore p(k <= a * n) = (a * n) * (1 / (1/2 * n)) = 2 * a.
From this we can easily conclude that p(k > a * n) = 1 - p(k <= a * n) = 1 - 2 * a.
Array length is n.
For the smaller subarray to have length >= αn, the pivot must be greater than at least αn of the elements, and at the same time smaller than at least αn of the elements (otherwise the smaller subarray would be shorter than required).
So out of n elements we have to select one from among the (1-2α)n elements in the middle.
The required probability is (1-2α)n / n.
Hence 1-2α.
The probability would be the number of desired pivot positions / total number of positions.
In this case, ((1-α)n - αn) / n.
Since α lies between 0 and 0.5, (1-α) must be bigger than α. Hence the number of positions contained between them would be
(1-α-α)n = (1-2α)n
and so the probability would be
(1-2α)n / n = 1-2α
Another approach:
List the "more balanced" options:
αn + 1 to (1 - α)n - 1
αn + 2 to (1 - α)n - 2
...
αn + k to (1 - α)n - k
So k in total. We know that the most balanced is n / 2 to n / 2, so:
αn + k = n / 2 => k = n(1/2 - α)
Similarly, list the "less balanced" options:
αn - 1 to (1 - α)n + 1
αn - 2 to (1 - α)n + 2
...
αn - m to (1 - α)n + m
So m in total. We know that the least balanced is 0 to n so:
αn - m = 0 => m = αn
Since all these options happen with equal probability we can use the frequency definition of probability so:
Pr{More balanced} = (total # of more balanced) / (total # of options) =>
Pr{More balanced} = k / (k + m) = n(1/2 - α) / (n(1/2 - α) + αn) = 1 - 2α
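If you'd rather convince yourself empirically, a quick Monte Carlo simulation of a uniformly random pivot (function name mine) agrees with 1 - 2α:

import random

def prob_balanced(n, alpha, trials=100_000):
    # Probability that the smaller side of a random partition has size >= alpha * n.
    hits = 0
    for _ in range(trials):
        rank = random.randrange(n)                 # number of elements smaller than the pivot
        smaller = min(rank, n - 1 - rank)          # size of the smaller subarray
        if smaller >= alpha * n:
            hits += 1
    return hits / trials

print(prob_balanced(10_000, 0.25))                 # about 0.5 = 1 - 2 * 0.25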

What would cause an algorithm to have O(log n) complexity?

My knowledge of big-O is limited, and when log terms show up in the equation it throws me off even more.
Can someone maybe explain to me in simple terms what a O(log n) algorithm is? Where does the logarithm come from?
This specifically came up when I was trying to solve this midterm practice question:
Let X(1..n) and Y(1..n) contain two lists of integers, each sorted in nondecreasing order. Give an O(log n)-time algorithm to find the median (or the nth smallest integer) of all 2n combined elements. For ex, X = (4, 5, 7, 8, 9) and Y = (3, 5, 8, 9, 10), then 7 is the median of the combined list (3, 4, 5, 5, 7, 8, 8, 9, 9, 10). [Hint: use concepts of binary search]
I have to agree that it's pretty weird the first time you see an O(log n) algorithm... where on earth does that logarithm come from? However, it turns out that there are several different ways that you can get a log term to show up in big-O notation. Here are a few:
Repeatedly dividing by a constant
Take any number n; say, 16. How many times can you divide n by two before you get a number less than or equal to one? For 16, we have that
16 / 2 = 8
8 / 2 = 4
4 / 2 = 2
2 / 2 = 1
Notice that this ends up taking four steps to complete. Interestingly, we also have that log2 16 = 4. Hmmm... what about 128?
128 / 2 = 64
64 / 2 = 32
32 / 2 = 16
16 / 2 = 8
8 / 2 = 4
4 / 2 = 2
2 / 2 = 1
This took seven steps, and log2 128 = 7. Is this a coincidence? Nope! There's a good reason for this. Suppose that we divide a number n by 2 i times. Then we get the number n / 2^i. If we want to solve for the value of i where this value is at most 1, we get
n / 2^i ≤ 1
n ≤ 2^i
log2 n ≤ i
In other words, if we pick an integer i such that i ≥ log2 n, then after dividing n in half i times we'll have a value that is at most 1. The smallest i for which this is guaranteed is roughly log2 n, so if we have an algorithm that divides by 2 until the number gets sufficiently small, then we can say that it terminates in O(log n) steps.
An important detail is that it doesn't matter what constant you're dividing n by (as long as it's greater than one); if you divide by the constant k, it will take logk n steps to reach 1. Thus any algorithm that repeatedly divides the input size by some fraction will need O(log n) iterations to terminate. Those iterations might take a lot of time and so the net runtime needn't be O(log n), but the number of steps will be logarithmic.
So where does this come up? One classic example is binary search, a fast algorithm for searching a sorted array for a value. The algorithm works like this:
If the array is empty, return that the element isn't present in the array.
Otherwise:
Look at the middle element of the array.
If it's equal to the element we're looking for, return success.
If it's greater than the element we're looking for:
Throw away the second half of the array.
Repeat
If it's less than the element we're looking for:
Throw away the first half of the array.
Repeat
For example, to search for 5 in the array
1 3 5 7 9 11 13
We'd first look at the middle element:
1 3 5 7 9 11 13
      ^
Since 7 > 5, and since the array is sorted, we know for a fact that the number 5 can't be in the back half of the array, so we can just discard it. This leaves
1 3 5
So now we look at the middle element here:
1 3 5
  ^
Since 3 < 5, we know that 5 can't appear in the first half of the array, so we can throw the first half array to leave
5
Again we look at the middle of this array:
5
^
Since this is exactly the number we're looking for, we can report that 5 is indeed in the array.
So how efficient is this? Well, on each iteration we're throwing away at least half of the remaining array elements. The algorithm stops as soon as the array is empty or we find the value we want. In the worst case, the element isn't there, so we keep halving the size of the array until we run out of elements. How long does this take? Well, since we keep cutting the array in half over and over again, we will be done in at most O(log n) iterations, since we can't cut the array in half more than O(log n) times before we run out of array elements.
Algorithms following the general technique of divide-and-conquer (cutting the problem into pieces, solving those pieces, then putting the problem back together) tend to have logarithmic terms in them for this same reason - you can't keep cutting some object in half more than O(log n) times. You might want to look at merge sort as a great example of this.
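For concreteness, here is a minimal sketch of the binary search just described (returning only whether the value is present):

def binary_search(arr, target):
    # Each iteration discards half of the remaining range, so there are
    # O(log n) iterations.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return True
        elif arr[mid] > target:
            hi = mid - 1          # throw away the second half
        else:
            lo = mid + 1          # throw away the first half
    return False

print(binary_search([1, 3, 5, 7, 9, 11, 13], 5))   # True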
Processing values one digit at a time
How many digits are in the base-10 number n? Well, if there are k digits in the number, then the leading digit contributes some multiple of 10^(k-1). The largest k-digit number is 999...9 (k nines), which is equal to 10^k - 1. Consequently, if we know that n has k digits in it, then we know that the value of n is at most 10^k - 1. If we want to solve for k in terms of n, we get
n ≤ 10^k - 1
n + 1 ≤ 10^k
log10 (n + 1) ≤ k
From which we get that k is approximately the base-10 logarithm of n. In other words, the number of digits in n is O(log n).
For example, let's think about the complexity of adding two large numbers that are too big to fit into a machine word. Suppose that we have those numbers represented in base 10, and we'll call the numbers m and n. One way to add them is through the grade-school method - write the numbers out one digit at a time, then work from the right to the left. For example, to add 1337 and 2065, we'd start by writing the numbers out as
1 3 3 7
+ 2 0 6 5
==============
We add the last digit and carry the 1:
1
1 3 3 7
+ 2 0 6 5
==============
2
Then we add the second-to-last ("penultimate") digit and carry the 1:
1 1
1 3 3 7
+ 2 0 6 5
==============
0 2
Next, we add the third-to-last ("antepenultimate") digit:
1 1
1 3 3 7
+ 2 0 6 5
==============
4 0 2
Finally, we add the fourth-to-last ("preantepenultimate"... I love English) digit:
1 1
1 3 3 7
+ 2 0 6 5
==============
3 4 0 2
Now, how much work did we do? We do a total of O(1) work per digit (that is, a constant amount of work), and there are O(max{log n, log m}) total digits that need to be processed. This gives a total of O(max{log n, log m}) complexity, because we need to visit each digit in the two numbers.
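Here is a small sketch of that grade-school addition on digit strings (function name mine); it does O(1) work per digit, so O(max{log n, log m}) work overall:

def add_digit_strings(a, b):
    a, b = a[::-1], b[::-1]                        # work from right to left
    digits, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        carry, d = divmod(da + db + carry, 10)     # one digit plus the carry
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_digit_strings("1337", "2065"))           # 3402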
Many algorithms get an O(log n) term in them from working one digit at a time in some base. A classic example is radix sort, which sorts integers one digit at a time. There are many flavors of radix sort, but they usually run in time O(n log U), where U is the largest possible integer that's being sorted. The reason for this is that each pass of the sort takes O(n) time, and there are a total of O(log U) iterations required to process each of the O(log U) digits of the largest number being sorted. Many advanced algorithms, such as Gabow's shortest-paths algorithm or the scaling version of the Ford-Fulkerson max-flow algorithm, have a log term in their complexity because they work one digit at a time.
As to your second question about how you solve that problem, you may want to look at this related question which explores a more advanced application. Given the general structure of problems that are described here, you now can have a better sense of how to think about problems when you know there's a log term in the result, so I would advise against looking at the answer until you've given it some thought.
When we talk about big-Oh descriptions, we are usually talking about the time it takes to solve problems of a given size. And usually, for simple problems, that size is just characterized by the number of input elements, and that's usually called n, or N. (Obviously that's not always true-- problems with graphs are often characterized in numbers of vertices, V, and number of edges, E; but for now, we'll talk about lists of objects, with N objects in the lists.)
We say that a problem "is big-Oh of (some function of N)" if and only if:
There is some constant c and some threshold N_0 such that, for all N > N_0, the runtime of the algorithm is less than c times (some function of N).
In other words, don't think about small problems where the "constant overhead" of setting up the problem matters, think about big problems. And when thinking about big problems, big-Oh of (some function of N) means that the run-time is still always less than some constant times that function. Always.
In short, that function is an upper bound, up to a constant factor.
So, "big-Oh of log(n)" means the same thing that I said above, except "some function of N" is replaced with "log(n)."
So, your problem tells you to think about binary search, so let's think about that. Let's assume you have, say, a list of N elements that are sorted in increasing order. You want to find out if some given number exists in that list. One way to do that which is not a binary search is to just scan each element of the list and see if it's your target number. You might get lucky and find it on the first try. But in the worst case, you'll check N different times. This is not binary search, and it is not big-Oh of log(N) because there's no way to force it into the criteria we sketched out above.
You can pick that arbitrary constant to be c=10, and if your list has N=32 elements, you're fine: 10*log(32) = 50, which is greater than the runtime of 32. But if N=64, 10*log(64) = 60, which is less than the runtime of 64. You can pick c=100, or 1000, or a gazillion, and you'll still be able to find some N that violates that requirement. In other words, there is no N_0.
If we do a binary search, though, we pick the middle element, and make a comparison. Then we throw out half the numbers, and do it again, and again, and so on. If your N=32, you can only do that about 5 times, which is log(32). If your N=64, you can only do this about 6 times, etc. Now you can pick that arbitrary constant c, in such a way that the requirement is always met for large values of N.
With all that background, what O(log(N)) usually means is that you have some way to do a simple thing, which cuts your problem size in half. Just like the binary search is doing above. Once you cut the problem in half, you can cut it in half again, and again, and again. But, critically, what you can't do is some preprocessing step that would take longer than that O(log(N)) time. So for instance, you can't shuffle your two lists into one big list, unless you can find a way to do that in O(log(N)) time, too.
(NOTE: Nearly always, Log(N) means log-base-two, which is what I assume above.)
In the following solution, every recursive call is made on half of the given sizes of the sub-arrays of X and Y.
All other lines run in constant time.
The recurrence is T(2n) = T(2n/2) + c = T(n) + c, which gives O(lg(2n)) = O(lg n).
You start with MEDIAN(X, 1, n, Y, 1, n).
MEDIAN(X, p, r, Y, i, k)
    if X[r] < Y[i]
        return X[r]
    if Y[k] < X[p]
        return Y[k]
    q = floor((p+r)/2)
    j = floor((i+k)/2)
    if r-p+1 is even
        if X[q+1] > Y[j] and Y[j+1] > X[q]
            if X[q] > Y[j]
                return X[q]
            else
                return Y[j]
        if X[q+1] < Y[j-1]
            return MEDIAN(X, q+1, r, Y, i, j)
        else
            return MEDIAN(X, p, q, Y, j+1, k)
    else
        if X[q] > Y[j] and Y[j+1] > X[q-1]
            return Y[j]
        if Y[j] > X[q] and X[q+1] > Y[j-1]
            return X[q]
        if X[q+1] < Y[j-1]
            return MEDIAN(X, q, r, Y, i, j)
        else
            return MEDIAN(X, p, q, Y, j, k)
The Log term pops up very often in algorithm complexity analysis. Here are some explanations:
1. How do you represent a number?
Let's take the number X = 245436. This notation of “245436” has implicit information in it. Making that information explicit:
X = 2 * 10 ^ 5 + 4 * 10 ^ 4 + 5 * 10 ^ 3 + 4 * 10 ^ 2 + 3 * 10 ^ 1 + 6 * 10 ^ 0
Which is the decimal expansion of the number. So, the minimum amount of information we need to represent this number is 6 digits. This is no coincidence, as any number less than 10^d can be represented in d digits.
So how many digits are required to represent X? That's equal to the largest exponent of 10 in X plus 1.
==> 10 ^ d > X
==> log (10 ^ d) > log(X)
==> d* log(10) > log(X)
==> d > log(X) // And log appears again...
==> d = floor(log(x)) + 1
Also note that this is the most concise way to denote the number in this range. Any reduction will lead to information loss, as a missing digit can be mapped to 10 other numbers. For example: 12* can be mapped to 120, 121, 122, …, 129.
2. How do you search for a number in (0, N - 1)?
Taking N = 10^d, we use our most important observation:
The minimum amount of information to uniquely identify a value in a range between 0 to N - 1 = log(N) digits.
This implies that, when asked to search for a number on the integer line, ranging from 0 to N - 1, we need at least log(N) tries to find it. Why? Any search algorithm will need to choose one digit after another in its search for the number.
The minimum number of digits it needs to choose is log(N). Hence the minimum number of operations taken to search for a number in a space of size N is log(N).
Can you guess the order complexities of binary search, ternary search or deca search? It's O(log(N))!
3. How do you sort a set of numbers?
When asked to sort a set of numbers A into an array B, we are essentially permuting the elements: every element in the original array has to be mapped to its corresponding index in the sorted array. So, for the first element, we have n candidate positions. To correctly find the corresponding index in this range from 0 to n - 1, we need…log(n) operations.
The next element needs log(n-1) operations, the next log(n-2) and so on. The total comes to be:
==> log(n) + log(n - 1) + log(n - 2) + … + log(1)
Using log(a) + log(b) = log(a * b),
==> log(n!)
This can be approximated to nlog(n) - n. Which is O(n*log(n))!
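You can check that approximation numerically; using natural logarithms, math.lgamma(n + 1) gives ln(n!) directly:

import math

n = 1_000_000
print(math.lgamma(n + 1))        # ln(n!)
print(n * math.log(n) - n)       # n*ln(n) - n, agrees up to a tiny relative error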
Hence we conclude that no comparison-based sorting algorithm can do better than O(n*log(n)). And some algorithms with this complexity are the popular Merge Sort and Heap Sort!
These are some of the reasons why we see log(n) pop up so often in the complexity analysis of algorithms. The same can be extended to binary numbers.
Cheers!
We call the time complexity O(log n) when the solution is based on iterations over n in which the problem size handled in each iteration is a constant fraction of the previous iteration's, as the algorithm works towards the solution.
Can't comment yet... necro it is!
Avi Cohen's answer is incorrect, try:
X = 1 3 4 5 8
Y = 2 5 6 7 9
None of the conditions are true, so MEDIAN(X, p, q, Y, j, k) will cut both the fives. These are nondecreasing sequences, not all values are distinct.
Also try this even-length example with distinct values:
X = 1 3 4 7
Y = 2 5 6 8
Now MEDIAN(X, p, q, Y, j+1, k) will cut the four.
Instead I offer this algorithm, call it with MEDIAN(1,n,1,n):
MEDIAN(startx, endx, starty, endy){
    if (startx == endx)
        return min(X[startx], Y[starty])
    odd = (startx + endx) % 2      //0 if even, 1 if odd
    m = (startx + endx - odd) / 2
    n = (starty + endy - odd) / 2
    x = X[m]
    y = Y[n]
    if x == y
        //then there are n-2{+1} total elements smaller than or equal to both x and y
        //so this value is the nth smallest
        //we have found the median.
        return x
    if (x < y)
        //if we remove some numbers smaller than the median,
        //and remove the same amount of numbers bigger than the median,
        //the median will not change
        //we know the elements before x are smaller than the median,
        //and the elements after y are bigger than the median,
        //so we discard these and continue the search:
        return MEDIAN(m, endx, starty, n + 1 - odd)
    else //(x > y)
        return MEDIAN(startx, m + 1 - odd, n, endy)
}
