Time limit exceeded for my program in Python 3

IOI Training Camp 20xx
(INOI 2011)
We are well into the 21st century and school children are taught dynamic programming in class 4. The IOI training camp has degenerated into an endless sequence of tests, with negative marking. At the end of the camp, each student is evaluated based on the sum of the best contiguous segment (i.e., no gaps) of marks in the overall sequence of tests.
Students, however, have not changed much over the years and they have asked for some relaxation in the evaluation procedure. As a concession, the camp coordinators have agreed that students are allowed to drop up to a certain number of tests when calculating their best segment.
For instance, suppose that Lavanya is a student at the training camp and there have been ten tests, in which her marks are as follows.
Test 1 2 3 4 5 6 7 8 9 10
Marks 6 -5 3 -7 6 -1 10 -8 -8 8
In this case, without being allowed to drop any tests, the best segment is tests 5–7, which yields a total of 15 marks. If Lavanya is allowed to drop up to 2 tests in a segment, the best segment is tests 1–7, which yields a total of 24 marks after dropping tests 2 and 4. If she is allowed to drop up to 6 tests in a segment, the best total is obtained by taking the entire list and dropping the 5 negative entries to get a total of 33.
You will be given a sequence of N test marks and a number K. You have to compute the sum of the best segment in the sequence when up to K marks may be dropped from the segment.
Solution hint
For 1 ≤ i ≤ N, 0 ≤ j ≤ K, let Best[i][j] denote the maximum sum of a segment ending at position i with at most j marks dropped. Best[i][0] is the classical maximum subsegment (maximum subarray) problem. For j ≥ 1, inductively compute Best[i][j] from Best[i][j-1].
Input format
The first line of input contains two integers N and K, where N is the number of tests for which marks will be provided and K is the limit of how many entries may be dropped from a segment.
This is followed by N lines of input each containing a single integer. The marks for test i, i ∈ {1,2,…,N} are provided in line i+1.
Output format
The output is a single number, the maximum marks that can be obtained from a segment in which up to K values are dropped.
Constraints
You may assume that 1 ≤ N ≤ 10^4 and 0 ≤ K ≤ 10^2. The marks for each test lie in the range [-10^4 … 10^4]. In 40% of the cases you may assume N ≤ 250.
def sumsub(ls, k):
    if k == 0:
        return sum(ls)
    else:
        ls.sort()
        if len(ls) >= k:
            # drop at most k marks: after sorting, the most negative are in front
            for j in range(0, k):
                if ls[0] < 0:
                    ls.remove(ls[0])
            return sum(ls)
        else:
            # fewer than k marks in the segment, so drop every negative one
            for i in range(len(ls)):
                if ls[0] < 0:
                    ls.remove(ls[0])
            return sum(ls)

n, k = map(int, input().split())
l = []
smax = 0
for i in range(n):
    m = input()
    l.append(int(m))
for i in range(n):
    for j in range(n, 0, -1):
        s = sumsub(l[i:j], k)    # try every segment l[i:j]
        if s > smax:
            smax = s
print(smax)
I am getting the correct answer, but the time limit is exceeded on some inputs.
It would be of great help if you could suggest a faster approach.

There are four 'for' loops effectively running every time your function executes (the two loops enumerating segments, plus the sort and the dropping loops inside sumsub). That's the big reason for exceeding the time limit.
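The intended approach is the dynamic programming solution from the hint, which runs in O(N*K) instead of enumerating every segment. Below is a minimal sketch of it (my own code, not from the original thread; best_segment is an assumed name), keeping only a rolling row of the Best table:

import sys

def best_segment(marks, k):
    NEG = float('-inf')
    # prev[j] = best sum of a segment ending at the previous position
    #           with at most j marks dropped
    prev = [NEG] * (k + 1)
    ans = 0
    for x in marks:
        cur = [0] * (k + 1)
        cur[0] = max(x, prev[0] + x)          # classical Kadane step, no drops
        for j in range(1, k + 1):
            cur[j] = max(x,                   # start a new segment at x
                         prev[j] + x,         # extend the segment, keep x
                         prev[j - 1],         # extend the segment, drop x
                         0)                   # segment with everything dropped
        ans = max(ans, max(cur))
        prev = cur
    return ans

def main():
    data = sys.stdin.read().split()
    n, k = int(data[0]), int(data[1])
    marks = [int(v) for v in data[2:2 + n]]
    print(best_segment(marks, k))

if __name__ == "__main__":
    main()

For the sample above (N=10, K=2) this returns 24, matching the expected answer.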

Related

Find four factors of a number such that their product is maximum and their sum is the original number

Given the number of test cases T and an integer N, you need to find four integers A, B, C, D such that they are all factors of N (A|N, B|N, C|N, D|N) and N = A+B+C+D. The goal is to maximize A * B * C * D. If it is not possible to find four such factors, simply return -1.
Input format for the problem is:
The first line contains an integer T (1 <= T <= 40000), representing the number of test cases.
Each of the next T lines contains an integer N (1 <= N <= 40000; N^4 will not exceed a 64-bit integer).
This question is on Hackerearth under the recursion category, but I'm not able to understand the algorithm in the editorial (editorial link: https://www.hackerearth.com/practice/basic-programming/recursion/recursion-and-backtracking/practice-problems/algorithm/divide-number-a410603f/editorial/).
In the editorial it is solved using unit fractions, but I'm not able to understand the algorithm (I've reproduced the editorial below in case you cannot open the link; I don't understand the points marked with ***). A brute-force solution results in TLE (Time Limit Exceeded). Please provide an algorithm or pseudo-code using DFS or backtracking.
My brute-force approach: calculate the factors of a number 'n' in O(sqrt(n)) and store them in an array, then traverse the array to get A, B, C, D using four for loops. But for T (1 <= T <= 40000) test cases it gets TLE.
Editorial(If you are not able to open the above link):-
Consider the equation N = A+B+C+D. If we divide the equation by N, we get 1 = 1/A' + 1/B' + 1/C' + 1/D', where A', B', C', D' are all integers (A' = N/A and so on), because A, B, C, D are factors of N.
So the original problem is equivalent to dividing 1 into four unit fractions.
We can enumerate the unit fractions from large to small.
*** If we need to divide X into Y unit fractions, and the last unit fraction is 1/Z, we can enumerate unit fractions between 1/Z and X/Y (because we are enumerating the largest remaining fraction), and recursively solve.
*** After finding all solutions to 1 = 1/A' + 1/B' + 1/C' + 1/D' (about 20 solutions if the numbers are in order), we can enumerate them in each test case. If A', B', C', D' are all factors of N, we can use this solution to update the answer.
Time Complexity: O(T), where T is the number of Test cases.
*** If we need to divide X into Y unit fractions, and the last unit fraction is 1/Z, we can enumerate unit fractions between 1/Z and X/Y (because we are enumerating the largest remaining fraction), and recursively solve.
Answer: We are trying to find all the combinations satisfying 1 = 1/A + 1/B + 1/C + 1/D. Initially we have X=1 and Y=4, and we are enumerating 1/A as the largest fraction, which should be no less than X/Y = 1/4. Because this is the first element, there is no last fraction 1/Z. Suppose we chose A=3; then the last fraction 1/Z is 1/A = 1/3, X = 1 - 1/3 = 2/3, and Y=3. Now we choose 1/B from [X/Y, 1/Z] = [2/9, 1/3], and do the same thing for the next steps.
*** After finding all solutions to 1 = 1/A' + 1/B' + 1/C' + 1/D' (about 20 solutions if the numbers are in order), we can enumerate them in each test case. If A', B', C', D' are all factors of N, we can use this solution to update the answer.
Answer: Because 1/A should be no less than 1/4, A can only be 2, 3 or 4. If A == 4, then A=B=C=D, and there is only one solution. If A == 3, then [X/Y, 1/Z] = [2/9, 1/3], so B can only be 3 or 4; if B == 4, then in the next round C must be 4, where [X/Y, 1/Z] = [5/24, 1/4]; if B == 3, then C can be 4, 5 or 6 because [X/Y, 1/Z] = [1/6, 1/3]. If A == 2, then [X/Y, 1/Z] = [1/6, 1/2], and B can be 3, 4, 5 or 6. You can do the rest of the calculation with the code; it feels like we can cut off many search branches. (Ignore my enumeration order; you should start from A=2.)
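To make the editorial's search concrete, here is a rough sketch of the enumeration (my own reconstruction; splits and solve are assumed names). It precomputes every way of writing 1 as four unit fractions with non-decreasing denominators using exact Fraction arithmetic, and then each test case only checks that constant-sized list:

from fractions import Fraction

def splits(x, parts, min_d, prefix, out):
    # All ways to write Fraction x as a sum of `parts` unit fractions
    # 1/d with non-decreasing denominators d >= min_d.
    if parts == 1:
        if x.numerator == 1 and x.denominator >= min_d:
            out.append(prefix + [x.denominator])
        return
    if x <= 0:
        return
    # Need 1/d <= x (so d >= ceil(1/x)) and 1/d >= x/parts (so d <= parts/x).
    lo = max(min_d, -(-x.denominator // x.numerator))
    hi = parts * x.denominator // x.numerator
    for d in range(lo, hi + 1):
        splits(x - Fraction(1, d), parts - 1, d, prefix + [d], out)

solutions = []
splits(Fraction(1), 4, 1, [], solutions)   # the handful of (A',B',C',D') tuples

def solve(n):
    # A = N/A' etc., so every denominator must divide N; maximize the product.
    best = -1
    for quad in solutions:
        if all(n % d == 0 for d in quad):
            prod = 1
            for d in quad:
                prod *= n // d
            best = max(best, prod)
    return best

Since A = N/A', maximizing A*B*C*D amounts to minimizing A'*B'*C'*D' among the decompositions whose denominators all divide N, and each test case costs O(1), matching the editorial's claimed complexity.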
The time complexity of your code can be improved by using only three for loops and applying binary search to find the fourth number, since binary search takes log(n) time.
The time complexity becomes O(n^3 * log(n)), and according to the constraints of the question it should be able to pass all the test cases.

Calculating limits in dynamic programming

I found this question on topcoder:
Your friend Lucas gave you a sequence S of positive integers.
For a while, you two played a simple game with S: Lucas would pick a number, and you had to select some elements of S such that the sum of all numbers you selected is the number chosen by Lucas. For example, if S={2,1,2,7} and Lucas chose the number 11, you would answer that 2+2+7 = 11.
Lucas now wants to trick you by choosing a number X such that there will be no valid answer. For example, if S={2,1,2,7}, it is not possible to select elements of S that sum up to 6.
You are given the int[] S. Find the smallest positive integer X that cannot be obtained as the sum of some (possibly all) elements of S.
Constraints:
- S will contain between 1 and 20 elements, inclusive.
- Each element of S will be between 1 and 100,000, inclusive.
But in the editorial solution it has been written:
How about finding the smallest impossible sum? Well, we can try the following naive algorithm: First try with x = 1, if this is not a valid sum (found using the methods in the previous section), then we can return x, else we increment x and try again, and again until we find the smallest number that is not a valid sum.
Let's find an upper bound for the number of iterations, the number of values of x we will need to try before we find a result. First of all, the maximum sum possible in this problem is 100000 * 20 (all numbers at their maximum of 100000), which means that 100000 * 20 + 1 can never be a valid sum. We can be certain to need at most 2000001 steps.
How good is this upper bound? If we had 100000 in each of the 20 numbers, 1 wouldn't be a possible sum, so we would actually need just one iteration in that case. If we want 1 to be a possible sum, we should have 1 among the initial elements. Then we need a 2 (else we would only need 2 iterations), then a 4 (3 can be found by adding 1+2), then 8 (numbers from 5 to 7 can be found by adding some of the first 3 powers of two), then 16, 32, .... It turns out that with powers of 2 we can easily make inputs that require many iterations. With the first 18 powers of two (2^0 through 2^17) we could cover up to the first 262143 integers. That should be a good estimate for the largest number. (We cannot actually use 2^18, or even 2^17, in the input, since every element is at most 100000, so the true figure is a bit smaller.)
Up to 262143 times, we need to query if a number x is in the set of possible sums. We can just use a boolean array here. It appears that even O(log(n)) data structures should be fast enough, however.
I did understand the first paragraph. But after that they explain something about "How good is this upper bound?...". I couldn't understand that paragraph. How did they deduce that we need to query up to 262143 times whether a number x is in the set of possible sums?
I am a newbie at dynamic programming and so it would be great if somebody could explain this to me.
Thank you.
The idea is as follows:
If the input sequence contains the first k powers of two, 2^0, 2^1, ..., 2^(k-1), then the subset sums cover every integer between 0 and 2^k - 1. The greatest power of two that can actually appear in the sequence is 2^16 (2^17 = 131,072 exceeds the element bound of 100,000), so 17 powers of two cover every sum up to 2^17 - 1 = 131,071; the editorial's 2^18 - 1 = 262,143 is a slightly generous round-up. If a power of two were missing, some smaller sum would already be impossible to achieve.
However, the statement is missing that there may be 3 more numbers in the sequence (it has at most 20 elements). From these extra numbers you can extend the covered range further (three more elements of 100,000 each push it to 131,071 + 300,000 = 431,071), so the maximum number to check is somewhat larger still.
You may wonder why we use powers of two and not any other powers. The reason is the binary selection that we perform on the numbers in the input sequence. Either we add a number to the sum or we don't. So, if we represent this selection for number ni as a selection variable si (either 0 or 1), then the possible sum is:
s = s0 * n0 + s1 * n1 + s2 * n2 + ...
Now, if we choose the ni to be powers of two ni = 2^i, then:
s = s0 * 2^0 + s1 * 2^1 + s2 * 2^2 + ...
= sum si * 2^i
This is equivalent to the binary representation of numbers (see Positional Notation). By definition, different choices for the selection variables produce different sums. Hence, the number of distinct possible sums is maximized by choosing powers of two for the input sequence.
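To make the boolean-array idea concrete, here is a minimal subset-sum sketch (my own code, not from the original answer; smallest_impossible_sum is an assumed name):

def smallest_impossible_sum(s):
    # reachable[v] is True iff some subset of s sums to v
    total = sum(s)
    reachable = [False] * (total + 1)
    reachable[0] = True
    for x in s:
        for v in range(total, x - 1, -1):   # downward so each element is used at most once
            if reachable[v - x]:
                reachable[v] = True
    for v in range(1, total + 2):
        if v > total or not reachable[v]:
            return v

print(smallest_impossible_sum([2, 1, 2, 7]))   # prints 6, as in the example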

How many numbers have a maximum number of unique prime factors in a given range

Note that the divisors have to be unique
So 32 has 1 unique prime factor [2], 40 has [2, 5] and so on.
Given a range [a, b], a, b <= 2^31, we should find how many numbers in this range have a maximum number of unique prime factors.
The best algorithm I can imagine is an improved Sieve of Eratosthenes, with an array counting how many prime factors a number has. But it is not only O(n), which is unacceptable with such a range, but also very inefficient in terms of memory.
What is the best algorithm to solve this question? Is there such an algorithm?
I'll write a first idea in Python-like pseudocode. First find out how many prime factors you may need at most:
p = 1
i = 0
while primes[i] * p <= b:
    p = p * primes[i]
    i = i + 1
This only used b, not a, so you may have to decrease the number of actual prime factors. But since the result of the above is at most 9 (as the product of the first 10 primes already exceeds 2^31), you can conceivably go down from this maximum one step at a time:
cnt = 0
while cnt == 0:
    cnt = count(i, 1, 0)
    i = i - 1
return cnt
So now we need to implement this function count, which I define recursively.
def count(numFactorsToGo, productSoFar, nextPrimeIndex):
    if numFactorsToGo > 0:
        cnt = 0
        while productSoFar * primes[nextPrimeIndex] <= b:
            cnt = cnt + count(numFactorsToGo - 1,
                              productSoFar * primes[nextPrimeIndex],
                              nextPrimeIndex + 1)
            nextPrimeIndex = nextPrimeIndex + 1
        return cnt
    else:
        return floor(b / productSoFar) - ceil(a / productSoFar) + 1
This function has two cases to distinguish. In the first case, you don't have the desired number of prime factors yet. So you multiply in another prime, which has to be larger than the largest prime already included in the product so far. You achieve this by starting at the given index for the next prime. You add the counts for all these recursive calls.
The second case is where you have reached the desired number of prime factors. In this case, you want to count all possible integers k such that a ≤ k∙p ≤ b. Which translates easily into ⌈a/p⌉ ≤ k ≤ ⌊b/p⌋ so the count would be ⌊b/p⌋ − ⌈a/p⌉ + 1. In an actual implementation I'd not use floating-point division and floor or ceil, but instead I'd make use of truncating integer division for the sake of performance. So I'd probably write this line as
return (b // productSoFar) - ((a - 1) // productSoFar + 1) + 1
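For reference, the pieces above assemble into the following runnable sketch (my own; the sieve bound and the demo range [a, b] are deliberately tiny, since the real constraints would need the much larger prime table discussed next):

from math import isqrt

a, b = 2, 1000          # demo range; the actual problem allows bounds up to 2^31

def sieve(limit):
    # simple Sieve of Eratosthenes returning the list of primes <= limit
    is_p = [True] * (limit + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, isqrt(limit) + 1):
        if is_p[i]:
            for j in range(i * i, limit + 1, i):
                is_p[j] = False
    return [i for i, flag in enumerate(is_p) if flag]

primes = sieve(b)       # enough primes for the demo bounds

def count(numFactorsToGo, productSoFar, nextPrimeIndex):
    if numFactorsToGo > 0:
        cnt = 0
        while (nextPrimeIndex < len(primes)
               and productSoFar * primes[nextPrimeIndex] <= b):
            cnt += count(numFactorsToGo - 1,
                         productSoFar * primes[nextPrimeIndex],
                         nextPrimeIndex + 1)
            nextPrimeIndex += 1
        return cnt
    # count the multiples of productSoFar inside [a, b]
    return b // productSoFar - (a - 1) // productSoFar

# maximum possible number of distinct prime factors for numbers <= b
p, i = 1, 0
while i < len(primes) and primes[i] * p <= b:
    p *= primes[i]
    i += 1

cnt = 0
while cnt == 0:         # back off until some number in [a, b] achieves the count
    cnt = count(i, 1, 0)
    i -= 1
print(i + 1, cnt)       # the maximal count, and how many numbers attain it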
As it is written now, you'd need the primes array precomputed up to 2^31, which would be a list of 105,097,565 numbers according to Wolfram Alpha. That will cause considerable memory requirements, and it will also make the outer loops (where productSoFar is still small) iterate over a large number of primes that won't be needed later on.
One thing you can do is change the end of loop condition. Instead of just checking that adding one more prime doesn't make the product exceed b, you can check whether including the next primesToGo primes in the product is possible without exceeding b. This will allow you to end the loop a lot earlier if the total number of prime factors is large.
For a small number of prime factors, things are still tricky. In particular, if you have a very narrow range [a, b], then the number with the maximal prime factor count might well be a large prime times a product of very small primes. Consider for example [2147482781, 2147482793]. This interval contains 4 elements with 4 distinct prime factors each, some of them quite large, namely
3 ∙ 5 ∙ 7 ∙ 20,452,217
2^2 ∙ 3 ∙ 11 ∙ 16,268,809
2 ∙ 5 ∙ 19 ∙ 11,302,541
2^3 ∙ 7 ∙ 13 ∙ 2,949,839
There are only 4,792 primes up to sqrt(2^31), the largest being 46,337 (which fits into a 16-bit unsigned integer). It would be possible to precompute only those, and use them to factor each number in the range. But that would again mean iterating over the range, which makes sense for small ranges but not for large ones.
So perhaps you need to distinguish these cases up front, and then choose the algorithm accordingly. I don't have a good idea of how to combine these ideas – yet. If someone else does, feel free to extend this post or write your own answer building on this.

ACM: The 3n + 1 [duplicate]

This question already has answers here:
Uva's 3n+1 problem
(8 answers)
Project Euler Question 14 (Collatz Problem)
(8 answers)
Closed 9 years ago.
The 3n + 1 problem
Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 18416 Accepted Submission(s): 6803
Problem Description
Problems in Computer Science are often classified as belonging to a certain class of problems (e.g., NP, Unsolvable, Recursive). In this problem you will be analyzing a property of an algorithm whose classification is not known for all possible inputs.
Consider the following algorithm:
1. input n
2. print n
3. if n = 1 then STOP
4. if n is odd then n <- 3n + 1
5. else n <- n / 2
6. GOTO 2
Given the input 22, the following sequence of numbers will be printed 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
It is conjectured that the algorithm above will terminate (when a 1 is printed) for any integral input value. Despite the simplicity of the algorithm, it is unknown whether this conjecture is true. It has been verified, however, for all integers n such that 0 < n < 1,000,000 (and, in fact, for many more numbers than this.)
Given an input n, it is possible to determine the number of numbers printed (including the 1). For a given n this is called the cycle-length of n. In the example above, the cycle length of 22 is 16.
For any two numbers i and j you are to determine the maximum cycle length over all numbers between i and j.
Input
The input will consist of a series of pairs of integers i and j, one pair of integers per line. All integers will be less than 1,000,000 and greater than 0.
You should process all pairs of integers and for each pair determine the maximum cycle length over all integers between and including i and j.
You can assume that no operation overflows a 32-bit integer.
Output
For each pair of input integers i and j you should output i, j, and the maximum cycle length for integers between and including i and j. These three numbers should be separated by at least one space with all three numbers on one line and with one line of output for each line of input. The integers i and j must appear in the output in the same order in which they appeared in the input and should be followed by the maximum cycle length (on the same line).
Sample Input
1 10
100 200
201 210
900 1000
Sample Output
1 10 20
100 200 125
201 210 89
900 1000 174
If you do it the plain way, you recompute every integer's cycle length for each test case.
Instead, you can compute the cycle length once for every number between 1 and 1,000,000 and store the results in an array.
Then build a sparse table dp[N][M] (the ST method), so each query can be answered in O(1).
Alternatively you can use a segment tree to get the answer, but I am not sure it won't get TLE...
for (int i = 1; i <= N; i++)
    dp[i][0] = cycle[i];                       // precomputed cycle length of i
for (int j = 1; (1 << j) <= N; j++)
    for (int i = 1; i + (1 << j) - 1 <= N; i++)
        dp[i][j] = max(dp[i][j-1], dp[i + (1 << (j-1))][j-1]);
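Even without the sparse table, memoizing cycle lengths across queries is usually enough for the classic limits; a rough Python sketch (my own, not from this answer; cycle_length is an assumed name):

import sys

cache = {1: 1}   # n -> cycle length, shared across all queries

def cycle_length(n):
    # walk the 3n+1 sequence until we hit something cached, then backfill
    path = []
    while n not in cache:
        path.append(n)
        n = 3 * n + 1 if n % 2 else n // 2
    length = cache[n]
    for m in reversed(path):
        length += 1
        cache[m] = length
    return length

for line in sys.stdin:
    if line.strip():
        i, j = map(int, line.split())
        lo, hi = min(i, j), max(i, j)
        print(i, j, max(cycle_length(x) for x in range(lo, hi + 1)))

Note the min/max: the classic version of this problem also allows i > j, while the pair must be echoed in its original order.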

What would cause an algorithm to have O(log n) complexity?

My knowledge of big-O is limited, and when log terms show up in the equation it throws me off even more.
Can someone maybe explain to me in simple terms what a O(log n) algorithm is? Where does the logarithm come from?
This specifically came up when I was trying to solve this midterm practice question:
Let X(1..n) and Y(1..n) contain two lists of integers, each sorted in nondecreasing order. Give an O(log n)-time algorithm to find the median (or the nth smallest integer) of all 2n combined elements. For example, if X = (4, 5, 7, 8, 9) and Y = (3, 5, 8, 9, 10), then 7 is the median of the combined list (3, 4, 5, 5, 7, 8, 8, 9, 9, 10). [Hint: use concepts of binary search]
I have to agree that it's pretty weird the first time you see an O(log n) algorithm... where on earth does that logarithm come from? However, it turns out that there are several different ways that you can get a log term to show up in big-O notation. Here are a few:
Repeatedly dividing by a constant
Take any number n; say, 16. How many times can you divide n by two before you get a number less than or equal to one? For 16, we have that
16 / 2 = 8
8 / 2 = 4
4 / 2 = 2
2 / 2 = 1
Notice that this ends up taking four steps to complete. Interestingly, we also have that log₂ 16 = 4. Hmmm... what about 128?
128 / 2 = 64
64 / 2 = 32
32 / 2 = 16
16 / 2 = 8
8 / 2 = 4
4 / 2 = 2
2 / 2 = 1
This took seven steps, and log₂ 128 = 7. Is this a coincidence? Nope! There's a good reason for this. Suppose that we divide a number n by 2 a total of i times. Then we get the number n / 2^i. If we want to solve for the value of i where this value is at most 1, we get

n / 2^i ≤ 1
n ≤ 2^i
log₂ n ≤ i

In other words, if we pick an integer i such that i ≥ log₂ n, then after dividing n in half i times we'll have a value that is at most 1. The smallest i for which this is guaranteed is roughly log₂ n, so if we have an algorithm that divides by 2 until the number gets sufficiently small, then we can say that it terminates in O(log n) steps.
An important detail is that it doesn't matter what constant you're dividing n by (as long as it's greater than one); if you divide by the constant k, it will take logₖ n steps to reach 1. Thus any algorithm that repeatedly divides the input size by some fraction will need O(log n) iterations to terminate. Those iterations might take a lot of time and so the net runtime needn't be O(log n), but the number of steps will be logarithmic.
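A tiny experiment (my own, not from the original answer) shows the step count tracking log₂ n:

from math import log2

def halving_steps(n):
    steps = 0
    while n > 1:
        n //= 2          # one "divide by a constant" step
        steps += 1
    return steps

for n in (16, 128, 1000):
    print(n, halving_steps(n), log2(n))   # the step count is about log2(n)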
So where does this come up? One classic example is binary search, a fast algorithm for searching a sorted array for a value. The algorithm works like this:
If the array is empty, return that the element isn't present in the array.
Otherwise:
Look at the middle element of the array.
If it's equal to the element we're looking for, return success.
If it's greater than the element we're looking for:
Throw away the second half of the array.
Repeat
If it's less than the element we're looking for:
Throw away the first half of the array.
Repeat
For example, to search for 5 in the array
1 3 5 7 9 11 13
We'd first look at the middle element:
1 3 5 7 9 11 13
^
Since 7 > 5, and since the array is sorted, we know for a fact that the number 5 can't be in the back half of the array, so we can just discard it. This leaves
1 3 5
So now we look at the middle element here:
1 3 5
^
Since 3 < 5, we know that 5 can't appear in the first half of the array, so we can throw away the first half of the array to leave
5
Again we look at the middle of this array:
5
^
Since this is exactly the number we're looking for, we can report that 5 is indeed in the array.
So how efficient is this? Well, on each iteration we're throwing away at least half of the remaining array elements. The algorithm stops as soon as the array is empty or we find the value we want. In the worst case, the element isn't there, so we keep halving the size of the array until we run out of elements. How long does this take? Well, since we keep cutting the array in half over and over again, we will be done in at most O(log n) iterations, since we can't cut the array in half more than O(log n) times before we run out of array elements.
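Here is the procedure above as a short Python sketch (my own, not from the original answer):

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:                 # the window [lo, hi] halves every pass
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid              # found it
        if arr[mid] > target:
            hi = mid - 1            # discard the upper half
        else:
            lo = mid + 1            # discard the lower half
    return -1                       # exhausted the array: not present

print(binary_search([1, 3, 5, 7, 9, 11, 13], 5))   # prints 2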
Algorithms following the general technique of divide-and-conquer (cutting the problem into pieces, solving those pieces, then putting the problem back together) tend to have logarithmic terms in them for this same reason - you can't keep cutting some object in half more than O(log n) times. You might want to look at merge sort as a great example of this.
Processing values one digit at a time
How many digits are in the base-10 number n? Well, if there are k digits in the number, then the biggest digit's place value is a multiple of 10^(k-1). The largest k-digit number is 999...9 (k nines), which is equal to 10^k - 1. Consequently, if we know that n has k digits in it, then we know that the value of n is at most 10^k - 1. If we want to solve for k in terms of n, we get

n ≤ 10^k - 1
n + 1 ≤ 10^k
log₁₀ (n + 1) ≤ k

From which we get that k is approximately the base-10 logarithm of n. In other words, the number of digits in n is O(log n).
For example, let's think about the complexity of adding two large numbers that are too big to fit into a machine word. Suppose that we have those numbers represented in base 10, and we'll call the numbers m and n. One way to add them is through the grade-school method - write the numbers out one digit at a time, then work from the right to the left. For example, to add 1337 and 2065, we'd start by writing the numbers out as
1 3 3 7
+ 2 0 6 5
==============
We add the last digit and carry the 1:
1
1 3 3 7
+ 2 0 6 5
==============
2
Then we add the second-to-last ("penultimate") digit and carry the 1:
1 1
1 3 3 7
+ 2 0 6 5
==============
0 2
Next, we add the third-to-last ("antepenultimate") digit:
1 1
1 3 3 7
+ 2 0 6 5
==============
4 0 2
Finally, we add the fourth-to-last ("preantepenultimate"... I love English) digit:
1 1
1 3 3 7
+ 2 0 6 5
==============
3 4 0 2
Now, how much work did we do? We do a total of O(1) work per digit (that is, a constant amount of work), and there are O(max{log n, log m}) total digits that need to be processed. This gives a total of O(max{log n, log m}) complexity, because we need to visit each digit in the two numbers.
Many algorithms get an O(log n) term in them from working one digit at a time in some base. A classic example is radix sort, which sorts integers one digit at a time. There are many flavors of radix sort, but they usually run in time O(n log U), where U is the largest possible integer that's being sorted. The reason for this is that each pass of the sort takes O(n) time, and there are a total of O(log U) iterations required to process each of the O(log U) digits of the largest number being sorted. Many advanced algorithms, such as Gabow's shortest-paths algorithm or the scaling version of the Ford-Fulkerson max-flow algorithm, have a log term in their complexity because they work one digit at a time.
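As a concrete digit-at-a-time example, here is a compact LSD radix sort sketch (my own, not from the original answer):

def radix_sort(nums, base=10):
    # one stable O(n) bucketing pass per digit; O(log U) passes in total,
    # where U is the largest value being sorted (non-negative ints only)
    exp = 1
    while nums and exp <= max(nums):
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // exp) % base].append(x)   # bucket by current digit
        nums = [x for bucket in buckets for x in bucket]
        exp *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))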
As to your second question about how you solve that problem, you may want to look at this related question which explores a more advanced application. Given the general structure of problems that are described here, you now can have a better sense of how to think about problems when you know there's a log term in the result, so I would advise against looking at the answer until you've given it some thought.
When we talk about big-Oh descriptions, we are usually talking about the time it takes to solve problems of a given size. And usually, for simple problems, that size is just characterized by the number of input elements, and that's usually called n, or N. (Obviously that's not always true-- problems with graphs are often characterized in numbers of vertices, V, and number of edges, E; but for now, we'll talk about lists of objects, with N objects in the lists.)
We say that a problem "is big-Oh of (some function of N)" if and only if:
There exist some constant c and some threshold N_0 such that for all N > N_0, the runtime of the algorithm is less than c times (some function of N).
In other words, don't think about small problems where the "constant overhead" of setting up the problem matters, think about big problems. And when thinking about big problems, big-Oh of (some function of N) means that the run-time is still always less than some constant times that function. Always.
In short, that function is an upper bound, up to a constant factor.
So, "big-Oh of log(n)" means the same thing that I said above, except "some function of N" is replaced with "log(n)."
So, your problem tells you to think about binary search, so let's think about that. Let's assume you have, say, a list of N elements that are sorted in increasing order. You want to find out if some given number exists in that list. One way to do that which is not a binary search is to just scan each element of the list and see if it's your target number. You might get lucky and find it on the first try. But in the worst case, you'll check N different times. This is not binary search, and it is not big-Oh of log(N) because there's no way to force it into the criteria we sketched out above.
You can pick that arbitrary constant to be c=10, and if your list has N=32 elements, you're fine: 10*log(32) = 50, which is greater than the runtime of 32. But if N=64, 10*log(64) = 60, which is less than the runtime of 64. You can pick c=100, or 1000, or a gazillion, and you'll still be able to find some N that violates that requirement. In other words, there is no N_0.
If we do a binary search, though, we pick the middle element, and make a comparison. Then we throw out half the numbers, and do it again, and again, and so on. If your N=32, you can only do that about 5 times, which is log(32). If your N=64, you can only do this about 6 times, etc. Now you can pick that arbitrary constant c, in such a way that the requirement is always met for large values of N.
With all that background, what O(log(N)) usually means is that you have some way to do a simple thing, which cuts your problem size in half. Just like the binary search is doing above. Once you cut the problem in half, you can cut it in half again, and again, and again. But, critically, what you can't do is some preprocessing step that would take longer than that O(log(N)) time. So for instance, you can't shuffle your two lists into one big list, unless you can find a way to do that in O(log(N)) time, too.
(NOTE: Nearly always, Log(N) means log-base-two, which is what I assume above.)
In the following solution, every line with a recursive call operates on
half of the given sizes of the sub-arrays of X and Y.
The other lines run in constant time.
The recurrence is T(2n) = T(2n/2) + c = T(n) + c, which solves to O(log(2n)) = O(log n).
You start with MEDIAN(X, 1, n, Y, 1, n).
MEDIAN(X, p, r, Y, i, k)
    if X[r] < Y[i]
        return X[r]
    if Y[k] < X[p]
        return Y[k]
    q = floor((p+r)/2)
    j = floor((i+k)/2)
    if r-p+1 is even
        if X[q+1] > Y[j] and Y[j+1] > X[q]
            if X[q] > Y[j]
                return X[q]
            else
                return Y[j]
        if X[q+1] < Y[j-1]
            return MEDIAN(X, q+1, r, Y, i, j)
        else
            return MEDIAN(X, p, q, Y, j+1, k)
    else
        if X[q] > Y[j] and Y[j+1] > X[q-1]
            return Y[j]
        if Y[j] > X[q] and X[q+1] > Y[j-1]
            return X[q]
        if X[q+1] < Y[j-1]
            return MEDIAN(X, q, r, Y, i, j)
        else
            return MEDIAN(X, p, q, Y, j, k)
The Log term pops up very often in algorithm complexity analysis. Here are some explanations:
1. How do you represent a number?
Let's take the number X = 245436. This notation of “245436” has implicit information in it. Making that information explicit:
X = 2 * 10 ^ 5 + 4 * 10 ^ 4 + 5 * 10 ^ 3 + 4 * 10 ^ 2 + 3 * 10 ^ 1 + 6 * 10 ^ 0
Which is the decimal expansion of the number. So, the minimum amount of information we need to represent this number is 6 digits. This is no coincidence, as any number less than 10^d can be represented in d digits.
So how many digits are required to represent X? That's equal to the largest exponent of 10 in X plus 1.
==> 10^d > X
==> log(10^d) > log(X)
==> d * log(10) > log(X)
==> d > log(X) // and log appears again...
==> d = floor(log(X)) + 1
Also note that this is the most concise way to denote the number in this range. Any reduction will lead to information loss, as a missing digit can be mapped to 10 other numbers. For example: 12* can be mapped to 120, 121, 122, …, 129.
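As a quick numeric check of d = floor(log(X)) + 1 (a tiny sketch of my own):

from math import floor, log10

for x in (9, 10, 245436):
    print(x, len(str(x)), floor(log10(x)) + 1)   # the two digit counts agree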
2. How do you search for a number in (0, N - 1)?
Taking N = 10^d, we use our most important observation:
The minimum amount of information to uniquely identify a value in a range between 0 to N - 1 = log(N) digits.
This implies that, when asked to search for a number on the integer line, ranging from 0 to N - 1, we need at least log(N) tries to find it. Why? Any search algorithm will need to choose one digit after another in its search for the number.
The minimum number of digits it needs to choose is log(N). Hence the minimum number of operations taken to search for a number in a space of size N is log(N).
Can you guess the order complexities of binary search, ternary search or deca search? It's O(log(N))!
3. How do you sort a set of numbers?
When asked to sort a set of numbers A into an array B, here’s what it looks like ->
Permute Elements
Every element in the original array has to be mapped to its corresponding index in the sorted array. So, for the first element, we have n positions. To correctly find the corresponding index in this range from 0 to n - 1, we need… log(n) operations.
The next element needs log(n-1) operations, the next log(n-2) and so on. The total comes to be:
==> log(n) + log(n - 1) + log(n - 2) + … + log(1)
Using log(a) + log(b) = log(a * b), the total is log(n!). This can be approximated to n*log(n) - n (Stirling's approximation), which is O(n*log(n))!
Hence we conclude that no comparison-based sorting algorithm can do better than O(n*log(n)). And some algorithms having this complexity are the popular Merge Sort and Heap Sort!
These are some of the reasons why we see log(n) pop up so often in the complexity analysis of algorithms. The same can be extended to binary numbers. I made a video on that here.
Cheers!
We call the time complexity O(log n) when the solution is based on iterations over n in which the work done in each iteration is a fraction of the work in the previous iteration, as the algorithm works towards the solution.
Can't comment yet... necro it is!
Avi Cohen's answer is incorrect, try:
X = 1 3 4 5 8
Y = 2 5 6 7 9
None of the conditions are true, so MEDIAN(X, p, q, Y, j, k) will cut both the fives. These are nondecreasing sequences, not all values are distinct.
Also try this even-length example with distinct values:
X = 1 3 4 7
Y = 2 5 6 8
Now MEDIAN(X, p, q, Y, j+1, k) will cut the four.
Instead I offer this algorithm, call it with MEDIAN(1,n,1,n):
MEDIAN(startx, endx, starty, endy) {
    if (startx == endx)
        return min(X[startx], Y[starty])
    odd = (startx + endx) % 2   // 0 if even, 1 if odd
    m = (startx + endx - odd) / 2
    n = (starty + endy - odd) / 2
    x = X[m]
    y = Y[n]
    if (x == y)
        // then there are n-2{+1} total elements smaller than or equal to both x and y,
        // so this value is the nth smallest:
        // we have found the median
        return x
    if (x < y)
        // if we remove some numbers smaller than the median,
        // and remove the same amount of numbers bigger than the median,
        // the median will not change.
        // we know the elements before x are smaller than the median,
        // and the elements after y are bigger than the median,
        // so we discard these and continue the search:
        return MEDIAN(m, endx, starty, n + 1 - odd)
    else // (x > y)
        return MEDIAN(startx, m + 1 - odd, n, endy)
}
