How do you calculate big O on a function with a hard limit?

As part of a programming assignment I saw recently, students were asked to find the big O value of their function for solving a puzzle. I was bored, and decided to write the program myself. However, my solution uses a pattern I saw in the problem to skip large portions of the calculations.
Big O shows how the time increases based on a scaling n, but as n scales, once it reaches the resetting of the pattern, the time it takes resets back to low values as well. My thought was that it was O(nlogn % k) when k+1 is when it resets. Another thought is that as it has a hard limit, the value is O(1), since that is big O of any constant. Is one of those right, and if not, how should the limit be represented?
As an example of the reset, the k value is 31336.
At n=31336, it takes 31336 steps but at n=31337, it takes 1.
The code is:
    def Entry(a1, q):
        F = [a1]
        lastnum = a1
        q1 = q % 31336
        rows = (q / 31336)
        for i in range(1, q1):
            lastnum = (lastnum * 31334) % 31337
            F.append(lastnum)
        F = MergeSort(F)
        print lastnum * rows + F.index(lastnum) + 1
MergeSort is a standard merge sort with O(nlogn) complexity.

It's O(1), and you can derive this from big O's definition. If f(x) is the complexity of your solution, then:
    |f(x)| <= M * |g(x)|
with
    g(x) = 1
and with any M > 470040 (that is n log n for n = 31336) and x > 0. By the definition, this implies that:
    f(x) = O(1)
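As a quick sanity check, the constant 470040 appears to be 31336 multiplied by log2(31336) rounded up to 15. That reading is an assumption about how the answer arrived at the number, and the variable names below are illustrative only.

```python
import math

K = 31336  # reset period quoted in the question

# Assumption: the bound is K * ceil(log2(K)), i.e. n log n with the log rounded up.
levels = math.ceil(math.log2(K))
bound = K * levels

print(levels, bound)
```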

Well, an easy way that I use to think about big-O problems is to think of n as so big it may as well be infinity. If you don't get particular about byte-level operations on very big numbers (the cost of computing q % 31336 grows with the size of q as q goes to infinity, so it is not actually constant), then your intuition is right about it being O(1).
Imagining q as close to infinity, you can see that q % 31336 is obviously between 0 and 31335, as you noted. This fact limits the number of array elements, which limits the sort time to be some constant amount (n * log(n) ==> 31335 * log(31335) * C, for some constant C). So it is constant time for the whole algorithm.
But, in the real world, multiplication, division, and modulus all do scale based on input size. You can look up Karatsuba algorithm if you are interested in figuring that out. I'll leave it as an exercise.
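Setting the cost of big-number arithmetic aside, the bound on the loop can be checked directly. The sketch below only counts iterations of the `for i in range(1, q1)` loop from the question; `loop_iterations` is a hypothetical helper, not part of the original code.

```python
K = 31336  # reset period taken from the question

def loop_iterations(q, k=K):
    """Number of iterations of `for i in range(1, q % k)` in Entry."""
    return len(range(1, q % k))

# However large q gets, the count stays at or below k - 2.
for q in (10, K - 1, K, K + 1, 10**9, 10**18):
    print(q, loop_iterations(q))
```

Because the list F never holds more than k - 1 elements, the merge sort that follows is bounded by a constant as well.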

If there are a few different instances of this problem, each with its own k value, then the complexity of the method is not O(1), but instead O(k·ln k).

Related

Binary vs Linear searches for unsorted N elements

I'm trying to understand a formula for when we should use quicksort. For instance, we have an array with N = 1_000_000 elements. If we search only once, we should use a simple linear search, but if we search 10 times, we should sort the array first (O(n log n)). How can I find the threshold, i.e. for how many searches and for which input size I should sort first and then use binary search?
You want to solve an inequality that might roughly be described as
t * n > C * n * log(n) + t * log(n)
where t is the number of checks and C is some constant for the sort implementation (it should be determined experimentally). Once you have evaluated this constant, you can solve the inequality numerically (with some uncertainty, of course).
Like you already pointed out, it depends on the number of searches you want to do. A good threshold can come out of the following statement:
n*log[b](n) + x*log[2](n) <= x*n/2
where x is the number of searches, n the input size, and b the base of the logarithm for the sort, depending on the partitioning you use.
When this statement evaluates to true, you should switch methods from linear search to sort and search.
Generally speaking, a linear search through an unordered array will take n/2 steps on average, though this average will only play a big role once x approaches n. If you want to stick with big Omicron or big Theta notation then you can omit the /2 in the above.
Assuming n elements and m searches, with crude approximations
the cost of the sort will be C0.n.log n,
the cost of the m binary searches C1.m.log n,
the cost of the m linear searches C2.m.n,
with C2 ~ C1 < C0.
Now you compare
C0.n.log n + C1.m.log n vs. C2.m.n
or
C0.n.log n / (C2.n - C1.log n) vs. m
For reasonably large n, the breakeven point is about C0.log n / C2.
For instance, taking C0 / C2 = 5, n = 1000000 gives m = 100.
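The worked example can be reproduced with a few lines. This is a sketch under stated assumptions: the logarithm is taken base 2, and C1 = C2 = 1 with C0 = 5 are illustrative values chosen to match C0 / C2 = 5 and C2 ~ C1; the function name is made up.

```python
import math

def break_even_searches(n, c0, c1, c2):
    """m at which C0*n*log n + C1*m*log n equals C2*m*n (log base 2 assumed)."""
    log_n = math.log2(n)
    return c0 * n * log_n / (c2 * n - c1 * log_n)

m = break_even_searches(1_000_000, c0=5.0, c1=1.0, c2=1.0)
print(round(m, 2))
```

For large n the C1 term in the denominator barely matters, which is why the shortcut C0.log n / C2 gives nearly the same value.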
You should plot the complexities of both operations.
Linear search: O(n)
Sort and binary search: O(nlogn + logn)
In the plot, you will see for which values of n it makes sense to choose the one approach over the other.
This actually turned into an interesting question for me as I looked into the expected runtime of a quicksort-like algorithm when the expected split at each level is not 50/50.
The first question I wanted to answer was: for random data, what is the average split at each level? It surely must be greater than 50% (for the larger subdivision). Well, given an array of size N of random values, the smallest value has a subdivision of (1, N-1), the second smallest value has a subdivision of (2, N-2), and so on. I put this in a quick script:
    split = 0
    for x in range(10000):
        split += float(max(x, 10000 - x)) / 10000
    split /= 10000
    print split
And got exactly 0.75 as an answer. I'm sure I could show that this is always the exact answer, but I wanted to move on to the harder part.
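The same average can be recomputed with exact rational arithmetic, so that floating-point rounding cannot hide a small deviation. The function name is made up for illustration, and only the even sizes listed are checked; this is a check, not a proof.

```python
from fractions import Fraction

def average_larger_share(n):
    """Mean of max(x, n - x) / n over pivot ranks x = 0 .. n - 1, as an exact fraction."""
    total = sum(Fraction(max(x, n - x), n) for x in range(n))
    return total / n

for n in (10, 100, 10000):
    print(n, average_larger_share(n))
```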
Now, let's assume that even a 25/75 split follows an nlogn progression for some unknown logarithm base. That means that num_comparisons(n) = n * log_b(n), and the question is to find b via statistical means (since I don't expect that model to be exact at every step). We can do this with a clever application of least-squares fitting after we use a logarithm identity to get:
C(n) = n * log(n) / log(b)
where now the logarithm can have any base, as long as log(n) and log(b) use the same base. This is a linear equation just waiting for some data! So I wrote another script to generate an array xs filled with n*log(n) and an array ys filled with C(n), and used numpy to tell me the slope of that least-squares fit, which I expect to equal 1 / log(b). I ran the script and got b inside of [2.16, 2.3] depending on how high I set n (I varied n from 100 to 100'000'000). The fact that b seems to vary depending on n shows that my model isn't exact, but I think that's okay for this example.
To actually answer your question now, with these assumptions, we can solve for the cutoff point of when: N * n/2 = n*log_2.3(n) + N * log_2.3(n). I'm just assuming that the binary search will have the same logarithm base as the sorting method for a 25/75 split. Isolating N you get:
N = n*log_2.3(n) / (n/2 - log_2.3(n))
If your number of searches N exceeds the quantity on the RHS (where n is the size of the array in question) then it will be more efficient to sort once and use binary searches on that.
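To get a feel for the numbers, the right-hand side can be evaluated for a few array sizes. This sketch simply plugs into the formula above; the base 2.3 comes from the fit described in this answer, and the function name is an illustrative choice.

```python
import math

def break_even_searches(n, base=2.3):
    """Right-hand side n*log_b(n) / (n/2 - log_b(n)) from the formula above."""
    log_n = math.log(n) / math.log(base)
    return n * log_n / (n / 2 - log_n)

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, round(break_even_searches(n), 2))
```

Under this model the threshold grows only logarithmically with the array size.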

How do I prove that this algorithm is O(loglogn)

How do I prove that this algorithm is O(loglogn)
    i <-- 2
    while i < n
        i <-- i*i
Well, I believe we should first start with n / 2^k < 1, but that will yield O(logn). Any ideas?
I want to look at this in a simple way: what happens after one iteration, after two iterations, and after k iterations. I think this way I'll be able to understand better how to compute this correctly. What do you think about this approach? I'm new to this, so excuse me.
Let us use the name A for the presented algorithm. Let us further assume that the input variable is n.
Then, strictly speaking, A is not in the runtime complexity class O(log log n). A must be in Ω(log n), i.e. in terms of runtime complexity, it is at least linear in the number of bits of n. Why? There is i*i, a multiplication whose operand i grows until it reaches n. A naive multiplication approach requires time quadratic in the operand length. More sophisticated approaches reduce the exponent, but not below linear in the operand length.
For the sake of completeness, the comparison < is also linear in the operand length.
For the purpose of the question, we can assume that multiplication and comparison are done in constant time. Then we can formulate the question: how often do we have to apply the constant-time operations < and * until A terminates for a given n?
Simply put, the squaring cuts the number of steps down logarithmically, and applying it repeatedly cuts it down logarithmically once more. How can we show this? Thanks to the simple structure of A, we can transform A into an equation that we can solve directly.
A squares i and does this repeatedly. Therefore, after k iterations, A has calculated i = 2^(2^k). When is 2^(2^k) = n? To solve this for k, we apply the logarithm (base 2) twice, i.e., ignoring the bases, we get k = log log n. The difference between < and = can be ignored due to the O notation.
To answer the very last part of the original question, we can also look at examples for each iteration. We can note the state of i at the end of the while loop body for each iteration of the while loop:
1: i = 4 = 2^2 = 2^(2^1)
2: i = 16 = 4*4 = (2^2)*(2^2) = 2^(2^2)
3: i = 256 = 16*16 = 4*4*4*4 = (2^2)*(2^2)*(2^2)*(2^2) = 2^(2^3)
4: i = 65536 = 256*256 = 16*16*16*16 = ... = 2^(2^4)
...
k: i = ... = 2^(2^k)
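The iteration table can also be checked by running the loop and counting. This is a small sketch that treats each multiplication and comparison as one step, as assumed above; `squaring_iterations` is a name chosen for illustration.

```python
def squaring_iterations(n):
    """Count how many times the loop body `i <- i*i` runs for a given n."""
    i = 2
    count = 0
    while i < n:
        i = i * i
        count += 1
    return count

for n in (3, 4, 5, 16, 17, 65536, 65537, 2**64):
    print(n, squaring_iterations(n))
```

The count goes up by one only when n passes the next value of the form 2^(2^k), which is the log log n behaviour.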

Calculation of log for efficiency in math

Hello, I am weak in maths, but I am trying to solve the problem below. Am I doing it correctly?
Given: decide whether A is big O, Omega, or Theta of B.
The question is:
A = n^3 + n * log(n);
B = n^3 + n^2 * log(n);
As an example, I take n=2.
A = 2^3 + 2*log(2) => 8.6
B = 2^3 + 2^2*log(2) => 9.2
So A is a lower bound of B.
I have other questions as well, but I just need to confirm whether the method I am applying is correct, or whether there is another way to do it.
Am I doing this right? Thanks in advance.
The idea behind big O notation is to compare long-term behaviour. Your idea (inserting n=2) reveals whether A or B is larger for small values of n. However, O is all about large values. Part of the problem is to figure out what a large value is.
One way to get a feel of the problem is to make a table of A and B for larger and larger values of n:
A B
n=10
n=100
n=1000
n=10000
n=100000
n=1000000
The first entry in the table is A for n=10: A=10^3 + 10*log(10) = 1000+10*1 = 1010.
The next thing to do, is to draw graphs of A and B in the same coordinate system. Can you spot any long term relation between the two?
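The table can be filled in with a short script. The sketch assumes a base-10 logarithm, matching the worked entry log(10) = 1; the extra A/B column is an addition that makes the long-term relation easier to spot.

```python
import math

def A(n):
    return n**3 + n * math.log10(n)

def B(n):
    return n**3 + n**2 * math.log10(n)

print(f"{'n':>8} {'A':>22} {'B':>22} {'A/B':>8}")
for exp in range(1, 7):
    n = 10**exp
    print(f"{n:>8} {A(n):>22.1f} {B(n):>22.1f} {A(n) / B(n):>8.4f}")
```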
A / B = (n^3 + n*log(n)) / (n^3 + n^2*log(n)) = (1 + log(n)/n^2) / (1 + log(n)/n)
Since log(n)/n and log(n)/n^2 both have limit zero as n tends to infinity, the expressions 1+log(n)/n and 1+log(n)/n^2 in the reduced quotient A/B are bounded on both sides, away from zero. For instance, there is a threshold N such that both expressions fall into the interval [1/2, 3/2] for all n > N. This means that all three relations hold: A is O(B), Omega(B), and Theta(B).

Computational complexity of simple algorithm

I have simple algorithm, something like
    h = SHA1(message)
    r = a^b mod p
    r = h * r mod p
    l = Str1 || Str2
    if ( l == r)
        return success
    else
        return false
Now I want to compute its complexity, but I don't know how to do it. For example, I don't know how the multiplication is done, so I don't see how to count it. Should I assume the worst case, O(n^2), or the best case, or the average case? Maybe I should look at it from another angle?
Additionally, the numbers are kept as byte arrays.
If you want to know the complexity of this algorithm, you just have to add up the complexities of the operations you use.
sha1(message) has a complexity depending on the length m of the message, so let's say poly(m), since I don't know the complexity of SHA-1.
a^b mod p can be done in O(log b) multiplications.
h * r mod p is exactly one multiplication.
Str1 || Str2: is this bitwise or? If yes, it will take O(s), where s is the length of Str1.
l == r will take as many comparisons as the length of the byte array. This will also be s.
When the numbers are really big, they cannot be multiplied in one processor step, so the complexity of one multiplication will be in O(log p), since log p is the length of the numbers.
All together you get O(poly(m) + log(b) ⋅ log(p) + s).
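The O(log b) multiplication count for a^b mod p comes from square-and-multiply. Below is a minimal sketch that counts modular multiplications; `mod_pow` is a hypothetical helper written for illustration (Python's built-in `pow(a, b, p)` computes the same result), and the sample numbers are arbitrary.

```python
def mod_pow(a, b, p):
    """Return (a**b % p, number of modular multiplications used)."""
    result = 1 % p
    base = a % p
    mults = 0
    while b > 0:
        if b & 1:                      # this bit of the exponent is set
            result = (result * base) % p
            mults += 1
        base = (base * base) % p       # square once per exponent bit
        mults += 1
        b >>= 1
    return result, mults

value, mults = mod_pow(7, 2**100 + 12345, 1_000_003)
print(value, mults)
```

Each bit of b costs at most two multiplications, so the count is at most twice the bit length of b.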
Notice: If the lengths of the numbers (log(b) and log(p)) never change, this part will be constant. This also holds for s.
You said the numbers are 256 bits long, so the complexity is only O(poly(m)), which is the complexity of the SHA-1 algorithm.
Notice: If you have an algorithm with any complexity, and you only use input of a fixed length, the complexity will always be constant. Complexity is a tool to see how the runtime will grow if the input grows. If the input does not grow, the runtime will not either.
If your input always has a fixed length, then you are more interested in the performance of the implementation of an algorithm.

Finding time complexity of partition by quick sort method

Here is an algorithm for finding kth smallest number in n element array using partition algorithm of Quicksort.
    small(a,i,j,k)
    {
        if(i==j) return(a[i]);
        else
        {
            m=partition(a,i,j);
            if(m==k) return(a[m]);
            else
            {
                if(m>k) return small(a,i,m-1,k);
                else return small(a,m+1,j,k);
            }
        }
    }
Where i and j are the starting and ending indices of the array (j - i + 1 = n, the number of elements in the array) and k selects the kth smallest number to be found.
I want to know the best case and the average case of the above algorithm, with a brief explanation of how to get them. I know we should not count the termination condition in the best case, and that the partition algorithm takes O(n). I do not want asymptotic notation but an exact mathematical result, if possible.
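For experimenting with the routine, here is a Python sketch of it. The question does not show `partition`, so a Lomuto partition around the last element is assumed here; k is treated as a 0-based index into the array, and the recursion is written as a loop.

```python
def partition(a, i, j):
    """Lomuto partition of a[i..j] around a[j]; returns the pivot's final index."""
    pivot = a[j]
    store = i
    for idx in range(i, j):
        if a[idx] < pivot:
            a[idx], a[store] = a[store], a[idx]
            store += 1
    a[store], a[j] = a[j], a[store]
    return store

def small(a, i, j, k):
    """Return the element that would sit at index k if a[i..j] were sorted."""
    while i < j:
        m = partition(a, i, j)
        if m == k:
            return a[m]
        if m > k:
            j = m - 1      # continue in the left part
        else:
            i = m + 1      # continue in the right part
    return a[i]

print(small([9, 2, 7, 4, 1], 0, 4, 2))
```

Note that `small` reorders the list it is given, and that each call to `partition` scans the whole current range.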
First of all, I'm assuming the array is sorted - something you didn't mention - because that code wouldn't otherwise work. And, well, this looks to me like a regular binary search.
Anyway...
The best case scenario is when either the array is one element long (you return immediately because i == j), or, for large values of n, if the middle position, m, is the same as k; in that case, no recursive calls are made and it returns immediately as well. That makes it O(1) in best case.
For the general case, consider that T(n) denotes the time taken to solve a problem of size n using your algorithm. We know that:
T(1) = c
T(n) = T(n/2) + c
Where c is a constant time operation (for example, the time to compare if i is the same as j, etc.). The general idea is that to solve a problem of size n, we consume some constant time c (to decide if m == k, if m > k, to calculate m, etc.), and then we consume the time taken to solve a problem of half the size.
Expanding the recurrence can help you derive a general formula, although it is pretty intuitive that this is O(log(n)):
T(n) = T(n/2) + c = T(n/4) + c + c = T(n/8) + c + c + c = ... = T(1) + c*log(n) = c*(log(n) + 1)
That should be the exact mathematical result. The algorithm runs in O(log(n)) time. An average case analysis is harder because you need to know the conditions in which the algorithm will be used. What is the typical size of the array? The typical size of k? What is the most likely position for k in the array? If it's in the middle, for example, the average case may be O(1). It really depends on how you use this.
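The closed form can be checked against the recurrence for powers of two. This sketch only verifies the algebra of T(n) = T(n/2) + c as written, taking the logarithm base 2; it does not model the cost of the partition step itself.

```python
def T(n, c=1):
    """T(1) = c, T(n) = T(n/2) + c, evaluated for n a power of two."""
    if n == 1:
        return c
    return T(n // 2, c) + c

for k in range(0, 21, 5):
    n = 2**k
    # Columns: n, value from the recurrence, closed form c*(log2(n) + 1) with c = 1.
    print(n, T(n), k + 1)
```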
