Algorithm with O(log n) and Θ(log n) time complexity

Suppose we have two algorithms: one has O(f(x)) time complexity and the other has Θ(f(x)) time complexity. Which one should we prefer to solve our problem, and why?

There is insufficient information given to decide which algorithm is preferable. It's possible that the first algorithm is preferable, it's possible that both are equally preferable, and it's even possible the second is preferable if they are asymptotically equal but the second has a lower constant factor.
Consider the fact that binary search is O(n) because big-O only gives an upper bound, whereas linear search is Θ(n). Binary search is preferable, because it is asymptotically more efficient.
Consider linear search, which is O(n), and... linear search, which is Θ(n). Both are equally preferable because they are literally the same.
Consider bubble sort, which is O(n²), and insertion sort, which is Θ(n²). Insertion sort does on average ~n²/4 comparisons, whereas bubble sort does on average ~n²/2 comparisons, which is twice as many; so insertion sort is preferable.
So as you can see, it's not possible to say without more information.
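To make the constant-factor point concrete, here is a small counting experiment (a sketch; the function names are mine, and the exact counts vary with the input, but the ~n²/2 vs. ~n²/4 averages show through):

import random

def bubble_sort_comparisons(a):
    # Basic bubble sort; returns the number of element comparisons made.
    a = a[:]
    count = 0
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

def insertion_sort_comparisons(a):
    # Basic insertion sort; returns the number of element comparisons made.
    a = a[:]
    count = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return count

data = [random.random() for _ in range(1000)]
print(bubble_sort_comparisons(data))     # always n(n-1)/2 = 499500 here
print(insertion_sort_comparisons(data))  # ~n^2/4 = 250000 on average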

Let's try to compare the algorithms:
The first algorithm has O(n log n) time complexity, which means that its execution time t1 satisfies
t1 <= k1 * n * log(n) + o(n * log(n))
The second algorithm is Θ(n log n), which is why
t2 = k2 * n * log(n) + o(n * log(n))
Assuming that n is large enough that we can neglect the o(n * log(n)) terms, we still have two possibilities here:
t1 = o(n * log(n)) (the O bound is not tight)
t1 = k1 * n * log(n) (the O bound is tight, at least for some worst case)
In the first case we should prefer algorithm 1 for large n, since it has a shorter execution time when n is large enough.
In the second case we have to compare the unknown constants k1 and k2, and we do not have enough information to choose between the 1st and 2nd algorithms.
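To make those cases concrete with some made-up running times: if t1 = n (the O(n log n) bound was simply loose), algorithm 1 wins outright for large n; if t1 = 2 * n * log(n) and t2 = 10 * n * log(n), both bounds are tight and algorithm 1 is five times faster; if t1 = 10 * n * log(n) and t2 = 2 * n * log(n), algorithm 2 is five times faster. The problem statement rules out none of these.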

Related

Measuring the Time Complexity of Insertion And Merge Sort Hybrid

I have a very basic implementation of a merge sort and insertion sort hybrid that involves a threshold below which insertion sort is used on the sub-arrays, where merge and insertion_sort are the most basic, widely available implementations:
def hybrid_sort(array: list, threshold: int = 10):
    if len(array) > 1:
        mid = len(array) // 2
        left = array[:mid]
        right = array[mid:]
        if len(array) > threshold:
            # recurse, passing the threshold down so it applies at every level
            hybrid_sort(left, threshold)
            hybrid_sort(right, threshold)
            merge(array, left, right)
        else:
            insertion_sort(array)
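(For reference, the snippet assumes merge and insertion_sort helpers that aren't shown; here is a minimal sketch of the standard textbook versions:)

def merge(array, left, right):
    # Two-way merge of the sorted halves left and right back into array.
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            array[k] = left[i]
            i += 1
        else:
            array[k] = right[j]
            j += 1
        k += 1
    while i < len(left):
        array[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        array[k] = right[j]
        j += 1
        k += 1

def insertion_sort(array):
    # In-place insertion sort.
    for i in range(1, len(array)):
        key = array[i]
        j = i - 1
        while j >= 0 and array[j] > key:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key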
Unless I am completely misunderstanding, this would mean that we have a recurrence relation for this particular piece of code, generalized as:
T(n) = 2T(n/2) + O(n^2)
The first term shows up for merge sort, and the second is the insertion sort operations.
By the master theorem, n raised to log_b(a) would equal n in this case, because you'd have n raised to the log_2(2) which is 1, so n^1 = n.
Then, our f(n) = n^2, which is 'larger' than n, so by case 3 of the master theorem my algorithm above would be Θ(f(n)), i.e. O(n^2), because f(n) is bounded from below by n.
This doesn't seem right to me, considering we know merge sort is O(n log(n)), and I'm having a hard time wrapping my head around this. I think it's because I've not yet analyzed an algorithm that has a conditional 'if' check.
Can anyone illuminate this for me?
Unless the threshold itself depends on n, the insertion sort part does not matter at all; this has the same complexity as a normal merge sort. The recurrence above is where the mistake is: insertion sort never runs on an array of size n, only on sub-arrays of size at most threshold, so its cost per call is O(threshold^2) = O(1). The recurrence is really T(n) = 2T(n/2) + O(n) for n > threshold, with T(n) = O(1) for n <= threshold.
Keep in mind that the time complexity of an algorithm that takes an input of size n is a function of n that is generally difficult to compute exactly, and so we focus on the asymptotic behavior of that function instead. This is where the big O notation comes into play.
In your case, as long as threshold is a constant, this means that as n grows, threshold becomes insignificant and all the insertion sorts can just be grouped up as a constant factor, making the overall complexity O((n-threshold) * log(n-threshold) * f(threshold)), where f(threshold) is a constant. So it simplifies to O(n log n), the complexity of merge sort.
Here's a different perspective that might help give some visibility into what's happening.
Let's suppose that once the array size reaches k, you switch from merge sort to insertion sort. We want to work out the time complexity of this new approach. To do so, we'll imagine the "difference" between the old algorithm and the new algorithm. Specifically, if we didn't make any changes to the algorithm, merge sort would take time Θ(n log n) to complete. However, once we get to arrays of size k, we stop running mergesort and instead use insertion sort. Therefore, we'll make some observations:
There are Θ(n / k) subarrays of the original array of size k.
We are skipping calling mergesort on all these arrays. Therefore, we're avoiding doing Θ(k log k) work for each of Θ(n / k) subarrays, so we're avoiding doing Θ(n log k) work.
Instead, we're insertion-sorting each of those subarrays. Insertion sort, in the worst case, takes time O(k²) when run on an array of size k. There are Θ(n / k) of those arrays, so we're adding in a factor of O(nk) total work.
Overall, this means that the work we're doing in this new variant is O(n log n) - O(n log k) + O(nk). Dialing k up or down will change the total amount of work done. If k is a fixed constant (that is, k = O(1)), this simplifies to
O(n log n) - O(n log k) + O(nk)
= O(n log n) - O(n) + O(n)
= O(n log n)
and the asymptotic runtime is the same as that of regular merge sort.
It's worth noting that as k gets larger, eventually the O(nk) term will dominate the O(n log k) term, so there's some crossover point past which increasing k starts increasing the runtime. You'd have to do some experimentation to fine-tune when to make the switch. But empirically, setting k to some modest value will indeed give you a big performance boost.
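For example, a rough timing harness along these lines (a sketch; it assumes the hybrid_sort from the question above, with the threshold passed through the recursion) lets you scan candidate values of k:

import random
import time

def time_hybrid(threshold, n=100_000, trials=3):
    # Best-of-trials wall-clock time for sorting n random floats.
    best = float("inf")
    for _ in range(trials):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        hybrid_sort(data, threshold)
        best = min(best, time.perf_counter() - start)
    return best

for k in (1, 4, 8, 16, 32, 64, 128):
    print(f"k={k:4d}: {time_hybrid(k):.3f}s")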

What is the complexity of code that does O(n*log n) work, and then O(n^2) work?

I have an algorithm that first does something in O(n*log(n)) time and then does something else in O(n^2) time. Am I correct that the total complexity would be
O(n*log(n) + n^2)
= O(n*(log(n) + n))
= O(n^2)
since log(n) + n is dominated by the n term?
The statement is correct, as O(n log n) is a subset of O(n^2); however, a formal proof would consist of choosing and constructing suitable constants.
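For instance, such a proof might go: for all n >= 1 we have log(n) <= n, so n*log(n) + n^2 <= n*n + n^2 = 2*n^2. Choosing c = 2 and n0 = 1 witnesses n*log(n) + n^2 = O(n^2); and since n^2 <= n*log(n) + n^2 trivially, the sum is in fact Θ(n^2).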
If the call probability of both parts is equal, then you are right. But if the probabilities are not equal, you have to do an amortized analysis in which you spread the rare expensive calls (n²) over the many fast calls (n log(n)).
For quicksort, for example (which generally takes n log(n) but rarely takes n²), you can prove that the average running time is n log(n) because of such an amortized analysis.
One of the rules of complexity analysis is that you must remove the terms with lower growth rates and the constant factors.
n log n vs. n^2 (divide both by n):
log n vs. n
log n grows more slowly than n, so you can remove it from the complexity expression.
So if the complexity is O(n log n + n^2), then when n is really big the value of n log n is insignificant compared to n^2; this is why you remove it and rewrite the bound as O(n^2).

TriMerge vs Merge Sort

Can someone tell me which of the two algorithms, TriMergeSort and MergeSort, is better?
The time complexity of MergeSort would be n log n with log base 2.
The time complexity of TriMergeSort would be n log n with log base 3.
Since TriMergeSort uses base 3 and MergeSort uses base 2, I am considering TriMergeSort to be faster than MergeSort.
Please correct me if I am wrong.
While you are right that the number of levels in the recursive structure is log_2(n) in the case of regular mergesort and log_3(n) in the case of three-way mergesort, it's important to remember that the work done per level increases as the number of levels decreases. Specifically, in your merge step, you need to switch from a normal 2-way merge to a special 3-way merge. At each step in the merge, you need to determine which of the lists has the smallest unused element. In a two-way merge, you just compare the front elements of the two lists against one another. In a three-way merge, more comparisons are required because you have to find the lowest element out of three.
Generalizing this to a k-way mergesort, the number of levels will be log_k(n), but the work for each merge will be higher. It's possible to do a k-way merge of n total elements in time O(n log k) by using binary heaps, so more work is required per element as k increases.
Interestingly, if we talk about the amount of work required overall, then we can see that we need to do O(n log k) work across log_k(n) levels. This gives us a total runtime of O(n log k log_k n). Using the change-of-base formula for logarithms, which says that log_k(n) = log_2(n) / log_2(k), we see that the runtime will be
O(n log k log_k n)
= O(n log k (log n / log k))
= O(n log n)
In other words, there isn't an asymptotic difference between the algorithms when you choose different values of k. The drop in levels due to a higher splitting factor is offset by an increased amount of work per level.
To figure out which algorithm is best, the best option would be to run them all and see what happens. Due to caching effects and locality of reference, I suspect that the answer might at some level depend on the particular architecture you're using.
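As an aside, the O(n log k) k-way merge mentioned above can be done with a binary heap; here is a minimal sketch in Python (the standard library's heapq.merge does the same job):

import heapq

def k_way_merge(lists):
    # Merge k sorted lists into one sorted list in O(n log k) time,
    # where n is the total number of elements.
    result = []
    heap = []  # entries are (value, source list index, index within that list)
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    while heap:
        value, i, j = heapq.heappop(heap)  # smallest front element: O(log k)
        result.append(value)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return result

print(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))  # [1, 2, ..., 9]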
As far as Big-O complexity, it doesn't matter.
Regular merge sort is n * log_2(n), which is equivalent to n * (log(n) / log(2)). The log(2) is constant, so merge sort is simply n * log(n).
Tri-merge sort is n * log_3(n) which, using the same logic as for regular merge sort, is simply n * log(n).
Given that both reduce to O(n * log(n)), it's not really possible to say which is better.
An alternate way to demonstrate why you can't just assume tri-merge to be better:
Assume a 3-way merge is better than a 2-way merge.
In general, assume an (N+1)-way merge is better than an N-way merge.
If this were true, it would be best to use an N-way merge where N is the number of elements you're sorting. However, the merge step requires choosing the least element from N sources which requires O(N) time.
This means that the N-way merge sort runs in O(N^2) time, effectively making it selection sort.

N log(N) or N clarification

Will performing an O(log N) algorithm N times give O(N log(N))? Or is it O(N)?
e.g. Inserting N elements into a self-balancing tree.
int i = 0;
while (i < N) {
    insert(itemsToInsert[i]);  // each insert into a self-balancing tree is O(log N)
    i++;
}
It's definitely O(N log(N)). It COULD also be O(N), if you could show that the sequence of calls, as a total, grows slow enough (because while SOME calls are O(log N), enough others are fast enough, say O(1), to bring the total down).
Remember: O(f) means the algorithm is no SLOWER than f, but it can be faster (even if just in certain cases).
N times O(log(N)) leads to O(N log(N)).
Big-O notation describes the asymptotic behavior of the algorithm. The cost of each additional step here is O(log N); for an O(N) algorithm the cost of each additional step would have to be O(1), since asymptotically the bounding cost function of an O(N) algorithm is a straight line.
Therefore O(N) is too low a bound; O(N log N) seems about right.
Yes and no.
Calculus really helps here. The first iteration costs log(1), the second costs log(2), etc., up to the Nth iteration, which costs log(N). Rather than thinking of the problem as a multiplication, think of it as a sum (or an integral): log(1) + log(2) + ... + log(N) = log(N!), which by Stirling's approximation is N log(N) - O(N).
This happens to come out as O(N log(N)), but that is kind of a coincidence.
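A quick numerical check of that sum (a sketch; the exact constants are beside the point):

import math

N = 1_000_000
exact = sum(math.log2(i) for i in range(1, N + 1))  # = log2(N!), the true total cost
bound = N * math.log2(N)
print(exact, bound)  # both on the order of 2e7; their ratio approaches 1 as N grows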

Randomized Quick Sort Pivot selection with 25%-75% split

I came to know that in the case of randomized quicksort, if we choose the pivot in such a way that it always gives at least a 25%-75% split, then the running time is O(n log n).
I also came to know that we can prove this with the Master Theorem.
But my problem is: if we split the array 25%-75% at each step, how do I define my T(n), and how can I prove that the running time is O(n log n)?
You can use the Master theorem to find the complexity of this kind of algorithm. In this particular case, assume that when you divide the array into two parts, each of these parts is no greater than 3/4 of the initial array. Then T(n) < 2 * T(3/4 * n) + O(n), or T(n) = 2 * T(3/4 * n) + O(n) if you are looking for an upper bound. The Master theorem gives you the solution of this equation.
Update: though the Master theorem may solve such recurrence equations, in this case it gives us a result that is worse than the expected O(n*log n), because the recurrence above double-counts the work: the two parts cannot both contain 3/4 of the elements. Nevertheless, it can be solved another way. If we assume that a pivot always splits the array so that the smaller part is >= 1/4 of its size, then we can limit the recursion depth to log_{4/3}(n) (because at each level the size of the array decreases by a factor of at least 4/3). The time complexity on each recursion level is O(n) in total, thus we have O(n) * log_{4/3}(n) = O(n*log n) overall complexity.
Furthermore, if you want a stricter analysis, you may consult the Wikipedia article; there are some good proofs there.
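To see the depth bound in action, here is a sketch of a quicksort that rejects any pivot giving a split worse than 25%-75%, so the recursion depth is at most log_{4/3}(n):

import random

def quicksort_25_75(a):
    # Re-pick the pivot until both sides hold at most 75% of the elements;
    # for distinct elements about half of all pivots qualify, so this takes
    # O(1) expected retries per level.
    if len(a) <= 1:
        return a
    while True:
        pivot = random.choice(a)
        smaller = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        larger = [x for x in a if x > pivot]
        if len(smaller) <= 3 * len(a) // 4 and len(larger) <= 3 * len(a) // 4:
            break
    return quicksort_25_75(smaller) + equal + quicksort_25_75(larger)

print(quicksort_25_75([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]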
