Divide and conquer - why does it work? - algorithm

I know that algorithms like mergesort and quicksort use the divide-and-conquer paradigm, but I'm wondering why it actually lowers the time complexity...
Why does a "divide and conquer" algorithm usually work better than a non-divide-and-conquer one?

Divide and conquer algorithms work faster because they end up doing less work.
Consider the classic divide-and-conquer algorithm of binary search: rather than looking at N items to find an answer, binary search ends up checking only about log2(N) of them. Naturally, when you do less work, you can finish faster; that's precisely what's going on with divide-and-conquer algorithms.
Of course the results depend a lot on how well your strategy divides the work: if the division is more or less fair at every step (i.e. you divide the work in half), you get the ideal log2(N) behavior. If, however, the dividing is not perfect (e.g. the worst case of quicksort, which spends O(n^2) time sorting the array because it eliminates only a single element at each step), then the divide-and-conquer strategy is not helpful, as your algorithm does not reduce the amount of work.
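As a concrete illustration, here is a minimal binary search sketch of my own (it assumes the input list is already sorted; the names are not from the question):

def binary_search(items, target):
    # Each iteration discards half of the remaining range, so at most
    # about log2(N) comparisons are made instead of N.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid            # found: return its index
        elif items[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return -1                     # not present

# e.g. binary_search([1, 3, 5, 7, 9, 11], 7) returns 3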

Divide and conquer works, because the mathematics supports it!
Consider a few divide and conquer algorithms:
1) Binary search: This algorithm reduces your input space to half each time. It is intuitively clear that this is better than a linear search, as we would avoid looking at a lot of elements.
But how much better? We get the recurrence (note: this is the recurrence for the worst-case analysis):
T(n) = T(n/2) + O(1)
Mathematics implies that T(n) = Theta(log n). Thus this is exponentially better than a linear search.
2) Merge Sort: Here we divide into two (almost) equal halves, sort the halves and then merge them. Why should this be better than quadratic? Here is the recurrence (a small code sketch of the algorithm follows this list):
T(n) = 2T(n/2) + O(n)
It can be mathematically shown (say using Master theorem) that T(n) = Theta(n log n). Thus T(n) is asymptotically better than quadratic.
Observe that naive quicksort gives us the worst-case recurrence
T(n) = T(n-1) + O(n)
which, mathematically, comes out to be quadratic, and in the worst case isn't better than bubble sort (asymptotically speaking). But we can show that in the average case, quicksort is O(n log n).
3) Selection Algorithm: This is a divide-and-conquer algorithm to find the k-th largest element. It is not at all obvious whether this algorithm is better than sorting (or even that it is not quadratic).
But mathematically, its recurrence (again, for the worst case) comes out to be
T(n) = T(n/5) + T(7n/10 + 6) + O(n)
It can be shown mathematically that T(n) = O(n), and thus it is better than sorting.
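Here is a minimal merge sort sketch (my own illustration, not from the original answer), showing where the two T(n/2) terms and the O(n) merge term in the recurrence for item 2 come from:

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # first recursive call: T(n/2)
    right = merge_sort(a[mid:])   # second recursive call: T(n/2)
    merged = []                   # merge step: O(n)
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged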
Perhaps a common way to look at them:
You can look at these algorithms as a tree where each sub-problem becomes a sub-tree of the current one. Each node can be tagged with the amount of work done at it, and the total work can then be added up over all nodes.
For binary search, the work at each node is O(1) (just a comparison), and one of the two sub-trees does no work at all, so the total amount of work is O(log n) (essentially a single path, just like in a binary search tree).
For merge sort, a node whose sub-problem has k elements does O(k) work (the merge step). The work done at each level is O(n) (n, then n/2 + n/2, then n/4 + n/4 + n/4 + n/4, etc.) and there are O(log n) levels, so merge sort is O(n log n).
For quicksort, in the worst case the binary tree is actually a linked list, so the work done is n + (n-1) + ... + 1 = Omega(n^2).
For the selection algorithm, I have no clue how to visualize it, but I believe looking at it as a tree with three children (n/5, 7n/10 and the remainder) might still help.

Divide and conquer algorithms don't "usually work better". They just work, as other non-divide-and-conquer algorithms do. They don't lower sorting complexity; they do as well as other algorithms.

Related

What would be the running time of an algorithm that combines mergeSort and heapsort?

I have been given this problem that asks to compute the worst-case running time of an algorithm that's exactly like mergesort, but where one of the two recursive calls is replaced by heapsort.
So, I know that the dividing step in mergesort takes constant time and that merging is O(n). Heapsort takes O(n log n).
This is what I came up with: T(n) = 2T(n/2) + O((n/2) log n) + O(n).
I have some doubts about the O((n/2) log n) part. Is it n or n/2? I wrote n/2 because I'm doing heapsort only on half of the array, but I'm not sure that's correct.
The question asks about running time, but should it be asking about time complexity?
Since recursion is mentioned, this is a question about top down merge sort (as opposed to bottom up merge sort).
With the code written as described, since heap sort is not recursive, recursion only occurs on one of each of the split sub-arrays. Heap sort will be called to sort sub-arrays of size n/2, n/4, n/8, n/16, ... , and no merging takes place until two sub-arrays of size 1 are the result of the recursive splitting. In the simple case where array size is a power of 2, then "merge sort" is only used for a single element, the rest of the sub-arrays of size {1, 2, 4, 8, ..., n/8, n/4, n/2} are sorted by heap sort and then merged.
Since heap sort is slower than merge sort, the running time will be longer, but the time complexity remains O(n log(n)), since constant factors and lower-order terms are ignored for time complexity.
Let’s work out what the recurrence relation should be in this case. Here, we’re
splitting the array in half,
recursively sorting one half (T(n / 2)),
heapsorting one half (O(n log n)), and then
merging the two halves together (O(n)).
That gives us this recurrence relation:
T(n) = T(n / 2) + O(n log n).
Why is this O(n log n) and not, say, O((n / 2) log (n / 2))? The reason is that big-O notation munches up constant factors, so O(n log n) expresses the same asymptotic growth rate as O((n / 2) log (n / 2)). And why isn’t there a coefficient of 2 on the T(n / 2)? It’s because we’re only making one recursive call; remember that the other call was replaced by heapsort.
All that’s left to do now is to solve this recurrence. It does indeed work out to O(n log n), and I’ll leave it to you to decide how you want to show this. The iteration method is a great option here.
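For concreteness, here is a rough sketch of the hybrid being analyzed (the function names and the heapq-based heapsort are my own choices, not from the question):

import heapq

def heap_sort(a):
    heapq.heapify(a)                                   # O(n)
    return [heapq.heappop(a) for _ in range(len(a))]   # O(n log n)

def hybrid_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = hybrid_sort(a[:mid])   # the single recursive call: T(n/2)
    right = heap_sort(a[mid:])    # heapsort on the other half: O(n log n)
    merged, i, j = [], 0, 0       # standard merge step: O(n)
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

This matches the recurrence T(n) = T(n / 2) + O(n log n) from above.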

What's the difference between Theta(n) and T(n) when considering time complexity?

The professor was discussing the time complexity of merge sort and he divided the whole process into three steps.
Check whether the size of the array is 1 -> time complexity: Theta(1)
He described the sorting process (recursively sorting the two halves) -> time complexity: 2T(n/2)
Merge the two sorted sequences -> time complexity: Theta(n)
I don't understand step 2. Why did he describe it as 2T(n/2) instead of 2Theta(n/2)? What's the difference between Theta(n) and T(n)?
Here is the link from Youtube: https://www.youtube.com/watch?v=JPyuH4qXLZ0
And it's between 1:08:45 - 1:10:33
What the professor means by T(n) is the exact complexity, i.e. the number of steps the algorithm needs to complete, which may vary depending on the implementation. What's more interesting is the asymptotic complexity, here denoted Θ(n), which shows how fast T grows with n.
The first step of the mergesort algorithm is to split the array into halves and sort each half with the same algorithm (which is therefore recursive). That step obviously takes 2T(n/2). Then you merge both halves (hence the name), which takes linear time, Θ(n). From that recursive definition T(n) = 2T(n/2) + Θ(n) he derives that T(n) = Θ(n log n), which is the complexity class of the mergesort algorithm.
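If it helps to see the recurrence concretely, here is a quick numeric sanity check of my own (it assumes the Θ(n) merge cost is exactly n and that T(1) = 1):

import math

def T(n):
    # unroll T(n) = 2*T(n/2) + n with T(1) = 1
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

for n in (2**10, 2**15, 2**20):
    print(n, T(n), n * math.log2(n))
# T(n) grows proportionally to n*log2(n), i.e. T(n) = Theta(n log n).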

Is n or nlog(n) better than constant or logarithmic time?

In the Princeton tutorial on Coursera the lecturer explains the common order-of-growth functions that are encountered. He says that linear and linearithmic running times are "what we strive" for and his reasoning was that as the input size increases so too does the running time. I think this is where he made a mistake because I have previously heard him refer to a linear order-of-growth as unsatisfactory for an efficient algorithm.
While he was speaking he also showed a chart that plotted the different running times - constant and logarithmic running times looked to be more efficient. So was this a mistake or is this true?
It is a mistake when taken in the context that O(n) and O(n log n) functions have better complexity than O(1) and O(log n) functions. Looking at typical cases of complexity in big O notation:
O(1) < O(log n) < O(n) < O(n log n) < O(n^2)
Notice that this doesn't necessarily mean that they will always be better performance-wise - we could have an O(1) function that takes a long time to execute even though its complexity is unaffected by element count. Such a function would look better in big O notation than an O(log n) function, but could actually perform worse in practice.
Generally speaking: a function with lower complexity (in big O notation) will outperform a function with greater complexity (in big O notation) when n is sufficiently high.
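To make that caveat concrete, here is a toy illustration (all numbers below are invented for the sake of the example):

import math

CONSTANT_COST = 10_000            # hypothetical fixed cost of an O(1) function

def log_cost(n):
    return math.log2(n)           # cost of a hypothetical O(log n) function

for n in (10**3, 10**6, 10**9):
    print(n, CONSTANT_COST, round(log_cost(n), 1))
# log2(n) stays tiny for every practical n; it would only exceed 10_000
# once n > 2**10000, so the "worse" O(log n) function wins in practice here.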
You're missing the broader context in which those statements must have been made. Different kinds of problems have different demands, and often even have theoretical lower bounds on how much work is absolutely necessary to solve them, no matter the means.
For operations like sorting or scanning every element of a simple collection, there is a hard lower bound of the number of elements in the collection, because the output depends on every element of the input. [1] Thus, O(n) or O(n*log(n)) are the best one can do.
For other kinds of operations, like accessing a single element of a hash table or linked list, or searching in a sorted set, the algorithm needn't examine all of the input. In those settings, an O(n) operation would be dreadfully slow.
[1] Others will note that sorting by comparisons also has an n*log(n) lower bound, from information-theoretic arguments. There are non-comparison based sorting algorithms that can beat this, for some types of input.
Generally speaking, what we strive for is the best we can manage to do. But depending on what we're doing, that might be O(1), O(log log N), O(log N), O(N), O(N log N), O(N^2), O(N^3), or (for certain algorithms) perhaps O(N!) or even O(2^N).
Just for example, when you're dealing with searching in a sorted collection, binary search borders on trivial and gives O(log N) complexity. If the distribution of items in the collection is reasonably predictable, we can typically do even better--around O(log log N). Knowing that, an algorithm that was O(N) or O(N^2) (for a couple of obvious examples) would probably be pretty disappointing.
On the other hand, sorting is generally quite a bit higher complexity--the "good" algorithms manage O(N log N), and the poorer ones are typically around O(N^2). Therefore, for sorting, an O(N) algorithm is actually very good (in fact, only possible for rather constrained types of inputs), and we can pretty much count on the fact that something like O(log log N) simply isn't possible.
Going even further, we'd be happy to manage a matrix multiplication in only O(N^2) instead of the usual O(N^3). We'd be ecstatic to get optimum, reproducible answers to the traveling salesman problem or subset sum problem in only O(N^3), given that optimal solutions to these normally require O(N!).
Algorithms with sublinear behavior like O(1) or O(Log(N)) are special in that they do not require looking at all elements. In a way this is a fallacy, because if there really are N elements, it will take O(N) just to read or compute them.
Sublinear algorithms are often possible only after some preprocessing has been performed. Think of binary search in a sorted table, taking O(Log(N)). If the data is initially unsorted, it will cost O(N Log(N)) to sort it first. The cost of sorting can be amortized if you perform many searches, say K, on the same data set. Indeed, without the sort, the cost of the searches would be O(K N), and with pre-sorting O(N Log(N) + K Log(N)). You win if K >> Log(N).
That said, when no preprocessing is allowed, O(N) behavior is ideal, and O(N Log(N)) is quite comfortable as well (for a million elements, Log(N) is only 20). You start screaming with O(N²) and worse.
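A quick illustration of that trade-off (the value of N and the choices of K are my own, purely for the arithmetic):

import math

N = 1_000_000
for K in (10, 100, 10_000):
    linear = K * N                                      # K unsorted linear searches
    presorted = N * math.log2(N) + K * math.log2(N)     # one sort + K binary searches
    print(K, linear, round(presorted))
# Already for K around log2(N) ~ 20, pre-sorting starts to pay off.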
He said those algorithms are what we strive for, which is generally true. Many algorithms cannot possibly be improved better than logarithmic or linear time, and while constant time would be better in a perfect world, it's often unattainable.
constant time is always better because the time (or space) complexity doesn't depend on the problem size... isn't it a great feature? :-)
then we have O(N) and then O(N log(N))
did you know? problems with constant time complexity exist!
e.g.
let A[N] be an array of N integer values, with N > 3. Find an algorithm to tell whether the sum of the first three elements is positive or negative.
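A one-line realization of that example (my own sketch; the zero case is included just for completeness):

def first_three_sum_sign(A):
    s = A[0] + A[1] + A[2]          # O(1): independent of len(A)
    return "positive" if s > 0 else ("negative" if s < 0 else "zero")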
What we strive for is efficiency, in the sense of designing algorithms with a time (or space) complexity that does not exceed their theoretical lower bound.
For instance, using comparison-based algorithms, you can't find a value in a sorted array faster than Omega(Log(N)), and you cannot sort an array faster than Omega(N Log(N)) - in the worst case.
Thus, binary search O(Log(N)) and Heapsort O(N Log(N)) are efficient algorithms, while linear search O(N) and Bubblesort O(N²) are not.
The lower bound depends on the problem to be solved, not on the algorithm.
Yes, constant time, i.e. O(1), is better than linear time O(n), because the former does not depend on the input size of the problem. The order, from best to worst, is O(1), O(log n), O(n), O(n log n).
We strive for linear or linearithmic time because going for O(1) might not be realistic: every sorting algorithm needs at least a few comparisons, which the professor tries to prove with his decision-tree comparison analysis, where he sorts three elements a, b, c and proves a lower bound of n log n. Check his "Complexity of Sorting" in the Mergesort lecture.
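The decision-tree bound comes from the fact that a comparison sort of n elements must distinguish n! orderings, so the tree has at least n! leaves and height at least log2(n!). A quick numeric check of my own that log2(n!) is of the same order as n*log2(n):

import math

for n in (10, 100, 1000):
    print(n, round(math.log2(math.factorial(n))), round(n * math.log2(n)))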

Randomized Quick Sort Pivot selection with 25%-75% split

I came to know that in the case of randomized quicksort, if we choose the pivot in such a way that it gives at least a 25%-75% split, then the running time is O(n log n).
I also came to know that we can prove this with the Master theorem.
But my problem is: if we split the array 25%-75% at each step, how do I define my T(n), and how can I prove that the running time is O(n log n)?
You can use the Master theorem to find the complexity of this kind of algorithm. In this particular case, assume that when you divide the array into two parts, each of these parts is not greater than 3/4 of the initial array. Then T(n) <= 2*T(3/4 * n) + O(n), or T(n) = 2*T(3/4 * n) + O(n) if you are looking for an upper bound. The Master theorem gives you the solution for this equation.
Update: though the Master theorem may solve such recurrence equations, in this case it gives us a result that is worse than the expected O(n log n). Nevertheless, it can be solved in another way. If we assume that a pivot always splits the array so that the smaller part has size >= 1/4, then we can limit the recursion depth to log_{4/3}(N) (because at each level the size of the array decreases by a factor of at least 4/3). The time complexity at each recursion level is O(n) in total, thus we have O(n) * log_{4/3}(n) = O(n log n) overall complexity.
Furthermore, if you want a more rigorous analysis, you may consult Wikipedia; there are some good proofs there.
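For reference, here is a minimal randomized quicksort sketch of my own (not from the answer). When every pivot lands in the middle 50% of the values, both partitions hold at most 3n/4 elements, so the recursion depth is at most log_{4/3}(n) and each level does O(n) partitioning work in total:

import random

def quicksort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                    # random pivot selection
    less    = [x for x in a if x < pivot]       # O(n) partitioning work
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)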

Average Time Complexity of a sorting algorithm

I have a treesort function which performs two distinct tasks, each with its own time complexity. I figured out the average-case time complexity of the two tasks, but how do I find the overall complexity of the algorithm?
For example, the algorithm takes in a random list of n keys x:
Sort(x):
    Insert(x):
        # Time complexity of O(nLog(n))
    Traverse(x):
        # Time complexity of O(n)
Do I just add the two complexities together to give me O(n + nLog(n)), or do I take the dominant task (in this case Insert) and end up with an overall complexity of O(nLog(n))?
In a simple case like this,
O((n) + (n log(n))) = O(n + n log(n))
= O(n (log(n) + 1))
= O(n log(n))
or do I take the dominant task (in this case Insert) and end up with an overall complexity of O(nLog(n))?
That's right. As n grows, the first term in the O(n + nLog(n)) sum becomes less and less significant. Thus, for sufficiently large n, its contribution can be ignored.
You need to take the dominant one.
The whole idea of measuring complexity this way is based on the assumption that you want to know what happens with large ns.
So if you have a polynomial, you can discard all but the highest order element, if you have a logarithm, you can ignore the base and so on.
In everyday practice however, these differences may start to matter, so it's sometimes good to have a more precise picture of your algorithm's complexity, even down to the level where you assign different weights to different operations.
(Returning to your original question, and assuming you're using base-2 logarithms: at n = 1048576, the difference between n + n*log(n) and n*log(n) is around 5%, which is probably not really worth worrying about.)
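A quick check of that 5% figure (my own sketch, using the base-2 logarithm as assumed above):

import math

n = 1_048_576                       # 2**20
with_linear = n + n * math.log2(n)  # n + n*log2(n)
without     = n * math.log2(n)      # n*log2(n)
print((with_linear - without) / without)   # 0.05, i.e. 5%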
