I was going over the 2-way merge sort algorithm and was wondering whether reducing the number of merge passes gives a better gain in terms of time.
E.g. in a 2-way merge we have the following recurrence:
T(n) = 2T(n/2) + O(n)
and this has a time complexity of N*log2(N).
If I divide the problem into 4 parts and merge 4 subarrays I will get
T(n) = 4T(n/4) + O(n)
and this should have a time complexity of N*log4(N).
Since the number of merge passes is reduced, is this something to consider when implementing merge sort?
E.g. with an array of 64 elements, the first approach will take 6 passes and the second approach will take 3 passes.
Edit:
OK, so with 2T(n/2) we are doing N comparisons per pass, and with 4T(n/4) we end up doing 3*N comparisons per pass? What about the cost of moving elements to the result array: does that remain the same at each pass?
Note that the base-4 log of a number is exactly half the base-2 log of that number; thus, you're only introducing a constant-factor speedup. Except you're not, because you introduce a similar constant-factor SLOWDOWN in the cost of the actual merging (a 2-way merge needs 1 comparison per item, a 4-way merge may need up to 3). So while there may be fewer passes, each pass is more costly. You've complicated the code a fair bit, and the benefits are in question.
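To make that comparison count concrete, here is a minimal sketch (mine, not from this answer; the merge4 helper name is made up) of a naive 4-way merge. Each element written to the output can cost up to 3 head-to-head comparisons, versus at most 1 per element for a 2-way merge.

def merge4(runs):
    # Merge up to 4 sorted lists by repeatedly picking the smallest head.
    heads = [0] * len(runs)              # current index into each run
    out = []
    while True:
        best = None
        for i, run in enumerate(runs):
            if heads[i] < len(run):
                # up to len(runs) - 1 comparisons to find the smallest head
                if best is None or run[heads[i]] < runs[best][heads[best]]:
                    best = i
        if best is None:                 # every run is exhausted
            return out
        out.append(runs[best][heads[best]])
        heads[best] += 1

print(merge4([[1, 5], [2, 6], [0, 7], [3, 4]]))  # [0, 1, 2, 3, 4, 5, 6, 7]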
It sounds reasonable but is not, because in this case you have to do at least 5 comparisons to sort each group of 4 elements. This way you have 5*N*log4(N) comparisons, and this is greater than N*log2(N) by a factor of 5*log(2)/log(4).
The height of the decomposition tree is log4(n), but the running time is not n*log4(n).
In the first step you have four lists of size n/4; merging them takes n/4 + n/4 + n/2 comparisons, but in the normal 2-way case it takes n/2 comparisons. So you halve the height of the tree, but you roughly double the cost of each merge, and the total running time is not better than dividing into 2 parts. (You can show for other depths of the tree that you need at least twice as many comparisons compared to the tree with two branches.)
P.S.: By running time I mean the number of comparisons.
The number of operations required to implement merge sort is:
6n(log n + 1) = 6n log n + 6n.
log n + 1 is the number of levels in merge sort. What is 6n here?
In the case of a crude merge sort: two reads to compare two elements, one read and one write to copy the smaller element to a working array, then later another read and another write to copy elements back to the original array, for a total of 6 memory accesses per element (except for boundary cases like reaching the end of a run, in which case the remainder of the other run is just copied without compares). A more optimized merge sort avoids the copy back step by alternating the direction of merge depending on the merge pass if bottom up, or the recursion level if top down, reducing the 6 to a 4. If an element fits in a register, then after a compare, the element will be in a register and will not have to be re-read, reducing the 6 to a 3.
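As a rough illustration of that accounting, here is a sketch of my own (not the code the answer is describing) of one bottom-up merge pass that merges into a working array and then copies back; the comments mark where the per-element reads and writes in the crude 6-access count come from.

def merge_pass(a, work, width):
    # One bottom-up pass: merge adjacent runs of length `width` from a into work,
    # then copy work back into a (the crude scheme counted above).
    n = len(a)
    for lo in range(0, n, 2 * width):
        mid = min(lo + width, n)
        hi = min(lo + 2 * width, n)
        i, j, k = lo, mid, lo
        while i < mid and j < hi:
            if a[i] <= a[j]:             # two reads to compare two elements
                work[k] = a[i]           # one read and one write to copy the smaller
                i += 1
            else:
                work[k] = a[j]
                j += 1
            k += 1
        while i < mid:                   # remainder of a run: copied without compares
            work[k] = a[i]
            i += 1
            k += 1
        while j < hi:
            work[k] = a[j]
            j += 1
            k += 1
    a[:] = work                          # copy-back step: another read and write per element

data = [5, 1, 4, 2, 3, 0]
work = data[:]
width = 1
while width < len(data):
    merge_pass(data, work, width)
    width *= 2
print(data)  # [0, 1, 2, 3, 4, 5]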
I'm not sure what you mean by "what is 6n"? If you are asking about the complexity of your algorithm (merge sort), it can be reduced to n log(n). You can ignore the coefficients in your problem as they are negligible when accounting for big-O complexity. When calculating n log(n) + n, you can also ignore the n term as it grows at a much slower rate than n log(n). This leaves you with a complexity of n log(n).
I keep seeing the search time for linked lists listed as O(N) but if you have 100 elements in a list aren't you on average only comparing against 50 of them before you've found a match?
So is O(N/2) being rounded to O(N) or am I just wrong in thinking it's N/2 on average for a linked list lookup?
Thanks!
The thing is, the order is really only talking about how the time increases as n increases.
So O(N) means that you have linear growth. If you double N then the time taken also doubles. N/2 and N both have the same growth behaviour so in terms of Order they are identical.
Functions like log(N) and N^2, on the other hand, have non-linear growth; N^2, for example, means that if you double N the time taken increases 4 times.
It is all about ratios. If something on average takes 1 minute for 1 item will it on average take 2 minutes or 4 minutes for 2 items? O(N) will be 2 minutes, O(N^2) will take 4 minutes. If the original took 1 second then O(N) will take 2 seconds, O(N^2) 4 seconds.
The algorithm that takes 1 minute and the algorithm that takes 1 second are both O(N)!
The other answers make the main points required for the answer, but there's a detail I'd like to add: If it's not explicitly stated, complexity statements typically discuss the worst case behaviour, rather than the average, simply because analysing the worst case behaviour is frequently a lot easier than the average behaviour. So if I say, without any additional qualifications, search in a linked list is O(N), then I'm admittedly being sloppy, but I would be talking about the worst case of search in a linked list taking a number of steps linear in N.
The O means something. The average number of elements that need to be traversed, to find something in a linked list is N/2. N/2 = O(N).
Note that saying search in a linked list on average takes n/2 operations is wrong, as "operation" is not defined. I could argue that for each node you need to read its value, compare it to what you are searching for, and then read the pointer to the next node, and thus that the algorithm performs 3N/2 operations on average. Using O notation allows us to avoid such insignificant details.
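If it helps to see the N/2-versus-O(N) point empirically, here is a throwaway sketch of mine (a plain Python list stands in for the linked list): the average comparison count comes out to about N/2, and it doubles when N doubles, which is exactly the linear growth O(N) describes.

def average_comparisons(n):
    # Average number of comparisons to find each element by linear search.
    values = list(range(n))           # stand-in for a linked list of n nodes
    total = 0
    for target in values:
        for steps, v in enumerate(values, start=1):
            if v == target:
                total += steps
                break
    return total / n

print(average_comparisons(100))   # 50.5, i.e. roughly N/2
print(average_comparisons(200))   # 100.5: doubling N doubles the average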
In the median-of-medians algorithm, we need to divide the array into chunks of size 5. I am wondering how the inventors of the algorithm came up with the magic number '5' and not, say, 7 or 9 or something else.
The number has to be larger than 3 (and an odd number, obviously) for the algorithm. 5 is the smallest odd number larger than 3. So 5 was chosen.
I think it will help if you check the "Proof of O(n) running time" section of the Wikipedia page for the median-of-medians algorithm:
The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians is 20% of the size of the list, while the other recursive call recurses on at most 70% of the list, making the running time satisfy T(n) <= T(n/5) + T(7n/10) + cn.
The O(n) term cn is for the partitioning work (we visit each element a constant number of times, in order to form them into n/5 groups and take each median in O(1) time).
From this, using induction, one can easily show that T(n) <= 10cn, which is O(n).
That should help you understand why.
You can also use blocks of size 3 or 4, as shown in the paper Select with groups of 3 or 4 by K. Chen and A. Dumitrescu (2015). The idea is to use the "median of medians" algorithm twice and partition only after that. This lowers the quality of the pivot but is faster.
So instead of:
T(n) <= T(n/3) + T(2n/3) + O(n)
T(n) = O(n log n)
one gets:
T(n) <= T(n/9) + T(7n/9) + O(n)
T(n) = Theta(n)
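If you want to see the difference between those two recurrences without invoking the master theorem, here is a small numeric sketch of my own (arbitrary constants, sizes truncated to integers): it expands T(n) <= T(a*n) + T(b*n) + n and shows that T(n)/n keeps growing when a + b = 1 but levels off when a + b = 8/9.

from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, a, b):
    # Crude numeric expansion of T(n) <= T(a*n) + T(b*n) + n.
    if n <= 1:
        return 1
    return T(int(a * n), a, b) + T(int(b * n), a, b) + n

for n in (1_000, 10_000, 100_000):
    print(n, round(T(n, 1/3, 2/3) / n, 1), round(T(n, 1/9, 7/9) / n, 1))
# The first ratio keeps growing (n log n behaviour); the second stays bounded (Theta(n)).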
See this explanation on Brilliant.org. Basically, five is the smallest group size we can use and still maintain linear time. It is also easy to implement a simple sort for an n=5 sized array. Apologies for the LaTeX:
Why 5?
The median-of-medians divides a list into sublists of length five to
get an optimal running time. Remember, finding the median of small
lists by brute force (sorting) takes a small amount of time, so the
length of the sublists must be fairly small. However, adjusting the
sublist size to three, for example, does change the running time for
the worse.
If the algorithm divided the list into sublists of length three, the pivot p would be greater than approximately n/3 of the elements and smaller than approximately n/3 of the elements. This would cause a worst case of 2n/3 elements in the recursion, yielding the recurrence
T(n) = T(n/3) + T(2n/3) + O(n),
which by the master theorem is O(n log n), which is slower than linear time.
In fact, for any recurrence of the form T(n) <= T(an) + T(bn) + cn, if a + b < 1, the recurrence will solve to O(n), and if a + b > 1, the recurrence is usually equal to Omega(n log n). [3]
The median-of-medians algorithm could use a sublist size greater than
5—for example, 7—and maintain a linear running time. However, we need
to keep the sublist size as small as we can so that sorting the
sublists can be done in what is effectively constant time.
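For reference, here is a compact sketch of the groups-of-5 selection the quotes above describe (a simplified version of my own, ignoring the usual in-place and duplicate-handling refinements):

def median_of_medians(arr, k):
    # Return the k-th smallest element (0-based) of arr in worst-case O(n).
    if len(arr) <= 5:
        return sorted(arr)[k]

    # 1. Split into groups of 5 and take each group's median (O(n) total work).
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]

    # 2. Recurse on the ~n/5 medians to pick a good pivot.
    pivot = median_of_medians(medians, len(medians) // 2)

    # 3. Partition around the pivot; at most ~7n/10 elements can land on one
    #    side, which is what gives T(n) <= T(n/5) + T(7n/10) + O(n).
    lows = [x for x in arr if x < pivot]
    pivots = [x for x in arr if x == pivot]
    highs = [x for x in arr if x > pivot]

    if k < len(lows):
        return median_of_medians(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return median_of_medians(highs, k - len(lows) - len(pivots))

print(median_of_medians([9, 1, 0, 2, 3, 4, 6, 8, 7, 5], 4))  # 4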
In class, simple sort is used as one of our first examples of O(N) runtimes...
But since it goes through one fewer iteration of the array each time it runs, wouldn't it be something more along the lines of...
Runtime_bubble = sum(i = 0 to n) of (n - i) ?
And since in asymptotic analysis only the biggest process counts when processes run one after another, which here would be the N iteration, why isn't this sort O(N) by definition?
The sum of 1 + 2 + ... + N is N*(N+1)/2 ... (high school maths) ... and that approaches (N^2)/2 as N goes to infinity. Classic O(N^2).
I'm not sure where you (or your professor) got the notion that bubble sort is O(n). If your professor had a guaranteed O(n) sort algorithm, they'd be wise to try and patent it :-)
A bubble sort is, by its very nature, O(n^2).
That's because it has to make a full pass of the entire data set, to correctly place the first element.
Then a second pass of N - 1 elements to correctly place the second. And a third pass of N - 2 elements to correctly place the third.
And so on, effectively ending up with close to N * N / 2 operations which, removing the superfluous 0.5 constant, is O(n^2).
The time complexity of bubble sort is O(n^2).
When considering the complexity, only the largest term is considered (and constant factors are dropped).
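To check that sum directly, here is a throwaway sketch of mine that counts the comparisons an unoptimized bubble sort makes and compares the count with n(n-1)/2; doubling n roughly quadruples it.

import random

def bubble_sort_comparisons(a):
    # Bubble sort a in place and return the number of comparisons made.
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        # each pass places one more element, so it inspects one fewer pair
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return comparisons

for n in (100, 200, 400):
    print(n, bubble_sort_comparisons([random.random() for _ in range(n)]), n * (n - 1) // 2)
# (n-1) + (n-2) + ... + 1 = n(n-1)/2, i.e. O(n^2).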
I know there are quite a bunch of questions about big O notation, I have already checked:
Plain english explanation of Big O
Big O, how do you calculate/approximate it?
Big O Notation Homework--Code Fragment Algorithm Analysis?
to name a few.
I know by "intuition" how to calculate it for n, n^2, n! and so on; however, I am completely lost on how to calculate it for algorithms that are log n, n log n, n log log n and so on.
What I mean is, I know that Quicksort is n log n (on average)... but why? Same thing for merge sort, comb sort, etc.
Could anybody explain to me, in a not-too-math-y way, how you calculate this?
The main reason is that I'm about to have a big interview and I'm pretty sure they'll ask for this kind of stuff. I have researched for a few days now, and everybody seems to have either an explanation of why bubble sort is n^2 or an explanation that is (for me) unreadable, like the one on Wikipedia.
The logarithm is the inverse operation of exponentiation. An example of exponentiation is when you double the number of items at each step. Thus, a logarithmic algorithm often halves the number of items at each step. For example, binary search falls into this category.
Many algorithms require a logarithmic number of big steps, but each big step requires O(n) units of work. Mergesort falls into this category.
Usually you can identify these kinds of problems by visualizing them as a balanced binary tree. For example, here's merge sort:
6 2 0 4 1 3 7 5
2 6 0 4 1 3 5 7
0 2 4 6 1 3 5 7
0 1 2 3 4 5 6 7
At the top is the input, as leaves of the tree. The algorithm creates a new node by sorting the two nodes above it. We know the height of a balanced binary tree is O(log n) so there are O(log n) big steps. However, creating each new row takes O(n) work. O(log n) big steps of O(n) work each means that mergesort is O(n log n) overall.
Generally, O(log n) algorithms look like the function below. They get to discard half of the data at each step.
def function(data, n):
    if n <= constant:
        return do_simple_case(data, n)
    if some_condition():
        function(data[:n // 2], n // 2)        # Recurse on first half of data
    else:
        function(data[n // 2:], n - n // 2)    # Recurse on second half of data
While O(n log n) algorithms look like the function below. They also split the data in half, but they need to consider both halves.
def function(data, n):
    if n <= constant:
        return do_simple_case(data, n)
    part1 = function(data[:n // 2], n // 2)        # Recurse on first half of data
    part2 = function(data[n // 2:], n - n // 2)    # Recurse on second half of data
    return combine(part1, part2)
Where do_simple_case() takes O(1) time and combine() takes no more than O(n) time.
The algorithms don't need to split the data exactly in half. They could split it into one-third and two-thirds, and that would be fine. For average-case performance, splitting it in half on average is sufficient (like QuickSort). As long as the recursion is done on pieces of (n/something) and (n - n/something), it's okay. If it's breaking it into (k) and (n-k) then the height of the tree will be O(n) and not O(log n).
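To tie that template to something concrete, here is a filled-in version of the O(n log n) shape, a minimal merge sort of my own in which combine() becomes a linear-time merge (the helper names are mine, not from the answer):

def merge_sort(data):
    # O(n log n) instance of the template: split, recurse on both halves, combine.
    n = len(data)
    if n <= 1:                      # the simple case: 0 or 1 items is already sorted
        return data
    part1 = merge_sort(data[:n // 2])
    part2 = merge_sort(data[n // 2:])
    return merge(part1, part2)      # combine() is a linear-time merge

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

print(merge_sort([6, 2, 0, 4, 1, 3, 7, 5]))  # [0, 1, 2, 3, 4, 5, 6, 7]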
You can usually claim log n for algorithms where it halves the space/time each time it runs. A good example of this is any binary algorithm (e.g., binary search). You pick either left or right, which then axes the space you're searching in half. The pattern of repeatedly doing half is log n.
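For example, a standard iterative binary search (minimal sketch below) discards half of the remaining range on every step, which is exactly the repeated-halving pattern that gives log n:

def binary_search(sorted_list, target):
    # Return an index of target in sorted_list, or -1, using O(log n) comparisons.
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1                 # discard the left half
        else:
            hi = mid - 1                 # discard the right half
    return -1

print(binary_search([0, 1, 3, 5, 8, 13], 8))   # 4
print(binary_search([0, 1, 3, 5, 8, 13], 2))   # -1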
For some algorithms, getting a tight bound for the running time through intuition is close to impossible (I don't think I'll ever be able to intuit a O(n log log n) running time, for instance, and I doubt anyone will ever expect you to). If you can get your hands on the CLRS Introduction to Algorithms text, you'll find a pretty thorough treatment of asymptotic notation which is appropriately rigorous without being completely opaque.
If the algorithm is recursive, one simple way to derive a bound is to write out a recurrence and then set out to solve it, either iteratively or using the Master Theorem or some other way. For instance, if you're not looking to be super rigorous about it, the easiest way to get QuickSort's running time is through the Master Theorem -- QuickSort entails partitioning the array into two relatively equal subarrays (it should be fairly intuitive to see that this is O(n)), and then calling QuickSort recursively on those two subarrays. Then if we let T(n) denote the running time, we have T(n) = 2T(n/2) + O(n), which by the Master Method is O(n log n).
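A stripped-down quicksort (a sketch of mine, not a production partition scheme) makes that reading visible: the list comprehensions are the O(n) partitioning work, and the two recursive calls are the 2T(n/2) when the pivot splits the array roughly in half.

def quicksort(a):
    # Average case O(n log n): O(n) partitioning plus two recursive calls.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]       # O(n) partitioning work
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]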
Check out the "phone book" example given here: What is a plain English explanation of "Big O" notation?
Remember that Big-O is all about scale: how much more operation will this algorithm require as the data set grows?
O(log n) generally means you can cut the dataset in half with each iteration (e.g. binary search)
O(n log n) means you're performing an O(log n) operation for each item in your dataset
I'm pretty sure 'O(n log log n)' doesn't make any sense. Or if it does, it simplifies down to O(n log n).
I'll attempt to do an intuitive analysis of why Mergesort is n log n and if you can give me an example of an n log log n algorithm, I can work through it as well.
Mergesort is a sorting algorithm that works by splitting a list of elements repeatedly until only single-element lists exist, and then merging these lists together. The primary operation in each of these merges is comparison, and each merge requires at most n comparisons, where n is the length of the two lists combined. From this you can derive the recurrence and easily solve it, but we'll avoid that method.
Instead, consider how Mergesort is going to behave: we're going to take a list and split it, then take those halves and split them again, until we have n partitions of length 1. I hope it's easy to see that this recursion will only go log(n) deep until we have split the list up into our n partitions.
Now each of these n partitions will need to be merged; then, once those are merged, the next level will need to be merged, until we have a list of length n again. Refer to Wikipedia's graphic for a simple example of this process: http://en.wikipedia.org/wiki/File:Merge_sort_algorithm_diagram.svg.
Now consider the amount of time this process will take: we're going to have log(n) levels, and at each level we will have to merge all of the lists. As it turns out, each level will take n time to merge, because we'll be merging a total of n elements each time. Then you can fairly easily see that it will take n log(n) time to sort an array with mergesort, if you take the comparison operation to be the most important operation.
If anything is unclear or I skipped somewhere please let me know and I can try to be more verbose.
Edit Second Explanation:
Let me think if I can explain this better.
The problem is broken into a bunch of smaller lists and then the smaller lists are sorted and merged until you return to the original list which is now sorted.
When you break up the problem you have several different levels of sizes. First you'll have two lists of size n/2, n/2; then at the next level you'll have four lists of size n/4, n/4, n/4, n/4; at the next level you'll have eight lists of size n/8, n/8, n/8, n/8, n/8, n/8, n/8, n/8; and this continues until n/2^k is equal to 1 (each subdivision is the length divided by a power of 2; not all lengths will be evenly divisible, so it won't be quite this pretty). This is repeated division by two and can continue at most log_2(n) times, because 2^(log_2(n)) = n, so any further division by 2 would yield a list of size zero.
Now the important thing to note is that at every level we have n elements in total, so each level's merge will take n time, because merge is a linear operation. If there are log(n) levels of recursion, then we will perform this linear operation log(n) times; therefore our running time will be n log(n).
Sorry if that isn't helpful either.
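In case a runnable check is more convincing, here is a small sketch of mine that runs a top-down mergesort while tallying comparisons; the count stays within a small constant factor of n*log2(n).

import math
import random

def msort(a, counter):
    # Mergesort that tallies comparisons into counter[0].
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = msort(a[:mid], counter), msort(a[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison per merged element
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

for n in (1_000, 10_000, 100_000):
    counter = [0]
    msort([random.random() for _ in range(n)], counter)
    print(n, counter[0], round(n * math.log2(n)))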
When applying a divide-and-conquer algorithm where you partition the problem into sub-problems until it is so simple that it is trivial, if the partitioning goes well, the size of each sub-problem is n/2 or thereabout. This is often the origin of the log(n) that crops up in big-O complexity: O(log(n)) is the number of recursive calls needed when partitioning goes well.
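A tiny sketch of that halving count, if it helps: repeatedly halving until the sub-problem is trivial takes about log2(n) steps, and that is where the log(n) factor comes from.

def halvings(n):
    # Count how many times n can be halved before the sub-problem becomes trivial.
    d = 0
    while n > 1:
        n //= 2
        d += 1
    return d

print(halvings(1_000_000))  # 19, close to log2(1,000,000) which is about 19.9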