Assume you have to sort an array with n = 1,000,000 elements. Roughly how long would insertion sort and heapsort need, assuming each basic step takes one millisecond?
I know that insertion sort takes n^2 steps in the worst case, and heapsort takes n log n steps in the worst case.
So 1,000,000^2 for insertion sort = 1*10^12 milliseconds,
and 1,000,000 * log(1,000,000) for heapsort = 6,000,000 milliseconds (using log base 10)?
is that correct?
Well...
The problem is that "order" notation is only talking about limits and comparisons, not absolute times. It also leaves off constants and lower order terms.
For example (this is totally fictitious), the actual running time for the specific insertion sort implementation you might be looking at could be:
num steps = 45,334 * n^2 + 6,500,000 * n + 2,000,000
That is an O(n^2) algorithm, but it'll take a lot more time than what you've computed.
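To make that concrete, here's a quick back-of-the-envelope sketch in Python; the constants are entirely made up, just like the fictitious formula above:

import math

# Back-of-the-envelope estimates; the constants below are invented,
# just like the fictitious formula above.
n = 1_000_000
ms_per_step = 1  # the question assumes one millisecond per basic step

# Textbook step counts with all constants dropped:
naive_insertion = n ** 2              # 1e12 steps
naive_heapsort = n * math.log2(n)     # about 2e7 steps (base-2 log)

# The same O(n^2) algorithm with (invented) constants and lower-order terms:
with_constants = 45_334 * n ** 2 + 6_500_000 * n + 2_000_000

print(f"naive insertion sort estimate: {naive_insertion * ms_per_step:.3e} ms")
print(f"naive heapsort estimate:       {naive_heapsort * ms_per_step:.3e} ms")
print(f"same O(n^2) with constants:    {with_constants * ms_per_step:.3e} ms")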
Related
Performing binary search on a sorted array has O(logN) complexity where N is the number of elements in the array.
But if we perform binary searches in a sorted (linked) list then what is the complexity?
We do log N comparisons of the middle element of the current range, but getting to that middle element costs O(N), because the list is not a random-access structure.
So is the time complexity:
1) log N * O(N) = O(N), treating log N as a constant? or
2) log N * O(N) = O(N log N), meaning that log N is O(log N) in all cases?
What is correct here? 1 or 2?
The second assumption is correct and the first is wrong. Asymptotic analysis deals with growth: if the number of nodes increases, log(n) increases too, so you can't treat it as a constant. For a very basic example, if 10 nodes took 10 seconds to process, assuming 100 nodes take 200 seconds is more accurate than assuming 100 seconds (which is what you get by neglecting log(n)).
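To illustrate, here's a rough Python sketch (the node class and helper names are my own, purely for illustration) of binary search over a sorted singly linked list, walking from the head to the middle of the current range on every iteration; log n iterations times an O(n) walk is where the O(n log n) comes from:

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def length(head):
    # O(n) just to count the nodes
    n = 0
    while head:
        n += 1
        head = head.next
    return n

def binary_search_list(head, target):
    lo, hi = 0, length(head) - 1
    while lo <= hi:                   # at most ~log2(n) iterations
        mid = (lo + hi) // 2
        node = head
        for _ in range(mid):          # O(n) walk to reach the "middle" node
            node = node.next
        if node.value == target:
            return mid
        elif node.value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1                         # log n iterations * O(n) walk = O(n log n)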
So assume we are given an array of m numbers, where the max number in the array is k. There are duplicates in this array.
let array a = [1,2,3,1,2,5,1,2,3,4]
Is there an algorithm that, in O(n) operations, prints this array as [1,2,3,4,5] (sorted and with no duplicates), where n is the number of unique values?
We are allowed to use k memory -- 5 in this case.
The algorithm I have in mind uses a hash table: insert each value, and ignore it if it is already present. Iterating over the table by value then gives the output in sorted order. However, with 5 numbers such as [1,2,3,100,4], where one of them is 100, printing those 5 numbers means iterating O(k) ~= 100 slots instead of O(n) ~= 5.
Is there a way to solve this problem?
I don't think such an algorithm exists. Take a look here: https://en.wikipedia.org/wiki/Sorting_algorithm
Essentially, for comparison-based algorithms the best you can achieve is O(n log n). But since you have provided the max value k, I assume you want something more than a purely comparison-based algorithm.
For non-comparison-based algorithms, though, the running time by its nature depends on the magnitude of the numbers, so the complexity has to reflect that dependency - meaning you will definitely have k somewhere in your total time complexity. You won't be able to find an algorithm that is just O(n).
Conversely, if such an O(n) algorithm existed and did not depend on k, you could use it to sort any array of n numbers, since k would just be extra, unused information.
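For illustration, here's a rough sketch of the flag-array idea from the question (the names are mine, and it assumes the values are positive integers no larger than k); it makes the dependence on k explicit, since the output pass walks all k flags:

def sorted_unique(a, k):
    seen = [False] * (k + 1)      # O(k) memory, as allowed
    for x in a:                   # O(m) to mark every element
        seen[x] = True
    result = []
    for v in range(1, k + 1):     # O(k) to walk the flags in sorted order
        if seen[v]:
            result.append(v)
    return result                 # total O(m + k), not O(n) in the unique count

print(sorted_unique([1, 2, 3, 1, 2, 5, 1, 2, 3, 4], 5))   # [1, 2, 3, 4, 5]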
You suggest that printing 5 numbers takes O(k) (or 100) time instead of O(n). That is wrong because, to print those 5 numbers, it takes 5 steps to iterate and print them. Why would the magnitude of a number change the time this takes? The only situation where it should make a difference is if the value is greater than the largest value a 32-bit integer can hold, 2^32 - 1; then you would have to detect those cases and treat them differently. Assuming you don't have any integers of that size, though, you should be able to print 5 integers in O(5) time. I would go back over your calculation of the time it takes to run your algorithm.
With your method, if you're using efficient algorithms, you should be able to remove duplicates in O(n log n) time, as seen here.
The way I see it, if you have a piece of the algorithm (the hashing part, where you remove duplicates and sort) running in O(n log n) time, and a piece of the algorithm (printing the array) running in O(N) (or O(5) in this case), the entire algorithm runs in O(N) time: O(N) + O(N log N) -> O(N), since O(N) >= O(N log N). I hope that answers what you were asking for!
Looks like I was wrong, since O(N log N) of course grows faster than O(N). I don't think there's any way to pull off your problem.
Question:
Here's a modification of quick sort: whenever we have ten items or fewer in a sublist, we sort the sublist using selection sort rather than further recursing with quicksort. Does this change the big-oh time complexity of quicksort? Explain.
In my opinion the big-oh time complexity would change. We know that selection sort is O(n^2) and therefore sorting the sublist of ten items or fewer would take O(n^2). Until we get to a sublist that has ten or fewer items we would use quicksort and keep partitioning the list. So in the end we would have O( nlogn + n^2) which is O(n^2).
Am I correct? If not, could someone explain why?
The reason that the time complexity is actually unaffected is that 10 is a constant term. No matter how large the total array is, it always takes a constant amount of time to sort subarrays of size 10 or less. If you are sorting a list with one million elements, that constant 10 is going to play a very small role in the actual time it takes to sort the list (most of the time will be spent partitioning the original array into subarrays recursively).
If sorting a list of 10 or fewer elements takes constant time, partitioning the array at each recursive step is linear, and the recursion bottoms out in roughly n/10 subarrays of 10 items or fewer, then the base-case work is O(n) in total, and O(n log n + n) is the same as O(n log n).
Saying that selection sort is O(n^2) means that the running time of the algorithm increases quadratically with the size of the input. Running selection sort on an array with a constant number of elements will always take constant time on its own, but if you were to compare the running time of selection sort on arrays of varying sizes, you would see a quadratic increase in the running time as input size varies linearly.
The big O complexity does not change. Please read up on the Master Method (aka Master Theorem) https://en.wikipedia.org/wiki/Master_theorem
If you think through the algorithm, as the size of the list being sorted grows very large, the time to sort the final ten elements in any given recursion subtree makes an insignificant contribution to the overall running time.
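For reference, a rough sketch of the hybrid described in the question might look like this (the helper names and pivot choice are mine and deliberately simplified; this is not the question's exact code):

CUTOFF = 10

def selection_sort(a, lo, hi):
    # Sort a[lo..hi] in place; with hi - lo + 1 <= CUTOFF this is constant time.
    for i in range(lo, hi):
        smallest = min(range(i, hi + 1), key=lambda j: a[j])
        a[i], a[smallest] = a[smallest], a[i]

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        selection_sort(a, lo, hi)          # constant-size input => constant time
        return
    pivot = a[hi]                          # simplistic pivot choice, for illustration
    i = lo
    for j in range(lo, hi):                # Lomuto partition, linear per level
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)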
I've been applying for jobs and every time I hear questions about algorithm time/space complexity, I cringe and stumble. No matter how much I read, my brain seems to be programmed to not get any of it, and I think the reason is down to the fact I have a very low mathematical background due to skipping school. This may not be the usual S.O question, potentially even be removed due to being fundamentally about the maths, but at least I'm hoping I'll figure out where to go next with this question.
I don't know how deep interviewers will go into this, so here are just a few examples. The whole "complexity" thing is just an indication of how much time (or memory) an algorithm uses.
Now, if you have an array of values, accessing the value at a given index is O(1) -- constant. It doesn't matter how many elements are in the array; if you have an index, you can get the element directly.
If, on the other hand, you are looking for a specific value, you have no choice but to look at every element (at least until you find the one you want, but that doesn't matter for the complexity). Thus, searching an unsorted array is O(n): the runtime corresponds to the number of elements.
If, on the other hand, you have a sorted array, then you can do a "binary search", which is O(log n). "log n" is the base-2 logarithm, which is basically the inverse of 2^n. For example, 2^10 is 2*2*2*...*2 ten times = 1024, and log2(1024) is 10. Algorithms with O(log n) are therefore generally considered pretty good: to find an element in a sorted array using binary search, if the array has up to 1024 elements, the search only has to look at about 10 of them to find any value. For 1025-2048 elements it is 11 peeks at most, for 2049-4096 it is 12, and so on. Adding more elements only slowly increases the runtime.
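If it helps to see it, here's a minimal binary search sketch that also counts the "peeks", so you can check it stays around log2(n) (the names are mine, just for illustration):

def binary_search(sorted_array, target):
    lo, hi = 0, len(sorted_array) - 1
    peeks = 0
    while lo <= hi:
        peeks += 1
        mid = (lo + hi) // 2
        if sorted_array[mid] == target:
            return mid, peeks
        elif sorted_array[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, peeks        # roughly log2(n) peeks either way

# e.g. binary_search(list(range(1024)), 700) finds the value in ~10 peeks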
Of course, things can get a lot worse. A trivial sorting algorithm tends to be O(n^2), meaning that it needs 2^2 = 4 "operations" for an array with just 2 elements, 3^2 = 9 if the array has 3, 4^2 = 16 if the array has 4 elements, and so on. Pretty bad, actually, considering that an array with just 1000 elements would already require 1000*1000 = 1 million compares to sort. This is quadratic growth, and of course it can get even worse: O(n^3), O(n^4) and so on, each degree worse than the last.
A "good" sorting algorithm is O(n*log n). Assuming an array with 1024 elements, this would be 1024*10 = 10240 compares -- much better than the 1 million we had before.
Just take these O(...) as indicators for the runtime behavior (or memory footprint, if applied to memory). I plugged in real numbers so you can see how they change, but the exact values are not important, and usually these complexities are worst case. Nonetheless, just by looking at the numbers, "constant time" is obviously best, while quadratic or worse growth is always bad because the runtime (or memory use) skyrockets really fast.
EDIT: also, you are not really interested in constant factors; you don't usually see "O(2n)". This would still be "O(n)" -- the runtime relates directly to the number of elements.
To analyze the time/space complexity of an algorithm, high-school-level knowledge should be fine. I studied this at university in my first semester and was just fine.
The fields of interests for the basics are:
Basic calculus (understanding what a "limit" and an asymptote are)
Series calculations (especially the sum of an arithmetic progression, which comes up often)
Basic knowledge of combinatorics ("how many options are there to...?")
Basics of polynomial arithmetic (p(x) + q(x) = ?)
The above is true for analyzing the complexity of algorithms. Calculating the complexity of problems is a much deeper field that is still being researched - complexity theory. It requires extensive knowledge of set theory, theory of computation, advanced calculus, linear algebra and much more.
While knowing something about calculus, summing series, and discrete mathematics are all Good Things, from your question and from my limited experience in industry, I'd doubt that your interviewers are expecting that level of understanding.
In practice, you can make useful big-O statements about time and space complexity without having to do much mathematical thinking. Here are the basics, which I'll talk about in terms of time complexity just to make the language less abstract.
A big-O time complexity tells you how the worst-case running time of your algorithm scales with the size of its input. The actual numbers you get from the big-O function are an indication of the number of constant time operations your algorithm will perform on a given size of input.
A big-O function, therefore, simply counts the number of constant time operations your algorithm will perform.
A constant time operation is said to be O(1). [Note that any fixed length sequence of constant time operations is also O(1) since the sequence also takes a constant amount of time.]
O(k) = O(1), for any constant k.
If your algorithm performs several operations in series, you sum their costs.
O(f) + O(g) = O(f + g)
If your algorithm performs an operation multiple times, you multiply the cost of the operation by the number of times it is performed.
n * O(f) = O(n * f)
O(f) * O(f) * ... * O(f) = O(f^n), where there are n terms on the left hand side
A classic big-O function is log(n), which typically corresponds to "the height of a balanced tree containing n items". You can get away with just knowing that sorting is O(n log(n)).
Finally, you only report the fastest growing term in a big-O function since, as the size of the input grows, this will dominate all the other terms. Any constant factors are also discarded, since we're only interested in the scaling properties of the result.
E.g., O(2(n^2) + n) = O(n^2).
Here are two examples.
Bubble Sorting n Items
Each traversal of the items sorts (at least) one item into place. We therefore need n traversals to sort all the items.
O(bubble-sort(n)) = n * O(traversal(n))
= O(n * traversal(n))
Each traversal of the items involves n - 1 adjacent compare-and-swap operations.
O(traversal(n)) = (n - 1) * O(compare-and-swap)
= O((n - 1) * compare-and-swap)
Compare-and-swap is a constant time operation.
O(compare-and-swap) = O(1)
Collecting our terms, we get:
O(bubble-sort(n)) = O(n * (n - 1) * 1)
= O(n^2 - n)
= O(n^2)
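For concreteness, here's a minimal bubble sort sketch with comments mapping onto the derivation above (a real implementation would usually stop early once a traversal makes no swaps):

def bubble_sort(items):
    n = len(items)
    for _ in range(n):                    # n traversals
        for i in range(n - 1):            # n - 1 adjacent compare-and-swaps
            if items[i] > items[i + 1]:   # compare-and-swap is O(1)
                items[i], items[i + 1] = items[i + 1], items[i]
    return items                          # n * (n - 1) * O(1) = O(n^2)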
Merge Sorting n Items
Merge-sort works bottom-up, merging items into pairs, pairs into fours, fours into eights, and so on until the list is sorted. Call each such set of operations a "merge-traversal". There can be at most log_2(n) merge traversals since n = 2 ^ log_2(n) and at each level we are doubling the sizes of the sub-lists being merged. Therefore,
O(merge-sort(n)) = log_2(n) * O(merge-traversal(n))
= O(log_2(n) * merge-traversal(n))
Each merge-traversal traverses all the input data once. Each input item is the subject of at least one compare-and-select operation and each compare-and-select operation chooses one of a pair of items to "emit". Hence
O(merge-traversal(n)) = n * O(compare-and-select)
= O(n * compare-and-select)
Each compare-and-select operation takes constant time:
O(compare-and-select) = O(1)
Collecting terms, we get
O(merge-sort(n)) = O(log_2(n) * n * 1)
= O(n * log_2(n))
= O(n * log(n)), since change of log base is
multiplication by a constant.
Ta daaaa!
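For concreteness again, a rough bottom-up merge sort sketch along the lines of the derivation might look like this (names are mine, purely for illustration):

def merge_sort(items):
    items = list(items)
    n = len(items)
    width = 1
    while width < n:                         # ~log2(n) merge-traversals
        merged = []
        for lo in range(0, n, 2 * width):    # each traversal covers all n items
            left = items[lo:lo + width]
            right = items[lo + width:lo + 2 * width]
            i = j = 0
            # compare-and-select: emit the smaller head item, O(1) per item
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
        items = merged
        width *= 2                           # pairs, fours, eights, ...
    return items                             # log2(n) traversals * O(n) each = O(n log n)

# merge_sort([3, 1, 2]) -> [1, 2, 3]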
In class, a simple sort is used as one of our first examples of O(N) runtimes...
But since it goes through one less iteration of the array every time it runs, wouldn't it be something more along the lines of...
Runtime(bubble) = sum from i = 0 to n of (n - i)?
And since asymptotic analysis only counts the biggest of the processes that run one after another, which here would be the N iterations, why isn't this sort O(N) by definition?
The sum of 1 + 2 + ... + N is N*(N+1)/2 ... (high school maths) ... and that approaches (N^2)/2 as N goes to infinity. Classic O(N^2).
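If you want to sanity-check that numerically, here's a quick sketch (the value of n is arbitrary):

n = 1000
total = sum(n - i for i in range(n))   # the (n - i) comparisons per pass
print(total)                           # 500500
print(n * (n + 1) // 2)                # 500500 -- the same thing, roughly n^2 / 2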
I'm not sure where you (or your professor) got the notion that bubble sort is O(n). If your professor had a guaranteed O(n) sort algorithm, they'd be wise to try and patent it :-)
A bubble sort is, by its very nature, O(n^2).
That's because it has to make a full pass of the entire data set, to correctly place the first element.
Then a second pass of N - 1 elements to correctly place the second. And a third pass of N - 2 elements to correctly place the third.
And so on, effectively ending up with close to N * N / 2 operations which, dropping the superfluous 0.5 constant, is O(n^2).
The time complexity of bubble sort is O(n^2).
When determining the complexity, only the fastest-growing term is kept (and constant factors are dropped).