I have a randomly ordered array of 30 elements with only 3 distinct keys (TRUE, FALSE and NULL) that I want to sort using insertion sort. What will be the time complexity? Will it be O(n²), assuming the worst case, or O(n), assuming the best case, since there are only 3 distinct keys?
n refers to the size of the array, not the number of distinct values it can contain. Thus, the complexity is the same:
Worst case: O(n²)
Best case: O(n)
Average case: O(n²)
Having 3 distinct keys will reduce the number of elements you have to check during the "insertion" phase, but only by a constant factor, which does not change the asymptotic running time.
For example, in the average case, instead of checking n elements, the insertion step checks about n/3. That is better, but not asymptotically.
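A quick way to see this empirically, as a minimal sketch (the shift counter and the 2/1/0 stand-ins for TRUE/FALSE/NULL are just for illustration):

```python
def insertion_sort(arr):
    """Plain insertion sort; returns how many element shifts were performed."""
    shifts = 0
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements right until key's position is found.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
            shifts += 1
        arr[j + 1] = key
    return shifts

# 30 elements, only 3 distinct keys (2/1/0 standing in for TRUE/FALSE/NULL):
a = [2, 1, 0] * 10
shifts_30 = insertion_sort(a)
b = [2, 1, 0] * 20  # same pattern, double the size
shifts_60 = insertion_sort(b)
# Doubling n roughly quadruples the shift count: the small number of
# distinct keys only changes the constant factor, not the quadratic growth.
```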
The question:
Merge k sorted arrays, each with n elements, into a single array of size nk in minimum time complexity. The algorithm should be comparison-based. No assumptions about the input should be made.
I know an algorithm that solves the problem in O(nk log k) time, as described here: https://www.geeksforgeeks.org/merge-k-sorted-arrays/.
My question is: can we merge in less than O(nk log k) time, that is, with a running time of o(nk log k)?
So I searched through the internet and found this answer:
Merge k sorted arrays of size n in O(nk) time complexity
That answer claims to divide an array of size k into singletons and merge them into a single array. But this is incorrect: by the same logic, one could claim an algorithm that solves the problem in sqrt(n)·k·log k time, which is o(nk log k); yet with n = 1 this just means sorting an array of size k in O(k log k) time, which doesn't contradict the lower bound on sorting.
So how can I use the lower bound on sorting to refute this claim, namely that sorting an array of size N, with no assumptions on the input, takes at least N log N operations?
The n log n lower bound only applies to comparison-based sorting algorithms (heap sort, merge sort, etc.). There are, of course, sorting algorithms with better time complexities (such as counting sort), but they are not comparison-based.
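For reference, the O(nk log k) heap-based merge mentioned in the question can be sketched like this (assuming the input is given as Python lists; the function name is illustrative):

```python
import heapq

def merge_k_sorted(arrays):
    """Merge k sorted arrays with a min-heap of size k.

    Each of the nk elements costs one O(log k) heap push and pop,
    giving the O(nk log k) total mentioned above."""
    result = []
    heap = []
    # Seed the heap with the first element of each non-empty array.
    for idx, arr in enumerate(arrays):
        if arr:
            heapq.heappush(heap, (arr[0], idx, 0))
    while heap:
        value, src, pos = heapq.heappop(heap)
        result.append(value)
        # Replace the popped element with its successor from the same array.
        if pos + 1 < len(arrays[src]):
            heapq.heappush(heap, (arrays[src][pos + 1], src, pos + 1))
    return result

print(merge_k_sorted([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```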
My fellow students and I have been debating for a good while what the big-O notation for this is:
Creating a hashtable with values by iterative insertion (the number of elements is known at the beginning) in the average and worst case.
Average complexity of inserting 1 element is O(1) so inserting n elements in an empty hashtable should be O(n).
Worst case insertion of 1 element is O(n).
So is inserting n elements in an empty hashtable O(n^2) or O(n) and why?
The worst case happens when every insertion results in a collision. The cost of a collision depends on the hash table implementation. The simplest implementation is usually a linked list of all elements that belong to the same hash cell. Insertion of n elements will then cost 1 + 2 + 3 + ... + n time units. This is the sum of an arithmetic series and equals n(n+1)/2 = O(n²). This result can be improved by using more advanced data structures to handle collisions. For example, with an AVL tree the cost of an insertion is O(log n), so for n elements it is O(log 1 + log 2 + ... + log n) = O(log(n!)) = O(n log n), which is significantly better than O(n²).
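The all-collisions worst case can be demonstrated with a toy chained table and a deliberately constant hash function (all names here are illustrative, not a real library API):

```python
class ChainedHashTable:
    """Toy hash table with separate chaining in Python lists.

    Passing a constant hash function forces every insertion into the
    same cell, reproducing the worst case described above."""

    def __init__(self, size=8, hash_fn=hash):
        self.cells = [[] for _ in range(size)]
        self.hash_fn = hash_fn
        self.probes = 0  # comparisons made while scanning chains

    def insert(self, key):
        chain = self.cells[self.hash_fn(key) % len(self.cells)]
        for existing in chain:   # scan for duplicates: O(chain length)
            self.probes += 1
            if existing == key:
                return
        chain.append(key)

n = 100
bad = ChainedHashTable(hash_fn=lambda k: 0)  # every key collides
for k in range(n):
    bad.insert(k)
print(bad.probes)  # 0 + 1 + ... + (n-1) = n(n-1)/2 = 4950
```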
Suppose we have m sets S1, S2, ..., Sm of elements from {1, ..., n}.
Given that m = O(n) and |S1| + |S2| + ... + |Sm| = O(n),
sort all the sets in O(n) time and O(n) space.
I was thinking of using the counting sort algorithm on each set.
Counting sort on each set would be O(|S1|) + O(|S2|) + ... + O(|Sm|) = O(n),
and in the worst case, if one set consists of all n elements, it would still take O(n).
But will it solve the problem, and does it still hold that it uses only O(n) space?
Your approach won't necessarily work in O(n) time. Imagine you have n sets of one element each, where each set holds just the value n. Then each run of counting sort takes Θ(n) time (its counting array has length n), so the total runtime is Θ(n²).
However, you can use a modified counting sort to solve this by effectively doing counting sort on all sets at the same time. Create an array of length n that stores lists of numbers. Then, iterate over all the sets and for each element, if the value is k and the set number is r, append the number r to array k. This process essentially builds up a histogram of the distribution of the elements in the sets, where each element is annotated with the set that it came from. Then, iterate over the arrays and reconstruct the sets in sorted order using logic similar to counting sort.
Overall, this algorithm takes time Θ(n), since it takes Θ(n) time to initialize the array, O(n) total time to distribute the elements, and O(n) time to write them back. It also uses only Θ(n) space, since the array has n cells and a total of O(n) elements are distributed across them.
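A sketch of this shared-histogram approach, assuming the sets are given as lists of integers from 1 to n (function and variable names are illustrative):

```python
def sort_all_sets(sets, n):
    """Sort m sets of values from {1..n} in O(n) total time and space by
    distributing every (value, set index) pair into one shared histogram."""
    # buckets[k] lists the indices of the sets containing value k.
    buckets = [[] for _ in range(n + 1)]
    for r, s in enumerate(sets):
        for value in s:
            buckets[value].append(r)
    # Rebuild each set in sorted order with one pass over the histogram,
    # exactly as in the write-back phase of counting sort.
    result = [[] for _ in sets]
    for value in range(1, n + 1):
        for r in buckets[value]:
            result[r].append(value)
    return result

print(sort_all_sets([[3, 1], [2], [3, 2]], n=3))
# [[1, 3], [2], [2, 3]]
```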
Hope this helps!
I just read the Wikipedia page about bucket sort. The article says that the worst-case complexity is O(n²), but I thought the worst-case complexity was O(n + k), where k is the number of buckets. This is how I calculate this complexity:
Adding an element to a bucket: using a linked list this is O(1)
Going through the input list and putting each element in the correct bucket: O(n)
Merging the buckets: O(k)
O(1) · O(n) + O(k) = O(n + k)
Am I missing something?
In order to merge the buckets, they first need to be sorted. Consider the pseudocode given in the Wikipedia article:
function bucketSort(array, n) is
    buckets ← new array of n empty lists
    for i = 0 to (length(array)-1) do
        insert array[i] into buckets[msbits(array[i], k)]
    for i = 0 to n - 1 do
        nextSort(buckets[i])
    return the concatenation of buckets[0], ..., buckets[n-1]
The nextSort(buckets[i]) call sorts each of the individual buckets. Generally, a different sort is used for the buckets (e.g., insertion sort), since once the buckets get small, simple non-recursive sorts often give better performance.
Now, consider the case where all n elements end up in the same bucket. If we use insertion sort to sort individual buckets, this could lead to the worst case performance of O(n^2). I think the answer must be dependent on the sort you choose to sort the individual buckets.
What if the algorithm decides that every element belongs in the same bucket? In that case, the linked list in that bucket needs to be traversed every time an element is added. That takes 1 step, then 2, then 3, 4, 5, ..., n. Thus the time is the sum of the numbers from 1 to n, which is (n² + n)/2, i.e., O(n²).
Of course, this is "worst case" (all the elements in one bucket) - the algorithm to calculate which bucket to place an element is generally designed to avoid this behavior.
If you can guarantee that each bucket represents a unique value (i.e., it holds only equivalent items), then the worst-case time complexity would be O(n + k), as you pointed out.
Bucket sort assumes that the input is drawn from a uniform distribution. This implies that only a few items fall in each bucket, which in turn leads to a nice average running time of O(n). Indeed, if the n elements are distributed so that O(1) of them fall in each bucket (insertion requires O(1) per item), then sorting a bucket with insertion sort requires, on average, O(1) as well (this is proved in almost all textbooks on algorithms). Since you must sort n buckets, the average complexity is O(n).
Now, assume that the input is not drawn from a uniform distribution. As already pointed out by @mfrankli, this may lead in the worst case to a situation in which all of the items fall, for example, into the first bucket. In this case, insertion sort will require O(n²) in the worst case.
Note that you may use the following trick to maintain the same average O(n) complexity while providing O(n log n) complexity in the worst case. Instead of insertion sort, simply use an algorithm with O(n log n) worst-case complexity: either merge sort or heap sort (but not quick sort, which achieves O(n log n) only on average).
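A sketch of that trick, using Python's built-in sorted() (Timsort, a merge-sort variant with O(b log b) worst case) as the per-bucket sort, and assuming the inputs are floats in [0, 1):

```python
def bucket_sort(array, num_buckets=10):
    """Bucket sort for floats in [0, 1).

    Buckets are sorted with sorted() (Timsort), so even if every element
    lands in one bucket the total stays O(n log n) instead of O(n^2)."""
    buckets = [[] for _ in range(num_buckets)]
    for x in array:
        buckets[int(x * num_buckets)].append(x)  # distribute: O(n) total
    result = []
    for b in buckets:
        result.extend(sorted(b))                 # nextSort on each bucket
    return result

data = [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]
print(bucket_sort(data))
# [0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]
```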
This is an add-on answer to @perreal. I tried to post it as a comment but it's too long. @perreal correctly points out when bucket sort makes the most sense. The different answers are making different assumptions about what data is being sorted. E.g., if the keys to be sorted are strings, then the range of possible keys will be too large (larger than the bucket array), and we will have to use only the first character of the string for the bucket positions, or some other strategy. The individual buckets will have to be sorted because they hold items with different keys, leading to O(n²).
But if we are sorting data where the keys are integers in a known range, then the buckets are always already sorted because the keys in the bucket are equal, which leads to the linear time sort. Not only are the buckets sorted, but the sort is stable because we can pull items out of the bucket array in the order they were added.
The thing that I wanted to add is that if you are facing O(n^2) because of the nature of the keys to be sorted, bucket sort might not be the right approach. When you have a range of possible keys that is proportional to the size of the input, then you can take advantage of the linear time bucket sort by having each bucket hold only 1 value of a key.
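That one-key-per-bucket case can be sketched as follows, assuming integer keys in a known range and arbitrary payloads (function name and sample data are illustrative):

```python
def bucket_sort_int_keys(items, key_range):
    """Stable linear-time bucket sort for (key, payload) pairs whose
    integer keys lie in [0, key_range).

    With one bucket per key value, every bucket holds only equal keys,
    so the buckets never need sorting: total time is O(n + k)."""
    buckets = [[] for _ in range(key_range)]
    for key, payload in items:
        buckets[key].append((key, payload))  # preserves insertion order
    # Concatenating the buckets yields a stable sorted result.
    return [pair for bucket in buckets for pair in bucket]

pairs = [(2, 'a'), (0, 'b'), (2, 'c'), (1, 'd')]
print(bucket_sort_int_keys(pairs, key_range=3))
# [(0, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Note that equal-keyed items (2, 'a') and (2, 'c') keep their original relative order, which is the stability property mentioned above.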
I have 2 arrays
a of length n
b of length m
Now I want to find all elements that are common to both arrays.
Algorithm:
Build a hash map consisting of all the elements of a.
Then, for each element x in b, check whether x is present in the hash map.
Analyzing overall time complexity
for building the hash map: O(n)
for the second step: O(m)
So the overall complexity is O(m + n). Am I correct?
And what does O(m + n) reduce to when m is much larger than n, or vice versa?
O(m) + O(n) = O(m + n); if you know that m > n, then O(m + n) = O(m + m) = O(m).
Note: hashes theoretically don't guarantee O(1) lookup, but in practice you can count on it (it is the average complexity, i.e., the expected runtime for a random input).
Also note that your algorithm will repeatedly report duplicated elements of b that are also present in a. If this is a problem, you have to record in the hash that you have already checked/printed that element.
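One way to handle that, as a minimal sketch: remove each value from the hash set once it has been reported (using a Python set here to stand in for the hash map, since only membership is needed):

```python
def common_elements(a, b):
    """Report each element of b that also occurs in a, once per distinct
    value, in expected O(m + n) time."""
    seen = set(a)           # build the hash set: expected O(n)
    common = []
    for x in b:             # probe each element: expected O(m)
        if x in seen:
            common.append(x)
            seen.remove(x)  # avoid reporting duplicates in b twice
    return common

print(common_elements([1, 2, 3, 4], [2, 2, 4, 5]))  # [2, 4]
```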
Average-case time complexity is O(m + n). This is what you should consider in an implementation, since hash maps usually don't have many collisions. Note that O(m + n) = O(max(m, n)).
However, if this is a test question, then by "time complexity" people usually mean worst-case time complexity. The worst-case time complexity is O(mn), since each lookup in the second step can take O(n) time in the worst case.