Time complexity of sort with a limit - algorithm

Let's take three different sorts: Selection, Bubble, and Quick. Here is how they perform with an array of size n:
Selection: O(n^2) in the average and worst cases
Bubble: O(n^2) in the average and worst cases
Quicksort: O(n log n) in the average case, O(n^2) in the worst case
With a limit of L applied to them, what would be the time complexity of the sort? My guess was the answer would be: O(nL) and O(n log L) -- is that correct? Why or why not? Additionally, is there a particular type of array sort that performs better than others when doing a sort-with-limit?
For example:
a = [1, 4, 2, 7, 4, 22, 49, 0, 2]
sort(a, limit=4)
==> [0, 1, 2, 2]

Selection sort:
The time complexity for selection sort is going to be O(nL). In selection sort, we append the next smallest element in the unsorted part of the list to the sorted part of the list. Since we only need L sorted elements, we only need to iterate L times. In each iteration, we have to iterate through the unsorted part of the list to find the smallest element, so we need a total of n*L iterations. This leads to O(nL).
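For illustration, here is a minimal Python sketch of selection sort cut off after L passes. A sort-with-limit is not a standard library routine; the function name and signature are made up for this example.

def selection_sort_limited(a, limit):
    a = list(a)
    L = min(limit, len(a))
    for i in range(L):                      # only L passes instead of n
        m = i
        for j in range(i + 1, len(a)):      # scan the unsorted tail: O(n) per pass
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a[:L]                            # the L smallest elements, in order

# selection_sort_limited([1, 4, 2, 7, 4, 22, 49, 0, 2], 4) == [0, 1, 2, 2]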
Bubble sort:
The time complexity for bubble sort is also going to be O(nL). The regular bubble sort moves the current largest element to the end of the list after each pass. We can modify bubble sort so that in each pass we move the smallest element to the front of the list: instead of starting from the start of the array, we sweep from the end of the array. We only need L passes to produce the L smallest elements in sorted order, so the time complexity becomes O(nL).
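The same kind of sketch for the modified bubble sort: each pass sweeps from the end of the array toward the front, so after L passes the first L positions hold the L smallest elements in order.

def bubble_sort_limited(a, limit):
    a = list(a)
    n = len(a)
    L = min(limit, n)
    for i in range(L):                       # only L passes
        for j in range(n - 1, i, -1):        # bubble the smallest of a[i..n-1] to position i
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
    return a[:L]

# bubble_sort_limited([1, 4, 2, 7, 4, 22, 49, 0, 2], 4) == [0, 1, 2, 2]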
Quick sort:
Quick sort divides the array into two parts and sorts each part recursively. The time complexity will still be O(n log n) because there is no "checkpoint" in quick sort: at no point during the recursion do we have a sorted prefix containing just the L smallest elements.
Merge sort:
The idea here is similar to that of quick sort. We have to reach the base case of the recursion and merge the results to get a fully sorted list, so we still have O(n log n) running time.

Related

How to construct the order? How to sort tuples?

This question asks me to design a deterministic algorithm that would run in theta(n log n) time to do the following:
There was a race, and the order in which the racers finished is to be reconstructed from this info: each runner reports his own number, a, and the number of the runner immediately ahead of him, b, giving <a,b> pairs. The winner reports b as null.
If the input of the algorithm is n such pairs of <a,b>s, how can we design an algorithm to decide the order in which the runners finished the race?
The hint says to use sorting, but if I sort based on the first values, the a's, then finding the matching second value still makes the algorithm O(n^2). If I sort based on the b's, then searching for the a's also makes the algorithm O(n^2).
How can I do this in theta(n log n)?
Thanks!
Assuming that the racers' numbers are chosen from the set {1, ..., n} (where n is the total number of racers):
Instantiate a 0-based array arr of size n + 1.
For each pair (a,b), do arr[b] := a, interpreting null as 0.
Starting from i := 0, do n times: i := arr[i]. The assigned values of i are exactly the racers' numbers in the correct order.
This clearly has time complexity O(n). So in order to get Θ(n log n) (Theta, not O), just do an irrelevant task as Step 4 which takes Θ(n log n), like sorting n numbers using Heap Sort and ignoring the result.
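A minimal sketch of steps 1-3 in Python, assuming the racer numbers come from {1, ..., n} and the winner's b is reported as None:

def finish_order(pairs):
    n = len(pairs)
    arr = [0] * (n + 1)                 # arr[b] = racer who finished right after racer b
    for a, b in pairs:
        arr[0 if b is None else b] = a
    order, i = [], 0
    for _ in range(n):                  # follow the chain from the winner onward
        i = arr[i]
        order.append(i)
    return order

# finish_order([(2, 1), (1, None), (3, 2)]) == [1, 2, 3]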
If you cannot assume that the racers' numbers are chosen from {1, ..., n}, you first create an associative array from the racers' numbers to {1, ..., n} (and a normal array for the other direction) and then proceed as before, using the associative array to translate the racers' numbers into array indices. A hash table won't do the job since it has Θ(n) (non-amortized) lookup time, which would result in Θ(n^2). Use a self-balancing binary search tree as the associative array instead, which has Θ(log n) lookup time. Creating the tree also takes Θ(n log n), so you get your Θ(n log n) in total even without the dummy step 4 above.

How to find the sum up to the kth element of a sorted sub-array from l to r?

Problem:
We are given an array A[] of size N and Q queries. Each query consists of three integers l, r, k where:
1<=l<=r<=N
1<=k<=(r-l+1)
1<=N,Q<=10^5
Now we want to find the sum of the first k elements of the sorted sub-array from l to r.
For example:
Let N=6 and the array elements be 5, 1, 7, 4, 6, 3,
and let Q=1 with l, r, k given as 2, 5, 3.
Then the sub-array from index 2 to index 5 is {1, 7, 4, 6};
after sorting it becomes {1, 4, 6, 7},
so the sum up to the k=3rd term is (1+4+6) = 11,
so the answer is 11.
I have tried sorting each sub-array and then summing; that takes Q*N*log(N) time in the worst case.
Please help me find a better solution with worst-case time complexity less than Q*N.
One approach would be to preprocess by using mergesort with the modification that we keep a copy of all the sorted intermediate results.
This preprocessing takes O(n log n).
Suppose we started with 32 elements. We would now have:
16 sorted 2 element lists
8 sorted 4 element lists
4 sorted 8 element lists
2 sorted 16 element lists
1 sorted 32 element list.
We can also precompute the prefix sum of each of these lists in O(n log n).
Then when faced with a query from l to r, we can identify log(n) of the preprocessed lists that together will cover all elements from l to r.
We can then use binary search to find the value such that there are exactly k smaller elements in the identified lists, and use the prefix sums to calculate the sum.
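A rough Python sketch of this idea, with illustrative helper names and assuming the array values are integers (as in the example) so the binary search can run over the value range: a segment tree whose nodes hold sorted copies of their ranges plus prefix sums, and a query that binary searches for the k-th smallest value in [l, r] and sums everything up to it.

import bisect, heapq

def build_tree(a):
    # segment tree over a: each node stores a sorted copy of its range,
    # pref stores the prefix sums of each sorted copy
    n, size = len(a), 1
    while size < n:
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for i, x in enumerate(a):
        tree[size + i] = [x]
    for i in range(size - 1, 0, -1):
        tree[i] = list(heapq.merge(tree[2 * i], tree[2 * i + 1]))   # linear merge of children
    pref = []
    for node in tree:
        s = [0]
        for x in node:
            s.append(s[-1] + x)
        pref.append(s)
    return tree, pref, size

def count_sum_le(tree, pref, size, l, r, x):
    # number and sum of elements <= x in a[l..r] (0-based, inclusive)
    cnt = tot = 0
    l += size
    r += size + 1
    while l < r:
        if l & 1:
            c = bisect.bisect_right(tree[l], x)
            cnt, tot = cnt + c, tot + pref[l][c]
            l += 1
        if r & 1:
            r -= 1
            c = bisect.bisect_right(tree[r], x)
            cnt, tot = cnt + c, tot + pref[r][c]
        l >>= 1
        r >>= 1
    return cnt, tot

def sum_of_k_smallest(a, tree, pref, size, l, r, k):
    # binary search over the value range for the k-th smallest value in a[l..r],
    # then subtract the copies of that value counted beyond k
    lo, hi = min(a), max(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if count_sum_le(tree, pref, size, l, r, mid)[0] >= k:
            hi = mid
        else:
            lo = mid + 1
    cnt, tot = count_sum_le(tree, pref, size, l, r, lo)
    return tot - (cnt - k) * lo

# a = [5, 1, 7, 4, 6, 3]; tree, pref, size = build_tree(a)
# sum_of_k_smallest(a, tree, pref, size, 1, 4, 3) == 11   # l=2, r=5, k=3 in the problem's 1-based terms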
If O(Q) >= O(log N):
Sort the original array indices by the sorting order of their elements, for instance:
values: [50, 10, 20, 40, 30] -> [10, 20, 30, 40, 50]
indices: [#0, #1, #2, #3, #4] -> [#1, #2, #4, #3, #0]
Then, for each query, scan the sorted indices left to right, and add the values of the first k elements that you encounter whose indices are within range ([l, r]). The complexity will be O(QN + N log N) = O(QN) -- again, provided that O(Q) >= O(log N).
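For reference, a minimal sketch of this scan-based approach (names are illustrative; l and r are taken 1-based as in the problem statement):

def answer_queries(a, queries):
    # indices of a ordered by ascending value: O(N log N), done once
    order = sorted(range(len(a)), key=lambda i: a[i])
    out = []
    for l, r, k in queries:              # each query is an O(N) scan
        total = taken = 0
        for i in order:
            if l - 1 <= i <= r - 1:      # index falls inside [l, r]
                total += a[i]
                taken += 1
                if taken == k:
                    break
        out.append(total)
    return out

# answer_queries([5, 1, 7, 4, 6, 3], [(2, 5, 3)]) == [11]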
There's a way to improve on this by sorting the query ranges first, and then execute all queries in a single scan of the sorted indices, but there's no way to predict how this will affect complexity, short of knowing something special about the lengths of the ranges.

Find number(s) repeated k times in unsorted array

We have an array that contains integers, and I would like to find the numbers that are repeated k times in this array. The array is not sorted, and the numbers are not bounded.
Example,
A(20, 6, 99, 3, 6, 2, 1, 11, 41, 31, 99, 6, 7, 8, 99, 10, 99, 6)
Find the numbers repeated more than 3 times.
Answer: 6,99
Is there a possible answer using bitwise operations (xor) or some combination? Efficiency in terms of big-O running time is required, as well as the space used.
This is not homework, it's simply an interesting problem.
Patrick87 probably has the most straightforward answer in the comments, so I'll give another approach.
As you iterate over your list, you create and insert into a map an element (id, value), where the id is the number itself and the value is a count initialized to 1 upon insertion. If the key is already present in the map, you just increment its counter.
Insertions into your map will take O(log k) time where k is the size of the map. Your total map creation time is therefore O(n log n).
After the map is created, you can iterate over it, and output any number whose count == target count.
Time complexity = O(n) + O(n log n) + O(n) = O(n log n)
Space complexity = O(n) + O(n) = O(n)
If you're looking for better space complexity, you're not going to get it... Even if the numbers are read from a stream, you need to track individual values which is O(n).
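A minimal sketch of the counting idea in Python; collections.Counter is a hash map rather than the ordered map described above, so insertion is expected O(1) instead of O(log k), but the overall approach is the same. Whether the test is == k or >= k depends on how "repeated k times" is read.

from collections import Counter

def repeated_at_least_k(a, k):
    counts = Counter(a)                      # one pass to build the counts
    return [x for x, c in counts.items() if c >= k]

# repeated_at_least_k([20, 6, 99, 3, 6, 2, 1, 11, 41, 31, 99, 6, 7, 8, 99, 10, 99, 6], 4) -> [6, 99]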

Finding n-th biggest product in a large matrix of numbers, fast

I'm working on a sorting/ranking algorithm that works with quite a large number of items, and I need to implement the following in an efficient way to make it work:
There are two lists of numbers. They are equally long, about 100-500 thousand items. From these I need to find the n-th biggest product between the lists, i.e. if you create a matrix with one list across the top and the other down the side, each cell is the product of the number above and the number on the side.
Example: The lists are A=[1, 3, 4] and B=[2, 2, 5]. Then the products are [2, 2, 5, 6, 6, 15, 8, 8, 20]. If I wanted the 3rd biggest from that it would be 8.
The naive solution would be to simply generate those numbers, sort them and then select the n-th biggest. But that is O(m^2 * log m^2) where m is the number of elements in the small lists, and that is just not fast enough.
I think what I need is to first sort the two lists. That is O(m * log m). Then I know for sure that the biggest one is A[0]*B[0]. The second biggest one is either A[0]*B[1] or A[1]*B[0], ...
I feel like this could be done in O(f(n)) steps, independent of the size of the matrix. But I can't figure out an efficient way to do this part.
Edit: There was an answer that got deleted, which suggested to remember position in the two sorted sets and then look at A[a]*B[b+1] and A[a+1]*B[b], returning the bigger one and incrementing a/b. I was going to post this comment before it got deleted:
This won't work. Imagine two lists A=B=[3,2,1]. This will give you a matrix like [9,6,3 ; 6,4,2 ; 3,2,1]. So you start at (0,0)=9, go to (0,1)=6, and then the choice is (0,2)=3 or (1,1)=4. However, this will miss (1,0)=6, which is bigger than both. So you can't just look at the two neighbors; you have to backtrack.
I think it can be done in O(n log n + n log m). Here's a sketch of my algorithm, which I think will work. It's a little rough.
Sort A descending. (takes O(m log m))
Sort B descending. (takes O(m log m))
Let s be min(m, n). (takes O(1))
Create s lazy sequence iterators L[0] through L[s-1]. L[i] will iterate through the s values A[i]*B[0], A[i]*B[1], ..., A[i]*B[s-1]. (takes O(s))
Put the iterators in a priority queue q. The iterators will be prioritized according to their current value. (takes O(s) because initially they are already in order)
Pull n values from q. The last value pulled will be the desired result. When an iterator is pulled, it is re-inserted in q using its next value as the new priority. If the iterator has been exhausted, do not re-insert it. (takes O(n log s))
In all, this algorithm will take O(m log m + (s + n)log s), but s is equal to either m or n.
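A rough Python sketch of this scheme (heapq is a min-heap, so priorities are stored negated). Each (value, row, column) entry plays the role of a lazy iterator, and the same state doubles as the pointer array described in the next answer. It assumes non-negative inputs, as in the question's example, so only the first min(m, n) rows can contain the answer.

import heapq

def nth_largest_product(A, B, n):
    A = sorted(A, reverse=True)
    B = sorted(B, reverse=True)
    m, s = len(A), min(len(A), n)
    # one entry per row i: its current (largest not-yet-consumed) product B[i]*A[0]
    heap = [(-B[i] * A[0], i, 0) for i in range(s)]
    heapq.heapify(heap)
    value = None
    for _ in range(n):                    # n pops, each O(log s)
        neg, i, p = heapq.heappop(heap)
        value = -neg
        if p + 1 < m:                     # advance this row's "iterator"
            heapq.heappush(heap, (-B[i] * A[p + 1], i, p + 1))
    return value

# nth_largest_product([1, 3, 4], [2, 2, 5], 3) == 8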
I don't think there is an O(f(n)) algorithm that is independent of m.
But there is a relatively fast O(n log m) algorithm:
At first, we sort the two arrays so that A[0] > A[1] > ... > A[m-1] and B[0] > B[1] > ... > B[m-1]. (This is O(m log m), of course.)
Then we build a max-heap, whose elements are A[0]*B[0], A[0]*B[1], ... A[0]*B[m-1]. And we maintain a "pointer array" P[0], P[1], ... P[m-1]. P[i]=x means that B[i]*A[x] is in the heap currently. All the P[i] are zero initially.
In each iteration, we pop the max element from the heap, which is the next largest product. Assuming it comes from B[i]*A[P[i]] (we can record which B[i] each heap element comes from), we then move the corresponding pointer forward: P[i] += 1, and push the new B[i] * A[P[i]] into the heap. (If P[i] moves out of range (>= m), we simply push a -inf into the heap.)
After the n-th iteration, we get the n-th largest product.
There are n iterations, and each one is O(log m).
You don't need to sort the 500 000 elements to get the top 3.
Just take the first 3, put them in a SortedList, and iterate over the rest of the list, replacing the smallest of the 3 elements with the new value if that value is higher, and re-sorting the resulting list.
Do this for both lists, and you'll end up with a 3*3 matrix of products, from which it should be easy to take the 3rd value.
Here is an implementation in scala.
If we assume n is smaller than m, and A=[1, 3, 4] and B=[2, 2, 5], n=2:
You would take (3, 4) => sort them (4,3)
Then take (2,5) => sort them (5, 2)
You could now do a zipped search. Of course the biggest product now is (5, 4). But the next one is either 4*2 or 5*3. For longer lists, you could remember the result of 4*2 and compare it only against the next product taken the other way. That way you would only compute one product too many.

How to sort an array according to another array?

Suppose A={1,2,3,4}, p={36,3,97,19}, sort A using p as sort keys. You can get {2,4,1,3}.
It is an example in the book Introduction to Algorithms. It says it can be done in O(n log n).
Can anyone give me some idea about how it can be done? My thought is that you need to keep track of each element in p to find where it ends up, e.g. if p[1] ends up at position 3 then A[1] ends up at A[3]. Can anyone use merge sort or another O(n log n) sort to get this done?
I'm new to algorithms and find this a little intimidating :( Thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and actually, if you're only counting comparisons you don't have to do any additional comparisons, just data swaps in two arrays instead of one), so that doesn't change the Big-O complexity of the overall sort because O( 2n log n ) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
Of course you can also skip index array entirely and simply treat the array A in the same way, sorting it along side p.
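As a minimal sketch of the index-array idea above, in Python the whole thing collapses to sorting the indices by their keys in p and reading A back out in that order:

def sort_by_keys(A, p):
    idx = sorted(range(len(p)), key=lambda j: p[j])   # O(n log n); idx == [1, 3, 0, 2] for the example
    return [A[j] for j in idx]                        # O(n) reconstruction

# sort_by_keys([1, 2, 3, 4], [36, 3, 97, 19]) == [2, 4, 1, 3]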
I don't see this as difficult: since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of the elements in array A according to the elements in p. You won't need any comparisons in addition to the ones already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays and you are done.

Resources