What is the time complexity when an array sorted in ascending order is passed to the Reversort algorithm?

The Reversort algorithm is defined as follows:
Reversort(L):
  for i := 1 to length(L) - 1
    j := position with the minimum value in L between i and length(L), inclusive
    Reverse(L[i..j])
I understand that the time complexity is O(n^2) for an arbitrary array.
But for an array which is already sorted in ascending order, what is the complexity?
Will it remain the same, or will it become O(n)?

It still takes quadratic time. Not because of the reversals, since j will always equal i, so each reversal takes O(1), but because of finding the minimum values.
(Finding the minima could be done faster if you, for example, additionally kept the remaining elements in a min-heap, leading to overall O(n log n) time, but that would really have to be stated. As it is written, the algorithm does a full search through the remaining part each time.)
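To see this concretely, here is a small Python sketch of Reversort with a comparison counter added purely for illustration (the counter is my addition, not part of the algorithm): on an already sorted input every reversal is a no-op, yet it still performs 1 + 2 + ... + (n-1) comparisons.
def reversort_comparisons(L):
    # Run Reversort on a copy of L and return how many comparisons it made.
    L = list(L)
    comparisons = 0
    for i in range(len(L) - 1):
        # Find the position of the minimum in L[i:]; this scan is what stays quadratic.
        j = i
        for k in range(i + 1, len(L)):
            comparisons += 1
            if L[k] < L[j]:
                j = k
        # On a sorted input j == i, so this reversal is a no-op.
        L[i:j + 1] = L[i:j + 1][::-1]
    return comparisons

print(reversort_comparisons(list(range(10))))  # 45 comparisons = 9 + 8 + ... + 1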

Related

Algorithmic complexity of generating Hamming numbers (not codes)

Hamming numbers are numbers of the form 2^a * 3^b * 5^c. If I want to generate the nth Hamming number, one way to do this is to use a min-heap and a hash set, as in the following pseudo-code:
heap = [1]
seen = hashset()
answer = []
while len(answer) < n:
    x = heap.pop()
    answer.append(x)
    for f in [2,3,5]:
        if f*x not in seen:
            heap.push(f*x)
            seen.add(f*x)
return answer[-1]
I think this is O(n log n) in time complexity: each time we run the body of the while loop, we do one pop and up to three pushes, each of which is logarithmic in the size of the heap, and the size of the heap is at worst linear in the number of times we've run the while loop.
Questions:
Is my timing analysis correct?
Is there an algorithm which can do this faster, i.e. with time complexity linear in n? What is the best possible time complexity for this problem?
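For reference, here is a directly runnable Python version of the pseudo-code above, using heapq; it is just a translation of the approach described, not a claim about the best possible algorithm.
import heapq

def nth_hamming(n):
    heap = [1]
    seen = {1}
    answer = []
    while len(answer) < n:
        x = heapq.heappop(heap)          # pop: O(log(heap size))
        answer.append(x)
        for f in (2, 3, 5):
            if f * x not in seen:        # hash-set lookup: O(1) expected
                heapq.heappush(heap, f * x)
                seen.add(f * x)
    return answer[-1]

print(nth_hamming(10))  # 12, since the sequence starts 1, 2, 3, 4, 5, 6, 8, 9, 10, 12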

Time Complexity when processing output

I'm struggling to figure out what the time complexity for this code would be.
from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    res = []
    for i in input_list:
        if i < 10:
            res.append(i)
    res.sort()
    return res
Since the loop iterates over every element of input_list, I think the best case should be O(n). What I'm not sure about is how sorting the result list affects the time complexity of the entire function. Is the worst case O(n log n) (all numbers in the input are under 10, so the result list is the same size as the input list)? And what would be the average case?
EDIT: Changed the input name from n to input_list and added type hints; sorry if that caused any confusion.
Your first observation is correct: iterating over the input collection is an O(N) operation, where N here is the length of input_list. The running time of the sort operation at the end depends on how large the res list is. In the worst-case scenario, every number in the input is less than 10 and therefore ends up in res. The internal algorithm Python uses for sort() is Timsort, a hybrid of merge sort and insertion sort (q.v. this SO question), which runs in O(N*lgN) in the worst case. So, in the worst case, your under_ten() function runs in O(N*lgN).
Let N be the length of the list and K the number of elements smaller than 10.
The complexity is O(N + K log K), assuming that append is done in amortized constant time.
In the worst case, K=N, hence O(N Log N), provided the sort truly has a worst case O(N Log N). Otherwise, it could be O(N²).

How to choose the least number of weights to get a total weight in O(n) time

There are n unsorted weights, and I need to find the least number of weights required to reach at least a total weight W.
How do I find them in O(n)?
This problem has many solution methods:
Method 1 - Sorting - O(nlogn)
I guess the most trivial one would be to sort in descending order and then take the first K elements that give a sum of at least W. The time complexity, though, will be O(nlogn).
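A minimal Python sketch of this sorting approach (assuming the weights are positive and a solution exists, i.e. sum(weights) >= W):
def min_weights_by_sorting(weights, W):
    total, count = 0, 0
    for w in sorted(weights, reverse=True):  # O(n log n)
        if total >= W:
            break
        total += w
        count += 1
    return count  # number of weights taken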
Method 2 - Max Heap - O(n + klogn)
Another method would be to use a max heap.
Creating the heap takes O(n), and then we extract elements until we reach a total sum of at least W. Each extraction takes O(logn), so the extraction phase is O(klogn), where k is the number of elements we had to extract from the heap, for a total of O(n + klogn).
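A sketch of the same idea in Python (again assuming a solution exists); heapq only provides a min-heap, so the values are negated to simulate a max-heap, which is my workaround rather than part of the answer:
import heapq

def min_weights_max_heap(weights, W):
    heap = [-w for w in weights]
    heapq.heapify(heap)                    # O(n) batch build
    total, count = 0, 0
    while heap and total < W:
        total += -heapq.heappop(heap)      # O(log n) per extraction, k extractions
        count += 1
    return count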
Method 3 - Using Min Heap - O(nlogk)
Adding this method that JimMischel suggested in the comments below.
Create a min-heap with the first k elements of the list that sum to at least W. Then iterate over the remaining elements, and if an element is greater than the minimum (the heap top), swap them.
At this point, we might be holding more elements than we actually need to reach W, so we just extract minimums until we reach our limit.
find_min_set(A,W)
  currentW = 0
  heap H //Create empty heap
  for each Elem in A
    if (currentW < W)
      H.add(Elem)
      currentW += Elem
    else if (Elem > H.top())
      currentW += (Elem-H.top())
      H.pop()
      H.add(Elem)
  while (currentW-H.top() > W)
    currentW -= H.top()
    H.pop()
This method might be even faster in practice, depending on the relation between k and n. See when theory meets practice.
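A rough Python translation of the pseudocode above using heapq (a sketch of the idea rather than production code; it assumes W > 0 and sum(A) >= W, and the heap size at the end is the answer):
import heapq

def find_min_set(A, W):
    current = 0
    heap = []
    for elem in A:
        if current < W:
            heapq.heappush(heap, elem)
            current += elem
        elif elem > heap[0]:
            # Replace the current minimum with the larger element.
            current += elem - heapq.heapreplace(heap, elem)
    # Drop the smallest kept weights while the remainder still exceeds W.
    while heap and current - heap[0] > W:
        current -= heapq.heappop(heap)
    return len(heap)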
Method 4 - O(n)
The best method I could think of is to use some kind of quickselect while keeping track of the total weight and always partitioning with the median as a pivot.
First, let's define a few things:
sum(A) - The total sum of all elements in array A.
num(A) - The number of elements in array A.
med(A) - The median of the array A.
find_min_set(A,W,T)
  // Partition A:
  // L contains all the elements of A that are less than med(A)
  // R contains all the elements of A that are greater than or equal to med(A)
  L, R = partition(A,med(A))
  if (sum(R) == W)
    return T + num(R)
  if (sum(R) > W)
    return find_min_set(R,W,T)
  if (sum(R) < W)
    return find_min_set(L,W-sum(R),num(R)+T)
Call this method as find_min_set(A,W,0).
Runtime Complexity:
Finding median is O(n).
Partitioning is O(n).
Each recursive call is taking half of the size of the array.
Summing it all up, we get the recurrence T(n) = T(n/2) + O(n), which is the same as the average case of quickselect: O(n).
Note: When all values are unique, both the worst-case and the average complexity are indeed O(n). With possible duplicate values, the average complexity is still O(n), but the worst case is O(nlogn) when using the median-of-medians method for selecting the pivot.
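For illustration, here is a Python sketch of this approach. To keep it short it uses a random pivot instead of the exact median (that substitution is mine; the exact median, e.g. via median of medians, is what the answer relies on for the worst-case bound), and it assumes W > 0 and that a solution exists, i.e. sum(A) >= W.
import random

def find_min_set(A, W, T=0):
    # Base case: few elements or all equal; greedily take the largest until W is reached.
    if len(A) <= 2 or len(set(A)) == 1:
        total, count = 0, 0
        for w in sorted(A, reverse=True):
            if total >= W:
                break
            total += w
            count += 1
        return T + count
    pivot = random.choice(A)           # the answer uses med(A) here
    L = [x for x in A if x < pivot]    # O(n) partition
    R = [x for x in A if x >= pivot]
    s = sum(R)
    if s == W:
        return T + len(R)
    if s > W:
        return find_min_set(R, W, T)             # the answer lies entirely in the larger half
    return find_min_set(L, W - s, T + len(R))    # take all of R, find the rest in L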

Complexity of finding the median using 2 heaps

A way of finding the median of a given set of n numbers is to distribute them between 2 heaps: a max-heap containing the lower ceil(n/2) numbers, and a min-heap containing the rest. If maintained in this way, the median is the max of the first heap (together with the min of the second heap if n is even). Here's my C++ code that does this:
priority_queue<int, vector<int> > left;
priority_queue<int, vector<int>, greater<int> > right;
cin >> n; // n = number of items
for (int i = 0; i < n; i++) {
    cin >> a;
    if (left.empty())
        left.push(a);
    else if (left.size() <= right.size()) {
        if (a <= right.top())
            left.push(a);
        else {
            left.push(right.top());
            right.pop();
            right.push(a);
        }
    }
    else {
        if (a >= left.top())
            right.push(a);
        else {
            right.push(left.top());
            left.pop();
            left.push(a);
        }
    }
}
We know that the heapify operation has linear complexity. Does this mean that if we insert numbers one by one into the two heaps, as in the above code, we are finding the median in linear time?
Linear time heapify is for the cost of building a heap from an unsorted array as a batch operation, not for building a heap by inserting values one at a time.
Consider a min heap into which you insert a stream of values in decreasing order. Each new value is smaller than everything already in the heap, so it sifts all the way up from the bottom to the root. Consider just the last half of the values inserted: at that point the heap has very nearly its full height, which is log(n), so each of those values sifts through about log(n) levels, and the cost of inserting n/2 values is O(n log(n)).
Your median-finding algorithm has the same problem. If I present a stream of values in increasing order to it, roughly every other value triggers a rebalance: the minimum is popped off the right heap (O(log n)) and pushed into the left max-heap, where it sifts up to the top (another O(log n)). That is a Θ(log n) cost per element, so the cost of the median finding is O(n log(n)).
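As an aside, this distinction is visible in Python's heapq module, which exposes heapify as a separate batch operation; a small illustrative sketch (my example, not taken from the answer):
import heapq

data = list(range(100_000, 0, -1))  # a decreasing stream: worst case for one-at-a-time min-heap insertion

# Batch build: heapify rearranges the existing list in place in O(n).
batch = list(data)
heapq.heapify(batch)

# Incremental build: each new value is smaller than everything already in the
# heap, so it sifts all the way up to the root; n pushes cost O(n log n) total.
incremental = []
for x in data:
    heapq.heappush(incremental, x)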
When there is one element, the complexity of the step is Log 1 because of a single element being in a single heap.
When there are two elements, the complexity of the step is Log 1 as we have one element in each heap.
When there are four elements, the complexity of the step is Log 2 as we have two elements in each heap.
So, when there are n elements, the complexity of the step is Log n, since with n/2 elements in each heap, both adding an element, and removing an element from one heap and adding it to the other, take O(Log n/2) = O(Log n) time.
So keeping track of the median of n elements essentially comes down to performing:
2 * ( Log 1 + Log 2 + Log 3 + ... + Log n/2 ) steps.
The factor of 2 comes from performing the same step in 2 heaps.
The above summation can be handled in two ways. One way gives a tighter bound but it is encountered less frequently in general. Here it goes:
Log a + Log b = Log a*b (By property of logarithms)
So, the summation is actually Log ((n/2)!) = O(Log n!), which by Stirling's approximation is itself Θ(n Log n).
The second way is:
Each of the values Log 1, Log 2, ... Log n/2 is less than or equal to Log n/2
As there are a total of n/2 terms, the summation is less than (n/2) * Log (n/2).
This implies the function is upper-bounded by (n/2) * Log (n/2).
Or, the complexity is O(n * Log n).
The second bound is looser but more well known.
This is a great question, especially since you can find the median of a list of numbers in O(N) time using Quickselect.
But the dual priority-queue approach gives you O(N log N) unfortunately.
Riffing on the binary heap wiki article here: heapify is a bottom-up operation. You have all the data in hand, and this allows you to be cunning and reduce the number of swaps/comparisons to O(N). You can build an optimal structure from the get-go.
Adding elements one at a time, as you are doing here, requires reorganizing the heap every time. That's expensive, so the whole operation ends up being O(N log N).

Count number of identical pairs

An identical pair in an array is a pair of indices p, q such that
0 <= p < q < N and array[p] = array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value, keeping track of the original indices.
Then, for every index p in the sorted array, count all q < N such that
sortedarray[p].index < sortedarray[q].index and
sortedarray[p] = sortedarray[q].
Is this the correct approach? I think the complexity would be
O(N log N) for sorting based on value +
O(N^2) for counting the entries of the sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought was, for every p, to binary search the sorted array for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
Here is my code for the second part:
for (int i = 0; i < N; i++) {
    int j = i + 1;
    while (j < N && sortedArray[j].index > sortedArray[i].index &&
           sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;
#Edit: I think I mistook the complexity of the second part for O(N^2).
Since no iteration of the while loop rescans elements at indices 0..i, a linear scan of the sorted array suffices to count the pairs. The total complexity is therefore
O(N log N) for sorting plus O(N) for the linear counting scan of the sorted array.
You are partially correct. Sorting the array via Merge Sort or Heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
An animated example can be found here.
As you can see in the gif, the reason this algorithm is quadratic is that, for each element, it may have to scan back through all the previously placed elements to find where that element has to go.
On the other hand, in your case you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation, you start scanning (pairwise) from the beginning, and (because the array is ordered) you know that once an element has been scanned and your pointers have moved past it, there can never be a reason to rescan previous elements later; otherwise you would not have moved past it in the first place.
Hence, you scan the whole array only once: O(n)
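For concreteness, here is a short Python sketch of that single pass (my illustration of the idea, not the answerer's code): sort once, then walk the array and, for each run of c equal values, add c*(c-1)/2 pairs.
def count_identical_pairs_sorted(arr):
    arr = sorted(arr)              # O(n log n)
    pairs = 0
    run = 1                        # length of the current run of equal values
    for i in range(1, len(arr)):   # single O(n) pass
        if arr[i] == arr[i - 1]:
            run += 1
        else:
            pairs += run * (run - 1) // 2
            run = 1
    pairs += run * (run - 1) // 2  # close the final run
    return pairs

print(count_identical_pairs_sorted([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # 3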
If you can allocate more memory you can get some gains.
You can reach O(n) by using a hash table which maps any values in the array to a counter indicating how often you already saw this value.
If the allowed values are integers in a limited range, you can use a plain array instead of a hash table, with the value i itself serving as the index. In that case the complexity is O(n+m), where m is the number of allowed values (because you must first set all m counters to 0 and then go through all the array entries to count occurrences).
Both methods give you the number of occurrences of each value in your array. Call this number nv_i, the number of appearances of the value i in the array. Then the number of pairs with value i is: (nv_i)*(nv_i-1)/2.
You can pair:
1st i with nv_i-1 others
2nd i with nv_i-2 others
...
last i with 0
And (nv_i-1)+(nv_i-2)+...+0 = (nv_i)*(nv_i-1)/2
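A compact Python version of this counting approach, using collections.Counter as the hash table (my sketch, not the answerer's code):
from collections import Counter

def count_identical_pairs(array):
    counts = Counter(array)  # O(n): value -> number of occurrences nv_i
    return sum(c * (c - 1) // 2 for c in counts.values())  # sum of nv_i*(nv_i-1)/2 over all values

print(count_identical_pairs([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # 3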
I've been thinking about this... I think that if you "embed" the == condition into your sorting algorithm, then the complexity is still O(n lg n).
