Algorithmic complexity of generating Hamming numbers (not codes)

Hamming numbers are numbers of the form 2^a * 3^b * 5^c. If I want to generate the nth Hamming number, one way to do this is to use a min-heap and a hash set, as in the following (Python-style) code:
import heapq

def nth_hamming(n):
    heap = [1]
    seen = {1}
    answer = []
    while len(answer) < n:
        x = heapq.heappop(heap)          # smallest Hamming number not yet emitted
        answer.append(x)
        for f in (2, 3, 5):
            if f * x not in seen:
                heapq.heappush(heap, f * x)
                seen.add(f * x)
    return answer[-1]
I think this is O(n log n) in time complexity: each time we run the body of the while loop, we do one pop and up to three pushes, each of which is logarithmic in the size of the heap, and the size of the heap is at worst linear in the number of times we've executed the loop.
Questions:
Is my timing analysis correct?
Is there an algorithm which can do this faster, i.e. with time complexity linear in n? What is the best possible time complexity for this problem?
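For reference (this is my own sketch, not part of the original question), the classic three-pointer merge often attributed to Dijkstra generates the first n Hamming numbers with O(n) arithmetic operations by keeping one index per factor into the output list; the name nth_hamming_linear is just illustrative:

def nth_hamming_linear(n):
    h = [1] * n
    i2 = i3 = i5 = 0                      # index of the next value to multiply by 2, 3, 5
    next2, next3, next5 = 2, 3, 5
    for k in range(1, n):
        h[k] = min(next2, next3, next5)
        if h[k] == next2:
            i2 += 1
            next2 = 2 * h[i2]
        if h[k] == next3:
            i3 += 1
            next3 = 3 * h[i3]
        if h[k] == next5:
            i5 += 1
            next5 = 5 * h[i5]
    return h[n - 1]

Each output value advances whichever pointers produced it, so every element is touched a constant number of times (ignoring the cost of big-integer arithmetic).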

Related

What is the time complexity when an array sorted in ascending order is passed to the Reversort algorithm?

The Reversort algorithm is defined as follows:
Reversort(L):
  for i := 1 to length(L) - 1
    j := position with the minimum value in L between i and length(L), inclusive
    Reverse(L[i..j])
I understand that the time complexity is O(n^2) for an arbitrary array.
But for an array which is already sorted in ascending order, what is the complexity?
Will it remain the same or will it become O(n)?
It still takes quadratic time. Not because of the reversals: since j will always equal i, each reversal takes O(1). The cost comes from finding the minimum values.
(Finding the minima could be done faster if, for example, you additionally kept the remaining elements in a min-heap, leading to O(n log n) time overall, but that would really have to be stated. As written, the algorithm does a full scan through the remaining part of the list each time.)
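As an illustration (my own sketch, not from the question), here is Reversort written out in Python; on an already sorted input every reversal is trivial, but the minimum search still scans the whole remaining suffix, which is where the quadratic cost comes from:

def reversort(L):
    # 0-based indices, unlike the 1-based pseudocode above; sorts L in place.
    for i in range(len(L) - 1):
        # Linear scan for the position of the minimum of L[i:] -- this runs
        # even when the list is already sorted, giving O(n^2) overall.
        j = min(range(i, len(L)), key=lambda k: L[k])
        L[i:j + 1] = reversed(L[i:j + 1])   # O(1) work when j == i (sorted input)
    return L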

Why is the complexity of this algorithm O(n log(n))?

I have a multi-set S of positive numbers that I want to partition into K subsets such that the difference between the subset sums is minimized. One simple heuristic for this problem is the greedy algorithm, which iterates through the numbers sorted in descending order, assigning each of them to whichever subset currently has the smaller sum. My question is: why is the time complexity of this greedy algorithm O(n log(n))?
Determining "whichever subset has the smaller sum of the numbers" takes logarithmic time in the current number of subsets, and you need some sort of priority queue or heap to achieve that efficiently. In the worst case the number of subsets is O(n), so as the input values are processed the search costs add up to:
O(log(1) + log(2) + log(3) + ... + log(n))
= O(log(n!))
= O(n log n)
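A minimal sketch of that greedy heuristic (my own illustration; names like greedy_partition are made up), using a heap so that finding the subset with the smallest sum costs O(log K) per element, on top of the O(n log n) sort:

import heapq

def greedy_partition(numbers, k):
    sums = [(0, i) for i in range(k)]          # (current sum, subset index)
    heapq.heapify(sums)
    subsets = [[] for _ in range(k)]
    for x in sorted(numbers, reverse=True):    # O(n log n) for the sort
        s, i = heapq.heappop(sums)             # subset with the smallest sum, O(log k)
        subsets[i].append(x)
        heapq.heappush(sums, (s + x, i))       # O(log k)
    return subsets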

Time Complexity when processing output

I'm struggling to figure out what the time complexity for this code would be.
from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    res = []
    for i in input_list:
        if i < 10:
            res.append(i)
    res.sort()
    return res
Since the loop iterates over every element of input_list, I think the best case should be O(n). What I'm not sure about is how sorting the result list affects the time complexity of the entire function. Is the worst case O(n log n) (all numbers in the input are under 10, so the result list is the same size as the input list)? And what would be the average case?
EDIT: Changed the input name from n to input_list and added type hints, sorry if that caused some confusion.
Your first observation is correct: iterating over the input collection is an O(N) operation, where N is the length of input_list. The running time of the sort at the end depends on how large the res list is. In the worst case, every number in the input is less than 10 and therefore ends up in res. The algorithm Python uses for sort() is Timsort, a hybrid of merge sort and insertion sort (see this SO question), which runs in O(N log N) in the worst case. So, in the worst case, your under_ten() function runs in O(N log N).
Let N be the length of the list and K the number of elements smaller than 10.
The complexity is O(N + K log K), assuming that append is done in amortized constant time.
In the worst case, K=N, hence O(N Log N), provided the sort truly has a worst case O(N Log N). Otherwise, it could be O(N²).
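To make the N vs. K distinction concrete, two small examples of the extremes (my own, not from the question):

under_ten(list(range(10)))    # every element is below 10, so K = N and the sort dominates: O(N log N)
under_ten([10, 25, 99])       # nothing is below 10, res stays empty: O(N) overall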

Time complexity of a loop with value increasing in powers of 2

for(i=1;i<=n;i=pow(2,i)) { print i }
What will be the time complexity of this?
The approximate k-th value of i is pow(2, pow(2, pow(2, ... ))) nested k times, i.e. a tower of k twos (tetration).
How can this k-th value of i, constrained to stay below n, be solved for k?
What you have is similar to tetration(2,n), but it is not quite that, because the loop's ending condition is different.
The complexity greatly depends on the domain and implementation. From your sample code I infer real domain and integers.
This function grows really fast, so after 5 iterations you need bigints, where even +, -, *, /, <<, >> are no longer O(1). The implementations of pow and print also have a great impact.
For small n < tetration(2,4) you can assume the complexity is O(1), as there are no asymptotics to speak of for such small n.
Beware that pow is floating-point in most languages, and raising 2 to the power i can be turned into a simple bit shift, so let us assume this instead:
for (i=1;i<=n;i=1<<i) print(i);
We could use the previous state of i to compute 1<<i incrementally, as in
i_new = i << (i - i0);   // i0 = value of i from the previous iteration
but there is no speedup for such big numbers.
Now the complexity of printing i in decimal (decadic print(i)) is one of the following:
O( log(i)) // power of 10 datawords (like 1000000000 for 32 bit)
O((log(i))^2) // power of 2 datawords naive print implementation
O( log(i).log(log(i))) // power of 2 datawords subdivision or FFT based print implementation
The complexity of bit shift 1<<i and comparison i<=n is:
O(log(i)) // power of 2 datawords
So choosing the best print implementation with power-of-2 datawords leads to a per-iteration cost of:
O( log(i).log(log(i)) + log(i) + log(i) ) -> O( log(i).log(log(i)) )
At first look one would think we would need to know the number of iterations k from n:
n = tetration(2,k)
k = slog2(n)
or in Knuth's up-arrow notation, which is directly related to the Ackermann function:
n = 2↑↑k
k = 2↓↓n
but the number of iterations is tiny compared to the complexity of the work inside the loop, and each iteration grows so fast that the previous one is a negligible fraction of the next, so we can ignore all but the last term/iteration...
After all these assumptions I got final complexity:
O(log(n).log(log(n)))
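To see how few iterations the loop actually performs, here is a small Python sketch (my own, using exact big-integer shifts instead of floating-point pow):

def count_iterations(n):
    # i follows 1, 2, 4, 16, 65536, 2**65536, ... (a tower of 2s),
    # so even astronomically large n allow only a handful of iterations.
    i, k = 1, 0
    while i <= n:
        k += 1
        i = 1 << i            # exact big-integer shift
    return k

print(count_iterations(10**100))   # 5: the loop only ever sees i = 1, 2, 4, 16, 65536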

Complexity of finding the median using 2 heaps

A way of finding the median of a given set of n numbers is to distribute them between two heaps: a max-heap containing the lower ceil(n/2) numbers and a min-heap containing the rest. Maintained this way, the median is the max of the first heap (together with the min of the second heap if n is even). Here's my C++ code that does this:
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
    priority_queue<int, vector<int>> left;                   // max-heap: lower half
    priority_queue<int, vector<int>, greater<int>> right;    // min-heap: upper half
    int n, a;
    cin >> n;                                                // n = number of items
    for (int i = 0; i < n; i++) {
        cin >> a;
        if (left.empty())
            left.push(a);
        else if (left.size() <= right.size()) {
            if (a <= right.top())
                left.push(a);
            else {
                left.push(right.top());
                right.pop();
                right.push(a);
            }
        }
        else {
            if (a >= left.top())
                right.push(a);
            else {
                right.push(left.top());
                left.pop();
                left.push(a);
            }
        }
    }
    return 0;
}
We know that the heapify operation has linear complexity. Does this mean that if we insert numbers one by one into the two heaps, as in the above code, we are finding the median in linear time?
Linear time heapify is for the cost of building a heap from an unsorted array as a batch operation, not for building a heap by inserting values one at a time.
Consider a min-heap into which you insert a stream of values in decreasing order. Each new value is smaller than everything already in the heap, so it bubbles all the way up to the root. For the last half of the insertions the heap already has very nearly its full height, which is log(n), so each of those values travels about log(n) levels, and inserting n/2 values costs O(n log(n)).
Something analogous happens in your median-finding algorithm. If I feed it a stream of values in increasing order, each new value is larger than the top of the max-heap, so the rebalancing step repeatedly moves the minimum of the min-heap into the max-heap, where it bubbles all the way to the root: an O(log n) operation each time. The extra deletions and re-insertions are only a constant factor on top, so the overall complexity is still O(n log n).
When there is one element, the cost of a step is Log 1, because of a single element being in a single heap.
When there are two elements, the cost of a step is Log 1, as we have one element in each heap.
When there are four elements, the cost of a step is Log 2, as we have two elements in each heap.
So, when there are n elements, the cost of a step is Log n, as we have n/2 elements in each heap, and
adding an element, as well as
removing an element from one heap and adding it to another,
takes O(Log n/2) = O(Log n) time.
So for keeping track of median of n elements essentially is done by performing:
2 * ( Log 1 + Log 2 + Log 3 + ... + Log n/2 ) steps.
The factor of 2 comes from performing the same step in 2 heaps.
The above summation can be handled in two ways. One way gives a tighter bound, but it is encountered less frequently in general. Here it goes:
Log a + Log b = Log (a*b) (by the product property of logarithms)
So the summation is actually Log ((n/2)!) = O(Log (n!)).
The second way is:
Each of the values Log 1, Log 2, ..., Log (n/2) is less than or equal to Log (n/2).
As there are n/2 terms in total, the summation is less than (n/2) * Log (n/2),
which means the total work is upper-bounded by (n/2) * Log (n/2),
i.e. the complexity is O(n * Log n).
The second bound is looser but better known.
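A quick numeric sanity check (my own) showing that the two bounds track each other for a concrete n:

import math

n = 1_000_000
tight = sum(math.log2(k) for k in range(1, n // 2 + 1))   # log2((n/2)!)
loose = (n / 2) * math.log2(n / 2)                        # (n/2) * log2(n/2)
print(tight, loose)   # same order of magnitude; both are Theta(n log n)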
This is a great question, especially since you can find the median of a list of numbers in O(N) time using Quickselect.
But the dual priority-queue approach gives you O(N log N) unfortunately.
Riffing on the binary heap Wikipedia article: heapify is a bottom-up operation. You have all the data in hand, which allows you to be cunning and reduce the number of swaps/comparisons to O(N); you can build an optimal structure from the get-go.
Adding elements one at a time, as you are doing here, requires reorganizing the heap every time. That's expensive, so the whole operation ends up being O(N log N).
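A small sketch (my own, in Python rather than the C++ above) contrasting the two costs: heapq.heapify builds a heap from an existing list bottom-up in O(N), while pushing the same values one at a time can cost O(N log N):

import heapq

values = list(range(100_000, 0, -1))   # decreasing order: worst case for one-at-a-time pushes into a min-heap

batch = values[:]
heapq.heapify(batch)                   # bottom-up batch build, O(N)

incremental = []
for v in values:
    heapq.heappush(incremental, v)     # each new minimum bubbles to the root: O(log N) per push, O(N log N) total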
