Time Complexity when processing output - algorithm

I'm struggling to figure out what the time complexity for this code would be.
from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    res = []
    for i in input_list:
        if i < 10:
            res.append(i)
    res.sort()
    return res
Since the loop iterates over every element of input_list, I think the best case should be O(n). What I'm not sure about is how sorting the result list affects the time complexity of the entire function. Is the worst case O(n log n) (all numbers in input_list are under 10, so the result list is the same size as the input list)? And what would be the average case?
EDIT: Changed the input name from n to input_list and added type hints; sorry if the old name caused any confusion.

Your first observation is correct: iterating over the input collection is an O(N) operation, where N is the length of input_list. The running time of the sort operation at the end depends on how large the res list is. In the worst case, every number in input_list is less than 10 and therefore ends up in res. The internal algorithm Python uses for sort() is Timsort, a hybrid of merge sort and insertion sort (q.v. this SO question), which runs in O(N log N) in the worst case. So, in the worst case, your under_ten() function runs in O(N log N).

Let N be the length of the list and K the number of elements smaller than 10.
The complexity is O(N + K log K), assuming that append runs in amortized constant time.
In the worst case K = N, hence O(N log N), provided the sort truly has an O(N log N) worst case (Python's Timsort does); otherwise it could be O(N²).
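
To make the two phases explicit, here is a minimal sketch of the same function with each step's cost annotated (the per-line bounds assume CPython's documented list and Timsort behavior):

from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    # Phase 1: filtering scans the whole input once, O(N).
    res = [i for i in input_list if i < 10]
    # Phase 2: Timsort on the K surviving elements, O(K log K).
    res.sort()
    return res  # total: O(N + K log K); worst case K = N gives O(N log N)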

Related

What is the time complexity of an array sorted in ascending order if it is passed to the Reversort algorithm?

The Reversort algorithm is defined as follows:
Reversort(L):
    for i := 1 to length(L) - 1
        j := position with the minimum value in L between i and length(L), inclusive
        Reverse(L[i..j])
I understand that the time complexity is O(n^2) for an array.
But for an array which is already sorted in ascending order, what is the complexity?
Will it remain the same, or will it become O(n)?
It still takes quadratic time. Not because of the reversals, since j will always be i, so each reversal takes O(1). But because of finding the minimum values.
(Finding the minima could be done faster if you, for example, additionally kept the remaining elements in a min-heap, leading to overall O(n log n) time, but that would really have to be stated. As it's written, the algorithm does a full scan through the remaining part each time.)
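
For concreteness, here is a direct Python translation of the pseudocode above (a sketch using 0-based indexing; the function name is just illustrative):

def reversort(L):
    for i in range(len(L) - 1):
        # This scan for the minimum position is what keeps the running
        # time quadratic even when L is already sorted.
        j = min(range(i, len(L)), key=lambda k: L[k])
        # On a sorted input j == i, so this reversal is trivial.
        L[i:j + 1] = L[i:j + 1][::-1]

On sorted input every reversal is a no-op, but the minimum scan still costs Θ(n - i) per iteration, summing to Θ(n²).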

Algorithmic complexity of generating Hamming numbers (not codes)

Hamming numbers are numbers of the form 2^a * 3^b * 5^c. If I want to generate the nth Hamming number, one way to do this is to use a min-heap and a hash set, as in the following code:
import heapq

def nth_hamming(n):
    heap = [1]
    seen = set()
    answer = []
    while len(answer) < n:
        x = heapq.heappop(heap)
        answer.append(x)
        for f in (2, 3, 5):
            if f * x not in seen:
                heapq.heappush(heap, f * x)
                seen.add(f * x)
    return answer[-1]
I think this is O(n log n) in time complexity: each time we run the body of the while loop, we do one pop and up to three pushes, each of which is logarithmic in the size of the heap, and the size of the heap is at worst linear in the number of times we've run the loop.
Questions:
Is my timing analysis correct?
Is there an algorithm which can do this faster, i.e. linear time complexity in n? What is the best-case for time complexity for this problem?
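
For reference on the second question, linear time is achievable with the classic three-pointer merge (often attributed to Dijkstra); a minimal sketch, not from the original thread:

def nth_hamming_linear(n):
    # h[k] is the (k+1)-th Hamming number; each new number is 2x, 3x,
    # or 5x some earlier one, tracked by three pointers.
    h = [1]
    i2 = i3 = i5 = 0
    while len(h) < n:
        nxt = min(2 * h[i2], 3 * h[i3], 5 * h[i5])
        h.append(nxt)
        # Advance every pointer that produced nxt, so duplicates
        # (e.g. 6 = 2*3 = 3*2) are emitted only once.
        if nxt == 2 * h[i2]: i2 += 1
        if nxt == 3 * h[i3]: i3 += 1
        if nxt == 5 * h[i5]: i5 += 1
    return h[n - 1]

This does a constant number of comparisons per Hamming number produced, so it runs in O(n) time, ignoring the growing cost of big-integer arithmetic.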

How to construct the order? How to sort tuples?

This question asks me to design a deterministic algorithm that runs in Θ(n log n) time to do the following:
There was a race, and the order in which the racers finished is to be reconstructed from this info: each runner reports his own number, a, and the number of the runner immediately ahead of him, b, as <a,b> pairs. The winner reports b as null.
If the input of the algorithm is n such <a,b> pairs, how can we design an algorithm to decide the order in which the runners finished the race?
The hint says to use sorting, but if I sort based on the first values, the a's, then finding the corresponding second value still makes the algorithm O(n^2). If I sort based on the b's, then searching for the a's makes the algorithm O(n^2).
How can I do this in Θ(n log n)?
Thanks!
Assuming that the racers' numbers are chosen from the set {1, ..., n} (where n is the total number of racers):
Instantiate a 0-based array arr of size n + 1.
For each pair (a,b), do arr[b] := a, interpreting null as 0.
Starting from i := 0, do n times: i := arr[i]. The assigned values of i are exactly the racers' numbers in the correct order.
This clearly has time complexity O(n). So in order to get Θ(n log n) (Theta, not O), just add an irrelevant Step 4 which takes Θ(n log n), like sorting n numbers using heapsort and ignoring the result.
If you cannot assume that the racers' numbers are chosen from {1, ..., n}, you first create an associative array from the racers' numbers to {1, ..., n} (and a normal array for the other direction) and then proceed as before, using the associative array to translate the racers' numbers into array indices. A hash table won't do the job, since its worst-case lookup time is Θ(n), which would result in Θ(n^2). Use a self-balancing binary search tree as the associative array instead, which has Θ(log n) lookup time. The creation of the tree also takes Θ(n log n), so there you get your Θ(n log n) in total, even without the dummy Step 4 above.
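
A minimal Python sketch of steps 1-3 above, assuming racer numbers 1..n with None standing in for null (function and argument names are illustrative):

def finishing_order(pairs):
    # pairs: list of (a, b) tuples; b is None for the winner.
    n = len(pairs)
    arr = [0] * (n + 1)       # arr[b] = racer who finished right after b
    for a, b in pairs:
        arr[0 if b is None else b] = a
    order, i = [], 0
    for _ in range(n):        # follow the successor chain from the winner
        i = arr[i]
        order.append(i)
    return order

For example, finishing_order([(3, 1), (1, None), (2, 3)]) returns [1, 3, 2].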

Big O Notation - Including Data Structure Costs?

For the purpose of my question, I'll include a sample problem.
Say we need to iterate through a vector of N elements and remove duplicates. So, we'd probably use a set, right? (Let's use a C++ std::set, which is a tree.)
O(N) cost to iterate through each element - then insert into the set data structure.
My question: with a log N cost per insert into the set structure, done N times, is this algorithm O(N log N) or simply O(N)? I was discussing this with a professor, and I'm not sure. The Leetcode/SO/online community seems to disregard data structure costs, but from an academic point of view, N inserts into a red-black tree with log N worst-case cost each - that's log N, N times, no?
For clarification - yes, it'd make more sense to use unordered_set, but that doesn't make my question moot.
Complexities express the count of some reference operation.
For example, you can very well count the inserts into some black-box structure and report O(N) inserts.
But if you focus on, say, comparisons, and you know that an insert costs log N comparisons on average, the total number of comparisons is O(N log N).
Now if you are comparing strings of log N characters each, you will count O(N log² N) character comparisons...
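
To illustrate the idea of counting a chosen reference operation, here is a small instrumented sketch (the setup and names are my own, not from the answer):

def count_comparisons(values):
    # Insert N values into a sorted list, counting only element
    # comparisons (the chosen reference operation).
    comparisons = 0
    sorted_list = []
    for v in values:
        lo, hi = 0, len(sorted_list)
        while lo < hi:              # binary search: ~log2(len) comparisons
            mid = (lo + hi) // 2
            comparisons += 1
            if sorted_list[mid] < v:
                lo = mid + 1
            else:
                hi = mid
        sorted_list.insert(lo, v)   # element shifting is not counted
    return comparisons              # grows roughly like N * log2(N)

Counted this way there are N inserts but O(N log N) comparisons, which is exactly the distinction the answer draws.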
Yes, it is O(n * log(n)). If you have a method like
public void foo(int n) {
    for (int i = 0; i < n; i++) {
        // Call a method that is in O(log n)
        someLogNMethod();
    }
}
then the method foo runs in O(n * log n) time.
Example
There are many non-constructed examples, like computing the median value of an integer array. Take a look at the following solution, which solves the problem by sorting the array first. Sorting is in Θ(n log n) (see comparison-based sorting).
public int median(int[] values) {
    int[] sortedValues = sort(values);
    // Let's ignore special cases (even, empty, ...) for simplicity
    int indexOfMedian = values.length / 2;
    return sortedValues[indexOfMedian];
}
Obviously you wouldn't call this median method Θ(1), even though everything it does besides the sort call runs in constant time.
However, the problem depends on the sort method. You can't find the median of a general array in O(1); you need to include the sort in your analysis. The method thus actually runs in Θ(n log n + 1), which is Θ(n log n).
Note that the problem can actually be solved in Θ(n) (see Find median of unsorted array in O(n) time).
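
The Θ(n) result referred to above is a selection algorithm; here is a minimal quickselect sketch (average-case linear; the guaranteed-linear variant uses median-of-medians pivots instead of random ones):

import random

def quickselect(arr, k):
    # Returns the k-th smallest element (0-based) in expected O(n) time.
    arr = list(arr)
    while True:
        pivot = random.choice(arr)
        lows = [x for x in arr if x < pivot]
        pivots = [x for x in arr if x == pivot]
        if k < len(lows):
            arr = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            arr = [x for x in arr if x > pivot]

The median is then quickselect(values, len(values) // 2).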

Efficient algorithm to determine if two sets of numbers are disjoint

Practicing for software developer interviews and got stuck on an algorithm question.
Given two sets of unsorted integers, one in an array of length m and the other in an array of length n, where m < n, find an efficient algorithm to determine if the sets are disjoint. I've found solutions in O(nm) time, but haven't found any that are more efficient than this, such as O(n log m) time.
Using a data structure that has O(1) lookup/insertion, you can easily insert all elements of the first set.
Then, for each element of the second set, check whether it is present: if any is, the sets are not disjoint; otherwise they are.
Pseudocode
function isDisjoint(list1, list2)
    HashMap = new HashMap();
    foreach(x in list1)
        HashMap.put(x, true);
    foreach(y in list2)
        if(HashMap.hasKey(y))
            return false;
    return true;
This will give you an O(n + m) solution
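
The same idea as a minimal runnable Python sketch, using a built-in set in place of the HashMap:

def is_disjoint(list1, list2):
    seen = set(list1)                          # O(m) to build
    return not any(y in seen for y in list2)   # O(n) average-case lookups

This is O(n + m) on average, matching the pseudocode above.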
Fairly obvious approach - sort the array of length m: O(m log m).
For every element in the array of length n, use binary search to check if it exists in the sorted array: O(log m) per element, O(n log m) in total. Since m < n, the overall O(m log m + n log m) adds up to O(n log m).
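
A short Python sketch of this sort-and-binary-search approach (bisect provides the binary search):

import bisect

def disjoint_sorted(small, large):
    small = sorted(small)                  # O(m log m)
    for y in large:                        # n binary searches, O(n log m)
        i = bisect.bisect_left(small, y)
        if i < len(small) and small[i] == y:
            return False
    return True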
Here's a link to a post that I think answers your question:
3) Sort the smaller array: O((m + n) log m)
    Say m < n: sort A, then binary search for each element of B in A.
    Disadvantage: modifies the input.
Looks like Cheruvian beat me to it, but you can use a hash table to get O(n + m) in the average case:
* Insert all elements of the length-m array into the table, taking (probably) constant time for each, assuming there aren't many elements with the same hash. This step is O(m).
* For each element of the length-n array, check whether it is in the table. If it is, return false. Otherwise, move on to the next. This takes O(n).
* If none are in the table, return true.
As I said before, this works because a hash table gives constant lookup time in the average case. In the rare event that many distinct elements share the same hash, it will take slightly longer. However, most people don't need to care about hypothetical worst cases. For example, quicksort is used more than merge sort because it gives better average performance, despite its O(n^2) worst case.
