Worst-case time complexity of an algorithm with 2+ steps

My goal is to write an algorithm that checks whether an unsorted array of positive integers contains a value x and x^2 and, if so, returns their indices.
I've solved this by proposing that first you sort the array using merge sort, then perform binary search for x, then perform binary search for x^2.
I then wrote that "since binary search has worst-case runtime of O(log n) and merge sort has worst-case runtime of O(n log n), we conclude that the worst-case runtime of this algorithm is O(n log n)." Am I correct in my understanding that when analyzing the overall efficiency of an algorithm that involves steps with different runtimes, we just take the one with the longest runtime? Or is it more involved than this?
Thanks in advance!

Since log n grows more slowly than n log n:
O(n log n) + O(log n) + O(log n) = O(n log n)
So the time complexity of the whole algorithm is O(n log n).
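The three steps from the question can be sketched in Python as follows. One assumption the question does not state: because sorting moves elements, the sketch sorts (value, original index) pairs so it can still return the original indices. Python's built-in sort stands in for merge sort; both are O(n log n) in the worst case. The function name is hypothetical.

```python
from bisect import bisect_left


def find_x_and_square(arr, x):
    """Return (index of x, index of x*x) in arr, or None if either is missing."""
    # Step 1: O(n log n) sort of (value, original index) pairs
    pairs = sorted((v, i) for i, v in enumerate(arr))
    values = [v for v, _ in pairs]

    def lookup(target):
        # Steps 2 and 3: O(log n) binary search on the sorted values
        pos = bisect_left(values, target)
        if pos < len(values) and values[pos] == target:
            return pairs[pos][1]
        return None

    ix, ixx = lookup(x), lookup(x * x)
    if ix is None or ixx is None:
        return None
    return ix, ixx
```

The total cost is O(n log n) + 2 · O(log n), which the sort dominates.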

Your question is a bit ambiguous. Do you get
an unsorted list [a, b, c, ...] and a specific x to search for as a parameter?
or
just the list, and you have to find whether it contains at least one pair (x, y) with x^2 = y?
Now that you have clarified that it's the first case, the answer is O(n): you just have to iterate over the list once (no need to sort or binary search) and check for each element whether it equals x or x^2. If you find both, the list fulfills the condition.
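function match(list, x) {
  // Indices of x and x*x; -1 means "not found yet"
  let ix = -1, ixx = -1;
  // Stop scanning as soon as both values have been found
  for (let i = 0; i < list.length && (ix === -1 || ixx === -1); i++) {
    // Compare the element list[i] (not the index i) and keep the first match
    if (ix === -1 && list[i] === x) ix = i;
    if (ixx === -1 && list[i] === x * x) ixx = i;
  }
  return [ix, ixx];
}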
This returns the indices of x and x^2, or -1 for whichever value is not found. It returns as soon as both values have been found in the list.
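For the second reading of the question (does any pair x, y = x^2 exist in the list?), one option not covered in the answer is a hash map from value to index, which gives expected O(n) time. A minimal Python sketch under that assumption; find_square_pair is a hypothetical name, and for x = 1 the same element serves as both x and x^2, just as match(list, 1) returns the same index twice.

```python
def find_square_pair(arr):
    """Return indices (i, j) with arr[j] == arr[i] ** 2, or None if no such pair exists."""
    # Map each value to the first index where it occurs (expected O(1) lookups)
    index_of = {}
    for i, v in enumerate(arr):
        index_of.setdefault(v, i)
    # Second pass: look up the square of every element
    for i, v in enumerate(arr):
        j = index_of.get(v * v)
        if j is not None:
            return i, j
    return None
```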

Related

Why is the time complexity of merge intervals algorithm O(n log n) and not O(n)?

In a merge intervals algorithm, we first sort the intervals, which is O(log n), and then iterate through them to perform the merge, which is O(n).
I have seen it stated that this makes the merge intervals algorithm O(n log n). But from what I can see, since we only perform the sort once, and then afterwards we iterate once over the intervals, we should have complexity O(log n) + O(n) = O(n).
So what am I missing? Why are O(log n) and O(n) multiplied together instead of added when computing the complexity?
from dataclasses import dataclass


@dataclass
class Interval:
    start: int
    end: int


def merge(intervals):
    if len(intervals) < 2:
        return intervals
    # sort the intervals by their start point
    intervals.sort(key=lambda x: x.start)
    mergedIntervals = []
    start = intervals[0].start
    end = intervals[0].end
    # single pass: extend the current interval or close it and start a new one
    for i in range(1, len(intervals)):
        interval = intervals[i]
        if interval.start <= end:
            end = max(interval.end, end)
        else:
            mergedIntervals.append(Interval(start, end))
            start = interval.start
            end = interval.end
    # add the last interval
    mergedIntervals.append(Interval(start, end))
    return mergedIntervals
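The error is that you think sorting is O(log n). Sorting is typically implemented in O(n log n) or worse (see here); no comparison-based sort can do better than O(n log n) in the worst case.
Nothing is multiplied: the n log n term comes from the sort alone. The rest of the calculation is correct: O(n log n) + O(n) = O(n log n).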
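To see that the n log n term comes from the sort itself, the following sketch counts the comparisons Python's built-in sort makes on 1024 shuffled values (via functools.cmp_to_key) and sets them beside the n - 1 iterations of the merge loop. It is an illustration only; the exact count depends on the input and on the sort implementation.

```python
import math
import random
from functools import cmp_to_key


def count_sort_comparisons(values):
    """Sort a copy of values and return how many comparisons the sort made."""
    count = 0

    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)

    sorted(values, key=cmp_to_key(cmp))
    return count


n = 1024
data = list(range(n))
random.Random(0).shuffle(data)
sort_steps = count_sort_comparisons(data)
merge_steps = n - 1  # the merge loop visits each remaining interval once
print(sort_steps, merge_steps, round(n * math.log2(n)))
```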

Can this be approximated?

The time complexity of finding the k largest elements using a min-heap is given as
O(k + (n-k) log k), as mentioned here. Can it be approximated as O((n-k) log k)?
Since O(N + N log k) = O(N log k), is the above approximation also true?
No, you can't simplify it like that. This can be shown with a few example values of k that are close to n:
k = n
Now the complexity is O(n + 0 · log n) = O(n). If you had left out the first term of the sum, you would have ended up with O(0), which is obviously wrong.
k = n - 1
We get O((n-1) + 1 · log(n-1)) = O(n + log n) = O(n). Without the first term, you would get O(log n), which again is wrong.
The simplification O(N + N log k) = O(N log k) works because both terms share the factor N; here the first term is k rather than (n-k), so it dominates when n - k is small.
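For reference, the algorithm behind the O(k + (n-k) log k) bound can be sketched with Python's heapq module: heapifying the first k elements costs O(k), and each of the remaining n - k elements costs at most one O(log k) heap operation. The sketch assumes 1 <= k <= len(values); k_largest is a hypothetical name.

```python
import heapq


def k_largest(values, k):
    """Return the k largest elements of values (in no particular order)."""
    heap = list(values[:k])
    heapq.heapify(heap)                  # O(k)
    for v in values[k:]:                 # n - k iterations
        if v > heap[0]:
            heapq.heapreplace(heap, v)   # O(log k): pop the minimum, push v
    return heap
```

With k = n the loop body never runs and only the O(k) = O(n) heapify remains, which is exactly the term that dropping it would lose.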

What does an O(log log N) complexity loop look like? [duplicate]

I have a very basic question here
for(int i = 1; i < N/2; i++) {
}
My initial understanding was that the time complexity of the above loop would be O(log n), but after reading through some articles it is pretty evident that it's simply O(n), and that an O(log n) loop would look like for (i = 1; i <= n; i *= 2)
Now my question is: what does an O(log log N) loop look like?
O(log n) loop:
for (i = 1; i <= n; i *= 2)
So you double i at each step. Basically:
Increment => O(n)
Doubling => O(log n)
??? => O(log log n)
What comes after multiplication? Exponentiation. So this would be O(log log n):
for (i = 2; i <= n; i *= i) // we are squaring i at each step
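The increment / double / square progression can be checked by transcribing the three loops into Python and counting iterations. For n = 2^16, log2 n = 16 and log2 log2 n = 4; the +1 in each count comes from the first iteration.

```python
def increment_loop(n):
    count, i = 0, 1
    while i <= n:   # for (i = 1; i <= n; i++)
        count += 1
        i += 1
    return count


def doubling_loop(n):
    count, i = 0, 1
    while i <= n:   # for (i = 1; i <= n; i *= 2)
        count += 1
        i *= 2
    return count


def squaring_loop(n):
    count, i = 0, 2
    while i <= n:   # for (i = 2; i <= n; i *= i); starting at 1 would loop forever
        count += 1
        i *= i
    return count


n = 2 ** 16
print(increment_loop(n), doubling_loop(n), squaring_loop(n))  # prints: 65536 17 5
```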
Note: your loop is O(n), not O(log n). Keeping in line with the increment / double / exponentiate idea above, you can rewrite your loop using incrementation:
for(int i = 1; i < n; i += 2)
Even if you increment by more, it's still incrementation, and still O(n).
That loop doesn't look like O(log N). It is what it is, an O(N/2) loop. To quote the definition:
Function f(x) is said to be O(g(x)) iff (if and only if) there exist a positive real number c and a real number x0 such that |f(x)| <= c|g(x)| for all x >= x0. For example, you could also call that loop O(N), O(N^2) or O(N^3), as you can easily find the required parameters. But you cannot find parameters that will make O(log N) fit.
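Applied to the loop above, whose cost is f(N) = N/2, the definition works out as follows (a worked example added for illustration):

```latex
\frac{N}{2} \le 1 \cdot N \quad \text{for all } N \ge 1
\;\Longrightarrow\; \frac{N}{2} \in O(N) \quad (c = 1,\ x_0 = 1)

\frac{N/2}{\log_2 N} \to \infty \text{ as } N \to \infty
\;\Longrightarrow\; \text{no } c, x_0 \text{ satisfy } \frac{N}{2} \le c \log_2 N \text{ for all } N \ge x_0
```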
As for O(log log N), I suppose you could rewrite the interpolation search implementation given here https://en.wikipedia.org/wiki/Interpolation_search to use a for loop. It is O(log log N) on average (for uniformly distributed keys)!
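A minimal iterative sketch of interpolation search in Python, written for this note rather than copied from the linked page. It assumes a sorted list of integers; the O(log log N) average applies to roughly uniformly distributed keys, and the worst case is O(N).

```python
def interpolation_search(a, target):
    """Return an index of target in the sorted list a, or -1 if it is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:
            # All remaining values are equal; avoid dividing by zero
            return lo if a[lo] == target else -1
        # Guess the position by linear interpolation between a[lo] and a[hi]
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```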
Your cost is not O(log N); your cost is O(N log N).
If you read the link, you will see an example function.
No matter what constant factor the polynomial starts with, the cost is determined by its highest-order term.
In your case it is
1/2 * n * log(n), and the factor 1/2 makes no difference, so your complexity is O(N log N).

Dynamic programming over interval

I'm working on an algorithms problem, and I'm hitting a wall in speeding it up.
I have a function f(i,j), where i and j are integers such that 1 <= i <= j <= n for some upper bound n. This function is already written.
Furthermore, this function satisfies the equality f(i, j) + f(j, k) = f(i, k).
I need to compute f(x, y) for many different pairs x, y. Assume n is big enough that storing f(x,y) for every possible pair x,y will take up too much space.
Is there a known algorithm for this type of question? The one I'm using right now memoizes f and tries to reduce x,y to a previously computed pair of numbers by using the equality mentioned above, but my guess is that I'm not reducing in a smart way, and it's costing me time.
Edit: Assume that f(i, j) takes time proportional to j-i when computed the naive way.
You can use an implicit tree of power-of-two-sized intervals:
Store f(i,i+1) for every i
Store f(i,i+2) for every even i
Store f(i,i+4) for every i divisible by four
...
There will be O(log n) tables (floor(log_2(n)), to be exact), with a total size of O(n) (~2*n).
To retrieve f(i,j) where i<=j:
find the highest bit where i and j differ.
Let m be the value with this bit set and all lower bits cleared (that is, j with all bits below this bit cleared; the bit is set in j because i < j). This guarantees the following steps will always succeed:
find f(i, m) by cutting off a chunk as large as possible from the right repeatedly
find f(m, j) by cutting off a chunk as large as possible from the left repeatedly
The retrieval accesses each table at most twice, and thus runs in O(log n).
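A sketch of this scheme in Python, under these assumptions: f is additive as stated in the question, the tables are built bottom-up by adding two adjacent half-size chunks (so the naive f is only called on unit intervals), and both halves are walked left to right, taking the largest aligned chunk that fits at each step rather than cutting from the right. All function names are hypothetical.

```python
def build_tables(f, n):
    """tables[k][s] = f(s, s + 2**k) for every s divisible by 2**k with 1 <= s and s + 2**k <= n."""
    # Level 0 calls the naive f only on unit-length intervals
    tables = [{s: f(s, s + 1) for s in range(1, n)}]
    k = 1
    while (1 << k) <= n:
        size, half, prev = 1 << k, 1 << (k - 1), tables[-1]
        # f(s, s + size) = f(s, s + half) + f(s + half, s + size)
        tables.append({s: prev[s] + prev[s + half]
                       for s in range(size, n - size + 1, size)})
        k += 1
    return tables


def _walk(tables, s, end):
    """Sum stored chunks from s up to end, taking the largest aligned chunk that fits."""
    total = 0
    while s < end:
        # Chunk size is limited by the alignment of s and by the distance left to end
        k = min((s & -s).bit_length() - 1, (end - s).bit_length() - 1)
        total += tables[k][s]
        s += 1 << k
    return total


def query(tables, i, j):
    """Return f(i, j) for 1 <= i <= j <= n using O(log n) table lookups."""
    if i == j:
        return 0  # f(i, i) = 0 follows from f(i, i) + f(i, i) = f(i, i)
    b = (i ^ j).bit_length() - 1   # highest bit where i and j differ
    m = (j >> b) << b              # j with all bits below b cleared, so i < m <= j
    return _walk(tables, i, m) + _walk(tables, m, j)
```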
The function satisfies the rule
f(i, j) + f(j, k) = f(i, k)
as you say.
So rewrite the function as f(i, j) = g(j) - g(i), where g(i) = f(1, i). This is valid because f(1, i) + f(i, j) = f(1, j) for 1 <= i <= j, and it is consistent with the rule:
f(i, k) = g(k) - g(i)
        = g(k) - g(j) + g(j) - g(i)
        = f(j, k) + f(i, j)
If you try to store f(i, j) for all combinations, it costs you around O(n^2) space, so it is better to store g(i) for all values of i, which takes O(n) space.
Whenever you need f(i, j), you can then compute it as
f(i, j) = g(j) - g(i)   // using the stored values of g
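A sketch of the stored-g idea in Python, assuming the same additive f. As a design choice not made in the answer, g is filled incrementally with g(i + 1) = g(i) + f(i, i + 1), which follows from the rule and, under the question's cost assumption, keeps setup at O(n) instead of computing each f(1, i) from scratch. Function names are hypothetical.

```python
def build_prefix(f, n):
    """Return g with g[i] = f(1, i) for 1 <= i <= n (g[0] is unused)."""
    g = [0] * (n + 1)
    # g(1) = f(1, 1) = 0; each later value extends the previous one by a unit interval
    for i in range(1, n):
        g[i + 1] = g[i] + f(i, i + 1)
    return g


def query_prefix(g, i, j):
    """f(i, j) = g(j) - g(i): an O(1) lookup."""
    return g[j] - g[i]
```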
This is a solution that requires O(n) space, O(n^2) setup time and O(1) time per evaluation.
Extend f to both argument orders by defining f(j, i) = -f(i, j) for i <= j.
We are given f(i, k) = f(i, j) + f(j, k). Therefore, f(i, k) = f(i, j) + f(j, k) = -f(j, i) + f(j, k). In a setup phase, fix j = 1 arbitrarily. Then compute f(1, i) for every i and store the result. This takes O(n) space and O(n^2) time: n evaluations with running times of 1, 2, 3, ..., n.
For a query f(i, k), we need two constant-time lookups of the stored values f(1, i) and f(1, k), since f(i, k) = -f(1, i) + f(1, k).

What is the fastest algorithm to find an element with highest frequency in an array

I have two input arrays X and Y. I want to return that element of array X which occurs with highest frequency in array Y.
The naive way of doing this requires that for each element x of array X, I linearly search array Y for its number of occurrences and then return that element x which has highest frequency. Here is the pseudo algorithm:
def most_frequent_naive(X, Y):
    max_frequency = 0
    max_x = -1  # -1 indicates no element found
    for x in X:
        # Count the occurrences of x in Y with a linear scan
        frequency = 0
        for y in Y:
            if y == x:
                frequency += 1
        if frequency > max_frequency:
            max_frequency = frequency
            max_x = x
    return max_x
As there are two nested loops, the time complexity of this algorithm is O(n^2). Can I do this in O(n log n) or faster?
Use a hash table mapping keys to counts. For each element in the array, do something like counts[element] = counts[element] + 1, or your language's equivalent.
At the end, loop through the mappings in the hash table and find the max.
Alternatively, if you can use additional data structures, walk the array Y, updating each number's frequency in a hash table. This takes O(N(Y)) time. Then walk X to find which element of X has the highest frequency. This takes O(N(X)) time. Overall: linear time, and since any implementation has to look at each element of both X and Y at least once (EDIT: this is not strictly true in all cases/all implementations, as jwpat7 points out, though it is true in the worst case), you can't do it any faster than that.
The time complexities of common algorithms are listed below:
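The hash-table approach above can be sketched with Python's collections.Counter. The sketch keeps the question's convention of returning -1 when no element of X occurs in Y; the function name is hypothetical.

```python
from collections import Counter


def most_frequent_in_y(X, Y):
    """Return the element of X that occurs most often in Y, or -1 if none occur."""
    counts = Counter(Y)          # one pass over Y: expected O(len(Y))
    best, best_count = -1, 0
    for x in X:                  # one pass over X: expected O(len(X))
        if counts[x] > best_count:   # a missing key counts as 0
            best, best_count = x, counts[x]
    return best
```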
Algorithm | Best | Worst | Average
--------------+-----------+-----------+----------
MergeSort | O(n lg n) | O(n lg n) | O(n lg n)
InsertionSort | O(n) | O(n^2) | O(n^2)
QuickSort | O(n lg n) | O(n^2) | O(n lg n)
HeapSort | O(n lg n) | O(n lg n) | O(n lg n)
BinarySearch | O(1) | O(lg n) | O(lg n)
In general, when traversing a list to check a certain criterion, you really can't do any better than linear time. If you are required to sort the array, I would say stick with merge sort (very dependable) to find the element with the highest frequency in an array.
Note: This is under the assumption that you want to use a sorting algorithm. Otherwise, if you are allowed to use any data structure, I would go with a hashmap/hashtable type structure with constant lookup time. That way, you just match keys and update the frequency key-value pair. Hope this helps.
1st step: Sort both X and Y. Assuming their lengths are m and n respectively, the complexity of this step is O(n log n) + O(m log m).
2nd step: Count each Xi in Y and track the maximum count so far. Counting Xi in sorted Y takes two binary searches (first and last occurrence), which is O(log n). The total complexity of the 2nd step is O(m log n).
Total complexity: O(n log n) + O(m log m) + O(m log n), or simplified: O(k log k) with k = max(n, m).
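A Python sketch of this approach using the standard bisect module, where the width of the range of equal values gives the count. Sorting X is not actually needed for the counting step, so the sketch only sorts Y; the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def most_frequent_sorted(X, Y):
    """Return the element of X occurring most often in Y, or -1 if none occur."""
    ys = sorted(Y)                     # O(n log n)
    best, best_count = -1, 0
    for x in X:                        # m iterations
        # Occurrences of x = width of the block of values equal to x: O(log n)
        count = bisect_right(ys, x) - bisect_left(ys, x)
        if count > best_count:
            best, best_count = x, count
    return best
```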
Merge sort, based on the divide-and-conquer concept, gives you O(n log n) complexity.
Your suggested approach will be O(n^2) if both lists are length n. What's more likely is that the lists can be different lengths, so the time complexity could be expressed as O(mn).
You can separate your problem into two phases:
1. Order the unique elements from Y by their frequency
2. Find the first item from this list that exists in X
As this sounds like a homework question I'll let you think about how fast you can make these individual steps. The sum of these costs will give you the overall cost of the algorithm. There are many approaches that will be cheaper than the product of the two list lengths that you currently have.
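One possible way to realize the two phases in Python, sketched with collections.Counter.most_common (which returns values ordered from most to least frequent) and a set for the membership test in X. This is one option among several; the function name is hypothetical.

```python
from collections import Counter


def most_frequent_two_phase(X, Y):
    """Return the element of X that occurs most often in Y, or -1 if none occur."""
    in_x = set(X)
    # Phase 1: unique elements of Y ordered by frequency, most frequent first
    for value, _ in Counter(Y).most_common():
        # Phase 2: the first one that also appears in X is the answer
        if value in in_x:
            return value
    return -1
```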
Sort X and Y. Then walk both sorted arrays together, as in the merge step of merge sort, counting the occurrences in Y every time you encounter an element that is also in X.
So the complexity is O(n log n) + O(m log m) + O(n + m) = O(k log k), where n and m are the lengths of X and Y and k = max(n, m).
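A Python sketch of the sort-then-walk approach, with two indices advancing through the sorted lists as in the merge step; the function name is hypothetical.

```python
def most_frequent_merge_walk(X, Y):
    """Return the element of X that occurs most often in Y, or -1 if none occur."""
    xs, ys = sorted(X), sorted(Y)   # O(n log n) + O(m log m)
    best, best_count = -1, 0
    i = j = 0
    # Walk both sorted lists together, as in the merge step: O(n + m)
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif xs[i] > ys[j]:
            j += 1
        else:
            value, count = xs[i], 0
            while j < len(ys) and ys[j] == value:   # count the run of value in Y
                count += 1
                j += 1
            if count > best_count:
                best, best_count = value, count
            while i < len(xs) and xs[i] == value:   # skip duplicates in X
                i += 1
    return best
```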
You could quicksort the array and then traverse it, keeping track of the current number and how many times it has appeared in a row. That should give you O(n log n) on average (quicksort's worst case is O(n^2)).
