Prefix sum time complexity - complexity-theory

I am reading this, the exercise in the last part.
I am new to time complexity.
First solution says the robot would move p times in one direction and then m - p in the other direction, for p from 0 to m. To me this is:
sums = []
for left in 0..m
    sums[left] = 0
    for right in 0..(m-left)
        sums[left] += A[k - left + right] || 0
        A[k - left + right] = 0
A is the input array, k is an initial position, i.e. a given constant.
From what I understand the complexity would be:
O(m + (m + (m-1) + (m-2) + ... + 3 + 2 + 1))
where the first m comes from the outer loop and the sum comes from the inner loop, i.e.
O(m + m(m+1)/2)
O(m^2) ?
What is my error here?
The solution for this problem states that the complexity is O(n*m); can you explain why?

The goal is to calculate the maximum sum that the robot can collect in m moves. With that, I understand the algorithm will be something like:
max = 0;
for i in 1..n
    for j in 1..m
        sum += A[i];
    end loop;
    if sum > max then
        max = sum;
    end if;
    sum = 0;
end loop;
So this is an O(n*m) algorithm (if I understand the problem correctly).
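For completeness, the prefix-sum trick the exercise's title refers to is what makes each candidate range cheap to evaluate: once a prefix array has been built in O(n), any range sum becomes an O(1) lookup. A minimal sketch (my own illustration, not the exercise's model solution):

def prefix_sums(A):
    # P[i] holds A[0] + ... + A[i-1], so P has len(A) + 1 entries
    P = [0] * (len(A) + 1)
    for i, a in enumerate(A):
        P[i + 1] = P[i] + a
    return P

def range_sum(P, lo, hi):
    # sum of A[lo..hi-1] in O(1), e.g. range_sum(prefix_sums([2, 3, 7]), 1, 3) == 10
    return P[hi] - P[lo]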

Related

Finding xth smallest element in unsorted array

I've been trying some coding algorithm exercises and one topic in particular has stood out to me. I've been trying to find a good answer to this but I've been stuck in analysis paralysis. Let's say I have an array of unsorted integers and I want to determine the xth smallest element in this array.
I know of two options to go about this:
Option 1: Run a sort algorithm, sorting elements least to greatest, and look up the xth element. To my understanding, the time complexity of this is O(n*log(n)) with O(1) space.
Option 2: Heapify the array, turning it into a min heap, then pop() the top of the heap x times. To my understanding, this is O(n) + O(x*log(n)).
I can't tell which is the optimal answer, and maybe I have a fundamental misunderstanding of priority queues and when to use them. I've tried to measure runtime and I feel like I'm getting conflicting results; maybe with option 2 it depends on how big x is, and maybe there is a better algorithm altogether. If someone could help, I'd appreciate it!
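In code, the two options I'm comparing look roughly like this (just a sketch for reference; the function names are my own):

import heapq

def xth_smallest_sort(a, x):
    # Option 1: sort ascending, then index the xth element (O(n log n))
    return sorted(a)[x - 1]

def xth_smallest_heap(a, x):
    # Option 2: heapify into a min heap, then pop x - 1 times (O(n + x log n))
    h = list(a)
    heapq.heapify(h)
    for _ in range(x - 1):
        heapq.heappop(h)
    return h[0]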
Worst-case time complexity of approach 2 should be O(n + n*log(n)), since the maximum value of x is n.
For the average case, time complexity = O(n + ((1+2+3+...+n)/n) * log(n)) = O(n + ((n+1)/2) * log(n)) = O(n*log(n)).
Therefore approach 1 is more efficient than approach 2, but still not optimal.
PS: I would suggest you have a look at the quickselect algorithm, which works in O(n) in the average case.
This algorithm's complexity revolves around two data points:
Value of x.
Value of n.
Space complexity
In both approaches the space complexity remains O(1).
Time complexity
Approach 1
Best case: O(n*log(n)) for sorting, plus O(1) for the case x == 1.
Average case: O(n*log(n)) if we consider all elements unique, O(x + n*log(n)) if there are duplicates.
Worst case: O(n + n*log(n)) for the case x == n.
Approach 2
Best case: O(n), as just the heapify is required for the case x == 1.
Average case: O(n + x*log(n)).
Worst case: O(n + n*log(n)) for the case x == n.
Now, coming to analyzing these algorithms at runtime, the general guidelines below should be followed.
1. Always test with large values of n.
2. Have a good spread of the values being tested (here, x).
3. Do multiple iterations of the analysis in a clean environment
   (the array created anew before each experiment, etc.) and take the average of all
   results.
4. Check the code complexity of any predefined functions for their exact implementation:
   in this case the sort (which can be ~2n*log(n), etc.) and the various heap operations.
Considering all of the above under ideal conditions, Method 2 should perform better than Method 1.
Although approach 1 has lower time complexity, both of these algorithms use auxiliary space (std::sort, for instance, typically needs O(log n) extra space for its recursion). Another way to do this in constant space is via binary search: you can binary search over the value of the xth element. Let l be the smallest element of the array and r the largest; the time complexity is then O(n*log(r-l)).
int ans = l - 1;
while (l <= r) {
    int mid = (l + r) / 2;
    int cnt = 0;                     // number of elements <= mid
    for (int i = 0; i < n; i++) {
        if (a[i] <= mid)
            cnt++;
    }
    if (cnt < x) {                   // fewer than x elements are <= mid,
        ans = mid;                   // so the xth smallest is larger than mid
        l = mid + 1;
    }
    else
        r = mid - 1;
}
Now you can look for the smallest element larger than ans that is present in the array; that element is the xth smallest.
Time complexity: O(n*log(r-l)) + O(n) for the last step.
Space complexity: O(1).
You can find the xth smallest element in O(n); there are also two simple heap algorithms that improve on your option 2 complexity. I'll start with the latter.
Simple heap algorithm №1: O(x + (n-x) log x) worst-case complexity
Create a max heap out of the first x elements; then, for each of the remaining elements, if it is smaller than the heap's maximum, pop the max and push that element instead:
import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    heap = [-n for n in nums[:k]]        # max heap of the first k elements (values negated)
    heapq.heapify(heap)
    for num in nums[k:]:
        if -num > heap[0]:               # num is smaller than the current kth smallest
            heapq.heapreplace(heap, -num)
    return -heap[0]
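For example, findKthSmallest([3, 2, 1, 5, 6, 4], 2) returns 2, the 2nd smallest element.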
Simple heap algorithm №2: O(n + x log x)
Turn the whole array into a min heap, and insert its root into an auxiliary min heap.
k-1 times pop an element from the second heap, and push back its children from the first heap.
Return the root of the second heap.
import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    x = nums.copy()
    heapq.heapify(x)                     # first heap: the whole array
    s = [(x[0], 0)]                      # auxiliary heap of (value, index in x) pairs
    for _ in range(k - 1):
        ind = heapq.heappop(s)[1]
        if 2 * ind + 1 < len(x):         # left child in the first heap
            heapq.heappush(s, (x[2 * ind + 1], 2 * ind + 1))
        if 2 * ind + 2 < len(x):         # right child in the first heap
            heapq.heappush(s, (x[2 * ind + 2], 2 * ind + 2))
    return s[0][0]
Which of these is faster? It depends on the values of x and n.
A more complicated algorithm by Frederickson would allow you to find the xth smallest element in a heap in O(x), but that would be overkill, since the xth smallest element of an unsorted array can be found in O(n) worst-case time.
Median-of-medians algorithm: O(n) worst-case time
Described in [1].
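The book's pseudocode isn't reproduced here; as a rough illustration of the idea only, a compact (not in-place) Python sketch of median-of-medians select might look like this:

def mom_select(A, k):
    # Return the k-th smallest element of A (k is 0-based) in O(n) worst case.
    # Uses extra lists for clarity; CLRS's SELECT works in place.
    A = list(A)
    while True:
        if len(A) <= 5:
            return sorted(A)[k]
        # median of each group of 5, then the median of those medians as pivot
        medians = [sorted(A[i:i + 5])[len(A[i:i + 5]) // 2] for i in range(0, len(A), 5)]
        pivot = mom_select(medians, len(medians) // 2)
        lows = [x for x in A if x < pivot]
        highs = [x for x in A if x > pivot]
        num_pivots = len(A) - len(lows) - len(highs)
        if k < len(lows):
            A = lows                      # answer lies among the smaller elements
        elif k < len(lows) + num_pivots:
            return pivot                  # answer is the pivot itself
        else:
            k -= len(lows) + num_pivots   # answer lies among the larger elements
            A = highs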
Quickselect algorithm: O(n) average time, O(n^2) worst-case time
def partition(A, lo, hi):
    """rearrange A[lo:hi+1] and return j such that
    A[lo:j] <= pivot
    A[j] == pivot
    A[j+1:hi+1] >= pivot
    """
    pivot = A[lo]
    if A[hi] > pivot:
        A[lo], A[hi] = A[hi], A[lo]
    # now A[hi] <= A[lo], and A[hi] and A[lo] need to be exchanged
    i = lo
    j = hi
    while i < j:
        A[i], A[j] = A[j], A[i]
        i += 1
        while A[i] < pivot:
            i += 1
        j -= 1
        while A[j] > pivot:
            j -= 1
    # now put pivot in the j-th place
    if A[lo] == pivot:
        A[lo], A[j] = A[j], A[lo]
    else:
        # then A[hi] == pivot
        j += 1
        A[j], A[hi] = A[hi], A[j]
    return j
def quickselect(A, left, right, k):
    pivotIndex = partition(A, left, right)
    if k == pivotIndex:
        return A[k]
    elif k < pivotIndex:
        return quickselect(A, left, pivotIndex - 1, k)
    else:
        return quickselect(A, pivotIndex + 1, right, k)
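To find the xth smallest element (with x 0-based), call quickselect(A, 0, len(A) - 1, x); note that the list is partially reordered in place.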
Introselect: O(n) worst-case time
Basically, do quickselect, but if recursion gets too deep, switch to median-of-medians.
import numpy as np
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    return np.partition(nums, k, kind='introselect')[k]
Rivest-Floyd algorithm: O(n) average time, O(n^2) worst-case time
Another way to speed up quickselect:
import math

C1 = 600
C2 = 0.5
C3 = 0.5

def rivest_floyd(A, left, right, k):
    assert k < len(A)
    while right > left:
        if right - left > C1:
            # select a random sample from A
            N = right - left + 1
            I = k - left + 1
            Z = math.log(N)
            S = C2 * math.exp(2/3 * Z)    # sample size
            SD = C3 * math.sqrt(Z * S * (N - S) / N) * math.copysign(1, I - N/2)
            # select a subsample such that the kth element lies between newleft and newright most of the time
            newleft = max(left, k - int(I * S / N + SD))
            newright = min(right, k + int((N - I) * S / N + SD))
            rivest_floyd(A, newleft, newright, k)
        A[left], A[k] = A[k], A[left]
        j = partition2(A, left, right)    # partition2: a partition routine returning the pivot's final index (not shown here)
        if j <= k:
            left = j + 1
        if k <= j:
            right = j - 1
    return A[k]
[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4, pp. 220-223.

Changing the randomized select algorithm affect on runtime

What happens to the Randomized-Select algorithm's running time if we change line 8 of the code from q-1 to q (CLRS, page 216)?
What I found is that the algorithm should still work and there shouldn't be any change in running time, since the running time depends only on the RANDOMIZED-PARTITION subroutine. Is that true?
Randomized-Select(A, p, r, i)
    // Finds the ith smallest value in A[p .. r].
    if (p = r)
        return A[p]
    q = Randomized-Partition(A, p, r)
    k = q - p + 1                        // k = size of low side + 1 (pivot)
    if (i = k)
        return A[q]
    else if (i < k)
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i-k)
The ith order statistic might be in the:
left part - range p..q-1
right part - range q+1..r
or exactly at index q
The last case happens when the condition
if (i = k)
    return A[q]
is fulfilled; otherwise we know that the qth element can never be the ith order statistic, so it is not wise to consider it again and again at later iterations (recursion levels).
The proposed modification won't change the complexity, but the real running time might increase a bit
(average case: n + n/2 + n/4 + ... + 1 = 2n comparisons versus n + (n/2 + 1) + (n/4 + 1) + ... + 1 = 2n + log(n)).

Deriving the cost function from algorithm and proving their complexity is O(...)

When computing algorithm costs with 1 for each operation, it gets confusing when while loops depend on more than one variable. This pseudo code inserts an element into the right place of a heap.
input:  H[k]  // An array of size k, storing a heap
        e     // an element to insert
        s     // last element in array (s < k - 1)
output: Array H, with e inserted into H in the right place
s = s + 1                 [2]
H[s] = e                  [3]
while s > 1:
    t = s / 2             [3]
    if H[s] < H[t]        [3]
        tmp = H[s]        [3]
        H[s] = H[t]       [3]
        H[t] = tmp        [3]
        s = t             [2]
    else
        break             [1]
return H
What would be the cost function in terms of f(n)? and the Big O complexity?
I admit, I was initially confused by the indentation of your pseudo-code. After being prompted by M.K's comment, I reindented your code, and understood what you meant about more than one variable.
Hint: if s is equal to 2^k, the loop will iterate k times in the worst case. The expected average is k/2 iterations.
The reason for k/2 is that, absent any other information, we assume the input e has an equal chance of being any value between the current min and max of the array. If you know the distribution, then you can skew the expected average accordingly. Usually, though, the expected average will be a constant factor of k, and so does not affect the big-O.
Let n be the number of elements in the heap. So, the cost function f(n) represents the cost of the function for a heap of size n. The cost of the function outside of the while loop is constant C1, so f(n) is dominated by the while loop itself, g(n). The cost of each iteration of the loop is also constant C2, so the cost is dependent on the number of iterations. So: f(n) = C1 + g(n+1). And g(n) = C2 + g(n/2). Now, you can solve the characteristic equation for g(n). Note that g(1) is 0, and g(2) is C2.
The algorithm as presented uses swaps to bubble the element up into the correct position. To make the inner loop more efficient (it doesn't change the complexity, mind you), it can instead behave more like an insertion sort and place the element in the right spot only at the end:
s = s + 1
while s > 1 and e < H[s/2]:
    H[s] = H[s/2]
    s = s / 2
H[s] = e
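As a concrete rendering of the same sift-up idea (a hypothetical Python version using 0-based indexing, unlike the 1-based pseudocode above):

def heap_insert(H, e):
    # Insert e into the min-heap list H, sifting it up without swaps.
    H.append(e)
    s = len(H) - 1
    while s > 0 and e < H[(s - 1) // 2]:
        H[s] = H[(s - 1) // 2]    # move the parent down
        s = (s - 1) // 2
    H[s] = e                      # place e in its final position
    return H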
If you look at the while loop, you'll observe that s is divided by 2 until it reaches 1.
Therefore the number of iterations is equal to the log of s to the base 2; for example, starting from s = 16 the loop runs at most 4 times (16, 8, 4, 2, 1).

How to make the run time of the program to ϴ n

The requirement is that the input will be a set of integers ranging from -5 to 5; the result should give the longest contiguous subarray of the input whose total is greater than or equal to zero.
I can only come up with the following:
The input will be input[0 to n]
let start, longestStart, end, longestEnd, sum = 0
for i = 0 to n-1
    start = i
    sum = input[i]
    for j = i+1 to n
        if sum + input[j] >= 0 then
            sum = sum + input[j]
            end = j
            if end - start > longestEnd - longestStart then
                longestStart = start
                longestEnd = end
However this is ϴ(n^2). I would like to know how this can be made ϴ(n).
Thank you
Since
a - b == (a + n) - (b + n)
for any a, b or n, we can apply this to the array of numbers, keeping a running total of all elements from 0 up to the current index. From the above equation, the sum of any subarray from index a to index b is sum(elements 0..b) minus sum(elements 0..a-1).
By keeping track of local minima and maxima of the running total, and where they occur, you can find the subarray with the greatest range in one pass, i.e. O(n).
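That description is terse, so here is one concrete way to realize the prefix-sum idea in O(n): a sketch of the decreasing-stack ("maximum width ramp") technique. This is my own illustration, not necessarily the exact bookkeeping the answer above has in mind.

def longest_nonneg_subarray(a):
    # prefix[i] = a[0] + ... + a[i-1]; the subarray a[i:j] has sum prefix[j] - prefix[i]
    prefix = [0]
    for v in a:
        prefix.append(prefix[-1] + v)
    # candidate left endpoints: indices with strictly decreasing prefix values
    stack = []
    for i, p in enumerate(prefix):
        if not stack or p < prefix[stack[-1]]:
            stack.append(i)
    best = 0
    # scan right to left; any candidate i with prefix[i] <= prefix[j] gives a valid subarray a[i:j]
    for j in range(len(prefix) - 1, -1, -1):
        while stack and prefix[stack[-1]] <= prefix[j]:
            best = max(best, j - stack[-1])
            stack.pop()
    return best   # length of the longest subarray whose sum is >= 0

For example, longest_nonneg_subarray([1, -5, 3, 2, -1]) returns 5, since the whole array sums to 0.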

Understanding the bottom-up rod cut implementation

In Introduction to Algorithms (CLRS), Cormen et al. talk about solving the rod-cutting problem as follows (page 369):
EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
    let r[0..n] and s[0..n] be new arrays
    r[0] = 0
    for j = 1 to n:
        q = -infinity
        for i = 1 to j:
            if q < p[i] + r[j - i]:    // (6)
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r and s
Here p[i] is the price of a piece of rod of length i, r[j] is the maximum revenue obtainable from a rod of length j, and s[j] gives the optimal size of the first piece to cut off.
My question is about the outer loop that iterates j from 1 to n and the inner loop i that goes from 1 to n as well.
On line 6 we are comparing q (the maximum revenue gained so far) with r[j - i], the maximum revenue gained during the previous cut.
When j = 1 and i = 1, it seems to be fine, but the very next iteration of the inner loop where j = 1 and i = 2, won't r[j - i] be r[1 - 2] = r[-1]?
I am not sure the negative index makes sense here. Is that a typo in CLRS, or am I missing something?
In case some of you don't know what the rod-cutting problem is, here's an example.
Here's the key: for i = 1 to j
i will begin at 1 and increase in value up to, but not exceeding, the value of j.
i will never be greater than j, thus j - i will never be less than zero.
Variable i will never be greater than variable j because of the inner loop's bound, and thus the index into r never becomes negative.
You are missing the condition in the inner for loop: the value of i only goes up to j, so the loop terminates before i can exceed j. Hence there is no question of the negative indices you mentioned.
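To see this concretely, here is a Python rendering of the same pseudocode (my own transcription, not the book's; p is assumed to have length at least n + 1, with p[0] unused). The inner range makes it explicit that j - i is never negative:

def extended_bottom_up_cut_rod(p, n):
    # p[i] = price of a piece of length i (p[0] is unused)
    r = [0] * (n + 1)            # r[j] = maximum revenue for a rod of length j
    s = [0] * (n + 1)            # s[j] = optimal size of the first piece to cut off
    for j in range(1, n + 1):
        q = float('-inf')
        for i in range(1, j + 1):            # i <= j, so j - i >= 0
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s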
