time complexity of an algorithm similar to bubble sort

Analyze the following sorting algorithm:
for (int i = 0; i < SIZE; i++)
{
    if (list[i] > list[i + 1])
    {
        swap list[i] with list[i + 1];
        i = 0;
    }
}
I want to determine the time complexity for this in the worst case. I don't understand how it is O(n^3).

Clearly the for loop by itself is O(n). The question is, how many times can it run?
Every time you do a swap, the loop starts over. How many times will you do a swap? You will do a swap for each element, from its starting position until it reaches its proper spot in the sorted output. For an input that is reverse sorted, that averages to n/2 swaps per element, or O(n) again. But that's for each of the n elements, giving another factor of O(n). That's how you get to O(n^3).

I ran an analysis for n = 10 and n = 100. The number of comparisons seems to be O(n^3), which makes sense: i gets set back to 0 once per swap, there are roughly n^2/2 swaps, and each restart can scan up to n elements, so it's somewhere around n^2 * (n/2) comparison and increment operations for your for loop. The number of swaps, however, seems to be only O(n^2), because no more swaps than that are necessary to sort the entire list. The best case is still n-1 comparisons and 0 swaps, of course.
For best-case testing I use an already sorted array of n elements: [0...n-1].
For worst-case testing I use a reverse-sorted array of n elements: [n-1...0]
def analyzeSlowSort(A):
    comparison_count = swap_count = i = 0
    while i < len(A) - 1:
        comparison_count += 1
        if A[i] > A[i+1]:
            A[i], A[i+1] = A[i+1], A[i]
            swap_count += 1
            i = 0
        i += 1
    return comparison_count, swap_count
n = 10
# Best case
print analyzeSlowSort(range(n)) # ->(9, 0)
# Worst case
print analyzeSlowSort(range(n, 0, -1)) # ->(129, 37)
n = 100
# Best case
print analyzeSlowSort(range(n)) # ->(99, 0)
# Worst case
print analyzeSlowSort(range(n, 0, -1)) # ->(161799, 4852)
Clearly this is a very inefficient sorting algorithm in terms of comparisons. :)

Okay, here goes.
In the worst case, let's say we have a completely reversed array:
9 8 7 6 5 4 3 2 1 0
Each time there is a swap, i gets reset to 0.
Let's start by swapping 9 and 8: we now have 8 9 7 6 5 4 3 2 1 0, and i is set back to zero.
Now the loop runs up to the second pair and we swap again: 8 7 9 6 5 4 3 2 1 0, and i is reset again. But to get 7 into first place we need one more swap, of 8 and 7: 7 8 9 6 5 4 3 2 1 0.
So the number of loop iterations looks like this:
T(1) = O(1)
T(2) = O(1 + 2)
T(3) = O(1 + 2 + 3)
T(4) = O(1 + 2 + 3 + 4) and so on.
Finally, the nth term, which is the biggest in this case, is T(n) = O(1 + 2 + ... + n) = O(n(n+1)/2).
But for the entire run you need to sum all of these terms up, which can be bounded by:
Summation of T(k) for k = 1..n = O(Summation of k^2) = O(n^3).
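As a rough sanity check of that summation (a small sketch of my own, not part of the original answer), you can sum the per-restart costs 1 + 2 + ... + k for k = 1..n and watch the total grow cubically, in line with the comparison counts measured above:

def summed_cost(n):
    # Sum of T(k) = 1 + 2 + ... + k for k = 1..n
    return sum(k * (k + 1) // 2 for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, summed_cost(n), n**3 // 6)
# Both counts grow at the same cubic rate, i.e. O(n^3).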
Addition
Think of it this way: for each element you need to walk up to it and bring it back, but each time you bring it back, it only moves one position toward the front. I hope that makes it a little clearer.
Another Edit
If the above still doesn't make sense, think of it this way: you have to bring 0 to the front of the array. You initially have to walk 9 steps up to the zero and put it before the 1. But after that you are magically transported (i = 0) back to the beginning of the array. So now you have to walk 8 steps to the zero and move it forward one more position. Zap! and you are back at the start of the array again. Approximately how many steps do you have to take to reach the zero each time, until it ends up right at the front? 9 + 8 + 7 + 6 + 5 + ... This is the last term of the recurrence above, and it is bounded by the square of the length of the array. Does this make sense? Now, you do roughly this much work for each of the n elements, which translates to summing all the terms up, and we get O(n^3).
Please comment if things help or don't make sense.


How to effectively calculate an algorithm's time complexity? [duplicate]

I'm studying algorithm complexity and I'm still not able to determine the complexity of some algorithms... OK, I'm able to figure out basic O(N) and O(N^2) loops, but I'm having some difficulty with routines like this one:
// What is time complexity of fun()?
int fun(int n)
{
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count += 1;
    return count;
}
OK, I know that some people can calculate this with their eyes closed, but I would love to see a step-by-step explanation, if possible.
My first attempt to solve this would be to "simulate" an input and put the values in some sort of table, like below:
for n = 100
Step i
1 100
2 50
3 25
4 12
5 6
6 3
7 1
OK, at this point I'm assuming that this loop is O(log n), but unfortunately, as I said, no one solved this problem step by step, so in the end I have no clue what was actually done...
In case of the inner loop I can build some sort of table like below:
for n = 100
Step i j
1 100 0..99
2 50 0..49
3 25 0..24
4 12 0..11
5 6 0..5
6 3 0..2
7 1 0..0
I can see that both loops are decreasing and I suppose a formula can be derived based on data above ...
Could someone clarify this problem? (The Answer is O(n))
Another simple way to look at it is this:
Your outer loop initializes i (which can be considered the step/iterator) at n and divides i by 2 after every iteration. Hence, it executes the i /= 2 statement about log2(n) times, so one way to think about it is that your outer loop runs log2(n) times. Whenever you repeatedly divide a number by a base until it reaches 0, you effectively perform that division a logarithmic number of times. Hence, the outer loop is O(log-base-2 n).
Your inner loop iterates j (now the iterator or the step) from 0 to i every iteration of outer loop. i takes the maximum value of n, hence the longest run that your inner loop will have will be from 0 to n. Thus, it is O(n).
Now, your program runs like this:
Run 1: i = n, j = 0->n
Run 2: i = n/2, j = 0->n/2
Run 3: i = n/4, j = 0->n/4
.
.
.
Run x: i = n/(2^(x-1)), j = 0->[n/(2^(x-1))]
Now, multiplying the loop bounds for nested loops gives O(log-base-2 n) * O(n) = O(n log n), which is a valid upper bound for the entire code. The tight bound, however, is O(n): the inner loop's length halves on every outer iteration, so the total work is the geometric series n + n/2 + n/4 + ... <= 2n, which is O(n).
Let's break this analysis up into a few steps.
First, start with the inner for loop. It is straightforward to see that this takes exactly i steps.
Next, think about which different values i will assume over the course of the algorithm. To start, consider the case where n is some power of 2. In this case, i starts at n, then n/2, then n/4, etc., until it reaches 1, and finally 0 and terminates. Because the inner loop takes i steps each time, then the total number of steps of fun(n) in this case is exactly n + n/2 + n/4 + ... + 1 = 2n - 1.
Lastly, convince yourself this generalizes to non-powers of 2. Given an input n, find the smallest power of 2 greater than n and call it m. Clearly, n < m <= 2n, so fun(n) takes fewer than 2m - 1 steps, which is at most 4n - 1. Thus fun(n) is O(n).
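If you want to convince yourself empirically (a small sketch; the Python port and the test values are mine, not from the answer), you can translate fun to Python and compare it against the 2n - 1 bound:

def fun(n):
    # Direct port of the C function above.
    count = 0
    i = n
    while i > 0:
        for j in range(i):
            count += 1
        i //= 2
    return count

for n in (7, 8, 100, 1000):
    print(n, fun(n), 2 * n - 1)
# fun(n) never exceeds 2n - 1, consistent with the O(n) bound.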

Analyze the run time of a nested for loops algorithm

Say I have the following code:
def func(A, n):
    for i in range(n):
        for k in range(i + 1, n):
            for l in range(k + 1, n):
                if A[i] + A[k] + A[l] == 0:
                    return True
    return False
A is an array, and n denotes the length of A.
As I read it, the code checks if any 3 consecutive integers in A sum up to 0. I see the time complexity as
T(n) = (n-2)(n-1)(n-2)+O(1) => O(n^3)
Is this correct, or am I missing something? I have a hard time finding reading material about this (and I own CLRS)
You have the functionality wrong: it checks whether any three elements add up to 0. To improve execution time, it considers them only in index order: i < k < l.
You are correct about the complexity. Although each loop takes a short-cut, that short-cut is merely a scalar divisor on the number of iterations. Each loop is still O(n).
As for the coding, you already have most of it done -- and Stack Overflow is not a coding service. Give it your best shot; if that doesn't work and you're stuck, post another question.
If you really want to teach yourself a new technique, look up Python's itertools package. You can use this to generate all the combinations in triples. You can then merely check sum(triple) in each case. In fact, you can use the any method to check whether any one triple sums to 0, which could reduce your function body to a single line of Python code.
I'll leave that research to you. You'll learn other neat stuff on the way.
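For what it's worth, here is a minimal sketch of the itertools idea (the function name and sample inputs are mine, not from the question):

from itertools import combinations

def has_zero_triple(A):
    # True if any three elements of A, taken in index order, sum to 0.
    return any(sum(triple) == 0 for triple in combinations(A, 3))

print(has_zero_triple([3, -1, -2, 7]))   # True: 3 + (-1) + (-2) == 0
print(has_zero_triple([1, 2, 3]))        # False

Note that this still examines O(n^3) triples; it only tidies up the code.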
Addition for OP's comment.
Let's set N to 4, and look at what happens:
i = 0
for k = 1 to 3
    ... three iterations of the k loop
i = 1
for k = 2 to 3
    ... two iterations of the k loop
i = 2
for k = 3 to 3
    ... one iteration of the k loop
The number of k-loop iterations is the "triangle" number of n-1: here 3 + 2 + 1 = 6, and in general 1 + 2 + ... + (n-1) = n(n-1)/2.
Now, propagate the same logic to the l loop: under each value of k you get another, shorter run of l values, and summing those triangular counts gives the third-order "pyramid" (tetrahedral) number.
In terms of n, that is n(n-1)(n-2)/6 executions of the l-loop body. When you multiply this out, you get a straightforward cubic formula in n, i.e. O(n^3).
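A quick way to confirm that count (a small sketch of my own, not part of the original answer) is to tally the innermost iterations directly and compare them with n(n-1)(n-2)/6:

def innermost_iterations(n):
    # Count how many times the l-loop body executes.
    count = 0
    for i in range(n):
        for k in range(i + 1, n):
            for l in range(k + 1, n):
                count += 1
    return count

for n in (4, 5, 10):
    print(n, innermost_iterations(n), n * (n - 1) * (n - 2) // 6)
# The two columns agree: 4, 10 and 120 for n = 4, 5, 10.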
Here is the sequence for n=5:
0 1 2
0 1 3
0 1 4
change k
0 2 3
0 2 4
change k
0 3 4
change k
change k
change i
1 2 3
1 2 4
change k
1 3 4
change k
change k
change i
2 3 4
BTW, l is a bad variable name, easily confused with 1.

sum of maximum element of sliding window of length K

Recently I got stuck on a problem. Part of the algorithm requires computing the sum of the maximum elements of all sliding windows of length K, where K ranges over 1 <= K <= N (N is the length of the array).
For example, if I have an array A = 5, 3, 12, 4:
Sliding window of length 1: 5 + 3 + 12 + 4 = 24
Sliding window of length 2: 5 + 12 + 12 = 29
Sliding window of length 3: 12 + 12 = 24
Sliding window of length 4: 12
Final answer is 24,29,24,12.
I have tried an O(N^2) approach: for each window length K, I can calculate all the window maximums in O(N). Since K goes up to N, the overall complexity turns out to be O(N^2).
I am looking for O(N) or O(NlogN) or something similar to this algorithm as N maybe upto 10^5.
Note: Elements in array can be as large as 10^9 so output the final answer as modulo 10^9+7
EDIT: What I actually want is the answer for each and every value of K (i.e. from 1 to N) in overall linear time or O(N log N), not in O(KN) or O(KN log N) where K = {1, 2, 3, ..., N}.
Here's an abbreviated sketch of O(n).
For each element, determine how many contiguous elements to the left are no greater (call this a), and how many contiguous elements to the right are lesser (call this b). This can be done for all elements in time O(n) -- see MBo's answer.
A particular element is the maximum in its window if the window contains the element and otherwise only elements among the a to its left and the b to its right. Usefully, the number of such windows of length k (and hence the total contribution of these windows) is piecewise linear in k, with at most five pieces. For example, if a = 5 and b = 3 (a small sketch reproducing these counts follows this list), there are
1 window of size 1
2 windows of size 2
3 windows of size 3
4 windows of size 4
4 windows of size 5
4 windows of size 6
3 windows of size 7
2 windows of size 8
1 window of size 9.
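Here is a tiny sketch that reproduces those counts for a = 5 and b = 3 (the closed form is my own reading of the pattern above, not something stated in the answer):

def windows_where_element_is_max(a, b, k):
    # Number of length-k windows in which a given element is the maximum,
    # when it has a such elements to its left and b to its right.
    return max(0, min(a, k - 1) + min(0, b - k + 1) + 1)

for k in range(1, 10):
    print(k, windows_where_element_is_max(5, 3, k))
# Second column: 1, 2, 3, 4, 4, 4, 3, 2, 1 -- matching the list above.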
The data structure that we need to encode this contribution efficiently is a Fenwick tree whose values are not numbers but linear functions of k. For each linear piece of the piecewise linear contribution function, we add it to the cell at beginning of its interval and subtract it from the cell at the end (closed beginning, open end). At the end, we retrieve all of the prefix sums and evaluate them at their index k to get the final array.
(OK, have to run for now, but we don't actually need a Fenwick tree for step two, which drops the complexity to O(n) for that, and there may be a way to do step one in linear time as well.)
Python 3, lightly tested:
def left_extents(lst):
    result = []
    stack = [-1]
    for i in range(len(lst)):
        while stack[-1] >= 0 and lst[i] >= lst[stack[-1]]:
            del stack[-1]
        result.append(stack[-1] + 1)
        stack.append(i)
    return result

def right_extents(lst):
    result = []
    stack = [len(lst)]
    for i in range(len(lst) - 1, -1, -1):
        while stack[-1] < len(lst) and lst[i] > lst[stack[-1]]:
            del stack[-1]
        result.append(stack[-1])
        stack.append(i)
    result.reverse()
    return result
def sliding_window_totals(lst):
    delta_constant = [0] * (len(lst) + 2)
    delta_linear = [0] * (len(lst) + 2)
    for l, i, r in zip(left_extents(lst), range(len(lst)), right_extents(lst)):
        a = i - l
        b = r - (i + 1)
        if a > b:
            a, b = b, a
        delta_linear[1] += lst[i]
        delta_linear[a + 1] -= lst[i]
        delta_constant[a + 1] += lst[i] * (a + 1)
        delta_constant[b + 2] += lst[i] * (b + 1)
        delta_linear[b + 2] -= lst[i]
        delta_linear[a + b + 2] += lst[i]
        delta_constant[a + b + 2] -= lst[i] * (a + 1)
        delta_constant[a + b + 2] -= lst[i] * (b + 1)
    result = []
    constant = 0
    linear = 0
    for j in range(1, len(lst) + 1):
        constant += delta_constant[j]
        linear += delta_linear[j]
        result.append(constant + linear * j)
    return result
print(sliding_window_totals([5, 3, 12, 4]))
Let's determine, for every element, the interval where that element is dominating (i.e., is the maximum). We can do this in linear time with forward and backward passes using a stack. Arrays L and R will contain the indexes just outside the domination interval.
To get the right and left indexes:
Stack.Push(0)                      //(1st element index)
for i = 1 to Len - 1 do
    while not Stack.Empty and X[Stack.Peek] < X[i] do
        j = Stack.Pop
        R[j] = i                   //j-th position is dominated by i-th one from the right
    Stack.Push(i)
while not Stack.Empty
    R[Stack.Pop] = Len             //the rest of the elements are not dominated from the right

//now right to left
Stack.Push(Len - 1)                //(last element index)
for i = Len - 2 to 0 do
    while not Stack.Empty and X[Stack.Peek] < X[i] do
        j = Stack.Pop
        L[j] = i                   //j-th position is dominated by i-th one from the left
    Stack.Push(i)
while not Stack.Empty
    L[Stack.Pop] = -1              //the rest of the elements are not dominated from the left
Result for the array (5, 7, 3, 9, 4). For example, 7 dominates on the interval 0..2, and 9 on 0..4:
i   0   1   2   3   4
X   5   7   3   9   4
R   1   3   3   5   5
L  -1  -1   1  -1   3
Now for every element we can count its impact on every possible sum.
Element 5 dominates at (0,0) interval, it is summed only in k=1 sum entry
Element 7 dominates at (0,2) interval, it is summed once in k=1 sum entry, twice in k=2 entry, once in k=3 entry.
Element 3 dominates at (2,2) interval, it is summed only in k=1 sum entry
Element 9 dominates at (0,4) interval, it is summed once in k=1 sum entry, twice in k=2, twice in k=3, twice in k=4, once in k=5.
Element 4 dominates at (4,4) interval, it is summed only in k=1 sum entry.
In general, an element with a long domination interval near the center of a long array may contribute up to k*Value to the k-length sum (it depends on its position relative to the array ends and to the other dominating elements).
        k     1     2     3     4    5
---------------------------------------
        5     5
        7     7   2*7     7
        3     3
        9     9   2*9   2*9   2*9    9
        4     4
---------------------------------------
     S(k)    28    32    25    18    9
Note that the sum of all the coefficients is N*(N+1)/2 (equal to the total number of possible windows), and most of the table entries are empty, so the complexity looks better than O(N^2)
(I still doubt about exact complexity)
The sum of the maxima over all sliding windows of a given size can be computed in linear time using a double-ended queue that keeps elements from the current window. We maintain the deque such that the first (index 0, leftmost) element in the queue is always the maximum of the current window.
This is done by iterating over the array and in each iteration, first we remove the first element in the deque if it is no longer in the current window (we do that by checking its original position, which is also saved in the deque together with its value). Then, we remove any elements from the end of the deque that are smaller than the current element, and finally we add the current element to the end of the deque.
The complexity is O(N) for computing the maximum of all sliding windows of one size K. If you want to do that for every value of K from 1..N, the time complexity becomes O(N^2). O(N) is the best possible time to compute the sum of the maximum values of all windows of a single size K (that is easy to see, since every element must be read at least once). To compute the sums for the other values of K, the simple approach is to repeat the computation for each value of K, which leads to an overall time of O(N^2). Is there a better way? Not with this window-by-window approach: a saved result for one value of K does not let you compute the result for a different value of K in less than O(N) time, so repeating per K is Θ(N^2). The per-element domination-interval answers above avoid recomputing each K from scratch, which is how they do better.
The following is an implementation in Python:
from collections import deque

def slide_win(l, k):
    dq = deque()
    for i in range(len(l)):
        if len(dq) > 0 and dq[0][1] <= i - k:
            dq.popleft()
        while len(dq) > 0 and l[i] >= dq[-1][0]:
            dq.pop()
        dq.append((l[i], i))
        if i >= k - 1:
            yield dq[0][0]

def main():
    l = [5, 3, 12, 4]
    print("l=" + str(l))
    for k in range(1, len(l) + 1):
        s = 0
        for x in slide_win(l, k):
            s += x
        print("k=" + str(k) + " Sum=" + str(s))

main()

Merge sort space and time complexity

Given the following merge sort algorithm:
mergesort(A, p, r)
    if (r <= p) return            // constant amount of time
    int q = (p + r) / 2           // constant amount of time
    mergesort(A, p, q)            // these two calls will decide the
    mergesort(A, q + 1, r)        // O(log n) factor inside O(n log n), right?
    merge(A, p, q, r)             // let's dwell further on merge:
merge(A, p, q, r) {
    n1 = q - p + 1                        // constant time
    n2 = r - q                            // constant time
    // Let L[1..n1+1] and R[1..n2+1] be new arrays   // idk, let's say constant
    for i = 1 to n1: L[i] = A[p + i - 1]  // shouldn't this take time varying
    for j = 1 to n2: R[j] = A[q + j]      // with the size of the array?
                                          // also extra space too?
    i = 1, j = 1                          // constant time
    for k = p to r                        // everything below this will contribute
                                          // to the O(n) inside O(n log n), amirite?
        if L[i] <= R[j]
            A[k] = L[i]
            i++
        else
            A[k] = R[j]
            j++
}
How come we estimate O(n log n) time complexity for it, keeping in mind that there are left and right arrays being created to be merged back?
And how come the space complexity is only O(n) if extra space is being used? Won't both of them be increased by n, because filling up the arrays takes O(n) and L[] and R[] are created at each recursion step?
I suggest you reason about this by drawing a tree on paper: first write down your whole array:
2 4 7 1 3 4 6 2 3 7 ...
Then write what the recursion causes it to be split into below it:
2 4 7 1 3           4 6 2 3 7 ...
2 4 7     1 3       4 6     2 3 7 ...
2 4   7   1   3     4 6     2 3   7 ...
And so on with each piece.
Then, count how many rows you've written. This will be close to the base 2 logarithm of the number of elements you started with (O(log n)).
Now, how much work is being done for each row? It's O(n). Merging two arrays of lengths n1, n2 will take O(n1 + n2), even if you have to allocate space for them (and you don't in a proper implementation). Since each row in the recursion tree has n array elements, it follows that the work done for each row is O(n) and therefore the entire algorithm is O(n log n).
And how come the space complexity is only O(n) if extra space is being used? Won't both of them be increased by n, because filling up the arrays takes O(n) and L[] and R[] are created at each recursion step?
This is more interesting. If you do indeed create new L, R arrays at each recursion step, then the space complexity will be O(n log n). But you don't do that. You create one extra array of size n at the beginning (think of it as a global variable), and then you store the result of each merge into it.
You only pass around things that help you identify the subarrays, such as their sizes and the indexes they begin at. Then you access them using the original array and store the result of the merge in the globally allocated array, resulting in O(n) extra space:
global_temp = array of size equal to the array you're sorting
merge(A, p, q, r) {
    i = p
    j = q + 1                    // constant time
    k = p                        // write index into global_temp
    while i <= q and j <= r      // or a similar condition
        if A[i] <= A[j]
            global_temp[k++] = A[i]
            i++
        else
            global_temp[k++] = A[j]
            j++
    // TODO: copy remaining elements into global_temp
    // TODO: copy global_temp into A[p..r]
}
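To make the scheme concrete, here is a small runnable sketch in Python of the same idea (one temp buffer allocated up front and reused by every merge); the names are mine, not from the answer:

def merge_sort(A):
    temp = [0] * len(A)                 # single O(n) buffer, reused by every merge
    def sort(p, r):                     # sorts A[p..r] inclusive
        if r <= p:
            return
        q = (p + r) // 2
        sort(p, q)
        sort(q + 1, r)
        # Merge A[p..q] and A[q+1..r] into temp, then copy back.
        i, j, k = p, q + 1, p
        while i <= q and j <= r:
            if A[i] <= A[j]:
                temp[k] = A[i]; i += 1
            else:
                temp[k] = A[j]; j += 1
            k += 1
        while i <= q:
            temp[k] = A[i]; i += 1; k += 1
        while j <= r:
            temp[k] = A[j]; j += 1; k += 1
        A[p:r + 1] = temp[p:r + 1]
    sort(0, len(A) - 1)

data = [8, 7, 6, 5, 4, 3, 2, 1]
merge_sort(data)
print(data)   # [1, 2, 3, 4, 5, 6, 7, 8]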
Your question is unclear, but perhaps you are confused by the extra space you need.
Obviously, on the first pass (and every pass) you read the entire data, and merge each partition into one twice as big.
Let's just focus on 8 elements.
8 7 6 5 4 3 2 1
In the first pass, the size of each partition is 1, and you are merging them to size=2. So you read the 8 and the 7, and merge them into a partition:
7 8 5 6 3 4 1 2
The next stage is to merge groups of 2 into groups of 4. Obviously, you have to read every element. So both of these passes take O(n) operations. The number of times to double is log2(n), which is why this algorithm is O(n log n)
In order to merge, you need extra room. You could recycle it. But the worst case is when you merge two n/2 partitions into n (the last time). The easy way to envision that is to allocate a buffer big enough to copy the entire data into. That would be O(n) storage.
5 6 7 8 1 2 3 4
i       j

int buf[8]     // empty buffer
k = 0
buf[k++] = (orig[j] < orig[i]) ? orig[j++] : orig[i++]

Number of iterations in nested for-loops?

So I was looking at this code from a textbook:
for (int i=0; i<N; i++)
    for(int j=i+1; j<N; j++)
The author stated that the inner for-loop iterates exactly N*(N-1)/2 times, but gives no basis for how he arrived at that equation. I understand N*(N-1), but why divide by 2? I ran the code myself, and sure enough, when N is 10, the inner loop iterates 45 times (10*9/2).
I messed around with the code myself and tried the following (initializing j to i instead of i+1):
for (int i=0; i<N; i++)
    for(int j=i; j<N; j++)
With N = 10, this results in 55. So I'm having trouble understanding the underlying math here. Sure I could just plug in all the values and bruteforce my way through the problem, but I feel there is something essential and very simple I'm missing. How would you come up with an equation for describing the for loop I just constructed? Is there a way to do it without relying on the outputs? Would really appreciate any help thanks!
Think about what happens each time the outer loop iterates. The first time, i == 0, so the inner loop starts at 1 and runs to N-1, which is N-1 iterations in total. The next time through the outer loop, i has incremented to 1, so the inner loop starts at 2 and runs up to N-1, for a total of N-2 iterations. And that pattern continues: the third time through the outer loop, you get N-3 iterations, the fourth time through, N-4, etc. When you get to the last iteration of the outer loop, i == N-1, so the inner loop starts with j = N and stops immediately. So that's zero iterations.
The total number of iterations is the sum of all these numbers:
(N-1) + (N-2) + (N-3) + ... + 1 + 0
To look at it another way, this is just the sum of the positive integers from 1 to N-1. The result of this sum is called the (N-1)th triangular number, and Wikipedia explains how you can find that the formula for the n'th triangular number is n(n+1)/2. But here you have the (N-1)th triangular number, so if you set n=N-1, you get
(N-1)(N-1+1)/2 = N(N-1)/2
You're looking at nested loops where the outer one runs N times and the inner one runs at most N-1 times. You're in effect adding up the sum 1 + 2 + 3 + ....
The N * (N+1) / 2 is a "classic" formula in mathematics. Young Carl Gauss, later a famous mathematician, was given in-class busywork: Adding up the numbers from 1 to 100. The teacher expected to keep the kids busy for an hour but Carl came up with the answer almost immediately: 5050. He explained: 1 + 100; 2 + 99; 3 + 98; 4 + 97; and so on up to 50 + 51. That's 50 sums of 101 each. You could also see that as (100 / 2) * (100 + 1); that's where the /2 comes from.
As for why it's (N-1) instead of the (N+1) I mentioned: the inner loop starts at j = i+1 rather than at i, which drops one iteration from the inner loop for each i, so you end up summing 1 through N-1 instead of 1 through N.
Look at how many times the inner (j) loop runs for each value of i. When N = 10, the outer (i) loop runs 10 times, and the j loop should run 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 times. Now you just add up those numbers to see how many times the inner loop runs. You can sum the numbers from 0 to N-1 with the formula N(N-1)/2. This is a very slight modification of a well-known formula for adding the numbers from 1 to N.
For a visual aid, you can see why 1 + 2 + 3 + ... + n = n * (n+1) / 2
If you count the iterations of the inner loop, you get:
0 1 2 3 4 5 6 7 8 9
To get the total for an arbitrary number of iterations, you can "wrap" the numbers around like this:
0 1 2 3 4
9 8 7 6 5
Now, if we add each of those columns, they all add up to 9 (N-1), and there are 5 (N/2) columns. It's pretty obvious that for any even N, we'd still get N/2 columns that each add up to (N-1). As such, when N is even, the total number of iterations is always (N/2)(N-1), which (thanks to the commutative property) we can rewrite as N(N-1)/2.
If we did the same for an odd number of iterations, we'd have one "odd" column that couldn't be paired. In this case, we can ignore the '0' since we know it won't affect the overall sum in any case. For example, let's consider N=9 instead of N=10. For that, we get:
1 2 3 4
8 7 6 5
This gives us (N-1)/2 columns (9-1=8, 8/2=4) that each add up to N, so the sum will be N*(N-1)/2. Even though we've arrived at it slightly differently, this is an exact match for the formula above for when N is even. Again, it seems pretty obvious that this would remain true regardless of the number of columns we used (i.e., total number of iterations).
For any N (odd or even), the sum of the numbers from 0 through N-1 is N*(N-1)/2.
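As a quick empirical check (a small sketch of my own; the helper name is not from the question), you can count the inner-loop iterations for both variants and compare them with the closed forms:

def count_inner(N, start_offset):
    # start_offset = 1 reproduces the book's loop (j starts at i+1),
    # start_offset = 0 reproduces the modified loop (j starts at i).
    count = 0
    for i in range(N):
        for j in range(i + start_offset, N):
            count += 1
    return count

N = 10
print(count_inner(N, 1), N * (N - 1) // 2)   # 45 45
print(count_inner(N, 0), N * (N + 1) // 2)   # 55 55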

Resources