how to find the time complexity using step count - complexity-theory

int sum(int array[], int n)
{
    int tsum = 0;
    for (int i = 0; i < n; i++)
        tsum = tsum + array[i];
    return tsum;
}

In terms of big-O, that's linear in the number of array elements processed, so O(n).
(Note: if your code used i as the parameter, the loop would overwrite it; presumably that parameter was intended to be n, the number of array elements to sum.)

You initialize tsum=0 in 1 step, then run the loop n times, and finally return tsum in 1 step. Your step count is approximately:
1 + 3n + 1, which is O(n) in big-O notation, since big-O ignores constant factors and all lower-order terms (as long as there are a constant number of them). There are 3n steps because in each iteration you increment the variable i, check whether it is less than n, and do the addition tsum = tsum + array[i].

Statement              Cost  Times
tsum=0;                c1    1
for(i=0;i<n;i++)       c2    n+1
tsum=tsum+array[i];    c3    n
return tsum;           c4    1
The total cost of the algorithm is 1*c1 + (n+1)*c2 + n*c3 + 1*c4 = (c2 + c3)*n + (c1 + c2 + c4).
Thus, the time required for this algorithm is proportional to n.
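To make the Times column concrete, here is a small Python sketch. Python is used only for illustration, and the function name sum_with_counts and its counter labels are hypothetical. The sketch writes the for loop as a while loop so the loop test is visible, then counts how often each row of the table executes:

```python
def sum_with_counts(array):
    """Sum the array and count how often each row of the cost table executes."""
    n = len(array)
    counts = {"tsum=0": 0, "loop test": 0, "loop body": 0, "return": 0}
    tsum = 0
    counts["tsum=0"] += 1
    i = 0
    while True:
        counts["loop test"] += 1  # the i < n check in the for header
        if not i < n:
            break
        tsum = tsum + array[i]
        counts["loop body"] += 1
        i += 1  # the i++ part of the for header
    counts["return"] += 1
    return tsum, counts
```

If every cost is 1, sum(counts.values()) comes to 1 + (n+1) + n + 1 = 2n + 3, which is linear in n.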

Related

What is the time complexity of this BFS algorithm?

I looked at LeetCode question 279. Perfect Squares:
Given an integer n, return the least number of perfect square numbers that sum to n.
A perfect square is an integer that is the square of an integer; in other words, it is the product of some integer with itself. For example, 1, 4, 9, and 16 are perfect squares while 3 and 11 are not.
Example 1:
Input: n = 12
Output: 3
Explanation: 12 = 4 + 4 + 4.
I solved it using the following algorithm:
def numSquares(n):
    squares = [i**2 for i in range(1, int(n**0.5)+1)]
    step = 1
    queue = {n}
    while queue:
        tempQueue = set()
        for node in queue:
            for square in squares:
                if node-square == 0:
                    return step
                if node < square:
                    break
                tempQueue.add(node-square)
        queue = tempQueue
        step += 1
It basically tries to go from the goal number to 0 by subtracting each possible square, which are [1, 4, 9, ..., ⌊√n⌋²], and then does the same work for each of the numbers obtained.
Question
What is the time complexity of this algorithm? The branching factor at every level is up to sqrt(n), but some branches are destined to end early... which makes me wonder how to derive the time complexity.
If you think about what you're doing, you can imagine that you're doing a breadth-first search over a graph with n + 1 nodes (all the natural numbers between 0 and n, inclusive) and some number of edges m, which we'll determine later on. Your graph is essentially represented as an adjacency list, since at each point you iterate over all the outgoing edges (squares less than or equal to your number) and stop as soon as you consider a square that's too large. As a result, the runtime will be O(n + m), and all we have to do now is work out what m is.
(There's another cost here in computing all the perfect squares up to and including n, but that takes time O(√n), which is dominated by the O(n) term.)
If you think about it, the number of outgoing edges from each number k will be given by the number of perfect squares less than or equal to k. That value is equal to ⌊√k⌋ (check this for a few examples - it works!). This means that the total number of edges is upper-bounded by
$$\sqrt{0} + \sqrt{1} + \sqrt{2} + \cdots + \sqrt{n}$$
We can show that this sum is Θ(n^(3/2)). First, we'll upper-bound this sum at O(n^(3/2)), which we can do by noting that
$$\sqrt{0} + \sqrt{1} + \cdots + \sqrt{n} \le \underbrace{\sqrt{n} + \sqrt{n} + \cdots + \sqrt{n}}_{n+1 \text{ times}} = (n+1)\sqrt{n} = O(n^{3/2}).$$
To lower-bound this at Ω(n^(3/2)), drop the first half of the terms and bound each remaining term from below by √(n/2):
$$\sqrt{0} + \sqrt{1} + \cdots + \sqrt{n} \ge \sqrt{n/2} + \sqrt{n/2 + 1} + \cdots + \sqrt{n} \ge \frac{n}{2}\sqrt{\frac{n}{2}} = \Omega(n^{3/2}).$$
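So overall, this sum is Θ(n^(3/2)). Since ⌊√k⌋ ≥ √k − 1, the actual number of edges differs from this sum by at most n + 1, so the number of edges is also Θ(n^(3/2)). Using a regular analysis of breadth-first search, the runtime is O(n + n^(3/2)) = O(n^(3/2)).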
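Here is a quick numeric check of the edge count as a Python sketch. edge_count is a hypothetical helper, and it uses math.isqrt (Python 3.8+) to get the exact floor of the square root:

```python
import math


def edge_count(n):
    """Number of BFS edges: sum over k = 0..n of floor(sqrt(k))."""
    return sum(math.isqrt(k) for k in range(n + 1))


# Compare against the two bounds from the derivation above.
n = 10_000
print(edge_count(n))                    # exact count
print((n + 1) * math.sqrt(n))           # upper bound (n+1)*sqrt(n)
print(edge_count(n) / n ** 1.5)         # ratio to n^(3/2)
```

For n = 10,000 the count is 661,750, about 0.66·n^(3/2). The ratio tends to 2/3 because the sum of √k up to n grows like (2/3)n^(3/2).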
This bound is likely not tight, because this assumes that you visit every single node and every single edge, which isn't going to happen. However, I'm not sure how to tighten things much beyond this.
As a note - this would be a great place to use A* search instead of breadth-first search, since you can fairly easily come up with heuristics to underestimate the remaining total distance (say, take the number and divide it by the largest perfect square less than or equal to it). That would cause the search to focus on extremely promising paths that jump rapidly toward 0 before less-good paths, like, say, always taking steps of size one.
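The A* idea can be sketched in Python as follows. This is a hedged illustration, not the answer's own code. num_squares_astar and heuristic are hypothetical names. The heuristic is the one suggested above, rounded up: ceil(k / s), where s is the largest perfect square ≤ k. It never overestimates, because every square in a sum for k is at most s. Since the heuristic is only known to be admissible here, the sketch re-opens a node whenever it finds a cheaper path to it:

```python
import heapq
import math


def heuristic(k):
    """Lower bound on the squares needed for k: ceil(k / s), s = largest square <= k."""
    if k == 0:
        return 0
    s = math.isqrt(k) ** 2
    return -(-k // s)  # integer ceiling division


def num_squares_astar(n):
    """Fewest perfect squares summing to n (n >= 1), found with A* search."""
    squares = [i * i for i in range(1, math.isqrt(n) + 1)]
    best = {n: 0}  # fewest squares used so far to reach each remainder
    heap = [(heuristic(n), 0, n)]  # entries are (g + h, g, remainder)
    while heap:
        _, g, node = heapq.heappop(heap)
        if node == 0:
            return g  # admissible heuristic: the first time 0 is popped, g is optimal
        if g > best[node]:
            continue  # stale entry; a cheaper path to node was found later
        for sq in squares:
            if sq > node:
                break
            nxt = node - sq
            if g + 1 < best.get(nxt, math.inf):
                best[nxt] = g + 1
                heapq.heappush(heap, (g + 1 + heuristic(nxt), g + 1, nxt))
```

With this sketch, num_squares_astar(12) should return 3, the same answer as the BFS version.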
Hope this helps!
Some observations:
The number of squares up to n is ⌊√n⌋ (√n floored to the nearest integer)
After the first iteration of the while loop, tempQueue will have at most ⌊√n⌋ entries
tempQueue can never have more than n entries, since all these values are positive, less than n and unique.
By Lagrange's four-square theorem, every natural number can be written as the sum of four integer squares. So your BFS algorithm's while loop will iterate at most 4 times. If the return statement did not get executed during any of the first 3 iterations, it is guaranteed to execute in the 4th.
Every statement (except for the initialisation of squares) runs in constant time, even the call to .add() (on average, since Python sets are hash tables).
The initialisation of squares has a list comprehension loop that has √n iterations, and range runs in constant time, so that initialisation has a time complexity of O(√n).
Now we can set a ceiling to the number of times the if node-square == 0 statement is executed (or any other statement in the innermost loop's body):
$$1\cdot\sqrt{n} + \sqrt{n}\cdot\sqrt{n} + n\cdot\sqrt{n} + n\cdot\sqrt{n}$$
Each of the 4 terms corresponds to an iteration of the while loop. The left factor of each product corresponds to the maximum size of queue in that particular iteration, and the factor at the right corresponds to the size of squares (always the same). This simplifies to:
$$\sqrt{n} + n + 2n^{3/2}$$
In terms of time complexity this is:
$$O(n^{3/2})$$
This is the worst case time complexity. When the while loop only has to iterate twice, it is O(n), and when only once (when n is a square), it is O(√n).
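To check these bounds on real inputs, the BFS from the question can be instrumented. In this Python sketch, num_squares_counted is a hypothetical name. It swaps int(n**0.5) for math.isqrt(n) to avoid floating-point rounding, and it returns the number of while-loop levels together with how many times the innermost loop body ran:

```python
import math


def num_squares_counted(n):
    """The question's BFS, returning (levels, innermost-loop executions)."""
    squares = [i ** 2 for i in range(1, math.isqrt(n) + 1)]
    step = 1
    queue = {n}
    inner = 0
    while queue:
        temp_queue = set()
        for node in queue:
            for square in squares:
                inner += 1  # one execution of the innermost loop body
                if node - square == 0:
                    return step, inner
                if node < square:
                    break
                temp_queue.add(node - square)
        queue = temp_queue
        step += 1
```

For every n this should report at most 4 levels, and the inner count should stay below √n + n + 2n^(3/2). For example, n = 7 = 4 + 1 + 1 + 1 needs all 4 levels.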

Theta Notation and Worst Case Running time nested loops

This is the code I need to analyse:
i = 1
while i < n do
    j = 0
    while j <= i do
        j = j + 1
    i = 2i
So, the first loop should run log(2,n) and the innermost loop should run log(2,n) * (i + 1), but I'm pretty sure that's wrong.
How do I use a theta notation to prove it?
An intuitive way to think about this is to see how much work your inner loop does for a fixed value of the outer loop variable i. Since j runs from 0 up to i, the inner loop executes i + 1 times, i.e. about as many times as i itself. Thus, if the value of i is 256, you will do j = j + 1 257 times.
Thus, the total work done is the sum of (i + 1) over the values that i takes during the outer loop's execution. That variable grows quickly toward n. Its values, as given by i = 2i (it should be i = 2*i), are 1, 2, 4, 8, ..., a geometric series a, ar, ar^2, ... with a = 1 and r = 2. The last term is the largest power of 2 below n, so there are about log2 n terms, and the total is the sum of a geometric series plus one extra iteration per term.
It doesn't make much sense to have a worst case or a best case for this algorithm because there are no different permutations of the input which is just a number n in this case. Best case or worst case are relevant when a particular input (e.g. a particular sequence of numbers) affects the running time of the algorithm.
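The running time is then the sum of a geometric series, a(r^num_terms − 1)/(r − 1), plus the number of terms. Let 2^k be the last value of i, so that 2^k < n ≤ 2^(k+1). For n ≥ 2:
$$T(n) = \sum_{t=0}^{k} \left(2^t + 1\right) = \left(2^{k+1} - 1\right) + (k + 1) < 2n + \log_2 n \le 3n = O(n)$$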
Thus, you can't be doing work that is more than some constant multiple of n. Hence, the running time of this algorithm is O(n).
You can't be doing less work than some (other) constant multiple of n either. The last value of i in the outer loop is the largest power of 2 below n, which is at least n/2, so the inner loop already runs more than n/2 times in that final outer iteration. Thus, the running time of this algorithm is also ≥ c·n, i.e. it is Ω(n).
Together, this means that running time of this algorithm is Θ(n).
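Here is a Python sketch that simply counts the j = j + 1 executions and compares them with the closed form above. inner_increments and closed_form are hypothetical helper names, and i = 2i is read as i = 2 * i:

```python
def inner_increments(n):
    """Count how many times j = j + 1 runs in the nested loops."""
    count = 0
    i = 1
    while i < n:
        j = 0
        while j <= i:
            j = j + 1
            count += 1
        i = 2 * i
    return count


def closed_form(n):
    """(2^(k+1) - 1) + (k + 1), where 2^k is the largest power of 2 below n (n >= 2)."""
    k = (n - 1).bit_length() - 1  # largest k with 2**k < n
    return (2 ** (k + 1) - 1) + (k + 1)
```

For example, n = 5 gives i = 1, 2, 4 and 2 + 3 + 5 = 10 increments, which lies between n/2 and 3n.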
You can't use i in your final expression; only n.
You can easily see that the inner loop executes i + 1 times each time it is reached. And it sounds like you've figured out the different values that i can have. So add up those values, and you have the total amount of work.

Selection i'th smallest number algorithm

I'm reading Introduction to Algorithms book, second edition, the chapter about Medians and Order statistics. And I have a few questions about randomized and non-randomized selection algorithms.
The problem:
Given an unordered array of integers, find i'th smallest element in the array
a. The Randomized_Select algorithm is simple, but I cannot understand the math that explains its running time. Is it possible to explain it without deep math, in a more intuitive way? I'd think that it should run in O(n log n), and in the worst case it should be O(n^2), just like quicksort. On average RandomizedPartition returns a position near the middle of the array, so the array is divided in two at each call, and the next recursive call processes only half of the array. RandomizedPartition costs (r-p+1) <= n, so we have O(n log n). In the worst case it would choose the max element of the array every time, and divide the array into two parts of sizes (n-1) and 0 at each step. That's O(n^2).
The next one (the Select algorithm) is even more incomprehensible than the previous one:
b. What is its difference compared to the previous one? Is it faster on average?
c. The algorithm consists of five steps. In the first one we divide the array into n/5 groups of 5 elements each (except possibly the last one). Then each group is sorted using insertion sort, and we select the 3rd element (the median) of each. Because we have sorted these elements, we can be sure that the previous two are <= this pivot element, and the last two are >= it. Then we need to select the median among the medians. The book states that we recursively call the Select algorithm on these medians. How can we do that? In the Select algorithm we use insertion sort, and if we swap two medians, we need to swap all four (or even more at deeper steps) elements that are "children" of each median. Or do we create a new array that contains only the previously selected medians and search for the median among them? If so, how do we put them back in the original array, since we changed their order previously?
The other steps are pretty simple and look like in the randomized_partition algorithm.
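Randomized select runs in expected O(n) time. Look at this analysis.
Algorithm:
Randomly choose an element (the pivot)
Split the set into a "lower than" set L and a "bigger than" set B
If the size of L is i-1, the pivot is the i-th smallest element and we found it
If the size of L is bigger, look up the i-th smallest element in L
Otherwise, look up the (i - |L| - 1)-th smallest element in B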
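Here is a minimal Python sketch of these steps. randomized_select is a hypothetical name. The sketch builds new lists instead of partitioning in place like the book's RANDOMIZED-PARTITION, and it also counts the elements equal to the pivot so that duplicates are handled:

```python
import random


def randomized_select(values, i):
    """Return the i-th smallest element (1-based) of values; expected O(n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i out of range")
    pivot = random.choice(values)                 # randomly choose an element
    lower = [v for v in values if v < pivot]      # "lower than" set L
    bigger = [v for v in values if v > pivot]     # "bigger than" set B
    equal_count = len(values) - len(lower) - len(bigger)
    if i <= len(lower):
        return randomized_select(lower, i)        # answer is in L
    if i <= len(lower) + equal_count:
        return pivot                              # the pivot itself has rank i
    return randomized_select(bigger, i - len(lower) - equal_count)
```

Each call does O(n) work to split the list and then recurses on only one side. This is the O(n) + cost(next set) structure that the analysis below turns into an expectation.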
The total cost is the sum of :
The cost of splitting the array of size n
The cost of lookup in L or the cost of looking up in B
You can notice that:
We always go next into the set with the greater number of elements
The number of elements in this set is n - rank(xj)
1 <= rank(xj) <= n, so 0 <= n - rank(xj) <= n - 1
Because xj is chosen at random, the number of elements greater than xj (and smaller than xj) is also random
If xj is the element chosen, then you know that the cost is O(n) + cost(n - rank(xj)). Let's call rank(xj) = rj.
To give a good estimate we need to take the expected value of the total cost, which is
$$T(n) = E[\text{cost}] = \sum_{\text{each possible } x_j} p(x_j)\,\big(O(n) + T(n - \operatorname{rank}(x_j))\big)$$
xj is random (each element is chosen with probability 1/n). After this it is pure math.
We obtain:
$$T(n) = \frac{1}{n}\Big(O(n) + \sum_{1 \le r_j \le n,\; r_j \ne i} \big(O(n) + T(n - r_j)\big)\Big)$$
Here you can change the variable, vj = n - rj:
$$T(n) = \frac{1}{n}\Big(O(n) + \sum_{0 \le v_j \le n-1,\; v_j \ne n-i} \big(O(n) + T(v_j)\big)\Big)$$
We pull the O(n) out of the sum; summed over n - 1 terms, it gains a factor of n:
$$T(n) = \frac{1}{n}\Big(O(n) + O(n^2) + \sum_{0 \le v_j \le n-1,\; v_j \ne n-i} T(v_j)\Big)$$
We move O(n) and O(n^2) outside the parentheses, losing a factor of n:
$$T(n) = O(1) + O(n) + \frac{1}{n}\sum_{0 \le v_j \le n-1,\; v_j \ne n-i} T(v_j)$$
Check the link on how this is computed.
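One way to finish the argument is sketched below. This is a substitution sketch, not necessarily the derivation behind the link. It rests on three assumptions: the O(1) + O(n) term is bounded by cn for some constant c; the v_j ≠ n − i restriction is dropped, which can only make the sum larger since T ≥ 0; and, as the induction hypothesis, T(v) ≤ dv for all v < n:

```latex
T(n) \le cn + \frac{1}{n}\sum_{v=0}^{n-1} d\,v
     = cn + \frac{d(n-1)}{2}
     < \left(c + \frac{d}{2}\right) n
     \le dn \qquad \text{whenever } d \ge 2c
```

So, by induction (with d also large enough to cover the small base cases), T(n) = O(n).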
For the non-randomized version :
You say yourself:
In avg randomizedPartition returns near middle of the array.
That is exactly why the randomized algorithm works, and that is exactly what is used to construct the deterministic algorithm. Ideally you want to pick the pivot deterministically such that it produces a good split, but finding the best value for a good split (the median) is itself a selection problem! So at each step they want a value which is good enough: "at least 3/10 of the array below the pivot and at least 3/10 of the array above". To achieve this they split the array into groups of 5 at each step, and again the group size is a mathematical choice.
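Here is one possible answer to question c, sketched in Python. median_of_medians_select is a hypothetical name, and this is not the book's in-place version. The idea is to copy the group medians into a new list and recurse on that list only to obtain the pivot value. The original array is then partitioned around that value. Since only the pivot's value is needed, the order of the medians in the original array never has to be restored:

```python
def median_of_medians_select(values, i):
    """Return the i-th smallest element (1-based); worst-case O(n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i out of range")
    if len(values) <= 5:
        return sorted(values)[i - 1]
    # Groups of 5, each sorted (the book uses insertion sort; sorted() here).
    groups = [sorted(values[k:k + 5]) for k in range(0, len(values), 5)]
    # The medians go into a NEW list; the original groups are left alone.
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursively select the median of the medians to use as the pivot value.
    pivot = median_of_medians_select(medians, (len(medians) + 1) // 2)
    lower = [v for v in values if v < pivot]
    higher = [v for v in values if v > pivot]
    equal_count = len(values) - len(lower) - len(higher)  # at least 1
    if i <= len(lower):
        return median_of_medians_select(lower, i)
    if i <= len(lower) + equal_count:
        return pivot
    return median_of_medians_select(higher, i - len(lower) - equal_count)
```

The pivot chosen this way guarantees that a constant fraction of the elements (roughly 3/10, as described above) falls on each side. This is what makes the running time linear in the worst case rather than only on average.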
I once created an explanation for this (with diagram) on the Wikipedia page for it... http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

Time complexity

The problem is finding the majority element in an array.
I understand how this algorithm works, but I don't know why it has O(n log n) time complexity.
a. Both return "no majority." Then neither half of the array has a majority element, and the combined array cannot have a majority element. Therefore, the call returns "no majority."
b. The right side is a majority, and the left isn't. The only possible majority for this level is with the value that formed a majority on the right half, therefore, just compare every element in the combined array and count the number of elements that are equal to this value. If it is a majority element then return that element, else return "no majority."
c. Same as above, but with the left returning a majority, and the right returning "no majority."
d. Both sub-calls return a majority element. Count the number of elements equal to both of the candidates for majority element. If either is a majority element in the combined array, then return it. Otherwise, return "no majority."
The top level simply returns either a majority element or that no majority element exists in the same way.
Therefore, T(1) = 0 and T(n) = 2T(n/2) + 2n = O(n log n)
I think every recursion compares the majority element to the whole array, which takes 2n, so
$$T(n) = 2T(n/2) + 2n = 2\big(2T(n/4) + 2n\big) + 2n = \cdots = 2^k T(n/2^k) + 2n + 4n + 8n + \cdots + 2^k n = O(n^2)$$
T(n) = 2T(n/2) + 2n
The question is how many iterations does it take for n to get to 1.
We divide by 2 in each iteration, so we get the series n, n/2, n/4, n/8, ..., n/(2^k)
So, let's find k that will bring us to 1 (last iteration):
n/(2^k)=1 .. n=2^k ... k=log(n)
So we got log(n) iterations.
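Now, each level of the recursion does 2n operations in total: at depth t there are 2^t subproblems of size n/2^t, and each one does 2(n/2^t) work. (This is where the expansion in the question goes wrong: the inner term should be 2T(n/4) + 2(n/2), not 2T(n/4) + 2n.)
So in total, we get log(n) levels with O(n) operations each: O(n log n).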
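You can also evaluate the recurrence directly. This Python sketch assumes n is a power of 2, so n/2 is always an integer. T is just the recurrence written as a function, not the majority algorithm itself:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(1) = 0, T(n) = 2T(n/2) + 2n, for n a power of 2."""
    if n == 1:
        return 0
    return 2 * T(n // 2) + 2 * n
```

For n = 2^m this gives exactly 2·m·2^m = 2n log2 n. For example, T(1024) = 20480 = 2 · 1024 · 10, which is O(n log n), not O(n^2).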
I'm not sure if I understand, but couldn't you just create a hash map, walk over the array incrementing hash[value] at every step, then sort the hash map by count (O(m log m) time) and compare the top two elements? This would cost you O(n) + O(m log m) + 2 = O(n + m log m), with n the size of the array and m the number of distinct elements in the array.
Am I mistaken here? Or ...?
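Here is a Python sketch of the hash-map idea. majority_by_counting is a hypothetical name. Instead of sorting the counts, it takes the single largest count with max, which is O(m) rather than O(m log m). It then checks that this count is more than half the array:

```python
from collections import Counter


def majority_by_counting(values):
    """Return the majority element (count > len/2) or None; O(n) expected time."""
    if not values:
        return None
    counts = Counter(values)  # one pass over the array: O(n) expected
    candidate, count = max(counts.items(), key=lambda kv: kv[1])  # O(m)
    return candidate if 2 * count > len(values) else None
```

For example, [2, 2, 1, 2, 3] gives 2, while [1, 1, 2, 2] gives None. The trade-off against the divide-and-conquer version is O(m) extra memory for the hash map.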
When you do this recursively, you split the array in two at each level, make a call for each half, and then perform one of the tests a - d. Test a requires no looping; the other tests require looping through the entire array. On average (assuming the four cases are equally likely), you will loop through (0 + 1 + 1 + 1) / 4 = 3/4 of the array at each level of the recursion.
The number of levels in the recursion is based on the size of the array. As you split the array in half each level, the number of levels will be log2(n).
So, the total work is (n * 3/4) * log2(n). As constant factors are irrelevant to the time complexity, and logarithms of different bases differ only by a constant factor, the complexity is O(n log n).
Edit:
If someone is wondering about the algorithm, here's a C# implementation. :)
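// Returns the majority element of arr[start .. start + len - 1], or null if there is none.
private int? FindMajority(int[] arr, int start, int len) {
    if (len == 0) return null;           // an empty range has no majority
    if (len == 1) return arr[start];     // a single element is its own majority
    int len1 = len / 2, len2 = len - len1;
    int? m1 = FindMajority(arr, start, len1);
    int? m2 = FindMajority(arr, start + len1, len2);
    // A majority of the whole range must be a majority of one of the halves,
    // so only m1 and m2 need counting. Majority means strictly more than half.
    if (m1.HasValue && CountInRange(arr, start, len, m1.Value) * 2 > len) return m1;
    if (m2.HasValue && CountInRange(arr, start, len, m2.Value) * 2 > len) return m2;
    return null;
}

// Counts occurrences of value in arr[start .. start + len - 1] in O(len) time.
private int CountInRange(int[] arr, int start, int len, int value) {
    int cnt = 0;
    for (int k = start; k < start + len; k++)
        if (arr[k] == value) cnt++;
    return cnt;
}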
This guy has a lot of videos on recurrence relation, and the different techniques you can use to solve them:
https://www.youtube.com/watch?v=TEzbkIggJfo&list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
Basically for this problem I would use the Master Theorem:
https://youtu.be/i5kTZof1LRY
T(1) = 0 and T(n) = 2T(n/2) + 2n
Master Theorem ==> T(n) = A·T(n/B) + O(n^D), so in this case A=2, B=2, D=1
Since A = B^D (2 = 2^1), the Master Theorem gives O(n^D log n) = O(n log n)
You can also use another method to solve this (below) it would just take a little bit more time:
https://youtu.be/TEzbkIggJfo?list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
I hope this helps you out!

Summing an Array and Big O Notation

How do I find an algorithm for calculating the sum of the values in an array?
Is it something like this?
Algorithm Array Sum
Input: nonnegative integer N, and array A[1],A[2],...,A[N]
Output: sum of the N integers in array A
Algorithm Body:
j:=1
sum:=0
while j<N
    sum := sum + A[j]
    j:=j+1
end while
end Algorithm Array Sum
And how can I relate it to the running time of the algorithm using O-notation?
This is from a past-year exam, and I need to revise for my exam.
Question
An Array A[] holding n integer value is given
1. Give an algorithm for calculating the sum of all the values in the array
2. Find the simplest and best O-notation for the running time of the algorithm.
The question is to find the sum of all the values so iterate through each element in the array and add each element to a temporary sum value.
temp_sum = 0
for i in 1 ...array.length
temp_sum = temp_sum + array[i]
Since you need to go through all the elements in the array, this program depends linearly on the number of elements. If you have 10 elements, you iterate through 10 elements; if you have a million, you have no choice other than to go through all the million elements and add each of them. Thus the time complexity is Θ(n).
If you are finding the sum of all the elements and you don't know anything about the data, then you need to look at every element at least once. Thus n is the lower bound. You also never need to look at an element more than once, so n is also the upper bound. Hence the complexity is Θ(n).
However, if you know something about the elements, say you get the sequence of the first n natural numbers 1, 2, ..., n, you can compute the sum in constant time as n(n+1)/2. If the data you get is arbitrary, then you have no choice but to use the linear-time algorithm above.
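The two cases can be sketched in Python. linear_sum and sum_first_n are hypothetical names. The first is the general Θ(n) loop. The second assumes the input is exactly 1, 2, ..., n, so it can use the closed form:

```python
def linear_sum(values):
    """General case: touch every element exactly once, Theta(n)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_first_n(n):
    """Special case: the values are exactly 1, 2, ..., n, so the sum is n(n+1)/2 in O(1)."""
    return n * (n + 1) // 2
```

For example, linear_sum(range(1, 101)) and sum_first_n(100) both give 5050. The constant-time version is only valid under that assumption about the data.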
Since n is the size of the array and all you have to do is iterate from beginning to end, the Big O notation is O(n).
integer N= Size_array;
array a[N]
j=1
sum=0
while j<=N
sum += a[j]
j++
end while
I think that you meant "while j <= N", you need to specify this.
The running time shall be O(n), I think, as you have only one loop.
To calculate O for this algorithm you need to count the number of times each line of code executes. Later on you will only count the fundamental operations but start by counting all.
So how many times will the j := 1 line run? How many times will the sum := 0 run?
How many times will the while loop's condition execute? The statements inside the while loop?
Sum these all up. You will notice that the value you get will be something like 1 + 1 + (n + 1) + n + n = 3n + 3 (the loop condition is checked one extra time, when it fails). Thus you can conclude that it is a linear function of n.
