Doubts on finding the complexity of an algorithm

So far I think I understand the basics of finding algorithm complexity:
Basic operations like reads, writes, assignments and allocations have constant complexity O(k), which can be simplified to O(1).
For loops you have to think of the worst case, i.e. the input for which the loop takes the longest:
The complexity is O(n) if the loop variable changes by a constant amount, for example a variable i that starts from 0 and is incremented or decremented by one at each iteration until it reaches n.
The complexity is O(logn) if the loop variable is multiplied or divided by a constant factor at each iteration.
The complexity is O(n^2) if there are two nested loops that each run up to n times (see the sketch below).
If a function contains multiple loops in sequence, the complexity of the function is that of the loop with the worst complexity.
In case the value n doesn't change and you always have to iterate n times, you use the Θ notation, because the best and worst cases coincide.
Please correct me if anything I said so far is wrong.
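For concreteness, here are minimal sketches of the two loop shapes described above (my own illustration, assuming an input size n):

def linear_loop(n):
    i = 0
    while i < n:        # i changes by a constant amount -> O(n) iterations
        i += 1

def log_loop(n):
    i = n
    while i > 1:        # i shrinks by a constant factor -> O(log n) iterations
        i //= 2

Nesting linear_loop-style iteration inside itself is what gives the O(n^2) case.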
For recursive functions the complexity depends on how many recursive calls there will be in the worst case; you have to find a recurrence relation and solve it with one of the three standard methods (substitution, the recursion tree, or the master theorem).
This is where the problems begin for me:
Example
Let's say I have a binary tree with this structure: each node holds a pointer to its left and right children and the value of its depth.
There is a function that initially takes the root and wants to perform an operation on each left child of the nodes that have odd depth. To solve this with recursion, I check whether the node has odd depth and, if so, whether it has a left child; if it does, I perform the operation on the left child and then make the recursive call on the next nodes. In this case I think the complexity should be O(n), where n is the number of odd-depth nodes, and the worst case is that all odd-depth nodes have a left child.
But what's the recurrence relation in a function like this?
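For illustration, here is a minimal sketch of the traversal described above, assuming a Node class with left, right and depth fields (hypothetical names). Each node triggers exactly one call doing O(1) work, so a natural recurrence is T(n) = T(n_left) + T(n_right) + O(1), which solves to T(n) = O(n) over all n nodes:

class Node:
    def __init__(self, left=None, right=None, depth=0):
        self.left = left
        self.right = right
        self.depth = depth

def visit(node, operation):
    if node is None:
        return
    # O(1) work per call: check the depth parity and the left child.
    if node.depth % 2 == 1 and node.left is not None:
        operation(node.left)
    # One recursive call per child: T(n) = T(n_left) + T(n_right) + O(1).
    visit(node.left, operation)
    visit(node.right, operation)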

Related

How do I measure the O-notation of a for loop?

I have a question about O-notation (big O).
In my code, I am using a for loop to iterate through an array of users.
The for loop has an if-statement that makes it break out of the loop if the right user is found.
My question is: how do I measure the O-notation?
Is the O-notation O(N), since I loop through all the users in the array?
Or is it O(1), since the loop breaks and never runs again?
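For reference, a sketch of the loop being described, assuming a list of user objects with a hypothetical id attribute:

def find_user(users, target_id):
    for user in users:               # at most len(users) iterations
        if user.id == target_id:     # hypothetical matching condition
            return user              # early exit: best case is one iteration
    return None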
O notation defines an "order of" relationship between an amount of work (however measured) and the number of items processed (usually 'n'). So "O(n)" means "in direct proportion to the number of items n", and "O(1)" means simply "constant".
If a loop processes every item once, then the amount of work is intuitively in direct proportion to n. But let's say that your exit condition gets hit on average half way through: we might be tempted to say that this is O(n/2), but instead we still say that it is O(n), because the relationship to n is still direct/linear. Similarly, if you were to assess the relationship to be O(7n^3 + 2n), you'd say it was simply O(n^3), because n^3 is the term that dominates as n grows large.
The answer to your specific question is therefore O(n) because the number of iterations is in direct proportion to n. All that this says is that if N user records take M milliseconds to process, 2N should take about 2M milliseconds.
It is probably worth noting that O notation is conventionally applied to the worst case rather than the average cost of algorithms (although it is quite common for people to use it in the latter sense). It is always a good idea to specify which you mean, to avoid ambiguity.
Big O notation answers the following two questions:
If there are N data elements, how many steps will the algorithm take?
How will the performance of the algorithm change if the number of data elements increases?
The best-case scenario in your case is that the user you are searching for is found at the first index. The time complexity in this case would be O(1), because the number of steps taken by the algorithm is constant and does not change as the number of elements in the array grows.
The worst-case scenario is that your loop will have to iterate over all the users. That makes the time complexity O(N), because the number of steps taken by the algorithm is directly proportional to the number of elements in the array.
Big O notation generally refers to the worst-case scenario, so you can say that the time complexity in your case is O(N).
The best-case complexity of your for loop is O(1) and the worst-case complexity is O(N): in linear search the best case is O(1) (the element is found immediately) and the worst case is O(N). It also depends on the approach you take to solve the problem. For a loop like for(int i = n; i > 1; i = i/2) the complexity is O(log(N)), because i is halved at each step. The complexity of an if-else condition is O(1).

Simple algorithmic complexity of two nested loops

I guess it's rather simple, but it seems I'm confusing myself...
What's the complexity of the following?
// let's say that Q has M initial items
while Q not empty
    v <- Q.getFirst
    for each z in v    // here, every v cannot have more than 3 z's
        ...
        O(1) operations here
        ...
        Q.insert(z)
    end
end
How many times this happens depends on when the v's stop having more z's (let's call the total number N).
Is the complexity O(M×N^2), or am I wrong? It's like having a tree with M parent nodes, where each node can have at most three children, and N is the total number of nodes.
Your algorithmic complexity has an upper bound of O(M * v) minus the parent nodes that are also child nodes, which is much better stated as O(n), where n is the number of nodes in your tree, since you only iterate over the tree once.
Depending on your data structure, you would also want to consider the runtime of your Q.insert(z) and Q.getFirst() operations, because they may not be constant.
Assuming the Q.insert() and Q.getFirst() runtimes are O(1), you can say O(M * v) is an approximate bound, but since v elements can be repeated you are better off stating that the runtime is just O(n): O(M * v) overestimates the upper bound, while O(n) is exact for every instance of the tree (n being the number of nodes).
I would say that it's much safer to call it O(n), since I don't know the exact implementation of your insert; with a linked list, both insert and getFirst can be O(1) operations. (Most binary tree inserts will be O(log n) if properly implemented; sufficient information was not provided.)
It should not harm you to play it safe and consider your runtime analysis O(n), but depending on who you're pitching it to, that extra variable may seem unnecessary.
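To make the counting argument concrete, here is a sketch of the traversal under the assumptions above (O(1) getFirst/insert, at most three children per node, and a hypothetical children attribute). Every node enters and leaves Q exactly once, which is where the O(n) bound comes from:

from collections import deque

def traverse(roots):                 # roots: the M initial items in Q
    Q = deque(roots)
    visited = 0
    while Q:                         # each node is dequeued exactly once
        v = Q.popleft()              # O(1) getFirst
        visited += 1
        for z in v.children:         # at most 3 children per node
            # ... O(1) operations here ...
            Q.append(z)              # O(1) insert
    return visited                   # equals n, the total number of nodes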
HTH
edited: clarity of problem in comments helped me understand the question better, fixed nonsense

Time complexity of an algorithm - n or n*n?

I'm trying to find out what the Θ complexity of this algorithm is.
(a is a list of integers)
def sttr(a):
    s = []  # stack of indices into a (must be initialized before the loop)
    for i in xrange(0, len(a)):
        while s != [] and a[i] >= a[s[-1]]:
            s.pop()
        s.append(i)
    return s
On the one hand, I can say that append is executed n times (where n is the length of the array a), so pop can also run at most n times in total, and the last thing I should consider is the while condition, which could be evaluated at most about 2n times.
From this I can say that this algorithm does at most about 4n operations, so it is Θ(n).
But isn't that amortised analysis?
On the other hand I can say this:
There are two nested loops. The for loop executes exactly n times. The while loop could execute up to n times, since an item is removed in each of its iterations. So the complexity is Θ(n*n).
I want to compute the Θ complexity but don't know which of these two arguments is correct. Could you give me advice?
The answer is Θ(n), and your arguments are correct.
This is not amortized analysis.
To get to amortized analysis you have to look at the inner loop. You can't easily say how fast the while loop will execute if you ignore the rest of the algorithm. The naive approach would be O(n), and that's correct, since that's the maximum number of iterations. However, since we know that the total number of executions is O(n) (your argument) and that the loop will be reached n times, we can say that the complexity of the inner loop is O(1) amortized.
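A quick sanity check of that argument, instrumenting the function above to count pops (a sketch; the logic is unchanged): since each index is pushed exactly once, the total number of pops over the whole run is at most n, no matter how they are distributed across iterations.

def sttr_counted(a):
    s, pops = [], 0
    for i in range(len(a)):
        while s and a[i] >= a[s[-1]]:
            s.pop()
            pops += 1
        s.append(i)
    return s, pops

print(sttr_counted([3, 1, 4, 1, 5, 9]))   # ([5], 5): 6 pushes, 5 pops in total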

Best and worst case time for Algorithm S when time complexity changes according to n being even/odd

The following is a homework assignment, so I would rather get hints or bits of information that would help me figure this out, and not complete answers.
Consider S an algorithm solution to a problem that takes as input an array A of size n. After analysis, the following conclusion was obtained:
Algorithm S executes an O(n)-time computation for each even number in A.
Algorithm S executes an O(logn)-time computation for each odd number in A.
What are the best and worst case time for algorithm S?
From this I understand that the time complexity changes according to n being even or odd. In other words, if n is even, S takes O(n) time, and when n is odd, S takes O(logn).
Is it a simple matter of taking the best case and the worst case of both growth-rates, and choosing their boundaries? Meaning:
Best case of O(n) is O(1), and worst case is O(n).
Best case of O(logn) is O(logn) and worst case is O(logn).
Therefore the best case for Algorithm S is O(logn) and the worst case is O(n)?
Am I missing something? or am I wrong in assessing the different best/worst case of both cases of big-Oh?
1st attempt:
Ok, so I completely misunderstood the problem. Thanks to candu, I can now better understand what is required of me, and so try to calculate the best and worst case better.
It seems that Algorithm S changes its runtime according to EACH number in A. If the number is even, the runtime is O(n), and if the number is odd, we get O(logn).
The worst case will be composed of an array A of n even numbers, and for each the algorithm will run O(n). In other words, the worst case runtime for Algorithm S should be n*O(n).
The best case will be composed of an array A of n odd numbers, and for each the algorithm will run O(logn). The best case runtime for algorithm S should be n*O(logn).
Am I making any sense? is it true then that:
Best case of algorithm S is nO(logn) and worst case is nO(n)?
If that is true, can it be rewritten? for example, as O(log^n(n)) and O(n^n)? or is this an arithmetic mistake?
2nd attempt:
Following JuanLopes' response, it seems like I can rewrite nO(n) as O(n*n) or O(n^2), and nO(logn) as O(nlogn).
Does it make sense now that Algorithm S runs at O(nlogn) at the best case, and O(n^2) at the worst case?
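To see where those bounds come from, here is a toy cost model (entirely hypothetical; it just charges n steps per even number and logn steps per odd number, as the problem statement says):

import math

def simulated_cost(A):
    n = len(A)
    # even element -> O(n)-time computation; odd element -> O(logn)-time
    return sum(n if x % 2 == 0 else math.log2(n) for x in A)

print(simulated_cost([2] * 8))   # all even: 8 * 8 = 64   -> n * n steps
print(simulated_cost([1] * 8))   # all odd:  8 * 3 = 24.0 -> n * log n steps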
There's a bit of confusion here: the algorithm runtime doesn't depend on n being even or odd, but on whether the numbers in A are even or odd.
With that in mind, what sort of input A would make Algorithm S run faster? Slower?
Also: it doesn't make sense to say that the best case of O(n) is O(1). Suppose I have an algorithm ("Algorithm Q") that is O(n); all I can say is that there exists a constant c such that, for any input of size n, Algorithm Q takes less than cn time. There is no guarantee that I can find specific inputs for which Algorithm Q is O(1).
To give a concrete example, this takes linear time no matter what input it is passed:
def length(A):
    count = 0
    for x in A:
        count += 1
    return count
A few thoughts.
First, there is no mention of an asymptotically tight bound. So an algorithm stated to be O(n) could actually be an O(logn) one. So just imagine the best running time this algorithm could have in that case. I know this is a little picky, but since this is homework, I guess it's always welcome to mention all the possibilities.
Second, even if the bound is asymptotically tight, it doesn't necessarily mean it's tight for all elements. Consider insertion sort: for each new element to insert, we need to find the correct position in the already-sorted subarray before it. The time is proportional to the number of elements in that subarray, which has the upper bound O(n). But it doesn't mean each new element needs exactly n comparisons to insert; actually, the shorter the subarray, the quicker the insertion.
Back to this question: "executes an O(logn)-time computation for each odd number in A". Let's assume all numbers are odd. It could be that the first odd number takes O(log 1), the second takes O(log 2), ..., and the nth takes O(logn). In total, that is O(log(n!)). This doesn't contradict "O(logn) for each odd number".
As to the worst case, you may analyze it in much the same way.
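(For completeness, the sum mentioned above works out as follows: log 1 + log 2 + ... + log n = log(n!), and since log(n!) <= n*logn while also log(n!) >= (n/2)*log(n/2), we get log(n!) = Θ(nlogn). So the all-odd input gives Θ(nlogn) in total, matching the best case computed in the question.)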

What's wrong with this inductive proof that mergesort is O(n)?

Comparison-based sorting is Ω(nlogn), so we know that mergesort can't be O(n). Nevertheless, I can't find the problem with the following proof:
Proposition P(n): For a list of length n, mergesort takes O(n) time.
P(0): merge sort on the empty list just returns the empty list.
Strong induction: Assume P(1), ..., P(n-1) and try to prove P(n). We know that at each step in a recursive mergesort, two approximately "half-lists" are mergesorted and then "zipped up". The mergesorting of each half list takes, by induction, O(n/2) time. The zipping up takes O(n) time. So the algorithm has a recurrence relation of M(n) = 2M(n/2) + O(n) which is 2O(n/2) + O(n) which is O(n).
Compare the "proof" that linear search is O(1).
Linear search on an empty array is O(1).
Linear search on a nonempty array compares the first element (O(1)) and then searches the rest of the array (O(1)). O(1) + O(1) = O(1).
The problem here is that, for the induction to work, there must be one big-O constant that works both for the hypothesis and the conclusion. That's impossible here and impossible for your proof.
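To see concretely why no single constant can work, suppose the inductive hypothesis is T(n/2) <= c * (n/2) for some fixed c, and the merge ("zipping up") step costs d * n. Then the induction step gives

T(n) = 2 * T(n/2) + d * n <= c * n + d * n = (c + d) * n

so the constant has grown from c to c + d. After log n levels of this, the "constant" is c + d * log n, which is not a constant at all, and the O(n) claim collapses.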
The "proof" only covers a single pass, it doesn't cover the log n number of passes.
The recurrence only shows the cost of a pass as compared to the cost of the previous pass. To be correct, the recurrence relation should have the cumulative cost rather than the incremental cost.
You can see where the proof falls down by viewing the sample merge sort at http://en.wikipedia.org/wiki/Merge_sort
Here is the crux: all induction steps which refer to particular values of n must refer to a particular function T(n), not to O() notation!
O(M(n)) notation is a statement about the behavior of the whole function from problem size to performance guarantee (asymptotically, as n increases without limit). The goal of your induction is to determine a performance bound T(n), which can then be simplified (by dropping constant and lower-order factors) to O(M(n)).
In particular, one problem with your proof is that you can't get from your statement purely about O() back to a statement about T(n) for a given n. O() notation allows you to ignore a constant factor for an entire function; it doesn't allow you to ignore a constant factor over and over again while constructing the same function recursively...
You can still use O() notation to simplify your proof, by demonstrating:
T(n) = F(n) + O(something less significant than F(n))
and propagating this predicate in the usual inductive way. But you need to preserve the constant factor of F(): this constant factor has direct bearing on the solution of your divide-and-conquer recurrence!
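For contrast, here is the step where the induction does go through for the true bound (a sketch of the standard substitution argument, assuming a merge cost of at most d * n): take T(n) <= c * n * log n as the hypothesis. Then

T(n) = 2 * T(n/2) + d * n <= 2 * c * (n/2) * log(n/2) + d * n = c * n * log n - c * n + d * n

which is at most c * n * log n whenever c >= d, so one fixed constant c works at every level of the recursion.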
