Trying to figure out why complexity is O(n) for this code:
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
Node is:
class Node {
    int value;
    Node left;
    Node right;
}
This is from the CCI book. Shouldn't it be O(2^n), since it iterates through each node?
Yet this one is O(2^n), and it is clear to me why:
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
Thanks for any help.
An algorithm is said to take linear time, or O(n) time, if its time
complexity is O(n). Informally, this means that for large enough input
sizes the running time increases linearly with the size of the input.
For example, a procedure that adds up all elements of a list requires
time proportional to the length of the list.
From Wikipedia
It is very reasonable that the algorithm's complexity is O(n): the number of recursive calls is proportional to the number of items in the tree. There are n items in the tree and we visit each item only once, which is a linear relation.
In contrast, the other algorithm is very similar to the recursive Fibonacci sequence algorithm: in that algorithm we visit each number from 1 to n many more times than once, and not in linear proportion to n either, which explains why it has O(2^n) complexity.
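To make the contrast concrete, here is a small instrumented sketch (my own illustration, not from the book; the counter fields are made up) that counts invocations of both functions. For a tree of n nodes, sum() is entered 2n + 1 times (once per node plus once per null child), while f(n) is entered 2^n - 1 times:
class CallCounting {
    static int sumCalls = 0, fCalls = 0;

    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    static int sum(Node node) {
        sumCalls++; // one increment per invocation, including null children
        if (node == null) {
            return 0;
        }
        return sum(node.left) + node.value + sum(node.right);
    }

    static int f(int n) {
        fCalls++; // 2^n - 1 increments for input n
        if (n <= 1) {
            return 1;
        }
        return f(n - 1) + f(n - 1);
    }

    public static void main(String[] args) {
        Node root = new Node(1);   // a tree with n = 3 nodes
        root.left = new Node(2);
        root.right = new Node(3);
        sum(root);
        f(3);
        // prints "sum calls: 7, f calls: 7" -- but raise n and the first
        // counter grows linearly while the second doubles with each step
        System.out.println("sum calls: " + sumCalls + ", f calls: " + fCalls);
    }
}
Growing the tree to 100 nodes only brings the first counter to 2*100 + 1 = 201, while f(100) would take about 2^100 calls: it is the shape of the recursion, not the superficial similarity of the code, that determines the complexity.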
Related
I'm trying to understand the time and space complexity of an algorithm for generating an array's permutations. Given a partially built permutation where k out of n elements are already selected, the algorithm selects element k+1 from the remaining n-k elements and calls itself to select the remaining n-k-1 elements:
public static List<List<Integer>> permutations(List<Integer> A) {
    List<List<Integer>> result = new ArrayList<>();
    permutations(A, 0, result);
    return result;
}

public static void permutations(List<Integer> A, int start, List<List<Integer>> result) {
    if (A.size() - 1 == start) {
        result.add(new ArrayList<>(A));
        return;
    }
    for (int i = start; i < A.size(); i++) {
        Collections.swap(A, start, i);
        permutations(A, start + 1, result);
        Collections.swap(A, start, i);
    }
}
My thoughts are that in each call we swap the collection's elements 2n times, where n is the number of elements to permute, and make n recursive calls. So the running time seems to fit the recurrence relation

T(n) = nT(n-1) + n
     = n[(n-1)T(n-2) + (n-1)] + n
     = ...
     = n + n(n-1) + n(n-1)(n-2) + ... + n!
     = n![1/(n-1)! + 1/(n-2)! + ... + 1]
     ≈ e*n!,

hence the time complexity is O(n!) and the space complexity is O(max(n!, n)), where n! is the total number of permutations and n is the height of the recursion tree.
This problem is taken from the Elements of Programming Interviews book, and they're saying that the time complexity is O(n*n!) because "The number of function calls C(n)=1+nC(n-1) ... [which solves to] O(n!) ... [and] ... we do O(n) computation per call outside of the recursive calls".
Which time complexity is correct?
The time complexity of this algorithm, counted by the number of basic operations performed, is Θ(n * n!). Think about the size of the result list when the algorithm terminates: it contains n! permutations, each of length n, and we cannot create a list with n * n! total elements in less than that amount of time. The space complexity is the same, since the recursion stack only ever has O(n) calls at a time, so the size of the output list dominates the space complexity.
If you count only the number of recursive calls to permutations(), the function is called O(n!) times, although this is usually not what is meant by 'time complexity' without further specification. In other words, you can generate all permutations in O(n!) time, as long as you don't read or write those permutations after they are generated.
The part where your derivation of the run-time breaks down is in the definition of T(n). If you define T(n) as 'the run-time of permutations(A, start) when the input, A, has length n', then you cannot define it recursively in terms of T(n-1) or any other function of T(), because the length of the input in every recursive call is n, the length of A.
A more useful way to define T(n) is by specifying it as the run-time of permutations(A', start), when A' is any permutation of a fixed, initial array A, and A.length - start == n. It's easy to write the recurrence relation here:
T(x) = x * T(x-1) + O(x) if x > 1
T(1) = A.length
This takes into account the fact that the last recursive call, T(1), has to perform O(A.length) work to copy that array to the output, and this new recurrence gives the result from the textbook.
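To see the two counts side by side, here is a hypothetical instrumented version (the counter and class names are mine): it tallies the number of calls to permutations() and the total number of integers in the output. The call count stays within a constant factor of n! (about (e-1)*n!), while the output alone holds n * n! integers, which is what makes Θ(n * n!) the right bound for the full algorithm:
import java.util.*;

public class PermutationCount {
    static long calls = 0;

    public static void permutations(List<Integer> A, int start, List<List<Integer>> result) {
        calls++; // counts every invocation; C(n) = 1 + n*C(n-1)
        if (A.size() - 1 == start) {
            result.add(new ArrayList<>(A));
            return;
        }
        for (int i = start; i < A.size(); i++) {
            Collections.swap(A, start, i);
            permutations(A, start + 1, result);
            Collections.swap(A, start, i);
        }
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        List<List<Integer>> result = new ArrayList<>();
        permutations(a, 0, result);
        long outputInts = (long) result.size() * a.size(); // n! * n
        // For n = 8: 69281 calls (about 1.72 * 8!) versus 322560 output integers.
        System.out.println("calls = " + calls + ", output integers = " + outputInts);
    }
}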
The code below, which checks whether one binary tree is identical to another, has linear complexity, i.e. O(n), where n is the number of nodes of the binary tree with the smaller number of nodes.
boolean identical(Node a, Node b)
{
    /* 1. both empty -> true */
    if (a == null && b == null)
        return true;

    /* 2. both non-empty -> compare them */
    if (a != null && b != null)
        return (a.data == b.data
                && identical(a.left, b.left)
                && identical(a.right, b.right));

    /* 3. one empty, one not -> false */
    return false;
}
(Fibonacci series using recursion gives exponential complexity.) The complexity of the code below is O(2^n):
class Fibonacci {
    static int fib(int n)
    {
        if (n <= 1)
            return n;
        return fib(n-1) + fib(n-2);
    }

    public static void main (String args[])
    {
        int n = 9;
        System.out.println(fib(n)); // prints 34
    }
}
My question is: both look similar, but one has linear complexity and the other exponential. Could anyone clarify what distinguishes the two algorithms?
Fibonacci Series
If you build a tree for the recursive code to generate the fibonacci series, it will be like:
                        fib(n)
                 /                \
          fib(n-1)              fib(n-2)
          /      \              /      \
    fib(n-2)  fib(n-3)    fib(n-3)  fib(n-4)
At what level will you encounter fib(1), so that the tree can "stop"?
At the (n-1)-th level you will encounter fib(1), and there the recursion stops.
Since each node has at most two children, levels 0 through n-1 contain at most 1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1 nodes in total, so the number of nodes is of the order of 2^n.
Binary Tree Comparison
Let's consider your binary tree comparison.
Assume both are complete binary trees. According to your algorithm it will visit every node once, and if h is the height of the tree, the number of nodes will be of the order of 2^h. You can state the complexity in that case as O(2^h).
Since such a tree has n = Θ(2^h) nodes, O(n) in this case is equivalent to O(2^h).
The difference originates in a different definition of n. While the naive recursive algorithm for Fibonacci numbers also performs a kind of traversal in a graph, the value of n is not defined by the number of nodes in that graph, but by the input number.
The binary tree comparison, however, has n defined as the number of nodes.
So n has a completely different meaning in these two algorithms, and it explains why the time complexity in terms of n comes out so differently.
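To quantify the difference (a short derivation of mine, not from the answers above): let C(n) be the number of calls made by fib(n). Each call either returns immediately or makes two further calls, so

C(n) = C(n-1) + C(n-2) + 1, with C(0) = C(1) = 1,

which solves to C(n) = 2*fib(n+1) - 1 = Θ(φ^n) with φ ≈ 1.618, i.e. exponential in the input number. For identical(a, b), every call either stops or recurses once per child pointer, so the number of calls is at most a small constant times the number of nodes, i.e. linear in its n.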
void call(int n)
{
    for (int j = 1; j <= n; j++)
    {
        call(n / 2);
    }
}

void main()
{
    int n = 100; /* n was not declared in the original snippet; any input size */
    for (int i = 1; i <= n; i++)
    {
        call(i);
    }
}
What is the time complexity of this code, and is this thought process correct? In main, the loop is O(N). In call, the loop is O(N), and the recursion halves n each time, giving O(log N) with base 2. So the overall time complexity of main is O(N) * [O(N) * O(log N)] = O(N^2 log N)?
You can use a recursion tree to figure out the number of calls: the order of the recursive function equals the number of nodes in its recursion tree. The root is call(n); it has n children, each a call(n/2); each of those has n/2 children, each a call(n/4); and so on (the original answer showed this tree as an image, with the call(n/2) leaves omitted).

Summing the nodes level by level, the number of calls C(n) made by a single call(n) satisfies

C(n) = 1 + n * C(n/2)

For n = 2^k the dominant term when unrolling this recurrence is the product n * (n/2) * (n/4) * ... * 1 = 2^(k(k+1)/2), so

C(n) = Θ(n^((log2 n + 1) / 2)),

which is quasi-polynomial, i.e. it grows faster than any fixed power of n. The order of the main loop is less than n * C(n), so the order of main is O(n^((log2 n + 3) / 2)). In particular, the proposed O(N^2 log N) underestimates the growth: each iteration of the loop in call() is not O(1) work but a full recursive call.
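As a sanity check, here is a small counting sketch (my own, in Java rather than the question's C) that measures C(n) directly and can be compared against the recurrence above:
public class CallCount {
    static long count = 0;

    static void call(int n) {
        count++; // one increment per invocation
        for (int j = 1; j <= n; j++) {
            call(n / 2);
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 64; n *= 2) {
            count = 0;
            call(n);
            // 2, 5, 21, 169, 2705, 86561, 5539905: each value is 1 + n * previous,
            // matching C(n) = 1 + n*C(n/2) and the n^((log2 n + 1)/2) growth.
            System.out.println("n = " + n + ", calls = " + count);
        }
    }
}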
In another question about finding an algorithm to compute the diameter of a binary tree the following code is provided as a possible answer to the problem.
public static int getDiameter(BinaryTreeNode root) {
    if (root == null)
        return 0;

    int rootDiameter = getHeight(root.getLeft()) + getHeight(root.getRight()) + 1;
    int leftDiameter = getDiameter(root.getLeft());
    int rightDiameter = getDiameter(root.getRight());

    return Math.max(rootDiameter, Math.max(leftDiameter, rightDiameter));
}

public static int getHeight(BinaryTreeNode root) {
    if (root == null)
        return 0;

    return Math.max(getHeight(root.getLeft()), getHeight(root.getRight())) + 1;
}
In the comments section it's being said that the time complexity of the above code is O(n^2). At a given call of the getDiameter function, the getHeight and the getDiameter functions are called for the left and right subtrees.
Let's consider the average case of a binary tree. The height can be computed in Θ(n) time (true for the worst case too). So how do we compute the time complexity of the getDiameter function?
My two theories:

1. T(n) = 4T(n/2) + Θ(1) = Θ(n^2), if the height computation is considered the (same?) subproblem.
2. T(n) = 2T(n/2) + n + Θ(1) = Θ(n log n), with n = 2*(n/2) for the height computation?
Thank you for your time and effort!
One point of confusion is that you think the binary tree is balanced. Actually, it can be a line. In this case, we need n operations from the root to the leaf to find the height, n - 1 from the root's child to the leaf and so on. This gives O(n^2) operations to find the height alone for all nodes.
The algorithm could be optimised if the height of each node were calculated independently, before finding the diameter. Then we would spend O(n) time finding all heights, and the complexity of finding the diameter would satisfy a recurrence of the following type:

T(n) = T(a) + T(n - 1 - a) + 1

where a is the size of the left subtree. This relation gives linear time for finding the diameter as well, so the total time would be linear, as in the sketch below.
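Here is one way to realize that idea (a sketch of mine, reusing the BinaryTreeNode interface from the question): height and diameter are computed together in a single post-order traversal, so each node is visited exactly once and the whole computation is O(n):
// Returns {height, diameter} of the subtree rooted at 'root', in one pass.
private static int[] heightAndDiameter(BinaryTreeNode root) {
    if (root == null)
        return new int[] {0, 0};
    int[] left = heightAndDiameter(root.getLeft());
    int[] right = heightAndDiameter(root.getRight());
    int height = Math.max(left[0], right[0]) + 1;
    // The longest path either passes through this node
    // or lies entirely inside one of the two subtrees.
    int diameter = Math.max(left[0] + right[0] + 1,
                            Math.max(left[1], right[1]));
    return new int[] {height, diameter};
}

public static int getDiameterLinear(BinaryTreeNode root) {
    return heightAndDiameter(root)[1];
}
This keeps the same node-counting convention as the original getDiameter: a single node has height 1 and diameter 1.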
How can we remove the median of a set with time complexity O(log n)? Any ideas?
If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary sequence, it will not be possible to identify the median with certainty without examining the majority of the items. If one has examined most, but not all, of the items, that will allow one to guarantee that the median will be within some range [if the list contains duplicates, the upper and lower bounds may match], but examining the majority of the items in a list implies O(n) item retrievals.
If one has the information in a collection which is not fully ordered, but where certain ordering relationships are known, then the time required may require anywhere between O(1) and O(n) item retrievals, depending upon the nature of the known ordering relation.
For unsorted lists, repeatedly do an O(n) partial sort until the element located at the median position is known. This is at least O(n), though.
Is there any information about the elements being sorted?
For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can trivially sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Or, finally, there are cleverer median-selection algorithms that work by partitioning instead of sorting and yield O(n) performance.
But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
Here's a solution in Java, based on TreeSet:
import java.util.SortedSet;
import java.util.TreeSet;

public class SetWithMedian {
    private SortedSet<Integer> s = new TreeSet<Integer>();
    private Integer m = null; // current median; null iff the set is empty

    public boolean contains(int e) {
        return s.contains(e);
    }

    public Integer getMedian() {
        return m;
    }

    public void add(int e) {
        s.add(e);
        updateMedian();
    }

    public void remove(int e) {
        s.remove(e);
        updateMedian();
    }

    private void updateMedian() {
        if (s.size() == 0) {
            m = null;
        } else if (s.size() == 1) {
            m = s.first();
        } else {
            SortedSet<Integer> h = s.headSet(m);     // elements strictly below m
            SortedSet<Integer> t = s.tailSet(m + 1); // elements strictly above m
            int x = 1 - s.size() % 2;
            if (h.size() < t.size() + x)
                m = t.first(); // median shifted right: take the next element up
            else if (h.size() > t.size() + x)
                m = h.last();  // median shifted left: take the next element down
        }
    }
}
Removing the median (i.e. calling remove(getMedian()) on the object) takes O(log n) time.
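For illustration, a hypothetical driver (not part of the original answer) exercising the class:
public static void main(String[] args) {
    SetWithMedian set = new SetWithMedian();
    for (int e : new int[] {5, 1, 9, 3, 7})
        set.add(e);
    System.out.println(set.getMedian()); // 5, the middle of 1 3 5 7 9
    set.remove(set.getMedian());         // the median removal discussed above
    System.out.println(set.getMedian()); // 7, the upper middle of 1 3 7 9
}
Note that for an even number of elements this class picks the upper of the two middle elements as its median, as the invariant below spells out.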
Edit: To help understand the code, here's the invariant condition of the class attributes:
private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m)
            && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}
In human-readable form:

1. If the set "s" is empty, then "m" must be null.
2. If the set "s" is not empty, then it must contain "m".
3. Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x + 1 must be equal to y.
Try a red-black tree. It should work quite well, and with a binary search you get your log(n). It also has remove and insert times of log(n), and rebalancing is done in log(n) as well.
As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm must be executed sequentially, the best you can do is O(n). The deterministic selection algorithm (median of medians), also known as the BFPRT algorithm, solves the problem with a worst case of O(n). You can find more about it here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
However, the median-of-medians algorithm can be made to run faster than O(n) wall-clock time by making it parallel. Due to its divide-and-conquer nature, the algorithm can be "easily" parallelised. For instance, when dividing the input array into groups of 5 elements, you could launch a thread for each sub-array, sort it and find its median within that thread. When this step finishes, the threads are joined and the algorithm is run again with the newly formed array of medians.
Note that such a design would only be beneficial on really large data sets. The additional overhead of spawning and joining threads makes it unfeasible for smaller sets. This page has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html
Note that you can find asymptotically faster algorithms out there, but they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n) and a maximum complexity of n squared like Quicksort. It's still very good.
In practice, the "random" pivot choice might sometimes be a fixed location (without involving a RNG) because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed) or deduced from an approximate or exactly known distribution of input values.
I know one randomized algorithm with a time complexity of O(n) in expectation.
Here is the algorithm:

Input: an array of n numbers A[1..n] (without loss of generality we can assume n is even)
Output: the (n/2)-th element of the sorted array.

Algorithm(A[1..n], k = n/2):
1. Pick a pivot index p uniformly at random from 1..n.
2. Divide the array into two parts:
   L - the elements <= A[p] (including the pivot itself)
   R - the elements > A[p]
3. If k == |L|, the pivot A[p] is the k-th smallest element; stop.
4. If k < |L|, recurse on (L, k); otherwise recurse on (R, k - |L|).

Complexity: O(n) in expectation.
The proof is all mathematical and about one page long; if you are interested, ping me.
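For reference, a compact Java rendering of this quickselect idea (my sketch; the names and partitioning details are mine, not from the post above):
import java.util.concurrent.ThreadLocalRandom;

public class QuickSelect {

    // Returns the k-th smallest element of a (k is 0-based); expected O(n).
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // pick a pivot uniformly at random and move it to the end
            int p = ThreadLocalRandom.current().nextInt(lo, hi + 1);
            swap(a, p, hi);
            int store = lo; // partition: [lo, store) holds elements <= pivot
            for (int i = lo; i < hi; i++) {
                if (a[i] <= a[hi])
                    swap(a, i, store++);
            }
            swap(a, store, hi); // pivot lands at its final sorted index
            if (k == store)
                return a[store];
            if (k < store)
                hi = store - 1; // keep searching in L
            else
                lo = store + 1; // keep searching in R
        }
        return a[lo];
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 8, 7, 6, 5, 4, 3, 2, 1};
        // the median of n = 9 elements sits at sorted index n / 2 = 4
        System.out.println(select(a, a.length / 2)); // prints 5
    }
}
The loop is written iteratively (the tail recursion turned into a while loop), but it performs exactly the recursive partitioning described above.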
To expand on rwong's answer: here is some example code.
// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main () {
    int myints[] = {9,8,7,6,5,4,3,2,1};
    vector<int> myvector (myints, myints+9);
    vector<int>::iterator it;

    partial_sort (myvector.begin(), myvector.begin()+5, myvector.end());

    // print out content:
    cout << "myvector contains:";
    for (it=myvector.begin(); it!=myvector.end(); ++it)
        cout << " " << *it;
    cout << endl;

    return 0;
}
Output:
myvector contains: 1 2 3 4 5 9 8 7 6
The element in the middle of the sorted prefix, here myvector[4] == 5, is the median.