Disjoint sets with forest implementation without path compression - algorithm

Consider a forest implementation of disjoint sets with only the weighted union heuristic (NO PATH COMPRESSION!) and n distinct elements. Define T(n,m) to be the worst-case time complexity of executing a sequence of n-1 unions and m finds in any order, where m is any positive integer greater than n.
I took the worst-case sequence to be the n-1 unions first and the m finds AFTERWARDS, because doing the finds on the biggest possible tree should take the longest. Accordingly, T(n,m) = m*log(n) + n - 1: each union takes O(1), so the n-1 unions cost n-1 steps, and each find takes log(n) steps, since the height of the tree built by n-1 weighted unions is bounded by log_2(n).
My problem now is, does the T(n,m) chosen look fine?
Secondly, is T(n,m) Big Omega of m*log(n)? My claim is that it is, with c = 1 and n >= 2, given that the smallest possible T(n,m) is m*log(2) + 1, which is obviously greater than m*log(2). It seems almost too easy, though, so I have my suspicions about whether this is correct.
Thanks in advance.

Yes to T(n, m) looking fine, though I suppose you could give a formal induction proof that the worst-case is unions followed by finds.
As for proving that T(n, m) is Ω(m log(n)), you need to show that there exist n0 and m0 and c such that for all n ≥ n0 and all m ≥ m0, it holds that T(n, m) ≥ c m log(n). What you've written arguably shows this only for n = 2.
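For reference, here is a minimal Python sketch of the structure being discussed (union by size, no path compression); the class and method names are mine, not from the question:

    # Disjoint-set forest with weighted union (by size) and NO path compression.
    # After any sequence of such unions, every tree has height at most log2(n),
    # so each find costs O(log n); linking two roots costs O(1).

    class DisjointSets:
        def __init__(self, n):
            self.parent = list(range(n))  # every element starts as its own root
            self.size = [1] * n           # number of nodes in each root's tree

        def find(self, x):
            # Walk up to the root; the path is left untouched (no compression).
            while self.parent[x] != x:
                x = self.parent[x]
            return x

        def union(self, x, y):
            # As written this union calls find, so it costs O(log n); the O(1)
            # cost in the question counts only the final link of the two roots.
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.size[rx] < self.size[ry]:   # hang the smaller tree...
                rx, ry = ry, rx
            self.parent[ry] = rx                # ...under the larger root
            self.size[rx] += self.size[ry]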

Related

Master theorem solution for the case d=log_b(a)

From Dasgupta's Algorithms: if the running time of a divide and conquer algorithm is described by the recurrence T(n)=aT(n/b)+O(n^d), then its solution is:
T(n)=O(n^d) if d>log_b(a)
T(n)=O(n^log_b(a)) if d<log_b(a)
T(n)=O(n^d*log_2(n)) if d=log_b(a)
where each subproblem's size shrinks by a factor of b at the next level of recursion, a is the branching factor, and O((n/b^k)^d) is the time for dividing and combining each subproblem at level k.
Cases 1 and 2 are straightforward - they are taken from the geometric series formed when summing the work done at each level of the recursion tree, which is a^k*O((n/b^k)^d) = O(n^d)*(a/b^d)^k.
Where does the log_2(n) come from in case 3? When d=log_b(a), the ratio a/b^d equals 1, hence the sum of the series is n^d*log_b(n), not n^d*log_2(n).
As a simpler example, first note that O(log n), O(log_137 n), and O(log_16 n) all mean the same thing. The reason for this is that, by the change of basis formula for logarithms, for any fixed constant m we have
log_m n = log n / log m = (1 / log m) · log n = O(log n).
The Master Theorem assumes that a, b, and d are constants. From the change of basis formula for logarithms, we have that log_b n = (1 / log b) · log n = O(log n). In that sense, O(n^d log_b n) = O(n^d log n), since b is a constant here.
As a note, it's unusual to see something written out as O(n^d log_2 n), since the log base here doesn't matter and just contributes to the (already hidden) constant factor.
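To see numerically where that log factor comes from, you can sum the work per level of the recursion tree; a small Python sketch (the helper name level_work is mine), assuming the standard recursion-tree accounting:

    import math

    # Work per level of the recursion tree for T(n) = a*T(n/b) + n^d:
    # level k has a^k subproblems of size n/b^k, costing a^k * (n/b^k)^d
    # = n^d * (a/b^d)^k in total.  When d = log_b(a), the ratio a/b^d is 1,
    # so every level contributes the same n^d and there are ~log_b(n) levels.

    def level_work(n, a, b, d):
        works, size, k = [], n, 0
        while size >= 1:
            works.append(a**k * size**d)
            size /= b
            k += 1
        return works

    n, a, b, d = 1024, 2, 2, 1                 # here d == log_b(a)
    work = level_work(n, a, b, d)
    print(work[:3])                            # every level does n^d = 1024 units of work
    print(sum(work))                           # 11264.0
    print(n**d * (math.log(n, b) + 1))         # ~11264: n^d * (log_b(n) + 1)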

Show 2^n is O(n!)

I am struggling to understand why they are equal. Help would be appreciated.
I have tried reasoning that 2^n means doubling n times, but I am not sure how that relates to a factorial.
To prove that 2^n is O(n!), you need to show that 2^n ≤ M·n! for some constant M and all values of n ≥ C, where C is also some constant.
So let's choose M = 2 and C = 1.
For n = C, we see that 2^n = 2 and M·n! = 2, so indeed in that base case 2^n ≤ M·n! is true.
Assuming it holds true for some n (≥ C), does it also hold for n+1? Yes, because if 2^n ≤ M·n! then also 2^(n+1) ≤ M·(n+1)!
The left side gets multiplied by 2, while the right side gets multiplied by n+1, which is at least 2.
So this proves by induction that 2^n ≤ M·n! for all n ≥ C, for the chosen values of M and C. As a consequence, 2^n is O(n!).
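Not a substitute for the induction, but here is a quick Python sanity check of the claimed constants M = 2 and C = 1:

    import math

    # Sanity check of the induction: 2^n <= 2 * n! for n = 1..20.
    for n in range(1, 21):
        assert 2**n <= 2 * math.factorial(n), n
    print("2^n <= 2*n! holds for n = 1..20")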
2^n and n! are not "equal". In formal mathematics, there is an important distinction that is often overlooked when people say "function a is O of b". It just means that asymptotically, b is an upper bound of a. This means that, technically, n is O(n!), 1 is O(n!), etc. These are trivial examples. Likewise, n! is not O(2^n).
Informally, especially in computer science, the big O notation often can be used somewhat differently to describe an asymptotic tight bound where using big Theta Θ notation might be more factually appropriate in a given context. (Wikipedia)

Big O notation for the complexity function of the fourth root of n

I am expected to find the Big O notation for the following complexity function: f(n) = n^(1/4).
I have come up with a few possible answers.
The most accurate answer would seem to be O(n^(1/4)). However, since it contains a root it isn't a polynomial, and I've never seen a fractional power of n like this in any textbook or online resource.
Using the mathematical definition, I can try to define an upper-bound function with a specified n limit. I tried plotting n^(1/4) in red with log2 n in blue and n in green.
The log2 n curve intersects with n^(1/4) at n=2.361 while n intersects with n^(1/4) at n=1.
Given the formal mathematical definition, we can come up with two additional Big O notations with different limits.
The following shows that O(n) works for n > 1.
f(n) is O(g(n))
Find c and n0 so that
n^(1/4) ≤ cn
where c > 0 and n ≥ n0
c = 1 and n0 = 1
f(n) is O(n) for n > 1
This one shows that O(log2 n) works for n > 3.
f(n) is O(g(n))
Find c and n0 so that
n^(1/4) ≤ c·log2 n
where c > 0 and n ≥ n0
c = 1 and n0 = 3
f(n) is O(log2 n) for n > 3
Which Big O description of the complexity function would be typically used? Are all 3 "correct"? Is it up to interpretation?
Using O(n^(1/4)) is perfectly fine for big O notation; fractional exponents do show up in the running times of real algorithms.
O(n) is also correct (because big O gives only an upper bound), but it is not tight: n^(1/4) is in O(n), but not in Theta(n).
n^(1/4) is NOT in O(log(n)) (a proof sketch follows).
For any value r > 0 and for large enough values of n, log(n) < n^r.
Proof:
Consider log(log(n)) and r*log(n). The first is clearly smaller than the second for large enough values. In big O terminology, we can definitely say that r*log(n) is NOT in O(log(log(n))), while log(log(n)) is (1), so we can say that:
log(log(n)) < r*log(n) = log(n^r) for large enough values of n
Now exponentiate each side with base e. Note that both the left-hand and right-hand values are positive for large enough n:
e^log(log(n)) < e^log(n^r)
log(n) < n^r
Moreover, in a similar way, we can show that for any constant c and for large enough values of n:
c*log(n) < n^r
So by definition n^r is NOT in O(log(n)); in your specific case, n^0.25 is NOT in O(log(n)).
Footnotes:
(1) If you are still unsure, introduce a new variable m = log(n); is it clear that r*m is not in O(log(m))? Proving it is easy, if you want an exercise.
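As an illustration of the answer above (not part of the proof), you can watch the crossover happen numerically; the constant c = 100 here is an arbitrary choice:

    import math

    # Illustration (not a proof): even with a generous constant, c*log2(n)
    # eventually falls below n^(1/4), so n^(1/4) cannot be O(log n).
    c = 100
    for n in [10, 10**6, 10**12, 10**18, 10**24]:
        print(n, n**0.25, c * math.log2(n))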

Are all algorithms with a constant base to the n (e.g.: k^n) of the same time complexity?

In this example: http://www.wolframalpha.com/input/?i=2%5E%281000000000%2F2%29+%3C+%283%2F2%29%5E1000000000
I noticed that those two quantities stay pretty similar no matter how high you go in n. Do all algorithms with a constant raised to the n fall into the same time complexity category, such as 2^n, 3^n, 4^n, etc.?
They are in the same category in the sense that all of them are exponential-time algorithms, but this does not mean their complexity is the same. Obviously 2^n < 4^n.
We can see 4^n / 2^n = 2^(2n) / 2^n = 2^n.
This means the 4^n algorithm is exponentially slower (by a factor of 2^n) than the 2^n one.
The same thing happens with 3^n versus 2^n, where the ratio is 1.5^n.
But this does not mean 2^n is something far less than 4^n in practice; it is still exponential and will not be feasible when n > 50.
Note this happens because n is in the exponent, not the base. If the constant appeared only as a factor, as in 4·n^k vs n^k, the two algorithms would be asymptotically the same; they would differ only by a constant factor, just like O(n) vs c·O(n).
The time complexities O(a^n) and O(b^n) are not the same if 1 < a < b. As a quick proof, we can use the formal definition of big-O notation to show that b^n ≠ O(a^n).
This works by contradiction. Suppose that b^n = O(a^n) and that 1 < a < b. Then there must be some c and n0 such that for any n ≥ n0, we have that b^n ≤ c · a^n. This means that b^n / a^n ≤ c for any n ≥ n0. Since b > a, it should start to become clear that this is impossible: as n grows larger, b^n / a^n = (b / a)^n will get larger and larger. In particular, if we pick any n ≥ n0 such that n > log_(b/a)(c), then we will have that
(b / a)^n > (b / a)^(log_(b/a)(c)) = c
So, for such an n it is not true that b^n ≤ c · a^n, contradicting our assumption that b^n = O(a^n).
This means, in particular, that O(2^n) ≠ O(1.5^n) and that O(3^n) ≠ O(2^n). This is why, when using big-O notation, it's still necessary to specify the base of any exponents that end up getting used.
One more thing to notice: although it looks like 2^(1000000000/2) ≈ 1.4^1000000000 and 1.5^1000000000 should be pretty similar, these are totally different numbers. The first is of the form 10^(10^8.1-ish) and the second of the form 10^(10^8.2-ish). That might not seem like a big difference, but it's absolutely colossal. Take, for example, 10^(10^1) and 10^(10^2). The first number is 10^10, ten billion, a one followed by ten zeros. The second is 10^100, one googol, a one followed by a hundred zeros. There's a huge difference between them - the first is close to the world population, while the second is far more than the number of atoms in the observable universe!
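If it helps, here is a tiny Python illustration of the contradiction above: the ratio (b/a)^n eventually exceeds any fixed constant c (the values of a, b, and c are arbitrary picks of mine):

    # Illustration of the contradiction: the ratio (b/a)^n grows past any fixed c,
    # so b^n <= c * a^n cannot hold for all large n when b > a > 1.
    a, b, c = 1.5, 2.0, 10**6
    n = 1
    while (b / a) ** n <= c:
        n += 1
    print("(b/a)^n first exceeds c =", c, "at n =", n)   # around n = 49 here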
Hope this helps!

Meaning of lg * N in Algorithmic Analysis

I'm currently reading about algorithmic analysis and I read that a certain algorithm (weighted quick union with path compression) is of order N + M lg* N. Apparently this is linear because lg* N is a constant in this universe. What mathematical operation is being referred to here? I am unfamiliar with the notation lg* N.
The answers given here so far are wrong. lg* n (read "log star") is the iterated logarithm. It is defined recursively as
lg* n = 0               if n <= 1
lg* n = 1 + lg*(lg n)   if n > 1
Another way to think of it is as the number of times you have to iterate the logarithm before the result is less than or equal to 1.
It grows extremely slowly. You can read more on Wikipedia which includes some examples of algorithms for which lg* n pops up in the analysis.
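A direct Python translation of that recursive definition, counting how many times log2 must be applied before the value drops to 1 or below:

    import math

    # Iterated logarithm: the number of times log2 must be applied before the
    # value drops to 1 or below; this matches the recursive definition above.
    def log_star(n):
        count = 0
        while n > 1:
            n = math.log2(n)
            count += 1
        return count

    print(log_star(2))         # 1
    print(log_star(16))        # 3   (16 -> 4 -> 2 -> 1)
    print(log_star(65536))     # 4   (65536 -> 16 -> 4 -> 2 -> 1)
    print(log_star(2**65536))  # 5   (huge input, still a tiny answer)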
I'm assuming you're talking about the algorithm analyzed on slide 44 of this lecture:
http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/union-find.pdf
Where they say "lg * N is a constant in this universe" I believe they aren't being entirely literal.
lg*N does appear to increase with N, as per their table on the right side of the slide; it just happens to grow at such a slow rate that it can't be considered much else (N = 2^65536 gives lg*N = 5). As such, it seems they're saying that you can just ignore lg*N as a constant because it will never increase enough to cause a problem.
I could be wrong, though. That's simply how I read it.
edit: it might help to note that for this equation they're defining "lg*N" to be 2^(lg*(N-1)). Meaning that an N value of 2^(2^(65536)) [a far larger number] would give lg*N = 6, for example.
The recursive definition of lg*n by Jason is equivalent to
lg* n = m   when   2↑↑m <= n < 2↑↑(m+1)
where
2↑↑m = 2^2^...^2 (repeated exponentiation, m copies of 2)
is Knuth's double up-arrow notation. Thus
lg* 2 = 1, lg* 2^2 = 2, lg* 2^{2^2} = 3, lg* 2^{2^{2^2}} = 4, lg* 2^{2^{2^{2^2}}} = 5.
Hence lg*n=4 for 2^{16} <= n < 2^{65536}.
The function lg*n approaches infinity extremely slowly.
(Though still faster than the inverse of the Ackermann function A(n,n), which involves n-2 up arrows.)
Stephen
lg is "log", the inverse of exponentiation. lg typically refers to base 2, but for algorithmic analysis the base usually doesn't matter.
lg n refers to the base-2 logarithm of n: it is the answer to the equation 2^x = n. In Big O complexity analysis the base of the log is irrelevant, but powers of 2 crop up so often in CS that if we have to choose a base, it will be base 2.
A good example of where it crops up is a full binary tree of height h, which has 2^h - 1 nodes. If we let n be the number of nodes, this relationship says the tree has height about lg n. An algorithm traversing this tree takes at most about lg n steps to see if a value is stored in it.
As is to be expected, Wikipedia has great additional info.
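To make the binary tree example above concrete, here is a tiny check (mine, not from the answer), using the convention that a single node has height 1:

    import math

    # A full binary tree of height h has 2^h - 1 nodes, so with n nodes the
    # height is about lg(n); a search walks one root-to-leaf path of ~lg(n) steps.
    for h in range(1, 6):
        n = 2**h - 1
        print(h, n, math.ceil(math.log2(n + 1)))   # height recovered as lg(n + 1)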
Logarithm is denoted by log or lg. In your case I guess the correct interpretation is N + M * log(N).
EDIT: The base of the logarithm does not matter when doing asymptotic complexity analysis.
