Master theorem solution for the case d=log_b(a) - algorithm

From Dasgupta's Algorithms: if the running time of a divide and conquer algorithm is described by the recurrence T(n)=aT(n/b)+O(n^d), then its solution is:
T(n)=O(n^d) if d>log_b(a)
T(n)=O(n^log_b(a)) if d<log_b(a)
T(n)=O(n^d*log_2(n)) if d=log_b(a)
where each subproblem's size shrinks by a factor of b at each recursive call, a is the branching factor, and O((n/b^k)^d) is the time for dividing and combining each subproblem at level k.
Cases 1 and 2 are straightforward - they come from the geometric series formed when summing the work done at each level of the recursion tree, which at level k is a^k*O((n/b^k)^d)=O(n^d)*(a/b^d)^k.
Where does the log_2(n) come from in case 3? When d=log_b(a), the ratio a/b^d equals 1, hence the sum of the series is n^d*log_b(n), not n^d*log_2(n).

As a simpler example, first note that O(log n), O(log_137 n), and O(log_16 n) mean the same thing. The reason for this is that, by the change of base formula for logarithms, for any fixed constant m we have
log_m n = log n / log m = (1 / log m) · log n = O(log n).
The Master Theorem assumes that a, b, and d are constants. From the change of base formula for logarithms, we have that
log_b n = log n / log b = (1 / log b) · log n.
In that sense, O(n^d log_b n) = O(n^d log n), since b is a constant here.
As a note, it’s unusual to see something written out as O(n^d log_2 n), since the log base here doesn’t matter and just contributes to the (already hidden) constant factor.
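To see concretely where the logarithmic factor in case 3 comes from, here is a minimal Python sketch. The function name `level_work` is made up for illustration, and it assumes n is an exact power of b and treats the O(n^d) term as exactly n^d:

```python
def level_work(n, a, b, d):
    """Work done on each level of the recursion tree for T(n) = a*T(n/b) + n^d.

    Assumes n is an exact power of b. Returns one entry per level k,
    equal to a^k * (n / b^k)^d.
    """
    work = []
    size, k = n, 0
    while size >= 1:
        work.append(a**k * size**d)  # a^k subproblems, each of size n / b^k
        size //= b
        k += 1
    return work


# Case d = log_b(a): a = 4, b = 2, d = 2, so a / b^d = 1.
levels = level_work(1024, a=4, b=2, d=2)
print(len(levels))   # number of levels: log_2(1024) + 1
print(set(levels))   # every level does the same n^d work
print(sum(levels))   # n^d * (log_b(n) + 1)
```

Every level contributes the same n^d, so the total is n^d times the number of levels, which is log_b(n) + 1. Changing the base of that logarithm only rescales the level count by the constant 1 / log_2(b), which is why the bound can be written with any fixed base.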

Related

O(n log n) vs O(m) for algorithm

I am finding an algorithm for a problem where I have two sets A and B of points with n and m points. I have two algorithms for the sets with complexity O(n log n) and O(m) and I am now wondering whether the complexity for the both algorithms combined is O(n log n) or O(m).
Basically, I am wondering whether there is some relation between m and n which would result in O(m).
If m and n are truly independent of one another and neither quantity influences the other, then the runtime of running an O(n log n)-time algorithm and then an O(m)-time algorithm will be O(n log n + m). Neither term dominates the other - if n gets huge compared to m then the n log n part dominates, and if m is huge relative to n then the m term dominates.
This gets more complicated if you know how m and n relate to one another in some way. Many graph algorithms, for example, use m to denote the number of edges and n to denote the number of nodes. In those cases, you can sometimes simplify these expressions, but sometimes cannot. For example, the cost of implementing Dijkstra’s algorithm with a Fibonacci heap is O(m + n log n), the same as what we have above.
Size of your input is x: = m + n.
Complexity of a combined (if both are performed at most a constant number of times in the combined algorithm) algorithm is:
O(n log n) + O(m) = O(x log x) + O(x) = O(x log x).
Yes if m ~ n^n, then O(logm) = O(nlogn).
There is a log formula:
log(b^c) = c*log(b)
EDIT:
For both the algos combined the Big O is always the one that is larger because we are concerned about the asymptotic upper bound.
So it will depend on the values of n and m. Eg: While n^n < m, the complexity is O(log(m)), after that it becomes O(nlog(n)).
For Big-O notation we are only concerned about the larger values, so if n^n >>>> m then it is O(nlog(n)), else if m >>>> n^n then it is O(logm)

Dijkstra's: Where is the equation from? m < n^2/log n

In this passage from my textbook:
where are the inequalities from? (The ones that I've marked with red rectangles.) I feel that they describe a relationship between vertices and edges in a graph, but I don't understand it.
You have two implementations of Dijkstra’s algorithm to choose from. One runs in time O((m + n) log n) = O(m log n), assuming the graph is connected. The other runs in time O(n^2). The question is where the crossover point is between these two runtimes. Equating and simplifying gives that
m log n = n^2
m = n^2 / log n
So if m is asymptotically smaller than n^2 / log n, you’d prefer the heap implementation, and if m is asymptotically bigger than n^2 / log n you’d prefer the unsorted sequence approach.
(Note that, with a Fibonacci heap, the runtime of Dijkstra’s algorithm is O(m + n log n), which is never asymptotically worse than O(n^2).)
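As a rough illustration of that crossover, the following Python sketch compares the two cost estimates directly. The function name `cheaper_dijkstra` is hypothetical, and the comparison drops all constant factors, so it only shows the asymptotic trend and not where a real implementation would switch:

```python
import math


def cheaper_dijkstra(n, m):
    """Compare the estimates m*log2(n) and n^2, ignoring constant factors."""
    heap_cost = m * math.log2(n)  # binary-heap estimate, O(m log n)
    array_cost = n * n            # unsorted-sequence estimate, O(n^2)
    return "heap" if heap_cost < array_cost else "array"


n = 1024
print(cheaper_dijkstra(n, m=4 * n))             # sparse graph
print(cheaper_dijkstra(n, m=n * (n - 1) // 2))  # dense (complete) graph
```

For the sparse graph m is far below n^2 / log n and the heap estimate wins; for the complete graph m is about n^2 / 2, which is above n^2 / log n, so the unsorted sequence estimate wins.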

Find 3 elements in each of 3 arrays that sum to a given value

Let A , B, C be 3 arrays of n elements each. Find an algorithm for determining whether there exist an a in A, b in B, c in C such that a+b+c = k.
I have tried the following algorithm, but it takes O(n²):
Sort all 3 arrays. - O(n log n)
Temporary array h = k - (a+b) - O(n)
For every h, find c' in B such that c' = h - B[i] - O(n)
Search c' in C using binary search - O(log n)
Total is = O(n log n) + O(n) + O(n² log n)
Can we solve it in O(n log n)?
Your question asks about solving the problem 3SUMx1, in linearithmic time, which is shown to reduce to 3SUMx3 in randomized linear time. See here for the reduction.
Unless you're about to publish something very big, I doubt that there can be such a fast algorithm for your problem, which is at least as hard as 3SUM (you can also show the reduction in the opposite direction with some work, too).
Edit: To make the above paragraph clear, the linear-time reduction from 3SUM proves that OP's problem is $\Omega(n^{1.5})$.
This is just a variation of the 3SUM problem. You cannot solve it in O(n log n).
It can be solved in O(n^2). The algorithm you described is wrong - it does not consider combinations of the various indexes from A and B... see https://en.wikipedia.org/wiki/3SUM
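For reference, one way to get the O(n^2) bound mentioned above is sketched below in Python. This is only an illustration: the function name `has_triple_sum` is made up, and the use of a hash set for C (giving O(n^2) expected time) is a choice made here, not something taken from the answers. Sorting plus a two-pointer scan would be another option.

```python
def has_triple_sum(A, B, C, k):
    """Return True if some a in A, b in B, c in C satisfy a + b + c == k.

    Tries every (a, b) pair and does one hash-set lookup for the
    matching c, so it runs in O(n^2) expected time.
    """
    c_values = set(C)
    for a in A:
        for b in B:
            if k - a - b in c_values:
                return True
    return False


print(has_triple_sum([1, 2], [3, 4], [5, 6], 10))   # 1 + 3 + 6 == 10
print(has_triple_sum([1, 2], [3, 4], [5, 6], 100))  # no such triple
```

Unlike the attempt in the question, this considers every combination of an element from A with an element from B.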

Prove 3-Way Quicksort Big-O Bound

For 3-way Quicksort (dual-pivot quicksort), how would I go about finding the Big-O bound? Could anyone show me how to derive it?
There's a subtle difference between finding the complexity of an algorithm and proving it.
To find the complexity of this algorithm, you can do as amit said in the other answer: you know that on average, you split your problem of size n into three smaller problems of size n/3, so you will get, in log_3(n) steps on average, to problems of size 1. With experience, you will start getting the feeling of this approach and be able to deduce the complexity of algorithms just by thinking about them in terms of the subproblems involved.
To prove that this algorithm runs in O(nlogn) in the average case, you use the Master Theorem. To use it, you have to write the recursion formula giving the time spent sorting your array. As we said, sorting an array of size n can be decomposed into sorting three arrays of sizes n/3 plus the time spent building them. This can be written as follows:
T(n) = 3T(n/3) + f(n)
Where T(n) is a function giving the resolution "time" for an input of size n (actually the number of elementary operations needed), and f(n) gives the "time" needed to split the problem into subproblems.
For 3-Way quicksort, f(n) = c*n because you go through the array, check where to place each item and eventually make a swap. This places us in Case 2 of the Master Theorem, which states that if f(n) = O(n^(log_b(a)) log^k(n)) for some k >= 0 (in our case k = 0) then
T(n) = O(n^(log_b(a)) log^(k+1)(n))
As a = 3 and b = 3 (we get these from the recurrence relation, T(n) = aT(n/b)), this simplifies to
T(n) = O(n log n)
And that's a proof.
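If you want to sanity-check the result numerically, here is a small Python sketch that evaluates the recurrence exactly. It assumes f(n) = n (that is, c = 1), T(1) = 1, and n a power of 3; those choices are only for illustration:

```python
def T(n):
    """Evaluate T(n) = 3*T(n/3) + n exactly, with T(1) = 1 and n a power of 3."""
    if n == 1:
        return 1
    return 3 * T(n // 3) + n


for k in range(1, 6):
    n = 3**k
    # Under these assumptions the closed form is n * (log_3(n) + 1).
    print(n, T(n), n * (k + 1))
```

The two printed values agree for every n, matching the n log n growth the Master Theorem predicts.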
Well, the same proof actually holds.
Each iteration splits the array into 3 sublists, on average the size of these sublists is n/3 each.
Thus - number of iterations needed is log_3(n) because you need to find number of times you do (((n/3) /3) /3) ... until you get to one. This gives you the formula:
n/(3^i) = 1
Which is satisfied for i = log_3(n).
Each iteration is still going over all the input (but in a different sublist) - same as quicksort, which gives you O(n*log_3(n)).
Since log_3(n) = log(n)/log(3) = log(n) * CONSTANT, you get that the run time is O(nlogn) on average.
Note, even if you take a more pessimistic approach to calculate the size of the sublists, by taking minimum of uniform distribution - it will still get you first sublist of size 1/4, 2nd sublist of size 1/2, and last sublist of size 1/4 (minimum and maximum of uniform distribution), which will again decay to log_k(n) iterations (with a different k>2) - which will yield O(nlogn) overall - again.
Formally, the proof will be something like:
Each iteration takes at most c_1 * n ops to run, for each n>N_1, for some constants c_1,N_1. (Definition of big O notation, and the claim that each iteration is O(n) excluding recursion. Convince yourself why this is true. Note that here "iteration" means all the work done by the algorithm at a certain "level", and not a single recursive invocation).
As seen above, you have log_3(n) = log(n)/log(3) iterations on average case (taking the optimistic version here, same principles for pessimistic can be used)
Now, we get that the running time T(n) of the algorithm is:
for each n > N_1:
T(n) <= c_1 * n * log(n)/log(3)
T(n) <= c_1 * nlogn
By definition of big O notation, it means T(n) is in O(nlogn) with M = c_1 and N = N_1.
QED

Meaning of lg * N in Algorithmic Analysis

I'm currently reading about algorithmic analysis and I read that a certain algorithm (weighted quick union with path compression) is of order N + M lg * N. Apparently though this is linear because lg * N is a constant in this universe. What mathematical operation is being referred to here. I am unfamiliar with the notation lg * N.
The answers given here so far are wrong. lg* n (read "log star") is the iterated logarithm. It is defined recursively as
lg* n = 0 if n <= 1
lg* n = 1 + lg*(lg n) if n > 1
Another way to think of it is the number of times that you have to iterate logarithm before the result is less than or equal to 1.
It grows extremely slowly. You can read more on Wikipedia which includes some examples of algorithms for which lg* n pops up in the analysis.
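A direct translation of that definition into Python might look like the sketch below. The function name `lg_star` is just a label chosen here, and it uses floating-point `math.log2`, so treat it as an illustration rather than a careful implementation:

```python
import math


def lg_star(n):
    """Iterated logarithm: how many times log2 must be applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


for n in (1, 2, 4, 16, 65536):
    print(n, lg_star(n))
```

Going from 16 to 65536 only raises the result by one, which gives a feel for how slowly the function grows.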
I'm assuming you're talking about the algorithm analyzed on slide 44 of this lecture:
http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/union-find.pdf
Where they say "lg * N is a constant in this universe" I believe they aren't being entirely literal.
lg*N does appear to increase with N as per their table on the right side of the slide; it just happens to grow at such a slow rate that it can't be considered much else (N = 2^65536 -> log*n = 5). As such it seems they're saying that you can just ignore the log*N as a constant because it will never increase enough to cause a problem.
I could be wrong, though. That's simply how I read it.
edit: it might help to note that for this equation they're defining "lg*N" to be 2^(lg*(N-1)). Meaning that an N value of 2^(2^(65536)) [a far larger number] would give lg*N = 6, for example.
The recursive definition of lg*n by Jason is equivalent to
lg*n = m when 2 ↑↑ m <= n < 2 ↑↑ (m+1)
where
2 ↑↑ m = 2^2^...^2 (repeated exponentiation, m copies of 2)
is Knuth's double up arrow notation. Thus
lg*2= 1, lg*2^2= 2, lg*2^{2^2}= 3, lg*2^{2^{2^2}} = 4, lg*2^{2^{2^{2^2}}} = 5.
Hence lg*n=4 for 2^{16} <= n < 2^{65536}.
The function lg*n approaches infinity extremely slowly.
(Faster than an inverse of the Ackermann function A(n,n) which involves n-2 up arrows.)
Stephen
lg is "LOG" or inverse exponential. lg typically refers to base 2, but for algorithmic analysis, the base usually doesn't matter.
lg n refers to log base 2 of n. It is the answer to the equation 2^x = n. In Big O complexity analysis, the base of the log is irrelevant. Powers of 2 crop up in CS, so it is no surprise that if we have to choose a base, it will be base 2.
A good example of where it crops up is a full binary tree of height h, which has 2^h-1 nodes. If we let n be the number of nodes, this relationship says the tree has height about lg n. An algorithm walking down this tree takes at most about lg n steps to see if a value is stored in the tree.
As to be expected, wiki has great additional info.
Logarithm is denoted by log or lg. In your case I guess the correct interpretation is N + M * log(N).
EDIT: The base of the logarithm does not matter when doing asymptotic complexity analysis.
