Is this Dynamic Programming Solution to Text Justification just Brute Force? - algorithm

I'm having trouble understanding the dynamic programming solution to the text justification problem as specified in the MIT open courseware lecture here. Some notes from that lecture are here, and page 3 of the notes is what I am referring to.
I thought that Dynamic Programming meant you memoize some of the computations so that you don't need to recompute them, thus saving you time. But in the algorithm given in the lecture, I don't see any use of memoization, just a whole bunch of deep recursive calls; i.e. the main recurrence is this:
DP[i] = min(badness (i, j) + DP[j] for j in range (i + 1, n + 1))
DP[n] = 0
where badness is a function that determines the amount of unused space after subtracting the length of the words from the line length. To me it looks like this algorithm calculates all possible "badness" values and chooses the smallest one, which seems like brute force to me. Where is the advantage Dynamic Programming usually gives us by memoizing past calculations so we don't have to recompute?

If you memoize the results, you don't have to compute each DP[i] several times.
That is, DP[0] "calls" DP[2], for example, but so does DP[1]. The second time DP[2] is called, it won't be necessary to compute it again; you can just return the memoized value.
This also makes it easy to verify a polynomial upper bound for this algorithm. Since each DP[i] will perform O(n) operations, and there are n of them, the overall algorithm is O(n^2), assuming, of course, that badness(i, j) is O(1).
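To make the memoization concrete, here is a minimal Python sketch of the top-down version of that recurrence. The badness function below is a simplified stand-in (cube of the leftover space, infinite when the words don't fit), not necessarily the exact definition from the lecture:

from functools import lru_cache

def justify(words, width):
    n = len(words)

    def badness(i, j):
        # Cost of putting words[i:j] on one line: cube of the leftover space,
        # infinite if the words don't fit (simplified stand-in for the lecture's badness).
        total = sum(len(w) for w in words[i:j]) + (j - i - 1)  # word lengths plus single spaces
        return float('inf') if total > width else (width - total) ** 3

    @lru_cache(maxsize=None)        # memoization: each DP(i) is computed only once
    def DP(i):
        # DP(i) = minimum total badness of justifying the suffix words[i:]
        if i == n:
            return 0
        return min(badness(i, j) + DP(j) for j in range(i + 1, n + 1))

    return DP(0)

# Example: justify(["the", "quick", "brown", "fox"], 10) returns 2 ("the quick" / "brown fox").

With memoization there are only n+1 distinct DP(i) subproblems, each taking the min over O(n) choices of j, which matches the O(n^2) bound above (the badness here recomputes a sum, so it isn't O(1); precomputed prefix sums would fix that).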

Related

Time Complexity (Big O) - Can the value of N decide whether the time complexity is O(1) or O(N) when we have 2 nested FOR loops?

Suppose that I have 2 nested for loops, and 1 array of size N as shown in my code below:
int result = 0;
for (int i = 0; i < N; i++)
{
    for (int j = i; j < N; j++)
    {
        result = array[i] + array[j]; // just some funny operation
    }
}
Here are 2 cases:
(1) If the constraint is that N >= 1,000,000 strictly, then we can definitely say that the time complexity is O(N^2). This is true for sure, as we all know.
(2) Now, if the constraint is that N < 25 strictly, then people could probably say that because N is always so small, the time complexity is effectively O(1), since it takes very little time to run and complete these 2 for loops WITH MODERN COMPUTERS? Does that sound right?
Please tell me whether the value of N plays a role in deciding the time complexity. If yes, how big does N need to be in order to play that role (1,000? 5,000? 20,000? 500,000?)? In other words, what is the general rule of thumb here?
INTERESTING THEORETICAL QUESTION: If, 15 years from now, computers are so fast that even with N = 25,000,000 these 2 for loops can be completed in 1 second, can we then say that the time complexity would be O(1) even for N = 25,000,000? I suppose the answer would be YES at that time. Do you agree?
tl;dr: No. The value of N has no effect on time complexity. O(1) versus O(N) is a statement about "all N", i.e. about how the amount of computation increases when N increases.
Great question! It reminds me of when I was first trying to understand time complexity. I think many people have to go through a similar journey before it ever starts to make sense so I hope this discussion can help others.
First of all, your "funny operation" is actually funnier than you think, since your entire pair of nested for-loops can be replaced with:
result = array[N - 1] + array[N - 1]; // just some hilarious operation hahaha ha ha
Since result is overwritten each time, only the last iteration affects the outcome. We'll come back to this.
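Here is a quick Python sketch (my own, not from the original post) confirming that point, i.e. that only the last iteration of the nested loops matters:

import random

def nested_loops(array):
    # Direct translation of the C loops: result is overwritten on every iteration.
    result = 0
    N = len(array)
    for i in range(N):
        for j in range(i, N):
            result = array[i] + array[j]
    return result

array = [random.randint(-100, 100) for _ in range(50)]
assert nested_loops(array) == array[-1] + array[-1]   # only the i = j = N-1 iteration survives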
As far as what you're really asking here, the purpose of Big-O is to provide a meaningful way to compare algorithms in a way that is independent of input size and independent of the computer's processing speed. In other words, O(1) versus O(N) has nothing to do with the size of N and nothing to do with how "modern" your computer is. All of that affects the execution time of the algorithm on a particular machine with a particular input, but does not affect its time complexity, i.e. O(1) versus O(N).
It is actually a statement about the algorithm itself, so a math discussion is unavoidable, as dxiv has so graciously alluded to in his comment. Disclaimer: I'm going to omit certain nuances in the math since the critical stuff is already a lot to explain and I'll defer to the mountains of complete explanations elsewhere on the web and textbooks.
Your code is a great example to understand what Big-O does tell us. The way you wrote it, its complexity is O(N^2). That means that no matter what machine or what era you run your code in, if you were to count the number of operations the computer has to do, for each N, and graph it as a function, say f(N), there exists some quadratic function, say g(N)=9999N^2+99999N+999 that is greater than f(N) for all N.
But wait, if we just need to find big enough coefficients in order for g(N) to be an upper bound, can't we just claim that the algorithm is O(N) and find some g(N)=aN+b with coefficients gigantic enough that it's an upper bound of f(N)? THE ANSWER TO THIS IS THE MOST IMPORTANT MATH OBSERVATION YOU NEED TO UNDERSTAND TO REALLY UNDERSTAND BIG-O NOTATION. Spoiler alert: the answer is no.
For visuals, try this graph on Desmos, where you can adjust the coefficients: https://www.desmos.com/calculator/3ppk6shwem
No matter what coefficients you choose, a function of the form aN^2+bN+c will ALWAYS eventually outgrow a function of the form aN+b (both with positive leading coefficient a). You can push a line as high as you want, like g(N)=99999N+99999, but even the function f(N)=0.01N^2+0.01N+0.01 crosses that line and grows past it after N=9999900. There is no linear function that is an upper bound to a quadratic. Similarly, there is no constant function that is an upper bound to a linear or quadratic function. Yet we can find a quadratic upper bound to this f(N), such as h(N)=0.01N^2+0.01N+0.02, so f(N) is in O(N^2). This observation is what allows us to just say O(1) and O(N^2) without having to distinguish between O(1), O(3), O(999), O(4N+3), O(23N+2), O(34N^2+4N+3), etc. By using phrases like "there exists a function such that" we can brush all the constant coefficients under the rug.
So having a quadratic upper bound, aka being in O(N^2), means that the function f(N) is no bigger than quadratic, and in this case happens to be exactly quadratic. It sounds like this just comes down to comparing the degrees of polynomials, so why not just say that the algorithm is a degree-2 algorithm? Why do we need this super abstract "there exists an upper bound function such that bla bla bla..."? This is the generalization necessary for Big-O to account for non-polynomial functions, some common ones being log N, N log N, and e^N.
For example if the number of operations required by your algorithm is given by f(N)=floor(50+50*sin(N)), we would say that it's O(1) because there is a constant function, e.g. g(N)=101 that is an upper bound to f(N). In this example, you have some bizarre algorithm with oscillating execution times, but you can convey to someone else how much it doesn't slow down for large inputs by simply saying that it's O(1). Neat. Plus we have a way to meaningfully say that this algorithm with trigonometric execution time is more efficient than one with linear complexity O(N). Neat. Notice how it doesn't matter how fast the computer is because we're not measuring in seconds, we're measuring in operations. So you can evaluate the algorithm by hand on paper and it's still O(1) even if it takes you all day.
As for the example in your question, we know it's O(N^2) because there are aN^2+bN+c operations involved for some a, b, c. It can't be O(N), let alone O(1), because no matter what aN+b you pick, I can find a large enough input size N such that your algorithm requires more than aN+b operations. On any computer, in any time zone, with any chance of rain outside. Nothing physical affects O(1) versus O(N) versus O(N^2). What changes it to O(1) is changing the algorithm itself to the one-liner that I provided above, where you just add two numbers and spit out the result no matter what N is. Let's say for N=10 it takes 4 operations to do both array lookups, the addition, and the variable assignment. If you run it again on the same machine with N=10000000 it's still doing the same 4 operations. The number of operations required by the algorithm doesn't grow with N. That's why the algorithm is O(1).
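If you want to see this with your own code, here is a small Python sketch (mine, for illustration) that counts operations instead of timing them. The nested-loop version grows like N(N+1)/2, i.e. quadratically, while the one-liner stays at a constant number of operations no matter what N is:

def nested_loop_operations(N):
    # Count how many times the inner-loop body executes.
    count = 0
    for i in range(N):
        for j in range(i, N):
            count += 1
    return count

def one_liner_operations(N):
    # Two array lookups, one addition, one assignment, regardless of N.
    return 4

for N in (10, 100, 1000):
    print(N, nested_loop_operations(N), N * (N + 1) // 2, one_liner_operations(N))
    # nested_loop_operations(N) equals N(N+1)/2 (quadratic growth); the one-liner stays at 4.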
It's why problems like finding an O(N log N) algorithm to sort an array are math problems and not nano-technology problems. Big-O doesn't even assume you have a computer with electronics.
Hopefully this rant gives you a hint as to what you don't understand so you can do more effective studying for a complete understanding. There's no way to cover everything needed in one post here. It was some good soul-searching for me, so thanks.

Finding longest common subsequence in O(NlogN) time

Is there any way of finding the longest common subsequence of two sequences in O(NlogN) time?
I read somewhere that there is a way to achieve this using binary search.
I know the DP approach that takes O(N^2) time.
For the general case, the O(N^2) dynamic programming algorithm is the best you can do. However, there exist better algorithms in some special cases.
Alphabet size is bounded
This is a very common situation. Sequences consisting of letters from some alphabet (e.g. English) lie in this category. For this case, the O(N*M) algorithm can be optimised to O(N^2/log N) with the Method of Four Russians. I don't know exactly how; you can search for it.
Both sequences consist of distinct elements
An example problem is "Given two permutations of numbers from 1 to N, find their LCS". This one can be solved in O(N*logN). Let the sequences be A and B.
Define a sequence C. C[i] is the index of B[i] in A. (A[C[i]] = B[i])
LCS of A and B is the longest increasing subsequence of C.
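Here is a minimal Python sketch of that reduction (assuming A and B are permutations of the same set of distinct elements; the identifiers are mine):

from bisect import bisect_left

def lcs_of_permutations(A, B):
    # C[i] is the index of B[i] in A, so A[C[i]] = B[i].
    pos_in_A = {value: idx for idx, value in enumerate(A)}
    C = [pos_in_A[value] for value in B]

    # Longest increasing subsequence of C via patience sorting, O(N log N).
    tails = []   # tails[k] = smallest possible tail of an increasing subsequence of length k+1
    for x in C:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

# Example: lcs_of_permutations([3, 1, 2, 4], [1, 2, 3, 4]) returns 3 (the common subsequence 1, 2, 4).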
The dynamic programming approach is O(n^2) in the general case. For certain other cases, there are lower-complexity algorithms:
For a fixed alphabet size (which doesn't grow with n), there's the Method of Four Russians, which brings the time down to O(n^2/log n) (see here).
See here for another, further optimized case.
Assuming the Exponential Time Hypothesis (which is a stronger assumption than P ≠ NP, but is still widely believed to be true), it is not possible to do it in time O(N^{2 - eps}) for any positive constant eps; see "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping" by Karl Bringmann and Marvin Kunnemann (a pre-print is available on arXiv).
Roughly speaking, it means that the general case of this problem cannot be solved in time better than something like O(N^2/log N), so if you want faster algorithms you have to consider additional constraints (some properties of the strings) or look for approximate solution.
Finding the longest common subsequence of two sequences is essentially an n-squared problem.
Masek and Paterson (1980) made a minor improvement, to n-squared / log n, using the so-called "Four Russians" technique.
In most cases the additional complexity introduced by such convoluted approaches is not justified by the small gains. For practical purposes you can consider the n-squared approach as the reasonable optimum in typical applications.
#include <vector>
#include <algorithm>
using namespace std;

extern int arr[];   // input array, assumed to be defined elsewhere

// LIS[k] holds the smallest possible tail value of an increasing subsequence of length k+1.
vector<int> LIS;

int LongestIncreasingSubsequence(int n){
    if(!n) return 0;
    LIS.clear();                      // allow repeated calls
    LIS.emplace_back(arr[0]);
    for(int i = 1; i < n; i++){
        if(arr[i] > LIS.back()) LIS.emplace_back(arr[i]);            // extend the longest subsequence found so far
        else *lower_bound(LIS.begin(), LIS.end(), arr[i]) = arr[i];  // replace the first tail that is >= arr[i]
    }
    return LIS.size();
}

selection algorithm for median

I was trying to understand the selection algorithm for finding the median. I have pasted the pseudocode below.
SELECT(A[1 .. n], k):
    if n <= 25
        use brute force
    else
        m = ceiling(n/5)
        for i = 1 to m
            B[i] = SELECT(A[5i-4 .. 5i], 3)
        mom = SELECT(B[1 .. m], floor(m/2))
        r = PARTITION(A[1 .. n], mom)
        if k < r
            return SELECT(A[1 .. r-1], k)
        else if k > r
            return SELECT(A[r+1 .. n], k - r)
        else
            return mom
I have a very trivial doubt. I was wondering what the author means by "brute force" written above for n <= 25. Is it that he will compare elements one by one with every other element and see if it's the kth largest, or something else?
The code must come from here.
A brute force algorithm can be any simple and stupid algorithm. In your example, you can sort the 25 elements and find the middle one. This is simple and stupid compared to the selection algorithm since sorting takes O(nlgn) while selection takes only linear time.
A brute force algorithm is often good enough when n is small. Besides, it is easier to implement. Read more about brute force here.
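As a sketch of what such a base case might look like (this is my illustration, not the author's code), you can simply sort the small subarray and index into it:

def select_small(A, k):
    # Brute-force base case for n <= 25: sort and pick the k-th smallest (1-indexed).
    # Since n is bounded by the constant 25, this is O(1) work.
    return sorted(A)[k - 1]

# Example: select_small([7, 2, 9, 4, 1], 3) returns 4.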
Common wisdom is that Quicksort is slower than insertion sort for small inputs. Therefore many implementations switch to insertion sort at some threshold.
There is a reference to this practice in the Wikipedia page on Quicksort.
Here's an example of commercial mergesort code that switches to insertion sort for small inputs. Here the threshold is 7.
The "brute force" almost certainly refers to the fact that the code here is using the same practice: insertion sort followed by picking the middle element(s) for the median.
However, I've found in practice that the common wisdom is not generally true. When I've run benchmarks, the switch has had either very little positive effect or a negative one. That was for Quicksort. In the partition-based selection algorithm here, it's more likely to be negative because one side of the partition is thrown away at each step, so less time is spent on small inputs. This is verified in Dennis's response to this SO question.

Dynamic programming aspect in Kadane's algorithm

Initialize:
    max_so_far = 0
    max_ending_here = 0

Loop for each element of the array
    (a) max_ending_here = max_ending_here + a[i]
    (b) if (max_ending_here < 0)
            max_ending_here = 0
    (c) if (max_so_far < max_ending_here)
            max_so_far = max_ending_here

return max_so_far
Can anyone help me understand the optimal substructure and overlapping subproblems (the bread and butter of DP) in the above algorithm?
According to this definition of overlapping subproblems, the recursive formulation of Kadane's algorithm (f[i] = max(f[i - 1] + a[i], a[i])) does not exhibit this property. Each subproblem would only be computed once in a naive recursive implementation.
It does however exhibit optimal substructure according to its definition here: we use the solution to smaller subproblems in order to find the solution to our given problem (f[i] uses f[i - 1]).
Consider the dynamic programming definition here:
In mathematics, computer science, and economics, dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure (described below). When applicable, the method takes far less time than naive methods that don't take advantage of the subproblem overlap (like depth-first search).
The idea behind dynamic programming is quite simple. In general, to solve a given problem, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often when using a more naive method, many of the subproblems are generated and solved many times. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations.
This leaves room for interpretation as to whether or not Kadane's algorithm can be considered a DP algorithm: it does solve the problem by breaking it down into easier subproblems, but its core recursive approach does not generate overlapping subproblems, which is what DP is meant to handle efficiently, so this would put it outside DP's specialty.
On the other hand, you could say that it is not necessary for the basic recursive approach to lead to overlapping subproblems, but this would make any recursive algorithm a DP algorithm, which would give DP a much too broad scope in my opinion. I am not aware of anything in the literature that definitively settles this, however, so I wouldn't mark down a student or dismiss a book or article for labeling it either way.
So I would say that it is not a DP algorithm, just a greedy and/or recursive one, depending on the implementation. I would label it as greedy from an algorithmic point of view for the reasons listed above, but objectively I would consider other interpretations just as valid.
Note that I derived my explanation from this answer. It demonstrates how Kadane’s algorithm can be seen as a DP algorithm which has overlapping subproblems.
Identifying subproblems and recurrence relations
Imagine we have an array a from which we want to get the maximum subarray. To determine the max subarray that ends at index i the following recursive relation holds:
max_subarray_to(i) = max(max_subarray_to(i - 1) + a[i], a[i])
In order to get the maximum subarray of a we need to compute max_subarray_to() for each index i in a and then take the max() from it:
max_subarray = max( for i=1 to n max_subarray_to(i) )
Example
Now, let's assume we have an array [10, -12, 11, 9] from which we want to get the maximum subarray. This would be the work required running Kadane's algorithm:
result = max(max_subarray_to(0), max_subarray_to(1), max_subarray_to(2), max_subarray_to(3))
max_subarray_to(0) = 10 # base case
max_subarray_to(1) = max(max_subarray_to(0) + (-12), -12)
max_subarray_to(2) = max(max_subarray_to(1) + 11, 11)
max_subarray_to(3) = max(max_subarray_to(2) + 9, 9)
As you can see, max_subarray_to(i) appears twice for each i apart from the last index 3, so a naive recursive evaluation would recompute it, thus showing that Kadane's algorithm does have overlapping subproblems.
Kadane's algorithm is usually implemented using a bottom up DP approach to take advantage of the overlapping subproblems and to only compute each subproblem once, hence turning it to O(n).
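For reference, here is a short Python sketch of that bottom-up formulation; it mirrors max_subarray_to() but computes each subproblem exactly once:

def max_subarray(a):
    # best_ending_here plays the role of max_subarray_to(i); best_so_far is the running answer.
    best_ending_here = a[0]
    best_so_far = a[0]
    for x in a[1:]:
        best_ending_here = max(best_ending_here + x, x)   # f[i] = max(f[i-1] + a[i], a[i])
        best_so_far = max(best_so_far, best_ending_here)
    return best_so_far

# Example from above: max_subarray([10, -12, 11, 9]) returns 20 (the subarray [11, 9]).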

Determining the worst-case complexity of an algorithm

Can someone please explain how to determine the worst-case complexity of an algorithm? I know that we need to use the equation W(n) = max{ t(I) | I ∈ D }, where D is the set of inputs of size n. Do I calculate the number of operations performed for each input I and then take the max? Is there an easier way to accomplish this?
Starting from the equation is thinking of it a bit backwards. What you really care about is scalability, or, what is it going to do as you increase the size of the input.
If you just have a loop, for instance, you have an O(n) time complexity algorithm. If you have a loop within another loop, though, it becomes O(n^2), because it must now do on the order of n^2 things for an input of size n.
When you are talking about the worst case, you are usually talking about algorithms whose running time depends on the input (or on randomness), for instance when a loop can stop prematurely. What you want to do in that case is assume the worst and pretend the loop will stop as late as possible. So if we have:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        if (rand() % 2) j = n;  // coin flip: about half the time, exit the inner loop early
                                // (rand() returns an int, so comparing it to .5 would almost always be true)
    }
}
We would say that the worst case is O(n^2). Even though we know that it is very likely that the inner loop will bust out early, we are looking for the worst possible performance.
That equation is more of a definition than an algorithm.
Does the algorithm in question care about anything other than the size of its input? If not then calculating W(n) is "easy".
If it does, try to come up with a pathological input. For example, with quicksort it might be fairly obvious that a sorted input is pathological, and you can do some counting to see that it takes O(n^2) steps. At that point you can either
Argue that your input is "maximally" pathological
Exhibit a matching upper bound on the runtime on any input
Example of #1:
Each pass of quicksort will put the pivot in the right place, and then recurse on the two parts. (handwave alert) The worst case is to have the rest of the array on one side of the pivot. A sorted input achieves this.
Example of #2:
Each pass of quicksort puts the pivot in the right place, so there are no more than O(n) passes. Each pass requires no more than O(n) work. As such, no input can cause quicksort to take more than O(n^2).
In this case #2 is a lot easier.
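If you want to see #1 and #2 concretely, here is a small Python sketch (my own, assuming a first-element pivot) that counts the comparisons quicksort makes. A sorted input drives the count to n(n-1)/2, while no input can push it past that quadratic bound:

import random

def quicksort_comparisons(a):
    # Count element comparisons made by quicksort with a first-element pivot (Lomuto partition).
    a = list(a)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons

n = 1000
print(quicksort_comparisons(range(n)))                    # sorted (pathological) input: n(n-1)/2 = 499500 comparisons
print(quicksort_comparisons(random.sample(range(n), n)))  # random input: roughly n log n comparisons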
