I am trying to understand how loop invariants interact with break statements. CLRS 3e (p. 19) describes a loop invariant as requiring that:
If it is true before an iteration of the loop, it remains true before the next iteration.
So given the following trivial loop
for i = 1 to 5
    if i == 3 then break
Would it be fair to say that i < 4 is an invariant property of the loop? The argument being that, since the loop exits via the break when i == 3, no iteration ever begins with that property violated.
Yes, that would be an invariant, for precisely the reason you’ve mentioned. Whether it’s a useful invariant is a separate question that depends on what you’re trying to prove.
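To make the argument concrete, here is the loop in Python with the invariant checked at the top of every iteration (a minimal sketch of my own, not from the question):

for i in range(1, 6):
    # Invariant: i < 4 holds at the top of every iteration that runs,
    # because the loop breaks at i == 3 and i never reaches 4.
    assert i < 4
    if i == 3:
        break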
TL;DR: How do I prove an algorithm will work for every value of n?
Overview:
I'm a self-taught programmer with a math background up to linear algebra. I recently needed to show that a recurrence relation existed, and I did so by writing an algorithm that solved the problem for n=100.
When I got to the solution, the way I arrived at it was deemed unacceptable. The person I was speaking with said that mine was a "statistical" algorithm, and that I had not actually demonstrated that a recurrence relation existed or proved that my algorithm would work.
I've been solving some problems on websites such as codesignal, hackerrank, etc., but this is the first time that I've run into this concept of generalising a solution into a formal proof.
Question:
How do I prove an algorithm will work for every value of n?
Example:
Let's use binary search as the example and just forget the actual problem that I faced.
In the case where you have an array of 100 integers, sorted in ascending order, how can you prove your binary search algorithm will work for any array and any n?
In the example below, let's say our array is
arr = list(range(100))
and my proposed problem is:
Write a recursive algorithm that will return True if the value '42' is in the array and False otherwise.
How can you prove (as in a formal proof) that this algorithm works? Please take care to highlight the thought process and the intuition behind the moment the algorithm goes from being a heuristic solution to being a proven one.
42 is not discarded
If an array A is sorted in ascending order and we can show that A[x] > 42, then A[x + 1] > 42 as well. This is because, in a sorted array, each element is greater than or equal to its predecessor, so A[x + 1] >= A[x] > 42; the conclusion follows because the ordering is transitive.
The same is true, in reverse, for the < direction.
A binary search, at each step, rejects all the elements that are either bigger or smaller than the desired value: it samples a single element and decides that everything on one side of it must also be rejected (as explained above).
(EDIT: if x > 42 or x < 42 is true, then x = 42 must be false.)
The array gets smaller
At each step, at least one element of the array is removed, unless the sampled element is equal to 42: if it is not 42, then that element (perhaps along with some others) is removed.
If the array keeps getting smaller (assuming 42 is at no point sampled), and 42 is never removed, then at some point either 42 will be sampled, or the array will be empty.
Conclusion
If the array ends up empty, then, since 42 is never discarded, there was never a 42 to begin with.
If we sample a 42, since no new elements are introduced to the array, the 42 was there to start with.
Proof!
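Putting the pieces together, here is a minimal sketch of the recursive search in Python (my own illustration; the names contains, lo, and hi are not from the question):

def contains(a, target, lo=0, hi=None):
    # Search the sorted list a[lo:hi] for target.
    if hi is None:
        hi = len(a)
    if lo >= hi:                  # empty subarray: target was never there
        return False
    mid = (lo + hi) // 2
    if a[mid] == target:          # we sampled the target itself
        return True
    if a[mid] > target:           # everything from mid rightward is > target
        return contains(a, target, lo, mid)
    return contains(a, target, mid + 1, hi)   # everything up to mid is < target

For example, contains(list(range(100)), 42) returns True. Each call either answers immediately or recurses on a strictly smaller half, which is exactly the termination argument above.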
Additional Comments
To show that the recursive algorithm works, you want to show that it
ends
yields the correct result.
It ends because at each recursive step the array gets smaller (but cannot shrink below []). It yields the correct result because 42 is never removed nor added, so if we can't find a 42 at the end, it's because it was never there. In my opinion, your argument should not rely on any concrete examples, except perhaps the base case; otherwise it risks being "statistical". You need to prove it in the mathematical sense.
For a simple correctness proof, you need to prove that your algorithm can successfully do what it is designed to do.
So, take a precondition stating what you assume about the input, and show that it implies the postcondition required of the output. This proves that the algorithm is correct.
P: Statement about given input
Q: Statement of the required output.
Prove P implies Q.
Take care of corner cases.
Make sure the algorithm terminates in all cases.
If it's a recursive algorithm, you strictly need to prove that it terminates/exits.
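As an illustration (my own sketch; check_spec is a made-up name, and contains is the recursive search sketched earlier), P and Q can be written down as executable checks in Python:

def check_spec(a, target=42):
    # P (precondition): a is sorted in ascending order.
    assert all(a[k] <= a[k + 1] for k in range(len(a) - 1))
    result = contains(a, target)      # the recursive search under test
    # Q (postcondition): result is True exactly when target occurs in a.
    assert result == (target in a)
    return result

Of course, running such checks on examples is exactly the "statistical" evidence the question is about; the proof is the argument that P implies Q for every input, not the checks themselves.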
Write a recursive algorithm that will return True if the value '42' is in the array and False otherwise.
For such problems, you can also use proof by contradiction. Assume that the algorithm yields True when 42 is not present, or that it returns False when 42 is present. Then trace your algorithm's flow and show that this cannot happen: a contradiction.
int i, temp;
// a is an array of integers a[1..100]
i = 1;
while (i < 100) {
    if (a[i] > a[i+1]) {
        temp = a[i];
        a[i] = a[i+1];
        a[i+1] = temp;
    }
    i = i + 1;
}
I'm having trouble understanding how to find loop invariants and how to write a formal statement for them. As I understand it, a loop invariant is just a condition that is true immediately before and after each iteration of a loop. It looks like the code is doing a swap: if the current element of the array is greater than the next one, they switch places. From the definition of a loop invariant, it really sounds like the invariant is i < 100, because that must be true for the loop to run, but I don't really understand. I would greatly appreciate some clarification.
Going by your definition, I can see two loop invariant conditions:
1. i < 100
2. a[i] >= a[j] for all j <= i, where i is the loop variable.
This is in fact one outer-loop iteration of bubble sort. At the end of this loop, the highest value in the array has bubbled to the top (a[100]).
Roughly speaking, you are right. A loop invariant is "just a condition that is true immediately before and after each iteration of a loop." However, under this definition, there are literally an infinite number of loop invariants for the code in question and most of them are of no particular interest. For example, i < 101, i < 102, i < 103, ... are all loop invariants for the given code.
However, usually we are not interested in simply finding a loop invariant just for the sake of finding a loop invariant. Instead, we are interested in proving a program correct and if we want to prove a program correct then a well-chosen loop invariant turns out to be very useful.
For example, the code in question is the inner loop of the bubble sort algorithm and its purpose is to make the array "more sorted". Hence, to prove total correctness of this code we must prove three things:
(1) When execution gets to the end of the code, the array is a permutation of the array at the beginning of the code.
(2) When execution gets to the end of the code, the number of inversions in the array is either zero or it is less than the number of inversions in the array at the beginning of the code (this condition helps us prove that the outer loop of the bubble sort algorithm terminates).
(3) The code terminates.
To prove (1) we need to consider three execution paths (and a loop invariant will play a critical role when we consider PATH 2).
(PATH 1) Consider what happens when execution starts at the beginning of the code and arrives at the top of the loop for the first time. Since nothing is done to the array on this execution path, the array is a permutation of the array at the beginning of the code.
(PATH 2) Now consider what happens when execution starts at the top of the loop, goes around the loop, and returns to the top of the loop. If a[i] <= a[i+1] then the swap does not occur and, thus, the array is still a permutation of the array at the beginning of the code (since nothing is done to it). Alternatively, if a[i] > a[i+1] then the swap does occur. However, the array is still a permutation of the array at the beginning of the code (since a swap is a type of permutation). Thus, whenever execution gets to the top of the loop, the array is a permutation of the array at the beginning of the code. Note that the statement "the array is a permutation of the array at the beginning of the code" is the well-chosen loop invariant that we need to help us prove that the code is correct.
(PATH 3) Finally, consider what happens when execution starts at the top of the loop but does not enter the loop and instead it goes to the end of the code. Since nothing is done to the array on this execution path, the array is a permutation of the array at the beginning of the code.
These three paths cover all possible ways that execution can go from the beginning of the code to the end of the code and, hence, we have proved (1) the array at the end of the code is a permutation of the array at the beginning of the code.
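As a concrete illustration (my own sketch, not part of the original proof), here is the pass in Python with the permutation invariant checked at the top of the loop, using a multiset comparison:

from collections import Counter

def bubble_pass(a):
    # One pass of the code above, in 0-based Python indexing.
    original = Counter(a)                 # multiset snapshot of the input
    i = 0
    while i < len(a) - 1:
        # Loop invariant: a is a permutation of the original array.
        assert Counter(a) == original
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]   # a swap is a permutation
        i = i + 1
    assert Counter(a) == original         # the invariant also holds at the end
    return a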
A loop invariant is some predicate (condition) that holds for every iteration of the loop, i.e., that is necessarily true immediately before and immediately after each iteration of the loop.
There can of course be infinitely many loop invariants, but the fact that loop invariants are used to prove correctness of the algorithm restricts us to the so-called "interesting" ones.
Your program, whose aim is to help sort the given array, is one pass of a simple bubble sort.
Goal statement: the array a is sorted at the end of the algorithm.
An interesting property looks like: at the end of the ith pass of the outer loop (the loop that repeats the pass shown), the last i elements of the array are sorted and in their final positions; extended to i = 100, this means the whole array is sorted.
Loop invariant for your case (the outer loop): a[100-i+1 .. 100] is sorted.
Note that when i equals 100, the above statement means that the complete array is sorted, which is what you want to be true at the end of the algorithm.
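A sketch of the full sort with this invariant asserted (my own Python illustration, 0-based, so the suffix a[n-i:] plays the role of a[100-i+1 .. 100]):

def bubble_sort(a):
    n = len(a)
    for i in range(n):
        # Outer-loop invariant: the suffix a[n-i:] is sorted
        # (and in fact already in its final position).
        assert a[n - i:] == sorted(a[n - i:])
        for j in range(n - 1 - i):        # one bubble pass over the unsorted prefix
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a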
PS: Just realized it is an old question; answering anyway, since it helps in improving my answering skills :)
Your loop is controlled by the test i < 100. Within the body of the loop, i is used in several places but is assigned in only one place. The assignment always happens, and for any value of i that permits entry to the loop, the assignment moves i closer to the terminating condition. Thus the loop is guaranteed to terminate.
As for correctness of your program, that's a different issue. Depending on whether your arrays use zero-based or one-based indexing, the way you're using i for array accesses could be problematic. If it's zero-based, you never look at the first element and you'll step out of bounds with a[i+1] on the last iteration.
Taking an example to clarify: say I have a program that takes two inputs from the user, a and b. The program increments a and decrements b, and returns the value at which a and b become equal. Hence a has to be smaller and b has to be larger. But what if the user enters the opposite, a larger and b smaller? The program would obviously go into an infinite loop. Suppose I want the program to report instead that "these two numbers can never meet". How do I check for that? Please don't answer "check the numbers and respond accordingly"; this is just an example. I want to know how to check for a condition that can never be met.
Another example: comparing two numbers. Suppose I have two numbers and I keep randomising them. The program should return true when they are equal and false when they are not, and it should not go into an infinite loop when they never become equal. I can only keep comparing the numbers in each iteration and return true as soon as they are equal. But there is a chance that they never become equal, so the program never terminates. How do I check for such a scenario and report that the numbers can never be equal?
It has to be done on a case-by-case basis; the general problem (given an iterative procedure controlled by a variable condition, determine whether the condition will ever assume a given value, e.g. True) is equivalent to the halting problem, which is uncomputable...
A simple if statement, like this?
if (a > b || (b - a) % 2 != 0)
    throw "Condition cannot be met";  // a must start <= b, with an even gap
else
    for ( ; a != b; a++, b--) {
        // Do something
    }
Example in C++. Note the parity check: a and b close on each other by two per iteration, so if the gap between them is odd they pass each other and never become equal.
do i=2, n-1
    y(i) = y(i+1)
end do

do i=2, n-1
    y(i) = y(i+1) - y(i-1)
end do
Hi, I'm wondering if both of these loops can be parallelized? It seems that the y(i+1) part makes it impossible, because it depends on a value that has not been generated yet.
If y is an array (it LOOKS like a function, but then you'd be assigning to a function call), then the y(i+1) part already exists, although it is still problematic for parallelizing.
Of course you can.
You just need to write it in such a way that each parallel task doesn't "step on" any other task's memory.
In the second case, that would be tricky, but I'm certain it's possible with enough thought.
In the first case, parallelization is only possible if you have a secondary storage area. For maximum parallelism, you will need a completely separate array:
for all i in [2, n-1] in parallel do
    y'(i) = y(i+1)
end do
If you only want to use two parallel execution units, you will need storage for one element of the array:
e = y(n/2)
for all i in [0, 1] in parallel do
    for j in [1, n/2 - 1] do
        y(i*n/2 + j) = y(i*n/2 + j + 1)
    end do
end do
y(n/2 - 1) = e
You need this to avoid the race condition on the last element of the first half and the first element of the second half. In fact, there is a direct relationship between the amount of additional storage you need and the factor by which you parallelize the code.
The second loop cannot be parallelized as written, since you must have already computed the new y(i-1) in order to compute y(i). No way around it. This isn't a problem for the first loop, since all values that are eventually read are guaranteed to hold the correct value before the loop starts. Not so in the second!
For what it's worth, these loops can be combined, if they are meant to be executed sequentially. That would be faster than parallelizing the first and leaving the second alone.
In both cases you have what's called a "loop-carried dependency":
do i=2, n-1
    y(i) = y(i+1) - y(i-1)
end do
The calculation of y(i) depends on y(i +/- 1). Because in a parallel loop you cannot guarantee the order in which the iterations execute, y(i+1) may already have been updated to its new value before y(i) is calculated. Worse still, y(i+1) may be in the process of being updated on one thread while another thread attempts to read it, picking up a corrupt value (because its data is only halfway through being updated). In either case you'll get incorrect answers.
The best solution here is to have a read-only array and a writable array:
do i=2, n-1
    yNew(i) = yOld(i+1) - yOld(i-1)
end do
swap(yOld, yNew)
Now your problem goes away, because the array the loop reads (yOld) is never updated by the parallel loop. If your language supports pointers, you can easily swap the new/old arrays by maintaining pointers to them and simply swapping the pointers. The only additional overhead is that you need to keep an additional copy of your data as a read-only copy for the loop to refer to. (One caveat: every yNew(i) is now computed from old values, whereas the literal sequential loop would read the already-updated y(i-1); double buffering matches the usual stencil intent rather than the sequential loop verbatim.)
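For what it's worth, here is the double-buffer idea in Python with NumPy (my own illustration; the slice assignment computes every element from y_old only, so the element updates are independent and safe to run in any order or in parallel):

import numpy as np

def stencil_pass(y_old):
    # y_old is a NumPy array; yNew(i) = yOld(i+1) - yOld(i-1) for the
    # interior points, i.e. i in [2, n-1] of the 1-based loop above.
    y_new = y_old.copy()
    y_new[1:-1] = y_old[2:] - y_old[:-2]   # reads come only from y_old
    return y_new                           # the caller swaps the buffers

y = stencil_pass(np.arange(10.0))          # example: one buffered pass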
In what seems to me a common implementation of quicksort, the program is composed of a partitioning subroutine and two recursive calls to quicksort those (two) partitions.
So the flow of control, in the quickest and pseudo-est of pseudocode, goes something like this:
quicksort[list, some parameters]
    ...
    q = partition[some other parameters]
    quicksort[1, q]
    quicksort[q+1, length[list]]
    ...
End
The q is the "pivot" after a partitioning. That second quicksort call (the one that will quicksort the second part of the list) also uses q. This is what I don't understand: if the "flow of control" goes through the first quicksort first, q is going to be updated. How is the same q going to work in the second quicksort when it comes time to do the second parts of all those partitions?
I think my misunderstanding comes from the limitations of pseudocode. There are details that have been likely left out by expressing this implementation of the quicksort algorithm in pseudocode.
Edit 1 This seems related to my problem:
For[i = 1, i < 5, i = i + 1, Print[i]]
The first time through, we would get i=1, true, i=2, 1. Even though i was updated to 2, i is still 1 in the body (i.e., Print[i] gives 1). This "flow of control" is what I don't understand. Where is the i=1 being stored when it increments to 2 and before it gets to the body?
Edit 2
As an example of what I'm trying to get at, I'm pasting this here. It's from here.
Partition(A,p,r)
    x = A[p]
    i = p - 1
    j = r + 1
    while TRUE
        repeat j = j - 1
        until A[j] <= x
        repeat i = i + 1
        until A[i] >= x
        if i < j
            then exchange A[i] with A[j]
            else return j
Quicksort(A,1,length[A])

Quicksort(A,p,r)
    if p < r
        then q = Partition(A,p,r)
             Quicksort(A,p,q)
             Quicksort(A,q+1,r)
Another example can be found here.
Where or when in these algorithms is q being put onto a stack?
q is not updated; the pivot remains in its place. In each iteration of quicksort, the only element that is guaranteed to be in its correct place is the pivot.
Also, note that the q which is "changed" during the recursive call is NOT actually changed, since it is a different variable, stored in a different area. This is because q is a local variable of the function, and a fresh one is generated for each call.
EDIT: [response to the question edit]
In quicksort, the algorithm actually generates a number of qs, which are stored on the stack. Every variable is 'alive' only within its own function and is accessible [in this example] only from it. When the function ends, its local variables are released automatically. So you don't have only one pivot; you have a number of pivots, one for each recursive step.
Turns out quicksort demands extra memory precisely in order to do the bookkeeping you mentioned. Perhaps the following iterative version of the algorithm (sketched here in Python, with inclusive interval bounds as in the CLRS-style code above) might clear things up:

def quicksort(array, begin, end):
    intervals_to_sort = [(begin, end)]        # an explicit stack of intervals
    while intervals_to_sort:                  # while there are intervals to sort
        begin, end = intervals_to_sort.pop()
        if end - begin + 1 >= 2:              # interval holds at least two elements
            q = partition(array, begin, end)
            intervals_to_sort.append((begin, q))
            intervals_to_sort.append((q + 1, end))
You may notice that now the intervals to sort are being explicitly kept in a data structure (usually just an array, inserting and removing at the end, in a stack-like fashion) so there is no risk of "forgetting" about old intervals.
What might confuse you is that the most common description of Quicksort is recursive so the q variable appears multiple times. The answer to this is that every time a function is called it creates a new batch of local variables so it doesn't touch the old ones. In the end, the explicit stack from that previous imperative example ends up being implemented as an implicit stack with function variables.
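To see the implicit stack at work, compare the recursive form of the same sketch (mine again; it reuses the partition convention of the iterative version above):

def quicksort_rec(array, begin, end):
    if end - begin + 1 >= 2:
        q = partition(array, begin, end)
        # q is a local variable of THIS call; the recursive calls below
        # each create their own q without disturbing this one.
        quicksort_rec(array, begin, q)
        quicksort_rec(array, q + 1, end)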
(An interesting side note: some early programming languages didn't implement neat local variables like that, and Quicksort was actually first described using the iterative version with the explicit stack. It was only later that it was seen how Quicksort could be elegantly described as a recursive algorithm, in Algol.)
As for the part after your edit: the i=1 is forgotten, since assignment destructively updates the variable.
The partition code picks some value from the array (such as the value at the midpoint, or the first or last element) -- this is the pivot. In the common Lomuto variant, partitioning then puts all the values <= pivot on the left and all values >= pivot on the right, and stores the pivot in the one remaining slot between them. At that point, the pivot is necessarily in its correct slot, q. The algorithm then sorts the subarrays A[p..q-1] and A[q+1..r], which are disjoint but, together with slot q, cover all of A, resulting in the entire array being sorted. (The Hoare-style code quoted in the question instead returns a split point q and recurses on A[p..q] and A[q+1..r].)
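For concreteness, here is a Lomuto-style partition sketched in Python (my own illustration, not from the thread); it picks the last element as the pivot and returns the pivot's final slot:

def partition(a, p, r):
    # Partition a[p..r] (inclusive) around the pivot x = a[r].
    x = a[r]
    i = p - 1                          # right edge of the "<= pivot" region
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]    # grow the "<= pivot" region
    a[i + 1], a[r] = a[r], a[i + 1]    # move the pivot into its final slot
    return i + 1                       # q: the pivot's final index

With this scheme the recursive calls are quicksort(a, p, q - 1) and quicksort(a, q + 1, r), since slot q is already final.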