Can both of the following loops be parallelized? - parallel-processing

do i=2, n-1
y(i) = y(i+1)
end do
do i=2, n-1
y(i) = y(i+1) - y(i-1)
end do
Hi, I'm wondering if both of these loops can be parallelized? It seems that the y(i+1) part makes it impossible, because it depends on a value that hasn't been generated yet.

If y is an array (it LOOKS like a function, but then you'd be assigning to a function call), then the value y(i+1) already exists before the loop runs, although it is still problematic for parallelizing.

Of course you can.
You just need to write it in such a way that each parallel task doesn't "step on" any other task's memory.
In the second case, that would be tricky, but I'm certain it's possible with enough thought.

In the first case, parallelization is only possible if you have a secondary storage area. For maximum parallelism, you will need a completely separate array:
for all i in [2, n-1] in parallel do
    y'(i) = y(i+1)
end do
If you only want to use two parallel execution units, you will need storage for one element of the array:
e = y(n/2 + 1)
for all i in [0, 1] in parallel do
    for j in [1, n/2 - 1] do
        y(i*n/2 + j) = y(i*n/2 + j + 1)
    end do
end do
y(n/2) = e
You need this to avoid the race condition on the last element of the first half and the first element of the second half. In fact, there is a direct relationship between the amount of additional storage you need and the factor by which you parallelize the code.
The second loop cannot be parallelized as written, since you must already have computed the new y(i-1) in order to compute y(i). This isn't a problem for the first loop, since all values which are eventually read are guaranteed to have the correct value in them before the loop starts. Not so in the second!
For what it's worth, these loops can be combined, if they are meant to be executed sequentially. That would be faster than parallelizing the first and leaving the second alone.
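For illustration, here is a minimal sketch of the separate-array version in Fortran with OpenMP (the scratch array ytmp, the copy-back, and the array size are my additions, not part of the question):

program shift_parallel
   implicit none
   integer, parameter :: n = 1000
   real :: y(n), ytmp(n)
   integer :: i
   call random_number(y)
   !$omp parallel do
   do i = 2, n-1
      ytmp(i) = y(i+1)        ! every iteration writes a distinct slot
   end do
   !$omp end parallel do
   y(2:n-1) = ytmp(2:n-1)     ! copy back once the parallel loop is done
end program shift_parallel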

In both cases you have what's called a "loop-carried dependency":
do i=2, n-1
y(i) = y(i+1) - y(i-1)
end do
The calculation of y(i) depends on y(i +/- 1). In a parallel loop you cannot guarantee the order in which the iterations execute, so y(i+1) may already have been updated to its new value before y(i) is calculated. Worse still, y(i+1) may be in the process of being updated on one thread while another thread attempts to read what might be a corrupt value (because its data is only halfway through being updated). In either case you'll get incorrect answers.
The best solution here is to have a read-only array and a writable array:
do i=2, n-1
yNew(i) = yOld(i+1) - yOld(i-1)
end do
swap(yOld, yNew)
Now your problem goes away, because the array being read (yOld) is never written by the parallel loop. If your language supports pointers, you can easily swap the new/old arrays by maintaining pointers to them and simply swapping the pointers. The only additional overhead is that you need to keep an extra copy of your data as a read-only copy for the loop to refer to.
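In Fortran, for example, the double-buffer-plus-pointer-swap idea might look like this (a minimal sketch assuming OpenMP; array names follow the answer above, sizes and data are arbitrary):

program stencil_buffers
   implicit none
   integer, parameter :: n = 1000
   real, pointer :: yOld(:), yNew(:), tmp(:)
   integer :: i
   allocate(yOld(n), yNew(n))
   call random_number(yOld)
   yNew = yOld                         ! boundary elements keep their values
   !$omp parallel do
   do i = 2, n-1
      yNew(i) = yOld(i+1) - yOld(i-1)  ! reads touch only the old buffer
   end do
   !$omp end parallel do
   tmp => yOld                         ! swap the pointers instead of copying
   yOld => yNew
   yNew => tmp
end program stencil_buffers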

Related

Determining the number of steps in an algorithm

I was going through my Data Structures and Algorithms notes and came across the following examples regarding time complexity and Big-O notation: the columns on the left count the number of operations carried out in each line. I didn't understand why almost all the lines in the first example have a multiple of 2 in front of them, whereas the other two examples don't. Obviously this doesn't affect the resulting O(n), but I would still like to know where the 2 came from.
I can only find one explanation for this: the sloppiness of the author of the slides.
In a proper analysis one has to explain what kinds of operations are performed, at which time, for what input (see, for example, this book on page 21). Without this you cannot even be sure whether the multiplication of 2 numbers counts as 1 operation, 2 operations, or something else.
These slides are inconsistent. For example:
In slide 1, currentMax = A[0] takes 2 operations. That kind of makes sense if you count fetching the 0-th element of the array as 1 operation and the assignment as another. But in slide 3, n iterations of s = s + X[i] take n operations, which means that s = s + X[i] takes 1 operation. That also kind of makes sense if you think of it as just increasing one counter.
But the two are inconsistent with each other: it makes no sense that a = X[0] is 2 operations while a = a + X[0], which does strictly more, takes only 1.

How to find the loop invariant and prove correctness?

int i, temp;
// a is an array of integers a[1...100]
i = 1;
while i < 100
    if a[i] > a[i+1]
        temp = a[i]
        a[i] = a[i+1]
        a[i+1] = temp
    i = i+1
I'm having trouble understanding how to find loop invariants and write a formal statement for them. So a loop invariant is just a condition that is true immediately before and after each iteration of a loop. It looks like the code is doing a swap: if the current element is greater than the following one, they switch places. I mean, from the definition of a loop invariant, it really sounds like it's i < 100, because that must be true for the loop to run, but I don't really understand. Would greatly appreciate some clarification.
Going by your definition, I can see two loop invariant conditions:
1. i <= 100 (note that i < 100 holds before each iteration, but after the final iteration i equals 100)
2. a[i] >= a[j] for all j < i, where i is the loop variable.
This is in fact one outer-loop iteration of bubble sort. At the end of this loop, the highest value in the array has bubbled to the top (a[100]).
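To make invariant 2 concrete, here is a small self-checking sketch (Fortran chosen purely for illustration; the random test data is arbitrary) that asserts the invariant at the top of every iteration:

program bubble_pass
   implicit none
   integer :: a(100), i, temp
   real :: r(100)
   call random_number(r)
   a = int(r * 1000)                  ! arbitrary test data
   i = 1
   do while (i < 100)
      ! invariant: a(i) is the largest element of a(1:i)
      if (a(i) /= maxval(a(1:i))) stop 'invariant violated'
      if (a(i) > a(i+1)) then
         temp = a(i)                  ! swap the out-of-order neighbours
         a(i) = a(i+1)
         a(i+1) = temp
      end if
      i = i + 1
   end do
   print *, 'pass done; a(100) =', a(100), 'max =', maxval(a)
end program bubble_pass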
Roughly speaking, you are right. A loop invariant is "just a condition that is true immediately before and after each iteration of a loop." However, under this definition, there are literally an infinite number of loop invariants for the code in question and most of them are of no particular interest. For example, i < 101, i < 102, i < 103, ... are all loop invariants for the given code.
However, usually we are not interested in simply finding a loop invariant just for the sake of finding a loop invariant. Instead, we are interested in proving a program correct and if we want to prove a program correct then a well-chosen loop invariant turns out to be very useful.
For example, the code in question is the inner loop of the bubble sort algorithm and its purpose is to make the array "more sorted". Hence, to prove total correctness of this code we must prove three things:
(1) When execution gets to the end of the code, the array is a permutation of the array at the beginning of the code.
(2) When execution gets to the end of the code, the number of inversions in the array is either zero or it is less than the number of inversions in the array at the beginning of the code (this condition helps us prove that the outer loop of the bubble sort algorithm terminates).
(3) The code terminates.
To prove (1) we need to consider three execution paths (and a loop invariant will play a critical role when we consider PATH 2).
(PATH 1) Consider what happens when execution starts at the beginning of the code and arrives at the top of the loop for the first time. Since nothing is done to the array on this execution path, the array is a permutation of the array at the beginning of the code.
(PATH 2) Now consider what happens when execution starts at the top of the loop, goes around the loop, and returns to the top of the loop. If a[i] <= a[i+1] then the swap does not occur and, thus, the array is still a permutation of the array at the beginning of the code (since nothing is done to it). Alternatively, if a[i] > a[i+1] then the swap does occur. However, the array is still a permutation of the array at the beginning of the code (since a swap is a type of permutation). Thus, whenever execution gets to the top of the loop, the array is a permutation of the array at the beginning of the code. Note that the statement "the array is a permutation of the array at the beginning of the code" is the well-chosen loop invariant that we need to help us prove that the code is correct.
(PATH 3) Finally, consider what happens when execution starts at the top of the loop but does not enter the loop and instead it goes to the end of the code. Since nothing is done to the array on this execution path, the array is a permutation of the array at the beginning of the code.
These three paths cover all possible ways that execution can go from the beginning of the code to the end of the code and, hence, we have proved (1) the array at the end of the code is a permutation of the array at the beginning of the code.
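The permutation invariant of PATH 2 can likewise be checked mechanically. A brief sketch (again Fortran, purely illustrative) that compares element counts against a snapshot a0 taken at the start:

program perm_invariant
   implicit none
   integer :: a(100), a0(100), i, temp, k
   real :: r(100)
   call random_number(r)
   a = int(r * 10)            ! small value range so duplicates occur
   a0 = a                     ! snapshot of the array at the start
   i = 1
   do while (i < 100)
      ! invariant: a is a permutation of a0 (same element counts)
      do k = 0, 9
         if (count(a == k) /= count(a0 == k)) stop 'not a permutation'
      end do
      if (a(i) > a(i+1)) then
         temp = a(i); a(i) = a(i+1); a(i+1) = temp
      end if
      i = i + 1
   end do
end program perm_invariant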
A loop invariant is some predicate (condition) that holds for every iteration of the loop; it is necessarily true immediately before and immediately after each iteration.
There can of course be infinitely many loop invariants, but since the loop invariant is used to prove correctness of the algorithm, we restrict ourselves to the so-called "interesting" loop invariants.
Your program, whose aim is to sort the given array, is a simple bubble sort.
Goal Statement: The array a is sorted at the end of while loop
Some interesting properties are of the form: at the end of the i-th iteration, the last i elements of the array are sorted, which, when extended to i = 100, results in the whole array being sorted.
Loop invariant for your case: a[100-i : 100-1] is sorted (using zero-based indexing).
Note that when i equals 100, the above statement would mean that the complete array is sorted, which is what you want to be true at the end of the algorithm.
PS: Just realized it is an old question; anyway, it helps in improving my answering skills :)
Your loop is controlled by the test i < 100. Within the body of the loop, i is used in several places but is only assigned in one place. The assignment always happens, and for any value of i which permits entry to the loop the assignment will converge towards the terminating condition. Thus the loop is guaranteed to terminate.
As for correctness of your program, that's a different issue. Depending on whether your arrays use zero-based or one-based indexing, the way you're using i for array accesses could be problematic. If it's zero-based, you never look at the first element and you'll step out of bounds with a[i+1] on the last iteration.

Is it beneficial to transpose an array in order to use column-wise operations?

Assume that we are working with a language which stores arrays in column-major order. Assume also that we have a function which uses 2-D array as an argument, and returns it.
I'm wondering whether one can claim that it is (or isn't) in general beneficial to transpose this array when calling the function, in order to work with column-wise operations instead of row-wise operations, or does the transposing negate the benefits of the column-wise operations?
As an example, in R I have an object of class ts named y which has dimension n x p, i.e. I have p time series of length n.
I need to make some computations with y in Fortran, where I have two loops with the following kind of structure:
do i = 1, n
do j= 1, p
!just an example, some row-wise operations on `y`
x(i,j) = a*y(i,j)
D = ddot(m,y(i,1:p),1,b,1)
! ...
end do
end do
As Fortran (like R) uses column-major storage, it would be better to do the computations with a p x n array instead. So instead of
out<-.Fortran("something",y=array(y,dim(y)),x=array(0,dim(y)))
ynew<-out$out$y
x<-out$out$x
I could use
out<-.Fortran("something2",y=t(array(y,dim(y))),x=array(0,dim(y)[2:1]))
ynew<-t(out$out$y)
x<-t(out$out$x)
where the Fortran subroutine something2 would be something like
do i = 1, n
do j= 1, p
!just an example, some column-wise operations on `y`
x(j,i) = a*y(j,i)
D = ddot(m,y(1:p,i),1,b,1)
! ...
end do
end do
Does the choice of approach always depend on the dimensions n and p or is it possible to say one approach is better in terms of computation speed and/or memory requirements? In my application n is usually much larger than p, which is 1 to 10 in most cases.
More of a comment, but I wanted to put in a bit of code: under old-school F77 you would essentially be forced to use the second approach, as
y(1:p,i)
is simply a pointer to y(1,i), with the p values contiguous in memory.
The first construct
y(i,1:p)
is a list of values spaced out in memory, so it seems to require making a copy of the data to pass to the subroutine. I say "seems" because I haven't the foggiest idea how a modern optimizing compiler deals with these things. I tend to think that at best it's a wash, and at worst this could really hurt. Imagine an array so large you need to page-swap to access the whole vector.
In the end, the only way to answer this is to test it yourself.
----------edit
Did a little testing and confirmed my hunch: passing rows y(i,1:p) does cost you versus passing columns y(1:p,i). I used a subroutine that does practically nothing in order to see the difference. My guess is that with any real subroutine the hit is negligible.
By the way (and maybe this helps in understanding what goes on), passing every other value of a column, y(1:p:2,i), takes longer (by orders of magnitude) than passing the whole column, while passing every other value of a row cuts the time in half versus passing a whole row.
(Using gfortran 12.)
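For anyone who wants to reproduce this, here is a rough harness along those lines (sizes and names are my own; the explicit-shape dummy argument is what makes a strided actual argument force a copy):

program slice_timing
   implicit none
   integer, parameter :: n = 2000
   real :: y(n, n), s
   real :: t0, t1, t2
   integer :: i
   call random_number(y)
   s = 0.0
   call cpu_time(t0)
   do i = 1, n
      s = s + first(y(1:n, i), n)   ! contiguous column: passed in place
   end do
   call cpu_time(t1)
   do i = 1, n
      s = s + first(y(i, 1:n), n)   ! strided row: forces copy-in
   end do
   call cpu_time(t2)
   print *, 'columns:', t1 - t0, 'rows:', t2 - t1, s
contains
   real function first(v, m)
      integer, intent(in) :: m
      real, intent(in) :: v(m)      ! explicit-shape dummy argument
      first = v(1)                  ! deliberately trivial work
   end function first
end program slice_timing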

quicksort quickie: the flow of control in quicksort

In what seems to me a common implementation of quicksort, the program is composed of a partitioning subroutine and two recursive calls to quicksort those (two) partitions.
So the flow of control, in the quickest and pseudo-est of pseudocode, goes something like this:
quicksort[list, some parameters]
.
.
.
q=partition[some other parameters]
quicksort[1,q]
quicksort[q+1,length[list]]
.
.
.
End
The q is the "pivot" after a partitioning. That second quicksort call--the one that'll quicksort the second part of the list, also uses q. This is what I don't understand. If the "flow of control" is going through the first quicksort first, q is going to be updated. How is the same q going to work in the second quicksort, when it comes time to do the second parts of all those partitions?
I think my misunderstanding comes from the limitations of pseudocode. There are details that have likely been left out in expressing this implementation of the quicksort algorithm in pseudocode.
Edit 1 This seems related to my problem:
For[i = 1, i < 5, i = i + 1, Print[i]]
The first time through, we would get i=1, true, i=2, 1. Even though i was updated to 2, i is still 1 in the body (i.e., Print[i] prints 1). This "flow of control" is what I don't understand. Where is the i=1 being stored when it increments to 2, before it gets to the body?
Edit 2
As an example of what I'm trying to get at, I'm pasting this here. It's from here.
Partition(A,p,r)
    x = A[p]
    i = p - 1
    j = r + 1
    while TRUE
        repeat j = j - 1
        until A[j] <= x
        repeat i = i + 1
        until A[i] >= x
        if i < j
            then exchange A[i] with A[j]
            else return j

Quicksort(A,1,length[A])

Quicksort(A,p,r)
    if p < r
        then q = Partition(A,p,r)
             Quicksort(A,p,q)
             Quicksort(A,q+1,r)
Another example can be found here.
Where or when in these algorithms is q being put onto a stack?
q is not updated. The pivot remains in its place. In each invocation of quicksort, the only element that is guaranteed to be in its correct place is the pivot.
Also, note that the q which is "changed" during the recursive call is NOT actually changed, since it is a different variable, stored in a different area. This is true because q is a local variable of the function and is created anew for each call.
EDIT: [response to the question edit]
In quicksort, the algorithm actually generates a number of qs, which are stored on the stack. Every variable is 'alive' only in its own function invocation and is accessible [in this example] only from it. When the function ends, its local variables are released automatically. So you don't have just one pivot; you have a number of pivots, one for each recursive step.
Turns out Quicksort demands extra memory in order to do precisely the bookkeeping you mentioned. Perhaps the following (pseudocode) iterative version of the algorithm might clear things up:
quicksort(array, begin, end) =
    intervals_to_sort = {(begin, end)} // a set
    while there are intervals to sort:
        (begin, end) = remove an interval from intervals_to_sort
        if length of (begin, end) >= 2:
            q = partition(array, begin, end)
            add (begin, q) to intervals_to_sort
            add (q+1, end) to intervals_to_sort
You may notice that now the intervals to sort are being explicitly kept in a data structure (usually just an array, inserting and removing at the end, in a stack-like fashion) so there is no risk of "forgetting" about old intervals.
What might confuse you is that the most common description of Quicksort is recursive so the q variable appears multiple times. The answer to this is that every time a function is called it creates a new batch of local variables so it doesn't touch the old ones. In the end, the explicit stack from that previous imperative example ends up being implemented as an implicit stack with function variables.
(An interesting side note: some early programming languages didn't implement neat local variables like that, and Quicksort was actually first described using the iterative version with the explicit stack. It was only later that it was seen how Quicksort could be elegantly described as a recursive algorithm in Algol.)
As for the part after your edit, the i=1 is forgotten, since assignment destructively updates the variable.
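To make the one-q-per-call point concrete, here is a runnable sketch in Fortran (for illustration only) of the Hoare-style partition and recursive quicksort from the question; note that q is declared inside quicksort, so every invocation gets its own copy:

module qsort_mod
   implicit none
contains
   integer function partition(a, p, r) result(j)
      integer, intent(inout) :: a(:)
      integer, intent(in) :: p, r
      integer :: x, i, t
      x = a(p)                       ! pivot value
      i = p - 1
      j = r + 1
      do
         do
            j = j - 1
            if (a(j) <= x) exit
         end do
         do
            i = i + 1
            if (a(i) >= x) exit
         end do
         if (i < j) then
            t = a(i); a(i) = a(j); a(j) = t
         else
            return                   ! j is the split point
         end if
      end do
   end function partition

   recursive subroutine quicksort(a, p, r)
      integer, intent(inout) :: a(:)
      integer, intent(in) :: p, r
      integer :: q                   ! local: every call gets a fresh q
      if (p < r) then
         q = partition(a, p, r)
         call quicksort(a, p, q)     ! the recursion below cannot touch
         call quicksort(a, q+1, r)   ! this call's q
      end if
   end subroutine quicksort
end module qsort_mod

program test_qsort
   use qsort_mod
   implicit none
   integer :: a(10)
   real :: r(10)
   call random_number(r)
   a = int(r * 100)
   call quicksort(a, 1, size(a))
   print *, a
end program test_qsort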
The partition code picks some value from the array (such as the value at the midpoint of the array; your example code picks the first element) as the pivot. It then arranges the array so that all values <= pivot come before all values >= pivot, returning the boundary q. Then the algorithm sorts the partition [p, q] and the partition [q+1, r], which are disjoint but together cover all of A; since every element of the first partition is <= every element of the second, sorting both results in the entire array being sorted.

Mergesort that saves memory

I'm taking an Algorithms class, and the latest homework is really stumping me. Essentially, the assignment is to implement a version of merge sort that doesn't allocate as much temporary memory as the implementation in CLRS. I'm supposed to do this by creating only 1 temp array at the beginning and putting all the temp stuff in it while splitting and merging.
I should also note that the language of the class is Lua, which is important because the only available data structures are tables. They're like Java maps in that they come in key-value pairs, but they're like arrays in that you don't have to insert things in pairs: if you insert only one thing, it's treated as a value, and its key will be what its index would be in a language with real arrays. At least that's how I understand it, since I'm new to Lua as well. Also, anything at all (primitives, strings, objects, etc.) can be a key, even different types in the same table.
Anyway, 2 things that are confusing me:
First, well, how is it done? Do you just keep overwriting the temp array with each recursion of splitting and merging?
Second, I'm really confused about the homework instructions (I'm auditing the class for free so I can't ask any of the staff). Here are the instructions:
Write a top-level procedure merge_sort that takes as its argument the array to sort. It should declare a temporary array and then call merge_sort_1, a procedure of four arguments: the array to sort, the one to use as temporary space, and the start and finish indexes within which this call to merge_sort_1 should work.

Now write merge_sort_1, which computes the midpoint of the start–finish interval, and makes a recursive call to itself for each half. After that it calls merge to merge the two halves.

The merge procedure you write now will be a function of the permanent array and the temporary array, the start, the midpoint, and the finish. It maintains an index into the temporary array and indices i, j into each (sorted) half of the permanent array.

It needs to walk through the temporary array from start to finish, copying a value either from the lower half of the permanent array or from the upper half of the permanent array. It chooses the value at i in the lower half if that is less than or equal to the value at j in the upper half, and advances i. It chooses the value at j in the upper half if that is less than the value at i in the lower half, and advances j.

After one part of the permanent array is used up, be sure to copy the rest of the other part. The textbook uses a trick with an infinite value ∞ to avoid checking whether either part is used up. However, that trick is hard to apply here, since where would you put it?

Finally, copy all the values from start to finish in the temporary array back to the permanent array.
Number 2 is confusing because I have no idea what merge_sort_1 is supposed to do, and why it has to be a different method from merge_sort. I also don't know why it needs to be passed starting and ending indexes. In fact, maybe I misread something, but the instructions sound like merge_sort_1 doesn't do any real work.
Also, the whole assignment is confusing because I don't see from the instructions where the splitting is done to make 2 halves of the original array. Or am I misunderstanding mergesort?
I hope I've made some sense. Thanks everyone!
First, I would make sure you understand mergesort.
Look at this explanation, with fancy animations to help you understand it.
This is their pseudo code version of it:
# split in half
m = n / 2
# recursive sorts
sort a[1..m]
sort a[m+1..n]
# merge sorted sub-arrays using temp array
b = copy of a[1..m]
i = 1, j = m+1, k = 1
while i <= m and j <= n,
    a[k++] = (a[j] < b[i]) ? a[j++] : b[i++]
    → invariant: a[1..k] in final position
while i <= m,
    a[k++] = b[i++]
    → invariant: a[1..k] in final position
See how they use b to hold a temporary copy of the data?
What your teacher wants is for you to pass one table in to be used for this temporary storage.
Does that clear up the assignment?
Your main sort routine would look like this (sorry, I don't know Lua, so I'll write some Javaish code):
void merge_sort(int[] array) {
    int[] t = ...allocate a temporary array...
    merge_sort_1(array, 0, array.length, t);
}
merge_sort_1 takes an array to sort, some start and finish indexes, and an array to use for some temporary space. It does the actual divide-and-conquer calls and calls to the merge routine. Note that the recursive calls need to go to merge_sort_1 and not merge_sort because you don't want to allocate the array on each recursive level, just once at the start of the merge sort procedure. (This is the whole point in dividing the merge sort into two routines.)
I'll leave it up to you to write a merge routine. It should take the original array (containing 2 sorted sub-parts) and a temporary array, and sort the original array. The easiest way to do that is to merge into the temporary array, then just copy it back when done.
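To show the overall shape without spoiling the Lua exercise, here is one possible reading of the assignment's three-routine structure sketched in Fortran (the routine names follow the instructions; everything else is my own choice, not the official solution):

module msort_mod
   implicit none
contains
   ! Top-level wrapper: allocates the single temporary array once.
   subroutine merge_sort(a)
      integer, intent(inout) :: a(:)
      integer, allocatable :: t(:)
      allocate(t(size(a)))
      call merge_sort_1(a, t, 1, size(a))
   end subroutine merge_sort

   ! Recursive worker: sorts a(start:finish), using t as scratch space.
   recursive subroutine merge_sort_1(a, t, start, finish)
      integer, intent(inout) :: a(:), t(:)
      integer, intent(in) :: start, finish
      integer :: mid
      if (finish - start < 1) return   ! segments of length 0 or 1
      mid = (start + finish) / 2
      call merge_sort_1(a, t, start, mid)
      call merge_sort_1(a, t, mid+1, finish)
      call merge(a, t, start, mid, finish)
   end subroutine merge_sort_1

   ! Merge the two sorted halves through t, then copy back into a.
   subroutine merge(a, t, start, mid, finish)
      integer, intent(inout) :: a(:), t(:)
      integer, intent(in) :: start, mid, finish
      integer :: i, j, k
      i = start
      j = mid + 1
      do k = start, finish
         if (j > finish) then          ! upper half used up
            t(k) = a(i); i = i + 1
         else if (i > mid) then        ! lower half used up
            t(k) = a(j); j = j + 1
         else if (a(i) <= a(j)) then   ! stable: lower half wins ties
            t(k) = a(i); i = i + 1
         else
            t(k) = a(j); j = j + 1
         end if
      end do
      a(start:finish) = t(start:finish)
   end subroutine merge
end module msort_mod

program test_msort
   use msort_mod
   implicit none
   integer :: a(8) = (/5, 2, 7, 1, 9, 3, 8, 4/)
   call merge_sort(a)
   print *, a
end program test_msort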
First, well, how is it done? Do you just keep overwriting the temp array with each recursion of splitting and merging?
Yes, the temp array keeps getting overwritten. The temp array is used during the merge phase to hold the merge results that are then copied back into the permanent array at the end of the merge.
Number 2 is confusing because I have no idea what merge_sort_1 is supposed to do, and why it has to be a different method from merge_sort.
merge_sort_1 is the recursive core of the merge sort. merge_sort is only a convenience function that creates the temp array and supplies the initial start and finish positions.
I also don't know why it needs to be passed starting and ending indexes. In fact, maybe I misread something, but the instructions sound like merge_sort_1 doesn't do any real work.
Also, the whole assignment is confusing because I don't see from the instructions where the splitting is done to make 2 halves of the original array. Or am I misunderstanding mergesort?
The recursive function merge_sort_1 only works on a portion of the passed-in array, the portion defined by the start and ending indexes. The midpoint between start and end is where the array is split, and the halves are split again on the recursive calls. After the recursive calls for the lower and upper halves are complete, the two halves are merged into the temp array and then copied back to the permanent array.
I was able to write the merge sort in Lua as described and can comment on my implementation. It does seem as though the instructions were written as comments in or about the teacher's implementation.
Here is the merge_sort function. As I said, it is only a convenience function and, I feel, not the meat of the problem.
-- Write a top level procedure merge_sort that takes as its argument
-- the array to sort.
function merge_sort(a)
    -- It should declare a temporary array and then call merge_sort_1,
    -- a procedure of four arguments: The array to sort, the one to use
    -- as temporary space, and the start and finish indexes within which
    -- this call to merge_sort_1 should work.
    merge_sort_1(a, {}, 1, #a)
end
