OpenMP - running things in parallel and some in sequence within them

OpenMP - running things in parallel and some in sequence within them - openmp

I have a scenario like:
for (i = 0; i < n; i++)
{
for (j = 0; j < m; j++)
{
for (k = 0; k < x; k++)
{
val = 2*i + j + 4*k
if (val != 0)
{
for(t = 0; t < l; t++)
{
someFunction((i + t) + someFunction(j + t) + k*t)
}
}
}
}
}
Considering this is block A, Now I have two more similar blocks in my code. I want to put them in parallel, so I used OpenMP pragmas. However I am not able to parallelize it, because I am a tad confused that which variables would be shared and private in this case. If the function call in the inner loop was an operation like sum += x, then I could have added a reduction clause.
In general, how would one approach parallelizing a code using OpenMP, when we there is a nested for loop, and then another inner for loop doing the main operation.
I tried declaring a parallel region, and then simply putting pragma fors before the blocks, but definitely I am missing a point there!
Thanks,
Sayan

I'm more of a Fortran programmer than C so my knowledge of OpenMP in C-style is poor, and I'll leave the syntax to you.
Your easiest approach here is probably (I'll qualify this later) to simply parallelise the outermost loop. By default OpenMP will regard variable i as private, all the rest as shared. This is probably not what you want, you probably want to make j and k and t private too. I suspect that you want val private also.
I'm a bit puzzled by the statement at the bottom of your nest of loops (ie someFunction...), which doesn't seem to return any value at all. Does it work by side-effects ?
So, you shouldn't need to declare a parallel region enclosing all this code, and you should probably only parallelise the outermost loop. If you were to parallelise the inner loops too you might find your OpenMP installation either ignoring them, spawning more processes than you have processors, or complaining bitterly.
I say that your easiest approach is probably to parallelise the outermost loop because I've made some assumptions about what your program (fragment) is doing. If the assumptions are wrong you might want to parallelise one of the inner loops. Another point to check is that the number of executions of the loop(s) you parallelise is much greater than the number of threads you use. You don't want to have OpenMP run loops with a trip count of, say, 7, on 4 threads, the load balance would be very poor.

You're correct, the innermost statement would rather be someFunction((i + t) + someFunction2(j + t) + k*t).

Related

Nested loops for beginners

I can't seem to understand the logic of nested loops.
Any tips or easy examples about how to make nested loops less complicated? New to programming. Thanks.

Nested loops are essential for a good programmer. They are usually used to manage matrices in order to avoid code repetition. Let's have an example:
for(int i = 0; i < 5; i++){
for(int j = 0; j < 5; j++){
Log.i("i:", i);
Log.i("j:", j);
}
}
After using this code (Java) several numbers are written in the log of the IDE. As you can see, the inner loop is run first (in our example 5 strings are written, because the loop needs to be run 5 times in order to make j become equal to 5). After the first 5 results are carried out, the program increments i and runs the inner loop again. The outer loop is also run 5 times, indeed it takes 5 times to make i equal to 5.
As you can notice, the code written inside the two nested loops is run i*j times.
I hope my answer was clear enaugh.

Can 2 small loops be faster than a big one?

I was watching this video "how did we end up here?" by Martin Thompson of mechanical-sympathy.
(http://m.youtube.com/watch?v=oxjT7veKi9c)
He claims that to make use of the L0 cache, sometimes it's better to have 2 small loops rather than a big one even though we might to to pass through the same list twice.
Is it possible? Anyway to create a trivial example code with measurement to demonstrate this?

Simple example:
double sum1 = 0, sum2 = 0;
for (i = n; --i >= 0;){
sum1 += a[i];
sum2 += b[i];
}
as against:
double sum1 = 0, sum2 = 0;
for (i = n; --i >= 0;){
sum1 += a[i];
}
for (i = n; --i >= 0;){
sum2 += b[i];
}
In the first example, the compiler has to generate code to "switch context" between indexing a[i] and b[i], and keeping track of where the addition goes.
If a and b are complicated, the compiler may be unable to hold references to both of them in registers.
The result can be that this "context switching", because it has to be done on every iteration, takes more instruction cycles than the cost of the extra loop.
(With unrolling, it is even more true.)
This is still without considering cache issues.

"Sometimes", maybe. If the loop's body may be split into parts without much overhead than the total count of executed instructions, either in two small loops or in one big loop, could be almost the same. And data cache helps anyway when traversing the input twice.
Yet I doubt if this trick could be really useful in general.

parallelization of nested loop in threading building blocks

I am new to threading building blocks and trying to encode the FFT algorithm in TBB to get some hands on experience. In case of this algorithm i can only parallelize the innermost loop. but by doing so the performance has degraded up to unacceptable degree (more than thousand times). I have tried for array size up to 2^20 but same result. My code is given below
for(stage1=0;stage1 < lim;stage1 ++)
{
butterfly_index1 = butterfly_index2;
butterfly_index2 = butterfly_index2 + butterfly_index2;
k = -1*2*PI/butterfly_index2;
k_next = 0.0;
for(stage2 = 0 ; stage2 < butterfly_index1;stage2++)
{
sine=sin(k_next);
cosine = cos(k_next);
k_next = k_next + k;
FFT. sine = &sine;
FFT.cosine = &cosine;
FFT.n2 = &butterfly_index2;
FFT.loop_init = &stage2;
FFT.n1 = &butterfly_index1;
parallel_for(blocked_range<int>(
stage2,SIZE,SIZE/4),FFT,simple_partitioner());
}
}
and body for parallel_loop is
void operator()(const blocked_range<int> &r)const
{
for(int k = r.begin(); k != r.end(); k++)
{
if(k != *loop_init)
{
if((k - (*loop_init))% (* n2)!= 0)
continue;
}
temp_real = (*cosine) * X[k + *n1].real - (*sine) * X[k + *n1].img;
temp_img = (*sine)* X[k + *n1].real + (*cosine) * X[k + *n1].img;
X[k + *n1].real = X[k].real - temp_real;
X[k + *n1].img = X[k].img - temp_img;
X[k].real = X[k].real + temp_real;
X[k].img = X[k].img + temp_img;
}
}
If i replace it with normal loop then things are correct.

For very short workloads, the huge slowdown can happen due to the overhead on thread creation. For 2^20 elements in the array, I do not believe such a huge performance drop happens.
Another significant source of performance degradation is compiler's inability to optimize (in particular, vectorize) the code after it has been TBBfied. Find out whether your compiler may produce a vectorization report, and look for differences between serial and TBB versions.
A possible source of slowdown might be the copy constructor of parallel_for's body class, because the bodies are copied a lot. But the given code does not look suspicious in this regard: seems the body contains a few pointers. Anyway, look whether it could be a problem.
Another usual source of significant overhead is too fine-grained parallelism, i.e. a lot of tasks each containing very little work. But here this is not the case either, because grainsize SIZE/4 in the 3rd parameter to blocked_range specifies that body's operator()() will be invoked at most 4 times per the algorithm.
I would recommend to not specify simple_partitioner and grainsize for initial experiments, and instead let TBB distribute the work dynamically. You can tune it later if necessary.

the problem with my code was the decrease in workload with increase in n2 variable. so as the out loop proceeds the workload for parallel_for become half and after just a few iteration it becomes too small to give performance boost by using TBB. So solution is to parallelize the inner loop only for iterations when the workload for inner loop is sufficient ans serialize the inner loop for rest of iterations. secondly the "for" loop header that contains the condition check (k != r.end()) also degrades the performance. solution is to replace the r.end() with a locally defined variable that is initialized to r.end()
Thanks to intel software forum for their assistance in solving this problem

Naming the counter variables of consecutive for-loops

Let's say you have a piece of code where you have a for-loop, followed by another for-loop and so on... now, which one is preferable
Give every counter variable the same name:
for (int i = 0; i < someBound; i++) {
doSomething();
}
for (int i = 0; i < anotherBound; i++) {
doSomethingElse();
}
Give them different names:
for (int i = 0; i < someBound; i++) {
doSomething();
}
for (int j = 0; j < anotherBound; j++) {
doSomethingElse();
}
I think the second one would be somewhat more readable, on the other hand I'd use j,k and so on to name inner loops... what do you think?

I reuse the variable name in this case. The reason being that i is sort of international programmerese for "loop control variable whose name isn't really important". j is a bit less clear on that score, and once you have to start using k and beyond it gets kind of obscure.
One thing I should add is that when you use nested loops, you do have to go to j, k, and beyond. Of course if you have more than three nested loops, I'd highly suggest a bit of refactoring.

first one is good for me,. cz that would allow you to use j, k in your inner loops., and because you are resetting i = 0 in the second loop so there wont be any issues with old value being used

In a way you wrote your loops the counter is not supposed to be used outside the loop body. So there's nothing wrong about using the same variable names.
As for readability i, j, k are commonly used as variable names for counters. So it is even better to use them rather then pick the next letter over and over again.

I find it interesting that so many people have different opinions on this. Personally I prefer the first method, if for no other reason then to keep j and k open. I could see why people would prefer the second one for readability, but I think any coder worth handing a project over to is going to be able to see what you're doing with the first situation.

The variable should be named something related to the operation or the boundary condition.
For example:
'indexOfPeople',
'activeConnections', or
'fileCount'.
If you are going to use 'i', 'j', and 'k', then reserve 'j' and 'k' for nested loops.

void doSomethingInALoop() {
for (int i = 0; i < someBound; i++) {
doSomething();
}
}
void doSomethingElseInALoop() {
for (int i = 0; i < anotherBound; i++) {
doSomethingElse();
}
}

If the loops are doing the same things (the loop control -- not the loop body, i.e. they are looping over the same array or same range), then I'd use the same variable.
If they are doing different things -- A different array, or whatever, then I'd use use different variables.

So, on the one-hundredth loop, you'd name the variable "zzz"?
The question is really irrelevant since the variable is defined local to the for-loop. Some flavors of C, such as on OpenVMS, require using different names. Otherwise, it amounts to programmer's preference, unless the compiler restricts it.

Should one use < or <= in a for loop [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
If you had to iterate through a loop 7 times, would you use:
for (int i = 0; i < 7; i++)
or:
for (int i = 0; i <= 6; i++)
There are two considerations:
performance
readability
For performance I'm assuming Java or C#. Does it matter if "less than" or "less than or equal to" is used? If you have insight for a different language, please indicate which.
For readability I'm assuming 0-based arrays.
UPD: My mention of 0-based arrays may have confused things. I'm not talking about iterating through array elements. Just a general loop.
There is a good point below about using a constant to which would explain what this magic number is. So if I had "int NUMBER_OF_THINGS = 7" then "i <= NUMBER_OF_THINGS - 1" would look weird, wouldn't it.

The first is more idiomatic. In particular, it indicates (in a 0-based sense) the number of iterations. When using something 1-based (e.g. JDBC, IIRC) I might be tempted to use <=. So:
for (int i=0; i < count; i++) // For 0-based APIs
for (int i=1; i <= count; i++) // For 1-based APIs
I would expect the performance difference to be insignificantly small in real-world code.

Both of those loops iterate 7 times. I'd say the one with a 7 in it is more readable/clearer, unless you have a really good reason for the other.

I remember from my days when we did 8086 Assembly at college it was more performant to do:
for (int i = 6; i > -1; i--)
as there was a JNS operation that means Jump if No Sign. Using this meant that there was no memory lookup after each cycle to get the comparison value and no compare either. These days most compilers optimize register usage so the memory thing is no longer important, but you still get an un-required compare.
By the way putting 7 or 6 in your loop is introducing a "magic number". For better readability you should use a constant with an Intent Revealing Name. Like this:
const int NUMBER_OF_CARS = 7;
for (int i = 0; i < NUMBER_OF_CARS; i++)
EDIT: People aren’t getting the assembly thing so a fuller example is obviously required:
If we do for (i = 0; i <= 10; i++) you need to do this:
mov esi, 0
loopStartLabel:
; Do some stuff
inc esi
; Note cmp command on next line
cmp esi, 10
jle exitLoopLabel
jmp loopStartLabel
exitLoopLabel:
If we do for (int i = 10; i > -1; i--) then you can get away with this:
mov esi, 10
loopStartLabel:
; Do some stuff
dec esi
; Note no cmp command on next line
jns exitLoopLabel
jmp loopStartLabel
exitLoopLabel:
I just checked and Microsoft's C++ compiler does not do this optimization, but it does if you do:
for (int i = 10; i >= 0; i--)
So the moral is if you are using Microsoft C++†, and ascending or descending makes no difference, to get a quick loop you should use:
for (int i = 10; i >= 0; i--)
rather than either of these:
for (int i = 10; i > -1; i--)
for (int i = 0; i <= 10; i++)
But frankly getting the readability of "for (int i = 0; i <= 10; i++)" is normally far more important than missing one processor command.
† Other compilers may do different things.

I always use < array.length because it's easier to read than <= array.length-1.
also having < 7 and given that you know it's starting with a 0 index it should be intuitive that the number is the number of iterations.

Seen from an optimizing viewpoint it doesn't matter.
Seen from a code style viewpoint I prefer < . Reason:
for ( int i = 0; i < array.size(); i++ )
is so much more readable than
for ( int i = 0; i <= array.size() -1; i++ )
also < gives you the number of iterations straight away.
Another vote for < is that you might prevent a lot of accidental off-by-one mistakes.

#Chris, Your statement about .Length being costly in .NET is actually untrue and in the case of simple types the exact opposite.
int len = somearray.Length;
for(i = 0; i < len; i++)
{
somearray[i].something();
}
is actually slower than
for(i = 0; i < somearray.Length; i++)
{
somearray[i].something();
}
The later is a case that is optimized by the runtime. Since the runtime can guarantee i is a valid index into the array no bounds checks are done. In the former, the runtime can't guarantee that i wasn't modified prior to the loop and forces bounds checks on the array for every index lookup.

It makes no effective difference when it comes to performance. Therefore I would use whichever is easier to understand in the context of the problem you are solving.

I prefer:
for (int i = 0; i < 7; i++)
I think that translates more readily to "iterating through a loop 7 times".
I'm not sure about the performance implications - I suspect any differences would get compiled away.

In C++, I prefer using !=, which is usable with all STL containers. Not all STL container iterators are less-than comparable.

In Java 1.5 you can just do
for (int i: myArray) {
...
}
so for the array case you don't need to worry.

I don't think there is a performance difference. The second form is definitely more readable though, you don't have to mentally subtract one to find the last iteration number.
EDIT: I see others disagree. For me personally, I like to see the actual index numbers in the loop structure. Maybe it's because it's more reminiscent of Perl's 0..6 syntax, which I know is equivalent to (0,1,2,3,4,5,6). If I see a 7, I have to check the operator next to it to see that, in fact, index 7 is never reached.

I'd say use the "< 7" version because that's what the majority of people will read - so if people are skim reading your code, they might interpret it wrongly.
I wouldn't worry about whether "<" is quicker than "<=", just go for readability.
If you do want to go for a speed increase, consider the following:
for (int i = 0; i < this->GetCount(); i++)
{
// Do something
}
To increase performance you can slightly rearrange it to:
const int count = this->GetCount();
for (int i = 0; i < count; ++i)
{
// Do something
}
Notice the removal of GetCount() from the loop (because that will be queried in every loop) and the change of "i++" to "++i".

Edsger Dijkstra wrote an article on this back in 1982 where he argues for lower <= i < upper:
There is a smallest natural number. Exclusion of the lower bound —as in b) and d)— forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer the ≤ as in a) and c). Consider now the subsequences starting at the smallest natural number: inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one. That is ugly, so for the upper bound we prefer < as in a) and d). We conclude that convention a) is to be preferred.

First, don't use 6 or 7.
Better to use:
int numberOfDays = 7;
for (int day = 0; day < numberOfDays ; day++){
}
In this case it's better than using
for (int day = 0; day <= numberOfDays - 1; day++){
}
Even better (Java / C#):
for(int day = 0; day < dayArray.Length; i++){
}
And even better (C#)
foreach (int day in days){// day : days in Java
}
The reverse loop is indeed faster but since it's harder to read (if not by you by other programmers), it's better to avoid in. Especially in C#, Java...

I agree with the crowd saying that the 7 makes sense in this case, but I would add that in the case where the 6 is important, say you want to make clear you're only acting on objects up to the 6th index, then the <= is better since it makes the 6 easier to see.

Way back in college, I remember something about these two operations being similar in compute time on the CPU. Of course, we're talking down at the assembly level.
However, if you're talking C# or Java, I really don't think one is going to be a speed boost over the other, The few nanoseconds you gain are most likely not worth any confusion you introduce.
Personally, I would author the code that makes sense from a business implementation standpoint, and make sure it's easy to read.

This falls directly under the category of "Making Wrong Code Look Wrong".
In zero-based indexing languages, such as Java or C# people are accustomed to variations on the index < count condition. Thus, leveraging this defacto convention would make off-by-one errors more obvious.
Regarding performance: any good compiler worth its memory footprint should render such as a non-issue.

As a slight aside, when looping through an array or other collection in .Net, I find
foreach (string item in myarray)
{
System.Console.WriteLine(item);
}
to be more readable than the numeric for loop. This of course assumes that the actual counter Int itself isn't used in the loop code. I do not know if there is a performance change.

There are many good reasons for writing i<7. Having the number 7 in a loop that iterates 7 times is good. The performance is effectively identical. Almost everybody writes i<7. If you're writing for readability, use the form that everyone will recognise instantly.

I have always preferred:
for ( int count = 7 ; count > 0 ; -- count )

Making a habit of using < will make it consistent for both you and the reader when you are iterating through an array. It will be simpler for everyone to have a standard convention. And if you're using a language with 0-based arrays, then < is the convention.
This almost certainly matters more than any performance difference between < and <=. Aim for functionality and readability first, then optimize.
Another note is that it would be better to be in the habit of doing ++i rather than i++, since fetch and increment requires a temporary and increment and fetch does not. For integers, your compiler will probably optimize the temporary away, but if your iterating type is more complex, it might not be able to.

Don't use magic numbers.
Why is it 7? ( or 6 for that matter).
use the correct symbol for the number you want to use...
In which case I think it is better to use
for ( int i = 0; i < array.size(); i++ )

The '<' and '<=' operators are exactly the same performance cost.
The '<' operator is a standard and easier to read in a zero-based loop.
Using ++i instead of i++ improves performance in C++, but not in C# - I don't know about Java.

As people have observed, there is no difference in either of the two alternatives you mentioned. Just to confirm this, I did some simple benchmarking in JavaScript.
You can see the results here. What is not clear from this is that if I swap the position of the 1st and 2nd tests, the results for those 2 tests swap, this is clearly a memory issue. However the 3rd test, one where I reverse the order of the iteration is clearly faster.

As everybody says, it is customary to use 0-indexed iterators even for things outside of arrays. If everything begins at 0 and ends at n-1, and lower-bounds are always <= and upper-bounds are always <, there's that much less thinking that you have to do when reviewing the code.

Great question. My answer: use type A ('<')
You clearly see how many iterations you have (7).
The difference between two endpoints is the width of the range
Less characters makes it more readable
You more often have the total number of elements i < strlen(s) rather than the index of the last element so uniformity is important.
Another problem is with this whole construct. i appears 3 times in it, so it can be mistyped. The for-loop construct says how to do instead of what to do. I suggest adopting this:
BOOST_FOREACH(i, IntegerInterval(0,7))
This is more clear, compiles to exaclty the same asm instructions, etc. Ask me for the code of IntegerInterval if you like.

So many answers ... but I believe I have something to add.
My preference is for the literal numbers to clearly show what values "i" will take in the loop. So in the case of iterating though a zero-based array:
for (int i = 0; i <= array.Length - 1; ++i)
And if you're just looping, not iterating through an array, counting from 1 to 7 is pretty intuitive:
for (int i = 1; i <= 7; ++i)
Readability trumps performance until you profile it, as you probably don't know what the compiler or runtime is going to do with your code until then.

You could also use != instead. That way, you'll get an infinite loop if you make an error in initialization, causing the error to be noticed earlier and any problems it causes to be limitted to getting stuck in the loop (rather than having a problem much later and not finding it).

I think either are OK, but when you've chosen, stick to one or the other. If you're used to using <=, then try not to use < and vice versa.
I prefer <=, but in situations where you're working with indexes which start at zero, I'd probably try and use <. It's all personal preference though.

Strictly from a logical point of view, you have to think that < count would be more efficient than <= count for the exact reason that <= will be testing for equality as well.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

OpenMP - running things in parallel and some in sequence within them - openmp

You're correct, the innermost statement would rather be someFunction((i + t) + someFunction2(j + t) + k*t).

Related

Nested loops for beginners

Can 2 small loops be faster than a big one?

parallelization of nested loop in threading building blocks

Naming the counter variables of consecutive for-loops

Should one use < or <= in a for loop [closed]

Categories

Resources