I was trying to implement some piece of parallel code and tried to synchronize the threads using an array of flags as shown below
// flags array set to zero initially
#pragma omp parallel for num_threads (n_threads) schedule(static, 1)
for(int i = 0; i < n; i ++){
for(int j = 0; j < i; j++) {
while(!flag[j]);
y[i] -= L[i][j]*y[j];
}
y[i] /= L[i][i];
flag[i] = 1;
}
However, the code always gets stuck after a few iterations when I try to compile it using gcc -O3 -fopenmp <file_name>. I have tried different number of threads like 2, 4, 8 all of them leads to the loop getting stuck. On putting print statements inside critical sections, I figured out that even though the value of flag[i] gets updated to 1, the while loop is still stuck or maybe there is some other problem with the code, I am not aware of.
I also figured out that if I try to do something inside the while block like printf("Hello\n") the problem goes away. I think there is some problem with the memory consistency across threads but I do not know how to resolve this. Any help would be appreciated.
Edit: The single threaded code I am trying to parallelise is
for(int i=0; i<n; i++){
for(int j=0; j < i; j++){
y[i]-=L[i][j]*y[j];
}
y[i]/=L[i][i];
}
You have data race in your code, which is easy to fix, but the bigger problem is that you also have loop carried dependency. The result of your code does depend on the order of execution. Try reversing the i loop without OpenMP, you will get different result, so your code cannot be parallelized efficiently.
One possibility is to parallelize the j loop, but the workload is very small inside this loop, so the OpenMP overheads will be significantly bigger than the speed gain by parallelization.
EDIT: In the case of your updated code I suggest to forget parallelization (because of loop carried dependency) and make sure that inner loop is properly vectorized, so I suggest the following:
for(int i = 0; i < n; i ++){
double sum_yi=y[i];
#pragma GCC ivdep
for(int j = 0; j < i; j++) {
sum_yi -= L[i][j]*y[j];
}
y[i] = sum_yi/L[i][i];
}
#pragma GCC ivdep tells the compiler that there is no loop carried dependency in the loop, so it can vectorize it safely. Do not forget to inform compiler the about the vectorization capabilities of your processor (e.g. use -mavx2 flag if your processor is AVX2 capable).
void ompClassifyToClusteres(Point* points, Cluster* clusteres, int
numOfPoints, int numOfClusteres, int myid) {
int i, j;
Cluster closestCluster;
double closestDistance;
double tempDistance;
omp_set_num_threads(OMP_NUM_OF_THREADS);
#pragma omp parallel private(j)
{
#pragma omp for
for (i = 0; i < numOfPoints; i++) {
closestCluster = clusteres[0];
closestDistance = distanceFromClusterCenter(points[i], closestCluster);
for (j = 1; j <= numOfClusteres; j++) {
tempDistance = distanceFromClusterCenter(points[i], clusteres[j]);
if (tempDistance < closestDistance) {
closestCluster = clusteres[j];
closestDistance = tempDistance;
}
}
points[i].clusterId = closestCluster.id;
}
}
printPoints(points, numOfPoints);
}
Output:
!(output
Im trying to classify points to clusteres for K-MEANS algorithm.
So im getting this output (dont notice the checks) in one execution and the right results in the second execution and goes on..
I tried to put some varibales in private but it didnt work.
Ill just say that this 3 points need to classify to cluster 0 and im guessing theres a race or something but i cant figure it out.
Yes, there is a race condition. tempDistance, closestCluster, and closestDistance should be private also. A good check is to ask yourself, do you need these variables to be different for each for loop iteration, if they happened at the same time?
You could make them private with the private() clause, like you did with j, or just declare them within the outer for loop.
I would like to know if these 2 codes are the same for performance respecting the variable declarations:
int Value;
for(int i=0; i<1000; i++)
{ Value = i;
}
or
for(int i=0; i<1000; i++)
{ int Value = i;
}
Basically I need to know if the process time to create the variable Value and allocate it in Ram is just once in the first case and if it is, or not, repeated 1000 times in the second.
If you are Programming in c++ or c#, there will be no runtime difference since no implicite initialization will be done for simple int type.
I'm currently working with a big container that i need to iterater over an do lots of things. Which of the following is considered better style?
a) Big Loop:
for (container::iterator it = myContainer.begin(); it < myContainer.end(); ++it) {
// do thing a with *it
// do thing d with *it
// do thing c with *it
// do thing b with *it
}
b) Small Loops:
for (container::iterator it = myContainer.begin(); it < myContainer.end(); ++it) {
// do thing a with *it
}
for (container::iterator it = myContainer.begin(); it < myContainer.end(); ++it) {
// do thing b with *it
}
for (container::iterator it = myContainer.begin(); it < myContainer.end(); ++it) {
// do thing c with *it
}
for (container::iterator it = myContainer.begin(); it < myContainer.end(); ++it) {
// do thing d with *it
}
I find b) better to read since a) will get clunky really easy and harder to understand. But a) will run faster I think. So which of those is better style? myContainer will contain up to 10.000 elements and this procedure will have to be repeated so performance is important.
There are further refactoring benefits to splitting your separate tasks to different loops.
Loop performance isn't always very straight-forward when handling big data sets so you should usually start with readability and go back to single loop if needed.
Here's Split Loop refactoring example by Martin Fowler.
You often see loops that are doing two different things at once, because
they can do that with one pass through a loop. Indeed most programmers
would feel very uncomfortable with this refactoring as it forces you to
execute the loop twice - which is double the work.
But like so many optimizations, doing two different things in one loop
is less clear than doing them separately. It also causes problems for
further refactoring as it introduces temps that get in the way of
further refactorings. So while refactoring, don't be afraid to get rid
of the loop. When you optimize, if the loop is slow that will show up
and it would be right to slam the loops back together at that point. You
may be surprised at how often the loop isn't a bottleneck, or how the
later refactorings open up another, more powerful, optimization.
Original code
void printValues() {
double averageAge = 0;
double totalSalary = 0;
for (int i = 0; i < people.length; i++) {
averageAge += people[i].age;
totalSalary += people[i].salary;
}
averageAge = averageAge / people.length;
System.out.println(averageAge);
System.out.println(totalSalary);
}
Becomes initially
void printValues() {
double totalSalary = 0;
for (int i = 0; i < people.length; i++) {
totalSalary += people[i].salary;
}
double averageAge = 0;
for (int i = 0; i < people.length; i++) {
averageAge += people[i].age;
}
averageAge = averageAge / people.length;
System.out.println(averageAge);
System.out.println(totalSalary);
}
Cleaned code after extracting methods
void printValues() {
System.out.println(averageAge());
System.out.println(totalSalary());
}
private double averageAge() {
double result = 0;
for (int i = 0; i < people.length; i++) {
result += people[i].age;
}
return result / people.length;
}
private double totalSalary() {
double result = 0;
for (int i = 0; i < people.length; i++) {
result += people[i].salary;
}
return result;
}
You should use the single loop. You could refactor the code contained inside the loop into separate functions.
Splitting the big loop into four smaller loops is rather inefficient, it adds superfluous operations.
Which one is faster? Why?
var messages:Array = [.....]
// 1 - for
var len:int = messages.length;
for (var i:int = 0; i < len; i++) {
var o:Object = messages[i];
// ...
}
// 2 - foreach
for each (var o:Object in messages) {
// ...
}
From where I'm sitting, regular for loops are moderately faster than for each loops in the minimal case. Also, as with AS2 days, decrementing your way through a for loop generally provides a very minor improvement.
But really, any slight difference here will be dwarfed by the requirements of what you actually do inside the loop. You can find operations that will work faster or slower in either case. The real answer is that neither kind of loop can be meaningfully said to be faster than the other - you must profile your code as it appears in your application.
Sample code:
var size:Number = 10000000;
var arr:Array = [];
for (var i:int=0; i<size; i++) { arr[i] = i; }
var time:Number, o:Object;
// for()
time = getTimer();
for (i=0; i<size; i++) { arr[i]; }
trace("for test: "+(getTimer()-time)+"ms");
// for() reversed
time = getTimer();
for (i=size-1; i>=0; i--) { arr[i]; }
trace("for reversed test: "+(getTimer()-time)+"ms");
// for..in
time = getTimer();
for each(o in arr) { o; }
trace("for each test: "+(getTimer()-time)+"ms");
Results:
for test: 124ms
for reversed test: 110ms
for each test: 261ms
Edit: To improve the comparison, I changed the inner loops so they do nothing but access the collection value.
Edit 2: Answers to oshyshko's comment:
The compiler could skip the accesses in my internal loops, but it doesn't. The loops would exit two or three times faster if it was.
The results change in the sample code you posted because in that version, the for loop now has an implicit type conversion. I left assignments out of my loops to avoid that.
Of course one could argue that it's okay to have an extra cast in the for loop because "real code" would need it anyway, but to me that's just another way of saying "there's no general answer; which loop is faster depends on what you do inside your loop". Which is the answer I'm giving you. ;)
When iterating over an array, for each loops are way faster in my tests.
var len:int = 1000000;
var i:int = 0;
var arr:Array = [];
while(i < len) {
arr[i] = i;
i++;
}
function forEachLoop():void {
var t:Number = getTimer();
var sum:Number = 0;
for each(var num:Number in arr) {
sum += num;
}
trace("forEachLoop :", (getTimer() - t));
}
function whileLoop():void {
var t:Number = getTimer();
var sum:Number = 0;
var i:int = 0;
while(i < len) {
sum += arr[i] as Number;
i++;
}
trace("whileLoop :", (getTimer() - t));
}
forEachLoop();
whileLoop();
This gives:
forEachLoop : 87
whileLoop : 967
Here, probably most of while loop time is spent casting the array item to a Number. However, I consider it a fair comparison, since that's what you get in the for each loop.
My guess is that this difference has to do with the fact that, as mentioned, the as operator is relatively expensive and array access is also relatively slow. With a for each loop, both operations are handled natively, I think, as opossed to performed in Actionscript.
Note, however, that if type conversion actually takes place, the for each version is much slower and the while version if noticeably faster (though, still, for each beats while):
To test, change array initialization to this:
while(i < len) {
arr[i] = i + "";
i++;
}
And now the results are:
forEachLoop : 328
whileLoop : 366
forEachLoop : 324
whileLoop : 369
I've had this discussion with a few collegues before, and we have all found different results for different scenarios. However, there was one test that I found quite eloquent for comparison's sake:
var array:Array=new Array();
for (var k:uint=0; k<1000000; k++) {
array.push(Math.random());
}
stage.addEventListener("mouseDown",foreachloop);
stage.addEventListener("mouseUp",forloop);
/////// Array /////
/* 49ms */
function foreachloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var i:uint=0;
for each (var n:Number in array) {
i++;
tmp+=n;
}
trace("foreach", i, tmp, getTimer() - t1);
}
/***** 81ms ****/
function forloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var l:uint=array.length;
for(var i:uint = 0; i < l; i++)
tmp += Number(array[i]);
trace("for", i, tmp, getTimer() - t1);
}
What I like about this tests is that you have a reference for both the key and value in each iteration of both loops (removing the key counter in the "for-each" loop is not that relevant). Also, it operates with Number, which is probably the most common loop that you will want to optimize that much. And most importantly, the winner is the "for-each", which is my favorite loop :P
Notes:
-Referencing the array in a local variable within the function of the "for-each" loop is irrelevant, but in the "for" loop you do get a speed bump (75ms instead of 105ms):
function forloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var a:Array=array;
var l:uint=a.length;
for(var i:uint = 0; i < l; i++)
tmp += Number(a[i]);
trace("for", i, tmp, getTimer() - t1);
}
-If you run the same tests with the Vector class, the results are a bit confusing :S
for would be faster for arrays...but depending on the situation it can be foreach that is best...see this .net benchmark test.
Personally, I'd use either until I got to the point where it became necessary for me to optimize the code. Premature optimization is wasteful :-)
Maybe in a array where all element are there and start at zero (0 to X) it would be faster to use a for loop. In all other case (sparse array) it can be a LOT faster to use for each.
The reason is the usage of two data structure in the array: Hast table an Debse Array.
Please read my Array analysis using Tamarin source:
http://jpauclair.wordpress.com/2009/12/02/tamarin-part-i-as3-array/
The for loop will check at undefined index where the for each will skip those one jumping to next element in the HastTable
guys!
Especially Juan Pablo Califano.
I've checked your test. The main difference in obtain array item.
If you will put var len : int = 40000;, you will see that 'while' cycle is faster.
But it loses with big counts of array, instead for..each.
Just an add-on:
a for each...in loop doesn't assure You, that the elements in the array/vector gets enumerated in the ORDER THEY ARE STORED in them. (except XMLs)
This IS a vital difference, IMO.
"...Therefore, you should not write code that depends on a for-
each-in or for-in loop’s enumeration order unless you are processing
XML data..." C.Moock
(i hope not to break law stating this one phrase...)
Happy benchmarking.
sorry to prove you guys wrong, but for each is faster. even a lot. except, if you don't want to access the array values, but a) this does not make sense and b) this is not the case here.
as a result of this, i made a detailed post on my super new blog ... :D
greetz
back2dos