I have an array of 100 elements that needs to be sorted with insertion sort using OpenMP. When I parallelize my sort it does not give correct values. Can someone help me?
void insertionSort(int a[])
{
    int i, j, k;
    #pragma omp parallel for private(i)
    for (i = 0; i < 100; i++)
    {
        k = a[i];
        for (j = i; j > 0 && a[j-1] > k; j--)
            #pragma omp critical
            a[j] = a[j-1];
        a[j] = k;
    }
}
Variables "j" and "k" need to be private in the parallel region. Otherwise you have a data race.
Unless it's homework, sorting as few as 100 elements in parallel makes no sense: the overhead introduced by parallelism will far outweigh any performance benefit.
Moreover, the insertion sort algorithm is inherently serial. When a[i] is processed, all previous elements of the array are assumed to be already sorted. If two elements are processed in parallel, there is obviously no such guarantee.
A more detailed explanation of why insertion sort cannot be parallelized in the suggested way is given by @dreamcrash in his answer to a similar question.
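If the goal is simply to practice OpenMP on a sort, one pattern that does work is sketched below (my own illustrative code, not a fix of the loop above): insertion-sort disjoint chunks in parallel, then merge the sorted chunks serially. Each chunk is touched by exactly one thread, so no synchronization is needed inside the parallel loop.

```cpp
#include <algorithm>
#include <vector>

// Serial insertion sort on the sub-range [lo, hi).
void insertionSortRange(std::vector<int>& a, int lo, int hi) {
    for (int i = lo + 1; i < hi; i++) {
        int k = a[i], j;
        for (j = i; j > lo && a[j-1] > k; j--)
            a[j] = a[j-1];
        a[j] = k;
    }
}

void parallelChunkSort(std::vector<int>& a, int chunks) {
    int n = (int)a.size();
    int step = (n + chunks - 1) / chunks;
    // Sort disjoint chunks in parallel: no two threads write the same element.
    #pragma omp parallel for
    for (int c = 0; c < chunks; c++) {
        int lo = c * step;
        int hi = std::min(n, lo + step);
        if (lo < hi) insertionSortRange(a, lo, hi);
    }
    // Merge the sorted chunks serially.
    for (int c = 1; c < chunks; c++) {
        int mid = std::min(n, c * step);
        int hi  = std::min(n, (c + 1) * step);
        std::inplace_merge(a.begin(), a.begin() + mid, a.begin() + hi);
    }
}
```

The serial merge dominates for large inputs, so this only distributes the insertion-sort work itself; it is a learning exercise, not a fast parallel sort.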
I'm trying to write code for matrix multiplication. As far as I understand OpenMP and parallel programming, this code may suffer from a race condition.
#pragma omp parallel
#pragma omp for
for (int k = 0; k < size; k++) {
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            c[i][j] += a[i][k] * b[k][j];
        }
    }
}
Do I get rid of it if I put #pragma omp atomic before the write to the c matrix, or by adding private(i) to the second #pragma? Also, is it possible to make this code free of false sharing? If yes, how?
A race condition occurs when 2 or more threads access the same memory location and at least one of them writes to it. The line c[i][j] += ... causes a data race in your code: since the k loop is the one parallelized, different threads update the same c[i][j]. The solution is to reorder your nested loops (use the order i, j, k) and introduce a temporary variable to calculate the dot product:
#pragma omp parallel for
for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
        double tmp = 0; // change its type as needed
        for (int k = 0; k < size; k++) {
            tmp += a[i][k] * b[k][j];
        }
        c[i][j] = tmp; // note that += was used in your original code
    }
}
Note that your code will be faster if you calculate the transpose of matrix b first, so that the innermost loop reads memory contiguously. For more details read this.
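The transpose idea can be sketched as follows (illustrative names, assuming square matrices of double stored as vectors of vectors): with bt being the transpose of b, both a[i][k] and bt[j][k] are read with stride-1 access in the inner loop.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Compute c = a * b via a transposed copy of b for cache-friendly access.
void matmulTransposed(const Matrix& a, const Matrix& b, Matrix& c) {
    int size = (int)a.size();
    Matrix bt(size, std::vector<double>(size));
    for (int k = 0; k < size; k++)
        for (int j = 0; j < size; j++)
            bt[j][k] = b[k][j];            // bt is the transpose of b
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp = 0;
            for (int k = 0; k < size; k++)
                tmp += a[i][k] * bt[j][k]; // both operands walked row-wise
            c[i][j] = tmp;
        }
    }
}
```

The extra O(size^2) transpose is cheap compared to the O(size^3) multiplication, so it usually pays off for matrices that do not fit in cache.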
UPDATE:
If you need to maintain the order of loops, there are 2 possibilities (but these solutions may be slower than the serial code):
Use an atomic operation (i.e. #pragma omp atomic) on the update of c[i][j]. In this case false sharing can also be a problem.
If your stack is large enough to store the matrix for all threads, a better alternative is to use a reduction: #pragma omp parallel for reduction(+:c[:size][:size]). (Another alternative is to do the reduction manually; in this case you can allocate the matrices used for the reduction on the heap.)
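A minimal sketch of the reduction variant that keeps the original k-outer loop order, assuming fixed-size global matrices so the whole-array form reduction(+:c) can be used (array reductions need OpenMP 4.5 or later; the sizes and values here are made up for illustration):

```cpp
const int SIZE = 2;
double a[SIZE][SIZE] = {{1, 2}, {3, 4}};
double b[SIZE][SIZE] = {{5, 6}, {7, 8}};
double c[SIZE][SIZE];  // zero-initialized global

void matmulReduction() {
    // Each thread accumulates into a private, zero-initialized copy of c;
    // OpenMP sums the copies into the shared c when the loop finishes,
    // so the concurrent += on c[i][j] is no longer a data race.
    #pragma omp parallel for reduction(+ : c)
    for (int k = 0; k < SIZE; k++)
        for (int i = 0; i < SIZE; i++)
            for (int j = 0; j < SIZE; j++)
                c[i][j] += a[i][k] * b[k][j];
}
```

Unlike #pragma omp atomic, the reduction pays the synchronization cost once per thread rather than once per update.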
Consider the following code segment
sum = 0;
for (i = 0; i < n; i++)
    sum = myfunc(a[i]) + sum;
Write the corresponding parallel code segment using OpenMP.
I did this way,
sum = 0;
#pragma omp parallel for
for (i = 0; i < n; i++)
    sum = myfunc(a[i]) + sum;
I'm a newcomer to parallel computing. Do you think it is correct?
Thank you very much for your help!
Not quite: every iteration reads and writes the shared sum variable, so your parallel version has a data race on it. Since you are doing a reduction, you should use the reduction clause to let OpenMP know that you want that variable accumulated across all threads:
sum = 0;
#pragma omp parallel for reduction(+ : sum)
for (i = 0; i < n; i++)
    sum = myfunc(a[i]) + sum;
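For completeness, here is a self-contained sketch that can be compiled and run; myfunc here is a stand-in, since the original function isn't shown:

```cpp
// Stand-in for the unspecified myfunc -- purely illustrative.
int myfunc(int x) { return x * x; }

int sumOfFunc(const int a[], int n) {
    int sum = 0;
    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums into the shared sum at the end of the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum = myfunc(a[i]) + sum;
    return sum;
}
```

Compile with -fopenmp (GCC/Clang) to actually enable the parallelism; without it the pragma is ignored and the loop runs serially with the same result.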
Quite similar to that question
Sorting an array in openmp
which has several hundred views but no correct answer. Therefore I am giving it another try by asking here again.
I am aware of the overhead and the uselessness of this with regard to speedup or performance. It is simply a small example to get into OpenMP. The fact that it is insertion sort was given by my course instructor.
Here is my code:
std::vector<int> insertionSort(std::vector<int> a) {
    int i, j, k;
    #pragma omp parallel for private(i,j,k)
    for (i = 0; i < a.size(); i++) {
        #pragma omp critical
        k = a[i];
        for (j = i; j > 0 && a[j-1] > k; j--)
            #pragma omp critical
            {
                a[j] = a[j-1];
                a[j] = k;
            }
    }
    return a;
}
I understand that the critical aspect is the race condition between threads accessing (reading and writing) elements of a; that is why I put a critical section around all of those accesses. That does not seem to be sufficient. What am I missing here? Without the pragmas, the sorting is correct.
What is the worst-case time complexity of the following two algorithms, assuming items (an ArrayList<Integer>) has enough unused space that it never needs to be resized? My initial guess is that A would run slower because it has to shift every element over to add the new one at index 0. I think B is O(N^2) in the worst case, but I am not sure.
A.
for (int i = 0; i < N; i++)
    items.add(0, new Integer(i));
and B.
for (int i = 0; i < N; i++)
    items.add(new Integer(i));
If your question is about Java, then the first version is slower and has complexity O(N^2) for the very reason you mention, while B has complexity O(N).
Implementation A could, assuming that the internal items array is sufficiently large, be implemented as:
for (int i = 0; i < n; i++) {
    for (int j = items.size; j > 0; j--) {
        items[j] = items[j-1];
    }
    items[0] = i;
}
The total number of operations executed in this case (assuming m was the initial size of the items list) would be (m+1) + (m+2) + ... + (m+n) = nm + n(n+1)/2, since the i-th insertion shifts all m+i existing elements and then writes one more.
This has complexity O(n^2).
Option B, on the other hand, can be implemented as
for (int i = 0; i < n; i++) {
    items[items.size] = i;
    items.size++;
}
and the number of operations executed in this case will be 2n: one write and one size increment per iteration.
This has complexity O(n).
In A, you must shift all of the items one position to the right in the array list's internal array for each insertion, so completing the whole operation takes O(n^2). In B, no shifting is needed, so it is O(n). In A, you are doing tons of unnecessary and expensive work.
I am assuming, as you stipulated, that the internal array is not resized.
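The same asymmetry is easy to reproduce in C++ (a sketch, not the original Java): inserting at the front of a std::vector shifts every existing element on each insertion, while push_back appends in amortized constant time.

```cpp
#include <vector>

// Variant A: insert each new element at index 0 -- every insertion
// shifts all existing elements one slot to the right, O(N^2) total.
std::vector<int> buildFront(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; i++)
        v.insert(v.begin(), i);
    return v;
}

// Variant B: append at the end -- amortized O(1) per insertion, O(N) total.
std::vector<int> buildBack(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; i++)
        v.push_back(i);
    return v;
}
```

Note that variant A also produces the elements in reverse order, since each new element lands in front of all previous ones.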
This is taken from TopCoder's Algorithm page - section on "Trivial algorithms for RMQ"
Supposedly a pre-processing function for calculating RMQ on the A array.
void process1(int M[MAXN][MAXN], int A[MAXN], int N)
{
    int i, j;
    for (i = 0; i < N; i++)
        M[i][i] = i;
    for (i = 0; i < N; i++)
        for (j = i + 1; j < N; j++)
            if (A[M[i][j-1]] < A[j])
                M[i][j] = M[i][j-1];
            else
                M[i][j] = j;
}
But I don't see how the generated 2D array M would be helpful in calculating the RMQ. What am I not getting?
Hint
The array A[] contains the sequence of elements that you calculate the RMQ for. For every query of the type "What is the minimum element in the range a..b?", M[a][b] stores the index of that minimum element in A[].
Full answer
This way, you can look up the answer to any query in constant time by reading the respective element of M[][].
The way it is calculated is as follows:
The first for-loop iterates over all elements and, for each i, stores i as the answer for the one-element range i..i. This is because the minimum of a one-element range is just that element itself.
The nested loops then calculate the RMQ answers for the remaining ranges i..j with j > i. This is done by extending the already-calculated range starting at i by one element at a time: the minimum of i..j is either the minimum of i..j-1 or the new element A[j], whichever is smaller.
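Put together, the O(N^2) preprocessing plus a constant-time lookup can be sketched as follows (the array contents below are made up for illustration):

```cpp
const int MAXN = 8;

// Fill M so that M[i][j] holds the index of the minimum element of A
// in the inclusive range i..j.
void process1(int M[MAXN][MAXN], int A[MAXN], int N)
{
    for (int i = 0; i < N; i++)
        M[i][i] = i;                 // a one-element range is its own minimum
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            // min of i..j is either the min of i..j-1 or the new element A[j]
            M[i][j] = (A[M[i][j-1]] < A[j]) ? M[i][j-1] : j;
}

// After preprocessing, any query is a single O(1) table lookup.
int rmq(int M[MAXN][MAXN], int a, int b) {
    return M[a][b];                  // index of the minimum in A[a..b]
}
```

The trade-off is O(N^2) time and memory up front for O(1) per query, which is why the TopCoder article calls this the trivial approach.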