Parallelizing with OpenMP - how? - parallel-processing

I want to parallelize a raytracing algorithm that contains two for loops using OpenMP.
Is there anything more I can do than just setting omp_set_num_threads(omp_get_max_threads()) and putting #pragma omp parallel for in front of the first for loop?
So far I've achieved a 2.13x speedup.
Code:
start = omp_get_wtime();
#pragma omp parallel for
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
{
int intersection_object = -1; // none
int reflected_intersection_object = -1; // none
double current_lambda = 0x7fefffffffffffff; // maximum positive double
double current_reflected_lambda = 0x7fefffffffffffff; // maximum positive double
RAY ray, shadow_ray, reflected_ray;
PIXEL pixel;
SPHERE_INTERSECTION intersection, current_intersection, shadow_ray_intersection, reflected_ray_intersection, current_reflected_intersection;
double red, green, blue;
double theta, reflected_theta;
bool bShadow = false;
pixel.i = i;
pixel.j = j;
// 1. compute ray:
compute_ray(&ray, &view_point, &viewport, &pixel, &camera_frame, focal_distance);
// 2. check if ray hits an object:
for (int k = 0; k < NSPHERES; k++)
{
if (sphere_intersection(&ray, &sphere[k], &intersection))
{
// there is an intersection between ray and object
// 1. Compute the normal...
intersection_normal(&sphere[k], &intersection, &ray);
// 2. if the intersection lambda is smaller than the current one:
if (intersection.lambda_in < current_lambda)
{
current_lambda = intersection.lambda_in;
intersection_object = k;
copy_intersection_struct(&current_intersection, &intersection);
}
// compute the current lambda: current_lambda =
// mark which object is current: intersection_object =
// copy the intersection struct: copy_intersection_struct();
}
}
// Compute the color of the pixel:
if (intersection_object > -1)
{
compute_shadow_ray(&shadow_ray, &intersection, &light);
theta = dotproduct(&(shadow_ray.direction), &(intersection.normal));
for (int l = 0; l<NSPHERES; l++)
{
if (l != intersection_object)
{
if (sphere_intersection(&shadow_ray, &sphere[l], &shadow_ray_intersection) && (theta>0.0))
bShadow = true;
}
}
if (bShadow)
{ // if in shadow, add only ambient light to the surface color
red = shadow(sphere[intersection_object].ka_rgb[CRED], ambi_light_intensity);
green = shadow(sphere[intersection_object].ka_rgb[CGREEN], ambi_light_intensity);
blue = shadow(sphere[intersection_object].ka_rgb[CBLUE], ambi_light_intensity);
}
else
{
// the intersection is not in shadow:
red = blinnphong_shading(&current_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CRED], sphere[intersection_object].ks_rgb[CRED], sphere[intersection_object].ka_rgb[CRED], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
green = blinnphong_shading(&current_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CGREEN], sphere[intersection_object].ks_rgb[CGREEN], sphere[intersection_object].ka_rgb[CGREEN], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
blue = blinnphong_shading(&current_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CBLUE], sphere[intersection_object].ks_rgb[CBLUE], sphere[intersection_object].ka_rgb[CBLUE], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
}
tabelaPixlov[i][j].red = red;
tabelaPixlov[i][j].green = green;
tabelaPixlov[i][j].blue = blue;
glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
intersection_object = -1;
bShadow = false;
}
else
{
// draw the pixel with the background color
tabelaPixlov[i][j].red = 0;
tabelaPixlov[i][j].green = 0;
tabelaPixlov[i][j].blue = 0;
intersection_object = -1;
bShadow = false;
}
current_lambda = 0x7fefffffffffffff;
current_reflected_lambda = 0x7fefffffffffffff;
}
}
//glFlush();
stop = omp_get_wtime();
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
{
glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
glBegin(GL_POINTS);
glVertex2i(i, j);
glEnd();
}
}
printf("%f\n št niti:%d\n", stop - start, omp_get_max_threads());
glutSwapBuffers();
}

With ray tracing you should use schedule(dynamic). Besides that, I would suggest fusing the two loops:
#pragma omp parallel for schedule(dynamic)
for(int n=0; n<(viewport.xvmax - viewport.xvmin)*(viewport.yvmax - viewport.yvmin); n++) {
    int i = n/(viewport.yvmax - viewport.yvmin);
    int j = n%(viewport.yvmax - viewport.yvmin);
    //...
}
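Alternatively, if your compiler supports OpenMP 3.0 or later, the collapse clause can fuse the two loops for you without the manual index arithmetic. A minimal sketch, assuming the loop body stays exactly the same as in your code:
#pragma omp parallel for schedule(dynamic) collapse(2)
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
    for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
    {
        // ... the same per-pixel work as in the original code ...
    }
}
Note that collapse requires the two loops to be perfectly nested, which they already are here.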
Also, why are you setting the number of threads? Just use the default, which should be the number of logical cores. Ray tracing is one of the algorithms that benefits from Hyper-Threading, so you don't want to limit the thread count to the number of physical cores.
In addition to using MIMD with OpenMP I would suggest looking into using SIMD for ray tracing. See Ingo Wald's PhD thesis for an example on how to do this http://www.sci.utah.edu/~wald/PhD/. Basically you shoot four (eight) rays in one SSE (AVX) register and then go down the ray tree for each ray in parallel. However, if one ray finishes you hold it and wait until all four are finished (this is similar to what is done on the GPU). There have been many papers written since which have more advanced tricks based on this idea.
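To give a flavor of the packet idea (this is just an illustrative sketch, not code from the thesis, and all names here are made up): with SSE you can run the sphere-intersection test for four rays at once. The helper below assumes normalized ray directions and only checks the sign of the discriminant; it does not test that the hit lies in front of the ray origin.
#include <xmmintrin.h>  // SSE intrinsics
// Test four ray-sphere intersections at once. dx/dy/dz hold the (normalized)
// direction components of 4 rays, ox/oy/oz the ray origins relative to the
// sphere center. Returns a 4-bit mask: bit k is set if ray k's line crosses
// the sphere (discriminant >= 0).
static inline int sphere_hit4(__m128 dx, __m128 dy, __m128 dz,
                              __m128 ox, __m128 oy, __m128 oz,
                              float radius)
{
    // For |d| = 1: t^2 + 2*(d.o)*t + (o.o - r^2) = 0,
    // so the quarter discriminant is (d.o)^2 - (o.o - r^2).
    __m128 b = _mm_add_ps(_mm_add_ps(_mm_mul_ps(dx, ox), _mm_mul_ps(dy, oy)),
                          _mm_mul_ps(dz, oz));
    __m128 c = _mm_sub_ps(_mm_add_ps(_mm_add_ps(_mm_mul_ps(ox, ox),
                                                _mm_mul_ps(oy, oy)),
                                     _mm_mul_ps(oz, oz)),
                          _mm_set1_ps(radius * radius));
    __m128 disc = _mm_sub_ps(_mm_mul_ps(b, b), c);
    return _mm_movemask_ps(_mm_cmpge_ps(disc, _mm_setzero_ps()));
}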

Related

C code is running too slow from nested for loops

My C program is running too slowly (right now it takes around 40 seconds without parallelization). I have tried using OpenMP, which brought the time down significantly, but I am looking for simple and natural ways to make my code run faster other than using parallel for loops. The basic structure of the code is that it takes some command line arguments as inputs and saves them as variables. Then it recursively computes a variable called Rplus1 using the math.h and complex.h libraries. The problem, and where the code spends most of its time, is at the bottom where there are nested for loops. My goal is to get the whole thing running in under 5 seconds, but as of now it takes about 40 seconds without parallel for loops. Please help!
#include "time.h"
#include "stdio.h"
#include "stdlib.h"
#include "complex.h"
#include "math.h"
#include "string.h"
#include "unistd.h"
#include "omp.h"
#define PI 3.14159265
int main (int argc, char *argv[]){
if(argc >= 8){
double start1 = omp_get_wtime();
// command line arguments are aligned in the following order: [theta] [number of layers in superlattice] [material_1] [lat const_1] [number of unit cells_1] [material_2] [lat const_2] [number of unit cells_2] .... [material_N] [lat const_N] [number of unit cells_N] [Log/Linear] [number of repeating superlattice layers] [yes/no]
int N;
sscanf(argv[2],"%d",&N); // Number of layers in superlattice specified by second input argument
if(strcmp(argv[argc-1],"yes") == 0) //If the substrate is included then add one more layer to the N variable
{
N = N+1;
}
int total;
sscanf(argv[argc-2],"%d",&total); // Number of repeating superlattice layers specified by second to last argument
double layers[N][6], horizangle[1001], vertangle[1001];
double complex (*F_hkl)[1001][1001] = malloc(N*1001*1001*sizeof(complex double)), (*F_0)[1001][1001] = malloc(N*1001*1001*sizeof(complex double)), (*g)[1001][1001] = malloc(N*1001*1001*sizeof(complex double)), (*g_0)[1001][1001] = malloc(N*1001*1001*sizeof(complex double)),SF_table[10];// this array will hold the unit cell structure factors for all of the materials selected for each wavevector in the beam spectrum
double real, real2, lam, c_light = 299792458, h_pl = 4.135667516e-15,E = 10e3, r_0 = 2.818e-15, Lccd = 1.013;// just a few variables to hold values through calculations and constants, speed of light, plancks const, photon energy, and detector distance from sample
double angle;
double complex z;// just a variable to hold complex numbers throughout calculations
int i,j,m,n,t; // integers to index through arrays
lam = (h_pl*c_light)/E;
sscanf(argv[1],"%lf",&angle); //first argument is the angle of incidence, read it
angle = angle*(PI/180.0);
double angle2 = -angle;
double (*table)[10] = malloc(10*9*sizeof(double)); // this array holds all the coefficients to calculate the atomic scattering factor below
double (*table2)[10] = malloc(10*2*sizeof(double));
FILE*datfile1 = fopen("/home/vhosts/xraydev.engr.wisc.edu/data/coef_table.bin","rb"); // read the binary file containg all the coefficients
fread(table,sizeof(double),90,datfile1);
fclose(datfile1);
FILE*datfile2 = fopen("/home/vhosts/xraydev.engr.wisc.edu/data/dispersioncs.bin","rb");
fread(table2,sizeof(double),20,datfile2);
fclose(datfile2);
// Calculate scattering factors for all elements
double a,b;
double k_z = (sin(angle)/lam)*1e-10; // incorporate angular dependence of SF but neglect 0.24 degree divergence because of approximation
for(i = 0;i<10;i++) // for each element...
{
SF_table[i] = 0;
for(j = 0;j<4;j++) // summation
{
a = table[2*j][i];
b = table[2*j+1][i];
SF_table[i] = SF_table[i] + a * exp(-b*k_z*k_z);
}
SF_table[i] = SF_table[i] + table[8][i] + table2[0][i] + table2[1][i]*I;
}
free(table);
double mm = 4.0, (*phi)[1001][1001] = malloc(N*1001*1001*sizeof(double));
for(i = 1; i < N+1; i++) // for each layer of material...
{
sscanf(argv[i*3+1],"%lf",&layers[i-1][1]); // get out of plane lattice constant
sscanf(argv[i*3+2],"%lf",&layers[i-1][2]); // get the number of unit cells in the layer
layers[i-1][1] = layers[i-1][1]*1e-10; // convert lat const input to meters
// Define reciprocal space positions at the incident angle h, k, l
layers[i-1][3] = 0; // h
layers[i-1][4] = 0; // k
double l; // l calculated for each wavevector in the spectrum because l changes with angle of incidence
for (m = 0; m < 1001; m++)
{
for (n = 0; n <1001; n++)
{
l = 4;
phi[i-1][m][n] = 2*PI*layers[i-1][1]*sin(angle)/lam; // Caculate phi for each layer
if(strcmp(argv[i*3],"GaAs") == 0)
{
F_hkl[i-1][m][n] = (2+2*cexp(I*PI*l))*(SF_table[2]+SF_table[3]*cexp(I*PI*l/2));
F_0[i-1][m][n] = 0.5*8.0*(31 + table2[0][2] + table2[1][2]*I) + 0.5*8.0*(33 + table2[0][3] + table2[1][3]*I);
g[i-1][m][n] = 2*r_0*F_hkl[i-1][m][n]/mm/layers[i-1][1]*cos(2*angle);
g_0[i-1][m][n] = 2*r_0*F_0[i-1][m][n]/mm/layers[i-1][1];
}
if(strcmp(argv[i*3],"AlGaAs") == 0)
{
F_hkl[i-1][m][n] = (2+2*cexp(I*PI*l))*((0.76*SF_table[2]+ 0.24*SF_table[4])+SF_table[3]*cexp(I*PI*l/2));
F_0[i-1][m][n] = 0.24*4.0*(13 + table2[0][4] + table2[1][4]*I) + 0.76*4.0*(31 + table2[0][2] + table2[1][2]*I) + 4.0*(33 + table2[0][3] + table2[1][3]*I);
g[i-1][m][n] = 2*r_0*F_hkl[i-1][m][n]/mm/layers[i-1][1]*cos(2*angle);
g_0[i-1][m][n] = 2*r_0*F_0[i-1][m][n]/mm/layers[i-1][1];
}
}
}
}
double complex (*Rplus1)[1001] = malloc(1001*1001*sizeof(double complex));
for (m = 0; m < 1001; m++)
{
for (n = 0; n <1001; n++)
{
Rplus1[m][n] = 0.0;
}
}
double stop1 = omp_get_wtime();
for(i=1;i<N;i++) // For each layer of the film
{
for(j=0;j<layers[i][2];j++) // For each unit cell
{
for (m = 0; m < 1001; m++) // For each row of the diffraction pattern
{
for (n = 0; n <1001; n++) // For each column of the diffraction pattern
{
Rplus1[m][n] = -I*g[i][m][n] + ((1-I*g_0[i][m][n])*(1-I*g_0[i][m][n]))/(I*g[i][m][n] + (cos(-2*phi[i][m][n])+I*sin(-2*phi[i][m][n]))/Rplus1[m][n]);
}
}
}
}
double stop2 = omp_get_wtime();
double elapsed1 = (double)(stop1 - start1);// Second user defined function to use Durbin and Follis recursive formula
double elapsed2 = (double)(stop2 - start1);// Second user defined function to use Durbin and Follis recursive formula
printf("main() through before diffraction function took %f seconds to run\n\n",elapsed1);
printf("main() through after diffraction function took %f seconds to run\n\n",elapsed2);
}
}

Simple image processing algorithm causes Processing to freeze

I've written an algorithm in Processing to do the following:
1. Instantiate a 94 x 2 int array
2. Load a jpg image of dimensions 500 x 500 pixels
3. Iterate over every pixel in the image and determine whether it is black or white then change a variable related to the array
4. Print the contents of the array
For some reason this algorithm freezes immediately. I've put print statements in that show me that it freezes before even attempting to load the image. This is especially confusing to me in light of the fact that I have written another very similar algorithm that executes without complications. The other algorithm reads an image, averages the color of each tile of whatever size is specified, and then prints rectangles over the region that was averaged with the average color, effectively pixelating the image. Both algorithms load an image and examine each of its pixels. The one in question is mostly different in that it doesn't draw anything. I was going to say that it was different for having an array but the pixelation algorithm holds all of the colors in a color array which should take up far more space than the int array.
From looking in my mac's console.app I see that there was originally this error: "java.lang.OutOfMemoryError: GC overhead limit exceeded". From other suggestions/sources on the web I tried bumping the memory allocation from 256mb to 4000mb (doing this felt meaningless because my analysis of the algorithms showed they should be the same complexity but I tried anyways). This did not stop freezing but changed the error to a combination of "JavaNativeFoundation error occurred obtaining Java exception description" and "java.lang.OutOfMemoryError: Java heap space".
Then I tried pointing processing to my local jdk with the hope of utilizing the 64 bit jdk over processing's built in 32 bit jdk. From within Processing.app/Contents I executed the following commands:
mv Java java-old
ln -s /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk Java
Processing would not start after this attempt with the following error populating my console:
"com.apple.xpc.launchd[1]: (org.processing.app.160672[13559]) Service exited with abnormal code: 1"
Below is my code:
First the noncompliant algorithm
int squareSize=50;
int numRows = 10;
int numCols = 10;
PFont myFont;
PImage img;
//33-126
void setup(){
size(500,500);
count();
}
void count(){
ellipseMode(RADIUS);
int[][] asciiArea = new int[94][2];
println("hello?");
img=loadImage("countingPicture.jpg");
println("image loaded");
for(int i=0; i<(500/squareSize); i++){
for(int j=0; j<(500/squareSize); j++){
int currentValue=i+j*numCols;
if(currentValue+33>126){
break;
}
println(i+", "+j);
asciiArea[currentValue][0]=currentValue+33;
asciiArea[currentValue][1]=determineTextArea(i,j,squareSize);
//fill(color(255,0,0));
//ellipse(i*squareSize,j*squareSize,3,3);
}
}
println("done calculating");
displayArrayContents(asciiArea);
}
int determineTextArea(int i, int j, int squareSize){
int textArea = 0;
double n=0.0;
while(n < squareSize*squareSize){
n+=1.0;
int xOffset = (int)(n%((double)squareSize));
int yOffset = (int)(n/((double)squareSize));
color c = img.get(i*squareSize+xOffset, j*squareSize+yOffset);
if(red(c)!=255 || green(c)!=255 || blue(c)!=255){
println(red(c)+" "+green(c)+" "+blue(c));
textArea++;
}
}
return textArea;
}
void displayArrayContents(int[][] arr){
int i=0;
println("\n now arrays");
while(i<94){
println(arr[i][0]+" "+arr[i][1]);
}
}
The pixelation algorithm that works:
PImage img;
int direction = 1;
float signal;
int squareSize = 5;
int wideness = 500;
int highness = 420;
int xDimension = wideness/squareSize;
int yDimension= highness/squareSize;
void setup() {
size(1500, 420);
noFill();
stroke(255);
frameRate(30);
img = loadImage("imageIn.jpg");
color[][] colors = new color[xDimension][yDimension];
for(int drawingNo=0; drawingNo < 3; drawingNo++){
for(int i=0; i<xDimension; i++){
for(int j=0; j<yDimension; j++){
double average = 0;
double n=0.0;
while(n < squareSize*squareSize){
n+=1.0;
int xOffset = (int)(n%((double)squareSize));
int yOffset = (int)(n/((double)squareSize));
color c = img.get(i*squareSize+xOffset, j*squareSize+yOffset);
float cube = red(c)*red(c) + green(c)*green(c) + blue(c)*blue(c);
double grayValue = (int)(sqrt(cube)*(255.0/441.0));
double nAsDouble = (double)n;
average=(grayValue + (n-1.0)*average)/n;
average=(grayValue/n)+((n-1.0)/(n))*average;
}
//average=discretize(average);
println(i+" "+j+" "+average);
colors[i][j]=color((int)average);
fill(colors[i][j]);
if(drawingNo==0){ //stroke(colors[i][j]); }
stroke(210);}
if(drawingNo==1){ stroke(150); }
if(drawingNo==2){ stroke(90); }
//stroke(colors[i][j]);
rect(drawingNo*wideness+i*squareSize,j*squareSize,squareSize,squareSize);
}
}
}
save("imageOut.jpg");
}
You're entering an infinite loop, which makes the println() statements unreliable. Fix the infinite loop, and your print statements will work again.
Look at this while loop:
while(i<94){
println(arr[i][0]+" "+arr[i][1]);
}
When will i ever become >= 94?
You never increment i, so its value is always 0. You can prove this by adding a println() statement inside the while loop:
while(i<94){
println("i: " + i);
println(arr[i][0]+" "+arr[i][1]);
}
You probably wanted to increment i inside the while loop. Or just use a for loop instead.

Canvas multiple Text draw performance

Is there a way of efficiently drawing multiple individual characters at a reasonable FPS on a Canvas in Dart?
I am rendering an array of characters with different colors, background rectangles, etc., and it runs smoothly only if the "resolution of characters" is at most 40x40.
This is the drawing method:
static draw(CanvasRenderingContext2D ctx, CanvasRenderingContext2D ctxUnvisible) {
for(int i = 0; i < chars.length; i++) {
for(int j = 0; j < chars[0].length; j++) {
ctxUnvisible.fillRect(i*offX, j*offY, (i+1)*offX, (j+1)*offY);
}
}
for(int i = 0; i < chars.length; i++) {
for(int j = 0; j < chars[0].length; j++) {
ctxUnvisible.fillStyle = charArray[i][j].color;
ctxUnvisible.fillText(charArray[i][j].char, i*offX, j*offY);
}
}
ctx.drawImage(ctxUnvisible.canvas, 0, 0);
}
The first double loop renders background rectangles as the "text background" and the second draws the characters themselves. Unfortunately this doesn't work for a larger number of characters. Is there a more efficient way of drawing it? I am already drawing to an invisible canvas and then copying it to the visible one, but that's still not enough.
As far as I know (I've heard this but haven't confirmed it), the system prerenders every single character. You can make a lazily initialized Map of CanvasElements and draw every character like an image.
Example:
CanvasElement precompiled_a = new CanvasElement(width:20, height:20);
CanvasRenderingContext2D ctx = precompiled_a.context2D;
ctx.fillStyle = "black";
ctx.fillText("a", 10, 10);
CanvasElement c = querySelector("canvas");
c.context2D.drawImage(precompiled_a, 2, 2);

How to find pair edge while building half-edge data structure

I cannot find an efficient way to generate the opposite (pair) half-edge of a given edge. My idea is just to iterate:
//construct the opposite half edges
for(int j=0;j<edge_num;j++)
for(int m=0;m<edge_num;m++)
if(edge[j].vert_end->v_index==edge[m].vert_start->v_index &&
edge[j].vert_start->v_index==edge[m].vert_end->v_index )
{
edge[j].pair = &edge[m];
edge[m].pair = &edge[j];
}
The other information about a half-edge is generated while loading the .M file.
My structure is:
class HE_vert{
public:
GLfloat x, y, z;
int v_index;
HE_edge *edge;
};
class HE_face{
public:
int v1, v2, v3;
int f_index;
HE_edge* edge;
};
class HE_edge{
public:
HE_edge(){ pair = NULL; }
public:
HE_vert* vert_start; // vertex at the start of the half-edge
HE_vert* vert_end; // vertex at the end of the half-edge
HE_edge* pair; // oppositely oriented adjacent half-edge
HE_face* face; // face the half-edge borders
HE_edge* next; // next half-edge around the face
int e_index;
};
I checked all the output and it is correct, but it takes a long time to compute, especially when loading bunny.M. How can I do this in a more efficient way? Could you give me some hints?
// grid[i + vert_num*j] = edge from i to j
int grid[vert_num*vert_num]; // malloc()?
// memset()?
for (int i = vert_num*vert_num - 1; i >= 0; i--)
{
grid[i] = -1;
}
for (int i = 0; i < edge_num; i++)
{
int i_from = edge[i].vert_start->v_index;
int i_to = edge[i].vert_end->v_index;
int pair_index = grid[i_to + vert_num*i_from];
if (pair_index >= 0)
{
edge[i].pair = &edge[pair_index];
edge[pair_index].pair = &edge[i];
grid[i_to + vert_num*i_from] = -1;
}
else
{
grid[i_from + vert_num*i_to] = i;
}
}
Possible optimization: Use a linked list instead of a huge array. There will only be about 1-4 entries for each row/column.
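Another sparse alternative is a hash map keyed on the ordered (start, end) vertex pair, which keeps memory proportional to the number of edges. A sketch in C++ (untested, reusing the names from the question):
#include <cstdint>
#include <unordered_map>
// Link opposite half-edges in expected O(edge_num) time.
// 'edge' is the HE_edge array from the question.
void link_pairs(HE_edge* edge, int edge_num, int vert_num)
{
    std::unordered_map<std::uint64_t, int> open; // packed (from, to) -> edge index
    open.reserve(edge_num);
    for (int i = 0; i < edge_num; i++)
    {
        std::uint64_t from = edge[i].vert_start->v_index;
        std::uint64_t to   = edge[i].vert_end->v_index;
        // Has the opposite half-edge (to -> from) already been seen?
        auto it = open.find(to * (std::uint64_t)vert_num + from);
        if (it != open.end())
        {
            edge[i].pair = &edge[it->second];
            edge[it->second].pair = &edge[i];
            open.erase(it);
        }
        else
        {
            open[from * (std::uint64_t)vert_num + to] = i;
        }
    }
}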

Find local maxima in grayscale image using OpenCV

Does anybody know how to find the local maxima in a grayscale IPL_DEPTH_8U image using OpenCV? HarrisCorner mentions something like that but I'm actually not interested in corners ...
Thanks!
A pixel is considered a local maximum if it is equal to the maximum value in a 'local' neighborhood. The function below captures this property in two lines of code.
To deal with pixels on 'plateaus' (value equal to their neighborhood) one can use the local minimum property, since plateau pixels are equal to their local minimum. The rest of the code filters out those pixels.
void non_maxima_suppression(const cv::Mat& image, cv::Mat& mask, bool remove_plateaus) {
// find pixels that are equal to the local neighborhood maximum (including 'plateaus')
cv::dilate(image, mask, cv::Mat());
cv::compare(image, mask, mask, cv::CMP_GE);
// optionally filter out pixels that are equal to the local minimum ('plateaus')
if (remove_plateaus) {
cv::Mat non_plateau_mask;
cv::erode(image, non_plateau_mask, cv::Mat());
cv::compare(image, non_plateau_mask, non_plateau_mask, cv::CMP_GT);
cv::bitwise_and(mask, non_plateau_mask, mask);
}
}
Here's a simple trick. The idea is to dilate with a kernel that contains a hole in the center. After the dilate operation, each pixel is replaced with the maximum of its neighbors (using a 5 by 5 neighborhood in this example), excluding the original pixel.
Mat1b kernelLM(Size(5, 5), 1u);
kernelLM.at<uchar>(2, 2) = 0u;
Mat imageLM;
dilate(image, imageLM, kernelLM);
Mat1b localMaxima = (image > imageLM);
Actually, after I posted the code above I wrote a better and much faster one.
The code above struggles even on a 640x480 picture.
I optimized it and now it is very fast even for a 1600x1200 picture.
Here is the code:
void localMaxima(cv::Mat src,cv::Mat &dst,int squareSize)
{
if (squareSize==0)
{
dst = src.clone();
return;
}
Mat m0;
dst = src.clone();
Point maxLoc(0,0);
//1.Be sure to have at least 3x3 for at least looking at 1 pixel close neighbours
// Also the window must be <odd>x<odd>
SANITYCHECK(squareSize,3,1);
int sqrCenter = (squareSize-1)/2;
//2.Create the localWindow mask to get things done faster
// When we find a local maxima we will multiply the subwindow with this MASK
// So that we will not search for those 0 values again and again
Mat localWindowMask = Mat::zeros(Size(squareSize,squareSize),CV_8U);//boolean
localWindowMask.at<unsigned char>(sqrCenter,sqrCenter)=1;
//3.Find the threshold value to threshold the image
//this function here returns the peak of histogram of picture
//the picture is a thresholded picture it will have a lot of zero values in it
//so that the second boolean variable says :
// (boolean) ? "return peak even if it is at 0" : "return peak discarding 0"
int thrshld = maxUsedValInHistogramData(dst,false);
threshold(dst,m0,thrshld,1,THRESH_BINARY);
//4.Now delete all thresholded values from picture
dst = dst.mul(m0);
//put the src in the middle of the big array
for (int row=sqrCenter;row<dst.size().height-sqrCenter;row++)
for (int col=sqrCenter;col<dst.size().width-sqrCenter;col++)
{
//1.if the value is zero it can not be a local maxima
if (dst.at<unsigned char>(row,col)==0)
continue;
//2.the value at (row,col) is not 0 so it can be a local maxima point
m0 = dst.colRange(col-sqrCenter,col+sqrCenter+1).rowRange(row-sqrCenter,row+sqrCenter+1);
minMaxLoc(m0,NULL,NULL,NULL,&maxLoc);
//if the maximum location of this subWindow is at center
//it means we found the local maxima
//so we should delete the surrounding values which lies in the subWindow area
//hence we will not try to find if a point is at localMaxima when already found a neighbour was
if ((maxLoc.x==sqrCenter)&&(maxLoc.y==sqrCenter))
{
m0 = m0.mul(localWindowMask);
//we can skip the values that we already made 0 by the above function
col+=sqrCenter;
}
}
}
The following listing is a function similar to Matlab's "imregionalmax". It looks for at most nLocMax local maxima above threshold, where the found local maxima are at least minDistBtwLocMax pixels apart. It returns the actual number of local maxima found. Notice that it uses OpenCV's minMaxLoc to find global maxima. It is "opencv-self-contained" except for the (easy to implement) function vdist, which computes the (Euclidean) distance between points (r,c) and (row,col).
The input is a one-channel CV_32F matrix, and locations is an nLocMax (rows) by 2 (columns) CV_32S matrix.
int imregionalmax(Mat input, int nLocMax, float threshold, float minDistBtwLocMax, Mat locations)
{
Mat scratch = input.clone();
int nFoundLocMax = 0;
for (int i = 0; i < nLocMax; i++) {
Point location;
double maxVal;
minMaxLoc(scratch, NULL, &maxVal, NULL, &location);
if (maxVal > threshold) {
nFoundLocMax += 1;
int row = location.y;
int col = location.x;
locations.at<int>(i,0) = row;
locations.at<int>(i,1) = col;
int r0 = (row-minDistBtwLocMax > -1 ? row-minDistBtwLocMax : 0);
int r1 = (row+minDistBtwLocMax < scratch.rows ? row+minDistBtwLocMax : scratch.rows-1);
int c0 = (col-minDistBtwLocMax > -1 ? col-minDistBtwLocMax : 0);
int c1 = (col+minDistBtwLocMax < scratch.cols ? col+minDistBtwLocMax : scratch.cols-1);
for (int r = r0; r <= r1; r++) {
for (int c = c0; c <= c1; c++) {
if (vdist(Point2DMake(r, c),Point2DMake(row, col)) <= minDistBtwLocMax) {
scratch.at<float>(r,c) = 0.0;
}
}
}
} else {
break;
}
}
return nFoundLocMax;
}
The first question to answer would be what is "local" in your opinion. The answer may well be a square window (say 3x3 or 5x5) or circular window of a certain radius. You can then scan over the entire image with the window centered at each pixel and pick the highest value in the window.
See this for how to access pixel values in OpenCV.
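A minimal sketch of that scanning idea (3x3 window, CV_8U input, border pixels skipped for brevity; untested and purely illustrative):
#include <opencv2/core/core.hpp>
// Mark every pixel that equals the maximum of its 3x3 neighborhood.
void local_maxima_3x3(const cv::Mat& img, cv::Mat& mask)
{
    mask = cv::Mat::zeros(img.size(), CV_8U);
    for (int r = 1; r < img.rows - 1; r++)
    {
        for (int c = 1; c < img.cols - 1; c++)
        {
            double winMax;
            cv::minMaxLoc(img(cv::Rect(c - 1, r - 1, 3, 3)), NULL, &winMax);
            if (img.at<uchar>(r, c) == (uchar)winMax)
                mask.at<uchar>(r, c) = 255;
        }
    }
}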
This is a very fast method. It stores the found maxima in a vector of
Points.
vector <Point> GetLocalMaxima(const cv::Mat Src,int MatchingSize, int Threshold, int GaussKernel )
{
vector <Point> vMaxLoc(0);
if ((MatchingSize % 2 == 0) || (GaussKernel % 2 == 0)) // MatchingSize and GaussKernel have to be "odd" and > 0
{
return vMaxLoc;
}
vMaxLoc.reserve(100); // Reserve place for fast access
Mat ProcessImg = Src.clone();
int W = Src.cols;
int H = Src.rows;
int SearchWidth = W - MatchingSize;
int SearchHeight = H - MatchingSize;
int MatchingSquareCenter = MatchingSize/2;
if(GaussKernel > 1) // If You need a smoothing
{
GaussianBlur(ProcessImg,ProcessImg,Size(GaussKernel,GaussKernel),0,0,4);
}
uchar* pProcess = (uchar *) ProcessImg.data; // The pointer to image Data
int Shift = MatchingSquareCenter * ( W + 1);
int k = 0;
for(int y=0; y < SearchHeight; ++y)
{
int m = k + Shift;
for(int x=0;x < SearchWidth ; ++x)
{
if (pProcess[m++] >= Threshold)
{
Point LocMax;
Mat mROI(ProcessImg, Rect(x,y,MatchingSize,MatchingSize));
minMaxLoc(mROI,NULL,NULL,NULL,&LocMax);
if (LocMax.x == MatchingSquareCenter && LocMax.y == MatchingSquareCenter)
{
vMaxLoc.push_back(Point( x+LocMax.x,y + LocMax.y ));
// imshow("W1",mROI);cvWaitKey(0); //For gebug
}
}
}
k += W;
}
return vMaxLoc;
}
I found a simple solution.
In this example, you are trying to find the two best results of a matchTemplate call that are at least a minimum distance apart from each other.
cv::Mat result;
matchTemplate(search, target, result, CV_TM_SQDIFF_NORMED);
float score1;
cv::Point displacement1 = MinMax(result, score1);
cv::circle(result, cv::Point(displacement1.x+result.cols/2 , displacement1.y+result.rows/2), 10, cv::Scalar(0), CV_FILLED, 8, 0);
float score2;
cv::Point displacement2 = MinMax(result, score2);
where
cv::Point MinMax(cv::Mat &result, float &score)
{
double minVal, maxVal;
cv::Point minLoc, maxLoc, matchLoc;
minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, cv::Mat());
matchLoc.x = minLoc.x - result.cols/2;
matchLoc.y = minLoc.y - result.rows/2;
score = (float)minVal;
return matchLoc;
}
The process is:
Find global Minimum using minMaxLoc
Draw a filled white circle around global minimum using min distance between minima as radius
Find another minimum
The scores can then be compared to each other to determine, for example, the certainty of the match.
To find more than just the global minimum and maximum try using this function from skimage:
http://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.peak_local_max
You can parameterize the minimum distance between peaks, too. And more. To find minima, use negated values (take care of the array type though, 255-image could do the trick).
You can go over each pixel and test whether it is a local maximum. Here is how I would do it.
The input is assumed to be type CV_32FC1
#include <vector>    //std::vector
#include <algorithm> //std::sort
#include <cfloat>    //FLT_MAX (used for -FLT_MAX below)
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/core/core.hpp"
//structure for maximal values including position
struct SRegionalMaxPoint
{
SRegionalMaxPoint():
values(-FLT_MAX),
row(-1),
col(-1)
{}
float values;
int row;
int col;
//ascending order
bool operator()(const SRegionalMaxPoint& a, const SRegionalMaxPoint& b)
{
return a.values < b.values;
}
};
//checks if pixel is local max
bool isRegionalMax(const float* im_ptr, const int& cols )
{
float center = *im_ptr;
bool is_regional_max = true;
im_ptr -= (cols + 1);
for (int ii = 0; ii < 3; ++ii, im_ptr+= (cols-3))
{
for (int jj = 0; jj < 3; ++jj, im_ptr++)
{
if (ii != 1 || jj != 1)
{
is_regional_max &= (center > *im_ptr);
}
}
}
return is_regional_max;
}
void imregionalmax(
const cv::Mat& input,
std::vector<SRegionalMaxPoint>& buffer)
{
//find local max - top maxima
static const int margin = 1;
const int rows = input.rows;
const int cols = input.cols;
for (int i = margin; i < rows - margin; ++i)
{
const float* im_ptr = input.ptr<float>(i, margin);
for (int j = margin; j < cols - margin; ++j, im_ptr++)
{
//Check if pixel is local maximum
if ( isRegionalMax(im_ptr, cols ) )
{
cv::Rect roi = cv::Rect(j - margin, i - margin, 3, 3);
cv::Mat subMat = input(roi);
float val = *im_ptr;
//replace smallest value in buffer
if ( val > buffer[0].values )
{
buffer[0].values = val;
buffer[0].row = i;
buffer[0].col = j;
std::sort(buffer.begin(), buffer.end(), SRegionalMaxPoint());
}
}
}
}
}
For testing the code you can try this:
cv::Mat temp = cv::Mat::zeros(15, 15, CV_32FC1);
temp.at<float>(7, 7) = 1;
temp.at<float>(3, 5) = 6;
temp.at<float>(8, 10) = 4;
temp.at<float>(11, 13) = 7;
temp.at<float>(10, 3) = 8;
temp.at<float>(7, 13) = 3;
vector<SRegionalMaxPoint> buffer_(5);
imregionalmax(temp, buffer_);
cv::Mat debug;
cv::cvtColor(temp, debug, cv::COLOR_GRAY2BGR);
for (auto it = buffer_.begin(); it != buffer_.end(); ++it)
{
circle(debug, cv::Point(it->col, it->row), 1, cv::Scalar(0, 255, 0));
}
This solution does not take plateaus into account, so it is not exactly the same as Matlab's imregionalmax().
I think you want to use the
MinMaxLoc(arr, mask=NULL) -> (minVal, maxVal, minLoc, maxLoc)
function, which finds the global minimum and maximum in an array or subarray, on your image.
