manipulation of 2d-array in openmp - openmp

I write a loop of the 2d-array 'charge_density'. One specific element of this 2d-array is changed by adding a number to its original value every time. I'd like to speed this process up by openmp, but the reduction command only support scale number. How should I solve this problem? This is a dll file which I use in ctypes for Python. Parameters of this function are from python. Here is my code:
void charge_distribute(double charge_density[][256], double position[][2], double x_range[], double y_range[], double x_step, double y_step)
{
int column_index_left=0, column_index_right=0, row_index_up=0, row_index_down=0;
int i = 0;
#pragma omp parallel for default(none) shared(x_step,y_step,position,x_range,y_range) private(column_index_left,column_index_right,row_index_up,row_index_down) reduction(+:charge_density)
for (i = 0; i < 100000; i++)
{
column_index_left = ((position[i][0] - x_range[0])) / x_step;
column_index_right = column_index_left + 1;
row_index_up = ((position[i][1] - y_range[0])) / y_step;
row_index_down = row_index_up + 1;
charge_density[row_index_up][column_index_left] = charge_density[row_index_up][column_index_left]+(1 - fabs(x_range[column_index_left] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_up] - position[i][1]) / y_step);
charge_density[row_index_up][column_index_right] = charge_density[row_index_up][column_index_right]+(1 - fabs(x_range[column_index_right] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_up] - position[i][1]) / y_step);
charge_density[row_index_down][column_index_left] = charge_density[row_index_down][column_index_left]+(1 - fabs(x_range[column_index_left] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_down] - position[i][1]) / y_step);
charge_density[row_index_down][column_index_right] = charge_density[row_index_down][column_index_right]+(1 - fabs(x_range[column_index_right] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_down] - position[i][1]) / y_step);
}
}

You have 2 options:
use reduction of an array (available since OpenMP 4.5). In this case you have to specify the size of array in the reduction clause: reduction(+:charge_density[:array_size][:256])
Note that in this case a private copy of the array is created on the stack, so if your array is too big, it may cause stack overflow.
use atomic operations:
#pragma omp atomic
charge_density[row_index_up][column_index_left] += (1 - fabs(x_range[column_index_left] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_up] - position[i][1]) / y_step);
#pragma omp atomic
charge_density[row_index_up][column_index_right] += (1 - fabs(x_range[column_index_right] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_up] - position[i][1]) / y_step);
#pragma omp atomic
charge_density[row_index_down][column_index_left] += (1 - fabs(x_range[column_index_left] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_down] - position[i][1]) / y_step);
#pragma omp atomic
charge_density[row_index_down][column_index_right] += (1 - fabs(x_range[column_index_right] - position[i][0]) / x_step) * (1 - fabs(y_range[row_index_down] - position[i][1]) / y_step);

Related

splitting trapezoid in given proportion

I need to split trapezoid in 2 part of given size with line, parallel basement. I need to get new h1 of new trapezoid.
For example I have trapezoid of area S and I want to split it in 2 trapezoids of areas S1 and S2.
S1 = aS; S2 = (1-a)S;
S1 = (a+z)*(h1)/2;
S2 = (b+z)*(1-h1)/2;
S1/S2 = KS;
To get new h1 I compare a and b, if a != b, I solve square equation and if a == b I work like with square. But sometimes I get mistakes because of rounding (for example when I solve this analytically I get a = b and program thinks a > b). How can I handle this? Or maybe there is another better way to split trapezoid?
Here is simplifyed code:
if (base > base_prev) {
b_t = base; // base of trapezoid
h = H; //height of trapezoid
a_t = base_prev; //another base of trapezoid
KS = S1 / S2;
a_x = (a_t - b_t) * (1 + KS) / h;
b_x = 2 * KS * b_t + 2 * b_t;
c_x = -(a_t * h + b_t * h);
h_tmp = (-b_x + sqrt(b_x * b_x - 4 * a_x * c_x)) / (2 * a_x);
if (h_tmp > h || h_tmp < 0)
h_tmp = (-b_x - sqrt(b_x * b_x - 4 * a_x * c_x)) / (2 * a_x);
} else if (base < base_prev) {
b_t = base_prev;
a_t = base;
KS = S1 / S2;
a_x = (a_t - b_t) * (1 + KS) / h;
b_x = 2 * KS * b_t + 2 * b_t;
c_x = -(a_t * h + b_t * h);
h_tmp = (-b_x + sqrt(b_x * b_x - 4 * a_x * c_x)) / (2 * a_x);
if (h_tmp > h || h_tmp < 0)
h_tmp = (-b_x - sqrt(b_x * b_x - 4 * a_x * c_x)) / (2 * a_x);
}
else {
KS = S1 / S2;
h_tmp = h * KS;
}
If you're dealing with catastrophic cancellation, one approach, dating back to a classic article by Forsythe, is to use the alternative solution form x = 2c/(-b -+ sqrt(b^2 - 4ac)) for the quadratic equation ax^2 + bx + c = 0. One way to write the two roots, good for b < 0, is
x = (-b + sqrt(b^2 - 4ac))/(2a)
x = 2c/(-b + sqrt(b^2 - 4ac)),
and another, good for b >= 0, is
x = 2c/(-b - sqrt(b^2 - 4ac))
x = (-b - sqrt(b^2 - 4ac))/(2a).
Alternatively, you could use the bisection method to obtain a reasonably good guess and polish it with Newton's method.

vectorize/optimize this code in MATLAB?

I am building my first large-scale MATLAB program, and I've managed to write original vectorized code for everything so for until I came to trying to create an image representing vector density in stereographic projection. After a couple failed attempts I went to the Mathworks file exchange site and found an open source program which fits my needs courtesy of Malcolm Mclean. With a test matrix his function produces something like this:
And while this is almost exactly what I wanted, his code relies on a triply nested for-loop. On my workstation a test data matrix of size 25000x2 took 65 seconds in this section of code. This is unacceptable since I will be scaling up to a data matrices of size 500000x2 in my project.
So far I've been able to vectorize the innermost loop (which was the longest/worst loop), but I would like to continue and be rid of the loops entirely if possible. Here is Malcolm's original code that I need to vectorize:
dmap = zeros(height, width); % height, width: scalar with default value = 32
for ii = 0: height - 1 % 32 iterations of this loop
yi = limits(3) + ii * deltay + deltay/2; % limits(3) & deltay: scalars
for jj = 0 : width - 1 % 32 iterations of this loop
xi = limits(1) + jj * deltax + deltax/2; % limits(1) & deltax: scalars
dd = 0;
for kk = 1: length(x) % up to 500,000 iterations in this loop
dist2 = (x(kk) - xi)^2 + (y(kk) - yi)^2;
dd = dd + 1 / ( dist2 + fudge); % fudge is a scalar
end
dmap(ii+1,jj+1) = dd;
end
end
And here it is with the changes I've already made to the innermost loop (which was the biggest drain on efficiency). This cuts the time from 65 seconds down to 12 seconds on my machine for the same test matrix, which is better but still far slower than I would like.
dmap = zeros(height, width);
for ii = 0: height - 1
yi = limits(3) + ii * deltay + deltay/2;
for jj = 0 : width - 1
xi = limits(1) + jj * deltax + deltax/2;
dist2 = (x - xi) .^ 2 + (y - yi) .^ 2;
dmap(ii + 1, jj + 1) = sum(1 ./ (dist2 + fudge));
end
end
So my main question, are there any further changes I can make to optimize this code? Or even an alternative method to approach the problem? I've considered using C++ or F# instead of MATLAB for this section of the program, and I may do so if I cannot get to a reasonable efficiency level with the MATLAB code.
Please also note that at this point I don't have ANY additional toolboxes, if I did then I know this would be trivial (using hist3 from the statistics toolbox for example).
Mem consuming solution
yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width ) - .5 * deltax;
dx = bsxfun( #minus, x(:), xi ) .^ 2;
dy = bsxfun( #minus, y(:), yi ) .^ 2;
dist2 = bsxfun( #plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
dmap = sum( 1./(dist2 + fudge ) , 3 );
EDIT
handling extremely large x and y by breaking the operation into blocks:
blockSize = 50000; % process up to XX elements at once
dmap = 0;
yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width ) - .5 * deltax;
bi = 1;
while bi <= numel(x)
% take a block of x and y
bx = x( bi:min(end, bi + blockSize - 1) );
by = y( bi:min(end, bi + blockSize - 1) );
dx = bsxfun( #minus, bx(:), xi ) .^ 2;
dy = bsxfun( #minus, by(:), yi ) .^ 2;
dist2 = bsxfun( #plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
dmap = dmap + sum( 1./(dist2 + fudge ) , 3 );
bi = bi + blockSize;
end
This is a good example of why starting a loop from 1 matters. The only reason that ii and jj are initiated at 0 is to kill the ii * deltay and jj * deltax terms which however introduces sequentiality in the dmap indexing, preventing parallelization.
Now, by rewriting the loops you could use parfor() after opening a matlabpool:
dmap = zeros(height, width);
yi = limits(3) + deltay*(1:height) - .5*deltay;
matlabpool 8
parfor ii = 1: height
for jj = 1: width
xi = limits(1) + (jj-1) * deltax + deltax/2;
dist2 = (x - xi) .^ 2 + (y - yi(ii)) .^ 2;
dmap(ii, jj) = sum(1 ./ (dist2 + fudge));
end
end
matlabpool close
Keep in mind that opening and closing the pool has significant overhead (10 seconds on my Intel Core Duo T9300, vista 32 Matlab 2013a).
PS. I am not sure whether the inner loop instead of the outer one can be meaningfully parallelized. You can try to switch the parfor to the inner one and compare speeds (I would recommend going for the big matrix immediately since you are already running in 12 seconds and the overhead is almost as big).
Alternatively, this problem can be solved in using kernel density estimation techniques. This is part of the Statistics Toolbox, or there's this KDE implementation by Zdravko Botev (no toolboxes required).
For the example code below, I get 0.3 seconds for N = 500000, or 0.7 seconds for N = 1000000.
N = 500000;
data = [randn(N,2); rand(N,1)+3.5, randn(N,1);]; % 2 overlaid distrib
tic; [bandwidth,density,X,Y] = kde2d(data); toc;
imagesc(density);

Circle-Circle Collision Prediction

I'm aware of how to check if two circles are intersecting one another. However, sometimes the circles move too fast and end up avoiding collision on the next frame.
My current solution to the problem is to check circle-circle collision an arbitrary amount of times between the previous position and it's current position.
Is there a mathematical way to find the time it takes for the two circle to collide? If I was able to get that time value, I could move the circle to the position at that time and then collide them at that point.
Edit: Constant Velocity
I'm assuming the motion of the circles is linear. Let's say the position of circle A's centre is given by the vector equation Ca = Oa + t*Da where
Ca = (Cax, Cay) is the current position
Oa = (Oax, Oay) is the starting position
t is the elapsed time
Da = (Dax, Day) is the displacement per unit of time (velocity).
Likewise for circle B's centre: Cb = Ob + t*Db.
Then you want to find t such that ||Ca - Cb|| = (ra + rb) where ra and rb are the radii of circles A and B respectively.
Squaring both sides:
||Ca-Cb||^2 = (ra+rb)^2
and expanding:
(Oax + t*Dax - Obx - t*Dbx)^2 + (Oay + t*Day - Oby - t*Dby)^2 = (ra + rb)^2
From that you should get a quadratic polynomial that you can solve for t (if such a t exists).
Here is a way to solve for t the equation in Andrew Durward's excellent answer.
To just plug in values one can skip to the bottom.
(Oax + t*Dax - Obx - t*Dbx)^2 + (Oay + t*Day - Oby - t*Dby)^2 = (ra + rb)^2
(Oax * (Oax + t*Dax - Obx - t*Dbx) + t*Dax * (Oax + t*Dax - Obx - t*Dbx)
- Obx * (Oax + t*Dax - Obx - t*Dbx) - t*Dbx * (Oax + t*Dax - Obx - t*Dbx))
+
(Oay * (Oay + t*Day - Oby - t*Dby) + t*Day * (Oay + t*Day - Oby - t*Dby)
- Oby * (Oay + t*Day - Oby - t*Dby) - t*Dby * (Oay + t*Day - Oby - t*Dby))
=
(ra + rb)^2
Oax^2 + (Oax * t*Dax) - (Oax * Obx) - (Oax * t*Dbx)
+ (t*Dax * Oax) + (t*Dax)^2 - (t*Dax * Obx) - (t*Dax * t*Dbx)
- (Obx * Oax) - (Obx * t*Dax) + Obx^2 + (Obx * t*Dbx)
- (t*Dbx * Oax) - (t*Dbx * t*Dax) + (t*Dbx * Obx) + (t*Dbx)^2
+
Oay^2 + (Oay * t*Day) - (Oay * Oby) - (Oay * t*Dby)
+ (t*Day * Oay) + (t*Day)^2 - (t*Day * Oby) - (t*Day * t*Dby)
- (Oby * Oay) - (Oby * t*Day) + Oby^2 + (Oby * t*Dby)
- (t*Dby * Oay) - (t*Dby * t*Day) + (t*Dby * Oby) + (t*Dby)^2
=
(ra + rb)^2
t^2 * (Dax^2 + Dbx^2 - (Dax * Dbx) - (Dbx * Dax)
+ Day^2 + Dby^2 - (Day * Dby) - (Dby * Day))
+
t * ((Oax * Dax) - (Oax * Dbx) + (Dax * Oax) - (Dax * Obx)
- (Obx * Dax) + (Obx * Dbx) - (Dbx * Oax) + (Dbx * Obx)
+ (Oay * Day) - (Oay * Dby) + (Day * Oay) - (Day * Oby)
- (Oby * Day) + (Oby * Dby) - (Dby * Oay) + (Dby * Oby))
+
Oax^2 - (Oax * Obx) - (Obx * Oax) + Obx^2
+ Oay^2 - (Oay * Oby) - (Oby * Oay) + Oby^2 - (ra + rb)^2
=
0
Now it's a standard form quadratic equation:
ax2 + bx + c = 0
solved like this:
x = (−b ± sqrt(b^2 - 4ac)) / 2a // this x here is t
where--
a = Dax^2 + Dbx^2 + Day^2 + Dby^2 - (2 * Dax * Dbx) - (2 * Day * Dby)
b = (2 * Oax * Dax) - (2 * Oax * Dbx) - (2 * Obx * Dax) + (2 * Obx * Dbx)
+ (2 * Oay * Day) - (2 * Oay * Dby) - (2 * Oby * Day) + (2 * Oby * Dby)
c = Oax^2 + Obx^2 + Oay^2 + Oby^2
- (2 * Oax * Obx) - (2 * Oay * Oby) - (ra + rb)^2
t exists (collision will occur) if--
(a != 0) && (b^2 >= 4ac)
You can predict collision by using direction vector and speed, this gives you the next steps, and when they will make a collision (if there will be).
You just need to check line crossing algorithm to detect that...

PETSc solving linear system with ksp guide

I am starting use PETSc library to solve linear system of equations in parallel. I have installed all packages, build and run successfully the examples in petsc/src/ksp/ksp/examples/tutorials/ folder, for example ex.c
But I couldn't understand how to fill matrices A,X an B by reading them for example from file.
Here I provide the code within ex2.c file:
/* Program usage: mpiexec -n <procs> ex2 [-help] [all PETSc options] */
static char help[] = "Solves a linear system in parallel with KSP.\n\
Input parameters include:\n\
-random_exact_sol : use a random exact solution vector\n\
-view_exact_sol : write exact solution vector to stdout\n\
-m <mesh_x> : number of mesh points in x-direction\n\
-n <mesh_n> : number of mesh points in y-direction\n\n";
/*T
Concepts: KSP^basic parallel example;
Concepts: KSP^Laplacian, 2d
Concepts: Laplacian, 2d
Processors: n
T*/
/*
Include "petscksp.h" so that we can use KSP solvers. Note that this file
automatically includes:
petscsys.h - base PETSc routines petscvec.h - vectors
petscmat.h - matrices
petscis.h - index sets petscksp.h - Krylov subspace methods
petscviewer.h - viewers petscpc.h - preconditioners
*/
#include <C:\PETSC\include\petscksp.h>
#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **args)
{
Vec x,b,u; /* approx solution, RHS, exact solution */
Mat A; /* linear system matrix */
KSP ksp; /* linear solver context */
PetscRandom rctx; /* random number generator context */
PetscReal norm; /* norm of solution error */
PetscInt i,j,Ii,J,Istart,Iend,m = 8,n = 7,its;
PetscErrorCode ierr;
PetscBool flg = PETSC_FALSE;
PetscScalar v;
#if defined(PETSC_USE_LOG)
PetscLogStage stage;
#endif
PetscInitialize(&argc,&args,(char *)0,help);
ierr = PetscOptionsGetInt(PETSC_NULL,"-m",&m,PETSC_NULL);CHKERRQ(ierr);
ierr = PetscOptionsGetInt(PETSC_NULL,"-n",&n,PETSC_NULL);CHKERRQ(ierr);
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Compute the matrix and right-hand-side vector that define
the linear system, Ax = b.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/*
Create parallel matrix, specifying only its global dimensions.
When using MatCreate(), the matrix format can be specified at
runtime. Also, the parallel partitioning of the matrix is
determined by PETSc at runtime.
Performance tuning note: For problems of substantial size,
preallocation of matrix memory is crucial for attaining good
performance. See the matrix chapter of the users manual for details.
*/
ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A,5,PETSC_NULL);CHKERRQ(ierr);
/*
Currently, all PETSc parallel matrix formats are partitioned by
contiguous chunks of rows across the processors. Determine which
rows of the matrix are locally owned.
*/
ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
/*
Set matrix elements for the 2-D, five-point stencil in parallel.
- Each processor needs to insert only elements that it owns
locally (but any non-local elements will be sent to the
appropriate processor during matrix assembly).
- Always specify global rows and columns of matrix entries.
Note: this uses the less common natural ordering that orders first
all the unknowns for x = h then for x = 2h etc; Hence you see J = Ii +- n
instead of J = I +- m as you might expect. The more standard ordering
would first do all variables for y = h, then y = 2h etc.
*/
ierr = PetscLogStageRegister("Assembly", &stage);CHKERRQ(ierr);
ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
for (Ii=Istart; Ii<Iend; Ii++) {
v = -1.0; i = Ii/n; j = Ii - i*n;
if (i>0) {J = Ii - n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
if (i<m-1) {J = Ii + n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
if (j>0) {J = Ii - 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
if (j<n-1) {J = Ii + 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
v = 4.0; ierr = MatSetValues(A,1,&Ii,1,&Ii,&v,INSERT_VALUES);CHKERRQ(ierr);
}
/*
Assemble matrix, using the 2-step process:
MatAssemblyBegin(), MatAssemblyEnd()
Computations can be done while messages are in transition
by placing code between these two statements.
*/
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = PetscLogStagePop();CHKERRQ(ierr);
/* A is symmetric. Set symmetric flag to enable ICC/Cholesky preconditioner */
ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);CHKERRQ(ierr);
/*
Create parallel vectors.
- We form 1 vector from scratch and then duplicate as needed.
- When using VecCreate(), VecSetSizes and VecSetFromOptions()
in this example, we specify only the
vector's global dimension; the parallel partitioning is determined
at runtime.
- When solving a linear system, the vectors and matrices MUST
be partitioned accordingly. PETSc automatically generates
appropriately partitioned matrices and vectors when MatCreate()
and VecCreate() are used with the same communicator.
- The user can alternatively specify the local vector and matrix
dimensions when more sophisticated partitioning is needed
(replacing the PETSC_DECIDE argument in the VecSetSizes() statement
below).
*/
ierr = VecCreate(PETSC_COMM_WORLD,&u);CHKERRQ(ierr);
ierr = VecSetSizes(u,PETSC_DECIDE,m*n);CHKERRQ(ierr);
ierr = VecSetFromOptions(u);CHKERRQ(ierr);
ierr = VecDuplicate(u,&b);CHKERRQ(ierr);
ierr = VecDuplicate(b,&x);CHKERRQ(ierr);
/*
Set exact solution; then compute right-hand-side vector.
By default we use an exact solution of a vector with all
elements of 1.0; Alternatively, using the runtime option
-random_sol forms a solution vector with random components.
*/
ierr = PetscOptionsGetBool(PETSC_NULL,"-random_exact_sol",&flg,PETSC_NULL);CHKERRQ(ierr);
if (flg) {
ierr = PetscRandomCreate(PETSC_COMM_WORLD,&rctx);CHKERRQ(ierr);
ierr = PetscRandomSetFromOptions(rctx);CHKERRQ(ierr);
ierr = VecSetRandom(u,rctx);CHKERRQ(ierr);
ierr = PetscRandomDestroy(&rctx);CHKERRQ(ierr);
} else {
ierr = VecSet(u,1.0);CHKERRQ(ierr);
}
ierr = MatMult(A,u,b);CHKERRQ(ierr);
/*
View the exact solution vector if desired
*/
flg = PETSC_FALSE;
ierr = PetscOptionsGetBool(PETSC_NULL,"-view_exact_sol",&flg,PETSC_NULL);CHKERRQ(ierr);
if (flg) {ierr = VecView(u,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Create the linear solver and set various options
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/*
Create linear solver context
*/
ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
/*
Set operators. Here the matrix that defines the linear system
also serves as the preconditioning matrix.
*/
ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
/*
Set linear solver defaults for this problem (optional).
- By extracting the KSP and PC contexts from the KSP context,
we can then directly call any KSP and PC routines to set
various options.
- The following two statements are optional; all of these
parameters could alternatively be specified at runtime via
KSPSetFromOptions(). All of these defaults can be
overridden at runtime, as indicated below.
*/
ierr = KSPSetTolerances(ksp,1.e-2/((m+1)*(n+1)),1.e-50,PETSC_DEFAULT,
PETSC_DEFAULT);CHKERRQ(ierr);
/*
Set runtime options, e.g.,
-ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
These options will override those specified above as long as
KSPSetFromOptions() is called _after_ any other customization
routines.
*/
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Solve the linear system
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Check solution and clean up
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/*
Check the error
*/
ierr = VecAXPY(x,-1.0,u);CHKERRQ(ierr);
ierr = VecNorm(x,NORM_2,&norm);CHKERRQ(ierr);
ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
/* Scale the norm */
/* norm *= sqrt(1.0/((m+1)*(n+1))); */
/*
Print convergence information. PetscPrintf() produces a single
print statement from all processes that share a communicator.
An alternative is PetscFPrintf(), which prints to a file.
*/
ierr = PetscPrintf(PETSC_COMM_WORLD,"Norm of error %A iterations %D\n",
norm,its);CHKERRQ(ierr);
/*
Free work space. All PETSc objects should be destroyed when they
are no longer needed.
*/
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
ierr = VecDestroy(&u);CHKERRQ(ierr); ierr = VecDestroy(&x);CHKERRQ(ierr);
ierr = VecDestroy(&b);CHKERRQ(ierr); ierr = MatDestroy(&A);CHKERRQ(ierr);
/*
Always call PetscFinalize() before exiting a program. This routine
- finalizes the PETSc libraries as well as MPI
- provides summary and diagnostic information if certain runtime
options are chosen (e.g., -log_summary).
*/
ierr = PetscFinalize();
return 0;
}
Does someone know how to fill own matrices within examples?
Yeah, this can be a little daunting when you're getting started. There's a good walk-through of the process in this ACTS tutorial from 2006; the tutorials listed on the PetSC web page are generally quite good.
The key parts of this are:
ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
Actually create the PetSC matrix object, Mat A;
ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n);CHKERRQ(ierr);
set the sizes; here, the matrix is m*n x m*n, as it's a stencil for operating on an m x n 2d grid
ierr = MatSetFromOptions(A);CHKERRQ(ierr);
This just takes any PetSC command line options that you might have supplied at run time and apply them to the matrix, if you wanted to control how A was set up; otherwise, you could just, have eg, used MatCreateMPIAIJ() to create it as an AIJ-format matrix (the default), MatCreateMPIDense() if it was going to be a dense matrix.
ierr = MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A,5,PETSC_NULL);CHKERRQ(ierr);
Now that we've gotten an AIJ matrix, these calls just pre-allocates the sparse matrix, assuming 5 non-zeros per row. This is for performance. Note that both the MPI and Seq functions must be called to make sure this works with both 1 processor and multiple processors; this always seemed weird to be, but there you go.
Ok, now that the matrix is all set up, here's where we start getting into the actual meat of the matter.
First, we find out which rows this particular process owns. The distribution is by rows, which is a good distribution for typical sparse matrices.
ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
So after this call, each processor has its own version of Istart and Iend, and its this processors job to update rows starting at Istart end ending just before Iend, as you see in this for loop:
for (Ii=Istart; Ii<Iend; Ii++) {
v = -1.0; i = Ii/n; j = Ii - i*n;
Ok, so if we're operating on row Ii, this corresponds to grid location (i,j) where i = Ii/n and j = Ii % n. Eg, grid location (i,j) corresponds to row Ii = i*n + j. Makes sense?
I'm going to strip out the if statements here because they're important but they're just dealing with the boundary values and they make things more complicated.
In this row, there will be a +4 on the diagonal, and -1s at columns corresponding to (i-1,j), (i+1,j), (i,j-1), and (i,j+1). Assuming that we haven't gone off the end of the grid for these (eg, 1 < i < m-1 and 1 < j < n-1), that means
J = Ii - n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);
J = Ii + n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);
J = Ii - 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);
J = Ii + 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);
v = 4.0; ierr = MatSetValues(A,1,&Ii,1,&Ii,&v,INSERT_VALUES);CHKERRQ(ierr);
}
The if statements I took out just avoid setting those values if they don't exist, and the CHKERRQ macro just prints out a useful error if ierr != 0, eg the set values call failed (because we tried to set an invalid value).
Now we've set local values; the MatAssembly calls start communication to ensure any necessary values are exchanged between processors. If you have any unrelated work to do, it can be stuck between the Begin and End to try to overlap communication and computation:
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
And now you're done and can call your solvers.
So a typical workflow is:
Create your matrix (MatCreate)
Set its size (MatSetSizes)
Set various matrix options (MatSetFromOptions is a good choice, rather than hardcoding things)
For sparse matrices, set the preallocation to reasonable guesses for the number of non-zeros per row; you can do this with a single value (as here), or with an array representing the number of non-zeros per row (here filled in with PETSC_NULL): (MatMPIAIJSetPreallocation, MatSeqAIJSetPreallocation)
Find out which rows are your responsibility: (MatGetOwnershipRange)
Set the values (calling MatSetValues either once per value, or passing in a chunk of values; INSERT_VALUES sets new elements, ADD_VALUES increments any existing elements)
Then do the assembly (MatAssemblyBegin,MatAssemblyEnd).
Other more complicated use cases are possible.

efficient X within limit algorithm

I determine limits as limit(0)=0; limit(y)=2*1.08^(y-1), y∈{1,2,3,...,50} or if you prefeer iterative functions:
limit(0)=0
limit(1)=2
limit(y)=limit(y-1)*1.08, x∈{2,3,4,...,50}
Exmples:
limit(1) = 2*1.08^0 = 2
limit(2) = 2*1.08^1 = 2.16
limit(3) = 2*1.08^2 = 2.3328
...
for a given x∈[0,infinity) I want an efficient formula to calculate y so that limit(y)>x and limit(y-1)≤x or 50 if there is none.
Any ideas?
or is pre-calculating the 50 limits and using a couple of ifs the best solution?
I am using erlang as language, but I think it will not make much of a difference.
limit(y) = 2 * 1.08^(y-1)
limit(y) > x >= limit(y - 1)
Now if I haven't made a mistake,
2 * 1.08^(y - 1) > x >= 2 * 1.08^(y - 2)
1.08^(y - 1) > x / 2 >= 1.08^(y - 2)
y - 1 > log[1.08](x / 2) >= y - 2
y + 1 > 2 + ln(x / 2) / ln(1.08) >= y
y <= 2 + ln(x / 2) / ln(1.08) < y + 1
Which gives you
y = floor(2 + ln(x / 2) / ln(1.08))

Resources