Recently I wrote a Molecular Dynamics code that calculates the ion-electron force using CUDA parallel computing.
The kernel is listed below:
__global__ void kernel(double *x, double *y, double *z, int N) {
    int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    while (i < N) {
        double dx;
        double dy;
        double dz;
        double dr;
        double Fx = 0;
        double Fy = 0;
        double Fz = 0;
        for (int j = 0; j < N; j++) {
            dx = x[i] - x[j];
            dy = y[i] - y[j];
            dz = z[i] - z[j];
            dr = sqrt(dx*dx + dy*dy + dz*dz);
            dr = dr*dr*dr;
            Fx += k*q*q*dx/dr;
            Fy += k*q*q*dy/dr;
            Fz += k*q*q*dz/dr; // force = k*q^2*r/r^3 written in Cartesian coordinates
        }
        // rest of the code manipulates the force and is irrelevant to my question; I want to keep my code short
        i += blockDim.x * gridDim.x;
    }
}
x, y, z are the positions of the particles, and dx, dy, dz are the x, y, z distances. Fx, Fy, Fz in the for loop accumulate the total force exerted on the i-th particle; more specifically, you compute x[i]-x[j] and run through all j to find the total force, and let the kernel do all i in parallel.
I found this to be slow, as I know the GPU is reading the arrays from global memory. When I change x[i] to a constant it becomes 10 times faster, because it is then read from a register (L1 cache). My arrays are too big (more than 20000 double-precision elements), so it is impossible to put them into registers. But can it still be made a little faster using other memories? I know there is constant memory and shared memory, but I don't know how to implement them. I think x[i] sitting in global memory is what makes it slow, and all threads are trying to read x[i] at the same time. Is there any way to improve the speed?
Here is a basic version using shared memory to optimize the access pattern a bit.
#define KERNEL_BLOCKSIZE 256
__global__ void __launch_bounds__(KERNEL_BLOCKSIZE)
kernel(const double* x, const double* y, const double* z, int N,
double k, double q, double* fake_out)
{
const int i = blockIdx.x * blockDim.x + threadIdx.x;
/*
* threads beyond the bound still participate in value fetching, so we cannot
* return early
*/
const bool active = i < N;
double xi, yi, zi;
if(active)
xi = x[i], yi = y[i], zi = z[i];
const double kqq = k * q * q;
double Fx = 0., Fy = 0., Fz = 0.;
__shared__ double xt[KERNEL_BLOCKSIZE];
__shared__ double yt[KERNEL_BLOCKSIZE];
__shared__ double zt[KERNEL_BLOCKSIZE];
for(int j = 0; j < N; j += blockDim.x) {
__syncthreads();
const int thread_j = j + threadIdx.x;
if(thread_j < N) {
xt[threadIdx.x] = x[thread_j];
yt[threadIdx.x] = y[thread_j];
zt[threadIdx.x] = z[thread_j];
}
__syncthreads();
for(int l = 0, M = min(KERNEL_BLOCKSIZE, N - j); l < M; ++l) {
const double dx = xi - xt[l], dy = yi - yt[l], dz = zi - zt[l];
// 1 / sqrt(dx*dx + dy*dy + dz*dz)
const double rnorm = rnorm3d(dx, dy, dz);
const double dr = rnorm * rnorm * rnorm;
const double scale = kqq * dr;
Fx += scale * dx;
Fy += scale * dy;
Fz += scale * dz;
}
}
if(active)
fake_out[i] = norm3d(Fx, Fy, Fz);
}
It's nothing fancy and it doesn't solve the inherent issues with the O(N²) runtime. I made the following changes:
Get rid of the while loop. The loop counter was declared as int i, and the maximum grid dimension on all CUDA devices is 2^31-1, meaning we can always launch the entire grid with only one loop iteration per thread.
Given the quadratic runtime, we have no chance of ever running such a huge grid anyway. But if we did have one that is larger, we could just launch multiple kernels operating on subsets.
Use shared memory to buffer blocks. I picked 256 as a fixed block size; that tends to work well. 512 may be another size worth experimenting with.
The whole dr calculation can be folded into a single predefined math function (rnorm3d).
To get something that at least compiles into reasonable code, I added an output.
Double buffering
We can reduce the number of __syncthreads() that are required by using double buffering. However, that doubles the shared memory usage. Platforms that have only 64 kiB of shared memory will suffer limited occupancy. It requires benchmarking to see which version works better.
__global__ void __launch_bounds__(KERNEL_BLOCKSIZE)
kernel_dbuf(const double* x, const double* y, const double* z, int N,
double k, double q, double* fake_out)
{
const int i = blockIdx.x * blockDim.x + threadIdx.x;
const bool active = i < N;
double xi, yi, zi;
if(active)
xi = x[i], yi = y[i], zi = z[i];
const double kqq = k * q * q;
double Fx = 0., Fy = 0., Fz = 0.;
__shared__ double xt[2][KERNEL_BLOCKSIZE];
__shared__ double yt[2][KERNEL_BLOCKSIZE];
__shared__ double zt[2][KERNEL_BLOCKSIZE];
int dbuf = 0;
for(int j = 0; j < N; dbuf ^= 1, j += blockDim.x) {
const int thread_j = j + threadIdx.x;
if(thread_j < N) {
xt[dbuf][threadIdx.x] = x[thread_j];
yt[dbuf][threadIdx.x] = y[thread_j];
zt[dbuf][threadIdx.x] = z[thread_j];
}
__syncthreads();
for(int l = 0, M = min(KERNEL_BLOCKSIZE, N - j); l < M; ++l) {
const double dx = xi - xt[dbuf][l];
const double dy = yi - yt[dbuf][l];
const double dz = zi - zt[dbuf][l];
// 1 / sqrt(dx*dx + dy*dy + dz*dz)
const double rnorm = rnorm3d(dx, dy, dz);
const double dr = rnorm * rnorm * rnorm;
const double scale = kqq * dr;
Fx += scale * dx;
Fy += scale * dy;
Fz += scale * dz;
}
}
if(active)
fake_out[i] = norm3d(Fx, Fy, Fz);
}
Launch the kernel like this:
__host__ void
launch(const double* x, const double* y, const double* z, int N,
double k, double q, double* fake_out, cudaStream_t stream)
{
const int numBlocks = (N + KERNEL_BLOCKSIZE - 1) / KERNEL_BLOCKSIZE;
kernel<<<numBlocks, KERNEL_BLOCKSIZE, 0, stream>>>(x, y, z, N, k, q, fake_out);
}
Other thoughts
People have already commented on the inherent inefficiency of the algorithm
I guess there is a good reason why k and q are separate variables and you don't just pass a precomputed k * q * q to the kernel
Using doubles should always be a last resort when computing on a GPU, in my opinion. Possible avenues to reduce the precision, at least for parts of the algorithm:
Replace the dr computation with one that is less prone to overflows. Like this:
float scale = 1.f / max(max(abs(dx), abs(dy)), abs(dz));
float rnorm = rnorm3df(dx * scale, dy * scale, dz * scale) * scale;
float dr = rnorm * rnorm * rnorm;
Use Kahan (compensated) summation for Fx, Fy, Fz; a sketch of the pattern is shown after this list
Use double only for Fx, Fy, Fz but not x, y, z positions or other computations
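To illustrate the Kahan summation suggestion, here is a minimal C++ sketch of the compensated-accumulation pattern (my own illustration, not part of the kernels above); inside the CUDA kernel the same four lines would replace each Fx += ... statement, with one compensation variable per force component. Note that compensated summation only works if the compiler does not reorder the floating-point operations, so fast-math style flags must be avoided for this part.
#include <cstdio>
// Compensated (Kahan) accumulator: carries a correction term that recovers
// the low-order bits lost when adding a small value to a large running sum.
struct KahanAccumulator {
    double sum = 0.0; // running total
    double c = 0.0;   // running compensation
    void add(double value) {
        double y = value - c; // apply the stored correction
        double t = sum + y;   // low-order bits of y may be lost here
        c = (t - sum) - y;    // recover exactly what was lost
        sum = t;              // new, compensated total
    }
};
int main() {
    // Toy check: one million tiny contributions, as in the force loop over j.
    KahanAccumulator Fx;
    for (int j = 0; j < 1000000; ++j)
        Fx.add(1e-10);
    std::printf("Fx = %.17g\n", Fx.sum); // very close to 1e-4
    return 0;
}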
I have this for() loop where I randomize the selection of slices of a picture, to display 16 slices of an image in a random order.
I'm picking those slices from an array, and I have a variable that picks which slice is going to be selected from the array.
The problem is that I'd expect the random function to be triggered for every frame, but it's triggered only once.
Here's the code:
void setup() {
size(720,720);
slices = new PImage[16];
slices[0] = loadImage("1.png");
slices[1] = loadImage("2.png");
slices[2] = loadImage("3.png");
slices[3] = loadImage("4.png");
slices[4] = loadImage("5.png");
slices[5] = loadImage("6.png");
slices[6] = loadImage("7.png");
slices[7] = loadImage("8.png");
slices[8] = loadImage("9.png");
slices[9] = loadImage("10.png");
slices[10] = loadImage("11.png");
slices[11] = loadImage("12.png");
slices[12] = loadImage("13.png");
slices[13] = loadImage("14.png");
slices[14] = loadImage("15.png");
slices[15] = loadImage("16.png");
frameRate(1);
}
void draw() {
for (int a = 0; a < 16; a++){
int rand = int(random(slices.length));
image(slices[rand],x,y,size,size);
x += size;
if (a % 4 == 3){
y += size;
x = 0;
}
}
}
It's displaying the randomized slices only once, and then I end up with a fixed image. What I'd like is random slices appearing at every frame.
Thanks for your help!
You have 2 problems in your code.
First, you may not want to choose a random index.
This is because the same image could be chosen twice.
Instead, you could shuffle the array before drawing the images, like this:
for (int i = slices.length; i > 1; i--) {
//choose a random index for the i-th element to be swapped with
int j = (int)random(i);
//swap them
PImage temp = slices[j];
slices[j] = slices[i-1];
slices[i-1] = temp;
}
Second, the index is chosen on every frame, and the images are drawn, too, but you can't see it, because your code never resets y back to 0, meaning that they are below the screen.
You can fix this by adding
y = 0;
to the top or bottom of your draw().
Could it be because you've forgotten to clear the screen (e.g. by calling background()), meaning that once you've drawn an image it stays rendered?
You could also make use of the for loop in setup to avoid repeating yourself:
int numSlices = 16;
PImage[] slices = new PImage[numSlices];
float x, y;
float size = 180;
void setup() {
size(720, 720);
for(int i = 0 ; i < numSlices; i++){
slices[i] = loadImage((i+1) + ".png");
}
frameRate(1);
}
void draw() {
background(255);
for (int a = 0; a < numSlices; a++) {
int rand = int(random(numSlices));
image(slices[rand], x, y, size, size);
x += size;
if (a % 4 == 3) {
y += size;
x = 0;
}
}
y = 0;
}
Additionally you could easily format your code (via CMD+T on OSX or Ctrl+T on Windows/Linux)
Update: Kamakura (+1) correctly points out that y is not being reset to 0.
As a distraction, I thought I'd point you to IntList's shuffle() method:
int numSlices = 16;
PImage[] slices = new PImage[numSlices];
float x, y;
float size = 180;
IntList indices = new IntList();
void setup() {
size(720, 720);
for(int i = 0 ; i < numSlices; i++){
slices[i] = loadImage((i+1) + ".png");
indices.append(i);
}
frameRate(1);
}
void draw() {
background(255);
// shuffle list
indices.shuffle();
// reset y
y = 0;
for (int a = 0; a < numSlices; a++) {
int rand = indices.get(a);
image(slices[rand], x, y, size, size);
x += size;
if (a % 4 == 3) {
y += size;
x = 0;
}
}
}
An extra reason to play with it, other than the learning experience, is the fact that it avoids repeating the same random index.
Regarding slicing/shuffling, here's a modified version of the Load and Display example:
/**
* Load and Display
*
* Images can be loaded and displayed to the screen at their actual size
* or any other size.
*/
PImage img; // Declare variable "a" of type PImage
// shuffled image
PImage imgShuffled;
// list of indices to shuffle
IntList shuffleIndices = new IntList();
// configure image slicing rows/columns
int rows = 4;
int cols = 4;
// total sections
int numSections = rows * cols;
// image section dimensions
int sectionWidth;
int sectionHeight;
void setup() {
size(640, 360);
frameRate(1);
// The image file must be in the data folder of the current sketch
// to load successfully
img = loadImage("https://processing.org/examples/moonwalk.jpg"); // Load the image into the program
// calculate section dimensions
sectionWidth = img.width / cols;
sectionHeight = img.height / rows;
// allocate a separate image to copy shuffled pixels into
imgShuffled = createImage(img.width, img.height, RGB);
// populate image section indices
for(int i = 0 ; i < numSections; i++){
shuffleIndices.append(i);
}
}
void shuffleImage(){
// shuffle the list
shuffleIndices.shuffle();
// Ta-da!
println(shuffleIndices);
// loop through each section
for(int i = 0 ; i < numSections; i++){
// index to row, col conversion
int srcCol = i % cols;
int srcRow = i / cols;
// convert to pixel coordinates to copy from
int srcX = srcCol * sectionWidth;
int srcY = srcRow * sectionHeight;
// get random / shuffled index
int index = shuffleIndices.get(i);
// same row, col, to pixel conversion to copy to
int dstCol = index % cols;
int dstRow = index / cols;
int dstX = dstCol * sectionWidth;
int dstY = dstRow * sectionHeight;
// copy from original image to shuffled pixel coordinates
imgShuffled.copy(img,srcX,srcY,sectionWidth,sectionHeight,dstX,dstY,sectionWidth,sectionHeight);
}
}
void draw() {
shuffleImage();
// Displays the image at its actual size at point (0,0)
image(imgShuffled, 0, 0);
}
I've read a frame encoded with H.264, decoded it, and converted it to YUV420P; the data is stored in frameYUV420->data (the type of the frame is AVFrame). I want to save that data into a file that can be displayed with GIMP, for example.
I know how to save the RGB24 pixel format, but I'm not quite sure how to do YUV420P. I do know that the Y component takes width x height bytes, and Cb/Cr each take (width/2) x (height/2) bytes. So I'm guessing I need to write the Y data first, and after that the Cb and Cr data. Does anyone have finished code I could take a look at?
void SaveAvFrame(AVFrame *avFrame)
{
FILE *fDump = fopen("...", "ab");
uint32_t pitchY = avFrame->linesize[0];
uint32_t pitchU = avFrame->linesize[1];
uint32_t pitchV = avFrame->linesize[2];
uint8_t *avY = avFrame->data[0];
uint8_t *avU = avFrame->data[1];
uint8_t *avV = avFrame->data[2];
// write the Y plane row by row (the line size / pitch may be larger than the visible width)
for (int i = 0; i < avFrame->height; i++) {
fwrite(avY, avFrame->width, 1, fDump);
avY += pitchY;
}
// the U and V planes are subsampled by 2 in both dimensions
for (int i = 0; i < avFrame->height/2; i++) {
fwrite(avU, avFrame->width/2, 1, fDump);
avU += pitchU;
}
for (int i = 0; i < avFrame->height/2; i++) {
fwrite(avV, avFrame->width/2, 1, fDump);
avV += pitchV;
}
fclose(fDump);
}
int saveYUVFrameToFile(AVFrame* frame, int width, int height)
{
FILE* fileHandle;
int y, writeError;
char filename[32];
static int frameNumber = 0;
sprintf(filename, "frame%d.yuv", frameNumber);
fileHandle = fopen(filename, "wb");
if (fileHandle == NULL)
{
printf("Unable to open %s...\n", filename);
return ERROR;
}
/*Writing Y plane data to file.*/
for (y = 0; y < height; y++)
{
writeError = fwrite(frame->data[0] + y*frame->linesize[0], 1, width, fileHandle);
if (writeError != width)
{
printf("Unable to write Y plane data!\n");
return ERROR;
}
}
/*Dividing by 2.*/
height >>= 1;
width >>= 1;
/*Writing U plane data to file.*/
for (y = 0; y < height; y++)
{
writeError = fwrite(frame->data[1] + y*frame->linesize[1], 1, width, fileHandle);
if (writeError != width)
{
printf("Unable to write U plane data!\n");
return ERROR;
}
}
/*Writing V plane data to file.*/
for (y = 0; y < height; y++)
{
writeError = fwrite(frame->data[2] + y*frame->linesize[2], 1, width, fileHandle);
if (writeError != width)
{
printf("Unable to write V plane data!\n");
return ERROR;
}
}
fclose(fileHandle);
frameNumber++;
return NO_ERROR;
}
Basically, this is what I came up with using several examples provided by FFmpeg and Stack Overflow users.
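If you just want to check the dumped planes, a raw YUV file like this can also be played back with FFmpeg's ffplay by specifying the raw format, pixel format and frame size by hand, for example (assuming a 1280x720 frame; adjust to your resolution): ffplay -f rawvideo -pixel_format yuv420p -video_size 1280x720 frame0.yuv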
I'm not sure if it is possible in Processing, but I would like to be able to zoom in on the fractal without it being extremely laggy and buggy. What I currently have is:
int maxIter = 100;
float zoom = 1;
float x0 = width/2;
float y0 = height/2;
void setup(){
size(500,300);
noStroke();
smooth();
}
void draw(){
translate(x0, y0);
scale(zoom);
for(float Py = 0; Py < height; Py++){
for(float Px = 0; Px < width; Px++){
// scale pixel coordinates to Mandelbrot scale
float w = width;
float h = height;
float xScaled = (Px * (3.5/w)) - 2.5;
float yScaled = (Py * (2/h)) - 1;
float x = 0;
float y = 0;
int iter = 0;
while( x*x + y*y < 2*2 && iter < maxIter){
float tempX = x*x - y*y + xScaled;
y = 2*x*y + yScaled;
x = tempX;
iter += 1;
}
// color pixels
color c;
c = pickColor(iter);
rect(Px, Py,1,1);
fill(c);
}
}
}
// pick color based on time pixel took to escape (number of iterations through loop)
color pickColor(int iters){
color b = color(0,0,0);
if(iters == maxIter) return b;
int l = 1;
color[] colors = new color[maxIter];
for(int i = 0; i < colors.length; i++){
switch(l){
case 1 : colors[i] = color(255,0,0); break;
case 2 : colors[i] = color(0,0,255); break;
case 3 : colors[i] = color(0,255,0); break;
}
if(l == 1 || l == 2) l++;
else if(l == 3) l = 1;
else l--;
}
return colors[iters];
}
// allow zooming in and out
void mouseWheel(MouseEvent event){
float direction = event.getCount();
if(direction < 0) zoom += .02;
if(direction > 0) zoom -= .02;
}
// allow dragging back and forth to change view
void mouseDragged(){
x0+= mouseX-pmouseX;
y0+= mouseY-pmouseY;
}
but it doesn't work very well. It works alright at the size and max iteration I have it set to now (but still not well) and is completely unusable at larger sizes or higher maximum iterations.
The G4P library has an example that does exactly this. Download the library and go to the G4P_MandelBrot example. The example can be found online here.
Hope this helps!
Given a distance transform of a map with obstacles in it, how do I get the least-cost path from a start pixel to a goal pixel? Each pixel of the distance transform image holds the (Euclidean) distance to the nearest obstacle of the original map, i.e. if pixel (i,j) of the original map is 3 pixels to the left of and 2 pixels below an obstacle, then in the distance transform that pixel will have sqrt(9+4) = sqrt(13) as its value. Thus darker pixels signify proximity to obstacles and lighter values signify that they are far from obstacles.
I want to plan a path from a given start pixel to a goal pixel using the information provided by this distance transform, optimizing the cost of the path; there is also the constraint that the path should never reach a pixel that is less than 'x' pixels away from an obstacle.
How do I go about this?
P.S. A bit of description on the algorithm (or a bit of code) would be helpful as I am new to planning algorithms.
I found an algorithm in Appendix I of the chapter titled
JARVIS, Ray. Distance transform based path planning for robot navigation. Recent trends in mobile robots, 1993, 11: 3-31.
That chapter is fully visible to me in Google Books, and the book is
ZHENG, Yuang F. (ed.). Recent trends in mobile robots. World Scientific, 1993.
A C++ implementation of the algorithm follows:
#include <vector>
#include <iostream>
#include <cmath>
#include <algorithm>
#include <cassert>
#include <sstream>
/**
Algorithm in the appendix I of the chapter titled
JARVIS, Ray. Distance transform based path planning for robot navigation. *Recent trends in mobile robots*, 1993, 11: 3-31.
in the book
ZHENG, Yuang F. (ed.). *Recent trends in mobile robots*. World Scientific, 1993.
See also http://stackoverflow.com/questions/21215244/least-cost-path-using-a-given-distance-transform
*/
template < class T >
class Matrix
{
private:
int m_width;
int m_height;
std::vector<T> m_data;
public:
Matrix(int width, int height) :
m_width(width), m_height(height), m_data(width *height) {}
int width() const
{
return m_width;
}
int height() const
{
return m_height;
}
void set(int x, int y, const T &value)
{
m_data[x + y * m_width] = value;
}
const T &get(int x, int y) const
{
return m_data[x + y * m_width];
}
};
float distance( const Matrix< float > &a, const Matrix< float > &b )
{
assert(a.width() == b.width());
assert(a.height() == b.height());
float r = 0;
for ( int y = 0; y < a.height(); y++ )
{
for ( int x = 0; x < a.width(); x++ )
{
r += fabs(a.get(x, y) - b.get(x, y));
}
}
return r;
}
int PPMGammaEncode(float radiance, float d)
{
//return int(std::pow(std::min(1.0f, std::max(0.0f, radiance * d)),1.0f / 2.2f) * 255.0f);
return radiance;
}
void PPM_image_save(const Matrix<float> &img, const std::string &filename, float d = 15.0f)
{
FILE *file = fopen(filename.c_str(), "wt");
const int m_width = img.width();
const int m_height = img.height();
fprintf(file, "P3 %d %d 255\n", m_width, m_height);
for (int y = 0; y < m_height; ++y)
{
fprintf(file, "\n# y = %d\n", y);
for (int x = 0; x < m_width; ++x)
{
const float &c(img.get(x, y));
fprintf(file, "%d %d %d\n",
PPMGammaEncode(c, d),
PPMGammaEncode(c, d),
PPMGammaEncode(c, d));
}
}
fclose(file);
}
void PPM_image_save(const Matrix<bool> &img, const std::string &filename, float d = 15.0f)
{
FILE *file = fopen(filename.c_str(), "wt");
const int m_width = img.width();
const int m_height = img.height();
fprintf(file, "P3 %d %d 255\n", m_width, m_height);
for (int y = 0; y < m_height; ++y)
{
fprintf(file, "\n# y = %d\n", y);
for (int x = 0; x < m_width; ++x)
{
float v = img.get(x, y) ? 255 : 0;
fprintf(file, "%d %d %d\n",
PPMGammaEncode(v, d),
PPMGammaEncode(v, d),
PPMGammaEncode(v, d));
}
}
fclose(file);
}
void add_obstacles(Matrix<bool> &m, int n, int avg_lenght, int sd_lenght)
{
int side = std::max(3, std::min(m.width(), m.height()) / 10);
for ( int y = m.height() / 2 - side / 2; y < m.height() / 2 + side / 2; y++ )
{
for ( int x = m.width() / 2 - side / 2; x < m.width() / 2 + side / 2; x++ )
{
m.set(x, y, true);
}
}
/*
for ( int y = m.height()/2-side/2; y < m.height()/2+side/2; y++ ) {
for ( int x = 0; x < m.width()/2+side; x++ ) {
m.set(x,y,true);
}
}
*/
for ( int y = 0; y < m.height(); y++ )
{
m.set(0, y, true);
m.set(m.width() - 1, y, true);
}
for ( int x = 0; x < m.width(); x++ )
{
m.set(x, 0, true);
m.set(x, m.height() - 1, true);
}
}
class Info
{
public:
Info() {}
Info(float v, int x_o, int y_o): value(v), x_offset(x_o), y_offset(y_o) {}
float value;
int x_offset;
int y_offset;
bool operator<(const Info &rhs) const
{
return value < rhs.value;
}
};
void next(const Matrix<float> &m, const int &x, const int &y, int &x_n, int &y_n)
{
// todo: choose the diagonal adjacent in case of ties.
x_n = x;
y_n = y;
Info neighbours[8];
neighbours[0] = Info(m.get(x - 1, y - 1), -1, -1);
neighbours[1] = Info(m.get(x , y - 1), 0, -1);
neighbours[2] = Info(m.get(x + 1, y - 1), +1, -1);
neighbours[3] = Info(m.get(x - 1, y ), -1, 0);
neighbours[4] = Info(m.get(x + 1, y ), +1, 0);
neighbours[5] = Info(m.get(x - 1, y + 1), -1, +1);
neighbours[6] = Info(m.get(x , y + 1), 0, +1);
neighbours[7] = Info(m.get(x + 1, y + 1), +1, +1);
auto the_min = *std::min_element(neighbours, neighbours + 8);
x_n += the_min.x_offset;
y_n += the_min.y_offset;
}
int main(int, char **)
{
std::size_t xMax = 200;
std::size_t yMax = 150;
Matrix<float> cell(xMax + 2, yMax + 2);
Matrix<bool> start(xMax + 2, yMax + 2);
start.set(0.1 * xMax, 0.1 * yMax, true);
Matrix<bool> goal(xMax + 2, yMax + 2);
goal.set(0.9 * xMax, 0.9 * yMax, true);
Matrix<bool> blocked(xMax + 2, yMax + 2);
add_obstacles(blocked, 1, 1, 1);
PPM_image_save(blocked, "blocked.ppm");
PPM_image_save(start, "start.ppm");
PPM_image_save(goal, "goal.ppm");
for ( int y = 0; y <= yMax + 1; y++ )
{
for ( int x = 0; x <= xMax + 1; x++ )
{
if ( goal.get(x, y) )
{
cell.set(x, y, 0.);
}
else
{
cell.set(x, y, xMax * yMax);
}
}
}
Matrix<float> previous_cell = cell;
float values[5];
int cnt = 0;
do
{
std::ostringstream oss;
oss << "cell_" << cnt++ << ".ppm";
PPM_image_save(cell, oss.str());
previous_cell = cell;
for ( int y = 2; y <= yMax; y++ )
{
for ( int x = 2; x <= xMax; x++ )
{
if (!blocked.get(x, y))
{
values[0] = cell.get(x - 1, y ) + 1;
values[1] = cell.get(x - 1, y - 1) + 1;
values[2] = cell.get(x , y - 1) + 1;
values[3] = cell.get(x + 1, y - 1) + 1;
values[4] = cell.get(x , y );
cell.set(x, y, *std::min_element(values, values + 5));
}
}
}
for ( int y = yMax - 1; y >= 1; y-- )
{
for ( int x = xMax - 1; x >= 1; x-- )
{
if (!blocked.get(x, y))
{
values[0] = cell.get(x + 1, y ) + 1;
values[1] = cell.get(x + 1, y + 1) + 1;
values[2] = cell.get(x , y + 1) + 1;
values[3] = cell.get(x - 1, y + 1) + 1;
values[4] = cell.get(x , y );
cell.set(x, y, *std::min_element(values, values + 5));
}
}
}
}
while (distance(previous_cell, cell) > 0.);
PPM_image_save(cell, "cell.ppm");
Matrix<bool> path(xMax + 2, yMax + 2);
for ( int y_s = 1; y_s <= yMax; y_s++ )
{
for ( int x_s = 1; x_s <= xMax; x_s++ )
{
if ( start.get(x_s, y_s) )
{
int x = x_s;
int y = y_s;
while (!goal.get(x, y))
{
path.set(x, y, true);
int x_n, y_n;
next(cell, x, y, x_n, y_n);
x = x_n;
y = y_n;
}
}
}
}
PPM_image_save(path, "path.ppm");
return 0;
}
The program uses the simple PPM image format, explained for example in Chapter 15 of the book Computer Graphics: Principles and Practice - Third Edition by Hughes et al., in order to save the images.
The algorithm starts from the image of the obstacles (blocked) and computes from it the distance transform (cell); then, starting from the distance transform, it computes the optimal path with a steepest descent method: it walks downhill in the distance transform potential field. So you can start from your own distance transform image.
Please note that it seems to me that the algorithm does not fulfill your additional constraint that:
the path should never reach a pixel which is less than 'x' pixels away from an obstacle.
The following PNG image shows the obstacles; the program-generated blocked.ppm image was exported to PNG via GIMP:
The following PNG image shows the start point; the program-generated start.ppm image was exported to PNG via GIMP:
The following PNG image shows the end point; the program-generated goal.ppm image was exported to PNG via GIMP:
The following PNG image shows the computed path; the program-generated path.ppm image was exported to PNG via GIMP:
The following PNG image shows the distance transform; the program-generated cell.ppm image was exported to PNG via GIMP:
I found Jarvis' article after having a look at
CHIN, Yew Tuck, et al. Vision guided agv using distance transform. In: Proceedings of the 32nd ISR (International Symposium on Robotics). 2001. p. 21.
Update:
Jarvis' algorithm is reviewed in the following paper, where the authors state that:
Since the path is found by choosing locally only between neighbour cells, the obtained path can be sub optimal
ELIZONDO-LEAL, Juan Carlos; PARRA-GONZÁLEZ, Ezra Federico; RAMÍREZ-TORRES, José Gabriel. The Exact Euclidean Distance Transform: A New Algorithm for Universal Path Planning. Int J Adv Robotic Sy, 2013, 10.266.
For a graph-based solution you can check for example chapter 15 of the book
DE BERG, Mark, et al. Computational geometry. Springer Berlin Heidelberg, 2008.
which is titled "Visibility Graphs - Finding the Shortest Route" and is freely available at the publisher's site.
The chapter explains how to compute the Euclidean shortest path starting from the so-called visibility graph. The visibility graph is computed starting from the set of obstacles, each obstacle is described as a polygon.
The Euclidean shortest path is then found applying a shortest path algorithm such as Dijkstra's algorithm to the visibility graph.
In your distance transform image the obstacles are represented by pixels with zero value and so you can try to approximate them as polygons and after that apply the method described in the cited book.
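If you would rather not approximate the obstacles as polygons, a simpler grid-based alternative (generally not as short as the true Euclidean shortest path that the visibility graph gives) is to run Dijkstra's algorithm directly on the pixel grid, treating every pixel whose distance-transform value is below the required clearance 'x' as blocked. The following is only a minimal C++ sketch of that idea, with unit step costs and 4-connectivity; all names are mine, not from the cited book:
#include <algorithm>
#include <cstdio>
#include <limits>
#include <queue>
#include <utility>
#include <vector>
// Dijkstra on a W x H grid. dist_transform holds the distance-transform values
// (distance to the nearest obstacle); any cell below 'clearance' is treated as
// blocked. Returns the path as cell indices (y * W + x), or empty if unreachable.
std::vector<int> gridDijkstra(const std::vector<float> &dist_transform,
                              int W, int H, float clearance, int start, int goal)
{
    const float INF = std::numeric_limits<float>::infinity();
    std::vector<float> cost(W * H, INF);
    std::vector<int> parent(W * H, -1);
    typedef std::pair<float, int> Node; // (cost so far, cell index)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node> > pq;
    cost[start] = 0.f;
    pq.push(Node(0.f, start));
    const int dx[4] = { 1, -1, 0, 0 };
    const int dy[4] = { 0, 0, 1, -1 };
    while (!pq.empty()) {
        Node top = pq.top(); pq.pop();
        int u = top.second;
        if (top.first > cost[u]) continue; // stale queue entry
        if (u == goal) break;
        int ux = u % W, uy = u / W;
        for (int k = 0; k < 4; ++k) {
            int vx = ux + dx[k], vy = uy + dy[k];
            if (vx < 0 || vx >= W || vy < 0 || vy >= H) continue;
            int v = vy * W + vx;
            if (dist_transform[v] < clearance) continue; // too close to an obstacle
            // Unit step cost; adding e.g. 1.f / dist_transform[v] here would
            // additionally push the path away from obstacles.
            float nc = cost[u] + 1.f;
            if (nc < cost[v]) {
                cost[v] = nc;
                parent[v] = u;
                pq.push(Node(nc, v));
            }
        }
    }
    std::vector<int> path;
    if (cost[goal] == INF) return path; // goal not reachable with this clearance
    for (int v = goal; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
int main() {
    // Toy 5x5 map: clearance value 2 everywhere except a vertical wall (value 0)
    // at x = 2 with a gap in the bottom row.
    int W = 5, H = 5;
    std::vector<float> dt(W * H, 2.f);
    for (int y = 0; y < 4; ++y) dt[y * W + 2] = 0.f;
    std::vector<int> path = gridDijkstra(dt, W, H, 1.f, 0, W * H - 1);
    for (int i = 0; i < (int)path.size(); ++i)
        std::printf("(%d, %d)\n", path[i] % W, path[i] / W);
    return 0;
}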
In the distance transform map of pixels, you choose your starting pixel and then select its neighbour with a lower value than the starting pixel; repeat the process until the goal pixel is reached (a pixel with value zero).
Normally the goal pixel has value zero, the lowest value of any passable node.
The problem of not passing close to barriers is solved by generating the distance transform map so that the barriers are enlarged. For example, if you want a distance of two pixels to any barrier, just add two pixels of barrier value around each barrier. Normally the barriers which cannot be passed are given a value of minus one.
The same value is used for the edges. An alternative approach is to surround barriers with a very high starting value; the required clearance is not guaranteed, but the algorithm will try to avoid paths in the proximity of barriers.
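To make the barrier-enlargement idea concrete, here is a minimal C++ sketch (my own illustration, not taken from any of the cited papers) that dilates a plain 0/1 obstacle mask by a given clearance radius; the distance transform and the downhill walk are then computed on the enlarged mask, so the resulting path keeps the required distance from the original obstacles (the answer above suggests marking impassable cells with minus one, which would be a trivial change):
#include <cstdio>
#include <vector>
// Enlarge the barriers: every pixel within 'clearance' (Euclidean) of an
// obstacle pixel becomes an obstacle itself, so a path planned on the
// enlarged map automatically keeps the required distance.
std::vector<int> enlarge_barriers(const std::vector<int> &blocked, int W, int H, int clearance)
{
    std::vector<int> enlarged(blocked); // start from the original mask
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            if (!blocked[y * W + x]) continue; // only dilate around obstacle pixels
            for (int dy = -clearance; dy <= clearance; dy++) {
                for (int dx = -clearance; dx <= clearance; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
                    if (dx * dx + dy * dy <= clearance * clearance)
                        enlarged[ny * W + nx] = 1; // part of the enlarged barrier
                }
            }
        }
    }
    return enlarged;
}
int main() {
    // Toy 7x7 map with a single obstacle pixel in the middle and a clearance of 2.
    int W = 7, H = 7;
    std::vector<int> blocked(W * H, 0);
    blocked[3 * W + 3] = 1;
    std::vector<int> enlarged = enlarge_barriers(blocked, W, H, 2);
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) std::printf("%d", enlarged[y * W + x]);
        std::printf("\n");
    }
    return 0;
}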