Implementing the Shih-Wu Euclidean distance transform: what is R(p) used for? - algorithm

I'm attempting to implement the Shih-Wu distance transform algorithm, as described on page 5 of the pdf. It looks fairly simple but I'm hampered by my limited math (or possibly my reading comprehension ability).
I think I have it all except for one question:
In the algorithm, how is R(p) used? It is meticulously calculated using h(p,q) and G(p,q), and then appears not to be used anywhere.
I'm sure it's explained somewhere in the proof, but the math is opaque to me, and I don't see R mentioned in the lead-in to the algorithm.

In the definitions on page 4 it says:
R(p): The relative-coordinates vector R(p) = (Rx, Ry) of pixel p,
which records the horizontal and vertical pixel-distances between p
and the closest background pixel. It is initialized as all (0,0). Note
that, Rx(p) and Ry(p) indicate the horizontal and vertical
pixel-distances, respectively.
At every pixel, the algorithm calculates h() using the R values already saved in the neighboring pixels, and then saves the R value for that pixel so that it can be used in the calculations for the next pixel.

Related

Linear depth buffer

Many people use the usual perspective matrix, with a third row like this:
(0 0 (n+f)/(n-f) 2*n*f/(n-f))
But it has float-precision problems near the far clipping plane. The result is z-fighting.
What about using a linear transformation of z? Let's change the third row of the matrix to this:
(0 0 -2/(f-n) (-f-n)/(f-n))
This is a linear transformation of z from [-n, -f] to [-1, 1]. Then we add this line to the vertex shader:
gl_Position.z *= gl_Position.w;
After the perspective divide, the z value is restored.
Why isn't this used everywhere? I found a lot of articles on the internet, and all of them use the usual matrix.
Does the linear transformation I described have problems that I don't see?
Update: This is not a duplicate of this. My question is not about how to make a linear depth buffer. In my case, the buffer is already linear. What I don't understand is why this method isn't used. Are there traps in the inner WebGL pipeline?
The approach you're describing simply doesn't work. One advantage of a hyperbolic Z buffer is that we can interpolate the resulting depth values linearly in screen space. If you multiply gl_Position.z by gl_Position.w, the resulting z value will not be linear in screen space any more, but the depth test will still use linearly interpolated values. This results in your primitives becoming bent in the z-dimension, leading to completely wrong occlusions and intersections between nearby primitives (especially if the vertices of one primitive lie near the center of the other).
The only way to use a linear depth buffer is to actually do the non-linear interpolation for the Z value yourself in the fragment shader. This can be done (and boils down to just linearly transforming the perspective-corrected interpolated w value for each fragment, hence it is sometimes called "W buffering"), but you lose the benefits of the early Z test and, much worse, of the hierarchical depth test.
An interesting way to improve the precision of the depth test is to use a floating point buffer in combination with a reversed Z projection matrix, as explained in this Depth Precision Visualized blog article.
UPDATE
From your comment:
Depth in screen space is linear interpolation of NDC, how I understand form here. In my case, it will be linear interpolation of linear interpolation of z from camera space. Thus, depth in screen space interpolated already.
You misunderstood this. My main point was that linear interpolation in screen space is only valid if you're using Z values which are already hyperbolically distorted (like NDC Z). If you want to use eye-space Z, it cannot be linearly interpolated. I made some drawings of the situation:
This is a top-down view on eye-space and NDC. All drawings are actually to scale. The green ray is a view ray going through some pixel. This pixel happens to be the one which directly represents the mid-point of that one triangle (green point).
After the projection matrix is applied and the division by w has happened, we are in normalized device coordinates. Note that the direction of the viewing ray is now just +z, and the view rays of all pixels have become parallel (so that we can just ignore Z when rasterizing). Due to the hyperbolic relation of the z value, the green point no longer lies exactly at the center, but is squeezed towards the far plane. However, the important point is that this point now lies on the straight line formed by the (hyperbolically distorted) end points of the primitive, hence we can simply interpolate z_ndc linearly in screen space.
If you use a linear depth buffer, the green point now lies at z in the center of the primitive again, but that point is not on the straight line - you actually bend your primitives.
Since the depth test will use a linear interpolation, it will just get the points as in the rightmost drawing as input from the vertex shader, but will interpolate them linearly - connecting those points by straight lines. As a result, the intersection between primitives will not be where it actually has to be.
Another way to think of this: Imagine you draw some primitive which extends into the z-dimension, with some perspective projection. Due to perspective, stuff that is farther away will appear smaller. So if you just go one pixel to the right in screen space, the z extent covered by that step will actually be bigger if the primitive is far away, while it will become smaller and smaller the closer you get. So if you just go in equal-sized steps to the right, the z-steps you're making will vary depending on the orientation and position of your primitive. However, we want to use a linear interpolation, so we want to make the same z step size for every x step. The only way to do this is by distorting the space z is in, and the hyperbolic distortion introduced by the division by w does exactly that.
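To put numbers on the argument above, here is a small sketch that compares the two depth mappings at the pixel halfway between the projected endpoints of one edge. It assumes a 2D (x, z) eye space, a focal length of 1 (so screen x is x / w), and example values for n, f and the endpoints; the function names are made up for this illustration.

```python
def hyperbolic_depth(z_eye, n, f):
    """NDC depth from the usual row (0 0 (n+f)/(n-f) 2*n*f/(n-f)), after the divide by w."""
    z_clip = (n + f) / (n - f) * z_eye + 2 * n * f / (n - f)
    w_clip = -z_eye
    return z_clip / w_clip


def linear_depth(z_eye, n, f):
    """Depth from the row (0 0 -2/(f-n) (-f-n)/(f-n)); z *= w in the shader cancels the divide."""
    return -2 / (f - n) * z_eye + (-f - n) / (f - n)


def compare_at_screen_midpoint(p0, p1, n, f):
    """p0, p1 are (x_eye, z_eye) endpoints of an edge (z_eye < 0).

    Returns {name: (true depth, linearly interpolated depth)} at the pixel
    halfway between the two projected endpoints.
    """
    (x0, z0), (x1, z1) = p0, p1
    xs = 0.5 * (x0 / -z0 + x1 / -z1)       # screen-space midpoint
    dx, dz = x1 - x0, z1 - z0
    t = -(xs * z0 + x0) / (dx + xs * dz)    # eye-space parameter that projects to xs
    z_true = z0 + t * dz                    # eye-space z actually seen at that pixel
    result = {}
    for name, depth in (("hyperbolic", hyperbolic_depth), ("linear", linear_depth)):
        interpolated = 0.5 * (depth(z0, n, f) + depth(z1, n, f))  # what the rasterizer computes
        result[name] = (depth(z_true, n, f), interpolated)
    return result


result = compare_at_screen_midpoint((-1.0, -2.0), (1.0, -10.0), n=1.0, f=100.0)
for name, (true_depth, interpolated) in result.items():
    print(f"{name}: true={true_depth:.4f} interpolated={interpolated:.4f}")
```

With the hyperbolic mapping the two numbers agree; with the linear mapping they do not, which is the "bent primitive" effect described above.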
We don't use a linear transformation because that will have precision problems at all distances equally. At least now, the precision problems only show up far away, where you're less likely to notice. A linear mapping spaces the error out evenly, which makes errors more likely to happen close to the camera.

Matrix of pixels to coordinates

I have to convert a given matrix of pixels (coefficients are in a range from 0 to 255, since the matrix corresponds to a black and white image) into two lists. Both of them may be composed of lists, one containing the abscissas of the points, the other the ordinates.
As you can see in the included picture, the first case corresponds to a single curve, whereas the other two involve multiple curves crossing each other. The algorithm should be able to distinguish between the two or three curves (in the last two examples), so that in the two main lists a given sublist corresponds to a given curve.
I have absolutely no idea of what to start from...
One last thing : I'm seeking ideas on how to program this algorithm, so this is why I didn't add any specific programming language (if code may help any explanation, feel free to speak any language).
Thanks in advance >^.^<
Check out the Hough transform. It is a simple voting algorithm, that allows finding simple geometric shapes in images. One complication could be that your lines are not strictly straight. But it would give you equations on the lines it does find. Since your case is a little nonstandard I'd try to understand the algorithm itself and write my own implementation.
In my first implementation (centering a circle on a square in a long-focal-depth image I took), I started with a very simple Python example I found online, rewrote it for my purposes, and later moved to C# for speed, since I needed more parameters (a higher-dimensional search space) than you need for this simple case.
In your case I would start with the simple assumption of a straight line. Then the Hough transform will give 1, 2 and 3 maxima respectively for your three cases.
The idea of the Hough transform is well described on Wikipedia.
Here just the gist of the idea:
For a straight line think of giving each black pixel the right to vote
for 180 possible lines that could go through it (one for each angle in
single degree steps), then plotting the vote as histogram over a 2d space, where one
dimension is the angle of the line, another is the distance from
origin (using the Hesse normal form of the line for practical reasons
rather than the common y= m x +b) and the z-dimension is the number of votes. The actual line formed by the black
pixels will get more votes than any other possible line, so you are
simply looking for the Maximum vote location in the transformation
space (say in Python/numpy it would be argmax).
If there are two lines, you will find two clear maxima, the higher one with the longer or thicker line (more votes). You can then start playing with grayscale in your image, giving non-integer votes to pixels. You can also play with the resolution of the angle, depending on the content of your problem.
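As a rough illustration of the voting scheme described above, here is a minimal numpy sketch for straight lines. It assumes the image is a 2D array in which the pixels that should vote are nonzero (so a black-on-white image would need inverting first); the function names and the one-pixel rho resolution are choices made for this example only.

```python
import numpy as np


def hough_lines(image, n_angles=180):
    """Accumulate votes over (rho, theta) for every nonzero pixel of a 2D array."""
    ys, xs = np.nonzero(image)
    thetas = np.deg2rad(np.arange(n_angles) * (180.0 / n_angles))
    diag = int(np.ceil(np.hypot(*image.shape)))   # largest possible |rho|
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_angles), dtype=np.int64)
    for j, theta in enumerate(thetas):
        # Hesse normal form: rho = x*cos(theta) + y*sin(theta)
        rho = np.rint(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(acc, (rho, j), 1)               # one vote per pixel per angle
    return acc, thetas, rhos


def strongest_line(image):
    """Return (rho, theta in degrees) of the accumulator cell with the most votes."""
    acc, thetas, rhos = hough_lines(image)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return rhos[r], np.rad2deg(thetas[t])
```

To find two or three lines, one would look for several well-separated maxima in acc rather than a single argmax.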

Curvature estimation from image

I have images like these:
In these images, the red line is what I want to get from the image. The original images do not have the red lines, only the green road.
What I want is to estimate the curve from the image in the form of the coefficients of an equation: A x^2 + B x + C = 0. The images can contain noise (black holes on the edges, as you see above).
I have tried to solve this using the least squares method (LSM), but there are two problems with this approach:
The method is too slow even on a PC, because the number of points is high.
The road is too wide in the following case:
The curve in the left image is recognized correctly, but the one in the right image is not. The reason, I suppose, is that the road is too wide and too short.
As a solution for both cases I want to make the road narrow. Ideally it would be the red line in the images above. Alternatively, I want to use LSM for line detection (A x + B = 0) to reduce processing time.
I have tried eroding the image, but that is the wrong approach.
Skeletonization is also not the right solution.
Any ideas about how to achieve the desired result (make the road narrow)?
Or any ideas of another approach for this problem?
If you can rely on always having one axis as the dependent variable in your fit (looks like it should be the x axis in the above "correct" examples, although your bottom right failure seems to be using y), then you could do something like this:
for each scanline y, pick the median x of the non-black pixels
if there are no non-black pixels (or fewer than some chosen noise threshold), skip the line
You now have a list of (x,y) pairs, at most as many as there are scan lines. These represent guesses as to the midpoint of the road at each level. Fit a low order polynomial x=f(y) (I'd go for linear or cubic, but you could do quadratic if you prefer) to these points by least squares.
For the sorts of images you've shown, the detail is very coarse, so you might be able to manage with just a subset of points. But even without that the processing cost should be reasonable unless you're using very constrained hardware.
If left-right paths occur often then you could fit both ways and then apply some kind of goodness of fit criterion. If paths loop back on themselves often, then this sort of midpoint approach won't give you a good answer, but then you're onto a loser with the fitting anyway.
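A minimal sketch of the median-per-scanline idea, assuming the road has already been segmented into a 2D boolean mask (True where the road is). The function name, the noise threshold and the default degree are illustrative choices, not something taken from the question.

```python
import numpy as np


def road_centerline(mask, min_pixels=3, degree=2):
    """Fit x = f(y) through the per-scanline median of the road pixels.

    mask: 2D boolean array, True on road pixels.
    Returns polynomial coefficients, highest power first (as np.polyfit does).
    """
    ys, xs = [], []
    for y, row in enumerate(mask):
        cols = np.flatnonzero(row)
        if cols.size < min_pixels:
            continue                      # empty or too noisy: skip this scanline
        ys.append(y)
        xs.append(np.median(cols))        # guess for the road midpoint on this line
    return np.polyfit(ys, xs, degree)
```

Because there is at most one point per scanline, the least squares fit only sees a few hundred points even for a large image.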

Reproducing images with primitive shapes. (Graphics optimization problem)

Based on this original idea, that many of you have probably seen before:
http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/
I wanted to try taking a different approach:
You have a target image. Let's say you can add one triangle at a time. There exists some triangle (or triangles in case of a tie) that maximizes the image similarity (fitness function). If you could brute force through all possible shapes and colors, you would find it. But that is prohibitively expensive. Searching all triangles is a 10-dimensional space: x1, y1, x2, y2, x3, y3, r, g, b, a.
I used simulated annealing with pretty good results. But I'm wondering if I can further improve on this. One thought was to actually analyze the image difference between the target image and current image and look for "hot spots" that might be good places to put a new triangle.
What algorithm would you use to find the optimal triangle (or other shape) that maximizes image similarity?
Should the algorithm vary to handle coarse details and fine details differently? I haven't let it run long enough to start refining the finer image details. It seems to get "shy" about adding new shapes the longer it runs... it uses low alpha values (very transparent shapes).
Target Image and Reproduced Image (28 Triangles):
Edit! I had a new idea. If shape coordinates and alpha value are given, the optimal RGB color for the shape can be computed by analyzing the pixels in the current image and the target image. So that eliminates 3 dimensions from the search space, and you know the color you're using is always optimal! I've implemented this, and tried another run using circles instead of triangles.
300 Circles and 300 Triangles:
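The optimal-color idea from the edit above can be sketched as follows. This assumes ordinary alpha compositing, new = (1 - alpha) * current + alpha * color, and a squared-error fitness; neither is stated in the post, so treat both as assumptions. Under them the best color per channel is just a mean over the covered pixels, clipped to the valid range.

```python
import numpy as np


def optimal_color(target, current, shape_mask, alpha):
    """Best RGB color for a shape with fixed coverage and alpha.

    target, current: float arrays of shape (H, W, 3) with values in [0, 1].
    shape_mask: boolean array (H, W), True where the shape covers a pixel.
    alpha: opacity in (0, 1].
    """
    t = target[shape_mask]                 # (N, 3) pixels under the shape
    c = current[shape_mask]
    # Solve (1 - alpha) * c + alpha * color = t per pixel, then average:
    # the mean minimizes the summed squared error for each channel.
    color = ((t - (1.0 - alpha) * c) / alpha).mean(axis=0)
    return np.clip(color, 0.0, 1.0)
```

With low alpha the unclipped solution often falls outside [0, 1], which may be one reason very transparent shapes stop helping late in a run.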
I would start experimenting with vertex-colours (have a different RGBA value for each vertex), this will slightly increase the complexity but massively increase the ability to quickly match the target image (assuming photographic images which tend to have natural gradients in them).
Your question seems to suggest moving away from a genetic approach (i.e. trying to find a good triangle to fit rather than evolving it). However, it could be interpreted both ways, so I'll answer from a genetic approach.
A way to focus your mutations would be to apply a grid over the image, calculate which grid-square is the least-best match of the corresponding grid-square in the target image and determine which triangles intersect with that grid square, then flag them for a greater chance of mutation.
You could also (at the same time) improve fine-detail by doing a smaller grid-based check on the best matching grid-square.
For example if you're using an 8x8 grid over the image:
Determine which of the 64 grid squares is the worst match and flag intersecting (or nearby/surrounding) triangles for higher chance of mutation.
Determine which of the 64 grid-squares is the best match and repeat with another smaller 8x8 grid within that square only (i.e. 8x8 grid within that best grid-square). These can be flagged for likely spots for adding new triangles, or just to fine-tune the detail.
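A small sketch of the grid check, assuming both images are numpy arrays of the same shape, that the image is at least as large as the grid, and that the per-cell score is the mean squared difference (the answer does not specify a metric). The function names are hypothetical.

```python
import numpy as np


def grid_errors(target, current, grid=8):
    """Mean squared difference per grid cell, returned as a (grid, grid) array."""
    diff = (target.astype(float) - current.astype(float)) ** 2
    if diff.ndim == 3:
        diff = diff.sum(axis=2)            # combine color channels
    h, w = diff.shape
    row_edges = np.linspace(0, h, grid + 1).astype(int)
    col_edges = np.linspace(0, w, grid + 1).astype(int)
    errors = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = diff[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            errors[i, j] = cell.mean()
    return errors


def worst_and_best_cells(errors):
    """Return ((row, col) of the worst cell, (row, col) of the best cell)."""
    worst = np.unravel_index(np.argmax(errors), errors.shape)
    best = np.unravel_index(np.argmin(errors), errors.shape)
    return worst, best
```

Triangles intersecting the worst cell would then get a higher mutation chance, and the same function can be re-run on the pixels of the best cell for the finer 8x8 pass.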
An idea using multiple runs:
Use your original algorithm as the first run, and stop it after a predetermined number of steps.
Analyze the first run's result. If the result is pretty good on most part of the image but was doing badly in a small part of the image, increase the emphasis of this part.
When running the second run, double the error contribution from the emphasized part (see note). This will cause the second run to do a better match in that area. On the other hand, it will do worse in the rest of the image, relative to the first run.
Repeatedly perform many runs.
Finally, use a genetic algorithm to merge the results - it is allowed to choose from triangles generated from all of the previous runs, but is not allowed to generate any new triangles.
Note: There are in fact algorithms for calculating how much the error contribution should be increased. The technique is called boosting (http://en.wikipedia.org/wiki/Boosting). However, I think the idea will still work without using a mathematically precise method.
Very interesting problem indeed! My way of analyzing such a problem was to use an evolution strategy optimization algorithm. It's not fast, and it is suitable only if the number of triangles is small. I haven't achieved good approximations of the original image, but that is partly because my original image was too complex, so I didn't try many algorithm restarts to see what other sub-optimal results EVO could produce... In any case, this is not bad as an abstract art generation method :-)
I think the algorithm is really very simple.
P = 200 # size of population
max_steps = 100
def iteration
  create P totally random triangles (random points and colors)
  select the one triangle that has the best fitness
  # fitness computation is described here: http://rogeralsing.com/2008/12/09/genetic-programming-mona-lisa-faq/
  put the selected triangle on the picture (or add it to an array of triangles to manipulate them in future)
end
for i in 1..max_steps {iteration}

Drawing a Topographical Map

I've been working on a visualization project for 2-dimensional continuous data. It's the kind of thing you could use to study elevation data or temperature patterns on a 2D map. At its core, it's really a way of flattening 3-dimensions into two-dimensions-plus-color. In my particular field of study, I'm not actually working with geographical elevation data, but it's a good metaphor, so I'll stick with it throughout this post.
Anyhow, at this point, I have a "continuous color" renderer that I'm very pleased with:
The gradient is the standard color-wheel, where red pixels indicate coordinates with high values, and violet pixels indicate low values.
The underlying data structure uses some very clever (if I do say so myself) interpolation algorithms to enable arbitrarily deep zooming into the details of the map.
At this point, I want to draw some topographical contour lines (using quadratic bezier curves), but I haven't been able to find any good literature describing efficient algorithms for finding those curves.
To give you an idea for what I'm thinking about, here's a poor-man's implementation (where the renderer just uses a black RGB value whenever it encounters a pixel that intersects a contour line):
There are several problems with this approach, though:
Areas of the graph with a steeper slope result in thinner (and often broken) topo lines. Ideally, all topo lines should be continuous.
Areas of the graph with a flatter slope result in wider topo lines (and often entire regions of blackness, especially at the outer perimeter of the rendering region).
So I'm looking at a vector-drawing approach for getting those nice, perfect 1-pixel-thick curves. The basic structure of the algorithm will have to include these steps:
At each discrete elevation where I want to draw a topo line, find a set of coordinates where the elevation at that coordinate is extremely close (given an arbitrary epsilon value) to the desired elevation.
Eliminate redundant points. For example, if three points are in a perfectly straight line, then the center point is redundant, since it can be eliminated without changing the shape of the curve. Likewise, with bezier curves, it is often possible to eliminate certain anchor points by adjusting the position of adjacent control points.
Assemble the remaining points into a sequence, such that each segment between two points approximates an elevation-neutral trajectory, and such that no two line segments ever cross paths. Each point-sequence must either create a closed polygon, or must intersect the bounding box of the rendering region.
For each vertex, find a pair of control points such that the resultant curve exhibits a minimum error, with respect to the redundant points eliminated in step #2.
Ensure that all features of the topography visible at the current rendering scale are represented by appropriate topo lines. For example, if the data contains a spike with high altitude, but with extremely small diameter, the topo lines should still be drawn. Vertical features should only be ignored if their feature diameter is smaller than the overall rendering granularity of the image.
But even under those constraints, I can still think of several different heuristics for finding the lines:
Find the high-point within the rendering bounding-box. From that high point, travel downhill along several different trajectories. Any time the traversal line crosses an elevation threshold, add that point to an elevation-specific bucket. When the traversal path reaches a local minimum, change course and travel uphill.
Perform a high-resolution traversal along the rectangular bounding-box of the rendering region. At each elevation threshold (and at inflection points, wherever the slope reverses direction), add those points to an elevation-specific bucket. After finishing the boundary traversal, start tracing inward from the boundary points in those buckets.
Scan the entire rendering region, taking an elevation measurement at a sparse regular interval. For each measurement, use its proximity to an elevation threshold as a mechanism to decide whether or not to take an interpolated measurement of its neighbors. Using this technique would provide better guarantees of coverage across the whole rendering region, but it'd be difficult to assemble the resultant points into a sensible order for constructing paths.
So, those are some of my thoughts...
Before diving deep into an implementation, I wanted to see whether anyone else on StackOverflow has experience with this sort of problem and could provide pointers for an accurate and efficient implementation.
Edit:
I'm especially interested in the "Gradient" suggestion made by ellisbben. And my core data structure (ignoring some of the optimizing interpolation shortcuts) can be represented as the summation of a set of 2D gaussian functions, which is totally differentiable.
I suppose I'll need a data structure to represent a three-dimensional slope, and a function for calculating that slope vector for an arbitrary point. Off the top of my head, I don't know how to do that (though it seems like it ought to be easy), but if you have a link explaining the math, I'd be much obliged!
UPDATE:
Thanks to the excellent contributions by ellisbben and Azim, I can now calculate the contour angle for any arbitrary point in the field. Drawing the real topo lines will follow shortly!
Here are updated renderings, with and without the ghetto raster-based topo-renderer that I've been using. Each image includes a thousand random sample points, represented by red dots. The angle-of-contour at that point is represented by a white line. In certain cases, no slope could be measured at the given point (based on the granularity of interpolation), so the red dot occurs without a corresponding angle-of-contour line.
Enjoy!
(NOTE: These renderings use a different surface topography than the previous renderings -- since I randomly generate the data structures on each iteration, while I'm prototyping -- but the core rendering method is the same, so I'm sure you get the idea.)
Here's a fun fact: over on the right-hand-side of these renderings, you'll see a bunch of weird contour lines at perfect horizontal and vertical angles. These are artifacts of the interpolation process, which uses a grid of interpolators to reduce the number of computations (by about 500%) necessary to perform the core rendering operations. All of those weird contour lines occur on the boundary between two interpolator grid cells.
Luckily, those artifacts don't actually matter. Although the artifacts are detectable during slope calculation, the final renderer won't notice them, since it operates at a different bit depth.
UPDATE AGAIN:
Aaaaaaaand, as one final indulgence before I go to sleep, here's another pair of renderings, one in the old-school "continuous color" style, and one with 20,000 gradient samples. In this set of renderings, I've eliminated the red dot for point-samples, since it unnecessarily clutters the image.
Here, you can really see those interpolation artifacts that I referred to earlier, thanks to the grid-structure of the interpolator collection. I should emphasize that those artifacts will be completely invisible on the final contour rendering (since the difference in magnitude between any two adjacent interpolator cells is less than the bit depth of the rendered image).
Bon appetit!!
The gradient is a mathematical operator that may help you.
If you can turn your interpolation into a differentiable function, the gradient of the height will always point in the direction of steepest ascent. All curves of equal height are perpendicular to the gradient of height evaluated at that point.
Your idea about starting from the highest point is sensible, but might miss features if there is more than one local maximum.
I'd suggest
pick height values at which you will draw lines
create a bunch of points on a fine, regularly spaced grid, then walk each point in small steps in the gradient direction towards the nearest height at which you want to draw a line
create curves by stepping each point perpendicular to the gradient; eliminate excess points by killing a point when another curve comes too close to it-- but to avoid destroying the center of hourglass like figures, you might need to check the angle between the oriented vector perpendicular to the gradient for both of the points. (When I say oriented, I mean make sure that the angle between the gradient and the perpendicular value you calculate is always 90 degrees in the same direction.)
In response to your comment to @erickson, and to answer the point about calculating the gradient of your function: instead of calculating the derivatives of your 300-term function, you could do a numeric differentiation as follows.
Given a point [x,y] in your image, you could calculate the gradient (the direction of steepest ascent) with central differences:
g = [ ( f(x+dx,y) - f(x-dx,y) ) / (2*dx),
      ( f(x,y+dy) - f(x,y-dy) ) / (2*dy) ]
where dx and dy could be the spacing in your grid. The contour line will run perpendicular to the gradient. So, to get the contour direction, c, we can multiply g=[v,w] by the matrix A=[0 -1; 1 0], giving
c = [-w,v]
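The central-difference formulas above translate almost directly into code. This sketch assumes f is any callable f(x, y) returning a height; the function names and default step sizes are placeholders (in practice dx and dy would be the grid spacing).

```python
def gradient(f, x, y, dx=1e-3, dy=1e-3):
    """Central-difference estimate of the gradient of f at (x, y)."""
    gx = (f(x + dx, y) - f(x - dx, y)) / (2 * dx)
    gy = (f(x, y + dy) - f(x, y - dy)) / (2 * dy)
    return gx, gy


def contour_direction(f, x, y, dx=1e-3, dy=1e-3):
    """Direction of the contour line at (x, y): the gradient rotated by 90 degrees."""
    v, w = gradient(f, x, y, dx, dy)
    return -w, v
```

For example, with f(x, y) = x*x + y*y the gradient at (1, 2) comes out close to (2, 4) and the contour direction close to (-4, 2), which is tangent to the circle through that point.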
Alternately, there is the marching squares algorithm which seems appropriate to your problem, although you may want to smooth the results if you use a coarse grid.
The topo curves you want to draw are isolines (level curves) of a scalar field over 2 dimensions. For isosurfaces in 3 dimensions, there is the marching cubes algorithm.
I've wanted something like this myself, but haven't found a vector-based solution.
A raster-based solution isn't that bad, though, especially if your data is raster-based. If your data is vector-based too (in other words, you have a 3D model of your surface), you should be able to do some real math to find the intersection curves with horizontal planes at varying elevations.
For a raster-based approach, I look at each pair of neighboring pixels. If one is above a contour level, and one is below, obviously a contour line runs between them. The trick I used to anti-alias the contour line is to mix the contour line color into both pixels, proportional to their closeness to the idealized contour line.
Maybe some examples will help. Suppose that the current pixel is at an "elevation" of 12 ft, a neighbor is at an elevation of 8 ft, and contour lines are every 10 ft. Then, there is a contour line half way between; paint the current pixel with the contour line color at 50% opacity. Another pixel is at 11 feet and has a neighbor at 6 feet. Color the current pixel at 80% opacity.
alpha = (contour - neighbor) / (current - neighbor)
Unfortunately, I don't have the code handy, and there might have been a bit more to it (I vaguely recall looking at diagonal neighbors too, and adjusting by sqrt(2) / 2). I hope this is enough to give you the gist.
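Since the original code is not available, here is a rough reconstruction of just the alpha formula for one pair of neighboring pixels. It assumes evenly spaced contour levels at multiples of interval and handles only the lowest level lying between the two samples; the function name and these simplifications are mine, not part of the answer.

```python
import math


def contour_alpha(current, neighbor, interval):
    """Opacity of the contour color for the current pixel.

    Returns 0.0 when no contour level lies between the two elevations.
    """
    if current == neighbor:
        return 0.0
    lo, hi = sorted((current, neighbor))
    # Lowest contour level strictly above the lower of the two samples.
    contour = (math.floor(lo / interval) + 1) * interval
    if contour > hi:
        return 0.0
    return (contour - neighbor) / (current - neighbor)
```

With the numbers from the examples above, contour_alpha(12, 8, 10) gives 0.5 and contour_alpha(11, 6, 10) gives 0.8.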
It occurred to me that what you're trying to do would be pretty easy to do in MATLAB, using the contour function. Doing things like making low-density approximations to your contours can probably be done with some fairly simple post-processing of the contours.
Fortunately, GNU Octave, a MATLAB clone, has implementations of the various contour plotting functions. You could look at that code for an algorithm and implementation that's almost certainly mathematically sound. Or, you might just be able to offload the processing to Octave. Check out the page on interfacing with other languages to see if that would be easier.
Disclosure: I haven't used Octave very much, and I haven't actually tested its contour plotting. However, from my experience with MATLAB, I can say that it will give you almost everything you're asking for in just a few lines of code, provided you get your data into MATLAB.
Also, congratulations on making a very Van Gogh-esque slopefield plot.
I always check places like http://mathworld.wolfram.com before going too deep on my own :)
Maybe their curves section would help? Or maybe the entry on maps.
compare what you have rendered with a real-world topo map - they look identical to me! i wouldn't change a thing...
Write the data out as an HGT file (very simple digital elevation data format used by USGS) and use the free and open-source gdal_contour tool to create contours. That works very well for terrestrial maps, the constraint being that the data points are signed 16-bit numbers, which fits the earthly range of heights in metres very well, but may not be enough for your data, which I assume not to be a map of actual terrain - although you do mention terrain maps.
I recommend the CONREC approach:
Create an empty line segment list
Split your data into regular grid squares
For each grid square, split the square into 4 component triangles:
  For each triangle, handle the cases (a through j):
    If a line segment crosses one of the cases:
      Calculate its endpoints
      Store the line segment in the list
Draw each line segment in the line segment list
If the lines are too jagged, use a smaller grid. If the lines are smooth enough and the algorithm is taking too long, use a larger grid.
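The steps above can be sketched roughly as follows. This is a simplified take on the idea, not the reference CONREC code: instead of enumerating the cases a through j, it treats a vertex lying exactly on the level as "not below" and interpolates along the two edges that straddle the level. It assumes unit grid spacing and takes the cell center height as the average of the four corners; the function names are made up.

```python
def triangle_segment(tri, level):
    """tri: three (x, y, z) vertices. Returns ((x1, y1), (x2, y2)) or None."""
    points = []
    for i in range(3):
        (xa, ya, za), (xb, yb, zb) = tri[i], tri[(i + 1) % 3]
        if (za < level) != (zb < level):          # this edge straddles the level
            t = (level - za) / (zb - za)          # linear interpolation along the edge
            points.append((xa + t * (xb - xa), ya + t * (yb - ya)))
    return tuple(points) if len(points) == 2 else None


def contour_segments(grid, level):
    """grid[row][col] holds heights on unit-spaced points; returns a list of segments."""
    segments = []
    for r in range(len(grid) - 1):
        for c in range(len(grid[0]) - 1):
            corners = [(c, r, grid[r][c]), (c + 1, r, grid[r][c + 1]),
                       (c + 1, r + 1, grid[r + 1][c + 1]), (c, r + 1, grid[r + 1][c])]
            center = (c + 0.5, r + 0.5, sum(z for _, _, z in corners) / 4)
            # Four triangles per cell, each made of one cell edge plus the center.
            for i in range(4):
                seg = triangle_segment((corners[i], corners[(i + 1) % 4], center), level)
                if seg:
                    segments.append(seg)
    return segments
```

Calling contour_segments once per contour level and drawing every returned segment gives the full plot; joining segments into polylines is a separate step.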

Resources