What are some common focus stacking algorithms?

I want to write my own focus stacking software but haven't been able to find a suitable explanation of any algorithm for extracting the in-focus portions of each image in the stack.
For those who are not familiar with focus stacking, this Wikipedia article does a nice job of explaining the idea.
Can anyone point me in the right direction for finding an algorithm? Even some key words to search would be helpful.

I realise this is over a year old but for anyone who is interested...
I have had a fair bit of experience in machine vision and this is how I would do it:
Load every image in memory
Perform a Gaussian blur on each image on one of the channels (maybe Green):
The simplest Gaussian kernel is:
1 2 1
2 4 2
1 2 1
The idea is to loop through every pixel and look at the pixels immediately adjacent. The pixel that you are looping through is multiplied by 4, and the neighboring pixels are multiplied by whatever value corresponds to the kernel above. Sum the nine products and divide by 16 (the sum of the kernel weights) so that the overall brightness is unchanged.
You can make a larger Gaussian kernel by using the equation:
exp(-(((x*x)/2/c/c+(y*y)/2/c/c)))
where c is the standard deviation of the Gaussian, which sets the strength of the blur, and x and y are offsets from the kernel centre.
Apply a Laplacian edge-detection kernel to each Gaussian-blurred image, but do not apply a threshold
The simplest Laplacian operator is:
-1 -1 -1
-1 8 -1
-1 -1 -1
Same deal as the Gaussian: slide the kernel over the entire image and generate a result.
An equation to work out larger kernels (the Laplacian of Gaussian, LoG) is here:
(-1/pi/c/c/c/c)*(1-(x*x+y*y)/2/c/c)*exp(-(x*x+y*y)/2/c/c)
Take the absolute value of the Laplacian of Gaussian result. This will quantify the strength of edges with respect to the size and strength of your kernel.
Now create a blank image, loop through each pixel and find the strongest edge in the LoG (i.e. the highest value in the image stack) and take the RGB value for that pixel from the corresponding image.
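The steps above can be sketched in a few lines of plain Python. This is only an illustration, assuming each image is a small 2D list (one grayscale channel for the focus measure, plus an RGB copy to pick pixels from) and that borders are handled by clamping; the function names are made up for this sketch and are not taken from the linked MATLAB code.

```python
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]            # weights sum to 16
LAPLACE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]  # weights sum to 0


def convolve3x3(img, kernel, scale=1.0):
    """Slide a 3x3 kernel over a 2D list; border pixels are clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + 1][kx + 1] * img[yy][xx]
            out[y][x] = acc / scale
    return out


def focus_measure(gray):
    """Blur, apply the Laplacian, then take the absolute value (no threshold)."""
    blurred = convolve3x3(gray, GAUSS, scale=16.0)
    edges = convolve3x3(blurred, LAPLACE)
    return [[abs(v) for v in row] for row in edges]


def focus_stack(gray_stack, rgb_stack):
    """For every pixel, copy the RGB value from the image with the strongest edge."""
    measures = [focus_measure(g) for g in gray_stack]
    h, w = len(gray_stack[0]), len(gray_stack[0][0])
    result = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(measures)), key=lambda i: measures[i][y][x])
            result[y][x] = rgb_stack[best][y][x]
    return result
```

Where two images tie (for example in completely flat areas), this sketch simply takes the first image in the stack; real software would want the images aligned first and would use arrays rather than nested lists for speed.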
Here is an example in MATLAB that I have created:
http://www.magiclantern.fm/forum/index.php?topic=11886.0
You are free to use it for whatever you like. It will create a file called Outsharp.bmp which is what you are after.
To better your output image you could:
- Compensate for differences in lightness levels between images (i.e. histogram matching or simple level adjustment)
- Create a custom algorithm to reject image noise
- Manually adjust the stack after you have generated it
- Apply a Gaussian blur (be sure to divide the result by 16) on the focus map so that the individual images are better merged
Good luck!

Related

How to decrease background noise in binary image

Here is an example of binary images, i.e. as input we have an imageByteArray with 2 possible values: 0 and 255.
Example1:
Example2:
The image contains some document edge on a background.
The task is to remove, or at least reduce, the background pixels with minimal impact on the edge pixels.
The question is: which modern algorithms or techniques exist to do this?
What I do not expect as an answer: use Gaussian blur to get rid of background noise, use bitonal algorithm (Canny, Sobel, etc.) thresholds, or use Hough (Hough linearization goes crazy on such noise no matter what options are set).
The simplest solution is to detect all contours and filter out the shortest ones. This works well, but depending on the image it will sometimes also erase quite a few useful edge pixels.
Update:
As input I have a standard RGB image with a document (driver license ID, check, bill, credit card, ...) on some background. The main task is to detect the document edges. The next steps are well known: greyscale, blur, Sobel binarization, probabilistic Hough, find a rectangle or trapezium (if a trapezium shape is found, go on to perspective transformation). On simple, contrasting backgrounds it all works fine. The reason I am asking about noise reduction is that I have to work with thousands of backgrounds, and some of them give noise no matter what options are used. The noise causes additional lines no matter how Hough is configured, and the additional lines may fool the subsequent logic and seriously affect performance. (It is implemented in JavaScript, with no OpenCV or GPU support.)
It's hard to know whether this approach will work with all your images since you only provided one, but a Hough Line detection with ImageMagick and these parameters in the Terminal command-line produces this:
convert card.jpg \
\( +clone -background none -fill red -stroke red \
-strokewidth 2 -hough-lines 49x49+100 -write lines.mvg \
\) -composite hough.png
and the file lines.mvg contains 4 lines as follows:
# Hough line transform: 49x49+100
viewbox 0 0 1024 765
line 168.14,0 141.425,765 # 215
line 0,155.493 1024,191.252 # 226
line 0,653.606 1024,671.48 # 266
line 940.741,0 927.388,765 # 158
ImageMagick is installed on most Linux distros and is available for OSX and Windows from here.
I assume you did mean binary image instead of bitonic...
1. Do flood-fill-based segmentation:
   - Scan the image for set pixels (color = 255).
   - For each set pixel, create a mask/map of its area: flood fill the set pixels with 4- or 8-neighbor connectivity and count how many pixels you filled.
2. For each filled area, compute its bounding box.
3. Detect edge lines:
   - Edge lines have an elongated bounding box, so test its aspect ratio: if it is close to square, the area is not an edge line.
   - A bounding box that is too small also means the area is not an edge line.
   - If the filled-pixel count is too small compared with the longer side of the bounding box, the area is also not an edge line.
4. You can make this more robust if you regress a line through the set pixels of each area and compute the average distance between the regressed line and each set pixel. If it is too high, the area is not an edge line.
5. Recolor the non-edge areas to black: either subtract the mask from the image or flood fill with black again.
[notes]
Sometimes step #5 (the recoloring) can mess up the inside of the document. In that case, do not recolor anything; instead, remember all the regressed lines for the edge areas. Then, after the whole process is done, join together all lines that are parallel and close to the same axis (infinite line); that should reduce them to 4 big lines determining the document rectangle. Now fill all outside pixels with black (by a geometric approach).
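As a rough illustration of the flood-fill and bounding-box tests described in this answer, here is a plain Python sketch. It assumes the image is a 2D list holding only 0 and 255, uses 8-neighbor connectivity, and leaves out the line-regression refinement; the thresholds `min_long_side`, `min_aspect` and `min_fill` are made-up defaults that would need tuning. Note that the aspect-ratio test only suits edges that run roughly along the image axes, since a diagonal line has a square bounding box.

```python
from collections import deque


def find_regions(img):
    """Flood fill every 8-connected group of set pixels (value 255)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if img[y0][x0] != 255 or seen[y0][x0]:
                continue
            pixels, queue = [], deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] == 255 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append(pixels)
    return regions


def looks_like_edge(pixels, min_long_side=5, min_aspect=3.0, min_fill=0.8):
    """Bounding-box tests: big enough, elongated, and filled along its length."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    box_w = max(xs) - min(xs) + 1
    box_h = max(ys) - min(ys) + 1
    long_side, short_side = max(box_w, box_h), min(box_w, box_h)
    if long_side < min_long_side:            # bounding box too small
        return False
    if long_side / short_side < min_aspect:  # too close to square
        return False
    return len(pixels) >= min_fill * long_side  # enough pixels along the line


def remove_non_edges(img):
    """Return a copy of img with every non-edge region recolored to black."""
    out = [row[:] for row in img]
    for pixels in find_regions(img):
        if not looks_like_edge(pixels):
            for x, y in pixels:
                out[y][x] = 0
    return out
```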
For such tasks you would usually carefully examine the input data and try to figure out what cues you can utilize. But unfortunately you have provided only one example, which makes this approach pretty useless. Besides, this representation is not really comfortable to work with: have you done some preprocessing, or is this what you get as input? In the first case, you may get better advice if you can show us the real input.
Next, if your goal is noise reduction and not document/background segmentation, you are really limited in options. Similar to what you said, I would try to detect connected components with 255 intensity (instead of detecting contours, which can be less robust) and remove the ones with a small area. That may fail on certain instances.
Besides, on the image you have provided you can use local statistics to suppress areas of regular noise. This will reduce background clutter if you select the neighborhood size appropriately.
But again, if you are doing this for document detection, there may be more robust approaches.
For example, if you know the foreground object (driver's ID), you can try to collect a dataset of ID images and calculate the 'typical' color histogram; it may be rather characteristic. After that, you can backproject this histogram onto the input image and get either a rough region of interest or maybe even a precise mask. Then you may binarize it and try to detect contours. You may try different color spaces and bin sizes to see which fits best.
If you have to work in different lighting conditions you can try to equalize histogram or do some other preprocessing to reduce color variation caused by lighting.
Strictly answering the question for the binary image (i.e. after the harm has been done):
What seems characteristic of the edge pixels as opposed to noise is that they form (relatively) long and smooth chains.
So far I see no better way than tracing all chains of 8-connected pixels, for instance with a contour following algorithm, and detect the straight sections, for example by Douglas-Peucker simplification.
As the noise is only on the outside of the card, the outline of the blobs will have at least one "clean" section. Keep the sections that are long enough.
This may destroy the curved corners as well and actually you should look for the "smooth" paths that are long enough.
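For reference, the Douglas-Peucker step mentioned above can be sketched as follows. This assumes the chain has already been traced into an ordered list of (x, y) points; the `tolerance` and `min_length` parameters and the helper `long_straight_sections` are illustrative names, not part of any particular library.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an ordered chain of points, keeping deviations above tolerance."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


def long_straight_sections(points, tolerance, min_length):
    """Keep only the simplified segments that are at least min_length long."""
    simplified = douglas_peucker(points, tolerance)
    return [(a, b) for a, b in zip(simplified, simplified[1:])
            if math.hypot(b[0] - a[0], b[1] - a[1]) >= min_length]
```

On a chain made of two long straight runs followed by a few jagged pixels, only the two long runs survive the length filter, which is the "keep the sections that are long enough" idea. As noted above, rounded corners are lost by this kind of filter.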
Unfortunately, I cannot advise any specific algorithm to address that. It should probably be based on graph analysis combined with geometry (enumerating long paths in a graph and checking the local/global curvature).
As far as I know (after reading thousands of related articles), this is nowhere addressed in the literature.
None of the previous answers would really work. The only thing that can work here is a blob filter: filter the image so that blobs below a certain size get deleted.

Detecting individual images in an array of images

I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism and then I'm almost done.
I'm using a line scan sensor, so it's one pixel wide by 2000 high. The data stream I will be sending to the PC over USB with an FTDI FIFO bridge will be just 1-byte values of the pixels. The scanner will pull through an entire strip of 36 frames, so I will end up scanning the entire strip. To begin with I'm willing to split them up manually in Photoshop, but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near-black strips in between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could:
- Calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
- Set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1.
- Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
  - Use FFTW to calculate a discrete Fourier transform of s(n), call it S(f) (f being the frequency bin, i.e. the number of periods that fit into the scan).
  - Find f_max = argmax(abs(S(f))) over f > 0 (skip the DC bin f = 0, which only holds the mean); that f represents the distance between two black bars: number of rows / f_max is the bar distance.
  - S(f) is complex, and thus has an argument; the phase phi = atan2(imag(S(f_max)), real(S(f_max))) locates the pattern within one period. The bright frames are centred at about -phi/(2*pi) * bar distance (modulo the bar distance), and the bars sit half a period away from that.
- To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of the runs of 0 around the calculated center positions of the black bars.
- To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border on the filmstrip material (x being the coordinate along that row). Now, use a simplistic high-pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: use the resulting coordinates with something like OpenCV, or any other library capable of reading images, specifying sub-images by coordinates, and saving them to new images.
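A minimal Python sketch of the first steps (row averages, threshold, period search) might look like this. It assumes the scan is a list of rows of 8-bit values with one row per scan line, uses a naive DFT from the standard library instead of FFTW, and adds a hypothetical helper `runs` that simply lists the bright or dark stretches; a real implementation in C++ would follow the same outline.

```python
import cmath


def row_means(image):
    """Average pixel value of every scan line: the signal s(n)."""
    return [sum(row) / len(row) for row in image]


def binarize(signal, threshold):
    """1 above the threshold (image content), 0 at or below it (dark bar)."""
    return [1 if v > threshold else 0 for v in signal]


def dominant_period(signal):
    """Period (in rows) of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0, which only holds the mean
        s = sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, v in enumerate(signal))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return n / best_k


def runs(binary, value):
    """(start, end) pairs, end exclusive, of each stretch equal to value."""
    out, start = [], None
    for i, b in enumerate(binary):
        if b == value and start is None:
            start = i
        elif b != value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(binary)))
    return out
```

With `b = binarize(row_means(scan), threshold)`, `runs(b, 1)` gives candidate frame ranges directly and `dominant_period(b)` gives the frame pitch to sanity-check them against. The naive DFT is quadratic in the number of rows, so a full-length scan would want a real FFT.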

A Summary of How SURF Works

I am trying to figure out how SURF feature detection works. I think I have made some progress. I would like to know how off I am from what's really going on.
A template image you have already got stored and a real-world image
are compared on the basis of "key points" or some important features
in the two images.
The smallest Euclidean distance between the same points constitutes a
good match.
What constitutes an important feature or keypoint? A corner
(intersection of edges) or a blob (sharp change in intensity).
SURF uses blobs.
It uses a Hessian matrix for blob detection or feature extraction.
The Hessian matrix is a matrix of second derivatives: this is to
figure out the minima and maxima associated with the intensity of a
given region in the image.
SIFT, SURF, etc. have 3 stages:
1. Find features/keypoints that are likely to be found again in different images of the same object (SURF uses box filters, as far as I recall). Those features should be scale and rotation invariant if possible. Corners, blobs, etc. are good candidates and are most often searched for at multiple scales.
2. Find the right "orientation" of that point, so that if the image is rotated according to that orientation, both images are aligned with regard to that single keypoint.
3. Compute a "descriptor" that holds information about what the neighborhood of the keypoint looks like (after orientation) at the right scale.
Now your Euclidean distance computation is done only on the descriptors, not on the keypoint locations!
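To make that last point concrete, here is a small sketch of nearest-neighbor matching on descriptors. It assumes the descriptors have already been computed and are plain lists of numbers of equal length; the function names are illustrative and this is not the matching code of any particular SURF implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_descriptors(template_desc, scene_desc):
    """For each template descriptor, find the closest scene descriptor.

    Returns (template_index, scene_index, distance) triples. Keypoint
    positions are not used at all; only the descriptors are compared.
    """
    matches = []
    for i, d in enumerate(template_desc):
        dists = [euclidean(d, e) for e in scene_desc]
        j = min(range(len(dists)), key=dists.__getitem__)
        matches.append((i, j, dists[j]))
    return matches
```

In practice one would also reject matches whose distance is too large, since every template descriptor gets some nearest neighbor even when the object is not in the scene.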
It is important to know that step 1 isn't fixed for SURF. SURF is in fact steps 2-3, but the authors suggest how step 1 can be done to gain some synergy with steps 2-3: both step 1 and step 3 use integral images to speed things up, so the integral image has to be computed only once.
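For readers who have not met integral images: the idea is that after one pass over the image, the sum over any axis-aligned box costs four lookups, which is what makes box filters cheap. A minimal sketch, assuming a 2D list of numbers and half-open box coordinates (both conventions chosen here for illustration):

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the original image over x0 <= x < x1 and y0 <= y < y1."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```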

Which way is my yarn oriented?

I have an image processing problem. I have pictures of yarn:
The individual strands are partly (but not completely) aligned. I would like to find the predominant direction in which they are aligned. In the center of the example image, this direction is around 30-34 degrees from horizontal. The result could be the average/median direction for the whole image, or just the average in each local neighborhood (producing a vector map of local directions).
What I've tried: I rotated the image in small steps (1 degree) and calculated statistics in the vertical vs horizontal direction of the rotated image (for example: standard deviation of summed rows or summed columns). I reasoned that when the strands are oriented exactly vertically or exactly horizontally the difference in statistics would be greatest, and so that angle of rotation is the correct direction in the original image. However, for at least several kinds of statistical properties I tried, this did not work.
I further thought that perhaps this wasn't working because there were too many different directions at the same time in the whole image, so I tried it in a small neighborhood. In this case, there is always a very clear preferred direction (different for each neighborhood), but it is not the direction that the fibers really go... I can post my sample code but it is basically useless.
I keep thinking there has to be some kind of simple linear algebra/statistical property of the whole image, or some value derived from the 2D FFT that would give the correct direction in one step... but how?
What probably won't work: detecting individual fibers. They are not necessarily the same color, and the image can shade from light to dark so edge detectors don't work well, and the image may not even be in focus sometimes. Because of that, it is not always even possible to see individual fibers for a human (see top-right in the example), they kinda have to be detected as preferred direction in a statistical sense.
You might try doing this in the frequency domain. The output of a Fourier Transform is orientation dependent so, if you have some kind of oriented pattern, you can apply a 2D FFT and you will see a clustering around a specific orientation.
For example, making a greyscale out of your image and performing FFT (with ImageJ) gives this:
You can see a distinct cluster that is oriented orthogonally with respect to the orientation of your yarn. With some pre-processing on your source image, to remove noise and maybe enhance the oriented features, you can probably achieve a much stronger signal in the FFT. Once you have a cluster, you can use something like PCA to determine the vector for the major axis.
For info, this is a technique that is often used to enhance oriented features, such as fingerprints, by applying a selective filter in the FFT and then taking the inverse to obtain a clearer image.
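As a toy illustration of why the FFT cluster is orthogonal to the pattern, the sketch below runs a naive 2D DFT on a small square image and reads the orientation off the strongest non-DC bin. It assumes a square 2D list of numbers and a single dominant stripe frequency, which is far cleaner than a real yarn image; a real pipeline would use an FFT library and look at the whole cluster rather than one bin.

```python
import cmath
import math


def dft2_magnitude(img):
    """Magnitude of the naive 2D DFT of a square n x n image (O(n^4))."""
    n = len(img)
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    mag = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = 0j
            for y in range(n):
                for x in range(n):
                    acc += img[y][x] * twiddle[(u * x + v * y) % n]
            mag[v][u] = abs(acc)
    return mag


def stripe_direction_degrees(img):
    """Direction of the stripes in [0, 180), measured from the x axis.

    The strongest non-DC bin points along the normal of the stripes, so
    the stripes themselves run at 90 degrees to it. Angles are in array
    coordinates (y grows downward).
    """
    n = len(img)
    mag = dft2_magnitude(img)
    best, best_uv = -1.0, (1, 0)
    for v in range(n):
        for u in range(n):
            if (u, v) != (0, 0) and mag[v][u] > best:
                best, best_uv = mag[v][u], (u, v)
    u, v = best_uv
    ku = u if u <= n // 2 else u - n  # map bin index to signed frequency
    kv = v if v <= n // 2 else v - n
    normal = math.degrees(math.atan2(kv, ku))
    return (normal + 90.0) % 180.0
```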
An alternative approach is to try a series of Gabor filters (see here), pre-built with a selection of orientations and frequencies, and use the resulting features as a metric for identifying the most likely orientation. There is a scikit article that gives some examples here.
UPDATE
Just playing with ImageJ to give an idea of some possible approaches to this - I started with the FFT shown above, then - in the following image, I performed these operations (clockwise from top left) - Threshold => Close => Holefill => Erode x 3:
Finally, rather than using PCA, I calculated the spatial moments of the lower left blob using this ImageJ Plugin which handily calculates the orientation of the longest axis based on the 2nd order moment. The result gives an orientation of approximately -38 degrees (with respect to the X axis):
Depending on your frame of reference you can calculate the approximate average orientation of your yarn from this rather than from PCA.
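The orientation from second-order moments can be computed by hand as well. The sketch below uses the standard formula theta = 0.5 * atan2(2*mu11, mu20 - mu02) on a list of blob pixel coordinates; it is not the ImageJ plugin's code, and the optional `weights` argument (for example FFT magnitudes) is an addition for illustration.

```python
import math


def blob_orientation_degrees(points, weights=None):
    """Orientation of the major axis of a blob from its central moments.

    points is a list of (x, y) coordinates; the angle is measured from
    the x axis in the same coordinate system, so with image rows growing
    downward the sign is flipped compared with the usual math convention.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    mu20 = sum(w * (x - cx) ** 2 for (x, _), w in zip(points, weights))
    mu02 = sum(w * (y - cy) ** 2 for (_, y), w in zip(points, weights))
    mu11 = sum(w * (x - cx) * (y - cy) for (x, y), w in zip(points, weights))
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))
```

Because the blob in the FFT is orthogonal to the yarn, 90 degrees has to be added (or subtracted) to get the yarn direction itself.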
I tried to use Gabor filters to enhance the orientations of your yarns. The parameters I used are:
phi = x*pi/16; % x = 1, 3, 5, 7
theta = 3;
sigma = 0.65*theta;
filterSize = 3;
The imaginary parts of the convolved images are shown below:
As you mentioned, most orientations lie between 30 and 34 degrees, so the filter with phi = 5*pi/16 (bottom left) yields the best contrast among the four.
I would consider using a Hough Transform for this type of problem; there is a nice write-up here.

How do I efficiently segment 2D images into regions/blobs of similar values?

How do I segment a 2D image into blobs of similar values efficiently? The given input is an array of integers, which holds the hue for non-gray pixels and the brightness for gray pixels.
I am writing a virtual mobile robot using Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in Computer Vision, but when it's on a robot performance does matter so I wanted some inputs. Algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample, both in colourspace and in number of pixels, use a vision method (probably meanshift) and upscale the result.
This is good because downsampling also increases the robustness to noise, and makes it more likely that you get meaningful segments.
You could use floodfill to smooth edges afterwards if you need smoothness.
Some more thoughts (in response to your comment).
1) Did you blend as you downsampled? y[i]=(x[2i]+x[2i+1])/2 This should reduce noise.
2) How fast do you want it to be?
3) Have you tried dynamic meanshift? (Also google for dynamic x for all algorithms x.)
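The blending in point 1 is just pairwise averaging. A short sketch of the 1D formula and its 2D equivalent (averaging each 2x2 block), assuming values that average sensibly, such as brightness; hue is circular, so averaging raw hue values across the wrap-around point would need extra care:

```python
def downsample_1d(x):
    """y[i] = (x[2i] + x[2i+1]) / 2; a trailing odd sample is dropped."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def downsample_2x2(img):
    """Halve both dimensions by averaging each 2x2 block of a 2D list."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)]
            for y in range(h)]
```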
Not sure if it is too efficient, but you could try using a Kohonen neural network (or self-organizing map; SOM) to group the similar values, where each pixel contains the original color and position and only the color is used for the Kohonen grouping.
You should read up before you implement this though, as my knowledge of the Kohonen network goes as far as that it is used for grouping data - so I don't know what the performance/viability options are for your scenario.
There are also Hopfield Networks. They can be mangled into grouping from what I read.
What I have now:
1. Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
2. For each pixel in the image where the corresponding buffer value is still UNSEGMENTED, flood the buffer using the pixel value.
   a. The border checking of the flooding is done by checking if the pixel is within EPSILON (currently set to 10) of the originating pixel's value.
   b. Flood filling algorithm.
Possible issue:
The 2.a.'s border checking is called many times in the flood filling algorithm. I could turn it into a lookup if I could precalculate the border using edge detection, but that may add more time than current check.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
    return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
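For comparison, here is the whole procedure written out as a compact Python sketch (Python only to keep it short; the structure carries over to Java directly). It assumes the image is a 2D list of integers, uses 4-neighbor connectivity and a queue-based fill, and compares every candidate against the originating (seed) pixel's value, as described in step 2.a.

```python
from collections import deque

UNSEGMENTED = -1
EPSILON = 10


def segment(values, epsilon=EPSILON):
    """Label each pixel with a blob id; returns (labels, number_of_blobs)."""
    h, w = len(values), len(values[0])
    labels = [[UNSEGMENTED] * w for _ in range(h)]
    next_label = 0
    for y0 in range(h):
        for x0 in range(w):
            if labels[y0][x0] != UNSEGMENTED:
                continue  # already part of an earlier blob
            seed = values[y0][x0]
            labels[y0][x0] = next_label
            queue = deque([(x0, y0)])
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and labels[ny][nx] == UNSEGMENTED
                            and abs(values[ny][nx] - seed) <= epsilon):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            next_label += 1
    return labels, next_label
```

Every pixel is labelled exactly once, so the closeness check runs at most four times per pixel; that suggests the check itself is unlikely to be the bottleneck compared with precomputing an edge map.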
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few points. If you are expecting around 10 blobs, picking random points in that order may suffice. Drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood-fill is the connected-components algorithm. So:
Cheaply classify your pixels, e.g. divide pixels in colour space.
Run connected components (CC) to find the blobs.
Retain the blobs of significant size.
This approach is widely used in early vision approaches. For example in the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".
