standard deviation of a 2D array but with a twist - std

I'm working on light propagation and I compute PSFs for multifocal systems. I need a way to estimate the quality of each z-slice that I compute. For instance, one slice could be considered a focal point, since all the light is condensed in a small region, whereas another slice is definitely not one.
I'm looking for a way to express the quality of each slice. I thought about using the standard deviation of the best Gaussian surface that fits the slice. For that I looked into numpy.std, but that is clearly not the std I'm looking for.
I also looked into scipy.stats.kurtosis. It's interesting, but it's not perfect in my tests, and in the future I will need to apply this computation both to the PSF and to the FTM (the Fourier transform of the PSF).
Each pixel has a precise physical size, and I want the standard deviation expressed as something like the width at half height of the surface that fits my dataset (for a Gaussian the two are directly related: FWHM ≈ 2.355 * sigma).
I looked into Gaussian regression, but it is far too slow to run on every slice. I'm sure there is a reasonably simple way to compute this std (if that is actually the right name for it).
This is written in Python, but I intentionally didn't add the tag because I'm sure people working in other languages could help with this question as well.
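(Not from the original question, just a sketch of one cheap alternative to a full Gaussian fit: compute the intensity-weighted second moments of each slice and take their square root as a width. Names such as rms_width and psf_slice are illustrative.)

import numpy as np

def rms_width(psf_slice, pixel_size=1.0):
    # Estimate a "sigma" for one z-slice from its intensity-weighted
    # second moments (no Gaussian fit needed); assumes a non-flat slice.
    psf = np.asarray(psf_slice, dtype=float)
    psf = psf - psf.min()                    # crude background removal
    total = psf.sum()
    y, x = np.indices(psf.shape)
    cx = (x * psf).sum() / total             # intensity-weighted centroid
    cy = (y * psf).sum() / total
    var_x = ((x - cx) ** 2 * psf).sum() / total
    var_y = ((y - cy) ** 2 * psf).sum() / total
    return np.sqrt(0.5 * (var_x + var_y)) * pixel_size   # small value = tight focus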

Related

Quick and reliable algorithm to determine the existence of a QR code in an image?

Without implementing OpenCV or calling a QR code recognition API, is there any quick and reliable algorithm to determine whether a QR code is present in an image?
The intention of this question is to improve the user experience of scanning QR codes. When recognition fails, the program needs to know whether there really is a QR code, so it can try to scan and recognize it again, or whether there is none, so it can call other procedures.
To echo some responses: the detection doesn't need to be 100% accurate, just correct with reasonable probability. If we could use OpenCV here, a Fourier transform would make it easy to detect whether an image contains obvious high-frequency content, which is a good sign of a QR code. But integrating OpenCV would greatly increase the size of my program, which I want to avoid.
It's great that you want to provide feedback to a user. Providing graphics that indicate the user is "getting warmer" in finding the QR code can make the process of finding and reading a code quicker and smoother.
It looks like you already have your answer, but to provide a more robust solution and/or have options, you might try one or more of the following:
Use N iterations to morphologically close dark pixels; the resulting squarish checkerboard pattern should then more closely resemble a filled square. This was part of a detection method I used to determine whether a DataMatrix (a similar 2D code) was present, whether or not it was readable. Whether this works will depend greatly on your background.
Before applying the FFT, consider finding the affine transform to reduce perspective distortion. Analyzing FFT data can be a pain if the frequencies have spread a bit because of foreshortening.
You could get some decent results using texture measures such as Local Binary Patterns (LBPs) or older techniques such as Laws' texture measures. You might even get lucky and be able to detect slight differences in the histograms of texture measures between a 2D code and a checkerboard pattern.
In regions of checkerboard-like patterns, look for the 3 guide features at the corners of the QR code. You could try SIFT/SURF-like methods, or perhaps implement a simpler match method by using a limited number of correlation templates that are tested in scale space.
Speaking of scale space: generate an image pyramid to save yourself the trouble of searching for squares in full-resolution images. You could try edge-preserving or non-edge-preserving methods to generate the smaller images in the pyramid, or perhaps a combination of both.
If you have code for fast kernel processing, you might try a corner detection method to reduce the amount of data you process to detect checkerboard-like patterns.
Look for clear bimodal distributions of grayscale values in squarish regions. 2D codes on paper labels tend to have stark contrast, even though 2D codes on paper remain quite readable at low contrast.
Rather than looking for a bimodal distribution of grayscale values, you could look for regions where the gradient magnitudes are very consistent, nearly unimodal. (A small sketch of both of these checks follows these suggestions.)
If you know the min/max area limits of a readable QR code, you could probabilistically sample the image for patches that match one or more of the above criteria: one mode of gradient magnitudes, nearly evenly spaced corner points, etc. If a patch does look promising, then jump to another random position, with the caveat that the new patch was not previously found unpromising.
If you have the memory for an image pyramid, then working with reduced resolution images could be advantageous since you could try a number of tests fairly quickly.
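A small, purely illustrative numpy sketch of the two statistical cues mentioned above (bimodal grayscale values, near-unimodal gradient magnitudes), scoring one grayscale patch; the names and the crude mean-based threshold are made up and would need tuning:

import numpy as np

def patch_scores(patch):
    # Score one grayscale patch on the two cues above; higher is more QR-like.
    patch = np.asarray(patch, dtype=float)

    # Cue 1: bimodality. Split at the mean (a crude threshold) and see how
    # far apart the two modes are, on a 0..1 scale for 8-bit images.
    t = patch.mean()
    lo, hi = patch[patch <= t], patch[patch > t]
    separation = (hi.mean() - lo.mean()) / 255.0 if lo.size and hi.size else 0.0

    # Cue 2: consistency of gradient magnitudes, i.e. edges of similar
    # strength everywhere, as on a printed 2D code.
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    strong = mag[mag > mag.mean()]
    consistency = 1.0 - strong.std() / (strong.mean() + 1e-9) if strong.size else 0.0

    return separation, consistency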
As far as user interaction is concerned, you might also update the "this might be a QR code" graphic multiple times during pre-processing, and indicate degrees of confidence with progressively stronger/greener graphics (or whatever colour is appropriate for the local culture). For example, if a patch of texture has a roughly 60% chance of being a QR code, you might display a thin yellowish-green rectangle with a dashed border. For an 80% - 90% likelihood you might display a solid rectangle in a more saturated green. If you can update the graphics about every 100 - 200 milliseconds, the user will have some idea of whether an action such as moving the smartphone is helping or hurting.
1) Convert the image to grayscale.
2) Divide the image into n x m cells, say 3 x 3. This is intended to guarantee that at least one cell is fully covered by a possible QR code, if there is one.
3) Apply a 2D Fourier transform to each cell. If any cell has significantly large values in the high-frequency region along both the X and Y axes, there is a high likelihood that a QR code is present.
I am treating this as a probability question rather than 100% accurate detection. With this algorithm, a chessboard will be detected as a QR code as well. A rough sketch of these three steps follows.
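Here is one possible numpy-only sketch of those steps; the 25% high-frequency threshold is a placeholder to be tuned, not a recommended value:

import numpy as np

def has_high_frequency_cell(rgb, grid=3, threshold=0.25):
    # Step 1: convert to grayscale (simple channel average).
    gray = np.asarray(rgb, dtype=float).mean(axis=2)
    h, w = gray.shape
    # Step 2: walk over the grid x grid cells.
    for i in range(grid):
        for j in range(grid):
            cell = gray[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            # Step 3: 2D FFT; measure how much energy sits away from the DC peak.
            spec = np.abs(np.fft.fftshift(np.fft.fft2(cell)))
            cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
            low = spec[cy-2:cy+3, cx-2:cx+3].sum()
            if 1.0 - low / (spec.sum() + 1e-9) > threshold:
                return True   # plausible QR-like high-frequency texture
    return False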

FFT image comparison (theoretical)

Can anybody explain to me (in simplified terms) what happens if I do an image comparison with an FFT? I somehow don't understand how it's possible to convert a picture into frequencies, and how this is used to differentiate between two images. Via Google I cannot find a simple description that I (as a non-mathematician/non-computer-scientist) could understand.
Any help would be very appreciated!
Thanks!
Alas, a good description of an FFT might involve subjects such as the calculus of complex variables and the computational theory of recursive algorithms. So a simple description may not be very accurate.
Think about sound. Looking at the waveforms of the sound produced by two singers might not tell you much; both would just be complicated, long, messy-looking squiggles. But a frequency meter could quickly tell you that one person was singing way off pitch, and whether they were a soprano or a bass. So from the frequency meter readings you might be able to determine that certain waveforms did not indicate a good match for who was singing.
An FFT is like a big bunch of frequency meters. And each scan line of a photo is a waveform.
Around two centuries ago, a guy named Fourier proved that any reasonable-looking waveform squiggle could be matched by an appropriate bunch of plain sine waves, each at a single frequency. Several decades ago, other people figured out a very clever way of calculating, very quickly, just which bunch of sine waves that was: the FFT.
A discrete FFT transforms a 2D matrix of, let's say, pixel values into a 2D matrix in the frequency domain. You can use a library like FFTW to convert an image from its ordinary form to the spectral one. The result of your comparison depends on what you actually compare.
The Fourier transform works in dimensions other than 2D as well, but for images you'll be interested in a 2D FFT.
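As a minimal illustration of what such a comparison could look like in practice (just one possible choice, not a standard metric): compute both magnitude spectra with numpy and correlate them. Both inputs are assumed to be grayscale arrays of the same shape.

import numpy as np

def spectral_similarity(img_a, img_b):
    # Log-magnitude spectra of both images.
    A = np.log1p(np.abs(np.fft.fft2(np.asarray(img_a, dtype=float))))
    B = np.log1p(np.abs(np.fft.fft2(np.asarray(img_b, dtype=float))))
    # Normalize and take their correlation: ~1.0 means very similar spectra.
    A = (A - A.mean()) / (A.std() + 1e-9)
    B = (B - B.mean()) / (B.std() + 1e-9)
    return float((A * B).mean())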

Where can I find a good read about bicubic interpolation and Lanczos resampling?

I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operating principle of both algorithms is pretty simple: they are both convolution filters. For each output value, a convolution filter moves the convolution function's point of origin to be centered on that output, multiplies every input value by the value of the convolution function at its location, and adds the products together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, the integral corresponds to average brightness, so if you want the brightness to remain the same, the convolution function needs to integrate (add up) to one.
One way to understand them is to think of the convolution function as something that shows how much each input pixel influences the output pixel, depending on its distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
For Lanczos interpolation the convolution function is based on the normalized sinc function, sinc(x) = sin(pi*x)/(pi*x), windowed so that only the first few lobes are kept. Usually 3:
lanczos3(x) = {
0 if abs(x) >= 3,
1 if x == 0,
else sinc(x) * sinc(x/3)
}
This function is called the filter kernel.
To resample with Lanczos, imagine you overlay the output and the input on top of each other, with points marking where the pixel locations are. For each output pixel location you take a box of ±3 output pixels around that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, using the distance from the output location (in output pixel coordinates) as the parameter. Then normalize the calculated values so that they add up to 1. After that, multiply each input pixel value by its corresponding weight and add the results together to get the value of the output pixel.
Because the Lanczos function is separable and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately, and by precalculating the vertical filters for each output row and the horizontal filters for each output column.
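A hedged one-dimensional sketch of that procedure in Python follows (apply it along rows and then columns to resize an image). It only covers upscaling; for downscaling the kernel should additionally be widened by the scale factor, which is omitted here. Names like resample_1d are made up for illustration.

import numpy as np

def lanczos3(x):
    # np.sinc is the normalized sinc: sin(pi*x)/(pi*x).
    x = np.asarray(x, dtype=float)
    out = np.sinc(x) * np.sinc(x / 3.0)
    out[np.abs(x) >= 3] = 0.0
    return out

def resample_1d(samples, new_length):
    samples = np.asarray(samples, dtype=float)
    old_length = len(samples)
    scale = old_length / new_length
    out = np.empty(new_length)
    for i in range(new_length):
        center = (i + 0.5) * scale - 0.5        # output position in input coordinates
        left = int(np.floor(center)) - 2        # 6 taps cover the +-3 support
        taps = np.arange(left, left + 6)
        idx = np.clip(taps, 0, old_length - 1)  # replicate edges
        w = lanczos3(taps - center)
        out[i] = np.dot(w / w.sum(), samples[idx])  # normalize weights to sum to 1
    return out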
Bicubic convolution is basically the same, with a different filter kernel function.
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in Skia have a pretty well-commented implementation of Lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzales and Woods is decent on it, but I'm away from my books and can't check.
Now on to the particulars. It should help to think about what you are doing fundamentally: you have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, let's imagine you want a new measurement in between every one that you already have (e.g. doubling the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points: you just draw a line between them and read the new value off the line (in this case, at the halfway point).
Now an image is two-dimensional, so you really want to do this in both the left-right and up-down directions. Use the result as your estimate and voilà, you have "bilinear" interpolation.
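For concreteness, a tiny sketch of bilinear interpolation at a single non-integer location (x, y) in a grayscale image; img, x, and y are illustrative names and no bounds checking is done:

import numpy as np

def bilinear(img, x, y):
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]      # left-right along row y0
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]   # left-right along row y1
    return (1 - fy) * top + fy * bottom                  # up-down between the two rows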
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the accuracy problem, you want something better than a linear fit of two points: you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade-off between accuracy and computational cost is the cubic spline. This gives you a smooth fitted curve, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still computationally heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it is just a multiplication, so it can be implemented quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is the integral of the product of one function (your image) with another function of typically much smaller support (the part that is non-zero), called the kernel, after that kernel has been centered over that particular point. In the discrete world, these are just sums of products.
It turns out that you can design a convolution kernel that has properties quite like the cubic spline, and use it to get fast "bicubic" interpolation.
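One commonly used kernel of this kind is Keys' cubic convolution kernel (with a = -0.5 it approximates the Catmull-Rom spline); a short sketch, offered as an illustration of the idea rather than the only choice:

import numpy as np

def cubic_kernel(x, a=-0.5):
    # Keys' piecewise-cubic kernel; 4-tap support, zero outside |x| >= 2.
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    out[near] = (a + 2) * x[near]**3 - (a + 3) * x[near]**2 + 1
    out[far] = a * x[far]**3 - 5*a * x[far]**2 + 8*a * x[far] - 4*a
    return out   # apply along rows and then columns for "bicubic" resampling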
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means the two will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes make specialized assumptions which make them more efficient but less general.
I would like to suggest the following article for a basic understanding of different image interpolation methods: "Image interpolation via convolution". If you want to try more interpolation methods, imageresampler is a nice open source project to begin with.
In my opinion, image interpolation can be understood from two perspectives: function fitting and convolution. For example, the spline interpolation explained in "Image interpolation via convolution" is well explained from the function-fitting perspective in "Cubic interpolation".
Additionally, image interpolation is always tied to a specific application, for example image zooming, image rotation and so on. In fact, for a specific application, image interpolation can be implemented in a smart way. For example, image rotation can be implemented via a three-shear method, and during each shear a different one-dimensional interpolation algorithm can be used.

Novel fitness measure for evolutionary image matching simulation

I'm sure many people have already seen demos of using genetic algorithms to generate an image that matches a sample image. You start off with noise, and gradually it comes to resemble the target image more and more closely, until you have a more-or-less exact duplicate.
All of the examples I've seen, however, use a fairly straightforward pixel-by-pixel comparison, resulting in a fairly predictable 'fade in' of the final image. What I'm looking for is something more novel: A fitness measure that comes closer to what we see as 'similar' than the naive approach.
I don't have a specific result in mind - I'm just looking for something more 'interesting' than the default. Suggestions?
I assume you're talking about something like Roger Alsing's program.
I implemented a version of this, so I'm also interested in alternative fitness functions, though I'm coming at it from the perspective of improving performance rather than aesthetics. I expect there will always be some element of "fade-in" due to the nature of the evolutionary process (though tweaking the evolutionary operators may affect how this looks).
A pixel-by-pixel comparison can be expensive for anything but small images. For example, the 200x200 pixel image I use has 40,000 pixels. With three values per pixel (R, G and B), that's 120,000 values that have to be incorporated into the fitness calculation for a single image. In my implementation I scale the image down before doing the comparison so that there are fewer pixels. The trade-off is slightly reduced accuracy of the evolved image.
In investigating alternative fitness functions I came across some suggestions to use the YUV colour space instead of RGB since this is more closely aligned with human perception.
Another idea I had was to compare only a randomly selected sample of pixels. I'm not sure how well this would work without trying it, but since the pixels compared would be different for each evaluation, it would have the effect of maintaining diversity within the population.
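Purely as an illustration of combining these ideas (a comparison of a random pixel sample in a YUV-like space), here is a rough numpy sketch; target and candidate are assumed to be HxWx3 RGB arrays, and the sample size is arbitrary:

import numpy as np

# Approximate RGB -> YUV conversion matrix.
RGB_TO_YUV = np.array([[ 0.299,  0.587,  0.114],
                       [-0.147, -0.289,  0.436],
                       [ 0.615, -0.515, -0.100]])

def sampled_yuv_error(target, candidate, n_samples=2000, rng=np.random):
    h, w, _ = target.shape
    ys = rng.randint(0, h, n_samples)               # random pixel positions
    xs = rng.randint(0, w, n_samples)
    t = target[ys, xs].astype(float) @ RGB_TO_YUV.T
    c = candidate[ys, xs].astype(float) @ RGB_TO_YUV.T
    return float(np.abs(t - c).sum())               # lower error = fitter candidate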
Beyond that, you are in the realms of computer vision. I expect that these techniques, which rely on feature extraction, would be more expensive per image, but they may be faster overall if they result in fewer generations being required to achieve an acceptable result. You might want to investigate the PerceptualDiff library. Also, this page shows some Java code that can be used to compare images for similarity based on features rather than pixels.
A fitness measure that comes closer to what we see as 'similar' than the naive approach.
Implementing such a measure in software is definitely nontrivial. Google "human vision model" or "perceptual error metric" for some starting points. You can also sidestep the issue and simply present the candidate images to a human for selecting the best ones, although it might be a bit boring for the human.
I haven't seen such a demo (perhaps you could link to one), but here are a couple of proto-ideas from your description that may trigger an interesting one:
Three different algorithms running in parallel, perhaps RGB or HSV.
Move, rotate, or otherwise change the target image slightly during the run.
Fitness based on contrast/value differences between pixels, but without knowing the actual colour.
...then "prime" a single pixel with the correct colour?
I would agree with other contributors that this is non-trivial. I'd also add that it would be very valuable commercially; for example, companies wishing to protect their visual IP would be extremely happy to be able to trawl the internet looking for images similar to their logos.
My naïve approach to this would be to train a pattern recognizer on a number of images, each generated from the target image with one or more transforms applied: rotated a few degrees either way, translated a few pixels either way, different scales of the same image, various blurs and effects (convolution masks are good here). I would also add some random noise to each of the images. The more samples the better.
The training can all be done off-line, so shouldn't cause a problem with runtime performance.
Once you've got a pattern recognizer trained, you can point it at the GA population images and get a scalar score out of the recognizer.
Personally, I like radial basis function networks. They're quick to train. I'd start with far too many inputs and whittle them down with principal component analysis (IIRC). The outputs could just be a similarity measure and a dissimilarity measure.
One last thing: whatever approach you go for, could you blog about it, publish the demo, whatever; let us know how you get on.

What is currently considered the "best" algorithm for 2D point-matching?

I have two lists containing x-y coordinates (of stars). I could also have magnitudes (brightnesses) attached to each star. Each star has random positional jitter, and there can be a few extra or missing points in each image. My question is: what is the best 2D point-matching algorithm for such a dataset? I'm interested both in simple linear transformations (translation, rotation, scale) and in non-linear ones (say, n-degree polynomials in the coordinates). In the lingo of the point-matching field, I'm looking for the algorithms that would win a shootout between 2D point-matching programs handling noise and spurious points. There may be different "winners" depending on whether the labeling information (the magnitudes) is used and/or whether the transformation is restricted to being linear.
I am aware that there are many classes of 2D point-matching algorithms, and many algorithms in each class (probably hundreds in total), but I don't know which, if any, is considered the "best" or most standard by people in the field of computer vision. Sadly, many of the papers I want to read don't have online versions, so I can only read the abstracts. Before I settle on a particular algorithm to implement, it would be good to hear from a few experts to separate the wheat from the chaff.
I have a working matching program that uses triangles, but it fails fairly often (~5% of the time) in a way that leaves the solution transformation with obvious distortions, for no obvious reason. That program was not written by me and comes from a paper written almost 20 years ago. I want to write a new implementation that performs more robustly. I am assuming (hoping) that there have been advances in this area that make this plausible.
If you're interested in star matching, check out the Astrometry.net blind astrometry solver and the paper on it here. They use four point quads to solve star configurations in Flickr pictures of the night sky. Check out this interview.
There is no single "best" algorithm for this. There are lots of different techniques, and each works better than the others on specific datasets and types of data.
One thing I'd recommend is to read this introduction to image registration from the tutorials of the Insight Toolkit (ITK). ITK supports MANY types of image registration (which is what it sounds like you are attempting) and is very robust in many cases. Most of its users are in the medical field, so you'll have to wade through a lot of medical jargon, but the algorithms and code work with any type of image (including 1-, 2-, 3-, and n-dimensional images of different types, etc.).
You can consider applying your algorithm first only to the N brightest stars, then progressively including the others to refine the result, reducing the search range at the same time.
Using RANSAC for robustness against spurious extra points is also very common.
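To make the RANSAC suggestion concrete, here is a very rough sketch (not a reference implementation) for a similarity transform, i.e. translation + rotation + scale, using the complex-number trick for two-point correspondences; pts_a and pts_b are Nx2 float arrays, and the tolerance and iteration count are arbitrary:

import numpy as np

def fit_similarity(a1, a2, b1, b2):
    # Solve b = m*a + t in the complex plane, where m encodes rotation and scale.
    a1, a2, b1, b2 = [complex(*p) for p in (a1, a2, b1, b2)]
    m = (b2 - b1) / (a2 - a1)
    t = b1 - m * a1
    return m, t

def ransac_match(pts_a, pts_b, iters=2000, tol=1.0, rng=np.random):
    A = pts_a[:, 0] + 1j * pts_a[:, 1]
    B = pts_b[:, 0] + 1j * pts_b[:, 1]
    best = (0, None)
    for _ in range(iters):
        # Guess a correspondence: two stars from each list.
        i, j = rng.choice(len(A), 2, replace=False)
        k, l = rng.choice(len(B), 2, replace=False)
        m, t = fit_similarity(pts_a[i], pts_a[j], pts_b[k], pts_b[l])
        mapped = m * A + t
        # Count points that land near some point in the other list (naive O(N^2)).
        inliers = np.sum(np.abs(mapped[:, None] - B[None, :]).min(axis=1) < tol)
        if inliers > best[0]:
            best = (inliers, (m, t))
    return best   # (inlier count, complex similarity transform)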
I'm not sure it would work, but worth a try:
For each star, compute the circle-times-ray Fourier transform, centered on that star, of all the other stars (note: this is not the standard Fourier transform, which is line times line).
The phase space of circle times ray is integers times line, but since we only have finite accuracy, you just get a matrix; the dimensions of the matrix depend on the accuracy. Now try to pair the matrices with one another (e.g. using the L_2 norm).
I saw a program on TV a while ago about how researchers were taking pictures of whales and using the spots on them (which are unique to each whale) to identify individual whales. It used the angles between the spots: by using angles, it didn't matter whether the image was rotated, scaled, or translated. That sounds similar to what you're doing with your triangles.
I think the "best" (most technical) way would to be to take the Fourier Transform of the original image and of the new linearly modified image. By doing some simple filtering, it should be easy to figure out the orientation and scale of your image with respect to the old one. There is a description of the 2d Fourier Transform here.

Resources