On this page, and in the docs for p5.js, Perlin noise is described as having an output range of [0,1], but all other implementations I've found have a small range symmetric about 0, and it seems that this is what should theoretically hold as well. What's going on with Processing's implementation of Perlin noise? What are they doing differently? How can I replicate this in, say, Python?
Processing is open-source. You can view the source code of the noise() function here (currently line 5293).
You also might want to read through this question which discusses the output range of Perlin noise.
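If all you need in Python is the same output range, rather than Processing's exact algorithm, a simple remap works. A minimal sketch, assuming the PyPI noise package (its pnoise2 returns values roughly symmetric about 0); the octave count and sampling step here are arbitrary:

    from noise import pnoise2  # PyPI 'noise' package; output is roughly in [-1, 1]

    def processing_like_noise(x, y, octaves=4):
        # Remap the symmetric output to [0, 1], matching Processing's documented range.
        return (pnoise2(x, y, octaves=octaves) + 1) / 2

    samples = [processing_like_noise(i * 0.01, 0.0) for i in range(1000)]
    print(min(samples), max(samples))  # stays inside [0, 1]

Note this only reproduces the range; Processing's noise() is its own octave-summed implementation, so the values won't match sample for sample.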
I'm working on light propagation and I compute PSFs for multifocal systems. I need a way to estimate the quality of each z-slice that I compute. For instance:
This could be considered a focal point, since all the light is condensed in a small region.
And this, by contrast, is definitely not one:
I'm looking for a way to express the quality of each slice. I thought about taking the standard deviation of the best Gaussian surface that fits the slice. For that I looked into numpy.std, but that is clearly not the std I'm looking for.
I also looked into scipy.stats.kurtosis; it's interesting, but it wasn't reliable in my tests, and in the future I will need to apply this computation both to the PSF and to the MTF (the Fourier transform of the PSF).
I know the precise physical size of each pixel, and I want the standard deviation in the sense of the width, at half height, of the surface that fits my dataset.
I looked into Gaussian regression, but it takes far too long per slice. I'm sure there is a reasonably simple way to compute this std (if that's really what it's called).
This is written in Python, but I intentionally don't add the tag because I'm sure people working in other languages could help with this question as well.
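For what it's worth, here is a minimal sketch of one moment-based reading of that "std": the intensity-weighted RMS width of the slice, which for a true Gaussian spot equals the Gaussian's sigma (multiply by 2*sqrt(2*ln 2) to get the width at half height). The function name and the crude background subtraction are my own assumptions:

    import numpy as np

    def slice_width(slice2d, pixel_size=1.0):
        # Intensity-weighted RMS width of a 2D slice, in physical units.
        intensity = np.asarray(slice2d, dtype=float)
        intensity = intensity - intensity.min()       # crude background removal
        total = intensity.sum()
        ys, xs = np.indices(intensity.shape)
        cx = (xs * intensity).sum() / total           # intensity-weighted centroid
        cy = (ys * intensity).sum() / total
        var = (((xs - cx) ** 2 + (ys - cy) ** 2) * intensity).sum() / total
        return np.sqrt(var / 2) * pixel_size          # per-axis sigma, assuming isotropy

A tight focal spot gives a small width and a spread-out slice gives a large one, with no per-slice fitting required.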
Without bringing in OpenCV or calling a QR code recognition API, is there any quick and reliable algorithm to determine whether a QR code is present in an image?
The intention of this question is to improve the user experience of scanning QR codes. When QR code recognition fails, the program needs to know whether there really is a QR code present, so that it can try to scan and recognize it again, or whether there is no QR code at all, so that it can call other procedures.
To echo some responses: the detection program doesn't need to be 100% accurate, just return an accurate result with reasonable probability. If we could use OpenCV here, a Fourier transform would be easy to implement to detect whether there is pronounced high-frequency content in an image, which is a good sign that a QR code is present. But integrating OpenCV would greatly increase the size of my program, which I want to avoid.
It's great that you want to provide feedback to a user. Providing graphics that indicate the user is "getting warmer" in finding the QR code can make the process of finding and reading a code quicker and smoother.
It looks like you already have your answer, but to provide a more robust solution and/or have options, you might try one or more of the following:
Use N iterations of a morphological close on the dark pixels, so that the resulting squarish checkerboard pattern more closely resembles a filled square. This was part of a detection method I used to determine whether a DataMatrix (a similar 2D code) was present, readable or not. Whether this works will depend greatly on your background.
Before applying the FFT, consider finding an affine transform to reduce perspective distortion. Analyzing FFT data can be a pain if the frequencies have a bit of spread because of foreshortening.
You could get some decent results using texture measures such as Local Binary Patterns (LBPs) or older techniques such as Laws' texture energy measures. You might even get lucky and be able to detect slight differences in the histograms of texture measures between a 2D code and a checkerboard pattern.
In regions of checkerboard-like patterns, look for the 3 guide features at the corners of the QR code. You could try SIFT/SURF-like methods, or perhaps implement a simpler match method by using a limited number of correlation templates that are tested in scale space.
Speaking of scale space: generate an image pyramid to save yourself the trouble of searching for squares in full-resolution images. You could try edge-preserving or non-edge-preserving methods to generate the smaller images in the pyramid, or perhaps a combination of both.
If you have code for fast kernel processing, you might try a corner detection method to reduce the amount of data you process to detect checkerboard-like patterns.
Look for clear bimodal distributions of grayscale values in squarish regions (see the sketch after this list). 2D codes on paper labels tend to have stark contrast, even though 2D codes on paper remain quite readable at low contrast.
Rather than look for bimodal distribution of grayscale values, you could look for regions where gradient magnitudes are very consistent, nearly unimodal.
If you know the min/max area limits of a readable QR code, you could probabilistically sample the image for patches that match one or more of the above criteria: one mode of gradient magnitudes, nearly evenly spaced corner points, etc. If a patch does look promising, then jump to another random position, with the caveat that the new patch was not previously found unpromising.
If you have the memory for an image pyramid, then working with reduced resolution images could be advantageous since you could try a number of tests fairly quickly.
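As a concrete example of the bimodality test flagged above, here is a minimal numpy sketch using the Otsu criterion (maximum between-class variance over total variance) as a cheap bimodality score; the bin count and any threshold you apply to the score are assumptions to tune on real images:

    import numpy as np

    def bimodality_score(gray_patch, bins=64):
        # Otsu-style criterion: max between-class variance / total variance, in [0, 1].
        hist, edges = np.histogram(gray_patch, bins=bins, range=(0, 255))
        p = hist / hist.sum()
        centers = (edges[:-1] + edges[1:]) / 2
        mean = (p * centers).sum()
        total_var = (p * (centers - mean) ** 2).sum()
        if total_var == 0:
            return 0.0
        best = 0.0
        for t in range(1, bins):
            w0, w1 = p[:t].sum(), p[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            m0 = (p[:t] * centers[:t]).sum() / w0
            m1 = (p[t:] * centers[t:]).sum() / w1
            best = max(best, w0 * w1 * (m0 - m1) ** 2)
        return best / total_var

Scores near 1 suggest a strongly two-toned patch, as you'd expect from the dark/light modules of a 2D code; flat or unimodal texture scores near 0.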
As far as user interaction is concerned, you might also update the "this might be a QR code" graphic multiple times during pre-processing, and indicate degrees of confidence with progressively stronger/greener graphics (or whatever color is appropriate for the local culture). For example, if a patch of texture has a roughly 60% chance of being a QR code, you might display a thin yellowish-green rectangle with a dashed border. For an 80% - 90% likelihood you might display a solid rectangle of a more saturated green color. If you can update the graphics about every 100 - 200 milliseconds then a user will have some idea that some action such as moving the smart phone is helping or hurting.
1) Convert the image to grayscale.
2) Divide the image into n x m cells, say 3 x 3. This is intended to guarantee that at least one cell will be fully covered by the QR code, if there is one.
3) Apply a 2D Fourier transform to each cell. If any cell has significantly large values in the high-frequency region along both the X and Y axes, there is a high likelihood that a QR code is present.
I am addressing this as a probability question rather than 100% accurate detection. Under this algorithm a chessboard will be detected as a QR code as well.
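A minimal numpy sketch of this procedure; the 3 x 3 grid, the size of the "low-frequency" region, and the energy ratio are all arbitrary values to tune:

    import numpy as np

    def looks_like_qr_cell(gray_cell, band=0.25, ratio=0.5):
        # Is there significant energy outside the central (low-frequency)
        # region of the shifted 2D spectrum?
        spec = np.abs(np.fft.fftshift(np.fft.fft2(gray_cell)))
        h, w = spec.shape
        spec[h // 2, w // 2] = 0.0                    # drop the DC term
        ry, rx = int(h * band), int(w * band)
        low = spec[h//2 - ry : h//2 + ry, w//2 - rx : w//2 + rx].sum()
        high = spec.sum() - low
        return high > ratio * low

    def might_contain_qr(gray, rows=3, cols=3):
        # Test each cell of the rows x cols grid described in step 2.
        h, w = gray.shape
        return any(
            looks_like_qr_cell(gray[i*h//rows:(i+1)*h//rows, j*w//cols:(j+1)*w//cols])
            for i in range(rows) for j in range(cols)
        )

As noted, a chessboard (or any fine checkered texture) passes this test too.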
Can anybody explain to me (in simplified terms) what happens when I compare images with an FFT? I somehow don't understand how it's possible to convert a picture into frequencies, and how this is used to differentiate between two images. Via Google I cannot find a simple description that I (as a non-mathematician/non-computer-scientist) can understand.
Any help would be very appreciated!
Thanks!
Alas, a good description of an FFT might involve subjects such as the calculus of complex variables and the computational theory of recursive algorithms. So a simple description may not be very accurate.
Think about sound. Looking at the waveforms of the sound produced by two singers might not tell you much; the two waveforms would just be complicated, long, messy-looking squiggles. But a frequency meter could quickly tell you that one person was singing way off pitch, and whether they were a soprano or a bass. So from the frequency meter readings you might be able to determine that certain waveforms did not indicate a good match for who was singing.
An FFT is like a big bunch of frequency meters. And each scan line of a photo is a waveform.
Around two centuries ago, some guy named Fourier proved that any reasonable-looking waveform squiggle could be matched by an appropriate bunch of plain sine waves, each at a single frequency. Other people, several decades ago, figured out a very clever way of very quickly calculating just which bunch of sine waves that was: the FFT.
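You can see the "frequency meter" at work in a few lines of numpy: the FFT of a pure tone has one dominant bin, and that bin's index tells you the pitch. (The 8000 Hz rate and 440 Hz tone are just example values.)

    import numpy as np

    rate = 8000                          # samples per second
    t = np.arange(rate) / rate           # one second of "audio"
    tone = np.sin(2 * np.pi * 440 * t)   # an A at 440 Hz

    spectrum = np.abs(np.fft.rfft(tone))
    freqs = np.fft.rfftfreq(len(tone), d=1 / rate)
    print(freqs[spectrum.argmax()])      # 440.0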
A discrete FFT transforms a 2D matrix of, let's say, pixel values into a 2D matrix in the frequency domain. You can use a library like FFTW to convert an image from its ordinary form to its spectral one. The result of your comparison depends on what you actually compare.
The Fourier transform works in dimensions other than 2D as well, but here you'll be interested in the 2D FFT.
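As one concrete (and deliberately simple) example of such a comparison, here is a sketch that correlates the log-magnitude spectra of two grayscale images, using numpy in place of FFTW; the choice of metric is my own assumption:

    import numpy as np

    def spectral_similarity(img_a, img_b):
        # Normalized correlation of log-magnitude spectra; 1.0 means identical spectra.
        spec_a = np.log1p(np.abs(np.fft.fft2(img_a)))
        spec_b = np.log1p(np.abs(np.fft.fft2(img_b)))
        spec_a = (spec_a - spec_a.mean()) / spec_a.std()
        spec_b = (spec_b - spec_b.mean()) / spec_b.std()
        return float((spec_a * spec_b).mean())

One reason to compare magnitude spectra at all: shifting an image changes only the phase of its transform, not the magnitude, so this comparison ignores translation.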
I want to implement the two image resampling algorithms mentioned above (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operating principle of both algorithms is pretty simple: they're both convolution filters. For each output value, a convolution filter moves the convolution function's point of origin to be centered on that output, multiplies every input value by the value of the convolution function at its location, and adds the results together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, their integrals correspond to overall brightness, so if you want the brightness to remain the same, the integral of the convolution function needs to be one.
One way to understand them is to think of the convolution function as something that shows how much each input pixel influences the output pixel, depending on its distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
For Lanczos interpolation the convolution function is based on sinc(x) = sin(x*pi)/(x*pi), windowed by a wider sinc so that only the first few lobes are kept. Usually 3:
lanczos(x) = {
    0 if abs(x) > 3,
    1 if x == 0,
    else sinc(x) * sinc(x/3)
}
This function is called the filter kernel.
To resample with Lanczos, imagine you overlay the output and input grids on each other, with points marking where the pixel locations are. For each output pixel location you take a box of +-3 output pixels around that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, with the distance from the output location, in output pixel coordinates, as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value by the corresponding weight and add the results together to get the value of the output pixel.
Because the Lanczos function is separable and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately, and precalculating the vertical filters for each row and the horizontal filters for each column.
Bicubic convolution is basically the same, with a different filter kernel function.
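A minimal numpy sketch of the procedure described above (numpy for brevity, even though the question asks about C++), resampling one axis at a time and normalizing the weights per output pixel. Edge handling by clamping, and the absence of extra kernel widening for strong downsampling, are simplifications:

    import numpy as np

    def lanczos_kernel(x, a=3):
        # Lanczos-a kernel: sinc(x) * sinc(x/a) inside |x| < a, zero outside.
        x = np.asarray(x, dtype=float)
        out = np.sinc(x) * np.sinc(x / a)   # np.sinc(x) is sin(pi*x)/(pi*x)
        out[np.abs(x) >= a] = 0.0
        return out

    def resample_axis(signal, new_len, kernel=lanczos_kernel, radius=3):
        # Resample a 1D numpy array to new_len samples with the given kernel.
        old_len = len(signal)
        scale = old_len / new_len
        out = np.empty(new_len)
        for i in range(new_len):
            center = (i + 0.5) * scale - 0.5            # output sample in input coords
            taps = np.arange(int(np.floor(center)) - radius + 1,
                             int(np.floor(center)) + radius + 1)
            weights = kernel(taps - center)
            weights /= weights.sum()                    # normalize so weights sum to 1
            out[i] = weights @ signal[np.clip(taps, 0, old_len - 1)]
        return out

    def resample_2d(img, new_h, new_w):
        # Horizontal pass, then vertical pass, using separability.
        tmp = np.array([resample_axis(row, new_w) for row in img])
        return np.array([resample_axis(col, new_h) for col in tmp.T]).T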
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in Skia have a pretty well-commented implementation of Lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking about a very basic operation in image processing, and any decent introductory textbook on the subject will describe it. If I remember correctly, Gonzalez and Woods is decent on it, but I'm away from my books and can't check.
Now, on to the particulars. It should help to think about what you are doing fundamentally: you have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, let's imagine you want a new measurement in between every pair that you already have (e.g., doubling the resolution).
Now, you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How? A very simple way would be to interpolate linearly. Everyone knows how to do this with two points: you just draw a line between them and read the new value off the line (in this case, at the halfway point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points: you want to fit something to more data points (pixels), something that can be nonlinear. A good trade-off between accuracy and computational cost is the cubic spline. This gives you a smooth fitted line, and again you approximate your new "measurement" by the value it takes at the midpoint. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it's just a multiplication, so it can be implemented quite quickly. But you don't need to worry about the implementation to understand the idea: the convolution result at any point is one function (your image) integrated against another function, typically one with much smaller support (the part that is non-zero), called the kernel, after that kernel has been centered over that particular point. In the discrete world, this is just a sum of products.
It turns out that you can design a convolution kernel that has properties quite like those of the cubic spline, and use it to get a fast "bicubic" interpolation.
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means the two will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, as does any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes have specialized assumptions that make them more efficient but less general.
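To make "design a kernel like the cubic spline" concrete: the standard choice is the Keys cubic convolution kernel (Keys, 1981), which is what most "bicubic" resizers use. A minimal sketch; it drops into the resample_axis function from the earlier answer with radius=2 instead of 3, since its support is narrower than Lanczos-3:

    import numpy as np

    def keys_cubic_kernel(x, a=-0.5):
        # Keys (1981) cubic convolution kernel; support is |x| < 2.
        # a = -0.5 is the common choice (Catmull-Rom-like behaviour).
        x = np.abs(np.asarray(x, dtype=float))
        out = np.zeros_like(x)
        near, far = x <= 1, (x > 1) & (x < 2)
        out[near] = (a + 2) * x[near]**3 - (a + 3) * x[near]**2 + 1
        out[far] = a * (x[far]**3 - 5 * x[far]**2 + 8 * x[far] - 4)
        return out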
For a basic understanding of the different image interpolation methods, I would like to suggest the article Image interpolation via convolution. If you want to try more interpolation methods, the imageresampler project is a nice open-source codebase to begin with.
In my opinion, image interpolation can be understood from two angles: a function-fitting perspective and a convolution perspective. For example, the spline interpolation explained in Image interpolation via convolution is well explained from the function-fitting perspective in Cubic interpolation.
Additionally, image interpolation is always tied to a specific application, e.g., image zooming, image rotation, and so on. For a specific application, image interpolation can often be implemented in a smart way. For example, image rotation can be implemented via a three-shear method (see the check below), and during each shearing operation a different one-dimensional interpolation algorithm can be used.
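As a quick check on the three-shear idea: a rotation by theta factors exactly into shear(x) * shear(y) * shear(x) with coefficients -tan(theta/2) and sin(theta), so each pass only ever moves pixels along one axis and needs only 1D interpolation. A small numeric verification:

    import numpy as np

    theta = np.deg2rad(30)
    shear_x = np.array([[1, -np.tan(theta / 2)], [0, 1]])
    shear_y = np.array([[1, 0], [np.sin(theta), 1]])
    rotation = shear_x @ shear_y @ shear_x

    # Matches the usual rotation matrix up to floating-point error.
    assert np.allclose(rotation, [[np.cos(theta), -np.sin(theta)],
                                  [np.sin(theta),  np.cos(theta)]])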
I have been looking into different algorithms lately and have read quite a lot about Perlin noise. It seems like the only things people use it for are generating textures (clouds/wood grain) or distributing trees.
What else can Perlin Noise be used for?
The best treatise I know on Perlin noise and the things you can do with it is in Texturing and Modelling by Ebert, but Hugo Elias put together a rather good collection of pages on noise and related subjects some time back which is worth a look.
I used it extensively for creating realistic-looking landscapes when I wrote a series of landscape visualisation programs back in the late '90s and early 2000s, using various forms of Perlin noise processes to handle the terrain generation. Many other programs do similar things - the wonderful Terragen, for example.
I've also used it to apply realistic noise on top of other textures, for example to add 'roughness' to a Photorealistic Textile plugin for Photoshop.
Basically the charm of Perlin noise is that it's not random but turbulent, so wherever you have a non-deterministic phenomenon it can be applied to give more 'natural' results. It's definitely a set of routines that any programmer should be familiar with, as its use is appropriate in many circumstances where people tend to reach for a random number generator. For example, using a Perlin function to derive variations in the velocity of some modelled moving entity in a game (say, due to wind or some such) works far better than applying random changes.
Don't forget about Worley noise too. It's a useful complement to Perlin.
The paper itself is here.
http://www.cse.ohio-state.edu/~nouanese/782/lab4/
http://www.flickr.com/photos/12739382@N04/2652571038/
I've seen it used to make virtual character motion seem more realistic.
It can be used in 4 dimensions (i.e. x, y, z, time) to create volumetric clouds that appear and disappear. Add a base movement vector that varies over time and you have wind, too.
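A minimal sketch of that idea, using the snoise4 function from the PyPI noise package (simplex noise rather than classic Perlin, but the principle is the same); the sampling scale and the [0, 1] remap are arbitrary choices:

    import numpy as np
    from noise import snoise4  # PyPI 'noise' package

    def cloud_volume(n, t, scale=0.1):
        # Sample an n^3 block of 4D noise; advancing t between frames
        # makes the clouds evolve, appear, and disappear.
        vol = np.empty((n, n, n))
        for x in range(n):
            for y in range(n):
                for z in range(n):
                    vol[x, y, z] = snoise4(x * scale, y * scale, z * scale, t)
        return (vol + 1) / 2   # remap roughly [-1, 1] to a [0, 1] density

Adding a time-varying offset to the x coordinate gives the base movement vector (wind) mentioned above.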
One related use is fractal-generated terrain.