I'm trying to approximate a fisheye lens distortion. I originally used the polynomial method described in this paper, and that worked fine for a forward transform. However, I forgot that I would need some sort of interpolation, so a backward transform was needed, and I would need an inverse function for this transformation, which proved problematic. (I used the non-alternating power sign version, i.e. SUM(polynomial_coefficients[i] * radius^i), so the division model didn't appear to be appropriate, and it would spit out bad results if I tried to use the non-alternating power version because I would be dividing by my radius.) I switched to what appears to be a more accurate method (correct me if I'm wrong and provide a more accurate method) via
r_distorted = scalar * ln(1 + lambda * r_undistorted)
and
r_undistorted = (e^(r_distorted/scalar) - 1)/lambda
which was featured in the same paper. In the source paper I didn't understand how you would ever end up with no distortion with lower values of lambda, or what the heck I was supposed to do with the scalar value. I wanted to test my code in situations where lens distortion was zero; however, this formula does not seem to provide a way for me to set the parameters to some value where r_distorted = r_undistorted under both the forward and the inverse transform, for all radii. This was trivial, however, in the polynomial example.
Currently I have the algorithm implemented, but values of 0 for lambda and 1 for scale do not result in no distortion (indeed it's obvious to see why), since 1*ln(1 + 0*x) = 0. This source also alters the equation to be in terms of the distance from the image plane (f in the images) and tan(theta), which leaves me even more confused. It would seem that there must be another variable implicitly involved in the equation that would allow such a transformation (no transform) to happen. It also appears unintuitive how to actually control distortion using these two equations.
In short, how do I use this equation to apply no distortion, and what do both lambda and scalar mean physically, and what do they do? Are there better methods for accuracy in approximating fisheye transform with inverse?
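For reference, here is a minimal Python sketch of the two formulas above as a forward/inverse pair. The function names are made up for illustration. The last lines couple the parameters as scalar = 1/lambda, which is an assumption made here and not something taken from the paper: because ln(1 + x) is approximately x for small x, that coupling makes the forward transform approach the identity as lambda shrinks.

```python
import math

def distort(r_undistorted, scalar, lam):
    """Forward transform: r_distorted = scalar * ln(1 + lambda * r_undistorted)."""
    return scalar * math.log1p(lam * r_undistorted)

def undistort(r_distorted, scalar, lam):
    """Inverse transform: r_undistorted = (e^(r_distorted / scalar) - 1) / lambda."""
    return math.expm1(r_distorted / scalar) / lam

if __name__ == "__main__":
    r = 0.75
    # Round trip: the inverse should undo the forward transform.
    print(undistort(distort(r, 2.0, 0.5), 2.0, 0.5))
    # Assumed coupling scalar = 1 / lambda with a small lambda: nearly no distortion.
    lam = 1e-6
    print(distort(r, 1.0 / lam, lam))
```

With this coupling a single parameter controls the strength of the distortion, which may or may not match how the paper intends the two values to be used.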
My question may seem trivial, but the more I read about it - the more confused I get... I have started a little project where I want to roughly track the movements of a rotating object. (A basketball to be precise)
I have a 3-axis accelerometer (low-pass-filtered) and a 3-axis gyroscope measuring °/s.
I know about the issues of a gyro, but as the measurements will only last several seconds and the angles tend to be huge, I don't care about drift and gimbal lock right now.
My gyro gives me the rotation speed around all 3 axes. As I want to integrate the acceleration twice to get the position at each timestep, I wanted to convert the sensor's coordinate system into an earthbound system.
For the first try, I want to keep things simple, so I decided to go with the bog-standard rotation matrix.
But as my results are horrible, I wonder if this is the right way to do so. If I understood correctly, the matrix is simply 3 matrices multiplied in a certain order. As the rotation of a basketball doesn't have any "natural" order, this may not be a good idea. My sensor measures 3 angular velocities at once. If I throw them into my system "step by step", it will not be correct, since my second matrix calculates the rotation around the "new y-axis", but my sensor actually measured an angular velocity around the "old y-axis". Is that correct so far?
So how can I correctly calculate the 3D rotation?
Do I need to go for quaternions? But how do I get one from 3 different rotations? And don't I have the same issue here again?
I start with an identity matrix ((1, 0, 0)(0, 1, 0)(0, 0, 1)) multiplied with the acceleration vector to give me the first movement.
Then I want to use the rotation matrix to find out where the next acceleration is really heading, so I can simply add the accelerations together.
But right now I am just too confused to find a proper way.
Any suggestions?
btw. sorry for my poor English, I am tired and (obviously) not a native speaker ;)
Thanks,
Alex
Short answer
Yes, go for quaternions and use a first order linearization of the rotation to calculate how orientation changes. This reduces to the following pseudocode:
float pose_initial[4]; // quaternion describing original orientation
float g_x, g_y, g_z; // gyro rates
float dt; // time step. The smaller the better.
// quaternion with "pose increment", calculated from the first-order
// linearization of continuous rotation formula
delta_quat = {1, 0.5*dt*g_x, 0.5*dt*g_y, 0.5*dt*g_z};
// final orientation at start time + dt
pose_final = quaternion_hamilton_product(pose_initial, delta_quat);
This solution is used in PixHawk's EKF navigation filter (it is open source, check out formulation here). It is simple, cheap, stable and accurate enough.
Unit matrix (describing a "null" rotation) is equivalent to quaternion [1 0 0 0]. You can get the quaternion describing other poses using a suitable conversion formula (for example, if you have Euler angles you can go for this one).
Notes:
Quaternions follow the [w, i, j, k] notation.
These equations assume angular speeds in SI units, that is, radians per second.
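As a rough illustration of the pseudocode above, here is a small self-contained Python sketch. The helper names (hamilton_product, normalize, integrate_gyro) are illustrative and not taken from any particular library. The normalization after each step is an addition of this sketch, not part of the pseudocode; it keeps the result a unit quaternion over many steps.

```python
import math

def hamilton_product(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def integrate_gyro(pose, gyro, dt):
    """One first-order update; gyro rates are in rad/s in the body frame."""
    gx, gy, gz = gyro
    delta_quat = (1.0, 0.5 * dt * gx, 0.5 * dt * gy, 0.5 * dt * gz)
    return normalize(hamilton_product(pose, delta_quat))

if __name__ == "__main__":
    pose = (1.0, 0.0, 0.0, 0.0)            # identity orientation
    rate = (0.0, 0.0, math.pi / 2.0)       # 90 degrees per second around z
    for _ in range(1000):                  # one second in 1 ms steps
        pose = integrate_gyro(pose, rate, 0.001)
    print(pose)  # close to a 90 degree rotation around z
```

Since the question mentions a gyro that reports degrees per second, the rates would need converting with math.radians before being passed in.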
Long answer
A gyroscope describes the rotational speed of an object as a decomposition in three rotational speeds around the orthogonal local axes XYZ. However, you could equivalently describe the rotational speed as a single rate around a certain axis, either in a reference system that is local to the rotated body or in a global one.
The three rotational speeds affect the body simultaneously, continuously changing the rotation axis.
Here we have the problem of switching from the continuous-time real world to a simpler discrete-time formulation that can be easily solved using a computer. When discretizing, we are always going to introduce errors. Some approaches will lead to bigger errors, while others will be notably more accurate.
Your approach of concatenating three simultaneous rotations around orthogonal axes works reasonably well with small integration steps (let's say smaller than 1/1000 s, although it depends on the application), so that you are simulating the continuous change of rotation axis. However, this is computationally expensive, and the error grows as you make the time steps bigger.
As an alternative to first-order linearization, you can calculate pose increments as a small delta of angular speed gradient (also using quaternion representation):
quat_gyro = {0, g_x, g_y, g_z};
q_grad = 0.5 * quaternion_product(pose_initial, quat_gyro);
// Important to normalize result to get unit quaternion!
pose_final = quaternion_normalize(pose_initial + q_grad*dt);
This technique is used in the Madgwick rotation filter (here an implementation), and works pretty well for me.
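Expanding the Hamilton product shows that, before normalization, this derivative form gives the same result as the delta quaternion from the short answer, because q * (1, v) = q + q * (0, v). The Python sketch below checks that numerically. The helper names are illustrative and are not taken from the Madgwick code.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def step_delta(pose, gyro, dt):
    """Short-answer form: multiply by the delta quaternion (not normalized)."""
    gx, gy, gz = gyro
    return quat_mul(pose, (1.0, 0.5 * dt * gx, 0.5 * dt * gy, 0.5 * dt * gz))

def step_gradient(pose, gyro, dt):
    """Derivative form: pose + 0.5 * (pose * quat_gyro) * dt (not normalized)."""
    q_grad = tuple(0.5 * c for c in quat_mul(pose, (0.0,) + tuple(gyro)))
    return tuple(p + g * dt for p, g in zip(pose, q_grad))

if __name__ == "__main__":
    pose = normalize((0.9, 0.1, -0.2, 0.3))
    gyro = (0.4, -1.2, 2.0)  # rad/s
    print(step_delta(pose, gyro, 0.01))
    print(step_gradient(pose, gyro, 0.01))  # same values up to rounding
```

In other words, the practical difference between the two snippets is the explicit normalization step, not the update itself.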
There exist several ways to evaluate an image: brightness, saturation, hue, intensity, contrast, etc. And we always hear about the operations of smoothing or sharpening an image. From this, there must exist a way to evaluate the overall smoothness of an image, and an exact way to compute this value in one formula, probably based on wavelets. Or, with luck, someone could even provide the MATLAB function or combination of functions to calculate this value directly.
Thanks in advance!
Smoothness is a vague term. What is considered smooth for one application might not be considered smooth for another.
In the common case, smoothness is a function of the color gradients. Take a 2D gradient on the 3 color channels, then take their magnitude, sqrt(dx^2 + dy^2), and average, sum or apply some other function over the 3 channels. That can give you local smoothness, which you can then sum/average/least squares over the image.
Often, however, linear changes in color are also considered smooth (think 2-color gradients, or how light might be reflected from an object). For that, a second differential could be more suitable. A Laplacian does exactly that.
I've had much luck using the laplacian operator for calculating smoothness in Python with the scipy/numpy libraries. Similar utilities exist for matlab and other tools.
Note that the resulting value isn't something absolute from the math books, you should only use it relative to itself and using constants you deem fit.
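The gradient-magnitude variant described above could be sketched like this with NumPy. The function name gradient_roughness and the choice of a plain mean over pixels and channels are assumptions made for illustration; any other way of combining the values is just as valid.

```python
import numpy as np

def gradient_roughness(image):
    """Mean gradient magnitude of an H x W x C float image (lower means smoother)."""
    # Differentiate along rows (axis 0) and columns (axis 1) only,
    # so the derivative never runs across the colour channels.
    dy, dx = np.gradient(image, axis=(0, 1))
    magnitude = np.sqrt(dx ** 2 + dy ** 2)   # per pixel, per channel
    return float(magnitude.mean())           # average over pixels and channels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full((32, 32, 3), 0.5)         # perfectly uniform image
    noisy = rng.random((32, 32, 3))          # random noise image
    print(gradient_roughness(flat), gradient_roughness(noisy))
```

A uniform image scores exactly zero, and the score grows as neighbouring pixels differ more.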
Specific how to:
First get SciPy. If you are on Linux, it's available on PyPI. For Windows you may have to use a precompiled version here. Open the image (scipy.ndimage.imread has been removed from recent SciPy releases, so the code below uses imageio.v3.imread instead) and then use scipy.ndimage.laplace on the image you read. You don't actually have to mix the channels, you can simply call numpy.average and it should be close enough.
import imageio.v3 as iio
import numpy as np
import scipy.ndimage as ndi
# path: filename of the image to analyse
image = iio.imread(path).astype(float) / 255.0
print(np.average(np.absolute(ndi.laplace(image))))
This would give the average smoothness (for some meaning of smoothness) of the image. I use np.absolute since values can be positive or negative and we don't want them to even out when averaging. I convert to float and divide by 255 to have values between 0.0 and 1.0 instead of 0 to 255, since it's easier to work with. Note that laplace treats the array as N-dimensional, so on an RGB image it also differentiates along the channel axis, which is one more reason to treat the value as relative only.
If you want to see what the Laplacian found, you can use matplotlib:
import matplotlib.pyplot as plt
v = np.absolute(ndi.laplace(image))
v2 = np.average(v, axis=2) # Mixing the channels down
plt.imshow(v2)
plt.figure()
plt.imshow(v2 > 0.05)
plt.show()
So the Wikipedia page for path tracing (http://en.wikipedia.org/wiki/Path_tracing) contains a naive implementation of the algorithm with the following explanation underneath:
"All these samples must then be averaged to obtain the output color. Note this method of always sampling a random ray in the normal's hemisphere only works well for perfectly diffuse surfaces. For other materials, one generally has to use importance-sampling, i.e. probabilistically select a new ray according to the BRDF's distribution. For instance, a perfectly specular (mirror) material would not work with the method above, as the probability of the new ray being the correct reflected ray - which is the only ray through which any radiance will be reflected - is zero. In these situations, one must divide the reflectance by the probability density function of the sampling scheme, as per Monte-Carlo integration (in the naive case above, there is no particular sampling scheme, so the PDF turns out to be 1)."
The part I'm having trouble understanding is the part in bold. I am familiar with PDFs but I am not quite sure how they fit into here. If we stick to the mirror example, what would be the PDF value we would divide by? Why? How would I go about finding the PDF value to divide by if I was using an arbitrary BRDF value such as a Phong reflection model or Cook-Torrance reflection model, etc? Lastly, why do we divide by the PDF instead of multiply? If we divide, don't we give more weight to a direction with a lower probability?
Let's assume that we have only materials without color (greyscale). Then, their BRDF at each point can be expressed as a single-valued function
float BRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit);
Here, phi and theta are the azimuth and zenith angles of the two rays under consideration. For pure Lambertian reflection, this function would look like this:
float lambertBRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit)
{
return albedo*1/pi*cos(theta_out);
}
albedo ranges from 0 to 1 - this measures how much of the incoming light is reemitted. The factor 1/pi ensures that the integral of BRDF over all outgoing vectors does not exceed 1. With the naive approach of the Wikipedia article (http://en.wikipedia.org/wiki/Path_tracing), one can use this BRDF as follows:
Color TracePath(Ray r, depth) {
/* .... */
Ray newRay;
newRay.origin = r.pointWhereObjWasHit;
newRay.direction = RandomUnitVectorInHemisphereOf(normal(r.pointWhereObjWasHit));
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*lambertBRDF(r.phi,r.theta,newRay.phi,newRay.theta,r.pointWhereObjWasHit);
}
As mentioned in the article and by Ross, this random sampling is unfortunate because it traces incoming directions (newRay's) from which little light is reflected with the same probability as directions from which there is lots of light. Instead, directions whence much light is reflected to the observer should be selected preferentially, to have an equal sample rate per contribution to the final color over all directions. For that, one needs a way to generate random rays from a probability distribution. Let's say there exists a function that can do that; this function takes as input the desired PDF (which ideally should be equal to the BRDF) and the incoming ray:
vector RandomVectorWithPDF(function PDF(p_i,t_i,p_o,t_o,point x), Ray incoming)
{
// This function is responsible for creating random rays emanating from x
// with the probability distribution PDF. Depending on the complexity of PDF,
// this might be somewhat involved. It is possible, however, to do it for Lambertian
// reflection (how exactly is math, not programming):
vector randomVector;
if(PDF==lambertBRDF)
{
float phi = uniformRandomNumber(0,2*pi); // azimuth, uniform around the normal
float zenith = acos(sqrt(uniformRandomNumber(0,1))); // cosine-weighted angle from the normal
randomVector = getVectorFromAzimuthZenithAndNormal(phi,zenith,normal(incoming.pointWhereObjWasHit));
}
else
{
// deal with other PDFs
}
return randomVector;
}
The code in the TracePath routine would then simply look like this:
newRay.direction = RandomVectorWithPDF(lambertBRDF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected;
Because the bright directions are preferred in the choice of samples, you do not have to weight them again by applying the BRDF as a scaling factor to reflected. However, if PDF and BRDF are different for some reason, you would have to scale down the output whenever PDF>BRDF (if you picked too many from the respective direction) and enhance it when you picked too few.
In code:
newRay.direction = RandomVectorWithPDF(PDF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*BRDF(...)/PDF(...);
The output is best, however, if BRDF/PDF is equal to 1.
The question remains: why can't one always choose the perfect PDF which is exactly equal to the BRDF? First, some random distributions are harder to compute than others. For example, if there was a slight variation in the albedo parameter, the algorithm would still do much better for the non-naive sampling than for uniform sampling, but the correction term BRDF/PDF would be needed for the slight variations. Sometimes, it might even be impossible to do it at all. Imagine a colored object with different reflective behavior of red, green and blue: you could either render in three passes, one for each color, or use an average PDF, which fits all color components approximately, but none perfectly.
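To see why the weight is a division by the PDF, here is a small Monte Carlo sketch in Python. It is not a path tracer: it only integrates the Lambertian term albedo/pi*cos(theta) over the hemisphere, whose exact value is the albedo. The function names are illustrative. Both estimators divide each sample by the density it was drawn with, so directions that are picked more often count for less per sample and the average stays the same.

```python
import math
import random

def lambert_term(cos_theta, albedo=1.0):
    """Lambertian BRDF times the cosine factor: albedo / pi * cos(theta)."""
    return albedo / math.pi * cos_theta

def estimate_uniform(n, rng):
    """Uniform hemisphere sampling: pdf = 1 / (2*pi) per unit solid angle."""
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(n):
        cos_theta = rng.random()  # for uniform directions, cos(theta) is uniform on [0, 1)
        total += lambert_term(cos_theta) / pdf
    return total / n

def estimate_cosine_weighted(n, rng):
    """Cosine-weighted sampling: pdf = cos(theta) / pi, proportional to the integrand."""
    total = 0.0
    for _ in range(n):
        cos_theta = math.sqrt(1.0 - rng.random())  # in (0, 1], so the pdf is never zero
        pdf = cos_theta / math.pi
        total += lambert_term(cos_theta) / pdf
    return total / n

if __name__ == "__main__":
    rng = random.Random(0)
    print(estimate_uniform(20000, rng))          # noisy estimate near 1
    print(estimate_cosine_weighted(20000, rng))  # every sample contributes 1
```

When the PDF matches the integrand, the ratio is constant and the estimate has no variance, which is the BRDF/PDF = 1 case mentioned above.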
How would one go about implementing something like Phong shading? For simplicity, I still assume that there is only one color component, and that the ratio of diffuse to specular reflection is 60% / 40% (the notion of ambient light makes no sense in path tracing). Then my code would look like this:
if(uniformRandomNumber(0,1)<0.6) //diffuse reflection
{
newRay.direction=RandomVectorWithPDF(lambertBRDF,r);
reflected = TracePath(newRay,depth+1)/0.6;
}
else //specular reflection
{
newRay.direction=RandomVectorWithPDF(specularPDF,r);
reflected = TracePath(newRay,depth+1)*specularBRDF/specularPDF/0.4;
}
return emittance + reflected;
Here specularPDF is a distribution with a narrow peak around the reflected ray (theta_in=theta_out,phi_in=phi_out+pi) for which a way to create random vectors is available, and specularBRDF returns the specular intensity from Phong's model (http://en.wikipedia.org/wiki/Phong_reflection_model).
Note how the PDFs are modified by 0.6 and 0.4 respectively.
I'm by no means an expert in ray tracing, but this seems to be classic Monte Carlo:
You have lots of possible rays, you choose one uniformly at random, and then you average over lots of trials. The distribution you used to choose one of the rays was uniform (they were all equally likely), so you don't have to do any clever re-normalising.
However, perhaps there are lots of possible rays to choose from, but only a few could possibly lead to useful results. We therefore bias towards picking those 'useful' possibilities with higher probability, and then re-normalise (we are not choosing the rays uniformly any more, so we can't just take the average). This is importance sampling.
The mirror example seems to be the following: only one possible ray will give a useful result. If we choose a ray at random then the probability that we hit that useful ray is zero. This is a property of conditional probability on continuous spaces (it's not actually continuous, it's implicitly discretised by your computer, so it's not quite true...): the probability of hitting something specific when there are infinitely many things must be zero.
Thus we are re-normalising by something with probability zero. Standard conditional probability definitions break when we consider events with probability zero, and that is where the problem would come from.
The image resizing function provided by Emgu (a .net wrapper for OpenCV) can use any one of four interpolation methods:
CV_INTER_NN (default)
CV_INTER_LINEAR
CV_INTER_CUBIC
CV_INTER_AREA
I roughly understand linear interpolation, but can only guess what cubic or area do. I suspect NN stands for nearest neighbour, but I could be wrong.
The reason I'm resizing an image is to reduce the number of pixels (they will be iterated over at some point) whilst keeping them representative. I mention this because it seems to me that interpolation is central to this purpose, so getting the right type ought to be quite important.
My question then, is what are the pros and cons of each interpolation method? How do they differ and which one should I use?
Nearest neighbor will be as fast as possible, but you will lose substantial information when resizing.
Linear interpolation is less fast, but will not result in information loss unless you're shrinking the image (which you are).
Cubic interpolation (probably actually "Bicubic") uses one of many possible formulas that incorporate multiple neighbor pixels. This is much better for shrinking images, but you are still limited as to how much shrinking you can do without information loss. Depending on the algorithm, you can probably reduce your images by 50% or 75%. The primary con of this approach is that it is much slower.
Not sure what "area" is - it may actually be "Bicubic". In all likelihood, this setting will give your best result (in terms of information loss / appearance), but at the cost of the longest processing time.
Update: this link gives more details (including a fifth type not included in your list):
http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize
The algorithms are: (descriptions are from the OpenCV documentation)
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
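To make the difference between nearest-neighbor and area-based decimation concrete, here is a toy Python sketch. It is not OpenCV's implementation: it only handles integer shrink factors on a list-of-lists greyscale image, and the function names are made up. On a one-pixel checkerboard, nearest neighbor always lands on the same colour, while area averaging keeps the mean brightness.

```python
def downsample_nearest(image, factor):
    """Keep one input pixel per output pixel (the top-left pixel of each block)."""
    return [row[::factor] for row in image[::factor]]

def downsample_area(image, factor):
    """Average each factor x factor block of input pixels."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height - factor + 1, factor):
        out_row = []
        for x in range(0, width - factor + 1, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

if __name__ == "__main__":
    checkerboard = [[(x + y) % 2 for x in range(8)] for y in range(8)]
    print(downsample_nearest(checkerboard, 2))  # every sample lands on the same colour
    print(downsample_area(checkerboard, 2))     # every block averages to mid-grey
```

This is the kind of aliasing that the documentation's "moiré-free" remark refers to.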
If you want more speed use Nearest Neighbor method.
If you want to preserve quality of Image after downsampling, you can consider using INTER_AREA based interpolation, but again it depends on image content.
You can find detailed analysis of speed comparison here
Below is the speed comparison on 400*400 px image taken from the above link
The interpolation method to use depends on what you are trying to achieve:
CV_INTER_LINEAR or CV_INTER_CUBIC apply a lowpass filter (average) in order to achieve a trade-off between visual quality and edge removal (lowpass filters tend to remove edges in order to reduce aliasing in images). Between these two, I'd recommend CV_INTER_CUBIC.
The CV_INTER_NN method is indeed nearest neighbour. It's the most basic method and you'll get sharper edges (no lowpass filter will be applied). However, this method is simply like "zooming" the image, with no visual enhancement.
They all lose information; which one you use depends on the speed you need, how much information you can afford to lose and the nature of your image.
Sorry, there is no correct answer. That's why there is a choice.
I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operation principle of both algorithms is pretty simple. They're both convolution filters. A convolution filter, for each output value, moves the convolution function's point of origin to be centered on the output, multiplies all the values in the input with the value of the convolution function at that location, and adds them together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, then the integral means average brightness and if you want the brightness to remain the same the integral of the convolution function needs to add up to one.
One way to understand them is to think of the convolution function as something that shows how much input pixels influence the output pixel depending on their distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
For Lanczos interpolation the convolution function is based on the sinc(x) = sin(x*pi)/(x*pi) function, windowed by a stretched copy of itself so that only the first few lobes are kept. Usually 3:
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sinc(x) * sinc(x/3)
}
This function is called the filter kernel.
To resample with Lanczos, imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
Because the Lanczos function has the separability property and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately and precalculating the vertical filters for each row and horizontal filters for each column.
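The one-dimensional pass described above might look roughly like the following Python sketch. It is an illustration under some assumptions, not the Skia code: the names are made up, pixel centres are taken to sit at half-integer positions, borders are handled by clamping, and the kernel is stretched to output-pixel units only when shrinking.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a=3):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) inside |x| < a."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def resample_row(row, out_len, a=3):
    """Resample a 1-D list of pixel values to out_len samples."""
    in_len = len(row)
    scale = in_len / out_len
    support = max(scale, 1.0)  # stretch the kernel when shrinking
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5  # output pixel centre in input coordinates
        lo = math.floor(center - a * support)
        hi = math.ceil(center + a * support)
        weights, values = [], []
        for j in range(lo, hi + 1):
            w = lanczos((j - center) / support, a)
            if w != 0.0:
                weights.append(w)
                values.append(row[min(max(j, 0), in_len - 1)])  # clamp at the borders
        total = sum(weights)  # normalize so the weights add up to 1
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out

if __name__ == "__main__":
    print(resample_row([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0], 4))
```

For a 2-D image the same routine would be applied to every row and then to every column of the intermediate result. The weights depend only on the output index, so they can be computed once per row length and reused.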
Bicubic convolution is basically the same, with a different filter kernel function.
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in skia have a pretty well commented implementation of lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzalez and Woods is decent on it, but I'm away from my books and can't check.
Now on to the particulars. It should help to think about what you are doing fundamentally. You have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, let's imagine you want a new measurement in between every one that you already have (e.g. double the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How to do this? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points, you just draw a line between them, and read the new value off the line (in this case, at the half way point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
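As a small illustration of that idea, here is a Python sketch of bilinear sampling on a list-of-rows image. The function name and the edge clamping are choices made for this example, and coordinates are assumed to be non-negative.

```python
def bilinear(image, x, y):
    """Sample a 2-D grid (list of rows) at the fractional position (x, y)."""
    x0, y0 = int(x), int(y)                 # top-left neighbour (assumes x, y >= 0)
    x1 = min(x0 + 1, len(image[0]) - 1)     # clamp at the right edge
    y1 = min(y0 + 1, len(image) - 1)        # clamp at the bottom edge
    fx, fy = x - x0, y - y0                 # fractional offsets inside the cell
    # Interpolate left-right on the two rows, then up-down between the results.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

if __name__ == "__main__":
    grid = [[0.0, 10.0],
            [20.0, 30.0]]
    print(bilinear(grid, 0.5, 0.5))  # the centre of the four samples
```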
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points, you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade off on accuracy and computational cost is something called a cubic spline. So this will give you a smooth fit line, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it's just a multiplication, so we can implement it quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is one function (your image) integrated in product with another function, called the kernel, which typically has much smaller support (the part that is non-zero), after that kernel has been centered over that particular point. In the discrete world, these are just sums of the products.
It turns out that you can design a convolution kernel that has properties quite like the cubic spline, and use that to get a fast "bicubic" interpolation.
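One widely used kernel of this kind is the cubic convolution kernel with a free parameter a. The Python sketch below uses a = -0.5 (the Catmull-Rom choice) as an assumption; other implementations pick other values, and the function names are illustrative.

```python
def cubic_kernel(x, a=-0.5):
    """Piecewise-cubic convolution kernel with support |x| < 2."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x ** 3 - (a + 3.0) * x ** 2 + 1.0
    if x < 2.0:
        return a * x ** 3 - 5.0 * a * x ** 2 + 8.0 * a * x - 4.0 * a
    return 0.0

def cubic_interpolate(samples, t, a=-0.5):
    """Interpolate between samples[1] and samples[2] at fraction t in [0, 1],
    given four equally spaced samples at positions -1, 0, 1 and 2."""
    return sum(samples[k] * cubic_kernel(t - (k - 1), a) for k in range(4))

if __name__ == "__main__":
    print(cubic_interpolate([0.0, 1.0, 2.0, 3.0], 0.5))  # linear data stays linear
    print(cubic_interpolate([0.0, 0.0, 1.0, 1.0], 0.5))  # halfway across a step edge
```

Applying this along rows and then along columns uses a 4x4 neighborhood per output pixel, which is the "bicubic" case.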
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means they will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes have specialized assumptions which make them more efficient but less general.
I would like to suggest the following article for a basic understanding of different image interpolation methods: image interpolation via convolution. If you want to try more interpolation methods, imageresampler is a nice open source project to begin with.
In my opinion image interpolation can be understood from two aspects, one is from function fitting perspective, and one is from convolution perspective. For example, the spline interpolation explained in image interpolation via convolution is well explained from function fitting perspective in Cubic interpolation.
Additionally, image interpolation is always related to a specific application, for example image zooming, image rotation and so on. In fact, for a specific application, image interpolation can be implemented in a smart way. For example, image rotation can be implemented via a three-shearing method, and during each shearing operation different one-dimensional interpolation algorithms can be used.