How to normalize an image color? - algorithm

In their paper describing the Viola-Jones object detection framework ("Robust Real-Time Face Detection" by Viola and Jones), the authors state:
All example sub-windows used for training were variance normalized to
minimize the effect of different lighting conditions.
My question is "What kind of tool did they use to normalize the images?"
I'm NOT looking for the specific tool that Viola & Jones used, but for a similar one that produces almost the same output. I've been following a lot of Haar-training tutorials (trying to detect a hand) but have not yet been able to produce a good detector (xml).
I've tried contacting the authors, but have received no response so far.

One possible way is to apply plain and simple normalization to all elements, assuming a normal distribution.
First find the average (Mu) and standard deviation (S):
Mu = 1/N * Sum(a[i][j]) for each i,j
S = sqrt(1/(N-1) * Sum((a[i][j] - Mu)^2)) for each i,j
(here N is the number of pixels, 20*20 in the Viola-Jones case)
From this, we can normalize the value of each pixel using standard normal distribution formula (by standardizing all values):
a'[i][j] = (a[i][j] - Mu) / S
Another method is vector normalization, which basically says:
Find the length of the vector: |a| = sqrt(sum (a[i][j]*a[i][j])) for each i,j
Assign: a'[i][j] = a[i][j] / |a|
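For illustration, here is a minimal sketch of both methods in Python/NumPy; the 20x20 window size is taken from the Viola-Jones setup above, while the function names and the small epsilon guard are my own additions.
import numpy as np

def variance_normalize(window, eps=1e-8):
    # Standardize the sub-window: zero mean, unit standard deviation
    mu = window.mean()
    s = window.std(ddof=1)              # uses 1/(N-1), matching the formula above
    return (window - mu) / (s + eps)    # eps guards against flat windows

def vector_normalize(window, eps=1e-8):
    # Scale the sub-window so that its Euclidean norm is 1
    length = np.sqrt(np.sum(window.astype(np.float64) ** 2))
    return window / (length + eps)

# Example: a 20x20 grayscale sub-window, as in the Viola-Jones setup
window = np.random.randint(0, 256, (20, 20)).astype(np.float64)
print(variance_normalize(window).std(ddof=1))     # ~1.0
print(np.linalg.norm(vector_normalize(window)))   # ~1.0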


What is the algorithm behind Photoshop's Highlight or shadow alteration?

I want to write an image enhancement algorithm similar to Photoshop's highlights and shadows alteration feature. Can you help me understand what this feature of Photoshop does internally to an image?
Simple approach
To begin with, you can already find some clues in their documentation: https://helpx.adobe.com/photoshop/using/adjust-shadow-highlight-detail.html
It's quite hard to guess from those documents which algorithm they use exactly. Below I will only try to explain some approaches I would use if I were facing this problem. Don't expect a precise algorithm; use my answer as pointers to at least put you on a path.
As I understand it, this algorithm improves the contrast at a local scale, meaning that for each pixel it adjusts the value based on its neighborhood.
To do so you have several input parameters:
Neighborhood size (or Kernel)
Highlight Threshold: Everything above is considered as belonging to highlight
Shadow Threshold: Everything below is considered as belonging to shadow
Other parameters are mentioned in the documentation, but they are not needed to understand the algorithmic concept.
1. Determine to which category the pixel belongs: Highlight / Shadow / none.
For this part you might consider using either the grayscale image or the Value channel from an HSV transformation.
I would look at the pixel and its neighborhood.
Compute statistics of the local distribution (mean and variance).
Compare the mean to the threshold values defined previously, then use the variance to decide whether the pixel is noisy or belongs to a contour; in both of those cases I would expect a large variance. (A small sketch of this step follows below.)
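Here is a minimal sketch of this classification step in Python/NumPy, purely for illustration; the kernel size, thresholds, and function names are assumptions of mine, not Adobe's actual values.
import numpy as np
from scipy.ndimage import uniform_filter

def classify_pixels(value_channel, kernel=15, shadow_t=0.25, highlight_t=0.75):
    # value_channel: 2-D float array in [0, 1] (grayscale or HSV Value channel)
    # Returns labels (-1 = shadow, +1 = highlight, 0 = none) and the local variance
    local_mean = uniform_filter(value_channel, size=kernel)
    local_var = uniform_filter(value_channel ** 2, size=kernel) - local_mean ** 2
    labels = np.zeros(value_channel.shape, dtype=np.int8)
    labels[local_mean < shadow_t] = -1
    labels[local_mean > highlight_t] = 1
    return labels, local_var   # a large variance hints at noise or a contour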
2. Apply the processing
If the pixel belongs to the shadow or highlight class, you want to improve its contrast: not the "gray" contrast but the "color" contrast.
Dumb approach:
Weight your color channels according to their intra-variances.
Here is an example: consider a pixel (32, 35, 50) (R, G, B) that belongs to the shadow class. I would determine three coefficients Rc, Gc, Bc, each defined between 0.5 and 1.5 (arbitrary), which are applied to the respective channels.
Since blue is dominant, I would give the blue channel a high coefficient such as 1.3 and lower the importance of the R and G channels with a coefficient of about 0.8.
To compute these coefficients you can look at the color variance, meaning the differences between the color channels themselves and between each channel and the pixel mean. (See the sketch below.)
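And a rough sketch of the channel-weighting idea in Python; the 0.5-1.5 range comes from the example above, while the exact mapping from channel deviation to coefficient is purely my own assumption.
import numpy as np

def channel_coefficients(pixel, lo=0.5, hi=1.5):
    # Map each channel's deviation from the pixel mean to a weight in [lo, hi]
    pixel = np.asarray(pixel, dtype=np.float64)    # e.g. (32, 35, 50) for (R, G, B)
    deviation = pixel - pixel.mean()               # positive for dominant channels
    spread = max(np.abs(deviation).max(), 1e-8)    # avoid division by zero
    scale = min(hi - 1.0, 1.0 - lo)                # keep results inside [lo, hi]
    return 1.0 + deviation / spread * scale

print(channel_coefficients((32, 35, 50)))  # blue gets the largest coefficient, R and G ~0.8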
Other (high-level) approaches
Laplacian Pyramids
Use the pyramids to separate the details at different scales and the Laplacian to improve the contrast.
http://mcclanahoochie.com/blog/portfolio/opencl-image-pyramid-detail-enhancement/
https://www.darktable.org/2017/11/local-laplacian-pyramids/
Those links could be really helpful for you, especially because the sources are available and the concepts are well explained.
I would advise you to continue your quest to look deeper in darktable. It's a powerful free/open-source alternative to Lightroom.
I already found some interesting stuff just by looking at their blog.
Sorry for this incomplete answer; I'll probably come back here to improve it.
All comments and suggestions are more than welcome
You can use the following technique. It is not exact, but it imitates the effect well.
lumR = 0.299;
lumG = 0.587;
lumB = 0.114;
// we have to find luminance of the pixel
// here 0.0 <= source.r/source.g/source.b <= 1.0
// and 0.0 <= luminance <= 1.0
luminance = sqrt( lumR*pow(source.r,2.0) + lumG*pow(source.g,2.0) + lumB*pow(source.b,2.0));
// here highlights and shadows are our desired filter amounts
// highlights/shadows should be >= -1.0 and <= +1.0
// highlights = shadows = 0.0 by default
// you can change 0.05 and 8.0 according to your needs but okay for me
h = highlights * 0.05 * ( pow(8.0, luminance) - 1.0 );
s = shadows * 0.05 * ( pow(8.0, 1.0 - luminance) - 1.0 );
output.r = source.r + h + s;
output.g = source.g + h + s;
output.b = source.b + h + s;
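For reference, here is how that per-pixel formula might be applied to a whole image in Python/NumPy; the final clamp to [0, 1] is my own addition and not part of the snippet above.
import numpy as np

def highlights_shadows(img, highlights=0.0, shadows=0.0):
    # img: float RGB array with values in [0, 1]; highlights/shadows in [-1, +1]
    lum = np.sqrt(0.299 * img[..., 0] ** 2 +
                  0.587 * img[..., 1] ** 2 +
                  0.114 * img[..., 2] ** 2)
    h = highlights * 0.05 * (np.power(8.0, lum) - 1.0)
    s = shadows * 0.05 * (np.power(8.0, 1.0 - lum) - 1.0)
    out = img + (h + s)[..., None]       # the same offset is added to R, G and B
    return np.clip(out, 0.0, 1.0)        # keep the result displayable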

Accurate Approximations of Fisheye Lens Distortion via non-polynomial FET method?

I'm trying to approximate a fisheye lens distortion. I originally used the polynomial method described in this paper, and that worked fine for a forward transform, but I forgot that I would need some sort of interpolation, so a backward transform was needed and I would need an inverse function for this transformation, which proved problematic. (I used the non-alternating power-sign version, i.e. SUM( polynomial_coefficients[i] * radius^i ), so the division model didn't appear to be appropriate, and it would spit out bad results if I tried to use the non-alternating power version because I would be dividing by my radius.) I switched to what appears to be a more accurate method (correct me if I'm wrong and provide a more accurate method) via
r_distorted = scalar * ln(1 + lambda * r_undistorted)
and
r_undistorted = (e^(r_distorted/scalar) - 1)/lambda
which was featured in the same paper. In the source paper I didn't understand how you would ever end up with no distortion with lower values of lambda, or what the heck I was supposed to do with the scalar value. I wanted to test my code in situations where lens distortion was zero, but this formula does not seem to provide a way to set the parameters to values where the forward transform of (r_undistorted) = r_distorted, or the inverse transform of (r_distorted) = r_undistorted, for all r_undistorted and r_distorted. This was trivial, however, in the polynomial example.
Currently I have the algorithm implemented, but values of 0 for lambda and 1 for scale do not result in no distortion (indeed it's obvious to see why), since 1*ln(1 + 0*x) = 0. This source also alters the equation to be in terms of the distance from the image plane (f in the images) and tan(theta) instead, which leaves me even more confused. It would seem that there must be another variable implicitly involved in the equation that would allow such a transformation (no transform) to happen. It also appears unintuitive how to actually control distortion using these two equations.
In short, how do I use this equation to apply no distortion, and what do both lambda and scalar mean physically, and what do they do? Are there better methods for accuracy in approximating fisheye transform with inverse?
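To make the two formulas concrete, here is a minimal sketch in Python (function names are mine); it only demonstrates that the forward and inverse mappings quoted above round-trip, not how to obtain an identity (no-distortion) mapping, which is exactly what the question is about.
import math

def distort(r_undistorted, scalar, lam):
    # r_distorted = scalar * ln(1 + lambda * r_undistorted)
    return scalar * math.log(1.0 + lam * r_undistorted)

def undistort(r_distorted, scalar, lam):
    # r_undistorted = (e^(r_distorted / scalar) - 1) / lambda
    return (math.exp(r_distorted / scalar) - 1.0) / lam

r = 0.8
print(undistort(distort(r, scalar=1.0, lam=2.0), scalar=1.0, lam=2.0))  # ~0.8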

Path Tracing algorithm - Need help understanding key point

So the Wikipedia page for path tracing (http://en.wikipedia.org/wiki/Path_tracing) contains a naive implementation of the algorithm with the following explanation underneath:
"All these samples must then be averaged to obtain the output color. Note this method of always sampling a random ray in the normal's hemisphere only works well for perfectly diffuse surfaces. For other materials, one generally has to use importance-sampling, i.e. probabilistically select a new ray according to the BRDF's distribution. For instance, a perfectly specular (mirror) material would not work with the method above, as the probability of the new ray being the correct reflected ray - which is the only ray through which any radiance will be reflected - is zero. In these situations, one must divide the reflectance by the probability density function of the sampling scheme, as per Monte-Carlo integration (in the naive case above, there is no particular sampling scheme, so the PDF turns out to be 1)."
The part I'm having trouble understanding is the bolded part, about dividing the reflectance by the PDF. I am familiar with PDFs but I am not quite sure how they fit in here. If we stick to the mirror example, what would be the PDF value we divide by? Why? How would I go about finding the PDF value to divide by if I was using an arbitrary BRDF such as the Phong reflection model or the Cook-Torrance reflection model, etc.? Lastly, why do we divide by the PDF instead of multiplying? If we divide, don't we give more weight to a direction with a lower probability?
Let's assume that we have only materials without color (greyscale). Then, their BRDF at each point can be expressed as a single-valued function:
float BRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit);
Here, phi and theta are the azimuth and zenith angles of the two rays under consideration. For pure Lambertian reflection, this function would look like this:
float lambertBRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit)
{
return albedo*1/pi*cos(theta_out);
}
albedo ranges from 0 to 1 - this measures how much of the incoming light is reemitted. The factor 1/pi ensures that the integral of BRDF over all outgoing vectors does not exceed 1. With the naive approach of the Wikipedia article (http://en.wikipedia.org/wiki/Path_tracing), one can use this BRDF as follows:
Color TracePath(Ray r, depth) {
/* .... */
Ray newRay;
newRay.origin = r.pointWhereObjWasHit;
newRay.direction = RandomUnitVectorInHemisphereOf(normal(r.pointWhereObjWasHit));
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*lambertBRDF(r.phi,r.theta,newRay.phi,newRay.theta,r.pointWhereObjWasHit);
}
As mentioned in the article and by Ross, this random sampling is unfortunate because it traces incoming directions (newRay's) from which little light is reflected with the same probability as directions from which there is lots of light. Instead, directions from which much light is reflected to the observer should be selected preferentially, so that every direction contributes to the final color at an equal sample rate per unit of contribution. For that, one needs a way to generate random rays from a probability distribution. Let's say there exists a function that can do that; this function takes as input the desired PDF (which ideally should be equal to the BRDF) and the incoming ray:
vector RandomVectorWithPDF(function PDF(p_i,t_i,p_o,t_o,point x), Ray incoming)
{
// this function is responsible to create random Rays emanating from x
// with the probability distribution PDF. Depending on the complexity of PDF,
// this might be somewhat involved. It is possible, however, to do it for Lambertian
// reflection (how exactly is math, not programming):
vector randomVector;
if(PDF==lambertBRDF)
{
float phi = uniformRandomNumber(0,2*pi);
float rho = acos(sqrt(uniformRandomNumber(0,1))); // zenith angle for cosine-weighted sampling
// (the elevation angle would be pi/2 - rho, but it is not needed here)
randomVector = getVectorFromAzimuthZenithAndNormal(phi,rho,normal(incoming.whereObjectWasHit));
}
else
{
// deal with other PDFs
}
return randomVector;
}
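As a side note, here is a small self-contained sketch in Python (helper names are mine) of the cosine-weighted hemisphere sampling that the Lambertian branch above relies on; phi and rho are generated exactly as in the pseudocode.
import numpy as np

def cosine_weighted_direction(normal, rng=np.random.default_rng()):
    # Sample a direction around `normal` with pdf proportional to cos(theta)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    rho = np.arccos(np.sqrt(rng.uniform(0.0, 1.0)))   # zenith angle, as above
    local = np.array([np.sin(rho) * np.cos(phi),
                      np.sin(rho) * np.sin(phi),
                      np.cos(rho)])
    # Build an orthonormal basis (t, b, n) with n = normal
    n = np.asarray(normal, dtype=np.float64)
    n = n / np.linalg.norm(n)
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(n, helper)
    t = t / np.linalg.norm(t)
    b = np.cross(n, t)
    return local[0] * t + local[1] * b + local[2] * n   # always in n's hemisphere

print(cosine_weighted_direction([0.0, 0.0, 1.0]))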
The code in the TracePath routine would then simply look like this:
newRay.direction = RandomVectorWithPDF(lambertBDRF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected;
Because the bright directions are preferred in the choice of samples, you do not have to weight them again by applying the BRDF as a scaling factor to reflected. However, if the PDF and the BRDF are different for some reason, you would have to scale down the output whenever PDF > BRDF (if you picked too many samples from the respective direction) and enhance it when you picked too few.
In code:
newRay.direction = RandomVectorWithPDF(PDF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*BRDF(...)/PDF(...);
The output is best, however, if BRDF/PDF is equal to 1.
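To see why one divides by the PDF, here is a tiny one-dimensional Monte Carlo example in Python (unrelated to rendering, purely illustrative): both estimators converge to the same integral, and the importance-sampled one only does so because each sample is divided by the density it was drawn from.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2                 # integrand on [0, 1]; exact integral is 1/3

# Uniform sampling: pdf(x) = 1, so no correction term is needed
x = rng.uniform(0.0, 1.0, 100000)
print(np.mean(f(x)))                 # ~0.333

# Importance sampling with pdf(x) = 2x (draw x = sqrt(u)); divide each sample by its pdf
x = np.sqrt(rng.uniform(0.0, 1.0, 100000))
print(np.mean(f(x) / (2.0 * x)))     # ~0.333 again, with lower variance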
The question remains: why can't one always choose the perfect PDF, which is exactly equal to the BRDF? First, some random distributions are harder to compute than others. For example, if there were a slight variation in the albedo parameter, the algorithm would still do much better with the non-naive sampling than with uniform sampling, but the correction term BRDF/PDF would be needed for the slight variations. Sometimes it might even be impossible to do at all. Imagine a colored object with different reflective behavior for red, green and blue: you could either render in three passes, one for each color, or use an average PDF which fits all color components approximately, but none perfectly.
How would one go about implementing something like Phong shading? For simplicity, I still assume that there is only one color component, and that the ratio of diffuse to specular reflection is 60% / 40% (the notion of ambient light makes no sense in path tracing). Then my code would look like this:
if(uniformRandomNumber(0,1)<0.6) //diffuse reflection
{
newRay.direction=RandomVectorWithPDF(lambertBRDF,r);
reflected = TracePath(newRay,depth+1)/0.6;
}
else //specular reflection
{
newRay.direction=RandomVectorWithPDF(specularPDF,r);
reflected = TracePath(newRay,depth+1)*specularBRDF/specularPDF/0.4;
}
return emittance + reflected;
Here specularPDF is a distribution with a narrow peak around the reflected ray (theta_in=theta_out, phi_in=phi_out+pi) for which a way to create random vectors is available, and specularBRDF returns the specular intensity from Phong's model (http://en.wikipedia.org/wiki/Phong_reflection_model).
Note how the PDFs are modified by 0.6 and 0.4 respectively.
I'm by no means an expert in ray tracing, but this seems to be classic Monte Carlo:
You have lots of possible rays, and you choose one uniformly at random and then average over lots of trials.
The distribution you used to choose one of the rays was uniform (they were all equally as likely)
so you don't have to do any clever re-normalising.
However, perhaps there are lots of possible rays to choose from, but only a few would lead to useful results. We therefore bias towards picking those 'useful' possibilities with higher probability, and then re-normalise (we are not choosing the rays uniformly any more, so we can't just take the average). This is
importance sampling.
The mirror example seems to be the following: only one possible ray will give a useful result.
If we choose a ray at random then the probability we hit that useful ray is zero: this is a property
of conditional probability on continuous spaces (it's not actually continuous, it's implicitly discretised
by your computer, so it's not quite true...): the probability of hitting something specific when there are infinitely many things must be zero.
Thus we are re-normalising by something with probability zero - standard conditional probability definitions
break when we consider events with probability zero, and that is where the problem would come from.

Scaling Laplacian of Gaussian Edge Detection

I am using Laplacian of Gaussian for edge detection using a combination of what is described in http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm and http://wwwmath.tau.ac.il/~turkel/notes/Maini.pdf
Simply put, I'm using this equation :
double[][] kernel = new double[kernelSize][kernelSize];
for(int i = -(kernelSize/2); i<=(kernelSize/2); i++)
{
    for(int j = -(kernelSize/2); j<=(kernelSize/2); j++)
    {
        // LoG(i,j) = -1/(pi*sigma^4) * (1 - (i^2+j^2)/(2*sigma^2)) * exp(-(i^2+j^2)/(2*sigma^2))
        double L_xy = -1/(Math.PI * Math.pow(sigma,4))*(1 - ((Math.pow(i,2) + Math.pow(j,2))/(2*Math.pow(sigma,2))))*Math.exp(-((Math.pow(i,2) + Math.pow(j,2))/(2*Math.pow(sigma,2))));
        L_xy *= 426.3;
        kernel[i + kernelSize/2][j + kernelSize/2] = L_xy;
    }
}
and storing the L_xy values to build the LoG kernel.
The problem is that when the image is larger, applying the same kernel makes the filter more sensitive to noise. The edge sharpness is also not the same.
Let me put an example here...
Suppose we've got this image:
Using a value of sigma = 0.9 and a 5 x 5 kernel on a 480 × 264 pixel version of this image, we get the following output:
However, if we use the same values on a 1920 × 1080 pixels version of this image (same sigma value and kernel size), we get something like this:
[Both the images are scaled down version of an even larger image. The scaling down was done using a photo editor, which means the data contained in the images are not exactly similar. But, at least, they should be very near.]
Given that the larger image is roughly 4 times the size of the smaller one, I also tried scaling the sigma by a factor of 4 (sigma *= 4), and the output was, you guessed it, a black canvas.
Could you please help me understand how to implement a LoG edge detector that finds the same features in an input signal even if the signal is scaled up or down (the scaling factor will be given)?
Looking at your images, I suppose you are working in 24-bit RGB. When you increase your sigma, the response of your filter weakens accordingly, so what you get in the larger image with a larger kernel are values close to zero, which are either truncated or so close to zero that your display cannot distinguish them.
To make differentials across different scales comparable, you should use the scale-normalized (gamma-normalized) differential operator (Lindeberg et al.):
LoG_norm(x, y; \sigma) = \sigma^{\gamma} * (L_xx(x, y; \sigma) + L_yy(x, y; \sigma)), where L = G_{\sigma} * I is the Gaussian-smoothed input image I.
Essentially, the differential operators are applied to the Gaussian kernel function (G_{\sigma}) and the result (or alternatively the convolution kernel; it is just a scalar multiplier anyway) is scaled by \sigma^{\gamma}. Here L is the scale-space representation of the input image and LoG is the Laplacian-of-Gaussian image.
When the order of the differential is 2, \gamma is typically set to 2.
Then you should get quite similar magnitude in both images.
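A minimal sketch of that normalization in Python/SciPy (the function name is mine; gamma = 2 as stated above); with it, the same structure produces responses of comparable magnitude when both the image and sigma are scaled together.
import numpy as np
from scipy.ndimage import gaussian_laplace

def scale_normalized_log(image, sigma, gamma=2.0):
    # Laplacian of Gaussian scaled by sigma**gamma (Lindeberg's gamma-normalization)
    return (sigma ** gamma) * gaussian_laplace(image.astype(np.float64), sigma)

# If the image is resampled by a factor k and sigma is multiplied by k as well,
# the normalized response of the same structure stays roughly constant.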
Sources:
[1] Lindeberg: "Scale-space theory in computer vision" 1993
[2] Frangi et al. "Multiscale vessel enhancement filtering" 1998

How can I choose an image with higher contrast in PHP?

For a thumbnail-engine I would like to develop an algorithm that takes x random thumbnails (crop, no resize) from an image, analyzes them for contrast and chooses the one with the highest contrast. I'm working with PHP and Imagick but I would be glad for some general tips about how to compute contrast of imagery.
It seems that many things are easier than computing contrast, for example counting colors, computing luminosity, etc.
What are your experiences with the analysis of picture material?
I'd do it that way (pseudocode):
L[256] = {0,0,0...}
loop over each pixel:
luminance = avg(R,G,B)
increment L[luminance] by 1
for i = 0 to 255:
if L[i] < C: L[i] = 0 // C = threshold of your choice
find index of first and last non-zero value of L[]
contrast = last - first
In looking for the image "with the highest contrast," you will need to be very careful in how you define contrast for the image. In the simplest way, contrast is the difference between the lowest intensity and the highest intensity in the image. That is not going to be very useful in your case.
I suggest you use a histogram approach to describe the contrast of a given image and then compare the properties of the histograms to determine the image with the highest contrast as you define it. You could use a variety of well known containers to represent the histogram in code, or construct a class to meet your specific needs. (I am not implying that you need to create a histogram in the form of a chart – just a statistical representation of the intensity values.) You could use the variance of each histogram directly as a measure of contrast, or use the standard deviation if that is easier to work with.
The key really lies in how you define the contrast of the image. In general, I would define a high contrast image as one with values present for all, or nearly all, the possible values. And I would further add that in this definition of a high contrast image, the intensity values of the image will tend to be distributed across the range of possible values in a uniform way.
Using this approach, a low contrast image would tend to have relatively few discrete intensity values and they would tend to be closely grouped together rather than uniformly distributed. (As a general rule, they will also tend to be grouped toward the center of the range.)
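Though the question is about PHP/Imagick, here is a short illustrative sketch of both ideas in Python/NumPy (the bin threshold C and the use of the standard deviation as a contrast measure come from the two answers above; everything else is an assumption):
import numpy as np

def contrast_scores(gray, threshold=5):
    # gray: 2-D uint8 array (a grayscale thumbnail)
    hist = np.bincount(gray.ravel(), minlength=256)
    hist[hist < threshold] = 0               # drop sparsely populated bins (C above)
    nonzero = np.nonzero(hist)[0]
    spread = int(nonzero[-1] - nonzero[0]) if nonzero.size else 0
    return spread, float(gray.std())         # range-based and deviation-based measures

# Usage: compute the scores for each candidate crop and keep the crop with the highest one.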
