Alpha Compositing Algorithm (Blend Modes)

I'm trying to implement blend modes from the PDF specification, for my own pleasure, in SASS.
PDF Specification:
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
Page 322 is the alpha compositing section.
The input values are in RGBA format, but now I'm trying to implement blending of the alpha component. How exactly do I go about doing it? From what I gather, my values should be between 0.0 and 1.0; that's done. However, from the spec it seems that you should blend each color channel separately. Do I just average the results to get back a single alpha component for the RGBA form?
Any help is appreciated. I don't mind reading blogs, books, etc. to get my answer, as this is purely an intellectual exercise.
Thanks in advance,
Emil

The SVG spec has a lot of good equations for various blending modes. And yes, you do have to calculate both the new alpha and the new colour -- for each channel. For standard blending modes, the alpha is calculated this way:
alpha_final = alpha_bg + alpha_fg - alpha_bg * alpha_fg
Note: I see you're considering alpha to be between 0 and 1, which is good. Alpha values in CSS are always defined as float values from 0 to 1; it's good to stick with this convention, because it makes the calculations immensely easier.
It helps to 'premultiply' each colour channel by its alpha; the premultiplied values are easier to interpret and to plug into the usual formulae:
colour_bg_a = colour_bg * alpha_bg
In other words:
red_bg_a = red_bg * alpha_bg
green_bg_a = green_bg * alpha_bg
blue_bg_a = blue_bg * alpha_bg
Then, for plain-jane alpha compositing (like overlaying sheets of tracing paper, also known as src-over in Porter and Duff's original paper and the SVG alpha compositing spec), you take each channel and calculate it thus:
colour_final_a = colour_fg_a + colour_bg_a * (1 - alpha_fg)
The last step is to 'un-multiply' each final colour channel value by the final alpha:
colour_final = colour_final_a / alpha_final
and put it into your mixin somehow:
rgba(red_final, green_final, blue_final, alpha_final)
The other blending modes (multiply, difference, screen, etc) are slightly more complicated formulas, but the concept for every single mode is the same:
1. Separate the R, G, B, and A values of both the foreground and background colours
2. Calculate the alpha of the new, final colour with the above formula
3. Pre-multiply all the R, G, and B values by their alpha value
4. Calculate the new, final R, G, and B values (insert blending mode formula here)
5. Un-multiply the final R, G, and B values by the final alpha
6. Clip the final R, G, and B values so that they are between 0 and 255 (necessary for some modes, but not all)
7. Put the colour back together again!
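The steps above can be sketched in Python for the plain src-over case. This is only an illustration: the function name src_over, the (r, g, b, a) tuple layout, and the guard for a fully transparent result are assumptions of mine, not part of any spec or of the Stylus code linked below.

```python
def src_over(fg, bg):
    """Composite fg over bg. Colours are (r, g, b, a): r, g, b in 0-255, a in 0.0-1.0."""
    alpha_fg, alpha_bg = fg[3], bg[3]
    # Alpha of the final colour.
    alpha_final = alpha_bg + alpha_fg - alpha_bg * alpha_fg
    if alpha_final == 0:
        # Both inputs are fully transparent; avoid dividing by zero below.
        return (0, 0, 0, 0.0)
    channels = []
    for c_fg, c_bg in zip(fg[:3], bg[:3]):
        # Premultiply, working in 0-1 floats.
        c_fg_a = (c_fg / 255) * alpha_fg
        c_bg_a = (c_bg / 255) * alpha_bg
        # Blending mode formula (src-over here).
        c_final_a = c_fg_a + c_bg_a * (1 - alpha_fg)
        # Un-multiply by the final alpha.
        c_final = c_final_a / alpha_final
        # Clip, then convert back to 0-255.
        channels.append(round(min(max(c_final, 0.0), 1.0) * 255))
    return (*channels, alpha_final)


# Half-transparent red over opaque blue gives an opaque purple.
print(src_over((255, 0, 0, 0.5), (0, 0, 255, 1.0)))
```

For another blending mode, only the line marked as the blending mode formula should need to change.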
If you're still interested in this, I've been doing the very thing in Stylus. You can see my progress here: https://github.com/pdaoust/stylus-helpers/blob/master/blend.styl You might be able to use it as a starting point for your own Sass mixin.
The first thing I do is convert all the R, G, and B values from 0 - 255 values to 0 - 1 float values for the purposes of the calculations. I don't know if that's necessary, and it does require converting them back to 0 - 255 values. It felt right to me, and Porter and Duff worked in 0 - 1 float values in their original paper.
(I'm encountering trouble with some of the compositing modes, which produce wildly different results from the expected results that the SVG spec pictures. I suspect that the spec gives the wrong equations. If anyone knows about Porter/Duff blending modes, I'd be very grateful for their help!)

Related

What is the algorithm behind Photoshop's Highlight or shadow alteration?

I want to write an image enhancement algorithm similar to Photoshop's highlight and shadow adjustment feature. Can you help me understand what this feature does internally to an image?
Simple approach
To begin with, you can already find some clues in their documentation: https://helpx.adobe.com/photoshop/using/adjust-shadow-highlight-detail.html
It's quite hard to guess from those documents exactly which algorithm they use. Below I will only explain some approaches I would use if I were facing this problem. Don't expect a clear algorithm here; use my answer as pointers to get you started on a path.
As I understand it, this algorithm improves contrast at a local scale, meaning that each pixel's value is adjusted based on its neighborhood.
To do so you have several input parameters:
Neighborhood size (or Kernel)
Highlight Threshold: Everything above is considered as belonging to highlight
Shadow Threshold: Everything below is considered as belonging to shadow
Other ones are mentioned in the documentation, but they are not useful to understand the algorithmic concept.
1. Determine which category the pixel belongs to: highlight / shadow / none.
For this part you might consider using either the grayscale image or the Value channel from an HSV transformation.
I would look at the pixel and its neighborhood.
Compute statistics of the local distribution (mean and variance).
I would compare the mean to the threshold values defined previously, then use the variance to detect whether the pixel is noisy or belongs to a contour; in both of those cases I would expect a large variance.
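A minimal sketch of that classification step in plain Python, assuming a grayscale image stored as a list of rows with values in 0-255. The function name classify_pixel and its parameters are hypothetical; the variance is simply returned so the caller can decide how to treat noisy or contour pixels.

```python
def classify_pixel(gray, row, col, radius, shadow_thr, highlight_thr):
    """Classify one pixel from the statistics of its square neighbourhood."""
    rows, cols = len(gray), len(gray[0])
    # Collect the neighbourhood, cropped at the image borders.
    window = [
        gray[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    # Compare the local mean against the two thresholds.
    if mean < shadow_thr:
        label = "shadow"
    elif mean > highlight_thr:
        label = "highlight"
    else:
        label = "none"
    return label, mean, variance


dark_patch = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
print(classify_pixel(dark_patch, 1, 1, radius=1, shadow_thr=50, highlight_thr=200))
```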
2. Apply the processing
If the pixel belongs to the shadow or highlight class, you want to improve its contrast: not the "gray" contrast but the "color" contrast.
Dumb approach:
Weight the color channels according to their intra-variances.
Here is an example: consider a pixel (32, 35, 50) (R, G, B) that belongs to the shadow class. I would determine three coefficients Rc, Gc, Bc, each between 0.5 and 1.5 (an arbitrary range), which are applied to the respective channels.
Since blue is dominant, I would use a high coefficient for blue, such as 1.3, and lower the importance of the R and G channels with a coefficient of about 0.8.
To compute these coefficients you can look at the color variance, meaning the differences between the color channels themselves and the differences between each channel and the pixel mean.
Other (high-level) approaches
Laplacian Pyramids
Use the pyramid to separate the details at different scales and the Laplacian to improve the contrast.
http://mcclanahoochie.com/blog/portfolio/opencl-image-pyramid-detail-enhancement/
https://www.darktable.org/2017/11/local-laplacian-pyramids/
Those links could be really helpful for you, especially because the sources are available and the concepts are well explained.
I would advise you to continue your quest by looking deeper into darktable. It's a powerful free/open-source alternative to Lightroom.
I already found some interesting material just by looking at their blog.
Sorry for this incomplete answer, I'll probably come back there to improve it.
All comments and suggestions are more than welcome
You can use the following technique. It is not accurate, but it imitates the effect well.
lumR = 0.299;
lumG = 0.587;
lumB = 0.114;
// we have to find luminance of the pixel
// here 0.0 <= source.r/source.g/source.b <= 1.0
// and 0.0 <= luminance <= 1.0
luminance = sqrt( lumR*pow(source.r,2.0) + lumG*pow(source.g,2.0) + lumB*pow(source.b,2.0));
// here highlights and shadows are our desired filter amounts
// highlights/shadows should be >= -1.0 and <= +1.0
// highlights = shadows = 0.0 by default
// you can change 0.05 and 8.0 according to your needs, but these values work okay for me
h = highlights * 0.05 * ( pow(8.0, luminance) - 1.0 );
s = shadows * 0.05 * ( pow(8.0, 1.0 - luminance) - 1.0 );
output.r = source.r + h + s;
output.g = source.g + h + s;
output.b = source.b + h + s;
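For trying the formula outside a shader, here is a direct Python port of the snippet above. The function name adjust_highlights_shadows is made up for this sketch, and the final clamp to the 0.0-1.0 range is an addition that the snippet itself does not include.

```python
import math


def adjust_highlights_shadows(rgb, highlights=0.0, shadows=0.0):
    """rgb is an (r, g, b) tuple of floats in 0.0-1.0; amounts are in -1.0..+1.0."""
    r, g, b = rgb
    # Perceived luminance of the pixel, also in 0.0-1.0.
    luminance = math.sqrt(0.299 * r ** 2 + 0.587 * g ** 2 + 0.114 * b ** 2)
    # The highlight offset grows with luminance, the shadow offset with darkness.
    h = highlights * 0.05 * (8.0 ** luminance - 1.0)
    s = shadows * 0.05 * (8.0 ** (1.0 - luminance) - 1.0)
    # Clamping is an assumption added here so the result stays displayable.
    return tuple(min(max(c + h + s, 0.0), 1.0) for c in (r, g, b))


# Lifting the shadows brightens a dark pixel much more than a bright one.
print(adjust_highlights_shadows((0.1, 0.1, 0.1), shadows=1.0))
print(adjust_highlights_shadows((0.9, 0.9, 0.9), shadows=1.0))
```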

Summed area table in GLSL and GPU fragment shader execution

I am trying to compute the integral image (aka summed area table) of a texture I have in the GPU memory (a camera capture), the goal being to compute the adaptive threshold of said image. I'm using OpenGL ES 2.0, and still learning :).
I did a test with a simple gaussian blur shader (vertical/horizontal pass), which is working fine, but I need a way bigger variable average area for it to give satisfactory results.
I did implement a version of that algorithm on CPU before, but I'm a bit confused on how to implement that on a GPU.
I tried to do a (completely incorrect) test with just something like this for every fragment :
#version 100
#extension GL_OES_EGL_image_external : require
precision highp float;
uniform sampler2D u_Texture; // The input texture.
varying lowp vec2 v_TexCoordinate; // Interpolated texture coordinate per fragment.
uniform vec2 u_PixelDelta; // Pixel delta
void main()
{
    // get neighboring pixels values
    float center = texture2D(u_Texture, v_TexCoordinate).r;
    float a = texture2D(u_Texture, v_TexCoordinate + vec2(u_PixelDelta.x * -1.0, 0.0)).r;
    float b = texture2D(u_Texture, v_TexCoordinate + vec2(0.0, u_PixelDelta.y * 1.0)).r;
    float c = texture2D(u_Texture, v_TexCoordinate + vec2(u_PixelDelta.x * -1.0, u_PixelDelta.y * 1.0)).r;
    // compute value
    float pixValue = center + a + b - c;
    // Result stores value (R) and original gray value (G)
    gl_FragColor = vec4(pixValue, center, center, 1.0);
}
And then another shader to get the area that I want and then get the average. This is obviously wrong, as there are multiple execution units operating at the same time.
I know that the common way of computing a prefix sum on a GPU is to do it in two passes (vertical/horizontal, as discussed here on this thread or here), but isn't there a problem here, as there is a data dependency of each cell on the previous (top or left) one?
I can't seem to understand the order in which the multiple execution units on a GPU will process the different fragments, and how a two-pass filter can solve that issue. As an example, if I have some values like this :
2 1 5
0 3 2
4 4 7
The two passes should give (first columns, then rows):
2 1 5       2  3  8
2 4 7   ->  2  6 13
6 8 14      6 14 28
How can I be sure that, as an example, the value [0;2] will be computed as 6 (2 + 4) and not 4 (0 + 4, if the 0 hasn't been computed yet) ?
Also, as I understand that fragments are not pixels (If I'm not mistaken), would the values I store back in one of my texture in the first pass be the same in another pass if I use the exact same coordinates passed from the vertex shader, or will they be interpolated in some way ?
Tommy and Bartvbl address your questions about a summed-area table, but your core problem of an adaptive threshold may not need that.
As part of my open source GPUImage framework, I've done some experimentation with optimizing blurs over large radii using OpenGL ES. Generally, increasing blur radii leads to a significant increase in texture sampling and calculations per pixel, with an accompanying slowdown.
However, I found that for most blur operations you can apply a surprisingly effective optimization to cap the number of blur samples. If you downsample the image before blurring, blur at a smaller pixel radius (radius / downsampling factor), and then linearly upsample, you can arrive at a blurred image that is the equivalent of one blurred at a much larger pixel radius. In my tests, these downsampled, blurred, and then upsampled images look almost identical to the ones blurred based on the original image resolution. In fact, precision limits can lead to larger-radii blurs done at a native resolution breaking down in image quality past a certain size, where the downsampled ones maintain the proper image quality.
By adjusting the downsampling factor to keep the downsampled blur radius constant, you can achieve near constant-time blurring speeds in the face of increasing blur radii. For an adaptive threshold, the image quality should be good enough to use for your comparisons.
I use this approach in the Gaussian and box blurs within the latest version of the above-linked framework, so if you're running on Mac, iOS, or Linux, you can evaluate the results by trying out one of the sample applications. I have an adaptive threshold operation based on a box blur that uses this optimization, so you can see if the results there are what you want.
As per the above, it's not going to be fantastic on a GPU. But assuming the cost of shunting data between the GPU and CPU is more troubling, it may still be worth persevering.
The most obvious prima facie solution is to split horizontal/vertical as discussed. Use an additive blending mode, create a quad that draws the whole source image, then e.g. for the horizontal step on a bitmap of width n issue a call that requests the quad be drawn n times, the 0th time at x = 0, the mth time at x = m. Then ping-pong via an FBO, switching the target buffer of the horizontal draw into the source texture for the vertical.
Memory accesses are probably O(n^2) (i.e. you'll probably cache quite well, but that's hardly a complete relief) so it's a fairly poor solution. You could improve it by divide and conquer by doing the same thing in bands — e.g. for the vertical step, independently sum individual rows of 8, after which the error in every row below the final is the failure to include whatever the sums are on that row. So perform a second pass to propagate those.
However an issue with accumulating in the frame buffer is clamping to avoid overflow — if you're expecting a value greater than 255 anywhere in the integral image then you're out of luck because the additive blending will clamp and GL_RG32I et al don't reach ES prior to 3.0.
The best solution I can think of to that, without using any vendor-specific extensions, is to split up the bits of your source image and combine channels after the fact. Supposing your source image were 4 bit and your image less than 256 pixels in both directions, you'd put one bit each in the R, G, B and A channels, perform the normal additive step, then run a quick recombine shader as value = A + (B*2) + (G*4) + (R*8). If your texture is larger or smaller in size or bit depth then scale up or down accordingly.
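The bit-splitting trick can be checked on the CPU with a small Python sketch. It computes the running sums of one row, one bit plane at a time, and recombines them with the same weights as the formula above. The function name and the list-based layout are assumptions made for illustration; on the GPU each plane would live in its own colour channel.

```python
def row_prefix_sums_by_bit_plane(row, bits=4):
    """Inclusive prefix sums of a row of small integers, one bit plane at a time."""
    # Plane b holds bit b of every value (plane 0 plays the role of A, plane 3 of R).
    planes = [[(value >> b) & 1 for value in row] for b in range(bits)]
    plane_sums = []
    for plane in planes:
        running, sums = 0, []
        for bit in plane:
            # Each partial sum is at most len(row), so it stays below 256
            # as long as the row has fewer than 256 pixels.
            running += bit
            sums.append(running)
        plane_sums.append(sums)
    # Recombine: value = A + (B*2) + (G*4) + (R*8), i.e. weight plane b by 2**b.
    return [
        sum(plane_sums[b][i] << b for b in range(bits))
        for i in range(len(row))
    ]


print(row_prefix_sums_by_bit_plane([15, 3, 8, 1]))  # same as the ordinary running sums
```

This works because the prefix sum is linear: summing each bit plane separately and weighting afterwards gives the same totals as summing the original values.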
(platform specific observation: if you're on iOS then you've hopefully already got a CVOpenGLESTextureCache in the loop, which means you have CPU and GPU access to the same texture store, so you might well prefer to kick this step off to GCD. iOS is amongst the platforms supporting EXT_shader_framebuffer_fetch; if you have access to that then you can write any old blend function you like and at least ditch the combination step. Also you're guaranteed that preceding geometry has completed before you draw so if each strip writes its totals where it should and also to the line below then you can perform the ideal two-pixel-strips solution with no intermediate buffers or state changes)
What you attempt to do cannot be done in a fragment shader. GPUs are by nature very different from CPUs in that they execute their instructions in parallel, in massive numbers at the same time. Because of this, OpenGL does not make any guarantees about execution order, because the hardware physically doesn't allow it to.
So there is not really any defined order other than "whatever the GPU thread block scheduler decides".
Fragments are pixels, sorta-kinda. They are pixels that potentially end up on screen. If one triangle ends up in front of another, the previously calculated colour value is discarded. This happens regardless of whatever colour was stored at that pixel in the colour buffer previously.
As for creating the summed area table on the GPU, I think you may first want to look at GLSL "Compute Shaders", which are specifically made for this sort of thing.
I think you may be able to get this to work by creating a single thread for each row of pixels in the table, then have every thread "lag behind" by 1 pixel compared to the previous row.
In pseudocode:
int row_id = thread_id()
for column_index in (image.cols + image.rows):
    int my_current_column_id = column_index - row_id
    if my_current_column_id >= 0 and my_current_column_id < image.cols:
        // calculate sums
The catch of this method is that all threads should be guaranteed to execute their instructions simultaneously without getting ahead of one another. This is guaranteed in CUDA, but I'm not sure whether it is in OpenGL compute shaders. It may be a starting point for you, though.
It may look surprising to a beginner, but the prefix sum or SAT calculation is well suited to parallelization. While the Hensley algorithm is the most intuitive to understand (and has been implemented in OpenGL), more work-efficient parallel methods are available; see CUDA scan. The paper from Sengupta discusses a parallel method with reduce and down-sweep phases, which seems to be the state of the art in efficiency. These are valuable materials, but they do not go into OpenGL shader implementations in detail. The closest document is the presentation you have found (it refers to the Hensley publication), since it has some shader snippets. This job is doable entirely in a fragment shader with FBO ping-pong. Note that the FBO and its texture need to have their internal format set to high precision; GL_RGB32F would be best, but I am not sure whether it is supported in OpenGL ES 2.0.
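To see why ping-ponging removes the ordering problem, here is a CPU simulation in Python of a recursive-doubling scan, which is how I understand the Hensley-style approach. It is a sketch, not shader code: every function name is made up, and each list comprehension stands in for one full-screen pass in which a fragment reads only from the previous pass's texture.

```python
def scan_rows_recursive_doubling(image):
    """Inclusive prefix sum along each row in about log2(width) passes."""
    src = [list(row) for row in image]
    width = len(src[0])
    offset = 1
    while offset < width:
        # Each output value reads only from src (the previous pass), never
        # from dst, so the order in which "fragments" run does not matter.
        dst = [
            [row[x] + (row[x - offset] if x >= offset else 0) for x in range(width)]
            for row in src
        ]
        src = dst  # ping-pong: this pass's target is the next pass's source
        offset *= 2
    return src


def transpose(image):
    return [list(column) for column in zip(*image)]


def summed_area_table(image):
    """Scan along rows, then along columns (done as rows of the transpose)."""
    rows_done = scan_rows_recursive_doubling(image)
    return transpose(scan_rows_recursive_doubling(transpose(rows_done)))


# The 3x3 example from the question.
print(summed_area_table([[2, 1, 5], [0, 3, 2], [4, 4, 7]]))
```

The cost is more total additions than a sequential scan, which is why the work-efficient methods mentioned above exist.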

Zero out pixels that aren't at a particular intensity of a single colour

I'm having a little bit of difficulty wrapping my head around the correct terminology to use in phrasing my question, so I'll just take a stab at it and perhaps I can get some help in clarifying it along the way toward a solution.
I want to detect some coloured lights in an image, so I need a way to:
a) determine the colour of pixels
b) determine how "intense" or "bright" they are
c) use the two values above as a threshold or criteria for whether or not to discard a given pixel
I figured brightness alone will probably not be a good way to do this, since there will be non-zero ambient light.
Thanks!
EDIT: So using MATLAB's colour thresholder, I was able to isolate the coloured lights by restricting the hue range in HSV space. Just trying to figure out a way to do this via the command line.
Well, there are two separate steps: 1 is finding out what you want to isolate, and 2 is the isolation itself.
1) Seems like you've got this figured out. But for the future, you can use the "imtool" command. It is nearly the same as imshow, but it allows you to inspect pixel values (RGB; you would convert these to HSV using rgb2hsv), crop images, zoom, measure distances, etc. It can be really helpful.
imtool(my_im)
will open up the window, pretty simple.
2) Now that you have your values, you want to isolate them. The term you are looking for is masking. A mask is typically a binary matrix/vector with 1's (true) corresponding to areas of interest and 0's (false) elsewhere. MATLAB calls these "logical" arrays. So let's just say you found that your areas of interest were as follows:
hue=0.2 to 0.3, saturation=don't care, brightness= greater than .5
you would create your mask by doing binary comparisons on the pixels. I will split this into three steps just so you can make sense of everything.
%% MASKING STEPS
%assumes my_im is the RGB image that is already loaded
my_hsv_im = rgb2hsv(my_im);
hue_idx = 1; sat_idx = 2; bright_idx = 3;
hue_mask = ((my_hsv_im(:,:,hue_idx ) > 0.2) & (my_hsv_im(:,:,hue_idx ) < 0.3));
%note we have no saturation mask, because it would be filled with ones
%since we dont care about the saturation values
brightness_mask = (my_hsv_im(:,:,bright_idx ) > 0.5);
total_mask = hue_mask & brightness_mask;
%% ALL THE REST
%now we mask your image, recall that 1's are areas of interest and 0's are
%nothing, so just multiply your image by your mask
%the mask is a logical array of size MxNx1, we need to convert it to the same
%type and size as our image in order to multiply them
mask_3d(:,:,hue_idx) = total_mask;
mask_3d(:,:,sat_idx) = total_mask;
mask_3d(:,:,bright_idx) = total_mask;
mask_3d = uint8(mask_3d); %this step is pretty important, if your image
%is a double use double(mask_3d) instead
masked_rgb_im = my_im .* mask_3d;
%does some plotting just for fun
figure(10);
subplot(2,3,1);imshow(my_im);title('original image');
subplot(2,3,2);imshow(hue_mask);title('hue mask');
subplot(2,3,3);imshow(brightness_mask);title('bright mask');
subplot(2,3,4);imshow(total_mask);title('total mask');
subplot(2,3,5:6);imshow(masked_rgb_im );title('masked image');
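For readers without MATLAB, the same masking idea can be sketched with Python's standard colorsys module. The function name, the flat list of (r, g, b) tuples, and the default thresholds (taken from the example values above) are assumptions for illustration only.

```python
import colorsys


def mask_by_hue_and_value(pixels, hue_range=(0.2, 0.3), min_value=0.5):
    """Zero out 0-255 RGB pixels whose hue or brightness is outside the limits."""
    masked = []
    for r, g, b in pixels:
        # colorsys works on floats in 0-1 and returns hue in 0-1 as well.
        h, _s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Saturation is ignored, as in the "don't care" case above.
        keep = hue_range[0] < h < hue_range[1] and v > min_value
        masked.append((r, g, b) if keep else (0, 0, 0))
    return masked


# A bright yellow-green pixel is kept; red and dark green are zeroed out.
print(mask_by_hue_and_value([(128, 255, 0), (255, 0, 0), (32, 64, 0)]))
```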

Detect black dots from color background

My short question
How to detect the black dots in the following images? (I paste only one test image to make the question look compact. More images can be found →here←).
My long question
As shown above, the background color is roughly blue, and the dots are "black". If I pick one black pixel and measure its color in RGB, the value can be (0, 44, 65) or (14, 69, 89)... Therefore, we cannot set a fixed range to tell whether a pixel is part of a black dot or the background.
I tested 10 images of different colors, but I hope to find a method that detects the black dots against more complicated backgrounds, which may be made up of three or more colors, as long as human eyes can identify the black dots easily. Some extremely small or blurry dots can be omitted.
Previous work
Last month I asked a similar question on Stack Overflow but did not get a perfect solution, though there were some excellent answers. You can find more details about my work there if you are interested.
Here are the methods I have tried:
Converting to grayscale or the brightness of the image. The difficulty is that I cannot find an adaptive threshold for binarization. Obviously, turning a color image into grayscale or using the brightness (HSV) loses much useful information. The Otsu algorithm, which calculates an adaptive threshold, does not work either.
Calculating RGB histogram. In my last question, natan's method is to estimate the black color by histogram. It is time-saving, but the adaptive threshold is also a problem.
Clustering. I have tried k-means clustering and found it quite effective for backgrounds that have only one color. The shortcoming (see my own answer) is that I need to set the number of cluster centers in advance, but I don't know what the background will be. What's more, it is too slow! My application does real-time capture on iPhone, and it currently processes 7~8 frames per second using k-means (20 FPS would be good, I think).
Summary
I think not only similar colors but also adjacent pixels should be "clustered" or "merged" in order to extract the black dots. Please point me to a proper way to solve my problem. Any advice or algorithm will be appreciated. There is no free lunch, but I hope for a better trade-off between cost and accuracy.
I was able to get some pretty nice first pass results by converting to HSV color space with rgb2hsv, then using the Image Processing Toolbox functions imopen and imregionalmin on the value channel:
rgb = imread('6abIc.jpg');
hsv = rgb2hsv(rgb);
openimg = imopen(hsv(:, :, 3), strel('disk', 11));
mask = imregionalmin(openimg);
imshow(rgb);
hold on;
[r, c] = find(mask);
plot(c, r, 'r.');
And the resulting images (for the image in the question and one chosen from your link):
You can see a few false positives and missed dots, as well as some dots that are labeled with multiple points, but a few refinements (such as modifying the structure element used in the opening step) could clean these up some.
I was curious to test my old 2D peak finder code on the images without any threshold or any color considerations. Really crude, don't you think?
im0=imread('Snap10.jpg');
im=(abs(255-im0));
d=rgb2gray(im);
filter=fspecial('gaussian',16,3.5);
p=FastPeakFind(d,0,filter);
imagesc(im0); hold on
plot(p(1:2:end),p(2:2:end),'r.')
The code I'm using is a simple 2D local maxima finder, there are some false positives, but all in all this captures most of the points with no duplication. The filter I was using was a 2d gaussian of width and std similar to a typical blob (the best would have been to get a matched filter for your problem).
A more sophisticated version that does treat the colors (rgb2hsv?) could improve this further...
Here is an extraordinarily simplified version that can be extended to full RGB and that does not use the image processing library. Basically, you do a 2-D convolution with a filter image (an example of the dot you are looking for); the points where the convolution returns the highest values are the best matches for the dots. You can then, of course, threshold that. Here is a simple binary image example of just that.
%creating a dummy image with a bunch of small white crosses
im = zeros(100,100);
numPoints = 10;
% randomly chose the location to put those crosses
points = randperm(numel(im));
% keep only certain number of points
points = points(1:numPoints);
% get the row and columns (x,y)
[xVals,yVals] = ind2sub(size(im),points);
for ii = 1:numel(points)
    x = xVals(ii);
    y = yVals(ii);
    try
        % create the crosses, try statement is here to prevent index out of bounds
        % not necessarily the best practice but whatever, it is only for demonstration
        im(x,y) = 1;
        im(x+1,y) = 1;
        im(x-1,y) = 1;
        im(x,y+1) = 1;
        im(x,y-1) = 1;
    catch err
    end
end
% display the randomly generated image
imshow(im)
% create a simple cross filter
filter = [0,1,0;1,1,1;0,1,0];
figure; imshow(filter)
% perform convolution of the random image with the cross template
result = conv2(im,filter,'same');
% get the number of white pixels in filter
filSum = sum(filter(:));
% look for all points in the convolution results that matched identically to the filter
matches = find(result == filSum);
%validate all points found
sort(matches(:)) == sort(points(:))
% get x and y coordinate matches
[xMatch,yMatch] = ind2sub(size(im),matches);
I would highly suggest looking at the conv2 documentation on MATLAB's website.
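If you want to see what the 'same' convolution is doing under the hood, here is a small pure-Python sketch of the matching step. It uses cross-correlation, which gives the same result as convolution for a symmetric template like the cross; the function names are made up, and nothing here is optimized.

```python
def correlate_same(image, kernel):
    """2-D cross-correlation with zero padding; output has the same size as image."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i, kernel_row in enumerate(kernel):
                for j, weight in enumerate(kernel_row):
                    rr, cc = r + i - kr, c + j - kc
                    # Pixels outside the image count as zero.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += weight * image[rr][cc]
            out[r][c] = total
    return out


def find_matches(image, kernel):
    """Positions where a binary image matches the binary template exactly."""
    target = sum(sum(row) for row in kernel)
    response = correlate_same(image, kernel)
    return [(r, c) for r, row in enumerate(response)
            for c, value in enumerate(row) if value == target]


cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
image = [[0] * 7 for _ in range(7)]
for r, c in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[r][c] = 1  # draw one cross centred at (3, 3)
print(find_matches(image, cross))
```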

gamma correction formula : .^(gamma) or .^(1/gamma)?

I'm looking for a simple gamma correction formula for grayscale images with values between 0 and 255.
Let's say that the gamma of my screen is 2.2 (it's an LCD screen so I would probably need to estimate it with a more complicated procedure, but let's assume my screen is behaving nicely).
Which one of the following formulas would be the correct one?
Corrected = 255 * (Image/255).^2.2
OR
Corrected = 255 * (Image/255).^(1/2.2)
(Those are destined to be MATLAB codes but I hope they are understandable even to non-MATLAB people)
I've been looking around on the Internet but found both formulas going around. I suspect (2) is the right one, and my confusion is due to the tendency to call "gamma value" the inverse of the actual gamma value, but I would really appreciate some feedback by people who know what they are talking about...
Both formulas are used, one to encode gamma, and one to decode gamma.
Gamma encoding is used to increase the quality of shadow values when an image is stored as integer intensity values, so to do gamma encoding you use the formula:
encoded = ((original / 255) ^ (1 / gamma)) * 255
Gamma decoding is used to restore the original values, so the formula for that is:
original = ((encoded / 255) ^ gamma) * 255
If the monitor does the gamma decoding, you would want to use the first formula to encode the image data.
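A quick Python sketch of the two formulas for a single grayscale value in 0-255, to show that they undo each other. The function names and the default gamma of 2.2 are just choices for this example.

```python
def gamma_encode(value, gamma=2.2):
    """Apply the 1/gamma power: brightens mid-tones before display or storage."""
    return 255 * (value / 255) ** (1 / gamma)


def gamma_decode(value, gamma=2.2):
    """Apply the gamma power: what a monitor with this gamma effectively does."""
    return 255 * (value / 255) ** gamma


encoded = gamma_encode(64)
print(encoded)                # brighter than 64
print(gamma_decode(encoded))  # back to (approximately) 64
```

Black (0) and white (255) are unchanged by both functions; only the values in between move.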
Gamma correction controls the overall brightness of an image. Images which are not corrected can look either bleached out or too dark. Suppose a computer monitor has a 2.2 power function as its intensity to voltage response curve. This just means that if you send a message to the monitor that a certain pixel should have intensity equal to x, it will actually display a pixel which has intensity equal to x^2.2. Because the range of voltages sent to the monitor is between 0 and 1, this means that the intensity value displayed will be less than what you wanted it to be. Such a monitor is said to have a gamma of 2.2.
So in your case,
Corrected = 255 * (Image/255).^(1/2.2)

Resources