Why does not using premultiplied alpha have "significantly worse performance"?

QPainter is responsible for drawing and compositing in Qt. There is a section in the documentation that talks about performance. My question is regarding the bolded sentence from the following paragraph.
Raster - This backend implements all rendering in pure software and is always used to render into QImages. For optimal performance only use the format types QImage::Format_ARGB32_Premultiplied, QImage::Format_RGB32 or QImage::Format_RGB16. Any other format, including QImage::Format_ARGB32, has significantly worse performance. This engine is used by default for QWidget and QPixmap.
I understand that multiplying the color channels by the alpha is something that is done in the source-over operation. This multiplication can be done ahead of time to avoid doing it in the compositor. Performing it means multiplying the RGB channels by the alpha and then dividing by 255 (or multiplying by some magic number that overflows in the right way to mimic the division). That's six integer multiplications per pixel. Surely an extra six integer multiplications per pixel does not amount to "significantly worse performance"?
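For concreteness, the per-pixel arithmetic I mean looks roughly like this (a sketch assuming 8-bit channels packed as 0xAARRGGBB, using a plain division instead of the magic-number trick and ignoring rounding):

#include <cstdint>

// Premultiply one pixel: scale R, G and B by alpha/255.
inline std::uint32_t premultiply(std::uint32_t argb)
{
    const std::uint32_t a = argb >> 24;
    const std::uint32_t r = ((argb >> 16) & 0xff) * a / 255;
    const std::uint32_t g = ((argb >> 8)  & 0xff) * a / 255;
    const std::uint32_t b = ( argb        & 0xff) * a / 255;
    return (a << 24) | (r << 16) | (g << 8) | b;
}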
Is the alpha multiplication really that slow? Perhaps they are merely stating that they don't try to optimize that code path as much as the other so there are no guarantees as to how it performs?

Please have a look at the detailed explanation here: https://pspdfkit.com/blog/2016/a-curious-case-of-android-alpha/
Of course, it does not refer directly to Qt, but it explains why premultiplied bitmaps make sense.

It makes some sense in your case, since I presume that some widget paints the image, and the assumption is that it may paint it more than once. In any case, the widget, during painting, will premultiply the alpha. So you might as well be very explicit about it - after all, image format conversions are one-liners, so it's not as if you had to write a page of code to handle it. So:
class MyViewer : public QWidget {
    Q_OBJECT
    QImage m_image;
public:
    Q_SLOT void setImage(const QImage &image) {
        m_image = image.convertToFormat(QImage::Format_ARGB32_Premultiplied);
        update();
    }
    ...
};

Related

Image comparison algorithm for finding the most similar image in a set

Is there an algorithm that can take one image as the first input and a list of images as the second input and tell which one is the most similar? In our problem we can have images which are the same but look different due to watermarks. So we would need to identify matching images even when the watermarks are different.
Are neural networks used for this? Is there a particular algorithm?
Keypoint extraction and matching is one of the solutions for this problem.
Use a feature detector such as SIFT, SURF, FAST, ... to extract keypoints from the original image and from the other images. These days the SIFT detector has become popular and has been improved considerably because of its accuracy and efficiency.
The features are then matched, and the matches can be verified with RANSAC or a similar robust estimator.
The matching image can be identified by the number of true (inlier) matching points.
The keypoint algorithms are available in OpenCV:
API: http://docs.opencv.org/2.4/modules/refman.html
Example: http://docs.opencv.org/2.4/doc/tutorials/features2d/table_of_content_features2d/table_of_content_features2d.html#table-of-content-feature2d
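As an illustration of the idea (not taken from the links above), here is a minimal sketch that uses a recent OpenCV and ORB in place of the patented SIFT/SURF; the file names and the distance threshold are made up:

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>

#include <iostream>
#include <vector>

int main()
{
    // Hypothetical file names.
    cv::Mat query = cv::imread("query.png", cv::IMREAD_GRAYSCALE);
    cv::Mat candidate = cv::imread("candidate.png", cv::IMREAD_GRAYSCALE);

    // Detect keypoints and compute binary descriptors.
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(query, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(candidate, cv::noArray(), kp2, desc2);

    // Match descriptors and count "good" matches below a distance cutoff.
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    int good = 0;
    for (const cv::DMatch &m : matches)
        if (m.distance < 40)   // threshold chosen arbitrarily for illustration
            ++good;

    // Repeated for every candidate, the image with the most good matches wins.
    std::cout << "good matches: " << good << "\n";
}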
One possible approach is perceptual hashing, which is quite robust and can be tuned. Survey paper link
Of course this could also be done with Deep Learning, but this will need much more work. Slides
You always need to define your problem setting. Are the results pixelwise equal (then even classic hashing on raw pixels works)? Are there different compressions (robustness needed)? Are there color transformations, or even geometric transformations (like rotations) or size changes?
The classic perceptual-hashing algorithms are quite robust, even for moderate transformations.
Some time ago I implemented a simple example of a perceptual frame hash (based on pixel statistics, not features). I would try something like this first before "going deeper" (CNNs) :-)
There is also the question of the query process. While I don't remember the exact properties of my hash (a metric for bit errors is probably enough), the deep-learning approach in the slides tries to preserve similarities, so that you can get distances and a ranking while querying.
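As a tiny illustration of that bit-error metric, assuming 64-bit hashes:

#include <bitset>
#include <cstdint>

// Count the bits in which two hashes differ; smaller means more similar.
int hammingDistance(std::uint64_t a, std::uint64_t b)
{
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}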
If you want to deal with the watermark attack, I recommend image hash algorithms; they are fast and robust against this kind of attack. Let me show you an example.
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>

#include <iostream>
#include <string>
#include <vector>

void watermark_attack(cv::Ptr<cv::img_hash::ImgHashBase> algo)
{
    std::vector<std::string> const origin_img
    {
        "origin_00.png", "origin_01.png",
        "origin_02.png", "origin_03.png"
    };
    std::vector<std::string> const watermark_img
    {
        "watermark_00.png", "watermark_01.png",
        "watermark_02.png", "watermark_03.png"
    };

    cv::Mat origin_hash, watermark_hash;
    for(size_t i = 0; i != origin_img.size(); ++i){
        cv::Mat const input = cv::imread(origin_img[i]);
        cv::Mat const watermark_input = cv::imread(watermark_img[i]);

        //compute the hash value of the image without the watermark
        algo->compute(input, origin_hash);
        //compute the hash value of the image with the watermark
        algo->compute(watermark_input, watermark_hash);
        //compare the hash values of the original image and the watermarked image
        std::cout<<algo->compare(origin_hash, watermark_hash)<<std::endl;
    }
}

int main()
{
    using namespace cv::img_hash;

    //disabling OpenCL acceleration may (or may not) speed up img_hash
    cv::ocl::setUseOpenCL(false);
    watermark_attack(AverageHash::create());
}
The results are 1, 2, 1, 2; all pass.
This small program compares each original image with its watermarked sibling; the smaller the value returned by compare, the more similar the images are. For AverageHash the recommended threshold is 5 (that is, if the compare result is greater than 5, the images are considered very different).
AverageHash has another useful "side effect": it is robust not only to watermarks but also to contrast changes, noise (Gaussian, salt-and-pepper), resizing and JPEG compression attacks.
Another benefit of image hashes is that you can store the hash values of your images in a file, so you do not need to compute them again and again.
Simple and fast method to compare images for similarity shows more details about the img_hash module of OpenCV.
The simplest approach would be registration (alignment); ECC registration is available for both Matlab and OpenCV, and you could devise your own scheme on top of it.
A better approach would be to use the MSER or FAST features available in Matlab. Their documentation covers exactly what you are asking for.
PS: There is a cascade trainer built into Matlab 2015b which does just what you are asking: it takes a reference image, asks for background images without the reference, and you end up with your own cascade classifier, ready to use for classifying images at your leisure.

iOS: Optimised algorithm for blurring bitmap

Following from this question, iOS / GLES2: How to achieve Glow Effect, I'm investigating making my own blurring routine.
Maybe something along the lines of:
void blur8bitGreyscaleBitmap(int resX, int resY, int passes, char *src, char *dest)
{
    ...
}
And then filling it in with something that takes each pixel in turn and diffuses it into its neighbours would create a subtle blur, and iterating this process several times would let the blur diffuse outwards.
Is there a better method than this?
Also, this looks like just the sort of task that could be made to run 20x faster with good (maybe NEON) optimisation.
I am looking for alternate techniques, code, links.
After doing a bit of research, I discovered the following:
It is okay to blur horizontally, then vertically, because the blur kernel is separable. This means that if you are blurring five pixels to each side of your target pixel, that is 11 + 11 operations per pixel instead of 11 * 11.
The most basic filter is the box blur, which simply averages all of the pixels in the box. This would be the choice for real-time blur on mobile devices, and it can be optimised heavily: if the first pixel requires A+B+C+D+E, then for the next one we can simply subtract A and add F, i.e. we don't have to redo all of those additions (see the sketch after this list).
A Gaussian blur (http://en.wikipedia.org/wiki/Gaussian_blur) gives better results.
The common technique is to do the work on the graphics chip using GLES2 shaders, e.g. http://www.gamerendering.com/2008/10/11/gaussian-blur-filter-shader/
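As a rough sketch of that sliding-window box blur, here is a horizontal pass over one 8-bit greyscale row (the function name and layout are mine, and edge handling is skipped for brevity):

#include <cstdint>

// Horizontal box blur of one row with the given radius (radius 5 => 11 taps).
// Pixels closer than `radius` to either edge are left untouched for brevity.
void boxBlurRow(const std::uint8_t *src, std::uint8_t *dst, int width, int radius)
{
    const int taps = 2 * radius + 1;

    int sum = 0;
    for (int x = 0; x < taps; ++x)              // fill the initial window
        sum += src[x];

    for (int x = radius; x < width - radius; ++x) {
        dst[x] = static_cast<std::uint8_t>(sum / taps);
        if (x + radius + 1 < width)
            sum += src[x + radius + 1] - src[x - radius];   // slide: add F, subtract A
    }
}
// Run this over every row, then the same routine over every column (or over a
// transposed copy), and repeat both passes for a stronger blur.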
I'm kind of curious whether a similar level of optimisation could be reached using the Accelerate framework.
I'm still curious whether there is any existing NEON code to do this; my guess is that even this would not improve on doing the work on the graphics chip, so no one has bothered.

Advice for classifying symbols/images

I am working on a project that requires classification of characters and symbols (basically OCR that needs to handle single ASCII characters and symbols such as music notation). I am working with vector graphics (Paths and Glyphs in WPF), so the images can be of any resolution and rotation will be negligible. It will need to classify (and probably learn from) fonts and paths not in a training set. Performance is important, though high accuracy takes priority.
I have looked at some examples of image detection using Emgu CV (a .Net wrapper of OpenCV). However examples and tutorials I find seem to deal specifically with image detection and not classification. I don't need to find instances of an image within a larger image, just determine the kind of symbol in an image.
There seems to be a wide range of methods to choose from which might work and I'm not sure where to start. Any advice or useful links would be greatly appreciated.
You should probably look at the paper Gradient-Based Learning Applied to Document Recognition, although that refers to handwritten letters and digits. You should also read about Shape Context by Belongie and Malik. The keyword you should be looking for is digit/character/shape recognition (not detection, not classification).
If you are using EmguCV, the SURF features example (StopSign detector) would be a good place to start. Another (possibly complementary) approach would be to use the MatchTemplate(..) method.
"However examples and tutorials I find seem to deal specifically with image detection and not classification. I don't need to find instances of an image within a larger image, just determine the kind of symbol in an image."
By finding instances of a symbol in an image, you are in effect classifying it. I'm not sure why you think that is not what you need.
Image<Gray, float> imgMatch = imgSource.MatchTemplate(imgTemplate, Emgu.CV.CvEnum.TM_TYPE.CV_TM_CCOEFF_NORMED);
double[] min, max;
Point[] pointMin, pointMax;
imgMatch.MinMax(out min, out max, out pointMin, out pointMax);

//max[0] is the score
if (max[0] >= (double) myThreshold)
{
    Rectangle rect = new Rectangle(pointMax[0], new Size(imgTemplate.Width, imgTemplate.Height));
    imgSource.Draw(rect, new Bgr(Color.Aquamarine), 1);
}
That max[0] gives the score of the best match.
Put all your images down into some standard resolution (appropriately scaled and centered).
Break the canvas down into n square or rectangular blocks.
For each block, you can measure the number of black pixels or the ratio between black and white in that block and treat that as a feature.
Now that you can represent the image as a vector of features (each feature originating from a different block), you could use a lot of standard classification algorithms to predict what class the image belongs to.
Google 'viola jones' for more elaborate methods of this type.
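To make the block-feature idea concrete, here is a rough sketch (the function name, the image layout and the grid size are assumptions for illustration, not part of the answer above):

#include <cstdint>
#include <vector>

// Divide a binarized glyph (0 = white, 1 = black) into an n x n grid and use
// the approximate ink ratio of each cell as one feature.
std::vector<float> blockFeatures(const std::vector<std::uint8_t> &pixels,
                                 int width, int height, int n)
{
    std::vector<float> features(n * n, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int bx = x * n / width;     // grid cell this pixel falls in
            const int by = y * n / height;
            features[by * n + bx] += pixels[y * width + x];
        }
    }
    const float cellArea = float(width) * float(height) / float(n * n);
    for (float &f : features)
        f /= cellArea;                        // black-to-area ratio per cell
    return features;
}
// The resulting n*n vector can be fed to any standard classifier (k-NN, SVM, ...).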

Where can I find a good read about bicubic interpolation and Lanczos resampling?

I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operating principle of both algorithms is pretty simple: they are both convolution filters. For each output value, a convolution filter moves the convolution function's point of origin so that it is centered on the output, multiplies every input value by the value of the convolution function at that location, and adds the results together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, the integral corresponds to average brightness, and if you want the brightness to remain the same, the integral of the convolution function needs to add up to one.
One way how to understand them is to think of the convolution function as something that shows how much input pixels influence the output pixel depending on their distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
For Lanczos interpolation the convolution function is based on the normalized sinc function, sinc(x) = sin(pi*x)/(pi*x), windowed so that only the first few lobes are kept. Usually 3:
lanczos3(x) = {
    0                      if abs(x) >= 3,
    1                      if x == 0,
    sinc(x) * sinc(x/3)    otherwise
}
This function is called the filter kernel.
To resample with Lanczos, imagine you overlay the output and the input over each other, with points marking where the pixel locations are. For each output pixel location you take a box of +-3 output pixels around that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, using the distance from the output location (in output pixel coordinates) as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value by the corresponding scaling value and add the results together to get the value of the output pixel.
Because the Lanczos kernel is separable and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately, and by precalculating the vertical filters for each row and the horizontal filters for each column.
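To make that concrete, here is a rough sketch of one-dimensional Lanczos-3 resampling of a row of samples; the names are mine, and proper downscaling would also widen the kernel support, which this sketch omits:

#include <algorithm>
#include <cmath>
#include <vector>

// Lanczos-3 kernel: sinc(x) * sinc(x/3) inside |x| < 3, zero outside.
double lanczos3(double x)
{
    const double pi = 3.14159265358979323846;
    if (x == 0.0) return 1.0;
    if (std::abs(x) >= 3.0) return 0.0;
    const double px = pi * x;
    return 3.0 * std::sin(px) * std::sin(px / 3.0) / (px * px);
}

std::vector<double> resampleRow(const std::vector<double> &src, int dstLen)
{
    std::vector<double> dst(dstLen);
    const double scale = double(src.size()) / dstLen;
    for (int i = 0; i < dstLen; ++i) {
        const double center = (i + 0.5) * scale - 0.5;   // position in source coordinates
        double sum = 0.0, weightSum = 0.0;
        for (int j = int(std::floor(center)) - 2; j <= int(std::floor(center)) + 3; ++j) {
            const double w = lanczos3(center - j);
            const int k = std::min(std::max(j, 0), int(src.size()) - 1);  // clamp at edges
            sum += w * src[k];
            weightSum += w;
        }
        dst[i] = sum / weightSum;   // normalize the weights so they add up to 1
    }
    return dst;
}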
Bicubic convolution is basically the same, with a different filter kernel function.
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in Skia have a pretty well commented implementation of Lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzalez and Woods is decent on it, but I'm away from my books and can't check.
Now on to the particulars. It should help to think about what you are doing fundamentally: you have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, let's imagine you want a new measurement in between every one that you already have (e.g. doubling the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How to do this? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points, you just draw a line between them, and read the new value off the line (in this case, at the half way point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points: you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade-off between accuracy and computational cost is something called a cubic spline. This gives you a smooth fit line, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it is just a multiplication, so it can be implemented quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is one function (your image) integrated against another function, typically with much smaller support (the part that is non-zero), called the kernel, after that kernel has been centered over that particular point. In the discrete world, these are just sums of products.
It turns out that you can design a convolution kernel that has properties quite like the cubic spline, and use that to get a fast "bicubic" interpolation.
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means it will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes have specialized assumptions which make them more efficient but less general.
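As a small illustration of the cubic-kernel idea, here is the Catmull-Rom form of cubic interpolation of a value at fractional position t between samples p1 and p2, using the two outer neighbours p0 and p3. This is one common choice of cubic, not necessarily the one any particular library uses:

// Catmull-Rom cubic interpolation between p1 and p2, with t in [0, 1].
double catmullRom(double p0, double p1, double p2, double p3, double t)
{
    return 0.5 * ((2.0 * p1) +
                  (-p0 + p2) * t +
                  (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t +
                  (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t);
}
// For "bicubic" image interpolation this is applied along one axis for four
// neighbouring rows (or columns), and then once more across those four results.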
I would like to suggest the following article for a basic understanding of different image interpolation methods: image interpolation via convolution. If you want to try more interpolation methods, the imageresampler is a nice open source project to begin with.
In my opinion, image interpolation can be understood from two perspectives: function fitting and convolution. For example, the spline interpolation covered in image interpolation via convolution is explained well from the function-fitting perspective in Cubic interpolation.
Additionally, image interpolation is always tied to a specific application, for example image zooming or image rotation, and for a specific application it can be implemented in a smart way. For example, image rotation can be implemented via a three-shear method, and during each shearing operation a different one-dimensional interpolation algorithm can be used.

Perceptual Image Downsampling

So here is my problem:
I have an image, that image is large (high resolution) and it needs to be small (much lower resolution).
So I do the naive thing (kill every other pixel) and the result looks poor.
So I try to do something more intelligent (low pass filtering using a Fourier transform and re-sampling in Fourier space) and the result is a little better but still fairly poor.
So my question, is there a perceptually motivated image down-sampling algorithm (or implementation)?
edit:
While I am aware of a number of resampling techniques, my application is more concerned with preserving the perceptual features, rather than producing smooth images.
edit2: it is safe to assume I have some level of familiarity with digital signal processing, convolutions, wavelet transforms, etc
Read this:
http://www.dspguide.com/
OK, that's quite a read. But understanding filter design would be handy.
In general, the process for scaling an image from W1 x H1 to W2 x H2 where W1, W2, H1, H2 are integers, is to find new W3, H3 so that W1 and W2 are integer factors of W3 and H1 and H2 are integer factors of H3, and then pad the original image with zeros (used to space the pixels of the original image) so that it's now W3 x H3 in size. This introduces high frequencies due to discontinuities in the image, so you apply a low-pass filter to the image, and then decimate the filtered image to its new size (W2 x H2). Sounds like you might be trying to do this already, but the filtering can be done in the time domain so that the Fourier transform isn't really necessary.
In practice, the process I just described is optimized. You'll note that when applying a convolution filter to the upscaled image, most of the terms are 0, so you can avoid most of the multiplication operations; and since you end up throwing away many of the filtered results, you don't need to calculate those at all. So you end up with a handful of multiplications and additions for each pixel in the target image; the trick is to figure out which coefficients to use.
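As a minimal illustration of "filter, then decimate" for the special case of an integer factor, the sketch below averages each N x N block of the source into one destination pixel. That is a plain box filter; a better kernel gives better results, as discussed elsewhere in this thread:

#include <cstdint>
#include <vector>

// Downscale an 8-bit greyscale image by an integer factor n by averaging
// each n x n block (box filter + decimation). Width and height are assumed
// to be multiples of n for brevity.
std::vector<std::uint8_t> downsampleByN(const std::vector<std::uint8_t> &src,
                                        int width, int height, int n)
{
    const int dw = width / n, dh = height / n;
    std::vector<std::uint8_t> dst(dw * dh);
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            int sum = 0;
            for (int dy = 0; dy < n; ++dy)
                for (int dx = 0; dx < n; ++dx)
                    sum += src[(y * n + dy) * width + (x * n + dx)];
            dst[y * dw + x] = static_cast<std::uint8_t>(sum / (n * n));
        }
    }
    return dst;
}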
libswscale in the ffmpeg project does something like this, I believe. Check it out:
http://gitorious.org/libswscale
As others pointed out, (and you apparently noticed) decimating the image introduces aliasing artifacts. I can't be sure about your resampling implementation, but the technique has interesting gotchas depending on the window size you use and other implementation details.
Bicubic interpolation is generally regarded as good enough, but there is no perfect solution; it depends on the viewer and on the properties of the picture being resampled.
Related links:
I didn't even know that sharpness was also called acutance.
Aliasing is a problem that can occur when downsampling naively.
Pascal is right. Depends on the image, and on what you want. Some factors:
preserving sharp edges
preserving colours
algorithm speed
Nearest neighbour: this is your method (dropping pixels).
Some others:
Lanczos resampling
Bicubic interpolation
Spline interpolation
Note that sometimes resampling down can get you a sharper result than, say, using a lower resolution camera, because there will be edges in the high-resolution image that cannot be detected by a lower-res device.
Side note: Many algorithms (especially Nearest Neighbour) can be optimised if you are scaling down by an integer (e.g. dividing by 4 or 6).
Recommended ImageMagick "general purpose" downsampling methods are discussed here: http://www.imagemagick.org/Usage/filter/nicolas/#downsample
You could try a content aware resizing algorithm. See: http://www.seamcarving.com/
Paint Mono (an open-source fork of Paint.NET) implements a supersampling algorithm for image downsampling here: http://code.google.com/p/paint-mono/source/browse/trunk/src/PdnLib/Surface.cs?spec=svn59&r=59#1313

Resources