Can anyone clarify if the GDI StretchBlt function for the workstation Win32 API performs bilinear interpolation for scaling to both larger and smaller images for 24/32-bit color images? And if not, is there a GDI (not GDI+) function that does this?
The SetStretchBltMode fn has a setting HALFTONE which is documented as follows:
HALFTONE
Maps pixels from the source rectangle into blocks of pixels in the destination rectangle. The average color over the destination block of pixels approximates the color of the source pixels.
I've seen references (see follow-up to first answer) that this performs bilinear interpolation when scaling down an image, but no clear answer of what happens when scaling up.
I have noticed that the Windows Mobile CE SDK does support a BILINEAR flag - which is documented exactly opposite of the HALFTONE comments (only works for scaling up).
Note that for the scope of this question, I'm not interested in pursuing GDI+ (which has numerous interpolation options), OpenGL, DirectX, etc. as alternatives, so please don't bother with follow-ups regarding these other APIs or alternate image libraries.
What I'm really hoping to find is some definitive MS/MSDN or other high-quality documentation that clearly documents this behavior of the Win32 (desktop) GDI behavior.
Meanwhile, I'll try some experiments comparing GDI vs. Direct2D (which does have an explicit flag to control this) and post my findings.
Thanks!
I've been looking into this same problem for the past couple of weeks.
As far as I can tell, there does not exist any definitive documentation on this behaviour from Microsoft.
However, I've run some tests myself, to try and establish the degree to which StretchBlt can be trusted to perform consistently with respect to up- and down-scaling images in halftone mode.
My findings are:
1) StretchBlt does produce adequate quality up- and down-scaled images. It might be a touch below Photoshop quality, but probably OK for most practical purposes.
2) It seems to depend upon hardware acceleration, whenever it's available. I haven't been able to confirm this, but I have a slight fear that this may lead to different outputs on different types of hardware. However, on the 5 or 6 different systems I've tried it on, old and new, the performance has been consistent and fast.
3) If you use the call on a 16-bit color device, or lower, StretchBlt will automatically dither your image. If you run it on a 24-bit color device, it will not dither.
4) If you use it to scale small images (smaller than 150x150px), it will randomly fall back to nearest neighbour interpolation. This can be remedied in your own software, by padding the bitmap before scaling, doing StretchBlt on it, and then removing the padding afterwards. Kind of a hack, but it works.
HALFTONE mode performs a very blocky halftone dithering on the image, based on varying the conversion thresholds over a defined square. I have never seen a situation where it would be considered the best choice.
COLORONCOLOR is the best mode for color images, but as you've seen it doesn't give great results.
GDI does not support a bilinear mode (except in Windows Mobile CE as you discovered). The naive implementation of bilinear does not do very well when shrinking an image, as it simply tries to interpolate between two adjacent input pixels without trying to draw from a larger area.
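To make the point about naive bilinear filtering concrete, here is a minimal sketch in plain Python. It is not how GDI or any Windows API implements scaling; `bilinear_sample` and `resize_bilinear` are hypothetical helper names, and the image is assumed to be a grayscale list of rows.

```python
def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at fractional coordinates."""
    h, w = len(img), len(img[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate horizontally on the two rows, then vertically between them.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def resize_bilinear(img, new_w, new_h):
    """Resize by sampling the source at each destination pixel center."""
    h, w = len(img), len(img[0])
    out = []
    for j in range(new_h):
        # Map the destination pixel center back into source coordinates.
        sy = min(max((j + 0.5) * h / new_h - 0.5, 0.0), h - 1)
        row = []
        for i in range(new_w):
            sx = min(max((i + 0.5) * w / new_w - 0.5, 0.0), w - 1)
            row.append(bilinear_sample(img, sx, sy))
        out.append(row)
    return out


# Scaling up: values between the two source pixels are interpolated.
upscaled = resize_bilinear([[0, 100]], 4, 1)

# Scaling down 8 -> 2: each output pixel only looks at two neighbours,
# so the bright pixels at index 0 and 4 are never sampled at all.
shrunk = resize_bilinear([[255, 0, 0, 0, 255, 0, 0, 0]], 2, 1)
```

In the shrinking case an area average would give a dim grey for both output pixels, while the two-pixel interpolation returns pure black, which is the weakness described above.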
I picked up working on updating the classic GNOME Clearlooks theme (originally GTK2), featured prominently in Fedora 14, for GTK3 by forking the outdated Clearlooks-Phenix project.
I've never worked on any GTK3 theme before, so I came in with some false assumptions, namely that clipping rules for the CSS stylesheets would be consistent with how a browser handles them.
One of these assumptions led me to file a bug report which was closed with the response:
Yes, clipping is not being done in gtk3. It's too slow with cairo.
2D clipping has been a staple of accelerated graphics since the days of Windows 3.1. Cairo can already take advantage of display hardware acceleration when available, with many of its demos prominently using this feature.
Clipping to me seems like it should be so fundamentally basic on today's modern hardware that it should be effectively free. In what situations could it be considered slow enough to disable it selectively or entirely (some regions of GTK elements are seemingly clipped or overdrawn, I don't know which)? Is this something fundamental to Cairo, given that it was mentioned specifically?
Just as Timm Bäder wrote: Clipping with something that cuts pixels in half is complicated. (For example: A circle is more complicated than a rectangle that fits to the pixel grid.)
Sure, clipping so that only whole pixels are painted can speed things up, since fewer pixels need to be touched. However, a clip path that contains only 20% of a pixel means that some interpolation with the current value of the pixel is required.
Simple example: paint a pixel white. When the whole pixel is covered, it is just pixel = white. But when only 20% of the pixel is to be drawn, you end up with something like pixel = white * 0.2 + pixel * 0.8, which is much more complicated.
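The difference between the two cases can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Cairo's actual code path; `blend_with_coverage` is a hypothetical helper and colors are assumed to be RGB tuples.

```python
def blend_with_coverage(dst, src, coverage):
    """Blend src over dst, weighted by the fraction of the pixel inside the clip."""
    if coverage >= 1.0:
        # Whole pixel is inside the clip: a plain store, dst is never read.
        return tuple(src)
    # Partial coverage: dst has to be read back and mixed per channel.
    return tuple(s * coverage + d * (1.0 - coverage) for s, d in zip(src, dst))


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

fully_covered = blend_with_coverage(BLACK, WHITE, 1.0)
one_fifth_covered = blend_with_coverage(BLACK, WHITE, 0.2)
```

The partially covered case needs a read of the destination plus a multiply-add per channel, on top of working out the coverage value in the first place.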
I have a game written with SpriteKit which uses an SKEffectNode with a blur effect to blur a set of sprites, one of which has a fairly large texture, and which together cover a fairly large area of the screen. An iMac and a MacBook Pro cope quite happily with this, but on a more humble MacBook there is a notable drop in frame rate with the effect node added in. Since the effect isn't crucial to the functionality of the game, I could simply not add the SKEffectNode for machines with less powerful graphics capabilities.
So then the question: what would be a good programmatic check that I could make to determine the "power of the GPU" or "performance when applying texture effects" or [suggest better metric here] and via what API? Thanks for your suggestions!
You'll have to create a performance test using your actual blurring processes and some sample content to get an accurate idea of the time cost of it on each generation of hardware.
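As a language-neutral sketch of such a test (shown in Python rather than in the game's own language), the idea is to time a representative workload several times and compare the median against a frame budget. `median_frame_time`, `should_enable_blur` and the 60 fps budget are all assumptions made for illustration, and how the measurement is hooked into a SpriteKit render loop is left open.

```python
import time
from statistics import median


def median_frame_time(render_frame, frames=30, clock=time.perf_counter):
    """Run render_frame repeatedly and return the median seconds per call."""
    samples = []
    for _ in range(frames):
        start = clock()
        render_frame()
        samples.append(clock() - start)
    return median(samples)


def should_enable_blur(seconds_per_frame, budget=1.0 / 60.0):
    """Keep the effect only if one frame fits in the budget (60 fps assumed)."""
    return seconds_per_frame <= budget
```

Using the median rather than the mean keeps a single slow frame (for example the first one, while resources are still loading) from deciding the outcome.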
Blurs are really weird things, programmatically. A Box Blur can give you most of the appearance of a nice, soft gaussian blur for much less processing cost. A zoom or motion blur (that looks good) is surprisingly expensive, even on strong hardware.
And there are some amazingly effective "cheats" when doing blurs. Because there's no need for detail, you can heavily optimise the operations, particularly if the blurs are strong.
Apple, it's believed, does something like this, for example, with its blurs:
1) Massively shrink the target image
2) Do a gaussian blur on this tiny image
3) Scale it back up, somewhat
4) Apply a cheap Box Blur to soften it
5) Fully scale back to the desired size
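A simplified version of this shrink, blur, and scale-back idea can be sketched in Python. It is not Apple's implementation: it uses a single box blur instead of the Gaussian plus box combination, nearest-neighbour upscaling instead of filtered scaling, and the helper names are hypothetical. The image is assumed to be a grayscale list of rows whose dimensions divide evenly by the shrink factor.

```python
def downscale(img, factor):
    """Shrink by averaging each factor x factor block of pixels."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


def box_blur(img, radius):
    """Average each pixel with the neighbours that exist around it."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(sum(img[j][i] for j in ys for i in xs) / (len(ys) * len(xs)))
        out.append(row)
    return out


def upscale_nearest(img, factor):
    """Scale back up by repeating each pixel factor times in both directions."""
    return [[img[y // factor][x // factor] for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def cheap_blur(img, factor=4, radius=1):
    """Blur at reduced resolution, then return to the original size."""
    return upscale_nearest(box_blur(downscale(img, factor), radius), factor)
```

With a factor of 4 the blur itself touches only one sixteenth of the pixels, which is where the savings come from. A real implementation would want filtered upscaling to hide the blockiness that nearest-neighbour leaves behind.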
By way of a rough example that benefits from scaling well (with filtering set for good scaling):
This is the full sized image blurred:
And here's a version of the same image, scaled to a 16th of its original size, blurred, and then the blurred image scaled back up. As you can see, due to the good scaling and lack of detail, there's hardly any difference in the blurred image, but the blur takes MUCH less processing energy and time:
Could somebody provide an example of an efficient way to work with pixels using Direct2D?
For example, how can I swap all green pixels (RGB = 0x00FF00) with red pixels (RGB = 0xFF0000) on a render target? What is the standard approach? Is it possible to use ID2D1HwndRenderTarget for that? Here I assume using some kind of hardware acceleration. Should I create a different object for direct pixel manipulation?
Using DirectDraw I would use the BltFast method on the IDirectDrawSurface7 with a logical operation. Is there something similar with Direct2D?
Another task is to generate complex images dynamically where each point location and color is a result of a mathematical function. For the sake of an example let's simplify everything and draw Y = X ^ 2. How to do that with Direct2D? Ultimately I'm going to need to draw complex functions but if somebody could give me a simple example for Y = X ^ 2.
First, it helps to think of ID2D1Bitmap as a "device bitmap". It may or may not live in local, CPU-addressable memory, and it doesn't give you any convenient (or at least fast) way to read/write the pixels from the CPU side of the bus. So approaching from that angle is probably the wrong approach.
What I think you want is a regular WIC bitmap, IWICBitmap, which you can create with IWICImagingFactory::CreateBitmap(). From there you can call Lock() to get at the buffer, and then read/write using pointers and do whatever you want. Then, when you need to draw it on-screen with Direct2D, use ID2D1RenderTarget::CreateBitmap() to create a new device bitmap, or ID2D1Bitmap::CopyFromMemory() to update an existing device bitmap. You can also render into an IWICBitmap by making use of ID2D1Factory::CreateWicBitmapRenderTarget() (not hardware accelerated).
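Once you have the locked buffer, the green-to-red replacement itself is a simple loop. The sketch below shows only that loop, in Python on a plain `bytearray` rather than through WIC. It assumes a tightly packed 32-bit BGRA layout with no row padding, and `replace_green_with_red` is a hypothetical name.

```python
def replace_green_with_red(buf):
    """Turn pure-green pixels into pure red, in place, in a 32-bit BGRA buffer."""
    green = bytes((0x00, 0xFF, 0x00))  # B, G, R byte order (assumed layout)
    red = bytes((0x00, 0x00, 0xFF))
    for i in range(0, len(buf), 4):
        if buf[i:i + 3] == green:
            buf[i:i + 3] = red  # alpha byte at i + 3 is left untouched
    return buf


# Two pixels: one opaque pure green, one arbitrary colour.
pixels = bytearray([0x00, 0xFF, 0x00, 0xFF, 10, 20, 30, 0xFF])
replace_green_with_red(pixels)
```

If "swap" is meant in both directions, the mirror-image branch (red to green) would go in the same loop. With a real locked bitmap you would also step by the reported stride per row instead of assuming `width * 4`.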
You will not get hardware acceleration for these types of operations. The updated Direct2D in Win8 (should also be available for Win7 eventually) has some spiffy stuff for this but it's rather complex looking.
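For the Y = X ^ 2 part of the question, the same CPU-side approach applies: compute the points into a pixel grid and then upload that grid to a device bitmap. Here is a minimal sketch of the computation only, in Python, with the origin assumed to sit at the bottom centre of the grid. `plot_parabola` is a hypothetical helper and the grid holds 0/1 values rather than real colours.

```python
def plot_parabola(width, height, scale=1.0):
    """Return a height x width grid with 1 wherever y = scale * x^2 lands."""
    grid = [[0] * width for _ in range(height)]
    cx = width // 2  # x = 0 is placed in the middle column
    for px in range(width):
        x = px - cx
        y = scale * x * x
        py = height - 1 - int(round(y))  # flip so that y grows upwards
        if 0 <= py < height:
            grid[py][px] = 1
    return grid


curve = plot_parabola(5, 5)
```

This plots one point per column, so steep parts of a curve come out as disconnected dots. For a continuous line you would connect consecutive points, or let Direct2D draw line segments between them.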
Rick's answer talks about the methods you can use if you don't care about losing hardware acceleration. I'm focusing on how to accomplish this using a substantial amount of GPU acceleration.
In order to keep your rendering hardware accelerated and to get the best performance, you are going to want to switch from ID2D1HwndRenderTarget to using the newer ID2D1Device and ID2D1DeviceContext interfaces. It honestly doesn't add that much more logic to your code and the performance benefits are substantial. It also works on Windows 7 with the Platform Update. To summarize the process:
1) Create a DXGI factory when you create your D2D factory.
2) Create a D3D11 device and a D2D device to match.
3) Create a swap chain using your DXGI factory and the D3D device.
4) Ask the swap chain for its back buffer and wrap it in a D2D bitmap.
5) Render like before, between calls to BeginDraw() and EndDraw(). Remember to unbind the back buffer and destroy the D2D bitmap wrapping it!
6) Call Present() on the swap chain to see the results.
7) Repeat from 4.
Once you've done that, you have unlocked a number of possible solutions. Probably the simplest and most performant way to solve your exact problem (swapping color channels) is to use the color matrix effect as one of the other answers mentioned. It's important to recognize, however, that you need to use the newer ID2D1DeviceContext interface rather than ID2D1HwndRenderTarget to get this. There are lots of other effects that can do more complicated operations if you so choose. Here are some of the most useful ones for simple pixel manipulation:
Color matrix effect
Arithmetic operation
Blend operation
For generally solving the problem of manipulating the pixels directly without dropping hardware acceleration or doing tons of copying, there are two options. The first is to write a pixel shader and wrap it in a completely custom D2D effect. It's more work than just getting the pixel buffer on the CPU and doing old-fashioned bit mashing, but doing it all on the GPU is substantially faster. The D2D effects framework also makes it super simple to reuse your effect for other purposes, combine it with other effects, etc.
For those times when you absolutely have to do CPU pixel manipulation but still want a substantial degree of acceleration, you can manage your own mappable D3D11 textures. For example, you can use staging textures if you want to asynchronously manipulate your texture resources from the CPU. There is another answer that goes into more detail. See ID3D11Texture2D for more information.
The specific issue of swapping all green pixels with red pixels can be addressed via ID2D1Effect as of Windows 8 and Platform Update for Windows 7.
More specifically, Color matrix effect.
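What such a matrix does can be shown without any Direct2D code. The sketch below applies a 5x4 matrix to one pixel in Python, assuming the row-vector convention [R G B A 1] x M. The matrix values and the `apply_color_matrix` helper are illustrative, not taken from the Direct2D headers.

```python
SWAP_RED_GREEN = [
    # R' G' B' A'   (columns are the output channels)
    [0, 1, 0, 0],  # input R feeds output G
    [1, 0, 0, 0],  # input G feeds output R
    [0, 0, 1, 0],  # input B passes through
    [0, 0, 0, 1],  # input A passes through
    [0, 0, 0, 0],  # constant offset row, unused here
]


def apply_color_matrix(rgba, matrix):
    """Multiply the row vector [R, G, B, A, 1] by a 5x4 colour matrix."""
    vec = list(rgba) + [1.0]
    return tuple(sum(vec[r] * matrix[r][c] for r in range(5)) for c in range(4))


green_in = (0.0, 1.0, 0.0, 1.0)
red_out = apply_color_matrix(green_in, SWAP_RED_GREEN)
```

Note that a matrix like this exchanges the red and green channels of every pixel, so pure red also becomes pure green and mixed colours change too. If only exact 0x00FF00 pixels should be affected, a per-pixel test (on the CPU or in a custom shader) is needed instead.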
I'm trying to get an image of a blackboard readable by OCR. Naturally, most OCR software doesn't like dirty images. What image processing should I try to put the image through to clean the image up?
Have you tried the OCR software yet? It's likely that the OCR software is well suited to reading what's essentially already a black and white image.
However, if you were required to do so you could try to:
1) Threshold the image. Essentially, take a greyscale version of the image and turn it into black / white pixels.
2) Perform Binary Dilation to grow the remaining objects.
3) Perform Binary Erosion.
The idea is that by dilating and then eroding you remove any rough / noisy edges, and then you can pass the cleaned-up image to the OCR.
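The three steps can be sketched in plain Python as follows. This is a minimal illustration with a fixed 3x3 neighbourhood and a hypothetical cutoff of 128. The input is assumed to be a grayscale list of rows where chalk is bright, and pixels outside the image are simply ignored at the borders.

```python
def threshold(gray, cutoff=128):
    """Turn a grayscale image into 0/1 pixels."""
    return [[1 if v >= cutoff else 0 for v in row] for row in gray]


def _morph(img, combine):
    """Apply combine (max or min) over each pixel's 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(combine(window))
        out.append(row)
    return out


def dilate(img):
    """Grow objects: a pixel becomes 1 if any neighbour is 1."""
    return _morph(img, max)


def erode(img):
    """Shrink objects: a pixel stays 1 only if all neighbours are 1."""
    return _morph(img, min)


def clean(gray, cutoff=128):
    """Threshold, then dilate followed by erode to fill small gaps."""
    return erode(dilate(threshold(gray, cutoff)))
```

Dilating and then eroding fills small holes and gaps in strokes while leaving the overall stroke size roughly unchanged. Doing the two in the opposite order would instead remove small isolated specks.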
There are probably plenty of methods to achieve a similar result. Given that there are entire books devoted to computer vision this answer will hardly do them justice.
The only texts I have are from 1997, but surely there's been more written on the subject since.
Algorithms for Image Processing and Computer Vision - J.R. Parker
Digital Image Processing - Gonzalez / Woods
Offhand, I'd say invert the image (reverse the colors, so that the writing is black on white) and increase the contrast a bit. You can try modifying the brightness to get the erased chalk fogginess to disappear into the background.
In Photoshop, the Levels dialog may be your most useful image adjustment. Mimicking this in code is another subject, entirely.
The basis of Levels is that you adjust the max, min and midpoints of the brightness levels. Usually shown on a histogram, you adjust the points such that you obtain the desired amount of contrast, but also move the midpoint such that text in the image is the most well-defined; critical for OCR applications. By moving the midpoint you can "eliminate" the grayscale fuzz that ordinarily surrounds handwriting by causing it to disappear into the light (or dark) areas of the image.
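As a rough idea of what mimicking this in code could look like, here is a sketch of a Levels-style mapping for a single brightness value in Python. It is not Photoshop's exact formula. It assumes 8-bit values and models the midpoint as a gamma exponent, and `levels` and its parameter names are made up for illustration.

```python
def levels(value, black=0, white=255, gamma=1.0):
    """Map a 0-255 brightness through a black point, white point and midpoint gamma."""
    t = (value - black) / (white - black)
    t = min(max(t, 0.0), 1.0)  # everything outside the range is clipped
    return round(255 * t ** (1.0 / gamma))


# Pull the black point up so faint smudges drop to black,
# and the white point down so the writing saturates to white.
faint_smudge = levels(50, black=60, white=200)
bright_stroke = levels(220, black=60, white=200)
```

Raising `black` and lowering `white` increases contrast, and a gamma above 1.0 brightens the midtones while a gamma below 1.0 darkens them, which is the midpoint adjustment described above.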
Also you might try converting the image to 1-bit after such an adjustment, forcing everything to black or white. Sometimes this speeds up the OCR process. But be careful, it also will discard detail.
Have you tried edge detection techniques such as Roberts Cross and Sobel operator to filter noise out of the image? Without seeing the quality of the image, can't say how effective that'd be.
Not sure how constrained you are in the choice of OCR solution, but the ABBYY OCR engine (and a web API based on it, http://www.wisetrend.com/wisetrend_ocr_cloud.shtml ) includes automatic image cleanup / texture removal options.
There are commercial solutions but cleaning up board images appears to be an open problem. Add OCR to an unsolved problem, and you get... an unsolved problem.
So I've read that StretchBlt can mirror images horizontally and/or vertically by negating the nWidthSrc/Dest and nHeightSrc/Dest parameters. I'd like this functionality without the performance overhead of a StretchBlt. I tried the same technique with BitBlt but it didn't work.
Is there any way to mirror an image with something as simple as BitBlt, without the overkill of a StretchBlt? Or will StretchBlt not affect performance if the source and destination sizes are the same?
BitBlt will only perform logical raster operations (OR, XOR, etc.) on the individual pixels in question; it will not resize the image in any way. That is exactly what StretchBlt is for, and StretchBlt (compared to any other graphics resizing operation) is insanely fast, as in most cases it can use the graphics card to accelerate its performance.
All Win32 functions are probably going to be extremely optimized.
What makes you think StretchBlt will be a big performance hit?
Have you profiled your application using StretchBlt?
You could reverse all of the bitmap data yourself and see if you can do better than StretchBlt.
Here's a link that might help you out:
http://www.codeguru.com/cpp/g-m/bitmap/specialeffects/article.php/c1739
To mirror an image you just need to loop through the pixels in reverse order. For example, if you want to mirror horizontally you just need to do the following:
expand the image canvas to double the size
start at the bottom of the image and work your way up, writing the pixels into the mirrored area from the top down.
do step 3 from left to right.
I don't know what language you are using, but most of them allow you to manipulate the pixels or bits on an individual basis using GDI.
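If you do go the route of reversing the pixel data by hand, the core of it is a pair of loops over a raw buffer. The sketch below uses Python on a `bytes` buffer rather than a GDI bitmap. It assumes tightly packed rows with no padding, and the function names are hypothetical.

```python
def mirror_horizontal(pixels, width, height, bytes_per_pixel=4):
    """Return a left-right mirrored copy of a tightly packed pixel buffer."""
    stride = width * bytes_per_pixel
    out = bytearray(len(pixels))
    for y in range(height):
        row = y * stride
        for x in range(width):
            src = row + x * bytes_per_pixel
            dst = row + (width - 1 - x) * bytes_per_pixel
            out[dst:dst + bytes_per_pixel] = pixels[src:src + bytes_per_pixel]
    return out


def mirror_vertical(pixels, width, height, bytes_per_pixel=4):
    """Return a top-bottom mirrored copy by reversing the order of the rows."""
    stride = width * bytes_per_pixel
    rows = [pixels[y * stride:(y + 1) * stride] for y in range(height)]
    return bytearray(b"".join(reversed(rows)))


# A 3x2 image with one byte per pixel, for readability.
image = bytes([1, 2, 3,
               4, 5, 6])
flipped_lr = mirror_horizontal(image, 3, 2, bytes_per_pixel=1)
flipped_ud = mirror_vertical(image, 3, 2, bytes_per_pixel=1)
```

Real bitmaps at 24 bits per pixel usually pad each row to a multiple of four bytes, so the stride would have to come from the bitmap itself rather than from `width * bytes_per_pixel`.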
No way you are going to be more efficient than StretchBlt, unless you know some extra information about the image (e.g., there is a border so you don't have to flip certain pixels).