DirectWrite rendering issues and metric scaling inaccuracy - winapi

I have DirectWrite set up to render single glyphs and then shape them programmatically based on the glyph run and font metrics. (Due to my use case, I can't store every full string as an OpenGL texture, otherwise it's essentially a memory leak. So we store each glyph in its own texture and lay strings out glyph by glyph.)
However, I have two issues:
1. Inconsistent rendering results.
2. Scaling the metrics leads to inconsistent distances between glyphs.
The glyphs are transferred to a bitmap using Direct2D and a WIC bitmap render target (CreateWicBitmapRenderTarget).
Let's look at an example: font size 12 with Segoe UI.
The 1st line is the full string rendered using DrawTextLayout with D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT. The 2nd line is drawn glyph by glyph using DrawGlyphRun with DWRITE_MEASURING_MODE_NATURAL. The 3rd is rendered with paint.net, just for reference.
This leads to the second issue: the distance between letters can be off, and I am not sure if this is a symptom of the first issue. You can see the distance between S and p is now 2 pixels when the glyphs are drawn separately. Because i is no longer 3 pixels wide, it visually looks too close to c when zoomed out. p and e also look too close.
I have checked the metrics, and I am receiving the right values from the font during shaping. The advances for the above string from DirectWrite: [1088.0, 1204.0, 1071.0, 946.0, 496.0, 1071.0, 869.0]. I am comparing the output with HarfBuzz: [S=0+1088|p=1+1204|e=2+1071|c=3+946|i=4+496|e=5+1071|s=6+869], which matches.
To convert design units to DIPs I am using this formula for the ratio multiplier: (size * dpi) / 72 / metrics.designUnitsPerEm
So with a default DPI of 96 and default size of 12 we get the following ratio: 0.0078125.
Let's look at S, which is 1088. The advance should be 1088 * 0.0078125 = 8.5. Since we can't draw at half a pixel, which way do we go? I tried every value, from the lsb to the advance to the render offset, in every combination of flooring, ceiling, rounding, and converting to int. Whichever way I choose, even if it fixes one situation, testing with another font or another size leaves the glyphs one or two pixels too close in some other string. I just can't seem to find a balance that is universal.
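To make the arithmetic concrete, here is a minimal Python sketch of the conversion and the rounding dilemma (the advances are the Segoe UI values quoted above; neither strategy is presented as the fix, it just shows how two reasonable rounding choices diverge by a pixel):

    import math

    # Segoe UI advances in design units for "Species" (from above).
    advances = [1088.0, 1204.0, 1071.0, 946.0, 496.0, 1071.0, 869.0]
    size, dpi, units_per_em = 12.0, 96.0, 2048.0

    ratio = (size * dpi) / 72.0 / units_per_em
    assert ratio == 0.0078125

    def half_up(x):
        # Plain round-half-up; Python's round() does banker's rounding.
        return math.floor(x + 0.5)

    # Strategy A: round each advance to whole pixels, then accumulate.
    pen, pos_a = 0.0, []
    for adv in advances:
        pos_a.append(int(pen))
        pen += half_up(adv * ratio)

    # Strategy B: accumulate fractional advances, round only on placement.
    pen, pos_b = 0.0, []
    for adv in advances:
        pos_b.append(half_up(pen))
        pen += adv * ratio

    print(pos_a)  # [0, 9, 18, 26, 33, 37, 45]
    print(pos_b)  # [0, 9, 18, 26, 34, 38, 46]

Note the two strategies agree up to the i and then drift apart by a pixel, which is exactly the kind of inconsistency described above.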
I am not really sure where to go from here. Any suggestions are appreciated. Here is the code: https://github.com/pyglet/pyglet/blob/master/pyglet/font/directwrite.py#L1736
EDIT: After a suggestion to draw the full run with DrawGlyphRun, the output does appear exactly like what DrawTextLayout produces. So DrawGlyphRun can produce the same appearance. Here's where it gets interesting:
Line1: DrawTextLayout
Line2: Single glyphs drawn by DrawGlyphRun
Line3: All glyphs drawn using DrawGlyphRun
You can see something interesting. If I render each 'c' by itself (right side), it has 4 pixels on the left of the character. But in the strings those pixels appear to be missing. Actually, taking a deeper look with a color dropper, the color is indeed there, but much darker. So somehow each glyph is affecting how the pixels around it are blended. I am not really sure how it's doing this.
EDIT2: Talking it over with someone else, I think we narrowed this down to anti-aliasing. Applying the anti-aliasing to the whole string vs. each character produces different results. With D2D1_TEXT_ANTIALIAS_MODE_ALIASED set, each character now looks exactly the same whether drawn separately or as part of the full string.
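For reference, this is roughly the call involved, assuming pyglet-style COM wrappers around the render target (the render_target name is illustrative, not the actual variable in the linked code):

    # D2D1_TEXT_ANTIALIAS_MODE values from d2d1.h.
    D2D1_TEXT_ANTIALIAS_MODE_DEFAULT = 0
    D2D1_TEXT_ANTIALIAS_MODE_CLEARTYPE = 1
    D2D1_TEXT_ANTIALIAS_MODE_GRAYSCALE = 2
    D2D1_TEXT_ANTIALIAS_MODE_ALIASED = 3

    # Disable text antialiasing on the WIC render target before drawing,
    # so isolated glyphs and full runs rasterize identically.
    render_target.SetTextAntialiasMode(D2D1_TEXT_ANTIALIAS_MODE_ALIASED)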

Related

"Barcode" reading from scanned image

I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality, and I can understand why normal barcodes fail.
My idea is to create a non-standard and very simple barcode at the top of each page printed. It will be 20 squares in a row forming a simple binary code: filled = 1, open = 0. It will be large enough on an A4 page to make detection easy.
At this stage I need to load the image and find the barcode somewhere near the top; it will not be at exactly the same spot on every scan. Then I need to step into each block and build the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you know their polarity.
For this to work, you need to frame your cells with a black cell at both ends to make sure you know where the code starts and stops (if the scale is fixed, you can do with just a start cell, but I would not recommend this).
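A minimal Python sketch of that ray scan, assuming you already have a grayscale row across the region of interest (the names row and n_cells are illustrative):

    import numpy as np

    def read_row(row, n_cells, black_thresh=128):
        """Decode one ray across the code; the code is framed by a black
        cell at each end, so locate the black span and divide it evenly."""
        black = np.asarray(row) < black_thresh
        idx = np.flatnonzero(black)
        if idx.size == 0:
            return None
        start, stop = idx[0], idx[-1] + 1        # span incl. framing cells
        cell_w = (stop - start) / (n_cells + 2)  # +2 for the framing cells
        bits = []
        for i in range(1, n_cells + 1):          # skip the framing cells
            a = int(start + i * cell_w)
            b = int(start + (i + 1) * cell_w)
            # Cell polarity: mostly black pixels -> 1, mostly white -> 0.
            bits.append(1 if black[a:b].mean() > 0.5 else 0)
        return bits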
You could have a look at https://github.com/zxing/zxing. I would suggest using a 1D barcode, but wide enough to match the low resolution of the scanner.
You could also invent your own barcode encoding and try to parse it yourself. Use thick bars for 1 and thin lines for 0. A thick bar would be, for instance, 2 white pixels followed by 4 black pixels. A thin line would be 2 white pixels, 2 black pixels and 2 white pixels. The last two pixels encode the bit value.
Each barcode "pixel" should be the size of a pixel in the scanned image.
You then process the image scan line by scan line, trying to locate the bar code.
We locate the barcode by comparing a given pixel value sequence with the pattern. This is done by computing a score function; the sum of squared differences is a good pick. When computing the score, we ignore the two pixels encoding the bit value.
When the score is below a threshold, we have found a matching pattern. It is good to add parity bits to the encoded value so that its validity can be checked.
Computing a sum of squares over a sliding window can also be optimized, by updating the score incrementally as the window slides instead of recomputing it from scratch.
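Here is a sketch of that matcher in Python for the 6-pixel symbols described above (2 white + 2 black fixed, then 2 bit pixels; the threshold value is illustrative, and the scan line is assumed to be a numpy array):

    import numpy as np

    WHITE, BLACK = 255, 0
    # Fixed prefix of each 6-pixel symbol: 2 white, 2 black. The last
    # 2 pixels encode the bit (black = 1, white = 0) and are excluded
    # from the score.
    PATTERN = np.array([WHITE, WHITE, BLACK, BLACK], dtype=float)

    def find_symbols(scanline, threshold=4 * 64 ** 2):
        bits = []
        x = 0
        while x + 6 <= len(scanline):
            window = scanline[x:x + 4].astype(float)
            score = np.sum((window - PATTERN) ** 2)  # sum of squared diff
            if score < threshold:
                # The last two pixels of the symbol carry the bit value.
                bit = scanline[x + 4:x + 6].astype(float).mean() < 128
                bits.append(1 if bit else 0)
                x += 6                               # jump to next symbol
            else:
                x += 1
        return bits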

Algorithm to detect the change in visible luminosity in an image

I want a formula to detect/calculate the change in visible luminosity in a part of an image, provided I can calculate the RGB, HSV, HSL and CMYK color spaces.
E.g.: in the picture above, the left side of the image is brighter compared to the right side, which is beneath a shadow.
I have had a little think about this and done some experiments in Photoshop, though you could just as well use ImageMagick, which is free. Here is what I came up with.
Step 1 - Convert to Lab mode and discard the a and b channels since the Lightness channel holds most of the brightness information which, ultimately, is what we are looking for.
Step 2 - Stretch the contrast of the remaining L channel (using Levels) to accentuate the variation.
Step 3 - Perform a Gaussian blur on the image to remove local, high frequency variations in the image. I think I used 10-15 pixels radius.
Step 4 - Turn on the Histogram window and take a single row marquee and watch the histogram change as different rows are selected.
Step 5 - Look out for a strongly bimodal histogram (two distinct peaks) to identify the illumination variations.
This is not a complete, general-purpose solution, but it may hold some pointers and cause people who know better to suggest improvements for you! Note that the method requires the image to have some areas of high uniformity, like the whitish horizontal bar across your input image. However, nearly any algorithm is going to have a hard time telling the difference between a sheet of white paper with a shadow of uneven light across it and the same sheet of paper with a grey sheet of paper laid on top of it...
In the images below, I have superimposed the histogram top right. In the first one, you can see the histogram is not narrow and bimodal because the dotted horizontal selection marquee is across the bar-code area of the image.
In the subsequent images, you can see a strong bimodal histogram because the dotted selection marquee is across a uniform area of image.
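If you would rather script the same steps than drive Photoshop by hand, a rough Python/OpenCV equivalent might look like this (the bimodality test is deliberately crude, and the bin counts and cutoffs are illustrative):

    import cv2
    import numpy as np

    img = cv2.imread("page.png")

    # Step 1: Lab mode, keep only the Lightness channel.
    L = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)[:, :, 0]
    # Step 2: stretch the contrast to accentuate the variation.
    L = cv2.normalize(L, None, 0, 255, cv2.NORM_MINMAX)
    # Step 3: Gaussian blur to remove local high-frequency detail.
    L = cv2.GaussianBlur(L, (0, 0), sigmaX=12)

    # Steps 4-5: histogram each row and flag strongly bimodal ones.
    for y in range(L.shape[0]):
        hist, _ = np.histogram(L[y], bins=32, range=(0, 255))
        peaks = [i for i in range(1, 31)
                 if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]
                 and hist[i] > 0.05 * L.shape[1]]
        if len(peaks) >= 2 and peaks[-1] - peaks[0] > 8:
            print(f"row {y}: bimodal, peaks at bins {peaks}")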
The first problem is in "visible luminosity". It may mean one of several things. This discussion should be a good start. (Yes, it has incomplete and contradictory answers, as well.)
Formula to determine brightness of RGB color
You should make sure you operate on a linear image which does not have any gamma correction applied to it. AFAIK Photoshop does not degamma and regamma images during filtering, which may produce erroneous results. It all depends on how accurate you want the results to be. Photoshop wants things to look good, not to be precise.
In principle you should first pick a formula to convert your RGB values to some luminosity value which fits your use. Then you have a single-channel image which you'll need to filter with a Gaussian filter, sliding average, or some other suitable filter. Unfortunately, this may require special tools as photoshop/gimp/etc. type programs tend to cut corners.
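As a sketch of those first two stages (sRGB degamma, then the standard Rec. 709 luminance; the constants are the usual sRGB/709 ones, nothing specific to this image):

    import numpy as np

    def srgb_to_linear(c):
        """Undo the sRGB gamma curve (c in 0..1)."""
        return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

    def relative_luminance(rgb):
        """Rec. 709 luminance from linear RGB; rgb has shape (..., 3)."""
        return srgb_to_linear(rgb) @ np.array([0.2126, 0.7152, 0.0722])

    # A mid grey: the luminance is ~0.214, not 0.5 -- gamma matters.
    print(relative_luminance(np.array([0.5, 0.5, 0.5])))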
But then there is one thing you would probably like to consider. If you have an even brightness gradient across an image, the eye is happy and does not perceive it. Rather large differences go unnoticed if the contrast in the image is constant across the image. Unfortunately, the definition of contrast is not very meaningful if you do not know at least something about the content of the image. (If you have scanned/photographed documents, then the contrast is clearly between ink and paper.) In your sample image the brightness changes quite abruptly, which makes the change visible.
Just to show you how strange the human vision is in determining "brightness", see the classical checker shadow illusion:
http://en.wikipedia.org/wiki/Checker_shadow_illusion
So, my impression is that talking about the conversion formulae is probably the second or third step in the process of finding suitable image processing methods. The first step would be to try to define the problem in more detail. What do you want to accomplish?

Simulating the highlight recovery tool from Photoshop

I'm interested in processing a bitmap in Java using the same (or similar) technique as the Highlight recovery tool in Photoshop. (That would be the Image->Adjustments->Shadow/Highlight tool in CS4.)
I googled around, and found very little outside of discussion about existing tools that do the job.
Any ideas?
Just guessing because I don't have Photoshop - only going by the descriptions I find on the web.
The Radius control is probably used in a Gaussian Blur to get the average value around a pixel, to determine its level of highlight or shadow. Shadows will be closer to 0 while highlights will be closer to 255. The exact definition of "close" will be determined by the Tonal Width control. For example, at 100% maybe the shadows go from 0-63 and the highlights go from 192-255.
The Amount corresponds to the amount of brightness change desired - again I don't know the scale, or what equates to 100%. Changing the brightness of the shadows requires multiplying by a constant value - for example to brighten it by 100% would require multiplying by 2. You want to scale this by the shadow value determined above. The highlights work similarly, except working down from 255 instead of up from 0.
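In the same guessing spirit, here is a Python/numpy sketch of the shadows half (the tonal-width mapping and the scaling are my reading of the description above, not Photoshop's actual algorithm; highlights would work symmetrically down from 255):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def lift_shadows(img, amount=0.5, tonal_width=1.0, radius=10):
        """img: grayscale float array in 0..255; tonal_width in (0, 1]."""
        local = gaussian_filter(img, radius)      # the Radius control
        # At tonal_width = 1.0 the shadows span roughly 0..63.
        cutoff = 64.0 * tonal_width
        # Weight 1 at a pure-black neighborhood, fading to 0 at cutoff.
        w = np.clip((cutoff - local) / cutoff, 0.0, 1.0)
        # amount = 1.0 doubles the brightness of the deepest shadows.
        return np.clip(img * (1.0 + amount * w), 0, 255)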

How do I locate black rectangles in a grid and extract the binary code from that

I'm working on a project to recognize a bit code from an image like this, where a black rectangle represents a 0 bit, and white (white space, not visible) a 1 bit.
Does anybody have an idea how to process the image in order to extract this information? My project is written in Java, but any solution is accepted.
Thanks all for the support.
I'm not an expert in image processing. I tried to apply edge detection using a Canny edge detector (a free Java implementation can be found here). I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduced it (scale factor = 0.4) for fast processing, and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how can I decode a white rectangle as a 0 bit value, and no rectangle as 1?
The image has 10 rows x 16 columns. I don't use Python, but I can try to convert it to Java.
Many thanks for the support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and a vertical profile of the image, i.e. sum up the values in all columns and all rows and store them in arrays. For an image of MxN (width x height) you will have M cells in the horizontal profile and N cells in the vertical profile.
Use thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column, so the black cells will define the locations of interest (where you expect the marks).
Based on this, you can define the lozenges in the form (the rectangles where you expect marks) and get their coordinates. Then you just add up the pixel values in each lozenge, and based on that sum you can decide whether it has a mark or not.
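In Python/numpy the profile-and-threshold part of this simple case might look like this (the 0.5 cutoffs are illustrative and would need tuning against real scans):

    import numpy as np

    def find_marks(binary):
        """binary: 2-D bool array, True where a pixel is dark (no skew)."""
        col_profile = binary.sum(axis=0)   # M cells for an MxN image
        row_profile = binary.sum(axis=1)   # N cells

        # Threshold the profiles to find rows/columns holding marks.
        cols = np.flatnonzero(col_profile > 0.5 * col_profile.max())
        rows = np.flatnonzero(row_profile > 0.5 * row_profile.max())
        if rows.size == 0 or cols.size == 0:
            return []

        # Split flagged indices into contiguous bands; each row band x
        # column band intersection is one lozenge.
        def bands(idx):
            return np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)

        marks = []
        for rb in bands(rows):
            for cb in bands(cols):
                cell = binary[rb[0]:rb[-1] + 1, cb[0]:cb[-1] + 1]
                # A lozenge is marked if enough of it is dark.
                marks.append((rb[0], cb[0], cell.mean() > 0.5))
        return marks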
Case 2: Skew (slant in the image)
Use a Fourier transform (FFT) to find the skew angle and then rotate the image to correct it.
Case 3: Stretch or shrink
Pretty much the same as case 1, but the noise is higher and the reliability lower.
Aliostad has made some good comments.
This is OMR, and you will find it much easier to get good, consistent results with a good image processing library. www.leptonica.com is a free, open source C library that would be a very good place to start. It can handle the skew and thresholding tasks for you; thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python you can use the Python Imaging Library (PIL) to open the image. Then read the height and width, and segment the image into a grid accordingly, with cells of height/rows by width/cols. Then just look for black pixels in those regions, or whatever value PIL registers that black to be. This obviously relies on the grid-like nature of the data.
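A sketch of that with PIL, assuming the grid fills the whole image and using the 10 x 16 grid mentioned in the question:

    from PIL import Image

    ROWS, COLS = 10, 16                    # grid size from the question

    img = Image.open("colorimg.png").convert("L")   # grayscale
    w, h = img.size
    cell_w, cell_h = w / COLS, h / ROWS

    bits = []
    for r in range(ROWS):
        row = []
        for c in range(COLS):
            box = (int(c * cell_w), int(r * cell_h),
                   int((c + 1) * cell_w), int((r + 1) * cell_h))
            cell = img.crop(box)
            mean = sum(cell.getdata()) / (cell.width * cell.height)
            # Mostly dark pixels -> black rectangle -> 0 bit; else 1.
            row.append(0 if mean < 128 else 1)
        bits.append(row)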
Edit:
Doing edge detection may also be fruitful. First apply an edge detection method, like something from Wikipedia; I used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 to black (if you want the boxes darker, just increase this value) and make everything else completely white. Then create bounding boxes along the lines where the pixels are all white. If the data isn't terribly skewed, this should work pretty well; otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
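A sketch of the threshold-and-bounding-box step in Python (assuming a deskewed grayscale image; 180 is the cutoff mentioned above):

    import numpy as np
    from PIL import Image

    gray = np.asarray(Image.open("colorimg.png").convert("L"))
    # Values below 180 become black (True); everything else white.
    dark = gray < 180

    # Rows/columns containing no dark pixels separate the boxes.
    row_has_dark = dark.any(axis=1)
    col_has_dark = dark.any(axis=0)

    def runs(mask):
        """Return (start, stop) index pairs of consecutive True values."""
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            return []
        splits = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
        return [(s[0], s[-1] + 1) for s in splits]

    # Bounding boxes: every dark row band crossed with every dark col band.
    boxes = [(r0, c0, r1, c1)
             for r0, r1 in runs(row_has_dark)
             for c0, c1 in runs(col_has_dark)]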
Edit2:
Denis, how large is your dataset, and how large are the images? If you have thousands of these images, it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think Prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous segmentation method may still be applied if you preprocess the image into black and white in this manner; then you need only count the number of black or white pixels in each cell and pick a threshold based on a few training samples.

Is there an easy way to force Windows to calculate text extents using a fixed DPI value, instead of the current DPI setting?

I am wondering if there is an easy way to calculate the text extent of a string (similar to GetTextExtentPoint32), but one that allows me to specify the DPI to use in the calculation. In other words, is there a function that does exactly what GetTextExtentPoint32 does but takes the DPI as a parameter, or a way to "trick" GetTextExtentPoint32 into using a DPI that I specify?
Before you ask "Why on earth do you want to do that?", I'll try to explain, but bear with me, the reasons behind this request are somewhat involved.
Ultimately, this is for a custom word-wrap algorithm that breaks a long string into smaller blocks of text that need to fit neatly on a Crystal Report with complex text layout requirements (it mimics the paper form used by police officers to file criminal complaints, so the state is in charge of the layout, not us, and it has to match the paper form almost exactly).
It's impossible for Crystal Reports to lay this text out properly without help (the text has to fit inside a small box on one page, followed by "standard-sized" continuation pages if the text overflows the small block), so I wrote code to break the text into multiple "blocks" that are stored in the reporting database and then rendered individually by the report.
Given the required dimensions (in logical inches) and the font information, the code first fits the text to the required width by inserting line breaks, then breaks it into correctly-sized blocks based on the text height. The algorithm uses VB6's TextHeight and TextWidth functions to calculate extents, which return the same results that GetTextExtentPoint32 would (I checked).
This code works great when the display is set to 96 DPI, but breaks at 120 DPI: some lines end up with more words in them than they would have had at 96 DPI.
For example, "The quick brown fox jumps over the lazy dog" might break as follows:
At 96 DPI
The quick brown fox jumps over
the lazy dog
At 120 DPI
The quick brown fox jumps over the
lazy dog
This text is then further broken up by Crystal Reports, since the first line no longer fits in the corresponding text field on the report, so the actual report output looks like this:
The quick brown fox jumps over
the
lazy dog
At first, I thought I could compensate for this by scaling the results of TextHeight and TextWidth down by 25%, but apparently life isn't so simple: numerous rounding errors creep in (and possibly other factors?), so the text extent of a given string is never exactly 25% larger at 120 DPI than at 96 DPI. I didn't expect it to scale perfectly, but it's not even close at times (in one test, the width at 120 DPI was only about 18% bigger than at 96 DPI).
This doesn't happen to any text on the report that is handled directly by Crystal Report: it seems to do a very good job of scaling everything so that the report is laid out exactly the same at 96 DPI and 120 DPI (and even 144 DPI). Even when printing the report, the text is printed exactly as it appears on the screen (i.e. it truly seems to be WYSIWYG).
Given all of this, since I know my code works at 96 DPI, I should be able to fix the problem by calculating all my text extents at 96 DPI, even if Windows is currently using a different DPI setting.
Put another way, I want the result of my FitTextToRect function to return the same output at any DPI setting, by forcing the text extents to be calculated using 96 DPI. This should all work out since I'm converting the extents back to inches and then comparing them against required width and height in inches. I think it just so happens that 96 DPI produces more accurate results than 120 DPI when converting back and forth between pixels and inches.
I've been poring over the Windows Font and Text Functions, seeing if I could roll my own function to calculate text extents at a given DPI, looking at GetTextMetrics and other functions to see how easy or difficult this might be. If there is an easier way to accomplish this, I'd love to know before I start creating my own versions of existing Windows API functions!
GetTextMetrics accepts a DC. It uses the DPI settings from that DC (for example, you couldn't possibly use screen settings and expect data to come out formatted acceptably for a printer).
So all you need to do is supply a DC with the right DPI. I think you might be able to directly control the DPI of a metafile DC.
Metafiles are vector graphics so they don't even look like they have DPI.
You can control DPI with CreateDIBitmap, but then there's no way to get a matching DC. You could see if the DPI changes if you select that DIB into a memory DC (CreateCompatibleDC).
Or you could use GDI+, create a Bitmap with the desired DPI, use the Graphics constructor that operates on an Image, and then that Graphics will have the right DPI so GDI+ text measurement functions would then use your DPI of choice.
I found a much simpler solution. It took me a while to convince myself that it really does make sense. The solution was so obvious I feel almost silly posting it here.
Ultimately, what I really want is for my FitTextToRect function to produce the same text layout at any DPI setting. It turns out, in order to make this happen, it's actually easier to measure everything in pixels. Since any other unit of measure will by definition take the current DPI setting into account, using anything other than pixels can throw things off.
The "trick" then is to force all the calculations to work out to the same result they would have had at 96 DPI. However, instead of calculating text extents first and then scaling them down, which adds significant error to the calculations, you can get more accurate results (i.e. the results will be equal or near equal at any DPI) if you temporarily scale the font size down before calculating any text extents. Note that that original font size is still used in the print preview and in the printed output.
This works because Windows internally measures font size in device units, which for the screen means pixels. The font's "point size" is converted to pixels by the application that lets you select the font, using the current DPI setting:
font_height_in_pixels = (point_size * current_dpi / 72)
That is, Windows never deals directly with the font's point size: it's always dealing with pixels. As a result, it calculates text extents in terms of pixels as well.
This means you can work out a scaling factor based on the current DPI and font point size that will scale the font down to a new point size which always comes out to the same number of pixels at any DPI (I used 96 as my "baseline" DPI):
scaled_point_size = (point_size * 96 / current_dpi)
By effectively forcing the font to fit the same number of pixels at any DPI, this ensures that text extents will be the same at any DPI, and therefore the algorithm will lay the text out the same way at any DPI.
The only other thing I needed to do was ensure that the height and width parameters passed to the function, which are measured in inches, were converted to pixels correctly. I couldn't use VB6's existing inches-to-pixels conversion function, since it takes the current DPI into account (which would give inconsistent results, since the text height and width are "normalized" to 96 DPI), so instead I just multiplied the height and width by 96, which converts them to their pixel measurements at 96 DPI.
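The arithmetic is easy to check in a few lines (Python here just for illustration; 96 is the baseline DPI from above, and the 10 pt size is arbitrary):

    def font_pixels(point_size, dpi):
        # How Windows converts a point size to device pixels.
        return point_size * dpi / 72.0

    point_size = 10.0
    for dpi in (96, 120, 144):
        scaled = point_size * 96.0 / dpi           # scaled_point_size
        print(dpi,
              font_pixels(point_size, dpi),        # grows with DPI
              font_pixels(scaled, dpi))            # ~13.33 px at any DPI

    def inches_to_pixels(inches):
        # Height/width inputs are converted at the 96 DPI baseline too.
        return inches * 96.0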
