Picture a metal bar, 20 mm long and about 30 mm in diameter. Numbers are stamped on the bar: 10 characters, 4.5 mm high, spread around approximately 120° of the circumference.
I need to perform OCR on the characters, BUT they are not all visible in a single image. Three images spaced around 30° apart seem to work.
The next issue is that the metal is freshly machined and the characters do not seem to OCR well, I think due to the lack of real contrast, i.e. black/white difference.
Does anyone have any ideas on how these characters could be OCR'd?
You could try this ImageMagick command to increase 'contrast'. It basically leaves only two values (zero or maximum) for each channel: every value below the threshold gets set to 0, and values above the threshold get set to 255 (or 65535 if working at 16-bit depth):
convert original.jpg -threshold 50% modified.jpg
Play with the 50% value to get the best results, setting it higher or lower as needed. Depending on your input image, this may already be enough to produce images that are OK for OCR-ing.
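If it helps to see the operation spelled out, here is a rough C++ sketch of what the per-channel thresholding amounts to (the function name and the 128 cutoff for 50% are just illustrative):

#include <cstdint>

// Per-channel threshold: values below the cutoff go to 0, the rest to 255.
uint8_t threshold(uint8_t value, uint8_t cutoff /* 128 is roughly 50% */)
{
    return value < cutoff ? 0 : 255;
}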
I'm using this:
GetClipboardData(CF_DIB);
I view it as a BITMAPINFOHEADER and take into account:
biWidth
biHeight
biBitCount 24 vs 32
If biCompression is BI_BITFIELDS, then I skip over the first 3*4 bytes of the pixel data.
Most of the time when I copy an image, I can get the pixel data from it just fine. But when copying from certain applications like Telegram, sometimes the colors are messed up, sometimes the image is skewed; all sorts of crazy artifacts happen when trying to view the clipboard image. I don't see anything special in the BITMAPINFOHEADER of the images I'm rendering incorrectly. What could I possibly be forgetting to handle?
A little-known fact about Microsoft bitmaps is that every row is padded to a multiple of 4 bytes. It is natural to assume that one row follows the next immediately, but there may be 1 to 3 bytes of padding at the end of each row that will throw your alignment off. You will see this as colors that aren't what they should be, or as a skewed image, since each row is shifted from its proper position.
If your pixels are all 4 bytes (RGBA) or your image width is a multiple of 4 (more common than random chance would suggest), each row will already be a multiple of 4 bytes and you won't notice a problem.
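A minimal sketch of walking the rows with the padding taken into account (assuming an uncompressed DIB; bmi and bits are hypothetical names for the header and pixel data):

#include <windows.h>

void ReadDib(const BITMAPINFOHEADER* bmi, const BYTE* bits)
{
    // Every scan line is padded to a multiple of 4 bytes.
    const int stride = ((bmi->biWidth * bmi->biBitCount + 31) / 32) * 4;

    // Positive biHeight means a bottom-up DIB: the first row in memory
    // is the bottom row of the image.
    const int height = bmi->biHeight > 0 ? bmi->biHeight : -bmi->biHeight;

    for (int y = 0; y < height; ++y)
    {
        const BYTE* row = bits + y * stride; // step by stride, not width * bytesPerPixel
        // ... read pixels from 'row' ...
    }
}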
I have DirectWrite set up to render single glyphs, which I then shape programmatically based on the glyph run and font metrics. (Due to my use case, I can't store every full string in its own OpenGL texture; that is essentially a memory leak. So we store each glyph in one texture and lay them out glyph by glyph.)
However, I have two issues:
Inconsistent rendering results.
Scaling metrics leads to inconsistent distances between glyphs.
These results are transferred to a bitmap using Direct2D and a WIC bitmap render target (CreateWicBitmapRenderTarget).
Let's look at an example: font size 12 with Segoe UI.
The full string rendered: the 1st line uses DrawTextLayout with D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT; the 2nd line is drawn glyph by glyph with DrawGlyphRun using DWRITE_MEASURING_MODE_NATURAL; the 3rd is rendered with paint.net, just for reference.
This leads to the second issue: the distance between letters can be off. I am not sure if this is a symptom of the previous issue. You can see the distance between s and P is now 2 pixels when drawn separately. Because i is no longer 3 pixels wide, it visually looks too close to c when zoomed out; p and e also look too close.
I have checked the metrics, and I am receiving the right advances from the font during shaping. The metrics for the above string from DirectWrite are [1088.0, 1204.0, 1071.0, 946.0, 496.0, 1071.0, 869.0]. Comparing with HarfBuzz output, [S=0+1088|p=1+1204|e=2+1071|c=3+946|i=4+496|e=5+1071|s=6+869], they agree.
To convert to DIP I am using this formula for the ratio multiplier: (size * dpi) / 72 / metrics.designUnitsPerEm
So with a default DPI of 96 and default size of 12 we get the following ratio: 0.0078125.
Let's look at S, which is 1088. The advance should be 1088 * 0.0078125 = 8.5. Since we can't draw at half a pixel, which way do we go? I tried every value, from the lsb to the advance to the render offset, in every combination of flooring, ceiling, rounding, and converting to int. Whichever way I choose, even if it fixes one situation, when I test with another font or another size it ends up one or two pixels too close in another string. I just can't seem to find a balance that is universal.
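To make the arithmetic concrete, here is the conversion worked through in a small C++ sketch (the rounding-at-draw-time shown is just one of the variants I have tried, not something DirectWrite prescribes):

#include <cstdio>

int main()
{
    // Values from the question: Segoe UI has designUnitsPerEm = 2048.
    const float size = 12.0f, dpi = 96.0f, unitsPerEm = 2048.0f;
    const float scale = (size * dpi) / 72.0f / unitsPerEm;            // 0.0078125

    const int advances[] = { 1088, 1204, 1071, 946, 496, 1071, 869 }; // "Species"
    float pen = 0.0f;
    for (int a : advances)
    {
        // Accumulate in floating point; round only the final draw position,
        // so per-glyph rounding errors don't pile up.
        std::printf("x = %d (exact %.4f)\n", (int)(pen + 0.5f), pen);
        pen += a * scale;                                             // 'S' advances 8.5 DIPs
    }
    return 0;
}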
I am not really sure where to go from here. Any suggestions are appreciated. Here is the code: https://github.com/pyglet/pyglet/blob/master/pyglet/font/directwrite.py#L1736
EDIT: After a suggestion to use DrawGlyphRun with the full run, its output does appear exactly like what DrawTextLayout outputs. So DrawGlyphRun should produce the same appearance. Here's where it gets interesting:
Line1: DrawTextLayout
Line2: Single glyphs drawn by DrawGlyphRun
Line3: All glyphs drawn using DrawGlyphRun
You can see something interesting. If I render each 'c' by itself (right side), it has 4 pixels on the left of the character. But in the strings those pixels look like they're missing. Actually, taking a deeper look with a color dropper, the color is indeed there, but it's much darker. So somehow each glyph is affecting the blend of the pixels around it, and I am not really sure how.
EDIT2: Talking it through with someone else, I think we narrowed this down to anti-aliasing. Applying anti-aliasing to the whole string vs. to each character produces different results. With D2D1_TEXT_ANTIALIAS_MODE_ALIASED set, each character now looks exactly the same in both cases.
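For anyone else hitting this, switching the render target's text anti-aliasing mode is a one-liner (target here stands for whatever ID2D1RenderTarget came back from CreateWicBitmapRenderTarget):

// Make per-glyph rendering match whole-string rendering by turning off
// the anti-aliased blending at the edges of each run.
target->SetTextAntialiasMode(D2D1_TEXT_ANTIALIAS_MODE_ALIASED);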
I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality, and I can understand why normal barcodes fail on them.
My idea is to create a non-standard and very simple barcode at the top of each printed page. It will be 20 squares in a row forming a simple binary code: filled = 1, open = 0. It will be large enough on an A4 page to make detection easy.
At this stage I need to load the image and find the barcode somewhere near the top; since the page is scanned in, it will not be in exactly the same spot each time. Then I step into each block and build the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you know their polarity.
For this to work, you need to frame your cells with a black cell at each end so you know where the code starts and stops (if the scale is fixed, you can get by with just a start cell, but I would not recommend it).
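A minimal C++ sketch of the ray-scan idea, assuming one binarized row of the region of interest (0 = black, 255 = white); all names are illustrative:

#include <cstdint>
#include <vector>

// Positions where the row flips between black and white.
std::vector<int> findTransitions(const uint8_t* row, int width)
{
    std::vector<int> edges;
    for (int x = 1; x < width; ++x)
        if ((row[x - 1] < 128) != (row[x] < 128))
            edges.push_back(x);
    return edges;
}

// With the framing cells located from 'edges', the cell pitch follows,
// and sampling the center of each cell yields the 20 bits.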
You could have a look at https://github.com/zxing/zxing. I would suggest using a 1D barcode, but one wide enough to match the low resolution of the scanner.
You could also invent your own barcode encoding and try to parse it yourself. Use thick bars for 1 and thin lines for 0. A thick bar would be, for instance, 2 white pixels followed by 4 black pixels; a thin line would be 2 white pixels, 2 black pixels, and 2 white pixels. Both patterns are 6 pixels long, and it is the last two pixels that encode the bit value.
Each pattern pixel should be the size of a pixel in the scanned image.
You then process the image scan line by scan line, trying to locate the barcode.
We locate the barcode by comparing a given pixel value sequence against a pattern. This is done by computing a score function; the sum of squared differences is a good pick. When computing the score, we ignore the two pixels that encode the bit value.
When the score falls below a threshold, we have found a matching pattern. It is also good to add parity bits to the encoded value so that its validity can be checked.
Computing a sum of squares over a sliding window can be optimized.
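A minimal sketch of the matching step under the encoding above (a thresholded row where black = 0 and white = 255 is assumed; the names are illustrative). The score only covers the 4-pixel prefix common to both patterns; pixels 4 and 5 carry the bit:

#include <cstdint>

const int PATTERN[4] = { 255, 255, 0, 0 }; // 2 white + 2 black prefix

// Sum of squared differences for the window starting at x; lower is better.
long score(const uint8_t* row, int x)
{
    long s = 0;
    for (int i = 0; i < 4; ++i)
    {
        long d = (long)row[x + i] - PATTERN[i];
        s += d * d;
    }
    return s; // pixels x+4 and x+5 encode the bit value
}

// Slide across the scan line; a score below the threshold marks a bar.
bool isBarAt(const uint8_t* row, int x, long threshold)
{
    return score(row, x) < threshold;
}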
I am trying to remove a background using MATLAB.
I have achieved what looks like very good results using the traditional approach:
imsubtracted = im - background;
However, the black that has replaced the background is not pure black. Further image processing reveals that a significant amount of noise is left over. Is it possible to either completely remove the background or make it a uniform color?
Please note, I am dealing with very small objects in a rather large black space.
Once you subtract the background, you should threshold the resulting image to create a binary foreground mask: set all differences less than the threshold to 0 (background), and all those greater than or equal to it to 1 (foreground). You can then use morphology, such as imopen to get rid of small noisy specks in the background and imclose to close small gaps or holes in the foreground.
Once you are happy with your foreground mask, you can use it as a logical index to set the background pixels to whatever color you choose.
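To make the masking step concrete, here is the same idea sketched in C++ (purely illustrative; in MATLAB the last step is a one-line logical index, and the imopen/imclose cleanup is omitted):

#include <cstdint>
#include <vector>

// diff holds |im - background| as grayscale; pixels below the threshold
// are treated as background and forced to one uniform value.
void applyMask(std::vector<uint8_t>& diff, uint8_t threshold, uint8_t bg)
{
    for (auto& p : diff)
        if (p < threshold)
            p = bg; // e.g. bg = 0 for pure black
}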
I am trying to programmatically reduce (lossily) the file size of PNG and GIF files. As part of this I need to reduce the number of colors in the images. I don't want to reduce all the images to a single color count, so what I am doing is: get the number of unique colors in the image, then divide that number by 2 to halve the number of colors.
The problem is that this does not work. ImageMagick is way too slow and doesn't reduce the file size unless the image has under a few hundred unique colors. GraphicsMagick always results in a unique-color count under 255 regardless of how many colors were in the original image. Another problem with GraphicsMagick is that if there are any transparent pixels in the image, it replaces the removed colors with transparency.
Any help would be greatly appreciated,
Thanks.
Reducing the number of colors is only useful if:
the image can then use a palette instead of storing the color for each pixel
the size of a palette index is smaller than the size of a color
the image format supports the palette size
I think you can only get 1-bit, 4-bit, or 8-bit palettes, so 2, 16, or 256 colors in those formats. I think if you ask for more, you just get truncated to 256; if you ask for less, it simply doesn't use the entire palette.
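As a rough worked example (my numbers, before any compression): a 1000x1000 truecolor image stores 3 bytes per pixel, about 3,000,000 bytes of pixel data, while an 8-bit paletted version stores 1 byte per pixel plus a 256-entry RGB palette, about 1,000,000 + 768 bytes, roughly a third of the size.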
Have you considered converting to JPEG and playing with the quality setting? You end up with more fine-grained control of the lossiness. The drawback is that it only suits photo-like images, but since yours have a lot of colors, they may well qualify.
Perhaps choose 1-, 4-, or 8-bit if that's close to what you want, and JPEG if the image has a lot of colors.
I think the ImageMagick facility you are after might be quantization:
http://www.imagemagick.org/Usage/quantize/
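For example (my command, not from that page), quantizing to a fixed number of colors looks like this:
convert input.png -colors 128 output.png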
First problem: GraphicsMagick can be compiled with 8-bit, 16-bit, or 32-bit quantum levels. My version is compiled with 8 bits (the default), which means the maximum number of colors that can be assigned to an image is 256 unique colors (3 bits red, 3 bits green, 2 bits blue; blue gets one bit fewer because the human eye is less sensitive to it). Obviously, GraphicsMagick can handle images with more colors than this, but when reducing colors it can only reduce to 256 or fewer. Larger pixel quantums cause GraphicsMagick to run more slowly and to require more memory; for example, using sixteen-bit pixel quantums makes it run 15% to 50% slower (and take twice as much memory) than a build supporting eight-bit pixel quantums.
Second problem: transparency handling in PNG images. I was using an earlier version of GraphicsMagick (1.1, I think); after I upgraded to 1.3 the problem was gone, which tells me it was a bug in GraphicsMagick 1.1.