I have a barcode image. I have to make it smaller.
Can that damage the barcode?
Proportional scaling
Not proportional scaling (only height changes)
Barcodes are: Type UPC-A / EAN-13 "vertical lines". Sorry not an expert in barcodes, thought the type of barcode would not be important. Scaling is moderate, the image does not lose relevant data.
Regular barcode (=vertical stripes) is recognized by the relative width of the lines. Thus, the horizontal height only matters for robustness against diagonal scanning. If the codes are scanned with a hand scanner, I'd just scale the height (or crop the image). In any case, the different widths of the lines should still be clearly visible. There may be compliance rules suggesting minimum proportions for a given barcode standard.
For regular linear product barcodes, the simple answer is yes, you can scale it (both case are safe).
However, if you scale too far and the bars end up too close together, you will start to get a high level of read errors.
You'll need to test it with an appropriate barcode reader to make sure you haven't scaled too much.
When scaling a barcode, there are several things you must keep in mind.
1) You get the absolute sharpest edges in a barcode if each module (the narrowest bar) is a whole number of pixels wide.
2) If the module width is not a whole number of pixels, produce a barcode where the width of each module is the truncated whole number and use bilinear interpolation to scale up. This will give you at most one pixel of gradient at the edges.
3) Be careful when buying a barcode library, choose one that includes built-in scaling that preserves the barcode, such as this one or this one. Barcodes have special demands that image processing normally does not have, such as pixel-perfection. Using e.g. Gimp might damage the barcode.
Related
Just a straight forward question. I´m trying to make the best possible choice here and there is too much information for a "semi-beginner" like me.
Well, at this point, I´m trying with screen size values for my layout (activity_main.xml (normal, large, small)) and with different densities (xhdpi, xxhdpi, mhdpi) and, if a can say so myself, it is a mess. Do I have to create every possible option to support all screen sizes and densities? Or am I doing something really wrong here? what is the best approach for this?
My layouts are now activity_main(normal_land_xxhdpi) and I have serious doubts about it.
I´m using last version of android studio of course. My app is a single activity with buttons, textview and others. Does not have any fragments or intents whatsoever, and for that reason I think this has to be an easy task, but not for me.
Hope you guys can help. I don't think i need to put any code here, but if needed, i can add it.
If you want to make a responsive UI for every device you need to learn about some things first:
-Difference between PX, DP:
https://developer.android.com/training/multiscreen/screendensities
Here you can understand that dp is a standard measure which android uses to calculate how much pixels, lets say a line, should have to keep the proportions between different screensizes with different densities.
-Resolution, Density and Ratio:
The resolution is how much pixels a screen has over height and width. This pixels can be smaller or bigger, so for instance, if you have a screen A with 10x10 px whose pixels are two times smaller than other screen B with 10 x 10 pixels too, A is two times smaller than B even when both have 10 x 10 px.
For that reason exists the meaning Density, which is how much pixels your screen has for every inch, so you can measure the quality of a screen where most pixels per inch (ppi) is better.
Ratio tells you how much pixels are for height as well as width, for example the ratio of a screen of 1000 x 2000 px is 1:2, a full hd screen of 1920 x 1080 is 16:9 (16 pixels height for every 9 pixels width). A 1:1 ratio is a square screen.
-Standard device's resolutions
You can find the most common measurements on...
https://material.io/resources/devices/
When making a UI, you use the DP measurements. You will realize that even when resolution between screens are different, the DP is the same cause they have different densities.
Now, the right way is going with constraint layout using dp measures to put your views on screen, with correct constraints the content will adapt to other screen sizes
Anyway, you will need to make additional XML for some cases:
-Different orientation
-Different ratio
-Different DP resolution (not px)
For every activity, you need to provide a portrait and landscape design. If other device has different ratio, maybe you will need to adjust the height or width due to the proportions of the screens aren't the same. Finally, even if the ratio is the same, the DP resolution could be different, maybe you designed an activity for a 640x360dp and other device has 853x480dp, which means you will have more vertical space.
You can read more here:
https://developer.android.com/training/multiscreen/screensizes
And learn how to use constraintLayout correctly:
https://developer.android.com/training/constraint-layout?hl=es-419
Note:
Maybe it seems to be so much work for every activity, but you make the first design and then you just need to copy the design to other xml with some qualifiers and change the dp values to adjust the views as you wants (without making from scratch) which is really faster.
While people usually tend to simply resize any image into a square while training a CNN (for example, resnet takes a 224x224 square image), that looks ugly to me, especially when the aspect ratio is not around 1.
(In fact, that might change ground truth, for example, the label that an expert might give the distorted image could be different than the original one).
So now I resize the image to, say, 224x160 , keeping the original ratio, and then I pad the image with 0s (by pasting it into a random location in a totally black 224x224 image).
My approach doesn't seem original to me, and yet I cannot find any information whatsoever about my approach versus the "usual" approach.
Funky!
So, which approach is better? Why? (if the answer is data dependent, please share your thoughts regarding when one is preferable to the other.)
According to Jeremy Howard, padding a big piece of the image (64x160 pixels) will have the following effect: The CNN will have to learn that the black part of the image is not relevant and does not help distinguishing between the classes (in a classification setting), as there is no correlation between the pixels in the black part and belonging to a given class. As you are not hard coding this, the CNN will have to learn it by gradient descent, and this might probably take some epochs. For this reason, you can do it if you have lots of images and computational power, but if you are on a budget on any of them, resizing should work better.
Sorry, this is late but this answer is for anyone facing the same issue.
First, if scaling with changing the aspect ratio will affect some important features, then you have to use zero-padding.
Zero padding doesn't make it take longer for the network to learn because of the large black area itself but because of the different possible locations that the unpadded image could be inside the padded image since you can pad an image in many ways.
For areas with zero pixels, the output of the convolution operation is zero. The same with max or average pooling. Also, you can prove that the weight is not updated after backpropagation if the input associated with that weight is zero under some activation functions (e.g. relu, sigmoid). So the large area doesn't make any updates to the weights in this sense.
However, the relative position of the unpadded image inside the padded image does indeed affect training. This is not due to the convolution nor the pooling layers but the last fully connected layer(s). For example, if the unpadded image is on the left relative inside the padded image and the output of flattening the last convolution or pooling layer was [1, 0, 0] and the output for the same unpadded image but on the right relative inside the padded image was [0, 0, 1] then the fully connected layer(s) must learn that [1, 0, 0] and [0, 0, 1] are the same thing for a classification problem.
Therefore, learning the equivariance of different possible positions of the image is what makes training take more time. If you have 1,000,000 images then after resizing you will have the same number of images; on the other hand, if you pad and want to consider different possible locations (10 randomly for each image) then you will have 10,000,000 images. That is, training will take 10 times longer.
That said, it depends on your problem and what you want to achieve. Also, testing both methods will not hurt.
I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality and can understand why it normal barcodes fail.
My idea is to create a non standard and very simple barcode at the top of each page printed. It will be 20 squares in a row forming a simple binary code.Filled = 1, open = 0. It will be large enough on aA4 to make detection easy.
At this stage I need to load the image and find the barcode somewhere at the top. It will not be exactly at the same spot as it is scanned in. Step into each block and build the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you known their polarity.
For this to work, you need to frame your cells with two black ones on both ends to make sure to know where it starts/stops (if the scale is fixed, you can do with just a start cell, but I would not recommend this).
You could have a look at https://github.com/zxing/zxing. I would suggest to use a 1D bar code, but wide enough to match the low resolution of the scanner.
You could also invent your own bar code encoding and try to parse it your self. Use thick bars for 1 and thin lines for 0. A thick bar would be for instance 2 white pixels, 4 black pixels. A thin line would be 2 white pixels, 2 black pixels and 2 white pixels. The last two pixels encode the bit value.
The pixel should be the size of the scanned image pixel.
You then process the image scan line by scan line, trying to locate the bar code.
We locate the bar code by comparing a given pixel value sequence with a pattern. This is performed by computing a score function. The sum of squared difference is a good pick. When computing the score we ignore the two pixels encoding the bit value.
When the score is below a threshold, we found a matching pattern. It is good to add parity bits to the encoded value so that it's validity can be checked.
Computing a sum of square on a sliding window can be optimized.
I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and use the left and right bounds of the first appearance of a black pixel. This fails due to the fact that people can write on invoices.
2) Divide the invoice up in vertical sections, use the sections that have the highest amount of black pixels. Fails due to the fact that the distribution is not always uniform amongst similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) on what I should focus as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to that. One of these numbers can be chosen from an interval [0°, 90°), the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide accross that (cyclic) region and find a value where the maximal number of lines are within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use a FFT-based correlation of the image to match it to the template. Convert both images to gray, pad them with zeros till the originals take up at most 1/2 the edge length of the padded image, which preferrably should be a power of two. FFT both images in both directions, multiply them element-wise and iFFT back. The resulting image will encode how much the two images would agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. Generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png ).
i'm working in a project to recognize a bit code from an image like this, where black rectangle represents 0 bit, and white (white space, not visible) 1 bit.
Somebody have any idea to process the image in order to extract this informations? My project is written in java, but any solution is accepted.
thanks all for support.
I'm not an expert in image processing, I try to apply Edge Detection using Canny Edge Detector Implementation, free java implementation find here. I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduce it (scale factor = 0.4) to have fast processing and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how i can decode white rectangle with 0 bit value, and no rectangle with 1?
The image have 10 line X 16 columns. I don't use python, but i can try to convert it to Java.
Many thanks to support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and vertical profile of the image. i.e. sum up values in all columns and all rows and store in arrays. for an image of MxN (width x height) you will have M cells in horizontal profile and N cells in vertical profile.
Use a thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column. So black cells will define a location of interest (where you will expect the marks).
Based on this, you can define in lozenges in the form and you get coordinates of lozenges (rectangles where you have marks) and then you just add up pixel values in each lozenge and based on the number, you can define if it has mark or not.
Case 2: Skew (slant in the image)
Use fourier (FFT) to find the slant value and then transform it.
Case 3: Stretch or shrink
Pretty much the same as 1 but noise is higher and reliability less.
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python, you can use the imaging library to open the image. Then load the height and the widths, and segment the image into a grid accordingly, by Height/Rows and Width/Cols. Then, just look for black pixels in those regions, or whatever color PIL registers that black to be. This obviously relies on the grid like nature of the data.
Edit:
Doing Edge Detection may also be Fruitful. First apply an edge detection method like something from wikipedia. I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 (if you want the boxes darker just increase this value) into black and otherwise make it completely white. Then create bounding boxes, lines where the pixels are all white. If data isn't terribly skewed, then this should work pretty well, otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
Edit2:
Denis, how large is your dataset and how large are the images? If you have thousands of these images, then it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think the prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous method of segmenting may be applied, if you do preprocess to bin in the following manner, in which case you need only count the number of black or white pixels and threshold after some training samples.