Here is an example of binary images, i.e. as input we have an imageByteArray with 2 possible values: 0 and 255.
Example1:
Example2:
The image contains some document edge on a background.
The task is to remove, or at least reduce, the background pixels with minimal impact on the edge pixels.
The question is: which modern algorithms or techniques exist to do this?
What I do not expect as an answer: use Gaussian blur to get rid of background noise, use bitonal algorithm (Canny, Sobel, etc.) thresholds or use Hough (Hough linearization goes crazy on such noise no matter what options are set)
The simplest solution is to detect all contours and filter out the shortest ones. This works well, but depending on the image it sometimes also erases a lot of useful edge pixels.
Update:
As input I have a standard RGB image of a document (driver license ID, check, bill, credit card, ...) on some background. The main task is to detect the document edges. The next steps are well known: greyscale, blur, Sobel binarization, probabilistic Hough, then find a rectangle or trapezium (if a trapezium shape is found, go on to a perspective transformation). On simple, contrasting backgrounds it all works fine. The reason I am asking about noise reduction is that I have to work with thousands of backgrounds, and some of them produce noise no matter which options are used. The noise causes additional lines no matter how Hough is configured, and those additional lines may fool the subsequent logic and seriously affect performance. (It is implemented in JavaScript, with no OpenCV or GPU support.)
It's hard to know whether this approach will work with all your images since you only provided one, but a Hough Line detection with ImageMagick and these parameters in the Terminal command-line produces this:
convert card.jpg \
\( +clone -background none -fill red -stroke red \
-strokewidth 2 -hough-lines 49x49+100 -write lines.mvg \
\) -composite hough.png
and the file lines.mvg contains 4 lines as follows:
# Hough line transform: 49x49+100
viewbox 0 0 1024 765
line 168.14,0 141.425,765 # 215
line 0,155.493 1024,191.252 # 226
line 0,653.606 1024,671.48 # 266
line 940.741,0 927.388,765 # 158
ImageMagick is installed on most Linux distros and is available for OSX and Windows from here.
I assume you did mean binary image instead of bitonic...
Do flood fill based segmentation:
1. Scan the image for set pixels (color=255).
2. For each set pixel, create a mask/map of its area.
   Just flood fill the set pixels with 4- or 8-neighbor connectivity and count how many pixels you filled.
3. For each filled area, compute its bounding box.
4. Detect edge lines:
   - Edge lines have an elongated rectangular bounding box, so test its aspect ratio; if it is close to square, this is not an edge line.
   - A bounding box that is too small also means not an edge line.
   - If the filled pixel count is too small compared to the bigger side of the bounding box, the area is also not an edge line.
   - You can make this more robust if you regress a line through the set pixels of each area and compute the average distance between the regressed line and each set pixel. If it is too high, the area is not an edge line.
5. Recolor the non-edge-line areas to black,
   so either subtract the mask from the image or flood fill with black again.
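The steps above could be sketched roughly as follows in plain Python. This is only an illustration under assumptions: the image is a list of rows holding 0 or 255, connectivity is 8-neighbor, and the names `find_components`, `looks_like_edge`, `remove_non_edges` as well as the thresholds `min_side`, `min_aspect` and `min_fill` are made up for this example and would need tuning on real data.

```python
from collections import deque

def find_components(img):
    """Flood fill (BFS) every 8-connected area of 255-valued pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 255 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 255 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def looks_like_edge(pixels, min_side=5, min_aspect=3.0, min_fill=0.8):
    """Bounding-box tests from step 4 (thresholds are illustrative guesses)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    box_h = max(ys) - min(ys) + 1
    box_w = max(xs) - min(xs) + 1
    long_side, short_side = max(box_h, box_w), min(box_h, box_w)
    if long_side < min_side:                  # bounding box too small
        return False
    if long_side / short_side < min_aspect:   # too close to a square
        return False
    # too few pixels compared to the bigger side of the bounding box
    return len(pixels) >= min_fill * long_side

def remove_non_edges(img, min_side=5, min_aspect=3.0, min_fill=0.8):
    """Return a copy of img with every non-edge area recolored to black."""
    out = [row[:] for row in img]
    for pixels in find_components(img):
        if not looks_like_edge(pixels, min_side, min_aspect, min_fill):
            for y, x in pixels:
                out[y][x] = 0
    return out
```

Note that a diagonal line has a nearly square bounding box, so the aspect ratio test alone would reject it; that is where the line regression check comes in.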
[notes]
Sometimes step #5 can mess up the inside of the document. In that case, do not recolor anything; instead, remember all the regressed lines for the edge areas. Then, after the whole process is done, join together all lines that are parallel and close to the same axis (infinite line). That should reduce them to 4 big lines determining the document rectangle. Now fill all outside pixels with black (using a geometric approach).
For such tasks you would usually examine the input data carefully and try to figure out which cues you can use. Unfortunately, you have provided only one example, which makes this approach pretty useless. Besides, this representation is not really comfortable to work with: have you done some preprocessing, or is this what you get as input? In the first case, you may get better advice if you show us the real input.
Next, if your goal is noise reduction and not document/background segmentation - you are really limited in options. Similar to what you said, I would try to detect connected components with 255 intensity (instead of detecting contours, which can be less robust) and remove ones with small area. That may fail on certain instances.
Besides, on image you have provided you can use local statistics to suppress areas of regular noise. This will reduce background clutter if you select neighborhood size appropriately.
But again, if you are doing this for document detection - there may be more robust approaches.
For example, if you know the foreground object (driver's ID) - you can try to collect a dataset of ID images, and calculate the 'typical' color histogram - it may be rather characteristic. After that, you can backproject this histogram on input image and get either rough region of interest, or maybe even precise mask. Then you may binarize it and try to detect contours. You may try different color spaces and bin sizes to see which fits best.
If you have to work in different lighting conditions you can try to equalize histogram or do some other preprocessing to reduce color variation caused by lighting.
Strictly answering the question for the binary image (i.e. after the harm has been done):
What seems characteristic of the edge pixels as opposed to noise is that they form (relatively) long and smooth chains.
So far I see no better way than tracing all chains of 8-connected pixels, for instance with a contour following algorithm, and detect the straight sections, for example by Douglas-Peucker simplification.
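For illustration, here is a minimal Douglas-Peucker sketch in plain Python. It assumes the chain has already been traced into an ordered list of (x, y) points; `long_straight_sections` and its `epsilon` and `min_length` defaults are hypothetical names and values chosen for this example. The recursion is fine for moderate chains, but a very long chain may need an iterative version.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point

def long_straight_sections(chain, epsilon=1.5, min_length=20.0):
    """Keep only the simplified segments that are long enough."""
    simplified = douglas_peucker(chain, epsilon)
    return [(a, b) for a, b in zip(simplified, simplified[1:])
            if math.hypot(b[0] - a[0], b[1] - a[1]) >= min_length]
```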
As the noise is only on the outside of the card, the outline of the blobs will have at least one "clean" section. Keep the sections that are long enough.
This may destroy the curved corners as well and actually you should look for the "smooth" paths that are long enough.
Unfortunately, I cannot recommend any specific algorithm to address that. It should probably be based on graph analysis combined with geometry (enumerating long paths in a graph and checking the local/global curvature).
As far as I know (after reading thousands of related articles), this is not addressed anywhere in the literature.
None of the previous answers would really work. The only thing that can work here is a blob filter: filter the image so that blobs below a certain size get deleted.
Related
I'm writing a voronoi-based world generator in which I distinguish between geographic features like mountains, lakes, forests, and oceans.
Each feature is given an id so it can be identified and referenced. I use a flood fill algorithm to determine what features cells belong to.
I've realized a couple similar cases where I'd like to split a feature into multiple smaller ones. The most straightforward example is that of two big forests connected by a narrow strip of forest. Realistically, it should be treated as two forests, separated from each other around the narrow strip but my fill algorithm just plows right through and labels everything as part of one large forest.
I'd like to eventually label them "West 100 Acre Wood" and "East 100 Acre Wood", giving them the knowledge that they're deriving from the same continuous body of forest. I've looked up partial flood fill logic but my search has gotten stuck due to my lack of subject terminology.
If you'd like to see the code I'm working with:
https://github.com/olinkirkland/map
You would typically use a "morphological opening" (see the Wikipedia definition), which is a morphological erosion followed by a dilation. If you imagine a white foreground object of interest on a black background, the erosion will erode (nibble away at the edges of) the object and the dilation will expand/fatten the edges back out, thereby removing small strips and narrow connections.
You can do it with the Scikit-Image module in Python, or with OpenCV in Python or C++. I choose to just do it at the command-line in Terminal here using ImageMagick which is installed on most Linux distros and is available for macOS and Windows.
So, using this map image:
I load it, invert/negate it to make the forest white, then apply the morphological opening I mentioned and then invert it back and save:
magick convert map.png -negate -morphology open disk:5 -negate result.png
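If you would rather see what the opening does than call a tool, here is a rough pure-Python sketch. It assumes the forest is given as a boolean mask (list of rows, True = forest) and uses a simple discrete disk; the helper names are made up for this example, and the disk is not guaranteed to match ImageMagick's `disk:5` kernel pixel for pixel.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) of a discrete disk-shaped structuring element."""
    r2 = radius * radius
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= r2]

def _is_set(mask, y, x):
    # Pixels outside the image count as background.
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and bool(mask[y][x])

def erode(mask, offsets):
    """A pixel survives only if the whole element fits inside the foreground."""
    return [[all(_is_set(mask, y + dy, x + dx) for dy, dx in offsets)
             for x in range(len(mask[0]))]
            for y in range(len(mask))]

def dilate(mask, offsets):
    """A pixel is set if the element touches any foreground pixel.

    The disk is symmetric, so no reflection of the offsets is needed.
    """
    return [[any(_is_set(mask, y + dy, x + dx) for dy, dx in offsets)
             for x in range(len(mask[0]))]
            for y in range(len(mask))]

def opening(mask, radius):
    """Morphological opening: erosion followed by dilation."""
    offsets = disk_offsets(radius)
    return dilate(erode(mask, offsets), offsets)
```

After the opening, a flood fill over the result should label the two forests separately, as long as the radius is larger than half the width of the connecting strip.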
After you find a connected region, you can trace around the interior using the right-hand rule (https://en.wikipedia.org/wiki/Maze_solving_algorithm#Wall_follower).
To find single-pixel paths that would make good splitting points, then, you can look for pixels in this interior path that are visited twice (once in each direction). If the path length is long enough on both sides of the split, then you can split the region into two at that pixel and maybe try again with the smaller regions.
If you want to find split points that are more than one pixel wide, or ensure that the forests on either side are "beefy" enough, I would suggest a pixel-based approach that combines this with the other methods:
BFS to remove pixels that are less than w away from the boundary.
Find each remaining connected region. Each will be the "center" of a forest.
Check each center to make sure it has pixels far enough from the edge to be the center of a forest.
Add the removed pixels back, connecting them to the closest center.
You could use a technique from image processing which uses blurring and applying a threshold of 50%. This way, thin connections and sharp spikes are reduced and features generally get rounder while the overall size of objects shouldn't change in one specific direction. Here's an image of what the process looks like when applied repetitively:
Separation of forests by blurring and applying a threshold
The top image represents your original situation with two forests which are connected by a thin corridor. The process step by step removes the corridor.
You can adjust some parameters in this process, e. g. the blurring radius and the number of steps, so you can tweak it to your needs.
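A minimal sketch of that loop in plain Python, assuming a boolean mask (True = forest). A box blur stands in for whatever blur you prefer, and `radius` and `steps` are the two knobs mentioned above; the function names are invented for this example.

```python
def box_blur(img, radius):
    """Mean over a (2*radius+1)^2 window; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    area = (2 * radius + 1) ** 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
            row.append(total / area)
        out.append(row)
    return out

def blur_and_threshold(mask, radius=1, steps=1):
    """Repeatedly blur the mask and cut it at 50%."""
    current = [[1.0 if v else 0.0 for v in row] for row in mask]
    for _ in range(steps):
        blurred = box_blur(current, radius)
        current = [[1.0 if v > 0.5 else 0.0 for v in row] for row in blurred]
    return [[v == 1.0 for v in row] for row in current]
```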
I have an image which may contain some blobs. The blobs can be any size, and some will yield a very strong signal, while others are very weak. In this question I will focus on the weak ones because they are the difficult ones to detect.
Here is an example with 4 blobs.
The blob at (480, 180) is the most difficult one to detect. Running a Gaussian filter followed by an opening operation increases the contrast a bit, but not a lot:
The tricky part of this problem is that the natural noise in the background will result in (many) pixels which have a stronger signal than the blob I want to detect. What makes the blob a blob is that it's either a large area with an average increase in intensity, (or a small area with a very strong increase in intensity (not relevant here)).
How can I include this spatial information in order to detect my blob?
It is obvious that I first need to filter the image with a Gaussian and/or median filter in order to incorporate the nearby region of each pixel into each single pixel value. However, no amount of blurring is enough to make it easy to segment the blobs from the background.
EDIT: Regarding thresholding: Thresholding is very tempting, but also problematic by itself. I do not have a region of "pure background", and the larger a blob is, the weaker the signal can be while still being detectable.
I should also note that the typical image will not have any blobs at all, but just be pure background.
You could try an h-minima transform. It removes any minima under the height of h and increases the height of all other troughs by h. It's defined as the morphological reconstruction by erosion of the image increased by the height h. Here are the results with h = 35:
It should be a lot easier to manipulate. Like segmentation, it needs an input parameter; the difference is that this is more robust. Underestimating h by a relatively large amount will only bring you back closer to the original problem image instead of failing completely.
You could try to characterize the background noise to get an estimate, assuming that whatever your application is would have a relatively constant amount of it.
Note that one blue dot between the two large bottom blobs. Even further processing is needed. You could try continuing with the morphology. Something that I have found to work in some 'ink-blot' segmentation cases like this is running through every connected component, calculating their convex hulls and finally the union of all the convex hulls in the image. It usually makes further morphological operations much easier and provides a good estimate for the label.
In my experience, if you can see your Gaussian filter size (those little circles), then your filter width is too small. Although it is terribly expensive, try bumping up the radius of your Gaussian; it should continue to improve your results until its radius matches the radius of the smallest object you are trying to find.
Following that (heavy Gaussian), I would do a peak search across the whole image. Cut out any peaks that are too low and/or have too little contrast to the nearest valley/background.
Don't be afraid to split this into two isolated processing pipelines, i.e. one filtration and extraction for low-contrast, spread-out blobs, and a completely different one to isolate high-contrast spikes (much, much easier to find). That being said, a high-contrast spike "should" survive even a pretty aggressive filter. Another thing to keep in mind is iterative subtraction: if there are some blobs that can be found easily from the get-go, pull them out of the image and then do a stretch (but be careful, as you can make the image be whatever you want it to be with too much stretching).
Maybe try an iterative approach using thresholding and edge detection:
Start with a very high threshold (say 90% signal), then run a Canny filter (or any binary edge filter you like) on the thresholded image. Count and store the number of edge pixels generated.
Repeat this step for lower and lower thresholds. At a certain point you are going to see a massive spike in edges detected (i.e. your cool textured background). Then pull the threshold back a little higher and run closing and flood fill on your resulting edge image.
I don't know much about image processing so please bear with me if this is not possible to implement.
I have several sets of aerial images of the same area originating from different sources. The pictures have been taken during different seasons, under different lighting conditions etc. Unfortunately some images look patchy and suffer from discolorations, are partially obstructed by clouds, or are pixelated, as for example in picture1 and picture2.
I would like to take as an input several images of the same area and (by some kind of averaging them) produce 1 picture of improved quality. I know some C/C++ so I could use some image processing library.
Can anybody propose any image processing algorithm to achieve it or knows any research done in this field?
I would try a "color twist" transform, i.e. a 3x3 matrix applied to the RGB components. To implement it, you need to pick color samples in areas that are split by a border, on both sides. You should find three significantly different reference colors (hence six samples). This will allow you to write the nine linear equations that determine the matrix coefficients.
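As a rough sketch of the linear algebra involved (plain Python, no library): if the three altered samples are stacked as the columns of a matrix S and the three reference samples as the columns of D, the nine equations read M·S = D, so M = D·S⁻¹. The function names are invented for this example, and it assumes the three source colors are linearly independent.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("reference colors are not linearly independent")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def matmul_3x3(p, q):
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def fit_color_twist(src_colors, dst_colors):
    """Find M with M @ src = dst for three (R, G, B) sample pairs."""
    s = [[src_colors[k][r] for k in range(3)] for r in range(3)]  # columns = samples
    d = [[dst_colors[k][r] for k in range(3)] for r in range(3)]
    return matmul_3x3(d, invert_3x3(s))

def apply_twist(m, color):
    """Apply the 3x3 matrix to one (R, G, B) color (no clamping here)."""
    return [sum(m[r][c] * color[c] for c in range(3)) for r in range(3)]
```

In practice you would average several pixels per sample to reduce noise, and clamp the corrected values to the valid range before writing them back.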
Then you will correct the altered areas by means of this color twist. As the geometry of these areas is intertwined with the field patches, I don't see a better way than contouring the regions by hand.
In the case of the second picture, the limits of the regions are blurred so that you will need to blur the region mask as well and perform blending.
In any case, don't expect a perfect repair of those problems as the transform might be nonlinear, and completely erasing the edges will be difficult. I also think that colors are so washed out at places that restoring them might create ugly artifacts.
For the sake of illustration, here is a quick attempt with Photoshop using manual HLS adjustment (less powerful than a color twist).
The first thing I thought of was a kernel matrix of sorts.
Do a first pass of the photo and use an edge detection algorithm to determine the borders between the photos - this should be fairly trivial, however you will need to eliminate any overlap/fading (looks like there's a bit in picture 2), you'll see why in a minute.
Do a second pass right along each border you've detected, and assume that the pixel on either side of the border should be the same color. Determine the difference between the red, green and blue values and average them along the entire length of the line, then divide it by two. The image with the lower red, green or blue value gets this new value added. The one with the higher red, green or blue value gets this value subtracted.
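A small sketch of that second pass, assuming you have already collected the pixels directly on each side of one border as two equally long lists of (R, G, B) tuples. The names `border_correction` and `shift_colors` are made up for this example.

```python
def border_correction(side_a, side_b):
    """Half of the mean per-channel difference (B minus A) along the border.

    Adding the result to side A and subtracting it from side B moves both
    sides toward their common average.
    """
    n = len(side_a)
    half = []
    for ch in range(3):
        mean_diff = sum(b[ch] - a[ch] for a, b in zip(side_a, side_b)) / n
        half.append(mean_diff / 2)
    return half

def shift_colors(pixels, offsets, sign):
    """Add (sign=+1) or subtract (sign=-1) the offsets, clamped to 0..255."""
    return [tuple(min(255, max(0, round(p[ch] + sign * offsets[ch])))
                  for ch in range(3))
            for p in pixels]
```

Usage would be something like `half = border_correction(a, b)`, then `shift_colors(image_a_pixels, half, +1)` and `shift_colors(image_b_pixels, half, -1)`. Because the difference is signed per channel, the "lower gets added, higher gets subtracted" rule falls out automatically.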
On either side of this line, every pixel should now be the exact same. You can remove one of these rows if you'd like, but if the lines don't run the length of the image this could cause size issues, and the line will likely not be very noticeable.
This could be made far more complicated by generating a filter by passing along this line - I'll leave that to you.
The issue with this could be where there was development/ fall colors etc, this might mess with your algorithm, but there's only one way to find out!
I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and use the left and right bounds of the first appearance of a black pixel. This fails due to the fact that people can write on invoices.
2) Divide the invoice up in vertical sections, use the sections that have the highest amount of black pixels. Fails due to the fact that the distribution is not always uniform amongst similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) on what I should focus as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to that. One of these numbers can be chosen from an interval [0°, 90°), the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) region and find a value where the maximal number of lines are within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
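The sliding-window vote could look roughly like this in plain Python. It is a sketch under assumptions: angles are in degrees, weights are positive, the window only needs to be tried at positions starting on one of the observed angles, and `dominant_angle` is a name invented here.

```python
def dominant_angle(angles, weights=None, window=5.0, period=90.0):
    """Weighted consensus direction in [0, period), using a cyclic window."""
    if not angles:
        return None
    if weights is None:
        weights = [1.0] * len(angles)
    folded = [a % period for a in angles]  # map every direction into [0, 90)
    best_weight, best_angle = -1.0, None
    for start in folded:                   # candidate window positions
        total, moment = 0.0, 0.0
        for a, w in zip(folded, weights):
            offset = (a - start) % period  # cyclic distance past the window start
            if offset < window:
                total += w
                moment += w * offset
        if total > best_weight:
            best_weight = total
            # weighted mean of the angles inside the window
            best_angle = (start + moment / total) % period
    return best_angle
```

Working with offsets relative to the window start avoids the wrap-around problem, so lines at 89° and 1° average to 0° rather than 45°.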
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use an FFT-based correlation of the image to match it to the template. Convert both images to gray, pad them with zeros until the originals take up at most 1/2 the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply one spectrum element-wise by the complex conjugate of the other, and iFFT back. The resulting image will encode how much the two images would agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
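To make the idea concrete without pulling in an FFT library, here is a brute-force sketch that evaluates the same correlation score directly in the spatial domain. It is far slower than the FFT route and only meant to show what the score surface means. It assumes both images are lists of rows in which features have high values (so invert dark-on-white scans first); `best_shift` and `max_shift` are names made up for this example.

```python
def best_shift(scan, template, max_shift):
    """Return (dy, dx) maximizing sum(scan[y+dy][x+dx] * template[y][x]).

    Pixels that fall outside the scan contribute nothing, which is what
    zero padding does in the FFT version.
    """
    sh, sw = len(scan), len(scan[0])
    best_score, best = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y, row in enumerate(template):
                sy = y + dy
                if not 0 <= sy < sh:
                    continue
                for x, t in enumerate(row):
                    sx = x + dx
                    if 0 <= sx < sw:
                        score += scan[sy][sx] * t
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```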
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. Generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png ).
I'm working on a project to recognize a bit code from an image like this, where a black rectangle represents a 0 bit and white (white space, not visible) represents a 1 bit.
Does somebody have any idea how to process the image in order to extract this information? My project is written in Java, but any solution is accepted.
Thanks all for the support.
I'm not an expert in image processing. I tried to apply edge detection using a Canny Edge Detector implementation (a free Java implementation found here). I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduced it (scale factor = 0.4) for faster processing, and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how can I decode a white rectangle as a 0 bit value, and no rectangle as 1?
The image has 10 lines x 16 columns. I don't use Python, but I can try to convert it to Java.
Many thanks for the support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and vertical profile of the image, i.e. sum up the values in all columns and all rows and store them in arrays. For an image of MxN (width x height) you will have M cells in the horizontal profile and N cells in the vertical profile.
Use thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column. So black cells will define a location of interest (where you will expect the marks).
Based on this, you can define the lozenges in the form and get their coordinates (lozenges are the rectangles where you expect marks). Then you just add up the pixel values in each lozenge and, based on the sum, decide whether it has a mark or not.
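Here is a minimal sketch of the simple case in plain Python. It assumes the image has already been binarized into rows of 0 (paper) and 1 (ink); the function names and the `threshold` and `fill` parameters are invented for this example.

```python
def profiles(img):
    """Column sums (horizontal profile) and row sums (vertical profile)."""
    width = len(img[0])
    col_profile = [sum(row[x] for row in img) for x in range(width)]
    row_profile = [sum(row) for row in img]
    return col_profile, row_profile

def runs_above(profile, threshold):
    """Half-open (start, end) index ranges where the profile exceeds threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def read_marks(img, threshold=0, fill=0.5):
    """Return a grid of booleans: True where a lozenge contains a mark."""
    col_profile, row_profile = profiles(img)
    col_runs = runs_above(col_profile, threshold)
    row_runs = runs_above(row_profile, threshold)
    grid = []
    for y0, y1 in row_runs:
        line = []
        for x0, x1 in col_runs:
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            line.append(ink >= fill * (y1 - y0) * (x1 - x0))
        grid.append(line)
    return grid
```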
Case 2: Skew (slant in the image)
Use a Fourier transform (FFT) to find the slant value and then correct the image for it.
Case 3: Stretch or shrink
Pretty much the same as 1 but noise is higher and reliability less.
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python, you can use the imaging library to open the image. Then load the height and the widths, and segment the image into a grid accordingly, by Height/Rows and Width/Cols. Then, just look for black pixels in those regions, or whatever color PIL registers that black to be. This obviously relies on the grid like nature of the data.
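In sketch form, and without depending on any imaging library, the grid approach might look like this. It assumes the image has been loaded as a list of rows of grayscale values (0..255) and cropped to the code area, and it uses the question's convention that a black rectangle is a 0 bit; `read_grid`, `dark_below` and `min_dark_fraction` are hypothetical names and values.

```python
def read_grid(img, rows, cols, dark_below=128, min_dark_fraction=0.3):
    """Split the image into rows x cols cells and read one bit per cell."""
    height, width = len(img), len(img[0])
    bits = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        line = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            dark = sum(1 for y in range(y0, y1) for x in range(x0, x1)
                       if img[y][x] < dark_below)
            area = (y1 - y0) * (x1 - x0)
            # black rectangle -> 0 bit, empty (white) cell -> 1 bit
            line.append(0 if dark >= min_dark_fraction * area else 1)
        bits.append(line)
    return bits
```

For the image in the question this would be called with `rows=10, cols=16`.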
Edit:
Doing edge detection may also be fruitful. First apply an edge detection method, such as one of those described on Wikipedia. I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 (if you want the boxes darker just increase this value) into black and otherwise make it completely white. Then create bounding boxes, lines where the pixels are all white. If the data isn't terribly skewed, then this should work pretty well; otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
Edit2:
Denis, how large is your dataset and how large are the images? If you have thousands of these images, then it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think Prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous method of segmenting may be applied, if you do preprocess to bin in the following manner, in which case you need only count the number of black or white pixels and threshold after some training samples.