Maximally Stable Extremal Regions (MSER) Implementation in Document Image Character Patch Identification

My task is to identify character patches within a document image. Consider the image below:
Based on the paper, to extract character patches, the MSER-based method will be adopted to detect character candidates.
"The main advantage of the MSER based method is that such algorithm is
able to find most legible characters even when the document image is in
low quality."
Another paper discusses MSER in more detail. I'm having a hard time understanding that paper. Can anyone explain to me in simple terms the steps that I should take to implement MSER and extract character patches from my sample document? I will implement it in Python, and I need to fully grasp / understand how MSER works.
Below are the steps to identify character patches in the document image (based on my understanding; please correct me if I am wrong).
"First, pixels are sorted by intensity"
My comprehension:
Say, for example, I have 5 pixels in an image with intensities (Pixel 1) 1, (Pixel 2) 9, (Pixel 3) 255, (Pixel 4) 3, (Pixel 5) 4. Sorted by increasing intensity, this yields the order Pixel 1, 4, 5, 2 and 3.
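In Python, I imagine this step looking something like the following (just a sketch using NumPy with my 5-pixel example; please correct me if this is not what the paper means):

import numpy as np

intensities = np.array([1, 9, 255, 3, 4])   # Pixel 1..5 from my example above
order = np.argsort(intensities)             # indices sorted by increasing intensity
print(order + 1)                            # -> [1 4 5 2 3], i.e. Pixel 1, 4, 5, 2, 3

# for a whole image the same idea would apply to the flattened array:
# order = np.argsort(gray, axis=None)
# rows, cols = np.unravel_index(order, gray.shape)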
"After sorting, pixels are placed in the image (either in decreasing or increasing order) and the list of connected components and their areas is maintained using the efficient union-find algorithm."
My Comprehension:
Using the example in number 1, pixels will be arranged as below. The pixel component/group and image X,Y coordinates are just examples.
Pixel Number | Intensity Level | Pixel Component/Group | Image X,Y Coordinates
1 | 1 | Pixel Component # 5 | (14,12)
4 | 3 | Pixel Component # 1 | (234,213)
5 | 4 | Pixel Component # 2 | (231,14)
2 | 9 | Pixel Component # 3 | (23,21)
3 | 255 | Pixel Component # 1 | (234,214)
"The process produces a data structure storing the area of each connected component as a function of intensity."
My comprehension:
A column called Area will be added to the table in #2. It will count the number of pixels in a specific component with the same intensity level. It's like an aggregation of pixels within the component group with the same intensity level.
4."Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing MSER. In the output, each MSER is represented by position of a local intensity minimum (or maximum) and a threshold."
How do I get the local minima of the rate of change of the area function?
Please help me understand what MSER does and how to implement it. Feel free to correct my understanding. Thanks.

In one article the authors track a value they call "stability" (which roughly means the rate of change of area when going from region to region in their data structure), and then find regions corresponding to local minima of that value (a local minimum is a point where the value of interest is smaller than in its closest neighbors). If that is of any help, here is a C++ implementation of MSER (based on another article).
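To make that concrete, here is a small NumPy sketch of computing such a stability value for a single component and picking its local minima, given its area as a function of the intensity threshold (the area values and the delta parameter are made up for illustration; this is only a sketch of the idea, not a full MSER implementation):

import numpy as np

# area[t] = number of pixels in one connected component when thresholding at intensity t
area = np.array([5, 6, 7, 8, 30, 31, 32, 33, 90, 200], dtype=float)  # made-up values
delta = 2   # how far apart the two thresholds used for the rate of change are

# stability q(t) = (area(t + delta) - area(t - delta)) / area(t)
q = (area[2 * delta:] - area[:-2 * delta]) / area[delta:-delta]

# a local minimum of q is smaller than both of its neighbours
is_min = (q[1:-1] < q[:-2]) & (q[1:-1] < q[2:])
stable = np.where(is_min)[0] + delta + 1   # shift back to intensity indices
print(stable)   # intensities at which this component is "maximally stable"

For a reference to check your own implementation against on your sample document, OpenCV also ships a ready-made detector: cv2.MSER_create() followed by detectRegions() on a grayscale image returns the candidate regions together with their bounding boxes, which you can use directly as character patches.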

Related

Image regression with unknown number of targets/labels

I have greyscale images with an unknown number of handwritten digits (0-9) on them.
I am trying to build a machine learning model that determines:
The x,y coordinate for each digit.
The digit label (i.e. 0-9).
Example
(I couldn't upload the greyscale images, so suppose . denotes "black background", and the numbers represent themselves):
Image1: Image2: Image3:
7....... .2...... ........
........ .....3.. ........
....1... ........ ........
........ ....2... ........
Thus, letting f denote my machine learning model/function we should have:
f(Image1) = [ label0:[], label1:[(x=4,y=2)], label2:[], label3:[], label4:[],
              label5:[], label6:[], label7:[(x=0,y=0)], label8:[], label9:[] ]
f(Image2) = [ label0:[], label1:[], label2:[(x=1,y=0), (x=1,y=3)], label3:[(x=5,y=1)],
              label4:[], label5:[], label6:[], label7:[], label8:[], label9:[] ]
f(Image3) = [ label0:[], label1:[], label2:[], label3:[], label4:[],
              label5:[], label6:[], label7:[], label8:[], label9:[] ]
I'm attempting to apply deep learning methods using Keras to solve both problems at the same time, but I'm struggling to set up my labels, as there is an unknown number of labels for each image.
Anyone have any ideas about how I could set up such a problem for deep learning? Should I break the problem into 2 stages (location then classification - but then the location problem still has an unknown number of labels)? Thanks!
You can divide this problem into two parts.
In the first part you should create a method for detecting whether or not there is a digit in a patch of the image. For this purpose you can use a method called "sliding windows" (watch this video by Andrew Ng explaining the method). Assume you have an image of size 200x200 and each digit is around 20x20. You create a window of size 20x20; in each iteration the window moves right by 20 pixels (or less/more), and when the window reaches the right edge of the image it moves back to the left side and 20 pixels down (or less/more). After each move of the window you crop the image and check, using a neural network, whether there is a digit in the cropped patch. If there is, you save the x, y coordinates of the window and the cropped patch to an array.
The second part should be easy: having the digit patches, you pass them to a neural network which determines each digit's label.
So, you should train two neural networks - one for detecting whether or not there is a digit in a window, and a second for determining the label of the digit.
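A minimal sketch of the sliding-window part in Python (the has_digit and classify_digit functions are hypothetical stand-ins for the two trained networks; the 20-pixel window and stride are just the example values from above):

import numpy as np

def detect_digits(image, has_digit, classify_digit, win=20, stride=20):
    """Slide a win x win window over a greyscale image and collect digit detections.

    has_digit(patch)      -> probability that the patch contains a digit (hypothetical model)
    classify_digit(patch) -> label 0-9 for a patch known to contain a digit (hypothetical model)
    """
    detections = {label: [] for label in range(10)}
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            if has_digit(patch) > 0.5:            # stage 1: is there a digit here?
                label = classify_digit(patch)     # stage 2: which digit is it?
                detections[label].append((x, y))
    return detections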
There's a second way to find digits in an image: you can train a neural network which determines the number of digits in the image (this might be difficult), and then use k-means (with the number of clusters set to the number of digits you got from the NN) to find the positions of the digits, provided they're not too close to each other. I did this in one project and it worked, but you need images with a plain background, and you have to build an array of the positions of pixels whose brightness exceeds some threshold.
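A rough sketch of that second approach, assuming the digit count n_digits has already been predicted by a network and using scikit-learn's KMeans:

import numpy as np
from sklearn.cluster import KMeans

def locate_digits(image, n_digits, brightness_threshold=128):
    """Cluster the coordinates of bright pixels into n_digits groups."""
    ys, xs = np.nonzero(image > brightness_threshold)   # positions of "ink" pixels
    coords = np.column_stack([xs, ys])
    kmeans = KMeans(n_clusters=n_digits, n_init=10).fit(coords)
    return kmeans.cluster_centers_                        # approximate (x, y) of each digit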

Detecting individual images in an array of images

I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism, and then I'm almost done.
I'm using a line scan sensor, so it's one pixel wide by 2000 high. The data stream I will be sending to the PC over USB (with an FTDI FIFO bridge) will be just 1-byte values for the pixels. The scanner will pull through an entire strip of 36 frames, so I will end up scanning the entire strip. For the beginning I'm willing to manually split the frames up in Photoshop, but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near-black strips between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could:
- calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number);
- set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1.
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
- use FFTW to calculate a discrete Fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period);
- find argmax(abs(S(f))); that f represents the spacing of the black bars: number of rows / f is the bar distance;
- S(f) is complex and thus has a phase, arg(S(f_max)) = arctan2(imag(S(f_max)), real(S(f_max))); dividing that phase by 2*pi and multiplying by the bar distance gives you the offset (position) of the bars.
To calculate the width of the bars, you could do the same with the second-highest peak of abs(S(f)), but it will probably be easier to just count the average length of the runs of 0 around the calculated centre positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border the filmstrip material, x being the coordinate along that row. Now use a simplistic high-pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
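A rough sketch of the first two steps plus the frame splitting, in Python/NumPy for brevity (the same logic ports directly to your C++ program; the threshold and minimum gap length are guesses to be tuned on real scans):

import numpy as np

def split_frames(strip, dark_threshold=30, min_gap_rows=20):
    """Split a scanned film strip (rows x 2000 array) at the near-black gaps.

    dark_threshold and min_gap_rows are guesses to be tuned on real scans.
    Returns a list of (start_row, end_row) pairs, one per frame.
    """
    s = strip.mean(axis=1)             # s(n): average pixel value per row
    is_dark = s < dark_threshold       # thresholded version of s(n)

    frames, start, gap = [], None, 0
    for n, dark in enumerate(is_dark):
        if dark:
            gap += 1
            if gap == min_gap_rows and start is not None:
                frames.append((start, n - min_gap_rows + 1))   # close the current frame
                start = None
        else:
            if start is None:
                start = n                                      # a new frame begins
            gap = 0
    if start is not None:
        frames.append((start, len(s)))
    return frames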
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: use the resulting coordinates with something like OpenCV, or any other library capable of reading images, extracting sub-images by coordinates, and saving them as new images.

"Barcode" reading from scanned image

I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality and I can understand why normal barcodes fail.
My idea is to create a non-standard and very simple barcode at the top of each printed page. It will be 20 squares in a row forming a simple binary code: filled = 1, open = 0. It will be large enough on an A4 page to make detection easy.
At this stage I need to load the image and find the barcode somewhere near the top. It will not be at exactly the same spot each time the page is scanned in. Then I step into each block and build up the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you know their polarity.
For this to work, you need to frame your cells with two black ones on both ends to make sure you know where the code starts/stops (if the scale is fixed, you can do with just a start cell, but I would not recommend this).
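A minimal sketch of that ray-scanning idea in Python (one horizontal scanline through an assumed region of interest; the threshold and the 20-cell layout follow the question, the rest is made up for illustration):

import numpy as np

def read_code_from_scanline(scanline, n_cells=20, threshold=128):
    """Decode a row of filled/open squares from one scanline across the code.

    scanline : 1-D array of grey values covering the code and a little margin.
    Assumes the code is framed by black cells, so the first and last transitions
    mark its extent.
    """
    dark = scanline < threshold
    transitions = np.flatnonzero(np.diff(dark.astype(int)))  # white/black and black/white edges
    if len(transitions) < 2:
        return None                                          # no code found on this ray
    left, right = transitions[0], transitions[-1]
    cell_width = (right - left) / n_cells
    centers = (left + (np.arange(n_cells) + 0.5) * cell_width).astype(int)
    return [1 if dark[c] else 0 for c in centers]            # filled = 1, open = 0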
You could have a look at https://github.com/zxing/zxing. I would suggest to use a 1D bar code, but wide enough to match the low resolution of the scanner.
You could also invent your own barcode encoding and try to parse it yourself. Use thick bars for 1 and thin lines for 0. A thick bar would be, for instance, 2 white pixels followed by 4 black pixels. A thin line would be 2 white pixels, 2 black pixels and 2 white pixels. The last two pixels encode the bit value.
Each "pixel" here should be the size of a scanned-image pixel.
You then process the image scan line by scan line, trying to locate the barcode.
You locate the barcode by comparing a given pixel-value sequence with a pattern. This is performed by computing a score function; the sum of squared differences is a good pick. When computing the score, ignore the two pixels that encode the bit value.
When the score is below a threshold, you have found a matching pattern. It is good to add parity bits to the encoded value so that its validity can be checked.
Computing a sum of squares over a sliding window can also be optimized.
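A small sketch of that matching step on one scan line (the 6-pixel bar pattern follows the thick/thin scheme described above; the threshold value is illustrative only):

import numpy as np

# One bar is 6 pixels: 2 white, 2 black, then 2 pixels that encode the bit
# (black-black for a thick bar = 1, white-white for a thin bar = 0).
PATTERN = np.array([255, 255, 0, 0, 0, 0], dtype=float)   # last two values are placeholders
IGNORE_LAST = 2                                            # don't score the bit-encoding pixels

def find_bars(scanline, max_score=20000):
    """Return (position, bit) pairs where the bar pattern matches on a scanline.

    max_score is an illustrative threshold to be tuned on real scans.
    """
    s = scanline.astype(float)
    n = len(PATTERN)
    hits = []
    for i in range(len(s) - n + 1):
        window = s[i:i + n]
        diff = window[:-IGNORE_LAST] - PATTERN[:-IGNORE_LAST]
        score = np.sum(diff ** 2)                  # sum of squared differences
        if score < max_score:
            bit = 1 if window[-IGNORE_LAST:].mean() < 128 else 0  # dark tail -> thick bar -> 1
            hits.append((i, bit))
    return hits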

What are some common focus stacking algorithms?

I want to write my own focus stacking software but haven't been able to find a suitable explanation of any algorithm for extracting the in-focus portions of each image in the stack.
For those who are not familiar with focus stacking, this Wikipedia article does a nice job of explaining the idea.
Can anyone point me in the right direction for finding an algorithm? Even some key words to search would be helpful.
I realise this is over a year old but for anyone who is interested...
I have had a fair bit of experience in machine vision and this is how I would do it:
Load every image in memory
Perform a Gaussian blur on each image on one of the channels (maybe Green):
The simplest Gaussian kernel is:
1 2 1
2 4 2
1 2 1
The idea is to loop through every pixel and look at the pixels immediately adjacent. The pixel that you are looping through is multiplied by 4, the neighboring pixels are multiplied by whatever value corresponds to the kernel above, and the sum is divided by 16 (the total of the kernel weights) so the overall brightness stays the same.
You can make a larger Gaussian kernel by using the equation:
exp(-(x*x + y*y)/(2*c*c))
where c is the strength of the blur
Perform a Laplacian Edge Detection kernel on each Gaussian Blurred image but do not apply a threshold
The simplest Laplacian operator is:
-1 -1 -1
-1 8 -1
-1 -1 -1
Same deal as the Gaussian: slide the kernel over the entire image and generate a result.
An equation to work out larger kernels is here:
(-1/(pi*c^4)) * (1 - (x*x + y*y)/(2*c*c)) * exp(-(x*x + y*y)/(2*c*c))
Take the absolute value of the Laplacian of Gaussian result. This will quantify the strength of edges with respect to the size and strength of your kernel.
Now create a blank image, loop through each pixel, find the strongest edge in the LoG results (i.e. the highest value at that pixel across the image stack), and take the RGB value for that pixel from the corresponding image.
Here is an example in MATLAB that I have created:
http://www.magiclantern.fm/forum/index.php?topic=11886.0
You are free to use it for whatever you like. It will create a file called Outsharp.bmp which is what you are after.
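If Python is easier to follow than MATLAB, here is a rough NumPy/SciPy sketch of the same pipeline (blur/LoG on the green channel, take the absolute value, then pick each pixel from the image with the strongest response; the sigma value is illustrative):

import numpy as np
from scipy import ndimage

def focus_stack(images, sigma=2.0):
    """Merge a list of aligned RGB images (HxWx3 uint8) into one in-focus image."""
    stack = np.stack(images)                                   # (N, H, W, 3)
    # edge strength per image: |Laplacian of Gaussian| on the green channel
    sharpness = np.stack([
        np.abs(ndimage.gaussian_laplace(img[:, :, 1].astype(float), sigma=sigma))
        for img in images
    ])                                                          # (N, H, W)
    # smooth the focus map so the individual images merge more cleanly
    sharpness = np.stack([ndimage.gaussian_filter(s, sigma=sigma) for s in sharpness])
    best = np.argmax(sharpness, axis=0)                         # which image wins at each pixel
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]                              # (H, W, 3) merged result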
To better your output image you could:
- Compensate for differences in lightness levels between images (i.e. histogram matching or simple level adjustment)
- Create a custom algorithm to reject image noise
- Manually adjust the stack after you have generated it
- Apply a Gaussian blur (be sure to divide the result by 16) on the focus map so that the individual images are better merged
Good luck!

Is there any way to divide rgb color palette?

I'm trying to generate a color palette which has 16 colors, and I will display this palette in a 4x4 grid. So I have to find a way to divide the RGB color space (256*256*256 colors) into 16 colors equally and logically. I think it's going to be a mathematical algorithm, because I'm trying to pick 16 vectors from the 3-dimensional RGB space at equal spacing.
Actually, I have found a way, based on this "dividing the color palette" problem. I will use these color values after converting the RGB values to HSV (hue, saturation, value). That way I can use one integer value between 0-360 (the hue), or one integer between 0-100 (%), for my color palette. Finally, I can easily use these values for searching/filtering my data based on color selection: I'm dividing the 0-360 range into 16 pieces equally, so I can easily define 16 different colors.
But thanks for the different approaches.
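For example, with Python's built-in colorsys module this hue-division idea could look roughly like this (saturation and value fixed at 100%, which is my own assumption):

import colorsys

# divide the 0-360 hue range into 16 equal pieces, full saturation and value
palette = []
for i in range(16):
    hue = i * (360 / 16)                                # 0, 22.5, 45, ... degrees
    r, g, b = colorsys.hsv_to_rgb(hue / 360, 1.0, 1.0)  # colorsys expects 0-1 ranges
    palette.append((int(r * 255), int(g * 255), int(b * 255)))

print(palette)   # 16 evenly spaced hues as RGB tuples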
You are basically projecting a cube (R X G X B) onto a square (4 X 4). First, I would start by asking myself what size cube fits inside that square.
1 X 1 X 1 = 1
2 X 2 X 2 = 8
3 X 3 X 3 = 27
The largest cube that fits in the square has 8 colors. At that point, I would note how conveniently 8 is an integral factor of 16.
I think the convenience would tempt me to use 8 basic colors in 2 variants like light and dark or saturated and unsaturated.
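A quick sketch of that idea in Python (the 8 corners of the RGB cube in a darker and a lighter variant; the exact dark/light levels are just a guess, and note that black shows up in both variants, so in practice you might swap one copy for a grey):

from itertools import product

dark, light = 128, 255          # assumed levels for the two variants
palette = []
for level in (dark, light):
    for r, g, b in product((0, 1), repeat=3):    # the 8 corners of the RGB cube
        palette.append((r * level, g * level, b * level))

print(len(palette), palette)    # 16 colors: 8 dark + 8 light variants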
You can approach this as a purely mathematical equipartition problem, but then it isn't really about color.
If you are trying to equipartition a color palette in a way that is meaningful to human perception, there are a large number of non-linearities that need to be taken into account which this article only mentions. For example, the colors #fffffe, #fffeff, and #feffff occupy far corners of the mathematical space, but are nearly indistinguishable to the human eye.
When the number of selected colors (16) is so small (especially compared to the number of available colors), you'll be much better off hand-picking the good-looking palette or using a standard one (like some pre-defined system or Web palette for 16 color systems) instead of trying to invent a mathematical algorithm for selecting the palette.
A lot depends on what the colors are for. If you just want 16 somewhat arbitrary colors, I would suggest:
black darkgray lightgray white
darkred darkgreen darkblue darkyellow
medred medgreen medblue medyellow
lightred lightgreen lightblue lightyellow
I used that color set for a somewhat cartoonish-colored game (VGA) and found it worked pretty well. I think I sequenced the colors a little differently, but the sequence above would seem logical if arranged in a 4x4 square.
This is a standard problem, known as color quantization.
There are a couple of algorithms for this.
Objective: you basically want to make 16 clusters of your pixels in a 3-dimensional space where each of the 3 axes varies from 0 to 255.
Methods are:
1) Rounding to the first (most significant) bits of each channel - very easy to implement but does not give good results.
2) Histogram method - takes medium effort and gives better results.
3) Quad tree - a state-of-the-art data structure; gives the best results, but implementing the quad tree data structure is hard.
There might be some more algorithms, but I have used these 3.
Start with the color as an integer for obvious math (or start with hex if you can think in base 16). For each desired sample, add the step size to the color. Convert the color integer to hex, and then split the hex into RGB. In this code example the last color will be within one step of hex white (0xffffff).
# calculate color sample sizes
divisions = 16  # number of desired color samples
total_colors = 256**3 - 1
color_samples = int(total_colors / divisions)
print('{0:,} colors in {1:,} parts requires {2:,} per step'.format(total_colors, divisions, color_samples))

# loop to print results
ii = 0
for io in range(0, total_colors, color_samples):
    hex_color = '{0:0>6}'.format(hex(io)[2:])
    rc = hex_color[0:2]  # red
    gc = hex_color[2:4]  # green
    bc = hex_color[4:6]  # blue
    print('{2:>5,} - {0:>10,} in hex {1} | '.format(io, hex_color, ii), end='')
    print('r-{0} g-{1} b-{2} | '.format(rc, gc, bc), end='')
    print('r-{0:0>3} g-{1:0>3} b-{2:0>3}'.format(int(rc, 16), int(gc, 16), int(bc, 16)))
    ii += 1
