I am trying to understand MATLAB's code for the Hough Transform.
Some items are clear to me in this picture:
binary_image is the monochrome version of input_image.
hough_lines is a vector containing the detected lines in the image. I see that four lines have been detected.
T contains the thetas in the (ϴ, ρ) space of the image.
R contains the rhos in the (ϴ, ρ) space of the image.
I have the following questions:
Why is the image rotated before applying Hough Transform?
What do the entries in H represent?
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
Why is T of size 1x180? Where does this size come from?
Why is R of size 1x45? Where does this size come from?
What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?
29 162
29 165
28 170
21 5
29 158
Why is the value 5 passed into houghpeaks()?
What is the logic behind ceil(0.3*max(H(:)))?
Relevant source code
% Read image into workspace.
input_image = imread('Untitled.bmp');
%Rotate the image.
rotated_image = imrotate(input_image,33,'crop');
% Convert RGB to grayscale.
rotated_image = rgb2gray(rotated_image);
%Create a binary image.
binary_image = edge(rotated_image,'canny');
%Create the Hough transform using the binary image.
[H,T,R] = hough(binary_image);
%Find peaks in the Hough transform of the image.
P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));
%Find lines
hough_lines = houghlines(binary_image,T,R,P,'FillGap',5,'MinLength',7);
% Plot the detected lines
figure, imshow(rotated_image), hold on
max_len = 0;
for k = 1:length(hough_lines)
xy = [hough_lines(k).point1; hough_lines(k).point2];
plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');
% Plot beginnings and ends of lines
plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');
% Determine the endpoints of the longest line segment
len = norm(hough_lines(k).point1 - hough_lines(k).point2);
if ( len > max_len)
max_len = len;
xy_long = xy;
end
end
% Highlight the longest line segment by coloring it cyan.
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','cyan');
Those are some good questions. Here are my answers for you:
Why is the image rotated before applying Hough Transform?
I don't believe this is MATLAB's official example; I just took a quick look at the documentation page for the function, and I believe you pulled this from another website that we don't have access to. In any case, it is generally not necessary to rotate an image prior to using the Hough Transform. The goal of the Hough Transform is to find lines in the image at any orientation, so rotating the image should not affect the results. If I were to guess, the rotation was performed as a preemptive measure because the lines in the example image were most likely oriented at a 33 degree angle clockwise; performing the reverse rotation makes the lines more or less straight.
What do the entries in H represent?
H is what is known as an accumulator matrix. Before we get into what the purpose of H is and how to interpret the matrix, you need to know how the Hough Transform works. With the Hough transform, we first perform an edge detection on the image. This is done using the Canny edge detector in your case. If you recall the Hough Transform, we can parameterize a line using the following relationship:
rho = x*cos(theta) + y*sin(theta)
x and y are points in the image and most customarily they are edge points. theta would be the angle made from the intersection of a line drawn from the origin meeting with the line drawn through the edge point. rho would be the perpendicular distance from the origin to this line drawn through (x, y) at the angle theta.
Note that the equation can yield infinitely many lines through a given (x, y), so it's common to bin or discretize the total number of possible angles to a predefined amount. MATLAB by default assumes there are 180 possible angles that range over [-90, 90) with a sampling factor of 1, i.e. [-90, -89, -88, ..., 88, 89]. What you generally do is, for each edge point, search over this predefined set of angles and determine the corresponding rho for each. Then you count how many times you see each (rho, theta) pair. Here's a quick example pulled from Wikipedia:
Source: Wikipedia: Hough Transform
Here we see three black dots that follow a straight line. Ideally, the Hough Transform should determine that these black dots together form a straight line. To give you a sense of the calculations, take a look at the example at 30 degrees. Consulting the formula above, when we extend a line through each point such that the angle made from the origin to this line is 30 degrees, we find the perpendicular distance from that line to the origin.
Now what's interesting is if you see the perpendicular distance shown at 60 degrees for each point, the distance is more or less the same at about 80 pixels. Seeing this rho and theta pair for each of the three points is the driving force behind the Hough Transform. Also, what's nice about the above formula is that it will implicitly find the perpendicular distance for you.
The process of the Hough Transform is very simple. Suppose we have an edge detected image I and a set of angles theta:
For each point (x, y) in the image:
    For each angle theta in the set of angles:
        Substitute theta into: rho = x*cos(theta) + y*sin(theta)
        Solve for rho to find the perpendicular distance
        Remember this (rho, theta) pair and increase its count by 1
So ideally, if we had edge points that follow a straight line, we should see a rho and theta pair where the count of how many times we see this pair is relatively high. This is the purpose of the accumulator matrix H. The rows denote a unique rho value and the columns denote a unique theta value.
An example of this is shown below:
Source: Google Patents
Therefore using an example from this matrix, located at theta between 25 - 30 with a rho of 4 - 4.5, we have found that there are 8 edge points that would be characterized by a line given this rho, theta range pair.
Note that the range of rho also covers infinitely many values, so you need to not only restrict the range of rho, but also discretize rho with a sampling interval. The default in MATLAB is 1. Any rho you calculate will generally be a floating point value, so you round it to the nearest bin to determine the final rho.
For the above example the rho resolution is 0.5, so that means that for example if you calculated a rho value that falls between 2 to 2.5, it falls in the first column. Also note that the theta values are binned in intervals of 5. You traditionally would compute the Hough Transform with a theta sampling interval of 1, then you merge the bins together. However for the defaults of MATLAB, the bin size is 1. This accumulator matrix tells you how many times an edge point fits a particular rho and theta combination. Therefore, if we see many points that get mapped to a particular rho and theta value, this is a great potential for a line to be detected here and that is defined by rho = x*cos(theta) + y*sin(theta).
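To make the accumulation concrete, here is a minimal sketch in plain MATLAB. This is not how MATLAB's hough is implemented internally; it only assumes a logical edge image BW and the default bins:
% Minimal Hough accumulation sketch over a logical edge image BW.
thetas = -90:89;                                 % default angle range in degrees
[y, x] = find(BW);                               % coordinates of all edge points
x = x - 1;  y = y - 1;                           % zero-based, so rho stays inside the bins
D = sqrt((size(BW,1) - 1)^2 + (size(BW,2) - 1)^2);
C = ceil(D);                                     % diagonal bounds the largest rho
rhos = -C:C;                                     % rho bins at a resolution of 1
H = zeros(numel(rhos), numel(thetas));           % rows = rho bins, columns = theta bins
for k = 1:numel(x)
    for t = 1:numel(thetas)
        rho = x(k)*cosd(thetas(t)) + y(k)*sind(thetas(t));
        r = round(rho) + C + 1;                  % map rho to a row index
        H(r, t) = H(r, t) + 1;                   % vote for this (rho, theta) pair
    end
end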
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
This is a consequence of the previous point. Take note that the largest distance we would expect from the origin to any point in the image is bounded by the diagonal of the image. This makes sense, because going from the top left corner to the bottom right corner, or from the bottom left corner to the top right corner, gives you the greatest distance expected in the image. In general this is D = sqrt(rows^2 + cols^2), where rows and cols are the number of rows and columns of the image; MATLAB measures it between pixel centres, i.e. D = sqrt((rows - 1)^2 + (cols - 1)^2).
For the MATLAB defaults, the range of rho spans from -ceil(D) to ceil(D) in steps of 1. Your image is 16 x 16, so D = sqrt(15^2 + 15^2) ≈ 21.21 and rho spans from -22 to 22, giving 45 unique rho values. Remember that the default resolution of theta goes from [-90, 90) in steps of 1, resulting in 180 unique angle values. Going with this, we have 45 rows and 180 columns in the accumulator matrix, and hence H is 45 x 180.
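If you want to verify these sizes directly, a quick check (using a hypothetical 16 x 16 test image; the variable names are mine) is:
BW = false(16); BW(8,:) = true;    % hypothetical 16 x 16 edge image
[H, T, R] = hough(BW);
size(H)                            % 45 x 180
size(T)                            % 1 x 180, the angles -90:89
size(R)                            % 1 x 45, the rhos -22:22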
Why is T of size 1x180? Where does this size come from?
This is an array that tells you all of the angles that were being used in the Hough Transform. This should be an array going from -90 to 89 in steps of 1.
Why is R of size 1x45? Where does this size come from?
This is an array that tells you all of the rho values that were being used in the Hough Transform. This should be an array that spans from -22 to 22 in steps of 1.
What you should take away from this is that each value in H counts how many times we have seen a particular pair of rho and theta, such that R(i) <= rho < R(i + 1) and T(j) <= theta < T(j + 1), where i spans from 1 to 44 and j spans from 1 to 179. In other words, each entry counts the edge points that fall into a particular (rho, theta) bin.
What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?
P is the output of the houghpeaks function. Basically, it determines the possible lines by finding where the peaks in the accumulator matrix occur. The rows of P are the locations in H where a peak was found. These locations are:
29 162
29 165
28 170
21 5
29 158
Each row gives you a gateway to the rho and theta parameters required to generate the detected line. Specifically, the first line is characterized by rho = R(29) and theta = T(162), the second line by rho = R(29) and theta = T(165), and so on. To answer your question, the values in P are neither (x, y) nor (ρ, ϴ). They are row and column indices into H: cross-referencing them with R and T gives you the parameters that characterize each line detected in the image.
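In code, recovering the line parameters is just an index lookup (a short sketch using the variables from the question):
peak_rhos   = R(P(:,1));   % first column of P indexes into R
peak_thetas = T(P(:,2));   % second column of P indexes into T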
Why is the value 5 passed into houghpeaks()?
The 5 passed to houghpeaks is the maximum number of peaks, i.e. candidate lines, you would ideally like to detect. We can see that P has 5 rows, corresponding to 5 lines. If 5 lines can't be found, MATLAB returns as many as possible.
What is the logic behind ceil(0.3*max(H(:)))?
The logic behind this is that to find peaks in the accumulator matrix, you have to define a minimum threshold that decides whether a particular rho and theta combination counts as a valid line. Making this threshold too low reports a lot of false lines, and making it too high misses real lines. What they decided to do here was find the largest bin count in the accumulator matrix, take 30% of that, take the mathematical ceiling, and treat any value in the accumulator matrix larger than this amount as a candidate line.
Hope this helps!
I have the following image:
The coordinates corresponding to the white blobs in the image are sorted according to the increasing value of x-coordinate. However, I want them to follow the following pattern:
(In a zig-zag manner from bottom left to top left.)
Any clue how I can go about it? Any clue regarding the algorithm will be appreciated.
The set of coordinates are as follows:
[46.5000000000000,104.500000000000]
[57.5000000000000,164.500000000000]
[59.5000000000000,280.500000000000]
[96.5000000000000,66.5000000000000]
[127.500000000000,103.500000000000]
[142.500000000000,34.5000000000000]
[156.500000000000,173.500000000000]
[168.500000000000,68.5000000000000]
[175.500000000000,12.5000000000000]
[198.500000000000,37.5000000000000]
[206.500000000000,103.500000000000]
[216.500000000000,267.500000000000]
[225.500000000000,14.5000000000000]
[234.500000000000,62.5000000000000]
[251.500000000000,166.500000000000]
[258.500000000000,32.5000000000000]
[271.500000000000,13.5000000000000]
[284.500000000000,103.500000000000]
[291.500000000000,61.5000000000000]
[313.500000000000,32.5000000000000]
[318.500000000000,10.5000000000000]
[320.500000000000,267.500000000000]
[352.500000000000,57.5000000000000]
[359.500000000000,102.500000000000]
[360.500000000000,167.500000000000]
[366.500000000000,11.5000000000000]
[366.500000000000,34.5000000000000]
[408.500000000000,9.50000000000000]
[414.500000000000,62.5000000000000]
[419.500000000000,34.5000000000000]
[451.500000000000,12.5000000000000]
[456.500000000000,97.5000000000000]
[457.500000000000,168.500000000000]
[465.500000000000,62.5000000000000]
[465.500000000000,271.500000000000]
[468.500000000000,31.5000000000000]
[498.500000000000,10.5000000000000]
[522.500000000000,105.500000000000]
[524.500000000000,32.5000000000000]
[533.500000000000,60.5000000000000]
[534.500000000000,11.5000000000000]
[565.500000000000,164.500000000000]
[576.500000000000,33.5000000000000]
[581.500000000000,10.5000000000000]
[582.500000000000,67.5000000000000]
[586.500000000000,267.500000000000]
[590.500000000000,102.500000000000]
[622.500000000000,10.5000000000000]
[630.500000000000,32.5000000000000]
[646.500000000000,58.5000000000000]
[653.500000000000,94.5000000000000]
[669.500000000000,8.50000000000000]
[678.500000000000,167.500000000000]
[680.500000000000,31.5000000000000]
[705.500000000000,57.5000000000000]
[719.500000000000,9.50000000000000]
[729.500000000000,271.500000000000]
[732.500000000000,33.5000000000000]
[733.500000000000,97.5000000000000]
[757.500000000000,11.5000000000000]
[758.500000000000,59.5000000000000]
[778.500000000000,157.500000000000]
[792.500000000000,31.5000000000000]
[802.500000000000,10.5000000000000]
[812.500000000000,94.5000000000000]
[834.500000000000,59.5000000000000]
[839.500000000000,30.5000000000000]
[865.500000000000,160.500000000000]
[866.500000000000,272.500000000000]
[885.500000000000,58.5000000000000]
[892.500000000000,97.5000000000000]
[955.500000000000,94.5000000000000]
[963.500000000000,163.500000000000]
[972.500000000000,265.500000000000]
Building upon uSeemSurprised's answer, I would go for a three-step approach:
Sort the points list by y-coord. This is O(n log n)
Determine the y-axis ranges. I simply iterate over the points and take note of where the y-coord difference is larger than a threshold value. This is O(n) of course
Sort each of the sublists that represent the y-axis lines by x-coord. If we had m sublists of k items each this would be O(m (k log k)); so the overall process is still O(n log n)
The code:
def zigzag(points, threshold=10.0):
    #step 1: sort by y-coordinate
    points.sort(key=lambda p: p[1])
    #step 2: find the indices where a new y-range (row) begins
    breaks = []
    for i in range(1, len(points)):
        if points[i][1] - points[i-1][1] > threshold:
            breaks.append(i)
    breaks.append(len(points))  #close the final row
    #step 3: sort each row by x-coord, flipping direction every other row
    rev = False
    start = 0
    outpoints = []
    for b in breaks:
        outpoints += sorted(points[start:b], reverse=rev)
        start = b
        rev = not rev
    return outpoints
You can sort by x-coordinate within bands of y-coordinates: all coordinates whose y-values fall within a certain range belong to the same band and are sorted by x among themselves. Each time you move up to a different y-band, you flip the sorting order, i.e. increasing, then decreasing, and so on.
The most similar algorithm I can think of is Andrew's algorithm for convex hulls, specifically the lower hull (though depending on the coordinate system, you may need to use the upper hull instead).
Running the lower hull algorithm and removing points until no points remain would get you what you want. To get the zig-zag patterning, reverse the ordering every other time you run it.
Here are implementations in most languages:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain
Edit: The downside here is precision in the case of fuzzy measurements. You may need to adjust the algorithm a bit if convex hulls aren't exactly what you need, e.g. if you want to consider a point still part of the hull when it is within, say, 0.1 units or 1% of lying on the hull. In the example given, the coordinates are exactly on the line, so it would work well, but not so much if the coordinates were randomly perturbed within, say, 0.1 of their actual positions.
This approach assumes you know how many rows you expect, although I suspect there are programmatic ways you could estimate that.
nbins = 6; % Number of horizontal rows we expect
[bin,binC] = kmedoids(A(:,2),nbins); % Use a clustering approach to group them
bin = binC(bin); % Clusters in random order, fix it so that clusters
[~,~,bin] = unique(bin); % are ordered by central y value
xord = A(:,1) .* (-1).^mod(bin+1,2); % flip/flop for each row making the x-coord +ve or -ve
% so that we can sort in a zig-zag
[~,idx] = sortrows([bin,xord], [1,2]); % Sort by the clusters and the zig-zag
B = A( idx, : ); % Create re-ordered array
Plotting this, it seems to be what you want:
figure(99); clf; hold on;
plot( A(:,1), A(:,2), '-o' );
plot( B(:,1), B(:,2), '-', 'linewidth', 1.5 );
set(gca, 'YDir', 'reverse');
legend( {'Original','Reordered'} );
Use a nearest neighbor search, where you define a custom distance measure which makes distance in the Y direction more expensive than distance in the X direction. Then start the algorithm with the bottom left point.
The "normal" Euclidean distance in Cartesian coordinates is calculated by sqrt( (x2 - x1)^2 + (y2 - y1)^2 )
To make the y direction more expensive, use a custom distance formula where you multiply the y result by a constant:
sqrt( (x2 - x1)^2 + k*(y2 - y1)^2 )
where the constant k is larger than 1 but not much larger, I would start with 2.
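A greedy sketch of this idea in MATLAB, assuming A is the N-by-2 [x y] array of points (all variable names here are my own) and using k = 2:
% Greedy nearest-neighbour ordering with a y-weighted distance (sketch).
k = 2;                                    % y-distance penalty factor
N = size(A, 1);
visited = false(N, 1);
order = zeros(N, 1);
[~, cur] = max(A(:,2) - A(:,1));          % heuristic start: bottom-left (image coords)
for i = 1:N
    order(i) = cur;
    visited(cur) = true;
    if i < N
        d = (A(:,1) - A(cur,1)).^2 + k*(A(:,2) - A(cur,2)).^2;
        d(visited) = inf;                 % never revisit a point
        [~, cur] = min(d);                % jump to the cheapest remaining neighbour
    end
end
B = A(order, :);                          % points in visiting order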
I have looked at many articles and answers to questions on how the Viola-Jones algorithm really works. I keep finding answers that say the "sum of pixels" in a certain region is subtracted from the "sum of pixels" in the adjacent region. I'm confused about what "sum of pixels" means. What is the value based on? Is it the number of pixels in the area? The intensity of the color?
Thanks in advance.
These are the definitions based on the Viola-Jones paper 'Robust Real-time Object Detection':
Integral image: the value of the integral image ii at location (x, y) is
ii(x, y) => the sum of the pixels above and to the left of (x, y), inclusive
Here 'sum of pixels' means the sum of the pixel intensity values (e.g., for an 8-bit grayscale image, a value between 0 and 255) over all pixels above and to the left of (x, y), including row x and column y, assuming a grayscale representation.
The significance of the integral image is that it speeds up the computation of the sum of pixel intensities within any rectangular block of pixels to just four array references.
The integral image value ii(x, y) at each point can be computed in one pass over the original image i(x, y), using the following recurrences at each point during the pass, as detailed in the reference paper:
s(x,y) = s(x,y-1) + i(x,y);
ii(x,y) = ii(x-1,y) + s(x,y);
where
s(x,y) = the cumulative row sum;
s(x,-1) = 0;
ii(-1,y) = 0;
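As a sketch of why this is fast in practice (assuming img is a grayscale image stored as a double matrix; the border checks for r1 = 1 or c1 = 1 are omitted for brevity):
ii = cumsum(cumsum(img, 1), 2);   % integral image, equivalent to the recurrences above
% Sum over the rectangle with corners (r1, c1) and (r2, c2), inclusive,
% using only four array references:
S = ii(r2, c2) - ii(r1-1, c2) - ii(r2, c1-1) + ii(r1-1, c1-1);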
These integral image values are then used to generate features to learn and later detect objects.
The original Viola-Jones algorithm uses "Haar-like" features, which are approximations of first and second Gaussian derivative filters.
Gaussian derivative filters look like this:
Haar-like filters look like this:
The reason Viola and Jones used Haar-like filters, is that they can be evaluated very efficiently. All you have to do is subtract the sum of pixels covered by the black region of the filter from the sum of pixels covered by the white region. And since the regions are rectangular, the sum of the pixels in each region can be efficiently calculated from the corresponding integral image.
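To make that concrete, here is a hedged sketch of evaluating a two-rectangle (left/right) Haar-like feature from an integral image ii; the rectangle coordinates r1, c1, r2, c2 and the split column cMid are hypothetical, and border handling is omitted:
% Sum over a rectangle using four references into the integral image ii.
rectsum = @(r1, c1, r2, c2) ii(r2, c2) - ii(r1-1, c2) - ii(r2, c1-1) + ii(r1-1, c1-1);
white = rectsum(r1, c1,       r2, cMid);   % sum of pixels under the white region
black = rectsum(r1, cMid + 1, r2, c2);     % sum of pixels under the black region
f = white - black;                         % the Haar-like feature response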
I have a list of points moving in two dimensions (x- and y-axis) represented as rows in an array. I might have N points - i.e., N rows:
1 t1 x1 y1
2 t2 x2 y2
.
.
.
N tN xN yN
where ti, xi, and yi are the time index, x-coordinate, and y-coordinate for point i. The time index ti is an integer from 1 to T. The number of points at each possible time index can vary from 0 to N (still with only N points in total).
My goal is to filter out all the points that do not move in a certain way, or to keep only those that do. A point must move in a parabolic trajectory with decreasing x- and y-coordinates (i.e., moving to the left and downwards only). Points with other dynamic behaviour must be removed.
Can I use a simple sorting mechanism on this array and then analyse the order of the time indices? I have also considered the fact that points sharing the same time index ti are physically distinct points, and so should be paired up with points at other time indices. The complexity of the problem grew - and now I turn to you.
NOTE: You can assume that the points are confined to a sub-region of the (x, y)-plane between two parabolic curves. These curves intersect at only one point: a point close to the origin of motion for any point.
More Information:
I have made some datafiles available:
MATLAB datafile (1.17 kB)
same data as CSV with semicolon as column separator (2.77 kB)
Necessary context:
The datafile holds one uint32 array with 176 rows and 5 columns. The columns are:
pixel x-coordinate in 175-by-175 lattice
pixel y-coordinate in 175-by-175 lattice
discrete theta angle-index
time index (from 1 to T = 10)
row index for this original sorting
The points "live" in a 175-by-175 pixel-lattice - and again inside the upper quadrant of a circle with radius 175. The points travel on the circle circumference in a counterclockwise rotation to a certain angle theta with horizontal, where they are thrown off into something close to a parabolic orbit. Column 3 holds a discrete index into a list with indices 1 to 45 from 0 to 90 degress (one index thus spans 2 degrees). The theta-angle was originally deduces solely from the points by setting up the trivial equations of motions and solving for the angle. This gives rise to a quasi-symmetric quartic which can be solved in close-form. The actual metric radius of the circle is 0.2 m and the pixel coordinate were converted from pixel-coordinate to metric using simple linear interpolation (but what we see here are the points in original pixel-space).
My problem is that some points are not behaving properly, and since I need to do statistics on the theta angle, I need to remove the points that certainly do NOT move in a parabolic trajectory. These errors are expected and fully natural, but still need to be filtered out.
MATLAB plot code:
% load data and setup variables:
load mat_points.mat;
num_r = 175;
num_T = 10;
num_gridN = 20;
% begin plotting:
figure(1000);
clf;
plot( ...
num_r * cos(0:0.1:pi/2), ...
num_r * sin(0:0.1:pi/2), ...
'Color', 'k', ...
'LineWidth', 2 ...
);
axis equal;
xlim([0 num_r]);
ylim([0 num_r]);
hold all;
% setup grid (yea... went crazy with one):
vec_tickValues = linspace(0, num_r, num_gridN);
cell_tickLabels = repmat({''}, size(vec_tickValues));
cell_tickLabels{1} = sprintf('%u', vec_tickValues(1));
cell_tickLabels{end} = sprintf('%u', vec_tickValues(end));
set(gca, 'XTick', vec_tickValues);
set(gca, 'XTickLabel', cell_tickLabels);
set(gca, 'YTick', vec_tickValues);
set(gca, 'YTickLabel', cell_tickLabels);
set(gca, 'GridLineStyle', '-');
grid on;
% plot points per timeindex (with increasing brightness):
vec_grayIndex = linspace(0,0.9,num_T);
for num_kt = 1:num_T
vec_xCoords = mat_points((mat_points(:,4) == num_kt), 1);
vec_yCoords = mat_points((mat_points(:,4) == num_kt), 2);
plot(vec_xCoords, vec_yCoords, 'o', ...
'MarkerEdgeColor', 'k', ...
'MarkerFaceColor', vec_grayIndex(num_kt) * ones(1,3) ...
);
end
Thanks :)
Why, it looks almost as if you're simulating a radar tracking debris from the collision of two missiles...
Anyway, let's coin a new term: object. Objects are moving along parabolae and at certain times they may emit flashes that appear as points. There are also other points which we are trying to filter out.
We will need some more information:
Can we assume that the objects obey the physics of things falling under gravity?
Must every object emit a point at every timestep during its lifetime?
Speaking of lifetime, do all objects begin at the same time? Can some expire before others?
How precise is the data? Is it exact? Is there a measure of error? To put it another way, do we understand how poorly the points from an object might fit a perfect parabola?
Sort the data with (index, time) as keys and, for all locations of a point i, see if they follow a parabolic trajectory?
Which part are you having trouble with? Sorting should be very easy. IMHO, it is the second part (testing whether a set of points follows a parabolic trajectory) that is difficult.
Here's the problem: I have a number of binary images composed by traces of different thickness. Below there are two images to illustrate the problem:
First Image - size: 711 x 643 px
Second Image - size: 930 x 951 px
What I need is to measure the average thickness (in pixels) of the traces in the images. In fact, the average thickness of traces in an image is a somewhat subjective measure. So, what I need is a measure that has some correlation with the radius of the trace, as indicated in the figure below:
Notes
Since the measure doesn't need to be very precise, I am willing to trade precision for speed. In other words, speed is an important factor to the solution of this problem.
There might be intersections in the traces.
The trace thickness might not be constant, but an average measure is OK (even the maximum trace thickness is acceptable).
The trace will always be much longer than it is wide.
I'd suggest this algorithm:
Apply a distance transformation to the image, so that all background pixels are set to 0, all foreground pixels are set to the distance from the background
Find the local maxima in the distance transformed image. These are points in the middle of the lines. Put their pixel values (i.e. distances from the background) into a list
Calculate the median or average of that list
I was impressed by @nikie's answer, and gave it a try ...
I simplified the algorithm to just get the maximum value, not the mean, thus avoiding the local-maxima detection step. I think this is enough if the stroke is well behaved (although for self-intersecting lines it may not be accurate).
The program in Mathematica is:
m = Import["http://imgur.com/3Zs7m.png"] (* Get image from web*)
s = Abs[ImageData[m] - 1]; (* Invert colors to detect background *)
k = DistanceTransform[Image[s]] (* White Pxs converted to distance to black*)
k // ImageAdjust (* Show the image *)
Max[ImageData[k]] (* Get the max stroke width *)
The generated result is
The numerical value (28.46 px x 2) fits my measurement of 56 px pretty well (although your value is 100 px :* )
Edit - Implemented the full algorithm
Well ... sort of ... instead of searching the local maxima, finding the fixed point of the distance transformation. Almost, but not quite completely unlike the same thing :)
m = Import["http://imgur.com/3Zs7m.png"]; (*Get image from web*)
s = Abs[ImageData[m] - 1]; (*Invert colors to detect background*)
k = DistanceTransform[Image[s]]; (*White Pxs converted to distance to black*)
Print["Distance to Background*"]
k // ImageAdjust (*Show the image*)
Print["Local Maxima"]
weights =
  Binarize[FixedPoint[ImageAdjust@DistanceTransform[Image[#], .4] &, s]]
Print["Stroke Width =",
2 Mean[Select[Flatten[ImageData[k]] Flatten[ImageData[weights]], # != 0 &]]]
As you may see, the result is very similar to the previous one, obtained with the simplified algorithm.
From Here. A simple method!
3.1 Estimating Pen Width
The pen thickness may be readily estimated from the area A and perimeter length L of the foreground
T = A/(L/2)
In essence, we have reshaped the foreground into a rectangle and measured the length of the longest side. Stronger modelling of the pen, for instance as a disc yielding circular ends, might allow greater precision, but rasterisation error would compromise the significance.
While precision is not a major issue, we do need to consider bias and singularities.
We should therefore calculate area A and perimeter length L using functions which take into account "roundedness".
In MATLAB
A = bwarea(BW)              % BW is your binary image
L = bwarea(bwperim(BW, 8))
Since I don't have MATLAB at hand, I made a small program in Mathematica:
m = Binarize[Import["http://imgur.com/3Zs7m.png"]] (* Get Image *)
k = Binarize[MorphologicalPerimeter[m]] (* Get Perimeter *)
p = N[2 Count[ImageData[m], Except[1], 2]/
Count[ImageData[k], Except[0], 2]] (* Calculate *)
The output is 36 Px ...
Perimeter image follows
HTH!
It's been 3 years since the question was asked :)
Following the procedure of @nikie, here is a MATLAB implementation of the stroke width.
clc;
clear;
close all;
I = imread('3Zs7m.png');
X = im2bw(I, 0.8);                   % binarize: strokes -> 0, background -> 1
subplot(2,2,1);                      % (the original used a custom subplottight helper)
imshow(X);
Dist = bwdist(X);                    % distance from each stroke pixel to the background
subplot(2,2,2);
imshow(Dist, []);
RegionMax = imregionalmax(Dist);     % local maxima lie on the stroke centerline
[x, y] = find(RegionMax ~= 0);
subplot(2,2,3);
imshow(RegionMax);
List = zeros(1, numel(x));
for i = 1:numel(x)
    List(i) = Dist(x(i), y(i));
end
fprintf('Stroke Width = %.2f \n', 2*mean(List));  % centerline distance is half the width
Assuming that the trace has constant thickness, is much longer than it is wide, is not too strongly curved and has no intersections / crossings, I suggest an edge detection algorithm which also determines the direction of the edge, then a rise/fall detector with some trigonometry and a minimization algorithm. This gives you the minimal thickness across a relatively straight part of the curve.
I guess the error to be up to 25%.
First use an edge detector that gives us the information where an edge is and which direction (in 45° or PI/4 steps) it has. This is done by filtering with 4 different 3x3 matrices (Example).
Usually I'd say it's enough to scan the image horizontally, though you could also scan vertically or diagonally.
Assuming line-by-line (horizontal) scanning, once we find an edge, we check if it's a rise (going from background to trace color) or a fall (to background). If the edge's direction is at a right angle to the direction of scanning, skip it.
If you found one rise and one fall with the correct directions and without any disturbance in between, measure the distance from the rise to the fall. If the direction is diagonal, multiply by squareroot of 2. Store this measure together with the coordinate data.
The algorithm must then search along an edge (can't find a web resource on that right now) for neighboring (by their coordinates) measurements. If there is a local minimum with a padding of maybe 4 to 5 size units to each side (a value to play with - larger: less information, smaller: more noise), this measure qualifies as a candidate. This is to ensure that the ends of the trail or a section bent too much are not taken into account.
The minimum of that would be the measurement. Plausibility check: If the trace is not too tangled, there should be a lot of values in that area.
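Ignoring the edge-direction filtering and the diagonal correction described above, the core rise-to-fall measurement can be sketched in MATLAB like this (an untested sketch, assuming BW is the binary image with trace pixels set to 1):
% Collect the horizontal run lengths (rise-to-fall distances) of the trace.
runs = [];
for row = 1:size(BW, 1)
    d = diff([0 BW(row, :) 0]);      % +1 marks a rise, -1 marks a fall
    rises = find(d == 1);
    falls = find(d == -1);
    runs = [runs, falls - rises];    % width of each horizontal crossing
end
estimate = min(runs);                % stand-in for the local-minimum search above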
Please comment if there are more questions. :-)
Here is an answer that works in any computer language without the need of special functions...
Basic idea: Try to fit a circle into the black areas of the image. If you can, try with a bigger circle.
Algorithm:
set image background = 0 and trace = 1
initialize array result[]
set minimalExpectedWidth
set w = minimalExpectedWidth
loop
    set counter = 0
    create a matrix of zeros of size w x w
    within a circle of diameter w in that matrix, put ones
    calculate the area of the circle (= PI * (w/2)^2)
    loop through all pixels of the image
        optimization: if the current pixel is of background color -> continue loop
        multiply the matrix with the image at each pixel (i.e. filter the image with that matrix)
        (you can do this using the current x and y position and a double for loop from 0 to w)
        take the sum of the result of each multiplication
        if the sum equals the calculated circle area, increment counter by one
    store counter in result[w - minimalExpectedWidth]
    increment w by one
    optimization: include the algorithm from further down here
while counter is greater than zero
Now the result array contains the number of matches for each tested width.
Graph it to have a look at it.
For a width of one, this will be equal to the number of pixels of trace color. For greater width values, fewer circles will fit into the trace. The result array will thus steadily decrease until there is a sudden drop. This is because the filter matrix with the circular area of that width now only fits into intersections.
Right before the drop is the width of your trace. If the width is not constant, the drop will not be that sudden.
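The counting loop can be sketched in MATLAB as follows (an untested sketch, assuming BW is the binary image with trace = 1 and a hypothetical minimal width wMin):
wMin = 3;                                 % minimal expected width (assumption)
result = [];
w = wMin;
cnt = 1;
while cnt > 0
    r = w / 2;
    [xx, yy] = meshgrid(-ceil(r):ceil(r));
    se = (xx.^2 + yy.^2) <= r^2;          % binary disk of diameter w
    % positions where the disk fits entirely inside the trace:
    hits = conv2(double(BW), double(se), 'same') == nnz(se);
    cnt = nnz(hits);
    result(end + 1) = cnt;                % matches for this width
    w = w + 1;
end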
I don't have MATLAB here for testing and don't know for sure about a function to detect this sudden drop, but we do know that the decrease is continuous, so I'd take the maximum of the second derivative of the (zero-based) result array like this
Algorithm:
set maximum = 0
set widthFound = 0
set minimalExpectedWidth as above
set prevValue = result[0]
set index = 1
set prevFirstDerivative = result[1] - prevValue
loop until index is greater than result length
    firstDerivative = result[index] - prevValue
    set secondDerivative = firstDerivative - prevFirstDerivative
    if abs(secondDerivative) > maximum
        set maximum = abs(secondDerivative)
        set widthFound = index + minimalExpectedWidth
    set prevFirstDerivative = firstDerivative
    set prevValue = result[index]
    increment index by one
return widthFound
Now widthFound is the trace width for which (in relation to width + 1) many more matches were found.
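Staying with the untested-sketch caveat, the drop detection itself is short in MATLAB (assuming result and wMin from the sketch above; the double differencing introduces a small index offset that I ignore here):
[~, k] = max(abs(diff(result, 2)));   % position of the largest second derivative
widthFound = k + wMin;                % map back to an absolute width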
I know that this is in part covered in some of the other answers, but my description is pretty much straightforward and you don't have to have learned image processing to do it.
I have an interesting solution:
Do edge detection, for edge pixels extraction.
Do physical simulation - consider edge pixels as positively charged particles.
Now put some number of free positively charged particles in the stroke area.
Calculate electrical force equations for determining movement of these free particles.
Simulate particles movement for some time until particles reach position equilibrium.
(As they will be repelled from both stroke edges, after some time they will settle along the middle line of the stroke.)
Now stroke thickness / 2 would be the average distance from an edge particle to the nearest free particle.