From a PNG image containing a transparent region and a colored region, I would like to generate a polygon with N sides (N being configurable) approximating the edge of the image as closely as possible. I want this polygon to be defined by a series of vectors.
For example, let's consider the following image: + link to plus. I can manage to detect the edges of the image by counting, for each pixel, the number of transparent pixels around it. I get the following matrix:
0000000000000000
0000053335000000
0000030003000000
0000030003000000
0000020002000000
0533210001233500
0300000000000300
0300000000000300
0300000000000300
0533210001233500
0000020002000000
0000030003000000
0000030003000000
0000053335000000
0000000000000000
0000000000000000
I think, based on this matrix, I should be able to get the coordinates of all the corners and therefore get the vectors, but I cannot figure out how. In this case, I would like my program to return:
[7,2]->[11,2]
[11,2]->[11,6]
[11,6]->[15,6]
...
Do any of you have a suggestion or a link to do that?
Ultimately, I would also like to approximate angles other than 90 and 0 degrees, but that's really for a second stage.
I think you will find that a number of tools in the computer-vision (CV) toolkit can be of use to you. You'll do best to leverage these resources rather than roll your own solution.
The two features I think you'd be interested in extracting are edges and corners.
Edges, like what you were going for, can get you toward the outline of the shape. What you're probably not interested in right now are edge-detection techniques: these transform your image into a binary image of edge/space. Instead, you'll want to look into the Hough transform, which can give you end points for each of the lines in your image. If you are dealing with well-defined, solid, straight lines, as you seem to be, this should work quite well. You've tagged your question as Ruby, so maybe you can take a look at OpenCV (OpenCV is written in C, but the ruby-opencv and javacv projects provide bindings). Here is the Hough transform documentation for OpenCV. One thing you may find, however, is that the Hough transform doesn't give you lines which connect; this depends on the regularity/irregularity of the actual lines in your image. Because of this, you may need to manually connect the end points of the lines into a structure.
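To make that concrete, here is a rough sketch of the probabilistic Hough transform through OpenCV's Python bindings (the Ruby bindings mirror the same C API; the file name and thresholds here are illustrative guesses you would tune):
import cv2
import numpy as np

# Use the alpha channel as the shape mask, then extract its edges.
img = cv2.imread('plus.png', cv2.IMREAD_UNCHANGED)
edges = cv2.Canny(img[:, :, 3], 50, 150)

# HoughLinesP returns one (x1, y1, x2, y2) end-point tuple per segment.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=10, minLineLength=3, maxLineGap=2)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        print((x1, y1), '->', (x2, y2))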
Corners may work quite well for images such as the one you provided. The standard algorithm is Harris corner detection. Similar to the Hough transform, you can use this technique to return the 'most significant' features in the image. It is known for giving consistent results, even across different images of the same thing, and as such is often used for pattern recognition and the like. However, if your images are as simple as the one provided, you may well be able to extract all of the shape's corners in this manner. Getting the shape of the image would then just be a matter of connecting the points in a meaningful way, given your predefined N sides.
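If you want to see what Harris gives you, a minimal sketch with OpenCV's Python binding looks roughly like this (again, the file name and thresholds are placeholders):
import cv2
import numpy as np

img = cv2.imread('plus.png', cv2.IMREAD_UNCHANGED)
# cornerHarris wants a single-channel float32 image; for a
# colour-on-transparent image the alpha channel serves as intensity.
gray = np.float32(img[:, :, 3])
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep pixels whose response is near the maximum; neighbouring hits
# belong to the same corner and would still need to be grouped.
ys, xs = np.where(response > 0.01 * response.max())
for x, y in zip(xs, ys):
    print('corner candidate at', (x, y))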
You should definitely play with both of these feature spaces and see how they work, and you could probably use both in concert for better results.
As an aside, if your image really is color/intensity on transparent, you can convert it to a 'binary image'. Note that this does not just mean binary data: it means you represent only two values, one by 0 and the other by 1. Doing so opens up a whole suite of tools that work on grayscale and binary images. For example, the matrix of numbers you calculated manually above is known as a distance transform, and it can be computed quite easily and efficiently using tools like OpenCV.
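As a sketch of that last point (again OpenCV via Python; the file name is a placeholder), computing a distance transform on the binarised alpha channel takes only a few lines:
import cv2
import numpy as np

img = cv2.imread('plus.png', cv2.IMREAD_UNCHANGED)
# Binary image: 1 where the pixel is opaque, 0 where it is transparent.
binary = (img[:, :, 3] > 0).astype(np.uint8)

# Distance from every opaque pixel to the nearest transparent pixel.
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 3)
print(np.round(dist, 1))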
The Hough transform is a standard technique for finding lines, polygons, and other shapes given a set of points. It might be exactly what you're looking for here. You could use the Hough transform to find all possible line segments in the image, then group nearby line segments together to get a set of polygons approximating the image.
Hope this helps!
In such a simple situation you can do it in three steps: find the centroid of your shape; sort the points of interest by the angle between the x axis and the line from the centroid to each point; then walk through the sorted points.
Given the situation, the x coordinate of the centroid is the sum of the x coordinates of all points of interest divided by their total number (and respectively for the y coordinate of the centroid). To calculate the angles, it is a simple matter of using atan2, available in virtually any language. Your points of interest are those marked 1 or 5; anything else is not a corner (based on your input).
Do not be fooled into thinking Hough will solve your question: it won't give you the sorted coordinates you are after, and it is an expensive method. Also, given your matrix, you already have such perfect information that no other method will beat it (the problem, of course, is repeating such a good result as the one you presented -- on those occasions, Hough might prove useful).
My Ruby is quite bad, so take the following code as a guideline to your problem:
include Math
data = ["0000000000000000",
"0000053335000000",
"0000030003000000",
"0000030003000000",
"0000020002000000",
"0533210001233500",
"0300000000000300",
"0300000000000300",
"0300000000000300",
"0533210001233500",
"0000020002000000",
"0000030003000000",
"0000030003000000",
"0000053335000000",
"0000000000000000",
"0000000000000000"]
corner_x = []
corner_y = []
# Collect the corner pixels: the 1s and 5s in the matrix.
data.each_with_index{|line, i|
  line.split(//).each_with_index{|col, j|
    if col == "1" || col == "5"
      # Convert matrix indices to Cartesian coords (origin at bottom left).
      corner_x.push(j + 1)
      corner_y.push(data.length - i)
    end
  }
}
# Centroid: the mean of the corner coordinates.
centroid_y = corner_y.reduce(:+)/corner_y.length.to_f
centroid_x = corner_x.reduce(:+)/corner_x.length.to_f
# Pair each corner with its angle around the centroid.
corner = []
corner_x.zip(corner_y).each{|c|
  dy = c[1] - centroid_y
  dx = c[0] - centroid_x
  theta = Math.atan2(dy, dx)
  corner.push([theta, c])
}
# Sorting on [theta, point] orders the corners counter-clockwise.
corner.sort!
corner.each_cons(2) {|c|
  puts "%s->%s" % [c[0][1].inspect, c[1][1].inspect]
}
This results in:
[2, 7]->[6, 7]
[6, 7]->[6, 3]
[6, 3]->[10, 3]
[10, 3]->[10, 7]
[10, 7]->[14, 7]
[14, 7]->[14, 11]
[14, 11]->[10, 11]
[10, 11]->[10, 15]
[10, 15]->[6, 15]
[6, 15]->[6, 11]
[6, 11]->[2, 11]
These are your vertices in counter-clockwise order, starting with the bottom-leftmost point (in Cartesian coordinates with (1, 1) at the bottom-left position). Note that each_cons(2) prints the open chain of edges; to close the polygon, connect the last vertex back to the first.
I am using MATLAB's built-in function procrustes to see the rotation, translation, and scale between two images. But I am just using the coordinates of the brightest points in the image and rotating these coordinates about the center of the image. procrustes compares two matrices and gives you the rotation, translation, and scale. However, procrustes only works correctly if the rows of the two matrices are in corresponding order.
I am given an image and a separate comparison coordinate matrix. The end goal is to find how much the image has been rotated, translated, and scaled compared to the coordinate matrix. I can just use procrustes for this, but I need to correctly order the coordinates found from the image to match the order in the comparison coordinate matrix. My thought was to compare the distance between every possible combination of points in the coordinate matrix and compare them to the coordinates I find in the picture. I just do not know how to write this code, because if there are n coordinates, there will be n! possible orderings.
Just searching for the shortest distance is not so hard.
A = rand(1E4,2);
B = rand(1E4,2);
tic
idx = nan(1,1E4);
for ct = 1:size(A,1)
    % Squared distance from A(ct,:) to every point in B
    % (implicit expansion; use bsxfun(@minus, A(ct,:), B) on MATLAB < R2016b).
    d = sum((A(ct,:)-B).^2,2);
    % Index of the nearest point in B; min also handles ties gracefully.
    [~, idx(ct)] = min(d);
end
toc
plot(A(1:10,1),A(1:10,2),'.r',B(idx(1:10),1),B(idx(1:10),2),'.b')
takes half a second on my PC.
The problems can start when two points in set A are matched to the same location in set B. You can check whether every match is unique with:
length(unique(idx))==length(idx)
This can be solved in several ways. The best (imho) is to determine a probability that a point in B matches a given point in A based on the distance (usually something that decreases exponentially), and solve for the most probable overall assignment.
A simpler method (but more error prone) is to remove the matched point from set B.
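For reference, here is a sketch of a one-to-one matching in Python (SciPy's linear_sum_assignment solves the underlying assignment problem; minimising summed squared distances corresponds to maximising a product of match probabilities that decay exponentially with distance):
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

A = np.random.rand(100, 2)
B = np.random.rand(100, 2)

# Full cost matrix of squared distances between the two point sets.
cost = cdist(A, B, 'sqeuclidean')

# Globally optimal pairing in which no point of B is used twice.
row, col = linear_sum_assignment(cost)
# A[row[k]] is matched to B[col[k]] for every k.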
Say that my images are simple shapes -- sets of lines, dots, curves, and simple objects.
How do I calculate the distance between images, such that length is important but overall scale is not, the location of each line/curve is important, angles are important, etc.?
For example, in the attached image:
My comparison object is the cube on the top left; the scores are fictitious, just for this example.
The distance to the cylinder is 80 (it has 2 matching lines, but the top geometry is different).
The bottom-left cube scores 100, since its lines match exactly, only at a different scale.
The bottom-right rectangle scores 90, since its top lines match exactly but the lines on its sides have a different scale.
I am looking for an algorithm name or a general approach that will help me start thinking towards a solution...
Thank you for your help.
Here is something to get you started. When jumping into new problems, I don't see much value in applying a lot of complex steps just because they are available somewhere. So my focus is on relatively simple things, which will fail in more varied situations, but hopefully you will see their value and get some sense of the problem.
The approach is fully based on corner detection; two typical methods for this are the Harris detector and the one by Shi and Tomasi described in the paper "Good Features to Track", 1994. I will use the second one, just because there is a ready implementation in OpenCV, newer MATLAB, and possibly many other places. The implementations in these packages also allow for easier parameter adjustment regarding corner quality and minimum distance between corners.

So, supposing you can detect all corner points correctly, how do you measure how close one shape is to another based on these points? The images have arbitrary size, so my idea is to normalize the point coordinates to the range [0, 1]. This solves the scaling issue, which is desired according to the original description.

Now we have to compare point sets in the range [0, 1]. Here we go for the simplest thing: consider one point p from shape a; what is the closest point in shape b? We assume it is the one with the minimum absolute difference between this point p and any point in b. If we sum all these values, we get a score between shapes. The lower the score, the more similar the shapes (according to this approach).
Here are some shapes I drew:
Here are the detected corners:
As you can clearly see in this last set of images, the method will easily confuse a rectangle/square with a cylinder. To handle that you will need to combine the approach with other descriptors. Initially, a simple one that you might consider is the ratio between the shape's area and its bounding box area (which would give 1 for rectangle, and lower for cylinder).
With the method described above, here are the measurements between the first and second shapes, first and third shapes, ..., respectively: 0.02358485, 0.41350339, 0.30128458, 0.4980852, 0.18031262. The second cube is a resized version of the first one, and as you see, they are very similar by this metric. The last shape is a resized version of the first cube but without keeping the aspect ratio, and the metric gives a much higher difference.
If you want to play with the code that performs this, here it is (in Python, depends on OpenCV, numpy):
import sys

import cv2 as cv
import numpy

inp = []
for fname in sys.argv[1:]:
    img_color = cv.imread(fname)
    # imread returns BGR, so convert accordingly.
    img = cv.cvtColor(img_color, cv.COLOR_BGR2GRAY)
    inp.append((img_color, img))

ptsets = []
# Corner detection parameters.
params = (
    200,   # max number of corners
    0.01,  # minimum quality level of corners
    10,    # minimum distance between corners
)
# Params for visual circle markers.
circle_radii = 3
circle_color = (255, 0, 0)

for i, (img_color, img) in enumerate(inp):
    cornerMap = cv.goodFeaturesToTrack(img, *params)
    corner = numpy.array([c[0] for c in cornerMap])
    for c in corner:
        # Circle centers must be integer pixel coordinates.
        cv.circle(img_color, (int(c[0]), int(c[1])), circle_radii,
                  circle_color, -1)
    # Just to visually check for correct corners.
    cv.imwrite('temp_%d.png' % i, img_color)
    # Convert corner coordinates to [0, 1].
    cornerUnity = (corner - corner.min()) / (corner.max() - corner.min())
    # You might want to use other descriptors here. XXX
    ptsets.append(cornerUnity)

def compare_ptsets(p):
    # Score every point set against the first one: for each point in the
    # base shape, take the smallest absolute difference to the other
    # shape's points, and sum those minima.
    res = numpy.zeros(len(p))
    base = p[0]
    for i in range(1, len(p)):
        res[i] = sum(numpy.abs(p[i] - value).min() for value in base)
    return res

res = compare_ptsets(ptsets)
print(res)
The process to follow depends on the depth of features you are going to consider and the accuracy required.
If you want something more accurate, look for technical papers like this one, which can give you a concrete, well-proven approach or algorithm.
EDIT:
The idea behind the Waltz algorithm (a constraint-based method from AI) can be tweaked. This is just my thought: interpret the original image and generate a set of constraints from it. For each candidate, count how many of the constraints it satisfies. The one that satisfies the most constraints is the most similar to the original image.
Try calculating the center of mass for each figure: treat each point of the figure as a particle with mass equal to 1.
Then calculate the distance between figures as sqrt((x1-x2)^2 + (y1-y2)^2), where (xi, yi) are the center-of-mass coordinates of figure i.
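A minimal sketch of that idea in Python, assuming each figure is given as a list of (x, y) points (function names are mine):
import numpy as np

def center_of_mass(points):
    # With unit masses, the center of mass is simply the mean point.
    return np.mean(np.asarray(points, dtype=float), axis=0)

def figure_distance(points_a, points_b):
    ca = center_of_mass(points_a)
    cb = center_of_mass(points_b)
    return float(np.hypot(*(ca - cb)))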
I have a list of points that I want to draw a smooth line between. I am using the RVG library for drawing, so if I could get an SVG string from my points I would be happy. I searched around and found that Catmull-Rom is probably the algorithm to use.
I found some implementations in the Kamelopard and Rubyvis libraries, but couldn't work out how to use them with my list of points.
So, the question is, how can I take my array of (x,y) points and get a Catmull-Rom interpolated SVG curve from them?
Catmull-Rom is probably a good place to start. I recently re-implemented the Kamelopard version, and found this helpful: http://www.cs.cmu.edu/~462/projects/assn2/assn2/catmullRom.pdf
It's fairly straightforward, provided you understand the matrix multiplication. You'll end up with a matrix equation you'll need to evaluate a bunch of times, once per point on the path you're drawing. If you have control points A, B, C, and D, and you want to draw the curve between B and C, make a matrix where A, B, C, and D are the rows, and plug it into the equation at the top of the paper I linked to. It will be the last matrix in the list. The other values you'll need to know are "u", which ranges from 0 to 1, and "T", the "tension" of the spline. You'll evaluate the equation multiple times, incrementing u across its domain each time. You can set the tension to whatever you want, between 0 and 1, and it will affect how sharply the spline curves. 0.5 is a common value.
If you're trying to evaluate the curve between, for instance, the first two control points on your list, or the last two, you'll find you have problems making your matrix, because you need the two control points on either side of the point you're evaluating. In these cases, just duplicate the first or last control point, as necessary.
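To make the recipe concrete, here is a sketch in Python (the question is about Ruby, but the matrix form translates directly; the function names are mine, and the tension parameterisation follows the cardinal-spline form in which 0.5 reproduces the classic Catmull-Rom matrix):
import numpy as np

def catmull_rom(points, samples_per_segment=20, tension=0.5):
    pts = np.asarray(points, dtype=float)
    # Duplicate the first and last control points, as described above.
    pts = np.vstack([pts[0], pts, pts[-1]])
    s = tension
    # Cardinal-spline basis matrix; evaluated as [u^3 u^2 u 1] * M * G.
    M = np.array([[   -s, 2 - s,     s - 2,  s],
                  [2 * s, s - 3, 3 - 2 * s, -s],
                  [   -s,     0,         s,  0],
                  [    0,     1,         0,  0]])
    curve = []
    for i in range(len(pts) - 3):
        G = pts[i:i + 4]                      # rows are A, B, C, D
        for u in np.linspace(0, 1, samples_per_segment, endpoint=False):
            curve.append(np.array([u**3, u**2, u, 1.0]) @ M @ G)
    curve.append(pts[-2])                     # end exactly on the last point
    return np.array(curve)

def to_svg_path(curve):
    # Dense sampling makes a polyline path a good approximation.
    return 'M ' + ' L '.join('%.2f,%.2f' % (x, y) for x, y in curve)
With dense enough sampling, feeding the output of to_svg_path into an SVG/RVG path should look visually smooth.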
I am new to MATLAB, so forgive me if I am asking for the obvious here: what I have is a collection of color photographic images (all of the same dimensions). What I want to do is calculate the median color value for each pixel.
I know there is a median filter in MATLAB, but as far as I know it does not do exactly what I want, because I want to calculate the median value across the entire collection of images, for each separate pixel.
So for example, if I have three images, I want MATLAB to calculate (for each pixel) which color value out of those three images is the median value. How would I go about doing this, does anyone know?
Edit: From what I can come up with, I would have to load all the images into a single matrix. The matrix would have to have 4 dimensions (height, width, rgb, images), and for each pixel and each color channel find the median along the 4th dimension (across the images).
Is that correct (and possible)? And how can I do this?
Your intuition is correct. If you have images image_1, image_2, image_3, for example, you can assign them to a 4 dimensional matrix:
X(:,:,:,1) = image_1;
X(:,:,:,2) = image_2;
X(:,:,:,3) = image_3;
Then use:
Y=median(X,4);
To get the median.
Expanding my comments into a full answer:
#prototoast's answer is elegant, but since medians for the R, G and B values of each pixel are calculated separately, the output image will look very strange.
To get a well-defined median that makes visual sense, the easiest thing to do is cast the images to black-and-white before you try to take the median.
rgb2gray() from the Image Processing toolbox will do this in a way that preserves the luminance of each pixel while discarding the hue and saturation.
EDIT:
If you want to define the "RGB median" as "the middle value in Cartesian coordinates", this is easy enough to do for three images.
Consider a single pixel with three possible choices for the median colour, C1=(r1,g1,b1), C2=(r2,g2,b2), C3=(r3,g3,b3). Generally these form a triangle in 3D space.
Take the Pythagorean distance between the three colours: D1_2=abs(C2-C1), D2_3=abs(C3-C2), D1_3=abs(C3-C1).
Pick the "median" to be the colour that has lowest distance to the other two. Defining D1=D1_2+D1_3, etc. and taking min(D1,D2,D3) should work, courtesy of the Triangle Inequality. Note the degenerate cases: equilateral triangle (C1, C2, C3 equidistant), line (C1, C2, C3 linear with each other), or point (C1=C2=C3).
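A sketch of this selection rule in Python (the function name is mine):
import numpy as np

def rgb_median_of_three(c1, c2, c3):
    # The "median" is the colour with the smallest summed distance
    # to the other two candidates.
    cols = [np.asarray(c, dtype=float) for c in (c1, c2, c3)]
    d12 = np.linalg.norm(cols[0] - cols[1])
    d13 = np.linalg.norm(cols[0] - cols[2])
    d23 = np.linalg.norm(cols[1] - cols[2])
    totals = [d12 + d13, d12 + d23, d13 + d23]
    return cols[int(np.argmin(totals))]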
Note that this simple way of thinking about a 3D median is hard to extend to more than three images, because "the median" of a set of four or more 3D points is a bit harder to define.
Edit 2
For defining the "median" of N points as the centre of the smallest sphere that encloses them in 3D space, you could try:
Find the two points N1 and N2 in {N} that are furthest apart. The distance between N1 and N2 is the diameter of the smallest sphere that encloses all the points. (Proof: Any smaller and the sphere would not be able to enclose both N1 and N2 at the same time.)
The median is then halfway between N1 and N2: M = (N1+N2)/2.
Edit 3: The above only works if no three points are equidistant. Maybe you need to ask math.stackexchange.com?
Edit 4: Wikipedia delivers again! Smallest circle problem, Bounding sphere.
So first of all I have this image (and of course I have all the points' coordinates in 2D, so I can regenerate the lines and check where they cross each other):
(source: narod.ru)
But I have another image of the same lines (I know they are the same) and new coordinates for my points, as in this image:
(source: narod.ru)
So... now, having the points (coordinates) in the first image, how can I determine the plane rotation and Z depth in the second image (assuming the first one's center was at point (0,0,0) with no rotation)?
What you're trying to find is called a projection matrix. Determining precise inverse projection usually requires that you have firmly established coordinates in both source and destination vectors, which the images above aren't going to give you. You can approximate using pixel positions, however.
This thread will give you a basic walkthrough of the techniques you need to use.
Let me say this up front: this problem is hard. There is a reason Dan Story's linked question has not been answered. Let me provide an explanation for people who want to take a stab at it. I hope I'm wrong about how hard it is, though.
I will assume that the 2D screen coordinates and the projection/perspective matrix are known to you. You need to know at least this much (if you don't know the projection matrix, you are essentially using a different camera to look at the world). Let's call each pair of 2D screen coordinates (a_i, b_i), and I will assume the projection matrix is of the form
P = [ px 0 0 0 ]
[ 0 py 0 0 ]
[ 0 0 pz pw]
[ 0 0 s 0 ], s = +/-1
Almost any reasonable projection has this form. Working through the rendering pipeline, you find that
a_i = px x_i / (s z_i)
b_i = py y_i / (s z_i)
where (x_i, y_i, z_i) are the original 3D coordinates of the point.
Now, let's assume you know your shape in a set of canonical coordinates (whatever you want), so that the vertices are (x0_i, y0_i, z0_i). We can arrange these as columns of a matrix C. The actual coordinates of the shape are a rigid transformation of these coordinates. Let's similarly organize the actual coordinates as columns of a matrix V. Then these are related by
V = R C + v 1^T (*)
where 1^T is a row vector of ones with the right length, R is an orthogonal rotation matrix of the rigid transformation, and v is the offset vector of the transformation.
Now, you have an expression for each column of V from above: the first column is { s a_1 z_1 / px, s b_1 z_1 / py, z_1 } and so on.
You must solve the set of equations (*) for the scalars z_i and for the rigid transformation defined by R and v; a numerical sketch of one way to attempt this follows the list of difficulties below.
Difficulties
The equation is nonlinear in the unknowns, involving quotients of R and z_i
We have assumed up to now that you know which 2D coordinates correspond to which vertices of the original shape (if your shape is a square, this is slightly less of a problem).
We assume there is even a solution at all; if there are errors in the 2D data, it's hard to say how well equation (*) can be satisfied, and the best-fitting transformation may effectively be nonrigid or nonlinear.
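If you still want to take a numerical stab at it, here is a sketch of what solving (*) could look like with nonlinear least squares in Python (SciPy); the function name is mine, known correspondences are assumed, and a poor initial guess can land in a local minimum:
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def recover_pose(ab, C, px, py, s=1.0):
    # ab: (N, 2) screen coordinates (a_i, b_i); C: (N, 3) canonical
    # coordinates, one vertex per row; px, py, s from the matrix P.
    ab = np.asarray(ab, dtype=float)
    C = np.asarray(C, dtype=float)
    N = len(ab)

    def residuals(params):
        R = Rotation.from_rotvec(params[:3]).as_matrix()
        v = params[3:6]
        z = params[6:]
        # 3D points implied by the screen coordinates and candidate depths,
        # from a_i = px x_i / (s z_i) and b_i = py y_i / (s z_i).
        V = np.column_stack([s * ab[:, 0] * z / px,
                             s * ab[:, 1] * z / py,
                             z])
        return (V - (C @ R.T + v)).ravel()

    # Naive seed: no rotation, no offset, unit depths.
    x0 = np.concatenate([np.zeros(6), np.ones(N)])
    sol = least_squares(residuals, x0)
    R = Rotation.from_rotvec(sol.x[:3]).as_matrix()
    return R, sol.x[3:6], sol.x[6:]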
It's called (digital) photogrammetry. Start Googling.
If you are really interested in this kind of problems (which are common in computer vision, tracking objects with cameras etc.), the following book contains a detailed treatment:
Ma, Soatto, Kosecka, Sastry, An Invitation to 3-D Vision, Springer 2004.
Beware: this is an advanced engineering text, and uses many techniques which are mathematical in nature. Skim through the sample chapters featured on the book's web page to get an idea.