Been reading this paper:
http://photon07.pd.infn.it:5210/users/dazzi/Thesis_doctorate/Info/Chapter_6/Stereoscopy_(Mrovlje).pdf
to figure out how to use 2 parallel cameras to find the depth of an object. It seems like we somehow need the field of view of the cameras at the exact plane of the object (which is the very depth the cameras are trying to measure) to get the depth.
Am I interpreting this wrong? Or does anyone else know how one uses a pair of cameras to measure the distance of an object from the camera pair?
Kelvin
The camera sensors either have to lie in the same plane or their images have to be rectified so that they 'virtually' lie in the same plane. This is the only requirement, and it simplifies the search for matches between the left and right image: whatever you have in the left image will be located in the same row of the right image, so you don't need to check other rows. You can skip this requirement, but then your search will be more extensive. Once you are done finding correspondences, you can figure out the depth from them.
With rectified cameras, the depth is determined from the shift: for example, if the left image has a feature in row 4, column 11, and the right image has this feature in row 4 (the same row, since the cameras are rectified), column 1, then we say the disparity is 11-1=10. The disparity D is inversely proportional to the depth Z:
Z = fB/D, where f is the focal length and B is the baseline (the distance between the cameras).
In the end you will have depth estimates everywhere you found correspondences. So-called dense stereo aims to cover more than 90% of the image area, whereas sparse stereo recovers only a few depth measurements.
Note that it is hard to find correspondences if there is little texture on the surface of the object, in other words if it is uniformly coloured. Some cameras such as the Kinect project their own pattern onto the objects to solve the problem of absent features.
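For illustration, here is a minimal Python/OpenCV sketch of that pipeline on a rectified pair, using OpenCV's block matcher as just one way to get the disparities; the focal length and baseline values are made-up placeholders for your own calibration:

import cv2
import numpy as np

# Assumed example values -- replace with your own calibration.
f_px = 700.0          # focal length in pixels
baseline_m = 0.12     # distance between the two cameras in metres

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching along the same row (works because the pair is rectified).
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # BM returns fixed-point * 16

# Depth Z = f*B/D, only where a correspondence was found (disparity > 0).
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = f_px * baseline_m / disparity[valid]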
In short, what I want is to follow a line like the guy in this video:
https://www.youtube.com/watch?v=iHiasfAE63k
In the comment section he said:
"I get the array of values from my ROI, one per degree. I then set a threshold level and turn these values into binary values. In my direction of travel I set a region (30 degrees) either side of that point and disregard anything outside of it. I find the centre value of the binary array representing line. This gives me an angle to drive to. I then turn this into a x and y velocity using sin cos functions. I feed the x any velocity into the stepper drives and the camera closes the loop."
And I didn't understand a thing from his explanation. Can you guys shed some light on this algorithm?
What I get from the info is that this guy is using a binarization algorithm to obtain an image with only pure black and pure white; each pixel can then be represented using a single bit (0 for black, 1 for white), or, as is most common, a byte (0 for black, 255 for white). So you need to apply this algorithm first to generate an image to work with.
After that, he chooses a direction of travel and searches for the pixels that lie in that direction and belong to a region of interest of +/- 15 degrees around the travelling direction. From the points found in that region he computes a value which tells him where to go, and translates this value into the movement of the motors.
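As I read it, the loop is roughly like the sketch below (all names and values here are my own guesses, not his actual code); the ROI gives one brightness sample per degree, and a window either side of the direction of travel is kept, as in the quoted comment:

import numpy as np

def steer(roi_values, heading_deg, window_deg=30, threshold=128, speed=1.0):
    # Binarise the per-degree brightness samples from the ROI.
    roi_values = np.asarray(roi_values)
    binary = (roi_values > threshold).astype(np.uint8)

    # Keep only the window around the current direction of travel.
    degrees = np.arange(binary.size)
    diff = (degrees - heading_deg + 180) % 360 - 180   # signed angular difference
    binary[np.abs(diff) > window_deg] = 0

    line_idx = np.flatnonzero(binary)
    if line_idx.size == 0:
        return 0.0, 0.0                                # line lost: stop

    target_deg = line_idx[line_idx.size // 2]          # centre of the detected line
    target_rad = np.deg2rad(target_deg)
    # Turn the angle into x and y velocities for the stepper drives.
    return speed * np.cos(target_rad), speed * np.sin(target_rad)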
I have a picture that was captured from a fixed position [X Y Z], angle [Pitch Yaw Roll], and focal length F (I think this information is called the camera matrix).
I want to transform the captured picture so that it looks as if it had been taken from directly above (a top-down view).
The resulting image should look like this:
In fact I have a picture taken from this position:
and I want to change my picture so that it looks as if it had been taken from this position:
I hope I could express my problem.
Thanks in advance
It can be done accurately only for the (green) plane itself. The 3D objects standing on the plane will be deformed after remapping, but the deformation may be acceptable if their height is small relative to the camera distance.
If the camera never moves, all you need to do is identify in the perspective image four points that are the four vertices of a rectangle of known size (e.g. the soccer field itself), then compute the homography that maps those four points to that rectangle, and apply it to the whole image.
For details and code, see the OpenCV links at the bottom of the Wikipedia article on homography.
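As a minimal sketch of that four-point approach in Python/OpenCV (all coordinates below are placeholders for the corners you actually identify):

import cv2
import numpy as np

# Map four known points (e.g. the field corners seen in the perspective image)
# onto the rectangle they form in the top-down view.
img = cv2.imread("perspective.jpg")

src = np.float32([[421, 105], [1260, 98], [1550, 700], [130, 712]])   # corners in the photo (placeholders)
field_w, field_h = 1050, 680                                          # output size in pixels (placeholder)
dst = np.float32([[0, 0], [field_w, 0], [field_w, field_h], [0, field_h]])

H = cv2.getPerspectiveTransform(src, dst)       # 3x3 homography from the 4 correspondences
top_down = cv2.warpPerspective(img, H, (field_w, field_h))
cv2.imwrite("top_down.jpg", top_down)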
Suppose I have an image of a scene as depicted above. A sort of a pole with a blob on it next to possibly similar objects with no blobs.
How can I find the blob marked by the red circle (i.e. produce a binary image indicating which pixels belong to the blob)?
Note that the pole together with the blob may be arbitrarily rotated, and its size may also vary.
Can you try to do it in the 4 steps below?
1. Circle detection, e.g.: writing robust (color and size invariant) circle detection with opencv (based on Hough transform or other features)
2. Line detection, e.g.: Finding location of rectangles in an image with OpenCV
3. Identify rectangle positions by combining neighbouring lines. (For each line segment you have the start and end point positions, and you also know the direction of each segment, so you can figure out whether two connecting line segments, whose endpoints are close, are orthogonal. Your goal is to find 3 such segments for each rectangle.)
4. Check the relative position of each circle and rectangle to see if any pair can form the knob shape.
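If it helps, here is a rough Python/OpenCV sketch of steps 1 and 2; every parameter is a guess that will need tuning for your images, and steps 3 and 4 would then operate on the returned circles and segments:

import cv2
import numpy as np

img = cv2.imread("scene.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)

# Step 1: Hough circle detection.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                           param1=100, param2=30, minRadius=5, maxRadius=60)

# Step 2: line segment detection on an edge map.
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                        minLineLength=30, maxLineGap=5)

# Steps 3 and 4 (grouping nearly-orthogonal segments into rectangles and
# pairing them with nearby circles) would work on `circles` and `lines`.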
One approach could be using Viola-Jones object detection framework.
Though the framework is mostly used for face detection, it is actually designed for generic objects that you feed to the algorithm.
The basic idea is to feed samples of the "good object" (what you are looking for) and of "bad objects" to a machine learning algorithm, which generates patterns from the images as its features.
During classification, the algorithm uses a sliding window to search for a "match" to the object (a window for which the classifier returns a positive answer).
The algorithm uses supervised learning and thus requires a labeled set of examples (both positive and negative ones).
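For illustration only, here is a sketch of the classification stage with OpenCV's cascade classifier; it assumes you have already trained a cascade on your own positive/negative samples (e.g. with OpenCV's opencv_traincascade tool), and "my_object_cascade.xml" is just a placeholder name:

import cv2

cascade = cv2.CascadeClassifier("my_object_cascade.xml")   # placeholder for your trained cascade

img = cv2.imread("scene.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Each hit is (x, y, w, h) for a window the classifier accepted
# during the sliding-window / multi-scale search.
hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in hits:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", img)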
I'm sure there is some boundary-map algorithm in image processing to do this.
Otherwise, here is a quick fix: pick a pixel at the center of the "undiscovered zone", which initially is the whole image. Trace horizontal and vertical lines in the 4 directions, each ending at the borders of the zone, and find where the value changes from 0 to 1 or vice versa. Follow each such value switch and complete the boundary of each figure (Step A).
Do the same for the zones that are still undiscovered: start at some center point and skim through the lines connecting the center to the image border or to a pixel on the boundary of a known zone.
In Step A, you can also check whether the boundary you traced is a line or a curve. Whenever it is a curve, you need only two points on it, at some distance from one another for the accuracy of the calculation. The lines perpendicular to the tangents at these two points intersect at the center of the circle marked red in your figure.
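A minimal sketch of that scanning step, assuming a 0/1 numpy array as the binary image (this is my own illustration of the idea, not a full boundary-tracing implementation):

import numpy as np

def first_transitions(binary, cy, cx):
    # From a seed pixel (cy, cx), walk right, left, down and up, and record
    # the first pixel where the value flips. Those hits are starting points
    # for tracing the boundary of the figure the scan ran into.
    h, w = binary.shape
    seed = binary[cy, cx]
    hits = []
    for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):   # right, left, down, up
        y, x = cy, cx
        while 0 <= y + dy < h and 0 <= x + dx < w:
            y, x = y + dy, x + dx
            if binary[y, x] != seed:                    # value switched 0 <-> 1
                hits.append((y, x))
                break
    return hits

mask = np.zeros((9, 9), dtype=np.uint8)
mask[3:6, 3:6] = 1                      # a small square "figure"
print(first_transitions(mask, 0, 4))    # -> [(3, 4)]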
You can segment the image, then use only the pixels in the segments to contribute to a Hough transform to find the circles.
Then you will only have segments with circles in them. You can use a modified Hough transform to find rectangles. The 'best' rectangle and square combination will then be your match. This is very computationally intensive.
Another approach, if you already have these binary pictures, is to transform each shape into a (for example 256-bin) sample by taking the distance to the centroid as a function of the distance travelled along the edge. If you start at the point furthest away from the centroid, you have a fairly rotation-robust feature vector.
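A rough sketch of that centroid-distance signature, assuming you already have a contour (e.g. from cv2.findContours) as a list of boundary points; the 256-sample length and the resampling step are my own choices:

import cv2
import numpy as np

def centroid_distance_signature(contour, n_bins=256):
    # Sample the distance from the centroid to the boundary at n_bins points
    # along the contour, starting at the point farthest from the centroid so
    # the vector is roughly rotation-robust.
    pts = contour.reshape(-1, 2).astype(np.float32)
    centroid = pts.mean(axis=0)
    dists = np.linalg.norm(pts - centroid, axis=1)

    start = int(np.argmax(dists))                 # begin at the farthest point
    dists = np.roll(dists, -start)

    # Resample to a fixed-length feature vector along the edge.
    idx = np.linspace(0, len(dists) - 1, n_bins)
    return np.interp(idx, np.arange(len(dists)), dists)

# Typical use (OpenCV 4.x):
# contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# signature = centroid_distance_signature(contours[0])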
I am trying to implement the method of Dalal and Triggs. I could implement the first stage, computing the gradients of an image, and I could write the code that walks across the image in cells, but I don't understand the logic behind the next stage.
I know it is first necessary to choose between signed (0-360 degrees) and unsigned (0-180 degrees) gradients.
I know I must create a data structure to store each cell's histogram, with n bins. I know what a histogram is, hence I understand I must visit each pixel, but I don't fully understand the method for classifying each pixel: getting the gradient orientation of that pixel and building the histogram from this data.
In short, HOG is nothing but a dense representation of gradient orientations, weighted by their strengths, over overlapping local neighbourhoods.
You asked what the significance of finding each pixel's gradient orientation is. In an image, the gradient orientation at each pixel indicates the direction of the boundary (the edge between two textures) of the object at that location with respect to the X and Y axes. So if you group the orientations over a patch, block or part of an object, you get the distribution of edge directions of the object in that region in a very strong, unique way.
Now let us take a simple example: a circle. If you plot the gradient orientations of a circle as a histogram, you will get a flat (uniform) histogram (don't imagine HOG as just a simple plot of gradient orientations), because the orientations of the edges of a circle range from 0 degrees to 360 degrees if you sample at 360 consecutive locations. For a different object it is different. HOG does the same thing, but in a more sophisticated manner, by dividing the image into overlapping blocks, dividing each block into cells, and making the histograms weighted by the strengths of the local gradients.
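To make the per-cell histogram step concrete, here is a simplified sketch (unsigned 0-180 degree orientations, 9 bins, votes weighted by gradient magnitude); note it leaves out the block grouping, normalisation and bin interpolation of the full Dalal-Triggs method:

import cv2
import numpy as np

def cell_histograms(gray, cell=8, n_bins=9):
    # Per-cell orientation histograms: each pixel votes for the bin of its
    # unsigned gradient orientation, weighted by its gradient magnitude.
    gray = gray.astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=1)       # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=1)       # vertical gradient
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned orientation

    h, w = gray.shape
    cells_y, cells_x = h // cell, w // cell
    hist = np.zeros((cells_y, cells_x, n_bins), dtype=np.float32)
    bin_width = 180.0 / n_bins

    for cy in range(cells_y):
        for cx in range(cells_x):
            m = mag[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            a = ang[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            bins = np.minimum((a // bin_width).astype(int), n_bins - 1)
            np.add.at(hist[cy, cx], bins.ravel(), m.ravel())
    return hist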
Hope it is useful ...
Where can I find algorithms for image distortions? There is so much info on blur and other classic algorithms, but so little on more complex ones. In particular, I am interested in a swirl-effect image distortion algorithm.
I can't find any references, but I can give a basic idea of how distortion effects work.
The key to the distortion is a function which takes coordinates (x,y) in the distorted image and transforms them to coordinates (u,v) in the original image. This specifies the inverse of the distortion, since it maps the distorted image back to the original image.
To generate the distorted image, one loops over x and y, calculates the point (u,v) from (x,y) using the inverse distortion function, and sets the colour components at (x,y) to be the same as those at (u,v) in the original image. One usually uses interpolation (e.g. http://en.wikipedia.org/wiki/Bilinear_interpolation ) to determine the colour at (u,v), since (u,v) usually does not lie exactly at the centre of a pixel, but rather at some fractional point between pixels.
A swirl is essentially a rotation, where the angle of rotation is dependent on the distance from the centre of the image. An example would be:
a = amount of rotation
b = size of effect
angle = a*exp(-(x*x+y*y)/(b*b))
u = cos(angle)*x + sin(angle)*y
v = -sin(angle)*x + cos(angle)*y
Here, I assume for simplicity that the centre of the swirl is at (0,0). The swirl can be put anywhere by subtracting the swirl position coordinates from x and y before the distortion function, and adding them to u and v after it.
There are various swirl effects around: some (like the above) swirl only a localised area, and have the amount of swirl decreasing towards the edge of the image. Others increase the swirling towards the edge of the image. This sort of thing can be done by playing about with the angle= line, e.g.
angle = a*(x*x+y*y)
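Putting the formulas above into a runnable sketch (my own illustration), with the swirl centred on the image centre and cv2.remap doing the inverse lookup with bilinear interpolation:

import cv2
import numpy as np

def swirl(img, a=2.5, b=150.0):
    # For every output pixel (x, y), rotate by an angle that falls off with
    # distance from the centre, then sample the source image at (u, v).
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0                       # swirl centre = image centre

    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    x, y = xs - cx, ys - cy                         # coordinates relative to the centre

    angle = a * np.exp(-(x * x + y * y) / (b * b))  # rotation falls off with radius
    u = np.cos(angle) * x + np.sin(angle) * y + cx
    v = -np.sin(angle) * x + np.cos(angle) * y + cy

    # cv2.remap does the "look up colour at (u, v)" step with interpolation.
    return cv2.remap(img, u, v, interpolation=cv2.INTER_LINEAR)

out = swirl(cv2.imread("input.jpg"))
cv2.imwrite("swirled.jpg", out)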
There is a Java implementation of a lot of image filters/effects at Jerry's Java Image Filters. Maybe you can take inspiration from there.
The swirl and effects like it are a matrix transformation on the pixel locations: you make a new image and take the colour from the position in the original image that you get by multiplying the current position by a matrix.
The matrix depends on the current position.
Here is a good CodeProject article showing how to do it:
http://www.codeproject.com/KB/GDI-plus/displacementfilters.aspx
There is a new graphics library with many features:
http://code.google.com/p/picasso-graphic/
Take a look at ImageMagick. It's an image conversion and editing toolkit and has interfaces for all popular languages.
The -displace operator can create swirls with the correct displacement map.
If you are for some reason not satisfied with the ImageMagick interface, you can always take a look at the source code of the filters and go from there.