In my program (using MATLAB), I specified (by dragging) the pedestrian lane as my Region Of Interest (ROI) with the coordinates [7, 178, 620, 190] (xmin, ymin, width, and height, respectively) using the getrect, roipoly and insertShape functions. Refer to the image below.
The video from which this snapshot was taken has a resolution of 640x480 pixels (480p).
Defining a real-world space as my ROI by mouse dragging is barbaric. That's why the ROI coordinates should be derived mathematically.
What I'm aiming for is to use real-world measurements from the video capture site, together with the Pythagorean theorem, based on where the camera is positioned:
How do I obtain the equivalent pixel coordinates and parameters using the real-world measurements?
I'll try to split your question into 2 smaller questions.
A) How do I obtain the equivalent pixel coordinates of a point of
interest? (practical question)
Your program should be able to detect/recognise a feature or marker that you placed at the "real-world" point of interest. The output is a coordinate in pixels. This can be done quite easily (think of QR codes, for example).
B) What is the analytical relationship between 1 point in 3D space and
its pixel coordinate in the image? (theoretical question)
This is the projection equation based on the pinhole camera model. The 3D coordinates X, Y, Z are related to the pixel coordinates x, y.
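The equation itself is not reproduced in the text. A form consistent with the description that follows (a scale factor s, an intrinsic matrix on the left, and an identity rotation with zero translation on the right) would be the one below. The symbols f_x, f_y (focal lengths in pixels) and c_x, c_y (principal point) are assumed names, not taken from the original.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```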
Cool, but some details have to be explained (and there won't be any "automatic short formula").
s represents the scale factor. A single pixel in an image could be the projection of infinitely many different points, due to perspective. In your photo, a pixel containing a piece of a car (when the car is present) will be the same pixel that contains a piece of the street under the car (once the car has passed).
So there is no one-to-one relationship when starting from pixel coordinates.
The matrix on the left involves the camera parameters (focal length, etc.), which are called intrinsic parameters. They have to be known to build the relationship between 3D coordinates and pixel coordinates.
The matrix on the right may seem trivial: it is the combination of an identity matrix, which represents rotation, and a column of zeros, which represents translation. Something like T = [R|t].
Which rotation, which translation? You have to consider that every set of coordinates is implicitly expressed in its own reference system. So you have to determine the relationship between the reference system of your measurements and the camera reference system: not only the position of the camera in your 3D space, which you can retrieve with Euclidean geometry, but also the orientation of the camera (angles).
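As a rough illustration of the relationship described above, here is a minimal Python sketch of the pinhole projection with a general [R|t]. The function name `project_point` and all numeric values (focal length of 500 pixels, principal point at the centre of a 640x480 image) are hypothetical; real values have to come from calibration and from measuring the camera pose.

```python
def project_point(point_world, K, R, t):
    """Return (x, y, s) such that s*[x, y, 1]^T = K [R|t] [X, Y, Z, 1]^T."""
    # Express the world point in the camera reference system: Xc = R*Xw + t.
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
           for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera or on its principal plane")
    # Apply the intrinsic matrix to get homogeneous pixel coordinates.
    hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    s = hom[2]  # scale factor; equals the depth when the last row of K is [0, 0, 1]
    return hom[0] / s, hom[1] / s, s


# Hypothetical intrinsics for a 640x480 image (assumed, not measured).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
# Identity rotation and zero translation: world and camera frames coincide.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

x, y, s = project_point([1.0, 0.5, 5.0], K, R, t)
print(x, y, s)
```

Note that the division by s is exactly where the depth information is lost, which is why the mapping cannot be inverted from a pixel alone.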
I'm using VisualSfM to build the 3D reconstruction of a scene. Now I want to estimate the depthmap and reproject the image. Any idea on how to do it?
If you have the camera intrinsic matrix K, its position vector in the world C and an orientation matrix R that rotates from world space to camera space, you can iterate over all pixels x,y in your image and perform:
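The expression is missing from the text. One form consistent with the definitions above (K intrinsic, C camera position in the world, R rotating from world space to camera space) and with the t and P used in the next paragraph would be the ray equation below. This is a reconstruction under those assumptions, not the original formula.

```latex
P(t) = C + t \, R^{\top} K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad t > 0
```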
Then, using ray tracing, find the minimal t for which the ray intersects your 3D model (assuming it's dense, otherwise interpolate it), so that P lies on your model. The t value you found is then the pixel value of the depth map (perhaps normalized to some range).
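A minimal Python sketch of the per-pixel ray and its intersection follows. It makes two simplifying assumptions that are not in the answer: K has zero skew, so its inverse can be written out directly, and the 3D model is replaced by a single plane (normal . P = offset) as a stand-in for a real ray tracer. The function names are hypothetical.

```python
def pixel_ray_direction(x, y, K, R):
    """World-space direction R^T K^-1 [x, y, 1]^T of the ray through pixel (x, y)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # K^-1 [x, y, 1]^T written out for a zero-skew intrinsic matrix (assumption).
    d_cam = [(x - cx) / fx, (y - cy) / fy, 1.0]
    # R maps world -> camera, so its transpose maps camera -> world.
    return [sum(R[j][i] * d_cam[j] for j in range(3)) for i in range(3)]


def depth_on_plane(x, y, K, R, C, normal, offset):
    """Smallest t > 0 with P = C + t*d on the plane normal . P = offset, else None."""
    d = pixel_ray_direction(x, y, K, R)
    denom = sum(n * di for n, di in zip(normal, d))
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the plane
    t = (offset - sum(n * c for n, c in zip(normal, C))) / denom
    return t if t > 0 else None


# Hypothetical camera at the world origin looking along +Z.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = [0.0, 0.0, 0.0]

# Depth of pixel (420, 290) against the plane Z = 5.
print(depth_on_plane(420, 290, K, R, C, [0.0, 0.0, 1.0], 5.0))
```

Because the camera-space direction has a z component of 1, t here is the depth along the optical axis rather than the Euclidean distance to the camera.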
I have a picture captured from a fixed position [X Y Z] and angle [Pitch Yaw Roll] with a focal length F (I think this information is called the camera matrix).
I want to transform the captured picture so that it looks as if it was taken from a different position, for example from above.
The resulting image should look like this:
In fact, I have a picture taken from this position:
and I want to change it so that it looks as if it was taken from this position:
I hope I have expressed my problem clearly.
Thanks in advance.
It can be done accurately only for the (green) plane itself. The 3D objects standing on the plane will be deformed after remapping, but the deformation may be acceptable if their height is small relative to the camera distance.
If the camera is never moving, all you need to do is identify on the perspective image four points that are the four vertices of a rectangle of known size (e.g. the soccer field itself), then compute the homography that maps those four points to that rectangle, and apply it to the whole image.
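The four-point homography can be sketched in plain Python as below. This is only an illustration of the idea, not the OpenCV code referred to in this answer: it fixes the last matrix entry to 1 (which fails in the rare case where the image origin maps to infinity), and the pixel coordinates and the 68 x 105 field size are made-up example values.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def find_homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, x, y):
    """Map one point through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical pixel positions of the field corners in the perspective image...
image_corners = [(100.0, 300.0), (540.0, 300.0), (400.0, 150.0), (240.0, 150.0)]
# ...and the matching corners of the top-down rectangle (assumed 68 x 105 units).
field_corners = [(0.0, 0.0), (68.0, 0.0), (68.0, 105.0), (0.0, 105.0)]

H = find_homography(image_corners, field_corners)
print(apply_homography(H, 320.0, 225.0))
```

To warp the whole image, one would typically compute the homography in the opposite direction (rectangle to image) and look up a source pixel for every pixel of the output, so that the result has no holes.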
For details and code, see the OpenCV links at the bottom of that Wikipedia article.
I am trying to implement the method of Dalal and Triggs. I have implemented the first stage, computing gradients on an image, and I have written the code that walks across the image in cells, but I don't understand the logic behind this stage.
I know it is first necessary to choose between signed (0-360 degrees) and unsigned (0-180 degrees) gradients.
I know I must create a data structure to store each cell histogram, with n bins. I know what a histogram is, so I understand I must visit each pixel, but I don't fully understand the method for classifying each pixel: how to get the gradient orientation of the pixel and build the histogram with this data.
In short, HOG is nothing but a dense representation of gradient orientations, weighted by their strengths, over overlapping local neighbourhoods.
You asked what the significance of finding each pixel's gradient orientation is. In an image, the gradient orientation at each pixel indicates the direction of the strongest intensity change at that location, which is perpendicular to the local boundary of the object (the edge between two textures), measured with respect to the X and Y axes. So if you group the orientations over a patch, block or part of an object, they represent the distribution of edge directions of the object in that region in a very strong, distinctive way. Now let us take a simple example: a circle. If you plot the gradient orientations of a circle as a histogram, you get a flat line (don't think of HOG as just a simple plot of gradient orientations), because the edge orientations of a circle range from 0 to 360 degrees if you sample at 360 consecutive locations. For a different object the histogram is different. HOG does the same thing, but in a more sophisticated manner: it divides the image into overlapping blocks, divides each block into cells, and weights the histogram by the strengths of the local gradients.
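The per-cell step can be sketched in Python as follows. This is a simplified illustration, not the full Dalal and Triggs method: it uses unsigned orientations (0-180 degrees), central differences with clamped borders, and hard assignment of each pixel to a single bin, whereas the original method interpolates votes between neighbouring bins and normalises over blocks. The function name and the test image are made up.

```python
import math


def cell_histogram(image, top, left, cell_size=8, n_bins=9):
    """Orientation histogram of one cell, each pixel voting with its gradient magnitude."""
    height, width = len(image), len(image[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(top, top + cell_size):
        for c in range(left, left + cell_size):
            # Central differences; neighbours are clamped at the image border.
            gx = image[r][min(c + 1, width - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, height - 1)][c] - image[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: fold the angle into [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment to one bin (simplification, see lead-in).
            bin_index = int(angle // bin_width) % n_bins
            hist[bin_index] += magnitude
    return hist


# Made-up 8x8 grayscale cell with a vertical edge: dark left half, bright right half.
image = [[0] * 4 + [255] * 4 for _ in range(8)]
print(cell_histogram(image, 0, 0))
```

For this test image all gradients point along the X axis, so all the weight falls into the first bin, which is the "distribution of edge directions" idea in its simplest form.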
Hope it is useful ...
So we have the following situation:
In this illustration, the first quadrilateral is shown on the Image Plane and the second quadrilateral is shown on the World Plane. [1]
In my particular case the Image Plane has 3 quadrilaterals, which are projections of real-world squares. As we know, these squares have the same size, lie on the same plane, have the same rotation relative to that plane, and are not situated on the same line in the plane.
I wonder whether we can recover the rotation angles of the Image Plane relative to the World Plane from the information described above?
In my case the input data are: the original image (RGB pixels) and the objects (squares) with their corner points in pixels (x, y) on the Image Plane.
Take a look at Sections 2 and 3 of Algorithms for plane-based pose estimation.
The methods described there assume that you know the (x,y) coordinates of the features in question - in this case the red squares.
The problem you are describing is generally known as pose estimation - determining the 3D orientation and position of an object relative to a camera from a 2D view. For you, the object is a plane. Googling 'pose estimation plane' should give you more sources.
Where can I find algorithms for image distortions? There is so much information on blur and other classic algorithms, but so little on more complex ones. In particular, I am interested in an algorithm for the swirl effect.
I can't find any references, but I can give a basic idea of how distortion effects work.
The key to the distortion is a function which takes two coordinates (x,y) in the distorted image, and transforms them to coordinates (u,v) in the original image. This specifies the inverse function of the distortion, since it takes the distorted image back to the original image.
To generate the distorted image, one loops over x and y, calculates the point (u,v) from (x,y) using the inverse distortion function, and sets the colour components at (x,y) to be the same as those at (u,v) in the original image. One usually uses interpolation (e.g. http://en.wikipedia.org/wiki/Bilinear_interpolation ) to determine the colour at (u,v), since (u,v) usually does not lie exactly on the centre of a pixel, but rather at some fractional point between pixels.
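The loop described above can be sketched in Python like this. It is a minimal version for a single-channel image stored as a list of rows. The names `bilinear_sample` and `remap` are hypothetical, and clamping out-of-range coordinates to the border is just one possible choice.

```python
def bilinear_sample(image, u, v):
    """Interpolated value of image at the fractional position (u, v) = (column, row)."""
    height, width = len(image), len(image[0])
    # Clamp to the image so that lookups outside it reuse the border pixels.
    u = min(max(u, 0.0), width - 1)
    v = min(max(v, 0.0), height - 1)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Blend horizontally on the two neighbouring rows, then vertically.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def remap(image, inverse_map):
    """Build the distorted image: each output (x, y) samples the original at inverse_map(x, y)."""
    height, width = len(image), len(image[0])
    return [[bilinear_sample(image, *inverse_map(x, y)) for x in range(width)]
            for y in range(height)]


image = [[0, 10],
         [20, 30]]
print(bilinear_sample(image, 0.5, 0.5))
# Example distortion: shift the picture half a pixel to the right.
print(remap(image, lambda x, y: (x - 0.5, y)))
```

For a colour image the same sampling would be applied to each colour component separately.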
A swirl is essentially a rotation, where the angle of rotation is dependent on the distance from the centre of the image. An example would be:
a = amount of rotation
b = size of effect
angle = a*exp(-(x*x+y*y)/(b*b))
u = cos(angle)*x + sin(angle)*y
v = -sin(angle)*x + cos(angle)*y
Here, I assume for simplicity that the centre of the swirl is at (0,0). The swirl can be put anywhere by subtracting the swirl position coordinates from x and y before the distortion function, and adding them to u and v after it.
There are various swirl effects around: some (like the above) swirl only a localised area, and have the amount of swirl decreasing towards the edge of the image. Others increase the swirling towards the edge of the image. This sort of thing can be done by playing about with the angle= line, e.g.
angle = a*(x*x+y*y)
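Putting the formulas above together, a swirl could be sketched in Python as follows. The function name `swirl` is hypothetical, the image is a single-channel list of rows, pixels that map outside the original are filled with 0, and nearest-neighbour lookup is used instead of interpolation to keep the sketch short.

```python
import math


def swirl(image, a, b, cx, cy):
    """Localised swirl around (cx, cy): a = amount of rotation, b = size of effect."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            # Move the swirl centre to the origin before applying the formulas.
            x, y = px - cx, py - cy
            angle = a * math.exp(-(x * x + y * y) / (b * b))
            # Inverse mapping: where in the original does this pixel come from?
            u = math.cos(angle) * x + math.sin(angle) * y + cx
            v = -math.sin(angle) * x + math.cos(angle) * y + cy
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < width and 0 <= vi < height:
                out[py][px] = image[vi][ui]  # nearest neighbour (simplification)
    return out


# Made-up 9x9 test image: a bright horizontal bar through the middle row.
image = [[255 if row == 4 else 0 for _ in range(9)] for row in range(9)]
for row in swirl(image, a=1.5, b=3.0, cx=4, cy=4):
    print("".join("#" if value else "." for value in row))
```

Replacing the `angle` line with the second variant, `a*(x*x+y*y)`, gives a swirl that grows towards the edges instead of fading out.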
There is a Java implementation of a lot of image filters/effects at Jerry's Java Image Filters. Maybe you can take inspiration from there.
The swirl and others like it are a matrix transformation on the pixel locations. You make a new image and get the color from a position on the image that you get from multiplying the current position by a matrix.
The matrix is dependent on the current position.
Here is a good CodeProject article showing how to do it:
http://www.codeproject.com/KB/GDI-plus/displacementfilters.aspx
There is also a new graphics library with many features:
http://code.google.com/p/picasso-graphic/
Take a look at ImageMagick. It's an image conversion and editing toolkit and has interfaces for all popular languages.
The -displace operator can create swirls with the correct displacement map.
If you are for some reason not satisfied with the ImageMagick interface, you can always take a look at the source code of the filters and go from there.