How to find pixel per meter [closed] - image

I have a static camera pointed at a fixed area. The total area covered by the camera is
length 78.7 cm
width 102.1 cm
height 118.5 cm
My image size is 800 x 480.
Inside the covered area I have another box, whose
length is 22.6 cm
width is 25.6 cm
height is 24 cm
I want to find out how many pixels I have per meter. I am using the formula
m/pixels * 0.39, but it is not giving the exact answer.

Many manufacturers use the pixels per meter (ppm) measure as a metric of video surveillance image quality. For instance, you need around 130 ppm to have enough detail to accurately recognize faces and identify license plates.
To calculate pixel density (pixels per meter) you need the number of horizontal pixels of the image or video source and the width in meters of the scene you are looking at.
Therefore,
ppm = ImageWidth (in pixels) / Field of view (in meters)
The easiest way to calculate ppm for a specific scene is to point the camera where you want to calculate ppm, and then divide the number of horizontal pixels of your image by the width in meters of the field of view of your camera at that specific point. Note that this measure will not be constant across the vertical axis of your camera. Each line of your image will have a different pixel density.
If you calibrate the camera you could do these calculations theoretically, since you would know the width of the field of view in advance, but it is a little bit more complicated.
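As a minimal sketch of the formula above, using the numbers from the question. It assumes that the 102.1 cm side of the covered area is the side spanned by the 800 horizontal pixels, and that the box is measured in the same plane; neither is stated in the question. The function name is made up for illustration.

```python
def pixels_per_meter(image_width_px, field_width_m):
    """ppm = image width in pixels / width of the field of view in meters."""
    if field_width_m <= 0:
        raise ValueError("field width must be positive")
    return image_width_px / field_width_m


# Assumption: the 800-pixel image axis covers the 102.1 cm (1.021 m) side.
ppm = pixels_per_meter(800, 1.021)

# Expected size in pixels of the 25.6 cm wide box, if it lies in that plane.
box_px = 0.256 * ppm

print(round(ppm), round(box_px))
```

If the measured box width in the image differs clearly from `box_px`, the box is probably closer to the camera than the plane where the field width was measured.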

Your static camera will have a fixed lens with a specific focal length (not to be confused with the f-number, which describes the aperture). This means that there is a specific ideal focus distance along the depth of view. The surface of best focus is not perfectly flat but curved, with a shape that depends on the lens. Because focal length and FOV are inversely related, the longer the focal length, the smaller the FOV. As someone already mentioned, sharpness falls off in all directions from the center point of your focal area, which is why we tend to find blurry pixels in the corners of images/video and on objects in front of or behind the focus plane.
This is a beautiful problem to solve using math.
By calculating the length of the hypotenuse versus the height of a right-angled triangle, you know the difference in distance between your focal point and the outermost pixel captured by your image sensor, and because you should know the resolution of your camera, you can calculate the loss of pixel density from the center in all directions.
I feel like there is probably a clean-looking formula to calculate this using that sly ol' dog Pi, but I couldn't be bothered.
You might find this useful as step 1:
https://www.omnicalculator.com/math/right-triangle-side-angle
with the above calculator:
hypotenuse = C = distance from the camera to the outermost pixel captured
height of triangle = b = distance from the camera to the focal point (center of your photo)
half the FOV angle = α (you would do this for both the vertical and the horizontal FOV; for example, a 2MP camera has 1920 x 1080 points of reference)
Every lens size has a specific vertical and horizontal FOV angle.
This will allow you to calculate which parts of objects in the scene are within perfect focus thus retaining the densest pixels.
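The triangle described above can be sketched in a few lines. This assumes the camera looks straight at a flat surface, so that b, C and the half FOV angle α form a right-angled triangle with C = b / cos(α). The function name and the example values are illustrative only.

```python
import math


def edge_distance(center_distance_m, half_fov_deg):
    """Hypotenuse C: distance from the camera to the edge of the view,
    given b (distance to the image center) and the half FOV angle."""
    return center_distance_m / math.cos(math.radians(half_fov_deg))


# Example values (assumed): surface 1.185 m away, 40 degree half angle.
b = 1.185
c = edge_distance(b, 40.0)
print(f"center: {b:.3f} m, edge: {c:.3f} m, ratio: {c / b:.3f}")
```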
PS: if you wanted to be scientifically accurate, you would need to calculate this for every pixel on your image sensor over the size of the sensor. So if you had a 2MP camera with a 1/3" sensor, for example, you would relate 1920 x 1080 pixels to the 1/3" sensor size (that is, calculate the pixel density over your sensor). The quality of the glass of your lens will also play a part, I'm just not sure in what way.
The colour of the objects in the scene, atmospheric conditions and the lux level in the scene change the light reaching the camera, which also influences the density of pixels captured.
Lastly, because you want to calculate the resulting conversion of resolution captured vs displayed, your display medium will influence the actual performance.
Realistically, you wouldn't be able to tell with certainty unless you measured the distance of every pixel path from the pixel on the sensor to the object point.
Let us know how it goes. I could also be completely wrong LOL

You can't calculate the number of pixels per meter unless you know the distance to the object being captured. An object 10 meters away will have fewer pixels per meter than an object 1 meter away. All you can accurately calculate is the number of pixels per degree of your camera's field of view.
Even if you point the camera at a flat wall, the distance from the camera to the wall will change as the incident angle changes, so the middle of the wall will be closer to the camera than the corners of the wall. This can be calculated using some simple trigonometry.
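A rough sketch of that trigonometry, assuming a known horizontal field of view (the 90 degree value below is an example, not a property of the camera in the question) and a flat wall perpendicular to the optical axis. The pixels-per-degree figure is an average across the image.

```python
import math


def pixels_per_degree(image_width_px, hfov_deg):
    """Average number of pixels per degree of horizontal field of view."""
    return image_width_px / hfov_deg


def field_width_m(distance_m, hfov_deg):
    """Width of the scene covered at a given distance from the camera."""
    return 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)


def ppm_at_distance(image_width_px, hfov_deg, distance_m):
    """Pixels per meter on a wall at the given distance."""
    return image_width_px / field_width_m(distance_m, hfov_deg)


# The same camera gives fewer pixels per meter for more distant objects.
for d in (1.0, 10.0):
    print(d, round(ppm_at_distance(800, 90.0, d), 1))
```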

Related

Pixel coordinates derived from real distance measurements

In my program (using MATLAB), I specified (by dragging) the pedestrian lane as my Region Of Interest (ROI) with the coordinates [7, 178, 620, 190] (xmin, ymin, width and height respectively) using the getrect, roipoly and insertshape functions. Refer to the image below.
The video from which this snapshot is taken has a resolution of 640x480 pixels (480p).
Defining a real-world space as my ROI by mouse dragging is barbaric. That's why the ROI coordinates must be derived mathematically.
What I'm aiming for is to use real-world measurements from the video capturing site and the Pythagorean theorem from where the camera is positioned:
How do I obtain the equivalent pixel coordinates and parameters using the real-world measurements?
I'll try to split your question into 2 smaller questions.
A) How do I obtain the equivalent pixel coordinates of an interesting point? (practical question)
Your program should be able to detect/recognise a feature/marker that you placed at the interesting point in the "real world". The output is a coordinate in pixels. This can be done quite easily (think about QR codes, for example).
B) What is the analytical relationship between a point in 3D space and its pixel coordinate in the image? (theoretical question)
This is the projection equation based on the pinhole camera model. The 3D coordinates X, Y, Z are related to the pixel coordinates x, y.
Cool, but some details have to be explained (and there won't be any "automatic short formula").
s represents the scale factor. A single pixel in an image could be the projection of infinitely many different points, due to perspective. In your photo, a pixel containing a piece of a car (when the car is present) will be the same pixel that contains a piece of the street under the car (when the car has passed).
So there is no one-to-one relationship starting from pixel coordinates.
The matrix on the left involves the camera parameters (focal length, etc.) which are called intrinsic parameters. They have to be known to build the relationship between 3D coordinates and pixel coordinates
The matrix on the right seems to be trivial: it is the combination of an identity matrix, which represents rotation, and a column array of zeros, which represents translation. Something like T = [R|t].
Which rotation, which translation? You have to consider that every set of coordinates is implicitly expressed in its own reference system. So you have to determine the relationship between the reference system of your measurements and the camera reference system: not only the position of the camera in your 3D space (with Euclidean geometry), but also the orientation of the camera (angles).
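For illustration, the pinhole relation is commonly written as s·[x, y, 1]^T = K·[R|t]·[X, Y, Z, 1]^T, where K holds the intrinsic parameters. The sketch below assumes a simple K with focal lengths fx, fy and principal point (cx, cy), no skew and no lens distortion; the numeric values are made up and would normally come from a camera calibration.

```python
def project_point(point, fx, fy, cx, cy, R=None, t=None):
    """Project a 3D point to pixel coordinates with the pinhole model.

    R (3x3 nested list) and t (length-3 list) take the point from the
    measurement frame to the camera frame; they default to identity / zero.
    """
    R = R or [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = t or [0.0, 0.0, 0.0]
    # Measurement frame -> camera frame: p_cam = R * p + t
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i]
                  for i in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # zc plays the role of the scale factor s.
    return fx * xc / zc + cx, fy * yc / zc + cy


# Two different 3D points on the same ray land on the same pixel.
near = project_point((1.0, 0.5, 2.0), 800, 800, 320, 240)
far = project_point((2.0, 1.0, 4.0), 800, 800, 320, 240)
print(near, far)
```

Going the other way, from a pixel back to a 3D point, needs extra information such as the known ground plane, which is exactly the ambiguity that s expresses.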

Clustering 1000 images to find group of images with greater similarity

I have 1000 2D gray-scale images and would like to cluster them in Python so that images with more similarities stay in the same group. The images represent simple geometrical shapes including circles, triangles etc.
If I flatten each image into a vector and then run the clustering algorithm, it would be very complicated. The images are 400*500, so my clustering training data would be 1000*200000, which means 200000 features!
Just wondering if anyone has come across this issue before?
This is a similar question to this one
Read my answer
Of course you don't use each picture as a feature.
In your case I would recommend features like:
Find corners and calculate their number
Assuming each edge is a straight line, do a histogram of orientations. In each pixel calculate the derivative angle atan2(dy, dx), take the strongest 1% of derivative pixels and do a histogram. The number of peaks in the histogram will correspond to the number of edges (this will cluster triangles, squares, circles, etc.)
Use connected components analysis to calculate how many shapes you have in the image. Calculate the number of holes in each shape. Calculate the ratio between the circumference and the area of the shape. For geometrical shapes, geometrical features work extremely well.
As you asked in the comment I am adding more info for issue 2.
Please read more about the HOG feature here. I assume you are familiar with what an edge in an image is and what a gradient is. Imagine you have a triangle in the image. Only pixels that lie on the edges of the shape will have a high gradient. Moreover, you expect all the gradients to divide into 3 different directions, one for each edge. You don't know which directions, since you don't know the orientation of the triangle, but you know that there should be 3 of them. With a square there would be 2 directions, and with a circle there will not be a clear direction. You want to count the number of directions.
Use the following steps. First find the pixels which have a high gradient value. Say that in the entire image there are only 1000 such pixels (they lie on the edges of the shape). For each pixel calculate the angle of the gradient. So you have 1000 pixels, each of which may have an angle in [0..179] (an angle of 180 is equal to 0). There are 180 different angles. Let's assume that in order to reduce noise you don't need the exact angle but +-1 degree. So each angle is divided by 2 and rounded to the nearest integer. In total you have 1000 pixels, each having only 90 options for its angle. Now make a histogram of the angles.
If the shape was a circle, you expect that roughly 11 (=1000/90) pixels will fall into each bin of the histogram. If it was a square, you expect the histogram to be largely empty except for 2 bins with a very high number of pixels in them, at a distance of 45 bins from each other. Example: bin 13 has 400 pixels in it, bin 58 has 400 pixels in it, and the remaining 200 are noise split somehow among the other bins. Now you know that you are facing a square, and you also know its rotation in the image.
If it was a triangle you expect 3 large bins in the histogram.
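The steps above could be sketched roughly as follows. This is a simplified illustration, not a tuned implementation: it uses plain central differences for the gradient, and a fixed magnitude threshold stands in for "take the strongest 1%". The function names and the `min_share` cut-off are assumptions.

```python
import math


def orientation_histogram(image, min_magnitude=0.5, n_bins=90):
    """Count strong-gradient pixels per orientation bin (angles mod 180)."""
    h, w = len(image), len(image[0])
    bin_width = 180.0 / n_bins  # 2 degrees per bin for 90 bins
    hist = [0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = image[y][x + 1] - image[y][x - 1]
            dy = image[y + 1][x] - image[y - 1][x]
            if math.hypot(dx, dy) < min_magnitude:
                continue  # flat region, not an edge pixel
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[round(angle / bin_width) % n_bins] += 1
    return hist


def dominant_bins(hist, min_share=0.1):
    """Bins that hold at least min_share of all counted pixels."""
    total = sum(hist)
    return [i for i, n in enumerate(hist) if total and n >= min_share * total]


# Synthetic test image: a filled axis-aligned square on a black background.
square = [[1 if 10 <= x < 30 and 10 <= y < 30 else 0 for x in range(40)]
          for y in range(40)]
peaks = dominant_bins(orientation_histogram(square))
print(peaks)  # two dominant bins, 45 bins (90 degrees) apart
```

The number of dominant bins (together with features such as corner count) can then be fed to the clustering algorithm instead of 200000 raw pixel values. On real images, neighbouring bins may need to be merged before counting peaks.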

Captured image viewpoint changing

I have a picture that was captured from a fixed position [X Y Z] and angle [Pitch Yaw Roll] with a focal length of F (I think this information is called the camera matrix).
I want to transform the captured picture so that it looks as if it had been taken from a different position, for example from above.
The resulting image should look like:
In fact, I have a picture taken from this position:
and I want to change my picture so that it looks as if it had been taken from this position:
I hope I have managed to express my problem.
Thanks in advance.
It can be done accurately only for the (green) plane itself. The 3D objects standing on the plane will be deformed after remapping, but the deformation may be acceptable if their height is small relative to the camera distance.
If the camera is never moving, all you need to do is identify on the perspective image four points that are the four vertices of a rectangle of known size (e.g. the soccer field itself), then compute the homography that maps those four points to that rectangle, and apply it to the whole image.
For details and code, see the OpenCV links at the bottom of that Wikipedia article.
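To illustrate the four-point idea without any library, here is a small pure-Python sketch that solves for the 3x3 homography by Gaussian elimination. The point coordinates are invented for the example, and the sketch fixes the bottom-right entry of H to 1, which is a common simplification. In practice you would more likely use OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the same job.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (e.g. three are collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def find_homography(src, dst):
    """Homography H (with H[2][2] = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, x, y):
    """Map one point through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Assumed example: corners of the field as seen in the perspective image...
image_corners = [(200, 100), (440, 100), (620, 400), (20, 400)]
# ...and where they should land in the top-down view (a 300 x 200 rectangle).
top_down = [(0, 0), (300, 0), (300, 200), (0, 200)]
H = find_homography(image_corners, top_down)
print(apply_homography(H, 320, 400))
```

To warp the whole image you would loop over the pixels of the top-down view, map each one back through the inverse homography (swap `src` and `dst`), and sample the original image there.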

How to get the histogram orientation of a 'one' cell according to Dalal and Triggs?

I am trying to implement the method of Dalal and Triggs. I could implement the first stage, computing the gradients of an image, and I could write the code that walks across the image in cells, but I don't understand the logic behind this stage.
I know it is necessary to first choose between signed (0-360 degrees) and unsigned (0-180 degrees) gradients.
I know I must create a data structure to store each cell's histogram, with n bins. I know what a histogram is, hence I understand I must visit each pixel, but I don't fully understand the method for classifying each pixel: how to get the gradient orientation of the pixel and build the histogram with this data.
In short, HOG is nothing but a dense representation of gradient orientations, weighted by their strengths, over overlapping local neighbourhoods.
You asked what the significance of finding each pixel's gradient orientation is. In an image, the gradient orientation at each pixel indicates the direction of the object boundary (the edge between two textures) at that location with respect to the X and Y axes. So if you group the orientations of a patch, block or part of an object, they represent the distribution of edge directions of the object in that region in a very distinctive way. Now let us take a simple example, a circle. If you plot the gradient orientations of a circle as a histogram, you will get a flat line (don't imagine HOG here, just a simple plot of gradient orientations), because the edge orientations of a circle range from 0 to 360 degrees if you sample at 360 consecutive locations. For a different object the histogram is different. HOG does the same thing, but in a more sophisticated manner: it divides the image into overlapping blocks, divides each block into cells, and weights the histogram by the strengths of the local gradients.
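As a hedged sketch of the per-cell step, the function below takes the gradients of one cell and builds a magnitude-weighted orientation histogram. It uses hard binning for clarity; the original method additionally interpolates each vote between neighbouring bins, which is left out here. The function name and the tiny example cell are illustrative.

```python
import math


def cell_histogram(gx, gy, n_bins=9, signed=False):
    """Magnitude-weighted orientation histogram for one cell.

    gx, gy: 2D lists with the horizontal / vertical gradient of each pixel.
    """
    span = 360.0 if signed else 180.0
    bin_width = span / n_bins
    hist = [0.0] * n_bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            # Orientation of this pixel, folded into [0, span).
            angle = math.degrees(math.atan2(dy, dx)) % span
            # Each pixel votes for its bin with a weight equal to its strength.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A 2x2 "cell": two horizontal gradients, one vertical, one flat pixel.
gx = [[1, 0], [-2, 0]]
gy = [[0, 3], [0, 0]]
hist = cell_histogram(gx, gy)
print(hist)
```

With unsigned gradients the pixels with dx = 1 and dx = -2 fall into the same bin, because directions 0 and 180 degrees are treated as equal.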
Hope it is useful ...

Resources for image distortion algorithms

Where can I find algorithms for image distortions? There is so much info on blur and other classic algorithms but so little on more complex ones. In particular, I am interested in a swirl effect image distortion algorithm.
I can't find any references, but I can give a basic idea of how distortion effects work.
The key to the distortion is a function which takes two coordinates (x,y) in the distorted image, and transforms them to coordinates (u,v) in the original image. This specifies the inverse function of the distortion, since it takes the distorted image back to the original image.
To generate the distorted image, one loops over x and y, calculates the point (u,v) from (x,y) using the inverse distortion function, and sets the colour components at (x,y) to be the same as those at (u,v) in the original image. One usually uses interpolation (e.g. http://en.wikipedia.org/wiki/Bilinear_interpolation ) to determine the colour at (u,v), since (u,v) usually does not lie exactly on the centre of a pixel, but rather at some fractional point between pixels.
A swirl is essentially a rotation, where the angle of rotation is dependent on the distance from the centre of the image. An example would be:
a = amount of rotation
b = size of effect
angle = a*exp(-(x*x+y*y)/(b*b))
u = cos(angle)*x + sin(angle)*y
v = -sin(angle)*x + cos(angle)*y
Here, I assume for simplicity that the centre of the swirl is at (0,0). The swirl can be put anywhere by subtracting the swirl position coordinates from x and y before the distortion function, and adding them to u and v after it.
There are various swirl effects around: some (like the above) swirl only a localised area, and have the amount of swirl decreasing towards the edge of the image. Others increase the swirling towards the edge of the image. This sort of thing can be done by playing about with the angle= line, e.g.
angle = a*(x*x+y*y)
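Put together, the localised swirl above might look like this in Python for a grayscale image stored as a list of rows. This is a sketch under assumptions: the swirl is centred on the image, samples outside the image are clamped to the border, and the function names are made up.

```python
import math


def swirl_source(x, y, cx, cy, a, b):
    """Inverse mapping: output pixel (x, y) -> source point (u, v)."""
    dx, dy = x - cx, y - cy  # move the swirl centre to (0, 0)
    angle = a * math.exp(-(dx * dx + dy * dy) / (b * b))
    u = math.cos(angle) * dx + math.sin(angle) * dy
    v = -math.sin(angle) * dx + math.cos(angle) * dy
    return u + cx, v + cy  # move back


def bilinear(image, u, v):
    """Sample the image at a fractional position, clamping to the border."""
    h, w = len(image), len(image[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def swirl(image, a, b):
    """Loop over the output pixels and pull each colour from the source."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    return [[bilinear(image, *swirl_source(x, y, cx, cy, a, b))
             for x in range(w)] for y in range(h)]


# Small gradient image, swirled by up to 1.5 radians near the centre.
src = [[float(x + y) for x in range(8)] for y in range(8)]
out = swirl(src, a=1.5, b=3.0)
```

For colour images the same mapping is applied to each channel.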
There is a Java implementation of a lot of image filters/effects at Jerry's Java Image Filters. Maybe you can take inspiration from there.
The swirl and others like it are a matrix transformation on the pixel locations. You make a new image and get the color from a position on the image that you get from multiplying the current position by a matrix.
The matrix is dependent on the current position.
Here is a good CodeProject article showing how to do it:
http://www.codeproject.com/KB/GDI-plus/displacementfilters.aspx
There is also a new graphics library with many features:
http://code.google.com/p/picasso-graphic/
Take a look at ImageMagick. It's an image conversion and editing toolkit and has interfaces for all popular languages.
The -displace operator can create swirls with the correct displacement map.
If you are for some reason not satisfied with the ImageMagick interface, you can always take a look at the source code of the filters and go from there.

Resources