Camera calibration: the projection matrix - matrix

I have been working on a 3D scanner for a while now and I still have some questions about the projection matrix I want to clear out before I continue.
I understand the fact that this matrix describes the relation between the camera coordinate system and the world coordinate system. Yet I don't understand why all the calibration software packages give you this matrix? Does the software just picks a random world coordinate system in space and does it calculate the matrix afterwards?
I was thinking it would be way easier to choose the world coordinate system by yourself (if it is even possible). My plan is to create a scanner where the object stands still on a static surface and where the camera + laser moves around the object in a circular movement. If it would be possible to create your projection matrix this way so the world coordinate system is nicely placed in the middle of the static platform.
If I'm not very clear, let me know and I'll add an image.
Hopefully someone can clear things a little bit up for me so I can make some progress :).
Kind regards
Ruts

The matrix after camera calibration give you relation between two cameras (stereo vision) and it consist of intrinsic and extrinsic of camera. The matrix convert your image to 3D coordinate system and give you depth of objects.
The are number of video on youtube about 3D scanner.
http://www.youtube.com/watch?v=AYq5n7jwe40 or http://www.youtube.com/watch?v=H3WzY8EWM9s

Related

Matching 2D image pixels in corresponding 3D point cloud

I want to match pixels of calibrated 3D lidar and 2D camera data. I will use this to train a network. Can this be considered as labeled data with this matching? If it is, is there anyone to help me to achive this? Any suggestions will be appreciated.
On a high level, assuming you have some transformation (rotation/translation) between your camera and your lidar, and the calibration matrix of the camera, you have a 3D image and a 2D projection of it.
That is, if you project the 3D pointcloud onto the the image plane of the camera, you will have a (x,y)_camera (point in camera frame) for every (RGB)D_world == (x,y,z)_world) point.
Whether this is helpful to train on depends on what you're trying to achieve; if you're trying to find where the camera is or calibrate it, given (RGB)D data and image(s), that could be done better with a Perspective-n point algorithm (the lidar could make it easier, perhaps, if it built up a "real" view of the world to compare against). Whether it would be considered labeled data, depends on how you are trying to label it. They both say very similar things.

Using three.js, how would you project a globe world to a map on the screen?

I am curious about the limits of three.js. The following question is asked mainly as a challenge, not because I actually need the specific knowledge/code right away.
Say you have a game/simulation world model around a sphere geometry representing a planet, like the worlds of the game Populous. The resolution of polygons and textures is sufficient to look smooth when the globe fills the view of an ordinary camera. There are animated macroscopic objects on the surface.
The challenge is to project everything from the model to a global map projection on the screen in real time. The choice of projection is yours, but it must be seamless/continuous, and it must be possible for the user to rotate it, placing any point on the planet surface in the center of the screen. (It is not an option to maintain an alternative model of the world only for visualization.)
There are no limits on the number of cameras etc. allowed, but the performance must be expected to be "realtime", say two-figured FPS or more.
I don't expect ayn proof in the form of a running application (although that would be cool), but some explanation as to how it could be done.
My own initial idea is to place a lot of cameras, in fact one for every pixel in the map projection, around the globe, within a Group object that is attached to some kind of orbit controls (with rotation only), but I expect the number of object culling operations to become a huge performance issue. I am sure there must exist more elegant (and faster) solutions. :-)
why not just use a spherical camera-model (think a 360° camera) and virtually put it in the center of the sphere? So this camera would (if it were physically possible) be wrapped all around the sphere, looking toward the center from all directions.
This camera could be implemented in shaders (instead of the regular projection-matrix) and would produce an equirectangular image of the planet-surface (or in fact any other projection you want, like spherical mercator-projection).
As far as I can tell the vertex-shader can implement any projection you want and it doesn't need to represent a camera that is physically possible. It just needs to produce consistent clip-space coordinates for all vertices. Fragment-Shaders for lighting would still need to operate on the original coordinates, normals etc. but that should be achievable. So the vertex-shader would just need compute (x,y,z) => (phi,theta,r) and go on with that.
Occlusion-culling would need to be disabled, but iirc three.js doesn't do that anyway.

the imu /location algorithm used by tango?

my use case is only concerned with locationing, in fact only 2-d locationing. so a lot of the cool capabilities in tango are probably not useful to me. so I'm trying to see if i could implement the location algorithm myself.
from teardown reports it seems the 9dof sensors are pretty commodity hardware. the basic integration-based location algorithm (even with magnetic field calibration) has been mature knowledge. what algorithm does tango use?
from the description it seems that tango tries to aid in navigation by using the images it sees as a reference, sort of like the "terrain-following" mode in cruise missiles, is this right? this would be too ccomplex for me to implemente
You may easily get 2D position using the TangoPoseData with the correct coordinate system:
Project Tango uses a right-handed, local-level frame for the START_OF_SERVICE and AREA_DESCRIPTION coordinate frames. This convention sets the Z-axis aligned with gravity, with Z+ pointed upwards, and the X-Y plane is perpendicular to gravity and locally level with the ground plane. This local-level convention is based on the local east-north-up (ENU) earth-based coordinate system. Instead of true north, Project Tango uses the direction the back of the device is pointed when the service started as the Y axis, and the X axis is pointed to the right. The START_OF_SERVICE and AREA_DESCRIPTION base coordinate frames of the API will use this local-level frame convention.
Said more simply, use the pose data y/x coordinates for your space as you would latitude/longitude for the earth.
Heading data is also derived from the TangoPoseData and can be converted from quaternion to euler angles. Euler angles may be easier for you to use in your 2D location app.
Tango uses 3D to increase the confidence of its position within the space...even if you don't need 3D. I would let Tango do the hard stuff and extract the 2D position so you can focus on your app.
Tango uses the camera images to detect any change in position. And uses the IMU for device rotation and acceleration. Try blocking the camera and using the Motion Tracking app, it will fail.

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an Augmented Reality application and stuck on a problem that seem quite easy but is very hard to me ... The problem is as follow:
My device's camera is calibrated and detect a 2D marker (such as a QRCode). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker and the coordinates of the 4 corners of the marker and of it center on the 2D image I got from the camera. See the following image:
On the image, we know the a,b,c,d distances and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera according to the marker (as represented on the image, the origin is the center of the marker).
Is there an easy and fast way to do so? I tried some method imagined by myself (using Al-Kashi's formulas), but this ended with too much errors :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. This code consists in one header file and one source file, plus one file for the usage example, so this shouldn't be too hard to include in your code.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although this would add a pretty big dependency to your project, the current version of OpenCV has a builtin function called solvePnP, which does what you want.
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It is using the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.

Is it possible to use GIS terrain vector data in three.js?

I'm new to three.js and WebGL in general.
The sample at http://css.dzone.com/articles/threejs-render-real-world shows how to use raster GIS terrain data in three.js
Is it possible to use vector GIS data in a scene? For example, I have a series of points representing locations (including height) stored in real-world coordinates (meters). How would I go about displaying those in three.js?
The basic sample at http://threejs.org/docs/59/#Manual/Introduction/Creating_a_scene shows how to create a geometry using coordinates - could I use a similar approach with real-world coordinates such as
"x" : 339494.5,
"y" : 1294953.7,
"z": 0.75
or do I need to convert these into page units? Could I use my points to create a surface on which to drape an aerial image?
I tried modifying the simple sample but I'm not seeing anything (or any error messages): http://jsfiddle.net/slead/KpCfW/
Thanks for any suggestions on what I'm doing wrong, or whether this is indeed possible.
I did a number of things to get the JSFiddle show something.. here: http://jsfiddle.net/HxnnA/
You did not specify any faces in your geometry. In this case I just hard-coded a face with all three of your data points acting as corner. Alternatively you can look into using particles to display your data as points instead of faces.
Set material to THREE.DoubleSide. This is not usually needed or recommended, but helps debugging in early phases, when you can see both sides of a face.
Your camera was probably looking in a wrong direction. Added a lookAt() to point it to the center and made the field of view wider (this just makes it easier to find things while coding).
Your camera near and far planes were likely off-range for the camera position and terrain dimensions. So I increased the far plane distance.
Your coordinate values were quite huge, so I just modified them by hand a bit to make sense in relation to the camera, and to make sure they form a big enough triangle for it to be seen in camera. You could consider dividing your coordinates with something like 100 to make the units smaller. But adjusting the camera to account for the huge scale should be enough too.
Nothing wrong with your approach, just make sure you feed the data so that it makes sense considering the camera location, direction and near + far planes. Pay attention to how you make the faces. The parameters to Face3 is the index of each point in your vertices array. Later on you might need to take winding order, normals and uvs into account. You can study the geometry classes included in Three.js for reference.
Three.js does not specify any meaning to units. Its just floating point numbers, and you can decide yourself what a unit (1.0) represents. Whether it's 1mm, 1 inch or 1km, depends on what makes the most sense considering the application and the scale of it. Floating point numbers can bring precision problems when the actual numbers are extremely small or extremely big. My own applications typically deal with stuff in the range from a couple of centimeters to couple hundred meters, and use units in such a way that 1.0 = 1 meter, that has been working fine.

Resources