I'm using WIC and Direct2D (via SharpDX) to composite photos into video frames. For each frame I have the exact coordinates where each corner will be found. While the photos themselves are a standard aspect ratio (e.g. 4:3 or 16:9) the insertion points are not -- they may be rotated, scaled, and skewed.
Now I know in Direct2D I can apply matrix transformations to accomplish this... but I'm not exactly sure how. The examples I've seen are more about applying specific transformations (e.g. rotate 30 degrees) than trying to match an exact destination.
Given that I know the exact coordinates (A,B,C,D) above, is there an easy way to map the source image onto the target? Alternately how would I generate the matrix given the source and destination coordinates?
If Direct3D is an option, all you will need to do is to render the quadrilateral as two triangles (with the frog texture mapped onto it).
To make sure there are no artifacts, render the quad as an indexed mesh, like in the example here (note that it shares vertex 0 and vertex 2 across both triangles). Of course, you can replace the actual vertex coordinates with A, B, C and D.
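Since the linked example is not reproduced here, the following is a minimal sketch of such an indexed quad, written as plain Python data rather than real Direct3D calls. The corner values, the UV assignments, and the names `QUAD_VERTICES`, `QUAD_INDICES` and `triangles` are illustrative assumptions. The corners A, B, C, D are assumed to run around the quad.

```python
# Hypothetical destination corners A, B, C, D in pixels, listed in order
# around the quad (assumed: top-left, top-right, bottom-right, bottom-left).
A, B, C, D = (120.0, 80.0), (520.0, 40.0), (560.0, 400.0), (100.0, 360.0)

# One vertex per corner: (x, y, u, v). The texture coordinates pin each
# corner of the photo to the matching corner of the destination quad.
QUAD_VERTICES = [
    (*A, 0.0, 0.0),  # vertex 0
    (*B, 1.0, 0.0),  # vertex 1
    (*C, 1.0, 1.0),  # vertex 2
    (*D, 0.0, 1.0),  # vertex 3
]

# Two triangles that share the diagonal from vertex 0 to vertex 2.
QUAD_INDICES = [0, 1, 2, 0, 2, 3]


def triangles(vertices, indices):
    """Expand an index list into a list of triangles (3 vertices each)."""
    return [tuple(vertices[i] for i in indices[n:n + 3])
            for n in range(0, len(indices), 3)]
```

Because both triangles reference the same two vertex records along the diagonal, the shared edge is built from identical positions and no seam can open between them.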
To begin, you can check out these tutorials for SlimDX, an excellent set of .NET bindings to DirectX.
It is not really possible to achieve this with Direct2D. It might be possible with Direct2D 1.1 (from Win8 Metro) with a custom vertex shader, but in the end, as ananthonline suggests, it will be much easier to do it with Direct3D 11.
You can also use triangle strip primitives, which are easier to set up (you don't need to create an index buffer). For the coordinates, you can send them directly to a vertex shader without any transforms (the vertex shader passes the input position straight through as SV_POSITION). You just have to map your coordinates into x [-1,1] and y [-1,1]. I suggest you start with the SharpDX MiniCubeTexture sample and change the matrix to perform an orthographic projection (instead of the sample's perspective projection).
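The coordinate mapping mentioned above can be sketched as follows. This is only an illustration in Python, not SharpDX code. It assumes pixel coordinates with the origin at the top-left and y pointing down, corners listed in order around the quad, and the helper names `pixel_to_ndc` and `quad_as_triangle_strip` are made up for this example.

```python
def pixel_to_ndc(x, y, width, height):
    """Map a pixel coordinate (origin top-left, y down) to normalized
    device coordinates in [-1, 1] (origin at the center, y up)."""
    return (2.0 * x / width - 1.0, 1.0 - 2.0 * y / height)


def quad_as_triangle_strip(a, b, c, d, width, height):
    """Return four NDC positions in triangle-strip order.

    a, b, c, d are assumed to run around the quad, so the strip visits
    them as a, b, d, c, which yields triangles (a, b, d) and (b, d, c).
    """
    return [pixel_to_ndc(x, y, width, height) for (x, y) in (a, b, d, c)]
```

Note that the strip swaps the last two corners: feeding A, B, C, D in their original order would produce a bow-tie shape instead of a filled quad.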
Related
I am working on fusing lidar and camera images in order to run an object classification algorithm using a CNN.
I want to use the KITTI dataset, which provides synchronized lidar and RGB image data. Lidars are 3D scanners, so the output is a 3D point cloud.
I want to use depth information from the point cloud as a channel for the CNN, but I have never worked with point clouds, so I am asking for some help. Will projecting the point cloud onto the camera image plane (using the projection matrix provided by KITTI) give me the depth map that I want? Is the Python library pcl useful, or should I move to C++ libraries?
If you have any suggestions, thank you in advance.
I'm not sure what the projection matrix provided by KITTI includes, so the answer is: it depends. If this projection matrix only contains a transformation matrix, you cannot generate a depth map from it. The 2D image has distortion that comes from the camera, while the point cloud usually has none, so you cannot "precisely" map the point cloud to the RGB image without the intrinsic and extrinsic parameters.
PCL is not required to do this.
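As a rough illustration of why NumPy alone is enough, here is a minimal sketch of the projection step. It assumes you already have a 4x4 extrinsic transform from the lidar frame to the camera frame and a 3x3 pinhole intrinsic matrix, and it ignores lens distortion. The function name and parameters are hypothetical, and how KITTI's calibration files map onto these two matrices is not covered here.

```python
import numpy as np


def project_to_depth_map(points_xyz, cam_from_lidar, intrinsics, height, width):
    """Project lidar points into a camera image, keeping the nearest depth per pixel.

    points_xyz:     (N, 3) points in the lidar frame, in meters.
    cam_from_lidar: (4, 4) extrinsic transform, lidar frame -> camera frame (assumed given).
    intrinsics:     (3, 3) pinhole camera matrix (assumed; distortion is ignored).
    Returns a (height, width) array that is 0 wherever no point landed.
    """
    n = points_xyz.shape[0]
    homogeneous = np.hstack([points_xyz, np.ones((n, 1))])
    cam = (cam_from_lidar @ homogeneous.T).T[:, :3]  # points in the camera frame

    cam = cam[cam[:, 2] > 0.0]                       # keep points in front of the camera
    pixels = (intrinsics @ cam.T).T
    u = np.floor(pixels[:, 0] / pixels[:, 2]).astype(int)
    v = np.floor(pixels[:, 1] / pixels[:, 2]).astype(int)
    z = cam[:, 2]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-z)  # write far points first so nearer ones overwrite them
    depth[v[order], u[order]] = z[order]
    return depth
```

The result is sparse: only pixels hit by a lidar return carry a depth value, so some form of interpolation is usually needed before using it as a dense CNN channel.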
A depth map essentially maps depth values onto the RGB image. You can treat each point in the point cloud (each laser return of the lidar) as a pixel of the RGB image. Therefore, I think all you need to do is find which point in the point cloud corresponds to the first pixel (top-left corner) of the RGB image, then read the depth values from the point cloud based on the RGB image resolution.
You don't need the camera for this. This is all about the point cloud data. Let's say you have 10 million points and each point has x, y, z in meters. If the data is not in meters, convert it first. Then you need the position of the lidar. When you subtract that position from every point, the lidar ends up at (0,0,0), and you can then simply draw the points onto a white image.
The rest is simple math, and there may be many ways to do it. The first that comes to my mind: think of RGB as binary digits. Let's say 1 cm corresponds to a change of 1 in blue, 256 cm to a change of 1 in green, and 256x256 = 65536 cm to a change of 1 in red. Since the sensor is at (0,0,0), a point whose RGB value is (1,0,0) is 256x256x1 + 256x0 + 1x0 = 65536 cm away from it. This could be done in C++. You can also use interpolation and closest-point algorithms to fill in blanks if there are any.
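The RGB encoding described above can be sketched like this, assuming 8 bits per channel and distances rounded to whole centimeters. The function names are made up for illustration.

```python
def encode_distance_cm(distance_cm):
    """Pack a distance in whole centimeters into (r, g, b), 8 bits each."""
    d = int(round(distance_cm))
    if not 0 <= d < 256 ** 3:
        raise ValueError("distance does not fit in 24 bits")
    # Red holds multiples of 65536 cm, green multiples of 256 cm, blue single cm.
    return (d >> 16) & 0xFF, (d >> 8) & 0xFF, d & 0xFF


def decode_distance_cm(r, g, b):
    """Recover the distance in centimeters from an (r, g, b) triple."""
    return r * 256 * 256 + g * 256 + b
```

With 24 bits this scheme covers distances up to 16777215 cm, far more than a lidar's range, so no precision is lost beyond the rounding to 1 cm.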
I'm currently working on a project that uses shadowtextures to render shadows.
It was pretty easy for spotlights, since only 1 texture in the direction of the spotlight is needed, but it's a little more difficult for point lights, since they need either 6 textures in all directions or 1 texture that somehow renders all the objects around the point light.
And that's where my problem is. How can I generate a projection matrix that somehow renders all the objects in a 360 degree angle around the point light?
Basically, how do I create a fisheye (or any other 360 degree camera) vertex shader?
How can I generate a projection matrix that somehow renders all the objects in a 360 degree angle around the point light?
You can't. A 4x4 projection matrix in homogeneous space cannot represent any operation which would result in bending the edges of polygons. A straight line stays a straight line.
Basically, how do I create a fisheye (or any other 360 degree camera) vertex shader?
You can't do that either, at least not in the general case. And this is not a limit of the projection matrix in use, but a general limit of the rasterizer. You could of course put the formula for fisheye distortion into the vertex shader. But the rasterizer will still rasterize each triangle with straight edges, you just distort the position of the corner points of each triangle. This means that it will only be correct for tiny triangles covering a single pixel. For larger triangles, you completely screw up the image. If you have stuff like T-joints, this even results in holes or overlaps in objects which actually should be perfectly closed.
It was pretty easy for spotlights, since only 1 texture in the direction of the spotlight is needed, but it's a little more difficult for point lights, since they need either 6 textures in all directions or 1 texture that somehow renders all the objects around the point light.
The correct solution for this would be to use a single cube map texture, which provides 6 faces. In a perfect cube, each face can then be rendered by a standard symmetric perspective projection with a field of view of 90 degrees both horizontally and vertically.
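To make the six projections concrete, here is a small NumPy sketch that builds one view-projection matrix per cube face. It assumes OpenGL-style matrices acting on column vectors, and the face directions and up vectors follow the commonly used OpenGL cube map convention. The function names and the near/far defaults are illustrative assumptions.

```python
import numpy as np


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (column-vector convention)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])


def look_at(eye, direction, up):
    """View matrix for a camera at `eye` looking along `direction`."""
    fwd = np.array(direction, dtype=float)
    fwd /= np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = right, true_up, -fwd
    view[:3, 3] = -(view[:3, :3] @ np.array(eye, dtype=float))
    return view


# (direction, up) for the faces +X, -X, +Y, -Y, +Z, -Z (assumed convention).
CUBE_FACES = [
    ((1, 0, 0), (0, -1, 0)), ((-1, 0, 0), (0, -1, 0)),
    ((0, 1, 0), (0, 0, 1)), ((0, -1, 0), (0, 0, -1)),
    ((0, 0, 1), (0, -1, 0)), ((0, 0, -1), (0, -1, 0)),
]


def cube_face_matrices(light_pos, near=0.1, far=100.0):
    """One view-projection matrix per cube face, all centered on the light."""
    proj = perspective(90.0, 1.0, near, far)  # 90 degree FOV, square aspect
    return [proj @ look_at(light_pos, d, up) for d, up in CUBE_FACES]
```

With a 90 degree field of view and an aspect ratio of 1, the six frusta tile the full sphere around the light without gaps or overlap.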
In modern OpenGL, you can use layered rendering. In that case, you attach each of the 6 faces of the cube map as a single layer to an FBO, and you can use the geometry shader to amplify your geometry 6 times and transform it according to the 6 different projection matrices, so that you still only need one render pass for the complete shadow map.
There are some other vendor-specific extensions which might be used to further optimize the cube map rendering, like Nvidia's NV_viewport_swizzle (available on Maxwell and newer GPUs), but I only mention this for completeness.
Is there any way to render a different set of screen coordinates than the standard equidistant grid between -1,-1 to 1,1?
I am not talking about a transformation that can be accomplished by transformations in the vertex shader.
Specifically ES2 would be nice, but any version is fine.
Is this even directly OpenGL related, or is the standard grid typically provided by plumbing libraries?
No, there isn't any other way. The values you write to gl_Position in the vertex (or tessellation or geometry) shader are clip space coordinates. The GPU will convert these to normalized device space (the "[-1,1] grid") by dividing by the w coordinate (after the actual primitive clipping, of course) and will finally use the viewport parameters to transform the results to window space.
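The two fixed-function steps described above (perspective divide, then viewport transform) can be written out as a short Python sketch. Clipping and the depth range are deliberately left out, and the function name is made up for illustration.

```python
def clip_to_window(clip, viewport):
    """Follow a clip-space position through the fixed-function steps.

    clip:     (x, y, z, w) as written to gl_Position.
    viewport: (x0, y0, width, height) as passed to glViewport.
    Returns the window-space (x, y); depth and clipping are omitted.
    """
    x, y, _z, w = clip
    ndc_x, ndc_y = x / w, y / w               # perspective divide
    x0, y0, width, height = viewport
    win_x = x0 + (ndc_x + 1.0) * 0.5 * width  # viewport transform
    win_y = y0 + (ndc_y + 1.0) * 0.5 * height
    return win_x, win_y
```

For example, the clip-space origin with w = 1 lands in the middle of the viewport, whatever its size.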
There is no way to directly use window coordinates when you want to use the rendering pipeline. There are some operations which bypass large parts of that pipeline, like framebuffer blitting, which provides a limited way to draw some things directly to the framebuffer.
However, working with pixel coordinates isn't hard to achieve. You basically have to invert the transformations the GPU will be doing. While typically "ortho" matrices are used for this, the only required operations are a scale and a translation, which boils down to a fused multiply-add per component, so you can implement this very efficiently in the vertex shader.
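As a sketch of that scale-and-translate, the two constants can be computed once per viewport and then applied with one multiply-add per component. The Python below stands in for what a vertex shader would do with two uniforms. It assumes the viewport starts at (0, 0) and that pixel coordinates use the bottom-left origin of OpenGL window space. The names are hypothetical.

```python
def pixel_to_clip_constants(viewport_width, viewport_height):
    """Per-axis (scale, bias) such that clip = pixel * scale + bias, with w = 1."""
    scale = (2.0 / viewport_width, 2.0 / viewport_height)
    bias = (-1.0, -1.0)
    return scale, bias


def pixel_to_clip(px, py, scale, bias):
    """Apply one multiply-add per component and emit a clip-space position."""
    return (px * scale[0] + bias[0], py * scale[1] + bias[1], 0.0, 1.0)
```

Because w is fixed at 1, the perspective divide leaves the values unchanged and the viewport transform then maps them back to the original pixel coordinates.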
I'm writing a DirectX 11 overlay for a game. Creating textures is quite simple and I have good knowledge of C/C++.
The problem I am having is in my test window I can print the texture but as soon as I change the camera angle the texture moves with it. That is what most people want.
What I want to know is how do I print something in 2D to always appear on screen whether the camera moves or not?
Basically, since you use dx11, you use shaders to render your elements.
So standard 3d objects generally follow this guideline:
-Use 3 transforms : world (position object), view (transform in camera space), projection (transform in screen space).
In your vertex shader you multiply all of these together to convert from 3D to 2D.
Since now what you want is to display your elements in 2d (non relative to camera), you can easily create a new shader that doesn't take view/projection into account, so you just don't use those matrices in your vertex shader. (you can still use world for 2d transformation).
That's pretty much the easiest way. If you need pixel-precise 2D elements, you need to create a billboard transform/shader. Basically, you have your render target resolution, and standard render space is -1 -> 1, so you adjust the scale/translation to convert between those two spaces.
When you render your overlay, also make sure that you disable depth completely.
If you need a sample, let me know and I'll make one up quickly, but it should be quite simple.
Basically I was trying to achieve this: impose an arbitrary image onto a pre-defined uneven surface. (See examples below.)
-->
I do not have a lot of experience with image processing or 3D algorithms, so here is the best method I can think of so far:
Predefine a set of coordinates (say we have a 10x10 grid: that gives 100 coordinates starting with (0,0), (0,10), (0,20), etc.). There will be 9x9 = 81 grid cells.
Record the transformations for each individual coordinate on the t-shirt image e.g. (0,0) becomes (51,31), (0, 10) becomes (51, 35), etc.
Triangulate the original image into 81x2 = 162 triangles (with 2 triangles for each grid cell). Transform each triangle of the image based on the coordinate transformations obtained in Step 2 and draw it on the t-shirt image.
Problems/questions I have:
I don't know how to smooth out each triangle so that the image on the t-shirt does not look ragged.
Is there a better way to do it? I want to make sure I'm not reinventing the wheel here before I proceed with an implementation.
Thanks!
This is called digital image warping. There was a popular graphics text on it in the 1990s (probably from somebody's thesis). You can also find an article on it from Dr. Dobb's Journal.
Your process is essentially correct. If you work pixel by pixel, rather than trying to use triangles, you'll avoid some of the problems you're facing. Scan across the pixels in the target bitmap, and apply the local transformation based on the cell you're in to determine the coordinate of the corresponding pixel in the source bitmap. Copy that pixel over.
For a smoother result, you do your coordinate transformations in floating point and interpolate the pixel values from the source image using something like bilinear interpolation.
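A minimal sketch of that pixel-by-pixel approach with bilinear interpolation might look like the following. It assumes a single-channel (grayscale) NumPy image and takes the target-to-source mapping as a caller-supplied function, since working out which grid cell a target pixel falls into is specific to your data. All names here are hypothetical.

```python
import numpy as np


def bilinear_sample(image, x, y):
    """Sample a 2-D grayscale image at fractional (x, y), clamping to the border."""
    h, w = image.shape[:2]
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two neighboring rows, then vertically.
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom


def warp(source, target_shape, target_to_source):
    """Fill each target pixel by looking up its (fractional) source location."""
    out = np.zeros(target_shape, dtype=float)
    for ty in range(target_shape[0]):
        for tx in range(target_shape[1]):
            sx, sy = target_to_source(tx, ty)
            out[ty, tx] = bilinear_sample(source, sx, sy)
    return out
```

Iterating over target pixels rather than source pixels guarantees that every output pixel gets exactly one value, which is what avoids the holes and ragged edges of forward-mapped triangles.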
It's not really a solution to the problem, just a workaround:
If you have a 3D model that represents the t-shirt, you can use DirectX/OpenGL and apply your image as a texture on the t-shirt.
Then you can ask it to render the picture you want from any point of view.