I am working on a Kinect game where I am supposed to "dress" the player in a kind of garment.
As the player should always stand directly in front of the device, I am using a simple jpg file for this "dressing".
My problem starts when the user, while still standing in the frontal position, bends the knees or leans right or left. I want to apply an appropriate transform to this "dress" image so that it still covers the player's body more or less correctly.
From the Kinect sensors I can get current information about the positions of the following player body parts:
Is there any library (C++, C#, Java) or a known algorithm that can make such transformation?
Complex task but possible.
I would split the 'dress' into arms, torso/upper body, and lower body. You could then use (from memory) AffineTransform in Java, though most languages have libraries for applying matrix transforms to images.
The reason I suggest splitting the image is that a single transform would distort the top part of the image; splitting lets you rotate each piece (for when people lean) and warp the arms as they move as well.
EDIT:
I would also NOT transform each frame (CPU intensive). I would precompute a lookup table of the possible angles and look the pre-transformed image up each frame (a sketch of this follows).
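For illustration, here is a minimal sketch of that idea in Python with OpenCV, assuming the garment is already split into per-limb sprites and the joint positions come from the Kinect skeleton each frame; the sprite and function names are illustrative, not part of any SDK. It precomputes rotated versions of a sprite at coarse angle steps and picks the nearest one per frame instead of warping every frame:

```python
import math
import cv2

ANGLE_STEP = 5  # degrees; coarser step = smaller lookup table

def build_rotation_table(sprite):
    """Precompute rotated copies of a garment sprite (e.g. an arm piece)."""
    h, w = sprite.shape[:2]
    center = (w / 2, h / 2)
    table = {}
    for angle in range(-90, 91, ANGLE_STEP):
        m = cv2.getRotationMatrix2D(center, angle, 1.0)
        table[angle] = cv2.warpAffine(sprite, m, (w, h))
    return table

def limb_angle(joint_a, joint_b):
    """Angle of the limb between two skeleton joints, in degrees."""
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx))

def pick_rotated_sprite(table, angle):
    """Snap the measured angle to the nearest precomputed entry."""
    snapped = int(round(angle / ANGLE_STEP)) * ANGLE_STEP
    snapped = max(-90, min(90, snapped))
    return table[snapped]

# Usage sketch: arm_sprite loaded from the split-up dress image,
# shoulder/elbow positions coming from the Kinect skeleton each frame.
# table = build_rotation_table(arm_sprite)
# rotated = pick_rotated_sprite(table, limb_angle(shoulder, elbow))
```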
Can someone tell me how (or the name of it, so that I could look it up) I can implement this interpolation effect? https://www.youtube.com/watch?v=36lE9tV9vm0&t=3010s&frags=pl%2Cwn
I tried using r = r+dr, g = g+dg and b = b+db for the RGB values in each iteration, but it looks way too simple compared to the effect in the video.
"Can someone tell me how I can implement this interpolation effect?
(or the name of it, so that I could look it up)..."
It's not actually a named interpolation effect. It appears to interpolate, but really it's just real-time updated variations of some fictional facial "features" (the hair, eyes, nose, etc. are synthesized pixels taking hints from a library/database of possible matching feature types).
For this technique they used neural networks to do a process similar to DFT image reconstruction. You'll be modifying the image data in the frequency domain (with u,v), not in the spatial domain (using x,y).
You can read about it at this PDF: https://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf
The (Python) source code:
https://github.com/tkarras/progressive_growing_of_gans
For ideas, on YouTube you can look up:
DFT image reconstruction (there's a good example with a b/w Nicolas Cage photo reconstructed in stages. Loud music warning).
Image synthesis with neural networks (one clip had alternative shoe and handbag designs (item photos) being "synthesized" by a neural network after it analyzed features from other existing catalogue photos as "inspiration").
Image enhancement / super resolution using neural networks. This method is closest to answering your question. One example has a very low-res, blurry, pixelated b/w image where you cannot tell if it is a boy or a girl. During a test, the network synthesizes various higher-quality face images that it thinks are the correct match for the test input.
After understanding what they achieve and how, you could think of shortcuts to get a similar effect without needing networks, e.g. using only regular pixel editing functions.
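As one concrete shortcut in that spirit, here is a minimal sketch (Python/NumPy assumed, greyscale input as a 2D array) of progressive DFT reconstruction: keep only the lowest frequencies at first and add more each step, giving the coarse-to-fine "emerging" look of the Nicolas Cage clip:

```python
import numpy as np

def progressive_dft_frames(image, steps=10):
    """Yield coarse-to-fine reconstructions by keeping a growing
    low-frequency disc of the image's 2D DFT."""
    f = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    h, w = image.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - cy, xx - cx)
    max_r = dist.max()
    for step in range(1, steps + 1):
        radius = max_r * step / steps
        mask = dist <= radius          # keep frequencies inside the disc
        partial = np.fft.ifft2(np.fft.ifftshift(f * mask))
        yield np.clip(partial.real, 0, 255).astype(np.uint8)

# Usage sketch: iterate the generator and display/save each frame.
# for i, frame in enumerate(progressive_dft_frames(grey_image)):
#     ...
```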
Found it in another video; it is called "latent space interpolation", and it has to be applied to the compressed (encoded) images. If I have image A and the next image is image B, I first have to encode A and B, interpolate between the encoded data, and finally decode the result.
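A minimal sketch of that encode/interpolate/decode pipeline, assuming you already have some trained encoder/decoder pair (an autoencoder, a GAN with an inversion step, etc.); `encode` and `decode` here are placeholders, not functions from any specific library:

```python
import numpy as np

def interpolate_latent(z_a, z_b, steps=30):
    """Linearly blend two latent codes; returns one code per frame."""
    return [(1.0 - t) * z_a + t * z_b
            for t in np.linspace(0.0, 1.0, steps)]

def morph(image_a, image_b, encode, decode, steps=30):
    """Encode both images, interpolate in latent space, decode each step."""
    z_a, z_b = encode(image_a), encode(image_b)
    return [decode(z) for z in interpolate_latent(z_a, z_b, steps)]
```

Linear blending is the simplest choice; spherical interpolation (slerp) between the latent codes is often reported to look smoother for GAN latents, but the overall structure stays the same.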
As of today, I found out that this kind of interpolation effect can be implemented easily for 3D image data. That is, if the image data is available in a normalized form centred at the 3D origin, for example inside a unit sphere around the origin with each face image's data stored inside that sphere. With the data of two images stored this way, the interpolation can be calculated by taking the differences of rays going through the origin and through each area of the sphere at some desired resolution.
Problem statement:
I want to grab a smartphone, take a series of photos (or a video) of an object, and convert it to a 360 degree photo.
Some Research:
If we look at Facebook 360 Photos, this is exactly what I'm looking for, except that Facebook's solution is outward-facing 360 photos, and I'm looking for inward-facing 360 photos.
This objective seems to be similar to 360 degree product photography. Important difference: I do not want to use any special equipment other than a smartphone. Just like you can create a 360 degree outward facing photo without needing a tripod or a turntable.
I want to understand from the community:
Does a solution like this exist? What's the best we can do at the moment?
What kind of technological expertise would a person require to create something like this? Consider yourself an investor or a CEO who needs to get this built. Who do you hire? Who do you consult?
Thanks a lot for the help.
There is a fundamental difference between the two cases:
"Outward": the translation of the camera is small. If the scene is far enough away, it can be ignored, and the camera motion can be approximated with a rotation about its focal point (there is almost no parallax between views). The mapping from one image to any other image is well approximated a homography, and the image set maps naturally to the inner surface of a sphere (or, aproximately, a cylinder, a cube, etc.). A scene far away will also appear to move slowly, therefore capture time is less of a factor when stitching images.
"Inward": the translation is large and cannot be ignored. There is parallax, the scene objects may self-occlude or mutually occlude each other in some of the images, making "stitching" highly nontrivial - mapping of one image onto the other depends on the scene content, unlike the outward case. If the content of the scene moves, stitching becomes an even harder problem.
In both cases, however, one normally relies on bundle adjustment for the final refinement of the camera poses/positions. In the second case the 3D geometry of the scene may need to be reconstructed, depending on the application.
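For the outward case, homography-based stitching is already packaged in common libraries. A minimal sketch using OpenCV's high-level stitcher (assuming a list of overlapping photos taken by rotating the phone roughly in place; the file names are placeholders):

```python
import cv2

def stitch_outward(paths):
    """Stitch overlapping photos taken by rotating the camera in place."""
    images = [cv2.imread(p) for p in paths]
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, panorama = stitcher.stitch(images)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"stitching failed with status {status}")
    return panorama

# Usage sketch:
# pano = stitch_outward(["img1.jpg", "img2.jpg", "img3.jpg"])
# cv2.imwrite("pano.jpg", pano)
```

There is no equally turnkey call for the inward case, precisely because of the parallax and occlusion issues described above; that route usually goes through structure-from-motion and multi-view reconstruction instead.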
To your questions:
Of course a solution exists: have you seen "The Matrix" with its "bullet-time" effect? A Google search for "bullet time" shows several more or less successful attempts at reproducing it - the easiest involves tying an iPhone to a string and swinging it around.
Someone with a background and expertise in photogrammetry and 3D computer vision (roughly, someone who has read and internalized Hartley & Zisserman's book or an equivalent), plus nontrivial image processing - there is some art involved in stitching correctly once you have solved the photogrammetry; it's not just "graph-cut it and then multiband-blend it".
I'm building a Starflight-inspired 2D space exploration game with a procedural world. The gameplay is divided into different "scenes" (to use Godot terminology) to manage the different "depths" of the game. For example, interstellar flight is a scene where the star systems are simply represented by star sprites. When the player gets in range, the view is moved to the solar system scene, where the player moves his ship inside the actual solar system.
So far so good, I generate the universe (the solar systems) from a hard coded array of coordinates and seeds. Now I also want to make the universe generation procedural, but I’m guessing that loading a whole universe (there is no real limit to the number of solar systems once it becomes procedural) in memory won’t be efficient.
I'm thinking of generating the universe on the first run and saving the data to a file, but I'm wondering how to load the relevant data efficiently so that I only load a certain "radius" of data around the player's ship. I feel like this would be the way to go if I use my generation algorithms that produce "realistic" galaxy shapes, since they involve many steps of data processing (different cluster shapes are generated: arms, blobs, etc., and then stars are spun around the center to simulate the galaxy's rotation, etc.) that would probably take too long to compute in real time.
I'm wondering which approach I should take for this problem. It's not really language or engine dependent, so references to generic articles and algorithms on the subject would suffice.
I also read a bit about QuadTrees and I think I’m getting to something there, but I’m not exactly sure how to use that with a file on disk.
Thanks in advance for your help!
I have some suggestions:
Do not generate the whole universe on the first run; generate only the areas that are somehow visible. Then, instead of loading the whole universe from disk, you just generate an area whenever your spaceship (or whatever) comes within view distance of it. This makes game initialization much faster and allows an (almost) infinite universe.
If you want the universe to be modifiable, store only the 'edits' that a player makes. When you need to show a part of the universe, generate that part from your seed and then overlay the stored edits. This keeps storage much smaller.
For storage on disk, have a look at R-trees, especially the R*-tree and R+-tree variants; they are designed for storing spatial data in disk pages.
As TilmannZ suggested, you should not generate the whole dataset for the galaxy when you start the game, because there is likely no need (unless the player needs to see/interact with all the data at once, e.g. all stars). If that is the case, for example for a star map, then you may be better off generating all the data once and saving the result to an image file.
Instead, you should only generate the data as needed around the player. The most obvious way to do this is to construct a grid around the player and keep it centered on the player as they move. As the player moves, you only need to update the conceptual galaxy coordinates of each cell (not the rendered coordinates). For each cell you can then use its coordinates as the input to a value or gradient noise generator (such as Perlin noise) to determine what features should spawn at that location.
As for 'shaping' the galaxy or universe, one effective way is to sample the pixel data of a greyscale image of a galaxy that has the shape you want. You could load the image's pixel data at run time and, as you generate the stars, use your grid coordinates to look up the pixel value and use it as a density factor for star generation: the whiter the pixel, the higher the star density at that location, and vice versa for black pixels. This method lets you effectively draw the shape of the galaxy in a paint program (see the sketch below).
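A minimal sketch of that combination (a deterministic per-cell seed plus an image-based density mask); Pillow is assumed for reading the greyscale image, and the file name and function names are illustrative, not from any particular engine:

```python
import random
from PIL import Image

density_map = Image.open("galaxy_shape.png").convert("L")  # greyscale mask

def cell_seed(universe_seed, cx, cy):
    """Derive a stable per-cell seed from the galaxy grid coordinates."""
    return hash((universe_seed, cx, cy)) & 0xFFFFFFFF

def stars_in_cell(universe_seed, cx, cy, max_stars=8):
    """Deterministically generate the stars of one grid cell."""
    rng = random.Random(cell_seed(universe_seed, cx, cy))
    # Map the cell onto the density image and read the pixel (0..255).
    px = cx % density_map.width
    py = cy % density_map.height
    density = density_map.getpixel((px, py)) / 255.0
    count = int(round(max_stars * density * rng.random()))
    return [(cx + rng.random(), cy + rng.random(), rng.getrandbits(32))
            for _ in range(count)]   # (x, y, per-star seed)
```

Because each cell's content depends only on the universe seed and its coordinates, nothing needs to be stored on disk except the player's edits, as described above.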
Maybe think about different layers of abstraction. Each layer uses the parent layer, designer input, events & procedural generation algorithms to generate the data it needs (a seeding sketch follows this list).
The universe layer contains user-placed or randomly placed galaxy polygons & types.
The galaxy layer can add more details (number & density of spiral arms) or a density map.
A cluster layer groups solar systems.
The solar system layer adds the stars & planets.
And only create the details for currently needed elements.
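One simple way to wire those layers together is to derive each child layer's seed from its parent's seed plus an identifier, so any layer can be regenerated on demand instead of being stored. A hedged sketch (the layer names and fields are illustrative, not from any engine):

```python
import hashlib
import random

def child_seed(parent_seed, label, index):
    """Derive a stable child seed from a parent seed and an identifier."""
    data = f"{parent_seed}:{label}:{index}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def generate_galaxy(universe_seed, galaxy_index):
    rng = random.Random(child_seed(universe_seed, "galaxy", galaxy_index))
    return {"arms": rng.randint(2, 6), "seed": rng.getrandbits(32)}

def generate_system(galaxy_seed, system_index):
    rng = random.Random(child_seed(galaxy_seed, "system", system_index))
    return {"planets": rng.randint(0, 10)}
```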
What algorithms are used for augmented reality apps like Zookazam?
I think it analyzes the image and finds planes by contrast, but I don't know how.
What topics should I read before starting with app like this?
[Prologue]
This is an extremely broad topic and mostly off-topic in its current state. I re-edited your question to make it answerable within the rules/possibilities of this site.
You should specify more closely what your augmented reality:
should do
adding 2D/3D objects with known mesh ...
changing light conditions
adding/removing body parts/clothes/hairs ...
a good idea is to provide some example image (sketch) of the input/output you want to achieve.
what input it has
video, static image, 2D, stereo, 3D. For pure 2D input, specify what conditions/markers/illumination/LASER patterns you have to help the reconstruction.
What will be in the input image? An empty room, persons, specific objects, etc.?
specify target platform
many algorithms are limited by memory size/bandwidth, CPU power, special HW capabilities, etc., so it is a good idea to add a tag for your platform. The OS and language are also good to add.
[How augmented reality works]
acquire input image
if you are connecting to some device like a camera, you need to use its driver/framework or something similar to obtain the image, or use some common API it supports. This task is OS dependent. My favorite way on Windows is to use the VFW (Video for Windows) API.
I would start with some static file(s) instead, to ease the debugging and incremental building process (you do not need to wait for the camera and stuff to happen on each build). When your app is ready for live video, switch back to the camera...
reconstruct the scene into 3D mesh
if you use 3D cameras like the Kinect then this step is not necessary. Otherwise you need to distinguish the objects by some segmentation process, usually based on edge detection or color homogeneity.
The quality of the 3D mesh depends on what you want to achieve and what your input is. For example, if you want realistic shadows and lighting, then you need a very good mesh. If the camera is fixed in some room, you can predefine the mesh manually (hard-code it) and compute just the objects in view. Object detection/segmentation can also be done very simply by subtracting the empty-room image from the current view image, so the pixels with a big difference are the objects (see the sketch at the end of this step).
you can also use planes instead of a real 3D mesh as you suggested in the OP, but then you can forget about the more realistic quality of effects like lighting, shadows, intersections... If you assume the objects are standing upright, then you can use the room metrics to obtain the distance from the camera. See:
selection criteria for different projections
estimate measure of photographed things
For pure 2D input you can also use the illumination to estimate the 3D mesh see:
Turn any 2D image into 3D printable sculpture with code
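A minimal sketch of the empty-room subtraction idea mentioned in this step (Python/OpenCV assumed; the threshold is a value you would tune for your lighting):

```python
import cv2
import numpy as np

def segment_objects(empty_room, current, threshold=30):
    """Mask the pixels that differ strongly from the empty-room reference."""
    diff = cv2.absdiff(current, empty_room)
    grey = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(grey, threshold, 255, cv2.THRESH_BINARY)
    # Clean up small speckles before using the mask.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask

# Usage sketch:
# mask = segment_objects(cv2.imread("empty.png"), cv2.imread("frame.png"))
```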
render
Just render the scene back to some image/video/screen... with the added/removed features. If you are not changing the light conditions too much, you can also use the original image and render directly onto it. Shadows can be achieved by darkening the pixels... For better results, the illumination/shadows/spots/etc. are usually filtered out from the original image first and then added back directly by rendering. See:
White balance (Color Suppression) Formula?
Enhancing dynamic range and normalizing illumination
The rendering process itself is also platform dependent (unless you are doing it by low-level graphics in memory). You can use things like GDI, DX, OpenGL, ... See:
Graphics rendering
You also need camera parameters for rendering like:
Transformation of 3D objects related to vanishing points and horizon line
[Basic topics to google/read]
2D
DIP digital image processing
Image Segmentation
3D
Vector math
Homogeneous coordinates
3D scene reconstruction
3D graphics
normal shading
platform dependent
image acquisition
rendering
Is it possible to construct a 3D model of a still object if various images, along with depth data, were gathered from various angles? What I was thinking was to have a sort of circular conveyor belt on which a Kinect would be placed, while the real object to be reconstructed in 3D space sits in the middle. The conveyor belt then rotates around the object in a circle and lots of images are captured (perhaps 10 images per second), which would allow the Kinect to capture an image from every angle, including the depth data. Theoretically this is possible. The model would also have to be recreated with the textures.
What I would like to know is whether there are any similar projects/software already available; any links would be appreciated.
Whether this is possible within perhaps 6 months.
How I would proceed to do this, e.g. any similar algorithms you could point me to.
Thanks,
MilindaD
It is definitely possible, and there are a lot of 3D scanners out there that work with more or less the same principle of stereoscopy.
You probably know this, but just to give context: the idea is to get two images of the same point from different viewpoints and to use triangulation to compute the 3D coordinates of that point in your scene. Although this is quite easy, the big issue is finding the correspondence between the points in your two images, and this is where you need good software to extract and match similar points.
There is an open-source project called Meshlab for 3D vision, which includes 3D reconstruction algorithms. I don't know the details of the algorithms, but the software is definitely a good entry point if you want to play with 3D.
I used to know some other ones; I will try to find them and add them here:
Insight3d
Check out https://bitbucket.org/tobin/kinect-point-cloud-demo/overview, which is a code sample for the Kinect for Windows SDK that does specifically this. Currently it uses the bitmaps captured by the depth sensor and iterates through the byte array to create a point cloud in PLY format that can be read by MeshLab. The next stage for us is to apply/refine a Delaunay triangulation algorithm to form a mesh instead of points, to which a texture can be applied. A third stage would then be a mesh-merging formula to combine multiple captures from the Kinect to form a full 3D object mesh.
This is based on some work I did in June using the Kinect for the purposes of 3D-printing capture.
The .NET code in this source code repository will, however, get you started with what you want to achieve.
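To make the depth-bitmap-to-point-cloud step concrete, here is a hedged sketch in Python/NumPy (the focal lengths and principal point are placeholder intrinsics to replace with your sensor's calibration); it back-projects a depth image and writes an ASCII PLY file that MeshLab can open:

```python
import numpy as np

# Placeholder intrinsics; replace with your depth camera's calibration.
FX, FY, CX, CY = 570.0, 570.0, 320.0, 240.0

def depth_to_points(depth_mm):
    """Back-project an (H, W) depth image in millimetres to 3D points in metres."""
    h, w = depth_mm.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(float) / 1000.0
    x = (us - CX) * z / FX
    y = (vs - CY) * z / FY
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

def write_ply(path, points):
    """Write points as an ASCII PLY file readable by MeshLab."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("end_header\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")
```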
Autodesk has a piece of software called "Photofly" that will do what you are asking for. It is currently in their Labs section. Using a series of images taken from multiple angles, the 3D geometry is created and then photo-mapped with your images to recreate the scene.
If you are more interested in the theoretical part of this problem (i.e. if you want to know how it works), here is a document from Microsoft Research about a moving depth camera and 3D reconstruction.
Try out VisualSfM (http://ccwu.me/vsfm/) by Changchang Wu (http://ccwu.me/)
It takes multiple images from different angles of the scene and outputs a 3D point cloud.
The algorithm is called "Structure from Motion".
Brief idea of the algorithm: it involves extracting feature points in each image, finding correspondences between them across images, building feature tracks, estimating camera matrices, and thereby recovering the 3D coordinates of the feature points.
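As an illustration of the first two of those steps (feature extraction and correspondence), here is a minimal OpenCV sketch; the remaining SfM stages (track building, camera estimation, triangulation, bundle adjustment) are what tools like VisualSFM add on top:

```python
import cv2

def match_features(img_a, img_b, max_matches=200):
    """Detect ORB keypoints in two images and return the best matches."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return kp_a, kp_b, matches[:max_matches]

# Usage sketch: the matched keypoint pairs are the correspondences that
# an SfM pipeline feeds into camera estimation and triangulation.
# kp_a, kp_b, matches = match_features(cv2.imread("a.jpg", 0),
#                                      cv2.imread("b.jpg", 0))
```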