Kinect 2 - Significant delay on hand motion - render

I'm using the Kinect 2 to rotate and zoom a virtual camera looking at a 3D object by moving my hand in all three directions. The problem I'm currently facing is that these operations are executed with a noticeable delay. When my hand is back in a steady position, the camera still continues to move for a short time. It feels as if I push the camera rather than controlling it in real time. Perhaps the frame rate is a problem: as far as I know the Kinect runs at 30 FPS while my application runs at 60 FPS (VSync enabled).
What could be the cause of this issue? How can I control my camera without any significant delay?

The Kinect is a very graphics- and processing-intensive piece of hardware. For your application I'd suggest a minimum specification of a GTX 960 and a 4th-gen i7 processor. Your hardware will be a prime factor in how fast you can process Kinect data.
You're also going to want to avoid using loops as much as possible and instead rely on multithreading; if you are looping, be certain that there are no foreach loops, as they take longer to execute. It is very important that reading the data from the Kinect and issuing the camera position commands run asynchronously, as sketched below.
The Kinect will never be truly real-time responsive. There is just too much data for it to handle; the best you can do is optimize your code and increase your hardware power to shrink the response time.
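A minimal sketch of that decoupling, assuming a hypothetical pollSensor() standing in for the real Kinect frame reader: a background thread blocks on the sensor at its own ~30 FPS rate and publishes only the newest hand pose, while the 60 FPS render loop reads that pose without ever waiting on the sensor.

#include <atomic>
#include <thread>

struct Pose { float x, y, z; };           // simplified hand position

Pose pollSensor();                        // hypothetical: blocks until a new Kinect frame
void updateCamera(const Pose& p);         // hypothetical: rotate/zoom the virtual camera
void drawScene();                         // hypothetical: render the 3D object

std::atomic<Pose> latestPose;             // last pose published by the sensor thread
std::atomic<bool> running{true};

void sensorLoop()                         // runs on its own thread
{
    while (running) {
        latestPose.store(pollSensor());   // ~30 updates per second
    }
}

void startSensorThread()
{
    std::thread(sensorLoop).detach();     // started once at application startup
}

void renderFrame()                        // called at 60 FPS; never blocks on the Kinect
{
    updateCamera(latestPose.load());
    drawScene();
}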

Related

How to speed up object movement in OpenGL without dropping movement smoothness

Here's the function that is registered as the display function via glutDisplayFunc():
// i is a global int, initialised to 0 elsewhere
void RenderFunction(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glPointSize(5);
    glBegin(GL_POINTS);
    glVertex2i(0 + i, 0);   // the point advances one unit per frame
    glEnd();
    glutSwapBuffers();
    i += 1;
    glutPostRedisplay();    // immediately request the next frame
}
This way the point moves across the screen, but its speed is really slow.
I can speed it up by incrementing i by a number greater than 1, but then the motion doesn't seem smooth. How do I achieve higher speed?
I used to work with SFML, which is built on top of OpenGL, and there the object moved really fast with the move() method. So there has to be a way in OpenGL too.
In this case, there's probably not a lot you can do other than moving your point further each time you redraw. In particular, most performance improvement probably won't have any significant effect on the perceived speed in this case.
The problem is fairly simple: you're changing the location by one pixel at a time. Chances are pretty good that you have screen updating "locked" so it happens in conjunction with the monitor's refresh.
Assuming that's the case, with a typical monitor that refreshes at 60 Hz, you're going to get a fixed rate of your point moving 60 pixels per second. If you improve the code's efficiency, the movement speed won't change--it'll just reduce the amount of CPU time you're using to move the dot at the same speed.
Essentially the only way to move it faster is to move more than one pixel per screen refresh. One pixel per screen refresh means 60 pixels per second, so (for example) moving across a typical HD screen (1920 dots horizontally) will take 1920 pixels / 60 pixels per second = 32 seconds.
With really slow code, you might use 1% of the CPU to do that. With faster code, that might drop to some unmeasurably small amount--but either way, the point travels at the same speed, so it'll take 32 seconds to get across the screen.
If you wanted to, you could unlock your screen updates from the screen refresh. Since you're only drawing one point, you can probably update the screen at a few thousand frames per second, so the dot would appear to move across the screen a lot faster.
In reality, however, the monitor is still only refreshing the screen at 60 Hz. When your code updates faster than that, it just means you'll produce a number of intermediate updates that never show up on the screen. As far as the pictures actually showing up on the screen go, you're just moving the point more than one pixel per screen refresh. The fact that you updated data in memory for all the intermediate points doesn't really change anything. The visible result is essentially the same as if you redrew once per screen refresh, and moved the point a lot further each time.
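One common way to decouple the apparent speed from the refresh rate is to move by a speed expressed in pixels per second, scaled by the elapsed frame time. A minimal sketch of the render function rewritten that way, using GLUT's elapsed-time query (the 300 pixels-per-second speed and the assumption that one unit equals one pixel are illustrative):

float position = 0.0f;      // current x position of the point, in pixels
int lastTimeMs = 0;         // timestamp of the previous frame

void RenderFunction(void)
{
    int nowMs = glutGet(GLUT_ELAPSED_TIME);        // milliseconds since glutInit
    float dt = (nowMs - lastTimeMs) / 1000.0f;     // seconds since the last frame
    lastTimeMs = nowMs;

    position += 300.0f * dt;                       // move at 300 pixels per second

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glPointSize(5);
    glBegin(GL_POINTS);
    glVertex2f(position, 0.0f);
    glEnd();
    glutSwapBuffers();
    glutPostRedisplay();
}

With this approach, a faster or slower refresh rate changes how often the point is redrawn, but not how far it travels per second.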
Define the model-view matrix with i as the translation component, then let that matrix move the vertex you are defining. But yes, as others are saying, try to move to modern OpenGL.
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();              // reset each frame so the translation doesn't accumulate
glTranslatef(i, 0.0f, 0.0f);   // the vertex itself can then stay fixed at glVertex2i(0, 0)

three.js benchmark and progressive enhancement

I am loading a three.js scene on a website and I would like to optimize it depending on the capacity of the graphics card.
Is there a way to quickly benchmark the client computer and get some data that will let me decide how demanding or simple my scene has to be in order to run at a decent FPS?
I am thinking of a benchmark library that can be easily plugged in, or a benchmark-as-a-service. And it has to run without the user noticing.
You can use stats.js to monitor performance. It is used in almost all three.js examples and is included in the three.js repository.
The problem with this is that the frame rate is locked to 60 FPS, so you can't tell how many milliseconds are lost to vsync.
The only thing I found to be reliable is to take the render time and increase the quality if it's under a limit, and decrease it if it takes too long, roughly as in the sketch below.
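The adjustment loop itself is language-agnostic; a small sketch of the idea (written in C++ here to match the other code on this page, with purely illustrative thresholds and quality levels), where renderTimeMs is measured around the render call each frame:

int quality = 3;                              // e.g. 0 = lowest detail, 5 = highest

void adaptQuality(double renderTimeMs)
{
    const double budget = 1000.0 / 60.0;      // ~16.7 ms per frame for 60 FPS
    if (renderTimeMs > budget * 1.2 && quality > 0)
        --quality;                            // too slow: drop a detail level
    else if (renderTimeMs < budget * 0.5 && quality < 5)
        ++quality;                            // plenty of headroom: add detail back
}

In practice you would average the render time over a number of frames before changing quality, so that a single slow frame doesn't make the scene flicker between detail levels.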

Using windows phone combined motion api to track device position

I'd like to track the position of the device with respect to an initial position, ideally with high accuracy, for motions at a small scale (say < 1 meter). The best bet seems to be using motionReading.SensorReading.DeviceAcceleration. I tried this, but ran into a few problems. Apart from the noisy readings (which I was expecting and can tolerate), I see some behavior that is conceptually wrong. For example, if I start from rest, move the phone around, and bring it back to rest, periodically updating the velocity vector along all three dimensions along the way, I would expect the magnitude of the velocity to end up very small (ideally 0). But I don't see that.
I have extensively reviewed the available help, including the official MSDN pages, but I don't see any examples where the position/velocity of the device is updated using the acceleration vector. Is the acceleration vector that the API returns (at least in theory) supposed to be the rate of change of velocity, or something else? (FYI: my device does not have a gyroscope, so the API is going to be the low-accuracy version.)

XNA Particle system performance

I have a particle system using a basic SpriteBatch, where particles are created and then destroyed once their alpha value has decremented to 0.
The performance of the system is quite poor on PC, and very poor on Xbox, with about a hundred particles on screen before a significant FPS slowdown. I've read around regarding how to improve performance, but does anyone have any tips on how to implement the suggestions? For example, what is the best way to reuse particles rather than kill() them? Does the image size of each particle make a difference? If I don't rotate each particle, will this help?
I've played around with each of these suggestions but don't see any significant improvement. Does anyone have any advice? Is it worth going GPU-based rather than CPU-based?
From what I recall destroying and creating particles slows down the performance substantially.
You might want to reuse particles.
Not sure about image size or rotation drastically reducing performance as long as the image isn't substantially large.
I would use an array, swapping dead particles to the end of the active particles, so that only active particles are processed.
For example:
Make an array of MAX particles.
When you need a particle, grab particle_array[count] and increment count.
When a particle dies, decrement count and swap the dying particle with particle_array[count].
Update only the first count particles.
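A minimal sketch of that pool (the original question is XNA/C#, but the pattern is the same; shown here in C++ with an illustrative Particle struct):

#include <cstddef>

struct Particle { float x, y, vx, vy, alpha; };

const std::size_t MAX_PARTICLES = 1000;
Particle particles[MAX_PARTICLES];
std::size_t count = 0;                       // number of live particles

Particle* spawn()                            // reuse a slot instead of allocating
{
    return count < MAX_PARTICLES ? &particles[count++] : nullptr;
}

void kill(std::size_t i)                     // overwrite the dead slot with the last live particle
{
    particles[i] = particles[--count];
}

void update(float dt)
{
    for (std::size_t i = 0; i < count; )     // only the first `count` slots are live
    {
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.alpha -= dt;                       // fade out over time
        if (p.alpha <= 0.0f)
            kill(i);                         // don't advance i: slot i now holds a swapped-in particle
        else
            ++i;
    }
}

No particles are allocated or freed during the update loop, so nothing here generates garbage or heap churn.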
Hope this helps.
I entirely agree with subsonic's answer, but I wanted to expand on it.
Creating a new particle every time and destroying (or letting go of) old particles creates a LOT of garbage. The garbage avoidance mantra in C#, particularly on Xbox (due to the compact framework's poor garbage handling), is to NEVER new a class type in your update/draw loop. ALWAYS pre-create into pools. Shawn H explains: http://blogs.msdn.com/b/shawnhar/archive/2007/07/02/twin-paths-to-garbage-collector-nirvana.aspx.
One other thing to consider is that using multiple textures can cause slowdowns for SpriteBatch due to multiple draw calls. Try to merge multiple particle textures into one and use the source-rectangle parameter of SpriteBatch.Draw.

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size (1024 * 768 * 3 floats at 4 bytes each), and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data to move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times. If so, you move 45 MB of data in and out of the slow main memory per frame. 45 MB does not sound like much data, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for the data to arrive.
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE with non-temporal loads and stores may help, but it will complicate your task quite a bit (you have to align your reads and writes).
Try breaking up your rendering into tiles, e.g. do everything on smaller rectangles (256*256 or so). The idea behind this is that you actually get a benefit from the cache: after you've cleared a rectangle, for example, that entire tile will be in the cache. Rendering and converting to bytes will be a lot faster because there is no need to fetch the data from the relatively slow main memory anymore.
Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
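For the fade pass mentioned in the question (scaling every pixel by a constant), an SSE2 version is straightforward. A sketch, assuming the float buffer is 16-byte aligned and its length is a multiple of four (1024 * 768 * 3 is):

#include <stddef.h>
#include <emmintrin.h>   // SSE2 intrinsics

// Scale every float in the image buffer by a constant fade factor.
void fade_buffer(float *buf, size_t count, float fade)
{
    __m128 f = _mm_set1_ps(fade);              // broadcast the fade factor to 4 lanes
    for (size_t i = 0; i < count; i += 4) {
        __m128 px = _mm_load_ps(buf + i);      // aligned load of 4 floats
        px = _mm_mul_ps(px, f);                // multiply by the fade factor
        _mm_store_ps(buf + i, px);             // aligned store back
    }
}

Replacing _mm_store_ps with the non-temporal _mm_stream_ps only pays off when the scaled data is not read again soon afterwards; since this buffer is rendered into again right away, the plain store is usually the safer choice.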
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get it working with OpenGL because you have to decide which extension to use, but once you have it working, performance is not an issue anymore.
By the way, do you really need floating-point render targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g. accumulate their position per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), by rendering to an accumulation buffer and back, or by simply generating a bunch of additional sprites on the CPU to represent the trail and throwing them at the rasterizer.
Try to replace the manual code with sprites: an OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).
If by "manual" you mean that you are using the CPU to poke pixels, I think pretty much anything where you draw textured polygons using OpenGL instead will represent a huge speedup.
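Setting up additive blending for those sprites in legacy, fixed-function OpenGL takes only a couple of calls; a minimal sketch, where particleTexture stands in for whatever sprite texture you upload:

glEnable(GL_TEXTURE_2D);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE);               // additive: src * alpha + dest
glBindTexture(GL_TEXTURE_2D, particleTexture);   // particleTexture: illustrative texture id
// ... draw one textured quad per particle; overlapping sprites sum up into a glow ...

Because overlapping fragments simply add, draw order stops mattering for the particles, which also makes it easy to batch them into a single draw call.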
