Low depth resolution on google-project-tango

I see that the resolution of the depth camera is 320x180; however, each depth capture frame produces only 10K to 15K points. Am I missing a setting?
I looked at the transformation matrices while keeping the device fixed, using the area_learn update method with no ADF loaded. I see non-zero offsets in the translation values; I expected zero offsets.
Is there a published motion estimation performance document for Tango that specifies latency and performance of the IMU + ADF? I am looking for detailed test information.
Thanks

You are right about the resolution of the depth camera, and your results align with mine. Depending on where the depth camera is pointing, I get between 5K and 12K points. Scanning the floor generates more points since it is flat and uniform.
I think you are experiencing drift. This is expected when not using Area Learning (no ADF loaded). There is a known issue of drift occurring because of Android 4.4 (source: https://plus.google.com/+ChuckKnowledge/posts/dVk4ZgVikgT).
Loading an ADF should help with this, but I wouldn't expect it to be perfect.
I don't know of a published performance document. Sorry!
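To put the point counts above in perspective, here is a small sketch that compares them with the number of pixels in a 320x180 depth frame. It assumes each returned point corresponds to at most one depth pixel, which the thread does not confirm; `fill_ratio` is a name made up for this illustration.

```python
# Depth sensor resolution quoted in the question.
WIDTH, HEIGHT = 320, 180
TOTAL_PIXELS = WIDTH * HEIGHT  # 57,600 possible depth samples per frame


def fill_ratio(num_points: int) -> float:
    """Fraction of depth pixels that produced a valid 3D point (assumed 1:1 mapping)."""
    return num_points / TOTAL_PIXELS


for n in (5_000, 10_000, 15_000):
    print(f"{n:>6} points -> {fill_ratio(n):.1%} of {TOTAL_PIXELS} pixels")
```

Under that assumption, even the best frames reported here fill only about a quarter of the sensor grid, which is consistent with the sensor dropping pixels it cannot resolve rather than with a hidden resolution setting.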

Related

Detecting new object on image

The goal
My security camera is constantly taking images and I would like to know when a new object appears in the image. (It is not actually a security camera but a camera monitoring packaging waste, but let's stick with security camera for simplicity.)
Some examples:
a cat walks in front of the camera – this should be detected
the light conditions change (sun comes out) – this should not be detected
a mild breeze moves the grass or leaves – this should not be detected
So in general, a new object should be detected, but changes in brightness or little shifts of the image in some direction should not.
What would be the appropriate way to tackle this issue?
What I have tried so far
I am using OpenCV with Python, but this is not a requirement.
Based on this article I have experimented with two algorithms: Mean Squared Error and Structural Similarity.
This seems to be a good starting point, but is not yet precise enough. I have considered computing the edges of objects (using cv2.Canny()) and comparing those so that changes in brightness would not affect the result.
Am I on the right track? Are there any other approaches / algorithms / libraries which I should try out?
In case I am on the right track, are there any extra steps I could take to increase the accuracy?
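One way to make the comparison less sensitive to lighting is to score each image block with zero-mean normalized cross-correlation (NCC) instead of raw MSE. This is a swapped-in technique, not something from the question, and the sketch below is pure Python on lists of grayscale rows purely for illustration; `ncc`, `changed_blocks`, the block size, and the threshold are all hypothetical choices. In practice one would likely do this with NumPy/OpenCV arrays.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length pixel lists.

    The result lies in [-1, 1]; a uniform gain or offset in brightness
    leaves the score at (approximately) 1.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        # At least one block is flat; two flat blocks count as unchanged.
        return 1.0 if not any(da) and not any(db) else 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def changed_blocks(reference, current, block=4, threshold=0.5):
    """Return (block_row, block_col) for blocks whose NCC drops below threshold."""
    rows, cols = len(reference), len(reference[0])
    changed = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            ref_px = [reference[r + i][c + j] for i in range(block) for j in range(block)]
            cur_px = [current[r + i][c + j] for i in range(block) for j in range(block)]
            if ncc(ref_px, cur_px) < threshold:
                changed.append((r // block, c // block))
    return changed
```

A global brightness change scales and shifts every pixel in a block together, so the block's NCC stays near 1, while a new object changes the block's structure and lowers the score. Small shifts from wind would still need extra tolerance, for example by blurring first or by requiring several adjacent blocks to change over several frames.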

Transforming and registering point clouds

I’m starting to develop with the Project Tango API.
I need to save the PointCloud data that I get in the event OnXyzIjAvailable. To do this, I started from your example "PointCloudJava" and wrote the PointCloud coordinates to individual files (an AsyncTask is started for this purpose).
So I have one file with xyz for each event. On the same event I get the corresponding transformation matrix (mRenderer.getModelMatCalculator().getPointCloudModelMatrixCopy()).
Point clouds
Then I imported all this data (each xyz point cloud with its corresponding transformation matrix applied), but the point clouds don’t match exactly; they are close to each other but do not overlap exactly.
My questions are:
- Why don’t the individual point clouds match?
- What should I do to make them match?
Then I noticed the following, which is probably related to the above problem: when I use the Project Tango Explore application (Area Learning), I can see my position, but it is constantly in motion even if I don't move.
What is the problem? Is a calibration necessary?
Device Information
Poses delivered by Tango have a non-negligible amount of drift. Here is a sample graph of pose position when my tablet was in its stand observing a static scene (ideally the traces should be flat):
Coupling this drift with tracking errors while the device is actually moving produces noticeable registration issues. I see this especially when the device is rolled, i.e. rotated about the view axis. The raw pose quality may be sufficient for some applications (e.g. location) but causes problems for others (e.g. 3D scanning, seamless augmented reality).
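To put a number on stationary drift like that described above, one could log pose positions while the device sits in a stand and summarize them. A minimal sketch, assuming the positions have already been exported as (x, y, z) tuples in meters; `drift_stats` is a hypothetical helper, not part of any Tango API.

```python
def drift_stats(positions):
    """Summarize drift for pose positions logged while the device is stationary.

    positions: list of (x, y, z) tuples in meters.
    Returns (peak_to_peak_per_axis, net_displacement_first_to_last).
    """
    axes = list(zip(*positions))  # regroup samples as three per-axis sequences
    peak_to_peak = tuple(max(axis) - min(axis) for axis in axes)
    first, last = positions[0], positions[-1]
    net = sum((b - a) ** 2 for a, b in zip(first, last)) ** 0.5
    return peak_to_peak, net


samples = [(0.0, 0.0, 0.0), (0.003, 0.0, 0.0), (0.003, 0.004, 0.0)]
print(drift_stats(samples))
```

For an ideal tracker both numbers would be zero; how large they can get before registration visibly suffers depends on the application.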
I was disappointed when I saw this. But if Tango is attempting to measure motion by using the fisheye camera to correct inertial motion prediction - and not by using stereo vision between the fisheye and color cameras - then that is a really hard problem. And the reason for doing that would be to stay within CPU/GPU/RAM/latency/battery budgets to leave something for applications. So after consideration, while I remain disappointed, I can understand it.
I am hopeful that Tango will improve their pose algorithm over time, but I suspect that applications that depend on precise tracking will still have to add their own corrections, e.g. via stereo, structure from motion, point cloud correlation, etc.
Point clouds should be viewed as statistically accurate, not exactly accurate: there is a distance estimation error range that is a function of distance and surface characteristics. A Tango fixed in a specific location will not return a constant point cloud. Rotation of the device can cause apparent drift, but it really isn't drift; the error is simply rotating along with the Tango.
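For reference, applying a saved pose matrix to a point cloud offline can be sketched as below. It assumes the matrix was stored as 16 floats in column-major (OpenGL-style) order; if the file actually holds row-major values, the indices must be transposed, and mixing the two conventions is a common source of clouds that almost but not quite line up. `transform_points` is a name invented for this sketch.

```python
def transform_points(matrix16, points):
    """Apply a 4x4 homogeneous transform to a list of (x, y, z) points.

    matrix16 is ASSUMED to be column-major, so element (row, col)
    is stored at index col * 4 + row.
    """
    m = matrix16
    transformed = []
    for x, y, z in points:
        transformed.append((
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        ))
    return transformed


# Pure translation by (1, 2, 3) in column-major layout.
translate = [1, 0, 0, 0,
             0, 1, 0, 0,
             0, 0, 1, 0,
             1, 2, 3, 1]
print(transform_points(translate, [(1.0, 1.0, 1.0)]))
```

Even with the convention right, the pose drift discussed above means the transformed clouds will only agree approximately, so a refinement step such as point cloud correlation would still be needed for tight registration.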

Project Tango Camera Specifications

I've been developing a virtual camera app for depth cameras and I'm extremely interested in the Tango project. I have several questions regarding the cameras on board. I can't seem to find these specs anywhere in the developer section or forums, so I understand completely if these can't be answered publicly. I thought I would ask regardless and see if the current device is suitable for my app.
Are the depth and color images from the rgb/ir camera captured simultaneously?
What frame rates is the rgb/ir capable of? e.g. 30, 25, 24? And at what resolutions?
Does the motion tracking camera run in sync with the rgb/ir camera? If not what frame rate (or refresh rate) does the motion tracking camera run at? Also if they do not run on the same clock does the API expose a relative or an absolute time stamp for both cameras?
What manual controls (if any) are exposed for the color camera? Frame rate, gain, exposure time, white balance?
If the color camera is fully automatic, does it automatically drop its frame rate in low light situations?
Thank you so much for your time!
Edit: I'm specifically referring to the new tablet.
Some guessing
No, the actual image used to generate the point cloud is not the droid you want. I put up a picture on Google+ that shows what you get when you get one of the images that has the IR pattern used to calculate depth (an aside: it looks suspiciously like a Sierpinski curve to me).
Image frame rate is considerably higher than point cloud frame rate, but seems variable, probably as a function of the load that Tango imposes.
Motion tracking, i.e. pose, is captured at a rate roughly 3x the point cloud rate.
Timestamps are done with the most fascinating double-precision number. In prior releases there were definitely artifacts/data in the LSBs of the double. I do a getPoseAtTime (callbacks are used for ADF localization) when I pick up a cloud, so supposedly I've got a pose aligned with the cloud. Images have very low timestamp correspondence with pose and cloud data. It's very important to note that all 3 Tango streams (pose, image, cloud) return timestamps.
Don't know about camera controls yet; still wedging OpenCV into the cloud services :-) Low light will be interesting. Anecdotal data indicates that Tango has a wider visual spectrum than we do, which makes me wonder if fiddling with the camera at the point of capture to change image quality, e.g. dropping the frame rate, might not cause Tango problems.
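Since all three streams carry timestamps, logged data can also be aligned offline by picking, for each cloud, the pose with the closest timestamp. A minimal sketch, assuming the pose timestamps were logged in ascending order; `nearest_pose` and the `max_gap` tolerance are illustrative, not Tango constants or API calls.

```python
from bisect import bisect_left


def nearest_pose(pose_timestamps, poses, query_time, max_gap=0.05):
    """Return the pose whose timestamp is closest to query_time.

    pose_timestamps must be sorted ascending and parallel to poses.
    Returns None if the closest pose is more than max_gap seconds away.
    """
    if not pose_timestamps:
        return None
    i = bisect_left(pose_timestamps, query_time)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(pose_timestamps)]
    best = min(candidates, key=lambda j: abs(pose_timestamps[j] - query_time))
    if abs(pose_timestamps[best] - query_time) > max_gap:
        return None
    return poses[best]


timestamps = [0.00, 0.03, 0.06, 0.09]
poses = ["pose0", "pose1", "pose2", "pose3"]
print(nearest_pose(timestamps, poses, 0.07))
```

Rejecting matches beyond a tolerance matters here because, as noted above, image timestamps in particular may not line up well with pose and cloud data.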

Future prospects for improvement of depth data on Project Tango tablet

I am interested in using the Project Tango tablet for 3D reconstruction using arbitrary point features. In the current SDK version, we seem to have access to the following data.
A 1280 x 720 RGB image.
A point cloud with 0-~10,000 points, depending on the environment. This seems to average between 3,000 and 6,000 in most environments.
What I really want is to be able to identify a 3D point for key points within an image. Therefore, it makes sense to project depth into the image plane. I have done this, and I get something like this:
The problem with this process is that the depth points are sparse compared to the RGB pixels. So I took it a step further and performed interpolation between the depth points. First, I did Delaunay triangulation, and once I got a good triangulation, I interpolated between the 3 points on each facet and got a decent, fairly uniform depth image. Here are the zones where the interpolated depth is valid, superimposed on the RGB image.
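The per-facet interpolation step can be sketched with barycentric weights. This assumes the triangulation itself is already available (for example from a Delaunay routine) and only shows how one pixel inside one triangle would get its depth; `barycentric_depth` is a hypothetical helper, not the poster's code.

```python
def barycentric_depth(p, tri):
    """Interpolate depth at pixel p = (u, v) inside a triangle.

    tri is three (u, v, depth) vertices. Returns None when p lies outside
    the triangle or the triangle is degenerate (zero area).
    """
    (u1, v1, d1), (u2, v2, d2), (u3, v3, d3) = tri
    det = (v2 - v3) * (u1 - u3) + (u3 - u2) * (v1 - v3)
    if det == 0:
        return None
    w1 = ((v2 - v3) * (p[0] - u3) + (u3 - u2) * (p[1] - v3)) / det
    w2 = ((v3 - v1) * (p[0] - u3) + (u1 - u3) * (p[1] - v3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # a negative weight means p is outside the facet
    return w1 * d1 + w2 * d2 + w3 * d3


facet = [(0, 0, 1.0), (10, 0, 2.0), (0, 10, 3.0)]
print(barycentric_depth((5, 5), facet))
```

Linear interpolation like this treats every facet as a flat surface, so it will smear depth across real discontinuities wherever a triangle spans the edge of an object.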
Now, given the camera model, it's possible to project depth back into Cartesian coordinates at any point on the depth image (since the depth image was made such that each pixel corresponds to a point on the original RGB image, and we have the camera parameters of the RGB camera). However, if you look at the triangulation image and compare it to the original RGB image, you can see that depth is valid for all of the uninteresting points in the image: blank, featureless planes mostly. This isn't just true for this single set of images; it's a trend I'm seeing for the sensor. If a person stands in front of the sensor, for example, there are very few depth points within their silhouette.
As a result of this characteristic of the sensor, if I perform visual feature extraction on the image, most of the areas with corners or interesting textures fall in areas without associated depth information. Just an example: I detected 1000 SIFT keypoints from an RGB image from an Xtion sensor, and 960 of those had valid depth values. If I do the same thing with this system, I get around 80 keypoints with valid depth. At the moment, this level of performance is unacceptable for my purposes.
I can guess at the underlying reasons for this: it seems like some sort of plane extraction algorithm is being used to get depth points, whereas Primesense/DepthSense sensors are using something more sophisticated.
So anyway, my main question here is: can we expect any improvement in the depth data at a later point in time, through improved RGB-IR image processing algorithms? Or is this an inherent limit of the current sensor?
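The projection of depth points into the image plane and the back-projection to Cartesian coordinates described in the question can be sketched with an ideal pinhole model. Lens distortion is ignored here, and the intrinsics in the example (fx, fy, cx, cy) are made-up values for a 1280 x 720 image, not the tablet's calibration.

```python
def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point (x, y, z) with z > 0 to pixel (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth z to a camera-frame 3D point."""
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


# Hypothetical intrinsics, for illustration only.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 360.0
u, v = project((0.1, -0.05, 2.0), FX, FY, CX, CY)
print((u, v), unproject(u, v, 2.0, FX, FY, CX, CY))
```

With a dense depth image, `unproject` gives a 3D point for any keypoint location, but the result is only as trustworthy as the depth value at that pixel, which is the limitation the question is about.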
I am from the Project Tango team at Google. I am sorry you are experiencing trouble with depth on the device. Just so that we are sure your device is in good working condition, can you please test the depth performance against a flat wall? Instructions are below:
https://developers.google.com/project-tango/hardware/depth-test
Even with a device in good working condition, the depth library is known to return sparse depth points on scenes with low IR reflectance objects, small sized objects, high dynamic range scenes, surfaces at certain angles and objects at distances larger than ~4m. While some of these are inherent limitations in the depth solution, we are working with the depth solution provider to bring improvements wherever possible.
Attached is an image of a typical conference room scene and the corresponding point cloud. As you can see, no depth points are returned from the laptop screen (low reflectance), the tabletop objects such as Post-its and the pencil holder (small object sizes), large portions of the table (surface at an angle), or the room corner at the far right (distance >4m).
But as you move the device around, you will start getting depth point returns. Accumulating depth points is a must to get denser point clouds.
Please also keep us posted on your findings at project-tango-hardware-support#google.com
In my very basic initial experiments, you are correct with respect to depth information returned from the visual field, however, the return of surface points is anything but constant. I find as I move the device I can get major shifts in where depth information is returned, i.e. there's a lot of transitory opacity in the image with respect to depth data, probably due to the characteristics of the surfaces.
So while no return frame is enough, the real question seems to be the construction of a larger model (point cloud to open, possibly voxel spaces as one scales up) to bring successive scans into a common model. It's reminiscent of synthetic aperture algorithms in spirit, but the letters in the equations are from a whole different set of laws.
In short, I think a more interesting approach is to synthesize a more complete model by successive accumulation of point cloud data - now, for this to work, the device team has to have their dead reckoning on the money for whatever scale this is done. Also this addresses an issue that no sensor improvements can address - if your visual sensor is perfect, it still does nothing to help you relate the sides of an object at least be in the close neighborhood of the front of the object.
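The accumulation idea can be sketched as a sparse voxel grid that counts how often each cell is hit across successive scans. It assumes every cloud has already been transformed into a common world frame using the device pose, so pose error carries straight into the model; `accumulate`, `occupied`, the 5 cm voxel size, and the hit threshold are all illustrative choices.

```python
import math
from collections import defaultdict


def accumulate(voxels, points, voxel_size=0.05):
    """Add world-frame (x, y, z) points to a sparse voxel grid of hit counts."""
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key] += 1
    return voxels


def occupied(voxels, min_hits=2):
    """Voxels hit at least min_hits times; filters one-off noisy returns."""
    return {key for key, hits in voxels.items() if hits >= min_hits}


grid = defaultdict(int)
accumulate(grid, [(0.01, 0.01, 1.02), (0.91, 0.0, 1.02)])  # first scan
accumulate(grid, [(0.02, 0.03, 1.03)])                      # second scan
print(occupied(grid))
```

Requiring repeated hits is one simple way to cope with the transitory returns described above, at the cost of needing each surface to be seen more than once.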

Using windows phone combined motion api to track device position

I'd like to track the position of the device with respect to an initial position with high accuracy (ideally) for motions at a small scale (say < 1 meter). The best bet seems to be using motionReading.SensorReading.DeviceAcceleration. I tried this, but ran into a few problems. Apart from the noisy readings (which I was expecting and can tolerate), I see some behaviors that are conceptually wrong. For example, if I start from rest, move the phone around and bring it back to rest, periodically updating the velocity vector along all dimensions in the process, I would expect the final magnitude of the velocity to be very small (ideally 0). But I don't see that. I have extensively reviewed the available help, including the official MSDN pages, but I don't see any examples where the position/velocity of the device are updated using the acceleration vector. Is the acceleration vector that the API returns (at least in theory) supposed to be the rate of change of velocity, or something else? (FYI: my device does not have a gyroscope, so the API is going to be the low-accuracy version.)
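A short sketch of why integrated velocity may not return to zero even when the readings really are the rate of change of velocity: any small constant bias in the samples accumulates linearly in velocity and quadratically in position. The code assumes one axis, samples already converted to m/s^2, and a fixed sample interval; the 0.05 m/s^2 bias is a made-up figure, not a measured property of any phone.

```python
def integrate(accels, dt):
    """Integrate 1-D acceleration samples (m/s^2) with the trapezoid rule.

    Returns (final_velocity, final_position), starting from rest at the origin.
    """
    velocity, position = 0.0, 0.0
    for a_prev, a_next in zip(accels, accels[1:]):
        v_next = velocity + 0.5 * (a_prev + a_next) * dt
        position += 0.5 * (velocity + v_next) * dt
        velocity = v_next
    return velocity, position


DT = 0.01  # seconds between samples (assumed)
# Speed up for about 1 s, then slow down for about 1 s: the device ends at rest.
true_accel = [1.0] * 100 + [-1.0] * 100
BIAS = 0.05  # hypothetical constant sensor bias in m/s^2
biased_accel = [a + BIAS for a in true_accel]

v_true, _ = integrate(true_accel, DT)
v_biased, _ = integrate(biased_accel, DT)
print(f"ideal final velocity:  {v_true:.4f} m/s")
print(f"biased final velocity: {v_biased:.4f} m/s")
```

In a real device the bias also changes with orientation when gravity is not perfectly removed, which is harder to do without a gyroscope, so the residual velocity seen in practice can be much larger than in this idealized case.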
