Multiple Tangos Looking at one location - IR Conflict - google-project-tango

I am getting my first Tango in the next day or so; worked a little bit with Occipital's Structure Sensor - which is where my background in depth perceiving camera's come from.
Has anyone used multiple Tango at once (lets say 6-10), looking at the same part of a room, using depth for identification and placement of 3d character/content? I have been told that multiple devices looking at the same part of a room will confuse each Tango as they will see the other Tango's IR dots.
Thanks for your input.
Grisly

I have not tried to use several Tangos, but I have however tried to use my Tango in a room where I had a Kinect 2 sensor, which caused the Tango to go bananas. It seems however like the Tango has lower intensity on its IR projector in comparison, but I would still say that it is a reasonable assumption that it will not work.
It might work under certain angles but I doubt that you will be able to find a configuration of that many cameras without any of them interfering with each other. If you would make it work however, I would be very interested to know how.

You could lower the depth camera rate (defaults to 5/second I believe) to avoid conflicts, but that might not be desirable given what you're using the system for.
Alternatively, only enable the depth camera when placing your 3D models on surfaces, then disable said depth camera when it is not needed. This can also help conserve CPU and battery power.

It did not work. Occipital Structure Sensor on the other hand, did work (multiple devices in one place)!

Related

Does the project tango tablet work outdoors?

I'm looking to develop an outdoor application but not sure if the tango tablet will work outdoors. Other depth devices out there tend to not work well outside becuase they depend on IR light being projected from the device and then observed after it bounces off the objects in the scene. I've been looking for information on this and all I've found is this video - https://www.youtube.com/watch?v=x5C_HNnW_3Q. Based on the video, it appears it can work outside by doing some IR compensation and/or using the depth sensor but just wanted to make sure before getting the tablet.
If the sun is out, it will only work in the shade, and darker shade is better. I tested this morning using the Java Point Cloud sample app, and only get > 10k points in my point cloud in center of my building's shadow, close to the building. Toward the edge of the shadow the depth point cloud frame rate goes way down and I get the "Few depth points" message. If it's overcast, I'm guessing your results will vary, depending on how dark it is, I haven't tested this yet.
The tango (yellowstone) tablet also works by projecting IR light patterns, like the other depth sensing devices you mentioned.
You can expect the pose tracking and area learning to work as well as they do indoors. The depth perception, however, will likely not work well outside in direct sunlight.

Transforming and registering point clouds

I’m starting to develop with Project Tango API.
I need to save PointCloud data that I get in the event OnXyzIjAvailable;
to do this, I started from your example "PointCloudJava" and wrote PointCloud coordinates in single files (an AsyncTask is started for this purpose).
So I have one file with xyz for each event. On the same event I get the corresponding transformation matrix (mRenderer.getModelMatCalculator(). GetPointCloudModelMatrixCopy()).
Point clouds
Then I’ve imported all this data (xyz point cloud with corresponding transformation matrix; the transformation matrix is applied to the point clouds) but the point clouds doesn’t match exactly; it seems that point clouds are closed each other but not overlapping exactly.
My questions are:
-Why I don’t have the matching between the single point clouds ?
-What I should have to do to have this matching ?
Then I’ve notice the following that is probably related to the above problem; I’ve used Project Tango Explore application (Area learning), I can see my position, but is constantly in motion even if I don't move.
Which is the problem ? Is it necessary a calibration?
Device Information
Poses delivered by Tango have a non-negligible amount of drift. Here is a sample graph of pose position when my tablet was in its stand observing a static scene (ideally the traces should be flat):
When we couple this drift with tracking errors when the device is actually moving then this produces noticeable registration issues. I see this especially when the device is rolled, i.e. rotated about the view axis. The raw pose quality may be sufficient for some applications (e.g. location) but causes problems for others (e.g. 3D scanning, seamless augmented reality).
I was disappointed when I saw this. But if Tango is attempting to measure motion by using the fisheye camera to correct inertial motion prediction - and not by using stereo vision between the fisheye and color cameras - then that is a really hard problem. And the reason for doing that would be to stay within CPU/GPU/RAM/latency/battery budgets to leave something for applications. So after consideration, while I remain disappointed, I can understand it.
I am hopeful that Tango will improve their pose algorithm over time, but I suspect that applications that depend on precise tracking will still have to add their own corrections, e.g. via stereo, structure from motion, point cloud correlation, etc.
Point clouds should be viewed as statistically accurate, not exactly accurate - there is a distance estimation error range that is a function of distance and surface characteristics - a tango fixed in a specific location will not return a constant point clout - rotation of the device can cause apparent drift, but it really isn't, it's just that the error is rotating along with the tango

Using windows phone combined motion api to track device position

I'd like to track the position of the device with respect to an initial position with high accuracy (ideally) for motions at a small scale (say < 1 meter). The best bet seems to be using motionReading.SensorReading.DeviceAcceleration. I tried this. But ran into few problems. Apart from the noisy readings (which I was expecting and can tolerate), I see some behaviors that are conceptually wrong - e.g. If I start from rest, move the phone around and bring it back to rest- and in the process periodically update the velocity vector along all the dimensions, I would expect the magnitude of the velocity to be very small (ideally 0). But I don't see that. I have extensively reviewed available help including the official msdn pages but I don't see any examples where the position/velocity of the device are updated using the acceleration vector. Is the acceleration vector that the api returns (atleast in theory) supposed to be the rate of change of velocity or something else? (FYI - my device does not have a gyroscope, so the api is going to be the low accuracy version.)

What is an algorithm I can use to program an image compare routine to detect changes (like a person coming into the frame of a web cam)?

I have a web cam that takes a picture every N seconds. This gives me a collection of images of the same scene over time. I want to process that collection of images as they are created to identify events like someone entering into the frame, or something else large happening. I will be comparing images that are adjacent in time and fixed in space - the same scene at different moments of time.
I want a reasonably sophisticated approach. For example, naive approaches fail for outdoor applications. If you count the number of pixels that change, for example, or the percentage of the picture that has a different color or grayscale value, that will give false positive reports every time the sun goes behind a cloud or the wind shakes a tree.
I want to be able to positively detect a truck parking in the scene, for example, while ignoring lighting changes from sun/cloud transitions, etc.
I've done a number of searches, and found a few survey papers (Radke et al, for example) but nothing that actually gives algorithms that I can put into a program I can write.
Use color spectroanalisys, without luminance: when the Sun goes down for a while, you will get similar result, colors does not change (too much).
Don't go for big changes, but quick changes. If the luminance of the image changes -10% during 10 min, it means the usual evening effect. But when the change is -5%, 0, +5% within seconds, its a quick change.
Don't forget to adjust the reference values.
Split the image to smaller regions. Then, when all the regions change same way, you know, it's a global change, like an eclypse or what, but if only one region's parameters are changing, then something happens there.
Use masks to create smart regions. If you're watching a street, filter out the sky, the trees (blown by wind), etc. You may set up different trigger values for different regions. The regions should overlap.
A special case of the region is the line. A line (a narrow region) contains less and more homogeneous pixels than a flat area. Mark, say, a green fence, it's easy to detect wheter someone crosses it, it makes bigger change in the line than in a flat area.
If you can, change the IRL world. Repaint the fence to a strange color to create a color spectrum, which can be identified easier. Paint tags to the floor and wall, which can be OCRed by the program, so you can detect wheter something hides it.
I believe you are looking for Template Matching
Also i would suggest you to look on to Open CV
We had to contend with many of these issues in our interactive installations. It's tough to not get false positives without being able to control some of your environment (sounds like you will have some degree of control). In the end we looked at combining some techniques and we created an open piece of software named OpenTSPS (Open Toolkit for Sensing People in Spaces - http://www.opentsps.com). You can look at the C++ source in github (https://github.com/labatrockwell/openTSPS/).
We use ‘progressive background relearn’ to adjust to the changing background over time. Progressive relearning is particularly useful in variable lighting conditions – e.g. if lighting in a space changes from day to night. This in combination with blob detection works pretty well and the only way we have found to improve is to use 3D cameras like the kinect which cast out IR and measure it.
There are other algorithms that might be relevant, like SURF (http://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/ and http://en.wikipedia.org/wiki/SURF) but I don't think it will help in your situation unless you know exactly the type of thing you are looking for in the image.
Sounds like a fun project. Best of luck.
The problem you are trying to solve is very interesting indeed!
I think that you would need to attack it in parts:
As you already pointed out, a sudden change in illumination can be problematic. This is an indicator that you probably need to achieve some sort of illumination-invariant representation of the images you are trying to analyze.
There are plenty of techniques lying around, one I have found very useful for illumination invariance (applied to face recognition) is DoG filtering (Difference of Gaussians)
The idea is that you first convert the image to gray-scale. Then you generate two blurred versions of this image by applying a gaussian filter, one a little bit more blurry than the first one. (you could use a 1.0 sigma and a 2.0 sigma in a gaussian filter respectively) Then you subtract from the less-blury image, the pixel intensities of the more-blurry image. This operation enhances edges and produces a similar image regardless of strong illumination intensity variations. These steps can be very easily performed using OpenCV (as others have stated). This technique has been applied and documented here.
This paper adds an extra step involving contrast equalization, In my experience this is only needed if you want to obtain "visible" images from the DoG operation (pixel values tend to be very low after the DoG filter and are veiwed as black rectangles onscreen), and performing a histogram equalization is an acceptable substitution if you want to be able to see the effect of the DoG filter.
Once you have illumination-invariant images you could focus on the detection part. If your problem can afford having a static camera that can be trained for a certain amount of time, then you could use a strategy similar to alarm motion detectors. Most of them work with an average thermal image - basically they record the average temperature of the "pixels" of a room view, and trigger an alarm when the heat signature varies greatly from one "frame" to the next. Here you wouldn't be working with temperatures, but with average, light-normalized pixel values. This would allow you to build up with time which areas of the image tend to have movement (e.g. the leaves of a tree in a windy environment), and which areas are fairly stable in the image. Then you could trigger an alarm when a large number of pixles already flagged as stable have a strong variation from one frame to the next one.
If you can't afford training your camera view, then I would suggest you take a look at the TLD tracker of Zdenek Kalal. His research is focused on object tracking with a single frame as training. You could probably use the semistatic view of the camera (with no foreign objects present) as a starting point for the tracker and flag a detection when the TLD tracker (a grid of points where local motion flow is estimated using the Lucas-Kanade algorithm) fails to track a large amount of gridpoints from one frame to the next. This scenario would probably allow even a panning camera to work as the algorithm is very resilient to motion disturbances.
Hope this pointers are of some help. Good Luck and enjoy the journey! =D
Use one of the standard measures like Mean Squared Error, for eg. to find out the difference between two consecutive images. If the MSE is beyond a certain threshold, you know that there is some motion.
Also read about Motion Estimation.
if you know that the image will remain reletivly static I would reccomend:
1) look into neural networks. you can use them to learn what defines someone within the image or what is a non-something in the image.
2) look into motion detection algorithms, they are used all over the place.
3) is you camera capable of thermal imaging? if so it may be worthwile to look for hotspots in the images. There may be existing algorithms to turn your webcam into a thermal imager.

Looking for ways for a robot to locate itself in the house

I am hacking a vacuum cleaner robot to control it with a microcontroller (Arduino). I want to make it more efficient when cleaning a room. For now, it just go straight and turn when it hits something.
But I have trouble finding the best algorithm or method to use to know its position in the room. I am looking for an idea that stays cheap (less than $100) and not to complex (one that don't require a PhD thesis in computer vision). I can add some discrete markers in the room if necessary.
Right now, my robot has:
One webcam
Three proximity sensors (around 1 meter range)
Compass (no used for now)
Wi-Fi
Its speed can vary if the battery is full or nearly empty
A netbook Eee PC is embedded on the robot
Do you have any idea for doing this? Does any standard method exist for these kind of problems?
Note: if this question belongs on another website, please move it, I couldn't find a better place than Stack Overflow.
The problem of figuring out a robot's position in its environment is called localization. Computer science researchers have been trying to solve this problem for many years, with limited success. One problem is that you need reasonably good sensory input to figure out where you are, and sensory input from webcams (i.e. computer vision) is far from a solved problem.
If that didn't scare you off: one of the approaches to localization that I find easiest to understand is particle filtering. The idea goes something like this:
You keep track of a bunch of particles, each of which represents one possible location in the environment.
Each particle also has an associated probability that tells you how confident you are that the particle really represents your true location in the environment.
When you start off, all of these particles might be distributed uniformly throughout your environment and be given equal probabilities. Here the robot is gray and the particles are green.
When your robot moves, you move each particle. You might also degrade each particle's probability to represent the uncertainty in how the motors actually move the robot.
When your robot observes something (e.g. a landmark seen with the webcam, a wifi signal, etc.) you can increase the probability of particles that agree with that observation.
You might also want to periodically replace the lowest-probability particles with new particles based on observations.
To decide where the robot actually is, you can either use the particle with the highest probability, the highest-probability cluster, the weighted average of all particles, etc.
If you search around a bit, you'll find plenty of examples: e.g. a video of a robot using particle filtering to determine its location in a small room.
Particle filtering is nice because it's pretty easy to understand. That makes implementing and tweaking it a little less difficult. There are other similar techniques (like Kalman filters) that are arguably more theoretically sound but can be harder to get your head around.
A QR Code poster in each room would not only make an interesting Modern art piece, but would be relatively easy to spot with the camera!
If you can place some markers in the room, using the camera could be an option. If 2 known markers have an angular displacement (left to right) then the camera and the markers lie on a circle whose radius is related to the measured angle between the markers. I don't recall the formula right off, but the arc segment (on that circle) between the markers will be twice the angle you see. If you have the markers at known height and the camera is at a fixed angle of inclination, you can compute the distance to the markers. Either of these methods alone can nail down your position given enough markers. Using both will help do it with fewer markers.
Unfortunately, those methods are imperfect due to measurement errors. You get around this by using a Kalman estimator to incorporate multiple noisy measurements to arrive at a good position estimate - you can then feed in some dead reckoning information (which is also imperfect) to refine it further. This part is goes pretty deep into math, but I'd say it's a requirement to do a great job at what you're attempting. You can do OK without it, but if you want an optimal solution (in terms of best position estimate for given input) there is no better way. If you actually want a career in autonomous robotics, this will play large in your future. (
Once you can determine your position you can cover the room in any pattern you'd like. Keep using the bump sensor to help construct a map of obstacles and then you'll need to devise a way to scan incorporating the obstacles.
Not sure if you've got the math background yet, but here is the book:
http://books.google.com/books/about/Applied_optimal_estimation.html?id=KlFrn8lpPP0C
This doesn't replace the accepted answer (which is great, thanks!) but I might recommend getting a Kinect and use that instead of your webcam, either through Microsoft's recently released official drivers or using the hacked drivers if your EeePC doesn't have Windows 7 (presumably it does not).
That way the positioning will be improved by the 3D vision. Observing landmarks will now tell you how far away the landmark is, and not just where in the visual field that landmark is located.
Regardless, the accepted answer doesn't really address how to pick out landmarks in the visual field, and simply assumes that you can. While the Kinect drivers may already have feature detection included (I'm not sure) you can also use OpenCV for detecting features in the image.
One solution would be to use a strategy similar to "flood fill" (wikipedia). To get the controller to accurately perform sweeps, it needs a sense of distance. You can calibrate your bot using the proximity sensors: e.g. run motor for 1 sec = xx change in proximity. With that info, you can move your bot for an exact distance, and continue sweeping the room using flood fill.
Assuming you are not looking for a generalised solution, you may actually know the room's shape, size, potential obstacle locations, etc. When the bot exists the factory there is no info about its future operating environment, which kind of forces it to be inefficient from the outset.
If that's you case, you can hardcode that info, and then use basic measurements (ie. rotary encoders on wheels + compass) to precisely figure out its location in the room/house. No need for wifi triangulation or crazy sensor setups in my opinion. At least for a start.
Ever considered GPS? Every position on earth has a unique GPS coordinates - with resolution of 1 to 3 metres, and doing differential GPS you can go down to sub-10 cm range - more info here:
http://en.wikipedia.org/wiki/Global_Positioning_System
And Arduino does have lots of options of GPS-modules:
http://www.arduino.cc/playground/Tutorials/GPS
After you have collected all the key coordinates points of the house, you can then write the routine for the arduino to move the robot from point to point (as collected above) - assuming it will do all those obstacles avoidance stuff.
More information can be found here:
http://www.google.com/search?q=GPS+localization+robots&num=100
And inside the list I found this - specifically for your case: Arduino + GPS + localization:
http://www.youtube.com/watch?v=u7evnfTAVyM
I was thinking about this problem too. But I don't understand why you can't just triangulate? Have two or three beacons (e.g. IR LEDs of different frequencies) and a IR rotating sensor 'eye' on a servo. You could then get an almost constant fix on your position. I expect the accuracy would be in low cm range and it would be cheap. You can then map anything you bump into easily.
Maybe you could also use any interruption in the beacon beams to plot objects that are quite far from the robot too.
You have a camera you said ? Did you consider looking at the ceiling ? There is little chance that two rooms have identical dimensions, so you can identify in which room you are, position in the room can be computed from angular distance to the borders of the ceiling and direction can probably be extracted by the position of doors.
This will require some image processing but the vacuum cleaner moving slowly to be efficiently cleaning will have enough time to compute.
Good luck !
Use Ultra Sonic Sensor HC-SR04 or similar.
As above told sense the walls distance from robot with sensors and room part with QR code.
When your are near to a wall turn 90 degree and move as width of your robot and again turn 90deg( i.e. 90 deg left turn) and again move your robot I think it will help :)

Resources