I would like to know if it is possible to measure the dimensions of an object by just pointing the camera at the object without moving the camera from left to right like we do in Google Measurement.
A depth map cannot be calculated from a single 2D camera image alone. A typical smartphone does not have a distance sensor, but it does have motion sensors, so by combining the movement of the device with changes in the input from the camera(s), ARCore can calculate depth. Put simply, objects close to the camera move around on screen more than objects further away.
Getting depth data from a fixed position would require different technologies than those found on most phones, such as LiDAR or an infrared beam projector paired with an infrared camera.
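The parallax effect described above can be made concrete with the standard pinhole-camera triangulation relation, depth = focal length × baseline / disparity. The following is only a minimal sketch, assuming a purely sideways camera movement and made-up values for the focal length and baseline; it is not ARCore's actual algorithm, and the function name is hypothetical:

```python
def depth_from_parallax(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (in metres) of a point seen from two camera positions.

    focal_px     -- focal length in pixels (assumed known from calibration)
    baseline_m   -- how far the camera moved sideways between the two views
    disparity_px -- how far the point shifted on screen between the two views
    """
    if disparity_px <= 0:
        raise ValueError("no parallax: the camera did not move, or the point is too far away")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 1500 px focal length, camera moved 0.10 m sideways.
near = depth_from_parallax(1500.0, 0.10, 300.0)  # large on-screen shift
far = depth_from_parallax(1500.0, 0.10, 30.0)    # small on-screen shift
print(near, far)
```

With a zero baseline (a camera that is not moved) the disparity is zero and no depth can be recovered, which is why the device has to be moved.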
Related
I would like to make a game where I use a camera with infrared tracking so that I can track people's heads (from a top view). For example, each player would get a helmet so that the camera or infrared sensor can track him or her.
After that, I need to know the exact position of each person in Unity, so that I can place a 3D GameObject at the player's position.
Maybe there is another way to get people's positions in Unity. I know I could use a Kinect, but I need to track at least 10 people at the same time.
Thanks
Note: This is not really a complete answer, just a collection of my thoughts regarding your question on how to transfer recorded positions into Unity.
If you really need full 3D positions, I believe you won't be happy using only one sensor. In order to obtain depth information, which can then be used to calculate 3D positions in a reference coordinate system, you would have to use at least 2 sensors.
Another thing you could do is fix the camera position and assume that all persons are moving in the same plane (e.g. a fixed y-component). This would allow you to determine 3D positions using the projection formula, given the camera parameters (so the camera has to be calibrated).
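To illustrate the fixed-plane idea: with a calibrated camera, each pixel defines a viewing ray, and intersecting that ray with the known plane gives a 3D position. The sketch below is only an illustration under stated assumptions: an ideal pinhole camera without lens distortion, intrinsics `fx`, `fy`, `cx`, `cy` from calibration, and a hypothetical `cam_axes` argument holding the camera's right, down and forward directions in world coordinates. None of these names come from Unity or any library.

```python
def pixel_to_plane(u, v, fx, fy, cx, cy, cam_pos, cam_axes, plane_y=0.0):
    """Intersect the viewing ray through pixel (u, v) with the plane y = plane_y.

    cam_pos  -- camera position in world coordinates (x, y, z)
    cam_axes -- (right, down, forward): the camera's image-x, image-y and
                optical axes expressed as world-space 3-tuples
    """
    right, down, forward = cam_axes
    # Ray direction in the camera frame is (dx, dy, 1) for a pinhole camera.
    dx, dy = (u - cx) / fx, (v - cy) / fy
    ray = tuple(dx * r + dy * d + f for r, d, f in zip(right, down, forward))
    if abs(ray[1]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    t = (plane_y - cam_pos[1]) / ray[1]
    if t <= 0:
        raise ValueError("plane is behind the camera")
    return tuple(c + t * r for c, r in zip(cam_pos, ray))


# Hypothetical setup: camera 5 m above the origin, looking straight down,
# with the top of the image pointing along world +z.
axes = ((1.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, -1.0, 0.0))
point = pixel_to_plane(1060, 540, 1000.0, 1000.0, 960.0, 540.0, (0.0, 5.0, 0.0), axes)
print(point)
```

Setting `plane_y` to the approximate helmet height instead of 0 would place the intersection at head level rather than on the floor.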
Another idea: you could try to simulate your real camera with a virtual camera in Unity. This way you can use the virtual camera to project image coordinates (coming from the real camera) into Unity's 3D world. I haven't tried this myself, but someone else has; have a look at: https://community.unity.com/t5/Editor/How-to-simulate-Unity-Pinhole-Camera-from-its-intrinsic/td-p/1922835
Edit given your comment:
Okay, sticking to your soccer example, you could proceed as follows:
Setup: Say you define your playing area to be rectangular with its origin in the bottom left corner (think of UVs). You set these points in the real world (and in Unity's representation of it) as (0,0) (bottom left) and (width, height) (top right), choosing whichever unit you like (e.g. meters, as this is Unity's default unit). As your camera is stationary, you can assign the corresponding corner points in image coordinates (pixel coordinates) as well. To make things easier, work with normalized coordinates instead of pixels, so that bottom left is (0,0) and top right is (1,1).
Tracking: When tracking persons in the image, you can calculate their normalized position (x,y) (with x and y in [0,1]). These normalized positions can be transferred into Unity's 3D space (in Unity you will have a playable area of the same width and height) by simply calculating a Vector3 as (x*width, 0, y*height) (in Unity, x points right, y points up and z points forward).
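The setup and tracking steps can be sketched as follows. This is a minimal illustration in Python rather than Unity C#, with hypothetical function names and pixel values, and it assumes the camera looks straight down so that the playing area appears as an axis-aligned rectangle in the image (a tilted camera would need a perspective correction first):

```python
def pixel_to_normalized(px, py, corner_bl, corner_tr):
    """Map a pixel position to [0,1] x [0,1] using the image positions of the
    bottom-left and top-right corners of the playing area."""
    x = (px - corner_bl[0]) / (corner_tr[0] - corner_bl[0])
    # Image y usually grows downwards; the subtraction order handles that.
    y = (py - corner_bl[1]) / (corner_tr[1] - corner_bl[1])
    return x, y


def normalized_to_world(x, y, width, height):
    """Scale a normalized position to the playing area; returned as (x, y, z)
    with y up, matching the Vector3 layout described above."""
    return (x * width, 0.0, y * height)


# Hypothetical values: corners measured once in the camera image,
# playing area of 20 m x 10 m.
bottom_left, top_right = (100, 900), (1700, 100)
nx, ny = pixel_to_normalized(900, 500, bottom_left, top_right)
world = normalized_to_world(nx, ny, 20.0, 10.0)
print(nx, ny, world)
```

In Unity, the resulting triple would be used to construct the Vector3 for the player's GameObject.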
Edit on Tracking:
For top-view tracking in a game, I would say you are on the right track with using some sort of helmet, which enables marker-based tracking. In my opinion, markerless multi-target tracking is not reliable enough for use in a video game. (If you want to learn more about object tracking, there are lots of resources in the field of computer vision.)
Independent of the sensor you are using (IR or camera), you would create a unique marker for each helmet, which lets you identify each helmet (and therefore each player). A marker in this case is a unique pattern that an algorithm can recognize in each recorded frame. With IR, you can arrange square IR markers to form a specific pattern, and for normal cameras you can use markers like QR codes. There are also augmented reality libraries that offer functionality for creating and recognizing markers, e.g. ArUco or ARToolKit, although I don't know whether they offer C# libraries; I have only used ArUco with C++ a while ago.
When you have your markers of choice, the tracking procedure is then pretty straightforward, for each recorded image:
- detect all markers in the current image (these correspond to all players currently visible)
- follow the steps from my last edit using the detected positions
I hope that helps, feel free to contact me again.
My use case is only concerned with location, in fact only 2D location, so a lot of the cool capabilities in Tango are probably not useful to me. I'm therefore trying to see whether I could implement the location algorithm myself.
From teardown reports, it seems the 9-DoF sensors are fairly commodity hardware, and the basic integration-based location algorithm (even with magnetic field calibration) is mature knowledge. What algorithm does Tango use?
From the description, it seems that Tango tries to aid navigation by using the images it sees as a reference, somewhat like the "terrain-following" mode in cruise missiles. Is this right? That would be too complex for me to implement.
You can easily get a 2D position from the TangoPoseData, using the correct coordinate system:
Project Tango uses a right-handed, local-level frame for the START_OF_SERVICE and AREA_DESCRIPTION coordinate frames. This convention sets the Z-axis aligned with gravity, with Z+ pointed upwards, and the X-Y plane is perpendicular to gravity and locally level with the ground plane. This local-level convention is based on the local east-north-up (ENU) earth-based coordinate system. Instead of true north, Project Tango uses the direction the back of the device is pointed when the service started as the Y axis, and the X axis is pointed to the right. The START_OF_SERVICE and AREA_DESCRIPTION base coordinate frames of the API will use this local-level frame convention.
Said more simply, use the pose data y/x coordinates for your space as you would latitude/longitude for the earth.
Heading data is also derived from the TangoPoseData and can be converted from a quaternion to Euler angles. Euler angles may be easier for you to use in your 2D location app.
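As a rough sketch of that extraction (in Python, not the Tango API itself): the function below takes a translation and a rotation quaternion and returns the 2D position plus a heading angle about the gravity-aligned Z axis. It assumes the quaternion is given in (x, y, z, w) order and that the pose is expressed in the local-level frame quoted above; both points should be checked against the Tango documentation, and the function name is hypothetical.

```python
import math


def pose_to_2d(translation, rotation):
    """Reduce a 3D pose to (x, y, heading_deg).

    translation -- (x, y, z) in metres, Z up (local-level frame)
    rotation    -- unit quaternion, ASSUMED to be ordered (x, y, z, w)
    heading_deg -- rotation about the Z axis, counter-clockwise seen from above
    """
    qx, qy, qz, qw = rotation
    # Standard quaternion-to-yaw conversion (rotation about Z).
    yaw = math.atan2(2.0 * (qw * qz + qx * qy), 1.0 - 2.0 * (qy * qy + qz * qz))
    return translation[0], translation[1], math.degrees(yaw)


# Hypothetical pose: 1.5 m to the right, 2 m forward, turned 90 degrees about Z.
half = math.radians(90.0) / 2.0
x, y, heading = pose_to_2d((1.5, 2.0, 0.1), (0.0, 0.0, math.sin(half), math.cos(half)))
print(x, y, heading)
```

The Z component of the translation is simply dropped, which matches treating x/y like longitude/latitude.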
Tango uses 3D to increase the confidence of its position within the space...even if you don't need 3D. I would let Tango do the hard stuff and extract the 2D position so you can focus on your app.
Tango uses the camera images to detect any change in position, and it uses the IMU for device rotation and acceleration. Try blocking the camera while using the Motion Tracking app: it will fail.
I recently came across a product called Kolibree on Kickstarter, which is a smart toothbrush. From what they say on their website, it seems that Kolibree can detect each tooth. I have some exposure to gesture recognition and flight dynamics (roll angle, pitch angle, heading angle, ...), the technologies I believe need to be used in this product, but I'm confused about how it can accurately detect EACH tooth. I think we can detect the left, right, upper and lower regions using roll and pitch angles, and maybe a little more precisely by using the heading angle, but accuracy down to each tooth is beyond my understanding. Could someone shed light on this?
thanks,
Ted
From the Kickstarter video, it has:
Accelerometers
Gyroscopes
Magnetometers
These provide relative position and absolute direction of the device
So how to detect teeth? I would start with this:
- tooth shape
  - by brushing you can collect surface data in close proximity to the brush
  - but only when no significant surface movement is detected
  - this can then differentiate tooth types by curvature shape/size
  - so you have an idea of which part of the jaw you are in
- vibrations
  - the spinning brush creates noise pulses in the accelerometer readings
  - these should depend on the movement and the surface shape
  - when linear movement is detected (you move the brush from side to side), the gaps between teeth will create measurable readings in acceleration
  - this can be used to recognize relative tooth position
- angular constraints
  - when we brush teeth on the left/right side or the upper/lower part of the mouth, we hold the brush differently
  - this can also be measured
  - if the overall angular position is within certain bounds, we can assume which side of the mouth we are actually brushing
When you put all this data together, the accuracy of the tooth scan can be improved. Some kind of calibration can improve it further, for example holding/clicking a button to start calibration and then moving around the mouth with a specific calibration movement ...
[notes]
Some things that have to be kept in mind:
- left- and right-handed people hold the brush differently
- the same goes for motor impairments (disabled people)
- missing or crooked teeth (these anomalies can later be used as reference points)
My guess is that adding camera info (for example from the linked device) for head/jaw position detection can improve detection even more.
I am developing a 3D game for Windows Phone that includes terrain and volcanoes at infinite distance, similar to Battlezone (1980) by Atari Inc. The player can never reach the terrain, no matter how far the player drives. Currently, to implement this, I am mapping a 2D texture onto the inside wall of a cylinder. The cylinder moves with the player so that the player can never reach the terrain. I am not sure whether this is a good method, as I am facing problems such as distortion of the texture when mapping it onto the cylinder wall.
Please suggest methods for implementing a terrain view in XNA similar to Battlezone.
Normally, instead of a cylinder, developers use a box (a so-called skybox).
It has fewer polygons and generally less distortion (there can be some at the edges).
To make it look more realistic, some developers, such as Valve, use an off-screen render in a first pass that includes the skybox plus some distant low-detail models and moving cloud sprites or a textured ring with alpha. The two points of view (main camera and off-screen camera) are synchronised; then, without clearing the colour buffer, they render the final scene on top. Thanks to that, far buildings move a bit and the surroundings of the scene look less flat. To avoid clearing the z-buffer between passes, they simply do the first pass (literally) under the floor of the main-pass scene.
I've managed to understand how to project a 3D point onto the 2D screen.
Now I would like to ask for some guidelines on how to use the phone's rotation, based on accelerometer data, to change a marker's screen coordinates.
You need the gyro data, not the accelerometer data.
The gyro mouse might work for your application, see between 37:00-38:25 in the Google Tech Talk.
If you need more than the gyro mouse, then I highly recommend Direction Cosine Matrix IMU: Theory; it is basically a tutorial on how to implement orientation tracking.
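To give a rough idea of how gyro data can move a marker on screen, here is a minimal single-axis sketch in Python. It is not the method from the talk or from the DCM document: it simply integrates the yaw rate over time and re-projects a marker at a fixed bearing with a pinhole model. The function names, the sign convention (positive angles to the right) and all numbers are assumptions, and plain integration like this drifts over time, which is what proper orientation tracking addresses.

```python
import math


def integrate_gyro(angle_rad, rate_rad_s, dt_s):
    """One integration step: new angle = old angle + angular rate * time step."""
    return angle_rad + rate_rad_s * dt_s


def marker_screen_x(cx, focal_px, marker_bearing_rad, device_yaw_rad):
    """Horizontal pixel position of a marker at a fixed world bearing.

    Returns None when the marker is 90 degrees or more away from the optical axis.
    """
    rel = marker_bearing_rad - device_yaw_rad
    rel = math.atan2(math.sin(rel), math.cos(rel))  # wrap to [-pi, pi]
    if abs(rel) >= math.pi / 2:
        return None
    return cx + focal_px * math.tan(rel)


# Hypothetical run: marker 0.1 rad to the right, device turning right
# at 0.1 rad/s, sampled 10 times at 0.1 s intervals.
yaw = 0.0
start_x = marker_screen_x(540.0, 1000.0, 0.1, yaw)
for _ in range(10):
    yaw = integrate_gyro(yaw, 0.1, 0.1)
end_x = marker_screen_x(540.0, 1000.0, 0.1, yaw)
print(start_x, end_x)
```

As the device turns toward the marker, the marker's screen position moves toward the image centre `cx`.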
Similar questions:
track small movements of iphone with no GPS
What is the real world accuracy of phone accelerometers when used for positioning?
how to calculate phone's movement in the vertical direction from rest?
iOS: Movement Precision in 3D Space
How to use Accelerometer to measure distance for Android Application Development
How can I find distance traveled with a gyroscope and accelerometer?