Getting Y rotation of ARKit pointOfView - rotation

I'm trying to get the real-life heading of the point of view in an ARKit scene (0–360 degrees). I'm using the Euler angles from the pointOfView SCNNode:
print("\(pointOfView.eulerAngles.y.radiansToDegrees)")
The problem is that when looking north I get 0, and when looking south I also get 0. When looking NE I get -45 degrees, and when looking SE I also get -45 degrees. It seems the SCNNode cannot distinguish north from south, only west from east. Any advice?
What I ultimately need is to implement a radar view in my ARKit real-world scene, with the expected behavior North: 0, East: 90, South: 180, West: 270.
Thanks in advance!

I've just been working on a similar situation. What you are after is what I call the "heading", which isn't as easy to define cleanly as you might think.
Quick background: there are two kinds of rotation. Euler angles are relative to world space but suffer from what is called gimbal lock at the pitch extremes. Then there is the rotation relative to the device's own axes, held in the transform property of ARCamera.
To illustrate the difference: euler.y always means the way the device is facing (except when it is flat, in which case gimbal lock mucks it up, hence our problem), whereas the transform's y always means rotation around the vertical axis through the phone (which, just to make things extra confusing, is based on the device held in landscape in ARKit).
(Side note: if you are used to CoreMotion, you may have noticed that in ARKit gimbal lock occurs when the device is held flat, whereas in CM it occurs when the device is upright.)
So how do we get a "heading" that works whether the device is flat or upright? The solution below (sorry, it's Objective-C!) does the following:
Take two normal vectors: one along the phone's Z axis (straight out from the screen) and one that sticks out the bottom of the phone, which I call the -Y axis (though it's actually the +X axis when the device is held in landscape).
Rotate both vectors by the device's transform (not the Eulers), project them onto the XZ plane, and take the angle of each projected vector with respect to the Z axis.
When the phone is upright, the Z normal gives the perfect heading, but when the phone is flat, the Y normal is the one to use. In between we "crossfade" between the two based on the phone's tilt, i.e. euler.x.
One small issue is that when the user tilts the phone slightly past flat (facing down), the heading given by the Z normal flips. We don't really want that (more from a UX perspective than a mathematical one), so let's detect this downward tilt and flip the zHeading by 180˚ when it happens.
The end result is a consistent, smooth heading regardless of the device orientation. It even works when the device is moved between portrait and landscape... huzzah!
// Create a quaternion representing the device's current rotation (NOT the same as the Euler angles!)
GLKMatrix3 deviceRotM = GLKMatrix4GetMatrix3(SCNMatrix4ToGLKMatrix4(SCNMatrix4FromMat4(camera.transform)));
GLKQuaternion Q = GLKQuaternionMakeWithMatrix3(deviceRotM);

// We want the phone's Z normal (in the phone's reference frame) projected onto XZ to get the angle when the
// phone is upright, BUT the Y normal when it's horizontal. We'll crossfade between the two based on the
// phone tilt (euler x)...
GLKVector3 phoneZNormal = GLKQuaternionRotateVector3(Q, GLKVector3Make(0, 0, 1));
GLKVector3 phoneYNormal = GLKQuaternionRotateVector3(Q, GLKVector3Make(1, 0, 0)); // why (1,0,0)? Rotation = (0,0,0) is when the phone is landscape and upright. We want the vector that points to +Z when the phone is portrait and flat.

float zHeading = atan2f(phoneZNormal.x, phoneZNormal.z);
float yHeading = atan2f(phoneYNormal.x, phoneYNormal.z);

// Flip the zHeading if the phone is tilting down, i.e. the normal pointing down the device suddenly has a +y component
BOOL isDownTilt = phoneYNormal.y > 0;
if (isDownTilt) {
    zHeading = zHeading + M_PI;
    if (zHeading > M_PI) {
        zHeading -= 2 * M_PI;
    }
}

float a = fabs(camera.eulerAngles.x / M_PI_2);
float heading = a * yHeading + (1 - a) * zHeading;

NSLog(@"euler: %3.1f˚ %3.1f˚ %3.1f˚ zHeading=%3.1f˚ yHeading=%3.1f˚ heading=%3.1f˚ a=%.2f status:%li:%li zNorm=(%3.2f, %3.2f, %3.2f) yNorm=(%3.2f, %3.2f, %3.2f)",
      GLKMathRadiansToDegrees(camera.eulerAngles.x), GLKMathRadiansToDegrees(camera.eulerAngles.y), GLKMathRadiansToDegrees(camera.eulerAngles.z),
      GLKMathRadiansToDegrees(zHeading), GLKMathRadiansToDegrees(yHeading), GLKMathRadiansToDegrees(heading), a,
      (long)camera.trackingState, (long)camera.trackingStateReason,
      phoneZNormal.x, phoneZNormal.y, phoneZNormal.z, phoneYNormal.x, phoneYNormal.y, phoneYNormal.z);

Related

Inverse Camera Intrinsic Matrix for Image Plane at Z = -1

A similar question was asked before; unfortunately I cannot comment on Samgak's answer, so I'm opening a new post. Here is the link to the old question:
How to calculate ray in real-world coordinate system from image using projection matrix?
My goal is to map from image coordinates to world coordinates. In fact I am trying to do this with the camera intrinsic parameters of the HoloLens camera.
Of course this mapping will only give me a ray connecting the camera optical centre and all points which can lie on that ray. For the mapping from image coordinates back to the camera coordinate frame (and from there to world coordinates) we can use the inverse camera matrix, which is:
K^-1 = [ 1/fx    0    -cx/fx ;
          0    1/fy   -cy/fy ;
          0      0       1   ]

Pcam = K^-1 * Ppix

Pcam_x = Ppix_x/fx - cx/fx
Pcam_y = Ppix_y/fy - cy/fy
Pcam_z = 1
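For concreteness, this (+z) unprojection looks like the following small numpy sketch (the intrinsics here are made-up values, not the HoloLens ones):

import numpy as np

fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0   # made-up intrinsics
K_inv = np.array([[1 / fx,      0, -cx / fx],
                  [     0, 1 / fy, -cy / fy],
                  [     0,      0,        1]])

p_pix = np.array([800.0, 400.0, 1.0])           # homogeneous pixel coordinates
p_cam = K_inv @ p_pix                           # = ((px - cx)/fx, (py - cy)/fy, 1)
ray = p_cam / np.linalg.norm(p_cam)             # direction from the optical centre through the pixel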
Orientation of Camera Coordinate System and Image Plane
In this specific case the image plane is probably at Z = -1 (however, I am a bit uncertain about this). The section "Pixel to Application-specified Coordinate System" on the HoloLens CameraProjectionTransform page describes how to go from pixel coordinates to world coordinates. As far as I understand, two signs in K^-1 are flipped, so that we calculate the coordinates as follows:
Pcam_x = (Ppix_x/fx) - (cx*(-1)/fx) = Ppix_x/fx + cx/fx
Pcam_y = (Ppix_y/fy) - (cy*(-1)/fy) = Ppix_y/fy + cy/fy
Pcam_z = -1

Pcam = (Pcam_x, Pcam_y, -1)
CameraOpticalCentre = (0, 0, 0)
Ray = Pcam - CameraOpticalCentre
I do not understand how to create the camera intrinsics for the case where the image plane is at a negative Z coordinate, and I would like a mathematical explanation or an intuitive understanding of why we have the sign flip (Ppix_x/fx + cx/fx instead of Ppix_x/fx - cx/fx).
Edit: I read in another post that the third column of the camera matrix has to be negated for the case where the camera is facing down the negative z-direction. This would explain the sign flip. However, why do we need to change the sign of the third column? I would like to have an intuitive understanding of this.
Here is the link to the post: Negation of third column
Thanks a lot in advance,
Lisa
why do we need to change the sign of the third column
To understand why we need to negate the third column of K (i.e. negate the principal points of the intrinsic matrix), let's first understand how to get the pixel coordinates of a 3D point that is already in the camera coordinate frame. After that, it is easier to understand why -z requires negating things.
Let's imagine a camera and one point B in space (w.r.t. the camera coordinate frame), and let's put the camera sensor (i.e. the image) at E', as in the image below. Then f (in red) is the focal length and x' (in blue) is the x coordinate, in pixels, of B measured from the center of the image. To simplify things, let's place B at the corner of the field of view (i.e. in the corner of the image).
We need to calculate the coordinates of B projected onto the sensor (which is the same as the 2D image). Because the triangles AEB and AE'B' are similar, x'/f = X/Z, therefore x' = X*f/Z. X*f is the first operation the K matrix performs. We can multiply K*B (with B as a column vector) to check.
This gives us coordinates in pixels w.r.t. the center of the image. Let's imagine the image is 480x480. Then B' looks like the image below. Keep in mind that in image coordinates the y-axis increases going down and the x-axis increases going right.
In images, the pixel at coordinates (0,0) is in the top-left corner, so we need to add half of the width of the image to the point we have: px = X*f/Z + cx, where cx is the principal point on the x-axis, usually W/2. Computing px = X*f/Z + cx is exactly the same as computing K*B and dividing by Z. In our example X*f/Z was -240; if we add cx (W/2 = 480/2 = 240), then X*f/Z + cx = 0, and the same for Y. The final pixel coordinates in the image are (0,0), i.e. the top-left corner.
Now, in the case where we use a negative z: when we divide X and Y by Z, because Z is negative it flips the sign of X and Y, so the point is projected to B'' in the opposite quadrant, as in the image below.
Because of this, instead of adding the principal point we need to subtract it, which is the same as negating the last column of K.
So we get 240 - 240 = 0 (where the second 240 is the principal point in x, cx), and the same for Y. The pixel coordinates are (0,0), just as in the example where z was positive. If we do not negate the last column we end up with (480,480) instead of (0,0).
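As a quick numeric check of this argument (a small numpy sketch; f = 240 and the corner point are picked so that X*f/Z = ±240, matching the numbers above):

import numpy as np

f, cx, cy = 240.0, 240.0, 240.0          # 480x480 image, principal point at the centre
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])
K_neg = K.copy()
K_neg[:, 2] *= -1                        # negate the third column for the -z convention

B_pos = np.array([-1.0, -1.0,  1.0])     # corner of the FOV, camera looking down +z
B_neg = np.array([-1.0, -1.0, -1.0])     # same corner, camera looking down -z

def project(K, B):
    p = K @ B
    return p[:2] / p[2]                  # perspective divide

print(project(K, B_pos))                 # [0. 0.]     -> top-left corner
print(project(K_neg, B_neg))             # [0. 0.]     -> top-left corner again
print(project(K, B_neg))                 # [480. 480.] -> wrong corner without the negation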
Hope this helped a little bit

Point look at Point

So I have one point in 3D space, and I have the location and rotation of the camera in 3D space.
Basically there is a Vector3 on the object, and the camera has a Vector3 and a Quaternion.
I need to work out how the user should look at that point, so I can tell them how to move towards it.
Should the user turn the camera left, or right, or is the point behind them?
One way to do this is to calculate the direction the camera is currently facing as a yaw angle (like a compass heading), and calculate the direction it needs to face in order to be looking at the point.
Subtract one from the other and adjust the result so that it is in the range of -180 to 180 degrees (or -pi to pi radians) and then tell the user to turn left or right based on the sign. If the absolute value is more than 120 degrees (or some configurable value) then tell them it is behind them.
To find the camera's current heading, transform the vector (0, 0, 1) by the quaternion to get the forward vector, then calculate the heading using atan2(forward.z, forward.x).
To calculate the heading required to look at the point, subtract the current camera position from the point to get a desired forward vector and then pass it to atan2:
Vector3 desired_forward = point - camera_pos;
float desired_heading = atan2(desired_forward.z, desired_forward.x);
Then find the rotation needed:
float rotation_needed = desired_heading - heading;
if (rotation_needed > Math.PI)
    rotation_needed -= 2 * Math.PI;
if (rotation_needed < -Math.PI)
    rotation_needed += 2 * Math.PI;
Now tell the user to rotate left or right based on the sign of the rotation needed.
If you want to do it for look up/down, you can calculate a pitch angle by first calculating the length of the forward vector in the XZ plane and then using atan2 again:
float xzLength = sqrt(forward.x * forward.x + forward.z * forward.z);
float pitch_angle = atan2(forward.y, xzLength);
Do the same for the desired forward vector and subtract the current from the desired. Check the sign to tell the user whether to look up or down.
There are a few likely complications. For example, depending on whether the camera quaternion specifies the transform from world space to camera space or vice versa, you might need to negate the calculated camera heading.
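To pull the pieces together, here is a rough Python sketch of the whole idea. The 120-degree "behind" threshold and the left/right signs are assumptions that depend on your engine's handedness; camera_forward is the (0, 0, 1) vector already rotated by the camera quaternion.

import math

def heading(v):
    # Yaw of a direction vector (compass-style angle in the XZ plane), in radians.
    return math.atan2(v[2], v[0])

def pitch(v):
    # Elevation of a direction vector above the XZ plane, in radians.
    return math.atan2(v[1], math.sqrt(v[0] * v[0] + v[2] * v[2]))

def wrap(a):
    # Wrap an angle difference into the range (-pi, pi].
    while a > math.pi:
        a -= 2 * math.pi
    while a <= -math.pi:
        a += 2 * math.pi
    return a

def guidance(camera_pos, camera_forward, point, behind_threshold=math.radians(120)):
    desired = [p - c for p, c in zip(point, camera_pos)]
    yaw_delta = wrap(heading(desired) - heading(camera_forward))
    pitch_delta = pitch(desired) - pitch(camera_forward)

    if abs(yaw_delta) > behind_threshold:
        turn = "it's behind you"
    else:
        turn = "turn right" if yaw_delta > 0 else "turn left"  # sign depends on handedness
    look = "look up" if pitch_delta > 0 else "look down"
    return turn, look

print(guidance((0, 0, 0), (0, 0, 1), (5, 2, 5)))   # ('turn left', 'look up') with these conventions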

Finding World Space Coordinates of a Unity UI Element

So according to the Unity documentation, RectTransform.anchoredPosition will return the screen coordinates of a UI element if the anchors are touching at the pivot point of the RectTransform. However, if they are separated (in my case positioned at the corners of the rect), it will give you the position of the anchors relative to the pivot point. This is wonderful unless you want to keep appropriate dimensions of a UI object across multiple resolutions and position a different object based on that position at the same time.
Let's break this down. I have object1 and object2. object1 is positioned at (322.5, -600), and when the anchor points meet at the center (pivot) of the object, anchoredPosition returns just that and object2 is positioned just fine. On the other hand, once I have placed the anchors at the 4 corners of object1, anchoredPosition returns (45.6, -21). That's just no good. I've even tried using Transform.position and then Camera.WorldToScreenPoint(), but that has done just about as much to get me to my goal.
I was hoping you might be able to help me find a way to get the actual screen coordinates of this object. If anyone has any insight into this subject, it would be greatly appreciated.
Notes: I've already attempted to use RectTransform.rect.center and it returned (0, 0).
I've also looked into RectTransformUtility and those helper functions have done all of squat.
anchoredPosition returns "The position of the pivot of this RectTransform relative to the anchor reference point." It has nothing to do with screen coordinates or world space.
If you're looking for the screen coordinates of a UI element in Unity, you can use either rectTransform.TransformPoint or rectTransform.GetWorldCorners to get any of the Vector3s you'd need in world space. Whichever you decide to go with, you can then pass them into Camera.WorldToScreenPoint().
Here's a glimpse of how finding the world-space coordinates of UI elements works, if you're stuck and need to roll your own transformations from view space to world space.
This may be beneficial if, say, you need something more than rectTransform.TransformPoint or want to know how this works.
OK, so you want to take normalised UI coordinates in the range [-1, 1] and de-project them back into world-space coordinates.
To do this you could use something like Camera.main.ScreenToWorldPoint or Camera.main.ViewportToWorldPoint, or even rectTransform.position if you're a slacker.
This is how to do it with just the camera's projection matrix.
/// <summary>
/// Get the world position of an anchor/normalised device coordinate in the range [-1, 1]
/// </summary>
private Vector3 GetAnchor(Vector2 ndcSpace)
{
    Vector3 worldPosition;
    Vector4 viewSpace = new Vector4(ndcSpace.x, ndcSpace.y, 1.0f, 1.0f);

    // Transform to projection coordinate.
    Vector4 projectionToWorld = (_mainCamera.projectionMatrix.inverse * viewSpace);

    // Perspective divide.
    projectionToWorld /= projectionToWorld.w;

    // Z-component is backwards in Unity.
    projectionToWorld.z = -projectionToWorld.z;

    // Transform from camera space to world space.
    worldPosition = _mainCamera.transform.position + _mainCamera.transform.TransformVector(projectionToWorld);

    return worldPosition;
}
I've found that you can multiply your coordinate by 2 times the camera's orthographic size and divide by the screen height.
I have a panel placed at (0, 1080) on a full-HD screen (1920 x 1080) and the camera size is 7, so the Y coordinate in world space will be 1080 * 7 * 2 / 1080 = 14 -> (0, 14).
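A general form of that conversion, as a small sketch (assuming an unrotated orthographic camera sitting at the world origin; orthographic size is half the vertical view height in world units):

def ortho_screen_to_world_y(screen_y, screen_height, ortho_size, camera_y=0.0):
    # An orthographic camera shows 2 * ortho_size world units vertically,
    # so pixels map linearly onto that span, centred on the camera's y.
    world_height = 2.0 * ortho_size
    return camera_y + (screen_y / screen_height - 0.5) * world_height

# With the numbers above (1080 px tall, size 7) the view spans 14 world units;
# a point at the very top of the screen sits 7 units above the camera centre.
print(ortho_screen_to_world_y(1080, 1080, 7))   # 7.0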
ScreenToWorldPoint converts a canvas position to a world position:
Camera.main.ScreenToWorldPoint(transform.position)

Correct calculations of floats in OpenGL ES

I'm making a 3D game. Everything in my code works, although I'm confused about one thing.
When I set up my perspective (gluPerspective) I use zNear = 0.1f and zFar = 100.0f. So far so good. Now I also want to move things in just the x or y direction via glTranslate.... But the origin sits at the absolute center of my screen, and I don't understand how the x and y coordinates relate to it the way zNear and zFar do. If I move my sprite -2.0f to the left on the x-axis with glTranslate..., it ends up almost out of the screen, and the z-axis doesn't behave like that. This makes it a lot harder to handle calculations in all directions. It's quite hard to give each object a sensible float value, and for now I just pick values more or less at random to keep them inside the screen.
So I have a problem calculating correct values for each object. Have I missed something? Should I change something or think about it differently? The reason this is important is that I need to know the absolute left and right of my screen to make these calculations.
This is my onSurfaceChanged:
public void onSurfaceChanged(GL10 gl, int width, int height) {
    gl.glViewport(0, 0, width, height);

    gl.glMatrixMode(GL10.GL_PROJECTION);
    gl.glLoadIdentity();
    GLU.gluPerspective(gl, 45.0f, (float) width / (float) height, 0.1f, 100.0f);

    gl.glMatrixMode(GL10.GL_MODELVIEW);
    gl.glLoadIdentity();
}
Thanks in advance!
When you use gluPerspective you are transforming your coordinates from 3D world space into 2D screen space, using a matrix which looks at (0,0,0) by default (i.e. x = 0, y = 0 is the center of the screen). When you set your object coordinates you are doing it in world space, NOT screen space.
If you effectively want to do 2D graphics (where things are given coordinates relative to their position on the screen), you want to use gluOrtho2D instead.
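If you do stay with the perspective projection, keep in mind that the visible left/right edges depend on depth: at a distance z in front of the camera they follow from the field of view. A small sketch of that calculation (plain math, shown here in Python, independent of the GL code above):

import math

def visible_extent(fov_y_deg, aspect, z):
    # Half-width and half-height of the view frustum at distance z in front of the
    # camera, for a symmetric perspective projection such as the one gluPerspective sets up.
    half_h = z * math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = half_h * aspect
    return half_w, half_h

# e.g. fovy = 45 degrees, 16:9 viewport, a plane 10 units in front of the camera:
print(visible_extent(45.0, 16 / 9, 10.0))   # approx (7.36, 4.14)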

Rotating an image with the mouse

I am writing a drawing program, Whyteboard -- http://code.google.com/p/whyteboard/
I have implemented image-rotating functionality, except that its behaviour is a little odd: I can't figure out the proper logic to rotate the image in relation to the mouse position.
My code is something similar to this (these methods are called from a mouse event handler):
def resize(self, x, y, direction=None):
    """Rotate the image"""
    self.angle += 1
    if self.angle > 360:
        self.angle = 0
    self.rotate()

def rotate(self, angle=None):
    """Rotate the image (degrees converted to radians), turn it back into a bitmap"""
    rad = (2 * math.pi * self.angle) / 360
    if angle:
        rad = (2 * math.pi * angle) / 360
    img = self.img.Rotate(rad, (0, 0))
So basically, the angle to rotate the image by keeps increasing as the user moves the mouse. However, this sometimes means you have to "circle" the mouse many times to rotate an image 90 degrees, let alone 360.
But I need it to work like other programs, where the image is rotated in relation to your mouse's position relative to the image.
This is the bit I'm having trouble with. I'm using Python and wxPython, but I've tried to keep the question language-independent, since the logic applies to any language.
I'm assuming resize() is called for every mouse movement update. Your problem seems to be the self.angle += 1, which makes you update your angle by 1 degree on each mouse event.
A solution to your problem would be: pick the point on the image where the rotation will be centered (in this case it's your (0,0) point in self.img.Rotate(), but usually it is the center of the image). The rotation angle should be the angle formed by the line that goes from this point to the mouse cursor, minus the angle formed by the line that goes from this point to the mouse position when the user clicked.
To calculate the angle between two points, use math.atan2(y2-y1, x2-x1) which will give you the angle in radians. (you may have to change the order of the subtractions depending on your mouse position axis).
fserb's solution is the way I would go about the rotation too, but something additional to consider is your use of:
img = self.img.Rotate(rad, (0, 0))
If you are performing a bitmap image rotation in response to every mouse drag event, you are going to get a lot of data loss from the combined effect of all the interpolation required for the rotation. For example, rotating by 1 degree 360 times will give you a much blurrier image than the original.
Try having a rotation system something like this:
display_img = self.img.Rotate(rad, pos)
then use the display_img image while you are in rotation mode. When you end rotation mode (onMouseUp maybe), img = display_img.
This type of strategy is good whenever you have a lossy operation with a user preview.
Here's the solution in the end:
def rotate(self, position, origin):
    """position: mouse x/y position; origin: x/y where the drag started (rotation is around self.center)"""
    origin_angle = self.find_angle(origin, self.center)
    mouse_angle = self.find_angle(position, self.center)
    angle = mouse_angle - origin_angle
    # do the rotation here

def find_angle(self, a, b):
    try:
        answer = math.atan2((a[0] - b[0]), (a[1] - b[1]))
    except:
        answer = 0
    return answer
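For completeness, a hypothetical wxPython hookup for the above (the handler and attribute names are assumptions, not taken from Whyteboard itself):

def on_left_down(self, event):
    # remember where the drag started; the rotation is measured relative to this point
    self.drag_origin = (event.GetX(), event.GetY())

def on_motion(self, event):
    if event.Dragging() and event.LeftIsDown():
        self.rotate((event.GetX(), event.GetY()), self.drag_origin)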
