We get timestamps as a double value for pose, image, and point data, and they aren't always aligned. How do I calculate the temporal distance between two timestamps? Yes, I know how to subtract two doubles, but I'm not at all sure how the delta corresponds to time.
I have some interesting timestamp data that sheds light on your question, without exactly answering it. I have been trying to match up depth frames with image frames - just as a lot of people posting under this Tango tag. My data did not match exactly and I thought there were problems with my projection matrices and point reprojection. Then I checked the timestamps on my depth frames and image frames and found that they were off by as much as 130 milliseconds. A lot! Even though I was getting the most recent image whenever a depth frame was available. So I went back to test just the timestamp data.
I am working in native code based on the point-cloud-jni-example. For each of onXYZijAvailable(), onFrameAvailable(), and onPoseAvailable() I am dumping out time information. In the XYZ and Frame cases I am copying the returned data to a static buffer for later use. For this test I am ignoring the buffered image frame, and the XYZ depth data is displayed in the normal OpenGL display loop of the example code. The data captured looks like this:
callback type : systime : timestamp : last pose
I/tango_jni_example( 3247): TM CLK Img 5.420798 110.914437 110.845522
I/tango_jni_example( 3247): TM CLK XYZ 5.448181 110.792470 110.845522
I/tango_jni_example( 3247): TM CLK Pose 5.454577 110.878850
I/tango_jni_example( 3247): TM CLK Img 5.458924 110.947708 110.878850
I/tango_jni_example( 3247): TM CLK Pose 5.468766 110.912178
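For reference, the systime column was produced with something along these lines (a simplified sketch of my logging helper, not the exact code):

#include <chrono>

// Wall-clock seconds since app start, sampled once inside each callback.
static const std::chrono::system_clock::time_point kAppStart =
    std::chrono::system_clock::now();

static double SecondsSinceStart() {
  return std::chrono::duration<double>(
      std::chrono::system_clock::now() - kAppStart).count();
}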
The system time is from std::chrono::system_clock::now() run inside of each callback. (Offset by a start time at app start.) The timestamp is the actual timestamp data from the XYZij, image, or pose struct. For depth and image I also list the most recent pose timestamp (from start-of-service to device, with given time of 0.0). A quick analysis of about 2 minutes of sample data leads to the following initial conclusions:
Pose data is captured at VERY regular intervals of 0.033328 seconds.
Depth data is captured at pretty regular intervals of 0.2 seconds.
Image data is captured at odd intervals
with 3 or 4 frames at 0.033 seconds
then 1 frame at about 0.100 seconds
often followed by a second frame with the same timestamp
(even though it is not reported until the next onFrameAvailable()?)
That is the actual timestamp data in the returned structs. The "real?" elapsed time between callbacks is much more variable. The pose callback fires anywhere from 0.010 to 0.079 seconds apart, even though the pose timestamps are rock solid at 0.033. The image (frame) callback fires 4 times at intervals between 0.025 and 0.040 seconds and then gives one long pause of around 0.065 seconds. That is where two images with the same timestamp are returned in successive calls. It appears that the camera is skipping a frame?
So, to match depth, image, and pose you really need to buffer multiple returns with their corresponding timestamps (ring buffer?) and then match them up by whichever value you want as master. Pose times are the most stable.
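To illustrate the buffering I mean, here is a minimal sketch (my own invention, not anything from the Tango API) of a timestamped ring buffer with nearest-timestamp lookup:

#include <cmath>
#include <deque>
#include <mutex>

template <typename T>
struct Stamped { double timestamp; T data; };

// Fixed-capacity buffer of timestamped returns; one instance per stream.
template <typename T>
class RingBuffer {
 public:
  void push(double t, const T& d) {
    std::lock_guard<std::mutex> lock(mutex_);
    buf_.push_back(Stamped<T>{t, d});
    if (buf_.size() > kMaxSize) buf_.pop_front(); // drop the oldest entry
  }
  // Fetch the buffered entry whose timestamp is closest to 'master'.
  bool closest(double master, Stamped<T>* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (buf_.empty()) return false;
    size_t best = 0;
    for (size_t i = 1; i < buf_.size(); ++i)
      if (std::fabs(buf_[i].timestamp - master) <
          std::fabs(buf_[best].timestamp - master))
        best = i;
    *out = buf_[best];
    return true;
  }
 private:
  static const size_t kMaxSize = 16;
  std::deque<Stamped<T> > buf_;
  std::mutex mutex_;
};

You would keep one such buffer for depth and one for images, push in the respective callbacks, and then query both with whatever pose timestamp you picked as master.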
Note: I have not tried to get a pose for a particular "in between" time to see if the returned pose is interpolated between the values given by onPoseAvailable().
I have the logcat file and various awk extracts available. I am not sure how to post those (1000's of lines).
I think the fundamental question is how to sync the pose, depth, and color image data together into a single frame. To answer that, there are actually two steps:
Sync pose to either the color image or depth: the simplest way to do that is to use the TangoService_getPoseAtTime function, which basically gives you the ability to query a pose with a certain timestamp. I.e., when you have a depth point cloud available, it gives you the timestamp of that depth frame, and you can then use that timestamp to query the corresponding pose (a short sketch follows below).
Sync the color image and the depth image: currently, you would have to buffer either the depth point clouds or the color images at the application level and, based on the timestamp of one of them, query the other's data in the buffer. There is a field named color_image in the TangoXYZij data structure, and the comment says it is reserved for future use, so the built-in sync-up feature might be coming in a future release.
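A short sketch of step 1 (hedged; check the Tango C API headers for the exact signatures and constants):

#include <tango_client_api.h>

// Query the pose that corresponds to a depth frame's timestamp, instead of
// using whatever pose happened to arrive last in onPoseAvailable().
bool GetPoseForDepthFrame(const TangoXYZij* xyz_ij, TangoPoseData* pose) {
  TangoCoordinateFramePair frames;
  frames.base = TANGO_COORDINATE_FRAME_START_OF_SERVICE;
  frames.target = TANGO_COORDINATE_FRAME_DEVICE;
  if (TangoService_getPoseAtTime(xyz_ij->timestamp, frames, pose) !=
      TANGO_SUCCESS) {
    return false;
  }
  return pose->status_code == TANGO_POSE_VALID;
}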
I work in autonomous robotics. I will often simulate a robot without visualization, export position and rotation data to a file at ~30 fps, and then play that file back at a later time. Currently, I save the animation data in a custom-format JSON file and animate using three.js.
I am wondering if there is a better way to export this data?
I am not well versed in animation, but I suspect that I could be exporting to something like COLLADA or glTF and gain the benefits of using a format that many systems are already setup to import.
I have a few questions (some specific and some general):
How do animations usually get exported in these formats? It seems that most of them have something to do with skeletons or morphing, but neither of those concepts appears to apply to my case. (Could I get a pointer to an overview of general animation concepts?)
I don't really need key-framing. Is it reasonable to have key-frames at 30 to 60 fps without any need for interpolation?
Do any standard animation formats save data in a format that doesn't assume some form of interpolation?
Am I missing something? I'm sure my lack of knowledge in the area has hidden something that is obvious to animators.
You specifically mentioned autonomous robots, and position and rotation in particular. So I assume that the robot itself is the level of granularity that is supposed to be stored here. (Just to differentiate it from an articulated robot - basically a manipulator ("arm") with several rotational or translational joints that may have different angles)
For this case, here is a very short, high-level description of how this could be stored in glTF(*):
You would store the robot (or each robot) as one node of a glTF asset. Each node can contain a translation and a rotation property (given as a 3D vector and a quaternion). These nodes then simply describe the position and orientation of your robots. You could imagine the robots being "attached" to these nodes. (In fact, you can attach a mesh to a node in glTF, which could then be the visual representation of the robot.)
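For instance, a single robot could be a node like this (a hand-written sketch of glTF 2.0 JSON; the values are placeholders, and note that glTF stores quaternions in (x, y, z, w) order):

"nodes": [
  {
    "name": "robot0",
    "translation": [ 1.2, 3.4, 4.5 ],
    "rotation": [ 0.12, 0.32, 0.14, 0.53 ]
  }
]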
The animation data itself is then a description of how these properties (translation and rotation) change over time. The way this information is stored can be imagined as a table, where the translation and rotation are associated with each time stamp:
time (s)       0.1    0.2   ...  1.0
translation x  1.2    1.3   ...  2.3
translation y  3.4    3.4   ...  4.3
translation z  4.5    4.6   ...  4.9
rotation x     0.12   0.13  ...  0.42
rotation y     0.32   0.43  ...  0.53
rotation z     0.14   0.13  ...  0.34
rotation w     0.53   0.46  ...  0.45
This information is then stored in binary form, and provided via so-called accessor objects.
The animation of a glTF asset then basically establishes the connection between this binary animation data and the properties of the node that are affected by it: each animation refers to such a "data table", and to the node whose translation and rotation properties will be filled with the new values as time progresses.
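As a rough, hand-written sketch, such an animation could look like this in glTF 2.0 JSON, assuming that accessor 0 holds the time stamps, accessor 1 the translations (VEC3), accessor 2 the rotations (quaternions), and that node 0 is the robot:

"animations": [
  {
    "samplers": [
      { "input": 0, "interpolation": "LINEAR", "output": 1 },
      { "input": 0, "interpolation": "LINEAR", "output": 2 }
    ],
    "channels": [
      { "sampler": 0, "target": { "node": 0, "path": "translation" } },
      { "sampler": 1, "target": { "node": 0, "path": "rotation" } }
    ]
  }
]

Each channel connects one sampler (one block of rows from the table above) to one property of the target node.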
Regarding interpolation:
In your case, where the output is sampled at a high rate from the simulation, basically every frame is a "key frame", and no explicit information about key frames or the interpolation scheme has to be stored. Just declaring that the animation interpolation should be of the type LINEAR or STEP should be sufficient for this use case.
(The option to declare it as a LINEAR interpolation is mainly relevant for playback. Imagine you stop your playback exactly after 0.15 seconds: should it then show the state that the robot had at time stamp 0.1, the state at time stamp 0.2, or one that is interpolated linearly? This, however, mainly applies to a standard viewer, and not necessarily to a custom playback.)
(*) A side note: On a conceptual level, the way of how the information is represented in glTF and COLLADA is similar. Roughly speaking, COLLADA is an interchange format for authoring applications, and glTF is a transmission format that can efficiently be transferred and rendered. So although the answers until now refer to glTF, you should consider COLLADA as well, depending on your priorities, use-cases or how the "playback" that you mentioned is supposed to be implemented.
Disclaimer: I'm a glTF contributor as well. I also created the glTF tutorial section showing a simple animation and the one that explains some concepts of animations in glTF. You might find them useful, but they obviously build upon some of the concepts that are explained in the earlier sections.
The type of animation you describe is often called "baked" animation, where some calculation has been sampled, possibly at 30 ~ 60 fps, with keyframes saved at the high sample rate. For such animations, usually linear interpolation is applied. For example, in Blender, there's a way to run the Blender Game Engine and record the physics simulation to (dense) keyframes.
As for interpolation, here's a thought experiment: Consider for a moment a polygon-based render engine wants to render a circle, but must use only straight lines. Some limited number of points are calculated around the edge of the circle, and dozens or hundreds of small line segments fill in the gaps between the points. With enough density, or with the camera far enough back, it looks round, but the line segments ensure there are no leaks or gaps in the would-be circle. The same concept applies (in time rather than in space) to baked keyframes. There's a high sample density, and straight lines (linear interpolation) fill in the gaps. If you play it in super-slow motion, you might be able to detect subtle changes in speed as new keyframes are reached. But at normal speed, it looks normal, and the frame rate doesn't need to stay locked to the sample rate.
There's a section on animations for glTF 2.0 that I'll recommend reading here (disclaimer, I'm a glTF contributor and member of the working group). In particular, look at the descriptions of node-based animations with linear interpolation.
For robotics, you'll want to steer clear of skins and skeleton-based animation. Such things are not always compatible with node-based animations anyway (we've run into problems there just recently). The node-based animations are much more applicable to non-deforming robots with articulated joints and such.
I am using Qt 4.8.6 to display multiple radar videos.
For now I am getting about 4096 azimuths (360°) per 2.5 seconds per video.
I display my image using a class inherited from QGraphicsObject (see here), using one of the RGB channels for each video.
Per azimuth I get the angle and an array of 8192 rangebins, and my image has a size of 1024x1024 pixels. I now check, for every pixel (going through every x-coordinate and checking the min and max y-coordinates for every azimuth and pixel coordinate), which rangebins are present at that pixel, and write the largest value into my image array.
My problems
Calculating each azimuth takes about 1 ms, which is far too slow. (I get two azimuths about every 600 microseconds, and later there may be even more video channels.)
I want to zoom and move my image and for now have thought about two methods to do that:
Using an image array at full size and zooming/moving the QGraphicsScene directly ("virtually")
That would cause the array to have a size of 16384x16384x4 bytes, which is way too big (I cannot manage to allocate enough space)
Saving multiple images for different scale factors and offsets; but for that my transformation algorithm (which is already slow) would have to run multiple times, and the zoom and offset would only show up after the full 2.5 seconds
Can you think of any better methods to do that?
Are there any standard rules for checking my algorithm for better performance?
I know this is a very specific question, but since my mentor is not at work for the next few days, I will give it a try here.
Thank you!
I'm not sure why you are using a QGraphicsScene for the scenario you are doing. Have you considered turning your data into a raster image, and presenting the data as a bitmap?
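For example, here is a minimal sketch (all names and the data layout are my own assumptions, not your code) of the usual lookup-table approach to polar-to-raster conversion: precompute once which (azimuth, rangebin) sample covers each pixel, so that each 2.5-second update becomes a plain gather, and let the painter scale the finished image for zooming and panning. Note that this simplified version picks the nearest rangebin per pixel instead of the max over all rangebins covering it:

#include <QImage>
#include <QtGlobal>
#include <cmath>
#include <vector>

struct PolarLut {
  std::vector<int> index; // per pixel: azimuth * rangebins + rangebin, or -1
  int w, h;
};

// Build once (and only on geometry changes): map each pixel to a polar sample.
PolarLut buildLut(int w, int h, int azimuths, int rangebins) {
  const double kTwoPi = 6.283185307179586;
  PolarLut lut;
  lut.w = w; lut.h = h;
  lut.index.assign(w * h, -1);
  const double cx = w / 2.0, cy = h / 2.0;
  const double maxR = qMin(w, h) / 2.0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const double dx = x - cx, dy = y - cy;
      const double r = std::sqrt(dx * dx + dy * dy);
      if (r >= maxR) continue; // pixel lies outside the radar circle
      double phi = std::atan2(dy, dx);
      if (phi < 0.0) phi += kTwoPi; // normalize to 0..2*pi
      const int az = int(phi / kTwoPi * azimuths) % azimuths;
      const int rb = int(r / maxR * rangebins);
      lut.index[y * w + x] = az * rangebins + rb;
    }
  }
  return lut;
}

// Per update: a single pass over the pixels, here writing the green channel.
void updateImage(const PolarLut& lut, const quint8* video, QImage* img) {
  for (int y = 0; y < lut.h; ++y) {
    QRgb* line = reinterpret_cast<QRgb*>(img->scanLine(y));
    for (int x = 0; x < lut.w; ++x) {
      const int idx = lut.index[y * lut.w + x];
      line[x] = (idx >= 0) ? qRgb(0, video[idx], 0) : qRgb(0, 0, 0);
    }
  }
}

The QImage (Format_ARGB32, so scanLine() can be treated as QRgb*) is then drawn with QPainter::drawImage() into whatever target rectangle the current zoom and pan dictate, so only the cheap drawing has to be redone on user interaction, not the conversion.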
I'm working on a project in which a rod is attached at one end to a rotating shaft. So, as the shaft rotates from 0 to ~100 degrees back-and-forth (in the xy plane), so does the rod. I mounted a 3-axis accelerometer at the end of the moving rod, and I measured the distance of the accelerometer from the center of rotation (i.e., the length of the rod) to be about 38 cm. I have collected a lot of data, but I'm in need of help to find the best method to filter it. First, here's a plot of the raw data:
I think the data makes sense: if it's ramping up, then at that point the acceleration should be linearly increasing, and when it's ramping down, it should linearly decrease. If it's moving at constant speed, the acceleration will be ~zero. Keep in mind, though, that sometimes the speed changes (is higher) from one "trial" to the next. In this case, there were ~120 "trials" or movements/sweeps, with data sampled at 148 Hz.
For filtering, I've tried a low-pass filter and then an exponentially decreasing moving average, and neither result was too hot. And although I'm not good at interpreting these, here is what I got when coding a power-vs-frequency plot:
What I was hoping to get help with here is to find a really good method by which I can filter this data. The one thing that keeps coming up time and time again (especially on this site) is the Kalman filter. While there's lots of code online that helps with implementing these in MATLAB, I haven't been able to actually understand it that well, and therefore neglect to include my work on it here. So, is a Kalman filter appropriate here, for rotational acceleration? If so, can someone help me implement one in MATLAB and interpret it? Is there something I'm not seeing that may be just as good or better and is relatively simple?
Here's the data I'm talking about. Looking at it more closely/zooming in gives a better appreciation for what's going on in the movement, I think:
http://cl.ly/433B1h3m1L0t?_ga=1.81885205.2093327149.1426657579
Edit: OK, here is the plot of both relevant dimensions collected from the accelerometer. I am neglecting to include the up-and-down dimension, as the accelerometer shows a near-constant ~1 g there, so I think it's safe to say it's not capturing much of the rotational motion. Red is what I believe is the centripetal component, and blue is the tangential one. I have no idea how to combine them, though, which is why I (maybe wrongfully?) ignored that in my post.
And here is the data for the other dimension:
http://cl.ly/1u133033182V?_ga=1.74069905.2093327149.1426657579
Forget the Kalman filter, see the note at the end of the answer for the reason why.
Using a simple moving-average filter (like I showed you in an earlier reply, if I recall), which is in essence a low-pass filter:
n = 30 ; %// length of the filter
kernel = ones(1,n)./n ; %// simple moving-average kernel
%// filter twice (forward, then backward) so the smoothing adds no phase lag:
ysm = filter( kernel , 1 , flipud(filter( kernel , 1 , flipud(y) )) ) ;
%// assuming your data "y" are in a COLUMN (otherwise change 'flipud' to 'fliplr')
Note: if you have access to the Curve Fitting Toolbox, you can simply use ys = smooth(y,30) ; to get nearly the same result.
I get:
which, once zoomed in, looks like:
You can play with the parameter n to increase or decrease the smoothing.
The gray signal is your original signal. I strongly suspect that the noise spikes you are getting are just due to vibrations of your rod. (Depending on the length-to-cross-section ratio of your rod, you can get significant vibrations at the end of your 38 cm rod. These vibrations will take the shape of oscillations around the main carrier signal, which definitely looks like what I am seeing in your signal.)
Note:
The Kalman filter is way overkill for doing a simple filtering of noisy data. A Kalman filter is used when you want to calculate a value (a position, to follow your example) based on some noisy measurement, but, to refine the calculation, the Kalman filter also uses a prediction of the position based on the previous state (position) and on inertial data (how fast you were rotating, for example). For that prediction you need a "model" of the behavior of your system, which you do not seem to have.
In your case, you would need to calculate the acceleration seen by the accelerometer based on the (known or theoretical) rotation speed of the shaft at any point in time, the distance of the accelerometer from the center of rotation, and, probably, to make it more precise, a dynamic model of the main vibration modes of your rod. Then, for each step, you would compare that prediction to the actual measurement... which seems a bit heavy for your case.
Look at the quick figure explaining the Kalman filter process in this wikipedia entry : Kalman filter, and read on if you want to understand it more.
I would also propose a low-pass filter, but an ordinary first-order inertial model instead of a Kalman filter. I designed a filter with a passband up to 10 Hz (~0.1 of your sampling frequency). The discrete model has the following equation:
y[k] = 0.9418*y[k-1] + 0.05824*u[k-1]
where u is your measured vector, and y is the vector after filtering. This equation starts at sample number 1, so you can just assign 0 to sample number 0.
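Assuming your measurements are in a vector u, the same recursion can be applied with MATLAB's built-in filter function (note the leading zero among the numerator coefficients, because the input enters with a one-sample delay):

y = filter( [0 0.05824] , [1 -0.9418] , u ) ;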
I am detecting objects from a live camera using feature detection with an SVM, and it reads every frame from the camera while predicting, which affects the speed. I just want it to select the frames which contain the object and ignore the frames which have no object, like an empty street or parked cars; it should only detect the moving object.
For example, if the object comes into the camera view in the 6th frame, it stays in view for many frames until it leaves the camera's range, so the program should not recount the same object and should ignore those frames.
Explanation:
I am detecting vehicles in a video, and I want to ignore the empty frames. But how do I ignore them? I only want to check the frames which contain an object like a vehicle. But if a vehicle passes through the video, it takes, let's assume, approximately 5 seconds; that means the same object takes 10 frames, so the program counts it as 10 vehicles, one from each frame. I want to count it as 1, because it is the one (SAME) vehicle that uses those 10 frames.
My video is already in background-subtracted form.
I have explored two techniques:
1. Entropy (frame subtraction)
2. Keyframe extraction
This question is confusingly worded. What output do you want from this analysis? Here are the stages I see:
1) I assume each frame gives you an (x,y) position, or null, for each object in the frame. Can you do this?
2) If you might get multiple objects in a frame, you have to match them with the objects in the previous frame. If this is not a concern, skip to (3). Otherwise, assign an index to each object in the first frame it appears in. In subsequent frames, match each object to an index from the previous frame based on (x,y) distance. Clumsy, but it might be good enough.
3) Calculating velocity. Look at the difference in (x,y) between this frame and the last one. Of course, you can't do this for the first frame. Maybe apply a low-pass filter to the position to smooth out any jittery motion.
4) Missing objects. This is a hard one. If your question is how to treat empty frames with no object in them, then I feel like you just ignore them. But, if you want to track objects that go missing in the middle of a trajectory (like maybe a ball with motion blur) then that is harder. If this is what you're going for, you might want to do object matching by predicting the next position using position, velocity, and maybe even object characteristics (like a histogram of hues).
I hope this was helpful.
You need an object tracker (many examples of tracking code can be found on the web). Then what you are looking for is the number of tracks. That's your answer.
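As an illustration of the counting part, here is a minimal sketch (invented names; it assumes your detector yields one (x,y) centroid per detected vehicle per frame). Detections are matched to existing tracks by nearest distance; only a detection that matches no track starts - and counts - a new vehicle:

#include <cmath>
#include <utility>
#include <vector>

struct Track { int id; double x, y; };

// Greedy nearest-neighbour matching of this frame's detections to the
// existing tracks; a detection too far from every track opens a new track.
void matchDetections(std::vector<Track>& tracks,
                     const std::vector<std::pair<double, double> >& detections,
                     int& nextId, double maxDist) {
  for (size_t d = 0; d < detections.size(); ++d) {
    const double px = detections[d].first, py = detections[d].second;
    Track* best = NULL;
    double bestDist = maxDist;
    for (size_t t = 0; t < tracks.size(); ++t) {
      const double dx = tracks[t].x - px, dy = tracks[t].y - py;
      const double dist = std::sqrt(dx * dx + dy * dy);
      if (dist < bestDist) { bestDist = dist; best = &tracks[t]; }
    }
    if (best != NULL) {  // same vehicle seen again: just update its position
      best->x = px; best->y = py;
    } else {             // unmatched detection: a new vehicle
      Track fresh; fresh.id = nextId++; fresh.x = px; fresh.y = py;
      tracks.push_back(fresh);
    }
  }
}

After the whole video has been processed, nextId is the vehicle count: each vehicle was counted exactly once, when its track was created, no matter how many frames it appeared in.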
I'm writing an application that averages/combines/stacks a series of exposures. This is commonly used to reduce noise in the resultant image.
However, it seems that to optimize the average/stack, the exposures are usually normalized first. It seems that this process assigns a weight to each of the exposures and then proceeds to combine them. I am guessing that the process computes the overall intensity of each image, since the purpose is to match the intensities of all the images in the stack.
My question is: how can I incorporate an algorithm that will allow me to normalize a series of images? I guess the question can be generalized by instead asking "How can I normalize a series of readings?"
An outline in my head appears as follows (a rough code sketch follows the list):
Compute the average of a reference image.
Divide the average of each frame by the average of the reference frame.
The result of each division is the weight for each frame.
Scale/Multiply each pixel in a frame by the weight found for that particular frame.
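In code, I imagine something like this (all names made up; I am also not sure whether the weight should be this ratio or its reciprocal, i.e. frame over reference):

#include <vector>

typedef std::vector<double> Frame; // one intensity value per pixel

double frameMean(const Frame& f) {
  double sum = 0.0;
  for (size_t i = 0; i < f.size(); ++i) sum += f[i];
  return sum / f.size();
}

// Scale every frame so that its mean intensity matches the reference mean.
void normalizeToReference(std::vector<Frame>& frames, const Frame& reference) {
  const double refMean = frameMean(reference);
  for (size_t k = 0; k < frames.size(); ++k) {
    const double weight = refMean / frameMean(frames[k]); // or the reciprocal?
    for (size_t i = 0; i < frames[k].size(); ++i)
      frames[k][i] *= weight;
  }
}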
Does this seem to make sense to anyone? I have tried to Google for the past hour but didn't find anything. I also looked at the indexes of various image processing books on Amazon, but that didn't turn up anything either.
Each integration consists of signal and assorted noise - some is time-independent (e.g. bias or CCD readout noise), some is time-dependent (e.g. dark current), and some is random (shot noise). The aim is to remove the noise and leave the signal. So you would first subtract the 'fixed' sources using dark frames (which will include dark current, readout noise, and bias), leaving signal plus shot noise. Signal scales as flux times exposure time, and shot noise as the square root of the signal:
http://en.wikipedia.org/wiki/Shot_noise
so overall your signal/noise scales as the square root of the integration time (assuming your integrations are short enough that they are not saturated). So by adding frames you are simply increasing the exposure time, and hence the signal/noise ratio - for example, stacking four equal frames doubles the signal-to-noise. You don't need to normalize first.
To complicate matters, transient non-Gaussian noise is also present (e.g. cosmic ray hits). There are many techniques for dealing with these, but a common one is 'sigma clipping', where you make an extra pass to calculate the mean and standard deviation of each pixel, and then reject outliers that are many standard deviations from the mean. Real signal will show Gaussian fluctuations around the mean value, whereas transients will show a large deviation in one frame of the stack. Maybe that's what you are thinking of?
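As a sketch of the idea for a single pixel position (my own illustration, not from any particular stacking package):

#include <cmath>
#include <vector>

// Sigma-clipped mean of the values one pixel position takes across the
// stack: estimate the mean and standard deviation, reject outliers beyond
// kappa standard deviations, and average what remains.
double clippedMean(const std::vector<double>& samples, double kappa) {
  double sum = 0.0, sumSq = 0.0;
  for (size_t i = 0; i < samples.size(); ++i) {
    sum += samples[i];
    sumSq += samples[i] * samples[i];
  }
  const double n = static_cast<double>(samples.size());
  const double mu = sum / n;
  const double sigma = std::sqrt(sumSq / n - mu * mu);
  double kept = 0.0;
  int count = 0;
  for (size_t i = 0; i < samples.size(); ++i) {
    if (std::fabs(samples[i] - mu) <= kappa * sigma) {
      kept += samples[i];
      ++count;
    }
  }
  return count > 0 ? kept / count : mu;
}

Typical values for kappa are around 2 to 3; a cosmic-ray hit in one frame then simply drops out of that pixel's average.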