Pass modified video stream into Vuforia Engine? - performance

Is it possible to modify the Vuforia video stream for better tracking performance?
Step 1: Get the raw pixel data from VuforiaBehaviour.Instance.CameraDevice.GetCameraImage().
Step 2: Modify the pixels with post-processing via custom shaders in Unity, for example by applying a threshold or edge detection.
Step 3: Vuforia Engine uses the modified video input to track images.
That's the idea, but I'm not sure whether Vuforia will pass the modified video into Vuforia Engine, or still use the unmodified video input for tracking.
If anybody has experience with that, I would be thankful for your help! :)

Vuforia Engine assumes that the input images look like "natural" images. Passing an image belonging to a different domain (e.g., the result of an edge detector) is unlikely to improve tracking performance.
That said, tracking performance is affected by image quality. For example, if images are blurry, tracking robustness is going to suffer. If this is the case, you may want to try adjusting the system camera parameters via the platform APIs (iOS, Android, etc.). Note, however, that this may or may not be possible depending on the platform. Also, on some platforms, when a device tracker like ARKit or ARCore is used, the platform tracker itself adjusts the camera parameters for good tracking performance; for example, it might keep the exposure time low to reduce blur.

Related

How to design a mission to visit several locations and in each location process some computer vision tasks on the mobile platform?

I need to make my drone (Mavic 2 Pro) visit approximately 10 locations at a relatively low altitude (1.7 m). At each location the camera should look in the right direction, and the mission should pause to let the mobile application process some CV tasks. I am not sure what the best approach is for a mission that is partially processed on the mobile platform. What should I use in the DJI Mobile SDK API to pause the mission when a location is reached?
I am going to use a timeline mission composed of a sequence of GoToActions. I wonder if this is a good way to do it. Is there a better solution?
Is MissionControl.Listener the right place to interrupt a mission when a TimelineElement finishes, or should I use WaypointReachedTrigger?
I wasn't able to find any suitable example.
Please add a specific programming question; otherwise, any answer is primarily opinion-based. See https://stackoverflow.com/help/dont-ask for details.
The DJI Mobile SDK lets you control the drone's gimbal and GPS navigation through MissionAction. GotoAction is a subclass of MissionAction, and it only flies to a GPS location, so you need other mission actions, such as GimbalAttitudeAction and a camera-capture action, to point the camera and capture images. See the Fig below.
For the CV tasks, it is easy to link a DJI app to OpenCV, but I strongly recommend against it, as tasks such as detection with a CNN take too many resources. The popular approach is to upload the images from a local buffer to a local server with a GPU for processing in near real time. See the Fig below; I'm using the Windows SDK with Windows online OCR for detection (video at https://youtu.be/CcndnHkriyA). I tried a local, phone-based approach, but the result was limited by the model's accuracy, and I could not use a higher-accuracy model because its processing demands are too high. You can see my demo in the Fig below.
What you want is fairly easy to implement but hard to perfect. Flying at a low altitude (1.7 m) requires some degree of obstacle avoidance and GPS-less path planning. What is implemented in the Mavic hardware is only simple avoidance or slip-through. For anything more complex, like going around a wall or through a maze-like environment, it is better to add your own global and local path planners. For feedback, you can use the SVO method to get odometry and build a local sparse obstacle map for inflated-radius calculation. See the Fig below.
Fig taken from video https://www.youtube.com/watch?v=2YnIMfw6bJY.
The feedback code is available at https://github.com/uzh-rpg/rpg_svo.
You can also try ETH's path-planning code at https://github.com/ethz-asl/mav_voxblox_planning.
Good luck with your work.

Project Tango Camera Specifications

I've been developing a virtual camera app for depth cameras and I'm extremely interested in the Tango project. I have several questions regarding the cameras on board. I can't seem to find these specs anywhere in the developer section or forums, so I understand completely if they can't be answered publicly. I thought I would ask regardless and see if the current device is suitable for my app.
Are the depth and color images from the rgb/ir camera captured simultaneously?
What frame rates is the RGB/IR camera capable of (e.g. 30, 25, 24 fps), and at what resolutions?
Does the motion tracking camera run in sync with the RGB/IR camera? If not, what frame rate (or refresh rate) does the motion tracking camera run at? Also, if they do not run on the same clock, does the API expose a relative or an absolute timestamp for both cameras?
What manual controls (if any) are exposed for the color camera? Frame rate, gain, exposure time, white balance?
If the color camera is fully automatic, does it automatically drop its frame rate in low light situations?
Thank you so much for your time!
Edit: I'm specifically referring to the new tablet.
Some guessing
No, the actual image used to generate the point cloud is not the droid you want. I put up a picture on Google+ that shows what you get when you grab one of the images containing the IR pattern used to calculate depth (an aside: it looks suspiciously like a Sierpinski curve to me).
Image frame rate is considerably higher than the point-cloud frame rate, but it seems variable, probably a function of the load that Tango imposes.
Motion tracking, i.e. pose, is captured at roughly 3x the point-cloud rate.
Timestamps are done with a fascinating double-precision number; in prior releases there were definitely artifacts/data in the LSBs of the double. I do a getPoseAtTime (callbacks are used for ADF localization) when I pick up a cloud, so supposedly I've got a pose aligned with the cloud. Images have very low timestamp correspondence with the pose and cloud data. It's very important to note that all three Tango streams (pose, image, cloud) return timestamps.
Don't know about camera controls yet; I'm still wedging OpenCV into the cloud services :-) Low light will be interesting: anecdotal data indicates that Tango has a wider visual spectrum than we do, which makes me wonder whether fiddling with the camera at the point of capture to change image quality, e.g. dropping the frame rate, might cause Tango problems.

Tango color frames are not of the best quality; they look up-scaled from a lower resolution. Is it a Tango hardware limitation?

I obtain color frames from TANGO_CAMERA_COLOR. The frames are not of the best quality; they look like they were up-scaled from a lower resolution.
This can easily be seen by comparing the video quality of the standard Android Camera app and the "Project Tango Native Augmented Reality" sample app running on the same device.
Questions: is this intended? If so, why?
Is there a way to improve the quality, or is there a plan to improve it in future Tango releases?
I set config_color_iso to 400, with the default exposure time.
Each depth frame has a corresponding color frame with exactly the same timestamp. Infrared-illumination artefacts are visible in just a very few color frames.
You may want to stick with the images coming out of Tango
1) If you snag another camera, or grab the camera directly, then Tango depth information stops coming.
2) More importantly, to my eyes: it is the images from Tango that are the source of the point cloud. Anything you want to do with coloring cloud points and surfaces, with any hope of success, would do better with these images.
3) Trying to offload the image stream in real time requires JPEG compression if you're going straight to the cloud; raw images from Tango are 1280x720, so they weigh in at about a megabyte each before compression.

animated gif vs video vs canvas - for speed & file size

Assuming a simple product demo e.g. the one found on http://www.sublimetext.com/
i.e. something that isn't traditional high-res video and could reasonably be accomplished with:
animated gif
video (can be embedded youtube, custom html5 player, whatever is most competitive)
canvas
The question is, which performs better for the user? Both in terms of:
The size of the files the user must download to view the 'product demo'
The requirements in terms of processing power to display the 'product demo'
If you feel that there's a superior technology to accomplish this or another metric to judge its usefulness, let me know and I'll adjust accordingly.
I know it's already answered, but since you specifically referred to the Sublime Text animation, I assume you want to create something similar?
If that's the case then here is a post explaining how it was created by the Sublime Text author, himself:
http://www.sublimetext.com/~jps/animated_gifs_the_hard_way.html
The interesting part of the article is how he reduces the file size - which I believe is your question.
With a simple animation such as the one at the link you're referring to, with a very low frame rate, a simple animated PNG or animated GIF will probably be the best solution.
However, you need to consider the bandwidth factor. If the final size of the GIF or PNG is large, then a buffered video is probably better.
This is because the whole GIF/PNG file needs to be downloaded before it is shown (I am not sure how interlaced PNGs work when they contain animation, though).
A video may be larger in file size, but as it is typically buffered you will be able to show the animation almost right away.
Using external hosts such as YouTube can also be beneficial to your site, as the bandwidth is drawn from those sites and not from your server (in case you use a provider that limits or charges for this in various ways).
For more information on animated PNGs or APNG (as this is not so well-known):
https://en.wikipedia.org/wiki/APNG
The canvas here is only a display device and not really necessary (an image container does the same job and can also animate the GIF/PNG, whereas a canvas cannot).
If you use a lot of vectors then canvas can be considered.
CSS3 animation is also an option for things such as presentation slides.
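As an aside, an animated GIF like that demo can also be assembled programmatically. A minimal sketch with Pillow, using synthetic frames and made-up frame timings:

```python
import io
from PIL import Image, ImageDraw

# Build a few synthetic frames: a dot sliding across a small canvas
frames = []
for x in (10, 40, 70, 100):
    im = Image.new("L", (160, 90), color=0)
    ImageDraw.Draw(im).ellipse((x, 35, x + 20, 55), fill=255)
    frames.append(im)

# Save all frames into one animated GIF (200 ms per frame, infinite loop)
buf = io.BytesIO()
frames[0].save(buf, format="GIF", save_all=True,
               append_images=frames[1:], duration=200, loop=0)
gif_bytes = buf.getvalue()
print(len(gif_bytes), "bytes")
```

GIF stores the frames with palette compression, which is why low-frame-rate, mostly-static demos like the Sublime one stay small.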

Real-time video(image) stitching

I'm thinking of stitching images from 2 or more(currently maybe 3 or 4) cameras in real-time using OpenCV 2.3.1 on Visual Studio 2008.
However, I'm curious about how it is done.
Recently I've studied some techniques of feature-based image stitching method.
Most of them require at least the following steps:
1. Feature detection
2. Feature matching
3. Finding the homography
4. Transforming the target images to the reference image
...etc.
Now, most of the techniques I've read about only deal with images ONCE, while I would like to handle a series of images captured from a few cameras, in REAL TIME.
This may still sound confusing, so here are the details:
Put 3 cameras at different angles and positions; each of them must have an overlapping area with its adjacent one, so as to build REAL-TIME video stitching.
What I would like to do is similar to the content in the following link, where ASIFT is used.
http://www.youtube.com/watch?v=a5OK6bwke3I
I tried to contact the owner of that video, but I got no reply from him. :(
Can I use image-stitching methods to deal with video stitching?
Video itself is composed of a series of images, so I wonder if this is possible.
However, detecting feature points seems to be very time-consuming whatever feature detector (SURF, SIFT, ASIFT, etc.) you use. This makes me doubt the feasibility of real-time video stitching.
I have worked on a real-time video stitching system and it is a difficult problem. I can't disclose the full solution we used due to an NDA, but I implemented something similar to the one described in this paper. The biggest problem is coping with objects at different depths (simple homographies are not sufficient); depth disparities must be determined and the video frames appropriately warped so that common features are aligned. This essentially is a stereo vision problem. The images must first be rectified so that common features appear on the same scan line.
You might also be interested in my project from a few years back. It's a program which lets you experiment with different stitching parameters and watch the results in real-time.
Project page - https://github.com/lukeyeager/StitcHD
Demo video - https://youtu.be/mMcrOpVx9aY?t=3m38s
