Noisy disparity map in stereovision pipeline - FPGA

I'm trying to implement the Xilinx xfOpenCV stereovision pipeline explained at the bottom of the page here in Vivado HLS, as a standalone IP core (instead of using the accelerated flow). The stereo pipeline functions are based on, and very similar to, the OpenCV ones.
So first I collected a couple of images from an already calibrated stereo camera and simulated the xfOpenCV functions as a standalone HW IP to make sure I got the expected result. After simulation, the result is not perfect and has quite a lot of noise, for instance:
I then went ahead and synthesized and implemented the IP in hardware (FPGA) to test it with a live stereo-camera stream. I have the calibration parameters, and they are used to correct the live stream frames before the 'stereo' function. (The calibration parameters were also tested previously in simulation.)
All works fine in terms of video flow, memory, etc., but what I get is a quite high level of noise (as expected from the simulation). This is a screenshot example of the live camera:
Any idea why this 'flickering' noise is generated? Is it caused by noise in the original images? What would be the best approach, or the next steps, to get rid of it (or smooth it out)?
Thanks in advance.
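For reference, a common post-processing approach for this kind of speckle noise in disparity maps is a small median filter over the disparity image. Below is a minimal pure-Python sketch of the idea; in a real pipeline this would be cv2.medianBlur, a speckle filter, or an equivalent HLS filter stage, and the function name here is just illustrative.

```python
def median_filter_3x3(disp):
    """Return a median-filtered copy of a 2D disparity map (list of lists).

    Isolated outlier pixels ("speckle") are replaced by the median of
    their 3x3 neighbourhood; border pixels are left unchanged.
    """
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [disp[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            window.sort()
            out[y][x] = window[4]  # median of the 9 samples
    return out
```

A single noisy spike surrounded by consistent disparities is removed entirely, while large consistent regions pass through unchanged, which is why this is a standard first step against disparity flicker.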

Related

Pass modified video stream into Vuforia Engine?

Is it possible to modify the Vuforia video stream for better tracking performance?
Step 1: Get the raw pixel data from the VuforiaBehaviour.Instance.CameraDevice.GetCameraImage();
Step 2: Modify the pixels with post-processing via custom shaders in Unity. For example, apply a threshold or edge detection.
Step 3: Vuforia Engine uses the modified video input to track images.
That's the idea, but I'm not sure whether Vuforia will then pass the modified video into the Vuforia Engine, or still use the unmodified video input for tracking.
If anybody has experience with that I would be thankful for your help! :)
Vuforia Engine assumes that the input images look like "natural" images. Passing an image belonging to a different domain (e.g., the result of an edge detector) is unlikely to improve tracking performance.
That said, tracking performance is affected by image quality. For example, if images are blurry, tracking robustness is going to suffer. If this is the case you might want to look at trying to adjust system camera parameters via the platform API (iOS, Android, etc.). However, please note that this might or might not be possible depending on the platform. Also, on some platforms when a device tracker like ARKit or ARCore is used, the platform tracker itself adjusts the camera parameters for good tracking performance. For example it might keep the exposure time low to reduce blur.

ESP32-CAM image noisy

I am using an ESP32-CAM and the CameraWebServer example from the standard Arduino IDE package.
It works fine, but the image I receive in a browser is noisy: colored lines appear randomly over the picture. Any idea what causes this and how to fix it?
There could be a number of reasons for this behaviour; it is possibly down to a cumulative set of issues, all of which affect the picture quality.
Power supply quality. ESP32s draw a lot of current under certain conditions, and this can cause a brownout condition to occur. This could be down to your USB port not being able to supply enough current. Check the serial terminal for messages; if you see brownout error messages on your serial monitor, try a powered USB hub or a better-quality USB cable.
Power supply noise. If you have access to an oscilloscope, check the 3.3 V and 5 V rails for garbage. If excessive, try adding 2 × 1000 µF capacitors on each rail.
RF interference. The ribbon cable between the camera and the board is not shielded. Try lifting it away from the board, or even wrapping it in a thin layer of foil and some insulating tape, ensuring no shorts occur. If the ribbon cable is long, try a camera with a shorter cable.
Lighting. With fluorescent and LED lighting, some forms of illumination seem noisier than others. Try increasing the amount of natural daylight.
Interface settings. The defaults in the webserver example are not ideal for certain lighting conditions. Try disabling lens correction and manually adjusting the gain control, AE level, and exposure. Tweaking these settings will eliminate much of the background noise.
I found that all of these small improvements make a big difference to picture quality. In my scenario, low light and noisy power seemed to be the worst culprits, but YMMV. By implementing these I managed to improve the picture quality from unwatchable to reasonable.

Multiple Tangos Looking at one location - IR Conflict

I am getting my first Tango in the next day or so; I've worked a little bit with Occipital's Structure Sensor, which is where my background in depth-perceiving cameras comes from.
Has anyone used multiple Tangos at once (let's say 6-10), looking at the same part of a room, using depth for identification and placement of 3D characters/content? I have been told that multiple devices looking at the same part of a room will confuse each Tango, as they will see the other Tangos' IR dots.
Thanks for your input.
Grisly
I have not tried to use several Tangos, but I have tried to use my Tango in a room where I had a Kinect 2 sensor, which caused the Tango to go bananas. It seems, however, that the Tango has lower intensity on its IR projector in comparison, but I would still say it is a reasonable assumption that it will not work.
It might work under certain angles but I doubt that you will be able to find a configuration of that many cameras without any of them interfering with each other. If you would make it work however, I would be very interested to know how.
You could lower the depth camera rate (defaults to 5/second I believe) to avoid conflicts, but that might not be desirable given what you're using the system for.
Alternatively, only enable the depth camera when placing your 3D models on surfaces, then disable said depth camera when it is not needed. This can also help conserve CPU and battery power.
It did not work. The Occipital Structure Sensor, on the other hand, did work (multiple devices in one place)!

OpenGL Performance

First let me explain the application a little bit. This is video security software that can display up to 48 cameras at once. Each video stream gets its own Windows HDC but they all use a shared OpenGL context. I get pretty good performance with OpenGL and it runs on Windows/Linux/Mac. Under the hood the contexts are created using wxWidgets 2.8 wxGLCanvas, but I don't think that has anything to do with the issue.
Now here's the issue. Say I take the same camera and display it in all 48 of my windows. This basically means I'm only decoding 30fps (which is done on a different thread anyway) but displaying up to 1440fps, to take decoding out of the picture. I'm using PBOs to transfer the images over; depending on whether pixel shaders and multitexturing are supported, I may use those to do YUV->RGB conversion on the GPU. Then I use a quad to position the texture and call SwapBuffers. All the OpenGL calls come from the UI thread. I've also tried doing the YUV->RGB conversion on the CPU, and messed with using GL_RGBA and GL_BGRA textures, but all formats still yield roughly the same performance. Now the problem is that I'm only getting around 1000fps out of the possible 1440fps (I know I shouldn't be measuring in fps, but it's easier in this scenario). The above scenario is using 320x240 (YUV420) video, which is roughly only 110MB/sec. If I use a 1280x720 camera then I get roughly the same framerate, which is nearly 1.3GB/sec. This tells me that it certainly isn't the texture upload speed. If I do the YUV->RGB conversion and scaling on the CPU and paint using a Windows DC, then I can easily get the full 1440fps.
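As an aside, the YUV->RGB step mentioned above is just a fixed linear transform per pixel, whether it runs in a pixel shader or on the CPU. A stdlib Python sketch using the common BT.601 video-range coefficients (the exact matrix is an assumption; it depends on the camera's colour space):

```python
def yuv_to_rgb(y, u, v):
    """Convert one BT.601 video-range YUV sample to 8-bit RGB.

    These are the same multiply-adds a fragment shader would perform
    per pixel after sampling the Y, U and V textures.
    """
    c, d, e = y - 16, u - 128, v - 128
    clamp = lambda val: max(0, min(255, int(round(val))))
    r = clamp(1.164 * c + 1.596 * e)
    g = clamp(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp(1.164 * c + 2.017 * d)
    return r, g, b
```

Video-range black (16, 128, 128) maps to (0, 0, 0) and white (235, 128, 128) to (255, 255, 255); this per-pixel cost is tiny on a GPU, which is consistent with the observation above that texture upload is not the bottleneck.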
The other thing to mention is that I've disabled vsync both on my video card and through OpenGL using wglSwapIntervalEXT. Also, there are no OpenGL errors being reported. However, using Very Sleepy to profile the application, it seems to be spending most of its time in SwapBuffers. I'm assuming the issue is somehow related to my use of multiple HDCs or to SwapBuffers somewhere; however, I'm not sure how else to do what I'm doing.
I'm no expert on OpenGL so if anyone has any suggestions or anything I would love to hear them. If there is anything that I'm doing that sounds wrong or any way I could achieve the same thing more efficiently I'd love to hear it.
Here are some links to glIntercept logs for a better understanding of all the OpenGL calls being made:
Simple RGB: https://docs.google.com/open?id=0BzGMib6CGH4TdUdlcTBYMHNTRnM
Shaders YUV: https://docs.google.com/open?id=0BzGMib6CGH4TSDJTZGxDanBwS2M
Profiling Information:
So after profiling it reported several redundant state changes which I'm not surprised by. I eliminated all of them and saw no noticeable performance difference which I kind of expected. I have 34 state changes per render loop and I am using several deprecated functions. I'll look into using vertex arrays which would solve these. However, I'm just doing one quad per render loop so I don't really expect much performance impact from this. Also keep in mind I don't want to rip everything out and go all VBOs because I still need to support some fairly old Intel chipset drivers that I believe are only OpenGL 1.4.
The thing that really interested me and it hadn't occurred to me before was that each context has its own front and back buffer. Since I'm only using one context the previous HDCs render call must finish writing to the back buffer before the swap can occur and then the next one can start writing to the back buffer again. Would it really be more efficient to use more than one context? Or should I look into rendering to textures (FBOs I think) instead and continue using one context?
EDIT: The original description mentioned using multiple OpenGL contexts, but I was wrong I'm only using one OpenGL context and multiple HDCs.
EDIT2: Added some information after profiling with gDEBugger.
Here is what I would try to make your application faster. I would make one OpenGL render thread (or more if you have 2 or more video cards). A video card cannot process several contexts at the same time; with multiple OpenGL contexts, each context ends up waiting for another. This thread would do only OpenGL work, such as YUV->RGB conversion (using an FBO to render to texture). The camera threads send images to this thread, and the UI thread picks them up to show in the window.
You can queue work for the OpenGL context, and you can combine several frames into one texture to convert them in a single pass. That may be useful, because you have up to 48 cameras. As another variant, if the OpenGL thread is busy, you can convert some frames on the CPU.
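The "combine several frames into one texture" idea is a texture atlas: upload many camera frames into one large texture and run the conversion pass once over all of them. A stdlib Python sketch of the layout arithmetic (the near-square grid packing is an assumption, not something specified in the answer):

```python
import math

def atlas_layout(n_frames, frame_w, frame_h):
    """Pixel offsets for packing n same-sized frames into a grid atlas.

    Returns (atlas_w, atlas_h, offsets) where offsets[i] is the (x, y)
    of frame i's top-left corner inside the atlas texture.
    """
    cols = math.ceil(math.sqrt(n_frames))       # near-square grid
    rows = math.ceil(n_frames / cols)
    offsets = [((i % cols) * frame_w, (i // cols) * frame_h)
               for i in range(n_frames)]
    return cols * frame_w, rows * frame_h, offsets
```

For 48 cameras at 320x240 this gives a 7x7 grid in a 2240x1680 atlas: one glTexSubImage2D per frame into its slot, then one shader pass instead of 48.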
From the log I see you often call the same methods:
glEnable(GL_TEXTURE_2D)
glMatrixMode(GL_TEXTURE)
glLoadIdentity()
glColor4f(1.000000,1.000000,1.000000,1.000000)
You could call these once per context instead of calling them on every render.
If I understand correctly, you use three textures, one for each YUV plane:
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,352,240,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
Try using a single texture and computing in the shader which YUV values to fetch for each pixel. It is possible; I did it in my application.
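This works because planar YUV420 is one contiguous buffer with fixed plane offsets, so the shader only needs a little index arithmetic to fetch Y, U and V from a single texture. A stdlib Python sketch of that arithmetic (the [Y plane][U plane][V plane] layout is an assumption):

```python
def yuv420_offsets(x, y, w, h):
    """Offsets of the Y, U and V samples for pixel (x, y) in a planar
    YUV420 buffer laid out as [Y plane][U plane][V plane], e.g. the
    352x240 frames seen in the glTexSubImage2D calls above.
    """
    y_off = y * w + x                               # luma: full resolution
    u_off = w * h + (y // 2) * (w // 2) + (x // 2)  # U: half res, after Y
    v_off = (w * h + (w // 2) * (h // 2)            # V: half res, after U
             + (y // 2) * (w // 2) + (x // 2))
    return y_off, u_off, v_off
```

A fragment shader would do the same computation in texture coordinates rather than byte offsets, then apply the YUV->RGB matrix to the three fetched samples.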

Real-time video(image) stitching

I'm thinking of stitching images from 2 or more (currently maybe 3 or 4) cameras in real-time, using OpenCV 2.3.1 on Visual Studio 2008.
However, I'm curious about how it is done.
Recently I've studied some techniques of feature-based image stitching method.
Most of them require at least the following steps:
1. Feature detection
2. Feature matching
3. Finding a homography
4. Transformation of target images to the reference image
...etc
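Steps 3 and 4 above come down to mapping pixels of the target image through the 3x3 homography estimated from the matches (in OpenCV, cv::findHomography and cv::warpPerspective). Per point, including the perspective divide, the math is just:

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H (nested lists),
    including the perspective divide by the homogeneous coordinate."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    wh = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / wh, yh / wh
```

For a pure translation (identity rotation, last row [0, 0, 1]) this reduces to shifting the point; the bottom row is what introduces perspective distortion for tilted cameras.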
Now most of the techniques I've read only deal with images "ONCE", while I would like it to deal with a series of images captured from a few cameras and I want it to be "REAL-TIME".
So far it may still sound confusing, so here is the detail:
Put 3 cameras at different angles and positions; each of them must have overlapping areas with its adjacent one, so as to build REAL-TIME video stitching.
What I would like to do is similar to the content in the following link, where ASIFT is used.
http://www.youtube.com/watch?v=a5OK6bwke3I
I tried to consult the owner of that video but I got no reply from him:(.
Can I use image-stitching methods to deal with video stitching?
Video itself is composed of a series of images so I wonder if this is possible.
However, detecting feature points seems to be very time-consuming whatever feature detector (SURF, SIFT, ASIFT, etc.) you use. This makes me doubt the possibility of doing real-time video stitching.
I have worked on a real-time video stitching system and it is a difficult problem. I can't disclose the full solution we used due to an NDA, but I implemented something similar to the one described in this paper. The biggest problem is coping with objects at different depths (simple homographies are not sufficient); depth disparities must be determined and the video frames appropriately warped so that common features are aligned. This essentially is a stereo vision problem. The images must first be rectified so that common features appear on the same scan line.
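The stereo relationship mentioned above is the standard one: once the images are rectified, the depth of a matched feature follows directly from its disparity as Z = f * B / d. A minimal sketch (the symbol names are mine):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a matched feature in a rectified stereo pair:
    Z = f * B / d, with focal length f in pixels, baseline B in metres,
    and disparity d in pixels."""
    if disparity_px <= 0:
        return float('inf')  # zero disparity: point effectively at infinity
    return focal_px * baseline_m / disparity_px
```

The inverse relationship is why objects at different depths break a single-homography stitch: nearby features (large disparity) shift between views far more than distant ones, so the frames must be warped depth-dependently rather than with one global transform.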
You might also be interested in my project from a few years back. It's a program which lets you experiment with different stitching parameters and watch the results in real-time.
Project page - https://github.com/lukeyeager/StitcHD
Demo video - https://youtu.be/mMcrOpVx9aY?t=3m38s
