How to efficiently display a large number of moving points - performance

I have a large array of points, which updates dynamically. For the most part, only certain (relatively small) parts of the array get updated. The goal of my program is to build and display a picture using these points.
If I built the picture directly from the points, it would be 8192 x 8192 pixels in size. I believe an optimization would be to reduce the array in size. My application has two screen areas (one is a magnified/zoomed-in view of the other). Additionally, I will need to pan the picture in either of the screen areas.
My approach for optimization is as follows.
Take the source array of points and reduce it by a scaling factor for the first screen area
Do the same for the second area, but with a larger scaling factor
Render these two reduced arrays into two FBOs
Use the FBOs as textures (to provide the ability to pan the picture)
When updating the picture, re-render only the changed area.
Suggest ways to speed this up, as my current implementation runs extremely slowly.

You will hardly be able to optimize this a lot if you don't have the hardware to run it at an adequate rate. Even if you render in different threads to FBOs and then compose the result, your bottleneck is likely to remain. 67 million data points is nothing to sneeze at, even for modern GPUs.
Try not to update unnecessarily: update only what changes, render only what is updated and visible, and try to minimize the size of your vertex components, e.g. use a shorter data type if possible.
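As a rough illustration of the "update only what changed" advice, here is a minimal C++/OpenGL sketch (not from the question; the FBO handle, its dimensions and drawChangedPoints() are placeholders) that re-renders just a dirty sub-rectangle of an FBO using the scissor test:

```cpp
#include <GL/glew.h>   // or any other GL loader; an active context is assumed

// Sketch: re-render only a dirty rectangle of an offscreen FBO.
// 'fbo', its dimensions and drawChangedPoints() are placeholders.
void updateDirtyRegion(GLuint fbo, int fboWidth, int fboHeight,
                       int x, int y, int w, int h)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, fboWidth, fboHeight);   // keep the full-texture coordinate system

    glEnable(GL_SCISSOR_TEST);
    glScissor(x, y, w, h);                   // clears and draws are clipped to the dirty rect
    glClear(GL_COLOR_BUFFER_BIT);

    drawChangedPoints(x, y, w, h);           // draw only the points inside that rect

    glDisable(GL_SCISSOR_TEST);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);    // the FBO texture is then panned as before
}
```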

Related

Fast subrects from layered image

I have this 2D raster upon which are layered from 1 to, say, 20 other 2D rasters (with arbitrary size and offset). I'm searching for a fast way to access a sub-rectangle view (with arbitrary size and offset). The view should return all the layered pixels for each X and Y coordinate.
I guess this is similar to how, say, GIMP or other 2D paint apps draw layers on top of each other, with the exception that I want all the pixels stacked upon each other, not just the projection where the top pixel hides the ones below it.
I have met this problem before and still face it now. I have already spent a lot of time searching the internet and this site for similar issues, but can't find any. I will describe two possible solutions, neither of which I'm satisfied with:
Have basically a 3D array of pre-allocated size. This is easy to manage, but the wasted storage and memory overhead are really big. For a 4K raster with, say, 16 slots of 4 bytes each, that is about 1 GiB of memory, and in my application most of that space will be wasted, not used.
My earlier solution: have two 2D arrays, one with indices, the other with the actual values. Each "pixel" of the first one says in which range of the second array you can find the actual pixels contributed by all layers. This compresses well, but every request bounces between two memory regions, and it is a bit of a hassle to set up, not to mention to update (a nice-to-have feature, but not mandatory).
So... any know-how on this kind of problem? Thank you in advance!
Forgot to add that I'm targeting a self-sufficient, preferably single-threaded, CPU solution. The layers will most likely be greyscale with alpha (that is, some pixel data will simply not exist). The lookup operation is the priority; updates such as adding/removing a layer can be slower.
Added by Mark (see comment):
In that image, if we take the top-left corner of the red rectangle, a lookup should report red, green, blue and black. If the bottom-right corner is taken, it should report only red and black.
I would store the offsets and sizes in a data structure separate from the pixel data. This way you do not jump around in memory while you calculate the relative coordinates for each layer (or decide whether you can skip some layers entirely).
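A minimal sketch of that idea (the Layer struct and its field names are illustrative assumptions, not from the question): layer placement and size live together, and a lookup only touches the pixel buffers of layers that actually cover the coordinate.

```cpp
#include <cstdint>
#include <vector>

// Sketch: per-layer metadata kept apart from the pixel storage.
struct Layer {
    int offsetX, offsetY;          // placement of this layer on the base raster
    int width, height;             // layer size
    std::vector<uint8_t> pixels;   // greyscale values, width*height entries
};

// Collect the values of every layer that covers base-raster coordinate (x, y).
std::vector<uint8_t> lookup(const std::vector<Layer>& layers, int x, int y)
{
    std::vector<uint8_t> hits;
    for (const Layer& l : layers) {
        int lx = x - l.offsetX;
        int ly = y - l.offsetY;
        if (lx >= 0 && ly >= 0 && lx < l.width && ly < l.height)
            hits.push_back(l.pixels[ly * l.width + lx]);
    }
    return hits;
}
```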
If you want to access single pixels or small areas rather than iterate over big areas, a quad-tree might be a good way to store your data: it gives more local memory access when reading pixels or areas that are near each other (in the x or y direction).

Optimized GPU to CPU data transfer

I'm a bit out of my depth here (the best way to be, methinks), but I am poking around looking for an optimization that could reduce GPU-to-CPU data transfer for my application.
I have an application that performs some modifications to vertex data in the GPU. Occasionally the CPU has to read back parts of the modified vertex data and then compute some parameters which then get passed back into the GPU shader via uniforms, forming a loop.
It takes too long to transfer all the vertex data back to the CPU and then sift through it on the CPU (millions of points), so I have a "hack" in place that reduces the workload to something usable, although not optimal.
What I do:
CPU: read image
CPU: generate 1 vertex per pixel, Z based on colour information/filter etc
CPU: transfer all vertex data to GPU
GPU: transform feedback used to update GL_POINT vertex coords in realtime based on some uniform parameters set from the CPU.
When I wish to read only a rectangular "section", I use glMapBufferRange to map the entire rows that comprise the desired rect (bad diagram alert):
This is supposed to represent the image/set of vertices in the GPU. My "hack" involves having to read all the blue and red vertices. This is because I can only specify one contiguous range of data to read back.
Does anyone know a clever way to efficiently get at the red, without the blue? (without having to issue a series of glMapBufferRange calls)
EDIT-
The use case is that I render the image into a 3D world as GL_POINTS, coloured and offset in Z by an amount based on the colour info (and sized etc. according to distance). The user can then modify the vertex Z data with a mouse-cursor brush. The logic behind some of the brush-application code needs to know the Z values of the area under the mouse (the brush circle), e.g. min/max/average, so that the CPU can control the shader's modification of the data by setting a series of uniforms that feed into the shader. So, for example, the user can say: I want all points under the cursor set to the average value. This could all probably be done entirely on the GPU, but the idea is that once I get the CPU-GPU "loop" optimised as far as I reasonably can, I can then expand the min/max/avg stuff to do interesting things on the CPU that would (probably) be cumbersome to do entirely on the GPU.
Cheers!
Laythe
To get any data from the GPU to the CPU you need to map the GPU memory in any case, which means the OpenGL implementation will have to use something like mmap under the hood. I have checked the implementation of that for both x86 and ARM, and it looks like it is page-aligned, so you cannot map less than one contiguous page of GPU memory at a time. So even if you could request to map just the red areas, you would quite likely get the blue ones as well (depending on your page and pixel-data sizes).
Solution 1
Just use glReadPixels, as it allows you to select a window of the framebuffer. I assume a GPU vendor like Intel would optimize the driver so that it maps as few pages as possible; however, this is not guaranteed, and in some cases you may need to map two pages just for two pixels.
Solution 2
Create a compute shader or use several glCopyBufferSubData calls to copy your region of interest into a contiguous buffer in GPU memory. If you know the height and width you want, you can then un-mangle and get a 2D buffer back on the CPU side.
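A hedged sketch of that second solution (the buffer handles, the one-vertex-per-pixel layout and the stride are assumptions): one GPU-side copy per row packs the rectangle into a small staging buffer, which is then mapped once.

```cpp
#include <GL/glew.h>   // or any other GL loader; an OpenGL 3.1+ context is assumed
#include <cstring>

// Sketch: gather a rectW x rectH rectangle of per-pixel vertices ('stride' bytes each)
// from 'srcVbo' into a compact 'stagingVbo', then map the staging buffer once.
void readRect(GLuint srcVbo, GLuint stagingVbo,
              int imageWidth, int rectX, int rectY, int rectW, int rectH,
              size_t stride, void* cpuDst)
{
    glBindBuffer(GL_COPY_READ_BUFFER, srcVbo);
    glBindBuffer(GL_COPY_WRITE_BUFFER, stagingVbo);

    for (int row = 0; row < rectH; ++row) {
        GLintptr srcOffset = ((GLintptr)(rectY + row) * imageWidth + rectX) * (GLintptr)stride;
        GLintptr dstOffset = (GLintptr)row * rectW * (GLintptr)stride;
        glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                            srcOffset, dstOffset, (GLsizeiptr)(rectW * stride));
    }

    // One map of the small, contiguous staging buffer instead of whole source rows.
    void* src = glMapBufferRange(GL_COPY_WRITE_BUFFER, 0,
                                 (GLsizeiptr)(rectW * rectH * stride), GL_MAP_READ_BIT);
    std::memcpy(cpuDst, src, rectW * rectH * stride);
    glUnmapBuffer(GL_COPY_WRITE_BUFFER);
}
```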
Which of the above solutions works better depends on your hardware and driver implementation. If GPU->CPU is the bottleneck and GPU->GPU is fast, then the second solution may work well, however you would have to experiment.
Solution 3
As suggested in the comments, do everything on the GPU. This heavily depends on how well the work parallelizes, but if copying the memory is too slow for you, then you don't have many other options.
I suppose you are asking because you cannot do all the work in shaders, right?
If you render to a framebuffer object and then bind it as GL_READ_FRAMEBUFFER, you can read a block of it with glReadPixels.
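A minimal sketch of that approach (the FBO handle, the attachment and the pixel format are placeholder assumptions):

```cpp
#include <GL/glew.h>   // or any other GL loader; an active context is assumed
#include <vector>

// Sketch: read back one sub-rectangle of an FBO's colour attachment.
std::vector<float> readBlock(GLuint fbo, int x, int y, int w, int h)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);              // assuming a single colour attachment
    std::vector<float> block((size_t)w * h * 4);     // assuming an RGBA float format
    glReadPixels(x, y, w, h, GL_RGBA, GL_FLOAT, block.data());
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    return block;
}
```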

Detecting weak blobs in a noisy image

I have an image which may contain some blobs. The blobs can be any size, and some will yield a very strong signal, while others are very weak. In this question I will focus on the weak ones because they are the difficult ones to detect.
Here is an example with 4 blobs.
The blob at (480, 180) is the most difficult one to detect. Running a Gaussian filter followed by an opening operation increases the contrast a bit, but not a lot:
The tricky part of this problem is that the natural noise in the background will result in (many) pixels which have a stronger signal than the blob I want to detect. What makes a blob a blob is that it is either a large area with an average increase in intensity, or a small area with a very strong increase in intensity (the latter is not relevant here).
How can I include this spatial information in order to detect my blob?
It is obvious that I first need to filter the image with a Gaussian and/or median filter in order to incorporate the nearby region of each pixel into each single pixel value. However, no amount of blurring is enough to make it easy to segment the blobs from the background.
EDIT: Regarding thresholding: thresholding is very tempting, but also problematic by itself. I do not have a region of "pure background", and the larger a blob is, the weaker its signal can be while still being detectable.
I should also note that the typical image will not have any blobs at all, but will just be pure background.
You could try an h-minima transform. It removes any minima shallower than h and raises the floor of the remaining troughs by h. It is defined as the morphological reconstruction by erosion of the image increased by the height h. Here are the results with h = 35:
It should be a lot easier to manipulate. Like segmentation by thresholding, it still needs an input parameter; the difference is that this is more robust. Underestimating h by a relatively large amount will only bring you back closer to the original problem image instead of failing completely.
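A minimal OpenCV sketch of the h-minima transform, assuming an 8-bit single-channel image; the function name, kernel and loop structure are illustrative, since OpenCV has no built-in morphological reconstruction:

```cpp
#include <opencv2/opencv.hpp>

// Sketch: h-minima transform via morphological reconstruction by erosion.
// Marker = image + h, mask = image; iterate erode-then-max until stable.
cv::Mat hMinima(const cv::Mat& image8u, int h)
{
    cv::Mat mask = image8u;
    cv::Mat marker;
    cv::add(image8u, cv::Scalar(h), marker);           // saturating add for CV_8U

    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::Mat prev;
    do {
        prev = marker.clone();
        cv::erode(marker, marker, kernel);             // geodesic erosion step
        cv::max(marker, mask, marker);                 // never go below the mask
    } while (cv::countNonZero(marker != prev) > 0);

    return marker;                                     // minima shallower than h are filled
}
```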
You could try to characterize the background noise to get an estimate for h, assuming that your application produces a relatively constant amount of it.
Note the one blue dot between the two large bottom blobs; some further processing is still needed, and you could try continuing with the morphology. Something I have found to work in some 'ink-blot' segmentation cases like this is to run through every connected component, calculate their convex hulls, and finally take the union of all the convex hulls in the image. It usually makes further morphological operations much easier and provides a good estimate for the label.
In my experience, if you can see your Gaussian filter size (those little circles), then your filter width is too small. Although terribly expensive, try bumping up the radius of your Gaussian; it should continue to improve your results until its radius matches the radius of the smallest object you are trying to find.
Following that (heavy Gaussian), I would do a peak search across the whole image. Cut out any peaks that are too low and/or have too little contrast with the nearest valley/background.
Don't be afraid to split this into two isolated processing pipelines: one filtration-and-extraction pass for low-contrast, spread-out blobs, and a completely different one to isolate high-contrast spikes (much, much easier to find). That being said, a high-contrast spike "should" survive even a pretty aggressive filter. Another thing to keep in mind is iterative subtraction: if there are some blobs that can be found easily from the get-go, pull them out of the image and then do a contrast stretch (but be careful, as you can make the image look like whatever you want with too much stretching).
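A hedged sketch of the heavy-blur-then-peak-search idea (OpenCV; the blur radius, the contrast threshold and the crude background estimate are assumptions): local maxima are found by comparing the blurred image against its dilation.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: heavy Gaussian blur, then keep local maxima that clear a contrast threshold.
std::vector<cv::Point> findPeaks(const cv::Mat& image8u, int blurRadius, double minContrast)
{
    cv::Mat blurred;
    int k = 2 * blurRadius + 1;                              // odd kernel size
    cv::GaussianBlur(image8u, blurred, cv::Size(k, k), 0);

    cv::Mat dilated;
    cv::dilate(blurred, dilated, cv::Mat());                 // 3x3 max filter
    cv::Mat localMax = (blurred == dilated);                 // pixel equals its local maximum

    double bg = cv::mean(blurred)[0];                        // crude background estimate
    std::vector<cv::Point> peaks;
    for (int y = 0; y < blurred.rows; ++y)
        for (int x = 0; x < blurred.cols; ++x)
            if (localMax.at<uchar>(y, x) && blurred.at<uchar>(y, x) > bg + minContrast)
                peaks.emplace_back(x, y);
    return peaks;
}
```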
Maybe try an iterative approach using thresholding and edge detection:
Start with a very high threshold (say 90% of the signal range), then run a Canny filter (or any binary edge filter you like) on the thresholded image. Count and store the number of edge pixels generated.
Proceed to repeat this step for lower and lower thresholds. At a certain point you are going to see a massive spike in edges detected (i.e. your cool textured background). Then pull the threshold back a little higher and run closing and flood-fill on the resulting edge image.
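A rough sketch of that threshold sweep (the starting threshold, step size, spike factor and Canny parameters are all assumptions):

```cpp
#include <opencv2/opencv.hpp>

// Sketch: sweep the threshold downwards and watch for a spike in edge pixels,
// which indicates the threshold has dropped into the textured background.
int findThresholdBeforeNoise(const cv::Mat& image8u, int step = 5, double spikeFactor = 3.0)
{
    int prevEdges = 0;
    for (int t = 230; t > 0; t -= step) {
        cv::Mat binary, edges;
        cv::threshold(image8u, binary, t, 255, cv::THRESH_BINARY);
        cv::Canny(binary, edges, 50, 150);
        int edgeCount = cv::countNonZero(edges);

        if (prevEdges > 0 && edgeCount > spikeFactor * prevEdges)
            return t + step;                 // pull back to the last "quiet" threshold
        prevEdges = edgeCount;
    }
    return 0;                                // no spike found during the sweep
}
```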

How can I get the rectangular areas of difference between two images?

I feel like I have a very typical problem with image comparison, and my googles are not revealing answers.
I want to transmit still images of a desktop every X seconds. Currently, we send a new image if the old and new differ by even one pixel. Very often only something very minor changes, like the clock or an icon, and it would be great if I could just send the changed portion to the server and update the image (far less bandwidth).
The plan I envision is to get a rectangle of an area that has changed. For instance, if the clock changed, screen capture the smallest rectangle that encompasses the changes, and send it to the server along with its (x, y) coordinate. The server will then update the old image by overlaying the rectangle at the coordinate specified.
Is there any algorithm or library that accomplishes this? I don't want it to be perfect, let's say I'll always send a single rectangle that encompasses all the changes (even if many smaller rectangles would be more efficient).
My other idea was to get a diff between the new and old images that's saved as a series of transformations. Then, I would just send the series of transformations to the server, which would then apply this to the old image to get the new image. Not sure if this is even possible, just a thought.
Any ideas? Libraries I can use?
Compare every pixel of the previous frame with every pixel of the next frame, and keep track of which pixels have changed?
Since you are only looking for a single box to encompass all the changes, you actually only need to keep track of the min-x, min-y (not necessarily from the same pixel), max-x, and max-y. Those four values will give you the edges of your rectangle.
Note that this job (comparing the two frames) should really be off-loaded to the GPU, which could do this significantly faster than the CPU.
Note also that what you are trying to do is essentially a home-grown lossless streaming video compression algorithm. Using one from an existing library would not only be much easier, but also probably much more performant.
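A minimal sketch of that bounding-rectangle scan (raw 32-bit RGBA buffers and same-sized frames are assumed):

```cpp
#include <cstdint>
#include <cstddef>

// Sketch: scan two same-sized RGBA frames and return the smallest rectangle
// (x, y, w, h) that contains every differing pixel; w == 0 means "no change".
struct Rect { int x, y, w, h; };

Rect diffBounds(const uint32_t* prev, const uint32_t* next, int width, int height)
{
    int minX = width, minY = height, maxX = -1, maxY = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (prev[(size_t)y * width + x] != next[(size_t)y * width + x]) {
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
            }
        }
    }
    if (maxX < minX) return {0, 0, 0, 0};                   // frames are identical
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```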
This is from an algorithms point of view; I'm not sure if it is easier to implement.
Basically, XOR the two images and compress the result using any information-theoretic algorithm (Huffman coding?).
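A hedged C++ sketch of that idea, assuming zlib for the compression step (the answer does not name a specific codec):

```cpp
#include <cstdint>
#include <vector>
#include <zlib.h>   // assumption: zlib is available for the compression step

// Sketch: XOR two same-sized frames (mostly zeros when little changed),
// then deflate the result. The receiver inflates and XORs against its copy.
std::vector<uint8_t> xorAndCompress(const std::vector<uint8_t>& prev,
                                    const std::vector<uint8_t>& next)
{
    std::vector<uint8_t> delta(prev.size());                // assumes prev.size() == next.size()
    for (size_t i = 0; i < prev.size(); ++i)
        delta[i] = prev[i] ^ next[i];

    uLongf compressedSize = compressBound(delta.size());
    std::vector<uint8_t> out(compressedSize);
    compress(out.data(), &compressedSize, delta.data(), delta.size());
    return std::vector<uint8_t>(out.begin(), out.begin() + compressedSize);
}
```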
I know I am very late responding, but I found this question today.
I have done some analysis on image differencing, but the code was written for Java. Kindly look at the link below; it may help:
How to find rectangle of difference between two images
The code finds the differences and keeps the rectangles in a LinkedList. You can use the LinkedList containing the Rectangles to patch the differences onto the base image.
Cheers!

OpenGL drawing the same polygon many times in multiple places

In my OpenGL app, I am drawing the same polygon approximately 50k times, but at different points on the screen. In my current approach, I do the following:
Draw the polygon once into a display list
For each instance of the polygon, push the matrix, translate to that point, and scale and rotate appropriately (the scaling of each instance will be the same; the translation and rotation will not).
However, with 50k polygons, this is 50k push and pops and computations of the correct matrix translations to move to the correct point.
A coworker of mine also suggested drawing the entire scene into a buffer and then just drawing the whole buffer with a single translation. The tradeoff here is that we need to keep all of the polygon vertices in memory rather than just the display list, but we wouldn't need to do a push/translate/scale/rotate/pop for each vertex.
The first approach is the one we currently have implemented, and I would prefer to see if we can improve that since it would require major changes to do it the second way (however, if the second way is much faster, we can always do the rewrite).
Are all of these push/pops necessary? Is there a faster way to do this? And should I be concerned that this many push/pops will degrade performance?
It depends on your ultimate goal. More recent OpenGL specs enable features for "geometry instancing". You can load all the matrices into a buffer and then draw all 50k with a single "draw instances" call (OpenGL 3+). If you are looking for a temporary fix, at the very least, load the polygon into a Vertex Buffer Object. Display Lists are very old and deprecated.
Are these 50k polygons going to move independently? You'll have to put up with some form of "pushing/popping" (even though modern scene graphs do not necessarily use an explicit matrix stack). If the 50k polygons are static, you could pre-compile the entire scene into one VBO. That would make it render very fast.
If you can assume a recent version of OpenGL (>= 3.1, IIRC) you might want to look at glDrawArraysInstanced and/or glDrawElementsInstanced. For older versions, you can probably use glDrawArraysInstancedEXT/glDrawElementsInstancedEXT, but they're extensions, so you'll have to access them as such.
Either way, the general idea is fairly simple: you have one mesh, and multiple transforms specifying where to draw the mesh, then you step through and draw the mesh with the different transforms. Note, however, that this doesn't necessarily give a major improvement -- it depends on the implementation (even more than most things do).
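A minimal sketch of the instanced-drawing idea (attribute locations, the per-instance layout and a matching vertex shader are assumptions; a bound VAO and GL 3.3+ for glVertexAttribDivisor are also assumed): the per-instance offset/scale/rotation live in a second VBO whose attribute advances once per instance.

```cpp
#include <GL/glew.h>   // or any other GL loader; an active context is assumed

// Sketch: one polygon mesh + one per-instance attribute buffer, drawn with a single call.
// Attribute 0 = mesh position, attribute 1 = xy offset, scale, rotation.
struct Instance { float x, y, scale, rotation; };

void drawInstances(GLuint meshVbo, GLsizei meshVertexCount,
                   GLuint instanceVbo, GLsizei instanceCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, meshVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), (void*)0);

    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(Instance), (void*)0);
    glVertexAttribDivisor(1, 1);             // advance once per instance, not per vertex

    // The vertex shader is assumed to apply offset/scale/rotation from attribute 1.
    glDrawArraysInstanced(GL_TRIANGLE_FAN, 0, meshVertexCount, instanceCount);
}
```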

Resources