Fast subrects from layered image - algorithm

I have a 2D raster on which 1 to, say, 20 other 2D rasters are layered (each with a random size and offset). I'm looking for a fast way to access a sub-rectangle view (also with a random size and offset). The view should return all the layered pixels for each X and Y coordinate.
I guess this is roughly how, say, GIMP or other 2D paint apps draw layers on top of each other, except that I want all the pixels stacked at each coordinate, not just a projection where the top pixel hides the ones below it.
I have run into this problem before and I am facing it again now. I have already spent a lot of time searching the internet and this site for similar issues, but I can't find any. I will describe two possible solutions, neither of which satisfies me:
Have a 3D array of pre-allocated size. This is easy to manage, but the wasted storage and memory overhead are really big. A 4k raster (taking 4096 x 4096) with 16 slots of 4 bytes each needs 1 GiB of memory, and in my application most of that space would go unused.
The solution I used before: have two arrays, one with indices and the other with the actual values. Each "pixel" of the first array says in which range of the second array you can find the pixels contributed by all layers. This is compact, but every request bounces between two memory regions, and it is a bit of a hassle to set up, not to mention update (updating is a nice-to-have feature, but not mandatory).
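For reference, the second layout can be sketched roughly as follows. This is only an illustration under assumptions: the names Layer, Sample, StackedRaster and buildStacked are made up here, layers are taken to be 8-bit grey plus 8-bit alpha, and an alpha of 0 is taken to mean "no pixel".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Layer {                        // hypothetical layer description
    int x0, y0, w, h;                 // offset and size on the base raster
    std::vector<std::uint8_t> gray;   // w*h values, row-major
    std::vector<std::uint8_t> alpha;  // w*h values, 0 means "no pixel here"
};

struct Sample { std::uint8_t layer, gray, alpha; };

struct StackedRaster {
    int width = 0, height = 0;
    std::vector<std::uint32_t> start;  // width*height + 1 offsets into samples
    std::vector<Sample> samples;       // per-pixel stacks stored back to back

    // Number of samples stacked at (x, y).
    std::size_t count(int x, int y) const {
        std::size_t i = static_cast<std::size_t>(y) * width + x;
        return start[i + 1] - start[i];
    }
    // Pointer to the first sample stacked at (x, y).
    const Sample* data(int x, int y) const {
        return samples.data() + start[static_cast<std::size_t>(y) * width + x];
    }
};

StackedRaster buildStacked(int width, int height, const std::vector<Layer>& layers) {
    assert(width > 0 && height > 0);
    StackedRaster r;
    r.width = width;
    r.height = height;
    r.start.assign(static_cast<std::size_t>(width) * height + 1, 0);

    // Visit every non-transparent layer pixel that falls inside the raster.
    auto forEach = [&](auto&& fn) {
        for (std::size_t li = 0; li < layers.size(); ++li) {
            const Layer& L = layers[li];
            for (int ly = 0; ly < L.h; ++ly)
                for (int lx = 0; lx < L.w; ++lx) {
                    int x = L.x0 + lx, y = L.y0 + ly;
                    std::size_t k = static_cast<std::size_t>(ly) * L.w + lx;
                    if (x < 0 || y < 0 || x >= width || y >= height) continue;
                    if (L.alpha[k] == 0) continue;
                    fn(static_cast<std::size_t>(y) * width + x,
                       Sample{static_cast<std::uint8_t>(li), L.gray[k], L.alpha[k]});
                }
        }
    };

    // Pass 1: count samples per pixel, then turn the counts into start offsets.
    forEach([&](std::size_t i, Sample) { ++r.start[i + 1]; });
    for (std::size_t i = 1; i < r.start.size(); ++i) r.start[i] += r.start[i - 1];

    // Pass 2: scatter the samples; within a pixel they end up in layer order.
    r.samples.resize(r.start.back());
    std::vector<std::uint32_t> cursor(r.start.begin(), r.start.end() - 1);
    forEach([&](std::size_t i, Sample s) { r.samples[cursor[i]++] = s; });
    return r;
}
```

Because the stacks are stored in row-major pixel order, all samples of one row of a sub-rectangle lie in a single contiguous range, from start[y * width + x0] to start[y * width + x1 + 1], so a view only needs one pass per row over the second array.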
So... any know-how on such kind of problem? Thank you in advance!
I forgot to add that I'm targeting a self-sufficient, preferably single-threaded, CPU solution. The layers will most likely be greyscale with alpha (that is, some pixel data will not exist). Lookup is the priority; updates such as adding or removing a layer can be slower.
Added by Mark (see comment):
In that image, if taking top-left corner of the red rectangle, a lookup should report red, green, blue and black. If the bottom-right corner is taken, it should report red and black only.

I would store the offsets and sizes in a data structure separate from the pixel data. This way you do not jump around in memory while you calculate the relative coordinates for each layer (or work out that you can ignore some layers entirely).
If you want to access single pixels or small areas rather than iterate over big areas, a quadtree might be a good way to store your data, since pixels or areas that are near each other (in the x or y direction) then also sit close together in memory.
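A minimal sketch of the separate bounds table suggested above, with made-up names (LayerBounds, LayerSet, lookup) and alpha handling left out for brevity: the bounds of all layers live in one small contiguous array, and the pixel storage of a layer is only touched once its bounds are known to contain the queried coordinate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LayerBounds { int x0, y0, w, h; };  // small and contiguous

struct LayerSet {
    std::vector<LayerBounds> bounds;                // one entry per layer
    std::vector<std::vector<std::uint8_t>> pixels;  // w*h grey values per layer

    // Append the value of every layer covering (x, y), bottom to top.
    void lookup(int x, int y, std::vector<std::uint8_t>& out) const {
        assert(bounds.size() == pixels.size());
        for (std::size_t i = 0; i < bounds.size(); ++i) {
            const LayerBounds& b = bounds[i];
            int lx = x - b.x0, ly = y - b.y0;
            if (lx < 0 || ly < 0 || lx >= b.w || ly >= b.h) continue;  // pixel data not touched
            out.push_back(pixels[i][static_cast<std::size_t>(ly) * b.w + lx]);
        }
    }
};
```

For a sub-rectangle the same test can be done once per layer (rectangle intersection) instead of once per pixel, after which only the overlapping rows of each layer need to be read.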

Related

Smartest way to draw symmetric QGraphicsRectItems

I am writing a GUI that is supposed to display entities of a system in a 2D coordinate system, which the user can select and drag around. The system is mirror-symmetric w.r.t. the x and y axes. Currently I am subclassing QGraphicsRectItem for an entity so that I can drag it around in the first quadrant (x>0, y>0) of the coordinate system. I reimplemented the paint method to draw the three additional rectangles with painter.drawRect(). So when I move the entity in quadrant 1, the elements in the other three quadrants perform mirror motions. That works well.
In the next stage, each entity can be subdivided, i.e. consist of hundreds of rectangles. So I need to draw hundreds of rectangles, and do that four times, with mirror operations. The naive approach takes four for-loops, but I'm wondering if there is a smarter way of doing this in Qt. The for-loops hurt a little because I'm using PyQt.
If your drawing operations are that slow, the simplest thing you could do is draw to an image once and then draw that cached image 4 times, which will be very fast, since it just copies pixel values.
It might be efficient to cache the drawing result not on item basis but to cache a quadrant of your grid. This way if you zoom in and the items get huge or numerous in count, you won't be wasting lots of memory, instead you will only need one image cache that's the screen size of the quadrant.
It really depends on what you want to achieve, which at this point is not entirely clear from the description, and your image doesn't show it either.

Detecting weak blobs in a noise image

I have an image which may contain some blobs. The blobs can be any size, and some will yield a very strong signal, while others are very weak. In this question I will focus on the weak ones because they are the difficult ones to detect.
Here is an example with 4 blobs.
The blob at (480, 180) is the most difficult one to detect. Running a Gaussian filter followed by an opening operation increases the contrast a bit, but not a lot:
The tricky part of this problem is that the natural noise in the background results in (many) pixels which have a stronger signal than the blob I want to detect. What makes the blob a blob is that it's either a large area with an average increase in intensity or a small area with a very strong increase in intensity (the latter is not relevant here).
How can I include this spatial information in order to detect my blob?
It is obvious that I first need to filter the image with a Gaussian and/or median filter in order to incorporate the neighbourhood of each pixel into its value. However, no amount of blurring is enough to make it easy to segment the blobs from the background.
EDIT: Regarding thresholding: thresholding is very tempting, but also problematic by itself. I do not have a region of "pure background", and the larger a blob is, the weaker its signal can be while still being detectable.
I should also note that the typical image will not have any blobs at all, but just be pure background.
You could try an h-minima transform. It removes any minima shallower than h and raises all other troughs by h. It is defined as the morphological reconstruction by erosion of the image increased by the height h. Here are the results with h = 35:
The result should be a lot easier to work with. It also needs an input parameter, just as segmentation does. The difference is that it is more robust: underestimating h by a relatively large amount will only bring you back closer to the original problem image instead of failing completely.
You could try to characterize the background noise to get an estimate, assuming that whatever your application is would have a relatively constant amount of it.
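To make the h-minima definition concrete, here is a small 1-D sketch (a real image would use a 2-D neighbourhood, and image libraries offer far faster reconstruction algorithms). The function name hMinima1D and the naive iterate-until-stable loop are assumptions made for illustration: the marker f + h is repeatedly eroded with a 3-wide window and clipped from below by f until nothing changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D h-minima sketch: reconstruction by erosion of (f + h) above f.
std::vector<int> hMinima1D(const std::vector<int>& f, int h) {
    assert(h >= 0);
    std::vector<int> g(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) g[i] = f[i] + h;  // marker

    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<int> next(g.size());
        for (std::size_t i = 0; i < g.size(); ++i) {
            int m = g[i];  // 3-wide erosion (minimum over the neighbourhood)
            if (i > 0) m = std::min(m, g[i - 1]);
            if (i + 1 < g.size()) m = std::min(m, g[i + 1]);
            next[i] = std::max(m, f[i]);  // never drop below the original signal
            if (next[i] != g[i]) changed = true;
        }
        g.swap(next);
    }
    return g;
}
```

For the signal 5, 3, 5, 1, 5 and h = 3, the shallow minimum (depth 2) is filled to 5 and the deep one (depth 4) is raised from 1 to 4, giving 5, 5, 5, 4, 5.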
Note the one blue dot between the two large bottom blobs: further processing is still needed. You could try continuing with the morphology. Something that I have found to work in some 'ink-blot' segmentation cases like this is running through every connected component, calculating their convex hulls and finally taking the union of all the convex hulls in the image. It usually makes further morphological operations much easier and provides a good estimate for the label.
In my experience, if you can see your Gaussian filter size (those little circles), then your filter width is too small. Although it is terribly expensive, try bumping up the radius of your Gaussian; it should continue to improve your results until its radius matches the radius of the smallest object you are trying to find.
Following that (heavy Gaussian), I would do a peak search across the whole image. Cut out any peaks that are too low and/or have too little contrast to the nearest valley or background.
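A rough sketch of the blur-then-peak-search idea, under assumptions: the image is a plain 2-D float array, the names gaussianBlur and findPeaks are invented here, edges are clamped, and the only peak criterion is a minimum height (the contrast-to-valley test mentioned above is left out).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<float>>;  // img[y][x]; hypothetical alias

// Separable Gaussian blur with clamped edges; kernel radius is ceil(3 * sigma).
Image gaussianBlur(const Image& src, float sigma) {
    assert(sigma > 0.0f && !src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    const int r = static_cast<int>(std::ceil(3.0f * sigma));

    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-0.5f * i * i / (sigma * sigma));
        sum += k[i + r];
    }
    for (float& v : k) v /= sum;  // normalise so flat areas keep their level

    Image tmp(h, std::vector<float>(w, 0.0f));
    Image dst(h, std::vector<float>(w, 0.0f));
    for (int y = 0; y < h; ++y)  // horizontal pass
        for (int x = 0; x < w; ++x)
            for (int i = -r; i <= r; ++i) {
                int xx = std::min(std::max(x + i, 0), w - 1);
                tmp[y][x] += k[i + r] * src[y][xx];
            }
    for (int y = 0; y < h; ++y)  // vertical pass
        for (int x = 0; x < w; ++x)
            for (int i = -r; i <= r; ++i) {
                int yy = std::min(std::max(y + i, 0), h - 1);
                dst[y][x] += k[i + r] * tmp[yy][x];
            }
    return dst;
}

// Pixels that are at least minHeight and strictly brighter than all neighbours.
std::vector<std::pair<int, int>> findPeaks(const Image& img, float minHeight) {
    std::vector<std::pair<int, int>> peaks;  // stored as (x, y)
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool isPeak = img[y][x] >= minHeight;
            for (int dy = -1; dy <= 1 && isPeak; ++dy)
                for (int dx = -1; dx <= 1 && isPeak; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if ((dx || dy) && nx >= 0 && ny >= 0 && nx < w && ny < h)
                        isPeak = img[y][x] > img[ny][nx];
                }
            if (isPeak) peaks.emplace_back(x, y);
        }
    return peaks;
}
```

With a wide enough kernel, a weak but large blob keeps most of its height while an isolated bright pixel is spread thin, which is what lets a plain height cut separate the two.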
Don't be afraid to split this into two isolated processing pipelines, i.e. one filtration and extraction for low-contrast, spread-out blobs, and a completely different one to isolate high-contrast spikes (much, much easier to find). That being said, a high-contrast spike "should" survive even a pretty aggressive filter. Another thing to keep in mind is iterative subtraction: if there are some blobs that can be found easily from the get-go, pull them out of the image and then do a stretch (but be careful, as you can make the image be whatever you want it to be with too much stretching).
Maybe try an iterative approach using thresholding and edge detection:
Start with a very high threshold (say 90% signal), then run a Canny filter (or any binary edge filter you like) on the thresholded image. Count and store the number of edge pixels generated.
Repeat this step for lower and lower thresholds. At a certain point you are going to see a massive spike in detected edges (i.e. your cool textured background). Then pull the threshold back up a little and run closing and flood fill on your resulting edge image.
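The threshold sweep could be prototyped along the following lines. Note the substitution: instead of a Canny filter, this sketch uses the simplest binary edge measure, namely set pixels of the thresholded mask that touch an unset 4-neighbour. The names countEdgePixels and edgeCountSweep are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Gray = std::vector<std::vector<std::uint8_t>>;  // img[y][x]

// Count set pixels of the mask (img >= t) that touch an unset 4-neighbour.
// Neighbours outside the image are ignored.
int countEdgePixels(const Gray& img, int t) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    int edges = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (img[y][x] < t) continue;
            for (int k = 0; k < 4; ++k) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                if (img[ny][nx] < t) { ++edges; break; }
            }
        }
    return edges;
}

// Edge counts for a list of thresholds, given from high to low.
std::vector<int> edgeCountSweep(const Gray& img, const std::vector<int>& thresholds) {
    assert(!img.empty());
    std::vector<int> counts;
    for (int t : thresholds) counts.push_back(countEdgePixels(img, t));
    return counts;
}
```

The threshold to keep would then be the last one before the count jumps; how large a jump counts as "massive" is a tuning decision that this sketch does not make.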

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and use the left and right bounds of the first appearance of a black pixel. This fails due to the fact that people can write on invoices.
2) Divide the invoice up in vertical sections, use the sections that have the highest amount of black pixels. Fails due to the fact that the distribution is not always uniform amongst similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) on what I should focus as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to that. One of these numbers can be chosen from the interval [0°, 90°); the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) region and find the position where the maximal number of lines are within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
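The sliding-window step can be sketched as below. The function name dominantAngle and the choice to try only windows that start at one of the measured angles are assumptions; weights are assumed to be positive, and the weighted average is taken over offsets from the window start so that angles on both sides of the 90°/0° wrap-around average correctly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Angles are in degrees in [0, 90); weights must be positive.
// Returns the weighted mean angle of the best window, again in [0, 90).
double dominantAngle(const std::vector<double>& angles,
                     const std::vector<double>& weights,
                     double window = 5.0) {
    assert(!angles.empty() && angles.size() == weights.size());
    double bestWeight = -1.0, bestAngle = 0.0;
    for (double start : angles) {  // try a window starting at each angle
        double wsum = 0.0, osum = 0.0;
        for (std::size_t j = 0; j < angles.size(); ++j) {
            // Cyclic offset of angle j from the window start, in [0, 90).
            double off = std::fmod(angles[j] - start + 90.0, 90.0);
            if (off < window) {
                wsum += weights[j];
                osum += weights[j] * off;
            }
        }
        if (wsum > bestWeight) {
            bestWeight = wsum;
            bestAngle = std::fmod(start + osum / wsum, 90.0);
        }
    }
    return bestAngle;
}
```

For example, the angles 89°, 1°, 0.5° and 45° with equal weights give a best window starting at 89° that contains three lines, and the result is about 0.17°, not the misleading 30° a naive average of those three values would produce.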
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
Detecting translation
Assuming the image wasn't scaled at any point, you can then use an FFT-based correlation of the image to match it to the template. Convert both images to gray and pad them with zeros until the originals take up at most 1/2 the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply one spectrum element-wise by the complex conjugate of the other, and iFFT back. The resulting image encodes how well the two images agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
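To illustrate the FFT-based correlation, here is a 1-D sketch; a real implementation would use a 2-D FFT from a library, and the names fft and findShift are made up for this example. One spectrum is conjugated before the element-wise product, which is what makes the result a correlation instead of a convolution. The 1/n scaling of the inverse transform is skipped because it does not move the maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT; a.size() must be a power of two. No 1/n scaling.
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    const double step = (inverse ? 2.0 : -2.0) * std::acos(-1.0) / static_cast<double>(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, step * static_cast<double>(k)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Shift s for which scan[i] best matches tmpl[i - s].
int findShift(const std::vector<double>& tmpl, const std::vector<double>& scan) {
    assert(!tmpl.empty() && !scan.empty());
    std::size_t n = 1;
    while (n < 2 * std::max(tmpl.size(), scan.size())) n *= 2;  // zero padding
    std::vector<cd> A(n), B(n);
    for (std::size_t i = 0; i < tmpl.size(); ++i) A[i] = tmpl[i];
    for (std::size_t i = 0; i < scan.size(); ++i) B[i] = scan[i];
    fft(A, false);
    fft(B, false);
    for (std::size_t i = 0; i < n; ++i) A[i] = std::conj(A[i]) * B[i];
    fft(A, true);  // A now holds the (unscaled) circular cross-correlation
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i)
        if (A[i].real() > A[best].real()) best = i;
    // Indices in the upper half of the padded array are negative shifts.
    return best <= n / 2 ? static_cast<int>(best)
                         : static_cast<int>(best) - static_cast<int>(n);
}
```

The zero padding to at least twice the signal length is what keeps a shift to the right from being confused with a wrapped-around shift to the left.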
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. Generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png ).
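The row and column sums described in this answer are cheap to compute. A minimal sketch, assuming an 8-bit greyscale image where 0 is black and using the invented names Profiles and projectionProfiles; darkness (255 minus the pixel value) is summed so that printed lines show up as spikes instead of dips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Profiles {
    std::vector<long> rows;  // one darkness sum per image row
    std::vector<long> cols;  // one darkness sum per image column
};

Profiles projectionProfiles(const std::vector<std::vector<std::uint8_t>>& img) {
    assert(!img.empty() && !img[0].empty());
    Profiles p;
    p.rows.assign(img.size(), 0);
    p.cols.assign(img[0].size(), 0);
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x) {
            long dark = 255 - img[y][x];
            p.rows[y] += dark;
            p.cols[x] += dark;
        }
    return p;
}
```

The positions of the strongest spikes in cols give candidate left and right reference lines of the form, and the distance between two known lines gives the scale.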

Storing data for levels in a game like RISK or Total War

I'm working on a game which is a bit like the boardgame RISK, or the campaign section of the Total War series. I currently have a working implementation of the region system, but because of bad performance, the game hangs after certain commands. I'm sure it is possible to do it better.
What I want to do
I want to be able to present a map, such as a world map, and divide it up into regions (e.g. countries). I want to be able to select regions by clicking on them, send units to them, and get the adjacent regions.
What I've tried
A map is defined by 3 files:
A text file, which contains data formatted like this:
"Region Name" "Region Color" "Game-related information" ["Adjacent Region 1", "Adjacent Region 2", ...]
An image file, where each region is separated by a black border and has its own color. So for example there could be two regions, one with the RGB values 255, 0, 0 (red) and another with 255, 255, 255 (white). They are separated by a black border (but this is not necessary for the algorithm to work).
Another image file, which is the actual image that is drawn to the screen. It is the "nice looking" map.
An example of such a colour map:
(All the white parts evaluate to the same region in the current implementation. Just imagine they all have different colours).
When I load these files, I first load the colour image. Then I load the text file and go through each line. I create regions with the correct settings, as I want to. There's no real performance hit here, as it's simply reading data. A bunch of Region objects is then made, and given the correct colors.
At this stage, everything works fine. I can click on regions, ask the pixel data of the colour image, and by going through all the Regions in a list I can find the one that matches the colour of that particular pixel.
Issues
However, here's where the performance hit comes in:
Issue 1: Units
Each player has a bunch of units. I want to be able to spawn these units in a region. Let's say I want to spawn a unit in the red region. I go through all the pixels in my file, and when I hit a red one, I place the unit there.
for (int i = 0; i < worldmap.size(); i++) {
    for (int j = 0; j < worldmap[i].size(); j++) {
        if (worldmap[i][j].color == unit_color) {
            // place it here
        }
    }
}
A simple glance at this pseudocode shows that this is not going to work well. Not at a reasonable pace, anyway.
Issue 2: Region colouring
Another issue is that I want to colour the regions owned by players on the "nice looking" map. Let's say player one owns three regions: Blue, Red and Green. I then go through the worldmap, find the blue, red and green pixels on the colour image, and then colour those pixels on the "nice looking" map in a transparent version of the player colour.
However, this is also a very heavy operation and it takes a few seconds.
What I want to ask
Since this is a turn based game, it's not really that big a deal that every now and then, the game slows down a bit. However, it is not to my liking that I'm writing this ugly code.
I have considered other options, such as storing each point of a region as a float, but that would be a massive strain on memory (64 bits times a 3000x1000 resolution image is a lot).
I was wondering if there are algorithms created for this, or if I should try to use more memory to relieve the processor. I've looked for other games and how they do this, but to no avail. I've yet to find some source code on this, or an article.
I have deliberately not put too much code in this question, since it's already fairly lengthy, and the code has a lot of dependencies on other parts of my application. However, if it is needed to solve the problem, I will post some ASAP.
Thanks in advance!
Problem 1: go through the color map with a step size of 10 in both X and Y directions. This reduces the number of pixels considered by a factor of 100. Works if each country contains a square of at least 10x10 pixels.
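A sketch of that strided search, assuming the colour map is stored as regionMap[y][x] with one 32-bit colour per pixel; the names PixelPos and findRegionPixel are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PixelPos { int x, y; bool found; };

// Visit only every step-th pixel in both directions and return the first hit.
PixelPos findRegionPixel(const std::vector<std::vector<std::uint32_t>>& regionMap,
                         std::uint32_t color, int step = 10) {
    assert(step > 0);
    for (std::size_t y = 0; y < regionMap.size(); y += step)
        for (std::size_t x = 0; x < regionMap[y].size(); x += step)
            if (regionMap[y][x] == color)
                return {static_cast<int>(x), static_cast<int>(y), true};
    return {-1, -1, false};
}
```

Since the regions never change, the result could also be computed once per region at load time and stored in the Region object, so that spawning a unit needs no search at all.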
Problem 2: The best solution here is to do this once, not once per player or once per region. Create a lookup table from region color to player color, iterate over all pixels of the region map, and look up the corresponding player color to apply.
It may help to reduce the region color map to RGB 332 (8 bits total). You probably don't need that many fine shades of purple, and using one-byte colors makes the lookup table a lot easier: a plain array with 256 elements would work. Considering your maps are 3000x1000 pixels, this would also reduce the map size by 6 MB (from 9 MB at 3 bytes per pixel to 3 MB).
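The lookup-table pass and the RGB 332 packing could look roughly like this. The names packRGB332 and tintOwnedRegions are invented, the overlay colour format (32-bit with alpha) is an assumption, and a table entry of 0 is taken to mean "not owned, fully transparent".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keep the top 3 bits of red and green and the top 2 bits of blue.
inline std::uint8_t packRGB332(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

// One pass over the region map: every pixel looks up its owner's colour.
void tintOwnedRegions(const std::vector<std::uint8_t>& regionMap,
                      const std::array<std::uint32_t, 256>& ownerColor,
                      std::vector<std::uint32_t>& overlay) {
    assert(overlay.size() == regionMap.size());
    for (std::size_t i = 0; i < regionMap.size(); ++i)
        overlay[i] = ownerColor[regionMap[i]];
}
```

When a region changes hands, only its one table entry changes before the pass is repeated. The region colours in the map file need to stay distinct after packing, which limits the map to 256 regions.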
Another thing to consider is whether you really need a region map with 3000x1000 pixel resolution. The nice map may be that big, but the region map could be resampled at 1500x500 pixel resolution. Your borders looked thick enough (more than 2 pixels) so a 1 pixel loss of region resolution would not matter. Yet it would reduce the region map by another 2.25 MB. At 750 kB, it now probably fits in the CPU cache.
What if you traced the regions (so one read through the entire data file) and stored the boundaries? For example, in Java there is a Path2D class which I have used before to store the outlines of states. In fact, if you used this method your data file wouldn't even need all the pixel data, just the boundaries of the areas. This is especially true since it seems your regions aren't dynamic, so you can simply hard-code the boundary values into the data file.
From here you can simply target a location within the boundaries (most libraries/languages with this concept support some sort of isPointInBoundary(x, y) method). You could even create your own Region class that has a boundary saved to it along with other information (such as what game pieces are currently on it!).
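If no library with such a containment method is at hand, the test itself is short. Below is a sketch of the standard even-odd ray-casting test for a simple polygon; the Point struct and the function name contains are assumptions, and points lying exactly on an edge may be classified either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Even-odd rule: cast a ray to the right and count the edges it crosses.
bool contains(const std::vector<Point>& poly, double px, double py) {
    assert(poly.size() >= 3);
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Point& a = poly[i];
        const Point& b = poly[j];
        // Edge (a, b) straddles the horizontal line through py, and the
        // intersection with that line lies to the right of px.
        if ((a.y > py) != (b.y > py) &&
            px < (b.x - a.x) * (py - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}
```

A click handler would first reject regions whose bounding box does not contain the point and run this test only on the remaining candidates.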
Hope that helps you think about it more clearly. It should be pretty nice to code, too.

How to efficiently display a large number of moving points

I have a large array of points, which updates dynamically. For the most part, only certain (relatively small) parts of the array get updated. The goal of my program is to build and display a picture using these points.
If I build a picture directly from the points, it would be 8192 x 8192 pixels in size. I believe an optimization would be to reduce the array in size. My application has two screen areas (one is a magnified view of the other). Additionally, I will need to pan this picture in either screen area.
My approach for optimization is as follows.
Take a source array of points and reduce it with scaling factor for the first screen area
Same for the second area, but with larger scaling factor
Render these two arrays into two FBOs
Use the FBOs as textures (to provide the ability to pan the picture)
When updating a picture I re-render only changed area.
Please suggest ways to speed this up, as my current implementation runs extremely slowly.
You will hardly be able to optimize this a lot if you don't have the hardware to run it at an adequate rate. Even if you render in different threads to FBOs and then compose the result, your bottleneck is likely to remain. 67 million data points is nothing to sneeze at, even for modern GPUs.
Try not to update unnecessarily, update only what changes, render only what's updated and visible, try to minimize the size of your components, e.g. use a shorter data type if possible.
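One way to "update only what changes" is to track dirty tiles on the CPU side and re-render only those. This is a sketch under assumptions: a square image, a fixed tile size, and the invented class name DirtyTiles; the actual rendering of a tile into the FBO is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class DirtyTiles {
public:
    // imageSize and tileSize are in pixels and must be positive.
    DirtyTiles(int imageSize, int tileSize)
        : tile_(tileSize),
          n_((imageSize + tileSize - 1) / tileSize),
          dirty_(static_cast<std::size_t>(n_) * n_, 0) {}

    // Call whenever the point at pixel (x, y) changes.
    void markPoint(int x, int y) {
        assert(x >= 0 && y >= 0 && x / tile_ < n_ && y / tile_ < n_);
        dirty_[static_cast<std::size_t>(y / tile_) * n_ + x / tile_] = 1;
    }

    // Returns the (tileX, tileY) pairs to re-render and clears their flags.
    std::vector<std::pair<int, int>> takeDirty() {
        std::vector<std::pair<int, int>> out;
        for (int ty = 0; ty < n_; ++ty)
            for (int tx = 0; tx < n_; ++tx) {
                std::uint8_t& d = dirty_[static_cast<std::size_t>(ty) * n_ + tx];
                if (d) {
                    out.emplace_back(tx, ty);
                    d = 0;
                }
            }
        return out;
    }

private:
    int tile_;
    int n_;
    std::vector<std::uint8_t> dirty_;
};
```

With 256-pixel tiles an 8192 x 8192 picture has 32 x 32 tiles, so a frame in which only a small part of the array changed re-renders a handful of tiles instead of all 67 million points; tiles outside the visible area can be skipped as well.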
