I have an STM32H7 MCU with 1 MB of RAM and 1 MB of ROM. I need to implement a blob detection algorithm on a binary image array with a maximum size of 1280x1024.
I have read up on blob detection algorithms and found that they are mainly divided into two categories:
Algorithms based on label-propagation (One component at a time):
They first search for an unlabeled object pixel and label it with a new label; then, in later processing, they propagate the same label to all object pixels that are connected to that pixel. Demo code would look something like this:
void compLabel(int i, int j, int m); // forward declaration

// Scan the image; start a new component at every still-unlabeled object pixel.
void setLabels(void) {
    int m = 2; // 0 = background, 1 = unlabeled object pixel, labels start at 2
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (getPixel(x, y) == 1) compLabel(x, y, m++);
        }
    }
}

// Recursively propagate label m to every 8-connected object pixel.
void compLabel(int i, int j, int m) {
    if (i < 0 || i >= width || j < 0 || j >= height) return; // stay inside the image
    if (getPixel(i, j) == 1) {
        setPixel(i, j, m); // assign label (also marks the pixel as visited)
        compLabel(i - 1, j - 1, m);
        compLabel(i - 1, j,     m);
        compLabel(i - 1, j + 1, m);
        compLabel(i,     j - 1, m);
        compLabel(i,     j + 1, m);
        compLabel(i + 1, j - 1, m);
        compLabel(i + 1, j,     m);
        compLabel(i + 1, j + 1, m);
    }
}
Algorithms based on label-equivalence resolving (Two-pass): They consist of two steps. In the first step, they assign a provisional label to each object pixel. In the second step, they integrate all provisional labels assigned to each object (called equivalent labels) into a unique label (called the representative label), and replace the provisional label of each object pixel with its representative label.
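To make the equivalence resolving concrete, here is a minimal 4-connectivity sketch of the two-pass idea (my own illustration, reusing getPixel/setPixel, width and height from the code above; no overflow check on the label count):

#define MAX_LABELS 256 // assumed upper bound on provisional labels (one byte per pixel)

static unsigned char parent[MAX_LABELS];

static int find_root(int a) { // follow the equivalence chain to the representative
    while (parent[a] != a) a = parent[a];
    return a;
}

static void merge(int a, int b) { // record "labels a and b belong to the same blob"
    a = find_root(a);
    b = find_root(b);
    if (a < b) parent[b] = a; else parent[a] = b;
}

void twoPassLabel(void) {
    int next = 2; // 0 = background, 1 = unlabeled object
    // Pass 1: provisional labels, looking only at the west and north neighbours.
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (getPixel(x, y) != 1) continue;
            int w = (x > 0) ? getPixel(x - 1, y) : 0;
            int n = (y > 0) ? getPixel(x, y - 1) : 0;
            if (w < 2 && n < 2) {          // no labeled neighbour: new label
                parent[next] = next;
                setPixel(x, y, next++);
            } else if (w >= 2 && n >= 2) { // two labeled neighbours: take one, merge both
                setPixel(x, y, find_root(w));
                merge(w, n);
            } else {
                setPixel(x, y, (w >= 2) ? w : n);
            }
        }
    }
    // Pass 2: replace each provisional label by its representative.
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (getPixel(x, y) >= 2)
                setPixel(x, y, find_root(getPixel(x, y)));
}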
The downside of the first algorithm is that it makes recursive calls for all the pixels around the original pixel. I am afraid this will cause hard fault errors on the STM32 because of the limited stack.
The downside of the second algorithm is that it requires a lot of memory for the label image. For instance, at the maximum resolution of 1280x1024 and with a maximum of 255 labels (0 meaning no label), the label image takes 1.25 MB. That is way more than we have available.
I am looking for some advice on how to proceed. How can I get the center coordinates and area of all blobs in the image without using too much memory? Any help is appreciated. I presume the second algorithm is out of the picture, since there is no memory available for it.
First you have to go over your image with a scaling kernel to scale it down to something that can be processed; 4:1 or 9:1 are good possibilities. Otherwise you are going to have to get more RAM, because this situation seems unworkable as it stands. Bit access is not really fast and is going to kill your efficiency, and I don't even think you need that big an image (at least, that is my experience with vision systems).
You can then store the pixels in a straight unsigned char array, which can be labeled with the first method you named. It doesn't have to be a recursive process: you can also detect that a blob was relabeled to another blob and set a flag to run the pass again.
This makes it possible to have an externally visible function with a while loop that keeps calling your labeling function, without building up a big stack.
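To illustrate (my own sketch, reusing getPixel/setPixel, width and height from the question, and assuming every object pixel initially carries its own provisional label in a type wide enough not to overflow): each pass lets every pixel adopt the smallest label among its neighbours, and the changed flag tells the outer while loop to run another pass.

// One full pass; returns 1 if any pixel changed label.
static int propagateOnce(void) {
    int changed = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int v = getPixel(x, y);
            if (v < 2) continue; // background or unlabeled
            for (int dy = -1; dy <= 1; dy++) {   // smallest label among the 8 neighbours
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    int n = getPixel(nx, ny);
                    if (n >= 2 && n < v) { v = n; changed = 1; }
                }
            }
            setPixel(x, y, v);
        }
    }
    return changed;
}

void propagateLabels(void) {
    while (propagateOnce()) { /* iterate until no label changes */ }
}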
Area determination is then done by going over the image and counting the occurrences of each blob's label.
The center of a blob can be found by calculating the moments of the blob and then computing the center of mass. This is some pretty hefty math, so don't be discouraged; it is a tough nut to crack, but it is a great solution.
(small hint: you can take the C++ code from OpenCV and look through their code to find out how it's done)
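For the binary-image case the moments collapse to running sums, so a sketch (my own, not OpenCV's code) over the label image can be as small as:

#define MAX_BLOBS 64 // assumed bound on the number of blobs

// Accumulators for the zeroth moment (area) and the first moments (sum of x, sum of y).
static long area[MAX_BLOBS], sumX[MAX_BLOBS], sumY[MAX_BLOBS];

void blobStats(void) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int label = getPixel(x, y);
            if (label >= 2 && label < MAX_BLOBS + 2) {
                int i = label - 2;
                area[i]++;    // M00
                sumX[i] += x; // M10
                sumY[i] += y; // M01
            }
        }
    }
    // Center of mass of blob i: (sumX[i] / area[i], sumY[i] / area[i])
}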
Related
I have this 2D raster upon which are layered from 1 to, say, 20 other 2D rasters (with random sizes and offsets). I'm searching for a fast way to access a sub-rectangle view (with random size and offset). The view should return all the layered pixels for each X and Y coordinate.
I guess this is kind of how, say, GIMP or other 2D paint apps draw layers upon each other, with the exception that I want to have all the pixels on top of each other, not just the projection where the top pixel hides the ones below it.
I have met this problem before and still face it now. I have already spent a lot of time searching the internet and this site for similar issues, but can't find any. I will describe two possible solutions, neither of which I'm satisfied with:
Have what is basically a 3D array of pre-allocated size. This is easy to manage, but the wasted storage and memory overhead are really big. For a 4K raster with, say, 16 slots of 4 bytes each, that is about 1 GiB of memory. And in the application case, most of that space will be wasted, not used.
My solution, which I made before: have two 2D arrays, one with indices and the other with the actual values. Each "pixel" of the first one says in which range of pixels in the second array you can find the actual pixels contributed by all layers. This is well compressed in size, but every request bounces between two memory regions, and it is a bit of a hassle to set up, not to mention to update (a nice-to-have feature, but not mandatory).
So... any know-how on this kind of problem? Thank you in advance!
Forgot to add that I'm targeting a self-sufficient, preferably single-threaded, CPU solution. The layers will most likely be greyscale with alpha (that is, certain pixel data will not exist). The lookup operation is the priority; updates like adding/removing a layer can be slower.
Added by Mark (see comment):
In that image, if you take the top-left corner of the red rectangle, a lookup should report red, green, blue and black. If you take the bottom-right corner, it should report red and black only.
I would store the offsets and sizes in a data structure separate from the pixel data. That way you do not jump around in memory while you calculate the relative coordinates for each layer (or even decide that you can ignore some layers).
If you want to access single pixels or small areas rather than iterate over big areas, a quadtree might be a good way to store your data, giving more local memory access when you touch pixels or areas that are near each other (in the x or y direction).
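A minimal sketch of that first layout (all names are mine): a small array of layer descriptors is checked first, and the pixel data is touched only on a hit.

#include <cstdint>
#include <vector>

struct Layer {
    int offsetX, offsetY;  // placement of the layer on the base raster
    int width, height;
    const uint8_t* pixels; // greyscale (+ alpha, or whatever the format is)
};

// Collect the pixel value of every layer covering (x, y), in layer order.
std::vector<uint8_t> lookup(const std::vector<Layer>& layers, int x, int y) {
    std::vector<uint8_t> hits;
    for (const Layer& l : layers) {
        int lx = x - l.offsetX, ly = y - l.offsetY; // coordinates inside the layer
        if (lx >= 0 && lx < l.width && ly >= 0 && ly < l.height)
            hits.push_back(l.pixels[ly * l.width + lx]);
    }
    return hits;
}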
Take a look at these two example images:
I would like to be able to identify these types of images inside a large set of photographs and similar images. By a photograph I mean a photograph of people, a landscape, an animal, etc.
I don't mind if some photographs are falsely identified as these uniform images but I wouldn't really want to "miss" some of these by identifying them as photographs.
The simplest thing that came to my mind was to analyze the image pixel by pixel to find the highest and lowest R, G, B values (each channel separately). If the difference between the lowest and highest value is large, there are large color changes and such an image is probably a photograph.
Another idea was to analyze the hue value of each pixel in a similar fashion. The problem is that in the HSL model, orangish-red and pinkish-red are roughly 350 degrees apart going clockwise but only 10 degrees apart going counterclockwise. So I can't just compare each pixel's hue component directly, or I'll get some weird results.
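(For reference, the usual fix for this wrap-around is a circular distance, e.g.:)

#include <cstdlib>

// Smallest angular difference between two hues, in degrees (0..180).
static int hueDistance(int h1, int h2) {
    int d = std::abs(h1 - h2) % 360;
    return d > 180 ? 360 - d : d;
}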
Also, there is the problem of noise: one white or black pixel can ruin tests like that. So I would need to somehow exclude extreme values when only a few pixels have such extremes. But at this point it gets more and more complicated, and I'm feeling it's not the best approach.
I was also thinking about bumping the contrast to the max and then running a test like the RGB one described above. It would probably make things easier, but one or two abnormal pixels would still ruin the test anyway. How should I deal with such cases?
I don't mind running a few different algorithms that cover different image types. But please note that I'm dealing with images from digital cameras, so 6 MP, 12 MP or even 16 MP are quite common. Because of that, running computation-intensive algorithms is not desired. I deal with hundreds or even thousands of images and have only limited CPU resources for image processing. Let's say a second or two per large image is the maximum I can accept.
I'm aware that for example a photograph of a blue sky might trigger a false positive, but that's OK. False positives are better than misses.
This is how I would do it (the whole method is at the bottom of the post, but just read from top to bottom):
Your quote:
"By photograph I mean a photograph of people, a landscape, an animal etc."
My response to your quote:
This means that such images have edges and contours. The images you are trying to separate out have no edges or few contours (for the second example image, at least).
Your quote:
"one white or black pixel will ruin tests like that. So I would need to somehow exclude extreme values if there are only few pixels with such extremes"
My response:
Minimizing the noise through methods like DoG (Difference of Gaussians) will reduce the noisy, individual pixels.
So I have taken your images and run them through the following code:
#include <opencv2/opencv.hpp>

cv::cvtColor(image, imagec, CV_BGR2GRAY); // image is the example image you showed
cv::GaussianBlur(imagec, imagec, cv::Size(3,3), 0, 0, cv::BORDER_DEFAULT); // blur to suppress noise
cv::Canny(imagec, imagec, 20, 60, 3); // edge detection with thresholds 20/60 and aperture 3
Results for example image 1 you gave:
As you can see, after going through the code the image became blank (totally black). The image is quite big, hence a bit difficult to show all in one window.
Results for example 2 you showed me:
The outline can be seen, but one way to solve this is to introduce an ROI inset by about 20 to 30 pixels from the image border. For instance, if the image dimension is 640x320, the ROI may be 610x290, placed in the center of the image.
So now, let me introduce my real method:
1) Run all the images through the code above to find edges.
2) Check which images don't have any edges (images with no edges will have zero pixels with values greater than 0, or very few such pixels, so set a slightly higher threshold to play it safe; adjust how many pixels are allowed according to your images). A sketch of this check follows the list.
3) Save/name out all the images without edges; these will be the images you are trying to separate from the rest.
4) The end.
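A minimal sketch of step 2, assuming the Canny output from the code above is in imagec and using a made-up pixel-count threshold:

// After cv::Canny, edge pixels are 255 and everything else is 0.
int edgePixels = cv::countNonZero(imagec);
const int kMaxEdgePixels = 50; // assumed threshold: tolerate a few stray edge pixels
bool isUniform = (edgePixels < kMaxEdgePixels);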
EDIT (to answer the comment; I would have commented back, but my reply is too long):
True about the blurring part. To minimise the use of blurring, you can first run an elimination-like pass, so that smooth images like example image 1 are already separated out and categorised as the images you are looking for.
From there you do a second test, with the blurring, on the remaining images.
If you really wish to avoid blurring: what I notice is that your example image 1 can be categorised as a "smooth surface" while your example image 2 can be categorised as a "rough-like surface", meaning it will be noisy, which is what led me to introduce the blurring in the first place.
From my experience, if I remember correctly, such rough-like surfaces respond very well to watershed or colour-clustering methods; they blend very well, unlike the smooth images.
Since the leftover images are most likely rough ones, you can try the watershed method; with Canny you will get a black image, if I am not wrong. Try a line like this:
cv::pyrMeanShiftFiltering(image, images, 10, 20, 3); // spatial window 10, colour window 20, up to 3 pyramid levels
I am not very sure whether this method will be more expensive than Gaussian blurring, but you can try both and compare the computational speed.
In regard to your comment on grayscale images:
"Converting to grayscale sounds risky - losing color information may trigger lots of false positives"
My answer:
I don't really think so. If the images you are trying to segment out are of one colour, changing to grayscale doesn't matter. Of course, if you snap a photo of a blue sky, it might come out as a false positive, but as you said, those are OK.
If you think about it, in images with people etc. in them, the intensity changes quite a lot (unless, of course, your photograph has extreme cases, like a green ball on a field of grass).
I do admit that converting to grayscale loses information, but in your case I doubt it will affect much; in fact, working with grayscale images is faster and less expensive.
I would use an entropy-based approach. I don't have any custom code to share, but the following blog entry should push you in the right direction:
http://envalo.com/image-cropping-php-using-entropy-explained/
The point is that uniform images will have very low entropy compared to those with something interesting in them.
So the problem comes down to finding the correct threshold and processing the whole set.
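A minimal sketch of the idea (my own, not from the linked post): build a grayscale histogram and take the Shannon entropy of it; near-zero means uniform.

#include <cmath>
#include <cstddef>
#include <cstdint>

// Shannon entropy of an 8-bit grayscale image given as a flat array.
double imageEntropy(const uint8_t* pixels, size_t count) {
    double hist[256] = {0};
    for (size_t i = 0; i < count; i++) hist[pixels[i]]++;
    double entropy = 0.0;
    for (int v = 0; v < 256; v++) {
        if (hist[v] == 0) continue;
        double p = hist[v] / count;  // probability of gray level v
        entropy -= p * std::log2(p); // in bits; uniform images score near 0
    }
    return entropy;
}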
I would generate a color histogram for each image and compare how much it differs from a given pattern.
Maybe you want to normalize the brightness first to simplify the matching.
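A sketch of such a comparison with OpenCV (the reference histogram patternHist and the use of correlation are assumptions of mine):

// Build a 3D BGR histogram with 8 bins per channel and compare it to a reference.
cv::Mat hist;
int bins[] = {8, 8, 8};
float range[] = {0, 256};
const float* ranges[] = {range, range, range};
int channels[] = {0, 1, 2};
cv::calcHist(&image, 1, channels, cv::Mat(), hist, 3, bins, ranges);
cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1); // make it a distribution
double similarity = cv::compareHist(hist, patternHist, cv::HISTCMP_CORREL);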
This is how I would solve it (a short sketch follows the list):
Find the average R, G, and B values across the image
Calculate a value for each pixel that is the sum of the differences of each channel from the average
Remove the top 0.1% of values to ignore outliers
Check the largest remaining difference against a threshold (you'll probably need to determine this threshold by trial and error)
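A minimal sketch of those four steps (my own; the trimming is done by sorting the per-pixel scores):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// pixels: interleaved RGB, count = number of pixels. Returns true if "uniform".
bool isUniform(const uint8_t* pixels, size_t count, double threshold) {
    // 1) average R, G, B across the image
    double avg[3] = {0, 0, 0};
    for (size_t i = 0; i < count; i++)
        for (int c = 0; c < 3; c++) avg[c] += pixels[3 * i + c];
    for (int c = 0; c < 3; c++) avg[c] /= count;

    // 2) per-pixel score: sum of channel differences from the average
    std::vector<double> score(count);
    for (size_t i = 0; i < count; i++)
        score[i] = std::abs(pixels[3 * i]     - avg[0])
                 + std::abs(pixels[3 * i + 1] - avg[1])
                 + std::abs(pixels[3 * i + 2] - avg[2]);

    // 3) drop the top 0.1% as outliers; 4) compare the largest survivor to a threshold
    std::sort(score.begin(), score.end());
    size_t keep = count - count / 1000 - 1; // index of the largest non-outlier
    return score[keep] <= threshold;
}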
The following approach might be useful.
Derive a local binary pattern in a 5x5 window centered around every pixel: compare the center against the 16 pixels on the window's perimeter, giving 16 boolean values. Going in one direction (clockwise or anticlockwise), count the number of 1-to-0 and 0-to-1 changes. This is the feature value of the center pixel.
For each 20x20 window, compute the variance of the pixel feature values.
If you then take the variance of those variances, it should approach zero for a uniform image, whereas for other images it will be quite high. This way there may be no need to tune thresholds, and the local binary pattern takes care of potential uneven illumination.
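A sketch of the per-pixel feature under that reading (my own; the caller must keep (x, y) at least two pixels away from the image border):

#include <cstdint>

// Clockwise offsets of the 16 perimeter pixels of a 5x5 window.
static const int OFF[16][2] = {
    {-2,-2},{-1,-2},{0,-2},{1,-2},{2,-2}, // top row, left to right
    {2,-1},{2,0},{2,1},                   // right column, downwards
    {2,2},{1,2},{0,2},{-1,2},{-2,2},      // bottom row, right to left
    {-2,1},{-2,0},{-2,-1}                 // left column, upwards
};

// Transition count around the thresholded perimeter ring centered at (x, y).
int lbpFeature(const uint8_t* img, int width, int x, int y) {
    int center = img[y * width + x];
    int bits[16], transitions = 0;
    for (int i = 0; i < 16; i++) {
        int nx = x + OFF[i][0], ny = y + OFF[i][1];
        bits[i] = img[ny * width + nx] >= center; // threshold against the center
    }
    for (int i = 0; i < 16; i++)
        transitions += bits[i] != bits[(i + 1) % 16]; // 0-to-1 or 1-to-0 change
    return transitions;
}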
For each of the R, G, B channels, calculate the standard deviation of the intensity. If it is low enough, you have a uniform image.
If you are worried about having several different uniform areas, calculate the standard deviations for, say, each 20x20 square separately, then take the average of those standard deviations.
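With OpenCV the whole-image version is nearly a one-liner (the threshold of 10 is an assumption to tune):

cv::Scalar mean, stddev;
cv::meanStdDev(image, mean, stddev); // per-channel mean and standard deviation
bool uniform = stddev[0] < 10 && stddev[1] < 10 && stddev[2] < 10; // assumed threshold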
You can probably solve your problem using machine learning (classification). It is easier than it sounds. I will give an example:
1 - Feature extraction: compute a color histogram from each image (a histogram of RGB values). You will probably want to reduce the number of possible values of R, G and B so that your histogram does not grow too large (this is known as requantization). For example, you could make a histogram that accepts 4 different values per channel, yielding a histogram with 4*4*4 = 64 bins: [(R=1, G=1, B=1), (R=1, G=1, B=2), ... (R=4, G=4, B=4)].
2 - Manually mark some images that you know are not photographs.
3 - Train a classifier: now that you have examples of images that are photographs and images that are not, you can use them to train a classifier. Given a histogram, this classifier can then predict whether an image is a photograph or not.
If you do not want to spend time on the classifier, you could try a simpler approach:
Compute the histogram of the image It that you want to classify as a photograph or not;
Compare the histogram of It with the histograms of all marked images and find the most similar one (for example, you can sum the absolute differences between bins);
If the image with the most similar histogram is a photograph, classify It as a photograph; otherwise, classify It as not being a photograph.
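A sketch of this nearest-neighbour variant with the 4*4*4 requantized histogram (all names are mine):

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using Histogram = std::array<int, 64>; // 4 x 4 x 4 requantized RGB bins

Histogram computeHistogram(const uint8_t* rgb, size_t pixelCount) {
    Histogram h{};
    for (size_t i = 0; i < pixelCount; i++) {
        int r = rgb[3 * i]     / 64; // 256 levels -> 4 levels per channel
        int g = rgb[3 * i + 1] / 64;
        int b = rgb[3 * i + 2] / 64;
        h[16 * r + 4 * g + b]++;
    }
    return h;
}

// Sum of absolute bin differences; smaller = more similar.
long histogramDistance(const Histogram& a, const Histogram& b) {
    long d = 0;
    for (int i = 0; i < 64; i++) d += std::abs(a[i] - b[i]);
    return d;
}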
Below is my answer. I wrote a simple demo in C to explain my idea; you can find it in a gist.
Assumptions:
one color/pixel contains three channels (four channels if you have alpha data)
each channel commonly has 8 bits (256 levels)
Make some defines:
#define IMAGEWIDTH  20  // assumed
#define IMAGEHEIGHT 20  // assumed
#define CHANNELBIT  8
#define COLORLEVEL  256

typedef struct tagPixel
{
    unsigned int R : CHANNELBIT;
    unsigned int G : CHANNELBIT;
    unsigned int B : CHANNELBIT;
} Pixel;
Count the occurrences of every COLORLEVEL value in each channel:
void TraverseAndCount(Pixel image_data[IMAGEWIDTH][IMAGEHEIGHT]
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
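The actual body is in the gist; a possible implementation (my own sketch) is:

void TraverseAndCount(Pixel image_data[IMAGEWIDTH][IMAGEHEIGHT]
    , unsigned int red_counts[COLORLEVEL]
    , unsigned int green_counts[COLORLEVEL]
    , unsigned int blue_counts[COLORLEVEL])
{
    for (int x = 0; x < IMAGEWIDTH; x++) {
        for (int y = 0; y < IMAGEHEIGHT; y++) {
            red_counts[image_data[x][y].R]++;   // one histogram per channel
            green_counts[image_data[x][y].G]++;
            blue_counts[image_data[x][y].B]++;
        }
    }
}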
The next step is very important: analyze the color counts.
// just a very simple way to smooth the curve of the counts of colors
// and you can replace it with another way you want
unsigned int CalculateRange(unsigned int min_count
, unsigned int blur_size
, unsigned int color_counts[COLORLEVEL]);
This function does the following:
smooth the curve of each channel's counts along the COLORLEVEL axis with blur_size (you can smooth it another way)
calculate the range of counts that are greater than min_count
Finally, calculate the average of the ranges over the channels:
// calculate the average of the range for each channel of color
// the bigger the value, the more probably the image is a photograph
float AverageRange(unsigned int min_count, unsigned int blur_size
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
Note:
the result depends on min_count; min_count should be bigger than 0.
a bigger result means the image is more probably a photograph.
for a photograph, a bigger result is more likely with a smaller min_count.
I'm working on a game which is a bit like the board game Risk, or the campaign section of the Total War series. I currently have a working implementation of the region system, but because of bad performance the game hangs after certain commands. I'm sure it is possible to do this better.
What I want to do
I want to be able to present a map, such as a world map, and divide it up into regions (e.g. countries). I want to be able to select regions by clicking on them, send units to them, and get the adjacent regions.
What I've tried
A map is defined by 3 files:
A text file, which contains data formatted like this:
"Region Name" "Region Color" "Game-related information" ["Adjacent Region 1", "Adjacent Region 2", ...]'
An image file, where each region is separated by a black border and has its own color. So, for example, there could be two regions: one with the RGB values 255, 0, 0 (red) and another with 255, 255, 255 (white). They are separated by a black border (but this is not necessary for the algorithm to work).
Another image file, which is the actual image that is drawn to the screen. It is the "nice looking" map.
An example of such a colour map:
(All the white parts evaluate to the same region in the current implementation. Just imagine they all have different colours).
When I load these files, I first load the colour image. Then I load the text file and go through each line, creating regions with the correct settings. There's no real performance hit here, as it's simply reading data. A bunch of Region objects is then made and given the correct colors.
At this stage, everything works fine. I can click on regions, query the pixel data of the colour image, and by going through all the Regions in a list I can find the one that matches the colour of that particular pixel.
Issues
However, here's where the performance hit comes in:
Issue 1: Units
Each player has a bunch of units. I want to be able to spawn these units in a region. Let's say I want to spawn a unit in the red region: I go through all the pixels in my file, and when I hit a red one, I place the unit there.
for (int i = 0; i < worldmap.size(); i++) {
    for (int j = 0; j < worldmap[i].size(); j++) {
        if (worldmap[i][j].color == unit_color) {
            // place the unit here
        }
    }
}
A simple glance at this pseudocode shows that this is not going to work well. Not at a reasonable pace, anyway.
Issue 2: Region colouring
Another issue is that I want to colour the regions owned by players on the "nice looking" map. Let's say player one owns three regions: Blue, Red and Green. I then go through the worldmap, find the blue, red and green pixels on the colour image, and then colour those pixels on the "nice looking" map in a transparent version of the player colour.
However, this is also a very heavy operation and it takes a few seconds.
What I want to ask
Since this is a turn based game, it's not really that big a deal that every now and then, the game slows down a bit. However, it is not to my liking that I'm writing this ugly code.
I have considered other options, such as storing each point of a region as a pair of floats, but that would be a massive strain on memory (64 bits per point times a 3000x1000 resolution image is a lot).
I was wondering if there are algorithms created for this, or if I should try to use more memory to relieve the processor. I've looked for other games and how they do this, but to no avail. I've yet to find some source code on this, or an article.
I have deliberately not put too much code in this question, since it's already fairly lengthy and the code has a lot of dependencies on other parts of my application. However, if it is needed to solve the problem, I will post some ASAP.
Thanks in advance!
Problem 1: go through the color map with a step size of 10 in both the X and Y directions. This reduces the number of pixels considered by a factor of 100. It works as long as each country contains a square of at least 10x10 pixels.
Problem 2: The best solution here is to do this once, not once per player or once per region. Create a lookup table from region color to player color, iterate over all pixels of the region map, and look up the corresponding player color to apply.
It may help to reduce the region color map to RGB 332 (8 bits total). You probably don't need that many fine shades of lilac, and using one-byte colors makes the lookup table a lot easier; a plain array with 256 elements would work. Considering your maps are 3000x1000 pixels, this would also reduce the map size by 6 MB.
Another thing to consider is whether you really need a region map with 3000x1000 pixel resolution. The nice map may be that big, but the region map could be resampled at 1500x500 pixel resolution. Your borders looked thick enough (more than 2 pixels) so a 1 pixel loss of region resolution would not matter. Yet it would reduce the region map by another 2.25 MB. At 750 kB, it now probably fits in the CPU cache.
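A sketch of that single-pass tinting over an RGB 332 region map (names and the 50/50 blend are my own):

#include <cstddef>
#include <cstdint>
#include <vector>

// playerColor[region] is filled in once per turn from the game state.
void tintMap(const std::vector<uint8_t>& regionMap,  // RGB 332, one byte per pixel
             std::vector<uint32_t>& niceMap,         // ARGB "nice looking" map, same size
             const uint32_t playerColor[256])        // 0 = leave pixel untouched
{
    for (size_t i = 0; i < regionMap.size(); i++) {
        uint32_t tint = playerColor[regionMap[i]];   // one table lookup per pixel
        if (tint != 0)                               // cheap per-byte 50/50 blend
            niceMap[i] = (niceMap[i] >> 1 & 0x7F7F7F7F) + (tint >> 1 & 0x7F7F7F7F);
    }
}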
What if you traced the regions (one read through the entire data file) and stored the boundaries? For example, in Java there is a Path2D class which I have used before to store the outlines of states. In fact, if you used this method your data file wouldn't even need all the pixel data, just the boundaries of the areas. This is especially true since your regions don't seem to be dynamic, so you can simply hard-code the boundary values into the data file.
From here you can simply test a location against the boundaries (most libraries/languages with this concept support some sort of isPointInBoundary(x, y) method). You could even create your own Region class that has a boundary saved to it along with other information (such as which game pieces are currently on it!).
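If your library lacks such a test, it is small enough to write yourself; a standard ray-casting sketch (my own):

#include <vector>

struct Point { double x, y; };

// Ray casting: count how many polygon edges a horizontal ray from p crosses.
// An odd count means p is inside the boundary.
bool isPointInBoundary(const std::vector<Point>& poly, Point p) {
    bool inside = false;
    for (size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        if ((poly[i].y > p.y) != (poly[j].y > p.y) &&
            p.x < (poly[j].x - poly[i].x) * (p.y - poly[i].y) /
                  (poly[j].y - poly[i].y) + poly[i].x)
            inside = !inside;
    }
    return inside;
}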
Hope that helps you think about it more clearly; it should be pretty nice to code, too.
There is a very large picture that cannot be loaded into memory at once, because doing so may cause an out-of-memory exception. I need to scale this picture down to a small size. What should I do?
The simple idea is to open an input stream and process one buffer's worth at a time. But what zoom algorithm should I use?
If you can access the picture row-by-row (e.g. it's a bitmap), the simplest thing you could do is just downsample it, e.g. only read every nth pixel of every nth row.
// n is an integer downsampling factor
// width, height are the width and height of the original image, in pixels
// down is a new image that is (height/n * width/n) pixels in size
for (y = 0; y < height; y += n) {
    row = ... // read row y from the original image into a buffer
    for (x = 0; x < width; x += n) {
        down[y/n, x/n] = row[x]; // image[row, col] -- shorthand for accessing a pixel
    }
}
This is a quick-and-dirty way to resize the original image cheaply, without ever loading the whole thing into memory. Unfortunately, it also introduces aliasing in the output image (down). Dealing with aliasing requires interpolation -- still possible using the above row-by-row approach, but a bit more involved.
If you can't easily access the image row by row, e.g. it's a JPEG, which encodes data in 8x8 blocks, you can still do something similar to the approach described above: simply read a row of blocks instead of a row of pixels; the remainder of the algorithm works the same. Furthermore, if you're downsampling by a factor of 8, it's really easy with JPEG: you just take the DC coefficient of each block. Downsampling by factors that are multiples of 8 is also possible with this approach.
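If you happen to be using libjpeg, its scaled decoding does roughly this for you; a sketch under that assumption (error handling omitted), asking the decoder for a 1/8-size image, which it produces without a full decode:

#include <cstdio>
#include <jpeglib.h>

void decodeEighth(FILE* in) {
    jpeg_decompress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, in);
    jpeg_read_header(&cinfo, TRUE);
    cinfo.scale_num = 1;   // request 1/8 scaling
    cinfo.scale_denom = 8;
    jpeg_start_decompress(&cinfo); // output_width is now roughly width/8
    JSAMPLE* row = new JSAMPLE[cinfo.output_width * cinfo.output_components];
    while (cinfo.output_scanline < cinfo.output_height) {
        jpeg_read_scanlines(&cinfo, &row, 1); // one downsampled row at a time
        // ... consume the row ...
    }
    delete[] row;
    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
}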
I've glossed over many other details (such as color channels, pixel stride, etc), but it should be enough to get you started.
There are a lot of different resizing algorithms offering varying levels of quality, the trade-off being CPU time.
I believe that with any of these you should be able to process a massive file in chunks relatively easily. However, you should probably try existing tools first to see whether they can already handle the massive file anyway.
The GD graphics library lets you define how much working memory it can use, I believe, so it obviously already has logic for processing files in chunks.
How do I segment a 2D image into blobs of similar values efficiently? The given input is an array of integers, which encodes the hue for non-gray pixels and the brightness for gray pixels.
I am writing a virtual mobile robot in Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in computer vision, but when it runs on a robot, performance matters, so I wanted some input. The algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample, in colour space and in number of pixels, use a vision method (probably mean shift), and upscale the result.
This is good because downsampling also increases robustness to noise and makes it more likely that you get meaningful segments.
You could use floodfill to smooth edges afterwards if you need smoothness.
Some more thoughts (in response to your comment):
1) Did you blend as you downsampled? y[i] = (x[2i] + x[2i+1]) / 2 should eliminate noise.
2) How fast do you want it to be?
3) Have you tried dynamic mean shift? (Also google "dynamic x" for any algorithm x.)
Not sure how efficient it would be, but you could try using a Kohonen neural network (or self-organizing map, SOM) to group the similar values, where each pixel carries the original color and position but only the color is used for the Kohonen grouping.
You should read up on it before you implement this, though, as my knowledge of Kohonen networks goes only as far as knowing they are used for grouping data, so I don't know how performant or viable they would be for your scenario.
There are also Hopfield networks. From what I read, they can be coaxed into grouping as well.
What I have now:
Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
For each pixel in the image where the corresponding buffer value is still UNSEGMENTED, flood the buffer starting from that pixel's value.
a. The border check of the flood is done by testing whether a pixel is within EPSILON (currently set to 10) of the originating pixel's value.
b. Flood filling algorithm.
Possible issue:
The border check in 2.a. is called many times by the flood-fill algorithm. I could turn it into a lookup if I precalculated the borders using edge detection, but that may add more time than the current check costs.
// Border check: two values belong to the same blob if they differ by at most EPSILON.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
    return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
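Since the question allows any language, here is a stack-based (non-recursive) sketch of that flood fill, with names of my own:

#include <cstdlib>
#include <stack>
#include <utility>
#include <vector>

const int UNSEGMENTED = -1;
const int EPSILON = 10;

// Flood the segment buffer from (sx, sy), growing while values stay within
// EPSILON of the originating pixel's value.
void flood(const std::vector<std::vector<int>>& img,
           std::vector<std::vector<int>>& seg, int sx, int sy, int label) {
    const int h = img.size(), w = img[0].size(), origin = img[sy][sx];
    std::stack<std::pair<int, int>> todo;
    todo.push({sx, sy});
    while (!todo.empty()) {
        auto [x, y] = todo.top(); todo.pop();
        if (x < 0 || y < 0 || x >= w || y >= h) continue;     // off the image
        if (seg[y][x] != UNSEGMENTED) continue;               // already done
        if (std::abs(img[y][x] - origin) > EPSILON) continue; // border check
        seg[y][x] = label;
        todo.push({x + 1, y}); todo.push({x - 1, y});         // 4-connected growth
        todo.push({x, y + 1}); todo.push({x, y - 1});
    }
}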
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few points. If you are expecting around 10 blobs, picking random points on that order may suffice. The drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood fill is the connected-components algorithm. So:
Cheaply classify your pixels, e.g. by dividing them up in colour space.
Run the connected-components pass to find the blobs.
Retain the blobs of significant size.
This approach is widely used in early-vision pipelines; see, for example, the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".
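If OpenCV happens to be acceptable, the second and third steps are a couple of calls (binary is the classified 8-bit image from the first step; the area threshold is an assumption):

// labels: per-pixel blob id; stats: per-blob bounding box and pixel count.
cv::Mat labels, stats, centroids;
int n = cv::connectedComponentsWithStats(binary, labels, stats, centroids, 8);
for (int i = 1; i < n; i++) {                     // label 0 is the background
    if (stats.at<int>(i, cv::CC_STAT_AREA) >= 50) // assumed minimum blob size
        ; // keep blob i; its centroid is centroids.at<double>(i, 0) and (i, 1)
}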