I have this 2d raster upon which are layered from 1 to say 20 other 2d rasters (with random size and offset). I'm searching for fast way to access a sub-rectangle view (with random size and offset). The view should return all the layered pixels for each X and Y coordinate.
I guess this is kind of how say, GIMP or other 2d paint apps draw layers upon each other, with the exception that I want to have all the pixels upon each other, and not just projection where the top pixel hides the other ones below it.
I have met this problem and before and I still do now, spend already a lot time to search around internet and here about similar issues, but can't find any. I will describe two possible solution, both from which I'm not satisfied:
Have a basically 3d array of pre-allocated size. This is easy to manage but the storage wasted and memory overhead is really big. For 4k raster of say 16 slots, 4 bytes each, is like 1 GiB of memory? And in application case, most of that space will be wasted, not used.
My solution which I made before. Have two 2d arrays, one is with indices, the other with actual values. Each "pixel" of the first one says in which range of pixels in the second array you can find the actual pixels contributed from all layers. This is well compressed on size, but any request is bouncing between two memory regions and is a bit hassle to setup, not to mention update (a nice to have feature, but not mandatory).
So... any know-how on such kind of problem? Thank you in advance!
Forgot to add that I'm targeting self-sufficient, preferably single thread, CPU solution. The layers, will be most likely greyscale with alpha (that is, certain pixel data will not existent). Lookup operation is priority, updates like adding/removing a layer can be more slow.
Added by Mark (see comment):
In that image, if taking top-left corner of the red rectangle, a lookup should report red, green, blue and black. If the bottom-right corner is taken, it should report red and black only.
I would store the offsets and size in a data-structure separate from the pixel-data. This way you do not jump around in the memory while you calculate the relative coordinates for each layer (or even if you can ignore some layers).
If you want to access single pixels or small areas rather than iterating big areas a Quad-Tree might be a good idea to store your data with more local memory access while accessing pixels or areas which are near each other (in x or y direction).
I'm building a photographic film scanner. The electronic hardware is done now I have to finish the mechanical advance mechanism then I'm almost done.
I'm using a line scan sensor so it's one pixel width by 2000 height. The data stream I will be sending to the PC over USB with a FTDI FIFO bridge will be just 1 byte values of the pixels. The scanner will pull through an entire strip of 36 frames so I will end up scanning the entire strip. For the beginning I'm willing to manually split them up in Photoshop but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near black strips in between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
use FFTw to calculate a discrete fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))); that f represents the distance between two black bars: number of rows / f is the bar distance.
S(f) is complex, and thus has an argument; arctan(imag(S(f_max))/real(S(f_max)))*number of rows will give you the position of the bars.
To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of 0 around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border to the filmstrip material, x being the coordinate along that row). Now, use a simplistic high pass filter (e.g. f(x):= r_left(x)-r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: Use the resulting coordinates with something like openCV, or any other library capable of reading images and specifying sub-images by coordinates as well as saving to new images.
I have a bunch (up to 255) of colors (of the total 256^3 possible values) and for compression purposes I want to think up some another color, that isn't among them.
For example, I have such a small color table: [0,0,0], [1,42,69] -- any one of the remaining 256^3-2 colors would be fine -- no matter whether it is [0,0,7] or [6,6,6].
Can someone provide me with an easy and efficient algorithm to find another color?
UPD: bad ideas are also welcome.
Make a hash table of all known colors, and put your colors into it.
Make an algorithm that takes a color, and produces its "successor" by incrementing the lowest byte, and continuing with the increment into higher-order bytes when there is a carry.
Start at [0,0,0] and check it against the hash table from step 1.
Loop until you find the first gap.
This algorithm is linear in the number of colors.
Since now we have two answers, I would like to post my own.
We don't want any color in existing color table to become transparent. That is why I stated, that color table can be maximum of 255 colors long.
Because of this there would be at least one Red (or Green or Blue, whatever) channel value left unused. So we don't have to use 256^3 large table of flags -- 256 (bits for memory or bytes for speed) would be enough.
Walk through your image, counting the number of times each of the 256 possible pixel values occurs. Use std::min_element (for one possibility) to find the smallest count, and use that color number. If you're really talking about 256 possible color values, that's about it.
If you really have 24 bits per pixel, then you probably want to use a sparse representation for the counts, since (for any reasonable size of picture) many of them will inevitably be zero (you'd need roughly a 16-megapixel picture to even theoretically use all the possible colors). OTOH, on a modern computer, even using the few dozen megabytes (or so) necessary for a dense representation of the count may be worthwhile--it'll probably make your processing faster (no hash codes to compute) and still little enough memory usage that the reduction in processing time is worth it.
Take a look at these two example images:
I would like to be able to identify these types of images inside large set of photographs and similar images. By photograph I mean a photograph of people, a landscape, an animal etc.
I don't mind if some photographs are falsely identified as these uniform images but I wouldn't really want to "miss" some of these by identifying them as photographs.
The simplest thing that came to my mind was to analyze the images pixel by pixel to find highest and lowest R,G,B values (each channel separately). If the difference between lowest and highest value is large, then there are large color changes and such image is probably a photograph.
Other idea was to analyze the Hue value of each pixel in similar fashion. The problem is that in HSL model orangish-red and pinkish-red have roughly 350 degree difference when looking clockwise and 10 degree difference when looking counterclockwise. So I cant just compare each pixel's Hue component because I'll get some weird results.
Also, there is a problem of noise - one white or black pixel will ruin tests like that. So I would need to somehow exclude extreme values if there are only few pixels with such extremes. But at this point it gets more and more complicated and I'm feeling it's not the best approach.
I was also thinking about bumping contrast to the max and then running test like the RGB one I described above. It would probably make things easier but still one or two abnormal pixels would ruin the test anyway. How to deal with such cases?
I don't mind running few different algorithms that would cover different image types. But please note that I'm dealing with images from digital cameras so 6MP, 12MP or even 16MP are quite common. Because of that running computation intensive algorithms is not desired. I deal with hundreds or even thousands of images and have only limited CPU resources for image processing. Lets say a second or two per large image is max what I can accept.
I'm aware that for example a photograph of a blue sky might trigger a false positive, but that's OK. False positives are better than misses.
This how I would do it (Whole Method below, at the bottom of post, but just read from top to bottom):
Your quote:
"By photograph I mean a photograph of people, a landscape, an animal
etc."
My response to your quote:
This means that such images have edges, contours. The images you are
trying to separate out, no edges or little contours(for the second
example image at least)
Your quote:
one white or black pixel will ruin tests like that. So I would need to
somehow exclude extreme values if there are only few pixels with such
extremes
My response:
Minimizing the noise through methods like DoG(Difference of Gaussian), etc will reduce the
noisy, individual pixels
So I have taken your images and run it through the following codes:
cv::cvtColor(image, imagec, CV_BGR2GRAY); // where image is the example image you shown
cv::GaussianBlur(imagec,imagec, cv::Size(3,3), 0, 0, cv::BORDER_DEFAULT ); //blur image
cv::Canny(imagec, imagec, 20, 60, 3);
Results for example image 1 you gave:
As you can see after going through the code, the image became blank(totally black). The image quite big, hence bit difficult to show all in one window.
Results for example 2 you showed me:
The outline can be seen, but one method to solve this, is to introduce an ROI of about 20 to 30 pixels from the dimension of the image, so for instance, if image dimension is 640x320, the ROI may be 610x 290, where it is placed in the center of the image.
So now, let me introduce you my real method:
1) run all the images through the codes above to find edges
2) check which images doesn't have any any edges( images with no edges
will have 0 pixel with values more then 0 or little pixels with values more then 0, so set a slightly higher threshold to play it safe? You adjust accordingly, how many pixels to identify your images )
3) Save/Name out all the images without edges, which will be the images
you are trying to separate out from the rest.
4) The end.
EDIT(TO ANSWER THE COMMENT, would have commented back, but my reply is too long):
true about the blurring part. To minimise usage of blurring, you can first do an "elimination-like process", so those smooth like images in image 1 will be already separated and categorised into images you looking for.
From there you do a second test for the remaining images, which will be the "blurring".
If you really wish to avoid blurring, what I notice is that your example image 1 can be categorised as "smooth surface" while your example image 2 can be categorised as "rough-like surface", meaning which it be noisy, which led me to introduce the blurring in the first place.
From my experience and if I do remember correctly, such rough-like surfaces is very good in "watershed" or "clustering through colour" method, they blend very well, unlike the smooth images.
Since the leftover images are high chances of rough images, you can try the watershed method, and canny, you will find it be a black image, if I am not wrong. Try a line maybe like this:
pyrMeanShiftFiltering( image, images, 10, 20, 3)
I am not very sure if such method will be more expensive than Gaussian blurring. but you can try both and compare the computational speed for both.
In regard to your comment on grayscale images:
Converting to grayscale sounds risky - loosing color information may
trigger lot's of false positives
My answer:
I don't really think so. If your images you are trying to segment out
are of one colour, changing to grayscale doesn't matter. Of course if
you snap a photo of a blue sky, it might cause to be a false negative,
but as you said, those are ok.
If you think about it, images with people, etc inside, the intensity
change differs quite a lot. (of course unless your photograph have
extreme cases, like a green ball on a field of grass)
I do admit that converting to grayscale loses information. But in your
case, I doubt it will affect much, in fact, working with grayscale
images is faster and less expensive.
I would use entropy based approach. I don't have any custom code to share, but the following blog entry should push you in right direction.
http://envalo.com/image-cropping-php-using-entropy-explained/
The thing is, that the uniform images will have very low entropy compared to those with something interesting in them.
So the question is to find the correct threshold and process the whole set.
I would generate a color histogram for each image and compare how much they differ from a given pattern.
Maybe you want to normalize the brightness first to simplify the matching.
This is how I would solve it:
Find the average R, G, and B values across the image
Calculate a value for each pixel that is the sum of the differences of each channel from the average
Remove the top 0.1% of values to ignore outliers
Check the largest remaining difference against a threshold (you'll probably need to determine this threshold by trial and error)
The following apprach might be usefull.
Derive local binary pattern in 5x5 window centered around every pixel. So for one pixel you have 15 boolean values. In some direction (Clockwise or anticlockwise) calculate the number 1-0 and 0-1 changes. This is the feature value of the center pixel.
For all 20x20 window derive the variance of the pixel feature values.
If you take variance of the variances , for a uniform image it should approach towards zero. Whereas for other images it would be quite high. In this way there might be no necessary to fix thresholds and local binary pattern takes care of the potential uneven illumination.
for each of the R,G,B channels, calculate the standard deviation of intensity. If it is low enough, you have an uniform image.
If you are worried about having different uniform areas, calculate the standard deviations for, say, each 20x20 square separately, then calculate average of the standard deviations.
You probably can solve your problem using machine learning (classification). It is easier than it sounds. I will give an example:
1 - Feature extraction: compute a color histogram from all images (a histogram of RGB values). Probably you will want to reduce the number of possible values of R,G and B, so your histogram does not grows so large (this is known as requantization). For example, you could make a histogram that accepts 4 different values of R, G and B, yielding an histogram with 4*4*4 bins: [(R=1, G=1, B=1), (R=1, G=1, B=2), ... (R=4, G=4, B=4)].
2 - Manually mark some images that know that are not photographs.
3 - Train a classifier: now that you have examples of images that are photographs and images that are not photographs, you can use this information to train a classifier. This classifier, given a histogram can be used to predict the image is photography or not.
If you do not want to spend time on the classifier, you could try a more simple approach:
Compute the histogram from the image It that you want to know if it is a photography or not;
Compare the histogram of It with the histograms of all marked images and find the most similar histogram (for example, you can sum the differences between bins);
If the image with the most similar histogram is a photography, then you classify the image It as a photography. Otherwise, classify It as not being a photography
Below is my answer. I write a simple demo to explain my idea by C. You can find it in gist.
Ready:
one color/pixel contains three channels (four channels if you have alpha data)
every channel has 8 bit(256) in common
Make some defines:
#define IMAGEWIDTH 20 // Assumed
#define IMAGEHEIGHT 20 // Assumed
#define CHANNELBIT 8
#define COLORLEVEL 256
typedef struct tagPixel
{
unsigned int R : CHANNELBIT;
unsigned int G : CHANNELBIT;
unsigned int B : CHANNELBIT;
} Pixel;
Collect every count of color for every COLORLEVEL in each channel:
void TraverseAndCount(Pixel image_data[IMAGEWIDTH][IMAGEHEIGHT]
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
Next step is very important. Analyze the count of color:
// just a very simple way to smooth the curve of the counts of colors
// and you can replace it with another way you want
unsigned int CalculateRange(unsigned int min_count
, unsigned int blur_size
, unsigned int color_counts[COLORLEVEL]);
This function does:
i smooth the curve of each channel count in axis - COLORLEVEL by blur_size. (you can smooth it by another way)
calculate the range of counts that is more than min_count
At last, calculate the average of range in each channel:
// calculate the average of the range for each channel of color
// the value is bigger if the image is more probably photographs
float AverageRange(unsigned int min_count, unsigned int blur_size
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
Note:
the result depends the min_count. min_count should bigger than 0.
the bigger result is more probably that the image is a photo.
for a photograph, bigger result is more probably in smaller min_count.
At the moment I am working on an on screen display project with black, white and transparent pixels. (This is an open source project: http://code.google.com/p/super-osd; that shows the 256x192 pixel set/clear OSD in development but I'm migrating to a white/black/clear OSD.)
Since each pixel is black, white or transparent I can use a simple 2 bit/4 state encoding where I store the black/white selection and the transparent selection. So I would have a truth table like this (x = don't care):
B/W T
x 0 pixel is transparent
0 1 pixel is black
1 1 pixel is white
However as can be clearly seen this wastes one bit when the pixel is transparent. I'm designing for a memory constrained microcontroller, so whenever I can save memory it is good.
So I'm trying to think of a way to pack these 3 states into some larger unit (say, a byte.) I am open to using lookup tables to decode and encode the data, so a complex algorithm can be used, but it cannot depend on the states of the pixels before or after the current unit/byte (this rules out any proper data compression algorithm) and the size must be consistent; that is, a scene with all transparent pixels must be the same as a scene with random noise. I was imagining something on the level of densely packed decimal which packs 3 x 4-bit (0-9) BCD numbers in only 10 bits with something like 24 states remaining out of the 1024, which is great. So does anyone have any ideas?
Any suggestions? Thanks!
In a byte (256 possible values) you can store 5 of your three-bit values. One way to look at it: three to the fifth power is 243, slightly less than 256. The fact that it's slightly less also shows that you're not wasting much of a fraction of a bit (hardly any, either).
For encoding five of your 3-bit "digits" into a byte, think of taking a number in base 3 made from your five "digits" in succession -- the resulting value is guaranteed to be less than 243 and therefore directly storable in a byte. Similarly, for decoding, do the base-3 conversion of a byte's value.