The fastest algorithm for splitting an image into footprint and nodata values

I have an arbitrary satellite image whose pixels fall into 2 classes:
1) no data values (all of these pixels share one value, which varies from image to image)
2) footprint (pixel values are arbitrary)
Together, the no data and footprint pixels fill the image's bounding box.
What is the fastest algorithm for dividing such an image into these 2 classes?
UPDATE:
Are no data areas always at the border of the image?
A no data value cannot occur inside the footprint, and no data may be absent altogether.
Are no data values always black?
No, the value may vary from picture to picture, but it is always the same within one image.
Does the no data value also appear within the footprint?
Most of the images are grayscale and may be in 16- or 8-bit data formats. But I need a general algorithm; a case-specific algorithm is not what I want.
UPDATE 2:
My current approach is:
1) Take every pixel value that lies on the bounding box border
2) Take the most frequent value and set it as nodata
3) Reclassify the image into 2 classes: the NoData value for the nodata class and 1 for the footprint class
4) Convert the raster pixels with value 1 into vector format
For big images it takes more than 5 minutes to get the vector borders of the footprint.
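Steps 1 to 3 of the approach above can be sketched in a few lines. This is a minimal illustration, assuming the image is a small list of rows of integers; the name `split_footprint` is made up for this example. Note that a footprint pixel that happens to equal the nodata value is misclassified, and that when nodata is absent the most frequent border value is a footprint value.

```python
from collections import Counter

def split_footprint(image):
    """Return (nodata_value, mask); mask[r][c] is True for footprint pixels."""
    rows, cols = len(image), len(image[0])
    # Step 1: collect every pixel value on the bounding box border.
    border = list(image[0])
    if rows > 1:
        border.extend(image[-1])
    for r in range(1, rows - 1):
        border.append(image[r][0])
        if cols > 1:
            border.append(image[r][-1])
    # Step 2: the most frequent border value is taken as nodata.
    nodata, _ = Counter(border).most_common(1)[0]
    # Step 3: reclassify into nodata (False) and footprint (True).
    mask = [[value != nodata for value in row] for row in image]
    return nodata, mask

image = [
    [7, 7, 7, 7, 7],
    [7, 3, 9, 7, 7],
    [7, 5, 2, 8, 7],
    [7, 7, 7, 7, 7],
]
nodata, mask = split_footprint(image)
```

The border scan touches only the perimeter, so its cost is small; on large rasters the time is dominated by the reclassification and the later vectorisation.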

A simple way would be to multiply the pixel intensities. From the images you uploaded, the no data values are essentially of 0 intensity. Instead of going for complex methods, simply multiply the image intensities by 1000, so that nonzero footprint pixels saturate to white while the no data pixels stay at 0.
I used OpenCV and could segment out the regions in under 4 lines of code.
Here's an example -


How to convert a DICOM from Monochrome 1 to Monochrome 2?

I am working on a project with DICOM images where I need to compare two DICOM images. The problem is, one is in monochrome 1 and the other is in monochrome 2 (zero means white and black, respectively). How can I convert these pixel intensities to compare them? I am using the "pydicom" toolkit.
Your major problem is not the Photometric Interpretation (MONO1/2).
You cannot compare pixel intensities of two DICOM images unless they refer to the same scale (e.g. Hounsfield Units).
If you have
(0028,1052) RescaleIntercept - present with any value
(0028,1053) RescaleSlope - present with any value
(0028,1054) RescaleType - present with value "OD" or "HU"
Then it is pretty easy: Apply the linear transformation:
<measured value> = <pixel value> * RescaleSlope + RescaleIntercept
The measured values can be compared.
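As a rough illustration of the linear transformation above, here is a sketch on a tiny list of stored pixel values. The slope and intercept below are made-up example numbers; in practice they would come from the RescaleSlope and RescaleIntercept attributes of each dataset.

```python
def to_measured(pixels, slope, intercept):
    """Apply <measured value> = <pixel value> * RescaleSlope + RescaleIntercept."""
    return [[p * slope + intercept for p in row] for row in pixels]

# Hypothetical stored values and rescale parameters, for illustration only.
stored = [[0, 1024], [2048, 4095]]
measured = to_measured(stored, slope=1.0, intercept=-1024.0)
```

Two images converted this way can be compared only if both declare the same Rescale Type.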
The same is true if you have a non-linear Modality LUT stored as a lookup table in the header, but the same restrictions apply for Rescale Type.
Otherwise I would refrain from comparing pixel values. Of course, it appears to be easy to just invert one of the two images, but the fact that they have different Photometric Interpretation tells me that they have been acquired by different devices or techniques. This means that the pixel data is visually correct and comparable but not mathematically related.
If it helps, when visualising with matplotlib.pyplot you can use
plt.imshow(image, cmap='gray_r')
to invert the pixels back to Monochrome2 for visual comparison without changing pixel values.
Also,
np.invert(image)
might be a work-around.
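One caveat with np.invert: it computes a bitwise NOT, so on unsigned data it maps each value x to the maximum of the array's dtype minus x, which may differ from the range actually used by the image. A hedged alternative is to invert against the stored bit depth. The sketch below assumes unsigned pixel data and a `bits_stored` parameter standing in for the BitsStored attribute (0028,0101); it only matches appearance and does not make the values physically comparable.

```python
def invert_monochrome(pixels, bits_stored):
    """Flip MONOCHROME1 <-> MONOCHROME2 by mirroring values within the stored range."""
    max_value = (1 << bits_stored) - 1
    return [[max_value - p for p in row] for row in pixels]

# Hypothetical 12-bit pixel values.
inverted = invert_monochrome([[0, 100], [4095, 2000]], bits_stored=12)
```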

Image resizing method during preprocessing for neural network

I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either width or height and then crop it (all resultant images will be of the same size as 100x100), but critical parts of the image can be cropped (second solution).
I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize the image so that the total pixel count will be around 10000 pixels. So, images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes like 52x198, the first 10000 pixels out of 10296 will be considered (third solution).
The third solution I mentioned above seems to preserve the original shape of the image. However, it may be losing all of this originality while converting into a row since not all images are of the same size. I wonder about your comments on this issue. It will also be great if you can direct me to sources I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
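The sub-window idea can be sketched as follows. The function names are invented for this example, and averaging the per-window scores is just one possible way to combine the results; the classifier itself is left out.

```python
def window_origins(height, width, size, stride):
    """Top-left (row, col) of every size x size window that fits inside the image."""
    rows = range(0, height - size + 1, stride)
    cols = range(0, width - size + 1, stride)
    return [(r, c) for r in rows for c in cols]

def average_scores(per_window_scores):
    """Combine per-window class scores by averaging them class by class."""
    n = len(per_window_scores)
    num_classes = len(per_window_scores[0])
    return [sum(s[k] for s in per_window_scores) / n for k in range(num_classes)]

# A 100 x 250 image covered by 100 x 100 windows with a stride of 50 pixels.
origins = window_origins(height=100, width=250, size=100, stride=50)
combined = average_scores([[0.25, 0.75], [0.5, 0.5]])
```

If the image size minus the window size is not a multiple of the stride, the last few rows or columns are not covered, so an extra window aligned to the far edge may be needed.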
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
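A small sketch of why the dimensions matter: the same flattened array yields different vertical neighbours depending on the row length assumed. The helper name `below_neighbor` is made up for this illustration.

```python
def below_neighbor(flat, width, index):
    """Value directly below flat[index] when each row holds `width` pixels."""
    j = index + width
    return flat[j] if j < len(flat) else None

flat = list(range(12))  # 12 pixel values with no shape information
as_3x4 = below_neighbor(flat, width=4, index=1)  # read as 3 rows of 4
as_2x6 = below_neighbor(flat, width=6, index=1)  # read as 2 rows of 6
```

A convolutional filter applied under the wrong assumed width would combine pixels that were never adjacent in the original image.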
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.

Detecting individual images in an array of images

I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism, then I'm almost done.
I'm using a line scan sensor so it's one pixel width by 2000 height. The data stream I will be sending to the PC over USB with an FTDI FIFO bridge will be just 1 byte values of the pixels. The scanner will pull through an entire strip of 36 frames so I will end up scanning the entire strip. For the beginning I'm willing to manually split them up in Photoshop but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near black strips in between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
1) calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
2) set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1.
3) Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
a) use FFTW to calculate a discrete Fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period).
b) find f_max = argmax(abs(S(f))) over f > 0 (the f = 0 bin only holds the mean of s(n)); that f represents the distance between two black bars: number of rows / f_max is the bar distance.
c) S(f) is complex, and thus has an argument; atan2(imag(S(f_max)), real(S(f_max))) / (2*pi) * bar distance will give you the offset of the bar pattern within one period (up to the sign convention of your transform), and hence the position of the bars.
4) To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of the runs of 0 around the calculated center positions of the black bars.
5) To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border on the filmstrip material (x being the coordinate along that row). Now, use a simplistic high pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: Use the resulting coordinates with something like OpenCV, or any other library capable of reading images and specifying sub-images by coordinates as well as saving to new images.
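A simpler variant of the first two steps can be sketched without any Fourier transform: compute s(n), threshold it, and read off the runs of bright rows directly. This swaps the periodicity search for a plain run-length scan, so it is not the FFT method described above. It assumes the scan is a list of rows of 8-bit values, and the names `find_frames` and `min_length` are invented for this example.

```python
def row_means(image):
    """s(n): average pixel value of each scanned row."""
    return [sum(row) / len(row) for row in image]

def find_frames(image, threshold, min_length=1):
    """Return (start, end) row ranges, end exclusive, where s(n) exceeds the threshold."""
    frames = []
    start = None
    for n, s in enumerate(row_means(image)):
        bright = s > threshold
        if bright and start is None:
            start = n                      # a frame begins
        elif not bright and start is not None:
            if n - start >= min_length:    # ignore runs too short to be a frame
                frames.append((start, n))
            start = None
    if start is not None and len(image) - start >= min_length:
        frames.append((start, len(image)))
    return frames

# Tiny synthetic strip: dark bars (value 5) separating two three-row frames.
dark = [5, 5, 5, 5]
strip = ([dark] * 2 + [[120, 200, 90, 150]] * 3 + [dark] * 2
         + [[80, 60, 220, 100]] * 3 + [dark])
frames = find_frames(strip, threshold=30)
```

A very dark frame can fall below the threshold and be split or missed, which is where knowing the bar period from the Fourier approach helps.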

Grayscale image compression using Huffman Coding in MATLAB

I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.
I have used a grayscale image with size 512x512 in tif format. My problem is that the size of the compressed image (length of the compressed codeword) is getting bigger than the size of the uncompressed image. The compression ratio is getting less than 1.
clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict); %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp) %computing the compression ratio
%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);
decomp=reshape(I11,M,N);
imshow(decomp);
There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel, which is the normalized histogram. You're not computing it properly. Specifically:
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
total is summing over [0,255] which is not correct. You're supposed to compute the probability distribution of your image. You should use imhist for that instead. As such, you should do this instead:
count = 0:255;
p = imhist(A1) / numel(A1);
This will correctly calculate your probability distribution for your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering a pixel. Assuming that each pixel can equally be likely to be chosen, this is captured by calculating the image's histogram, then normalizing by the total number of pixels in your image. Try that and see if you get any better results.
However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?
If the spread is quite large, with very few entries per bin, then Huffman will not give you any compression savings. In fact it may give you a larger size as a result. Bear in mind that the TIFF compression standard only uses Huffman as part of the algorithm. There is also some pre- and post-processing done to further drive down the size.
As a further example, suppose I had an image that consisted of [0, 1, 2, ... 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]; I have 3 rows of [0,255], but really it could be any number of rows. This means that each of the 256 symbols is equiprobable, with probability 1/256, which means that we would need 8 bits per symbol... which is essentially the raw pixel value anyway!
The key behind Huffman is that a group of bits together generate one symbol. Frequently occurring symbols get assigned a smaller sequence of bits. Because this particular image that I talked about has intensities that are equiprobable, then you'd only generate one symbol per intensity rather than a group. With this, not only will you transmit the dictionary, you would effectively be sending one character at a time, and this is no better than sending the raw byte stream.
If you want your image to be compressed by raw Huffman, the distribution of pixels has to be skewed. For example, if most of the intensities in your image are dark, or are bright. If your image has good contrast or if the spread of the pixel intensities is flat throughout the image, then Huffman will not give you any compression savings.
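To make the point about skewed versus flat distributions concrete, here is a small sketch that builds Huffman code lengths with a heap and reports the average number of bits per pixel. It is written in Python rather than MATLAB, does not use huffmandict, and ignores the cost of transmitting the dictionary; the two pixel lists are synthetic examples.

```python
import heapq
from collections import Counter

def huffman_code_lengths(counts):
    """Map each symbol to the length in bits of its Huffman code."""
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, unique tie breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to the symbols below it
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def average_bits(pixels):
    """Average Huffman code length per pixel."""
    counts = Counter(pixels)
    lengths = huffman_code_lengths(counts)
    return sum(counts[s] * lengths[s] for s in counts) / len(pixels)

flat = list(range(256)) * 3                           # every intensity equally likely
skewed = [0] * 700 + [1] * 200 + [2] * 50 + [3] * 50  # mostly dark pixels
flat_bits = average_bits(flat)      # 8 bits per pixel, no saving
skewed_bits = average_bits(skewed)  # 1.4 bits per pixel
```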

Detect uniform images that (most probably) are not photographs

Take a look at these two example images:
I would like to be able to identify these types of images inside a large set of photographs and similar images. By photograph I mean a photograph of people, a landscape, an animal etc.
I don't mind if some photographs are falsely identified as these uniform images but I wouldn't really want to "miss" some of these by identifying them as photographs.
The simplest thing that came to my mind was to analyze the images pixel by pixel to find highest and lowest R,G,B values (each channel separately). If the difference between lowest and highest value is large, then there are large color changes and such image is probably a photograph.
Another idea was to analyze the Hue value of each pixel in similar fashion. The problem is that in the HSL model orangish-red and pinkish-red have a roughly 350 degree difference when looking clockwise and a 10 degree difference when looking counterclockwise. So I can't just compare each pixel's Hue component because I'll get some weird results.
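The wrap-around issue can be handled by always taking the shorter way around the hue circle. A minimal sketch, assuming hues in degrees; the function name is made up for this example:

```python
def hue_difference(a, b):
    """Smallest angular distance between two hues given in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

pinkish_vs_orangish = hue_difference(355, 5)
```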
Also, there is a problem of noise - one white or black pixel will ruin tests like that. So I would need to somehow exclude extreme values if there are only few pixels with such extremes. But at this point it gets more and more complicated and I'm feeling it's not the best approach.
I was also thinking about bumping contrast to the max and then running a test like the RGB one I described above. It would probably make things easier but still one or two abnormal pixels would ruin the test anyway. How can I deal with such cases?
I don't mind running a few different algorithms that would cover different image types. But please note that I'm dealing with images from digital cameras so 6MP, 12MP or even 16MP are quite common. Because of that, running computation-intensive algorithms is not desired. I deal with hundreds or even thousands of images and have only limited CPU resources for image processing. Let's say a second or two per large image is the max I can accept.
I'm aware that for example a photograph of a blue sky might trigger a false positive, but that's OK. False positives are better than misses.
This is how I would do it (the whole method is at the bottom of the post, but just read from top to bottom):
Your quote:
"By photograph I mean a photograph of people, a landscape, an animal etc."
My response to your quote:
This means that such images have edges and contours. The images you are trying to separate out have no edges or few contours (for the second example image at least).
Your quote:
"one white or black pixel will ruin tests like that. So I would need to somehow exclude extreme values if there are only few pixels with such extremes"
My response:
Minimizing the noise through methods like DoG (Difference of Gaussians) etc. will reduce the noisy, individual pixels.
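So I have taken your images and run them through the following code:
cv::cvtColor(image, imagec, cv::COLOR_BGR2GRAY); // image is the example image you showed (the constant was CV_BGR2GRAY in OpenCV 2.x)
cv::GaussianBlur(imagec, imagec, cv::Size(3,3), 0, 0, cv::BORDER_DEFAULT); // blur to suppress single-pixel noise
cv::Canny(imagec, imagec, 20, 60, 3); // edge map: low threshold 20, high threshold 60, 3x3 Sobel aperture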
Results for example image 1 you gave:
As you can see, after going through the code the image became blank (totally black). The image is quite big, hence a bit difficult to show all in one window.
Results for example 2 you showed me:
The outline can be seen, but one way to solve this is to introduce an ROI that is about 20 to 30 pixels smaller than the image in each dimension. For instance, if the image dimension is 640x320, the ROI may be 610x290, placed in the center of the image.
So now, let me introduce my real method:
1) Run all the images through the code above to find edges.
2) Check which images don't have any edges (images with no edges will have 0 pixels with values greater than 0, or only a few such pixels, so set a slightly higher threshold to play it safe; adjust the pixel count that identifies your images accordingly).
3) Save/name all the images without edges, which will be the images you are trying to separate out from the rest.
4) The end.
EDIT (to answer the comment; I would have commented back, but my reply is too long):
True about the blurring part. To minimise the use of blurring, you can first do an "elimination-like process", so smooth images like image 1 will already be separated and categorised as the images you are looking for.
From there you do a second test on the remaining images, which will be the "blurring".
If you really wish to avoid blurring: what I notice is that your example image 1 can be categorised as a "smooth surface" while your example image 2 can be categorised as a "rough-like surface", meaning it is noisy, which led me to introduce the blurring in the first place.
From my experience, and if I remember correctly, such rough-like surfaces respond very well to the "watershed" or "clustering through colour" methods; they blend very well, unlike the smooth images.
Since the leftover images are very likely rough images, you can try the watershed method and then Canny; you should get a black image, if I am not wrong. Try a line maybe like this:
cv::pyrMeanShiftFiltering(image, images, 10, 20, 3);
I am not sure whether this method is more expensive than Gaussian blurring, but you can try both and compare their computational speed.
In regard to your comment on grayscale images:
"Converting to grayscale sounds risky - losing color information may trigger lots of false positives"
My answer:
I don't really think so. If the images you are trying to segment out are of one colour, changing to grayscale doesn't matter. Of course, if you snap a photo of a blue sky, it might become a false positive, but as you said, those are OK.
If you think about it, in images with people etc. inside, the intensity varies quite a lot (unless, of course, your photograph is an extreme case, like a green ball on a field of grass).
I do admit that converting to grayscale loses information. But in your case I doubt it will matter much; in fact, working with grayscale images is faster and less expensive.
I would use an entropy-based approach. I don't have any custom code to share, but the following blog entry should push you in the right direction.
http://envalo.com/image-cropping-php-using-entropy-explained/
The thing is, that the uniform images will have very low entropy compared to those with something interesting in them.
So the question is to find the correct threshold and process the whole set.
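As a rough sketch of the entropy idea (not the code from the linked blog entry), the Shannon entropy of the intensity histogram can be computed like this. The input is assumed to be a flat list of grayscale values, and any threshold still has to be tuned on your own image set.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Entropy in bits per pixel of the intensity distribution."""
    total = len(pixels)
    probabilities = [count / total for count in Counter(pixels).values()]
    return -sum(p * math.log2(p) for p in probabilities)

uniform_image = [128] * 1000       # a single flat colour
busy_image = list(range(256)) * 4  # every intensity used equally often
low = shannon_entropy(uniform_image)
high = shannon_entropy(busy_image)
```

The flat image has zero entropy, while the image that uses all 256 levels equally reaches the 8-bit maximum.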
I would generate a color histogram for each image and compare how much they differ from a given pattern.
Maybe you want to normalize the brightness first to simplify the matching.
This is how I would solve it:
1) Find the average R, G, and B values across the image
2) Calculate a value for each pixel that is the sum of the differences of each channel from the average
3) Remove the top 0.1% of values to ignore outliers
4) Check the largest remaining difference against a threshold (you'll probably need to determine this threshold by trial and error)
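These steps could be sketched as follows, assuming the image is a flat list of (r, g, b) tuples and using absolute differences. The function names and the default threshold of 30 are placeholders that would need tuning by trial and error.

```python
def max_trimmed_deviation(pixels, trim_fraction=0.001):
    """Largest per-pixel deviation from the average colour after dropping the top outliers."""
    n = len(pixels)
    # Average R, G and B across the image.
    avg = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    # Per pixel: sum of absolute channel differences from the average.
    deviations = sorted(sum(abs(p[ch] - avg[ch]) for ch in range(3)) for p in pixels)
    # Drop the top fraction (0.1% by default) to ignore outliers.
    keep = n - int(n * trim_fraction)
    return deviations[max(keep, 1) - 1]

def looks_uniform(pixels, threshold=30.0, trim_fraction=0.001):
    """True when the largest remaining deviation is at or below the threshold."""
    return max_trimmed_deviation(pixels, trim_fraction) <= threshold

# A flat red image with one stray white pixel, and a synthetic varied image.
flat = [(200, 40, 40)] * 9999 + [(255, 255, 255)]
varied = [(i % 256, (2 * i) % 256, (3 * i) % 256) for i in range(10000)]
flat_is_uniform = looks_uniform(flat)
varied_is_uniform = looks_uniform(varied)
```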
The following approach might be useful.
Derive a local binary pattern in a 5x5 window centered on every pixel. So for one pixel you have 16 boolean values (one per pixel on the window's border). Going around in one direction (clockwise or anticlockwise), count the number of 1-0 and 0-1 changes. This is the feature value of the center pixel.
For every 20x20 window, derive the variance of the pixel feature values.
If you take the variance of the variances, for a uniform image it should approach zero, whereas for other images it would be quite high. In this way it might not be necessary to fix thresholds, and the local binary pattern takes care of potentially uneven illumination.
For each of the R, G, B channels, calculate the standard deviation of intensity. If it is low enough, you have a uniform image.
If you are worried about having several different uniform areas, calculate the standard deviations for, say, each 20x20 square separately, then calculate the average of the standard deviations.
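The block-wise variant might look like the sketch below for a single channel, given as a 2-D list of intensities. The function name is hypothetical, and the example image is a synthetic two-tone picture that is globally non-uniform but flat within each block.

```python
from statistics import mean, pstdev

def mean_block_std(channel, block=20):
    """Average of the standard deviations computed over block x block squares."""
    height, width = len(channel), len(channel[0])
    stds = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            values = [channel[r][c]
                      for r in range(top, min(top + block, height))
                      for c in range(left, min(left + block, width))]
            stds.append(pstdev(values))
    return mean(stds)

# 40 x 40 image: left half dark (10), right half bright (200).
two_tone = [[10] * 20 + [200] * 20 for _ in range(40)]
global_std = pstdev([v for row in two_tone for v in row])
blockwise_std = mean_block_std(two_tone, block=20)
```

Here the global standard deviation is large while the block-wise average is zero, which is the case the second sentence above is meant to cover.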
You can probably solve your problem using machine learning (classification). It is easier than it sounds. I will give an example:
1 - Feature extraction: compute a color histogram for every image (a histogram of RGB values). You will probably want to reduce the number of possible values of R, G and B so that your histogram does not grow too large (this is known as requantization). For example, you could make a histogram that accepts 4 different values of R, G and B, yielding a histogram with 4*4*4 bins: [(R=1, G=1, B=1), (R=1, G=1, B=2), ... (R=4, G=4, B=4)].
2 - Manually mark some images that you know are not photographs.
3 - Train a classifier: now that you have examples of images that are photographs and images that are not, you can use this information to train a classifier. Given a histogram, this classifier can be used to predict whether the image is a photograph or not.
If you do not want to spend time on the classifier, you could try a simpler approach:
Compute the histogram of the image It that you want to classify as a photograph or not;
Compare the histogram of It with the histograms of all marked images and find the most similar histogram (for example, you can sum the differences between bins);
If the image with the most similar histogram is a photograph, then classify It as a photograph. Otherwise, classify It as not being a photograph.
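The simpler nearest-histogram approach can be sketched like this, assuming 8-bit (r, g, b) tuples and 4 levels per channel (64 bins). The function names, labels and sample pixels are invented for illustration.

```python
def color_histogram(pixels, levels=4):
    """Normalised histogram with levels**3 bins from 8-bit (r, g, b) tuples."""
    step = 256 // levels  # requantization: 256 values folded into `levels` bins
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        index = (r // step) * levels * levels + (g // step) * levels + (b // step)
        hist[index] += 1.0
    return [h / len(pixels) for h in hist]

def histogram_distance(a, b):
    """Sum of the absolute differences between corresponding bins."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(pixels, labelled, levels=4):
    """Return the label of the marked histogram closest to the query image."""
    query = color_histogram(pixels, levels)
    return min(labelled, key=lambda item: histogram_distance(query, item[0]))[1]

# Two hand-marked examples: one flat blue image and one varied image.
uniform_pixels = [(30, 30, 200)] * 100
photo_pixels = [((i * 37) % 256, (i * 91) % 256, (i * 53) % 256) for i in range(100)]
labelled = [(color_histogram(uniform_pixels), "uniform"),
            (color_histogram(photo_pixels), "photo")]
label = classify([(40, 20, 210)] * 50, labelled)
```

With many marked images the linear search becomes the bottleneck, which is one reason to train a proper classifier instead.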
Below is my answer. I wrote a simple demo in C to explain my idea. You can find it in gist.
Ready:
one color/pixel contains three channels (four channels if you have alpha data)
every channel commonly has 8 bits (256 levels)
Make some defines:
#define IMAGEWIDTH 20 // Assumed
#define IMAGEHEIGHT 20 // Assumed
#define CHANNELBIT 8
#define COLORLEVEL 256
typedef struct tagPixel
{
unsigned int R : CHANNELBIT;
unsigned int G : CHANNELBIT;
unsigned int B : CHANNELBIT;
} Pixel;
Collect every count of color for every COLORLEVEL in each channel:
void TraverseAndCount(Pixel image_data[IMAGEWIDTH][IMAGEHEIGHT]
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
Next step is very important. Analyze the count of color:
// just a very simple way to smooth the curve of the counts of colors
// and you can replace it with another way you want
unsigned int CalculateRange(unsigned int min_count
, unsigned int blur_size
, unsigned int color_counts[COLORLEVEL]);
This function does:
smooth the curve of each channel count along the COLORLEVEL axis by blur_size (you can smooth it another way)
calculate the range of counts that is more than min_count
Finally, calculate the average of the range in each channel:
// calculate the average of the range for each channel of color
// the value is bigger if the image is more probably photographs
float AverageRange(unsigned int min_count, unsigned int blur_size
, unsigned int red_counts[COLORLEVEL]
, unsigned int green_counts[COLORLEVEL]
, unsigned int blue_counts[COLORLEVEL]);
Note:
the result depends on min_count. min_count should be bigger than 0.
the bigger the result, the more probable it is that the image is a photo.
for a photograph, a bigger result is more probable with a smaller min_count.
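Since only the C prototypes are shown, here is one possible reading of CalculateRange and AverageRange as a Python sketch. The box-filter smoothing and the definition of "range" as the span between the lowest and highest level whose smoothed count exceeds min_count are assumptions, not the author's code.

```python
def calculate_range(min_count, blur_size, color_counts):
    """Span of colour levels whose smoothed count exceeds min_count (assumed meaning)."""
    levels = len(color_counts)
    smoothed = []
    for i in range(levels):
        lo, hi = max(0, i - blur_size), min(levels, i + blur_size + 1)
        smoothed.append(sum(color_counts[lo:hi]) / (hi - lo))  # simple box blur
    above = [i for i, count in enumerate(smoothed) if count > min_count]
    return above[-1] - above[0] + 1 if above else 0

def average_range(min_count, blur_size, red_counts, green_counts, blue_counts):
    """Average of the per-channel ranges; larger values suggest a photograph."""
    channels = (red_counts, green_counts, blue_counts)
    return sum(calculate_range(min_count, blur_size, c) for c in channels) / 3

# Hypothetical per-channel counts: one image of a single colour, one using all levels.
one_colour = [0] * 256
one_colour[100] = 400
spread = [2] * 256
narrow = average_range(1, 0, one_colour, one_colour, one_colour)
wide = average_range(1, 0, spread, spread, spread)
```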
