I have a 384x255 uint8 array that contains the features of an image, and I want to train on this image using svmtrain. How do I convert this array to a 1-by-N single matrix so that the number of rows matches the number of labels?
I will explain my problem: I have extracted HOG features for ~500 images and saved the results in a matrix. This matrix consists of 500 rows, and each row holds the HOG features of one image.
BUT when I tried to extract LBP features, everything was different. The matrix is about 384x255 uint8 for each image (I have ~500 images). I reshaped this big matrix so that there are 500 rows, each row holding the LBP features of one image, but after classifying them with the SVM classifier the results were terrible. So, can reshaping and converting from uint8 to single change the data and affect the results?
Supposing that your array is stored in A:
B = reshape(single(A), 1, []);
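If you want to do this for all ~500 images so that the feature matrix has one row per image (matching your label vector), a minimal sketch might look like this, assuming the per-image LBP matrices are stored in a cell array lbpFeatures and your labels in a vector labels (both names are only illustrative):
numImages = numel(lbpFeatures);              % ~500 images
featLen   = numel(lbpFeatures{1});           % 384*255 values per image
X = zeros(numImages, featLen, 'single');     % one row per image
for k = 1:numImages
    X(k,:) = reshape(single(lbpFeatures{k}), 1, []);
end
% Now size(X,1) matches numel(labels), e.g.:
% model = svmtrain(X, labels);   % Statistics Toolbox syntax: svmtrain(Training, Group)
Note that the cast to single does not change the values themselves, since single can represent every uint8 value exactly.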
I have a 2D CNN model where I perform a classification task. My images all come from sensor data after conversion.
So, normally, my way is to convert them into images using the following approach:
from PIL import Image
import numpy as np

newsize = (9, 1000)
pic = acc_normalized[0]
img = Image.fromarray(np.uint8(pic*255), 'L')
img = img.resize(newsize)
image_path = "Images_Accel"
image_name = "D1." + str(2)
img.save(f"{image_path}/{image_name}.jpeg")
This is what I obtain:
However, their precision is sort of important. For instance, some of the numerical values are like:
117.79348187327987 or 117.76568758022673.
As you see in the above line, they differ only in the decimal digits; when I use uint8, both become just 117 when converted into image pixels, so they look the same, right? But I'd like to keep them distinct. In some cases, the difference only appears at the 8th or 10th decimal digit.
So, when I try to use mode 'F' in the Image.fromarray line and save as .jpeg, it gives me an error saying that PIL cannot write mode F to JPEG.
Then, I tried to first convert them to RGB like the following:
img = Image.fromarray(pic, 'RGB')
Here I am not wrapping pic in np.float32 and not multiplying it by 255; I pass it as it is. Then, I convert this image to grayscale. This is what I got for the RGB image:
After converting RGB into grayscale:
As you see, it seems that there is a critical difference between the first pic and the last pic. So, what should be the proper way to use them in 2D CNN classification? Or should I convert them into RGB and then choose grayscale (a single channel) in the CNN implementation? My image dimensions are 1000x9. I can even change this dimension, for example to 250x36 or 100x90; it doesn't matter too much. By the way, in the CNN network, I am able to get more than 90% test accuracy when I use the first type of image.
The main question here is which image conversion method will let me take those precision differences across the pixels into account. Would you give me some idea?
---- EDIT -----
Using .tiff format I made some quick comparisons.
First of all, my data looks like the following;
So, I convert this first reading into an image using the following code, where np.float64 and mode 'L' give me a grayscale image:
newsize = (9, 1000)
pic = acc_normalized[0]
img = Image.fromarray(np.float64(pic), 'L')
img = img.resize(newsize)
image_path = "Images_Accel"
image_name = "D1." + str(2)
img.save(f"{image_path}/{image_name}.tiff")
It gives me this image;
Then, the first 15x9 block looks like the following image. The contradiction is that, if you take a closer look at the numerical array, for instance the (1,4) element, it is completely black even though the numerical value there is 0.4326132099074307. For grayscale images, black means the value is close to 0, and it turns white as the value approaches 1. However, if the conversion works row-wise, there is another value closer to 0 and I was expecting to see black at the (1,5) location instead. If it works column-wise, something is wrong again. As I said, this data has already been normalized and varies between 0 and 1. So, what is the logic by which it converts the array into an image? What kind of operation does it do?
Secondly, if I first get an RGB image of the data and then convert it into a grayscale image, why am I not getting exactly the same image as the one I obtained first? Shouldn't the image coming from the direct grayscale conversion ('L' mode, np.float64) and the one coming from the RGB-based route (first RGB, then converted to grayscale) be the same? There is a difference in the black-white pixels between those images, and I do not know why.
---- EDIT 2 ----
A .tiff image with 'F' mode and np.float32 gives the following:
I don't really understand your question, but you seem to want to store image differences that are less than 1, i.e. less than the resolution of integer values.
To do so, you need to use an image format that can store floats. JPEG, PNG, GIF, TGA and BMP cannot store floats. Instead, use TIFF, EXR or PFM formats which can handle floats.
Alternatively, you can create 16-bit PNG images wherein each pixel can store values in the range 0..65535. So, say the maximum difference you wanted to store was 60: you could calculate the difference, multiply it by 1000, round it to make an integer in the range 0..60000, and store that as a 16-bit PNG.
You could record the scale factor as a comment within the image if it is variable.
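As a rough sketch of that scaling idea (written here in MATLAB; the matrix name D, the factor of 1000 and the file name are only illustrative assumptions):
scale = 1000;                      % chosen so the largest difference (~60) fits in 0..65535
D16 = uint16(round(D * scale));    % quantize the float differences to 16 bits
imwrite(D16, 'differences.png');   % uint16 data is written as a 16-bit PNG
% Recover approximate float values later with: double(imread('differences.png')) / scale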
I have a random satellite image that can be divided into 2 classes:
1) no-data values (all pixels have the same value within one image, but that value varies randomly from image to image)
2) footprint (pixel values are random)
The no-data region and the footprint together cover the image's bounding box.
What is the fastest algorithm for dividing a random satellite image into these 2 classes?
UPDATE:
Are no data value-areas always at the border of the image?
The no-data value cannot occur inside the footprint, and the no-data area may be absent entirely.
Are no data-values always black?
No, its value may vary from picture to picture, but all no-data pixels are equal to each other within one image.
Does this no data value-color appear within the footprint?
Most of the images are grayscale and may be in 16-bit or 8-bit data formats. But I need a general algorithm; a case-specific algorithm is not what I want.
UPDATE 2:
My current approach is:
1) Take every pixel value that lies on the bounding box border
2) Take the most frequent value and set it as the no-data value
3) Reclassify the image into 2 classes: NoData value - no-data class,
1 - footprint class
4) Convert raster pixels with value 1 into vector format
For big images it takes more than 5 minutes to get the vector borders of the footprint.
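A minimal MATLAB sketch of steps 1-3 above, assuming the raster has already been read into a numeric matrix I (the variable names are only illustrative):
border = [I(1,:), I(end,:), I(:,1).', I(:,end).'];   % every pixel on the image border
nodataVal = mode(border(:));                          % most frequent border value = no-data value
footprintMask = (I ~= nodataVal);                     % 1 = footprint class, 0 = no-data class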
A simple way for you would be to multiply the pixel intensities. From the images you uploaded, the no-data values are essentially of 0 intensity. Instead of going for complex methods, simply multiply the image intensities by 1000.
I used OpenCV and could segment out the regions in under 4 lines of code.
Here's an example -
Say I have a grayscale image S and I'm looking to ignore all values above 250. How do I do it without using NaN? The reason I don't want to use NaN is that I'm looking to take statistical information from the resultant image, such as the average, etc.
You can collect all image pixel intensities that are less than 250. That's effectively performing the same thing. If your image was stored in A, you can simply do:
pix = A(A < 250);
pix will be a single vector of all image pixels in A that have intensity of 249 or less. From there, you can perform whatever operations you want, such as the average, standard deviation, calculating the histogram of the above, etc.
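For example, a quick sketch using the pix vector from above:
avgIntensity = mean(double(pix));   % average of the kept pixels
stdIntensity = std(double(pix));    % standard deviation of the kept pixels
The double cast is there because some statistics functions (std in particular) do not accept integer inputs.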
Going with your post title, we can calculate the histogram of an image very easily using imhist that's part of the image processing toolbox, and so:
out = imhist(pix);
This will give you a 256 element vector where each value denotes the count for a particular intensity. If we did this properly, you should only see non-zero bin counts up to intensity 249 (location 250 in the vector). If you don't have the image processing toolbox, you can repeat the same thing using histc and manually specifying the bin cutoffs to go from 0 up to 249:
out = histc(pix, 0:249);
The difference here is that we will get a histogram of exactly 250 bins, whereas imhist will give you 256 bins by default. However, histc is soon to be deprecated and histcounts is what is recommended to use. The syntax is nearly the same, except that histcounts treats the second input as bin edges, so specify edges from 0 to 250 to get the same 250 bins:
out = histcounts(pix, 0:250);
You can use logical indexing to build a histogram only using values in your specified range. For example you might do something like:
histogram(imgData(imgData < 250))
It is well known that histeq in MATLAB can perform histogram matching so that an image's histogram is transformed to look like another histogram. I am trying to perform this same operation without using histeq. I'm aware that you need to calculate the CDFs between the two images, but I'm not sure what to do next. What do I do?
Histogram matching is concerned with transforming one image's histogram so that it looks like another. The basic principle is to compute the histogram of each image individually, then compute their discrete cumulative distribution functions (CDFs). Let's denote the CDF of the first image as F1(x) while the CDF of the second image is F2(x). Therefore, F1(x) would denote what the CDF value is at intensity x for the first image.
Once you calculate the CDFs for each of the images, you need to compute a mapping that transforms one intensity from the first image so that it is in agreement with the intensity distribution of the second image. To do this, for each intensity in the first image - let's call this x1, which will be from [0,255] assuming an 8-bit image - we must find an intensity x2 in the second image (also in the range of [0,255]) such that:
F1(x1) = F2(x2)
There may be a case where we won't get exactly an equality, so what you would need to do is find the smallest absolute difference between F1(x1) and F2(x2). In other words, for a mapping M, for each entry x1, we must find an intensity x2 such that:
M(x1) = argmin over x2 of |F1(x1) - F2(x2)|
You would do this for all 256 values, and we would produce a mapping. Once you find this mapping, you simply have to apply this mapping on the first image to get it to look like the intensity distribution of the second image. A rough (and perhaps inefficient) algorithm would look something like this. Let im1 be the first image (of type uint8) while im2 is the second image (of type uint8):
M = zeros(256,1,'uint8'); %// Store mapping - Cast to uint8 to respect data type
hist1 = imhist(im1); %// Compute histograms
hist2 = imhist(im2);
cdf1 = cumsum(hist1) / numel(im1); %// Compute CDFs
cdf2 = cumsum(hist2) / numel(im2);
%// Compute the mapping
for idx = 1 : 256
[~,ind] = min(abs(cdf1(idx) - cdf2));
M(idx) = ind-1;
end
%// Now apply the mapping to get first image to make
%// the image look like the distribution of the second image
out = M(double(im1)+1);
out should contain your matched image, where the intensity distribution of the first image is transformed to match that of the second image. Take special care of the out statement. The intensity range of im1 spans [0,255], but MATLAB's indexing for arrays starts at 1. Therefore, we need to add 1 to every value of im1 so we can properly index into M to produce our output. However, im1 is of type uint8, and MATLAB saturates values should you try to go beyond 255. As such, to ensure that we get to 256, we must cast to a data type with more than 8-bit precision. I decided to use double; then, when we add 1 to every value in im1, the values span 1 to 256 and we can properly index into M. Also take note that when I find the location that minimizes the difference, I must subtract 1, since the intensities span [0,255] while MATLAB indices start at 1.
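A quick usage sketch (the file names are placeholders, and both inputs are assumed to be grayscale uint8 images):
im1 = imread('input.png');       %// image whose intensities we want to remap
im2 = imread('reference.png');   %// image whose distribution we want to match
%// ... run the mapping code above to compute M and out ...
imshow(out);                     %// display the matched result
%// imhist(out) should now resemble imhist(im2)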
I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.
I have used a grayscale image with size 512x512 in tif format. My problem is that the size of the compressed image (length of the compressed codeword) is getting bigger than the size of the uncompressed image. The compression ratio is getting less than 1.
clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict); %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp) %computing the compression ratio
%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);
decomp=reshape(I11,M,N);
imshow(decomp);
There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel, which is the normalized histogram. You're not computing it properly. Specifically:
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
total is summing over the symbol values [0,255], which is not correct. You're supposed to compute the probability distribution of your image, and you should use imhist for that. As such, do this instead:
count = 0:255;
p = imhist(A1) / numel(A1);
This will correctly calculate your probability distribution for your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering a pixel. Assuming that each pixel can equally be likely to be chosen, this is captured by calculating the image's histogram, then normalizing by the total number of pixels in your image. Try that and see if you get any better results.
However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?
If the spread is quite large, with very few entries per bin, then Huffman will not give you any compression savings. In fact it may give you a larger size as a result. Bear in mind that the TIFF compression standard only uses Huffman as part of the algorithm. There is also some pre- and post-processing done to further drive down the size.
As a further example, suppose I had an image that consisted of [0, 1, 2, ... 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]; I have 3 rows of [0,255], but really it could be any number of rows. This means that each symbol is equiprobable with probability 1/256, which means that we would need 8 bits per symbol... which is essentially the raw pixel value anyway!
The key behind Huffman is that a group of bits together generates one symbol, and frequently occurring symbols get assigned shorter bit sequences. Because this particular image has equiprobable intensities, every intensity ends up with a code word of the same full length rather than a shorter one. With this, not only will you transmit the dictionary, you would effectively be sending one full-length code per pixel, and this is no better than sending the raw byte stream.
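You can check this directly with huffmandict: for 256 equally likely intensities the average code length comes out to exactly 8 bits, i.e. no saving over the raw pixels. A small sketch:
symbols = 0:255;                        % all possible 8-bit intensities
p = ones(1, 256) / 256;                 % equiprobable distribution
[dict, avglen] = huffmandict(symbols, p);
avglen                                  % returns 8 -- same cost as raw 8-bit pixels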
If you want your image to be compressed well by raw Huffman, the distribution of pixels has to be skewed - for example, if most of the intensities in your image are dark, or most are bright. If your image has good contrast, or if the spread of the pixel intensities is flat throughout the image, then Huffman will not give you any compression savings.