Should I resize images for Mask R-CNN?

I am training a custom object detector using Mask R-CNN. My images come in different sizes, so I am wondering whether I need to resize them so that they are all the same size.
And if so, which method should I use to resize them?
Also, I guess I have to resize before labeling the images, right?

You don't necessarily have to resize them beforehand.
You can use this option in the model config file to set the size limits for training:
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
Please make sure all the bounding boxes lie within the image dimensions, i.e. within the width and height of the image. The boxes and the images will then be resized automatically according to the parameters set here.
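As a rough illustration (this helper is not part of the TF Object Detection API; the pixel-coordinate box format is an assumption), a pre-training sanity check could look like this:
# Hypothetical sanity check: flag annotations that fall outside their image.
# Assumes boxes are (xmin, ymin, xmax, ymax) in pixel coordinates.
def find_out_of_range_boxes(annotations):
    bad = []
    for ann in annotations:  # each ann: {'filename', 'width', 'height', 'boxes'}
        w, h = ann["width"], ann["height"]
        for (xmin, ymin, xmax, ymax) in ann["boxes"]:
            if not (0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h):
                bad.append((ann["filename"], (xmin, ymin, xmax, ymax)))
    return bad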

In Matterport's Mask R-CNN you can find this documentation in the config file:
# Input image resizing
# Generally, use the "square" resizing mode for training and predicting
# and it should work well in most cases. In this mode, images are scaled
# up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
# scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
# padded with zeros to make it a square so multiple images can be put
# in one batch.
# Available resizing modes:
# none: No resizing or padding. Return the image unchanged.
# square: Resize and pad with zeros to get a square image
# of size [max_dim, max_dim].
# pad64: Pads width and height with zeros to make them multiples of 64.
# If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
# up before padding. IMAGE_MAX_DIM is ignored in this mode.
# The multiple of 64 is needed to ensure smooth scaling of feature
# maps up and down the 6 levels of the FPN pyramid (2**6=64).
# crop: Picks random crops from the image. First, scales the image based
# on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
# size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
# IMAGE_MAX_DIM is not used in this mode.
IMAGE_RESIZE_MODE = "square"
IMAGE_MIN_DIM = 800
IMAGE_MAX_DIM = 1024
As I understand it, this configuration is applied when you train or predict, without you having to resize anything manually. Of course, if your images have different sizes and aspect ratios, this can be a problem:
512x512: ratio = 1, so this will be upscaled to 1024x1024.
2054x2456: ratio = 0.836..., so this will be downscaled until the long side is 1024, keeping the aspect ratio of 0.836..., and then zero-padded into the 1024x1024 square.
Where it could go wrong is when an object's relative size varies with the different image dimensions, which can result in stretched or compressed objects. In that case you should preprocess the images manually, so that your objects end up with the same size and shape after Mask R-CNN has molded them into the required shape.
The Matterport function is found in "utils.py" and is called "resize_image".
In "model.py" it is used during training when loading the data, and during inference (detect) to reshape the given numpy array.

Related

Reshaping greyscale images for neural network training - how to do this correctly

I have a general question about convolutional neural networks and image preprocessing for training when your images are grayscale.
Take this image for example:
It's a grayscale image, but when I do
image = cv2.imread("image.jpg")
print(image.shape)
I get
(1024, 1024, 3)
I know that OpenCV automatically creates 3 channels for JPEG images. But for network training it would be much more computationally efficient if I could use images of shape (1024, 1024, 1), just like many of the MNIST tutorials demonstrate. However, if I reshape this:
reshaped_image = image.reshape(1024, 1024, 1)
And then try for example to show the image
plt.axis("off")
plt.imshow(reshaped_image)
plt.show()
I get
raise TypeError("Invalid dimensions for image data")
Does that mean that reshaping my images this way before network training is incorrect? I want to keep as much information in the image as possible but I don't want to have those extra channels if they aren't needed.
The reason that you're getting the error is that the output of your reshape does not have the same number of elements as the input. From the documentation for reshape:
No extra elements are included into the new matrix and no elements are excluded. Consequently, the product rows*cols*channels() must stay the same after the transformation.
Instead, use cvtColor to convert your 3-channel BGR image to a 1-channel grayscale image:
In Python:
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Or in C++:
cv::cvtColor(image, image, cv::COLOR_BGR2GRAY);
You could also avoid conversion altogether by reading the image using the IMREAD_GRAYSCALE flag:
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
or
image = cv2.imread(image_path, 0)
(Thanks to @Alexander Reynolds for the Python code.)
This worked for me.
import cv2
import numpy as np

X = []
for image_path in image_paths:  # list of image file paths
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    X.append(img)

X = np.array(X)                # shape: (num_images, height, width)
X = np.expand_dims(X, axis=3)  # shape: (num_images, height, width, 1)
Set axis to the integer position where the new dimension should be inserted: axis=0 prepends it at the front, while axis=3 here appends the channel dimension at the end.
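A quick illustration of what expand_dims does to the shape (the pixel data is untouched; the sizes here are made up):
import numpy as np

X = np.zeros((5, 1024, 1024))           # 5 hypothetical grayscale images
print(np.expand_dims(X, axis=3).shape)  # (5, 1024, 1024, 1)
print(np.expand_dims(X, axis=0).shape)  # (1, 5, 1024, 1024)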

Image resizing method during preprocessing for neural network

I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either the width or the height and then cropping (all resulting images will be the same size, e.g. 100x100), but critical parts of the image can be cropped away (second solution).
I am thinking of another way of doing it, but I am not sure whether it works. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that its total pixel count is around 10000. So images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes, like 52x198, only the first 10000 of the 10296 pixels will be considered (third solution).
The third solution seems to preserve the original shape of the image. However, it may lose that originality once converted into a row, since the images do not share the same dimensions. I wonder about your comments on this issue. It would also be great if you could direct me to sources where I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.
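A minimal sketch of the sliding-window idea from Solution 2, assuming you already have a classify(patch) function that returns class scores (the window size, stride, and averaging strategy are all illustrative choices):
import numpy as np

def classify_with_windows(image, classify, win=100, stride=50):
    # Run the classifier on overlapping win x win crops of the image.
    h, w = image.shape[:2]
    scores = []
    for y in range(0, max(h - win, 0) + 1, stride):
        for x in range(0, max(w - win, 0) + 1, stride):
            scores.append(classify(image[y:y + win, x:x + win]))
    # One simple way to combine the per-window results: average the scores.
    return np.mean(scores, axis=0)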

Image resizing without quality loss?

Say I have, for example, an image of size 400x600. I know how to resize it to 80x80 using the code below:
original_image = imread(my_image);
original_image_gray = rgb2gray(original_image);
Image_resized = imresize(original_image_gray, [80 80]);
But I think imresize will resize the image with some loss of quality. How can I resize it without losing any quality?
Image resizing itself will lose part of the image information, i.e. the quality of the image.
What you can do is to choose the resizing method that fits your purpose by setting up the corresponding parameter:
[...] = imresize(...,method)
^^^^^^
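The same trade-off exists outside MATLAB. For instance, a rough OpenCV/Python analogue of imresize's method argument is the interpolation flag of cv2.resize:
import cv2

img = cv2.imread("my_image.jpg", cv2.IMREAD_GRAYSCALE)

# Each interpolation method trades sharpness against artifacts differently.
nearest = cv2.resize(img, (80, 80), interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, (80, 80), interpolation=cv2.INTER_LINEAR)
bicubic = cv2.resize(img, (80, 80), interpolation=cv2.INTER_CUBIC)
area = cv2.resize(img, (80, 80), interpolation=cv2.INTER_AREA)  # often best for downscaling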
MATLAB stores images as pixel arrays. It is impossible to store all the information contained in a 400x600 element matrix in an 80x80 matrix, so quality loss is unavoidable when resizing the pixel array, which is what imresize does.
If you want to reduce the physical size of your output, you should look at the imwrite documentation, in particular at the XResolution and YResolution parameters when creating PNG images.
original_image = imread(my_image);
original_image_gray = rgb2gray(original_image);
imwrite(original_image_gray,'image.png','png','ResolutionUnit','cm','XResolution',400)
The above code will create a PNG of the original image with a resolution of 400 px/cm, resulting in an image 1 cm wide. The PNG will still be a 400x600 px bitmap.

Normalization of handwritten characters w.r.t size and position

I am doing a project on offline handwriting recognition. In the preprocessing stage, I need to normalize the handwritten part of a binary image with respect to its size and position. Can anyone tell me how to access just the writing (the black pixels) in the image and resize and shift its position?
Your problem is as broad as the field of image processing. There is no single way to segment an image into foreground and background, so any solution you find here will work in some cases and not in others. That said, the most basic way to segment a grayscale image is:
% invert your grayscale so text is white and background is black
gray_im = 1 - im2double(gray_im);
% compute the best global threshold
level = graythresh(gray_im);
% convert grayscale image to black and white based on best threshold
bw_im = im2bw(gray_im, level);
% find connected regions in the foreground
CC = bwconncomp(bw_im);
% if necessary, get the properties of those connected regions for further analysis
S = regionprops(CC);
Note: Many people have much more sophisticated methods to segment and this is by no means the best way of doing it.
After post-processing, you will end up with one or more images, each containing only a single character. To resize to a specific size M x N, use:
resized_bw = imresize(single_char_im, [M N]);
To shift its position, the easiest way I know of is to use the circshift() function:
shifted_bw = circshift(resized_bw, [shift_pixels_up_down, shift_pixels_left_right]);
Note: circshift wraps the shifted columns or rows around, so if your bounding box is too tight, the best method is to pad your image and re-crop it at the new location.
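In NumPy terms, the pad-then-crop idea looks roughly like this (a sketch of the concept, not the MATLAB answer itself):
import numpy as np

def shift_without_wrap(img, dy, dx):
    # Translate a 2-D image: positive dy moves content down, positive dx right.
    # Exposed borders are zero-filled instead of wrapping around like circshift.
    h, w = img.shape
    pad = max(abs(dy), abs(dx), 1)
    padded = np.pad(img, pad, mode="constant")  # zero border all around
    return padded[pad - dy:pad - dy + h, pad - dx:pad - dx + w]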

Plotting several sequences of frames from tiff files with a chosen size

I am analyzing some experimental data in the form of multi-frame .tiff files. Within these tiff files I need to visualize and compare some specific sequences of frames. I want to generate a figure that contains the frames I have chosen from the files I have chosen. The file list and frame index list are generated with a user interface, which calls a plot function once the parameters are filled in.
The problem: what is the best way to plot the chosen frames at an optimal size while keeping the images square (like the originals)? Put more simply, how do I choose the position and size of each frame I plot in the figure?
I have tried subplot: it works, but I can't manage to control the image size.
pos = 0;
for j = 1:length(file_list)
    for i = 1:length(index_list)
        pos = pos + 1;
        subplot(size(file_list,1), length(index_list), pos)
        a = imagesc(imread(file_list{j,:}, index_list(i)));
    end
end
I have also tried
for j = 1:length(file_list)
    for i = 1:length(index_list)
        a = imagesc(imread(file_list{j,:}, index_list(i)));
        set(gca, 'Units', 'Pixels', 'Position', [10+100*i 10+100*j 100 100]);
    end
end
But it seems like I can't set this individually without overwriting the previous modification.
Finally, I have considered using montage, but the way I save the images in a list doesn't seem to be right.
frm_list = zeros(1, length(FL)*length(index_list));
for j = 1:length(FL)
    for i = 1:length(index_list)
        a = imread(FL{j,:}, index_list(i));
        frm_list = [frm_list a];
    end
end
montage(frm_list,'Size', [length(FL) length(index_list)]);
Thanks
JC
You can use axis image to keep the same aspect ratio as the original image.
subplot('Position', [left bottom width height]) lets you specify the position of each image relative to the figure window.
If you want to use a command other than imagesc, you can scale the data range of the image before drawing, then use colormap to apply false coloring to the image.
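If it helps to see the same layout idea outside MATLAB, here is a rough matplotlib equivalent: figure-relative [left, bottom, width, height] boxes with square pixels preserved (the frame data and grid spacing are made up):
import matplotlib.pyplot as plt
import numpy as np

frames = [np.random.rand(64, 64) for _ in range(6)]  # placeholder square frames

fig = plt.figure(figsize=(6, 4))
for k, frame in enumerate(frames):
    row, col = divmod(k, 3)
    # [left, bottom, width, height] as fractions of the figure,
    # like MATLAB's subplot('Position', [...]).
    ax = fig.add_axes([0.05 + 0.32 * col, 0.55 - 0.45 * row, 0.28, 0.38])
    ax.imshow(frame, aspect="equal")  # square pixels, like `axis image`
    ax.axis("off")
plt.show()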
