Good day,
In MATLAB, I have multiple image pairs of various samples. The images in a pair are taken by different cameras. The images are in differing orientations, though I have created transforms (one per image pair) that can be applied to correct that. Their bounds contain the same physical area, but one image has smaller dimensions (e.g. 50x50 versus 250x250). Additionally, the smaller image is not in a consistent location within the larger image, although it always lies within the borders of the larger image.
What I'd like to do is as follows: after applying my pre-determined transform to the larger image, I want to crop the part of the larger image that covers the same area as the smaller image.
I know I can specify XData and YData when applying my transforms to output a subset of the transformed image, but I don't know how to relate that to the location of the smaller image. (Note: Transforms were created from control-point structures)
Please let me know if anything is unclear.
Any help is much appreciated.
Seeing how you are specifying control points to get the transformation from one image to another, I'm assuming this is a registration problem. As such, I'm also assuming you are using imtransform to warp one image to another.
imtransform allows you to specify two additional output parameters:
[out, xdata, ydata] = imtransform(in, tform);
Here, in would be the smaller image and tform would be the transformation you created to register the smaller image to the larger image. You don't need to specify the XData and YData inputs here. Those inputs bound the region of output space in which the transformed image is computed. People usually set them to the dimensions of the image so that the output is always contained within the image borders. In your case, however, I don't believe this is necessary.
The output variable out is the warped image produced by your tform object. The other two output variables, xdata and ydata, are the minimum and maximum x and y values within your co-ordinate system that fully encompass the transformed image. As such, you can use these variables to locate exactly where in the larger image the transformed smaller image appears. If you want to do a comparison, you can use them to crop the larger image and see how well the transformation worked.
NB: Sometimes the limits of xdata and ydata will go beyond the dimensions of your image. However, because you said that the smaller image will always be contained within the larger image (I'm assuming fully contained), this shouldn't be a problem. The limits may also be floating point, so you'll need to be careful if you want to use these co-ordinates to crop a minimum spanning bounding box.
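One way to deal with floating-point limits is sketched below, in Python rather than MATLAB so the arithmetic is explicit. The helper name crop_bounds, the example limits, and the 1-based inclusive indexing are assumptions made for illustration. The sketch rounds the limits outward so the box fully covers the transformed image, then clamps them to the larger image.

```python
import math

def crop_bounds(xdata, ydata, width, height):
    """Turn floating-point (min, max) limits into integer crop indices.

    Indices are 1-based and inclusive, as in MATLAB. The limits are rounded
    outward so the box fully covers the transformed image, then clamped to
    the bounds of the larger image (width x height).
    """
    x0 = max(1, math.floor(xdata[0]))
    x1 = min(width, math.ceil(xdata[1]))
    y0 = max(1, math.floor(ydata[0]))
    y1 = min(height, math.ceil(ydata[1]))
    if x0 > x1 or y0 > y1:
        raise ValueError("limits lie entirely outside the image")
    return x0, x1, y0, y1

# Hypothetical limits for a small image registered into a 250x250 image.
x0, x1, y0, y1 = crop_bounds((100.4, 150.7), (20.2, 70.9), 250, 250)
print(x0, x1, y0, y1)
```

In MATLAB the resulting indices would be used in an ordinary index expression on the larger image, rows first (y0:y1) and columns second (x0:x1).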
I want 2 pixels to stay fixed, and not be changed, in the fake images produced by the generator.
I have a feeling that masking those specific pixels is needed, but I don't know exactly how to do that in the GAN training procedure. Even if I mask the pixels before the training loop starts, I do not know how to keep the fixed pixels from being generated along with the rest of the scene. I wish someone could code up a bit to show how it should be done.
Or do I just need to train it and then replace the two generated pixel locations with the original pixel values that I want to keep fixed? But this must distort the other pixel values, because the two pixels were generated as well: they should not have been affected, and they in turn affect the other pixels.
I am currently doing image registration using the 'Registration estimator' application.
Basically, the application allows the user to register two images using multiple methods, and the output includes a transformation matrix.
The problem is that I now want to register two large images, of sizes 63744*36064 and 64704*35072. It's almost impossible to register them directly because they are too large.
The method I use is to first downscale the images, register the scaled images to derive a transformation matrix, and then apply that matrix to the original images.
However, I found that even for the same image pair, different transformation matrices are obtained at different scales.
For example, the transformation matrix for the images at 1/16 scale (3984*2254 and 4044*2192) is different from the one at 1/32 scale (1992*1127 and 2022*1096).
Given that, I am confused about the relationship between image size and the transformation matrix. Could anyone give me a hint so that I can precisely register the two original images based on the transformation matrix I have for the images at a lower level (smaller size)?
Downsampling an image has a direct effect on the transformation matrix. Suppose, for example, that there is a 2-pixel translation in the x direction; downsampling by a factor of 2 changes it to 1 pixel. While it's easy to compensate for this effect when registering the original images, you should avoid downsampling as a way around memory constraints, since you may lose valuable key-points needed for robust registration. Instead, you can slice your images up into several sub-images, extract the features in each sub-image, combine the features and match them.
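The compensation can be sketched as follows, in plain Python so the matrix arithmetic is visible. This assumes column-vector homogeneous coordinates (q = T p), a moving image downsampled by k_moving and a fixed image downsampled by k_fixed, and it ignores any half-pixel offset from pixel-center conventions. The function names are made up for illustration. MATLAB's affine2d stores the transposed, row-vector form of the same matrix, so transpose accordingly.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale_matrix(s):
    """Homogeneous 2-D scaling by factor s."""
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def upscale_transform(t_small, k_moving, k_fixed):
    """Convert a transform estimated on downsampled images to full resolution.

    Chain: full-res moving -> low-res moving -> low-res fixed -> full-res fixed,
    i.e. T_full = S(k_fixed) * T_small * S(1 / k_moving).
    """
    return matmul3(scale_matrix(k_fixed),
                   matmul3(t_small, scale_matrix(1.0 / k_moving)))

# A pure translation of (1, 2) pixels found at half resolution ...
t_small = [[1.0, 0.0, 1.0],
           [0.0, 1.0, 2.0],
           [0.0, 0.0, 1.0]]
# ... corresponds to a translation of (2, 4) pixels at full resolution.
print(upscale_transform(t_small, 2.0, 2.0))
```

When both images are downsampled by the same factor, the rotation/scale part of the matrix is unchanged and only the translation terms are multiplied by that factor, which would explain why matrices estimated at 1/16 and 1/32 scale differ.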
I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either width or height and then crop it (all resultant images will be of the same size as 100x100), but critical parts of the image can be cropped (second solution).
I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that the total pixel count is around 10000 pixels. So, images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes like 52x198, the first 10000 pixels out of 10296 will be used (third solution).
The third solution I mentioned above seems to preserve the original shape of the image. However, it may be losing all of this originality while converting into a row since not all images are of the same size. I wonder about your comments on this issue. It will also be great if you can direct me to sources I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
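A minimal sketch of the sub-window idea, assuming 100 x 100 windows, a stride of 50, and an image at least as large as the window on both axes. The function names are hypothetical, and averaging the per-window class scores is only one of several reasonable ways to combine results.

```python
def window_origins(length, window, stride):
    """Start offsets of windows of size `window` along an axis of `length`.

    The last window is shifted back so it ends exactly at the border,
    which keeps every pixel covered by at least one window.
    """
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts

def sliding_windows(width, height, window=100, stride=50):
    """Yield (left, top, right, bottom) boxes covering the image."""
    for top in window_origins(height, window, stride):
        for left in window_origins(width, window, stride):
            yield (left, top, left + window, top + window)

def average_scores(per_window_scores):
    """Combine per-window class scores by taking the mean per class."""
    n = len(per_window_scores)
    return [sum(column) / n for column in zip(*per_window_scores)]

# A 250 x 100 image gives four horizontally overlapping windows.
for box in sliding_windows(250, 100):
    print(box)
```

Each box would be cropped from the image and classified separately. A smaller stride gives more overlap and more classifications per image.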
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
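To make the adjacency problem concrete, here is a small sketch that assumes row-major flattening and reads the sizes from the question as rows x columns. The same position in a 10000-element row maps to a different (row, column) depending on the image width, so the pixel "directly below" a given pixel sits at a different offset in each case.

```python
def flat_to_row_col(index, width):
    """Map a flat row-major index back to (row, column)."""
    return divmod(index, width)

def index_below(index, width):
    """Flat index of the pixel directly below `index`."""
    return index + width

for rows, cols in [(100, 100), (50, 200), (250, 40)]:
    assert rows * cols == 10000
    print((rows, cols),
          "flat index 250 is at", flat_to_row_col(250, cols),
          "and its lower neighbor is flat index", index_below(250, cols))
```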
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.
I have roughly 160 images for an experiment. Some of the images, however, have clearly different levels of brightness and contrast compared to others. For instance, I have something like the two pictures below:
I would like to equalize the two pictures in terms of brightness and contrast (probably find some level in the middle and not equate one image to another - though this could be okay if that makes things easier). Would anyone have any suggestions as to how to go about this? I'm not really familiar with image analysis in Matlab so please bear with my follow-up questions should they arise. There is a question for Equalizing luminance, brightness and contrast for a set of images already on here but the code doesn't make much sense to me (due to my lack of experience working with images in Matlab).
Currently, I use Gimp to manipulate images but it's time consuming with 160 images and also just going with subjective eye judgment isn't very reliable. Thank you!
You can use histeq to perform histogram specification where the algorithm will try its best to make the target image match the distribution of intensities / histogram of a source image. This is also called histogram matching and you can read up about it on my previous answer.
In effect, the distribution of intensities between the two images should hopefully be the same. If you want to take advantage of this using histeq, you can specify an additional parameter that specifies the target histogram. Therefore, the input image would try and match itself to the target histogram. Something like this would work assuming you have the images stored in im1 and im2:
out = histeq(im1, imhist(im2));
However, imhistmatch is the better function to use. You call it almost the same way as histeq, except that you don't have to compute the histogram manually. You just pass the actual image you want to match against:
out = imhistmatch(im1, im2);
Here's a running example using your two images. Note that I'll opt to use imhistmatch instead. I read in the two images directly from StackOverflow, I perform a histogram matching so that the first image matches in intensity distribution with the second image and we show this result all in one window.
im1 = imread('http://i.stack.imgur.com/oaopV.png');
im2 = imread('http://i.stack.imgur.com/4fQPq.png');
out = imhistmatch(im1, im2);
figure;
subplot(1,3,1);
imshow(im1);
subplot(1,3,2);
imshow(im2);
subplot(1,3,3);
imshow(out);
This is what I get:
Note that the first image now more or less matches in distribution with the second image.
We can also flip it around and make the first image the source and we can try and match the second image to the first image. Just flip the two parameters with imhistmatch:
out = imhistmatch(im2, im1);
Repeating the above code to display the figure, I get this:
That looks a little more interesting. We can definitely see the shape of the second image's eyes, and some of the facial features are more pronounced.
As such, what you can do in the end is choose a good representative image that has the best brightness and contrast, then loop over each of the other images and call imhistmatch each time with this image as the reference, so that the other images match their distribution of intensities to it. I can't really write code for this because I don't know how you are storing these images in MATLAB. If you share some of that code, I'd love to write more.
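As a rough, language-neutral sketch of that loop, here is a CDF-based histogram matching routine in plain Python. It assumes each image is stored as a flat list of 8-bit grayscale values in a dictionary keyed by a made-up name. It illustrates the general technique and is not a reproduction of what imhistmatch does internally.

```python
from bisect import bisect_left

def cdf(pixels, levels=256):
    """Cumulative distribution of integer intensities in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total, running, out = len(pixels), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out

def match_histogram(pixels, reference, levels=256):
    """Remap `pixels` so their distribution approximates that of `reference`."""
    src_cdf, ref_cdf = cdf(pixels, levels), cdf(reference, levels)
    # For each source level, pick the first reference level whose CDF reaches it.
    lookup = [min(bisect_left(ref_cdf, c), levels - 1) for c in src_cdf]
    return [lookup[p] for p in pixels]

# Hypothetical storage: name -> flat list of grayscale values.
images = {
    "reference": [10, 10, 200, 200],
    "dark": [0, 0, 1, 1],
}
reference = images["reference"]
matched = {name: match_histogram(px, reference) for name, px in images.items()}
print(matched["dark"])
```

The reference image maps onto itself, so it is harmless to leave it in the loop.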
I'm trying to develop a mobile application, and I'm wondering what the easiest way is to convert an image into a text file and then recreate it later in memory from said text. The image(s) in question will contain no more than 16 or so colors, so it should work out fine.
Basically, a brute-force solution would require me to save each individual pixel's color data to a file. However, this would result in a HUGE file. I know there's a better way: for example, if a large portion of the image consists of the same color, breaking that area up into smaller squares and rectangles and saving their coordinates and sizes to the file.
Here's an example. The image is supposed to be just black/white. The big color boxes represent theoretical 'data points' in the outputted text file. These boxes would really state their origin, size, and what color they should be.
E.g., top box has an origin of 0,0, a size of 359,48, and it represents the color black.
Saved in a text file, the data would be 0,0,359,48,0.
What kind of algorithm would this be?
NOTE: The SDK that I am using cannot return a pixel's color from an X,Y coordinate. However, I can load external information into the program from a text file and manipulate it that way. This data that I need to export to a text file will be from a different utility that will have the capability to get a pixel's color from X,Y coordinates.
EDIT: Added a picture
EDIT2: Added constraints
Could you elaborate on why you want to save an image (or its parts) as plain text? Can't you use a binary representation instead? Also, if images typically have lots of contiguous runs of pixels of same color, you may want to use the so-called run-length encoding (RLE). Alternatively, one of Lempel-Ziv-something compression algorithms could be used (LZ77, LZ78, LZW).
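For reference, run-length encoding can be sketched in a few lines of Python. The "countxcolor" text format (e.g. 3x0 for three pixels of color 0) is invented here for illustration, and the pixels are assumed to be a flat sequence of small integer color indices.

```python
def rle_encode(pixels):
    """Collapse a sequence of color indices into [count, color] runs."""
    runs = []
    for color in pixels:
        if runs and runs[-1][1] == color:
            runs[-1][0] += 1
        else:
            runs.append([1, color])
    return runs

def rle_decode(runs):
    """Expand [count, color] runs back into the original sequence."""
    out = []
    for count, color in runs:
        out.extend([color] * count)
    return out

def runs_to_text(runs):
    """Serialize runs as space-separated 'countxcolor' tokens."""
    return " ".join(f"{count}x{color}" for count, color in runs)

def text_to_runs(text):
    """Parse the text produced by runs_to_text."""
    return [[int(part) for part in token.split("x")] for token in text.split()]

row = [0, 0, 0, 1, 1, 0]
text = runs_to_text(rle_encode(row))
print(text)
assert rle_decode(text_to_runs(text)) == row
```

With at most 16 colors and large uniform areas, most rows collapse to a handful of tokens.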
Compress the image into a compressed format (e.g. JPEG, PNG, GIF, etc) and then save it as a .txt file or whatever. To recreate the image, just read in the file into your program using whatever library function suits your particular needs.
If it's necessary that the .txt file have some textual meaning, then you may be in some trouble.
In computer science, a spatial index recursively subdivides a plane into four tiles. If the cells at each level all have the same size, the structure is a quadtree. If you want to subdivide a plane into a pattern (of colors), you can use this tiling idea and change the cell size dynamically. A good place to start is a Z-order curve or a Hilbert curve.
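A minimal quadtree-style sketch of this, assuming the image is a list of rows of small integer color indices. It emits rectangles in the x,y,width,height,color form used in the question. The function name and the split rule (the left/top halves take the extra pixel) are choices made for illustration only.

```python
def quadtree_rects(image, x=0, y=0, w=None, h=None):
    """Split a 2-D grid of color indices into uniformly colored rectangles.

    Returns a list of (x, y, width, height, color) tuples that tile the
    requested region without overlap.
    """
    if w is None:
        h, w = len(image), len(image[0])
    if w == 0 or h == 0:
        return []
    first = image[y][x]
    if all(image[r][c] == first
           for r in range(y, y + h) for c in range(x, x + w)):
        return [(x, y, w, h, first)]
    hw, hh = (w + 1) // 2, (h + 1) // 2  # left/top halves take the extra pixel
    rects = []
    for qx, qy, qw, qh in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                           (x, y + hh, hw, h - hh),
                           (x + hw, y + hh, w - hw, h - hh)]:
        rects.extend(quadtree_rects(image, qx, qy, qw, qh))
    return rects

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
for rect in quadtree_rects(image):
    print(",".join(str(v) for v in rect))
```

A plain quadtree does not merge neighboring cells of the same color, so it can emit more rectangles than strictly necessary. A merging pass afterwards would reduce the count.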