Compare 2 rotated images that compose a 360-degree image

I would like to compare two images where the second is a 60-degree rotation of the first. Imagine two snapshots captured from the same position (together they compose a 360-degree image), but the second is rotated 60 degrees relative to the first.
I would like to find the common part (the overlap in pixels) of the two images, as my final goal is to create a single panoramic image from them.
Is this possible? If so, how?

The common way is to detect features in the two images and then match them (find which ones exist in both images). The matching is usually refined with RANSAC, as there will be outliers among the matches. See this tutorial, which shows how to do that in Python.
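A minimal OpenCV sketch of that approach (ORB features, brute-force matching, and a RANSAC-estimated homography); the file names are placeholders for your two snapshots.
import cv2
import numpy as np

# Load the two overlapping snapshots (placeholder file names).
img1 = cv2.imread("view_0.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_60.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors in both images.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

# Estimate a homography with RANSAC to reject outlier matches.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# The inlier matches (mask == 1) mark the overlapping region; H can then be
# used to warp one image onto the other and blend them into a panorama.
print("Inliers:", int(mask.sum()), "of", len(matches))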

Related

Fluctuation in bounding box with live feed

I'm detecting objects from the live feed of a camera. The backend model is ssd_mobilenet_v2. If I capture an image and feed it to the model 10 times, I get a bounding box of the same size every time. But when I feed a live video to the model (without changing anything in the frame), I get a bounding box of a different size with every frame (a variation of 4 to 5 pixels at a resolution of 640x480). The reason, I think, is that due to tiny variations in the digital camera sensor, no two frames will be 100% the same; some pixels will most certainly have different intensity values. In the link below the user has used GaussianBlur to average pixel intensities across a 21 x 21 region. Is this the only way to fix this, or is there a better way?
I'm using a Raspberry Pi camera to get the video feed.
https://www.pyimagesearch.com/2015/05/25/basic-motion-detection-and-tracking-with-python-and-opencv/
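A minimal sketch of the pre-blurring approach described above, assuming OpenCV and that the Pi camera shows up as an ordinary video device; run_ssd_mobilenet is a hypothetical stand-in for the detector.
import cv2

cap = cv2.VideoCapture(0)  # assumption: the Pi camera is exposed as a normal video device

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Average pixel intensities over a 21 x 21 neighbourhood so that
    # frame-to-frame sensor noise is suppressed before detection.
    blurred = cv2.GaussianBlur(frame, (21, 21), 0)
    # boxes = run_ssd_mobilenet(blurred)  # hypothetical detector call
    # Smoothing the box coordinates over time (e.g. a running average of the
    # last few frames) is another way to damp the remaining 4-5 px jitter.

cap.release()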

What is the difference between cropping, resizing and scaling an image?

I am using Perl's Image::Imlib2 package to generate thumbnails from larger images.
I've done such tasks before with several ImageMagick interfaces (PHP, Ruby, Python) and it was relatively easy. I have no prior experience with Imlib2, and it has been a long time since I wrote anything in Perl, so I am sorry if this seems naive!
This is what I've tried so far. It is simple, and assumes that scaling an image will keep the aspect ratio, and the generated thumbnail will be an exact miniature copy of the original image.
use strict;
use warnings;
use Image::Imlib2;

my $dir = 'imgs/*';
my @files = glob($dir);

foreach my $img ( @files ) {
    my $image = Image::Imlib2->load($img);
    my $cropped_image = $image->create_scaled_image(50, 50);
    $cropped_image->save($img);
}
Original image
Generated image
My first look at the image tells me that something is wrong. It may be my ignorance of cropping, resizing and scaling, but the generated image displays incorrectly on small screens.
I've read "What's the difference between cropping and resizing?", and honestly didn't understand anything. Also this one: "Image scaling".
Could someone explain the differences between those three ideas, and if possible give examples (preferably with Perl) to achieve better results? Or at least describe what I should consider when I want to create thumbnails?
The code you use isn't preserving the aspect-ratio. From Image::Imlib2::create_scaled_image
If x or y are 0, then retain the aspect ratio given in the other.
So change the line
my $cropped_image = $image->create_scaled_image(50, 50);
to
my $scaled_image = $image->create_scaled_image(50, 0);
and the new image will be 50 pixels wide, with its height computed so as to keep the original aspect ratio.
Since this is not cropping I've changed the variable name as well.
As for your other questions, below is a basic discussion drawn from the comments. Please search for tutorials on image processing; the documentation of major libraries also often has short, good explanations.
This is aggregated from comments deemed helpful. Also see Borodin's short and clear answer.
Imagine that you want to draw a picture (of some nice photograph) yourself in the following way. You draw a grid of, say, 120 (horizontally) by 60 (vertically) boxes. So 120 x 60, 7200 boxes. These are your "pixels," and each you may fill with only one color. If the photo you are re-drawing is "mostly" blue at some spot, you color that pixel blue. Etc. It is not easy to end up with a faithful redrawing -- the denser the pixels the better.
Now imagine that you want to draw another copy of this, just smaller. If you make it 20x20 that will be completely different, since it's a square. The best chance of getting it to "look the same" is to pick 2-to-1 ratio (like 120x60), so say 40x20. That's "aspect-ratio." But there is still a problem, since now you have to decide all over again what color to pick for each box, so to represent what is "mostly" on the photo at that spot. There are algorithms for that ("sampling," see your second link). That's involved with "resizing." The "quality" of the obtained drawing clearly must be much worse.
So "resizing" isn't all that simple. But, for us users, we mostly need to roughly know what is involved, and to find out how to use these features in a library. So read documentation. Some uses are very simple, while sometimes you'll have to decide which "algorithm" to let it use, or some such. Again, what I do is read manuals carefully.
The basic version of "cropping" is simple -- you just cut off a part of the picture. Say, remove the first and last 20 columns and the bottom and top 10 rows, and from the initial 120x60 you get a picture of 80x40. This is normally done when outer parts of an image have just white areas (or, worse, black!). So you want to "cut out" the picture itself from the whole image. Many graphics tools can do that on their own, by analyzing the image and figuring out those areas. Or, we select and hit a button.
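As a concrete illustration of that basic crop (the Perl question aside, the idea is language-independent), a tiny NumPy sketch: trimming 20 columns from each side and 10 rows from the top and bottom of a 120 x 60 grid leaves 80 x 40.
import numpy as np

# A dummy 60-row by 120-column "image" (NumPy stores height first, then width).
image = np.zeros((60, 120, 3), dtype=np.uint8)

# Cropping: just cut away rows/columns. Remove 10 rows top and bottom
# and 20 columns left and right -> an 80 x 40 picture remains.
cropped = image[10:-10, 20:-20]
print(cropped.shape)  # (40, 80, 3)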
I'm still not certain that you understand the difference between these terms
Your original image is 752 × 500 pixels
Resizing is a vague term that just means making a picture a different size somehow
Scaling is to change the size of an image proportionally. Scaling your picture down by a factor of ten would result in an image 75 × 50 (it should be 75.2 but we can't have 0.2 of a pixel). Scaling it up would make it bigger
You have scaled your picture to 50 × 50 pixels, which is a vertical scale of 10 (500 ÷ 50) but a horizontal scale of 15 (752 ÷ 50), so it appears squashed horizontally (or stretched vertically)
Cropping is to reduce an image by removing parts of it. To crop your image to 50 × 50 you would choose a 50 × 50 rectangle out of the whole picture and remove the rest. It would be a piece about the size of your monkey's nose, but you can pick any region you wish
zdim has shown you how to use create_scaled_image to preserve the aspect ratio. Calling
$image->create_scaled_image(0, 50)
means the height, or y-dimension, is reduced to 50, while the width, or x-dimension, is scaled by the same factor. That will result in a thumbnail of 75 × 50 as above
I hope that helps
As I said in my comment, there is an Image::Magick Perl module if you would prefer to be back on familiar ground
Resizing and scaling are the same thing; you just change the size of the image. You can make it smaller or bigger.
Depending on the interface, you have to give either the new dimensions or a scaling factor for the operation. A factor less than or greater than 1.0 would make the image smaller or bigger. Smaller images are created by subsampling and bigger images by interpolation.
Cropping is very simple. You select a rectangular region of an image and that's your new image. It's like using scissors.
In your code example the image is named cropped_image although it is created through scaling, or resizing.
The output image is an image of size 50 x 50 pixels. That's what you did here:
my $cropped_image = $image->create_scaled_image(50, 50);
So no matter how your image looks before, you stuff it into 50 x 50 pixels. In this case not only reducing the resolution but also changing the aspect ratio.
The image is not displayed improperly, it's displayed perfectly fine.
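To see the three operations side by side outside Perl, here is a small Pillow sketch for a 752 x 500 source like the one in the question; the file name is a placeholder.
from PIL import Image

img = Image.open("monkey.jpg")            # placeholder for the 752 x 500 source image

squashed = img.resize((50, 50))           # what the original code does: aspect ratio is lost

scaled = img.copy()
scaled.thumbnail((50, 50))                # scale proportionally; the result is 50 x 33

left = (img.width - 50) // 2
top = (img.height - 50) // 2
cropped = img.crop((left, top, left + 50, top + 50))  # cut a 50 x 50 piece from the centre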

Image resizing method during preprocessing for neural network

I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either width or height and then crop it (all resultant images will be of the same size as 100x100), but critical parts of the image can be cropped (second solution).
I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that its total pixel count is around 10000 pixels. So images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes, like 52x198, only the first 10000 pixels out of 10296 will be considered (third solution).
The third solution I mentioned above seems to preserve the original shape of the image. However, it may lose all of this originality when converted into a row, since not all images are of the same size. I wonder about your comments on this issue. It would also be great if you could direct me to sources where I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.
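A rough NumPy sketch of the multi-subwindow idea in Solution 2; classify stands in for whatever trained model you end up with and is purely a placeholder.
import numpy as np

def classify_with_subwindows(image, classify, size=100, stride=50):
    # Slide a size x size window over the image, classify each window with
    # the supplied model function, and average the per-window scores.
    h, w = image.shape[:2]
    scores = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            scores.append(classify(image[y:y + size, x:x + size]))
    # Averaging is one simple way to combine the multiple classifications;
    # majority voting over the predicted classes is another.
    return np.mean(scores, axis=0)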

How to determine if an image needs to be rotated

I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?
I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.
I'm looking for a way do this without depending on the "Orientation" metadata in the image.
I've thought of following ways to do this:
Rotate the image 90 degrees clockwise four times and send all four images to tesseract. This isn't ideal because of the need to process one image 4 times.
Use hough line transform to see if the lines are vertical or horizontal. If they are vertical then rotate the image. This way the image still might need to be rotated 180 degrees. So I'm unsure how effective this would be.
I'm wondering if there are other ways to accomplish this using OpenCV, ImageMagick or any other image processing techniques.
If you have, say, 1000 images labelled as horizontal or vertical, you can resize them to 224x224 and then fine-tune a convolutional neural network, like AlexNet or VGG, for this task. If you want to know how many right rotations to make for the image, you can set the labels to the number of clockwise rotations, i.e. 0, 1, 2, 3.
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
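A rough sketch of such a fine-tuning setup in PyTorch rather than the Caffe recipe linked above, using a ResNet-18 backbone for brevity; the dataset path is a placeholder and the labels 0-3 encode the number of clockwise 90-degree rotations.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Placeholder path; expects one subfolder per class (0, 1, 2, 3 rotations).
train_set = datasets.ImageFolder("rotation_dataset/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Start from a pretrained backbone and replace the classifier head with 4 outputs.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch of fine-tuning, for illustration
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()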
Attempting OCR on all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.
If speed is an issue, you could OCR a small part of the image first. Select a rectangular region that has the proper amount of edge pixels and white/black ratio for text, then send that to tesseract in different orientations. With a small region, you could even try smaller steps than 90°, or combine it with another heuristic like Hough.
If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
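A minimal sketch of that brute-force variant, assuming pytesseract and Pillow are installed; it OCRs a central crop in all four orientations and keeps the angle with the highest mean word confidence. The central-crop choice is an assumption about where representative text sits.
import pytesseract
from PIL import Image

def best_orientation(path):
    img = Image.open(path)
    # OCR only a central patch to keep the per-orientation pass cheap.
    w, h = img.size
    patch = img.crop((w // 4, h // 4, 3 * w // 4, 3 * h // 4))

    best_angle, best_conf = 0, -1.0
    for angle in (0, 90, 180, 270):
        data = pytesseract.image_to_data(patch.rotate(angle, expand=True),
                                         output_type=pytesseract.Output.DICT)
        confs = [float(c) for c in data["conf"] if float(c) >= 0]
        conf = sum(confs) / len(confs) if confs else 0.0
        if conf > best_conf:
            best_angle, best_conf = angle, conf
    return best_angle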
You can figure this out in a terminal with tesseract's psm option.
tesseract --psm 0 "infile" "outfile" will create outfile.osd which contains the info:
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
man tesseract
...
--psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
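If you would rather run this step from Python than parse outfile.osd yourself, pytesseract exposes the same OSD mode; the file name here is a placeholder.
import pytesseract
from PIL import Image

# Runs tesseract's orientation and script detection and returns the OSD block
# as a string; the "Rotate:" line tells you how many degrees to rotate.
osd = pytesseract.image_to_osd(Image.open("scan.png"))
print(osd)

rotate = int(next(line.split(":")[1] for line in osd.splitlines()
                  if line.startswith("Rotate:")))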

Processing images bigger than 4096px in Core Image for iOS 8

I am trying to write a Core Image filter with a custom kernel for iOS 8. My code works correctly for images smaller than 4096x4096. For bigger images I get a "tile" effect. In other words, it looks like the image is split into several sub-images and the filter runs independently on each sub-image.
Here is good sample kernel:
kernel vec4 effectFoo(__sampler source_image) {
return sample(source_image, samplerTransform(source_image, destCoord() + vec2(200.0, 200.0)));
};
I apply it to big image with size 6000x4500. I expect to get image translated by (200, 200) offset with extrapolation by border color at two sides. However, the result is "tiled", so each tile has been translated and extrapolated independently.
Using logging in the ROI callback, I found evidence that Core Image splits the image into 4 tiles of different sizes. The biggest tile was 4096x4096.
Thus there are two problems for me:
I can't get coordinates in either the full input image or the full output image.
Core Image extrapolates each tile without respect to the neighbouring tiles.
