I am using Perl's Image::Imlib2 package to generate thumbnails from larger images.
I've done such tasks before with several ImageMagick interfaces (PHP, Ruby, Python) and it was relatively easy. I have no prior experience with Imlib2, and it has been a long time since I wrote anything in Perl, so I am sorry if this seems naive!
This is what I've tried so far. It is simple, and assumes that scaling an image will keep the aspect ratio, and the generated thumbnail will be an exact miniature copy of the original image.
use strict;
use warnings;
use Image::Imlib2;
my $dir   = 'imgs/*';
my @files = glob($dir);
foreach my $img (@files) {
    # Load the original image from disk
    my $image = Image::Imlib2->load($img);
    # Scale it to 50 x 50 pixels
    my $cropped_image = $image->create_scaled_image(50, 50);
    # Save the result over the original file
    $cropped_image->save($img);
}
Original image
Generated image
At first glance, something is clearly wrong. It may be my ignorance about cropping, resizing and scaling, but the generated image displays incorrectly on small screens.
I've read "What's the difference between cropping and resizing?" and honestly didn't understand any of it. The same goes for the article "Image scaling".
Could someone explain the differences between these three ideas and, if possible, give examples (preferably in Perl) to achieve better results? Or at least describe what I should consider when creating thumbnails?
The code you use doesn't preserve the aspect ratio. From the Image::Imlib2 documentation for create_scaled_image:
If x or y are 0, then retain the aspect ratio given in the other.
So change the line
my $cropped_image = $image->create_scaled_image(50, 50);
to
my $scaled_image = $image->create_scaled_image(50, 0);
and the new image will be 50 pixels wide, with its height computed so as to keep the original aspect ratio.
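For anyone curious about the arithmetic, here is a minimal Python sketch of the rule quoted above (a 0 in one dimension means "derive it from the other one"). It only computes the target size; rounding to the nearest pixel is my assumption, and Imlib2 itself may round differently. The 752 x 500 input is the size of the original image in this question.

```python
def scaled_size(width, height, new_w=0, new_h=0):
    """Return (w, h) for a resize where a 0 means "keep the aspect ratio".

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if new_w == 0 and new_h == 0:
        raise ValueError("at least one target dimension must be non-zero")
    if new_h == 0:
        new_h = round(height * new_w / width)
    elif new_w == 0:
        new_w = round(width * new_h / height)
    return new_w, new_h


# Equivalent of create_scaled_image(50, 0) on a 752 x 500 image
print(scaled_size(752, 500, 50, 0))  # (50, 33)
```

Passing both dimensions non-zero simply returns them unchanged, which is exactly why (50, 50) distorts the picture.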
Since this is not cropping I've changed the variable name as well.
As for the other questions, below is a basic discussion from the comments. Please search for tutorials on image processing. Also, the documentation of major libraries often has short, good explanations.
This is aggregated from comments deemed helpful. Also see Borodin's short and clear answer.
Imagine that you want to draw a picture (of some nice photograph) yourself in the following way. You draw a grid of, say, 120 (horizontally) by 60 (vertically) boxes, so 120 x 60 = 7200 boxes. These are your "pixels," and each one can be filled with only one color. If the photo you are redrawing is "mostly" blue at some spot, you color that pixel blue, and so on. It is not easy to end up with a faithful redrawing; the denser the pixels, the better.
Now imagine that you want to draw another, smaller copy. If you make it 20x20, it will look completely different, since it's a square. The best chance of getting it to "look the same" is to keep the 2-to-1 ratio (as in 120x60), say 40x20. That's the "aspect ratio." But there is still a problem: you now have to decide all over again what color to pick for each box, so as to represent what is "mostly" in the photo at that spot. There are algorithms for that ("sampling," see your second link). That's what "resizing" involves. The "quality" of the smaller drawing will clearly be much worse.
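To make the "decide what color each new box gets" step concrete, here is a minimal Python sketch of the simplest sampling rule, nearest-neighbour: each output pixel copies the input pixel its centre falls on. The "image" is just a hypothetical list of rows of color names; real libraries offer smoother filters (bilinear, bicubic and so on).

```python
def resize_nearest(pixels, new_w, new_h):
    """Resize a 2D grid (list of rows) using nearest-neighbour sampling."""
    old_h = len(pixels)
    old_w = len(pixels[0])
    out = []
    for y in range(new_h):
        # Map the centre of output row y back into input coordinates
        src_y = min(int((y + 0.5) * old_h / new_h), old_h - 1)
        row = []
        for x in range(new_w):
            src_x = min(int((x + 0.5) * old_w / new_w), old_w - 1)
            row.append(pixels[src_y][src_x])
        out.append(row)
    return out


# A tiny 4 x 2 "photo" (2-to-1 ratio) shrunk to 2 x 1, keeping the ratio
photo = [
    ["blue", "blue", "red", "red"],
    ["blue", "blue", "red", "red"],
]
print(resize_nearest(photo, 2, 1))  # [['blue', 'red']]
```

Nearest-neighbour simply throws most pixels away, which is one reason a shrunken drawing loses detail.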
So "resizing" isn't all that simple. But as users, we mostly need to know roughly what is involved, and to find out how to use these features in a library. So read the documentation. Some uses are very simple, while sometimes you'll have to decide which "algorithm" the library should use, or some such. Again, what I do is read the manuals carefully.
The basic version of "cropping" is simple -- you just cut off a part of the picture. Say, remove the first and last 20 columns and the bottom and top 10 rows, and from the initial 120x60 you get a picture of 80x40. This is normally done when outer parts of an image have just white areas (or, worse, black!). So you want to "cut out" the picture itself from the whole image. Many graphics tools can do that on their own, by analyzing the image and figuring out those areas. Or, we select and hit a button.
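The crop described above is just slicing. A minimal Python sketch, using a dummy 120 x 60 grid in place of real pixel data:

```python
def crop(pixels, left, top, width, height):
    """Return the width x height rectangle whose top-left corner is (left, top)."""
    return [row[left:left + width] for row in pixels[top:top + height]]


# Dummy 120 x 60 image: 60 rows of 120 (x, y) "pixels"
image = [[(x, y) for x in range(120)] for y in range(60)]

# Drop the first and last 20 columns and the top and bottom 10 rows
cropped = crop(image, left=20, top=10, width=120 - 2 * 20, height=60 - 2 * 10)
print(len(cropped[0]), "x", len(cropped))  # 80 x 40
```

Nothing is resampled here: the pixels that remain are exactly the original ones.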
I'm still not certain that you understand the difference between these terms.
Your original image is 752 × 500 pixels.
Resizing is a vague term that just means making a picture a different size somehow.
Scaling is changing the size of an image proportionally. Scaling your picture down by a factor of ten would result in an image 75 × 50 (it should be 75.2, but we can't have 0.2 of a pixel). Scaling it up would make it bigger.
You have scaled your picture to 50 × 50 pixels, which is a vertical scale of 10 (500 ÷ 50) but a horizontal scale of 15 (752 ÷ 50), so it appears squashed horizontally (or stretched vertically).
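The unequal factors are easy to check. A small Python sketch of that arithmetic; the two target sizes are the ones discussed in this answer:

```python
def scale_factors(width, height, new_w, new_h):
    """Return the (horizontal, vertical) reduction factors of a resize."""
    return width / new_w, height / new_h


# 50 x 50: horizontal and vertical factors differ, so the picture is distorted
h_factor, v_factor = scale_factors(752, 500, 50, 50)
print(h_factor, v_factor)  # 15.04 10.0

# 75 x 50: both factors are close to 10, so the proportions are kept
h_factor, v_factor = scale_factors(752, 500, 75, 50)
print(round(h_factor, 2), v_factor)
```

Roughly speaking, a thumbnail looks "right" when the two factors are (nearly) equal.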
Cropping is reducing an image by removing parts of it. To crop your image to 50 × 50 you would choose a 50 × 50 rectangle out of the whole picture and remove the rest. It would be a piece about the size of your monkey's nose, but you can pick any region you wish.
zdim has shown you how you can call
$image->create_scaled_image(0, 50)
so that the height, or y-dimension, is reduced to 50, while the width, or x-dimension, is scaled by the same factor. That will result in a 75 × 50 thumbnail, as above.
I hope that helps
As I said in my comment, there is an Image::Magick Perl module if you would prefer to be back on familiar ground.
Resizing and scaling are the same; you just change the size of the image. You can make it smaller or bigger.
Depending on the interface, you give either the new dimensions or a scaling factor for the operation. A factor less than 1.0 makes the image smaller, and a factor greater than 1.0 makes it bigger. Smaller images are created by subsampling and bigger images by interpolation.
Cropping is very simple. You select a rectangular region of an image and that's your new image. It's like using scissors.
In your code example the image is named cropped_image although it is created through scaling, or resizing.
The output image is an image of size 50 x 50 pixels. That's what you did here:
my $cropped_image = $image->create_scaled_image(50, 50);
So no matter what your image looked like before, you squeeze it into 50 x 50 pixels, which in this case not only reduces the resolution but also changes the aspect ratio.
The image is not displayed improperly, it's displayed perfectly fine.
Related
I have a training dataset of 640x512 images that I would like to use with a 320x240 camera.
Is it ok to change the aspect ratio and the size of the training images to that of the camera?
Would it be better to upscale the camera frames?
It is better to keep the aspect ratio of the images, because otherwise you will be artificially changing the proportions of the objects in them. What you can do is downscale the image by a factor of 2, to 320 x 256, then crop from the center to get a 320 x 240 image. You can do this by simply removing the first 8 and last 8 rows of the image. Removing those rows should be safe because it is very unlikely you will find meaningful information within an 8-pixel band at the top or bottom of the image.
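A pure-Python sketch of that recipe, assuming the image is stored as a list of rows (width x height = 640 x 512, so 512 rows of 640 grey values). Averaging each 2 x 2 block is one reasonable way to halve the size; in practice a framework's resize and center-crop transforms would do the same job much faster.

```python
def downscale_by_2(pixels):
    """Halve width and height by averaging each 2 x 2 block (box filter)."""
    return [
        [
            (pixels[y][x] + pixels[y][x + 1] + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4
            for x in range(0, len(pixels[0]) - 1, 2)
        ]
        for y in range(0, len(pixels) - 1, 2)
    ]


def center_crop(pixels, out_w, out_h):
    """Keep the central out_w x out_h region of a list-of-rows image."""
    top = (len(pixels) - out_h) // 2
    left = (len(pixels[0]) - out_w) // 2
    return [row[left:left + out_w] for row in pixels[top:top + out_h]]


# Dummy 640 x 512 training image
frame = [[(x + y) % 256 for x in range(640)] for y in range(512)]
small = downscale_by_2(frame)                # 320 x 256
camera_sized = center_crop(small, 320, 240)  # drops 8 rows at top and bottom
print(len(camera_sized[0]), "x", len(camera_sized))  # 320 x 240
```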
If you are using a deep learning framework such as TensorFlow or PyTorch, there are pre-processing functions that can crop from the center and downscale the image by a factor of 2 for you. You just need to set up a pre-processing pipeline with these two steps. You don't have any code yet, so I can't help you with implementation details, but hopefully what I've said is enough to get you started.
Finally, do not upsample the images. There will be no benefit, because you would only be interpolating the existing information into a larger space, which does not add real detail. You can scale down, but never scale up. The only situation where this could be useful is super-resolution, but that applies to specific cases and depends heavily on the images you use. In general, I do not recommend upscaling. Take your training set and downscale it to the camera's resolution, since camera images at that resolution are what the model will see at inference time.
Just a straightforward question. I'm trying to make the best possible choice here, and there is too much information for a "semi-beginner" like me.
Well, at this point, I'm trying screen-size qualifiers for my layout (activity_main.xml (normal, large, small)) and different densities (xhdpi, xxhdpi, mdpi), and, if I can say so myself, it is a mess. Do I have to create every possible option to support all screen sizes and densities? Or am I doing something really wrong here? What is the best approach for this?
My layouts are now activity_main(normal_land_xxhdpi) and I have serious doubts about it.
I'm using the latest version of Android Studio, of course. My app is a single activity with buttons, a TextView and other views. It does not have any fragments or intents whatsoever, and for that reason I think this should be an easy task, but not for me.
Hope you guys can help. I don't think I need to put any code here, but if needed, I can add it.
If you want to make a responsive UI for every device you need to learn about some things first:
-Difference between PX, DP:
https://developer.android.com/training/multiscreen/screendensities
There you can see that dp is a standard unit that Android uses to calculate how many pixels something (say, a line) should span in order to keep the same proportions across screens of different sizes and densities.
-Resolution, Density and Ratio:
The resolution is how many pixels a screen has in height and width. These pixels can be smaller or bigger. For instance, if screen A has 10 x 10 px whose pixels are half the size of those of screen B, which also has 10 x 10 px, then A is half the size of B even though both have 10 x 10 px.
That is why there is the notion of density: how many pixels your screen has per inch (ppi). It is one way to measure the quality of a screen; more pixels per inch is better.
The ratio tells you the proportion between the pixels in width and height. For example, the ratio of a 1000 x 2000 px screen is 1:2, and a Full HD screen of 1920 x 1080 is 16:9 (16 pixels of width for every 9 pixels of height). A 1:1 ratio is a square screen.
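The ratio is just width:height reduced by their greatest common divisor. A small Python sketch using the examples above:

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to its simplest integer ratio."""
    d = gcd(width, height)
    return width // d, height // d


print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(1000, 2000))  # (1, 2)
```

Real screens sometimes reduce to awkward ratios (for example, a few extra pixels for system bars), so in practice ratios are often rounded to the nearest common one.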
-Standard device resolutions
You can find the most common measurements on...
https://material.io/resources/devices/
When making a UI, you use DP measurements. You will notice that even when the pixel resolutions of two screens differ, their DP dimensions can be the same because the screens have different densities.
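Using the conversion from the Android screen-densities guide linked above (px = dp × dpi / 160, with 160 dpi as the baseline), this Python sketch shows two hypothetical phones with different pixel widths ending up with the same width in dp:

```python
def px_to_dp(px, dpi):
    """Convert physical pixels to density-independent pixels (160 dpi baseline)."""
    return px * 160 / dpi


def dp_to_px(dp, dpi):
    """Convert density-independent pixels to physical pixels."""
    return dp * dpi / 160


# Hypothetical phones: 1080 px wide at 480 dpi and 1440 px wide at 640 dpi
print(px_to_dp(1080, 480))  # 360.0
print(px_to_dp(1440, 640))  # 360.0
# A 48 dp button on the 480 dpi phone
print(dp_to_px(48, 480))    # 144.0
```

Both phones are 360 dp wide, so a layout designed in dp looks the same on them even though their pixel counts differ.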
The right approach is to use ConstraintLayout with dp units to position your views; with correct constraints, the content will adapt to other screen sizes.
Anyway, you will need to make additional XML for some cases:
-Different orientation
-Different ratio
-Different DP resolution (not px)
For every activity, you need to provide a portrait and a landscape design. If another device has a different ratio, you may need to adjust the height or width, because the screen proportions aren't the same. Finally, even if the ratio is the same, the DP resolution could differ: maybe you designed an activity for 640x360dp and another device has 853x480dp, which means you will have more vertical space.
You can read more here:
https://developer.android.com/training/multiscreen/screensizes
And learn how to use constraintLayout correctly:
https://developer.android.com/training/constraint-layout?hl=es-419
Note:
It may seem like a lot of work for every activity, but you make the first design and then just copy it to other XML files with the appropriate qualifiers and change the dp values to adjust the views as you want (without starting from scratch), which is much faster.
I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either the width or the height and then crop (so all resulting images are the same size, e.g. 100x100), but critical parts of the image may be cropped out (second solution).
I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that its total pixel count is around 10000. So images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes like 52x198, the first 10000 of the 10296 pixels will be used (third solution).
The third solution seems to preserve the original shape of the image. However, it may lose all of this originality when each image is converted into a row, since not all images have the same dimensions. I would welcome your comments on this issue. It would also be great if you could point me to sources where I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
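A minimal Python sketch of the subwindow idea: generate the window positions with a given stride, then combine the per-window class scores. Averaging is only one possible combination rule (taking the maximum is another), and the classifier that would produce the scores is left out. Note that if the stride does not divide evenly, the last few columns or rows may not be covered.

```python
def sliding_windows(width, height, win, stride):
    """Yield (left, top) corners of win x win sub-images inside the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield left, top


def combine_by_average(score_lists):
    """Average per-class scores from several windows (one simple option)."""
    n = len(score_lists)
    return [sum(scores) / n for scores in zip(*score_lists)]


# A hypothetical 250 x 100 image, 100 x 100 windows, stride 50
corners = list(sliding_windows(250, 100, win=100, stride=50))
print(corners)  # [(0, 0), (50, 0), (100, 0), (150, 0)]
```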
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
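To see why, note that a convolution filter at some pixel looks at the pixel directly below it, and which flat entry that is depends entirely on the assumed width. A small Python sketch with a flat list of 10000 values:

```python
def reshape(flat, width):
    """Reinterpret a flat pixel list as rows of `width` pixels."""
    if len(flat) % width:
        raise ValueError("length is not a multiple of width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


flat = list(range(10000))  # 10000 values; the original shape is not recorded

for width in (100, 50, 250):
    grid = reshape(flat, width)
    # The pixel directly below the top-left corner changes with the assumed width
    print(f"{width} wide x {len(grid)} tall: below flat[0] is flat[{grid[1][0]}]")
```

The same 10000 numbers describe three different images, so the network cannot know which pixels are neighbours.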
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.
I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?
I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.
I'm looking for a way to do this without depending on the "Orientation" metadata in the image.
I've thought of the following ways to do this:
Rotate the image 90 degrees clockwise four times and send all four images to tesseract. This isn't ideal because of the need to process one image 4 times.
Use hough line transform to see if the lines are vertical or horizontal. If they are vertical then rotate the image. This way the image still might need to be rotated 180 degrees. So I'm unsure how effective this would be.
I'm wondering if there are other ways to accomplish this using OpenCV, ImageMagick or any other image processing techniques.
If you have 1000 images labeled as horizontal or vertical, you can resize them to 224x224 and then fine-tune a convolutional neural network, like AlexNet or VGG, for this task. If you want to know how many right (clockwise) rotations to apply to the image, you can use the number of clockwise rotations as the label, i.e. 0, 1, 2 or 3.
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
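One possible way to build such a 0–3 labelled dataset from upright scans, sketched in Python with images as lists of rows: rotate each upright image counter-clockwise k times and label it k, the number of clockwise quarter-turns needed to restore it. The fine-tuning itself is not shown.

```python
def rotate_cw(pixels):
    """Rotate a list-of-rows image 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def rotate_ccw(pixels):
    """Rotate a list-of-rows image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*pixels)][::-1]


def make_rotation_examples(upright):
    """Return (image, label) pairs; label k = clockwise quarter-turns that restore it."""
    examples = []
    image = upright
    for k in range(4):
        examples.append((image, k))
        image = rotate_ccw(image)
    return examples


# A tiny hypothetical "scan"
page = [[1, 2, 3], [4, 5, 6]]
for image, label in make_rotation_examples(page):
    print(label, image)
```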
Attempting OCR in all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.
If speed is an issue, you could OCR a small part of the image first. Select a rectangular region that has a plausible number of edge pixels and white/black ratio for text, then send that to tesseract in different orientations. With a small region, you could even try smaller steps than 90°, or combine it with another heuristic like Hough.
If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
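A Python sketch of that search order. `ocr_succeeds` is a hypothetical callback standing in for "rotate the image, run tesseract, and check whether the output looks like text"; how you judge success is up to you.

```python
def find_orientation(image, ocr_succeeds, last_good=0):
    """Try rotations (degrees), starting from the last orientation that worked."""
    candidates = [last_good] + [a for a in (0, 90, 180, 270) if a != last_good]
    for angle in candidates:
        if ocr_succeeds(image, angle):
            return angle  # stop at the first orientation that works
    return None


# Fake OCR check for illustration: only 90 degrees "works"
calls = []


def fake_ocr(image, angle):
    calls.append(angle)
    return angle == 90


print(find_orientation("scan.png", fake_ocr, last_good=90), calls)  # 90 [90]
```

When consecutive scans share an orientation, only one OCR call is needed per image.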
You can figure this out in a terminal with tesseract's psm option.
tesseract --psm 0 "infile" "outfile" will create outfile.osd which contains the info:
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
man tesseract
...
--psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
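If you want to use that report from a script, the .osd file is plain "Key: value" lines. A minimal Python parser, assuming the keys look like the sample output above (they may vary between tesseract versions):

```python
def parse_osd(text):
    """Parse tesseract's 'Key: value' OSD report into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# In practice: with open("outfile.osd") as f: report = f.read()
report = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55"""

osd = parse_osd(report)
print(int(osd["Rotate"]), float(osd["Orientation confidence"]))  # 270 27.93
```

You might want to ignore the suggested rotation when the confidence is low.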
I'm a visual/UI designer working on a project/product which has been designed by another designer. This designer provided the front-end dev with good quality PNG icons, but when the front-end dev sets the images scale to 0.7, they look blurry.
I've noticed that if we set the image's scale to 0.5, they don't look blurry at all:
0.7: http://i.stack.imgur.com/jQNYG.png
0.5: http://i.stack.imgur.com/hBShu.png
Does anyone know why that happens?
I personally always work with 0.5 scales because I was taught to. Is there any logical reason for this?
Sorry if the answer is obvious. I am really curious about that. Thanks in advance.
What is happening largely depends on the software that you are using to shrink the image. There is a major difference between reducing by 0.5 and by 0.7.
If you shrink by 0.5, you are combining 4 pixels into one.
If you shrink by 0.7 you are doing fractional sampling. 10 pixels in each direction get reduced to 7.
In 0.5 sampling, you read two pixels across and two pixels down.
In 0.7 sampling you read 1.42857142857143 pixels in each direction. In order to do that you have to weight pixel values. That is going to create blurriness in a drawing.
It's because when you halve an image's size (in both dimensions), you effectively are combining exactly 4 pixels into one. However when you do a slightly off scale (such as 0.7) you have one and a fraction of a pixel going into one pixel (in each dimension). This means the data from one pixel is being used in up to 4 pixels, instead of one, causing a blurry effect.
Sorry, making an example image would be quite difficult for me, but I hope you get the concept.
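Here is a rough 1D Python sketch of that effect, using area averaging (a box filter, one common downscaling method; the software in question may use a different one). The row contains a hard edge that happens to sit on an even pixel boundary. At 0.5 every output pixel covers exactly two whole input pixels, so the edge stays hard; at 0.7 an output pixel straddles the edge and gets an in-between grey.

```python
def area_resample(row, scale):
    """Shrink a 1D row by `scale` (< 1), averaging the input area each output pixel covers."""
    n_out = round(len(row) * scale)
    step = 1 / scale  # input pixels per output pixel, e.g. 2.0 or about 1.43
    out = []
    for i in range(n_out):
        start, end = i * step, (i + 1) * step
        total = 0.0
        j = int(start)
        while j < end and j < len(row):
            # Fraction of input pixel j that falls inside [start, end)
            overlap = min(end, j + 1) - max(start, j)
            total += row[j] * overlap
            j += 1
        out.append(total / step)
    return out


edge = [0, 0, 0, 0, 255, 255, 255, 255, 255, 255]
print(area_resample(edge, 0.5))  # [0.0, 0.0, 255.0, 255.0, 255.0]
print([round(v) for v in area_resample(edge, 0.7)])  # contains a grey between 0 and 255
```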
I think this has to do with interpolation. When you resize an image, there is no way of knowing what is supposed to be in between the two pixels that are essentially being merged. The computer tries to guess what the new pixel should look like by looking at the pixels around it and combining their values.
So, for example, in the image above it will ask: what is in between white and orange? A less bright orange. OK, let's make the merged pixel look like that. Near a corner there might be more orange, so the new pixel will look more orange; you get the point.
Now, when you scale by 0.5, the computer merges the pixels at a constant rate. What I mean is that when you halve an image, you always merge exactly 4 pixels together, whereas if you scale by 0.7 you are merging an irregular number of pixels, which produces different concentrations of pixels across the scaled image and therefore a blurry result.
If you don't understand this, I understand; I kinda went off on a tangent. If you need more clarification, comment below :)
Add an .img-crisp class to the image:
.img-crisp {
image-rendering: -moz-crisp-edges; /* Firefox */
image-rendering: -o-crisp-edges; /* Opera */
image-rendering: -webkit-optimize-contrast; /* Webkit (non-standard naming) */
image-rendering: crisp-edges;
-ms-interpolation-mode: nearest-neighbor; /* IE (non-standard property) */
}
The image-rendering CSS property sets an image scaling algorithm. The property applies to an element itself, to any images set in its other properties, and to its descendants.
Source.