How to extract the layers from an image (jpg,png,etc) - image

Given an image such as the CakePHP logo, how can it be converted back into a PSD with layers? As a human, I can easily work out how to translate this back into a layered PSD. I can tell that the background is a circular shape with star edges. So the circular star part is at the back, the cake image is on top of it, and the word "CakePHP" sits on top of both.
I can use Photoshop/Gimp tools to separate these images into three images and fill in the areas in-between. Then I have three layers.
As a human, it is easy to work out the layering of most logos and images, and many images have multiple layers; the CakePHP logo is just one example. Images in the real world also have layering: there may be a tree layer on top of a background of grass. I need a general way to convert from an image back to the layered representation, ideally a software solution.
In the absence of a programmed solution, are there any papers or research which solve this problem or are related to it? I am mostly interested in converting human-constructed images such as logos or website titles back to a layered representation.
I want to point out some benefits of doing this. If you can convert an image to a layered representation automatically, it becomes easier to modify. For example, maybe you want to make the cake smaller; if the computer has already layered the cake on top of the red background, you can just scale the cake layer. This allows layer-level adjustment of images on websites that do not already carry layer information.

As already mentioned, this is a non-trivial task. Ultimately, it can be most
simply phrased as: given an image (or scene, if it is a real photo) which is
composed of N pixels, how can those be assigned to M layers?
For segmentation, it's all about the prior knowledge you can bring to bear on
this: which properties of pixels, and of groups of pixels, give "hints" (and
I use the word advisedly!) as to the layer they belong to.
Consider even the simplest case of using just the colour in your image. I can
generate these 5 "layers" (for hue values 0,24,90, 117 and 118):
With this code (in Python/OpenCV):
import cv  # legacy OpenCV Python bindings (removed in OpenCV 3)

# get original image
orig = cv.LoadImage('cakephp.png')
# show original
cv.ShowImage("orig", orig)
# convert to hsv and get just hue
hsv = cv.CreateImage(cv.GetSize(orig), 8, 3)
hue = cv.CreateImage(cv.GetSize(orig), 8, 1)
sat = cv.CreateImage(cv.GetSize(orig), 8, 1)
val = cv.CreateImage(cv.GetSize(orig), 8, 1)
cv.CvtColor(orig, hsv, cv.CV_RGB2HSV)
cv.Split(hsv, hue, sat, val, None)
#cv.ShowImage("hue", hue)
# loop to find how many different hues are present...
query = cv.CreateImage(cv.GetSize(orig), 8, 1)
result = cv.CreateImage(cv.GetSize(orig), 8, 1)
for i in range(0, 255):
    cv.Set(query, i)
    cv.Cmp(query, hue, result, cv.CV_CMP_EQ)
    # if a number of pixels are equal - show where they are
    if (cv.CountNonZero(result) > 1000):  # <-what is significant?
        cv.ShowImage(str(i), result)
        cv.SaveImage(str(i) + ".png", result)
cv.WaitKey(-1)
But, even here we are having to describe what is "significant" in terms of the
number of pixels that belong to a mask (to the extent that we can miss some
colours). We could start to cluster similar colours instead - but at what
density does a cluster become significant? And if it wasn't just pure colour,
but textured instead, how could we describe this? Or, what about inference that
one layer is part of another, or in front of it? Or, ultimately, that some of
the layers seem to be what we humans call "letters" and so should probably be
all related...
A lot of the research in Computer Vision in segmentation generally tries to take
this problem and improve it within a framework that can encode and apply this
prior knowledge effectively...
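To make the "what is significant?" question concrete, here is a minimal numpy-only sketch of the same idea applied to quantized colours rather than hue. It is not the code that produced the layers above; the function name `colour_layers` and the parameters `min_pixels` and `step` are illustrative assumptions, and both thresholds are exactly the kind of prior knowledge discussed here.

```python
import numpy as np

def colour_layers(img, min_pixels=1000, step=32):
    """Return {colour: boolean mask} for each quantized colour that covers
    at least `min_pixels` pixels. `img` is an H x W x 3 uint8 array."""
    quant = (img // step) * step                 # merge near-identical colours
    flat = quant.reshape(-1, img.shape[-1])
    colours, inverse, counts = np.unique(
        flat, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)                # shape differs across numpy versions
    layers = {}
    for k, (colour, count) in enumerate(zip(colours, counts)):
        if count >= min_pixels:                  # the "significance" threshold
            key = tuple(int(c) for c in colour)
            layers[key] = (inverse == k).reshape(img.shape[:2])
    return layers
```

Raising `step` merges more shades into one layer and raising `min_pixels` drops small regions, so the result depends entirely on those two guesses.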

When you convert from a layered representation to an image you are losing information. For instance, you don't know the values of the pixels of the background layer behind the cake. Additionally, you don't know for sure which parts of the image belong to which layer.
However, in some cases it may be possible to recover or estimate this information, at least partially. For instance, you could try to separate an image into "layers" using segmentation algorithms. On your example, a simple segmentation based on color would probably work.
As for recovering lost pixel values in the background, there are so-called inpainting techniques, which attempt to estimate missing areas of an image from their surroundings.
Lastly, to recover the position and content of text in images you can rely on Optical Character Recognition (OCR) methods.
Keep in mind that there is no simple algorithm to solve your problem, which is more complex than it seems. However, using the above information, you can try to automate at least part of it.
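To give a feel for what inpainting does, here is a deliberately naive diffusion-style fill in numpy: masked pixels are repeatedly replaced by the average of their four neighbours while known pixels stay fixed. This is a toy sketch under my own assumptions (grayscale input, hypothetical function name `diffuse_fill`), not one of the published inpainting algorithms, and it only reproduces smooth backgrounds.

```python
import numpy as np

def diffuse_fill(gray, hole, iters=200):
    """Fill pixels where `hole` is True by iterated 4-neighbour averaging."""
    img = gray.astype(np.float64).copy()
    img[hole] = img[~hole].mean()            # neutral starting guess
    for _ in range(iters):
        padded = np.pad(img, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        img[hole] = avg[hole]                # known pixels are never changed
    return img
```

In the logo case, `hole` would be the mask of the cake layer and the fill would estimate the red background hidden behind it.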

Related

Tensorflow object detection training

I would like to detect objects (upper half of the image below) in images (bottom half). Is it smart to train the dataset with images in a different scale (or size)? Or shall I train it with parts of the bottom half of the image below? What is the best way to mark the objects for training?
Kind regards
If I understand your question correctly: if you are exclusively interested in detecting objects at roughly the scale shown in the picture below, your training data should consist of images like that one. To add on: try to include at least a decent range of sizes around that scale, so that small deviations from one specific scale don't throw the detector off, but generally you should be fine.

CNN - Image Resizing VS Padding (keeping aspect ratio or not?)

While people usually tend to simply resize any image into a square while training a CNN (for example, resnet takes a 224x224 square image), that looks ugly to me, especially when the aspect ratio is not around 1.
(In fact, that might change ground truth, for example, the label that an expert might give the distorted image could be different than the original one).
So now I resize the image to, say, 224x160 , keeping the original ratio, and then I pad the image with 0s (by pasting it into a random location in a totally black 224x224 image).
My approach doesn't seem original to me, and yet I cannot find any information whatsoever about my approach versus the "usual" approach.
Funky!
So, which approach is better? Why? (if the answer is data dependent, please share your thoughts regarding when one is preferable to the other.)
According to Jeremy Howard, padding a big piece of the image (64x160 pixels) will have the following effect: the CNN will have to learn that the black part of the image is not relevant and does not help to distinguish between the classes (in a classification setting), as there is no correlation between the pixels in the black part and belonging to a given class. As you are not hard-coding this, the CNN will have to learn it by gradient descent, and this may take some epochs. For this reason, you can do it if you have lots of images and computational power, but if you are on a budget for either of them, resizing should work better.
Sorry, this is late but this answer is for anyone facing the same issue.
First, if scaling in a way that changes the aspect ratio would distort important features, then you have to use zero-padding.
Zero-padding does not slow down learning because of the large black area itself, but because of the many possible locations the unpadded image can occupy inside the padded image, since you can pad an image in many ways.
For all-zero regions, the output of the convolution operation is zero (ignoring any bias term). The same holds for max or average pooling. You can also show that a weight is not updated by backpropagation if the input associated with that weight is zero, since the weight's gradient is proportional to that input; this holds for common activation functions (e.g. ReLU, sigmoid). So the large black area does not drive any weight updates in this sense.
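The zero-gradient claim can be checked numerically for a single dense layer with ReLU. This is a small sketch with a made-up loss, the sum of the ReLU outputs, and a hypothetical helper name `dense_weight_grad`; the point is only that the gradient for a weight is the upstream gradient times its input.

```python
import numpy as np

def dense_weight_grad(W, x):
    """Gradient of sum(relu(W @ x)) with respect to W."""
    y = W @ x
    upstream = (y > 0).astype(np.float64)   # derivative of sum(relu(y)) w.r.t. y
    return np.outer(upstream, x)            # dL/dW[i, j] = upstream[i] * x[j]
```

Every column of the returned matrix that corresponds to a zero entry of `x` is exactly zero, so those weights receive no update from that sample.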
However, the relative position of the unpadded image inside the padded image does indeed affect training. This is not due to the convolution or the pooling layers but to the last fully connected layer(s). For example, suppose the unpadded image sits on the left side of the padded image and the flattened output of the last convolution or pooling layer is [1, 0, 0], while for the same unpadded image placed on the right side the output is [0, 0, 1]. The fully connected layer(s) must then learn that [1, 0, 0] and [0, 0, 1] mean the same thing for the classification problem.
Therefore, learning invariance to the different possible positions of the image is what makes training take more time. If you have 1,000,000 images, then after resizing you will still have the same number of images; on the other hand, if you pad and want to cover different possible locations (say 10 random ones per image), you will have 10,000,000 images. That is, each epoch will take roughly 10 times longer.
That said, it depends on your problem and what you want to achieve. Also, testing both methods will not hurt.
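For anyone who wants to test the padding variant, here is a minimal numpy sketch of the approach from the question: scale so the longer side is 224, keep the aspect ratio, and paste at a random offset into a black square. The nearest-neighbour resize and the names `resize_nearest` and `pad_to_square` are assumptions made to keep the example self-contained; a real pipeline would normally use a library resize with proper interpolation.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize using index arithmetic only."""
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows][:, cols]

def pad_to_square(img, size=224, rng=None):
    """Resize keeping the aspect ratio, then paste at a random location
    inside a zero-filled size x size canvas."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    small = resize_nearest(img, new_h, new_w)
    canvas = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top = int(rng.integers(0, size - new_h + 1))
    left = int(rng.integers(0, size - new_w + 1))
    canvas[top:top + new_h, left:left + new_w] = small
    return canvas, (top, left, new_h, new_w)
```

Fixing the offset (for example always centring) removes the position variability discussed above, at the cost of the augmentation effect.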

Restoring an old manuscript with image processing

Say I have this old manuscript. What I am trying to do is process it so that all the characters in it can be recognized reliably. What should I keep in mind when approaching such a problem, and are there any established methods for it?
Please help, thank you.
Some graphics applications have macro recorders (e.g. Paint Shop Pro). They can record a sequence of operations applied to an image and store them as a macro script. You can then run the macro in a batch process, in order to process all the images contained in a folder automatically. This might be a better option than re-inventing the wheel.
I would start by playing around with the different functions manually, in order to see what they do to your image. There are an awful lot of things you can try: sharpening, smoothing and removing noise, each with many different methods and options. You can work on the contrast in many different ways (stretch, gamma correction, expand, and so on).
In addition, if your image has a yellowish background, then working on the red or green channel alone would probably lead to better results, because the blue channel has poor contrast in that case.
Do you mean that you want to make it easier for people to read the characters, or are you trying to improve image quality so that optical character recognition (OCR) software can read them?
I'd recommend that you select a specific goal for readability. For example, you might want readers to be able to read the text 20% faster if the image has been processed. If you're using OCR software to read the text, set a read rate you'd like to achieve. Having a concrete goal makes it easier to keep track of your progress.
The image processing book Digital Image Processing by Gonzalez and Woods (3rd edition) has a nice example showing how to convert an image like this to a black-on-white representation. Once you have black text on a white background, you can perform a few additional image processing steps to "clean up" the image and make it a little more readable.
Sample steps:
1. Convert the image to black and white (grayscale).
2. Apply a moving average threshold to the image. If the characters are usually about the same size in an image, then you shouldn't have much trouble selecting values for the two parameters of the moving average threshold algorithm.
3. Once the image has been converted to just black characters on a white background, try simple operations such as morphological "close" to fill in small gaps.
4. Present the original image and the cleaned image to adult readers, and time how long it takes for them to read each sample. This will give you some indication of the improvement in image quality.
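The moving average threshold step can be sketched in numpy as follows. This is my own reading of the general technique (zigzag scan, running mean of the last `n` pixels, a pixel counts as background when it exceeds `b` times that mean), not code from the book; the function name and the default values for the two parameters are assumptions.

```python
import numpy as np

def moving_average_threshold(gray, n=20, b=0.5):
    """Return True for background pixels, False for ink, using a running
    mean over the last n pixels along a zigzag scan of the image."""
    g = gray.astype(np.float64).copy()
    g[1::2] = g[1::2, ::-1].copy()           # reverse odd rows: zigzag scan
    flat = g.ravel()
    csum = np.cumsum(np.concatenate(([0.0], flat)))
    idx = np.arange(1, flat.size + 1)
    start = np.maximum(idx - n, 0)
    mean = (csum[idx] - csum[start]) / (idx - start)
    out = (flat > b * mean).reshape(g.shape)
    out[1::2] = out[1::2, ::-1].copy()       # undo the zigzag
    return out
```

A common rule of thumb is to choose `n` as a few times the typical stroke width, so the window always contains mostly background.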
A technique called the Stroke Width Transform has been discussed on SO previously. It can be used to extract character strokes from even very complex backgrounds. The SWT would be harder to implement, but could work for quite a wide variety of images:
Stroke Width Transform (SWT) implementation (Java, C#...)
The texture in the paper could present a problem for many algorithms. However, there are techniques for denoising images based on the Fast Fourier Transform (FFT), an algorithm that you can use to find 1D or 2D sinusoidal patterns in an image (e.g. grid patterns). About halfway down the following page you can see examples of FFT-based techniques for removing periodic noise:
http://www.fmwconcepts.com/misc_tests/FFT_tests/index.html
If you find a technique that works for the images you're testing, I'm sure a number of people would be interested to see the unprocessed and processed images.

How can I deblur an image in matlab?

I need to remove the blur from this image:
Image source: http://www.flickr.com/photos/63036721@N02/5733034767/
Any Ideas?
Although previous answers are right when they say that you can't recover lost information, you could investigate a little and make a few guesses.
I downloaded your image in what seems to be the original size (75x75) and you can see here a zoomed segment (one little square = one pixel)
It seems a pretty linear grayscale! Let's verify it by plotting the intensities of the central row. In Mathematica:
ListLinePlot[First /@ ImageData[i][[38]][[1 ;; 15]]]
So, it is effectively linear, starting at zero and ending at one.
So you may guess it was originally a B&W image, linearly blurred.
The easiest way to deblur that (not always giving good results, but enough in your case) is to binarize the image with a 0.5 threshold. Like this:
And this is a possible way. Just remember we are guessing a lot here!
HTH!
You cannot generally retrieve missing information.
If you know what it is an image of (in this case a Gaussian or Airy profile, so it is probably an out-of-focus image of a point source), you can determine the characteristics of the point.
Another technique is to try to determine the characteristics of the blurring, especially if you have many images from the same blurred system. Then iteratively create a possible source image, blur it by that convolution and compare it to the blurred image.
This is the general technique used to make radio astronomy source maps (images), and it was used for the flawed Hubble Space Telescope images.
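The "guess, blur, compare" loop can be illustrated with a Van Cittert style iteration on a 1D signal. This is a much simpler scheme than what is used in practice (production methods such as CLEAN are considerably more elaborate), and it assumes the blur kernel is already known; the function names are my own.

```python
import numpy as np

def blur(signal, kernel):
    """Blur a 1D signal with a known kernel (zero-padded at the ends)."""
    return np.convolve(signal, kernel, mode="same")

def iterative_deblur(observed, kernel, iters=50, step=1.0):
    """Refine an estimate so that blurring it reproduces the observation."""
    estimate = observed.astype(np.float64).copy()
    for _ in range(iters):
        residual = observed - blur(estimate, kernel)  # compare with the data
        estimate += step * residual                   # correct the guess
    return estimate
```

With noisy data this kind of iteration amplifies noise if run for too long, which is one reason real implementations add regularisation or stop early.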
When working with images one of the most common things is to use a convolution filter. There is a "sharpen" filter that does what it can to remove blur from an image. An example of a sharpen filter can be found here:
http://www.panoramafactory.com/sharpness/sharpness.html
Some programs like MATLAB make convolution really easy: conv2(A,B)
And most good photo editors have these filters under some name or another (usually "sharpen").
But keep in mind that filters can only do so much. In theory, the actual information has been lost by the blurring process and it is impossible to perfectly reconstruct the initial image (no matter what TV will lead you to believe).
In this case it seems like you have a very simple image with only black and white. Knowing this about your image you could always use a simple threshold. Set everything above a certain threshold to white, and everything below to black. Once again most photo editing software makes this really easy.
You cannot retrieve missing information, but under certain assumptions you can sharpen.
Try unsharp masking.
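A minimal numpy sketch of unsharp masking, assuming a grayscale image with values in [0, 1] and using a box blur instead of the more usual Gaussian to keep the code short. The names `box_blur` and `unsharp_mask` and the default parameters are illustrative.

```python
import numpy as np

def box_blur(gray, radius=1):
    """Mean filter over a (2*radius+1)^2 window with edge replication."""
    k = 2 * radius + 1
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), radius, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def unsharp_mask(gray, radius=1, amount=1.0):
    """Add back the difference between the image and its blurred copy."""
    g = gray.astype(np.float64)
    return np.clip(g + amount * (g - box_blur(g, radius)), 0.0, 1.0)
```

This steepens edges and increases local contrast, but it does not recover detail that the blur has removed.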

Detecting if two images are visually identical

Sometimes two image files may be different on a file level, but a human would consider them perceptually identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptual hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.
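For illustration, here is a numpy sketch of an "average hash", a simpler relative of the shrink-and-threshold pipeline quoted in the question. It is not what findimagedupes computes; the function names and the 8x8 hash size are assumptions.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Shrink to hash_size x hash_size by block averaging, then set one bit
    per cell depending on whether it is brighter than the overall mean."""
    h, w = gray.shape
    g = gray[:h - h % hash_size, :w - w % hash_size].astype(np.float64)
    small = g.reshape(hash_size, g.shape[0] // hash_size,
                      hash_size, g.shape[1] // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))
```

Two images are then treated as "the same" when the Hamming distance falls below a small threshold that you choose for your own data.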
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
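The translation-only case can be sketched with numpy's FFT. This assumes two same-sized grayscale arrays related by a circular shift and does not include the log-polar extension; the function name is my own.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the (row, col) shift that maps b onto a, plus the peak height.
    A peak close to 1 suggests the images match up to a translation."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase, drop magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return (int(dy), int(dx)), float(corr.max())
```

Shifts are reported modulo the image size, so a value near the far edge corresponds to a small negative shift.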
Colour histogram is good for the same image that has been resized, resampled etc.
If you want to match different people's photos of the same landmark it's trickier - look at Haar classifiers. OpenCV is a great free library for image processing.
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Resize each image to 1x1 pixel. If the pixels match, there is a small probability they are the same picture.
Now resize to a 2x2 pixel image. If all 4 pixels match, there is a larger probability they are the same.
Then 3x3: if all 9 pixels match, there is a good chance, etc.
Then 4x4: if all 16 pixels match, a better chance.
And so on.
Doing it this way, you can make efficiency improvements: if the 1x1 pixel grid is off by a lot, why bother checking the 2x2 grid?
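A rough numpy sketch of this coarse-to-fine check, using block averages in place of a real resize and a tolerance instead of exact equality. The grid sizes, the tolerance and the function names are assumptions.

```python
import numpy as np

def block_mean(gray, n):
    """Reduce a grayscale image to an n x n grid of block averages."""
    h, w = gray.shape
    g = gray[:h - h % n, :w - w % n].astype(np.float64)
    return g.reshape(n, g.shape[0] // n, n, g.shape[1] // n).mean(axis=(1, 3))

def coarse_to_fine_match(a, b, sizes=(1, 2, 4, 8, 16), tol=2.0):
    """Compare at increasing resolution and stop at the first mismatch."""
    for n in sizes:
        if np.abs(block_mean(a, n) - block_mean(b, n)).max() > tol:
            return False
    return True
```

Most non-matching pairs are rejected at the 1x1 or 2x2 level, so the expensive finer comparisons only run for likely duplicates.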
If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).
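As a sketch of that pre-filter, here is a coarse RGB histogram with histogram intersection as the similarity score, in numpy. The bin count and the function names are assumptions; any histogram distance could be substituted.

```python
import numpy as np

def colour_histogram(img, bins=8):
    """Normalised joint RGB histogram of an H x W x 3 uint8 image."""
    q = (img.astype(np.int64) * bins) // 256
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return float(np.minimum(h1, h2).sum())
```

Because the histogram is normalised, a resized copy of an image scores close to 1.0 against the original, while images with unrelated colours score near 0.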
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff first to see whether the files are byte-for-byte identical; I guess that will remove lots of useless comparisons. Then, for the algorithm, I would use a probabilistic approach: what are the chances that they look the same? I'd base that on the amount of RGB in each pixel. You could also use other metrics such as luminosity and the like.
