Is conversion to gray scale a necessary step in image preprocessing?

I would like to know if converting an image to gray scale is a necessary step for all image pre-processing techniques. I am using a neural network for face recognition. Is it really necessary to convert the images to gray scale, or can I also give color images as input to the neural network?

Converting to gray scale is not necessary for image processing, but is usually done for a few reasons:
Simplicity - Many image processing operations work on a plane of image data (e.g., a single color channel) at a time. So if you have an RGBA image you might need to apply the operation on each of the four image planes and then combine the results. Gray scale images only contain one image plane (containing the gray scale intensity values).
Data reduction - Suppose you have an RGBA image (red-green-blue-alpha). If you converted this image to gray scale you would only need to process 1/4 of the data compared to the color image. For many image processing applications, especially video processing (e.g., real-time object tracking), this data reduction allows the algorithm to run in a reasonable amount of time.
However, it's important to understand that while there are many advantages of converting to gray scale, it is not always desirable. When you convert to gray scale you not only reduce the quantity of image data, but you also lose information (e.g., color information). For many image processing applications color is very important, and converting to gray scale can worsen results.
To summarize: if converting to gray scale still yields reasonable results for whatever application you're working on, it is probably desirable, especially given the likely reduction in processing time. However, it comes at the cost of throwing away data (color data) that may be very helpful, or even required, for many image processing applications.
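A minimal sketch of that trade-off, assuming OpenCV is available (face.jpg is a hypothetical input file):

```python
# Sketch: grayscale conversion and the data reduction it buys.
# Assumes OpenCV; "face.jpg" is a hypothetical input file.
import cv2

color = cv2.imread("face.jpg")                  # BGR, shape (H, W, 3)
gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)  # shape (H, W)

print(color.shape, color.nbytes)  # three planes
print(gray.shape, gray.nbytes)    # one plane, one third of the data

# Either array can feed a neural network: a color input simply has
# three channels instead of one.
```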

No, it is not required. It simplifies things, so it is common practice to do so, but in general you can work directly on the color image, in any representation (RGB, CMYK), by simply using more dimensions (or a more complex similarity/distance measure/kernel).

Related

How to extract features from retina images

I'm working on a diabetic retinopathy detection problem where I've been given retina images [image1] with score labels. My job is to build a classification model that can detect and score retinopathy given unlabeled retina images.
The first step, which I'm currently working on, is extracting features from these images and building the input vector that I'll use as input for my classification algorithm. I have basic knowledge of image processing, and I've tried cropping my images to the edges [Image2], converting them to gray scale, and using the histogram as the input vector, but it seems that I still have a large representation for an image. In addition, I may have lost some essential features that were encoded in the RGB image.
Image1:
Image2:
Pre-processing medical images is not a trivial task. To improve performance for diabetic retinopathy detection you need to highlight the blood vessels; there are several pre-processing techniques suitable for this. Here is a link that may be useful:
https://github.com/tfs4/IDRID_hierarchical_combination/blob/master/preprocess.py
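As a hedged starting point (not necessarily what the linked script does), a common fundus preprocessing step is to take the green channel, where the vessels have the most contrast, and enhance it with CLAHE; a minimal sketch assuming OpenCV, with retina.jpg as a hypothetical file name:

```python
# Sketch of a common fundus preprocessing step (not necessarily what the
# linked script does): take the green channel, where vessels have the most
# contrast, and enhance it with CLAHE. Assumes OpenCV; "retina.jpg" is a
# hypothetical file name.
import cv2

img = cv2.imread("retina.jpg")         # BGR
green = img[:, :, 1]                   # green channel
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(green)          # vessels stand out more clearly
cv2.imwrite("retina_enhanced.png", enhanced)
```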

Uncertainty in L,a,b space of compressed JPEG images

My team wish to calculate the contrast between two photographs taken in a wet environment.
We will calculate contrast using the formula
Contrast = SQRT((ΔL)^2 + (Δa)^2 + (Δb)^2)
where ΔL is the difference in lightness, Δa is the difference in (redness-greenness), and Δb is the difference in (yellowness-blueness); these are the dimensions of Lab space.
Our (so far successful) approach has been to convert each pixel from RGB to Lab space and to take the mean values of the relevant sections of the image as our A and B variables.
However, the environment limits us to a (waterproof) GoPro camera which compresses images to JPEG format rather than saving as TIFF, so we are not working with a true-colour image.
We now need to quantify the uncertainty in the contrast - for which we need to know the uncertainty in A and B and by extension the uncertainties (or mean/typical uncertainty) in each a and b value for each RGB pixel. We can calculate this only if we know the typical/maximum uncertainty produced when converting from true-colour to JPEG.
Therefore we need to know the maximum possible difference in each of the RGB channels when saving in JPEG format.
E.g., if the true-colour RGB pixel (5, 7, 9) became (2, 9, 13) after compression, the uncertainty in each channel would be (±3, ±2, ±4).
We believe that the camera compresses colour with 4:2:0 chroma subsampling - is there a way to test this?
However, our main question is: is there any way of knowing the maximum possible error in each channel, or of calculating the uncertainty from the compressed RGB result?
Note: We know it is impossible to convert back from JPEG to TIFF as JPEG compression is lossy. We merely need to quantify the extent of this loss on colour.
In short, it is not possible to absolutely quantify the maximum possible difference in digital counts in a JPEG image.
You already highlight one of the reasons for this. When image data is encoded using the JPEG standard, it is first converted to the YCbCr color space.
Once in this color space, the chroma channels (Cb and Cr) are downsampled, because the human visual system is less sensitive to artifacts in chroma information than it is to lightness information.
The error introduced here is content-dependent; an area of very rapidly varying chroma and hue will have considerably more content loss than an area of constant hue/chroma.
Even knowing the 4:2:0 subsampling, which describes the amount and geometry of the downsampling (more information here), the content still dictates the error introduced at this step.
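As for testing the 4:2:0 assumption: one rough check, assuming Pillow (its layer attribute is an implementation detail of the JPEG plugin rather than a public API), is to read the sampling factors recorded in the file:

```python
# Rough check of a JPEG's chroma subsampling with Pillow. The `layer`
# attribute is an implementation detail of Pillow's JPEG plugin, not a
# public API, so treat this as a heuristic. "gopro_frame.jpg" is a
# hypothetical file name.
from PIL import Image

im = Image.open("gopro_frame.jpg")
# Each entry should be (component id, horizontal sampling, vertical
# sampling, quantization table index). Y at (2, 2) with Cb/Cr at (1, 1)
# indicates 4:2:0 subsampling.
print(im.layer)
```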
Another problem is the quantization performed in JPEG compression.
The resulting information is encoded using a Discrete Cosine Transform. In the transformed space, the results are again quantized depending on the desired quality. This quantization is set at the time of file generation, which is performed in-camera. Again, even if you knew the exact DCT quantization being performed by the camera, the actual effect on RGB digital counts is ultimately content-dependent.
Yet another difficulty is noise created by DCT block artifacts, which (again) is content dependent.
These scene dependencies make the algorithm very good for visual image compression, but very difficult to characterize absolutely.
However, there is some light at the end of the tunnel. JPEG compression will cause significantly more error in areas of rapidly changing image content. Areas of constant color and texture will have significantly less compression error and artifacts. Depending on your application you may be able to leverage this to your benefit.
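One way to exploit that, as an empirical estimate rather than a bound, is to take a lossless capture of a representative scene, re-encode it as JPEG at a quality and subsampling comparable to the camera's, and measure the per-channel differences yourself. A sketch assuming Pillow and NumPy (reference.tif and the quality setting are assumptions):

```python
# Empirically estimate per-channel JPEG error on representative content.
# Assumes Pillow and NumPy; "reference.tif" is a hypothetical lossless
# capture of a comparable scene, and quality=90 approximates the camera.
import io

import numpy as np
from PIL import Image

ref = np.asarray(Image.open("reference.tif").convert("RGB"), dtype=np.int16)

buf = io.BytesIO()
Image.fromarray(ref.astype(np.uint8)).save(
    buf, format="JPEG", quality=90, subsampling="4:2:0")
buf.seek(0)
jpg = np.asarray(Image.open(buf).convert("RGB"), dtype=np.int16)

err = np.abs(ref - jpg).reshape(-1, 3)
print("max  per-channel error:", err.max(axis=0))
print("mean per-channel error:", err.mean(axis=0))
```

Because the error is content-dependent, repeating this over several representative scenes and reporting the spread is probably more honest than quoting a single maximum.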

How to scale JPEG image down so that text is clear as possible?

I have some JPEG images that I need to scale down to about 80% of the original size. The original image dimensions are about 700px × 1000px. The images contain some computer-generated text and possibly some graphics (similar to what you would find in corporate Word documents).
How do I scale the images so that the text is as legible as possible? Currently we are scaling the images down using bicubic interpolation, but that makes the text blurry and foggy.
Two options:
Use a different resampling algorithm. Lanczos gives you a much less blurry result (see the sketch below).
You might use an advanced JPEG library that resamples the 8x8 blocks to 6x6 pixels.
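A minimal sketch of the first option, assuming Pillow (page.jpg is a hypothetical input file):

```python
# Sketch: downscale to 80% with Lanczos resampling, which keeps small
# text noticeably sharper than bicubic. Assumes Pillow; "page.jpg" is a
# hypothetical input file.
from PIL import Image

img = Image.open("page.jpg")
w, h = img.size
small = img.resize((int(w * 0.8), int(h * 0.8)), Image.LANCZOS)
small.save("page_small.png")   # PNG avoids a second round of JPEG artifacts
```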
If you are not set on exactly 80% you can try getting and building djpeg from http://www.ijg.org/ as it can decompress your jpeg to 6/8ths (75%) or 7/8ths (87.5%) size and the text quality will still be pretty good:
Original
7/8
3/4
(SO decided to scale the images when showing them inline)
There may be a scaling algorithm out there that works similarly, but this is an easy off-the-shelf solution.
There is always a loss involved in scaling down, but again it depends on your trade-offs.
Blurring and artifact generation are normal for JPEG images, so it's recommended that you generate images at the correct size in the first place.
Lanczos is a fine solution, but it has its own trade-offs.
If it's just the text you are concerned about, you could try a dilation filter over the resampled image. This would correct some blurriness but may also affect the graphics. If you can live with that, it's fine. Alternatively, if you can identify the areas of text, you can apply dilation just over those areas.
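A sketch of that idea, assuming OpenCV and dark text on a light background (so the inverted image is dilated, which thickens the dark strokes); page_small.png is the hypothetical downscaled image from above:

```python
# Sketch: thicken text strokes after downscaling. Assumes OpenCV and dark
# text on a light background, so the inverted image is dilated (which is
# equivalent to eroding the original). "page_small.png" is hypothetical.
import cv2
import numpy as np

small = cv2.imread("page_small.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((2, 2), np.uint8)
thickened = 255 - cv2.dilate(255 - small, kernel, iterations=1)
cv2.imwrite("page_text.png", thickened)
```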

Get dominant colors from image discarding the background

What is the best algorithm (best in terms of result, not performance) to fetch the dominant colors from an image? The algorithm should discard the background of the image.
I know I can build an array of colors and count how often each appears in the image, but I need a way to determine what is the background and what is the foreground, and to consider only the foreground while reading the dominant colors.
The problem is very hard, especially for gradient backgrounds or backgrounds with patterns (not plain ones).
Isolating the foreground from the background is beyond the scope of this particular answer, but...
I've found that applying a pixelation filter to an image will draw out a really good set of 'average' colours.
Before
After
I sometimes use this approach to derive a palette of colours with a particular mood. I first find a photograph with the general tones I'm after, pixelate it, and then sample from the resulting image.
(Thanks to Pietro De Grandi for the image, found on unsplash.com)
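The pixelation trick above can be approximated by downscaling to a tiny grid and reading the cell colours back as a palette; a sketch assuming Pillow (photo.jpg is a hypothetical input):

```python
# Sketch: approximate the pixelation trick by downscaling to an 8x8 grid
# (box filter = plain average of the covered pixels) and reading the cell
# colours back as a palette. Assumes Pillow; "photo.jpg" is hypothetical.
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")
tiny = img.resize((8, 8), Image.BOX)          # each cell is an averaged colour
palette = list(tiny.getdata())                # 64 (R, G, B) tuples
print(palette[:8])

# Upscaling `tiny` with NEAREST reproduces the pixelated look shown above.
tiny.resize(img.size, Image.NEAREST).save("photo_pixelated.png")
```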
The colour summarizer is a pretty good resource for info on this subject, not to mention its seemingly free XML web API that will produce descriptive colour statistics for an image of your choosing, reporting back the following (formatted with swatches in HTML, or as XML):
what is the average color hue, saturation and value in my image?
what is the RGB colour that is most representative of the image?
what do the RGB and HSV histograms look like?
what is the image's human readable colour description (e.g. dark pure blue)?
The purpose of this utility is to generate metadata that summarizes an image's colour characteristics for inclusion in an image database, such as Flickr. In particular this tool is being used to generate metadata for Flickr's Color Fields group.
In my experience though.. this tool still misses the "human-readable" / obvious "main" color, A LOT of the time. Silly machines!
I would say this problem is closer to "impossible" than "very hard". The only approach to it that I can think of would be to make the assumption that the background of an image is likely to consist of solid blocks of similar colors, while the foreground is likely to consist of smaller blocks of dissimilar colors.
If this assumption is generally true, then you could scan through the whole image and weight pixels according to how similar or dissimilar they are to neighboring pixels. In other words, if a pixel's neighbors (within some arbitrary radius, perhaps) were all similar colors, you would not incorporate that pixel into the overall estimate. If the neighbors tend to be very different colors, you would weight the pixel heavily, perhaps in proportion to the degree of difference.
This may not work perfectly, but it would definitely at least tend to exclude large swaths of similar colors.
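A sketch of that weighting idea, assuming NumPy, SciPy and Pillow (photo.jpg is a hypothetical input); it down-weights pixels whose neighbourhood is nearly uniform:

```python
# Sketch of the weighting idea: pixels in nearly uniform neighbourhoods
# (likely background) get a low weight, pixels in busy regions a high one.
# Assumes NumPy, SciPy and Pillow; "photo.jpg" is hypothetical.
import numpy as np
from PIL import Image
from scipy.ndimage import uniform_filter

img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float32)

# Local standard deviation per channel over a 9x9 window, summed over channels.
mean = uniform_filter(img, size=(9, 9, 1))
mean_sq = uniform_filter(img ** 2, size=(9, 9, 1))
local_std = np.sqrt(np.clip(mean_sq - mean ** 2, 0, None)).sum(axis=2)

weights = local_std / (local_std.max() + 1e-6)
# Any colour statistic can now be weighted; e.g. a weighted mean colour:
foreground_mean = (img * weights[..., None]).sum(axis=(0, 1)) / weights.sum()
print("weighted mean colour:", foreground_mean)
```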
As far as my knowledge of image processing algorithms extends, there is no certain way to get the "foreground"; it is only possible to get the borders between objects. You'll probably have to make do with an average, or with your proposed array-count method. If you do that, you'll want to give colours with higher saturation a higher "score", as they're much more prominent.

Image fingerprint to compare similarity of many images

I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day; RGB, JPEG, max size 800x800) to compare every image to every other image very fast. I can't use binary comparison methods, because images which are nearly similar should also be recognized.
Best would be an existing library, but also some hints to existing algorithms would help me a lot.
Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.
If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can use a Radon transformation on the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.
A few simple solutions are possible:
Create a luminosity histogram for the image as a fingerprint
Create scaled down versions of each image as a fingerprint
Combine techniques (1) and (2) into a hybrid approach for improved comparison quality
A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image, and it can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram which you can process to decide how similar two images are. Histograms, because they only evaluate the distribution and occurrence of luminosity/color information, handle affine transformations quite well. If you quantize each color component's luminosity information down to an 8-bit value, 768 bytes of storage are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated: if you apply transformations like contrast/brightness adjustments, posterization, or color shifting, the luminosity information changes. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates.
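A minimal sketch of such a histogram fingerprint, assuming Pillow and NumPy (the 768-value layout and the L1-based similarity are illustrative choices, not a fixed recipe):

```python
# Sketch: a 3 x 256 RGB histogram as a 768-value fingerprint, compared by
# L1 distance. Assumes Pillow and NumPy; file names are hypothetical.
import numpy as np
from PIL import Image

def fingerprint(path):
    img = np.asarray(Image.open(path).convert("RGB"))
    hist = np.concatenate(
        [np.histogram(img[..., c], bins=256, range=(0, 256))[0] for c in range(3)]
    ).astype(np.float64)
    return hist / hist.sum()                 # normalised, so image size drops out

def similarity(fp_a, fp_b):
    return 1.0 - 0.5 * np.abs(fp_a - fp_b).sum()   # 1.0 means identical histograms

# print(similarity(fingerprint("a.jpg"), fingerprint("b.jpg")))
```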
Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much information to be of use, so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike histogram data, thumbnails require anisotropic scaling of the image data when the source resolutions have varying aspect ratios. In other words, reducing a 300x800 image to an 80x80 thumbnail deforms the image, so that comparing it with a (very similar) 300x500 image will cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from the original and may result in a false negative.
Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
There is a much less ad hoc approach than the scaled-down image variants proposed here, one that retains their general flavor but gives a much more rigorous mathematical basis for what is going on.
Take a Haar wavelet of the image. Basically the Haar wavelet is the succession of differences from the lower resolution images to each higher resolution image, but weighted by how deep you are in the 'tree' of mipmaps. The calculation is straightforward. Then once you have the Haar wavelet appropriately weighted, throw away all but the k largest coefficients (in terms of absolute value), normalize the vector and save it.
If you take the dot product of two of those normalized vectors it gives you a measure of similarity with 1 being nearly identical. I posted more information over here.
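A rough sketch of that idea, assuming PyWavelets, NumPy and Pillow; the per-level weighting mentioned above is omitted, so treat it as an approximation rather than the exact scheme:

```python
# Sketch: Haar-wavelet fingerprint keeping only the k largest coefficients,
# compared by dot product. The per-level weighting mentioned above is
# omitted. Assumes PyWavelets, NumPy and Pillow; file names are hypothetical.
import numpy as np
import pywt
from PIL import Image

def wavelet_fingerprint(path, size=(128, 128), k=64):
    gray = np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float64)
    coeffs, _ = pywt.coeffs_to_array(pywt.wavedec2(gray, "haar"))
    flat = coeffs.ravel()
    keep = np.argsort(np.abs(flat))[-k:]     # indices of the k largest coefficients
    fp = np.zeros_like(flat)
    fp[keep] = flat[keep]
    return fp / np.linalg.norm(fp)

# similarity = np.dot(wavelet_fingerprint("a.jpg"), wavelet_fingerprint("b.jpg"))
# Values near 1.0 mean the images are nearly identical.
```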
You should definitely take a look at phash.
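For a quick experiment in Python, the third-party imagehash package provides a pHash implementation; a minimal sketch (file names are hypothetical):

```python
# Sketch using the third-party `imagehash` package (a pHash implementation
# on top of Pillow); file names are hypothetical.
import imagehash
from PIL import Image

h1 = imagehash.phash(Image.open("a.jpg"))
h2 = imagehash.phash(Image.open("b.jpg"))
print(h1 - h2)   # Hamming distance: 0 is identical, small values mean similar
```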
For image comparison there is this PHP project:
https://github.com/kennethrapp/phasher
And my little javascript clone:
https://redaktor.me/phasher/demo_js/index.html
Unfortunately this is "bitcount"-based but will recognize rotated images.
Another approach in JavaScript was to build a luminosity histogram from the image with the help of canvas. You can visualize a polygon histogram on the canvas and compare that polygon against your database (e.g. MySQL spatial ...).
A long time ago I worked on a system that had some similar characteristics, and this is an approximation of the algorithm we followed:
Divide the picture into zones. In our case we were dealing with 4:3 resolution video, so we used 12 zones. Doing this takes the resolution of the source images out of the picture.
For each zone, calculate an overall color - the average of all pixels in the zone
For the entire image, calculate an overall color - the average of all zones
So for each image, you're storing n + 1 integer values, where n is the number of zones you're tracking.
For comparisons, you also need to look at each color channel individually.
For the overall image, compare the color channels for the overall colors to see if they are within a certain threshold - say, 10%
If the images are within the threshold, next compare each zone. If all zones also are within the threshold, the images are a strong enough match that you can at least flag them for further comparison.
This lets you quickly discard images that are not matches; you can also use more zones and/or apply the algorithm recursively to get stronger match confidence.
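A sketch of this zone scheme, assuming Pillow and NumPy; the 3x4 grid and the 10%-of-full-scale tolerance are illustrative assumptions:

```python
# Sketch of the zone scheme: a 3x4 grid of zone colours plus one overall
# colour per image, compared channel by channel against a threshold.
# The grid size and the 10%-of-full-scale tolerance are illustrative.
# Assumes Pillow and NumPy; file names are hypothetical.
import numpy as np
from PIL import Image

GRID = (3, 4)   # rows x cols = 12 zones, matching 4:3 material

def zone_signature(path):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    zones = np.array([
        block.reshape(-1, 3).mean(axis=0)
        for row in np.array_split(img, GRID[0], axis=0)
        for block in np.array_split(row, GRID[1], axis=1)
    ])                                   # shape (12, 3): per-zone average colours
    return zones, zones.mean(axis=0)     # zone colours, overall colour

def channels_close(a, b, tol=0.10):
    return bool(np.all(np.abs(a - b) <= tol * 255))

def probably_match(path_a, path_b):
    zones_a, overall_a = zone_signature(path_a)
    zones_b, overall_b = zone_signature(path_b)
    if not channels_close(overall_a, overall_b):
        return False                     # cheap rejection on the overall colour
    return all(channels_close(za, zb) for za, zb in zip(zones_a, zones_b))
```

Because the overall-colour check is so cheap, most non-matches never reach the per-zone comparison.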
Similar to Ic's answer - you might try comparing the images at multiple resolutions. So each image gets saved as 1x1, 2x2, 4x4, ..., 800x800. If the lowest resolution doesn't match (subject to a threshold), you can immediately reject the image. If it does match, you can compare the images at the next higher resolution, and so on.
Also - if the images share any similar structure, such as medical images, you might be able to extract that structure into a description that is easier/faster to compare.
As of 2015 (back to the future... on this 2009 question which is now high-ranked in Google), image similarity can be computed using deep learning techniques. The family of algorithms known as autoencoders can create a vector representation which is searchable for similarity. There is a demo here.
One way you can do this is to resize the image and drop the resolution significantly (to 200x200, maybe?), storing a smaller (pixel-averaged) version for doing the comparison. Then define a tolerance threshold and compare each pixel. If the RGB values of all pixels are within the tolerance, you've got a match.
Your initial run through is O(n^2) but if you catalog all matches, each new image is just an O(n) algorithm to compare (you only have to compare it to each previously inserted image). It will eventually break down however as the list of images to compare becomes larger, but I think you're safe for a while.
After 400 days of running, you'll have 500,000 images, which means (discounting the time to resize the image down) 200(H)*200(W)*500,000(images)*3(RGB) = 60,000,000,000 comparisons. If every image is an exact match, you're going to be falling behind, but that's probably not going to be the case, right? Remember, you can discount an image as a match as soon as a single comparison falls outside your threshold.
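A sketch of that comparison, assuming Pillow and NumPy (catalog is a hypothetical dict of previously stored thumbnails):

```python
# Sketch: store a 200x200 pixel-averaged thumbnail per image and call two
# images a match when every channel of every pixel is within a tolerance.
# Assumes Pillow and NumPy; `catalog` is a hypothetical dict of thumbnails.
import numpy as np
from PIL import Image

SIZE = (200, 200)
TOL = 10   # per-channel tolerance, in 0-255 units

def thumb(path):
    return np.asarray(Image.open(path).convert("RGB").resize(SIZE, Image.BOX),
                      dtype=np.int16)

def is_match(thumb_a, thumb_b):
    return bool(np.all(np.abs(thumb_a - thumb_b) <= TOL))

# new = thumb("new.jpg")
# matches = [name for name, t in catalog.items() if is_match(new, t)]
```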
Do you literally want to compare every image against the others? What is the application? Maybe you just need some kind of indexing and retrieval of images based on certain descriptors? Then for example you can look at MPEG-7 standard for Multimedia Content Description Interface. Then you could compare the different image descriptors, which will be not that accurate but much faster.
So you want to do "fingerprint matching"? That's pretty different from "image matching". Fingerprint analysis has been studied in depth over the past 20 years, and several interesting algorithms have been developed to ensure the right detection rate (with respect to the FAR and FRR measures: False Acceptance Rate and False Rejection Rate).
I suggest you look at the LFA (Local Feature Analysis) class of detection techniques, mostly built on minutiae inspection. Minutiae are specific characteristics of any fingerprint and have been classified into several classes. Mapping a raster image to a minutiae map is what most public authorities actually do to file criminals or terrorists.
See here for further references
For iPhone image comparison and image similarity development check out:
http://sites.google.com/site/imagecomparison/
To see it in action, check out eyeBuy Visual Search on the iTunes AppStore.
It seems that specialised image hashing algorithms are an area of active research but perhaps a normal hash calculation of the image bytes would do the trick.
Are you seeking byte-identical images, or are you looking for images derived from the same source that may be in a different format or resolution (which strikes me as a rather hard problem)?
