How to change the dynamic range of an RGB image?

I have a 16-bit raw image (12 effective bits). I convert it to RGB and now I want to change the dynamic range. I created two map functions, visualized below: the first maps the values 0-500 to 0-100, and the second maps the remaining values (501-4095) to 101-255.
Now I want to apply the map functions to the RGB image. What I'm doing is iterating over each pixel, finding the appropriate function for each channel, and applying it to that channel. For example, take the pixel RGB = [100 2000 4000]. To the R channel I'll apply the first function, since 100 is in the 0-500 range, but to the G and B channels I'll apply the second function, since their values are in 501-4095.
But by doing it this way I'm actually changing the color of the pixel, because I apply different functions to the channels of the same pixel.
Can you suggest how to do this, or at least point me in a direction or to some articles?

What you're doing is a very straightforward imaging operation, frequently applied in image and video processing. Sometimes it's (imprecisely) called a lookup table (LUT), even though it's not always implemented via an actual lookup table. Examples of this are gamma adjustment or log encoding.
For instance, an example of this kind of encoding is sRGB, which is a gamma encoding from linear light. You can read about it here: http://en.wikipedia.org/wiki/SRGB. You'll see that it has a nonlinear adjustment.
The name LUT suggests a good way of doing it. If you can represent your image as uint8 or uint16 values, you can create a vector of desired output values for every possible input value. The lookup table has the same number of elements as the range of the variable type: if you were using uint8, you'd have a lookup table of 256 values. The lookup itself is then easy: you just use the image value as an index into your LUT to get the resulting value. That computational efficiency is why LUTs are so widely used.
In your case, since you're working in RGB space, it is acceptable to apply the curves in exactly the same way to each of the three color channels. RGB space is nice for that reason. However, for various reasons, sometimes different LUTs are implemented per-channel.
So if you had an image (we'll use one included in MATLAB and pretend it's 12 bit by scaling it):
someimage = uint16(imread('autumn.tif')).*16;
image(someimage.*16); % Need to multiply again to display 16 bit data scaled properly
For your LUT, you would implement this as:
lut = uint8([(0:500).*(1/5), (501:4095).*((255-101)/(4095-501)) + 79.5326]);
plot(lut); %Take a look at the lut
This makes the piecewise calculation you described in your question.
You could make a new image this way:
convertedimage = lut(double(someimage)+1);
image(convertedimage);
Note that because MATLAB indexing is one-based and uses doubles, you need to cast properly and add one. This doesn't slow things down as much as you might think; MATLAB is built to do this. I've been using MATLAB for decades and it still looks odd to me.
This method lets you get fancy with the LUT creation (logs, exp, whatever) and it still runs very fast.
In your case, your LUT only needs 4096 elements, since your input data is only 12 bits. You may want to be careful with the bounds, since a uint16 could hold higher values. One clean way to bound this is to combine min with end indexing:
convertedimage = lut(min(double(someimage)+1, end));
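If you are working outside MATLAB, the same piecewise LUT and clamped indexing can be sketched with NumPy (my own sketch, assuming a uint16 array holding 12-bit values):

import numpy as np

# Same piecewise mapping as above: 0-500 -> 0-100, 501-4095 -> 101-255
x = np.arange(4096)
lut = np.where(x <= 500,
               x * (100.0 / 500.0),
               101 + (x - 501) * ((255.0 - 101.0) / (4095.0 - 501.0))).round().astype(np.uint8)

def apply_lut(img12):                    # img12: uint16 array with 12-bit values
    return lut[np.clip(img12, 0, 4095)]  # clipping bounds the index, like min(..., end)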
Now, this has implemented your function, but perhaps you want a slightly different one. For instance, a common function of this type is a simple gamma adjustment. A gamma of 2.2 means that the incoming image values (scaled between 0 and 1) are raised to the 1/2.2 power. We can create such a LUT as follows:
lutgamma = uint8(256.*(((0:4095)./4095).^(1/2.2)));
plot(lutgamma);
Again, we apply the LUT with a simple indexing:
convertedimage = lutgamma(min(double(someimage)+1, end));
And we get the following image:
Using a smooth LUT will usually improve overall image quality. A piecewise linear LUT will tend to cause the resulting image to have odd discontinuities in the shaded regions.
These are so common in many imaging systems that LUTs have file formats. To see what I mean, look at this LUT generator from a major camera company. LUTs are a big deal, and it looks like you're on the right track.

I think you are referring to something that Photoshop calls "Enhance Monochromatic Contrast", which is described here - look at "Step 3: Try Out The Different Algorithms".
Basically, I think you find a single min and a single max across all 3 channels and apply the same scaling to all the channels, rather than scaling each channel individually with its own min and max.
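A minimal NumPy sketch of that idea (my own illustration, not Photoshop's exact algorithm), using one min and one max taken over all three channels:

import numpy as np

def stretch_all_channels(img):
    # One global min/max, so every channel gets the same linear scaling
    lo, hi = img.min(), img.max()
    scaled = (img.astype(np.float32) - lo) / max(int(hi) - int(lo), 1)
    return (scaled * 255).astype(np.uint8)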
Alternatively, you can convert to Lab mode (Lightness plus the a and b colour channels), apply your function to the Lightness channel only (without affecting the a and b channels, which hold the colour information), then transform back to RGB with the colours unaffected.
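A sketch of the Lab route (my own illustration, assuming scikit-image is available; curve is whatever tone mapping you want to apply to the lightness values):

import numpy as np
from skimage import color

def adjust_lightness(rgb_uint8, curve):
    lab = color.rgb2lab(rgb_uint8)       # L in [0, 100]; a/b carry the colour
    lab[..., 0] = curve(lab[..., 0])     # apply the curve to lightness only
    out = color.lab2rgb(lab)             # float RGB in [0, 1]
    return (np.clip(out, 0, 1) * 255).astype(np.uint8)

# e.g. brighter = adjust_lightness(img, lambda L: 100 * (L / 100) ** (1 / 1.5))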

Related

What RGB format do tensorflow image ops expect?

How does a tensorflow image op (like nn.conv2d) expect image channels to be represented?
an array of 3 values ranging from [0-255]
an array of 3 values ranging from [0-1]
an array of 3 one-hot arrays of size 255
something else?
I'm trying to understand why my learning rate is so poor and I'm guessing it's because my input is malformed.
conv2d accepts all the forms you mentioned here. It doesn't care what the input range is, as long as it is within the data-type range. But from a neural-network training perspective it's very important that the inputs are scaled properly. Not only the input image: at each layer we want the inputs to be scaled properly, which is why techniques like batch normalization are present in almost all recent networks; they improve training by enabling better flow of gradients through the network. So scaling the images to the [-1, +1] range (or to zero mean and unit variance) is important.
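For example, a minimal preprocessing sketch (my own, independent of any particular TensorFlow version) that rescales uint8 images before they are fed to the network:

import numpy as np

def to_unit_range(img_uint8):
    return img_uint8.astype(np.float32) / 127.5 - 1.0   # map [0, 255] -> [-1, 1]

def standardize(img_uint8):
    x = img_uint8.astype(np.float32)
    return (x - x.mean()) / max(float(x.std()), 1e-6)   # zero mean, unit variance per image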

Approaches for efficient compression of images with several focus planes

I am working on an application where images at different focus planes are acquired and currently stored inside a multipage TIFF. Unfortunately the TIFF-based compression techniques do not benefit from the signal redundancy across the different focus planes.
I found some resources about this here (ZPEG) and here (a JPEG2000 addon), but unfortunately they are all far from being a standard.
I was wondering if there is perhaps a video codec which could achieve great compression ratios in this scenario? I am also very open to any other ideas.
Here's a different approach: turning the cross-plane redundancy into spatial redundancy and then using standard image compression.
In the simplest form, just take strips of width x 1 pixel from every plane and stack them. As an image, that will look vertically smeared in a weird way. It's best if this lines up with DCT blocks (if applicable), to avoid having a sharp horizontal edge through a block, so the stack should probably be padded to a multiple of (usually) 8 planes by duplicating a plane. You could gain a bit more by optimizing the padding for minimum energy, but that's complicated, whereas duplicating is already pretty good and trivial.
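A NumPy sketch of that restacking (my own illustration), given the focus planes as an array of shape (planes, height, width):

import numpy as np

def interleave_planes(stack):
    # Pad to a multiple of 8 planes by duplicating the last plane, then
    # interleave one row from every plane so the cross-plane redundancy
    # becomes vertical spatial redundancy in a single (H * P, W) image.
    p, h, w = stack.shape
    pad = (-p) % 8
    if pad:
        stack = np.concatenate([stack, np.repeat(stack[-1:], pad, axis=0)])
    return stack.transpose(1, 0, 2).reshape(h * stack.shape[0], w)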
It obviously wouldn't compress well with unfiltered lossless compression, but PNG with a suitable filter (up, average or paeth) should work.
The problem with TIFF is that it does not support inter-component decorrelation in its baseline. There are some extensions (not very broadly supported) that allow storing other compression formats (such as a complete JPEG2000 JP2 file, extension 0x8798), but it is not guaranteed that a standard decoder will process them correctly.
If you can use any tool you want, close-to-optimal coding performance is probably obtained with a good spectral decorrelation transform (the KLT for lossy compression and the RKLT for lossless compression; see http://gici.uab.cat/GiciWebPage/downloads.php#spectral for a Java implementation of these transforms) followed by a good compression algorithm such as JPEG2000. On the other hand, this approach can be a bit complicated to implement and slow due to the KLT/RKLT transforms.
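For the lossy case, the KLT is essentially a PCA across the plane dimension; a rough NumPy sketch (my own, not the GICI implementation) looks like this:

import numpy as np

def klt_forward(stack):
    # stack: (P, H, W). Decorrelate across planes; keep mean and basis to invert.
    p, h, w = stack.shape
    X = stack.reshape(p, -1).astype(np.float64)   # one row per plane
    mean = X.mean(axis=1, keepdims=True)
    cov = np.cov(X - mean)                        # P x P covariance across planes
    _, eigvecs = np.linalg.eigh(cov)              # orthonormal basis
    Y = eigvecs.T @ (X - mean)                    # decorrelated components
    return Y.reshape(p, h, w), mean, eigvecs

def klt_inverse(Y, mean, eigvecs):
    p, h, w = Y.shape
    X = eigvecs @ Y.reshape(p, -1) + mean
    return X.reshape(p, h, w)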
Another, simpler approach is to use JPEG2000 with a DWT for spectral decorrelation. For instance, if you use the Kakadu implementation (kakadusoftware.com), you just need to pass the proper parameters when compressing. Here is an example invocation taken from http://kakadusoftware.com/wp-content/uploads/2014/06/Usage_Examples.txt:
Ai) kdu_compress -i catscan.rawl*35#524288 -o catscan.jpx -jpx_layers *
-jpx_space sLUM Creversible=yes Sdims={512,512} Clayers=16
Mcomponents=35 Msigned=no Mprecision=12
Sprecision=12,12,12,12,12,13 Ssigned=no,no,no,no,no,yes
Mvector_size:I4=35 Mvector_coeffs:I4=2048
Mstage_inputs:I25={0,34} Mstage_outputs:I25={0,34}
Mstage_collections:I25={35,35}
Mstage_xforms:I25={DWT,1,4,3,0}
Mnum_stages=1 Mstages=25
-- Compresses a medical volume consisting of 35 slices, each 512x512,
represented in raw little-endian format with 12-bits per sample,
packed into 2 bytes per sample. This example follows example (x)
above, but adds a multi-component transform, which is implemented
using a 3 level DWT, based on the 5/3 reversible kernel (the kernel-id
is 1, which is found in the second field of the `Mstage_xforms' record).
-- To decode the above parameter attributes, note that:
a) There is only one multi-component transform stage, whose instance
index is 25 (this is the I25 suffix found on the descriptive
attributes for this stage). The value 25 is entirely arbitrary. I
picked it to make things interesting. There can, in general, be
any number of transform stages.
b) The single transform stage consists of only one transform block,
defined by the `Mstage_xforms:I25' attribute -- there can be
any number of transform blocks, in general.
c) This block takes 35 input components and produces 35 output
components, as indicated by the `Mstage_collections:I25' attribute.
d) The stage inputs and stage outputs are not permuted in this example;
they are enumerated as 0-34 in each case, as given by the
`Mstage_inputs:I25' and `Mstage_outputs:I25' attributes.
e) The transform block itself is implemented using a DWT, whose kernel
ID is 1 (this is the Part-1 5/3 reversible DWT kernel). Block
outputs are added to the offset vector whose instance index is 4
(as given by `Mvector_size:I4' and `Mvector_coeffs:I4') and the
DWT has 3 levels. The final field in the `Mstage_xforms' record
is set to 0, meaning that the canvas origin for the multi-component
DWT is to be taken as 0.
f) Since a multi-component transform is being used, the precision
and signed/unsigned properties of the final decompressed (or
original compressed) image components are given by `Mprecision'
and `Msigned', while their number is given by `Mcomponents'.
g) The `Sprecision' and `Ssigned' attributes record the precision
and signed/unsigned characteristics of what we call the codestream
components -- i.e., the components which are obtained by block
decoding and spatial inverse wavelet transformation. In this
case, the first 5 are low-pass subband components, at the bottom
of the DWT tree; the next 4 are high-pass subband components
from level 3; then come 9 high-pass components from level 2 of
the DWT; and finally the 17 high-pass components belonging to
the first DWT level. DWT normalization conventions for both
reversible and irreversible multi-component transforms dictate
that all high-pass subbands have a passband gain of 2, while
low-pass subbands have a passband gain of 1. This is why all
but the first 5 `Sprecision' values have an extra bit -- remember
that missing entries in the `Sprecision' and `Ssigned' arrays
are obtained by replicating the last supplied value.

Feature Vector Representation Neural Networks

Objective: Digit recognition by using Neural Networks
Description: images are normalized to 8 x 13 pixels. For each row, every black pixel is represented by 1 and every white pixel by 0. Every image is thus represented by a vector of vectors (one binary vector per row).
Problem: is it possible to use a vector of vectors in neural networks? If not, how can the image be represented?
Combine rows into 1 vector?
Convert every row to its decimal format. Example: Row1: 11111000 = 248 etc.
Combining them into one vector simply by concatenation is certainly possible. In fact, you should notice that arbitrary reordering of the data doesn't change the results, as long as it's consistent between training and classification.
As to your second approach, I think (I am really not sure) you might lose some information that way.
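A minimal sketch of that concatenation (my own illustration), assuming the digit image is already a 13 x 8 binary array:

import numpy as np

image = np.random.randint(0, 2, size=(13, 8))   # stand-in for one 8 x 13 digit
x = image.astype(np.float32).reshape(-1)        # 104-element flat input vector
print(x.shape)                                  # (104,)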
To use multidimensional input, you'd need multidimensional neurons (which I suppose your formalism doesn't support). Sadly you didn't give any info on your network structure, which I think is your main source of problems and confusion. Whenever you evaluate a feature representation, you need to know how the input layer will be structured: if it's impractical, you probably need a different representation.
Your multidimensional vector:
A network that accepts 1 image as input has only 1 (!) input node containing multiple vectors (one per row). This is the worst possible representation of your data. If we:
flatten the input hierarchy: We get 1 input neuron for every row.
flatten the input hierarchy completely: we get 1 input neuron for every pixel.
Think about all 3 approaches and what they do to your data. The latter approach is almost always as bad as the first. Neural networks work best with features. Features are not just restructurings of the pixels (your row vectors). They should be meta-data you can derive from the pixels: brightness, locations where we go from black to white, bounding boxes, edges, shapes, centres of mass, ... there is tons of stuff that can be chosen as features in image processing. You have to think about your problem and choose one (or more).
In the end, when you ask how to "combine rows into 1 vector", you're just rephrasing "find a feature vector for the whole image". You definitely don't want to concatenate your vectors and feed raw data into the network; you need to extract information before you use the network. This pre-processing is critical.
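As an illustration of extracting information before using the network, here is a small hand-crafted feature sketch of my own (per-row ink counts and black/white transition counts) rather than raw pixels:

import numpy as np

def simple_features(image):
    # image: 13 x 8 binary array
    counts = image.sum(axis=1)                                # ink per row
    transitions = (np.diff(image, axis=1) != 0).sum(axis=1)   # value changes per row
    return np.concatenate([counts, transitions]).astype(np.float32)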
For further information on which features might be viable for OCR, just read some papers. The most successful architecture at the moment is the Convolutional Neural Network. A starting point for the topic of feature extraction is here.
1) Yes, combining into one vector is suitable; I use it this way: http://vimeo.com/52775200
2) No, it is not suitable, because after normalization from the range 0-255 to the range 0-1, different rows give approximately the same values, so you lose data.

How can I choose an image with higher contrast in PHP?

For a thumbnail-engine I would like to develop an algorithm that takes x random thumbnails (crop, no resize) from an image, analyzes them for contrast and chooses the one with the highest contrast. I'm working with PHP and Imagick but I would be glad for some general tips about how to compute contrast of imagery.
It seems that many things are easier than computing contrast, for example counting colors, computing luminosity, etc.
What are your experiences with the analysis of picture material?
I'd do it this way (pseudocode):
L[256] = {0, 0, 0, ...}
loop over each pixel:
    luminance = avg(R, G, B)
    increment L[luminance] by 1
for i = 0 to 255:
    if L[i] < C: L[i] = 0   // C = threshold of your choice
find index of first and last non-zero value of L[]
contrast = last - first
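The same idea in Python (my own translation of the pseudocode, with C as a threshold parameter):

import numpy as np

def histogram_spread(rgb, C=20):
    # Contrast = distance between the first and last luminance levels
    # that occur at least C times (mirrors the pseudocode above).
    lum = rgb.mean(axis=2).astype(np.uint8)          # avg(R, G, B) per pixel
    hist = np.bincount(lum.ravel(), minlength=256)
    kept = np.nonzero(hist >= C)[0]
    return int(kept[-1] - kept[0]) if kept.size else 0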
In looking for the image "with the highest contrast," you will need to be very careful in how you define contrast for the image. In the simplest way, contrast is the difference between the lowest intensity and the highest intensity in the image. That is not going to be very useful in your case.
I suggest you use a histogram approach to describe the contrast of a given image and then compare the properties of the histograms to determine the image with the highest contrast as you define it. You could use a variety of well known containers to represent the histogram in code, or construct a class to meet your specific needs. (I am not implying that you need to create a histogram in the form of a chart – just a statistical representation of the intensity values.) You could use the variance of each histogram directly as a measure of contrast, or use the standard deviation if that is easier to work with.
The key really lies in how you define the contrast of the image. In general, I would define a high contrast image as one with values present for all, or nearly all, the possible values. And I would further add that in this definition of a high contrast image, the intensity values of the image will tend to be distributed across the range of possible values in a uniform way.
Using this approach, a low contrast image would tend to have relatively few discrete intensity values and they would tend to be closely grouped together rather than uniformly distributed. (As a general rule, they will also tend to be grouped toward the center of the range.)
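A sketch of that histogram-based measure (my own illustration), using the standard deviation of the luminance values as the contrast score:

import numpy as np

def contrast_stddev(rgb):
    lum = rgb.mean(axis=2)      # simple luminance estimate
    return float(lum.std())     # larger spread = higher contrast

# best = max(crops, key=contrast_stddev)   # crops: your list of candidate thumbnails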

How do I efficiently segment 2D images into regions/blobs of similar values?

How do I segment a 2D image into blobs of similar values efficiently? The given input is an array of integers, which holds the hue for non-gray pixels and the brightness for gray pixels.
I am writing a virtual mobile robot in Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in computer vision, but when it runs on a robot, performance matters, so I wanted some input. The algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample, in colour space and in number of pixels, use a vision method (probably mean shift), and upscale the result.
This is good because downsampling also increases the robustness to noise, and makes it more likely that you get meaningful segments.
You could use floodfill to smooth edges afterwards if you need smoothness.
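A rough OpenCV sketch of that pipeline (my own; it assumes an 8-bit, 3-channel image, and the sp/sr radii are guesses you would tune):

import cv2

def rough_segments(bgr, sp=15, sr=30):
    small = cv2.pyrDown(cv2.pyrDown(bgr))               # downsample (and blend/denoise)
    shifted = cv2.pyrMeanShiftFiltering(small, sp, sr)  # cluster similar colours
    return cv2.resize(shifted, (bgr.shape[1], bgr.shape[0]),
                      interpolation=cv2.INTER_NEAREST)  # upscale the result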
Some more thoughts (in response to your comment).
1) Did you blend as you downsampled, e.g. y[i] = (x[2i] + x[2i+1]) / 2? This should reduce noise.
2) How fast do you want it to be?
3) Have you tried dynamic mean shift? (Also google "dynamic X" for any algorithm X.)
Not sure if it is efficient enough, but you could try using a Kohonen neural network (or self-organizing map, SOM) to group the similar values, where each pixel contains the original color and position and only the color is used for the Kohonen grouping.
You should read up on it before you implement this though, as my knowledge of Kohonen networks goes only as far as knowing that they are used for grouping data - so I don't know how viable or fast they would be for your scenario.
There are also Hopfield Networks. They can be mangled into grouping from what I read.
What I have now:
Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
For each pixel in the image where the corresponding buffer value is still UNSEGMENTED, flood the buffer using that pixel's value.
a. The border check for the flooding: a neighbouring pixel is accepted if its value is within EPSILON (currently set to 10) of the originating pixel's value.
b. Flood filling algorithm.
Possible issue:
The border check in 2.a. is called many times by the flood-filling algorithm. I could turn it into a lookup if I could precalculate the borders using edge detection, but that might add more time than the current check.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
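For reference, a compact Python sketch of the approach described above (seed scan plus EPSILON-bounded flood fill); this is my own restatement, not the original Java code:

from collections import deque
import numpy as np

UNSEGMENTED = -1
EPSILON = 10

def flood_segment(values):
    # values: 2D integer array (hue/brightness). Each blob gets its own label;
    # a pixel joins a blob if it is within EPSILON of the blob's seed pixel.
    h, w = values.shape
    seg = np.full((h, w), UNSEGMENTED, dtype=np.int32)
    label = 0
    for sy in range(h):
        for sx in range(w):
            if seg[sy, sx] != UNSEGMENTED:
                continue
            seed = int(values[sy, sx])
            seg[sy, sx] = label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and seg[ny, nx] == UNSEGMENTED
                            and abs(int(values[ny, nx]) - seed) <= EPSILON):
                        seg[ny, nx] = label
                        queue.append((ny, nx))
            label += 1
    return seg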
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few starting points. If you are expecting around 10 blobs, picking random points on that order may suffice. The drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood-fill is the connected-components algorithm. So:
Cheaply classify your pixels. e.g. divide pixels in colour space.
Run the cc to find the blobs
Retain the blobs of significant size
This approach is widely used in early vision approaches. For example in the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".
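A sketch of that recipe using SciPy (my own illustration; the bin width and minimum blob size are guesses to tune):

import numpy as np
from scipy import ndimage

def blobs(values, bin_width=16, min_size=50):
    # Cheap classification (quantise into bins), connected components per bin,
    # keep only blobs of significant size. Returns (bin, label) pairs.
    quantised = values // bin_width
    found = []
    for q in np.unique(quantised):
        labels, n = ndimage.label(quantised == q)   # connected components
        counts = np.bincount(labels.ravel())
        found += [(q, lab) for lab in range(1, n + 1) if counts[lab] >= min_size]
    return found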
