Size of Input and ConvNet - image

In the CS231n course on Convolutional Neural Networks, the ConvNet notes say:
INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.
CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in volume such as [32x32x12] if we decided to use 12 filters.
From the notes, I understand that the INPUT will contain images of 32 (width) x 32 (height) x 3 (depth). But later, the result of the CONV layer is [32x32x12] if we decided to use 12 filters.
Where did the 3, the depth of the image, go?
Please help me out here; thank you in advance.

It gets "distributed" into each feature map (the result of convolving with one filter).
Before thinking about 12 filters, just think of one. That is, you are applying convolution with a filter of shape [filter_width x filter_height x input_channel_number]. Because the filter has the same number of channels as the input, you are basically applying input_channel_number independent 2D convolutions, one per input channel, and then summing the results. The result is a single 2D feature map.
Now you can repeat this 12 times to get 12 feature maps and stack them together to get your [32 x 32 x 12] feature volume. That's why the filter bank is a 4D tensor of shape [filter_width x filter_height x input_channel_number x output_channel_number], which in your case would be something like [3x3x3x12] (note that the ordering may vary between frameworks, but the operation is the same).
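To make this concrete, here is a minimal MATLAB-style sketch (the sizes and names are illustrative, not taken from the course notes); it builds the [32 x 32 x 12] volume by summing one 2D convolution per input channel for each of the 12 filters:
input = rand(32, 32, 3);                      % [height x width x input channels]
filters = rand(3, 3, 3, 12);                  % [filter_h x filter_w x input_ch x output_ch]
output = zeros(32, 32, 12);
for k = 1:12                                  % one pass per output feature map
    fmap = zeros(32, 32);
    for c = 1:3                               % 2D convolution on each input channel...
        fmap = fmap + conv2(input(:,:,c), filters(:,:,c,k), 'same');
    end
    output(:,:,k) = fmap;                     % ...summed into a single 2D map
end
% output is 32 x 32 x 12: the depth comes from the number of filters, not from the input's 3 channels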

So, this is fun. I have read the notes again and found the answer, which was only a "scroll down" away. Before, I thought a filter was, for example, 32 x 32 (with no depth). The truth is:
A typical filter on a first layer of a ConvNet might have size 5x5x3 (i.e. 5 pixels width and height, and 3 because images have depth 3, the color channels).
During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at any position.

Related

How do successive convolutional layers work?

If my first convolution has 64 filters and my second has 32 filters, will I have:
1 image -> Conv(64 filters) -> 64 filtered images -> Conv(32 filters) -> 64 x 32 = 2048 filtered images
Or:
1 image -> Conv(64 filters) -> 64 filtered images -> Conv(32 filters) -> 32 filtered images
If it is the second answer: what is going on between the 64 filtered images and the second Conv?
Thanks for your answer; I can't find a good tutorial that explains this clearly, it's always rushed...
Your first point is correct. Convolutions are essentially ways of altering and extracting features from data. We do this by creating m images, each looking at a certain frame of the original image. On the second convolutional layer, we then make n images for each convolved image from the first layer.
So m * n would be the total number of images.
To further this point,
a convolution works by making feature maps of an image. When you have successive convolutional layers, you are making feature maps of feature maps. I.e. if I start with 1 image, and my first convolutional layer is of size 20, then I have 20 images (more specifically, feature maps) at the end of convolution 1. Then let's say I add a second convolution of size 10. What happens then is that I am making 10 feature maps for every 1 image. Thus, it would be 20 * 10 = 200 feature maps.
Let's say for example you have a 50x50 pixel image, and a convolutional layer with a filter of size 5x5. What happens (if you don't have padding or anything else) is that you "slide" across the image and get a weighted average of the pixels at each position of the slide. You would then get an output feature map of size 46x46. Let's say you do this 20 times (i.e. a 5x5x20 convolution); you would then have as output 20 feature maps of size 46x46. In the diagram mentioned in the VGG neural network post below, the diagram only shows the number of feature maps to be made from the incoming feature maps, NOT the total number of feature maps.
I hope this explanation was thorough!
Here we have the architecture of the VGG-16
In VGG-16 we have 4 convolution widths: 64, 128, 256, 512.
And in the architecture we see that we don't have 64 images, then 64*128 images, etc.,
but just 64 images, 128 images, etc.
So the right answer was not the first but the second. And that brings back my second question:
"What is going on between the 64 filtered images and the second Conv?"
I think that between a 64 conv and a 32 conv there is finally only 1 filter, but applied over a layer of two pixels, so it divides the thickness of the conv by 2.
And between a 64 conv and a 128 conv there are only 2 filters over a layer of one pixel, so it multiplies the thickness of the conv by 2.
Am I right?
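For reference, here is a minimal sketch of the filter shapes involved (assuming 3x3 filters and 'same' padding, which the posts above do not specify): each filter in the second layer spans all 64 incoming feature maps and sums over them, which is why the second layer outputs 32 maps rather than 64 * 32.
x  = rand(224, 224, 3);     % input image (illustrative size)
W1 = rand(3, 3, 3, 64);     % conv1: 64 filters, each spanning the 3 input channels
W2 = rand(3, 3, 64, 32);    % conv2: 32 filters, each spanning all 64 incoming maps
% conv1 output: 224 x 224 x 64 (one map per filter, summed over the 3 channels)
% conv2 output: 224 x 224 x 32 (one map per filter, summed over the 64 incoming maps)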

Average over a 3x3x3 voxel in a 192x192x24 volume

I am processing image files with measured intensity, basically extracting voxels of size 1x1x1 pixels. The image files form a volume. To avoid peak intensities, I would like to find a way to average over 3x3x3 voxels.
My problem is getting my head around this, because the data is a shape within the image, separated by zeros and other values. So, first of all, I considered a for-loop with an if-statement. My reasoning so far: MATLAB stores the volume as one long array, so with a simple for loop it should be easy to find a non-zero value and its adjacent values, and take the average over those values. The problem comes when I have to take the z dimension into account.
This is clearly not optimal, and I find it hard to account for boundary effects.
I hope I'm interpreting your question right, but you want to find the average over a 3 x 3 x 3 voxel volume for each voxel in the input image where each input voxel acts as the centre of each 3 x 3 x 3 voxel volume to be averaged. If you have the option of using MATLAB's built-in functions, consider using N-D convolution with convn. Don't use loops here because it will be notoriously slow. For convn, the first parameter is the 3D image, and the second parameter is a 3 x 3 x 3 kernel with values all equal to 1/27. You also have the option of specifying what happens along the border should your convolution kernel go beyond the limits of the input image. Usually, you want to return an output image that's the same size as the input and so you may want to specify the 'same' flag as the third optional parameter. This averaging mechanism also assumes that the outer edges are zero-padded.
Therefore, supposing your image is stored in im, do something like this:
%// Create kernel of all 1/27 in a 3 x 3 x 3 matrix
kernel = ones(3,3,3);
kernel = kernel / numel(kernel);
%// Perform N-D convolution
out = convn(double(im), kernel, 'same'); %// Cast to double for precision
out = cast(out, class(im)); %// Recast back to original data type
Alternatively, if you have access to the Image Processing Toolbox, use imfilter instead. The difference between this and convn is that imfilter was written using the Intel Integrated Performance Primitives (IPP), so performance will definitely be faster:
%// Create kernel of all 1/27 in a 3 x 3 x 3 matrix
kernel = ones(3,3,3);
kernel = kernel / numel(kernel);
%// Perform N-D convolution
out = imfilter(im, kernel);
The added bonus is that you aren't required to change the input type. imfilter automatically infers this, does the processing respecting the input image's original type and the output type of imfilter is the same as the input type. With convn, you must ensure that your data is floating-point before using it.
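If the zero background should not pull the averages down, one common variant (not part of the answer above, just a sketch) is to normalize each neighbourhood sum by the number of nonzero voxels it contains:
kernel = ones(3,3,3);                        % unnormalized box kernel
mask = double(im ~= 0);                      % 1 where there is data, 0 in the background
sums = convn(double(im), kernel, 'same');    % neighbourhood sums of the intensities
counts = convn(mask, kernel, 'same');        % number of nonzero voxels per neighbourhood
out = sums ./ max(counts, 1);                % average over the available voxels only
out(mask == 0) = 0;                          % keep the background at zero
This also softens the boundary effects, because voxels near the edge of the shape are averaged only over the neighbours that actually carry data.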

Detecting individual images in an array of images

I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism, and then I'm almost done.
I'm using a line-scan sensor, so it's one pixel wide by 2000 pixels tall. The data stream I will be sending to the PC over USB with an FTDI FIFO bridge will be just 1-byte values for the pixels. The scanner will pull through an entire strip of 36 frames, so I will end up scanning the whole strip. To begin with I'm willing to split them up manually in Photoshop, but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically, I need to find a way for the PC to detect the nearly black strips between the images on the film, isolate the images, and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
use FFTw to calculate a discrete fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))); that f corresponds to the spacing of the black bars: number of rows / f is the bar distance.
S(f) is complex, and thus has an argument; arctan(imag(S(f_max))/real(S(f_max)))*number of rows will give you the position of the bars.
To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of 0 around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border the filmstrip material, x being the coordinate along that row. Now, use a simplistic high-pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
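As an illustration of the first steps, here is a rough MATLAB-style sketch (the poster works in C++ with FFTW, but the operations map one-to-one; the file name and the threshold of half the maximum are made up):
img = double(imread('strip.png'));        % scanned strip, one row per sensor line
s = mean(img, 2);                         % average pixel value per row, s(n)
b = double(s > 0.5 * max(s));             % threshold: 1 inside frames, 0 on the dark bars
S = fft(b - mean(b));                     % spectrum of the (zero-mean) thresholded signal
[~, k] = max(abs(S(2:floor(end/2))));     % strongest periodicity, skipping the DC bin
frame_period = numel(b) / k;              % rows between successive dark bars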
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: use the resulting coordinates with something like OpenCV, or any other library capable of reading images, specifying sub-images by coordinates, and saving them as new images.

How to change dynamic range of an RGB image?

I have a 16-bit raw image (12 effective bits). I convert it to RGB and now I want to change the dynamic range. I created 2 mapping functions. You can see them visualized below. As you can see, the first function maps the values 0-500 to 0-100, and the second one maps the remaining values to 101-255.
Now I want to apply the mapping functions to the RGB image. What I'm doing is iterating through each pixel, finding the appropriate function for each channel, and applying it to that channel. For example, say the pixel is RGB = [100 2000 4000]. On the R channel I'll apply the first function, since 100 is in the 0-500 range. But on the G and B channels I'll apply the second function, since their values are in 501-4095.
But by doing it this way I'm actually changing the color of the pixel, since I apply different functions to the different channels of the same pixel.
Can you suggest how to do it or at least give me a direction or show some articles?
What you're doing is a very straightforward imaging operation, frequently applied in image and video processing. Sometimes it's (imprecisely) called a lookup table (LUT), even though it's not always implemented via an actual lookup table. Examples of this are gamma adjustment or log encoding.
For instance, an example of this kind of encoding is sRGB, which is a gamma encoding from linear light. You can read about it here: http://en.wikipedia.org/wiki/SRGB. You'll see that it has a nonlinear adjustment.
The name LUT implies a good way of doing it. If you can make your image a uint8 or uint16 valued set, you can create a vector of desired output values for any input value. The lookup table has the same number of elements as the possible range of the variable type. If you were using a uint8, you'd have a lookup table of 256 values. Then the lookup is easy, you just use the image value as an index into your LUT to get the resulting value. That computational efficiency is why LUTS are so widely used.
In your case, since you're working in RGB space, it is acceptable to apply the curves in exactly the same way to each of the three color channels. RGB space is nice for that reason. However, for various reasons, sometimes different LUTs are implemented per-channel.
So if you had an image (we'll use one included in MATLAB and pretend it's 12 bit by scaling it):
someimage = uint16(imread('autumn.tif')).*16;
image(someimage.*16); % Need to multiply again to display 16 bit data scaled properly
For your LUT, you would implement this as:
lut = uint8([(0:500).*(1/5), (501:4095).*((255-101)/(4095-501)) + 79.5326]);
plot(lut); %Take a look at the lut
This makes the piecewise calculation you described in your question.
You could make a new image this way:
convertedimage = lut(double(someimage)+1);
image(convertedimage);
Note that because MATLAB indexes with doubles--one based--you need to cast properly and add one. This doesn't slow things down as much as you may think; MATLAB is made to do this. I've been using MATLAB for decades and this still looks odd to me.
This method lets you get fancy with the LUT creation (logs, exp, whatever) and it still runs very fast.
In your case, your LUT only needs 4096 elements since your input data is only 12 bits. You may want to be careful with the bounds, since it's possible a uint16 could have higher values. One clean way to bound this is to use the min and end functions:
convertedimage = lut(min(double(someimage)+1, end));
Now, this has implemented your function, but perhaps you want a slightly different function. For instance, a common function of this type is a simple gamma adjustment. A gamma of 2.2 means that the incoming image values are scaled by taking them to the 1/2.2 power (if scaled between 0 and 1). We can create such a LUT as follows:
lutgamma = uint8(256.*(((0:4095)./4095).^(1/2.2)));
plot(lutgamma);
Again, we apply the LUT with a simple indexing:
convertedimage = lutgamma(min(double(someimage)+1, end));
And we get a gamma-adjusted image.
Using a smooth LUT will usually improve overall image quality. A piecewise linear LUT will tend to cause the resulting image to have odd discontinuities in the shaded regions.
These are so common in many imaging systems that LUTs have file formats. To see what I mean, look at this LUT generator from a major camera company. LUTs are a big deal, and it looks like you're on the right track.
I think you are referring to something that Photoshop calls "Enhance Monochromatic Contrast", which is described here - look at "Step 3: Try Out The Different Algorithms".
Basically, I think you find a single min and a single max across all 3 channels and apply the same scaling to all the channels, rather than scaling each channel individually with its own min and max.
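A rough sketch of that idea in MATLAB (the variable rgbimage is illustrative):
lo = double(min(rgbimage(:)));                      % single min across all 3 channels
hi = double(max(rgbimage(:)));                      % single max across all 3 channels
stretched = (double(rgbimage) - lo) ./ (hi - lo);   % identical scaling for every channel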
Alternatively, you can convert to Lab mode (Lightness plus a and b), apply your function to the Lightness channel (without touching the a and b channels, which hold the colour information), and then transform back to RGB, leaving the colour unaffected.
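And a minimal sketch of that Lab route (assuming the Image Processing Toolbox and some mapping function f that you have already defined on the range [0, 1]; the names are illustrative):
lab = rgb2lab(im2double(rgbimage));   % L is in [0, 100]; a and b carry the colour
L = lab(:,:,1) / 100;                 % normalize lightness to [0, 1]
lab(:,:,1) = 100 * f(L);              % apply the tone mapping to lightness only
out = lab2rgb(lab);                   % back to RGB, colour unaffected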

How to remove gaussian noise?

I have to remove Gaussian noise from this image (earlier I had to filter it and add the noise). Then I have to use the function "o", and my grade is based on how low the result of this function is. I keep trying different things, but I can't remove the noise well enough to get a good grade. Any help, please?
img=imread('liftingbody.png');
img=double(img)/255;                            % normalize to [0, 1]
maska1=[1 1 1; 1 5 1; 1 1 1]/13;                % smoothing mask (weights sum to 1)
odfiltrowany=imfilter(img,maska1);              % filtered image
zaszumiony=imnoise(odfiltrowany,'gaussian');    % add Gaussian noise
nowy=wiener2(zaszumiony);                       % adaptive Wiener denoising
nowy4=medfilt2(nowy);                           % median filtering on top
o=1/512.*sqrt(sum(sum(img-nowy4).^2));          % error measure to minimize
subplot(311); imshow(img);                      % original
subplot(312); imshow(zaszumiony);               % noisy
subplot(313); imshow(nowy);                     % denoised
Try convolving a Gaussian filter with your noisy image to reduce the Gaussian noise, like below:
nowx=conv2(zaszumiony,fspecial('gaussian',[3 3],1.5),'same')/(sum(sum(fspecial('gaussian',[3 3],1.5))));
It should reduce your o function somewhat.
Try playing around with the strength of the filter (i.e. the 1.5 value) and the size of the kernel (i.e. [3 3] value) to reduce the noise to a minimum.
Adding to ALM865's answer, you can also use imfilter. In fact, this is the recommended function to use for images, as imfilter has optimizations in place specifically for images; conv2 is the more general function for any 2D signal.
I have also answered how to choose the standard deviation, and ultimately the size, of your Gaussian filter here: By which measures should I set the size of my Gaussian filter in MATLAB?
In essence, once you choose the standard deviation you want, you build a (floor(6*sigma) + 1) x (floor(6*sigma) + 1) Gaussian kernel to use in your filtering operation. Assuming sigma = 2, you would get a 13 x 13 kernel. As ALM865 has said, you can create a Gaussian kernel using fspecial: you specify the 'gaussian' flag, followed by the size of the kernel and then the standard deviation. As such:
sigma = 2;
width = 6*sigma + 1;
kernel = fspecial('gaussian', [width width], sigma);
out = imfilter(zaszumiony, kernel, 'replicate');
imfilter takes in the image you want to filter, the convolution kernel you want to use to filter the image, and an optional flag that specifies what happens along the image pixel borders when the kernel doesn't fit completely inside the image. 'replicate' means that it simply copies the pixels along the borders, thus replicating them. There are other options, such as padding with a value (usually zero), circular padding and symmetric padding.
Play around with the standard deviation until you get what you believe is a good result.
