I am using
loss = 'mse'
in Keras for an autoencoder model that reconstructs greyscale images. My batch size is 1. A single loss value is being produced during training.
I can't seem to find an answer to this question anywhere. How does Keras calculate the MSE loss value for these two images (the input image and its reconstruction)? They're represented as 2D NumPy arrays. Does it compute the squared difference between each pair of pixels and then divide by the number of pixels (given that the batch size is 1)?
Is the process the same if more than one greyscale image is fed into the model: computing the squared pixel differences across all the images and then dividing by the total number of pixels in all of them?
Many thanks
from keras import backend as K

def mse(y_true, y_pred):
    # element-wise squared difference, then the mean over the last axis
    return K.mean(K.square(y_pred - y_true), axis=-1)
This is the code for the MSE. The operations (difference and square) are element-wise (pixel by pixel); the mean then divides by the number of values (pixels) along the last axis. Keras averages the resulting per-sample values over everything that remains, so the single value reported for a batch is effectively the mean squared difference over all pixels of all images in the batch.
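A quick sanity check in plain NumPy (a minimal sketch; the shapes and values are made up for illustration):

import numpy as np

y_true = np.random.rand(3, 28, 28)   # a made-up batch of 3 greyscale images
y_pred = np.random.rand(3, 28, 28)   # their "reconstructions"

per_row = np.mean(np.square(y_pred - y_true), axis=-1)  # what mse() returns
loss = per_row.mean()    # averaging the rest gives one scalar per batch
print(loss)              # equals np.mean(np.square(y_pred - y_true))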
I am having trouble understanding the RawIntDen measurement in ImageJ. It outputs numbers as large as 300,000, but pixel intensity is only measured on a scale of 0-255. Why is this scale so much larger, and what does the calculation represent?
Thank you for your help!!
The value raw integrated density (RawIntDen) is the sum of all pixel values in the ROI (region of interest). Dividing this value by the number of pixels in the ROI gives the Mean. Since it is a sum over many pixels, its value is usually much larger than the maximum pixel value allowed by the image's bit depth.
There is another measurement called IntDen which is the Area multiplied by the Mean. In an uncalibrated image, RawIntDen and IntDen are equal.
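For illustration, the relationship can be reproduced in NumPy (a sketch with a made-up 8-bit ROI; in an uncalibrated image the area is just the pixel count, so IntDen equals RawIntDen):

import numpy as np

roi = np.random.randint(0, 256, size=(30, 40))  # made-up 30x40 8-bit ROI

raw_int_den = roi.sum()            # sum of all pixel values
mean = raw_int_den / roi.size      # RawIntDen / number of pixels
int_den = roi.size * mean          # Area * Mean (uncalibrated: area = pixel count)
print(raw_int_den, mean, int_den)  # a 1200-pixel ROI can easily exceed 300,000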
I'm working on a project for recognizing multiple digits in an image using a neural network trained on the MNIST dataset. The first step is detecting digits in a binary image produced with a CCL (connected-component labelling) algorithm. The problem is that all detected digits should be size-normalized with anti-aliasing to fit in a 20x20 pixel box while preserving their aspect ratio (http://yann.lecun.com/exdb/mnist/).
So, how can I solve this problem?
Cheers.
I've found useful code for grayscale resizing with bilinear interpolation at the following link: http://tech-algorithm.com/articles/bilinear-image-scaling/
After resizing the image so that its longer side is 20 pixels, it is centered on a 28x28 image according to the centroid of the digit.
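If you would rather not port that Java code, the aspect-preserving resize can be sketched with Pillow instead (my own suggestion, assuming an 8-bit grayscale array; Image.BILINEAR matches the interpolation used in the link):

from PIL import Image
import numpy as np

def resize_longest_side(arr, target=20):
    # resize an 8-bit grayscale array so its longer side is `target` pixels,
    # preserving the aspect ratio; bilinear interpolation provides smoothing
    img = Image.fromarray(arr)
    scale = target / max(img.size)
    new_size = (max(1, round(img.size[0] * scale)),
                max(1, round(img.size[1] * scale)))
    return np.array(img.resize(new_size, Image.BILINEAR))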
The Python code for calculating the centroid of the digit to be centered:

import numpy as np

img2 = np.array(img2, dtype=float)  # grayscale digit image
centroid = np.zeros(2)              # weighted row/column accumulators
summ = 0.0                          # total intensity (mass)
for i in range(img2.shape[0]):
    for j in range(img2.shape[1]):
        if img2[i][j]:
            summ += img2[i][j]
            centroid[0] += i * img2[i][j]  # intensity-weighted row index
            centroid[1] += j * img2[i][j]  # intensity-weighted column index
centroid /= summ                # centre of mass of the digit
centroid = np.rint(centroid)    # round to the nearest pixel
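Once the centroid is known, the shift itself can be sketched like this (a helper of my own, assuming a 28x28 canvas and using np.roll for the shift):

import numpy as np

def center_by_centroid(digit, size=28):
    # paste the resized digit onto a size x size canvas, then shift it so
    # that the digit's centre of mass lands on the canvas centre
    canvas = np.zeros((size, size), dtype=digit.dtype)
    h, w = digit.shape
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = digit
    ys, xs = np.nonzero(canvas)
    weights = canvas[ys, xs].astype(float)
    cy = int(round((ys * weights).sum() / weights.sum()))
    cx = int(round((xs * weights).sum() / weights.sum()))
    return np.roll(canvas, (size // 2 - cy, size // 2 - cx), axis=(0, 1))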
I have read two references about the SIFT algorithm, here and here, and I don't fully understand how only some keypoints are detected, considering that the algorithm works on differences of Gaussians calculated at several resolutions (which they call octaves). Here are the steps of the technique as I understood them from the paper.
Given the input image, blur it with Gaussian filters using different sigmas, resulting in Gaussian-filtered images. In the paper, they use 5 Gaussian filters per octave (they state that two adjacent Gaussian-filtered images are produced with sigma and k * sigma as the Gaussian filter parameters), and they consider 4 octaves. So there are 20 Gaussian-filtered images in total (5 per octave), but each octave's 5 images are processed individually.
For each octave, calculate 4 difference-of-Gaussians (DoG) images from the 5 Gaussian-filtered images by subtracting adjacent ones. So we now have 16 DoG images in total, but each octave's 4 DoG images are considered individually.
Find local extrema (maximum or minimum values) by comparing each pixel of each DoG image with its 26 neighbours. Of these, 8 pixels are at the same scale as the pixel (in a 3x3 window), 9 are in a 3x3 window at the scale above (the adjacent DoG image in the same octave), and 9 others are in a 3x3 window at the scale below.
Once these local extrema are found in the different octaves, they must be refined by eliminating low-contrast points and weak edge points. The paper filters bad candidates using a threshold on a Taylor-expansion value (for contrast) and a threshold on the ratio of eigenvalues of a Hessian matrix (for edge responses).
(this is the part I don't understand perfectly): For each interest point that survived (in each octave, I believe), they take a neighbourhood around it and calculate the gradient magnitude and orientation of each pixel in that region. They build a gradient-orientation histogram covering 360 degrees and select the highest peak, plus any peaks higher than 80% of the highest peak. The orientation of the keypoint is then refined by fitting a parabola to the 3 histogram values closest to each peak in order to interpolate the peak position (I really don't understand this part).
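To make steps 1-2 concrete, here is a minimal sketch of the pyramid construction as I understand it (the sigma and k values are illustrative guesses, and I downsample the octave's base image, which is exactly what I ask about in question 4 below):

import cv2
import numpy as np

def dog_pyramid(image, octaves=4, scales=5, sigma=1.6, k=2 ** 0.5):
    # build 5 Gaussian-filtered images per octave and 4 DoG images per octave
    pyramid = []
    base = image.astype(np.float32)
    for _ in range(octaves):
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma * k ** i)
                     for i in range(scales)]
        # adjacent Gaussian images are subtracted to give the DoG images
        dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
        pyramid.append(dogs)
        # down-sample by a factor of 2 before the next octave
        base = base[::2, ::2]
    return pyramid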
What I don't understand
1- Neither the tutorial nor the original paper is clear on how a single list of keypoints is produced when we are dealing with multiple octaves (image resolutions). For example, suppose I detected 1000 keypoints in the first octave, 500 in the second, 250 in the third, and 125 in the fourth. The SIFT algorithm returns the following data about each keypoint: 1- (x, y) coordinates, 2- scale (what is that?), 3- orientation, and 4- the feature vector (which I easily understood how to build). There are also Python functions in OpenCV that can draw these keypoints on the original image (thus, the first octave), but how does that work if the keypoints are detected in different octaves, and so the algorithm considers DoG images with different resolutions?
2- I don't understand part 5 of the algorithm very well. It is used to define the orientation of the keypoint, right? Can somebody explain it in other words so that maybe I can understand?
3- For finding local extrema per octave (step 3), they don't explain how to do it in the first and last DoG images. Since we have 4 DoG images, it is only possible to do it in the second and third ones.
4- There is another thing the author wrote that completely muddled my understanding of the approach:
Figure 1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process repeated.
What? Does he downsample only one Gaussian image? How can the process be repeated that way? I mean, the differences of Gaussians are originally computed by filtering the INPUT IMAGE with different sigmas, so I believe it is the INPUT IMAGE, not a Gaussian image, that must be resampled. Or did the author mean that THE GAUSSIAN IMAGES from a given octave are downsampled and the process is repeated for the next octave?
Here is the problem: I have a 256x256 image to which a wavelet transform is applied. As a result I get a 16x64 table of coefficients (3.066568, 3.386725, and so on). Do you have any idea how the feature-vector length can be calculated?
I am doing a project on image quality assessment. I converted the image to grayscale and divided the entire image into 8x8 blocks using the mat2cell function. I did this for two images, and now I want to calculate the covariance between them (i.e., the covariance between a block of image 1 and the corresponding block of image 2). Note that both are the same image: one pristine, without distortions, and one distorted.
First convert each image to a double matrix:

I1 = double(imread('photo.jpg'));           % pristine image
I2 = double(imread('photo_distorted.jpg')); % distorted image (file name is just an example)

Then calculate the covariance. Note that cov(I1) alone gives the covariance of the columns of a single matrix; to get the covariance between the two images, pass both as flattened vectors:

C = cov(I1(:), I2(:));   % 2x2 matrix; the off-diagonal entries are the covariance
For single matrix input, C has size [size(A,2) size(A,2)] based on the number of random variables (columns) represented by A. The variances of the columns are along the diagonal. If A is a row or column vector, C is the scalar-valued variance.
For two-vector or two-matrix input, C is the 2-by-2 covariance matrix between the two random variables. The variances are along the diagonal of C.
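For reference, a minimal NumPy equivalent of the two-matrix case (a sketch with made-up blocks; np.cov on two flattened arrays likewise returns the 2x2 covariance matrix):

import numpy as np

I1 = np.random.rand(8, 8)                # stand-ins for the two 8x8 blocks
I2 = I1 + 0.05 * np.random.rand(8, 8)    # slightly "distorted" copy

C = np.cov(I1.ravel(), I2.ravel())  # 2x2: variances on the diagonal,
print(C[0, 1])                      # covariance off the diagonal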