I want to find the Euclidean distance between some datasets. For each class I have about 50 training samples and 5 test samples. The problem is that the data for each sample is stored in a cell and is in m x n format rather than the preferred 1 x n format.
I see that MATLAB has a built-in function for k-nearest neighbours using Euclidean distance, knnsearch. The problem is that it expects the samples to be arranged as rows and columns, and it gives an error when the data are provided in cells.
Is there any way to do this classification in the format my data is in? Or is it preferred that I convert the data to 1-D format (maybe by using some sort of dimensionality reduction)?
Suppose there are 5 classes/models, each represented by a cell of size 32 x N; below is a screenshot from MATLAB. Each cell holds the features of many training images, so cell 1 is a representation of person 1 and so on.
Now, similarly, I have test images that need to be identified. The features of each test image are also stored in cells, so I have 10 unknown images with their features extracted and stored in cells, each cell representing a single unknown image. Below again is a screenshot from MATLAB showing their dimensions.
As can be seen, they are all of varying dimensions. I know this is not the ideal format for kNN, since here each image is represented by a multi-dimensional matrix rather than a single row of variables. How do I compute Euclidean distances on such data?
Thank you
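For illustration, one possible way to apply Euclidean distance directly to such cells would be to match every 32-dimensional column of a test cell against the columns of each class cell and score each class by the mean nearest-neighbour distance. A rough, untested sketch (trainCells and testCells are placeholder variable names, not from the question, and pdist2 requires the Statistics Toolbox):

    % Compare variable-size descriptor sets directly: each column is one 32-dim descriptor.
    nClasses  = numel(trainCells);                       % e.g. 5 cells of size 32 x N
    nTests    = numel(testCells);                        % e.g. 10 cells of size 32 x M
    predicted = zeros(nTests, 1);

    for t = 1:nTests
        scores = zeros(nClasses, 1);
        for c = 1:nClasses
            % pdist2 expects observations in rows, so transpose the 32 x N matrices
            D = pdist2(testCells{t}', trainCells{c}');   % (cols of test) x (cols of class)
            scores(c) = mean(min(D, [], 2));             % mean nearest-neighbour distance
        end
        [~, predicted(t)] = min(scores);                 % class with the smallest score
    end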
I have read two references about the SIFT algorithm here and here, and I am not truly understanding how only some key points end up being detected, considering that the algorithm works on differences of Gaussians calculated at several resolutions (they call them octaves). Here are the steps of the technique, according to what I've understood from the paper:
1- Given the input image, blur it with Gaussian filters using different sigmas, resulting in a set of Gaussian-filtered images. In the paper they use 5 Gaussian filters per octave (two adjacent Gaussian-filtered images use sigma and k * sigma as filter parameters), and they consider 4 octaves in the algorithm. So there are a total of 20 Gaussian-filtered images (5 per octave), but the 5 Gaussian-filtered images of each octave are processed individually.
2- For each octave, calculate 4 difference-of-Gaussian (DoG) images from the 5 Gaussian-filtered images by simply subtracting adjacent Gaussian-filtered images. So we now have a total of 16 DoG images, but the 4 DoG images of each octave are considered individually.
3- Find local extrema (maximum or minimum values) by comparing each pixel in a DoG image with its 26 neighbours: 8 pixels in a 3x3 window at the same scale, 9 in a 3x3 window at the scale above (the next DoG image of the same octave) and 9 in a 3x3 window at the scale below.
4- Once these local extrema are found in the different octaves, refine them by eliminating low-contrast points and weak edge points. Bad candidates are filtered using a threshold on a Taylor-expansion value and an eigenvalue-ratio threshold computed from a Hessian matrix.
5- (this part I don't understand perfectly): For each interest point that survived (in each octave, I believe), consider a neighbourhood around it and calculate the gradient magnitude and orientation of each pixel in that region. Build a gradient-orientation histogram covering 360 degrees and select the highest peak, as well as any peaks higher than 80% of the highest peak. The orientation of the keypoint is then refined by fitting a parabola to the 3 histogram values closest to each peak in order to interpolate the peak position (I really don't understand this part perfectly).
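To make step 5 more concrete, here is a rough, simplified sketch of the orientation histogram and the parabola-based peak refinement (a reconstruction for illustration only; it omits the Gaussian weighting of the gradient samples that the paper applies):

    % Build a 36-bin orientation histogram from the gradients around a keypoint and
    % refine each dominant peak with a parabola through the peak bin and its neighbours.
    function orientations = keypointOrientations(patch)
        % patch: a small grayscale neighbourhood (e.g. 16x16) around the keypoint
        [gx, gy] = gradient(double(patch));
        mag = sqrt(gx.^2 + gy.^2);
        ang = mod(atan2d(gy, gx), 360);            % gradient directions in 0..360 degrees

        nbins = 36;
        bins  = floor(ang / 10) + 1;               % 10-degree bins
        h     = accumarray(bins(:), mag(:), [nbins 1]);

        peakThresh   = 0.8 * max(h);               % keep peaks above 80% of the highest
        orientations = [];
        for b = 1:nbins
            l = mod(b - 2, nbins) + 1;  r = mod(b, nbins) + 1;   % circular neighbours
            if h(b) >= peakThresh && h(b) > h(l) && h(b) > h(r)
                % parabola through (-1,h(l)), (0,h(b)), (+1,h(r)); vertex offset in bins
                offset = 0.5 * (h(l) - h(r)) / (h(l) - 2*h(b) + h(r));
                orientations(end+1) = mod((b - 1 + offset) * 10 + 5, 360); %#ok<AGROW>
            end
        end
    end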
What I am not understanding
1- The tutorial and even the original paper are not clear on how to detect a single key point when we are dealing with multiple octaves (image resolutions). For example, suppose I have detected 1000 key points in the first octave, 500 in the second, 250 in the third and 125 in the fourth octave. The SIFT algorithm returns the following data about each key point: 1) the (x,y) coordinates, 2) the scale (what is that?), 3) the orientation and 4) the feature vector (which I easily understood how to build). There are also Python functions in OpenCV that can draw these keypoints on the original image (thus, the first octave), but how can that be if the keypoints are detected in different octaves and thus in DoG images with different resolutions? (See the sketch after point 4 below.)
2- I don't understand part 5 of the algorithm very well. It is used for defining the orientation of the keypoint, right? Can somebody explain it to me in other words so that maybe I can understand?
3- For finding local extrema per octave (step 3), they don't explain how to do that in the first and last DoG images. Since we are considering 4 DoG images, it is only possible to do it in the second and third DoG images.
4- There is another thing the author wrote that completely confused my understanding of the approach:
Figure 1: For each octave of scale space, the initial image is
repeatedly convolved with Gaussians to produce the set of scale space
images shown on the left. Adjacent Gaussian images are subtracted to
produce the difference-of-Gaussian images on the right. After each
octave, the Gaussian image is down-sampled by a factor of 2, and the
process repeated.
What? Does he downsample only one Gaussian image? How can the process be repeated by doing that? I mean, the difference of Gaussians is originally obtained by filtering the INPUT IMAGE differently. So I believe it is the INPUT IMAGE, and not the Gaussian image, that must be resampled. Or did the author forget to write that THE GAUSSIAN IMAGES from a given octave are downsampled and the process is repeated for the next octave?
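To make the interpretation concrete, here is a rough sketch of one reading of the quoted caption (not verified against the paper's implementation; the value of k and the stand-in image are arbitrary): each octave blurs its own base image, builds the DoGs, and then the Gaussian image whose blur is 2*sigma0 is downsampled to seed the next octave. Keypoints from octave o would then map back to the first octave by multiplying their coordinates by 2^(o-1), which would also explain how they can all be drawn on the original image (point 1 above).

    % Rough sketch of the octave loop as I read the caption. Requires the Image Processing Toolbox.
    inputImage = im2double(imread('cameraman.tif'));    % stand-in for the real input image
    nOctaves = 4;  nScales = 5;  sigma0 = 1.6;  k = sqrt(2);
    base = inputImage;
    keypoints = zeros(0, 3);                            % rows: [x y octave]

    for o = 1:nOctaves
        G = cell(1, nScales);                           % the 5 Gaussian-filtered images
        for s = 1:nScales
            G{s} = imgaussfilt(base, sigma0 * k^(s-1));
        end
        D = cell(1, nScales-1);                         % the 4 difference-of-Gaussian images
        for s = 1:nScales-1
            D{s} = G{s+1} - G{s};
        end

        % ... extrema detection on D{2} and D{3}, plus refinement, would go here,
        % appending rows [x y o] to `keypoints` ...

        % Downsample one of this octave's GAUSSIAN images (the one with blur 2*sigma0,
        % i.e. G{3} here, 2 images from the top of the stack) to seed the next octave.
        base = G{3}(1:2:end, 1:2:end);
    end

    % Map keypoint coordinates from their octave back to the original image resolution
    xyOriginal = keypoints(:, 1:2) .* 2.^(keypoints(:, 3) - 1);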
I am trying to look for trends in a large data matrix, say 50,000 x 1000 elements. If I use the image command the matrix is displayed on the screen. I would like to know how MATLAB's image function decides which elements of the matrix to display, considering there are not enough pixels on my screen to show so many elements. In other words, which downsampling algorithm does it apply to the matrix?
Nearest-neighbour interpolation (as described in the interp2 function) is used. For alternatives, check this question: How to avoid image display artifacts in Matlab?
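If the nearest-neighbour picking hides the trends you are looking for, one workaround is to downsample the matrix yourself before displaying it, so that you control the method. A sketch (imresize assumes the Image Processing Toolbox):

    A = rand(50000, 1000);                    % stand-in for the real data matrix

    % Option 1: bilinear reduction to roughly screen resolution before display
    Asmall = imresize(A, [1000 1000], 'bilinear');
    figure, imagesc(Asmall), colorbar

    % Option 2: block-wise maxima, so isolated extreme values stay visible
    blk  = 50;                                % rows per block (50000 / 50 = 1000 rows kept)
    Amax = reshape(A, blk, [], size(A, 2));   % split the rows into blocks of 50
    Amax = squeeze(max(Amax, [], 1));         % 1000 x 1000 matrix of block maxima
    figure, imagesc(Amax), colorbar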
I have 2 2D depth/height maps of equal dimensions (256 x 256). Each pixel/cell of a depth map contains a float value. Some pixels have no information and are currently set to NaN. The percentage of non-NaN cells can vary from roughly 20% to 80%. The depth maps are taken of the same area by point-sampling an underlying common surface.
The idea is that the images represent a partial, yet overlapping, sampling of an underlying surface, and I need to align them to create a combined sampled representation of that surface. If this is done blindly, the combined images have discontinuities, especially in the z dimension (the float value).
What would be a fast method of aligning the 2 images? The translation in the x and y directions should be minimal, only a few pixels (~0 to 10). But the float values of one image may need to be adjusted to align the images better. So minimizing the difference between the 2 images is the goal.
Thanks for any advice.
If your images are lacunar (have voids), one way is the exhaustive computation of a matching score in the window of overlap, ruling out the voids. FFT convolution will not apply. (Workload = overlap area * X-range * Y-range.)
If both images differ in noise only, use the SAD matching score. If they also differ by the reference zero, subtract the average height before comparing.
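As a rough, untested illustration of that exhaustive search (A and B stand for the two 256 x 256 maps with NaN in the voids, and the +/-10 pixel range comes from the question):

    bestScore = inf;  bestShift = [0 0];
    for dy = -10:10
        for dx = -10:10
            Bs = circshift(B, [dy dx]);        % shifted copy of B (simple wrap-around;
                                               % cropping instead would avoid edge wrap)
            valid = ~isnan(A) & ~isnan(Bs);    % overlap: cells known in both maps
            if nnz(valid) == 0, continue, end
            dA = A(valid);  dB = Bs(valid);
            dB = dB - mean(dB) + mean(dA);     % remove any constant reference-height offset
            score = mean(abs(dA - dB));        % SAD per overlapping cell
            if score < bestScore
                bestScore = score;  bestShift = [dy dx];
            end
        end
    end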
You can achieve some acceleration by using an image pyramid, but you'll need to handle the voids.
Another approach could be to fill in the gaps with some interpolation method that ensures the interpolated values are compatible between the two images.
Fooling around with OCR.
I have a set of binary images of the digits 0-9 that can be used as training data, and another set of unknown digits within the same range. I want to classify the digits in the unknown set using the k-nearest-neighbour algorithm.
I've done some studying of the algorithm, and I've read that the best approach is to take quantitative characteristics, plot each training sample in a feature space with those characteristics as the axes, do the same for each image in the unknown set, and use the k-nearest-neighbour algorithm to find the closest points, something like what is done here.
What characteristics would be best suited to something like this?
In a simple case, as phs mentioned in his comment, pixel intensities are used. The images are resized to a standard size like 20x20, 10x10, etc., and the whole image is expressed as a vector of 400 or 100 elements respectively.
Such an example is shown here: Simple Digit Recognition OCR in OpenCV-Python
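In MATLAB terms the same idea could look like the sketch below (illustrative only: trainImgs, testImgs and trainLabels are assumed variable names; imresize needs the Image Processing Toolbox and knnsearch the Statistics Toolbox):

    sz = [20 20];
    toRow = @(img) reshape(imresize(double(img), sz), 1, []);    % flatten to a 1 x 400 row

    Xtrain = cell2mat(cellfun(toRow, trainImgs(:), 'UniformOutput', false));
    Xtest  = cell2mat(cellfun(toRow, testImgs(:),  'UniformOutput', false));

    k = 3;
    idx = knnsearch(Xtrain, Xtest, 'K', k);    % indices of the k nearest training rows
    predicted = mode(trainLabels(idx), 2);     % majority vote over the k neighbours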
Or you can look for features like moments, centroid, area, perimeter, Euler number, etc.
If your images are grayscale, you can go for the Histogram of Oriented Gradients (HOG). Here is an example with SVM; you can try adapting it to kNN: http://docs.opencv.org/trunk/doc/py_tutorials/py_ml/py_svm/py_svm_opencv/py_svm_opencv.html#svm-opencv
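A corresponding HOG-based sketch, reusing the hypothetical variable names from the previous snippet (extractHOGFeatures assumes the Computer Vision Toolbox; any other HOG implementation would do):

    % Replace raw pixel vectors with HOG descriptors, then classify with kNN as before
    toHOG  = @(img) extractHOGFeatures(imresize(double(img), [20 20]), 'CellSize', [4 4]);
    Htrain = cell2mat(cellfun(toHOG, trainImgs(:), 'UniformOutput', false));
    Htest  = cell2mat(cellfun(toHOG, testImgs(:),  'UniformOutput', false));

    idx = knnsearch(Htrain, Htest, 'K', 3);
    predicted = mode(trainLabels(idx), 2);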
I know how to write a similarity function for data points in Euclidean space (by taking the negative minimum squared error). Now, if I want to test my clustering algorithms on images, how can I write a similarity function for data points in images? Do I base it on their RGB values, or what? And how?
I think we need to clarify some points:
Are you clustering only on color? Then take the RGB values of the pixels and apply your metric function (minimize the sum of squared errors, or just calculate the SAD, Sum of Absolute Differences).
Are you clustering on a spatial basis (within an image)? In this case you should take care of position, as you specified for Euclidean space, just considering the image as your samples' domain. It's a 2D space anyway... 3D if you consider color information too (see next).
Are you looking for 3D information from the image (2D position + 1D color)? This is the most probable case. As a first approach, consider segmentation techniques if your image shows regular or well-defined shapes. If that fails, or you want a less hand-tuned algorithm, consider reducing the 3D information space to 2D or even 1D by doing PCA on the data. By analyzing the principal components you can drop useless information from your collection and/or exploit the intrinsic data structure in some way.
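As a rough sketch of that last case, one row per pixel of [x y R G B], clustered directly and/or after PCA (the demo image, the scaling of the position columns and the value of k are arbitrary choices for illustration; kmeans and pca need the Statistics Toolbox):

    img = im2double(imread('peppers.png'));
    [h, w, ~] = size(img);
    [X, Y] = meshgrid(1:w, 1:h);

    % One row per pixel: [x y R G B]; scale position so it is comparable to color
    feats = [X(:)/w, Y(:)/h, reshape(img, [], 3)];

    % Cluster directly in this 5-D space ...
    k = 5;
    labels = kmeans(feats, k, 'MaxIter', 200);

    % ... or reduce the dimensionality first, as suggested above
    [~, score] = pca(feats);
    labels2 = kmeans(score(:, 1:2), k);      % cluster on the first two principal components

    imagesc(reshape(labels, h, w)), axis image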
The topic would need much more than one post to be covered properly, but I hope this helps a bit.