Difference between observations and variables in Matlab - image

I'm kind of ashamed to even ask this, but here goes. In every MATLAB help file where the input is an N x D matrix X, MATLAB describes the matrix arrangement as:
Data, specified as a numeric matrix. The rows of X correspond to
observations, and the columns correspond to variables.
(The above is taken from the help for kmeans.)
I'm kind of confused about what MATLAB means by observations and variables.
Suppose I have a data matrix composed of 100 images. Each image is represented by a feature vector of size 128 x 1. So are the 100 images my observations and the 128 features my variables, or is it the other way around?
Will my data matrix be of size 128 x 100 or 100 x 128?

Eugene's explanation in statistical and probabilistic terms is great, but I would like to explain it more from the viewpoint of data analysis and image processing.
Think of an observation as one sample from your data set. In this case, one observation is one image. Each sample has some dimensionality associated with it, that is, a number of variables used to represent it.
For example, if we had a set of 100 2D Cartesian points, the number of observations is 100, while the dimensionality, or the total number of variables used to describe a point, is 2: we have an x coordinate and a y coordinate. As such, in the MATLAB universe, we'd place all of these data points into a single matrix. Each row of the matrix denotes one point in your data set. Therefore, the matrix you would create here is 100 x 2.
Now, go back to your problem. We have 100 images and each image can be expressed by 128 features. This looks suspiciously like you are trying to use SIFT or SURF to represent an image, so think of this as a situation where each image can be described by a 128-dimensional vector, or a histogram with 128 bins. Each feature is one dimension of the image's representation. Therefore, you would have a 100 x 128 matrix. Each row represents one image, where each image is represented as a 1 x 128 feature vector.
In general, MATLAB's machine learning and data analysis algorithms assume that your matrix is M x N, where M is the total number of points that make up your data set while N is the dimensionality of one such point in your data set. In MATLAB's universe, the total number of observations is equal to the total number of points in your data set, while the total number of features / distinct attributes to represent one sample is the total number of variables.
Observation: One sample from your data set
Variable: One feature / attribute that helps describe an observation or sample in your data set.
Number of observations: Total number of points in your data set
Number of variables: Total number of features / attributes that make up an observation or sample in your data set.
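The row/column convention above can be illustrated with a small sketch. It is written in plain Python rather than MATLAB, and the helper names `shape` and `to_observation_rows` are made up for this illustration. It assumes the features were first collected as 128 x 100 (one column per image) and shows the transpose needed to get the observations-in-rows layout.

```python
def shape(matrix):
    """Return (number of rows, number of columns) of a list-of-lists matrix."""
    return len(matrix), len(matrix[0])


def to_observation_rows(matrix):
    """Transpose a variables-by-observations matrix so each row is one observation."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical data: 128 features (rows) for each of 100 images (columns).
features_by_image = [[0.0] * 100 for _ in range(128)]

# After transposing, each row is one image and each column is one feature.
X = to_observation_rows(features_by_image)
n_observations, n_variables = shape(X)
print(n_observations, n_variables)
```

In MATLAB itself the same step is just a transpose of the 128 x 100 matrix before passing it to a function such as kmeans.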

It looks like you are talking about some specific statistical/probabilistic functions. In statistics or probability theory there are some random variables that are results of some kind of measurements/observations over time (or some other dimension). So such a matrix is just a collection of N measurements of D different random variables.


Showing two images with the same colorbar in log

I have two sparse matrices "Matrix1" and "Matrix2" of the same size p x n.
By sparse matrix I mean that it contains a lot of exactly zero elements.
I want to show the two matrices under the same colormap and a unique colorbar. Doing this in MATLAB is straightforward:
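% Shared color limits computed over both matrices
bottom = min(min(Matrix1(:)), min(Matrix2(:)));
top = max(max(Matrix1(:)), max(Matrix2(:)));
% The subplot/imagesc calls are assumed; only the caxis lines appear in the post
subplot(1,2,1); imagesc(Matrix1);
caxis manual
caxis([bottom top]);
subplot(1,2,2); imagesc(Matrix2);
caxis manual
caxis([bottom top]);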
My problem:
When I show a matrix using imagesc(Matrix), the noise (or background) that always shows up with imagesc(10*log10(Matrix)) is hidden.
That is why I want to show 10*log10 of the matrices. But in this case, the minimum value will be -Inf since the matrices are sparse, and caxis will give an error because bottom is equal to -Inf.
What do you suggest? How can I modify the above code?
Any help will be very appreciated!
A very important point is that, assuming your matrices hold non-negative values, the minimum value in a sparse matrix will always be 0. Leveraging this, a very simple way to address your problem is to add 1 inside the log operation so that values that are 0 in the original matrix also map to 0 after the log. This avoids the -Inf problem that you're encountering. In fact, this is a very common way of visualizing the magnitude of the Fourier Transform. Adding 1 inside the logarithm ensures that the output has no negative values, yet the shape of the curve remains intact, as the effect is simply a translation of the log curve by 1 unit to the left.
Therefore, simply do imagesc(10*log10(1 + Matrix));. The minimum is then always bounded at 0, while the maximum is determined by the largest value in Matrix.
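To make the effect on the shared color limits concrete, here is a minimal sketch in plain Python (not MATLAB). The function names `log_scale` and `shared_limits` and the two small matrices are invented for illustration, and the sketch assumes all entries are non-negative.

```python
import math


def log_scale(matrix):
    """Apply 10*log10(1 + x) element-wise; zeros stay at 0 instead of -Inf."""
    return [[10.0 * math.log10(1.0 + x) for x in row] for row in matrix]


def shared_limits(*matrices):
    """Return one (bottom, top) pair covering every value in all matrices."""
    values = [x for m in matrices for row in m for x in row]
    return min(values), max(values)


# Two small "sparse" example matrices (mostly exact zeros).
matrix1 = [[0.0, 9.0], [0.0, 0.0]]
matrix2 = [[0.0, 0.0], [99.0, 0.0]]

bottom, top = shared_limits(log_scale(matrix1), log_scale(matrix2))
print(bottom, top)  # bottom is finite, so it can be used as a color limit
```

The resulting bottom and top play the same role as the values passed to caxis in the question.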

Savitzky–Golay filter for 2D images

I would like to ask about the Savitzky–Golay filter on 2D images.
What are the best coefficients and order to choose for finding local details in the image?
Moreover, if someone has an explanation of the coefficients and the orders on 2D images, it would be perfect.
Thanks in advance
Please check out this website:
UPDATE: (Thank you for the suggestion, #Rasclatt)
Which has been reproduced here:
Two-dimensional smoothing and differentiation can also be applied to tables of data values, such as intensity values in a photographic image which is composed of a rectangular grid of pixels.[16] [17] The trick is to transform part of the table into a row by a simple ordering of the indices of the pixels. Whereas the one-dimensional filter coefficients are found by fitting a polynomial in the subsidiary variable, z to a set of m data points, the two-dimensional coefficients are found by fitting a polynomial in subsidiary variables v and w to a set of m × m data points. The following example, for a bicubic polynomial and m = 5, illustrates the process, which parallels the process for the one dimensional case, above.[18]
The square of 25 data values, d1 − d25
becomes a vector when the rows are placed one after another.
The Jacobian has 10 columns, one for each of the parameters a00 − a03 and 25 rows, one for each pair of v and w values. Each row has the form
The convolution coefficients are calculated as
The first row of C contains 25 convolution coefficients which can be multiplied with the 25 data values to provide a smoothed value for the central data point (13) of the 25.
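The procedure in the quoted passage can be sketched as a least-squares fit. This is only an illustration under stated assumptions: the bicubic is taken to contain every term v^i w^j with i + j <= 3 (which gives the 10 parameters mentioned above), v is taken as the row offset and w as the column offset from the window centre, and the coefficients are taken as C = (J^T J)^(-1) J^T, the usual least-squares pseudo-inverse. The function names are hypothetical and the code is plain Python with no external libraries.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]


def sg2d_center_coefficients(m=5):
    """Return the m*m smoothing coefficients (first row of C) for the centre pixel."""
    half = m // 2
    # Exponent pairs (i, j) of v**i * w**j with i + j <= 3; (0, 0) comes first.
    powers = [(i, j) for i in range(4) for j in range(4 - i)]
    # Window offsets, rows placed one after another.
    coords = [(v, w) for v in range(-half, half + 1) for w in range(-half, half + 1)]
    J = [[float(v ** i * w ** j) for (i, j) in powers] for (v, w) in coords]
    n = len(powers)
    # Normal matrix J^T J (symmetric).
    JtJ = [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]
    # Because J^T J is symmetric, the first row of its inverse equals the
    # solution x of (J^T J) x = e0; the first row of C is then J x.
    e0 = [1.0] + [0.0] * (n - 1)
    x = solve(JtJ, e0)
    return [sum(jk * xk for jk, xk in zip(row, x)) for row in J]


coeffs = sg2d_center_coefficients(5)
kernel = [coeffs[r * 5:(r + 1) * 5] for r in range(5)]  # reshape to 5 x 5
print(sum(coeffs))  # close to 1: a constant image is left unchanged
```

Convolving an image with the 5 x 5 kernel gives the smoothed value at each pixel; other rows of C would give the fitted derivatives instead of the smoothed value.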
Check out the links below, which use SURE (Stein's unbiased risk estimator) to minimize the mean squared error between your estimate and the image. This method is useful for denoising and data smoothing.
This link covers optimization of the parameters for the 1D Savitzky–Golay filter (this will be helpful for understanding the 2D part)
This link covers optimization of the parameters for the 2D Savitzky–Golay filter

How to compute a distance between two byte arrays?

I need to calculate the distance between two byte arrays of the same length. In particular, I am looking for an approach to obtain a distance with the following features:
if the two arrays are very similar to each other, then the distance should be very small;
otherwise, the distance should be very large.
Basically, I'm looking for a way to measure the difference between two arrays.
UPDATE: As suggested, I provide the following additional information about the content of a byte array. A sequence of bytes contains the features of an image, so an image is divided into small regions, and some color information is measured for each region (each byte encodes information relating to a single region): when a bit is set within a byte, then it means that a given feature is present within the region.
Therefore, given two sequences of bytes, I would like to compare using a suitable distance measure. I read about Bhattacharyya distance, but I do not know how to apply it in this case, so I was wondering if there were other distance measures to compare two byte arrays.
You can use the Euclidean distance for this. Basically you add the squares of the difference between each pair of elements in your arrays and extract the square root from that sum.
See http://en.wikipedia.org/wiki/Euclidean_distance
However, there are other distance metrics that could apply better to your data, for example Pearson correlation, cosine similarity, Hamming distance, etc.
In order of complexity:
L1 = Sum | xi - yi |
or L2 = sqrt( Sum | xi - yi |^2 )
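As a rough sketch of the metrics mentioned in these answers, the following Python functions compute the L1, L2 (Euclidean) and bit-level Hamming distances between two equal-length byte arrays. The function names are made up for illustration. Since the question says each bit flags the presence of a feature, the Hamming distance (number of differing bits) may be the most natural fit, though that is a judgment call rather than something the answers state.

```python
import math


def _check_lengths(a, b):
    if len(a) != len(b):
        raise ValueError("byte arrays must have the same length")


def l1_distance(a, b):
    """Sum of absolute differences between corresponding bytes."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of bit positions in which the two arrays differ."""
    _check_lengths(a, b)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = bytes([0b00001111, 0b10000000])
second = bytes([0b00000000, 0b10000000])
print(l1_distance(first, second), l2_distance(first, second), hamming_distance(first, second))
```

All three return 0 for identical arrays and grow as the arrays diverge, which matches the two requirements listed in the question.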

Given data range, need clever algorithm to calculate granularity of graph axis scales

Drawing a graph. Have data points which range from A to B, and want to decide on a granularity for drawing the axis scales. Eg, for 134 to 151 the scale might run from 130 to 155, to start and end on "round" numbers in the decimal system. But the numbers might run from 134.31 to 134.35, in which case a scale from 130 to 135 would (visually) compress out the "significance" in the data -- it would be better to draw the scale from 134 to 135, or maybe even 134.3 to 134.4. And the data values might instead run from 0.013431 to 0.013435, or from 1343100 to 1343500.
So I'm trying to figure out an elegant way to calculate the "granularity" to round the low bound down to and the upper bound up to, to produce a "pleasing" chart. One could just "hack" it somehow, but that produces little confidence that "odd" cases will be handled well.
Any ideas?
Just an idea:
Add about 10% to your range, tune this figure empirically
Divide size of range by number of tick marks you want to have
Take the base 10 logarithm of that number
Multiply the result by three, then round to the nearest integer
The remainder modulo 3 will tell you whether you want the least significant decimal to change in steps of 1, 2, or 5
The result of an integer division by 3 will tell you the power of ten to use
Take the (extended) range and compute the extremal tick points it contains, according to the tick frequency just computed
Ensure that all data points actually lie within that range, add ticks if not
If needed, add minor ticks by decreasing the integer above by one
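The steps above can be sketched in Python as follows. The function names, the default of 5 tick intervals, and the 10% padding default are assumptions chosen for illustration. The sketch also merges the last steps by rounding the low bound down and the high bound up to a multiple of the step, which guarantees that all data lies inside the ticks. It assumes hi > lo.

```python
import math


def nice_step(lo, hi, n_ticks=5, pad=0.10):
    """Pick a tick step of the form 1, 2 or 5 times a power of ten."""
    span = (hi - lo) * (1.0 + pad)       # extend the range by about 10%
    raw_step = span / n_ticks            # ideal (non-round) step
    k = round(3 * math.log10(raw_step))  # log10, times three, rounded
    mantissa = (1, 2, 5)[k % 3]          # remainder selects 1, 2 or 5
    exponent = k // 3                    # integer division selects the power of ten
    return mantissa * 10.0 ** exponent


def axis_ticks(lo, hi, n_ticks=5, pad=0.10):
    """Return tick positions that start and end on multiples of the step."""
    step = nice_step(lo, hi, n_ticks, pad)
    first = math.floor(lo / step)
    last = math.ceil(hi / step)
    return [i * step for i in range(first, last + 1)]


print(axis_ticks(134, 151))
print(nice_step(134.31, 134.35))
```

Python's floor-style `%` and `//` matter here: for steps below 1 the rounded value k is negative, and floor semantics still yield a remainder in 0..2 and the correct power of ten. Minor ticks could be obtained by calling the same mapping with k - 1.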
I found a very helpful calculation which gives results very similar to the axis scales of Excel graphs:
It is written for Excel, but I translated it into Objective-C code for setting up my graph axes.

Optimal Bucket Size and No. of Buckets

Sorry, this post is not related to coding but more to data structures and algorithms.
I have a large amount of data, with each value having a different frequency. Plotted, the frequencies roughly follow a bell curve. I now want to display the data in ranges (buckets) that describe the frequencies as precisely as possible.
For example, a single bucket covering the entire range of the data contains all of the frequencies, but it is not precise and can be refined. (If some data is concentrated in a particular frequency zone, we could build a narrower bucket there that holds more closely related frequencies.)
Any help with an algorithm for this would be appreciated.
I thought of an algorithm related to binary search.
Any ideas, folks?
Not sure I am following, but it seems you are looking for k bins such that, for any two bins, the probability of a data point falling in one bin is the same as the probability of it falling in the other.
From your description, your data seems to be normally distributed, or T-distributed.
One can evaluate the mean and standard deviation of the data, let the extracted S.D. be s and the mean be u.
The standard formulas for evaluating the mean and S.D. from the sample are (1):
u = (x1 + x2 + ... + xn) / n (simple average)
s^2 = Sigma((xi - u)^2)/(n-1)
Given this information, you can evaluate the distribution of your data, which is N(u,s^2), and create a random variable: X ~ N(u,s^2) (2)
Now all that is left is finding a, b, ... as follows (assuming 10 buckets, this can obviously be modified as you wish):
P(X<a) = 0.1
P(X<b) = 0.2
P(X<c) = 0.3
After finding a, b, c, ... you have your bins: (-infinity,a], (a,b], (b,c], ...
(1) evaluating variance: http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance
(2) The real distribution for this variable is actually the t-distribution, since the variance is unknown and estimated from the data. However, for large enough n, the t-distribution converges to the normal distribution.
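A minimal sketch of this equal-probability bucketing, using only Python's standard `statistics` module (Python 3.8 or later for `NormalDist`). The function name and the sample data are invented for illustration, and the sketch uses the normal approximation from the answer, not the t-distribution from note (2).

```python
from statistics import NormalDist, mean, stdev


def equal_probability_edges(data, n_buckets=10):
    """Return the n_buckets - 1 inner bucket edges a, b, c, ...

    Each bucket then has probability 1 / n_buckets under N(u, s^2),
    where u and s are estimated from the sample.
    """
    u = mean(data)
    s = stdev(data)  # sample standard deviation, divides by n - 1
    dist = NormalDist(u, s)
    return [dist.inv_cdf(i / n_buckets) for i in range(1, n_buckets)]


# Hypothetical sample that is roughly bell shaped.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]
edges = equal_probability_edges(sample, n_buckets=10)
print(edges)
```

The buckets are (-infinity, edges[0]], (edges[0], edges[1]], and so on up to (edges[-1], +infinity). Buckets near the mean come out narrower than those in the tails, which is the "more precise where the data is concentrated" behaviour asked for in the question.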
First count all the indexes, then subtract the repeated values; this gives you the optimal number of buckets, but only at a small scale.
