Grauman & Darrell's pyramid match kernel - can actual matching be done? - algorithm

I am doing a project on different matching algorithms, and with this one I can't quite understand one thing: can you really get pairs of corresponding features for the train and test images, or does it only give the degree of similarity between two images, without an explicit matching? There are pictures in the article claiming some "partial matching", but is it a real matching or not?

Here is a summary based mostly on remembering a paper in CACM, with a few quick looks at http://userweb.cs.utexas.edu/%7Egrauman/papers/grauman_cacm_extended.pdf
Given sets of points X_i and Y_i representing features, you can produce a distance as SUM_i d(X_i, Y_p(i)), where p matches each i with its own unique p(i) and is the assignment that minimizes this sum. You can find p with the Hungarian algorithm, but this is expensive.
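To make the definition concrete, here is a minimal sketch that finds the minimizing assignment by brute force over all one-to-one assignments. It is not the Hungarian algorithm (it takes factorial time and is only usable for tiny sets); the function name and the 1-D example points are made up for illustration.

```python
from itertools import permutations

def optimal_matching(xs, ys, dist):
    """Brute-force minimum-cost one-to-one assignment of xs into ys.

    Assumes len(xs) <= len(ys). Returns (cost, [(i, p(i)), ...]).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(ys)), len(xs)):
        # perm[i] plays the role of p(i)
        cost = sum(dist(xs[i], ys[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(enumerate(best_perm))

cost, pairs = optimal_matching([0.2, 5.1], [6.3, 0.7], lambda a, b: abs(a - b))
```

Here `pairs` lists each index i of the first set together with its partner p(i) in the second set.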
The paper shows that you can approximate this distance much more cheaply. The approximation does not provide an assignment p for the original problem, but you could (I think) think of it as solving the matching problem for a simplified distance function f(X_i, Y_q(i)), where f(X, Y) only cares about whether X and Y fall into the same histogram bin at some granularity and, if so, at which granularity. The algorithm does not produce an explicit q, but I suspect that you could produce one fairly easily if you wanted to, by pairing up points that fall into the same bin. If you did so, I suspect that it wouldn't do too badly with the original distance function d(X, Y), but I don't know what "not too badly" means here.
The function also has other nice properties, so that it plays well with Support Vector Machines, and fast approximate search algorithms.
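As a rough sketch of the idea, assuming 1-D features, bins of width 2^level aligned at 0, and a weight of 1/2^level for matches first found at that level (my reading of the scheme, not code from the paper), the following computes the score and also records an explicit pairing by greedily pairing still-unmatched points that share a bin, from the finest level to the coarsest. The name `pyramid_match` and the sample points are hypothetical.

```python
from collections import defaultdict

def pyramid_match(xs, ys, levels):
    """Return (score, pairs) for two sets of 1-D features.

    Points are paired at the finest level where they share a bin;
    a pair found at `level` contributes 1 / 2**level to the score.
    """
    free_x = list(range(len(xs)))
    free_y = list(range(len(ys)))
    score, pairs = 0.0, []
    for level in range(levels):
        width = 2 ** level
        bins_x, bins_y = defaultdict(list), defaultdict(list)
        for i in free_x:
            bins_x[int(xs[i] // width)].append(i)
        for j in free_y:
            bins_y[int(ys[j] // width)].append(j)
        # Pair up unmatched points that fall into the same bin.
        new_pairs = []
        for b, members in bins_x.items():
            new_pairs.extend(zip(members, bins_y.get(b, [])))
        score += len(new_pairs) / width
        pairs.extend(new_pairs)
        matched_x = {i for i, _ in new_pairs}
        matched_y = {j for _, j in new_pairs}
        free_x = [i for i in free_x if i not in matched_x]
        free_y = [j for j in free_y if j not in matched_y]
    return score, pairs

score, pairs = pyramid_match([0.2, 5.1], [0.7, 6.3], levels=3)
```

Which points get paired inside a bin is arbitrary here, so the pairing is only as precise as the bin in which it was found.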


Ordering equations in the face of sometimes-unordered terms

Superposition calculus is used for reasoning with equations; it reduces the size of the search space by applying an order to equations, based on an ordering of terms.
A suitable ordering of terms, such as Knuth-Bendix, must sometimes answer 'unordered'. For example, take f(x) vs f(y), where x and y are variables. A suitable order must be stable under substitution of terms for variables, so whichever answer you give for f(x) vs f(y) (less, same, greater), some substitution of terms for the two variables would turn out to be inconsistent with that answer. In this domain, comparison needs a fourth possible answer, 'unordered'.
Superposition calculus orders equations relative to each other, based on the constituent terms and the polarity. There are ways of constructing this based on the multiset extension of term order, but perhaps the simplest correct algorithm is:
Compare the larger terms; if they are unequal, that's the answer.
Compare the polarities; if they are unequal, negative is greater than positive.
Compare the smaller terms.
It is tempting to implement this by first sorting each equation, larger term first, then implementing the above algorithm directly. The problem is that an equation may not have a larger term: it is quite possible that the component terms of one or both equations are unordered relative to each other, so a correct algorithm for comparing equations must take that into account.
This could be derived from first principles by going through all the possibilities, but it also looks like there would be many opportunities to make a subtle error that would take a while to track down.
Is there a known/canonical algorithm already worked out, for comparing equations in this context?
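For reference, the multiset-extension construction mentioned above copes with unordered terms without any special cases. A minimal sketch, assuming the usual encoding of a positive equation s = t as the multiset {s, t} and a negative one as {s, s, t, t}, with a toy term order standing in for Knuth-Bendix (terms are hypothetical (weight, name) tuples, and two terms of equal weight are unordered):

```python
from collections import Counter

def multiset_greater(m, n, greater):
    """Multiset extension of the strict partial order `greater`."""
    only_m = Counter(m) - Counter(n)   # elements of M not cancelled by N
    only_n = Counter(n) - Counter(m)
    if not only_m and not only_n:
        return False                   # equal multisets
    # Every leftover element of N must be dominated by a leftover of M.
    return all(any(greater(a, b) for a in only_m) for b in only_n)

def literal_multiset(s, t, positive):
    return [s, t] if positive else [s, s, t, t]

def compare_equations(e1, e2, greater):
    """e1, e2 are (s, t, positive) triples."""
    m, n = literal_multiset(*e1), literal_multiset(*e2)
    if Counter(m) == Counter(n):
        return "equal"
    if multiset_greater(m, n, greater):
        return "greater"
    if multiset_greater(n, m, greater):
        return "less"
    return "unordered"

def toy_greater(a, b):
    # Toy partial order: compare weights only; equal weights are unordered.
    return a[0] > b[0]

a, b, c, d = (3, "a"), (2, "b"), (2, "c"), (1, "d")
print(compare_equations((a, b, True), (a, c, True), toy_greater))
```

The terms never need to be sorted within an equation, which is exactly the step that breaks when they are unordered.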

Determining error between two surfaces given same discrete inputs?

I have, as an output of a machine learning algorithm, a surface in z, which has known increments along x and y. These points along x and y match exactly to a reference surface, against which I am comparing the output of my algorithm in order to get a metric of fit, or error. I have been struggling to find an optimal way of calculating this, and can't find any good resources on the options that I have. I have tried simple pointwise subtraction of the surfaces, summing the absolute values of the differences, and I have tried squared versions of this, as well as divided versions, but each of these encounters different problems. I was wondering if any of you knew of any good resources on the different options and which of them work in which situations. Thanks!
If your problem is outliers, compute all absolute height differences and discard the N/2 largest (or another fraction, depending on the usual proportion of outliers). Then take the average of the remaining ones (or the RMS). This is called a trimmed average.
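A minimal sketch of that trimmed average, assuming both surfaces are given as equally shaped nested lists of z values; the function name and the `discard_fraction` parameter are made up for illustration.

```python
def trimmed_mean_abs_error(z_pred, z_true, discard_fraction=0.5):
    """Mean absolute height difference after dropping the largest ones."""
    diffs = sorted(
        abs(p - t)
        for row_p, row_t in zip(z_pred, z_true)
        for p, t in zip(row_p, row_t)
    )
    # Keep the smallest differences; always keep at least one value.
    keep = max(1, len(diffs) - int(len(diffs) * discard_fraction))
    kept = diffs[:keep]
    return sum(kept) / len(kept)

pred = [[1.0, 2.0], [3.0, 100.0]]   # one gross outlier
true = [[1.0, 2.0], [3.0, 4.0]]
error = trimmed_mean_abs_error(pred, true, discard_fraction=0.25)
```

For the RMS variant, square the kept differences before averaging and take the square root of the result.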

How is the class center for a decision attribute calculated in class center based fuzzification algorithm?

I came across class center based fuzzification algorithm on page 16 of this research paper on TRFDT. However, I fail to understand what is happening in step 2 of this algorithm (titled in the paper as Algorithm 2: Fuzzification). If someone could explain it by giving a small example it would certainly be helpful.
It is not clear from your question which parts of the article you understand and IMHO the article is written in not the clearest possible way, so this is going to be a long answer.
Let's start with some intuition behind this article. In short I'd say it is: "let's add fuzziness everywhere to decision trees".
How does a decision tree work? We have a classification problem and we say that instead of analyzing all attributes of a data point in a holistic way, we'll analyze them one by one in an order defined by the tree and will navigate the tree until we reach some leaf node. The label at that leaf node is our prediction. So the trick is how to build a good tree, i.e. a good order of attributes and good splitting points. This is a well-studied problem and the idea is to build a tree that encodes as much information as possible by some metric. There are several metrics and this article uses entropy, which is similar to the widely used information gain.
The next idea is that we can treat the classification (i.e. the split of the values into classes) as fuzzy rather than exact (aka "crisp"). The idea here is that in many real-life situations not all members of a class are equally representative: some are more "core" examples and some are more "edge" examples. If we can catch this difference, we can provide a better classification.
And finally there is a question of how similar the data points are (generally or by some subset of attributes) and here we can also have a fuzzy answer (see formulas 6-8).
So the idea of the main algorithm (Algorithm 1) is the same as in the ID3 tree: recursively find the attribute a* that classifies the data in the best way and perform the best split along that attribute. The main difference is in how the information gain for the best attribute selection is measured (see heuristic in formulas 20-24) and that because of fuzziness the usual stop rule of "only one class left" doesn't work anymore and thus another entropy (Kosko fuzzy entropy in 25) is used to decide if it is time to stop.
Given this skeleton of Algorithm 1, there are quite a few parts that you can (or should) select:
How do you measure μ(ai)τ(Cj)(x) used in (20) (this is a measure of how well x represents the class Cj with respect to attribute ai, note that here being not in Cj and far from the points in Cj is also good) with two obvious choices of the lower (16 and 18) and the upper bounds (17 and 19)
How do you measure μRτ(x, y) used in (16-19). Given that R is induced by ai this becomes μ(ai)τ(x, y) which is a measure of similarity between two points with respect to attribute ai. Here you can choose one of the metrics (6-8)
How do you measure μCi(y) used in (16-19). This is the measure of how well the point y fits in the class Ci. If you already have data as fuzzy classification, there is nothing you should do here. But if your input classification is crisp, then you should somehow produce μCi(y) from that and this is what the Algorithm 2 does.
There is a trivial solution of μCj(xi) = "1 if xi ∈ Cj and 0 otherwise", but this is not fuzzy at all. The process of building fuzzy data is called "fuzzification". The idea behind Algorithm 2 is that we assume that every class Cj is actually some kind of a cluster in the space of attributes. So we can derive the degree of membership μCj(xi) from the distance between xi and the center of the cluster cj (the closer we are, the higher the membership should be, so it is really some inverse of a distance). Note that since distance is measured across attributes, you should normalize your attributes somehow or one of them might dominate the distance. And this is exactly what Algorithm 2 does:
it estimates the center of the cluster for class Cj as the center of mass of all the known points in that class, i.e. just the average of all those points along each coordinate (attribute).
it calculates the distances from each point xi to each estimated class center cj
looking at the formula in step #12, it uses the inverse square of the distance as a measure of proximity and then normalizes the values, because for fuzzy sets Sum[over all Cj](μCj(xi)) should be 1
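Since a small example was asked for, here is a minimal sketch of these three steps as I described them. It assumes the attributes are already normalized and uses plain inverse squared Euclidean distance; the exact formula in the paper may differ in details. The function name and the toy data are made up.

```python
def fuzzify(points, labels):
    """Return (centers, memberships) from crisp class labels."""
    classes = sorted(set(labels))
    dims = len(points[0])
    # Step 1: class center = per-attribute mean of the class members.
    centers = {}
    for c in classes:
        members = [p for p, l in zip(points, labels) if l == c]
        centers[c] = [sum(p[d] for p in members) / len(members)
                      for d in range(dims)]
    memberships = []
    for p in points:
        # Step 2: squared distance from the point to every class center.
        sq = {c: sum((p[d] - centers[c][d]) ** 2 for d in range(dims))
              for c in classes}
        on_center = [c for c in classes if sq[c] == 0]
        if on_center:
            # Assumption: a point sitting exactly on a center belongs fully to it.
            row = {c: (1.0 / len(on_center) if c in on_center else 0.0)
                   for c in classes}
        else:
            # Step 3: inverse squared distance, normalized to sum to 1.
            inv = {c: 1.0 / sq[c] for c in classes}
            total = sum(inv.values())
            row = {c: inv[c] / total for c in classes}
        memberships.append(row)
    return centers, memberships

centers, mu = fuzzify([[0.0], [2.0], [10.0]], ["a", "a", "b"])
```

With this toy data the center of class "a" is 1.0 and the center of "b" is 10.0, so the first point is at squared distance 1 from "a" and 100 from "b", which gives it a membership of 1 / 1.01 in "a" and 0.01 / 1.01 in "b".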

Efficient Computation of The Least Fixed Point of A Polynomial

Let P(x) denote the polynomial in question. The least fixed point (LFP) of P is the lowest value of x such that x=P(x). The polynomial has real coefficients. There is no guarantee in general that an LFP will exist, although one is guaranteed to exist if the degree is odd and ≥ 3. I know of an efficient solution if the degree is 3: x=P(x), thus 0=P(x)-x. There is a closed-form cubic formula, so solving for x is somewhat trivial and can be hardcoded. Degrees 2 and 1 are similarly easy. It's the more complicated cases that I'm having trouble with, since I can't seem to come up with a good algorithm for arbitrary degree.
EDIT:
I'm only considering real fixed points and taking the least among them, not necessarily the fixed point with the least absolute value.
Just solve f(x) = 0, where f(x) = P(x) - x, using your favorite numerical method. For example, you could iterate Newton's method:
x_{n + 1} = x_n - (P(x_n) - x_n) / (P'(x_n) - 1).
You won't find a closed-form formula in general, because there is no general closed-form formula (in radicals) for polynomials of degree five and higher. Thus, for quintic and higher degrees you have to use a numerical method of some sort.
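A minimal sketch of Newton's method applied to f(x) = P(x) - x, so each step subtracts f(x_n) / f'(x_n) = (P(x_n) - x_n) / (P'(x_n) - 1). The coefficient layout (highest degree first), the function names and the tolerances are my own choices. Note that it converges to some fixed point near the starting value, not necessarily the least one.

```python
def horner(coeffs, x):
    """Evaluate P and P' at x; coeffs are given highest degree first."""
    p, dp = 0.0, 0.0
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp

def newton_fixed_point(coeffs, x0, tol=1e-12, max_iter=100):
    """Newton's method on f(x) = P(x) - x, starting from x0."""
    x = x0
    for _ in range(max_iter):
        p, dp = horner(coeffs, x)
        denom = dp - 1.0                 # f'(x) = P'(x) - 1
        if denom == 0.0:
            raise ZeroDivisionError("f'(x) vanished; try another start")
        step = (p - x) / denom           # f(x) / f'(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

# P(x) = x^2 - 2 has fixed points -1 and 2.
root = newton_fixed_point([1.0, 0.0, -2.0], x0=-3.0)
```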
Since you want the least fixed point, you can't get away without finding all real roots of P(x) - x and selecting the smallest.
Finding all the roots of a polynomial is a tricky subject. If you have a black box routine, then by all means use it. Otherwise, consider the following trick:
Form M the companion matrix of P(x) - x
Find all eigenvalues of M
but this requires you have access to a routine for finding eigenvalues (which is another tricky problem, but there are plenty of good libraries).
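A minimal sketch of the companion-matrix route, assuming NumPy is available as the eigenvalue library. The function name and the `imag_tol` threshold for deciding that an eigenvalue is real are my own, and the threshold may need tuning for badly conditioned polynomials.

```python
import numpy as np

def least_fixed_point(coeffs, imag_tol=1e-9):
    """Smallest real root of P(x) - x, or None if there is none.

    coeffs: coefficients of P, highest degree first, degree >= 1.
    """
    f = np.array(coeffs, dtype=float)
    f[-2] -= 1.0                       # P(x) - x
    f = np.trim_zeros(f, "f")          # drop a vanished leading term
    n = len(f) - 1
    if n < 1:
        return None                    # P(x) - x is constant
    monic = f / f[0]
    # Companion matrix: ones on the subdiagonal, negated coefficients on top.
    m = np.eye(n, k=-1)
    m[0, :] = -monic[1:]
    eig = np.linalg.eigvals(m)
    real = eig.real[np.abs(eig.imag) < imag_tol]
    return float(real.min()) if real.size else None

# P(x) = x^3 has fixed points -1, 0 and 1.
lfp = least_fixed_point([1.0, 0.0, 0.0, 0.0])
```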
Otherwise, you can implement the Jenkins-Traub algorithm, which is a highly non-trivial piece of code.
I don't really recommend finding a zero (with e.g. Newton's method) and deflating until you reach degree one: it is very unstable if not done properly, and you'll lose a lot of accuracy (and it is very difficult to tackle multiple roots with it). The proper way to do it is in fact the above-mentioned Jenkins-Traub algorithm.
This problem is trying to find the "least" (here I'm not sure if you mean in magnitude or actually the smallest, which could be the most negative) root of a polynomial. There is no closed form solution for polynomials of large degree, but there are myriad numerical approaches to finding roots.
As is often the case, Wikipedia is a good place to begin your search.
If you want to find the smallest root, then you can use the rule of signs to pin down the interval where it exists and then use some numerical method to find roots in that interval.

What do you think of this interest point detection algorithm?

I've been trying to come up with an interest point detection algorithm and this is what I came up with:
You go through the X and the Y axes 3n pixels at a time, creating 3n x 3n squares.
For the n x n square in the middle of the 3n x 3n square (let's call it square Z), the R, G, and B values are averaged and rounded to preset values to limit the number of colors, and that is the color that square will be treated as.
The same is done for the 8 surrounding n x n squares.
After that, the color of square Z is compared to the surrounding squares. If it matches x out of the 8 surrounding squares, where x <= 3 or x >= 5, then that is an interest point (a corner is detected).
And so on till all the image is covered.
The bigger n is, the faster the image will be scanned and the less accurate the detection is, and vice versa.
This, supposedly, detects "literal corners", that is corners you can actually SEE on the image.
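A minimal sketch of the steps above, assuming the image is a nested list of (R, G, B) tuples and that "rounded to preset values" means integer division of each averaged channel by a fixed `step` (my interpretation); the function names are made up.

```python
def block_color(img, top, left, n, step):
    """Average an n x n block and quantize each channel."""
    total = [0, 0, 0]
    for y in range(top, top + n):
        for x in range(left, left + n):
            for k in range(3):
                total[k] += img[y][x][k]
    return tuple(int(t / (n * n)) // step for t in total)

def detect_interest_points(img, n, step=64):
    """Return the (row, col) of square Z for every flagged 3n x 3n window."""
    height, width = len(img), len(img[0])
    points = []
    for top in range(0, height - 3 * n + 1, 3 * n):
        for left in range(0, width - 3 * n + 1, 3 * n):
            center = block_color(img, top + n, left + n, n, step)
            matches = 0
            for by in range(3):
                for bx in range(3):
                    if (by, bx) == (1, 1):
                        continue        # skip square Z itself
                    block = block_color(img, top + by * n, left + bx * n, n, step)
                    if block == center:
                        matches += 1
            if matches <= 3 or matches >= 5:
                points.append((top + n, left + n))
    return points

B, W = (0, 0, 0), (255, 255, 255)
flat = [[B, B, B], [B, B, B], [B, B, B]]
found = detect_interest_points(flat, n=1)
```

Taken literally, the rule flags every window except those with exactly 4 matching neighbours, so a completely uniform window (8 matches) is reported as well.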
What do you think of this algorithm? Is it efficient? Can it be used on a live video stream (say from the camera) on a hand-held device?
I'm sorry to say that I don't think this is likely to be very good. Your algorithm looks a bit like a simplistic version of Moravec's algorithm, which is itself one of the simplest corner detection algorithms. The hardcoded limits you test against effectively make your edge test a stepped function, unlike an approach such as summed square differences. This will almost certainly give you discontinuities in your detection function (corners that don't match when they should have), for some values.
You also have the same problem as Moravec, namely that if the edge lies at an angle to the direction of neighbours being considered, then it won't be detected.
Developing algorithms is fun, and if this isn't a business-critical project, then by all means, carry on tinkering and experimenting (and don't be put off by my comments!). But the fact is, for almost any practical problem, a better algorithm for the task you want to solve almost certainly already exists. The real challenge is identifying how you can best model your problem in such a way that you can solve it using an existing, well-understood approach, designed by experts.
In particular, robust identification and analysis of edge-cases and worst-case runtimes is a tricky business; unless you are a professional algorist, you are likely to find the going difficult. But I certainly encourage you to discover this for yourself by trying. nlucaroni mentions some excellent questions to use as starting points for your analysis.
Why not try it and see if it works the way you expect? It sounds like it should. How does the performance compare with other methods? What is the complexity of the algorithm? Is it efficient compared to others? Where can it be improved? What kind of false-positives and false negatives are expected? Are they within reason based on the data I plan to use this on? What threshold should be used to compare surrounding squares? ....
this is stuff you should be doing, not us.
I would suggest you look at the SIFT algorithm. It's the de facto standard for points of interest in an image. Unfortunately, it's also patented, because it's so good.
If you are interested in a real-time version of SIFT, you can get it to run on a GPU, but it's highly experimental at this point. Note that if you are developing a commercial application, you'd first have to purchase a license for using SIFT or get approval from David Lowe.
