Implementing a Least Squares Kernel classifier - matrix

I am trying to find the equation I would need to use in order to implement a Least Squares Kernel classifier for a dataset with N samples of feature length d. I have the kernel equation k(x_i, x_j), and I need the equation to plug it into to get the length-d vector used to classify future data. No matter where I look or google, although there are dozens of PowerPoints and PDFs that seem to give me almost what I'm looking for, I can't find a resource that gives me a straight answer.
Note: I am not looking for a programming-language tool that computes this for me, such as lsqlin, but for the mathematical formula.

Least Squares Kernel SVM (what I assume you're actually asking about) is equivalent to kernelized ridge regression. This is the simplest way to implement it, and the solution can be found here, assuming you have the appropriate background.
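For reference, a minimal sketch of that solution (not from the linked source), assuming a precomputed N-by-N Gram matrix K with K[i][j] = k(x_i, x_j) and labels y in {-1, +1}: the dual coefficients alpha solve (K + lambda*I) alpha = y, and a new point x is classified by the sign of sum_i alpha_i k(x_i, x). Note the learned vector alpha has length N (one entry per training sample), not d:

```python
import numpy as np

def fit_kernel_ridge(K, y, lam=1e-3):
    """Dual coefficients alpha solving (K + lam*I) alpha = y.
    K is the N x N Gram matrix, y the vector of +/-1 labels."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def classify(alpha, k_new):
    """Classify a new point x given k_new[i] = k(x_i, x)."""
    return np.sign(k_new @ alpha)
```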


Continuous action-state-space and tiling

After getting used to the Q-learning algorithm in a discrete action-state-space, I would now like to expand this to continuous spaces. To do this I read the chapter On-Policy Control with Approximation of Sutton's introduction. Here, the usage of differentiable functions like a linear function or an ANN is recommended to solve the problem of a continuous action-state-space. Nevertheless, Sutton then describes the tiling method, which maps the continuous variables onto a discrete representation. Is this always necessary?
Trying to understand these methods, I tried to implement the example of the Hill Climbing Car in the book without the tiling method and with a linear base function q. As my state space is 2-dimensional and my action is one-dimensional, I used a three-dimensional weight vector w in this equation:

q(s, a, w) = w_0 * s_0 + w_1 * s_1 + w_2 * a
When I now try to choose the action that will maximize the output, the obvious answer is a = 1 whenever w_2 > 0. Therefore the weight w_2 slowly converges to zero from the positive side and the agent does not learn anything useful. As Sutton is able to solve the problem using tiling, I am wondering if my problem is caused by the absence of the tiling method or if I am doing something else wrong.
So: Is the tiling always necessary?
Regarding your main question about tiling, the answer is no, it is not always necessary to use tiling.
As you tried, it's a good idea to implement some easy example like the Hill Climbing Car in order to fully understand the concepts. Here, however, you are misunderstanding something important. When the book talks about linear methods, it is referring to methods that are linear in the parameters, which means that you can extract a set of (non-linear) features and combine them linearly. This kind of approximator can represent functions much more complex than a standard linear regression.
The parametrization you have proposed is not able to represent a non-linear Q function. Taking into account that in the Hill Climbing problem the Q function you want to learn has a highly non-linear shape, you will need something more powerful than a function that is linear in the raw state variables. An easy solution for your problem could be to use a Radial Basis Function (RBF) network. In this case, you use a set of features (or basis functions, such as Gaussians) to map your state space:

phi_i(s) = exp(-||s - c_i||^2 / (2 * sigma_i^2))

where c_i and sigma_i are the centers and widths of the Gaussians, and Q is approximated as a weighted sum of these features.
Additionally, if your action space is discrete and small, the easiest solution is to maintain an independent RBF network for each action. To select an action, simply compute the Q value for each action and pick the one with the highest value. In this way you avoid the (complex) optimization problem of selecting the best action in a continuous function.
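As an illustration (not from the original answer), here is a minimal sketch of this setup, assuming a 2-D state space and a small discrete action set; the grid of centers, the width sigma, and the action values are all made-up example numbers:

```python
import numpy as np

# Made-up 5x5 grid of Gaussian centers over a 2-D state space
centers = np.array([[x, v] for x in np.linspace(-1.2, 0.6, 5)
                           for v in np.linspace(-0.07, 0.07, 5)])
sigma = 0.3
actions = [-1, 0, 1]                         # small discrete action set
W = np.zeros((len(actions), len(centers)))   # one weight vector per action

def features(state):
    """Gaussian RBF features phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def q_values(state):
    """One Q estimate per action: Q(s, a) = w_a . phi(s)."""
    return W @ features(state)

def greedy_action(state):
    """Pick the action whose network gives the highest Q value."""
    return actions[int(np.argmax(q_values(state)))]
```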
You can find a more detailed explanation in the Busoniu et al. book Reinforcement Learning and Dynamic Programming Using Function Approximators, pages 49-51. It's available for free here.

How to Scale SPICE Matrix so LU-decomposition doesn't Fail

I am implementing a SPICE solver, and I have the following problem: say I put two diodes and a current source in series (standard diodes). I use MNA (modified nodal analysis) and Boost LU decomposition. The problem is that the nodal matrix very quickly becomes near-singular. I think I have to scale the values, but I don't know how, and I couldn't find anything on the Internet. Any ideas how to do this scaling?
From the numerical perspective, there is a scaling technique for this kind of near-singular matrix: divide each row of A by the sum (or maximum) of the absolute values in that row. For more details, look at KLU, a linear solver designed for circuit simulation.
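A minimal sketch of that row equilibration (an illustration, not KLU's actual implementation), assuming a dense system A x = b in NumPy:

```python
import numpy as np

def row_equilibrate(A, b):
    """Divide each row of A (and the matching entry of b) by the row's
    maximum absolute value, which tends to lower the condition number."""
    scale = np.abs(A).max(axis=1)
    scale[scale == 0] = 1.0            # leave all-zero rows untouched
    return A / scale[:, None], b / scale
```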
From the SPICE-simulation perspective, there is the so-called Gmin stepping technique, which iteratively computes and approaches the real answer: a small conductance Gmin is added from every node to ground, the system is solved, and Gmin is then stepped down toward zero. You can find this in the documentation of the SPICE project QUCS (Quite Universal Circuit Simulator).
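For illustration only, a heavily simplified sketch of Gmin stepping; build_system is a hypothetical callback returning the linearized MNA system (A, b) at operating point v, and the start/stop values and iteration counts are made-up examples:

```python
import numpy as np

def gmin_stepping(build_system, n, gmin=1e-2, gmin_min=1e-12, tol=1e-9):
    """Find a DC operating point by Gmin stepping: add gmin to the
    diagonal (simplified; real simulators add it only to node rows),
    solve by Newton iteration, then shrink gmin a decade at a time,
    warm-starting each stage from the previous solution."""
    v = np.zeros(n)
    while gmin >= gmin_min:
        for _ in range(100):                       # Newton iterations
            A, b = build_system(v)                 # hypothetical callback
            v_new = np.linalg.solve(A + gmin * np.eye(n), b)
            converged = np.max(np.abs(v_new - v)) < tol
            v = v_new
            if converged:
                break
        gmin *= 0.1                                # step Gmin down a decade
    return v
```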
Scaling does not help when the matrix has both very large and very small entries.
It is necessary to use some or all of the many tricks that were developed for circuit solver applications. A good start is clipping the range of the exponential and log function arguments to reasonable values -- in most circuits a diode forward voltage is never more than 1V and the diode reverse current not less than 1pA.
Actually, look at all the library functions you call and wrap them in code that keeps their arguments and results in ranges suitable for circuit solving. Simple clipping is sometimes good enough, but it is far better to make sure the wrapped functions stay continuous and (twice) differentiable.
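A minimal sketch of such a wrapper (the threshold 40 is an arbitrary example value): above the threshold the exponential is continued linearly, which keeps the function continuous and once-differentiable; a quadratic continuation would preserve the second derivative as well. This is essentially the "limexp" trick found in analog simulators:

```python
import math

def safe_exp(x, x_max=40.0):
    """exp(x) with a linear continuation above x_max.
    Value and first derivative both equal exp(x_max) at the joint,
    so Newton iterations see a smooth function instead of overflow."""
    if x > x_max:
        e = math.exp(x_max)
        return e * (1.0 + (x - x_max))   # first-order Taylor continuation
    return math.exp(x)
```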

Edge detection : Any performance evaluation technique?

I am working on edge detection in images and would like to evaluate the performance of the algorithm. If anyone could give me a reference or a method for how to proceed, it would be really helpful. :)
I do not have ground truth, and the data set includes color as well as gray images.
Thank you.
Create a synthetic data set with known edges, for example by 3D rendering, by compositing 2D images with precise masks (as may be obtained in royalty-free photo sets), or by introducing edges directly (thin/faint lines). Remember to add some confounding non-edges that look like edges, of a type appropriate for what you're tuning for.
Use your (non-synthetic) data set. Run the reference algorithms that you want to compare against. Also produce combinations of the reference algorithms, for example by voting (majority, at least K out of N, etc.). Then calculate stats on your algorithm vs. reference-algorithm performance, in terms of (a) the number of points your algorithm classifies as edge which a reference algorithm, or the combination, does not (false positives), and (b) the number of points which the reference algorithm classifies as edge but your algorithm does not (false negatives). You can also calculate a rank-correlation-type number across algorithms by looking at each point and seeing which algorithms do (or don't) classify it as an edge.
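A minimal sketch of that per-pixel counting, assuming both edge maps are boolean NumPy arrays of the same shape (treating the reference algorithm's output as if it were ground truth):

```python
import numpy as np

def compare_edge_maps(candidate, reference):
    """Count disagreements between two boolean edge maps."""
    candidate, reference = candidate.astype(bool), reference.astype(bool)
    false_pos = int(np.sum(candidate & ~reference))  # ours says edge, ref doesn't
    false_neg = int(np.sum(~candidate & reference))  # ref says edge, ours doesn't
    true_pos = int(np.sum(candidate & reference))
    precision = true_pos / max(true_pos + false_pos, 1)
    recall = true_pos / max(true_pos + false_neg, 1)
    return false_pos, false_neg, precision, recall
```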
Create ground truth manually. Use reference edge-finding algos as a starting point, then fix up by hand. Probably valuable to do for a small number of images in any case.
Good luck!
For comparisons, quantitative measures like what @Alex I explained are best. To do so, you need to define what is "correct" with a ground-truth set, plus a way to consistently determine whether a given image is correct, or, on a more granular level, how correct it is (some number like a percentage). @Alex I gave a way to do that.
Another option, often used in graphics research where there is no ground truth, is user studies. They are usually less desirable, as they are time-consuming and often more costly. However, if it is a qualitative improvement you are after, or if a quantitative measurement is just too hard to do, a user study is an appropriate solution.
By a user study I mean polling people on how good a result is given the input image. You could give them a scale to rate things on and randomly give them samples from both your results and the results of another algorithm.
And of course, if you still want more ideas, be sure to check out edge-detection papers to see how they measured their results (I'd actually look here first, as the authors have already gone through this same process and determined what was best for them: Google Scholar).

Lapack's row reduction

I am trying to write a function that produces a single solution to an underdetermined system of equations (i.e. the matrix that describes the system is wider than it is tall). In order to do this, I have been looking in the LAPACK documentation for a way to row-reduce a matrix to its reduced row-echelon form, similar to the function rref() in both Mathematica and TI calculators. The closest I came across was this tiny thread: http://software.intel.com/en-us/forums/intel-math-kernel-library/topic/53107/. This thread, however, seems to imply that simply taking the "U" upper-triangular matrix (and dividing each row by the diagonal) is the same as the reduced echelon form of a matrix, which I do not believe to be the case. I could code up rref() myself, but I do not believe I could achieve the performance LAPACK is famous for.
1) Is there a better way to simply get any one specific solution to an underdetermined system?
2) If not, is there a way for LAPACK to row-reduce a matrix?
Thanks!
One often-used method for this is the least-squares (minimum-norm) solution; see LAPACK's sgelsx (superseded by sgelsy in current LAPACK versions).
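For a wide A with full row rank, the minimum-norm solution of A x = b is x = A^T (A A^T)^{-1} b, and in practice you can get it from the LAPACK least-squares drivers. A minimal sketch via NumPy, whose lstsq dispatches to LAPACK's *gelsd underneath (the matrix values here are arbitrary examples):

```python
import numpy as np

# Underdetermined system: 2 equations, 4 unknowns
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0, 2.0]])
b = np.array([3.0, 5.0])

# For underdetermined systems, lstsq returns the minimum-norm solution.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(x)          # one particular solution
print(A @ x - b)  # ~0: it satisfies the system
```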

Comparing a "path" (or GPS trail) of a vehicle

I have a bit of a difficult algorithm question. I can't find any suitable algorithm despite a lot of searching, so I am hoping that someone here on Stack Overflow might know the answer.
I have a set of x,y coordinates for a vehicle as it moves through a 2D space; the coordinates are recorded at "decision points" in the time period (i.e. points where it has stopped and made a determination of where to move next).
What I want to do is find a mechanism for comparing these trails efficiently (i.e. not going through each point individually). Compounding this is that I am interested in the "pattern" of their movement, not necessarily the individual points they visited. This means that a "path" is considered the same if you reflect it around an axis, or if you rotate it by 90, 180 or 270 degrees.
Basically I am trying to distil some sort of "behaviour" to the way they move through the space, then examine the different "behaviours" for classification purposes.
Cheers,
Aidan
This may be way more complicated than you're looking for, but it sounds like what the guys at astrometry.net did may be similar to what you want. Essentially, you can upload a picture of some stars, and it will figure out where in the sky the picture belongs, along with its rotation. You may be able to use similar pattern matching for what you're doing.
They have a great pdf explaining how it works here, and apparently you can email them and they'll send you the source code (details are in the pdf).
Edit: apparently you can download the code directly here.
Hope it helps.
There are several approaches you could take:
Use vector paths and translation matrices together with two algorithms: the A* (A-star) algorithm (to locate the best routes using what are called greedy functions) and the "nearest neighbour" algorithm; both are commonly used for comparing the efficiency of routes.
You may not know it, but the issue you have is known as the "travelling salesman" problem and has many, many approaches.
So look up:
travelling salesman problem
A*
nearest neighbour
Also look at:
random walk algorithm (for the most basic approach)
For a learned-behaviour approach, try artificial neural networks (ANNs) or genetic algorithms.
The mathematics for this type of problem is covered under what is called "graph theory".
It seems that basically what is needed is some metric to compare two (N in general) paths and choose the best one?
If that's the case, then I'd suggest plain statistics. I'd start with a heading (orientation) histogram, a relative-heading histogram (relative to the previous heading), and so on. Another thing that comes to mind is the covariance of distance/orientation between points. Or just simply make up some kind of "statistics" (number of turns, etc.) and compare the paths using that.
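A minimal sketch of the relative-heading histogram, assuming each trail is an (N, 2) array of x,y decision points; because the angles are relative to the previous heading, the feature is unchanged by rotating the whole path, and taking absolute values of the turns would additionally make it reflection-invariant:

```python
import numpy as np

def turn_angle_histogram(points, bins=8):
    """Normalized histogram of turn angles along a path.
    points: (N, 2) array of x,y decision points."""
    vecs = np.diff(points, axis=0)                  # segment vectors
    headings = np.arctan2(vecs[:, 1], vecs[:, 0])   # absolute headings
    turns = np.diff(headings)                       # heading changes
    turns = (turns + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    hist, _ = np.histogram(turns, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                # normalize

# Two trails p1, p2 can then be compared by histogram distance, e.g.:
# np.linalg.norm(turn_angle_histogram(p1) - turn_angle_histogram(p2))
```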
