How to use the Spark MLlib Multilayer Perceptron weights array - apache-spark-mllib

I have a requirement where I need to find the relative importance of the attributes used in an ANN implementation. I use the Spark MLlib MultilayerPerceptron for the implementation. The model gives me a vector which is an array of the weights. I know there are algorithms to derive the relative importance from weights, but the MLlib implementation returns one big one-dimensional array and does not say anything about which weights correspond to which input. Does anyone know how to get the weights corresponding to each input node?

The model flattens the weight matrices with Breeze's toDenseVector operation (notice the line val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector in the Spark source).
This operation acts like numpy's flatten().
Therefore, to retrieve the weight matrices, you have to do two things:
Split the weights vector into parts according to your layers. You have to take (layerSize + 1) * nextLayerSize weights for each non-final layer (+1 because of the bias).
For each flattened weight matrix, apply numpy's reshape with parameters (layerSize + 1, nextLayerSize).
When you derive the relative importance from your weights, note that in the pyspark implementation the bias is represented as the last feature.
Therefore the last row in each weight matrix represents the bias values.
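A minimal sketch of that splitting step in Python, assuming hypothetical layer sizes [4, 5, 3]; in practice you would pass np.array(model.weights) from the fitted pyspark model instead of the demo vector:

    import numpy as np

    def split_weights(flat_weights, layers):
        # Split MLlib's flattened weight vector into per-layer matrices.
        # Each non-final layer i contributes (layers[i] + 1) * layers[i + 1]
        # values; the reshape puts the bias in the last row of every matrix.
        flat = np.asarray(flat_weights, dtype=float)
        matrices, offset = [], 0
        for in_size, out_size in zip(layers[:-1], layers[1:]):
            count = (in_size + 1) * out_size
            matrices.append(flat[offset:offset + count].reshape(in_size + 1, out_size))
            offset += count
        return matrices

    layers = [4, 5, 3]
    demo = np.arange((4 + 1) * 5 + (5 + 1) * 3, dtype=float)   # stand-in for model.weights
    for m in split_weights(demo, layers):
        print(m.shape)   # (5, 5) then (6, 3)

matrices[i][:-1] then holds the weights between layers i and i+1, and matrices[i][-1] holds the bias row, which you can drop before computing relative importance.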

Related

Eigenvalues of large symmetric matrices

When I try to compute the eigenvalues of the adjacency matrix of a very large graph, I get what can be charitably described as garbage. In particular, since the graph is four-regular, the eigenvalues should lie in $[-4, 4]$, but they visibly do not. I used MATLAB (via MATLink) and got the same problems, so this is clearly an issue that transcends Mathematica. The question is: what is the best way to deal with it? I am sure MATLAB and Mathematica use the venerable EISPACK code, so there may be something newer/better.
Eigenvalue methods for dense matrices usually proceed by first transforming the matrix into Hessenberg form, which for a symmetric matrix is a tridiagonal matrix. After that, some variant of the shifted QR algorithm, like bulge chasing, is applied to iteratively reduce the off-diagonal elements, splitting the matrix at positions where these become small enough.
What I would like to draw attention to is that first step and its structure-destroying consequences. It is, for instance, not guaranteed that the tridiagonal matrix is still symmetric. The same applies to all further steps if they are not explicitly tailored to symmetric matrices.
But what is much more relevant here is that this step ignores the connectivity (or non-connectivity) of the graph and potentially connects all nodes, albeit with very small weights, when the transformation is reversed.
Each of the m connected components of the graph contributes one eigenvalue 4, with an eigenvector that is 1 at the nodes of that component and 0 elsewhere. These eigenspaces each have dimension 1. Any small perturbation of the matrix first removes that separation, joining them into an eigenspace of dimension m, and then perturbs this as a multiple eigenvalue. This can result in an approximately regular m-pointed star in the complex plane of radius $4 \cdot (10^{-15})^{1/m}$ around the original value 4. Even for medium-sized m this gives a substantial deviation from the true eigenvalue.
So in summary: use a sparse method, as these usually first re-order the matrix to concentrate entries near the diagonal, which should give a block-diagonal structure according to the components. The eigenvalue method will then automatically work on each block separately, avoiding the mixing described above. And if possible, use a method for symmetric matrices, or set a corresponding option/flag if one exists.
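To illustrate the principle in Python/SciPy terms (the graph construction below is a made-up 4-regular example, not the asker's data), a sparse symmetric Lanczos solver applied directly to the adjacency matrix keeps all computed eigenvalues real and within the expected range:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    def circulant_4_regular(n):
        # Sparse adjacency of a 4-regular circulant graph: node i ~ i+-1, i+-2 (mod n).
        rows, cols = [], []
        for i in range(n):
            for d in (1, 2, n - 1, n - 2):
                rows.append(i)
                cols.append((i + d) % n)
        return sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

    # Two disconnected components -> the eigenvalue 4 appears once per component.
    A = sp.block_diag([circulant_4_regular(500), circulant_4_regular(500)], format="csr")

    # eigsh is a symmetric sparse (Lanczos) eigensolver: it preserves symmetry and
    # sparsity instead of reducing a dense, possibly unsymmetric matrix.
    vals = eigsh(A, k=6, which="LA", return_eigenvectors=False)
    print(np.sort(vals))   # the largest values should cluster at 4.0, one per component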

To make a distance matrix or to repeatedly calculate distance

I'm working on K-medoids algorithm implementation. It is a clustering algorithm and one of its steps includes finding the most representative point in a cluster.
So, here's the thing
I have a certain number of clusters
Each cluster contains a certain number of points
I need to find the point in each cluster that results in the least error if it is picked as the cluster representative
The distance from each point to all the others in the cluster needs to be calculated
This distance calculation can be as simple as Euclidean or more complex, like DTW (Dynamic Time Warping) between two signals
There are two approaches: one is to calculate a distance matrix that stores the values between all the points in the dataset, and the other is to calculate distances during clustering, which means that distances between some points will be calculated repeatedly.
On one hand, to build the distance matrix you must calculate distances between all points in the whole dataset, and some of the calculated values will never be used.
On the other hand, if you don't build the distance matrix, some calculations will be repeated over a certain number of iterations.
Which is the better approach?
I'm also considering a MapReduce implementation, so opinions from that angle are also welcome.
Thanks
A third approach could be a combination of both: lazily evaluating the distance matrix. Initialize a matrix with default values (unrealistic ones, like negative values), and when you need to calculate the distance between two points, if the value is already present in the matrix, just take it from there.
Otherwise, calculate it and store it in the matrix.
This approach trades calculations (it is optimal in doing the lowest possible number of pair calculations) for more branches in the code and a few more instructions. However, thanks to branch predictors, I assume this overhead will not be that dramatic.
I expect it to perform better when the distance calculation is relatively expensive.
Another optimization could be to dynamically switch to a plain matrix implementation (and calculate the remaining part of the matrix) when the number of already-calculated entries exceeds a certain threshold. This can be done quite nicely in OOP languages by switching the implementation of the interface once the threshold is met.
Which implementation is actually better will depend heavily on the cost of the distance function and on the data you are clustering, as some datasets will need the same pairs more often than others.
I suggest running a benchmark and using statistical tools to evaluate which method is actually better.
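A minimal sketch of the lazy approach in Python (the class and names are illustrative; metric can be any symmetric distance function, Euclidean here, DTW in your setting):

    import numpy as np
    from scipy.spatial.distance import euclidean

    class LazyDistanceMatrix:
        # Memoizing distance matrix: each pairwise distance is computed at most once.
        def __init__(self, points, metric=euclidean):
            self.points = points
            self.metric = metric
            n = len(points)
            self._cache = np.full((n, n), -1.0)   # -1 marks "not yet computed"
            np.fill_diagonal(self._cache, 0.0)

        def dist(self, i, j):
            if self._cache[i, j] < 0:
                d = self.metric(self.points[i], self.points[j])
                self._cache[i, j] = self._cache[j, i] = d   # distances are symmetric
            return self._cache[i, j]

    # Usage: the medoid of a cluster is the member with the smallest total
    # distance to the other members.
    points = np.random.rand(20, 3)
    dm = LazyDistanceMatrix(points)
    cluster = [0, 3, 5, 7, 11]
    medoid = min(cluster, key=lambda i: sum(dm.dist(i, j) for j in cluster))
    print(medoid)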

Algorithm for fitting abstract distances in 2D

Suppose we are given a small number of objects and "distances" between them -- what algorithm exists for fitting these objects to points in two-dimensional space in a way that approximates these distances?
The difficulty here is that the "distance" is not distance in Euclidean space -- this is why we can only fit/approximate.
(for those interested in what the notion of distance is precisely, it is the symmetric distance metric on the power set of a (finite) set).
Given that the number of objects is small, you can create an undirected weighted graph in which the objects are nodes and the edge between any two nodes has a weight corresponding to the distance between those two objects. You end up with n*(n-1)/2 edges.
Once the graph is created, there is plenty of visualization software, and there are plenty of layout algorithms, that work on such graphs.
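As an illustration (the objects and distance function below are made up; networkx's Kamada-Kawai layout treats edge weights as target distances):

    import itertools
    import networkx as nx

    # Hypothetical objects and distance: size of the symmetric difference between sets.
    objects = [{1, 2}, {2, 3}, {1, 3, 4}, {4}]
    def dist(a, b):
        return len(a ^ b)

    # Complete weighted graph: one node per object, n*(n-1)/2 edges.
    G = nx.Graph()
    for i, j in itertools.combinations(range(len(objects)), 2):
        G.add_edge(i, j, weight=dist(objects[i], objects[j]))

    # kamada_kawai_layout minimizes the mismatch between 2D Euclidean distances
    # and the weighted graph distances, giving approximate coordinates per node.
    pos = nx.kamada_kawai_layout(G, weight="weight")
    print(pos)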
Try a triangulation method, something like this:
Start by taking three objects with known distances between them, and create a triangle on an arbitrary grid based on the side lengths.
For each additional object that has not been placed, find at least three already-placed objects to which you have known distances, and use those distances to place the object using distance/distance intersection (i.e. the intersection point of the three circles centred on the fixed points, with radii equal to the distances; a sketch of this step follows this answer).
Repeat until all objects have been placed, or no more objects can be placed.
For unplaced objects, you could start another similar exercise, and then use any available distances to relate the separate clusters. Look up triangulation and trilateration networks for more info.
Edit: As per the comment below, where the distances are approximate and include an element of error, the above approach may be used to establish provisional coordinates for each object, and those coordinates may then be adjusted using a least-squares method such as variation of coordinates. This would also cater for weighting distances based on their magnitude, as required. For a more detailed description, check Ghilani & Wolf's book on the subject. This depends very much on the nature of the differences between your distances and on how you would like your objects represented in Euclidean space based on those distances. The relationship needs to be modelled and applied as part of any solution.
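A least-squares sketch of that circle-intersection step (the anchors and distances below are made up; with noisy distances the linearized system is simply solved in a least-squares sense):

    import numpy as np

    def trilaterate(anchors, dists):
        # Place a new point from three or more already-placed anchor points and
        # the distances to them: linearize the circle equations by subtracting
        # the first from the others, then solve the resulting linear system.
        anchors = np.asarray(anchors, dtype=float)
        dists = np.asarray(dists, dtype=float)
        x0, y0, d0 = anchors[0, 0], anchors[0, 1], dists[0]
        A = 2 * (anchors[1:] - anchors[0])
        b = (d0**2 - dists[1:]**2
             + anchors[1:, 0]**2 - x0**2
             + anchors[1:, 1]**2 - y0**2)
        point, *_ = np.linalg.lstsq(A, b, rcond=None)
        return point

    # Example: the point (1, 2) is recovered from three anchors with exact distances.
    anchors = [(0, 0), (4, 0), (0, 3)]
    true_point = np.array([1.0, 2.0])
    dists = [np.linalg.norm(true_point - np.array(a)) for a in anchors]
    print(trilaterate(anchors, dists))   # approximately [1. 2.]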
This is an example of Multidimensional Scaling, or more generally, Nonlinear dimensionality reduction. There are a fair number of tools/libraries available for doing this (see the second link for a list).
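For instance, with scikit-learn's metric MDS (the distance matrix below is made up; dissimilarity="precomputed" lets you feed your own non-Euclidean distances):

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical symmetric distance matrix between 5 objects (zero diagonal).
    D = np.array([
        [0, 2, 3, 4, 3],
        [2, 0, 1, 3, 4],
        [3, 1, 0, 2, 3],
        [4, 3, 2, 0, 1],
        [3, 4, 3, 1, 0],
    ], dtype=float)

    # Metric MDS finds 2D points whose Euclidean distances approximate D
    # in a least-squares (stress) sense.
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)
    print(coords)   # one (x, y) pair per object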

Algorithm: 2D transformation, find outlying pairs of points and omit

I am looking for the following type of algorithm:
There are n matched pairs of points in 2D. How can I identify outlying pairs of points according to Affine / Helmert transformation and omit them from the transformation key? We do not know the exact number of such outlying pairs.
I cannot use the Trimmed Least Squares method because it makes the basic assumption that a fixed percentage k of the pairs is correct. But we do not have any information about the sample and do not know k... In a given sample, all of the pairs could be correct, or vice versa.
Which types of algorithms are suitable for this problem?
Use RANSAC:
Repeat the following steps a fixed number of times:
Randomly select as many pairs as are necessary to compute the transformation parameters.
Compute the parameters.
Compute the subset of pairs that have small projection error (the 'consensus set').
If the consensus set is large enough, compute a projection for it (e.g. with Least Squares).
Compute the consensus set's projection error.
Remember the model if it is the best you found so far.
You have to experiment to find good values for
"a fixed number of times"
"small projection error"
"consensus set is large enough".
The simplest approach is to compute your transformation based on all points, compute the residuals for each point, and remove the points with high residuals until you reach an acceptable transformation or hit the minimum number of acceptable input points. The residual for any given point is the join distance between the forward-transformed value of that point and the intended target point.
Note that the residuals between an affine transformation and a Helmert (conformal) transformation will be very different, as these transformations do different things. The non-uniform scale of the affine transformation has more 'stretch' and will hence lead to smaller residuals.
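That trimming loop could look like this (a sketch; fit and transform are assumed to be supplied by the caller, e.g. the affine helpers sketched above):

    import numpy as np

    def iterative_trim(src, dst, fit, transform, max_residual=1.0, min_points=3):
        # Repeatedly fit, then drop the pair with the worst residual, until all
        # residuals are acceptable or the minimum number of points is reached.
        keep = np.arange(len(src))
        while True:
            params = fit(src[keep], dst[keep])
            residuals = np.linalg.norm(transform(params, src[keep]) - dst[keep], axis=1)
            worst = int(np.argmax(residuals))
            if residuals[worst] <= max_residual or len(keep) <= min_points:
                return params, keep
            keep = np.delete(keep, worst)                   # drop the worst outlier and refit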

What are eigenvalues and expansions?

What are eigenvalues, eigenvectors, and eigen expansions, and how can I, as an algorithm designer, use them?
EDIT: I want to know how YOU have used them in your programs, so that I get an idea. Thanks.
They're used for a lot more than matrix algebra. Examples include:
The asymptotic state distribution of a hidden Markov model is given by the left eigenvector associated with the unit eigenvalue of the state transition matrix.
One of the best and fastest methods of finding community structure in a network is to construct what's called the modularity matrix (which basically measures how "surprising" a connection between two nodes is); the signs of the elements of the eigenvector associated with its largest eigenvalue then tell you how to partition the network into two communities.
In principal component analysis you essentially select the eigenvectors associated with the k largest eigenvalues of the n >= k dimensional covariance matrix of your data and project your data down to the k-dimensional subspace (sketched in code after this list). Using the largest eigenvalues ensures that you retain the dimensions that are most significant to the data, since they are the ones with the greatest variance.
Many methods of image recognition (e.g. facial recognition) rely on building an eigenbasis from known data (a large set of faces) and seeing how difficult it is to reconstruct a target image using that eigenbasis -- if it's easy, then the target image is likely to be from the set the eigenbasis describes (i.e. eigenfaces easily reconstruct faces, but not cars).
If you're into scientific computing, the eigenvectors of a quantum Hamiltonian are the states that are stable, in the sense that if a system is in an eigenstate at time t1, then at time t2 > t1, if it hasn't been disturbed, it will still be in that eigenstate. Also, the eigenvector associated with the smallest eigenvalue of a Hamiltonian is the ground state of the system.
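The PCA step, for instance, is only a few lines of numpy (the data and sizes below are arbitrary):

    import numpy as np

    # Minimal PCA via the covariance matrix's eigendecomposition.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5))           # 200 samples, n = 5 dimensions
    k = 2                                      # keep the k largest-variance directions

    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # 5 x 5 covariance matrix

    # eigh returns the eigenvalues of a symmetric matrix in ascending order,
    # so the last k columns correspond to the k largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top_k = eigvecs[:, -k:]

    projected = centered @ top_k               # 200 x 2 projection onto the subspace
    print(projected.shape)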
Eigenvectors and their corresponding eigenvalues are mainly used to switch between coordinate systems. This can simplify problems and computations enormously by moving the problem from one coordinate system to another.
The new coordinate system has the eigenvectors as its basis vectors, i.e. they "span" it. For a symmetric matrix the eigenvectors can be normalized and chosen perpendicular to each other, so the change-of-basis matrix is orthonormal: its columns have magnitude 1 and are mutually perpendicular.
In the transformed coordinate system, the linear operation A (the matrix) becomes purely diagonal. See the Spectral Theorem and Eigendecomposition for more information.
A quick implication, for example, is that you can take a general quadratic curve:
ax^2 + 2bxy + cy^2 + 2dx + 2fy + g = 0
and rewrite it as
AX^2 + BY^2 + C = 0
where X and Y are measured along the directions of the eigenvectors.
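Numerically (the coefficients below are arbitrary), the quadratic part is the symmetric matrix [[a, b], [b, c]], and its eigenvalues are exactly the new coefficients A and B:

    import numpy as np

    a, b, c = 3.0, 1.0, 2.0                    # hypothetical conic coefficients
    M = np.array([[a, b], [b, c]])             # matrix of the quadratic part

    eigvals, eigvecs = np.linalg.eigh(M)       # columns of eigvecs are orthonormal
    A_coef, B_coef = eigvals                   # coefficients along the eigen directions

    # Rotating M into the eigenvector basis removes the cross term 2bxy:
    print(eigvecs.T @ M @ eigvecs)             # ~ [[A_coef, 0], [0, B_coef]]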
Cheers!
check out http://mathworld.wolfram.com/Eigenvalue.html
Using eigenvalues in algorithms will require you to be proficient with the math involved.
I'm absolutely the wrong person to be talking about math: I puke on it.
Eigenvalues and eigenvectors are used in matrix computations such as finding the inverse of a matrix, so if you need to write math code, precomputing them can speed up some operations.
In short, you need them if you do matrix algebra, linear algebra, etc.
Using the notation favored by physicists, if we have an operator H, then |x> is an eigenstate of H if and only if
H|x> = h|x>
where we call h the eigenvalue associated with the eigenvector |x> under H.
(Here the state of the system can be represented by a matrix, making this math isomorphic with all the other expressions already linked.)
Which brings us to the uses of these things once they have been discovered:
The full set of eigenvectors of a system under a given operator forms an orthogonal spanning set for the system. This set may be a basis if there is no degeneracy. This is very useful because it allows extremely compact expressions of arbitrary (non-eigen) states of the system.
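As a quick numeric illustration of H|x> = h|x> (with a small made-up symmetric H):

    import numpy as np

    H = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                 # a small symmetric "operator"
    eigvals, eigvecs = np.linalg.eigh(H)

    x = eigvecs[:, 0]                          # an eigenvector |x>
    h = eigvals[0]                             # its eigenvalue h
    print(np.allclose(H @ x, h * x))           # True: H|x> equals h|x>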
