Classification of multidimensional data - algorithm

I would like to classify some multidimensional data:
The input data is as follows:
Data1: [[a1,b1,f1], [a2,b2,f2], ... [an,bn,fn]] where: fn = F(an,bn) --> ClassA
Data2: [[c1,d1,g1], [c2,d2,g2], ... [cn,dn,gn]] where: gn = G(cn,dn) --> ClassB
...
So, given Datax, as follows, we would like to classify it into one of the finite classes we have:
Datax: [[x1,y1,z1], [x2,y2,z2], ... [xn,yn,zn]] where: zn = Z(xn,yn) --> which class?
I could probably flatten the array for each record and train my classifier:
Data1: [a1,b1,f1,a2,b2,f2,...,an,bn,fn]
But I thought because the third values themselves are a function of the first two values (e.g. fn = F(an,bn)), I should consider that relationship in my training rather than going for a flat array.
Does it make any difference? Or what is the best approach to solve this problem?

If the third value of each tuple is produced by the same deterministic function (the function may differ from row to row, but must be the same for every triple within a row),
then you can simply cut off zn, because it does not bring any new information.
e.g.: z1 = 3x1 + 2y1 ; z2 = 3x2 + 2y2 ; [...] ; zn = 3xn + 2yn
If that is not the case, then you should keep zn.
That said, I think you can flatten the array, because many models can pick up this kind of dependency from the flattened input on their own.
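A minimal sketch of the two flattening options described above, in plain Python. The helper flatten_record and the function F are hypothetical names used only for illustration; F stands in for whatever deterministic function produced the third value.

```python
def flatten_record(record, drop_third=False):
    """Flatten a list of [x, y, z] triples into one flat feature vector.

    With drop_third=True the third value of every triple is omitted,
    which only makes sense if it is a known function of the first two.
    """
    features = []
    for x, y, z in record:
        features.extend([x, y] if drop_third else [x, y, z])
    return features


def F(a, b):
    # Hypothetical deterministic function, mirroring the example z = 3x + 2y.
    return 3 * a + 2 * b


# One record of ClassA: the third value is derived from the first two.
data1 = [[a, b, F(a, b)] for a, b in [(1, 2), (3, 4), (5, 6)]]

full = flatten_record(data1)                      # keeps f1..fn
reduced = flatten_record(data1, drop_third=True)  # drops f1..fn
```

If the function differs between classes, the third value may still help tell the classes apart, so it is worth training on both variants and comparing the results.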


How do I calculate inner product of two vectors in nalgebra?

From the following
let v = OVector::<f64, U2>::from_column_slice(&[3_f64, 4_f64]);
let x = &v.transpose() * &v; // get the inner product, i.e. <v,v>
I expected x to be a f64 scalar, i.e. x = 25.0.
But actually, I can only obtain x as OMatrix::<f64, Const<1>, Const<1>>.
The case can be even worse in matrix product operations. For example, the following code doesn't work since v^T v is not a scalar.
let m = OMatrix::<f64, U2, U2>::from_element(1.0);
let v = OVector::<f64, U2>::from_column_slice(&[3_f64, 4_f64]);
// not working
let y = &v.transpose() * &v * m; // types conflict
// working
let y = 25.0 * m; // expected to behave like this
What is the correct way to do this?
Usually, in maths, you would identify 1x1 matrices with scalars (they are equivalent under a suitable notion of equivalence). With that identification, the dot product of two vectors is exactly the matrix product v^T v, where vectors are seen as column matrices.
However, that is not the case here: Rust has to know the type of the data, and a 1x1 matrix is not an f64. So I would suggest, since you are using matrices to start with, using the matrix inner product rather than the vector one. It's simply (v.transpose() * &v).trace(). This is a more general inner product (for two matrices A and B it is trace(A^T B)), and taking the trace exactly "extracts" the scalar from the 1x1 matrix.
Otherwise, this operation is already defined as the dot product (unsurprisingly): v.dot(&v).
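The identity behind this answer can be checked with a small, library-free sketch. It is written in Python rather than Rust, and the helpers matmul, transpose, trace and dot are hypothetical stand-ins, not nalgebra functions.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def dot(u, v):
    """Ordinary dot product of two column vectors stored as n x 1 matrices."""
    return sum(ui[0] * vi[0] for ui, vi in zip(u, v))


v = [[3.0], [4.0]]             # column vector as a 2x1 matrix
vtv = matmul(transpose(v), v)  # a 1x1 matrix, not a scalar
scalar = trace(vtv)            # the trace extracts the single entry
```

Here vtv is still a nested list holding one number, which is the same type distinction Rust enforces; trace(vtv) and dot(v, v) both give the plain scalar.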

Matrix valued undefined functions in SymPy

I'm looking for a way to specify matrix quantities that depend on a variable. For scalars this works as follows, using undefined functions:
from sympy import *
t = symbols('t')
x = Function('f')(t)
diff(x,t)
For Matrix Symbols like
x = MatrixSymbol('x',3,3)
I cannot find an equivalent. There is
i,j = symbols('i j')
x = FunctionMatrix(6,1,Lambda((i,j),f))
but this is not what I need, as you have to specify the contents of the matrix. The context is that I have equations
which should be differentiated with respect to time and contain matrix-valued elements.
I cannot deal with the elements of the matrices one by one.
Thanks!
I'm not sure about what you want, but I think you want to make a Matrix with differentiable elements. In that case, see if this works for you.
Create a matrix with function elements:
import sympy as sym
t = sym.Symbol('t')
X = sym.FunctionMatrix(6,1,lambda i,j:sym.Function("x_%d%d" % (i,j))(t))
M = sym.Matrix(X)
M.diff(t)
This results in
Matrix([
[Derivative(x_00(t), t)],
[Derivative(x_10(t), t)],
[Derivative(x_20(t), t)],
[Derivative(x_30(t), t)],
[Derivative(x_40(t), t)],
[Derivative(x_50(t), t)]])
You may then replace stuff as you need.
Also, it may be preferable to populate the matrix with the expressions you need before differentiating. Leaving them as undefined functions may make it harder to simplify after substitution.
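As a variant, the same matrix of undefined functions can be built directly with the Matrix(rows, cols, function) constructor, without going through FunctionMatrix. This is only a sketch under that assumption; the element names x_00, x_10, ... follow the naming used above.

```python
import sympy as sym

t = sym.Symbol("t")

# Each entry is an undefined function of t, named after its (row, column) index.
M = sym.Matrix(6, 1, lambda i, j: sym.Function(f"x_{i}{j}")(t))

# Differentiation is applied entry by entry.
dM = M.diff(t)
```

The result dM is again a 6x1 Matrix whose entries are unevaluated Derivative objects, so concrete expressions can be substituted in later.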

Scala: How to create a new list from the weighted difference of two lists added to a third

As part of an implementation of the Differential Evolution algorithm I need to implement the 'mutation' step:
Pick three members from the population at random, they must be distinct from each other as well as from a given member
Calculate a donor member by adding the weighted difference of two of the vectors to the third
This is what I came up with:
// Algorithm types
type Member = List[Double]
type Generation = Vector[Member]
def mutate(index: Int, generation: Generation): Member = {
// Create a random number stream with distinct values
val selector = Stream.continually(Random.nextInt(N)).distinct
// Select 3 mates from the generation
val mates = selector.filter(_ != index).take(3).map(generation(_))
// Calculate the donor member
(mates(0), mates(1), mates(2)).zipped map {
case (e1, e2, e3) => e1 + F * (e2 - e3)
}
}
(I implemented the algorithm as explained here)
Now my question: is there a better way to implement this step? I have been trying to find a better way to select 3 lists from a vector and zip them together, but I couldn't find anything other than putting the selected lists in a tuple manually. The Scala compiler gives a warning that one should use mates.head instead of mates(0), which suggests to me that this could be implemented more elegantly.
Thanks in advance!
You can transpose your mates, and then map over the result with a Seq extractor:
mates.transpose map {
case Seq(e1, e2, e3) => e1 + F * (e2 - e3)
}
This will be a Stream[Double], so to get a Member, you'd have to call toList on it, or use mates.toList.transpose ...
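For comparison, the mutation step can be sketched in a few lines of Python. This is an illustration of the step as described in the question, not a translation of the Scala code: the function name mutate, the parameter f (the weight F) and the optional rng argument are assumptions made for this example.

```python
import random


def mutate(index, generation, f, rng=random):
    """Build a donor member from three distinct mates, none of them `index`.

    Raises ValueError (from sample) if fewer than three other members exist.
    """
    candidates = [i for i in range(len(generation)) if i != index]
    a, b, c = rng.sample(candidates, 3)  # three distinct indices
    # zip walks the three mates element by element, like transposing them.
    return [
        e1 + f * (e2 - e3)
        for e1, e2, e3 in zip(generation[a], generation[b], generation[c])
    ]


generation = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
donor = mutate(0, generation, f=0.5, rng=random.Random(42))
```

Sampling indices without replacement plays the role of the distinct random stream, and zip over the three mates plays the role of transpose.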

find the intersection of two array structs in Matlab

How can I find the intersection of two struct arrays in Matlab?
For example, I have two struct arrays a and b:
a(1)=struct('x',1,'y',1);
a(2)=struct('x',3,'y',2);
a(3)=struct('x',4,'y',3);
a(4)=struct('x',5,'y',4);
a(5)=struct('x',1,'y',5);
b(1)=struct('x',1,'y',1);
b(2)=struct('x',3,'y',5);
I want to find the intersection of a and b as follows:
c = intersect(a,b)
where c should be
c = struct('x',1,'y',1);
But calling intersect(a,b) does not work, since the elements of a and b are both structures. How can I get around this difficulty? Thanks.
The elegant solution would have been to supply intersect with a comparison operator (as in, e.g., C++).
Unfortunately, Matlab does not seem to support this kind of functionality/flexibility.
A workaround for your problem would be
% convert structs into matrices
A = [[a(:).x];[a(:).y]]';
B = [[b(:).x];[b(:).y]]';
% intersect the equivalent representation
[C, ia, ib] = intersect( A, B, 'rows' );
% map back to original structs
c = a(ia);
Alternatively, have you considered replacing your structs with class objects derived from the handle class? It might be possible to overload the relational operators of the class, and then it should be possible to sort the class objects directly (I haven't looked closely into this solution; it's just a proposal off the top of my head).
A more general variant of Shai's approach is:
A = cell2mat(permute(struct2cell(a), [3 1 2]));
B = cell2mat(permute(struct2cell(b), [3 1 2]));
[C, ia] = intersect(A, B, 'rows');
c = a(ia);
This way you don't need to explicitly specify all the struct fields. Of course, this won't work if the struct fields contain non-numeric values.
Generalized approach for fields of any type and dimensions
If you're uncertain about the type and size of the data stored in your structs, intersect won't cut it. Instead, you'll have to use isequal with a loop. I'm using arrayfun here for elegance:
[X, Y] = meshgrid(1:numel(a), 1:numel(b));
c = a(any(arrayfun(@(m, n)isequal(a(m), b(n)), X, Y)));
A systematic approach would be to produce a hash - and then use intersect:
hash_fun = @(x) sprintf('x:%g;y:%g',x.x,x.y);
ha = arrayfun(hash_fun,a,'UniformOutput',false);
hb = arrayfun(hash_fun,b,'UniformOutput',false);
[hi,ind_a,ind_b]=intersect(ha,hb)
res=a(ind_a) % result of intersection
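The hashing idea is not specific to Matlab. A rough Python sketch of the same technique, with records stored as dictionaries, might look like this; the function name intersect_records and the fields argument are hypothetical, and the field values are assumed to be hashable.

```python
def intersect_records(a, b, fields):
    """Return the records of `a` whose field values also occur in `b`.

    Each record is reduced to a tuple key, which acts as the hash,
    and duplicates in `a` are reported only once.
    """
    keys_b = {tuple(rec[f] for f in fields) for rec in b}
    seen = set()
    result = []
    for rec in a:
        key = tuple(rec[f] for f in fields)
        if key in keys_b and key not in seen:
            seen.add(key)
            result.append(rec)
    return result


a = [
    {"x": 1, "y": 1},
    {"x": 3, "y": 2},
    {"x": 4, "y": 3},
    {"x": 5, "y": 4},
    {"x": 1, "y": 5},
]
b = [{"x": 1, "y": 1}, {"x": 3, "y": 5}]

c = intersect_records(a, b, ("x", "y"))
```

A tuple key avoids the formatting issues a string hash can have, since no number-to-text conversion is involved.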

What is the name of this geometrical function?

In a two-dimensional integer space, you have two points, A and B. This function returns an enumeration of the points in the quadrilateral subset bounded by A and B.
A = {1,1} B = {2,3}
Fn(A,B) = {{1,1},{1,2},{1,3},{2,1},{2,2},{2,3}}
I can implement it in a few lines of LINQ.
private void UnknownFunction(Point to, Point from, List<Point> list)
{
var vectorX = Enumerable.Range(Math.Min(to.X, from.X), Math.Abs(to.X - from.X) + 1);
var vectorY = Enumerable.Range(Math.Min(to.Y, from.Y), Math.Abs(to.Y - from.Y) + 1);
foreach (var x in vectorX)
foreach (var y in vectorY)
list.Add(new Point(x, y));
}
I'm fairly sure that this is a standard mathematical operation, but I can't think what it is.
Feel free to tell me that it's one line of code in your language of choice. Or to give me a cunning implementation with lambdas or some such.
But mostly I just want to know what it's called. It's driving me nuts.
It feels a little like a convolution, but it's been too long since I was at school for me to be sure.
It's the Cartesian product of the sets {1,2} and {1,2,3} in your specific example, or more generally the Cartesian product of vectorX and vectorY in your code example.
I don't know that this is a standard mathematical operation, but if you wanted to describe it mathematically, it would be described as follows.
Take two points, (x_1,x_2) and (y_1,y_2), in N^2. Let min_1 be min(x_1,y_1) and max_1 be max(x_1,y_1), with symmetric definitions for min_2 and max_2. Then the set is defined as:
Enum = { (a,b) : (a,b) in N^2 and min_1 <= a <= max_1 and min_2 <= b <= max_2 }
This seems pretty arbitrary to me, and I would say it is not a standard mathematical operation.
Solving it using the Cartesian product becomes trickier. It was simple to use the Cartesian product when the points are so close together, but what about when you have {1,1} and {8,8}? Then the problem is a little more involved. You take the two sets:
{ a : min(x_1,y_1) <= a <= max(x_1,y_1) } and { b : min(x_2,y_2) <= b <= max(x_2,y_2) }
In both instances you're simply taking all the values in each range and enumerating across the space. Once again, though, it feels like an arbitrary operation, and maybe I'm wrong, but I don't think this has a well-known name, other than enumerating the points in a rectangle.
Integer / lattice points in bounds / rectangle.
(Similar to the name of http://en.wikipedia.org/wiki/Integer_points_in_convex_polyhedra)
Cartesian product using list comprehensions in Python:
[(x, y) for x in [1, 2] for y in [1, 2, 3]]
and Haskell:
[(x, y) | x <- [1, 2], y <- [1, 2, 3]]
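Putting the pieces together, the function from the question can be sketched in Python with itertools.product. The name lattice_points is made up for this example, and points are assumed to be (x, y) tuples of integers.

```python
from itertools import product


def lattice_points(a, b):
    """Enumerate the integer points in the axis-aligned rectangle spanned by a and b."""
    xs = range(min(a[0], b[0]), max(a[0], b[0]) + 1)
    ys = range(min(a[1], b[1]), max(a[1], b[1]) + 1)
    # Cartesian product of the two integer ranges, x varying slowest.
    return list(product(xs, ys))


points = lattice_points((1, 1), (2, 3))
```

Because the ranges are built from min and max, the order in which the two corner points are passed does not matter.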
