find all indices of multiple value pairs in a matrix - performance

Suppose I have a matrix A, containing possible value pairs and a matrix B, containing all value pairs:
A = [1,1;2,2;3,3];
B = [1,1;3,4;2,2;1,1];
I would like to create a matrix C that contains all pairs that are allowed by A (i.e. C = [1,1;2,2;1,1]).
Using C = ismember(A,B,'rows') will only show the first occurrence of 1,1, but I need both.
Currently I use a for-loop to create C, which looks like:
TFtot = false(size(B,1),1);
for i = 1:size(A,1)
TF1 = A(i,1) == B(:,1) & A(i,2) == B(:,2);
TFtot = TF1 | TFtot;
end
C = B(TFtot,:);
I would like to create a faster approach, because this loop currently greatly slows down the algorithm.

You're pretty close. You just need to swap B and A, then use this output to index into B:
L = ismember(B, A, 'rows');
C = B(L,:);
In this particular case, ismember outputs a logical vector with one element per row of B, where the ith element tells you whether the ith row of B was found somewhere in A (logical 1) or not (logical 0).
You want to select out those entries in B that are seen in A, and so you simply use the output of ismember to slice into B to extract out the affected rows, and grab all of the columns.
We get for C:
>> C
C =
1 1
2 2
1 1
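For readers outside MATLAB, the same row-membership idea can be sketched in plain Python. This is only an illustration of the technique, not part of the answer above; the helper name rows_in is made up for this example.

```python
def rows_in(B, A):
    """Return one boolean per row of B: True if that row also appears in A."""
    allowed = {tuple(row) for row in A}  # hashable copies of the allowed pairs
    return [tuple(row) in allowed for row in B]

A = [[1, 1], [2, 2], [3, 3]]
B = [[1, 1], [3, 4], [2, 2], [1, 1]]

mask = rows_in(B, A)  # plays the role of L in the MATLAB answer
C = [row for row, keep in zip(B, mask) if keep]
print(C)  # -> [[1, 1], [2, 2], [1, 1]]
```

Because the mask has one entry per row of B, duplicated rows such as 1,1 are kept every time they occur.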

Here's an alternative using bsxfun:
C = B(all(any(bsxfun(@eq, B, permute(A, [3 2 1])),3),2),:);
Or you could use pdist2 (Statistics Toolbox):
B(any(~pdist2(A,B),1),:);

Using matrix-multiplication based euclidean distance calculations -
Bt = B.'; %//'
[m,n] = size(A);
dists = [A.^2 ones(size(A)) -2*A ]*[ones(size(Bt)) ; Bt.^2 ; Bt];
C = B(any(dists==0,1),:);

Related

Can I put multiple functions into one matrix using iteration in Julia?

I am new to Julia and trying to see whether I can store different functions as the elements of a matrix.
I constructed a matrix B (2x2).
I want to put functions in it: for example, x^1 + 1x as the (1,1) element, x^1 + 2x as the (1,2) element, x^2 + 1x as the (2,1) element and x^2 + 2x as the (2,2) element.
I was planning to do something like as below but couldn't find a way to implement this. Is it possible to make such Matrix?
B = Array{Function}(undef,2,2)
agrid = 1:1:2
dgrid = 1:1:2
for(i_a,a) in enumerate(agrid)
for(i_d,d) in enumerate(dgrid)
B[i_a,i_d](x) = x^a+d*x
end
end
The reason I want to construct this matrix is to solve the following model.
I need to solve the equation with two variables 'm' and 'n' given 'a' and 'd'.
And I thought if I have a matrix consisting of each function with all possible combinations of 'a' and 'd'(in the sample code, the possible combination would be (1,1) (1,2) (2,1) (2,2)), then it would be easier to solve the model at once.
Any help will be appreciated.
Thank you.
Yes it's possible. You could insert Functions one by one into the B you have. In this case, there's a pattern so you could have built B with anonymous functions and comprehension:
M = 2
N = 2
x = 10
B = [ ((y) -> (y^a + b*y)) for a in 1:M, b in 1:N]
result1 = x .|> B # result1[i] = B[i](x)
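For comparison, the same "grid of closures" idea can be sketched in Python. This is an illustrative translation under assumptions, not Julia code: the factory name make_f is hypothetical, and it is used because Python lambdas created directly inside a loop or comprehension would all see the final values of a and b.

```python
def make_f(a, b):
    """Build y -> y**a + b*y with a and b fixed at creation time."""
    return lambda y: y**a + b * y

M, N = 2, 2
x = 10

# B[i][j] holds the function for a = i + 1, b = j + 1
B = [[make_f(a, b) for b in range(1, N + 1)] for a in range(1, M + 1)]

# Evaluate every stored function at x
result = [[f(x) for f in row] for row in B]
print(result)  # -> [[20, 30], [110, 120]]
```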
I'd rather not tie the array's indices to the functions though; you would have to keep making a new Function matrix for different M and N. Instead, you could make a more memory-efficient CartesianIndices matrix and use a function that takes in CartesianIndex:
# memory-efficient matrix of CartesianIndex{2}
indices = CartesianIndices((M,N))
# if x were an MxN Matrix, you could also do CartesianIndices(x)
f(y, a, b) = y^a + b*y
g(z, index::CartesianIndex{2}) = f(z, index[1], index[2])
result2 = g.(x, indices) # result2[i] = g(x, indices[i])
As an aside, it appears that in the anonymous function comprehension used to create my B, only 1 actual method is compiled, and each element is similar to an instance of a functor with a and b as fields:
struct P{A,B} <: Function # subtype Function to be similar to B
a::A
b::B
end
(p::P)(y) = y^(p.a) + (p.b)*y
PB = [P(a, b) for a in 1:M, b in 1:N]
result3 = x .|> PB # result3[i] = PB[i](x) = P(a, b)(x)

How to extract optimization problem matrices A,b,c using JuMP in Julia

I create an optimization model in Julia-JuMP using the symbolic variables and constraints e.g. below
using JuMP
using CPLEX
# model
Mod = Model(CPLEX.Optimizer)
# sets
I = 1:2;
# Variables
x = @variable( Mod , [I] , base_name = "x" )
y = @variable( Mod , [I] , base_name = "y" )
# constraints
Con1 = @constraint( Mod , [i in I] , 2 * x[i] + 3 * y[i] <= 100 )
# objective
ObjFun = @objective( Mod , Max , sum( x[i] + 2 * y[i] for i in I) ) ;
# solve
optimize!(Mod)
I guess JuMP creates the problem in the form minimize c'*x subject to Ax <= b before it is passed to the solver CPLEX. I want to extract the matrices A, b, c. In the above example I would expect something like:
A
2×4 Array{Int64,2}:
2 0 3 0
0 2 0 3
b
2-element Array{Int64,1}:
100
100
c
4-element Array{Int64,1}:
1
1
2
2
In MATLAB the function prob2struct can do this https://www.mathworks.com/help/optim/ug/optim.problemdef.optimizationproblem.prob2struct.html
Is there a JuMP function that can do this?
This is not easily possible as far as I am aware.
The problem is stored in the underlying MathOptInterface (MOI) specific data structures. For example, constraints are always stored as MOI.AbstractFunction - in - MOI.AbstractSet. The same is true for the MOI.ObjectiveFunction. (see MOI documentation: https://jump.dev/MathOptInterface.jl/dev/apimanual/#Functions-1)
You can, however, try to recompute the objective function terms and the constraints in matrix-vector form.
For example, assuming you still have your JuMP.Model Mod, you can examine the objective function closer by typing:
using MathOptInterface
const MOI = MathOptInterface
# this only works if you have a linear objective function (the model has a ScalarAffineFunction as its objective)
obj = MOI.get(Mod, MOI.ObjectiveFunction{MOI.ScalarAffineFunction{Float64}}())
# take a look at the terms
obj.terms
# from this you could extract your vector c
c = zeros(4)
for term in obj.terms
c[term.variable_index.value] = term.coefficient
end
@show(c)
This gives indeed: c = [1.;1.;2.;2.].
You can do something similar for the underlying MOI.constraints.
# list all the constraints present in the model
cons = MOI.get(Mod, MOI.ListOfConstraints())
@show(cons)
In this case we only have one type of constraint, i.e. (MOI.ScalarAffineFunction{Float64} in MOI.LessThan{Float64}).
# get the constraint indices for this combination of F(unction) in S(et)
F = cons[1][1]
S = cons[1][2]
ci = MOI.get(Mod, MOI.ListOfConstraintIndices{F,S}())
You get two constraint indices (stored in the array ci), because there are two constraints for this combination F - in - S.
Let's examine the first one of them closer:
ci1 = ci[1]
# to get the function and set corresponding to this constraint (index):
moi_backend = backend(Mod)
f = MOI.get(moi_backend, MOI.ConstraintFunction(), ci1)
f is again of type MOI.ScalarAffineFunction which corresponds to one row a1 in your A = [a1; ...; am] matrix. The row is given by:
a1 = zeros(4)
for term in f.terms
a1[term.variable_index.value] = term.coefficient
end
@show(a1) # gives [2.0 0 3.0 0] (the first row of your A matrix)
To get the corresponding first entry b1 of your b = [b1; ...; bm] vector, you have to look at the constraint set of that same constraint index ci1:
s = MOI.get(moi_backend, MOI.ConstraintSet(), ci1)
@show(s) # MathOptInterface.LessThan{Float64}(100.0)
b1 = s.upper
I hope this gives you some intuition on how the data is stored in MathOptInterface format.
You would have to do this for all constraints and all constraint types and stack them as rows in your constraint matrix A and vector b.
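The stacking step can be sketched independently of JuMP. The Python below is a hedged illustration, not MathOptInterface code: the constraints list is hypothetical stand-in data shaped like the (variable index, coefficient) terms and upper bounds read out above, and it assumes the variables are numbered x[1], x[2], y[1], y[2] as 1 to 4.

```python
# Hypothetical stand-in data: each constraint is (terms, upper_bound), where
# terms is a list of (variable_index, coefficient) pairs with 1-based indices.
constraints = [
    ([(1, 2.0), (3, 3.0)], 100.0),  # 2*x[1] + 3*y[1] <= 100
    ([(2, 2.0), (4, 3.0)], 100.0),  # 2*x[2] + 3*y[2] <= 100
]
num_vars = 4

A = []  # one dense row per constraint
b = []  # one right-hand side per constraint
for terms, upper in constraints:
    row = [0.0] * num_vars
    for var_index, coeff in terms:
        row[var_index - 1] += coeff  # accumulate in case an index repeats
    A.append(row)
    b.append(upper)

print(A)  # -> [[2.0, 0.0, 3.0, 0.0], [0.0, 2.0, 0.0, 3.0]]
print(b)  # -> [100.0, 100.0]
```

Accumulating with += rather than plain assignment is a design choice of this sketch: it stays correct if the same variable appears in more than one term of a function.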
Use the following lines:
using Pkg
Pkg.add("NLPModelsJuMP")
using NLPModelsJuMP
nlp = MathOptNLPModel(model) # model is the JuMP model you created earlier, with its variables and constraints (and optionally the objective function) attached
x = zeros(nlp.meta.nvar)
c = NLPModelsJuMP.grad(nlp, x) # gradient of the objective; for a linear objective this is the cost vector c, not the right-hand side b
A = Matrix(NLPModelsJuMP.jac(nlp, x))
I didn't try it myself. But the MathProgBase package seems to be able to provide A, b, and c in matrix form.

Fast way of checking if an element is ranked higher than another

I am writing in MATLAB a program that checks whether two elements A and B were exchanged in ranking positions.
Example
Assume the first ranking is:
list1 = [1 2 3 4]
while the second one is:
list2 = [1 2 4 3]
I want to check whether A = 3 and B = 4 have exchanged relative positions in the rankings, which in this case is true, since in the first ranking 3 comes before 4 and in the second ranking 3 comes after 4.
Procedure
In order to do this, I have written the following MATLAB code:
positionA1 = find(list1 == A);
positionB1 = find(list1 == B);
positionA2 = find(list2 == A);
positionB2 = find(list2 == B);
if (positionA1 <= positionB1 && positionA2 >= positionB2) || ...
(positionA1 >= positionB1 && positionA2 <= positionB2)
... do something
end
Unfortunately, I need to run this code a lot of times, and the find function is really slow (but needed to get the element position in the list).
I was wondering if there is a way of speeding up the procedure. I have also tried to write a MEX file that performs in C the find operation, but it did not help.
If the lists don't change within your loop, then you can determine the positions of the items ahead of time.
Assuming that your items are always integers from 1 to N:
[~, positions_1] = sort( list1 );
[~, positions_2] = sort( list2 );
This way you won't need to call find within the loop, you can just do:
positionA1 = positions_1(A);
positionB1 = positions_1(B);
positionA2 = positions_2(A);
positionB2 = positions_2(B);
If your loop is going over all possible combinations of A and B, then you can also vectorize that.
Find the elements that exchanged relative ranking:
rank_diff_1 = bsxfun(@minus, positions_1, positions_1');
rank_diff_2 = bsxfun(@minus, positions_2, positions_2');
rel_rank_changed = sign(rank_diff_1) ~= sign(rank_diff_2);
[A_changed, B_changed] = find(rel_rank_changed);
Optional: Throw out half of the results, because if (3,4) is in the list, then (4,3) also will be, and maybe you don't want that:
mask = (A_changed < B_changed);
A_changed = A_changed(mask);
B_changed = B_changed(mask);
Now loop over only those elements that have exchanged relative ranking
for ii = 1:length(A_changed)
A = A_changed(ii);
B = B_changed(ii);
% Do something...
end
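The precomputed-position idea above can also be sketched in Python. This is a minimal illustration under assumptions: the helper names positions and exchanged are made up, and a dictionary replaces the sort-based lookup so the items need not be the integers 1 to N.

```python
def positions(ranking):
    """Map each item to its 0-based position in the ranking (built once)."""
    return {item: pos for pos, item in enumerate(ranking)}

def exchanged(pos1, pos2, a, b):
    """True if a and b appear in opposite relative order in the two rankings."""
    return (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])

list1 = [1, 2, 3, 4]
list2 = [1, 2, 4, 3]
pos1 = positions(list1)
pos2 = positions(list2)

print(exchanged(pos1, pos2, 3, 4))  # -> True
print(exchanged(pos1, pos2, 1, 2))  # -> False

# All unordered pairs whose relative ranking changed
changed = [(a, b) for a in list1 for b in list1
           if a < b and exchanged(pos1, pos2, a, b)]
print(changed)  # -> [(3, 4)]
```

Each exchanged call is two dictionary lookups per ranking, so no search through the list is repeated inside the loop.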
Instead of find, try computing something like this.
Check whether any values were exchanged:
if logical(sum(abs(list1-list2)))
% do something
end;
For specific values A and B:
if (list1(logical((list1-list2)-abs((list1-list2))))==A)&&(list1(logical((list1-list2)+abs((list1-list2))))==B)
% do something
end;

How to remove those rows of matrix A, which have equal values with matrix B in specified columns in Matlab?

I have two matrices in Matlab A and B, which have equal number of columns but different number of rows. The number of rows in B is also less than the number of rows in A. B is actually a subset of A.
How can I remove those rows efficiently from A, where the values in columns 1 and 2 of A are equal to the values in columns 1 and 2 of matrix B?
At the moment I'm doing this:
for k = 1:size(B, 1)
A(find((A(:,1) == B(k,1) & A(:,2) == B(k,2))), :) = [];
end
and Matlab complains that this is inefficient and that I should try to use any, but I'm not sure how to do it with any. Can someone help me out with this? =)
I tried this, but it doesn't work:
A(any(A(:,1) == B(:,1) & A(:,2) == B(:,2), 2), :) = [];
It complains the following:
Error using ==
Matrix dimensions must agree.
Example of what I want:
A-B in the results means that the rows of B are removed from A. The same goes with A-C.
Try using setdiff. For example:
c=setdiff(a,b,'rows')
Note, if order is important use:
c = setdiff(a,b,'rows','stable')
Edit: reading the edited question and the comments to this answer, the specific usage of setdiff you look for is (as noticed by Shai):
[temp c] = setdiff(a(:,1:2),b(:,1:2),'rows','stable')
c = a(c,:)
Alternative solution:
you can just use ismember:
a(~ismember(a(:,1:2),b(:,1:2),'rows'),:)
Use bsxfun:
compare = bsxfun( @eq, permute( A(:,1:2), [1 3 2]), permute( B(:,1:2), [3 1 2] ) );
twoEq = all( compare, 3 );
toRemove = any( twoEq, 2 );
A( toRemove, : ) = [];
Explaining the code:
First we use bsxfun to compare all pairs of rows in the first two columns of A and B, resulting in compare of size numRowsA-by-numRowsB-by-2, with compare( ii, jj, kk ) true where A(ii,kk) == B(jj,kk).
Then we use all to create twoEq of size numRowsA-by-numRowsB, where each entry indicates whether both corresponding entries of A and B are equal.
Finally, we use any to select the rows of A that match at least one row of B.
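The compare / all / any steps can be mirrored in plain Python to make the shapes concrete. This is only a sketch: the small matrices A and B below are made-up data, and nested lists stand in for MATLAB arrays.

```python
# Made-up example data: three columns, matching is done on the first two only
A = [[1, 2, 10], [3, 4, 20], [5, 6, 30], [3, 4, 40]]
B = [[3, 4, 99]]

# two_eq[i][j] is True when columns 1 and 2 of A's row i equal those of B's row j
two_eq = [[a[:2] == b[:2] for b in B] for a in A]

# A row of A is removed if it matches at least one row of B
to_remove = [any(row) for row in two_eq]

A_kept = [a for a, drop in zip(A, to_remove) if not drop]
print(A_kept)  # -> [[1, 2, 10], [5, 6, 30]]
```

Building the filtered list in one pass also avoids resizing A repeatedly inside a loop.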
What's wrong with original code:
By removing rows of A inside a loop (i.e., A( ... ) = []) you are actually resizing A at almost every iteration. See this post on why exactly this is a bad practice.
Using setdiff
In order to use setdiff (as suggested by natan) on only the first two columns, you'll need to use its second output argument:
[ignore, ia] = setdiff( A(:,1:2), B(:,1:2), 'rows', 'stable' );
A = A( ia, : ); % keeping only relevant rows, beyond first two columns.
Here's another bsxfun implementation -
A(~any(squeeze(all(bsxfun(@eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2)),2),:)
One more that is dangerously close to Shai's solution, but uses one permute instead of two -
A(~any(all(bsxfun(@eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2),3),:)

The Movie Scheduling _Problem_

Currently I'm reading "The Algorithm Design Manual" by Skiena (well, beginning to read)
He poses a problem he calls the "Movie Scheduling Problem":
Problem: Movie Scheduling Problem
Input: A set I of n intervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can
be selected from I?
Example: (Each dashed line is a movie, you want to find a set with the highest quantity of movies)
----a---
-----b---- -----c--- ---d---
-----e--- -------f---
--g-- --h--
The algorithm I thought of to solve it was like this:
I could throw out the "worst offender" (intersects with the most other movies) until there are no worst offenders (zero intersections). The only problem I see is that if there is a tie (say two different movies each intersect with 3 other movies) could it matter which one I throw out?
Basically I'm wondering how I go about turning the idea into "math" and how to prove it correct/incorrect.
The algorithm is incorrect. Let's consider the following example:
Counterexample
|----F----| |-----G------|
|-------D-------| |--------E--------|
|-----A------| |------B------| |------C-------|
You can see that there is a solution of size at least 3 because you can pick A, B and C.
Firstly, let's count, for each interval the number of intersections:
A = 2 [F, D]
B = 4 [D, F, E, G]
C = 2 [E, G]
D = 3 [A, B, F]
E = 3 [B, C, G]
F = 3 [A, B, D]
G = 3 [B, C, E]
Now consider a run of your algorithm. In the first step we delete B because it intersects with the largest number of intervals, and we get:
|----F----| |-----G------|
|-------D-------| |--------E--------|
|-----A------| |------C-------|
It's easy to see that now from {A, D, F} you can choose only one, because each pair intersects. The same case with {G, E, C}, so after deleting B, you can choose at most one from {A, D, F} and at most one from {G, E, C}, to get the total of 2, which is smaller than the size of {A, B, C}.
The conclusion is that after deleting B, which intersects with the largest number of intervals, you can't get the maximum number of nonintersecting movies.
Correct solution
The problem is very well known and one solution is to pick the interval which ends first, delete all intervals intersecting with it and continue until there are no intervals to examine. This is an example of a greedy method and you can find or develop a proof that it's correct.
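A minimal Python sketch of this earliest-finish-time greedy follows. The function name and the numeric endpoints are assumptions: the coordinates were chosen only so that the intervals intersect exactly as listed in the counterexample above, and intervals that merely touch at an endpoint are treated as non-overlapping.

```python
def max_non_overlapping(intervals):
    """Greedy: scan intervals by end time, keeping each one that starts
    no earlier than the end of the last interval kept."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # assumption: touching endpoints do not overlap
            chosen.append((start, end))
            last_end = end
    return chosen

# Assumed coordinates reproducing the counterexample's intersections
movies = {
    "A": (0, 8), "B": (10, 18), "C": (20, 28),
    "D": (4, 14), "E": (16, 24), "F": (6, 12), "G": (15, 22),
}

picked = max_non_overlapping(movies.values())
print(picked)  # -> [(0, 8), (10, 18), (20, 28)]
```

On this data the greedy keeps A, B and C, the size-3 solution that the "worst offender" approach loses by deleting B first.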
This looks like a dynamic programming problem to me:
Define the following functions:
sched(t) = best schedule starting at time t
next(t) = set of movies that start next after time t
len(m) = length of movie m
next returns a set because there may be more than one movie that starts at the same time.
then sched should be defined as follows:
sched(t) = max { 1 + sched(t + len(m)), sched(t+1) } where m in next(t)
This recursive function selects a movie m from next(t) and compares the largest possible sets that either include or don't include m.
Invoke sched with the time of your first movie and you will get the size of the optimal set. Getting the optimal set itself just requires a little extra logic to remember which movies you select at each invocation.
I think this recursive (as opposed to iterative) algorithm runs in O(n^2) if you use memoization, where n is the number of movies.
It's correct, although I'd have to consult my algorithms textbook to give you an explicit proof. Hopefully the algorithm makes intuitive sense.
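A memoized version of this include-or-skip recursion might look like the Python sketch below. It is a variant under assumptions rather than the exact sched(t) above: the recursion is indexed by movie (sorted by start time) instead of by time t, the names max_movies and best are made up, and the sample intervals are assumed coordinates consistent with the counterexample in the other answer.

```python
from bisect import bisect_left
from functools import lru_cache

def max_movies(intervals):
    """Size of the largest set of mutually non-overlapping intervals."""
    movies = sorted(intervals)  # sorted by start time
    starts = [s for s, _ in movies]

    @lru_cache(maxsize=None)
    def best(i):
        # best(i) = optimum using only movies i, i+1, ...
        if i == len(movies):
            return 0
        _, end = movies[i]
        # first later movie that starts at or after the end of movie i
        nxt = bisect_left(starts, end, i + 1)
        # either skip movie i, or watch it and jump to the next compatible one
        return max(best(i + 1), 1 + best(nxt))

    return best(0)

sample = [(0, 8), (10, 18), (20, 28), (4, 14), (16, 24), (6, 12), (15, 22)]
print(max_movies(sample))  # -> 3
```

With memoization each index is solved once, so after sorting the work is one binary search per movie.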
# go through the database and create a 2-D matrix indexed a..h by a..h. Set each
# element of the matrix to 1 if the row index movie overlaps the column index movie.
mtx = []
for i in range(8):
    column = []
    for j in range(8):
        column.append(0)
    mtx.append(column)
# b <> e
mtx[1][4] = 1
mtx[4][1] = 1
# e <> g
mtx[4][6] = 1
mtx[6][4] = 1
# e <> c
mtx[4][2] = 1
mtx[2][4] = 1
# c <> a
mtx[2][0] = 1
mtx[0][2] = 1
# c <> f
mtx[2][5] = 1
mtx[5][2] = 1
# c <> g
mtx[2][6] = 1
mtx[6][2] = 1
# c <> h
mtx[2][7] = 1
mtx[7][2] = 1
# d <> f
mtx[3][5] = 1
mtx[5][3] = 1
# a <> f
mtx[0][5] = 1
mtx[5][0] = 1
# a <> d
mtx[0][3] = 1
mtx[3][0] = 1
# a <> h
mtx[0][7] = 1
mtx[7][0] = 1
# e <> h
mtx[4][7] = 1
mtx[7][4] = 1
# print out constraints
for line in mtx:
    print(line)
# keep track of which movies are still allowed
allowed = set(range(8))
# loop through in greedy fashion, picking movie that throws out the least
# number of other movies at each step
while allowed:
    best_col = None
    best_lost = set()
    best = 8 # sentinel: larger than any possible number of overlaps
    # each step, only try movies still allowed
    for col in allowed:
        lost = set()
        for row in range(8):
            # keep track of other movies eliminated by this selection
            if mtx[row][col] == 1:
                lost.add(row)
        # this was the best of all the allowed choices so far
        if len(lost) < best:
            best_col = col
            best_lost = lost
            best = len(lost)
    # there was a valid selection, process
    if best_col is not None:
        print('watch movie:', chr(best_col + ord('a')))
        for row in best_lost:
            # now eliminate the other movies you can't now watch
            if row in allowed:
                print('throwing out:', chr(row + ord('a')))
                allowed.remove(row)
        # also throw out this movie from the allowed list (can't watch twice)
        allowed.remove(best_col)
# this is just a greedy algorithm, not guaranteed optimal!
# you could also iterate through all possible combinations of movies
# and simply eliminate all illegal possibilities (brute force search)
