Resorting and holding a vector in Modelica

So this is probably a very specific problem and I am not sure it is even solvable, but here we go:
I have a vector with 6 elements, each of which is a separately calculated variable. What I want is for the order of the elements to change at specific times and then stay fixed, while the values themselves keep being calculated. Maybe explaining it with my Modelica code helps with understanding.
I have a vector with six elements, made up of 6 variables; let's name them A to F. A to F are each calculated in a different way, which is (probably) not relevant here, so I simply write [...] for that. They behave independently of each other.
Real Vector[6];
Real A;
Real B;
Real C;
Real D;
Real E;
Real F;
equation
A = 3*x;
B = 5 - x + 7/x ...;
C = [and so on];
D = [...];
E = [...];
F = [...];
Initially, the Vector is sorted like this:
Vector = {A, B, C, D, E, F};
But I want the order of the elements to be resorted via some if-clauses every 100 seconds (starting at time=0), which I make work like this:
when sample(0,100) then
Vector = {if xyz then A,
elseif xyz then B ....}
end when;
Again, the specific way in which I resort the elements (probably) doesn't matter, because it definitely works.
My problem is: while this does resort my Vector every 100 seconds and holds the new order/sequence (which is exactly what I need), it of course also holds the values of A to F calculated at that instant. This means I get constant values between the sample times.
What I need is for the new order to hold while the values of A to F keep being calculated.
I also tried using if instead of when, like
if time < 100 then Vector = {A, B, C, D, E, F};
elseif time >= 100 and time < 200 then Vector = {if xyz then A elseif xyz then B ....(see above)};
else ...;
end if;
Problem here: it does resort my Vector while still calculating A to F. But it appears to resort my vector all the time, not only once every 100 seconds and then holding the order until the next 100 seconds are over (the resorting depends on other calculated values in the model, which are constantly changing).
My model is very large, so it's tricky to share all the parts that weave into this piece of my work, which is why I had to simplify my explanation as much as possible. I hope someone can help me with this.
I'm still relatively new at this and have been mostly teaching myself for the last few months so maybe I'm simply not aware of an easy obvious solution here. Or what I need is simply not doable in Modelica.
Thank you!

Not 100% sure I got the question correctly, but would the graph below show what you need?
...with v being the original vector and vs being the continuously computed, but sorted (ascending every 100s) version of v.
This is the respective code:
model VectorSorting "Computes 'vs' every 100s from 'v' with ascending order"
  Real A, B, C, D, E, F; // some variables computed in the equations below
  Real v[6]; // vector of A...F
  Real vs[6]; // sorted version of 'v'
  Integer i[6](start=1:6, fixed=true); // sorting indices of the vector
  Real d[6](start=zeros(6), fixed=true); // dummy variable
equation
  A = time + 200;
  B = time - 150;
  C = 3*time - 333;
  D = 0.5*time + 75;
  E = -250;
  F = 750;
  v = {A, B, C, D, E, F};
  vs = v[i];
  when sample(0, 100) then
    (d, i) = Modelica.Math.Vectors.sort(v);
  end when;
  annotation (experiment(StopTime=500), uses(Modelica(version="4.0.0")));
end VectorSorting;
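If it helps to see the trick outside Modelica, the key idea can be sketched in plain Python (the names and the `values()` stand-in are mine, not part of the model): latch only the *permutation* at each sample instant, and keep applying it to the continuously recomputed values.

```python
def values(t):
    # stand-ins for the A...F equations in the model above
    return [t + 200, t - 150, 3*t - 333, 0.5*t + 75, -250, 750]

def sorted_view(t, sample_period=100):
    # permutation computed only from the values at the last sample instant
    t_sample = (int(t) // sample_period) * sample_period
    order = sorted(range(6), key=lambda k: values(t_sample)[k])
    # ...but applied to the *current* values, which keep being calculated
    v = values(t)
    return [v[k] for k in order]
```

The permutation plays the same role as the Integer vector `i` in the model: it is the only thing the when-clause holds, so `vs = v[i]` stays continuous between samples.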

Related

Pairing the weight of a protein sequence with the correct sequence

This piece of code is part of a larger function. I already created a list of molecular weights and I also defined a list of all the fragments in my data.
I'm trying to figure out how I can go through the list of fragments, calculate their molecular weight and check if it matches the number in the other list. If it matches, the sequence is appended into an empty list.
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY', 'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
from Bio import SeqUtils

frags = []
for c in combs:
    for f in fragments:
        if c == SeqUtils.molecular_weight(f, 'protein', circular=True):
            frags.append(f)
print(frags)
I'm guessing I don't fully know how the SeqUtils.molecular_weight command works in Python, but if there is another way that would also be great.
You are comparing floating point values for equality. That is bound to fail. You always have to account for some degree of error when dealing with floating point values. In this particular case you also have to take into account the error margin of the input values.
So do not compare floats like this
x == y
but instead like this
abs(x - y) < epsilon
where epsilon is some carefully selected tolerance.
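A minimal standalone illustration (the epsilon of 1e-9 is arbitrary here); Python's math.isclose offers a relative-tolerance variant of the same idea:

```python
import math

x = 0.1 + 0.2   # actually 0.30000000000000004 in binary floating point
y = 0.3

print(x == y)              # False: exact comparison of floats fails
print(abs(x - y) < 1e-9)   # True: absolute-tolerance comparison
print(math.isclose(x, y))  # True: relative tolerance (default rel_tol=1e-09)
```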
I made two slight modifications to your code: I swapped the order of the f and c loops to be able to store the calculated value of w, and I append the value of w to the list frags as well, in order to better understand what is happening.
Your modified code now looks like this:
from Bio import SeqUtils
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV',
'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY',
'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
frags = []
threshold = 0.5
for f in fragments:
    w = SeqUtils.molecular_weight(f, 'protein', circular=True)
    for c in combs:
        if abs(c - w) < threshold:
            frags.append((f, w))
print(frags)
This prints the result
[('AINV', 397.46909999999997), ('IEEATHMTPCYELHGLRWV', 2267.5843), ('MQCL', 475.6257), ('QIQDY', 647.6766)]
As you can see, the first value for the weight differs from the reference value by about 0.0009. That's why you did not catch it with your approach.

Check if a vector lies in the span a subset of columns of a matrix in Sage

I'm new to programming with Sage. I have a rectangular R*C matrix M (R rows and C columns), and the rank of M is (possibly) smaller than both R and C. I want to check if a target vector T is in the span of a subset of columns of M. I have written the following code in Sage (I haven't included the whole code because the way I get M and T is rather cumbersome). I just want to check if the code does what I want.
Briefly, this is what my code is trying to do: M is my given matrix. I first check that T is indeed in the span of the columns of M (the first if condition). If it is, I proceed to trim M (which had C columns) down to a matrix M1 which has exactly rank(M) many columns (this is what the first while loop does). After that, I keep removing columns one by one to check if the remaining columns still contain T in their span (this is the second while loop). In the second while loop, I first remove a column from M2 (which is essentially a copy of M1) and call this matrix M3. To M3 I augment the vector T and check if the rank decreases. Since T was already in the span of M2, rank([M2 T]) should be the same as rank(M2). Now, if removing column c and augmenting T to M2 doesn't decrease the rank, then I know that c is not necessary to generate T. This way I only keep those columns that are necessary to generate T.
It does return correct answers for the examples I tried, but I am going to run this code on matrices with entries which vary a lot in magnitude (say the maximum is as large as 20^20 and the minimum is 1), and typically the matrix dimensions could go up to 300. So I am planning to run it over a set of a few hundred test cases over the weekend. It'll be really helpful if you can tell me if something looks fishy/wrong -- e.g., will I run into precision errors? How should I modify my code so that it works for all values/ranges as mentioned above? Also, if there is any way to speed up my code (or write the same thing in a shorter/nicer way), I'd like to know.
R = 155
C = 167
T = vector(QQ, R)
M1 = matrix(ZZ, R, C)
M1 = M
C1 = C
i2 = 0
if rank(M.augment(T)) == rank(M):
    print("The rank of M is")
    print(rank(M))
    while i2 < C1:
        if rank(M1.delete_columns([i2])) == rank(M1):
            M1 = M1.delete_columns([i2])
            C1 = C1 - 1
        else:
            i2 = i2 + 1
    C2 = M1.ncols()
    print("The number of columns in the trimmed down matrix M1 is")
    print(C2)
    i3 = 0
    M2 = M1
    print("The rank of M1 which is now also the rank of M2 is")
    print(rank(M2))
    while i3 < C2:
        M3 = M2.delete_columns([i3])
        if rank(M3.augment(T)) < rank(M2):
            M2 = M3
            C2 = C2 - 1
        else:
            i3 = i3 + 1
    print("Rank of matrix M is")
    print(M.rank())
If I wanted to use Sage to decide whether a vector T was in the image of a matrix M1 constructed from some subset of columns of another matrix M, I would do this:
M1 = M.matrix_from_columns([list of indices of the columns to use])
T in M1.column_space()
or use a while loop to modify M1 each time, as you do. (But I think T in M1.column_space() should work better than testing equality of ranks.)
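On the precision worry in the question: as long as the matrices stay over ZZ or QQ, Sage computes ranks exactly, so magnitudes up to 20^20 are not a problem. For intuition, here is a plain-Python sketch of the same rank test using exact rationals (the helper names are mine, not Sage's):

```python
from fractions import Fraction

def rank_exact(rows):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # find a pivot row for this column below the current rank
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def in_column_span(M, T):
    # T lies in the span of M's columns iff appending it does not raise the rank
    aug = [row + [t] for row, t in zip(M, T)]
    return rank_exact(aug) == rank_exact(M)
```

This mirrors the `rank(M.augment(T)) == rank(M)` test in the question, with no rounding at any step.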

How to implement dp?

I recently encountered a question where I am given two types of people: left-handed and right-handed (writing style). They are all to be seated in a classroom row, and to avoid any disturbance their hands must not collide, i.e. allowed patterns are LR, LL or RR (but not RL).
I tried using recursion (taking two branches, L and R, for every seat), but the number of computations would be very high even for a row size of 100.
Somehow I need to implement DP to reduce the computations. Please suggest.
EDIT:
Actually there is a matrix (like a classroom) in which three types (L, R, B) of people can be seated with no collisions of hands. I have to find the maximum number of people that can be seated. Suppose I have a 2x2 matrix to fill with left-, right- and both-handed persons, and L=0, R=1, B=3 are given. Then one valid arrangement would be row 0: B R and row 1: B -, where - means a blank seat.
Actually, the fact that you have a matrix doesn't make a difference: you can transform it to an array without losing generality, because each state depends only on its left and right neighbours, not on those above or below. A bottom-up approach: each state consists of the remaining counts of L, B and R, the index of the seat you want to fill, and the person to its left. Now we can fill the table from right to left. The answer is dp[index=0][L][B][R][left_person=' '].
recursive[index][L][B][R][left_person]:
    if left_person = ' ' or 'L':
        LVal = recursive[index+1][L-1][B][R]['L']
    if left_person = ' ' or 'L' or 'R':
        RVal = recursive[index+1][L][B][R-1]['R']
    if left_person = ' ' or 'L':
        BVal = recursive[index+1][L][B-1][R]['B']
    NVal = recursive[index+1][L][B][R][' ']
    dp[index][L][B][R][left_person] = max(LVal, RVal, BVal, NVal)
Of course this is not complete; I'm just giving you the general idea. You should add some details, like the base case, checking whether any person of that kind is left before seating one, and some other details.
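For concreteness, here is one way to flesh that recursion out in Python with memoization; the transition rules are exactly the ones stated above, the +1 terms count each seated person, and the function and parameter names are my own:

```python
from functools import lru_cache

def max_seated(seats, L, B, R):
    """Maximum people seated in a row of `seats`, given counts of left-,
    both- and right-handed people, using the transitions above: L and B
    may only follow an empty seat or an L; R may follow anything but B."""
    @lru_cache(maxsize=None)
    def go(i, l, b, r, left):
        if i == seats:
            return 0
        best = go(i + 1, l, b, r, ' ')                  # leave this seat empty
        if l and left in (' ', 'L'):
            best = max(best, 1 + go(i + 1, l - 1, b, r, 'L'))
        if r and left in (' ', 'L', 'R'):
            best = max(best, 1 + go(i + 1, l, b, r - 1, 'R'))
        if b and left in (' ', 'L'):
            best = max(best, 1 + go(i + 1, l, b - 1, r, 'B'))
        return best
    return go(0, L, B, R, ' ')
```

The state space is seats x L x B x R x 5 possible left-neighbours, so the memoized version stays small even for long rows.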

How can I intercept this approximation error in my matlab script?

I am trying to find the minimum of a function using this algorithm.
It's not an optimal algorithm, but I don't care at the moment.
Also, you don't have to know how the algorithm works in order to reply, but if you're curious, I talk about it at the end of this post. It's really not that difficult.
Incriminated Algorithm
function result = fmin(f,a,b,max_error)
if abs(b-a) < max_error
    result = (a+b)/2;
else
    r1 = a+(b-a)*rand(1,1); r2 = a+(b-a)*rand(1,1);
    c = min([r1,r2]); d = max([r1,r2]);
    fc = f(c); fd = f(d);
    if fc <= fd
        b = d;
    else
        a = c;
    end
    result = fmin(f,a,b,max_error);
end
Now, the problem is that this algorithm returns a minimum that differs from the actual minimum (computed via the MATLAB predefined function fminbnd) by more than max_error if I use it with values of max_error <= 1e-10. From a theoretical standpoint, this situation is not possible.
Being recursive, the algorithm would never return if the condition abs(b-a) < max_error were never satisfied.
So, I think there is some error arising from the approximation of the numbers. At first, I thought that r1 or r2 were not computed properly: at some point, the two numbers would fall outside the [a,b] interval, thus invalidating the hypothesis on which the algorithm is based.
To prove this, I modified the algorithm above to include a check on the interval that's computed at every iteration:
Incriminated Algorithm 2 [Check on the extremes]
function result = fmin(f,a,b,max_error)
if abs(b-a) < max_error
    result = (a+b)/2;
else
    r1 = a+(b-a)*rand(1,1); r2 = a+(b-a)*rand(1,1);
    c = min([r1,r2]); d = max([r1,r2]);
    % check that c and d are actually inside [a,b]
    if ((c < a)||(d > b))
        disp('Max precision reached');
        result = (a+b)/2;
        return;
    end
    fc = f(c); fd = f(d);
    if fc <= fd
        b = d;
    else
        a = c;
    end
    result = fmin(f,a,b,max_error);
end
But I don't get any additional output from the console.
So, I am thinking there is some error in the computation of f(c) or f(d), but I don't know how to prove it.
Question
Finally, my questions are
Can we, at this point, be sure that the error is committed in the computation of either f(c) or f(d)?
Can we prove it with some line of code? Or better, can we write the algorithm so that it returns when it is supposed to?
How the algorithm works (not strictly inherent to the question)
It's an iterative algorithm. Basically, the idea is to generate a sequence of intervals containing the solution, starting from an initial interval [a,b] on which a given function f is unimodal.
At every step, we randomly choose two numbers c and d so that a <= c <= d <= b. Now, if we find that f(c) > f(d), then because of the unimodality we can be sure that the values the function assumes before c can be discarded as candidates for the minimum. So we restrict the interval and repeat the procedure on the interval [c,b]. On the contrary, if f(c) < f(d), we can discard the values from d to b, so we repeat the procedure on the interval [a,d].
At every iteration the interval gets shorter. When its length is less than the specified max_error value, the algorithm returns the midpoint of the last interval as an approximation of the minimum.
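For what it's worth, the same interval-shrinking scheme can be transcribed into Python (iterative rather than recursive; this is my sketch, not the poster's MATLAB). With moderate tolerances it behaves as the theory predicts; with max_error near 1e-10 the difference f(c) - f(d) can fall below the rounding error of evaluating f itself, so the comparison that maintains the "minimum stays inside [a,b]" invariant can become unreliable.

```python
import random

def fmin_random(f, a, b, max_error=1e-6):
    """Shrink [a, b] by comparing f at two random interior points,
    keeping the sub-interval that must contain the minimum of a
    unimodal f, until the interval is shorter than max_error."""
    while abs(b - a) >= max_error:
        r1 = a + (b - a) * random.random()
        r2 = a + (b - a) * random.random()
        c, d = min(r1, r2), max(r1, r2)
        if f(c) <= f(d):
            b = d        # minimum cannot lie in (d, b]
        else:
            a = c        # minimum cannot lie in [a, c)
    return (a + b) / 2
```

For a convex function such as (t - 2)^2 on [0, 5] this homes in on t = 2 within the requested tolerance.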
EDIT
I see there is one person that wants to close this question because it is too broad.
Please sir, can you elaborate in the comments?
This subdivision method only works in the special case that your function is (quasi-)convex (one local minimum, monotonically falling on the left, rising on the right). In the case of several local minima it will often converge to one of them, but it is by no means guaranteed that the algorithm finds the global minimum. The reduction from a to c resp. from b to d can jump over several local minima.

Algorithm to separate items of the same type

I have a list of elements, each one identified with a type, and I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit about the quantity of items per type or quantity of types, the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurrences
B has at least 4 items between occurrences
C has 6 items between occurrences
D has 6 items between occurrences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with the smallest such distance.
secondary goal: maximize the minimum distance for every type. I.e.: if a combination increases the minimum distance of a certain type without decreasing that of another, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none is a solution for both goals, especially the second one, which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation and doesn't always lead to the best result according to the second goal.
Evgeny Kluev: Provides a concrete implementation to the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach, it doesn't always lead to the best result, but it's a good approximation and cost effective.
I tried a combination of Evgeny Kluev, backtracking, and tobias_k formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k to be the most adequate algorithm, for its simplicity and good results in a timely fashion. Probably, it could be improved using Simulated annealing.
First, you don't have a well-defined optimization problem yet. If you want to maximize the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ... , Ak where n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k down to 1, distribute the items of type Aj as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B
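A possible Python sketch of step 3 (the "distribute as uniformly as possible" part). The slot-picking rule is my own and will not reproduce the hand-worked rows above exactly, but it follows the same least-numerous-first insertion order:

```python
def spread(counts):
    """Build a list by inserting each type, least frequent first,
    at (roughly) evenly spaced positions of the growing list."""
    order = sorted(counts, key=counts.get)          # least numerous first
    out = []
    for t in order:
        n = counts[t]
        size = len(out) + n
        # evenly spaced target slots for the n new items (my choice of rule)
        slots = [round(i * size / n) for i in range(n)]
        new = []
        it = iter(out)
        for pos in range(size):
            new.append(t if pos in slots else next(it))
        out = new
    return out
```

Running it on the distribution from the question yields a 14-item list with the correct multiplicities and the most numerous type spread out, though not always at the theoretical optimum the stepwise construction above achieves.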
This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
import random

def optimize(items, quality_function, stop=1000):
    no_improvement = 0
    best = 0
    while no_improvement < stop:
        i = random.randint(0, len(items)-1)
        j = random.randint(0, len(items)-1)
        copy = items[::]
        copy[i], copy[j] = copy[j], copy[i]
        q = quality_function(copy)
        if q > best:
            items, best = copy, q
            no_improvement = 0
        else:
            no_improvement += 1
    return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some trying I came up with one that almost always yields optimal results. Thanks to pmoleri for pointing out how to make this a whole lot more efficient.
def quality_maxmindist(items):
    s = 0
    for item in set(items):
        indcs = [i for i in range(len(items)) if items[i] == item]
        if len(indcs) > 1:
            s += sum(1./(indcs[i+1] - indcs[i]) for i in range(len(indcs)-1))
    return 1./s
And here some random result:
>>> print optimize(items, quality_maxmindist)
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.
Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort the element sets by the number of elements of each type in descending order. Actually only the largest sets (A & B) need to be placed at the head of the list, as well as those element sets that have one element less (C & D & E); other sets may be left unsorted.
Reserve the last R positions in the array for one element from each of the largest sets, and divide the remaining array evenly between the S-1 remaining elements of the largest sets. This gives the optimal distance K = (N - R) / (S - 1). Represent the target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For the example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If the matrix has S - 1 full rows, fill the first R columns of this matrix with elements of the largest sets (A & B); otherwise sequentially fill all columns, starting from the last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Sequentially fill all columns, starting from last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to linear array we have:
ABFEDCABFEDCABFEDCABGEDCABG
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.
I believe you could see your problem like a bunch of particles that physically repel each other. You could iterate to a 'stable' situation.
Basic pseudo-code:
force(x, y) = 0 if x.type == y.type
              1/distance(x, y) otherwise
nextposition(x, force) = coined?(x) => same
                         else       => x + force
notconverged(row, newrow) = row != newrow   // simplistically

row = [a, b, a, b, b, b, a, e];
newrow = nextposition(row);
while (notconverged(row, newrow))
    row = newrow;
    newrow = nextposition(row);
I don't know if it converges, but it's an idea :)
I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you found one that works, try again, under the assumption that MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.
Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write the items down from left to right, keeping track of the number of items left of each type. At each point, put down an item of the type with the largest remaining count that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it preserves this property: after putting down k items we have k fewer items, and we have put down at least one of each type that started with ceil(N / k) items.
Given a clutch of mixed items, you could work out the largest k you can support and then lay out the items to solve for this k.
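A quick sketch of that greedy in Python (my own naming; it returns None when the greedy gets stuck, which signals that k is too ambitious for the given multiset):

```python
from collections import Counter

def lay_out(items, k):
    """Greedy layout: at each position, place the legal type with the most
    items left. A type is legal if it has items remaining and its previous
    occurrence is at least k positions back."""
    left = Counter(items)
    last = {}                       # type -> position of its last placement
    out = []
    for pos in range(len(items)):
        legal = [t for t in left if left[t] > 0 and pos - last.get(t, -k) >= k]
        if not legal:
            return None             # greedy got stuck: k too large
        pick = max(legal, key=lambda t: left[t])
        out.append(pick)
        left[pick] -= 1
        last[pick] = pos
    return out
```

With the distribution from the question (5 A, 3 B, 2 C, 2 D, 1 E, 1 F; N = 14), k = 3 satisfies the ceil(N / k) = 5 bound above, and the greedy fills all 14 positions with every same-type pair at least 3 apart.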
