BLX-alpha crossover: which approach is the right one?

I'm working on a genetic algorithm that uses blend crossover (BLX-alpha).
I found two descriptions of it that seem quite different from each other:
https://yadi.sk/i/u5nq986GuDoNm (page 8)
Crossover is performed as follows:
a. Select two parents, G1 and G2.
b. Generate a uniformly distributed random number gamma from [-alpha, 1 + alpha], where alpha = 0.5.
c. Generate an offspring as follows: G = gamma * G1 + (1 - gamma) * G2
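Below is a minimal Python sketch of this first description. It assumes a chromosome is a list of floats. The name `blx_line` and the `rng` parameter are illustrative and do not come from either source. As step b describes, one gamma is drawn for the whole chromosome.

```python
import random


def blx_line(g1, g2, alpha=0.5, rng=random):
    # a single gamma for the whole chromosome, uniform on [-alpha, 1 + alpha]
    gamma = rng.uniform(-alpha, 1 + alpha)
    # G = gamma * G1 + (1 - gamma) * G2, applied component-wise
    return [gamma * x1 + (1 - gamma) * x2 for x1, x2 in zip(g1, g2)]


child = blx_line([1.0, 5.0, 2.0], [3.0, 4.0, 2.5], rng=random.Random(0))
print(child)
```

For each gene, the child falls in [min - alpha*d, max + alpha*d], where d is the distance between the parents' values for that gene. Because gamma is shared across genes, though, the child always lies on the straight line through G1 and G2.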
http://www.tomaszgwiazda.com/blendX.htm
Crossover is performed as follows:
a. Select two parents X(t) and Y(t) from a parent pool.
b. Create two offspring X(t+1) and Y(t+1) as follows:
c. for i = 1 to n do
d.   di = |xi(t) - yi(t)|
e.   choose a uniform random real number u from the interval [min(xi(t), yi(t)) - a*di, max(xi(t), yi(t)) + a*di]
f.   xi(t+1) = u
g.   choose a uniform random real number u from the interval [min(xi(t), yi(t)) - a*di, max(xi(t), yi(t)) + a*di]
h.   yi(t+1) = u
i. end do
where:
a – positive real parameter
xi, yi - the i-th component of a parent
di - distance between parent components
Which of these two algorithms is correct? Or are they equivalent?
In my task I'm using the second method, because the first one gives unsatisfactory results.
I'm concerned about this because I'm working on a GA where the first algorithm is supposed to be used.
Any help would be appreciated!

You can look up the paper "Real-Coded Genetic Algorithms and Interval Schemata", in which BLX-alpha crossover was first introduced.
In this paper, the first algorithm is introduced.
As for the second one, I think it is equivalent to the first one in the way it produces offspring. Because the second algorithm produces two offspring at a time, it has a better chance of getting a good individual, but it also needs more function evaluations (FEs).

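import random
from copy import deepcopy

def crossover_blen(p1, p2, alpha):
    # per-gene BLX-alpha: each child gene is drawn uniformly from
    # [min - alpha*d, max + alpha*d], where d is the distance between the parent genes
    c1, c2 = deepcopy(p1), deepcopy(p2)
    for i in range(len(p1)):
        distancia = abs(c2[i] - c1[i])
        l = min(c1[i], c2[i]) - alpha * distancia
        u = max(c1[i], c2[i]) + alpha * distancia
        c1[i] = l + random.random() * (u - l)
        c2[i] = l + random.random() * (u - l)
    return [c1, c2]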


Queries for minimum fuel needed to travel from U to V

Question:
We are given a tree with N nodes.
Each edge of the tree has:
D: the length of the edge
T: the gold that must be paid to go through that edge (paid before traversing it)
When moving through an edge while carrying X gold, you need X*D fuel.
There are 2 types of queries:
u, v: find the fuel needed to carry gold from u to v so that G gold arrives at v (G is the same for all queries)
u, v, x: update T of edge {u, v} to x ({u, v} is guaranteed to be in the tree)
Constraints:
2 ≤ N ≤ 100,000
1 ≤ Q ≤ 100,000
1 ≤ Ai, Bi ≤ N
1 ≤ D, T, G ≤ 10^9
Example:
N = 6, G = 2
Take the first query, u = 3 and v = 6, as an example. You start at 3 with 11 gold, pay 2 (leaving 9), and go to node 2 using 9*1 = 9 fuel. Next, you pay 3 gold (leaving 6) and go to node 4 using 6*2 = 12 fuel. Finally, you pay 4 (leaving 2) and go to node 6 using 2*1 = 2 fuel. So the fuel needed is 9 + 12 + 2 = 23.
So the answer to query: u = 3, v = 6 would be 23
The second query is just updating T of the edge so I think there's no need for explanation.
My take
I was only able to solve the problem in O(N*Q). Since it's a tree, there's only 1 path from u to v, so for each query, I do a DFS to find the fuel needed to go from u to v. Here's the code for that subtask: https://ideone.com/SyINTQ
For the special case where all T are 0, we just need to find the length of the path from u to v and multiply it by G. That length can easily be found using a distance array and LCA. I think this could be a hint toward the proper solution.
Is there a way to answer each query in O(log N) or less?
P/S: Please comment if anything needs to be clarified, and sorry for my bad English.
This answer will explain my matrix group comment in detail and then
sketch the standard data structures needed to make it work.
Let’s suppose that we’re carrying Gold and have burned Fuel so far.
If we traverse an edge with parameters Distance, Toll, then the
effect is
Gold -= Toll
Fuel += Gold * Distance,
or as a functional program,
Gold' = Gold - Toll
Fuel' = Fuel + Gold' * Distance
= Fuel + Gold * Distance - Toll * Distance.
The latter code fragment defines what mathematicians call an action:
each Distance, Toll gives rise to a function from Gold, Fuel to
Gold, Fuel.
Now, whenever we have two functions from a domain to that same domain,
we can compose them (apply one after the other):
Gold' = Gold - Toll1
Fuel' = Fuel + Gold' * Distance1,
Gold'' = Gold' - Toll2
Fuel'' = Fuel' + Gold'' * Distance2.
The point of this math is that we can expand the definitions:
Gold'' = Gold - Toll1 - Toll2
= Gold - (Toll1 + Toll2),
Fuel'' = Fuel' + (Gold - (Toll1 + Toll2)) * Distance2
= Fuel + (Gold - Toll1) * Distance1 + (Gold - (Toll1 + Toll2)) * Distance2
= Fuel + Gold * (Distance1 + Distance2) - (Toll1 * Distance1 + (Toll1 + Toll2 ) * Distance2).
I’ve tried to express Fuel'' in the same form as before: the
composition has “Distance” Distance1 + Distance2 and “Toll”
Toll1 + Toll2, but the last term doesn’t fit the pattern. What we can
do, however, is add another parameter, FuelSaved, and define it to be
Toll * Distance for each of the input edges. The generalized update
rule is
Gold' = Gold - Toll
Fuel' = Fuel + Gold * Distance - FuelSaved.
I’ll let you work out the generalized composition rule for
Distance1, Toll1, FuelSaved1 and Distance2, Toll2, FuelSaved2.
Suffice it to say, we can embed Gold, Fuel as a column vector
{1, Gold, Fuel}, and parameters Distance, Toll, FuelSaved as a unit
lower triangular matrix
{{1, 0, 0}, {-Toll, 1, 0}, {-FuelSaved, Distance, 1}}. Then
composition is matrix multiplication.
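Working out the exercise from the expansion of Fuel'' gives Distance = Distance1 + Distance2, Toll = Toll1 + Toll2, and FuelSaved = FuelSaved1 + FuelSaved2 + Toll1 * Distance2. Below is a small Python sketch of that rule. The names `Edge`, `compose`, `traverse` and `matrix` are illustrative. It is checked on the path 3 → 2 → 4 → 6 from the question, whose edges have (D, T) = (1, 2), (2, 3), (1, 4).

```python
from typing import NamedTuple


class Edge(NamedTuple):
    distance: int
    toll: int
    fuel_saved: int


def edge(distance, toll):
    # a single edge: FuelSaved = Toll * Distance
    return Edge(distance, toll, toll * distance)


def compose(first, second):
    # parameters of "traverse `first`, then `second`"
    return Edge(first.distance + second.distance,
                first.toll + second.toll,
                first.fuel_saved + second.fuel_saved + first.toll * second.distance)


def traverse(e, gold, fuel):
    # Gold' = Gold - Toll,  Fuel' = Fuel + Gold * Distance - FuelSaved
    return gold - e.toll, fuel + gold * e.distance - e.fuel_saved


def matrix(e):
    # unit lower triangular matrix acting on the column vector (1, Gold, Fuel)
    return [[1, 0, 0], [-e.toll, 1, 0], [-e.fuel_saved, e.distance, 1]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


path = compose(compose(edge(1, 2), edge(2, 3)), edge(1, 4))  # 3 -> 2 -> 4 -> 6
print(path)                   # Edge(distance=4, toll=9, fuel_saved=21)
print(traverse(path, 11, 0))  # (2, 23)

a, b = edge(1, 2), edge(2, 3)
assert matrix(compose(a, b)) == matmul(matrix(b), matrix(a))
```

We start with 11 gold and no fuel. The composed parameters then leave 2 gold and 23 fuel, which matches the question's example. The final assertion checks that "first, then second" corresponds to the matrix product matrix(second) × matrix(first).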
Now, so far we only have a semigroup. I could take it from here with
data structures, but they’re more complicated when we don’t have an
analog of subtraction (for intuition, compare the problems of finding
the sum of each length-k window in an array with finding the max).
Happily, there is a useful notion of undoing a traversal here (inverse).
We can derive it by solving for Gold, Fuel from Gold', Fuel':
Gold = Gold' + Toll
Fuel = Fuel' - Gold * Distance + FuelSaved
     = Fuel' + Gold' * (-Distance) - (Toll * Distance - FuelSaved),
and reading off the inverse parameters: Distance becomes -Distance, Toll
becomes -Toll, and FuelSaved becomes Toll * Distance - FuelSaved.
I promised a sketch of the data structures, so here we are. Root the
tree anywhere. It suffices to be able to
Given nodes u and v, query the leafmost common ancestor of u and v;
Given a node u, query the parameters to get from u to the root;
Given a node v, query the parameters to get from the root to v;
Update the toll on an edge.
Then to answer a query u, v, we query their leafmost common ancestor w
and return the fuel cost of the composition (u to root) (w to root)⁻¹
(root to w)⁻¹ (root to v) where ⁻¹ means “take the inverse”.
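To make the query formula concrete, here is a Python sketch. It replaces the LCA structure, heavy path decomposition and segment trees with naive O(depth) walks, so it demonstrates only the algebra and not the logarithmic bounds. Several things are assumptions:

- The example tree uses the edge list from the DFS-based answer, which matches the question's walkthrough.
- The tree is rooted at node 1, an arbitrary choice.
- `PARENT`, `up_to_root`, `root_to` and `query` are hypothetical names.

The inverse maps (Distance, Toll, FuelSaved) to (-Distance, -Toll, Toll * Distance - FuelSaved). Composing an edge with its inverse yields the identity.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    distance: int
    toll: int
    fuel_saved: int


IDENTITY = Edge(0, 0, 0)


def compose(first, second):
    # parameters of "traverse `first`, then `second`"
    return Edge(first.distance + second.distance,
                first.toll + second.toll,
                first.fuel_saved + second.fuel_saved + first.toll * second.distance)


def inverse(e):
    # (Distance, Toll, FuelSaved) -> (-Distance, -Toll, Toll * Distance - FuelSaved)
    return Edge(-e.distance, -e.toll, e.toll * e.distance - e.fuel_saved)


# HYPOTHETICAL example tree (from the DFS answer's test input), rooted at node 1:
# child -> (parent, D, T)
PARENT = {2: (1, 2, 1), 3: (2, 1, 2), 4: (2, 2, 3), 5: (4, 2, 2), 6: (4, 1, 4)}


def ancestors(u):
    chain = [u]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]][0])
    return chain


def lca(u, v):
    # naive stand-in for a proper LCA structure
    seen = set(ancestors(u))
    return next(x for x in ancestors(v) if x in seen)


def up_to_root(u):
    # naive O(depth) stand-in for a heavy-path / segment-tree query: u -> root
    acc = IDENTITY
    while u in PARENT:
        p, d, t = PARENT[u]
        acc = compose(acc, Edge(d, t, d * t))
        u = p
    return acc


def root_to(v):
    # root -> v, built by prepending edges while walking up from v
    acc = IDENTITY
    while v in PARENT:
        p, d, t = PARENT[v]
        acc = compose(Edge(d, t, d * t), acc)
        v = p
    return acc


def query(u, v, g):
    w = lca(u, v)
    # (u to root) (w to root)^-1 (root to w)^-1 (root to v)
    path = compose(compose(compose(up_to_root(u), inverse(up_to_root(w))),
                           inverse(root_to(w))),
                   root_to(v))
    start_gold = g + path.toll  # gold needed at u so that g arrives at v
    return start_gold * path.distance - path.fuel_saved  # fuel, starting from 0


print(query(3, 6, 2))  # 23
```

The call query(3, 6, 2) gives 23, as in the question. To bring each query down to polylogarithmic time, you would swap `up_to_root` and `root_to` for segment-tree queries over heavy paths and keep the same `compose` and `inverse`.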
The full-on sledgehammer approach here is to implement dynamic trees,
which will do all of these things in amortized logarithmic time per
operation. But we don’t need dynamic topology updates and can probably
afford an extra log factor, so a set of more easily digestible pieces
would be leafmost common ancestors, heavy path decomposition, and
segment trees (one per path; Fenwick is potentially another option, but
I’m not sure what complications a noncommutative operation might
create).
I said in the comments that Dijkstra's algorithm was necessary, but on reflection a DFS is enough: there is only one path between each pair of vertices, so we always have to go from the starting point to the endpoint along it.
Using a priority queue instead of a stack would only change the order in which the graph is explored; in the worst case it would still visit all the vertices.
Using a queue instead of a stack would turn the algorithm into a breadth-first search, which again only changes the order in which the graph is explored.
Assuming that the number of nodes within a given distance grows exponentially with that distance, the typical case could be improved by running two searches and meeting in the middle, but only by a constant factor.
So I think it is better to go with the simple solution; implementing it in C/C++ should give a program dozens of times faster.
Solution
First, build adjacency lists, making the graph undirected:
from collections import defaultdict

def process_edges(rows):
    # undirected adjacency lists: vertex -> [(neighbor, (D, T)), ...]
    edges = defaultdict(list)
    for u, v, D, T in rows:
        edges[u].append((v, (D, T)))
        edges[v].append((u, (D, T)))
    return edges
It helps to do the search backwards: the amount of gold is fixed at the destination and unknown at the origin, so going backwards we can compute the exact amount of gold and fuel required at each node.
Of course you can remove the print statement I left there
def dfs(edges, a, b, G):
    # search backwards from the destination b, where G gold must arrive
    if a == b:
        return (0, G)
    Q = [((0, G), b)]  # stack of ((fuel, gold), vertex)
    visited = set()
    while Q:
        (Fu, Gu), current_vertex = Q.pop()
        visited.add(current_vertex)
        for neighbor, (D, T) in edges[current_vertex]:
            if neighbor in visited:
                continue  # avoid going backwards
            Gv = Gu + T       # the toll is paid at neighbor, before crossing the edge
            Fv = Fu + Gu * D  # the gold carried across the edge is what is left after the toll
            print(neighbor, (Fv, Gv))
            if neighbor == a:
                return (Fv, Gv)
            Q.append(((Fv, Gv), neighbor))
Running your example:
edges = process_edges([
    [6, 4, 1, 4],
    [5, 4, 2, 2],
    [4, 2, 2, 3],
    [3, 2, 1, 2],
    [1, 2, 2, 1]
])
dfs(edges, 3, 6, 2)
will print:
4 (2, 6)
5 (14, 8)
2 (14, 9)
3 (23, 11)
and return (23, 11). This means that the route from 3 to 6 requires 11 gold at the start and uses 23 fuel, matching the example in the question.

implementing stochastic ACO algorithm

I am trying to implement a stochastic ant colony optimisation algorithm, and I'm having trouble working out how to implement movement choices based on probabilities.
The standard (greedy) version that I have implemented so far works as follows: an ant m at a vertex i on a graph G = (V, E), where E is the set of edges (i, j), chooses the next vertex j based on the following criterion:
j = argmax(<fitness function for j>)
such that j is connected to i
The problem I am having is implementing a stochastic version of this, where the criterion for choosing a new vertex j is now:
P(j) = <fitness function for j>/sum(<fitness function for J>)
where P(j) is the probability of choosing vertex j,
such that j is connected to i,
and J is the set of all vertices connected to i
I understand the mathematics behind it; I am just having trouble working out how I should actually implement it.
If, say, I have 3 vertices connected to i, with probabilities 0.2, 0.3 and 0.5, what is the best way to make the selection? Should I just randomly select a vertex j, then generate a random number r in the range (0,1) and select vertex j if r >= P(j)? Or is there a better way?
Looking at the problem statement, I think you are not trying to visit all the nodes connected to i, but only some of them, based on a probability distribution. Let's take an example:
You have a node i with 5 nodes connected to it, a1...a5, with probabilities p1...p5 such that sum(p_i) = 1. Now, say the precision of the probabilities you consider is 2 decimal places. Also, you don't want to visit all 5 nodes, but only k of them; let's say k = 2 in this example. Since 2 decimal places is your probability precision, add 3 to it to get a finer-grained range for the random number generator. (You can change this 3 to any number you like, depending on performance.) (Since you have not tagged any language, I'll use Java's nextInt() function to generate random numbers.)
Let's give some values:
p1...p5 = {0.17, 0.11, 0.45, 0.03, 0.24}
Now, in a loop from 1 to k, generate a random number in [0, 10^5) (5 = 2 + 3, i.e. precision + 3). If the generated number is from 0 to 16999, go with node a1; from 17000 to 27999, go with a2; from 28000 to 72999, go with a3; and so on. You get the idea.
What you're trying to implement is a weighted random choice based on the probabilities of the solution components, which in ACO terms is the random proportional selection rule. Here is a snippet of this rule's implementation in the Isula Framework:
double value = random.nextDouble();
double total = 0.0; // running sum of the component probabilities
while (componentWithProbabilitiesIterator.hasNext()) {
    Map.Entry<C, Double> componentWithProbability = componentWithProbabilitiesIterator.next();
    Double probability = componentWithProbability.getValue();
    total += probability;
    if (total >= value) {
        nextNode = componentWithProbability.getKey();
        getAnt().visitNode(nextNode);
        return true;
    }
}
You just need to generate a random value between 0 and 1 (stored in value) and start accumulating the probabilities of the components (in the total variable). When the total reaches the threshold defined in value, you have found the component to add to the solution.
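Here is the same cumulative-sum idea as a Python sketch. It assumes the weights are non-negative and not all zero; `roulette_select` is a hypothetical helper name. Instead of a linear scan, it uses binary search over the running totals, which only matters when a vertex has many neighbours. Python's standard library also provides random.choices(population, weights=...) for this job.

```python
import bisect
import itertools
import random


def roulette_select(candidates, weights, rng=random):
    # running totals, e.g. weights [0.2, 0.3, 0.5] -> [0.2, 0.5, 1.0]
    cumulative = list(itertools.accumulate(weights))
    # uniform threshold in [0, total); the weights need not sum exactly to 1
    r = rng.random() * cumulative[-1]
    # first index whose running total exceeds r; zero-weight entries are skipped
    i = bisect.bisect_right(cumulative, r)
    # guard against r rounding up to exactly the total
    return candidates[min(i, len(candidates) - 1)]


rng = random.Random(0)
picks = [roulette_select(["j1", "j2", "j3"], [0.2, 0.3, 0.5], rng) for _ in range(10000)]
print({j: picks.count(j) / len(picks) for j in ("j1", "j2", "j3")})  # close to 0.2 / 0.3 / 0.5
```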

Genetic Algorithm - Best crossover operator for a weights assignment

In your experience, what is the best crossover operator for a weight-assignment problem?
In particular, I have a constraint that forces the sum of all the weights to be 1. Currently, I use uniform crossover and then divide all the parameters by their sum so that they add up to 1. The crossover works, but I'm not sure this preserves the good parts of my solutions and lets the search converge to a better solution.
Do you have any suggestions? I don't mind building a custom operator.
If your initial population is made up of feasible individuals you could try a differential evolution-like approach.
The recombination operator needs three (random) vectors and adds the weighted difference between two population vectors to a third vector:
offspring = A + f (B - C)
You could try a fixed weighting factor f in the [0.6 ; 2.0] range, or experiment with selecting f randomly for each generation or for each difference vector (a technique called dither, which should improve convergence behaviour significantly, especially for noisy objective functions).
This should work quite well since the offspring will automatically be feasible.
Special care should be taken to avoid premature convergence (e.g. some niching algorithm).
EDIT
With uniform crossover you are exploring the entire n-dimensional space, while the above recombination limits individuals to a subspace H (the hyperplane Σi wi = 1, where wi are the weights) of the original search space.
Reading the question I assumed that the sum-of-the-weights was the only constraint. Since there are other constraints, it's not true that the offspring is automatically feasible.
Anyway any feasible solution must be on H:
If A = (a1, a2, ... an), B = (b1, ... bn), C = (c1, ... cn) are feasible:
Σi ai = 1
Σi bi = 1
Σi ci = 1
so
Σi (ai + f (bi - ci)) =
Σi ai + f (Σi bi - Σi ci) =
1 + f (1 - 1) = 1
The offspring is on the H hyperplane.
Now depending on the number / type of additional constraints you could modify the proposed recombination operator or try something based on a penalty function.
EDIT2
You could determine analytically the "valid" range of f, but probably something like this is enough:
f = random(0.6, 2.0);
double trial[] = {f, f/2, f/4, -f, -f/2, -f/4, 0};
i = 0;
do
{
offspring = A + trial[i] * (B - C);
i = i + 1;
} while (unfeasible(offspring));
return offspring;
This is just an idea; I'm not sure how well it works.
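Here is a Python rendering of the idea above. It assumes weights are lists of floats. As a stand-in for the additional constraints, it requires every weight to be non-negative; `is_feasible` is hypothetical, so substitute your real constraints. Because the factor 0 returns a copy of A, the loop always ends with a feasible child as long as A is feasible.

```python
import random


def is_feasible(w):
    # HYPOTHETICAL extra constraint: every weight must be non-negative
    return all(x >= 0 for x in w)


def de_recombine(a, b, c, rng=random):
    # a, b, c: feasible parents whose weights each sum to 1
    f = rng.uniform(0.6, 2.0)  # dither: a fresh f for every offspring
    # same fallback factors as trial[]; factor 0 reproduces a itself
    for factor in (f, f / 2, f / 4, -f, -f / 2, -f / 4, 0.0):
        child = [ai + factor * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        if is_feasible(child):
            return child  # sum(child) stays 1 (up to rounding), since sum(b) == sum(c)
    raise ValueError("parent a is not feasible")


rng = random.Random(1)
child = de_recombine([0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2], rng)
print(child, sum(child))
```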

least square line fitting in 4D space

I have a set of points like:
(x , y , z , t)
(1 , 3 , 6 , 0.5)
(1.5 , 4 , 6.5 , 1)
(3.5 , 7 , 8 , 1.5)
(4 , 7.25 , 9 , 2)
I am looking for the best linear fit to these points, let's say a function like:
f(t) = a * x +b * y +c * z
This is a linear regression problem. The "best fit" depends on the metric you define for being better.
One simple example is the least-squares metric, which aims to minimize the sum of squares (f(x_i, y_i, z_i) - w_i)^2, where w_i is the measured value for sample i.
So, in least squares you are trying to minimize SUM{(a*x_i + b*y_i + c*z_i - w_i)^2 | for each i}. If X (defined below) has full column rank, this function has a single global minimum at:
(a,b,c) = (X^T * X)^-1 * X^T * w
Where:
X is an m x 3 matrix with one row (x_i, y_i, z_i) per sample (m is the number of samples you have)
X^T is the transpose of this matrix
w is the vector of measured results: `(w_1,w_2,...,w_m)`
The * operator represents matrix multiplication
There are other, more complex methods that use other distance metrics; one example is the well-known SVR with a linear kernel.
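Here is a NumPy sketch of the normal-equation formula applied to the question's four points. It assumes the t column plays the role of the measured values w. Rather than forming the inverse explicitly, it solves (X^T X) p = X^T w with np.linalg.solve, which is usually the numerically safer way to evaluate that formula.

```python
import numpy as np

# the samples from the question: columns x, y, z and the measured value t
data = np.array([[1.0, 3.0, 6.0, 0.5],
                 [1.5, 4.0, 6.5, 1.0],
                 [3.5, 7.0, 8.0, 1.5],
                 [4.0, 7.25, 9.0, 2.0]])
X = data[:, :3]  # m x 3: one row (x_i, y_i, z_i) per sample
w = data[:, 3]   # measured values w_i (assumed here to be the t column)

# normal equations (X^T X) (a, b, c) = X^T w, solved without an explicit inverse
coef = np.linalg.solve(X.T @ X, X.T @ w)
a, b, c = coef
print(a, b, c)
```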
It seems that you are looking for the major axis of a point cloud.
You can work this out by finding the eigenvector associated with the largest eigenvalue of the covariance matrix. This could be an opportunity to use the power method (starting the iterations with the point farthest from the centroid, for example).
Can also be addressed by Singular Value Decomposition, preferably using methods that compute the largest values only.
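Here is a NumPy sketch of the power-method route, starting from the point farthest from the centroid as suggested. `major_axis` is a hypothetical name, and the iteration count is an arbitrary choice rather than a tuned value. The fitted line is centroid + s * direction.

```python
import numpy as np


def major_axis(points, iterations=500):
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    centered = P - centroid
    cov = centered.T @ centered / (len(P) - 1)  # sample covariance matrix
    # start from the point farthest from the centroid
    v = centered[np.argmax(np.linalg.norm(centered, axis=1))]
    v = v / np.linalg.norm(v)
    for _ in range(iterations):
        # power iteration: converges to the eigenvector of the largest eigenvalue
        v = cov @ v
        v = v / np.linalg.norm(v)
    return centroid, v


pts = [(1, 3, 6, 0.5), (1.5, 4, 6.5, 1), (3.5, 7, 8, 1.5), (4, 7.25, 9, 2)]
centroid, direction = major_axis(pts)
print(centroid, direction)
```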
If your data set contains outliers, then RANSAC could be a better choice: take two points at random and compute the sum of distances to the line they define. Repeat a number of times and keep the best fit.
Using the squared distances will answer your request for least-squares, but non-squared distances will be more robust.
You have a linear problem.
For example, say the equation is Y = a*x1 + b*x2 + c*x3.
In MATLAB:
B = [x1(:) x2(:) x3(:)] \ Y;
Y_fit = [x1(:) x2(:) x3(:)] * B;
In Python:
import numpy as np
A = np.column_stack([x1, x2, x3])  # one column per variable, one row per sample
B, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)
Y_fit = A @ B

Weights Optimization in matlab

I have to do optimization in supervised learning to get my weights.
I have to learn the values (w1, w2, w3, w4) such that whenever the label of my vector A = [a1 a2 a3 a4] is 1, the sum w1*a1 + w2*a2 + w3*a3 + w4*a4 is greater than 0.5, and when the label is -1, the sum is less than 0.5.
Can somebody tell me how to approach this problem in MATLAB? One way I know is to use evolutionary algorithms: start from random weight vectors and keep changing them to pick the best n.
Is there any other way that this can be approached ?
You can do it using linprog.
Let A be a matrix of size n by 4 consisting of all n training 4-vectors you have. You should also have a vector y with n elements (each either plus or minus 1), representing the label of each training 4-vector.
Using A and y we can write a linear program (look at the doc for the names of the parameters I'm using). Now, you do not have an objective function, so you can simply set f to be f = zeros(4,1);.
The only thing you have is an inequality constraint (< a_i , w > - .5) * y_i >= 0 (where <.,.> is a dot-product between 4-vector a_i and weight vector w).
If my calculations are correct, this constraint can be written as
cmat = bsxfun( @times, A, y );
Overall you get
w = linprog( zeros(4,1), -cmat, -.5*y );
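For readers without MATLAB, here is roughly the same feasibility LP in Python. It assumes SciPy is available and uses `scipy.optimize.linprog`, which follows the same sign convention as MATLAB's linprog (A_ub @ w <= b_ub). Two differences are worth knowing:

- SciPy's default bounds are w >= 0, so they are lifted explicitly here.
- As in the MATLAB version, the constraints are non-strict, so a sample may land exactly on 0.5.

`find_weights` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linprog


def find_weights(A, y, threshold=0.5):
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    cmat = A * y[:, None]  # row i is y_i * a_i, like bsxfun(@times, A, y)
    # y_i * (<a_i, w> - threshold) >= 0  <=>  -cmat_i . w <= -threshold * y_i
    res = linprog(c=np.zeros(A.shape[1]),  # no objective: pure feasibility problem
                  A_ub=-cmat, b_ub=-threshold * y,
                  bounds=[(None, None)] * A.shape[1])  # SciPy defaults to w >= 0
    return res.x if res.success else None


A = np.eye(4)
y = np.array([1, 1, -1, -1])
w = find_weights(A, y)
print(w)
```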
