How does one go about crossing over two parents when the children must have a particular ordering?
For example, when applying genetic algorithms to the Travelling Salesman Problem on a fixed graph of vertices and edges, you must contend with the fact that not every vertex has an edge to every other vertex. This makes crossover much more difficult, because unlike the classical TSP on a complete graph, a crossover must be performed at a point that produces a legal path. The alternative is to just cross over anyway and reject illegal paths, but that risks great computational expense and few to no legal offspring.
I've read about permutation crossover but I'm not entirely sure how this solves the issue. Can someone point me in the right direction or advise?
Ordering should, as far as possible, not be a constraint in genetic programming. Maybe you should consider a different encoding for your solutions.
For example, in your TSP, consider the codon A->B.
Instead of the meaning 'take the edge from A to B', you could consider 'take the shortest path from A to B'. This way, your solutions are always feasible. You just have to pre-compute a shortest-path matrix as a pre-processing step.
Now, this alone does not guarantee that candidates will be feasible solutions after your crossover. Your crossover should be tuned so that your solutions stay feasible. For example, for the TSP, consider the sequences:
1 : A B C D E F G H
2 : A D E G C B F H
Choose a pivot randomly (E in our example). This leads to the following sequences to be completed :
1' : A B C D E . . .
2' : A D E . . . . .
All the vertices have to be visited in order to have a valid solution. In 1', F, G and H have to be visited. We order them as they are in sequence 2. In 2', G, C, B, F and H are re-ordered as in 1 :
1' : A B C D E G F H
2' : A D E B C F G H
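The completion step above is essentially a one-point permutation (order) crossover. A minimal Python sketch (function and variable names are mine, not from any library):

```python
import random

def order_crossover(parent1, parent2, pivot=None):
    """Keep parent1 up to the pivot, then append the remaining
    vertices in the order they appear in parent2."""
    if pivot is None:
        pivot = random.randrange(1, len(parent1))
    prefix = parent1[:pivot]
    kept = set(prefix)
    # Remaining vertices, ordered as in the other parent.
    suffix = [v for v in parent2 if v not in kept]
    return prefix + suffix

p1 = list("ABCDEFGH")
p2 = list("ADEGCBFH")
print(order_crossover(p1, p2, pivot=5))  # A B C D E + G F H (ordered as in p2)
print(order_crossover(p2, p1, pivot=3))  # A D E + B C F G H (ordered as in p1)
```

Every child produced this way visits each vertex exactly once, so with the shortest-path interpretation of codons it is always feasible.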
Hope this helps.
Remarks: c' = log_17(c), i.e. c' is the logarithm of c with base 17.
MST means minimum spanning tree.
It's easy to prove the conclusion is correct when we use a linear function to transform the cost of every edge.
But the log function is not linear, and I could not understand why the conclusion is still correct.
Supplementary notes:
I did not consider specific algorithms, such as the greedy algorithm. I simply consider the relationship between the sum of the weights of the two trees after transformation.
Numerically, if (a + b) > (c + d), it may not hold that (log a + log b) > (log c + log d).
If one spanning tree of G has two edges a and b, another spanning tree of G has c and d, and a + b < c + d so the first tree is an MST, then in the transformed graph G' the sum of the weights of the second tree's edges might be smaller.
Because of this, I wanted to construct a counterexample based on "if (a + b) > (c + d), (log a + log b) may not be > (log c + log d)", but I failed.
One way to characterize when a spanning tree T is a minimum spanning tree is that, for every edge e not in T, the cycle formed by e and edges of T (the fundamental cycle of e with respect to T) has no edge more expensive than e. Using this characterization, I hope you see how to prove that transforming the costs with any increasing function preserves minimum spanning trees.
There's a one line proof that this condition is necessary. If the fundamental cycle contained a more expensive edge, we could replace it with e and get a spanning tree that costs less than T.
It's less obvious that this condition is sufficient, since at first glance it looks like we're trying to prove global optimality from a local optimality condition. To prove this statement, let T be a spanning tree that satisfies the condition, let T' be a minimum spanning tree, and let H be the graph whose edges are the union of the edges of T and T'. Run Kruskal's algorithm on H, breaking ties by favoring edges in T over edges not in T. Let T'' be the resulting minimum spanning tree of H. Since T' is a spanning tree of H, the cost of T'' is not greater than the cost of T', hence T'' is a minimum spanning tree of G as well as of H.
Suppose to the contrary that T'' ≠ T. Then there exists an edge in T but not in T''. Let e be the first such edge considered by Kruskal's algorithm. At the time e was considered, it formed a cycle C with the edges that had already been selected (all of which belong to T''). Since T is acyclic, C \ T is nonempty. By the tie-breaking criterion, every edge in C \ T costs strictly less than e. Observing that some edge e' in C \ T must have one endpoint in each of the two connected components of T \ {e}, we infer that the fundamental cycle of e' with respect to T contains e, which violates the local optimality condition. In conclusion, T = T'', hence T is a minimum spanning tree of G.
If you want a deeper dive, this logic gets abstracted out in the theory of matroids.
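If you want a quick empirical check rather than the proof, this sketch (the random graph, weights and helper names are made up) runs Kruskal's algorithm on a set of weights and again on their log-transformed versions, and verifies the chosen edge sets agree:

```python
import math
import random

def kruskal(n, edges):
    """edges: (weight, u, v) triples; returns the chosen edges as (u, v) pairs."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    chosen = set()
    for _, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.add((u, v))
    return chosen

random.seed(1)
n = 8
edges = [(random.uniform(1.0, 100.0), u, v)
         for u in range(n) for v in range(u + 1, n)]
# Any increasing transform (here log base 17) leaves the sorted order,
# and therefore Kruskal's choices, unchanged:
logged = [(math.log(w, 17), u, v) for w, u, v in edges]
assert kruskal(n, edges) == kruskal(n, logged)
```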
Well, it's pretty easy to understand... let's see if I can break it down for you:
c` = log_17(c) // here 17 is the base
log is not a linear function... but we can say that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
In words: consider a base b such that b > 1; then log_b(x) is greater than log_b(y) whenever x > y.
So, if we apply this rule to the edge costs of the MST of G, we see that the edges that were selected for the MST of G still produce the cheapest possible set of edges for constructing the MST of G' when c` = log_17(c).
UPDATE: Since you're having trouble understanding the proof, I'm elaborating a bit:
I guess you know that MST construction is greedy. We're going to use Kruskal's algorithm to show why the claim is correct. (In case you don't know how Kruskal's algorithm works, you can read about it anywhere; you'll find plenty of resources.) Now, let me write some steps of Kruskal's edge selection for the MST of G:
// the following edges are sorted by cost..i.e. c_0 <= c_1 <= c_2 ....
c_0: A, F // here, edge c_0 connects A, F, we've to take the edge in MST
c_1: A, B // it is also taken to construct MST
c_2: B, R // it is also taken to construct MST
c_3: A, R // we won't take it, because (A, R) is already connected through A -> B -> R
c_4: F, X // it is also taken to construct MST
...
...
so on...
Now, when constructing the MST of G', we have to select edges whose costs are of the form c` = log_17(c) // where 17 is the base
If we convert the costs using log base 17, then c_0 becomes c_0`, c_1 becomes c_1`, and so on...
But we know that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
So, we may say that:
log_17(c_0) <= log_17(c_1), because c_0 <= c_1
and in general,
log_17(c_i) <= log_17(c_j), where i <= j
And now, we may say:
c_0` <= c_1` <= c_2` <= c_3` <= ....
So, the edge selection process to construct MST of G' would be:
// the following edges are sorted by cost..i.e. c_0` <= c_1` <= c_2` ....
c_0`: A, F // here, edge c_0` connects A, F, we've to take the edge in MST
c_1`: A, B // it is also taken to construct MST
c_2`: B, R // it is also taken to construct MST
c_3`: A, R // we won't take it, because (A, R) is already connected through A -> B -> R
c_4`: F, X // it is also taken to construct MST
...
...
so on...
This is the same as the MST of G, which proves the theorem.
I hope you get it... if not, ask me in the comments what is not clear to you.
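The key step — that sorting by c and sorting by log_17(c) give the same edge order — is easy to check numerically (the sample costs below are made up):

```python
import math

costs = [4.0, 4.5, 7.0, 9.0, 12.5]           # c_0 <= c_1 <= ... (made-up values)
logged = [math.log(c, 17) for c in costs]    # c_0`, c_1`, ...
assert logged == sorted(logged)              # log_17 preserves the sort order
```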
First of all, this is an assignment, so I am not looking for direct answers but rather for the complexity of the best solution as you see it.
This is the known problem of finding the shortest path between 2 points in a matrix (Start and End) with obstacles in the way. The acceptable moves are up, down, left and right. Let's say that while moving I carry something, and the cost of each movement is then 2. There are points in the matrix (let's name them B points) where I can leave this item at one B point and pick it up at another B point. The cost of dropping the item at a B point is 1, and the cost of picking it up at a B point is 1 again. Whenever I move without the item, my cost of moving is 1.
What I thought of as a solution is to transform the matrix into a tree and apply BFS. However, that only works without the B points.
Whenever I take the B points into account, the complexity reaches a worst case of N^2.
Here is an example :
S - - -
- - - -
B - - B
- - O E
S = Start , E = End , B = B point to drop sth, O = obstacle
So I start at S, move down twice to the B point (2*2 = 4 points), leave the item at the B point (1 point), move right three times (3*1 = 3 points), pick it up at the other B point (1 point), and move down (2 points) = total of 11 points.
What I thought was to build a graph with a node for every B point; however, this would create a very dense cyclic graph of almost (V-1)*(V-1) edges, which puts the algorithm in N^2 bounds just to create the graph.
That is the worst case scenario as above :
S b b b
b b b b
b b b b
b b b E
Another option I thought of was to first calculate the shortest path without B points.
Then iterate, where in each iteration I:
first run BFS between S and the closest B;
then run BFS between E and the closest B;
then see if there is a path between the B closest to S and the B closest to E.
If there is, I check whether that path is shorter than the regular shortest path with obstacles.
If it is longer, then it is not the shortest path (there is no greedy test).
If there is no path between the 2 B points, I try the second closest B to S and try again.
If there is still no path, I try the second closest B to E with the closest B to S.
However, I am not able to calculate the worst-case complexity of this approach, plus there is no greedy test to validate it.
Any help with calculating the complexity, or even pointing out the best-complexity solution (not the solution itself, just its complexity), would be greatly appreciated.
Your matrix is a representation of a graph. Without the cheat paths it is quite easy to implement a nice BFS. Implementing the cheat paths is not a big deal either: just add the same matrix as another 'layer' on top of the first one. The bottom layer is 'carrying', the top layer is 'not carrying'. You can move to the other layer only at B points, for the given cost. This is the same BFS with a third dimension.
You have n^2 nodes and O(n^2) edges per layer, plus at most n^2 additional edges connecting the layers. That's O(n^2) in total.
You can just build a new graph with nodes labeled (N, w), where N is a node in the original graph (so a position in your matrix) and w = 0 or 1 indicates whether you're carrying the weight. It's then quite easy to add all the possible edges to this graph.
This new graph is of size 2*V, not V^2 (and the number of edges is around 4*V + the number of B points).
Then you can use a shortest path algorithm, for instance Dijkstra's algorithm: complexity O(E + V log(V)) which is O(V log(V)) in your case.
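Both answers amount to a search over (cell, carrying) states; since moves cost 1 or 2, a weighted search such as Dijkstra handles the mixed costs safely. A minimal sketch using the cost model from the question (the grid encoding and function name are mine):

```python
import heapq

def shortest_cost(grid, start, end):
    """Dijkstra over (row, col, carrying) states.
    Moving costs 2 while carrying, 1 otherwise; dropping or picking
    the item up at a 'B' cell costs 1. We start and end while carrying."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    pq = [(0, start[0], start[1], 1)]  # (cost, row, col, carrying)
    while pq:
        d, r, c, carry = heapq.heappop(pq)
        if (r, c, carry) in seen:
            continue
        seen.add((r, c, carry))
        if (r, c) == end and carry == 1:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != 'O':
                heapq.heappush(pq, (d + (2 if carry else 1), nr, nc, carry))
        if grid[r][c] == 'B':  # switch layers: drop (1) or pick up (1)
            heapq.heappush(pq, (d + 1, r, c, 1 - carry))
    return None

grid = ["S---",
        "----",
        "B--B",
        "--OE"]
print(shortest_cost(grid, (0, 0), (3, 3)))  # 11: drop at one B, walk light, pick up at the other
```

With O(n^2) states and O(n^2) edges, this runs in O(n^2 log n) with a binary heap.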
Is there a well-known algorithm, that given a collection of sets, would merge together every two sets that have at least one common element? So for example, given the following input:
A B C
B D
A E
F G H
I J
K F
L M N
E O
It would produce:
A B C D E O
F G H K
I J
L M N
I already have a working implementation, but it seems to be common enough that there has to be a name for what I am doing.
You can model this as a simple graph problem: Introduce a node for every distinct element. Introduce a node for every set. Connect every set to the elements it contains. You get an (undirected) bipartite graph, in which the connected components are the solution of your problem. You can use depth-first search to find the CCs.
The runtime should be linear (with hash tables, so only expected runtime unless your numbers are bounded).
I don't think it deserves a special name, it's just an application of well-known concepts.
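The grouping is easy to sketch; here is a version using union-find over the elements, which produces the same result as DFS on the bipartite graph (function and variable names are mine):

```python
def merge_sets(sets):
    """Merge every two sets that share an element.
    Union-find over elements; returns the merged groups."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for s in sets:
        items = list(s)
        anchor = find(items[0])  # register even singleton sets
        for other in items[1:]:
            root = find(other)
            if root != anchor:
                parent[root] = anchor
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

data = [set("ABC"), set("BD"), set("AE"), set("FGH"),
        set("IJ"), set("KF"), set("LMN"), set("EO")]
for group in merge_sets(data):
    print(sorted(group))  # the four merged groups from the example
```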
Given a graph G, why is the following greedy algorithm not guaranteed to find a maximum independent set of G:
Greedy(G):
S = {}
While G is not empty:
Let v be a node with minimum degree in G
S = union(S, {v})
remove v and its neighbors from G
return S
Can someone show me a simple example of a graph where this algorithm fails?
I'm not sure this is the simplest example, but here is one that fails: http://imgur.com/QK3DC
For the first step, you can choose B, C, D, or F since they all have degree 2. Suppose we remove B and its neighbors. That leaves F and D with degree 1 and E with degree 2. During the next two steps, we remove F and D and end up with a set size of 3, which is the maximum.
Instead, suppose on the first step we removed C and its neighbors. This leaves us with F, A and E, each with degree 2 (they form a triangle). Whichever one we take next, its removal empties the graph, so our solution contains only 2 nodes, which, as we have seen, isn't the maximum.
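Since the linked image may not be available: a graph consistent with the description above is the 6-cycle A-B-C-D-E-F plus the chord A-E (so B, C, D, F have degree 2 and A, E have degree 3). This sketch (my own reconstruction, not necessarily the pictured graph) runs the greedy algorithm with two different tie-breaks and compares against brute force:

```python
from itertools import combinations

# My reconstruction of the graph: a 6-cycle A-B-C-D-E-F plus chord A-E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "A"), ("A", "E")]
nodes = {v for e in edges for v in e}
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def greedy(adj, prefer):
    """Min-degree greedy; `prefer` breaks ties among min-degree nodes."""
    adj = {v: set(nb) for v, nb in adj.items()}
    chosen = set()
    while adj:
        v = min(adj, key=lambda x: (len(adj[x]), prefer(x)))
        chosen.add(v)
        removed = adj[v] | {v}
        adj = {u: nb - removed for u, nb in adj.items() if u not in removed}
    return chosen

def max_independent_set(nodes, edges):
    """Brute force: try subsets from largest to smallest."""
    for r in range(len(nodes), 0, -1):
        for cand in combinations(nodes, r):
            cset = set(cand)
            if all(not (u in cset and v in cset) for u, v in edges):
                return cset
    return set()

print(len(greedy(adj, prefer=lambda x: x != "B")))  # 3: picking B first works out
print(len(greedy(adj, prefer=lambda x: x != "C")))  # 2: picking C first gets stuck
print(len(max_independent_set(nodes, edges)))       # 3: the true maximum
```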
Given an undirected, positively weighted graph G, some edges of G have unknown weight. For instance,
where edge(B, C) has unknown weight.
Traversing from A to B costs you 7.
We are allowed to discover the unknown weight e = weight(B, C) by traversing from B to C or vice versa, which costs e; the weight then becomes known. Walking from A to C through B costs e + 7 in total.
My question is: how cheaply can we learn all the unknown weights from a given starting point? That is, traverse all of the unknown-weight edges from a starting point (e.g. A) at as small a cost as possible.
The case where there is only one unknown weight is simple: first find the shortest path from the starting point to an endpoint of the desired edge, then traverse the unknown-weight edge. However, I don't know how to solve it when the number of unknown-weight edges grows beyond 1.
Can anyone figure out how to do this?
I can't offer a complete solution, but it looks related to the travelling salesman problem, where the unknown edges play the role of the nodes to be visited.
But I think you can't find an optimal solution a priori. Consider this example:
a-b = 1
b-c = ?
b-d = ?
a-d = 10
If b-c has low weight (say 1), the cheapest walk starting from a is a-b-c-b-d, which traverses b-c twice. If b-c has high weight (say 100), the cheapest walk becomes a-d-b-c, since taking a-d is cheaper than traversing b-c twice.
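The comparison can be made concrete (a quick sketch; the helper names are mine):

```python
def walk_cost(weights, walk):
    """Sum edge weights along a walk given as a string of node names."""
    return sum(weights[frozenset(step)] for step in zip(walk, walk[1:]))

def plan_costs(bc, bd=1):
    """Cost of the two candidate walks for a given weight of b-c."""
    w = {frozenset("ab"): 1, frozenset("ad"): 10,
         frozenset("bc"): bc, frozenset("bd"): bd}
    return walk_cost(w, "abcbd"), walk_cost(w, "adbc")

print(plan_costs(1))    # (4, 12): a-b-c-b-d wins when b-c is cheap
print(plan_costs(100))  # (202, 111): a-d-b-c wins when b-c is expensive
```

Since which plan is optimal depends on the very weight we are trying to learn, no fixed route can be optimal for all instances.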
You can create a second graph, G', which is the same but without the "unknown edges" (1). Then you can run an all-to-all shortest path algorithm and use its output to fill in the blanks.
The Floyd-Warshall algorithm gives an O(n^3) all-to-all shortest path.
(1) Formally: G'=(V,E',w') where:
E' = { e | e is in E and w(e) != ? }
w'(e) = w(e) if w(e) != ?
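A sketch of this construction on the four-node example from the other answer (the 0-3 node numbering is mine): drop the unknown edges, then run Floyd-Warshall on what remains:

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest paths over the known (undirected) edges only."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# a=0, b=1, c=2, d=3; b-c and b-d have unknown weight, so E' omits them.
dist = floyd_warshall(4, [(0, 1, 1), (0, 3, 10)])
print(dist[1][3])  # 11: without the unknown edges, b reaches d only via a
```

Unreachable pairs in G' keep distance infinity, which correctly signals that they can only be connected through an unknown edge.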