How to perform crossover in a 2-dimensional array - genetic algorithm

I have the following two chromosomes which are represented as a 2D array.
// First chromosome
[
[ 12 45 23 ]
[ 34 01 89 ]
[ 33 90 82 ]
]
// Second chromosome
[
[00 45 89 ]
[00 00 34 ]
]
The constraint on the chromosome is that each array in the chromosome array must remain together. For example, in the first chromosome [ 12 45 23 ] must remain together. With this in mind, I believe the way to perform crossover with the above chromosome structure is to randomly select a horizontal crossover point, such as the following:
// First produced off-spring
[
[ 12 45 23 ] // First chromosome
[ 00 00 34 ] // Second chromosome
]
// Second produced off-spring
[
[ 00 45 89 ] // Second chromosome
[ 34 01 89 ] // First chromosome
[ 33 90 82 ] // First chromosome
]
Is this the correct way to perform crossover on a 2D chromosome array whose rows must remain intact? If it is, does this method have a specific name? Or would this come under One-point crossover?

does this method have a specific name? Or would this come under One-point crossover?
In various papers about variable length genetic algorithms it's called one point crossover.
For variable length chromosomes one point crossover is often proposed in a more general way: you can select a distinct crossover point for each chromosome. E.g.
C1 = [ A1, A2, A3, A4, A5, A6]
C2 = [ B1, B2, B3, B4]
Choosing crossover point 1 for C1 and 3 for C2 you get:
C1 = [ A1 | A2, A3, A4, A5, A6]
C2 = [ B1, B2, B3 | B4]
C1' = [A1, B4]
C2' = [B1, B2, B3, A2, A3, A4, A5, A6]
This allows the chromosome length to start growing. Depending on the specific problem it could be a requirement or just bloating (in both cases you may need to account for that in the fitness function).
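The cut-and-splice step above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `one_point_crossover` and the optional `cut_a` / `cut_b` parameters are assumptions made for the example, and each parent is assumed to hold at least two rows so that a cut point exists. Rows are copied whole, so they always stay intact.

```python
import random

def one_point_crossover(parent_a, parent_b, cut_a=None, cut_b=None):
    """One-point crossover on chromosomes stored as lists of rows.

    A separate cut point is used for each parent, so the children may
    differ in length from the parents. Rows are never split.
    """
    # Pick a random cut between rows unless the caller fixes one (hypothetical knobs).
    if cut_a is None:
        cut_a = random.randint(1, len(parent_a) - 1)
    if cut_b is None:
        cut_b = random.randint(1, len(parent_b) - 1)
    # Head of one parent followed by the tail of the other; rows are copied.
    child_1 = [list(row) for row in parent_a[:cut_a] + parent_b[cut_b:]]
    child_2 = [list(row) for row in parent_b[:cut_b] + parent_a[cut_a:]]
    return child_1, child_2

first = [[12, 45, 23], [34, 1, 89], [33, 90, 82]]
second = [[0, 45, 89], [0, 0, 34]]
print(one_point_crossover(first, second, cut_a=1, cut_b=1))
```

With both cut points fixed at 1, this reproduces the two offspring shown in the question.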
Is this the correct way to perform crossover on a 2D chromosome array whose rows must remain intact?
It's a simple method (so a good one). Uniform crossover is another simple approach.
Synapsing Variable-Length Crossover: Meaningful Crossover for Variable-Length Genomes (Benjamin Hutt and Kevin Warwick, IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, February 2007) describes other interesting (more complex) possibilities.
The best crossover is very problem specific.

Related

Need help for finding optimal path which visits multiple sequences of nodes

Summary
Recently I have had a path-finding puzzle that has some complex constraints (currently, I don't have any solution for this one)
A 2D matrix represented the graph. The length of a path is the number of traversed cells.
One or more number sequences are to be found inside the matrix. Each sequence is scored with a value.
Maximum length of the path in the graph. The number of picked cells must not exceed this value.
At any given moment, you can only choose cells in a specific column or row.
On each turn, you need to switch between column and row and stay on the same line as the last cell you picked. You have to move at right angles. (The direction is like the Snake game).
Always start with picking the first cell from the top row, then go vertically down to pick the second cell, and then continue switching between column and row as usual.
You can't choose the same cell twice. The resulting path must not contain duplicated cells.
For example:
The task is to find the shortest possible path in the graph that contains one or more sequences with the highest total score, where the path's length does not exceed the provided maximum length.
The picture below demonstrates the solved puzzle with the resulting path marked in red:
Here, we have a path 3A-10-9B. This path contains the given sequence 3A-10-9B, which earns 10pts. More complex graphs typically have longer paths containing various sequences at once.
More complex examples
Multiple Sequences
You can complete sequences in any order. The order in which the sequences are listed doesn't matter.
Wasted Moves
Sometimes we are forced to waste moves and choose different cells that don't belong to any sequence. Here are the rules:
Able to waste 1 or 2 moves before the first sequence.
Able to waste 1 or 2 moves between any neighboring sequences.
However, you cannot break sequences and waste moves in the middle of them.
Here, we must waste one move before the sequence 3A-9B and two moves between sequences 3A-9B and 72-D4. Also, notice how red lines between 3A and 9B as well as between 72 and D4 "cross" previously selected cells D4 and 9B, respectively. You can pick different cells from the same row or column multiple times.
Optimal Sequences
Sometimes, it is not possible to have a path that contains all of the provided sequences. In this case, choose the path that achieves the highest score.
In the above example, we can complete either 9B-3A-72-D4 or 72-D4-3A but not both due to the maximum path length of 5 cells. We have chosen the sequence 9B-3A-72-D4 since it grants more score points than 72-D4-3A.
Unsolvable solution
The first sequence 3A-D4 can't be completed since the code matrix doesn't contain code D4 at all. The second sequence, 72-10, can't be completed for another reason: codes 72 and 10 aren't located in the same row or column anywhere in the matrix and, therefore, can't form a sequence.
Performance advice
One brute force way is to generate all possible paths in the code matrix, loop through them and choose the best one. This is the easiest but also the slowest approach. Solving larger matrices with larger maximum length of path might take dozens of minutes, if not hours.
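For reference, the brute-force approach described above might look like the following Python sketch. It is only an illustration under my reading of the rules: the name `best_path`, the `(sequence, points)` input format, and the tie-break (highest score first, then shortest path) are assumptions, and a sequence counts only if its values appear consecutively along the path.

```python
def best_path(matrix, sequences, max_len):
    """Exhaustive search: start in the top row, then alternate column/row moves."""
    h, w = len(matrix), len(matrix[0])
    best = (0, 0, [])  # (score, -length, path); longer paths must score more

    def score(values):
        total = 0
        for seq, pts in sequences:
            seq, k = list(seq), len(seq)
            # A sequence scores once if it appears as a contiguous run.
            if any(values[i:i + k] == seq for i in range(len(values) - k + 1)):
                total += pts
        return total

    def extend(path, vertical):
        nonlocal best
        values = [matrix[r][c] for r, c in path]
        candidate = (score(values), -len(path), list(path))
        if candidate[:2] > best[:2]:
            best = candidate
        if len(path) == max_len:
            return
        r, c = path[-1]
        # Next cell must lie in the current column (vertical) or row (horizontal).
        options = [(r2, c) for r2 in range(h)] if vertical else [(r, c2) for c2 in range(w)]
        for cell in options:
            if cell not in path:          # no cell may be picked twice
                path.append(cell)
                extend(path, not vertical)
                path.pop()

    for c in range(w):                    # the first cell comes from the top row
        extend([(0, c)], True)
    return best[0], best[2]

grid = [row.split() for row in ["3A 3A 10 9B", "9B 72 3A 10", "10 3A 3A 3A", "3A 10 3A 9B"]]
print(best_path(grid, [(["3A", "10", "9B"], 10)], 4))
```

The running time grows exponentially with the maximum path length, which is exactly why this approach is too slow for the larger puzzles.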
Try to implement a faster algorithm that doesn’t iterate through all possible paths and can solve puzzles with the following parameters in less than 10 seconds:
Matrix size: 10x10
Number of sequences: 5
Average length of sequences: 4
Maximum path length: 12
At least one solution exists
For example:
Matrix:
41,0f,32,18,29,4b,55,3f,10,3a,
19,4f,57,43,3a,25,19,1e,5e,42,
13,5a,54,3c,1b,32,29,1c,15,30,
49,45,22,2e,25,51,2f,21,4c,37,
1a,5e,49,12,55,1e,49,19,43,2d,
34,26,53,48,49,60,32,3c,50,10,
0f,1e,30,3d,64,37,5b,5e,22,61,
4e,4f,15,5a,13,56,44,22,40,26,
43,2c,17,2b,1f,25,43,60,50,1f,
3c,2b,54,46,42,4d,32,46,30,24,
Sequences:
30, 26, 44, 32, 3c - 25pts
5a, 3c, 12, 1e, 4d - 10pts
1e, 5a, 12 - 10pts
4d, 1e - 5pts
32, 51, 2f, 49, 55, 42 - 30pts
Optimal solution
3f, 1c, 30, 26, 44, 32, 3c, 22, 5a, 12, 1e, 4d
Which contains
30, 26, 44, 32, 3c
5a, 12, 1e
1e, 4d
Conclusion
I am looking for any advice on this puzzle since I have no idea what keywords to look for. Pseudo-code or hints would be helpful, and I would appreciate them. What has come to my mind is just Dijkstra:
For each sequence, since the order doesn't matter, I have to find all possible paths for every permutation, then find the highest-scoring path that contains the other input sequences
After that, choose the best of the best.
In this case, I doubt the performance will be the issue.

First step is to find if a required sequence exists.
- SET found FALSE
- LOOP C1 over cells in first row
    - CLEAR foundSequence
    - ADD C1 to foundSequence
    - LOOP C2 over cells in column containing C1
        - IF C2 value == first value in sequence
            - ADD C2 to foundSequence
            - SET found TRUE
            - break from LOOP C2
    - IF found
        - SET direction VERT
        - LOOP V over remaining values in sequence
            - TOGGLE direction
            - SET found FALSE
            - LOOP C2 over cells in same column or row ( depending on direction ) containing last cell in foundSequence
                - IF C2 value == V
                    - ADD C2 to foundSequence
                    - SET found TRUE
                    - break from LOOP C2
            - IF ! found
                break out of LOOP V
        - IF foundSequence == required sequence
            - RETURN foundSequence
RETURN failed
Note: this doesn't find sequences that are feasible with "wasted moves". I would implement this first and get it working. Then, using the same ideas, it can be extended to allow wasted moves.
You have not specified an input format! I suggest a space-delimited text file with lines beginning with 'm' containing matrix values and lines beginning with 's' containing sequences, like this:
m 3A 3A 10 9B
m 9B 72 3A 10
m 10 3A 3A 3A
m 3A 10 3A 9B
s 3A 10 9B
I have implemented the sequence finder in C++
// pA (the matrix) and vSequence (the required sequences) are defined
// elsewhere in the application.
std::vector<int> findSequence()
{
    int w, h;
    pA->size(w, h);
    std::vector<int> foundSequence;
    bool found = false;
    bool vert = false;

    // loop over cells in first row
    for (int c = 0; c < w; c++)
    {
        foundSequence.clear();
        found = false;
        vert = false; // every new start makes its first move vertically
        if (pA->cell(c, 0)->value == vSequence[0][0])
        {
            foundSequence.push_back(pA->cell(c, 0)->ID());
            found = true;
        }
        while (found)
        {
            // found possible starting cell
            // toggle search direction
            vert = (!vert);

            // start from last cell found
            auto pmCell = pA->cell(foundSequence.back());
            int col, row; // named to avoid shadowing the loop variable c
            pA->coords(col, row, pmCell);

            // look for next value in required sequence
            std::string nextValue = vSequence[0][foundSequence.size()];
            found = false;
            if (vert)
            {
                // loop over cells in column (rows 1 .. h-1)
                for (int r2 = 1; r2 < h; r2++)
                {
                    if (pA->cell(col, r2)->value == nextValue)
                    {
                        foundSequence.push_back(pA->cell(col, r2)->ID());
                        found = true;
                        break;
                    }
                }
            }
            else
            {
                // loop over cells in row (columns 0 .. w-1)
                for (int c2 = 0; c2 < w; c2++)
                {
                    if (pA->cell(c2, row)->value == nextValue)
                    {
                        foundSequence.push_back(pA->cell(c2, row)->ID());
                        found = true;
                        break;
                    }
                }
            }
            if (!found)
            {
                // dead end - try starting from next cell in first row
                break;
            }
            if (foundSequence.size() == vSequence[0].size())
            {
                // success!!!
                return foundSequence;
            }
        }
    }
    std::cout << "Cannot find sequence\n";
    exit(1);
}
This outputs:
3A 3A 10 9B
9B 72 3A 10
10 3A 3A 3A
3A 10 3A 9B
row 0 col 1 3A
row 3 col 1 10
row 3 col 3 9B
You can check out the code for the complete application at https://github.com/JamesBremner/stackoverflow75410318
I have added the ability to find sequences that start elsewhere than the first row ( i.e. with "wasted moves" ). You can see the code in the github repo.
Here are the results of a timing profile run on a 10 by 10 matrix. Searching for each of the 5 sequences takes about 0.6 milliseconds on average (about 3 milliseconds in total)
Searching
41 0f 32 18 29 4b 55 3f 10 3a
19 4f 57 43 3a 25 19 1e 5e 42
13 5a 54 3c 1b 32 29 1c 15 30
49 45 22 2e 25 51 2f 21 4c 37
1a 5e 49 12 55 1e 49 19 43 2d
34 26 53 48 49 60 32 3c 50 10
0f 1e 30 3d 64 37 5b 5e 22 61
4e 4f 15 5a 13 56 44 22 40 26
43 2c 17 2b 1f 25 43 60 50 1f
3c 2b 54 46 42 4d 32 46 30 24
for sequence 4d 1e
Cannot find sequence starting in 1st row, using wasted moves
row 9 col 5 4d
row 4 col 5 1e
for sequence 30 26 44 32 3c
Cannot find sequence starting in 1st row, using wasted moves
Cannot find sequence
for sequence 5a 3c 12 1e 4d
Cannot find sequence starting in 1st row, using wasted moves
row 2 col 1 5a
row 2 col 3 3c
row 4 col 3 12
row 4 col 5 1e
row 9 col 5 4d
for sequence 1e 5a 12
Cannot find sequence starting in 1st row, using wasted moves
row 6 col 1 1e
row 4 col 5 1e
row 4 col 3 12
for sequence 32 51 2f 49 55 42
Cannot find sequence starting in 1st row, using wasted moves
row 2 col 5 32
row 3 col 5 51
row 3 col 6 2f
row 4 col 6 49
row 4 col 4 55
row 9 col 4 42
raven::set::cRunWatch code timing profile
Calls Mean (secs) Total Scope
5 0.00059034 0.0029517 findSequence

Avoid accuracy problems while computing the permanent using the Ryser formula

Task
I want to calculate the permanent P of a NxN matrix for N up to 100. I can make use of the fact that the matrix features only M=4 (or slightly more) different rows and cols. The matrix might look like
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
... | r1 identical rows
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
...
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
...
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
...
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
---------
c1 identical cols
and c and r are the multiplicities of the cols and rows. All values in the matrix lie between 0 and 1 and are encoded as double-precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single and group of cols and the products of the row sums of the remaining matrix are added (even number of cols deleted) and subtracted (odd number of cols deleted).
The task can be solved relatively efficiently if one makes use of the identical cols (for example, the result S1 will pop up exactly c1 times).
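To make the grouped sum concrete: if k_j columns of class j are kept (and c_j - k_j deleted), that choice occurs binomial(c_j, k_j) times, so the whole Ryser sum runs over only (c1+1)*...*(cM+1) terms. The Python sketch below evaluates that sum exactly with `fractions.Fraction`. It is meant only as a small reference for checking a floating-point implementation on small inputs; exact rationals are precisely the kind of variable-precision arithmetic the question hopes to avoid, and the function name `permanent_grouped` and its argument layout are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb

def permanent_grouped(values, row_mult, col_mult):
    """Permanent of a matrix with repeated rows/cols via the Ryser formula.

    values[i][j] is the entry shared by row class i and column class j;
    row_mult[i] and col_mult[j] are the multiplicities r_i and c_j.
    """
    n = sum(col_mult)
    assert n == sum(row_mult), "matrix must be square"
    total = Fraction(0)
    # k[j] = number of columns of class j that are kept (the rest are deleted)
    for k in product(*(range(c + 1) for c in col_mult)):
        weight = 1
        for c, kj in zip(col_mult, k):
            weight *= comb(c, kj)          # identical column subsets counted once
        term = Fraction(weight)
        for row, r in zip(values, row_mult):
            row_sum = sum(kj * Fraction(a) for kj, a in zip(k, row))
            term *= row_sum ** r           # r identical rows give the same sum
        deleted = n - sum(k)
        total += -term if deleted % 2 else term   # even deleted: add, odd: subtract
    return total

# 3x3 example: rows [1,2,2], [1,2,2], [3,4,4]
print(permanent_grouped([[1, 2], [3, 4]], [2, 1], [1, 2]))  # 56
```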
Problem
Even if the final result is small, the values of the intermediate results S0, S1, ... can reach values up to N^N. A double can hold this number, but the absolute precision for such big numbers is below or on the order of the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (actually I am interested in P/(c1!*c2!*c3!*c4!), which should lie between 0 and 1).
I tried to arrange the additions and subtractions of the values S in a way that the sums of the intermediate results are around 0. This helps in the sense that I can avoid intermediate results that are exceeding N^N, but this improves things only a little bit. I also thought about using logarithms for the intermediate results to keep the absolute numbers down - but the relative accuracy of the encoded numbers will be still bounded by the encoding as floating point number and I think I will run into the same problem. If possible, I want to avoid the usage of data types that are implementing a variable-precision arithmetic for performance reasons (currently I am using matlab).

Uniqueness in Permutation and Combination

I am trying to create some pseudocode to generate possible outcomes for this scenario:
There is a tournament taking place, where each round all players in the tournament are in a group with other players of different teams.
Given x amount of teams, each team has exactly n amount of players. What are the possible outcomes for groups of size r where you can only have one player of each team AND the player must have not played with any of the other players already in previous rounds.
Example: 4 teams (A-D), 4 players each team, 4 players each grouping.
Possible groupings are: (correct team constraint)
A1, B1, C1, D1
A1, B3, C1, D2
But not: (violates same team constraint)
A1, A3, C2, D2
B3, C2, D4, B1
However, the uniqueness constraint comes into play in this grouping
A1, B1, C1, D1
A1, B3, C1, D2
While it does follow the constraints of playing with different teams, it has broken the rule of uniqueness of playing with different players. In this case A1 is grouped up twice with C1
At the end of the day the pseudocode should be able to create something like the following
Round 1 Round 2 Round 3 Round 4
a1 b1 a1 d4 a1 c2 a1 c4
c1 d1 b2 c3 b4 d3 d2 b3
a2 b2 a2 d1 a2 c3 a2 c1
c2 d2 b3 c4 b1 d4 d3 b4
a3 b3 a3 d2 a3 c4 a3 c2
c3 d3 b4 c1 b2 d1 d4 b1
a4 b4 a4 d3 a4 c1 a4 c3
c4 d4 b1 c2 b3 d2 d1 b2
In the example you see that in each round no player has been grouped up with another previous player.

If the number of players on a team is a prime power (2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, etc.), then here's an algorithm that creates a schedule with the maximum number of rounds, based on a finite affine plane.
We work in the finite field GF(n), where n is the number of players on a team. GF(n) has its own notion of multiplication; when n is a prime, it's multiplication mod n, and when n is a higher power of some prime, it's multiplication of univariate polynomials mod some irreducible polynomial of the appropriate degree. Each team is identified by a nonzero element of GF(n); let the set of team identifiers be T. Each team member is identified by a pair in T×GF(n). For each nonzero element r of GF(n), the groups for round r are
{{(t, r*t + c) | t in T} | c in GF(n)},
where * and + denote multiplication and addition respectively in GF(n).
Implementation in Python 3
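The answer's own implementation is not reproduced here. As a stand-in, the following is a minimal sketch of the construction for the simpler case where n is prime, so that GF(n) arithmetic is plain arithmetic mod n. Prime powers such as n = 4 need polynomial arithmetic and are not covered. The names `schedule` and `meets_at_most_once` are made up for this example.

```python
from itertools import combinations

def schedule(n):
    """Rounds of groups for prime n, following {(t, r*t + c) | t in T}.

    Teams are the nonzero residues 1..n-1; a player is (team, member).
    """
    if n < 2 or any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        raise ValueError("this sketch only handles prime n")
    teams = range(1, n)
    rounds = []
    for r in range(1, n):                       # one round per nonzero r
        groups = [[(t, (r * t + c) % n) for t in teams] for c in range(n)]
        rounds.append(groups)
    return rounds

def meets_at_most_once(rounds):
    """True if no two players share a group in more than one round."""
    seen = set()
    for groups in rounds:
        for group in groups:
            for pair in combinations(group, 2):
                if pair in seen:
                    return False
                seen.add(pair)
    return True

rounds = schedule(5)
print(len(rounds), len(rounds[0]), meets_at_most_once(rounds))
```

Two groups from different rounds are lines with different slopes, so they share at most one point, which is why no pair of players can meet twice.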
This problem is very closely related to the Social Golfer Problem. The Social Golfer Problem asks, given n players who each play once a day in g groups of size s (n = g×s), how many days can they be scheduled such that no player plays with any other player more than once?
The algorithms for finding solutions to instances of Social Golfer problems are a patchwork of constraint solvers and mathematical constructions, which together don't address very many cases satisfactorily. If the number of players on a team is equal to the group size, then solutions to this problem can be derived by interpreting the first day's schedule as the team assignments and then using the rest of the schedule. There may be other constructions.

How to assign a different value to each node and each arc of a tree

I wanted to ask a somewhat specific question. I had to solve a problem, which I managed to do with some help. I want to explain what I did and then say what I've been told I'm missing.
What I did: Given a list of nodes (a-b,b-c) I return node "ids" from 1 to N (like giving them a name, each a unique number). Then I see which nodes are connected and calculate the absolute value of the difference of their names/ids (a-b would be 1-2, whose absolute value is 1; b-c would have 2-3 as values and would get another 1 for its arc. If I had an a-d node, I would get 1-4, so 3 as its arc value).
Then I return the list of nodes with their IDs/values, and the list of the arcs with their values (1,a-b), (1,a-c).
Graphically:
6a
5 4
1b 2e
2 3
3c 5f
1
4d
5a
4 3
1b 2e
1 2
3c 4f
5a
4 3
1b 2e
2
3c
1
4d
9a
8 7
1b 2e
6 4
7c 6f
2 2
5d 4g
3 1
8h 3i
I've worked this out by hand, so... still no clear algorithm.
What I've been told I'm missing: Each ARC has to have a unique number, too. So not only the nodes: nodes and arcs alike can each use a number only once, from 1 to N, N being the number of nodes/arcs.
The problem is, I can't figure this out at all. The closest I get is doing it on paper, where I end up working through longer and longer equations, but I'm not sure that would actually solve anything, and so far I have found nothing.
The reason for me not understanding this is that the tree/list a-b,b-c would have as nodes:
a-3,b-2,c-1 ; and arcs: 2,a-b|1,b-c
And another, very simple list like a-b,a-c would have:
a-2,b-1,c-3 ; and arcs: 2,a-b|1,a-c
This is one possible solution, but if the trees grow bigger, I fail to see how is it possible for each arc to have one value between 1 and N, non repeating, and same for each node. Is this even possible? How should I approach this task/am I missing some kind of point of view?
Thanks in advance.
edit: since I am not being clear with terminology:
enumerate(CONNECTIONS_IN, NODES_OUT, ARCS_OUT)
?- enumerate([a-b,b-c], EnumNodos, EnumArcos) returns, as of right now:
EnumNodos=[(1,a),(2,b),(3,c)]
EnumArcos=[(1,a-b), (1,a-c)]
It should give:
EnumNodos=[(3,a),(1,b),(2,c)]
EnumArcos=[(1,a-b), (2,a-c)]%because each arc HAS to have an unique number from 1 to N-1 (nodes are from 1 to N)
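The requirement described above (distinct node numbers 1..N, with every arc's absolute difference also distinct) is usually called a graceful labeling of the tree. The textbook convention numbers nodes from 0 to N-1, but shifting by one does not change the differences. To show what a valid answer looks like, here is a brute-force Python sketch that simply tries every numbering. It is only practical for small trees, and the function name `graceful_labeling` is an assumption made for the example.

```python
from itertools import permutations

def graceful_labeling(edges):
    """Try every numbering 1..N of the nodes; return the first one in which
    all arc values |label(u) - label(v)| are distinct, or None."""
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)
    for perm in permutations(range(1, n + 1)):
        label = dict(zip(nodes, perm))
        arcs = [(abs(label[u] - label[v]), (u, v)) for u, v in edges]
        # For a tree with N-1 arcs, distinct values must be exactly 1..N-1.
        if len({value for value, _ in arcs}) == len(edges):
            return label, arcs
    return None

print(graceful_labeling([("a", "b"), ("b", "c")]))
```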

Is there a specialized algorithm, faster than quicksort, to reorder data ACEGBDFH?

I have some data coming from the hardware. Data comes in blocks of 32 bytes, and there are potentially millions of blocks. Data blocks are scattered in two halves the following way (a letter is one block):
A C E G I K M O B D F H J L N P
or if numbered
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
First all blocks with even indexes, then the odd blocks. Is there a specialized algorithm to reorder the data correctly (alphabetical order)?
The constraints are mainly on space. I don't want to allocate another buffer to reorder: just one more block. But I'd also like to keep the number of moves low: a simple quicksort would be O(NlogN). Is there a faster solution in O(N) for this special reordering case?

Since this data is always in the same order, sorting in the classical sense is not needed at all. You do not need any comparisons, since you already know in advance which of two given data points comes first.
Instead you can produce the permutation on the data directly. If you transform this into cyclic form, this will tell you exactly which swaps to do, to transform the permuted data into ordered data.
Here is an example for your data:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Now calculate the inverse (I'll skip this step, because I am lazy here, assume instead the permutation I have given above actually is the inverse already).
Here is the cyclic form:
(0)(1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14)(15)
So if you want to reorder a sequence structured like this, you would do
# first cycle
# nothing to do
# second cycle
swap 1 8
swap 8 4
swap 4 2
# third cycle
swap 3 9
swap 9 12
swap 12 6
# so on for the other cycles
If you had done this for the inverse instead of the original permutation, you would get the correct sequence with a proven minimal number of swaps.
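The cycle decomposition and the swap sequence can be generated mechanically. In the Python sketch below (an illustration, with hypothetical names `cycles` and `reorder_in_place`), `source_index[i]` is the position that currently holds the block belonging at position i. Note that the `seen` list costs one flag per block, which is more than the single spare block the question allows, though far less than a second buffer.

```python
def cycles(mapping):
    """Cyclic form of a permutation given as a list (i -> mapping[i])."""
    seen = [False] * len(mapping)
    result = []
    for start in range(len(mapping)):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = mapping[i]
        result.append(cycle)
    return result

def reorder_in_place(blocks, source_index):
    """Walk each cycle and swap neighbouring entries, as in the listing above."""
    for cycle in cycles(source_index):
        for a, b in zip(cycle, cycle[1:]):
            blocks[a], blocks[b] = blocks[b], blocks[a]

scrambled = [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
source_index = [scrambled.index(i) for i in range(16)]
print(cycles(source_index))   # starts [[0], [1, 8, 4, 2], [3, 9, 12, 6], ...]
blocks = list("ACEGIKMOBDFHJLNP")
reorder_in_place(blocks, source_index)
print("".join(blocks))        # ABCDEFGHIJKLMNOP
```

A cycle of length k needs k-1 swaps, so the 16-block example takes 10 swaps in total.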
EDIT:
For more details on something like this, see the chapter on Permutations in TAOCP for example.

So you have data coming in in a pattern like
a0 a2 a4...a14 a1 a3 a5...a15
and you want to have it sorted to
b0 b1 b2...b15
With some reordering the permutation can be written like:
a0 -> b0
a8 -> b1
a1 -> b2
a2 -> b4
a4 -> b8
a9 -> b3
a3 -> b6
a6 -> b12
a12 -> b9
a10 -> b5
a5 -> b10
a11 -> b7
a7 -> b14
a14 -> b13
a13 -> b11
a15 -> b15
So if you want to sort it in place with only one block of additional space in a temporary t, this could be done in O(1) for this fixed size with
t = a8; a8 = a4; a4 = a2; a2 = a1; a1 = t
t = a9; a9 = a12; a12 = a6; a6 = a3; a3 = t
t = a10; a10 = a5; a5 = t
t = a11; a11 = a13; a13 = a14; a14 = a7; a7 = t
Edit: The general case (for N != 16), if it is solvable in O(N), is actually an interesting question. I suspect the cycles always start with a prime number which satisfies p < N/2 && N mod p != 0 and the indices have a recurrence like i(n+1) = 2*i(n) mod N, but I am not able to prove it. If this is the case, deriving an O(N) algorithm is trivial.
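One observation on the 16-block table above: the listed destinations (a8 -> b1, a1 -> b2, a9 -> b3, ...) agree with "position i goes to 2*i mod (N-1)", with the last position staying put, rather than with mod N. Taking that as the working assumption for even N, the following Python sketch follows each cycle with a single carried block. The names `destination` and `unshuffle_in_place` are hypothetical, and the `done` flags again use one bit per block on top of the single temporary.

```python
def destination(i, n):
    """Final index of the block now at position i (n even, evens stored first)."""
    return i if i == n - 1 else (2 * i) % (n - 1)

def unshuffle_in_place(blocks):
    n = len(blocks)
    done = [False] * n            # one flag per block: which slots are final
    for start in range(n):
        if done[start]:
            continue
        carried = blocks[start]   # the single temporary block
        i = start
        while True:
            j = destination(i, n)
            # Drop the carried block into its slot and pick up the displaced one.
            blocks[j], carried = carried, blocks[j]
            done[j] = True
            i = j
            if i == start:
                break

data = list(range(0, 16, 2)) + list(range(1, 16, 2))
unshuffle_in_place(data)
print(data)   # 0, 1, 2, ..., 15
```

Every block is written once, so the number of moves is linear in N.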

Maybe I'm misunderstanding, but if the order is always identical to the one given then you can "pre-program" (i.e. avoiding all comparisons) the optimum solution (which is going to be the one that has the minimum number of swaps to move from the string given to ABCDEFGHIJKLMNOP and which, for something this small, you can work out by hand - see LiKao's answer).

It is easier for me to label your set with numbers:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
Start from the 14 and move all even numbers to place (8 swaps). You will get this:
0 1 2 9 4 6 13 8 3 10 7 12 11 14 15
Now you need another 3 swaps (9 with 3, 7 with 13, 11 with 13 moved from 7).
A total of 11 swaps. Not a general solution, but it could give you some hints.

You can also view the intended permutation as a shuffle of the address bits `abcd <-> dabc` (with abcd the individual bits of the index), like:
#include <stdio.h>
#define ROTATE(v,n,i) (((v)>>(i)) | (((v) & ((1u <<(i))-1)) << ((n)-(i))))
/******************************************************/
int main (int argc, char **argv)
{
unsigned i,a,b;
for (i=0; i < 16; i++) {
a = ROTATE(i,4,1);
b = ROTATE(a,4,3);
fprintf(stdout,"i=%u a=%u b=%u\n", i, a, b);
}
return 0;
}
/******************************************************/

That was count sort I believe
