Uniqueness in Permutation and Combination - algorithm

I am trying to create some pseudocode to generate possible outcomes for this scenario:
There is a tournament in which, each round, every player is placed in a group with players from other teams.
Given x teams, each with exactly n players, what are the possible groupings of size r in which each group has at most one player from each team AND no player has already played with any other member of the group in a previous round?
Example: 4 teams (A-D), 4 players each team, 4 players each grouping.
Possible groupings are: (correct team constraint)
A1, B1, C1, D1
A1, B3, C1, D2
But not: (violates same team constraint)
A1, A3, C2, D2
B3, C2, D4, B1
However, the uniqueness constraint comes into play in this pair of groupings:
A1, B1, C1, D1
A1, B3, C1, D2
While this pair satisfies the constraint of playing with different teams, it breaks the uniqueness rule of playing with different players: A1 is grouped with C1 twice.
In the end, the pseudocode should be able to create something like the following:
Round 1        Round 2        Round 3        Round 4
a1 b1 c1 d1    a1 d4 b2 c3    a1 c2 b4 d3    a1 c4 d2 b3
a2 b2 c2 d2    a2 d1 b3 c4    a2 c3 b1 d4    a2 c1 d3 b4
a3 b3 c3 d3    a3 d2 b4 c1    a3 c4 b2 d1    a3 c2 d4 b1
a4 b4 c4 d4    a4 d3 b1 c2    a4 c1 b3 d2    a4 c3 d1 b2
In the example, no player is ever grouped with a player they were already grouped with in a previous round.

If the number of players on a team is a prime power (2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, etc.), then here's an algorithm that creates a schedule with the maximum number of rounds, based on a finite affine plane.
We work in the finite field GF(n), where n is the number of players on a team. GF(n) has its own notion of multiplication: when n is a prime, it's multiplication mod n, and when n is a higher power of some prime, it's multiplication of univariate polynomials mod some irreducible polynomial of the appropriate degree. Each team is identified by a nonzero element of GF(n); let the set of team identifiers be T. Each team member is identified by a pair in T×GF(n). For each nonzero element r of GF(n), the groups for round r are
{{(t, r*t + c) | t in T} | c in GF(n)},
where * and + denote multiplication and addition respectively in GF(n).
Implementation in Python 3
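The linked implementation is not reproduced here. As a minimal sketch of the construction above, assuming n is prime so that GF(n) arithmetic is plain arithmetic mod n (prime powers such as 4, 8 or 9 would need polynomial arithmetic and are not handled), something like the following could work. The function names affine_schedule and repeated_pairs are made up for illustration.

```python
from itertools import combinations


def is_prime(n):
    """Trial-division primality check, adequate for small team sizes."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def affine_schedule(n, num_teams):
    """Build the rounds {{(t, r*t + c) | t in T} | c in GF(n)} for prime n.

    Assumption: teams are labelled with the nonzero residues 1..num_teams,
    and a player is the pair (team, member index in 0..n-1).
    """
    if not is_prime(n):
        raise ValueError("this sketch only handles prime n")
    if not 1 <= num_teams <= n - 1:
        raise ValueError("need between 1 and n - 1 teams")
    teams = range(1, num_teams + 1)
    schedule = []
    for r in range(1, n):  # one round per nonzero r
        groups = [[(t, (r * t + c) % n) for t in teams] for c in range(n)]
        schedule.append(groups)
    return schedule


def repeated_pairs(schedule):
    """Return every pair of players that shares a group more than once."""
    seen, repeats = set(), set()
    for groups in schedule:
        for group in groups:
            for pair in combinations(sorted(group), 2):
                if pair in seen:
                    repeats.add(pair)
                seen.add(pair)
    return repeats


if __name__ == "__main__":
    for number, groups in enumerate(affine_schedule(5, 4), start=1):
        print("Round", number, groups)
```

For two players on different teams t1 and t2, sharing a group in round r forces r*(t2 - t1) to equal the difference of their member indices, and that equation has exactly one solution r mod a prime n, which is why no pair can meet twice.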
This problem is very closely related to the Social Golfer Problem. The Social Golfer Problem asks, given n players who each play once a day in g groups of size s (n = g×s), how many days can they be scheduled such that no player plays with any other player more than once?
The algorithms for finding solutions to instances of Social Golfer problems are a patchwork of constraint solvers and mathematical constructions, which together don't address very many cases satisfactorily. If the number of players on a team is equal to the group size, then solutions to this problem can be derived by interpreting the first day's schedule as the team assignments and then using the rest of the schedule. There may be other constructions.

Related

How to perform crossover in a 2-dimensional array - genetic algorithm

I have the following two chromosomes which are represented as a 2D array.
// First chromosome
[
[ 12 45 23 ]
[ 34 01 89 ]
[ 33 90 82 ]
]
// Second chromosome
[
[ 00 45 89 ]
[ 00 00 34 ]
]
The constraint on the chromosome is that each array in the chromosome array must remain together. For example, in the first chromosome [ 12 45 23 ] must remain together. With this in mind, I believe the way to perform crossover with the above chromosome structure is to randomly select a horizontal crossover point, such as the following:
// First produced off-spring
[
[ 12 45 23 ] // First chromosome
[ 00 00 34 ] // Second chromosome
]
// Second produced off-spring
[
[ 00 45 89 ] // Second chromosome
[ 34 01 89 ] // First chromosome
[ 33 90 82 ] // First chromosome
]
Is this the correct way to perform crossover on a 2D chromosome array whose rows must remain intact? If it is, does this method have a specific name? Or would this come under one-point crossover?
does this method have a specific name? Or would this come under One-point crossover?
In various papers about variable length genetic algorithms it's called one point crossover.
For variable length chromosomes one point crossover is often proposed in a more general way: you can select a distinct crossover point for each chromosome. E.g.
C1 = [ A1, A2, A3, A4, A5, A6]
C2 = [ B1, B2, B3, B4]
Choosing crossover point 1 for C1 and 3 for C2 you get:
C1 = [ A1 | A2, A3, A4, A5, A6]
C2 = [ B1, B2, B3 | B4]
C1' = [A1, B4]
C2' = [B1, B2, B3, A2, A3, A4, A5, A6]
This allows the chromosome length to start growing. Depending on the specific problem it could be a requirement or just bloating (in both cases you may need to account for that in the fitness function).
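The generalised one-point crossover described above can be sketched in a few lines. This is only an illustration: the function name one_point_crossover and its cut1/cut2 parameters are made up here, and each chromosome is assumed to be a list whose elements (genes, or whole rows of a 2D chromosome) are swapped as indivisible units.

```python
import random


def one_point_crossover(parent1, parent2, cut1=None, cut2=None, rng=random):
    """One-point crossover with an independent cut point per parent.

    Each parent is split at its own cut point and the tails are swapped.
    Elements are never split, so a row of a 2D chromosome stays intact.
    Random cuts are drawn so that both pieces are non-empty, which
    assumes each parent has at least two elements.
    """
    if cut1 is None:
        cut1 = rng.randint(1, len(parent1) - 1)
    if cut2 is None:
        cut2 = rng.randint(1, len(parent2) - 1)
    child1 = parent1[:cut1] + parent2[cut2:]
    child2 = parent2[:cut2] + parent1[cut1:]
    return child1, child2


if __name__ == "__main__":
    c1 = ["A1", "A2", "A3", "A4", "A5", "A6"]
    c2 = ["B1", "B2", "B3", "B4"]
    print(one_point_crossover(c1, c2, cut1=1, cut2=3))

    # Rows of a 2D chromosome are treated as single genes.
    first = [[12, 45, 23], [34, 1, 89], [33, 90, 82]]
    second = [[0, 45, 89], [0, 0, 34]]
    print(one_point_crossover(first, second, cut1=1, cut2=1))
```

With cut1=1 and cut2=3 this reproduces the C1' and C2' shown above, and with both cuts at 1 it reproduces the two offspring from the question.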
Is this the correct way to perform crossover on a 2D chromosome array whose rows must remain intact?
It's a simple method (so a good one). Uniform crossover is another simple approach.
Synapsing Variable-Length Crossover: Meaningful Crossover for Variable-Length Genomes (Benjamin Hutt and Kevin Warwick, IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, February 2007) describes other interesting (more complex) possibilities.
The best crossover is very problem specific.

Clusterization algorithm

I have a problem with the clusterization of clients.
I have a dataset with columns such as name, address, email, phone, etc. (A, B, C in the example). Each row has a unique identifier (ID). I need to assign a CLUSTER_ID (X) to each row. Within a cluster, every row shares one or more attribute values with some other row of the cluster. So if clients with ID=1,2,3 have the same A attribute and clients with ID=3,10 have the same B attribute, then ID=1,2,3,10 should be in the same cluster.
How can I solve this problem using SQL?
If that's not possible, how would I write the algorithm (pseudocode)?
Performance is very important because the dataset contains millions of rows.
Sample Input:
ID A B C
1 A1 B3 C1
2 A1 B2 C5
3 A1 B10 C10
4 A2 B1 C5
5 A2 B8 C1
6 A3 B1 C4
7 A4 B6 C3
8 A4 B3 C5
9 A5 B7 C2
10 A6 B10 C3
11 A8 B5 C4
Sample Output:
ID A B C X
1 A1 B3 C1 1
2 A1 B2 C5 1
3 A1 B10 C10 1
4 A2 B1 C5 1
5 A2 B8 C1 1
6 A3 B1 C4 1
7 A4 B6 C3 1
8 A4 B3 C5 1
9 A5 B7 C2 2
10 A6 B10 C3 1
11 A8 B5 C4 1
Thanks for any help.
A possible way is to repeat updates on the rows with an empty X.
Start with cluster ID 1, for example by using a variable:
SET @CurrentClusterID = 1
Take the top 1 record and update its X to 1.
Now loop an update over all records that have an empty X and that can be linked to a record with X = 1 through the same A, B, or C.
Disclaimer:
The statement will vary depending on the RDBMS.
This is just intended as pseudo-code.
WHILE (<<some check to see if there were records updated>>)
BEGIN
    UPDATE yourtable t
    SET t.X = @CurrentClusterID
    WHERE t.X IS NULL
      AND EXISTS (
        SELECT 1 FROM yourtable d
        WHERE d.X = @CurrentClusterID
          AND (d.A = t.A OR d.B = t.B OR d.C = t.C)
      );
END
Loop that until it updates 0 records.
Now repeat the method for the other clusters until there are no more empty X values in the table:
1) Increase @CurrentClusterID by 1.
2) Update the next top 1 record with an empty X to the new @CurrentClusterID.
3) Loop the update until no more updates are done.
An example test on db<>fiddle here for MS Sql Server.
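Outside the database, the same grouping can be computed in one pass with a union-find (disjoint-set) structure. This is a different technique from the looped UPDATE above, shown only as a sketch: the function name cluster_rows and the dict-of-tuples input format are assumptions made for illustration.

```python
def cluster_rows(rows):
    """Assign a cluster number to every row ID.

    rows maps ID -> tuple of attribute values (A, B, C, ...).
    Two rows end up in the same cluster when they are linked by a chain
    of rows that share a value in the same column.
    """
    parent = {rid: rid for rid in rows}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first row seen for each (column, value) pair and
    # merge every later row carrying the same value into its set.
    first_seen = {}
    for rid, values in rows.items():
        for column, value in enumerate(values):
            if value is None:
                continue  # assumption: missing values never link rows
            key = (column, value)
            if key in first_seen:
                union(first_seen[key], rid)
            else:
                first_seen[key] = rid

    # Number the clusters 1, 2, ... in order of first appearance.
    numbers, result = {}, {}
    for rid in rows:
        root = find(rid)
        if root not in numbers:
            numbers[root] = len(numbers) + 1
        result[rid] = numbers[root]
    return result


if __name__ == "__main__":
    sample = {
        1: ("A1", "B3", "C1"), 2: ("A1", "B2", "C5"), 3: ("A1", "B10", "C10"),
        4: ("A2", "B1", "C5"), 5: ("A2", "B8", "C1"), 6: ("A3", "B1", "C4"),
        7: ("A4", "B6", "C3"), 8: ("A4", "B3", "C5"), 9: ("A5", "B7", "C2"),
        10: ("A6", "B10", "C3"), 11: ("A8", "B5", "C4"),
    }
    print(cluster_rows(sample))
```

Each row is visited once per column, so the work grows roughly linearly with the number of rows instead of requiring repeated full-table updates; the price is holding the IDs and distinct values in memory.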

How to optimize the algorithm to find the max_depth_contact_series in a time varying graph?

Assume there is a time-varying graph with N nodes named a1, a2, ..., an and a contact series whose entries have the form "t node1 node2", meaning node1 contacts node2 at time t.
Assume node a1 carries a message (there is only one copy of the message in the graph) from time 0. How many nodes can the message come into contact with, at most, within time T? The message can be transferred to another node freely at any contact. For example, a1 can choose to transfer it to a2 at time 2, or keep the message until a1 contacts a3 and transfer it to a3.
Here is an example to make it more clear. For a graph with 6 nodes and contact series:
1 a1 a2
2 a1 a3
3 a1 a4
4 a3 a5
6 a3 a6
10 a4 a3
During time 0~10 the message can contact at most 4 nodes: a2, a3, a5, a6, with the message transferred from a1 to a3 at time 2.
Keep the time ordering in mind. Here a1 carries the message but transfers it to a3 at time 2. Then at time 3 node a1 no longer has the message, so the message can't contact a4. If a1 keeps the message at time 2 instead of transferring it to a3, the message contacts the list a2, a3, a4, a3. The contact set is then {a2, a3, a4} with size 3, which is smaller than 4.
How can I get the largest set of contacted nodes? Or just its size?
At present I compute it with a recursive algorithm, but the cost is unbearable when T is large.

Avoid accuracy problems while computing the permanent using the Ryser formula

Task
I want to calculate the permanent P of an NxN matrix for N up to 100. I can make use of the fact that the matrix features only M=4 (or slightly more) different rows and cols. The matrix might look like
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
... | r1 identical rows
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
...
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
...
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
...
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
---------
c1 identical cols
where c and r are the multiplicities of the cols and rows. All values in the matrix lie between 0 and 1 and are encoded as double-precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single cols and groups of cols; the products of the row sums of the remaining matrix are added (even number of cols deleted) or subtracted (odd number of cols deleted).
The task can be solved relatively efficiently if one makes use of the identical cols (for example, the result S1 will pop up exactly c1 times).
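The grouped form of the Ryser sum described above can be written down compactly. The sketch below is an illustration only: it uses Python's exact Fraction arithmetic purely to check the formula on small inputs, so it does not address the floating-point question and is not what one would run in MATLAB for N = 100. The function name grouped_ryser and its argument layout are assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def grouped_ryser(values, row_mult, col_mult):
    """Permanent of a matrix with repeated rows and columns.

    values[i][j] is the entry shared by row type i and column type j;
    row_mult[i] and col_mult[j] are the multiplicities r_i and c_j.
    Instead of visiting all 2^N column subsets, it visits each tuple
    (k_1, ..., k_M) of kept columns per type once and weights it by
    the number of subsets it stands for, prod_j C(c_j, k_j).
    """
    n = sum(col_mult)
    if n != sum(row_mult):
        raise ValueError("matrix must be square")
    total = Fraction(0)
    for kept in product(*(range(c + 1) for c in col_mult)):
        deleted = n - sum(kept)
        sign = -1 if deleted % 2 else 1  # odd number deleted: subtract
        weight = 1
        for c, k in zip(col_mult, kept):
            weight *= comb(c, k)
        term = Fraction(1)
        for row, r in zip(values, row_mult):
            row_sum = sum(Fraction(v) * k for v, k in zip(row, kept))
            term *= row_sum ** r
        total += sign * weight * term
    return total


if __name__ == "__main__":
    # The 2x2 matrix [[1, 2], [3, 4]] has permanent 1*4 + 2*3.
    print(grouped_ryser([[1, 2], [3, 4]], [1, 1], [1, 1]))
```

The loop runs over (c1+1)*(c2+1)*...*(cM+1) tuples rather than 2^N subsets, which is where the saving from identical cols comes from.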
Problem
Even if the final result is small, the intermediate results S0, S1, ... can reach values up to N^N. A double can hold this number, but the absolute precision for such big numbers is below or on the order of the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (actually I am interested in P/(c1!*c2!*c3!*c4!), which should lie between 0 and 1).
I tried to arrange the additions and subtractions of the values S so that the sums of the intermediate results stay around 0. This helps in the sense that I can avoid intermediate results exceeding N^N, but it improves things only a little. I also thought about using logarithms for the intermediate results to keep the absolute numbers down, but the relative accuracy of the encoded numbers will still be bounded by the floating-point encoding, and I think I will run into the same problem. If possible, I want to avoid data types that implement variable-precision arithmetic, for performance reasons (currently I am using MATLAB).

Find the best set among many sets based on its items' costs

I have items in sets, as in the example below. Each item has a particular cost.
I have a max budget. I need to form combinations such that each combination contains at least one item from each set and the sum of the costs equals my budget.
Example
A = [a1, a2, a3, a4, ... , a10]
B = [b1, b2, b3, b4, ... , b10]
C = [c1, c2, c3, c4, ... , c10] (there may be sets up to G)
Max budget = 10
cost of a1 = 2
a2 = 8
b1 = 1
b2 = 7
c1 = 3
c2 = 1
etc
Output can be
[a1, b2, c2] i.e. 2+7+1 = 10
[a2, b1, c2] i.e. 8+1+1 = 10
[a1, b1, c1] i.e. 2+1+3 = 6 Eliminated (since 6 != 10)
and so on.
I can have a maximum of 7 sets with 10 items in each, so there are at most 10^7 combinations. Is there an algorithm that achieves this easily? I used a brute-force method and it is too expensive.
Thank you.
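One possible direction is a depth-first search that abandons a partial combination as soon as it overshoots the budget. The sketch below is only an illustration under two assumptions that go beyond the question: exactly one item is taken from each set (as in the example output), and all costs are non-negative so that overshooting can never be undone. The function name combos_matching_budget is made up.

```python
def combos_matching_budget(sets, costs, budget):
    """List every pick of one item per set whose costs sum to budget.

    sets is a list of lists of item names, costs maps item -> cost.
    Assumes non-negative costs, so any partial sum above the budget
    can be pruned without missing a valid combination.
    """
    results = []

    def extend(index, chosen, spent):
        if spent > budget:
            return  # prune: adding more items cannot lower the sum
        if index == len(sets):
            if spent == budget:
                results.append(list(chosen))
            return
        for item in sets[index]:
            chosen.append(item)
            extend(index + 1, chosen, spent + costs[item])
            chosen.pop()

    extend(0, [], 0)
    return results


if __name__ == "__main__":
    sets = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
    costs = {"a1": 2, "a2": 8, "b1": 1, "b2": 7, "c1": 3, "c2": 1}
    print(combos_matching_budget(sets, costs, 10))
```

On the six costs given in the question this returns the two combinations listed there, [a1, b2, c2] and [a2, b1, c2]. The worst case is still exponential; how much the pruning helps depends on how the costs compare with the budget.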
