How many offspring should we produce in Genetic Algorithm?

Suppose I have a population size of 100; should I produce 10 offspring?
I want the combination of population size and offspring count that converges fastest. Please also point me to a paper.

There is no single best offspring/population ratio for every problem you can solve with a genetic algorithm. Every problem has its own best configuration, which is unknown in advance, and that applies not only to the offspring/population ratio but also to the mutation rate, chromosome design, and so on.
That said, there are several replacement strategies, such as total substitution and partial substitution, each with its own benefits and downsides. You should explore them and decide which one fits your problem best.

-Suppose your GA has a population of size N.
chrom # 0 = "01010110101" | Fitness = f0
chrom # 1 = "11010010111" | Fitness = f1
chrom # 2 = "01010111011" | Fitness = f2
chrom # 3 = "01111010100" | Fitness = f3
.
.
.
chrom # N = "01011010110" | Fitness = fN
-Draw T chromosomes at random from the main population to form a tournament of size T (T < N):
Tournament chrom # 0 = "01010110101" | Fitness = f0
Tournament chrom # 1 = "11010010111" | Fitness = f1
Tournament chrom # 2 = "01010111011" | Fitness = f2
Tournament chrom # 3 = "01111010100" | Fitness = f3
.
.
.
Tournament chrom # T = "01011010110" | Fitness = fT
The winner of the tournament (the fittest of the T chromosomes) becomes the first mate:
Mate Chromosome # 1
Run another tournament to get the second mate:
Mate Chromosome # 2
Apply crossover to the two mates to produce an offspring:
Crossover(Mate Chromosome # 1, Mate Chromosome # 2) => offspring
Apply mutation to the offspring:
Mutation(offspring) => new chromosome for new population
Repeating this N times gives you N offspring, which form the new population.
Continue iterating, generation after generation, until the population converges on the target chromosome.
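The steps above can be sketched in Python. This is only an illustration under assumptions that are not in the answer: bit-string chromosomes, single-point crossover, per-bit mutation, and a toy "count the ones" fitness called one_max. The function names and the default values for T and the mutation rate are made up for the example.

```python
import random

def tournament(population, fitness, t, rng):
    """Pick t random individuals and return the fittest of them."""
    contenders = rng.sample(population, t)
    return max(contenders, key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit strings."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chrom
    )

def next_generation(population, fitness, t=3, rate=0.01, rng=random):
    """Build a new population of the same size N: one offspring per slot."""
    new_population = []
    for _ in population:
        mate1 = tournament(population, fitness, t, rng)
        mate2 = tournament(population, fitness, t, rng)
        offspring = crossover(mate1, mate2, rng)
        new_population.append(mutate(offspring, rate, rng))
    return new_population

def one_max(chrom):
    """Toy fitness (an assumption for this sketch): number of 1-bits."""
    return chrom.count("1")

def run(n=100, length=11, generations=50, seed=0):
    """Evolve a random population of n bit strings for a fixed number of generations."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length)) for _ in range(n)]
    for _ in range(generations):
        population = next_generation(population, one_max, rng=rng)
    return population
```

Here every generation replaces all N individuals (total substitution). A partial-substitution variant would produce fewer offspring and keep some of the parents, which is exactly the trade-off the question asks about.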

Related

N bits integer matching algorithm

I am trying to write an algorithm that measures the correlation between n-bit integers for the value “1”.
Here is an example of a 5-bit integer: 0,1,0,0,1
I want to compute the percentage of correlation between this integer and a set of N other integers.
For example, Integer A(0,1,0,0,1) and Integer B(0,1,0,0,0) have a correlation of 0.5 for the value “1”, as only the second bit matches.
In my Firebase database, I have one n bits integer attached to each user_ID that I want to match against the n bits integer of every other user of my application to get a type of correlation between each user.
The distribution of the total correlations between users will follow a Gaussian curve that I want to use in the future to match users with each other.
For example: I want user A to be matched with every other user with these matches sorted by decreasing order of affinity (from high to low correlation between their n bits integers).
Do you guys have any idea how I could perform the algorithm to establish the correlation between the N number of users and then perform another algorithm to sort these correlations from high to low?
Any help would be greatly appreciated.
Thank you for your time,
Maxime

You can use the bitwise AND operation to get a result R.
Example:
A = 9 = 01001
B = 8 = 01000
C = 7 = 00111
D = 31 = 11111
R = A & B gives 8 = 01000. The correlation counts the 1-bits: ones(R)/ones(A) = 1/2 = 0.5.
R = A & C gives 1 = 00001, so the correlation is ones(R)/ones(A) = 1/2 = 0.5.
R = A & D gives 9 = 01001, so ones(R)/ones(A) = 2/2 = 1.
Here we have a problem: D gets a perfect score against A only because all of its bits are set. You can solve this by dividing by the larger of the two 1-bit counts, i.e. ones(R)/max(ones(A), ones(D)).
I believe it is better to divide by the total bit count (here 5).
The results would then be:
corr AB = 1/5 = 0.2
corr AC = 1/5 = 0.2
corr AD = 2/5 = 0.4
corr CD = 3/5 = 0.6
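A minimal Python sketch of this approach, including the sorting step the question asks for. The function names and the dictionary of user ids are assumptions made for the example; how the integers are read from Firebase is left out.

```python
def popcount(x):
    """Number of 1-bits in a non-negative integer."""
    return bin(x).count("1")

def correlation(a, b, n_bits):
    """Shared 1-bits divided by the total bit width."""
    return popcount(a & b) / n_bits

def ranked_matches(user_bits, others, n_bits):
    """Return (other_id, correlation) pairs sorted from high to low correlation."""
    scores = [(uid, correlation(user_bits, bits, n_bits)) for uid, bits in others.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    A, B, C, D = 9, 8, 7, 31
    print(ranked_matches(A, {"B": B, "C": C, "D": D}, n_bits=5))
```

Ranking one user against all others costs one AND and one bit count per pair, followed by a sort of the scores.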

How to implement Roulette Wheel Selection and Rank Selection in Matlab code for the Traveling Salesman Problem?

I have an assignment coding a genetic algorithm for the traveling salesman problem. I've written some code giving correct results using Tournament Selection.
The problem is, I have to do Wheel and Rank and the results I get are incorrect.
Here is my code using Tournament Selection:
clc;
clear all;
close all;
nofCities = 30;
initialPopulationSize = nofCities*nofCities;
generations = nofCities*ceil(nofCities/10);
cities = floor(rand([nofCities 2])*100+1);
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(:,1), cities(:,2));
line(cities([1 end],1), cities([1 end],2));
axis([0 110 0 110]);
population = zeros(initialPopulationSize ,nofCities);
for i=1:initialPopulationSize
    population(i,:) = randperm(nofCities);
end
distanceMatrix = zeros(nofCities);
for i=1:nofCities
    for j=1:nofCities
        if (i==j)
            distanceMatrix(i,j)=0;
        else
            distanceMatrix(i,j) = sqrt((cities(i,1)-cities(j,1))^2+(cities(i,2)-cities(j,2))^2);
        end
    end
end
for u=1:generations
    tourDistance = zeros(initialPopulationSize ,1);
    for i=1:initialPopulationSize
        for j=1:length(cities)-1
            tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,j),population(i,j+1));
        end
    end
    for i=1:initialPopulationSize
        tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,end),population(i,1));
    end
    min(tourDistance)
    newPopulation = zeros(initialPopulationSize,nofCities);
    for k=1:initialPopulationSize
        child = zeros(1,nofCities);
        %tournament start
        for i=1:5
            tournamentParent1(i) = ceil(rand()*initialPopulationSize);
        end
        p1 = find(tourDistance == min(tourDistance([tournamentParent1])));
        parent1 = population(p1(1), :);
        for i=1:5
            tournamentParent2(i) = ceil(rand()*initialPopulationSize);
        end
        p2 = find(tourDistance == min(tourDistance([tournamentParent2])));
        parent2 = population(p2(1), :);
        %tournament end
        %crossover
        startPos = ceil(rand()*(nofCities/2));
        endPos = ceil(rand()*(nofCities/2)+10);
        for i=1:nofCities
            if (i>startPos && i<endPos)
                child(i) = parent1(i);
            end
        end
        for i=1:nofCities
            if (isempty(find(child==parent2(i))))
                for j=1:nofCities
                    if (child(j) == 0)
                        child(j) = parent2(i);
                        break;
                    end
                end
            end
        end
        newPopulation(k,:) = child;
    end
    %mutation
    mutationRate = 0.015;
    for i=1:initialPopulationSize
        if (rand() < mutationRate)
            pos1 = ceil(rand()*nofCities);
            pos2 = ceil(rand()*nofCities);
            mutation1 = newPopulation(i,pos1);
            mutation2 = newPopulation(i,pos2);
            newPopulation(i,pos1) = mutation2;
            newPopulation(i,pos2) = mutation1;
        end
    end
    population = newPopulation;
    u
end
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(population(i,:),1), cities(population(i,:),2));
line(cities([population(i,1) population(i,end)],1), cities([population(i,1) population(i,end)],2));
axis([0 110 0 110]);
%close all;
What I want is to replace the tournament code with wheel and rank code.
Here is what I wrote for the Wheel Selection:
fitness = tourDistance./sum(tourDistance);
wheel = cumsum(fitness);
parent1 = population(find(wheel >= rand(),1),:);
parent2 = population(find(wheel >= rand(),1),:);

Here is a vectorized implementation of a roulette wheel selection in Matlab:
[~,W] = min(ones(popSize,1)*rand(1,2*popSize) > ((cumsum(fitness)*ones(1,2*popSize)/sum(fitness))),[],1);
This assumes that the fitness input to the selection scheme is a column vector of size (popSize x 1), with one entry per population member.
popSize is the number of members in your population, and W holds the winners, i.e. the population members selected to become parents for crossover.
The output of the selection, W, is a double row vector of size 2*popSize containing the indices of the population members that will be used in the crossover stage.
This row vector can then be input into a vectorized crossover scheme that could look something like this:
%% Single-Point Preservation Crossover
Pop2 = Pop(W(1:2:end),:); % Pop2 Winners 1
P2A = Pop(W(2:2:end),:); % Pop2 Winners 2
Lidx = sub2ind(size(Pop),[1:popSize]',round(rand(popSize,1)*(genome-1)+1));
vLidx = P2A(Lidx)*ones(1,genome);
[r,c]=find(Pop2==vLidx);
[~,Ord]=sort(r);
r = r(Ord); c = c(Ord);
Lidx2 = sub2ind(size(Pop),r,c);
Pop2(Lidx2) = Pop2(Lidx);
Pop2(Lidx) = P2A(Lidx);
This crossover takes the W variable from the selection scheme as input. It also uses Pop, the population stored as a popSize-by-genome matrix (genome is the number of cities in one tour, which is also the length of each genome). Each genome is an array of integers, one per city, and the tour visits the cities in array order from the first index to the last.
While we are at it, we may as well include a nice vectorized mutation scheme for a permutation genetic algorithm (which this is).
%% Mutation (Permutation)
idx = rand(popSize,1)<mutRate;
Loc1 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2(idx == 0) = Loc1(idx == 0);
[Pop2(Loc1), Pop2(Loc2)] = deal(Pop2(Loc2), Pop2(Loc1));
This mutation randomly swaps 2 cities in our tour (genome).
Finally, make sure to update your population after all of that work!
%% Update Population!
Pop = Pop2; % updates the population to include crossovers and mutation.
So I know this reply is probably way too late for your assignment, but hopefully it will help someone else with a similar problem.
I really recommend that anyone interested in vectorized genetic algorithms in Matlab read this paper: UCL: Efficiently Vectorized Code for Population Based Optimization Algorithms
It is what I based all of the example code on, and it will teach you why the code is written that way. It's a great resource and what got me started with GAs.

For wheel selection to work, you should start by designing a fitness measure in which fitter individuals have a larger value, in contrast to the tour distance, where better individuals have a smaller value. Then your approach with cumsum should work.
Where is the issue with ranking selection?
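To make the point about the fitness measure concrete, here is a rough Python sketch (not Matlab, and not the asker's code). It assumes the reciprocal of the tour distance as the fitness, which is only one common choice, and it adds a simple linear rank weighting as one possible reading of rank selection. All function names are made up for the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def roulette_pick(weights, rng=random):
    """Return an index chosen with probability proportional to its weight."""
    wheel = list(accumulate(weights))      # same role as cumsum in Matlab
    spin = rng.random() * wheel[-1]        # scale the spin instead of normalising
    return min(bisect_left(wheel, spin), len(wheel) - 1)

def distance_fitness(tour_distances):
    """Shorter tour -> larger fitness (reciprocal is an assumption of this sketch)."""
    return [1.0 / d for d in tour_distances]

def rank_weights(tour_distances):
    """Linear ranking: the longest tour gets weight 1, the shortest gets weight N."""
    n = len(tour_distances)
    order = sorted(range(n), key=lambda i: tour_distances[i], reverse=True)
    weights = [0] * n
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return weights

if __name__ == "__main__":
    distances = [320.0, 150.0, 210.0]
    parent_wheel = roulette_pick(distance_fitness(distances))
    parent_rank = roulette_pick(rank_weights(distances))
    print(parent_wheel, parent_rank)
```

Rank selection reuses the same wheel; only the weights change, which makes it less sensitive to a few tours with extreme distances.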

Understanding the algorithm for pattern matching using an LCP array

Foreword: My question is mainly an algorithmic question, so even if you are not familiar with suffix and LCP arrays you can probably help me.
In this paper it is described how to efficiently use suffix and LCP arrays for string pattern matching.
I understand how SA and LCP work and how the algorithm's runtime can be improved from O(P*log(N)) (where P is the length of the pattern and N is the length of the string) to O(P+log(N)) (thanks to Chris Eelmaa's answer here and jogojapan's answer here).
I was trying to go through the algorithm in figure 4 which explains the usage of LLcp and RLcp. But I have problems understanding how it works.
The algorithm (taken from the source):
Explanation of the used variable names:
lcp(v,w) : Length of the longest common prefix of v and w
W = w0..wP-1 : pattern of length P
A = a0..aN-1 : the text (length N)
Pos[0..N-1] : suffix array
L_W : index (in A) of first occurrence of the matched pattern
M : middle index of current substring
L : lower bound
R : upper bound
Lcp : array of size N-2 such that Lcp[M] = lcp(A_Pos[L_M], A_pos[M]) where L_M is the lower bound of the unique interval with M in the middle
Rcp : array of size N-2 such that Rcp[M] = lcp(A_Pos[R_M], A_pos[M]) where R_M is the upper bound of the unique interval with M in the middle
Now I want to try the algorithm using the following example (partly taken from here):
SA | LCP | Suffix entry
-----------------------
5 | N/A | a
3 | 1 | ana
1 | 3 | anana
0 | 0 | banana
4 | 0 | na
2 | 2 | nana
A = "banana" ; N = 6
W = "ban" ; P = 3
I want to try to match a string, say ban and would expect the algorithm to return 0 as L_W.
Here is how I would step through the algorithm:
l = lcp("a", "ban") = 0
r = lcp("nana", "ban") = 0
if 0 = 3 or 'b' =< 'a' then // which is NOT the case for both conditions
L_W = 0
else if 0 < 3 or 'b' =< 'n' then // which is the case for both conditions
L_W = 6 // which means 'not found'
...
...
I feel like I am missing something but I can't find out what. Also I am wondering how the precomputed LCP array can be used instead of calling lcp(v,w).

I believe there is an error in the algorithm as written.
The first condition is fairly easy to understand. When the LCP length equals the pattern length, we are done. When the pattern is smaller than or equal to the smallest suffix, the only choice is the smallest one.
The second condition is wrong. We can prove it by contradiction. r < P || Wr <= a... means r >= P && Wr > a... If r >= P, then how can we have L_W = N (not found), since we already have a common prefix of length r?
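As a sanity check for the "banana" example, here is a small Python sketch of the plain O(P*log(N)) binary search over the suffix array. It deliberately does not use the precomputed Llcp/Rlcp acceleration from the paper, and the function names are assumptions. It returns the index into the suffix array of the first matching suffix, or N when the pattern is absent.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_pattern(text, pos, pattern):
    """Binary-search the suffix array `pos` for the first suffix that starts
    with `pattern`. Returns its index in `pos`, or len(text) if not found."""
    lo, hi = 0, len(pos)
    p = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first p characters of the suffix with the pattern.
        if text[pos[mid]:pos[mid] + p] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(pos) and text[pos[lo]:pos[lo] + p] == pattern:
        return lo
    return len(text)

if __name__ == "__main__":
    A = "banana"
    pos = suffix_array(A)
    k = find_pattern(A, pos, "ban")
    print(pos, k, pos[k])
```

For W = "ban" the search lands on the suffix array entry whose text position is 0, which is the result the question expects.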

Elevator algorithm for minimum distance

I have a building with a single elevator and I need to find an algorithm for this elevator. We get a list of objects of this form: {i->j}, where i is the floor that a resident wants to take the elevator from and j is the floor he wants to get off at.
An infinite amount of people can use the elevator at the same time, and it's irrelevant how long people stay in the elevator. The elevator starts from the first floor.
I checked a little on the web and I found the "elevator algorithm" but it doesn't really help me. It says that I should go all the way up and then all the way down. But consider when one resident wants to go from 1 to 100 and another resident wants to go from 50 to 49. Using the above algorithm, it will take a distance of 151 floors. If I instead follow this path: 1->50->49->100, it takes only 102 floors, which is better.
What algorithm should I use?
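To compare candidate routes, a small Python helper can measure a route and check that it serves every request. This is only a sketch under the question's assumptions (unlimited capacity, start at floor 1, riders board and leave at any floor the elevator passes); the function names are made up. Measured as floors travelled from floor 1, the two routes in the question come to 150 and 101; the figures 151 and 102 quoted there seem to use a slightly different counting convention, and the saving is the same either way.

```python
def route_length(stops, start=1):
    """Total floors travelled when visiting `stops` in order from `start`."""
    total, here = 0, start
    for floor in stops:
        total += abs(floor - here)
        here = floor
    return total

def floors_visited(stops, start=1):
    """Every floor the elevator is at, in order, including floors passed through."""
    path = [start]
    for target in stops:
        step = 1 if target > path[-1] else -1
        while path[-1] != target:
            path.append(path[-1] + step)
    return path

def serves_all(stops, trips, start=1):
    """True if each (pickup, dropoff) trip sees its pickup floor and, at that
    point or later along the route, its dropoff floor."""
    path = floors_visited(stops, start)
    for pickup, dropoff in trips:
        if pickup not in path:
            return False
        first = path.index(pickup)
        if dropoff not in path[first:]:
            return False
    return True

if __name__ == "__main__":
    trips = [(1, 100), (50, 49)]
    for route in ([100, 49], [50, 49, 100]):
        print(route, route_length(route), serves_all(route, trips))
```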

Here's one way to formulate this problem as a Time-based Integer Program. (It might seem like an overkill to generate all the constraints, but it is guaranteed to produce the optimal solution)
Let's say that elevator takes 1 unit of time to go from floor F to F+1 or to F-1.
The Insight: We use the fact that at any time t, there is only one decision to be made. Whether to go UP or to go DOWN. That is the Decision Variable for our problem. DIR_t = +1 if the elevator moves up at time t, -1 otherwise.
We want to minimize the time when all the passengers reach their destination.
This table makes it clearer
Time  FLOOR_t  DIR_t
1     1        1
2     2        1
3     3        1
4     4        1
...   ...      ...
49    49       1
50    50       -1
51    49       1
52    50       1
...   ...      ...
101   99       1
102   100      NA
Now, let's bring in the passengers. There are P passengers and each one wants to go from
SF to EF (their starting Floor to their ending floor, their destination.)
So we are given (SF_p, EF_p) for each passenger p.
Constraints
We know that the Floor in which the elevator is present at time t is
F_t = F_t-1 + DIR_t-1
(F0 = 0, DIR_0 = 1, F1 = 1 just to start things off.)
Now, let ST_p be the time instant when passenger p Starts their elevator journey. Let ET_p be the time instant when passenger p ends their elevator journey.
Note that SF and EF are input parameters given to us, but ST and ET are variables that the IP will set when solving. That is, the floors are given to us, we have to come up with the times.
ST_p = t if F_t = SF_p # whenever the elevator comes to a passenger's starting floor, their journey starts.
ET_p = t if F_t = EF_p AND ST_p > 0 (a passenger cannot end their journey before it commenced.)
This can be enforced by introducing new 0/1 indicator variables.
ETp > STp # you can only get off after you got on
Finally, let's introduce one number T which is the time when the entire set of trips is done. It is the max of all ET's for each p. This is what needs to be minimized.
T > ET_p for all p # we want to find the time when the last passenger gets off.
Formulation
Putting it all together:
Min T
T > ET_p for all p
F_t = F_t-1 + DIR_t-1
ETp > STp # you can only get off after you got on
ST_p = t if F_t = SF_p # whenever the elevator comes to a passenger's starting floor, their journey starts.
ET_p = t if F_t = EF_p AND ST_p > 0
ET_p >= 1 #everyone should end their journey. Otherwise model will give 0 as the obj function value.
DIR_t = (+1, -1) # can be enforced with 2 binary variables if needed.
Now after solving this IP problem, the exact trip can be traced using the values of each DIR_t for each t.

There's a polynomial-time dynamic program whose running time does not depend on the number of floors. If we pick up passengers greedily and make them wait, then the relevant state is the interval of floors that the elevator has visited (hence the passengers picked up), the floor on which the elevator most recently picked up or dropped off, and two optional values: the lowest floor it is obligated to visit for the purpose of dropping off passengers currently inside, and the highest. All of this state can be described by the identities of five passengers plus a constant number of bits.
I'm quite sure that there is room for improvement here.

Your question mirrors disk-head scheduling algorithms.
Check out shortest seek time first vs scan, cscan, etc.
There are cases where sstf wins, but what if it was 50 to 10, and you also had 2 to 100, 3 to 100, 4 to 100, 5 to 100, 6 to 100 etc. You can see you add the overhead to all of the other people. Also, if incoming requests have a smaller seek time, starvation can occur (similar to process scheduling).
In your case, it really depends on if the requests are static or dynamic. If you want to minimize variance, go with scan/cscan etc.

In the comments to C.B.'s answer, the OP comments: "the requests are static. in the beginning i get the full list." I would welcome counter examples and/or other feedback since it seems to me that if we are given all trips in advance, the problem can be drastically reduced if we consider the following:
Since the elevator has an unlimited capacity, any trips going up that end lower than the highest floor we will visit are irrelevant to our calculation. Since we are guaranteed to pass all those pickups and dropoffs on the way to the highest point, we can place them in our schedule after considering the descending trips.
Any trips 'contained' in other trips of the same direction are also irrelevant since we will pass those pickups and dropoffs during the 'outer-most' trips, and may be appropriately scheduled after considering those.
Any overlapping descending trips may be combined for a reason soon to be apparent.
Any descending trips occur either before or after the highest point is reached (excluding the highest floor reached being a pickup). The optimal schedule for all descending trips that we've determined to occur before the highest point (considering only 'outer-container' types and two or more overlapping trips as one trip) is one-by-one as we ascend, since we are on the way up anyway.
How do we determine which descending trips should occur after the highest point?
We conduct our calculation in reference to one point, TOP. Let's call the trip that includes the highest floor reached H and the highest floor reached HFR. If HFR is a pickup, H is descending and TOP = H_dropoff. If HFR is a dropoff, H is ascending and TOP = HFR.
The descending trips that should be scheduled after the highest floor to be visited are all members of the largest group of adjacent descending trips (considering only 'outer-container' types and two or more overlapping trips as one trip) that we can gather, starting from the next lower descending trip after TOP and continuing downward, where their combined individual distances, doubled, is greater than the total distance from TOP to their last dropoff. That is, where (D1 + D2 + D3...+ Dn) * 2 > TOP - Dn_dropoff
Here's a crude attempt in Haskell:
import Data.List (sort,sortBy)
trips = [(101,100),(50,49),(25,19),(99,97),(95,93),(30,20),(35,70),(28,25)]
isDescending (a,a') = a > a'
areDescending a b = isDescending a && isDescending b
isContained aa@(a,a') bb@(b,b') = areDescending aa bb && a < b && a' > b'
extends aa@(a,a') bb@(b,b') = areDescending aa bb && a <= b && a > b' && a' < b'
max' aa@(a,a') bb@(b,b') = if (maximum [b,a,a'] == b) || (maximum [b',a,a'] == b')
                              then bb
                              else aa
(outerDescents,innerDescents,ascents,topTrip) = foldr f ([],[],[],(0,0)) trips where
  f trip (outerDescents,innerDescents,ascents,topTrip) = g outerDescents trip ([],innerDescents,ascents,topTrip) where
    g [] trip (outerDescents,innerDescents,ascents,topTrip) = (trip:outerDescents,innerDescents,ascents,max' trip topTrip)
    g (descent:descents) trip (outerDescents,innerDescents,ascents,topTrip)
      | not (isDescending trip) = (outerDescents ++ (descent:descents),innerDescents,trip:ascents,max' trip topTrip)
      | isContained trip descent = (outerDescents ++ (descent:descents),trip:innerDescents,ascents,topTrip)
      | isContained descent trip = (trip:outerDescents ++ descents,descent:innerDescents,ascents,max' trip topTrip)
      | extends trip descent = ((d,t'):outerDescents ++ descents,(t,d'):innerDescents,ascents,max' topTrip (d,t'))
      | extends descent trip = ((t,d'):outerDescents ++ descents,(d,t'):innerDescents,ascents,max' topTrip (t,d'))
      | otherwise = g descents trip (descent:outerDescents,innerDescents,ascents,topTrip)
     where (t,t') = trip
           (d,d') = descent
top = snd topTrip
scheduleFirst descents = (sum $ map (\(from,to) -> 2 * (from - to)) descents)
                       > top - (snd . last) descents
(descentsScheduledFirst,descentsScheduledAfterTop) =
  (descentsScheduledFirst,descentsScheduledAfterTop) where
    descentsScheduledAfterTop = (\x -> if not (null x) then head x else [])
                               . take 1 . filter scheduleFirst
                               $ foldl (\accum num -> take num sorted : accum) [] [1..length sorted]
    sorted = sortBy(\a b -> compare b a) outerDescents
    descentsScheduledFirst = if null descentsScheduledAfterTop
                                then sorted
                                else drop (length descentsScheduledAfterTop) sorted
scheduled = ((>>= \(a,b) -> [a,b]) $ sort descentsScheduledFirst)
         ++ (if isDescending topTrip then [] else [top])
         ++ ((>>= \(a,b) -> [a,b]) $ sortBy (\a b -> compare b a) descentsScheduledAfterTop)
place _ [] _ _ = error "topTrip was not calculated."
place floor' (floor:floors) prev (accum,numStops)
  | floor' == prev || floor' == floor = (accum ++ [prev] ++ (floor:floors),numStops)
  | prev == floor = place floor' floors floor (accum,numStops)
  | prev < floor = f
  | prev > floor = g
 where f | floor' > prev && floor' < floor = (accum ++ [prev] ++ (floor':floor:floors),numStops)
         | otherwise = place floor' floors floor (accum ++ [prev],numStops + 1)
       g | floor' < prev && floor' > floor = (accum ++ [prev] ++ (floor':floor:floors),numStops)
         | otherwise = place floor' floors floor (accum ++ [prev],numStops + 1)
schedule trip@(from,to) floors = take num floors' ++ fst placeTo
  where placeFrom@(floors',num) = place from floors 1 ([],1)
        trimmed = drop num floors'
        placeTo = place to (tail trimmed) (head trimmed) ([],1)
solution = foldl (\trips trip -> schedule trip trips) scheduled (innerDescents ++ ascents)
main = do print trips
          print solution
Output:
*Main> main
[(101,100),(50,49),(25,19),(99,97),(95,93),(30,20),(35,70),(28,25)]
[1,25,28,30,25,20,19,35,50,49,70,101,100,99,97,95,93]

Enumerate matrix combinations with fixed row and column sums

I'm attempting to find an algorithm (not a Matlab command) to enumerate all possible NxM matrices whose cells contain only non-negative integers and whose row and column sums are fixed (these sums are the parameters of the algorithm).
Example:
Enumerate all 2x3 matrices with row totals 2, 1 and column totals 0, 1, 2:
| 0 0 2 | = 2
| 0 1 0 | = 1
  0 1 2

| 0 1 1 | = 2
| 0 0 1 | = 1
  0 1 2
This is a rather simple example, but as N and M increase, as well as the sums, there can be a lot of possibilities.
Edit 1
I might have a valid arrangement to start the algorithm:
matrix = new Matrix(N, M) // NxM matrix filled with 0s
FOR i FROM 0 TO matrix.rows().count()
    FOR j FROM 0 TO matrix.columns().count()
        a = target_row_sum[i] - matrix.rows[i].sum()
        b = target_column_sum[j] - matrix.columns[j].sum()
        matrix[i, j] = min(a, b)
    END FOR
END FOR
target_row_sum[i] being the expected sum on row i.
In the example above it gives the 2nd arrangement.
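The same greedy fill, written as a short Python sketch. The function name and the use of nested lists are assumptions; the logic follows the pseudocode above, tracking how much of each row and column sum is still unassigned.

```python
def greedy_fill(row_sums, col_sums):
    """Fill cells left to right, top to bottom, putting as much as the
    remaining row and column budgets allow into each cell."""
    rows, cols = len(row_sums), len(col_sums)
    matrix = [[0] * cols for _ in range(rows)]
    row_left = list(row_sums)   # what each row still needs
    col_left = list(col_sums)   # what each column still needs
    for i in range(rows):
        for j in range(cols):
            value = min(row_left[i], col_left[j])
            matrix[i][j] = value
            row_left[i] -= value
            col_left[j] -= value
    return matrix

if __name__ == "__main__":
    for row in greedy_fill([2, 1], [0, 1, 2]):
        print(row)
```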
Edit 2:
(based on j_random_hacker's last statement)
Let M be any matrix satisfying the given conditions (fixed row and column sums, non-negative cell values).
Let (a, b, c, d) be 4 cell values in M where a and b are on the same row, c and d are on the same row, a and c are on the same column, and b and d are on the same column.
Let Xa be the row number of the cell containing a and Ya be its column number.
Example:
| 1 a b |
| 1 2 3 |
| 1 c d |
-> Xa = 0, Ya = 1
-> Xb = 0, Yb = 2
-> Xc = 2, Yc = 1
-> Xd = 2, Yd = 2
Here is an algorithm to get all the combinations that satisfy the initial conditions while varying only a, b, c and d:
// A matrix array containing a single element, M
// It will be filled with all possible combinations
matrices = [M]
I = min(a, d)
J = min(b, c)
// a and d decrease, b and c increase: every row and column sum is unchanged
FOR i FROM 1 TO I
    tmp_matrix = M
    tmp_matrix[Xa, Ya] = a - i
    tmp_matrix[Xb, Yb] = b + i
    tmp_matrix[Xc, Yc] = c + i
    tmp_matrix[Xd, Yd] = d - i
    matrices.add(tmp_matrix)
END FOR
// b and c decrease, a and d increase
FOR j FROM 1 TO J
    tmp_matrix = M
    tmp_matrix[Xa, Ya] = a + j
    tmp_matrix[Xb, Yb] = b - j
    tmp_matrix[Xc, Yc] = c - j
    tmp_matrix[Xd, Yd] = d + j
    matrices.add(tmp_matrix)
END FOR
It should then be possible to find every possible combination of matrix values:
Apply the algorithm to the first matrix for every possible group of 4 cells;
Recursively apply the algorithm to each sub-matrix obtained in the previous iteration, for every possible group of 4 cells except any group already used in a parent execution.
The recursive depth should be (N*(N-1)/2)*(M*(M-1)/2), each execution resulting in ((N*(N-1)/2)*(M*(M-1)/2) - depth)*(I+J+1) sub-matrices. But this creates a LOT of duplicate matrices, so this could probably be optimized.

Are you needing this to calculate Fisher's exact test? Because that requires what you're doing, and based on that page, it seems there will in general be a vast number of solutions, so you probably can't do better than a brute force recursive enumeration if you want every solution. OTOH it seems Monte Carlo approximations are successfully used by some software instead of full-blown enumerations.
I asked a similar question, which might be helpful. Although that question deals with preserving frequencies of letters in each row and column rather than sums, some results can be translated across. E.g. if you find any submatrix (pair of not-necessarily-adjacent rows and pair of not-necessarily-adjacent columns) with numbers
xy
yx
Then you can rearrange these to
yx
xy
without changing any row or column sums. However:
mhum's answer proves that there will in general be valid matrices that cannot be reached by any sequence of such 2x2 swaps. This can be seen by taking his 3x3 matrices and mapping A -> 1, B -> 2, C -> 4 and noticing that, because no element appears more than once in a row or column, frequency preservation in the original matrix is equivalent to sum preservation in the new matrix. However...
someone's answer links to a mathematical proof that it actually will work for matrices whose entries are just 0 or 1.
More generally, if you have any submatrix
ab
cd
where the (not necessarily unique) minimum is d, then you can replace this with any of the d+1 matrices
ef
gh
where h = d-i, g = c+i, f = b+i and e = a-i, for any integer 0 <= i <= d.
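For reference, the brute-force recursive enumeration mentioned at the start of this answer could look like the following Python sketch. It is not based on the 2x2 moves; it simply fills the matrix cell by cell, prunes with the remaining row and column budgets, and forces the last cell of each row. The function name is an assumption, and the running time grows with the number of solutions, which can be vast.

```python
def enumerate_matrices(row_sums, col_sums):
    """Yield every matrix of non-negative integers with the given row and column sums."""
    rows, cols = len(row_sums), len(col_sums)
    if sum(row_sums) != sum(col_sums):
        return  # no matrix can satisfy inconsistent totals
    matrix = [[0] * cols for _ in range(rows)]
    row_left, col_left = list(row_sums), list(col_sums)

    def fill(i, j):
        if i == rows:
            if all(c == 0 for c in col_left):
                yield [row[:] for row in matrix]
            return
        if j == cols - 1:
            # The last cell of a row is forced by the remaining row budget.
            if row_left[i] > col_left[j]:
                return
            choices = [row_left[i]]
        else:
            choices = range(min(row_left[i], col_left[j]) + 1)
        ni, nj = (i, j + 1) if j + 1 < cols else (i + 1, 0)
        for v in choices:
            matrix[i][j] = v
            row_left[i] -= v
            col_left[j] -= v
            yield from fill(ni, nj)
            row_left[i] += v
            col_left[j] += v
        matrix[i][j] = 0

    yield from fill(0, 0)

if __name__ == "__main__":
    for m in enumerate_matrices([2, 1], [0, 1, 2]):
        print(m)
```

On the 2x3 example from the question this yields the same two arrangements listed there.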

For an N x M matrix you have N*M unknowns and N+M equations. Put random numbers in the top-left (N-1) x (M-1) sub-matrix, except for the (N-1, M-1) element. Now you can find the closed form for the remaining N+M elements trivially.
More details: There are a total of T = N*M elements.
There are R = (N-1)*(M-1) - 1 randomly filled elements.
Remaining number of unknowns: T-R = N*M - (N-1)*(M-1) + 1 = N+M
