Looking for a clever way to sort a set of data - algorithm

I have a set of 80 students and I need to sort them into 20 groups of 4.
I have their previous exam scores from a prerequisite module, and I want to ensure that each group's average score is as close as possible to the overall average of the previous exam scores.
Sorry if that isn't particularly clear.
Here's a snapshot of the problem:
Student Score
AA 50
AB 45
AC 80
AD 70
AE 45
AF 55
AG 65
AH 90
So the average of the scores here is 62.5. How would I best go about sorting these eight students into two groups of four such that, for both groups, the average of their combined exam scores is as close as possible to 62.5?
My problem is exactly this but with 80 data points (20 groups) rather than 8 (2 groups).
The more I think about this problem the harder it seems.
Does anyone have any ideas?
Thanks

One Possible Solution:
I would try a greedy algorithm that starts by pairing each student with the student whose score gets the pair's average closest to your target average. After the initial pairing, you can then build groups of four out of the pairs using the same approach: take the average of two pair averages and compare that to the target mean.
However, this will not necessarily give you the optimal solution; it is a heuristic technique. One noted example below is when one low value must be offset by three high values to reach the target mean; groupings of that shape will not be found by this technique. If you know you have a relatively normal distribution centered around your target mean, though, then I think this approach should give a decent approximation.
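A minimal sketch of that greedy pairing in C++ (the Unit struct and pairUp helper are illustrative names of mine, not an established implementation; it pairs students in one round, then pairs the resulting pairs in a second round):

#include <cmath>
#include <iostream>
#include <vector>

struct Unit {
    std::vector<int> members;  // indices of the students in this unit
    double avg;                // average score of those students
};

// One greedy round: repeatedly take a unit and merge it with the remaining
// unit whose combined average lands closest to the target mean.
std::vector<Unit> pairUp(std::vector<Unit> units, double target) {
    std::vector<Unit> out;
    while (!units.empty()) {
        Unit a = units.back();
        units.pop_back();
        if (units.empty()) { out.push_back(a); break; }
        std::size_t best = 0;
        double bestDiff = 1e18;
        for (std::size_t i = 0; i < units.size(); ++i) {
            double diff = std::abs((a.avg + units[i].avg) / 2.0 - target);
            if (diff < bestDiff) { bestDiff = diff; best = i; }
        }
        Unit b = units[best];
        units.erase(units.begin() + best);
        b.members.insert(b.members.end(), a.members.begin(), a.members.end());
        b.avg = (a.avg + b.avg) / 2.0;  // both units have equal size, so exact
        out.push_back(b);
    }
    return out;
}

int main() {
    std::vector<int> scores = {50, 45, 80, 70, 45, 55, 65, 90};
    double target = 0;
    std::vector<Unit> units;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        units.push_back(Unit{{static_cast<int>(i)},
                             static_cast<double>(scores[i])});
        target += scores[i];
    }
    target /= scores.size();  // 62.5 for this example
    units = pairUp(units, target);  // round 1: pairs
    units = pairUp(units, target);  // round 2: groups of 4
    for (const Unit& g : units) {
        for (int m : g.members) std::cout << m << ' ';
        std::cout << "-> avg " << g.avg << '\n';
    }
}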

First sort the group by score, so it becomes:
AH 90
AC 80
.....
AB 45
AE 45
Then start combining the first with the last:
(AE, AH, 67.5)
(AB, AC, 62.5)
(AD, AA, 60)
(AG, AF, 60)
And so on. In the next round, combine the resulting pairs in the same way: the first pair with the last pair.
Another way:
1. Enumerate all possible groups of 4 students.
2. For every combination of groups, compute the absolute deviation of each group's average from the overall average, and sum these deviations over the combination.
3. Choose the combination of groups with the lowest sum.
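For the 8-student example this exhaustive search is tiny (only 35 ways to split 8 students into two groups of 4), but it blows up hopelessly for 80 students, so treat this sketch of the steps above as a verifier for small cases only. It enumerates the splits with a bitmask:

#include <bitset>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> scores = {50, 45, 80, 70, 45, 55, 65, 90};
    const int n = static_cast<int>(scores.size());
    double overall = 0;
    for (int s : scores) overall += s;
    overall /= n;

    double bestDev = 1e18;
    unsigned bestMask = 0;
    // Each bitmask of popcount n/2 is one group; fixing student 0 in the
    // first group counts every split exactly once.
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        if (std::bitset<32>(mask).count() != std::size_t(n / 2) ||
            !(mask & 1u))
            continue;
        double sumA = 0, sumB = 0;
        for (int i = 0; i < n; ++i)
            (((mask >> i) & 1u) ? sumA : sumB) += scores[i];
        double dev = std::abs(sumA / (n / 2) - overall) +
                     std::abs(sumB / (n / 2) - overall);
        if (dev < bestDev) { bestDev = dev; bestMask = mask; }
    }
    std::cout << "group A:";
    for (int i = 0; i < n; ++i)
        if ((bestMask >> i) & 1u) std::cout << ' ' << i;
    std::cout << " (total deviation " << bestDev << ")\n";
}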

Initially, I did think about the top-bottom match option.
However, as John has highlighted, the results certainly aren't optimal:
Scores Students Avg.
40 94 40 94 'AE' 'DA' 'AI' 'AR' 67
40 90 40 88 'AK' 'CI' 'AM' 'BP' 64.5
40 85 40 80 'AQ' 'AW' 'AT' 'BD' 61.25
40 79 40 77 'AU' 'BC' 'AV' 'AB' 59
40 76 40 75 'AX' 'CG' 'AZ' 'CQ' 57.75
40 75 40 75 'BF' 'CB' 'BN' 'BQ' 57.5
40 75 40 74 'BR' 'BI' 'CF' 'CZ' 57.25
40 74 40 74 'CK' 'CO' 'CP' 'AL' 57
40 72 41 71 'DB' 'CN' 'AG' 'BO' 56
41 71 42 70 'CD' 'BM' 'AH' 'BS' 56
42 70 42 69 'BG' 'BL' 'CU' 'CX' 55.75
43 68 44 67 'BK' 'CY' 'AD' 'CE' 55.5
44 64 44 64 'BJ' 'CR' 'BZ' 'BY' 54
45 64 45 63 'BW' 'BV' 'CS' 'BE' 54.25
45 62 47 60 'CV' 'CH' 'AC' 'CM' 53.5
47 59 47 58 'BT' 'AY' 'CL' 'AP' 52.75
47 57 48 57 'CT' 'BA' 'BX' 'AS' 52.25
48 56 49 56 'CA' 'AJ' 'AN' 'AA' 52.25
50 55 50 54 'BB' 'AF' 'CJ' 'AO' 52.25
51 52 51 52 'CC' 'BU' 'CW' 'BH' 51.5

Related

Is it possible to find the maximum value of 2 or more columns in a table?

For example, I have a table as follows:
id math science english history
1 80 90 90 90
2 70 60 81 78
3 69 50 45 80
4 30 40 10 80
I only want to find the maximum value in the columns math and science.
Is it possible?
Simply use this:
select max(science),max(math) from your_table

Identifying DEFLATE Algorithm Variant Being Used in Proprietary File Format

Disclaimer: This problem requires a very good knowledge of the DEFLATE algorithm.
I am hoping I could solicit some ideas identifying the compression algorithm being used in a particular file format. This is a legacy proprietary format that my application needs to support, so we are trying to reverse engineer it. (Going to the original creator is not an option, for reasons I won't get into).
I'm extremely close to cracking it, but I feel like I'm living Zeno's paradox, because every day I seem to get halfway closer to the finish line but never reach it!
Here's what I know so far:
It is definitely using something extremely similar to the DEFLATE algorithm. Similarities:
- The compressed data is represented by canonical Huffman codes (usually starting with 000, but I'm not sure that is always the case).
- The data is preceded (I believe immediately) by a header table which identifies the bit lengths of each of the actual codes. Like DEFLATE, this table ALSO comprises canonical Huffman codes (starting either at 0 or 00). These codes provide the bit lengths of each character in the 0-255+ alphabet plus whatever distance codes might be used.
- Finally, again like DEFLATE, the header table with the bit lengths for the main codes is itself preceded (I think immediately) by a series of 3-bit codes used to derive the header table codes (I'll call this the "pre-header").
At this point the similarities seem to end though.
The 3-bit codes in the pre-header do not appear to go in the 16, 17, 18, 0, 8 ... optimal order specified by DEFLATE, but rather seem to go sequentially, like 6 7 8 9....
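For reference, the optimal order prescribed by RFC 1951, section 3.2.7, written out as a constant:

// RFC 1951, section 3.2.7: the order in which DEFLATE stores the 3-bit
// code-length-code lengths in a dynamic block header.
const int codeLengthOrder[19] = {16, 17, 18, 0, 8, 7, 9, 6, 10, 5,
                                 11, 4, 12, 3, 13, 2, 14, 1, 15};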
Another difference is that each 3-bit code is not necessarily a literal bit length. For example, here's a header that I've mostly deciphered (I'm 99.99% confident it is correct):
00000001011 100 010 110 010 010 011 010 110 101 100 011 010 010 011 100 010 111
*0* skA *3* *4* *5* *6* *7* *8* *9* skB
Ignoring the unmarked bits, this results in the following code table:
00 7-bits
01 8-bits
100 6-bits
101 9-bits
1100 0-bits (skip code)
1101 skA = skip 3 + value of next 4 bits
1110 5-bits
11110 4-bits
111110 skB = skip 11? + value of next 9 bits
111111 3-bits
The most glaring problem is that there are additional bit-lengths in the header table that are unused. And, in fact, they could not be used at all, as there cannot be any additional 2-bit or 3-bit codes, for example, if the codes are to be canonical (right?).
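For reference, here is the standard canonical assignment from RFC 1951, section 3.2.2; fed the bit lengths of the table just derived (a sketch of mine, with the symbol order hard-coded to match the list above), it reproduces exactly those codes:

#include <cstdio>
#include <vector>

// Assign canonical codes from bit lengths (RFC 1951, section 3.2.2): codes
// of each length are consecutive values, and each length's first code
// follows on from the previous length's codes, left-shifted by one.
std::vector<unsigned> assignCanonicalCodes(const std::vector<int>& lengths) {
    int maxLen = 0;
    for (int l : lengths) maxLen = l > maxLen ? l : maxLen;
    std::vector<int> countPerLen(maxLen + 1, 0);
    for (int l : lengths) if (l > 0) ++countPerLen[l];
    std::vector<unsigned> nextCode(maxLen + 1, 0);
    unsigned code = 0;
    for (int bits = 1; bits <= maxLen; ++bits) {
        code = (code + countPerLen[bits - 1]) << 1;
        nextCode[bits] = code;
    }
    std::vector<unsigned> codes(lengths.size(), 0);
    for (std::size_t sym = 0; sym < lengths.size(); ++sym)
        if (lengths[sym] > 0) codes[sym] = nextCode[lengths[sym]]++;
    return codes;
}

int main() {
    // Code lengths of the ten header symbols above, in the order listed
    // (7-bit, 8-bit, 6-bit, 9-bit, skip, skA, 5-bit, 4-bit, skB, 3-bit).
    std::vector<int> lens = {2, 2, 3, 3, 4, 4, 4, 5, 6, 6};
    std::vector<unsigned> codes = assignCanonicalCodes(lens);
    for (std::size_t i = 0; i < lens.size(); ++i) {
        std::printf("symbol %2zu: ", i);
        for (int b = lens[i] - 1; b >= 0; --b)
            std::putchar(((codes[i] >> b) & 1u) ? '1' : '0');
        std::putchar('\n');  // prints 00, 01, 100, 101, 1100, ... as above
    }
}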
The author is also using non-standard codes for 16+. They don't seem to use the copy code (16 in DEFLATE) at all; the main headers all have huge strings of identical length codes (terribly inefficient...), and the skip codes use the next 4 and 9 bits to determine the number of skips, respectively, rather than 3 and 7 as in DEFLATE.
Yet another key difference is in the very first bits of the header. In DEFLATE the first bits are HLIT(5), HDIST(5), and HCLEN(4). If I interpreted the above header that way using LSB packing, I'd get HLIT = 257 (correct), HDIST = 21 (unsure if correct) and HCLEN = 7 (definitely not correct). If I use MSB packing instead, I'd get HLIT=257, HDIST = 6 (more likely correct) and HCLEN = 16 (appears correct). BUT, I don't think there are actually intended to be 14 bits in the prefix because I appear to need the "100" (see above) for the bit count of the 0-bit (skip) code. And in other examples, bits 10-13 don't appear to correlate to the length of the pre-header at all.
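A minimal bit reader supporting both packings helps test either interpretation; in this sketch (the class and names are mine) it reads the first 11 bits of bytes 5-8 of the data block quoted at the end of the post, MSB-first:

#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal bit reader supporting both MSB-first packing and DEFLATE's
// LSB-first packing, for testing either interpretation of the header.
class BitReader {
public:
    BitReader(const std::vector<uint8_t>& data, bool msbFirst)
        : data_(data), msbFirst_(msbFirst) {}

    uint32_t read(int n) {  // read n bits (n <= 32)
        uint32_t v = 0;
        for (int i = 0; i < n; ++i) {
            std::size_t byte = pos_ >> 3;
            int bit = static_cast<int>(pos_ & 7);
            uint32_t b = msbFirst_ ? (data_[byte] >> (7 - bit)) & 1u
                                   : (data_[byte] >> bit) & 1u;
            if (msbFirst_) v = (v << 1) | b;  // value arrives MSB-first
            else           v |= b << i;       // DEFLATE builds values LSB-first
            ++pos_;
        }
        return v;
    }

private:
    const std::vector<uint8_t>& data_;
    bool msbFirst_;
    std::size_t pos_ = 0;  // absolute bit position
};

int main() {
    // Bytes 5-8 of the sample data block: 01 71 64 9A.
    std::vector<uint8_t> preHeader = {0x01, 0x71, 0x64, 0x9A};
    BitReader r(preHeader, /*msbFirst=*/true);
    std::printf("%u\n", r.read(11));  // prints 11 = binary 00000001011
}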
Speaking of other examples, not every file appears to follow the same header format. Here's another header:
00000001000 100 100 110 011 010 111 010 111 011 101 010 110 100 011 101 000 100 011
In this second example, I again happen to know that the code table for the header is:
0 8-bits
10 7-bits
110 6-bits
11100 skA
11101 5-bits
111100 0-bits (skip)
111101 skB
111110 9-bits
1111110 3-bits
1111111 4-bits
However, as you can see, many of the required code lengths are not in the header at all. For example, there's no "001" to represent the 8-bit code, and they are not even close to being in sequence (neither consecutive nor in the optimal 16, 17, 18... order).
And yet, if we shift the bits left by 1:
skA *0* *5* *6* *7* *8* *9*
0000000100 010 010 011 001 101 011 101 011 101 110 101 011 010 001 110 100 010 001 1
This is much better, but we still can't correctly derive the code for skB (110), or 3 or 4 (111). Shifting by another bit does not improve the situation.
Incidentally, if you're wondering how I am confident that I know the code tables in these two examples, the answer is A LOT of painstaking reverse engineering, i.e., looking at the bits in files that differ slightly or have discernible patterns, and deriving the canonical code table being used. These code tables are 99+% certain to be correct.
To summarize, then, we appear to have an extremely close variant of DEFLATE, but for inexplicable reasons one that uses some kind of non-standard pre-header. Where I am getting tripped up, of course, is identifying which pre-header bits correspond to the code bit-lengths for the main header. If I had that, everything would fall into place.
I have a couple of other examples I could post, but rather than ask people to do pattern matching for me, what I'm really praying for is that someone will recognize the algorithm being used and be able to point me to it. I find it unlikely that the author, rather than using an existing standard, would have gone to the trouble of coding his own algorithm from scratch that was 99% like DEFLATE but then changed the pre-header structure only slightly. It makes no sense; if they simply wanted to obfuscate the data to prevent what I'm trying to do, there are much easier and more effective ways.
The software dates back to the late 90s, early 2000s, by the way, so consider what was being done back then. This is not "middle out" or anything new and crazy. It's something old and probably obscure. I'm guessing some variant of DEFLATE that was in use in some semi-popular library around that time, but I've not been having much luck finding information on anything that isn't actually DEFLATE.
Many, many thanks for any input.
Peter
PS - As requested, here is the complete data block from the first example in the post. I don't know if it'll be of much use, but here goes. BTW, the first four bytes are the uncompressed output size. The fifth byte begins the pre-header.
B0 01 00 00 01 71 64 9A D6 34 9C 5F C0 A8 B6 D4 D0 76 6E 7A 57 92 80 00 54 51 16 A1 68 AA AA EC B9 8E 22 B6 42 48 48 10 9C 11 FE 10 84 A1 7E 36 74 73 7E D4 90 06 94 73 CA 61 7C C8 E6 4D D8 D9 DA 9D B7 B8 65 35 50 3E 85 B0 46 46 B7 DB 7D 1C 14 3E F4 69 53 A9 56 B5 7B 1F 8E 1B 3C 5C 76 B9 2D F2 F3 7E 79 EE 5D FD 7E CB 64 B7 8A F7 47 4F 57 5F 67 6F 77 7F 87 8F 97 9D FF 4F 5F 62 DA 51 AF E2 EC 60 65 A6 F0 B8 EE 2C 6F 64 7D 39 73 41 EE 21 CF 16 88 F4 C9 FD D5 AF FC 53 89 62 0E 34 79 A1 77 06 3A A6 C4 06 98 9F 36 D3 A0 F1 43 93 2B 4C 9A 73 B5 01 6D 97 07 C0 57 97 D3 19 C9 23 29 C3 A8 E8 1C 4D 3E 0C 24 E5 93 7C D8 5C 39 58 B7 14 9F 02 53 93 9C D8 84 1E B7 5B 3B 47 72 E9 D1 B6 75 0E CD 23 5D F6 4D 65 8B E4 5F 59 53 DF 38 D3 09 C4 EB CF 57 52 61 C4 BA 93 DE 48 F7 34 B7 2D 0B 20 B8 60 60 0C 86 83 63 08 70 3A 31 0C 61 E1 90 3E 12 32 AA 8F A8 26 61 00 57 D4 19 C4 43 40 8C 69 1C 22 C8 E2 1C 62 D0 E4 16 CB 76 50 8B 04 0D F1 44 52 14 C5 41 54 56 15 C5 81 CA 39 91 EC 8B C8 F5 29 EA 70 45 84 48 8D 48 A2 85 8A 5C 9A AE CC FF E8
Edit 7/11/2015
I've managed to decipher quite a bit of additional information. The algorithm is definitely using LZ77 and Huffman coding. The length codes and extra bits all seem to match those used in DEFLATE.
I was able to learn a lot more detail about the pre-header as well. It has the following structure:
HLEN 0 SkS SkL ?? 3 4 5 6 7 8 9 HLIT
00000 00101110 001 0 1100 100 100 110 10 110 101 100 011 010 010 011 100010111
HLEN = the last bit-length in the pre-header - 3 (e.g. 1100 (12) means 9 is the last bit-length code)
HLIT = the number of literal codes in the main dictionary
SkS = "skip short" - skips a # of codes determined by the next 4-bits
SkL = "skip long" - skips a # of codes determined by the next 9-bits
0 - 9 = the number of bits in the dictionary codes for the respective bit lengths
The unmarked bits I'm still unable to decipher. Also, what I'm now seeing is that the pre-header codes themselves appear to have some extra bits thrown in (note the ?? between SkL and 3, above). They're not all straight 3-bit codes.
So, the only essential information that's now missing is:
How to parse the pre-header for extra bits and whatnot; and
How many distance codes follow the literal codes
If I had that information, I could actually feed the remaining data to zlib by manually supplying the code length dictionary along with the correct number of literal vs. distance codes. Everything after this header follows DEFLATE to the letter.
Here are some more example headers, with the bit-length codes indicated along with the number of literal and length codes. Note in each one I was able to reverse engineer the answers, but I remain unable to match the undeciphered bits to those statistics.
Sample 1
(273 literals, 35 length, 308 total)
????? ???????? ??? ? HLEN 0 SkS SkL ?? 3 ? 4 ? 5 6 7 8 9 HLIT
00000 00100010 010 0 1100 110 101 110 10 111 0 111 0 101 011 010 001 110 100010001
Sample 2
(325 literal, 23 length, 348 total)
????? ???????? ??? ? HLEN 0 SkS SkL ?? 3 4 5 6 7 8 9 HLIT
00000 00110110 001 0 1100 101 101 110 10 110 000 101 000 011 010 001 101000101
Sample 3
(317 literal, 23 length, 340 total)
????? ???????? ??? ? HLEN 0 SkS SkL ??? 4 5 ? 6 7 8 9 HLIT
00000 01000100 111 0 1100 000 101 111 011 110 111 0 100 011 010 001 100111101
Sample 4
(279 literals, 18 length, 297 total)
????? ???????? ??? ? HLEN 0 SkS SkL ?? 3 4 5 6 7 8 9 HLIT
00000 00101110 001 0 1100 100 100 110 10 110 101 100 011 010 010 011 100010111
Sample 5
(258 literals, 12 length, 270 total)
????? ???????? ??? ? HLEN 0 SkS SkL ?? 2 3 4 HLIT
00000 00000010 000 0 0111 011 000 011 01 010 000 001 100000010
I'm still hoping someone has seen a non-standard DEFLATE-style header like this before. Or maybe you'll see a pattern I'm failing to see... Many thanks for any further input.
Well I finally managed to fully crack it. It was indeed using an implementation of LZ77 and Huffman coding, but very much a non-standard DEFLATE-like method for storing and deriving the codes.
As it turns out the pre-header codes were themselves fixed-dictionary Huffman codes and not literal bit lengths. Figuring out the distance codes was similarly tricky because unlike DEFLATE, they were not using the same bit-length codes as the literals, but rather were using yet another fixed-Huffman dictionary.
The takeaway for anyone interested is that apparently, there are old file formats out there using DEFLATE-derivatives. They CAN be reverse engineered with determination. In this case, I probably spent about 100 hours total, most of which was manually reconstructing compressed data from the known decompressed samples in order to find the code patterns. Once I knew enough about what they were doing to automate that process, I was able to make a few dozen example headers and thereby find the patterns.
I still fail to understand why they did this rather than use a standard format. It must have been a fair amount of work deriving a new compression format versus just using ZLib. If they were trying to obfuscate the data, they could have done so much more effectively by encrypting it, xor'ing with other values, etc. Nope, none of that. They just decided to show off their genius to their bosses, I suppose, by coming up with something "new" even if the differences from the standard were trivial and added no value other than to make MY life difficult. :)
Thanks to those who offered their input.

Pyramidal algorithm

I'm trying to find an algorithm with which I can go through a numerical pyramid, starting from the top of the pyramid and moving forward through adjacent numbers in the next row, where each number visited is added to a final sum. The thing is, I have to find the route that returns the highest result.
I already tried always taking the higher adjacent number in the next row, but that is not the answer, because it does not always give the best route.
For example:
34
43 42
67 89 68
05 51 32 78
72 25 32 49 40
If I always go through the highest adjacent number, it is:
34 + 43 + 89 + 51 + 32 = 249
But if I go:
34 + 42 + 68 + 78 + 49 = 271
In the second case the result is higher, but I made that route by hand and I can't think of an algorithm that gets the highest result in all cases.
Can anyone give me a hand?
(Please tell me if I did not express myself well)
Start with the bottom row. As you go from left to right, consider each pair of adjacent numbers. Now go up one row: for the number sitting above that pair, compute its sum with each of the two numbers below it and keep the larger sum.
Basically you are looking at the triangles formed by the bottom row and the row above. So for your original triangle,
34
43 42
67 89 68
05 51 32 78
72 25 32 49 40
the bottom left triangle looks like,
05
72 25
So you would add 72 + 05 = 77, as that is the largest sum between 72 + 05 and 25 + 05.
Similarly,
51
25 32
will give you 51 + 32 = 83.
If you continue this approach for each two adjacent numbers and the number above, you can discard the bottom row and replace the row above with the computed sums.
So in this case, the second to last row becomes
77 83 81 127
and your new pyramid is
34
43 42
67 89 68
77 83 81 127
Keep doing this and your pyramid starts shrinking until you have one number which is the number you are after.
34
43 42
150 172 195
34
215 237
Finally, you are left with one number, 271.
Starting at the bottom (row by row), add the larger of the two values below each element to that element.
So, for your tree, 05 for example will get replaced by max(72, 25) + 05 = 77. Later you'll add the maximum of that value and the new value of the 51 element to 67.
The top-most node will be the maximum sum.
Not to spoil all your fun, I'll leave the implementation to you, or the details of getting the actual path, if required.
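Both answers describe the same bottom-up dynamic programming. A minimal sketch with the question's pyramid hard-coded:

// Bottom-up dynamic programming over the pyramid: replace each element with
// itself plus the larger of its two children, row by row from the bottom.
// The apex then holds the best obtainable sum (271 for this example).
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<std::vector<int>> rows = {
        {34},
        {43, 42},
        {67, 89, 68},
        {5, 51, 32, 78},
        {72, 25, 32, 49, 40},
    };
    for (int r = static_cast<int>(rows.size()) - 2; r >= 0; --r)
        for (std::size_t c = 0; c < rows[r].size(); ++c)
            rows[r][c] += std::max(rows[r + 1][c], rows[r + 1][c + 1]);
    std::cout << rows[0][0] << '\n';  // prints 271
}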

Algorithm to find average of group of numbers

I have a quite small list of numbers (a few hundred max) like for example this one:
117 99 91 93 95 95 91 97 89 99 89 99
91 95 89 99 89 99 89 95 95 95 89 948
189 99 89 189 189 95 186 95 93 189 95
189 89 193 189 93 91 193 89 193 185 95
89 194 185 99 89 189 95 189 189 95 89
189 189 95 189 95 89 193 101 180 189
95 89 195 185 95 89 193 89 193 185 99
185 95 189 95 89 193 91 190 94 190 185
99 89 189 95 189 189 95 185 95 185 99
89 189 95 189 186 99 89 189 191 95 185
99 89 189 189 96 89 193 189 95 185 95
89 193 95 189 185 95 93 189 189 95 186
97 185 95 189 95 185 99 185 95 185 99
185 95 190 95 185 95 95 189 185 95 189
2451
If you create a graph with X = the number and Y = the number of times we see that number, each group shows up as a distinct spike.
What I want is to know the average of each group of numbers. In the example there are 4 groups, and the resulting numbers are 92, 187, 948 and 2451.
The number of groups is not known in advance.
Do you have any idea how to create a (simple if possible) algorithm to extract these resulting numbers (if possible in C or pseudo code or English :)?
What you want to do is called clustering. If the data you've shown is typical, a greedy approach, such as neighbor joining, should be sufficient. So the procedure is:
1) Apply neighbor joining
2) Apply an (empirically identified) threshold to define the clusters
3) Calculate average of each cluster
Using a package that already has clustering algorithms, such as R, would probably be the easiest course, though neighbor joining is not a particularly hard algorithm.
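If you don't want to pull in R, here is a minimal C++ sketch of the same idea: sort, start a new cluster wherever the gap between neighbours exceeds a threshold (the threshold of 50 is an assumption you would identify empirically, per step 2), and average each cluster. The input is a handful of the question's numbers:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> nums = {117, 99, 91, 93, 95, 948, 189, 185, 2451, 89};
    std::sort(nums.begin(), nums.end());
    const int gapThreshold = 50;  // assumed tuning knob
    double sum = nums[0];
    int count = 1;
    for (std::size_t i = 1; i <= nums.size(); ++i) {
        // Close the current cluster at the end of input or at a large gap.
        if (i == nums.size() || nums[i] - nums[i - 1] > gapThreshold) {
            std::cout << "cluster average: " << sum / count << '\n';
            if (i == nums.size()) break;
            sum = nums[i]; count = 1;
        } else {
            sum += nums[i]; count += 1;
        }
    }
}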
I think std::map<int,int> can easily solve this problem. The key of the map would be the number, and the value would be the frequency with which the number occurs.
So the average can be calculated as:
int average = (m[key] * key) / count;
where count is the total number of numbers, so it calculates the average of each group over all numbers, as you didn't clearly mention what you mean by average. I'm also assuming that each distinct number forms its own group!
Here's a way:
Decide what width your bins will be. Let's say 10 (e.g. numbers > -5 and <= 5 go into bin 0, numbers > 5 and <= 15 go into bin 1, ...).
Create a structure which holds the numbers in each bin. I'd go with something like map<unsigned int, vector<unsigned int> *> in C++.
Now iterate over the numbers and decide what bin each belongs to. Check if there's already a vector for this bin in your map; if not, create one. Add the number to the vector.
After iterating over all the numbers, simply calculate the average of each vector.
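A sketch of that binning approach; I've held the vectors in the map by value rather than through pointers, which is simpler and behaves the same:

// Binning sketch: numbers go into fixed-width bins keyed by number/width,
// then each non-empty bin reports the average of its members.
#include <iostream>
#include <map>
#include <vector>

int main() {
    const int width = 10;
    std::vector<int> nums = {117, 99, 91, 93, 95, 948, 189, 185, 2451, 89};
    std::map<int, std::vector<int>> bins;
    for (int n : nums)
        bins[n / width].push_back(n);  // operator[] creates the bin on demand
    for (const auto& [bin, members] : bins) {
        double sum = 0;
        for (int m : members) sum += m;
        std::cout << "bin " << bin << ": avg "
                  << sum / members.size() << '\n';
    }
}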
So you are looking for "spikes" in the graph. I'm guessing you are interested in the size and position of each group?
You might use something like this:
Sort the numbers
Loop:
Take the highest number you have
Investigate more numbers until you find a number that is too small to belong to the group (maybe 5% smaller)
Calculate the average of the selected numbers
Let the discarded number be the last number
End loop
In PHP you could do it like this:
$array = array(/* your numbers */);
$average = array_sum($array) / count($array);
With multiple groups of numbers you can do something like:
$array = array(
    array(/* numbers in group 1 */),
    array(/* numbers in group 2 */),
    // etc.
);
foreach ($array as $numbers)
{
    $average[] = array_sum($numbers) / count($numbers);
}
Unless you're looking for the median or mode.
Ah, I see what you're asking now, you're not asking how to find the average, you're asking how to group the numbers up and find the average of each group.
Let's see. You'd have to find the mode: $counts = array_count_values($array); $modes = array_keys($counts, max($counts)); will do that. The keys in $counts will be the values of the original array, with the values in $counts being the number of times each number shows up. Then you need to figure out where the bigger gaps in the keys of $counts are. You could also array_unique() the original array and find the gaps in the values.
Wish my statistics teacher had done a bit more than play poker with us, or I could probably figure out the exact statistical method to determine how big the range checked to determine the groups should be.

Finding a set of permutations, with a constraint

I have a set of N^2 numbers and N bins. Each bin is supposed to have N numbers from the set assigned to it. The problem I am facing is finding a set of distributions that map the numbers to the bins, satisfying the constraint, that each pair of numbers can share the same bin only once.
A distribution can nicely be represented by an NxN matrix, in which each row represents a bin. Then the problem is finding a set of permutations of the matrix's elements, in which each pair of numbers shares the same row only once. It's irrelevant which row it is, only that two numbers were both assigned to the same one.
Example set of 3 permutations satisfying the constraint for N=8:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
0 9 18 27 36 45 54 63
1 10 19 28 37 46 55 56
2 11 20 29 38 47 48 57
3 12 21 30 39 40 49 58
4 13 22 31 32 41 50 59
5 14 23 24 33 42 51 60
6 15 16 25 34 43 52 61
7 8 17 26 35 44 53 62
A permutation that doesn't belong in the above set:
0 10 20 30 32 42 52 62
1 11 21 31 33 43 53 63
2 12 22 24 34 44 54 56
3 13 23 25 35 45 55 57
4 14 16 26 36 46 48 58
5 15 17 27 37 47 49 59
6 8 18 28 38 40 50 60
7 9 19 29 39 41 51 61
Because of multiple collisions with the second permutation: for example, they both pair the numbers 0 and 32 in one row.
Enumerating three is easy: take one arbitrary permutation, its transpose, and a matrix whose rows are made of the previous matrix's diagonals.
I can't find a way to produce a set consisting of more, though. It seems to be either a very complex problem, or a simple problem with an unobvious solution. Either way I'd be thankful if somebody had any ideas how to solve it in reasonable time for the N=8 case, or identified the proper academic name of the problem, so I could google for it.
In case you were wondering what it is useful for: I'm looking for a scheduling algorithm for a crossbar switch with 8 buffers, which serves traffic to 64 destinations. This part of the scheduling algorithm is input-traffic agnostic, and switches cyclically between a number of hardwired destination-buffer mappings. The goal is to have each pair of destination addresses compete for the same buffer only once in the cycling period, and to maximize that period's length. In other words, each pair of addresses should compete for the same buffer as seldom as possible.
EDIT:
Here's some code I have.
CODE
It's greedy; it usually terminates after finding the third permutation. But there should exist a set of at least N permutations satisfying the problem.
The alternative would require that choosing permutation I involve looking ahead at permutations I+1..N, to check whether permutation I is part of the solution consisting of the maximal number of permutations. That would require enumerating all the candidate permutations at each step, which is prohibitively expensive.
What you want is a combinatorial block design. Using the nomenclature on the linked page, you want designs with (v, k, lambda) = (n^2, n, 1). This gives n(n+1) blocks, i.e. n+1 permutations of n rows each in your nomenclature, which is the maximum theoretically possible by a counting argument (see the explanation in the article for the derivation of b from v, k, and lambda). Such designs exist for n = p^k for some prime p and integer k, using an affine plane. It is conjectured that affine planes exist only for these sizes. Therefore, if you can select n, maybe this answer will suffice.
However, if instead of the maximum theoretically possible number of permutations, you just want to find a large number (the most you can for a given n^2), I am not sure what the study of these objects is called.
Make a 64 x 64 x 8 array bool forbidden[i][j][k], which indicates whether the pair (i, j) has appeared in row k. Each time you use the pair (i, j) in row k, set the associated value in this array to one. Note that you will only use the half of this array for which i < j.
To construct a new permutation, start by trying the member 0, and verify that at least seven of the entries forbidden[0][j][0] are unset. If there are not seven left, increment and try again. Repeat to fill out the rest of the row, and repeat this whole process to fill the entire NxN permutation.
There are probably optimizations you should be able to come up with as you implement this, but this should do pretty well.
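A sketch of that greedy construction; I've collapsed the table to two dimensions, since the constraint doesn't care which row a pair met in, and the names are mine:

// Greedy construction sketch: build each row by scanning numbers in order
// and accepting one only if it has never shared a row with any number
// already placed in the current row. A 2-D "used pair" table suffices.
#include <iostream>
#include <vector>

int main() {
    const int N = 8, M = N * N;
    std::vector<std::vector<bool>> used(M, std::vector<bool>(M, false));
    int found = 0;
    for (int perm = 0; perm < N + 1; ++perm) {
        std::vector<std::vector<int>> rows(N);
        std::vector<bool> placed(M, false);
        bool ok = true;
        for (int r = 0; r < N && ok; ++r) {
            for (int v = 0; v < M && (int)rows[r].size() < N; ++v) {
                if (placed[v]) continue;
                bool clash = false;
                for (int w : rows[r]) if (used[v][w]) { clash = true; break; }
                if (!clash) { rows[r].push_back(v); placed[v] = true; }
            }
            ok = (int)rows[r].size() == N;
        }
        if (!ok) break;             // greedy dead end, as described above
        for (auto& row : rows)      // record the new pairings
            for (int a : row)
                for (int b : row)
                    if (a != b) used[a][b] = true;
        ++found;
    }
    std::cout << "greedy found " << found << " permutations\n";
}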
Possibly you could reformulate your problem into graph theory. For example, you start with the complete graph with N×N vertices. At each step, you partition the graph into N N-cliques, and then remove all edges used.
For this N=8 case, K64 has 64×63/2 = 2016 edges, and sixty-four lots of K8 have 1792 edges, so your problem may not be impossible :-)
Right, the greedy style doesn't work because you run out of numbers.
It's easy to see that there can't be more than 63 permutations before you violate the constraint: on the 64th, you'd have to pair at least one of the numbers with another it's already been paired with, by the pigeonhole principle.
In fact, if you use the table of forbidden pairs I suggested earlier, you find that a maximum of only N+1 = 9 permutations is possible before you run out. The table has N^2 x (N^2-1)/2 = 2016 non-redundant constraints, and each new permutation creates N x (N choose 2) = 8 x 28 = 224 new pairings. So all the pairings will be used up after 2016/224 = 9 permutations. It seems like realizing that there are so few permutations is the key to solving the problem.
You can generate a list of N permutations numbered n = 0 ... N-1 as
A_ij = (i * N + j + j * n * N) mod N^2
which generates each new permutation by shifting the columns of the previous one. The top row of the nth permutation is made of the diagonals of the (n-1)th permutation. EDIT: Oops... this only appears to work when N is prime.
This misses one last permutation, which you can get by transposing the matrix:
A_ij = j * N + i
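A sketch that generates those N shifted permutations plus the transpose, then brute-force checks the pairing constraint. Per the EDIT above, the shifts only work out when N is prime, so it uses N = 7; there, every pair ends up sharing a row exactly once:

// Generate the N shifted permutations A_ij = (i*N + j + j*n*N) mod N^2 plus
// the transpose, then verify that no pair of numbers shares a row twice.
#include <iostream>
#include <vector>

int main() {
    const int N = 7, M = N * N;  // N must be prime for the shifts to work
    std::vector<std::vector<std::vector<int>>> perms;
    for (int n = 0; n < N; ++n) {
        std::vector<std::vector<int>> A(N, std::vector<int>(N));
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                A[i][j] = (i * N + j + j * n * N) % M;
        perms.push_back(A);
    }
    std::vector<std::vector<int>> T(N, std::vector<int>(N));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            T[i][j] = j * N + i;  // the final, transposed permutation
    perms.push_back(T);

    std::vector<std::vector<bool>> used(M, std::vector<bool>(M, false));
    bool ok = true;
    for (const auto& A : perms)
        for (const auto& row : A)
            for (int a : row)
                for (int b : row)
                    if (a < b) {           // each unordered pair once
                        if (used[a][b]) ok = false;
                        used[a][b] = true;
                    }
    std::cout << perms.size() << " permutations, constraint "
              << (ok ? "holds" : "violated") << '\n';
}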
