QuickSort with middle element as pivot - algorithm

I am looking for an explanation of how quicksort works with the middle element as the pivot, but I couldn't find one. Is there a step-by-step demo of how the numbers get sorted? I find the algorithm really hard to follow otherwise. Thanks.

The vertical bars are around the pivot:
61 11 93 74 75 21 12|55|81 19 14 86 19 79 23 44
44 11 23|19|14 21 12 19
19|11|12 14
11
19|12|14
12
|19|14
14
19
19|21|23 44
|19|21
19
21
|23|44
23
44
81 55 75|86|74 79 93 61
81 55|75|61 74 79
74|55|61
55
|74|61
61
74
75|81|79
|75|79
75
79
81
|93|86
86
93
11 12 14 19 19 21 23 44 55 61 74 75 79 81 86 93
Based on this variation of Hoare partition scheme:
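/* swap the two ints pointed to by x and y */
void swap(int *x, int *y) {
    int t = *x;
    *x = *y;
    *y = t;
}

void QuickSort(int a[], int lo, int hi) {
    int i, j, p;
    if (lo >= hi)
        return;
    i = lo - 1;
    j = hi + 1;
    p = a[(lo + hi)/2];            /* pivot value = middle element */
    while (1)
    {
        while (a[++i] < p) ;       /* scan right for an element >= p */
        while (a[--j] > p) ;       /* scan left for an element <= p */
        if (i >= j)
            break;
        swap(a+i, a+j);
    }
    QuickSort(a, lo, j);           /* left part is lo..j, not lo..j-1 */
    QuickSort(a, j + 1, hi);
}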
Note that the pivot can end up in either the left or right part after partition step.
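To reproduce a trace like the one above, the same partition scheme can be instrumented. What follows is a minimal Python sketch rather than the answerer's code: the function name `quicksort` and the `trace` list are names chosen here for illustration, and only sub-arrays with two or more elements are recorded.

```python
def quicksort(a, lo=0, hi=None, trace=None):
    """Sort a[lo..hi] in place with a Hoare-style partition around the middle element."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = a[(lo + hi) // 2]                # pivot value = middle element
    if trace is not None:
        trace.append((a[lo:hi + 1], p))  # snapshot of the sub-array before partitioning
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < p:                  # scan right for an element >= p
            i += 1
        j -= 1
        while a[j] > p:                  # scan left for an element <= p
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    quicksort(a, lo, j, trace)           # left part is lo..j
    quicksort(a, j + 1, hi, trace)


data = [61, 11, 93, 74, 75, 21, 12, 55, 81, 19, 14, 86, 19, 79, 23, 44]
steps = []
quicksort(data, trace=steps)
for sub, pivot in steps:
    print(" ".join(map(str, sub)), "| pivot", pivot)
print(data)
```

Each printed line corresponds to one line with vertical bars in the trace; the single-element lines of the trace are the calls that return immediately.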

Quicksort chooses a pivot value and moves the smaller elements to the beginning of the array and the larger elements to the end. This is done by repeatedly scanning from both ends until a large/small pair is found and swapped.
After such a partition process, all elements smaller than the pivot are stored before those larger than the pivot. Then the process is repeated on both subarrays, recursively. Of course, when a subarray reduces to one or two elements, sorting them is trivial.
Recall that the pivot value can be chosen arbitrarily, provided there exists at least one element smaller and one larger in the array.


Need help for finding optimal path which visits multiple sequences of nodes

Summary
Recently I ran into a path-finding puzzle with some complex constraints (I don't have a solution for it yet).
A 2D matrix represents the graph. The length of a path is the number of traversed cells.
One or more number sequences are to be found inside the matrix. Each sequence is scored with a value.
A maximum path length is given. The number of picked cells must not exceed this value.
At any given moment, you can only choose cells in a specific column or row.
On each turn, you need to switch between column and row and stay on the same line as the last cell you picked. You have to move at right angles (the movement is like in the Snake game).
Always start by picking the first cell from the top row, then go vertically down to pick the second cell, and then continue switching between column and row as usual.
You can't choose the same cell twice. The resulting path must not contain duplicated cells.
For example:
The task is to find the shortest path in the graph, if one exists, that contains one or more sequences with the highest total score, without the path's length exceeding the provided maximum length.
The picture below demonstrates the solved puzzle with the resulting path marked in red:
Here, we have a path 3A-10-9B. This path contains the given sequence 3A-10-9B, which earns 10pts. More complex graphs typically have longer paths containing several sequences at once.
More complex examples
Multiple Sequences
You can complete sequences in any order. The order in which the sequences are listed doesn't matter.
Wasted Moves
Sometimes we are forced to waste moves and choose different cells that don't belong to any sequence. Here are the rules:
Able to waste 1 or 2 moves before the first sequence.
Able to waste 1 or 2 moves between any neighboring sequences.
However, you cannot break sequences and waste moves in the middle of them.
Here, we must waste one move before the sequence 3A-9B and two moves between sequences 3A-9B and 72-D4. Also, notice how red lines between 3A and 9B as well as between 72 and D4 "cross" previously selected cells D4 and 9B, respectively. You can pick different cells from the same row or column multiple times.
Optimal Sequences
Sometimes, it is not possible to have a path that contains all of the provided sequences. In this case, choose the path that achieves the highest score.
In the above example, we can complete either 9B-3A-72-D4 or 72-D4-3A but not both due to the maximum path length of 5 cells. We have chosen the sequence 9B-3A-72-D4 since it grants more score points than 72-D4-3A.
Unsolvable solution
The first sequence 3A-D4 can't be completed since the code matrix doesn't contain code D4 at all. The second sequence, 72-10, can't be completed for another reason: codes 72 and 10 aren't located in the same row or column anywhere in the matrix and, therefore, can't form a sequence.
Performance advice
One brute force way is to generate all possible paths in the code matrix, loop through them and choose the best one. This is the easiest but also the slowest approach. Solving larger matrices with larger maximum length of path might take dozens of minutes, if not hours.
Try to implement a faster algorithm that doesn’t iterate through all possible paths and can solve puzzles with the following parameters in less than 10 seconds:
Matrix size: 10x10
Number of sequences: 5
Average length of sequences: 4
Maximum path length: 12
At least one solution exists
For example:
Matrix:
41,0f,32,18,29,4b,55,3f,10,3a,
19,4f,57,43,3a,25,19,1e,5e,42,
13,5a,54,3c,1b,32,29,1c,15,30,
49,45,22,2e,25,51,2f,21,4c,37,
1a,5e,49,12,55,1e,49,19,43,2d,
34,26,53,48,49,60,32,3c,50,10,
0f,1e,30,3d,64,37,5b,5e,22,61,
4e,4f,15,5a,13,56,44,22,40,26,
43,2c,17,2b,1f,25,43,60,50,1f,
3c,2b,54,46,42,4d,32,46,30,24,
Sequences:
30, 26, 44, 32, 3c - 25pts
5a, 3c, 12, 1e, 4d - 10pts
1e, 5a, 12 - 10pts
4d, 1e - 5pts
32, 51, 2f, 49, 55, 42 - 30pts
Optimal solution
3f, 1c, 30, 26, 44, 32, 3c, 22, 5a, 12, 1e, 4d
Which contains
30, 26, 44, 32, 3c
5a, 12, 1e
1e, 4d
Conclusion
I am looking for any advice for this puzzle since I have no idea what keywords to look for. A pseudo-code or hints would be helpful for me, and I appreciate that. What has come to my mind is just Dijkstra:
For each sequence, since the order doesn't matter, I have to find all possible paths for every permutation, then find the highest-scoring path that also contains other input sequences.
After that, choose the best of the best.
In this case, I doubt the performance will be the issue.
First step is to find if a required sequence exists.
- SET found FALSE
- LOOP C1 over cells in first row
    - CLEAR foundSequence
    - ADD C1 to foundSequence
    - LOOP C2 over cells in column containing C1
        - IF C2 value == first value in sequence
            - ADD C2 to foundSequence
            - SET found TRUE
            - break from LOOP C2
    - IF found
        - SET direction VERT
        - LOOP V over remaining values in sequence
            - TOGGLE direction
            - SET found FALSE
            - LOOP C2 over cells in same column or row ( depending on direction ) containing last cell in foundSequence
                - IF C2 value == V
                    - ADD C2 to foundSequence
                    - SET found TRUE
                    - break from LOOP C2
            - IF ! found
                - break out of LOOP V
    - IF foundSequence == required sequence
        - RETURN foundSequence
RETURN failed
Note: this doesn't find sequences that are feasible with "wasted moves". I would implement this first and get it working. Then, using the same ideas, it can be extended to allow wasted moves.
You have not specified an input format! I suggest a space-delimited text file with lines beginning with 'm' containing matrix values and lines beginning with 's' containing sequences, like this:
m 3A 3A 10 9B
m 9B 72 3A 10
m 10 3A 3A 3A
m 3A 10 3A 9B
s 3A 10 9B
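A file in the suggested format is straightforward to read. As a minimal Python sketch (the function name `parse_puzzle` is made up here, and scores are not handled because the suggested format does not include them):

```python
def parse_puzzle(text):
    """Split 'm ...' lines into matrix rows and 's ...' lines into sequences."""
    matrix, sequences = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[0] == "m":
            matrix.append(fields[1:])     # one matrix row
        elif fields[0] == "s":
            sequences.append(fields[1:])  # one required sequence
    return matrix, sequences


sample = """m 3A 3A 10 9B
m 9B 72 3A 10
m 10 3A 3A 3A
m 3A 10 3A 9B
s 3A 10 9B
"""
matrix, sequences = parse_puzzle(sample)
print(matrix)
print(sequences)
```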
I have implemented the sequence finder in C++
// Excerpt: pA (the matrix) and vSequence (the required sequences) are
// defined elsewhere in the application.
std::vector<int> findSequence()
{
    int w, h;
    pA->size(w, h);
    std::vector<int> foundSequence;
    bool found = false;
    bool vert = false;

    // loop over cells in first row
    for (int c = 0; c < w; c++)
    {
        foundSequence.clear();
        found = false;
        vert = false; // the first move from every starting cell is vertical
        if (pA->cell(c, 0)->value == vSequence[0][0])
        {
            foundSequence.push_back(pA->cell(c, 0)->ID());
            found = true;
        }
        while (found)
        {
            // found possible starting cell
            // toggle search direction
            vert = (!vert);

            // start from last cell found
            auto pmCell = pA->cell(foundSequence.back());
            int col, row;
            pA->coords(col, row, pmCell);

            // look for next value in required sequence
            std::string nextValue = vSequence[0][foundSequence.size()];
            found = false;

            if (vert)
            {
                // loop over cells in column: h rows, skipping the first row
                for (int r2 = 1; r2 < h; r2++)
                {
                    if (pA->cell(col, r2)->value == nextValue)
                    {
                        foundSequence.push_back(pA->cell(col, r2)->ID());
                        found = true;
                        break;
                    }
                }
            }
            else
            {
                // loop over cells in row: w columns
                for (int c2 = 0; c2 < w; c2++)
                {
                    if (pA->cell(c2, row)->value == nextValue)
                    {
                        foundSequence.push_back(pA->cell(c2, row)->ID());
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                // dead end - try starting from next cell in first row
                break;
            }
            if (foundSequence.size() == vSequence[0].size()) {
                // success!!!
                return foundSequence;
            }
        }
    }
    std::cout << "Cannot find sequence\n";
    exit(1);
}
This outputs:
3A 3A 10 9B
9B 72 3A 10
10 3A 3A 3A
3A 10 3A 9B
row 0 col 1 3A
row 3 col 1 10
row 3 col 3 9B
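The same search can be sketched compactly in Python. This is an illustration under stated assumptions, not a port of the repository code: `find_sequence` is a hypothetical name, the matrix is a list of rows of strings, and, unlike the C++ above, the sketch backtracks over every matching cell instead of committing to the first match in a row or column.

```python
def find_sequence(matrix, sequence):
    """Return [(row, col), ...] spelling `sequence`, starting in the first row
    and alternating column and row moves, or None if no such path exists."""
    height, width = len(matrix), len(matrix[0])

    def extend(path, vertical):
        if len(path) == len(sequence):
            return path
        row, col = path[-1]
        wanted = sequence[len(path)]
        if vertical:
            candidates = [(r, col) for r in range(height)]  # same column
        else:
            candidates = [(row, c) for c in range(width)]   # same row
        for r, c in candidates:
            if (r, c) not in path and matrix[r][c] == wanted:
                found = extend(path + [(r, c)], not vertical)
                if found:
                    return found
        return None                                          # dead end, backtrack

    for c in range(width):
        if matrix[0][c] == sequence[0]:
            found = extend([(0, c)], True)  # first move is always vertical
            if found:
                return found
    return None


matrix = [
    ["3A", "3A", "10", "9B"],
    ["9B", "72", "3A", "10"],
    ["10", "3A", "3A", "3A"],
    ["3A", "10", "3A", "9B"],
]
print(find_sequence(matrix, ["3A", "10", "9B"]))  # [(0, 1), (3, 1), (3, 3)]
```

On this matrix the sketch visits the same three cells as the output above. Like the C++ version, it does not handle wasted moves.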
You can check out the code for the complete application at https://github.com/JamesBremner/stackoverflow75410318
I have added the ability to find sequences that start elsewhere than the first row ( i.e. with "wasted moves" ). You can see the code in the github repo.
Here are the results of a timing profile run on a 10 by 10 matrix - the algorithm finds 5 sequences in 0.6 milliseconds on average
Searching
41 0f 32 18 29 4b 55 3f 10 3a
19 4f 57 43 3a 25 19 1e 5e 42
13 5a 54 3c 1b 32 29 1c 15 30
49 45 22 2e 25 51 2f 21 4c 37
1a 5e 49 12 55 1e 49 19 43 2d
34 26 53 48 49 60 32 3c 50 10
0f 1e 30 3d 64 37 5b 5e 22 61
4e 4f 15 5a 13 56 44 22 40 26
43 2c 17 2b 1f 25 43 60 50 1f
3c 2b 54 46 42 4d 32 46 30 24
for sequence 4d 1e
Cannot find sequence starting in 1st row, using wasted moves
row 9 col 5 4d
row 4 col 5 1e
for sequence 30 26 44 32 3c
Cannot find sequence starting in 1st row, using wasted moves
Cannot find sequence
for sequence 5a 3c 12 1e 4d
Cannot find sequence starting in 1st row, using wasted moves
row 2 col 1 5a
row 2 col 3 3c
row 4 col 3 12
row 4 col 5 1e
row 9 col 5 4d
for sequence 1e 5a 12
Cannot find sequence starting in 1st row, using wasted moves
row 6 col 1 1e
row 4 col 5 1e
row 4 col 3 12
for sequence 32 51 2f 49 55 42
Cannot find sequence starting in 1st row, using wasted moves
row 2 col 5 32
row 3 col 5 51
row 3 col 6 2f
row 4 col 6 49
row 4 col 4 55
row 9 col 4 42
raven::set::cRunWatch code timing profile
Calls Mean (secs) Total Scope
5 0.00059034 0.0029517 findSequence

Pyramidal algorithm

I'm trying to find an algorithm for walking through a numerical pyramid: start at the top, move to an adjacent number in the next row at each step, and add every number visited to a running sum. I have to find the route that gives the highest sum.
I already tried always moving to the higher adjacent number in the next row, but that is not the answer, because it does not always find the best route.
E.g.
34
43 42
67 89 68
05 51 32 78
72 25 32 49 40
If I always move to the highest adjacent number, I get:
34 + 43 + 89 + 51 + 32 = 249
But if I go:
34 + 42 + 68 + 78 + 49 = 271
In the second case the result is higher, but I found that route by hand and I can't think of an algorithm that gets the highest result in all cases.
Can anyone give me a hand?
(Please tell me if I did not express myself well)
Start with the bottom row. Going from left to right, consider each pair of adjacent numbers together with the number directly above them in the row above. Add the number above to each of the two numbers below, and keep the larger sum.
Basically you are looking at the triangles formed by the bottom row and the row above. So for your original triangle,
34
43 42
67 89 68
05 51 32 78
72 25 32 49 40
the bottom left triangle looks like,
05
72 25
So you would add 72 + 05 = 77, as that is the largest sum between 72 + 05 and 25 + 05.
Similarly,
51
25 32
will give you 51 + 32 = 83.
If you continue this approach for each two adjacent numbers and the number above, you can discard the bottom row and replace the row above with the computed sums.
So in this case, the second to last row becomes
77 83 81 127
and your new pyramid is
34
43 42
67 89 68
77 83 81 127
Keep doing this and your pyramid starts shrinking until you have one number which is the number you are after.
34
43 42
150 172 195
34
215 237
Finally, you are left with one number, 271.
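The row-by-row reduction can be sketched in a few lines of Python. This is an illustration only: the name `best_route` and the list-of-rows representation are choices made here. The sketch keeps all intermediate sums instead of discarding rows, so the route itself can be recovered afterwards.

```python
def best_route(pyramid):
    """Return (largest sum, numbers along one best route) for a list of rows."""
    # best[r][c] = largest sum obtainable from cell (r, c) down to the bottom row
    best = [row[:] for row in pyramid]
    for r in range(len(pyramid) - 2, -1, -1):
        for c in range(len(pyramid[r])):
            best[r][c] += max(best[r + 1][c], best[r + 1][c + 1])
    # walk down from the top, always stepping to the child with the larger total
    path, c = [], 0
    for r in range(len(pyramid)):
        path.append(pyramid[r][c])
        if r + 1 < len(pyramid) and best[r + 1][c + 1] > best[r + 1][c]:
            c += 1
    return best[0][0], path


pyramid = [
    [34],
    [43, 42],
    [67, 89, 68],
    [5, 51, 32, 78],
    [72, 25, 32, 49, 40],
]
print(best_route(pyramid))  # (271, [34, 42, 68, 78, 49])
```

Each cell is visited once, so the work grows with the number of cells rather than with the number of routes.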
Starting at the bottom (row by row), add to each element the larger of the two values directly under it.
So, for your tree, 05 for example, will get replaced by max(72, 25) + 05 = 77. Later you'll add the maximum of that value and the new value for the 51 element to 67.
The top-most node will be the maximum sum.
Not to spoil all your fun, I'll leave the implementation to you, or the details of getting the actual path, if required.

Finding number C of key comparisons and number M of moves

I'm reading N.Wirth - Algorithms and Data Structures now. (Oberon version: August 2004)
The Question: how did he count these C and M? There is no explanation of this process... (any help will be useful)
Let me tell you what is the matter... I came across the following:
2.2.1 Sorting by Straight Insertion
... A good measure of efficiency is obtained by counting the numbers C of
needed key comparisons and M of moves (transpositions) of items.
He describes how the algorithm works:
PROCEDURE StraightInsertion;
  VAR i, j: INTEGER; x: Item;
BEGIN
  FOR i := 1 TO n-1 DO
    x := a[i]; j := i;
    WHILE (j > 0) & (x < a[j-1]) DO a[j] := a[j-1]; DEC(j) END ;
    a[j] := x
  END
END StraightInsertion
...and then he talks about C and M. But he doesn't explain how to find them; he just shows the resulting Cmin, Mmax, etc.:
Analysis of straight insertion. The number Ci of key comparisons in the i-th sift is at most i-1, at least 1, and --- assuming that all permutations of the n keys are equally probable --- i/2 in the average. The number Mi of moves (assignments of items) is Ci + 2 (including the sentinel). Therefore, the total numbers of comparisons and moves are:
Cmin = n-1                  Mmin = 3*(n-1)
Cave = (n^2 + n - 2)/4      Mave = (n^2 + 9n - 10)/4
Cmax = (n^2 + n - 4)/4      Mmax = (n^2 + 3n - 4)/2
So the question is:
how did he count these C and M? He doesn't explain the process of finding all of these numbers. Can you help me to understand how to find them? Any help will be good.
PS
I have been looking for the information on this subject, but with no result.
Additionally:
Here is the process of insertion shown in an example of eight numbers chosen at random (if needed):
Initial Keys: 44 55 12 42 94 18 06 67
                 v
i=1           44 55 12 42 94 18 06 67
                    v
              v-----<
i=2           12 44 55 42 94 18 06 67
                       v
                 v-----<
i=3           12 42 44 55 94 18 06 67
                          v
i=4           12 42 44 55 94 18 06 67
                             v
                 v-----------<
i=5           12 18 42 44 55 94 06 67
                                v
              v-----------------<
i=6           06 12 18 42 44 55 94 67
                                   v
                                v--<
i=7           06 12 18 42 44 55 67 94
C is the number of comparisons and M is the number of data items moved. If we go by your example, on iteration 1 there is 1 comparison and no move. On iteration 2 there are 2 comparisons and 2 moves, and so on. Now consider the k-th iteration: there will be up to k comparisons and, assuming the insertion point lies on average halfway between 1 and k, about k/2 moves.
The numbers C and M are the sums of these per-iteration comparisons and moves as k runs from 1 to n. All you have to do is add up the per-iteration counts over k, and you have the totals.
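Most of the quoted totals can be reproduced by summing the per-sift counts from the quoted analysis. The derivation below is a sketch under two assumptions that the excerpt does not state explicitly: the sifts are numbered i = 2, ..., n (as in the sentinel version of the procedure that the analysis refers to), and M_i = C_i + 2 holds in the best, average and worst case alike.

```latex
\begin{align*}
C_{\min} &= \sum_{i=2}^{n} 1 = n-1,
  & M_{\min} &= \sum_{i=2}^{n} (1+2) = 3(n-1),\\
C_{\mathrm{ave}} &= \sum_{i=2}^{n} \frac{i}{2}
   = \frac{1}{2}\left(\frac{n(n+1)}{2}-1\right) = \frac{n^2+n-2}{4},
  & M_{\mathrm{ave}} &= C_{\mathrm{ave}} + 2(n-1) = \frac{n^2+9n-10}{4},\\
 & & M_{\max} &= \sum_{i=2}^{n} \bigl((i-1)+2\bigr)
   = \frac{n(n+1)}{2} - 1 + (n-1) = \frac{n^2+3n-4}{2}.
\end{align*}
```

Summing the stated worst case C_i = i - 1 in the same way gives n(n-1)/2, which does not reduce to the quoted Cmax, so that one entry cannot be reproduced from the per-sift bounds as quoted.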

Understanding "median of medians" algorithm

I want to understand "median of medians" algorithm on the following example:
We have 45 distinct numbers divided into 9 groups of 5 elements each.
48 43 38 33 28 23 18 13 8
49 44 39 34 29 24 19 14 9
50 45 40 35 30 25 20 15 10
51 46 41 36 31 26 21 16 53
52 47 42 37 32 27 22 17 54
The first step is sorting every group (in this case they are already sorted).
The second step is to recursively find the "true" median of the medians (50 45 40 35 30 25 20 15 10), i.e. the set will be divided into 2 groups:
50 25
45 20
40 15
35 10
30
sorting these 2 groups
30 10
35 15
40 20
45 25
50
The medians are 40 and 15 (when the number of elements is even, we take the left median),
so the returned value is 15. However, the "true" median of the medians (50 45 40 35 30 25 20 15 10) is 30. Moreover, there are only 5 elements less than 15, which is much less than the 30% of 45 mentioned in Wikipedia,
and so T(n) <= T(n/5) + T(7n/10) + O(n) fails.
By the way, in the Wikipedia example I get 36 as the result of the recursion, whereas the true median is 47.
So I think that in some cases this recursion may not return the true median of medians. I want to understand where my mistake is.
The problem is in the step where you say to find the true median of the medians. In your example, you had these medians:
50 45 40 35 30 25 20 15 10
The true median of this data set is 30, not 15. You don't find this median by splitting the groups into blocks of five and taking the median of those medians, but instead by recursively calling the selection algorithm on this smaller group. The error in your logic is assuming that median of this group is found by splitting the above sequence into two blocks
50 45 40 35 30
and
25 20 15 10
then finding the median of each block. Instead, the median-of-medians algorithm recursively calls itself on the complete data set 50 45 40 35 30 25 20 15 10. Internally, this call splits the group into blocks of five, sorts them, and so on, but it does so only to determine the pivot for its own partitioning step. It is in this partitioning step that the recursive call finds the true median of the medians, which in this case is 30. If you use 30 as the pivot for the partitioning step in the original algorithm, you do indeed get a very good split, as required.
Hope this helps!
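To make the recursion concrete, here is a minimal Python sketch of the selection algorithm. It is an illustration, not the Wikipedia pseudocode: `select` and `choose_pivot` are names chosen here, k is 0-based, and the left median is taken for groups of even size, as in the question.

```python
def choose_pivot(items):
    """Median of the medians of consecutive groups of five (left median if even)."""
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # The true median of the medians is found by a full recursive selection,
    # not by one more round of grouping.
    return select(medians, (len(medians) - 1) // 2)


def select(items, k):
    """Return the k-th smallest element of items (k is 0-based)."""
    if len(items) <= 5:
        return sorted(items)[k]
    pivot = choose_pivot(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k < len(smaller):
        return select(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    larger = [x for x in items if x > pivot]
    return select(larger, k - len(smaller) - len(equal))


data = [48, 49, 50, 51, 52, 43, 44, 45, 46, 47, 38, 39, 40, 41, 42,
        33, 34, 35, 36, 37, 28, 29, 30, 31, 32, 23, 24, 25, 26, 27,
        18, 19, 20, 21, 22, 13, 14, 15, 16, 17, 8, 9, 10, 53, 54]
print(choose_pivot(data))           # 30
print(select(data, len(data) // 2)) # the median of all 45 numbers
```

When `choose_pivot` works on the nine medians, the inner call does pick 15 as its own pivot, exactly as computed in the question. That 15 is only used to partition the nine medians; the inner selection then continues in the larger part until it returns 30.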
Here is the pseudocode for median of medians algorithm (slightly modified to suit your example). The pseudocode in wikipedia fails to portray the inner workings of the selectIdx function call.
I've added comments to the code for explanation.
// L is the array on which median of medians needs to be found.
// k is the expected median position. E.g. first select call might look like:
// select (array, N/2), where 'array' is an array of numbers of length N
select(L,k)
{
    if (L has 5 or fewer elements) {
        sort L
        return the element in the kth position
    }

    partition L into subsets S[i] of five elements each
        (there will be n/5 subsets total).

    for (i = 1 to n/5) do
        x[i] = select(S[i],3)

    M = select({x[i]}, n/10)

    // The code to follow ensures that even if M turns out to be the
    // smallest/largest value in the array, we'll get the kth smallest
    // element in the array

    // Partition array into three groups based on their value as
    // compared to median M
    partition L into L1<M, L2=M, L3>M

    // Compare the expected median position k with length of first array L1
    // Run recursive select over the array L1 if k is less than length
    // of array L1
    if (k <= length(L1))
        return select(L1,k)

    // Check if k falls in L3 array. Recurse accordingly
    else if (k > length(L1)+length(L2))
        return select(L3,k-length(L1)-length(L2))

    // Simply return M since k falls in L2
    else return M
}
Taking your example:
The median of medians function will be called over the entire array of 45 elements like (with k = 45/2 = 22):
median = select({48 49 50 51 52 43 44 45 46 47 38 39 40 41 42 33 34 35 36 37 28 29 30 31 32 23 24 25 26 27 18 19 20 21 22 13 14 15 16 17 8 9 10 53 54}, 45/2)
The first time M = select({x[i]}, n/10) is called, array {x[i]} will contain the following numbers: 50 45 40 35 30 20 15 10.
In this call, n = 45, and hence the select function call will be M = select({50 45 40 35 30 20 15 10}, 4)
The second time M = select({x[i]}, n/10) is called, array {x[i]} will contain the following numbers: 40 20.
In this call, n = 9 and hence the call will be M = select({40 20}, 0).
This select call will return and assign the value M = 20.
Now, coming to the point where you had a doubt, we now partition the array L around M = 20 with k = 4.
Remember array L here is: 50 45 40 35 30 20 15 10.
The array will be partitioned into L1, L2 and L3 according to the rules L1 < M, L2 = M and L3 > M. Hence:
L1: 10 15
L2: 20
L3: 30 35 40 45 50
Since k = 4, it's greater than length(L1) + length(L2) = 3. Hence, the search will be continued with the following recursive call now:
return select(L3,k-length(L1)-length(L2))
which translates to:
return select({30 35 40 45 50}, 1)
which will return 30 as a result. (since L has 5 or fewer elements, hence it'll return the element in kth i.e. 1st position in the sorted array, which is 30).
Now, M = 30 will be received in the first select function call over the entire array of 45 elements, and the same partitioning logic which separates the array L around M = 30 will apply to finally get the median of medians.
Phew! I hope I was verbose and clear enough to explain median of medians algorithm.

Finding a set of permutations, with a constraint

I have a set of N^2 numbers and N bins. Each bin is supposed to have N numbers from the set assigned to it. The problem I am facing is finding a set of distributions that map the numbers to the bins, satisfying the constraint, that each pair of numbers can share the same bin only once.
A distribution can nicely be represented by an NxN matrix, in which each row represents a bin. Then the problem is finding a set of permutations of the matrix' elements, in which each pair of numbers shares the same row only once. It's irrelevant which row it is, only that two numbers were both assigned to the same one.
Example set of 3 permutations satisfying the constraint for N=8:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
0 9 18 27 36 45 54 63
1 10 19 28 37 46 55 56
2 11 20 29 38 47 48 57
3 12 21 30 39 40 49 58
4 13 22 31 32 41 50 59
5 14 23 24 33 42 51 60
6 15 16 25 34 43 52 61
7 8 17 26 35 44 53 62
A permutation that doesn't belong in the above set:
0 10 20 30 32 42 52 62
1 11 21 31 33 43 53 63
2 12 22 24 34 44 54 56
3 13 23 25 35 45 55 57
4 14 16 26 36 46 48 58
5 15 17 27 37 47 49 59
6 8 18 28 38 40 50 60
7 9 19 29 39 41 51 61
Because of multiple collisions with the second permutation, since, for example they're both pairing the numbers 0 and 32 in one row.
Enumerating three is easy, it consists of 1 arbitrary permutation, its transposition and a matrix where the rows are made of the previous matrix' diagonals.
I can't find a way to produce a set consisting of more though. It seems to be either a very complex problem, or a simple problem with an unobvious solution. Either way I'd be thankful if somebody had any ideas how to solve it in reasonable time for the N=8 case, or identified the proper, academic name of the problem, so I could google for it.
In case you were wondering what is it useful for, I'm looking for a scheduling algorithm for a crossbar switch with 8 buffers, which serves traffic to 64 destinations. This part of the scheduling algorithm is input traffic agnostic, and switches cyclically between a number of hardwired destination-buffer mappings. The goal is to have each pair of destination addresses compete for the same buffer only once in the cycling period, and to maximize that period's length. In other words, so that each pair of addresses was competing for the same buffer as seldom as possible.
EDIT:
Here's some code I have.
CODE
It's greedy, it usually terminates after finding the third permutation. But there should exist a set of at least N permutations satisfying the problem.
The alternative would require that choosing permutation I involved looking for permutations (I+1..N), to check if permutation I is part of the solution consisting of the maximal number of permutations. That'd require enumerating all permutations to check at each step, which is prohibitively expensive.
What you want is a combinatorial block design. Using the nomenclature on the linked page, you want designs of size (n^2, n, 1) with the maximum number of blocks. This will give you n(n+1) blocks (rows), which corresponds to n+1 permutations in your nomenclature. This is the maximum theoretically possible by a counting argument (see the explanation in the article for the derivation of b from v, k, and lambda). Such designs exist for n = p^k for some prime p and integer k, using an affine plane. It is conjectured that the only affine planes that exist are of this size. Therefore, if you can select n, maybe this answer will suffice.
However, if instead of the maximum theoretically possible number of permutations, you just want to find a large number (the most you can for a given n^2), I am not sure what the study of these objects is called.
Make a 64 x 64 x 8 array: bool forbidden[i][j][k] which indicates whether the pair (i,j) has appeared in row k. Each time you use the pair (i, j) in the row k, you will set the associated value in this array to one. Note that you will only use the half of this array for which i < j.
To construct a new permutation, start by trying the member 0, and verify that at least seven of the forbidden[0][j][0] entries are unset. If there are not seven left, increment and try again. Repeat to fill out the rest of the row. Repeat this whole process to fill the entire NxN permutation.
There are probably optimizations you should be able to come up with as you implement this, but this should do pretty well.
Possibly you could reformulate your problem into graph theory. For example, you start with the complete graph with N×N vertices. At each step, you partition the graph into N N-cliques, and then remove all edges used.
For this N=8 case, K64 has 64×63/2 = 2016 edges, and sixty-four lots of K8 have 1792 edges, so your problem may not be impossible :-)
Right, the greedy style doesn't work because you run out of numbers.
It's easy to see that there can't be more than 63 permutations before you violate the constraint. On the 64th, you'll have to pair at least one of the numbers with another it has already been paired with, by the pigeonhole principle.
In fact, if you use the table of forbidden pairs I suggested earlier, you find that there are a maximum of only N+1 = 9 permutations possible before you run out. The table has N^2 x (N^2-1)/2 = 2016 non-redundant constraints, and each new permutation will create N x (N choose 2) = 8 x 28 = 224 new pairings. So all the pairings will be used up after 2016/224 = 9 permutations. It seems like realizing that there are so few permutations is the key to solving the problem.
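The counting argument works for any N, not only N = 8. As a short worked derivation (every permutation has N rows, and each row uses up one pair for every two of its N numbers):

```latex
\[
\frac{\binom{N^2}{2}}{N\binom{N}{2}}
  = \frac{N^2\,(N^2-1)/2}{N \cdot N\,(N-1)/2}
  = \frac{N^2-1}{N-1}
  = N+1,
\qquad\text{e.g. } N = 8:\ \frac{2016}{8 \cdot 28} = \frac{2016}{224} = 9 .
\]
```

This is only an upper bound on the number of permutations; whether N+1 can actually be reached depends on N.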
You can generate a list of N permutations numbered n = 0 ... N-1 as
A_ij = (i * N + j + j * n * N) mod N^2
which generates a new permutation by shifting the columns in each permutation. The top row of the nth permutation are the diagonals of the n-1th permutation. EDIT: Oops... this only appears to work when N is prime.
This misses one last permutation, which you can get by transposing the matrix:
A_ij = j * N + i
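The two formulas are easy to check by brute force. The following Python sketch (the names `build_permutations` and `count_repeated_pairs` are made up for this illustration) builds the N shifted permutations plus the transpose and counts how many pairs of numbers meet in a row more than once.

```python
from itertools import combinations


def build_permutations(N):
    """N shifted permutations A_ij = (i*N + j + j*n*N) mod N^2, plus the transpose."""
    perms = []
    for n in range(N):
        perms.append([[(i * N + j + j * n * N) % (N * N) for j in range(N)]
                      for i in range(N)])
    perms.append([[j * N + i for j in range(N)] for i in range(N)])
    return perms


def count_repeated_pairs(perms):
    """Count row-sharing pairs that have already shared a row earlier."""
    seen, repeats = set(), 0
    for perm in perms:
        for row in perm:
            for pair in combinations(sorted(row), 2):
                if pair in seen:
                    repeats += 1
                seen.add(pair)
    return repeats


for N in (5, 7, 8):
    perms = build_permutations(N)
    print(N, len(perms), count_repeated_pairs(perms))
```

For the primes 5 and 7 the count is zero, so all N+1 permutations are compatible. For N = 8 the count is non-zero, which matches the remark that the formula only works when N is prime; a different construction would be needed for N = 8.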
