Divide and Conquer Matrix Multiplication

Divide and Conquer Matrix Multiplication - algorithm

I am having trouble getting divide and conquer matrix multiplication to work. From what I understand, you split the matrices of size nxn into quadrants (each quadrant is n/2) and then you do:
C11 = A11⋅ B11 + A12 ⋅ B21
C12 = A11⋅ B12 + A12 ⋅ B22
C21 = A21 ⋅ B11 + A22 ⋅ B21
C22 = A21 ⋅ B12 + A22 ⋅ B22
My output for divide and conquer is really large and I'm having trouble figuring out the problem as I am not very good with recursion.
example output:
Original Matrix A:
4 0 4 3
5 4 0 4
4 0 4 0
4 1 1 1
A x A
Classical:
44 3 35 15
56 20 24 35
32 0 32 12
29 5 21 17
Divide and Conquer:
992 24 632 408
1600 272 720 1232
512 0 512 384
460 17 405 497
Could someone tell me what I am doing wrong for divide and conquer? All my matrices are int[][] and classical method is the traditional 3 for loop matrix multiplication

You are recursively calling divideAndConquer in the wrong way. What your function does is square a matrix. In order for divide and conquer matrix multiplication to work, it needs to be able to multiply two potentially different matrixes together.
It should look something like this:
private static int[][] divideAndConquer(int[][] matrixA, int[][] matrixB){
if (matrixA.length == 2){
//calculate and return base case
}
else {
//make a11, b11, a12, b12 etc. by dividing a and b into quarters
int[][] c11 = addMatrix(divideAndConquer(a11,b11),divideAndConquer(a12,b21));
int[][] c12 = addMatrix(divideAndConquer(a11,b12),divideAndConquer(a12,b22));
int[][] c21 = addMatrix(divideAndConquer(a21,b11),divideAndConquer(a22,b21));
int[][] c22 = addMatrix(divideAndConquer(a21,b12),divideAndConquer(a22,b22));
//combine result quarters into one result matrix and return
}
}

Some debugging approaches to try:
Try some very simple test matrices as input (e.g. all zeros, with a one or a few strategic ones). You may see a pattern in the "failures" that will show you where your error(s) are.
Make sure your "classical" approach is giving you correct answers. For small matrices, you can use Woflram Alpha on-line to test answers: http://www.wolframalpha.com/examples/Matrices.html
To debug recursion: add printf() statements at the entry and exit of your function, including the invocation arguments. Run your test matrix, write the output to a log file, and open the log file with a text editor. Step through each case, writing your notes alongside in the editor making sure it's working correctly at each step. Add more printf() statements and run again if needed.
Good luck with the homework!

Could someone tell me what I am doing wrong for divide and conquer?
Yes:
int[][] a = divideAndConquer(topLeft);
int[][] b = divideAndConquer(topRight);
int[][] c = divideAndConquer(bottomLeft);
int[][] d = divideAndConquer(bottomRight);
int[][] c11 = addMatrix(classical(a,a),classical(b,c));
int[][] c12 = addMatrix(classical(a,b),classical(b,d));
int[][] c21 = addMatrix(classical(c,a),classical(d,c));
int[][] c22 = addMatrix(classical(c,b),classical(d,d));
You are going through an extra multiplication step here: you shouldn't be calling both divideAndConquer() and classical().
What you are effectively doing is:
C11 = (A11^2)⋅(B11^2) + (A12^2)⋅(B21^2)
C12 = (A11^2)⋅(B12^2) + (A12^2)⋅(B22^2)
C21 = (A21^2)⋅(B11^2) + (A22^2)⋅(B21^2)
C22 = (A21^2)⋅(B12^2) + (A22^2)⋅(B22^2)
which is not correct.
First, remove the divideAndConquer() calls, and replace a/b/c/d by topLeft/topRight/etc.
See if it gives you the proper results.
Your divideAndConquer() method needs a pair of input parameters, so you can use A*B. Once you get that working, get rid of the calls to classical(), and use divideAndConquer() instead. (or save them for matrices that are not a multiple of 2 in length.)

You might find the Wiki article on Strassen's algorithm helpful.

Related

Formula to get next question in quiz basing on previous statistics

My goal is to dynamically determine what question should be next in quiz by using statistics of previous answers
So, I have:
Question with difficulty field (1-100)
Maximum score you can get in question (let it be 256)
Score user have reached in question (x out of max)
I want to somehow combine these paramaters in formula to choose most suitable next question for user
How can I do it?
My idea was to give user a question with median difficulty as first one and then check if user scored less than 50% of maximum, then get questions with 25 percentile difficulty else get 75 percentile. Then repeat this schema on a smaller stint (25-50 percentile or 50-75 percentile and so on)

Let's assume that the player has a fixed function score = f(difficulty) that gives for each difficulty the expected score percentage. Once we know this function, we can invert it and find the difficulty level that will give us the expected score we want.
However, the function is not known. But we have samples of this function in the form of our previous questions. So, we can fit a function to these samples. If you have knowledge about the form of the dependence, you can include that knowledge in the shape of your fitted function. I will simply assume a truncated linear function:
score = f(difficulty) = max(0, min(m * difficulty + n, 1))
The two parameters that we need to find are m and n. If we remove all sample questions where the user scored 100% or 0%, we can ignore the truncation. Then, we have a list of samples that form a linear system of equations:
score1 = m * difficulty1 + n
score2 = m * difficulty2 + n
score3 = m * difficulty3 + n
...
This system will usually not have a solution. So, we can solve for a least-squares solution. To do this, we will incrementally build a 2x2 matrix A and a 2-dimensional vector b that represent the system A * x = b. We will start with the zero matrix and the zero vector. For each question, we will update:
/ A11 A12 \ += / difficulty * difficulty difficulty \
\ A21 A22 / \ difficulty 1 /
/ b1 \ += / difficulty * score \
\ b2 / \ score /
Once we have added at least two questions, we can solve:
m = (A12 * b2 - A22 * b1) / (A12 * A12 - A11 * A22)
n = (A12 * b1 - A11 * b2) / (A12 * A12 - A11 * A22)
And we can find the difficulty for an expected score of P as:
difficulty = (P - n) / m
Let's do an example. The following table contains a few questions and the state of the function after adding the question.
diff score | A11 A12 A22 b1 b2 | m n
--------------+----------------------------+-------------
70 0.3 | 4900 70 1 21 0.3 |
50 0.4 | 7400 120 2 41 0.7 | -0.005 0.65
40 0.5 | 9000 160 3 61 1.2 | -0.006 0.74
35 0.7 | 10225 195 4 85.5 1.9 | -0.010 0.96
Here is the fitted function and the sample questions:
And if we want to find the difficulty for an expected score of e.g. 75%, we get:
difficulty(0.75) = 21.009

Bilinear image interpolation / scaling - A calculation example

I would like to ask you about some bilinear interpolation / scaling details. Let's assume that we have this matrix:
|100 | 50 |
|70 | 20 |
This is a 2 x 2 grayscale image. Now, I would like scale it by factor of two and my matrix looks like this:
| 100 | f1 | 50 | f2 |
| f3 | f4 | f5 | f6 |
| 70 | f7 | 20 | f8 |
so if we would like to calculate f4, the calculation is defined as
f1 = 100 + 0.5(50 - 100) = 75
f7 = 70 + 0.5(20 - 70) = 45
and now finally:
f4 = 75 + 0.5(45 - 75) = 60
However, I can't really understand what calculations are proper for f3 or f1
Do we do the bilinear scaling in each direction separately? Therefore, this would mean that:
f3 = 100 + 0.5(70 - 100) = 85
f1 = 100 + 0.5(50 - 100) = 75
Also, how should I treat f2, f6, f8. Are those points simply being copied like in the nearest neighbor algorithm?

I would like to point you to this very insightful graphic from Wikipedia that illustrates how to do bilinear interpolation for one point:
Source: Wikipedia
As you can see, the four red points are what is known. These points you know before hand and P is the point we wish to interpolate. As such, we have to do two steps (as you have indicated in your post). To handle the x coordinate (horizontal), we must calculate what the interpolated value is row wise for the top row of red points and the bottom row of red points. This results in the two blue points R1 and R2. To handle the y coordinate (vertical), we use the two blue points and interpolate vertically to get the final P point.
When you resize an image, even though we don't visually see what I'm about to say, but imagine that this image is a 3D signal f. Each point in the matrix is in fact a 3D coordinate where the column location is the x value, the row location is the y value and the z value is the quantity / grayscale value of the matrix itself. Therefore, doing z = f(x,y) is the value of the matrix at location (x,y) in the matrix. In our case, because you're dealing with images, each value of (x,y) are integers that go from 1 up to as many rows/columns as we have depending on what dimension you're looking at.
Therefore, given the coordinate you want to interpolate at (x,y), and given the red coordinates in the image above, which we call them x1,y1,x2,y2 as per the diagram - specifically going with the convention of the diagram and referencing how images are accessed: x1 = 1, x2 = 2, y1 = 2, y2 = 1, the blue coordinates R1 and R2 are computed via 1D interpolation column wise using the same row both points coincide on:
R1 = f(x1,y1) + (x - x1)/(x2 - x1)*(f(x2,y1) - f(x1,y1))
R2 = f(x1,y2) + (x - x1)/(x2 - x1)*(f(x2,y2) - f(x1,y2))
It's important to note that (x - x1) / (x2 - x1) is a weight / proportion of how much of a mix the output consists of between the two values seen at f(x1,y1) and f(x2,y1) for R1 or f(x1,y2) and f(x2,y2) for R2. Specifically, x1 is the starting point and (x2 - x1) is the difference in x values. You can verify that substituting x1 as x gives us 0 while x2 as x gives us 1. This weight fluctuates between [0,1] which is required for the calculations to work.
It should be noted that the origin of the image is at the top-left corner, and so (1,1) is at the top-left corner. Once you find R1 and R2, we can find P by interpolating row wise:
P = R2 + (y - y2)/(y2 - y1)*(R1 - R2)
Again, (y - y2) / (y2 - y1) denote the proportion / mix of how much R1 and R2 contribute to the final output P. As such, you calculated f5 correctly because you used four known points: The top left is 100, top right is 50, bottom left is 70 and bottom right is 20. Specifically, if you want to compute f5, this means that (x,y) = (1.5,1.5) because we're halfway in between the 100 and 50 due to the fact that you're scaling the image by two. If you plug in these values into the above computation, you will get the value of 60 as you expected. The weights for both calculations will also result in 0.5, which is what you got in your calculations and that's what we expect.
If you compute f1, this corresponds to (x,y) = (1.5,1) and if you substitute this into the above equation, you will see that (y - y2)/(y2 - y1) gives you 0 or the weight is 0, and so what is computed is just R2, corresponding to the linear interpolation along the top row only. Similarly, if we computed f7, this means we want to interpolate at (x,y) = (1.5,2). In this case, you will see that (y - y2) / (y2 - y1) is 1 or the weight is 1 and so P = R2 + (R1 - R2), which simplifies to R1 and is the linear interpolation along the bottom row only.
Now there's the case of f3 and f5. Those both correspond to (x,y) = (1,1.5) and (x,y) = (2,1.5) respectively. Substituting these values in for R1 and R2 and P for both cases give:
f3
R1 = f(1,2) + (1 - 1)/(2 - 1)*(f(2,2) - f(1,2)) = f(1,2)
R2 = f(1,1) + (1 - 1)/(2 - 1)*(f(1,2) - f(1,1)) = f(1,1)
P = R1 + (1.5 - 1)*(R1 - R2) = f(1,2) + 0.5*(f(1,2) - f(1,1))
P = 70 + 0.5*(100 - 70) = 85
f5
R1 = f(1,2) + (2 - 1)/(2 - 1)*(f(2,2) - f(1,2)) = f(2,2)
R2 = f(1,1) + (2 - 1)/(2 - 1)*(f(1,2) - f(1,1)) = f(1,2)
P = R1 + (1.5 - 1)*(R1 - R2) = f(2,2) + 0.5*(f(2,2) - f(1,2))
P = 20 + 0.5*(50 - 20) = 35
So what does this tell us? This means that you are interpolating along the y-direction only. This is apparent when we take a look at P. Examining the calculations more thoroughly of P for each of f3 and f5, you see that we are considering values along the vertical direction only.
As such, if you want a definitive answer, f1 and f7 are found by interpolating along the x / column direction only along the same row. f3 and f5 are found by interpolating y / row direction along the same column. f4 uses a mixture of f1 and f7 to compute the final value as you have already seen.
To answer your final question, f2, f6 and f8 are filled in based on personal preference. These values are considered to be out of bounds, with the x and y values both being 2.5 and that's outside of our [1,2] grid for (x,y). In MATLAB, the default implementation of this is to fill any values outside of the defined boundaries to be not-a-number (NaN), but sometimes, people extrapolate using linear interpolation, copy the border values, or perform some elaborate padding like symmetric or circular padding. It depends on what situation you're in, but there is no correct and definitive answer on how to fill in f2, f6 and f8 - it all depends on your application and what makes the most sense to you.
As a bonus, we can verify that my calculations are correct in MATLAB. We first define a grid of (x,y) points in the [1,2] range, then resize the image so that it's twice as large where we specify a resolution of 0.5 per point rather than 1. I'm going to call your defined matrix A:
A = [100 50; 70 20]; %// Define original matrix
[X,Y] = meshgrid(1:2,1:2); %// Define original grid of points
[X2,Y2] = meshgrid(1:0.5:2.5,1:0.5:2.5) %// Define expanded grid of points
B = interp2(X,Y,A,X2,Y2,'linear'); %// Perform bilinear interpolation
The original (x,y) grid of points looks like:
>> X
X =
1 2
1 2
>> Y
Y =
1 1
2 2
The expanded grid to expand the size of the matrix by twice as much looks like:
>> X2
X2 =
1.0000 1.5000 2.0000 2.5000
1.0000 1.5000 2.0000 2.5000
1.0000 1.5000 2.0000 2.5000
1.0000 1.5000 2.0000 2.5000
>> Y2
Y2 =
1.0000 1.0000 1.0000 1.0000
1.5000 1.5000 1.5000 1.5000
2.0000 2.0000 2.0000 2.0000
2.5000 2.5000 2.5000 2.5000
B is the output using X and Y as the original grid of points and X2 and Y2 are the points we want to interpolate at.
We get:
>> B
B =
100 75 50 NaN
85 60 35 NaN
70 45 20 NaN
NaN NaN NaN NaN

Classical Round Table algorithm?

Coins with different value are spread in circle around a round table . We can choose any coin such that for any two adjacent pair of coins , atleast one must be selected (both maybe selected too) . In such condition we have to find minimum possible value of coins selected .
I have to respect time complexity so instead of using naive recursive bruteforce , i tried doing it using dynamic programming . But i get Wrong Answer - my algorithm is incorrect .
If someone could suggest an algorithm to do it dynamically , i could code myself in c++ . Also maximum number of coins is 10^6 , so i think O(n) solution exists .
EDIT : Okay , i also add an example .
If coins value around table is 1,2,1,2,2 (in circle) , then minimum value of coin would be 4 by selecting 1st,3rd & 4th(or 5th) .

Having everything in a circle hampers dynamic programming, because there is no stable start point.
If you knew that a particular coin would be included in the best answer, you could use that as your start point. Renumber it coin 1 and use dynamic programming to work out the best cost of 1..N, with and without the Nth coin selected. Given this you can work out the best cost of 1..N+1 and so on.
Actually you can also use this method if somebody tells you that a particular coin would not be selected - you just have slightly different starting conditions. Or you could use that fact that if you know that a particular coin is not selected, the two on either side of it must be selected.
Any coin is either selected or not, so you can look at the costs both ways, produced by solving two dynamic programming problems, and pick whichever cost is cheapest.

I think the following algorithm will get you the best solution. I have not gone through your code (sorry):
We will select a random point in the circle to start. Say it's 1. We will look at what happens if it would be selected.
So we select 1. Move up in the circle and you get the choice of selecting 2 or not. This can be shown in a tree where the top branch represents selecting the coin and the lower one not selecting the coin. The numbers represent the total sum of the selected coins.
3 = 1 and 2 both selected
/
1
\
1 = 1 selected, 2 not
Now we continue in the circle and get the choice of selecting 3 or not. This gives a tree like
6 = 1, 2 and 3 selected
/
3
/ \
/ 3= 1 and 2 selected, 3 not
/
1
\
\ 4 = 1 and 3 selected, 2 not
\ /
1
\
1 = 1 selected, 2 and 3 not
Now in that tree, we can prune! Given your problem statement, you have to keep track of which coins are taken to make sure every coin is 'covered'. Say the last 2 coins were not selected. Then you know the the next has to be selected in order not to violate your constraints. More importantly, the possibilities in the rest of your algorithm only depend on the choice of the last 2 coins.
Now look at all branches that have selected the last coin (3). You only need to keep the one with the lowest weight. Both those branches are free to choose what they want in the rest of the algorithm. In this case, we can safely remove the top branch. We then have 3 possible paths left.
Now take a look at what happens if we enumerate the choices for coin 4
3 7= 1, 2 and 4 selected, 3 not
/ \ /
/ 3
/ \
3 = 1 and 2 selected, 3 and 4 not
1 8 = 1, 3 and 4 selected, 2 not
\ /
\ 4
\ / \
1 4 = 1 and 3 selected, 2 and 4 not
5 = 1 and 4 selected, 2 and 3 not
\ /
1
\
1 = only 1 selected
Now you have 6 choices. However, the lowest branch (only 1 is selected) is invalid because 3 is not adjacent to anything. You can prune that to have 5 branches left. Of those 5 there are 3 that selected 4 (=the last coin so far) and we can do the same thing as before: only keep the cheapest branch. This reduces the number of branches to 3 again.
You can keep doing this for your whole circle until you reach the start again. Then you should have 3 paths of which you can choose the cheapest. This gives you the best solution if you start of by selecting coin 1.
Now we have the best solution for when 1 is selected. However, It could be that 1 should not be selected. It could be that it is adjacent to another coin that is selected: coin 2 or coin 6. If we now do the above algorithm once for coin 2 instead of coin 1 and once for coin 6 we should have the best solution.
This approach relies on the fact that either coin 1, 2 or 6 is selected.
I hope I made my approach comprehensible. It's rather long and you could do it fasterr by using some state transition diagram in which you only maintain the possible states (which depends on the last 2 coins) and work on that. The methods are the same as above, only more compact)

O(n) suggestion, by induction. Hmm, I read the wiki now and I found out it counts as dynamic programming. Really a broad term. I had a different understanding of dynamic programming before.
Glossary
We have N coins in N places. Coin values are a[i], where 0 <= i < N. Each coin may be selected or deselected which we express as the sequence of 0 and 1.
Algorithm description
00 is invalid sequence in any place, because it would violate the problem constraints. 111 is also invalid because it is not optimal, 101 is always better.
Sequentially for every place i we calculate 3 best sums, for 3 codes: 01, 10, 11. The code comes from the setting of last 2 coins, that is i-1 and i. So we have best (minimum) sums in variables b01, b02, b11.
We have to start from something sure, so we will apply the algorithm 2 times. One for coin at place 0 set, and one for unset.
At the beginning we try places 0 and 1 and initiate bs directly. b01 = a[1], b10 = a[0], b11 = a[0] + a[1]. However if this is the round in which we choose the first coin to be unset, we can accept only b01 solution. So we assign a big number to b10 and b11. These solutions will be quickly dropped by next algorithm steps. On the second round we will do the opposite: assign big nuber to b01, because first bit must be set.
At step i we have best sums for place i-1 in bs. We compute cs which are the best sums for place i.
c01 = b10 + a[i] // 101 (10 -> 01)
c10 = min(b01, b11) // 010 (01 -> 10) or 110 (11 -> 10)
c11 = b01 + a[i] // 011 (01 -> 11)
That comes from following possibilities:
010 - b01 -> c10
011 - b01 -> c11
100 - invalid
101 - b10 -> c01
110 - b11 -> c10
111 - invalid
Of course we finish each step with assigning best sums back to bs.
When we processed all the coins we must drop the solutions that are incompatible with the initial assumption. Bits i-2, i-1 and 0 must produce valid sequences.
This is example run for 123456 sequence.
A. assume first bit 0
1 a[1] = 2: b01 = 2, b10 = 999, b11 = 999
2 a[2] = 3: b01 = 1002, b10 = 2, b11 = 5
3 a[3] = 4: b01 = 6, b10 = 9, b11 = 1006
4 a[4] = 5: b01 = 13, b10 = 6, b11 = 11
5 a[5] = 6: b01 = 12, b10 = 13, b11 = 19
b10 is unacceptable, we choose better from b01 and b11, which is 12.
B. assume first bit 1
1 a[1] = 2: b01 = 999, b10 = 1, b11 = 3
2 a[2] = 3: b01 = 4, b10 = 3, b11 = 1002
3 a[3] = 4: b01 = 7, b10 = 4, b11 = 8
4 a[4] = 5: b01 = 9, b10 = 12, b11 = 12
5 a[5] = 6: b01 = 18, b10 = 9, b11 = 15
Now b11 is invalid as it would produce 111. So we choose best of b01 and b10, which is 9. Step A gave 12, step B gave 9. 9 is better. This is the result.
I made the above calculations manually, so sorry if there is a mistake in them. However for the first coin unset I computed 2+4+6 and for first coin set the result was 1+3+5. Seems to be right.

Determine elements of a matrix when sum of rows and columns are given

There is 4x4 matrix with all 4 diagonal elements zero. All other elements are non negative integers. Sum of all 4 rows and 4 columns are known individually. Is it possible to determine the remaining 12 elements of the matrix? Eg
0 1 1 0 sum=2
2 0 0 1 sum=3
4 1 0 0 sum=5
0 1 6 0 sum=7
sum=6 sum=3 sum=7 sum=1
Any guidance will be very helpful.
Thanks

The matrix is
0 a12 a13 a14
a21 0 a23 a24
a31 a32 0 a34
a41 a42 a43 0
The problem is to solve a set of linear equations:
a12 + a13 + a14 = c1
a21 + a23 + a24 = c2
and so on. We have 12 variables and 8 equations (4 for the rows and 4 for the columns). To solve a linear equation system in 12 variables, we generally need 12 equations. Since the number of equations is lesser, the system will not have a unique solution. It may have infinitely many solutions.

The matrix is
0 a12 a13 a14
a21 0 a23 a24
a31 a32 0 a34
a41 a42 a43 0
The problem is to solve a set of linear equations:
a12 + a13 + a14 = r1
a21 + a23 + a24 = r2
a31 + a32 + a34 = r3
a41 + a43 + a44 = r4
a21 + a31 + a41 = c1
a12 + a32 + a42 = c2
a13 + a23 + a43 = c3
a14 + a34 + a44 = c4
Thus you need to solve an equation of the form Ax = b with A consisting of only 0 and 1 coefficients. Use Gauss Elimination and Euclidian Algorithm to find integer Matrices S, D, T such that D is in Diagonal form and SDT = A. If you do not know how to do this search the web for Smith normal form algorithm.
Then
SDTx = Ax = b
Thus
DTx = S-1Ax = S-1b
Since D is in diagonal form you can check if you can solve
Dy = S-1b
for y. You also find a base for the (Homogenous) solution space. This in turn can then be used to cut down the complexity in the search for the positive solutions of the original equation.

Is there a specialized algorithm, faster than quicksort, to reorder data ACEGBDFH?

I have some data coming from the hardware. Data comes in blocks of 32 bytes, and there are potentially millions of blocks. Data blocks are scattered in two halves the following way (a letter is one block):
A C E G I K M O B D F H J L N P
or if numbered
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
First all blocks with even indexes, then the odd blocks. Is there a specialized algorithm to reorder the data correctly (alphabetical order)?
The constraints are mainly on space. I don't want to allocate another buffer to reorder: just one more block. But I'd also like to keep the number of moves low: a simple quicksort would be O(NlogN). Is there a faster solution in O(N) for this special reordering case?

Since this data is always in the same order, sorting in the classical sense is not needed at all. You do not need any comparisons, since you already know in advance which of two given data points.
Instead you can produce the permutation on the data directly. If you transform this into cyclic form, this will tell you exactly which swaps to do, to transform the permuted data into ordered data.
Here is an example for your data:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Now calculate the inverse (I'll skip this step, because I am lazy here, assume instead the permutation I have given above actually is the inverse already).
Here is the cyclic form:
(0)(1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14)(15)
So if you want to reorder a sequence structured like this, you would do
# first cycle
# nothing to do
# second cycle
swap 1 8
swap 8 4
swap 4 2
# third cycle
swap 3 9
swap 9 12
swap 12 6
# so on for the other cycles
If you would have done this for the inverse instead of the original permutation, you would get the correct sequence with a proven minimal number of swaps.
EDIT:
For more details on something like this, see the chapter on Permutations in TAOCP for example.

So you have data coming in in a pattern like
a0 a2 a4...a14 a1 a3 a5...a15
and you want to have it sorted to
b0 b1 b2...b15
With some reordering the permutation can be written like:
a0 -> b0
a8 -> b1
a1 -> b2
a2 -> b4
a4 -> b8
a9 -> b3
a3 -> b6
a6 -> b12
a12 -> b9
a10 -> b5
a5 -> b10
a11 -> b7
a7 -> b14
a14 -> b13
a13 -> b11
a15 -> b15
So if you want to sort in place it with only one block additional space in a temporary t, this could be done in O(1) with
t = a8; a8 = a4; a4 = a2; a2 = a1; a1 = t
t = a9; a9 = a12; a12= a6; a6 = a3; a9 = t
t = a10; a10 = a5; a5 = t
t = a11; a11 = a13; a13 = a14; a14 = a7; a7 = t
Edit:The general case (for N != 16), if it is solvable in O(N), is actually an interesting question. I suspect the cycles always start with a prime number which satisfies p < N/2 && N mod p != 0 and the indices have a recurrence like in+1 = 2in mod N, but I am not able to prove it. If this is the case, deriving an O(N) algorithm is trivial.

maybe i'm misunderstanding, but if the order is always identical to the one given then you can "pre-program" (ie avoiding all comparisons) the optimum solution (which is going to be the one that has the minimmum number of swaps to move from the string given to ABCDEFGHIJKLMNOP and which, for something this small, you can work out by hand - see LiKao's answer).

It is easier for me to label your set with numbers:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
Start from the 14 and move all even numbers to place (8 swaps). You will get this:
0 1 2 9 4 6 13 8 3 10 7 12 11 14 15
Now you need another 3 swaps (9 with 3, 7 with 13, 11 with 13 moved from 7).
A total of 11 swaps. Not a general solution, but it could give you some hints.

You can also view the intended permutation as a shuffle of the address-bits `abcd <-> dabc' (with abcd the individual bits of the index) Like:
#include <stdio.h>
#define ROTATE(v,n,i) (((v)>>(i)) | (((v) & ((1u <<(i))-1)) << ((n)-(i))))
/******************************************************/
int main (int argc, char **argv)
{
unsigned i,a,b;
for (i=0; i < 16; i++) {
a = ROTATE(i,4,1);
b = ROTATE(a,4,3);
fprintf(stdout,"i=%u a=%u b=%u\n", i, a, b);
}
return 0;
}
/******************************************************/

That was count sort I believe

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio