Selecting the "P" in the Prune and Search Algorithm

Note: the diagram above shows a partition into groups of 5 (the columns). The horizontal box denotes the median values of each partition. The 'P' item indicates the median of medians.
Most of the references I have seen use this picture when selecting their "P", and it always has an odd number of elements. But what if the number of elements you have is even?
Example:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
How do you get your "P" in a set with an even number of elements?

This explanation gives the detail I think you're looking for:
https://www.cs.duke.edu/courses/summer10/cps130/files/Edelsbrunner_Median.pdf
The median of the set plays a special role in this algorithm, and it
is defined as the i-smallest item where i = (n+1)/2 if n is odd and i =
n/2 or (n+2)/2 if n is even.
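
As a rough illustration of the quoted definition, here is a minimal Python sketch. The names `median_rank` and `select_p` and the `upper` flag are made up for this example, and the medians are found by plain sorting rather than by the recursive selection the real algorithm uses. It applies the even-n rule both inside each group and to the list of group medians.

```python
def median_rank(n, upper=False):
    """1-based rank i of the median among n items, per the quoted definition."""
    if n % 2 == 1:
        return (n + 1) // 2
    # n is even: either n/2 (lower median) or (n+2)/2 (upper median) is allowed
    return (n + 2) // 2 if upper else n // 2


def select_p(items, group_size=5, upper=False):
    """Pick "P" as the median of the group medians (sorting used for brevity)."""
    groups = [sorted(items[k:k + group_size])
              for k in range(0, len(items), group_size)]
    medians = sorted(g[median_rank(len(g), upper) - 1] for g in groups)
    return medians[median_rank(len(medians), upper) - 1]


numbers = list(range(1, 61))           # the 60 elements from the question
print(select_p(numbers))               # lower median of the 12 group medians
print(select_p(numbers, upper=True))   # upper median of the 12 group medians
```

With 60 elements in groups of 5 there are 12 group medians (3, 8, ..., 58), so "P" is either the 6th or the 7th smallest of them. Either choice keeps the guarantee that a constant fraction of the elements lies on each side of "P".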

How to subset rows from one dataframe based on matching values from a second smaller data frame in R

I want to select a control group from one data frame by matching on age against a second data frame. As an example, I have subject.df:
subject.df
id age
1 1 55
2 2 62
3 3 73
4 4 54
5 5 66
I'd like to subset control.df by matching age 1:1 against the subject.df data frame, so that each subject gets exactly one control of the same age.
control.df
id age
6 6 66
7 7 71
8 8 80
9 9 51
10 10 55
11 11 56
12 12 77
13 13 62
14 14 64
15 15 73
16 16 67
17 17 54
18 18 75
19 19 77
20 20 78
21 21 53
22 22 64
23 23 83
24 24 61
25 25 77
I'm fairly new to R. In the past I've used Matlab, and in this instance I would use a for loop to iterate over the control.df data frame, but I've been told that R doesn't always like for loops and that they can be slow in R.
In the end I'll be doing this on a much larger data set where the subject group is around 250 and the control group is more than 40K so I know that 1:1 matching is possible.
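
The matching logic itself can be sketched independently of R. The following Python sketch is only an assumption about what "1:1 matching" should mean here: each subject takes the first not-yet-used control with exactly the same age. The function name `match_controls` is hypothetical, and the data are the two tables from the question.

```python
from collections import defaultdict, deque


def match_controls(subjects, controls):
    """Map each subject id to one unused control id with the same age."""
    pool = defaultdict(deque)            # age -> queue of available control ids
    for control_id, age in controls:
        pool[age].append(control_id)
    matches = {}
    for subject_id, age in subjects:
        if pool[age]:                    # a control of this age is still free
            matches[subject_id] = pool[age].popleft()
    return matches


subjects = [(1, 55), (2, 62), (3, 73), (4, 54), (5, 66)]
controls = [(6, 66), (7, 71), (8, 80), (9, 51), (10, 55), (11, 56), (12, 77),
            (13, 62), (14, 64), (15, 73), (16, 67), (17, 54), (18, 75),
            (19, 77), (20, 78), (21, 53), (22, 64), (23, 83), (24, 61),
            (25, 77)]
print(match_controls(subjects, controls))
```

Grouping the controls by age once makes the work proportional to the size of the two tables, which matters with 40K controls. Subjects with no control of the same age are simply left out of the result.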

Is there a way to sum pairwise in Octave, vectorized (ie. mapping and reducing matrices)?

Is there a way to sum pairwise in Octave?
For example, suppose I have a 10-row by 4-column matrix. I want a new 10-row by 2-column matrix, where each column is the sum of one pair of adjacent columns.
ex.
[ 1 2 3 4
2 3 4 5
...
]
=> [ 3 7
5 9
...
]
I know how to accomplish this using for loops and accumarray etc, but I'm just not sure if there's a way to do it that is completely vectorized.
Here are a few more options.
Given:
a = reshape(1:40, 10, 4)
a =
1 11 21 31
2 12 22 32
3 13 23 33
4 14 24 34
5 15 25 35
6 16 26 36
7 17 27 37
8 18 28 38
9 19 29 39
10 20 30 40
Keep it simple
b = [sum(a(:,1:2),2) sum(a(:,3:4),2)]
b =
12 52
14 54
16 56
18 58
20 60
22 62
24 64
26 66
28 68
30 70
Squeeze a little
b = squeeze(sum(reshape(a, [], 2, 2), 2))
b =
12 52
14 54
16 56
18 58
20 60
22 62
24 64
26 66
28 68
30 70
Or, my personal favorite...
Mathemagic
b = a * [1 1 0 0; 0 0 1 1].'
b =
12 52
14 54
16 56
18 58
20 60
22 62
24 64
26 66
28 68
30 70
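
To see why the matrix product works for any even number of columns, here is a small pure-Python sketch (not Octave; the helper name `pairwise_column_sums` is made up for illustration). Row k of the selector matrix has ones exactly at the two columns that form pair k, so multiplying by its transpose adds those two columns.

```python
def pairwise_column_sums(a):
    """Sum columns (1,2), (3,4), ... of a row-major matrix via a 0/1 selector."""
    n_cols = len(a[0])
    # selector[k][j] is 1 when column j belongs to pair k, e.g. [1 1 0 0; 0 0 1 1]
    selector = [[1 if j // 2 == k else 0 for j in range(n_cols)]
                for k in range(n_cols // 2)]
    # Equivalent to b = a * selector.' written out element by element
    return [[sum(row[j] * sel[j] for j in range(n_cols)) for sel in selector]
            for row in a]


# Same values as reshape(1:40, 10, 4): row r is [r, r+10, r+20, r+30]
a = [[r + 10 * c for c in range(4)] for r in range(1, 11)]
b = pairwise_column_sums(a)
print(b[0], b[-1])
```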
Perhaps someone will come up with a better idea:
a = [1 2 3 4; 2 3 4 5]
b = reshape (sum (reshape (a.', 2, [])), [], rows(a)).'
gives
b =
3 7
5 9

How does the "successive passes in opposite direction" improvement work for bubble sort?

According to Data Structures Using C by Tenenbaum, one improvement to bubble sort is to have successive passes go in opposite directions, so that small elements move quickly to the front, which reduces the required number of passes [pg 336].
I worked out two examples: one that supports this statement and one that contradicts it.
Supports: 25 48 37 12 57 86 33 92
iterations using usual Bubble sort :
25 48 37 12 57 86 33 92
25 37 12 48 57 33 86 92
25 12 37 48 33 57 86 92
12 25 37 33 48 57 86 92
12 25 33 37 48 57 86 92
iterations using improvement:
25 48 37 12 57 86 33 92
25 37 12 48 57 33 86 92
12 25 37 33 48 57 86 92
12 25 33 37 48 57 86 92
Against: 3 4 1 2 5
iterations using usual Bubble sort:
3 4 1 2 5
3 1 2 4 5
1 2 3 4 5
iterations using improvement:
3 4 1 2 5
3 1 2 4 5
1 3 2 4 5
1 2 3 4 5
So is the statement that this improvement will always help incorrect? Or am I doing something wrong here?
The example you gave above shows that this algorithm isn't a strict improvement over a standard bubble sort.
The advantage of this approach (sometimes called "cocktail sort," by the way) is that in cases where there are a lot of small elements at the end of the array, it rapidly pulls them to the front compared against normal bubble sort. For example, consider this array:
2 3 4 5 6 7 8 9 10 11 12 ... 10,000,000 1
With a normal bubble sort, it would take 9,999,999 passes over this array to sort it because the element 1, which is way out of place, only gets swapped one step forward on each iteration. On the other hand, with a cocktail sort, this would take just two passes - one initial pass and then a reverse pass.
While the above example is definitely contrived, in a randomly-shuffled array, there are likely going to be some smaller elements toward the end of the array and the number of passes of bubblesort is going to have to be large to move them back. Going in both directions helps speed this up.
That said, bubblesort is a pretty poor choice of a sorting algorithm, so hopefully this is just a theoretical discussion. :-)
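
The pass counts discussed above can be reproduced with a short Python sketch. It makes one assumption that matches the listings in the question: a "pass" is counted only if it performs at least one swap, so the final pass that merely confirms the array is sorted is not counted. The function names are made up for this example.

```python
def bubble_passes(values):
    """Number of swapping passes made by a plain left-to-right bubble sort."""
    a = list(values)
    passes = 0
    while True:
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            return passes
        passes += 1


def cocktail_passes(values):
    """Number of swapping passes when the direction alternates every pass."""
    a = list(values)
    passes = 0
    forward = True
    while True:
        swapped = False
        order = range(len(a) - 1) if forward else range(len(a) - 2, -1, -1)
        for i in order:
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            return passes
        passes += 1
        forward = not forward


supports = [25, 48, 37, 12, 57, 86, 33, 92]
against = [3, 4, 1, 2, 5]
small_at_end = list(range(2, 1001)) + [1]   # scaled-down version of the 2..10,000,000, 1 array
for data in (supports, against, small_at_end):
    print(bubble_passes(data), cocktail_passes(data))
```

The third input is a smaller stand-in for the contrived array in the answer, so that the plain bubble sort finishes quickly.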

vectorized indexing of matrices with other matrices (in octave)

Suppose we have a 2D (5x5) matrix:
test =
39 13 90 5 71
60 78 38 4 11
87 92 46 45 35
40 96 61 17 1
90 50 46 89 63
And a second 2D (5x2) matrix:
tidx =
1 3
2 4
2 3
2 4
4 5
And now we want to use tidx as an index into test, so that we get the following output:
out =
39 90
78 4
92 46
96 17
89 63
One way to do this is with a for loop...
for i=1:size(test,1)
out(i,:) = test(i,tidx(i,:));
end
Question:
Is there a way to vectorize this so the same output is generated without a for loop?
Here is one way:
test(repmat([1:rows(test)]',1,columns(tidx)) + (tidx-1)*rows(test))
What you describe is an index problem. When you place a matrix all in one dimension, you get
test(:) =
39
60
87
40
90
13
78
92
96
50
90
38
46
61
46
5
4
45
17
89
71
11
35
1
63
This can be indexed using a single number. Here is how you figure out how to transform tidx into the correct format.
First, I use the above reference to figure out the index numbers which are:
outinx =
1 11
7 17
8 13
9 19
20 25
Then I start trying to figure out the pattern. This calculation gives a clue:
(tidx-1)*rows(test) =
0 10
5 15
5 10
5 15
15 20
This will move the index count to the correct column of test. Now I just need the correct row.
outinx-(tidx-1)*rows(test) =
1 1
2 2
3 3
4 4
5 5
This pattern is created by the for loop. I created that matrix with:
[1:rows(test)]' * ones(1,columns(tidx))
EDIT: This does the same thing with a built-in function.
repmat([1:rows(test)]',1,columns(tidx))
I then add the 2 together and use them as the index for test.
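
The same index arithmetic can be checked outside Octave. This pure-Python sketch (variable names follow the answer; the 1-based numbering is kept on purpose to mirror Octave) flattens test in column-major order, builds outinx as row + (column - 1) * rows, and looks the values up.

```python
test = [[39, 13, 90, 5, 71],
        [60, 78, 38, 4, 11],
        [87, 92, 46, 45, 35],
        [40, 96, 61, 17, 1],
        [90, 50, 46, 89, 63]]
tidx = [[1, 3], [2, 4], [2, 3], [2, 4], [4, 5]]

n_rows = len(test)
n_cols = len(test[0])

# Column-major flattening, the equivalent of test(:) in Octave
flat = [test[r][c] for c in range(n_cols) for r in range(n_rows)]

# 1-based linear index: row number plus (column - 1) * number of rows
outinx = [[(r + 1) + (col - 1) * n_rows for col in cols]
          for r, cols in enumerate(tidx)]

# Look up each linear index (subtract 1 because Python lists are 0-based)
out = [[flat[k - 1] for k in row] for row in outinx]
print(outinx)
print(out)
```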

Finding a set of permutations, with a constraint

I have a set of N^2 numbers and N bins. Each bin is supposed to have N numbers from the set assigned to it. The problem I am facing is finding a set of distributions that map the numbers to the bins, satisfying the constraint, that each pair of numbers can share the same bin only once.
A distribution can nicely be represented by an NxN matrix, in which each row represents a bin. Then the problem is finding a set of permutations of the matrix' elements, in which each pair of numbers shares the same row only once. It's irrelevant which row it is, only that two numbers were both assigned to the same one.
Example set of 3 permutations satisfying the constraint for N=8:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
0 9 18 27 36 45 54 63
1 10 19 28 37 46 55 56
2 11 20 29 38 47 48 57
3 12 21 30 39 40 49 58
4 13 22 31 32 41 50 59
5 14 23 24 33 42 51 60
6 15 16 25 34 43 52 61
7 8 17 26 35 44 53 62
A permutation that doesn't belong in the above set:
0 10 20 30 32 42 52 62
1 11 21 31 33 43 53 63
2 12 22 24 34 44 54 56
3 13 23 25 35 45 55 57
4 14 16 26 36 46 48 58
5 15 17 27 37 47 49 59
6 8 18 28 38 40 50 60
7 9 19 29 39 41 51 61
Because of multiple collisions with the second permutation, since, for example they're both pairing the numbers 0 and 32 in one row.
Enumerating three is easy: the set consists of one arbitrary permutation, its transposition, and a matrix whose rows are made of the previous matrix's diagonals.
I can't find a way to produce a set consisting of more though. It seems to be either a very complex problem, or a simple problem with an unobvious solution. Either way I'd be thankful if somebody had any ideas how to solve it in reasonable time for the N=8 case, or identified the proper, academic name of the problem, so I could google for it.
In case you were wondering what it is useful for, I'm looking for a scheduling algorithm for a crossbar switch with 8 buffers, which serves traffic to 64 destinations. This part of the scheduling algorithm is input traffic agnostic, and switches cyclically between a number of hardwired destination-buffer mappings. The goal is to have each pair of destination addresses compete for the same buffer only once in the cycling period, and to maximize that period's length. In other words, each pair of addresses should compete for the same buffer as seldom as possible.
EDIT:
Here's some code I have.
CODE
It's greedy, it usually terminates after finding the third permutation. But there should exist a set of at least N permutations satisfying the problem.
The alternative would require that choosing permutation I involved looking for permutations (I+1..N), to check if permutation I is part of the solution consisting of the maximal number of permutations. That'd require enumerating all permutations to check at each step, which is prohibitively expensive.
What you want is a combinatorial block design. Using the nomenclature on the linked page, you want designs of size (n^2, n, 1) for maximum k. This will give you n(n+1) blocks, that is, n+1 permutations in your nomenclature, since each permutation is made of n blocks (rows). This is the maximum theoretically possible by a counting argument (see the explanation in the article for the derivation of b from v, k, and lambda). Such designs exist for n = p^k for some prime p and integer k, using an affine plane. It is conjectured that the only affine planes that exist are of this size. Therefore, if you can select n, maybe this answer will suffice.
However, if instead of the maximum theoretically possible number of permutations, you just want to find a large number (the most you can for a given n^2), I am not sure what the study of these objects is called.
Make a 64 x 64 x 8 array: bool forbidden[i][j][k] which indicates whether the pair (i,j) has appeared in row k. Each time you use the pair (i, j) in the row k, you will set the associated value in this array to one. Note that you will only use the half of this array for which i < j.
To construct a new permutation, start by trying the member 0, and verify that at least seven of the entries forbidden[0][j][0] are unset. If there are not seven left, increment and try again. Repeat to fill out the rest of the row. Repeat this whole process to fill the entire NxN permutation.
There are probably optimizations you should be able to come up with as you implement this, but this should do pretty well.
Possibly you could reformulate your problem into graph theory. For example, you start with the complete graph with N×N vertices. At each step, you partition the graph into N N-cliques, and then remove all edges used.
For this N=8 case, K64 has 64×63/2 = 2016 edges, and sixty-four lots of K8 have 1792 edges, so your problem may not be impossible :-)
Right, the greedy style doesn't work because you run out of numbers.
It's easy to see that there can't be more than 63 permutations before you violate the constraint. On the 64th, you'll have to pair at least one of the numbers with another it's already been paired with. The pigeonhole principle.
In fact, if you use the table of forbidden pairs I suggested earlier, you find that at most N+1 = 9 permutations are possible before you run out. The table has N^2 x (N^2-1)/2 = 2016 non-redundant constraints, and each new permutation creates N x (N choose 2) = 8 x 28 = 224 new pairings. So all the pairings will be used up after 2016/224 = 9 permutations. It seems like realizing that there are so few permutations is the key to solving the problem.
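
The counting argument can be checked with a few lines of Python. This is only the arithmetic, under the assumption that every permutation fills N rows of N numbers each; the function name `max_permutations` is made up for this example.

```python
from math import comb


def max_permutations(N):
    """Upper bound on permutations in which no pair shares a row twice."""
    total_pairs = comb(N * N, 2)          # all unordered pairs of the N^2 numbers
    pairs_per_permutation = N * comb(N, 2)  # N rows, each pairing N numbers
    return total_pairs // pairs_per_permutation


print(comb(64, 2), 8 * comb(8, 2), max_permutations(8))
```

For every N >= 2 the bound simplifies to (N^2 - 1)/(N - 1) = N + 1.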
You can generate a list of N permutations numbered n = 0 ... N-1 as
A_ij = (i * N + j + j * n * N) mod N^2
which generates a new permutation by shifting the columns in each permutation. The rows of the nth permutation are the diagonals of the (n-1)th permutation. EDIT: Oops... this only appears to work when N is prime.
This misses one last permutation, which you can get by transposing the matrix:
A_ij = j * N + i
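
Here is a short Python sketch of the two formulas together with a brute-force check of the constraint. The function names are made up for this example. For a prime N such as 5 it finds no repeated pair across the N+1 permutations, while for N = 8 it does find repeats, which agrees with the caveat in the EDIT above.

```python
from itertools import combinations


def build_permutations(N):
    """N permutations from A_ij = (i*N + j + j*n*N) mod N^2, plus the transpose."""
    perms = [[[(i * N + j + j * n * N) % (N * N) for j in range(N)]
              for i in range(N)]
             for n in range(N)]
    perms.append([[j * N + i for j in range(N)] for i in range(N)])
    return perms


def count_repeated_pairs(perms):
    """Count how many times a pair of numbers shares a row it has shared before."""
    seen = set()
    repeats = 0
    for perm in perms:
        for row in perm:
            for pair in combinations(sorted(row), 2):
                if pair in seen:
                    repeats += 1
                seen.add(pair)
    return repeats


print(count_repeated_pairs(build_permutations(5)))       # prime N
print(count_repeated_pairs(build_permutations(8)) > 0)   # N = 8 is not prime
```

For N = 8 = 2^3 a full set of 9 permutations still exists according to the affine plane answer, but it needs arithmetic in a finite field of order 8 rather than the mod-N arithmetic used here.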
