MATLAB - Split a large matrix into submatrices efficiently - performance

I have a large rectangular matrix (like 3000x3000) in MATLAB and I need to split it into several square submatrices (starting from 1x1 and ending at 3000x3000) to do some scientific calculations.
a simplified example would be as follows:
01 02 03 04
05 06 07 08
09 10 11 12
13 14 15 16
submatrices:
1x1: [01],[02].... [16]
2x2: [[01,02], [05,06]], [[02,03], [06,07]].... [[11,12],[15,16]]
until
4x4: same as the given matrix.
I have used several for loops but it takes a lot of time and seems not efficient.
How can I do that more efficiently?

Related

forvalues and xtile in Stata

What do the last two lines do? As far as I understand, these lines loop through the list h_nwave and calculate the weighted quantiles, if syear2digit == 'nwave' , i.e. calculate 5 quantiles for each year. But I'm not sure if my understanding is correct. Also is this equivalent to using group() function?
h_nwave "91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15"
generate quantile_ip = .
forvalues number = 1(1)15 {
local nwave : word `number' of `h_nwave'
xtile quantile_ip_`nwave' = a_ip if syear2digit == `nwave' [ w = weight ], nq(5)
replace quantile_ip = quantile_ip_`nwave' if syear2digit == `nwave'
}
I try to convert this into R with forloop, mutate, xtile (statar package required) and case_when. However, so far I cannot find a suitable way to get similar result.
There is no source or context for this code.
Detail: The first command is truncated and presumably should have been
local h_nwave 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
Detail: The first list contains 25 values, presumably corresponding to years 1991 to 2015. But the second list implies 15 values, so we are only looking at 91 to 05.
Main idea: xtile bins to quintile bins on variable a_ip, with weights. So the lowest 20% of observations (taking weighting into account) should be in bin 1, and so on. In practice observations with the same value must be assigned to the same bin, so 20-20-20-20-20 splits are not guaranteed, quite apart from the small print of whether sample size is a multiple of 5. So, the result is assignment to bins 1 to 5, and not quintiles themselves, or any other kind quantiles.
This is done separately for each survey wave.
The xtile command is documented for everyone at https://www.stata.com/manuals/dpctile.pdf regardless of personal or workplace access to Stata.
In R, you may well be able to produce quintile bins for all survey years at once. I have no idea how to do that.
Otherwise put, the loop arises because xtile doesn't work on separate subsets in one command call. There are community-contributed Stata commands that allow that. This kind of topic is much discussed on Statalist.

How to calculate / describe relative position (rubix cube)

This is an algorithmic problem. I can't seem to find a way to compare relative positions of 2 cubes in a rubix cube.
I've numbered all the 20 cubes in my program. and I'm using their this coordinate system, but now that I wanted to model two cubes in relative position I'm having trouble.
For example, say I saw the two cubes I'm watching in position 8 and 10, then later I saw them in position 12 and 13, well in both situations they're both on the same face of the cube, and they're both across from each other, not adjacent. Relatively speaking, that's the same representation of their location.
(By the way I'm only concerned with the "edge cubes" at this point, that's not the corners, so: 8 10 9 11 12 13 14 15 16 17 18 19 positions).
So anyway I thought if I listed every position in relation to each staring point, using the same algorithm to list each one, then I could compare the indexes and if they were the same, the relative position would be the same (but I was wrong, I might be on the right track, but it doesn't always work):
08 10 18 16 12 13 14 15 09 11 19 17
09 11 19 17 13 14 15 12 10 08 16 18
10 18 16 08 14 15 12 13 11 09 17 19
11 19 17 09 15 12 13 14 08 10 18 16
12 13 14 15 11 19 17 09 16 08 10 18
13 14 15 12 08 16 18 10 17 09 11 19
14 15 12 13 09 17 19 11 18 10 08 16
15 12 13 14 10 18 16 08 19 11 09 17
16 08 10 18 19 17 09 11 13 12 15 14
17 09 11 19 16 18 10 08 14 13 12 15
18 16 08 10 17 19 11 09 15 14 13 12
19 17 09 11 18 16 08 10 12 15 14 13
Consider the following two positions: cube A is at potion 19 and cube b is at 16. they're adjacent on the bottom level. Here's "19" row and it's indices to 16:
0 1 2 3 4 5
19 17 09 11 18 16 08 10 12 15 14 13
Now compare that to the relative position of the cube c and d at 13 and 9. C and D are adjacent on the right side, so they should have the same relative position. But my method doesn't determine that.
0 1 2 3 4 5 6 7 8 9
13 14 15 12 08 16 18 10 17 09 11 19
index 6 is not equal to index 9. Anyway that was my best approach and it took all day to come up with.
Does anyone have any other strategies that come to mind for calculating / expressing relative position between two locations on a cube?
Thanks very much for your help, and consideration on this topic!
There are two problems here:
I think you made a mistake when you calculated the relative positions from cube 13. I get:
0 1 2 3 4 5 6 7 8 9 10 11
13 14 15 12 17 09 11 19 08 16 18 10
This lines up with the other one, so cube 9 occurs at position 5. Compare this with the first row:
0 1 2 3 4 5 6
19 17 09 11 18 16 08 10 12 15 14 13
As required, cube 16 also occurs at position 5 (I think you mixed something up in your question. You mention index 6 when you mean 5. You number the indexes up to 6, but at position 6 there is cube 8, not cube 16. Please check that again).
The second problem is that given only a cube position without a reference cube for the orientation, there are two ways to number the cubes. Since your cube is not colored, you can rotate the cube by 180 degrees and come to another numbering for the reference cubes. Given that the relative positions for cube 19 are correct, I can also number the relative positions for cube 13 like this:
0 1 2 3 4 5 6 7 8 9 10 11
13 12 15 14 08 16 18 10 17 09 11 19
Note that this is close to your version but indexes 1 to 3 are in a different order. I think you were not consistent in the way you looked at the cube.
The main problem already becomes apparent in this paragraph:
For example, say I saw the two cubes I'm watching in position 8 and
10, then later I saw them in position 12 and 13, well in both
situations they're both on the same face of the cube, and they're both
across from each other, not adjacent. Relatively speaking, that's the
same representation of their location.
For every cube, there are two other cubes being on the same face and across from each other. To eliminate this ambiguity, you have to take orientations into account or reduce the number of relative positions (e.g. index 1 and 3 in your current scheme would denote the same relative position).

Convert a binary min-heap to a binary search tree in place [duplicate]

We are given an array of 2m - 1 distinct, comparable elements, indexed starting from 1.
We can view the array as a complete binary tree:
Node is placed at index i.
Left child is placed at 2i.
Right child is placed at 2i+1.
For instance, the array
[7 6 4 5 2 3 1]
is the tree
7
/ \
6 4
/ \ / \
5 2 3 1
Now when viewed as a binary tree, these elements satisfy the heap property, a node is greater than both its children:
A[i] > A[2i] and A[i] > A[2i+1]
Are there reasonably fast, in-place algorithm to shuffle the elements of the array around so that the resulting binary tree (as described above) is a binary search tree?
Recall that in a binary search tree, a node is greater than all its left descendants, and less than all its right descendants.
For instance the reshuffle of the above array would be
[4 2 6 1 3 5 7]
which corresponds to the binary search tree
4
/ \
2 6
/ \ / \
1 3 5 7
First we note that we can -- without loss of generality -- assume that we have the elements 1,2,3,... 2^m-1 in our binary tree. So, from now on, we assume that we have these numbers.
Then, my attempt would be some function to convert a sorted array (i.e. 1 2 3 4 5) into an array representing a sorted binary tree.
In a sorted binary tree with (2^m)-1 elements we have always that the "bottom" of the tree consists of all the uneven numbers, e.g. for m=3:
4
2 6
1 3 5 7
This means, in the corresponding array, we have that the last numbers are all the uneven numbers:
4 2 6 1 3 5 7
-------
^
uneven numbers!
So we can construct the last "row" of the binary tree by ensuring that the last 2^(m-1) numbers in the corresponding array are all the uneven numbers. So all we need to do for the last row is to construct a function that moves all elements at positions with uneven indices to the last row.
So let us for now assume that we have a routine that -- given a sorted array as input -- establishes the last row correctly.
Then we can call the routine for the whole array to construct the last row while all other elements stay sorted. When we apply this routine on the array 1 2 3 4 5 6 7, we have the following situation:
2 4 6 1 3 5 7
-------
^
correct!
After the first round, we apply the routine for the remaining subarray (namely 2 4 6) which constructs the second last "row" of our binary tree, while we leave the remaining elements unchanged, so we get the following:
now correct as well!
v
---
4 2 6 1 3 5 7
-------
^
correct from run before
So all we have to do is to construct a function that installs the last row (i.e. the second half of the array) correctly!
This can be done in O(n log n) where n is the input size of the array. Therefore, we just traverse the array from end to the beginning and exchange the uneven positions in such a way that the last row (i.e. the latter half of the array) is correct. This can be done in-place. Afterwards, we sort the first half of the array (using e.g. heapsort). So the whole runtime of this subroutine is O(n log n).
So the runtime for an array of size n in total is:
O(n log n) + O(n/2 log n/2) + O(n/4 log n/4) + ... which is the same as O(n log n). Note that we have to use a in-place sorting algorithm such as Heapsort so that this whole stuff works completely in-place.
I'm sorry that I can't elaborate it further, but I think you can get the idea.
Let n = 2m - 1. In linear time, we can both make a max-heap and extract the elements of a binary search tree in sorted order, so the best we can hope for (assuming comparison-based algorithms) is O(n log n) time and O(1) space. Here is such an algorithm.
For j = n down to 1, pop the max element from the j-element max-heap and store it at (newly vacated) location j. This sorts the array.
Convert the sorted array to a binary search tree with a divide and conquer strategy. (Naively this is Omega(log n) space, but I believe we can compress the stack to O(1) log(n)-bit words.)
a. Treeify the elements less than the root.
b. Treeify the elements greater than the root.
c. Merge the trees by rotating the leaves less than the root into position (= three reverses) so as to leave a subproblem of half the size (O(n)).
(08 04 12 02 06 10 14 01 03 05 07 09 11 13 15)16(24 20 28 18 22 26 30 17 19 21 23 25 27 29 31)
(08 04 12 02 06 10 14)16(24 20 28 18 22 26 30)01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31
(08 04 12)16(24 20 28)02 06 10 14 18 22 26 30 01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31
(08)16(24)04 12 20 28 02 06 10 14 18 22 26 30 01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31
16 08 24 04 12 20 28 02 06 10 14 18 22 26 30 01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31
Just some basic ideas:
A binary search tree is a binary tree.
Both children of the root are either nil or themselves binary search trees
The values satisfy the following condition: left child < root < right child
Condition 1 is not problem - the heap is a binary tree as well.
Condition 2 is problematic but suggests a bottom up approach.
Condition 3 is not satisfied as well.
Bottom up means:
- We start with all leaves - this is unproblematic, they are binary search trees.
- Now we continue with a recursive walk through each level of parents up to the root.
- Swap the subtrees if the left child is larger than the right child.
- Swap the root with the larger value of the 2 children (it's the right child)
- This might not be enough - you might need to continue to correct the right subtree until it is a binary search tree again.
This should work. But still - removing the top element and inserting it into a self balancing tree will be the faster/better approach and a lot easier to implement (e.g. using standard components like std::map in c++).
Another idea: for binary search trees holds the property that a left-root-right walk through the tree obtains the sorted values. This could be done reverse. Getting the values sorted from the heap should be easy as well. Just try to combine this - reading from the heap and writing the tree directly from the sorted values. This can be done in O(n) I think - but I'm not sure wether it can be done in place or not - I guess not.

Figuring out how to decode obfuscated URL parameters

I have web based system that uses encrypted GET parameters. I need to figure out what encryption is used and create a PHP function to recreate it. Any ideas?
Example URL:
...&watermark=ISpQICAK&width=IypcOysK&height=IypcLykK&...
You haven't provided nearly enough sample data for us to reliably guess even the alphabet used to encode it, much less what structure it might have.
What I can tell, from the three sample values you've provided, is:
There is quite a lot of redundancy in the data — compare e.g. width=IypcOysK and height=IypcLykK (and even watermark=ISpQICAK, though that might be just coincidence). This suggests that the data is neither random nor securely encrypted (which would make it look random).
The alphabet contains a fairly broad range of upper- and lowercase letters, from A to S and from c to y. Assuming that the alphabet consists of contiguous letter ranges, that means a palette of between 42 and 52 possible letters. Of course, we can't tell with any certainty from the samples whether other characters might also be used, so we can't even entirely rule out Base64.
This is not the output of PHP's base_convert function, as I first guessed it might be: that function only handles bases up to 36, and doesn't output uppercase letters.
That, however, is just about all. It would help to see some more data samples, ideally with the plaintext values they correspond to.
Edit: The id parameters you give in the comments are definitely in Base64. Besides the distinctive trailing = signs, they both decode to simple strings of nine printable ASCII characters followed by a line feed (hex 0A):
_Base64___________Hex____________________________ASCII_____
JiJQPjNfT0MtCg== 26 22 50 3e 33 5f 4f 43 2d 0a &"P>3_OC-.
JikwPClUPENICg== 26 29 30 3c 29 54 3c 43 48 0a &)0<)T<CH.
(I've replaced non-printable characters with a . in the ASCII column above.) On the assumption that all the other parameters are Base64 too, let's see what they decode to:
_Base64___Hex________________ASCII_
ISpQICAK 21 2a 50 20 20 0a !*P .
IypcOysK 23 2a 5c 3b 2b 0a #*\;+.
IypcLykK 23 2a 5c 2f 29 0a #*\/).
ISNAICAK 21 23 40 20 20 0a !## .
IyNAPjIK 23 23 40 3e 32 0a ###>2.
IyNAKjAK 23 23 40 2a 30 0a ###*0.
ISggICAK 21 28 20 20 20 0a !( .
IikwICAK 22 29 30 20 20 0a ")0 .
IilAPCAK 22 29 40 3c 20 0a ")#< .
So there's definitely another encoding layer involved, but we can already see some patterns:
All decoded values consist of a constant number of printable ASCII characters followed by a trailing line feed character. This cannot be a coincidence.
Most of the characters are on the low end of the printable ASCII range (hex 20 – 7E). In particular, the lowest printable ASCII character, space = hex 20, is particularly common, especially in the watermark strings.
The strings in each URL resemble each other more than they resemble the corresponding strings from other URLs. (But there are resemblances between URLs too: for example, all the decoded watermark values begin with ! = hex 21.)
In fact, the highest numbered character that occurs in any of the strings is _ = hex 5F, while the lowest (excluding the line feeds) is space = hex 20. Their difference is hex 3F = decimal 63. Coincidence? I think not. I'll guess that the second encoding layer is similar to uuencoding: the data is split into 6-bit groups (as in Base64), and each group is mapped to an ASCII character simply by adding hex 20 to it.
In fact, it looks like the second layer might be uuencoding: the first bytes of each string have the right values to be uuencode length indicators. Let's see what we get if we try to decode them:
_Base64___________UUEnc______Hex________________ASCII___re-UUE____
JiJQPjNfT0MtCg== &"P>3_OC- 0b 07 93 fe f8 cd ...... &"P>3_OC-
JikwPClUPENICg== &)0<)T<CH 25 07 09 d1 c8 e8 %..... &)0<)T<CH
_Base64___UUEnc__Hex_______ASC__re-UUE____
ISpQICAK !*P 2b + !*P``
IypcOysK #*\;+ 2b c6 cb +.. #*\;+
IypcLykK #*\/) 2b c3 c9 +.. #*\/)
ISNAICAK !## 0e . !##``
IyNAPjIK ###>2 0e 07 92 ... ###>2
IyNAKjAK ###*0 0e 02 90 ... ###*0
ISggICAK !( 20 !(```
IikwICAK ")0 25 00 %. ")0``
IilAPCAK ")#< 26 07 &. ")#<`
This is looking good:
Uudecoding and re-encoding the data (using Perl's unpack "u" and pack "u") produces the original string, except that trailing spaces are replaced with ` characters (which falls within acceptable variation between encoders).
The decoded strings are no longer printable ASCII, which suggests that we might be closer to the real data.
The watermark strings are now single characters. In two cases out of three, they're prefixes of the corresponding width and height strings. (In the third case, which looks a bit different, the watermark might perhaps have been added to the other values.)
One more piece of the puzzle — comparing the ID strings and corresponding numeric values you give in the comments, we see that:
The numbers all have six digits. The first two digits of each number are the same.
The uudecoded strings all have six bytes. The first two bytes of each string are the same.
Coincidence? Again, I think not. Let's see what we get if we write the numbers out as ASCII strings, and XOR them with the uudecoded strings:
_Num_____ASCII_hex___________UUDecoded_ID________XOR______________
406747 34 30 36 37 34 37 25 07 09 d1 c8 e8 11 37 3f e6 fc df
405174 34 30 35 31 37 34 25 07 0a d7 cb eb 11 37 3f e6 fc df
405273 34 30 35 32 37 33 25 07 0a d4 cb ec 11 37 3f e6 fc df
What is this 11 37 3f e6 fc df string? I have no idea — it's mostly not printable ASCII — but XORing the uudecoded ID with it yields the corresponding ID number in three cases out of three.
More to think about: you've provided two different ID strings for the value 405174: JiJQPjNfT0MtCg== and JikwPCpVXE9LCg==. These decode to 0b 07 93 fe f8 cd and 25 07 0a d7 cb eb respectively, and their XOR is 2e 00 99 29 33 26. The two URLs from which these ID strings came from have decoded watermarks of 0e and 20 respectively, which accounts for the first byte (and the second byte is the same in both, anyway). Where the differences in the remaining four bytes come from is still a mystery to me.
That's going to be difficult. Even if you find the encryption method and keys, the original data is likely salted and the salt is probably varied with each record.
That's the point of encryption.

Finding a set of permutations, with a constraint

I have a set of N^2 numbers and N bins. Each bin is supposed to have N numbers from the set assigned to it. The problem I am facing is finding a set of distributions that map the numbers to the bins, satisfying the constraint, that each pair of numbers can share the same bin only once.
A distribution can nicely be represented by an NxN matrix, in which each row represents a bin. Then the problem is finding a set of permutations of the matrix' elements, in which each pair of numbers shares the same row only once. It's irrelevant which row it is, only that two numbers were both assigned to the same one.
Example set of 3 permutations satisfying the constraint for N=8:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
0 9 18 27 36 45 54 63
1 10 19 28 37 46 55 56
2 11 20 29 38 47 48 57
3 12 21 30 39 40 49 58
4 13 22 31 32 41 50 59
5 14 23 24 33 42 51 60
6 15 16 25 34 43 52 61
7 8 17 26 35 44 53 62
A permutation that doesn't belong in the above set:
0 10 20 30 32 42 52 62
1 11 21 31 33 43 53 63
2 12 22 24 34 44 54 56
3 13 23 25 35 45 55 57
4 14 16 26 36 46 48 58
5 15 17 27 37 47 49 59
6 8 18 28 38 40 50 60
7 9 19 29 39 41 51 61
Because of multiple collisions with the second permutation, since, for example they're both pairing the numbers 0 and 32 in one row.
Enumerating three is easy, it consists of 1 arbitrary permutation, its transposition and a matrix where the rows are made of the previous matrix' diagonals.
I can't find a way to produce a set consisting of more though. It seems to be either a very complex problem, or a simple problem with an unobvious solution. Either way I'd be thankful if somebody had any ideas how to solve it in reasonable time for the N=8 case, or identified the proper, academic name of the problem, so I could google for it.
In case you were wondering what is it useful for, I'm looking for a scheduling algorithm for a crossbar switch with 8 buffers, which serves traffic to 64 destinations. This part of the scheduling algorithm is input traffic agnostic, and switches cyclically between a number of hardwired destination-buffer mappings. The goal is to have each pair of destination addresses compete for the same buffer only once in the cycling period, and to maximize that period's length. In other words, so that each pair of addresses was competing for the same buffer as seldom as possible.
EDIT:
Here's some code I have.
CODE
It's greedy, it usually terminates after finding the third permutation. But there should exist a set of at least N permutations satisfying the problem.
The alternative would require that choosing permutation I involved looking for permutations (I+1..N), to check if permutation I is part of the solution consisting of the maximal number of permutations. That'd require enumerating all permutations to check at each step, which is prohibitively expensive.
What you want is a combinatorial block design. Using the nomenclature on the linked page, you want designs of size (n^2, n, 1) for maximum k. This will give you n(n+1) permutations, using your nomenclature. This is the maximum theoretically possible by a counting argument (see the explanation in the article for the derivation of b from v, k, and lambda). Such designs exist for n = p^k for some prime p and integer k, using an affine plane. It is conjectured that the only affine planes that exist are of this size. Therefore, if you can select n, maybe this answer will suffice.
However, if instead of the maximum theoretically possible number of permutations, you just want to find a large number (the most you can for a given n^2), I am not sure what the study of these objects is called.
Make a 64 x 64 x 8 array: bool forbidden[i][j][k] which indicates whether the pair (i,j) has appeared in row k. Each time you use the pair (i, j) in the row k, you will set the associated value in this array to one. Note that you will only use the half of this array for which i < j.
To construct a new permutation, start by trying the member 0, and verify that at least seven of forbidden[0][j][0] that are unset. If there are not seven left, increment and try again. Repeat to fill out the rest of the row. Repeat this whole process to fill the entire NxN permutation.
There are probably optimizations you should be able to come up with as you implement this, but this should do pretty well.
Possibly you could reformulate your problem into graph theory. For example, you start with the complete graph with N×N vertices. At each step, you partition the graph into N N-cliques, and then remove all edges used.
For this N=8 case, K64 has 64×63/2 = 2016 edges, and sixty-four lots of K8 have 1792 edges, so your problem may not be impossible :-)
Right, the greedy style doesn't work because you run out of numbers.
It's easy to see that there can't be more than 63 permutations before you violate the constraint. On the 64th, you'll have to pair at least one of the numbers with another its already been paired with. The pigeonhole principle.
In fact, if you use the table of forbidden pairs I suggested earlier, you find that there are a maximum of only N+1 = 9 permutations possible before you run out. The table has N^2 x (N^2-1)/2 = 2016 non-redundant constraints, and each new permutation will create N x (N choose 2) = 28 new pairings. So all the pairings will be used up after 2016/28 = 9 permutations. It seems like realizing that there are so few permutations is the key to solving the problem.
You can generate a list of N permutations numbered n = 0 ... N-1 as
A_ij = (i * N + j + j * n * N) mod N^2
which generates a new permutation by shifting the columns in each permutation. The top row of the nth permutation are the diagonals of the n-1th permutation. EDIT: Oops... this only appears to work when N is prime.
This misses one last permutation, which you can get by transposing the matrix:
A_ij = j * N + i

Resources