I'm trying to implement a block conjugate gradient algorithm that is not subject to breakdown from non-invertible residual matrices, but I'm getting nonsensical results: in each iteration the rank of Rcurrent should be getting smaller, not increasing. The algorithm is presented in the paper "A breakdown-free block conjugate gradient method" by Hao Ji and Yaohang Li.
Here is the algorithm:
This is my implementation in Julia:
function orth(M::Matrix)
matrixRank = rank(M)
Ufactor = svdfact(M)[:U]
return Ufactor[:,1:matrixRank]
end
function BFBCG(A::Matrix, Xcurrent::Matrix, M::Matrix, tol::Number, maxit::Number, Rcurrent::Matrix)
# initialization
#Rcurrent = B - A*Xcurrent;
Zcurrent = M*Rcurrent;
Pcurrent = orth(Zcurrent);
Xnext::Matrix = ones(size(Xcurrent))
# iterative method
for i = 0:maxit
Qcurrent = A*Pcurrent
acurrent = (Pcurrent' * Qcurrent)\(Pcurrent'*Rcurrent)
Xnext = Xcurrent+Pcurrent*acurrent
Rnext = Rcurrent-Qcurrent*acurrent
# if Residual norm of columns in Rcurrent < tol, stop
Znext = M*Rnext
bcurrent = -(Pcurrent' * Qcurrent)\ (Qcurrent'*Znext)
Pnext = orth(Znext+Pcurrent*bcurrent)
Xcurrent = Xnext
Zcurrent = Znext
Rcurrent = Rnext
Pcurrent = Pnext
#printf("\nRANK:\t%d",rank(Rcurrent))
#printf("\nNORM column1:\t%1.8f",vecnorm(Rcurrent[:,1]))
#printf("\nNORM column2:\t%1.8f\n=============",vecnorm(Rcurrent[:,2]))
end
return Xnext
end
For the following inputs, the paper reports a rank that reaches zero in the third iteration:
A = [15 5 4 3 2 1; 5 35 9 8 7 6; 4 9 46 12 11 10; 3 8 12 50 14 13; 2 7 11 14 19 15; 1 6 10 13 15 45]
M = eye(6)
guess = rand(6,2)
R0 = [1 0.537266261211281;2 0.043775211060964;3 0.964458562037146;4 0.622317517840541;5 0.552735938776748;6 0.023323943544997]
X = BFBCG(A,guess,M,tol,9,R0)
The algorithm works, and the rank does go to zero in the third iteration. The problem is numerical inaccuracy: rounding errors leave tiny nonzero entries, so almost any matrix is reported as having full rank. To get a meaningful result, use rank(Rcurrent, tol) instead of rank(Rcurrent); this version takes a tolerance into account. With that change, at least on my machine, the rank drops to zero.
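The effect of the tolerance can be seen in a small sketch. This is a Python illustration, not Julia's rank routine: it counts pivots in Gaussian elimination instead of looking at singular values, and the name numerical_rank and its tol parameter are made up for this example.

```python
def numerical_rank(rows, tol=0.0):
    """Count pivots whose magnitude exceeds tol (Gaussian elimination, partial pivoting)."""
    a = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Use the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # column is (numerically) zero from this row down
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

# A "residual" that is zero up to rounding noise.
noise = [[1e-16, 3e-17], [2e-16, -1e-17], [-5e-17, 4e-16]]
print(numerical_rank(noise))        # 2: every tiny entry counts as nonzero
print(numerical_rank(noise, 1e-8))  # 0: entries below the tolerance are treated as zero
```

Without a tolerance, the noise looks like a full-rank matrix; with one, it is recognised as zero.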
julia> X = BFBCG(A,guess,M,tol,9,R0)
RANK: 2
NORM column1: 1.78951939
NORM column2: 0.41155080
=============
RANK: 2
NORM column1: 0.97949620
NORM column2: 0.16170799
=============
RANK: 0
NORM column1: 0.00000000
NORM column2: 0.00000000
=============
RANK: 0
NORM column1: 0.00000000
NORM column2: 0.00000000
=============
An implementation of a 1D convolution operation will often need to load vectors of data that sequentially step through a buffer, offset by one element each iteration.
For example, consider a buffer of input data X[0], X[1], ..., X[n-1], where n is greater than twice the kernel length. If the convolution length is three, and we can fit eight elements in each vector, we might first want a vector with X[0], X[1], ..., X[7], then the next with X[1], X[2], ..., X[8] and the last with X[2], X[3], ..., X[9].
Consider the case where the kernel length as well as the vector length is 8. We must load eight vectors, that might look sequentially like this:
{ 0 1 2 3 4 5 6 7 }
{ 1 2 3 4 5 6 7 8 }
{ 2 3 4 5 6 7 8 9 }
{ 3 4 5 6 7 8 9 10 }
{ 4 5 6 7 8 9 10 11 }
{ 5 6 7 8 9 10 11 12 }
{ 6 7 8 9 10 11 12 13 }
{ 7 8 9 10 11 12 13 14 }
By reducing this sequence vertically, we could produce a running mean or sum. I.e., the sum of these vectors will have the sum of the first 8 elements in its first position.
Consider that the order of the elements in the column does not matter. Any permutation of the elements in each column will still produce the same result. For a convolution, this permutation can be accounted for by altering the order of the constants used in the kernel.
Is there a faster way to load these vectors that takes advantage of this? Consider as a baseline the simple sequence of unaligned loads:
// Any sort of sliding window function, i.e. running mean, running max, convolution, etc.
void sliding_window(const float* input, unsigned length)
{
for (unsigned i = 0; i < length - 7; i += 8) {
for (unsigned j = 0; j < 8; j++) {
__m256 v = _mm256_loadu_ps(input + i + j);
// reduction operation on v (e.g. max or fmadd) goes here
}
}
// handle tail here
}
First of all, you should note that if your convolution is separable, this is very often worth doing. Simple example:
res[i] = x[i]+x[i+1]+x[i+2]+x[i+3]+x[i+4]+x[i+5]+x[i+6]+x[i+7];
This can be done by convolving with [1 1] * [1 0 1] * [1 0 0 0 1] in three steps, for example like so:
void sliding_window(float* output, const float* input, size_t length)
{
// Nomenclature
// aX input at i+X
// bX convolution with [1 1] starting at i+X
// cX convolution with [1 1] * [1 0 1] starting at i+X
// dX convolution with [1 1] * [1 0 1] * [1 0 0 0 1] starting at i+X
__m256 a0 = _mm256_load_ps(input), a8 = _mm256_load_ps(input + 8);
__m256 b0 = _mm256_add_ps(a0, _mm256_loadu_ps(input+1)), b8 = _mm256_add_ps(a8, _mm256_loadu_ps(input+9));
__m256 b4 = _mm256_permute2f128_ps(b0, b8, 1+16*2);
__m256 b2 = _mm256_shuffle_ps(b0, b4, 2+3*4+0*16+1*64);
__m256 c0 = _mm256_add_ps(b0, b2);
for (unsigned i = 0; i < length - 25; i += 8) {
// Convolute input with [1 1]
__m256 a16 = _mm256_load_ps( input + i + 16);
__m256 a17 = _mm256_loadu_ps(input + i + 17);
__m256 b16 = _mm256_add_ps(a16, a17);
// Convolute first convolution with [1 0 1]
__m256 b12 = _mm256_permute2f128_ps(b8, b16, 1+16*2);
__m256 b10 = _mm256_shuffle_ps(b8, b12, 2+3*4+0*16+1*64);
__m256 c8 = _mm256_add_ps(b8, b10);
// Convolute second convolution with [1 0 0 0 1]
__m256 c4 = _mm256_permute2f128_ps(c0, c8, 1+16*2);
__m256 d0 = _mm256_add_ps(c0, c4);
// Store result
_mm256_store_ps(output + i, d0);
// rename registers for next iteration:
b8 = b16;
c0 = c8;
}
// handle tail here ...
}
You can of course replace addps by maxps. Godbolt-Demo: https://godbolt.org/z/W9K9o943o
Overall, this takes 1 aligned + 1 unaligned load, 3 shuffles, 3 additions and 1 store for 8 elements (actually only using AVX1). On Intel CPUs with only 1 shuffle per cycle this may actually just be slightly faster than a naïve 8-load, 7-addition implementation (I did not benchmark this). On Zen3 I'm not sure about the actual cost of loading unaligned data.
If you have a non-trivial kernel it is probably hard to determine if it is separable, though.
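To see why the three-step factorisation above is equivalent to an 8-element window, here is a small scalar sketch in Python. It does not model the AVX shuffles; the helper names are invented for illustration. It checks that [1 1] * [1 0 1] * [1 0 0 0 1] is a run of eight ones, and that three shifted additions give the same result as summing eight neighbours directly.

```python
def convolve(a, b):
    """Full discrete convolution of two short integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def shifted_add(x, shift):
    """y[i] = x[i] + x[i + shift] wherever both terms exist."""
    return [x[i] + x[i + shift] for i in range(len(x) - shift)]

def window_sum_separable(x):
    # Three passes: shift by 1, then 2, then 4.
    return shifted_add(shifted_add(shifted_add(x, 1), 2), 4)

def window_sum_direct(x):
    return [sum(x[i:i + 8]) for i in range(len(x) - 7)]

kernel = convolve(convolve([1, 1], [1, 0, 1]), [1, 0, 0, 0, 1])
print(kernel)  # [1, 1, 1, 1, 1, 1, 1, 1]

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
print(window_sum_separable(data) == window_sum_direct(data))  # True
```

The separable form needs three additions per output instead of seven, which is the saving the vectorised version exploits.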
The best I've been able to come up with is this sequence:
{ 0 1 2 3 4 5 6 7 }
{ 1 2 3 8 5 6 7 12 }
{ 2 3 8 9 6 7 12 13 }
{ 3 8 9 10 7 12 13 14 }
{ 4 5 6 7 8 9 10 11 }
{ 5 6 7 4 9 10 11 8 }
{ 6 7 4 5 10 11 8 9 }
{ 7 4 5 6 11 8 9 10 }
Each column contains the necessary elements. The first column contains 0 - 7, the next 1 - 8, then 2 - 9, etc.
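The claim about the columns is easy to check mechanically. A short Python sketch (the table is copied from the sequence above; the function name is made up) verifies that column c holds exactly the elements c to c+7 in some order:

```python
PERMUTED = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [1, 2, 3, 8, 5, 6, 7, 12],
    [2, 3, 8, 9, 6, 7, 12, 13],
    [3, 8, 9, 10, 7, 12, 13, 14],
    [4, 5, 6, 7, 8, 9, 10, 11],
    [5, 6, 7, 4, 9, 10, 11, 8],
    [6, 7, 4, 5, 10, 11, 8, 9],
    [7, 4, 5, 6, 11, 8, 9, 10],
]

def columns_cover_windows(rows):
    """True if column c is a permutation of c, c+1, ..., c+len(rows)-1."""
    for col in range(len(rows[0])):
        column = sorted(row[col] for row in rows)
        if column != list(range(col, col + len(rows))):
            return False
    return True

print(columns_cover_windows(PERMUTED))  # True
```

Because an order-independent reduction such as a sum or max only sees the set of values in each column, this is enough for the result to match the plain sliding window.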
This can be produced with the following sequence of operations:
void sliding_window(const float* input, unsigned length)
{
__m256 a = _mm256_load_ps(input);
for (unsigned i = 8; i < length - 7; i += 8) {
__m256 b = _mm256_load_ps(input + i);
__m256i ai = _mm256_castps_si256(a); // not part of sequence
__m256i bi = _mm256_castps_si256(b); // just for code reduction
// a is the first vector, these are remaining 7
__m256 j1 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 4));
// Reduction operation (add, fmadd, max, etc.) between a and j1 goes here
__m256 j2 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 8));
// Reduction with j2 goes here, and so on after each value
__m256 j3 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 12));
__m256 r0 = _mm256_permute2f128_ps(a, b, 0x21);
a = b; // Register with "b" isn't needed anymore
__m256 r1 = _mm256_permute_ps(r0, 0x39);
__m256 r2 = _mm256_permute_ps(r0, 0x4e);
__m256 r3 = _mm256_permute_ps(r0, 0x93);
// Final reduction with r3 to produce result
}
// handle tail here
}
On Zen3, I benchmark this as about 10% faster than the sequence of unaligned loads.
I am working on a problem where, given an n x n 2D matrix representing an image, I have to rotate the image by 90 degrees (clockwise). You have to rotate the image in-place, which means you have to modify the input 2D matrix directly. DO NOT allocate another 2D matrix and do the rotation. This is my code:
class Solution {
public void rotate(int[][] matrix) {
int size = matrix.length;
for(int i = 0 ; i < matrix.length; i++){
for(int y = 0 ; y < matrix[0].length ; y++){
matrix[i][y] = matrix[size - y - 1][i];
System.out.println(size - y - 1);
System.out.println(i);
System.out.println("");
}
}
}
}
This is the input and output results:
input matrix: [[1,2,3],[4,5,6],[7,8,9]]
output matrix: [[7,4,7],[8,5,4],[9,4,7]]
expected matrix: [[7,4,1],[8,5,2],[9,6,3]]
I do not really understand why I am getting duplicates in my output such as the number seven 3 times. On my System.out.println statement, I am getting the correct list of indexes :
2
0
1
0
0
0
2
1
1
1
0
1
2
2
What can be wrong?
I have found a solution. I will try my best to explain it.
Let us consider an array of size 4.
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Now let's look at the numbers present only on the outside of the array:
1 2 3 4
5 8
9 12
13 14 15 16
We will proceed by storing the first element 1 in a temporary variable. Next we will replace 1 by 13, 13 by 16, 16 by 4 and at last 4 by 1 (whose value we already stored in the temporary variable).
We will do the same for all the elements of the first row.
Here is pseudocode that rotates only this outer ring:
for i = 0 to n-2
{
temp = A[0][i];
A[0][i] = A[n-1-i][0];
A[n-1-i][0] = A[n-1-0][n-1-i];
A[n-1-0][n-1-i] = A[i][n-1-0];
A[i][n-1-0] = temp;
}
The loop body runs n-1 times, once for each element of the first row except the last one; that last corner is already moved when i = 0. Implement this code and run it. You will see that only the outer ring is rotated. Now let's look at the inner ring:
6 7
10 11
Now the loop only needs to run once, and the range of indexes has shrunk. For the outer ring, the loop started at i = 0 and ended at i = n-2. For the inner ring, it needs to run from i = 1 to i = n-3.
If you have an array of size n, then to rotate the xth ring of the array, the loop needs to run from i = x to i = n-2-x.
Here is the code to rotate the entire array:
x = 0;
int temp;
while (x < n/2)
{
for (int i = x;i < n-1-x;i++)
{
temp = arr[x][i];
arr[x][i] = arr[n-1-i][x];
arr[n-1-i][x] = arr[n-1-x][n-1-i];
arr[n-1-x][n-1-i] = arr[i][n-1-x];
arr[i][n-1-x] = temp;
}
x++;
}
Here each value of x denotes the xth ring.
0 <= x < n/2
The outer loop runs only while x < n/2 because an array has n/2 rings when n is even and n/2 + 1 rings when n is odd; in the odd case the innermost ring is the single centre element, which does not move.
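The ring-by-ring rotation carries over almost line for line to other languages. A minimal Python sketch (the name rotate_in_place is made up here), run on the 3x3 example from the question:

```python
def rotate_in_place(a):
    """Rotate the square matrix a by 90 degrees clockwise, ring by ring."""
    n = len(a)
    for x in range(n // 2):              # x selects the ring
        for i in range(x, n - 1 - x):    # positions along the ring's top edge
            temp = a[x][i]
            a[x][i] = a[n - 1 - i][x]                  # left column -> top row
            a[n - 1 - i][x] = a[n - 1 - x][n - 1 - i]  # bottom row -> left column
            a[n - 1 - x][n - 1 - i] = a[i][n - 1 - x]  # right column -> bottom row
            a[i][n - 1 - x] = temp                     # top row -> right column
    return a

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotate_in_place(m)
print(m)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

Each group of four cells is saved or read before it is overwritten, which is what the single-assignment loop in the question was missing: there, cells written early were read again later as if they still held their original values.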
I hope this helps. Do comment if you face any problems with the solution or its explanation.
I have a matrix in Rcpp (C++ for R) which is stored in column order in memory. Ie, it looks like:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
Now, I have a single for loop that runs from i = 1 to 25 (bear in mind, it is all zero based, but here I am just saying one for convenience).
For every element of the matrix, I want its Moore neighbourhood. This is easy for the elements that are not on the edge. So if our selected index is idx and the size of the square matrix is nrow then we have
leftmid = idx - nrow
lefttop = (idx - nrow) - 1
leftbot = (idx - nrow) + 1
rightmid = idx + nrow
righttop = (idx + nrow) - 1
rightbot = (idx + nrow) + 1
midtop = idx - 1
midbot = idx + 1
But I can't figure out how to deal with the edge cases. For example, if idx = 3, then I want the neighbours:
leftmid = 23
lefttop = 22
leftbot = 24
rightmid = 8
righttop = 7
rightbot = 9
midtop = 2
midbot = 4
It's a little bit more complicated at the corner cases as well. My goal here is to reduce time. I am currently running my program with a double for loop which works, but is slower than reasonable. I want to change it into a single for loop to improve performance.
Edit: I realized the left and right boundaries can be obtained by modulus. So (3 - 5) %% 25 = 23. But I still have the top and bottom edge cases.
It appears you're interested in "cyclic" boundary conditions, where the matrix has a toroidal topology, i.e. the top wraps around to the bottom and the right wraps around to the left.
It might be easier to iterate with four loops, one each over the row and column, and then one each over the row and column of the neighborhood. Something like this should work:
int mooreNeighbors[3][3];
int nRows = 5;
int nCols = 5;
// Loop over the rows and columns of the matrix
for (int i = 0; i < nRows; ++i) {
for (int j = 0; j < nCols; ++j) {
// Loop over the cyclic Moore neighborhood
for (int mnI = 0; mnI < 3; ++mnI) {
for (int mnJ = 0; mnJ < 3; ++mnJ) {
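// Sub-matrix indices, wrapped cyclically; adding nRows/nCols keeps the
// left operand of % non-negative (in C++, -1 % 5 is -1, not 4)
int subI = (i + mnI - 1 + nRows) % nRows;
int subJ = (j + mnJ - 1 + nCols) % nCols;
// Index into column-major matrix
int idx = subI + subJ*nRows;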
mooreNeighbors[mnI][mnJ] = matrix[idx];
}
}
}
}
I haven't tried compiling this, but it should be close to correct and clear enough to correct any mistakes. Think of it as pseudo-code.
Also, I'm preferring clarity over optimality. For example, you don't have to do everything in the inner-most loop.
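If a single loop over the linear index is preferred, the same wrapping can be done by first splitting the index into a row and a column. A small Python sketch, assuming 0-based column-major indices (the function name is invented; the key names follow the question):

```python
def moore_neighbors(idx, nrow, ncol):
    """Wrapped (toroidal) Moore neighbourhood of a 0-based column-major index."""
    row, col = idx % nrow, idx // nrow
    col_names = {-1: "left", 0: "mid", 1: "right"}
    row_names = {-1: "top", 0: "mid", 1: "bot"}
    out = {}
    for dc in (-1, 0, 1):
        for dr in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r = (row + dr) % nrow  # Python's % is non-negative for a positive modulus
            c = (col + dc) % ncol
            out[col_names[dc] + row_names[dr]] = r + c * nrow
    return out

# The question's example, idx = 3 in 1-based terms, is index 2 here.
neighbors = moore_neighbors(2, 5, 5)
print({name: i + 1 for name, i in neighbors.items()})  # shown 1-based again
```

Wrapping row and column separately handles the top/bottom edges and the corners with the same two lines; a modulus on the linear index alone only covers left/right.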
I'm interested in finding the nth row of pascal triangle (not a specific element but the whole row itself). What would be the most efficient way to do it?
I thought about the conventional way to construct the triangle by summing up the corresponding elements in the row above which would take:
1 + 2 + .. + n = O(n^2)
Another way could be using the combination formula of a specific element:
c(n, k) = n! / (k!(n-k)!)
for each element in the row, which I guess would take more time than the former method, depending on how the combination is calculated. Any ideas?
>>> def pascal(n):
...     line = [1]
...     for k in range(n):
...         line.append(line[k] * (n-k) // (k+1))
...     return line
...
>>> pascal(9)
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
This uses the following identity:
C(n,k+1) = C(n,k) * (n-k) / (k+1)
So you can start with C(n,0) = 1 and then calculate the rest of the line using this identity, each time multiplying the previous element by (n-k) / (k+1).
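One practical detail when using this identity is that the division has to stay exact. The Python 3 sketch below (function names are illustrative) compares integer floor division, which is exact here because C(n,k) * (n-k) is always a multiple of k+1, with true division, which switches to floating point and loses digits once the values no longer fit in a double.

```python
from math import comb

def pascal_row_exact(n):
    row = [1]
    for k in range(n):
        # Multiply first, then divide: the product is divisible by k + 1.
        row.append(row[k] * (n - k) // (k + 1))
    return row

def pascal_row_float(n):
    row = [1]
    for k in range(n):
        row.append(row[k] * (n - k) / (k + 1))  # true division yields floats
    return row

print(pascal_row_exact(9))                          # [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
print(pascal_row_exact(100)[50] == comb(100, 50))   # True
print(pascal_row_float(100)[50] == comb(100, 50))   # False: C(100,50) is not representable as a double
```

For small n both versions print the same numbers, so the difference only shows up on larger rows.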
A single row can be calculated as follows:
First compute 1. -> N choose 0
Then N/1 -> N choose 1
Then N*(N-1)/(1*2) -> N choose 2
Then N*(N-1)*(N-2)/(1*2*3) -> N choose 3
.....
Notice that you can compute the next value from the previous value by just multiplying by a single number and then dividing by another number.
This can be done in a single loop. Sample python.
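def comb_row(n):
    r = 0
    num = n
    cur = 1
    yield cur
    # n more values follow the leading 1, so stop once r reaches n
    while r < n:
        r += 1
        cur = (cur * num) // r  # integer division keeps the values exact
        yield cur
        num -= 1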
The most efficient approach would be:
std::vector<int> pascal_row(int n){
std::vector<int> row(n+1);
row[0] = 1; //First element is always 1
for(int i=1; i<n/2+1; i++){ //Progress up, until reaching the middle value
row[i] = row[i-1] * (n-i+1)/i;
}
for(int i=n/2+1; i<=n; i++){ //Copy the inverse of the first part
row[i] = row[n-i];
}
return row;
}
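One caveat with fixed-width integers: the product row[i-1] * (n-i+1) is formed before the division, so it can exceed the integer range before the row entries themselves do. The Python sketch below mirrors the loop above with arbitrary-precision integers and looks for the first n at which that intermediate product no longer fits. It assumes a 32-bit signed int, which is common but not guaranteed in C++; the function names are invented.

```python
INT32_MAX = 2**31 - 1

def max_intermediate(n):
    """Largest value of row[i-1] * (n-i+1) formed while building the first half of row n."""
    row = [1]
    largest = 1
    for i in range(1, n // 2 + 1):
        product = row[i - 1] * (n - i + 1)
        largest = max(largest, product)
        row.append(product // i)
    return largest

def first_overflowing_n(limit=INT32_MAX):
    n = 0
    while max_intermediate(n) <= limit:
        n += 1
    return n

print(first_overflowing_n())
```

Using a wider type for the intermediate product (for example long long) postpones the problem; it does not remove it.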
Here is a fast example implemented in Go that calculates from the outer edges of a row and works its way to the middle, assigning two values with a single calculation...
package main
import "fmt"
func calcRow(n int) []int {
// row always has n + 1 elements
row := make( []int, n + 1, n + 1 )
// set the edges
row[0], row[n] = 1, 1
// calculate values for the next n-1 columns
for i := 0; i < int(n / 2) ; i++ {
x := row[ i ] * (n - i) / (i + 1)
row[ i + 1 ], row[ n - 1 - i ] = x, x
}
return row
}
func main() {
for n := 0; n < 20; n++ {
fmt.Printf("n = %d, row = %v\n", n, calcRow( n ))
}
}
the output for 20 iterations takes about 1/4 millisecond to run...
n = 0, row = [1]
n = 1, row = [1 1]
n = 2, row = [1 2 1]
n = 3, row = [1 3 3 1]
n = 4, row = [1 4 6 4 1]
n = 5, row = [1 5 10 10 5 1]
n = 6, row = [1 6 15 20 15 6 1]
n = 7, row = [1 7 21 35 35 21 7 1]
n = 8, row = [1 8 28 56 70 56 28 8 1]
n = 9, row = [1 9 36 84 126 126 84 36 9 1]
n = 10, row = [1 10 45 120 210 252 210 120 45 10 1]
n = 11, row = [1 11 55 165 330 462 462 330 165 55 11 1]
n = 12, row = [1 12 66 220 495 792 924 792 495 220 66 12 1]
n = 13, row = [1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1]
n = 14, row = [1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1]
n = 15, row = [1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1]
n = 16, row = [1 16 120 560 1820 4368 8008 11440 12870 11440 8008 4368 1820 560 120 16 1]
n = 17, row = [1 17 136 680 2380 6188 12376 19448 24310 24310 19448 12376 6188 2380 680 136 17 1]
n = 18, row = [1 18 153 816 3060 8568 18564 31824 43758 48620 43758 31824 18564 8568 3060 816 153 18 1]
n = 19, row = [1 19 171 969 3876 11628 27132 50388 75582 92378 92378 75582 50388 27132 11628 3876 969 171 19 1]
An easy way to calculate it is by noticing that the element of the next row can be calculated as a sum of two consecutive elements in the previous row.
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
For example 6 = 5 + 1, 15 = 5 + 10, 1 = 1 + 0 and 20 = 10 + 10. This gives a simple algorithm to calculate the next row from the previous one.
def pascal(n):
    row = [1]
    for x in range(n):
        row = [l + r for l, r in zip(row + [0], [0] + row)]
        # print(row)
    return row
print(pascal(10))
In Scala, I would do it as simply as this:
def pascal(c: Int, r: Int): Int = c match {
case 0 => 1
case `c` if c >= r => 1
case _ => pascal(c-1, r-1)+pascal(c, r-1)
}
I would call it inside this:
for (row <- 0 to 10) {
for (col <- 0 to row)
print(pascal(col, row) + " ")
println()
}
resulting in:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
To explain step by step:
Step 1: If the column is the first one (c is 0), we always return 1.
Step 2: Row r has r + 1 columns, numbered 0 to r. So if the column index is greater than or equal to the row index, we are at the last column and return 1.
Step 3: Otherwise, we return the sum of two entries from the previous row: the one in the column just before the current one, and the one in the same column.
Good Luck.
Let me build upon Shane's excellent work for an R solution. (Thank you, Shane!) His code for generating the triangle:
pascalTriangle <- function(h) {
lapply(0:h, function(i) choose(i, 0:i))
}
This will allow one to store the triangle as a list. We can then index whatever row desired. But please add 1 when indexing! For example, I'll grab the bottom row:
pt_with_24_rows <- pascalTriangle(24)
row_24 <- pt_with_24_rows[25] # add one
row_24[[1]] # prints the row
So, finally, suppose I have a Galton board problem. I have the arbitrary challenge of finding out what percentage of beans have clustered in the center: say, bins 10 to 15 (out of 25).
sum(row_24[[1]][10:15])/sum(row_24[[1]])
Which turns out to be 0.7704771. All good!
In Ruby, the following code will print out the specific row of Pascals Triangle that you want:
def row(n)
pascal = [1]
if n < 1
p pascal
return pascal
else
n.times do |num|
nextNum = pascal[num] * (n - num) / (num + 1) # integer arithmetic: multiply first so the division is exact
pascal << nextNum
end
end
p pascal
end
Where calling row(0) returns [1] and row(5) returns [1, 5, 10, 10, 5, 1]
Here is another simple way to build Pascal's triangle dynamically using VBA.
1
11
121
1331
14641
Sub pascal()
Dim book As Excel.Workbook
Dim sht As Worksheet
Set book = ThisWorkbook
Set sht = book.Worksheets("sheet1")
a = InputBox("Enter the Number", "Fill")
For i = 1 To a
For k = 1 To i
If i >= 2 And k >= 2 Then
sht.Cells(i, k).Value = sht.Cells(i - 1, k - 1) + sht.Cells(i - 1, k)
Else
sht.Cells(i, k).Value = 1
End If
Next k
Next i
End Sub
I used a TI-84 Plus CE.
The –> in line 6 is the store-value button.
For loop syntax is
:For(variable, beginning, end [, increment])
:Commands
:End
nCr syntax is
:valueA nCr valueB
List indexes start at 1, which is why I store into position R+1.
N= row
R= column
PROGRAM: PASCAL
:ClrHome
:ClrList L1
:Disp "ROW
:Input N
:For(R,0,N,1)
:N nCr R–>L1(R+1)
:End
:Disp L1
This is the fastest way I can think of to do this in programming (with a TI-84), but if you mean to calculate the row with pen and paper, then just draw out the triangle, because doing factorials is a pain!
Here's an O(n) space-complexity solution in Python:
def generate_pascal_nth_row(n):
    result=[1]*n
    for i in range(n):
        previous_res = result.copy()
        for j in range(1,i):
            result[j] = previous_res[j-1] + previous_res[j]
    return result
print(generate_pascal_nth_row(6))
class Solution{
public:
int comb(int n,int r){
long long c=1;
for(int i=1;i<=r;i++) { //builds C(n,r) step by step: multiply by n, n-1, ... and divide by 1, 2, ..., r
c=((c*n))/i; n--;
}
return c;
}
vector<int> getRow(int n) {
vector<int> v;
for (int i = 0; i <= n; ++i) // row n has n+1 entries: C(n,0) ... C(n,n)
v.push_back(comb(n,i));
return v;
}
};
faster than 100% submissions on leet code https://leetcode.com/submissions/detail/406399031/
The most efficient way to calculate a row in Pascal's triangle is through convolution. First we choose the second row (1,1) to be a kernel, and then to get the next row we only need to convolve the current row with the kernel.
So convolution of the kernel with the second row gives the third row, [1 1]*[1 1] = [1 2 1]; convolution with the third row gives the fourth, [1 2 1]*[1 1] = [1 3 3 1]; and so on.
This is a function in Julia (very similar to MATLAB):
function binomRow(n::Int64)
baseVector = [1] #the first row is equal to 1.
kernel = [1,1] #This is the second row and a kernel.
row = zeros(n)
for i = 1 : n
row = baseVector
baseVector = conv(baseVector, kernel) #convolution with kernel
end
return row::Array{Int64,1}
end
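The same idea can be sketched without a signal-processing library. The Python version below uses a hand-written convolve helper (an assumption of this example, not a library call) and counts rows from 0, so binom_row(n) has n+1 entries, whereas the Julia function above returns [1] for n = 1.

```python
def convolve(a, b):
    """Full discrete convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def binom_row(n):
    row = [1]
    for _ in range(n):
        row = convolve(row, [1, 1])  # each pass produces the next row
    return row

print(binom_row(4))  # [1, 4, 6, 4, 1]
```

Note that n successive convolutions with a length-2 kernel touch on the order of n^2 entries in total, so this does the same amount of work as building the triangle row by row.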
To find nth row -
int res[] = new int[n+1];
res[0] = 1;
for(int i = 1; i <= n; i++)
for(int j = i; j > 0; j--) // walk right to left so res[j-1] still holds the previous row
res[j] += res[j-1];
How can one generate say 1000 random points with a distribution like that of
towns and cities in e.g. Ohio ?
I'm afraid I can't define "distributed like cities" precisely;
uniformly distributed centres + small Gaussian clouds
are easy but ad hoc.
Added: There must be a family of 2d distributions
with a clustering parameter that can be varied to match a given set of points ?
Maybe you can take a look at Walter Christaller's Theory of Central Places. I guess there must be some generator somewhere, or you can cook up your own.
Start with a model of the water features in your target area (or make one up, if it's for an imaginary place), then cluster the cities near river junctions, along lakeshores, lake-river junctions. Then make imaginary highways connecting those major cities. Now sprinkle some intermediate cities along those highways at reasonable spacing, preferring to be near junctions in the highways. Now sprinkle some small towns through the empty spaces.
Gaussian clusters with Poisson cluster sizes work fairly well.
Problem: generate random points that cluster roughly like given cities, say in the USA.
Subproblems:
a) describe clusters with rows of numbers, so that "cluster A is like cluster B"
simplifies to "clusternumbers(A) is like "clusternumbers(B)".
Running N=100 then 1000 points through fcluster below, with ncluster=25, gives
N 100 ncluster 25: 22 + 3 r 117
sizes: av 4 10 9 8 7 6 6 5 5 4 4 4 ...
radii: av 117 202 198 140 134 64 62 28 197 144 148 132 ...
N 1000 cluster 25: 22 + 3 r 197
sizes: av 45 144 139 130 85 84 69 63 43 38 33 30 ...
radii: av 197 213 279 118 146 282 154 245 212 243 226 235 ...
b) find a combination of random generators with 2 or 3 parameters
which can be varied to generate different clusterings.
Gaussian clusters with Poisson cluster sizes can match clustering of cities fairly well:
def randomclusters( N, ncluster=25, radius=1, box=box ):
    """ -> N 2d points: Gaussian clusters, Poisson cluster sizes """
    pts = []
    lam = N // ncluster  # average cluster size
    clustersize = lambda: np.random.poisson(lam - 1) + 1
    # poisson 2: 14 27 27 18 9 4 %
    # poisson 3: 5 15 22 22 17 10 %
    while len(pts) < N:
        u = uniformrandom2(box)
        csize = clustersize()
        if csize == 1:
            pts.append( u )
        else:
            pts.extend( inbox( gauss2( u, radius, csize )))
    return pts[:N]
# Utility functions --
import numpy as np
import scipy.spatial.distance
import scipy.cluster.hierarchy as hier
def fcluster( pts, ncluster, method="average", criterion="maxclust" ):
    """ -> (pts, Y pdist, Z linkage, T fcluster, clusterlists)
        ncluster = n1 + n2 + ... (including n1 singletons)
        av cluster size = len(pts) / ncluster
    """
    # Clustering is pretty fast:
    # sort pdist, then like Kruskal's MST, O( N^2 ln N )
    # Many metrics and parameters are possible; these satisfice.
    pts = np.asarray(pts)
    Y = scipy.spatial.distance.pdist( pts )  # N*(N-1)/2
    Z = hier.linkage( Y, method )  # N-1, like mst
    T = hier.fcluster( Z, ncluster, criterion=criterion )
    clusters = clusterlists(T)
    return (pts, Y, Z, T, clusters)
def clusterlists(T):
    """ T = hier.fcluster( Z, t ) e.g. [a b a b c a]
        -> [ [0 2 5] [1 3] ] sorted by len, no singletons [4]
    """
    clists = [ [] for j in range( max(T) + 1 )]
    for j, c in enumerate(T):
        clists[c].append( j )
    clists.sort( key=len, reverse=True )
    # build a real list of lengths so it can be reversed under Python 3
    n1 = np.searchsorted( [len(c) for c in clists][::-1], 2 )
    return clists[:-n1]
def radius( x ):
    """ rms |x - xmid| """
    return np.sqrt( np.mean( np.var( x, axis=0 )))
    # * 100  # 1 degree lat/long ~ 70 .. 111 km
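The helpers uniformrandom2, gauss2 and inbox are not shown above, so the generator cannot be run as posted. The following is a self-contained sketch of the same idea using only the standard library; it assumes a unit-square box, and poisson and random_clusters are stand-ins written for this example rather than the original helpers.

```python
import math
import random

def poisson(lam, rng):
    """Poisson sample by Knuth's multiplication method; fine for modest lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def random_clusters(n, ncluster=25, radius=0.02, seed=None):
    """n points in the unit square: Gaussian clusters with Poisson sizes."""
    rng = random.Random(seed)
    lam = max(n // ncluster, 1)  # average cluster size
    pts = []
    while len(pts) < n:
        cx, cy = rng.random(), rng.random()  # uniformly placed cluster centre
        size = poisson(lam - 1, rng) + 1     # at least one point per cluster
        if size == 1:
            pts.append((cx, cy))
            continue
        for _ in range(size):
            x, y = rng.gauss(cx, radius), rng.gauss(cy, radius)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:  # drop points outside the box
                pts.append((x, y))
    return pts[:n]

points = random_clusters(1000, ncluster=25, radius=0.02, seed=1)
print(len(points))  # 1000
```

The two knobs that change the look of the result are ncluster (how many centres, hence the average cluster size) and radius (how tight each cluster is).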
In Java this is provided through new Random().nextGaussian(). Since the Java source is available, you can look at it:
synchronized public double nextGaussian() {
// See Knuth, ACP, Section 3.4.1 Algorithm C.
if (haveNextNextGaussian) {
haveNextNextGaussian = false;
return nextNextGaussian;
} else {
double v1, v2, s;
do {
v1 = 2 * nextDouble() - 1; // between -1 and 1
v2 = 2 * nextDouble() - 1; // between -1 and 1
s = v1 * v1 + v2 * v2;
} while (s >= 1 || s == 0);
double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
nextNextGaussian = v2 * multiplier;
haveNextNextGaussian = true;
return v1 * multiplier;
}
}
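For readers who do not use Java, the same polar method can be sketched in a few lines of Python. This version returns both values of each pair instead of caching the second one, and the function name is made up for this example.

```python
import math
import random

def polar_gaussian_pair(rng):
    """Two independent standard normal samples via the polar method."""
    while True:
        v1 = 2.0 * rng.random() - 1.0  # between -1 and 1
        v2 = 2.0 * rng.random() - 1.0  # between -1 and 1
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:              # keep only points strictly inside the unit circle
            break
    multiplier = math.sqrt(-2.0 * math.log(s) / s)
    return v1 * multiplier, v2 * multiplier

rng = random.Random(0)
samples = []
for _ in range(10000):
    samples.extend(polar_gaussian_pair(rng))
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(variance, 2))  # close to 0 and 1
```

In everyday Python code, random.gauss(mu, sigma) from the standard library already provides normal samples; the sketch is only meant to show what the Java source is doing.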
Plotting 30000 houses using
x = r.nextGaussian() * rad/4 + rad;
y = r.nextGaussian() * rad/4 + rad;
yields this beautiful city: