Strassen Multiplication Algorithm StackOverflow Error

I am working on implementing Strassen's multiplication. Below is my method for multiplying two matrices with a divide-and-conquer approach.
public static double[][] multiply(double[][] A, double[][] B)
{
int n = A.length;
double[][] R = new double[n][n];
/** base case **/
//if (n == 1){
// R[0][0] = A[0][0] * B[0][0];
// }
//else{
double[][] A11 = new double [n/2][n/2];
double[][] A12 = new double [n/2][n/2];
double[][] A21 = new double [n/2][n/2];
double[][] A22 = new double [n/2][n/2];
double[][] B11 = new double [n/2][n/2];
double[][] B12 = new double [n/2][n/2];
double[][] B21 = new double [n/2][n/2];
double[][] B22 = new double [n/2][n/2];
/** Dividing matrix A into 4 halves **/
split(A, A11, 0 , 0);
split(A, A12, 0 , n/2);
split(A, A21, n/2, 0);
split(A, A22, n/2, n/2);
/** Dividing matrix B into 4 halves **/
split(B, B11, 0 , 0);
split(B, B12, 0 , n/2);
split(B, B21, n/2, 0);
split(B, B22, n/2, n/2);
/**
M1 = (A11 + A22)(B11 + B22)
M2 = (A21 + A22) B11
M3 = A11 (B12 - B22)
M4 = A22 (B21 - B11)
M5 = (A11 + A12) B22
M6 = (A21 - A11) (B11 + B12)
M7 = (A12 - A22) (B21 + B22)
**/
double [][] M1 = multiply(add(A11, A22), add(B11, B22));
double [][] M2 = multiply(add(A21, A22), B11);
double [][] M3 = multiply(A11, sub(B12, B22));
double [][] M4 = multiply(A22, sub(B21, B11));
double [][] M5 = multiply(add(A11, A12), B22);
double [][] M6 = multiply(sub(A21, A11), add(B11, B12));
double [][] M7 = multiply(sub(A12, A22), add(B21, B22));
/**
C11 = M1 + M4 - M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 - M2 + M3 + M6
**/
double [][] C11 = add(sub(add(M1, M4), M5), M7);
double [][] C12 = add(M3, M5);
double [][] C21 = add(M2, M4);
double [][] C22 = add(sub(add(M1, M3), M2), M6);
/** join 4 halves into one result matrix **/
join(C11, R, 0 , 0);
join(C12, R, 0 , n/2);
join(C21, R, n/2, 0);
join(C22, R, n/2, n/2);
/** return result **/
return R;
}
To run this code, I read in 2 txt files, one for matrix A and one for matrix B. For testing, both matrices are exactly the same:
5
3.250 6.130 3.180 7.680 9.060
5.450 1.660 6.790 6.650 4.250
4.460 8.260 7.870 7.880 1.890
1.460 8.510 8.510 3.510 1.440
1.590 7.160 4.400 3.310 1.970
The first line is n, and the following lines are the matrix.
My problem is that I am getting a stack overflow error on the line
int n = A.length;
and I can't figure out why, or where to look. So my question is: does the problem lie in this algorithm, or in my main method?
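Two things in the method as posted are worth checking. With the base case commented out, a 1 x 1 input makes n/2 equal to 0, so the method allocates 0 x 0 blocks and calls itself on them again with nothing to stop the recursion, which by itself would explain a StackOverflowError. Separately, n = 5 is not a power of two, so halving with n/2 drops the last row and column. The following is a minimal sketch of one common workaround, zero-padding to the next power of two; the class and method names (StrassenPadding, nextPowerOfTwo, pad, crop) are illustrative and not part of the original code.

```java
class StrassenPadding {
    /** Smallest power of two that is >= n (for n >= 1). */
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) {
            p *= 2;
        }
        return p;
    }

    /** Copies a square matrix into the top-left corner of a zero-filled size x size matrix. */
    static double[][] pad(double[][] m, int size) {
        double[][] out = new double[size][size];
        for (int i = 0; i < m.length; i++) {
            System.arraycopy(m[i], 0, out[i], 0, m[i].length);
        }
        return out;
    }

    /** Cuts the top-left size x size corner back out of a padded matrix. */
    static double[][] crop(double[][] m, int size) {
        double[][] out = new double[size][size];
        for (int i = 0; i < size; i++) {
            System.arraycopy(m[i], 0, out[i], 0, size);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = new double[5][5];
        a[4][4] = 1.97;
        int size = nextPowerOfTwo(a.length);
        double[][] padded = pad(a, size);
        System.out.println(size + " x " + padded[0].length + ", corner kept: " + padded[4][4]);
        // Assumption: the recursive multiply, with its n == 1 base case restored,
        // would be called on the padded inputs, and crop(result, a.length)
        // would then recover the 5 x 5 product.
    }
}
```

Padding with zeros does not change the top-left part of the product, because the extra rows and columns only contribute zero terms.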

Related

This is a question for my data structures course, how do I output this matrix?

In this assignment you are asked to write a program that reads in two matrices of size n x m and s x t and then outputs the result of multiplying those two matrices. Since the sizes of the matrices are not known in advance, you need to implement a dynamic memory allocation scheme for matrices. Your program should prompt the user for n, m, s, t, and the elements of each matrix. After that, if the multiplication can be performed on the matrices, your program should output each matrix and the result of the matrix multiplication.
Recall that if the matrix A is of size n x m, and matrix B is of size m x t, the resulting matrix C would be of size n x t. However, if a matrix is n x m and the other is s x t, these matrices cannot be multiplied if m is not equal to s.
If you are not familiar with the matrix multiplication problem, study the following example in order to find a general solution for the multiplication of two-dimensional matrices. In this example, matrices A and B are of size 3 x 3. An entry Xij indicates an element X[i][j].
    A00 A01 A02        B00 B01 B02
A = A10 A11 A12    B = B10 B11 B12
    A20 A21 A22        B20 B21 B22
Resulting multiplication of the matrices A and B is equal to
    A00*B00 + A01*B10 + A02*B20    A00*B01 + A01*B11 + A02*B21    A00*B02 + A01*B12 + A02*B22
C = A10*B00 + A11*B10 + A12*B20    A10*B01 + A11*B11 + A12*B21    A10*B02 + A11*B12 + A12*B22
    A20*B00 + A21*B10 + A22*B20    A20*B01 + A21*B11 + A22*B21    A20*B02 + A21*B12 + A22*B22
To solve the problem, you need to determine how to obtain an entry, say C[i][k], from the entries of matrices A and B. Once you figure this out, the programming will be an extremely easy task. Warning: before you start to program, test your solution.
Matrix multiplication is done by first determining the size of the resultant matrix. You simply take the number of rows of the first matrix, and number of columns of the second matrix. Then you multiply every row of the first matrix with every column of the second matrix. This is the code.
#include <stdio.h>
#include <stdlib.h>
int main()
{
int **a, **b, **c;
int n, m, s, t;
int i, j;
printf("\nEnter rows for matrix A: ");
scanf("%d", &n);
printf("\nEnter cols for matrix A : ");
scanf("%d", &m);
printf("\nEnter rows for matrix B : ");
scanf("%d", &s);
printf("\nEnter cols for matrix B: ");
scanf("%d", &t);
if(m != s)
{
printf("\nCan't multiply two matrices if col of a and row of b are not same.");
return(0);
}
a = (int **) malloc(n*sizeof(int *));
b = (int **) malloc(s*sizeof(int *));
c = (int **) malloc(n*sizeof(int *));
for(i=0; i<n; i++)
a[i] = (int *)malloc(m*sizeof(int));
for(i=0; i<s; i++)
b[i] = (int *)malloc(t*sizeof(int));
for(i=0; i<n; i++)
c[i] = (int *)malloc(t*sizeof(int));
printf("\nEnter elements of matrix A :\n");
for(i=0; i<n; i++)
{
for(j=0; j<m; j++)
{
printf("\tA[%d][%d] = ",i, j);
scanf("%d", &a[i][j]);
}
}
printf("\nEnter elements of matrix B :\n");
for(i=0; i<s; i++)
{
for(j=0; j<t; j++)
{
printf("\tB[%d][%d] = ",i, j);
scanf("%d", &b[i][j]);
}
}
printf("Elements of matrix A: \n");
for(i=0; i<n; i++)
{
for(j=0; j<m; j++)
{
printf(" %d ", a[i][j]);
}
printf("\n");
}
printf("Elements of matrix B: \n");
for(i=0; i<s; i++)
{
for(j=0; j<t; j++)
{
printf(" %d ", b[i][j]);
}
printf("\n");
}
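/* C is n x t: C[i][j] is the sum over k of A[i][k] * B[k][j] */
for(i=0; i<n; i++)
for(j=0; j<t; j++)
{
int k; /* separate index, so the column count t is not overwritten */
c[i][j] = 0;
for(k=0; k<m; k++)
c[i][j] = c[i][j] + a[i][k] * b[k][j];
}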
printf("\n\nResultant matrix :");
for(i=0; i<n; i++)
{
printf("\n\t\t\t");
for(j=0; j<t; j++)
printf("%d\t", c[i][j]);
}
return 0;
}
So, basically, you dynamically allocate the rows and columns of each matrix that you need. Then you read the input for them, and calculate each entry of matrix C by accumulating over the shared dimension k (0 <= k < m):
c[i][j] = c[i][j] + a[i][k] * b[k][j];

How to calculate log determinant of an Armadillo sparse matrix efficiently

I'm trying to write an MCMC procedure using RcppArmadillo which involves computing log determinants of sparse matrices of roughly 30,000 x 30,000. It seems that log_det() in Armadillo does not support sp_mat right now, so I'm doing something like this:
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEigen)]]
#include <RcppArmadillo.h>
#include <RcppEigen.h>
using namespace arma;
double eigen_ldet(sp_mat arma_mat) {
Eigen::SparseMatrix<double> eigen_s = Rcpp::as<Eigen::SparseMatrix<double>>(Rcpp::wrap(arma_mat));
Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
solver.compute(eigen_s);
double det = solver.logAbsDeterminant();
return det;
}
I feel it is really crappy and slow. Any help would be much appreciated.
Edit:
Here is the mockup:
library(Matrix)
m_mat = function(i = 1688, j = 18, rho = 0.5, alp = 0.5){
w1 = matrix(runif(i^2),nrow = i, ncol = i)
w2 = matrix(runif(j^2),nrow = j, ncol = j)
w1 = w1/rowSums(w1)
w2 = w2/rowSums(w2)
diag(w1) = 0
diag(w2) = 0
w1 = diag(i) - rho*w1
w2 = diag(j) - alp*w2
w1 = kronecker(Matrix(diag(j)), w1)
w2 = kronecker(Matrix(diag(i)), w2)
ind = matrix(c(rep(seq(1,i), each = j), rep(seq(1,j),i)), ncol = 2)
w2 = cbind(ind, w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
w2 = cbind(as.matrix(ind), w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
return(w1 + w2)
}
Edit2: Here is the second mockup with a sparse w1:
m_mat2 = function(i = 1688, j = 18, nb = 4, range = 10, rho = 0.5, alp = 0.5){
w1 = Matrix(0, nrow = i, ncol = i)
for ( h in 1:i){
rnd = as.integer(rnorm(nb, h, range))
rnd = ifelse(rnd > 0 & rnd <= i, rnd, h)
rnd = unique(rnd)
w1[h, rnd] = 1
}
w1 = w1/rowSums(w1)
w2 = matrix(runif(j^2),nrow = j, ncol = j)
w2 = w2/rowSums(w2)
diag(w1) = 0
diag(w2) = 0
w1 = diag(i) - rho*w1
w2 = diag(j) - alp*w2
w1 = kronecker(Matrix(diag(j)), w1)
w2 = kronecker(Matrix(diag(i)), w2)
ind = matrix(c(rep(seq(1,i), each = j), rep(seq(1,j),i)), ncol = 2)
w2 = cbind(ind, w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
w2 = cbind(as.matrix(ind), w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
return(w1 + w2)
}
An actual sparse w1 case should be much more irregular, but it takes about the same time to calculate (by the above code) the determinant of this one as using an actual w1.
From my experiments I find that the conversion from Armadillo to Eigen matrix is quite fast. Most of the time is spent in solver.compute(). I do not know if there are any faster algorithms to determine the log determinant of a sparse matrix, but I have found an approximation that is at least applicable to your mock-up: Only use the (dense) block-diagonal (see e.g. here for ways to include other parts of the matrix). If an approximate solution is sufficient, this is quite good and fast:
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEigen)]]
#include <RcppArmadillo.h>
#include <RcppEigen.h>
#include <Rcpp/Benchmark/Timer.h>
using namespace arma;
// [[Rcpp::export]]
double arma_sldet(sp_mat arma_mat, int blocks, int size) {
double ldet = 0.0;
double val = 0.0;
double sign = 0.0;
for (int i = 0; i < blocks; ++i) {
int begin = i * size;
int end = (i + 1) * size - 1;
sp_mat sblock = arma_mat.submat(begin, begin, end, end);
mat dblock(sblock);
log_det(val, sign, dblock);
ldet += val;
}
return ldet;
}
// [[Rcpp::export]]
Rcpp::List eigen_ldet(sp_mat arma_mat) {
Rcpp::Timer timer;
timer.step("start");
Eigen::SparseMatrix<double> eigen_s = Rcpp::as<Eigen::SparseMatrix<double>>(Rcpp::wrap(arma_mat));
timer.step("conversion");
Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
solver.compute(eigen_s);
timer.step("solver");
double det = solver.logAbsDeterminant();
timer.step("log_det");
Rcpp::NumericVector res(timer);
return Rcpp::List::create(Rcpp::Named("log_det") = det,
Rcpp::Named("timer") = res);
}
/*** R
library(Matrix)
m_mat = function(i = 1688, j = 18, rho = 0.5, alp = 0.5){
w1 = matrix(runif(i^2),nrow = i, ncol = i)
w2 = matrix(runif(j^2),nrow = j, ncol = j)
w1 = w1/rowSums(w1)
w2 = w2/rowSums(w2)
diag(w1) = 0
diag(w2) = 0
w1 = diag(i) - rho*w1
w2 = diag(j) - alp*w2
w1 = kronecker(Matrix(diag(j)), w1)
w2 = kronecker(Matrix(diag(i)), w2)
ind = matrix(c(rep(seq(1,i), each = j), rep(seq(1,j),i)), ncol = 2)
w2 = cbind(ind, w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
w2 = cbind(as.matrix(ind), w2)
w2 = w2[order(w2[,2]),]
w2 = t(w2[, -c(1,2)])
return(w1 + w2)
}
m <- m_mat(i = 200)
system.time(eigen <- eigen_ldet(m))
system.time(arma <- arma_sldet(m, 18, 200))
diff(eigen$timer)/1000000
all.equal(eigen$log_det, arma)
m <- m_mat()
#eigen_ldet(m) # takes too long ...
system.time(arma <- arma_sldet(m, 18, 1688))
*/
Results for a smaller mock-up:
> m <- m_mat(i = 200)
> system.time(eigen <- eigen_ldet(m))
user system elapsed
3.703 0.049 3.751
> system.time(arma <- arma_sldet(m, 18, 200))
user system elapsed
0.059 0.012 0.019
> diff(eigen$timer)/1000000
conversion solver log_det
5.208586 3738.131168 0.582578
> all.equal(eigen$log_det, arma)
[1] "Mean relative difference: 0.002874847"
The approximate solution is very close and much faster. We also see the timing distribution for the exact solution.
Results for the full mock-up:
> m <- m_mat()
> #eigen_ldet(m) # takes to long ...
> system.time(arma <- arma_sldet(m, 18, 1688))
user system elapsed
5.965 2.529 2.578
An even faster approximation can be achieved when only considering the diagonal:
// [[Rcpp::export]]
double arma_sldet_diag(sp_mat arma_mat) {
vec d(arma_mat.diag());
return sum(log(d));
}
If you have plenty of memory on your machine (say 32+ GB) and a fast implementation of LAPACK (for example OpenBLAS or Intel MKL), a quick and dirty way is to convert the sparse matrix into a dense matrix and compute the log determinant on the dense matrix. A dense 30,000 x 30,000 matrix of doubles takes about 7.2 GB on its own.
Example:
sp_mat X = sprandu(30000,30000,0.01);
cx_double log_result = log_det( mat(X) );
While this obviously takes lots of memory, the advantage is that it avoids time consuming sparse solvers / factorizations. OpenBLAS or MKL will also take advantage of multiple cores.
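To make concrete what a dense log-determinant involves, here is a minimal sketch in plain Java: Gaussian elimination with partial pivoting, summing the logs of the absolute pivots. This is not how Armadillo or Eigen implement it (they rely on LAPACK or their own factorizations), and it returns log|det| only, without the sign or complex part; the class and method names are hypothetical.

```java
class LogAbsDet {
    /**
     * Returns log|det(a)| via Gaussian elimination with partial pivoting.
     * The input is copied, so the caller's matrix is left untouched.
     * Returns Double.NEGATIVE_INFINITY for an exactly singular matrix.
     */
    static double logAbsDet(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = a[i].clone();
        }
        double sum = 0.0;
        for (int col = 0; col < n; col++) {
            // Use the row with the largest absolute entry in this column as the pivot row.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (m[pivot][col] == 0.0) {
                return Double.NEGATIVE_INFINITY;
            }
            // A row swap only flips the sign of the determinant, which |det| ignores.
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            sum += Math.log(Math.abs(m[col][col]));
            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < n; c++) {
                    m[r][c] -= f * m[col][c];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 3}, {6, 3}}; // det = -6, so log|det| = log 6
        System.out.println(logAbsDet(a) + " vs " + Math.log(6));
    }
}
```

Summing logs of pivots instead of multiplying them avoids overflow or underflow of the determinant itself, which matters for matrices of this size. The cubic cost of the elimination is why a fast multi-threaded LAPACK makes such a difference for the dense route.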

3D Simplex Noise Sudden Height Change

I have a problem generating 3D Noise.
I've written a framework that uses DirectX11 to render everything.
I generate a Geo-sphere and modify the height values using a 3D Simplex Noise function.
The problem is that when I see the result I see sudden changes in height that are not noise like at all...
(the rectangle shape in the center of the picture)
I've modified the Persistence to 0.1 so the error is easily seen here...
I can't figure out the issue with the sudden height changes. This is an error, right?
I calculate the height with the following...
for( int i = 0; i < sphere.Vertices.size(); ++i )
{
// separate out our positions
float x = sphere.Vertices[ i ].Position.x;
float y = sphere.Vertices[ i ].Position.y;
float z = sphere.Vertices[ i ].Position.z;
// get our noise value ( -1 to 1 )
float ix = noise.octavenoise3D( 10, 0.1, 0.5, x, y, z, perm, &grad3[0][0] );
// pack our coordinates into a vector
XMVECTOR curPos = { x, y, z };
// get the normalized vector of our position
XMVECTOR normPos = XMVector3Normalize( curPos );
// separate our normalized x y and z
float normX = XMVectorGetX( normPos );
float normY = XMVectorGetY( normPos );
float normZ = XMVectorGetZ( normPos );
// figure out the height of this specific vertex, maxHeight = sphereRadius / 3.0f;
float height = ix * maxHeight;
float change = height + sphereRadius;
// calculate the offset x y and z by the noise
float changeX = change * normX;
float changeY = change * normY;
float changeZ = change * normZ;
// save our new x y and z
vertices[ i ].Pos.x = x + changeX;
vertices[ i ].Pos.y = y + changeY;
vertices[ i ].Pos.z = z + changeZ;
// calculate color based on noise value
float colorChange = ( 0.5f * ix );
float color = 0.5f + colorChange;
// save color value in r g b
vertices[ i ].Color.x = color;
vertices[ i ].Color.y = color;
vertices[ i ].Color.z = color;
}
Also, changing the base coordinates of the function doesn't get rid of this weird output. (for anyone who thinks that starting at 0,0,0 was messing it up somehow)
Noise Implementation
float Noise::octavenoise3D( const float octaves, const float persistence,
const float scale, const float x, const float y, const float z, int *perm, int *grad3 )
{
float total = 0;
float frequency = scale;
float amplitude = 1;
float maxAmplitude = 0;
for( int i = 0; i < octaves; i++ )
{
total = total + rawnoise3D( x * frequency, y * frequency, z * frequency, perm, grad3 ) * amplitude;
frequency = frequency * 2;
maxAmplitude = maxAmplitude + amplitude;
amplitude = amplitude * persistence;
}
return total / maxAmplitude;
}
float Noise::rawnoise3D( const float x, const float y, const float z, int *perm, int *grad3 )
{
// noise contributions from the four simplex corners
float n0, n1, n2, n3;
// skew the input space to find the simplex cell we are in
float F3 = 1.0 / 3.0;
float s = ( x + y + z ) * F3;
int i = fastfloor( x + s );
int j = fastfloor( y + s );
int k = fastfloor( z + s );
// unskew the cell origin back to (x, y, z) space
float G3 = 1.0 / 6.0;
float t = ( i + j + k ) * G3;
float X0 = i - t;
float Y0 = j - t;
float Z0 = k - t;
// distances from the cell origin
float x0 = x - X0;
float y0 = y - Y0;
float z0 = z - Z0;
// offsets of the second and third simplex corners in (i, j, k) coordinates
int i1, j1, k1;
int i2, j2, k2;
if( x0 >= y0 )
{
if( y0 >= z0 )
{
i1 = 0;
j1 = 0;
k1 = 1;
i2 = 1;
j2 = 1;
k2 = 0;
}
else if( x0 >= z0 )
{
i1 = 1;
j1 = 0;
k1 = 0;
i2 = 1;
j2 = 0;
k2 = 1;
}
else
{
i1 = 0;
j1 = 0;
k1 = 1;
i2 = 1;
j2 = 0;
k2 = 1;
}
}
else
{
if( y0 < z0 )
{
i1 = 0;
j1 = 0;
k1 = 1;
i2 = 0;
j2 = 1;
k2 = 0;
}
else if( x0 < z0 )
{
i1 = 0;
j1 = 1;
k1 = 0;
i2 = 0;
j2 = 1;
k2 = 1;
}
else
{
i1 = 0;
j1 = 1;
k1 = 0;
i2 = 1;
j2 = 1;
k2 = 0;
}
}
// offsets of the remaining corners in (x, y, z) coordinates
float x1 = x0 - i1 + G3;
float y1 = y0 - j1 + G3;
float z1 = z0 - k1 + G3;
float x2 = x0 - i2 + 2.0 * G3;
float y2 = y0 - j2 + 2.0 * G3;
float z2 = z0 - k2 + 2.0 * G3;
float x3 = x0 - 1.0 + 3.0 * G3;
float y3 = y0 - 1.0 + 3.0 * G3;
float z3 = z0 - 1.0 + 3.0 * G3;
// hashed gradient indices of the four simplex corners
int ii = i & 255;
int jj = j & 255;
int kk = k & 255;
int gi0 = perm[ ii + perm[ jj + perm[ kk ] ] ] % 12;
int gi1 = perm[ ii + i1 + perm[ jj + j1 + perm[ kk + k1 ] ] ] % 12;
int gi2 = perm[ ii + i2 + perm[ jj + j2 + perm[ kk + k2 ] ] ] % 12;
int gi3 = perm[ ii + 1 + perm[ jj + 1 + perm[ kk + 1 ] ] ] % 12;
// contribution from each of the four corners
float t0 = 0.6 - ( x0 * x0 ) - ( y0 * y0 ) - ( z0 * z0 );
if( t0 < 0 )
{
n0 = 0.0;
}
else
{
t0 = t0 * t0;
n0 = ( t0 * t0 ) * dot( &grad3[ gi0 ], x0, y0, z0);
}
float t1 = 0.6 - ( x1 * x1 ) - ( y1 * y1 ) - ( z1 * z1 );
if( t1 < 0 )
{
n1 = 0.0;
}
else
{
t1 *= t1;
n1 = ( t1 * t1 ) * dot( &grad3[ gi1 ], x1, y1, z1);
}
float t2 = 0.6 - ( x2 * x2 ) - ( y2 * y2 ) - ( z2 * z2 );
if( t2 < 0 )
{
n2 = 0.0;
}
else
{
t2 *= t2;
n2 = ( t2 * t2 ) * dot( &grad3[ gi2 ], x2, y2, z2);
}
float t3 = 0.6 - ( x3 * x3 ) - ( y3 * y3 ) - ( z3 * z3 );
if( t3 < 0 )
{
n3 = 0.0;
}
else
{
t3 = t3 * t3;
n3 = t3 * t3 * dot( &grad3[ gi3 ], x3, y3, z3);
}
// sum the contributions and scale the result
float final = 32.0 * ( n0 + n1 + n2 + n3 );
return final;
}
int Noise::fastfloor( const float x )
{
return x > 0 ? (int)x : (int)x - 1;
}
float Noise::dot( const int* g, const float x, const float y, const float z )
{
return g[0]*x + g[1]*y + g[2]*z;
}
I found the solution...
Solution: I was silly. I had a dumb error in my raw noise function, in the branches that pick the simplex corner offsets, but I have fixed it now.
This is the fixed part of the code:
if( x0 >= y0 )
{
if( y0 >= z0 )
{
i1 = 1;
j1 = 0;
k1 = 0;
i2 = 1;
j2 = 1;
k2 = 0;
}
else if( x0 >= z0 )
{
i1 = 1;
j1 = 0;
k1 = 0;
i2 = 1;
j2 = 0;
k2 = 1;
}
else
{
i1 = 0;
j1 = 0;
k1 = 1;
i2 = 1;
j2 = 0;
k2 = 1;
}
}
else
{
if( y0 < z0 )
{
i1 = 0;
j1 = 0;
k1 = 1;
i2 = 0;
j2 = 1;
k2 = 1;
}
else if( x0 < z0 )
{
i1 = 0;
j1 = 1;
k1 = 0;
i2 = 0;
j2 = 1;
k2 = 1;
}
else
{
i1 = 0;
j1 = 1;
k1 = 0;
i2 = 1;
j2 = 1;
k2 = 0;
}
}
It turned out the old version had two incorrect offsets: ( i1, j1, k1 ) in the x0 >= y0 >= z0 branch, and k2 in the y0 < z0 branch.
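Mistakes of this kind are hard to spot by eye, but the offset table has a structure that can be checked mechanically: the second corner is one unit step along a single axis, and the third corner adds one more step along a different axis. Below is a small sketch of such a check, using the corrected table from above; the class and method names (SimplexOrdering, offsets, isValid) are made up for illustration and are not part of the original framework.

```java
import java.util.Arrays;

class SimplexOrdering {
    /** Returns {i1, j1, k1, i2, j2, k2} for the simplex containing (x0, y0, z0). */
    static int[] offsets(double x0, double y0, double z0) {
        if (x0 >= y0) {
            if (y0 >= z0) return new int[] {1, 0, 0, 1, 1, 0};      // X Y Z order
            else if (x0 >= z0) return new int[] {1, 0, 0, 1, 0, 1}; // X Z Y order
            else return new int[] {0, 0, 1, 1, 0, 1};               // Z X Y order
        } else {
            if (y0 < z0) return new int[] {0, 0, 1, 0, 1, 1};       // Z Y X order
            else if (x0 < z0) return new int[] {0, 1, 0, 0, 1, 1};  // Y Z X order
            else return new int[] {0, 1, 0, 1, 1, 0};               // Y X Z order
        }
    }

    /**
     * Structural check on a table row of 0/1 entries: the first offset has
     * exactly one 1, the second has exactly two, and the second contains the first.
     */
    static boolean isValid(int[] o) {
        int first = o[0] + o[1] + o[2];
        int second = o[3] + o[4] + o[5];
        boolean nested = o[0] <= o[3] && o[1] <= o[4] && o[2] <= o[5];
        return first == 1 && second == 2 && nested;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(offsets(0.3, 0.2, 0.1)));
        // The two rows that were wrong in the question both fail the check.
        System.out.println(isValid(new int[] {0, 0, 1, 1, 1, 0}));
        System.out.println(isValid(new int[] {0, 0, 1, 0, 1, 0}));
    }
}
```

Running a check like this over all six branches once, for example in a debug build, would have flagged both bad rows without needing to render anything.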

Strassen algorithm not the fastest?

I copied Strassen's algorithm from somewhere and then executed it. Here is the output:
n = 256
classical took 360ms
strassen 1 took 33609ms
strassen2 took 1172ms
classical took 437ms
strassen 1 took 32891ms
strassen2 took 1156ms
classical took 266ms
strassen 1 took 27234ms
strassen2 took 734ms
Here strassen1 is the dynamic-allocation version, strassen2 is the cached version, and classical is the plain old matrix multiplication. This suggests that our old and easy classical one is the best. Is this true, or am I wrong somewhere? Here's the code in Java.
import java.util.Random;
class TestIntMatrixMultiplication {
public static void main (String...args) throws Exception {
final int n = args.length > 0 ? Integer.parseInt(args[0]) : 256;
final int seed = args.length > 1 ? Integer.parseInt(args[1]) : 256;
final Random random = new Random(seed);
int[][] a, b, c;
a = new int[n][n];
b = new int[n][n];
c = new int[n][n];
for(int i=0; i<n; i++) {
for(int j=0; j<n; j++) {
a[i][j] = random.nextInt(100);
b[i][j] = random.nextInt(100);
}
}
System.out.println("n = " + n);
if (a.length < 64) {
System.out.println("A");
dumpMatrix(a);
System.out.println("B");
dumpMatrix(b);
System.out.println("classic");
Classical.mult(c, a, b);
dumpMatrix(c);
System.out.println("strassen");
strassen2.mult(c, a, b);
dumpMatrix(c);
return;
}
for (int i = 0; i <3; ++i) {
timeMultiplies1(a, b, c);
if (n <= 256)
timeMultiplies2( a, b, c);
timeMultiplies3( a, b, c);
}
}
static void timeMultiplies1 (int[][] a, int[][] b, int[][] c) {
final long start = System.currentTimeMillis();
Classical.mult(c, a, b);
final long finish = System.currentTimeMillis();
System.out.println("classical took " + (finish - start) + "ms");
}
static void timeMultiplies2(int[][] a, int[][] b, int[][] c) {
final long start = System.currentTimeMillis();
strassen1.mult(c, a, b);
final long finish = System.currentTimeMillis();
System.out.println("strassen 1 took " + (finish - start) + "ms");
}
static void timeMultiplies3 (int[][] a, int[][] b, int[][] c) {
final long start = System.currentTimeMillis();
strassen2.mult(c, a, b);
final long finish = System.currentTimeMillis();
System.out.println("strassen2 took " + (finish - start) + "ms");
}
static void dumpMatrix (int[][] m) {
for (int[] row : m) {
System.out.print("[\t");
for (int val : row) {
System.out.print(val);
System.out.print('\t');
}
System.out.println(']');
}
}
}
class strassen1{
public String getName () {
return "Strassen(dynamic)";
}
public static int[][] mult (int[][] c, int[][] a, int[][] b) {
return strassenMatrixMultiplication(a, b);
}
public static int [][] strassenMatrixMultiplication(int [][] A, int [][] B) {
int n = A.length;
int [][] result = new int[n][n];
if(n == 1) {
result[0][0] = A[0][0] * B[0][0];
} else {
int [][] A11 = new int[n/2][n/2];
int [][] A12 = new int[n/2][n/2];
int [][] A21 = new int[n/2][n/2];
int [][] A22 = new int[n/2][n/2];
int [][] B11 = new int[n/2][n/2];
int [][] B12 = new int[n/2][n/2];
int [][] B21 = new int[n/2][n/2];
int [][] B22 = new int[n/2][n/2];
divideArray(A, A11, 0 , 0);
divideArray(A, A12, 0 , n/2);
divideArray(A, A21, n/2, 0);
divideArray(A, A22, n/2, n/2);
divideArray(B, B11, 0 , 0);
divideArray(B, B12, 0 , n/2);
divideArray(B, B21, n/2, 0);
divideArray(B, B22, n/2, n/2);
int [][] P1 = strassenMatrixMultiplication(addMatrices(A11, A22), addMatrices(B11, B22));
int [][] P2 = strassenMatrixMultiplication(addMatrices(A21, A22), B11);
int [][] P3 = strassenMatrixMultiplication(A11, subtractMatrices(B12, B22));
int [][] P4 = strassenMatrixMultiplication(A22, subtractMatrices(B21, B11));
int [][] P5 = strassenMatrixMultiplication(addMatrices(A11, A12), B22);
int [][] P6 = strassenMatrixMultiplication(subtractMatrices(A21, A11), addMatrices(B11, B12));
int [][] P7 = strassenMatrixMultiplication(subtractMatrices(A12, A22), addMatrices(B21, B22));
int [][] C11 = addMatrices(subtractMatrices(addMatrices(P1, P4), P5), P7);
int [][] C12 = addMatrices(P3, P5);
int [][] C21 = addMatrices(P2, P4);
int [][] C22 = addMatrices(subtractMatrices(addMatrices(P1, P3), P2), P6);
copySubArray(C11, result, 0 , 0);
copySubArray(C12, result, 0 , n/2);
copySubArray(C21, result, n/2, 0);
copySubArray(C22, result, n/2, n/2);
}
return result;
}
public static int [][] addMatrices(int [][] A, int [][] B) {
int n = A.length;
int [][] result = new int[n][n];
for(int i=0; i<n; i++)
for(int j=0; j<n; j++)
result[i][j] = A[i][j] + B[i][j];
return result;
}
public static int [][] subtractMatrices(int [][] A, int [][] B) {
int n = A.length;
int [][] result = new int[n][n];
for(int i=0; i<n; i++)
for(int j=0; j<n; j++)
result[i][j] = A[i][j] - B[i][j];
return result;
}
public static void divideArray(int[][] parent, int[][] child, int iB, int jB) {
for(int i1 = 0, i2=iB; i1<child.length; i1++, i2++)
for(int j1 = 0, j2=jB; j1<child.length; j1++, j2++)
child[i1][j1] = parent[i2][j2];
}
public static void copySubArray(int[][] child, int[][] parent, int iB, int jB) {
for(int i1 = 0, i2=iB; i1<child.length; i1++, i2++)
for(int j1 = 0, j2=jB; j1<child.length; j1++, j2++)
parent[i2][j2] = child[i1][j1];
}
}
class strassen2{
public String getName () {
return "Strassen(cached)";
}
static int [][] p1;
static int [][] p2;
static int [][] p3;
static int [][] p4;
static int [][] p5;
static int [][] p6;
static int [][] p7;
static int [][] t0;
static int [][] t1;
public static int[][] mult (int[][] c, int[][] a, int[][] b) {
final int n = c.length;
if (p1 == null || p1.length < n) {
p1 = new int[n/2][n-1];
p2 = new int[n/2][n-1];
p3 = new int[n/2][n-1];
p4 = new int[n/2][n-1];
p5 = new int[n/2][n-1];
p6 = new int[n/2][n-1];
p7 = new int[n/2][n-1];
t0 = new int[n/2][n-1];
t1 = new int[n/2][n-1];
}
mult(c, a, b, 0, 0, n, 0);
return c;
}
public static void mult (int[][] c, int[][] a, int[][] b, int i0, int j0, int n, int offs) {
if(n == 1) {
c[i0][j0] = a[i0][j0] * b[i0][j0];
} else {
final int nBy2 = n/2;
final int i1 = i0 + nBy2;
final int j1 = j0 + nBy2;
// offset applied to 'p' j index so recursive calls don't overwrite data
final int jp0 = offs;
final int jp1 = nBy2 + offs;
// P1 <- (A11 + A22)(B11 + B22)
// T0 <- (A11 + A22), T1 <- (B11 + B22), P1 <- T0*T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i0][j + j0] + a[i + i1][j + j1];
t1[i + i0][j + jp0] = b[i + i0][j + j0] + b[i + i1][j + j1];
}
}
mult(p1, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P2 <- (A21 + A22)B11
// T0 <- (A21 + A22), T1 <- B11, P2 <- T0*T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i1][j + j0] + a[i + i1][j + j1];
t1[i + i0][j + jp0] = b[i + i0][j + j0];
}
}
mult(p2, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P3 <- A11(B12 - B22)
// T0 <- A11, T1 <- (B12 - B22), P3 <- T0*T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i0][j + j0];
t1[i + i0][j + jp0] = b[i + i0][j + j1] - b[i + i1][j + j1];
}
}
mult(p3, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P4 <- A22(B21 - B11)
// T0 <- A22, T1 <- (B21 - B11), P4 <- T0*T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i1][j + j1];
t1[i + i0][j + jp0] = b[i + i1][j + j0] - b[i + i0][j + j0];
}
}
mult(p4, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P5 <- (A11 + A12) B22
// T0 <- (A11 + A12), T1 <- B22, P5 <- T0*T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i0][j + j0] + a[i + i0][j + j1];
t1[i + i0][j + jp0] = b[i + i1][j + j1];
}
}
mult(p5, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P6 <- (A21 - A11)(B11 + B12)
// T0 <- (A21 - A11), T1 <- (B11 + B12), P6 <- T0 * T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i1][j + j0] - a[i + i0][j + j0];
t1[i + i0][j + jp0] = b[i + i0][j + j0] + b[i + i0][j + j1];
}
}
mult(p6, t0, t1, i0, jp0, nBy2, offs + nBy2);
// P7 <- (A12 - A22)(B21 + B22)
// T0 <- (A12 - A22), T1 <- (B21 + B22), P7 <- T0 * T1
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
t0[i + i0][j + jp0] = a[i + i0][j + j1] - a[i + i1][j + j1];
t1[i + i0][j + jp0] = b[i + i1][j + j0] + b[i + i1][j + j1];
}
}
mult(p7, t0, t1, i0, jp0, nBy2, offs + nBy2);
// combine
for (int i = 0; i < nBy2; ++i) {
for (int j = 0; j < nBy2; ++j) {
// C11 = P1 + P4 - P5 + P7;
c[i + i0][j + j0] = p1[i + i0][j + jp0] + p4[i + i0][j + jp0] - p5[i + i0][j + jp0] + p7[i + i0][j + jp0];
// C12 = P3 + P5;
c[i + i0][j + j1] = p3[i + i0][j + jp0] + p5[i + i0][j + jp0];
// C21 = P2 + P4;
c[i + i1][j + j0] = p2[i + i0][j + jp0] + p4[i + i0][j + jp0];
// C22 = P1 + P3 - P2 + P6;
c[i + i1][j + j1] = p1[i + i0][j + jp0] + p3[i + i0][j + jp0] - p2[i + i0][j + jp0] + p6[i + i0][j + jp0];
}
}
}
}
void dumpInternal () {
System.out.println("P1");
TestIntMatrixMultiplication.dumpMatrix(p1);
System.out.println("P2");
TestIntMatrixMultiplication.dumpMatrix(p2);
System.out.println("P3");
TestIntMatrixMultiplication.dumpMatrix(p3);
System.out.println("P4");
TestIntMatrixMultiplication.dumpMatrix(p4);
System.out.println("P5");
TestIntMatrixMultiplication.dumpMatrix(p5);
System.out.println("P6");
TestIntMatrixMultiplication.dumpMatrix(p6);
System.out.println("P7");
TestIntMatrixMultiplication.dumpMatrix(p7);
System.out.println("T0");
TestIntMatrixMultiplication.dumpMatrix(t0);
System.out.println("T1");
TestIntMatrixMultiplication.dumpMatrix(t1);
}
}
class Classical{
public String getName () {
return "classic";
}
public static int[][] mult (int[][] c, int[][] a, int[][] b) {
int n = a.length;
for(int i=0; i<n; i++) {
final int[] a_i = a[i];
final int[] c_i = c[i];
for(int j=0; j<n; j++) {
int sum = 0;
for(int k=0; k<n; k++) {
sum += a_i[k] * b[k][j];
}
c_i[j] = sum;
}
}
return c;
}
}
Issues I see:
1) Your Strassen multiply is dynamically allocating memory all the time. This is going to kill performance.
2) Your Strassen multiply should switch over to conventional multiply for small sizes rather than being recursive all the way down (though this optimization sort of invalidates your test).
3) Your matrix size may simply be too small to see the difference.
You should do comparisons with several different sizes. Perhaps 256, 512, 1024, 2048, 4096, 8192... Then plot the times and look at the trends. You will probably want matrix size on a log scale if it's all powers of 2.
Strassen is only faster for large N. How large will depend a lot on the implementation. What you have done for classical is only a basic implementation and is not optimal on a modern machine either.
Implementation questions aside, I think you're misunderstanding the algorithm's performance. Like phkahler said, your expectations are a little off for the performance of the algorithm. Divide-and-conquer algorithms work well for large inputs because they recursively break the problem into sub-problems which can be solved more quickly.
However, the overhead associated with this splitting can cause the algorithm to run (sometimes much) slower for small or even medium-sized inputs. Typically, the theoretical analysis of an algorithm like Strassen's will include a so-called "breakpoint" calculation. This is the input size above which the divide-and-conquer approach, splitting overhead included, becomes faster than the naive technique.
Your code needs to include a check on the size of the input that switches to the naive technique at the breakpoint.
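A minimal sketch of such a check, assuming square int matrices whose size is a power of two. The CUTOFF value of 64 is a placeholder and not a measured breakpoint; the right value depends on the machine and the implementation. The class name and helpers (StrassenWithCutoff, block, add) are illustrative, not taken from the question's code.

```java
import java.util.Arrays;
import java.util.Random;

class StrassenWithCutoff {
    // Placeholder threshold; the real breakpoint has to be measured.
    static final int CUTOFF = 64;

    /** Classical triple loop, used at or below the cutoff. */
    static int[][] classical(int[][] a, int[][] b) {
        int n = a.length;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                int aik = a[i][k];
                for (int j = 0; j < n; j++) c[i][j] += aik * b[k][j];
            }
        return c;
    }

    /** Strassen for n x n inputs (n a power of two) that falls back to classical for small n. */
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;
        if (n <= CUTOFF) return classical(a, b);
        int h = n / 2;
        int[][] a11 = block(a, 0, 0, h), a12 = block(a, 0, h, h);
        int[][] a21 = block(a, h, 0, h), a22 = block(a, h, h, h);
        int[][] b11 = block(b, 0, 0, h), b12 = block(b, 0, h, h);
        int[][] b21 = block(b, h, 0, h), b22 = block(b, h, h, h);
        int[][] m1 = multiply(add(a11, a22, 1), add(b11, b22, 1));
        int[][] m2 = multiply(add(a21, a22, 1), b11);
        int[][] m3 = multiply(a11, add(b12, b22, -1));
        int[][] m4 = multiply(a22, add(b21, b11, -1));
        int[][] m5 = multiply(add(a11, a12, 1), b22);
        int[][] m6 = multiply(add(a21, a11, -1), add(b11, b12, 1));
        int[][] m7 = multiply(add(a12, a22, -1), add(b21, b22, 1));
        int[][] c = new int[n][n];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) {
                c[i][j] = m1[i][j] + m4[i][j] - m5[i][j] + m7[i][j];
                c[i][j + h] = m3[i][j] + m5[i][j];
                c[i + h][j] = m2[i][j] + m4[i][j];
                c[i + h][j + h] = m1[i][j] - m2[i][j] + m3[i][j] + m6[i][j];
            }
        return c;
    }

    /** Copies the h x h block of m whose top-left corner is (r, c). */
    static int[][] block(int[][] m, int r, int c, int h) {
        int[][] out = new int[h][h];
        for (int i = 0; i < h; i++) System.arraycopy(m[i + r], c, out[i], 0, h);
        return out;
    }

    /** Returns x + sign * y element-wise, where sign is +1 or -1. */
    static int[][] add(int[][] x, int[][] y, int sign) {
        int h = x.length;
        int[][] out = new int[h][h];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) out[i][j] = x[i][j] + sign * y[i][j];
        return out;
    }

    public static void main(String[] args) {
        int n = 128;
        Random random = new Random(256);
        int[][] a = new int[n][n], b = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                a[i][j] = random.nextInt(100);
                b[i][j] = random.nextInt(100);
            }
        System.out.println(Arrays.deepEquals(multiply(a, b), classical(a, b)));
    }
}
```

This sketch still allocates new blocks at every level, so it only addresses the recursion depth. To find a realistic CUTOFF, time the method for several values (for example 16, 32, 64, 128) on the sizes you care about and keep the fastest.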
Write down what the Strassen algorithm does for a 2 x 2 matrix. Count the operations. The number is absolutely ridiculous: 7 multiplications and 18 additions/subtractions, against 8 multiplications and 4 additions for the classical method. It's stupid to use Strassen's method for a 2 x 2 matrix. Same for a 3 x 3 or 4 x 4 matrix, and probably quite a way up.

An algorithm to find bounding box of closed bezier curves?

I'm looking for an algorithm to find bounding box (max/min points) of a closed quadratic bezier curve in Cartesian axis:
input: C (a closed bezier curve)
output: A B C D points
Image http://www.imagechicken.com/uploads/1270586513022388700.jpg
Note: the image above shows a smooth curve, but the curve may not be smooth (it can have corners).
Ivan Kuckir's DeCasteljau is brute force, but works in many cases. The problem with it is the number of iterations. The actual shape and the distance between coordinates affect the precision of the result, and to find a precise enough answer you have to iterate tens of times, maybe more. It may also fail if there are sharp turns in the curve.
A better solution is to find the roots of the first derivative, as described on the excellent site http://processingjs.nihongoresources.com/bezierinfo/. Please read the section "Finding the extremities of the curves".
The link above has the algorithm for both quadratic and cubic curves.
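For the quadratic case the derivative is linear, so each axis has at most one interior extremum. For one coordinate with control values p0, p1, p2, setting B'(t) = 2(1-t)(p1-p0) + 2t(p2-p1) to zero gives t = (p0 - p1) / (p0 - 2*p1 + p2). The following is a minimal sketch for a single quadratic segment; the class and method names are hypothetical, and for a closed path made of several segments the assumption is that you take the union of the per-segment boxes.

```java
import java.util.Arrays;

class QuadraticBezierBounds {
    /** Value of one coordinate of the quadratic Bezier curve at parameter t. */
    static double eval(double p0, double p1, double p2, double t) {
        double mt = 1 - t;
        return mt * mt * p0 + 2 * mt * t * p1 + t * t * p2;
    }

    /** Returns {min, max} of one coordinate over t in [0, 1]. */
    static double[] range(double p0, double p1, double p2) {
        // The end points are always candidates.
        double lo = Math.min(p0, p2);
        double hi = Math.max(p0, p2);
        // Root of the derivative: t = (p0 - p1) / (p0 - 2*p1 + p2).
        double denom = p0 - 2 * p1 + p2;
        if (Math.abs(denom) > 1e-12) { // otherwise the coordinate is linear in t
            double t = (p0 - p1) / denom;
            if (t > 0 && t < 1) {
                double v = eval(p0, p1, p2, t);
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
        }
        return new double[] {lo, hi};
    }

    /** Returns {left, top, right, bottom} for control points (x0,y0), (x1,y1), (x2,y2). */
    static double[] bounds(double x0, double y0, double x1, double y1, double x2, double y2) {
        double[] xr = range(x0, x1, x2);
        double[] yr = range(y0, y1, y2);
        return new double[] {xr[0], yr[0], xr[1], yr[1]};
    }

    public static void main(String[] args) {
        // Arch from (0,0) to (2,0) with control point (1,2); the peak is at y = 1.
        System.out.println(Arrays.toString(bounds(0, 0, 1, 2, 2, 0)));
        // prints [0.0, 0.0, 2.0, 1.0]
    }
}
```

Corners between segments need no special handling here, because every segment end point is already included as a candidate.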
The asker of the question is interested in quadratic curves, so the rest of this answer may be irrelevant, because I provide code for calculating the extremities of cubic curves.
Below are three JavaScript snippets, of which the first (CODE 1) is the one I suggest using.
** CODE 1 **
After testing processingjs's and Raphael's solutions I found they had some restrictions and/or bugs. After more searching I found Bonsai and its bounding box function, which is based on NISHIO Hirokazu's Python script. Both have a downside: floating-point equality is tested using ==. When I changed these to numerically robust comparisons, the script succeeded in all cases. I tested the script with thousands of random paths and also with all collinear cases, and all succeeded:
Various cubic curves
Random cubic curves
Collinear cubic curves
The code is as follows. Usually the left, right, top and bottom values are all that is needed, but in some cases it is useful to know the coordinates of the local extreme points and the corresponding t values. So I added two variables for that: tvalues and points. Remove the code regarding them and you have a fast and stable bounding box calculation function.
// Source: http://blog.hackers-cafe.net/2009/06/how-to-calculate-bezier-curves-bounding.html
// Original version: NISHIO Hirokazu
// Modifications: Timo
var pow = Math.pow,
sqrt = Math.sqrt,
min = Math.min,
max = Math.max,
abs = Math.abs;
function getBoundsOfCurve(x0, y0, x1, y1, x2, y2, x3, y3)
{
var tvalues = new Array();
var bounds = [new Array(), new Array()];
var points = new Array();
var a, b, c, t, t1, t2, b2ac, sqrtb2ac;
for (var i = 0; i < 2; ++i)
{
if (i == 0)
{
b = 6 * x0 - 12 * x1 + 6 * x2;
a = -3 * x0 + 9 * x1 - 9 * x2 + 3 * x3;
c = 3 * x1 - 3 * x0;
}
else
{
b = 6 * y0 - 12 * y1 + 6 * y2;
a = -3 * y0 + 9 * y1 - 9 * y2 + 3 * y3;
c = 3 * y1 - 3 * y0;
}
if (abs(a) < 1e-12) // Numerical robustness
{
if (abs(b) < 1e-12) // Numerical robustness
{
continue;
}
t = -c / b;
if (0 < t && t < 1)
{
tvalues.push(t);
}
continue;
}
b2ac = b * b - 4 * c * a;
if (b2ac < 0) // no real roots, so no extremum in this dimension
{
continue;
}
sqrtb2ac = sqrt(b2ac);
t1 = (-b + sqrtb2ac) / (2 * a);
if (0 < t1 && t1 < 1)
{
tvalues.push(t1);
}
t2 = (-b - sqrtb2ac) / (2 * a);
if (0 < t2 && t2 < 1)
{
tvalues.push(t2);
}
}
var x, y, j = tvalues.length,
jlen = j,
mt;
while (j--)
{
t = tvalues[j];
mt = 1 - t;
x = (mt * mt * mt * x0) + (3 * mt * mt * t * x1) + (3 * mt * t * t * x2) + (t * t * t * x3);
bounds[0][j] = x;
y = (mt * mt * mt * y0) + (3 * mt * mt * t * y1) + (3 * mt * t * t * y2) + (t * t * t * y3);
bounds[1][j] = y;
points[j] = {
X: x,
Y: y
};
}
tvalues[jlen] = 0;
tvalues[jlen + 1] = 1;
points[jlen] = {
X: x0,
Y: y0
};
points[jlen + 1] = {
X: x3,
Y: y3
};
bounds[0][jlen] = x0;
bounds[1][jlen] = y0;
bounds[0][jlen + 1] = x3;
bounds[1][jlen + 1] = y3;
tvalues.length = bounds[0].length = bounds[1].length = points.length = jlen + 2;
return {
left: min.apply(null, bounds[0]),
top: min.apply(null, bounds[1]),
right: max.apply(null, bounds[0]),
bottom: max.apply(null, bounds[1]),
points: points, // local extremes
tvalues: tvalues // t values of local extremes
};
};
// Usage:
var bounds = getBoundsOfCurve(532,333,117,305,28,93,265,42);
console.log(JSON.stringify(bounds));
// Prints: {"left":135.77684049079755,"top":42,"right":532,"bottom":333,"points":[{"X":135.77684049079755,"Y":144.86387466397255},{"X":532,"Y":333},{"X":265,"Y":42}],"tvalues":[0.6365030674846626,0,1]}
CODE 2 (which fails in collinear cases):
I translated the code from http://processingjs.nihongoresources.com/bezierinfo/sketchsource.php?sketch=tightBoundsCubicBezier to Javascript. The code works fine in normal cases, but not in collinear cases where all points lie on the same line.
For reference, here is the Javascript code.
function computeCubicBaseValue(a,b,c,d,t) {
var mt = 1-t;
return mt*mt*mt*a + 3*mt*mt*t*b + 3*mt*t*t*c + t*t*t*d;
}
function computeCubicFirstDerivativeRoots(a,b,c,d) {
var ret = [-1,-1];
var tl = -a+2*b-c;
var tr = -Math.sqrt(-a*(c-d) + b*b - b*(c+d) +c*c);
var dn = -a+3*b-3*c+d;
if(dn!=0) { ret[0] = (tl+tr)/dn; ret[1] = (tl-tr)/dn; }
return ret;
}
function computeCubicBoundingBox(xa,ya,xb,yb,xc,yc,xd,yd)
{
// find the zero point for x and y in the derivatives
var minx = 9999;
var maxx = -9999;
if(xa<minx) { minx=xa; }
if(xa>maxx) { maxx=xa; }
if(xd<minx) { minx=xd; }
if(xd>maxx) { maxx=xd; }
var ts = computeCubicFirstDerivativeRoots(xa, xb, xc, xd);
for(var i=0; i<ts.length;i++) {
var t = ts[i];
if(t>=0 && t<=1) {
var x = computeCubicBaseValue(t, xa, xb, xc, xd);
var y = computeCubicBaseValue(t, ya, yb, yc, yd);
if(x<minx) { minx=x; }
if(x>maxx) { maxx=x; }}}
var miny = 9999;
var maxy = -9999;
if(ya<miny) { miny=ya; }
if(ya>maxy) { maxy=ya; }
if(yd<miny) { miny=yd; }
if(yd>maxy) { maxy=yd; }
ts = computeCubicFirstDerivativeRoots(ya, yb, yc, yd);
for(i=0; i<ts.length;i++) {
var t = ts[i];
if(t>=0 && t<=1) {
var x = computeCubicBaseValue(t, xa, xb, xc, xd);
var y = computeCubicBaseValue(t, ya, yb, yc, yd);
if(y<miny) { miny=y; }
if(y>maxy) { maxy=y; }}}
// bounding box corner coordinates
var bbox = [minx,miny, maxx,miny, maxx,maxy, minx,maxy ];
return bbox;
}
CODE 3 (works in most cases):
To also handle collinear cases, I found Raphael's solution, which is based on the same first derivative method as CODE 2. I also added a return value dots, which holds the extrema points, because it is not always enough to know the bounding box's min and max coordinates; sometimes we want the exact extrema coordinates.
EDIT: found another bug. It fails e.g. for 532,333,117,305,28,93,265,42 and also in many other cases.
The code is here:
Array.max = function( array ){
return Math.max.apply( Math, array );
};
Array.min = function( array ){
return Math.min.apply( Math, array );
};
var findDotAtSegment = function (p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y, t) {
var t1 = 1 - t;
return {
x: t1*t1*t1*p1x + t1*t1*3*t*c1x + t1*3*t*t * c2x + t*t*t * p2x,
y: t1*t1*t1*p1y + t1*t1*3*t*c1y + t1*3*t*t * c2y + t*t*t * p2y
};
};
var cubicBBox = function (p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y) {
var a = (c2x - 2 * c1x + p1x) - (p2x - 2 * c2x + c1x),
b = 2 * (c1x - p1x) - 2 * (c2x - c1x),
c = p1x - c1x,
t1 = (-b + Math.sqrt(b * b - 4 * a * c)) / 2 / a,
t2 = (-b - Math.sqrt(b * b - 4 * a * c)) / 2 / a,
y = [p1y, p2y],
x = [p1x, p2x],
dot, dots=[];
Math.abs(t1) > "1e12" && (t1 = 0.5);
Math.abs(t2) > "1e12" && (t2 = 0.5);
if (t1 >= 0 && t1 <= 1) {
dot = findDotAtSegment(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y, t1);
x.push(dot.x);
y.push(dot.y);
dots.push({X:dot.x, Y:dot.y});
}
if (t2 >= 0 && t2 <= 1) {
dot = findDotAtSegment(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y, t2);
x.push(dot.x);
y.push(dot.y);
dots.push({X:dot.x, Y:dot.y});
}
a = (c2y - 2 * c1y + p1y) - (p2y - 2 * c2y + c1y);
b = 2 * (c1y - p1y) - 2 * (c2y - c1y);
c = p1y - c1y;
t1 = (-b + Math.sqrt(b * b - 4 * a * c)) / 2 / a;
t2 = (-b - Math.sqrt(b * b - 4 * a * c)) / 2 / a;
Math.abs(t1) > "1e12" && (t1 = 0.5);
Math.abs(t2) > "1e12" && (t2 = 0.5);
if (t1 >= 0 && t1 <= 1) {
dot = findDotAtSegment(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y, t1);
x.push(dot.x);
y.push(dot.y);
dots.push({X:dot.x, Y:dot.y});
}
if (t2 >= 0 && t2 <= 1) {
dot = findDotAtSegment(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y, t2);
x.push(dot.x);
y.push(dot.y);
dots.push({X:dot.x, Y:dot.y});
}
// remove duplicate dots
var dots2 = [];
var l = dots.length;
for(var i=0; i<l; i++) {
for(var j=i+1; j<l; j++) {
if (dots[i].X === dots[j].X && dots[i].Y === dots[j].Y)
j = ++i;
}
dots2.push({X: dots[i].X, Y: dots[i].Y});
}
return {
min: {x: Array.min(x), y: Array.min(y)},
max: {x: Array.max(x), y: Array.max(y)},
dots: dots2 // these are the extrema points
};
};
Well, I would say you start by adding all endpoints to your bounding box. Then, you go through all the bezier elements. I assume the formula in question is this one:
From this, extract two formulas for X and Y, respectively. Test both for extrema by taking the derivative (zero crossings). Then add the corresponding points to your bounding box as well.
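As a worked sketch of that step, assuming the segment is a cubic Bezier with points $P_0, P_1, P_2, P_3$ (the original formula image is not reproduced here, so this form is an assumption), each coordinate can be differentiated on its own:

```latex
\begin{aligned}
B(t)  &= (1-t)^3 P_0 + 3(1-t)^2 t\,P_1 + 3(1-t)\,t^2 P_2 + t^3 P_3, \qquad 0 \le t \le 1 \\
B'(t) &= a\,t^2 + b\,t + c \\
a &= -3P_0 + 9P_1 - 9P_2 + 3P_3 \\
b &= 6P_0 - 12P_1 + 6P_2 \\
c &= 3P_1 - 3P_0 \\
t_{1,2} &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \quad (a \neq 0), \qquad t = -\frac{c}{b} \quad (a = 0,\ b \neq 0)
\end{aligned}
```

Only roots with $0 < t < 1$ correspond to points on the segment; these are the same a, b and c coefficients used in the CODE 1 listing.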
Use the De Casteljau algorithm to approximate curves of higher order. Here is how it works for a cubic curve:
http://jsfiddle.net/4VCVX/25/
function getCurveBounds(ax, ay, bx, by, cx, cy, dx, dy)
{
var px, py, qx, qy, rx, ry, sx, sy, tx, ty,
tobx, toby, tocx, tocy, todx, tody, toqx, toqy,
torx, tory, totx, toty;
var x, y, minx, miny, maxx, maxy;
minx = miny = Number.POSITIVE_INFINITY;
maxx = maxy = Number.NEGATIVE_INFINITY;
tobx = bx - ax; toby = by - ay; // directions
tocx = cx - bx; tocy = cy - by;
todx = dx - cx; tody = dy - cy;
var step = 1/40; // precision
for(var d=0; d<1.001; d+=step)
{
px = ax +d*tobx; py = ay +d*toby;
qx = bx +d*tocx; qy = by +d*tocy;
rx = cx +d*todx; ry = cy +d*tody;
toqx = qx - px; toqy = qy - py;
torx = rx - qx; tory = ry - qy;
sx = px +d*toqx; sy = py +d*toqy;
tx = qx +d*torx; ty = qy +d*tory;
totx = tx - sx; toty = ty - sy;
x = sx + d*totx; y = sy + d*toty;
minx = Math.min(minx, x); miny = Math.min(miny, y);
maxx = Math.max(maxx, x); maxy = Math.max(maxy, y);
}
return {x:minx, y:miny, width:maxx-minx, height:maxy-miny};
}
I believe that the convex hull of a Bezier curve's control points encloses the curve. If you just want an axis-aligned bounding box, I think you need to find the min and max of each (x, y) over the control points of all the segments.
I suppose that might not be a tight box. That is, the box might be slightly larger than it needs to be, but it's simple and fast to compute. I guess it depends on your requirements.
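A minimal sketch of this loose box, assuming the points are passed as a flat array of coordinates; the function name controlPolygonBounds and the return shape are made up for illustration:

```javascript
// Loose axis-aligned box from the control polygon (hypothetical helper).
// points: flat array [x0, y0, x1, y1, ...] of all end points and control points.
function controlPolygonBounds(points) {
  var minx = Infinity, miny = Infinity, maxx = -Infinity, maxy = -Infinity;
  for (var i = 0; i + 1 < points.length; i += 2) {
    minx = Math.min(minx, points[i]);
    maxx = Math.max(maxx, points[i]);
    miny = Math.min(miny, points[i + 1]);
    maxy = Math.max(maxy, points[i + 1]);
  }
  return { left: minx, top: miny, right: maxx, bottom: maxy };
}

var loose = controlPolygonBounds([532, 333, 117, 305, 28, 93, 265, 42]);
console.log(JSON.stringify(loose));
// Prints: {"left":28,"top":42,"right":532,"bottom":333}
```

For the same cubic, the derivative-based code in the first answer reports a left edge of about 135.78, so the loose box here is noticeably wider on that side.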
I think the accepted answer is fine, but just wanted to offer a little more explanation for anyone else trying to do this.
Consider a quadratic Bezier with starting point p1, ending point p2 and "control point" pc. This curve has three parametric equations:
pa(t) = p1 + t(pc-p1)
pb(t) = pc + t(p2-pc)
p(t) = pa(t) + t*(pb(t) - pa(t))
In all cases, t runs from 0 to 1, inclusive.
The first two are linear, defining line segments from p1 to pc and from pc to p2, respectively. The third is quadratic once you substitute in the expressions for pa(t) and pb(t); this is the one that actually defines points on the curve.
Actually, each of these equations is a pair of equations, one for the horizontal dimension, and one for the vertical. The nice thing about parametric curves is that the x and y can be handled independently of one another. The equations are exactly the same, just substitute x or y for p in the above equations.
The important point is that the line segment defined in equation 3, which runs from pa(t) to pb(t) for a specific value of t, is tangent to the curve at the corresponding point p(t). To find the local extrema of the curve, you need to find the parameter value where the tangent is flat (i.e., a critical point). For the vertical dimension, you want to find the value of t such that ya(t) = yb(t), which gives the tangent a slope of 0. For the horizontal dimension, find t such that xa(t) = xb(t), which gives the tangent an infinite slope (i.e., a vertical line). In each case, you can just plug the value of t back into equation 1 (or 2, or even 3) to get the location of that extremum.
In other words, to find the vertical extremum of the curve, take just the y-component of equations 1 and 2, set them equal to each other and solve for t; plug this back into the y-component of equation 1 to get the y-value of that extremum. To get the complete y-range of the curve, find the minimum of this extreme y value and the y-components of the two end points, and likewise find the maximum of all three. Repeat for x to get the horizontal limits.
Remember that t only runs in [0, 1], so if you get a value outside of this range, it means there is no local extremum on the curve (at least not between your two endpoints). This includes the case where you end up dividing by zero when solving for t, which you will probably need to check for before you do it.
The same idea can be applied to higher-order Beziers, there are just more equations of higher degree, which also means there are potentially more local extrema per curve. For instance, on a cubic Bezier (two control points), solving for t to find the local extrema is a quadratic equation, so you could get 0, 1, or 2 values (remember to check for 0-denominators, and for negative square-roots, both of which indicate that there are no local extrema for that dimension). To find the range, you just need to find the min/max of all the local extrema, and the two end points.
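The quadratic case described above can be sketched as follows. Setting pa(t) = pb(t) gives t = (p1 - pc) / (p1 - 2*pc + p2) for each coordinate. The function names and the 1e-12 tolerance are assumptions chosen for illustration, not part of the explanation above:

```javascript
// Parameter of the extremum for one coordinate, or null if there is none
// strictly inside (0, 1). Solves p1 + t(pc - p1) = pc + t(p2 - pc).
function quadraticExtremumT(p1, pc, p2) {
  var denom = p1 - 2 * pc + p2;
  if (Math.abs(denom) < 1e-12) return null; // would divide by zero: no critical point
  var t = (p1 - pc) / denom;
  return (t > 0 && t < 1) ? t : null;
}

// One coordinate of the curve at t, using equations 1 to 3.
function quadraticAt(p1, pc, p2, t) {
  var pa = p1 + t * (pc - p1);
  var pb = pc + t * (p2 - pc);
  return pa + t * (pb - pa);
}

function quadraticBounds(x1, y1, xc, yc, x2, y2) {
  var xs = [x1, x2], ys = [y1, y2]; // the end points are always candidates
  var tx = quadraticExtremumT(x1, xc, x2);
  if (tx !== null) xs.push(quadraticAt(x1, xc, x2, tx));
  var ty = quadraticExtremumT(y1, yc, y2);
  if (ty !== null) ys.push(quadraticAt(y1, yc, y2, ty));
  return {
    min: { x: Math.min.apply(null, xs), y: Math.min.apply(null, ys) },
    max: { x: Math.max.apply(null, xs), y: Math.max.apply(null, ys) }
  };
}

var box = quadraticBounds(0, 0, 50, 100, 100, 0);
console.log(JSON.stringify(box));
// Prints: {"min":{"x":0,"y":0},"max":{"x":100,"y":50}}
```

In this example x is linear in t (the denominator is zero), so only the end points contribute to the x-range, while y has its extremum at t = 0.5, where the curve reaches y = 50 rather than the control point's y = 100.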
I answered this question in Calculating the bounding box of cubic bezier curve
This article explains the details and also has a live HTML5 demo:
Calculating / Computing the Bounding Box of Cubic Bezier
I found JavaScript code in Snap.svg that calculates this;
see the bezierBBox and curveDim functions.
I rewrote it as a standalone JavaScript function:
// (x0,y0) is the start point; (x1,y1) and (x2,y2) are the control points; (x3,y3) is the end point.
function bezierMinMax(x0, y0, x1, y1, x2, y2, x3, y3) {
var tvalues = [], xvalues = [], yvalues = [],
a, b, c, t, t1, t2, b2ac, sqrtb2ac;
for (var i = 0; i < 2; ++i) {
if (i == 0) {
b = 6 * x0 - 12 * x1 + 6 * x2;
a = -3 * x0 + 9 * x1 - 9 * x2 + 3 * x3;
c = 3 * x1 - 3 * x0;
} else {
b = 6 * y0 - 12 * y1 + 6 * y2;
a = -3 * y0 + 9 * y1 - 9 * y2 + 3 * y3;
c = 3 * y1 - 3 * y0;
}
if (Math.abs(a) < 1e-12) {
if (Math.abs(b) < 1e-12) {
continue;
}
t = -c / b;
if (0 < t && t < 1) {
tvalues.push(t);
}
continue;
}
b2ac = b * b - 4 * c * a;
if (b2ac < 0) {
continue;
}
sqrtb2ac = Math.sqrt(b2ac);
t1 = (-b + sqrtb2ac) / (2 * a);
if (0 < t1 && t1 < 1) {
tvalues.push(t1);
}
t2 = (-b - sqrtb2ac) / (2 * a);
if (0 < t2 && t2 < 1) {
tvalues.push(t2);
}
}
var j = tvalues.length, mt;
while (j--) {
t = tvalues[j];
mt = 1 - t;
xvalues[j] = (mt * mt * mt * x0) + (3 * mt * mt * t * x1) + (3 * mt * t * t * x2) + (t * t * t * x3);
yvalues[j] = (mt * mt * mt * y0) + (3 * mt * mt * t * y1) + (3 * mt * t * t * y2) + (t * t * t * y3);
}
xvalues.push(x0,x3);
yvalues.push(y0,y3);
return {
min: {x: Math.min.apply(0, xvalues), y: Math.min.apply(0, yvalues)},
max: {x: Math.max.apply(0, xvalues), y: Math.max.apply(0, yvalues)}
};
}
Timo's first variant adapted to Objective-C:
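// Evaluates one coordinate of a cubic Bezier at parameter t.
// Assumed definition: the functions below call CubicBezier, and this body
// uses the same polynomial as the JavaScript version.
CGFloat CubicBezier(CGFloat a, CGFloat b, CGFloat c, CGFloat d, CGFloat t) {
CGFloat mt = 1 - t;
return mt * mt * mt * a + 3 * mt * mt * t * b + 3 * mt * t * t * c + t * t * t * d;
}
CGPoint CubicBezierPointAt(CGPoint p1, CGPoint p2, CGPoint p3, CGPoint p4, CGFloat t) {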
CGFloat x = CubicBezier(p1.x, p2.x, p3.x, p4.x, t);
CGFloat y = CubicBezier(p1.y, p2.y, p3.y, p4.y, t);
return CGPointMake(x, y);
}
// array containing the top-left and bottom-right points of the curve's enclosing bounds
NSArray* CubicBezierExtremums(CGPoint p1, CGPoint p2, CGPoint p3, CGPoint p4) {
CGFloat a, b, c, t, t1, t2, b2ac, sqrtb2ac;
NSMutableArray *tValues = [NSMutableArray new];
for (int i = 0; i < 2; i++) {
if (i == 0) {
a = 3 * (-p1.x + 3 * p2.x - 3 * p3.x + p4.x);
b = 6 * (p1.x - 2 * p2.x + p3.x);
c = 3 * (p2.x - p1.x);
}
else {
a = 3 * (-p1.y + 3 * p2.y - 3 * p3.y + p4.y);
b = 6 * (p1.y - 2 * p2.y + p3.y);
c = 3 * (p2.y - p1.y);
}
if(ABS(a) < CGFLOAT_MIN) {// Numerical robustness
if (ABS(b) < CGFLOAT_MIN) {// Numerical robustness
continue;
}
t = -c / b;
if (t > 0 && t < 1) {
[tValues addObject:[NSNumber numberWithDouble:t]];
}
continue;
}
b2ac = pow(b, 2) - 4 * c * a;
if (b2ac < 0) {
continue;
}
sqrtb2ac = sqrt(b2ac);
t1 = (-b + sqrtb2ac) / (2 * a);
if (t1 > 0.0 && t1 < 1.0) {
[tValues addObject:[NSNumber numberWithDouble:t1]];
}
t2 = (-b - sqrtb2ac) / (2 * a);
if (t2 > 0.0 && t2 < 1.0) {
[tValues addObject:[NSNumber numberWithDouble:t2]];
}
}
int j = (int)tValues.count;
CGFloat x = 0;
CGFloat y = 0;
NSMutableArray *xValues = [NSMutableArray new];
NSMutableArray *yValues = [NSMutableArray new];
while (j--) {
t = [[tValues objectAtIndex:j] doubleValue];
x = CubicBezier(p1.x, p2.x, p3.x, p4.x, t);
y = CubicBezier(p1.y, p2.y, p3.y, p4.y, t);
[xValues addObject:[NSNumber numberWithDouble:x]];
[yValues addObject:[NSNumber numberWithDouble:y]];
}
[xValues addObject:[NSNumber numberWithDouble:p1.x]];
[xValues addObject:[NSNumber numberWithDouble:p4.x]];
[yValues addObject:[NSNumber numberWithDouble:p1.y]];
[yValues addObject:[NSNumber numberWithDouble:p4.y]];
//find minX, minY, maxX, maxY
CGFloat minX = [[xValues valueForKeyPath:@"@min.self"] doubleValue];
CGFloat minY = [[yValues valueForKeyPath:@"@min.self"] doubleValue];
CGFloat maxX = [[xValues valueForKeyPath:@"@max.self"] doubleValue];
CGFloat maxY = [[yValues valueForKeyPath:@"@max.self"] doubleValue];
CGPoint origin = CGPointMake(minX, minY);
CGPoint bottomRight = CGPointMake(maxX, maxY);
NSArray *toReturn = [NSArray arrayWithObjects:
[NSValue valueWithCGPoint:origin],
[NSValue valueWithCGPoint:bottomRight],
nil];
return toReturn;
}
Timo's CODE 2 answer has a small bug: the t parameter in computeCubicBaseValue function should be last. Nevertheless good job, works like a charm ;)
Solution in C# :
double computeCubicBaseValue(double a, double b, double c, double d, double t)
{
var mt = 1 - t;
return mt * mt * mt * a + 3 * mt * mt * t * b + 3 * mt * t * t * c + t * t * t * d;
}
double[] computeCubicFirstDerivativeRoots(double a, double b, double c, double d)
{
var ret = new double[2] { -1, -1 };
var tl = -a + 2 * b - c;
var tr = -Math.Sqrt(-a * (c - d) + b * b - b * (c + d) + c * c);
var dn = -a + 3 * b - 3 * c + d;
if (dn != 0) { ret[0] = (tl + tr) / dn; ret[1] = (tl - tr) / dn; }
return ret;
}
public double[] ComputeCubicBoundingBox(Point start, Point firstControl, Point secondControl, Point end)
{
double xa, ya, xb, yb, xc, yc, xd, yd;
xa = start.X;
ya = start.Y;
xb = firstControl.X;
yb = firstControl.Y;
xc = secondControl.X;
yc = secondControl.Y;
xd = end.X;
yd = end.Y;
// find the zero point for x and y in the derivatives
double minx = Double.MaxValue;
double maxx = Double.MinValue;
if (xa < minx) { minx = xa; }
if (xa > maxx) { maxx = xa; }
if (xd < minx) { minx = xd; }
if (xd > maxx) { maxx = xd; }
var ts = computeCubicFirstDerivativeRoots(xa, xb, xc, xd);
for (var i = 0; i < ts.Length; i++)
{
var t = ts[i];
if (t >= 0 && t <= 1)
{
var x = computeCubicBaseValue(xa, xb, xc, xd,t);
var y = computeCubicBaseValue(ya, yb, yc, yd,t);
if (x < minx) { minx = x; }
if (x > maxx) { maxx = x; }
}
}
double miny = Double.MaxValue;
double maxy = Double.MinValue;
if (ya < miny) { miny = ya; }
if (ya > maxy) { maxy = ya; }
if (yd < miny) { miny = yd; }
if (yd > maxy) { maxy = yd; }
ts = computeCubicFirstDerivativeRoots(ya, yb, yc, yd);
for (var i = 0; i < ts.Length; i++)
{
var t = ts[i];
if (t >= 0 && t <= 1)
{
var x = computeCubicBaseValue(xa, xb, xc, xd,t);
var y = computeCubicBaseValue(ya, yb, yc, yd,t);
if (y < miny) { miny = y; }
if (y > maxy) { maxy = y; }
}
}
// bounding box corner coordinates
var bbox = new double[] { minx, miny, maxx, maxy};
return bbox;
}

Resources