Is this a knapsack problem or bin packing? I couldn't find an exact solution, but basically I have a fixed rectangular area that I want to fill with perfect squares representing my items, where each item has a different weight that influences its size relative to the others.
They will be sorted from large to small, from top left to bottom right.
Also, even though I need perfect squares, in the end some non-uniform scaling is allowed to fill the entire space, as long as the squares still retain their relative areas and the non-uniform scaling is kept as small as possible.
What algorithm can I use to achieve this?
There's a fast approximation algorithm due to Hiroshi Nagamochi and Yuusuke Abe. I implemented it in C++, taking care to obtain a worst-case O(n log n)-time implementation with worst-case recursive depth O(log n). If n ≤ 100, these precautions are probably unnecessary.
#include <algorithm>
#include <iostream>
#include <numeric> // for std::accumulate
#include <random>
#include <vector>
namespace {
struct Rectangle {
double x;
double y;
double width;
double height;
};
Rectangle Slice(Rectangle &r, const double beta) {
const double alpha = 1 - beta;
if (r.width > r.height) {
const double alpha_width = alpha * r.width;
const double beta_width = beta * r.width;
r.width = alpha_width;
return {r.x + alpha_width, r.y, beta_width, r.height};
}
const double alpha_height = alpha * r.height;
const double beta_height = beta * r.height;
r.height = alpha_height;
return {r.x, r.y + alpha_height, r.width, beta_height};
}
void LayoutRecursive(const std::vector<double> &reals, const std::size_t begin,
std::size_t end, double sum, Rectangle rectangle,
std::vector<Rectangle> &dissection) {
while (end - begin > 1) {
double suffix_sum = reals[end - 1];
std::size_t mid = end - 1;
while (mid > begin + 1 && suffix_sum + reals[mid - 1] <= 2 * sum / 3) {
suffix_sum += reals[mid - 1];
mid -= 1;
}
LayoutRecursive(reals, mid, end, suffix_sum,
Slice(rectangle, suffix_sum / sum), dissection);
end = mid;
sum -= suffix_sum;
}
dissection.push_back(rectangle);
}
std::vector<Rectangle> Layout(std::vector<double> reals,
const Rectangle rectangle) {
std::sort(reals.begin(), reals.end());
std::vector<Rectangle> dissection;
dissection.reserve(reals.size());
LayoutRecursive(reals, 0, reals.size(),
std::accumulate(reals.begin(), reals.end(), double{0}),
rectangle, dissection);
return dissection;
}
std::vector<double> RandomReals(const std::size_t n) {
std::vector<double> reals(n);
std::exponential_distribution<> dist;
std::default_random_engine gen;
for (double &real : reals) {
real = dist(gen);
}
return reals;
}
} // namespace
int main() {
const std::vector<Rectangle> dissection =
Layout(RandomReals(100), {72, 72, 6.5 * 72, 9 * 72});
std::cout << "%!PS\n";
for (const Rectangle &r : dissection) {
std::cout << r.x << " " << r.y << " " << r.width << " " << r.height
<< " rectstroke\n";
}
std::cout << "showpage\n";
}
OK, so let's assume integer positions and sizes (no float operations). To evenly divide a rectangle into a regular square grid (with squares as big as possible), the cell size is the greatest common divisor (GCD) of the rectangle's sides. For example, a 760x560 rectangle has GCD 40, so the naive grid is 19x14 = 266 squares.
However, you want far fewer squares than that, so I would try something like this:
try all square sizes a from 1 to the smaller side of the rectangle
for each a compute the naive square grid size of the rest of the rectangle once an a*a square is cut off it
so it's simply the GCD again on the 2 rectangles that will be created once the a*a square is cut off. If the min of all 3 sizes (a and the GCDs of the 2 rectangles) is bigger than 1 (ignoring zero-area rectangles), consider a a valid solution and remember it.
after the for loop use the last valid a found
so simply add the a*a square to your output and recursively do this whole thing again for the 2 rectangles that remain of your original rectangle after the a*a square was cut off.
Here is a simple C++/VCL/OpenGL example for this:
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "Unit1.h"
#include "gl_simple.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
class square // simple square
{
public:
int x,y,a; // corner 2D position and size
square(){ x=y=a=0; }
square(int _x,int _y,int _a){ x=_x; y=_y; a=_a; }
~square(){}
void draw()
{
glBegin(GL_LINE_LOOP);
glVertex2i(x ,y);
glVertex2i(x+a,y);
glVertex2i(x+a,y+a);
glVertex2i(x ,y+a);
glEnd();
}
};
int rec[4]={20,20,760,560}; // x,y,a,b
const int N=1024; // max square number
int n=0; // number of squares
square s[N]; // squares
//---------------------------------------------------------------------------
int gcd(int a,int b) // slow euclid GCD
{
if(!b) return a;
return gcd(b, a % b);
}
//---------------------------------------------------------------------------
void compute(int x0,int y0,int xs,int ys)
{
if ((xs==0)||(ys==0)) return;
const int x1=x0+xs;
const int y1=y0+ys;
int a,b,i,x,y;
square t;
// try to find biggest square first
for (a=1,b=0;(a<=xs)&&(a<=ys);a++)
{
// sizes for the rest of the rectangle once a*a square is cut off
if (xs==a) x=0; else x=gcd(a,xs-a);
if (ys==a) y=0; else y=gcd(a,ys-a);
// min of all sizes
i=a;
if ((x>0)&&(i>x)) i=x;
if ((y>0)&&(i>y)) i=y;
// if divisible by more than 1 remember it as a better solution
if (i>1) b=a;
} a=b;
// bigger square not found so use naive square grid division
if (a<=1)
{
t.a=gcd(xs,ys);
for (t.y=y0;t.y<y1;t.y+=t.a)
for (t.x=x0;t.x<x1;t.x+=t.a)
if (n<N){ s[n]=t; n++; }
}
// biggest square found so add it to result and recursively process the rest
else{
t=square(x0,y0,a);
if (n<N){ s[n]=t; n++; }
compute(x0+a,y0,xs-a,a);
compute(x0,y0+a,xs,ys-a);
}
}
//---------------------------------------------------------------------------
void gl_draw()
{
glClear(GL_COLOR_BUFFER_BIT);
glDisable(GL_DEPTH_TEST);
glDisable(GL_TEXTURE_2D);
// set view to 2D [pixel] units
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(-1.0,-1.0,0.0);
glScalef(2.0/float(xs),2.0/float(ys),1.0);
// render input rectangle
glColor3f(0.2,0.2,0.2);
glBegin(GL_QUADS);
glVertex2i(rec[0] ,rec[1]);
glVertex2i(rec[0]+rec[2],rec[1]);
glVertex2i(rec[0]+rec[2],rec[1]+rec[3]);
glVertex2i(rec[0] ,rec[1]+rec[3]);
glEnd();
// render output squares
glColor3f(0.2,0.5,0.9);
for (int i=0;i<n;i++) s[i].draw();
glFinish();
SwapBuffers(hdc);
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
// Init of program
gl_init(Handle); // init OpenGL
n=0; compute(rec[0],rec[1],rec[2],rec[3]);
Caption=n;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender)
{
// Exit of program
gl_exit();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender)
{
// repaint
gl_draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormResize(TObject *Sender)
{
// resize
gl_resize(ClientWidth,ClientHeight);
}
//---------------------------------------------------------------------------
And a preview for the currently hardcoded rectangle:
The number 8 in the Caption of the window is the number of squares produced.
Beware, this is just a very simple starter example for this problem. I have not tested it extensively, so it is possible that prime sizes or just unlucky aspect ratios result in a really high number of squares... for example if the GCD of the rectangle sides is 1 (coprime sizes). In such a case you should tweak the initial rectangle size (+/-1 or whatever).
The important parts of the code are just the compute() function and the global variables holding the output squares s[n]... beware that compute() does not clear the list (in order to allow recursion), so you need to set n=0 before its non-recursive call.
To keep this simple I avoided using any libs, dynamic allocations, or containers for the compute itself...
I have a time-series, which essentially amounts to some instrument recording the current time whenever it makes a "detection". The sampling rate is therefore not constant; however, we can treat it as such by "re-sampling", relying on the fact that the detections are made reliably, so we can simply insert 0's to "fill in" the gaps. This will be important later...
The instrument should detect the "signals" sent by another, nearby instrument. This second instrument emits a signal at some unknown period, T (e.g. 1 signal per second), with a "jitter" likely on the order of a few tenths of a percent of the period.
My goal is to determine this period (or frequency, if you like) using only the timestamps recorded by the "detecting" instrument. Unfortunately, however, the detector is flooded with noise, and a significant amount (I estimate 97-98%) of "detections" (and therefore "points" in the time-series) are due to noise. Therefore, extracting the period will require more careful analysis.
My first thought was to simply feed the time series into an FFT algorithm (I'm using FFTW/DHT); however, this wasn't particularly enlightening. I've also tried my own (admittedly rather crude) algorithm, which simply computed a cross-correlation of the series with "clean" series of increasing period. I didn't get very far with this either, and there are quite a handful of details to consider (phase, etc.).
It occurs to me that something like this must've been done before, and surely there's a "nice" way to accomplish it.
Here's my approach. Given a period, we can score it using a dynamic program to find the subsequence of detection times that includes the first and last detection and maximizes the sum of gap log-likelihoods, where the gap log-likelihood is defined as minus the square of the difference of the gap and the period (Gaussian jitter model).
If we have approximately the right period, then we can get a very good gap sequence (some weirdness at the beginning and end and wherever there is a missed detection, but this is OK).
If we have the wrong period, then we end up with basically exponential jitter, which has low log-likelihood.
The C++ below generates fake detection times with a planted period and then searches over periods. Scores are normalized by a (bad) estimate of the score for Poisson noise, so wrong periods score about 0.4. See the plot below.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <random>
#include <vector>
namespace {
static constexpr double kFalseNegativeRate = 0.01;
static constexpr double kCoefficientOfVariation = 0.003;
static constexpr int kSignals = 6000;
static constexpr int kNoiseToSignalRatio = 50;
template <class URNG>
std::vector<double> FakeTimes(URNG &g, const double period) {
std::vector<double> times;
std::bernoulli_distribution false_negative(kFalseNegativeRate);
std::uniform_real_distribution<double> phase(0, period);
double signal = phase(g);
std::normal_distribution<double> interval(period,
kCoefficientOfVariation * period);
std::uniform_real_distribution<double> noise(0, kSignals * period);
for (int i = 0; i < kSignals; i++) {
if (!false_negative(g)) {
times.push_back(signal);
}
signal += interval(g);
for (double j = 0; j < kNoiseToSignalRatio; j++) {
times.push_back(noise(g));
}
}
std::sort(times.begin(), times.end());
return times;
}
constexpr double Square(const double x) { return x * x; }
struct Subsequence {
double score;
int previous;
};
struct Result {
double score = std::numeric_limits<double>::quiet_NaN();
double median_interval = std::numeric_limits<double>::quiet_NaN();
};
Result Score(const std::vector<double> &times, const double period) {
if (times.empty() || !std::is_sorted(times.begin(), times.end())) {
return {};
}
std::vector<Subsequence> bests;
bests.reserve(times.size());
bests.push_back({0, -1});
for (int i = 1; i < times.size(); i++) {
Subsequence best = {std::numeric_limits<double>::infinity(), -1};
for (int j = i - 1; j > -1; j--) {
const double difference = times[i] - times[j];
const double penalty = Square(difference - period);
if (difference >= period && penalty >= best.score) {
break;
}
const Subsequence candidate = {bests[j].score + penalty, j};
if (candidate.score < best.score) {
best = candidate;
}
}
bests.push_back(best);
}
std::vector<double> intervals;
int i = bests.size() - 1;
while (true) {
int previous_i = bests[i].previous;
if (previous_i < 0) {
break;
}
intervals.push_back(times[i] - times[previous_i]);
i = previous_i;
}
if (intervals.empty()) {
return {};
}
const double duration = times.back() - times.front();
// The rate is doubled because we can look for a time in either direction.
const double rate = 2 * (times.size() - 1) / duration;
// Mean of the square of an exponential distribution with the given rate.
const double mean_square = 2 / Square(rate);
const double score = bests.back().score / (intervals.size() * mean_square);
const auto median_interval = intervals.begin() + intervals.size() / 2;
std::nth_element(intervals.begin(), median_interval, intervals.end());
return {score, *median_interval};
}
} // namespace
int main() {
std::default_random_engine g;
const auto times = FakeTimes(g, std::sqrt(2));
for (int i = 0; i < 2000; i++) {
const double period = std::pow(1.001, i) / 3;
const Result result = Score(times, period);
std::cout << period << ' ' << result.score << ' ' << result.median_interval
<< std::endl;
}
}
I am trying to compare same-size arrays.
Given the following array:
I am looking for an algorithm to tell me the most "similar" array to the input. I understand that the word "similar" is not very specific but I don't know how to be more specific.
For example the following is very similar to the input.
The following is somewhat similar.
The following is very different.
You could apply a smoothing kernel to the array, and then compute the L2 norm (Euclidean distance) on it.
This is often used to compare e.g. neural spike trains or other continuous signals.
http://www.cs.utah.edu/~suresh/papers/kerneld/kerneld.pdf
You didn't specify a language... I happen to have code in C++ (it may not be the most efficient).
First, you smooth the vector with your desired kernel, parameterized by the kernel width depending on the scale/desired amount of "blur", etc. For example:
Output of code below (behaves as expected):
riveale#rv-mba:~/tmpdir$ g++ -std=c++11 test.cpp -o test.exe
riveale#rv-mba:~/tmpdir$ ./test.exe
Distance [1] to [2]: [31.488026] (should be far)
Distance [2] to [3]: [26.591297] (should be far)
Distance [1] to [3]: [12.468342] (should be closer)
And code (test.cpp):
#include <vector>
#include <cstdlib>
#include <cstdio>
#include <cmath>
double gauss_kernel_funct(const size_t& sourcetime, const size_t& thistime)
{
const double tauval = 5.0; //width of kernel
double dist = ((double)sourcetime - (double)thistime)/tauval; //signed distance between the points in the vector (the casts avoid size_t underflow when thistime > sourcetime)
double retval = exp(-1 * dist*dist); //exponential decay away from center of that point, squared....
return retval;
}
std::vector<double> convolvegauss( const std::vector<double>& v1)
{
std::vector<double> convolved( v1.size(), 0.0 );
for(size_t t=0; t<v1.size(); ++t)
{
for(size_t u=0; u<v1.size(); ++u)
{
double coeff = gauss_kernel_funct(u, t);
convolved[t]+=v1[u] * coeff;
}
}
return (convolved);
}
double eucliddist( const std::vector<double>& v1, const std::vector<double>& v2 )
{
if(v1.size() != v2.size()) { fprintf(stderr, "ERROR v1!=v2 sizes\n"); exit(1); }
double sum=0.0;
for(size_t x=0; x<v1.size(); ++x)
{
double tmp = (v1[x] - v2[x]);
sum += tmp*tmp; //sum += distance of this dimension squared
}
return (sqrt( sum ));
}
double vectdist( const std::vector<double>& v1, const std::vector<double>& v2 )
{
std::vector<double> convolved1 = convolvegauss( v1 );
std::vector<double> convolved2 = convolvegauss( v2 );
return (eucliddist( convolved1, convolved2 ));
}
int main()
{
//Original 3 vectors. (1 and 3) are closer than (1 and 2) or (2 and 3)...like your example.
std::vector<double> myvector1 = {1.0, 32.0, 10.0, 5.0, 2.0};
std::vector<double> myvector2 = {2.0, 3.0, 10.0, 22.0, 2.0};
std::vector<double> myvector3 = {2.0, 20.0, 17.0, 1.0, 2.0};
//Now run vectdist on each pair, which convolves each vector with the Gaussian kernel and takes the Euclidean distance between the convolved vectors
fprintf(stdout, "Distance [%d] to [%d]: [%lf] (should be far)\n", 1, 2, vectdist(myvector1, myvector2) );
fprintf(stdout, "Distance [%d] to [%d]: [%lf] (should be far)\n", 2, 3, vectdist(myvector2, myvector3) );
fprintf(stdout, "Distance [%d] to [%d]: [%lf] (should be closer)\n", 1, 3, vectdist(myvector1, myvector3) );
return 0;
}
I have a problem where I need to define a polygon with the minimum number of vertices that intersects or contains every pixel in an image that is not transparent (let N be the number of pixels in the image). My only assumptions are that the image cannot contain transparent pixels within its boundary (holes), and that at least two pixels are non-transparent. As an example, let's say I have the following image:
My idea for an algorithm is thus:
1) Determine the edge pixels. This is done in O(N) time by iterating through each pixel and determining whether any neighbors (out of the four pixels left, above, right, and below it) are empty. Store the pixel and which neighbors were non-transparent, keyed by linear index into the array of pixels. Let there be P edge pixels, shown in orange below.
2) Get an adjacency list of the edge pixels. This is done in O(P) time by selecting one of the edge pixels and choosing a direction based on empty neighbors. For example, if a pixel has a bottom and right neighbor, then the next pixel will be either the one in the upper-right corner or the pixel immediately to the right. Select the next edge pixel from the dictionary of remaining edge pixels. Append that pixel to the list until the algorithm returns to the starting pixel. There are 27 edge pixels in the example image below (some are an edge pixel more than once).
3) Draw a maze that all edges must lie between. This is done in O(P) time by iterating through the adjacency list and adding an edge to all edges on those pixels without a neighbor. In addition, an edge is added to the interior of the shape based on the direction to the next edge pixel. If the pixel represents a peninsula of single-pixel width, the inner edge is added to the middle of the edge instead of the pixel vertex. The interior of the maze is shown in red. Note that the maze boundary is a super-set of all the edge pixels found in step 2.
4) Find a polygon with almost minimal edges that does not touch the border of the maze. This is the part I need help with. Does anyone have a suggestion of how you would go from step #3 to a solution such as the following?
I have no background in image processing, but I came across the Ramer–Douglas–Peucker algorithm yesterday and I think it might be helpful.
From my quick scan of the Wikipedia article, it reduces the number of points in a curve, so I would run this algorithm on each line, where the points of the line are the endpoints you have, plus the borders of squares that the line crosses, also set as points.
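For reference, here is a minimal sketch of Ramer–Douglas–Peucker in C++ (my own illustration, not from the question; the Pt type and the tolerance eps are assumptions you would adapt):
#include <cmath>
#include <cstddef>
#include <vector>
struct Pt { double x, y; };
// Perpendicular distance from p to the line through a and b.
static double PerpDist(const Pt &p, const Pt &a, const Pt &b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dx * (p.y - a.y) - dy * (p.x - a.x)) / len;
}
// Keep the endpoints; recurse on the farthest point if it is more than
// eps away from the chord, otherwise drop everything in between.
static void Rdp(const std::vector<Pt> &pts, std::size_t lo, std::size_t hi,
                double eps, std::vector<Pt> &out) {
    double dmax = 0;
    std::size_t imax = lo;
    for (std::size_t i = lo + 1; i < hi; i++) {
        const double d = PerpDist(pts[i], pts[lo], pts[hi]);
        if (d > dmax) { dmax = d; imax = i; }
    }
    if (dmax > eps) {
        Rdp(pts, lo, imax, eps, out);
        Rdp(pts, imax, hi, eps, out);
    } else {
        out.push_back(pts[hi]); // pts[lo] was already emitted by the caller
    }
}
std::vector<Pt> Simplify(const std::vector<Pt> &pts, double eps) {
    std::vector<Pt> out;
    if (pts.empty()) return out;
    out.push_back(pts.front());
    if (pts.size() > 1) Rdp(pts, 0, pts.size() - 1, eps, out);
    return out;
}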
I marked out in this image two lines you could run the algorithm on and I think it would work.
How to find each line and when to stop, I'm not 100% sure, but I hope this was useful.
See find holes in 2D point set;
it is a very similar problem.
Just invert the map and use the midpoint of each grid square as a point.
That will lead to this:
As you want the inner polygon, there are 2 choices:
shrink the shape by 1 cell before applying this (can lose some detail in the shape)
change the H/V lines so they are 1 cell shorter (half from both sides)
That will lead to something like this:
After some changes in the code I can use 2x multi-sampling now, so the result is:
Now you can join connected lines with the same slope,
and apply something like ear clipping on the found corners to get closer to your desired polygon.
Here is the inverted source code in C++ (there may be some leftover comments about holes):
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
/* usage:
int i;
pntcloud_polygons h;
pnt2D point[points];
h.scann_beg(); for (i=0;i<points;i++) { p=point[i]; h.scann_pnt(p.x,p.y); } h.scann_end();
h.cell_size(2.5); // or (h.x1-h.x0)*0.01 ... cell size >> avg point distance
h.holes_beg(); for (i=0;i<points;i++) { p=point[i]; h.holes_pnt(p.x,p.y); } h.holes_end();
*/
//---------------------------------------------------------------------------
class pntcloud_polygons
{
public:
int xs,ys,n; // cell grid x,y - size and points count
int **map; // points density map[xs][ys]
// i=(x-x0)*g2l; x=x0+(i*l2g);
// j=(y-y0)*g2l; y=y0+(j*l2g);
double mg2l,ml2g; // scale to/from global/map space (x,y) <-> map[i][j]
double x0,x1,y0,y1; // used area (bounding box)
struct _line
{
int id; // id of hole for segmentation/polygonize
float i0,i1,j0,j1; // index in map[][]
_line(){}; _line(_line& a){ *this=a; }; ~_line(){}; _line* operator = (const _line *a) { *this=*a; return this; }; /*_line* operator = (const _line &a) { ...copy... return this; };*/
};
List<_line> lin;
int lin_i0; // start index for perimeter lines (smaller indexes are the H,V lines inside hole)
struct _point
{
int i,j; // index in map[][]
int p0,p1; // previous next point
int used;
_point(){}; _point(_point& a){ *this=a; }; ~_point(){}; _point* operator = (const _point *a) { *this=*a; return this; }; /*_point* operator = (const _point &a) { ...copy... return this; };*/
};
List<_point> pnt;
// class init and internal stuff
pntcloud_polygons() { xs=0; ys=0; n=0; map=NULL; mg2l=1.0; ml2g=1.0; x0=0.0; y0=0.0; x1=0.0; y1=0.0; lin_i0=0; };
pntcloud_polygons(pntcloud_polygons& a){ *this=a; };
~pntcloud_polygons() { _free(); };
pntcloud_polygons* operator = (const pntcloud_polygons *a) { *this=*a; return this; };
pntcloud_polygons* operator = (const pntcloud_polygons &a)
{
xs=0; ys=0; n=a.n; map=NULL;
mg2l=a.mg2l; x0=a.x0; x1=a.x1;
ml2g=a.ml2g; y0=a.y0; y1=a.y1;
_alloc(a.xs,a.ys);
for (int i=0;i<xs;i++)
for (int j=0;j<ys;j++) map[i][j]=a.map[i][j];
return this;
}
void _free() { if (map) { for (int i=0;i<xs;i++) if (map[i]) delete[] map[i]; delete[] map; } xs=0; ys=0; }
void _alloc(int _xs,int _ys) { int i=0; _free(); xs=_xs; ys=_ys; map=new int*[xs]; if (map) for (i=0;i<xs;i++) { map[i]=new int[ys]; if (map[i]==NULL) { i=-1; break; } } else i=-1; if (i<0) _free(); }
// scann boundary box interface
void scann_beg();
void scann_pnt(double x,double y);
void scann_end();
// dynamic allocations
void cell_size(double sz); // compute/allocate grid from grid cell size = sz x sz
// scann pntcloud_polygons interface
void holes_beg();
void holes_pnt(double x,double y);
void holes_end();
// global(x,y) <- local map[i][j] + half cell offset
inline void l2g(double &x,double &y,int i,int j) { x=x0+((double(i)+0.5)*ml2g); y=y0+((double(j)+0.5)*ml2g); }
inline void l2g(double &x,double &y,float i,float j) { x=x0+((double(i)+0.5)*ml2g); y=y0+((double(j)+0.5)*ml2g); }
// local map[i][j] <- global(x,y)
inline void g2l(int &i,int &j,double x,double y) { i= double((x-x0) *mg2l); j= double((y-y0) *mg2l); }
};
//---------------------------------------------------------------------------
void pntcloud_polygons::scann_beg()
{
x0=0.0; y0=0.0; x1=0.0; y1=0.0; n=0;
}
//---------------------------------------------------------------------------
void pntcloud_polygons::scann_pnt(double x,double y)
{
if (!n) { x0=x; y0=y; x1=x; y1=y; }
if (n<0x7FFFFFFF) n++; // avoid overflow
if (x0>x) x0=x; if (x1<x) x1=x;
if (y0>y) y0=y; if (y1<y) y1=y;
}
//---------------------------------------------------------------------------
void pntcloud_polygons::scann_end()
{
}
//---------------------------------------------------------------------------
void pntcloud_polygons::cell_size(double sz)
{
int x,y;
if (sz<1e-6) sz=1e-6;
x=ceil((x1-x0)/sz);
y=ceil((y1-y0)/sz);
_alloc(x,y);
ml2g=sz; mg2l=1.0/sz;
}
//---------------------------------------------------------------------------
void pntcloud_polygons::holes_beg()
{
int i,j;
for (i=0;i<xs;i++)
for (j=0;j<ys;j++)
map[i][j]=0;
}
//---------------------------------------------------------------------------
void pntcloud_polygons::holes_pnt(double x,double y)
{
int i,j;
g2l(i,j,x,y);
if ((i>=0)&&(i<xs))
if ((j>=0)&&(j<ys))
if (map[i][j]<0x7FFFFFFF) map[i][j]++; // avoid overflow
}
//---------------------------------------------------------------------------
void pntcloud_polygons::holes_end()
{
int i,j,e,i0,i1;
List<int> ix; // hole lines start/stop indexes for speed up the polygonization
_line *a,*b,l;
_point *aa,*bb,p;
lin.num=0; lin_i0=0;// clear lines
ix.num=0; // clear indexes
// find pntcloud_polygons (map[i][j].cnt!=0) or (map[i][j].cnt>=treshold)
// and create lin[] list of H,V lines covering pntcloud_polygons
for (j=0;j<ys;j++) // search lines
for (i=0;i<xs;)
{
int i0,i1;
for (;i<xs;i++) if (map[i][j]!=0) break; i0=i-1; // find start of polygon
for (;i<xs;i++) if (map[i][j]==0) break; i1=i; // find end of polygon
if (i0< 0) continue; // skip bad circumstances (edges or no hole found)
if (i1>=xs) continue;
if (map[i0][j]!=0) continue;
if (map[i1][j]!=0) continue;
l.i0=i0+0.5;
l.i1=i1-0.5;
l.j0=j ;
l.j1=j ;
l.id=-1;
lin.add(l);
}
for (i=0;i<xs;i++) // search columns
for (j=0;j<ys;)
{
int j0,j1;
for (;j<ys;j++) if (map[i][j]!=0) break; j0=j-1; // find start of polygon
for (;j<ys;j++) if (map[i][j]==0) break; j1=j ; // find end of polygon
if (j0< 0) continue; // skip bad circumstances (edges or no hole found)
if (j1>=ys) continue;
if (map[i][j0]!=0) continue;
if (map[i][j1]!=0) continue;
l.i0=i ;
l.i1=i ;
l.j0=j0+0.5;
l.j1=j1-0.5;
l.id=-1;
lin.add(l);
}
// segmentate lin[] ... group lines of the same hole together by lin[].id
// segmentation based on vector lines data
// you can also segmentate the map[][] directly as bitmap during hole detection
for (i=0;i<lin.num;i++) lin[i].id=i; // all lines are separate
for (;;) // join what you can
{
for (e=0,a=lin.dat,i=0;i<lin.num;i++,a++)
{
for (b=a,j=i;j<lin.num;j++,b++)
if (a->id!=b->id)
{
// if a,b not intersecting or neighbouring
if (a->i0>b->i1) continue;
if (b->i0>a->i1) continue;
if (a->j0>b->j1) continue;
if (b->j0>a->j1) continue;
// if they do mark e for join groups
e=1; break;
}
if (e) break; // join found ... stop searching
}
if (!e) break; // no join found ... stop segmentation
i0=a->id; // join ids ... rename i1 to i0
i1=b->id;
for (a=lin.dat,i=0;i<lin.num;i++,a++)
if (a->id==i1)
a->id=i0;
}
// sort lin[] by id
for (e=1;e;) for (e=0,a=&lin[0],b=&lin[1],i=1;i<lin.num;i++,a++,b++)
if (a->id>b->id) { l=*a; *a=*b; *b=l; e=1; }
// re id lin[] and prepare start/stop indexes
for (i0=-1,i1=-1,a=&lin[0],i=0;i<lin.num;i++,a++)
if (a->id==i1) a->id=i0;
else { i0++; i1=a->id; a->id=i0; ix.add(i); }
ix.add(lin.num);
// polygonize
lin_i0=lin.num;
for (j=1;j<ix.num;j++) // process hole
{
i0=ix[j-1]; i1=ix[j];
// create border pnt[] list (unique points only)
pnt.num=0; p.used=0; p.p0=-1; p.p1=-1;
for (a=&lin[i0],i=i0;i<i1;i++,a++)
{
p.i=a->i0;
p.j=a->j0;
map[p.i][p.j]=0;
for (aa=&pnt[0],e=0;e<pnt.num;e++,aa++)
if ((aa->i==p.i)&&(aa->j==p.j)) { e=-1; break; }
if (e>=0) pnt.add(p);
p.i=a->i1;
p.j=a->j1;
map[p.i][p.j]=0;
for (aa=&pnt[0],e=0;e<pnt.num;e++,aa++)
if ((aa->i==p.i)&&(aa->j==p.j)) { e=-1; break; }
if (e>=0) pnt.add(p);
}
// mark not border points
for (aa=&pnt[0],i=0;i<pnt.num;i++,aa++)
if (!aa->used) // ignore marked points
if ((aa->i>0)&&(aa->i<xs-1)) // ignore map[][] border points
if ((aa->j>0)&&(aa->j<ys-1))
{ // ignore if any non hole cell around
if (map[aa->i-1][aa->j-1]>0) continue;
if (map[aa->i-1][aa->j ]>0) continue;
if (map[aa->i-1][aa->j+1]>0) continue;
if (map[aa->i ][aa->j-1]>0) continue;
if (map[aa->i ][aa->j+1]>0) continue;
if (map[aa->i+1][aa->j-1]>0) continue;
if (map[aa->i+1][aa->j ]>0) continue;
if (map[aa->i+1][aa->j+1]>0) continue;
aa->used=1;
}
// delete marked points
for (aa=&pnt[0],e=0,i=0;i<pnt.num;i++,aa++)
if (!aa->used) { pnt[e]=*aa; e++; } pnt.num=e;
// connect neighbouring points distance=1
for (i0= 0,aa=&pnt[i0];i0<pnt.num;i0++,aa++)
if (aa->used<2)
for (i1=i0+1,bb=&pnt[i1];i1<pnt.num;i1++,bb++)
if (bb->used<2)
{
i=aa->i-bb->i; if (i<0) i=-i; e =i;
i=aa->j-bb->j; if (i<0) i=-i; e+=i;
if (e!=1) continue;
aa->used++; if (aa->p0<0) aa->p0=i1; else aa->p1=i1;
bb->used++; if (bb->p0<0) bb->p0=i0; else bb->p1=i0;
}
// try to connect neighbouring points distance=sqrt(2)
for (i0= 0,aa=&pnt[i0];i0<pnt.num;i0++,aa++)
if (aa->used<2)
for (i1=i0+1,bb=&pnt[i1];i1<pnt.num;i1++,bb++)
if (bb->used<2)
if ((aa->p0!=i1)&&(aa->p1!=i1))
if ((bb->p0!=i0)&&(bb->p1!=i0))
{
if ((aa->used)&&(aa->p0==bb->p0)) continue; // avoid small closed loops
i=aa->i-bb->i; if (i<0) i=-i; e =i*i;
i=aa->j-bb->j; if (i<0) i=-i; e+=i*i;
if (e!=2) continue;
aa->used++; if (aa->p0<0) aa->p0=i1; else aa->p1=i1;
bb->used++; if (bb->p0<0) bb->p0=i0; else bb->p1=i0;
}
// try to connect to closest point
int ii,dd;
for (i0= 0,aa=&pnt[i0];i0<pnt.num;i0++,aa++)
if (aa->used<2)
{
for (ii=-1,i1=i0+1,bb=&pnt[i1];i1<pnt.num;i1++,bb++)
if (bb->used<2)
if ((aa->p0!=i1)&&(aa->p1!=i1))
if ((bb->p0!=i0)&&(bb->p1!=i0))
{
i=aa->i-bb->i; if (i<0) i=-i; e =i*i;
i=aa->j-bb->j; if (i<0) i=-i; e+=i*i;
if ((ii<0)||(e<dd)) { ii=i1; dd=e; }
}
if (ii<0) continue;
i1=ii; bb=&pnt[i1];
aa->used++; if (aa->p0<0) aa->p0=i1; else aa->p1=i1;
bb->used++; if (bb->p0<0) bb->p0=i0; else bb->p1=i0;
}
// add connected points to lin[] ... this is hole perimeter !!!
// lines are 2 x duplicated so some additional code to sort the order of lines will be a good idea
l.id=lin[ix[j-1]].id;
// add index of points instead points
int lin_i1=lin.num;
for (i0=0,aa=&pnt[i0];i0<pnt.num;i0++,aa++)
{
l.i0=i0;
if (aa->p0>i0) { l.i1=aa->p0; lin.add(l); }
if (aa->p1>i0) { l.i1=aa->p1; lin.add(l); }
}
// reorder perimeter lines
for (i0=lin_i1,a=&lin[i0];i0<lin.num-1;i0++,a++)
for (i1=i0+1 ,b=&lin[i1];i1<lin.num ;i1++,b++)
{
if (a->i1==b->i0) { a++; l=*a; *a=*b; *b=l; a--; break; }
if (a->i1==b->i1) { a++; l=*a; *a=*b; *b=l; i=a->i0; a->i0=a->i1; a->i1=i; a--; break; }
}
// convert point indexes to points
for (i0=lin_i1,a=&lin[i0];i0<lin.num;i0++,a++)
{
bb=&pnt[a->i0]; a->i0=bb->i; a->j0=bb->j;
bb=&pnt[a->i1]; a->i1=bb->i; a->j1=bb->j;
}
}
}
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
It is the same as the code in the holes link above,
just with the map[][] conditions inverted to search for polygons instead of holes,
and the found H/V lines are shrunk by half a cell from each side.
_line coordinates are float now, so there is no need for 4x multi-sampling.
The best results are with 2x multi-sampling (to avoid polygon width = 1),
so add each cell in your map as 2x2 points in the cell area.
I also added 2 corner points to better align map[][] and your image.
I have a question about boost::geometry::intersection performance in the Debug configuration. One part of my project does a lot (millions) of polygon-polygon intersections, and it is very, very slow in Debug compared to Release, so I have to wait a long time to debug problems that occur after this 'intersection' part. What can I do to speed it up in Debug mode?
Code example of simple Win32 console project in VS2010:
#include "stdafx.h"
#include <time.h>
#include <deque>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/foreach.hpp>
bool get_poly_intersection_area_S_2d( const double *poly_x_1, const int poly_n_1, const double *poly_x_2, const int poly_n_2, double *S)
{
// intersects two 2d polygons using boost::geometry library
// polygons are in 3d format [(x0, y0, 0.0) (x1, y1, 0.0) .... (xn, yn, 0.0) ]
// S is resulting area of intersection
// returns true if intersection exists (area > DBL_EPSILON) and false otherwise
typedef boost::geometry::model::d2::point_xy<double> bg_point;
typedef boost::geometry::model::polygon< bg_point, false, false > bg_polygon;
*S = 0.0;
bg_polygon bg_poly_1, bg_poly_2;
// init boost 2d polygons by our double 3D polygons
for(int i=0; i<poly_n_1; i++)
bg_poly_1.outer().push_back(bg_point(poly_x_1[i*3], poly_x_1[i*3+1]));
for(int i=0; i<poly_n_2; i++)
bg_poly_2.outer().push_back(bg_point(poly_x_2[i*3], poly_x_2[i*3+1]));
// correct polygons
boost::geometry::correct(bg_poly_1);
boost::geometry::correct(bg_poly_2);
// call intersection
std::deque<bg_polygon> output;
bool res = boost::geometry::intersection(bg_poly_1, bg_poly_2, output);
if(!res)
return false;
// for each polygon of intersection we add area
BOOST_FOREACH(bg_polygon const& p, output)
{
double s = boost::geometry::area(p);
*S += s;
}
// no intersection
if(fabs(*S) <= DBL_EPSILON)
return false;
// intersection > DBL_EPSILON
return true;
}
int _tmain(int argc, _TCHAR* argv[])
{
double p1[4 * 3] = {0,0,0, 2,0,0, 2,2,0, 0,2,0};
double p2[4 * 3] = {1,1,0, 3,1,0, 3,3,0, 1,3,0};
clock_t start_t, finish_t;
start_t = clock();
for(int i=0;i<5000; i++)
{
double s;
for(int k=0; k<4; k++)
{
p1[k*3] += i; // 3 doubles per vertex (x,y,z), so the stride is 3
p2[k*3] += i;
}
get_poly_intersection_area_S_2d(p1, 4, p2, 4, &s);
}
finish_t = clock();
printf("time=%.3f s\n", double(finish_t - start_t)/1000.);
return 0;
}
In Debug it takes 15 seconds, in Release 0.1 seconds. And that is only 5000 polygon intersections; for millions it will be much slower.
Boost.Geometry's speed is heavily influenced by the speed of the STL containers (vector, deque, map), which are considerably slower in Debug mode.
You might try to solve that, e.g. via these links:
Why does my STL code run so slowly when I have the debugger/IDE attached?
Why is this code 100 times slower in debug?
Especially the second link suggests setting _HAS_ITERATOR_DEBUGGING to 0, which might help a lot because iterators are heavily used inside Boost.Geometry.
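A sketch of what that looks like for VS2010 (the macros must be defined before any standard header is included, e.g. at the top of stdafx.h or via the project's preprocessor settings; _SECURE_SCL is the related checked-iterator switch):
// Disable checked/debug iterators in Debug builds (MSVC-specific).
// Must appear before any STL header is pulled in.
#define _HAS_ITERATOR_DEBUGGING 0
#define _SECURE_SCL 0
#include <vector>
#include <boost/geometry.hpp>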
Alternatively, you might try using STLport.
The best way is to work out a way to reproduce the problem you are debugging using a small dataset. This helps in MANY ways, especially when creating a regression test when you have solved the problem.
You write: "debug problems after this 'intersection' part "
This suggests to me that you should optionally save the output from the intersection part, so that you need to run it only once, and then have an option to load the saved partial result and run the rest of the program from there in Debug mode.
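A sketch of that idea, assuming the bg_polygon type from the question and using Boost.Geometry's WKT I/O for the serialization (the file name is arbitrary):
#include <deque>
#include <fstream>
#include <string>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
typedef boost::geometry::model::d2::point_xy<double> bg_point;
typedef boost::geometry::model::polygon<bg_point, false, false> bg_polygon;
// Save each intersection polygon as one WKT line.
void SavePolygons(const std::deque<bg_polygon> &polys, const char *path)
{
    std::ofstream out(path);
    for (std::deque<bg_polygon>::const_iterator it = polys.begin(); it != polys.end(); ++it)
        out << boost::geometry::wkt(*it) << "\n";
}
// Load the saved polygons so the rest of the program can be debugged
// without re-running the slow intersection stage.
std::deque<bg_polygon> LoadPolygons(const char *path)
{
    std::deque<bg_polygon> polys;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        bg_polygon p;
        boost::geometry::read_wkt(line, p);
        polys.push_back(p);
    }
    return polys;
}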
I need to compute the nullspace of several thousand small matrices (8x9, not 4x3 as I wrote previously) in parallel (CUDA). All references point to SVD, but the algorithm in Numerical Recipes seems very expensive and gives me lots of things other than the null space that I don't really need. Is Gaussian elimination really not an option? Are there any other commonly used methods?
To answer your question directly... yes! QR decomposition!
Let A be an m-by-n matrix with rank n. QR decomposition finds orthonormal m-by-m matrix Q and upper triangular m-by-n matrix R such that A = QR. If we define Q = [Q1 Q2], where Q1 is m-by-n and Q2 is m-by-(m-n), then the columns of Q2 form the null space of A^T.
QR decomposition is computed either by Gram-Schmidt, Givens rotations, or Householder reflections. They have different stability properties and operation counts.
You are right: SVD is expensive! I can't speak for what state-of-the-art stuff uses, but when I hear "compute null space" (EDIT: in a way that is simple for me to understand), I think QR.
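If it helps, here is a minimal sketch of the idea with Eigen (my choice of library, purely for illustration), applied to the asker's 8x9 case; the null space of A is obtained from a full QR of A^T:
#include <Eigen/Dense>
#include <iostream>
int main() {
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(8, 9); // rank 8 with probability 1
    // Full QR of A^T (9x8): Q is 9x9 orthogonal, R is 9x8 upper triangular.
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(A.transpose());
    Eigen::MatrixXd Q = qr.householderQ();
    // If A^T has full column rank 8, the last 9-8=1 column of Q spans
    // the null space of (A^T)^T = A.
    Eigen::VectorXd x = Q.col(8);
    std::cout << (A * x).norm() << "\n"; // ~0
}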
I don't think the above proposed method always gives the whole null space. To recap: "A = QR, where Q = [Q1 Q2], and Q1 is m-by-n and Q2 is m-by-(m-n). Then the columns of Q2 form the null space of A^T."
Indeed, this may give only a subspace of the null space. A simple counter-example is when A = 0, in which case the null space of A^T is the whole of R^m.
Therefore, it is necessary to check R too. Based on my experience with Matlab, if a row of R is straight 0, then the corresponding column in Q should also be a basis vector of the null space of A^T. Clearly this observation is heuristic and hinges on the particular algorithm used for QR decomposition.
Gaussian elimination is plenty fast for 4x3 matrices. IIRC I've done about 5 million per second with Java without parallelism. With such a small problem, your best bet is to code the routine (row reduce etc.) yourself; otherwise you'll waste most of the time putting the data into the right format for the external routine.
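For illustration, a minimal row-reduction sketch along those lines (my code, not the answer's Java; the 1e-12 pivot tolerance is an arbitrary choice), returning one null vector of a small rank-deficient m-by-n matrix:
#include <algorithm>
#include <cmath>
#include <vector>
// Reduce an m x n matrix (row-major) to reduced row echelon form with
// partial pivoting, then back-substitute with one free variable set to 1.
// Returns the zero vector if the matrix has full column rank.
std::vector<double> NullVector(std::vector<double> a, int m, int n) {
    std::vector<int> pivot_col(m, -1);
    int r = 0;
    for (int c = 0; c < n && r < m; c++) {
        int p = r; // partial pivoting: largest entry in column c
        for (int i = r + 1; i < m; i++)
            if (std::fabs(a[i * n + c]) > std::fabs(a[p * n + c])) p = i;
        if (std::fabs(a[p * n + c]) < 1e-12) continue; // free column
        for (int j = 0; j < n; j++) std::swap(a[r * n + j], a[p * n + j]);
        const double d = a[r * n + c];
        for (int j = 0; j < n; j++) a[r * n + j] /= d;
        for (int i = 0; i < m; i++) {
            if (i == r) continue;
            const double f = a[i * n + c];
            for (int j = 0; j < n; j++) a[i * n + j] -= f * a[r * n + j];
        }
        pivot_col[r++] = c;
    }
    // The first column with no pivot is free: set it to 1, read off the rest.
    std::vector<bool> is_pivot(n, false);
    for (int i = 0; i < r; i++) is_pivot[pivot_col[i]] = true;
    int free_col = 0;
    while (free_col < n && is_pivot[free_col]) free_col++;
    std::vector<double> x(n, 0.0);
    if (free_col == n) return x; // full column rank: only the zero vector
    x[free_col] = 1.0;
    for (int i = 0; i < r; i++) x[pivot_col[i]] = -a[i * n + free_col];
    return x;
}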
In the answers above, it has already been pointed out how the null space of a matrix can be calculated using the QR or the SVD approach. SVD should be preferred when accuracy is required; see also Null-space of a rectangular dense matrix.
As of February 2015, CUDA 7 (now in release candidate) makes SVD available through its new cuSOLVER library. Below I report an example of how to use cuSOLVER's SVD to calculate the null space of a matrix.
Be aware that the problem you are focusing on concerns the calculation of several small matrices, so you should adapt the example below by using streams so that it makes sense for your case. To associate a stream with each task you can use
cudaStreamCreate()
and
cusolverDnSetStream()
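A sketch of the association (error checking omitted):
#include <cuda_runtime.h>
#include <cusolverDn.h>
void AttachStream(cusolverDnHandle_t handle, cudaStream_t *stream)
{
    cudaStreamCreate(stream);             // one stream per concurrent task
    cusolverDnSetStream(handle, *stream); // cuSOLVER calls made through this
                                          // handle now enqueue into *stream
}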
kernel.cu
#include "cuda_runtime.h"
#include "device_launch_paraMeters.h"
#include<iostream>
#include<iomanip>
#include<stdlib.h>
#include<stdio.h>
#include<assert.h>
#include<math.h>
#include <cusolverDn.h>
#include <cuda_runtime_api.h>
#include "Utilities.cuh"
/********/
/* MAIN */
/********/
int main(){
// --- gesvd only supports Nrows >= Ncols
// --- column major memory ordering
const int Nrows = 7;
const int Ncols = 5;
// --- cuSOLVE input/output parameters/arrays
int work_size = 0;
int *devInfo; gpuErrchk(cudaMalloc(&devInfo, sizeof(int)));
// --- CUDA solver initialization
cusolverDnHandle_t solver_handle;
cusolverDnCreate(&solver_handle);
// --- Singular values threshold
double threshold = 1e-12;
// --- Setting the host, Nrows x Ncols matrix
double *h_A = (double *)malloc(Nrows * Ncols * sizeof(double));
for(int j = 0; j < Nrows; j++)
for(int i = 0; i < Ncols; i++)
h_A[j + i*Nrows] = (i + j*j) * sqrt((double)(i + j));
// --- Setting the device matrix and moving the host matrix to the device
double *d_A; gpuErrchk(cudaMalloc(&d_A, Nrows * Ncols * sizeof(double)));
gpuErrchk(cudaMemcpy(d_A, h_A, Nrows * Ncols * sizeof(double), cudaMemcpyHostToDevice));
// --- host side SVD results space
double *h_U = (double *)malloc(Nrows * Nrows * sizeof(double));
double *h_V = (double *)malloc(Ncols * Ncols * sizeof(double));
double *h_S = (double *)malloc(min(Nrows, Ncols) * sizeof(double));
// --- device side SVD workspace and matrices
double *d_U; gpuErrchk(cudaMalloc(&d_U, Nrows * Nrows * sizeof(double)));
double *d_V; gpuErrchk(cudaMalloc(&d_V, Ncols * Ncols * sizeof(double)));
double *d_S; gpuErrchk(cudaMalloc(&d_S, min(Nrows, Ncols) * sizeof(double)));
// --- CUDA SVD initialization
cusolveSafeCall(cusolverDnDgesvd_bufferSize(solver_handle, Nrows, Ncols, &work_size));
double *work; gpuErrchk(cudaMalloc(&work, work_size * sizeof(double)));
// --- CUDA SVD execution
cusolveSafeCall(cusolverDnDgesvd(solver_handle, 'A', 'A', Nrows, Ncols, d_A, Nrows, d_S, d_U, Nrows, d_V, Ncols, work, work_size, NULL, devInfo));
int devInfo_h = 0; gpuErrchk(cudaMemcpy(&devInfo_h, devInfo, sizeof(int), cudaMemcpyDeviceToHost));
if (devInfo_h != 0) std::cout << "Unsuccessful SVD execution\n\n";
// --- Moving the results from device to host
gpuErrchk(cudaMemcpy(h_S, d_S, min(Nrows, Ncols) * sizeof(double), cudaMemcpyDeviceToHost));
gpuErrchk(cudaMemcpy(h_U, d_U, Nrows * Nrows * sizeof(double), cudaMemcpyDeviceToHost));
gpuErrchk(cudaMemcpy(h_V, d_V, Ncols * Ncols * sizeof(double), cudaMemcpyDeviceToHost));
for(int i = 0; i < min(Nrows, Ncols); i++)
std::cout << "d_S["<<i<<"] = " << std::setprecision(15) << h_S[i] << std::endl;
printf("\n\n");
// --- Count the singular values above the threshold (checking the bound
// --- first avoids reading past the end of h_S)
int count = 0;
while ((count < min(Nrows, Ncols)) && (h_S[count] >= threshold)) count++;
printf("The null space of A has dimension %i\n\n", min(Ncols, Nrows) - count);
for(int j = count; j < Ncols; j++) {
printf("Basis vector nr. %i\n", j - count);
// --- gesvd returns V^T (not V), so basis vector j is row j of h_V
for(int i = 0; i < Ncols; i++)
std::cout << "v[" << i << "] = " << std::setprecision(15) << h_V[j + i*Ncols] << std::endl;
printf("\n");
}
cusolverDnDestroy(solver_handle);
return 0;
}
Utilities.cuh
#ifndef UTILITIES_CUH
#define UTILITIES_CUH
extern "C" int iDivUp(int, int);
extern "C" void gpuErrchk(cudaError_t);
extern "C" void cusolveSafeCall(cusolverStatus_t);
#endif
Utilities.cu
#include <stdio.h>
#include <assert.h>
#include "cuda_runtime.h"
#include <cuda.h>
#include <cusolverDn.h>
/*******************/
/* iDivUp FUNCTION */
/*******************/
extern "C" int iDivUp(int a, int b){ return ((a % b) != 0) ? (a / b + 1) : (a / b); }
/********************/
/* CUDA ERROR CHECK */
/********************/
// --- Credit to http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api
void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
if (code != cudaSuccess)
{
fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
if (abort) { exit(code); }
}
}
extern "C" void gpuErrchk(cudaError_t ans) { gpuAssert((ans), __FILE__, __LINE__); }
/**************************/
/* CUSOLVE ERROR CHECKING */
/**************************/
static const char *_cudaGetErrorEnum(cusolverStatus_t error)
{
switch (error)
{
case CUSOLVER_STATUS_SUCCESS:
return "CUSOLVER_SUCCESS";
case CUSOLVER_STATUS_NOT_INITIALIZED:
return "CUSOLVER_STATUS_NOT_INITIALIZED";
case CUSOLVER_STATUS_ALLOC_FAILED:
return "CUSOLVER_STATUS_ALLOC_FAILED";
case CUSOLVER_STATUS_INVALID_VALUE:
return "CUSOLVER_STATUS_INVALID_VALUE";
case CUSOLVER_STATUS_ARCH_MISMATCH:
return "CUSOLVER_STATUS_ARCH_MISMATCH";
case CUSOLVER_STATUS_EXECUTION_FAILED:
return "CUSOLVER_STATUS_EXECUTION_FAILED";
case CUSOLVER_STATUS_INTERNAL_ERROR:
return "CUSOLVER_STATUS_INTERNAL_ERROR";
case CUSOLVER_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
return "CUSOLVER_STATUS_MATRIX_TYPE_NOT_SUPPORTED";
}
return "<unknown>";
}
inline void __cusolveSafeCall(cusolverStatus_t err, const char *file, const int line)
{
if(CUSOLVER_STATUS_SUCCESS != err) {
fprintf(stderr, "CUSOLVE error in file '%s', line %d\nerror %d: %s\nterminating!\n", file, line, err, _cudaGetErrorEnum(err));
cudaDeviceReset(); assert(0);
}
}
extern "C" void cusolveSafeCall(cusolverStatus_t err) { __cusolveSafeCall(err, __FILE__, __LINE__); }
I think the most important thing for CUDA is to find an algorithm that doesn't depend on conditional branching (which is quite slow on graphics hardware). Simple if statements that can be optimized into conditional assignment are much better (or you can use the ?: operator).
If necessary, you should be able to do some form of pivoting using conditional assignment. It might actually be harder to determine how to store your result: if your matrix is rank-deficient, what do you want your CUDA program to do about it?
If you assume your 4x3 matrix is not actually rank-deficient, you can find your (single) null-space vector without any conditionals at all: the matrix is small enough that you can use Cramer's rule efficiently.
Actually, since you don't care about the scale of your null vector, you don't have to divide by the determinant -- you can just take the determinants of the minors:
x1 x2 x3
M = y1 y2 y3
z1 z2 z3
w1 w2 w3
|y1 y2 y3| |x1 x2 x3| |x1 x2 x3| |x1 x2 x3|
-> x0 = |z1 z2 z3| y0 = -|z1 z2 z3| z0 = |y1 y2 y3| w0 = -|y1 y2 y3|
|w1 w2 w3| |w1 w2 w3| |w1 w2 w3| |z1 z2 z3|
Note that these 3x3 determinants are just triple products; you can save computation by reusing the cross products.
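A sketch of that computation (my arrangement of the formulas above, with x, y, z, w the rows of M):
struct Vec3 { double a, b, c; };
// Triple product r1 . (r2 x r3) = determinant of the 3x3 matrix
// with rows r1, r2, r3.
static double Triple(const Vec3 &r1, const Vec3 &r2, const Vec3 &r3) {
    return r1.a * (r2.b * r3.c - r2.c * r3.b)
         - r1.b * (r2.a * r3.c - r2.c * r3.a)
         + r1.c * (r2.a * r3.b - r2.b * r3.a);
}
// Unnormalized null vector n (with n^T M = 0) of the 4x3 matrix M with
// rows x, y, z, w: the signed determinants of the 3x3 minors, as above.
void NullVector4x3(const Vec3 &x, const Vec3 &y, const Vec3 &z, const Vec3 &w,
                   double n[4]) {
    n[0] =  Triple(y, z, w);
    n[1] = -Triple(x, z, w);
    n[2] =  Triple(x, y, w);
    n[3] = -Triple(x, y, z);
}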
"seems very expensive" - what data do you have that supports this?
Maybe Block Lanczos is the answer you seek.
Or maybe this.
Both JAMA and Apache Commons Math have SVD implementations in Java. Why not take those and try them out? Get some real data for your case instead of impressions. It won't cost you much, since the code is already written and tested.
I wondered if the matrices are related rather than just being random, so that the null spaces you are seeking can be considered to be like 1-dimensional tangents to a curve in N-space (N = 9). If so, you may be able to speed things up by using Newton's method to solve successive instances of the system of quadratic equations Ax = 0, |x|^2 = 1, starting from a previous null space vector. Newton's method uses first derivatives to converge to a solution, and so would use Gaussian elimination to solve 9x9 systems. Using this technique would require that you be able to make small steps from matrix to matrix by, say, varying a parameter.
So the idea is that you initialize using SVD on the first matrix, but thereafter you step from matrix to matrix, using the null space vector of one as the starting point for the iteration on the next one, as sketched below. You need one or two iterations to get convergence. If you don't get convergence, you use SVD to restart.
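A sketch of one Newton step for this system (my illustration with Eigen, for the 8x9 case; the convergence test and the SVD restart are omitted):
#include <Eigen/Dense>
typedef Eigen::Matrix<double, 8, 9> Mat89;
typedef Eigen::Matrix<double, 9, 1> Vec9;
// One Newton step for F(x) = [A x; x^T x - 1] = 0, starting from the
// previous matrix's null vector x. The Jacobian J = [A; 2 x^T] is 9x9,
// so each step is one 9x9 Gaussian elimination (LU solve).
Vec9 NewtonStep(const Mat89 &A, const Vec9 &x) {
    Eigen::Matrix<double, 9, 9> J;
    J.topRows<8>() = A;
    J.row(8) = 2.0 * x.transpose();
    Vec9 F;
    F.head<8>() = A * x;
    F(8) = x.squaredNorm() - 1.0;
    return x - J.partialPivLu().solve(F);
}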
I used this a long time ago to map contours in the solutions of sets of 50 x 50 quadratic equations associated with the behavior of electric power systems.