I have a bitmap world map where each country is drawn in a unique color. Upon loading the map I have stored all border pixels in an array per country.
Next I calculate the distance between two countries (A and B). I do this by looping over every pixel in A's border array and calculating the distance between it and every pixel in B's border array. After finding the shortest distance I store it in a lookup table.
To optimize this I have:
Filtered out all immediate neighbors beforehand
When trying to find the shortest distance between two pixels I only compare the squared distance (only when I've found the closest one do I calculate the actual distance using square root).
When storing the distance I store it for both A->B and B->A, so B then only needs to calculate distances against C through Z, C only against D through Z, and so on.
With a large map this still takes quite a lot of time, so are there any other optimizations that I could do?
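For reference, a minimal sketch of the brute-force squared-distance comparison described above; the border lists and the lookup-table layout shown here are hypothetical:

```python
import math

def min_border_distance(border_a, border_b):
    """Shortest distance between two border-pixel lists of (x, y) tuples.

    Compares squared distances in the inner loop and takes a single
    square root at the end, as described in the question.
    """
    best_sq = float("inf")
    for ax, ay in border_a:
        for bx, by in border_b:
            d_sq = (ax - bx) ** 2 + (ay - by) ** 2
            if d_sq < best_sq:
                best_sq = d_sq
    return math.sqrt(best_sq)

# Symmetric lookup table: store the result once for A->B and once for B->A.
# distances[("A", "B")] = distances[("B", "A")] = min_border_distance(borders["A"], borders["B"])
```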
Store the border pixel data in a quadtree or another hierarchical structure that exploits the actual geometry (perhaps a triangular tree). Instead of calculating true distances for N*N/2 pixel pairs, you calculate ranges of min/max distances for log2(N)*log2(N)/2 pairs of areas containing the border pixels, ruling out large sets of impossible candidates, then refining at the next level.
In this example, sample A has 12 squares to be compared against 4 candidate squares of sample B, probably leading to 4*5 next-level candidates (all B squares and the 5 closest regions of A).
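Here is a minimal sketch of the pruning idea, using axis-aligned bounding boxes over chunks of border pixels instead of a full quadtree; how the borders are split into chunks (quadtree leaves, fixed-size slices of the arrays, ...) is left as an assumption:

```python
import math

def bounding_box(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a pixel chunk."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def min_box_dist_sq(b1, b2):
    """Lower bound on the squared distance between any point of box b1 and any point of box b2."""
    dx = max(b1[0] - b2[2], b2[0] - b1[2], 0)
    dy = max(b1[1] - b2[3], b2[1] - b1[3], 0)
    return dx * dx + dy * dy

def min_border_distance_pruned(chunks_a, chunks_b):
    """chunks_a, chunks_b: lists of pixel chunks for countries A and B."""
    # Sort chunk pairs by their lower-bound distance so the most promising pairs come first.
    pairs = sorted(
        ((min_box_dist_sq(bounding_box(ca), bounding_box(cb)), ca, cb)
         for ca in chunks_a for cb in chunks_b),
        key=lambda t: t[0],
    )
    best_sq = float("inf")
    for lower_bound, ca, cb in pairs:
        if lower_bound >= best_sq:
            break  # every remaining pair of chunks is at least this far apart
        for ax, ay in ca:
            for bx, by in cb:
                d_sq = (ax - bx) ** 2 + (ay - by) ** 2
                if d_sq < best_sq:
                    best_sq = d_sq
    return math.sqrt(best_sq)
```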
Consider calculating the distance not between every border pixel of A and B but, say, every 10th one. This will give you a rough solution; if the precision is not sufficient for your purpose, you can make it more accurate with more comparisons.
Another approach may be to introduce a new model for the border data structure: store it not as every point, but as a set of 'characteristic' points.
Related
In physics simulations (for example n-body systems) it is sometimes necessary to keep track, in some kind of index, of which particles (points in 3D space) are close enough to interact (within some cutoff distance d). However, particles can move around, so it is necessary to update the index, ideally on the fly without recomputing it entirely. Also, for efficiency in calculating interactions it is necessary to keep the list of interacting particles in the form of tiles: a tile is a fixed-size array (e.g. 32x32) where the rows and columns are particles, and almost every row particle is close enough to interact with almost every column particle (and the array keeps track of which ones actually do interact).
What algorithms may be used to do this?
Here is a more detailed description of the problem:
Initial construction: Given a list of points in 3D space (on the order of a few thousand to a few million, stored as an array of floats), produce a list of tiles of a fixed size (NxN), where each tile has two lists of points (N row points and N column points) and a boolean NxN array which describes whether the interaction between each row and column particle should be calculated, and for which:
a. every pair of points p1,p2 for which distance(p1,p2) < d is found in at least one tile and marked as being calculated (no missing interactions), and
b. if any pair of points is in more than one tile, it is only marked as being calculated in the boolean array in at most one tile (no duplicates),
and also the number of tiles is relatively small if possible (but this is less important than being able to update the tiles efficiently)
Update step: If the positions of the points change slightly (by much less than d), update the list of tiles in the fastest way possible so that they still meet the same conditions a and b (this step is repeated many times)
It is okay to keep any necessary data structures that help with this, for example the bounding boxes of each tile, or a spatial index like a quadtree. It is probably too slow to calculate all particle pairwise distances for every update step (and in any case we only care about particles which are close, so we can skip most possible pairs of distances just by sorting along a single dimension, for example). It is also probably too slow to keep a full (quadtree or similar) index of all particle positions. On the other hand, it is perfectly fine to construct the tiles on a regular grid of some kind. The density of particles per unit volume in 3D space is roughly constant, so the tiles can probably be built from (essentially) fixed-size bounding boxes.
To give an example of the typical scale/properties of this kind of problem, suppose there are 1 million particles, arranged as a random packing of spheres of diameter 1 unit into a cube of size roughly 100x100x100. Suppose the cutoff distance is 5 units, so typically each particle would be interacting with (2*5)**3 or ~1000 other particles. The tile size is 32x32. There are roughly 1e+9 interacting pairs of particles, so the minimum possible number of tiles is ~1e+6. Now assume that each time the positions change, the particles move a distance of around 0.0001 units in a random direction, but always in a way such that they stay at least 1 unit away from any other particle and the typical density of particles per unit volume stays the same. There would typically be many millions of position update steps like that. The number of newly created pairs of interactions per step due to the movement is (back of the envelope) (10**2 * 6 * 0.0001 / 10**3) * 1e+9 = 60000, so one update step can be handled in principle by marking 60000 pairs as non-interacting in their original tiles, and adding at most 60000 new tiles (mostly empty - one per pair of newly interacting particles). This would rapidly get to a point where most tiles are empty, so it is definitely necessary to combine/merge tiles somehow pretty often - but how to do it without a full rebuild of the tile list?
P.S. It is probably useful to describe how this differs from the typical spatial index (e.g. octrees) scenario: a. we only care about grouping close-by points together into tiles, not looking up which points are in an arbitrary bounding box or which points are closest to a query point - it is a bit closer to clustering than querying; b. the density of points in space is pretty constant; and c. the index has to be updated very often, but most moves are tiny.
Not sure my reasoning is sound, but here's an idea:
Divide your space into a regular grid of 3D cubes.
The cubes have a side length of d. Then do the following:
Assign all points to all cubes in which they're contained; this is fast since you can derive a point's cube from just its coordinates
Now check the following:
Mark all pairs of points in the top-left quarter of your cube as colliding; they're less than d apart. Further, every "quarter cube" in space is the top-left quarter of exactly one cube, so you won't check the same pair twice.
Check for collisions of type (p, q), where p is a point in the top-left quartile and q is a point not in the top-left quartile. In this way, you will again check the collision between any two points at most once, because every pair of quartiles is checked exactly once.
Since every pair of points is either in the same quartile or in neighbouring quartiles, it will be checked by the first or the second step. Further, since points are distributed approximately evenly, your runtime is much less than n^2 (n = number of points); the work per quartile is on the order of k^2 (k = number of points per quartile, which appears to be approximately constant).
In an update step, you only need to check:
whether a point crossed a boundary of a box, which should be fast since you can look at one coordinate at a time, and the box boundaries are simple multiples of d/2
check for collisions of the points as above
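A minimal sketch of the basic cell assignment and neighbour lookup with cubes of side d; it uses the plain 27-cell neighbourhood rather than the quarter-cube trick described above, and all names are illustrative:

```python
from collections import defaultdict
from math import floor

def cell_of(point, d):
    """Integer grid cell containing a 3D point, for cubes of side length d."""
    x, y, z = point
    return (floor(x / d), floor(y / d), floor(z / d))

def build_grid(points, d):
    """Map each cell key to the indices of the points it contains."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell_of(p, d)].append(i)
    return grid

def candidate_pairs(points, d):
    """Yield each unordered pair of point indices whose distance could be < d,
    i.e. points in the same cell or in one of the 26 neighbouring cells."""
    grid = build_grid(points, d)
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    other = grid.get((cx + dx, cy + dy, cz + dz))
                    if other is None:
                        continue
                    for i in members:
                        for j in other:
                            if i < j:   # report each unordered pair exactly once
                                yield i, j
```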
To create the tiles, divide the space into a second grid of (non-overlapping) cubes whose width is chosen such that the average number of interaction centers (midpoints between two particles that almost interact with each other) falling into a given cube is less than the width of your tiles (i.e. 32). Since each particle is expected to interact with 300-500 particles, this width will be much smaller than d.
Then, while checking for interactions in steps 1 & 2, assign particle interactions to these new cubes according to the coordinates of the center of their interaction. Assign one tile per cube, and mark the interacting particles assigned to that cube in the tile.
Further optimizations might be to consider the distance to a point's closest neighbour within a cube and to derive from that the minimum number of update steps needed before that point's collision status can change; then ignore that point for that many steps.
I suggest the following algorithm. E.g. we have a 1x1x1 cube and the cutoff distance is 0.001.
Let's choose three base anchor points: (0,0,0) (0,1,0) (1,0,0)
Associate an array of size 1000 (1 / 0.001) with each anchor point.
Add three fields to each regular point; we will store the distance between the given point and each anchor point in these fields.
At the same time, this distance is used as an index into the array of the corresponding anchor point, e.g. 0.4324 means index 432.
Each array cell stores the set of points whose distance falls into that bucket.
Recalculate the distance between a regular point and each anchor point every time the point is updated,
and move the point between the sets in the arrays during the update.
These structures give you an easy way to find all nearby points: take the intersection of three sets, where the sets are chosen based on the distance between the point and the anchor points.
In short, it is the intersection of three spherical shells. You may need to apply additional filtering to the result if you want to trim the corners of this intersection.
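A rough sketch of this bucketing, assuming a unit cube and a bucket width of 0.001 as in the example; the bucket count is enlarged to cover distances up to sqrt(3), which is an assumption on top of the original description:

```python
import math

ANCHORS = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
RESOLUTION = 0.001          # bucket width, as in the example
N_BUCKETS = 2000            # enough to cover sqrt(3) in a unit cube (assumption)

# One list of buckets per anchor; each bucket is a set of point ids.
buckets = [[set() for _ in range(N_BUCKETS)] for _ in ANCHORS]
point_buckets = {}          # point id -> its three bucket indices

def bucket_index(point, anchor):
    d = math.dist(point, anchor)
    return min(int(d / RESOLUTION), N_BUCKETS - 1)

def insert(pid, point):
    idx = tuple(bucket_index(point, a) for a in ANCHORS)
    for a, i in enumerate(idx):
        buckets[a][i].add(pid)
    point_buckets[pid] = idx

def update(pid, new_point):
    """Recompute the three distances and move the point between buckets."""
    old = point_buckets[pid]
    new = tuple(bucket_index(new_point, a) for a in ANCHORS)
    for a, (i_old, i_new) in enumerate(zip(old, new)):
        if i_old != i_new:
            buckets[a][i_old].discard(pid)
            buckets[a][i_new].add(pid)
    point_buckets[pid] = new

def nearby(point, cutoff):
    """Candidate neighbours: intersect the bucket ranges of the three spherical shells.
    The caller should still filter the result by true distance."""
    result = None
    for a, anchor in enumerate(ANCHORS):
        d = math.dist(point, anchor)
        lo = max(int((d - cutoff) / RESOLUTION), 0)
        hi = min(int((d + cutoff) / RESOLUTION), N_BUCKETS - 1)
        ids = set().union(*buckets[a][lo:hi + 1])
        result = ids if result is None else result & ids
    return result
```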
Consider using the Barnes-Hut algorithm or something similar. A simulation in 2D would use a quadtree data structure to store particles, and a 3D simulation would use an octree.
The benefit of using a tree structure is that it stores the particles in a way that lets nearby particles be found quickly by traversing the tree, while far-away particles lie on traversal paths that can be ignored.
Wikipedia has a good description of the algorithm:
The Barnes–Hut tree
In a three-dimensional n-body simulation, the Barnes–Hut algorithm recursively divides the n bodies into groups by storing them in an octree (or a quad-tree in a 2D simulation). Each node in this tree represents a region of the three-dimensional space. The topmost node represents the whole space, and its eight children represent the eight octants of the space. The space is recursively subdivided into octants until each subdivision contains 0 or 1 bodies (some regions do not have bodies in all of their octants). There are two types of nodes in the octree: internal and external nodes. An external node has no children and is either empty or represents a single body. Each internal node represents the group of bodies beneath it, and stores the center of mass and the total mass of all its children bodies.
demo
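For illustration, a minimal sketch of such an octree node (insertion only; the center-of-mass bookkeeping and the force traversal of Barnes-Hut are omitted, and bodies are assumed to be plain position tuples):

```python
class OctreeNode:
    """One cubic region of space; it subdivides once it holds more than one body.

    Internal Barnes-Hut nodes would also accumulate total mass and center of mass;
    that bookkeeping is omitted here. Coincident body positions are not handled.
    """

    def __init__(self, center, half_size):
        self.center = center        # (x, y, z) center of this cube
        self.half_size = half_size  # half the cube's side length
        self.body = None            # the single body of an external node
        self.children = None        # list of 8 children once the node is internal

    def insert(self, body):
        if self.children is None and self.body is None:
            self.body = body        # empty external node: just keep the body
            return
        if self.children is None:   # occupied external node: split and push down
            self._subdivide()
            existing, self.body = self.body, None
            self._child_for(existing).insert(existing)
        self._child_for(body).insert(body)

    def _subdivide(self):
        h = self.half_size / 2.0
        cx, cy, cz = self.center
        self.children = [
            OctreeNode((cx + sx * h, cy + sy * h, cz + sz * h), h)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
        ]

    def _child_for(self, body):
        x, y, z = body              # bodies are assumed to be (x, y, z) tuples
        cx, cy, cz = self.center
        # Children were generated with sx as the outermost loop, so +x selects bit 4, etc.
        return self.children[(x > cx) * 4 + (y > cy) * 2 + (z > cz)]
```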
Here is a problem that will turn your brain inside out; I've been trying to deal with it for quite some time already.
Suppose you have a sphere centered at the origin of a 3D space. The sphere is segmented into a grid of equidistant points. The procedure that forms the grid isn't that important, but what seems simple to me is to use a regular 3D computer-graphics sphere-generation procedure (the algorithm that forms the sphere is described in the picture below).
Now, after I have such a sphere (i.e. an icosahedron of some degree), I need a computationally trivial procedure capable of snapping (the angle of) a random unit vector to its closest point of the icosahedron grid. It is also acceptable if the vector is snapped to the center point of the triangle it intersects.
I would like to emphasise that it is important that the procedure be computationally trivial. This means that procedures which actually create a sphere in memory and then search among every triangle of the sphere are not a good idea, because such a search requires access to the global heap and RAM, which is slow, and I need to perform this procedure millions of times on low-end mobile hardware.
The procedure should yield its result through a set of mathematical equations based only on two values: the vector and the degree of the icosahedron (i.e. sphere).
Any thoughts? Thank you in advance!
============
Edit
One afterthought that just came to my mind: it seems that in the diagram below, step 3 (i.e. project each new vertex onto the unit sphere) is not important at all, because after bisection, projecting every vertex onto the sphere preserves all angular characteristics of the bisected shape we are trying to snap to. So the task simplifies to identifying the coordinates of the bisected sub-triangle that is penetrated by the vector.
Make a table with 20 entries of the top-level icosahedron face coordinates (for example, build them from the wiki coordinate set):
The vertices of an icosahedron centered at the origin with an edge length of 2 and a circumscribed sphere radius of 2 sin(2π/5) are described by circular permutations of:
V[] = (0, ±1, ±ϕ)
where ϕ = (1 + √5)/2 is the golden ratio (also written τ).
Then calculate the corresponding central vectors C[] (the sum of the three vertex vectors of every face).
Find the closest central vector as the one with the maximum dot product (DP) between your vector P and all of the C[]. It may be possible to reduce the number of checks by accounting for the components of P (for example, if the dot product of P and some V[i] is negative, there is no point considering faces that are neighbors of V[i]). I'm not sure this elimination takes less time than a direct full comparison of the DPs with the centers.
When the big triangle face is determined, project P onto the plane of that face and get the coordinates of P' in u-v (decompose AP' along AB and AC, where A, B, C are the face vertices).
Multiply u,v by 2^N (degree of subdivision).
u' = u * 2^N
v' = v * 2^N
iu = Floor(u')
iv = Floor(v')
fu = Frac(u')
fv = Frac(v')
The integer part of u' is the "row" of the small triangle, and the integer part of v' is the "column". The fractional parts are trilinear coordinates inside the small triangle face, so we can choose the smallest value among fu, fv, 1-fu-fv to get the closest vertex. Calculate this closest vertex and normalize the vector if needed.
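A rough sketch of the whole procedure under the steps above; the 20 faces are derived by brute force from the vertex set, and the last step simply picks the nearest of the four surrounding lattice points by dot product instead of the fu/fv rule, glossing over vectors that land near a face boundary:

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2

# 12 icosahedron vertices: the circular permutations of (0, +-1, +-PHI).
VERTS = []
for a, b in itertools.product((-1.0, 1.0), (-PHI, PHI)):
    VERTS += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]

def _dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

# 20 faces: triples of mutually adjacent vertices (the edge length is 2 for this vertex set).
FACES = [f for f in itertools.combinations(range(12), 3)
         if all(abs(math.dist(VERTS[i], VERTS[j]) - 2.0) < 1e-9
                for i, j in itertools.combinations(f, 2))]

def snap(P, N):
    """Snap unit vector P to the nearest grid point of an icosahedron with subdivision degree N."""
    # Step 1: the face whose central vector has the largest dot product with P.
    centers = [tuple(sum(VERTS[i][k] for i in f) for k in range(3)) for f in FACES]
    face = FACES[max(range(len(FACES)), key=lambda i: _dot(P, centers[i]))]
    A, B, C = (VERTS[i] for i in face)
    e1 = tuple(B[k] - A[k] for k in range(3))   # AB
    e2 = tuple(C[k] - A[k] for k in range(3))   # AC
    # Step 2: project P onto the face plane, then decompose (P' - A) = u*AB + v*AC.
    n = tuple(e1[(k + 1) % 3] * e2[(k + 2) % 3] - e1[(k + 2) % 3] * e2[(k + 1) % 3]
              for k in range(3))
    t = _dot(tuple(P[k] - A[k] for k in range(3)), n) / _dot(n, n)
    w = tuple(P[k] - t * n[k] - A[k] for k in range(3))
    a11, a12, a22 = _dot(e1, e1), _dot(e1, e2), _dot(e2, e2)
    b1, b2 = _dot(w, e1), _dot(w, e2)
    det = a11 * a22 - a12 * a12
    u = (b1 * a22 - b2 * a12) / det
    v = (a11 * b2 - a12 * b1) / det
    # Step 3: scale by 2^N, then return the closest of the four surrounding lattice
    # points, mapped back to 3D and normalized onto the unit sphere.
    s = 2 ** N
    iu, iv = math.floor(u * s), math.floor(v * s)
    best = None
    for di, dj in ((0, 0), (1, 0), (0, 1), (1, 1)):
        i, j = iu + di, iv + dj
        q = tuple(A[k] + (i / s) * e1[k] + (j / s) * e2[k] for k in range(3))
        norm = math.sqrt(_dot(q, q))
        q = tuple(qk / norm for qk in q)
        if best is None or _dot(P, q) > _dot(P, best):
            best = q
    return best
```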
It's not equidistant; you can see this if you study this version:
It's a problem of geodesic dome frequency, and some people have spent time researching all known methods for that geometry: http://geo-dome.co.uk/article.asp?uname=domefreq - see, that guy is a self-labelled geodesizer :)
One page told me that the progression goes like this: 2 + 10·4^N (12, 42, 162, ...).
You can simplify it down to a simple flat fractal triangle, where every triangle divides into 4 smaller triangles, and each subdivision is rotated 12 times around the sphere.
Logically, it is only one triangle rotated 12 times, and if you solve the code for that one side, then you have the lowest-computation version of the geodesic sphere.
If you don't want to keep the 12 sides as a series of arrays and you want a lower-memory version, then you can read about midpoint subdivision code; there are a lot of versions of midpoint subdivision.
I may have completely missed something; it's just that there isn't a truly equidistant geodesic dome, because a triangle doesn't map exactly onto a sphere - that only holds for the icosahedron itself.
I've been working for some time on an XNA roguelike game, and I can't get my head around the following problem: developing an algorithm to divide a matrix of non-binary values into the fewest rectangles grouping those values.
Example: given the following matrix
  01234567
0 ---##*##
1 ---##*##
2 --------
The algorithm should return:
3x3 rectangle of '-'s starting at (0,0)
2x2 rectangle of '#'s starting at (3, 0)
1x2 rectangle of '*'s starting at (5, 0)
2x2 rectangle of '#'s starting at (6, 0)
5x1 rectangle of '-'s starting at (3, 2)
Why am I doing this: I've got a pretty big dungeon with a size of approximately 500x500. If I were to individually call the "Draw" method for each tile's Sprite, my FPS would be far too low. It is possible to optimize this process by grouping similar-textured tiles and applying texture repetition to them, which would dramatically decrease the number of GPU draw calls. For example, if my map were the previous matrix, instead of calling draw 16 times, I'd call it only 5 times.
I've looked at some algorithms which can give you the biggest rectangle of a type inside a given binary matrix, but that doesn't fit my problem.
Thanks in advance!
You can use breadth first searches to separate each area of different tile type.
Picking a partitioning within the individual shapes is an NP-hard problem (see https://en.wikipedia.org/wiki/Graph_partition), so you can't find an efficient solution that guarantees the minimum number of rectangles. However if you don't mind an extra rectangle or two for each shape and your shapes are relatively small, you can come up with algorithms that split the shape into a number of rectangles close to the minimum.
An off-the-top-of-my-head guess for something that could potentially work would be to pick a tile with the maximum number of connecting tiles and start growing a rectangle from it, using a recursive algorithm to maximize the size. Remove the resulting rectangle from the shape, then repeat until there are no more tiles left outside a rectangle (see the sketch below). Again, this won't produce perfect results - there are shapes on which this will return more than the minimum number of rectangles - but it's an easy-to-implement ballpark solution. With a little more effort I'm sure you will be able to find better heuristics and get better results too.
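A minimal sketch of that greedy idea, with the simplification that the seed is just the first uncovered cell rather than the cell with the maximum number of connecting tiles:

```python
def cover_with_rectangles(grid):
    """grid: list of equal-length strings, one character per tile type.
    Returns a list of (row, col, height, width, type) rectangles covering the grid."""
    rows, cols = len(grid), len(grid[0])
    covered = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if covered[r][c]:
                continue
            t = grid[r][c]
            # Grow right while the type matches and the cells are uncovered.
            w = 1
            while c + w < cols and grid[r][c + w] == t and not covered[r][c + w]:
                w += 1
            # Grow down while a whole row of width w still matches.
            h = 1
            while r + h < rows and all(
                grid[r + h][c + k] == t and not covered[r + h][c + k] for k in range(w)
            ):
                h += 1
            for i in range(h):
                for j in range(w):
                    covered[r + i][c + j] = True
            rects.append((r, c, h, w, t))
    return rects

# The example matrix from the question yields 5 rectangles.
print(cover_with_rectangles(["---##*##", "---##*##", "--------"]))
```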
One possible building block is a routine to check, given two points, whether the rectangle formed by using those points as opposite corners is all of the same type. I think a fast (but unreliable) way of testing this can be based on mapping each type to a large random number and then working out the sum of the numbers within a rectangle modulo a large prime. Take one of the numbers within the rectangle: if the sum of the numbers within the rectangle equals the size of the rectangle times that one sampled number, assume that all of the numbers in the rectangle are the same.
In one dimension we can work out all of the cumulative sums a, a+b, a+b+c, a+b+c+d,... in time O(N) and then, for any two points, work out the sum for the interval between them by subtracting cumulative sums: b+c+d = a+b+c+d - a. In two dimensions, we can use cumulative sums to work out, for each point, the sum of all of the numbers from positions which have x and y co-ordinates no greater than the (x, y) coordinate of that position. For any rectangle we can work out the sum of the numbers within that rectangle by working out A-B-C+D where A,B,C,D are two-dimensional cumulative sums.
So with pre-processing O(N) we can work out a table which allows us to compute the sum of the numbers within a rectangle specified by its opposite corners in time O(1). Unless we are very unlucky, checking this sum against the size of the rectangle times a number extracted from within the rectangle will tell us whether the rectangle is all of the same type.
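A minimal sketch of the random-code hashing and the two-dimensional cumulative sums (the helper names are made up):

```python
import random

MOD = (1 << 61) - 1  # a large prime

def build_prefix_sums(grid):
    """Map each tile type to a random number and build 2D cumulative sums modulo MOD."""
    rows, cols = len(grid), len(grid[0])
    codes = {t: random.randrange(MOD) for row in grid for t in row}
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            S[r + 1][c + 1] = (S[r][c + 1] + S[r + 1][c] - S[r][c] + codes[grid[r][c]]) % MOD
    return S, codes

def probably_uniform(grid, S, codes, r0, c0, r1, c1):
    """True if the rectangle with opposite corners (r0, c0) and (r1, c1), inclusive,
    is (very probably) made of a single tile type."""
    rect_sum = (S[r1 + 1][c1 + 1] - S[r0][c1 + 1] - S[r1 + 1][c0] + S[r0][c0]) % MOD  # A-B-C+D
    size = (r1 - r0 + 1) * (c1 - c0 + 1)
    return rect_sum == (size * codes[grid[r0][c0]]) % MOD

grid = ["---##*##", "---##*##", "--------"]
S, codes = build_prefix_sums(grid)
print(probably_uniform(grid, S, codes, 0, 0, 2, 2))  # True: the 3x3 block of '-'
print(probably_uniform(grid, S, codes, 0, 2, 1, 4))  # False: mixes '-' and '#'
```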
Based on this, repeatedly start with a random point not covered. Take a point just to its left and move that point left as long as the interval between the two points is of the same type. Then move that point up as long as the rectangle formed by the two points is of the same type. Now move the first point to the right and down as long as the rectangle formed by the two points is of the same type. Now you think you have a large rectangle covering the original point. Check it. In the unlikely event that it is not all of the same type, add that rectangle to a list of "fooled me" rectangles you can check against in future and try again. If it is all of the same type, count that as one extracted rectangle and mark all of the points in it as covered. Continue until all points are covered.
This is a greedy algorithm that makes no attempt at producing the optimal solution, but it should be reasonably fast - the most expensive part is checking that the rectangle really is all of the same type, and - assuming you pass that test - each cell checked is also a cell covered so the total cost for the whole process should be O(N) where N is the number of cells of input data.
I have a plane mesh with divisions, and I want to specify the coordinates at which each of the corners should be positioned. Moving and updating the mesh vertices achieves what I'm trying to do, as long as the plane has no internal segments. If internal segments are added, then I have more vertices than I can manually place, so these need to automatically fall in line with the transformation of the outer edges.
My initial thought was that I could create a geometry with only four vertices, reposition them, and then increase the number of segments on my plane; apparently this isn't something that Three.js supports, so I'm looking for a workaround.
Any thoughts would be appreciated.
I don't think that this sort of transformation is expressible as a single matrix that you could then just apply to your plane mesh. I think you really do need to calculate the coordinates of each vertex of the subdivided plane manually.
There are different ways to do this calculation. Bilinear interpolation in this case seems to do the job. Here's how you do it. If you have four corner points A, B, C, D, then each internal point's position can be found as the weighted average of (the weighted average of A and B, and the weighted average of C and D). The weights for the averages come from the index of the subdivision vertex in one direction (say, X) for the inner averages and in the other direction (say, Y) for the outer average. Your indices run from 0 up to the number of subdivisions in that direction (inclusive), and the weight should be from 0 to 1, so weight = index / number of subdivisions.
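A minimal sketch of that interpolation; the corner ordering and function names are assumptions, and copying the resulting positions into the Three.js geometry is left out:

```python
def lerp(p, q, t):
    """Linear interpolation between two 3D points."""
    return tuple(pk + (qk - pk) * t for pk, qk in zip(p, q))

def subdivided_plane(A, B, C, D, nx, ny):
    """Vertex positions of a plane with nx by ny subdivisions whose corners are A, B, C, D.

    A--B is the top edge and C--D the bottom edge (corner order is an assumption).
    Returns (nx + 1) * (ny + 1) positions, row by row.
    """
    verts = []
    for j in range(ny + 1):
        v = j / ny                              # weight along Y
        for i in range(nx + 1):
            u = i / nx                          # weight along X
            top = lerp(A, B, u)                 # weighted average of A and B
            bottom = lerp(C, D, u)              # weighted average of C and D
            verts.append(lerp(top, bottom, v))  # weighted average of the two averages
    return verts

# The resulting positions can then be copied into the plane geometry's vertices in Three.js.
```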
I'm trying to design an implementation of Vector Quantization as a C++ template class that can handle different types and dimensions of vectors (e.g. 16-dimensional vectors of bytes, or 4D vectors of doubles, etc.).
I've been reading up on the algorithms, and I understand most of it:
here and here
I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane.
[edit to add more info]
This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each side of the plane, and continue until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely.
My questions are:
What is an algorithm to find the plane once I have the centroid?
How can I test a vector to see if it is on either side of that plane?
If you start with one centroid, then you'll have to split it, basically by doubling it and slightly moving the points apart in an arbitrary direction. The plane is just the plane orthogonal to that direction.
But you don't need to compute that plane.
More generally, the region (i) is defined as the set of points which are closer to the centroid c_i than to any other centroid. When you have two centroids, each region is a half space, thus separated by a (hyper)plane.
How do you test a vector x to see which side of the plane it is on? (That's with two centroids.)
Just compute the distances ||x-c1|| and ||x-c2||; the index of the minimum value (1 or 2) gives you the region the point x belongs to.
More generally, if you have n centroids, you compute all the distances ||x-c_i||, and the centroid that x is closest to (i.e., the one for which the distance is minimal) gives you the region x belongs to.
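A minimal sketch of the nearest-centroid test together with the LBG-style centroid split mentioned above; the perturbation size eps and the random split direction are arbitrary choices:

```python
import math
import random

def nearest_centroid(x, centroids):
    """Index of the centroid closest to vector x; this defines the region x belongs to."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def split_centroids(centroids, eps=1e-3):
    """LBG-style split: replace each centroid with two copies nudged apart along a random direction."""
    out = []
    for c in centroids:
        direction = [random.gauss(0, 1) for _ in c]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        offset = [eps * d / norm for d in direction]
        out.append([ci + oi for ci, oi in zip(c, offset)])
        out.append([ci - oi for ci, oi in zip(c, offset)])
    return out
```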
I don't quite understand the algorithm, but the second question is easy:
Let's call V a vector which extends from any point on the plane to the point-in-question. Then the point-in-question lies on the same side of the (hyper)plane as the normal N iff V·N > 0
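A minimal sketch of that test (the names are illustrative):

```python
def same_side_as_normal(point, plane_point, normal):
    """True if `point` lies on the same side of the plane as the normal N.

    V is the vector from a point on the plane to the point in question;
    the test is simply V . N > 0.
    """
    v = [p - q for p, q in zip(point, plane_point)]
    return sum(vi * ni for vi, ni in zip(v, normal)) > 0
```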