Dungeon Keeper 2 Style Map, Vertex Compression - algorithm

So I'm working on a little project and thinking about making maps efficient. I have a grid of numbers, say
100110
011011
010110
If you've ever played Dungeon Keeper, the idea is that a 0 is a flat dug-out square, and a 1 is a still-standing square.
I want to take advantage of the grid layout and be able to minimise the number of vertices used. So instead of using individual cubes for an area like:
1111
1111
1111
I want to use just 8 vertices.
Any idea on the best approach to this? Or does anyone even just know the name of the type of algorithm I should use? Something that can do it quickly on the fly would be preferable, so as not to bottleneck rendering.

I agree that this is probably not gonna be a performance issue, but you could represent your map in a compressed form by using a (slightly modified) unbalanced quadtree.
Start with your map consisting only of 1's. You can store this as a box of size n*n in the root node of your tree.
If you want to dig out one of the boxes, you recursively walk down the tree, splitting the n*n box (or whatever you find there) using the default quadtree rules (= split an n*n box into four n/2*n/2 boxes, etc.). At some point you'll arrive at a leaf of the tree that contains only the single box (the one you want to dig out), and you may change it from 1 to 0.
Additionally, after the leaf has changed and your recursive calls return (= you walk back up the tree towards the root node), you can check neighboring boxes for whether they may be merged. (If you have two neighboring boxes that are both dug out, you can merge them).
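The split-on-the-way-down, merge-on-the-way-up idea can be sketched roughly as below. This is only an illustration under a few assumptions of mine: the map side is a power of two, the class name `Node` and its methods are made up for the example, and merging is done the plain quadtree way (four sibling leaves with the same value collapse into their parent) rather than by merging arbitrary pairs of neighbours.

```python
class Node:
    """Square region of a quadtree; a leaf holds one value (1 = solid, 0 = dug out)."""

    def __init__(self, x, y, size, value=1):
        self.x, self.y, self.size = x, y, size
        self.value = value      # meaningful only while the node is a leaf
        self.children = None    # None for a leaf, else a list of four sub-squares

    def set(self, px, py, value):
        """Set cell (px, py), splitting on the way down and merging on the way up."""
        if self.children is None:
            if self.value == value:
                return          # region is already uniform with this value
            if self.size == 1:
                self.value = value
                return
            half = self.size // 2
            # Child order: top-left, top-right, bottom-left, bottom-right.
            self.children = [
                Node(self.x + dx, self.y + dy, half, self.value)
                for dy in (0, half) for dx in (0, half)
            ]
        half = self.size // 2
        index = (2 if py >= self.y + half else 0) + (1 if px >= self.x + half else 0)
        self.children[index].set(px, py, value)
        # Merge step: four uniform leaves with equal values collapse into one.
        if all(c.children is None and c.value == value for c in self.children):
            self.children = None
            self.value = value

    def leaves(self):
        """Yield (x, y, size, value) for every leaf; each one is a single box to render."""
        if self.children is None:
            yield (self.x, self.y, self.size, self.value)
        else:
            for child in self.children:
                yield from child.leaves()
```

Each tuple from `leaves()` is one box, so a fully solid 4*4 map is a single box, and digging out one corner cell temporarily raises that to seven.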
Another technique that is sometimes used when indexing low-dimensional data like this is a space filling curve. One that has good average locality and is reversible is the Hilbert curve. Basically, you may enumerate your boxes (dug out ones and filled ones) along the space filling curve and then use simple run-length compression.
The tree idea allows you to reduce the amount of rendered geometry (you can rescale textures, etc. to emulate n*n boxes by a single larger box). The space-filling curve will probably only save you some memory.
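For the space-filling-curve variant, a minimal sketch might look like this. It assumes a square grid whose side is a power of two, stored as a list of rows; the function names are hypothetical. The index-to-coordinate conversion is the usual iterative Hilbert curve mapping, and the compression is plain run-length encoding of the cells in curve order.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n*n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate/flip the quadrant built so far
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def rle_along_hilbert(grid):
    """Run-length encode grid[y][x] in Hilbert order as a list of (value, run_length)."""
    n = len(grid)
    runs = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        value = grid[y][x]
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]
```

Because consecutive curve positions are always adjacent cells, large solid or dug-out areas tend to turn into long runs.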

Related

Testing grid passability

Consider this problem:
There's a square grid defined, each tile being either passable (1) or impassable (0).
At first, we have a simply connected space in the grid with an impassable border, like this:
We then start placing impassable obstacles of various dimensions (e.g. 1x1, 2x2,..) into the passable space. After each obstacle is placed, we need to test whether the remaining passable space is still connected (i.e. make sure we didn't split the passable space in two or more disconnected spaces). Tiles are connected diagonally, too.
The point is that after every obstacle placement, every remaining passable tile has a path that connects it to EVERY other remaining passable tile.
I'm aware of the possibility of searching for paths between possibly disconnected points, but I'm afraid that might be too inefficient. What I'm interested in is doing this testing as fast as possible.
Thanks for any help!
Implement a flood fill algorithm. As a side effect of performing the fill, count the number of squares filled. After placing your obstacles perform another flood fill starting from any open square and compare the number of filled squares to the original number minus the number of squares placed as obstacles. If they are not the same, you have disconnected regions.
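A rough sketch of that check, assuming the grid is a list of rows with 1 for passable and 0 for impassable, and that diagonal neighbours count as connected (as the question states). The function names are mine, and instead of tracking "original count minus obstacles placed" it simply recounts the passable tiles, which comes to the same comparison.

```python
from collections import deque


def count_reachable(grid, start):
    """Flood fill from start over passable (1) tiles, using 8-connectivity."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] == 1 and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return len(seen)


def still_connected(grid):
    """True if every passable tile can be reached from every other passable tile."""
    passable = [(r, c) for r, row in enumerate(grid)
                for c, v in enumerate(row) if v == 1]
    if not passable:
        return True
    # One fill from any open tile must reach all of them.
    return count_reachable(grid, passable[0]) == len(passable)
```

Calling `still_connected(grid)` after each obstacle placement costs one pass over the passable tiles.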
Wikipedia says that this can be done in amortized O(|V|) time using disjoint-set data structures, where V is the number of elements in the passable space (the second paragraph of that section). The citation is to this paper.
This is the same asymptotic complexity as Benjamin's answer and is presumably harder to implement, so I'd go with that. :)

Sparse (Pseudo) Infinite Grid Data Structure for Web Game

I'm considering trying to make a game that takes place on an essentially infinite grid.
The grid is very sparse. Certain small regions of relatively high density. Relatively few isolated nonempty cells.
The amount of the grid in use is too large to implement naively but probably smallish by "big data" standards (I'm not trying to map the Internet or anything like that)
This needs to be easy to persist.
Here are the operations I may want to perform (reasonably efficiently) on this grid:
Ask for some small rectangular region of cells and all their contents (a player's current neighborhood)
Set individual cells or blit small regions (the player is making a move)
Ask for the rough shape or outline/silhouette of some larger rectangular regions (a world map or region preview)
Find some regions with approximately a given density (player spawning location)
Approximate shortest path through gaps of at most some small constant empty spaces per hop (it's OK to be a bad approximation often, but not OK to keep heading the wrong direction searching)
Approximate convex hull for a region
Here's the catch: I want to do this in a web app. That is, I would prefer to use existing data storage (perhaps in the form of a relational database) and relatively little external dependency (preferably avoiding the need for a persistent process).
Guys, what advice can you give me on actually implementing this? How would you do this if the web-app restrictions weren't in place? How would you modify that if they were?
Thanks a lot, everyone!
I think you can do everything using quadtrees, as others have suggested, and maybe a few additional data structures. Here's a bit more detail:
Asking for cell contents, setting cell contents: these are the basic quadtree operations.
Rough shape/outline: Given a rectangle, go down sufficiently many steps within the quadtree that most cells are empty, and make the nonempty subcells at that level black, the others white.
Region with approximately given density: if the density you're looking for is high, then I would maintain a separate index of all objects in your map. Take a random object and check the density around that object in the quadtree. Most objects will be near high density areas, simply because high-density areas have many objects. If the density near the object you picked is not the one you were looking for, pick another one.
If you're looking for low-density, then just pick random locations on the map - given that it's a sparse map, that should typically give you low density spots. Again, if it doesn't work right try again.
Approximate shortest path: if this is a not-too-frequent operation, then create a rough graph of the area "between" the starting point A and end point B, for some suitable definition of between (maybe the square containing the circle with the midpoint of AB as center and 1.5*AB as diameter, except if that diameter is less than a certain minimum, in which case... experiment). Make the same type of grid that you would use for the rough shape / outline, then create (say) a Delaunay triangulation of the black points. Do a shortest path on this graph, then overlay that on the actual map and refine the path to one that makes sense given the actual map. You may have to redo this at a few different levels of refinement - start with a very rough graph, then "zoom in" taking two points that you got from the higher level as start and end point, and iterate.
If you need to do this very frequently, you'll want to maintain this type of graph for the entire map instead of reconstructing it every time. This could be expensive, though.
Approx convex hull: again start from something like the rough shape, then take the convex hull of the black points in that.
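To make the "rough shape/outline" step a bit more concrete, here is a small sketch. It does not build a quadtree; it assumes the occupied cells are available as an iterable of integer (x, y) pairs (a stand-in for the nonempty leaves a quadtree would report) and bins them into coarse blocks, which is what stopping at a fixed quadtree depth amounts to. The function name and the '#'/'.' output are just for illustration.

```python
def coarse_outline(cells, x0, y0, width, height, block):
    """Return rows of '#'/'.' marking which block*block squares contain any cell.

    Only cells inside the rectangle starting at (x0, y0) with the given
    width and height are considered.
    """
    cols = -(-width // block)       # ceiling division
    rows = -(-height // block)
    filled = set()
    for x, y in cells:
        if x0 <= x < x0 + width and y0 <= y < y0 + height:
            filled.add(((x - x0) // block, (y - y0) // block))
    return ["".join("#" if (cx, cy) in filled else "." for cx in range(cols))
            for cy in range(rows)]
```

The '#' positions are the "black points" that the approximate convex hull and the rough path graph would then be built from.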
I'm not sure if this would be easy to put into a relational database; a file-based storage could work but it would be impractical to have a write operation be concurrent with anything else, which you would probably want if you want to allow this to grow to a reasonable number of players (per world / map, if there are multiple worlds / maps). I think in that case you are probably best off keeping a separate process alive... and even then making this properly respect multithreading is going to be a headache.
A kd-tree or a quadtree is a good data structure for your problem. The latter especially is a clever way to address the grid and to reduce the 2D complexity to a 1D complexity. Quadtrees are also used in many map applications such as Bing Maps and Google Maps. Here is a good start: Nick's blog on quadtree spatial indexes and the Hilbert curve.

Randomly and efficiently filling space with shapes

What is the most efficient way to randomly fill a space with as many non-overlapping shapes as possible? In my specific case, I'm filling a circle with circles. I'm randomly placing circles until either a certain percentage of the outer circle is filled OR a certain number of placements have failed (i.e. were placed in a position that overlapped an existing circle). This is pretty slow, and often leaves empty spaces unless I allow a huge number of failures.
So, is there some other type of filling algorithm I can use to quickly fill as much space as possible, but still look random?
Issue you are running into
You are running into the Coupon collector's problem because you are using a technique of Rejection sampling.
You are also making strong assumptions about what a "random filling" is. Your algorithm will leave large gaps between circles; is this what you mean by "random"? Nevertheless it is a perfectly valid definition, and I approve of it.
Solution
To adapt your current "random filling" to avoid the rejection sampling coupon-collector's issue, merely divide the space you are filling into a grid. For example if your circles are of radius 1, divide the larger circle into a grid of 1/sqrt(2)-width blocks. When it becomes "impossible" to fill a gridbox, ignore that gridbox when you pick new points. Problem solved!
Possible dangers
You have to be careful how you code this however! Possible dangers:
If you do something like if (random point in invalid grid){ generateAnotherPoint() } then you ignore the benefit / core idea of this optimization.
If you do something like pickARandomValidGridbox() then you will slightly reduce the probability of making circles near the edge of the larger circle (though this may be fine if you're doing this for a graphics art project and not for a scientific or mathematical project); however if you make the grid size 1/sqrt(2) times the radius of the circle, you will not run into this problem because it will be impossible to draw blocks at the edge of the large circle, and thus you can ignore all gridboxes at the edge.
Implementation
Thus the generalization of your method to avoid the coupon-collector's problem is as follows:
Inputs: large circle coordinates/radius(R), small circle radius(r)
Output: set of coordinates of all the small circles
Algorithm:
    divide your LargeCircle into a grid of r/sqrt(2)
    ValidBoxes = {set of all gridboxes that lie entirely within LargeCircle}
    SmallCircles = {empty set}
    until ValidBoxes is empty:
        pick a random gridbox Box from ValidBoxes
        pick a random point inside Box to be center of small circle C
        check neighboring gridboxes for other circles which may overlap*
        if there is no overlap:
            add C to SmallCircles
            remove the box from ValidBoxes # possible because grid is small
        else if there is an overlap:
            increase the Box.failcount
            if Box.failcount > MAX_PERGRIDBOX_FAIL_COUNT:
                remove the box from ValidBoxes
    return SmallCircles
(*) This step is also an important optimization, which I can only assume you do not already have. Without it, your doesThisCircleOverlapAnother(...) function is incredibly inefficient at O(N) per query, which will make filling in circles nearly impossible for large ratios R>>r.
This is the exact generalization of your algorithm to avoid the slowness, while still retaining the elegant randomness of it.
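Here is one way the pseudocode above could look in Python. Treat it as a sketch with some choices of my own: the large circle is centred at the origin, a gridbox counts as valid only if all four corners are at least r inside the large circle (so every small circle stays fully inside), and the neighbour check looks 3 boxes in each direction because centres closer than 2r can be at most 3 boxes of width r/sqrt(2) apart. The names `fill_circle`, `max_fails` and `rng` are hypothetical.

```python
import math
import random


def fill_circle(R, r, max_fails=20, rng=random):
    """Randomly pack circles of radius r inside a circle of radius R centred at (0, 0)."""
    w = r / math.sqrt(2)            # box diagonal is r < 2r, so a box holds at most one centre
    n = int(math.ceil(2 * R / w))   # boxes per side of the bounding square
    reach = 3                       # ceil(2r / w): how far overlapping centres can be, in boxes

    def box_ok(i, j):
        # Valid when every corner is at least r inside the large circle.
        xs = (-R + i * w, -R + (i + 1) * w)
        ys = (-R + j * w, -R + (j + 1) * w)
        return all(math.hypot(x, y) <= R - r for x in xs for y in ys)

    valid = [(i, j) for i in range(n) for j in range(n) if box_ok(i, j)]
    fails = {}      # box index -> number of failed placements
    centres = {}    # box index -> centre of the circle placed in that box
    while valid:
        k = rng.randrange(len(valid))
        i, j = valid[k]
        x = -R + (i + rng.random()) * w
        y = -R + (j + rng.random()) * w
        # Only nearby boxes can hold a circle that overlaps this one.
        overlap = any(
            (i + di, j + dj) in centres
            and math.dist((x, y), centres[(i + di, j + dj)]) < 2 * r
            for di in range(-reach, reach + 1)
            for dj in range(-reach, reach + 1)
        )
        if not overlap:
            centres[(i, j)] = (x, y)
            remove = True
        else:
            fails[(i, j)] = fails.get((i, j), 0) + 1
            remove = fails[(i, j)] > max_fails
        if remove:
            valid[k] = valid[-1]    # swap-remove keeps the random pick O(1)
            valid.pop()
    return list(centres.values())
```

Every iteration either places a circle or bumps a box's fail count, so the loop ends after at most `max_fails + 1` attempts per box.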
Generalization to larger irregular features
edit: Since you've commented that this is for a game and you are interested in irregular shapes, you can generalize this as follows. For any small irregular shape, enclose it in a circle that represents how far you want it to be from things. Your grid can be the size of the smallest terrain feature. Larger features can encompass 1x2 or 2x2 or 3x2 or 3x3 etc. contiguous blocks. Note that many games with features that span large distances (mountains) and small distances (torches) often require grids which are recursively split (i.e. some blocks are split into further 2x2 or 2x2x2 subblocks), generating a tree structure. This structure with extensive bookkeeping will allow you to randomly place the contiguous blocks, however it requires a lot of coding. What you can do however is use the circle-grid algorithm to place the larger features first (when there's a lot of space to work with on the map and you can just check adjacent gridboxes for a collision without running into the coupon-collector's problem), then place the smaller features. If you can place your features in this order, this requires almost no extra coding besides checking neighboring gridboxes for collisions when you place a 1x2/3x3/etc. group.
One way to do this that produces interesting looking results is
    create an empty NxM grid
    create an empty has-open-neighbors set
    for i = 1 to NumberOfRegions
        pick a random point in the grid
        assign that grid point a (terrain) type
        add the point to the has-open-neighbors set
    while has-open-neighbors is not empty
        foreach point in has-open-neighbors
            get neighbor-points as the immediate neighbors of point
                that don't have an assigned terrain type in the grid
            if none
                remove point from has-open-neighbors
            else
                pick a random neighbor-point from neighbor-points
                assign its grid location the same (terrain) type as point
                add neighbor-point to the has-open-neighbors set
When done, has-open-neighbors will be empty and the grid will have been populated with at most NumberOfRegions regions (some regions with the same terrain type may be adjacent and so will combine to form a single region).
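The region-growing steps above could be written in Python roughly as follows. Two things here are my assumptions rather than part of the description: "immediate neighbors" is taken to mean the four edge-adjacent cells, and the has-open-neighbors set is rebuilt on each pass instead of being modified while it is iterated. Terrain types are just integers, and the function name is made up.

```python
import random


def grow_regions(width, height, num_regions, num_types, rng=random):
    """Fill a width*height grid with terrain types by growing random seed points."""
    grid = [[None] * width for _ in range(height)]
    frontier = []   # points that may still have unassigned neighbours
    for _ in range(num_regions):
        x, y = rng.randrange(width), rng.randrange(height)
        grid[y][x] = rng.randrange(num_types)
        frontier.append((x, y))
    while frontier:
        next_frontier = []
        for x, y in frontier:
            free = [(x + dx, y + dy)
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= x + dx < width and 0 <= y + dy < height
                    and grid[y + dy][x + dx] is None]
            if not free:
                continue                        # no open neighbours left: drop the point
            nx, ny = rng.choice(free)
            grid[ny][nx] = grid[y][x]           # neighbour inherits the point's type
            next_frontier.append((x, y))        # may still have other open neighbours
            next_frontier.append((nx, ny))
        frontier = next_frontier
    return grid
```

With at least one seed, every cell ends up assigned, since any assigned cell that still has a free neighbour stays in the frontier.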
Sample output using this algorithm with 30 points, 14 terrain types, and a 200x200 pixel world:
Edit: tried to clarify the algorithm.
How about using a 2-step process:
1. Choose a bunch of n points randomly -- these will become the centres of the circles.
2. Determine the radii of these circles so that they do not overlap.
For step 2, for each circle centre you need to know the distance to its nearest neighbour. (This can be computed for all points in O(n^2) time using brute force, although it may be that faster algorithms exist for points in the plane.) Then simply divide that distance by 2 to get a safe radius. (You can also shrink it further, either by a fixed amount or by an amount proportional to the radius, to ensure that no circles will be touching.)
To see that this works, consider any point p and its nearest neighbour q, which is some distance d from p. If p is also q's nearest neighbour, then both points will get circles with radius d/2, which will therefore be touching; OTOH, if q has a different nearest neighbour, it must be at distance d' < d, so the circle centred at q will be even smaller. So either way, the 2 circles will not overlap.
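A brute-force version of step 2, as a small sketch (the function name is mine, and it assumes at least two distinct points given as (x, y) tuples):

```python
import math


def safe_radii(points):
    """For each point, half the distance to its nearest neighbour (O(n^2) brute force)."""
    radii = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(nearest / 2)
    return radii
```

For the points (0, 0), (2, 0) and (10, 0) this gives radii 1, 1 and 4: the first two circles touch, and the third is well clear of both.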
My idea would be to start out with a compact grid layout. Then take each circle and perturb it in some random direction. The distance in which you perturb it can also be chosen at random (just make sure that the distance doesn't make it overlap another circle).
This is just an idea and I'm sure there are a number of ways you could modify it and improve upon it.

Space partitioning algorithm

I have a set of points which are contained within a rectangle. I'd like to split the rectangle into subrectangles based on point density (given a number of subrectangles or a desired density, whichever is easiest).
The partitioning doesn't have to be exact (almost any approximation better than a regular grid would do), but the algorithm has to cope with a large number of points - approx. 200 million. The desired number of subrectangles, however, is substantially lower (around 1000).
Does anyone know any algorithm which may help me with this particular task?
Just to understand the problem.
The following is crude and performs badly, but I want to know if the result is what you want.
Assumption: Number of rectangles is even
Assumption: Point distribution is markedly 2D (no big accumulation in one line)
Procedure:
Bisect n/2 times in either axis, looping from one end to the other of each previously determined rectangle counting "passed" points and storing the number of passed points at each iteration. Once counted, bisect the rectangle selecting by the points counted in each loop.
Is that what you want to achieve?
I think I'd start with the following, which is close to what @belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones, you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.
1. Split your initial rectangle in 2 with a line parallel to the short side of the rectangle and running exactly through the mid-point.
2. Count the number of points in both half-rectangles. If they are equal (enough) then go to step 4. Otherwise, go to step 3.
3. Based on the distribution of points between the half-rectangles, move the line to even things up again. So if, perchance, the first cut split the points 1/3, 2/3, move the line half-way into the heavy half of the rectangle. Go to step 2. (Be careful not to get trapped here, moving the line in ever decreasing steps first in one direction, then the other.)
4. Now, pass each of the half-rectangles in to a recursive call to this function, at step 1.
I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.
If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.
I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.
You're after a standard Kd-tree or binary space partitioning tree, I think. (You can look it up on Wikipedia.)
Since you have very many points, you may wish to only approximately partition the first few levels. In this case, you should take a random sample of your 200M points--maybe 200k of them--and split the full data set at the midpoint of the subsample (along whichever axis is longer). If you actually choose the points at random, the probability that you'll miss a huge cluster of points that need to be subdivided will be approximately zero.
Now you have two problems of about 100M points each. Divide each along the longer axis. Repeat until you stop taking subsamples and split along the whole data set. After ten breadth-first iterations you'll be done.
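A sketch of that sampled Kd-style split is below. I am reading "midpoint of the subsample" as its median along the chosen axis, and the recursion here is depth-first rather than breadth-first; both are my assumptions, as are the function name and parameters. Points are (x, y) tuples in a list, and `depth=10` would give the 1024 rectangles mentioned above.

```python
import random
import statistics


def kd_partition(points, bounds, depth, sample_size=1000, rng=random):
    """Split bounds = (xmin, ymin, xmax, ymax) into 2**depth rectangles.

    Returns a list of (rectangle, points_in_rectangle) pairs.
    """
    if depth == 0:
        return [(bounds, points)]
    xmin, ymin, xmax, ymax = bounds
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1   # cut across the longer side
    sample = points if len(points) <= sample_size else rng.sample(points, sample_size)
    if sample:
        cut = statistics.median(p[axis] for p in sample)
    else:
        cut = (bounds[axis] + bounds[axis + 2]) / 2     # empty cell: cut in the middle
    low = [p for p in points if p[axis] < cut]
    high = [p for p in points if p[axis] >= cut]
    if axis == 0:
        low_bounds, high_bounds = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
    else:
        low_bounds, high_bounds = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
    return (kd_partition(low, low_bounds, depth - 1, sample_size, rng)
            + kd_partition(high, high_bounds, depth - 1, sample_size, rng))
```

Only the choice of the cutting line uses the subsample; the full point list is still partitioned at every level, which is one linear pass per level.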
If you have a different problem--you must provide tick marks along the X and Y axis and fill in a grid along those as best you can, rather than having the irregular decomposition of a Kd-tree--take your subsample of points and find the 0/32, 1/32, ..., 32/32 percentiles along each axis. Draw your grid lines there, then fill the resulting 1024-element grid with your points.
R-tree
Good question.
I think the area you need to investigate is "computational geometry" and the "k-partitioning" problem. There's a link that might help get you started here
You might find that the problem itself is NP-hard which means a good approximation algorithm is the best you're going to get.
Would K-means clustering or a Voronoi diagram be a good fit for the problem you are trying to solve?
That looks like cluster analysis.
Would a QuadTree work?
A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. All forms of Quadtrees share some common features:
They decompose space into adaptable cells
Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits
The tree directory follows the spatial decomposition of the Quadtree

Spatial Index for Rectangles With Fast Insert

I'm looking for a data structure that provides indexing for Rectangles. I need the insert algorithm to be as fast as possible since the rectangles will be moving around the screen (think of dragging a rectangle with your mouse to a new position).
I've looked into R-Trees, R+Trees, kD-Trees, Quad-Trees and B-Trees, but from my understanding inserts are usually slow. I'd prefer to have inserts at sub-linear time complexity, so maybe someone can prove me wrong about any of the listed data structures.
I should be able to query the data structure for what rectangles are at point(x, y) or what rectangles intersect rectangle(x, y, width, height).
EDIT: The reason I want insert so fast is because if you think of a rectangle being moved around the screen, they're going to have to be removed and then re-inserted.
Thanks!
I'd use a multiscale grid approach (equivalent to quad-trees in some form).
I'm assuming you're using integer coordinates (i.e. pixels) and have plenty of space to hold all the pixels.
Have an array of lists of rectangles, one for each pixel. Then, bin two-by-two and do it again. And again, and again, and again, until you have one pixel that covers everything.
Now, the key is that you insert your rectangles at the level that is a good match for the size of the rectangle. This will be something like (pixel size) ~= min(height,width)/2. Now for each rectangle you have only a handful of inserts to do into the lists (you could bound it above by a constant, e.g. pick something that has between 4 and 16 pixels).
If you want to seek for all rectangles at x,y you look in the list of the smallest pixel, and then in the list of the 2x2 binned pixel that contains it, and then in the 4x4 etc.; you should have log2(# of pixels) steps to look through. (For larger pixels, you then have to check whether (x,y) was really in the rectangle; you expect about half of them to be successful on borders, and all of them to be successful inside the rectangle, so you'd expect no worse than 2x more work than if you looked up the pixel directly.)
Now, what about insert? That's very inexpensive--O(1) to stick yourself on the front of a list.
What about delete? That's more expensive; you have to look through and heal each list for each pixel you're entered in. That's approximately O(n) in the number of rectangles overlapping at that position in space and of approximately the same size. If you have really large numbers of rectangles, then you should use some other data structure to hold them (hash set, RB tree, etc.).
(Note that if your smallest rectangle must be larger than a pixel, you don't need to actually form the multiscale structure all the way to the pixel level; just go down until the smallest rectangle won't get hopelessly lost inside your binned pixel.)
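As a rough illustration of the multiscale grid, here is a sketch that assumes integer pixel coordinates. It stores only the occupied cells in a dictionary instead of a full array per level, and uses sets of rectangle ids rather than lists so that deletes are cheap; those are my substitutions, and the class and method names are hypothetical. A rectangle goes into the level whose cell size is roughly min(height, width)/2.

```python
from collections import defaultdict


class MultiscaleGrid:
    """Rectangles binned at the grid level whose cell size matches their own size."""

    def __init__(self, levels):
        self.levels = levels              # level k has cells of 2**k pixels
        self.cells = defaultdict(set)     # (level, cx, cy) -> rectangle ids
        self.rects = {}                   # id -> (x, y, w, h)

    def _keys(self, x, y, w, h):
        # Largest level with cell size <= min(w, h) / 2, clamped to the available levels.
        level = max(0, min(self.levels - 1, (min(w, h) // 2).bit_length() - 1))
        size = 1 << level
        for cy in range(y // size, (y + h - 1) // size + 1):
            for cx in range(x // size, (x + w - 1) // size + 1):
                yield (level, cx, cy)

    def insert(self, rid, x, y, w, h):
        self.rects[rid] = (x, y, w, h)
        for key in self._keys(x, y, w, h):
            self.cells[key].add(rid)

    def remove(self, rid):
        for key in self._keys(*self.rects.pop(rid)):
            self.cells[key].discard(rid)

    def at_point(self, px, py):
        """Ids of all rectangles containing pixel (px, py)."""
        found = []
        for level in range(self.levels):
            for rid in self.cells.get((level, px >> level, py >> level), ()):
                x, y, w, h = self.rects[rid]
                if x <= px < x + w and y <= py < y + h:   # coarse cells need an exact check
                    found.append(rid)
        return found
```

Moving a rectangle is a `remove` followed by an `insert`. Note that a very long, thin rectangle still touches many cells, so the "handful of inserts" only holds for roughly square ones.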
The data structures you mention are quite a mixed bag: in particular B-Trees should be fast (cost to insert grows with the logarithm of the number of items present) but won't speed up your intersection queries.
Ignoring that - and hoping for the best - the spatial data structures come in two parts. The first part tells you how to build a tree structure from the data. The second part tells you how to keep track of information at each node that describes the items stored below that node, and how to use it to speed up queries.
You can usually pinch the ideas about keeping track of information at each node without using the (expensive) ideas about exactly how the tree should be built. For instance, you could create a key for each rectangle by bit-interleaving the co-ordinates of its points and then use a perfectly ordinary tree structure (such as a B-tree or an AVL tree or a Red-Black tree) to store it, while still keeping information at each node. This might, in practice, speed up your queries enough - although you wouldn't be able to tell that until you implemented and tested it on real data. The purpose of the tree-building instructions in most schemes is to provide performance guarantees.
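Bit-interleaving itself is short; a minimal sketch, assuming non-negative integer coordinates that fit in `bits` bits (the function name is mine):

```python
def interleave(coords, bits=16):
    """Interleave the bits of several coordinates into one integer key.

    Bit i of coordinate j ends up at bit len(coords) * i + j of the key.
    """
    key = 0
    n = len(coords)
    for i in range(bits):
        for j, c in enumerate(coords):
            key |= ((c >> i) & 1) << (n * i + j)
    return key
```

For a rectangle you might pass all four corner coordinates, e.g. `interleave((x1, y1, x2, y2))`, and use the result as the key in whatever ordered tree you already have.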
Two postscripts:
1) I like Patricia trees for this - they are reasonably easy to implement, and adding or deleting entries does not disturb the tree structure much, so you won't have too much work to do updating information stored at nodes.
2) Last time I looked at a window system, it didn't bother about any of this clever stuff at all - it just kept a linear list of items and searched all the way through it when it needed to: that was fast enough.
This is perhaps an extended comment rather than an answer.
I'm a bit puzzled about what you really want. I could guess that you want a data structure to support quick answers to questions such as 'Given the ID of a rectangle, return its current coordinates'. Is that right?
Or do you want to answer 'what rectangle is at position (x,y)' ? In that case an array with dimensions matching the height and width of your display might suffice, with each element in the array being a (presumably short) list of the rectangles on that pixel.
But then you state that you need an insert algorithm to be as fast as possible to cope with rectangles moving constantly. If you had only, say, 10 rectangles on screen, you could simply have a 10-element array containing the coordinates of each of the rectangles. Updating their positions would not then require any inserts into the data structure.
How many rectangles? How quickly are they created? And destroyed? How do you want to cope with overlaps? Is a rectangle just a boundary, or does it include the interior?
