Need help merging some rectangles - algorithm

Hi, I have this mess on the left; it's pretty much an array of rectangles with some holes (marked in red). I'm looking for a way to combine them so that I end up with as few rectangles as possible, preferably with most of them as close to squares as possible. Look at the image on the right: that's the kind of thing I'm trying to accomplish, just a bit prettier and preferably a bit more automatic.
I need this for a game and it won't be done at runtime so speed isn't really a concern (unless it's extremely slow, because I have to do it on a fairly large area) but I've never had to do something like this before and I honestly have no idea where to even start.
I already tried brute-forcing my way through the array, starting from the top-left square and merging until there's nothing left to merge, but it really isn't that efficient since it can't consider merging into 3x2, 4x3, etc. rectangles.
If you can point me to any algorithms that can handle this sort of thing or have an idea of how this could be accomplished it would be much appreciated. Thanks!

You can try a greedy algorithm. Of course it won't be optimal (well, you didn't define the optimality criterion strictly). But maybe it will perform well enough for your needs.
So you can try:
Find a pair of rectangles that can be merged with maximum total area
Replace them with the new one - the result of merge operation
Repeat until you cannot find a suitable pair
If you also care for resulting rectangles being close to square you can try to maximize something like a * totalArea + (1 - a) * (min_resulting_side/max_resulting_side) with a suitable value for 0 < a < 1.
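The greedy loop above could be sketched roughly as follows. This is only an illustration under assumptions not stated in the answer: rectangles are `(x, y, w, h)` tuples on an integer grid, two rectangles are mergeable only when they share a full edge, and the area term is divided by the total area so that both terms of the score lie between 0 and 1. The names `try_merge`, `score` and `greedy_merge` are made up for this sketch.

```python
from itertools import combinations

def try_merge(r1, r2):
    """Return the merged rectangle if r1 and r2 share a full edge, else None."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = sorted([r1, r2])
    if y1 == y2 and h1 == h2 and x1 + w1 == x2:   # side by side
        return (x1, y1, w1 + w2, h1)
    if x1 == x2 and w1 == w2 and y1 + h1 == y2:   # stacked
        return (x1, y1, w1, h1 + h2)
    return None

def score(rect, total_area, a):
    """a * (normalized area) + (1 - a) * (short side / long side)."""
    _, _, w, h = rect
    return a * (w * h) / total_area + (1 - a) * min(w, h) / max(w, h)

def greedy_merge(rects, a=0.5):
    """Repeatedly merge the best-scoring mergeable pair until none is left."""
    rects = list(rects)
    total_area = sum(w * h for _, _, w, h in rects)  # assumed normalization
    while True:
        best = None
        for r1, r2 in combinations(rects, 2):
            merged = try_merge(r1, r2)
            if merged is not None:
                s = score(merged, total_area, a)
                if best is None or s > best[0]:
                    best = (s, r1, r2, merged)
        if best is None:
            return rects
        _, r1, r2, merged = best
        rects.remove(r1)
        rects.remove(r2)
        rects.append(merged)
```

Since this runs offline, the quadratic pair search per step is probably acceptable; for a large area it might be worth indexing rectangles by their edges instead of testing every pair.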


Procedural Maze Algorithm With Cells Determined Independently of Neighbors

I was thinking about maze algorithms recently (mostly because I'm working on a game, but I felt this is a more general question than game development related). In simple terms, I was wondering if there is a sort of maze algorithm that can generate (a possibly infinite number of) cells without any information specifically about the cell's neighbors. I imagine, if such a thing were possible, it would rely heavily upon noise functions such as Perlin or Simplex.
Each cell has four walls, these are used when actually rendering the maze so that corridors and walls are not the same thickness.
Let's say, for example, I'd like a cell at (32, 15) to generate its walls.
I know of algorithms like Eller's (which requires a limited number of columns, but allows infinite rows) and the Virtual Fractal Mazes algorithm (which needs to know previous cells in order to build upon them infinitely in both x and y directions).
Does anyone know of any algorithm I could look into for this specific request? If not, are there any algorithms that are good for chunk-based mazes that you know of?
(Note: I did search around for a bit through StackOverflow to see if there were any questions with similar requests to mine, but I did not come across any. If you happen to know of one, a link would be greatly appreciated :D)
Thank you in advance.
Seeeeeecreeeets. My preeeeciooouss secretts. But yeah I can understand the frustration so I'll throw this one to you OP/SO. Feel free to update the PCG Wiki if you're not as lazy as me :3
There are actually many ways to do this. Some of the best techniques for procgen are:
Asking what you really want.
Design backwards. Play in reverse. Result is forwards.
Look at a random sampling of your target goal and try to see overall patterns.
But to get back to the question, there are two simple ways and they both start from asking what you really want. I'll give those first.
The first is to create 2 layers. Both are random noise. You connect the top and the bottom so they're fully connected. No isolated portions. This is asking what you really want which is connected-ness. And then guaranteeing it in a local clean-up step. (tbh I forget the function applied to layer 2 that guarantees connected-ness. I can't find the code atm.... But I remember it was a really simple local function... XOR, Curl, or something similar. Maybe you can figure it out before I fix this).
The second way is using the properties of your functions. As long as your random function is smooth enough you can take the gradient and assign a tile to it. The way you assign the tiles changes the maze structure but you can guarantee connectivity by clever selection of tiles for each gradient (b/c similar or opposite gradients are more likely to be near each other on a smooth gradient function). From here your smooth random can be any form of Perlin Noise, etc. Once again, an asking-what-you-want technique.
For backwards-reversed you unfortunately have an NP problem (I'm not sure if it's hard, complete, or whatever; it's been a while since I've worked on this...). So while you could generate a random map of distances down a maze path and then generate the actual obstacles from there, it's not really advisable. There's also a ton of consideration on different cases even for small mazes...
Is simple. There's a column in the lower right corner of 0 and the middle 2 has an _| shaped wall.
This one makes less sense. You still are required to have the same basic walls as before on all the non-changed squares... But you can't have that 4. It needs to be within 1 of at least a single neighbor. (I mean you could have a +3 cost for that square by having something like a conveyor belt or something, but then we're out of the maze problem) Okay so....
Makes more sense but the 2 in the corner is nonsense once again. Flipping that from a trough to a peak would give.
Which makes sense. At any rate, if you can get to this point, then looking at local neighbors will give you the walls: if a neighbor's distance differs by exactly 1 (+/-1), there is no wall; otherwise there is a wall. Also note that you can break the rules for the distance map in a consistent way and make a maze just fine. (For example, instead of allowing a column, pick a wall and throw it up. This is just loop splitting at this point and should be safe.)
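As a rough sketch of that last rule only (not of how to generate the distance map), assuming the map is a list of rows of integers and assuming, as an extra choice of this sketch, that the outer border is always walled. The function name `walls_from_distance_map` is hypothetical.

```python
def walls_from_distance_map(dist):
    """For each cell, return the set of sides ("N", "S", "W", "E") that are walled.

    Rule from the text: no wall between neighbors whose distances differ by
    exactly 1, a wall otherwise. The map border is walled (an assumption).
    """
    rows, cols = len(dist), len(dist[0])
    steps = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}
    walls = {}
    for r in range(rows):
        for c in range(cols):
            cell = set()
            for side, (dr, dc) in steps.items():
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or abs(dist[r][c] - dist[nr][nc]) != 1:
                    cell.add(side)
            walls[(r, c)] = cell
    return walls
```

For a made-up 2x3 map `[[0, 1, 2], [5, 4, 3]]` this yields a single corridor that snakes from the top-left cell around to the bottom-left one.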
For random sampling as the final one that I'm going to look at... Certain maze generation algorithms in the limit take on some interesting properties either as an average configuration or after millions of steps. Some form Voronoi regions. Some form concentric circles with a randomly flipped wall to allow a connection between loops. Etc. The loop one is a good example to work off of. Create a set of loops. Flip a random wall on each loop. One will delete a wall which will create access to the next loop. One will split a path and offer a dead-end and a continuation. For a random flip to be a failure there has to be an opening and a split made right next to each other (unless you allow diagonals then we're good). So make loops. Generate random noise per loop. Xor together. Replace local failures with a fixed path if no diagonals are allowed.
So how do we get random noise per loop? Or how do we get better loops than just squares? Just take a random function. Separate divergence and now you have a loop map. If you have the differential equations for the source random function you can pick one random per loop. A simpler way might be to generate concentric circular walls and pick a random point at each radius to flip. Then distort the final result. You have to be careful your distortion doesn't violate any of your path-connected-ness conditions at that point though.

How to find neighboring solutions in simulated annealing?

I'm working on an optimization problem and attempting to use simulated annealing as a heuristic. My goal is to optimize placement of k objects given some cost function. Solutions take the form of a set of k ordered pairs representing points in an M*N grid. I'm not sure how to best find a neighboring solution given a current solution. I've considered shifting each point by 1 or 0 units in a random direction. What might be a good approach to finding a neighboring solution given a current set of points?
Since I'm also trying to learn more about SA, what makes a good neighbor-finding algorithm and how close to the current solution should the neighbor be? Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
I would split your question into several smaller ones:
Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
Usually, you pick multiple points from a neighborhood, and you can explore all of them. For example, you generate 10 points randomly and choose the best one. By doing so you can efficiently explore more possible solutions.
Why is it better than a random guess? Good solutions tend to have a lot in common (e.g. they are close to each other in a search space). So by introducing small incremental changes, you would be able to find a good solution, while random guess could send you to completely different part of a search space and you'll never find an appropriate solution. And because of the curse of dimensionality random jumps are not better than brute force - there will be too many places to jump.
What might be a good approach to finding a neighboring solution given a current set of points?
I regret to tell you that this question seems to be unsolvable in general. :( It's a mix between art and science. Choosing the right way to explore a search space is too problem-specific. Even for solving a placement problem under varying constraints, different heuristics may lead to completely different results.
You can try the following:
Random shifts by fixed amount of steps (1,2...). That's your approach
Swapping two points
You can memorize bad moves for some time (something similar to tabu search), so you will use only 'good' ones next 100 steps
Use a greedy approach to generate a suboptimal placement, then improve it with methods above.
Try random restarts. At some stage, drop all of your progress so far (except for the best solution found so far), raise the temperature and start again from a random initial point. You can do this every 10000 steps or something similar
Fix some points. Put an object at point (x,y) and do not move it at all, try searching for the best possible solution under this constraint.
Prohibit some combinations of objects, e.g. "distance between p1 and p2 must be larger than D".
Mix all steps above in different ways
Try to understand your problem down to the tiniest details. You can derive some useful information/constraints/insights from your problem description. Assume that you can't solve the placement problem in general, so try to reduce it to a more specific (== simpler, == with smaller search space) problem.
I would say that the last bullet is the most important. Look closely at your problem and consider only its practical aspects. For example, the size of your problem might allow you to enumerate something, or maybe some placements are not possible for you, and so on and so forth. There is no way for SA to derive such domain-specific knowledge by itself, so help it!
How do you know whether your heuristic is a good one? Only by practical evaluation. Prepare a decent set of tests with obvious/well-known answers and try different approaches. Use well-known benchmarks if any exist.
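To make the first move in the list concrete (a random shift by one step), here is a minimal sketch. It assumes the k points must stay distinct and inside the M*N grid, and it wraps the move in a plain annealing loop with geometric cooling. The function names, the cooling schedule and all parameter values are illustrative choices, not something prescribed above.

```python
import math
import random

def neighbor(points, m, n, rng=random):
    """Copy of `points` with one randomly chosen point shifted by one cell.

    If the shifted point would leave the grid or land on an occupied cell,
    the solution is returned unchanged (a "null move").
    """
    points = list(points)
    occupied = set(points)
    i = rng.randrange(len(points))
    x, y = points[i]
    dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    nx, ny = x + dx, y + dy
    if 0 <= nx < m and 0 <= ny < n and (nx, ny) not in occupied:
        points[i] = (nx, ny)
    return points

def anneal(points, cost, m, n, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Basic SA loop: always accept improvements, sometimes accept worse moves."""
    rng = random.Random(seed)
    current, current_cost = list(points), cost(points)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(current, m, n, rng)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # geometric cooling, one of many possible schedules
    return best, best_cost
```

Swapping in a different `neighbor` function (larger shifts, moving several points, and so on) is the easiest way to compare the heuristics listed above on your own test cases.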
I hope that this is helpful. :)

Simple k-nearest-neighbor algorithm for euclidean data with variable density?

An elaboration on this question, but with more constraints.
The idea is the same, to find a simple, fast algorithm for k-nearest-neighbors in 2 euclidean dimensions. The bucketing grid seems to work nicely if you can find a grid size that will suitably partition your data. However, what if the data is not uniformly distributed, but has areas with both very high and very low density (for example, the US population), so that no fixed grid size could guarantee both enough neighbors and efficiency? Can this method still be salvaged?
If not, other suggestions would be helpful, though I hope for answers less complex than moving to kd-trees, etc.
If you don't have too many elements, just compare each with all the others. This can be a lot faster than you'd think; today's machines are fast. Unfortunately, the square factor will catch you sooner or later; I figure a linear search of a million objects won't take too long, so you may be okay with up to 1000 elements. Using a grid, or even stripes, might boost that number substantially.
But I think you're stuck with a quadtree (a close relative of the k-d tree). Your whole map is one block, which can contain four subblocks (upper left, upper right, lower left, lower right). When a block fills up with more elements than you want to do a linear search on, break it into smaller ones and transfer the elements. (Only leaf nodes have elements.) It's easy to search within a given radius of a given point. Start at the top and, if a part of a block is within range of the point, check out its subblocks the same way if it has them. If it doesn't, check its elements.
(When searching for "closest", take care. The square grid means a nearer object might be in a farther block. You have to get everything within a given radius, then check 'em all. If you want the 10 closest and your radius of 20 only picked up 5, you need to try a larger radius. You may have a rejected item that proved to be 30 away and think you should grab it and a few others to make up your 10. However, there may be a few items at 25 away whose whole blocks were rejected, and you want them instead. There ought to be a better solution for this, but I haven't figured it out yet. I just make a guess at the radius and double it till I get enough.)
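The quadtree and the "guess a radius and double it" search described above might look roughly like this. It is a sketch under assumptions: points are `(x, y)` tuples that lie inside the map bounds, the query point is inside the map too, and `capacity`, the starting radius and the class and function names are arbitrary choices for illustration.

```python
import math

class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []      # only leaf nodes hold points
        self.children = None  # four sub-blocks once this block has been split

    def _child_for(self, p):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        x0, y0, x1, y1 = self.bounds
        # Split a full leaf, unless the block is already tiny (duplicate points).
        if len(self.points) > self.capacity and x1 - x0 > 1e-9:
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                             QuadTree(mx, y0, x1, my, self.capacity),
                             QuadTree(x0, my, mx, y1, self.capacity),
                             QuadTree(mx, my, x1, y1, self.capacity)]
            pts, self.points = self.points, []
            for q in pts:
                self._child_for(q).insert(q)

    def within(self, q, r, out):
        """Append to `out` every stored point within distance r of q."""
        x0, y0, x1, y1 = self.bounds
        # Distance from q to the closest point of this block (0 if q is inside).
        dx = max(x0 - q[0], 0, q[0] - x1)
        dy = max(y0 - q[1], 0, q[1] - y1)
        if dx * dx + dy * dy > r * r:
            return out  # the whole block is out of range
        if self.children is None:
            out.extend(p for p in self.points if math.dist(p, q) <= r)
        else:
            for child in self.children:
                child.within(q, r, out)
        return out

def k_nearest(tree, q, k, r=1.0):
    """Double the search radius until it contains at least k points."""
    x0, y0, x1, y1 = tree.bounds
    r_max = math.hypot(x1 - x0, y1 - y0)  # covers the map from any inside point
    while True:
        found = tree.within(q, r, [])
        if len(found) >= k or r >= r_max:
            return sorted(found, key=lambda p: math.dist(p, q))[:k]
        r *= 2
```

Because every point within the final radius is collected before sorting, the "nearer object in a farther block" problem mentioned above does not arise; the price is some wasted work when the radius overshoots.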
Quadtrees are fun. If you can set up your data and then access it, it's easy. The problems come when your mapped elements appear, disappear, and move while you are trying to figure out who's near what.
Have you looked at this?
kd-trees are quite simple to implement, and there are standard Java/C implementations.
You may want to post your question here:

Is there an efficient algorithm to generate random points in general position in the plane?

I need to generate n random points in general position in the plane, i.e. no three points can lie on a same line. Points should have coordinates that are integers and lie inside a fixed square m x m. What would be the best algorithm to solve such a problem?
Update: square is aligned with the axes.
Since they're integers within a square, treat them as points in a bitmap. When you add a point after the first, use Bresenham's algorithm to paint all pixels on each of the lines going through the new point and one of the old ones. When you need to add a new point, get a random location and check if it's clear; otherwise, try again. Since each pair of pixels gives a new line, and thus excludes up to m-2 other pixels, as the number of points grows you will have several random choices rejected before you find a good one. The advantage of the approach I'm suggesting is that you only pay the cost of going through all lines when you have a good choice, while rejecting a bad one is a very quick test.
(if you want to use a different definition of line, just replace Bresenham's with the appropriate algorithm)
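A minimal sketch of this bitmap idea follows. Note one swapped-in choice: instead of Bresenham's algorithm it marks only the exact lattice points on each line, by stepping along the direction reduced by its gcd, which matches the strict "no three collinear" definition. The function names and the `max_tries` cut-off are assumptions of the sketch.

```python
import random
from math import gcd

def mark_line(blocked, p, q, m):
    """Mark every lattice point of the m x m square on the line through p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    g = gcd(dx, dy)
    dx, dy = dx // g, dy // g            # smallest integer step along the line
    for sign in (1, -1):                 # walk from p in both directions
        x, y = p
        while 0 <= x < m and 0 <= y < m:
            blocked[y][x] = True
            x, y = x + sign * dx, y + sign * dy

def general_position_points(n, m, max_tries=100_000, seed=None):
    """Pick up to n integer points in [0, m) x [0, m), no three collinear."""
    rng = random.Random(seed)
    blocked = [[False] * m for _ in range(m)]
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        p = (rng.randrange(m), rng.randrange(m))
        if blocked[p[1]][p[0]]:
            continue                     # cheap rejection of a bad choice
        for q in points:                 # pay the line cost only for good choices
            mark_line(blocked, p, q, m)
        blocked[p[1]][p[0]] = True       # also prevents picking p twice
        points.append(p)
    return points
```

The function may return fewer than n points if it runs out of tries, which is bound to happen when n is too large for the square.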
Can't see any way around checking each point as you add it, either by (a) running through all of the possible lines it could be on, or (b) eliminating conflicting points as you go along to reduce the possible locations for the next point. Of the two, (b) seems like it could give you better performance.
Similar to #LaC's answer. If memory is not a problem, you could do it like this:
Add all points on the plane to a list (L).
Shuffle the list.
For each point (P) in the list,
For each point (Q) previously picked,
Remove from L every point that is collinear with P and Q.
Add P to the picked list.
You could continue the outer loop until you have enough points, or run out of them.
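A direct, unoptimized sketch of these steps, assuming integer points in an m x m square and using a cross-product test for collinearity. The names `collinear` and `pick_points` are hypothetical.

```python
import random

def collinear(p, q, r):
    """True if the three integer points lie on one line (zero cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

def pick_points(n, m, seed=None):
    """Shuffle all candidates, then pick points and prune what they rule out."""
    rng = random.Random(seed)
    candidates = [(x, y) for x in range(m) for y in range(m)]
    rng.shuffle(candidates)
    picked = []
    while candidates and len(picked) < n:
        p = candidates.pop()             # every surviving candidate is safe to pick
        for q in picked:
            # Drop every candidate on the line through p and q.
            candidates = [r for r in candidates if not collinear(p, q, r)]
        picked.append(p)
    return picked
```

No choice is ever rejected here, but each picked point triggers a scan of the whole remaining list for every earlier point, so this is only practical for fairly small squares.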
This might just work (though might be a little constrained on being random). Find the largest circle you can draw within the square (this seems very doable). Pick any n points on the circle, no three will ever be collinear :-).
This should be an easy enough task in code. Say the circle is centered at the origin (so something of the form x^2 + y^2 = r^2). Assuming r is fixed and x is randomly generated, you can solve to find the y coordinates. This gives you two points on the circle for every x, which are mirror images of each other across the x-axis. Hope this helps.
Edit: Oh, integer points, just noticed that. That's a pity. I'm going to keep this solution up though, since I like the idea.
Both #LaC's and #MizardX's solutions are very interesting, but you can combine them to get an even better solution.
The problem with #LaC's solution is that you get random choices rejected. The more points you have already generated, the harder it gets to generate new ones. If there is only one available position left, you have only a slight chance of randomly choosing it (1/m^2 for an m x m square).
With #MizardX's solution you never get rejected choices; however, if you directly implement the step that removes from L every point collinear with P and Q, you'll get worse complexity (O(n^5), where n is the side length of the square).
Instead it would be better to use a bitmap to find which points from L are to be removed. The bitmap would contain a value indicating whether a point is free to use and what is its location on the L list or a value indicating that this point is already crossed out. This way you get worst-case complexity of O(n^4) which is probably optimal.
I've just found that question: Generate Non-Degenerate Point Set in 2D - C++
It's very similar to this one. It would be good to use the solution from this answer: Generate Non-Degenerate Point Set in 2D - C++. By modifying it a bit to use radix or bucket sort, adding all the n^2 possible points to the P set initially and shuffling it, one can also get worst-case complexity of O(n^4) with much simpler code. Moreover, if space is a problem and #LaC's solution is not feasible due to space requirements, then this algorithm will just fit in without modifications and offer decent complexity.
Here is a paper that can maybe solve your problem:
Um, you don't specify which plane... but just generate 3 random numbers and assign them to x, y, and z.
If 'the plane' is arbitrary, then set z=0 every time or something...
Do a check on x and y to see if they are in your m boundary.
Compare the third x,y pair to see if it is on the same line as the first two... if it is, then regenerate the random values.

Retrieve set of rectangles containing a specified point

I can't figure out how to implement this in a performant way, so I decided to ask you guys.
I have a list of rectangles in a 2-dimensional space (actually atm only squares, but I might have to migrate to rectangles later, so let's stick to them and keep it a bit more general). Each rectangle is specified by two points. Rectangles can overlap, and I don't care all too much about setup time, because the rectangles are basically static and there's some room for precalculating any setup stuff (like building trees, sorting, precalculating additional vectors, whatever). Oh, I am developing in JavaScript, if this is of any concern.
To my actual question: given a point, how do I get a set of all rectangles that include that point?
Linear approaches do not perform well enough. So I look for something that performs better than O(n). I read some stuff, like on Bounding Volume Hierarchies and similar things, but whatever I tried the fact that rectangles can overlap (and I actually want to get all of them, if the point lies within multiple rectangles) seems to always get into my way.
Are there any suggestions? Have I missed something obvious? Are BVH even applicable to possibly overlapping bounds? If so, how do I build such a possibly overlapping tree? If not, what else could I use? It is of no concern to me if borders are inside, outside or not determined.
If someone could come up with anything helpful like a link or a rant on how stupid I am to use BVH and not Some_Super_Cool_Structure_Perfectly_Suited_For_My_Problem I'd really appreciate it!
Edit: Ok, I played around a bit with R-Trees and this is exactly what I was looking for. In fact I am currently using the RTree implementation as suggested by endy_c. It performs really well and fulfills my requirements entirely. Thanks a lot for your support guys!
You could look at R-Trees
Java code
there's also a wiki, but can only post one link ;-)
You can divide the space into a grid, and for each grid cell keep a list of the rectangles (or rectangle identifiers) that lie at least partially in that cell. For a query point, search only the rectangles listed in the cell containing the point. The complexity should be around O(sqrt(n)), though this depends on the cell size and on how the rectangles are distributed.
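A small sketch of the grid idea, written in Python for brevity even though the question mentions JavaScript. It assumes rectangles are `(x1, y1, x2, y2)` tuples with `x1 <= x2` and `y1 <= y2`, treats borders as inside, and uses a hypothetical `RectGrid` class with a caller-chosen `cell_size`.

```python
from collections import defaultdict

class RectGrid:
    def __init__(self, rects, cell_size):
        self.cell = cell_size
        self.rects = list(rects)
        self.buckets = defaultdict(list)  # (cell x, cell y) -> rectangle indices
        for i, (x1, y1, x2, y2) in enumerate(self.rects):
            # Register the rectangle in every cell it touches.
            for cx in range(int(x1 // cell_size), int(x2 // cell_size) + 1):
                for cy in range(int(y1 // cell_size), int(y2 // cell_size) + 1):
                    self.buckets[(cx, cy)].append(i)

    def query(self, x, y):
        """Return all rectangles containing (x, y), borders included."""
        key = (int(x // self.cell), int(y // self.cell))
        hits = []
        for i in self.buckets.get(key, []):
            x1, y1, x2, y2 = self.rects[i]
            if x1 <= x <= x2 and y1 <= y <= y2:
                hits.append(self.rects[i])
        return hits
```

Overlapping rectangles are not a problem here: each one is simply listed in every cell it touches, and a query returns all matches from the single cell containing the point.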
Another approach is to maintain four sorted arrays of x1, y1, x2, y2 values, and binary search your point within those 4 arrays. The result of each search is a set of rectangle candidates, and the final result is the intersection of those 4 sets. Depending on how set intersection is implemented, this can be more efficient than O(n), although each candidate set may itself contain O(n) rectangles in the worst case.
