I have a set of N positive numbers, and a rectangle of dimensions X and Y that I need to partition into N smaller rectangles such that:
the surface area of each smaller rectangle is proportional to its corresponding number in the given set
all the space of the big rectangle is occupied, with no leftover space between the smaller rectangles
each small rectangle should be shaped as close to a square as feasible
the execution time should be reasonably small
I need directions on this. Do you know of such an algorithm described on the web? Do you have any ideas (pseudo-code is fine)?
Thanks.
What you describe sounds like a treemap:
Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension on the data.
That Wikipedia page links to a page by Ben Shneiderman, which gives a nice overview and links to Java implementations:
Then while puzzling about this in the faculty lounge, I had the Aha! experience of splitting the screen into rectangles in alternating horizontal and vertical directions as you traverse down the levels. This recursive algorithm seemed attractive, but it took me a few days to convince myself that it would always work and to write a six line algorithm.
Wikipedia also links to "Squarified Treemaps" by Mark Bruls, Kees Huizing and Jarke J. van Wijk (PDF), which presents one possible algorithm:
How can we tessellate a rectangle recursively into rectangles, such that their aspect ratios (e.g. max(height/width, width/height)) approach 1 as closely as possible? The number of all possible tessellations is very large. This problem falls in the category of NP-hard problems. However, for our application we do not need the optimal solution; a good solution that can be computed in a short time is all that is required.
You do not mention any recursion in the question, so your situation might be just one level of the treemap; but since the algorithms work on one level at a time, this should be no problem.
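For the single-level case, the squarified layout is short enough to sketch in Python. This is my own condensation of the paper's pseudocode, so treat it as an illustration rather than a faithful port; worst() scores the worst aspect ratio of a row of areas laid along a side of length side (areas are assumed positive, as in the question):

    def worst(row, side):
        # worst aspect ratio if the areas in `row` form one strip along `side`
        s = sum(row)
        return max(max(side * side * r / (s * s), s * s / (side * side * r))
                   for r in row)

    def squarify(areas, x, y, w, h):
        # Lay out `areas` (summing to w*h) inside the rect (x, y, w, h);
        # returns a list of (x, y, w, h) tuples, one per area.
        rects = []
        areas = sorted(areas, reverse=True)
        while areas:
            side = min(w, h)
            row = [areas.pop(0)]
            # greedily grow the row while the worst aspect ratio does not worsen
            while areas and worst(row + [areas[0]], side) <= worst(row, side):
                row.append(areas.pop(0))
            thick = sum(row) / side
            off = 0.0
            for r in row:
                length = r / thick
                if w >= h:      # strip sits against the left edge
                    rects.append((x, y + off, thick, length))
                else:           # strip sits against the top edge
                    rects.append((x + off, y, length, thick))
                off += length
            if w >= h:
                x, w = x + thick, w - thick
            else:
                y, h = y + thick, h - thick
        return rects

To use it for the question as posed, scale the N numbers so they sum to X*Y and call squarify(numbers, 0, 0, X, Y).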
I have been working on something similar. I'm prioritizing simplicity over getting the aspect ratios as close to square as possible. This should (in theory) work. I tested it on paper for some values of N between 1 and 10.
N = total number of rects to create,
Q = max(width, height) / min(width, height),
R = N / Q
If Q > N/2, split the rect in N parts along its longest side.
If Q <= N/2, split the rect in R (rounded int) parts along its shortest side.
Then split each subrect in N/R (rounded down int) parts along its shortest side.
Carry the fraction lost to rounding down into the next subrect's division. Repeat for all subrects, or until the required number of rects has been created.
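The recipe leaves the cut directions and the remainder handling open to interpretation; here is one possible reading as runnable Python (a sketch, not necessarily the answerer's exact scheme): strips are stacked across the shorter dimension, each strip is sliced into equal cells, and rounding leftovers are carried into the next strip.

    def equal_area_grid(w, h, n):
        # n equal-area cells (x, y, cw, ch) tiling the rect (0, 0, w, h)
        q = max(w, h) / min(w, h)
        if q > n / 2:
            # very elongated rect: n slices across the longest side
            if w >= h:
                return [(i * w / n, 0.0, w / n, h) for i in range(n)]
            return [(0.0, i * h / n, w, h / n) for i in range(n)]
        rows = min(n, max(1, round(n / q)))   # R = N / Q, rounded
        counts, placed, carry = [], 0, 0.0
        for r in range(rows):
            exact = n / rows + carry          # even share plus carried remainder
            k = n - placed if r == rows - 1 else int(exact)
            carry = exact - k
            counts.append(k)
            placed += k
        cells, off = [], 0.0
        for k in counts:
            thick = min(w, h) * k / n         # strip thickness keeps areas equal
            for i in range(k):
                if w >= h:                    # strips stacked vertically
                    cells.append((i * w / k, off, w / k, thick))
                else:                         # strips stacked horizontally
                    cells.append((off, i * h / k, thick, h / k))
            off += thick
        return cells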
Related
Suppose we have n points in a bounded region of the plane. The problem is to divide it in 4 regions (with a horizontal and a vertical line) such that the sum of a metric in each region is minimized.
The metric can be, for example, the sum of the distances between the points in each region, or any other measure of the spread of the points. See the figure below.
I don't know if any clustering algorithm might help me tackle this problem, or if, for instance, it can be formulated as a simple optimization problem where the decision variables are the "axes".
I believe this can be formulated as a MIP (Mixed Integer Programming) problem.
Let's introduce 4 quadrants A, B, C, D: A is upper right, B is lower right, etc. Then define a binary variable
delta(i,k) = 1 if point i is in quadrant k
             0 otherwise
and continuous variables
Lx, Ly : coordinates of the lines
Obviously we have:
sum(k, delta(i,k)) = 1
xlo <= Lx <= xup
ylo <= Ly <= yup
where xlo, xup are the minimum and maximum x-coordinates (and ylo, yup the same for y). Next we need to implement implications like:
delta(i,'A') = 1 ==> x(i)>=Lx and y(i)>=Ly
delta(i,'B') = 1 ==> x(i)>=Lx and y(i)<=Ly
delta(i,'C') = 1 ==> x(i)<=Lx and y(i)<=Ly
delta(i,'D') = 1 ==> x(i)<=Lx and y(i)>=Ly
These can be handled by so-called indicator constraints or written as linear inequalities, e.g.
x(i) <= Lx + (delta(i,'A')+delta(i,'B'))*(xup-xlo)
Similarly for the others. Finally, the objective is
min sum((i,j,k), delta(i,k)*delta(j,k)*d(i,j))
where d(i,j) is the distance between points i and j. This objective can be linearized as well.
After applying a few tricks, I could prove global optimality for 100 random points in about 40 seconds using Cplex. This approach is not really suited for large datasets (the computation time quickly increases when the number of points becomes large).
I suspect this cannot be shoe-horned into a convex problem. Also, I am not sure this objective is really what you want. It will try to make all clusters about the same size (adding a point to a large cluster introduces lots of distances to be added to the objective; adding a point to a small cluster is cheap). Maybe an average distance for each cluster is a better measure (but that makes the linearization more difficult).
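For illustration, the model is small enough to sketch with PuLP and its bundled CBC solver (not the Cplex setup used above; w[i, j, k] linearises the product delta(i,k)*delta(j,k) with a standard and-constraint):

    import itertools
    import math
    import random

    import pulp  # assumed solver stack; CBC ships with PuLP

    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(12)]
    n, quads = len(pts), "ABCD"  # A upper right, B lower right, C lower left, D upper left
    xlo, xup = min(p[0] for p in pts), max(p[0] for p in pts)
    ylo, yup = min(p[1] for p in pts), max(p[1] for p in pts)
    dist = {(i, j): math.dist(pts[i], pts[j])
            for i, j in itertools.combinations(range(n), 2)}

    m = pulp.LpProblem("quadrants", pulp.LpMinimize)
    delta = {(i, k): pulp.LpVariable(f"delta_{i}_{k}", cat="Binary")
             for i in range(n) for k in quads}
    Lx = pulp.LpVariable("Lx", xlo, xup)
    Ly = pulp.LpVariable("Ly", ylo, yup)
    # w[i, j, k] stands in for delta[i, k] * delta[j, k]
    w = {(i, j, k): pulp.LpVariable(f"w_{i}_{j}_{k}", 0)
         for (i, j) in dist for k in quads}

    m += pulp.lpSum(dist[i, j] * w[i, j, k] for (i, j) in dist for k in quads)
    for i in range(n):
        m += pulp.lpSum(delta[i, k] for k in quads) == 1  # each point in one quadrant
    Mx, My = xup - xlo, yup - ylo
    for i, (px, py) in enumerate(pts):
        # right-hand quadrants (A, B) must have x >= Lx; left-hand ones x <= Lx
        m += px >= Lx - (delta[i, "C"] + delta[i, "D"]) * Mx
        m += px <= Lx + (delta[i, "A"] + delta[i, "B"]) * Mx
        # upper quadrants (A, D) must have y >= Ly; lower ones y <= Ly
        m += py >= Ly - (delta[i, "B"] + delta[i, "C"]) * My
        m += py <= Ly + (delta[i, "A"] + delta[i, "D"]) * My
    for (i, j) in dist:
        for k in quads:
            m += w[i, j, k] >= delta[i, k] + delta[j, k] - 1  # and-constraint

    m.solve(pulp.PULP_CBC_CMD(msg=False))
    print("Lx =", Lx.value(), "Ly =", Ly.value())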
Note - probably incorrect. I will try and add another answer
The one dimensional version of minimising sums of squares of differences is convex. If you start with the line at the far left and move it to the right, each point crossed by the line stops accumulating differences with the points to its right and starts accumulating differences with the points to its left. As you follow this, the differences to the left increase and the differences to the right decrease, so you get a monotonic decrease, possibly a single point that can be on either side of the line, and then a monotonic increase.
I believe that the one dimensional problem of clustering points on a line is convex, but I no longer believe that the problem of drawing a single vertical line in the best position is convex. I worry about sets of points that vary in y co-ordinate so that the left hand points are mostly high up, the right hand points are mostly low down, and the intermediate points alternate between high up and low down. If this is not convex, the part of the answer that tries to extend to two dimensions fails.
So for the one dimensional version of the problem you can pick any point and work out in time O(n) whether that point should be to the left or right of the best dividing line. So by binary chop you can find the best line in time O(n log n).
I don't know whether the two dimensional version is convex or not but you can try all possible positions for the horizontal line and, for each position, solve for the position of the vertical line using a similar approach as for the one dimensional problem (now you have the sum of two convex functions to worry about, but this is still convex, so that's OK). Therefore you solve at most O(n) one-dimensional problems, giving cost O(n^2 log n).
If the points aren't very strangely distributed, I would expect that you could save a lot of time by using the solution of the one dimensional problem at the previous iteration as a first estimate of the position of solution for the next iteration. Given a starting point x, you find out if this is to the left or right of the solution. If it is to the left of the solution, go 1, 2, 4, 8... steps away to find a point to the right of the solution and then run binary chop. Hopefully this two-stage chop is faster than starting a binary chop of the whole array from scratch.
Here's another attempt. Lay out a grid so that, except in the case of ties, each point is the only point in its column and the only point in its row. Assuming no ties in any direction, this grid has N rows, N columns, and N^2 cells. If there are ties the grid is smaller, which makes life easier.
Separating the cells with a horizontal and vertical line is pretty much picking out a cell of the grid and saying that cell is the cell just above and just to the right of where the lines cross, so there are roughly O(N^2) possible such divisions, and we can calculate the metric for each such division. I claim that when the metric is the sum of the squares of distances between points in a cluster the cost of this is pretty much a constant factor in an O(N^2) problem, so the whole cost of checking every possibility is O(N^2).
The metric within a rectangle formed by the dividing lines is SUM_i,j[ (X_i - X_j)^2 + (Y_i-Y_j)^2]. We can calculate the X contributions and the Y contributions separately. If you do some algebra (which is easier if you first subtract a constant so that everything sums to zero) you will find that the metric contribution from a co-ordinate is linear in the variance of that co-ordinate. So we want to calculate the variances of the X and Y co-ordinates within the rectangles formed by each division. https://en.wikipedia.org/wiki/Algebraic_formula_for_the_variance gives us an identity which tells us that we can work out the variance given SUM_i Xi and SUM_i Xi^2 for each rectangle (and the corresponding information for the y co-ordinate). This calculation can be inaccurate due to floating point rounding error, but I am going to ignore that here.
Given a value associated with each cell of a grid, we want to make it easy to work out the sum of those values within rectangles. We can create partial sums along each row, transforming 0 1 2 3 4 5 into 0 1 3 6 10 15, so that each cell in a row contains the sum of all the cells to its left plus itself. If we take these values and do partial sums up each column, we have just worked out, for each cell, the sum over the rectangle whose top right corner lies in that cell and which extends to the bottom and left sides of the grid. The values in the far right column then give us the sum of all the cells on the same level as that cell and below it.
Subtracting off the rectangles we already know how to compute gives the value of a rectangle which lies against the right hand side and the bottom of the grid. Similar subtractions let us work out first the values of the rectangles to the left and right of any vertical line we choose, and then complete our set of four rectangles formed by two lines crossing at any cell in the grid. The expensive part is working out the partial sums, but we only have to do that once, and it costs only O(N^2). The subtractions and lookups used to work out any particular metric have only constant cost; we do one for each of O(N^2) cells, but that is still only O(N^2).
(So we can find the best clustering in O(N^2) time by working out the metrics associated with all possible clusterings in O(N^2) time and choosing the best).
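A rough NumPy sketch of the whole scheme (my own construction; best_axis_split is a hypothetical helper name; per cluster, the sum of squared pairwise distances over k points equals k*SUM(x^2) - (SUM x)^2 per coordinate, which is the variance identity referenced above):

    import numpy as np

    def best_axis_split(pts):
        pts = np.asarray(pts, float)
        xs, ys = np.unique(pts[:, 0]), np.unique(pts[:, 1])
        nx, ny = len(xs), len(ys)
        # per-cell accumulators on the rank grid: count, x, x^2, y, y^2
        acc = np.zeros((5, ny, nx))
        for x, y in pts:
            i, j = np.searchsorted(xs, x), np.searchsorted(ys, y)
            acc[:, j, i] += (1.0, x, x * x, y, y * y)
        # summed-area tables: P[:, r, c] = totals over rows < r, cols < c
        P = np.pad(acc.cumsum(axis=1).cumsum(axis=2), ((0, 0), (1, 0), (1, 0)))
        tot = P[:, -1, -1]

        def metric(s):
            # sum of squared pairwise distances within one cluster:
            # k*sum(x^2) - sum(x)^2 + k*sum(y^2) - sum(y)^2
            k, sx, sxx, sy, syy = s
            return k * sxx - sx ** 2 + k * syy - sy ** 2

        best = None
        for r in range(ny + 1):           # horizontal line below row r
            for c in range(nx + 1):       # vertical line left of column c
                ll = P[:, r, c]           # lower-left cluster totals
                lr = P[:, r, -1] - ll     # lower-right
                ul = P[:, -1, c] - ll     # upper-left
                ur = tot - ll - lr - ul   # upper-right
                m = sum(metric(q) for q in (ll, lr, ul, ur))
                if best is None or m < best[0]:
                    best = (m, r, c)
        return best                       # (metric, row cut, column cut)

Each candidate division costs O(1) after the one-off prefix-sum step, so the whole scan is O(N^2) as claimed.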
I have small rectangles of different dimensions (1cm x 2cm, 2cm x 3cm, 4cm x 6cm, etc.). The number of different rectangle types may vary depending on the case, and each type may come in a different count.
I need to create a big rectangle out of all these small rectangles, where the small rectangles can only be placed along the edges. No rotations. The final outer rectangle should ideally be close to a square shape (X ~ Y). Not all edges need to be filled up; there can be gaps between smaller rectangles. Picture example:
http://i.stack.imgur.com/GqI5z.png
I am trying to write a code that finds out the minimum possible area that can be formed.
I have an algorithm that loops through all possible placements to find the minimum possible area, but the run time grows quickly as the number of rectangle types and the number of rectangles increase; e.g. 2 types of rectangles with 100+ rectangles each gives 8 nested for loops, i.e. ~100^8 iterations.
Any ideas on better and faster algorithm to calculate the minimum possible area? code is in python, but any algorithm concept is fine.
    # brute force: choose how many of each rectangle type go on each edge
    for rectange_1_top_count in range(0, all_rectangles[1]["count"] + 1):
        for rectange_1_bottom_count in range(0, all_rectangles[1]["count"] - rectange_1_top_count + 1):
            for rectange_1_left_count in range(0, all_rectangles[1]["count"] - rectange_1_top_count - rectange_1_bottom_count + 1):
                # the right edge takes whatever remains of type 1
                rectange_1_right_count = (all_rectangles[1]["count"] - rectange_1_top_count
                                          - rectange_1_bottom_count - rectange_1_left_count)
                for rectange_2_top_count in range(0, all_rectangles[2]["count"] + 1):
                    for rectange_2_bottom_count in range(0, all_rectangles[2]["count"] - rectange_2_top_count + 1):
                        for rectange_2_left_count in range(0, all_rectangles[2]["count"] - rectange_2_top_count - rectange_2_bottom_count + 1):
                            # the right edge takes whatever remains of type 2
                            rectange_2_right_count = (all_rectangles[2]["count"] - rectange_2_top_count
                                                      - rectange_2_bottom_count - rectange_2_left_count)
                            area = calculate_minimum_area()
                            if area < minimum_area:
                                minimum_area = area
This looks like an NP-hard problem, so there exists no simple and efficient algorithm. It doesn't mean that there is no good heuristic that you can use, but if you have many small rectangles, you won't find the optimal solution fast.
Why is it NP-hard? Suppose all your rectangles have height 1 and you have one rectangle of height 2; then it would make sense to look for a solution with total height 2 (basically, you try to form two horizontal rows of height-1 rectangles with the same length). To figure out whether such a solution exists, you would have to form two subsets of your small rectangles that add up to the same total width. This is called the partition problem, and it is NP-complete. Even if gaps are allowed and the total widths are not required to be the same, this is still an NP-hard problem: you can reduce the partition problem to your rectangle problem by converting the elements to partition into rectangles of height 1 as outlined above.
I'll wait for the answer to the questions I posted in the comments to your question and then think about it again.
For collision detection I'd like to turn a bitmap into a set of rectangles, using as few rectangles as possible. A more formal description of the problem is given in the title. An example:
For tie-breakers of multiple solutions I'd prefer it if the total area covered by all the rectangles combined was maximized. For example, the blue rectangle in the above picture could've been smaller, but that would've been a less optimal solution.
Is there a more common name for this problem? Any literature? Or a simple algorithm that gives an optimal solution?
This problem may be NP-hard, but if you want the highest quality solutions for instances not created by an NP-hardness reduction, then running an integer programming solver is worth a try. Even if running time is a concern, it may be useful to have a gold standard to compare against.
In essence you're trying to solve a special case of a problem called set cover. This is how set cover can be formulated as an integer program.
minimize sum_{white rectangles R} x_R
subject to
for all white points p, sum_{white rectangles R such that p in R} x_R >= 1
for all white rectangles R, x_R in {0, 1}
All you have to do is write code to construct the specific instance of this integer program corresponding to its input, call the solver, get the results back, and then do one more optimization with the optimal number of rectangles (k) known.
maximize sum_{white rectangles R} area(R) x_R
subject to
for all white points p, sum_{white rectangles R such that p in R} x_R >= 1
for all white rectangles R, x_R in {0, 1}
sum_{white rectangles R} x_R <= k
If the instances are large, then you may need to do some preprocessing (the solvers typically can do this as well, but they have to use algorithms for a more general problem, which may not be as efficient). First, use only the white rectangles that are maximal, that is, are not contained in a larger white rectangle. There probably are clever algorithms for enumerating them, but you should implement the naive one and benchmark the whole system first. Second, use only some of the points. In particular, if p and q are distinct points, and p belongs to every maximal rectangle to which q belongs, then tracking p is superfluous.
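Putting the two stages together, a hedged sketch with PuLP/CBC (the maximal-rectangle enumeration is the naive "benchmark first" version suggested above; grid is a list of rows with True meaning white):

    import pulp  # assumed solver; CBC ships with PuLP

    def maximal_white_rects(grid):
        # naive enumeration of maximal all-white rectangles, half-open (t, l, b, r)
        H, W = len(grid), len(grid[0])
        cand = set()
        for t in range(H):
            for l in range(W):
                if not grid[t][l]:
                    continue
                maxw = W - l
                for b in range(t, H):
                    w = 0
                    while w < maxw and grid[b][l + w]:
                        w += 1
                    if w == 0:
                        break
                    maxw = w               # width shrinks as the rect grows down
                    cand.add((t, l, b + 1, l + maxw))
        cand = list(cand)
        return [r for r in cand if not any(
            s != r and s[0] <= r[0] and s[1] <= r[1] and
            s[2] >= r[2] and s[3] >= r[3] for s in cand)]

    def cover(grid):
        rects = maximal_white_rects(grid)
        idx = range(len(rects))
        pts = [(y, x) for y in range(len(grid))
               for x in range(len(grid[0])) if grid[y][x]]

        def solve(weights, sense, k=None):
            prob = pulp.LpProblem("cover", sense)
            x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in idx]
            prob += pulp.lpSum(weights[i] * x[i] for i in idx)
            for (py, px) in pts:           # every white point must be covered
                prob += pulp.lpSum(x[i] for i in idx
                                   if rects[i][0] <= py < rects[i][2]
                                   and rects[i][1] <= px < rects[i][3]) >= 1
            if k is not None:
                prob += pulp.lpSum(x) <= k
            prob.solve(pulp.PULP_CBC_CMD(msg=False))
            return [rects[i] for i in idx if x[i].value() > 0.5]

        k = len(solve([1] * len(rects), pulp.LpMinimize))    # stage 1: fewest rects
        areas = [(b - t) * (r - l) for (t, l, b, r) in rects]
        return solve(areas, pulp.LpMaximize, k)              # stage 2: most area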
I suggest simply starting at an external corner which is not yet covered by a rectangle, and greedily growing that rectangle. Repeat until everything's covered. I don't think this gives you the tie-breaker property you're looking for on a global basis (since you may have multiple options for how to greedily grow each rectangle), but it does on a local basis.
I managed to solve the problem in a way that was good enough - it's probably not optimal though.
1. Make a 2D array with the dimensions of the bitmap. For every pixel in the bitmap that's black, make the corresponding element WALL; otherwise EMPTY_SPACE.
2. Scan the array left to right, top to bottom for the first EMPTY_SPACE. Save this coordinate.
3. Create a rectangle of area 1 with the top-left coordinate set to the coordinate found at 2, extending 1 downwards and to the right.
4. Horizontally extend the rectangle to the left and to the right as long as it doesn't cover any WALL.
5. Vertically extend the rectangle up and down as long as it doesn't cover any WALL.
6. Mark any element covered by the rectangle as COVERED_SPACE and add the rectangle to the set of rectangles.
7. If there is still an element containing EMPTY_SPACE left, go to 2; otherwise you're done.
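In Python, the steps might look roughly like this (a sketch; bitmap[y][x] truthy means a black pixel, and rectangles are returned as (x, y, width, height)):

    WALL, EMPTY_SPACE, COVERED_SPACE = 0, 1, 2

    def bitmap_to_rects(bitmap):
        H, W = len(bitmap), len(bitmap[0])
        grid = [[WALL if bitmap[y][x] else EMPTY_SPACE for x in range(W)]
                for y in range(H)]
        rects = []
        while True:
            # step 2: first EMPTY_SPACE in left-right, top-bottom scan order
            seed = next(((y, x) for y in range(H) for x in range(W)
                         if grid[y][x] == EMPTY_SPACE), None)
            if seed is None:
                return rects                   # step 7: done
            top, left = seed
            bottom, right = top + 1, left + 1  # step 3: 1x1 rectangle
            # step 4: extend left and right while no WALL would be covered
            while left > 0 and all(grid[y][left - 1] != WALL
                                   for y in range(top, bottom)):
                left -= 1
            while right < W and all(grid[y][right] != WALL
                                    for y in range(top, bottom)):
                right += 1
            # step 5: extend up and down the same way
            while top > 0 and all(grid[top - 1][x] != WALL
                                  for x in range(left, right)):
                top -= 1
            while bottom < H and all(grid[bottom][x] != WALL
                                     for x in range(left, right)):
                bottom += 1
            # step 6: mark the covered elements and record the rectangle
            for y in range(top, bottom):
                for x in range(left, right):
                    grid[y][x] = COVERED_SPACE
            rects.append((left, top, right - left, bottom - top))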
I'm not sure if there's an algorithm that can solve this.
A given number of rectangles are placed side by side horizontally from left to right to form a shape. You are given the width and height of each.
How would you determine the minimum number of rectangles needed to cover the whole shape?
i.e How would you redraw this shape using as few rectangles as possible?
I can only think of trying to squeeze in as many big rectangles as I can, but that seems inefficient.
Any ideas?
Edit:
You are given a number n, and then n sizes:

    2
    1 3
    2 5

The above would have two rectangles of sizes 1x3 and 2x5 next to each other.
I'm wondering how many rectangles, at minimum, I would need to recreate that shape, given that rectangles cannot overlap.
Since your rectangles are well aligned, it makes the problem easier. You can simply create rectangles from the bottom up. Each time you do that, it creates new shapes to check. The good thing is, all your new shapes will also be base-aligned, and you can just repeat as necessary.
First, you want to find the minimum height rectangle. Make a rectangle that height, with the width as total width for the shape. Cut that much off the bottom of the shape.
You'll be left with multiple shapes. For each one, do the same thing.
Finding the minimum height rectangle should be O(n). Since you do that for each group, and the worst case is all different heights, this totals out to O(n^2).
For example:
In the image, the minimum for each shape is highlighted green. The resulting rectangle is blue, to the right. The total number of rectangles needed is the total number of blue ones in the image, 7.
Note that I'm explaining this as if these were physical rectangles. In code, you can completely do away with the width, since it doesn't matter in the least unless you want to output the rectangles rather than just counting how many it takes.
You can also reduce the "make a rectangle and cut it from the shape" to simply subtracting the height from each rectangle that makes up that shape/subshape. Each contiguous section of shapes with +ve height after doing so will make up a new subshape.
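A short Python sketch of that counting scheme (widths dropped, as noted above; each call cuts the minimum-height slab off the bottom of a contiguous group and recurses on whatever sticks out above it):

    def count_rects(heights):
        heights = [h for h in heights if h > 0]
        if not heights:
            return 0
        m = min(heights)
        total = 1                      # one slab of height m across the group
        group = []
        for h in heights + [m]:        # the sentinel flushes the last group
            if h > m:
                group.append(h - m)    # what remains above the slab
            else:
                total += count_rects(group)
                group = []
        return total

For the question's example, count_rects([3, 5]) returns 2.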
If you look for an overview on algorithms for the general problem, Rectangular Decomposition of Binary Images (article by Tomas Suk, Cyril Höschl, and Jan Flusser) might be helpful. It compares different approaches: row methods, quadtree, largest inscribed block, transformation- and graph-based methods.
A juicy figure (from page 11) as an appetizer:
Figure 5: (a) The binary convolution kernel used in the experiment. (b) Its 10 blocks of GBD decomposition.
How many squares of size a×a can be packed into a circle of radius R?
I don't need a solution. I just need some kind of a starting idea.
I apologise for writing such a long answer. My approach is to start with a theoretical maximum and a guaranteed minimum. When you approach the problem, you can use these values to determine how good any algorithm you use is. If you can think of a better minimum then you can use that instead.
We can define an upper limit for the problem by simply using the area of the circle
Upper Limit = floor( (PI * (r pow 2)) / (L * L) )
Where L is the width or height of the squares you are packing and r is the radius of the circle you are packing the squares into. We are sure this is an upper limit because a) we must have a discrete number of boxes and b) we cannot take up more space than the area of the circle. (A formal proof would go something along the lines of: assume we had one more box than this; then the sum of the areas of the boxes would be greater than the area of the circle.)
So with an upper limit in hand, we can now take any solution that exists for all circles and call it a minimum solution.
So, let's consider a solution that exists for all circles by taking a look at the largest square we can fit inside the circle.
The largest square you can fit inside the circle has 4 points on the perimeter, and has a width and length of sqrt(2) * radius (by Pythagoras' theorem, using the radius for the length of the two shorter edges).
So the first thing we note is that if sqrt(2) * radius is less than the dimension of your squares, then you cannot fit any squares in the circle, because afterall, this is the largest square you can fit.
Now we can do a straightforward computation to divide this large square into a regular grid of squares using the L you specified, which will give us at least one solution to the problem. So you have a grid of squares inside this maximum square. The number of squares you can fit into one row of this grid is
floor((sqrt(2) * radius)/ L)
And so this minimum solution asserts that you can have at least
Lower Limit = (floor((sqrt(2) * radius) / L)) pow 2
number of squares inside the circle.
So in case you got lost, all I did was take the largest square I could fit inside the circle and then pack as many squares as possible into a regular grid inside that, to give me at least one solution.
If you get an answer at 0 for this stage then you cannot fit any squares inside the circle.
Now armed with a theoretical maximum and an absolute minimum, you can start trying any sort of heuristic algorithm you like for packing squares. A simple algorithm would be to just split the circle up into rows and fit as many squares as you can into each row. You can then take this minimum as a guideline to ensure that you come up with a better solution. If you want to spend more processing power looking for a better solution, you can use the theoretical maximum as a guideline for how close you are to the theoretical best.
And if you care about this, you could work out the maximum and minimum theoretical percentage of cover that the minimum algorithm I identified gives you. The largest inscribed square always covers a fixed ratio of the circle's internal area: 2/pi, or about 63.7%. So the best this lower bound can do is about 63.7% cover. And the minimum non-trivial (i.e. non-zero) theoretical cover occurs when you can only fit 1 square inside the largest square, which happens when the squares you are packing are just over half the width and height of the largest square you can fit in the circle. Basically you take up just over 25% of the inner square with 1 packed square, which means you get an approximate cover of about 16%.
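In code, the two bounds are a direct transcription of the formulas above:

    import math

    def bounds(radius, L):
        # theoretical maximum from the circle's area
        upper = math.floor(math.pi * radius ** 2 / (L * L))
        # guaranteed minimum: a regular grid inside the largest inscribed square
        per_row = math.floor(math.sqrt(2) * radius / L)
        lower = per_row ** 2
        return lower, upper

For example, bounds(10, 3) gives (16, 34), so any packing heuristic for R = 10, L = 3 should land between those two values.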
Rasterise the circle using something like the midpoint circle algorithm. The number of filled pixels is the number of squares you can fit in the circle. Of course, since you're not actually filling the pixels, just counting them, this should take time proportional to the circumference of the circle, not its area.
You'll have to pick the radius for rasterisation carefully, so that you only count pixels that are strictly inside the circle.
Edit: This may not be exactly correct, as it's possible that applying a sub-pixel offset to the grid could change the result. I'll leave the answer here as it may provide a useful starting point for an exact solution.
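A sketch of the counting for one fixed grid offset (the circle's centre on a grid corner), subject to the caveat in the edit above; it visits each row of cells once, so the work is proportional to the radius in cells rather than the area:

    import math

    def count_cells(R, a):
        # count cells of side a lying strictly inside a circle of radius R,
        # with the grid aligned so the centre sits on a grid corner;
        # corners exactly on the circle would need extra care
        n, j = 0, 0
        while (j + 1) * a < R:
            y = (j + 1) * a                  # cell corner farthest from the x-axis
            half = math.sqrt(R * R - y * y)  # chord half-width at that height
            n += int(half // a)              # cells whose far corner stays inside
            j += 1
        return 4 * n                         # the four quadrants are symmetric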
You can pack as many squares as you like into a circle. If you doubt this statement, draw a large circle on a piece of paper, then draw a square with side length 10^(-18)m inside it, repeat. When you get near to the boundary of the circle, start drawing squares with side length of 10^(-21)m.
So your first step must be to refine your question and state your problem more accurately.
Just a shot in the dark after a few minutes of thought...
What if you worked with half the circle and doubled the result at the end? I would start with a grid of squares the length of the diameter and the width of the radius, essentially blanketing the semicircle. Then check all 4 corners of each square and make sure their coordinates are within the radius of the circle. This would of course require that you plot the circle and the squares on some sort of coordinate system or grid.
I hope this makes sense... It's in my head and it seems a bit difficult to articulate :)
EDIT:
After drawing it out, I think this method would work with a little tweaking. I would line up the squares along the diameter, but slide the first one down until it fits. Set that one in place and keep lining up squares next to it until they no longer fit. Then move out to the edge of this row of squares and repeat the same steps until your rows of squares reach the radius.
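This row-by-row idea (also suggested at the end of the earlier long answer) is easy to sketch. The version below stacks rows of height a from the bottom of the circle and fills each with as many squares as fit under its narrower chord; it is a heuristic, not claimed optimal, and it skips the per-row sliding described above:

    import math

    def rows_packing(R, a):
        n, y = 0, -R
        while y + a <= R:
            edge = max(abs(y), abs(y + a))      # row edge nearest the rim
            half = math.sqrt(R * R - edge * edge)
            n += int(2 * half // a)             # squares fitting across the chord
            y += a
        return n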