Choice of optimization algorithm for distributing lines inside a shape

Choice of optimization algorithm for distributing lines inside a shape - algorithm

Consider the follow representation of a concrete slab element with reinforcement bars and holes.
I need an algorithm that automatically distributes lines over an arbitrary shape with different holes.
The main constraints are:
Lines cannot be outside of the region or inside a hole
The distance between two side-by-side lines cannot exceed a variable D
Lines have to be positioned on a fixed interval I, i.e. y mod I = 0, where y is the line's Y coordinate.
Each available point inside the shape cannot be further from a line than D/2
I want to optimize the solution by minimizing the total number of lines N. What kind of optimization algorithm would suit this problem?
I assume most approaches involves simplifying the shape into a raster (with pixel height I) and disable or enable each pixel. I thought this was an obvious LP problem and tried to set it up with GLPK, but found it very hard to describe the problem using this simplified raster for an arbitrary number of lines. I also suspect that the solution space might be too big.
I have already implemented an algorithm in C# that does the job, but not very optimized. This is how it works:
Create a simplified raster of the geometry
Calculate a score for each cell using a complicated formula that takes possible line length and distances to other rods and obstacles into account.
Determine which needs reinforcement (where number of free cells in y direction > D)
Pick the cell with the highest score that needs reinforcement, and reinforce it as far as possible in -x and +x directions
Repeat
Depending on the complicated formula, this works pretty well but starts giving unwanted results when putting the last few lines, since it can never move an already put line.
Are there any other optimization techniques that I should have a look at?

I'm not sure what follows is what you want - I'm fairly sure it's not what you had in mind - but if it sounds reasonable you might try it out.
Because the distance is simply at most d, and can be anything less than that, it seems at first blush like a greedy algorithm should work here. Always place the next line(s) so that (1) as few as possible are needed and (2) they are as far away from existing lines as possible.
Assume you have an optimal algorithm for this problem, and it places the next line(s) at distance a <= d from the last line. Say it places b lines. Our greedy algorithm will definitely place no more than b lines (since the first criterion is to place as few as possible), and if it places b lines it will place them at distance c with a <= c <= d, since it then places the lines as far as possible.
If the greedy algorithm did not do what the optimal algorithm did, it differed in one of the following ways:
It placed the same or fewer lines farther away. Suppose the optimal algorithm had gone on to place b' lines at distance a' away at the next step. Then these lines would be at distance a+a' and there would be b+b' lines in total. But the greedy algorithm can mimic the optimal algorithm in this case by placing b' lines at displacement a+a' by choosing c' = (a+a') - c. Since c > a and a' < d, c' < d and this is a legal placement.
It placed fewer lines closer together. This case is actually problematic. It is possible that this places k unnecessary lines, if any placement requires at least k lines and the farthest ones require more, and the arrangement of holes is chosen so that (e.g.) the distance it spans is a multiple of d.
So the greedy algorithm turns out not to work in case 2. However, it does in other cases. In particular, our observation in the first case is very useful: for any two placements (distance, lines) and (distance', lines'), if distance >= distance' and lines <= lines', the first placement is always to be preferred. This suggests the following algorithm:
PlaceLines(start, stop)
// if we are close enough to the other edge,
// don't place any more lines.
if start + d >= stop then return ([], 0)
// see how many lines we can place at distance
// d from the last placed lines. no need to
// ever place more lines than this
nmax = min_lines_at_distance(start + d)
// see how that selection pans out by recursively
// seeing how line placement works after choosing
// nmax lines at distance d from the last lines.
optimal = PlaceLines(start + d, stop)
optimal[0] = [d] . optimal[0]
optimal[1] = nmax + optimal[1]
// we only need to try fewer lines, never more
for n = 1 to nmax do
// find the max displacement a from the last placed
// lines where we can place n lines.
a = max_distance_for_lines(start, stop, n)
if a is undefined then continue
// see how that choice pans out by placing
// the rest of the lines
candidate = PlaceLines(start + a, stop)
candidate[0] = [a] . candidate[0]
candidate[1] = n + candidate[1]
// replace the last best placement with the
// one we just tried, if it turned out to be
// better than the last
if candidate[1] < optimal[1] then
optimal = candidate
// return the best placement we found
return optimal
This can be improved by memoization by putting results (seq, lines) into a cache indexed by (start, stop). That way, we can recognize when we are trying to compute assignments that may already have been evaluated. I would expect that we'd have this case a lot, regardless of whether you use a coarse or a fine discretization for problem instances.
I don't get into details about how max_lines_at_distance and max_distance_for_lines functions might work, but maybe a word on these.
The first tells you at a given displacement how many lines are required to span the geometry. If you have pixelated your geometry and colored holes black, this would mean looking at the row of cells at the indicated displacement, considering the contiguous black line segments, and determining from there how many lines that implies.
The second tells you, for a given candidate number of lines, the maximum distance from the current position at which that number of lines can be placed. You could make this better by having it tell you the maximum distance at which that number of lines, or fewer, could be placed. If you use this improvement, you could reverse the direction in which you're iterating n and:
if f(start, stop, x) = a and y < x, you only need to search up to a, not stop, from then on;
if f(start, stop, x) is undefined and y < x, you don't need to search any more.
Note that this function can be undefined if it is impossible to place n or fewer lines anywhere between start and stop.
Note also that you can memorize separately for these functions to save repeated lookups. You can precompute max_lines_at_distance for each row and store it in a cache for later. Then, max_distance_for_lines could be a loop that checks the cache back to front inside two bounds.

Related

Finding closest pair of points in the plane with non-distinct x-coordinates in O(nlogn)

Most of the implementations of the algorithm to find the closest pair of points in the plane that I've seen online have one of two deficiencies: either they fail to meet an O(nlogn) runtime, or they fail to accommodate the case where some points share an x-coordinate. Is a hash map (or equivalent) required to solve this problem optimally?
Roughly, the algorithm in question is (per CLRS Ch. 33.4):
For an array of points P, create additional arrays X and Y such that X contains all points in P, sorted by x-coordinate and Y contains all points in P, sorted by y-coordinate.
Divide the points in half - drop a vertical line so that you split X into two arrays, XL and XR, and divide Y similarly, so that YL contains all points left of the line and YR contains all points right of the line, both sorted by y-coordinate.
Make recursive calls for each half, passing XL and YL to one and XR and YR to the other, and finding the minimum distance, d in each of those halves.
Lastly, determine if there's a pair with one point on the left and one point on the right of the dividing line with distance smaller than d; through a geometric argument, we find that we can adopt the strategy of just searching through the next 7 points for every point within distance d of the dividing line, meaning the recombination of the divided subproblems is only an O(n) step (even if it looks n2 at first glance).
This has some tricky edge cases. One way people deal with this is sorting the strip of points of distance d from the dividing line at every recombination step (e.g. here), but this is known to result in an O(nlog2n) solution.
Another way people deal with edge cases is by assuming each point has a distinct x-coordinate (e.g. here): note the snippet in closestUtil which adds to Pyl (or YL as we call it) if the x-coordinate of a point in Y is <= the line, or to Pyr (YR) otherwise. Note that if all points lie on the same vertical line, this would result us writing past the end of the array in C++, as we write all n points to YL.
So the tricky bit when points can have the same x-coordinate is dividing the points in Y into YL and YR depending on whether a point p in Y is in XL or XR. The pseudocode in CLRS for this is (edited slightly for brevity):
for i = 1 to Y.length
if Y[i] in X_L
Y_L.length = Y_L.length + 1;
Y_L[Y_L.length] = Y[i]
else Y_R.length = Y_R.length + 1;
Y_R[Y_R.length] = Y[i]
However, absent of pseudocode, if we're working with plain arrays, we don't have a magic function that can determine whether Y[i] is in X_L in O(1) time. If we're assured that all x-coordinates are distinct, sure - we know that anything with an x-coordinate less than the dividing line is in XL, so with one comparison we know what array to partition any point p in Y into. But in the case where x-coordinates are not necessarily distinct (e.g. in the case where they all lie on the same vertical line), do we require a hash map to determine whether a point in Y is in XL or XR and successfully break down Y into YL and YR in O(n) time? Or is there another strategy?

Yes, there are at least two approaches that work here.
The first, as Bing Wang suggests, is to apply a rotation. If the angle is sufficiently small, this amounts to breaking ties by y coordinate after comparing by x, no other math needed.
The second is to adjust the algorithm on G4G to use a linear-time partitioning algorithm to divide the instance, and a linear-time sorted merge to conquer it. Presumably this was not done because the author valued the simplicity of sorting relative to the previously mentioned algorithms in most programming languages.

Tardos & Kleinberg suggests annotating each point with its position (index) in X.
You could do this in N time, or, if you really, really want to, you could do it "for free" in the sorting operation.
With this annotation, you could do your O(1) partitioning, and then take the position pr of the right-most point in Xl in O(1), using it to determine weather a point in Y goes in Yl (position <= pr), or Yr (position > pr). This does not require an extra data structure like a hash map, but it does require that those same positions are used in X and Y.
NB:
It is not immediately obvious to me that the partitioning of Y is the only problem that arises when multiple points have the same coordinate on the x-axis. It seems to me that the proof of linearity of the comparisons neccesary across partitions breaks, but I have seen only the proof that you need only 15 comparisons, not the proof for the stricter 7-point version, so i cannot be sure.

Fair division of a kingdom [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Recently, I've attended programming competition. There was a problem from it that I am still mulling over. Programming language does not matter, but I've wrote it in C++. The task was this:
As you already know, Flatland is located on the plane. There are n
cities in Flatland, i-th of these cities is located at the point (xi,
yi). There are ai citizens living in i-th city. The king of
Flatland has decided to divide the kingdom between his two sons. He
wants to build a wall in the form of infinite straight line; each of
the parts will be ruled by one of the sons. The wall cannot pass
through any city. To avoid envy between brothers, the populations of
two parts must be as close as possible; formally, if a and b are
the total number of citizens living in cities of the first and the
second part respectively, the value of |a - b| must be minimized.
Help the king to find the optimal division. Number of cities is less
than 1000. And all coordinates are integers. Output of algorithm
should be integer number of minimal |a-b|
Okay, if I knew the direction of line, it will be really easy task - binary search:
I don't want code, I want ideas because I don't have any. If I catch idea I can write code!
I don't know optimal direction, but I think it could be found somehow. So could it be found or is this task solved other way?
An example where the horizontal/vertical line is not optimal:
1
\
\
2 \ 1

The Ansatz
A brute force method would be to check all possible division...
First it should be noted, that the exact orientation of the line does not matter. It can always be shifted by small amounts and there are cases with more than one minimum. What matters it what cities go to which side of the kingdom. Even when simply trying all such possible combinations, it is not trivial to find them. To do so, I propose the following algorithm:
How to find all possible divisions
For each pair of cities x and y, the line connecting them, divides the kingdom in "left" and "right". Then consider all possible combinations of left, right, x and y:
left + x + y vs right (C)
left + x vs right + y (A)
left + y vs right + x (D)
left vs right + x + y (B)
Actually I am not 100% sure but I think in this way you can find all possible division with a finite number of trials. As the cities have no size (I assumed 0 radius), the line connecting x and y can be shifted slightly to include either city on either side of the kindom.
One counter example where this simple method will definitely fail is when more than 2 cities lie on a straight line
Example
This picture illustrates one step of my above algorithm for the example from the OP. x and y are the two cities with 1 inhabitants. Actually with this pair of cities one gets already all possible divisions. (However 3 points is trivial anyhow, as there is no geometrical restriction on what combinations are possible. Interestingly only starting with 4 points their location on the plane really matters.)
Colinear points
Following some discussion and fruitful comments, I came to the conclusion that colinear points are not really a problem. One just has to consider these points when evaluating the 4 possible divisions (for each pair of points). E.g. assume in the above example is another point at (-1,2). Then this point would lie on the left for A and C and on the right for B and D.

For each angle A, consider the family of parallel lines which make an angle of A with the x-axis, with special case A=0 corresponding to the family of lines parallel to the X-axis.
Given A, you can use a binary search to find the line in the family which divides the kingdom most nearly equally. So we have a function f from angles to integers, mapping each angle A to the minimum value of |a-b| for lines in the family corresponding to A.
How many angles do we need to try? The situation changes materially only when A is an angle corresponding to a line between two points, an angle which I will call a "jump angle". The function is continuous, and therefore constant, away from jump angles. We have to try jump angles, of which there are about n choose 2, approximately 500,000 at most. We also have to try intervals of angles between jump angles, doubling the size, to 1,000,000 at most.
Instead of angles, it's probably more sensible to use slopes. I just like thinking in terms of angles.
The time complexity of this approach is O(n^2 log n), n^2 for the number of angles, log n for the binary search. If we can learn more about the function f, it may be possible to use a faster method to minimize f than checking every possibility. For example, it seems reasonable that the minimum of f can be found at an angle not equal to a jump angle.
It may also be possible to eliminate the binary search by using the centroid of the cities. We calculate the weighted average
(a1(x1,y1) + a2(x2,y2) + ... + an(xn,yn))/(a1+a2+...+an)
I think that lines balancing the population will pass through that point. (Hmm.) If that's the case, we only have to think about angles.

Case where n is less than 3
The base case is where there are two cities: in which case you simple take the perpendicular line on the line that connects the two cities.
Case with three or more cities
You can discretize the tangent by taking every pair of two cities, and see the line that connects them as the direction of the infinite line.
Why this works
If you split the number of cities in two parts, there is at least one half with two or more cities. For that part, there are two points that are the closest to the border. Whether the border passes "very closely" to that line or has the same line does not matter; because a "slightly different tangent" will not swap any city (otherwise these cities were not the closest). Since we try "every border", we will eventually generate a border with the given tangent.
Example:
Say you have the following scenario:
1
\
2\ 1
With the numbers showing the values. In this case the two closest points at the border are the one at the top and the right. So we construct a line that points 45 degrees downwards. Now we use binary search to find the most optimal split: we rotate all points, order them by ascending rotated x-value, then perform binary search on the weights. The optimal one is to split it between the origin and the two other points.
Now with four points:
1 2
2 1
Here we will investigate the following lines:
\ 1\|/2 /
\ /|\ /
----+----
/ \|/ \
/ 2/|\1 \
And this will return either the horizontal or the vertical line.
There is a single possibility - as pointed out by #Nemo that all these points are lying on the same line. In such case there is no tangent that makes sense. In that case, one can use the perpendicular tangent as well.
Pseudocode:
for v in V
for w in V\{v}
calculate tangent
for tangent and perpendicular tangent
rotate all points such that the tangent is rotated to the y-axis
look for a rotated line in the y-direction that splits the cities optimal
return the best split found
Furthermore as nearly all geometrical approaches, this method can suffer from the fact that multiple dots are located on the same line in which case by adding a simple rotation one can either include/exclude one of the points. This is indeed a dirty hack to the problem.
This Haskell program calculates the "optimal direction" (if the above solution is correct) for a given list of points:
import Data.List
type Point = (Int,Int)
type WPoint = (Point,Int)
type Direction = Point
dirmul :: Direction -> WPoint -> Int
dirmul (dx,dy) ((xa,ya),_) = xa*dx+ya*dy
dirCompare :: Direction -> WPoint -> WPoint -> Ordering
dirCompare d pa pb = compare (dirmul d pa) (dirmul d pb)
optimalSplit :: [WPoint] -> Direction
optimalSplit pts = (-dy,dx)
where wsum = sum $ map snd pts
(dx,dy) = argmin (bestSplit pts wsum) $ concat [splits pa pb | pa <- pts, pb <- pts, pa /= pb]
splits :: WPoint -> WPoint -> [Direction]
splits ((xa,ya),_) ((xb,yb),_) = [(xb-xa,yb-ya),(ya-yb,xb-xa)]
bestSplit :: [WPoint] -> Int -> Direction -> Int
bestSplit pts wsum d = bestSplitScan cmp ordl 0 wsum
where cmp = dirCompare d
ordl = sortBy cmp pts
bestSplitScan :: ((a,Int) -> (a,Int) -> Ordering) -> [(a,Int)] -> Int -> Int -> Int
bestSplitScan _ [] l r = abs $ l-r
bestSplitScan cmp ((x1,w1):xs) l r = min (abs $ l-r) (bestSplitScan cmp (dropWhile eqf xs) (l+d) (r-d))
where eqf = (==) EQ . cmp (x1,w1)
d = w1+(sum $ map snd $ takeWhile eqf xs)
argmin :: (Ord b) => (a -> b) -> [a] -> a
argmin _ [x] = x
argmin f (x:xs) | (f x) <= f ax = x
| otherwise = ax
where ax = argmin f xs
For instance:
*Main> optimalSplit [((0,0),2),((0,1),1),((1,0),1)]
(-1,1)
*Main> optimalSplit [((0,0),2),((0,1),1),((1,0),1),((1,1),2)]
(-1,0)
So the direction is a line in which if the line moves one element to the left, it moves one element to the top as well. This is the first example. For the second case, it picks a line that moves in the x-direction so it splits horizontally. This algorithm allows only integral points and does not take into account slightly tweaking the line in case the points are placed on the same line: these are all in or all out for a line parallel.

[Edit: Bold-faced text is relevant to concerns expressed previously in comments.]
[Edit 2: As I should have pointed out earlier, this answer is a supplement to the earlier answer by tobi303, which gives a similar algorithm. The main purpose was to show that the basic idea of that algorithm is sound and sufficiently general.
Despite minor differences in the details of the algorithms proposed in the two answers, I think a careful reading of the "why it works" section, applied to either algorithm, will show that the algorithm is in fact complete.]
If all the cities are in one straight line
(including the case where there
are only one or two cities), then the solution is simple.
I assume you can detect and solve this case, so the rest of the
answer will deal with all other cases.
If there are more than two cities and they are not all collinear,
the "brute force" solution is:
for each city X,
for each city Y where Y is not X
construct a directed line that passes through X and then Y.
Divide the cities in two subsets:
S1 = all the cities to the left of this line
S2 = all the other cities (including cities exactly on the line)
Evaluate the "unfairness" of this division.
Among all subdivisions of cities found in this way,
choose the one with the least unfairness. Return the difference. Done.
Note that the line found in this way is not the line that divides the cities "fairly"; it is merely parallel to some such line.
If we had to find the actual dividing line we would have to do a little more work to figure out
exactly where to put that parallel line. But the requested return value
is merely |a-b|.
Why this works:
Suppose that the line L1 divides the cities in the fairest way possible.
There is not a unique line that does this;
there will be (mathematically speaking) an infinite number of lines
that achieve the same "best" division, but such lines exist, and
all we need to suppose is that L1 is one of those lines.
Let the city A be the closest to L1 on one side of the line
and the city B be closest to L1 on the other side.
(If A and B are not uniquely identified, that is if there are two or more
cities on one side of L1 that are tied for "closest to L1",
we can set L2 = L1 and skip forward to the procedure for L2, below.)
Consider rotations of L1 in each direction, using the point where L1 crosses
the line AB as a pivot point. In at least one direction of rotation,
a rotated image of L1 will "hit" one of the other cities,
call it C, without touching either A or B.
(This follows from the fact that the cities are not all along one line.)
At that point, C is closer to the image of L1 than A or B (whichever
of those cities is on the same side of the original L1 as C was).
The Mean Value Theorem of calculus tells us that at some point during
the rotation, C was exactly as close to the rotated image of L1
as the city A or B, whichever is on the same side of that line.
What this shows is that there is always a line L2 that divides the cities
as fairly as possible, such that there are two cities, D and E,
on the same side of L2 and tied for "closest city to L2" among all
cities on that side of L2.
Now consider two directed lines through D and E: L3, which passes through
D and then E, and L4, which passes through E and then D.
The cities that are on the other side of L2 than D and E consist either of
all the cities to the left of L3, or all the cities to the left of L4.
(Note that this works even if L3 and L4 happen
to pass through more than two cities.)
The procedure described before is simply a way to find all possible
lines that could be line L3 or line L4 in any execution of this
procedure starting from a line L1 that solves the problem.
(Note that while there are always infinite possible choices of L1,
every such L1 results in lines L3 and L4 selected from the finite set of
lines that pass through two or more cities.)
So the procedure will find the division of cities described by L1,
which is the solution to the problem.

"Squares" Logic Riddle Solution in Prolog

"Squares"
Input:
Board size m × n (m, n ∈ {1,...,32}), list of triples (i, j, k), where i ∈ {1,...,m}, j ∈ {1,...,n}, k ∈ {0,...,(n-2)·(m-2)}) describing fields with numbers.
Output:
List of triples (i, j, d) showing solved riddle. Triple (i, j, d) describes square with opposite vertices at coordinates (i, j) and (i+d, j+d).
Example:
Input:
7.
7. [(3,3,0), (3,5,0), (4,4,1), (5,1,0), (6,6,3)].
Output:
[(1,1,2), (1,5,2), (2,2,4), (5,1,2), (4,4,3)]
Image:
Explanation:
I have to find placement for x squares(x = fields with numbers). On the
circuit of the each square, exactly at one of his corners should be
only one digit equal to amount of digits inside the square. Sides of
squares can't cover each other, same as corners. Square lines are
"filling fields" so (0,0,1) square fills 4 fields and have 0 fields inside.
I need a little help coding solution to this riddle in Prolog. Could someone direct me in the right direction? What predicates, rules I should use.

Since Naimads life is worth saving(!), here is some more directive help.
Programmers tend to work out their projects either in bottomp-up (starting from "lower" levels of functionality, building up to a complete application) or top-down (starting with a high level "shell" of functionality that provides input/output, but with the solution processing stubbed in for later refinement).
Either way I can recommend an approach for Prolog progrmming that has been championed by the Ruby programming community: Test Driven Development.
Let's pick a couple of functional requirement, one high and one low, and discuss how this bit of philosophy might help.
The high level predicate in Prolog is one which takes all the input and output arguments and semantically defines the problem. Note that we can write down such a predicate without giving any care as to how the solution process will get implemented:
/* squaresRiddle/4 takes inputs of Height and Width of a "board" with
a list of (nonnegative) integers located at cells of the board, and
produces a corresponding list of squares having one corner at each
of those locations "boxing in" exactly the specified counts of them
in their interiors. No edges can overlap nor corners coincide. */
squaresRiddle(Height,Width,BoxedCountList, OutputSquaresList).
So far we have just framed the problem. The useful progress is given by the comment (clearly defining the inputs and outputs at a high level), and the code at this point just guarantees "success" with whatever arguments are passed. So a "test" of this code:
?= squaresRiddle(7,7,ListIn,ListOut).
won't produce anything except free variables.
Next let's give some thought to how the input list and output list ought to be represented. In the Question it is proposed to use triples of integers as entries of these lists. I would recommend instead using a named functor, because the semantics of those input triples (giving an x,y coordinate of a cell with a count) are subtly different from that of the output triples (giving an x,y coordinate of upper left corner and an "extent" (height&width) of the square). Note in particular that an output square may be described by a different corner than the one in the corresponding input item, although the one in the input item needs to be one of the four corners of the output square.
So it tends to make the code more readable if these constructs are distinguished by simple functor names, e.g. BoxedCountList = [box(3,3,0),box(3,5,0),box(4,4,1),box(5,1,0),box(6,6,3)] vs. OutputSquaresList = [sqr(1,1,2), sqr(1,5,2), sqr(2,2,4), sqr(5,1,2), sqr(4,4,3)].
Let's give a little thought to how the search for solutions will proceed. The output items are in 1-to-1 correspondence with the input items, so the lists will have equal length in the end. However choosing the kth output item depends not only on the kth input item, but on all the input items (since we count how many are in the interior of the output square) and on the preceding output items (since we disallow corners that meet and edges that overlap).
There are several ways to manage the flow of decision making consistent with this requirement, and it seems to be the crux of this exercise to pick one approach and make it work. If you are familiar with difference lists or accumulators, you have a head start on ways to do it.
Now let's switch to discussing some lower level functionality. Given an input box there are potentially a number of output sqr that could correspond to it. Given a positive "extent" for the output, the X,Y coordinates of the input can be met with any of the four corners of the output square. While more checking is required to verify the solution, this aspect seems a good place to start testing:
/* check that coordinates X,Y are indeed a corner of candidate square */
hasCorner(sqr(SX,SY,Extent),X,Y) :-
(SX is X ; SX is X + Extent),
(SY is Y ; SY is Y + Extent).
How would we make use of this test? Well we could generate all possible squares (possibly limiting ourselves to those that fit inside the height and width of our "board"), and then check whether any have the required corner. This might be fairly inefficient. A better way (as far as efficiency goes) would use the corner to generate possible squares (by extending from the corner by Extent in any of four directions) subject to the parameters of board size. Implementation of this "optimization" is left as an exercise for the reader.

What to use for flow free-like game random level creation?

I need some advice. I'm developing a game similar to Flow Free wherein the gameboard is composed of a grid and colored dots, and the user has to connect the same colored dots together without overlapping other lines, and using up ALL the free spaces in the board.
My question is about level-creation. I wish to make the levels generated randomly (and should at least be able to solve itself so that it can give players hints) and I am in a stump as to what algorithm to use. Any suggestions?
Note: image shows the objective of Flow Free, and it is the same objective of what I am developing.
Thanks for your help. :)

Consider solving your problem with a pair of simpler, more manageable algorithms: one algorithm that reliably creates simple, pre-solved boards and another that rearranges flows to make simple boards more complex.
The first part, building a simple pre-solved board, is trivial (if you want it to be) if you're using n flows on an nxn grid:
For each flow...
Place the head dot at the top of the first open column.
Place the tail dot at the bottom of that column.
Alternatively, you could provide your own hand-made starter boards to pass to the second part. The only goal of this stage is to get a valid board built, even if it's just trivial or predetermined, so it's worth keeping it simple.
The second part, rearranging the flows, involves looping over each flow, seeing which one can work with its neighboring flow to grow and shrink:
For some number of iterations...
Choose a random flow f.
If f is at the minimum length (say 3 squares long), skip to the next iteration because we can't shrink f right now.
If the head dot of f is next to a dot from another flow g (if more than one g to choose from, pick one at random)...
Move f's head dot one square along its flow (i.e., walk it one square towards the tail). f is now one square shorter and there's an empty square. (The puzzle is now unsolved.)
Move the neighboring dot from g into the empty square vacated by f. Now there's an empty square where g's dot moved from.
Fill in that empty spot with flow from g. Now g is one square longer than it was at the beginning of this iteration. (The puzzle is back to being solved as well.)
Repeat the previous step for f's tail dot.
The approach as it stands is limited (dots will always be neighbors) but it's easy to expand upon:
Add a step to loop through the body of flow f, looking for trickier ways to swap space with other flows...
Add a step that prevents a dot from moving to an old location...
Add any other ideas that you come up with.
The overall solution here is probably less than the ideal one that you're aiming for, but now you have two simple algorithms that you can flesh out further to serve the role of one large, all-encompassing algorithm. In the end, I think this approach is manageable, not cryptic, and easy to tweek, and, if nothing else, a good place to start.
Update: I coded a proof-of-concept based on the steps above. Starting with the first 5x5 grid below, the process produced the subsequent 5 different boards. Some are interesting, some are not, but they're always valid with one known solution.
Starting Point
5 Random Results (sorry for the misaligned screenshots)
And a random 8x8 for good measure. The starting point was the same simple columns approach as above.

Updated answer: I implemented a new generator using the idea of "dual puzzles". This allows much sparser and higher quality puzzles than any previous method I know of. The code is on github. I'll try to write more details about how it works, but here is an example puzzle:
Old answer:
I have implemented the following algorithm in my numberlink solver and generator. In enforces the rule, that a path can never touch itself, which is normal in most 'hardcore' numberlink apps and puzzles
First the board is tiled with 2x1 dominos in a simple, deterministic way.
If this is not possible (on an odd area paper), the bottom right corner is
left as a singleton.
Then the dominos are randomly shuffled by rotating random pairs of neighbours.
This is is not done in the case of width or height equal to 1.
Now, in the case of an odd area paper, the bottom right corner is attached to
one of its neighbour dominos. This will always be possible.
Finally, we can start finding random paths through the dominos, combining them
as we pass through. Special care is taken not to connect 'neighbour flows'
which would create puzzles that 'double back on themselves'.
Before the puzzle is printed we 'compact' the range of colours used, as much as possible.
The puzzle is printed by replacing all positions that aren't flow-heads with a .
My numberlink format uses ascii characters instead of numbers. Here is an example:
$ bin/numberlink --generate=35x20
Warning: Including non-standard characters in puzzle
35 20
....bcd.......efg...i......i......j
.kka........l....hm.n....n.o.......
.b...q..q...l..r.....h.....t..uvvu.
....w.....d.e..xx....m.yy..t.......
..z.w.A....A....r.s....BB.....p....
.D.........E.F..F.G...H.........IC.
.z.D...JKL.......g....G..N.j.......
P...a....L.QQ.RR...N....s.....S.T..
U........K......V...............T..
WW...X.......Z0..M.................
1....X...23..Z0..........M....44...
5.......Y..Y....6.........C.......p
5...P...2..3..6..VH.......O.S..99.I
........E.!!......o...."....O..$$.%
.U..&&..J.\\.(.)......8...*.......+
..1.......,..-...(/:.."...;;.%+....
..c<<.==........)./..8>>.*.?......#
.[..[....]........:..........?..^..
..._.._.f...,......-.`..`.7.^......
{{......].....|....|....7.......#..
And here I run it through my solver (same seed):
$ bin/numberlink --generate=35x20 | bin/numberlink --tubes
Found a solution!
┌──┐bcd───┐┌──efg┌─┐i──────i┌─────j
│kka│└───┐││l┌─┘│hm│n────n┌o│┌────┐
│b──┘q──q│││l│┌r└┐│└─h┌──┐│t││uvvu│
└──┐w┌───┘d└e││xx│└──m│yy││t││└──┘│
┌─z│w│A────A┌┘└─r│s───┘BB││┌┘└p┌─┐│
│D┐└┐│┌────E│F──F│G──┐H┐┌┘││┌──┘IC│
└z└D│││JKL┌─┘┌──┐g┌─┐└G││N│j│┌─┐└┐│
P──┐a││││L│QQ│RR└┐│N└──┘s││┌┘│S│T││
U─┐│┌┘││└K└─┐└─┐V││└─────┘││┌┘││T││
WW│││X││┌──┐│Z0││M│┌──────┘││┌┘└┐││
1┐│││X│││23││Z0│└┐││┌────M┌┘││44│││
5│││└┐││Y││Y│┌─┘6││││┌───┐C┌┘│┌─┘│p
5││└P│││2┘└3││6─┘VH│││┌─┐│O┘S┘│99└I
┌┘│┌─┘││E┐!!│└───┐o┘│││"│└─┐O─┘$$┌%
│U┘│&&│└J│\\│(┐)┐└──┘│8││┌*└┐┌───┘+
└─1└─┐└──┘,┐│-└┐│(/:┌┘"┘││;;│%+───┘
┌─c<<│==┌─┐││└┐│)│/││8>>│*┌?│┌───┐#
│[──[└─┐│]││└┐│└─┘:┘│└──┘┌┘┌┘?┌─^││
└─┐_──_│f││└,│└────-│`──`│7┘^─┘┌─┘│
{{└────┘]┘└──┘|────|└───7└─────┘#─┘
I've tested replacing step (4) with a function that iteratively, randomly merges two neighboring paths. However it game much denser puzzles, and I already think the above is nearly too dense to be difficult.
Here is a list of problems I've generated of different size: https://github.com/thomasahle/numberlink/blob/master/puzzles/inputs3

The most straightforward way to create such a level is to find a way to solve it. This way, you can basically generate any random starting configuration and determine if it is a valid level by trying to have it solved. This will generate the most diverse levels.
And even if you find a way to generate the levels some other way, you'll still want to apply this solving algorithm to prove that the generated level is any good ;)
Brute-force enumerating
If the board has a size of NxN cells, and there are also N colours available, brute-force enumerating all possible configurations (regardless of wether they form actual paths between start and end nodes) would take:
N^2 cells total
2N cells already occupied with start and end nodes
N^2 - 2N cells for which the color has yet to be determined
N colours available.
N^(N^2 - 2N) possible combinations.
So,
For N=5, this means 5^15 = 30517578125 combinations.
For N=6, this means 6^24 = 4738381338321616896 combinations.
In other words, the number of possible combinations is pretty high to start with, but also grows ridiculously fast once you start making the board larger.
Constraining the number of cells per color
Obviously, we should try to reduce the number of configurations as much as possible. One way of doing that is to consider the minimum distance ("dMin") between each color's start and end cell - we know that there should at least be this much cells with that color. Calculating the minimum distance can be done with a simple flood fill or Dijkstra's algorithm.
(N.B. Note that this entire next section only discusses the number of cells, but does not say anything about their locations)
In your example, this means (not counting the start and end cells)
dMin(orange) = 1
dMin(red) = 1
dMin(green) = 5
dMin(yellow) = 3
dMin(blue) = 5
This means that, of the 15 cells for which the color has yet to be determined, there have to be at least 1 orange, 1 red, 5 green, 3 yellow and 5 blue cells, also making a total of 15 cells.
For this particular example this means that connecting each color's start and end cell by (one of) the shortest paths fills the entire board - i.e. after filling the board with the shortest paths no uncoloured cells remain. (This should be considered "luck", not every starting configuration of the board will cause this to happen).
Usually, after this step, we have a number of cells that can be freely coloured, let's call this number U. For N=5,
U = 15 - (dMin(orange) + dMin(red) + dMin(green) + dMin(yellow) + dMin(blue))
Because these cells can take any colour, we can also determine the maximum number of cells that can have a particular colour:
dMax(orange) = dMin(orange) + U
dMax(red) = dMin(red) + U
dMax(green) = dMin(green) + U
dMax(yellow) = dMin(yellow) + U
dMax(blue) = dMin(blue) + U
(In this particular example, U=0, so the minimum number of cells per colour is also the maximum).
Path-finding using the distance constraints
If we were to brute force enumerate all possible combinations using these color constraints, we would have a lot less combinations to worry about. More specifically, in this particular example we would have:
15! / (1! * 1! * 5! * 3! * 5!)
= 1307674368000 / 86400
= 15135120 combinations left, about a factor 2000 less.
However, this still doesn't give us the actual paths. so a better idea would be to a backtracking search, where we process each colour in turn and attempt to find all paths that:
doesn't cross an already coloured cell
Is not shorter than dMin(colour) and not longer than dMax(colour).
The second criteria will reduce the number of paths reported per colour, which causes the total number of paths to be tried to be greatly reduced (due to the combinatorial effect).
In pseudo-code:
function SolveLevel(initialBoard of size NxN)
{
foreach(colour on initialBoard)
{
Find startCell(colour) and endCell(colour)
minDistance(colour) = Length(ShortestPath(initialBoard, startCell(colour), endCell(colour)))
}
//Determine the number of uncoloured cells remaining after all shortest paths have been applied.
U = N^(N^2 - 2N) - (Sum of all minDistances)
firstColour = GetFirstColour(initialBoard)
ExplorePathsForColour(
initialBoard,
firstColour,
startCell(firstColour),
endCell(firstColour),
minDistance(firstColour),
U)
}
}
function ExplorePathsForColour(board, colour, startCell, endCell, minDistance, nrOfUncolouredCells)
{
maxDistance = minDistance + nrOfUncolouredCells
paths = FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance)
foreach(path in paths)
{
//Render all cells in 'path' on a copy of the board
boardCopy = Copy(board)
boardCopy = ApplyPath(boardCopy, path)
uRemaining = nrOfUncolouredCells - (Length(path) - minDistance)
//Recursively explore all paths for the next colour.
nextColour = NextColour(board, colour)
if(nextColour exists)
{
ExplorePathsForColour(
boardCopy,
nextColour,
startCell(nextColour),
endCell(nextColour),
minDistance(nextColour),
uRemaining)
}
else
{
//No more colours remaining to draw
if(uRemaining == 0)
{
//No more uncoloured cells remaining
Report boardCopy as a result
}
}
}
}
FindAllPaths
This only leaves FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance) to be implemented. The tricky thing here is that we're not searching for the shortest paths, but for any paths that fall in the range determined by minDistance and maxDistance. Hence, we can't just use Dijkstra's or A*, because they will only record the shortest path to each cell, not any possible detours.
One way of finding these paths would be to use a multi-dimensional array for the board, where
each cell is capable of storing multiple waypoints, and a waypoint is defined as the pair (previous waypoint, distance to origin). The previous waypoint is needed to be able to reconstruct the entire path once we've reached the destination, and the distance to origin
prevents us from exceeding the maxDistance.
Finding all paths can then be done by using a flood-fill like exploration from the startCell outwards, where for a given cell, each uncoloured or same-as-the-current-color-coloured neigbour is recursively explored (except the ones that form our current path to the origin) until we reach either the endCell or exceed the maxDistance.
An improvement on this strategy is that we don't explore from the startCell outwards to the endCell, but that we explore from both the startCell and endCell outwards in parallel, using Floor(maxDistance / 2) and Ceil(maxDistance / 2) as the respective maximum distances. For large values of maxDistance, this should reduce the number of explored cells from 2 * maxDistance^2 to maxDistance^2.

I think you'll want to do this in two steps. Step 1) find a set of non-intersecting paths that connect all your points, then 2) Grow/shift those paths to fill the entire board
My thoughts on Step 1 are to essentially perform Dijkstra like algorithm on all points simultaneously, growing together the paths. Similar to Dijkstra, I think you'll want to flood-fill out from each of your points, chosing which node to search next using some heuristic (My hunch says chosing points with the least degrees of freedom first, then by distance, might be a good one). Very differently from Dijkstra though I think we might be stuck with having to backtrack when we have multiple paths attempting to grow into the same node. (This could of course be fairly problematic on bigger maps, but might not be a big deal on small maps like the one you have above.)
You may also solve for some of the easier paths before you start the above algorithm, mainly to cut down on the number of backtracks needed. In specific, if you can make a trace between points along the edge of the board, you can guarantee that connecting those two points in that fashion would never interfere with other paths, so you can simply fill those in and take those guys out of the equation. You could then further iterate on this until all of these "quick and easy" paths are found by tracing along the borders of the board, or borders of existing paths. That algorithm would actually completely solve the above example board, but would undoubtedly fail elsewhere .. still, it would be very cheap to perform and would reduce your search time for the previous algorithm.
Alternatively
You could simply do a real Dijkstra's algorithm between each set of points, pathing out the closest points first (or trying them in some random orders a few times). This would probably work for a fair number of cases, and when it fails simply throw out the map and generate a new one.
Once you have Step 1 solved, Step 2 should be easier, though not necessarily trivial. To grow your paths, I think you'll want to grow your paths outward (so paths closest to walls first, growing towards the walls, then other inner paths outwards, etc.). To grow, I think you'll have two basic operations, flipping corners, and expanding into into adjacent pairs of empty squares.. that is to say, if you have a line like
.v<<.
v<...
v....
v....
First you'll want to flip the corners to fill in your edge spaces
v<<<.
v....
v....
v....
Then you'll want to expand into neighboring pairs of open space
v<<v.
v.^<.
v....
v....
v<<v.
>v^<.
v<...
v....
etc..
Note that what I've outlined wont guarantee a solution if one exists, but I think you should be able to find one most of the time if one exists, and then in the cases where the map has no solution, or the algorithm fails to find one, just throw out the map and try a different one :)

You have two choices:
Write a custom solver
Brute force it.
I used option (2) to generate Boggle type boards and it is VERY successful. If you go with Option (2), this is how you do it:
Tools needed:
Write a A* solver.
Write a random board creator
To solve:
Generate a random board consisting of only endpoints
while board is not solved:
get two endpoints closest to each other that are not yet solved
run A* to generate path
update board so next A* knows new board layout with new path marked as un-traversable.
At exit of loop, check success/fail (is whole board used/etc) and run again if needed
The A* on a 10x10 should run in hundredths of a second. You can probably solve 1k+ boards/second. So a 10 second run should get you several 'usable' boards.
Bonus points:
When generating levels for a IAP (in app purchase) level pack, remember to check for mirrors/rotations/reflections/etc so you don't have one board a copy of another (which is just lame).
Come up with a metric that will figure out if two boards are 'similar' and if so, ditch one of them.

Parabolic knapsack

Lets say I have a parabola. Now I also have a bunch of sticks that are all of the same width (yes my drawing skills are amazing!). How can I stack these sticks within the parabola such that I am minimizing the space it uses as much as possible? I believe that this falls under the category of Knapsack problems, but this Wikipedia page doesn't appear to bring me closer to a real world solution. Is this a NP-Hard problem?
In this problem we are trying to minimize the amount of area consumed (eg: Integral), which includes vertical area.

I cooked up a solution in JavaScript using processing.js and HTML5 canvas.
This project should be a good starting point if you want to create your own solution. I added two algorithms. One that sorts the input blocks from largest to smallest and another that shuffles the list randomly. Each item is then attempted to be placed in the bucket starting from the bottom (smallest bucket) and moving up until it has enough space to fit.
Depending on the type of input the sort algorithm can give good results in O(n^2). Here's an example of the sorted output.
Here's the insert in order algorithm.
function solve(buckets, input) {
var buckets_length = buckets.length,
results = [];
for (var b = 0; b < buckets_length; b++) {
results[b] = [];
}
input.sort(function(a, b) {return b - a});
input.forEach(function(blockSize) {
var b = buckets_length - 1;
while (b > 0) {
if (blockSize <= buckets[b]) {
results[b].push(blockSize);
buckets[b] -= blockSize;
break;
}
b--;
}
});
return results;
}
Project on github - https://github.com/gradbot/Parabolic-Knapsack
It's a public repo so feel free to branch and add other algorithms. I'll probably add more in the future as it's an interesting problem.

Simplifying
First I want to simplify the problem, to do that:
I switch the axes and add them to each other, this results in x2 growth
I assume it is parabola on a closed interval [a, b], where a = 0 and for this example b = 3
Lets say you are given b (second part of interval) and w (width of a segment), then you can find total number of segments by n=Floor[b/w]. In this case there exists a trivial case to maximize Riemann sum and function to get i'th segment height is: f(b-(b*i)/(n+1))). Actually it is an assumption and I'm not 100% sure.
Max'ed example for 17 segments on closed interval [0, 3] for function Sqrt[x] real values:
And the segment heights function in this case is Re[Sqrt[3-3*Range[1,17]/18]], and values are:
Exact form:
{Sqrt[17/6], 2 Sqrt[2/3], Sqrt[5/2],
Sqrt[7/3], Sqrt[13/6], Sqrt[2],
Sqrt[11/6], Sqrt[5/3], Sqrt[3/2],
2/Sqrt[3], Sqrt[7/6], 1, Sqrt[5/6],
Sqrt[2/3], 1/Sqrt[2], 1/Sqrt[3],
1/Sqrt[6]}
Approximated form:
{1.6832508230603465,
1.632993161855452, 1.5811388300841898, 1.5275252316519468, 1.4719601443879744, 1.4142135623730951, 1.35400640077266, 1.2909944487358056, 1.224744871391589, 1.1547005383792517, 1.0801234497346435, 1, 0.9128709291752769, 0.816496580927726, 0.7071067811865475, 0.5773502691896258, 0.4082482904638631}
What you have archived is a Bin-Packing problem, with partially filled bin.
Finding b
If b is unknown or our task is to find smallest possible b under what all sticks form the initial bunch fit. Then we can limit at least b values to:
lower limit : if sum of segment heights = sum of stick heights
upper limit : number of segments = number of sticks longest stick < longest segment height
One of the simplest way to find b is to take a pivot at (higher limit-lower limit)/2 find if solution exists. Then it becomes new higher or lower limit and you repeat the process until required precision is met.
When you are looking for b you do not need exact result, but suboptimal and it would be much faster if you use efficient algorithm to find relatively close pivot point to actual b.
For example:
sort the stick by length: largest to smallest
start 'putting largest items' into first bin thy fit

This is equivalent to having multiple knapsacks (assuming these blocks are the same 'height', this means there's one knapsack for each 'line'), and is thus an instance of the bin packing problem.
See http://en.wikipedia.org/wiki/Bin_packing

How can I stack these sticks within the parabola such that I am minimizing the (vertical) space it uses as much as possible?
Just deal with it like any other Bin Packing problem. I'd throw meta-heuristics on it (such as tabu search, simulated annealing, ...) since those algorithms aren't problem specific.
For example, if I'd start from my Cloud Balance problem (= a form of Bin Packing) in Drools Planner. If all the sticks have the same height and there's no vertical space between 2 sticks on top of each other, there's not much I'd have to change:
Rename Computer to ParabolicRow. Remove it's properties (cpu, memory, bandwith). Give it a unique level (where 0 is the lowest row). Create a number of ParabolicRows.
Rename Process to Stick
Rename ProcessAssignement to StickAssignment
Rewrite the hard constraints so it checks if there's enough room for the sum of all Sticks assigned to a ParabolicRow.
Rewrite the soft constraints to minimize the highest level of all ParabolicRows.

I'm very sure it is equivalent to bin-packing:
informal reduction
Be x the width of the widest row, make the bins 2x big and create for every row a placeholder element which is 2x-rowWidth big. So two placeholder elements cannot be packed into one bin.
To reduce bin-packing on parabolic knapsack you just create placeholder elements for all rows that are bigger than the needed binsize with size width-binsize. Furthermore add placeholders for all rows that are smaller than binsize which fill the whole row.
This would obviously mean your problem is NP-hard.
For other ideas look here maybe: http://en.wikipedia.org/wiki/Cutting_stock_problem

Most likely this is the 1-0 Knapsack or a bin-packing problem. This is a NP hard problem and most likely this problem I don't understand and I can't explain to you but you can optimize with greedy algorithms. Here is a useful article about it http://www.developerfusion.com/article/5540/bin-packing that I use to make my php class bin-packing at phpclasses.org.

Props to those who mentioned the fact that the levels could be at varying heights (ex: assuming the sticks are 1 'thick' level 1 goes from 0.1 unit to 1.1 units, or it could go from 0.2 to 1.2 units instead)
You could of course expand the "multiple bin packing" methodology and test arbitrarily small increments. (Ex: run the multiple binpacking methodology with levels starting at 0.0, 0.1, 0.2, ... 0.9) and then choose the best result, but it seems like you would get stuck calulating for an infinite amount of time unless you had some methodlogy to verify that you had gotten it 'right' (or more precisely, that you had all the 'rows' correct as to what they contained, at which point you could shift them down until they met the edge of the parabola)
Also, the OP did not specify that the sticks had to be laid horizontally - although perhaps the OP implied it with those sweet drawings.
I have no idea how to optimally solve such an issue, but i bet there are certain cases where you could randomly place sticks and then test if they are 'inside' the parabola, and it would beat out any of the methodologies relying only on horizontal rows.
(Consider the case of a narrow parabola that we are trying to fill with 1 long stick.)
I say just throw them all in there and shake them ;)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio