I am writing an AI for a text-based game, and I am stuck on how to determine the thinnest section of a wall. For example, the following represents the 2D map, where '^' is the character who wants to get through a wall of '*' characters to the location marked 'X':
------------------
| X * |
|***** |
|**** |
|*** |
|** ^ |
| |
| |
| |
------------------
I've been thinking about this one for a couple of days straight now and I've run out of ideas. I've tried using the A* algorithm, making the g-cost very high whenever the path tries to go through a wall character. Unfortunately, the algorithm then decides never to path-find through the wall.
The agent can only move left-right-up-down and not diagonally, one space at a time.
The thinnest crossing in the above example is one, as the character only has to travel through one '*' cell.
I just need a couple simple ideas :)
All of the popular graph search algorithms are typically formulated with real-number (i.e. float/double) costs, but this isn't a requirement. All you really need for the cost is something that has a strict ordering and operations like addition.
You could apply standard A* to this.
Define the costs to have the form (a,b)
a is number of moves on wall cells
b is number of moves on normal cells.
Define the ordering on these costs as follows:
[(a1,b1) < (a2,b2)] == [(a1<a2) || (a1==a2 && b1<b2)]
This is just a dictionary ordering where we always prefer fewer moves on wall cells before we prefer fewer moves on normal cells.
Define the addition operation on these costs as follows:
(a1,b1) + (a2,b2) == (a1+a2,b1+b2)
Define the heuristic (i.e. the lower-bound estimate on the remaining distance to the goal) to be (0,b), where b is the Manhattan distance to the goal.
One immediate objection might be "With that heuristic, the entire space outside of the wall will have to be explored before ever trying to pass through the wall!" -- But that's exactly what was asked for.
With the information and requirements you've given, that's actually the optimal A* heuristic.
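For illustration, here is a minimal sketch of this idea in Python (the grid encoding, with '*' for wall cells, and all names are my assumptions; Python's built-in tuple comparison conveniently supplies exactly the dictionary ordering described above):
import heapq

def astar_thin_wall(grid, start, goal):
    """A* with pair costs (wall moves, normal moves), compared
    lexicographically. grid[y][x] == '*' marks a wall cell."""
    def h(x, y):
        # admissible lower bound: no wall moves, Manhattan distance
        return (0, abs(x - goal[0]) + abs(y - goal[1]))

    height, width = len(grid), len(grid[0])
    best = {start: (0, 0)}
    pq = [(h(*start), (0, 0), start)]
    while pq:
        _, g, (x, y) = heapq.heappop(pq)
        if (x, y) == goal:
            return g                      # (walls crossed, open steps)
        if g > best[(x, y)]:
            continue                      # stale queue entry
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                step = (1, 0) if grid[ny][nx] == '*' else (0, 1)
                ng = (g[0] + step[0], g[1] + step[1])   # componentwise addition
                if (nx, ny) not in best or ng < best[(nx, ny)]:
                    best[(nx, ny)] = ng
                    hx, hy = h(nx, ny)
                    heapq.heappush(pq, ((ng[0] + hx, ng[1] + hy), ng, (nx, ny)))
    return None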
A more complicated approach that could give significantly better best-case performance would be to combine the above with a bidirectional search. If your goal is inside of a tiny walled area, the bidirectional search can find some candidate "cheapest paths through the wall" very early on in the search.
Just make it a weighted graph, and give all the "walls" an absurdly large weight.
Be careful not to overflow your integers.
Assuming any number of normal moves is always cheaper than going through the wall (meaning 10000000000 non-through-wall moves are cheaper than 1 through-wall move; otherwise setting the cost appropriately would work), every algorithm I can think of has a special case that involves searching practically the whole map, so...
Do an exhaustive A* from the source, stopping at walls, recording the shortest path at each position.
For each explored position next to a wall, step through the wall. Then do a combined A* from all of them (where applicable), again stopping at walls.
For each of these newly explored positions next to walls, step through the wall and continue the A*.
And so on... until we find the target.
Skip already explored positions - the new path should always be longer than the one already there.
The only reason you really want to do an A* instead of a BFS is that it allows less exploring once you reach the area containing the target. Whether this is more efficient will depend on the map.
As Sint mentioned, if the start is always in a wide open area and the finish is in a small one, reversing this search would be more efficient. But that only helps if you know in advance which case you're in; detecting it is unlikely to be efficient, and once you have, you've done most of the work already and you would lose out when both areas are of reasonable size.
An example:
X** |
** |
** |
** ^|
Initial BFS:
X**3
**32
**21
**10
Step through the wall and BFS (no BFS happens since they have nowhere to go but through walls):
(the ones we can ignore are marked with %)
X*4%
*4%%
*3%%
*2%%
Step into wall and BFS (BFS onto target):
65%%
5%%%
4%%%
3%%%
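A sketch of that layering in Python, simplified to return only the minimum number of wall cells crossed (the full version would also track path lengths for tie-breaking; the grid encoding is my assumption):
from collections import deque

def layered_search(grid, start, goal):
    """Exhaust all cells reachable without crossing walls, then step
    through one layer of wall cells, and repeat until the goal is found.
    grid[y][x] == '*' marks a wall; returns wall cells crossed, or None."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    frontier = [start]
    seen[start[1]][start[0]] = True
    walls_crossed = 0
    while frontier:
        queue = deque(frontier)
        boundary = []                      # wall cells touching this region
        while queue:
            x, y = queue.popleft()
            if (x, y) == goal:
                return walls_crossed
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx]:
                    seen[ny][nx] = True
                    if grid[ny][nx] == '*':
                        boundary.append((nx, ny))  # step through later
                    else:
                        queue.append((nx, ny))
        frontier = boundary                # cross one layer of wall cells
        walls_crossed += 1
    return None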
Use Dijkstra.
Since you're dealing with a text-based game, I find it highly unlikely that you're talking about maps larger than 1000 by 1000 characters. This will give you the guaranteed best answer at a very low cost, O(n log n), with very simple and straightforward code.
Basically, each search state will have to keep track of two things: how many walls you have stepped through so far, and how many regular empty spaces. This can be encoded into a single integer, for both the search and the mark matrix, by assuming for instance that each wall has a cost of 2^16 to be traversed. Thus, the natural ordering of Dijkstra will ensure that the paths with the fewest walls get tried first, and that after traversing a wall, you do not repeat paths that you had already reached without going through as many walls.
Basically, assuming a 32-bit integer, a state that has passed through 5 empty spaces and 3 walls will look like this: 0000000000000011 0000000000000101. If your maps are really huge, maze-like, full of walls, nearly wall-free, or whatnot, you can tweak this representation to use more or fewer bits for each piece of information, or even use a longer integer if you feel more comfortable with it, as this specific encoding example would "overflow" if there existed a shortest path that required walking through more than 65 thousand empty spaces.
The main advantage of using a single integer rather than two (for walls/empty spaces) is that you can have a single, simple int mark[MAXN][MAXM]; matrix to keep track of the search. If you have reached a specific square while walking through 5 walls, you do not have to check whether you could have reached it with 4, 3 or fewer walls to prevent the propagation of a useless state: as long as you store the number of walls in the higher bits, that information is automatically embedded in your integer, and you will never repeat a path while having a higher "wall cost".
Here is a full implementation of the algorithm in C++; consider it pseudo-code for better visualization and understanding of the idea presented above :)
int rx[4] = {1,0,-1,0};
int ry[4] = {0,1,0,-1};
int text[height][width]; // your map ('*' wall, ' ' open, 'X' target)
int mark[height][width]; // set every position to "infinite" cost
int parent[height][width]; // to recover the final path
// state layout (64 bits): cost in the high 32 bits (walls in its upper
// 16 bits, plain steps in its lower 16), then 16 bits of y, 16 bits of x
priority_queue<int64_t, vector<int64_t>, greater<int64_t> > q;
int64_t state = ((int64_t)initial_y << 16) + initial_x;
mark[initial_y][initial_x] = 0;
q.push(state);
while(!q.empty()){
    state = q.top(); q.pop();
    int x = state & 0xFFFF; // the coordinate fields are 16 bits wide
    int y = (state >> 16) & 0xFFFF;
    int cost = state >> 32;
    if(cost > mark[y][x]) continue; // stale queue entry, skip it
    if(text[y][x] == 'X') break; // reached the target
    for(int i = 0; i < 4; ++i){
        int xx = x + rx[i];
        int yy = y + ry[i];
        if(xx > -1 && xx < width && yy > -1 && yy < height){
            int newcost = cost;
            if(text[yy][xx] == '*') newcost += 1 << 16; // wall step
            else newcost += 1; // normal step
            if(newcost < mark[yy][xx]){
                mark[yy][xx] = newcost;
                parent[yy][xx] = i; // you know which direction you came from
                // push the neighbour (yy,xx), not the current cell
                q.push(((int64_t)newcost << 32) + (yy << 16) + xx);
            }
        }
    }
}
// The number of walls in the final answer:
// walls = state >> 48;
// steps = (state >> 32) & 0xFFFF; // (non walls)
// you can recover the exact path by traversing back using the information in parent[][]
Can someone help in solving the below problem using a DP technique?
No need for code; an idea should be enough.
Marvel is coming up with a new superhero named Jumping Jack. The co-creator of this superhero is a mathematician, and he adds a mathematical touch to the character's powers.
So, one of Jumping Jack's most prominent powers is jumping distances. But, this superpower comes with certain restrictions.
Jumping Jack can only jump —
To the distance that is one kilometre less than the current distance. For example, if he is 12 km away from the destination, he won't be able to jump directly to it, since he can only jump to a location 11 km away.
To a distance that is half the current distance. For example, if Jumping Jack is 12 km away from the destination, again, he won't be able to jump directly to it since he can only jump to a location 6 km away.
To a distance that is ⅓rd the current distance. For example, if Jumping Jack is 12 km away from the destination, once more, he won't be able to jump directly to it since he can only jump to a location 4 km away.
So, you need to help the superhero develop an algorithm to reach the destination in the minimum number of steps. The destination is defined as the place where the distance becomes 1. Jumping Jack should cover the last 1 km running! Also, he can only jump to a destination that is an integer distance away from the main destination. For example, if he is at 10 km, by jumping 1/3rd the distance, he cannot reach 10/3rd the distance. He has to either jump to 5 or 9.
So, you have to find the minimum number of jumps required to reach a destination. For instance, if the destination is 10 km away, there are two ways to reach it:
10 -> 5 -> 4 -> 2 -> 1 (four jumps)
10 -> 9 -> 3 -> 1 (three jumps)
The minimum of these is three, so the superhero takes a minimum of three jumps to reach the point.
You should keep the following 2 points in mind when solving any dynamic programming problem:
Optimal Substructure (to find the minimum number of jumps)
Overlapping subproblems (if you encounter the same subproblem again, don't solve it; rather, use the already computed result - yes, you'll have to store the computed results of subproblems for future reference)
Always try to come up with a recursive solution to the problem at hand (now don't directly go ahead and look at the recursive solution below; rather, first try it yourself):
calculateMinJumps(int currentDistance) {
    if(currentDistance == 1) {
        return 0 // destination reached - no more jumps needed
    } else {
        // find all the places you can jump to
        jumps1 = calculateMinJumps(currentDistance - 1) + 1
        jumps2 = jumps3 = infinity
        if(currentDistance % 2 == 0)
            jumps2 = calculateMinJumps(currentDistance / 2) + 1
        if(currentDistance % 3 == 0)
            jumps3 = calculateMinJumps(currentDistance / 3) + 1
        return minimum(jumps1, jumps2, jumps3) // jumps2 and jumps3 count only when they are valid
    }
}
Now you know that we have devised a recursive solution. The next step is to store the subproblem solutions in an array so that you can use them in the future and avoid re-computation. You can simply take a 1-D integer array and keep storing the computed results in it.
Keep in mind that if you go by top-down approach - it will be called memoization, and if you go by bottom-up approach - it will be called Dynamic programming. Have a look at this to see the exact difference between the two approaches.
Once you have a recursive solution in your mind, you can now think of constructing a bottom-up solution, or a top-down solution.
In Bottom-up solution strategy - you fill out the base cases first (Jumping Jack should cover the last 1 km running!) and then build upon it to reach to the final required solution.
Now, I'm not giving you the complete strategy, as it is a task for you to carry out - and you will indeed find the solution. :) As requested, the idea should be enough.
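For reference once you've tried it yourself, here is a minimal top-down (memoized) sketch in Python of the recursion above:
from functools import lru_cache

@lru_cache(maxsize=None)
def min_jumps(d):
    if d == 1:
        return 0                                 # he runs the last km
    best = min_jumps(d - 1) + 1                  # jump 1 km closer
    if d % 2 == 0:
        best = min(best, min_jumps(d // 2) + 1)  # jump to half the distance
    if d % 3 == 0:
        best = min(best, min_jumps(d // 3) + 1)  # jump to a third
    return best

print(min_jumps(10))  # 3, matching the example above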
Firstly, thinking about this coin-changing problem may help you understand yours, as they are much the same:
Coin change - DP
Secondly, if you know that your problem has a DP solution, you can usually do 4 steps to solve it. Of course you can omit one or all of the first 3 steps.
Find a backtracking solution. (omitted here)
Find the recursion formula for your problem, based on the backtracking solution. (described later)
Write recursive code based on the recursion formula. (omitted)
Write iterative code based on step 3. (omitted)
Finally, for your question, the formula is not hard to figure out:
minStepFor(distance_N) = Math.min(minStepFor(distance_N - 1) + 1,
                                  minStepFor(distance_N / 2) + 1,
                                  minStepFor(distance_N / 3) + 1)
Just imagine Jack standing at the distance-N point; he has at most three options for his first move: go to the N-1 point, the N/2 point, or the N/3 point (if N/2 or N/3 is not an integer, his choices are reduced).
For each of his choices, the minimum number of steps is minStepFor(remaining distance) + 1, since he has made 1 move, and surely he will try to use a minimum number of steps for the distance that remains. And the remaining distance for each choice is distance_N - 1, distance_N / 2 and distance_N / 3 respectively.
So that's the way to understand the formula, and with it, it is not hard to write a recursive solution.
Consider f[1] = 0, as the number of jumps required when Jumping Jack is 1 km away is zero.
Using this base value, solve f[2]...f[n] in the following manner:
for(int i = 2; i <= n; i++) {
    if(i % 2 == 0 && i % 3 == 0) {
        f[i] = Math.min(Math.min(f[i-1] + 1, f[i/2] + 1), f[i/3] + 1);
    } else if(i % 2 == 0) {
        // the 'else' matters: without it, this branch would overwrite
        // the combined case computed above
        f[i] = Math.min(f[i-1] + 1, f[i/2] + 1);
    } else if(i % 3 == 0) {
        f[i] = Math.min(f[i-1] + 1, f[i/3] + 1);
    } else {
        f[i] = f[i-1] + 1;
    }
}
return f[n];
You don't need to recursively solve the subproblem more than once!
Consider the following representation of a concrete slab element with reinforcement bars and holes.
I need an algorithm that automatically distributes lines over an arbitrary shape with different holes.
The main constraints are:
Lines cannot be outside of the region or inside a hole
The distance between two side-by-side lines cannot exceed a variable D
Lines have to be positioned on a fixed interval I, i.e. y mod I = 0, where y is the line's Y coordinate.
Each available point inside the shape cannot be further from a line than D/2
I want to optimize the solution by minimizing the total number of lines N. What kind of optimization algorithm would suit this problem?
I assume most approaches involve simplifying the shape into a raster (with pixel height I) and enabling or disabling each pixel. I thought this was an obvious LP problem and tried to set it up with GLPK, but I found it very hard to describe the problem using this simplified raster for an arbitrary number of lines. I also suspect that the solution space might be too big.
I have already implemented an algorithm in C# that does the job, but not very optimized. This is how it works:
Create a simplified raster of the geometry
Calculate a score for each cell using a complicated formula that takes possible line length and distances to other rods and obstacles into account.
Determine which cells need reinforcement (those where the number of free cells in the y direction exceeds D)
Pick the cell with the highest score that needs reinforcement, and reinforce it as far as possible in -x and +x directions
Repeat
Depending on the complicated formula, this works pretty well, but it starts giving unwanted results when placing the last few lines, since it can never move a line it has already placed.
Are there any other optimization techniques that I should have a look at?
I'm not sure what follows is what you want - I'm fairly sure it's not what you had in mind - but if it sounds reasonable you might try it out.
Because the distance is simply at most d, and can be anything less than that, it seems at first blush like a greedy algorithm should work here. Always place the next line(s) so that (1) as few as possible are needed and (2) they are as far away from existing lines as possible.
Assume you have an optimal algorithm for this problem, and it places the next line(s) at distance a <= d from the last line. Say it places b lines. Our greedy algorithm will definitely place no more than b lines (since the first criterion is to place as few as possible), and if it places b lines it will place them at distance c with a <= c <= d, since it then places the lines as far as possible.
If the greedy algorithm did not do what the optimal algorithm did, it differed in one of the following ways:
It placed the same number of lines, or fewer, farther away. Suppose the optimal algorithm had gone on to place b' lines at distance a' away at the next step. Then these lines would be at distance a + a' and there would be b + b' lines in total. But the greedy algorithm can mimic the optimal algorithm in this case by placing b' lines at displacement a + a', choosing c' = (a + a') - c. Since c >= a, we get c' <= a' <= d, so this is a legal placement.
It placed fewer lines closer together. This case is actually problematic. It is possible that this places k unnecessary lines, if any placement requires at least k lines and the farthest ones require more, and the arrangement of holes is chosen so that (e.g.) the distance it spans is a multiple of d.
So the greedy algorithm turns out not to work in case 2. However, it does in other cases. In particular, our observation in the first case is very useful: for any two placements (distance, lines) and (distance', lines'), if distance >= distance' and lines <= lines', the first placement is always to be preferred. This suggests the following algorithm:
PlaceLines(start, stop)
// if we are close enough to the other edge,
// don't place any more lines.
if start + d >= stop then return ([], 0)
// see how many lines we can place at distance
// d from the last placed lines. no need to
// ever place more lines than this
nmax = min_lines_at_distance(start + d)
// see how that selection pans out by recursively
// seeing how line placement works after choosing
// nmax lines at distance d from the last lines.
optimal = PlaceLines(start + d, stop)
optimal[0] = [d] . optimal[0]
optimal[1] = nmax + optimal[1]
// we only need to try fewer lines, never more
for n = 1 to nmax do
// find the max displacement a from the last placed
// lines where we can place n lines.
a = max_distance_for_lines(start, stop, n)
if a is undefined then continue
// see how that choice pans out by placing
// the rest of the lines
candidate = PlaceLines(start + a, stop)
candidate[0] = [a] . candidate[0]
candidate[1] = n + candidate[1]
// replace the last best placement with the
// one we just tried, if it turned out to be
// better than the last
if candidate[1] < optimal[1] then
optimal = candidate
// return the best placement we found
return optimal
This can be improved by memoization: put the results (seq, lines) into a cache indexed by (start, stop). That way, we can recognize when we are trying to compute assignments that have already been evaluated. I would expect this case to occur a lot, regardless of whether you use a coarse or a fine discretization for problem instances.
I won't get into details about how the min_lines_at_distance and max_distance_for_lines functions might work, but here is a word on each.
The first tells you, at a given displacement, how many lines are required to span the geometry. If you have pixelated your geometry and coloured holes black, this means looking at the row of cells at the indicated displacement, considering the contiguous segments of non-hole cells, and determining from there how many lines that implies.
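As a sketch, assuming the pixelated geometry is stored as grid[row][col] with True for in-shape cells and False for holes or outside (my representation, in Python):
def min_lines_at_distance(grid, row):
    # one line is needed per contiguous run of in-shape cells in this row
    count, inside = 0, False
    for cell in grid[row]:
        if cell and not inside:
            count += 1          # a new contiguous segment starts here
        inside = cell
    return count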
The second tells you, for a given candidate number of lines, the maximum distance from the current position at which that number of lines can be placed. You can make this better by having it tell you the maximum distance at which that number of lines, or fewer, could be placed. If you use this improvement, you can reverse the direction in which you iterate n and, writing f for this improved function:
if f(start, stop, x) = a and y < x, you only need to search up to a, not stop, from then on;
if f(start, stop, x) is undefined and y < x, you don't need to search any more.
Note that this function can be undefined if it is impossible to place n or fewer lines anywhere between start and stop.
Note also that you can memoize these functions separately to save repeated lookups. You can precompute min_lines_at_distance for each row and store it in a cache for later. Then, max_distance_for_lines could be a loop that checks the cache back to front inside two bounds.
Consider an array of points in 2D, 3D, (4D, ...) space (e.g. the nodes of an unstructured mesh). Initially, the index of a point in the array is not related to its position in space. In the simple case, assume I already know some nearest-neighbour connectivity graph.
I would like some heuristic which increases the probability that two points which are close to each other in space also have similar indices (i.e. are close in the array).
I understand that an exact solution is very hard (perhaps similar to the travelling salesman problem), but I don't need an exact solution, just something which increases the probability.
My ideas on solution:
A naive solution would be something like:
1. for each point "i" compute fitness E_i given by sum of distances in array (i.e. index-wise) from its spatial neighbors (i.e. space-wise)
E_i = -Sum_k ( abs( index(i)-index(k) ) )
where "k" are spatial nearest neighbors of "i"
2. for pairs of points (i,j) which have low fitness (E_i,E_j)
try to swap them,
if fitness improves, accept
but the detailed implementation and its performance optimization are not so clear.
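For what it's worth, here is a sketch of that swap heuristic in Python (it tries random pairs rather than the lowest-fitness pairs, and all names and data layouts are assumptions):
import random

def improve_ordering(order, neighbors, iterations=100000):
    """order[p] is the array index of point p; neighbors[p] lists the
    spatial nearest neighbours of p."""
    def cost(p):
        # -E_p: sum of index distances to p's spatial neighbours
        return sum(abs(order[p] - order[k]) for k in neighbors[p])

    n = len(order)
    for _ in range(iterations):
        i, j = random.randrange(n), random.randrange(n)
        before = cost(i) + cost(j)
        order[i], order[j] = order[j], order[i]      # tentative swap
        if cost(i) + cost(j) > before:               # fitness got worse
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order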
Another solution, which does not need precomputed nearest neighbours, would be based on some locality-sensitive hashing.
I think this could be quite a common problem, and good solutions may already exist - I do not want to reinvent the wheel.
Application:
improve cache locality, considering that memory access is often the bottleneck of graph traversal
it could accelerate interpolation on an unstructured grid, more specifically the search for nodes which are near the sample (e.g. the centers of radial basis functions).
I'd say space-filling curves (SFC) are the standard solution to map proximity in space to a linear ordering. The most common ones are Hilbert curves and Z-curves (Morton order).
Hilbert curves have the best proximity mapping, but they are somewhat expensive to calculate. Z-ordering still has a good proximity mapping but is very easy to calculate: it is sufficient to interleave the bits of each dimension. Assuming integer values, if you have a 64-bit 3D point (x,y,z), the z-value is x_0,y_0,z_0,x_1,y_1,z_1,...,x_63,y_63,z_63, i.e. a 192-bit value consisting of the first bit of every dimension, followed by the second bit of every dimension, and so on. If your array is ordered according to that z-value, points that are close in space are usually also close in the array.
Here are example functions that interleave (merge) values into a z-value (nBitsPerValue is usually 32 or 64):
public static long[] mergeLong(final int nBitsPerValue, long[] src) {
final int DIM = src.length;
int intArrayLen = (src.length*nBitsPerValue+63) >>> 6;
long[] trg = new long[intArrayLen];
long maskSrc = 1L << (nBitsPerValue-1);
long maskTrg = 0x8000000000000000L;
int srcPos = 0;
int trgPos = 0;
for (int j = 0; j < nBitsPerValue*DIM; j++) {
if ((src[srcPos] & maskSrc) != 0) {
trg[trgPos] |= maskTrg;
} else {
trg[trgPos] &= ~maskTrg;
}
maskTrg >>>= 1;
if (maskTrg == 0) {
maskTrg = 0x8000000000000000L;
trgPos++;
}
if (++srcPos == DIM) {
srcPos = 0;
maskSrc >>>= 1;
}
}
return trg;
}
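For comparison, here is a compact Python sketch of the same interleaving, used to sort an array of non-negative integer points by Morton order (the function names are mine):
def z_value(point, bits=32):
    # interleave the bits of all dimensions, most significant bit first
    z = 0
    for bit in range(bits - 1, -1, -1):
        for coord in point:
            z = (z << 1) | ((coord >> bit) & 1)
    return z

points = [(5, 1), (2, 2), (7, 6), (0, 3)]
points.sort(key=z_value)  # nearby points tend to end up adjacent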
You can also interleave the bits of floating-point values (if encoded with IEEE 754, as they usually are on standard computers), but this results in non-Euclidean distance properties. You may have to transform negative values first; see here, section 2.3.
EDIT
To answer the questions from the comments:
1) I understand how to make a space-filling curve for a regular rectangular grid. However, if I have randomly positioned floating points, several points can map into one box. Would that algorithm work in that case?
There are several ways to use floating-point (FP) values. The simplest is to convert them to integer values by multiplying them by a large constant, for example by 10^6 to preserve 6 digits of precision.
Another way is to use the bit-level representation of the FP value to turn it into an integer. This has the advantage that no precision is lost and you don't have to determine a multiplication constant. The disadvantage is that Euclidean distance metrics do not work anymore.
It works as follows: the trick is that floating-point values do not have infinite precision but are limited to 64 bits, hence they automatically form a grid. The difference from integer values is that floating-point values do not form a uniform square grid but a rectangular grid where the cells get bigger with growing distance from (0,0). The grid size is determined by how much precision is available at a given point: close to (0,0) the precision (= grid size) is 10^-28; close to (1,1) it is 10^-16 (see here). This distorted grid still has the proximity mapping, but distances are not Euclidean anymore.
Here is the code to do the transformation (Java, taken from here; in C++ you can reinterpret the bits of the double as an integer):
public static long toSortableLong(double value) {
long r = Double.doubleToRawLongBits(value);
return (r >= 0) ? r : r ^ 0x7FFFFFFFFFFFFFFFL;
}
public static double toDouble(long value) {
return Double.longBitsToDouble(value >= 0.0 ? value : value ^ 0x7FFFFFFFFFFFFFFFL);
}
These conversions preserve the ordering of the converted values, i.e. for every two FP values the resulting integers have the same ordering with respect to <, >, =. The non-Euclidean behaviour is caused by the exponent, which is encoded in the bit string. As mentioned above, this is also discussed here, section 2.3; however, the code there is slightly less optimized.
2) Is there some algorithm to iteratively update such a space-filling curve if my points move in space? (i.e. without reordering the whole array each time)
The space-filling curve imposes a specific ordering, so for every set of points there is only one valid ordering. If a point is moved, it has to be reinserted at the new position determined by its z-value.
The good news is that a small movement will often mean a point stays in the same 'area' of your array. So if you really use a fixed array, you only have to shift small parts of it.
If you have a lot of moving objects and the array is too cumbersome, you may want to look into 'moving object indexes' (MX-CIF quadtree, etc). I can personally recommend my very own PH-Tree. It is a kind of bitwise radix quadtree that uses a z-curve for internal ordering. It is quite efficient for updates (and other operations). However, I usually recommend it only for larger datasets; for small datasets a simple quadtree is usually good enough.
The problem you are trying to solve has meaning iff, given a point p and its nearest neighbour q, the nearest neighbour of q is p.
That is not trivial, since for example the two points can represent positions in a landscape, where one point sits high on a mountain: going from the bottom up the mountain costs more than going the other way around (from the mountain to the bottom). So, make sure you check that's not your case.
Since TilmannZ already proposed a solution, I would like to comment on the LSH you mentioned. I would not choose LSH here, since your points lie in a really low-dimensional space - it's not even 100 - so why use LSH?
I would go for CGAL's algorithms in that case, such as 2D NNS, or even a simple kd-tree. And if speed is critical, but space is not, then why not go for a quadtree (octree in 3D)? I have built one; it won't go beyond 10 dimensions in 8 GB of RAM.
If, however, you feel that your data may belong in a higher-dimensional space in the future, then I would suggest using:
LSH from Andoni, really cool guy.
FLANN, which offers another approach.
kd-GeRaF, which is developed by me.
I need some advice. I'm developing a game similar to Flow Free, wherein the game board is composed of a grid and colored dots, and the user has to connect dots of the same color without overlapping other lines, while using up ALL the free spaces on the board.
My question is about level creation. I wish to have the levels generated randomly (and they should at least be solvable by the generator itself, so that it can give players hints), and I am stumped as to what algorithm to use. Any suggestions?
Note: the image shows the objective of Flow Free, and it is the same objective as what I am developing.
Thanks for your help. :)
Consider solving your problem with a pair of simpler, more manageable algorithms: one algorithm that reliably creates simple, pre-solved boards and another that rearranges flows to make simple boards more complex.
The first part, building a simple pre-solved board, is trivial (if you want it to be) if you're using n flows on an nxn grid:
For each flow...
Place the head dot at the top of the first open column.
Place the tail dot at the bottom of that column.
Alternatively, you could provide your own hand-made starter boards to pass to the second part. The only goal of this stage is to get a valid board built, even if it's just trivial or predetermined, so it's worth keeping it simple.
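In code, that first stage might look like this (a Python sketch; flows are numbered 0 to n-1 and the board layout is my assumption):
def trivial_board(n):
    """Pre-solved n x n board: flow i fills column i, so its head dot
    sits at (0, i) and its tail dot at (n - 1, i)."""
    board = [[col for col in range(n)] for _ in range(n)]
    heads = [(0, i) for i in range(n)]
    tails = [(n - 1, i) for i in range(n)]
    return board, heads, tails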
The second part, rearranging the flows, involves looping over each flow, seeing which one can work with its neighboring flow to grow and shrink:
For some number of iterations...
Choose a random flow f.
If f is at the minimum length (say 3 squares long), skip to the next iteration because we can't shrink f right now.
If the head dot of f is next to a dot from another flow g (if more than one g to choose from, pick one at random)...
Move f's head dot one square along its flow (i.e., walk it one square towards the tail). f is now one square shorter and there's an empty square. (The puzzle is now unsolved.)
Move the neighboring dot from g into the empty square vacated by f. Now there's an empty square where g's dot moved from.
Fill in that empty spot with flow from g. Now g is one square longer than it was at the beginning of this iteration. (The puzzle is back to being solved as well.)
Repeat the previous step for f's tail dot.
The approach as it stands is limited (dots will always be neighbors) but it's easy to expand upon:
Add a step to loop through the body of flow f, looking for trickier ways to swap space with other flows...
Add a step that prevents a dot from moving to an old location...
Add any other ideas that you come up with.
The overall solution here is probably less than the ideal one that you're aiming for, but now you have two simple algorithms that you can flesh out further to serve the role of one large, all-encompassing algorithm. In the end, I think this approach is manageable, not cryptic, easy to tweak, and, if nothing else, a good place to start.
Update: I coded a proof-of-concept based on the steps above. Starting with the first 5x5 grid below, the process produced the subsequent 5 different boards. Some are interesting, some are not, but they're always valid with one known solution.
Starting Point
5 Random Results (sorry for the misaligned screenshots)
And a random 8x8 for good measure. The starting point was the same simple columns approach as above.
Updated answer: I implemented a new generator using the idea of "dual puzzles". This allows much sparser and higher-quality puzzles than any previous method I know of. The code is on github. I'll try to write more details about how it works, but here is an example puzzle:
Old answer:
I have implemented the following algorithm in my numberlink solver and generator. It enforces the rule that a path can never touch itself, which is normal in most 'hardcore' numberlink apps and puzzles:
First the board is tiled with 2x1 dominos in a simple, deterministic way. If this is not possible (on an odd-area paper), the bottom right corner is left as a singleton.
Then the dominos are randomly shuffled by rotating random pairs of neighbours. This is not done if the width or height equals 1.
Now, in the case of an odd-area paper, the bottom right corner is attached to one of its neighbour dominos. This will always be possible.
Finally, we can start finding random paths through the dominos, combining them as we pass through. Special care is taken not to connect 'neighbour flows', which would create puzzles that 'double back on themselves'.
Before the puzzle is printed we 'compact' the range of colours used, as much as possible.
The puzzle is printed by replacing all positions that aren't flow-heads with a '.'.
My numberlink format uses ASCII characters instead of numbers. Here is an example:
$ bin/numberlink --generate=35x20
Warning: Including non-standard characters in puzzle
35 20
....bcd.......efg...i......i......j
.kka........l....hm.n....n.o.......
.b...q..q...l..r.....h.....t..uvvu.
....w.....d.e..xx....m.yy..t.......
..z.w.A....A....r.s....BB.....p....
.D.........E.F..F.G...H.........IC.
.z.D...JKL.......g....G..N.j.......
P...a....L.QQ.RR...N....s.....S.T..
U........K......V...............T..
WW...X.......Z0..M.................
1....X...23..Z0..........M....44...
5.......Y..Y....6.........C.......p
5...P...2..3..6..VH.......O.S..99.I
........E.!!......o...."....O..$$.%
.U..&&..J.\\.(.)......8...*.......+
..1.......,..-...(/:.."...;;.%+....
..c<<.==........)./..8>>.*.?......#
.[..[....]........:..........?..^..
..._.._.f...,......-.`..`.7.^......
{{......].....|....|....7.......#..
And here I run it through my solver (same seed):
$ bin/numberlink --generate=35x20 | bin/numberlink --tubes
Found a solution!
┌──┐bcd───┐┌──efg┌─┐i──────i┌─────j
│kka│└───┐││l┌─┘│hm│n────n┌o│┌────┐
│b──┘q──q│││l│┌r└┐│└─h┌──┐│t││uvvu│
└──┐w┌───┘d└e││xx│└──m│yy││t││└──┘│
┌─z│w│A────A┌┘└─r│s───┘BB││┌┘└p┌─┐│
│D┐└┐│┌────E│F──F│G──┐H┐┌┘││┌──┘IC│
└z└D│││JKL┌─┘┌──┐g┌─┐└G││N│j│┌─┐└┐│
P──┐a││││L│QQ│RR└┐│N└──┘s││┌┘│S│T││
U─┐│┌┘││└K└─┐└─┐V││└─────┘││┌┘││T││
WW│││X││┌──┐│Z0││M│┌──────┘││┌┘└┐││
1┐│││X│││23││Z0│└┐││┌────M┌┘││44│││
5│││└┐││Y││Y│┌─┘6││││┌───┐C┌┘│┌─┘│p
5││└P│││2┘└3││6─┘VH│││┌─┐│O┘S┘│99└I
┌┘│┌─┘││E┐!!│└───┐o┘│││"│└─┐O─┘$$┌%
│U┘│&&│└J│\\│(┐)┐└──┘│8││┌*└┐┌───┘+
└─1└─┐└──┘,┐│-└┐│(/:┌┘"┘││;;│%+───┘
┌─c<<│==┌─┐││└┐│)│/││8>>│*┌?│┌───┐#
│[──[└─┐│]││└┐│└─┘:┘│└──┘┌┘┌┘?┌─^││
└─┐_──_│f││└,│└────-│`──`│7┘^─┘┌─┘│
{{└────┘]┘└──┘|────|└───7└─────┘#─┘
I've tested replacing step (4) with a function that iteratively, randomly merges two neighbouring paths. However, it gave much denser puzzles, and I already think the above is nearly too dense to be difficult.
Here is a list of problems I've generated of different size: https://github.com/thomasahle/numberlink/blob/master/puzzles/inputs3
The most straightforward way to create such a level is to find a way to solve it. That way, you can basically generate any random starting configuration and determine whether it is a valid level by trying to solve it. This will generate the most diverse levels.
And even if you find a way to generate the levels some other way, you'll still want to apply this solving algorithm to prove that the generated level is any good ;)
Brute-force enumerating
If the board has a size of NxN cells, and there are also N colours available, brute-force enumerating all possible configurations (regardless of whether they form actual paths between start and end nodes) would take:
N^2 cells total
2N cells already occupied with start and end nodes
N^2 - 2N cells for which the color has yet to be determined
N colours available.
N^(N^2 - 2N) possible combinations.
So,
For N=5, this means 5^15 = 30517578125 combinations.
For N=6, this means 6^24 = 4738381338321616896 combinations.
In other words, the number of possible combinations is pretty high to start with, but also grows ridiculously fast once you start making the board larger.
Constraining the number of cells per color
Obviously, we should try to reduce the number of configurations as much as possible. One way of doing that is to consider the minimum distance ("dMin") between each colour's start and end cell: we know that there must be at least that many cells of that colour. Calculating the minimum distance can be done with a simple flood fill or Dijkstra's algorithm.
(N.B. This entire next section only discusses the number of cells; it does not say anything about their locations.)
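A BFS flood-fill sketch for dMin in Python (assuming board[y][x] is None for uncoloured cells and positions are (x, y) tuples; it returns the number of intermediate cells on a shortest path, i.e. the dMin values listed next):
from collections import deque

def d_min(board, start, end):
    width, height = len(board[0]), len(board)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == end:
            return dist[(x, y)] - 1   # don't count the end cell itself
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in dist
                    and (board[ny][nx] is None or (nx, ny) == end)):
                dist[(nx, ny)] = dist[(x, y)] + 1
                queue.append((nx, ny))
    return None                       # end not reachable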
In your example, this means (not counting the start and end cells)
dMin(orange) = 1
dMin(red) = 1
dMin(green) = 5
dMin(yellow) = 3
dMin(blue) = 5
This means that, of the 15 cells for which the color has yet to be determined, there have to be at least 1 orange, 1 red, 5 green, 3 yellow and 5 blue cells, also making a total of 15 cells.
For this particular example this means that connecting each color's start and end cell by (one of) the shortest paths fills the entire board - i.e. after filling the board with the shortest paths no uncoloured cells remain. (This should be considered "luck", not every starting configuration of the board will cause this to happen).
Usually, after this step, we have a number of cells that can be freely coloured, let's call this number U. For N=5,
U = 15 - (dMin(orange) + dMin(red) + dMin(green) + dMin(yellow) + dMin(blue))
Because these cells can take any colour, we can also determine the maximum number of cells that can have a particular colour:
dMax(orange) = dMin(orange) + U
dMax(red) = dMin(red) + U
dMax(green) = dMin(green) + U
dMax(yellow) = dMin(yellow) + U
dMax(blue) = dMin(blue) + U
(In this particular example, U=0, so the minimum number of cells per colour is also the maximum).
Path-finding using the distance constraints
If we were to brute force enumerate all possible combinations using these color constraints, we would have a lot less combinations to worry about. More specifically, in this particular example we would have:
15! / (1! * 1! * 5! * 3! * 5!)
= 1307674368000 / 86400
= 15135120 combinations left, about a factor 2000 less.
However, this still doesn't give us the actual paths, so a better idea is to do a backtracking search, where we process each colour in turn and attempt to find all paths that:
do not cross an already coloured cell
are not shorter than dMin(colour) and not longer than dMax(colour).
The second criterion reduces the number of paths reported per colour, which greatly reduces the total number of paths to be tried (due to the combinatorial effect).
In pseudo-code:
function SolveLevel(initialBoard of size NxN)
{
foreach(colour on initialBoard)
{
Find startCell(colour) and endCell(colour)
minDistance(colour) = Length(ShortestPath(initialBoard, startCell(colour), endCell(colour)))
}
//Determine the number of uncoloured cells remaining after all shortest paths have been applied.
U = (N^2 - 2N) - (Sum of all minDistances)
firstColour = GetFirstColour(initialBoard)
ExplorePathsForColour(
initialBoard,
firstColour,
startCell(firstColour),
endCell(firstColour),
minDistance(firstColour),
U)
}
function ExplorePathsForColour(board, colour, startCell, endCell, minDistance, nrOfUncolouredCells)
{
maxDistance = minDistance + nrOfUncolouredCells
paths = FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance)
foreach(path in paths)
{
//Render all cells in 'path' on a copy of the board
boardCopy = Copy(board)
boardCopy = ApplyPath(boardCopy, path)
uRemaining = nrOfUncolouredCells - (Length(path) - minDistance)
//Recursively explore all paths for the next colour.
nextColour = NextColour(board, colour)
if(nextColour exists)
{
ExplorePathsForColour(
boardCopy,
nextColour,
startCell(nextColour),
endCell(nextColour),
minDistance(nextColour),
uRemaining)
}
else
{
//No more colours remaining to draw
if(uRemaining == 0)
{
//No more uncoloured cells remaining
Report boardCopy as a result
}
}
}
}
FindAllPaths
This leaves only FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance) to be implemented. The tricky thing here is that we're not searching for the shortest paths, but for any paths that fall in the range determined by minDistance and maxDistance. Hence, we can't just use Dijkstra's or A*, because they will only record the shortest path to each cell, not any possible detours.
One way of finding these paths would be to use a multi-dimensional array for the board, where each cell is capable of storing multiple waypoints, and a waypoint is defined as the pair (previous waypoint, distance to origin). The previous waypoint is needed to be able to reconstruct the entire path once we've reached the destination, and the distance to origin prevents us from exceeding the maxDistance.
Finding all paths can then be done by using a flood-fill-like exploration from the startCell outwards, where, for a given cell, each uncoloured or same-as-the-current-colour neighbour is recursively explored (except the ones that form our current path to the origin) until we reach either the endCell or exceed the maxDistance.
An improvement on this strategy is that we don't explore from the startCell outwards to the endCell, but explore from both the startCell and endCell outwards in parallel, using Floor(maxDistance / 2) and Ceil(maxDistance / 2) as the respective maximum distances. For large values of maxDistance, this should reduce the number of explored cells from 2 * maxDistance^2 to maxDistance^2.
I think you'll want to do this in two steps: step 1) find a set of non-intersecting paths that connect all your points, then step 2) grow/shift those paths to fill the entire board.
My thought on step 1 is to essentially perform a Dijkstra-like algorithm on all points simultaneously, growing the paths together. Similarly to Dijkstra, I think you'll want to flood-fill out from each of your points, choosing which node to search next using some heuristic (my hunch says choosing points with the fewest degrees of freedom first, then by distance, might be a good one). Very differently from Dijkstra, though, I think we might be stuck with having to backtrack when multiple paths attempt to grow into the same node. (This could of course be fairly problematic on bigger maps, but might not be a big deal on small maps like the one you have above.)
You may also solve some of the easier paths before you start the above algorithm, mainly to cut down on the number of backtracks needed. Specifically, if you can make a trace between two points along the edge of the board, you can guarantee that connecting those two points in that fashion will never interfere with other paths, so you can simply fill those in and take those guys out of the equation. You could then iterate on this until all of the "quick and easy" paths are found by tracing along the borders of the board or the borders of existing paths. That algorithm would actually completely solve the above example board, but would undoubtedly fail elsewhere; still, it would be very cheap to perform and would reduce your search time for the previous algorithm.
Alternatively
You could simply run a real Dijkstra's algorithm between each pair of endpoints, pathing out the closest points first (or trying them in a few random orders). This would probably work for a fair number of cases, and when it fails, simply throw out the map and generate a new one.
Once you have step 1 solved, step 2 should be easier, though not necessarily trivial. To grow your paths, I think you'll want to grow them outward (so paths closest to walls first, growing towards the walls, then other inner paths outwards, etc.). To grow, I think you'll have two basic operations: flipping corners, and expanding into adjacent pairs of empty squares. That is to say, if you have a line like
.v<<.
v<...
v....
v....
First you'll want to flip the corners to fill in your edge spaces
v<<<.
v....
v....
v....
Then you'll want to expand into neighboring pairs of open space
v<<v.
v.^<.
v....
v....
v<<v.
>v^<.
v<...
v....
etc..
Note that what I've outlined won't guarantee a solution if one exists, but I think you should be able to find one most of the time when one exists; in the cases where the map has no solution, or the algorithm fails to find one, just throw out the map and try a different one :)
You have two choices:
Write a custom solver
Brute force it.
I used option (2) to generate Boggle-type boards and it is VERY successful. If you go with option (2), this is how you do it:
Tools needed:
Write an A* solver.
Write a random board creator
To solve:
Generate a random board consisting of only endpoints
while board is not solved:
get two endpoints closest to each other that are not yet solved
run A* to generate path
update the board so the next A* run knows the new board layout, with the new path marked as un-traversable.
At the exit of the loop, check success/failure (is the whole board used? etc.) and run again if needed.
The A* on a 10x10 board should run in hundredths of a second. You can probably solve 1k+ boards per second, so a 10-second run should get you several 'usable' boards.
Bonus points:
When generating levels for an IAP (in-app purchase) level pack, remember to check for mirrors/rotations/reflections/etc. so you don't have one board be a copy of another (which is just lame).
Come up with a metric that will figure out if two boards are 'similar' and if so, ditch one of them.
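One cheap way to catch rotated/mirrored duplicates is to compare boards by a canonical form, e.g. the lexicographically smallest of the 8 symmetries (a Python sketch, names mine):
def canonical(board):
    def rot(b):  # rotate 90 degrees clockwise
        return [list(row) for row in zip(*b[::-1])]
    b = [list(row) for row in board]
    variants = []
    for _ in range(4):
        variants.append(tuple(map(tuple, b)))
        variants.append(tuple(map(tuple, (row[::-1] for row in b))))  # mirror
        b = rot(b)
    return min(variants)  # equivalent boards share this key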
Let's say I have a parabola. Now I also have a bunch of sticks that are all of the same width (yes, my drawing skills are amazing!). How can I stack these sticks within the parabola such that I minimize the space used as much as possible? I believe this falls under the category of knapsack problems, but this Wikipedia page doesn't appear to bring me closer to a real-world solution. Is this an NP-hard problem?
In this problem we are trying to minimize the amount of area consumed (e.g. the integral), which includes vertical area.
I cooked up a solution in JavaScript using processing.js and HTML5 canvas.
This project should be a good starting point if you want to create your own solution. I added two algorithms: one that sorts the input blocks from largest to smallest, and another that shuffles the list randomly. Each item is then placed into the first bucket that has enough space to fit it, starting from the bottom (the smallest bucket) and moving up.
Depending on the type of input, the sort algorithm can give good results in O(n^2). Here's an example of the sorted output.
Here's the insert-in-order algorithm.
function solve(buckets, input) {
    var buckets_length = buckets.length,
        results = [];
    for (var b = 0; b < buckets_length; b++) {
        results[b] = [];
    }
    // Sort the blocks from largest to smallest.
    input.sort(function(a, b) {return b - a});
    input.forEach(function(blockSize) {
        // Try the bottom (smallest) bucket first and move up;
        // note >= 0, so the topmost bucket is tried as well.
        var b = buckets_length - 1;
        while (b >= 0) {
            if (blockSize <= buckets[b]) {
                results[b].push(blockSize);
                buckets[b] -= blockSize; // consume the bucket's remaining width
                break;
            }
            b--;
        }
    });
    return results;
}
Project on github - https://github.com/gradbot/Parabolic-Knapsack
It's a public repo so feel free to branch and add other algorithms. I'll probably add more in the future as it's an interesting problem.
Simplifying
First I want to simplify the problem; to do that:
I switch the axes and add them to each other; this results in x2 growth
I assume it is a parabola on a closed interval [a, b], where a = 0 and, for this example, b = 3
Let's say you are given b (the end of the interval) and w (the width of a segment); then you can find the total number of segments by n = Floor[b/w]. In this case there exists a trivial way to maximize the Riemann sum, and the function giving the i'th segment height is f(b - (b*i)/(n+1)). Actually this is an assumption and I'm not 100% sure.
A maxed-out example for 17 segments on the closed interval [0, 3] for the function Sqrt[x], real values:
The segment-height function in this case is Re[Sqrt[3 - 3*Range[1,17]/18]], and the values are:
Exact form:
{Sqrt[17/6], 2 Sqrt[2/3], Sqrt[5/2],
Sqrt[7/3], Sqrt[13/6], Sqrt[2],
Sqrt[11/6], Sqrt[5/3], Sqrt[3/2],
2/Sqrt[3], Sqrt[7/6], 1, Sqrt[5/6],
Sqrt[2/3], 1/Sqrt[2], 1/Sqrt[3],
1/Sqrt[6]}
Approximated form:
{1.6832508230603465,
1.632993161855452, 1.5811388300841898, 1.5275252316519468, 1.4719601443879744, 1.4142135623730951, 1.35400640077266, 1.2909944487358056, 1.224744871391589, 1.1547005383792517, 1.0801234497346435, 1, 0.9128709291752769, 0.816496580927726, 0.7071067811865475, 0.5773502691896258, 0.4082482904638631}
What you have arrived at is a bin-packing problem, with a partially filled bin.
Finding b
If b is unknown, or our task is to find the smallest possible b under which all the sticks from the initial bunch fit, then we can at least bound the b values:
lower limit: the b at which the sum of the segment heights equals the sum of the stick heights
upper limit: the b at which the number of segments equals the number of sticks and the longest stick fits the longest segment
One of the simplest ways to find b is to take a pivot at (lower limit + upper limit)/2 and check whether a solution exists. The pivot then becomes the new upper or lower limit, and you repeat the process until the required precision is met.
When you are looking for b you do not need an exact result, only a suboptimal one, and it will be much faster if you use an efficient algorithm to find a pivot point relatively close to the actual b.
For example:
sort the sticks by length, largest to smallest
start putting the largest items into the first bin they fit
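Putting the bisection together (a Python sketch; fits(b) stands for the first-fit-decreasing check just described and is assumed to exist):
def smallest_b(fits, lo, hi, eps=1e-3):
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if fits(mid):
            hi = mid    # all sticks fit; try a smaller interval end
        else:
            lo = mid    # they don't fit; b must be larger
    return hi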
This is equivalent to having multiple knapsacks (assuming these blocks are all the same 'height', this means there's one knapsack for each 'line'), and is thus an instance of the bin packing problem.
See http://en.wikipedia.org/wiki/Bin_packing
How can I stack these sticks within the parabola such that I am minimizing the (vertical) space it uses as much as possible?
Just deal with it like any other bin packing problem. I'd throw meta-heuristics at it (such as tabu search, simulated annealing, ...) since those algorithms aren't problem specific.
For example, I'd start from my Cloud Balance problem (= a form of bin packing) in Drools Planner. If all the sticks have the same height and there's no vertical space between 2 sticks on top of each other, there's not much I'd have to change:
Rename Computer to ParabolicRow. Remove its properties (cpu, memory, bandwidth). Give it a unique level (where 0 is the lowest row). Create a number of ParabolicRows.
Rename Process to Stick
Rename ProcessAssignement to StickAssignment
Rewrite the hard constraints so it checks if there's enough room for the sum of all Sticks assigned to a ParabolicRow.
Rewrite the soft constraints to minimize the highest level of all ParabolicRows.
I'm very sure it is equivalent to bin packing:
informal reduction
Let x be the width of the widest row; make the bins 2x big and create for every row a placeholder element which is 2x - rowWidth big, so two placeholder elements cannot be packed into one bin.
To reduce bin packing to parabolic knapsack, you just create placeholder elements of size rowWidth - binsize for all rows that are bigger than the needed bin size. Furthermore, add placeholders which fill the whole row for all rows that are smaller than the bin size.
This would obviously mean your problem is NP-hard.
For other ideas look here maybe: http://en.wikipedia.org/wiki/Cutting_stock_problem
Most likely this is the 0-1 knapsack or a bin-packing problem. That is an NP-hard problem, and one I honestly can't fully explain to you, but you can get good results with greedy algorithms. Here is a useful article about it, http://www.developerfusion.com/article/5540/bin-packing, which I used to make my PHP bin-packing class at phpclasses.org.
Props to those who mentioned the fact that the levels could be at varying heights (e.g., assuming the sticks are 1 unit thick, level 1 could go from 0.1 units to 1.1 units, or it could go from 0.2 to 1.2 units instead).
You could of course expand the "multiple bin packing" methodology and test arbitrarily small increments (e.g. run the multiple bin-packing methodology with levels starting at 0.0, 0.1, 0.2, ..., 0.9) and then choose the best result, but it seems like you would get stuck calculating for an infinite amount of time unless you had some methodology to verify that you had gotten it right (or, more precisely, that you had all the "rows" correct as to what they contained, at which point you could shift them down until they met the edge of the parabola).
Also, the OP did not specify that the sticks had to be laid horizontally, although perhaps the OP implied it with those sweet drawings.
I have no idea how to optimally solve such an issue, but I bet there are certain cases where you could randomly place sticks, then test if they are "inside" the parabola, and it would beat any of the methodologies relying only on horizontal rows.
(Consider the case of a narrow parabola that we are trying to fill with 1 long stick.)
I say just throw them all in there and shake them ;)