Finding all possible segment tilings in a fixed capacity - algorithm

I am looking for an efficient (more efficient than my combinatorial search) algorithm to tile segments of strings.
I am given two things:
Length of the "area" to cover;
Set of segments (a segment is defined by starting and ending index, as well as a label)
Example:
The total length of area is 5. The given segments are:
- [(0,2), "A B"]
- [(1,3), "B C"]
- [(1,4), "B C D"]
- [(3,5), "D E"]
I am using Python notation to denote segment e.g. (0,2) and its label "A B" a string.
There are 3 ways how one could populate the "area" of size 5 with the given segments with no overlap:
1. ("A B"), ("D E")
2. ("B C"), ("D E")
3. ("B C D")
Other tilings are not possible as an additional segment will overlap with one of the existing tiles already sitting in the "area".
My current approach is to start with one segment and see how many combinations of non-overlapping segments I get. Then two segments and so on till 4. Each combination validity is checked if any of remaining segments cannot be added. If any of remaining segments can be inserted, the whole combination is rejected as invalid.
This works pretty well when there are not many segments and there is little overlap/combinations.
Is there a simple yet efficient algorithm to find these "tilings"?

There is a dynamic programming approach in which you work from left to right along the area to be covered, and at each point work out the number of tilings which finish neatly at that point. To calculate this, look at the segments which cover that point and end at that point. For each of these segments the number of tilings which end with that segment is the number of tilings which end just before that segment starts, and you have already calculated this. So add up these numbers to get the number of tilings which cover that point and end at that point. The answer you want is the number of tilings which cover the end point in the area to be covered and end at that point (assuming want to end neatly or will be forced to).
If you just want one of the possible tilings, you don't need to count the number of tilings terminating at each point, you just want a flag to tell whether there are any tilings terminating at that point or not. It makes it easier to retrieve a possible tiling later if you also note down, at each point where there is a tiling terminating, the tile that terminates at that point for one of the tilings.
Now you can work out a possible tiling that tiles the whole line. Look at the entry for the last character in the string. This gives you the segment that ends that tiling. Given that segment, you know the its length. If it is of length three, and the last offset was offset ten, there must be a valid tiling that terminates at offset seven. You will have noted the segment that terminates at offset seven, which gives you the second last segment in the tiling you need. By looking at the length of that segment, you can work out where to look for a note of the segment just before that, and so on, until you have recovered the entire tiling, from right to left.

Related

Algorithm to find positions in a game board i can move to

The problem i have goes as follows (simplified):
I have a board, represented as a matrix of n x m squares (n might equal m)
In it, there are p game pieces
Each game piece has a pre-defined speed, which is how many steps it can take in it's turn
Pieces can't overlap
There are three types of cells: those which don't require extra movements to be crossed (you loose 0 extra speed when going through), those which require 1 extra movement to be crossed and some which you simply can't get through (like a wall)
So, given a game piece in a certain [i,j] position in my game board, i want to find out:
a) All the places it can move to, with it's speed
b) The path to a certain [k,l] position in the board
Having a) solved, b) is almost trivial.
Currently the algorithm i'm using goes as follows, assuming a language where arrays of size n go from 0 to n-1:
Create a sqaure matrix of speed*2+1 size which represents the cost of moving as if all cells had no extra cost to be crossed (the piece is on the position [speed, speed])
Create another square matrix of speed*2+1 size which has the extra costs of each cell (those which can't be crossed because either it's a wall or there is another piece in it has a value of infinite)(the piece is on the position [speed, speed])
Create another square matrix of speed*2+1 size which is the sum of the former two(the piece is on the position [speed, speed])
Correct the latter matrix making sure the value of each cell is: the minimal cost of all the adjacent cells + 1 + the extra cost of the cell. If it isn't, i correct it and start with the matrix all over again.
An example:
P are pieces, W are walls, E are empty cells which require no extra movement, X are cells which require 1 extra movement to be crossed.
X,E,X,X,X
X,X,X,X,X
W,E,E,E,W
W,E,X,E,W
E,P,P,P,P
The first matrix:
2,2,2,2,2
2,1,1,1,2
2,1,0,1,2
2,1,1,1,2
2,2,2,2,2
The second matrix:
1,0,1,inf,1
1,1,1,1,1
inf,0,0,0,inf
inf,0,1,0,inf
0,inf,inf,inf,inf
The sum:
3,2,3,3,3
3,2,2,2,3
inf,1,0,1,inf
inf,1,2,1,inf
inf,inf,inf,inf,inf
Since [0,0] is not 2+1+1, i correct it:
The sum:
4,2,3,3,3
3,2,2,2,3
inf,1,0,1,inf
inf,1,2,1,inf
inf,inf,inf,inf,inf
Since [0,1] is not 2+1+0, i correct it:
The sum:
4,3,3,3,3
3,2,2,2,3
inf,1,0,1,inf
inf,1,2,1,inf
inf,inf,inf,inf,inf
Since [0,2] is not 2+1+1, i correct it:
The sum:
4,2,4,3,3
3,2,2,2,3
inf,1,0,1,inf
inf,1,2,1,inf
inf,inf,inf,inf,inf
Which one is the correct answer?
What I want to know is if this problem has a name I can search it by (couldn't find anything) or if anybody can tell me how to solve the point a).
Note that I want the optimal solution, so I went with a dynamic programming algorithm. Might random walkers be better? AFAIK, this solution is not failing (yet), but I have no proof of correctness for it, and I want to be sure it works.
A-star is a standard algorithm to determine shortest path give obstacles on a 2d board and cost per square of moving. You can also use it to test if a specific move is valid, but to actually generate all valid moves I would simply start ay the start position, move in each direction by one square mark which squares are valid and then repeat from each of your new places making sure not to visit the same square again. It will be a recursive algorithm calling itself at most 4 times on each call and will generate you valid moves efficiently. If there are constraints like how many squares you can move at once with different costs just pass the running total of how far you've come for each square.

How to solve crossword (NP-Hard)?

I am currently doing an assignment and I'm stuck with the approach.
I have a crossword problem which consists of an empty grid (no solid square as a conventional crossword would), with a varied width and height between 4 and 400 (inclusive).
Rules:
Words are part of the input - a list of 10 - 1000 (inclusive) English words of varying lengths.
A horizontal word can only intersect a vertical word.
A vertical word can only intersect a horizontal word.
A word can only intersect 1 or 2 other words.
Each letter is worth one point.
Words must have a 1 grid space gap surrounding it unless it is a part of an intersecting word.
Example:
X X X X X X
X B O S S X
X X X X X X
Goal:
Get the maximum possible score within a 5 minute time limit.
So far:
After some research I am aware that this is an NP-Hard problem. Thus the most optimal solution cannot be calculated because every combination cannot be examined.
The easiest solution would appear to be to sort the words according to length and inserting the highest scoring words for maximum score (greedy algorithm).
I’ve also been told a recursive tree with the nodes consisting of alternative equally scoring word insertions and the knapsack algorithm apply to this problem (not sure what the implementation would look like).
Questions:
What allows me to check the maximum number of combinations within a 5 minute time span that scales accordingly to the maximum possible word list and grid size?
What heuristics might I apply when inserting words?
Btw the goal here is to get the best possible solution in 5 minutes.
To clarify each letter of a valid word is worth 1 point, thus a 5 letter word is worth 5 points.
Thanks in advance I have been reading a lot of mathematical notation on crossword research papers all day which has seem to have lead me in a circle.
I'd start with a word with following characteristics:
It should have max possible intersections.
Its length should be such that number of words of that length are minimum in the list.
ie, word length should be least frequent and most number of intersections.
Reason for this kind of selection is that it would minimize further possibility of words that can be selected. eg. A word of size 9 with 2 further intersections is selected. These intersecting words are of length 6 and 5 (say). Now, you have removed possibility of all those words of length 6 and 5 whose 3rd char is 'a' and 2nd char is 's' (say, 'a' and 's' are the intersecting letters).
If there are many places with same configuration, run this selection procedure one or two steps deeper to get a better selection of which part (word) of the grid to fill first.
Now, try filling in all words in this 1st selected position (since this had min frequency, it should be good to use) and then going deeper in the crossword to fill it. Whichever word results in most points till a deadend is reached, should be your solution. When you reach a dead-end, you can start over with a new word.
This seems like a really interesting problem in discrete optimization. You're certainly right; with the number of words and number of possible placements there is no way you could ever explore a fraction of the space.
Also given the 5 minute time limit (quite short), I think you're going to have a really hard time with any solid heuristic. I think your best bet might be some sort of random permutation / simulated annealing algorithm.
If I was doing this, I would first calculate clusters of words, completely ignoring the crossword structure itself. Take one word, find a second word that intersects it. Then find another word that can fit onto this structure (obeying the max of 2 intersections per word), and so on. You should end up with many of these clusters, which you can rank by density (points / area used). I think you should be able to do this relatively quickly.
Then for the random permutation / simulated annealing part, for my moves I would place either a cluster or unused word onto the crossword itself, or move an existing cluster / word. Just save the current highest-scoring configuration as you go, and return this after the 5 minutes.
If the 5 min is too short to find anything meaningful using random permutations, another approach might be to use a constraint propagation idea working with those clusters.

What to use for flow free-like game random level creation?

I need some advice. I'm developing a game similar to Flow Free wherein the gameboard is composed of a grid and colored dots, and the user has to connect the same colored dots together without overlapping other lines, and using up ALL the free spaces in the board.
My question is about level-creation. I wish to make the levels generated randomly (and should at least be able to solve itself so that it can give players hints) and I am in a stump as to what algorithm to use. Any suggestions?
Note: image shows the objective of Flow Free, and it is the same objective of what I am developing.
Thanks for your help. :)
Consider solving your problem with a pair of simpler, more manageable algorithms: one algorithm that reliably creates simple, pre-solved boards and another that rearranges flows to make simple boards more complex.
The first part, building a simple pre-solved board, is trivial (if you want it to be) if you're using n flows on an nxn grid:
For each flow...
Place the head dot at the top of the first open column.
Place the tail dot at the bottom of that column.
Alternatively, you could provide your own hand-made starter boards to pass to the second part. The only goal of this stage is to get a valid board built, even if it's just trivial or predetermined, so it's worth keeping it simple.
The second part, rearranging the flows, involves looping over each flow, seeing which one can work with its neighboring flow to grow and shrink:
For some number of iterations...
Choose a random flow f.
If f is at the minimum length (say 3 squares long), skip to the next iteration because we can't shrink f right now.
If the head dot of f is next to a dot from another flow g (if more than one g to choose from, pick one at random)...
Move f's head dot one square along its flow (i.e., walk it one square towards the tail). f is now one square shorter and there's an empty square. (The puzzle is now unsolved.)
Move the neighboring dot from g into the empty square vacated by f. Now there's an empty square where g's dot moved from.
Fill in that empty spot with flow from g. Now g is one square longer than it was at the beginning of this iteration. (The puzzle is back to being solved as well.)
Repeat the previous step for f's tail dot.
The approach as it stands is limited (dots will always be neighbors) but it's easy to expand upon:
Add a step to loop through the body of flow f, looking for trickier ways to swap space with other flows...
Add a step that prevents a dot from moving to an old location...
Add any other ideas that you come up with.
The overall solution here is probably less than the ideal one that you're aiming for, but now you have two simple algorithms that you can flesh out further to serve the role of one large, all-encompassing algorithm. In the end, I think this approach is manageable, not cryptic, and easy to tweek, and, if nothing else, a good place to start.
Update: I coded a proof-of-concept based on the steps above. Starting with the first 5x5 grid below, the process produced the subsequent 5 different boards. Some are interesting, some are not, but they're always valid with one known solution.
Starting Point
5 Random Results (sorry for the misaligned screenshots)
And a random 8x8 for good measure. The starting point was the same simple columns approach as above.
Updated answer: I implemented a new generator using the idea of "dual puzzles". This allows much sparser and higher quality puzzles than any previous method I know of. The code is on github. I'll try to write more details about how it works, but here is an example puzzle:
Old answer:
I have implemented the following algorithm in my numberlink solver and generator. In enforces the rule, that a path can never touch itself, which is normal in most 'hardcore' numberlink apps and puzzles
First the board is tiled with 2x1 dominos in a simple, deterministic way.
If this is not possible (on an odd area paper), the bottom right corner is
left as a singleton.
Then the dominos are randomly shuffled by rotating random pairs of neighbours.
This is is not done in the case of width or height equal to 1.
Now, in the case of an odd area paper, the bottom right corner is attached to
one of its neighbour dominos. This will always be possible.
Finally, we can start finding random paths through the dominos, combining them
as we pass through. Special care is taken not to connect 'neighbour flows'
which would create puzzles that 'double back on themselves'.
Before the puzzle is printed we 'compact' the range of colours used, as much as possible.
The puzzle is printed by replacing all positions that aren't flow-heads with a .
My numberlink format uses ascii characters instead of numbers. Here is an example:
$ bin/numberlink --generate=35x20
Warning: Including non-standard characters in puzzle
35 20
....bcd.......efg...i......i......j
.kka........l....hm.n....n.o.......
.b...q..q...l..r.....h.....t..uvvu.
....w.....d.e..xx....m.yy..t.......
..z.w.A....A....r.s....BB.....p....
.D.........E.F..F.G...H.........IC.
.z.D...JKL.......g....G..N.j.......
P...a....L.QQ.RR...N....s.....S.T..
U........K......V...............T..
WW...X.......Z0..M.................
1....X...23..Z0..........M....44...
5.......Y..Y....6.........C.......p
5...P...2..3..6..VH.......O.S..99.I
........E.!!......o...."....O..$$.%
.U..&&..J.\\.(.)......8...*.......+
..1.......,..-...(/:.."...;;.%+....
..c<<.==........)./..8>>.*.?......#
.[..[....]........:..........?..^..
..._.._.f...,......-.`..`.7.^......
{{......].....|....|....7.......#..
And here I run it through my solver (same seed):
$ bin/numberlink --generate=35x20 | bin/numberlink --tubes
Found a solution!
┌──┐bcd───┐┌──efg┌─┐i──────i┌─────j
│kka│└───┐││l┌─┘│hm│n────n┌o│┌────┐
│b──┘q──q│││l│┌r└┐│└─h┌──┐│t││uvvu│
└──┐w┌───┘d└e││xx│└──m│yy││t││└──┘│
┌─z│w│A────A┌┘└─r│s───┘BB││┌┘└p┌─┐│
│D┐└┐│┌────E│F──F│G──┐H┐┌┘││┌──┘IC│
└z└D│││JKL┌─┘┌──┐g┌─┐└G││N│j│┌─┐└┐│
P──┐a││││L│QQ│RR└┐│N└──┘s││┌┘│S│T││
U─┐│┌┘││└K└─┐└─┐V││└─────┘││┌┘││T││
WW│││X││┌──┐│Z0││M│┌──────┘││┌┘└┐││
1┐│││X│││23││Z0│└┐││┌────M┌┘││44│││
5│││└┐││Y││Y│┌─┘6││││┌───┐C┌┘│┌─┘│p
5││└P│││2┘└3││6─┘VH│││┌─┐│O┘S┘│99└I
┌┘│┌─┘││E┐!!│└───┐o┘│││"│└─┐O─┘$$┌%
│U┘│&&│└J│\\│(┐)┐└──┘│8││┌*└┐┌───┘+
└─1└─┐└──┘,┐│-└┐│(/:┌┘"┘││;;│%+───┘
┌─c<<│==┌─┐││└┐│)│/││8>>│*┌?│┌───┐#
│[──[└─┐│]││└┐│└─┘:┘│└──┘┌┘┌┘?┌─^││
└─┐_──_│f││└,│└────-│`──`│7┘^─┘┌─┘│
{{└────┘]┘└──┘|────|└───7└─────┘#─┘
I've tested replacing step (4) with a function that iteratively, randomly merges two neighboring paths. However it game much denser puzzles, and I already think the above is nearly too dense to be difficult.
Here is a list of problems I've generated of different size: https://github.com/thomasahle/numberlink/blob/master/puzzles/inputs3
The most straightforward way to create such a level is to find a way to solve it. This way, you can basically generate any random starting configuration and determine if it is a valid level by trying to have it solved. This will generate the most diverse levels.
And even if you find a way to generate the levels some other way, you'll still want to apply this solving algorithm to prove that the generated level is any good ;)
Brute-force enumerating
If the board has a size of NxN cells, and there are also N colours available, brute-force enumerating all possible configurations (regardless of wether they form actual paths between start and end nodes) would take:
N^2 cells total
2N cells already occupied with start and end nodes
N^2 - 2N cells for which the color has yet to be determined
N colours available.
N^(N^2 - 2N) possible combinations.
So,
For N=5, this means 5^15 = 30517578125 combinations.
For N=6, this means 6^24 = 4738381338321616896 combinations.
In other words, the number of possible combinations is pretty high to start with, but also grows ridiculously fast once you start making the board larger.
Constraining the number of cells per color
Obviously, we should try to reduce the number of configurations as much as possible. One way of doing that is to consider the minimum distance ("dMin") between each color's start and end cell - we know that there should at least be this much cells with that color. Calculating the minimum distance can be done with a simple flood fill or Dijkstra's algorithm.
(N.B. Note that this entire next section only discusses the number of cells, but does not say anything about their locations)
In your example, this means (not counting the start and end cells)
dMin(orange) = 1
dMin(red) = 1
dMin(green) = 5
dMin(yellow) = 3
dMin(blue) = 5
This means that, of the 15 cells for which the color has yet to be determined, there have to be at least 1 orange, 1 red, 5 green, 3 yellow and 5 blue cells, also making a total of 15 cells.
For this particular example this means that connecting each color's start and end cell by (one of) the shortest paths fills the entire board - i.e. after filling the board with the shortest paths no uncoloured cells remain. (This should be considered "luck", not every starting configuration of the board will cause this to happen).
Usually, after this step, we have a number of cells that can be freely coloured, let's call this number U. For N=5,
U = 15 - (dMin(orange) + dMin(red) + dMin(green) + dMin(yellow) + dMin(blue))
Because these cells can take any colour, we can also determine the maximum number of cells that can have a particular colour:
dMax(orange) = dMin(orange) + U
dMax(red) = dMin(red) + U
dMax(green) = dMin(green) + U
dMax(yellow) = dMin(yellow) + U
dMax(blue) = dMin(blue) + U
(In this particular example, U=0, so the minimum number of cells per colour is also the maximum).
Path-finding using the distance constraints
If we were to brute force enumerate all possible combinations using these color constraints, we would have a lot less combinations to worry about. More specifically, in this particular example we would have:
15! / (1! * 1! * 5! * 3! * 5!)
= 1307674368000 / 86400
= 15135120 combinations left, about a factor 2000 less.
However, this still doesn't give us the actual paths. so a better idea would be to a backtracking search, where we process each colour in turn and attempt to find all paths that:
doesn't cross an already coloured cell
Is not shorter than dMin(colour) and not longer than dMax(colour).
The second criteria will reduce the number of paths reported per colour, which causes the total number of paths to be tried to be greatly reduced (due to the combinatorial effect).
In pseudo-code:
function SolveLevel(initialBoard of size NxN)
{
foreach(colour on initialBoard)
{
Find startCell(colour) and endCell(colour)
minDistance(colour) = Length(ShortestPath(initialBoard, startCell(colour), endCell(colour)))
}
//Determine the number of uncoloured cells remaining after all shortest paths have been applied.
U = N^(N^2 - 2N) - (Sum of all minDistances)
firstColour = GetFirstColour(initialBoard)
ExplorePathsForColour(
initialBoard,
firstColour,
startCell(firstColour),
endCell(firstColour),
minDistance(firstColour),
U)
}
}
function ExplorePathsForColour(board, colour, startCell, endCell, minDistance, nrOfUncolouredCells)
{
maxDistance = minDistance + nrOfUncolouredCells
paths = FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance)
foreach(path in paths)
{
//Render all cells in 'path' on a copy of the board
boardCopy = Copy(board)
boardCopy = ApplyPath(boardCopy, path)
uRemaining = nrOfUncolouredCells - (Length(path) - minDistance)
//Recursively explore all paths for the next colour.
nextColour = NextColour(board, colour)
if(nextColour exists)
{
ExplorePathsForColour(
boardCopy,
nextColour,
startCell(nextColour),
endCell(nextColour),
minDistance(nextColour),
uRemaining)
}
else
{
//No more colours remaining to draw
if(uRemaining == 0)
{
//No more uncoloured cells remaining
Report boardCopy as a result
}
}
}
}
FindAllPaths
This only leaves FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance) to be implemented. The tricky thing here is that we're not searching for the shortest paths, but for any paths that fall in the range determined by minDistance and maxDistance. Hence, we can't just use Dijkstra's or A*, because they will only record the shortest path to each cell, not any possible detours.
One way of finding these paths would be to use a multi-dimensional array for the board, where
each cell is capable of storing multiple waypoints, and a waypoint is defined as the pair (previous waypoint, distance to origin). The previous waypoint is needed to be able to reconstruct the entire path once we've reached the destination, and the distance to origin
prevents us from exceeding the maxDistance.
Finding all paths can then be done by using a flood-fill like exploration from the startCell outwards, where for a given cell, each uncoloured or same-as-the-current-color-coloured neigbour is recursively explored (except the ones that form our current path to the origin) until we reach either the endCell or exceed the maxDistance.
An improvement on this strategy is that we don't explore from the startCell outwards to the endCell, but that we explore from both the startCell and endCell outwards in parallel, using Floor(maxDistance / 2) and Ceil(maxDistance / 2) as the respective maximum distances. For large values of maxDistance, this should reduce the number of explored cells from 2 * maxDistance^2 to maxDistance^2.
I think you'll want to do this in two steps. Step 1) find a set of non-intersecting paths that connect all your points, then 2) Grow/shift those paths to fill the entire board
My thoughts on Step 1 are to essentially perform Dijkstra like algorithm on all points simultaneously, growing together the paths. Similar to Dijkstra, I think you'll want to flood-fill out from each of your points, chosing which node to search next using some heuristic (My hunch says chosing points with the least degrees of freedom first, then by distance, might be a good one). Very differently from Dijkstra though I think we might be stuck with having to backtrack when we have multiple paths attempting to grow into the same node. (This could of course be fairly problematic on bigger maps, but might not be a big deal on small maps like the one you have above.)
You may also solve for some of the easier paths before you start the above algorithm, mainly to cut down on the number of backtracks needed. In specific, if you can make a trace between points along the edge of the board, you can guarantee that connecting those two points in that fashion would never interfere with other paths, so you can simply fill those in and take those guys out of the equation. You could then further iterate on this until all of these "quick and easy" paths are found by tracing along the borders of the board, or borders of existing paths. That algorithm would actually completely solve the above example board, but would undoubtedly fail elsewhere .. still, it would be very cheap to perform and would reduce your search time for the previous algorithm.
Alternatively
You could simply do a real Dijkstra's algorithm between each set of points, pathing out the closest points first (or trying them in some random orders a few times). This would probably work for a fair number of cases, and when it fails simply throw out the map and generate a new one.
Once you have Step 1 solved, Step 2 should be easier, though not necessarily trivial. To grow your paths, I think you'll want to grow your paths outward (so paths closest to walls first, growing towards the walls, then other inner paths outwards, etc.). To grow, I think you'll have two basic operations, flipping corners, and expanding into into adjacent pairs of empty squares.. that is to say, if you have a line like
.v<<.
v<...
v....
v....
First you'll want to flip the corners to fill in your edge spaces
v<<<.
v....
v....
v....
Then you'll want to expand into neighboring pairs of open space
v<<v.
v.^<.
v....
v....
v<<v.
>v^<.
v<...
v....
etc..
Note that what I've outlined wont guarantee a solution if one exists, but I think you should be able to find one most of the time if one exists, and then in the cases where the map has no solution, or the algorithm fails to find one, just throw out the map and try a different one :)
You have two choices:
Write a custom solver
Brute force it.
I used option (2) to generate Boggle type boards and it is VERY successful. If you go with Option (2), this is how you do it:
Tools needed:
Write a A* solver.
Write a random board creator
To solve:
Generate a random board consisting of only endpoints
while board is not solved:
get two endpoints closest to each other that are not yet solved
run A* to generate path
update board so next A* knows new board layout with new path marked as un-traversable.
At exit of loop, check success/fail (is whole board used/etc) and run again if needed
The A* on a 10x10 should run in hundredths of a second. You can probably solve 1k+ boards/second. So a 10 second run should get you several 'usable' boards.
Bonus points:
When generating levels for a IAP (in app purchase) level pack, remember to check for mirrors/rotations/reflections/etc so you don't have one board a copy of another (which is just lame).
Come up with a metric that will figure out if two boards are 'similar' and if so, ditch one of them.

How do I calculate the shanten number in mahjong?

This is a followup to my earlier question about deciding if a hand is ready.
Knowledge of mahjong rules would be excellent, but a poker- or romme-based background is also sufficient to understand this question.
In Mahjong 14 tiles (tiles are like
cards in Poker) are arranged to 4 sets
and a pair. A straight ("123") always
uses exactly 3 tiles, not more and not
less. A set of the same kind ("111")
consists of exactly 3 tiles, too. This
leads to a sum of 3 * 4 + 2 = 14
tiles.
There are various exceptions like Kan
or Thirteen Orphans that are not
relevant here. Colors and value ranges
(1-9) are also not important for the
algorithm.
A hand consists of 13 tiles, every time it's our turn we get to pick a new tile and have to discard any tile so we stay on 13 tiles - except if we can win using the newly picked tile.
A hand that can be arranged to form 4 sets and a pair is "ready". A hand that requires only 1 tile to be exchanged is said to be "tenpai", or "1 from ready". Any other hand has a shanten-number which expresses how many tiles need to be exchanged to be in tenpai. So a hand with a shanten number of 1 needs 1 tile to be tenpai (and 2 tiles to be ready, accordingly). A hand with a shanten number of 5 needs 5 tiles to be tenpai and so on.
I'm trying to calculate the shanten number of a hand. After googling around for hours and reading multiple articles and papers on this topic, this seems to be an unsolved problem (except for the brute force approach). The closest algorithm I could find relied on chance, i.e. it was not able to detect the correct shanten number 100% of the time.
Rules
I'll explain a bit on the actual rules (simplified) and then my idea how to tackle this task. In mahjong, there are 4 colors, 3 normal ones like in card games (ace, heart, ...) that are called "man", "pin" and "sou". These colors run from 1 to 9 each and can be used to form straights as well as groups of the same kind. The forth color is called "honors" and can be used for groups of the same kind only, but not for straights. The seven honors will be called "E, S, W, N, R, G, B".
Let's look at an example of a tenpai hand: 2p, 3p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E. Next we pick an E. This is a complete mahjong hand (ready) and consists of a 2-4 pin street (remember, pins can be used for straights), a 3 pin triple, a 5 man triple, a W triple and an E pair.
Changing our original hand slightly to 2p, 2p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E, we got a hand in 1-shanten, i.e. it requires an additional tile to be tenpai. In this case, exchanging a 2p for an 3p brings us back to tenpai so by drawing a 3p and an E we win.
1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W, W is a hand in 2-shanten. There is 1 completed triplet and 5 pairs. We need one pair in the end, so once we pick one of 1p, 5p, 9p, S or W we need to discard one of the other pairs. Example: We pick a 1 pin and discard an W. The hand is in 1-shanten now and looks like this: 1p, 1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W. Next, we wait for either an 5p, 9p or S. Assuming we pick a 5p and discard the leftover W, we get this: 1p, 1p, 1p, 5p, 5p, 5p, 9p, 9p, E, E, E, S, S. This hand is in tenpai in can complete on either a 9 pin or an S.
To avoid drawing this text in length even more, you can read up on more example at wikipedia or using one of the various search results at google. All of them are a bit more technical though, so I hope the above description suffices.
Algorithm
As stated, I'd like to calculate the shanten number of a hand. My idea was to split the tiles into 4 groups according to their color. Next, all tiles are sorted into sets within their respective groups to we end up with either triplets, pairs or single tiles in the honor group or, additionally, streights in the 3 normal groups. Completed sets are ignored. Pairs are counted, the final number is decremented (we need 1 pair in the end). Single tiles are added to this number. Finally, we divide the number by 2 (since every time we pick a good tile that brings us closer to tenpai, we can get rid of another unwanted tile).
However, I can not prove that this algorithm is correct, and I also have trouble incorporating straights for difficult groups that contain many tiles in a close range. Every kind of idea is appreciated. I'm developing in .NET, but pseudo code or any readable language is welcome, too.
I've thought about this problem a bit more. To see the final results, skip over to the last section.
First idea: Brute Force Approach
First of all, I wrote a brute force approach. It was able to identify 3-shanten within a minute, but it was not very reliable (sometimes too a lot longer, and enumerating the whole space is impossible even for just 3-shanten).
Improvement of Brute Force Approach
One thing that came to mind was to add some intelligence to the brute force approach. The naive way is to add any of the remaining tiles, see if it produced Mahjong, and if not try the next recursively until it was found. Assuming there are about 30 different tiles left and the maximum depth is 6 (I'm not sure if a 7+-shanten hand is even possible [Edit: according to the formula developed later, the maximum possible shanten number is (13-1)*2/3 = 8]), we get (13*30)^6 possibilities, which is large (10^15 range).
However, there is no need to put every leftover tile in every position in your hand. Since every color has to be complete in itself, we can add tiles to the respective color groups and note down if the group is complete in itself. Details like having exactly 1 pair overall are not difficult to add. This way, there are max around (13*9)^6 possibilities, that is around 10^12 and more feasible.
A better solution: Modification of the existing Mahjong Checker
My next idea was to use the code I wrote early to test for Mahjong and modify it in two ways:
don't stop when an invalid hand is found but note down a missing tile
if there are multiple possible ways to use a tile, try out all of them
This should be the optimal idea, and with some heuristic added it should be the optimal algorithm. However, I found it quite difficult to implement - it is definitely possible though. I'd prefer an easier to write and maintain solution first.
An advanced approach using domain knowledge
Talking to a more experienced player, it appears there are some laws that can be used. For instance, a set of 3 tiles does never need to be broken up, as that would never decrease the shanten number. It may, however, be used in different ways (say, either for a 111 or a 123 combination).
Enumerate all possible 3-set and create a new simulation for each of them. Remove the 3-set. Now create all 2-set in the resulting hand and simulate for every tile that improves them to a 3-set. At the same time, simulate for any of the 1-sets being removed. Keep doing this until all 3- and 2-sets are gone. There should be a 1-set (that is, a single tile) be left in the end.
Learnings from implementation and final algorithm
I implemented the above algorithm. For easier understanding I wrote it down in pseudocode:
Remove completed 3-sets
If removed, return (i.e. do not simulate NOT taking the 3-set later)
Remove 2-set by looping through discarding any other tile (this creates a number of branches in the simulation)
If removed, return (same as earlier)
Use the number of left-over single tiles to calculate the shanten number
By the way, this is actually very similar to the approach I take when calculating the number myself, and obviously never to yields too high a number.
This works very well for almost all cases. However, I found that sometimes the earlier assumption ("removing already completed 3-sets is NEVER a bad idea") is wrong. Counter-example: 23566M 25667P 159S. The important part is the 25667. By removing a 567 3-set we end up with a left-over 6 tile, leading to 5-shanten. It would be better to use two of the single tiles to form 56x and 67x, leading to 4-shanten overall.
To fix, we simple have to remove the wrong optimization, leading to this code:
Remove completed 3-sets
Remove 2-set by looping through discarding any other tile
Use the number of left-over single tiles to calculate the shanten number
I believe this always accurately finds the smallest shanten number, but I don't know how to prove that. The time taken is in a "reasonable" range (on my machine 10 seconds max, usually 0 seconds).
The final point is calculating the shanten out of the number of left-over single tiles. First of all, it is obvious that the number is in the form 3*n+1 (because we started out with 14 tiles and always subtracted 3 tiles).
If there is 1 tile left, we're shanten already (we're just waiting for the final pair). With 4 tiles left, we have to discard 2 of them to form a 3-set, leaving us with a single tile again. This leads to 2 additional discards. With 7 tiles, we have 2 times 2 discards, adding 4. And so on.
This leads to the simple formula shanten_added = (number_of_singles - 1) * (2/3).
The described algorithm works well and passed all my tests, so I'm assuming it is correct. As stated, I can't prove it though.
Since the algorithm removes the most likely tiles combinations first, it kind of has a built-in optimization. Adding a simple check if (current_depth > best_shanten) then return; it does very well even for high shanten numbers.
My best guess would be an A* inspired approach. You need to find some heuristic which never overestimates the shanten number and use it to search the brute-force tree only in the regions where it is possible to get into a ready state quickly enough.
Correct algorithm sample: syanten.cpp
Recursive cut forms from hand in order: sets, pairs, incomplete forms, - and count it. In all variations. And result is minimal Shanten value of all variants:
Shanten = Min(Shanten, 8 - * 2 - - )
C# sample (rewrited from c++) can be found here (in Russian).
I've done a little bit of thinking and came up with a slightly different formula than mafu's. First of all, consider a hand (a very terrible hand):
1s 4s 6s 1m 5m 8m 9m 9m 7p 8p West East North
By using mafu's algorithm all we can do is cast out a pair (9m,9m). Then we are left with 11 singles. Now if we apply mafu's formula we get (11-1)*2/3 which is not an integer and therefore cannot be a shanten number. This is where I came up with this:
N = ( (S + 1) / 3 ) - 1
N stands for shanten number and S for score sum.
What is score? It's a number of tiles you need to make an incomplete set complete. For example, if you have (4,5) in your hand you need either 3 or 6 to make it a complete 3-set, that is, only one tile. So this incomplete pair gets score 1. Accordingly, (1,1) needs only 1 to become a 3-set. Any single tile obviously needs 2 tiles to become a 3-set and gets score 2. Any complete set of course get score 0. Note that we ignore the possibility of singles becoming pairs. Now if we try to find all of the incomplete sets in the above hand we get:
(4s,6s) (8m,9m) (7p,8p) 1s 1m 5m 9m West East North
Then we count the sum of its scores = 1*3+2*7 = 17.
Now if we apply this number to the formula above we get (17+1)/3 - 1 = 5 which means this hand is 5-shanten. It's somewhat more complicated than Alexey's and I don't have a proof but so far it seems to work for me. Note that such a hand could be parsed in the other way. For example:
(4s,6s) (9m,9m) (7p,8p) 1s 1m 5m 8m West East North
However, it still gets score sum 17 and 5-shanten according to formula. I also can't proof this and this is a little bit more complicated than Alexey's formula but also introduces scores that could be applied(?) to something else.
Take a look here: ShantenNumberCalculator. Calculate shanten really fast. And some related stuff (in japanese, but with code examples) http://cmj3.web.fc2.com
The essence of the algorithm: cut out all pairs, sets and unfinished forms in ALL possible ways, and thereby find the minimum value of the number of shanten.
The maximum value of the shanten for an ordinary hand: 8.
That is, as it were, we have the beginnings for 4 sets and one pair, but only one tile from each (total 13 - 5 = 8).
Accordingly, a pair will reduce the number of shantens by one, two (isolated from the rest) neighboring tiles (preset) will decrease the number of shantens by one,
a complete set (3 identical or 3 consecutive tiles) will reduce the number of shantens by 2, since two suitable tiles came to an isolated tile.
Shanten = 8 - Sets * 2 - Pairs - Presets
Determining whether your hand is already in tenpai sounds like a multi-knapsack problem. Greedy algorithms won't work - as Dialecticus pointed out, you'll need to consider the entire problem space.

plane bombing problems- help

I'm training code problems, and on this one I am having problems to solve it, can you give me some tips how to solve it please.
The problem is taken from here:
https://www.ieee.org/documents/IEEEXtreme2008_Competitition_book_2.pdf
Problem 12: Cynical Times.
The problem is something like this (but do refer to above link of the source problem, it has a diagram!):
Your task is to find the sequence of points on the map that the bomber is expected to travel such that it hits all vital links. A link from A to B is vital when its absence isolates completely A from B. In other words, the only way to go from A to B (or vice versa) is via that link.
Due to enemy counter-attack, the plane may have to retreat at any moment, so the plane should follow, at each moment, to the closest vital link possible, even if in the end the total distance grows larger.
Given all coordinates (the initial position of the plane and the nodes in the map) and the range R, you have to determine the sequence of positions in which the plane has to drop bombs.
This sequence should start (takeoff) and finish (landing) at the initial position. Except for the start and finish, all the other positions have to fall exactly in a segment of the map (i.e. it should correspond to a point in a non-hit vital link segment).
The coordinate system used will be UTM (Universal Transverse Mercator) northing and easting, which basically corresponds to a Euclidian perspective of the world (X=Easting; Y=Northing).
Input
Each input file will start with three floating point numbers indicating the X0 and Y0 coordinates of the airport and the range R. The second line contains an integer, N, indicating the number of nodes in the road network graph. Then, the next N (<10000) lines will each contain a pair of floating point numbers indicating the Xi and Yi coordinates (1 < i<=N). Notice that the index i becomes the identifier of each node. Finally, the last block starts with an integer M, indicating the number of links. Then the next M (<10000) lines will each have two integers, Ak and Bk (1 < Ak,Bk <=N; 0 < k < M) that correspond to the identifiers of the points that are linked together.
No two links will ever cross with each other.
Output
The program will print the sequence of coordinates (pairs of floating point numbers with exactly one decimal place), each one at a line, in the order that the plane should visit (starting and ending in the airport).
Sample input 1
102.3 553.9 0.2
14
342.2 832.5
596.2 638.5
479.7 991.3
720.4 874.8
744.3 1284.1
1294.6 924.2
1467.5 659.6
1802.6 659.6
1686.2 860.7
1548.6 1111.2
1834.4 1054.8
564.4 1442.8
850.1 1460.5
1294.6 1485.1
17
1 2
1 3
2 4
3 4
4 5
4 6
6 7
7 8
8 9
8 10
9 10
10 11
6 11
5 12
5 13
12 13
13 14
Sample output 1
102.3 553.9
720.4 874.8
850.1 1460.5
102.3 553.9
Pre-process the input first, so you identify the choke points. Algorithms like Floyd-Warshall would help you.
Model the problem as a Heuristic Search problem, you can compute a MST which covers all choke-points and take the sum of the costs of the edges as a heuristic.
As the commenters said, try to make concrete questions, either here or to the TA supervising your class.
Don't forget to mention where you got these hints.
The problem can be broken down into two parts.
1) Find the vital links.
These are nothing but the Bridges in the graph described. See the wiki page (linked to in the previous sentence), it mentions an algorithm by Tarjan to find the bridges.
2) Once you have the vital links, you need to find the smallest number of points which given the radius of the bomb, will cover the links. For this, for each link, you create a region around it, where dropping the bomb will destroy it. Now you form a graph of these regions (two regions are adjacent if they intersect). You probably need to find a minimum clique partition in this graph.
Haven't thought it through (especially part 2), but hope it helps.
And good luck in the contest!
I think Moron' is right about the first part, but on the second part...
The problem description does not tell anything about "smallest number of points". It tells that the plane flies to the closest vital link.
So, I think the part 2 will be much simpler:
Find the closest non-hit segment to the current location.
Travel to the closest point on the closest segment.
Bomb the current location (remove all segments intersecting a circle)
Repeat until there are no non-hit vital links left.
This straight-forward algorithm has a complexity of O(N*N), but this should be sufficient considering input constraints.

Resources