Understanding the AStar Algorithm

From this link: Link
If an adjacent square is already on the open list, check to see if
this path to that square is a better one. In other words, check to see
if the G score for that square is lower if we use the current square
to get there. If not, don’t do anything.
Example:
parent (already traversed): O
branches: A, B, C
parent (working on): A
Branches: B, D
The open list contains A, B, C and now D.
Now, the bold statements in the above quote compare path A to B with which path?
i.e.
Comparison of A to B && O to B
OR
Comparison of A to B && A to D
Please clarify.

Basic A* is:
While we're not close enough to the / a destination
Take the point that we have now, with the lowest expected total cost (so known cost up to this point plus estimated remaining cost).
Calculate cost for all surrounding locations & add them back to the priority queue with their expected total cost.
return route that we followed to the closest point
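A minimal Python sketch of that loop. It assumes a 4-connected grid of strings where '#' marks a blocked square, every step costs 1, and Manhattan distance is the H-score; the grid format and the `astar` name are illustrative, not from the answer:

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* on a 4-connected grid where '#' is a wall and every step costs 1."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: never overestimates the remaining cost on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g: Dict[Cell, int] = {start: 0}                 # known cost so far (G-score)
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    open_heap = [(h(start), 0, start)]              # (G + H, G, cell), lowest first
    closed = set()
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue                                # stale entry, a cheaper one was used
        if cur == goal:                             # walk the parent links back to start
            path = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(cur)
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == '#':
                continue
            new_g = cost + 1
            if new_g < g.get(nxt, float('inf')):    # only keep a cheaper route
                g[nxt] = new_g
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None                                     # goal unreachable
```

Stale heap entries are skipped instead of being decreased in place, which is a common way to use `heapq` as the priority queue.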
In the official A* terminology, G-score is the cost to get there. H-score is the estimate to get from there to where you want to go.
At the extremes: if your H-score always overestimates, the new score (estimate + new cost) will always be lower than the previous square's estimate + cost, so you'll beeline to the target. If your H-score always underestimates (or is 0, or whatever), you'll always prefer squares closer to your point of departure, as they have a lower cost so far, so you'll basically flood-fill from that position.
You can use A* from a theoretical point of view or from a practical point of view. Theoretically, the heuristic may never overestimate the remaining cost to the goal (it must be admissible). That means you will always find the most efficient route, but you will take longer to find it, as you'll expand a lot more nodes.
Using it from the practical point of view allows slightly inadmissible heuristics (ones that can overestimate slightly). The route you find will most likely be slightly suboptimal, but it shouldn't be too bad for game use. The calculations get much faster, though, since you don't expand everything anymore. One (slow) implementation I made took 6 minutes across a 1k*1k map with a regular (Pythagorean) distance heuristic, but only a few seconds with that heuristic multiplied by 3. The routes weren't too different. Multiplying the heuristic by 5 made the routes basically a beeline and faster still, but unusably so.
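To make the times-3 trick concrete, here is a hedged sketch of weighted A* (f = G + w * H) on an empty square grid that counts expanded squares. The grid, the `weighted_astar` name and the weights are illustrative assumptions, and closed squares are not reopened, which is a common simplification:

```python
import heapq


def weighted_astar(size, walls, start, goal, w=1.0):
    """A* with f = G + w * H on a size x size 4-connected grid.

    Returns (route cost, number of expanded squares). Closed squares are not
    reopened, a common simplification when w > 1.
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}
    heap = [(w * h(start), 0, start)]
    closed = set()
    while heap:
        _, cost, cur = heapq.heappop(heap)
        if cur in closed:
            continue
        closed.add(cur)
        if cur == goal:
            return cost, len(closed)
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
                continue
            if cost + 1 < g.get(nxt, float('inf')):
                g[nxt] = cost + 1
                heapq.heappush(heap, (cost + 1 + w * h(nxt), cost + 1, nxt))
    return None, len(closed)


optimal = weighted_astar(20, set(), (0, 0), (19, 19), w=1.0)
inflated = weighted_astar(20, set(), (0, 0), (19, 19), w=3.0)
print(optimal, inflated)   # (38, 400) (38, 39)
```

On this empty grid every square ties on G + H when w = 1, so all of them are expanded, while w = 3 walks almost straight to the goal. Here both routes have the same length; with obstacles the inflated version can return a longer one.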
WRT your direct question:
It's the second square you evaluate. You have the roads from O to A,B,C planned (with a given cost G for each of those routes). Now, suppose there's a highway from O through A to B, instead of a dirt road from O to B. That means that going through A is faster. When expanding A it looks at all surrounding squares and adds the cost + heuristic to the open list if it wasn't in there already. If it was in there, it sees if the new route is faster (lower G to get to that square) and only if so, does it replace it.
So in your example, it adds O->A->D to the list and then checks if O->A->B is faster than O->B.
-- addendum
Open list: 2 / A (through O), 5 / B (through O), 7 / C (through O)
Closed list: 0 / O (origin / through nothing)
Take A as it's the lowest cost so far. Then, for each neighbor of A calculate the cost to get there.
If there's an entry for it already and the cost is lower than we know so far, replace the entry.
If there's no current entry, add it.
Working on A, with roads of cost 2 to B and 3 to D. Going to B through A has a cost of 4, where the current entry is 5. So we replace it. D isn't in there, so we add it.
Open list: 4 / B (through A), 5 / D (through A), 7 / C (through O)
Closed list: 0 / O (origin / through nothing), 2 / A (through O)
So we compared A to B versus O to B.
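The addendum's bookkeeping can be replayed in a few lines of Python. This sketch keeps the open list as a dict of square -> (G, parent) and, like the addendum, orders by G alone (H = 0); the `EDGES` table just encodes the road costs from the example:

```python
# Road costs from the addendum's example.
EDGES = {'O': {'A': 2, 'B': 5, 'C': 7}, 'A': {'B': 2, 'D': 3}}


def expand(node, g_node, open_list, closed):
    """Relax every neighbour of `node`; open_list maps square -> (G, parent)."""
    for nbr, step in EDGES.get(node, {}).items():
        if nbr in closed:
            continue
        new_g = g_node + step
        # Add the square if it is new, or replace its entry if this route is cheaper.
        if nbr not in open_list or new_g < open_list[nbr][0]:
            open_list[nbr] = (new_g, node)


def run_example():
    closed = {'O': (0, None)}
    open_list = {}
    expand('O', 0, open_list, closed)                        # A=2, B=5, C=7, all via O
    best = min(open_list, key=lambda sq: open_list[sq][0])   # 'A', lowest G so far
    g_best, parent = open_list.pop(best)
    closed[best] = (g_best, parent)
    expand(best, g_best, open_list, closed)                  # B: 2 + 2 = 4 < 5, replaced
    return open_list, closed


open_list, closed = run_example()
print(open_list)   # {'B': (4, 'A'), 'C': (7, 'O'), 'D': (5, 'A')}
```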

Well, if we are working on the node A, we are considering its neighbours. Say we are considering the B node now (which is in the openList).
The definition given tells us to compare the G value of B (computed when B was first added to the open list, while the working node was O) with the sum of the cost of reaching A from the beginning (O) and the cost of reaching B from A.

Related

is this the best complexity I can get?

The problem goes as follows:
You have n domino pieces with two numbers on each piece. There is also an extra set of m domino pieces, of which you may use at most one, to help you do the following:
Calculate the minimum number of domino pieces you need to go from a given starting point S to an ending point D,
meaning that the starting piece must have the number S and the ending piece must have the number D.
Input:
n and the domino pieces' numbers (n pairs).
m and the extra domino pieces' numbers (m pairs).
starting point S and a destination D.
Output:
the minimum number of domino pieces.
I am thinking of using BFS for this problem: start from S and find the shortest path to D, repeatedly removing extra piece m(i) from the graph and adding extra piece m(i+1).
But then the time complexity would be O(n * m).
Moreover, there could be multiple starting points S, so the complexity would be O(|S| * n * m).
Can it be solved in a better way?
The teacher said it could be solved in linear time, but I am just very confused.
I initially missed that your question allows multiple sources, and wrote a somewhat long answer explaining how to approach that problem. Let me post it here anyway, because it might still be helpful. Scroll further for the solution to the original question.
Finding shortest paths from single S to D in linear time
Let's build the idea incrementally. First, let's try to solve a simpler version of the problem, where we just need to find out whether you can get from a single S to a single D at all by using at most one domino from the set of extra M dominoes.
I suggest approaching it this way: do some preprocessing on the N dominoes that lets you, for each of the M additional dominoes, answer quickly (in constant time) whether there exists a path from S to D that goes through this domino. (And of course we need to remember the edge case when we don't need an extra domino at all, but it's easy to cover in linear time.)
What kind of information lets you answer this question? Let's say you are looking at a domino with numbers A and B on its ends. If you knew that you can get from S to A, and from B to D, you can use this domino to get from S to D, right? Alternatively, if there was a path from S to B and from A to D, it would also work. If neither is true, then there is no way this domino can help you get from S to D.
That's great, but if we run a BFS from every possible B, we won't achieve linear time complexity. However, note that you can reverse the second problem (detecting whether a path from each B to D exists) and pose it as "can I get from D to every possible B?" Since a domino can be laid in either direction, that is easily answered with a single BFS.
Can you see how this approach can be adapted to find length of the shortest path through each domino, as opposed to just detecting if a path exists?
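One way to answer that question, sketched in Python under the assumption that each domino (a, b) is an undirected edge between the numbers a and b and that the answer counts pieces (edges): one BFS from S and one from D give, for every extra domino, the best length through it in constant time, so the whole thing is O(n + m). The function names are illustrative:

```python
from collections import defaultdict, deque


def bfs(adj, src):
    """Number of core pieces needed to reach each number from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


def min_pieces(core, extra, s, d):
    """Fewest pieces from S to D using at most one extra piece, or None."""
    adj = defaultdict(list)
    for a, b in core:
        adj[a].append(b)
        adj[b].append(a)
    from_s = bfs(adj, s)
    to_d = bfs(adj, d)            # undirected, so distances from D equal distances to D
    best = from_s.get(d, float('inf'))           # edge case: no extra piece needed
    for a, b in extra:
        for x, y in ((a, b), (b, a)):            # the extra piece can be laid either way
            if x in from_s and y in to_d:
                best = min(best, from_s[x] + 1 + to_d[y])
    return None if best == float('inf') else best
```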
Finding shortest paths from multiple S to D in linear time
Let's reverse the problem and say we want to find the shortest paths from D to multiple S. You could create a directed graph with twice as many nodes as there were unique numbers written on dominoes. That is, for each number there are nodes V and V', and if you are in V, it means you haven't used an extra domino yet, but if you are in V', it means you already used one. Each core (that is, one of the original N) domino (A, B) corresponds to 4 edges in this graph: (A -> B), (B -> A), (A' -> B'), (B' -> A'). Each extra domino corresponds to 2 edges: (A -> B'), (B -> A'). Note that once we get into a node with ', we can never get out of it, so we will only use at most one extra domino this way. A single BFS from D in this graph will answer the problem.
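A sketch of that two-layer graph in Python, assuming each domino is an edge between its two numbers and lengths count pieces. Node (v, 0) plays the role of V and (v, 1) of V'; one BFS from (D, 0) then answers every source S by looking at (S, 0) and (S, 1). The names are illustrative:

```python
from collections import defaultdict, deque


def dist_from_d(core, extra, d):
    """BFS distances from (d, 0) in the layered graph described above."""
    adj = defaultdict(list)
    for a, b in core:
        for layer in (0, 1):              # core pieces work before and after the extra one
            adj[(a, layer)].append((b, layer))
            adj[(b, layer)].append((a, layer))
    for a, b in extra:                    # an extra piece is the only way into layer 1
        adj[(a, 0)].append((b, 1))
        adj[(b, 0)].append((a, 1))
    dist = {(d, 0): 0}
    q = deque([(d, 0)])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


def answer(dist, s):
    """Fewest pieces between s and d with at most one extra piece, or None."""
    best = min(dist.get((s, 0), float('inf')), dist.get((s, 1), float('inf')))
    return None if best == float('inf') else best
```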

Pathfinding task - how can I find next vertex on the shortest path from A to B faster that O ( n )?

I have a quite tricky task to solve:
You are given an N * M board (1 <= N, M <= 256). You can move from each field to its neighbouring field (moving diagonally is not allowed). At the beginning, there are two types of fields: active and blocked. You can pass through an active field, but you can't go onto a blocked one. You have Q queries (1 <= Q <= 200). There are two types of queries:
1) find the next field (neighbouring to A) that lies on the shortest path from field A to B
2) change field A from active to blocked or conversely.
The first type query can be easily solved with simple BFS in O(N * M) time. We can represent active and blocked fields as 0 or 1, so the second query could be done in constant time.
The total time of that algorithm would be O(Q (number of queries) * N * M).
So what's the problem? I have 1/60 of a second to solve all the queries. If we consider 1 second as 10^8 calculations, we are left with about 1.5 * 10^6 calculations. One BFS may take up to N * M * 4 steps, which is about 2.5 * 10^5. So if Q is 200, the needed calculations may be up to 5 * 10^7, which is way too slow.
As far as I know, there is no better pathfinding algorithm than BFS in this case (well, I could go for A*, but I'm not sure it's much quicker than BFS; it's still worst-case O(|E|), according to Wikipedia). So there's not much to optimize in this area. However, I could change my graph in some way to reduce the number of edges the algorithm has to process (I don't need to know the full shortest path, only the next move I should make, so the rest of the shortest path can be very simplified). I was thinking about some preprocessing, grouping vertices into groups and making a graph of graphs, but I'm not sure how to handle the blocked fields in that way.
How can I optimize it better? Or is it even possible?
EDIT: The actual problem: I have some units on the board, and I want to start moving them to a selected destination. Units can't share the same field, so one can block others' paths or open new, better paths for them. There can be a lot of units, which is why I need a better optimization.
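For reference, the O(N * M) baseline for a type-1 query can be sketched as a single BFS started from B: the first time the search reaches A, the field it came from is one step closer to B, so that field is the next move. This is a hedged sketch using the question's 0 = active / 1 = blocked encoding (the `next_step` name is illustrative); it does not address the speed problem itself:

```python
from collections import deque


def next_step(grid, a, b):
    """Return a neighbour of a on some shortest path a -> b, or None.

    grid[r][c] == 0 means active, 1 means blocked. Field a itself is accepted
    even if marked blocked (for example, because a unit is standing on it).
    """
    if a == b:
        return None
    rows, cols = len(grid), len(grid[0])
    seen = {b}
    q = deque([b])
    while q:
        u = q.popleft()
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if v == a:
                return u          # BFS from b first touches a from u, one step closer to b
            vr, vc = v
            if 0 <= vr < rows and 0 <= vc < cols and grid[vr][vc] == 0 and v not in seen:
                seen.add(v)
                q.append(v)
    return None                   # b cannot be reached from a
```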
If I understand the problem correctly, you want to find the shortest path on a grid from A to B, with the added ability that your path-finder can remove walls for an additional movement cost?
You can treat this as a directed graph problem, where you can move into any wall node for a cost of 2 and into any normal node for a cost of 1. Then just use any directed-graph pathfinding algorithm such as Dijkstra's or A* (the usual heuristic, Manhattan distance, will still work).
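A minimal Dijkstra sketch of that model in Python, assuming a grid of strings where '#' is a wall: stepping onto a wall costs 2 and onto an open square costs 1, as the answer describes. The function name is illustrative:

```python
import heapq


def wall_breaking_cost(grid, start, goal):
    """Dijkstra where entering '.' costs 1 and entering '#' costs 2."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue                                  # outdated heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (2 if grid[nr][nc] == '#' else 1)
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None
```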

Dynamic graph algorithm shortest path

I have a problem which I am converting into a TSP kind of problem so that I can explain it here and I am looking for any existing algorithms that might help me.
There are a list of places that I need to visit and I need to visit them all.
There are some places that have to be among the first x of n places visited (i.e. they need to be in the first 3 or first 5 places visited, where the number is arbitrary).
There are some other places that have to be among the last y of n places visited (i.e. they need to be in the last 3 or last 5 places visited).
The places could be categorized (some may not have a category). Places in the same category need to be visited as far away from each other as possible (i.e. if 2 places are categorized as green, then I would like to visit as many places from other categories as possible between these green places).
Here is an example list:
A: category green: last 3
B: category none: ordering none
C: category pink: first 3
D: category none: ordering none
E: category none: last 3
F: category green: ordering none
G: category pink: first 3
The order I would like to come up with is:
G(pink,first3) -> F(green,none) -> C(pink,first3) -> D(none,none) -> B(none,none) -> E(none,last3) -> A(green,last3)
Explanation:
G came first, to keep it as far away from C as possible.
F came next to keep it as far away from A as possible.
C came next as it needed to be in first 3. C and G could be interchanged
D and B could be placed anywhere.
E came next as it had to be last 3
A came last as it had to be last 3 and by placing it at the end, it was as far as possible from F.
My idea is to evaluate each edge cost, where the edge cost would be calculated dynamically. So if you tried to visit A and then F, it would have a high cost, as opposed to visiting A, then some other place, and then F (where the number of places in between would somehow be part of the cost). Also, I would introduce a start and an end place, so that if I had to visit some places among the first x, I could give them a low cost if start was within N places of that place. Same for the end.
I was wondering if there is a graph algorithm that can account for such dynamic weights/cost and determine the shortest path from start to end?
Note: in some cases a best case may not be available, and that would be OK, as long as I can show that the cost is high because there wasn't enough category separation (e.g. all places were in the same category).
Brute force algorithm
My initial idea was: given a list of places, come up with all possible orderings, calculate the cost of each, and choose the cheapest. (But this means evaluating n! orderings, which for 8 places is 40,320 orders I would have to evaluate. Why 8? Because that is what I believe will be the average number of places to evaluate.)
But is there an algorithm that I could use to determine it without testing all orderings?
One thing you could do:
Order the places as follows: first 1, ... first n, unordered, last n, ... last 1.
Go through the list and separate elements with the same color where possible, without violating the previous order
Calculate the cost of this list and store it as the current best
Use this list to determine the order in which you evaluate permutations
While you build permutations, keep track of the cost
Abort building the current permutation when the cost exceeds the current best (including the theoretical minimum cost for the remaining places, if there is any)
Also abort when you have the theoretically possible best score.
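A hedged Python sketch of this branch and bound. The cost function is an assumption, since the question leaves it open: each pair of same-category places `gap` positions apart costs `n - gap`, so closer pairs cost more and the partial cost never decreases, which makes it a valid bound. The 'first'/'last' rules are treated as hard constraints, and all names are illustrative:

```python
def allowed(constraint, pos, n):
    """Hard ordering rules: ('first', k) means pos < k, ('last', k) means pos >= n - k."""
    if constraint is None:
        return True
    kind, k = constraint
    return pos < k if kind == 'first' else pos >= n - k


def pair_penalty(gap, n):
    """Hypothetical cost for two same-category places `gap` positions apart."""
    return n - gap


def best_order(places):
    """Branch and bound over orderings; places are (name, category, constraint)."""
    n = len(places)
    rank = {'first': 0, None: 1, 'last': 2}
    # Evaluation order from the answer: first-constrained, unconstrained, last-constrained.
    items = sorted(places, key=lambda p: rank[p[2][0] if p[2] else None])
    best = [float('inf'), None]
    order, used = [], [False] * n

    def extend(cost):
        if cost >= best[0]:
            return        # penalties are never negative, so this branch cannot win
        pos = len(order)
        if pos == n:
            best[0], best[1] = cost, [p[0] for p in order]
            return
        for i, p in enumerate(items):
            if used[i] or not allowed(p[2], pos, n):
                continue
            added = sum(pair_penalty(pos - j, n) for j, q in enumerate(order)
                        if p[1] is not None and q[1] == p[1])
            used[i] = True
            order.append(p)
            extend(cost + added)
            order.pop()
            used[i] = False
            if best[0] == 0:
                return    # zero penalty is the theoretical best, stop searching

    extend(0)
    return best[0], best[1]
```

With this cost, the question's example list has a minimum of 7, which the question's own order G F C D B E A also reaches.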

Match every point in two different sized sets with minimum total line length

I have two sets of points plotted in a coordinate system. Each point in a set must be matched to at least one point at the other set, in a way that the sum of the length of the lines drawn by joining those points should be as low as possible. To make it clear, line drawing is just an abstraction, the actual output is just the pairs of points that must be matched.
I've seen this question about a similar problem, except that in my case there's no single-link restriction since the sets may have different sizes. Is there any kind of problem that describes this situation? More specifically, what algorithm could I use to solve this, assuming each set may have a maximum of 10 points?
Algorithm
You can model this as a network flow problem.
By having a source of 1 at each point in the first set, and a sink of 1 at each point in the second set, plus an extra node 'dest' for any left over capacity, any valid flow will always connect every point.
Make edges between the points with cost according to the distance between the points.
So far we have a network whose solution will be the lowest cost matching of set 1 to set 2 (i.e. each point will have a single link).
To allow multiple links you can simply make the following additions:
add 0 weight edges between each point in set2 and 'dest' (this allows points in set 2 to be multiply connected)
add 0 weight edges between 'dest' and each point in set1 (this allows points in set 1 to be multiply connected)
Example Python code using Networkx
import networkx as nx
import random

G = nx.DiGraph()
set1 = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
set2 = ['a', 'b', 'c']
# Assume set1 > set2 (or swap sets)
assert len(set1) >= len(set2)
# 'dest' absorbs the surplus flow so that demands sum to zero
G.add_node('dest', demand=len(set1) - len(set2))
for person in set1:
    G.add_node(person, demand=-1)  # each set1 point sends one unit
    G.add_edge('dest', person, weight=0)  # lets a set1 point take extra links
    for project in set2:
        cost = random.randint(1, 10)  # Assign appropriate costs here
        G.add_edge(person, project, weight=cost)  # Edge taken if person does this project
for project in set2:
    G.add_node(project, demand=1)  # each set2 point receives at least one unit
    G.add_edge(project, 'dest', weight=0)  # lets a set2 point take extra links
flowdict = nx.min_cost_flow(G)
for person in set1:
    for project, flow in flowdict[person].items():
        if flow:
            print(person, '->', project)
You can use a discrete optimization approach (Integer Programming).
We have two sets A, of size X, and B, of size Y. This means a maximum of X*Y links, each described by a boolean variable: L(i,j) = L(Y*i+j) is 1 if nodes A(i) and B(j) are linked, 0 if not. If X = Y = 10, we can write link L(7,3) as L73.
We can rewrite the problem like this:
Node A(i) has at least one link: X (say, ten) criteria with i from 0 to X-1, each of them comprised of Y components:
L(i,0)+L(i,1)+L(i,2)+...+L(i,Y-1) >= 1
Node B(j) has at least one link, and there are Y criteria made up of X components:
L(0,j)+L(1,j)+L(2,j)+...+L(X-1,j) >= 1
The minimal cost requirement becomes:
cost = C(0,0)*L(0,0) + C(0,1)*L(0,1) + ... + C(X-1,Y-1)*L(X-1,Y-1), which is to be minimized
With these conventions, we can easily build the matrices for an ILP problem, that can be passed to our favorite ILP solving package or library (C, Java, Python, even PHP).
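A small Python sketch of building those matrices: one variable per link in the L(Y*i+j) order and one ">= 1" row per node. For the demo, a brute-force enumeration stands in for a real ILP solver; it only works for tiny X*Y, and for the 10 x 10 case you would hand `c`, `rows` and `rhs` to an actual ILP package. The names are illustrative:

```python
from itertools import product


def build_ilp(C):
    """Return (c, rows, rhs) for: minimise c.L subject to rows.L >= rhs, L binary."""
    X, Y = len(C), len(C[0])

    def idx(i, j):
        return Y * i + j                  # L(i,j) = L(Y*i+j), as in the text

    c = [C[i][j] for i in range(X) for j in range(Y)]
    rows = []
    for i in range(X):                    # node A(i) has at least one link
        row = [0] * (X * Y)
        for j in range(Y):
            row[idx(i, j)] = 1
        rows.append(row)
    for j in range(Y):                    # node B(j) has at least one link
        row = [0] * (X * Y)
        for i in range(X):
            row[idx(i, j)] = 1
        rows.append(row)
    return c, rows, [1] * len(rows)


def solve_by_enumeration(c, rows, rhs):
    """Stand-in for a real ILP solver: try every 0/1 vector (tiny instances only)."""
    best = None
    for L in product((0, 1), repeat=len(c)):
        if all(sum(a * x for a, x in zip(row, L)) >= b for row, b in zip(rows, rhs)):
            cost = sum(ci * x for ci, x in zip(c, L))
            if best is None or cost < best[0]:
                best = (cost, L)
    return best
```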
====
A self-contained "greedy" algorithm which is not guaranteed to find a minimum, but is reasonably quick and should give reasonable results unless you feed it a pathological data set, is:
- connect all points in the smaller set, each to its nearest point in the other set.
- connect all unconnected points remaining in the larger set, each to its nearest point in the first set, whether it's already connected or not.
As an optimization, you can then enumerate the points in the larger data set; if one of them (say A) is singly connected to a point in the first data set (say B) which is multiply connected, and is not its nearest neighbour C, you can switch the link from A-B to A-C. This takes care of one of the simplest problems that may arise from the "greediness" of the algorithm.
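The greedy steps and the switching optimization, sketched in Python under the assumption that points are (x, y) tuples and Euclidean distance is the line length (`math.dist` needs Python 3.8+). The function name is illustrative:

```python
import math


def greedy_links(small, large):
    """Greedy covering links between two point sets; fast but not guaranteed minimal.

    Links are (index into small, index into large).
    """
    def nearest(p, pts):
        return min(range(len(pts)), key=lambda k: math.dist(p, pts[k]))

    links = {(i, nearest(p, large)) for i, p in enumerate(small)}     # step 1
    covered = {j for _, j in links}
    for j, q in enumerate(large):                                      # step 2
        if j not in covered:
            links.add((nearest(q, small), j))
    # Optimisation: a larger-set point linked only to a multiply linked point B
    # that is not its nearest neighbour C is moved over to C.
    for j, q in enumerate(large):
        mine = [i for i, jj in links if jj == j]
        if len(mine) != 1:
            continue
        b, c = mine[0], nearest(q, small)
        if c != b and sum(1 for i, _ in links if i == b) > 1:
            links.remove((b, j))
            links.add((c, j))
    return links
```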

How do I calculate the shanten number in mahjong?

This is a followup to my earlier question about deciding if a hand is ready.
Knowledge of mahjong rules would be excellent, but a poker- or romme-based background is also sufficient to understand this question.
In Mahjong 14 tiles (tiles are like cards in Poker) are arranged into 4 sets and a pair. A straight ("123") always uses exactly 3 tiles, not more and not less. A set of the same kind ("111") consists of exactly 3 tiles, too. This leads to a sum of 3 * 4 + 2 = 14 tiles.
There are various exceptions like Kan or Thirteen Orphans that are not relevant here. Colors and value ranges (1-9) are also not important for the algorithm.
A hand consists of 13 tiles, every time it's our turn we get to pick a new tile and have to discard any tile so we stay on 13 tiles - except if we can win using the newly picked tile.
A hand that can be arranged to form 4 sets and a pair is "ready". A hand that requires only 1 tile to be exchanged is said to be "tenpai", or "1 from ready". Any other hand has a shanten-number which expresses how many tiles need to be exchanged to be in tenpai. So a hand with a shanten number of 1 needs 1 tile to be tenpai (and 2 tiles to be ready, accordingly). A hand with a shanten number of 5 needs 5 tiles to be tenpai and so on.
I'm trying to calculate the shanten number of a hand. After googling around for hours and reading multiple articles and papers on this topic, this seems to be an unsolved problem (except for the brute force approach). The closest algorithm I could find relied on chance, i.e. it was not able to detect the correct shanten number 100% of the time.
Rules
I'll explain a bit about the actual rules (simplified) and then my idea of how to tackle this task. In mahjong, there are 4 colors: 3 normal ones, like the suits in card games (spades, hearts, ...), that are called "man", "pin" and "sou". These colors run from 1 to 9 each and can be used to form straights as well as groups of the same kind. The fourth color is called "honors" and can be used for groups of the same kind only, not for straights. The seven honors will be called "E, S, W, N, R, G, B".
Let's look at an example of a tenpai hand: 2p, 3p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E. Next we pick an E. This is a complete mahjong hand (ready) and consists of a 2-4 pin straight (remember, pins can be used for straights), a 3 pin triple, a 5 man triple, a W triple and an E pair.
Changing our original hand slightly to 2p, 2p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E, we get a hand in 1-shanten, i.e. it requires an additional tile to be tenpai. In this case, exchanging a 2p for a 3p brings us back to tenpai, so by drawing a 3p and an E we win.
1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W, W is a hand in 2-shanten. There is 1 completed triplet and 5 pairs. We need only one pair in the end, so once we pick one of 1p, 5p, 9p, S or W, we need to discard a tile from one of the other pairs. Example: we pick a 1 pin and discard a W. The hand is now in 1-shanten and looks like this: 1p, 1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W. Next, we wait for a 5p, 9p or S. Assuming we pick a 5p and discard the leftover W, we get this: 1p, 1p, 1p, 5p, 5p, 5p, 9p, 9p, E, E, E, S, S. This hand is in tenpai and can complete on either a 9 pin or an S.
To avoid making this text even longer, you can read more examples on Wikipedia or in one of the various Google search results. All of them are a bit more technical, though, so I hope the above description suffices.
Algorithm
As stated, I'd like to calculate the shanten number of a hand. My idea was to split the tiles into 4 groups according to their color. Next, all tiles are sorted into sets within their respective groups, so we end up with triplets, pairs or single tiles in the honor group or, additionally, straights in the 3 normal groups. Completed sets are ignored. Pairs are counted, and the final count is decremented (we need 1 pair in the end). Single tiles are added to this number. Finally, we divide the number by 2 (since every time we pick a good tile that brings us closer to tenpai, we can get rid of another unwanted tile).
However, I can not prove that this algorithm is correct, and I also have trouble incorporating straights for difficult groups that contain many tiles in a close range. Every kind of idea is appreciated. I'm developing in .NET, but pseudo code or any readable language is welcome, too.
I've thought about this problem a bit more. To see the final results, skip over to the last section.
First idea: Brute Force Approach
First of all, I wrote a brute force approach. It was able to identify 3-shanten within a minute, but it was not very reliable (sometimes it took a lot longer, and enumerating the whole space is impossible even for just 3-shanten).
Improvement of Brute Force Approach
One thing that came to mind was to add some intelligence to the brute force approach. The naive way is to add any of the remaining tiles, see if it produced Mahjong, and if not try the next recursively until it was found. Assuming there are about 30 different tiles left and the maximum depth is 6 (I'm not sure if a 7+-shanten hand is even possible [Edit: according to the formula developed later, the maximum possible shanten number is (13-1)*2/3 = 8]), we get (13*30)^6 possibilities, which is large (10^15 range).
However, there is no need to put every leftover tile in every position in your hand. Since every color has to be complete in itself, we can add tiles to the respective color groups and note down if the group is complete in itself. Details like having exactly 1 pair overall are not difficult to add. This way, there are max around (13*9)^6 possibilities, that is around 10^12 and more feasible.
A better solution: Modification of the existing Mahjong Checker
My next idea was to use the code I wrote earlier to test for Mahjong and modify it in two ways:
don't stop when an invalid hand is found but note down a missing tile
if there are multiple possible ways to use a tile, try out all of them
This should be the optimal idea, and with some heuristic added it should be the optimal algorithm. However, I found it quite difficult to implement - it is definitely possible though. I'd prefer an easier to write and maintain solution first.
An advanced approach using domain knowledge
Talking to a more experienced player, it appears there are some laws that can be used. For instance, a set of 3 tiles never needs to be broken up, as that would never decrease the shanten number. It may, however, be used in different ways (say, either for a 111 or a 123 combination).
Enumerate all possible 3-sets and create a new simulation for each of them. Remove the 3-set. Now create all 2-sets in the resulting hand and simulate for every tile that improves them to a 3-set. At the same time, simulate for any of the 1-sets being removed. Keep doing this until all 3- and 2-sets are gone. There should be a 1-set (that is, a single tile) left in the end.
Learnings from implementation and final algorithm
I implemented the above algorithm. For easier understanding I wrote it down in pseudocode:
Remove completed 3-sets
If removed, return (i.e. do not simulate NOT taking the 3-set later)
Remove 2-set by looping through discarding any other tile (this creates a number of branches in the simulation)
If removed, return (same as earlier)
Use the number of left-over single tiles to calculate the shanten number
By the way, this is actually very similar to the approach I take when calculating the number myself, and it obviously never yields too high a number.
This works very well for almost all cases. However, I found that sometimes the earlier assumption ("removing already completed 3-sets is NEVER a bad idea") is wrong. Counter-example: 23566M 25667P 159S. The important part is the 25667. By removing a 567 3-set we end up with a left-over 6 tile, leading to 5-shanten. It would be better to use two of the single tiles to form 56x and 67x, leading to 4-shanten overall.
To fix this, we simply have to remove the wrong optimization, leading to this code:
Remove completed 3-sets
Remove 2-set by looping through discarding any other tile
Use the number of left-over single tiles to calculate the shanten number
I believe this always accurately finds the smallest shanten number, but I don't know how to prove that. The time taken is in a "reasonable" range (on my machine 10 seconds max, usually 0 seconds).
The final point is calculating the shanten number from the number of left-over single tiles. First of all, the number is of the form 3*n+1 (because we started out with 13 tiles and always subtracted 3 tiles).
If there is 1 tile left, we're tenpai already (we're just waiting for the final pair). With 4 tiles left, we have to discard 2 of them to form a 3-set, leaving us with a single tile again. This leads to 2 additional discards. With 7 tiles, we have 2 times 2 discards, adding 4. And so on.
This leads to the simple formula shanten_added = (number_of_singles - 1) * (2/3).
The described algorithm works well and passed all my tests, so I'm assuming it is correct. As stated, I can't prove it though.
Since the algorithm removes the most likely tile combinations first, it has a kind of built-in optimization. With a simple check added (if (current_depth > best_shanten) then return;), it does very well even for high shanten numbers.
My best guess would be an A* inspired approach. You need to find some heuristic which never overestimates the shanten number and use it to search the brute-force tree only in the regions where it is possible to get into a ready state quickly enough.
Correct algorithm sample: syanten.cpp
Recursively cut forms out of the hand in this order: sets, pairs, incomplete forms, and count them, in all variations. The result is the minimal shanten value over all variants:
Shanten = Min(Shanten, 8 - Sets * 2 - Pairs - Presets)
A C# sample (rewritten from C++) can be found here (in Russian).
I've done a little bit of thinking and came up with a slightly different formula than mafu's. First of all, consider a hand (a very terrible hand):
1s 4s 6s 1m 5m 8m 9m 9m 7p 8p West East North
By using mafu's algorithm all we can do is cast out a pair (9m,9m). Then we are left with 11 singles. Now if we apply mafu's formula we get (11-1)*2/3 which is not an integer and therefore cannot be a shanten number. This is where I came up with this:
N = ( (S + 1) / 3 ) - 1
N stands for shanten number and S for score sum.
What is score? It's the number of tiles you need to make an incomplete set complete. For example, if you have (4,5) in your hand, you need either a 3 or a 6 to make it a complete 3-set, that is, only one tile. So this incomplete pair gets score 1. Accordingly, (1,1) needs only one more 1 to become a 3-set. Any single tile obviously needs 2 tiles to become a 3-set and gets score 2. Any complete set of course gets score 0. Note that we ignore the possibility of singles becoming pairs. Now if we try to find all of the incomplete sets in the above hand, we get:
(4s,6s) (8m,9m) (7p,8p) 1s 1m 5m 9m West East North
Then we count the sum of its scores = 1*3+2*7 = 17.
Now if we apply this number to the formula above, we get (17+1)/3 - 1 = 5, which means this hand is 5-shanten. It's somewhat more complicated than Alexey's, and I don't have a proof, but so far it seems to work for me. Note that such a hand could be parsed another way. For example:
(4s,6s) (9m,9m) (7p,8p) 1s 1m 5m 8m West East North
However, it still gets a score sum of 17 and is 5-shanten according to the formula. I can't prove this either, but the approach also introduces scores that could perhaps be applied to something else.
Take a look here: ShantenNumberCalculator. It calculates shanten really fast. There is also some related material (in Japanese, but with code examples) at http://cmj3.web.fc2.com
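The arithmetic for a given grouping can be checked with a tiny helper. It only scores a grouping you supply (it does not search for one), uses the scores defined above, and assumes every 3-tile group is a complete set; the function name is illustrative:

```python
def shanten_from_score(groups):
    """N = (S + 1) / 3 - 1, with S the score sum over the chosen grouping."""
    # Score = tiles still missing for a 3-set: complete set 0, two-tile form 1, single 2.
    score = {3: 0, 2: 1, 1: 2}
    s = sum(score[len(g)] for g in groups)
    # For 13 tiles, 3a + 2b + c = 13 forces S + 1 to be a multiple of 3.
    return (s + 1) // 3 - 1, s
```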
The essence of the algorithm: cut out all pairs, sets and unfinished forms in ALL possible ways, and thereby find the minimum value of the number of shanten.
The maximum value of the shanten for an ordinary hand: 8.
That is, as it were, we have the beginnings for 4 sets and one pair, but only one tile from each (total 13 - 5 = 8).
Accordingly, a pair reduces the shanten number by one, and two neighboring tiles isolated from the rest (a "preset") also reduce it by one.
A complete set (3 identical or 3 consecutive tiles) reduces the shanten number by 2, since two suitable tiles have joined an isolated tile.
Shanten = 8 - Sets * 2 - Pairs - Presets
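A hedged Python sketch of that formula with the exhaustive cutting described above. Two details are assumptions in line with common implementations rather than statements from the answer: surplus pairs count as presets, and at most 4 sets plus presets are counted next to the final pair. Special hands (seven pairs, Thirteen Orphans) and waits on a tile whose four copies are all in the hand are ignored, and the helper names are illustrative:

```python
HONORS = "ESWNRGB"   # the seven honors, named as in the question


def to_counts(tiles):
    """Turn ['2p', '5m', 'W', ...] into 34 counts: man 0-8, pin 9-17, sou 18-26, honors 27-33."""
    counts = [0] * 34
    for t in tiles:
        if t in HONORS:
            counts[27 + HONORS.index(t)] += 1
        else:
            counts["mps".index(t[1]) * 9 + int(t[0]) - 1] += 1
    return counts


def shanten(counts):
    """Shanten = 8 - 2*sets - presets - pair, minimised over all ways to cut the hand."""
    c = list(counts)
    best = 8

    def take(tiles, i, sets, presets, pair):
        for t in tiles:          # remove the form, recurse, then put the tiles back
            c[t] -= 1
        search(i, sets, presets, pair)
        for t in tiles:
            c[t] += 1

    def search(i, sets, presets, pair):
        nonlocal best
        while i < 34 and c[i] == 0:
            i += 1
        if i == 34:
            # Only 4 groups besides the pair can matter, so surplus presets are ignored.
            best = min(best, 8 - 2 * sets - min(presets, 4 - sets) - pair)
            return
        suited, pos = i < 27, i % 9
        if c[i] >= 3:
            take((i, i, i), i, sets + 1, presets, pair)          # triplet
        if suited and pos <= 6 and c[i + 1] and c[i + 2]:
            take((i, i + 1, i + 2), i, sets + 1, presets, pair)  # straight
        if c[i] >= 2:
            if not pair:
                take((i, i), i, sets, presets, 1)                # the final pair
            take((i, i), i, sets, presets + 1, pair)             # pair waiting for a third
        if suited and pos <= 7 and c[i + 1]:
            take((i, i + 1), i, sets, presets + 1, pair)         # e.g. 45 waits on 3 or 6
        if suited and pos <= 6 and c[i + 2]:
            take((i, i + 2), i, sets, presets + 1, pair)         # e.g. 46 waits on 5
        take((i,), i, sets, presets, pair)                       # leave tile i isolated

    search(0, 0, 0, 0)
    return best
```

On the hands from this thread it gives 0 for the first tenpai example, 1 and 2 for the 1- and 2-shanten examples, 4 for the 23566M 25667P 159S counter-example, and 5 for the "very terrible" hand, matching the numbers quoted above.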
Determining whether your hand is already in tenpai sounds like a multi-knapsack problem. Greedy algorithms won't work - as Dialecticus pointed out, you'll need to consider the entire problem space.
