In the context of a project following the UC Berkeley Pacman AI project (its second part), I want to implement the minimax algorithm, without alpha-beta pruning, for an adversarial agent in a layout small enough that recursion depth is not a problem.
Having defined the problem as a 2-player (we assume only 1 ghost), turn-taking, zero-sum game with perfect information, applying the recursive algorithm would be pretty trivial. However, since many different strategies can end up in the same game state (defined as a tuple of Pacman's position, the ghost's position, the positions of the food pellets, and the player currently playing), I wanted to find a way to avoid recomputing all those states.
I searched and read some things about transposition tables. I am not really sure how to use such a method, however, and what I thought I should implement was the following:
Each time a state that has not yet been visited is expanded, add it to a 'visited' set. If the state has already been expanded, then if it's the max player's turn (Pacman), return a +inf value (which would normally never be chosen by the min player); if it's min's turn, return -inf accordingly.
The problem with this idea, I think, and the reason why it works for some layouts but not others, is that when I hit a node all of whose children have already been expanded, the only values I have to choose from are +/- infinities. This causes an infinite value to propagate upwards and be selected, while in fact it is possible that this game state leads to a loss. I think I have understood the problem, but I can't seem to find a way to get around it.
Is there any other method I could use to avoid computing repeated game states? Is there a standard approach to this that I am not aware of?
Here is some pseudocode:
def maxPlayer(currentState, visitedSet):
    if not isTerminalState
        scores = []
        for nextState, action in currentState.generateMaxSuccessors()
            if nextState not in visitedSet
                mark nextState as visited
                scores = scores + [minPlayer(nextState, visitedSet)]
        if scores is not empty
            return bestScore = max(scores)
        else
            return +inf  # The problem is HERE!
    else
        return evalFnc(currentState)
end maxPlayer

def minPlayer(currentState, visitedSet):
    if not isTerminalState
        scores = []
        for nextState, action in currentState.generateMinSuccessors()
            if nextState not in visitedSet
                mark nextState as visited
                scores = scores + [maxPlayer(nextState, visitedSet)]
        if scores is not empty
            return bestScore = min(scores)
        else
            return -inf  # The problem is also HERE!
    else
        return evalFnc(currentState)
end minPlayer
Note that the first player to play is max, and I choose the action that has the highest score. Nothing changes whether I take the infinite values into account or not; there are still instances of the game where the agent loses or loops infinitely.
I think the main shortcoming in your approach is that you consider already visited states as undesirable targets for the opponent to move to. Instead of returning an infinity value, you should retrieve the value that was computed at the time when that state was first visited.
Practically this means you should use a map (of state->value) instead of a set (of state).
Only in the case where the value of the first visit is not yet computed (because the recursive call leads to a visit of an ancestor state) would you need to use a reserved value. But let that value be undefined/null/None, so that it will not be treated like other numerical results but will be excluded from possible paths, even when backtracking.
As a side note, I would perform the lookup & marking of states at the start of the function -- on the current state -- instead of inside the loop on the neighboring states.
Here is how one of the two functions would then look:
def maxPlayer(currentState, evaluatedMap):
    if currentState in evaluatedMap
        return evaluatedMap.get(currentState)
    evaluatedMap.set(currentState, undefined)
    if not isTerminalState
        bestScore = undefined
        scores = []
        for nextState in currentState.generateMaxSuccessors()
            value = minPlayer(nextState, evaluatedMap)
            if value != undefined
                scores.append(value)
        if scores is not empty
            bestScore = max(scores)
    else
        bestScore = evalFnc(currentState)
    evaluatedMap.set(currentState, bestScore)
    return bestScore
end maxPlayer
The value undefined will be used during the time that a state has been visited but its value has not yet been determined (because of pending recursive calls). If a state is such that the current player has no valid moves (is "stuck"), then that state will permanently keep the value undefined; in other cases, the value undefined will eventually be replaced with a true score.
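Put together as runnable Python, the whole scheme might look like the sketch below. The ToyGame class and its method names (successors, is_terminal, evaluate) are stand-ins for whatever the real project provides, and None plays the role of undefined:

```python
# A minimal sketch of the map-based approach. The game interface here is a
# stand-in for the real Pacman project's objects; None marks "pending".

class ToyGame:
    """Tiny stand-in game: a successor graph plus terminal scores."""
    def __init__(self, succ, values):
        self.succ = succ        # state -> list of successor states
        self.values = values    # terminal state -> score

    def is_terminal(self, state):
        return state not in self.succ

    def evaluate(self, state):
        return self.values[state]

    def successors(self, state):
        return self.succ[state]

def max_player(state, game, evaluated):
    if state in evaluated:
        return evaluated[state]          # a score, or None while pending
    evaluated[state] = None              # mark as "being computed"
    if game.is_terminal(state):
        best = game.evaluate(state)
    else:
        scores = [s for s in (min_player(n, game, evaluated)
                              for n in game.successors(state)) if s is not None]
        best = max(scores) if scores else None
    evaluated[state] = best
    return best

def min_player(state, game, evaluated):
    if state in evaluated:
        return evaluated[state]
    evaluated[state] = None
    if game.is_terminal(state):
        best = game.evaluate(state)
    else:
        scores = [s for s in (max_player(n, game, evaluated)
                              for n in game.successors(state)) if s is not None]
        best = min(scores) if scores else None
    evaluated[state] = best
    return best
```

Pending states (those mapped to None) are simply filtered out of the candidate scores, so a cycle back to an ancestor never contributes a numerical value.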
The problem I was having was ultimately related to the definition of a 'game state' and how 'repeated states' had to be handled.
In fact, consider the game state tree and a particular game state x, which is identified by the following:
The position of pacman.
The number and position of food pellets on the grid.
The position and the direction of the ghost (the direction is taken into account because the ghost is considered unable to make a half turn).
Now suppose you start going down a certain branch of the tree and at some point you visit the node x. Assuming it had not already been visited before and it is not a terminal state for the game, this node should be added to the set of visited nodes.
Now suppose that once you're done with this particular branch of the tree, you start exploring a different one. After a certain, undetermined number of steps, you once again reach a node identified as x. This is where the problem with the code in the question lies.
In fact, while the game state as defined is exactly the same, the path followed to get to this state is not (since we are currently on a new branch, different from the original one). Obviously, considering the state as visited, or using the utility calculated on the last branch, is incorrect. It produces unexpected results.
The solution to this problem is, simply, to have a separate set of visited nodes for each branch of the tree. This way the situation described above is avoided. From there on, there are two strategies that can be considered:
The first one consists of treating looping through already-visited states as a worst-case scenario for Pacman and an optimal strategy for the ghost (which is obviously not strictly true). With this in mind, repeated states in the same branch of the tree are treated as a kind of 'terminal' state that returns -inf as a utility.
The second approach consists of making use of a transposition table. This is, however, not trivial to implement: if a node is not already in the dictionary, initialize it at infinity to show that it is currently being computed and should not be recomputed if visited later. When reaching a terminal state, while recursing back up through the nodes, store in the dictionary the difference in game score between the current node and the corresponding terminal state. If, while traversing a branch, you visit a node that is already in the dictionary, return the current game score (which depends on the path you took to get to this node and can change from one branch to another) plus the value in the dictionary (which is the gain (or loss) in score from getting from this node to the terminal state, and which is always the same).
In more practical terms, the first approach is really simple to implement; it suffices to copy the set every time you pass it as an argument to the next player (so that values in different branches won't affect each other). This will make the algorithm significantly slower, and alpha-beta pruning should be applied even for very small, simple mazes (1 food pellet and maybe 7x7 mazes). In any other case Python will either complain about recursion depth or simply take too long to solve (more than a few minutes). It is however correct.
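As a rough illustration of this first approach, here is a minimal Python sketch. The game interface (successors, is_terminal, evaluate) is again a hypothetical stand-in, and a repeated state within the current branch is scored as -inf as described:

```python
import math

# Sketch of the first approach: each branch carries its own copy of the
# visited set, and revisiting a state inside the same branch is treated
# as a terminal state worth -inf.

class ToyGame:
    def __init__(self, succ, values):
        self.succ = succ        # state -> list of successor states
        self.values = values    # terminal state -> score

    def is_terminal(self, state):
        return state not in self.succ

    def evaluate(self, state):
        return self.values[state]

    def successors(self, state):
        return self.succ[state]

def max_value(state, game, visited):
    if state in visited:
        return -math.inf                 # repeated state: worst case for Pacman
    if game.is_terminal(state):
        return game.evaluate(state)
    visited = visited | {state}          # copy, so sibling branches are unaffected
    return max(min_value(s, game, visited) for s in game.successors(state))

def min_value(state, game, visited):
    if state in visited:
        return -math.inf
    if game.is_terminal(state):
        return game.evaluate(state)
    visited = visited | {state}
    return min(max_value(s, game, visited) for s in game.successors(state))
```

The set union `visited | {state}` builds a fresh copy on every call, which is exactly the per-branch copying that makes this approach slow but correct.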
The second approach is more complicated. I have no formal proof of correctness, although intuitively it seems to work. It is significantly faster and also compatible with alpha beta pruning.
The corresponding pseudo code is easy to derive from the explanation.
I need to write code in Java that will find the longest word that a DFA accepts. Firstly, if there is a transition to one of the previous states (or a self-transition) on a path that leads to a final state, that means there are infinitely many words, and the longest one doesn't exist (that means there is a Kleene star applied to some word). I was thinking of forming a queue by BFS, where each level is separated by null, so that when I'm iterating through the queue and come across null, the length of the word would be increased by one; but it would be hard to track the set of previous states, so I'm kind of out of ideas. If you can't code in Java I would appreciate pseudocode or an algorithm.
I don't think this is strictly necessary, but it would not hurt the performance too terribly much in practice and might be sufficient for your needs. I would suggest, as a first pass, minimizing the DFA. This can be done in O(n log n) in terms of the number of states, using e.g. Hopcroft's algorithm. This is probably conceptually similar to what Christian Sloper suggests in the comments regarding reversing the transitions to find unproductive states; indeed, there is a minimization algorithm that does this as well, but you might be able to get away with just removing unproductive states and not minimizing here (though minimizing does make the reasoning a little easier).
Doing that is nice because it will remove all unproductive loops and combine them into a single dead state, if indeed there are any unproductive prefixes. It is easy to find the one dead state, if there is one, and remove it from the directed graph formed by the DFA's states and transitions. To do this, do either DFS or BFS, checking each state you come to, and see if (1) all its transitions are self-loops and (2) the state is not accepting.
With the one dead state removed (if any) any loops or cycles we detect in the remaining directed graph imply there are infinitely many strings in the language, since by definition any remaining states have a path to acceptance. If we find a loop or cycle, we know the language is infinite, and can respond accordingly.
If there are no loops or cycles remaining after removing the dead state from the minimal DFA, what remains is a tree rooted at the start state and whose leaves are accepting states (think about this for a moment and you will see it must be true). Therefore, the length of the longest string accepted is the length (in edges) of the longest path from the root to a leaf; so basically the height of the tree or something close to it (depending on how you define depth/height, whether edges or nodes). You can take any old algorithm for finding the depth and modify it so that in addition to returning the depth, it returns the string corresponding to the deepest subtree, so you can get the string without having to go back through the tree. Something like this:
GetLongestStringInTree(root)
    if root is null return ""
    result = ""
    for each transition
        child = transition.target
        symbol = transition.symbol
        str = symbol + GetLongestStringInTree(child)
        if str.length > result.length then
            result = str
    return result
This could be pretty easily modified to find all words of maximum length by adding str to a collection if its length is equal to the max length so far, and emptying that collection when a new longer string is found, and returning the collection (and using the length of the first thing in the collection for checking). That can be left as an exercise; as written, this will just find some arbitrary longest string accepted by the DFA.
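For reference, a Python rendering of the same routine might look like this, assuming each tree node stores its outgoing transitions as (symbol, child) pairs (a hypothetical representation; adapt to your own node type):

```python
# Sketch: longest root-to-leaf string in the tree formed by the trimmed DFA.
# Each recursive result is prefixed with the transition's symbol, so string
# lengths accumulate along the path.

class Node:
    def __init__(self, transitions=()):
        self.transitions = list(transitions)   # (symbol, child) pairs

def longest_string(node):
    if node is None:
        return ""
    result = ""
    for symbol, child in node.transitions:
        candidate = symbol + longest_string(child)
        if len(candidate) > len(result):
            result = candidate
    return result
```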
This problem becomes a lot simpler if you split it in two. (Sorry no java)
Step 1: Determine if there is a loop.
If there is a loop, there exists an infinitely long input. Detecting a loop in a directed graph can be done with DFS.
Step 2 (no loop): You now have a directed acyclic graph (DAG) and you can find the longest path using this algorithm: Longest path in Directed acyclic graph
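The two steps might be sketched in Python as follows, assuming the DFA (with any dead state already removed) is given as an adjacency list mapping each state to its (symbol, next_state) pairs; these names are illustrative, not a fixed API:

```python
from functools import lru_cache

# Sketch of the two-step approach: (1) cycle detection by coloring DFS,
# (2) longest path to an accepting state in the remaining DAG.

def longest_word(graph, start, accepting):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {s: WHITE for s in graph}

    def has_cycle(u):
        color[u] = GRAY
        for _, v in graph[u]:
            if color[v] == GRAY or (color[v] == WHITE and has_cycle(v)):
                return True
        color[u] = BLACK
        return False

    if has_cycle(start):
        return None          # a reachable cycle means infinitely many words

    @lru_cache(maxsize=None)
    def best(u):
        # Longest accepted word starting from u, or None if there is none.
        candidates = [""] if u in accepting else []
        for symbol, v in graph[u]:
            sub = best(v)
            if sub is not None:
                candidates.append(symbol + sub)
        return max(candidates, key=len) if candidates else None

    return best(start)
```

Returning None signals the infinite-language case, which the caller can report separately from an ordinary longest word.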
So I have a problem that goes as follows:
Xzqthpl is an alien living on the inconspicuous Kepler-1229b planet, a
mere 870 or so light years away from Earth. Whether to see C-beams
outside Tannhäuser Gate or just visiting Sirius for a good suntan,
Xzqthpl enjoys going on weekend trips to different faraway stars and
galaxies. However, because the universe is expanding, some of those
faraway places are going to move further and further away from
Kepler-1229b as time progresses. As a result, at some point in the
distant future even relatively nearby galaxies like Andromeda will be
too far away from Kepler-1229b for Xzqthpl to be able to do a weekend
trip there because the journey back and forth would take too much
time. There is a list of "n" places of interest to potentially visit.
For each place, Xzqthpl has assigned a value "v_i" measuring how
interested Xzqthpl is in the place, and a value "t_i" indicating the
number of weeks from now after which the place will be too far away to
visit.
Now Xzqthpl would like to plan its weekend trips in the following way:
No place is visited more than once.
At most one place is visited each week.
Place "i" is not visited after week "t_i"
The sum of values "v_i" for the visited places is maximized
Design an efficient (polynomial in "n" and independent of the v_i’s
and t_i’s assuming the unit cost model) algorithm to solve Xzqthpl’s
travel planning problem.
Currently I don't really know where to start. This feels like a weird variant of the "Weighted Interval Scheduling" algorithm (though I am not sure). Could someone give me some hints on where to start?
My initial thought is to sort the list by "t_i" in ascending order... but I am not really sure what to do past that point (and my idea might even be wrong).
Thanks!
You could use a min-heap for this:
Algorithm
Sort the input by ti
Create an empty min-heap, which will contain the vi that are retained
Iterate the sorted input. For each i:
If ti < size of heap, then this means this element cannot be retained, unless another, previously selected element is kicked out. Check if the minimum value in the heap is less than vi. If so, then it is beneficial to take that minimum value out of the heap and put this vi in instead.
Otherwise, just add vi to the heap.
In either case, keep the total value of the heap updated
Return the total value of the heap
Why this works
This works, because at each iteration we have this invariant:
The size of the heap represents two things at the same time. It is both:
The number of items we still consider as possible candidates for the final solution, and
The number of weeks that have passed.
The idea is that every item in the heap is assigned to one week, and so we need just as many weeks as there are items in the heap.
So, in every iteration we try to progress by one week. However, if the next visited item could only be allowed in the period that has already passed (i.e. its last possible week has already gone by), then we can't just add it to the heap, as there is no available week for it. Instead we check whether the considered item would be better exchanged with an item that we already selected (and is in the heap). If we exchange it, the one that loses out cannot stay in the heap, because now we don't have an available week for that one (remember its time limit is even more strict -- we visit them in order of time limit). So whether we exchange or not, the heap size remains the same.
Secondly, the heap has to be a heap, because we want an efficient way to always know which is the element with the least value. Otherwise, if it were a simple list, we would have to scan that list in each iteration, in order to compare its value with the one we are currently dealing with (and want to potentially exchange). Obviously, an exchange is only profitable, if the total value of the heap increases. So we need an efficient way to find a bad value fast. A min-heap provides this.
An Implementation
Here is an implementation in Python:
from collections import namedtuple
from heapq import heappush, heapreplace
Node = namedtuple("Node", "time,value")
def kepler(times, values):
    n = len(values)
    # Combine corresponding times and values
    nodes = [Node(times[i], values[i]) for i in range(n)]
    nodes.sort()  # by time
    totalValue = 0
    minheap = []
    for node in nodes:
        if node.time < len(minheap):  # Cannot be visited in time
            leastValue = minheap[0]   # See if we should replace
            if leastValue < node.value:
                heapreplace(minheap, node.value)  # pop and insert
                totalValue += node.value - leastValue
        else:
            totalValue += node.value
            heappush(minheap, node.value)
    return totalValue
And here is some sample input for it:
times = [3,3,0,2,6,2,2]
values =[7,6,3,2,1,4,5]
value = kepler(times, values)
print(value) # 23
Time Complexity
Sorting represents O(n log n) time complexity. Even though one could consider a radix sort to get that down to O(n), the use of the heap also represents a worst case of O(n log n). So the algorithm has an overall time complexity of O(n log n).
I am trying to implement this recursive-backtracking function for a constraint satisfaction problem from the given algorithm:
function BACKTRACKING-SEARCH(csp) returns solution/failure
    return RECURSIVE-BACKTRACKING({}, csp)

function RECURSIVE-BACKTRACKING(assignment, csp) returns soln/failure
    if assignment is complete then return assignment
    var <- SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)
    for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
        if value is consistent with assignment given CONSTRAINT[csp] then
            add {var = value} to assignment
            result <- RECURSIVE-BACKTRACKING(assignment, csp)
            if result != failure then return result
            remove {var = value} from assignment
    return failure
The input for csp in BACKTRACKING-SEARCH(csp) is a csp class that contains a) a list of states, b) the list of colors, and c) an ordered dictionary with a state as the key and the value is the list of neighbors of the state that cannot have the same color.
The problem is that I am having a hard time understanding how the algorithm works correctly. If anyone can give me a proper explanation of this algorithm, it would be very much appreciated. Some specific questions I have is:
if assignment is complete then return assignment
I assume that since assignment is inputted as an empty dictionary {}, that this will return the solution, that is, the dictionary that contains states and their colors. However, I don't understand how I can check if the assignment is complete? Would it be something like checking the size of the dictionary against the number of states?
var <- SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp],assignment,csp)
The input csp class contains a list of states, I assume this could just be var equal to popping off a value in the list? I guess, what's confusing me is I'm not sure what the parameters (VARIABLES[csp], assignment, csp) are doing, given my input.
for each value in ORDER-DOMAIN-VALUES(var,assignment,csp) do
Again, confused on what the inputs of (var, assignment, csp) are doing exactly. But I assume that it'll go through each value (neighbor) in dictionary of the state?
if value is consistent with assignment given CONSTRAINT[csp] then
add {var = value} to assignment
result <- RECURSIVE-BACKTRACKING(assignment, csp)
if result != failure then return result
remove {var = value} from assignment
How do I properly check if value is consistent with assignment given CONSTRAINT[csp]? I assume that constraints should be something that is a part of my csp class that I haven't implemented yet? I don't understand what this if statement is doing in terms of checking. It would be quite useful if someone could clearly explain this if statement and the body of the if statement in depth.
So after rehashing some college literature (Peter Norvig's Artificial Intelligence: A Modern Approach), it turns out the problem in your hands is the application of recursive backtracking as a way to find a solution to the graph coloring problem, which is also called map coloring (given its history of minimizing the colors needed to draw a map). Replacing each country in a map with a node and each border with an edge will give you a graph to which we can apply recursive backtracking to find a solution.
Recursive backtracking will descend the graph nodes as a depth-first tree search, checking at each node whether a color can be used. If not, it tries the next color; if yes, it tries the next unvisited adjacent node. If for a given node no color satisfies the condition, it will step back (backtrack) and move on to a sibling (or the parent's sibling if that node has no siblings).
So,
I assume that since assignment is inputted as an empty dictionary {}, that this will return the solution, that is, the dictionary that contains states and their colors
...
Would it be something like checking the size of the dictionary against the number of states?
Yes and yes. Once the dictionary contains all the nodes of the graph with a color, you'll have a solution.
The input csp class contains a list of states, I assume this could just be var equal to popping off a value in the list?
That pseudocode syntax is confusing, but the general idea is that you'll have a way to find out a node of the graph that hasn't been colored yet. One simple way is to return a node from the dictionary that doesn't have a value assigned to it. So if I understand the syntax correctly, var would store a node.
VARIABLES[csp] seems to me like a representation of the list of nodes inside your CSP structure.
I'm not sure what the parameters (VARIABLES[csp], assignment, csp) are doing, given my input
The assignment parameter is a dictionary containing the nodes evaluated so far (and the future solution), as mentioned above, and csp is the structure containing a,b and c.
Again, confused on what the inputs of (var, assignment, csp) are doing exactly. But I assume that it'll go through each value (neighbor) in dictionary of the state?
ORDER-DOMAIN-VALUES appears to be a function which will return the ordered set of colors in your CSP structure. The FOR loop will iterate over each color so that they're tested to satisfy the problem at that level.
if value is consistent with assignment given CONSTRAINT[csp] then
Here, what you're doing is testing the constraint with that value, to ensure it's true. In this case you want to check that any nodes adjacent to that node does not have that color already. If an adjacent node has that color, skip the IF and iterate the for loop to try the next color.
If no adjacent nodes have that color, then enter the IF body and add that node var with color value to the assignment dictionary (I believe {var = value} is a tuple representation, which I would write {var, value}, but oh well).
Then call the function recursive backtracking again, recursively.
If the recursive call returns non-failure, return its results (it means the solution has been found).
If it returns a failure (meaning it tried all the colors and all of them happened to be used by another adjacent node), then remove that node ({var, value}) from the assignment (solution) dictionary and move on to the next color. If all colors have been exhausted, return failure.
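Putting those pieces together, a compact Python sketch of the pseudocode for map coloring could look like this; the neighbors dictionary stands in for CONSTRAINT[csp], and the variable/value ordering is just "first unassigned, colors in given order":

```python
# Sketch of RECURSIVE-BACKTRACKING for map coloring. `neighbors` maps each
# state to the states it borders; `colors` is the ordered color domain.

def backtracking_search(states, colors, neighbors):
    return recursive_backtracking({}, states, colors, neighbors)

def recursive_backtracking(assignment, states, colors, neighbors):
    if len(assignment) == len(states):            # assignment is complete
        return assignment
    # SELECT-UNASSIGNED-VARIABLE: first state without a color
    var = next(s for s in states if s not in assignment)
    for value in colors:                          # ORDER-DOMAIN-VALUES
        # consistent: no already-colored neighbor uses this color
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = recursive_backtracking(assignment, states, colors, neighbors)
            if result is not None:
                return result
            del assignment[var]                   # backtrack
    return None                                   # failure
```

Here None plays the role of "failure", and completeness is checked exactly as discussed: the assignment dictionary has as many entries as there are states.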
Here is the problem I ran into. I have a list of evaluators, I_1, I_2... etc, which have dependency among each other. Something like I_1 -> I_2 (reads, I_2 depends on I_1's result). There is no cyclic dependency.
Each of these shares an interface: bool eval(), double value(). Say I_1->eval() would update the result of I_1, which can then be returned by I_1->value(). The boolean returned by eval() tells me if the result has changed, and if so, all I_js that depend on I_1 need to be updated.
Now say I_1 has updated result, how to run as few eval()s as possible to keep all I_js up to date?
I just have a nested loop like this:
first do a tree-walk from I_1, marking it and all descendants as out-of-date
make a list of those descendants
anything_changed = true
while anything_changed
    anything_changed = false
    for each formula in the descendant list
        if no predecessors of that formula in the descendant list are out of date
            re-evaluate the formula and assert that it is not out of date
            anything_changed = true
Look, it's crude but correct.
So what if it's a bit like a quadratic big-O?
If the number of formulas is not too large, and/or the cost of evaluating each one is not too small, and/or if this is not done at high frequency, performance should not be an issue.
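That loop might be sketched in Python as follows; the children and parents maps and the evaluate callback are hypothetical stand-ins for the real formula objects:

```python
# Sketch of the crude-but-correct loop. `children[n]` lists the formulas
# that read n's value, `parents[n]` is the reverse map, and `evaluate` is
# a stand-in for the real re-evaluation call.

def update_descendants(root, children, parents, evaluate):
    # Tree-walk from root, marking it and all descendants as out of date.
    out_of_date, stack = {root}, [root]
    while stack:
        for c in children[stack.pop()]:
            if c not in out_of_date:
                out_of_date.add(c)
                stack.append(c)
    out_of_date.discard(root)        # root itself has already been updated

    order = []                       # evaluation order, for inspection
    changed = True
    while changed:
        changed = False
        for f in sorted(out_of_date):        # sorted only for determinism
            if not any(p in out_of_date for p in parents[f]):
                evaluate(f)          # every input is up to date: safe now
                out_of_date.discard(f)
                order.append(f)
                changed = True
    return order
```

Each pass re-evaluates only formulas whose predecessors are all up to date, so the quadratic scans never evaluate anything out of order.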
If I could, I'd add links from a parent to its dependent children, so the update then becomes:
change_value ()
{
    evaluate new_value based on all parents
    if (value != new_value)
    {
        value = new_value
        for each child
            child->change_value ()
    }
}
Of course, you'd need to cope with the case where Child(n) is the parent of Child(m).
Actually, thinking about it, it might just work, but it won't be a minimal set of calls to change_value.
You need something like a breadth-first search from I_1, omitting to search the descendants of nodes whose return from eval() said that they had not changed, and taking into account that you should not evaluate a node until you have evaluated all the nodes that it directly depends on. One way to arrange this would be to keep a count of unevaluated direct dependencies on each node, decrementing the count on all the nodes that depend on a node you have just evaluated. At each stage, if there are nodes not yet evaluated that need to be, there must be at least one that does not depend on an unevaluated node. If not, you could produce an infinite list of unevaluated nodes by travelling from one node to a node that it depends on, and so on; and we know there are no cycles in the dependency graph.
There is pseudo-code for breadth first search at https://en.wikipedia.org/wiki/Breadth-first_search.
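A Python sketch of this counting scheme might look like the following; dependents and evaluate are hypothetical stand-ins (dependents[n] lists the nodes that directly read n's value, and evaluate(n) recomputes n, returning True if its value changed):

```python
from collections import deque

# Sketch of the dependency-counting update. A node is only evaluated once
# all of its affected inputs are settled, and only if at least one of its
# direct inputs actually changed.

def propagate(source, dependents, evaluate):
    # BFS from source, counting for each downstream node how many of its
    # direct dependencies are themselves awaiting evaluation.
    pending, queue, seen = {}, deque([source]), {source}
    while queue:
        n = queue.popleft()
        for d in dependents[n]:
            pending[d] = pending.get(d, 0) + 1
            if d not in seen:
                seen.add(d)
                queue.append(d)

    evaluated = []
    dirty = set()                       # nodes with at least one changed input
    done = deque([(source, True)])      # source is known to have changed
    while done:
        n, n_changed = done.popleft()
        if n_changed:
            dirty.update(dependents[n])
        for d in dependents[n]:
            pending[d] -= 1
            if pending[d] == 0:         # all inputs settled
                if d in dirty:
                    changed = evaluate(d)
                    evaluated.append(d)
                else:
                    changed = False     # no input changed: skip the eval
                done.append((d, changed))
    return evaluated
```

Each node is evaluated at most once, and subtrees whose inputs all reported "no change" are skipped entirely.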
An efficient solution would be to have two relations. If I_2 depends on I_1, you would have I_1 --influences--> I_2 and I_2 --depends on--> I_1 as relations.
You basically need to be able to efficiently calculate the numbers of out-of-date evaluations that I_X depends on (let's call that number D(I_X))
Then, you do the following:
Do a BFS with the --influences--> relation, storing all reachable I_X
Store the reachable I_X in a data structure that sorts them according to their D(I_X) , e.g. a Priority Queue
// finding the D(I_X) could be integrated into the BFS and would require little additional calculation time
while (still nodes to update):
    Pop and re-evaluate the I_X with the lowest D(I_X) value (e.g. the first I_X from the Queue) (*)
    Update the D(I_Y) value for all I_Y with I_X --influences--> I_Y (i.e. lower it by 1)
    Update the sorting/queue to reflect the new D(I_Y) values

(*) The first element should always have D(I_X) == 0; otherwise, you might have a circular dependency
The algorithm above uses quite a bit of time to find the nodes to update and order them, but gains the advantage that it only re-evaluates every I_X once.
I'm trying to find the width of a directed acyclic graph... as represented by an arbitrarily ordered list of nodes, without even an adjacency list.
The graph/list is for a parallel GNU Make-like workflow manager that uses files as its criteria for execution order. Each node has a list of source files and target files. We have a hash table in place so that, given a file name, the node which produces it can be determined. In this way, we can figure out a node's parents by examining the nodes which generate each of its source files using this table.
That is the ONLY ability I have at this point, without changing the code severely. The code has been in public use for a while, and the last thing we want to do is to change the structure significantly and have a bad release. And no, we don't have time to test rigorously (I am in an academic environment). Ideally we're hoping we can do this without doing anything more dangerous than adding fields to the node.
I'll be posting a community-wiki answer outlining my current approach and its flaws. If anyone wants to edit that, or use it as a starting point, feel free. If there's anything I can do to clarify things, I can answer questions or post code if needed.
Thanks!
EDIT: For anyone who cares, this will be in C. Yes, I know my pseudocode is in some horribly botched Python look-alike. I'm sort of hoping the language doesn't really matter.
I think the "width" you're considering here isn't really what you want - the width depends on how you assign levels to each node where you have some choice. You noticed this when you were deciding whether to assign all sources to level 0 or all sinks to the max level.
Instead, you just want to count the number of nodes and divide by the "critical path length", which is the longest path in the dag. This gives the average parallelism for the graph. It depends only on the graph itself, and it still gives you an indication of how wide the graph is.
To compute the critical path length, just do what you're doing - the critical path length is the maximum level you end up assigning.
In my opinion when you're doing this type of last minute development, its best to keep the new structures separate from the ones you are already using. At this point, if I were pressed by time I would go for a simpler solution.
Create an adjacency matrix for the graph using the parent data (should be easy)
Perform a topological sort using this matrix. (or even use tsort if pressed for time)
Now that you have a topological sort, create an array level, one element for each node.
For each node:
If the node has no parents set its level to 0
Otherwise set it to the maximum of its parents' levels, plus 1.
Find the maximum level width.
The question is, as Keith Randall asked: is this the right measurement for your needs?
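As an illustration, the level-and-width computation might be sketched in Python like this, using each node's parents directly instead of an explicit adjacency matrix (the parents map is a hypothetical stand-in for whatever the hash-table lookup yields):

```python
# Sketch of the level/width computation. The memoized recursion plays the
# role of the topological sort: a node's level is one more than the
# largest level among its parents, and sources sit at level 0.

def level_width(parents):
    levels = {}

    def level(n):
        if n not in levels:
            ps = parents[n]
            levels[n] = 0 if not ps else 1 + max(level(p) for p in ps)
        return levels[n]

    for n in parents:
        level(n)

    counts = {}
    for l in levels.values():
        counts[l] = counts.get(l, 0) + 1
    return max(counts.values())          # size of the widest level
```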
Here's what I (Platinum Azure, the original author) have so far.
Preparations/augmentations:
Add "children" field to linked list ("DAG") node
Add "level" field to "DAG" node
Add "children_left" field to "DAG" node. This is used to make sure that all children are examined before a parent is examined (in a later stage of the algorithm).
Algorithm:
Find the number of immediate children for every node; also, determine the leaves by adding nodes with children == 0 to a list.
for l in L:
    l.children = 0
for l in L:
    l.level = 0
    for p in l.parents:
        ++p.children
Leaves = []
for l in L:
    l.children_left = l.children
    if l.children == 0:
        Leaves.append(l)
Assign every node a "reverse depth" level. Normally by depth, I mean topologically sort and assign depth=0 to nodes with no parents. However, I'm thinking I need to reverse this, with depth=0 corresponding to leaves. Also, we want to make sure that no node is added to the queue without all its children "looking at it" first (to determine its proper "depth level").
max_level = 0
while !Leaves.empty():
    l = Leaves.pop()
    for p in l.parents:
        --p.children_left
        if p.children_left == 0:
            /* we only want to append parents with for-sure correct levels */
            Leaves.append(p)
        p.level = Max(p.level, l.level + 1)
        if p.level > max_level:
            max_level = p.level
Now that every node has a level, simply create an array and then go through the list once more to count the number of nodes in each level.
level_count = new int[max_level+1]
for l in L:
    ++level_count[l.level]
width = Max(level_count)
So that's what I'm thinking so far. Is there a way to improve on it? It's linear time all the way, but it's got like five or six linear scans and there will probably be a lot of cache misses and the like. I have to wonder if there isn't a way to exploit some locality with a better data structure-- without actually changing the underlying code beyond node augmentation.
Any thoughts?