Algorithm for shortening a series of actions? - algorithm

It's been a while since my algorithms class in school, so forgive me if my terminology is not exact.
I have a series of actions that, when run, produces some desired state (it's basically a set of steps to reproduce a bug, but that doesn't matter for the sake of this question).
My goal is to find the shortest series of steps that still produces the desired state. Any given step might be unnecessary, so I'm trying to remove those as efficiently as possible.
I want to preserve the order of the steps (so I can remove steps, but not rearrange them).
The naive approach I'm taking is to take the entire series and try removing each action. If I can successfully remove one action (without altering the final state), I start back at the beginning of the series. This should be O(n^2) in the worst case.
I'm starting to play around with ways to make this more efficient, but I'm pretty sure this is a solved problem. Unfortunately, I'm not sure exactly what to Google - the series isn't really a "path," so I can't use path-shortening algorithms. Any help - even just giving me some terms to search - would be helpful.
Update: Several people have pointed out that even my naive algorithm won't find the shortest solution. This is a good point, so let me revise my question slightly: any ideas about approximate algorithms for the same problem? I'd rather have a short solution that's near the shortest solution quickly than take a very long time to guarantee the absolute shortest series. Thanks!
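A minimal sketch of the naive approach described in the question, assuming a caller-supplied `reproduces` predicate (a placeholder, not from the original) that runs a candidate subsequence and checks the final state:

```python
def minimize_naive(steps, reproduces):
    """Repeatedly try removing one step; on success, restart from the
    beginning, exactly as described in the question.

    `reproduces(candidate)` is a caller-supplied predicate returning
    True when the (ordered) subsequence still produces the state."""
    steps = list(steps)
    i = 0
    while i < len(steps):
        candidate = steps[:i] + steps[i + 1:]
        if reproduces(candidate):
            steps = candidate
            i = 0  # a step was removed: start over
        else:
            i += 1
    return steps
```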

Your naive O(n^2) approach is not exactly correct; in the worst case you might have to look at all subsets (more precisely, the problem may be NP-hard, which isn't quite the same thing as "having to look at all subsets", but the point stands).
For example, suppose you are currently running steps 12345, and you start trying to remove each of them individually. You might find that you can't remove 1, but you can remove 2 (so you remove it); then you look at 1345 and find that each of those is essential -- none can be removed. But it might turn out that, had you kept 2, just "125" would have sufficed.
If your family of sets that produce the given outcome is not monotone (i.e. if it doesn't have the property that if a certain set of actions work, then so will any superset), then you can prove that there is no way of finding the shortest sequence without looking at all subsets.

If you make strictly no assumptions about the effect of each action and you strictly want the smallest subset, then you will need to try all possible subsets of actions to find the shortest sequence.
The binary-search method mentioned elsewhere would only be sufficient if a single step caused your desired state.
In the more general case, even removing a single action at a time would not necessarily give you the shortest sequence. Consider pathological examples where several actions together cause no problem, but individually each triggers your desired state.
Your problem seems reducible to a more general search problem, and the more assumptions you can make, the smaller your search space will become.

Delta Debugging, a method for minimizing a set of failure-inducing inputs, might be a good fit.
I've previously used Delta (which minimizes "interesting" files based on a test for interestingness) to reduce a ~1000-line file to around 10 lines for a bug report.

The most obvious thing that comes to mind is a binary search-inspired recursive division into halves, where you alternately leave out each half. If leaving out a half at any stage of the recursion still reproduces the end state, then leave it out; otherwise, put it back in and recurse on both halves of that half, etc.
Recursing on both halves means that it tries to eliminate large chunks before giving up and trying smaller chunks of those chunks. The running time will be O(n log(n)) in the worst case, but if you have a large n with a high likelihood of many irrelevant steps, it ought to win over the O(n) approach of trying to leave out each step one at a time (but not restarting).
This algorithm will only find some minimal paths, though, it can't find smaller paths that may exist due to combinatorial inter-step effects (if the steps are indeed of that nature). Finding all of those will result in combinatorial explosion, though, unless you have more information about the steps with which to reason (such as dependencies).
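A sketch of this chunk-first idea (close in spirit to delta debugging's ddmin): try to drop progressively smaller chunks, only keeping a removal when the remaining steps still reproduce the state. The `reproduces` predicate is an assumed caller-supplied test, not something from the question:

```python
def minimize_chunks(steps, reproduces):
    """Try removing chunks of halving size: whole halves first, then
    quarters, and so on down to single steps.

    `reproduces(candidate)` is an assumed caller-supplied test that
    returns True when the remaining steps still reach the state."""
    steps = list(steps)
    chunk = max(1, len(steps) // 2)
    while chunk >= 1:
        i = 0
        while i < len(steps):
            candidate = steps[:i] + steps[i + chunk:]
            if candidate and reproduces(candidate):
                steps = candidate      # the chunk was irrelevant
            else:
                i += chunk             # keep the chunk, move on
        chunk //= 2
    return steps
```

Like the halving scheme above, this finds *a* minimal-ish sequence quickly, not a guaranteed shortest one.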

Your problem domain can be mapped to a directed graph where states are nodes and steps are links. You want to find the shortest path in a graph, and a number of well-known algorithms exist for this, for example Dijkstra's or A*.
Updated:
Let's think about a simple case: you have one step that leads from state A to state B. This can be drawn as 2 nodes connected by a link. Now you have another step that leads from A to C, and from C a step that leads to B. With this you have a graph with 3 nodes and 3 links, and the cost of reaching B from A is either 2 (A-C-B) or 1 (A-B).
So you can see that the cost function is actually very simple: you add 1 for every step you take to reach the goal.
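Since every step costs exactly 1, plain breadth-first search (a special case of Dijkstra's) already finds the fewest-step path. A sketch, where `successors` is an assumed caller-supplied function yielding (step, next-state) pairs:

```python
from collections import deque

def shortest_step_sequence(start, goal, successors):
    """Breadth-first search over the state graph.  Every step costs 1,
    so BFS (a special case of Dijkstra's) finds a fewest-step sequence.

    `successors(state)` yields (step, next_state) pairs; both it and
    the state representation are assumptions about the modelling."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path  # list of step names, fewest steps first
        for step, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [step]))
    return None  # goal unreachable
```

On the A/B/C example above, this returns the 1-step path A-B rather than the 2-step path A-C-B.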


How can this bipartite matching solution be improved?

I'm working through codefights and am attempting the busyHolidays challenge from the Instacart company challenges.
The challenge provides three arrays. Shoppers contains strings representing the start and end times of their shifts. Orders contains strings representing the start and end times of the orders, and leadTime contains integers representing the number of minutes it takes to complete the job.
The goal is to determine if the orders can be matched to shoppers such that each shopper has only one order and each order has a shopper. An order may only be matched to a shopper if the shopper can both begin and complete it within the order time.
I have a solution that passes 19/20 tests, but since I can't see the last test I have no idea what's going wrong. I originally spent a couple days trying to learn algorithms like Edmond's Algorithm and the Hungarian Algorithm, but my lack of CS background and weakness in math kind of bit me in the ass and I can't seem to wrap my head around how to actually implement those methodologies, so I came up with a solution that involves weighting each node on each side of the graph according to its number of possible connections. I would appreciate it if anyone could help me take a look at my solution and either point out where it might be messing up or suggest a more standard solution to the problem in a way that might be easier for someone without formal training in algorithms to understand. Thanks in advance.
I'll put the code in a gist since it's fairly lengthy.
Code: https://gist.github.com/JakeTompkins/7e1afc4722fb828f26f8f6a964774a25
Well, I don't see any reason to think that the algorithm you're writing is actually going to work, so the question of how you might be messing it up doesn't seem relevant.
You have correctly identified this as an instance of the assignment problem. More specifically this is the "maximum bipartite matching" problem, and the Edmonds-Karp algorithm is the simplest way to solve it (https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm)
However, that is an algorithm for finding the maximum flow in a network, which is a more general problem than simple bipartite matching, and the explanations of it are really a lot more complicated than you need. It's understandable that you had some trouble implementing it from the literature, but when the problem is reduced to simple (unweighted) bipartite matching, the algorithm is easy to understand:
Make an initial assignment
Try to find an improvement
Repeat until no more improvements can be found.
For bipartite matching, an "improvement" always has the same form, which is what makes this problem easy to solve. To find an improvement, you have to find a path that connects an unassigned shopper to an unassigned order, following these rules:
The path can go from any shopper to any order he/she could fulfill but does not
The path can go from any order only to the shopper that is fulfilling it in the current assignment.
You use breadth-first search to find the shortest path, which will correspond to the improvement that changes the smallest number of existing assignments.
The path you find will necessarily have an odd number of edges, and the even-numbered edges will be existing assignments. To implement the improvement, you remove those assignments and replace them with the odd-numbered edges. There's one more odd-numbered edge than even-numbered, which is what makes it an improvement. It looks like this:
PREVIOUS        PATH FOUND      IMPROVED ASSIGNMENT
  1               1               1
                 /               /
A               A               A
 \               \
  2               2               2
                 /               /
B               B               B
 \               \
  3               3               3
                 /               /
C               C               C
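For illustration, here is a compact sketch of the improve-until-stuck idea. It uses the depth-first variant of the augmenting-path search (Kuhn's algorithm) for brevity; the breadth-first version described above finds the shortest improvement but is a bit longer. All names are made up for the example, not taken from the gist:

```python
def max_bipartite_matching(shoppers, can_fulfill):
    """Augmenting-path matching (Kuhn's algorithm), mirroring the
    improve-until-stuck procedure above.  `can_fulfill[s]` lists the
    orders shopper s could take; all names here are illustrative."""
    order_owner = {}  # order -> shopper currently assigned to it

    def try_improve(s, visited):
        # Follow shopper->order edges; an order held by another shopper
        # can be taken if that shopper can be re-routed elsewhere.
        for o in can_fulfill[s]:
            if o in visited:
                continue
            visited.add(o)
            if o not in order_owner or try_improve(order_owner[o], visited):
                order_owner[o] = s
                return True
        return False

    matched = sum(try_improve(s, set()) for s in shoppers)
    return matched, order_owner
```

Every order is matched iff `matched` equals both the number of shoppers and the number of orders.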

Why is chess, checkers, Go, etc. in EXP but conjectured to be in NP?

If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding.
First of all: the number of positions you can set up with 32 pieces on an 8x8 board is limited. We need to consider any pawn being promoted to any other piece and include any such position, too. Of course, among all these there are some positions that cannot be reached following the rules of chess, but this does not matter. The important thing is: we have a limit. Let's name this limit simply MaxPositions.
Now for any given position, let's build up a tree as follows:
The given position is the root.
Add any position (legal chess position or not) as child.
For any of these children, add any position as child again.
Continue this way, until your tree reaches a depth of MaxPositions.
I'm now too tired to think about whether we need one additional level of depth for the idea (proof?), but heck, let's just add it. The important thing is: the tree constructed like this is limited.
Next step: Of this tree, remove any sub-tree that is not reachable from the root via legal chess moves. Repeat this step for the remaining children, grand-children, ..., until there is no unreachable position left in the whole tree. The number of steps must be limited, as the tree is limited.
Now do a breadth-first search and turn any node into a leaf if its position has been encountered before. It must be marked as such (!; draw candidate?). The same goes for any mate position.
How do we find out if there is a forced mate? In any subtree, if it is your turn, there must be at least one child leading to a forced mate. If it is the opponent's move, then for every child there must be a grandchild that leads to a mate. This applies recursively, of course. However, as the tree is limited, this whole algorithm is limited.
[censored], this whole algorithm is limited! There is some constant bounding the whole thing. So: although the limit is incredibly high (and far beyond what current hardware can handle), it is a limit (please do not ask me to calculate it...). So: our problem actually is O(1)!!!
The same holds for checkers, Go, ...
This applies to the forced mate, so far. What is the best move? First, check whether we can find a forced mate. If so, fine, we have found the best move. If there are several, select the one that needs the fewest moves (there might still be more than one...).
If there is no such forced mate, then we need to measure the 'best' move by some means. Possibly count the number of available continuations leading to mate. Other proposals for the measure? As long as we operate on this tree from top to bottom, we still remain limited. So again, we are O(1).
Now what did we miss? Have a look at the link in your comment again. They are talking about NxN checkers! The author varies the size of the board!
So have a look back at how we constructed the tree. I think it is obvious that the tree grows exponentially with the size of the board (try to prove it yourself...).
I know very well that this answer is not a proof that the problem is EXPTIME. Actually, I admit, it is not really an answer at all. But I think what I illustrated still gives quite a good image/impression of the complexity of the problem. And as long as no one provides a better answer, I dare to claim that this is better than nothing at all...
Addendum, considering your comment:
Allow me to refer to Wikipedia. Actually, it should be sufficient to transform the other problem in exponential time, not polynomial time as in the link, since applying the transformation and then solving the resulting problem still remains exponential. But I'm not sure about the exact definition...
It is sufficient to show this for a problem which you already know is EXP-complete (transforming any other problem to that one and then to the chess problem remains exponential, if both transformations are exponential).
Apparently, J. M. Robson found a way to do this for NxN checkers. It must be possible for generalized chess, too, probably by simply modifying Robson's algorithm. I do not think it is possible for classical 8x8 chess, though...
O(1) applies to classical chess only, not to generalized chess. But it is the latter for which we assume non-membership in NP! Actually, in my answer up to this addendum, one proof is lacking: that the size of the limited tree (for fixed N) grows no faster than exponentially with N (so the answer is actually incomplete!).
And to prove that generalized chess is not in NP, we would have to prove that there is no polynomial algorithm solving the problem on a non-deterministic Turing machine. This I leave open again, and my answer remains even less complete...
If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding.
Because in order to check whether the winner (white) really wins, you would also have to evaluate all possible moves that the loser (black) could have made in order to win instead. That makes the checking exponential as well.

Finding a sequence of operations

This should eventually be written in JavaScript. But I feel that I should not type any code until my algorithm is clear, which it is not!
Problem Given: Starting at 1, write a function that given a number returns a sequence of operations that consist only of either "+5" or "*3" that produce the number in question.
My basic algorithm:
Get the number
if the number is 1
return 1.
else if we surpass the number
return -1.
else keep trying to "+5" or "*3" until number is reached, assuming it can be reached.
My problem is with step #4: I see that there are two paths to take which can bring me to the number in question (the target), either "+5" OR "*3", but what about the number 13, which can only be reached by a MIXTURE of BOTH paths?? I can only do one thing or the other!
How would I know which path to take and how many times I should take that path? How would I bounce back and forth between paths?
I agree with the concept of a breadth-first search of a binary tree. However, I suggest turning the problem around and looking at the problem of using "-5" or "/3" to get from the target back to 1. That allows pruning based on the target.
For example, 13 is not divisible by 3, so the first step in the backwards problem for target 13 must be "-5", not "/3".
It does not change the complexity, but may make the algorithm faster in practice for small problems.
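A sketch of this backward idea: a small depth-first search from the target down to 1, pruning "/3" when the value isn't divisible by 3 and "-5" when it would drop below 1. It returns the forward sequence of operations (note this finds *a* path, not necessarily the shortest one):

```python
def ops_to_reach(target):
    """Depth-first search backwards from `target` toward 1, trying
    "/3" (pruned unless divisible by 3) before "-5" (pruned below 1).
    Returns the forward list of operations, or None if unreachable."""
    def back(n):
        if n == 1:
            return []
        if n % 3 == 0:
            rest = back(n // 3)
            if rest is not None:
                return rest + ["*3"]
        if n - 5 >= 1:
            rest = back(n - 5)
            if rest is not None:
                return rest + ["+5"]
        return None
    return back(target)
```

For 13 this prunes "/3" immediately (13 is not divisible by 3), exactly as described above, and ends up with 1 *3 = 3, +5 = 8, +5 = 13.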
You essentially want to do a breadth-first search of a binary tree. You could use recursion, or just some while loops. At each step you take the current number and either add 5 or multiply by 3. Do your tests, and if you find the input value, then return 0 or something (you did not specify).
The key here is to think about the data structure and how to search it. Do you understand why it should be breadth-first? Do you understand why it is a binary tree?
In response to comments:
First off, I admire your efforts. Solving this kind of problem independent of language is a great way to approach it. It is not about stupid tricks in JavaScript (or any other language).
So the first concept to get down is that you are "searching" for a solution; if you don't find one, return -1.
Second you should do some research on binary trees. They are a very important concept!
Third, you should then use breadth-first search. However, that is the least important part; it just makes the search a bit more efficient.
what about the number 13 which can be found by a MIXTURE of BOTH paths?? I can only do one thing or the other!
Well, actually you can do both. As in the example in chapter 3 of the book you mention, you'll see that the function find is called twice inside itself -- the function is trying both paths at any choice point and the first correct solution is returned (you could also experiment with altering the overall function so it will return all correct paths).
How would I know which path to take and how many times I should take that path? How would I bounce back and forth between paths?
Basically, bouncing back and forth between paths is achieved by traveling both of them. You know if it's the right path if the function hits the target number.
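Translated into a Python sketch (the book's example is JavaScript), the try-both-paths recursion looks like this; overshooting the target prunes a branch, and the first branch that hits the target wins:

```python
def find_sequence(target):
    """Forward recursion: from 1, try "+5" first and then "*3" at every
    choice point, pruning any branch that overshoots the target.
    Returns the first successful list of operations, or None."""
    def find(current, history):
        if current == target:
            return history
        if current > target:
            return None  # overshot: this branch is dead
        return (find(current + 5, history + ["+5"])
                or find(current * 3, history + ["*3"]))
    return find(1, [])
```

For 13 the "+5"-only and "*3"-only branches all overshoot, and the first surviving mixture is *3, +5, +5.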

Recursion to iteration - or optimization?

I've got an algorithm which searches for all possible paths between two nodes in a graph, but I only choose paths without repeated nodes and with my specified path length.
It is in ActionScript3 and I need to change my algorithm to iterative or to optimize it (if it's possible).
I have no idea how to do that and I'm not sure if changing to iterative will bring some better execution times of that function. Maybe it can't be optimized. I'm not sure.
Here is my function:
http://pastebin.com/nMN2kkpu
If someone could give some tips about how to solve that, that would be great.
For one thing, you could sort the edges by starting vertex. Then iterating through a vertex's neighbours would be proportional to the number of neighbours of that vertex (whereas right now it takes O(M), where M is the edge count of the whole graph).
If you relax the condition of not repeating a vertex, I'm certain the problem can be solved in better time.
If, however, you need that, I'm afraid there's no simple change that would make your code way faster. I can't guarantee on this, though.
Also, if I am correct, the code snippet tests whether the edge is already used, not whether the vertex is. Another thing I noticed is that you don't stop the recursion once you've found such a path. Since in most* graphs such a path will exist for reasonable values of length, I'd say that if you need only one such path, you might be wasting a lot of CPU time after the first one is found.
*Most - to be read 'maybe most (IMO)'.
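For what it's worth, here is a sketch (in Python, since the ActionScript isn't reproduced here) combining the suggestions above: an adjacency list so each vertex only scans its own neighbours, an explicit stack instead of recursion, and an early exit after the first path of the requested length. The names and the undirected-graph assumption are illustrative:

```python
from collections import defaultdict

def find_path_of_length(edges, start, end, length):
    """Iterative search: an explicit stack of (vertex, path-so-far)
    pairs replaces the call stack.  Returns the first simple path
    (no repeated vertices) from `start` to `end` with exactly
    `length` edges, or None."""
    adj = defaultdict(list)
    for u, v in edges:          # build the adjacency list once
        adj[u].append(v)
        adj[v].append(u)        # assuming an undirected graph

    stack = [(start, [start])]
    while stack:
        v, path = stack.pop()
        if v == end:
            if len(path) - 1 == length:
                return path     # stop at the first hit
            continue            # don't extend past the end vertex
        if len(path) - 1 < length:
            for w in adj[v]:
                if w not in path:   # no repeated vertices
                    stack.append((w, path + [w]))
    return None
```

Enumerating *all* such paths instead of the first only needs the `return` replaced by appending to a result list.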

N-Puzzle with 5x5 grid, theory question

I'm writing a program which solves a 24-puzzle (5x5 grid) using two heuristics. The first uses how many blocks are in the incorrect place, and the second uses the Manhattan distance between each block's current place and its desired place.
I have different functions in the program which use each heuristic with an A* and a greedy search and compares the results (so 4 different parts in total).
I'm curious whether my program is wrong or whether it's a limitation of the puzzle. The puzzle is generated randomly with pieces being moved around a few times and most of the time (~70%) a solution is found with most searches, but sometimes they fail.
I can understand why greedy would fail, as it's not complete, but seeing as A* is complete this leads me to believe that there's an error in my code.
So could someone please tell me whether this is an error in my thinking or a limitation of the puzzle? Sorry if this is badly worded, I'll rephrase if necessary.
Thanks
EDIT:
So I'm fairly sure it's something I'm doing wrong. Here's a step-by-step list of how I'm doing the searches; is anything wrong here?
Create a new list for the fringe, sorted by whichever heuristic is being used
Create a set to store visited nodes
Add the initial state of the puzzle to the fringe
while the fringe isn't empty..
pop the first element from the fringe
if the node has been visited before, skip it
if node is the goal, return it
add the node to our visited set
expand the node and add all descendants back to the fringe
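The steps listed above can be sketched directly with a priority queue. This is an illustration under the assumption that `expand` returns successor states and that states are hashable; the names are made up, not taken from the asker's code:

```python
import heapq
import itertools

def best_first(start, is_goal, expand, h, greedy=False):
    """The loop from the steps above.  `h` is the heuristic; with
    greedy=False the fringe is ordered by g + h (A*), otherwise by h
    alone (greedy best-first)."""
    tie = itertools.count()  # tie-breaker so equal priorities never compare states
    fringe = [(h(start), next(tie), 0, start, [start])]
    visited = set()
    while fringe:
        _, _, g, state, path = heapq.heappop(fringe)  # pop the first element
        if state in visited:
            continue                                  # skip already-visited nodes
        if is_goal(state):
            return path
        visited.add(state)
        for nxt in expand(state):                     # expand, push descendants
            if nxt not in visited:
                prio = h(nxt) if greedy else g + 1 + h(nxt)
                heapq.heappush(fringe, (prio, next(tie), g + 1, nxt, path + [nxt]))
    return None
```

If the initial state is unsolvable, this loop exhausts the (finite) reachable states and returns None rather than failing silently, which is one way to tell a bad seed from a buggy search.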
If you mean that sliding puzzle: it becomes unsolvable if you exchange two pieces of a working solution -- so if you don't find a solution, that doesn't tell you anything about the correctness of your algorithm.
It's just your seed is flawed.
Edit: If you start with the solution and make (random) legal moves, then a correct algorithm would find a solution (as reversing the order is a solution).
It is not completely clear who invented it, but Sam Loyd popularized the 14-15 puzzle, during the late 19th Century, which is the 4x4 version of your 5x5.
From the Wikipedia article, a parity argument proved that half of the possible configurations are unsolvable. You are probably running into something similar when your search fails.
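If you want to check your generator against this, there is a classic parity test: for grids of odd width like your 5x5 (or the 3x3), a configuration is solvable exactly when the number of inversions among the tiles, ignoring the blank, is even. (Even widths like the 4x4 also involve the blank's row.) A sketch, assuming the blank is encoded as 0:

```python
def is_solvable_odd_width(tiles):
    """Parity test for a sliding puzzle of odd grid width (3x3, 5x5):
    a flattened row-major configuration is reachable from the solved
    state iff the number of inversions, ignoring the blank (0), is
    even.  An inversion is a pair appearing in the wrong order."""
    seq = [t for t in tiles if t != 0]
    inversions = sum(1
                     for i in range(len(seq))
                     for j in range(i + 1, len(seq))
                     if seq[i] > seq[j])
    return inversions % 2 == 0
```

Run this on every generated start state; any `False` means the generator (not the search) is the problem.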
I'm going to assume your code is correct, and you implemented all the algorithms and heuristics correctly.
This leaves us with the "generated randomly" part of your puzzle initialization. Are you sure you are generating correct states of the puzzle? If you generate an illegal state, obviously there will be no solution.
While the steps you have listed seem a little incomplete, you have listed enough to ensure that your A* will reach a solution if one exists (albeit not necessarily an optimal one, as long as you are simply skipping visited nodes).
It sounds like either your puzzle generation is flawed or your algorithm isn't implemented correctly. To easily verify your puzzle generation, store the steps used to generate the puzzle, and run it in reverse and check if the result is a solution state before allowing the puzzle to be sent to the search routines. If you ever generate an invalid puzzle, dump the puzzle, and expected steps and see where the problem is. If the puzzle passes and the algorithm fails, you have at least narrowed down where the problem is.
If it turns out to be your algorithm, post a more detailed explanation of the steps you have actually implemented (not just how A* works, we all know that), like for instance when you run the evaluation function, and where you resort the list that acts as your queue. That will make it easier to determine a problem within your implementation.
