Finding a sequence of operations - algorithm

This should eventually be written in JavaScript. But I feel that I should not type any code until my algorithm is clear, which it is not!
Problem given: starting at 1, write a function that, given a number, returns a sequence of operations consisting only of "+5" or "*3" that produces that number.
My basic algorithm:
1. Get the number.
2. If the number is 1, return 1.
3. Else, if we have surpassed the number, return -1.
4. Else, keep trying "+5" or "*3" until the number is reached, assuming it can be reached.
My problem is with step #4: I see that there are two paths to take which will bring me to the number in question (the target), either "+5" OR "*3", but what about the number 13, which can only be found by a MIXTURE of BOTH paths? I can only do one thing or the other!
How would I know which path to take, and how many times I should take that path? How would I bounce back and forth between paths?

I agree with the concept of breadth-first search in a binary tree. However, I suggest turning the problem around and looking at the problem of using "-5" or "/3" to get from the target back to 1. That allows pruning based on the target.
For example, 13 is not divisible by 3, so the first step in the backwards problem for target 13 must be "-5", not "/3".
It does not change the complexity, but may make the algorithm faster in practice for small problems.
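To make that concrete, here is a minimal sketch of the backwards search in JavaScript (the function name and the queue layout are my own, not from the question):

    // Sketch: work backwards from the target using "-5" and "/3",
    // pruning the "/3" branch whenever the value is not divisible by 3.
    function findOpsBackwards(target) {
      let queue = [{ value: target, ops: [] }];   // ops kept in forward order
      while (queue.length > 0) {
        let { value, ops } = queue.shift();       // FIFO -> breadth-first
        if (value === 1) return ops;              // reached the start
        if (value < 1) continue;                  // overshot: dead branch
        queue.push({ value: value - 5, ops: ["+5", ...ops] });
        if (value % 3 === 0)                      // prune: "/3" only when divisible
          queue.push({ value: value / 3, ops: ["*3", ...ops] });
      }
      return -1; // no sequence of "+5"/"*3" produces the target
    }

    // findOpsBackwards(13) -> ["*3", "+5", "+5"]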

You essentially want to do a breadth-first search over a binary tree. You could use recursion, or just some while loops. At each step you take the current number and either add 5 or multiply by 3. Do your tests, and if you find the input value, then return 0 or something (you did not specify what).
The key here is to think about the data structure and how to search it. Do you understand why it should be breadth-first? Do you understand why it is a binary tree?
In response to comments:
First off, I admire your efforts. Solving this kind of problem independent of language is a great way to approach it; it is not about stupid tricks in JavaScript (or any other language).
So the first concept to get down is that you are "searching" for a solution; if you don't find one, return -1.
Second, you should do some research on binary trees. They are a very important concept!
Third, you should then use breadth-first search. That is the least important part, though; it just makes the search a bit more efficient.
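To make the idea concrete, here is a rough sketch of such a breadth-first search in JavaScript (names are mine, and I assume we can abandon a branch as soon as its value exceeds the target, since both operations only increase it):

    // Sketch: forward breadth-first search. Every node branches into
    // "+5" and "*3", so the states form a binary tree rooted at 1.
    function findOpsForward(target) {
      let queue = [{ value: 1, path: "1" }];
      while (queue.length > 0) {
        let { value, path } = queue.shift();  // FIFO: shallowest node first
        if (value === target) return path;
        if (value > target) continue;         // both ops only grow the value
        queue.push({ value: value + 5, path: path + " + 5" });
        queue.push({ value: value * 3, path: path + " * 3" });
      }
      return -1; // the whole tree was explored without hitting the target
    }

    // findOpsForward(13) -> "1 * 3 + 5 + 5"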

what about the number 13 which can be found by a MIXTURE of BOTH paths? I can only do one thing or the other!
Well, actually you can do both. As in the example in chapter 3 of the book you mention, you'll see that the function find is called twice inside itself -- the function is trying both paths at any choice point and the first correct solution is returned (you could also experiment with altering the overall function so it will return all correct paths).
How would I know which path to take and how many times I should take that path? How would I bounce back and forth between paths?
Basically, bouncing back and forth between paths is achieved by traveling both of them. You know you took the right path when the function hits the target number.
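For reference, here is a sketch in the spirit of the book's find function (not its exact code):

    // Sketch: try "+5" first; only if that entire subtree fails
    // (returns null) does "||" fall through to the "*3" branch.
    function findSolution(target) {
      function find(current, history) {
        if (current === target) return history;  // hit the target: done
        if (current > target) return null;       // overshot: dead branch
        return find(current + 5, `(${history} + 5)`) ||
               find(current * 3, `(${history} * 3)`);
      }
      return find(1, "1") ?? -1; // -1 when no sequence exists
    }

    // findSolution(13) -> "(((1 * 3) + 5) + 5)"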

Related

How can this bipartite matching solution be improved?

I'm working through codefights and am attempting the busyHolidays challenge from the Instacart company challenges.
The challenge provides three arrays. Shoppers contains strings representing the start and end times of their shifts. Orders contains strings representing the start and end times of the orders, and leadTime contains integers representing the number of minutes it takes to complete the job.
The goal is to determine if the orders can be matched to shoppers such that each shopper has only one order and each order has a shopper. An order may only be matched to a shopper if the shopper can both begin and complete it within the order time.
I have a solution that passes 19/20 tests, but since I can't see the last test I have no idea what's going wrong. I originally spent a couple of days trying to learn algorithms like Edmonds' Algorithm and the Hungarian Algorithm, but my lack of CS background and weakness in math kind of bit me in the ass, and I can't seem to wrap my head around how to actually implement those methodologies. So I came up with a solution that involves weighting each node on each side of the graph according to its number of possible connections. I would appreciate it if anyone could take a look at my solution and either point out where it might be messing up or suggest a more standard solution that might be easier for someone without formal training in algorithms to understand. Thanks in advance.
I'll put the code in a gist since it's fairly lengthy.
Code: https://gist.github.com/JakeTompkins/7e1afc4722fb828f26f8f6a964774a25
Well, I don't see any reason to think that the algorithm you're writing will actually work, so the question of where you might be messing it up doesn't seem relevant.
You have correctly identified this as an instance of the assignment problem. More specifically, this is the "maximum bipartite matching" problem, and the Edmonds-Karp algorithm is the simplest way to solve it (https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm).
However, this is an algorithm for finding the maximum flow in a network, which is a larger problem than simple bipartite matching, and the explanations of this algorithm are really a lot more complicated than you need. It's understandable that you had some trouble implementing this from the literature, but actually when the problem is reduced to simple (unweighted) bipartite matching, the algorithm is easy to understand:
Make an initial assignment
Try to find an improvement
Repeat until no more improvements can be found.
For bipartite matching, an "improvement" always has the same form, which is what makes this problem easy to solve. To find an improvement, you have to find a path that connects an unassigned shopper to an unassigned order, following these rules:
The path can go from any shopper to any order he/she could fulfill but does not
The path can go from any order only to the shopper that is fulfilling it in the current assignment.
You use breadth-first search to find the shortest such path, which corresponds to the improvement that changes the smallest number of existing assignments.
The path you find will necessarily have an odd number of edges, and the even-numbered edges will be existing assignments. To implement the improvement, you remove those assignments and replace them with the odd-numbered edges. There's one more odd-numbered edge than even-numbered ones, which is what makes it an improvement. It looks like this:
PREVIOUS        PATH FOUND      IMPROVED ASSIGNMENT

  1               1               1
                 /               /
A               A               A
 \               \
  2               2               2
                 /               /
B               B               B
 \               \
  3               3               3
                 /               /
C               C               C
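If it helps, here is a sketch of the augmenting-path idea in JavaScript. One deliberate simplification: I use DFS rather than BFS to find an improvement, which finds some improvement instead of the shortest one; the size of the final matching comes out the same. canFulfill is a placeholder for the question's shift/lead-time check:

    // Sketch of unweighted bipartite matching via augmenting paths.
    // canFulfill[s][o] is assumed true when shopper s can complete order o.
    function maxBipartiteMatching(numShoppers, numOrders, canFulfill) {
      const orderToShopper = new Array(numOrders).fill(-1);

      // Try to assign shopper s, possibly displacing other shoppers
      // along an augmenting path (rule 2 above: an order is only left
      // via the shopper currently fulfilling it).
      function tryAssign(s, visited) {
        for (let o = 0; o < numOrders; o++) {
          if (!canFulfill[s][o] || visited[o]) continue;
          visited[o] = true;
          // Order o is free, or its current shopper can move elsewhere.
          if (orderToShopper[o] === -1 ||
              tryAssign(orderToShopper[o], visited)) {
            orderToShopper[o] = s;
            return true;
          }
        }
        return false;
      }

      let matched = 0;
      for (let s = 0; s < numShoppers; s++) {
        if (tryAssign(s, new Array(numOrders).fill(false))) matched++;
      }
      return matched; // every order is served iff matched === numOrders
    }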

When using dynamic programming, how do I capture the entire path for a min-sum?

I am trying to use the Viterbi min-sum algorithm, which tries to find the pathway through a bunch of nodes that minimizes the overall Hamming distance (fancy term for "XOR two numbers and count the 1 bits in the result") against some fixed input.
I understand how to use DP to compute the minimal distance overall, but I am having trouble using it to also capture the path that achieves that minimal distance.
It seems like memoizing the path at each node would be really memory-intensive. Is there a standard way to handle these kinds of problems?
Edit:
http://i.imgur.com/EugiEWG.jpg
Here is a sample trellis with what I am talking about. The general idea is to find the path through the trellis that most closely emulates the input bitstring, with minimal error (measured by minimizing overall Hamming distance, or the number of mismatched bits).
As you can see, the first chunk of my input string is 01, and I can traverse there in column 1 of the trellis. The next chunk is 10, and I can move there in column 2. Next chunk is 11. Fine so far. Next chunk is 10, which is a problem because I can't reach that state from where I am now, so I have to go to the next best thing (00) and the rest can be filled fine.
But this can become more complex. I'd need to be able to somehow get the corresponding path to the minimal Hamming distance.
(The point of this exercise is that the trellis represents the transitions that are ACTUALLY valid, whereas the input string is something you receive over telecommunications and might be garbled, with incorrect bits here and there. This program tries to figure out what the input string SHOULD be by minimizing the error.)
There's the usual "follow the path backwards" technique, requiring only the table of values (but the whole table of values; no cheating with "keep only the most recent part"). The algorithm is simple: start at the end and decide which way you came from. You can make that decision because either there's exactly one predecessor such that, if you came from it, you'd compute the value that matches the stored one, or several predecessors give the same value and it doesn't matter which one you choose.
Alternatively, storing a table of "back-pointers" doesn't take much space (about as much as the table of weights, and you can actually omit most of the table of weights if you do this), and it makes the backwards phase much simpler: just follow the pointers. That really is the path, just stored backwards.
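Here is a sketch of the back-pointer version in JavaScript (the trellis interface, predecessors and cost, is an assumption rather than your exact data layout):

    // Sketch of Viterbi min-sum with a back-pointer table.
    // predecessors[s] lists the states with a valid transition into s;
    // cost(p, s, t) is the branch metric at step t (here, the Hamming
    // distance between the received chunk and the transition's output).
    function viterbi(numSteps, numStates, predecessors, cost) {
      const dist = [new Array(numStates).fill(0)]; // any state may start
      const back = [null];
      for (let t = 1; t <= numSteps; t++) {
        dist[t] = new Array(numStates).fill(Infinity);
        back[t] = new Array(numStates).fill(-1);
        for (let s = 0; s < numStates; s++) {
          for (const p of predecessors[s]) {
            const d = dist[t - 1][p] + cost(p, s, t);
            if (d < dist[t][s]) { dist[t][s] = d; back[t][s] = p; }
          }
        }
      }
      // Pick the cheapest final state, then walk the pointers back.
      let best = 0;
      for (let s = 1; s < numStates; s++)
        if (dist[numSteps][s] < dist[numSteps][best]) best = s;
      const path = [best];
      for (let t = numSteps, s = best; t > 0; t--) {
        s = back[t][s];
        path.unshift(s); // prepend: the stored path is backwards
      }
      return { cost: dist[numSteps][best], path };
    }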
You are correct that the immediate approach to storing the paths is space-expensive.
This problem comes up often in DNA sequencing, where the cost is prohibitive. There are a number of ways to overcome it (see more here):
You can reduce up to a square root of the space if you are willing to double the execution time (see 2.1.1 in the link above).
Using a compressed tree, you can reduce one of the dimensions logarithmically (see 2.1.2 in the link above).

Is my heuristic algorithm correct? (Sudoku solver)

First off - yes, this IS homework - but it's primarily a theoretical question rather than a practical one. I am simply asking for confirmation that I am thinking correctly, or hints if I am not.
I have been asked to write a simple Sudoku solver (in Prolog, but that is not so important right now) with the only limitation being that it must use a heuristic function with the best-first algorithm. The only heuristic function I have been able to come up with is explained below:
1. Select an empty cell.
1a. If there are no empty cells and there is a solution return solution.
Else return No.
2. Find all possible values it can hold. %% It can't take values currently assigned to cells on the same line/column/box.
3. Assign each of those values a heuristic number, starting from 1.
4. Pick the value with the lowest heuristic number that you haven't checked yet.
4a. If there are no more values return no.
5. If a solution is not found: GoTo 1.
Else Return Solution.
// I am sorry for errors in this "pseudo code." If you want any clarification let me know.
So am I doing this right, or is there some other way and mine is wrong?
Thanks in advance.
The heuristic I would use is this:
Repeatedly find any empty spaces where there is only one possible number you can insert. Fill them with the number 1-9 that fits.
If every empty space has two or more possibilities, push the game state onto a stack, then pick a random square to fill in with a random value.
Go to step 1.
If you manage to fill every square, you've found a valid solution.
If you get to a point where there are no valid options, pop the last game state off the stack (i.e. backtrack to the last time you made a random choice.) Make a different choice and try again.
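Here is a rough JavaScript sketch of that approach. Recursion stands in for the explicit stack of game states, and the board layout (a flat array of 81 numbers, 0 for an empty cell) is my own choice:

    // Possible values for cell i: everything not already used in its
    // row, column, or 3x3 box.
    function candidates(board, i) {
      const row = Math.floor(i / 9), col = i % 9;
      const used = new Set();
      for (let k = 0; k < 9; k++) {
        used.add(board[row * 9 + k]);                       // same row
        used.add(board[k * 9 + col]);                       // same column
        const r = 3 * Math.floor(row / 3) + Math.floor(k / 3);
        const c = 3 * Math.floor(col / 3) + (k % 3);
        used.add(board[r * 9 + c]);                         // same box
      }
      const out = [];
      for (let v = 1; v <= 9; v++) if (!used.has(v)) out.push(v);
      return out;
    }

    // Returns a solved board, or null if this branch is a dead end.
    // Note: fills forced values into the array it is given.
    function solve(board) {
      let progress = true;                 // step 1: fill forced cells
      while (progress) {
        progress = false;
        for (let i = 0; i < 81; i++) {
          if (board[i] !== 0) continue;
          const opts = candidates(board, i);
          if (opts.length === 0) return null;  // contradiction: backtrack
          if (opts.length === 1) { board[i] = opts[0]; progress = true; }
        }
      }
      const i = board.indexOf(0);          // step 2: guess and recurse
      if (i === -1) return board;          // no empty cells: solved
      for (const v of candidates(board, i)) {
        const copy = board.slice();        // saved state, as with the stack
        copy[i] = v;
        const solved = solve(copy);
        if (solved) return solved;         // otherwise try the next value
      }
      return null;
    }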
As an interesting sidenote, you've been told to do this using a greedy heuristic approach, but Sudoku can actually be reduced to a boolean satisfiability problem (SAT problem) and solved using a general-purpose SAT solver. This is very elegant and can actually be faster than a heuristic approach.
When I wrote a sudoku solver myself in Prolog, the algorithm I used was the following:
1. Filter out cells already solved (i.e. the given values at the start).
2. For each cell, build a list containing all its neighbours (that's 20 cells).
3. For each cell, build a list containing all the possible values it can take (easy to do once the above is done).
4. In the list containing all the cells to solve, put the one with the minimum number of available values on top.
5. If the list is empty, you have a solution; if the top cell has 0 remaining possibilities, go to 7; otherwise go to 6.
6. For the cell at the top of the list: pick a random number among the possible values of the cell. Remove this value from the possible values of its neighbours. Go to 5.
7. Backtrack (i.e., fail in Prolog).
This algorithm always sorts the "most solved" cell first and detects failure early enough. It reduces solving time quite a lot compared to an algorithm that solves a random cell.
What you have described is the Most Constrained Variable heuristic. It picks the cell that has the fewest possibilities and then branches recursively, depth-first, starting from that cell. This heuristic is extremely fast in depth-first search algorithms because it detects collisions early, near the root, while the search tree is still small.
Here is the implementation of Most Constrained Variable heuristic in C#: Exercise #2: Sudoku Solver
That text also contains an analysis of the total number of visits to Sudoku cells made by this algorithm; it is surprisingly small. It almost looks like the heuristic solves the Sudoku on the first try.

N-Puzzle with 5x5 grid, theory question

I'm writing a program which solves the 24-puzzle (5x5 grid) using two heuristics. The first counts how many blocks are in the incorrect place, and the second uses the Manhattan distance between each block's current place and its desired place.
I have different functions in the program which use each heuristic with both an A* and a greedy search, and compare the results (so 4 different parts in total).
I'm curious whether my program is wrong or whether it's a limitation of the puzzle. The puzzle is generated randomly by moving pieces around a few times, and most of the time (~70%) a solution is found with most searches, but sometimes they fail.
I can understand why greedy would fail, as it's not complete, but seeing as A* is complete this leads me to believe that there's an error in my code.
So could someone please tell me whether this is an error in my thinking or a limitation of the puzzle? Sorry if this is badly worded, I'll rephrase if necessary.
Thanks
EDIT:
So I'm fairly sure it's something I'm doing wrong. Here's a step-by-step list of how I'm doing the searches; is anything wrong here?
Create a new list for the fringe, sorted by whichever heuristic is being used
Create a set to store visited nodes
Add the initial state of the puzzle to the fringe
while the fringe isn't empty:
pop the first element from the fringe
if the node has been visited before, skip it
if the node is the goal, return it
add the node to our visited set
expand the node and add all descendants back to the fringe
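For comparison, here is a sketch of that loop in JavaScript (neighbors, h, and the state encoding are placeholders, and the repeated sort is a simple stand-in for a real priority queue):

    // Sketch of the listed search loop. neighbors(s) returns the states
    // one legal move away; h(s) is the heuristic (misplaced tiles or
    // Manhattan distance). A* orders by g + h, greedy search by h alone.
    function search(start, isGoal, neighbors, h, greedy = false) {
      const fringe = [{ state: start, g: 0, path: [start] }];
      const visited = new Set();
      const f = (n) => (greedy ? 0 : n.g) + h(n.state);
      while (fringe.length > 0) {
        fringe.sort((a, b) => f(a) - f(b)); // stand-in for a priority queue
        const node = fringe.shift();        // pop the best element
        const key = JSON.stringify(node.state);
        if (visited.has(key)) continue;     // seen before: skip it
        if (isGoal(node.state)) return node.path;
        visited.add(key);
        for (const next of neighbors(node.state)) {
          fringe.push({ state: next, g: node.g + 1, path: [...node.path, next] });
        }
      }
      return null; // fringe exhausted: the start state has no solution
    }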
If you mean that sliding puzzle: it becomes unsolvable if you exchange two pieces of a working solution, so failing to find a solution doesn't tell you anything about the correctness of your algorithm.
It's just that your seed is flawed.
Edit: If you start with the solution and make (random) legal moves, then a correct algorithm would find a solution (as reversing the order of the moves is one).
It is not completely clear who invented it, but Sam Loyd popularized the 14-15 puzzle during the late 19th century; it is the 4x4 version of your 5x5 puzzle.
From the Wikipedia article, a parity argument shows that half of the possible configurations are unsolvable. You are probably running into something similar when your search fails.
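Since your grid width (5) is odd, the standard parity rule is that a configuration is solvable exactly when its inversion count (ignoring the blank) is even. A quick sketch of that check, assuming the board is flattened row by row with 0 for the blank:

    // Parity test for an odd-width sliding puzzle such as 5x5.
    function isSolvable(tiles) {
      let inversions = 0;
      for (let i = 0; i < tiles.length; i++) {
        for (let j = i + 1; j < tiles.length; j++) {
          // Count tile pairs that appear in the wrong order.
          if (tiles[i] !== 0 && tiles[j] !== 0 && tiles[i] > tiles[j]) {
            inversions++;
          }
        }
      }
      return inversions % 2 === 0;
    }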
I'm going to assume your code is correct, and you implemented all the algorithms and heuristics correctly.
This leaves us with the "generated randomly" part of your puzzle initialization. Are you sure you are generating correct states of the puzzle? If you generate an illegal state, obviously there will be no solution.
While the steps you have listed seem a little incomplete, you have listed enough to ensure that your A* will reach a solution if one exists (albeit not necessarily an optimal one, as long as you are simply skipping visited nodes).
It sounds like either your puzzle generation is flawed or your algorithm isn't implemented correctly. To easily verify your puzzle generation, store the steps used to generate the puzzle, run them in reverse, and check that the result is a solution state before sending the puzzle to the search routines. If you ever generate an invalid puzzle, dump the puzzle and the expected steps and see where the problem is. If the puzzle passes and the algorithm fails, you have at least narrowed down where the problem is.
If it turns out to be your algorithm, post a more detailed explanation of the steps you have actually implemented (not just how A* works; we all know that): for instance, when you run the evaluation function, and where you re-sort the list that acts as your queue. That will make it easier to pinpoint a problem in your implementation.

Algorithm for shortening a series of actions?

It's been a while since my algorithms class in school, so forgive me if my terminology is not exact.
I have a series of actions that, when run, produces some desired state (it's basically a set of steps to reproduce a bug, but that doesn't matter for the sake of this question).
My goal is to find the shortest series of steps that still produces the desired state. Any given step might be unnecessary, so I'm trying to remove those as efficiently as possible.
I want to preserve the order of the steps (so I can remove steps, but not rearrange them).
The naive approach I'm taking is to take the entire series and try removing each action in turn. If I can successfully remove one action (without altering the final state), I start back at the beginning of the series. This should be O(n^2) in the worst case.
I'm starting to play around with ways to make this more efficient, but I'm pretty sure this is a solved problem. Unfortunately, I'm not sure exactly what to Google - the series isn't really a "path," so I can't use path-shortening algorithms. Any help - even just giving me some terms to search - would be helpful.
Update: Several people have pointed out that even my naive algorithm won't find the shortest solution. This is a good point, so let me revise my question slightly: any ideas about approximate algorithms for the same problem? I'd rather have a short solution that's near the shortest solution quickly than take a very long time to guarantee the absolute shortest series. Thanks!
Your naive n^2 approach is not exactly correct; in the worst case you might have to look at all subsets (well actually the more accurate thing to say is that this problem might be NP-hard, which doesn't mean "might have to look at all subsets", but anyway...)
For example, suppose you are currently running steps 12345, and you start trying to remove each of them individually. You might find that you can't remove 1 but you can remove 2 (so you remove it); then you look at 1345 and find that each of those steps is essential -- none can be removed. But it might turn out that, if you had kept 2, then just "125" would suffice.
If your family of sets that produce the given outcome is not monotone (i.e. if it doesn't have the property that if a certain set of actions work, then so will any superset), then you can prove that there is no way of finding the shortest sequence without looking at all subsets.
If you are making strictly no assumptions about the effect of each action and you want strictly the smallest subset, then you will need to try all possible subsets of actions to find the shortest sequence.
The binary-search method stated would only be sufficient if a single step caused your desired state.
For the more general case, even removing a single action at a time would not necessarily give you the shortest sequence. This is the case if you consider pathological examples where several actions together cause no problem, but individually trigger your desired state.
Your problem seems reducible to a more general search problem, and the more assumptions you can make, the smaller your search space will become.
Delta Debugging, a method for minimizing a set of failure-inducing inputs, might be a good fit.
I've previously used Delta (which minimizes "interesting" files based on a test for interestingness) to reduce a ~1000-line file to around 10 lines for a bug report.
The most obvious thing that comes to mind is a binary search-inspired recursive division into halves, where you alternately leave out each half. If leaving out a half at any stage of the recursion still reproduces the end state, then leave it out; otherwise, put it back in and recurse on both halves of that half, etc.
Recursing on both halves means that it tries to eliminate large chunks before giving up and trying smaller chunks of those chunks. The running time will be O(n log(n)) in the worst case, but if you have a large n with a high likelihood of many irrelevant steps, it ought to beat the O(n) approach of trying to leave out each step one at a time (but not restarting).
This algorithm will only find some minimal paths, though; it can't find smaller paths that may exist due to combinatorial inter-step effects (if the steps are indeed of that nature). Finding all of those would cause a combinatorial explosion, though, unless you have more information about the steps with which to reason (such as dependencies).
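Here is a sketch of that recursion in JavaScript; reproduces is assumed to be your (expensive) check that a candidate subsequence of actions still yields the end state:

    // Sketch: recursively halve `segment`, first trying to drop each
    // half wholesale, with `before` and `after` held fixed as context.
    function minimize(steps, reproduces) {
      function shrink(before, segment, after) {
        if (segment.length <= 1) {
          // Try leaving this single step out entirely.
          return reproduces([...before, ...after]) ? [] : segment;
        }
        const mid = Math.floor(segment.length / 2);
        const left = segment.slice(0, mid), right = segment.slice(mid);
        if (reproduces([...before, ...right, ...after]))
          return shrink(before, right, after);   // left half is irrelevant
        if (reproduces([...before, ...left, ...after]))
          return shrink(before, left, after);    // right half is irrelevant
        // Neither half can be dropped whole: shrink each half, keeping
        // the other (in its current form) as context.
        const newLeft = shrink(before, left, [...right, ...after]);
        const newRight = shrink([...before, ...newLeft], right, after);
        return [...newLeft, ...newRight];
      }
      return shrink([], steps, []);
    }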
Your problem domain can be mapped to a directed graph where states are nodes and steps are links. You want to find the shortest path in a graph, and a number of well-known algorithms exist for this, for example Dijkstra's or A*.
Updated:
Let's think about a simple case: you have one step that leads from state A to state B. This can be drawn as 2 nodes connected by a link. Now you have another step that leads from A to C, and from C you have a step that leads to B. With this you have a graph with 3 nodes and 3 links, and the cost of reaching B from A is either 2 (A-C-B) or 1 (A-B).
So you can see that the cost function is actually very simple: you add 1 for every step you take to reach the goal.
