Unknown algorithm name

I have a matching problem and have designed a method of solving it.
I need to know whether an algorithm already exists for the following situation, and if so, its name. I have looked a lot but cannot find anything. The closest I have come is round robin, but that's not quite the same.
It is similar to some networking problems, but those generally don't find the optimal route, they just settle for a good one. I need the optimal one.
This is a long read, and an unusual request for SO, but I cannot find a name for it anywhere.
The Problem
I have a pile of items.
Each item can be potentially connected to 1 or more other items.
Each item can only be paired once.
Each connection has a value.
I need to find what combination of pairs will result in the highest connection value.
My Solution
find all possible pairs for each item and store them in a map
take the first item and pair it with the first item in the map's value.
take the next item and pair it with the first unused item.
keep doing this until no more unused items exist.
save this combination of pairs' total value.
change the last pair in the combination if possible.
compare to the saved combination; if the value is higher, save the new combination.
when the last pair can no longer be changed, delete it, change the new last pair, and find more possible pairs.
this keeps going until the combination list is reduced to size zero.
The last saved combination is the best one.
(Fin)

This problem is basically just a maximum weighted matching.
For bipartite graphs, finding a maximum matching is easy. For arbitrary graphs, it's harder but still doable. Wikipedia suggests an algorithm by Edmonds called the Blossom Algorithm.
As for your algorithm, it's not clear exactly what you're doing, but it looks like a greedy assignment followed by hill climbing. I'm concerned that your algorithm isn't guaranteed to produce an optimal result. Have you actually proven that it does? How do you know it won't just get stuck in a local optimum?
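If you don't want to implement the blossom algorithm yourself, libraries exist. Here is a minimal sketch, assuming Python with the networkx package (whose max_weight_matching implements Edmonds' blossom-based algorithm); the items and values are made up for illustration:

import networkx as nx

# Each tuple is (item_a, item_b, connection_value) -- illustrative data only.
edges = [("A", "B", 5), ("A", "C", 3), ("B", "D", 2), ("C", "D", 4)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

# Returns a set of pairs maximizing the total connection value,
# with each item used at most once.
pairing = nx.max_weight_matching(G)
print(pairing)                                      # e.g. {('A', 'B'), ('C', 'D')}
print(sum(G[u][v]["weight"] for u, v in pairing))   # total value: 9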

I believe you are looking for the Gale-Shapley algorithm, which solves the stable marriage problem.

Related

Algorithm Problem: Word transformation from a given word to another using only the words in a given dict

The detailed description of the problem is as follows:
Given two words (beginWord and endWord), and a dictionary's word list, find if there's a transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
I know this problem can be solved using breadth-first search. After I proposed the normal BFS solution, the interviewer asked if I could make it faster. I couldn't figure out a way to speed it up, and the interviewer told me I should use a PriorityQueue instead to do a "best-first search", where the priority is given by the Hamming distance between the current word and the target.
I don't quite understand why this speeds up the search. I see that by using a priority queue we try to search the paths that make progress (i.e. reduce the Hamming distance).
This seems to be a greedy method. My question is:
Why is this solution faster than the breadth-first search solution? I feel the actual path can be like this: at first it makes no progress, or even increases the Hamming distance, but after reaching some word the Hamming distance goes down gradually. In that scenario, I think the priority queue solution would be slower.
Any suggestions will be appreciated! Thanks
First, I'd recommend doing some thorough reading on graph search algorithms; that will answer the question in as much detail as you want (and far beyond).
TL;DR:
Your interviewer effectively recommended something close to the A* algorithm.
It differs from BFS in one aspect: which node to expand first. It uses a notion of distance score, composed from two elements:
At a node X, we already "traveled" a distance given by the number of transformations so far.
To reach the target from X, we still need to travel some more, at least N steps, where N is the number of characters that differ between the node and the target.
If we are to follow the path through X, the total number of steps from start to target can't be less than this score. It can be more if the real rest distance turns out to be longer (some words necessary for the direct path don't exist in the dictionary).
A* tells us: of all open (unexpanded) nodes, try the one first that potentially gives the shortest overall solution path, i.e. the one with the lowest score. And to implement that, a priority queue is a good fit.
In many cases, A* can dramatically reduce the search space (compared to BFS), and it still guarantees to find the best solution.
A* is NOT a greedy algorithm. In the worst case it will explore the whole search space, but in a much better ordering than a blind BFS.
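To make the interviewer's suggestion concrete, here is a minimal sketch in Python (assuming equal-length lowercase words; heapq plays the role of the PriorityQueue, and the Hamming distance is the heuristic):

import heapq
import string

def word_ladder_astar(begin, end, word_list):
    # f = g + h, where g = transformations so far and h = Hamming distance
    # to the target (admissible: each transformation fixes at most one letter).
    words = set(word_list)
    if end not in words:
        return None
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    frontier = [(hamming(begin, end), 0, begin, [begin])]
    best_g = {begin: 0}
    while frontier:
        _, g, word, path = heapq.heappop(frontier)
        if word == end:
            return path
        for i in range(len(word)):
            for c in string.ascii_lowercase:
                nxt = word[:i] + c + word[i + 1:]
                if nxt in words and g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + hamming(nxt, end), g + 1, nxt, path + [nxt]))
    return None

# word_ladder_astar("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"])
# -> e.g. ['hit', 'hot', 'dot', 'dog', 'cog'] (one shortest ladder)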

Writing Simulated Annealing algorithm for 0-1 knapsack in C#

I'm in the process of learning about simulated annealing algorithms and have a few questions on how I would modify an example algorithm to solve a 0-1 knapsack problem.
I found this great code on CP:
http://www.codeproject.com/KB/recipes/simulatedAnnealingTSP.aspx
I'm pretty sure I understand how it all works now (except the Boltzmann condition, which as far as I'm concerned is black magic, though I understand it's about escaping local optima and apparently this does exactly that). I'd like to re-design this to solve a 0-1 knapsack-"ish" problem. Basically I'm packing a selection from 5,000 objects into 10 sacks and need to optimize for the least unused space. The actual "score" I assign to a solution is a bit more complex, but not related to the algorithm.
This seems easy enough, and means the Anneal() function would be basically the same. I'd have to implement the GetNextArrangement() function to fit my needs. In the TSP, the author just swaps two random nodes along the path (i.e., he makes a very small change each iteration).
For my problem, on the first iteration, I'd pick 10 random objects and look at the leftover space. For the next iteration, would I just pick 10 new random objects? Or am I best off only swapping out a few of the objects, like half of them, or even only one? Or maybe the number of objects I swap out should be relative to the temperature? Any of these seem doable to me, I'm just wondering if someone has some advice on the best approach (though I can mess around with improvements once I have the code working).
Thanks!
Mike
With simulated annealing, you want to make neighbour states as close in energy as possible. If the neighbours have significantly greater energy, then it will just never jump to them without a very high temperature -- so high that it will never make progress. On the other hand, if you can come up with heuristics that favour lower-energy states, then exploit them.
For the TSP, this means swapping adjacent cities. For your problem, I'd suggest a conditional neighbour selection algorithm as follows:
If there are objects that fit in the empty space, then it always puts the biggest one in.
If no objects fit in the empty space, then pick an object to swap out -- but prefer to swap objects of similar sizes.
That is, objects have a probability inverse to the difference in their sizes. You might want to use something like roulette selection here, with the slice size being something like (1 / (size1 - size2)^2).
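A rough sketch of that neighbour move, for one sack (Python; the list-of-sizes representation and the epsilon in the weights are my own illustration, not from the answer):

import random

def neighbour(packed, unused, capacity_left):
    # One neighbour move following the conditional rule above.
    # `packed` and `unused` are lists of object sizes.
    fitting = [o for o in unused if o <= capacity_left]
    if fitting:
        # Something still fits: always insert the biggest such object.
        obj = max(fitting)
        unused.remove(obj)
        packed.append(obj)
        return capacity_left - obj
    # Nothing fits: swap an object out, preferring similar-sized pairs.
    # Roulette weights ~ 1 / (size difference)^2; epsilon avoids division by zero.
    out = random.choice(packed)
    candidates = [o for o in unused if o <= capacity_left + out]
    if not candidates:
        return capacity_left  # no legal swap for this pick; try again next move
    weights = [1.0 / ((out - o) ** 2 + 1e-9) for o in candidates]
    incoming = random.choices(candidates, weights=weights, k=1)[0]
    packed.remove(out)
    unused.append(out)
    unused.remove(incoming)
    packed.append(incoming)
    return capacity_left + out - incoming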
Ah, I think I found my answer on Wikipedia. It suggests moving to a "neighbour" state, which usually implies changing as little as possible (like swapping two cities in a TSP).
From: http://en.wikipedia.org/wiki/Simulated_annealing
"The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour."
So I believe my GetNextArrangement function would want to swap a random item in the set with an unused item.

Calculate shortest path through a grocery store

I'm trying to find a way to find the shortest path through a grocery store, visiting a list of locations (shopping list). The path should start at a specified start position and can end at multiple end positions (there are multiple checkout counters).
Also, I have some predefined constraints on the path, such as "item x on the shopping list needs to be the last, second last, or third last item on the path". There is a function that will return true or false for a given path.
Finally, this needs to be calculated with limited CPU power (on a smartphone) and within a second or so. If this isn't possible, then an approximation to the optimal path is also ok.
Is this possible? So far I think I need to start by calculating the distance between every item on the list using something like A* or Dijkstra's. After that, should I treat it like the traveling salesman problem? Because in my problem there is a specified start node, specified end nodes, and some constraints, which are not in the traveling salesman problem.
It seems like viewing it as a TSP makes it more difficult. Someone pointed out that grocery stores aren't that complicated. In the grocery stores I am familiar with (in the US), there aren't that many reasonable routes, especially if you have a given starting point. I think a well thought-out heuristic will probably do the trick.
For example: I typically start at one end. If it's a big trip I make sure I go through the frozen foods last, but it often doesn't matter and I'll start closest to where I enter the store. I generally walk around the outside, only going down individual aisles if I need something in them. Once you go into an aisle, pick up everything you need in it. With some aisles it's better to drop into one end, grab the item, and go back to your starting point; with others you just commit to the whole aisle. It's a function of the last item you need in that aisle and where you need to be next: whether getting out of the aisle involves a backtrack depends on the next item needed, and the computer can easily calculate the shortest path to the next items.
So I agree with the helpful hints of the other answers above, but maybe a less general algorithm will work, and will probably work better with limited resources. TSP theory tells us you can't prove this is the optimal approach, but I suspect that's not really necessary...
Well, basically it is a traveling salesman problem, but with your requirements you limit the search space pretty well. If there are not too many locations, you could try to calculate all possible options, but if that is not feasible, you need to limit the search space even more.
You can make the search time-limited so that you return the shortest path you can find within 1 second; that might not be the shortest of them all, but it will be pretty short anyway.
You could also apply a greedy algorithm to find the next closest location, then use backtracking to select the next closest location, etc.
Pseudo code for a possible solution (cleaned up into runnable Python; NUM_LOCATIONS, calc_len, and possible_locations are helpers you would supply):

import time

START = time.time()
TIME_LIMIT = 1.0  # seconds

def time_limit_reached():
    return time.time() - START > TIME_LIMIT

def find_shortest_path(current_path, best_path):
    if time_limit_reached():
        return best_path
    if len(current_path) == NUM_LOCATIONS:
        # destination reached
        if calc_len(current_path) < calc_len(best_path):
            best_path = current_path
        return best_path
    # You need to sort the possible locations well (e.g. nearest first)
    # to maximize your chances of finding the shortest solution early.
    for location in sorted(possible_locations(current_path)):
        best_path = find_shortest_path(current_path + [location], best_path)
    return best_path
Well, you could limit the search-space using information about the layout of the store. For instance, a typical store (at least here in Germany) has many shelves in it that can be considered to form lanes. In between there are orthogonal lanes that connect the shelf-lanes. Now you define the crossings to be the nodes and the lanes to be edges in a graph. The edges are labeled with all the items in the shelves of that section of lane. Now, even for a big store this graph would be pretty small. You would now have to find the shortest path that includes all the edge-labels (items) you need. This should be possible by using the greedy/backtracking approach Tuomas Pelkonen suggested.
This is just an idea, and I do not know if it really works, but maybe you can take it from here.
Only breadth-first searching will make sure you don't "miss" a path through the store that is better than your current "best" solution, but you need not search every node in the path. Nodes which are "obviously" longer than the current "best" solution may be expanded later.
This means you approach the problem like a "breadth-first" search, but alter the expansion of your nodes based on the current distance travelled. Some branches of the search tree will expand faster than others, because they manage to visit more nodes in the same amount of time.
So if node expansion is not truly breadth-first, why do I keep using that word? Because after you do find a solution, you must still expand the current set of "considered nodes" until each of those search paths exceeds the solution. This avoids missing a path which has a lot of time-consuming initial legs but finishes faster than the current solution because the last step is lightning fast.
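A compact sketch of that expand-and-prune loop (Python; is_goal, neighbours, and cost are assumed callbacks, and step costs are assumed positive so the search terminates):

import heapq
import itertools

def search(start, is_goal, neighbours, cost):
    # Expand the cheapest open path first; once a complete solution exists,
    # only open paths that could still beat it survive.
    tie = itertools.count()  # tie-breaker so the heap never compares paths
    best, best_path = float("inf"), None
    frontier = [(0, next(tie), start, [start])]
    while frontier:
        dist, _, node, path = heapq.heappop(frontier)
        if dist >= best:
            continue  # cannot beat the incumbent solution: prune
        if is_goal(node):
            best, best_path = dist, path
            continue
        for nxt in neighbours(node):
            heapq.heappush(frontier, (dist + cost(node, nxt), next(tie), nxt, path + [nxt]))
    return best_path, best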
The requirement of a start node is fictitious. Using the TSP you'll end up with a tour where you can choose whichever start node you want without altering the solution cost.
It's somewhat trickier when it comes to the counters: what you need is to solve the problem on a directed graph with some arcs missing (or, which is the same, where some arcs have a really high cost).
Starting with the complete directed graph you should modify the costs of the proper arcs in order to:
deny going from the items to the start node
deny going from the counters to the items
deny going from the start node to the counters
allow going from the counters to the start node at zero cost (these arcs are only needed to close the tour)
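For instance, as a sketch in Python (dist is an assumed walking-distance function, and BIG stands in for "a really high cost"):

import itertools

BIG = 10**9  # effectively denies an arc

def build_cost_matrix(start, items, counters, dist):
    nodes = [start] + items + counters
    cost = {(a, b): dist(a, b) for a, b in itertools.permutations(nodes, 2)}
    for i in items:
        cost[(i, start)] = BIG        # deny item -> start
        for c in counters:
            cost[(c, i)] = BIG        # deny counter -> item
    for c in counters:
        cost[(start, c)] = BIG        # deny start -> counter
        cost[(c, start)] = 0          # close the tour at zero cost
    return nodes, cost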
after having drawn the thing down, tell me if I missed something :)
HTH

Efficient word scramble algorithm

I'm looking for an efficient algorithm for scrambling a set of letters into a permutation containing the maximum number of words.
For example, say I am given the list of letters: {e, e, h, r, s, t}. I need to order them in such a way as to contain the maximum number of words. If I order those letters into "theres", it contains the words "the", "there", "her", "here", and "ere". So that example could have a score of 5, since it contains 5 words. I want to order the letters in such a way as to have the highest score (contain the most words).
A naive algorithm would be to try and score every permutation. I believe this is O(n!), so 720 different permutations would be tried for just the 6 letters above (including some duplicates, since the example has e twice). For more letters, the naive solution quickly becomes impossible, of course.
The algorithm doesn't have to actually produce the very best solution, but it should find a good solution in a reasonable amount of time. For my application, simply guessing (Monte Carlo) at a few million permutations works quite poorly, so that's currently the mark to beat.
I am currently using the Aho-Corasick algorithm to score permutations. It searches for each word in the dictionary in just one pass through the text, so I believe it's quite efficient. This also means I have all the words stored in a trie, but if another algorithm requires different storage that's fine too. I am not worried about setting up the dictionary, just the run time of the actual ordering and searching. Even a fuzzy dictionary could be used if needed, like a Bloom Filter.
For my application, the list of letters given is about 100, and the dictionary contains over 100,000 entries. The dictionary never changes, but several different lists of letters need to be ordered.
I am considering trying a path-finding algorithm. I believe I could start with a random letter from the list as a starting point. Then each remaining letter would be used to create a "path." I think this would work well with the Aho-Corasick scoring algorithm, since scores could be built up one letter at a time. I haven't tried path finding yet though; maybe it's not even a good idea? I don't know which path-finding algorithm might be best.
Another algorithm I thought of also starts with a random letter. Then the dictionary trie would be searched for "rich" branches containing the remaining letters. Dictionary branches containing unavailable letters would be pruned. I'm a bit foggy on the details of how this would work exactly, but it could completely eliminate scoring permutations.
Here's an idea, inspired by Markov Chains:
Precompute the letter transition probabilities in your dictionary. Create a table with the probability that some letter X is followed by another letter Y, for all letter pairs, based on the words in the dictionary.
Generate permutations by randomly choosing each next letter from the remaining pool of letters, based on the previous letter and the probability table, until all letters are used up. Run this many times.
You can experiment by increasing the "memory" of your transition table: don't look only one letter back, but say 2 or 3. This increases the size of the probability table, but gives you a better chance of creating valid words.
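A minimal sketch of that generator in Python (first-order memory; the add-one smoothing is my addition so unseen letter pairs stay possible):

import random
from collections import Counter, defaultdict

def transition_table(dictionary):
    # counts[x][y] = how often letter y follows letter x in dictionary words
    counts = defaultdict(Counter)
    for word in dictionary:
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1
    return counts

def markov_permutation(letters, counts):
    # Order the letters by sampling each next letter from the remaining
    # pool, weighted by how often it follows the previous letter.
    pool = list(letters)
    random.shuffle(pool)
    out = [pool.pop()]
    while pool:
        weights = [counts[out[-1]][c] + 1 for c in pool]  # +1 smoothing
        nxt = random.choices(pool, weights=weights, k=1)[0]
        pool.remove(nxt)
        out.append(nxt)
    return "".join(out)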
You might try simulated annealing, which has been used successfully for complex optimization problems in a number of domains. Basically you do randomized hill-climbing while gradually reducing the randomness. Since you already have the Aho-Corasick scoring you've done most of the work already. All you need is a way to generate neighbor permutations; for that something simple like swapping a pair of letters should work fine.
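A bare-bones version of that loop (Python; score() stands in for your Aho-Corasick scorer, and the schedule constants are arbitrary):

import math
import random

def anneal(letters, score, steps=100_000, t0=2.0, cooling=0.9999):
    state = list(letters)
    random.shuffle(state)
    current = score(state)
    best, best_state = current, state[:]
    t = t0
    for _ in range(steps):
        i, j = random.sample(range(len(state)), 2)
        state[i], state[j] = state[j], state[i]      # neighbour: swap two letters
        candidate = score(state)
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if candidate >= current or random.random() < math.exp((candidate - current) / t):
            current = candidate
            if current > best:
                best, best_state = current, state[:]
        else:
            state[i], state[j] = state[j], state[i]  # undo the swap
        t *= cooling
    return "".join(best_state), best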
Have you thought about using a genetic algorithm? You have the beginnings of your fitness function already. You could experiment with the mutation and crossover (thanks Nathan) algorithms to see which do the best job.
Another option would be for your algorithm to build the smallest possible word from the input set, and then add one letter at a time so that the new string also is, or contains, a new word. Start with a few different starting words for each input set and see where it leads.
Just a few idle thoughts.
It might be useful to check how others solved this:
http://sourceforge.net/search/?type_of_search=soft&words=anagram
On this page you can generate anagrams online. I've played around with it for a while and it's great fun. It doesn't explain in detail how it does its job, but the parameters give some insight.
http://wordsmith.org/anagram/advanced.html
With JavaScript and Node.js I implemented a jumble solver that uses a dictionary to create a tree, then traverses the tree to find all possible words. I explained the algorithm in detail in this article and put the source code on GitHub:
Scramble or Jumble Word Solver with Express and Node.js

Algorithm for shortening a series of actions?

It's been a while since my algorithms class in school, so forgive me if my terminology is not exact.
I have a series of actions that, when run, produces some desired state (it's basically a set of steps to reproduce a bug, but that doesn't matter for the sake of this question).
My goal is to find the shortest series of steps that still produces the desired state. Any given step might be unnecessary, so I'm trying to remove those as efficiently as possible.
I want to preserve the order of the steps (so I can remove steps, but not rearrange them).
The naive approach I'm taking is to take the entire series and try removing each action. If I successfully can remove one action (without altering the final state), I start back at the beginning of the series. This should be O(n^2) in the worst case.
I'm starting to play around with ways to make this more efficient, but I'm pretty sure this is a solved problem. Unfortunately, I'm not sure exactly what to Google - the series isn't really a "path," so I can't use path-shortening algorithms. Any help - even just giving me some terms to search - would be helpful.
Update: Several people have pointed out that even my naive algorithm won't find the shortest solution. This is a good point, so let me revise my question slightly: any ideas about approximate algorithms for the same problem? I'd rather have a short solution that's near the shortest solution quickly than take a very long time to guarantee the absolute shortest series. Thanks!
Your naive n^2 approach is not exactly correct; in the worst case you might have to look at all subsets (well, actually, the more accurate thing to say is that this problem might be NP-hard, which doesn't mean "you might have to look at all subsets", but anyway...)
For example, suppose you are currently running steps 12345, and you start trying to remove each of them individually. Then you might find that you can't remove 1, you can remove 2 (so you remove it), then you look at 1345 and find that each of them is essential -- none can be removed. But it might turn out that actually, if you keep 2, then just "125" suffice.
If your family of sets that produce the given outcome is not monotone (i.e. if it doesn't have the property that whenever a certain set of actions works, so does any superset), then you can prove that there is no way of finding the shortest sequence without looking at all subsets.
If you are making strictly no assumptions about the effect of each action and you want to strictly find the smallest subset, then you will need to try all possible subsets of actions to find the shortest sequence.
The binary search method stated would only be sufficient if a single step caused your desired state.
For the more general case, even removing a single action at a time would not necessarily give you the shortest sequence. This is the case if you consider pathological examples where actions together cause no problem, but individually trigger your desired state.
Your problem seems reducible to a more general search problem, and the more assumptions you can make, the smaller your search space will become.
Delta Debugging, a method for minimizing a set of failure-inducing inputs, might be a good fit.
I've previously used Delta (which minimizes "interesting" files based on a test for interestingness) to reduce a ~1000-line file to around 10 lines for a bug report.
The most obvious thing that comes to mind is a binary search-inspired recursive division into halves, where you alternately leave out each half. If leaving out a half at any stage of the recursion still reproduces the end state, then leave it out; otherwise, put it back in and recurse on both halves of that half, etc.
Recursing on both halves means that it tries to eliminate large chunks before giving up and trying smaller chunks of those chunks. The running time will be O(n log(n)) in the worst case, but if you have a large n with a high likelihood of many irrelevant steps, it ought to win out over the O(n) approach of trying to leave out each step one at a time (but not restarting).
This algorithm will only find some minimal path, though; it can't find smaller paths that may exist due to combinatorial inter-step effects (if the steps are indeed of that nature). Finding all of those would result in combinatorial explosion, unless you have more information about the steps with which to reason (such as dependencies).
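That halving strategy is essentially the ddmin algorithm behind Delta Debugging, mentioned above. A simplified sketch (reproduces() is an assumed callback that replays a candidate series and checks for the desired end state):

def minimize(steps, reproduces):
    # Try to delete contiguous chunks of steps at progressively finer
    # granularity, halving the chunk size after each full pass.
    chunk = max(len(steps) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(steps):
            candidate = steps[:i] + steps[i + chunk:]
            if reproduces(candidate):
                steps = candidate   # the chunk was unnecessary; drop it
            else:
                i += chunk          # the chunk is needed; move past it
        chunk //= 2
    return steps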
Your problem domain can be mapped to a directed graph where states are nodes and steps are links. You want to find the shortest path in that graph, and a number of well-known algorithms exist for this, for example Dijkstra's or A*.
Updated:
Let's think about a simple case: you have one step that leads from state A to state B; this can be drawn as 2 nodes connected by a link. Now you have another step that leads from A to C, and from C you have a step that leads to B. With this you have a graph with 3 nodes and 3 links, and the cost of reaching B from A is either 2 (A-C-B) or 1 (A-B).
So you can see that the cost function is actually very simple: you add 1 for every step you take to reach the goal.
