Algorithm Problem: Word transformation from a given word to another using only the words in a given dictionary

The detailed description of the problem is as follows:
Given two words (beginWord and endWord), and a dictionary's word list, find if there's a transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
I know this problem can be solved using breadth-first search. After I proposed the normal BFS solution, the interviewer asked me if I could make it faster. I didn't figure out a way to speed it up, and the interviewer told me I should use a PriorityQueue instead to do a "best-first search", where the priority is given by the Hamming distance between the current word and the target.
I don't quite understand why this speeds up the search. I feel that by using a priority queue we try to explore the paths that make progress (i.e. reduce the Hamming distance).
This seems to be a greedy method. My question is:
Why is this solution faster than the breadth-first search solution? I feel the actual path can look like this: at first it makes no progress, or even increases the Hamming distance, but after reaching a certain word the Hamming distance goes down steadily. In this scenario, I think the priority queue solution will be slower.
Any suggestions will be appreciated! Thanks

First, I'd recommend doing some thorough reading on graph search algorithms; that will explain the question in as much detail as you want (and far beyond).
TL;DR:
Your interviewer effectively recommended something close to the A* algorithm.
It differs from BFS in one aspect: which node to expand first. It uses a notion of a distance score, composed of two elements:
At a node X, we already "traveled" a distance given by the number of transformations so far.
To reach the target from X, we still need to travel some more, at least N steps, where N is the number of characters different between node and target.
If we are to follow the path through X, the total number of steps from start to target can't be less than this score. It can be more if the real rest distance turns out to be longer (some words necessary for the direct path don't exist in the dictionary).
A* tells us: of all open (unexpanded) nodes, try the one first that potentially gives the shortest overall solution path, i.e. the one with the lowest score. And to implement that, a priority queue is a good fit.
In many cases, A* can dramatically reduce the search space (compared to BFS), and it still guarantees to find the best solution.
A* is NOT a greedy algorithm. It will eventually explore the whole search space, only in a much better ordering than a blind BFS.
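To make the idea concrete, here is a minimal sketch in Python (my own code, not the interviewer's): each queued word carries f = g + h, where g is the number of transformations made so far and h is the Hamming distance to the target. The function name and the small example word list are made up for illustration.

import heapq
import string

def hamming(a, b):
    # Number of differing characters; all words have the same length here.
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def word_ladder_astar(begin, end, word_list):
    # Returns the length of the shortest transformation sequence, or 0 if none exists.
    words = set(word_list)
    if end not in words:
        return 0
    heap = [(hamming(begin, end), 0, begin)]   # (f = g + h, g, word)
    best_g = {begin: 0}
    while heap:
        f, g, word = heapq.heappop(heap)
        if word == end:
            return g + 1                       # count both endpoints of the sequence
        if g > best_g.get(word, float('inf')):
            continue                           # stale queue entry
        for i in range(len(word)):
            for c in string.ascii_lowercase:
                nxt = word[:i] + c + word[i + 1:]
                if nxt in words and g + 1 < best_g.get(nxt, float('inf')):
                    best_g[nxt] = g + 1
                    heapq.heappush(heap, (g + 1 + hamming(nxt, end), g + 1, nxt))
    return 0

# "hit" -> "hot" -> "dot" -> "dog" -> "cog" gives a sequence of length 5.
print(word_ladder_astar("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))

Because the Hamming distance never overestimates the number of changes still required, the first time the target word is popped the sequence found is as short as the one plain BFS would find.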

Is Dijkstra's algorithm deterministic?

I think that Dijkstra's algorithm is determined, in the sense that if you choose the same starting vertex you will get the same result (the same distances to every other vertex). But I don't think that it is deterministic (that each operation has a single defined next operation), because that would mean it wouldn't have to search for the shortest distances in the first place.
Am I correct? If I'm wrong, could you please explain why it is deterministic, and maybe give an example?
I'm not sure there is a universal definition of determinism, but Wikipedia defines it as...
... an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
So this requires both determinism of the output and determinism of the execution. The output of Dijkstra's algorithm is deterministic no matter how you look at it, because it's the length of the shortest path, and there is only one such length.
The execution of Dijkstra's algorithm in the abstract sense is not deterministic, because the final step is:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
If there are multiple nodes with the same smallest tentative distance, the algorithm is free to select one arbitrarily. This doesn't affect the output, but it does affect the order of operations within the algorithm.
A particular implementation of Dijkstra's algorithm, however, probably is deterministic, because the nodes will be stored in a deterministic data structure like a min-heap. This will result in the same node being selected each time the program is run, although things like hashtable salts may also affect determinism even here.
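As a small illustration of that last point (my own toy graph and names, assuming an adjacency-list representation): a heap-based implementation breaks ties on tentative distance by comparing node labels, so the expansion order is fixed on every run, even though the abstract algorithm leaves the choice open.

import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}; returns {node: shortest distance}.
    dist = {source: 0}
    heap = [(0, source)]               # (tentative distance, node)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)     # ties on d are broken by comparing node labels
        if u in visited:
            continue
        visited.add(u)
        for v, w in graph[u]:
            if v not in visited and d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Both 'b' and 'c' get tentative distance 1; this heap always expands 'b' first
# because 'b' < 'c', but the returned distances are the same either way.
g = {'a': [('b', 1), ('c', 1)], 'b': [('d', 1)], 'c': [('d', 1)], 'd': []}
print(dijkstra(g, 'a'))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2}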
Allow me to expand on Thomas's answer.
If you look at an implementation of Dijkstra, such as this example: http://graphonline.ru/en/?graph=NnnNwZKjpjeyFnwx you'll see a graph in which the paths 0→1→5, 0→2→5, 0→3→5 and 0→4→5 all have the same length. "The shortest path" is therefore not necessarily unique, as that example shows.
Using the wording on Wikipedia, at some point the algorithm instructs us to:
select the unvisited node that is marked with the smallest tentative distance.
The problem here is the word "the", suggesting that it is somehow unique. It may not be. For an implementation to actually pick one node from many equal candidates requires further specification of the algorithm regarding how to select it. But any such selected candidate having the required property will determine a path of the shortest length. So the algorithm doesn't commit. The modern approach to wording this algorithm would be to say:
select any unvisited node that is marked with the smallest tentative distance.
From a mathematical graph theory algorithm standpoint, that algorithm would technically proceed with all such candidates simultaneously in a sort of multiverse. All answers it may arrive at are equally valid. And when proving the algorithm works, we would prove it for all such candidates in all the multiverses and show that all choices arrive at a path of the same distance, and that the distance is the shortest distance possible.
Then, if you want to use the algorithm to just compute one such answer because you want to either A) find one such path, or B) determine the distance of such a path, then it is left up to you to select one specific branch of the multiverse to explore. All such selections made according to the algorithm as defined will yield a path whose length is the shortest length possible. You can define any additional non-conflicting criteria you wish to make such a selection.
The reason the implementation I linked is deterministic and always gives the same answer (for the same start and end nodes, obviously) is that the nodes themselves are ordered in the computer. This additional information about the ordering of the nodes is not part of the graph-theoretic description. The nodes are often labelled, but not necessarily ordered. In the implementation, the computer relies on the fact that the nodes appear in an ordered array in memory, and it uses this ordering to resolve ties, possibly by selecting the node with the lowest index in the array, a.k.a. the "first" candidate.
If an implementation resolved ties by randomly (not pseudo-randomly!) selecting a winner from equal candidates, then it wouldn't be deterministic either.
Dijkstra's algorithm as described on Wikipedia just defines an algorithm to find the shortest paths (note the plural paths) between nodes. Any such path that it finds (if it exists) it is guaranteed to be of the shortest distance possible. You're still left with the task of deciding between equivalent candidates though at step 6 in the algorithm.
As the tag says, the usual term is "deterministic". And the algorithm is indeed deterministic. For any given input, the steps executed are always identical.
Compare it to a simpler algorithm like adding two multi-digit numbers. The result is always the same for two given inputs, the steps are also the same, but you still need to add the numbers to get the outcome.
By deterministic I take it you mean that it will give the same answer to the same query for the same data every time, and give only one answer; in that sense, yes, it is deterministic. If it were not deterministic, think of the problems it would cause for those who use it. I write in Prolog all day, so I know non-deterministic answers when I see them.
Below I introduce a simple mistake in Prolog that makes the answer non-deterministic; with a simple fix it becomes deterministic.
Non-deterministic
spacing_rec(0,[]).
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).

% Assumed wrapper (not shown in the original post): turn the char list into an atom.
spacing(Length,Atom) :-
    spacing_rec(Length,Chars),
    atom_chars(Atom,Chars).

?- spacing(0,Atom).
Atom = '' ;   % the first clause leaves a choice point behind ...
false.        % ... which fails on backtracking, hence this extra "false"
Deterministic
spacing_rec(0,[]) :- !.   % the cut removes the choice point
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).

spacing(Length,Atom) :-   % same assumed wrapper as above
    spacing_rec(Length,Chars),
    atom_chars(Atom,Chars).

?- spacing(0,Atom).
Atom = ''.
I will try to keep this short and simple; there are so many great explanations of this on here and elsewhere online, if you do some good research of course.
Dijkstra's algorithm is a greedy algorithm; its main goal is to find the shortest path between two nodes of a weighted graph.
Wikipedia does a great job of explaining what deterministic and non-deterministic algorithms are and how you can 'determine' which category an algorithm falls into:
From Wikipedia Source:
Deterministic Algorithm:
In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.
Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.
Nondeterministic Algorithm
In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition.
So, going back to the goal of Dijkstra's algorithm: it is like saying I want to get from location X to location Z, but to do that I have the option of going through cheaper intermediate nodes that will get me to my destination a lot more quickly and efficiently than other, longer routes...
Thinking through cases where Dijkstra's algorithm could be deterministic would be a good idea as well.

Approximated closest pair algorithm

I have been thinking about a variation of the closest pair problem in which the only available information is the set of distances already calculated (we are not allowed to sort points according to their x-coordinates).
Consider 4 points (A, B, C, D), and the following distances:
dist(A,B) = 0.5
dist(A,C) = 5
dist(C,D) = 2
In this example, I don't need to evaluate dist(B,C) or dist(A,D), because it is guaranteed that these distances are greater than the current known minimum distance.
Is it possible to use this kind of information to reduce the O(n²) cost to something like O(n log n)?
Is it possible to reduce the cost to something close to O(n log n) if I accept a kind of approximate solution? In this case, I am thinking about some technique based on reinforcement learning that only converges to the real solution when the number of reinforcements goes to infinity, but provides a great approximation for small n.
Processing time (measured in big O notation) is not the only issue. Keeping a very large number of previously calculated distances can also be an issue.
Imagine this problem for a set with 10⁸ points.
What kind of solution should I look for? Was this kind of problem solved before?
This is not a classroom problem or something related. I have been just thinking about this problem.
I suggest using ideas that are derived from quickly solving k-nearest-neighbor searches.
The M-Tree data structure (see http://en.wikipedia.org/wiki/M-tree and http://www.vldb.org/conf/1997/P426.PDF) is designed to reduce the number of distance comparisons that need to be performed to find "nearest neighbors".
Personally, I could not find an implementation of an M-Tree online that I was satisfied with (see my closed thread Looking for a mature M-Tree implementation) so I rolled my own.
My implementation is here: https://github.com/jon1van/MTreeMapRepo
Basically, this is a binary tree in which each leaf node contains a HashMap of Keys that are "close" in some metric space you define.
I suggest using my code (or the idea behind it) to implement a solution in which you:
Search each leaf node's HashMap and find the closest pair of Keys within that small subset.
Return the closest pair of Keys when considering only the "winner" of each HashMap.
This style of solution would be a "divide and conquer" approach that returns an approximate solution.
You should know this code has an adjustable parameter that governs the maximum number of Keys that can be placed in an individual HashMap. Reducing this parameter will increase the speed of your search, but it will increase the probability that the correct solution won't be found because one Key is in HashMap A while the second Key is in HashMap B.
Also, each HashMap is associated with a "radius". Depending on how accurate you want your result, you may be able to just search the HashMap with the largest hashMap.size()/radius (because this HashMap contains the highest density of points, and is therefore a good search candidate).
Good Luck
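This is not the linked M-Tree code. Purely as an illustration of the "solve each bucket, keep the winner" idea, here is a much cruder sketch that uses a uniform grid instead of an M-Tree; the function name and the cell_size parameter are my own. It also shows the failure mode described above: the true closest pair is missed whenever its two points land in different buckets.

import math
from collections import defaultdict
from itertools import combinations

def grid_closest_pair(points, cell_size):
    # Bucket 2D points into grid cells, brute-force each cell, return the best winner.
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(int(x // cell_size), int(y // cell_size))].append((x, y))

    best = (math.inf, None, None)
    for cell_points in buckets.values():
        for a, b in combinations(cell_points, 2):   # brute force inside one bucket
            d = math.dist(a, b)
            if d < best[0]:
                best = (d, a, b)
    return best

print(grid_closest_pair([(0.1, 0.1), (0.4, 0.2), (5.0, 5.0), (5.1, 5.2)], cell_size=1.0))

Shrinking cell_size (like shrinking the per-HashMap capacity) makes each bucket cheaper to search but makes a cross-bucket miss more likely.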
If you only have sampled distances, not original point locations in a plane that you can operate on, then I suspect you are bounded at O(E).
Specifically, it would seem from your description that any valid solution would need to inspect every edge in order to rule out its having something interesting to say; meanwhile, inspecting every edge and taking the smallest solves the problem.
Planar versions get around O(V²) by using planar distances to deduce limitations on sets of edges, allowing us to avoid looking at most of the edge weights.
Use the same idea as in space partitioning. Recursively split the given set of points by choosing two points and dividing the set into two parts: points that are closer to the first point and points that are closer to the second point. That is the same as splitting the points by a line passing between the two chosen points.
That produces a (binary) space partitioning on which standard nearest-neighbour search algorithms can be used.
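A minimal sketch of that recursive two-pivot splitting, under my own assumptions (2D points, random pivots, brute force below a leaf size); it is approximate because a closest pair that straddles a split is never compared.

import math
import random
from itertools import combinations

def partition_closest_pair(points, leaf_size=16):
    def brute(pts):
        return min((math.dist(a, b), a, b) for a, b in combinations(pts, 2))

    if len(points) <= leaf_size:
        return brute(points)
    p1, p2 = random.sample(points, 2)            # the two chosen splitting points
    left = [p for p in points if math.dist(p, p1) <= math.dist(p, p2)]
    right = [p for p in points if math.dist(p, p1) > math.dist(p, p2)]
    if len(left) < 2 or len(right) < 2:          # degenerate split: just brute-force
        return brute(points)
    return min(partition_closest_pair(left, leaf_size),
               partition_closest_pair(right, leaf_size))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
print(partition_closest_pair(pts))               # (approximate) distance and pair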

concrete examples of heuristics

What are concrete examples of heuristics (e.g. alpha-beta pruning, say in tic-tac-toe, and how is it applicable there)? I already saw an answered question about what a heuristic is, but I still don't get the part where it uses estimation. Can you give me a concrete example of a heuristic and how it works?
Warnsdorff's rule is a heuristic, but the A* search algorithm isn't. A* is, as its name implies, a search algorithm, which is not problem-dependent; the heuristic is. An example: you can use A* (if correctly implemented) to solve the Fifteen puzzle and to find the shortest way out of a maze, but the heuristics used will be different. With the Fifteen puzzle your heuristic could be how many tiles are out of place: the number of moves needed to solve the puzzle will always be greater than or equal to that heuristic.
To get out of the maze you could use the Manhattan Distance to a point you know is outside of the maze as your heuristic. Manhattan Distance is widely used in game-like problems as it is the number of "steps" in horizontal and in vertical needed to get to the goal.
Manhattan distance = abs(x2-x1) + abs(y2-y1)
It's easy to see that in the best case (there are no walls) that will be the exact distance to the goal; in every other case you will need more. This is important: your heuristic must be optimistic (an admissible heuristic) so that your search algorithm is optimal. It must also be consistent. However, in some applications (such as games with very big maps) you use non-admissible heuristics because a suboptimal solution suffices.
A heuristic is just an approximation of the real cost (never higher than the real cost if it is admissible). The better the approximation, the fewer states the search algorithm will have to explore. But better approximations usually mean more computing time, so you have to find a compromise.
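As a sketch of the two Fifteen-puzzle heuristics just mentioned (my own representation: the board is a flat tuple, 0 is the blank): both never overestimate the real number of moves, so they are admissible, and the Manhattan distance is always at least as large as the misplaced-tile count, so it prunes more.

def misplaced_tiles(state, goal):
    # Count tiles (ignoring the blank, 0) that are not on their goal cell.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal, width=4):
    # Sum over tiles of horizontal + vertical distance from the tile's goal cell.
    goal_index = {tile: i for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal_index[tile]
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total

goal = tuple(range(1, 16)) + (0,)    # tiles 1..15 then the blank
state = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 15)
print(misplaced_tiles(state, goal))  # 1
print(manhattan(state, goal))        # 1, and indeed a single move solves this state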
Most demonstrative is the usage of heuristics in informed search algorithms, such as A-Star. For realistic problems you usually have large search space, making it infeasible to check every single part of it. To avoid this, i.e. to try the most promising parts of the search space first, you use a heuristic. A heuristic gives you an estimate of how good the available subsequent search steps are. You will choose the most promising next step, i.e. best-first. For example if you'd like to search the path between two cities (i.e. vertices, connected by a set of roads, i.e. edges, that form a graph) you may want to choose the straight-line distance to the goal as a heuristic to determine which city to visit first (and see if it's the target city).
Heuristics should have similar properties as metrics for the search space and they usually should be optimistic, but that's another story. The problem of providing a heuristic that works out to be effective and that is side-effect free is yet another problem...
For an application of different heuristics being used to find the path through a given maze also have a look at this answer.
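A minimal sketch of that city-routing example, with a made-up map and names: the straight-line (Euclidean) distance to the goal is the heuristic, and it never overestimates the remaining road distance as long as every road is at least as long as the straight line between its endpoints.

import heapq
import math

def astar(roads, coords, start, goal):
    # roads: {city: [(neighbor, road_length), ...]}; coords: {city: (x, y)}.
    def h(city):
        return math.dist(coords[city], coords[goal])   # straight-line distance

    heap = [(h(start), 0.0, start)]                    # (f = g + h, g, city)
    best_g = {start: 0.0}
    while heap:
        f, g, city = heapq.heappop(heap)
        if city == goal:
            return g
        for nxt, road in roads[city]:
            if g + road < best_g.get(nxt, math.inf):
                best_g[nxt] = g + road
                heapq.heappush(heap, (g + road + h(nxt), g + road, nxt))
    return math.inf

coords = {'A': (0, 0), 'B': (0, 2), 'C': (2, 0), 'D': (2, 2)}
roads = {'A': [('B', 3), ('C', 2)], 'B': [('D', 3)], 'C': [('D', 2)], 'D': []}
print(astar(roads, coords, 'A', 'D'))   # 4.0, via C; the dearer route via B is never expanded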
Your question interests me because I heard about heuristics during my studies too but never saw an application of them. I googled a bit and found this: http://www.predictia.es/blog/aco-search
This code simulates an "ant colony optimization" algorithm to search through a website.
The "ants" are workers which will search through the site; some will search randomly, while others will follow the "best path" determined by the previous ones.
A concrete example: I've been writing a solver for the game JT's Block, which is roughly equivalent to the Same Game. The algorithm performs a breadth-first search on all possible hits, stores the values, and proceeds to the next ply. The problem is that the number of possible hits quickly grows out of control (10e30 estimated positions per game), so I need to prune the list of positions at each turn and only keep the "best" of them.
Now, the definition of the "best" positions is quite fuzzy: they are the positions that are expected to lead to the best final scores, but nothing is sure. And here comes the heuristics. I've tried a few of them:
sort positions by score obtained so far
increase score by best score obtained with a x-depth search
increase score based on a complex formula using the number of tiles, their color and their proximity
improve the last heuristic by tweaking its parameters and seeing how they perform
etc...
The last of these heuristics could have led to an ant-march optimization: there are half a dozen parameters that can be tweaked from 0 to 1, and an optimizer could find the optimal combination of them. For the moment I've just manually improved some of them.
The second of these heuristics is interesting: it could lead to the optimal score through a full depth-first search, but such a goal is of course impossible because it would take too much time. In general, increasing x leads to a better heuristic, but increases the computing time a lot.
So here they are: some examples of heuristics. Anything can be a heuristic as long as it helps your algorithm perform better, and that's what makes them so hard to grasp: they're not deterministic. Another point about heuristics: they're supposed to give quick and dirty approximations of the real thing, so there's a trade-off between their execution time and their accuracy.
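A minimal sketch of that prune-to-the-best-K scheme (often called beam search), with a made-up stand-in for the game: the expand function and heuristic below are toys, not JT's Block, and all the names are my own.

import heapq

def beam_search(start, expand, heuristic, width, plies):
    # Keep only the `width` most promising positions at each ply,
    # ranked by `heuristic` (higher = more promising).
    frontier = [start]
    best = start
    for _ in range(plies):
        candidates = [child for pos in frontier for child in expand(pos)]
        if not candidates:
            break
        frontier = heapq.nlargest(width, candidates, key=heuristic)
        best = max(frontier + [best], key=heuristic)
    return best

# Toy stand-in: a "position" is (score_so_far, tiles_left); each "hit" removes
# 1-3 tiles and scores the square of the group size.
def expand(pos):
    score, left = pos
    return [(score + k * k, left - k) for k in (1, 2, 3) if k <= left]

heuristic = lambda pos: pos[0]       # "score obtained so far", the first heuristic above
print(beam_search((0, 12), expand, heuristic, width=5, plies=12))   # (36, 0)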
A couple of concrete examples: for solving the Knight's Tour problem, one can use Warnsdorff's rule, which is a heuristic. For solving the Fifteen puzzle, one can use A* search together with a heuristic such as the number of misplaced tiles.
The original question asked for concrete examples for heuristics.
Some of these concrete examples were already given. Another one would be the number of misplaced tiles in the 15-puzzle or its improvement, the Manhattan distance, based on the misplaced tiles.
One of the previous answers also claimed that heuristics are always problem-dependent, whereas algorithms are problem-independent. While there are, of course, also problem-dependent algorithms (for instance, for every problem you can just give an algorithm that immediately solves that very problem, e.g. the optimal strategy for any Tower of Hanoi problem is known), there are also problem-independent heuristics!
Consequently, there are also different kinds of problem-independent heuristics. Thus, in a certain way, every such heuristic can be regarded as a concrete example of a heuristic while not being tailored to a specific problem like the 15-puzzle. (Examples of problem-independent heuristics taken from planning are the FF heuristic or the Add heuristic.)
These problem-independent heuristics are based on a general description language and then perform a problem relaxation. That is, the problem relaxation is based only on the syntax (and, of course, its underlying semantics) of the problem description, without "knowing" what it represents. If you are interested in this, you should get familiar with "planning" and, more specifically, with "planning as heuristic search". I also want to mention that these heuristics, while being problem-independent, are of course dependent on the problem description language. (E.g., my before-mentioned heuristics are specific to "planning problems", and even within planning there are various sub-classes of problems with differing kinds of heuristics.)

Number of simple mutations to change one string to another?

I'm sure you've all heard of the "Word game", where you try to change one word to another by changing one letter at a time, and only going through valid English words. I'm trying to implement an A* Algorithm to solve it (just to flesh out my understanding of A*) and one of the things that is needed is a minimum-distance heuristic.
That is, the minimum number of mutations of the following three kinds needed to turn an arbitrary string a into another string b:
1) Change one letter for another
2) Add one letter at a spot before or after any letter
3) Remove any letter
Examples
aabca => abaca:
aabca
abca
abaca
= 2
abcdebf => bgabf:
abcdebf
bcdebf
bcdbf
bgdbf
bgabf
= 4
I've tried many algorithms out; I can't seem to find one that gives the actual answer every time. In fact, sometimes I'm not sure if even my human reasoning is finding the best answer.
Does anyone know any algorithm for such purpose? Or maybe can help me find one?
(Just to clarify, I'm asking for an algorithm that can turn any arbitrary string into any other, disregarding whether they are valid English words.)
You want the minimum edit distance (or Levenshtein distance):
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.
And one algorithm to determine the editing sequence is on the same page here.
An excellent reference on "Edit distance" is section 6.3 of the Algorithms textbook by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, a draft of which is available freely here.
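For reference, a small sketch of the standard dynamic-programming solution (keeping only one row of the table at a time); the two test strings are the examples from the question.

def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions and substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))               # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]                               # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                     # delete ca
                            curr[j - 1] + 1,                 # insert cb
                            prev[j - 1] + (ca != cb)))       # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("aabca", "abaca"))     # 2, matching the first example above
print(levenshtein("abcdebf", "bgabf"))   # 4, matching the second example above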
If you have a reasonably sized (small) dictionary, a breadth-first tree search might work.
So start with all the words your word can mutate into, then all the words those can mutate into (except the original), then go down to the third level... until you find the word you are looking for.
You could eliminate divergent words (ones further away from the target), but doing so might cause you to fail in a case where you must go through some divergent state to reach the shortest path.

Matching algorithm

Odd question here: not really about code but about logic. Hope it's OK to post it here. Here it is:
I have a data structure that can be thought of as a graph.
Each node can support many links, but each node is limited to a certain number of them.
All links are bidirectional, and each link has a cost. The cost depends on the Euclidean distance between the nodes, the minimum of a per-node parameter over the two nodes, and a global modifier.
I wish to find the maximum total cost for the graph.
I'm wondering if there is a clever way to find such a matching, rather than going through it by brute force... which is ugly... and I'm not sure how I'd even do that without spending 7 million years running it.
To clarify:
Global variable = T
Many nodes N, each with values E, X, Y, L
L is the max number of links each node can have.
cost of link A,B = sqrt( min(A.E, B.E) )
                   * ( 1 + sqrt( sqrt( (A.X-B.X)² + (A.Y-B.Y)² ) )/75 + sqrt(T)/10 )
total cost = sum over all links... and we wish to maximize this.
Average values for nodes are 40-50 and can range over (20..600).
The average node linking factor is 3, with a range of 0-10.
For the sake of completeness, for anybody else who looks at this article, I would suggest revisiting your graph theory algorithms:
Dijkstra
A*
Greedy
Depth-first / breadth-first
Even dynamic programming (in some situations)
etc., etc.
In there somewhere is the correct solution for your problem. I would suggest looking at Dijkstra first.
I hope this helps someone.
If I understand the problem correctly, there is likely no polynomial solution. Therefore I would implement the following algorithm:
Find some solution by being greedy. To do that, you sort all edges by cost and then go through them starting with the highest, adding an edge to your graph whenever possible and skipping it when a node can't accept more edges (a sketch of this greedy step follows below).
Look at your edges and try to change them to achieve a higher cost by using a heuristic. The first that comes to my mind: you cycle through all 4-tuples of nodes (A,B,C,D), and if your current graph has edges AB and CD but AC and BD would be better, then you make the change.
Optionally do the same thing with 6-tuples, or use other genetic algorithms (they are called that because they work by mutations).
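Here is a minimal sketch of step 1 (the greedy construction) under my reading of the cost formula from the question; the node dictionaries, field names and example values are made up, and the swap-improvement step is left out.

import math

def link_cost(a, b, t):
    # My reading of the question's formula:
    # sqrt(min of the two E values) * (1 + sqrt(Euclidean distance)/75 + sqrt(T)/10)
    d = math.dist((a['x'], a['y']), (b['x'], b['y']))
    return math.sqrt(min(a['e'], b['e'])) * (1 + math.sqrt(d) / 75 + math.sqrt(t) / 10)

def greedy_links(nodes, t):
    # Take candidate edges in decreasing cost order while both endpoints
    # still have spare link capacity (their L value).
    edges = [(link_cost(nodes[i], nodes[j], t), i, j)
             for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
    edges.sort(reverse=True)
    remaining = [n['l'] for n in nodes]
    chosen, total = [], 0.0
    for cost, i, j in edges:
        if remaining[i] > 0 and remaining[j] > 0:
            chosen.append((i, j))
            total += cost
            remaining[i] -= 1
            remaining[j] -= 1
    return total, chosen

nodes = [{'e': 45, 'x': 0, 'y': 0, 'l': 2},
         {'e': 50, 'x': 30, 'y': 0, 'l': 2},
         {'e': 40, 'x': 0, 'y': 40, 'l': 1},
         {'e': 60, 'x': 30, 'y': 40, 'l': 1}]
print(greedy_links(nodes, t=4.0))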
This is equivalent to the traveling salesman problem (and is therefore NP-Complete) since if you could solve this problem efficiently, you could solve TSP simply by replacing each cost with its reciprocal.
This means you can't solve it exactly. On the other hand, it means that you can do exactly as I said (replace each cost with its reciprocal) and then use any of the known TSP approximation methods on this problem.
Seems like a max flow problem to me.
Is it possible to just greedily select the next most expensive option from any given start point (omitting jumps to visited nodes) and stop once all nodes are visited? If you get to a dead end, backtrack to the previous spot where you were not at a dead end and greedily select again. It would require some work and probably something like a stack to keep your paths in. I think this would work quite effectively, provided the costs are well ordered and non-negative.
Use genetic algorithms. They are designed to handle problems like the one you state while drastically reducing the search time. Check for an AI library in your language of choice.

Resources