Generate a message out of cutout magazine characters (interview question) - algorithm

This problem comes out of the dynamic programming chapter in The Algorithm Deisgn Manual by Skiena.
Give an algorithm to determine whether you can generate a given string by pasting cutouts from a magazine. You are given a function that will identify the character and its position on the reverse side of the page for any given character position.
I solved this with with backtracking, but since it's in the dynamic programming chapter I think there must be a recurrence I can't figure out. Can anyone give me a hint?

You can solve it with maximum bipartite matching.
Each character L of the given string forms the left set. (Note, you repeat the characters if the string has repeated characters).
Each pair of characters (R1,R2) of the magazine forms the right set.
L is connected to (r1,r2) iff L=R1 or L=R2.
Find a maximum matching in the resulting graph. If all left vertices are part of the matching, you have the answer. If not, such a string is not possible.
See Maximum Bipartite Matchings for an algorithm.
Not sure if this is optimal though and sorry for not answering exactly as asked.

If you have a recursive backtracking solution, you may be able to apply memoization, which is one way to do dynamic programming.

Related

Minimum-size Trie Construction by Dynamic Programming (War Story: What’s Past is Prolog from The Algorithm Design Manual by Steven S Skiena)

I'm reading the The Algorithm Design Manual by Steven S Skiena, and trying to understand the solution to the problem of War Story: What’s Past is Prolog.
The problem is also well described here.
Basically, the problem is, given an ordered list of strings, give a optimal solution to construct a trie with minimum size (the string character as the node), with the constraint that the order of the strings must be reserved, while the character index can be reordered.
Maybe this is not an appropriate question here for stackoverflow, still I'm wondering if anyone could give me some hint on the solution, especially what this recurrence means by its arguments:
the recurrence for the Dynamic Programming algorithm
You can think about it this way:
Let's assume that we fix the index of the first character. All strings get split into r bins based on the value of the character in this position (bins are essentially subtrees).
We can work with each bin independently. It won't change the order across different bins because two strings in different bins are different in the first character.
Thus, we can solve the problem for each bin independently. After that, we need exactly one edge to connect the root to each bin (that is, subtree). That's where the formula
C[i_k, j_k] + 1 comes from.
As we want to minimize the total number of edges and we're free to pick the first position, we just try all possible options among m positions.
Note: this algorithm is correct under assumption that we can reorder the rest of the characters in each subtree independently. If it's not the case, the dynamic programming solution is incorrect.

Lexicographically smallest perfect matching

I want to find the lexicographically smallest perfect matching in two partial graphs. I'm supposed to use Kuhn's algorithm, but I don't understand how to make matching lexicographically smallest. Is it at all possible in Kuhn's algorithm? I can provide my code, but it's classic enough.
As a hint, consider how you could determine where just the first node should be matched in the lex-min matching.
In most cases like this it is usually easier to make a reduction instead of modifying the algorithm:
Find a way to change the input in your problem so that lexicographical order breaks any ties (but in a manner that perfect matchings still have a higher score than imperfect ones)
Run the modified graph through Kuhn's algorithm.
If needed, translate the answer back to the original problem.
I haven't tried to actually solve this myself or read the problem in detail. But this seems to be a textbook exercise and I feel this answer is enough :-)
Think about how you can create assignment prices that encourage lexicographically early matchings.

Shortest path from one word to another via valid words (no graph)

I came across this variation of edit-distance problem:
Find the shortest path from one word to another, for example storm->power, validating each intermediate word by using a isValidWord() function. There is no other access to the dictionary of words and therefore a graph cannot be constructed.
I am trying to figure this out but it doesn't seem to be a distance related problem, per se. Use simple recursion maybe? But then how do you know that you're going the right direction?
Anyone else find this interesting? Looking forward to some help, thanks!
This is a puzzle from Lewis Carroll known as Word Ladders. Donald Knuth covers this in The Stanford Graphbase. This also
You can view it as a breadth first search. You will need access to a dictionary of words, otherwise the space you will have to search will be huge. If you just have access to a valid word you can generate all the permutations of words and then just use isValidWord() to filter it down (Norvig's "How to Write a Spelling Corrector" is a great explanation of generating the edits).
You can guide the search by trying to minimize the edit distance between where you currently are and where you can to be. For example, generate the space of all nodes to search, and sort by minimum edit distance. Follow the links first that are closest (e.g. minimize the edit distance) to the target. In the example, follow the nodes that are closest to "power".
I found this interesting as well, so there's a Haskell implementation here which works reasonably well. There's a link in the comments to a Clojure version which has some really nice visualizations.
You can search from two sides at the same time. I.e. change a letter in storm and run it through isValidWord(), and change a letter in power and run it through isValidWord(). If those two words are the same, you have found a path.

Is this combinatorial optimization problem NP-hard?

I working on a combinatorial optimization problem that I suspect is NP-hard, and a genetic algorithm has been working well with our dataset. We're a research group and plan to publish our results in our field (not in math or CS), and I'd like to explore the NP-hard question before sending the manuscript out for review.
There are two main questions:
1) I'd like to know whether this particular optimization problem has been studied. I've heavily searched the lit but haven't seen anything exactly the same.
2) If the problem hasn't been studied, I might take a crack at doing a reducibility proof, and would like some pointers to good NP-complete candidates for the reduction.
The problem can be described in two ways, as a subsequence variant, and as a bipartite graph problem.
In the subsequence flavor, I want to find a "relaxed" subsequence that allows permutations, and optimize to minimize the permutation count. For example: (. = any other char)
Query: abc, Target: ..b.a.b.c., Result: abc (normal subsequence)
Query: abc, Target: ..b.a.c.a., Result: bac (subsequence with one permutation)
The bipartite formulation is a matching problem or linear assignment problem, with the graph partitioned into query character nodes and target character nodes. The edges connect the query characters to the target characters, such that there is exactly one edge from each query char to a target char. The objective function is to minimize the number of edge crossings (also called "crossing number" in the lit). This is similar to bipartite graph layout algorithms that reorder node placement to minimize edge crossings, but my problem requires the that both node orders stay fixed.
Any thoughts from the experts on questions 1 or 2?
Thanks in advance!
Just some idea: Does it somehow equivalent to finding the minimal number of swap needed to sort an array (MIN-SBR)? If yes, this is NP-Hard.
(btw, are you working on something similar to this?)
I don't think this is NP-hard. See work by Pevzner and Hannehali. A paper that comes to mind is titled ``From Cabbage to Turnip''. The idea is to find the minimum number of inversions to go from one string to another. They have a polytime algorithm for this.
The problem with "word problem" should be harder, right? – J-16 SDiZ 14
Yes, having multiple occurrences of the char in the target seems to make my problem harder than MIN-SBR, so from that angle my problem would be at least as hard as NP-complete. On the other hand, I'm not yet clear on how their central notion of block reversals would affect my claim of NP-completeness.
I sure would be nice to know whether my optimization can be solved in polynomial time. Put another way, it sure would be embarrassing if a reviewer came back with five lines of pseudocode that finds the global maximum in O(n)..
Would, Query: abc Target: ..c.b.a.a Result: cba, be three permutations (as per your use of the term) then? If so, then maybe you mean transpositions rather than permutations. A transposition is the swapping of two adjacent characters.
Good question. We're interested in a mapping from Query -> Target that has as few crossings as possible. This is very much the motivation for mentioning the bipartite edge crossings in the original post. Alternatively, you can think of maximizing a rank statistic, like Spearman's Rho, over the mapping.
Also, out of curiousity, how many unique characters are there in the query/target? – Justin Peel 18
Typical query: 100, typical target: 1000. Combinatorially, it's a huge solution space.

Number of simple mutations to change one string to another?

I'm sure you've all heard of the "Word game", where you try to change one word to another by changing one letter at a time, and only going through valid English words. I'm trying to implement an A* Algorithm to solve it (just to flesh out my understanding of A*) and one of the things that is needed is a minimum-distance heuristic.
That is, the minimum number of one of these three mutations that can turn an arbitrary string a into another string b:
1) Change one letter for another
2) Add one letter at a spot before or after any letter
3) Remove any letter
Examples
aabca => abaca:
aabca
abca
abaca
= 2
abcdebf => bgabf:
abcdebf
bcdebf
bcdbf
bgdbf
bgabf
= 4
I've tried many algorithms out; I can't seem to find one that gives the actual answer every time. In fact, sometimes I'm not sure if even my human reasoning is finding the best answer.
Does anyone know any algorithm for such purpose? Or maybe can help me find one?
(Just to clarify, I'm asking for an algorithm that can turn any arbitrary string to any other, disregarding their English validity-ness.)
You want the minimum edit distance (or Levenshtein distance):
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.
And one algorithm to determine the editing sequence is on the same page here.
An excellent reference on "Edit distance" is section 6.3 of the Algorithms textbook by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, a draft of which is available freely here.
If you have a reasonably sized (small) dictionary, a breadth first tree search might work.
So start with all words your word can mutate into, then all those can mutate into (except the original), then go down to the third level... Until you find the word you are looking for.
You could eliminate divergent words (ones further away from the target), but doing so might cause you to fail in a case where you must go through some divergent state to reach the shortest path.

Resources