How to find the best possible answer to a really large seeming problem?

First off, this is NOT a homework problem. I haven't had to do homework since 1988!
I have a list of words of length N
I have a max of 13 characters to choose from.
There can be multiples of the same letter
Given the list of words, which 13 characters would spell the most possible words? I can throw out words that make the problem harder to solve, for example:
speedometer has 4 e's in it, something MOST words don't have,
so I could toss that word due to a poor fit characteristic, or it might just
go away based on the algorithm
I've looked at letter distributions, and I've built a graph of the words (letter by letter). There is something I'm missing, or this problem is a lot harder than I thought. I'd rather not totally brute force it if that is possible, but I'm down to about that point right now.
Genetic algorithms come to mind, but I've never tried them before....
Seems like I need a way to score each letter based upon its association with other letters in the words it is in....

It sounds like a hard combinatorial problem. You are given a dictionary D of words, and you can select N letters (possibly with repeats) to cover / generate as many of the words in D as possible. I'm 99.9% certain it can be shown to be an NP-complete optimization problem in general (assuming the alphabet, i.e. the set of letters, may contain more than 26 items) by reduction of SETCOVER to it, but I'm leaving the actual reduction as an exercise to the reader :)
Assuming it's hard, you have the usual routes:
branch and bound
stochastic search
approximation algorithms

Best I can come up with is branch and bound. Make an "intermediate state" data structure that consists of
Letters you've already used (with multiplicity)
Number of characters you still get to use
Letters still available
Words still in your list
Number of words still in your list (count of the previous set)
Number of words that are not possible in this state
Number of words that are already covered by your choice of letters
You'd start with
Empty set
13
{A, B, ..., Z}
Your whole list
N
0
0
Put that data structure into a queue.
At each step
Pop an item from the queue
Split into possible next states (branch)
Bound & delete extraneous possibilities
From a state, I'd generate possible next states as follows:
For each letter L in the set of letters left
Generate a new state where:
you've added L to the list of chosen letters
the least letter is L
so you remove anything less than L from the allowed letters
So, for example, if your left-over set is {W, X, Y, Z}, I'd generate one state with W added to my choice, {W, X, Y, Z} still possible, one with X as my choice, {X, Y, Z} still possible (but not W), one with Y as my choice and {Y, Z} still possible, and one with Z as my choice and {Z} still possible.
Do all the various accounting to figure out the new states.
Each state has at minimum "Number of words that are already covered by your choice of letters" words, and at maximum that number plus "Number of words still in your list." Of all the states, find the highest minimum, and delete any states whose maximum is lower than that, since they can never beat a state that is already guaranteed to do better.
No special handling for speedometer required.
I can't imagine this would be fast, but it'd work.
There are probably some optimizations (e.g., store each word in your list as an array of A-Z of number of occurrences, and combine words with the same structure: 2 occurrences of AB.....T => BAT and TAB). How you sort and keep track of minimum and maximum can also probably help things somewhat. Probably not enough to make an asymptotic difference, but for a problem this big, maybe enough to make it run in a reasonable time instead of an extreme time.
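A minimal sketch of that letter-count idea, assuming plain lowercase words. The helper names (`signature`, `group_by_signature`, `covered`) are made up for illustration and are not from the answer above. Words with the same letter counts collapse into one entry, and a candidate multiset of letters is scored by how many words it can spell.

```python
from collections import Counter

def signature(word):
    """Letter counts of a word as a hashable, order-independent key."""
    return tuple(sorted(Counter(word.lower()).items()))

def group_by_signature(words):
    """Map each distinct letter-count signature to how many words share it."""
    return Counter(signature(w) for w in words)

def covered(groups, chosen):
    """Count words whose every letter count fits within the chosen letters."""
    have = Counter(chosen.lower())
    return sum(
        n for sig, n in groups.items()
        if all(have[letter] >= need for letter, need in sig)
    )

groups = group_by_signature(["bat", "tab", "speedometer", "tea"])
print(len(groups))              # bat and tab share one signature
print(covered(groups, "abt"))   # words spellable from a, b, t
print(covered(groups, "abet"))  # words spellable from a, b, e, t
```

The same `covered` function can serve as the scoring step for branch and bound, brute force, or a stochastic search.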

Total brute forcing should work, although the implementation would become quite confusing.
Instead of throwing out words like speedometer, can't you generate the association graphs considering only whether a character appears in the word or not (irrespective of the number of times it appears, as that should not have any bearing on the final best choice of 13 characters)? This would also make it fractionally simpler than total brute force.
Comments welcome. :)

Removing the bounds on each parameter including alphabet size, there's an easy objective-preserving reduction from the maximum coverage problem, which is NP-hard and hard to approximate with a ratio better than (e - 1) / e ≈ 0.632 . It's fixed-parameter tractable in the alphabet size by brute force.
I agree with Nick Johnson's suggestion of brute force; at worst, there are only (13 + 26 - 1) choose (26 - 1) multisets, which is only about 5 billion. If you limit the multiplicity of each letter to what could ever be useful, this number gets a lot smaller. Even if it's too slow, you should be able to recycle the data structures.
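To check the arithmetic, here is a small sketch that counts the multisets directly. The function names are hypothetical, and it assumes exactly 13 letters are picked (extra letters never hurt coverage). The capped variant shows how limiting the multiplicity of each letter shrinks the search space. The caps themselves would have to come from your word list.

```python
from math import comb

def multiset_count(alphabet_size, picks):
    """Number of multisets of `picks` items drawn from `alphabet_size` kinds."""
    return comb(picks + alphabet_size - 1, alphabet_size - 1)

def capped_multiset_count(caps, picks):
    """Same count, but letter i may be used at most caps[i] times."""
    ways = [1] + [0] * picks          # ways[t] = multisets of size t so far
    for cap in caps:
        new = [0] * (picks + 1)
        for total, w in enumerate(ways):
            if w:
                for used in range(min(cap, picks - total) + 1):
                    new[total + used] += w
        ways = new
    return ways[picks]

print(multiset_count(26, 13))                # 5414950296
print(capped_multiset_count([2] * 26, 13))   # far fewer with at most 2 of each
```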

I did not understand this completely "I have a max of 13 characters to choose from.". If you have a list of 1000 words, then did you mean you have to reduce that to just 13 chars?!
Some thoughts based on my (mis)understanding:
If you are only handling English lang words, then you can skip vowels because consonants are just as descriptive. Our brains can sort of fill in the vowels - a.k.a SMS/Twitter language :)
Perhaps for 1-3 letter words, stripping off vowels would lose too much info. But still:
spdmtr hs 4 's n t, smthng
MST wrds dn't hv, s cld
tss tht wrd d t pr ft
chrctrstc, r t mght jst g
wy bsd n th lgrthm
Stemming will cut words even shorter. Stemming first, then strip vowels. Then do a histogram....

Related

Is it possible to create an algorithm which generates an autogram?

An autogram is a sentence which describes the characters it contains, usually enumerating each letter of the alphabet, but possibly also the punctuation it contains. Here is the example given in the wiki page.
This sentence employs two a’s, two c’s, two d’s, twenty-eight e’s, five f’s, three g’s, eight h’s, eleven i’s, three l’s, two m’s, thirteen n’s, nine o’s, two p’s, five r’s, twenty-five s’s, twenty-three t’s, six v’s, ten w’s, two x’s, five y’s, and one z.
Coming up with one is hard, because you don't know how many letters it contains until you finish the sentence. Which is what prompts me to ask: is it possible to write an algorithm which could create an autogram? For example, a given parameter would be the start of the sentence as an input e.g. "This sentence employs", and assuming that it uses the same format as the above "x a's, ... y z's".
I'm not asking for you to actually write an algorithm, although by all means I'd love to see if you know one to exist or want to try and write one; rather I'm curious as to whether the problem is computable in the first place.

You are asking two different questions.
"is it possible to write an algorithm which could create an autogram?"
There are algorithms to find autograms. As far as I know, they use randomization, which means that such an algorithm might find a solution for a given start text, but if it doesn't find one, then this doesn't mean that there isn't one. This takes us to the second question.
"I'm curious as to whether the problem is computable in the first place."
Computable would mean that there is an algorithm which for a given start text either outputs a solution, or states that there isn't one. The above-mentioned algorithms can't do that, and an exhaustive search is not workable. Therefore I'd say that this problem is not computable. However, this is rather of academic interest. In practice, the randomized algorithms work well enough.

Let's assume for the moment that all counts are less than or equal to some maximum M, with M < 100. As mentioned in the OP's link, this means that we only need to decide counts for the 16 letters that appear in these number words, as counts for the other 10 letters are already determined by the specified prefix text and can't change.
One property that I think is worth exploiting is the fact that, if we take some (possibly incorrect) solution and rearrange the number-words in it, then the total letter counts don't change. IOW, if we ignore the letters spent "naming themselves" (e.g. the c in two c's) then the total letter counts only depend on the multiset of number-words that are actually present in the sentence. What that means is that instead of having to consider all possible ways of assigning one of M number-words to each of the 16 letters, we can enumerate just the (much smaller) set of all multisets of number-words of size 16 or less, having elements taken from the ground set of number-words of size M, and for each multiset, look to see whether we can fit the 16 letters to its elements in a way that uses each multiset element exactly once.
Note that a multiset of numbers can be uniquely represented as a nondecreasing list of numbers, and this makes them easy to enumerate.
What does it mean for a letter to "fit" a multiset? Suppose we have a multiset W of number-words; this determines total letter counts for each of the 16 letters (for each letter, just sum the counts of that letter across all the number-words in W; also add a count of 1 for the letter "S" for each number-word besides "one", to account for the pluralisation). Call these letter counts f["A"] for the frequency of "A", etc. Pretend we have a function etoi() that operates like C's atoi(), but returns the numeric value of a number-word. (This is just conceptual; of course in practice we would always generate the number-word from the integer value (which we would keep around), and never the other way around.) Then a letter x fits a particular number-word w in W if and only if f[x] + 1 = etoi(w), since writing the letter x itself into the sentence will increase its frequency by 1, thereby making the two sides of the equation equal.
This does not yet address the fact that if more than one letter fits a number-word, only one of them can be assigned it. But it turns out that it is easy to determine whether a given multiset W of number-words, represented as a nondecreasing list of integers, simultaneously fits any set of letters:
Calculate the total letter frequencies f[] that W implies.
Sort these frequencies.
Skip past any zero-frequency letters. Suppose there were k of these.
For each remaining letter, check whether its frequency is equal to one less than the numeric value of the number-word in the corresponding position. I.e. check that f[k] + 1 == etoi(W[0]), f[k+1] + 1 == etoi(W[1]), etc.
If and only if all these frequencies agree, we have a winner!
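The sort-and-compare steps above can be sketched as follows. This is only the fit test and the multiset enumeration. It assumes the frequencies f[] have already been computed for a given multiset, and it additionally assumes (my reading, not stated above) that the number of nonzero-frequency letters must equal the size of the multiset. `fits` and `multisets` are illustrative names.

```python
from itertools import combinations_with_replacement

def fits(freq, counts):
    """freq: letter -> frequency implied by the multiset (before self-naming).
    counts: the multiset as a nondecreasing sequence of integers."""
    nonzero = sorted(f for f in freq.values() if f > 0)   # skip zero letters
    return len(nonzero) == len(counts) and all(
        f + 1 == c for f, c in zip(nonzero, counts)       # f[x] + 1 == etoi(w)
    )

def multisets(max_count, size):
    """All multisets of `size` numbers in 1..max_count, as nondecreasing tuples."""
    return combinations_with_replacement(range(1, max_count + 1), size)

print(fits({"e": 2, "o": 1, "x": 0}, (2, 3)))
print(list(multisets(3, 2)))
```

Computing `freq` from the number-words (plus the prefix text and the plural "s") is the part left out here.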
The above approach is naive in that it assumes that we choose words to put in the multiset from a size M ground set. For M > 20 there is a lot of structure in this set that can be exploited, at the cost of slightly complicating the algorithm. In particular, instead of enumerating straight multisets of this ground set of all allowed numbers, it would be much better to enumerate multisets of {"one", "two", ..., "nineteen", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}, and then allow the "fit detection" step to combine the number-words for multiples of 10 with the single-digit number-words.

How to generate letter grids with lots of words

The question is basically "how do I generate a good grid for the game 'Boggle' with lots of words" where good is defined as having lots of words of 5 or more letters.
Boggle is a game where you roll dice with letters on them, they are placed in a 4x4 grid. Example:
H S A V
E N I S
K R G I
S O L A
Words can be made by connecting letters horizontally, vertically or diagonally. In the good example grid above you can make the words "VANISHERS", "VANISHER", "KNAVISH", "ALIGNERS", "SAVINGS", "SINKERS" and around 271 other words depending on the dictionary used, for example "AS", "I", "AIR", "SIN", "IS", etc...
As a bad example this grid
O V W C
T K Z O
Y N J H
D E I E
only has ~44 words, only 2 of which are more than 4 letters long: "TYNED" and "HINKY".
There's lots of similar questions but AFAICT not this exact question. This is obviously a reference to the game "Scramble with Friends".
The first solution, picking letters at random, has the problem that if you accidentally pick all consonants there will be no words. Adding a few random vowels is not enough to guarantee a good set of words. You might only get 1 to 4 letter words whereas a good algorithm will choose a set of letters that has > 200 words with many words > 7 letters.
I'm open to any algorithm. Obviously I could write code to brute force solutions finding every possible grid and then sorting them by grids with the most words but that simple solution would take forever to run.
I can imagine various heuristics like choosing a long word (8-16 letters), putting those letters in the grid at random but in a way that can actually still make the word and then filling in the left over spaces. I suspect that's also not enough to guarantee a good set of words though I haven't tried it yet.
It's possible the solution requires pre-processing a dictionary to know common parts of words. For example all words that end in "ing" or "ers" or "ght" or "tion" or "land". Or somehow organizing them into a graph of shared letters. Maybe weighting certain sets of letters so "ing" or "ers" are inserted often.
Ideas?

Short of the brute-force search proposal there is probably no way to guarantee that you have a good grid. If you use the letter frequency as found on the Boggle dice, then you will get 'average' grids (exactly as if you roll the dice). You could improve this by adding extra heuristics or filters, for example:
ensure that (almost) every consonant is 'in-reach-of' a vowel
ensure 'Q' is 'in-reach-of' a 'U'
ensure the ratio of vowels to consonants is within a set range
ensure the number of rare consonants is not too large
etc
Then you could
set letters using weighted letter frequency
change (swap/replace) letters not meeting your heuristics
It would still be possible for a bad grid to get through unless you checked via brute-force, but you may be able to reduce the number of bad grids substantially from those returned by a simple randomly generated grid.
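A rough sketch of such filters, with the grid given as a list of row strings. The thresholds (vowel share between 0.3 and 0.5, at most 3 rare consonants) and the set of "rare" letters are my own guesses for illustration, not tuned values, and Y is treated as a consonant.

```python
VOWELS = set("AEIOU")
RARE = set("JKQVWXZ")   # assumed set of rare consonants

def neighbours(grid, r, c):
    """Yield the letters in the up-to-8 cells adjacent to (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < len(grid) and 0 <= cc < len(grid[rr]):
                yield grid[rr][cc]

def q_reaches_u(grid):
    """Every Q must have a U in an adjacent cell."""
    return all("U" in neighbours(grid, r, c)
               for r, row in enumerate(grid)
               for c, ch in enumerate(row) if ch == "Q")

def passes_filters(grid, low=0.3, high=0.5, max_rare=3):
    letters = [ch for row in grid for ch in row]
    vowel_share = sum(ch in VOWELS for ch in letters) / len(letters)
    rare = sum(ch in RARE for ch in letters)
    return low <= vowel_share <= high and rare <= max_rare and q_reaches_u(grid)

GOOD = ["HSAV", "ENIS", "KRGI", "SOLA"]
BAD = ["OVWC", "TKZO", "YNJH", "DEIE"]
print(passes_filters(GOOD), passes_filters(BAD))
```

Note that the two example grids from the question have almost the same vowel share, so on its own the vowel ratio would not separate them. Here it is the rare-consonant count that rejects the bad grid.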
Alternately, generate random grids and do the brute force work as required to pick good grids. But do this in the background (days or weeks before needed). Then store a bunch of good grids and choose one randomly as required when needed (and cross it off your list so you don't see it again).

The way Boggle works is that there are six-sided dice with certain letters on the sides. Those dice are randomly assigned to the 16 squares and then rolled. Common letters occur on more faces of the dice. Search around - you may be able to get the exact set of dice.

1. Calculate statistical letter frequency and letter-pair frequencies from the dictionary.
2. Start from randomly choosing one of the four central squares
3. Randomly choose a letter for that square weighted by single letter frequency.
4. Recursively:
4.1. Randomly choose one of all the empty connected squares.
4.2. Randomly choose a letter for that square weighted by the combination, (average), of the dual letter frequencies of any connected filled square and the single letter frequencies of any connected empty square.
Et voila!
P.S. You might also want to experiment with adding a global letter derating based on its current count of appearances in the grid to 4.2.
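Here is a sketch of the frequency tables and a simplified version of the weighted pick. It averages pair frequencies over the already filled neighbours and mixes in a small share of the single-letter frequency as smoothing. That mix and the `smoothing` value are my assumptions, and the function names are hypothetical. It does not model the part of 4.2 about connected empty squares.

```python
import random
from collections import Counter

def letter_stats(words):
    """Count single letters and adjacent letter pairs across a word list."""
    singles, pairs = Counter(), Counter()
    for word in words:
        word = word.upper()
        singles.update(word)
        pairs.update(a + b for a, b in zip(word, word[1:]))
    return singles, pairs

def pick_letter(singles, pairs, filled_neighbours, rng=random, smoothing=0.01):
    """Pick a letter weighted by how often it pairs with the filled neighbours."""
    letters = sorted(singles)
    weights = []
    for letter in letters:
        if filled_neighbours:
            pair_score = sum(
                pairs[n + letter] + pairs[letter + n] for n in filled_neighbours
            ) / len(filled_neighbours)
            weights.append(pair_score + smoothing * singles[letter])
        else:
            weights.append(singles[letter])   # first square: singles only
    return rng.choices(letters, weights=weights, k=1)[0]

singles, pairs = letter_stats(["sing", "ring", "sting"])
print(pick_letter(singles, pairs, ["N"], rng=random.Random(1)))
```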

Minimal Difference Patch Algorithm

I'm trying to convey the difference between two bytestreams. I want to minimize the number of bytes in the patch.
(I don't necessarily want to minimize the number of "changes" in the diff, which is what the optimal patch in a levenshtein distance computation would give me.)
The patch would ideally be in a format such that, given the source bytestream and the diff, it would be easy to reconstruct the target bytestream.
Is there a good algorithm for doing this?
Edit: For the record, I've tried sending changes of the form "at spot 506, insert the following bytes...", where I create a change list from the levenshtein distance algorithm.
The problem I have is that the levenshtein distance algorithm gives me a lot of changes like:
at spot 506 substitute [some bytes1]
at spot 507 do nothing
at spot 508 substitute [some bytes2]
at spot 509 do nothing
at spot 510 substitute [some bytes3]
...
This is because the lev distance algorithm tries to minimize the number of changes. However, for my purposes this instruction set is wasteful. It would probably be better if an algorithm just said,
At spot 506 substitute [some bytes1, [byte at spot 507], some bytes2, [byte at spot 509], some bytes3, ...]
There's probably some way to modify lev distance to favor these types of changes but it seems a little tricky. I could coalesce substitutions after getting a changelist (and I'm going to try that) but there may be opportunities to coalesce deletions / inserts too, and it's less obvious how to do that correctly.
Just wondering if there's a special purpose algorithm for this (or if somebody's done a modification of lev distance to favor these types of changes already).

You can do this using pairwise alignment with affine gap costs, which takes O(nm) time for two strings of lengths n and m respectively.
One thing first: There is no way to find a provably minimal patch in terms of bits or bytes used. That's because if there was such a way, then the function shortest_patch(x, y) that calculates it could be used to find a provably minimal compression of any given string s by calling it with shortest_patch('', s), and Kolmogorov complexity tells us that the shortest possible compression of a given string is formally uncomputable. But if edits tend to be clustered in space, as it seems they are here, then it's certainly possible to find smaller patches than those produced using the usual Levenshtein distance algorithm.
Edit scripts
Patches are usually called "edit scripts" in CS. Finding a minimal (in terms of number of insertions plus number of deletions) edit script for turning one string x into another string y is equivalent to finding an optimal pairwise alignment in which every pair of equal characters has value 0, every pair of unequal characters has value -inf, and every position in which a character from one string is aligned with a - gap character has value -1. Alignments are easy to visualise:
st--ing st-i-ng
stro-ng str-ong
These are 2 optimal alignments of the strings sting and strong, each having cost -3 under the model. If pairs of unequal characters are given the value -1 instead of -inf, then we get an alignment with cost equal to the Levenshtein distance (the number of insertions, plus the number of deletions, plus the number of substitutions):
st-ing sti-ng
strong strong
These are 2 optimal alignments under the new model, and each has cost -2.
To see how these correspond with edit scripts, we can regard the top string as the "original" string, and the bottom string as the "target" string. Columns containing pairs of unequal characters correspond to substitutions, the columns containing a - in the top row correspond to insertions of characters, and the columns containing a - in the bottom row correspond to deletions of characters. You can create an edit script from an alignment by using the "instructions" (C)opy, (D)elete, (I)nsert and (S)ubstitute. Each instruction is followed by a number indicating the number of columns to consume from the alignment, and in the case of I and S, a corresponding number of characters to insert or replace with. For example, the edit scripts for the previous 2 alignments are
C2, I1"r", S1"o", C2 and C2, S1"r", I1"o", C2
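As an illustration, turning an alignment into this C/D/I/S notation is a single pass over the columns. The sketch below assumes both rows have equal length, use "-" as the gap character, and never have a gap in both rows of the same column. `edit_script` is just a name picked for the example.

```python
from itertools import groupby

def edit_script(top, bottom):
    """Build a C/D/I/S edit script from two aligned rows (top = original)."""
    def op(a, b):
        if a == "-":
            return "I"            # gap in the original: insert b
        if b == "-":
            return "D"            # gap in the target: delete a
        return "C" if a == b else "S"

    columns = [(op(a, b), b) for a, b in zip(top, bottom)]
    parts = []
    for kind, run in groupby(columns, key=lambda col: col[0]):
        run = list(run)
        if kind in "IS":          # these carry the new characters
            text = "".join(ch for _, ch in run)
            parts.append(f'{kind}{len(run)}"{text}"')
        else:
            parts.append(f"{kind}{len(run)}")
    return ", ".join(parts)

print(edit_script("st-ing", "strong"))   # C2, I1"r", S1"o", C2
print(edit_script("sti-ng", "strong"))   # C2, S1"r", I1"o", C2
```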
Increasing bunching
Now if we have strings like mississippi and tip, we find that the two alignments
mississippi
------tip--
mississippi
t---i----p-
both have the same score of -9: they both require the same total number of insertions, deletions and substitutions. But we much prefer the top one, because its edit script can be described much more succinctly: D6, S1"t", C2, D2. The second's edit script would be S1"t", D3, C1, D4, C1, D1.
In order to get the alignment algorithm to also "prefer" the first alignment, we can adjust gap costs so that starting a block of gaps costs more than continuing an existing block of gaps. If we make it so that a column containing a gap costs -2 instead of -1 when the preceding column contains no gap, then what we are effectively doing is penalising the number of contiguous blocks of gaps (since each contiguous block of gaps must obviously have a first position). Under this model, the first alignment above now costs -11, because it contains two contiguous blocks of gaps. The second alignment now costs -12, because it contains three contiguous blocks of gaps. IOW, the algorithm now prefers the first alignment.
This model, in which every aligned position containing a gap costs g and the first position in any contiguous block of gap columns costs g + s, is called the affine gap cost model, and an O(nm) algorithm was given for this by Gotoh in 1982: http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user/archive/JMB82.pdf. Increasing the gap-open cost s will cause aligned segments to bunch together. You can play with the various cost parameters until you get alignments (corresponding to patches) that empirically look about right and are small enough.
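To make the scoring concrete, here is a small function that only scores a given alignment under this model. It is not Gotoh's dynamic program, which would search for the best alignment. The parameters `gap` and `gap_open` correspond to g and s above, and the mismatch cost of 1 matches the Levenshtein-style model used earlier. As in the text, a block is any run of consecutive columns that contain a gap in either row.

```python
def affine_cost(top, bottom, gap=1, gap_open=1, mismatch=1):
    """Score an alignment: each gap column costs `gap`, plus `gap_open`
    extra when the preceding column holds no gap."""
    cost = 0
    prev_gap = False
    for a, b in zip(top, bottom):
        is_gap = a == "-" or b == "-"
        if is_gap:
            cost += gap + (0 if prev_gap else gap_open)
        elif a != b:
            cost += mismatch
        prev_gap = is_gap
    return -cost

print(affine_cost("mississippi", "------tip--"))   # -11
print(affine_cost("mississippi", "t---i----p-"))   # -12
```

Setting `gap_open=0` gives back the plain model, under which both alignments score -9.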

There are two approaches to solving this kind of problem:
1) Establish a language for X (edit scripts, in this case), and figure out how to minimize the length of the applicable sentence; or,
2) Compute some kind of minimum representation for Y (string differences), and then think up a way to represent that in the shortest form.
The Myers paper demonstrates that for a particular language, finding the minimum set of changes and finding the minimum length of the change representation are the same problem.
Obviously, changing the language might invalidate that assumption, and certain changes might be extremely complicated to apply correctly. For example, suppose the language included the primitive kP, which means to remove the next k characters whose indices are prime. For certain diffs, using that primitive might turn out to be a huge win, but the applications are probably pretty rare. It's an absurd example, I know, but it demonstrates the difficulty of starting with a language.
So I propose starting with the minimum change list, which identifies inserts and deletes. We translate that in a straightforward way to a string of commands, of which there are exactly three. There are no indices here. The idea is that we start with a cursor at the beginning of the original string, and then execute the commands in sequence. The commands are:
= Advance the cursor without altering the character it points to
Ic Insert the character `c` before the cursor.
D Delete the character at the cursor.
Although I said there were exactly three commands, that's not quite true; there are actually A+2 where A is the size of the alphabet.
This might result in a string like this:
=========================IbIaInIaInIaDD=D=D============================
Now, let's try to compress this. First, we run-length encode (RLE), so that every command is preceded by a repeat count, and we drop the trailing =s
27=1Ib1Ia1In1Ia1In1Ia2D1=1D1=1D
(In effect, the RLE recreates indices, although they're relative instead of absolute).
Finally, we use zlib to compress the resulting string. I'm not going to do that here, but just to give some idea of the sort of compression it might come up with:
27=1Ib1Ia1In||2D1=1D|
______+| ____+
___<---+
(Trying to show the back-references. It's not very good ascii art, sorry.)
Lempel-Ziv is very good at finding and optimizing unexpected repetitions. In fact, we could have just used it instead of doing the intermediate RLE step, but experience shows that in cases where RLE is highly effective, it's better to LZ the RLE than the source. But it would be worth trying both ways to see what's better for your application.
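A minimal sketch of the RLE-then-zlib pipeline, assuming the command string uses exactly the three commands above and that inserted characters are never digits (otherwise the counts become ambiguous). The names `tokenize`, `rle` and `pack` are invented for the example.

```python
import zlib
from itertools import groupby

def tokenize(script):
    """Split a command string into '=', 'D' and two-character 'Ic' tokens."""
    tokens, i = [], 0
    while i < len(script):
        step = 2 if script[i] == "I" else 1
        tokens.append(script[i:i + step])
        i += step
    return tokens

def rle(script):
    """Prefix each run of identical commands with its length; drop trailing '='."""
    tokens = tokenize(script)
    while tokens and tokens[-1] == "=":
        tokens.pop()
    return "".join(f"{len(list(run))}{tok}" for tok, run in groupby(tokens))

def pack(script):
    """RLE first, then let zlib (DEFLATE) look for repeats in the result."""
    return zlib.compress(rle(script).encode("ascii"))

print(rle("===IaIaD=="))   # 3=2Ia1D
```

For very short scripts the zlib header can outweigh the savings, so it is worth measuring `len(pack(s))` against `len(rle(s))` on real data.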

A common approach to this that uses very few bytes (though not necessarily the theoretical optimal number of bytes) is the following:
Pad the bytes with some character (perhaps zero) until they have the same lengths.
XOR the two streams together. This will result in a byte stream that is zero everywhere the bytes are the same and nonzero otherwise.
Compress the XORed stream using any compression algorithm, perhaps something like LZW.
Assuming that the patch you have is a localized set of changes to a small part of the file, this will result in a very short patch, since the bulk of the file will be zeros, which can be efficiently compressed.
To apply the patch, you just decompress the XORed string and then XOR it with the byte stream to patch. This computes
Original XOR (Original XOR New) = (Original XOR Original) XOR New = New
Since XOR is associative and self-inverting.
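A sketch of the XOR approach using Python's zlib (DEFLATE rather than LZW, swapped in simply because it is in the standard library). Storing the target length in an 8-byte header is my addition, so that the zero padding can be stripped again. The patch format is made up for this example.

```python
import zlib

def make_patch(old: bytes, new: bytes) -> bytes:
    """XOR the zero-padded streams and compress the result."""
    n = max(len(old), len(new))
    xored = bytes(a ^ b for a, b in zip(old.ljust(n, b"\0"), new.ljust(n, b"\0")))
    return len(new).to_bytes(8, "big") + zlib.compress(xored)

def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Recover the new stream: old XOR (old XOR new) = new."""
    new_len = int.from_bytes(patch[:8], "big")
    xored = zlib.decompress(patch[8:])
    padded = old.ljust(len(xored), b"\0")
    return bytes(a ^ b for a, b in zip(padded, xored))[:new_len]

old = b"hello world, this is the original stream"
new = b"hello World, this is the original stream!!"
assert apply_patch(old, make_patch(old, new)) == new
```

Keep in mind that a single inserted or deleted byte shifts everything after it, so the XOR stream stops being mostly zeros. This works best for in-place changes.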
Hope this helps!

There is a new promising approach to change detection.
The sequence alignment problem is considered as an abstract model of change detection in collaborative text editing, designed to minimize the probability of merge conflicts. A new cost function is defined as the probability of intersection between the detected changes and a random string.
The result should be closer to patch length minimization than other known approaches.
It avoids the known shortcomings of LCS and other approaches.
A cubic algorithm has been proposed.
http://psta.psiras.ru/read/psta2015_1_3-10.pdf

The perverse hangman problem

Perverse Hangman is a game played much like regular Hangman with one important difference: The winning word is determined dynamically by the house depending on what letters have been guessed.
For example, say you have the board _ A I L and 12 remaining guesses. Because there are 13 different words ending in AIL (bail, fail, hail, jail, kail, mail, nail, pail, rail, sail, tail, vail, wail) the house is guaranteed to win because no matter what 12 letters you guess, the house will claim the chosen word was the one you didn't guess. However, if the board was _ I L M, you have cornered the house as FILM is the only word that ends in ILM.
The challenge is: Given a dictionary, a word length & the number of allowed guesses, come up with an algorithm that either:
a) proves that the player always wins by outputting a decision tree for the player that corners the house no matter what
b) proves the house always wins by outputting a decision tree for the house that allows the house to escape no matter what.
As a toy example, consider the dictionary:
bat
bar
car
If you are allowed 3 wrong guesses, the player wins with the following tree:
Guess B
  NO -> Guess C, Guess A, Guess R, WIN
  YES-> Guess T
    NO -> Guess A, Guess R, WIN
    YES-> Guess A, WIN

This is almost identical to the "how do I find the odd coin by repeated weighings?" problem. The fundamental insight is that you are trying to maximise the amount of information you gain from your guess.
The greedy algorithm to build the decision tree is as follows:
- for each guess, choose the one for which the split between the answers "true" and "false" is as close to 50-50 as possible, since information-theoretically this gives the most information.
Let N be the size of the set, A be the size of the alphabet, and L be the number of letters in the word.
So put all your words in a set. For each letter position, and for each letter in your alphabet count how many words have that letter in that position (this can be optimised with an additional hash table). Choose the count which is closest in size to half the set. This is O(L*A).
Divide the set in two: the subset which has this letter in this position, and the subset which does not. These become the two branches of the tree. Repeat for each subset until you have the whole tree. In the worst case this will require O(N) steps, but if you have a nice dictionary this will lead to O(log N) steps.
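One step of this greedy split might look like the sketch below. It follows the position-based counting described here, where a "guess" is a (position, letter) pair, and it breaks ties by taking the smallest pair. The tie-break and the function names are my choices, not part of the answer.

```python
from collections import Counter

def best_split(words):
    """Return the (position, letter) whose count is closest to half the set."""
    counts = Counter((pos, ch) for w in words for pos, ch in enumerate(w))
    half = len(words) / 2
    return min(counts, key=lambda key: (abs(counts[key] - half), key))

def split(words, pos, ch):
    """Partition words into those with `ch` at `pos` and those without."""
    yes = [w for w in words if w[pos] == ch]
    no = [w for w in words if w[pos] != ch]
    return yes, no

words = ["bat", "bar", "car"]
pos, ch = best_split(words)
print(pos, ch, split(words, pos, ch))
```

Calling `best_split` recursively on each half until a subset has one word yields the tree. Whether that tree stays within the allowed number of wrong guesses still has to be checked separately.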

This isn't strictly an answer, since it doesn't give you a decision tree, but I did something very similar when writing my hangman solver. Basically, it looks at the set of words in its dictionary that match the pattern and picks the most common letter. If it guesses wrong, it eliminates the largest number of candidates. Since there's no penalty to guessing right in hangman, I think this is the optimal strategy given the constraints.
So with the dictionary you gave, it would first guess a correctly. Then it would guess r, also correctly, then b (incorrect), then c.
The problem with perverse hangman is that you always guess wrong if you can guess wrong, but that's perfect for this algorithm since it eliminates the largest set first. As a slightly more meaningful example:
Dictionary:
mar
bar
car
fir
wit
In this case it guesses r incorrectly first and is left with just wit. If wit were replaced in the dictionary with sir, then it would guess r correctly then a incorrectly, eliminating the larger set, then w or f at random incorrectly, followed by the other for the final word with only 1 incorrect guess.
So this algorithm will win if it's possible to win, though you have to actually run through it to see if it does win.

Ordering a dictionary to maximize common letters between adjacent words

This is intended to be a more concrete, easily expressable form of my earlier question.
Take a list of words from a dictionary with common letter length.
How to reorder this list to keep as many letters as possible common between adjacent words?
Example 1:
AGNI, CIVA, DEVA, DEWA, KAMA, RAMA, SIVA, VAYU
reorders to:
AGNI, CIVA, SIVA, DEVA, DEWA, KAMA, RAMA, VAYU
Example 2:
DEVI, KALI, SHRI, VACH
reorders to:
DEVI, SHRI, KALI, VACH
The simplest algorithm seems to be: Pick anything, then search for the shortest distance?
However, DEVI->KALI (1 common) is equivalent to DEVI->SHRI (1 common)
Choosing the first match would result in fewer common pairs in the entire list (4 versus 5).
This seems that it should be simpler than full TSP?

What you're trying to do is calculate the shortest Hamiltonian path in a complete weighted graph, where each word is a vertex, and the weight of each edge is the number of letters that are different between those two words.
For your example, the graph would have edges weighted as so:
       DEVI  KALI  SHRI  VACH
DEVI    X     3     3     4
KALI    3     X     3     3
SHRI    3     3     X     4
VACH    4     3     4     X
Then it's just a simple matter of picking your favorite TSP solving algorithm, and you're good to go.
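For a list this small you can skip the TSP solver and just try every ordering. The sketch below does exactly that with a position-by-position letter difference as the edge weight. It is factorial in the number of words, so it only illustrates the formulation and is not something to run on a full dictionary.

```python
from itertools import permutations

def distance(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def path_cost(path):
    """Total distance between adjacent words in an ordering."""
    return sum(distance(a, b) for a, b in zip(path, path[1:]))

def best_order(words):
    """Brute-force shortest Hamiltonian path (first minimum found)."""
    return min(permutations(words), key=path_cost)

words = ["DEVI", "KALI", "SHRI", "VACH"]
order = best_order(words)
print(order, path_cost(order))   # ('DEVI', 'SHRI', 'KALI', 'VACH') 9
```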

My pseudo code:
Create a graph of nodes where each node represents a word
Create connections between all the nodes (every node connects to every other node). Each connection has a "value" which is the number of common characters.
Drop connections where the "value" is 0.
Walk the graph by preferring connections with the highest values. If you have two connections with the same value, try both recursively.
Store the output of a walk in a list along with the sum of the distance between the words in this particular result. I'm not 100% sure ATM if you can simply sum the connections you used. See for yourself.
From all outputs, choose the one with the highest value.
This problem is probably NP complete which means that the runtime of the algorithm will become unbearable as the dictionaries grow. Right now, I see only one way to optimize it: Cut the graph into several smaller graphs, run the code on each and then join the lists. The result won't be as perfect as when you try every permutation but the runtime will be much better and the final result might be "good enough".
[EDIT] Since this algorithm doesn't try every possible combination, it's quite possible to miss the perfect result. It's even possible to get caught in a local maximum. Say, you have a pair with a value of 7 but if you chose this pair, all other values drop to 1; if you didn't take this pair, most other values would be 2, giving a much better overall final result.
This algorithm trades perfection for speed. When trying every possible combination would take years, even with the fastest computer in the world, you must find some way to bound the runtime.
If the dictionaries are small, you can simply create every permutation and then select the best result. If they grow beyond a certain bound, you're doomed.
Another solution is to mix the two. Use the greedy algorithm to find "islands" which are probably pretty good and then use the "complete search" to sort the small islands.

This can be done with a recursive approach. Pseudo-code:
Start with one of the words, call it w
FindNext(w, l) // l = list of words without w
    Get a list l of the words near to w
    If only one word in list
        Return that word
    Else
        For every word w' in l do FindNext(w', l') // l' = l without w'
You can add some score to count common pairs and to prefer "better" lists.

You may want to take a look at BK-Trees, which make finding words with a given distance to each other efficient. Not a total solution, but possibly a component of one.

This problem has a name: n-ary Gray code. Since you're using English letters, n = 26. The Wikipedia article on Gray code describes the problem and includes some sample code.
