How to handle multiple letters in a Wordle game - logic

I'm making a Wordle-type game (you guess a word; if your guess has the right letter in the right spot, the guessed letter should go green, and if it is the right letter in the wrong spot it should go yellow). I can't quite figure out the logic for colouring the squares when there are repeated letters in the guess.
Right now I have something like:
For X = 1 to 2
    If GuessLetter(X) = WordLetter(X) then set GuessLetter(X) to green
    Else if GuessLetter(X) is in WholeWord then set GuessLetter(X) to yellow
Next X
However, in the case of the word being AS and the guess being AA, this logic will set the first A green (right letter, right spot) and the second A yellow (right letter, wrong spot). The correct result should be green then no colour, as there is only one A in the word.
What is the most efficient logic to stop a duplicate guess letter being coloured incorrectly yellow?
I am thinking of something like counting the occurrences of each letter in the word and in the guess, and skipping any letter in the guess once it exceeds that letter's count in the word. But this does not feel very elegant. Is there a better way?
I'm doing this in JavaScript, but I'm interested in the general logic most of all.
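For what it's worth, the counting idea is usually written as a two-pass check: mark all the greens first and consume those letters from a per-letter count, then award yellows only while an unclaimed copy of the letter remains. A rough sketch in Python (the function and variable names are just for illustration; the same two-pass logic ports directly to JavaScript):

from collections import Counter

def score_guess(guess, word):
    """Return a list of 'green' / 'yellow' / None, one entry per guess letter."""
    result = [None] * len(guess)
    remaining = Counter(word)          # how many of each letter are still unclaimed

    # Pass 1: greens first, and use up those letters
    for i, (g, w) in enumerate(zip(guess, word)):
        if g == w:
            result[i] = "green"
            remaining[g] -= 1

    # Pass 2: yellows only if an unclaimed copy of the letter is left
    for i, g in enumerate(guess):
        if result[i] is None and remaining[g] > 0:
            result[i] = "yellow"
            remaining[g] -= 1

    return result

print(score_guess("AA", "AS"))   # ['green', None]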

Related

Check if one string includes a substring with Levenshtein distance of 1 from another string

My problem is that we want our users to enter the code like this:
639195-EM-66-XA-53-WX somewhere in the input, so the result may look like this: The code is 639195-EM-66-XA-53-WX, let me in. We still want to match the string if they make a small error in the code (Levenshtein distance of 1). For example: The code is 739195-EM-66-XA-53-WX, let me in. (the first character of the code changed from 6 to 7)
The algorithm should match even if user skips dashes, and it should ignore lowercase/uppercase letters. These requirements are easy to fulfil, because I can remove all dashes and do to_uppercase.
Is there an algorithm for something like that?
Generating all strings with a distance of 1 from the original code is computationally expensive.
I was also thinking about using something like the Levenshtein distance but ignoring extra letters the user added in the second string; however, that would allow wrong letters in the middle of the code.
Searching for the code in user input seems a little bit better, but still not very clean.
I had an idea for a solution, maybe this is good enough for you:
As you said, first remove the dashes and make everything upper (or lower) case:
Sentence: THE CODE IS 639195EM66XA53WX, LET ME IN
Code: 639195EM66XA53WX
Split the code in the middle into c1 and c2. A Levenshtein distance of 1 means there can be at most one mistake (insertion, deletion or replacement of a single character), so at least one of c1 or c2 has to match exactly if the code is present in the sentence with at most one mistake. Split in the middle because the longer both halves of the code are, the fewer spurious matches you should get:
c1: 639195EM
c2: 66XA53WX
Now try to find c1 and c2 in your sentence. If you find a match, you have to go forward (if c1 matched) or backwards (if c2 matched) in the sentence to check whether the Levenshtein distance of the missing part is 1 or less.
So in your example you would find c2 and then:
Set pointers to the last character of c1 and the character before the match.
While the characters are the same reduce both pointers by 1 (go backwards in both strings).
If you can consume c1 completely this way you found an exact match (Levenshtein distance of 0).
Otherwise try the 3 possibilities for Levenshtein distance of 1:
Only move the pointer of the c1 backwards and see if the rest matches (deletion).
Only move the pointer of the sentence backwards and see if the rest matches (insertion).
Move both pointers backwards and see if the rest matches (replacement).
If one of them succeeds you found a match with Levenshtein distance of 1, otherwise the distance is higher.
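A sketch of this split-and-verify idea in Python; the function names are my own, and instead of the explicit pointer walk described above it simply checks the three window lengths (|half| - 1, |half|, |half| + 1) of the sentence next to the exact match against the missing half with an edit-distance test:

def levenshtein(a, b):
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # replacement
        prev = cur
    return prev[-1]

def contains_code(sentence, code, max_dist=1):
    """True if `code` occurs somewhere in `sentence` with edit distance <= max_dist."""
    s = sentence.replace("-", "").upper()
    c = code.replace("-", "").upper()
    c1, c2 = c[:len(c) // 2], c[len(c) // 2:]

    # If c1 is found exactly, the single allowed error must sit in the part after it.
    i = s.find(c1)
    while i != -1:
        tail = s[i + len(c1):]
        for n in (len(c2) - 1, len(c2), len(c2) + 1):
            if 0 <= n <= len(tail) and levenshtein(tail[:n], c2) <= max_dist:
                return True
        i = s.find(c1, i + 1)

    # If c2 is found exactly, the error must sit in the part before it.
    i = s.find(c2)
    while i != -1:
        head = s[:i]
        for n in (len(c1) - 1, len(c1), len(c1) + 1):
            if 0 <= n <= len(head) and levenshtein(head[len(head) - n:], c1) <= max_dist:
                return True
        i = s.find(c2, i + 1)
    return False

print(contains_code("The code is 739195-EM-66-XA-53-WX, let me in",
                    "639195-EM-66-XA-53-WX"))   # True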

How to generate letter grids with lots of words

The question is basically "how do I generate a good grid for the game 'Boggle' with lots of words" where good is defined as having lots of words of 5 or more letters.
Boggle is a game where you roll dice with letters on them and place them in a 4x4 grid. Example:
H S A V
E N I S
K R G I
S O L A
Words can be made by connecting letters horizontally, vertically or diagonally. In the good example grid above you can make the words "VANISHERS", "VANISHER", "KNAVISH", "ALIGNERS", "SAVINGS", "SINKERS" and around 271 other words depending on the dictionary used, for example "AS", "I", "AIR", "SIN", "IS", etc...
As a bad example this grid
O V W C
T K Z O
Y N J H
D E I E
only has ~44 words, only 2 of which are more than 4 letters long: "TYNED" and "HINKY".
There are lots of similar questions, but AFAICT not this exact question. This is obviously a reference to the game "Scramble with Friends".
The first solution, picking letters at random, has the problem that if you accidentally pick all consonants there will be no words. Adding a few random vowels is not enough to guarantee a good set of words. You might only get 1- to 4-letter words, whereas a good algorithm will choose a set of letters that yields > 200 words with many words > 7 letters.
I'm open to any algorithm. Obviously I could write code to brute force solutions finding every possible grid and then sorting them by grids with the most words but that simple solution would take forever to run.
I can imagine various heuristics like choosing a long word (8-16 letters), putting those letters in the grid at random but in a way that can actually still make the word and then filling in the left over spaces. I suspect that's also not enough to guarantee a good set of words though I haven't tried it yet.
It's possible the solution requires pre-processing a dictionary to know common parts of words. For example all words that end in "ing" or "ers" or "ght" or "tion" or "land". Or somehow organizing them into a graph of shared letters. Maybe weighting certain sets of letters so "ing" or "ers" are inserted often.
Ideas?
Short of the brute-force search proposal, there is probably no way to guarantee that you have a good grid. If you use the letter frequency as found on the Boggle dice, you will get 'average' grids (exactly as if you rolled the dice). You could improve this by adding extra heuristics or filters, for example:
ensure that (almost) every consonant is 'in-reach-of' a vowel
ensure 'Q' is 'in-reach-of' a 'U'
ensure the ratio of vowels to consonants is within a set range
ensure the number of rare consonants is not too large
etc
Then you could
set letters using weighted letter frequency
change (swap/replace) letters not meeting your heuristics
It would still be possible for a bad grid to get through unless you checked via brute force, but you may be able to reduce the number of bad grids substantially compared with a simple randomly generated grid.
Alternatively, generate random grids and do the brute-force work required to pick good grids, but do this in the background (days or weeks before they are needed). Then store a bunch of good grids and choose one at random when needed (and cross it off your list so you don't see it again).
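For illustration, a rough Python sketch of the "weighted random letters plus heuristic filter" idea. The letter weights, the 4-7 vowel range, and the simple reject-and-retry loop (standing in for the suggested swap/replace repair) are all assumptions, not the real Boggle dice distribution:

import random
from collections import Counter

# Rough letter weights; using the real Boggle dice distribution would be better.
WEIGHTS = Counter("EEEEEEEEEEEETTTTTTTTTAAAAAAAAOOOOOOOIIIIIIINNNNNNSSSSSSHHHHHH"
                  "RRRRRRDDDDLLLLCCCUUUMMMWWFFGGYYPPBBVKJXQZ")
LETTERS, COUNTS = zip(*WEIGHTS.items())
VOWELS = set("AEIOU")

def neighbours(r, c, size=4):
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < size and 0 <= c + dc < size:
                yield r + dr, c + dc

def acceptable(grid, size=4):
    flat = [ch for row in grid for ch in row]
    vowels = sum(ch in VOWELS for ch in flat)
    if not 4 <= vowels <= 7:                      # vowel/consonant ratio in range
        return False
    for r in range(size):
        for c in range(size):
            if grid[r][c] == "Q" and not any(grid[nr][nc] == "U"
                                             for nr, nc in neighbours(r, c, size)):
                return False                      # Q must be in reach of a U
    return True

def make_grid(size=4):
    while True:
        flat = random.choices(LETTERS, weights=COUNTS, k=size * size)
        grid = [flat[i * size:(i + 1) * size] for i in range(size)]
        if acceptable(grid):
            return grid

for row in make_grid():
    print(" ".join(row))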
The way Boggle works is that there are six-sided dice with certain letters on their faces. Those dice are randomly assigned to the 16 squares and then rolled. Common letters occur on more faces of the dice. Search around - you may be able to find the exact set of dice.
1. Calculate statistical letter frequency and letter-pair frequencies from the dictionary.
2. Start by randomly choosing one of the four central squares.
3. Randomly choose a letter for that square, weighted by single-letter frequency.
4. Recursively:
4.1. Randomly choose one of all the empty connected squares.
4.2. Randomly choose a letter for that square, weighted by the combination (average) of the dual-letter frequencies with any connected filled squares and the single-letter frequencies for any connected empty squares.
Et voila!
P.S. You might also want to experiment with adding to step 4.2 a global derating of each letter based on its current count of appearances in the grid.
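A rough Python sketch of steps 1-4 above; the tiny word list, the +1 smoothing, and the frontier bookkeeping are my own simplifications (in particular it only weights by pair frequencies with filled neighbours and skips the empty-square term):

import random
from collections import Counter
from itertools import product

def build_frequencies(words):
    """Single-letter and adjacent-pair frequencies from a word list."""
    single, pair = Counter(), Counter()
    for w in words:
        w = w.upper()
        single.update(w)
        pair.update(w[i:i + 2] for i in range(len(w) - 1))
    return single, pair

def neighbours(r, c, size=4):
    return [(r + dr, c + dc) for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr or dc) and 0 <= r + dr < size and 0 <= c + dc < size]

def fill_grid(words, size=4):
    single, pair = build_frequencies(words)
    alphabet = sorted(single)                     # only letters seen in the word list
    grid = [[None] * size for _ in range(size)]

    # Steps 2-3: a central square gets a single-frequency-weighted letter.
    r, c = random.choice([(1, 1), (1, 2), (2, 1), (2, 2)])
    grid[r][c] = random.choices(alphabet, [single[a] for a in alphabet])[0]
    frontier = set(neighbours(r, c, size))

    # Step 4: pick an empty connected square and weight its letter by the
    # pair frequencies with the already-filled neighbours.
    while frontier:
        r, c = random.choice(sorted(frontier))
        filled = [grid[nr][nc] for nr, nc in neighbours(r, c, size) if grid[nr][nc]]
        weights = []
        for a in alphabet:
            w = sum(pair[f + a] + pair[a + f] for f in filled) / len(filled)
            weights.append(w + 1)                 # +1 so no letter has zero weight
        grid[r][c] = random.choices(alphabet, weights)[0]
        frontier.discard((r, c))
        frontier.update((nr, nc) for nr, nc in neighbours(r, c, size)
                        if grid[nr][nc] is None)
    return grid

words = ["saving", "vanish", "sling", "grain", "align", "snore", "raise", "shine"]
for row in fill_grid(words):
    print(" ".join(row))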

How to choose the next center in the longest palindrome algorithm?

This is a question about the longest palindrome algorithm discussed here some time ago. The quoted blog post, which explains the algorithm, says: "to pick the next center, take the center of the longest palindromic proper suffix of the current palindrome". Unfortunately, they don't provide a proof and I don't really understand why the next center is the center of the longest palindromic proper suffix of the current palindrome.
Can anybody prove/explain it?
So we're moving to the right...
Say your "current" palindrome is 40 letters big. (Maybe centered on say position 100.) You're trying to find a bigger one.
(OK, there might be a bigger one that is 900 letters long, and that is 50,000 letters to the right -- totally uninvolved with this one. That's fine. But we'll get to that in the future. For now, we have to move the center to the right while looking for longer-than-40 palindromes. Makes sense?)
So we have to move to the right - we could move one step. BUT we want to move as far as possible without missing any.
Now, if the next one to the right is going to overlap this one, it has to include the right-most letter of this group of 40. (Its center can't be further to the left, as we've already checked those; so it must be centered after 100, and, because it's going to be longer than 40, it must include our right-hand letter, #120.)
So how far back do we have to go?
Well, you can't go back (from 120) further than a palindrome! If the middle isn't a palindrome, the whole thing will never be a palindrome.
3333333333333331110111
You can only go "back" to the 0. The 1 sitting to the left of the 0 (for example) could never be the center of a palindrome that reaches the right end.
So it's that simple. You have to include our rightmost letter (if we're going to include any of us at all), and, you want it to be as big as possible, and it has to be a palindrome because palindromes can only start (I mean "from the middle") with palindromes.
In the example above it's not possible that the 1 to the left of the 0, or let's say the right-most 3, could ever in this universe center a palindrome reaching the right end, no matter what we later find on the right. They don't have palindromes around them, so they could "never be" a palindrome center!
Note that the 3 in the middle of the 3s could possibly center a bigger palindrome... but don't forget we've already checked this is the longest palindrome so far (based on centers, from the left), so that cannot be true.
So any palindrome that is longer than this one -- rather, the next possible center for a palindrome longer than this one -- is that 0.
In other words, it's simply the center of the biggest palindrome we currently have at the right. (so, not the "111" which is a palindrome but short, but the "1110111" which is the longest palindrome you can see stuck on the right.)
Indeed, the two possibilities we have to check are (A) the "0" and (B) the "1" at the second-last spot. Of course, among those two possibilities, we have to go from left to right, so (A) the "0" is indeed the next one to check.
Don't forget those two (the 0 and the 1 in question) are equivalent to saying "there's a palindrome 1110111 stuck to the end, and there's a shorter palindrome 111 stuck to the end".
Of course 1110111 is longer, so the center of 1110111 is obviously to the left of the center of 111.
The longest palindrome stuck to the right, will of course have the center closest the left.
So hopefully that makes clear just the specific part of the discussion on the linked blog that you asked about! I deliberately repeated myself in a number of ways; hopefully it helps. It's Jungian algorithms day :)
Again please note I am specifically and only trying to clarify the very specific issue Michael asked about.
Bloody confusing eh?
BTW, I simply ignored the issue of on-character versus between-character centers (odd- versus even-length palindromes) - but it is irrelevant to understanding what you asked about.
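If it helps, here is a tiny brute-force illustration (in Python) of the rule on the digit string above: the next center is the center of the longest palindromic suffix. This is purely to show the rule, not how the linear-time algorithm actually finds it:

def longest_palindromic_suffix(s):
    """Brute force: the longest suffix of s that reads the same backwards."""
    for start in range(len(s)):
        suffix = s[start:]
        if suffix == suffix[::-1]:
            return start, suffix

s = "3333333333333331110111"
start, suffix = longest_palindromic_suffix(s)
center = start + len(suffix) // 2
print(suffix, "->", s[center], "at index", center)   # 1110111 -> 0 at index 18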

The perverse hangman problem

Perverse Hangman is a game played much like regular Hangman with one important difference: The winning word is determined dynamically by the house depending on what letters have been guessed.
For example, say you have the board _ A I L and 12 remaining guesses. Because there are 13 different words ending in AIL (bail, fail, hail, jail, kail, mail, nail, pail, rail, sail, tail, vail, wail) the house is guaranteed to win because no matter what 12 letters you guess, the house will claim the chosen word was the one you didn't guess. However, if the board was _ I L M, you have cornered the house as FILM is the only word that ends in ILM.
The challenge is: Given a dictionary, a word length & the number of allowed guesses, come up with an algorithm that either:
a) proves that the player always wins by outputting a decision tree for the player that corners the house no matter what
b) proves the house always wins by outputting a decision tree for the house that allows the house to escape no matter what.
As a toy example, consider the dictionary:
bat
bar
car
If you are allowed 3 wrong guesses, the player wins with the following tree:
Guess B
NO -> Guess C, Guess A, Guess R, WIN
YES-> Guess T
NO -> Guess A, Guess R, WIN
YES-> Guess A, WIN
This is almost identical to the "how do I find the odd coin by repeated weighings?" problem. The fundamental insight is that you are trying to maximise the amount of information you gain from your guess.
The greedy algorithm to build the decision tree is as follows:
- for each guess, choose the guess for which the split between words where the answer is "true" and words where the answer is "false" is as close to 50-50 as possible, as information-theoretically this gives the most information.
Let N be the size of the set, A be the size of the alphabet, and L be the number of letters in the word.
So put all your words in a set. For each letter position, and for each letter in your alphabet count how many words have that letter in that position (this can be optimised with an additional hash table). Choose the count which is closest in size to half the set. This is O(L*A).
Divide the set in two, taking the subset which has this letter in this position and the subset which does not, and make those the two branches of the tree. Repeat for each subset until you have the whole tree. In the worst case this will require O(N) steps, but with a nice dictionary it will lead to O(log N) steps.
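A small Python sketch of the greedy tree construction; for brevity it splits on whether a word contains the guessed letter at all, rather than per letter position as described above, and it ignores the guess budget:

def build_tree(words, guessed=frozenset()):
    """Greedy player tree: always guess the letter that splits the
    remaining candidate set as close to 50/50 as possible."""
    words = list(words)
    if len(words) <= 1:
        return {"word": words[0] if words else None}

    letters = {ch for w in words for ch in w} - guessed
    if not letters:                                 # nothing left to distinguish them
        return {"words": words}

    best = min(letters,
               key=lambda ch: abs(sum(ch in w for w in words) - len(words) / 2))
    yes = [w for w in words if best in w]
    no = [w for w in words if best not in w]
    return {"guess": best,
            "yes": build_tree(yes, guessed | {best}),
            "no": build_tree(no, guessed | {best})}

import pprint
pprint.pprint(build_tree(["bat", "bar", "car"]))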
This isn't strictly an answer, since it doesn't give you a decision tree, but I did something very similar when writing my hangman solver. Basically, it looks at the set of words in its dictionary that match the pattern and picks the most common letter. If it guesses wrong, it eliminates the largest number of candidates. Since there's no penalty to guessing right in hangman, I think this is the optimal strategy given the constraints.
So with the dictionary you gave, it would first guess a correctly. Then it would guess r, also correctly, then b (incorrect), then c.
The problem with perverse hangman is that you always guess wrong if you can guess wrong, but that's perfect for this algorithm since it eliminates the largest set first. As a slightly more meaningful example:
Dictionary:
mar
bar
car
fir
wit
In this case it guesses r incorrectly first and is left with just wit. If wit were replaced in the dictionary with sir, it would guess r correctly, then a incorrectly (eliminating the larger set), then f or s at random incorrectly, followed by the other for the final word, at the cost of only one more incorrect guess.
So this algorithm will win if it's possible to win, though you have to actually run through it to see if it does win.
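For what it's worth, the letter-picking step of that solver can be written in a couple of lines; a small Python sketch (candidate filtering and the game loop are omitted):

from collections import Counter

def next_guess(candidates, guessed):
    """Pick the letter that appears in the most remaining candidate words,
    i.e. the guess that eliminates the most candidates if it turns out wrong."""
    counts = Counter(ch for w in candidates for ch in set(w) if ch not in guessed)
    return counts.most_common(1)[0][0]

print(next_guess(["mar", "bar", "car", "fir", "wit"], set()))   # 'r'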

Is it possible to generate Pangram from given word list?

A pangram is a sentence using every letter of the alphabet at least once.
Is it possible to generate the shortest pangram from a given word list?
Let's say I have a word list like this:
cat monkey temp banana christmas
fast quick quickest jumping
white brown black blue
fox xor jump jumps oven over
now the is was
lazy laziest crazy
dig dog joker mighty
And I'd like to generate a list of possible pangrams like the following:
the quick over lazy jumps fox dog brown
brown dog fox jumps lazy over quick the
quick brown fox jumps over the lazy dog
Grammar and word ordering do not need to be considered for now (I am going to do this in a non-English language).
Any ideas, algorithms, code, or references will be greatly appreciated!
PS: This is not homework.
The simplest way to generate all possible pangrams from the word list is probably to generate all possible combinations of words from the list, then for each of them, check whether it's a pangram. To do the check, walk through the string and set a bool to true for each letter that's in the string. At the end, it's a pangram iff the bools have all been set to true.
A more efficient method would probably be to walk through each word and set up an array of bools (or a set of bits, such as in a 32-bit int) along with the length of the word. You can then look for combinations of words whose bits, OR'd together, produce a value with all 26 bits set: that combination is a pangram.
As you're putting a pangram together, you can add a bounds check, so if adding a word would make a potential pangram longer than your current shortest pangram (if any), you stop that check right there. If you start by sorting your words by length, the minute you hit a longer combination you can quit that whole set of attempts and go on to the next possibility.
If you want to get even more sophisticated about it, you can start by building the same kind of bit set as above. Then take those, and add together the bits to determine which letters occur in the fewest words. When you start to generate a potential pangram, you know it must include one of those words. E.g. in the list you gave above, "lazy", "laziest" and "crazy" seem to be the only ones that include 'z', so you know immediately that every pangram must include one of those three words. None of those includes a 'q', and the only words that do include 'q' seem to be "quick" and "quickest", so (again) every pangram must include one of those two (of course I'm going from manual inspection here, so I might have missed a word). So every possible pangram from that list includes (and might as well start with): (quick|quickest) (lazy|laziest|crazy).
You could also consider preprocessing your word list: any word that's longer than another, but doesn't contain at least one letter missing from the other can be eliminated immediately. As a hypothetical example, if you have "ab" and "abab", you know that "abab" can never result in a shorter pangram than "ab", so you might as well eliminate it from the list immediately.
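A minimal Python sketch of the bit-set idea: each word becomes a 26-bit mask, and a set of words is a pangram iff the OR of the masks has all 26 bits set.

def letter_mask(word):
    """26-bit mask with bit i set if chr(ord('a') + i) occurs in word."""
    m = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            m |= 1 << (ord(ch) - ord("a"))
    return m

ALL_LETTERS = (1 << 26) - 1

def is_pangram(words):
    combined = 0
    for w in words:
        combined |= letter_mask(w)
    return combined == ALL_LETTERS

print(is_pangram("the quick brown fox jumps over the lazy dog".split()))  # True
print(is_pangram(["quick", "brown", "fox"]))                              # False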
Sure. Here's one algorithm:
Let Lw be the list of words given.
Let Ld be the list of distinct words in Lw.
Let Lc be the list of all possible combinations using words from Ld. If Ld contains n elements, Lc will contain 2^n elements.
Let P be the shortest Pangram (desired result). Initially P will be empty.
Iterate over each item (combination) in Lc. In each iteration:
Let C be the current combination being considered.
Check if C is a Pangram.
If C is a Pangram, check if P is empty or if C is shorter than P.
If P is empty or if C is shorter than P, let P be C
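For completeness, a direct (and exponential) Python sketch of the enumeration above; it only makes sense for small word lists, and "shortest" here is taken to mean fewest total letters:

from itertools import combinations

def shortest_pangram(words):
    """Try every subset of the word list (2**n of them) and keep the shortest
    one, measured in total letters, that covers the whole alphabet."""
    best = None
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            letters = set("".join(combo).lower()) & set("abcdefghijklmnopqrstuvwxyz")
            if len(letters) == 26:
                length = sum(len(w) for w in combo)
                if best is None or length < sum(len(w) for w in best):
                    best = combo
    return best

print(shortest_pangram("the quick brown fox jumps over lazy dog".split()))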
Ideas for finding an approximate solution:
determine the letter frequency of your set
score each word
add words with the highest score until you have every letter
Word scoring could look something like:
score = 0
foreach unique letter in word:
    score += 1 / letter_frequency[letter]
score /= word.length
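A Python sketch of that heuristic: letters that occur in few words score high, and we greedily add the best-scoring words that still contribute at least one new letter. The stopping rule and the per-word frequency counting are my own choices:

from collections import Counter
from string import ascii_lowercase

def word_letters(word):
    return set(word.lower()) & set(ascii_lowercase)

def approximate_pangram(words):
    """Greedy approximation: score words by rarity of their letters, then add
    the best-scoring words until every letter of the alphabet is covered."""
    freq = Counter(ch for w in words for ch in word_letters(w))

    def score(word):
        return sum(1 / freq[ch] for ch in word_letters(word)) / len(word)

    chosen, covered = [], set()
    for word in sorted(words, key=score, reverse=True):
        if word_letters(word) - covered:          # contributes a new letter
            chosen.append(word)
            covered |= word_letters(word)
        if covered >= set(ascii_lowercase):
            break
    return chosen

word_list = ("cat monkey temp banana christmas fast quick quickest jumping "
             "white brown black blue fox xor jump jumps oven over now the is "
             "was lazy laziest crazy dig dog joker mighty").split()
print(" ".join(approximate_pangram(word_list)))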
