Word-find-style game: distributed letter generation algorithm

So, I am working on a word-find-style game, and the way I generate new letters right now just isn't cutting it. It works, but it tends to generate either letters that are rarely used (see https://en.wikipedia.org/wiki/Letter_frequency) or too many copies of one letter.
Right now I just take a random number, apply a mod, and choose a letter based on that; again, it works, but it's not ideal.
So I have two cases:
1) On start, it generates the board with 25 randomly generated letters.
2) When a word is found, I remove those letters from the board and generate new letters to replace them.
Is there a known algorithm that, based on https://en.wikipedia.org/wiki/Letter_frequency, generates letters in proportion to how often they are actually used in words?
I could just loop over the existing letters, count them, and decide what letter to generate based on that.
I'd prefer something a little less convoluted, and ideally something I could also use for other languages (though that's not necessary at this point).
Any pointers would be greatly appreciated!

You could create a pool of letters according to frequency, for example the 98 letter tiles of an English Scrabble set.
When you fill your grid, you remove the picked letters from the pool and place them in the grid. When the player selects a valid word from the grid, do the reverse: Remove the letters from the board and put them back into the pool. Then draw new letters to fill the gaps.
When you want to prefill the grid with some existing words to get the player started, you should also pick letters from the pool.
You can use a simple array for the pool. When you remove a random letter, shorten the array by putting the last element in the place where the picked element was. When you put back elements, just append them to the end of the array.
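As a concrete sketch of this answer: the pool below is seeded from the standard English Scrabble letter distribution (98 letter tiles, blanks omitted), and `draw` uses the swap-and-pop trick from the last paragraph. The class and method names are made up for illustration, not part of any existing API.

```ruby
# Standard English Scrabble letter tile counts (98 tiles, no blanks).
SCRABBLE_COUNTS = {
  "E" => 12, "A" => 9, "I" => 9, "O" => 8, "N" => 6, "R" => 6, "T" => 6,
  "L" => 4, "S" => 4, "U" => 4, "D" => 4, "G" => 3,
  "B" => 2, "C" => 2, "M" => 2, "P" => 2, "F" => 2, "H" => 2,
  "V" => 2, "W" => 2, "Y" => 2,
  "K" => 1, "J" => 1, "X" => 1, "Q" => 1, "Z" => 1
}

class LetterPool
  def initialize(counts = SCRABBLE_COUNTS)
    @pool = counts.flat_map { |letter, n| [letter] * n }
  end

  # Remove and return one random letter in O(1): swap the picked
  # element with the last one, then pop the array.
  def draw
    i = rand(@pool.length)
    @pool[i], @pool[-1] = @pool[-1], @pool[i]
    @pool.pop
  end

  # Return the letters of a found word to the pool.
  def put_back(letters)
    @pool.concat(letters)
  end

  def size
    @pool.length
  end
end
```

Filling the 25-cell board is then 25 calls to `draw`; because found words go back via `put_back`, the letter distribution stays stable over the whole game.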


How do I modify the prefix trie data structure to handle words that are in the middle?

I want to implement simple autocomplete functionality for a website. I first wanted to use the prefix trie data structure, since that's how autocomplete usually works: you enter a prefix and search the trie for the possible suffixes. However, the product owner wants to match words in the middle as well.
Let me explain what I mean. Imagine I have these product names:
tile for bathroom
tile for living room
kitchen tile
kitchen tile, black
some other tile, green
If the user searches for "tile", they will only see the first two results if I use the prefix trie, but I want all of those results to pop up. I don't know of any efficient data structure to handle this. Can you please suggest something? Can a prefix trie be modified to handle it?
I have thought about some modifications, such as inserting all suffixes, but they give wrong results. For example, after inserting the suffixes of
kitchen tile, black
some other tile, green
and keeping the prefixes in the first node for each suffix (kind of like a Cartesian product), I can get the result "some other tile, black", which doesn't exist. So this solution is bad. It would also use a lot of memory...
The trie data structure indeed works for prefix-match operations, not for in-the-middle text search.
The usual data structure to support in-the-middle text search is the suffix tree: https://en.wikipedia.org/wiki/Suffix_tree
It requires enough space to store about 20 times your list of words in memory, so yes, it costs more memory.
The suffix array is a space-efficient alternative: https://en.wikipedia.org/wiki/Suffix_array
Don't over-think this. Computers are fast. If you're talking on the order of thousands of products in memory, then a sequential search doing a contains check is going to be plenty fast enough: just a few milliseconds, if that.
If you're talking about a high-traffic site with thousands of requests per second, or a system with hundreds of thousands of different products, you'll need a better approach. But for a low-traffic site and a few thousand products, do the simple thing first. It's easy to implement and easy to prove correct. Then, if it's not fast enough, you can worry about optimizing it.
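The "simple thing" here is just a sequential scan with a case-insensitive contains check; a sketch using the product names from the question:

```ruby
# Sequential search: keep every product whose name contains the query.
def search(products, query)
  q = query.downcase
  products.select { |name| name.downcase.include?(q) }
end

products = [
  "tile for bathroom",
  "tile for living room",
  "kitchen tile",
  "kitchen tile, black",
  "some other tile, green"
]

search(products, "tile")    # matches all five names, including mid-string hits
search(products, "kitchen") # matches the two kitchen products
```

For a few thousand names this scan runs in well under a millisecond on modern hardware.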
I have an approach that will work using simple tries.
Assumption: the user sees the sentence once a whole word is complete.
Let's take the above example to understand this approach.
1. Take each sentence, say tile for bathroom.
2. Split the sentence into words: tile, for, bathroom.
3. Create a tuple of [String, String]; for the above example we get three tuples:
(i) [tile, tile for bathroom]
(ii) [for, tile for bathroom]
(iii) [bathroom, tile for bathroom]
4. Now insert the first string of each tuple into your trie and store the second string (the whole sentence) as a String object on the node for the word's last character. I.e., when inserting tile, the node for the character e stores the sentence string.
5. One case to handle here: a word like tile appears in more than one sentence, so the node for the last character e stores a list of strings, here tile for bathroom and tile for living room.
Once you have the trie ready based on the above approach, you will be able to search for a sentence by any word used in it. In short, we are turning each word of the sentence into a tag.
Let me know, if you need more clarity on the above approach.
Hope this helps!
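The approach above might be sketched like this, under the stated assumption that lookups are whole words; the class names (`TrieNode`, `SentenceTrie`) are invented for the illustration:

```ruby
class TrieNode
  attr_reader :children, :sentences
  def initialize
    @children = {}
    @sentences = []   # full sentences ending a word at this node
  end
end

class SentenceTrie
  def initialize
    @root = TrieNode.new
  end

  # Insert every word of the sentence; the full sentence is stored
  # on the node of each word's last character.
  def add_sentence(sentence)
    sentence.downcase.scan(/[a-z]+/).each do |word|
      node = @root
      word.each_char { |c| node = (node.children[c] ||= TrieNode.new) }
      node.sentences << sentence unless node.sentences.include?(sentence)
    end
  end

  # Whole-word lookup: returns the sentences stored at the word's node.
  def find(word)
    node = @root
    word.downcase.each_char do |c|
      node = node.children[c]
      return [] unless node
    end
    node.sentences
  end
end

trie = SentenceTrie.new
["tile for bathroom", "tile for living room", "kitchen tile",
 "kitchen tile, black", "some other tile, green"].each { |s| trie.add_sentence(s) }
trie.find("tile")    # all five sentences contain the word "tile"
trie.find("kitchen") # the two kitchen products
```

Note that an incomplete word like "til" returns nothing here, which is exactly the stated assumption of the approach.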

Creating very specific Alpha-numeric Codes

So I know there are websites out there that do this for you, but not to the extent that I need. I want to be able to create a 13-character alphanumeric code several times over, and if possible have it spit out 1,000 codes at a time. My problem is that I only want there to be 4 numbers max (so 0-4 of the 13 characters will be numbers and the rest capital letters). Here is an example: CHC-RCJV-6KK-ZUA. The hyphens are not a necessity. I am new to coding for the most part. I'm not sure if it's possible to do this on Windows; if so I would prefer it, but I can use Linux if needed. Thanks for any help!
You want up to 4 random digits and the rest capital letters. That gives you a five-stage process:
Pick how many digits, from the range [0..4].
Pick that many random single digits and store them in a list.
Pick up to 13 random capital letters and store them in the same list.
Shuffle the contents of your list.
Insert the hyphens and print/display/return/whatever.
Try coding that for yourself. If you have problems making it work then show us your code and we will help you.
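For reference, the five steps might be sketched like this; the 3-4-3-3 hyphen grouping is an assumption read off the example CHC-RCJV-6KK-ZUA:

```ruby
DIGITS  = ("0".."9").to_a
LETTERS = ("A".."Z").to_a

def generate_code
  num_digits = rand(0..4)                                 # 1. pick how many digits
  chars  = Array.new(num_digits) { DIGITS.sample }        # 2. that many random digits
  chars += Array.new(13 - num_digits) { LETTERS.sample }  # 3. fill to 13 with capitals
  chars.shuffle!                                          # 4. shuffle the list
  # 5. hyphenate in a 3-4-3-3 pattern and return
  [chars[0, 3], chars[3, 4], chars[7, 3], chars[10, 3]].map(&:join).join("-")
end

codes = Array.new(1000) { generate_code }  # 1,000 codes at a time
```

Note this does not guarantee uniqueness across the 1,000 codes; deduplicate with a `Set` if that matters.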

Algorithm: Solve crossword puzzle given a list of words

I looked at previous questions of similar problems, but I have a specific question related to this algorithm. The problem statement (https://www.hackerrank.com/challenges/crossword-puzzle/problem) is as follows:
A 10x10 Crossword grid is provided to you, along with a set of words (or names of places) which need to be filled into the grid.
The cells in the grid are initially either + signs or - signs.
Cells marked with a + have to be left as they are. Cells marked with a - need to be filled up with an appropriate character.
Sample input:
+-++++++++
+-++++++++
+-++++++++
+-----++++
+-+++-++++
+-+++-++++
+++++-++++
++------++
+++++-++++
+++++-++++
LONDON;DELHI;ICELAND;ANKARA
Corresponding output:
+L++++++++
+O++++++++
+N++++++++
+DELHI++++
+O+++C++++
+N+++E++++
+++++L++++
++ANKARA++
+++++N++++
+++++D++++
I made the mistake of writing out an algorithm without fully understanding the problem, where I just put the next available letter in an empty spot and solve the maze that way (here's my code):
def populate_grid(maze, locations)
  maze.each_index do |row|
    maze[row].each_index do |col|
      if maze[row][col] == "-"
        maze[row][col] = locations.first[0]
        if locations.first.length == 1
          locations.shift # remove this location altogether
        else
          locations[0] = locations[0][1...locations.first.length]
        end
        populate_grid(maze, locations)
      end
    end
  end
end
Unfortunately, there isn't a provided solution for this problem, and I'd like to know how to maintain a consistent direction per word (i.e. a word only runs horizontally or vertically). I thought about adding a third parameter as a Boolean for whether the word is going across or down, but that didn't seem feasible to me.
Anyone have ideas for how to preserve directionality?
In general, you need to treat the lexicon as word units, and handle the grid (that's the cruciverbalist's term for your maze) with routines that insert an entire word as a unit.
You'll need to "parse" the grid to identify all available locations.
Write a function to match a word to an available location, or vice versa. A location consists of a starting square (row, col), direction (boolean), and letter pattern (blanks, except where a crossing word is already filled in). match will consider word length and any squares already filled in.
Now, you can iterate over placing each word in turn, or filling each grid location. Call match until you find a place to fill in a word. If you find none, the current grid fill does not lead to a solution; backtrack one word and try again. When you find a spot, fill in the word (another function), update any crossing locations with the now-filled letter, and go to the next word or location in your iteration.
If you reach the end successfully, you have a completed puzzle.
Does that get you moving?
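The match-and-fill routines described above might be sketched like this, assuming the grid is held as an array of character arrays; `Slot`, `fits?`, and `place` are hypothetical names:

```ruby
# A slot is a start square, a direction (down or across), and a length.
Slot = Struct.new(:row, :col, :down, :length)

# A word fits a slot if the lengths agree and every already-filled
# square matches the corresponding letter of the word.
def fits?(grid, slot, word)
  return false unless word.length == slot.length
  word.each_char.with_index.all? do |ch, i|
    r = slot.down ? slot.row + i : slot.row
    c = slot.down ? slot.col : slot.col + i
    grid[r][c] == "-" || grid[r][c] == ch
  end
end

# Write the word into the grid; crossing slots see the filled letters
# automatically because they read the same cells.
def place(grid, slot, word)
  word.each_char.with_index do |ch, i|
    r = slot.down ? slot.row + i : slot.row
    c = slot.down ? slot.col : slot.col + i
    grid[r][c] = ch
  end
end
```

The backtracking loop then tries each word against each unfilled slot with `fits?`, calls `place` on a match, and undoes the placement if the recursion fails.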

Substitution cipher decryption using letter frequency analysis for text without blanks and special characters

I need to find the plain text for given cipher text. I also have statistics (in an Excel document) for the letters in the given language e.g. I have the frequencies of the letters and also of the digraphs.
I tried this approach so far: I evaluated the frequency of each letter in the cipher text I received. Then I sorted the letters in descending order by frequency and mapped each letter to the corresponding letter from the Excel document. The problem with this approach is that it gives me text that has no meaning at all, because my text is pretty small (only 1500 characters long).
I considered doing some limited permutations, but I have no idea what I could use to evaluate how good a permutation is. I think a good evaluation function would solve my problem.
Be aware that all special characters and white spaces are removed from the text. Also there are no numbers.
Thank you in advance.
For fully automated decryption you need a dictionary of commonly used words to compare against; the substitution that finds the most dictionary words is probably the right one.
A few problems come with letter probabilities: they are derived from common texts, so if your encrypted text is, say, a technical paper rather than ordinary prose, or if it includes equations or tables, that can skew the overall letter occurrences.
So do it like this:
1. Compute the probabilities of the letters.
2. Divide the letters into groups by probability:
- commonly used (high-probability) letters grouped together (group A)
- less commonly used (mid-probability) letters grouped together (group B)
- and the rest (low-probability) also grouped together (group C)
3. Substitute group A:
- First see if the group A probabilities match your language. If not, the text is in a different language or style/form, or it is not plain text at all; in that case you cannot proceed safely.
- If they match, substitute the letters from group A; they should be OK on the first run.
4. Try to substitute group B:
- You know which letters belong to group B (encrypted/decrypted), so generate all permutations of substitutions.
- For each one, try to decipher the text and search for words after decryption (ignoring letters that are not yet substituted).
- Compute the word-count percentage and remember the best permutation (or the few top ones).
5. Try to substitute group C: do it the same way as step 4.
6. Corrections: it is probable that a few letters in the final result will still be mixed up. There are ways to handle this too. You can keep a table of letters that are easily confused with each other and try permuting them against your dictionary, or find words in your text with 1-2 wrong letters per word (for bigger words, say 5 or more letters) and permute/correct the substitution of the wrong letters if enough such words are found.
[notes]
You can obtain dictionaries from translators; I have also seen some plain-text translation tables online. The groups should have a distinct probability difference from each other, and the number of groups can change with the language. I had the best results on this task with a semi-automated approach where steps 5 and 6 use user input.
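The permutation-scoring idea described above (decipher with each candidate substitution, then count dictionary words) might look like this; `decipher` and `score` are hypothetical helper names, and the scoring rule is an assumed stand-in for a real evaluation function:

```ruby
# Apply a partial cipher-to-plain mapping; unmapped letters become "?".
def decipher(ciphertext, mapping)
  ciphertext.each_char.map { |c| mapping.fetch(c, "?") }.join
end

# Rough score: total length of dictionary words found anywhere in the
# deciphered text, relative to the text length (overlapping hits can
# push it past 1.0; only the relative ordering of candidates matters).
def score(plaintext, dictionary)
  hits = dictionary.sum do |word|
    count = 0
    idx = 0
    while (i = plaintext.index(word, idx))
      count += 1
      idx = i + 1
    end
    count * word.length
  end
  hits.to_f / plaintext.length
end
```

To pick a substitution for a group, generate each candidate permutation, decipher the text with it, and keep the mapping with the highest score.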

Algorithm to estimate word's complexity

I need to estimate a complexity of words for typists.
For example, "suffer" is easier than "people", because "o" and "p" are harder to hit than "e" and "r".
Any key pressed by the little finger is harder to hit than one pressed by the index finger.
Moving a finger away from its basic position is harder than not moving it.
Using the shift key also adds difficulty.
What approach could be implemented in this case?
I would check out the Carpalx website. The site details how they rate different keyboard layouts for typists, and already has some open-source software that implements their algorithms for any given keyboard layout. (Make sure to check out the typing effort, model parameters and keyboard evaluation sections.)
A simple approach:
Score each letter based off the finger positions. Add a modifier or multiplier for shift. Maybe add a reduction for repeated letters?
Take the word, add up the scores, and you should have one approach.
Test, modify scores as needed, repeat until you have a meaningful distribution.
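A sketch of that scoring idea; the per-key effort values below are made-up placeholders (home-row keys under strong fingers score low, pinky and off-home keys score higher), not a calibrated model:

```ruby
# Placeholder per-key effort scores -- tune these against real data.
EFFORT = {
  "a" => 3, "s" => 2, "d" => 1, "f" => 1, "j" => 1, "k" => 1, "l" => 2,
  "e" => 2, "r" => 2, "u" => 2, "i" => 2, "o" => 3, "t" => 3, "n" => 2,
  "g" => 2, "h" => 2, "q" => 4, "z" => 4, "p" => 4, "x" => 4,
  "w" => 3, "y" => 4, "b" => 4, "v" => 3, "c" => 3, "m" => 2
}
EFFORT.default = 3     # fallback for anything not listed
SHIFT_PENALTY   = 2    # extra cost for a shifted (capital) letter
REPEAT_DISCOUNT = 1    # a repeated letter is easier than a fresh one

def word_effort(word)
  total = 0
  prev = nil
  word.each_char do |ch|
    total += SHIFT_PENALTY if ch =~ /[A-Z]/
    key = ch.downcase
    cost = EFFORT[key]
    cost -= REPEAT_DISCOUNT if key == prev
    total += cost
    prev = key
  end
  total
end
```

With these placeholder values, `word_effort("suffer")` comes out lower than `word_effort("people")`, matching the intuition in the question.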
You could hold a 2-D representation of your keyboard in an array, as a sort of connected graph with the key names and coordinates as nodes, plus the two coordinates your hands hover over (around F and J?). Then for each input key you calculate the distance in the graph from that key to your two "hover keys", take the minimum, add a shift (caps) penalty, and output a (possibly weighted) score.
You may want to try a very basic approach by giving a value to each key that may be typed, and simply add the value of all keys of the word you want to evaluate...
Purely based on distance between keys, calculate the distance using a table of keys, and add up distances.
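The distance-table idea from the last few answers could be sketched like this, with QWERTY key positions approximated as staggered (row, column) coordinates and F and J assumed as the hover keys:

```ruby
# Approximate QWERTY layout; each row is offset half a key to the right.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {}
ROWS.each_with_index do |row, r|
  row.each_char.with_index { |ch, c| POS[ch] = [r, c + r * 0.5] }
end
HOME = [POS["f"], POS["j"]]   # where the index fingers hover

# Euclidean distance from the nearest home key to the given key.
def key_distance(ch)
  r, c = POS[ch.downcase]
  HOME.map { |hr, hc| Math.hypot(r - hr, c - hc) }.min
end

def word_distance(word)
  word.each_char.sum { |ch| key_distance(ch) }
end
```

This ignores which finger actually strikes each key; a finer model would assign a home position per finger rather than per hand.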
As a base, you can give each key a difficulty rating and then just add the ratings.
Then you probably want to spot difficult finger patterns. For example the combination "sle" is more difficult than "sfe", because the former is a left-right-left combination. It's more difficult for the brain to coordinate between left and right hand as they are connected to each half of the brain, than it is to coordinate fingers on the same hand. It's common to press the keys in the wrong order in such combinations.
How common a word is also has an effect on the difficulty. More common words are typed more often so the brain learns the patterns. Also when a word contains a common word it's easier to type, like "hand" as it contains "and". On the other hand, words that contain only a part of a more common word becomes harder, as the brain wants to follow the more common pattern.
Typing difficulty is very subjective - it's akin to learning to play a musical instrument, so a word that is very difficult for one person to type is a piece of cake for another. Take for instance someone who's never sat at a keyboard before and ask them to type the word "Microsoft"... they'll hunt and peck and it'll probably take them a few seconds to type it. Take your average programmer that types this word a few dozen times a day and they'll rattle it off in less than a second.
On the other hand, take the first person and have them type the word "Microscope" and they'll likely take a very similar amount of time as they did to type "Microsoft", but the programmer will get to the "s" and either finish the word as "Microsoft" before deleting and replacing the characters with the correct ones, or they'll noticeably slow as they hit the "s" when their fingers don't immediately know the pattern for "Microscope" - in fact, I just had to type it 3 times because my fingers automatically finish the pattern for me without me thinking about it.
As far as word complexity goes therefore, it's not quite as straightforward as calculating the distance from the home keys, it comes down to the background environment of the typist, their usual typing speed, their literacy with the keyboard and a host of other things.
Instead of guessing, measure it.
Come up with a list of 100 words and ask a handful of people to type them in. Measure the amount of time between each keystroke. For every pair of letters, accumulate the total time taken by the user to move from the first to the second and divide by the number of times that letter pair appears to get an average, which is an actual direct estimate of the difficulty of moving between those two keys.
Of course, there will be some pairs of letters that don't appear anywhere in your words (e.g. ZQ). But those letter pairs will probably be irrelevant to your work anyway, unless you need to score random sequences of letters.
You'll also need to somehow account for mistyped letters. You could either discard these outright, or use mistyped letters to add some sort of penalty to that letter pair (reflecting the fact that mistyping one of the letters indicates that this letter pair may be difficult to type).
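The tallying step could be sketched like this, assuming keystrokes are recorded as (character, timestamp-in-milliseconds) pairs; mistype handling is left out:

```ruby
# Average the time between consecutive keystrokes per letter pair.
def pair_times(keystrokes)
  totals = Hash.new(0.0)
  counts = Hash.new(0)
  keystrokes.each_cons(2) do |(a, ta), (b, tb)|
    pair = a + b
    totals[pair] += tb - ta
    counts[pair] += 1
  end
  totals.keys.to_h { |pair| [pair, totals[pair] / counts[pair]] }
end
```

Feed it the merged recordings from all your typists and the result is a per-pair difficulty table you can sum over any word.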
