Find 5 letter words with 25 distinct characters - algorithm

Solving wordle efficiently (for humans and for computers) is all the rage right now.
One particular way of solving a wordle made me curious. The idea is to select 5 words that have distinct letters so you'll end up with 25 characters. If you use these 5 words as your first 5 guesses in the game, you'll have a close to 100% chance of getting the correct word in your last guess (it's essentially an anagram of all the clues and you'll probably have a few green ones). There is a set of words that is suggested (all of the words are valid English words):
brick
glent
jumpy
vozhd
waqfs
But this made me wonder: how many of these 5-word combinations are out there? I started whipping up a recursive algorithm, but I am close to giving up.
My initial thought was:
Start with the first word
reduce overlapping words from the word list
pick the next remaining word in the word list
Repeat with the next word
But this only really works if you have a set of five distinct words in order.
For this list:
brick
feast
glent
jumpy
vozhd
waqfs
I will end up with: [brick, feast, jumpy, vozhd] because feast comes before glent and will filter it out but in the end glent would have been the better pick.
I wasn't able to find any algorithms for this specific problem so I was wondering if there is any existing algorithm that can be applied to this?

It's possible to brute-force this. For efficiency, one can discard all words with duplicate letters, and pre-process the words to use a bitmask of which letters they have (there are 26 letters, so this fits in a 32-bit unsigned integer).
Then just do a depth-first search, maintaining a list of words (bitmasks) that don't intersect with the words found so far.
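As a rough illustration of the preprocessing described above, here is a minimal Python sketch of the bitmask encoding. The helper names (word_mask, has_five_distinct, disjoint) are made up for this example, and it assumes lowercase a-z input only.

```python
def word_mask(word):
    """Return an integer with bit i set when letter chr(ord('a') + i) occurs in word."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def has_five_distinct(word):
    """True when the word uses exactly five different letters."""
    return bin(word_mask(word)).count("1") == 5


def disjoint(a, b):
    """True when the two words share no letter (their masks do not intersect)."""
    return word_mask(a) & word_mask(b) == 0


if __name__ == "__main__":
    print(has_five_distinct("brick"), disjoint("brick", "glent"), disjoint("feast", "glent"))
```

Two words can appear in the same solution only when their masks AND to zero, which is the test the depth-first search repeats at every level.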
I've written some go code that does this. It uses a shortened list of words that just contains the solution words (the full wordlist is too long to include here), but the code runs in a few seconds even with the full list.
Because it uses bitmasks to represent words, several words with the same letters can map to the same mask in a solution. The program shows those with a | in between. There's just one such pair, cylix|xylic, in the solution:
bling treck waqfs jumpy vozhd
pling treck waqfs jumby vozhd
brick glent waqfs jumpy vozhd
kreng clipt waqfs jumby vozhd
fjord chunk vibex gymps waltz
fjord gucks vibex nymph waltz
prick glent waqfs jumby vozhd
kempt brung waqfs cylix|xylic vozhd
blunk waqfs cimex grypt vozhd
clunk waqfs bemix grypt vozhd
It can be run here: https://go.dev/play/p/wVEDjx3G1fE
package main

import (
	"fmt"
	"math/bits"
	"sort"
	"strings"
)

var allWords = []string{
	"bemix", "bling", "blunk", "brick", "brung", "chunk", "cimex", "clipt", "clunk", "cylix", "fjord", "glent", "grypt", "gucks", "gymps", "jumby", "jumpy", "kempt", "kreng", "nymph", "pling", "prick", "treck", "vibex", "vozhd", "waltz", "waqfs", "xylic",
}

func printSol(res []uint32, masks map[uint32][]string) {
	var b strings.Builder
	for i, r := range res {
		if i > 0 {
			b.WriteString(" ")
		}
		b.WriteString(strings.Join(masks[r], "|"))
	}
	fmt.Println(b.String())
}

func find5(w []uint32, mask uint32, n int, res []uint32, masks map[uint32][]string) {
	if n == 5 {
		printSol(res, masks)
		return
	}
	sub := []uint32{}
	for _, x := range w {
		if x&mask != 0 {
			continue
		}
		sub = append(sub, x)
	}
	for i, x := range sub {
		res[n] = x
		find5(sub[i+1:], mask|x, n+1, res, masks)
	}
}

func find5clique() {
	masks := map[uint32][]string{}
	for _, x := range allWords {
		m := uint32(0)
		for _, c := range x {
			m |= 1 << (c - 'a')
		}
		if bits.OnesCount32(m) == 5 {
			masks[m] = append(masks[m], x)
		}
	}
	maskSlice := []uint32{}
	for m := range masks {
		maskSlice = append(maskSlice, m)
	}
	sort.Slice(maskSlice, func(i, j int) bool {
		return maskSlice[i] < maskSlice[j]
	})
	find5(maskSlice, uint32(0), 0, make([]uint32, 5, 5), masks)
}

func main() {
	find5clique()
}

My choice of 4 words: batch, field, wrong, musky
Works very well for all forms of *ordles
Can’t find a fifth word with the remaining letters, though.

I'm currently working on the same thing. (Implemented In Python)
Here's My Code (Explanation Below):
import requests, string

wordlist = requests.get('https://gist.githubusercontent.com/dracos/dd0668f281e685bad51479e5acaadb93/raw/ca9018b32e963292473841fb55fd5a62176769b5/valid-wordle-words.txt').text.split('\n')
alphabet = list(string.ascii_lowercase)
# iterate over a copy: removing from a list while iterating over it skips items
for word in wordlist[:]:
    if [char in word for char in alphabet].count(True) != 5:
        wordlist.remove(word)
while wordlist:
    alphabet = list(string.ascii_lowercase)  # letters that are still unused
    currentwords = []
    for word in wordlist:
        # only take a word whose letters are all still unused
        if all(char in alphabet for char in word):
            for char in word:
                alphabet.remove(char)
            currentwords.append(word)
    with open("out.txt", "a") as f:
        f.write(";".join(currentwords))
        f.write("\n")
    # we are done with the starting word, so drop it and continue
    wordlist.pop(wordlist.index(currentwords[0]))
Basically, we load the wordlist and remove the words that repeat a letter:
for word in wordlist[:]:
    if [char in word for char in alphabet].count(True) != 5:
        wordlist.remove(word)
Then we loop over the entire wordlist once per word in the wordlist.
On each pass we reset/initialize the variables and loop over every word in the wordlist.
We remove each character of the current word from the alphabet (the leftover letters)
and add that word to the current words.
We then write the current words to the out.txt file.
After we finish the nested loop, we remove the word we just started from (since we're done with it) and continue.
This method will output combinations of any length (i.e. 1, 2, 3, 4, 5), is incredibly inefficient, and is still in the making.
Please comment if you have any ideas for optimizing this!

Related

Algorithm for finding amount of word anagrams?

So I know the theory behind finding anagrams, shown here. For my purposes I need to find the amount of anagrams that can be found from a word excluding duplicates.
Allowing for duplicates, this is fairly simple.
aab has the following anagrams:
aab
aab
aba
aba
baa
baa
This amount can be found by calculating the factorial from the amount of letters
factorial := 1
for i := len(word); i > 0; i-- {
	factorial = i * factorial
}
// aab -> 6
However, if you want to exclude duplicates you have reduced your potential anagrams from 6 to 3. An example of this is the word hello, which has 120 combinations, yet only 60 without duplicates.
I coded my own algorithm that made a map of letters and returned the length of the map, but this had issues as well.
hello -> 24 (actually 60)
helllo -> 24 (actually 120)
How can I accomplish this?
If the validity of the words is not considered whatsoever, then probably best to ditch the word "anagram". You're simply asking about permutations. There is a formula for permutations that accounts for duplicates:
For a word of length n, take the base number of permutations, which is n!.
Then, for each unique letter in the word, count the number of occurrences of that letter. For each of those letters, take the factorial of the number of occurrences, and divide the number of permutations by it.
For "helllo":
n = 6
h = 1, e = 1, l = 3, o = 1
Permutations = 6! / (1! x 1! x 3! x 1!)
= 720 / 6
= 120
Code:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	fmt.Print("Enter word: ")
	scanner.Scan()
	word := scanner.Text()
	anagrams := factorial(len(word))
	chars := strings.Split(word, "")
	word1 := word
	n := 0
	for i := 0; i < len(word); i++ {
		n = strings.Count(word1, chars[i])
		if n > 0 {
			anagrams = anagrams / factorial(n)
			word1 = strings.Replace(word1, chars[i], "", -1)
		}
	}
	fmt.Println(anagrams)
}

func factorial(n int) int {
	factorial := 1
	for i := n; i > 0; i-- {
		factorial = i * factorial
	}
	return factorial
}
Results:
aab -> 3
helo -> 24
hello -> 60
helllo -> 120
You can use some combinatorics. First, count the number of occurrences of each character. Then use the binomial coefficient (Newton's symbol) to place each character in its positions. For example, given the word
aabcdee
you have 7 places to put a single letter, and you have duplicates: a double a and a double e.
So you use the binomial coefficient C(n, k) = n! / (k! (n - k)!), the number of ways to choose k of n places.
You can place a in 2 of 7 places, then multiply by the number of places where you can put b: 1 of the 5 remaining places. Then c in 1 of 4, then d in 1 of 3, then e in 2 of 2.
Multiplying these coefficients gives you the number of anagrams in linear time (if you use a hash map for letter counting).
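A minimal Python sketch of this counting scheme, assuming Python 3.8+ for math.comb; the function name count_anagrams is made up for the example:

```python
from collections import Counter
from math import comb


def count_anagrams(word):
    """Count distinct letter arrangements by placing each letter group in turn."""
    remaining = len(word)  # free positions left
    total = 1
    for count in Counter(word).values():
        # choose `count` of the remaining positions for this letter
        total *= comb(remaining, count)
        remaining -= count
    return total


if __name__ == "__main__":
    for w in ("aab", "helo", "hello", "helllo", "aabcdee"):
        print(w, "->", count_anagrams(w))
```

For aabcdee this multiplies C(7,2) * C(5,1) * C(4,1) * C(3,1) * C(2,2) = 21 * 5 * 4 * 3 * 1 = 1260, which agrees with 7! / (2! * 2!).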

Print permutations of a string given an index

I'm trying to learn recursion and going through the Stanford online video lectures and textbook. In the programming exercises a question is posed to generate all permutations for a string given an index. For example "ABCD" and index 2. This should generate "ABCD" and "ABDC".
I understand how to generate the permutations by using func permute(prefix, suffix) but this question is confusing me. Here is what I have so far:
func permute(s string) {
	permuteHelper(s, 2)
}

func permuteHelper(s string, idx int) {
	if idx == 0 {
		fmt.Println(s)
		return
	}
	for i := idx; i < len(s); i++ {
		newS := s[:idx]
		suffix := s[idx : idx+1]
		newS += suffix
		permuteHelper(newS, idx-1)
	}
}
Output:
AB
AB
AB
AB
I don't want the answer, but perhaps some guidance in my thought process. I know I should create a static "AB" and then select "C" on one iteration and then select "D", then my base case should be triggered and print the string. Control will then return to "AB" and "i" should be 3 and I choose "D", but how do I then choose "C"?
You're on the right track and the overall form looks fine, but the details are still blurry.
Firstly,
newS := s[:idx]
suffix := s[idx : idx+1]
newS += suffix
is equivalent to
newS := s[:idx+1]
No real permuting is going on here; this is chopping off the back of the string and ignoring the loop variable i entirely. Try to swap two characters in the string for each recursive call and use both i and idx to do so; think of idx as a fixed pivot for swapping every i...len(s) element with per call frame. Good job ensuring you're not reassigning to the string in the current scope, though, because that'd mess up state for later iterations of the loop.
Second suggestion: To establish the base case, count recursively up to len(s) instead of down to zero. You can pretty much pretend the entire first chunk of the array doesn't exist. Think of it just like a regular permutation algorithm except you've skipped the first idx indices.
Also, this is more of a design point than an algorithmic issue, but I would expose the idx parameter to the caller instead of hiding it behind a wrapper. This makes the function reusable and more obvious as to what it does--as a user of a library, I'd be perplexed if a function named permute refused to permute the first 2 chars.
It's better to return a result than produce a side effect like printing, but I'll set that aside for pedagogy's sake.
Here's one solution (spoiler alert!):
package main
import "fmt"
func permute(s string, idx int) {
	if idx == len(s) {
		fmt.Println(s)
	}
	for i := idx; i < len(s); i++ {
		a := []rune(s)
		a[i], a[idx] = a[idx], a[i]
		permute(string(a), idx+1)
	}
}

func main() {
	permute("abcde", 2)
}
permute("abcde", 2) produces
abcde
abced
abdce
abdec
abedc
abecd
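For the point about returning a result instead of printing, here is a hedged Python sketch of the same swap-based recursion written as a generator. The name permute_from is made up for this example; it is one possible shape, not the only one.

```python
def permute_from(s, idx):
    """Yield every permutation of s that keeps the first idx characters fixed."""
    if idx == len(s):
        yield s
        return
    for i in range(idx, len(s)):
        chars = list(s)  # fresh copy, so sibling iterations see the original order
        chars[i], chars[idx] = chars[idx], chars[i]
        yield from permute_from("".join(chars), idx + 1)


if __name__ == "__main__":
    print(list(permute_from("ABCD", 2)))
```

The caller can then print, collect, or count the results as needed.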

Finding all the shortest unique substring which are of same length?

Given a string sequence which contains only four letters, ['a','g','c','t']
for example: agggcttttaaaatttaatttgggccc.
Find all the shortest unique sub-string of the string sequence which are of equal length (the length should be minimum of all the unique sub-strings) ?
For example : aaggcgccttt
answer: ['aa', 'ag', 'gg','cg', 'cc','ct']
explanation:shortest unique sub-string of length 2
I have tried using suffix arrays coupled with the longest common prefix, but I am unable to work out the solution completely.
I'm not sure what you mean by "minimum unique sub-string", but looking at your example I assume you mean "shortest runs of a single letter". If this is the case, you just need to iterate through the string once (character by character) and count all the shortest runs you find. You should keep track of the length of the minimum run found so far (infinity at start) and the length of the current run.
If you need to find the exact runs, you can add all the minimum runs you find to e.g. a list as you iterate through the string (and modify that list accordingly if a shorter run is found).
EDIT:
I thought more about the problem and came up with the following solution.
We find all the unique sub-strings of length i (in ascending order). So, first we consider all sub-strings of length 1, then all sub-strings of length 2, and so on. If we find any, we stop, since the sub-string length can only increase from this point.
You will have to use a list to keep track of the sub-strings you've seen so far, and a list to store the actual sub-strings. You will also have to maintain them accordingly as you find new sub-strings.
Here's the Java code I came up with, in case you need it:
String str = "aaggcgccttt";
String curr = "";
ArrayList<String> uniqueStrings = new ArrayList<String>();
ArrayList<String> alreadySeen = new ArrayList<String>();
for (int i = 1; i < str.length(); i++) {
    for (int j = 0; j < str.length() - i + 1; j++) {
        curr = str.substring(j, j + i);
        if (!alreadySeen.contains(curr)) { // Sub-string hasn't been seen yet
            uniqueStrings.add(curr);
            alreadySeen.add(curr);
        }
        else // Repeated sub-string found
            uniqueStrings.remove(curr);
    }
    if (!uniqueStrings.isEmpty()) // We have found non-repeating sub-string(s)
        break;
    alreadySeen.clear();
}
// Output
if (uniqueStrings.isEmpty())
    System.out.println(str);
else {
    for (String s : uniqueStrings)
        System.out.println(s);
}
The uniqueStrings list contains all the unique sub-strings of minimum length (used for output). The alreadySeen list keeps track of all the sub-strings that have already been seen (used to exclude repeating sub-strings).
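The same length-by-length idea can be sketched in Python with a counter instead of two lists. This is a minimal illustration rather than a tuned implementation; the function name shortest_unique_substrings is made up for the example, and overlapping occurrences are counted.

```python
from collections import Counter


def shortest_unique_substrings(text):
    """Return the substrings of minimal length that occur exactly once in text."""
    for length in range(1, len(text) + 1):
        # count every (overlapping) substring of the current length
        counts = Counter(text[i:i + length] for i in range(len(text) - length + 1))
        unique = [sub for sub, n in counts.items() if n == 1]
        if unique:
            return unique  # listed in order of first appearance
    return []


if __name__ == "__main__":
    print(shortest_unique_substrings("aaggcgccttt"))
```

A dictionary lookup replaces the linear `contains` scan on a list, which matters for long inputs.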
I'll write some code in Python, because that's what I find the easiest.
I actually wrote both the overlapping and the non-overlapping variants. As a bonus, it also checks that the input is valid.
You seem to be interested only in the overlapping variant:
import itertools


def find_all(
        text,
        pattern,
        overlap=False):
    """
    Find all occurrences of the pattern in the text.

    Args:
        text (str|bytes|bytearray): The input text.
        pattern (str|bytes|bytearray): The pattern to find.
        overlap (bool): Detect overlapping patterns.

    Yields:
        position (int): The position of the next finding.
    """
    len_text = len(text)
    offset = 1 if overlap else (len(pattern) or 1)
    i = 0
    while i < len_text:
        i = text.find(pattern, i)
        if i >= 0:
            yield i
            i += offset
        else:
            break


def is_valid(text, tokens):
    """
    Check if the text only contains the specified tokens.

    Args:
        text (str|bytes|bytearray): The input text.
        tokens (str|bytes|bytearray): The valid tokens for the text.

    Returns:
        result (bool): The result of the check.
    """
    return set(text).issubset(set(tokens))


def shortest_unique_substr(
        text,
        tokens='acgt',
        overlapping=True,
        check_valid_input=True):
    """
    Find the shortest unique substring.

    Args:
        text (str|bytes|bytearray): The input text.
        tokens (str|bytes|bytearray): The valid tokens for the text.
        overlapping (bool): Detect overlapping patterns.
        check_valid_input (bool): Check if the input is valid.

    Returns:
        result (set): The set of the shortest unique substrings.
    """
    def add_if_single_match(
            text,
            pattern,
            result,
            overlapping):
        match_gen = find_all(text, pattern, overlapping)
        try:
            next(match_gen)  # first match
        except StopIteration:
            # the pattern is not found, nothing to do
            pass
        else:
            try:
                next(match_gen)
            except StopIteration:
                # the pattern was found only once so add to results
                result.add(pattern)
            else:
                # the pattern is found twice, nothing to do
                pass

    # just some sanity check
    if check_valid_input and not is_valid(text, tokens):
        raise ValueError('Input text contains invalid tokens.')

    result = set()
    # shortest sequence cannot be longer than this
    if overlapping:
        max_lim = len(text) // 2 + 1
        max_lim = len(tokens)
        for n in range(1, max_lim + 1):
            # note: `repeat` is fixed at 2 here, so `n` does not change the pattern length
            for pattern_gen in itertools.product(tokens, repeat=2):
                pattern = ''.join(pattern_gen)
                add_if_single_match(text, pattern, result, overlapping)
            if len(result) > 0:
                break
    else:
        max_lim = len(tokens)
        for n in range(1, max_lim + 1):
            for i in range(len(text) - n):
                pattern = text[i:i + n]
                add_if_single_match(text, pattern, result, overlapping)
            if len(result) > 0:
                break
    return result
After some sanity check for the correctness of the outputs:
import functools

shortest_unique_substr_ovl = functools.partial(shortest_unique_substr, overlapping=True)
shortest_unique_substr_ovl.__name__ = 'shortest_unique_substr_ovl'
shortest_unique_substr_not = functools.partial(shortest_unique_substr, overlapping=False)
shortest_unique_substr_not.__name__ = 'shortest_unique_substr_not'
funcs = shortest_unique_substr_ovl, shortest_unique_substr_not
test_inputs = (
    'aaa',
    'aaaa',
    'aaggcgccttt',
    'agggcttttaaaatttaatttgggccc',
)
for func in funcs:
    print('Func:', func.__name__)
    for test_input in test_inputs:
        print(func(test_input))
    print()
Func: shortest_unique_substr_ovl
set()
set()
{'cg', 'ag', 'gg', 'ct', 'aa', 'cc'}
{'tg', 'ag', 'ct'}
Func: shortest_unique_substr_not
{'aa'}
{'aaa'}
{'cg', 'tt', 'ag', 'gg', 'ct', 'aa', 'cc'}
{'tg', 'ag', 'ct', 'cc'}
It is wise to benchmark how fast we actually are.
Below you can find some benchmarks, produced using some template code from here (the overlapping variant is in blue):
and the rest of the code for completeness:
import random


def gen_input(n, tokens='acgt'):
    return ''.join([tokens[random.randint(0, len(tokens) - 1)] for _ in range(n)])


def equal_output(a, b):
    return a == b


# benchmark() and plot_benchmarks() come from the template code mentioned above
input_sizes = tuple(2 ** (1 + i) for i in range(16))
runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)
plot_benchmarks(runtimes, input_sizes, labels, units='ms')
plot_benchmarks(runtimes, input_sizes, labels, units='μs', zoom_fastest=2)
As far as the asymptotic time-complexity analysis is concerned, considering only the overlapping case, let N be the input size, let K be the number of tokens (4 in your case), find_all() is O(N), and the body of shortest_unique_substr is O(K²) (+ O((K - 1)²) + O((K - 2)²) + ...).
So, this is overall O(N*K²) or O(N*(Σk²)) (for k = 1, …, K), since K is fixed, this is O(N), as the benchmarks seem to indicate.

Find most unique words, penalizing words in common

suppose I have n classes like:
A: this,is,a,test,of,the,salmon,system
B: i,like,to,test,the,flounder,system
C: to,test,a,salmon,is,like,to,test,the,iodine,system
I want to get the most unique words for each class, so something with a ranking that gives me
A: salmon
B: flounder
C: iodine, salmon
(as their first elements ; it can be a ranking of all words)
How do I do this? There will be hundreds of input classes each with tens of thousands of tokens.
I'm guessing this is essentially the sort of thing any search engine back-end does, but I'd like a fairly simple standalone thing.
Using a language like Python, you can write this efficiently in 8 lines. For hundreds of groups, each with tens of thousands of tokens, the running time sounds like it will take at most a few minutes (although I haven't tried this on actual input).
1. Create a hash-based dictionary mapping each word to the number of its occurrences.
2. Iterate over all groups, and all words in a group, and update this dictionary.
3. For each group,
a. If you need a total ranking, sort with the value in the dictionary as the criterion.
b. If you need the top k, use an order-statistics type of algorithm, again using the value in the dictionary as the criterion.
Steps 1 + 2 should have expected linear complexity in the total number of words.
Step 3 is n log(n) per group for total ranking, and linear in the total number of words otherwise.
Here is the Python code for the top k. Assume all_groups is a list of lists of strings, and that k = 10.
from collections import Counter
import heapq
import operator

c = Counter()
for g in all_groups:
    c.update(g)
for g in all_groups:
    print(heapq.nsmallest(k, [(w, c[w]) for w in g], key=operator.itemgetter(1)))
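For the total-ranking variant (step 3a), a minimal sketch could look like the following. The function name rank_words and the choice to drop repeated words within a group are assumptions made for this example; counts are total occurrences across all groups, as in the steps above.

```python
from collections import Counter


def rank_words(all_groups):
    """Return, for each group, its distinct words sorted from rarest to most common."""
    counts = Counter()
    for group in all_groups:
        counts.update(group)
    # dict.fromkeys drops repeats while keeping first-seen order; sorted() is stable
    return [sorted(dict.fromkeys(group), key=counts.__getitem__) for group in all_groups]


if __name__ == "__main__":
    groups = [
        "this,is,a,test,of,the,salmon,system".split(","),
        "i,like,to,test,the,flounder,system".split(","),
        "to,test,a,salmon,is,like,to,test,the,iodine,system".split(","),
    ]
    for ranking in rank_words(groups):
        print(ranking)
```

Note that with raw counts, words such as "this" and "of" rank ahead of "salmon" for class A because each occurs only once overall; filtering out common function words is a separate step that this sketch does not attempt.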
As I understand your question, you want the least-used word of each class, counted across all the classes. This is the solution I came up with:
var a = "this,is,a,test,of,the,salmon,system".split(","),
b = "i,like,to,test,the,flounder,system".split(","),
c = "to,test,a,salmon,is,like,to,test,the,iodine,system".split(","),
map = {},
min,
key,
parse = function(stringArr) {
var length = stringArr.length,
i,count;
for (i = 0; i< length; i++) {
if (count = map[stringArr[i]]) {
map[stringArr[i]] = count + 1;
}
else {
map[stringArr[i]] = 1;
}
}
},
get = function(stringArr) {
min = Infinity;
stringArr.forEach((item)=>{
if (map[item] < min) {
min = map[item];
key = item
}
});
console.log(key);
};
parse(a);
parse(b);
parse(c);
get(a);
get(b);
get(c);
Ignore the classes, go through all the words and make a frequency table.
Then, for each class select the word with the lowest frequency.
Example in Python (slightly unpythonic solution to maintain readability for non-Python users):
a = "this,is,a,test,of,the,salmon,system".split(",")
b = "i,like,to,test,the,flounder,system".split(",")
c = "to,test,a,salmon,is,like,to,test,the,iodine,system".split(",")
freq = {}
for word in a + b + c:
    freq[word] = (freq[word] if word in freq else 0) + 1
print("a: ", min(a, key=lambda w: freq[w]))
print("b: ", min(b, key=lambda w: freq[w]))
print("c: ", min(c, key=lambda w: freq[w]))

Add the least amount of characters to make a palindrome

The question:
Given any string, add the least amount of characters possible to make it a palindrome in linear time.
I'm only able to come up with an O(N^2) solution.
Can someone help me with an O(N) solution?
1. Reverse the string.
2. Use a modified Knuth-Morris-Pratt to find the latest match (the simplest modification would be to just append the original string to the reversed string and ignore matches after len(string)).
3. Append the unmatched rest of the reversed string to the original.
Steps 1 and 3 are obviously linear, and step 2 is linear because Knuth-Morris-Pratt is.
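One way to read the steps above, for the append-only case, is the following Python sketch. It is an interpretation rather than the answerer's exact code: it runs the KMP prefix function over the reversed string, a sentinel, and the original string. The last prefix-function value is then the length of the longest palindromic suffix. The names prefix_function and make_palindrome are made up for this example.

```python
def prefix_function(seq):
    """KMP prefix function: pi[i] is the length of the longest proper border of seq[:i+1]."""
    pi = [0] * len(seq)
    for i in range(1, len(seq)):
        k = pi[i - 1]
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi


def make_palindrome(s):
    """Append the fewest characters to the end of s so that it reads as a palindrome."""
    if not s:
        return s
    rev = s[::-1]
    # None is a sentinel that never equals a character, so no border can cross it
    combined = list(rev) + [None] + list(s)
    k = prefix_function(combined)[-1]  # length of the longest palindromic suffix of s
    return s + rev[k:]


if __name__ == "__main__":
    print(make_palindrome("abacac"))
```

The prefix function is computed in a single pass over a sequence of length 2n + 1, so the whole procedure is linear in the length of the input.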
If only appending is allowed
A Scala solution:
def isPalindrome(s: String) = s.view.reverse == s.view
def makePalindrome(s: String) =
s + s.take((0 to s.length).find(i => isPalindrome(s.substring(i))).get).reverse
If you're allowed to insert characters anywhere
Every palindrome can be viewed as a set of nested letter pairs.
a n n a        b o b
| | | |        | * |
| --- |        |   |
-------        -----
If the palindrome length n is even, we'll have n/2 pairs. If it is odd, we'll have n/2 full pairs and one single letter in the middle (let's call it a degenerated pair).
Let's represent them by pairs of string indexes - the left index counted from the left end of the string, and the right index counted from the right end of the string, both ends starting with index 0.
Now let's write pairs starting from the outer to the inner. So in our example:
anna: (0, 0) (1, 1)
bob: (0, 0) (1, 1)
In order to make any string a palindrome, we will go from both ends of the string one character at a time, and with every step, we'll eventually add a character to produce a correct pair of identical characters.
Example:
Assume the input word is "blob"
Pair (0, 0) is (b, b) ok, nothing to do, this pair is fine. Let's increase the counter.
Pair (1, 1) is (l, o). Doesn't match. So let's add "o" at position 1 from the left. Now our word became "bolob".
Pair (2, 2). We don't need to look even at the characters, because we're pointing at the same index in the string. Done.
Wait a moment, but we have a problem here: in point 2. we arbitrarily chose to add a character on the left. But we could as well add a character "l" on the right. That would produce "blolb", also a valid palindrome. So does it matter? Unfortunately it does because the choice in earlier steps may affect how many pairs we'll have to fix and therefore how many characters we'll have to add in the future steps.
Easy algorithm: search all the possibilities. That would give us an O(2^n) algorithm.
Better algorithm: use Dynamic Programming approach and prune the search space.
In order to keep things simpler, now we decouple inserting of new characters from just finding the right sequence of nested pairs (outer to inner) and fixing their alignment later. So for the word "blob" we have the following possibilities, both ending with a degenerated pair:
(0, 0) (1, 2)
(0, 0) (2, 1)
The more such pairs we find, the fewer characters we will have to add to fix the original string. Every full pair found gives us two characters we can reuse. Every degenerated pair gives us one character to reuse.
The main loop of the algorithm will iteratively evaluate pair sequences in such a way, that in step 1 all valid pair sequences of length 1 are found. The next step will evaluate sequences of length 2, the third sequences of length 3 etc. When at some step we find no possibilities, this means the previous step contains the solution with the highest number of pairs.
After each step, we will remove the pareto-suboptimal sequences. A sequence is suboptimal compared to another sequence of the same length, if its last pair is dominated by the last pair of the other sequence. E.g. sequence (0, 0)(1, 3) is worse than (0, 0)(1, 2). The latter gives us more room to find nested pairs and we're guaranteed to find at least all the pairs that we'd find for the former. However sequence (0, 0)(1, 2) is neither worse nor better than (0, 0)(2, 1). The one minor detail we have to beware of is that a sequence ending with a degenerated pair is always worse than a sequence ending with a full pair.
After bringing it all together:
def makePalindrome(str: String): String = {
/** Finds the pareto-minimum subset of a set of points (here pair of indices).
* Could be done in linear time, without sorting, but O(n log n) is not that bad ;) */
def paretoMin(points: Iterable[(Int, Int)]): List[(Int, Int)] = {
val sorted = points.toSeq.sortBy(identity)
(List.empty[(Int, Int)] /: sorted) { (result, e) =>
if (result.isEmpty || e._2 <= result.head._2)
e :: result
else
result
}
}
/** Find all pairs directly nested within a given pair.
* For performance reasons tries to not include suboptimal pairs (pairs nested in any of the pairs also in the result)
* although it wouldn't break anything as prune takes care of this. */
def pairs(left: Int, right: Int): Iterable[(Int, Int)] = {
val builder = List.newBuilder[(Int, Int)]
var rightMax = str.length
for (i <- left until (str.length - right)) {
rightMax = math.min(str.length - left, rightMax)
val subPairs =
for (j <- right until rightMax if str(i) == str(str.length - j - 1)) yield (i, j)
subPairs.headOption match {
case Some((a, b)) => rightMax = b; builder += ((a, b))
case None =>
}
}
builder.result()
}
/** Builds sequences of size n+1 from sequence of size n */
def extend(path: List[(Int, Int)]): Iterable[List[(Int, Int)]] =
for (p <- pairs(path.head._1 + 1, path.head._2 + 1)) yield p :: path
/** Whether full or degenerated. Full-pairs save us 2 characters, degenerated save us only 1. */
def isFullPair(pair: (Int, Int)) =
pair._1 + pair._2 < str.length - 1
/** Removes pareto-suboptimal sequences */
def prune(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val allowedHeads = paretoMin(sequences.map(_.head)).toSet
val containsFullPair = allowedHeads.exists(isFullPair)
sequences.filter(s => allowedHeads.contains(s.head) && (isFullPair(s.head) || !containsFullPair))
}
/** Dynamic-Programming step */
@scala.annotation.tailrec
def search(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val nextStage = prune(sequences.flatMap(extend))
nextStage match {
case List() => sequences
case x => search(nextStage)
}
}
/** Converts a sequence of nested pairs to a palindrome */
def sequenceToString(sequence: List[(Int, Int)]): String = {
val lStr = str
val rStr = str.reverse
val half =
(for (List(start, end) <- sequence.reverse.sliding(2)) yield
lStr.substring(start._1 + 1, end._1) + rStr.substring(start._2 + 1, end._2) + lStr(end._1)).mkString
if (isFullPair(sequence.head))
half + half.reverse
else
half + half.reverse.substring(1)
}
sequenceToString(search(List(List((-1, -1)))).head)
}
Note: The code does not list all the palindromes, but gives only one example, and it is guaranteed it has the minimum length. There usually are more palindromes possible with the same minimum length (O(2^n) worst case, so you probably don't want to enumerate them all).
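As a cross-check on the number of characters needed in the insert-anywhere case, here is a short Python sketch of the standard O(n^2) interval dynamic program. This is a swapped-in textbook technique, not the pareto-pruning approach of the Scala code above, and it only returns the count, not the palindrome itself. The function name min_insertions is made up for this example.

```python
def min_insertions(s):
    """Minimum number of characters to insert anywhere in s to make it a palindrome."""
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = minimum insertions needed for the substring s[i..j]
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                # the outer pair already matches; only the inside needs fixing
                dp[i][j] = dp[i + 1][j - 1] if length > 2 else 0
            else:
                # insert a partner for s[i] on the right, or for s[j] on the left
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]


if __name__ == "__main__":
    print(min_insertions("blob"))
```

For "blob" the result is 1, matching both "bolob" and "blolb" from the walkthrough.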
O(n) time solution.
Algorithm:
We need to find the longest palindrome within the given string that contains the last character. Then add all the characters that are not part of the palindrome to the back of the string in reverse order.
Key point:
In this problem, the longest palindrome in the given string MUST contain the last character.
ex:
input: abacac
output: abacacaba
Here the longest palindrome in the input that contains the last letter is "cac". Therefore add all the letters before "cac" to the back in reverse order to make the entire string a palindrome.
written in c# with a few test cases commented out
static public void makePalindrome()
{
//string word = "aababaa";
//string word = "abacbaa";
//string word = "abcbd";
//string word = "abacac";
//string word = "aBxyxBxBxyxB";
//string word = "Malayal";
string word = "abccadac";
int j = word.Length - 1;
int mark = j;
bool found = false;
for (int i = 0; i < j; i++)
{
char cI = word[i];
char cJ = word[j];
if (cI == cJ)
{
found = true;
j--;
if(mark > i)
mark = i;
}
else
{
if (found)
{
found = false;
i--;
}
j = word.Length - 1;
mark = j;
}
}
for (int i = mark-1; i >=0; i--)
word += word[i];
Console.Write(word);
}
}
Note that this code will give you the solution for least amount of letter to APPEND TO THE BACK to make the string a palindrome. If you want to append to the front, just have a 2nd loop that goes the other way. This will make the algorithm O(n) + O(n) = O(n). If you want a way to insert letters anywhere in the string to make it a palindrome, then this code will not work for that case.
I believe @Chronical's answer is wrong, as it seems to be for the best-case scenario, not the worst case, which is what is used to compute big-O complexity. I welcome a proof, but the "solution" doesn't actually describe a valid answer.
KMP finds a matching substring in O(n * 2k) time, where n is the length of the input string and k is the length of the substring we're searching for, but it does not tell you in O(n) time what the longest palindrome in the input string is.
To solve this problem, we need to find the longest palindrome at the end of the string. If this longest suffix palindrome is of length x, the minimum number of characters to add is n - x. E.g. the string aaba's longest suffix palindrome is aba, of length 3, thus our answer is 1. The algorithm to find out if a string is a palindrome takes O(n) time, whether using KMP or the more efficient and simple algorithm (O(n/2)):
Take two pointers, one at the first character and one at the last character
Compare the characters at the pointers, if they're equal, move each pointer inward, otherwise return false
When the pointers point to the same index (odd string length), or have overlapped (even string length), return true
Using the simple algorithm, we start from the entire string and check if it's a palindrome. If it is, we return 0, and if not, we check the string string[1...end], string[2...end] until we have reached a single character and return n - 1. This results in a runtime of O(n^2).
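The simple quadratic approach described here can be sketched in Python as follows. The names is_palindrome and min_chars_to_append are made up for this example; the start parameter avoids copying each suffix.

```python
def is_palindrome(s, start=0):
    """Two-pointer check that s[start:] reads the same in both directions."""
    i, j = start, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


def min_chars_to_append(s):
    """Return n - x, where x is the length of the longest palindromic suffix of s."""
    for start in range(len(s)):
        if is_palindrome(s, start):
            # the suffix s[start:] has length n - start, so `start` characters are missing
            return start
    return 0  # empty string


if __name__ == "__main__":
    print(min_chars_to_append("aaba"))
```

Each check is O(n) and up to n suffixes are tried, which gives the O(n^2) bound mentioned above.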
Splitting up the KMP algorithm into
Build table
Search for longest suffix palindrome
Building the table takes O(n) time, and then each check of "are you a palindrome" for each substring from string[0...end], string[1...end], ..., string[end - 2...end] each takes O(n) time. k in this case is the same factor of n that the simple algorithm takes to check each substring, because it starts as k = n, then goes through k = n - 1, k = n - 2... just the same as the simple algorithm did.
TL; DR:
KMP can tell you if a string is a palindrome in O(n) time, but that does not supply an answer to the question, because you have to check whether each of the substrings string[0...end], string[1...end], ..., string[end - 2...end] is a palindrome, resulting in the same (but actually worse) runtime as a simple palindrome-check algorithm.
#include<iostream>
#include<string>
using std::cout;
using std::endl;
using std::cin;
int main() {
    std::string word, left("");
    cin >> word;
    size_t start, end;
    for (start = 0, end = word.length()-1; start < end; end--) {
        if (word[start] != word[end]) {
            left.append(word.begin()+end, 1 + word.begin()+end);
            continue;
        }
        left.append(word.begin()+start, 1 + word.begin()+start), start++;
    }
    cout << left << ( start == end ? std::string(word.begin()+end, 1 + word.begin()+end) : "" )
         << std::string(left.rbegin(), left.rend()) << endl;
    return 0;
}
Don't know if it appends the minimum number, but it produces palindromes
Explained:
We will start at both ends of the given string and iterate inwards towards the center.
At each iteration, we check if each letter is the same, i.e. word[start] == word[end]?.
If they are the same, we append a copy of word[start] to another string called left which, as its name suggests, will serve as the left-hand side of the new palindrome when iteration is complete. Then we move both variables towards the center: (start)++ and (end)--.
If they are not the same, we append a copy of word[end] to the same string left and only decrement end.
And this is the basics of the algorithm until the loop is done.
When the loop is finished, one last check is done to make sure that if we got an odd length palindrome, we append the middle character to the middle of the new palindrome formed.
Note that if you decide to append the opposite characters to the string left, everything in the code is mirrored: which index moves at each iteration and which moves only when a match is found, the order of printing the palindrome, etc. I don't want to have to go through it again, but you can try it and see.
The running complexity of this code should be O(N) assuming that append method of the std::string class runs in constant time.
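The steps explained above can also be sketched as a small Python function, which makes it easy to try on a few inputs. The name build_palindrome is illustrative only; the logic is meant to mirror the C++ loop, under the assumption that the result only needs to be a palindrome, not a minimal one.

```python
def build_palindrome(word):
    """Build a palindrome from word by walking inward from both ends."""
    left = []
    start, end = 0, len(word) - 1
    while start < end:
        if word[start] == word[end]:
            # Match: keep the character and advance the left pointer.
            left.append(word[start])
            start += 1
        else:
            # Mismatch: copy the right-hand character only.
            left.append(word[end])
        end -= 1  # the right pointer moves on every iteration
    # If the pointers met, that character becomes the middle of the result.
    middle = word[end] if start == end else ""
    left = "".join(left)
    return left + middle + left[::-1]


print(build_palindrome("ab"))  # bab
```

Note that the output for "ab" places the new character at the front, so this approach inserts characters rather than strictly appending them.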
If someone wants to solve this in Ruby, the solution can be very simple:
str = 'xcbc' # Any string that you want.
arr1 = str.split('')
arr2 = arr1.reverse
count = 0
while(str != str.reverse)
  count += 1
  arr1.insert(count-1, arr2[count-1])
  str = arr1.join('')
end
puts str
puts str.length - arr2.count
I am assuming that you cannot replace or remove any existing characters?
A good start would be reversing one of the strings and finding the longest-common-substring (LCS) between the reversed string and the other string. Since it sounds like this is a homework or interview question, I'll leave the rest up to you.
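One possible reading of this hint, offered as an assumption rather than as the answerer's intended solution: if only appending is allowed, the common part of interest is a suffix of the string that equals a prefix of its reverse, and any such suffix is itself a palindrome. A minimal sketch (the name append_to_palindrome is hypothetical, and the slice comparisons make it O(n^2)):

```python
def append_to_palindrome(s):
    """Append the fewest characters to s so the result is a palindrome."""
    rev = s[::-1]
    n = len(s)
    for i in range(n + 1):
        # s[i:] is a suffix of s; rev[:n - i] is that same suffix reversed.
        # They are equal exactly when the suffix is a palindrome.
        if s[i:] == rev[:n - i]:
            # rev[n - i:] is the reverse of the unmatched prefix s[:i].
            return s + rev[n - i:]
    return s  # not reached: the empty suffix always matches


print(append_to_palindrome("aaba"))  # aabaa
```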
See this solution, which is better than O(N^2).
The problem is subdivided into smaller subproblems.
Example:
original "tostotor"
reversed "rototsot"
Here the 2nd position is 'o', so we split the problem in two by breaking the original string into "t" and "ostot".
For "t": the solution is 1.
For "ostot": the solution is 2, because the LCS is "tot" and the characters that need to be added are "os".
So the total is 2 + 1 = 3.
def shortPalin( S):
    k=0
    lis=len(S)
    for i in range(len(S)/2):
        if S[i]==S[lis-1-i]:
            k=k+1
        else :break
    S=S[k:lis-k]
    lis=len(S)
    prev=0
    w=len(S)
    tot=0
    for i in range(len(S)):
        if i>=w:
            break;
        elif S[i]==S[lis-1-i]:
            tot=tot+lcs(S[prev:i])
            prev=i
            w=lis-1-i
    tot=tot+lcs(S[prev:i])
    return tot
def lcs( S):
    if (len(S)==1):
        return 1
    li=len(S)
    X=[0 for x in xrange(len(S)+1)]
    Y=[0 for l in xrange(len(S)+1)]
    for i in range(len(S)-1,-1,-1):
        for j in range(len(S)-1,-1,-1):
            if S[i]==S[li-1-j]:
                X[j]=1+Y[j+1]
            else:
                X[j]=max(Y[j],X[j+1])
        Y=X
    return li-X[0]
print shortPalin("tostotor")
Using Recursion
#include <iostream>
using namespace std;
int length( char str[])
{
    int l=0;
    for( int i=0; str[i]!='\0'; i++, l++);
    return l;
}
int palin(char str[],int len)
{
    static int cnt;
    int s=0;
    int e=len-1;
    while(s<e){
        if(str[s]!=str[e]) {
            cnt++;
            return palin(str+1,len-1);
        }
        else{
            s++;
            e--;
        }
    }
    return cnt;
}
int main() {
    char str[100];
    cin.getline(str,100);
    int len = length(str);
    cout<<palin(str,len);
}
Solution with O(n) time complexity
public static void main(String[] args) {
    String givenStr = "abtb";
    String palindromeStr = covertToPalindrome(givenStr);
    System.out.println(palindromeStr);
}
private static String covertToPalindrome(String str) {
    char[] strArray = str.toCharArray();
    int low = 0;
    int high = strArray.length - 1;
    int subStrIndex = -1;
    while (low < high) {
        if (strArray[low] == strArray[high]) {
            high--;
        } else {
            high = strArray.length - 1;
            subStrIndex = low;
        }
        low++;
    }
    return str + (new StringBuilder(str.substring(0, subStrIndex+1))).reverse().toString();
}
// string to append to convert it to a palindrome
public static void main(String args[])
{
    String s=input();
    System.out.println(min_operations(s));
}
static String min_operations(String str)
{
    int i=0;
    int j=str.length()-1;
    String ans="";
    while(i<j)
    {
        if(str.charAt(i)!=str.charAt(j))
        {
            ans=ans+str.charAt(i);
        }
        if(str.charAt(i)==str.charAt(j))
        {
            j--;
        }
        i++;
    }
    StringBuffer sd=new StringBuffer(ans);
    sd.reverse();
    return (sd.toString());
}

Resources