I am new to Ruby. How do I write a find method that accepts a lowercase string and returns an array of all the words in dic.json that can be made by rearranging its letters? For example, if I input "ab", the output should be ["ab", "ba"] from the JSON file below.
dic.json
[
"ab",
"ba",
"abc",
"acb",
"bac",
"bca",
"cab",
"cba"
]
This is what I have so far
I used File read to access the JSON file and I have a function that can find permutations but I am not sure how to connect the two functions.
class LetterLocater
def get_file_contents
return File.read('dictionary.json').split
end
def permutation(letters)
return [''] if letters.empty?
chrs = letters.chars
(0...letters.size).flat_map { |i|
chr, rest = letters[i], letters[0...i] + letters[i+1..-1]
permutation(rest).map { |sub|
chr + sub
}
}
end
end
a = LetterLocater.new
puts a.permutation(gets.chomp)
Instead of creating all permutations for various inputs, you could also group the words from the dictionary by sorting their letters:
def sorted(str)
str.chars.sort.join
end
# assume this was read from the JSON file
dict = %w[ab ba abc acb bac bca cab cba]
lookup_hash = dict.group_by { |word| sorted(word) }
#=> {
# "ab" => ["ab", "ba"],
# "abc" => ["abc", "acb", "bac", "bca", "cab", "cba"]
# }
Although this calculation can be quite expensive for larger dictionaries, you only have to do it once. (You could even store the hash on disk and only update it when your dictionary changes.)
After creating the hash, it's almost trivial to find the permutations. You just have to fetch the values for the sorted input:
input = gets.chomp
puts lookup_hash[sorted(input)]
This will be much faster than generating all permutations each time.
In Ruby there is already Array#permutation that you can use to calculate all possible words.
letters = "ab" # example
permutations = letters.split(//).permutation.map(&:join)
#=> ["ab", "ba"]
And then there is 'Array#&' that returns only elements from an array that are present in another array.
words = ["ab", "ba", "abc", "acb", "bac", "bca", "cab", "cba"]
words & permutations
#=> ["ab", "ba"]
And you can use JSON.load(File.open('dictionary.json')) to load the JSON file into a Ruby array – as Schwern already wrote in his comment.
Now let's combine all these methods into one class
require 'json'
class LetterLocater
attr_reader :words
def initialize(dictionary)
@words = JSON.load(File.open(dictionary))
end
def permutation(letters)
permutations = letters.split(//).permutation.map(&:join)
words & permutations
end
end
ll = LetterLocater.new('dictionary.json')
ll.permutation('ab')
#=> ["ab", "ba"]
ll.permutation('abc')
#=> ["abc", "acb", "bac", "bca", "cab", "cba"]
def find_permutations_in_array(arr, str)
chars = str.chars.sort
arr.inject([]) do |res, word|
res << word if word.size == str.size && word.chars.sort == chars
res
end
end
Assumption
I have assumed that the task is one-off, in the sense that the task is to be performed for a single word and a given dictionary, not for possibly many words and the same dictionary.
Approach
I propose that three tests be used to determine if each dictionary word is formed by permuting the letters in the given word:
reject the dictionary word if its length differs from that of the given word
if the above test fails, reject the dictionary word if it begins with a string s where it has been determined that no word beginning with s can be formed by permuting the letters of the given word
if the above test fails, each character in the dictionary word is examined in sequence until a character is found that is not present in the given word or whose count in the dictionary word exceeds the count of the same character in the given word, in which case the dictionary word can be rejected
If none of the above tests succeeds we may conclude that the dictionary word can be formed by permuting the letters of the given word.
Note that, in the third test above, if the dictionary word w was rejected when its character at index i was examined we may conclude that no dictionary word that begins with s = w[0,i] can be formed by permuting the characters of the given word, which is the basis for the second test.
Code
def find_em(word, dict)
word_len = word.length
ltr_cnt = word.each_char.tally
prefix = ''
dict.select do |w|
next false if w.length != word_len ||
(prefix.length > 0 && w.start_with?(prefix))
prefix = ''
lc = ltr_cnt.dup
ltrs_used = 0
w.each_char do |c|
prefix << c
break unless lc.key?(c) && lc[c] > 0
lc[c] -= 1
ltrs_used += 1
end
if ltrs_used == word_len
prefix = ''
true
else
false
end
end
end
Example
word = "aab"
dict = [
"aaa", "aab", "ab", "aba", "abb", "aca", "acad", "acb", "acc",
"acd", "ace", "ba", "bac", "bca", "bcc", "caa", "cab", "cbb"
]
Being a "dictionary", I assume the words of dict are ordered lexicographically, but if that is not the case the first step is to sort the given array words to produce dict.
find_em(word, dict)
#=> ["aab", "aba"]
Explanation
For the example just given, the method processes each word in the dictionary as indicated by the following:
"aaa", "aab", "ab", "aba", "abb", "aca", "acad", "acb", "acc",
   c    m      s     m        c     n     s       r      r
"acd", "ace", "ba", "bac", "bca", "bcc", "caa", "cab", "cbb"
 r      r      s       n     n     r      n      r      r
"aaa" is rejected (signified by "c", for "count"), when the third character is examined, because word contains only two "a"'s (which is why "c" is shown under the third "a")
"aab" is matched (signified by "m") because its letters are a permutation of the letters of word
"ab" is rejected because the length of the word is not the same as the length of word
"aba" is matched
"abb" is rejected when the second "b" is examined because there are too many "b"'s
"aca" is rejected when "c" is examined because word does not contain "c" (indicated by the "n" under the "c")
"acad" is rejected because it is the wrong size
"acb", "acc", "acd", "ace" are all rejected because we know from the processing of "aca" that no words starting with "ac" can match (indicated by "r" for "repeat")
"ba" is the wrong size
"bac" is rejected because word does not contain "c"
"bca" does not begin "bac" so we examine each letter from the start of the word until we find "c", which is not contained in word so we reject the word
"bcc" is rejected because we know from examining "bca" that no words starting with "bc" can match
"caa" is rejected because word does not contain "c"
"cab" and "cbb" are rejected because we know from examining "caa" that no words starting with "c" can match
To better understand details of the calculations one might execute the method after having salted it with puts statements. For example, one might execute the following modification.
def find_em(word, dict)
word_len = word.length
ltr_cnt = word.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
prefix = ''
dict.select do |w|
puts "w=#{w}, prefix=#{prefix}"
next false if w.length != word_len ||
(prefix.length > 0 && w.start_with?(prefix))
prefix = ''
lc = ltr_cnt.dup
ltrs_used = 0
w.each_char do |c|
puts " prefix=#{prefix}, lc=#{lc}"
prefix << c
break unless lc.key?(c) && lc[c] > 0
lc[c] -= 1
ltrs_used += 1
end
puts " for w = #{w} prefix = #{prefix}"
if ltrs_used == word_len
puts " w = '#{w}' a match"
prefix = ''
true
else
puts " w = '#{w}' *not* a match"
false
end
end
end
Alternative when word is relatively short
If the number of unique permutations of the letters of word is relatively small we could generate each word that is formed by a unique permutation of the characters in word and then use Array#bsearch to perform a binary search to determine if that word is in the dictionary:
def present?(word, dict)
dict.bsearch { |w| w >= word } == word
end
word.chars.permutation.map(&:join).uniq.select { |w| present?(w, dict) }
#=> ["aab", "aba"]
Given a string S consisting of N lowercase English alphabets. Suppose we have a list L consisting of all non empty substrings of the string S.
Now we need to answer Q queries. For ith query, I need to count the number of ways to choose exactly K equal strings from the list L.
NOTE: Each query has its own value of K.
To avoid overflow I need to take it modulo 10^9+7.
Example : Let S=ababa and we have 2 Queries. Value of K for each query is :
2
3
Then the answer for the first query is 7 and for the second query it is 1.
As List L = {"a", "b", "a", "b", "a", "ab", "ba", "ab", "ba", "aba", "bab", "aba", "abab", "baba", "ababa"}
For Query 1 : There are seven ways to choose two equal strings ("a", "a"), ("a", "a"), ("a", "a"), ("b", "b"), ("ab", "ab"), ("ba", "ba"), ("aba", "aba").
For Query 2 : There is one way to choose three equal strings - ("a", "a", "a").
Now the problem is that N <= 5000 and the number of queries can be up to 100000, so a brute-force solution won't work. What would be a better way to do it?
One way to optimize search would be to reduce the set of possible candidates for pairs.
The basic idea is this: for each substring of length n > 2 that meets the constraint (the substring appears at least K times in the input string), two substrings of length n - 1 that meet the constraint must exist as well. Example: if ("abab", "abab") is a pair of substrings of the input string, then ("aba", "aba") and ("bab", "bab") must also be pairs of substrings of the input string.
This can be used to eliminate candidates for pairs in the following way:
Start with an initial set of substrings of length 1, containing only those substrings for which at least K - 1 other equal substrings can be found. Extend each of these substrings by adding the next character. Now we can eliminate the substrings for which not enough matches can be found. Repeat this until all substrings are eliminated.
Now from theory to practice:
This basic data structure simply represents a substring by its start and end points (inclusive) in the input string.
define substr:
int start , end
A helper method for getting the string represented by a substr:
define getstr:
input: string s , substr sub
return string(s , sub.start , sub.end)
First generate a lookup-table for all characters and their position in the string. The table will be needed later on.
define posMap:
input: string in
output: multimap
multimap pos
for int i in [0 , length(in)[
put(pos , in[i] , i)//store the position of character in[i] in the map
return pos
Another helper-method generates a set of all indices of characters that only appear once in the input-string
define listSingle:
input: multimap pos
output: set
set single
for char c in keys(pos)
if length(get(pos , c)) == 1
add(single , get(get(pos , c) , 0))
return single
A method that creates the initial set of matching pairs. These pairs consist of substrings of length 1. The pairs themselves aren't specified; the algorithm only maps the text of a substring to all of its occurrences. (NOTE: I'm using "pair" here, though the correct term would be "set of size K".)
define listSinglePairs:
input: multimap pos , int K
output: multimap
multimap result
for char key in keys(pos)
list ind = get(pos , key)
if length(ind) < K
continue
string k = toString(key)
for int i in ind
put(result , k , substr(i , i))
return result
Furthermore a method is required to list all substrings that contain the same string as a given string:
define matches:
input: string in , substr sub , multimap charmap
output: list
list result
string txt = getstr(in , sub)
list candidates = get(charmap , txt[0])
for int i in [1 , length(txt)[
//increment all elements in candidates
for int c in [0 , size(candidates)[
replace(candidates , c , get(candidates , c) + 1)
list next = get(charmap , txt[i])
//since the indices of all candidates were incremented (index of the previous character in
//in) they now are equal to the indices of the next character in the substring, if it matches
candidates = intersection(candidates , next)
if isEmpty(candidates)
return EMPTY
//candidates now holds the indices of the end of all substrings that
//match the given substring -> convert to list of substr
for int i in candidates
add(result , substr(i - length(txt) + 1 , i))
return result
This is the main-routine that does the work:
define listMatches:
input: string in , int K
output: multimap
multimap chars = posMap(in)
set single = listSingle(chars)
multimap clvl = listSinglePairs(chars , K)
multimap result
while NOT isEmpty(clvl)
multimap nextlvl
for string sub in clvl
list pairs = get(clvl , sub)
list tmp
//extend all substrings by one character
//substrings that end in a character that only appears once in the
//input string can be ignored
for substr s in pairs
if s.end + 1 >= length(in) OR contains(single , s.end + 1)
continue
add(tmp , substr(s.start , s.end + 1))
//map all substrs to their respective string
while NOT isEmpty(tmp)
substr s = get(tmp , 0)
string txt = getstr(in , s)
list match = matches(in , s , chars)
//s and every other occurrence of txt are handled now; drop them from tmp
//so that the loop terminates
remove(tmp , s)
removeAll(tmp , match)
//this substring doesn't have enough pairs
if size(match) < K
continue
//save all matches as solution and candidates for the next round
for substr m in match
put(result , txt , m)
put(nextlvl , txt , m)
//overwrite candidates for the next round with the given candidates
clvl = nextlvl
return result
NOTE: this algorithm generates a map of all substrings, for which pairs exist to the position of the substrings.
I hope this is comprehensible (I'm horrible at explaining things).
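For checking small inputs, the counting itself can be sketched in a few lines of Python. This is a brute-force illustration and not the pruned algorithm described above: it assumes the answer for a given K is the sum of C(c, K) over the occurrence count c of every distinct substring, which agrees with the example in the question. The name count_ways is made up for this sketch, and materialising all O(N^2) substrings is far too slow and memory-hungry for N = 5000.

```python
from collections import Counter
from math import comb

MOD = 10**9 + 7

def count_ways(s, ks):
    """Answer each query K in ks for the string s (hypothetical helper)."""
    # Occurrence count of every distinct non-empty substring of s.
    occurrences = Counter(
        s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)
    )
    # freq[c] = number of distinct substrings that occur exactly c times,
    # so each query only has to look at the distinct counts.
    freq = Counter(occurrences.values())
    return [
        sum(n * comb(c, k) for c, n in freq.items() if c >= k) % MOD
        for k in ks
    ]

print(count_ways("ababa", [2, 3]))  # the example from the question: [7, 1]
```

Grouping by occurrence count means that each of the Q queries costs time proportional to the number of distinct counts rather than to the number of substrings.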
The question:
Given any string, add the least amount of characters possible to make it a palindrome in linear time.
I'm only able to come up with an O(N^2) solution.
Can someone help me with an O(N) solution?
1. Reverse the string.
2. Use a modified Knuth-Morris-Pratt to find the latest match (the simplest modification would be to just append the original string to the reversed string and ignore matches after len(string)).
3. Append the unmatched rest of the reversed string to the original.
Steps 1 and 3 are obviously linear, and step 2 is linear because Knuth-Morris-Pratt is.
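One way to read this recipe is sketched below in Python, for the append-only case. It is an interpretation, not necessarily the exact modification the answer has in mind: the KMP prefix function is computed over the reversed string, a separator, and the original string, so its last value is the length of the longest palindromic suffix. The function names are made up for this sketch, and None is used as the separator because it can never equal a character.

```python
def prefix_function(seq):
    """Classic KMP failure table: pi[i] is the length of the longest
    proper prefix of seq[:i + 1] that is also a suffix of it."""
    pi = [0] * len(seq)
    for i in range(1, len(seq)):
        k = pi[i - 1]
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi

def make_palindrome_by_appending(s):
    """Append the fewest characters to the end of s to get a palindrome."""
    if not s:
        return s
    rev = s[::-1]
    # A palindromic suffix of s is exactly a prefix of rev that is also a
    # suffix of s; the None separator keeps matches from crossing over.
    pi = prefix_function(list(rev) + [None] + list(s))
    longest_suffix_palindrome = pi[-1]
    unmatched = s[: len(s) - longest_suffix_palindrome]
    return s + unmatched[::-1]

print(make_palindrome_by_appending("abacac"))
```

The prefix function is built in a single pass over 2n + 1 items, so the whole procedure stays linear in the length of the input.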
If only appending is allowed
A Scala solution:
def isPalindrome(s: String) = s.view.reverse == s.view
def makePalindrome(s: String) =
s + s.take((0 to s.length).find(i => isPalindrome(s.substring(i))).get).reverse
If you're allowed to insert characters anywhere
Every palindrome can be viewed as a set of nested letter pairs.
a  n  n  a        b  o  b
|  |  |  |        |  *  |
|  ----  |        |     |
----------        -------
If the palindrome length n is even, we'll have n/2 pairs. If it is odd, we'll have n/2 full pairs and one single letter in the middle (let's call it a degenerated pair).
Let's represent them by pairs of string indexes - the left index counted from the left end of the string, and the right index counted from the right end of the string, both ends starting with index 0.
Now let's write pairs starting from the outer to the inner. So in our example:
anna: (0, 0) (1, 1)
bob: (0, 0) (1, 1)
In order to make any string a palindrome, we will go from both ends of the string one character at a time, and with every step, we'll eventually add a character to produce a correct pair of identical characters.
Example:
Assume the input word is "blob"
Pair (0, 0) is (b, b) ok, nothing to do, this pair is fine. Let's increase the counter.
Pair (1, 1) is (l, o). Doesn't match. So let's add "o" at position 1 from the left. Now our word became "bolob".
Pair (2, 2). We don't need to look even at the characters, because we're pointing at the same index in the string. Done.
Wait a moment, but we have a problem here: in point 2. we arbitrarily chose to add a character on the left. But we could as well add a character "l" on the right. That would produce "blolb", also a valid palindrome. So does it matter? Unfortunately it does because the choice in earlier steps may affect how many pairs we'll have to fix and therefore how many characters we'll have to add in the future steps.
Easy algorithm: search all the possibilities. That would give us an O(2^n) algorithm.
Better algorithm: use Dynamic Programming approach and prune the search space.
In order to keep things simpler, now we decouple inserting of new characters from just finding the right sequence of nested pairs (outer to inner) and fixing their alignment later. So for the word "blob" we have the following possibilities, both ending with a degenerated pair:
(0, 0) (1, 2)
(0, 0) (2, 1)
The more such pairs we find, the fewer characters we will have to add to fix the original string. Every full pair found gives us two characters we can reuse. Every degenerated pair gives us one character to reuse.
The main loop of the algorithm will iteratively evaluate pair sequences in such a way, that in step 1 all valid pair sequences of length 1 are found. The next step will evaluate sequences of length 2, the third sequences of length 3 etc. When at some step we find no possibilities, this means the previous step contains the solution with the highest number of pairs.
After each step, we will remove the pareto-suboptimal sequences. A sequence is suboptimal compared to another sequence of the same length, if its last pair is dominated by the last pair of the other sequence. E.g. sequence (0, 0)(1, 3) is worse than (0, 0)(1, 2). The latter gives us more room to find nested pairs and we're guaranteed to find at least all the pairs that we'd find for the former. However sequence (0, 0)(1, 2) is neither worse nor better than (0, 0)(2, 1). The one minor detail we have to beware of is that a sequence ending with a degenerated pair is always worse than a sequence ending with a full pair.
After bringing it all together:
def makePalindrome(str: String): String = {
/** Finds the pareto-minimum subset of a set of points (here pair of indices).
* Could be done in linear time, without sorting, but O(n log n) is not that bad ;) */
def paretoMin(points: Iterable[(Int, Int)]): List[(Int, Int)] = {
val sorted = points.toSeq.sortBy(identity)
(List.empty[(Int, Int)] /: sorted) { (result, e) =>
if (result.isEmpty || e._2 <= result.head._2)
e :: result
else
result
}
}
/** Find all pairs directly nested within a given pair.
* For performance reasons tries to not include suboptimal pairs (pairs nested in any of the pairs also in the result)
* although it wouldn't break anything as prune takes care of this. */
def pairs(left: Int, right: Int): Iterable[(Int, Int)] = {
val builder = List.newBuilder[(Int, Int)]
var rightMax = str.length
for (i <- left until (str.length - right)) {
rightMax = math.min(str.length - left, rightMax)
val subPairs =
for (j <- right until rightMax if str(i) == str(str.length - j - 1)) yield (i, j)
subPairs.headOption match {
case Some((a, b)) => rightMax = b; builder += ((a, b))
case None =>
}
}
builder.result()
}
/** Builds sequences of size n+1 from sequence of size n */
def extend(path: List[(Int, Int)]): Iterable[List[(Int, Int)]] =
for (p <- pairs(path.head._1 + 1, path.head._2 + 1)) yield p :: path
/** Whether full or degenerated. Full-pairs save us 2 characters, degenerated save us only 1. */
def isFullPair(pair: (Int, Int)) =
pair._1 + pair._2 < str.length - 1
/** Removes pareto-suboptimal sequences */
def prune(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val allowedHeads = paretoMin(sequences.map(_.head)).toSet
val containsFullPair = allowedHeads.exists(isFullPair)
sequences.filter(s => allowedHeads.contains(s.head) && (isFullPair(s.head) || !containsFullPair))
}
/** Dynamic-Programming step */
@scala.annotation.tailrec
def search(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val nextStage = prune(sequences.flatMap(extend))
nextStage match {
case List() => sequences
case x => search(nextStage)
}
}
/** Converts a sequence of nested pairs to a palindrome */
def sequenceToString(sequence: List[(Int, Int)]): String = {
val lStr = str
val rStr = str.reverse
val half =
(for (List(start, end) <- sequence.reverse.sliding(2)) yield
lStr.substring(start._1 + 1, end._1) + rStr.substring(start._2 + 1, end._2) + lStr(end._1)).mkString
if (isFullPair(sequence.head))
half + half.reverse
else
half + half.reverse.substring(1)
}
sequenceToString(search(List(List((-1, -1)))).head)
}
Note: The code does not list all the palindromes, but gives only one example, and it is guaranteed it has the minimum length. There usually are more palindromes possible with the same minimum length (O(2^n) worst case, so you probably don't want to enumerate them all).
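To cross-check the number of inserted characters, a much simpler (but quadratic) formulation can be used. The Python sketch below is the textbook interval DP, not the Pareto-pruning search above, and it only returns the count, not the palindrome itself; min_insertions is a name invented for this illustration.

```python
def min_insertions(s):
    """Minimum number of characters to insert anywhere in s so that it
    becomes a palindrome (interval DP, O(n^2) time and space)."""
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = minimum insertions needed for the substring s[i..j].
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                # The outer characters already form a pair.
                dp[i][j] = dp[i + 1][j - 1] if length > 2 else 0
            else:
                # Pair up either s[i] or s[j] with one inserted character.
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]

print(min_insertions("blob"))
```

For "blob" this gives 1, matching the single character added in the "bolob" and "blolb" examples.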
O(n) time solution.
Algorithm:
We need to find the longest palindrome within the given string that contains the last character. Then add all the characters that are not part of the palindrome to the back of the string in reverse order.
Key point:
In this problem, the longest palindrome in the given string MUST contain the last character.
ex:
input: abacac
output: abacacaba
Here the longest palindrome in the input that contains the last letter is "cac". Therefore add all the letters before "cac" to the back in reverse order to make the entire string a palindrome.
written in c# with a few test cases commented out
static public void makePalindrome()
{
//string word = "aababaa";
//string word = "abacbaa";
//string word = "abcbd";
//string word = "abacac";
//string word = "aBxyxBxBxyxB";
//string word = "Malayal";
string word = "abccadac";
int j = word.Length - 1;
int mark = j;
bool found = false;
for (int i = 0; i < j; i++)
{
char cI = word[i];
char cJ = word[j];
if (cI == cJ)
{
found = true;
j--;
if(mark > i)
mark = i;
}
else
{
if (found)
{
found = false;
i--;
}
j = word.Length - 1;
mark = j;
}
}
for (int i = mark-1; i >=0; i--)
word += word[i];
Console.Write(word);
}
}
Note that this code will give you the solution for the least number of letters to APPEND TO THE BACK to make the string a palindrome. If you want to append to the front, just have a 2nd loop that goes the other way. This will make the algorithm O(n) + O(n) = O(n). If you want a way to insert letters anywhere in the string to make it a palindrome, then this code will not work for that case.
I believe @Chronical's answer is wrong, as it seems to be for the best-case scenario, not the worst case, which is what big-O complexity is computed from. I welcome a proof, but the "solution" doesn't actually describe a valid answer.
KMP finds a matching substring in O(n * 2k) time, where n is the length of the input string and k is the length of the substring we're searching for, but it does not tell you in O(n) time what the longest palindrome in the input string is.
To solve this problem, we need to find the longest palindrome at the end of the string. If this longest suffix palindrome is of length x, the minimum number of characters to add is n - x. E.g. the string aaba's longest suffix palindrome is aba of length 3, thus our answer is 1. The algorithm to find out if a string is a palindrome takes O(n) time, whether using KMP or the more efficient and simple algorithm (O(n/2)):
Take two pointers, one at the first character and one at the last character
Compare the characters at the pointers, if they're equal, move each pointer inward, otherwise return false
When the pointers point to the same index (odd string length), or have overlapped (even string length), return true
Using the simple algorithm, we start from the entire string and check if it's a palindrome. If it is, we return 0, and if not, we check the string string[1...end], string[2...end] until we have reached a single character and return n - 1. This results in a runtime of O(n^2).
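The simple algorithm just described can be sketched in Python as follows. The function names are made up for this illustration; the scan is the quadratic one from the paragraph above, not a linear-time method.

```python
def is_palindrome(s, lo=0):
    """Two-pointer check of whether s[lo:] is a palindrome."""
    hi = len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True

def min_chars_to_append(s):
    """Length of s minus its longest palindromic suffix, found by testing
    s[0:], s[1:], ... in turn (O(n^2) in the worst case)."""
    for start in range(len(s)):
        if is_palindrome(s, start):
            return start
    return 0  # only reached for the empty string

print(min_chars_to_append("aaba"))
```

For "aaba" the first suffix that passes the check is "aba", so one character has to be appended.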
Splitting up the KMP algorithm into
Build table
Search for longest suffix palindrome
Building the table takes O(n) time, and then each check of "are you a palindrome" for each substring from string[0...end], string[1...end], ..., string[end - 2...end] each takes O(n) time. k in this case is the same factor of n that the simple algorithm takes to check each substring, because it starts as k = n, then goes through k = n - 1, k = n - 2... just the same as the simple algorithm did.
TL; DR:
KMP can tell you if a string is a palindrome in O(n) time, but that does not supply an answer to the question, because you have to check if all substrings string[0...end], string[1...end], ..., string[end - 2...end] are palindromes, resulting in the same (but actually worse) runtime as a simple palindrome-check algorithm.
#include<iostream>
#include<string>
using std::cout;
using std::endl;
using std::cin;
int main() {
std::string word, left("");
cin >> word;
size_t start, end;
for (start = 0, end = word.length()-1; start < end; end--) {
if (word[start] != word[end]) {
left.append(word.begin()+end, 1 + word.begin()+end);
continue;
}
left.append(word.begin()+start, 1 + word.begin()+start), start++;
}
cout << left << ( start == end ? std::string(word.begin()+end, 1 + word.begin()+end) : "" )
<< std::string(left.rbegin(), left.rend()) << endl;
return 0;
}
Don't know if it appends the minimum number, but it produces palindromes
Explained:
We will start at both ends of the given string and iterate inwards towards the center.
At each iteration, we check if each letter is the same, i.e. word[start] == word[end]?.
If they are the same, we append a copy of word[start] to another string called left, which, as its name suggests, will serve as the left-hand side of the new palindrome when the iteration is complete. Then we move both variables, start++ and end--, towards the center.
If they are not the same, we append a copy of word[end] to the same string left and move only end towards the center.
And this is the basics of the algorithm until the loop is done.
When the loop is finished, one last check is done to make sure that if we got an odd length palindrome, we append the middle character to the middle of the new palindrome formed.
Note that if you decide to append the opposite characters to the string left, the opposite about everything in the code becomes true; i.e. which index is incremented at each iteration and which is incremented when a match is found, order of printing the palindrome, etc. I don't want to have to go through it again but you can try it and see.
The running complexity of this code should be O(N) assuming that append method of the std::string class runs in constant time.
If someone wants to solve this in Ruby, the solution can be very simple:
str = 'xcbc' # Any string that you want.
arr1 = str.split('')
arr2 = arr1.reverse
count = 0
while(str != str.reverse)
count += 1
arr1.insert(count-1, arr2[count-1])
str = arr1.join('')
end
puts str
puts str.length - arr2.count
I am assuming that you cannot replace or remove any existing characters?
A good start would be reversing one of the strings and finding the longest-common-substring (LCS) between the reversed string and the other string. Since it sounds like this is a homework or interview question, I'll leave the rest up to you.
Here see this solution
This is better than O(N^2)
The problem is subdivided into many other subproblems
ex:
original "tostotor"
reversed "rototsot"
Here 2nd position is 'o' so dividing in to two problems by breaking in to "t" and "ostot" from the original string
For 't':solution is 1
For 'ostot':solution is 2 because LCS is "tot" and characters need to be added are "os"
so total is 2+1 = 3
def shortPalin(S):
    # strip the characters that already match at both ends
    k = 0
    lis = len(S)
    for i in range(len(S) // 2):
        if S[i] == S[lis-1-i]:
            k = k + 1
        else:
            break
    S = S[k:lis-k]
    lis = len(S)
    prev = 0
    w = len(S)
    tot = 0
    # split into subproblems wherever a character matches its mirror position
    for i in range(len(S)):
        if i >= w:
            break
        elif S[i] == S[lis-1-i]:
            tot = tot + lcs(S[prev:i])
            prev = i
            w = lis-1-i
    tot = tot + lcs(S[prev:i])
    return tot

def lcs(S):
    # characters to add = length minus the LCS of S and its reverse
    if len(S) == 1:
        return 1
    li = len(S)
    X = [0 for x in range(len(S)+1)]
    Y = [0 for l in range(len(S)+1)]
    for i in range(len(S)-1, -1, -1):
        for j in range(len(S)-1, -1, -1):
            if S[i] == S[li-1-j]:
                X[j] = 1 + Y[j+1]
            else:
                X[j] = max(Y[j], X[j+1])
        Y = X
    return li - X[0]

print(shortPalin("tostotor"))
Using Recursion
#include <iostream>
using namespace std;
int length( char str[])
{ int l=0;
for( int i=0; str[i]!='\0'; i++, l++);
return l;
}
int palin(char str[],int len)
{ static int cnt;
int s=0;
int e=len-1;
while(s<e){
if(str[s]!=str[e]) {
cnt++;
return palin(str+1,len-1);}
else{
s++;
e--;
}
}
return cnt;
}
int main() {
char str[100];
cin.getline(str,100);
int len = length(str);
cout<<palin(str,len);
}
Solution with O(n) time complexity
public static void main(String[] args) {
String givenStr = "abtb";
String palindromeStr = covertToPalindrome(givenStr);
System.out.println(palindromeStr);
}
private static String covertToPalindrome(String str) {
char[] strArray = str.toCharArray();
int low = 0;
int high = strArray.length - 1;
int subStrIndex = -1;
while (low < high) {
if (strArray[low] == strArray[high]) {
high--;
} else {
high = strArray.length - 1;
subStrIndex = low;
}
low++;
}
return str + (new StringBuilder(str.substring(0, subStrIndex+1))).reverse().toString();
}
// string to append to convert it to a palindrome
public static void main(String args[])
{
String s=input();
System.out.println(min_operations(s));
}
static String min_operations(String str)
{
int i=0;
int j=str.length()-1;
String ans="";
while(i<j)
{
if(str.charAt(i)!=str.charAt(j))
{
ans=ans+str.charAt(i);
}
if(str.charAt(i)==str.charAt(j))
{
j--;
}
i++;
}
StringBuffer sd=new StringBuffer(ans);
sd.reverse();
return (sd.toString());
}
Is there any algorithm that can be used to find the most common phrases (or substrings) in a string? For example, the following string would have "hello world" as its most common two-word phrase:
"hello world this is hello world. hello world repeats three times in this string!"
In the string above, the most common string (after the empty string character, which repeats an infinite number of times) would be the space character .
Is there any way to generate a list of common substrings in this string, from most common to least common?
This task is similar to the Nussinov algorithm, and actually even simpler, as we do not allow any gaps, insertions or mismatches in the alignment.
For the string A having the length N, define a F[-1 .. N, -1 .. N] table and fill in using the following rules:
for i = 0 to N
for j = 0 to N
if i != j
{
if A[i] == A[j]
F[i,j] = F [i-1,j-1] + 1;
else
F[i,j] = 0;
}
For instance, for B A O B A B:
This runs in O(n^2) time. The largest values in the table now point to the end positions of the longest self-matching subsequences (i is the end of one occurrence, j of another). Initially, the array is assumed to be zero-initialized. I have added a condition to exclude the diagonal, which is the longest but probably not an interesting self-match.
Thinking more, this table is symmetric over diagonal so it is enough to compute only half of it. Also, the array is zero initialized so assigning zero is redundant. That remains
for i = 0 to N
for j = i + 1 to N
if A[i] == A[j]
F[i,j] = F [i-1,j-1] + 1;
Shorter but potentially more difficult to understand. The computed table contains all matches, short and long. You can add further filtering as you need.
In the next step, you need to recover the strings by following the non-zero cells up and to the left along the diagonal. During this step it is also trivial to use a hashmap to count the number of self-similarity matches for the same string. With a normal string and a normal minimal length, only a small number of table cells will be processed through this map.
I think that using hashmap directly actually requires O(n^3) as the key strings at the end of access must be compared somehow for equality. This comparison is probably O(n).
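A minimal Python sketch of the half-table variant is given below, assuming we only want the longest repeated substrings rather than every match. The name longest_repeats and the extra row and column of zeros (standing in for the -1 index of F) are choices made for this illustration; like the table itself, it allows the two occurrences to overlap.

```python
def longest_repeats(a):
    """Return (length, sorted list) of the longest substrings of a that
    occur at least twice, using the upper half of the self-match table."""
    n = len(a)
    # f[i + 1][j + 1] = length of the common run ending at a[i] and a[j];
    # row 0 and column 0 play the role of index -1 in the description.
    f = [[0] * (n + 1) for _ in range(n + 1)]
    best, found = 0, set()
    for i in range(n):
        for j in range(i + 1, n):  # skip the diagonal and the lower half
            if a[i] == a[j]:
                cur = f[i][j] + 1
                f[i + 1][j + 1] = cur
                if cur > best:
                    best, found = cur, {a[j - cur + 1 : j + 1]}
                elif cur == best:
                    found.add(a[j - cur + 1 : j + 1])
    return best, sorted(found)

print(longest_repeats("BAOBAB"))
```

For "BAOBAB" the largest cell has value 2 and corresponds to "BA", which ends at index 1 and again at index 4.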
Python. This is somewhat quick and dirty, with the data structures doing most of the lifting.
from collections import Counter
accumulator = Counter()
text = 'hello world this is hello world.'
for length in range(1, len(text) + 1):
    # + 1 so that the substring ending at the last character is counted too
    for start in range(len(text) - length + 1):
        accumulator[text[start:start + length]] += 1
The Counter structure is a hash-backed dictionary designed for counting how many times you've seen something. Adding to a nonexistent key will create it, while retrieving a nonexistent key will give you zero instead of an error. So all you have to do is iterate over all the substrings.
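If whole-word phrases are wanted rather than raw character substrings, the same Counter idea can be applied to word n-grams. The sketch below is one possible way to do that, not part of the answer above: the function name, the lowercasing, and the regular expression used to split words are all assumptions made for the example.

```python
import re
from collections import Counter

def common_phrases(text, n):
    """Count every run of n consecutive words and return (phrase, count)
    pairs ordered from most common to least common."""
    # Assumption: a word is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(
        " ".join(words[i : i + n]) for i in range(len(words) - n + 1)
    )
    return grams.most_common()

text = ("hello world this is hello world. "
        "hello world repeats three times in this string!")
print(common_phrases(text, 2)[:3])
```

Counter.most_common() already returns the entries sorted by descending count, which gives the "most common to least common" ordering asked for in the question. Note that this sketch ignores punctuation, so phrases may span sentence boundaries.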
This is just pseudocode, and maybe it isn't the most beautiful solution, but I would solve it like this:
function separateWords(String incomingString) returns StringArray{
//Code
}
function findMax(Map map) returns String{
//Code
}
function mainAlgorithm(String incomingString) returns String{
StringArray sArr = separateWords(incomingString);
Map<String, Integer> map; //init with no content
for(word: sArr){
Integer count = map.get(word);
if(count == null){
map.put(word,1);
} else {
//remove if neccessary
map.put(word,count++);
}
}
return findMax(map);
}
Where map can contain a key, value pairs like in Java HashMap.
Every occurrence of a substring of length >= 2 contains an occurrence of its first two characters, so some substring of length 2 occurs at least as often as any longer substring. We therefore only need to investigate substrings of length 2.
val s = "hello world this is hello world. hello world repeats three times in this string!"
val li = s.sliding (2, 1).toList
// li: List[String] = List(he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " t", th, hi, is, "s ", " i", is, "s ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, d., ". ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " r", re, ep, pe, ea, at, ts, "s ", " t", th, hr, re, ee, "e ", " t", ti, im, me, es, "s ", " i", in, "n ", " t", th, hi, is, "s ", " s", st, tr, ri, in, ng, g!)
val uniques = li.toSet
uniques.toList.map (u => li.count (_ == u))
// res18: List[Int] = List(1, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, 1, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 3, 3, 2, 4, 1, 2, 2, 1)
uniques.toList(6)
// res19: String = "s "
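For comparison, the same length-2 sliding window can be sketched in Python. The function name `most_common_bigram` is invented for this example; it returns both the pair and its count, and returns None for inputs shorter than two characters.

```python
from collections import Counter


def most_common_bigram(text):
    """Return (bigram, count) for the most frequent length-2 substring."""
    # Sliding window of width 2, step 1, like s.sliding(2, 1) above.
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return bigrams.most_common(1)[0] if bigrams else None


s = "hello world this is hello world. hello world repeats three times in this string!"
print(most_common_bigram(s))
```

This counts each window in a single pass, rather than calling count once per unique pair as the Scala version does.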
Perl, O(n²) solution
my $str = "hello world this is hello world. hello world repeats three times in this string!";
my @words = split(/[^a-z]+/i, $str);
my ($display,$ix,$i,%ocur) = 10;
# calculate
for ($ix=0 ; $ix<=$#words ; $ix++) {
    for ($i=$ix ; $i<=$#words ; $i++) {
        $ocur{ join(':', @words[$ix .. $i]) }++;
    }
}
# display
foreach (sort { my $c = $ocur{$b} <=> $ocur{$a} ; return $c ? $c : split(/:/,$b)-split(/:/,$a); } keys %ocur) {
    print "$_: $ocur{$_}\n";
    last if !--$display;
}
This displays the 10 most common substrings (in case of a tie, the longest chain of words is shown first). Change $display to 1 to get only the top result. There are n(n+1)/2 iterations.
Suppose I have an alphabet of 'abcd' and a maximum string length of 3. This gives me 85 possible strings, including the empty string. What I would like to do is map an integer in the range [0,85) to a string in my string space without using a lookup table. Something like this:
0 => ''
1 => 'a'
...
4 => 'd'
5 => 'aa'
6 => 'ab'
...
84 => 'ddd'
This is simple enough to do if the string is fixed length using this pseudocode algorithm (n is the input integer):
str = ''
for i in 0..maxLen do
    str += alphabet[n % alphabet.length]
    n /= alphabet.length
done
I can't figure out a good, efficient way of doing it, though, when the length of the string can be anywhere in the range [0,3]. This is going to be running in a tight loop with random inputs so I would like to avoid any unnecessary branching or lookups.
Shift your index by one and ignore the empty string temporarily. So you'd map 0 -> "a", ..., 83 -> "ddd".
Then the mapping is
n -> base-4-encode(n - number of shorter strings)
With 26 symbols, that's the Excel-column-numbering scheme.
With s symbols, there are s + s^2 + ... + s^l nonempty strings of length at most l. Leaving aside the trivial case s = 1, that sum is (a partial sum of a geometric series) s*(s^l - 1)/(s-1).
So, given n, find the largest l such that s*(s^l - 1)/(s-1) <= n, i.e.
l = floor(log((s-1)*n/s + 1) / log(s))
Then let m = n - s*(s^l - 1)/(s-1) and encode m as an (l+1)-symbol string in base s ('a' ~> 0, 'b' ~> 1, ...), padding with leading 'a's to the full width.
For the problem including the empty string, map 0 to the empty string and for n > 0 encode n-1 as above.
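A minimal Python sketch of the formula-based mapping, under a few assumptions: the names `shorter_count`, `encode_nonempty` and `encode` are made up for this example, the alphabet has at least two symbols, and l is found with an integer loop rather than the floating-point logarithm so that rounding cannot produce an off-by-one.

```python
def shorter_count(s, l):
    """Number of nonempty strings of length at most l over s symbols (s >= 2)."""
    return s * (s**l - 1) // (s - 1)


def encode_nonempty(n, alphabet="abcd"):
    """Map 0 -> 'a', 1 -> 'b', ..., skipping the empty string."""
    s = len(alphabet)
    # Largest l with s*(s^l - 1)/(s-1) <= n.
    l = 0
    while shorter_count(s, l + 1) <= n:
        l += 1
    m = n - shorter_count(s, l)
    # Write m in base s using exactly l + 1 digits.
    digits = []
    for _ in range(l + 1):
        m, d = divmod(m, s)
        digits.append(alphabet[d])
    return "".join(reversed(digits))


def encode(n, alphabet="abcd"):
    """Same mapping with 0 reserved for the empty string."""
    return "" if n == 0 else encode_nonempty(n - 1, alphabet)


print([encode(n) for n in (0, 1, 4, 5, 6, 84)])
```

The loop runs at most once per output character, so it costs no more than producing the string itself.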
In Haskell
encode cs n = reverse $ encode' n where
len = length cs
encode' 0 = ""
encode' n = (cs !! ((n-1) `mod` len)) : encode' ((n-1) `div` len)
Check:
*Main> map (encode "abcd") [0..84]
["","a","b","c","d","aa","ab","ac","ad","ba","bb","bc","bd","ca","cb","cc","cd","da","db","dc","dd","aaa","aab","aac","aad","aba","abb","abc","abd","aca","acb","acc","acd","ada","adb","adc","add","baa","bab","bac","bad","bba","bbb","bbc","bbd","bca","bcb","bcc","bcd","bda","bdb","bdc","bdd","caa","cab","cac","cad","cba","cbb","cbc","cbd","cca","ccb","ccc","ccd","cda","cdb","cdc","cdd","daa","dab","dac","dad","dba","dbb","dbc","dbd","dca","dcb","dcc","dcd","dda","ddb","ddc","ddd"]
Figure out the number of strings for each length: N0, N1, N2 & N3 (actually, you won't need N3). Then, use those values to partition your space of integers: 0..N0-1 are length 0, N0..N0+N1-1 are length 1, etc. Within each partition, you can use your fixed-length algorithm.
At worst, you've greatly reduced the size of your lookup table.
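The partitioning can be sketched like this in Python. It is an illustration only: `make_encoder` and the `starts` table are names invented here, the table holds the first integer of each length partition, and inputs outside [0, 85) are not checked.

```python
from bisect import bisect_right
from itertools import accumulate


def make_encoder(alphabet="abcd", max_len=3):
    """Build an encoder that first picks the length partition, then encodes."""
    s = len(alphabet)
    # starts[k] is the first integer that maps to a string of length k.
    # For s = 4 and max_len = 3 this is [0, 1, 5, 21].
    starts = [0] + list(accumulate(s**k for k in range(max_len)))

    def encode(n):
        # Find the partition containing n, then its offset inside it.
        length = bisect_right(starts, n) - 1
        offset = n - starts[length]
        # Fixed-length base-s encoding of the offset.
        digits = []
        for _ in range(length):
            offset, d = divmod(offset, s)
            digits.append(alphabet[d])
        return "".join(reversed(digits))

    return encode


encode = make_encoder()
print([encode(n) for n in (0, 1, 4, 5, 6, 84)])
```

The table has only max_len + 1 entries, so for short strings a linear scan over it would do just as well as the binary search.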
Here is a C# solution:
static string F(int x, int alphabetSize)
{
    string ret = "";
    while (x > 0)
    {
        x--;
        ret = (char)('a' + (x % alphabetSize)) + ret;
        x /= alphabetSize;
    }
    return ret;
}
If you want to optimize this further, you may want to do something to avoid the string concatenations. For example, you could store the result into a preallocated char[] array.