Algorithm to find the most common substrings in a string

Is there any algorithm that can be used to find the most common phrases (or substrings) in a string? For example, the following string would have "hello world" as its most common two-word phrase:
"hello world this is hello world. hello world repeats three times in this string!"
In the string above, the most common substring (after the empty string, which occurs an infinite number of times) would be the space character.
Is there any way to generate a list of common substrings in this string, from most common to least common?

This task is similar to the Nussinov algorithm, and actually even simpler, since we do not allow any gaps, insertions or mismatches in the alignment.
For a string A of length N, define a table F[-1 .. N, -1 .. N] and fill it in using the following rules:
for i = 0 to N
    for j = 0 to N
        if i != j
        {
            if A[i] == A[j]
                F[i,j] = F[i-1,j-1] + 1;
            else
                F[i,j] = 0;
        }
For instance, for B A O B A B:
This runs in O(n^2) time. The largest values in the table now point to the end positions of the longest self-matching substrings (i is the end of one occurrence, j the end of another). The array is assumed to be zero-initialized at the beginning. I have added a condition to exclude the diagonal, which is the longest but probably not an interesting self-match.
Thinking about it more, this table is symmetric about the diagonal, so it is enough to compute only half of it. Also, the array is zero-initialized, so assigning zero is redundant. What remains is:
for i = 0 to N
    for j = i + 1 to N
        if A[i] == A[j]
            F[i,j] = F[i-1,j-1] + 1;
This is shorter but potentially more difficult to understand. The computed table contains all matches, short and long. You can add further filtering as you need.
In the next step, you need to recover the strings by following the diagonal up and to the left from the non-zero cells. During this step it is also trivial to use a hashmap to count the number of self-similarity matches for the same string. With a normal string and a sensible minimum length, only a small number of table cells will be processed through this map.
I think that using a hashmap directly actually requires O(n^3) time, as the key strings must somehow be compared for equality on each access. This comparison is probably O(n).
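As a rough illustration of the table described above, here is a minimal Python sketch that fills only the upper half and recovers the longest repeated substring. The function name longest_repeated_substring and the 1-based padding (row and column 0 stand in for index -1) are assumptions made for this example, and overlapping occurrences are allowed.

```python
def longest_repeated_substring(a):
    """Return the longest substring of `a` that occurs at least twice."""
    n = len(a)
    # f[i][j] = length of the common run ending at a[i-1] and a[j-1].
    # Row 0 and column 0 are zero padding (the F[-1, ...] border).
    f = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):  # upper half only, diagonal excluded
            if a[i - 1] == a[j - 1]:
                f[i][j] = f[i - 1][j - 1] + 1
                if f[i][j] > best_len:
                    best_len, best_end = f[i][j], i
    return a[best_end - best_len:best_end]
```

For the string B A O B A B this picks out "BA", which ends at index 1 in one occurrence and at index 4 in the other.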

Python. This is somewhat quick and dirty, with the data structures doing most of the lifting.
from collections import Counter
accumulator = Counter()
text = 'hello world this is hello world.'
for length in range(1, len(text) + 1):
    # + 1 so that the substring ending at the last character is counted too
    for start in range(len(text) - length + 1):
        accumulator[text[start:start + length]] += 1
The Counter structure is a hash-backed dictionary designed for counting how many times you've seen something. Adding to a nonexistent key will create it, while retrieving a nonexistent key will give you zero instead of an error. So all you have to do is iterate over all the substrings.
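To get the list ordered from most common to least common, Counter.most_common() can be used on the result. The sketch below wraps the same idea in a function; the name common_substrings and the min_len/max_len parameters are assumptions added here so that one-character substrings can be filtered out.

```python
from collections import Counter

def common_substrings(text, min_len=2, max_len=None):
    """Count every substring whose length lies in [min_len, max_len]."""
    if max_len is None:
        max_len = len(text)
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(text) - length + 1):
            counts[text[start:start + length]] += 1
    # most_common() returns (substring, count) pairs, highest count first
    return counts.most_common()

ranked = common_substrings("hello world this is hello world.", 11, 11)
```

There are O(n^2) substrings, so this is only practical for short texts or a narrow length range.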

Just pseudocode, and maybe this isn't the most beautiful solution, but I would solve it like this:
function separateWords(String incomingString) returns StringArray {
    // Code
}
function findMax(Map map) returns String {
    // Code
}
function mainAlgorithm(String incomingString) returns String {
    StringArray sArr = separateWords(incomingString);
    Map<String, Integer> map; // init with no content
    for (word : sArr) {
        Integer count = map.get(word);
        if (count == null) {
            map.put(word, 1);
        } else {
            // overwrite the old entry; count++ would store the old value again
            map.put(word, count + 1);
        }
    }
    return findMax(map);
}
Here map holds key/value pairs, like a Java HashMap.
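The same word-counting idea extends to multi-word phrases. The following Python sketch is one possible reading of it; the function name common_phrases, the words_per_phrase parameter, and the choice to lowercase the text and treat only runs of letters as words are assumptions made for this example.

```python
import re
from collections import Counter

def common_phrases(text, words_per_phrase=2):
    """Rank phrases of `words_per_phrase` consecutive words by frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    # zip over shifted copies of the word list yields consecutive word tuples
    phrases = zip(*(words[i:] for i in range(words_per_phrase)))
    return Counter(" ".join(p) for p in phrases).most_common()

text = ("hello world this is hello world. "
        "hello world repeats three times in this string!")
ranked = common_phrases(text)
```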

Every substring of length >= 2 contains a substring of length 2 that occurs in the text at least as many times, so we only need to investigate substrings of length 2.
val s = "hello world this is hello world. hello world repeats three times in this string!"
val li = s.sliding (2, 1).toList
// li: List[String] = List(he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " t", th, hi, is, "s ", " i", is, "s ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, d., ". ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " r", re, ep, pe, ea, at, ts, "s ", " t", th, hr, re, ee, "e ", " t", ti, im, me, es, "s ", " i", in, "n ", " t", th, hi, is, "s ", " s", st, tr, ri, in, ng, g!)
val uniques = li.toSet
uniques.toList.map (u => li.count (_ == u))
// res18: List[Int] = List(1, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, 1, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 3, 3, 2, 4, 1, 2, 2, 1)
uniques.toList(6)
// res19: String = "s "

Perl, O(n²) solution
my $str = "hello world this is hello world. hello world repeats three times in this string!";
my @words = split(/[^a-z]+/i, $str);
my ($display,$ix,$i,%ocur) = 10;
# calculate
for ($ix=0 ; $ix<=$#words ; $ix++) {
    for ($i=$ix ; $i<=$#words ; $i++) {
        $ocur{ join(':', @words[$ix .. $i]) }++;
    }
}
# display
foreach (sort { my $c = $ocur{$b} <=> $ocur{$a} ; return $c ? $c : split(/:/,$b)-split(/:/,$a); } keys %ocur) {
    print "$_: $ocur{$_}\n";
    last if !--$display;
}
This displays the 10 most common substrings (in case of a tie, the longest chain of words is shown first). Change $display to 1 to get only the top result. There are n(n+1)/2 iterations.

Related

How to elegantly and imperatively generate the nth string of an alphabet?

Given an alphabet such as: ["a","b","c","d"], the sequence of all strings made up from characters of that alphabet is:
""
"a"
"b"
"c"
"d"
"aa"
"ab"
"ac"
...
Haskell can generate the nth element of that sequence elegantly and recursively:
nth :: Int -> String
nth n = reverse $ alphabet !! n where
    alphabet = [""] ++ concatMap (\ str -> map (: str) "abcd") alphabet
But that's inefficient. Using base conversions, you could try generating it imperatively as (using JavaScript just for demonstration):
function nth(n) {
    var str = "";
    while (n > 0) {
        str += String.fromCharCode(97 + n % 4);
        n = Math.floor(n / 4);
    }
    return str;
};
for (var i = 0; i < 64; ++i) {
    console.log(nth(i));
}
But that actually generates the following sequence:
""
"b"
"c"
"d"
"ab"
"bb"
"cb"
"db"
"ac"
"bc"
"cc"
"dc"
"ad"
"bd"
"cd"
"dd"
"aab"
Which is not what was desired: notice the missing "a", "aa", "ba", etc. I'm probably missing some simple operation that fixes the imperative implementation. Thus, my question is: is there any elegant way to imperatively generate the nth string of an alphabet?

Insert n-- at the beginning of the while loop. If you want the results in short lexicographic order, reverse the string before printing.
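A minimal Python sketch of that fix, assuming the alphabet is passed in as a string (the alphabet parameter is an addition for this example; the original hard-codes four letters):

```python
def nth(n, alphabet="abcd"):
    """Return the n-th string in shortlex order over `alphabet` (0 -> "")."""
    k = len(alphabet)
    chars = []
    while n > 0:
        n -= 1  # the decrement that the original loop was missing
        chars.append(alphabet[n % k])
        n //= k
    # digits were produced least significant first, so reverse them
    return "".join(reversed(chars))
```

The decrement turns ordinary base-4 conversion into a bijective numeration in which no digit plays the role of a leading zero, which is why "a", "aa" and "ba" are no longer skipped.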

Find subarray with given sum

I am trying to find a subarray with a given sum in a functional style.
The code I wrote is not very functional. Can someone help make it more functional?
Problem: Given an unsorted array of nonnegative integers, find a contiguous subarray which adds up to a given number.
Input: arr[] = {1, 4, 20, 3, 10, 5}, sum = 33
Output: Sum found between indexes 2 and 4
Input: arr[] = {1, 4, 0, 0, 3, 10, 5}, sum = 7
Output: Sum found between indexes 1 and 4
I could solve this problem with a brute-force approach, but I am looking for a more efficient functional solution.
val sumList = list.foldLeft(List(0), 0)((l, r) => (l._1 :+ (l._2+r), l._2 + r))._1.drop(1)
//Brute force approach
sumList.zipWithIndex.combinations(2).toList.collectFirst({
case i if i(1)._1 - i(0)._1 == sum => i
}) match {
case Some(List(x, y)) => println("elements which form the given sum are => "+ list.drop(x._2+1).take(y._2-x._2))
case _ => println("couldn't find elements which satisfy the given condition")
}
Algorithm: Initialize a variable curr_sum as the first element. curr_sum holds the sum of the current subarray. Start from the second element and add the elements one by one to curr_sum. If curr_sum becomes equal to sum, print the solution. If curr_sum exceeds sum, remove elements from the start of the current subarray while curr_sum is greater than sum.
val list:List[Int] = List(1, 4, 20, 3, 10, 5)
val sum = 33
val (totalSum, start, end, isSumFound) = list.zipWithIndex.drop(1).foldLeft(list.head, 0, 1, false)((l, r) =>
if(!l._4) {
val tempSum = l._1 + r._1
if (tempSum == sum){
(sum, l._2, r._2, true)
} else if(tempSum > sum){
var (curSum, curIndex) = (tempSum, l._2)
while(curSum > sum && curIndex < list.length-1){
curSum = curSum - list(curIndex)
curIndex = l._2 +1
}
(curSum, curIndex, r._2, curSum == sum)
} else {
(tempSum, l._2, r._2, false)
}
}else
l
)
if(isSumFound || totalSum == sum){
println("elements which form the given sum are => "+ list.drop(start+1).take(end-start))
}else{
println("couldn't find elements which satisfy the given condition")
}

val list:List[Int] = List(1, 4, 20, 3, 10, 5)
val sum = 33
A method that returns an iterator of sublists: first the ones that start with the first element, then the ones starting with the second, and so on.
def subLists[T](xs:List[T]):Iterator[List[T]] =
if (xs == Nil) Iterator.empty
else xs.inits ++ subLists(xs.tail)
Find the first list with the correct sum
val ol = subLists(list).collectFirst{ case x if x.sum == sum => x}
Then find the index again, and print the result
ol match {
case None => println("No such subsequence")
case Some(l) => val i = list.indexOfSlice(l)
println("Sequence of sum " + sum +
" found between " + i +
" and " + (i + l.length - 1))
}
//> Sequence of sum 33 found between 2 and 4
(you could keep track of the index associated with the sublist when building the iterator, but that seems more trouble than it is worth, and reduces the general usefulness of subLists)
EDIT: Here's a version of the code you posted that's more "functional". But I think my first version is clearer - it's simpler to separate the concerns of generating the sequences from checking their sums
val sumList = list.scanLeft(0){_ + _}
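// sumList has list.length + 1 entries, so i may reach list.length;
// j stops before i so that empty ranges are never reported
val is = for {i <- 1 to list.length
              j <- 0 until i
              if sumList(i)-sumList(j) == sum}
         yield (j, i-1)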
is match {
case Seq() => println("No such subsequence")
case (start, end) +: _ =>
println("Sequence of sum " + sum +
" found between " + start + " and " + end )
}
//> Sequence of sum 33 found between 2 and 4
EDIT2: And here's an O(N) one. "Functional" in that there are no mutable variables, but it's less clear than the others, in my opinion. It's a bit clearer if you just print the results as they are found (no need to carry the rs part of the accumulator between iterations) but that side-effecting way seems less functional, so I return a list of solutions.
val sums = list.scanLeft(0)(_ + _) zipWithIndex
sums.drop(1).foldLeft((sums, List[(Int, Int)]())) {
case ((leftTotal, rs), total) =>
val newL = leftTotal.dropWhile(total._1 - _._1 > target)
if (total._1 - newL.head._1 == target)
(newL, (newL.head._2, total._2 - 1) :: rs)
else (newL, rs)
}._2
//> res0: List[(Int, Int)] = List((2,4))
This is O(N) because we pass the shortened newL as the next iteration's leftTotal, so dropWhile only ever goes through the list once. (Here target is the desired sum, called sum in the earlier snippets.) This version relies on the integers being non-negative, so that adding another element cannot reduce the total; the others work with negative integers too.
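For comparison, the sliding-window idea behind the O(N) version can be written imperatively in a few lines. This is only a sketch in Python rather than Scala; the name find_subarray is an assumption, and like the fold above it relies on the numbers being non-negative.

```python
def find_subarray(nums, target):
    """Return (start, end) of the first contiguous run summing to target, or None."""
    start = 0
    window = 0
    for end, value in enumerate(nums):
        window += value
        # shrink from the left while the window is too large
        while window > target and start < end:
            window -= nums[start]
            start += 1
        if window == target:
            return start, end
    return None
```

Each element is added once and removed at most once, which is where the linear bound comes from.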

Add the least amount of characters to make a palindrome

The question:
Given any string, add the least amount of characters possible to make it a palindrome in linear time.
I'm only able to come up with an O(N^2) solution.
Can someone help me with an O(N) solution?

1. Reverse the string.
2. Use a modified Knuth-Morris-Pratt to find the latest match (the simplest modification would be to just append the original string to the reversed string and ignore matches after len(string)).
3. Append the unmatched rest of the reversed string to the original.
Steps 1 and 3 are obviously linear, and step 2 is linear because Knuth-Morris-Pratt is.
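One way these three steps could look in Python is sketched below. It uses the KMP prefix function on the reversed string followed by the original, with a None separator so that no match can span both halves; the separator trick and the names prefix_function and make_palindrome_by_appending are assumptions for this example, not part of the answer above.

```python
def prefix_function(seq):
    """pi[i] = length of the longest proper prefix of seq[:i+1] that is also its suffix."""
    pi = [0] * len(seq)
    for i in range(1, len(seq)):
        k = pi[i - 1]
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi

def make_palindrome_by_appending(s):
    rev = s[::-1]
    # None can never equal a character, so it separates the two halves
    combined = list(rev) + [None] + list(s)
    # longest prefix of rev that is a suffix of s = longest palindromic suffix of s
    matched = prefix_function(combined)[-1]
    # append the part of the reversed string that was not matched
    return s + rev[matched:]
```

The prefix function is computed in a single linear pass, so the whole routine stays O(N).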

If only appending is allowed
A Scala solution:
def isPalindrome(s: String) = s.view.reverse == s.view
def makePalindrome(s: String) =
s + s.take((0 to s.length).find(i => isPalindrome(s.substring(i))).get).reverse
If you're allowed to insert characters anywhere
Every palindrome can be viewed as a set of nested letter pairs.
a n n a         b o b
| | | |         | * |
| --- |         |   |
-------         -----
If the palindrome length n is even, we'll have n/2 pairs. If it is odd, we'll have n/2 full pairs and one single letter in the middle (let's call it a degenerated pair).
Let's represent them by pairs of string indexes - the left index counted from the left end of the string, and the right index counted from the right end of the string, both ends starting with index 0.
Now let's write pairs starting from the outer to the inner. So in our example:
anna: (0, 0) (1, 1)
bob: (0, 0) (1, 1)
In order to make any string a palindrome, we will go from both ends of the string one character at a time, and with every step, we'll eventually add a character to produce a correct pair of identical characters.
Example:
Assume the input word is "blob"
1. Pair (0, 0) is (b, b). OK, nothing to do, this pair is fine. Let's increase the counter.
2. Pair (1, 1) is (l, o). Doesn't match. So let's add "o" at position 1 from the left. Now our word has become "bolob".
3. Pair (2, 2). We don't even need to look at the characters, because we're pointing at the same index in the string. Done.
Wait a moment, but we have a problem here: in point 2 we arbitrarily chose to add a character on the left. But we could just as well add the character "l" on the right. That would produce "blolb", also a valid palindrome. So does it matter? Unfortunately it does, because the choice in earlier steps may affect how many pairs we'll have to fix, and therefore how many characters we'll have to add, in future steps.
Easy algorithm: search all the possibilities. That would give us an O(2^n) algorithm.
Better algorithm: use Dynamic Programming approach and prune the search space.
In order to keep things simpler, now we decouple inserting of new characters from just finding the right sequence of nested pairs (outer to inner) and fixing their alignment later. So for the word "blob" we have the following possibilities, both ending with a degenerated pair:
(0, 0) (1, 2)
(0, 0) (2, 1)
The more such pairs we find, the fewer characters we will have to add to fix the original string. Every full pair found gives us two characters we can reuse. Every degenerated pair gives us one character to reuse.
The main loop of the algorithm will iteratively evaluate pair sequences in such a way, that in step 1 all valid pair sequences of length 1 are found. The next step will evaluate sequences of length 2, the third sequences of length 3 etc. When at some step we find no possibilities, this means the previous step contains the solution with the highest number of pairs.
After each step, we will remove the pareto-suboptimal sequences. A sequence is suboptimal compared to another sequence of the same length, if its last pair is dominated by the last pair of the other sequence. E.g. sequence (0, 0)(1, 3) is worse than (0, 0)(1, 2). The latter gives us more room to find nested pairs and we're guaranteed to find at least all the pairs that we'd find for the former. However sequence (0, 0)(1, 2) is neither worse nor better than (0, 0)(2, 1). The one minor detail we have to beware of is that a sequence ending with a degenerated pair is always worse than a sequence ending with a full pair.
After bringing it all together:
def makePalindrome(str: String): String = {
/** Finds the pareto-minimum subset of a set of points (here pair of indices).
* Could be done in linear time, without sorting, but O(n log n) is not that bad ;) */
def paretoMin(points: Iterable[(Int, Int)]): List[(Int, Int)] = {
val sorted = points.toSeq.sortBy(identity)
(List.empty[(Int, Int)] /: sorted) { (result, e) =>
if (result.isEmpty || e._2 <= result.head._2)
e :: result
else
result
}
}
/** Find all pairs directly nested within a given pair.
* For performance reasons tries to not include suboptimal pairs (pairs nested in any of the pairs also in the result)
* although it wouldn't break anything as prune takes care of this. */
def pairs(left: Int, right: Int): Iterable[(Int, Int)] = {
val builder = List.newBuilder[(Int, Int)]
var rightMax = str.length
for (i <- left until (str.length - right)) {
rightMax = math.min(str.length - left, rightMax)
val subPairs =
for (j <- right until rightMax if str(i) == str(str.length - j - 1)) yield (i, j)
subPairs.headOption match {
case Some((a, b)) => rightMax = b; builder += ((a, b))
case None =>
}
}
builder.result()
}
/** Builds sequences of size n+1 from sequence of size n */
def extend(path: List[(Int, Int)]): Iterable[List[(Int, Int)]] =
for (p <- pairs(path.head._1 + 1, path.head._2 + 1)) yield p :: path
/** Whether full or degenerated. Full-pairs save us 2 characters, degenerated save us only 1. */
def isFullPair(pair: (Int, Int)) =
pair._1 + pair._2 < str.length - 1
/** Removes pareto-suboptimal sequences */
def prune(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val allowedHeads = paretoMin(sequences.map(_.head)).toSet
val containsFullPair = allowedHeads.exists(isFullPair)
sequences.filter(s => allowedHeads.contains(s.head) && (isFullPair(s.head) || !containsFullPair))
}
/** Dynamic-Programming step */
@tailrec // needs: import scala.annotation.tailrec
def search(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val nextStage = prune(sequences.flatMap(extend))
nextStage match {
case List() => sequences
case x => search(nextStage)
}
}
/** Converts a sequence of nested pairs to a palindrome */
def sequenceToString(sequence: List[(Int, Int)]): String = {
val lStr = str
val rStr = str.reverse
val half =
(for (List(start, end) <- sequence.reverse.sliding(2)) yield
lStr.substring(start._1 + 1, end._1) + rStr.substring(start._2 + 1, end._2) + lStr(end._1)).mkString
if (isFullPair(sequence.head))
half + half.reverse
else
half + half.reverse.substring(1)
}
sequenceToString(search(List(List((-1, -1)))).head)
}
Note: The code does not list all the palindromes, but gives only one example, and it is guaranteed it has the minimum length. There usually are more palindromes possible with the same minimum length (O(2^n) worst case, so you probably don't want to enumerate them all).
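For checking results on small inputs, the minimum number of insertions can also be computed with the classic O(n^2) interval dynamic program. This is a different, simpler technique than the pair-sequence search above, and it only returns the count rather than the palindrome itself; the name min_insertions is an assumption for this sketch.

```python
def min_insertions(s):
    """Minimum number of characters to insert anywhere to make s a palindrome."""
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = insertions needed for the substring s[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                # the outer pair matches, so only the inside needs fixing
                dp[i][j] = dp[i + 1][j - 1] if length > 2 else 0
            else:
                # mirror either s[i] on the right or s[j] on the left
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]
```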

O(n) time solution.
Algorithm:
Find the longest palindrome within the given string that contains the last character (that is, the longest palindromic suffix). Then append all the characters that are not part of that palindrome to the back of the string in reverse order.
Key point:
In this problem, the longest palindrome in the given string MUST contain the last character.
ex:
input: abacac
output: abacacaba
Here the longest palindrome in the input that contains the last letter is "cac". Therefore, add all the letters before "cac" to the back in reverse order to make the entire string a palindrome.
Written in C#, with a few test cases commented out:
static public void makePalindrome()
{
//string word = "aababaa";
//string word = "abacbaa";
//string word = "abcbd";
//string word = "abacac";
//string word = "aBxyxBxBxyxB";
//string word = "Malayal";
string word = "abccadac";
int j = word.Length - 1;
int mark = j;
bool found = false;
for (int i = 0; i < j; i++)
{
char cI = word[i];
char cJ = word[j];
if (cI == cJ)
{
found = true;
j--;
if(mark > i)
mark = i;
}
else
{
if (found)
{
found = false;
i--;
}
j = word.Length - 1;
mark = j;
}
}
for (int i = mark-1; i >=0; i--)
word += word[i];
Console.Write(word);
}
}
Note that this code gives you the solution with the fewest letters to APPEND TO THE BACK to make the string a palindrome. If you want to append to the front, just add a second loop that goes the other way. This makes the algorithm O(n) + O(n) = O(n). If you want to insert letters anywhere in the string to make it a palindrome, then this code will not work for that case.

I believe @Chronical's answer is wrong, as it seems to be for the best-case scenario, not the worst case, which is what big-O complexity is computed for. I welcome a proof, but the "solution" doesn't actually describe a valid answer.
KMP finds a matching substring in O(n * 2k) time, where n is the length of the input string and k is the length of the substring we're searching for, but it does not tell you in O(n) time what the longest palindrome in the input string is.
To solve this problem, we need to find the longest palindrome at the end of the string. If this longest suffix palindrome is of length x, the minimum number of characters to add is n - x. E.g. the longest suffix palindrome of the string aaba is aba, of length 3, thus our answer is 1. The algorithm to find out if a string is a palindrome takes O(n) time, whether using KMP or the more efficient and simple algorithm (O(n/2)):
Take two pointers, one at the first character and one at the last character
Compare the characters at the pointers, if they're equal, move each pointer inward, otherwise return false
When the pointers point to the same index (odd string length), or have overlapped (even string length), return true
Using the simple algorithm, we start from the entire string and check if it's a palindrome. If it is, we return 0, and if not, we check the string string[1...end], string[2...end] until we have reached a single character and return n - 1. This results in a runtime of O(n^2).
Splitting up the KMP algorithm into
Build table
Search for longest suffix palindrome
Building the table takes O(n) time, and then each check of "are you a palindrome" for each substring from string[0...end], string[1...end], ..., string[end - 2...end] each takes O(n) time. k in this case is the same factor of n that the simple algorithm takes to check each substring, because it starts as k = n, then goes through k = n - 1, k = n - 2... just the same as the simple algorithm did.
TL; DR:
KMP can tell you if a string is a palindrome in O(n) time, but that does not supply an answer to the question, because you have to check whether each of the substrings string[0...end], string[1...end], ..., string[end - 2...end] is a palindrome, resulting in the same (but actually worse) runtime as a simple palindrome-check algorithm.
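The simple quadratic approach described in this answer (two-pointer palindrome check, tried on every suffix in turn) can be sketched in Python as follows. The function names are assumptions made for this example.

```python
def is_palindrome(s, lo, hi):
    """Two-pointer check of s[lo..hi] (inclusive)."""
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True

def min_chars_to_append(s):
    """n - x, where x is the length of the longest palindromic suffix of s."""
    n = len(s)
    for start in range(n):
        # the first suffix that is a palindrome is the longest one
        if is_palindrome(s, start, n - 1):
            return start
    return 0  # only reached for the empty string
```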

#include<iostream>
#include<string>
using std::cout;
using std::endl;
using std::cin;
int main() {
std::string word, left("");
cin >> word;
size_t start, end;
for (start = 0, end = word.length()-1; start < end; end--) {
if (word[start] != word[end]) {
left.append(word.begin()+end, 1 + word.begin()+end);
continue;
}
left.append(word.begin()+start, 1 + word.begin()+start), start++;
}
cout << left << ( start == end ? std::string(word.begin()+end, 1 + word.begin()+end) : "" )
<< std::string(left.rbegin(), left.rend()) << endl;
return 0;
}
Don't know if it appends the minimum number, but it produces palindromes
Explained:
We will start at both ends of the given string and iterate inwards towards the center.
At each iteration, we check if each letter is the same, i.e. word[start] == word[end]?.
If they are the same, we append a copy of word[start] to another string called left, which, as its name suggests, will serve as the left-hand side of the new palindrome string when the iteration is complete. Then we move both variables towards the center: (start)++ and (end)--.
If they are not the same, we append a copy of word[end] to the same string left.
And this is the basics of the algorithm until the loop is done.
When the loop is finished, one last check is done to make sure that if we got an odd length palindrome, we append the middle character to the middle of the new palindrome formed.
Note that if you decide to append the opposite characters to the string left, everything in the code is mirrored, i.e. which index is moved at each iteration and which is moved when a match is found, the order of printing the palindrome, etc. I don't want to have to go through it again, but you can try it and see.
The running complexity of this code should be O(N) assuming that append method of the std::string class runs in constant time.

If someone wants to solve this in Ruby, the solution can be very simple:
str = 'xcbc' # Any string that you want.
arr1 = str.split('')
arr2 = arr1.reverse
count = 0
while(str != str.reverse)
count += 1
arr1.insert(count-1, arr2[count-1])
str = arr1.join('')
end
puts str
puts str.length - arr2.count

I am assuming that you cannot replace or remove any existing characters?
A good start would be reversing one of the strings and finding the longest-common-substring (LCS) between the reversed string and the other string. Since it sounds like this is a homework or interview question, I'll leave the rest up to you.

See this solution.
It is better than O(N^2).
The problem is divided into smaller subproblems.
Example:
original "tostotor"
reversed "rototsot"
Here the 2nd position is 'o', so the problem is divided in two by splitting the original string into "t" and "ostot".
For "t": the solution is 1
For "ostot": the solution is 2, because the LCS is "tot" and the characters that need to be added are "os"
so the total is 2+1 = 3
def shortPalin( S):
    k=0
    lis=len(S)
    for i in range(len(S)/2):
        if S[i]==S[lis-1-i]:
            k=k+1
        else :break
    S=S[k:lis-k]
    lis=len(S)
    prev=0
    w=len(S)
    tot=0
    for i in range(len(S)):
        if i>=w:
            break;
        elif S[i]==S[lis-1-i]:
            tot=tot+lcs(S[prev:i])
            prev=i
            w=lis-1-i
    tot=tot+lcs(S[prev:i])
    return tot
def lcs( S):
    if (len(S)==1):
        return 1
    li=len(S)
    X=[0 for x in xrange(len(S)+1)]
    Y=[0 for l in xrange(len(S)+1)]
    for i in range(len(S)-1,-1,-1):
        for j in range(len(S)-1,-1,-1):
            if S[i]==S[li-1-j]:
                X[j]=1+Y[j+1]
            else:
                X[j]=max(Y[j],X[j+1])
        Y=X
    return li-X[0]
print shortPalin("tostotor")

Using Recursion
#include <iostream>
using namespace std;
int length( char str[])
{ int l=0;
for( int i=0; str[i]!='\0'; i++, l++);
return l;
}
int palin(char str[],int len)
{ static int cnt;
int s=0;
int e=len-1;
while(s<e){
if(str[s]!=str[e]) {
cnt++;
return palin(str+1,len-1);}
else{
s++;
e--;
}
}
return cnt;
}
int main() {
char str[100];
cin.getline(str,100);
int len = length(str);
cout<<palin(str,len);
}

Solution with O(n) time complexity
public static void main(String[] args) {
String givenStr = "abtb";
String palindromeStr = covertToPalindrome(givenStr);
System.out.println(palindromeStr);
}
private static String covertToPalindrome(String str) {
char[] strArray = str.toCharArray();
int low = 0;
int high = strArray.length - 1;
int subStrIndex = -1;
while (low < high) {
if (strArray[low] == strArray[high]) {
high--;
} else {
high = strArray.length - 1;
subStrIndex = low;
}
low++;
}
return str + (new StringBuilder(str.substring(0, subStrIndex+1))).reverse().toString();
}

// string to append to convert it to a palindrome
public static void main(String args[])
{
String s=input();
System.out.println(min_operations(s));
}
static String min_operations(String str)
{
int i=0;
int j=str.length()-1;
String ans="";
while(i<j)
{
if(str.charAt(i)!=str.charAt(j))
{
ans=ans+str.charAt(i);
}
if(str.charAt(i)==str.charAt(j))
{
j--;
}
i++;
}
StringBuffer sd=new StringBuffer(ans);
sd.reverse();
return (sd.toString());
}

Mapping integers to strings in a given string space

Suppose I have an alphabet of 'abcd' and a maximum string length of 3. This gives me 85 possible strings, including the empty string. What I would like to do is map an integer in the range [0,85) to a string in my string space without using a lookup table. Something like this:
0 => ''
1 => 'a'
...
4 => 'd'
5 => 'aa'
6 => 'ab'
...
84 => 'ddd'
This is simple enough to do if the string is fixed length using this pseudocode algorithm:
str = ''
# n is the input integer
for k in 0..maxLen do
    str += alphabet[n % alphabet.length]
    n /= alphabet.length
done
I can't figure out a good, efficient way of doing it, though, when the length of the string could be anywhere in the range [0,3]. This is going to be running in a tight loop with random inputs, so I would like to avoid any unnecessary branching or lookups.

Shift your index by one and ignore the empty string temporarily. So you'd map 0 -> "a", ..., 83 -> "ddd".
Then the mapping is
n -> base-4-encode(n - number of shorter strings)
With 26 symbols, that's the Excel-column-numbering scheme.
With s symbols, there are s + s^2 + ... + s^l nonempty strings of length at most l. Leaving aside the trivial case s = 1, that sum is (a partial sum of a geometric series) s*(s^l - 1)/(s-1).
So, given n, find the largest l such that s*(s^l - 1)/(s-1) <= n, i.e.
l = floor(log((s-1)*n/s + 1) / log(s))
Then let m = n - s*(s^l - 1)/(s-1) and encode m as an l+1-symbol string in base s ('a' ~> 0, 'b' ~> 1, ...).
For the problem including the empty string, map 0 to the empty string and for n > 0 encode n-1 as above.
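The scheme above can be sketched in Python as follows. To stay in integer arithmetic, this version finds the length with a short loop over the block sizes s, s^2, ... instead of evaluating the logarithm; that substitution, the function name int_to_string and the alphabet parameter are assumptions made for this example.

```python
def int_to_string(n, alphabet="abcd"):
    """Map 0 -> "", 1 -> "a", ..., 5 -> "aa", ... in shortlex order."""
    if n == 0:
        return ""
    n -= 1  # ignore the empty string: 0 -> "a", ..., 83 -> "ddd"
    s = len(alphabet)
    length, block = 1, s  # block = number of strings of the current length
    while n >= block:
        n -= block  # subtract the number of shorter strings
        length += 1
        block *= s
    # n is now the offset m; encode it as `length` base-s digits
    digits = []
    for _ in range(length):
        digits.append(alphabet[n % s])
        n //= s
    return "".join(reversed(digits))
```

With a maximum length of 3 the while loop runs at most twice, so it could be unrolled if branching in the tight loop is a concern.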

In Haskell
encode cs n = reverse $ encode' n where
    len = length cs
    encode' 0 = ""
    encode' n = (cs !! ((n-1) `mod` len)) : encode' ((n-1) `div` len)
Check:
*Main> map (encode "abcd") [0..84] ["","a","b","c","d","aa","ab","ac","ad","ba","bb","bc","bd","ca","cb","cc","cd","da","db","dc","dd","aaa","aab","aac","aad","aba","abb","abc","abd","aca","acb","acc","acd","ada","adb","adc","add","baa","bab","bac","bad","bba","bbb","bbc","bbd","bca","bcb","bcc","bcd","bda","bdb","bdc","bdd","caa","cab","cac","cad","cba","cbb","cbc","cbd","cca","ccb","ccc","ccd","cda","cdb","cdc","cdd","daa","dab","dac","dad","dba","dbb","dbc","dbd","dca","dcb","dcc","dcd","dda","ddb","ddc","ddd"]

Figure out the number of strings for each length: N0, N1, N2 & N3 (actually, you won't need N3). Then, use those values to partition your space of integers: 0..N0-1 are length 0, N0..N0+N1-1 are length 1, etc. Within each partition, you can use your fixed-length algorithm.
At worst, you've greatly reduced the size of your lookup table.

Here is a C# solution:
static string F(int x, int alphabetSize)
{
    string ret = "";
    while (x > 0)
    {
        x--;
        ret = (char)('a' + (x % alphabetSize)) + ret;
        x /= alphabetSize;
    }
    return ret;
}
If you want to optimize this further, you may want to do something to avoid the string concatenations. For example, you could store the result into a preallocated char[] array.

Need an algorithm to split a series of numbers

After a few busy nights my head isn't working so well, but this needs to be fixed yesterday, so I'm asking the more refreshed community of SO.
I've got a series of numbers. For example:
1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6
I need to split this series into three parts so the sum of the numbers in all parts is as close as possible. The order of the numbers needs to be maintained, so the first part must consist of the first X numbers, the second - of the next Y numbers, and the third - of whatever is left.
What would be the algorithm to do this?
(Note: the actual problem is to arrange text paragraphs of differing heights into three columns. Paragraphs must maintain order (of course) and they may not be split in half. The columns should be as equal of height as possible.)

First, we'll need to define the goal better:
Suppose the sums of the three parts are A1, A2, A3. We are trying to minimize |A-A1|+|A-A2|+|A-A3|, where A is the average: A=(A1+A2+A3)/3.
Multiplying by 3, we are therefore trying to minimize |A2+A3-2A1|+|A1+A3-2A2|+|A1+A2-2A3|.
Let S denote the sum (which is constant): S=A1+A2+A3, so A3=S-A1-A2.
We're trying to minimize:
|A2+S-A1-A2-2A1|+|A1+S-A1-A2-2A2|+|A1+A2-2S+2A1+2A2|=|S-3A1|+|S-3A2|+|3A1+3A2-2S|
Denoting this function as f, we can do two loops O(n^2) and keep track of the minimum:
Something like:
for (x=1; x<items; x++)
{
    A1= sum(Item[0]..Item[x-1])
    for (y=x; y<items; y++)
    {
        A2= sum(Item[x]..Item[y-1])
        calc f, if new minimum found -keep x,y
    }
}
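A runnable Python sketch of this double loop, using prefix sums so that A1 and A2 are available in constant time. The function name best_three_way_split is an assumption, and the sketch requires every part to be non-empty, so it needs at least three items.

```python
from itertools import accumulate

def best_three_way_split(items):
    """Try every pair of cut points and keep the one minimizing f."""
    prefix = [0] + list(accumulate(items))
    total = prefix[-1]
    n = len(items)
    best = None
    for x in range(1, n - 1):          # first part is items[:x]
        a1 = prefix[x]
        for y in range(x + 1, n):      # second part is items[x:y]
            a2 = prefix[y] - prefix[x]
            f = (abs(total - 3 * a1) + abs(total - 3 * a2)
                 + abs(3 * a1 + 3 * a2 - 2 * total))
            if best is None or f < best[0]:
                best = (f, x, y)
    _, x, y = best
    return items[:x], items[x:y], items[y:]
```

On the series from the question this yields parts with sums 26, 19 and 18.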

Find the sum and the cumulative sum of the series.
Let a = sum/3.
Then locate the values nearest to a and 2*a in the cumulative sum; these divide your list into three roughly equal parts.
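A possible Python sketch of this heuristic, using bisect on the cumulative sums. It assumes the numbers are non-negative (so the cumulative sums are sorted), it is not guaranteed to find the optimal split, and the name split_near_thirds is made up for this example.

```python
from bisect import bisect_left
from itertools import accumulate

def split_near_thirds(items):
    """Cut where the cumulative sum is closest to sum/3 and 2*sum/3."""
    cumulative = list(accumulate(items))
    total = cumulative[-1]
    cuts = []
    for target in (total / 3, 2 * total / 3):
        k = bisect_left(cumulative, target)  # first value >= target
        # step back if the previous cumulative value is at least as close
        if k > 0 and target - cumulative[k - 1] <= cumulative[k] - target:
            k -= 1
        cuts.append(k + 1)  # number of items before the cut
    x, y = cuts
    return items[:x], items[x:y], items[y:]
```

This runs in O(n) for the cumulative sums plus two binary searches, at the cost of possibly missing the best split when the two cuts interact.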

Let's say p is your array of paragraph heights:
int len = p.sum() / 3; // the average value
int currlen = 0;
int templen = 0;
int indexes[2];
int j = 0;
for (i = 0; i < p.length; i++)
{
    currlen = currlen + p[i];
    if (currlen > len)
    {
        if ((currlen - len) < abs((currlen - p[i]) - len))
        { // check which one is closer to the average value
            indexes[j++] = i;
            len = (p.sum() - currlen) / 2; // optional: compute a new average height from the remaining lengths
            currlen = 0;
        }
        else
        {
            indexes[j++] = i - 1;
            len = (p.sum() - currlen) / 2;
            currlen = p[i];
        }
    }
    if (j >= 2) // both split points found
        break;
}
You will get the starting index of the 2nd and 3rd sequence. Note that it's kind of pseudocode :)

I believe that this can be solved with a dynamic programming algorithm for line breaking invented by Donald Knuth for use in TeX.

Following Aasmund Eldhuset's answer: I previously answered this question on SO:
Word wrap to X lines instead of maximum width (Least raggedness)
This algorithm doesn't rely on the maximum line size; it just gives an optimal cut.
I modified it to work with your problem:
L = [1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6]

def minragged(words, n=3):
    P = 2
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + word)
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks
    print("number of words:", len(words))

    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - float(actuallinewidth)) ** P

    """
    printing the reasoning and reversing the return list
    """
    F = {}  # Total cost function
    for stage in range(n):
        print("------------------------------------")
        print("stage :", stage)
        print("------------------------------------")
        print("word i to j in line", stage, "\t\tTotalCost (f(j))")
        print("------------------------------------")
        if stage == 0:
            F[stage] = []
            i = 0
            for j in range(i, len(words) + 1):
                print("i=", i, "j=", j, "\t\t\t", cost(i, j))
                F[stage].append([cost(i, j), 0])
        elif stage == (n - 1):
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                j = len(words)
                if F[stage-1][i][0] + cost(i, j) < F[stage][j][0]:  # calculating min cost (cf f formula)
                    F[stage][j][0] = F[stage-1][i][0] + cost(i, j)
                    F[stage][j][1] = i
                    print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
        else:
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                for j in range(i, len(words) + 1):
                    if F[stage-1][i][0] + cost(i, j) < F[stage][j][0]:
                        F[stage][j][0] = F[stage-1][i][0] + cost(i, j)
                        F[stage][j][1] = i
                        print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
    print('reversing list')
    print("------------------------------------")
    listWords = []
    a = len(words)
    for k in range(n - 1, 0, -1):  # reverse loop from n-1 to 1
        listWords.append(words[F[k][a][1]:a])
        a = F[k][a][1]
    listWords.append(words[0:a])
    listWords.reverse()
    for line in listWords:
        print(line, '\t\t', sum(line))
    return listWords

print(minragged(L))
The result I get is:
[1, 5, 7, 13] 26
[3, 3, 4, 1, 8] 19
[6, 6, 6] 18
[[1, 5, 7, 13], [3, 3, 4, 1, 8], [6, 6, 6]]
Hope it helps
