How can I compute the number of characters required to turn a string into a palindrome? - algorithm

I recently found a contest problem that asks you to compute the minimum number of characters that must be inserted (anywhere) in a string to turn it into a palindrome.
For example, given the string: "abcbd" we can turn it into a palindrome by inserting just two characters: one after "a" and another after "d": "adbcbda".
This seems to be a generalization of a similar problem that asks for the same thing, except characters can only be added at the end - this has a pretty simple solution in O(N) using hash tables.
I have been trying to modify the Levenshtein distance algorithm to solve this problem, but haven't been successful. Any help on how to solve this (it doesn't necessarily have to be efficient, I'm just interested in any DP solution) would be appreciated.

Note: This is just a curiosity. Dav proposed an algorithm which can be modified to DP algorithm to run in O(n^2) time and O(n^2) space easily (and perhaps O(n) with better bookkeeping).
Of course, this 'naive' algorithm might actually come in handy if you decide to change the allowed operations.
Here is a 'naive'ish algorithm, which can probably be made faster with clever bookkeeping.
Given a string, we guess the middle of the resulting palindrome and then try to compute the number of inserts required to make the string a palindrome around that middle.
If the string is of length n, there are 2n+1 possible middles (Each character, between two characters, just before and just after the string).
Suppose we consider a middle which gives us two strings L and R (one to left and one to right).
If we are using inserts, I believe the Longest Common Subsequence algorithm (which is a DP algorithm) can now be used the create a 'super' string which contains both L and reverse of R, see Shortest common supersequence.
Pick the middle which gives you the smallest number inserts.
This is O(n^3) I believe. (Note: I haven't tried proving that it is true).

My C# solution looks for repeated characters in a string and uses them to reduce the number of insertions. In a word like program, I use the 'r' characters as a boundary. Inside of the 'r's, I make that a palindrome (recursively). Outside of the 'r's, I mirror the characters on the left and the right.
Some inputs have more than one shortest output: output can be toutptuot or outuputuo. My solution only selects one of the possibilities.
Some example runs:
radar -> radar, 0 insertions
esystem -> metsystem, 2 insertions
message -> megassagem, 3 insertions
stackexchange -> stegnahckexekchangets, 8 insertions
First I need to check if an input is already a palindrome:
public static bool IsPalindrome(string str)
{
for (int left = 0, right = str.Length - 1; left < right; left++, right--)
{
if (str[left] != str[right])
return false;
}
return true;
}
Then I need to find any repeated characters in the input. There may be more than one. The word message has two most-repeated characters ('e' and 's'):
private static bool TryFindMostRepeatedChar(string str, out List<char> chs)
{
chs = new List<char>();
int maxCount = 1;
var dict = new Dictionary<char, int>();
foreach (var item in str)
{
int temp;
if (dict.TryGetValue(item, out temp))
{
dict[item] = temp + 1;
maxCount = temp + 1;
}
else
dict.Add(item, 1);
}
foreach (var item in dict)
{
if (item.Value == maxCount)
chs.Add(item.Key);
}
return maxCount > 1;
}
My algorithm is here:
public static string MakePalindrome(string str)
{
List<char> repeatedList;
if (string.IsNullOrWhiteSpace(str) || IsPalindrome(str))
{
return str;
}
//If an input has repeated characters,
// use them to reduce the number of insertions
else if (TryFindMostRepeatedChar(str, out repeatedList))
{
string shortestResult = null;
foreach (var ch in repeatedList) //"program" -> { 'r' }
{
//find boundaries
int iLeft = str.IndexOf(ch); // "program" -> 1
int iRight = str.LastIndexOf(ch); // "program" -> 4
//make a palindrome of the inside chars
string inside = str.Substring(iLeft + 1, iRight - iLeft - 1); // "program" -> "og"
string insidePal = MakePalindrome(inside); // "og" -> "ogo"
string right = str.Substring(iRight + 1); // "program" -> "am"
string rightRev = Reverse(right); // "program" -> "ma"
string left = str.Substring(0, iLeft); // "program" -> "p"
string leftRev = Reverse(left); // "p" -> "p"
//Shave off extra chars in rightRev and leftRev
// When input = "message", this loop converts "meegassageem" to "megassagem",
// ("ee" to "e"), as long as the extra 'e' is an inserted char
while (left.Length > 0 && rightRev.Length > 0 &&
left[left.Length - 1] == rightRev[0])
{
rightRev = rightRev.Substring(1);
leftRev = leftRev.Substring(1);
}
//piece together the result
string result = left + rightRev + ch + insidePal + ch + right + leftRev;
//find the shortest result for inputs that have multiple repeated characters
if (shortestResult == null || result.Length < shortestResult.Length)
shortestResult = result;
}
return shortestResult;
}
else
{
//For inputs that have no repeated characters,
// just mirror the characters using the last character as the pivot.
for (int i = str.Length - 2; i >= 0; i--)
{
str += str[i];
}
return str;
}
}
Note that you need a Reverse function:
public static string Reverse(string str)
{
string result = "";
for (int i = str.Length - 1; i >= 0; i--)
{
result += str[i];
}
return result;
}

C# Recursive solution adding to the end of the string:
There are 2 base cases. When length is 1 or 2. Recursive case: If the extremes are equal, then
make palindrome the inner string without the extremes and return that with the extremes.
If the extremes are not equal, then add the first character to the end and make palindrome the
inner string including the previous last character. return that.
public static string ConvertToPalindrome(string str) // By only adding characters at the end
{
if (str.Length == 1) return str; // base case 1
if (str.Length == 2 && str[0] == str[1]) return str; // base case 2
else
{
if (str[0] == str[str.Length - 1]) // keep the extremes and call
return str[0] + ConvertToPalindrome(str.Substring(1, str.Length - 2)) + str[str.Length - 1];
else //Add the first character at the end and call
return str[0] + ConvertToPalindrome(str.Substring(1, str.Length - 1)) + str[0];
}
}

Related

Count the occurrence of a word in an incoming stream of characters

I was asked this question during an interview and although I am good in DS&Algo but this one I was not able to solve. This is an interesting question anyways so posting it.
Problem: You have an incoming stream of characters and you need to count the occurrences of a word. You have only one API to read from the stream which is stream.next_char() which return "\0" if there is none.
int count_occurrences(Stream stream, String word) {
// you have only one function provided from Stream class that you can use to
// read one char at a time, no length/size etc.
// stream.next_char() - return "\0" if end
}
Input: "aabckjhabcc"
Word: "abc"
Output: 2
I think this question is related to the idea of
matching strings using a finite state model of computation.
This question can be solved by using the KMP string
matching algorithm.
The KMP algorithm tries to find matches in a text string of a pattern
string, by considering how much prefix of pattern
is still matched even if we find a mismatch at some point.
For determining "how much prefix can still be matched" if
we encounter a mismatch after matching upto index i in pattern,
a failure function is built in advance. (Please refer the below code
for building of failure function values)
This failure function will tell for each index i of pattern,
how much prefix of the pattern can still be matched even if
we encounter a mismatch after index i.
The idea is to figure out what is the length of longest proper prefix of pattern
that is also a suffix of it for each substring of pattern denoted by 1 to i
indices where i ranges from 1 to n.
I am using string indices to start from 1.
Thus the failure function value for first character of any pattern
is 0. (i.e. no character has been matched so far).
For subsequent characters, for each index i = 2 to n, we see what
is the length of the longest
proper prefix of the substring of pattern[1...i] which is also
a suffix of that substring of pattern[1...i].
Suppose our pattern is "aac", then the failure function value for
index 1 is 0 (no match yet), and the failure function value
for index 2 is 1, ( the length of the longest proper prefix which is same as
longest proper suffix for "aa" is 1)
For pattern "ababac" the failure function value for index 1 is 0,
index 2 is 0, index 3 is 1 (as the third index 'a' is the same as
first index 'a'), for index 4 is 2 (as "ab" at indices 1 and 2 are same
as "ab" at indices 3 and 4) , and for index 5 is 3 ("aba" at indices[1...3]
is the same as "aba" at indices [3...5] ). For index 6, the failure function value is 0.
Here is the code (C++) for building the failure function and matching
a text (or stream) using it :
/* Assuming that string indices start from 1 for both pattern and text. */
/* Firstly build the failure function. */
int s = 1;
int t = 0;
/* n denotes the length of the pattern */
int *f = new int[n+1];
f[1] = 0;
for (s = 1; s < n; s++) {
while (t > 0 && pattern[t + 1] != pattern[s + 1]) {
t = f[t];
}
if (pattern[t + 1] == pattern[s + 1]) {
t++;
f[s + 1] = t;
}
else {
f[s + 1] = 0;
}
}
/* Now start reading characters from the stream */
int count = 0;
char current_char = stream.next_char();
/* s denotes the index of pattern upto which we have found match in text */
/* initially its 0 i.e. no character has been matched yet. */
s = 0;
while (current_char != '\0') {
/* IF previously, we had matched upto a certain number of
characters, and then failed, we return s to the point
which is the longest prefix that still might be matched.
(spaces between string are just for clarity)
For e.g., if pattern is "a b a b a a"
& its failure returning index is "0 0 1 2 3 1"
and we encounter
the text like : "x y z a b a b a b a a"
indices : 1 2 3 4 5 6 7 8 9 10 11
after matching the substring "a b a b a", starting at
index 4 of text, we are successful upto index 8 but we fail
at index 9, the next character at index 9 of text is 'b'
but in our pattern which should have been 'a'.Thus, the index
in pattern which has been matched till now is 5 ( a b a b a)
1 2 3 4 5
Now, we see that the failure returning index at index 5 of
pattern is 3, which means that the text is still matched upto
index 3 of pattern (a b a), not from the initial starting
index 4 of text, but starting from index 6 of text.
Thus we continue searching assuming that our next starting index
in text is 6, eventually finding the match starting from index 6
upto index 11.
*/
while (s > 0 && current_char != pattern[s + 1]) {
s = f[s];
}
if (current_char == pattern[s + 1]) s++; /* We test the next character after the currently
matched portion of pattern with the current
character of text , if it matches, we increase
the size of our matched portion by 1*/
if (s == n) {
count++;
}
current_char = stream.next_char();
}
printf("Count is %d\n", count);
`
Note: This method will be helpful in finding out counts even in overlapping pattern occurrences. For example, the word "aba" as two occurrences
in the stream "ababa".
the simplest solution is to use buffer with up-to word.length() symbols:
static int count_occurrences(final Stream stream, final String word) {
int found = 0;
char c;
String tmpWord = "";
while ((c = stream.next_char()) != 0) {
tmpWord += c;
if (tmpWord.length() > word.length()) {
tmpWord = tmpWord.substring(1);
}
if (tmpWord.equals(word)) {
found++;
}
}
return found;
}
complexity is O(N*M), memory O(M)
Maybe something like this?
int count_occurrences(Stream stream, String word) {
// you have only one function provided from Stream class that you can use to
// read one char at a time, no length/size etc.
// stream.next_char() - return "\0" if end
List<int> positions = new List<int>();
int counter = 0;
while (true) {
char ch = stream.next_char();
if (ch == '\0') return counter;
if (ch == word.charAt(0)) {
positions.add(0);
}
int i = 0;
while (i < positions.length) {
int pos = positions[i];
if (word.charAt(pos) != ch) {
positions.remove(i);
continue;
}
pos++;
if (pos == word.length()) {
positions.remove(i);
counter++;
continue;
}
positions[i] = pos;
i++;
}
}
}
int count_occurrences(Stream stream, String word) {
int occurrence = 0;
int current_index = 0;
int word_length = word.length();
char[] word_chars = word.toCharArray();
char c = stream.next_char();
while(c != '\0') {
if( c == word_chars[current_index] ) {
current_index++;
if(current_index >= word_length) {
occurrence++;
current_index = 0;
}
}
else {
current_index = 0;
if( c == word_chars[current_index] ) {
current_index++;
}
}
c = stream.next_char();
}
return occurrence;
}
What they were looking for (probably) is either Rabin-Karp or Knuth-Morris-Pratt. Both need a single pass with very small overhead. If the pattern is large they will be obvious wins in terms of speed since the complexity is O(stream_length).
Rabbin-Karp relies on hashes that you can update in O(1) for the next character. Can give you false positives if the hash isn't very good or when the stream is very long (hash collisions).
Knuth-Morris-Pratt reliest on computing the length of the longest prefix that is also a suffix for every position in your pattern. This needs O(n) memory for storing these results but that's it.
Look them up in wikipedia under string pattern matching for more details and implementations.

Check if binary string can be partitioned such that each partition is a power of 5

I recently came across this question - Given a binary string, check if we can partition/split the string into 0..n parts such that each part is a power of 5. Return the minimum number of splits, if it can be done.
Examples would be:
input = "101101" - returns 1, as the string can be split once to form "101" and "101",as 101= 5^1.
input = "1111101" - returns 0, as the string itself is 5^3.
input = "100"- returns -1, as it can't be split into power(s) of 5.
I came up with this recursive algorithm:
Check if the string itself is a power of 5. if yes, return 0
Else, iterate over the string character by character, checking at every point if the number seen so far is a power of 5. If yes, add 1 to split count and check the rest of the string recursively for powers of 5 starting from step 1.
return the minimum number of splits seen so far.
I implemented the above algo in Java. I believe it works alright, but it's a straightforward recursive solution. Can this be solved using dynamic programming to improve the run time?
The code is below:
public int partition(String inp){
if(inp==null || inp.length()==0)
return 0;
return partition(inp,inp.length(),0);
}
public int partition(String inp,int len,int index){
if(len==index)
return 0;
if(isPowerOfFive(inp,index))
return 0;
long sub=0;
int count = Integer.MAX_VALUE;
for(int i=index;i<len;++i){
sub = sub*2 +(inp.charAt(i)-'0');
if(isPowerOfFive(sub))
count = Math.min(count,1+partition(inp,len,i+1));
}
return count;
}
Helper functions:
public boolean isPowerOfFive(String inp,int index){
long sub = 0;
for(int i=index;i<inp.length();++i){
sub = sub*2 +(inp.charAt(i)-'0');
}
return isPowerOfFive(sub);
}
public boolean isPowerOfFive(long val){
if(val==0)
return true;
if(val==1)
return false;
while(val>1){
if(val%5 != 0)
return false;
val = val/5;
}
return true;
}
Here is simple improvements that can be done:
Calculate all powers of 5 before start, so you could do checks faster.
Stop split input string if the number of splits is already greater than in the best split you've already done.
Here is my solution using these ideas:
public static List<String> powers = new ArrayList<String>();
public static int bestSplit = Integer.MAX_VALUE;
public static void main(String[] args) throws Exception {
// input string (5^5, 5^1, 5^10)
String inp = "110000110101101100101010000001011111001";
// calc all powers of 5 that fits in given string
for (int pow = 1; ; ++pow) {
String powStr = Long.toBinaryString((long) Math.pow(5, pow));
if (powStr.length() <= inp.length()) { // can be fit in input string
powers.add(powStr);
} else {
break;
}
}
Collections.reverse(powers); // simple heuristics, sort powers in decreasing order
// do simple recursive split
split(inp, 0, -1);
// print result
if (bestSplit == Integer.MAX_VALUE) {
System.out.println(-1);
} else {
System.out.println(bestSplit);
}
}
public static void split(String inp, int start, int depth) {
if (depth >= bestSplit) {
return; // can't do better split
}
if (start == inp.length()) { // perfect split
bestSplit = depth;
return;
}
for (String pow : powers) {
if (inp.startsWith(pow, start)) {
split(inp, start + pow.length(), depth + 1);
}
}
}
EDIT:
I also found another approach which looks like very fast one.
Calculate all powers of 5 whose string representation is shorter than input string. Save those strings in powers array.
For every string power from powers array: if power is substring of input then save its start and end indexes into the edges array (array of tuples).
Now we just need to find shortest path from index 0 to index input.length() by edges from the edges array. Every edge has the same weight, so the shortest path can be found very fast with BFS.
The number of edges in the shortest path found is exactly what you need -- minimum number of splits of the input string.
Instead of calculating all possible substrings, you can check the binary representation of the powers of 5 in search of a common pattern. Using something like:
bc <<< "obase=2; for(i = 1; i < 40; i++) 5^i"
You get:
51 = 1012
52 = 110012
53 = 11111012
54 = 10011100012
55 = 1100001101012
56 = 111101000010012
57 = 100110001001011012
58 = 10111110101111000012
59 = 1110111001101011001012
510 = 1001010100000010111110012
511 = 101110100100001110110111012
512 = 11101000110101001010010100012
513 = 10010001100001001110011100101012
514 = 1011010111100110001000001111010012
515 = 111000110101111110101001001100011012
516 = 100011100001101111001001101111110000012
517 = 10110001101000101011110000101110110001012
518 = 1101111000001011011010110011101001110110012
...
529 = 101000011000111100000111110101110011011010111001000010111110010101012
As you can see, odd powers of 5 always ends with 101 and even powers of 5 ends with the pattern 10+1 (where + means one or more occurrences).
You could put your input string in a trie and then iterate over it identifying the 10+1 pattern, once you have a match, evaluate it to check if is not a false positive.
You just have to save the value for a given string in a map. For example having if you have a string ending like this: (each letter may be a string of arbitrary size)
ABCD
You find that part A mod 5 is ok, so you try again for BCD, but find that B mod 5 is also ok, same for C and D as well as CD together. Now you should have the following results cached:
C -> 0
D -> 0
CD -> 0
BCD -> 1 # split B/CD is the best
But you're not finished with ABCD - you find that AB mod 5 is ok, so you check the resulting CD - it's already in the cache and you don't have to process it from the beginning.
In practice you just need to cache answers from partition() - either for the actual string or for the (string, start, length) tuple. Which one is better depends on how many repeating sequences you have and whether it's faster to compare the contents, or just indexes.
Given below is a solution in C++. Using dynamic programming I am considering all the possible splits and saving the best results.
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int isPowerOfFive(ll n)
{
if(n == 0) return 0;
ll temp = (ll)(log(n)/log(5));
ll t = round(pow(5,temp));
if(t == n)
{
return 1;
}
else
{
return 0;
}
}
ll solve(string s)
{
vector<ll> dp(s.length()+1);
for(int i = 1; i <= s.length(); i++)
{
dp[i] = INT_MAX;
for(int j = 1; j <= i; j++)
{
if( s[j-1] == '0')
{
continue;
}
ll num = stoll(s.substr(j-1, i-j+1), nullptr, 2);
if(isPowerOfFive(num))
{
dp[i] = min(dp[i], dp[j-1]+1);
}
}
}
if(dp[s.length()] == INT_MAX)
{
return -1;
}
else
{
return dp[s.length()];
}
}
int main()
{
string s;
cin>>s;
cout<<solve(s);
}

What is the best way to recursively generate all binary strings of length n?

I'm looking for a good (easy to implement, intuitive, etc.) recursive method of generating all binary strings of length n, where 1 <= n <= 35.
I would appreciate ideas for a pseudo-code algorithm (no language-specific tricks).
LE: okay, I did go overboard with the upper limit. My intention was to avoid solutions that use the binary representation of a counter from 1 to 1 << n.
Here's an example of recursion in C++.
vector<string> answer;
void getStrings( string s, int digitsLeft )
{
if( digitsLeft == 0 ) // the length of string is n
answer.push_back( s );
else
{
getStrings( s + "0", digitsLeft - 1 );
getStrings( s + "1", digitsLeft - 1 );
}
}
getStrings( "", n ); // initial call
According to the Divide et Impera paradigm, the problem of generating all binary strings of length n can be splitted in two subproblems: the problem of printing all binary strings of lenght n-1 preceeded by a 0 and the one of printing all binary strings of lenght n-1 preceeded by a 1. So the following pseudocode solves the problem:
generateBinary(length, string)
if(length > 0)
generateBinary(length-1, string + "0")
generateBinary(length-1, string + "1")
else
print(string)
def genBins(n):
"""
generate all the binary strings with n-length
"""
max_int = '0b' + '1' * n
for i in range(0, int(max_int, 2)+1, 1):
yield str(format(i, 'b').zfill(n))
if __name__ == '__main__':
print(list(genBins(5)))
The problem you have can be solved with a Backtracking algorithm.
Pseudo-code for such an algorithm is:
fun(input, n)
if( base_case(input, n) )
//print result
else
//choose from pool of choices
//explorer the rest of choices from what's left
//unchoose
Implementation:
Base case: we want to print our result string when its size is equal to n
Recursive case:
our pool of choices consists of 0 and 1
choosing in this case means take 0 or 1 and add it to the input as last character
explore by recursing, where we pass the new input value from the choose step until base case is reached
un-choosing in this case means remove the last character
function binary(n) {
binaryHelper('', n);
}
function binaryHelper(str, n) {
if (str.length === n) {
//base case
console.log(str); //print string
} else {
for (let bit = 0; bit < 2; bit++) {
str = str + bit; // choose
binaryHelper(str, n); // explore
str = str.slice(0, -1); // un-choose
}
}
}
console.log('Size 2 binary strings:');
binary(2);
console.log('Size 3 binary strings:');
binary(3);
You can re-write the code above like this, where you choose & un-choose by stateless transition from one loop iteration to another. This is less intuitive though.
function binary(n) {
binaryHelper('', n);
}
function binaryHelper(str, n) {
if(str.length === n) {
console.log(str);
} else {
for(let bit = 0; bit < 2; bit++) {
binaryHelper(str+bit, n);
}
}
}
console.log('Size 2 binary strings:');
binary(2);
console.log('Size 3 binary strings:');
binary(3);

Longest common prefix for n string

Given n string of max length m. How can we find the longest common prefix shared by at least two strings among them?
Example: ['flower', 'flow', 'hello', 'fleet']
Answer: fl
I was thinking of building a Trie for all the string and then checking the deepest node (satisfies longest) that branches out to two/more substrings (satisfies commonality). This takes O(n*m) time and space. Is there a better way to do this
Why to use trie(which takes O(mn) time and O(mn) space, just use the basic brute force way. first loop, find the shortest string as minStr, which takes o(n) time, second loop, compare one by one with this minStr, and keep an variable which indicates the rightmost index of minStr, this loop takes O(mn) where m is the shortest length of all strings. The code is like below,
public String longestCommonPrefix(String[] strs) {
if(strs.length==0) return "";
String minStr=strs[0];
for(int i=1;i<strs.length;i++){
if(strs[i].length()<minStr.length())
minStr=strs[i];
}
int end=minStr.length();
for(int i=0;i<strs.length;i++){
int j;
for( j=0;j<end;j++){
if(minStr.charAt(j)!=strs[i].charAt(j))
break;
}
if(j<end)
end=j;
}
return minStr.substring(0,end);
}
there is an O(|S|*n) solution to this problem, using a trie. [n is the number of strings, S is the longest string]
(1) put all strings in a trie
(2) do a DFS in the trie, until you find the first vertex with more than 1 "edge".
(3) the path from the root to the node you found at (2) is the longest common prefix.
There is no possible faster solution then it [in terms of big O notation], at the worst case, all your strings are identical - and you need to read all of them to know it.
I would sort them, which you can do in n lg n time. Then any strings with common prefixes will be right next to eachother. In fact you should be able to keep a pointer of which index you're currently looking at and work your way down for a pretty speedy computation.
As a completely different answer from my other answer...
You can, with one pass, bucket every string based on its first letter.
With another pass you can sort each bucket based on its second later. (This is known as radix sort, which is O(n*m), and O(n) with each pass.) This gives you a baseline prefix of 2.
You can safely remove from your dataset any elements that do not have a prefix of 2.
You can continue the radix sort, removing elements without a shared prefix of p, as p approaches m.
This will give you the same O(n*m) time that the trie approach does, but will always be faster than the trie since the trie must look at every character in every string (as it enters the structure), while this approach is only guaranteed to look at 2 characters per string, at which point it culls much of the dataset.
The worst case is still that every string is identical, which is why it shares the same big O notation, but will be faster in all cases as is guaranteed to use less comparisons since on any "non-worst-case" there are characters that never need to be visited.
public String longestCommonPrefix(String[] strs) {
if (strs == null || strs.length == 0)
return "";
char[] c_list = strs[0].toCharArray();
int len = c_list.length;
int j = 0;
for (int i = 1; i < strs.length; i++) {
for (j = 0; j < len && j < strs[i].length(); j++)
if (c_list[j] != strs[i].charAt(j))
break;
len = j;
}
return new String(c_list).substring(0, len);
}
It happens that the bucket sort (radix sort) described by corsiKa can be extended such that all strings are eventually placed alone in a bucket, and at that point, the LCP for such a lonely string is known. Further, the shustring of each string is also known; it is one longer than is the LCP. The bucket sort is defacto the construction of a suffix array but, only partially so. Those comparisons that are not performed (as described by corsiKa) indeed represent those portions of the suffix strings that are not added to the suffix array. Finally, this method allows for determination of not just the LCP and shustrings, but also one may easily find those subsequences that are not present within the string.
Since the world is obviously begging for an answer in Swift, here's mine ;)
func longestCommonPrefix(strings:[String]) -> String {
var commonPrefix = ""
var indices = strings.map { $0.startIndex}
outerLoop:
while true {
var toMatch: Character = "_"
for (whichString, f) in strings.enumerate() {
let cursor = indices[whichString]
if cursor == f.endIndex { break outerLoop }
indices[whichString] = cursor.successor()
if whichString == 0 { toMatch = f[cursor] }
if toMatch != f[cursor] { break outerLoop }
}
commonPrefix.append(toMatch)
}
return commonPrefix
}
Swift 3 Update:
func longestCommonPrefix(strings:[String]) -> String {
var commonPrefix = ""
var indices = strings.map { $0.startIndex}
outerLoop:
while true {
var toMatch: Character = "_"
for (whichString, f) in strings.enumerated() {
let cursor = indices[whichString]
if cursor == f.endIndex { break outerLoop }
indices[whichString] = f.characters.index(after: cursor)
if whichString == 0 { toMatch = f[cursor] }
if toMatch != f[cursor] { break outerLoop }
}
commonPrefix.append(toMatch)
}
return commonPrefix
}
What's interesting to note:
this runs in O^2, or O(n x m) where n is the number of strings and m
is the length of the shortest one.
this uses the String.Index data type and thus deals with Grapheme Clusters which the Character type represents.
And given the function I needed to write in the first place:
/// Takes an array of Strings representing file system objects absolute
/// paths and turn it into a new array with the minimum number of common
/// ancestors, possibly pushing the root of the tree as many level downwards
/// as necessary
///
/// In other words, we compute the longest common prefix and remove it
func reify(fullPaths:[String]) -> [String] {
let lcp = longestCommonPrefix(fullPaths)
return fullPaths.map {
return $0[lcp.endIndex ..< $0.endIndex]
}
}
here is a minimal unit test:
func testReifySimple() {
let samplePaths:[String] = [
"/root/some/file"
, "/root/some/other/file"
, "/root/another/file"
, "/root/direct.file"
]
let expectedPaths:[String] = [
"some/file"
, "some/other/file"
, "another/file"
, "direct.file"
]
let reified = PathUtilities().reify(samplePaths)
for (index, expected) in expectedPaths.enumerate(){
XCTAssert(expected == reified[index], "failed match, \(expected) != \(reified[index])")
}
}
Perhaps a more intuitive solution. Channel the already found prefix out of earlier iteration as input string to the remaining or next string input. [[[w1, w2], w3], w4]... so on], where [] is supposedly the LCP of two strings.
public String findPrefixBetweenTwo(String A, String B){
String ans = "";
for (int i = 0, j = 0; i < A.length() && j < B.length(); i++, j++){
if (A.charAt(i) != B.charAt(j)){
return i > 0 ? A.substring(0, i) : "";
}
}
// Either of the string is prefix of another one OR they are same.
return (A.length() > B.length()) ? B.substring(0, B.length()) : A.substring(0, A.length());
}
public String longestCommonPrefix(ArrayList<String> A) {
if (A.size() == 1) return A.get(0);
String prefix = A.get(0);
for (int i = 1; i < A.size(); i++){
prefix = findPrefixBetweenTwo(prefix, A.get(i)); // chain the earlier prefix
}
return prefix;
}

Find the first un-repeated character in a string

What is the quickest way to find the first character which only appears once in a string?
It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.
So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact the only values that matter for the count is "0", "1" or "more than 1").
When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.
Example code in Python:
def first_non_repeated_character(s):
counts = defaultdict(int)
l = []
for c in s:
counts[c] += 1
if counts[c] == 1:
l.append(c)
for c in l:
if counts[c] == 1:
return c
return None
This runs in O(n).
I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.
An idiomatic solution in Ruby
We can find the first un-repeated character in a string like so:
def first_unrepeated_char string
string.each_char.tally.find { |_, n| n == 1 }.first
end
How does Ruby accomplish this?
Reading Ruby's source
Let's break down the solution and consider what algorithms Ruby uses for each step.
First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.
The each_char method is implemented like so:
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
In turn, rb_string_enumerate_chars is implemented as:
rb_str_enumerate_chars(VALUE str, VALUE ary)
{
VALUE orig = str;
long i, len, n;
const char *ptr;
rb_encoding *enc;
str = rb_str_new_frozen(str);
ptr = RSTRING_PTR(str);
len = RSTRING_LEN(str);
enc = rb_enc_get(str);
if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
for (i = 0; i < len; i += n) {
n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
else {
for (i = 0; i < len; i += n) {
n = rb_enc_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
RB_GC_GUARD(str);
if (ary)
return ary;
else
return orig;
}
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length (in bytes) of the next character in the string so that it can iterate the next step. By lazily iterating over a string, reading just one character at a time, we end up doing just one full pass over the input string as tally consumes the iterator.
Tally is then implemented like so:
static void
tally_up(VALUE hash, VALUE group)
{
VALUE tally = rb_hash_aref(hash, group);
if (NIL_P(tally)) {
tally = INT2FIX(1);
}
else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
tally += INT2FIX(1) & ~FIXNUM_FLAG;
}
else {
tally = rb_big_plus(tally, INT2FIX(1));
}
rb_hash_aset(hash, group, tally);
}
static VALUE
tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
{
ENUM_WANT_SVALUE();
tally_up(hash, i);
return Qnil;
}
Here, tally_i uses RB_BLOCK_CALL_FUNC_ARGLIST to call repeatedly to tally_up, which updates the tally hash on every iteration.
Rough time & memory analysis
The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it which in the worst case scenario can take up as much memory as the input string times some constant factor.
Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character will scan the hash again, each of which carry O(n) worst-case complexity.
However, tally also updates a hash on every iteration. Updating the hash on every character can be as slow as O(n) again, so the worst case complexity of this Ruby solution is perhaps O(n^2).
However, under reasonable assumptions, updating a hash has an O(1) complexity, so we can expect the average case amortized to look like O(n).
My old accepted answer in Python
You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:
def first_non_repeated_character(string):
chars = []
repeated = []
for character in string:
if character in chars:
chars.remove(character)
repeated.append(character)
else:
if not character in repeated:
chars.append(character)
if len(chars):
return chars[0]
else:
return False
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computerâ„¢.
Why not use a heap based data structure such as a minimum priority queue. As you read each character from the string, add it to the queue with a priority based on the location in the string and the number of occurrences so far. You could modify the queue to add priorities on collision so that the priority of a character is the sum of the number appearances of that character. At the end of the loop, the first element in the queue will be the least frequent character in the string and if there are multiple characters with a count == 1, the first element was the first unique character added to the queue.
Here is another fun way to do it. Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
... return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
Lots of answers are attempting O(n) but are forgetting the actual costs of inserting and removing from the lists/associative arrays/sets they're using to track.
If you can assume that a char is a single byte, then you use a simple array indexed by the char and keep a count in it. This is truly O(n) because the array accesses are guaranteed O(1), and the final pass over the array to find the first element with 1 is constant time (because the array has a small, fixed size).
If you can't assume that a char is a single byte, then I would propose sorting the string and then doing a single pass checking adjacent values. This would be O(n log n) for the sort plus O(n) for the final pass. So it's effectively O(n log n), which is better than O(n^2). Also, it has virtually no space overhead, which is another problem with many of the answers that are attempting O(n).
Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
... counts = Counter(s)
... for c in s:
... if counts[c]==1:
... return c
... return None
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
Refactoring a solution proposed earlier (not having to use extra list/memory). This goes over the string twice. So this takes O(n) too like the original solution.
def first_non_repeated_character(s):
counts = defaultdict(int)
for c in s:
counts[c] += 1
for c in s:
if counts[c] == 1:
return c
return None
The following is a Ruby implementation of finding the first nonrepeated character of a string:
def first_non_repeated_character(string)
string1 = string.split('')
string2 = string.split('')
string1.each do |let1|
counter = 0
string2.each do |let2|
if let1 == let2
counter+=1
end
end
if counter == 1
return let1
break
end
end
end
p first_non_repeated_character('dont doddle in the forest')
And here is a JavaScript implementation of the same style function:
var first_non_repeated_character = function (string) {
var string1 = string.split('');
var string2 = string.split('');
var single_letters = [];
for (var i = 0; i < string1.length; i++) {
var count = 0;
for (var x = 0; x < string2.length; x++) {
if (string1[i] == string2[x]) {
count++
}
}
if (count == 1) {
return string1[i];
}
}
}
console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));
In both cases I used a counter knowing that if the letter is not matched anywhere in the string, it will only occur in the string once so I just count it's occurrence.
I think this should do it in C. This operates in O(n) time with no ambiguity about order of insertion and deletion operators. This is a counting sort (simplest form of a bucket sort, which itself is the simple form of a radix sort).
unsigned char find_first_unique(unsigned char *string)
{
int chars[256];
int i=0;
memset(chars, 0, sizeof(chars));
while (string[i++])
{
chars[string[i]]++;
}
i = 0;
while (string[i++])
{
if (chars[string[i]] == 1) return string[i];
}
return 0;
}
In Ruby:
(Original Credit: Andrew A. Smith)
x = "a huge string in which some characters repeat"
def first_unique_character(s)
s.each_char.detect { |c| s.count(c) == 1 }
end
first_unique_character(x)
=> "u"
def first_non_repeated_character(string):
chars = []
repeated = []
for character in string:
if character in repeated:
... discard it.
else if character in chars:
chars.remove(character)
repeated.append(character)
else:
if not character in repeated:
chars.append(character)
if len(chars):
return chars[0]
else:
return False
Other JavaScript solutions are quite c-style solutions here is a more JavaScript-style solution.
var arr = string.split("");
var occurences = {};
var tmp;
var lowestindex = string.length+1;
arr.forEach( function(c){
tmp = c;
if( typeof occurences[tmp] == "undefined")
occurences[tmp] = tmp;
else
occurences[tmp] += tmp;
});
for(var p in occurences) {
if(occurences[p].length == 1)
lowestindex = Math.min(lowestindex, string.indexOf(p));
}
if(lowestindex > string.length)
return null;
return string[lowestindex];
}
in C, this is almost Shlemiel the Painter's Algorithm (not quite O(n!) but more than 0(n2)).
But will outperform "better" algorithms for reasonably sized strings because O is so small. This can also easily tell you the location of the first non-repeating string.
char FirstNonRepeatedChar(char * psz)
{
for (int ii = 0; psz[ii] != 0; ++ii)
{
for (int jj = ii+1; ; ++jj)
{
// if we hit the end of string, then we found a non-repeat character.
//
if (psz[jj] == 0)
return psz[ii]; // this character doesn't repeat
// if we found a repeat character, we can stop looking.
//
if (psz[ii] == psz[jj])
break;
}
}
return 0; // there were no non-repeating characters.
}
edit: this code is assuming you don't mean consecutive repeating characters.
Here's an implementation in Perl (version >=5.10) that doesn't care whether the repeated characters are consecutive or not:
use strict;
use warnings;
foreach my $word(#ARGV)
{
my #distinct_chars;
my %char_counts;
my #chars=split(//,$word);
foreach (#chars)
{
push #distinct_chars,$_ unless $_~~#distinct_chars;
$char_counts{$_}++;
}
my $first_non_repeated="";
foreach(#distinct_chars)
{
if($char_counts{$_}==1)
{
$first_non_repeated=$_;
last;
}
}
if(length($first_non_repeated))
{
print "For \"$word\", the first non-repeated character is '$first_non_repeated'.\n";
}
else
{
print "All characters in \"$word\" are repeated.\n";
}
}
Storing this code in a script (which I named non_repeated.pl) and running it on a few inputs produces:
jmaney> perl non_repeated.pl aabccd "a huge string in which some characters repeat" abcabc
For "aabccd", the first non-repeated character is 'b'.
For "a huge string in which some characters repeat", the first non-repeated character is 'u'.
All characters in "abcabc" are repeated.
Here's a possible solution in ruby without using Array#detect (as in this answer). Using Array#detect makes it too easy, I think.
ALPHABET = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
def fnr(s)
unseen_chars = ALPHABET.dup
seen_once_chars = []
s.each_char do |c|
if unseen_chars.include?(c)
unseen_chars.delete(c)
seen_once_chars << c
elsif seen_once_chars.include?(c)
seen_once_chars.delete(c)
end
end
seen_once_chars.first
end
Seems to work for some simple examples:
fnr "abcdabcegghh"
# => "d"
fnr "abababababababaqababa"
=> "q"
Suggestions and corrections are very much appreciated!
Try this code:
public static String findFirstUnique(String str)
{
String unique = "";
foreach (char ch in str)
{
if (unique.Contains(ch)) unique=unique.Replace(ch.ToString(), "");
else unique += ch.ToString();
}
return unique[0].ToString();
}
In Mathematica one might write this:
string = "conservationist deliberately treasures analytical";
Cases[Gather # Characters # string, {_}, 1, 1][[1]]
{"v"}
This snippet code in JavaScript
var string = "tooth";
var hash = [];
for(var i=0; j=string.length, i<j; i++){
if(hash[string[i]] !== undefined){
hash[string[i]] = hash[string[i]] + 1;
}else{
hash[string[i]] = 1;
}
}
for(i=0; j=string.length, i<j; i++){
if(hash[string[i]] === 1){
console.info( string[i] );
return false;
}
}
// prints "h"
Different approach here.
scan each element in the string and create a count array which stores the repetition count of each element.
Next time again start from first element in the array and print the first occurrence of element with count = 1
C code
-----
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char t_c;
char *t_p = argv[1] ;
char count[128]={'\0'};
char ch;
for(t_c = *(argv[1]); t_c != '\0'; t_c = *(++t_p))
count[t_c]++;
t_p = argv[1];
for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
{
if(count[t_c] == 1)
{
printf("Element is %c\n",t_c);
break;
}
}
return 0;
}
input is = aabbcddeef output is = c
char FindUniqueChar(char *a)
{
int i=0;
bool repeat=false;
while(a[i] != '\0')
{
if (a[i] == a[i+1])
{
repeat = true;
}
else
{
if(!repeat)
{
cout<<a[i];
return a[i];
}
repeat=false;
}
i++;
}
return a[i];
}
Here is another approach...we could have a array which will store the count and the index of the first occurrence of the character. After filling up the array we could jst traverse the array and find the MINIMUM index whose count is 1 then return str[index]
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <climits>
using namespace std;
#define No_of_chars 256
//store the count and the index where the char first appear
typedef struct countarray
{
int count;
int index;
}countarray;
//returns the count array
countarray *getcountarray(char *str)
{
countarray *count;
count=new countarray[No_of_chars];
for(int i=0;i<No_of_chars;i++)
{
count[i].count=0;
count[i].index=-1;
}
for(int i=0;*(str+i);i++)
{
(count[*(str+i)].count)++;
if(count[*(str+i)].count==1) //if count==1 then update the index
count[*(str+i)].index=i;
}
return count;
}
char firstnonrepeatingchar(char *str)
{
countarray *array;
array = getcountarray(str);
int result = INT_MAX;
for(int i=0;i<No_of_chars;i++)
{
if(array[i].count==1 && result > array[i].index)
result = array[i].index;
}
delete[] (array);
return (str[result]);
}
int main()
{
char str[] = "geeksforgeeks";
cout<<"First non repeating character is "<<firstnonrepeatingchar(str)<<endl;
return 0;
}
Function:
This c# function uses a HashTable (Dictionary) and have a performance O(2n) worstcase.
private static string FirstNoRepeatingCharacter(string aword)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < aword.Length; i++)
{
if (!dic.ContainsKey(aword.Substring(i, 1)))
dic.Add(aword.Substring(i, 1), 1);
else
dic[aword.Substring(i, 1)]++;
}
foreach (var item in dic)
{
if (item.Value == 1) return item.Key;
}
return string.Empty;
}
Example:
string aword = "TEETER";
Console.WriteLine(FirstNoRepeatingCharacter(aword)); //print: R
I have two strings i.e. 'unique' and 'repeated'. Every character appearing for the first time, gets added to 'unique'. If it is repeated for the second time, it gets removed from 'unique' and added to 'repeated'. This way, we will always have a string of unique characters in 'unique'.
Complexity big O(n)
public void firstUniqueChar(String str){
String unique= "";
String repeated = "";
str = str.toLowerCase();
for(int i=0; i<str.length();i++){
char ch = str.charAt(i);
if(!(repeated.contains(str.subSequence(i, i+1))))
if(unique.contains(str.subSequence(i, i+1))){
unique = unique.replaceAll(Character.toString(ch), "");
repeated = repeated+ch;
}
else
unique = unique+ch;
}
System.out.println(unique.charAt(0));
}
The following code is in C# with complexity of n.
using System;
using System.Linq;
using System.Text;
namespace SomethingDigital
{
class FirstNonRepeatingChar
{
public static void Main()
{
String input = "geeksforgeeksandgeeksquizfor";
char[] str = input.ToCharArray();
bool[] b = new bool[256];
String unique1 = "";
String unique2 = "";
foreach (char ch in str)
{
if (!unique1.Contains(ch))
{
unique1 = unique1 + ch;
unique2 = unique2 + ch;
}
else
{
unique2 = unique2.Replace(ch.ToString(), "");
}
}
if (unique2 != "")
{
Console.WriteLine(unique2[0].ToString());
Console.ReadLine();
}
else
{
Console.WriteLine("No non repeated string");
Console.ReadLine();
}
}
}
}
The following solution is an elegant way to find the first unique character within a string using the new features which have been introduced as part as Java 8. This solution uses the approach of first creating a map to count the number of occurrences of each character. It then uses this map to find the first character which occurs only once. This runs in O(N) time.
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Runs in O(N) time and uses lambdas and the stream API from Java 8
// Also, it is only three lines of code!
private static String findFirstUniqueCharacterPerformantWithLambda(String inputString) {
// convert the input string into a list of characters
final List<String> inputCharacters = Arrays.asList(inputString.split(""));
// first, construct a map to count the number of occurrences of each character
final Map<Object, Long> characterCounts = inputCharacters
.stream()
.collect(groupingBy(s -> s, counting()));
// then, find the first unique character by consulting the count map
return inputCharacters
.stream()
.filter(s -> characterCounts.get(s) == 1)
.findFirst()
.orElse(null);
}
Here is one more solution with o(n) time complexity.
public void findUnique(String string) {
ArrayList<Character> uniqueList = new ArrayList<>();
int[] chatArr = new int[128];
for (int i = 0; i < string.length(); i++) {
Character ch = string.charAt(i);
if (chatArr[ch] != -1) {
chatArr[ch] = -1;
uniqueList.add(ch);
} else {
uniqueList.remove(ch);
}
}
if (uniqueList.size() == 0) {
System.out.println("No unique character found!");
} else {
System.out.println("First unique character is :" + uniqueList.get(0));
}
}
I read through the answers, but did not see any like mine, I think this answer is very simple and fast, am I wrong?
def first_unique(s):
repeated = []
while s:
if s[0] not in s[1:] and s[0] not in repeated:
return s[0]
else:
repeated.append(s[0])
s = s[1:]
return None
test
(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')
Question : First Unique Character of a String
This is the simplest solution.
public class Test4 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int[] count = new int[256];
for (int i = 0; i < a.length(); i++) {
count[a.charAt(i)]++;
}
int index = -1;
for (int i = 0; i < a.length(); i++) {
if (count[a.charAt(i)] == 1) {
index = i;
break;
} // if
}
System.out.println(index);// output => 8
System.out.println(a.charAt(index)); //output => P
}// end1
}
IN Python :
def firstUniqChar(a):
count = [0] * 256
for i in a: count[ord(i)] += 1
element = ""
for items in a:
if(count[ord(items) ] == 1):
element = items ;
break
return element
a = "GiniGinaProtijayi";
print(firstUniqChar(a)) # output is P
Using Java 8 :
public class Test2 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
Map<Character, Long> map = a.chars()
.mapToObj(
ch -> Character.valueOf((char) ch)
).collect(
Collectors.groupingBy(
Function.identity(),
LinkedHashMap::new,
Collectors.counting()));
System.out.println("MAP => " + map);
// {G=2, i=5, n=2, a=2, P=1, r=1, o=1, t=1, j=1, y=1}
Character chh = map
.entrySet()
.stream()
.filter(entry -> entry.getValue() == 1L)
.map(entry -> entry.getKey())
.findFirst()
.get();
System.out.println("First Non Repeating Character => " + chh);// P
}// main
}
how about using a suffix tree for this case... the first unrepeated character will be first character of longest suffix string with least depth in tree..
Create Two list -
unique list - having only unique character .. UL
non-unique list - having only repeated character -NUL
for(char c in str) {
if(nul.contains(c)){
//do nothing
}else if(ul.contains(c)){
ul.remove(c);
nul.add(c);
}else{
nul.add(c);
}

Resources