Find the first un-repeated character in a string - algorithm

What is the quickest way to find the first character which only appears once in a string?

It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.
So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact, the only values that matter for the count are "0", "1" or "more than 1").
When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.
Example code in Python:
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []  # characters in order of first appearance
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)
    for c in l:
        if counts[c] == 1:
            return c
    return None
This runs in O(n).

I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.
An idiomatic solution in Ruby
We can find the first un-repeated character in a string like so:
def first_unrepeated_char string
  # &. returns nil instead of raising when every character repeats
  string.each_char.tally.find { |_, n| n == 1 }&.first
end
How does Ruby accomplish this?
Reading Ruby's source
Let's break down the solution and consider what algorithms Ruby uses for each step.
First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.
The each_char method is implemented like so:
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
In turn, rb_str_enumerate_chars is implemented as:
rb_str_enumerate_chars(VALUE str, VALUE ary)
{
VALUE orig = str;
long i, len, n;
const char *ptr;
rb_encoding *enc;
str = rb_str_new_frozen(str);
ptr = RSTRING_PTR(str);
len = RSTRING_LEN(str);
enc = rb_enc_get(str);
if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
for (i = 0; i < len; i += n) {
n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
else {
for (i = 0; i < len; i += n) {
n = rb_enc_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
RB_GC_GUARD(str);
if (ary)
return ary;
else
return orig;
}
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length (in bytes) of the next character in the string so that it can iterate the next step. By lazily iterating over a string, reading just one character at a time, we end up doing just one full pass over the input string as tally consumes the iterator.
Tally is then implemented like so:
static void
tally_up(VALUE hash, VALUE group)
{
VALUE tally = rb_hash_aref(hash, group);
if (NIL_P(tally)) {
tally = INT2FIX(1);
}
else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
tally += INT2FIX(1) & ~FIXNUM_FLAG;
}
else {
tally = rb_big_plus(tally, INT2FIX(1));
}
rb_hash_aset(hash, group, tally);
}
static VALUE
tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
{
ENUM_WANT_SVALUE();
tally_up(hash, i);
return Qnil;
}
Here, tally_i is the callback invoked for each element of the enumeration (the RB_BLOCK_CALL_FUNC_ARGLIST macro merely declares its parameter list); it calls tally_up, which updates the tally hash on every iteration.
Rough time & memory analysis
The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it which in the worst case scenario can take up as much memory as the input string times some constant factor.
Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character will scan the hash again, each of which carries O(n) worst-case complexity.
However, tally also updates a hash on every iteration. Updating the hash on every character can be as slow as O(n) again, so the worst case complexity of this Ruby solution is perhaps O(n^2).
However, under reasonable assumptions, updating a hash has an O(1) complexity, so we can expect the average case amortized to look like O(n).
My old accepted answer in Python
You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:
def first_non_repeated_character(string):
    chars = []     # characters seen exactly once so far, in order
    repeated = []  # characters seen more than once
    for character in string:
        if character in chars:
            chars.remove(character)
            repeated.append(character)
        elif character not in repeated:
            chars.append(character)
    if chars:
        return chars[0]
    return False
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

Why not use a heap-based data structure such as a minimum priority queue? As you read each character from the string, add it to the queue with a priority based on the location in the string and the number of occurrences so far. You could modify the queue to add priorities on collision so that the priority of a character is the sum of the number of appearances of that character. At the end of the loop, the first element in the queue will be the least frequent character in the string, and if there are multiple characters with a count == 1, the first element was the first unique character added to the queue.

Here is another fun way to do it. Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Lots of answers are attempting O(n) but are forgetting the actual costs of inserting and removing from the lists/associative arrays/sets they're using to track.
If you can assume that a char is a single byte, then you use a simple array indexed by the char and keep a count in it. This is truly O(n) because the array accesses are guaranteed O(1), and the final pass over the array to find the first element with 1 is constant time (because the array has a small, fixed size).
If you can't assume that a char is a single byte, then I would propose sorting the string and then doing a single pass checking adjacent values. This would be O(n log n) for the sort plus O(n) for the final pass. So it's effectively O(n log n), which is better than O(n^2). Also, it has virtually no space overhead, which is another problem with many of the answers that are attempting O(n).
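For the single-byte case described above, a minimal Python sketch might look like the following. It assumes the input is already a bytes object, and the name first_unique_byte is only illustrative. Note that the sketch makes its second pass over the input rather than over the table, so that the result respects the order of the string.

```python
def first_unique_byte(data: bytes):
    """Return the first byte value (as an int) that occurs exactly once, or None."""
    counts = [0] * 256      # one slot per possible byte value (assumption: 8-bit chars)
    for b in data:          # iterating over bytes yields ints in 0..255
        counts[b] += 1
    for b in data:          # second pass, in original string order
        if counts[b] == 1:
            return b
    return None


result = first_unique_byte(b"aaabbbcddd")
print(chr(result) if result is not None else None)
```

The table has a fixed size, so the extra memory does not grow with the length of the input.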

Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Refactoring a solution proposed earlier (not having to use extra list/memory). This goes over the string twice. So this takes O(n) too like the original solution.
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    for c in s:
        counts[c] += 1
    for c in s:
        if counts[c] == 1:
            return c
    return None

The following is a Ruby implementation of finding the first nonrepeated character of a string:
def first_non_repeated_character(string)
  string1 = string.split('')
  string2 = string.split('')
  string1.each do |let1|
    counter = 0
    string2.each do |let2|
      if let1 == let2
        counter+=1
      end
    end
    if counter == 1
      return let1
    end
  end
  nil # no character occurs exactly once
end
p first_non_repeated_character('dont doddle in the forest')
And here is a JavaScript implementation of the same style function:
var first_non_repeated_character = function (string) {
var string1 = string.split('');
var string2 = string.split('');
var single_letters = [];
for (var i = 0; i < string1.length; i++) {
var count = 0;
for (var x = 0; x < string2.length; x++) {
if (string1[i] == string2[x]) {
count++
}
}
if (count == 1) {
return string1[i];
}
}
}
console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));
In both cases I used a counter: if a letter is not matched anywhere else in the string, it occurs exactly once, so I just count its occurrences.

I think this should do it in C. This operates in O(n) time with no ambiguity about order of insertion and deletion operators. This is a counting sort (simplest form of a bucket sort, which itself is the simple form of a radix sort).
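#include <string.h>

unsigned char find_first_unique(unsigned char *string)
{
    int chars[256];
    int i;
    memset(chars, 0, sizeof(chars));
    /* first pass: count occurrences of every byte value */
    for (i = 0; string[i]; i++)
    {
        chars[string[i]]++;
    }
    /* second pass: return the first byte whose count is exactly 1 */
    for (i = 0; string[i]; i++)
    {
        if (chars[string[i]] == 1) return string[i];
    }
    return 0;
}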

In Ruby:
(Original Credit: Andrew A. Smith)
x = "a huge string in which some characters repeat"
def first_unique_character(s)
s.each_char.detect { |c| s.count(c) == 1 }
end
first_unique_character(x)
=> "u"

def first_non_repeated_character(string):
    chars = []
    repeated = []
    for character in string:
        if character in repeated:
            continue  # already known to repeat; discard it
        elif character in chars:
            chars.remove(character)
            repeated.append(character)
        else:
            chars.append(character)
    if len(chars):
        return chars[0]
    else:
        return False

The other JavaScript solutions here are quite C-style, so here is a more JavaScript-style solution.
function firstNonRepeated(string) { // function name assumed; the original snippet omitted its header
    var arr = string.split("");
    var occurences = {};
    var tmp;
    var lowestindex = string.length+1;
    arr.forEach( function(c){
        tmp = c;
        if( typeof occurences[tmp] == "undefined")
            occurences[tmp] = tmp;
        else
            occurences[tmp] += tmp;
    });
    for(var p in occurences) {
        if(occurences[p].length == 1)
            lowestindex = Math.min(lowestindex, string.indexOf(p));
    }
    if(lowestindex > string.length)
        return null;
    return string[lowestindex];
}

In C, this is almost Shlemiel the Painter's Algorithm: O(n^2) in the worst case.
But it will outperform "better" algorithms for reasonably sized strings because the constant factor is so small. It can also easily tell you the location of the first non-repeating character.
char FirstNonRepeatedChar(char * psz)
{
    for (int ii = 0; psz[ii] != 0; ++ii)
    {
        // compare against every other position, not only the later ones, so that
        // the last copy of an already-seen character is not reported as unique.
        for (int jj = 0; ; ++jj)
        {
            // if we hit the end of string, then we found a non-repeat character.
            //
            if (psz[jj] == 0)
                return psz[ii]; // this character doesn't repeat
            // if we found a repeat character, we can stop looking.
            //
            if (jj != ii && psz[ii] == psz[jj])
                break;
        }
    }
    return 0; // there were no non-repeating characters.
}
edit: this code is assuming you don't mean consecutive repeating characters.

Here's an implementation in Perl (version >=5.10) that doesn't care whether the repeated characters are consecutive or not:
use strict;
use warnings;
foreach my $word(@ARGV)
{
    my @distinct_chars;
    my %char_counts;
    my @chars=split(//,$word);
    foreach (@chars)
    {
        push @distinct_chars,$_ unless $_~~@distinct_chars;
        $char_counts{$_}++;
    }
    my $first_non_repeated="";
    foreach(@distinct_chars)
    {
        if($char_counts{$_}==1)
        {
            $first_non_repeated=$_;
            last;
        }
    }
    if(length($first_non_repeated))
    {
        print "For \"$word\", the first non-repeated character is '$first_non_repeated'.\n";
    }
    else
    {
        print "All characters in \"$word\" are repeated.\n";
    }
}
Storing this code in a script (which I named non_repeated.pl) and running it on a few inputs produces:
jmaney> perl non_repeated.pl aabccd "a huge string in which some characters repeat" abcabc
For "aabccd", the first non-repeated character is 'b'.
For "a huge string in which some characters repeat", the first non-repeated character is 'u'.
All characters in "abcabc" are repeated.

Here's a possible solution in ruby without using Array#detect (as in this answer). Using Array#detect makes it too easy, I think.
ALPHABET = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
def fnr(s)
unseen_chars = ALPHABET.dup
seen_once_chars = []
s.each_char do |c|
if unseen_chars.include?(c)
unseen_chars.delete(c)
seen_once_chars << c
elsif seen_once_chars.include?(c)
seen_once_chars.delete(c)
end
end
seen_once_chars.first
end
Seems to work for some simple examples:
fnr "abcdabcegghh"
# => "d"
fnr "abababababababaqababa"
=> "q"
Suggestions and corrections are very much appreciated!

Try this code:
public static String findFirstUnique(String str)
{
String unique = "";
foreach (char ch in str)
{
if (unique.Contains(ch)) unique=unique.Replace(ch.ToString(), "");
else unique += ch.ToString();
}
return unique[0].ToString();
}

In Mathematica one might write this:
string = "conservationist deliberately treasures analytical";
Cases[Gather @ Characters @ string, {_}, 1, 1][[1]]
{"v"}

Here is a snippet in JavaScript:
var string = "tooth";
var hash = {};
for(var i=0, j=string.length; i<j; i++){
    if(hash[string[i]] !== undefined){
        hash[string[i]] = hash[string[i]] + 1;
    }else{
        hash[string[i]] = 1;
    }
}
for(i=0; i<j; i++){
    if(hash[string[i]] === 1){
        console.info( string[i] );
        break; // stop at the first character that occurs once
    }
}
// prints "h"

Different approach here.
Scan each element in the string and build a count array which stores the repetition count of each element.
Then start again from the first element of the string and print the first element whose count = 1.
C code
-----
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char t_c;
char *t_p = argv[1] ;
char count[128]={'\0'};
char ch;
for(t_c = *(argv[1]); t_c != '\0'; t_c = *(++t_p))
count[t_c]++;
t_p = argv[1];
for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
{
if(count[t_c] == 1)
{
printf("Element is %c\n",t_c);
break;
}
}
return 0;
}

input is = aabbcddeef output is = c
char FindUniqueChar(char *a)
{
int i=0;
bool repeat=false;
while(a[i] != '\0')
{
if (a[i] == a[i+1])
{
repeat = true;
}
else
{
if(!repeat)
{
cout<<a[i];
return a[i];
}
repeat=false;
}
i++;
}
return a[i];
}

Here is another approach: we could have an array which stores the count and the index of the first occurrence of each character. After filling the array, we could just traverse it, find the minimum index whose count is 1, and return str[index].
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <climits>
using namespace std;
#define No_of_chars 256
//store the count and the index where the char first appear
typedef struct countarray
{
int count;
int index;
}countarray;
//returns the count array
countarray *getcountarray(char *str)
{
countarray *count;
count=new countarray[No_of_chars];
for(int i=0;i<No_of_chars;i++)
{
count[i].count=0;
count[i].index=-1;
}
for(int i=0;*(str+i);i++)
{
(count[*(str+i)].count)++;
if(count[*(str+i)].count==1) //if count==1 then update the index
count[*(str+i)].index=i;
}
return count;
}
char firstnonrepeatingchar(char *str)
{
countarray *array;
array = getcountarray(str);
int result = INT_MAX;
for(int i=0;i<No_of_chars;i++)
{
if(array[i].count==1 && result > array[i].index)
result = array[i].index;
}
delete[] (array);
return (str[result]);
}
int main()
{
char str[] = "geeksforgeeks";
cout<<"First non repeating character is "<<firstnonrepeatingchar(str)<<endl;
return 0;
}

Function:
This C# function uses a hash table (Dictionary) and runs in O(n): in the worst case it makes two passes, one over the string and one over the dictionary.
private static string FirstNoRepeatingCharacter(string aword)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < aword.Length; i++)
{
if (!dic.ContainsKey(aword.Substring(i, 1)))
dic.Add(aword.Substring(i, 1), 1);
else
dic[aword.Substring(i, 1)]++;
}
foreach (var item in dic)
{
if (item.Value == 1) return item.Key;
}
return string.Empty;
}
Example:
string aword = "TEETER";
Console.WriteLine(FirstNoRepeatingCharacter(aword)); //print: R

I have two strings i.e. 'unique' and 'repeated'. Every character appearing for the first time, gets added to 'unique'. If it is repeated for the second time, it gets removed from 'unique' and added to 'repeated'. This way, we will always have a string of unique characters in 'unique'.
Complexity big O(n)
public void firstUniqueChar(String str){
String unique= "";
String repeated = "";
str = str.toLowerCase();
for(int i=0; i<str.length();i++){
char ch = str.charAt(i);
if(!(repeated.contains(str.subSequence(i, i+1))))
if(unique.contains(str.subSequence(i, i+1))){
unique = unique.replaceAll(Character.toString(ch), "");
repeated = repeated+ch;
}
else
unique = unique+ch;
}
System.out.println(unique.charAt(0));
}

The following code is in C# with complexity of n.
using System;
using System.Linq;
using System.Text;
namespace SomethingDigital
{
class FirstNonRepeatingChar
{
public static void Main()
{
String input = "geeksforgeeksandgeeksquizfor";
char[] str = input.ToCharArray();
bool[] b = new bool[256];
String unique1 = "";
String unique2 = "";
foreach (char ch in str)
{
if (!unique1.Contains(ch))
{
unique1 = unique1 + ch;
unique2 = unique2 + ch;
}
else
{
unique2 = unique2.Replace(ch.ToString(), "");
}
}
if (unique2 != "")
{
Console.WriteLine(unique2[0].ToString());
Console.ReadLine();
}
else
{
Console.WriteLine("No non repeated string");
Console.ReadLine();
}
}
}
}

The following solution is an elegant way to find the first unique character within a string using the new features which have been introduced as part of Java 8. This solution uses the approach of first creating a map to count the number of occurrences of each character. It then uses this map to find the first character which occurs only once. This runs in O(N) time.
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Runs in O(N) time and uses lambdas and the stream API from Java 8
// Also, it is only three lines of code!
private static String findFirstUniqueCharacterPerformantWithLambda(String inputString) {
// convert the input string into a list of characters
final List<String> inputCharacters = Arrays.asList(inputString.split(""));
// first, construct a map to count the number of occurrences of each character
final Map<Object, Long> characterCounts = inputCharacters
.stream()
.collect(groupingBy(s -> s, counting()));
// then, find the first unique character by consulting the count map
return inputCharacters
.stream()
.filter(s -> characterCounts.get(s) == 1)
.findFirst()
.orElse(null);
}

Here is one more solution with O(n) time complexity.
public void findUnique(String string) {
ArrayList<Character> uniqueList = new ArrayList<>();
int[] chatArr = new int[128];
for (int i = 0; i < string.length(); i++) {
Character ch = string.charAt(i);
if (chatArr[ch] != -1) {
chatArr[ch] = -1;
uniqueList.add(ch);
} else {
uniqueList.remove(ch);
}
}
if (uniqueList.size() == 0) {
System.out.println("No unique character found!");
} else {
System.out.println("First unique character is :" + uniqueList.get(0));
}
}

I read through the answers, but did not see any like mine, I think this answer is very simple and fast, am I wrong?
def first_unique(s):
    repeated = []
    while s:
        if s[0] not in s[1:] and s[0] not in repeated:
            return s[0]
        else:
            repeated.append(s[0])
            s = s[1:]
    return None
test
(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')

Question : First Unique Character of a String
This is the simplest solution.
public class Test4 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int[] count = new int[256];
for (int i = 0; i < a.length(); i++) {
count[a.charAt(i)]++;
}
int index = -1;
for (int i = 0; i < a.length(); i++) {
if (count[a.charAt(i)] == 1) {
index = i;
break;
} // if
}
System.out.println(index);// output => 8
System.out.println(a.charAt(index)); //output => P
}// end1
}
In Python:
def firstUniqChar(a):
    count = [0] * 256  # assumes every character code is below 256
    for i in a:
        count[ord(i)] += 1
    element = ""
    for items in a:
        if count[ord(items)] == 1:
            element = items
            break
    return element

a = "GiniGinaProtijayi"
print(firstUniqChar(a)) # output is P
Using Java 8 :
public class Test2 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
Map<Character, Long> map = a.chars()
.mapToObj(
ch -> Character.valueOf((char) ch)
).collect(
Collectors.groupingBy(
Function.identity(),
LinkedHashMap::new,
Collectors.counting()));
System.out.println("MAP => " + map);
// {G=2, i=5, n=2, a=2, P=1, r=1, o=1, t=1, j=1, y=1}
Character chh = map
.entrySet()
.stream()
.filter(entry -> entry.getValue() == 1L)
.map(entry -> entry.getKey())
.findFirst()
.get();
System.out.println("First Non Repeating Character => " + chh);// P
}// main
}

how about using a suffix tree for this case... the first unrepeated character will be first character of longest suffix string with least depth in tree..

Create two lists:
unique list (UL) - holds only the characters seen exactly once so far
non-unique list (NUL) - holds only the repeated characters
for(char c in str) {
    if(nul.contains(c)){
        //do nothing
    }else if(ul.contains(c)){
        ul.remove(c);
        nul.add(c);
    }else{
        ul.add(c); // first time we see c
    }
}
// after the loop, the first element of ul (if any) is the answer
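The two-list idea above can be sketched in Python as follows. This is only one possible reading of the pseudocode: it assumes Python 3.7+, where a plain dict keeps insertion order and can stand in for the unique list, and the name first_unique_two_lists is made up for illustration.

```python
def first_unique_two_lists(s):
    """Return the first character of s that occurs exactly once, or None."""
    unique = {}        # insertion-ordered "list" of characters seen exactly once
    repeated = set()   # characters seen at least twice
    for c in s:
        if c in repeated:
            continue               # already known to repeat: do nothing
        if c in unique:
            del unique[c]          # second sighting: move from unique to repeated
            repeated.add(c)
        else:
            unique[c] = None       # first sighting
    return next(iter(unique), None)


print(first_unique_two_lists("abdcab"))
```

Using a dict and a set keeps each membership test and removal at expected constant time, whereas plain lists would make every step linear in the size of the list.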

Related

Given a string, find its first non-repeating character in only One scan

Given a string, find the first non-repeating character in it. For
example, if the input string is “GeeksforGeeks”, then output should be
‘f’.
We can use string characters as index and build a count array.
Following is the algorithm.
Scan the string from left to right and construct the count array or
HashMap.
Again, scan the string from left to right and check for
count of each character, if you find an element whose count is 1,
return it.
The above problem and algorithm are from GeeksForGeeks.
But it requires two scans of the string. I want to find the first non-repeating character in only one scan.
I implemented the above algorithm. Please check it on Ideone as well:
import java.util.HashMap;
import java.util.Scanner;
/**
*
* @author Neelabh
*/
public class FirstNonRepeatedCharacter {
public static void main(String [] args){
Scanner scan=new Scanner(System.in);
String string=scan.next();
int len=string.length();
HashMap<Character, Integer> hashMap=new HashMap<Character, Integer>();
//First Scan
for(int i = 0; i <len;i++){
char currentCharacter=string.charAt(i);
if(!hashMap.containsKey(currentCharacter)){
hashMap.put(currentCharacter, 1);
}
else{
hashMap.put(currentCharacter, hashMap.get(currentCharacter)+1);
}
}
// Second Scan
boolean flag=false;
char firstNonRepeatingChar = 0;
for(int i=0;i<len;i++){
char c=string.charAt(i);
if(hashMap.get(c)==1){
flag=true;
firstNonRepeatingChar=c;
break;
}
}
if(flag==true)
System.out.println("firstNonRepeatingChar is "+firstNonRepeatingChar);
else
System.out.println("There is no such type of character");
}
}
GeeksforGeeks also suggests a more efficient method, but I think it also takes two scans. The following solution is from GeeksForGeeks:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define NO_OF_CHARS 256
// Structure to store count of a character and index of the first
// occurrence in the input string
struct countIndex {
int count;
int index;
};
/* Returns an array of above structure type. The size of
array is NO_OF_CHARS */
struct countIndex *getCharCountArray(char *str)
{
struct countIndex *count =
(struct countIndex *)calloc(NO_OF_CHARS, sizeof(struct countIndex));
int i;
// This is First Scan
for (i = 0; *(str+i); i++)
{
(count[*(str+i)].count)++;
// If it's first occurrence, then store the index
if (count[*(str+i)].count == 1)
count[*(str+i)].index = i;
}
return count;
}
/* The function returns index of the first non-repeating
character in a string. If all characters are repeating
then returns INT_MAX */
int firstNonRepeating(char *str)
{
struct countIndex *count = getCharCountArray(str);
int result = INT_MAX, i;
//Second Scan
for (i = 0; i < NO_OF_CHARS; i++)
{
// If this character occurs only once and appears
// before the current result, then update the result
if (count[i].count == 1 && result > count[i].index)
result = count[i].index;
}
free(count); // To avoid memory leak
return result;
}
/* Driver program to test above function */
int main()
{
char str[] = "geeksforgeeks";
int index = firstNonRepeating(str);
if (index == INT_MAX)
printf("Either all characters are repeating or string is empty");
else
printf("First non-repeating character is %c", str[index]);
getchar();
return 0;
}
You can store 2 arrays: the count of each character and the index of its first occurrence (filling both of them during the first scan). Then the second scan over the string will be unnecessary.
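As a rough illustration of this suggestion, here is a Python sketch. Two dictionaries stand in for the two arrays (an assumption, so that the sketch is not limited to 256 character codes), and first_unique_index is a hypothetical name. The string itself is scanned only once; the final step looks only at the distinct characters.

```python
def first_unique_index(s):
    """Return the index of the first non-repeating character, or -1 if none."""
    count = {}   # character -> number of occurrences
    first = {}   # character -> index of its first occurrence
    for i, c in enumerate(s):          # the only scan over the string
        count[c] = count.get(c, 0) + 1
        first.setdefault(c, i)         # recorded on the first sighting only
    # Scan the distinct characters (bounded by the alphabet size, not by len(s)).
    candidates = [first[c] for c in count if count[c] == 1]
    return min(candidates, default=-1)


idx = first_unique_index("geeksforgeeks")
print(idx, "geeksforgeeks"[idx] if idx >= 0 else None)
```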
Use the String functions of Java and you can find the solution with only one for loop.
An example is shown below:
import java.util.Scanner;
public class firstoccurance {
public static void main(String args[]){
char [] a ={'h','h','l','l','o'};
//Scanner sc=new Scanner(System.in);
String s=new String(a);//sc.next();
char c;
int i;
int length=s.length();
for(i=0;i<length;i++)
{
c=s.charAt(i);
if(s.indexOf(c)==s.lastIndexOf(c))
{
System.out.println("first non repeating char in a string "+c);
break;
}
else if(i==length-1)
{
System.out.println("no single char");
}
}
}
}
In the following solution I declare a class CharCountAndPosition which stores firstIndex and frequencyOfchar. While reading the string character by character, firstIndex stores the index at which the character was first encountered and frequencyOfchar stores its total number of occurrences.
We make an array of CharCountAndPosition (step 1) and initialize it (step 2).
While scanning the string, we set firstIndex and update frequencyOfchar for every character (step 3).
In step 4 we check the array of CharCountAndPosition and find the character with frequency==1 and the minimum firstIndex.
The overall time complexity is O(n+256), where n is the size of the string. O(n+256) is equivalent to O(n) because 256 is a constant. Please find this solution on ideone.
public class FirstNonRepeatedCharacterEfficient {
public static void main(String [] args){
// step1: make array of CharCountAndPosition.
CharCountAndPosition [] array=new CharCountAndPosition[256];
// step2: Initialize array with object of CharCountAndPosition.
for(int i=0;i<256;i++)
{
array[i]=new CharCountAndPosition();
}
Scanner scan=new Scanner(System.in);
String str=scan.next();
int len=str.length();
// step 3
for(int i=0;i<len;i++){
char c=str.charAt(i);
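int index=c; // index directly by character code (assumes codes below 256); c-'a' goes negative for characters before 'a'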
int frequency=array[index].frequencyOfchar;
if(frequency==0)
array[index].firstIndex=i;
array[index].frequencyOfchar=frequency+1;
//System.out.println(c+" "+array[index].frequencyOfchar);
}
boolean flag=false;
int firstPosition=Integer.MAX_VALUE;
for(int i=0;i<256;i++){
// Step4
if(array[i].frequencyOfchar==1){
//System.out.println("character="+(char)(i+(int)'a'));
if(firstPosition> array[i].firstIndex){
firstPosition=array[i].firstIndex;
flag=true;
}
}
}
if(flag==true)
System.out.println(str.charAt(firstPosition));
else
System.out.println("There is no such type of character");
}
}
class CharCountAndPosition{
int firstIndex;
int frequencyOfchar;
}
A solution in javascript with a lookup table:
var sample="It requires two scan of an array I want to find first non repeating character in only one scan";
var sampleArray=sample.split("");
var table=Object.create(null);
sampleArray.forEach(function(char,idx){
char=char.toLowerCase();
var pos=table[char];
if(typeof(pos)=="number"){
table[char]=sampleArray.length; //a duplicate found; we'll assign some invalid index value to this entry and discard these characters later
return;
}
table[char]=idx; //index of first occurance of this character
});
var uniques=Object.keys(table).filter(function(k){
return table[k]<sampleArray.length;
}).map(function(k){
return {key:k,pos:table[k]};
});
uniques.sort(function(a,b){
return a.pos-b.pos;
});
uniques.toSource(); //[{key:"q", pos:5}, {key:"u", pos:6}, {key:"d", pos:46}, {key:"p", pos:60}, {key:"g", pos:66}, {key:"h", pos:69}, {key:"l", pos:83}]
(uniques.shift()||{}).key; //q
The following C program adds a character-specific value to 'count' if the character has not occurred before, and removes that value from 'count' if it has. At the end, 'count' holds a character-specific value which indicates which character it was.
//TO DO:
//If multiple unique char occurs, which one is occurred before?
//Is it possible to get required values (1,2,4,8,..) till _Z_ and _z_?
#include <stdio.h>
#define _A_ 1
#define _B_ 2
#define _C_ 4
#define _D_ 8
//And so on till _Z
//Same for '_a' to '_z'
#define ADDIFNONREP(C) if(count & C) count = count & ~C; else count = count | C; break;
char getNonRepChar(char *str)
{
int i = 0, count = 0;
for(i = 0; str[i] != '\0'; i++)
{
switch(str[i])
{
case 'A':
ADDIFNONREP(_A_);
case 'B':
ADDIFNONREP(_B_);
case 'C':
ADDIFNONREP(_C_);
case 'D':
ADDIFNONREP(_D_);
//And so on
//Same for 'a' to 'z'
}
}
switch(count)
{
case _A_:
return 'A';
case _B_:
return 'B';
case _C_:
return 'C';
case _D_:
return 'D';
//And so on
//Same for 'a' to 'z'
}
}
int main()
{
char str[] = "ABCDABC";
char c = getNonRepChar(str);
printf("%c\n", c); //Prints D
return 0;
}
You can maintain a queue of keys as they are added to the hash map (you add your key to the queue whenever you add a new key to the hash map). After the string scan, you use the queue to obtain the order of the keys as they were added to the map. This functionality is exactly what the Java standard library class LinkedHashMap provides.
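A short Python sketch of the same idea: since Python 3.7 a plain dict remembers the order in which keys were first inserted, so it can play the role of the insertion-ordered hash map described above. The function name first_unique_ordered_map is illustrative only.

```python
def first_unique_ordered_map(s):
    """Return the first character of s that occurs exactly once, or None."""
    counts = {}                       # keys stay in first-insertion order
    for c in s:                       # single scan over the string
        counts[c] = counts.get(c, 0) + 1
    for c, n in counts.items():       # walk the keys in insertion order
        if n == 1:
            return c
    return None


print(first_unique_ordered_map("GeeksforGeeks"))
```

The second loop runs over the distinct characters only, so the string itself is read once.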
Here is my take on the problem.
Iterate through the string. Check if the hashset contains the character. If so, delete it from the array. If it is not present, add it to both the array and the hashset.
NSMutableSet *repeated = [[NSMutableSet alloc] init]; //Hashset
NSMutableArray *nonRepeated = [[NSMutableArray alloc] init]; //Array
for (int i=0; i<[test length]; i++) {
    NSString *currentObj = [NSString stringWithFormat:@"%c", [test characterAtIndex:i]]; //No support for primitive data types.
    if ([repeated containsObject:currentObj]) {
        [nonRepeated removeObject:currentObj];// in obj-c nothing happens even if nonrepeted in nil
        continue;
    }
    [repeated addObject:currentObj];
    [nonRepeated addObject:currentObj];
}
NSLog(@"This is the character %@", [nonRepeated objectAtIndex:0]);
If you can restrict yourself to strings of ASCII characters, I would recommend a lookup table instead of a hash table. This lookup table would have only 128 entries.
A possible approach would be as follows.
We start with an empty queue Q (may be implemented using linked lists) and a lookup table T. For a character ch, T[ch] stores a pointer to a queue node containing the character ch and the index of the first occurrence of ch in the string. Initially, all entries of T are NULL.
Each queue node stores the character and the first occurrence index as specified earlier, and also has a special boolean flag named removed which indicates that the node has been removed from the queue.
Read the string character by character. If the ith character is ch, check if T[ch] = NULL. If so, this is the first occurrence of ch in the string. Then add a node for ch containing the index i to the queue.
If T[ch] is not NULL, this is a repeating character. If the node pointed to by T[ch] has already been removed (i.e. the removed flag of the node is set), then nothing needs to be done. Otherwise, remove the node from the queue by manipulating the pointers of the previous and next nodes. Also set the removed flag of the node to indicate that the node is now removed. Note that we do not free/delete the node at this stage, nor do we set T[ch] back to NULL.
If we proceed in this way, the nodes for all the repeating characters will be removed from the queue. The removed flag is used to ensure that no node is removed twice from the queue if the character occurs more than two times.
After the string has been completely processed, the first node of the linked list will contain the character code as well as the index of the first non-repeating character. Then, the memory can be freed by iterating over the entries of lookup table T and freeing any non-NULL entries.
Here is a C implementation. Here, instead of the removed flag, I set the prev and next pointers of the current node to NULL when it is removed, and check for that to see if a node has already been removed.
#include <stdio.h>
#include <stdlib.h>
struct queue_node {
int ch;
int index;
struct queue_node *prev;
struct queue_node *next;
};
void print_queue (struct queue_node *head);
int main (void)
{
int i;
struct queue_node *lookup_entry[128];
struct queue_node *head;
struct queue_node *last;
struct queue_node *cur_node, *prev_node, *next_node;
char str [] = "GeeksforGeeks";
head = malloc (sizeof (struct queue_node));
last = head;
last->prev = last->next = NULL;
for (i = 0; i < 128; i++) {
lookup_entry[i] = NULL;
}
for (i = 0; str[i] != '\0'; i++) {
cur_node = lookup_entry[str[i]];
if (cur_node != NULL) {
/* it is a repeating character */
if (cur_node->prev != NULL) {
/* Entry has not been removed. Remove it from the queue. */
prev_node = cur_node->prev;
next_node = cur_node->next;
prev_node->next = next_node;
if (next_node != NULL) {
next_node->prev = prev_node;
} else {
/* Last node was removed */
last = prev_node;
}
cur_node->prev = NULL;
cur_node->next = NULL;
/* We will not free the node now. Instead, free
* all nodes in a single pass afterwards.
*/
}
} else {
/* This is the first occurrence - add an entry to the queue */
struct queue_node *newnode = malloc (sizeof(struct queue_node));
newnode->ch = str[i];
newnode->index = i;
newnode->prev = last;
newnode->next = NULL;
last->next = newnode;
last = newnode;
lookup_entry[str[i]] = newnode;
}
print_queue (head);
}
last = head->next;
while (last != NULL) {
printf ("Non-repeating char: %c at index %d.\n", last->ch, last->index);
last = last->next;
}
/* Free the queue memory */
for (i = 0; i < 128; i++) {
if (lookup_entry[i] != NULL) {
free (lookup_entry[i]);
lookup_entry[i] = NULL;
}
}
free (head);
return (0);
}
void print_queue (struct queue_node *head) {
struct queue_node *tmp = head->next;
printf ("Queue: ");
while (tmp != NULL) {
printf ("%c:%d ", tmp->ch, tmp->index);
tmp = tmp->next;
}
printf ("\n");
}
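For comparison, here is a minimal Python sketch of the same idea. It is not a translation of the C code above: it assumes an insertion-ordered dict (Python 3.7+) can stand in for the linked queue, and a set for the "seen" information that the lookup table T carries. The function name first_unique_char is made up for this example.

```python
def first_unique_char(s):
    """Return (char, index) of the first non-repeating character, or None."""
    candidates = {}   # char -> index of first occurrence; keeps insertion order
    seen = set()      # every character encountered so far
    for i, ch in enumerate(s):
        if ch in seen:
            # Repeated character: drop it from the candidates if still present.
            candidates.pop(ch, None)
        else:
            seen.add(ch)
            candidates[ch] = i
    # The oldest surviving candidate plays the role of the queue head.
    return next(iter(candidates.items()), None)


print(first_unique_char("GeeksforGeeks"))
```

Deleting a key from a dict is O(1) on average, which corresponds to unlinking a node from the queue.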
Instead of making things more and more complex, I can use three for loops to tackle this.
class test{
public static void main(String args[]){
String s="STRESST";//Your input can be given here.
char a[]=new char[s.length()];
for(int i=0;i<s.length();i++){
a[i]=s.charAt(i);
}
for(int i=0;i<s.length();i++){
int flag=0;
for(int j=0;j<s.length();j++){
if(a[i]==a[j]){
flag++;
}
}
if(flag==1){
System.out.println(a[i]+" is not repeated");
break;
}
}
}
}
I guess it will be helpful for people who are just gonna look at the logic part without any complex methods used in the program.
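This can be done with the substring method, stopping at the first match. Do it like this:
String str = "your String";
int n = str.length();
for (int i = 0; i < n; i++) {
    String ss = str.substring(i, i + 1);
    // ss is unique if it occurs neither before nor after position i
    if (str.indexOf(ss) == i && !str.substring(i + 1, n).contains(ss)) {
        System.out.println(ss);
        break;
    }
}
This stops as soon as the first unique character is found, without necessarily completing a full scan. Note that each indexOf/contains call is itself a linear search, so the worst case is O(n^2) rather than a single O(n) scan.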
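Add each character to a LinkedHashSet (which preserves insertion order) the first time it is seen, and remove it when it is seen again. A second set records every character seen so far; without it, a character that occurs three times would be added back on its third occurrence.
Then the first value of the LinkedHashSet is the first non-repeated character.
Algorithm:
Set<Character> seen = new HashSet<>();
Set<Character> unique = new LinkedHashSet<>();
for (int i = 0; i < str.length(); i++)
{
    char c = str.charAt(i);
    if (!seen.add(c))
        unique.remove(c);
    else
        unique.add(c);
}
unique.iterator().next() will give the first non-repeated character (if unique is not empty).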
I have this program, which is simpler and does not use any additional data structures:
public static char findFirstNonRepChar(String input){
    for (int i = 0; i < input.length(); i++) {
        char currentChar = input.charAt(i);
        // a character is unique when its first and last occurrences coincide
        if (input.indexOf(currentChar) == input.lastIndexOf(currentChar)) {
            return currentChar;
        }
    }
    return '\0'; // no non-repeated character found
}
A simple (non hashed) version...
public static String firstNRC(String s) {
String c = "";
while(s.length() > 0) {
c = "" + s.charAt(0);
if(! s.substring(1).contains(c)) return c;
s = s.replace(c, "");
}
return "";
}
or
public static char firstNRC(String s) {
s += " ";
for(int i = 0; i < s.length() - 1; i++)
if( s.split(java.util.regex.Pattern.quote("" + s.charAt(i))).length == 2 ) return s.charAt(i); // quote: split() takes a regex
return ' ';
}
//This is the simple logic for finding first non-repeated character....
public static void main(String[] args) {
String s = "GeeksforGeeks";
for (int i = 0; i < s.length(); i++) {
char begin = s.charAt(i);
String begin1 = String.valueOf(begin);
String end = s.substring(0, i) + s.substring(i + 1);
if (!end.contains(begin1)) {
System.out.println(begin1);
break;
}
}
}
@Test
public void testNonRepeadLetter() {
assertEquals('f', firstNonRepeatLetter("GeeksforGeeks"));
assertEquals('I', firstNonRepeatLetter("teststestsI"));
assertEquals('1', firstNonRepeatLetter("123aloalo"));
assertEquals('o', firstNonRepeatLetter("o"));
}
private char firstNonRepeatLetter(String s) {
if (s == null || s.isEmpty()) {
throw new IllegalArgumentException(s);
}
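Set<Character> seen = new HashSet<>(); // every character met so far
Set<Character> set = new LinkedHashSet<>(); // characters met exactly once, in order
for (int i = 0; i < s.length(); i++) {
char charAt = s.charAt(i);
if (!seen.add(charAt)) {
// repeated: drop it, and never re-add it on a third occurrence
set.remove(charAt);
} else {
set.add(charAt);
}
}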
return set.iterator().next();
}
Here is tested code in Java. Note that it is possible that no non-repeated character is found; in that case we return '0'.
// find first non repeated character in a string
static char firstNR( String str){
int i, j, l;
char letter;
j = str.length();
int[] k = new int[j]; // one counter per position; Java zero-initializes int arrays
for (i=0; i<j; i++){
for (l=0; l<j; l++){
if (str.charAt(i) == str.charAt(l))
k[i]++;
}
}
for (i=0; i<j; i++){
if (k[i] == 1)
return str.charAt(i);
}
return '0';
}
Here is the logic to find the first non-repeatable letter in a String.
String name = "TestRepeat";
Set <Character> set = new LinkedHashSet<Character>();
List<Character> list = new ArrayList<Character>();
char[] ch = name.toCharArray();
for (char c :ch) {
set.add(c);
list.add(c);
}
Iterator<Character> itr1 = set.iterator();
Iterator<Character> itr2= list.iterator();
while(itr1.hasNext()){
int flag =0;
Character setNext= itr1.next();
for(int i=0; i<list.size(); i++){
Character listNext= list.get(i);
if(listNext.compareTo(setNext)== 0){
flag ++;
}
}
if(flag==1){
System.out.println("Character: "+setNext);
break;
}
}
It is very easy; you can do it without collections in Java.
public class FirstNonRepeatedString{
public static void main(String args[]) {
String input ="GeeksforGeeks";
char process[] = input.toCharArray();
boolean status = false;
int index = 0;
for (int i = 0; i < process.length; i++) {
for (int j = 0; j < process.length; j++) {
if (i == j) {
continue;
} else {
if (process[i] == process[j]) {
status = false;
break;
} else {
status = true;
index = i;
}
}
}
if (status) {
System.out.println("First non-repeated string is : " + process[index]);
break;
}
}
}
}
We can create a LinkedHashMap holding each character of the string and its respective count, then traverse the map and return the first character whose count is 1. Below is a function that does this.
private static char findFirstNonRepeatedChar(String string) {
LinkedHashMap<Character, Integer> map = new LinkedHashMap<>();
for(int i=0;i< string.length();i++){
if(map.containsKey(string.charAt(i)))
map.put(string.charAt(i),map.get(string.charAt(i))+1);
else
map.put(string.charAt(i),1);
}
for(Entry<Character,Integer> entry : map.entrySet()){
if(entry.getValue() == 1){
return entry.getKey();
}
}
return ' ';
}
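For readers who prefer Python, the same count-then-scan idea can be sketched as follows. This assumes Python 3.7+, where collections.Counter (a dict subclass) iterates its keys in first-occurrence order, which is the property LinkedHashMap provides in Java. The function name is hypothetical.

```python
from collections import Counter


def first_non_repeated_char(s):
    # Counter keeps keys in the order they were first seen (Python 3.7+).
    counts = Counter(s)
    for ch, n in counts.items():
        if n == 1:
            return ch
    return None  # every character repeats, or the string is empty


print(first_non_repeated_char("GeekGsQuizk"))
```

Like the Java version, this is one pass over the string followed by one pass over the (usually much smaller) map.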
One Pass Solution.
I have used a LinkedHashMap here to maintain insertion order. I go through all the characters of the string and store their counts in the LinkedHashMap. After that I traverse the LinkedHashMap, and the first key whose value equals 1 is printed before the program exits. Strictly speaking, this is one pass over the string plus one pass over the map.
import java.util.*;
class demo
{
public static void main(String args[])
{
String str="GeekGsQuizk";
HashMap <Character,Integer>hm=new LinkedHashMap<Character,Integer>();
for(int i=0;i<str.length();i++)
{
if(!hm.containsKey(str.charAt(i)))
hm.put(str.charAt(i),1);
else
hm.put(str.charAt(i),hm.get(str.charAt(i))+1);
}
for (Character key : hm.keySet())
{
if(hm.get(key)==1)
{
System.out.println(key);
System.exit(0) ;
}
}
}
}
I know this comes one year late, but I think if you use LinkedHashMap in your solution instead of using a HashMap, you will have the order guaranteed in the resulting map and you can directly return the key with the corresponding value as 1.
Not sure if this is what you wanted though as you will have to iterate over the map (not the string) after you are done populating it - but just my 2 cents.
Regards,
-Vini
Finding the first non-repeated character in O(n) without using String's indexOf and lastIndexOf methods (letters A-Z only, case-insensitive)
package nee.com;
public class FirstNonRepeatedCharacterinOnePass {
public static void printFirstNonRepeatedCharacter(String str){
String strToCaps=str.toUpperCase();
char ch[]=strToCaps.toCharArray();
StringBuilder sb=new StringBuilder();
// one flag per letter A-Z (ASCII codes 65-90, i.e. 26 letters)
boolean b[]=new boolean[26];
for(int i=0;i<ch.length;i++){
if(b[ch[i]-65]==false){
b[ch[i]-65]=true;
}
else{
//add repeated char to StringBuilder
sb.append(ch[i]+"");
}
}
for(int i=0;i<ch.length;i++){
// if char is not there in StringBuilder means it is non repeated
if(sb.indexOf(ch[i]+"")==-1){
System.out.println(" first non repeated in lower case ...."+Character.toLowerCase((ch[i])));
break;
}
}
}
public static void main(String g[]){
String str="abczdabddcn";
printFirstNonRepeatedCharacter(str);
}
}
I did the same using LinkedHashSet. Following is the code snippet:
System.out.print("Please enter the string :");
str=sc.nextLine();
if(null==str || str.equals("")) {
break;
}else {
chArr=str.toLowerCase().toCharArray();
set=new LinkedHashSet<Character>();
dupSet=new LinkedHashSet<Character>();
for(char chVal:chArr) {
if(set.contains(chVal)) {
dupSet.add(chVal);
}else {
set.add(chVal);
}
}
set.removeAll(dupSet);
System.out.println("First unique :"+set.toArray()[0]);
}
You can find this question here
For code of the below algorithm refer this link (My implementation with test cases)
Using a linked list in combination with a hash map
I have a solution which solves it in O(n) time, with one pass over the array and O(1) space.
In reality, the O(1) space is O(26) space.
Algorithm
1) Every time you visit a character for the first time:
Create a node for the linked list (storing that character) and append it at the end of the linked list. Add an entry in the hash map storing, for the newly appended character, the address of the node that was before it in the linked list. If the character is appended to an empty linked list, store null as the value in the hash map.
2) If you encounter the same character again:
Remove that element from the linked list using the address stored in the hash map. You then have to update the entry for the element that came after the deleted element: make its stored predecessor equal to the predecessor of the deleted element.
Complexity Analysis
Linked list add element -> O(1)
Linked list delete element -> O(1)
HashMap -> O(1)
space -> O(1)
pass -> one, in O(n)
#include<bits/stdc++.h>
using namespace std;
typedef struct node
{
char ch;
node *next;
}node;
char firstNotRepeatingCharacter(string &s)
{
char ans = '_';
map<char,node*> mp;//hash map atmost may consume O(26) space
node *head = NULL;//linkedlist atmost may consume O(26) space
node *last;// to append at last in O(1)
node *temp1 = NULL;
node *temp2 = new node[1];
temp2->ch = '$';
temp2->next = NULL;
//This is my one pass of array//
for(int i = 0;i < s.size();++i)
{
//first occurrence of character//
if(mp.find(s[i]) == mp.end())
{
node *temp = new node[1];
temp->ch = s[i];
temp->next = NULL;
if(head == NULL)
{
head = temp;
last = temp;
mp.insert(make_pair(s[i],temp1));
}
else
{
last->next = temp;
mp.insert(make_pair(s[i],last));
last = temp;
}
}
//Repeated occurrence//
else
{
node *temp = mp[s[i]];
if(mp[s[i]] != temp2)
{
if(temp == temp1)
{
head = head->next;
if((head)!=NULL){mp[head->ch] = temp1;}
else last = head;
mp[s[i]] = temp2;
}
else if((temp->next) != NULL)
{
temp->next = temp->next->next;
if((temp->next) != NULL){mp[temp->next->ch] = temp;}
else last = temp;
mp[s[i]] = temp2;
}
else
{
;
}
}
}
} // end of the one pass over the string
if(head == NULL){;}
else {ans = head->ch;}
return ans;
}
int main()
{
int T;
cin >> T;
while(T--)
{
string str;
cin >> str;
cout << str << " -> " << firstNotRepeatingCharacter(str)<< endl;
}
return 0;
}
Requires one scan only.
Uses a deque (saves char) and a hashmap (saves char->node). On repeating char, get char's node in deque using hashmap and remove it from deque (in O(1) time) but keep the char in hashmap with null node value. peek() gives the 1st unique character.
[pseudocode]
char? findFirstUniqueChar(s):
if s == null:
throw
deque<char>() dq = new
hashmap<char, node<char>> chToNodeMap = new
for i = 0, i < s.length(), i++:
ch = s[i]
if !chToNodeMap.hasKey(ch):
chToNodeMap[ch] = dq.enqueue(ch)
else:
chNode = chToNodeMap[ch]
if chNode != null:
dq.removeNode(chNode)
chToNodeMap[ch] = null
if dq.isEmpty():
return null
return dq.peek()
// deque interface
deque<T>:
node<T> enqueue(T t)
bool removeNode(node<T> n)
T peek()
bool isEmpty()
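The pseudocode above can be made concrete with a small hand-rolled doubly linked list. The following Python sketch is one possible reading of it, not the author's implementation: the Node class, the sentinel head/tail nodes and the function name are assumptions introduced for the example.

```python
class Node:
    """Doubly linked list node holding one character."""

    def __init__(self, ch=None):
        self.ch = ch
        self.prev = None
        self.next = None


def find_first_unique_char(s):
    # Sentinel head and tail so that unlinking never needs None checks.
    head, tail = Node(), Node()
    head.next, tail.prev = tail, head
    ch_to_node = {}  # char -> its node, or None once the char has repeated

    for ch in s:
        if ch not in ch_to_node:
            # First occurrence: append a node just before the tail sentinel.
            node = Node(ch)
            node.prev, node.next = tail.prev, tail
            tail.prev.next = node
            tail.prev = node
            ch_to_node[ch] = node
        else:
            node = ch_to_node[ch]
            if node is not None:
                # Second occurrence: unlink in O(1) and mark as repeated.
                node.prev.next = node.next
                node.next.prev = node.prev
                ch_to_node[ch] = None

    first = head.next
    return None if first is tail else first.ch


print(find_first_unique_char("GeeksforGeeks"))
```

Keeping the key in the map with a None value is what prevents a third occurrence from being treated as a first occurrence again.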
The string is scanned only once; the other scans happen on the counts and first-appearance arrays, which are generally much smaller. In other words, this approach suits cases where the string is much longer than the character set it is drawn from.
Here is an example in golang:
package main
import (
"fmt"
)
func firstNotRepeatingCharacter(s string) int {
counts := make([]int, 256)
first := make([]int, 256)
// The string is parsed only once
for i := len(s) - 1; i >= 0; i-- {
counts[s[i]]++
first[s[i]] = i
}
min := 0
minValue := len(s) + 1
// Now we are parsing counts and first slices
for i := 0; i < 256; i++ {
if counts[i] == 1 && first[i] < minValue {
minValue = first[i]
min = i
}
}
return min
}
func main() {
fmt.Println(string(rune(firstNotRepeatingCharacter("fff"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("aabbc"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("cbbc"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("cbabc"))))
}
Question : Find First Non Repeating Character or First Unique Character:
The code itself is understandable.
public class uniqueCharacter1 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int count[] = new int[256];
for (char ch : a.toCharArray()) {
count[ch]++;
} // for
for (int i = 0; i < a.length(); i++) {
char ch = a.charAt(i);
if (count[ch] == 1) {
System.out.println(i);// 8
System.out.println(a.charAt(i));// P
break;
}
}
}// end1
}
In Python:
def firstUniqChar(a):
    count = [0] * 256
    for i in a:
        count[ord(i)] += 1
    element = ""
    for items in a:
        if count[ord(items)] == 1:
            element = items
            break
    return element
a = "GiniGinaProtijayi"
print(firstUniqChar(a)) # output is P
"GeeksforGeeks also suggest efficient method but I think it is also two scan."
Note that the second scan is not over the input string but over the array whose length is NO_OF_CHARS. So the time complexity is O(n+m), which is better than 2*O(n) when n is quite large (for a long input string).
"But it requires two scan of an array. I want to find first non-repeating character in only one scan."
IMHO, it is possible if a priority queue is used. In that queue we order each char by its occurrence count and the index of its first occurrence, and finally we simply take the first element in the queue. See @hlpPy's answer.
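To make the O(n+m) remark concrete, here is a minimal Python sketch of the two-table variant: the input string is read once, and the second loop runs over the table of character codes instead. It assumes single-byte character codes; NO_OF_CHARS = 256 and the function name are assumptions for this example, and it does not implement the priority queue idea.

```python
NO_OF_CHARS = 256  # assumed alphabet size (character codes 0-255)


def first_non_repeating_index(s):
    """Return the index of the first non-repeating character, or -1."""
    count = [0] * NO_OF_CHARS
    first_index = [-1] * NO_OF_CHARS
    for i, ch in enumerate(s):  # the only pass over the input string (n steps)
        code = ord(ch)
        count[code] += 1
        if first_index[code] == -1:
            first_index[code] = i
    best = -1
    for code in range(NO_OF_CHARS):  # pass over the table (m steps)
        if count[code] == 1 and (best == -1 or first_index[code] < best):
            best = first_index[code]
    return best


print(first_non_repeating_index("GeeksforGeeks"))
```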

Finding shortest repeating cycle in word?

I'm about to write a function which would return the shortest period of a group of letters which, repeated, would eventually create the given word.
For example, the word abkebabkebabkeb is created by repeating the word abkeb. I would like to know how to efficiently analyze the input word to get the shortest period of characters creating it.
Here is a correct O(n) algorithm. The first for loop is the table building portion of KMP. There are various proofs that it always runs in linear time.
Since this question has 4 previous answers, none of which are O(n) and correct, I heavily tested this solution for both correctness and runtime.
def pattern(inputv):
    if not inputv:
        return inputv
    nxt = [0]*len(inputv)
    for i in range(1, len(nxt)):
        k = nxt[i - 1]
        while True:
            if inputv[i] == inputv[k]:
                nxt[i] = k + 1
                break
            elif k == 0:
                nxt[i] = 0
                break
            else:
                k = nxt[k - 1]
    smallPieceLen = len(inputv) - nxt[-1]
    if len(inputv) % smallPieceLen != 0:
        return inputv
    return inputv[0:smallPieceLen]
O(n) solution. Assumes that the entire string must be covered. The key observation is that we generate the pattern and test it, but if we find something along the way that doesn't match, we must include the entire string that we already tested, so we don't have to reobserve those characters.
def pattern(inputv):
    pattern_end = 0
    for j in range(pattern_end+1, len(inputv)):
        pattern_dex = j % (pattern_end+1)
        if inputv[pattern_dex] != inputv[j]:
            pattern_end = j
            continue
        if j == len(inputv)-1:
            print(pattern_end)
            return inputv[0:pattern_end+1]
    return inputv
This is an example for PHP:
<?php
function getrepeatedstring($string) {
if (strlen($string)<2) return $string;
for($i = 1; $i<strlen($string); $i++) {
if (substr(str_repeat(substr($string, 0, $i),strlen($string)/$i+1), 0, strlen($string))==$string)
return substr($string, 0, $i);
}
return $string;
}
?>
The easiest one in Python (it returns the length of the shortest period):
def pattern(self, s):
    ans = (s + s).find(s, 1, -1)
    return len(s) if ans == -1 else ans
I believe there is a very elegant recursive solution. Many of the proposed solutions solve the extra complexity where the string ends with part of the pattern, like abcabca. But I do not think that is asked for.
My solution for the simple version of the problem in clojure:
(defn find-shortest-repeating [pattern string]
(if (empty? (str/replace string pattern ""))
pattern
(find-shortest-repeating (str pattern (nth string (count pattern))) string)))
(find-shortest-repeating "" "abcabcabc") ;; "abc"
But be aware that this will not find patterns that are incomplete at the end.
I found a solution based on your post, that could take an incomplete pattern:
(defn find-shortest-repeating [pattern string]
(if (or (empty? (clojure.string/split string (re-pattern pattern)))
(empty? (second (clojure.string/split string (re-pattern pattern)))))
pattern
(find-shortest-repeating (str pattern (nth string (count pattern))) string)))
My Solution:
The idea is to find the shortest prefix, starting at position zero, whose length divides the input length and which equals every following block of the same length; when such a substring is found, print it. Please note that if no repeating substring is found I print the entire input String.
public static void repeatingSubstring(String input){
    int n = input.length();
    for (int size = 1; size <= n / 2; size++) {
        if (n % size != 0) {
            continue;
        }
        String unit = input.substring(0, size);
        boolean repeats = true;
        // compare the prefix with every later block, not only the adjacent one
        for (int k = size; k < n && repeats; k += size) {
            repeats = input.startsWith(unit, k);
        }
        if (repeats) {
            System.out.println("The subString which repeats itself is " + unit);
            return;
        }
    }
    System.out.println("There is no repetition " + input);
}
This is a solution I came up with using the queue, it passed all the test cases of a similar problem in codeforces. Problem No is 745A.
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s, s1, s2; cin >> s; queue<char> qu; qu.push(s[0]); bool flag = true; int ind = -1;
s1 = s.substr(0, s.size() / 2);
s2 = s.substr(s.size() / 2);
if(s1 == s2)
{
for(int i=0; i<s1.size(); i++)
{
s += s1[i];
}
}
//cout << s1 << " " << s2 << " " << s << "\n";
for(int i=1; i<s.size(); i++)
{
if(qu.front() == s[i]) {qu.pop();}
qu.push(s[i]);
}
int cycle = qu.size();
/*queue<char> qu2 = qu; string str = "";
while(!qu2.empty())
{
cout << qu2.front() << " ";
str += qu2.front();
qu2.pop();
}*/
while(!qu.empty())
{
if(s[++ind] != qu.front()) {flag = false; break;}
qu.pop();
}
flag == true ? cout << cycle : cout << s.size();
return 0;
}
A simpler answer, which I could come up with in an interview, is an O(n^2) solution that tries every prefix of the string, starting from index 0, as the candidate unit.
string findSmallestUnit(string str){
for(int i=1;i<str.length();i++){
int j=0;
for(;j<str.length();j++){
if(str[j%i] != str[j]){
break;
}
}
if(j==str.length()) return str.substr(0,i);
}
return str;
}
Now, if someone is interested in an O(n) solution to this problem in C++:
string findSmallestUnit(string str){
vector<int> lps(str.length(),0);
int i=1;
int len=0;
while(i<str.length()){
if(str[i] == str[len]){
len++;
lps[i] = len;
i++;
}
else{
if(len == 0) i++;
else{
len = lps[len-1];
}
}
}
int n=str.length();
int x = lps[n-1];
if(n%(n-x) == 0){
return str.substr(0,n-x);
}
return str;
}
The above is just @Buge's answer in C++, since someone asked in the comments.
Regex solution:
Use the following regex replacement to find the shortest repeating substring, and only keeping that substring:
^(.+?)\1*$
$1
Explanation:
^(.+?)\1*$
^ $ # Start and end, to match the entire input-string
( ) # Capture group 1:
.+ # One or more characters,
? # with a reluctant instead of greedy match†
\1* # Followed by the first capture group repeated zero or more times
$1 # Replace the entire input-string with the first capture group match,
# removing all other duplicated substrings
† Greedy vs reluctant would in this case mean: greedy = consumes as many characters as it can; reluctant = consumes as few characters as it can. Since we want the shortest repeating substring, we would want a reluctant match in our regex.
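As a rough illustration outside Retina, the same pattern can be applied with Python's re module. This is only a sketch: the function name is made up, re.fullmatch is used in place of the explicit ^ and $ anchors, and re.DOTALL is an assumption so that "." also matches newlines.

```python
import re


def shortest_repeating_unit(s):
    # fullmatch anchors both ends, playing the role of ^ and $.
    # (.+?) is the reluctant capture group, \1* its zero or more repetitions.
    m = re.fullmatch(r"(.+?)\1*", s, flags=re.DOTALL)
    # No match is only possible for the empty string; return it unchanged.
    return m.group(1) if m else s


print(shortest_repeating_unit("abkebabkebabkeb"))
```

Because the regex engine backtracks over candidate prefix lengths, this is convenient rather than linear-time.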
Example input: "abkebabkebabkeb"
Example output: "abkeb"
Try it online in Retina.
Here is an example implementation in Java.
Super delayed answer, but I got the question in an interview, here was my answer (probably not the most optimal but it works for strange test cases as well).
private void run(String[] args) throws IOException {
File file = new File(args[0]);
BufferedReader buffer = new BufferedReader(new FileReader(file));
String line;
while ((line = buffer.readLine()) != null) {
ArrayList<String> subs = new ArrayList<>();
String t = line.trim();
String out = null;
for (int i = 0; i < t.length(); i++) {
if (t.substring(0, t.length() - (i + 1)).equals(t.substring(i + 1, t.length()))) {
subs.add(t.substring(0, t.length() - (i + 1)));
}
}
subs.add(0, t);
for (int j = subs.size() - 2; j >= 0; j--) {
String match = subs.get(j);
int mLength = match.length();
if (j != 0 && mLength <= t.length() / 2) {
if (t.substring(mLength, mLength * 2).equals(match)) {
out = match;
break;
}
} else {
out = match;
}
}
System.out.println(out);
}
}
Testcases:
abcabcabcabc
bcbcbcbcbcbcbcbcbcbcbcbcbcbc
dddddddddddddddddddd
adcdefg
bcbdbcbcbdbc
hellohell
Code returns:
abc
bc
d
adcdefg
bcbdbc
hellohell
Works in cases such as bcbdbcbcbdbc.
function smallestRepeatingString(sequence){
var currentRepeat = '';
var currentRepeatPos = 0;
for(var i=0, ii=sequence.length; i<ii; i++){
if(currentRepeat[currentRepeatPos] !== sequence[i]){
currentRepeatPos = 0;
// Add next character available to the repeat and reset i so we don't miss any matches inbetween
currentRepeat = currentRepeat + sequence.slice(currentRepeat.length, currentRepeat.length+1);
i = currentRepeat.length-1;
}else{
currentRepeatPos++;
}
if(currentRepeatPos === currentRepeat.length){
currentRepeatPos = 0;
}
}
// If repeat wasn't reset then we didn't find a full repeat at the end.
if(currentRepeatPos !== 0){ return sequence; }
return currentRepeat;
}
I came up with a simple solution. (It rescans the whole string for every candidate prefix, so expect roughly quadratic time on long inputs.)
PHP Implementation:
function get_srs($s){
$hash = md5( $s );
$i = 0; $p = '';
do {
$p .= $s[$i++];
preg_match_all( '/' . preg_quote( $p, '/' ) . '/', $s, $m );
} while ( ! hash_equals( $hash, md5( implode( '', $m[0] ) ) ) );
return $p;
}

finding if two words are anagrams of each other

I am looking for a method to find if two strings are anagrams of one another.
Ex: string1 - abcde
string2 - abced
Ans = true
Ex: string1 - abcde
string2 - abcfed
Ans = false
The solution I came up with so far is to sort both strings and compare them character by character until the end of either string. That would be O(n log n). I am looking for some other efficient method which doesn't change the two strings being compared.
Count the frequency of each character in the two strings. Check if the two histograms match. O(n) time, O(1) space (assuming ASCII) (Of course it is still O(1) space for Unicode but the table will become very large).
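A minimal Python sketch of the histogram comparison, assuming collections.Counter is an acceptable stand-in for the frequency table (the function name is hypothetical):

```python
from collections import Counter


def are_anagrams(a, b):
    # Different lengths can never be anagrams; this also skips the counting.
    if len(a) != len(b):
        return False
    # Counter builds a character -> frequency histogram in O(n).
    return Counter(a) == Counter(b)


print(are_anagrams("abcde", "abced"))
print(are_anagrams("abcde", "abcfed"))
```

Neither input string is modified, which was one of the requirements in the question.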
Get a table of prime numbers large enough to map each character to its own prime. Then start from 1 and go through the string, multiplying the running number by the prime representing the current character. The number you get depends only on the characters in the string, not on their order, and every distinct multiset of characters corresponds to a unique number, since any number can be factored into primes in only one way. So you can just compare the two numbers to say whether the strings are anagrams of each other.
Unfortunately you have to use multiple-precision (arbitrary-precision) integer arithmetic to do this, or you will get overflow or rounding errors when using this method.
For this you may use libraries like BigInteger, GMP, MPIR or IntX.
Pseudocode:
prime[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101}
primehash(string)
Y = 1;
foreach character in string
Y = Y * prime[character-'a']
return Y
isanagram(str1, str2)
return primehash(str1)==primehash(str2)
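The pseudocode above translates almost directly to Python, whose built-in integers are arbitrary precision, so no extra library is needed there. This sketch assumes the input consists only of the lowercase letters a-z, as the 26-entry prime table implies; the names are made up for the example.

```python
# First 26 primes, one per lowercase letter a-z (same table as the pseudocode).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_hash(word):
    product = 1
    for ch in word:
        # Assumes ch is in 'a'..'z'; other characters would index out of range.
        product *= PRIMES[ord(ch) - ord('a')]
    return product


def is_anagram(a, b):
    return prime_hash(a) == prime_hash(b)


print(is_anagram("abcde", "abced"))
```

The product grows with every character, so for long strings the multiplications are no longer constant-time operations.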
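Create a hash map where the key is a letter and the value is the frequency of that letter:
for the first string, populate the hash map (O(n));
for the second string, decrement the count and remove the element from the hash map once its count reaches zero (O(n)); if a letter is not in the map, the strings are not anagrams;
if the hash map is empty at the end, the strings are anagrams, otherwise not.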
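The populate-then-cancel steps can be sketched in Python with a plain dict. This is one way to read the description, under the assumption that a letter missing from the map during the second pass means an immediate "not an anagram"; the function name is hypothetical.

```python
def is_anagram_by_cancelling(a, b):
    remaining = {}
    for ch in a:
        # Populate: letter -> how many copies still need to be matched.
        remaining[ch] = remaining.get(ch, 0) + 1
    for ch in b:
        if ch not in remaining:
            return False  # b has a letter (or an extra copy) that a lacks
        remaining[ch] -= 1
        if remaining[ch] == 0:
            del remaining[ch]  # fully matched: remove the entry
    # Leftover entries mean a has letters that b never used.
    return not remaining


print(is_anagram_by_cancelling("anagram", "nagaram"))
```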
The steps are:
1) Check the lengths of both words/strings; only if they are equal proceed to check for an anagram, otherwise they cannot be anagrams.
2) Sort both words/strings and then compare them.
Java code for the same:
package anagram;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
/**
*
* @author Sunshine
*/
public class Anagram {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
// TODO code application logic here
System.out.println("Enter the first string");
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s1 = br.readLine().toLowerCase();
System.out.println("Enter the Second string");
// Reuse br: a second BufferedReader on System.in could miss input the first one already buffered
String s2 = br.readLine().toLowerCase();
char c1[] = null;
char c2[] = null;
if (s1.length() == s2.length()) {
c1 = s1.toCharArray();
c2 = s2.toCharArray();
Arrays.sort(c1);
Arrays.sort(c2);
if (Arrays.equals(c1, c2)) {
System.out.println("Both strings are equal and hence they have anagram");
} else {
System.out.println("Sorry No anagram in the strings entred");
}
} else {
System.out.println("Sorry the string do not have anagram");
}
}
}
C#
public static bool AreAnagrams(string s1, string s2)
{
if (s1 == null) throw new ArgumentNullException("s1");
if (s2 == null) throw new ArgumentNullException("s2");
var chars = new Dictionary<char, int>();
foreach (char c in s1)
{
if (!chars.ContainsKey(c))
chars[c] = 0;
chars[c]++;
}
foreach (char c in s2)
{
if (!chars.ContainsKey(c))
return false;
chars[c]--;
}
return chars.Values.All(i => i == 0);
}
Some tests:
[TestMethod]
public void TestAnagrams()
{
Assert.IsTrue(StringUtil.AreAnagrams("anagramm", "nagaramm"));
Assert.IsTrue(StringUtil.AreAnagrams("anzagramm", "nagarzamm"));
Assert.IsTrue(StringUtil.AreAnagrams("anz121agramm", "nag12arz1amm"));
Assert.IsFalse(StringUtil.AreAnagrams("anagram", "nagaramm"));
Assert.IsFalse(StringUtil.AreAnagrams("nzagramm", "nagarzamm"));
Assert.IsFalse(StringUtil.AreAnagrams("anzagramm", "nag12arz1amm"));
}
Code to find whether two words are anagrams:
The logic has already been explained in a few answers, and a few people asked for the code. This solution produces the result in O(n) time.
This approach counts the number of occurrences of each character and stores it at the character's ASCII position in an array, one array per string. It then compares the two count arrays. If they are not equal, the given strings are not anagrams.
public boolean isAnagram(String str1, String str2)
{
//To get the no of occurrences of each character and store it in their ASCII location
int[] strCountArr1=getASCIICountArr(str1);
int[] strCountArr2=getASCIICountArr(str2);
//To test whether the two arrays have the same count of characters. Array size 256 covers character codes 0-255 (ASCII proper is 0-127)
for(int i=0;i<256;i++)
{
if(strCountArr1[i]!=strCountArr2[i])
return false;
}
return true;
}
public int[] getASCIICountArr(String str)
{
char c;
//Array size 256 for ASCII
int[] strCountArr=new int[256];
for(int i=0;i<str.length();i++)
{
c=str.charAt(i);
c=Character.toUpperCase(c);// If both the cases are considered to be the same
strCountArr[(int)c]++; //To increment the count in the character's ASCII location
}
return strCountArr;
}
Using an ASCII hash map that allows O(1) look-up for each char.
The Java example listed above converts to lower case, which seems incomplete. I have an example in C that simply initializes a hash-map array for ASCII values to -1.
If string2 differs in length from string1, they are not anagrams.
Otherwise, we set the hash-map values to 0 for each char in string1 and string2.
Then, for each char in string1, we increment the count in the hash map. Similarly, we decrement the count for each char in string2.
If the strings are anagrams, every char ends with a value of 0. If not, some nonzero value remains.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRAYMAX 128
#define True 1
#define False 0
int isAnagram(const char *string1,
const char *string2) {
int str1len = strlen(string1);
int str2len = strlen(string2);
if (str1len != str2len) /* Simple string length test */
return False;
int * ascii_hashtbl = (int * ) malloc((sizeof(int) * ARRAYMAX));
if (ascii_hashtbl == NULL) {
fprintf(stderr, "Memory allocation failed\n");
return -1;
}
memset((void *)ascii_hashtbl, -1, sizeof(int) * ARRAYMAX);
int index = 0;
while (index < str1len) { /* Populate hash_table for each ASCII value
in string1*/
ascii_hashtbl[(int)string1[index]] = 0;
ascii_hashtbl[(int)string2[index]] = 0;
index++;
}
index = index - 1;
while (index >= 0) {
ascii_hashtbl[(int)string1[index]]++; /* Increment something */
ascii_hashtbl[(int)string2[index]]--; /* Decrement something */
index--;
}
/* Use hash_table to compare string2 */
index = 0;
while (index < str1len) {
if (ascii_hashtbl[(int)string1[index]] != 0) {
/* some char is missing in string2 from string1 */
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return False;
}
index++;
}
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return True;
}
int main () {
char array1[ARRAYMAX], array2[ARRAYMAX];
int flag;
printf("Enter the string\n");
fgets(array1, ARRAYMAX, stdin);
printf("Enter another string\n");
fgets(array2, ARRAYMAX, stdin);
array1[strcspn(array1, "\r\n")] = 0;
array2[strcspn(array2, "\r\n")] = 0;
flag = isAnagram(array1, array2);
if (flag == 1)
printf("%s and %s are anagrams.\n", array1, array2);
else if (flag == 0)
printf("%s and %s are not anagrams.\n", array1, array2);
return 0;
}
Let's take a question: given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
Method 1(Using HashMap ):
public class Method1 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
private static boolean isAnagram(String a, String b) {
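// strings of different length can never be anagrams
if (a.length() != b.length()) { return false; }
Map<Character ,Integer> map = new HashMap<>();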
for( char c : a.toCharArray()) {
map.put(c, map.getOrDefault(c, 0 ) + 1 );
}
for(char c : b.toCharArray()) {
int count = map.getOrDefault(c, 0);
if(count == 0 ) {return false ; }
else {map.put(c, count - 1 ) ; }
}
return true;
}
}
Method 2 :
public class Method2 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b));// output=> true
}
private static boolean isAnagram(String a, String b) {
int[] alphabet = new int[26];
for(int i = 0 ; i < a.length() ;i++) {
alphabet[a.charAt(i) - 'a']++ ;
}
for (int i = 0; i < b.length(); i++) {
alphabet[b.charAt(i) - 'a']-- ;
}
for( int w : alphabet ) {
if(w != 0 ) {return false;}
}
return true;
}
}
Method 3 :
public class Method3 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
private static boolean isAnagram(String a, String b) {
char[] ca = a.toCharArray() ;
char[] cb = b.toCharArray();
Arrays.sort( ca );
Arrays.sort( cb );
return Arrays.equals(ca , cb );
}
}
Method 4 :
public class AnagramsOrNot {
public static void main(String[] args) {
String a = "Protijayi";
String b = "jayiProti";
isAnagram(a, b);
}
private static void isAnagram(String a, String b) {
Map<Integer, Integer> map = new LinkedHashMap<>();
a.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) + 1));
System.out.println(map);
b.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) - 1));
System.out.println(map);
if (map.values().stream().allMatch(count -> count == 0)) { // every count must cancel out, not just one
System.out.println("Anagrams");
} else {
System.out.println("Not Anagrams");
}
}
}
In Python:
def areAnagram(a, b):
    if len(a) != len(b):
        return False
    # one counter per character code (assumes all codes are below 256)
    count1 = [0] * 256
    count2 = [0] * 256
    for i in a:
        count1[ord(i)] += 1
    for i in b:
        count2[ord(i)] += 1
    for i in range(256):
        if count1[i] != count2[i]:
            return False
    return True
str1 = "Giniiii"
str2 = "Protijayi"
print(areAnagram(str1, str2))
Let's take another famous Interview Question: Group the Anagrams from a given String:
public class GroupAnagrams {
public static void main(String[] args) {
String a = "Gini Gina Protijayi iGin aGin jayiProti Soudipta";
Map<String, List<String>> map = Arrays.stream(a.split(" ")).collect(Collectors.groupingBy(GroupAnagrams::sortedString));
System.out.println("MAP => " + map);
map.forEach((k,v) -> System.out.println(k +" and the anagrams are =>" + v ));
/*
Look at the Map output:
MAP => {Giin=[Gini, iGin], Paiijorty=[Protijayi, jayiProti], Sadioptu=[Soudipta], Gain=[Gina, aGin]}
As we can see, there are multiple Lists. Hence, we have to use a flatMap(List::stream)
Now, Look at the output:
Paiijorty and the anagrams are =>[Protijayi, jayiProti]
Now, look at this output:
Sadioptu and the anagrams are =>[Soudipta]
List contains only word. No anagrams.
That means we have to work with map.values(). List contains all the anagrams.
*/
String stringFromMapHavingListofLists = map.values().stream().flatMap(List::stream).collect(Collectors.joining(" "));
System.out.println(stringFromMapHavingListofLists);
}
public static String sortedString(String a) {
String sortedString = a.chars().sorted()
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();
return sortedString;
}
/*
* The output : Gini iGin Protijayi jayiProti Soudipta Gina aGin
* All the anagrams are side by side.
*/
}
Grouping anagrams in Python is just as easy:
Sort the letters of each word, then build a dictionary keyed by the sorted word. Each value of the dictionary is the list of indices (into the original list) of the words that share that key, i.e. the words that are anagrams of one another.
def groupAnagrams(words):
    # sort the letters of each word in the list
    A = [''.join(sorted(word)) for word in words]
    dict = {}
    for indexofsamewords, names in enumerate(A):
        dict.setdefault(names, []).append(indexofsamewords)
    print(dict)
    #{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}
    for index in dict.values():
        print([words[i] for i in index])
if __name__ == '__main__':
    # list of words
    words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP", "Protijayi","Paikpara","dipSouta","Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
Another important anagram question: find the anagram group that occurs the maximum number of times.
In the example, ROOPA is the word whose anagrams occur the maximum number of times.
Hence, ['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP'] will be the final output.
from statistics import mode
import numpy as np
# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
"Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
"Protijayi","Paikpara","dipSouta","Shyambazaar",
"jayiProti", "North Calcutta", "Sovabazaar"]
print(".....Method 1....... ")
sortedwords = [''.join(sorted(word)) for word in words]
print(sortedwords)
print("...........")
LongestAnagram = np.array(words)[np.array(sortedwords) == mode(sortedwords)]
# Longest anagram
print("Longest anagram by Method 1:")
print(LongestAnagram)
print(".....................................................")
print(".....Method 2....... ")
A = [''.join(sorted(word)) for word in words]
dict = {}
for indexofsamewords, samewords in enumerate(A):
    # group the original words (not the sorted keys) under each key
    dict.setdefault(samewords, []).append(words[indexofsamewords])
#print(dict)
#{'AOOPR': ['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP'], 'ABTU': ['TABU', 'BUTA', 'BUAT'], 'Sadioptu': ['Soudipta', 'dipSouta'], ' KPaaehiklry': ['Kheyali Park'], 'Taeggllnouy': ['Tollygaunge'], 'Leov': ['Love'], 'Paiijorty': ['Protijayi', 'jayiProti'], 'Paaaikpr': ['Paikpara'], 'Saaaabhmryz': ['Shyambazaar'], ' CNaachlortttu': ['North Calcutta'], 'Saaaaborvz': ['Sovabazaar']}
aa = max(dict.items(), key=lambda x: len(x[1]))
print("aa =>", aa)
word, anagrams = aa
print("Longest anagram by Method 2:")
print(" ".join(anagrams))
The Output :
.....Method 1.......
['AOOPR', 'ABTU', 'AOOPR', 'ABTU', 'ABTU', 'AOOPR', 'Sadioptu', ' KPaaehiklry', 'Taeggllnouy', 'AOOPR', 'Leov', 'AOOPR', 'Paiijorty', 'Paaaikpr', 'Sadioptu', 'Saaaabhmryz', 'Paiijorty', ' CNaachlortttu', 'Saaaaborvz']
...........
Longest anagram by Method 1:
['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP']
.....................................................
.....Method 2.......
aa => ('AOOPR', ['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP'])
Longest anagram by Method 2:
ROOPA OOPAR PAROO AROOP AOORP
Well, you can probably improve the best and average case substantially just by checking the length first, then a quick checksum on the characters (nothing complex, as that would probably be of worse order than the sort; just a sum of the ordinal values), then sorting, then comparing.
If the strings are very short, the cost of the checksum will not differ much from that of the sort in many languages.
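The staged check described above could be sketched in Python roughly as follows. This is only an illustration: the function name `is_anagram_staged` is made up here, and the ordinal sum serves purely as a cheap filter before the sort, not as proof.

```python
def is_anagram_staged(a: str, b: str) -> bool:
    # Stage 1: strings of different lengths can never be anagrams.
    if len(a) != len(b):
        return False
    # Stage 2: cheap checksum (sum of ordinals); a mismatch rules anagrams out.
    if sum(map(ord, a)) != sum(map(ord, b)):
        return False
    # Stage 3: checksums can collide, so confirm by sorting and comparing.
    return sorted(a) == sorted(b)
```

The first two stages only ever reject early; the sort still decides every pair that passes them.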
How about this?
a = "lai d"
b = "di al"
# collect the non-space characters of each string
sorteda = [i for i in a if i != " "]
sortedb = [x for x in b if x != " "]
# sort case-insensitively, then compare
sorteda.sort(key = str.lower)
sortedb.sort(key = str.lower)
print(sortedb)
print(sorteda)
print(sortedb == sorteda)
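How about XORing both strings? This is definitely O(n). Note, however, that a zero XOR is only a necessary condition: "aa" and "bb" also XOR to zero, so a positive result still needs confirming with a count-based check.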
char* arr1="ab cde";
int n1=strlen(arr1);
char* arr2="edcb a";
int n2=strlen(arr2);
// to check for anagram;
int c=0;
int i=0, j=0;
if(n1!=n2)
printf("\nNot anagram");
else {
while(i<n1 && j<n2)
{
c^= ((int)arr1[i] ^ (int)arr2[j]);
i++;
j++;
}
/* only report a result once, and only when the lengths matched */
if(c==0) {
printf("\nAnagram");
}
else printf("\nNot anagram");
}
}
static bool IsAnagram(string s1, string s2)
{
if (s1.Length != s2.Length)
return false;
else
{
int sum1 = 0;
for (int i = 0; i < s1.Length; i++)
sum1 += (int)s1[i]-(int)s2[i];
if (sum1 == 0)
return true;
else
return false;
}
}
For known (and small) sets of valid letters (e.g. ASCII), use a table of counts, one per valid letter. The first string increments the counts and the second string decrements them. Finally, iterate through the table to see whether all counts are zero (the strings are anagrams) or some are non-zero (the strings are not anagrams). Make sure to convert all characters to uppercase (or lowercase, it makes no difference) and to ignore white space.
For a large set of valid letters, such as Unicode, use a hash table rather than a plain table. It offers O(1) expected time to add, query and remove, and O(n) space. Letters from the first string increment a count; letters from the second string decrement it. A count that reaches zero is removed from the hash table. The strings are anagrams if the hash table is empty at the end. Alternatively, the search terminates with a negative result as soon as any count becomes negative.
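A minimal Python sketch of the hash-table variant, assuming no case folding or whitespace skipping is wanted (both are left out for brevity). The name `is_anagram_counts` is hypothetical, and `defaultdict` stands in for the hash table.

```python
from collections import defaultdict

def is_anagram_counts(s: str, t: str) -> bool:
    counts = defaultdict(int)
    # Letters from the first string increment their count.
    for ch in s:
        counts[ch] += 1
    # Letters from the second string decrement it.
    for ch in t:
        counts[ch] -= 1
        if counts[ch] < 0:
            # t has more of this letter than s: stop early.
            return False
        if counts[ch] == 0:
            # Counts that reach zero are removed from the table.
            del counts[ch]
    # Anagrams leave the table empty.
    return not counts
```

Because zeroed entries are deleted, the final emptiness test also catches letters of `s` that `t` never used.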
Here is the detailed explanation and implementation in C#: Testing If Two Strings are Anagrams
If strings have only ASCII characters:
create an array of 256 length
traverse the first string and increment counter in the array at index = ascii value of the character. also keep counting characters to find length when you reach end of string
traverse the second string and decrement counter in the array at index = ascii value of the character. If the value is ever 0 before decrementing, return false since the strings are not anagrams. also, keep track of the length of this second string.
at the end of the string traversal, if lengths of the two are equal, return true, else, return false.
If string can have unicode characters, then use a hash map instead of an array to keep track of the frequency. Rest of the algorithm remains same.
Notes:
calculating length while adding characters to array ensures that we traverse each string only once.
Using array in case of an ASCII only string optimizes space based on the requirement.
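The ASCII steps above might look like this in Python. It is a sketch under the stated assumption that every character code is below 256 (anything larger would raise an `IndexError`); the name `is_anagram_ascii` is made up for illustration.

```python
def is_anagram_ascii(s: str, t: str) -> bool:
    counts = [0] * 256          # one counter per possible character code
    len_s = 0
    for ch in s:                # first pass: increment and measure length
        counts[ord(ch)] += 1
        len_s += 1
    len_t = 0
    for ch in t:                # second pass: decrement, bail out on a miss
        idx = ord(ch)
        if counts[idx] == 0:
            return False        # t uses a character s cannot supply
        counts[idx] -= 1
        len_t += 1
    # Every character of t was matched; equal lengths mean none of s is left.
    return len_s == len_t
```

In Python `len()` is already O(1), so counting the lengths by hand only mirrors what the description does for C-style strings.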
I guess your sorting algorithm is not really O(log n), is it?
The best you can get is O(n) for your algorithm, because you have to check every character.
You might use two tables to store the counts of each letter in every word, fill it with O(n) and compare it with O(1).
It seems that the following implementation works too, can you check?
int histogram[256] = {0};
size_t len = strlen(str1);
if (len != strlen(str2))
return 0; /* different lengths: not an anagram */
for (size_t i = 0; i < len; ++i) {
/* Just inc and dec every char count and
* check the histogram against 0 in the 2nd loop */
++histogram[(unsigned char)str1[i]];
--histogram[(unsigned char)str2[i]];
}
for (int i = 0; i < 256; ++i) {
if (histogram[i] != 0)
return 0; /* not an anagram */
}
return 1; /* an anagram */
/* Program to find the strings are anagram or not*/
/* Author Senthilkumar M*/
Eg.
Anagram:
str1 = stackoverflow
str2 = overflowstack
Not anagram:
str1 = stackforflow
str2 = stacknotflow
int is_anagram(char *str1, char *str2)
{
int l1 = strlen(str1);
int l2 = strlen(str2);
int s1 = 0, s2 = 0;
int i = 0;
/* if both the string are not equal it is not anagram*/
if(l1 != l2) {
return 0;
}
/* sum up the character in the strings
if the total sum of the two strings is not equal
it is not anagram */
for( i = 0; i < l1; i++) {
s1 += str1[i];
s2 += str2[i];
}
if(s1 != s2) {
return 0;
}
return 1;
}
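If both strings are of equal length, proceed; if not, the strings are not anagrams.
Iterate over each string while summing the ordinals of its characters. If the sums differ, the strings are not anagrams; if they are equal, the strings may be anagrams, but the sum alone does not prove it (for instance, "ad" and "bc" have the same sum).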
Example:
public Boolean AreAnagrams(String inOne, String inTwo) {
bool result = false;
if(inOne.Length == inTwo.Length) {
int sumOne = 0;
int sumTwo = 0;
for(int i = 0; i < inOne.Length; i++) {
sumOne += (int)inOne[i];
sumTwo += (int)inTwo[i];
}
result = sumOne == sumTwo;
}
return result;
}
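It may help to try the ordinal-sum test on a pair that is not an anagram. The Python sketch below uses two made-up helper names, `same_ordinal_sum` and `same_letter_counts`, to compare the sum test with a per-letter count.

```python
from collections import Counter

def same_ordinal_sum(a: str, b: str) -> bool:
    # The length-plus-sum test from the answers above.
    return len(a) == len(b) and sum(map(ord, a)) == sum(map(ord, b))

def same_letter_counts(a: str, b: str) -> bool:
    # Per-letter frequency comparison.
    return Counter(a) == Counter(b)

print(same_ordinal_sum("ad", "bc"))    # True:  97 + 100 == 98 + 99
print(same_letter_counts("ad", "bc"))  # False: the letters differ
```

So equal sums are best treated as a fast pre-check that can only rule pairs out.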
implementation in Swift 3:
func areAnagrams(_ str1: String, _ str2: String) -> Bool {
return dictionaryMap(forString: str1) == dictionaryMap(forString: str2)
}
func dictionaryMap(forString str: String) -> [String : Int] {
var dict : [String : Int] = [:]
for var i in 0..<str.characters.count {
if let count = dict[str[i]] {
dict[str[i]] = count + 1
}else {
dict[str[i]] = 1
}
}
return dict
}
//To easily subscript characters
extension String {
subscript(i: Int) -> String {
return String(self[index(startIndex, offsetBy: i)])
}
}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
/**
* --------------------------------------------------------------------------
* Finding Anagrams in the given dictionary. Anagrams are words that can be
* formed from other words Ex :The word "words" can be formed using the word
* "sword"
* --------------------------------------------------------------------------
* Input : if choose option 2 first enter no of word want to compare second
* enter word ex:
*
* Enter choice : 1:To use Test Cases 2: To give input 2 Enter the number of
* words in dictionary
* 6
* viq
* khan
* zee
* khan
* am
*
* Dictionary : [ viq khan zee khan am]
* Anagrams 1:[khan, khan]
*
*/
public class Anagrams {
public static void main(String args[]) {
// User Input or just use the testCases
int choice;
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter choice : \n1:To use Test Cases 2: To give input");
choice = scan.nextInt();
switch (choice) {
case 1:
testCaseRunner();
break;
case 2:
userInput();
default:
break;
}
}
private static void userInput() {
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter the number of words in dictionary");
int number = scan.nextInt();
scan.nextLine(); // consume the rest of the line after the number
String dictionary[] = new String[number];
//
for (int i = 0; i < number; i++) {
dictionary[i] = scan.nextLine();
}
printAnagramsIn(dictionary);
}
/**
* provides a some number of dictionary of words
*/
private static void testCaseRunner() {
String dictionary[][] = { { "abc", "cde", "asfs", "cba", "edcs", "name" },
{ "name", "mane", "string", "trings", "embe" } };
for (int i = 0; i < dictionary.length; i++) {
printAnagramsIn(dictionary[i]);
}
}
/**
* Prints the set of anagrams found the give dictionary
*
* logic is sorting the characters in the given word and hashing them to the
* word. Data Structure: Hash[sortedChars] = word
*/
private static void printAnagramsIn(String[] dictionary) {
System.out.print("Dictionary : [");// + dictionary);
for (String each : dictionary) {
System.out.print(each + " ");
}
System.out.println("]");
//
Map<String, ArrayList<String>> map = new LinkedHashMap<String, ArrayList<String>>();
// review comment: naming convention: dictionary contains 'word' not
// 'each'
for (String each : dictionary) {
char[] sortedWord = each.toCharArray();
// sort dic value
Arrays.sort(sortedWord);
//input word
String sortedString = new String(sortedWord);
//
ArrayList<String> list = new ArrayList<String>();
if (map.keySet().contains(sortedString)) {
list = map.get(sortedString);
}
list.add(each);
map.put(sortedString, list);
}
// print anagram
int i = 1;
for (String each : map.keySet()) {
if (map.get(each).size() != 1) {
System.out.println("Anagrams " + i + ":" + map.get(each));
i++;
}
}
}
}
I just had an interview and 'SolutionA' was basically my solution.
Seems to hold.
It might also work to sum all characters, or the hashCodes of each character, but it would still be at least O(n).
/**
* Using HashMap
*
* O(a + b + b + b) = O(a + 3*b) = O( 4n ) if a and b are equal. Meaning O(n) in total.
*/
public static final class SolutionA {
//
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
HashMap<Character, Integer> aa = toHistogram(a);
HashMap<Character, Integer> bb = toHistogram(b);
return isHistogramsEqual(aa, bb);
}
private static HashMap<Character, Integer> toHistogram(String characters) {
HashMap<Character, Integer> histogram = new HashMap<>();
int i = -1; while ( ++i < characters.length() ) {
histogram.compute(characters.charAt(i), (k, v) -> {
if ( v == null ) v = 0;
return v+1;
});
}
return histogram;
}
private static boolean isHistogramsEqual(HashMap<Character, Integer> a, HashMap<Character, Integer> b) {
for ( Map.Entry<Character, Integer> entry : b.entrySet() ) {
Integer aa = a.get(entry.getKey());
Integer bb = entry.getValue();
if ( !Objects.equals(aa, bb) ) {
return false;
}
}
return true;
}
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
}
I've provided a hashCode() based implementation as well. Seems to hold as well.
/**
* Using hashCode()
*
* O(a + b) minimum + character.hashCode() calculation, the latter might be cheap though. Native implementation.
*
* Risk for collision albeit small.
*/
public static final class SolutionB {
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
return toHashcode(a) == toHashcode(b);
}
private static long toHashcode(String str) {
long sum = 0; int i = -1; while ( ++i < str.length() ) {
sum += Objects.hashCode( str.charAt(i) );
}
return sum;
}
}
In Java we can also do it like this, with very simple logic:
import java.util.*;
class Anagram
{
public static void main(String args[]) throws Exception
{
Boolean FLAG=true;
Scanner sc= new Scanner(System.in);
System.out.println("Enter 1st string");
String s1=sc.nextLine();
System.out.println("Enter 2nd string");
String s2=sc.nextLine();
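int i,j;
i=s1.length();
j=s2.length();
if(i==j)
{
boolean[] used=new boolean[j]; // characters of s2 that are already matched
for(int k=0;k<i && FLAG;k++)
{
FLAG=false; // stays false if s1.charAt(k) has no unused partner in s2
for(int l=0;l<j;l++)
{
if(!used[l] && s1.charAt(k)==s2.charAt(l))
{
used[l]=true;
FLAG=true;
break;
}
}
}
}
else
FLAG=false;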
if(FLAG)
System.out.println("Given Strings are anagrams");
else
System.out.println("Given Strings are not anagrams");
}
}
How about converting each character to its integer value and summing them up?
If the sums are equal, the strings may be anagrams of each other (equal sums are necessary but not sufficient, so this is best used as a quick pre-check).
def are_anagram1(s1, s2):
    return sum(ord(x) for x in s1) == sum(ord(x) for x in s2)
s1 = 'james'
s2 = 'amesj'
print(are_anagram1(s1, s2))
This solution works only for 'A' to 'Z' and 'a' to 'z'.

How to find validity of a string of parentheses, curly brackets and square brackets?

I recently came across this interesting problem. You are given a string containing just the characters '(', ')', '{', '}', '[' and ']', for example "[{()}]", and you need to write a function that checks the validity of such an input string. The function may look like this:
bool isValid(char* s);
The brackets have to close in the correct order; for example, "()" and "()[]{}" are valid, but "(]", "([)]" and "{{{{" are not!
I came up with the following solution, with O(n) time and O(n) space complexity, which works fine:
Maintain a stack of characters.
Whenever you find an opening brace '(', '{' or '[', push it on the stack.
Whenever you find a closing brace ')', '}' or ']', check whether the top of the stack is the corresponding opening bracket; if yes, pop the stack, else break the loop and return false.
Repeat steps 2 - 3 until the end of the string; the string is valid if the stack is empty at that point.
This works, but can we optimize it for space, maybe down to constant extra space? I understand that the time complexity cannot be less than O(n), as we have to look at every character.
So my question is can we solve this problem in O(1) space?
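For reference, the stack-based approach described in the question can be sketched in Python as below. This is the O(n) space baseline, not an O(1) space answer, and the name `is_valid` simply mirrors the `isValid` prototype above.

```python
def is_valid(s: str) -> bool:
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            # Opening bracket: remember it.
            stack.append(ch)
        elif ch in closing_to_opening:
            # Closing bracket: the most recent opener must match it.
            if not stack or stack.pop() != closing_to_opening[ch]:
                return False
    # Leftover openers (as in "{{{{") make the string invalid.
    return not stack
```

The final emptiness check is what rejects inputs such as "{{{{", which never trigger a mismatch inside the loop.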
With reference to the excellent answer from Matthieu M., here is an implementation in C# that seems to work beautifully.
/// <summary>
/// Checks to see if brackets are well formed.
/// Passes "Valid parentheses" challenge on www.codeeval.com,
/// which is a programming challenge site much like www.projecteuler.net.
/// </summary>
/// <param name="input">Input string, consisting of nothing but various types of brackets.</param>
/// <returns>True if brackets are well formed, false if not.</returns>
static bool IsWellFormedBrackets(string input)
{
string previous = "";
while (input.Length != previous.Length)
{
previous = input;
input = input
.Replace("()", String.Empty)
.Replace("[]", String.Empty)
.Replace("{}", String.Empty);
}
return (input.Length == 0);
}
Essentially, all it does is remove pairs of brackets until there are none left to remove; if there is anything left the brackets are not well formed.
Examples of well formed brackets:
()[]
{()[]}
Example of malformed brackets:
([)]
{()[}]
Actually, there's a deterministic log-space algorithm due to Ritchie and Springsteel: http://dx.doi.org/10.1016/S0019-9958(72)90205-7 (paywalled, sorry not online). Since we need log bits to index the string, this is space-optimal.
If you're willing to accept one-sided error, then there's an algorithm that uses n polylog(n) time and polylog(n) space: http://www.eccc.uni-trier.de/report/2009/119/
If the input is read-only, I don't think we can do O(1) space. It is a well known fact that any O(1) space decidable language is regular (i.e writeable as a regular expression). The set of strings you have is not a regular language.
Of course, this is about a Turing Machine. I would expect it to be true for fixed word RAM machines too.
Edit: Although simple, this algorithm is actually O(n^2) in terms of character comparisons. To demonstrate it, one can simply generate a string as '(' * n + ')' * n.
I have a simple, though perhaps erroneous idea, that I will submit to your criticisms.
It's a destructive algorithm, which means that if you ever need the string it would not help (since you would need to copy it down).
Otherwise, the algorithm works with a simple index into the current string.
The idea is to remove pairs one after the other:
([{}()])
([()])
([])
()
empty -> OK
It is based on the simple fact that if we have matching pairs, then at least one is of the form () without any pair character in between.
Algorithm:
i := 0
Find a matching pair from i. If none is found, then the string is not valid. If one is found, let i be the index of the first character.
Remove [i:i+1] from the string
If i is at the end of the string, and the string is not empty, it's a failure.
If [i-1:i] is a matching pair, i := i-1 and back to 3.
Else, back to 1.
The algorithm is O(n) in complexity because:
each iteration of the loop removes 2 characters from the string
the step 2., which is linear, is naturally bound (i cannot grow indefinitely)
And it's O(1) in space because only the index is required.
Of course, if you can't afford to destroy the string, then you'll have to copy it, and that's O(n) in space so no real benefit there!
Unless, of course, I am deeply mistaken somewhere... and perhaps someone could use the original idea (there is a pair somewhere) to better effect.
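The pair-removal idea can be tried out with the Python sketch below. Note the assumptions: Python strings are immutable, so a list stands in for the destructible string (which costs O(n) space here), and the name `is_valid_by_removal` is hypothetical. Each deletion also shifts the tail of the list, so this version is not linear in the worst case.

```python
def is_valid_by_removal(s: str) -> bool:
    pairs = {"()", "[]", "{}"}
    chars = list(s)              # mutable stand-in for an in-place string
    i = 0
    while i + 1 < len(chars):
        if chars[i] + chars[i + 1] in pairs:
            del chars[i:i + 2]   # remove the adjacent matching pair
            i = max(i - 1, 0)    # step back: a new pair may have formed
        else:
            i += 1               # keep looking for the next adjacent pair
    # Anything left over could not be paired up.
    return not chars
```

On "([{}()])" the list shrinks to "([()])", "([])", "()" and finally to the empty list, matching the trace given above.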
I doubt you'll find a better solution, since even if you use internal functions to regexp or count occurrences, they still have a O(...) cost. I'd say your solution is the best :)
To optimize for space you could do some run-length encoding on your stack, but I doubt it would gain you very much, except in cases like {{{{{{{{{{}}}}}}}}}}.
http://www.sureinterview.com/shwqst/112007
It is natural to solve this problem with a stack.
If only '(' and ')' are used, the stack is not necessary. We just need to maintain a counter for the unmatched left '('. The expression is valid if the counter is always non-negative during the match and is zero at the end.
In general case, although the stack is still necessary, the depth of the stack can be reduced by using a counter for unmatched braces.
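For the single-bracket-type case, the counter idea can be sketched in Python like this (the function name `is_valid_parens_only` is made up for illustration):

```python
def is_valid_parens_only(s: str) -> bool:
    unmatched = 0                # number of '(' still waiting for a ')'
    for ch in s:
        if ch == "(":
            unmatched += 1
        elif ch == ")":
            unmatched -= 1
            if unmatched < 0:    # a ')' arrived with nothing to close
                return False
    # Valid only if every '(' was eventually closed.
    return unmatched == 0
```

The counter uses O(log n) bits rather than a stack of up to n characters, which is why it only works while there is a single kind of bracket.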
This is working Java code in which I filter the brackets out of the string expression and then check well-formedness by repeatedly replacing well-formed pairs of braces with empty strings.
Sample input = (a+{b+c}-[d-e])+[f]-[g] FilterBrackets will output = ({}[])[][] Then I check for wellformedness.
Comments welcome.
public class ParanString {
public static void main(String[] args) {
String s = FilterBrackets("(a+{b+c}-[d-e])[][]");
while ((s.length()!=0) && (s.contains("[]")||s.contains("()")||s.contains("{}")))
{
//System.out.println(s.length());
//System.out.println(s);
s = s.replace("[]", "");
s = s.replace("()", "");
s = s.replace("{}", "");
}
if(s.length()==0)
{
System.out.println("Well Formed");
}
else
{
System.out.println("Not Well Formed");
}
}
public static String FilterBrackets(String str)
{
int len=str.length();
char arr[] = str.toCharArray();
String filter = "";
for (int i = 0; i < len; i++)
{
if ((arr[i]=='(') || (arr[i]==')') || (arr[i]=='[') || (arr[i]==']') || (arr[i]=='{') || (arr[i]=='}'))
{
filter=filter+arr[i];
}
}
return filter;
}
}
The following modification of Sbusidan's answer is O(n^2) in time but uses only O(log n) space.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
char opposite(char bracket) {
switch(bracket) {
case '[':
return ']';
case '(':
return ')';
default:
return '\0'; /* not an opening bracket */
}
}
bool is_balanced(int length, char *s) {
int depth, target_depth, index;
char target_bracket;
if(length % 2 != 0) {
return false;
}
for(target_depth = length/2; target_depth > 0; target_depth--) {
depth=0;
for(index = 0; index < length; index++) {
switch(s[index]) {
case '(':
case '[':
depth++;
if(depth == target_depth) target_bracket = opposite(s[index]);
break;
case ')':
case ']':
if(depth == 0) return false;
if(depth == target_depth && s[index] != target_bracket) return false;
depth--;
break;
}
}
if(depth != 0) return false; /* some opening bracket was never closed */
}
return true;
}
int main(void) {
char input[] = "([)[(])]";
char *balanced = is_balanced(strlen(input), input) ? "balanced" : "imbalanced";
printf("%s is %s.\n", input, balanced);
}
If you can overwrite the input string (not reasonable in the use cases I envision, but what the heck...) you can do it in constant space, though I believe the time requirement goes up to O(n^2).
Like this:
string s = input
char c = null
int i=0
do
if s[i] isAOpenChar()
c = s[i]
else if s[i] isACloseChar()
if closeMatchesOpen(s[i],c)
erase s[i]
while s[--i] != c ;
erase s[i]
c = null
i = 0; // Not optimal! It would be better to back up until you find an opening character
else
return fail
end if
while (s[++i] != EOS)
if c==null
return pass
else
return fail
The essence of this is to use the early part of the input as the stack.
I know I'm a little late to this party; it's also my very first post on StackOverflow.
But when I looked through the answers, I thought I might be able to come up with a better solution.
So my solution is to use a few pointers.
It doesn't even have to use any RAM storage, as registers can be used for this.
I have not tested the code; I wrote it on the fly.
You'll need to fix my typos, and debug it, but I believe you'll get the idea.
Memory usage: Only the CPU registers in most cases.
CPU usage: It depends, but approximately twice the time it takes to read the string.
Modifies memory: No.
b: string beginning, e: string end.
l: left position, r: right position.
c: char, m: match char
if r reaches the end of the string, we have a success.
l goes backwards from r towards b.
Whenever r meets a new start kind, set l = r.
when l reaches b, we're done with the block; jump to beginning of next block.
const char *chk(const char *b, int len) /* option 2: remove int len */
{
char c, m;
const char *l, *r, *e;
e = &b[len]; /* option 2: remove. */
l = b;
r = b;
while(r < e) /* option 2: change to while(1) */
{
c = *r++;
/* option 2: if(0 == c) break; */
if('(' == c || '{' == c || '[' == c)
{
l = r;
}
else if(')' == c || ']' == c || '}' == c)
{
/* find 'previous' starting brace */
m = 0;
while(l > b && '(' != m && '[' != m && '{' != m)
{
m = *--l;
}
/* now check if we have the correct one: */
if(((m & 1) + 1 + m) != c) /* cryptic: convert starting kind to ending kind and match with c */
{
return(r - 1); /* point to error */
}
if(l <= b) /* did we reach the beginning of this block ? */
{
b = r; /* set new beginning to 'head' */
l = b; /* obsolete: make left is in range. */
}
}
}
m = 0;
while(l > b && '(' != m && '[' != m && '{' != m)
{
m = *--l;
}
return(m ? l : NULL); /* NULL-pointer for OK */
}
After thinking about this approach for a while, I realized that it will not work as it is right now.
The problem will be that if you have "[()()]", it'll fail when reaching the ']'.
But instead of deleting the proposed solution, I'll leave it here, as it's actually not impossible to make it work, it does require some modification, though.
/**
*
* @author madhusudan
*/
public class Main {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
new Main().validateBraces("()()()()(((((())))))()()()()()()()()");
// TODO code application logic here
}
/**
* Use this method to validate braces
*/
public void validateBraces(String teststr)
{
StringBuffer teststr1=new StringBuffer(teststr);
int ind=-1;
for(int i=0;i<teststr1.length();)
{
if(teststr1.length()<1)
break;
char ch=teststr1.charAt(0);
if(isClose(ch))
break;
else if(isOpen(ch))
{
ind=teststr1.indexOf(")", i);
if(ind==-1)
break;
teststr1=teststr1.deleteCharAt(ind).deleteCharAt(i);
}
else if(isClose(ch))
{
teststr1=deleteOpenBraces(teststr1,0,i);
}
}
if(teststr1.length()>0)
{
System.out.println("Invalid");
}else
{
System.out.println("Valid");
}
}
public boolean isOpen(char ch)
{
if("(".equals(Character.toString(ch)))
{
return true;
}else
return false;
}
public boolean isClose(char ch)
{
if(")".equals(Character.toString(ch)))
{
return true;
}else
return false;
}
public StringBuffer deleteOpenBraces(StringBuffer str,int start,int end)
{
char ar[]=str.toString().toCharArray();
for(int i=start;i<end;i++)
{
if("(".equals(ar[i]))
str=str.deleteCharAt(i).deleteCharAt(end);
break;
}
return str;
}
}
Instead of pushing braces onto a stack, you could use two pointers to check the characters of the string: one starts from the beginning of the string and the other starts from the end. Something like:
bool isValid(char* s) {
start = find_first_brace(s);
end = find_last_brace(s);
while (start <= end) {
if (!IsPair(start,end)) return false;
// move the pointer forward until reach a brace
start = find_next_brace(start);
// move the pointer backward until reach a brace
end = find_prev_brace(end);
}
return true;
}
Note that there are some corner cases not handled.
I think that you can implement an O(n) algorithm. Simply initialise a counter variable for each type: curly, square and normal brackets. Then iterate over the string, increasing the corresponding counter when a bracket is opened and decreasing it otherwise. If a counter becomes negative, return false. After you have counted all brackets, check whether all counters are zero. In that case, return true. Note, however, that counters alone cannot detect wrongly interleaved brackets such as "([)]", which this check accepts.
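A minimal Python sketch of the per-type counters, with a hypothetical function name `counters_balanced`; the second call exercises an interleaved input.

```python
def counters_balanced(s: str) -> bool:
    opening = {"(": 0, "[": 1, "{": 2}
    closing = {")": 0, "]": 1, "}": 2}
    counts = [0, 0, 0]               # one counter per bracket type
    for ch in s:
        if ch in opening:
            counts[opening[ch]] += 1
        elif ch in closing:
            counts[closing[ch]] -= 1
            if counts[closing[ch]] < 0:
                return False         # closed more than were opened
    # All counters must return to zero.
    return all(c == 0 for c in counts)

print(counters_balanced("()[]{}"))   # True
print(counters_balanced("([)]"))     # True, although the nesting is wrong
```

The counters track how many brackets of each type are open, but not the order in which they were opened, which is the information a stack keeps.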
You could provide the value and check whether it is valid; the program prints YES if it is and NO otherwise.
static void Main(string[] args)
{
string value = "(((([{[(}]}]))))";
List<string> jj = new List<string>();
if (!(value.Length % 2 == 0))
{
Console.WriteLine("NO");
}
else
{
bool isValid = true;
List<string> items = new List<string>();
for (int i = 0; i < value.Length; i++)
{
string item = value.Substring(i, 1);
if (item == "(" || item == "{" || item == "[")
{
items.Add(item);
}
else
{
string openItem = items.Count > 0 ? items[items.Count - 1] : ""; // a closing bracket with nothing open must not throw
if (((item == ")" && openItem == "(")) || (item == "}" && openItem == "{") || (item == "]" && openItem == "["))
{
items.RemoveAt(items.Count - 1);
}
else
{
isValid = false;
break;
}
}
}
if (isValid && items.Count == 0) // unclosed opening brackets also make the value invalid
{
Console.WriteLine("YES");
}
else
{
Console.WriteLine("NO");
}
}
Console.ReadKey();
}
var verify = function(text)
{
var symbolsArray = ['[]', '()', '<>'];
var symbolReg = function(n)
{
var reg = [];
for (var i = 0; i < symbolsArray.length; i++) {
reg.push('\\' + symbolsArray[i][n]);
}
return new RegExp('(' + reg.join('|') + ')','g');
};
// openReg matches '[', '(' and '<'
var openReg = symbolReg(0);
// closeReg matches ']', ')' and '>'
var closeReg = symbolReg(1);
// nestTest matches openSymbol+anyChars+closeSymbol
// and returns an object with the matched string and its start index
var nestTest = function(symbols, text)
{
var open = symbols[0]
, close = symbols[1]
, reg = new RegExp('(\\' + open + ')([\\s\\S])*(\\' + close + ')','g')
, test = reg.exec(text);
if (test) return {
start: test.index,
str: test[0]
};
else return false;
};
var recursiveCheck = function(text)
{
var i, nestTests = [], test, symbols;
// nestTest with each symbol
for (i = 0; i < symbolsArray.length; i++)
{
symbols = symbolsArray[i];
test = nestTest(symbols, text);
if (test) nestTests.push(test);
}
// sort tests by start index
nestTests.sort(function(a, b)
{
return a.start - b.start;
});
if (nestTests.length)
{
// build nest data: calculate match end index
for (i = 0; i < nestTests.length; i++)
{
test = nestTests[i];
var end = test.start + ( (test.str) ? test.str.length : 0 );
nestTests[i].end = end;
var last = (nestTests[i + 1]) ? nestTests[i + 1].index : text.length;
nestTests[i].pos = text.substring(end, last);
}
for (i = 0; i < nestTests.length; i++)
{
test = nestTests[i];
// recursive checks what's after the nest
if (test.pos.length && !recursiveCheck(test.pos)) return false;
// recursive checks what's in the nest
if (test.str.length) {
test.str = test.str.substring(1, test.str.length - 1);
return recursiveCheck(test.str);
} else return true;
}
} else {
// if no nests then check for orphan symbols
var closeTest = closeReg.test(text);
var openTest = openReg.test(text);
return !(closeTest || openTest);
}
};
return recursiveCheck(text);
};
Using C#: a small and simple solution that repeatedly removes adjacent matching pairs.
Console.WriteLine("Enter the string");
string str = Console.ReadLine();
int length = str.Length;
if (length % 2 == 0)
{
while (length > 0 && str.Length > 0)
{
for (int i = 0; i < str.Length; i++)
{
if (i + 1 < str.Length)
{
switch (str[i])
{
case '{':
if (str[i + 1] == '}')
str = str.Remove(i, 2);
break;
case '(':
if (str[i + 1] == ')')
str = str.Remove(i, 2);
break;
case '[':
if (str[i + 1] == ']')
str = str.Remove(i, 2);
break;
}
}
}
length--;
}
if(str.Length > 0)
Console.WriteLine("Invalid input");
else
Console.WriteLine("Valid input");
}
else
Console.WriteLine("Invalid input");
Console.ReadKey();
This is my solution to the problem.
It runs in O(n) time and uses O(1) extra space.
Code in C.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
bool checkBraket(char *s)
{
int curly = 0, rounded = 0, squre = 0;
int i = 0;
char ch = s[0];
while (ch != '\0')
{
if (ch == '{') curly++;
if (ch == '}') {
if (curly == 0) {
return false;
} else {
curly--; }
}
if (ch == '[') squre++;
if (ch == ']') {
if (squre == 0) {
return false;
} else {
squre--;
}
}
if (ch == '(') rounded++;
if (ch == ')') {
if (rounded == 0) {
return false;
} else {
rounded--;
}
}
i++;
ch = s[i];
}
if (curly == 0 && rounded == 0 && squre == 0){
return true;
}
else {
return false;
}
}
int main(void)
{
char mystring[] = "{{{{{[(())}}]}}}";
bool answer = checkBraket(mystring);
printf("my answer is %d\n", answer);
return 0;
}

How can I compute the number of characters required to turn a string into a palindrome?

I recently found a contest problem that asks you to compute the minimum number of characters that must be inserted (anywhere) in a string to turn it into a palindrome.
For example, given the string: "abcbd" we can turn it into a palindrome by inserting just two characters: one after "a" and another after "d": "adbcbda".
This seems to be a generalization of a similar problem that asks for the same thing, except characters can only be added at the end - this has a pretty simple solution in O(N) using hash tables.
I have been trying to modify the Levenshtein distance algorithm to solve this problem, but haven't been successful. Any help on how to solve this (it doesn't necessarily have to be efficient, I'm just interested in any DP solution) would be appreciated.
Note: This is just a curiosity. Dav proposed an algorithm which can easily be modified into a DP algorithm that runs in O(n^2) time and O(n^2) space (and perhaps O(n) with better bookkeeping).
Of course, this 'naive' algorithm might actually come in handy if you decide to change the allowed operations.
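For reference, one standard O(n^2) time and O(n^2) space formulation is an interval DP over substrings. This is a sketch of that textbook recurrence, not necessarily the algorithm referred to above, and the name min_insertions is made up for illustration. If the two end characters match, they pair up for free; otherwise one of them needs an inserted partner.

```python
def min_insertions(s):
    """Fewest characters to insert anywhere in s to make it a palindrome."""
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = fewest insertions that turn s[i..j] (inclusive) into a palindrome
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                # the ends already mirror each other; only the inside matters
                dp[i][j] = dp[i + 1][j - 1] if length > 2 else 0
            else:
                # insert a partner for s[i] or for s[j], whichever is cheaper
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]
```

For the string from the question, min_insertions("abcbd") gives 2.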
Here is a 'naive'ish algorithm, which can probably be made faster with clever bookkeeping.
Given a string, we guess the middle of the resulting palindrome and then try to compute the number of inserts required to make the string a palindrome around that middle.
If the string is of length n, there are 2n+1 possible middles (Each character, between two characters, just before and just after the string).
Suppose we consider a middle which gives us two strings L and R (one to left and one to right).
If we are using inserts, I believe the Longest Common Subsequence algorithm (which is a DP algorithm) can now be used to create a 'super' string which contains both L and the reverse of R, see Shortest common supersequence.
Pick the middle which gives you the smallest number of inserts.
This is O(n^3) I believe. (Note: I haven't tried proving that it is true).
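The middle-guessing idea above can be sketched as follows, assuming the number of inserts for a given middle is derived from the LCS of L and the reverse of R. The helper names are hypothetical. For a middle between two characters, the palindrome is the shortest common supersequence of L and reversed R followed by its mirror image, which works out to n - 2*LCS inserts. For a middle on a character, it is n - 1 - 2*LCS.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def min_insertions_by_middle(s):
    """Try all 2n+1 middles and keep the cheapest one."""
    n = len(s)
    best = n
    for k in range(n + 1):
        # middle falls in the gap just before s[k] (k == n means after the string)
        best = min(best, n - 2 * lcs_length(s[:k], s[k:][::-1]))
        if k < n:
            # middle is the character s[k] itself
            best = min(best, n - 1 - 2 * lcs_length(s[:k], s[k + 1:][::-1]))
    return best
```

Each LCS computation is quadratic and there are 2n+1 middles, which is where the cubic bound comes from.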
My C# solution looks for repeated characters in a string and uses them to reduce the number of insertions. In a word like program, I use the 'r' characters as a boundary. Inside of the 'r's, I make that a palindrome (recursively). Outside of the 'r's, I mirror the characters on the left and the right.
Some inputs have more than one shortest output: output can be toutptuot or outuputuo. My solution only selects one of the possibilities.
Some example runs:
radar -> radar, 0 insertions
esystem -> metsystem, 2 insertions
message -> megassagem, 3 insertions
stackexchange -> stegnahckexekchangets, 8 insertions
First I need to check if an input is already a palindrome:
public static bool IsPalindrome(string str)
{
for (int left = 0, right = str.Length - 1; left < right; left++, right--)
{
if (str[left] != str[right])
return false;
}
return true;
}
Then I need to find any repeated characters in the input. There may be more than one. The word message has two most-repeated characters ('e' and 's'):
private static bool TryFindMostRepeatedChar(string str, out List<char> chs)
{
chs = new List<char>();
int maxCount = 1;
var dict = new Dictionary<char, int>();
foreach (var item in str)
{
int temp;
if (dict.TryGetValue(item, out temp))
{
dict[item] = temp + 1;
// keep the highest count seen so far, not just the latest one
maxCount = Math.Max(maxCount, temp + 1);
}
else
dict.Add(item, 1);
}
foreach (var item in dict)
{
if (item.Value == maxCount)
chs.Add(item.Key);
}
return maxCount > 1;
}
My algorithm is here:
public static string MakePalindrome(string str)
{
List<char> repeatedList;
if (string.IsNullOrWhiteSpace(str) || IsPalindrome(str))
{
return str;
}
//If an input has repeated characters,
// use them to reduce the number of insertions
else if (TryFindMostRepeatedChar(str, out repeatedList))
{
string shortestResult = null;
foreach (var ch in repeatedList) //"program" -> { 'r' }
{
//find boundaries
int iLeft = str.IndexOf(ch); // "program" -> 1
int iRight = str.LastIndexOf(ch); // "program" -> 4
//make a palindrome of the inside chars
string inside = str.Substring(iLeft + 1, iRight - iLeft - 1); // "program" -> "og"
string insidePal = MakePalindrome(inside); // "og" -> "ogo"
string right = str.Substring(iRight + 1); // "program" -> "am"
string rightRev = Reverse(right); // "program" -> "ma"
string left = str.Substring(0, iLeft); // "program" -> "p"
string leftRev = Reverse(left); // "p" -> "p"
//Shave off one extra char from rightRev and leftRev
// When input = "message", this converts "meegassageem" to "megassagem"
// ("ee" to "e"). Only one char is shaved: left never changes, so looping
// here could empty leftRev (and throw) or break the palindrome.
if (left.Length > 0 && rightRev.Length > 0 &&
left[left.Length - 1] == rightRev[0])
{
rightRev = rightRev.Substring(1);
leftRev = leftRev.Substring(1);
}
//piece together the result
string result = left + rightRev + ch + insidePal + ch + right + leftRev;
//find the shortest result for inputs that have multiple repeated characters
if (shortestResult == null || result.Length < shortestResult.Length)
shortestResult = result;
}
return shortestResult;
}
else
{
//For inputs that have no repeated characters,
// just mirror the characters using the last character as the pivot.
for (int i = str.Length - 2; i >= 0; i--)
{
str += str[i];
}
return str;
}
}
Note that you need a Reverse function:
public static string Reverse(string str)
{
string result = "";
for (int i = str.Length - 1; i >= 0; i--)
{
result += str[i];
}
return result;
}
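C# recursive solution adding to the end of the string:
There are two base cases: length 1, and length 2 with both characters equal. Recursive case: if the extremes are equal,
make the inner string (without the extremes) a palindrome and return it wrapped in the extremes.
If the extremes are not equal, add the first character to the end, make the rest of the string
(including the previous last character) a palindrome, and return that.
Note that when the extremes already match, the new characters end up just inside the last character rather than after it (for example, "abca" becomes "abcba").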
public static string ConvertToPalindrome(string str) // By only adding characters at the end
{
if (str.Length <= 1) return str; // base case 1 (also covers the empty string)
if (str.Length == 2 && str[0] == str[1]) return str; // base case 2
else
{
if (str[0] == str[str.Length - 1]) // keep the extremes and call
return str[0] + ConvertToPalindrome(str.Substring(1, str.Length - 2)) + str[str.Length - 1];
else //Add the first character at the end and call
return str[0] + ConvertToPalindrome(str.Substring(1, str.Length - 1)) + str[0];
}
}
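For comparison, here is a minimal Python sketch that appends strictly after the last character. It assumes a simple O(n^2) scan is acceptable, and the function name is made up for illustration. It looks for the longest suffix that is already a palindrome and mirrors whatever comes before it.

```python
def append_to_palindrome(s):
    """Shortest palindrome obtainable by appending characters to the end of s."""
    for i in range(len(s) + 1):
        tail = s[i:]
        if tail == tail[::-1]:
            # s[i:] is already a palindrome, so mirror the prefix s[:i] after it
            return s + s[:i][::-1]
    return s  # not reached: the empty suffix is always a palindrome
```

With this version "abca" becomes "abcacba", so the original string is kept intact as a prefix of the result.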

Resources