finding if two words are anagrams of each other


I am looking for a method to find if two strings are anagrams of one another.
Ex: string1 - abcde
string2 - abced
Ans = true
Ex: string1 - abcde
string2 - abcfed
Ans = false
The solution I have come up with so far is to sort both strings and compare them character by character until the end of either string. It would be O(logn). I am looking for some other efficient method which doesn't change the two strings being compared.

Count the frequency of each character in the two strings. Check if the two histograms match. O(n) time, O(1) space (assuming ASCII) (Of course it is still O(1) space for Unicode but the table will become very large).
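A minimal sketch of the histogram idea in Python, assuming the inputs are ordinary strings. The function name `are_anagrams_histogram` is only illustrative; `collections.Counter` is used here as the frequency table.

```python
from collections import Counter


def are_anagrams_histogram(s1: str, s2: str) -> bool:
    """Return True if both strings hold the same characters with the same counts."""
    # Strings of different length can never be anagrams, so skip the counting.
    if len(s1) != len(s2):
        return False
    # Counter builds a character -> frequency table in one O(n) pass;
    # comparing the two tables checks that the histograms match.
    return Counter(s1) == Counter(s2)


print(are_anagrams_histogram("abcde", "abced"))   # True
print(are_anagrams_histogram("abcde", "abcfed"))  # False
```

Neither input string is modified, which was one of the requirements in the question.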

Get a table of prime numbers large enough to assign a distinct prime to every character. Start from 1 and walk through the string, multiplying the running number by the prime that represents the current character. The number you get depends only on the characters in the string, not on their order, and every distinct multiset of characters corresponds to a unique number, because any integer can be factored into primes in only one way. So you can just compare the two numbers to tell whether the strings are anagrams of each other.
Unfortunately, you have to use multiple-precision (arbitrary-precision) integer arithmetic to do this, or you will get overflow or rounding errors when using this method.
For this you may use libraries like BigInteger, GMP, MPIR or IntX.
Pseudocode:
prime[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101}
primehash(string)
    Y = 1;
    foreach character in string
        Y = Y * prime[character-'a']
    return Y
isanagram(str1, str2)
    return primehash(str1)==primehash(str2)
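The pseudocode above translates almost directly to Python, whose built-in integers are already arbitrary precision, so no extra library is needed. This is only a sketch: the names `PRIME_OF`, `prime_hash` and `is_anagram_primes` are illustrative, and it assumes the input consists solely of the lowercase letters a to z.

```python
import string

# The first 26 primes, one per lowercase letter (same table as the pseudocode).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = dict(zip(string.ascii_lowercase, PRIMES))


def prime_hash(word: str) -> int:
    """Multiply the primes of all letters; the result ignores letter order."""
    product = 1
    for ch in word:
        # Assumption: only a-z occurs; any other character raises KeyError.
        product *= PRIME_OF[ch]
    return product


def is_anagram_primes(a: str, b: str) -> bool:
    return prime_hash(a) == prime_hash(b)


print(is_anagram_primes("abcde", "abced"))   # True
print(is_anagram_primes("abcde", "abcfed"))  # False
```

The product grows quickly with the length of the word, so the multiplications are not constant-time for long inputs.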

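Create a hash map where the key is a letter and the value is the frequency of that letter.
For the first string, populate the hash map (O(n)).
For the second string, decrement the count of each letter and remove the entry from the hash map once its count reaches zero (O(n)). If a letter is not in the map, the strings are not anagrams.
If the hash map is empty at the end, the strings are anagrams; otherwise they are not.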
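These steps can be sketched in Python with a plain `dict` standing in for the hash map. The function name `are_anagrams_hashmap` is an illustrative choice, not something from the answer.

```python
def are_anagrams_hashmap(s1: str, s2: str) -> bool:
    counts = {}
    # Pass 1: populate the map with the letter frequencies of the first string.
    for ch in s1:
        counts[ch] = counts.get(ch, 0) + 1
    # Pass 2: consume letters of the second string.
    for ch in s2:
        if ch not in counts:
            # s2 has a letter (or one more copy of a letter) than s1 offers.
            return False
        counts[ch] -= 1
        if counts[ch] == 0:
            del counts[ch]  # remove exhausted letters so the map can end up empty
    # An empty map means every letter of s1 was matched exactly once.
    return not counts


print(are_anagrams_hashmap("abc", "cba"))   # True
print(are_anagrams_hashmap("abc", "cbaa"))  # False
print(are_anagrams_hashmap("abcc", "cba"))  # False
```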

The steps are:
Check the lengths of both words/strings; proceed to the anagram check only if they are equal, otherwise stop.
Sort both words/strings and then compare.
Java code for the same:
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package anagram;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
/**
*
* @author Sunshine
*/
public class Anagram {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
// TODO code application logic here
System.out.println("Enter the first string");
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s1 = br.readLine().toLowerCase();
System.out.println("Enter the Second string");
// Reuse the same reader: a second BufferedReader on System.in can miss input the first one already buffered
String s2 = br.readLine().toLowerCase();
char c1[] = null;
char c2[] = null;
if (s1.length() == s2.length()) {
c1 = s1.toCharArray();
c2 = s2.toCharArray();
Arrays.sort(c1);
Arrays.sort(c2);
if (Arrays.equals(c1, c2)) {
System.out.println("The strings are anagrams of each other");
} else {
System.out.println("Sorry, the strings entered are not anagrams");
}
} else {
System.out.println("Sorry, the strings are not anagrams (their lengths differ)");
}
}
}

C#
public static bool AreAnagrams(string s1, string s2)
{
if (s1 == null) throw new ArgumentNullException("s1");
if (s2 == null) throw new ArgumentNullException("s2");
var chars = new Dictionary<char, int>();
foreach (char c in s1)
{
if (!chars.ContainsKey(c))
chars[c] = 0;
chars[c]++;
}
foreach (char c in s2)
{
if (!chars.ContainsKey(c))
return false;
chars[c]--;
}
return chars.Values.All(i => i == 0);
}
Some tests:
[TestMethod]
public void TestAnagrams()
{
Assert.IsTrue(StringUtil.AreAnagrams("anagramm", "nagaramm"));
Assert.IsTrue(StringUtil.AreAnagrams("anzagramm", "nagarzamm"));
Assert.IsTrue(StringUtil.AreAnagrams("anz121agramm", "nag12arz1amm"));
Assert.IsFalse(StringUtil.AreAnagrams("anagram", "nagaramm"));
Assert.IsFalse(StringUtil.AreAnagrams("nzagramm", "nagarzamm"));
Assert.IsFalse(StringUtil.AreAnagrams("anzagramm", "nag12arz1amm"));
}

Code to find whether two words are anagrams:
The logic has already been explained in a few answers, and a few people asked for the code. This solution produces the result in O(n) time.
This approach counts the number of occurrences of each character in each string and stores it at the character's ASCII index. It then compares the two count arrays; if they are not equal, the given strings are not anagrams.
public boolean isAnagram(String str1, String str2)
{
//To get the no of occurrences of each character and store it in their ASCII location
int[] strCountArr1=getASCIICountArr(str1);
int[] strCountArr2=getASCIICountArr(str2);
//To test whether the two arrays have the same count of characters. Array size 256 covers the extended ASCII range (plain ASCII only needs 128)
for(int i=0;i<256;i++)
{
if(strCountArr1[i]!=strCountArr2[i])
return false;
}
return true;
}
public int[] getASCIICountArr(String str)
{
char c;
//Array size 256 for ASCII
int[] strCountArr=new int[256];
for(int i=0;i<str.length();i++)
{
c=str.charAt(i);
c=Character.toUpperCase(c);// If both the cases are considered to be the same
strCountArr[(int)c]++; //To increment the count in the character's ASCII location
}
return strCountArr;
}

Using an ASCII hash map that allows O(1) look-up for each char.
The Java example listed above converts to lower case, which seems incomplete. I have an example in C that simply initializes a hash-map array for ASCII values to -1.
If string2 differs in length from string1, they are not anagrams.
Otherwise, we set the hash-map value to 0 for each char that appears in string1 or string2.
Then, for each char in string1, we increment its count in the hash map. Similarly, we decrement the count for each char in string2.
If the strings are anagrams, every char ends up with a value of 0. If not, some nonzero value set by string1 remains.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRAYMAX 128
#define True 1
#define False 0
int isAnagram(const char *string1,
const char *string2) {
int str1len = strlen(string1);
int str2len = strlen(string2);
if (str1len != str2len) /* Simple string length test */
return False;
int * ascii_hashtbl = (int * ) malloc((sizeof(int) * ARRAYMAX));
if (ascii_hashtbl == NULL) {
fprintf(stderr, "Memory allocation failed\n");
return -1;
}
memset((void *)ascii_hashtbl, -1, sizeof(int) * ARRAYMAX);
int index = 0;
while (index < str1len) { /* Populate hash_table for each ASCII value
in string1*/
ascii_hashtbl[(int)string1[index]] = 0;
ascii_hashtbl[(int)string2[index]] = 0;
index++;
}
index = index - 1;
while (index >= 0) {
ascii_hashtbl[(int)string1[index]]++; /* Increment something */
ascii_hashtbl[(int)string2[index]]--; /* Decrement something */
index--;
}
/* Use hash_table to compare string2 */
index = 0;
while (index < str1len) {
if (ascii_hashtbl[(int)string1[index]] != 0) {
/* some char is missing in string2 from string1 */
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return False;
}
index++;
}
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return True;
}
int main () {
char array1[ARRAYMAX], array2[ARRAYMAX];
int flag;
printf("Enter the string\n");
fgets(array1, ARRAYMAX, stdin);
printf("Enter another string\n");
fgets(array2, ARRAYMAX, stdin);
array1[strcspn(array1, "\r\n")] = 0;
array2[strcspn(array2, "\r\n")] = 0;
flag = isAnagram(array1, array2);
if (flag == 1)
printf("%s and %s are anagrams.\n", array1, array2);
else if (flag == 0)
printf("%s and %s are not anagrams.\n", array1, array2);
return 0;
}

let's take a question: Given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
Method 1 (using HashMap):
import java.util.HashMap;
import java.util.Map;
public class Method1 {
    public static void main(String[] args) {
        String a = "protijayi";
        String b = "jayiproti";
        System.out.println(isAnagram(a, b)); // output => true
    }
    private static boolean isAnagram(String a, String b) {
        // Without this check a longer first string ("aab" vs "ab") would wrongly pass
        if (a.length() != b.length()) { return false; }
        Map<Character, Integer> map = new HashMap<>();
        for (char c : a.toCharArray()) {
            map.put(c, map.getOrDefault(c, 0) + 1);
        }
        for (char c : b.toCharArray()) {
            int count = map.getOrDefault(c, 0);
            if (count == 0) { return false; }
            map.put(c, count - 1);
        }
        return true;
    }
}
Method 2 :
public class Method2 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b));// output=> true
}
private static boolean isAnagram(String a, String b) {
int[] alphabet = new int[26];
for(int i = 0 ; i < a.length() ;i++) {
alphabet[a.charAt(i) - 'a']++ ;
}
for (int i = 0; i < b.length(); i++) {
alphabet[b.charAt(i) - 'a']-- ;
}
for( int w : alphabet ) {
if(w != 0 ) {return false;}
}
return true;
}
}
Method 3 :
public class Method3 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
private static boolean isAnagram(String a, String b) {
char[] ca = a.toCharArray() ;
char[] cb = b.toCharArray();
Arrays.sort( ca );
Arrays.sort( cb );
return Arrays.equals(ca , cb );
}
}
Method 4 :
public class AnagramsOrNot {
public static void main(String[] args) {
String a = "Protijayi";
String b = "jayiProti";
isAnagram(a, b);
}
private static void isAnagram(String a, String b) {
Map<Integer, Integer> map = new LinkedHashMap<>();
a.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) + 1));
System.out.println(map);
b.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) - 1));
System.out.println(map);
if (map.values().stream().allMatch(count -> count == 0)) { // every count must be zero, not just one of them
System.out.println("Anagrams");
} else {
System.out.println("Not Anagrams");
}
}
}
In Python:
def areAnagram(a, b):
    if len(a) != len(b): return False
    count1 = [0] * 256
    count2 = [0] * 256
    for i in a: count1[ord(i)] += 1
    for i in b: count2[ord(i)] += 1
    for i in range(256):
        if count1[i] != count2[i]: return False
    return True
str1 = "Giniiii"
str2 = "Protijayi"
print(areAnagram(str1, str2))
Let's take another famous Interview Question: Group the Anagrams from a given String:
public class GroupAnagrams {
public static void main(String[] args) {
String a = "Gini Gina Protijayi iGin aGin jayiProti Soudipta";
Map<String, List<String>> map = Arrays.stream(a.split(" ")).collect(Collectors.groupingBy(GroupAnagrams::sortedString));
System.out.println("MAP => " + map);
map.forEach((k,v) -> System.out.println(k +" and the anagrams are =>" + v ));
/*
Look at the Map output:
MAP => {Giin=[Gini, iGin], Paiijorty=[Protijayi, jayiProti], Sadioptu=[Soudipta], Gain=[Gina, aGin]}
As we can see, there are multiple Lists. Hence, we have to use a flatMap(List::stream)
Now, Look at the output:
Paiijorty and the anagrams are =>[Protijayi, jayiProti]
Now, look at this output:
Sadioptu and the anagrams are =>[Soudipta]
List contains only word. No anagrams.
That means we have to work with map.values(). List contains all the anagrams.
*/
String stringFromMapHavingListofLists = map.values().stream().flatMap(List::stream).collect(Collectors.joining(" "));
System.out.println(stringFromMapHavingListofLists);
}
public static String sortedString(String a) {
String sortedString = a.chars().sorted()
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();
return sortedString;
}
/*
* The output : Gini iGin Protijayi jayiProti Soudipta Gina aGin
* All the anagrams are side by side.
*/
}
Grouping anagrams in Python is again easy. We have to:
Sort the letters of each word. Then create a dictionary keyed by the sorted word. The values of the dictionary are the indices of the words that share that key, i.e. the indices of the anagrams.
def groupAnagrams(words):
    # sort the letters of each word in the list
    A = [''.join(sorted(word)) for word in words]
    groups = {}
    for indexofsamewords, names in enumerate(A):
        groups.setdefault(names, []).append(indexofsamewords)
    print(groups)
    #{'AOOPR': [0, 2, 5, 11, 13], 'ABTU': [1, 3, 4], 'Sorry': [6], 'adnopr': [7], 'Sadioptu': [8, 16], ' KPaaehiklry': [9], 'Taeggllnouy': [10], 'Leov': [12], 'Paiijorty': [14, 18], 'Paaaikpr': [15], 'Saaaabhmryz': [17], ' CNaachlortttu': [19], 'Saaaaborvz': [20]}
    for index in groups.values():
        print([words[i] for i in index])
if __name__ == '__main__':
    # list of words
    words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP", "Protijayi","Paikpara","dipSouta","Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
Another important anagram question: find the anagram occurring the maximum number of times.
In the example, ROOPA is the word whose anagrams occur the maximum number of times.
Hence, ['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP'] will be the final output.
from statistics import mode
import numpy as np
# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
"Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
"Protijayi","Paikpara","dipSouta","Shyambazaar",
"jayiProti", "North Calcutta", "Sovabazaar"]
print(".....Method 1....... ")
sortedwords = [''.join(sorted(word)) for word in words]
print(sortedwords)
print("...........")
LongestAnagram = np.array(words)[np.array(sortedwords) == mode(sortedwords)]
# Longest anagram
print("Longest anagram by Method 1:")
print(LongestAnagram)
print(".....................................................")
print(".....Method 2....... ")
A = [''.join(sorted(word)) for word in words]
groups = {}
for indexofsamewords, samewords in enumerate(A):
    # store the original word, not the sorted key, so the anagrams themselves are printed
    groups.setdefault(samewords, []).append(words[indexofsamewords])
aa = max(groups.items(), key=lambda x: len(x[1]))
print("aa =>", aa)
word, anagrams = aa
print("Longest anagram by Method 2:")
print(" ".join(anagrams))
The Output :
.....Method 1.......
['AOOPR', 'ABTU', 'AOOPR', 'ABTU', 'ABTU', 'AOOPR', 'Sadioptu', ' KPaaehiklry', 'Taeggllnouy', 'AOOPR', 'Leov', 'AOOPR', 'Paiijorty', 'Paaaikpr', 'Sadioptu', 'Saaaabhmryz', 'Paiijorty', ' CNaachlortttu', 'Saaaaborvz']
...........
Longest anagram by Method 1:
['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP']
.....................................................
.....Method 2.......
aa => ('AOOPR', ['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP'])
Longest anagram by Method 2:
ROOPA OOPAR PAROO AROOP AOORP

Well, you can probably improve the best case and average case substantially just by checking the length first, then a quick checksum on the characters (nothing complex, as that would probably be of worse order than the sort; just a summation of ordinal values), then sort, then compare.
If the strings are very short, the cost of the checksum will not be greatly different from that of the sort in many languages.
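A rough Python sketch of that ordering of checks, cheapest first. The function name is made up for illustration, and `sorted()` returns new lists, so the original strings are left untouched.

```python
def are_anagrams_prefiltered(s1: str, s2: str) -> bool:
    # 1. Length check: O(1) in Python and rejects most non-anagrams.
    if len(s1) != len(s2):
        return False
    # 2. Cheap checksum: the sum of ordinal values, O(n).
    #    Different sums prove the strings are not anagrams.
    if sum(map(ord, s1)) != sum(map(ord, s2)):
        return False
    # 3. Only now pay for the O(n log n) sort; the checksum alone is not
    #    conclusive because different strings can share a sum.
    return sorted(s1) == sorted(s2)


print(are_anagrams_prefiltered("listen", "silent"))  # True
print(are_anagrams_prefiltered("ad", "bc"))          # False (same sum, caught by the sort)
```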

How about this?
a = "lai d"
b = "di al"
sorteda = []
sortedb = []
# collect the non-space characters of each string
for i in a:
    if i != " ":
        sorteda.append(i)
for x in b:
    if x != " ":
        sortedb.append(x)
# sort case-insensitively, then compare
sorteda.sort(key=str.lower)
sortedb.sort(key=str.lower)
print(sortedb)
print(sorteda)
print(sortedb == sorteda)

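How about XOR-ing both strings? This is definitely O(n), but note that a zero result is only a necessary condition: "aa" and "bb" also XOR to 0, so this can reject non-anagrams quickly but cannot confirm an anagram on its own.
const char *arr1 = "ab cde";
int n1 = strlen(arr1);
const char *arr2 = "edcb a";
int n2 = strlen(arr2);
// to check for anagram
int c = 0;
int i;
if (n1 != n2)
    printf("\nNot anagram");
else {
    for (i = 0; i < n1; i++)
        c ^= arr1[i] ^ arr2[i];
    if (c == 0)
        printf("\nAnagram"); /* more precisely: not ruled out, see the note above */
    else
        printf("\nNot anagram");
}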

static bool IsAnagram(string s1, string s2)
{
    if (s1.Length != s2.Length)
        return false;
    // Caution: a zero sum of differences is necessary but not sufficient;
    // "ad" and "bc" also pass this test without being anagrams.
    int sum1 = 0;
    for (int i = 0; i < s1.Length; i++)
        sum1 += (int)s1[i] - (int)s2[i];
    return sum1 == 0;
}

For known (and small) sets of valid letters (e.g. ASCII) use a table with counts associated with each valid letter. First string increments counts, second string decrements counts. Finally iterate through the table to see if all counts are zero (strings are anagrams) or there are non-zero values (strings are not anagrams). Make sure to convert all characters to uppercase (or lowercase, all the same) and to ignore white space.
For a large set of valid letters, such as Unicode, do not use a table but rather a hash table. It has O(1) time to add, query and remove, and O(n) space. Letters from the first string increment the count, letters from the second string decrement it. A count that becomes zero is removed from the hash table. The strings are anagrams if the hash table is empty at the end. Alternatively, the search terminates with a negative result as soon as any count becomes negative.
Here is the detailed explanation and implementation in C#: Testing If Two Strings are Anagrams

If the strings contain only ASCII characters:
Create an array of length 256.
Traverse the first string and increment the counter in the array at index = ASCII value of the character. Also keep counting characters, so that you know the length when you reach the end of the string.
Traverse the second string and decrement the counter in the array at index = ASCII value of the character. If the value is ever 0 before decrementing, return false, since the strings are not anagrams. Also keep track of the length of this second string.
At the end of the traversal, if the lengths of the two strings are equal, return true; otherwise return false.
If the strings can contain Unicode characters, use a hash map instead of an array to keep track of the frequencies. The rest of the algorithm remains the same.
Notes:
Calculating the length while adding characters to the array ensures that we traverse each string only once.
Using an array for an ASCII-only string optimizes space based on the requirement.
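The steps above can be sketched in Python as follows. The function name is illustrative, and the lengths are counted by hand only to mirror the description (in Python `len()` is already O(1)). The sketch assumes every character has a code below 256; anything else raises `IndexError`.

```python
def are_anagrams_ascii(s1: str, s2: str) -> bool:
    counts = [0] * 256  # one counter per possible character code
    len1 = 0
    # First traversal: count each character and measure the length.
    for ch in s1:
        counts[ord(ch)] += 1  # assumption: ord(ch) < 256
        len1 += 1
    len2 = 0
    # Second traversal: consume characters, failing as soon as one is missing.
    for ch in s2:
        idx = ord(ch)
        if counts[idx] == 0:
            return False  # s2 needs a character that s1 cannot supply
        counts[idx] -= 1
        len2 += 1
    # Every character of s2 was matched; equal lengths rule out leftovers in s1.
    return len1 == len2


print(are_anagrams_ascii("abc", "cab"))  # True
print(are_anagrams_ascii("abc", "ab"))   # False
```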

I guess your sorting algorithm is not really O(log n), is it?
The best you can get is O(n) for your algorithm, because you have to check every character.
You might use two tables to store the counts of each letter in every word, fill it with O(n) and compare it with O(1).

It seems that the following implementation works too, can you check?
int histo[256] = {0};
size_t len = strlen(str1);
if (len != strlen(str2))
    return 0; /* different lengths: not an anagram */
for (size_t i = 0; i < len; ++i) {
    /* Just inc and dec every char count and
     * check the histogram against 0 in the 2nd loop */
    ++histo[(unsigned char)str1[i]];
    --histo[(unsigned char)str2[i]];
}
for (int i = 0; i < 256; ++i) {
    if (histo[i] != 0)
        return 0; /* not an anagram */
}
return 1; /* an anagram */

/* Program to find the strings are anagram or not*/
/* Author Senthilkumar M*/
Eg.
Anagram:
str1 = stackoverflow
str2 = overflowstack
Not anagram:
str1 = stackforflow
str2 = stacknotflow
int is_anagram(char *str1, char *str2)
{
int l1 = strlen(str1);
int l2 = strlen(str2);
int s1 = 0, s2 = 0;
int i = 0;
/* if both the string are not equal it is not anagram*/
if(l1 != l2) {
return 0;
}
/* sum up the character in the strings
if the total sum of the two strings is not equal
it is not anagram */
for( i = 0; i < l1; i++) {
s1 += str1[i];
s2 += str2[i];
}
if(s1 != s2) {
return 0;
}
return 1;
}

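If both strings are of equal length, proceed; if not, the strings are not anagrams.
Iterate over each string while summing the ordinals of each character. If the sums differ, the strings are not anagrams. Be aware that equal sums do not prove the opposite: "ad" and "bc" have the same length and the same sum but are not anagrams, so treat this as a quick filter rather than a complete test.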
Example:
public Boolean AreAnagrams(String inOne, String inTwo) {
bool result = false;
if(inOne.Length == inTwo.Length) {
int sumOne = 0;
int sumTwo = 0;
for(int i = 0; i < inOne.Length; i++) {
sumOne += (int)inOne[i];
sumTwo += (int)inTwo[i];
}
result = sumOne == sumTwo;
}
return result;
}
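A short Python check makes the limits of the sum test concrete. The helper name `same_ordinal_sum` is made up for this sketch; it performs the same length-plus-sum comparison as the code above.

```python
def same_ordinal_sum(a: str, b: str) -> bool:
    """Equal length and equal sum of character codes (the test used above)."""
    return len(a) == len(b) and sum(map(ord, a)) == sum(map(ord, b))


# 'a' + 'd' = 97 + 100 = 197 and 'b' + 'c' = 98 + 99 = 197
print(same_ordinal_sum("ad", "bc"))   # True, although these are not anagrams
print(sorted("ad") == sorted("bc"))   # False: a sort-based comparison tells them apart
```

So the sum can serve as a cheap first filter, but a positive answer still needs to be confirmed by counting or sorting.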

implementation in Swift 3:
func areAnagrams(_ str1: String, _ str2: String) -> Bool {
return dictionaryMap(forString: str1) == dictionaryMap(forString: str2)
}
func dictionaryMap(forString str: String) -> [String : Int] {
var dict : [String : Int] = [:]
for var i in 0..<str.characters.count {
if let count = dict[str[i]] {
dict[str[i]] = count + 1
}else {
dict[str[i]] = 1
}
}
return dict
}
//To easily subscript characters
extension String {
subscript(i: Int) -> String {
return String(self[index(startIndex, offsetBy: i)])
}
}

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
/**
* --------------------------------------------------------------------------
* Finding Anagrams in the given dictionary. Anagrams are words that can be
* formed from other words Ex :The word "words" can be formed using the word
* "sword"
* --------------------------------------------------------------------------
* Input : if choose option 2 first enter no of word want to compare second
* enter word ex:
*
* Enter choice : 1:To use Test Cases 2: To give input 2 Enter the number of
* words in dictionary
* 6
* viq
* khan
* zee
* khan
* am
*
* Dictionary : [ viq khan zee khan am]
* Anagrams 1:[khan, khan]
*
*/
public class Anagrams {
public static void main(String args[]) {
// User Input or just use the testCases
int choice;
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter choice : \n1:To use Test Cases 2: To give input");
choice = scan.nextInt();
switch (choice) {
case 1:
testCaseRunner();
break;
case 2:
userInput();
default:
break;
}
}
private static void userInput() {
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter the number of words in dictionary");
int number = scan.nextInt();
scan.nextLine(); // consume the rest of the line holding the number, otherwise the first word is read as empty
String dictionary[] = new String[number];
//
for (int i = 0; i < number; i++) {
dictionary[i] = scan.nextLine();
}
printAnagramsIn(dictionary);
}
/**
* provides a some number of dictionary of words
*/
private static void testCaseRunner() {
String dictionary[][] = { { "abc", "cde", "asfs", "cba", "edcs", "name" },
{ "name", "mane", "string", "trings", "embe" } };
for (int i = 0; i < dictionary.length; i++) {
printAnagramsIn(dictionary[i]);
}
}
/**
* Prints the set of anagrams found the give dictionary
*
* logic is sorting the characters in the given word and hashing them to the
* word. Data Structure: Hash[sortedChars] = word
*/
private static void printAnagramsIn(String[] dictionary) {
System.out.print("Dictionary : [");// + dictionary);
for (String each : dictionary) {
System.out.print(each + " ");
}
System.out.println("]");
//
Map<String, ArrayList<String>> map = new LinkedHashMap<String, ArrayList<String>>();
// review comment: naming convention: dictionary contains 'word' not
// 'each'
for (String each : dictionary) {
char[] sortedWord = each.toCharArray();
// sort dic value
Arrays.sort(sortedWord);
//input word
String sortedString = new String(sortedWord);
//
ArrayList<String> list = new ArrayList<String>();
if (map.keySet().contains(sortedString)) {
list = map.get(sortedString);
}
list.add(each);
map.put(sortedString, list);
}
// print anagram
int i = 1;
for (String each : map.keySet()) {
if (map.get(each).size() != 1) {
System.out.println("Anagrams " + i + ":" + map.get(each));
i++;
}
}
}
}

I just had an interview and 'SolutionA' was basically my solution.
Seems to hold.
It might also work to sum all characters, or the hashCodes of each character, but it would still be at least O(n).
/**
* Using HashMap
*
* O(a + b + b + b) = O(a + 3*b) = O( 4n ) if a and b are equal. Meaning O(n) in total.
*/
public static final class SolutionA {
//
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
HashMap<Character, Integer> aa = toHistogram(a);
HashMap<Character, Integer> bb = toHistogram(b);
return isHistogramsEqual(aa, bb);
}
private static HashMap<Character, Integer> toHistogram(String characters) {
HashMap<Character, Integer> histogram = new HashMap<>();
int i = -1; while ( ++i < characters.length() ) {
histogram.compute(characters.charAt(i), (k, v) -> {
if ( v == null ) v = 0;
return v+1;
});
}
return histogram;
}
private static boolean isHistogramsEqual(HashMap<Character, Integer> a, HashMap<Character, Integer> b) {
for ( Map.Entry<Character, Integer> entry : b.entrySet() ) {
Integer aa = a.get(entry.getKey());
Integer bb = entry.getValue();
if ( !Objects.equals(aa, bb) ) {
return false;
}
}
return true;
}
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
}
I've provided a hashCode() based implementation as well. Seems to hold as well.
/**
* Using hashCode()
*
* O(a + b) minimum + character.hashCode() calculation, the latter might be cheap though. Native implementation.
*
* Risk for collision albeit small.
*/
public static final class SolutionB {
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
return toHashcode(a) == toHashcode(b);
}
private static long toHashcode(String str) {
long sum = 0; int i = -1; while ( ++i < str.length() ) {
sum += Objects.hashCode( str.charAt(i) );
}
return sum;
}
}

In Java we can also do it like this; the logic is very simple.
import java.util.*;
class Anagram
{
    public static void main(String args[]) throws Exception
    {
        boolean flag = true;
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter 1st string");
        String s1 = sc.nextLine();
        System.out.println("Enter 2nd string");
        String s2 = sc.nextLine();
        int n = s1.length();
        if (n == s2.length())
        {
            // used[l] marks characters of s2 that are already matched,
            // so repeated letters are not counted twice
            boolean[] used = new boolean[n];
            for (int k = 0; k < n && flag; k++)
            {
                boolean found = false;
                for (int l = 0; l < n; l++)
                {
                    if (!used[l] && s1.charAt(k) == s2.charAt(l))
                    {
                        used[l] = true;
                        found = true;
                        break;
                    }
                }
                flag = found; // stop as soon as one character has no partner
            }
        }
        else
            flag = false;
        if (flag)
            System.out.println("Given Strings are anagrams");
        else
            System.out.println("Given Strings are not anagrams");
    }
}

How about converting each character to its int value and summing them up:
If the sums are equal, the strings may be anagrams of each other.
def are_anagram1(s1, s2):
    return sum(ord(x) for x in s1) == sum(ord(x) for x in s2)
s1 = 'james'
s2 = 'amesj'
print(are_anagram1(s1, s2))
This solution works only for 'A' to 'Z' and 'a' to 'z', and different strings with the same sum (e.g. 'ad' and 'bc') are also reported as anagrams.

Related

how to find the most letter(s) with the same frequency

I just started using java, I'm trying to create a nested for-loop (without using arrays) that gives me how many letters (from alphabet) have a frequency of zero in a string. So if my string is "test", then it should display "23 letters" as an answer because only 3 out of 26 letters are in the string. However, my program is missing information. I'm trying to make sure my program can target the specific frequency I'm looking for ie. 0.
Here is my program so far:
public class FindMaxandMinofString {
public static void main(String[] args) {
char charToLookFor;
String s = "test";
int count = 0;
for (charToLookFor = 'a'; charToLookFor = 'z' ;charToLookFor++)
{
for(int l = 0; l < s.length(); l++) {
if(s.charAt(l) == charToLookFor)
count++;
}
System.out.print(count);
}

Instead of counting up from 0, start from a count of 26 and subtract from it whenever you find a new letter. It is important to break from the inner loop when you find one, otherwise you may count a letter more than once.
public class FindMaxandMinofString {
public static void main(String[] args) {
char charToLookFor;
String s = "test";
int count = 26;
for (charToLookFor = 'a'; charToLookFor <= 'z' ;charToLookFor++)
{
for(int l = 0; l < s.length(); l++)
{
if(s.charAt(l) == charToLookFor)
{
count--;
break;
}
}
}
System.out.print(count + " letters");
}
}

You can accomplish this task by using a hash set - when you find a character, add it to the hash set. Your answer will be 26 minus the size of the final hash set after you're done iterating through the entire string.
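A small sketch of the set idea, written in Python for brevity (in Java the same role would be played by `java.util.HashSet`). The function name `count_missing_letters` is illustrative, and the sketch assumes that only the 26 letters a to z are of interest and that case is ignored.

```python
import string


def count_missing_letters(s: str) -> int:
    """Number of letters a-z that never occur in s."""
    seen = set()
    for ch in s.lower():
        # Ignore digits, spaces and anything else outside a-z.
        if ch in string.ascii_lowercase:
            seen.add(ch)  # adding a letter twice has no effect on a set
    return 26 - len(seen)


print(count_missing_letters("test"), "letters")  # 23 letters
```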

Efficient tuple search algorithm

Given a store of 3-tuples where:
All elements are numeric ex :( 1, 3, 4) (1300, 3, 15) (1300, 3, 15) …
Tuples are removed and added frequently
At any time the store is typically under 100,000 elements
All Tuples are available in memory
The application is interactive requiring 100s of searches per second.
What are the most efficient algorithms/data structures to perform wild card (*) searches such as:
(1, *, 6) (3601, *, *) (*, 1935, *)
The aim is to have a Linda like tuple space but on an application level

Well, there are only 8 possible arrangements of wildcards, so you can easily construct 6 multi-maps and a set to serve as indices: one for each arrangement of wildcards in the query. You don't need an 8th index because the query (*,*,*) trivially returns all tuples. The set is for tuples with no wildcards; only a membership test is needed in this case.
A multimap takes a key to a set. In your example, e.g., the query (1,*,6) would consult the multimap for queries of the form (X,*,Y), which takes key <X,Y> to the set of all tuples with X in the first position and Y in third. In this case, X=1 and Y=6.
With any reasonable hash-based multimap implementation, lookups ought to be very fast. Several hundred a second ought to be easy, and several thousand per second doable (with e.g a contemporary x86 CPU).
Insertions and deletions require updating the maps and set. Again this ought to be reasonably fast, though not as fast as lookups of course. Again several hundred per second ought to be doable.
With only ~10^5 tuples, this approach ought to be fine for memory as well. You can save a bit of space with tricks, e.g. keeping a single copy of each tuple in an array and storing indices in the map/set to represent both key and value. Manage array slots with a free list.
To make this concrete, here is pseudocode. I'm going to use angle brackets <a,b,c> for tuples to avoid too many parens:
# Definitions
For a query Q <k2,k1,k0> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <2,3>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple T into the database:
fun insert(t)
    for i = 0 to 7
        add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple T from the database:
fun delete(t)
    for i = 0 to 7
        delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
    let i = I(Q)
    return M(i).lookup(T(Q, i)) # lookup failure returns empty set
Note that for simplicity, I've not shown the "optimizations" for M(0) and M(7). For M(0), the algorithm above would create a multimap taking the empty tuple to the set of all 3-tuples in the database. You can avoid this merely by treating i=0 as a special case. Similarly M(7) would take each tuple to a set containing only itself.
An "optimized" version:
fun insert(t)
    for i = 1 to 6
        add the entry T(t,i)->t to M(i)
    add t to set S
fun delete(t)
    for i = 1 to 6
        delete the entry T(t,i)->t from M(i)
    remove t from set S
fun query(Q)
    let i = I(Q)
    if i = 0, return S
    elsif i = 7 return if Q\in S { Q } else {}
    else return M(i).lookup(T(Q, i))
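For readers who prefer something executable, here is a compact Python sketch of the optimized variant. It is not the answer's own code: the class name `TupleStore`, the use of `None` as the wildcard, and the convention that bit k of the pattern code corresponds to tuple position k are all assumptions of this sketch.

```python
from collections import defaultdict

STAR = None  # wildcard marker (an assumption of this sketch)


class TupleStore:
    def __init__(self):
        # One multimap per wildcard pattern 1..6; a set bit k means position k is fixed.
        self.maps = {code: defaultdict(set) for code in range(1, 7)}
        self.all = set()  # the set S: serves patterns 0 (all stars) and 7 (no stars)

    @staticmethod
    def _project(t, code):
        # Keep only the elements of t at positions whose bit is set in code.
        return tuple(t[k] for k in range(3) if code & (1 << k))

    @staticmethod
    def _code(q):
        # Pattern code of a query: bit k is set when position k is not a wildcard.
        return sum(1 << k for k in range(3) if q[k] is not STAR)

    def insert(self, t):
        self.all.add(t)
        for code, index in self.maps.items():
            index[self._project(t, code)].add(t)

    def delete(self, t):
        if t not in self.all:
            return
        self.all.discard(t)
        for code, index in self.maps.items():
            key = self._project(t, code)
            bucket = index.get(key)  # .get() does not create missing buckets
            if bucket is not None:
                bucket.discard(t)
                if not bucket:
                    del index[key]  # drop empty buckets to save memory

    def query(self, q):
        code = self._code(q)
        if code == 0:
            return set(self.all)
        if code == 7:
            return {q} if q in self.all else set()
        return set(self.maps[code].get(self._project(q, code), ()))


store = TupleStore()
for t in [(1, 3, 4), (1, 5, 6), (1300, 3, 15)]:
    store.insert(t)
print(store.query((1, STAR, 6)))     # {(1, 5, 6)}
print(store.query((STAR, 3, STAR)))  # the two tuples with 3 in the middle
```

Each query is a single hash lookup; the price is that every insert and delete touches all six indices.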
Addition
For fun, a Java implementation:
package hacking;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
public class Hacking {
public static void main(String [] args) {
TupleDatabase db = new TupleDatabase();
int n = 200000;
long start = System.nanoTime();
for (int i = 0; i < n; ++i) {
db.insert(db.randomTriple());
}
long stop = System.nanoTime();
double elapsedSec = (stop - start) * 1e-9;
System.out.println("Inserted " + n + " tuples in " + elapsedSec
+ " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
Scanner in = new Scanner(System.in);
for (;;) {
System.out.print("Query: ");
int a = in.nextInt();
int b = in.nextInt();
int c = in.nextInt();
System.out.println(db.query(new Tuple(a, b, c)));
}
}
}
class Tuple {
static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
static final int STAR = -1;
final int [] vals;
Tuple(int a, int b, int c) {
vals = new int[] { a, b, c };
}
Tuple(Tuple t, int code) {
vals = new int[N_ONES[code]];
int m = 0;
for (int k = 0; k < 3; ++k) {
if (((1 << k) & code) > 0) {
vals[m++] = t.vals[k];
}
}
}
@Override
public boolean equals(Object other) {
if (other instanceof Tuple) {
Tuple triple = (Tuple) other;
return Arrays.equals(this.vals, triple.vals);
}
return false;
}
@Override
public int hashCode() {
return Arrays.hashCode(this.vals);
}
@Override
public String toString() {
return Arrays.toString(vals);
}
int code() {
int c = 0;
for (int k = 0; k < 3; k++) {
if (vals[k] != STAR) {
c |= (1 << k);
}
}
return c;
}
Set<Tuple> setOf() {
Set<Tuple> s = new HashSet<>();
s.add(this);
return s;
}
}
class Multimap extends HashMap<Tuple, Set<Tuple>> {
@Override
public Set<Tuple> get(Object key) {
Set<Tuple> r = super.get(key);
return r == null ? Collections.<Tuple>emptySet() : r;
}
void put(Tuple key, Tuple value) {
if (containsKey(key)) {
super.get(key).add(value);
} else {
super.put(key, value.setOf());
}
}
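void remove(Tuple key, Tuple value) {
Set<Tuple> set = super.get(key);
if (set == null) {
return; // nothing stored under this key, e.g. deleting a tuple that was never inserted
}
set.remove(value);
if (set.isEmpty()) {
super.remove(key);
}
}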
}
class TupleDatabase {
final Set<Tuple> set;
final Multimap [] maps;
TupleDatabase() {
set = new HashSet<>();
maps = new Multimap[7];
for (int i = 1; i < 7; i++) {
maps[i] = new Multimap();
}
}
void insert(Tuple t) {
set.add(t);
for (int i = 1; i < 7; i++) {
maps[i].put(new Tuple(t, i), t);
}
}
void delete(Tuple t) {
set.remove(t);
for (int i = 1; i < 7; i++) {
maps[i].remove(new Tuple(t, i), t);
}
}
Set<Tuple> query(Tuple q) {
int c = q.code();
switch (c) {
case 0: return set;
case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
default: return maps[c].get(new Tuple(q, c));
}
}
Random gen = new Random();
int randPositive() {
return gen.nextInt(1000);
}
Tuple randomTriple() {
return new Tuple(randPositive(), randPositive(), randPositive());
}
}
Some output:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...

If you think of the tuples like an IP address, then a radix tree (trie) type structure might work. Radix trees are used for IP routing lookups.
Another way may be to use bit operations: calculate a bit hash for each tuple and, in your search, apply bitwise OR/AND for quick lookup.

Getting all combination of array elements that form a given string

I am stuck on this interview question.
Given a word S and an array of strings A, how can I find all possible combinations of elements of A that form S?
Example:
S = "hotday"
A = ["o","ho","h","tday"]
The possible combinations are: ("h"+"o"+"tday") and ("ho"+"tday").
thanks

You can use backtracking. Here is some pseudocode:
def generateSolutions(unusedWords, usedWords, string, position):
    if position == string.length():
        print(usedWords)
    else:
        for word in unusedWords:
            if word is a prefix of string[position ... string.length() - 1]:
                generateSolutions(unusedWords - word, usedWords + word,
                                  string, position + word.length())
generateSolutions(words, an empty list, input string, 0)
The idea is very simple: we pick an unused word that matches a prefix of the rest of the input string and keep generating all valid combinations recursively (I assume that each word from the given list can be used only once). This solution has exponential time complexity, but it is not possible to do much better in the worst case. For instance, if the given string is abcdef...yz and the list of words is [a, b, c, ..., z, ab, cd, ..., yz], the number of such combinations is 2^(n/2), where n is the length of the given string.
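A runnable take on the pseudocode above, as a minimal Python sketch. It assumes every word is non-empty and may be used at most once; the names generate_solutions and backtrack are illustrative and not part of the original answer.

```python
def generate_solutions(words, target):
    """Collect every ordered pick of distinct entries of `words` that spells `target`."""
    results = []
    used = [False] * len(words)

    def backtrack(position, chosen):
        if position == len(target):
            results.append(list(chosen))
            return
        for i, word in enumerate(words):
            # Skip words already consumed and words that do not match at this position.
            if used[i] or not target.startswith(word, position):
                continue
            used[i] = True
            chosen.append(word)
            backtrack(position + len(word), chosen)
            # Undo the choice so the next candidate starts from the same state.
            chosen.pop()
            used[i] = False

    backtrack(0, [])
    return results
```

Calling generate_solutions(["o", "ho", "h", "tday"], "hotday") returns the two combinations from the question. Marking words as used in place, rather than copying the lists on every call, keeps each recursive step cheap.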

You could iterate through all permutations of A and see which ones fit. Python sample implementation:
import itertools
S = "hotday"
A = ["o","ho","h","tday"]
for count in range(1, len(A) + 1):
    for pieces in itertools.permutations(A, count):
        if "".join(pieces) == S:
            print(pieces)
Result:
('ho', 'tday')
('h', 'o', 'tday')
Yes, this is O(N!), but that's fine for the small A you've provided.

This is my Java solution; it implements the pseudocode from "ILoveCoding":
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
public class PossibleCombination {
public static void printPossibleCombinations(String[] tab, String S)
{
ArrayList<String> used = new ArrayList<String>();
ArrayList<String> notused = new ArrayList<String>(Arrays.asList(tab));
printPossibleCombinations(used, notused, S,0);
}
private static void printPossibleCombinations(ArrayList<String> used,
ArrayList<String> notused, String s,int pos) {
if (pos == s.length())
{ System.out.println("Possible combination : ");
for(String e : used)
{
System.out.print(e + " - ");
}
System.out.println();
return; // the whole string is covered, nothing is left to match
}
HashSet<String> prefixes = getPossiblePrefixes(s,pos);
for(String e : notused)
{
if (prefixes.contains(e))
{
ArrayList<String> isused = new ArrayList<String>(used);
isused.add(e);
ArrayList<String> isnotused = new ArrayList<String>(notused);
isnotused.remove(e);
printPossibleCombinations(isused, isnotused,s, pos + e.length());
}
}
}
private static HashSet<String> getPossiblePrefixes(String s, int pos) {
HashSet<String> prefixes = new HashSet<String>();
for(int i = pos ; i<= s.length() ; i++)
{
prefixes.add(s.substring(pos,i));
}
return prefixes;
}
public static void main(String[] args) {
String[] tab = {"o","ho","h","tday"};
String S = "hotday";
printPossibleCombinations(tab, S);
}
}

Given a string, find its first non-repeating character in only One scan

Given a string, find the first non-repeating character in it. For
example, if the input string is “GeeksforGeeks”, then output should be
‘f’.
We can use string characters as index and build a count array.
Following is the algorithm.
Scan the string from left to right and construct the count array or
HashMap.
Again, scan the string from left to right and check the
count of each character; if you find an element whose count is 1,
return it.
The above problem and algorithm are from GeeksForGeeks.
But it requires two scans of the array. I want to find the first non-repeating character in only one scan.
I implemented the above algorithm; please check it on Ideone as well:
import java.util.HashMap;
import java.util.Scanner;
/**
*
* @author Neelabh
*/
public class FirstNonRepeatedCharacter {
public static void main(String [] args){
Scanner scan=new Scanner(System.in);
String string=scan.next();
int len=string.length();
HashMap<Character, Integer> hashMap=new HashMap<Character, Integer>();
//First Scan
for(int i = 0; i <len;i++){
char currentCharacter=string.charAt(i);
if(!hashMap.containsKey(currentCharacter)){
hashMap.put(currentCharacter, 1);
}
else{
hashMap.put(currentCharacter, hashMap.get(currentCharacter)+1);
}
}
// Second Scan
boolean flag=false;
char firstNonRepeatingChar = 0;
for(int i=0;i<len;i++){
char c=string.charAt(i);
if(hashMap.get(c)==1){
flag=true;
firstNonRepeatingChar=c;
break;
}
}
if(flag==true)
System.out.println("firstNonRepeatingChar is "+firstNonRepeatingChar);
else
System.out.println("There is no such type of character");
}
}
GeeksforGeeks also suggests an efficient method, but I think it also takes two scans. The following solution is from GeeksForGeeks:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define NO_OF_CHARS 256
// Structure to store count of a character and index of the first
// occurrence in the input string
struct countIndex {
int count;
int index;
};
/* Returns an array of above structure type. The size of
array is NO_OF_CHARS */
struct countIndex *getCharCountArray(char *str)
{
struct countIndex *count =
(struct countIndex *)calloc(NO_OF_CHARS, sizeof(struct countIndex));
int i;
// This is First Scan
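for (i = 0; *(str+i); i++)
{
// Cast to unsigned char so that bytes above 127 cannot produce a negative index
unsigned char c = (unsigned char)*(str+i);
(count[c].count)++;
// If it's first occurrence, then store the index
if (count[c].count == 1)
count[c].index = i;
}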
return count;
}
/* The function returns index of the first non-repeating
character in a string. If all characters are repeating
then returns INT_MAX */
int firstNonRepeating(char *str)
{
struct countIndex *count = getCharCountArray(str);
int result = INT_MAX, i;
//Second Scan
for (i = 0; i < NO_OF_CHARS; i++)
{
// If this character occurs only once and appears
// before the current result, then update the result
if (count[i].count == 1 && result > count[i].index)
result = count[i].index;
}
free(count); // To avoid memory leak
return result;
}
/* Driver program to test above function */
int main()
{
char str[] = "geeksforgeeks";
int index = firstNonRepeating(str);
if (index == INT_MAX)
printf("Either all characters are repeating or string is empty");
else
printf("First non-repeating character is %c", str[index]);
getchar();
return 0;
}

You can store 2 arrays: count of each character and the first occurrence(and fill both of them during the first scan). Then the second scan will be unnecessary.
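A minimal Python sketch of this idea. It assumes dictionaries stand in for the two arrays so that any character works as a key; the name first_non_repeating is illustrative.

```python
def first_non_repeating(s):
    """Return the first character that occurs exactly once in `s`, or None."""
    counts = {}       # character -> number of occurrences
    first_index = {}  # character -> index of its first occurrence
    for i, ch in enumerate(s):  # the only pass over the string
        counts[ch] = counts.get(ch, 0) + 1
        first_index.setdefault(ch, i)
    # This step visits each distinct character once; it does not rescan the string.
    candidates = [first_index[ch] for ch, n in counts.items() if n == 1]
    return s[min(candidates)] if candidates else None
```

The final step costs time proportional to the alphabet size rather than to the string length, which is where the saving comes from on long inputs.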

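Use Java's String methods (indexOf and lastIndexOf) and you can find the solution with only one for loop. Note that each of these calls scans the string internally, so the total work is O(n^2) in the worst case.
An example is shown below: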
import java.util.Scanner;
public class firstoccurance {
public static void main(String args[]){
char [] a ={'h','h','l','l','o'};
//Scanner sc=new Scanner(System.in);
String s=new String(a);//sc.next();
char c;
int i;
int length=s.length();
for(i=0;i<length;i++)
{
c=s.charAt(i);
if(s.indexOf(c)==s.lastIndexOf(c))
{
System.out.println("first non repeating char in a string "+c);
break;
}
else if(i==length-1)
{
System.out.println("no single char");
}
}
}
}

In the following solution I declare a class CharCountAndPosition that stores firstIndex and frequencyOfchar. While the string is read character by character, firstIndex records where a character was first encountered and frequencyOfchar records its total number of occurrences.
We create an array of CharCountAndPosition (step 1) and initialize it (step 2).
While scanning the string, we set firstIndex and frequencyOfchar for every character (step 3).
In step 4 we check the array of CharCountAndPosition and find the character with frequency==1 and the minimum firstIndex.
The overall time complexity is O(n+256), where n is the size of the string. O(n+256) is equivalent to O(n) because 256 is a constant. Please find the solution on Ideone.
import java.util.Scanner;
public class FirstNonRepeatedCharacterEfficient {
public static void main(String [] args){
// step1: make array of CharCountAndPosition.
CharCountAndPosition [] array=new CharCountAndPosition[256];
// step2: Initialize array with object of CharCountAndPosition.
for(int i=0;i<256;i++)
{
array[i]=new CharCountAndPosition();
}
Scanner scan=new Scanner(System.in);
String str=scan.next();
int len=str.length();
// step 3
for(int i=0;i<len;i++){
char c=str.charAt(i);
int index=c; // use the character code itself as the index (assumes codes below 256); c-'a' would go negative for uppercase letters
int frequency=array[index].frequencyOfchar;
if(frequency==0)
array[index].firstIndex=i;
array[index].frequencyOfchar=frequency+1;
//System.out.println(c+" "+array[index].frequencyOfchar);
}
boolean flag=false;
int firstPosition=Integer.MAX_VALUE;
for(int i=0;i<256;i++){
// Step4
if(array[i].frequencyOfchar==1){
//System.out.println("character="+(char)i);
if(firstPosition> array[i].firstIndex){
firstPosition=array[i].firstIndex;
flag=true;
}
}
}
if(flag==true)
System.out.println(str.charAt(firstPosition));
else
System.out.println("There is no such type of character");
}
}
class CharCountAndPosition{
int firstIndex;
int frequencyOfchar;
}

A solution in javascript with a lookup table:
var sample="It requires two scan of an array I want to find first non repeating character in only one scan";
var sampleArray=sample.split("");
var table=Object.create(null);
sampleArray.forEach(function(char,idx){
char=char.toLowerCase();
var pos=table[char];
if(typeof(pos)=="number"){
table[char]=sampleArray.length; //a duplicate found; we'll assign some invalid index value to this entry and discard these characters later
return;
}
table[char]=idx; //index of first occurance of this character
});
var uniques=Object.keys(table).filter(function(k){
return table[k]<sampleArray.length;
}).map(function(k){
return {key:k,pos:table[k]};
});
uniques.sort(function(a,b){
return a.pos-b.pos;
});
uniques.toSource(); //[{key:"q", pos:5}, {key:"u", pos:6}, {key:"d", pos:46}, {key:"p", pos:60}, {key:"g", pos:66}, {key:"h", pos:69}, {key:"l", pos:83}]
(uniques.shift()||{}).key; //q

The following C program adds a character-specific value to 'count' if the character has not occurred before, and removes that value from 'count' if it has. At the end, 'count' holds the character-specific value that indicates which character it was!
//TO DO:
//If multiple unique chars occur, which one occurred first?
//Is it possible to get the required values (1,2,4,8,..) up to _Z_ and _z_?
#include <stdio.h>
#define _A_ 1
#define _B_ 2
#define _C_ 4
#define _D_ 8
//And so on till _Z
//Same for '_a' to '_z'
#define ADDIFNONREP(C) if(count & C) count = count & ~C; else count = count | C; break;
char getNonRepChar(char *str)
{
int i = 0, count = 0;
for(i = 0; str[i] != '\0'; i++)
{
switch(str[i])
{
case 'A':
ADDIFNONREP(_A_);
case 'B':
ADDIFNONREP(_B_);
case 'C':
ADDIFNONREP(_C_);
case 'D':
ADDIFNONREP(_D_);
//And so on
//Same for 'a' to 'z'
}
}
switch(count)
{
case _A_:
return 'A';
case _B_:
return 'B';
case _C_:
return 'C';
case _D_:
return 'D';
//And so on
//Same for 'a' to 'z'
}
}
int main()
{
char str[] = "ABCDABC";
char c = getNonRepChar(str);
printf("%c\n", c); //Prints D
return 0;
}

You can maintain a queue of keys as they are added to the hash map (you add your key to the queue if you add a new key to the hash map). After the string scan, you use the queue to obtain the order in which the keys were added to the map. This functionality is exactly what the Java standard library class LinkedHashMap provides.
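The same idea can be sketched in Python, assuming a plain dict plays the role of the insertion-ordered map (dicts keep insertion order since Python 3.7). The name first_unique_ordered is illustrative.

```python
def first_unique_ordered(s):
    """Return the first character occurring exactly once, relying on dict insertion order."""
    counts = {}
    for ch in s:  # single pass over the string
        counts[ch] = counts.get(ch, 0) + 1
    # Keys come back in the order they were first inserted,
    # so the first key with a count of 1 is the answer.
    for ch, n in counts.items():
        if n == 1:
            return ch
    return None
```

The second loop runs over the distinct characters only, so the string itself is read once.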

Here is my take on the problem.
Iterate through string. Check if hashset contains the character. If so delete it from array. If not present just add it to the array and hashset.
NSMutableSet *repeated = [[NSMutableSet alloc] init]; //Hashset
NSMutableArray *nonRepeated = [[NSMutableArray alloc] init]; //Array
for (int i=0; i<[test length]; i++) {
NSString *currentObj = [NSString stringWithFormat:@"%c", [test characterAtIndex:i]]; //No support for primitive data types.
if ([repeated containsObject:currentObj]) {
[nonRepeated removeObject:currentObj];// in obj-c nothing happens even if nonrepeted in nil
continue;
}
[repeated addObject:currentObj];
[nonRepeated addObject:currentObj];
}
NSLog(@"This is the character %@", [nonRepeated objectAtIndex:0]);

If you can restrict yourself to strings of ASCII characters, I would recommend a lookup table instead of a hash table. This lookup table would have only 128 entries.
A possible approach would be as follows.
We start with an empty queue Q (may be implemented using linked lists) and a lookup table T. For a character ch, T[ch] stores a pointer to a queue node containing the character ch and the index of the first occurrence of ch in the string. Initially, all entries of T are NULL.
Each queue node stores the character and the first occurrence index as specified earlier, and also has a special boolean flag named removed which indicates that the node has been removed from the queue.
Read the string character by character. If the ith character is ch, check if T[ch] = NULL. If so, this is the first occurrence of ch in the string. Then add a node for ch containing the index i to the queue and store a pointer to that node in T[ch].
If T[ch] is not NULL, this is a repeating character. If the node pointed to by T[ch] has already been removed (i.e. the removed flag of the node is set), then nothing needs to be done. Otherwise, remove the node from the queue by manipulating the pointers of the previous and next nodes. Also set the removed flag of the node to indicate that the node is now removed. Note that we do not free/delete the node at this stage, nor do we set T[ch] back to NULL.
If we proceed in this way, the nodes for all the repeating characters will be removed from the queue. The removed flag is used to ensure that no node is removed twice from the queue if the character occurs more than two times.
After the string has been completely processed, the first node of the linked list will contain the character code as well as the index of the first non-repeating character. Then, the memory can be freed by iterating over the entries of lookup table T and freeing any non-NULL entries.
Here is a C implementation. Here, instead of the removed flag, I set the prev and next pointers of the current node to NULL when it is removed, and check for that to see if a node has already been removed.
#include <stdio.h>
#include <stdlib.h>
struct queue_node {
int ch;
int index;
struct queue_node *prev;
struct queue_node *next;
};
void print_queue (struct queue_node *head);
int main (void)
{
int i;
struct queue_node *lookup_entry[128];
struct queue_node *head;
struct queue_node *last;
struct queue_node *cur_node, *prev_node, *next_node;
char str [] = "GeeksforGeeks";
head = malloc (sizeof (struct queue_node));
last = head;
last->prev = last->next = NULL;
for (i = 0; i < 128; i++) {
lookup_entry[i] = NULL;
}
for (i = 0; str[i] != '\0'; i++) {
cur_node = lookup_entry[str[i]];
if (cur_node != NULL) {
/* it is a repeating character */
if (cur_node->prev != NULL) {
/* Entry has not been removed. Remove it from the queue. */
prev_node = cur_node->prev;
next_node = cur_node->next;
prev_node->next = next_node;
if (next_node != NULL) {
next_node->prev = prev_node;
} else {
/* Last node was removed */
last = prev_node;
}
cur_node->prev = NULL;
cur_node->next = NULL;
/* We will not free the node now. Instead, free
* all nodes in a single pass afterwards.
*/
}
} else {
/* This is the first occurrence - add an entry to the queue */
struct queue_node *newnode = malloc (sizeof(struct queue_node));
newnode->ch = str[i];
newnode->index = i;
newnode->prev = last;
newnode->next = NULL;
last->next = newnode;
last = newnode;
lookup_entry[str[i]] = newnode;
}
print_queue (head);
}
last = head->next;
while (last != NULL) {
printf ("Non-repeating char: %c at index %d.\n", last->ch, last->index);
last = last->next;
}
/* Free the queue memory */
for (i = 0; i < 128; i++) {
if (lookup_entry[i] != NULL) {
free (lookup_entry[i]);
lookup_entry[i] = NULL;
}
}
free (head);
return (0);
}
void print_queue (struct queue_node *head) {
struct queue_node *tmp = head->next;
printf ("Queue: ");
while (tmp != NULL) {
printf ("%c:%d ", tmp->ch, tmp->index);
tmp = tmp->next;
}
printf ("\n");
}

Instead of making things more and more complex, I can use three for loops to tackle this.
class test{
public static void main(String args[]){
String s="STRESST";//Your input can be given here.
char a[]=new char[s.length()];
for(int i=0;i<s.length();i++){
a[i]=s.charAt(i);
}
for(int i=0;i<s.length();i++){
int flag=0;
for(int j=0;j<s.length();j++){
if(a[i]==a[j]){
flag++;
}
}
if(flag==1){
System.out.println(a[i]+" is not repeated");
break;
}
}
}
}
I guess it will be helpful for people who are just gonna look at the logic part without any complex methods used in the program.

This can be done with the substring method. Do it like this:
String str="your String";
int n=str.length();
for(int i=0;i<n;i++){
String ss=str.substring(i,i+1);
// ss is non-repeating only if it occurs neither before nor after position i
if(!str.substring(0,i).contains(ss) && !str.substring(i+1,n).contains(ss)){
System.out.println(ss);
break;
}
}
This stops as soon as the first non-repeating character is found, but each contains call scans part of the string, so the worst case is O(n^2).

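Add each character to a HashSet of characters seen so far and check whether add() returns true. If it returns true, this is the first occurrence, so also add the character to a LinkedHashSet of candidates; if it returns false, the character is repeated, so remove it from the candidates.
Then the first value of the LinkedHashSet gives you the first non-repeated character (a plain HashSet has no get(0) and no defined order).
Algorithm:
Set<Character> seen=new HashSet<>();
Set<Character> candidates=new LinkedHashSet<>();
for(int i=0;i<str.length();i++)
{
char c=str.charAt(i);
if(seen.add(c))
candidates.add(c);
else
candidates.remove(c);
}
candidates.iterator().next() gives the first non-repeated character (check candidates.isEmpty() first).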

I have this program, which is simpler;
it does not use any data structures:
public static char findFirstNonRepChar(String input){
int len = input.length();
for(int i=0;i<len;i++){
char currentChar = input.charAt(i);
// a character is non-repeated when its first and last positions coincide
if(input.indexOf(currentChar)==input.lastIndexOf(currentChar)){
return currentChar;
}
}
return '\0'; // no non-repeated character found
}

A simple (non hashed) version...
public static String firstNRC(String s) {
String c = "";
while(s.length() > 0) {
c = "" + s.charAt(0);
if(! s.substring(1).contains(c)) return c;
s = s.replace(c, "");
}
return "";
}
or
public static char firstNRC(String s) {
s += " ";
for(int i = 0; i < s.length() - 1; i++)
if( s.split("" + s.charAt(i)).length == 2 ) return s.charAt(i);
return ' ';
}

//This is the simple logic for finding first non-repeated character....
public static void main(String[] args) {
String s = "GeeksforGeeks";
for (int i = 0; i < s.length(); i++) {
char begin = s.charAt(i);
String begin1 = String.valueOf(begin);
String end = s.substring(0, i) + s.substring(i + 1);
if (!end.contains(begin1)) {
System.out.println(begin1);
break;
}
}
}

@Test
public void testNonRepeadLetter() {
assertEquals('f', firstNonRepeatLetter("GeeksforGeeks"));
assertEquals('I', firstNonRepeatLetter("teststestsI"));
assertEquals('1', firstNonRepeatLetter("123aloalo"));
assertEquals('o', firstNonRepeatLetter("o"));
}
private char firstNonRepeatLetter(String s) {
if (s == null || s.isEmpty()) {
throw new IllegalArgumentException(s);
}
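Set<Character> seen = new HashSet<>(); // every character met so far
Set<Character> set = new LinkedHashSet<>(); // characters met exactly once, in order
for (int i = 0; i < s.length(); i++) {
char charAt = s.charAt(i);
if (seen.add(charAt)) {
set.add(charAt);
} else {
// removing on every repeat keeps characters with 3, 5, ... occurrences out
set.remove(charAt);
}
}
return set.iterator().next(); // throws NoSuchElementException if every character repeats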
}

Here is tested code in Java. Note that it is possible that no non-repeated character is found, and in that case we return '0'.
// find first non repeated character in a string
static char firstNR( String str){
int i, j, l;
char letter;
int[] k = new int[100];
j = str.length();
if ( j > 100) return '0';
for (i=0; i< j; i++){
k[i] = 0;
}
for (i=0; i<j; i++){
for (l=0; l<j; l++){
if (str.charAt(i) == str.charAt(l))
k[i]++;
}
}
for (i=0; i<j; i++){
if (k[i] == 1)
return str.charAt(i);
}
return '0';
}

Here is the logic to find the first non-repeatable letter in a String.
String name = "TestRepeat";
Set <Character> set = new LinkedHashSet<Character>();
List<Character> list = new ArrayList<Character>();
char[] ch = name.toCharArray();
for (char c :ch) {
set.add(c);
list.add(c);
}
Iterator<Character> itr1 = set.iterator();
Iterator<Character> itr2= list.iterator();
while(itr1.hasNext()){
int flag =0;
Character setNext= itr1.next();
for(int i=0; i<list.size(); i++){
Character listNext= list.get(i);
if(listNext.compareTo(setNext)== 0){
flag ++;
}
}
if(flag==1){
System.out.println("Character: "+setNext);
break;
}
}

it is very easy....you can do it without collection in java..
public class FirstNonRepeatedString{
public static void main(String args[]) {
String input ="GeeksforGeeks";
char process[] = input.toCharArray();
boolean status = false;
int index = 0;
for (int i = 0; i < process.length; i++) {
for (int j = 0; j < process.length; j++) {
if (i == j) {
continue;
} else {
if (process[i] == process[j]) {
status = false;
break;
} else {
status = true;
index = i;
}
}
}
if (status) {
System.out.println("First non-repeated string is : " + process[index]);
break;
}
}
}
}

We can create a LinkedHashMap holding each character from the string and its respective count. Then traverse the map and, when you come across a char with a count of 1, return that character. Below is a function that does this.
private static char findFirstNonRepeatedChar(String string) {
LinkedHashMap<Character, Integer> map = new LinkedHashMap<>();
for(int i=0;i< string.length();i++){
if(map.containsKey(string.charAt(i)))
map.put(string.charAt(i),map.get(string.charAt(i))+1);
else
map.put(string.charAt(i),1);
}
for(Entry<Character,Integer> entry : map.entrySet()){
if(entry.getValue() == 1){
return entry.getKey();
}
}
return ' ';
}

One Pass Solution.
I have used linked Hashmap here to maintain the insertion order. So I go through all the characters of a string and store it values in Linked HashMap. After that I traverse through the Linked Hash map and whichever first key will have its value equal to 1, I will print that key and exit the program.
import java.util.*;
class demo
{
public static void main(String args[])
{
String str="GeekGsQuizk";
HashMap <Character,Integer>hm=new LinkedHashMap<Character,Integer>();
for(int i=0;i<str.length();i++)
{
if(!hm.containsKey(str.charAt(i)))
hm.put(str.charAt(i),1);
else
hm.put(str.charAt(i),hm.get(str.charAt(i))+1);
}
for (Character key : hm.keySet())
{
if(hm.get(key)==1)
{
System.out.println(key);
System.exit(0) ;
}
}
}
}

I know this comes one year late, but I think if you use LinkedHashMap in your solution instead of using a HashMap, you will have the order guaranteed in the resulting map and you can directly return the key with the corresponding value as 1.
Not sure if this is what you wanted though as you will have to iterate over the map (not the string) after you are done populating it - but just my 2 cents.
Regards,
-Vini

Finding first non-repeated character in one pass O(n ) , without using indexOf and lastIndexOf methods
package nee.com;
public class FirstNonRepeatedCharacterinOnePass {
public static void printFirstNonRepeatedCharacter(String str){
String strToCaps=str.toUpperCase();
char ch[]=strToCaps.toCharArray();
StringBuilder sb=new StringBuilder();
// ASCII range for A-Z ( 91-65 =26)
boolean b[]=new boolean[26];
for(int i=0;i<ch.length;i++){
if(b[ch[i]-65]==false){
b[ch[i]-65]=true;
}
else{
//add repeated char to StringBuilder
sb.append(ch[i]+"");
}
}
for(int i=0;i<ch.length;i++){
// if char is not there in StringBuilder means it is non repeated
if(sb.indexOf(ch[i]+"")==-1){
System.out.println(" first non repeated in lower case ...."+Character.toLowerCase((ch[i])));
break;
}
}
}
public static void main(String g[]){
String str="abczdabddcn";
printFirstNonRepeatedCharacter(str);
}
}

I did the same using LinkedHashSet. Following is the code snippet:
System.out.print("Please enter the string :");
str=sc.nextLine();
if(null==str || str.equals("")) {
break;
}else {
chArr=str.toLowerCase().toCharArray();
set=new LinkedHashSet<Character>();
dupSet=new LinkedHashSet<Character>();
for(char chVal:chArr) {
if(set.contains(chVal)) {
dupSet.add(chVal);
}else {
set.add(chVal);
}
}
set.removeAll(dupSet);
System.out.println("First unique :"+set.toArray()[0]);
}

You can find this question here
For code of the below algorithm refer this link (My implementation with test cases)
Using linkedlist in combination with hashMap
I have a solution which solves it in O(n) time, with one pass over the array and O(1) space.
In reality, O(1) space is O(26) space.
Algorithm
1) Every time you visit a character for the first time:
Create a node for the linked list (storing that character) and append it at the end of the linked list. Add an entry in the hash map that stores, for the newly appended character, the address of the node that precedes it in the linked list. If the character is appended to an empty linked list, store null as the value in the hash map.
2) Now if you encounter the same character again:
Remove that element from the linked list using the address stored in the hash map. You then have to update the hash map entry of the element that came after the deleted element: its previous element becomes the previous element of the deleted element.
Complexity Analysis
LinkedlIst add element -> O(1)
LinkedlIst delete element -> O(1)
HashMap -> O(1)
space O(1)
pass -> one in O(n)
#include<bits/stdc++.h>
using namespace std;
typedef struct node
{
char ch;
node *next;
}node;
char firstNotRepeatingCharacter(string &s)
{
char ans = '_';
map<char,node*> mp;//hash map atmost may consume O(26) space
node *head = NULL;//linkedlist atmost may consume O(26) space
node *last;// to append at last in O(1)
node *temp1 = NULL;
node *temp2 = new node[1];
temp2->ch = '$';
temp2->next = NULL;
//This is my one pass of array//
for(int i = 0;i < s.size();++i)
{
//first occurence of character//
if(mp.find(s[i]) == mp.end())
{
node *temp = new node[1];
temp->ch = s[i];
temp->next = NULL;
if(head == NULL)
{
head = temp;
last = temp;
mp.insert(make_pair(s[i],temp1));
}
else
{
last->next = temp;
mp.insert(make_pair(s[i],last));
last = temp;
}
}
//Repeated occurence//
else
{
node *temp = mp[s[i]];
if(mp[s[i]] != temp2)
{
if(temp == temp1)
{
head = head->next;
if((head)!=NULL){mp[head->ch] = temp1;}
else last = head;
mp[s[i]] = temp2;
}
else if((temp->next) != NULL)
{
temp->next = temp->next->next;
if((temp->next) != NULL){mp[temp->next->ch] = temp;}
else last = temp;
mp[s[i]] = temp2;
}
else
{
;
}
}
}
} // end of the single pass over the string
if(head == NULL){;}
else {ans = head->ch;}
return ans;
}
int main()
{
int T;
cin >> T;
while(T--)
{
string str;
cin >> str;
cout << str << " -> " << firstNotRepeatingCharacter(str)<< endl;
}
return 0;
}

Requires one scan only.
Uses a deque (saves char) and a hashmap (saves char->node). On repeating char, get char's node in deque using hashmap and remove it from deque (in O(1) time) but keep the char in hashmap with null node value. peek() gives the 1st unique character.
[pseudocode]
char? findFirstUniqueChar(s):
if s == null:
throw
deque<char>() dq = new
hashmap<char, node<char>> chToNodeMap = new
for i = 0, i < s.length(), i++:
ch = s[i]
if !chToNodeMap.hasKey(ch):
chToNodeMap[ch] = dq.enqueue(ch)
else:
chNode = chToNodeMap[ch]
if chNode != null:
dq.removeNode(chNode)
chToNodeMap[ch] = null
if dq.isEmpty():
return null
return dq.peek()
// deque interface
deque<T>:
node<T> enqueue(T t)
bool removeNode(node<T> n)
T peek()
bool isEmpty()
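A minimal Python sketch of the pseudocode above. It assumes an insertion-ordered dict can stand in for the deque plus the char-to-node map, since a dict keeps insertion order (Python 3.7+) and removes a key in O(1) on average. The names find_first_unique_char, seen and unique are illustrative.

```python
def find_first_unique_char(s):
    """Return the first character seen exactly once in `s`, or None if there is none."""
    if s is None:
        raise ValueError("s must not be None")
    seen = set()  # every character met so far (plays the role of the hashmap keys)
    unique = {}   # ordered "deque" of characters met exactly once so far
    for ch in s:  # single scan of the string
        if ch not in seen:
            seen.add(ch)
            unique[ch] = None
        else:
            # Repeated character: drop it from the candidates.
            # pop with a default is a no-op if it was already removed.
            unique.pop(ch, None)
    # The oldest surviving key corresponds to dq.peek() in the pseudocode.
    return next(iter(unique), None)
```

Keeping repeated characters in seen after they leave unique mirrors the pseudocode's trick of leaving the key in the hashmap with a null node, so a third occurrence is not mistaken for a first one.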

The string is scanned only once; the other scans run over the counts and first-appearance arrays, which are generally much smaller. At least, this approach suits cases where the string is much longer than the character set it is drawn from.
Here is an example in golang:
package main
import (
"fmt"
)
func firstNotRepeatingCharacter(s string) int {
counts := make([]int, 256)
first := make([]int, 256)
// The string is parsed only once
for i := len(s) - 1; i >= 0; i-- {
counts[s[i]]++
first[s[i]] = i
}
min := 0
minValue := len(s) + 1
// Now we are parsing counts and first slices
for i := 0; i < 256; i++ {
if counts[i] == 1 && first[i] < minValue {
minValue = first[i]
min = i
}
}
return min
}
func main() {
fmt.Println(string(rune(firstNotRepeatingCharacter("fff"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("aabbc"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("cbbc"))))
fmt.Println(string(rune(firstNotRepeatingCharacter("cbabc"))))
}

Question : Find First Non Repeating Character or First Unique Character:
The code is self-explanatory.
public class uniqueCharacter1 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int count[] = new int[256];
for (char ch : a.toCharArray()) {
count[ch]++;
} // for
for (int i = 0; i < a.length(); i++) {
char ch = a.charAt(i);
if (count[ch] == 1) {
System.out.println(i);// 8
System.out.println(a.charAt(i));// P
break;
}
}
}// end1
}
In Python:
def firstUniqChar(a):
    count = [0] * 256
    for i in a:
        count[ord(i)] += 1
    element = ""
    for items in a:
        if count[ord(items)] == 1:
            element = items
            break
    return element
a = "GiniGinaProtijayi"
print(firstUniqChar(a))  # output is P

GeeksforGeeks also suggest efficient method but I think it is also two
scan.
Note that the second scan does not go over the input string but over an array of length NO_OF_CHARS. So the time complexity is O(n+m), which is better than 2*O(n) when n is large (for a long input string).
But it requires two scan of an array. I want to find first
non-repeating character in only one scan.
IMHO, it is possible if a priority queue is used. In that queue we compare each char by its occurrence count and the index of its first occurrence, and at the end we simply take the first element in the queue. See @hlpPy's answer.

Find the first un-repeated character in a string

What is the quickest way to find the first character which only appears once in a string?

It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.
So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact the only values that matter for the count is "0", "1" or "more than 1").
When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.
Example code in Python:
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)
    for c in l:
        if counts[c] == 1:
            return c
    return None
This runs in O(n).
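
Since only "0", "1" or "more than 1" matters for the count, the counter can be replaced by two sets. The following is a minimal sketch of that variant, not part of the original answer; the function and variable names are made up for illustration.

```python
def first_non_repeated_two_sets(s):
    order = []        # characters in order of first appearance
    seen = set()      # characters seen at least once
    repeated = set()  # characters seen more than once
    for c in s:
        if c in seen:
            repeated.add(c)
        else:
            seen.add(c)
            order.append(c)
    # The first character that never made it into `repeated` is the answer.
    for c in order:
        if c not in repeated:
            return c
    return None
```

This is still O(n) on average, and the bookkeeping is bounded by the number of distinct characters rather than by the length of the string.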

I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.
An idiomatic solution in Ruby
We can find the first un-repeated character in a string like so:
def first_unrepeated_char string
string.each_char.tally.find { |_, n| n == 1 }.first
end
How does Ruby accomplish this?
Reading Ruby's source
Let's break down the solution and consider what algorithms Ruby uses for each step.
First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.
The each_char method is implemented like so:
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
In turn, rb_str_enumerate_chars is implemented as:
rb_str_enumerate_chars(VALUE str, VALUE ary)
{
VALUE orig = str;
long i, len, n;
const char *ptr;
rb_encoding *enc;
str = rb_str_new_frozen(str);
ptr = RSTRING_PTR(str);
len = RSTRING_LEN(str);
enc = rb_enc_get(str);
if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
for (i = 0; i < len; i += n) {
n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
else {
for (i = 0; i < len; i += n) {
n = rb_enc_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
RB_GC_GUARD(str);
if (ary)
return ary;
else
return orig;
}
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length (in bytes) of the next character in the string so that it can iterate the next step. By lazily iterating over a string, reading just one character at a time, we end up doing just one full pass over the input string as tally consumes the iterator.
Tally is then implemented like so:
static void
tally_up(VALUE hash, VALUE group)
{
VALUE tally = rb_hash_aref(hash, group);
if (NIL_P(tally)) {
tally = INT2FIX(1);
}
else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
tally += INT2FIX(1) & ~FIXNUM_FLAG;
}
else {
tally = rb_big_plus(tally, INT2FIX(1));
}
rb_hash_aset(hash, group, tally);
}
static VALUE
tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
{
ENUM_WANT_SVALUE();
tally_up(hash, i);
return Qnil;
}
Here, tally_i is the per-element callback (its parameters are declared with the RB_BLOCK_CALL_FUNC_ARGLIST macro); it is invoked once per element and calls tally_up, which updates the tally hash on every iteration.
Rough time & memory analysis
The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it which in the worst case scenario can take up as much memory as the input string times some constant factor.
Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character scans the hash again; each of these is O(n) in the worst case.
However, tally also updates a hash on every iteration. A single hash update can be as slow as O(n) in the worst case, so the worst-case complexity of this Ruby solution is perhaps O(n^2).
Under reasonable assumptions, though, updating a hash is O(1), so we can expect the amortized average case to be O(n).
My old accepted answer in Python
You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:
def first_non_repeated_character(string):
    chars = []
    repeated = []
    for character in string:
        if character in chars:
            chars.remove(character)
            repeated.append(character)
        else:
            if character not in repeated:
                chars.append(character)
    if len(chars):
        return chars[0]
    else:
        return False
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

Why not use a heap based data structure such as a minimum priority queue. As you read each character from the string, add it to the queue with a priority based on the location in the string and the number of occurrences so far. You could modify the queue to add priorities on collision so that the priority of a character is the sum of the number appearances of that character. At the end of the loop, the first element in the queue will be the least frequent character in the string and if there are multiple characters with a count == 1, the first element was the first unique character added to the queue.
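
A rough sketch of the priority-queue idea using Python's `heapq`. Note the deviation from the description above: `heapq` cannot update the priority of an entry already in the heap, so this sketch counts first and builds the heap afterwards, with priority `(count, first index)`. The function name is made up for illustration.

```python
import heapq

def first_unique_via_heap(s):
    counts = {}
    first_index = {}
    for i, ch in enumerate(s):
        counts[ch] = counts.get(ch, 0) + 1
        first_index.setdefault(ch, i)
    # Priority is (occurrence count, index of first occurrence), so the
    # smallest entry is the earliest character among the least frequent ones.
    heap = [(counts[ch], first_index[ch], ch) for ch in counts]
    heapq.heapify(heap)
    if heap and heap[0][0] == 1:
        return heap[0][2]
    return None
```

Only the top of the heap is ever inspected, so a plain minimum over the same tuples would work just as well; the heap is kept here only to mirror the answer.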

Here is another fun way to do it. Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Lots of answers are attempting O(n) but are forgetting the actual costs of inserting and removing from the lists/associative arrays/sets they're using to track.
If you can assume that a char is a single byte, then you use a simple array indexed by the char and keep a count in it. This is truly O(n) because the array accesses are guaranteed O(1), and the final pass over the array to find the first element with 1 is constant time (because the array has a small, fixed size).
If you can't assume that a char is a single byte, then I would propose sorting the string and then doing a single pass checking adjacent values. This would be O(n log n) for the sort plus O(n) for the final pass. So it's effectively O(n log n), which is better than O(n^2). Also, it has virtually no space overhead, which is another problem with many of the answers that are attempting O(n).
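
The sort-based variant could look roughly like this in Python. It is a sketch under one assumption worth flagging: to recover which unique character comes first, it sorts the indices of the string rather than the characters themselves, so it does use O(n) extra space, unlike an in-place sort. The function name is made up for illustration.

```python
def first_unique_by_sorting(s):
    # Sort positions by the character found there: O(n log n).
    order = sorted(range(len(s)), key=lambda i: s[i])
    best = None
    k = 0
    # Single pass over the sorted positions, looking at runs of equal characters.
    while k < len(order):
        run_end = k
        while run_end + 1 < len(order) and s[order[run_end + 1]] == s[order[k]]:
            run_end += 1
        # A run of length one is a unique character; keep the smallest index.
        if run_end == k and (best is None or order[k] < best):
            best = order[k]
        k = run_end + 1
    return None if best is None else s[best]
```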

Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Refactoring a solution proposed earlier (no extra list needed). This goes over the string twice, so it is O(n) as well, like the original solution.
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    for c in s:
        counts[c] += 1
    for c in s:
        if counts[c] == 1:
            return c
    return None

The following is a Ruby implementation of finding the first nonrepeated character of a string:
def first_non_repeated_character(string)
string1 = string.split('')
string2 = string.split('')
string1.each do |let1|
counter = 0
string2.each do |let2|
if let1 == let2
counter+=1
end
end
if counter == 1
return let1
end
end
nil # no non-repeated character found
end
p first_non_repeated_character('dont doddle in the forest')
And here is a JavaScript implementation of the same style function:
var first_non_repeated_character = function (string) {
var string1 = string.split('');
var string2 = string.split('');
var single_letters = [];
for (var i = 0; i < string1.length; i++) {
var count = 0;
for (var x = 0; x < string2.length; x++) {
if (string1[i] == string2[x]) {
count++
}
}
if (count == 1) {
return string1[i];
}
}
}
console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));
In both cases I count each letter's occurrences: a letter that matches nothing else in the string occurs exactly once.

I think this should do it in C. This operates in O(n) time with no ambiguity about order of insertion and deletion operators. This is a counting sort (simplest form of a bucket sort, which itself is the simple form of a radix sort).
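#include <string.h>
unsigned char find_first_unique(unsigned char *string)
{
int chars[256];
int i;
memset(chars, 0, sizeof(chars));
/* first pass: count every character (the index is tested before it is advanced) */
for (i = 0; string[i]; i++)
{
chars[string[i]]++;
}
/* second pass: return the first character counted exactly once */
for (i = 0; string[i]; i++)
{
if (chars[string[i]] == 1) return string[i];
}
return 0;
}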

In Ruby:
(Original Credit: Andrew A. Smith)
x = "a huge string in which some characters repeat"
def first_unique_character(s)
s.each_char.detect { |c| s.count(c) == 1 }
end
first_unique_character(x)
=> "u"

def first_non_repeated_character(string):
    chars = []
    repeated = []
    for character in string:
        if character in repeated:
            continue  # already known to repeat; discard it
        elif character in chars:
            chars.remove(character)
            repeated.append(character)
        else:
            chars.append(character)
    if len(chars):
        return chars[0]
    else:
        return False

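The other JavaScript solutions here are quite C-style; here is a more JavaScript-style solution.
// the function header was missing from the snippet; this name is an assumption
function first_non_repeated_character(string) {
var arr = string.split("");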
var occurences = {};
var tmp;
var lowestindex = string.length+1;
arr.forEach( function(c){
tmp = c;
if( typeof occurences[tmp] == "undefined")
occurences[tmp] = tmp;
else
occurences[tmp] += tmp;
});
for(var p in occurences) {
if(occurences[p].length == 1)
lowestindex = Math.min(lowestindex, string.indexOf(p));
}
if(lowestindex > string.length)
return null;
return string[lowestindex];
}

In C, this is almost Shlemiel the Painter's Algorithm (not quite O(n!) but up to O(n^2)).
But it will outperform "better" algorithms for reasonably sized strings because the constant factor is so small. It can also easily tell you the location of the first non-repeating character.
char FirstNonRepeatedChar(char * psz)
{
for (int ii = 0; psz[ii] != 0; ++ii)
{
for (int jj = 0; ; ++jj)
{
// if we hit the end of string, then we found a non-repeat character.
//
if (psz[jj] == 0)
return psz[ii]; // this character doesn't repeat
// if we found a repeat character (before or after ii), we can stop looking.
// scanning from 0 keeps a later copy of an earlier character from being reported.
if (jj != ii && psz[ii] == psz[jj])
break;
}
}
}
return 0; // there were no non-repeating characters.
}
edit: this code is assuming you don't mean consecutive repeating characters.

Here's an implementation in Perl (version >=5.10) that doesn't care whether the repeated characters are consecutive or not:
use strict;
use warnings;
foreach my $word(@ARGV)
{
my @distinct_chars;
my %char_counts;
my @chars=split(//,$word);
foreach (@chars)
{
push @distinct_chars,$_ unless $_~~@distinct_chars;
$char_counts{$_}++;
}
my $first_non_repeated="";
foreach(@distinct_chars)
{
if($char_counts{$_}==1)
{
$first_non_repeated=$_;
last;
}
}
if(length($first_non_repeated))
{
print "For \"$word\", the first non-repeated character is '$first_non_repeated'.\n";
}
else
{
print "All characters in \"$word\" are repeated.\n";
}
}
Storing this code in a script (which I named non_repeated.pl) and running it on a few inputs produces:
jmaney> perl non_repeated.pl aabccd "a huge string in which some characters repeat" abcabc
For "aabccd", the first non-repeated character is 'b'.
For "a huge string in which some characters repeat", the first non-repeated character is 'u'.
All characters in "abcabc" are repeated.

Here's a possible solution in ruby without using Array#detect (as in this answer). Using Array#detect makes it too easy, I think.
ALPHABET = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
def fnr(s)
unseen_chars = ALPHABET.dup
seen_once_chars = []
s.each_char do |c|
if unseen_chars.include?(c)
unseen_chars.delete(c)
seen_once_chars << c
elsif seen_once_chars.include?(c)
seen_once_chars.delete(c)
end
end
seen_once_chars.first
end
Seems to work for some simple examples:
fnr "abcdabcegghh"
# => "d"
fnr "abababababababaqababa"
# => "q"
Suggestions and corrections are very much appreciated!

Try this code:
public static String findFirstUnique(String str)
{
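String unique = "";
String repeated = ""; // characters already seen more than once
foreach (char ch in str)
{
if (repeated.Contains(ch)) continue; // a third occurrence must not be re-added
if (unique.Contains(ch)) { unique = unique.Replace(ch.ToString(), ""); repeated += ch; }
else unique += ch.ToString();
}
return unique.Length > 0 ? unique[0].ToString() : null; // null when every character repeats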
}

In Mathematica one might write this:
string = "conservationist deliberately treasures analytical";
Cases[Gather @ Characters @ string, {_}, 1, 1][[1]]
{"v"}

This snippet is in JavaScript:
var string = "tooth";
var hash = {};
for(var i=0, j=string.length; i<j; i++){
if(hash[string[i]] !== undefined){
hash[string[i]] = hash[string[i]] + 1;
}else{
hash[string[i]] = 1;
}
}
for(i=0; i<j; i++){
if(hash[string[i]] === 1){
console.info( string[i] );
break; // stop at the first unique character
}
}
// prints "h"

A different approach here.
Scan each element of the string and build a count array that stores how many times each element occurs.
Then scan the string again from the first element and print the first element whose count is 1.
C code
-----
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
unsigned char t_c; /* unsigned so it is always a valid array index */
char *t_p;
int count[256]={0};
if (argc < 2)
return 1; /* no input string given */
t_p = argv[1];
for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
count[t_c]++;
t_p = argv[1];
for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
{
if(count[t_c] == 1)
{
printf("Element is %c\n",t_c);
break;
}
}
return 0;
}

Input: aabbcddeef, output: c. Note that this only detects adjacent repeats, so it assumes equal characters are grouped together.
char FindUniqueChar(char *a)
{
int i=0;
bool repeat=false;
while(a[i] != '\0')
{
if (a[i] == a[i+1])
{
repeat = true;
}
else
{
if(!repeat)
{
cout<<a[i];
return a[i];
}
repeat=false;
}
i++;
}
return a[i];
}

Here is another approach: we could have an array which stores, for each character, its count and the index of its first occurrence. After filling the array, we just traverse it to find the minimum index whose count is 1, then return str[index].
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <climits>
using namespace std;
#define No_of_chars 256
//store the count and the index where the char first appear
typedef struct countarray
{
int count;
int index;
}countarray;
//returns the count array
countarray *getcountarray(char *str)
{
countarray *count;
count=new countarray[No_of_chars];
for(int i=0;i<No_of_chars;i++)
{
count[i].count=0;
count[i].index=-1;
}
for(int i=0;*(str+i);i++)
{
(count[*(str+i)].count)++;
if(count[*(str+i)].count==1) //if count==1 then update the index
count[*(str+i)].index=i;
}
return count;
}
char firstnonrepeatingchar(char *str)
{
countarray *array;
array = getcountarray(str);
int result = INT_MAX;
for(int i=0;i<No_of_chars;i++)
{
if(array[i].count==1 && result > array[i].index)
result = array[i].index;
}
delete[] (array);
if(result == INT_MAX) return '\0'; //no non-repeating character found
return (str[result]);
}
int main()
{
char str[] = "geeksforgeeks";
cout<<"First non repeating character is "<<firstnonrepeatingchar(str)<<endl;
return 0;
}

Function:
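This C# function uses a hash table (Dictionary) and has O(2n) worst-case performance, i.e. O(n). Note that it relies on the Dictionary enumerating its keys in insertion order, which the .NET documentation does not guarantee.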
private static string FirstNoRepeatingCharacter(string aword)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < aword.Length; i++)
{
if (!dic.ContainsKey(aword.Substring(i, 1)))
dic.Add(aword.Substring(i, 1), 1);
else
dic[aword.Substring(i, 1)]++;
}
foreach (var item in dic)
{
if (item.Value == 1) return item.Key;
}
return string.Empty;
}
Example:
string aword = "TEETER";
Console.WriteLine(FirstNoRepeatingCharacter(aword)); //print: R

I have two strings i.e. 'unique' and 'repeated'. Every character appearing for the first time, gets added to 'unique'. If it is repeated for the second time, it gets removed from 'unique' and added to 'repeated'. This way, we will always have a string of unique characters in 'unique'.
Complexity: O(n).
public void firstUniqueChar(String str){
String unique= "";
String repeated = "";
str = str.toLowerCase();
for(int i=0; i<str.length();i++){
char ch = str.charAt(i);
if(!(repeated.contains(str.subSequence(i, i+1))))
if(unique.contains(str.subSequence(i, i+1))){
unique = unique.replaceAll(Character.toString(ch), "");
repeated = repeated+ch;
}
else
unique = unique+ch;
}
System.out.println(unique.charAt(0));
}

The following code is in C# with O(n) complexity.
using System;
using System.Linq;
using System.Text;
namespace SomethingDigital
{
class FirstNonRepeatingChar
{
public static void Main()
{
String input = "geeksforgeeksandgeeksquizfor";
char[] str = input.ToCharArray();
bool[] b = new bool[256];
String unique1 = "";
String unique2 = "";
foreach (char ch in str)
{
if (!unique1.Contains(ch))
{
unique1 = unique1 + ch;
unique2 = unique2 + ch;
}
else
{
unique2 = unique2.Replace(ch.ToString(), "");
}
}
if (unique2 != "")
{
Console.WriteLine(unique2[0].ToString());
Console.ReadLine();
}
else
{
Console.WriteLine("No non repeated string");
Console.ReadLine();
}
}
}
}

The following solution is an elegant way to find the first unique character within a string using the new features introduced as part of Java 8. It first creates a map that counts the number of occurrences of each character, then uses this map to find the first character which occurs only once. This runs in O(N) time.
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Runs in O(N) time and uses lambdas and the stream API from Java 8
// Also, it is only three lines of code!
private static String findFirstUniqueCharacterPerformantWithLambda(String inputString) {
// convert the input string into a list of characters
final List<String> inputCharacters = Arrays.asList(inputString.split(""));
// first, construct a map to count the number of occurrences of each character
final Map<Object, Long> characterCounts = inputCharacters
.stream()
.collect(groupingBy(s -> s, counting()));
// then, find the first unique character by consulting the count map
return inputCharacters
.stream()
.filter(s -> characterCounts.get(s) == 1)
.findFirst()
.orElse(null);
}

Here is one more solution with O(n) time complexity.
public void findUnique(String string) {
ArrayList<Character> uniqueList = new ArrayList<>();
int[] chatArr = new int[128];
for (int i = 0; i < string.length(); i++) {
Character ch = string.charAt(i);
if (chatArr[ch] != -1) {
chatArr[ch] = -1;
uniqueList.add(ch);
} else {
uniqueList.remove(ch);
}
}
if (uniqueList.size() == 0) {
System.out.println("No unique character found!");
} else {
System.out.println("First unique character is :" + uniqueList.get(0));
}
}

I read through the answers, but did not see any like mine, I think this answer is very simple and fast, am I wrong?
def first_unique(s):
    repeated = []
    while s:
        if s[0] not in s[1:] and s[0] not in repeated:
            return s[0]
        else:
            repeated.append(s[0])
            s = s[1:]
    return None
test
(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')

Question : First Unique Character of a String
This is the simplest solution.
public class Test4 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int[] count = new int[256];
for (int i = 0; i < a.length(); i++) {
count[a.charAt(i)]++;
}
int index = -1;
for (int i = 0; i < a.length(); i++) {
if (count[a.charAt(i)] == 1) {
index = i;
break;
} // if
}
System.out.println(index);// output => 8
System.out.println(a.charAt(index)); //output => P
}// end1
}
IN Python :
def firstUniqChar(a):
    count = [0] * 256
    for i in a:
        count[ord(i)] += 1
    element = ""
    for items in a:
        if count[ord(items)] == 1:
            element = items
            break
    return element
a = "GiniGinaProtijayi"
print(firstUniqChar(a)) # output is P
Using Java 8 :
public class Test2 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
Map<Character, Long> map = a.chars()
.mapToObj(
ch -> Character.valueOf((char) ch)
).collect(
Collectors.groupingBy(
Function.identity(),
LinkedHashMap::new,
Collectors.counting()));
System.out.println("MAP => " + map);
// {G=2, i=5, n=2, a=2, P=1, r=1, o=1, t=1, j=1, y=1}
Character chh = map
.entrySet()
.stream()
.filter(entry -> entry.getValue() == 1L)
.map(entry -> entry.getKey())
.findFirst()
.get();
System.out.println("First Non Repeating Character => " + chh);// P
}// main
}

how about using a suffix tree for this case... the first unrepeated character will be first character of longest suffix string with least depth in tree..

Create two lists:
unique list (UL) - holds only characters seen exactly once
non-unique list (NUL) - holds only repeated characters
for(char c in str) {
if(nul.contains(c)){
//do nothing
}else if(ul.contains(c)){
ul.remove(c);
nul.add(c);
}else{
ul.add(c); // first time we see c
}
}
// the first element of UL, if any, is the first unique character

