Algorithm to find first index where strings are different? - algorithm

I've got a collection of strings, and I need to know the first index where they all differ. I can think of two ways to do this: (the following pseudo code is just off the top of my head and may be heavily bug-laden)
First Way:
var minLength = [go through all strings finding min length];
var set = new set()
for(i=0;i<minlength;i++)
{
for(str in strings)
{
var substring = str.substring(0,i);
if(set.contains(substring))
break; // not all different yet, increment i
set.add(substring)
}
set.clear(); // prepare for next length of substring
}
This strikes me as gross because of the use of a set data structure where it seems like one should not be needed.
Second Way:
var minLength = [go through all strings finding min length];
strings.sort();
for(i=0;i<minlength;i++)
{
boolean done = true;
char last = null;
for(str in strings)
{
char c = str[i];
if(c == last)
{
// not all different yet, increment i
done = false;
break;
}
last = c;
}
if(done)
return i;
}
But it annoys me that I have to run the sort first, because the sorting algorithm, by its very nature, has access to the information that I'm looking for.
Surely there must be a more efficient way than what I have listed above. Eventually I'd like to abstract it out to any type of array, but that will be trivial and it's simpler to think of it as a string problem.
Any help?
**UPDATE: I apparently didn't explain myself very well. If my strings are ["apple", "banana", "cucumber", "banking"], I want the function to return 3, because there were two strings ("banana" and "banking") that matched through index 0, 1, and 2, so 3 is the first index where they are all unique.
As Daniel mentioned below, a better way to state my needs is that: "I want to find index i where calling substring(0,i) on all my strings will result in all unique values."**

This is untested, but here's my attempt. (I may be making it more complicated than I have to, but I think it's a different way to look at it.)
The basic idea is to compile groups of items that match at the first element, then find the max unique index for each group, checking elements at each successive index.
int FirstUniqueIndex<T>(IEnumerable<IEnumerable<T>> myArrayCollection)
{
//just an overload so you don't have to specify index 0 all the time
return FirstUniqueIndex(myArrayCollection, 0);
}
int FirstUniqueIndex<T>(IEnumerable<IEnumerable<T>> myArrayCollection, int StartIndex)
{
/* Group the current collection by the element at StartIndex, and
* return a collection of these groups. Additionally, we're only interested
* in the groups with more than one element, so only get those.*/
var groupsWithMatches = from item in myArrayCollection //for each item in the collection (called "item")
where item.Count() > StartIndex //that are long enough
group item by item.ElementAt(StartIndex) into g //group them by the element at StartIndex, and call the group "g"
where g.Skip(1).Any() //only want groups with more than one element
select g; //add the group to the collection
/* Now "groupsWithMatches" is an enumeration of groups of inner matches of
* your original arrays. Let's process them... */
if(groupsWithMatches.Any())
//some matches were found - check the next index for each group
//(get the maximum unique index of all the matched groups)
return groupsWithMatches.Max(group => FirstUniqueIndex(group, StartIndex + 1));
else
//no matches found, all unique at this index
return StartIndex;
}
And for the non-LINQ version of the above (I'll change it to use a List collection, but any collection will do). I'll even remove the lambda. Again untested, so try not to aim sharp implements in my direction.
int FirstUniqueIndex<T>(List<List<T>> myArrayCollection, int StartIndex)
{
/* Group the current collection by the element at StartIndex, and
* return a collection of these groups. Additionally, we're only interested
* in the groups with more than one element, so only get those.*/
Dictionary<T, List<List<T>>> groups = new Dictionary<T, List<List<T>>>();
//group all the items by the element at StartIndex
foreach(var item in myArrayCollection)
{
if(item.Count > StartIndex)
{
List<List<T>> group;
if(!groups.TryGetValue(item[StartIndex], out group))
{
//new group, so make it first
group = new List<List<T>>();
groups.Add(item[StartIndex], group);
}
group.Add(item);
}
}
/* Now "groups" is an enumeration of groups of inner matches of
* your original arrays. Let's get the groups with more than one item. */
List<List<List<T>>> groupsWithMatches = new List<List<List<T>>>(groups.Count);
foreach(List<List<T>> group in groups.Values)
{
if(group.Count > 1)
groupsWithMatches.Add(group);
}
if(groupsWithMatches.Count > 0)
{
//some matches were found - check the next index for each group
//(get the maximum unique index of all the matched groups)
int max = -1;
foreach(List<List<T>> group in groupsWithMatches)
{
int index = FirstUniqueIndex(group, StartIndex + 1);
max = index > max ? index : max;
}
return max;
}
else
{
//no matches found, all unique at this index
return StartIndex;
}
}
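For comparison, the same grouping idea can be sketched in a few lines of Python. This is only an illustration of the recursion described above, not a tested port of the C# code; the function name `first_unique_index` and the `start` parameter are my own choices.

```python
from collections import defaultdict

def first_unique_index(sequences, start=0):
    """Return the first index at which no two sequences still share a prefix."""
    # Group the sequences that are long enough by their element at `start`.
    groups = defaultdict(list)
    for seq in sequences:
        if len(seq) > start:
            groups[seq[start]].append(seq)
    # Only groups with two or more members still collide at this index.
    colliding = [group for group in groups.values() if len(group) > 1]
    if not colliding:
        return start
    # Recurse into each colliding group and keep the deepest result.
    return max(first_unique_index(group, start + 1) for group in colliding)

print(first_unique_index(["apple", "banana", "cucumber", "banking"]))  # 3
```

Because it only uses `len` and indexing, the sketch works for lists or tuples of hashable elements as well as strings.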

Have you looked at a Patricia trie? (A Java implementation is available on Google Code.)
Build the trie, then traverse the data structure to find the maximum string position of all the internal nodes (black dots in the function above).
This seems like it should be an O(n) operation. I'm not sure whether your set implementation is O(n) or not; it "smells" like O(n^2), but I'm not sure.
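A rough sketch of the trie idea in Python, assuming a plain (uncompressed) trie built from nested dicts rather than a real Patricia trie. The name `first_unique_index_trie` is hypothetical. Instead of traversing the finished trie, it records the deepest node that more than one string passes through while inserting, which gives the same number.

```python
def first_unique_index_trie(strings):
    """Length of the longest prefix shared by any two strings."""
    root = {}
    deepest_shared = 0
    for s in strings:
        node = root
        for depth, ch in enumerate(s, start=1):
            if ch in node:
                # An earlier string already went through this node, so a
                # prefix of length `depth` is shared by at least two strings.
                deepest_shared = max(deepest_shared, depth)
            node = node.setdefault(ch, {})
    return deepest_shared

print(first_unique_index_trie(["apple", "banana", "cucumber", "banking"]))  # 3
```

Every character is visited once, so the work is linear in the total length of all strings.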

Use the set as you proposed, that's exactly the right thing to do.
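A minimal sketch of the set approach in Python, assuming a non-empty list of strings. The function name is made up for illustration, and it returns None when two strings are identical and therefore never become unique.

```python
def first_unique_index_by_prefix(strings):
    """First index i such that the prefixes ending at index i are all distinct."""
    longest = max(len(s) for s in strings)
    for i in range(longest + 1):
        # s[:i + 1] is the prefix that includes the character at index i.
        prefixes = {s[:i + 1] for s in strings}
        if len(prefixes) == len(strings):
            return i
    return None  # at least two strings are identical

print(first_unique_index_by_prefix(["apple", "banana", "cucumber", "banking"]))  # 3
```

This rebuilds every prefix at each step, so it trades some speed for being very short.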

You should be able to do this without sorting, and with only looking at each character in each string once in the worst case.
here is a ruby script that puts the index to the console:
mystrings = ["apple", "banana", "cucumber", "banking"]
minlength = getMinLengthString(mystrings) #not defined here
char_set = {}
(0...minlength).each do |char_index|
char_set[mystrings[0][char_index].chr] = 1
(1...mystrings.length).each do |string_index|
comparing_char = mystrings[string_index][char_index].chr
break if char_set[comparing_char]
if string_index == (mystrings.length - 1) then
puts char_index
exit
else
char_set[comparing_char] = 1
end
end
char_set.clear
end
puts minlength
the result is 3.
Here's the same general snippet in C#, if it is more legible for you:
string[] mystrings = { "apple", "banana", "cucumber", "banking" };
//defined elsewhere...
int minlength = GetMinStringLengthFromStringArray(mystrings);
Dictionary<char, int> charSet = new Dictionary<char, int>();
for (int char_index = 0; char_index < minlength; char_index++)
{
charSet.Add(mystrings[0][char_index], 1);
for (int string_index = 1; string_index < mystrings.Length; string_index++)
{
char comparing_char = mystrings[string_index][char_index];
if (charSet.ContainsKey(comparing_char))
{
break;
}
else
{
if (string_index == mystrings.Length - 1)
{
Console.Out.WriteLine("Index is: " + char_index.ToString());
return;
}
else
{
charSet.Add(comparing_char, 1);
}
}
}
charSet.Clear();
}
Console.Out.WriteLine("Index is: " + minlength.ToString());

int i = 0;
while(true)
{
Set<Character> set = new HashSet<Character>();
for(int j = 0; j < strings.length; j++)
{
if(i >= strings[j].length()) return i;
char chr = strings[j].charAt(i);
if(set.contains(chr))
break;
else
set.add(chr);
}
if(set.size() == strings.length)
return i;
i++;
}
Gotta check preconditions first.
EDIT: Using a set now. Changed language.

Here's my solution in Python:
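from collections import defaultdict

def first_unique_index(words):
    # only scan up to the length of the shortest word
    for i in range(min(len(word) for word in words)):
        d = defaultdict(int)
        for word in words:
            d[word[i]] += 1
        if max(d.values()) == 1:
            return i

words = ["apple", "banana", "cucumber", "banking"]
print(first_unique_index(words))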
I didn't write in anything to handle the case where no minimum index is found by the time you reach the end of the shortest word, but I'm sure you get the idea.

Related

Best way to sort a list of lines for connectivity

I have a long list of lines in (possibly) random order. So basically:
struct Line
{
Vector StartPos;
Vector EndPos;
};
Now I'm looking for an efficient way to sort these lines so that they are sorted into spans. I.E. if line A's startpos matches Line B's endpos, it gets moved into the list immediately after line B. If nothing matches, it just goes to the end of the list to start a new span.
Right now I'm doing it brute force-- setting a flag variable if anything was changed, and if anything changed, sorting it again. This produces gigantically exponential iterations. Is there any faster way to optimize this so that I could conceivably keep the iterations down to listsize^listsize?
If you do not have lines that start or end at the same point maybe you can use dictionaries to reduce the look ups. Something like:
public class Line
{
public Point StartPos;
public Point EndPos;
public bool isUsed = false;
};
and then 1) create a dictionary with StartPos as the key and the index of the element in your list as the value, and 2) for each element of the list, follow the links using the dictionary: the next line in a span is the one whose StartPos equals the current line's EndPos. Something like:
List<List<Line>> result = new List<List<Line>>();
Dictionary<Point,int> dic= new Dictionary<Point,int>();
for (int kk = 0; kk < mylines.Count; kk++)
{
dic[mylines[kk].StartPos] = kk;
}
for (int kk = 0; kk < mylines.Count; kk++)
{
if (mylines[kk].isUsed == false)
{
var orderline= new List<Line>();
mylines[kk].isUsed = true;
orderline.Add(mylines[kk]);
int mm = kk;
int next;
// follow the chain until it ends or reaches a line that is already placed
while (dic.TryGetValue(mylines[mm].EndPos, out next) && !mylines[next].isUsed)
{
mm = next;
mylines[mm].isUsed = true;
orderline.Add(mylines[mm]);
}
result.Add(orderline);
}
}
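Here is a small Python sketch of the same dictionary idea, under the same assumption that no two lines share a start point and no two lines share an end point. Lines are plain `(start, end)` tuples of hashable points, and `chain_lines` is a hypothetical name. Unlike the C# above, it first starts from lines that nothing leads into, so each span comes out whole.

```python
def chain_lines(lines):
    """Group (start, end) pairs into spans where each end meets the next start."""
    by_start = {start: idx for idx, (start, _end) in enumerate(lines)}
    ends = {end for _start, end in lines}
    used = [False] * len(lines)
    spans = []

    def walk(idx):
        # Follow end -> start links until the chain stops or loops back.
        span = []
        while idx is not None and not used[idx]:
            used[idx] = True
            span.append(lines[idx])
            idx = by_start.get(lines[idx][1])
        return span

    # Start at lines whose start point is not the end of any other line.
    for idx, (start, _end) in enumerate(lines):
        if start not in ends:
            spans.append(walk(idx))
    # Anything still unused belongs to a closed loop.
    for idx in range(len(lines)):
        if not used[idx]:
            spans.append(walk(idx))
    return spans
```

Each line is visited once, so the whole pass is linear in the number of lines. Note that it relies on exact point equality; with floating-point coordinates you would probably want to round or snap the points first.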

Increment Key, Decrement Key, Find Max Key, Find Min key in O(1) time

I was asked this question in the interview but could not solve it. Design a data structure which does the following
Inc(Key) -> Takes a key and increment its value by 1. If the key comes first time then make its value as 1.
Dec(Key) -> Takes a key and decrement its value by 1. It is given that its value is minimum 1.
Findmaxkey() -> Returns the key which has the maximum value corresponding to it. If there are multiple such keys then you can output any of them.
Findminkey() -> Returns the key which has the minimum value corresponding to it. If there are multiple such keys then you can output any of them.
You have to do all the operations in O(1) time.
Hint: The interviewer was asking me to use a dictionary(hashmap) with a doubly-linked list.
The data structure could be constructed as follows:
Store all keys that have the same count in a HashSet keys, and accompany that set with the value for count: let's call this pair of count and keys a "bucket".
For each count value for which there is at least a key, you'd have such a bucket. Put the buckets in a doubly linked list bucketList, and keep them ordered by count.
Also create a HashMap bucketsByKey that maps a key to the bucket where that key is currently stored (the key is listed in the bucket's keys set)
The FindMinKey operation is then simple: get the first bucket from bucketList, grab a key from its keys set (no matter which), and return it. FindMaxKey is similar, using the last bucket.
The Inc(key) operation would perform the following steps:
1. Get the bucket corresponding to key from bucketsByKey.
2. If that bucket exists, delete the key from its keys set.
3. If that set happens to become empty, remove the bucket from bucketList.
4. If the next bucket in bucketList has a count that is one more, then add the key to its set, and update bucketsByKey so that it refers to this bucket for this key.
5. If the next bucket in bucketList has a different count (or there are no more buckets), then create a new bucket with the right count and key and insert it just before the earlier found bucket in bucketList -- or if no next bucket was found, just add the new one at the end.
6. If in step 2 there was no bucket found for this key, then assume its count was 0, and take the first bucket from bucketList and use it as the "next bucket" from step 4 onwards.
The process for Dec(key) is similar except that when the count is found to be already 1, nothing happens.
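The steps above can be sketched compactly in Python. This is an illustrative sketch, not a port of any particular implementation: the class and method names are my own, a count-0 sentinel closes the circular list, and Dec on a key with count 1 (or an unknown key) is treated as a no-op, as described above.

```python
class Bucket:
    def __init__(self, count):
        self.count = count
        self.keys = set()
        self.prev = self.next = self  # a lone node points at itself

    def insert_after(self, node):
        self.prev, self.next = node, node.next
        node.next.prev = self
        node.next = self

    def unlink(self):
        self.prev.next, self.next.prev = self.next, self.prev


class MinMaxCounter:
    def __init__(self):
        self.sentinel = Bucket(0)  # marks both ends of the sorted bucket list
        self.bucket_of = {}        # key -> bucket currently holding it

    def _move(self, key, delta):
        old = self.bucket_of.get(key, self.sentinel)
        count = old.count + delta
        if count < 1:
            return  # Dec at count 1 or on an unknown key does nothing
        target = old.next if delta == 1 else old.prev
        if target.count != count:
            # No bucket for this count yet: create it next to the old one.
            target = Bucket(count)
            target.insert_after(old if delta == 1 else old.prev)
        target.keys.add(key)
        self.bucket_of[key] = target
        if old is not self.sentinel:
            old.keys.discard(key)
            if not old.keys:
                old.unlink()

    def inc(self, key):
        self._move(key, 1)

    def dec(self, key):
        self._move(key, -1)

    def find_min_key(self):
        first = self.sentinel.next
        return None if first is self.sentinel else next(iter(first.keys))

    def find_max_key(self):
        last = self.sentinel.prev
        return None if last is self.sentinel else next(iter(last.keys))
```

Every operation touches only one dictionary entry, one or two sets and a constant number of list links, which is where the O(1) bound (expected, given hashing) comes from.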
Here is an interactive snippet in JavaScript. It uses the native Map for the HashMap, the native Set for the HashSet, and implements a doubly linked list as a circular one, where the start/end is marked by a "sentinel" node (without data).
You can press the Inc/Dec buttons for a key of your choice and monitor the output of FindMinKey and FindMaxKey, as well as a simple view on the data structure.
class Bucket {
constructor(count) {
this.keys = new Set; // keys in this hashset all have the same count:
this.count = count; // will never change. It's the unique key identifying this bucket
this.next = this; // next bucket in a doubly linked, circular list
this.prev = this; // previous bucket in the list
}
delete() { // detach this bucket from the list it is in
this.next.prev = this.prev;
this.prev.next = this.next;
this.next = this;
this.prev = this;
}
insertBefore(node) { // inject `this` into the list that `node` is in, right before it
this.next = node;
this.prev = node.prev;
this.prev.next = this;
this.next.prev = this;
}
* nextBuckets() { // iterate all following buckets until the "sentinel" bucket is encountered
for (let bucket = this.next; bucket.count; bucket = bucket.next) {
yield bucket;
}
}
}
class MinMaxMap {
constructor() {
this.bucketsByKey = new Map; // hashmap of key -> bucket
this.bucketList = new Bucket(0); // a sentinel node of a circular doubly linked list of buckets
}
inc(key) {
this.add(key, 1);
}
dec(key) {
this.add(key, -1);
}
add(key, one) {
let nextBucket, count = 1;
let bucket = this.bucketsByKey.get(key);
if (bucket === undefined) {
nextBucket = this.bucketList.next;
} else {
count = bucket.count + one;
if (count < 1) return;
bucket.keys.delete(key);
nextBucket = one === 1 ? bucket.next : bucket.prev;
if (bucket.keys.size === 0) bucket.delete(); // remove from its list
}
if (nextBucket.count !== count) {
bucket = new Bucket(count);
bucket.insertBefore(one === 1 ? nextBucket : nextBucket.next);
} else {
bucket = nextBucket;
}
bucket.keys.add(key);
this.bucketsByKey.set(key, bucket);
}
findMaxKey() {
if (this.bucketList.prev.count === 0) return null; // the list is empty
return this.bucketList.prev.keys.values().next().value; // get any key from last bucket
}
findMinKey() {
if (this.bucketList.next.count === 0) return null; // the list is empty
return this.bucketList.next.keys.values().next().value; // get any key from first bucket
}
toString() {
return JSON.stringify(Array.from(this.bucketList.nextBuckets(), ({count, keys}) => [count, ...keys]))
}
}
// I/O handling
let inpKey = document.querySelector("input");
let [btnInc, btnDec] = document.querySelectorAll("button");
let [outData, outMin, outMax] = document.querySelectorAll("span");
let minMaxMap = new MinMaxMap;
btnInc.addEventListener("click", function () {
minMaxMap.inc(inpKey.value);
refresh();
});
btnDec.addEventListener("click", function () {
minMaxMap.dec(inpKey.value);
refresh();
});
function refresh() {
outData.textContent = minMaxMap.toString();
outMin.textContent = minMaxMap.findMinKey();
outMax.textContent = minMaxMap.findMaxKey();
}
key: <input> <button>Inc</button> <button>Dec</button><br>
data structure (linked list): <span></span><br>
findMinKey = <span></span><br>
findMaxKey = <span></span>
Here is my answer, though I'm not sure that it satisfies every constraint your interviewer had in mind.
We keep a LinkedList, always sorted by value, where each element holds a key and its value together with pointers to the previous and next elements. For every key we store a pointer to its element in the LinkedList. Furthermore, for every new value that we see, we add two marker elements that denote the start and the end of the run of elements with that value, and we store pointers to them. Since we add at most two such extra elements per operation, each operation is still O(1).
For every operation (say, increment), we find the element corresponding to the key in the LinkedList using a dictionary (assuming dictionary lookups are O(1)). Next, we find the last element in the LinkedList that has the same value (using the end marker of that value and stepping one element backwards) and swap the two elements' pointers (a simple swap that does not affect other elements). Then we swap the element with its next one twice so that it falls into the segment of the next value (we may need to add markers for that value as well). The last thing to keep track of is the minimum and maximum, which have to be updated if the element being changed is the current minimum or maximum and no other element has the same value (that is, the start and end markers for that value are consecutive in the LinkedList).
Still, I think this approach can be improved.
The key observation is that the problem only asks to increment or decrement by 1. Therefore, the algorithm only needs to move a key to the adjacent block, forward or backward. That is a strong constraint and gives a lot of information.
My tested code:
template <typename K, uint32_t N>
struct DumbStructure {
private:
const int head_ = 0, tail_ = N - 1;
std::unordered_map<K, int> dic_;
int l_[N], r_[N], min_ = -1, max_ = -1;
std::unordered_set<K> keys_[N];
void NewKey(const K &key) {
if (min_ < 0) {
// nothing on the list
l_[1] = head_;
r_[1] = tail_;
r_[head_] = 1;
l_[tail_] = 1;
min_ = max_ = 1;
} else if (min_ == 1) {
} else {
// min_ > 1
l_[1] = head_;
r_[1] = min_;
r_[head_] = 1;
l_[min_] = 1;
min_ = 1;
}
keys_[1].insert(key);
}
void MoveKey(const K &key, int from_value, int to_value) {
int prev_from_value = l_[from_value];
int succ_from_value = r_[from_value];
if (keys_[from_value].size() >= 2) {
} else {
r_[prev_from_value] = succ_from_value;
l_[succ_from_value] = prev_from_value;
if (min_ == from_value) min_ = succ_from_value;
if (max_ == from_value) max_ = prev_from_value;
}
keys_[from_value].erase(key);
if (keys_[to_value].size() >= 1) {
} else {
if (to_value > from_value) {
// move forward
l_[to_value] =
keys_[from_value].size() > 0 ? from_value : prev_from_value;
r_[to_value] = succ_from_value;
r_[l_[to_value]] = to_value;
l_[r_[to_value]] = to_value;
} else {
// move backward
l_[to_value] = prev_from_value;
r_[to_value] =
keys_[from_value].size() > 0 ? from_value : succ_from_value;
r_[l_[to_value]] = to_value;
l_[r_[to_value]] = to_value;
}
}
keys_[to_value].insert(key);
min_ = std::min(min_, to_value);
max_ = std::max(max_, to_value);
}
public:
DumbStructure() {
l_[head_] = -1;
r_[head_] = tail_;
l_[tail_] = head_;
r_[tail_] = -1;
}
void Inc(const K &key) {
if (dic_.count(key) == 0) {
dic_[key] = 1;
NewKey(key);
} else {
MoveKey(key, dic_[key], dic_[key] + 1);
dic_[key] += 1;
}
}
void Dec(const K &key) {
if (dic_.count(key) == 0 || dic_[key] == 1) {
// invalid
return;
} else {
MoveKey(key, dic_[key], dic_[key] - 1);
dic_[key] -= 1;
}
}
K GetMaxKey() const { return *keys_[max_].begin(); }
K GetMinKey() const { return *keys_[min_].begin(); }
};

Find the index of a given permutation in the sorted list of the permutations of a given string

We're given a string and a permutation of the string.
For example, an input string sandeep and a permutation psdenae.
Find the position of the given permutation in the sorted list of the permutations of the original string.
The total number of permutation of a given string of length n would be n! (if all characters are different), thus it would not be possible to explore all the combinations.
This question is essentially the classic permutations and combinations (P & C) problem from mathematics:
Find the rank of the word "stack" when arranged in dictionary order.
Suppose the input string is NILSU.
Take the word whose rank we have to find, "SUNIL" for example.
Now arrange the letters of "SUNIL" in alphabetical order.
This gives "I L N S U".
Now take the first letter, "I". Is "I" the first letter of "SUNIL"? No.
The number of words that can be formed starting with "I" is 4!, so we
know that there are 4! words before "SUNIL".
I = 4! = 24
Now go to the second letter, "L". Is this the letter we want in the
first position? No. The number of words that can be formed starting
with "L" is again 4!.
L = 4! = 24
Now go to "N". Is this the letter we want? No. The number of words that
can be formed starting with "N" is once again 4!.
N = 4! = 24
Now go to "S". Is this the letter we want? Yes. Remove it from the
alphabetically ordered list, which becomes "I L N U".
Write down S and check the list again. Do we want SI? No. The number of
words that can be formed starting with SI is 3!.
[S]:I-> 3! = 6
Go to L. Do we want SL? No. So again 3!.
[S]:L-> 3! = 6
Go to N. Do we want SN? No.
[S]:N-> 3! = 6
Go to U. Do we want SU? Yes. Remove the letter U from the list, which
becomes "I L N". Now try I. Do we want SUI? No. The number of words
that start with SUI is 2!.
[SU]:I-> 2! = 2
Now go to L. Do we want SUL? No. The number of words starting with SUL
is 2!.
[SU]:L-> 2! = 2
Now go to N. Do we want SUN? Yes, so remove that letter, and the list
becomes "I L". Do we want SUNI? Yes. Remove that letter. The only
letter left is "L".
Now go to L. Do we want SUNIL? Yes. SUNIL is the first (and only)
option left, so we have 1!.
[SUN][I][L] = 1! = 1
Now add up all the numbers we got:
24 + 24 + 24 + 6 + 6 + 6 + 2 + 2 + 1 = 95.
So the word SUNIL is at the 95th position when all the words that can be formed from its letters are arranged in dictionary order.
This method solves the problem quite easily.
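The counting above can be sketched in Python as follows. The function name `permutation_rank` is my own, and the division by the factorials of the remaining letter counts is the usual adjustment for repeated letters; it is not part of the walkthrough above, which assumes all letters are distinct.

```python
from collections import Counter
from math import factorial

def permutation_rank(word):
    """1-based rank of `word` among the sorted distinct permutations of its letters."""
    counts = Counter(word)
    rank = 1
    remaining = len(word)
    for ch in word:
        remaining -= 1
        for smaller in sorted(counts):
            if smaller >= ch:
                break
            if counts[smaller] == 0:
                continue  # this letter has already been used up
            # Place `smaller` at this position and count the distinct
            # arrangements of the letters that would be left over.
            counts[smaller] -= 1
            arrangements = factorial(remaining)
            for c in counts.values():
                arrangements //= factorial(c)
            rank += arrangements
            counts[smaller] += 1
        counts[ch] -= 1
    return rank

print(permutation_rank("SUNIL"))  # 95
```

For a word of length n over an alphabet of k distinct letters this does on the order of n * k^2 small operations, instead of enumerating up to n! permutations.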
Building off @Algorithmist's answer, and his comment on his answer, and using the principle discussed in this post for when there are repeated letters, I made the following algorithm in JavaScript that works for all letter-based words, even with repeated letters.
function anagramPosition(string) {
var index = 1;
var remainingLetters = string.length - 1;
var frequencies = {};
var splitString = string.split("");
var sortedStringLetters = string.split("").sort();
sortedStringLetters.forEach(function(val, i) {
if (!frequencies[val]) {
frequencies[val] = 1;
} else {
frequencies[val]++;
}
})
function factorial(coefficient) {
var temp = coefficient;
var permutations = coefficient;
while (temp-- > 2) {
permutations *= temp;
}
return permutations;
}
function getSubPermutations(object, currentLetter) {
object[currentLetter]--;
var denominator = 1;
for (var key in object) {
var subPermutations = factorial(object[key]);
subPermutations !== 0 ? denominator *= subPermutations : null;
}
object[currentLetter]++;
return denominator;
}
var splitStringIndex = 0;
while (sortedStringLetters.length) {
for (var i = 0; i < sortedStringLetters.length; i++) {
if (sortedStringLetters[i] !== splitString[splitStringIndex]) {
if (sortedStringLetters[i] !== sortedStringLetters[i+1]) {
var permutations = factorial(remainingLetters);
index += permutations / getSubPermutations(frequencies, sortedStringLetters[i]);
} else {
continue;
}
} else {
splitStringIndex++;
frequencies[sortedStringLetters[i]]--;
sortedStringLetters.splice(i, 1);
remainingLetters--;
break;
}
}
}
return index;
}
anagramPosition("ARCTIC") // => 42
I didn't comment the code but I did try to make the variable names as explanatory as possible. If you run it through a debugger process using your dev tools console and throw in a few console.logs you should be able to see how it uses the formula in the above-linked S.O. post.
I tried to implement this in JS. It works for strings that have no repeated letters, but I get a wrong count otherwise. Here is my code:
function x(str) {
var sOrdinata = str.split('').sort()
console.log('sOrdinata = '+ sOrdinata)
var str = str.split('')
console.log('str = '+str)
console.log('\n')
var pos = 1;
for(var j in str){
//console.log(j)
for(var i in sOrdinata){
if(sOrdinata[i]==str[j]){
console.log('found, position: '+ i)
sOrdinata.splice(i,1)
console.log('Nuovo sOrdinata = '+sOrdinata)
console.log('\n')
break;
}
else{
//calculate number of permutations
console.log('valore di j: '+j)
//console.log('lunghezza stringa da permutare: '+str.slice(~~j+1).length);
if(str.slice(j).length >1 ){sub = str.slice(~~j+1)}else {sub = str.slice(j)}
console.log('substring to be used for permutation: '+ sub)
prep = nrepC(sub.join(''))
console.log('prep = '+prep)
num = factorial(sub.length)
console.log('num = '+num)
den = denom(prep)
console.log('den = '+ den)
pos += num/den
console.log(num/den)
console.log('\n')
}
}
}
console.log(pos)
return pos
}
/* ------------ functions used by main --------------- */
function nrepC(str){
var obj={}
var repeats=[]
var res= [];
for(x = 0, length = str.length; x < length; x++) {
var l = str.charAt(x)
obj[l] = (isNaN(obj[l]) ? 1 : obj[l] + 1);
}
//console.log(obj)
for (var i in obj){
if(obj[i]>1) res.push(obj[i])
}
if(res.length==0){res.push(1); return res}
else return res
}
function num(vect){
var res = 1
}
function denom(vect){
var res = 1
for(var i in vect){
res*= factorial(vect[i])
}
return res
}
function factorial (n){
if (n==0 || n==1){
return 1;
}
return factorial(n-1)*n;
}
A bit too late but just as reference... You can use this C# code directly.
It will work but...
The only important thing is that, usually, you should have unique values as your starting set. Otherwise you don't have n! permutations; you have fewer than n!. I doubt there is much practical use for it when items can be duplicated.
using System;
using System.Collections.Generic;
namespace WpfPermutations
{
public class PermutationOuelletLexico3<T>
{
// ************************************************************************
private T[] _sortedValues;
private bool[] _valueUsed;
public readonly long MaxIndex; // long to support 20! or less
// ************************************************************************
public PermutationOuelletLexico3(T[] sortedValues)
{
if (sortedValues.Length <= 0)
{
throw new ArgumentException("sortedValues.Length should be greater than 0");
}
_sortedValues = sortedValues;
Result = new T[_sortedValues.Length];
_valueUsed = new bool[_sortedValues.Length];
MaxIndex = Factorial.GetFactorial(_sortedValues.Length);
}
// ************************************************************************
public T[] Result { get; private set; }
// ************************************************************************
/// <summary>
/// Return the permutation relative to the index received, according to
/// _sortedValues.
/// Sort Index is 0 based and should be less than MaxIndex. Otherwise you get an exception.
/// </summary>
/// <param name="sortIndex"></param>
/// <returns>The result is written in property: Result</returns>
public void GetValuesForIndex(long sortIndex)
{
int size = _sortedValues.Length;
if (sortIndex < 0)
{
throw new ArgumentException("sortIndex should be greater or equal to 0.");
}
if (sortIndex >= MaxIndex)
{
throw new ArgumentException("sortIndex should be less than factorial(the length of items)");
}
for (int n = 0; n < _valueUsed.Length; n++)
{
_valueUsed[n] = false;
}
long factorielLower = MaxIndex;
for (int index = 0; index < size; index++)
{
long factorielBigger = factorielLower;
factorielLower = Factorial.GetFactorial(size - index - 1); // factorielBigger / inverseIndex;
int resultItemIndex = (int)(sortIndex % factorielBigger / factorielLower);
int correctedResultItemIndex = 0;
for(;;)
{
if (! _valueUsed[correctedResultItemIndex])
{
resultItemIndex--;
if (resultItemIndex < 0)
{
break;
}
}
correctedResultItemIndex++;
}
Result[index] = _sortedValues[correctedResultItemIndex];
_valueUsed[correctedResultItemIndex] = true;
}
}
// ************************************************************************
/// <summary>
/// Calc the index, relative to _sortedValues, of the permutation received
/// as argument. Returned index is 0 based.
/// </summary>
/// <param name="values"></param>
/// <returns></returns>
public long GetIndexOfValues(T[] values)
{
int size = _sortedValues.Length;
long valuesIndex = 0;
List<T> valuesLeft = new List<T>(_sortedValues);
for (int index = 0; index < size; index++)
{
long indexFactorial = Factorial.GetFactorial(size - 1 - index);
T value = values[index];
int indexCorrected = valuesLeft.IndexOf(value);
valuesIndex = valuesIndex + (indexCorrected * indexFactorial);
valuesLeft.Remove(value);
}
return valuesIndex;
}
// ************************************************************************
}
}
My approach to the problem is to sort the given permutation.
The number of character swaps performed will give us the position of the permutation in the sorted list of permutations.
An inefficient solution would be to successively find the previous permutations until you reach a string that cannot be permuted anymore. The number of permutations it takes to reach this state is the position of the original string.
However, if you use combinatorics you can achieve the solution faster. The previous solution will produce a very slow output if string length exceeds 12.

Google search results: How to find the minimum window that contains all the search keywords?

What is the complexity of the algorithm that is used to find the smallest snippet that contains all the search keywords?
As stated, the problem is solved by a rather simple algorithm:
Just look through the input text sequentially from the very beginning and check each word: whether it is in the search key or not. If the word is in the key, add it to the end of the structure that we will call The Current Block. The Current Block is just a linear sequence of words, each word accompanied by a position at which it was found in the text. The Current Block must maintain the following Property: the very first word in The Current Block must be present in The Current Block once and only once. If you add the new word to the end of The Current Block, and the above property becomes violated, you have to remove the very first word from the block. This process is called normalization of The Current Block. Normalization is a potentially iterative process, since once you remove the very first word from the block, the new first word might also violate The Property, so you'll have to remove it as well. And so on.
So, basically The Current Block is a FIFO sequence: the new words arrive at the right end, and get removed by normalization process from the left end.
All you have to do to solve the problem is look through the text, maintain The Current Block, normalizing it when necessary so that it satisfies The Property. The shortest block with all the keywords in it you ever build is the answer to the problem.
For example, consider the text
CxxxAxxxBxxAxxCxBAxxxC
with keywords A, B and C. Looking through the text you'll build the following sequence of blocks
C
CA
CAB - all words, length 9 (CxxxAxxxB...)
CABA - all words, length 12 (CxxxAxxxBxxA...)
CABAC - violates The Property, remove first C
ABAC - violates The Property, remove first A
BAC - all words, length 7 (...BxxAxxC...)
BACB - violates The Property, remove first B
ACB - all words, length 6 (...AxxCxB...)
ACBA - violates The Property, remove first A
CBA - all words, length 4 (...CxBA...)
CBAC - violates The Property, remove first C
BAC - all words, length 6 (...BAxxxC)
The best block we built has length 4, which is the answer in this case
CxxxAxxxBxxAxx CxBA xxxC
The exact complexity of this algorithm depends on the input, since it dictates how many iterations the normalization process will make, but ignoring the normalization the complexity would trivially be O(N * log M), where N is the number of words in the text and M is the number of keywords, and O(log M) is the complexity of checking whether the current word belongs to the keyword set.
Now, having said that, I have to admit that I suspect that this might not be what you need. Since you mentioned Google in the caption, it might be that the statement of the problem you gave in your post is not complete. Maybe in your case the text is indexed? (With indexing the above algorithm is still applicable, just becomes more efficient). Maybe there's some tricky database that describes the text and allows for a more efficient solution (like without looking through the entire text)? I can only guess and you are not saying...
I think the solution proposed by AndreyT assumes that no duplicates exist in the keywords/search terms. Also, the current block can get as big as the text itself if the text contains a lot of duplicate keywords.
For example:
Text: 'ABBBBBBBBBB'
Keyword text: 'AB'
Current Block: 'ABBBBBBBBBB'
Anyway, I have implemented it in C# and did some basic testing. It would be nice to get some feedback on whether it works or not :)
static string FindMinWindow(string text, string searchTerms)
{
Dictionary<char, bool> searchIndex = new Dictionary<char, bool>();
foreach (var item in searchTerms)
{
searchIndex.Add(item, false);
}
Queue<Tuple<char, int>> currentBlock = new Queue<Tuple<char, int>>();
int noOfMatches = 0;
int minLength = Int32.MaxValue;
int startIndex = 0;
for(int i = 0; i < text.Length; i++)
{
char item = text[i];
if (searchIndex.ContainsKey(item))
{
if (!searchIndex[item])
{
noOfMatches++;
}
searchIndex[item] = true;
var newEntry = new Tuple<char, int> ( item, i );
currentBlock.Enqueue(newEntry);
// Normalization step.
while (currentBlock.Count(o => o.Item1.Equals(currentBlock.First().Item1)) > 1)
{
currentBlock.Dequeue();
}
// Figuring out minimum length.
if (noOfMatches == searchTerms.Length)
{
var length = currentBlock.Last().Item2 - currentBlock.First().Item2 + 1;
if (length < minLength)
{
startIndex = currentBlock.First().Item2;
minLength = length;
}
}
}
}
return noOfMatches == searchTerms.Length ? text.Substring(startIndex, minLength) : String.Empty;
}
This is an interesting question.
To restate it more formally:
Given a list L (the web page) of length n and a set S (the query) of size k, find the smallest sublist of L that contains all the elements of S.
I'll start with a brute-force solution in hopes of inspiring others to beat it.
Note that set membership can be done in constant time, after one pass through the set. See this question.
Also note that this assumes all the elements of S are in fact in L, otherwise it will just return the sublist from 1 to n.
best = (1,n)
For i from 1 to n-k+1:
  Create/reset a hash found[] mapping each element of S to False, and set counter = 0.
  For j from i to n or until counter == k:
    If L[j] is in S and not found[L[j]] then counter++ and let found[L[j]] = True.
  If counter == k and j-i < best[2]-best[1] then let best = (i,j).
Time complexity is O((n+k)(n-k)), i.e., n^2-ish.
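For comparison, here is a minimal Python sketch of that brute-force idea. The function name is made up, and unlike the pseudocode it uses 0-based indices and returns None (rather than the whole list) when some element of S is missing from L.

```python
def smallest_sublist_bruteforce(L, S):
    """Try every start index i and extend j until all of S has been seen."""
    S = set(S)
    best = None
    for i in range(len(L)):
        found = set()
        for j in range(i, len(L)):
            if L[j] in S:
                found.add(L[j])
            if len(found) == len(S):
                # (i, j) covers S; keep it if it is shorter than the best so far.
                if best is None or j - i < best[1] - best[0]:
                    best = (i, j)
                break
    return best

print(smallest_sublist_bruteforce("CxxxAxxxBxxAxxCxBAxxxC", "ABC"))
```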
Here's a solution using Java 8.
static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
HashSet<String> words = new HashSet<>();
query.stream()
.forEach(words::add);
AtomicInteger idx = new AtomicInteger();
IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
AtomicInteger size = new AtomicInteger();
document.stream()
.map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
.filter(pair -> words.contains(pair.word)) // Queue.contains is O(n) so we trade space for efficiency
.forEach(pair -> {
// only the first and last elements are useful to the algorithm, so we don't bother removing
// an element from any other index. note that removing an element using equality
// from an ArrayDeque is O(n)
KeywordIndexPair first = queue.peek();
if (pair.equals(first)) {
queue.remove();
}
queue.add(pair);
first = queue.peek();
int diff = pair.index - first.index;
if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
interval.begin = first.index;
interval.end = pair.index;
size.set(0);
}
});
return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are 2 static nested classes KeywordIndexPair and IndexPair, the implementation of which should be apparent from the names. Using a smarter programming language that supports tuples those classes wouldn't be necessary.
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
For each word, maintain its min and max index in case it occurs more than once; if it occurs only once, the min and max index will be the same.
import edu.princeton.cs.algs4.ST;
public class DicMN {
ST<String, Words> st = new ST<>();
public class Words {
int min;
int max;
public Words(int index) {
min = index;
max = index;
}
}
public int findMinInterval(String[] sw) {
int begin = Integer.MAX_VALUE;
int end = Integer.MIN_VALUE;
for (int i = 0; i < sw.length; i++) {
if (st.contains(sw[i])) {
Words w = st.get(sw[i]);
begin = Math.min(begin, w.min);
end = Math.max(end, w.max);
}
}
if (begin != Integer.MAX_VALUE) {
return (end - begin) + 1;
}
return 0;
}
public void put(String[] dw) {
for (int i = 0; i < dw.length; i++) {
if (!st.contains(dw[i])) {
st.put(dw[i], new Words(i));
}
else {
Words w = st.get(dw[i]);
w.min = Math.min(w.min, i);
w.max = Math.max(w.max, i);
}
}
}
public static void main(String[] args) {
// TODO Auto-generated method stub
DicMN dic = new DicMN();
String[] arr1 = { "one", "two", "three", "four", "five", "six", "seven", "eight" };
dic.put(arr1);
String[] arr2 = { "two", "five" };
System.out.print("Interval:" + dic.findMinInterval(arr2));
}
}

Find the first un-repeated character in a string

What is the quickest way to find the first character which only appears once in a string?
It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.
So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact the only values that matter for the count is "0", "1" or "more than 1").
When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.
Example code in Python:
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []  # characters in order of first appearance
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)
    for c in l:
        if counts[c] == 1:
            return c
    return None
This runs in O(n).
I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.
An idiomatic solution in Ruby
We can find the first un-repeated character in a string like so:
def first_unrepeated_char string
string.each_char.tally.find { |_, n| n == 1 }.first
end
How does Ruby accomplish this?
Reading Ruby's source
Let's break down the solution and consider what algorithms Ruby uses for each step.
First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.
The each_char method is implemented like so:
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
In turn, rb_str_enumerate_chars is implemented as:
rb_str_enumerate_chars(VALUE str, VALUE ary)
{
VALUE orig = str;
long i, len, n;
const char *ptr;
rb_encoding *enc;
str = rb_str_new_frozen(str);
ptr = RSTRING_PTR(str);
len = RSTRING_LEN(str);
enc = rb_enc_get(str);
if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
for (i = 0; i < len; i += n) {
n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
else {
for (i = 0; i < len; i += n) {
n = rb_enc_mbclen(ptr + i, ptr + len, enc);
ENUM_ELEM(ary, rb_str_subseq(str, i, n));
}
}
RB_GC_GUARD(str);
if (ary)
return ary;
else
return orig;
}
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length (in bytes) of the next character in the string so that it can iterate the next step. By lazily iterating over a string, reading just one character at a time, we end up doing just one full pass over the input string as tally consumes the iterator.
Tally is then implemented like so:
static void
tally_up(VALUE hash, VALUE group)
{
VALUE tally = rb_hash_aref(hash, group);
if (NIL_P(tally)) {
tally = INT2FIX(1);
}
else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
tally += INT2FIX(1) & ~FIXNUM_FLAG;
}
else {
tally = rb_big_plus(tally, INT2FIX(1));
}
rb_hash_aset(hash, group, tally);
}
static VALUE
tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
{
ENUM_WANT_SVALUE();
tally_up(hash, i);
return Qnil;
}
Here, tally_i is the block callback (its parameters are declared with the RB_BLOCK_CALL_FUNC_ARGLIST macro). It is invoked once per element and calls tally_up, which updates the tally hash on every iteration.
Rough time & memory analysis
The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it which in the worst case scenario can take up as much memory as the input string times some constant factor.
Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character will scan the hash again; each of these scans has O(n) worst-case complexity.
However, tally also updates a hash on every iteration. Updating the hash on every character can be as slow as O(n) again, so the worst case complexity of this Ruby solution is perhaps O(n^2).
However, under reasonable assumptions, updating a hash has an O(1) complexity, so we can expect the average case amortized to look like O(n).
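The same tally-then-find shape can be sketched in Python for comparison. This is an illustration rather than a translation of Ruby's internals: it assumes Python 3.7+, where dictionaries (and therefore Counter) keep keys in first-insertion order, and the function name is made up.

```python
from collections import Counter

def first_unrepeated_char(s):
    # One pass to tally every character (expected O(1) per hash update).
    counts = Counter(s)
    # One pass over the tally, in first-seen order, to find a count of 1.
    return next((ch for ch, n in counts.items() if n == 1), None)

print(first_unrepeated_char("aaaebbbcddd"))
```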
My old accepted answer in Python
You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:
def first_non_repeated_character(string):
    chars = []
    repeated = []
    for character in string:
        if character in chars:
            chars.remove(character)
            repeated.append(character)
        else:
            if not character in repeated:
                chars.append(character)
    if len(chars):
        return chars[0]
    else:
        return False
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.
Why not use a heap-based data structure such as a minimum priority queue? As you read each character from the string, add it to the queue with a priority based on its location in the string and its number of occurrences so far. You could modify the queue to add priorities on collision, so that the priority of a character is its number of appearances. At the end of the loop, the first element in the queue will be the least frequent character in the string, and if there are multiple characters with a count == 1, the first element will be the first unique character added to the queue.
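One possible reading of this suggestion, sketched in Python: count the characters first, then build a min-heap keyed by (count, position of first appearance). This is a simplification that assumes a plain `heapq` heap built once at the end, rather than a queue whose priorities are updated on every collision. The function name is made up.

```python
import heapq

def first_unique_via_heap(s):
    info = {}  # char -> [count, index of first appearance]
    for i, c in enumerate(s):
        if c in info:
            info[c][0] += 1
        else:
            info[c] = [1, i]
    # The smallest tuple has the lowest count, ties broken by earliest position.
    heap = [(count, first, c) for c, (count, first) in info.items()]
    heapq.heapify(heap)
    if heap and heap[0][0] == 1:
        return heap[0][2]
    return None

print(first_unique_via_heap("aaaebbbcddd"))
```

Since only the minimum is ever needed, a plain `min()` over the same tuples would do the same job. The heap is kept here only to mirror the idea in the answer.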
Here is another fun way to do it. Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
Lots of answers are attempting O(n) but are forgetting the actual costs of inserting and removing from the lists/associative arrays/sets they're using to track.
If you can assume that a char is a single byte, then you use a simple array indexed by the char and keep a count in it. This is truly O(n) because the array accesses are guaranteed O(1), and the final pass over the array to find the first element with 1 is constant time (because the array has a small, fixed size).
If you can't assume that a char is a single byte, then I would propose sorting the string and then doing a single pass checking adjacent values. This would be O(n log n) for the sort plus O(n) for the final pass. So it's effectively O(n log n), which is better than O(n^2). Also, it has virtually no space overhead, which is another problem with many of the answers that are attempting O(n).
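Both variants can be sketched in Python as follows. The names are made up. The first function makes its second pass over the input (rather than over the count array) so that the result is the first unique byte in string order. The second sorts indices instead of sorting the string in place, so unlike the in-place idea described above it uses O(n) extra space.

```python
def first_unique_byte(data: bytes):
    """Single-byte characters: a fixed 256-slot count array."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    for b in data:
        if counts[b] == 1:
            return b  # an int in range(256)
    return None

def first_unique_sorted(s):
    """General characters: sort positions by character, compare neighbours."""
    order = sorted(range(len(s)), key=s.__getitem__)
    best = None
    for k, i in enumerate(order):
        prev_same = k > 0 and s[order[k - 1]] == s[i]
        next_same = k + 1 < len(order) and s[order[k + 1]] == s[i]
        # A character with no equal neighbour occurs once; keep the earliest.
        if not prev_same and not next_same and (best is None or i < best):
            best = i
    return None if best is None else s[best]

print(chr(first_unique_byte(b"aaabbbcddd")), first_unique_sorted("aaaebbbcddd"))
```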
Counter requires Python2.7 or Python3.1
>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
Refactoring a solution proposed earlier (not having to use extra list/memory). This goes over the string twice. So this takes O(n) too like the original solution.
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    for c in s:
        counts[c] += 1
    for c in s:
        if counts[c] == 1:
            return c
    return None
The following is a Ruby implementation of finding the first nonrepeated character of a string:
def first_non_repeated_character(string)
string1 = string.split('')
string2 = string.split('')
string1.each do |let1|
counter = 0
string2.each do |let2|
if let1 == let2
counter+=1
end
end
if counter == 1
return let1
break
end
end
end
p first_non_repeated_character('dont doddle in the forest')
And here is a JavaScript implementation of the same style function:
var first_non_repeated_character = function (string) {
var string1 = string.split('');
var string2 = string.split('');
var single_letters = [];
for (var i = 0; i < string1.length; i++) {
var count = 0;
for (var x = 0; x < string2.length; x++) {
if (string1[i] == string2[x]) {
count++
}
}
if (count == 1) {
return string1[i];
}
}
}
console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));
In both cases I used a counter, knowing that if the letter is not matched anywhere else in the string, it occurs in the string only once, so I just count its occurrences.
I think this should do it in C. This operates in O(n) time with no ambiguity about order of insertion and deletion operators. This is a counting sort (simplest form of a bucket sort, which itself is the simple form of a radix sort).
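#include <string.h>

unsigned char find_first_unique(unsigned char *string)
{
    int chars[256];
    int i;
    memset(chars, 0, sizeof(chars));
    /* First pass: count occurrences of every byte value. */
    for (i = 0; string[i]; i++)
    {
        chars[string[i]]++;
    }
    /* Second pass: return the first byte whose count is exactly one. */
    for (i = 0; string[i]; i++)
    {
        if (chars[string[i]] == 1) return string[i];
    }
    return 0;
}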
In Ruby:
(Original Credit: Andrew A. Smith)
x = "a huge string in which some characters repeat"
def first_unique_character(s)
s.each_char.detect { |c| s.count(c) == 1 }
end
first_unique_character(x)
=> "u"
def first_non_repeated_character(string):
    chars = []
    repeated = []
    for character in string:
        if character in repeated:
            continue  # already known to repeat: discard it
        elif character in chars:
            chars.remove(character)
            repeated.append(character)
        else:
            chars.append(character)
    if len(chars):
        return chars[0]
    else:
        return False
The other JavaScript solutions here are quite C-style; here is a more JavaScript-style solution.
function firstNonRepeatedCharacter(string) { // header was missing from the original post; the name is a placeholder
    var arr = string.split("");
    var occurences = {};
    var lowestindex = string.length + 1;
    // Build a string of repeated copies per character; its length is the count.
    arr.forEach(function (c) {
        if (typeof occurences[c] == "undefined")
            occurences[c] = c;
        else
            occurences[c] += c;
    });
    for (var p in occurences) {
        if (occurences[p].length == 1)
            lowestindex = Math.min(lowestindex, string.indexOf(p));
    }
    if (lowestindex > string.length)
        return null;
    return string[lowestindex];
}
In C, this is almost Shlemiel the Painter's Algorithm (nowhere near O(n!), but O(n^2) in the worst case).
But it will outperform "better" algorithms for reasonably sized strings because the constant factor is so small. It can also easily tell you the location of the first non-repeating character.
char FirstNonRepeatedChar(char * psz)
{
for (int ii = 0; psz[ii] != 0; ++ii)
{
for (int jj = ii+1; ; ++jj)
{
// if we hit the end of string, then we found a non-repeat character.
//
if (psz[jj] == 0)
return psz[ii]; // this character doesn't repeat
// if we found a repeat character, we can stop looking.
//
if (psz[ii] == psz[jj])
break;
}
}
return 0; // there were no non-repeating characters.
}
edit: this code is assuming you don't mean consecutive repeating characters.
Here's an implementation in Perl (version >=5.10) that doesn't care whether the repeated characters are consecutive or not:
use strict;
use warnings;
foreach my $word(@ARGV)
{
my @distinct_chars;
my %char_counts;
my @chars=split(//,$word);
foreach (@chars)
{
push @distinct_chars,$_ unless $_~~@distinct_chars;
$char_counts{$_}++;
}
my $first_non_repeated="";
foreach(@distinct_chars)
{
if($char_counts{$_}==1)
{
$first_non_repeated=$_;
last;
}
}
if(length($first_non_repeated))
{
print "For \"$word\", the first non-repeated character is '$first_non_repeated'.\n";
}
else
{
print "All characters in \"$word\" are repeated.\n";
}
}
Storing this code in a script (which I named non_repeated.pl) and running it on a few inputs produces:
jmaney> perl non_repeated.pl aabccd "a huge string in which some characters repeat" abcabc
For "aabccd", the first non-repeated character is 'b'.
For "a huge string in which some characters repeat", the first non-repeated character is 'u'.
All characters in "abcabc" are repeated.
Here's a possible solution in ruby without using Array#detect (as in this answer). Using Array#detect makes it too easy, I think.
ALPHABET = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
def fnr(s)
unseen_chars = ALPHABET.dup
seen_once_chars = []
s.each_char do |c|
if unseen_chars.include?(c)
unseen_chars.delete(c)
seen_once_chars << c
elsif seen_once_chars.include?(c)
seen_once_chars.delete(c)
end
end
seen_once_chars.first
end
Seems to work for some simple examples:
fnr "abcdabcegghh"
# => "d"
fnr "abababababababaqababa"
# => "q"
Suggestions and corrections are very much appreciated!
Try this code:
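public static String findFirstUnique(String str)
{
    String unique = "";
    String repeated = ""; // characters already seen more than once
    foreach (char ch in str)
    {
        if (repeated.Contains(ch)) continue; // a third occurrence must not re-add it
        if (unique.Contains(ch))
        {
            unique = unique.Replace(ch.ToString(), "");
            repeated += ch;
        }
        else unique += ch;
    }
    // Return an empty string when every character repeats.
    return unique.Length > 0 ? unique[0].ToString() : "";
}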
In Mathematica one might write this:
string = "conservationist deliberately treasures analytical";
Cases[Gather @ Characters @ string, {_}, 1, 1][[1]]
{"v"}
Here is a snippet in JavaScript:
var string = "tooth";
var hash = {};
for (var i = 0; i < string.length; i++) {
    if (hash[string[i]] !== undefined) {
        hash[string[i]] = hash[string[i]] + 1;
    } else {
        hash[string[i]] = 1;
    }
}
for (i = 0; i < string.length; i++) {
    if (hash[string[i]] === 1) {
        console.info(string[i]);
        break; // stop at the first character that occurs once
    }
}
// prints "h"
Different approach here.
Scan each element in the string and build a count array which stores the repetition count of each element.
Then scan the string again from the first element and print the first element whose count is 1.
C code
-----
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char t_c;
char *t_p = argv[1] ;
char count[128]={'\0'};
char ch;
for(t_c = *(argv[1]); t_c != '\0'; t_c = *(++t_p))
count[t_c]++;
t_p = argv[1];
for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
{
if(count[t_c] == 1)
{
printf("Element is %c\n",t_c);
break;
}
}
return 0;
}
input is = aabbcddeef output is = c
char FindUniqueChar(char *a)
{
int i=0;
bool repeat=false;
while(a[i] != '\0')
{
if (a[i] == a[i+1])
{
repeat = true;
}
else
{
if(!repeat)
{
cout<<a[i];
return a[i];
}
repeat=false;
}
i++;
}
return a[i];
}
Here is another approach: we could have an array which stores the count and the index of the first occurrence of each character. After filling up the array, we could just traverse it, find the MINIMUM index whose count is 1, and return str[index].
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <climits>
using namespace std;
#define No_of_chars 256
//store the count and the index where the char first appear
typedef struct countarray
{
int count;
int index;
}countarray;
//returns the count array
countarray *getcountarray(char *str)
{
countarray *count;
count=new countarray[No_of_chars];
for(int i=0;i<No_of_chars;i++)
{
count[i].count=0;
count[i].index=-1;
}
for(int i=0;*(str+i);i++)
{
(count[*(str+i)].count)++;
if(count[*(str+i)].count==1) //if count==1 then update the index
count[*(str+i)].index=i;
}
return count;
}
char firstnonrepeatingchar(char *str)
{
countarray *array;
array = getcountarray(str);
int result = INT_MAX;
for(int i=0;i<No_of_chars;i++)
{
if(array[i].count==1 && result > array[i].index)
result = array[i].index;
}
delete[] (array);
return (str[result]);
}
int main()
{
char str[] = "geeksforgeeks";
cout<<"First non repeating character is "<<firstnonrepeatingchar(str)<<endl;
return 0;
}
Function:
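This C# function uses a hash table (Dictionary) and has O(2n), i.e. O(n), worst-case performance. Note that it relies on the Dictionary enumerating its entries in insertion order, which .NET does not guarantee.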
private static string FirstNoRepeatingCharacter(string aword)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < aword.Length; i++)
{
if (!dic.ContainsKey(aword.Substring(i, 1)))
dic.Add(aword.Substring(i, 1), 1);
else
dic[aword.Substring(i, 1)]++;
}
foreach (var item in dic)
{
if (item.Value == 1) return item.Key;
}
return string.Empty;
}
Example:
string aword = "TEETER";
Console.WriteLine(FirstNoRepeatingCharacter(aword)); //print: R
I have two strings i.e. 'unique' and 'repeated'. Every character appearing for the first time, gets added to 'unique'. If it is repeated for the second time, it gets removed from 'unique' and added to 'repeated'. This way, we will always have a string of unique characters in 'unique'.
Complexity: O(n), treating the alphabet size as a constant.
public void firstUniqueChar(String str){
String unique= "";
String repeated = "";
str = str.toLowerCase();
for(int i=0; i<str.length();i++){
char ch = str.charAt(i);
if(!(repeated.contains(str.subSequence(i, i+1))))
if(unique.contains(str.subSequence(i, i+1))){
unique = unique.replaceAll(Character.toString(ch), "");
repeated = repeated+ch;
}
else
unique = unique+ch;
}
System.out.println(unique.charAt(0));
}
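The same unique/repeated bookkeeping can be sketched in Python with hash-based containers, so that each membership test and removal is expected O(1) instead of a scan of a string. This is a sketch under assumptions: it relies on Python 3.7+ dictionaries keeping insertion order, the function name is made up, and unlike the Java version it does not lowercase the input.

```python
def first_unique_char(s):
    unique = {}       # insertion-ordered: characters seen exactly once so far
    repeated = set()  # characters seen two or more times
    for ch in s:
        if ch in repeated:
            continue
        if ch in unique:
            # Second sighting: move it from 'unique' to 'repeated'.
            del unique[ch]
            repeated.add(ch)
        else:
            unique[ch] = None
    # The first remaining key is the first character that never repeated.
    return next(iter(unique), None)

print(first_unique_char("TEETER"))
```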
The following code is in C# with O(n) complexity.
using System;
using System.Linq;
using System.Text;
namespace SomethingDigital
{
class FirstNonRepeatingChar
{
public static void Main()
{
String input = "geeksforgeeksandgeeksquizfor";
char[] str = input.ToCharArray();
bool[] b = new bool[256];
String unique1 = "";
String unique2 = "";
foreach (char ch in str)
{
if (!unique1.Contains(ch))
{
unique1 = unique1 + ch;
unique2 = unique2 + ch;
}
else
{
unique2 = unique2.Replace(ch.ToString(), "");
}
}
if (unique2 != "")
{
Console.WriteLine(unique2[0].ToString());
Console.ReadLine();
}
else
{
Console.WriteLine("No non repeated string");
Console.ReadLine();
}
}
}
}
The following solution is an elegant way to find the first unique character within a string using the new features introduced as part of Java 8. It first creates a map counting the number of occurrences of each character, then uses this map to find the first character which occurs only once. This runs in O(N) time.
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Runs in O(N) time and uses lambdas and the stream API from Java 8
// Also, it is only three lines of code!
private static String findFirstUniqueCharacterPerformantWithLambda(String inputString) {
// convert the input string into a list of characters
final List<String> inputCharacters = Arrays.asList(inputString.split(""));
// first, construct a map to count the number of occurrences of each character
final Map<Object, Long> characterCounts = inputCharacters
.stream()
.collect(groupingBy(s -> s, counting()));
// then, find the first unique character by consulting the count map
return inputCharacters
.stream()
.filter(s -> characterCounts.get(s) == 1)
.findFirst()
.orElse(null);
}
Here is one more solution with O(n) time complexity.
public void findUnique(String string) {
ArrayList<Character> uniqueList = new ArrayList<>();
int[] chatArr = new int[128];
for (int i = 0; i < string.length(); i++) {
Character ch = string.charAt(i);
if (chatArr[ch] != -1) {
chatArr[ch] = -1;
uniqueList.add(ch);
} else {
uniqueList.remove(ch);
}
}
if (uniqueList.size() == 0) {
System.out.println("No unique character found!");
} else {
System.out.println("First unique character is :" + uniqueList.get(0));
}
}
I read through the answers but did not see any like mine. I think this answer is very simple and fast; am I wrong?
def first_unique(s):
    repeated = []
    while s:
        if s[0] not in s[1:] and s[0] not in repeated:
            return s[0]
        else:
            repeated.append(s[0])
            s = s[1:]
    return None
test
(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')
Question : First Unique Character of a String
This is the simplest solution.
public class Test4 {
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
firstUniqCharindex(a);
}
public static void firstUniqCharindex(String a) {
int[] count = new int[256];
for (int i = 0; i < a.length(); i++) {
count[a.charAt(i)]++;
}
int index = -1;
for (int i = 0; i < a.length(); i++) {
if (count[a.charAt(i)] == 1) {
index = i;
break;
} // if
}
System.out.println(index);// output => 8
System.out.println(a.charAt(index)); //output => P
}// end1
}
In Python:
def firstUniqChar(a):
    count = [0] * 256
    for i in a:
        count[ord(i)] += 1
    element = ""
    for items in a:
        if count[ord(items)] == 1:
            element = items
            break
    return element

a = "GiniGinaProtijayi"
print(firstUniqChar(a))  # output is P
Using Java 8 :
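import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Test2 {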
public static void main(String[] args) {
String a = "GiniGinaProtijayi";
Map<Character, Long> map = a.chars()
.mapToObj(
ch -> Character.valueOf((char) ch)
).collect(
Collectors.groupingBy(
Function.identity(),
LinkedHashMap::new,
Collectors.counting()));
System.out.println("MAP => " + map);
// {G=2, i=5, n=2, a=2, P=1, r=1, o=1, t=1, j=1, y=1}
Character chh = map
.entrySet()
.stream()
.filter(entry -> entry.getValue() == 1L)
.map(entry -> entry.getKey())
.findFirst()
.get();
System.out.println("First Non Repeating Character => " + chh);// P
}// main
}
How about using a suffix tree for this case? The first unrepeated character will be the first character of the longest suffix string with the least depth in the tree.
Create two lists:
unique list (UL) - holding only the characters seen exactly once so far
non-unique list (NUL) - holding only the repeated characters
for(char c in str) {
if(nul.contains(c)){
//do nothing
}else if(ul.contains(c)){
ul.remove(c);
nul.add(c);
}else{
ul.add(c);
}
}
After the loop, the first element of UL (if any) is the first unrepeated character.
