writing trie implementation - algorithm

This question has been asked many times and I have googled many places, but I still couldn't find one place with step-by-step instructions for writing a trie implementation. Please help me out.
My preferred language is Java or Python.
Thank you.

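I have written a trie for string searching in Java. It's pretty simple.
Here are the steps.
The node class, with a method that adds a string, looks like this:
import java.util.ArrayList;

public class Trienode {
    char c;
    Trienode parent;
    ArrayList<Trienode> childs = new ArrayList<>();

    // Adds str below root and returns the node where the string ends.
    static Trienode addString(String str, Trienode root) {
        if (str.length() == 0) return root;
        String newstr = str.substring(1); // str without the first char
        char c = str.charAt(0);
        // Look for an existing child that holds c
        Trienode newnode = null;
        for (Trienode child : root.childs) {
            if (child.c == c) { newnode = child; break; }
        }
        if (newnode == null) {
            newnode = new Trienode();
            newnode.c = c;
            newnode.parent = root;
            root.childs.add(newnode); // link the new node into the trie
        }
        return addString(newstr, newnode);
    }
}
You can write search and other operations along the same lines.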

#!/usr/bin/env python3
import sys

class Node:
    def __init__(self):
        self.next = {}  # Maps a character to the child Node (a Python dictionary)
        # There can be words such as Hot and Hottest. A search normally follows state
        # transitions down to a leaf node and prints the characters on the way, so only
        # Hottest would be printed; Hot ends at an intermediate state. word_marker marks
        # such a state as the end of a word that must be printed too.
        self.word_marker = False

    def add_item(self, string):
        '''Add a string to the trie data structure'''
        if len(string) == 0:
            self.word_marker = True
            return
        key = string[0]      # Extract the first character
        string = string[1:]  # Remaining string without the first character
        # If the key character exists in the hash, call add_item() on the node it points to
        if key in self.next:
            self.next[key].add_item(string)
        # Else create an empty node, store it under the key character, and call add_item() on it
        else:
            node = Node()
            self.next[key] = node
            node.add_item(string)

    def dfs(self, sofar=""):
        '''Perform a depth-first traversal and print every word below this node'''
        # An empty hash means this is a leaf node, so print sofar
        # (the characters on the path of state transitions that led here)
        if not self.next:
            print("Match:", sofar)
            return
        if self.word_marker:
            print("Match:", sofar)
        # Recursively call dfs for all the nodes pointed to by keys in the hash
        for key in self.next:
            self.next[key].dfs(sofar + key)

    def search(self, string, sofar=""):
        '''Perform an autocompletion search and print the results'''
        # Make state transitions based on the input characters. When the input is
        # exhausted, call dfs() so the trie is traversed down to the leaves.
        if len(string) > 0:
            key = string[0]
            string = string[1:]
            if key in self.next:
                sofar = sofar + key
                self.next[key].search(string, sofar)
            else:
                print("No match")
        else:
            if self.word_marker:
                print("Match:", sofar)
            for key in self.next:
                self.next[key].dfs(sofar + key)

def fileparse(filename):
    '''Parse the input dictionary file and build the trie data structure'''
    root = Node()
    with open(filename) as fd:
        for line in fd:
            line = line.strip('\r\n')  # Remove newline characters \r\n
            if line:                   # Skip blank lines
                root.add_item(line)
    return root

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage:", sys.argv[0], "dictionary_file.txt")
        sys.exit(2)
    root = fileparse(sys.argv[1])
    query = input("Input: ")
    root.search(query)

I'm not a Java or Python coder, but I can give you a very simple C# trie implementation. It's so simple that I'm sure you could map it to Java.
Here it is:
public class Trie<T> : Dictionary<T, Trie<T>>
{
}
Done. Told you it was simple. But it is a trie and it works.
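To make the "a trie is just a dictionary of tries" idea concrete, here is a minimal Python sketch of the same structure. The `END` sentinel and the helper names `insert` and `contains` are assumptions added for illustration and are not part of the answer above. Without some end-of-word marker, a bare nested dictionary cannot tell a stored word from a mere prefix.

```python
END = object()  # hypothetical sentinel key marking "a word ends here"

def insert(trie, word):
    """Walk or create one nested dict per character, then mark the end."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True

def contains(trie, word):
    """Return True only if `word` was inserted as a whole word."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node

trie = {}
for w in ("dog", "doggy"):
    insert(trie, w)
print(contains(trie, "dog"), contains(trie, "dogg"))
```

Here "dogg" is reachable as a path but was never inserted, so only the lookup for "dog" succeeds.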

Here is an implementation:
import java.util.HashMap;

public class Tries {
    class Node {
        HashMap<Character, Node> children;
        boolean end; // true if a word ends at this node

        public Node(boolean b) {
            children = new HashMap<Character, Node>();
            end = b;
        }
    }

    private Node root;

    public Tries() {
        root = new Node(false);
    }

    public static void main(String args[]) {
        Tries tr = new Tries();
        tr.add("dog");
        tr.add("doggy");
        System.out.println(tr.search("dogg")); // false: "dogg" is only a prefix
    }

    private boolean search(String word) {
        Node crawl = root;
        for (int i = 0; i < word.length(); i++) {
            crawl = crawl.children.get(word.charAt(i));
            if (crawl == null) {
                return false;
            }
        }
        // The path exists; it is a match only if a word ends here
        return crawl.end;
    }

    private void add(String word) {
        Node crawl = root;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (!crawl.children.containsKey(ch)) {
                crawl.children.put(ch, new Node(false));
            }
            crawl = crawl.children.get(ch);
        }
        // Mark the end after the loop, so that a word that is a prefix of an
        // existing word (adding "dog" after "doggy") is still recorded
        crawl.end = true;
    }
}
Just use a HashMap and keep track of where each word ends.

Related

Algorithm / data structure for resolving nested interpolated values in this example?

I am working on a compiler and one aspect currently is how to wait for interpolated variable names to be resolved. So I am wondering how to take a nested interpolated variable string and build some sort of simple data model/schema for unwrapping the evaluated string so to speak. Let me demonstrate.
Say we have a string like this:
foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}
That has 1, 2, and 3 levels of nested interpolations in it. So essentially it should resolve something like this:
wait for x, y, one, two, and c to resolve.
when both x and y resolve, then resolve a{x}-{y} immediately.
when both one and two resolve, resolve baz{one}-{two}.
when a{x}-{y}, baz{one}-{two}, and c all resolve, then finally resolve the whole expression.
I am shaky on my understanding of the logic flow for handling something like this, wondering if you could help solidify/clarify the general algorithm (high level pseudocode or something like that). Mainly just looking for how I would structure the data model and algorithm so as to progressively evaluate when the pieces are ready.
I'm starting out trying and it's not clear what to do next:
{
dependencies: [
{
path: [x]
},
{
path: [y]
}
],
parent: {
dependency: a{x}-{y} // interpolated term
parent: {
dependencies: [
{
}
]
}
}
}
Some sort of tree is probably necessary, but I am having trouble figuring out what it might look like, wondering if you could shed some light on that with some pseudocode (or JavaScript even).
watch the leaf nodes at first
then, when the children of a node are completed, propagate upward to resolving the next parent node. This would mean once x and y are done, it could resolve a{x}-{y}, but then wait until the other nodes are ready before doing the final top-level evaluation.
You can just simulate it by sending "events" to the system theoretically, like:
ready('y')
ready('c')
ready('x')
ready('a{x}-{y}')
function ready(variable) {
if ()
}
...actually that may not work, not sure how to handle the interpolated nodes in a hacky way like that. But even a high level description of how to solve this would be helpful.
export type SiteDependencyObserverParentType = {
observer: SiteDependencyObserverType
remaining: number
}
export type SiteDependencyObserverType = {
children: Array<SiteDependencyObserverType>
node: LinkNodeType
parent?: SiteDependencyObserverParentType
path: Array<string>
}
(What I'm currently thinking, some TypeScript)
Here is an approach in JavaScript:
Parse the input string to create a Node instance for each {} term, and create parent-child dependencies between the nodes.
Collect the leaf Nodes of this tree as the tree is being constructed: group these leaf nodes by their identifier. Note that the same identifier could occur multiple times in the input string, leading to multiple Nodes. If a variable x is resolved, then all Nodes with that name (the group) will be resolved.
Each node has a resolve method to set its final value
Each node has a notify method that any of its child nodes can call in order to notify it that the child has been resolved with a value. This may (or may not yet) lead to a cascading call of resolve.
In a demo, a timer is set up that at every tick will resolve a randomly picked variable to some number
I think that in your example, foo, and a might be functions that need to be called, but I didn't elaborate on that, and just considered them as literal text that does not need further treatment. It should not be difficult to extend the algorithm with such function-calling features.
class Node {
constructor(parent) {
this.source = ""; // The slice of the input string that maps to this node
this.texts = []; // Literal text that's not part of interpolation
this.children = []; // Node instances corresponding to interpolation
this.parent = parent; // Link to parent that should get notified when this node resolves
this.value = undefined; // Not yet resolved
}
isResolved() {
return this.value !== undefined;
}
resolve(value) {
if (this.isResolved()) return; // A node is not allowed to resolve twice: ignore
console.log(`Resolving "${this.source}" to "${value}"`);
this.value = value;
if (this.parent) this.parent.notify();
}
notify() {
// Check if all dependencies have been resolved
let value = "";
for (let i = 0; i < this.children.length; i++) {
const child = this.children[i];
if (!child.isResolved()) { // Not ready yet
console.log(`"${this.source}" is getting notified, but not all dependencies are ready yet`);
return;
}
value += this.texts[i] + child.value;
}
console.log(`"${this.source}" is getting notified, and all dependencies are ready:`);
this.resolve(value + this.texts.at(-1));
}
}
function makeTree(s) {
const leaves = {}; // nodes keyed by atomic names (like "x" "y" in the example)
const tokens = s.split(/([{}])/);
let i = 0; // Index in s
function dfs(parent=null) {
const node = new Node(parent);
const start = i;
while (tokens.length) {
const token = tokens.shift();
i += token.length;
if (token == "}") break;
if (token == "{") {
node.children.push(dfs(node));
} else {
node.texts.push(token);
}
}
node.source = s.slice(start, i - (tokens.length ? 1 : 0));
if (node.children.length == 0) { // It's a leaf
const label = node.texts[0];
leaves[label] ??= []; // Define as empty array if not yet defined
leaves[label].push(node);
}
return node;
}
dfs();
return leaves;
}
// ------------------- DEMO --------------------
let s = "foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}";
const leaves = makeTree(s);
// Create a random order in which to resolve the atomic variables:
function shuffle(array) {
for (var i = array.length - 1; i > 0; i--) {
var j = Math.floor(Math.random() * (i + 1));
[array[j], array[i]] = [array[i], array[j]];
}
return array;
}
const names = shuffle(Object.keys(leaves));
// Use a timer to resolve the variables one by one in the given random order
let index = 0;
function resolveRandomVariable() {
if (index >= names.length) return; // all done
console.log("\n---------------- timer tick --------------");
const name = names[index++];
console.log(`Variable ${name} gets a value: "${index}". Calling resolve() on the connected node instance(s):`);
for (const node of leaves[name]) node.resolve(index);
setTimeout(resolveRandomVariable, 1000);
}
setTimeout(resolveRandomVariable, 1000);
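The same propagate-upward idea can also be sketched with a per-node counter of unresolved children, similar to the `remaining` field in the question's TypeScript. The following Python sketch is only an illustration under assumptions: the names `Term`, `parse`, and `log` are hypothetical, the input is assumed to have balanced braces, and a resolved term is built by joining its literal text with its children's values, as in the JavaScript above.

```python
class Term:
    """One {...} term (or the whole expression) in the dependency tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.parts = []     # literal strings and child Term objects, in order
        self.remaining = 0  # child terms that are still unresolved
        self.value = None

    def source(self):
        """Rebuild the original text of this term."""
        return "".join(p if isinstance(p, str) else "{" + p.source() + "}"
                       for p in self.parts)

    def resolve(self, value, log):
        self.value = str(value)
        log.append((self.source(), self.value))
        parent = self.parent
        if parent is None:
            return
        parent.remaining -= 1
        if parent.remaining == 0:  # last missing child: the parent is ready
            joined = "".join(p if isinstance(p, str) else p.value
                             for p in parent.parts)
            parent.resolve(joined, log)


def parse(s):
    """Build the tree; return (root, leaves keyed by variable name)."""
    root = node = Term()
    leaves = {}
    buf = ""
    for ch in s:
        if ch in "{}":
            if buf:
                node.parts.append(buf)
                buf = ""
            if ch == "{":
                child = Term(node)
                node.parts.append(child)
                node.remaining += 1
                node = child
            else:
                if node.remaining == 0:  # no child terms: an atomic variable
                    leaves.setdefault(node.source(), []).append(node)
                node = node.parent
        else:
            buf += ch
    if buf:
        root.parts.append(buf)
    return root, leaves


root, leaves = parse("foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}")
log = []
for name, value in [("y", 1), ("c", 2), ("x", 3), ("two", 4), ("one", 5)]:
    for leaf in leaves[name]:
        leaf.resolve(value, log)
print(root.value)
```

Each call to `resolve` costs one decrement on the parent, so a term fires exactly once, at the moment its last dependency arrives.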
Your idea of building a dependency tree is a good one.
That said, I tried to find the simplest possible solution.
It already works, but many optimizations are possible, so take it as a proof of concept.
The basic idea is to produce a List of Strings that you can read in order, where each element is a term you need to resolve. Each element may be needed to resolve something that comes later in the List, and therefore the overall expression. Once you have resolved all the chunks, you have all the pieces needed to resolve your original expression.
It's written in Java; I hope it's understandable.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
public class StackOverflow {
public static void main(String[] args) {
String exp = "foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}";
List<String> chunks = expToChunks(exp);
//it just reverse the order of the list
Collections.reverse(chunks);
System.out.println(chunks);
//output -> [c, two, one, baz{one}-{two}, y, x, a{x}-{y}]
}
public static List<String> expToChunks(String exp) {
List<String> chunks = new ArrayList<>();
//this first piece just find the first inner open parenthesys and its relative close parenthesys
int begin = exp.indexOf("{") + 1;
int numberOfParenthesys = 1;
int end = -1;
for(int i = begin; i < exp.length(); i++) {
char c = exp.charAt(i);
if (c == '{') numberOfParenthesys ++;
if (c == '}') numberOfParenthesys --;
if (numberOfParenthesys == 0) {
end = i;
break;
}
}
//this if put an end to recursive calls
if(begin > 0 && begin < exp.length() && end > 0) {
//add the chunk to the final list
String substring = exp.substring(begin, end);
chunks.add(substring);
//remove from the starting expression the already considered chunk
String newExp = exp.replace("{" + substring + "}", "");
//recursive call for inner element on the chunk found
chunks.addAll(Objects.requireNonNull(expToChunks(substring)));
//calculate other chunks on the remained expression
chunks.addAll(Objects.requireNonNull(expToChunks(newExp)));
}
return chunks;
}
}
Some details on the code:
The following piece finds the begin and end indexes of the first outer chunk of the expression. The idea is that in a valid expression the number of opening braces must equal the number of closing braces, and the running count of opening (+1) and closing (-1) braces can never be negative.
So with that simple loop, as soon as the count reaches 0, I have found the first chunk of the expression.
int begin = exp.indexOf("{") + 1;
int numberOfParenthesys = 1;
int end = -1;
for(int i = begin; i < exp.length(); i++) {
char c = exp.charAt(i);
if (c == '{') numberOfParenthesys ++;
if (c == '}') numberOfParenthesys --;
if (numberOfParenthesys == 0) {
end = i;
break;
}
}
The if condition validates the begin and end indexes and stops the recursion when no more chunks can be found in the remaining expression.
if(begin > 0 && begin < exp.length() && end > 0) {
...
}
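As an alternative to the recursive extraction, the chunks can be collected in a single pass with a stack of open-brace positions. This is a hedged sketch rather than a translation of the Java above: the function name `chunks_in_resolution_order` is hypothetical, and it emits each term when its closing brace is seen, so inner terms always come before the terms that contain them.

```python
def chunks_in_resolution_order(exp):
    """Return every {...} term, inner terms before the terms containing them."""
    stack = []   # indexes of '{' characters still waiting for their '}'
    chunks = []
    for i, ch in enumerate(exp):
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                raise ValueError("unbalanced closing brace at index %d" % i)
            start = stack.pop()
            chunks.append(exp[start + 1:i])
    if stack:
        raise ValueError("unbalanced opening brace at index %d" % stack[-1])
    return chunks

print(chunks_in_resolution_order("foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}"))
```

Because the list is already in dependency order, no reversal step is needed before resolving the chunks one by one.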

Swap Linked list objects

The following code is meant to sort the list (Peter,10) (John,32) (Mary,50) (Carol,31), but this is the output:
Ordered lists:
List 1: (Carol,31) (Carol,31) (John,32) (Mary,50)
Peter is lost and Carol is repeated. Please help me find where I am going wrong. What do I need to change in the loop to get this correct?
LinkedList& LinkedList::order()
{
int swapped;
Node *temp;
Node *lptr = NULL;
temp=head;
// Checking for empty list
do
{
swapped = 0 ;
current = head;
while (current->get_next() != lptr)
{
if (current->get_data() > current->get_next()->get_data())
{
temp->set_Node(current->get_data());
current->set_Node(current->get_next()->get_data());
current->get_next()->set_Node(temp->get_data());
swapped = 1;
}
current = current->get_next();
}
lptr = current;
}
while (swapped);
return *this;
}

Locating predecessor and successor nodes

I have a method for getting the successor and predecessor nodes in a binary search tree, but I am having some problems locating the bug in my code. Say I add nodes with the following keys: "C", "B", and "K". If I print the contents of my binary search tree, I get the following output:
"C" "some data 1"
"B" "some data 2"
"K" "some data 3"
When I add "C" it obviously has no predecessor or successor, so I just set those to empty strings:
root = root->insert(root, key, data);
root->getNextAndPrev(root, prev, next, key);
string p;
string n;
if (!prev) {
pred = "";
}
else {
pred = prev->getKey();
}
if (!next) {
succ = "";
}
else {
succ = next->getKey();
}
return new Entry(data, succ, pred);
When I add "B" I get the output that the successor of "B" is "C" and the predecessor is "", as expected. However, when I add "K" to the tree, I get the output that the predecessor of "K" is "C" and the successor is also "C". I am not sure why I am getting this error, since I check whether there is a successor (nothing comes after "K") and set it to an empty string if there is none.
My Node class handles the insert() and getNextAndPrev() methods, and here is how I've implemented them:
void Node::getNextAndPrev(Node* root, string key) {
if (!root) return;
if (root->key == key){
if (root->left != NULL){
Node* tempNode = root->left;
while (tempNode->right != NULL) {
tempNode = tempNode->right;
}
prev= tempNode;
}
if (root->right != NULL){
Node* tempNode = root->right;
while (tempNode->left != NULL) {
tempNode = tempNode->left;
}
next = tempNode;
}
}
if (root->key > key) {
next = root;
getNextAndPrev(root->left, key);
}
else {
prev = root;
getNextAndPrev(root->right, key);
}
}
Why does adding some keys out of order cause my getNextAndPrev to retrieve incorrect values?
Perhaps it has something to do with how I am inserting entries in my main. I have a loop set up as follows:
string command = "";
Entry* entry = new Entry("","","");
string def = "";
while (true) {
cout << "Enter command: ";
getline(cin, command);
if (parseCommand(command, "ADD") == 0) {
string tempCmd = processCommand(command, 3);
string key = tempCmd.substr(0, tempCmd.length() - 4);
string data = tempCmd.substr(tempCmd.length() - 4);
trim(key);
trim(data);
def = data;
entry = dict->modify(key, data);
cout << "added: " << key << " with definition of : " << def << " to the dictionary " << endl;
}
modify() gets called like so inside my Dictionary class:
Entry * Dictionary::modify(string key, string data) {
Entry * entry = new Entry("","","");
if (root) entry = search(key);
//inserting something into the dictionary
if (data != "" && !this->root->doesContain(this->root, key)) {
root = root->insert(root, key, data);
return entry;
}
}
And finally, my search() method that gets called inside modify():
Entry * Dictionary::search(string key) {
if (key == "") {
return new Entry("", getSmallestKey(), getLargestKey());
}
if (!this->root->doesContain(root, key)) {
root->getNextAndPrev(root, key);
string prev;
string next;
if (root->getPrevNode() != NULL) {
prev = root->getPrevious();
cout << "Predecessor is " << prev << " root is: " << root->getKey() << endl;
}
else {
prev = "";
cout << "No Predecessor" << endl;
}
if (root->getNextNode() != NULL) {
next = root->getNext();
cout << "Successor is " << next << " root is: " << root->getKey() << endl;
}
else {
next = "";
cout << "No Successor" << endl;
}
if (next == prev) {
if (next < key) {
next = "";
}
if (prev > key) {
prev = "";
}
}
return new Entry("", next, prev);
}
To illustrate the problem in detail, here is the output from running the above:
Enter command: ADD "FOO" "D"
lookup stuff: root: prev: next: // gets logged out when I insert into dictionary
added: "FOO" with a definition of: "D" to the dictionary
Enter command: ADD "BIN" "C"
No Predecessor
Successor is "FOO" root is: "FOO"
lookup stuff: root: prev: next: "FOO"
added: "BIN" with a definition of: "C" to the dictionary
Enter command: ADD "QUUX" "D"
Predecessor is "FOO" root is: "FOO"
Successor is "FOO" root is: "FOO"
lookup stuff: root: prev: "FOO" next:
added: "QUUX" with a definition of: "D" to the dictionary
Enter command: ADD "BAZ" "N"
Predecessor is "FOO" root is: "FOO"
Successor is "BIN" root is: "FOO"
lookup stuff: root: prev: "FOO" next: "BIN"
added: "BAZ" with a definition of: "N" to the dictionary
I can't figure out why, when adding BAZ to the dictionary, the predecessor and successor are now out of place:
Enter command: ADD "BAZ" "N"
Predecessor is "FOO" root is: "FOO"
Successor is "BIN" root is: "FOO"
I hope your Node constructor
new Node(key, d)
sets the left and right fields to NULL.
By successor and predecessor I assume you mean the in-order traversal successor and predecessor. Your code seems fine to me.
The only possibility I can think of is that you are using shared variables prev and next to hold the predecessor and successor, so try using different variables for different nodes.
After you are done getting the successor and predecessor for node B, define new variables:
Node *prevK,*nextK;
and call the function getNextAndPrev() with these variables.
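One way to rule out stale shared state is to keep the predecessor and successor in local variables that start empty on every lookup. The Python sketch below illustrates that under assumptions: the `Node` class, `insert`, and `neighbours` are hypothetical stand-ins for the C++ in the question, and it assumes the wrong output comes from `prev` and `next` surviving between calls, as this answer suggests.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain (unbalanced) binary search tree insert; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def neighbours(root, key):
    """Return (predecessor_key, successor_key) of `key`; None where absent."""
    pred = succ = None          # fresh for every call, never shared
    node = root
    while node is not None:
        if key < node.key:
            succ = node.key     # node is larger: closest successor so far
            node = node.left
        elif key > node.key:
            pred = node.key     # node is smaller: closest predecessor so far
            node = node.right
        else:
            # Key found: closer neighbours may sit in its own subtrees
            if node.left is not None:
                n = node.left
                while n.right is not None:
                    n = n.right
                pred = n.key
            if node.right is not None:
                n = node.right
                while n.left is not None:
                    n = n.left
                succ = n.key
            break
    return pred, succ

root = None
results = {}
for key in ["FOO", "BIN", "QUUX", "BAZ"]:
    results[key] = neighbours(root, key)  # look up before inserting
    root = insert(root, key)
print(results)
```

With the keys from the question's log, QUUX gets FOO as its predecessor and no successor, and BAZ gets no predecessor and BIN as its successor.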

splitting up the contents of a single line

I just came across a problem where the input is a string that looks like a single word.
The line is not readable as written.
For example, "I want to leave" is written as Iwanttoleave.
The problem is to separate out each of the tokens (words, numbers, abbreviations, etc.).
I have no idea where to start.
The first thought that came to my mind was to build a dictionary and map against it, but I don't think building a dictionary is a good idea.
Can anyone suggest an algorithm to do this?
First of all, create a dictionary which helps you to identify if some string is a valid word or not.
boolean isValidString(String s) {
    return dictionary.contains(s);
}
Now, you can write a recursive code to split the string and create an array of actually useful words.
ArrayList<String> usefulWords = new ArrayList<String>(); // global declaration
void split(String s) {
    int l = s.length();
    for (int i = l - 1; i >= 0; i--) {
        // s.substring(i, l) returns the substring starting at index `i` and ending at `l-1`
        if (isValidString(s.substring(i, l))) {
            usefulWords.add(s.substring(i, l));
            split(s.substring(0, i));
        }
    }
}
Now, use these usefulWords to generate all possible strings. Maybe something like this:
ArrayList<String>[] splits = new ArrayList[10]; // assuming max 10 possible outputs; each entry must be initialised before use
void allPossibleStrings(String s, int level) {
    // Try every non-empty prefix of s, including s itself
    for (int i = 1; i <= s.length(); i++) {
        if (usefulWords.contains(s.substring(0, i))) {
            splits[level].add(s.substring(0, i));
            allPossibleStrings(s.substring(i), level);
            level++;
        }
    }
}
Now, this code gives you all possible splits in a somewhat arbitrary manner. eg.
dictionary = {cat, dog, i, am, pro, gram, program, programmer, grammer}
input:
string = program
output:
splits[0] = {pro, gram}
splits[1] = {program}
input:
string = iamprogram
output:
splits[0] = {i, am, pro, gram} //since `mer` is not in dictionary
splits[1] = {program}
I did not give much thought to the last part, but I think you should be able to formulate a code from there as per your requirement.
Also, since no language is tagged, I've taken the liberty of writing the code in JAVA-like syntax as it is really easy to understand.
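For comparison, here is a compact way to enumerate every split with backtracking plus memoization. It is a sketch under assumptions, not the code above made runnable: the function name `all_splits` is hypothetical, the dictionary is a plain Python set, and every character of the input must be covered by some dictionary word.

```python
from functools import lru_cache

def all_splits(text, dictionary):
    """Return every way to cut `text` into words from `dictionary`."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], each as a list of words
        if start == len(text):
            return [[]]                      # one way to split nothing
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in words:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)

dictionary = {"cat", "dog", "i", "am", "pro", "gram", "program",
              "programmer", "grammer"}
for split in all_splits("iamprogram", dictionary):
    print(" ".join(split))
```

Caching `split_from` by start index means each suffix is segmented only once, although the number of results can still grow exponentially for ambiguous inputs.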
Instead of using a Dictionary, I'd suggest you use a Trie with all your valid words (the whole English dictionary?). Then you can start moving one letter at a time in your input line and the trie at the same time. If the letter leads to more results in the trie, you can continue expanding the current word, and if not, you can start looking for a new word in the trie.
This won't be a forward only search for sure, so you'll need some sort of backtracking.
// This method Generates a list with all the matching phrases for the given input
List<string> CandidatePhrases(string input) {
Trie validWords = BuildTheTrieWithAllValidWords();
List<string> currentWords = new List<string>();
List<string> possiblePhrases = new List<string>();
// The root of the trie has an empty key that points to all the first letters of all words
Trie currentWord = validWords;
int currentLetter = -1;
// Calls a backtracking method that creates all possible phrases
FindPossiblePhrases(input, validWords, currentWords, currentWord, currentLetter, possiblePhrases);
return possiblePhrases;
}
// The Trie structure could be something like
class Trie {
    char key;            // '\0' for the root
    bool valid;          // true if a word ends at this node
    List<Trie> children;
    Trie parent;
    Trie Next(char nextLetter) {
        return children.FirstOrDefault(c => c.key == nextLetter);
    }
    string WholeWord() {
        Debug.Assert(valid);
        string word = "";
        Trie current = this;
        // Walk up to the root, prepending each letter on the way
        while (current.key != '\0')
        {
            word = current.key + word;
            current = current.parent;
        }
        return word;
    }
}
void FindPossiblePhrases(string input, Trie validWords, List<string> currentWords, Trie currentWord, int currentLetter, List<string> possiblePhrases) {
if (currentLetter == input.Length - 1) {
if (currentWord.valid) {
string phrase = "";
foreach (string word in currentWords) {
phrase += word;
phrase += " ";
}
phrase += currentWord.WholeWord();
possiblePhrases.Add(phrase);
}
}
else {
// The currentWord may be a valid word. If that's the case, the next letter could be the first of a new word, or could be the next letter of a bigger word that begins with currentWord
if (currentWord.valid) {
// Try to match phrases when the currentWord is a valid word
currentWords.Add(currentWord.WholeWord());
FindPossiblePhrases(input, validWords, currentWords, validWords, currentLetter, possiblePhrases);
currentWords.RemoveAt(currentWords.Count - 1);
}
// If either the currentWord is a valid word, or not, try to match a longer word that begins with current word
int nextLetter = currentLetter + 1;
Trie nextWord = currentWord.Next(input[nextLetter]);
// If the nextWord is null, there was no matching word that begins with currentWord and has input[nextLetter] as the following letter.
if (nextWord != null) {
FindPossiblePhrases(input, validWords, currentWords, nextWord, nextLetter, possiblePhrases);
}
}
}

Algorithm to generate all variants of a word

I would like to explain my problem with the following example.
Assume the word: abc
a has variants: ä, à
b has no variants.
c has variants: ç
so the possible words are:
abc
äbc
àbc
abç
äbç
àbç
Now I am looking for an algorithm that prints all word variations for arbitrary words with arbitrary letter variants.
I would recommend you to solve this recursively. Here's some Java code for you to get started:
static Map<Character, char[]> variants = new HashMap<Character, char[]>() {{
put('a', new char[] {'ä', 'à'});
put('b', new char[] { });
put('c', new char[] { 'ç' });
}};
public static Set<String> variation(String str) {
Set<String> result = new HashSet<String>();
if (str.isEmpty()) {
result.add("");
return result;
}
char c = str.charAt(0);
for (String tailVariant : variation(str.substring(1))) {
result.add(c + tailVariant);
for (char variant : variants.getOrDefault(c, new char[0])) // letters without an entry have no variants
result.add(variant + tailVariant);
}
return result;
}
Test:
public static void main(String[] args) {
for (String str : variation("abc"))
System.out.println(str);
}
Output:
abc
àbç
äbc
àbc
äbç
abç
A quickly hacked solution in Python:
def word_variants(variants):
    print_variants("", 1, variants)

def print_variants(word, i, variants):
    if i > len(variants):
        print(word)
    else:
        for variant in variants[i]:
            print_variants(word + variant, i + 1, variants)

variants = dict()
variants[1] = ['a0', 'a1', 'a2']
variants[2] = ['b0']
variants[3] = ['c0', 'c1']
word_variants(variants)
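The same enumeration is a Cartesian product over the per-letter choices, so it can also be sketched with `itertools.product` from the standard library. This version is an illustration under assumptions: it takes the word itself plus a mapping from a base letter to its variants (the argument layout is hypothetical and differs from the hack above), and letters missing from the mapping are treated as having no variants.

```python
from itertools import product

def word_variants(word, variants):
    """Yield every spelling of `word`, trying each letter and its variants."""
    choices = [[ch] + list(variants.get(ch, [])) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

variants = {"a": ["ä", "à"], "c": ["ç"]}
print(list(word_variants("abc", variants)))
```

For "abc" this yields the six words listed in the question, with the last letter varying fastest.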
Common part:
string[] letterEquiv = { "aäà", "b", "cç", "d", "eèé" };
// Here we make a dictionary where the key is the "base" letter and the value is an array of alternatives
var lookup = letterEquiv
.Select(p => p.ToCharArray())
.SelectMany(p => p, (p, q) => new { key = q, values = p }).ToDictionary(p => p.key, p => p.values);
A recursive variation written in C#.
List<string> resultsRecursive = new List<string>();
// I'm using an anonymous method that "closes" around resultsRecursive and lookup. You could make it a standard method that accepts as a parameter the two.
// Recursive anonymous methods must be declared in this way in C#. Nothing to see.
Action<string, int, char[]> recursive = null;
recursive = (str, ix, str2) =>
{
// In the first loop str2 is null, so we create the place where the string will be built.
if (str2 == null)
{
str2 = new char[str.Length];
}
// The possible variations for the current character
var equivs = lookup[str[ix]];
// For each variation
foreach (var eq in equivs)
{
// We save the current variation for the current character
str2[ix] = eq;
// If we haven't reached the end of the string
if (ix < str.Length - 1)
{
// We recurse, increasing the index
recursive(str, ix + 1, str2);
}
else
{
// We save the string
resultsRecursive.Add(new string(str2));
}
}
};
// We launch our function
recursive("abcdeabcde", 0, null);
// The results are in resultsRecursive
A non-recursive version
List<string> resultsNonRecursive = new List<string>();
// I'm using an anonymous method that "closes" around resultsNonRecursive and lookup. You could make it a standard method that accepts as a parameter the two.
Action<string> nonRecursive = (str) =>
{
// We will have two arrays, of the same length of the string. One will contain
// the possible variations for that letter, the other will contain the "current"
// "chosen" variation of that letter
char[][] equivs = new char[str.Length][];
int[] ixes = new int[str.Length];
for (int i = 0; i < ixes.Length; i++)
{
equivs[i] = lookup[str[i]];
// We start with index -1 so that the first increase will bring it to 0
ixes[i] = -1;
}
// The current "working" index of the original string
int ix = 0;
// The place where the string will be built.
char[] str2 = new char[str.Length];
// The loop will break when we will have to increment the letter with index -1
while (ix >= 0)
{
// We select the next possible variation for the current character
ixes[ix]++;
// If we have exhausted the possible variations of the current character
if (ixes[ix] == equivs[ix].Length)
{
// Reset the current character to -1
ixes[ix] = -1;
// And loop back to the previous character
ix--;
continue;
}
// We save the current variation for the current character
str2[ix] = equivs[ix][ixes[ix]];
// If we are setting the last character of the string, then the string
// is complete
if (ix == str.Length - 1)
{
// And we save it
resultsNonRecursive.Add(new string(str2));
}
else
{
// Otherwise we have to do everything for the next character
ix++;
}
}
};
// We launch our function
nonRecursive("abcdeabcde");
// The results are in resultsNonRecursive
Both heavily commented.

Resources