Trie implementation - Inserting elements into a trie - algorithm

I am developing a trie data structure where each node represents a word. So the words st, stack, stackoverflow and overflow will be arranged as
root
--st
---stack
-----stackoverflow
--overflow
My Trie uses a HashTable internally, so all node lookups take constant time. The following is the algorithm I came up with to insert an item into the trie:
1. Check whether the item exists in the trie. If it exists, return; otherwise go to step 2.
2. Iterate over each character in the key and check for the existence of the word. Do this until we reach a node under which the new value can be added as a child. If no such node is found, the new node is added under the root node.
3. After insertion, rearrange the siblings of the node under which the new node was inserted. This walks through all the siblings and compares each of them against the newly inserted node. If any of them starts with the same characters as the new node, it is moved from there and added as a child of the new node.
I am not sure that this is the correct way of implementing a trie. Any suggestions or improvements are welcome.
Language used: C++

The trie should look like this:
          ROOT
 overflow/    \st
        O      O
                \ack
                 O
                  \overflow
                   O
Normally you don't need to use hash tables as part of a trie; the trie itself is already an efficient index data structure. Of course, you can still do so.
But anyway, your step (2) should actually descend the trie during the search and not just query the hash table. That way you find the insertion point right away and don't need to search for it later as a separate step.
I believe step (3) is wrong: you don't need to rearrange a trie, and in fact you shouldn't be able to, because only the additional string fragments are stored in the trie; see the picture above.
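To illustrate the point that the trie only stores the additional string fragments, here is a minimal Java sketch of a compressed (radix) trie. It is an illustration under assumptions, not the asker's design: the class names RadixNode, RadixEdge and RadixTrie are hypothetical, and a HashMap keyed by an edge's first character stands in for the hash table mentioned in the question. Insertion descends the trie while matching fragments; the only restructuring is splitting the single edge where the new word diverges, and that edge is found during the same descent, so no separate sibling-rearrangement step is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node of a compressed trie: edges are keyed by the first character of their label.
class RadixNode {
    final Map<Character, RadixEdge> edges = new HashMap<>();
    boolean isWord; // true if a stored word ends at this node
}

// An edge carries a string fragment, not a single character.
class RadixEdge {
    String label;
    RadixNode target;

    RadixEdge(String label, RadixNode target) {
        this.label = label;
        this.target = target;
    }
}

class RadixTrie {
    final RadixNode root = new RadixNode();

    void insert(String word) {
        RadixNode node = root;
        int i = 0; // number of characters of word matched so far
        while (i < word.length()) {
            RadixEdge edge = node.edges.get(word.charAt(i));
            if (edge == null) {
                // No edge starts with this character: hang the whole remaining suffix here.
                RadixNode leaf = new RadixNode();
                leaf.isWord = true;
                node.edges.put(word.charAt(i), new RadixEdge(word.substring(i), leaf));
                return;
            }
            // Length of the common prefix of the edge label and the rest of the word (at least 1).
            int k = 0;
            while (k < edge.label.length() && i + k < word.length()
                    && edge.label.charAt(k) == word.charAt(i + k)) {
                k++;
            }
            if (k < edge.label.length()) {
                // The word diverges or ends inside this edge: split it at position k.
                RadixNode middle = new RadixNode();
                middle.edges.put(edge.label.charAt(k), new RadixEdge(edge.label.substring(k), edge.target));
                edge.label = edge.label.substring(0, k);
                edge.target = middle;
            }
            node = edge.target;
            i += k;
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        RadixNode node = root;
        int i = 0;
        while (i < word.length()) {
            RadixEdge edge = node.edges.get(word.charAt(i));
            if (edge == null || !word.startsWith(edge.label, i)) {
                return false;
            }
            i += edge.label.length();
            node = edge.target;
        }
        return node.isWord;
    }
}
```

Inserting stackoverflow, overflow, stack and st (in that order or in the reverse order) should leave the root with two edges, st and overflow, with ack and then overflow hanging below st, which is the shape of the picture above.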

The following is the Java code for the insert algorithm.
public void insert(String s) {
    Node current = root;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // Look up the child that holds c (null if there is none)
        Node child = current.subNode(c);
        if (child == null) {
            // No path for this character yet: create it
            child = new Node(c);
            current.child.add(child);
        }
        current = child;
    }
    // Set marker to indicate the end of the word (for an empty string this marks the root)
    current.marker = true;
}
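The insert method relies on a Node class that the answer doesn't show. A self-contained version might look like the sketch below; the Node fields (content, marker, child) and subNode() are inferred from how insert() uses them, and search() is an added, hypothetical helper. Note that subNode() here scans a list linearly; swapping the list for a HashMap<Character, Node> would give the constant-time child lookup the question mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Node, reconstructed from the fields insert() uses: content, marker, child, subNode().
class Node {
    final char content;
    boolean marker;                        // true if a word ends at this node
    final List<Node> child = new ArrayList<>();

    Node(char content) {
        this.content = content;
    }

    // Returns the child holding c, or null; a linear scan over the children.
    Node subNode(char c) {
        for (Node n : child) {
            if (n.content == c) {
                return n;
            }
        }
        return null;
    }
}

class Trie {
    private final Node root = new Node(' '); // the root's own character is never compared

    public void insert(String s) {
        Node current = root;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Node next = current.subNode(c);
            if (next == null) {            // no path for c yet: create it
                next = new Node(c);
                current.child.add(next);
            }
            current = next;
        }
        current.marker = true;             // for "" this marks the root
    }

    // Hypothetical helper: follows the same path; the word exists only if its last node is marked.
    public boolean search(String s) {
        Node current = root;
        for (int i = 0; i < s.length(); i++) {
            current = current.subNode(s.charAt(i));
            if (current == null) {
                return false;
            }
        }
        return current.marker;
    }
}
```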
For a more detailed tutorial, refer here.


Hash Tables and Separate Chaining: How do you know which value to return from the bucket's list?

We're learning about hash tables in my data structures and algorithms class, and I'm having trouble understanding separate chaining.
I know the basic premise: each bucket has a pointer to a Node that contains a key-value pair, and each Node contains a pointer to the next (potential) Node in the current bucket's mini linked list. This is mainly used to handle collisions.
Now, suppose for simplicity that the hash table has 5 buckets. Suppose I wrote the following lines of code in my main after creating an appropriate hash table instance.
myHashTable["rick"] = "Rick Sanchez";
myHashTable["morty"] = "Morty Smith";
Let's imagine whatever hashing function we're using just so happens to produce the same bucket index for both string keys rick and morty. Let's say that bucket index is index 0, for simplicity.
So at index 0 in our hash table, we have two nodes with values of Rick Sanchez and Morty Smith, in whatever order we decide to put them in (the first pointing to the second).
When I want to display the corresponding value for rick, which is Rick Sanchez per our code here, the hashing function will produce the bucket index of 0.
How do I decide which node needs to be returned? Do I loop through the nodes until I find the one whose key matches rick?
To resolve hash table collisions, that is, to put or get an item whose hash value collides with another item's, you end up falling back to the data structure that backs each bucket of the hash table implementation; this is generally a linked list. When many keys collide in one bucket, this is the worst case for a hash table, and you end up with an O(n) operation to reach the correct item in the linked list. So yes: a loop, as you said, that searches for the item with the matching key. But if the bucket is backed by a data structure such as a balanced tree, the search can take O(log n) time, as in the Java 8 implementation.
As JEP 180: Handle Frequent HashMap Collisions with Balanced Trees says:
The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).
This technique has already been implemented in the latest version of the java.util.concurrent.ConcurrentHashMap class, which is also slated for inclusion in JDK 8 as part of JEP 155. Portions of that code will be re-used to implement the same idea in the HashMap and LinkedHashMap classes.
I strongly suggest always looking at existing implementations. For example, you could look at the Java 7 implementation. That will improve your code-reading skills, which matter almost as much as writing code, since you read code more often than you write it. I know it is more effort, but it will pay off.
For example, take a look at the Hashtable.get method from Java 7:
public synchronized V get(Object key) {
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    for (Entry<?,?> e = tab[index] ; e != null ; e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            return (V)e.value;
        }
    }
    return null;
}
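Here, the condition if ((e.hash == hash) && e.key.equals(key)) is what finds the entry with the matching key. Because && short-circuits, the cheap comparison of the stored hash values rules out most non-matching entries before equals() is called.
And here is the full source code: Hashtable.java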
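The loop described in this answer (walk the bucket's chain and compare keys) can be seen in a stripped-down separate-chaining table. This is a hypothetical sketch, not java.util.Hashtable: ChainedHashTable and its put/get methods are illustrative names, keys and values are assumed to be Strings, and creating it with a single bucket forces rick and morty to collide as in the question.

```java
// Hypothetical minimal separate-chaining table with String keys and values.
class ChainedHashTable {
    // One node of a bucket's singly linked list.
    private static final class Entry {
        final String key;
        String value;
        Entry next;

        Entry(String key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ChainedHashTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    private int indexFor(String key) {
        // Clear the sign bit so the index is never negative (the same trick Hashtable.get uses).
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length;
    }

    void put(String key, String value) {
        int i = indexFor(key);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {       // key already present: overwrite its value
                e.value = value;
                return;
            }
        }
        buckets[i] = new Entry(key, value, buckets[i]); // new key: prepend to the chain
    }

    String get(String key) {
        // Hashing only picks the bucket; the key comparison picks the node inside it.
        for (Entry e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;                       // no node in the chain has this key
    }
}
```

With new ChainedHashTable(1), both keys land in bucket 0, and get("rick") still returns "Rick Sanchez" because the loop skips the morty node whose key doesn't match.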

optimizing the data structure implementation

There is a stream of random characters coming in, like 'a', 'b', 'c', 'a' and so on. At any given point in time, when I query, I need to get the first non-repeating character. For example, for the input "abca", 'b' should be returned, since 'a' is repeated and the first non-repeating character is 'b'.
There need to be two methods, one for inserting and one for querying.
My solution is to use a linked list to store the incoming stream characters. When I get the next character, I compare it with all the current characters; if it is present, I don't insert it, otherwise I insert it at the end of the linked list. With this approach, a query takes O(1), since I just return the first element of the linked list, and an insert takes O(n), since in the worst case I need to compare from the first element to the last.
Is there any better performing way?
Either you haven't explained your algorithm well or it won't return the correct result. In the example a b a, would your algorithm return a (because it is the first element in the linked list)?
Anyway, here is a modification that improves performance. The idea is to use a hash map from characters to (doubly) linked list nodes. This map can be used to determine whether a character has already been inserted and to get to the required node quickly. We allow a null value as the map target (instead of a list node) to express a character that has occurred more than once already.
The insertion method works as follows:
Check whether the map contains the current character (O(1)). If not, add it to the end of the list and add a reference to its node to the map (O(1)).
If the character is already in the map, check whether the node it points to is null (O(1)). If so, just ignore the character. If not, remove that node from the list and set the map entry to null (O(1)).
Overall, this is an O(1) operation.
The query works as in your previous solution.
Here is a C# implementation. It's basically a 1:1 translation of the above explanation:
class StreamAnalyzer
{
    LinkedList<char> characterList = new LinkedList<char>();
    Dictionary<char, LinkedListNode<char>> characterMap
        = new Dictionary<char, LinkedListNode<char>>();

    public void AddCharacter(char c)
    {
        LinkedListNode<char> referencedNode;
        if (characterMap.TryGetValue(c, out referencedNode))
        {
            if (referencedNode != null)
            {
                characterList.Remove(referencedNode);
                characterMap[c] = null;
            }
        }
        else
        {
            var node = new LinkedListNode<char>(c);
            characterList.AddLast(node);
            characterMap.Add(c, node);
        }
    }

    public char? GetFirstNonRepeatingCharacter()
    {
        if (characterList.First == null)
            return null;
        else
            return characterList.First.Value;
    }
}
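Java's LinkedList does not expose its nodes, so the C# class above doesn't translate one-to-one. One hedged alternative, using a swapped-in technique rather than the answer's map-to-node design, is a LinkedHashSet (insertion-ordered, with O(1) add and remove) for characters seen exactly once, plus a HashSet for characters seen more than once, which plays the role of the null map entries. FirstUniqueTracker and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical Java counterpart of the C# StreamAnalyzer.
class FirstUniqueTracker {
    // Characters seen exactly once, in arrival order.
    private final LinkedHashSet<Character> unique = new LinkedHashSet<>();
    // Characters seen more than once (the C# version's null map entries).
    private final Set<Character> repeated = new HashSet<>();

    void add(char c) {
        if (repeated.contains(c)) {
            return;                        // already known to repeat: ignore
        }
        if (!unique.add(c)) {              // add() returns false if c was already there
            unique.remove(c);              // second occurrence: no longer a candidate
            repeated.add(c);
        }
    }

    // Returns the first non-repeating character so far, or null if there is none.
    Character firstNonRepeating() {
        Iterator<Character> it = unique.iterator();
        return it.hasNext() ? it.next() : null;
    }
}
```

For the stream a, b, a this returns 'b', which is the case the answer suspects the original linked-list approach gets wrong.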

Pseudo code to check if binary tree is a binary search tree - not sure about the recursion

I have homework to write pseudocode that checks whether a valid binary tree is a binary search tree.
I created an array to hold the in-order values of the tree. If the in-order values are in decreasing order, it means it is indeed a BST. However, I've got a problem with the recursion in the method InOrderArr.
I need to update the index of the array in order to store the values in the array in the order they appear in the tree.
I'm not sure the index is really updated properly during the recursion. Is it or not? And if you see some problem, can you help me fix it? Thanks a lot.
pseudo code
first function
IsBST(node)
    size ← TreeSize(node)
    create new array TreeArr of size number of cells
    index ← 0
    // few comments:
    // now we use the IN_ORDER procedure with a small variation; I called the new version of the procedure InOrderArr
    // the pseudo code of InOrderArr is described below IsBST
    InOrderArr(node, TreeArr, index)
    for i from 1 to size-1 do
        if not (TreeArr[i] > TreeArr[i-1]) return false
    return true
second function
InOrderArr(node, Array, index)
    if node = NULL then return
    else
        InOrderArr(node.left, Array, index)
        treeArr[index] = node.key
        index ← index + 1
        InOrderArr(node.right, Array, index)
    return
Your code is generally correct. Just three notes.
The correctness of the code depends on the implementation, specifically on how index is handled. Many programming languages pass arguments to subroutines by value. That means the subroutine receives a copy of the value, and modifications made to the parameter have no effect on the original value. So incrementing index during the execution of InOrderArr (node.left, Array, index) would not affect the position used by treeArr[index] = node.key. As a result, only the rightmost path would be stored in the array.
To avoid that, you'll have to ensure that index is passed by reference, so that an increment done by a callee advances the position used later by the caller.
BST is usually defined so that the left subtree of a node contains keys that are less than that node's key, and the right subtree contains nodes with greater keys; see Wikipedia's article on BST. Then the in-order traversal retrieves keys in ascending order. Why do you expect descending order?
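The first two notes can be combined into a Java sketch of the array approach. Java also passes an int by value, so this sketch (an assumption, not the asker's code) keeps the shared position in a one-element array that every recursive call sees and advances, and it checks for strictly ascending in-order keys. BstNode, InOrderCheck and size() are hypothetical names.

```java
// Hypothetical node type; the question's pseudocode only assumes left, right and key.
class BstNode {
    final int key;
    BstNode left, right;

    BstNode(int key, BstNode left, BstNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class InOrderCheck {
    // index[0] is shared by all calls (a stand-in for pass-by-reference),
    // so increments made while visiting the left subtree are seen by the caller.
    static void inOrderArr(BstNode node, int[] arr, int[] index) {
        if (node == null) {
            return;
        }
        inOrderArr(node.left, arr, index);
        arr[index[0]] = node.key;
        index[0]++;
        inOrderArr(node.right, arr, index);
    }

    static int size(BstNode node) {
        return node == null ? 0 : 1 + size(node.left) + size(node.right);
    }

    // The in-order keys of a BST must be strictly ascending.
    static boolean isBst(BstNode root) {
        int[] arr = new int[size(root)];
        inOrderArr(root, arr, new int[] {0});
        for (int i = 1; i < arr.length; i++) {
            if (!(arr[i] > arr[i - 1])) {
                return false;
            }
        }
        return true;
    }
}
```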
It might be more efficient to drop the array and recursively test the defining condition of a BST directly.
Whenever we follow a left link, we expect keys that are less than the current one. Whenever we follow a right link, we expect keys greater than the current one. So for most subtrees there is some interval of valid key values, defined by the keys of some ancestor nodes. Just track those keys and test whether each key falls inside the current valid interval. Be sure to handle the 'no left end defined' condition on the leftmost path and the 'no right end defined' condition on the rightmost path of the tree. At the root node there is no ancestor yet, so the root key is not tested at all (any value is OK).
EDIT
C code draft:
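#include <stdbool.h>
#include <stddef.h>

// Assumed node and tree layout (not shown in the original draft)
typedef struct NODE { int key; struct NODE *left, *right; } NODE;
typedef struct TREE { NODE *root; } TREE;

// Test a node against its closest left-side and right-side ancestors
bool isNodeBST(const NODE *lt, const NODE *node, const NODE *rt)
{
    if (node == NULL)
        return true;                 // an empty subtree is always valid
    if (lt != NULL && node->key < lt->key)
        return false;                // below the lower bound set by an ancestor
    if (rt != NULL && node->key > rt->key)
        return false;                // above the upper bound set by an ancestor
    // The left child is bounded above by this node, the right child below by it
    return isNodeBST(lt, node->left, node) &&
           isNodeBST(node, node->right, rt);
}

bool isTreeBST(const TREE *tree)
{
    return isNodeBST(NULL, tree->root, NULL);
}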

Print nodes of two binary trees in ascending order

Given two binary search trees, print the nodes in ascending order with time complexity O(n) and space complexity O(1).
The trees cannot be modified. Only traversal is allowed.
The problem I am facing is with the O(1)space solution. Had there not been this constraint, it could have been easily solved.
The only way this can be done in space O(1) is if the nodes know their parent.
Otherwise you cannot even traverse the tree unless there is some additional aid.
However, with this constraint it's easy again and back to tree traversal, but without recursion. The tricky part is probably knowing which path you came from when you go from a node up to its parent (p), since you cannot store this information: that would require O(log N) space.
However, you know the last value you output. If it is smaller than p's key, you came up from p's left subtree, so output p and go into p's right subtree; otherwise you came up from the right subtree, so continue to p's parent.
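The answers only traverse one tree; to print two trees in ascending order, the two in-order sequences can be merged like the merge step of merge sort. The sketch below assumes parent pointers (as both answers do) and uses the standard in-order successor walk, which decides where to go by comparing node identity (is this node its parent's right child?) rather than the last printed value. PNode, TwoTreeMerge and with() are hypothetical names; the result is collected into a list only so it can be checked, while printing each key instead keeps the extra space at O(1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type with a parent link.
class PNode {
    final int key;
    PNode left, right, parent;

    PNode(int key) {
        this.key = key;
    }

    // Attaches children and sets their parent pointers; returns this for chaining.
    PNode with(PNode l, PNode r) {
        left = l;
        right = r;
        if (l != null) l.parent = this;
        if (r != null) r.parent = this;
        return this;
    }
}

class TwoTreeMerge {
    // Leftmost (smallest) node of a subtree.
    static PNode first(PNode n) {
        if (n == null) return null;
        while (n.left != null) n = n.left;
        return n;
    }

    // In-order successor using only parent links: O(1) extra space.
    static PNode next(PNode n) {
        if (n.right != null) return first(n.right);
        // Climb while we are a right child; the first ancestor reached from its left side is next.
        while (n.parent != null && n.parent.right == n) n = n.parent;
        return n.parent;
    }

    // Merges the two in-order sequences, always taking the smaller current key.
    static List<Integer> mergeInOrder(PNode a, PNode b) {
        List<Integer> out = new ArrayList<>();
        PNode x = first(a), y = first(b);
        while (x != null || y != null) {
            if (y == null || (x != null && x.key <= y.key)) {
                out.add(x.key);
                x = next(x);
            } else {
                out.add(y.key);
                y = next(y);
            }
        }
        return out;
    }
}
```

A single next() call may climb several levels, but every edge of each tree is walked down once and up once, so the whole merge stays O(n).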
If we're talking about BSTs as defined by Wikipedia:
The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.
with the additional perk that every node knows its parent, then the following C code does the trick (I hope you like C; I put quite some effort into these 300 lines of demo application :D)
http://pastebin.com/MiTGqakq
(Note that I didn't use recursion, because recursion is technically never O(1) space: each active call needs its own stack frame holding copies of the passed parameters, so the space used grows with the recursion depth and is not O(1).)
EDIT: OK, the fixed version is linked. Have fun.
I have a solution to this problem.
I have coded my solution in C#, because it is my strongest language, but I hope you will catch the main idea. Let's suppose that each tree node has three references: to the left, right and parent nodes.
So we have a BinaryTree. How could we print it? Obviously:
this._tree.Print();
That wasn't very difficult. But how could we build the Print method if we have to avoid recursion (because recursion needs stack memory proportional to the tree height, O(log n) for a balanced tree)? Have you ever read about lazy lists (or streams)? A lazy list doesn't hold the whole list in memory, but knows how to calculate the next item from the current one, so at any moment it needs only O(1) memory. So suppose we have managed to describe a lazy list for the tree. Then the Print method is very simple:
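public static void Print<T>(this BinaryTree<T> tree)
    where T : IComparable<T>
{
    // An empty tree has nothing to print (and no root to start walking from).
    if (tree.Root == null)
        return;
    var node = new TreeNodeWalker<T>(tree.Root, WalkerState.FromParent);
    while (node != null)
    {
        // Each step returns the next walker; the previous one is no longer needed.
        node = node.WalkNext();
    }
}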
In this snippet you may notice one unfamiliar entity: TreeNodeWalker. This object holds the tree node to be walked, a state recording at which point of the traversal the walker was created, and a method that returns the next walker. In short, the walker performs the following actions:
If we descend into a subtree from its parent node, we walk the left subtree.
If we emerge from the left subtree, we print the node value and walk the right subtree.
If we emerge from the right subtree, we walk up to the parent.
It can be represented in code as follows:
public enum WalkerState
{
    FromParent,
    FromLeftSubTree,
    FromRightSubTree
}

public class TreeNodeWalker<T>
    where T : IComparable<T>
{
    // Tree node for which the walker is created.
    private readonly BinaryTreeNode<T> _node;
    // State of the walker: how we arrived at _node.
    private readonly WalkerState _state;

    public TreeNodeWalker(BinaryTreeNode<T> node, WalkerState state)
    {
        this._node = node;
        this._state = state;
    }

    public TreeNodeWalker<T> WalkNext()
    {
        if (this._state == WalkerState.FromParent)
        {
            // If we came to this node from its parent,
            // we should walk the left subtree first.
            if (this._node.Left != null)
            {
                return new TreeNodeWalker<T>(this._node.Left, WalkerState.FromParent);
            }
            else
            {
                // If the left subtree doesn't exist, return this node with a changed state (as if we had already walked the left subtree).
                return new TreeNodeWalker<T>(this._node, WalkerState.FromLeftSubTree);
            }
        }
        else if (this._state == WalkerState.FromLeftSubTree)
        {
            // Having returned from the left subtree, the current node holds the smallest key
            // not printed yet, so we print it.
            Console.WriteLine(this._node.Data.ToString());
            // And walk the right subtree...
            if (this._node.Right != null)
            {
                // ... if it exists
                return new TreeNodeWalker<T>(this._node.Right, WalkerState.FromParent);
            }
            else
            {
                // ... or return the current node as if we had returned from the right subtree.
                return new TreeNodeWalker<T>(this._node, WalkerState.FromRightSubTree);
            }
        }
        else if (this._state == WalkerState.FromRightSubTree)
        {
            // If we have returned from the right subtree, we should move up.
            if (this._node.Parent != null)
            {
                // If the parent exists, compare the current node with the parent's left child
                // to tell the parent's walker which state is correct.
                return new TreeNodeWalker<T>(this._node.Parent, this._node.Parent.Left == this._node ? WalkerState.FromLeftSubTree : WalkerState.FromRightSubTree);
            }
            else
            {
                // No parent: we are back at the root, which means the end of the walk.
                return null;
            }
        }
        else
        {
            return null;
        }
    }
}
You may see a lot of memory allocation in the code and decide that the O(1) memory requirement is not fulfilled. But after getting the next walker, we don't need the previous one any more, so only a constant number of walkers is alive at any time. If you are coding in C++, don't forget to free the memory. Alternatively, you could avoid allocating new walker instances at all by changing the internal state and node variables instead (returning this in the corresponding places).
As for time complexity, it's O(n): each node is reached at most three times (from its parent, back from its left subtree and back from its right subtree), so there are at most 3n steps.
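The allocation-free alternative mentioned above (mutating the walker's node and state instead of creating a new walker at each step) might look like this in Java. It is a hypothetical sketch: WNode, InPlaceWalker and step() are illustrative names, and output goes to a list so it can be inspected; printing instead keeps the extra memory at O(1).

```java
import java.util.List;

// Hypothetical node with a parent link (the C# BinaryTreeNode<T> is not shown).
class WNode {
    final int data;
    WNode left, right, parent;

    WNode(int data) {
        this.data = data;
    }
}

// One walker object whose node and state fields are updated in place.
class InPlaceWalker {
    enum State { FROM_PARENT, FROM_LEFT, FROM_RIGHT }

    private WNode node;
    private State state = State.FROM_PARENT;

    InPlaceWalker(WNode root) {
        node = root;
    }

    // Performs one step; returns false once the root has been left from its right subtree.
    boolean step(List<Integer> out) {
        if (node == null) {
            return false;
        }
        switch (state) {
            case FROM_PARENT:
                if (node.left != null) {
                    node = node.left;              // descend left, state stays FROM_PARENT
                } else {
                    state = State.FROM_LEFT;       // no left subtree: treat it as done
                }
                return true;
            case FROM_LEFT:
                out.add(node.data);                // "print" the node
                if (node.right != null) {
                    node = node.right;
                    state = State.FROM_PARENT;
                } else {
                    state = State.FROM_RIGHT;
                }
                return true;
            default: // FROM_RIGHT: move up and tell the parent which side we came from
                WNode p = node.parent;
                if (p == null) {
                    node = null;                   // back above the root: finished
                    return false;
                }
                state = (p.left == node) ? State.FROM_LEFT : State.FROM_RIGHT;
                node = p;
                return true;
        }
    }
}
```

Usage is a plain loop such as while (walker.step(out)) { }, with no allocation per step.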
Good luck.

Right Threading a Binary Tree

I'm having a hell of a time trying to figure this one out. Everywhere I look, I only seem to run into explanations of how to traverse the threaded tree non-recursively (the part I actually understand). Can anyone out there hammer home how exactly I can go through the tree initially and find the actual predecessor/successor nodes so I can flag them in the node class? I need to be able to create a simple binary search tree, go through it, and reroute the null links to the predecessor/successor. I've had some luck with a solution somewhat like the following:
thread(node n, node p) {
    if (n.left != null)
        thread(n.left, n);
    if (n.right != null) {
        thread(n.right, p);
    }
    n.right = p;
}
From your description, I'll assume you have a node with a structure looking something like:
Node {
    left
    right
}
... and that you have a binary tree of these set up using left and right, and that you want to reassign left and right so that they form a doubly linked list from a depth-first traversal of the tree.
The root (no pun intended) problem with what you've got so far is that the "node p" (short for previous?) that is passed during the traversal needs to be independent of where in the tree you currently are - it always needs to contain the previously visited node. To do that, each time thread is run it needs to reference the same "previous" variable. I've done some Python-ish pseudo code with one C-ism - if you're not familiar, '&' means "reference to" (or "ref" in C#), and '*' means "dereference and give me the object it is pointing to".
Node lastVisited
thread(root, &lastVisited)

function thread(node, lastVisitedRef)
    if (node.left)
        thread(node.left, lastVisitedRef)
    if (node.right)
        thread(node.right, lastVisitedRef)
    // visit this node, reassigning left and right
    if (*lastVisitedRef)
        node.right = *lastVisitedRef
        (*lastVisitedRef).left = node
    // update the shared lastVisited variable through the reference
    *lastVisitedRef = node
If you were going to implement this in C, you'd actually need a double pointer to hold the reference, but the idea is the same - you need to persist the location of the "last visited node" during the entire traversal.
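The pseudocode above builds a doubly linked list in post-order. For the right threading the question actually asks about (each null right link rerouted to the in-order successor and flagged), the same "one shared last-visited node" idea can be applied during an in-order traversal. Here is a hedged Java sketch in which a field stands in for the reference parameter; TNode, rightThread, RightThreader and walk() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node: rightThread marks a right link that was rerouted to the in-order successor.
class TNode {
    final int key;
    TNode left, right;
    boolean rightThread;

    TNode(int key) {
        this.key = key;
    }
}

class RightThreader {
    // The single "last visited" node shared by every recursive call (a field, not a parameter).
    private TNode prev;

    void thread(TNode root) {
        prev = null;
        inOrder(root);
    }

    private void inOrder(TNode n) {
        if (n == null) {
            return;
        }
        inOrder(n.left);
        // Visit: if the previous node had no right child, its in-order successor is n.
        if (prev != null && prev.right == null) {
            prev.right = n;
            prev.rightThread = true;
        }
        prev = n;
        // n.right is still a real child here: n itself is threaded only after this call returns.
        inOrder(n.right);
    }

    // Non-recursive in-order walk that follows threads instead of using a stack.
    static List<Integer> walk(TNode root) {
        List<Integer> out = new ArrayList<>();
        TNode n = leftmost(root);
        while (n != null) {
            out.add(n.key);
            n = n.rightThread ? n.right : leftmost(n.right);
        }
        return out;
    }

    private static TNode leftmost(TNode n) {
        if (n == null) {
            return null;
        }
        while (n.left != null) {
            n = n.left;
        }
        return n;
    }
}
```

After thread() runs, walk() visits the keys in ascending order without recursion, jumping along a thread whenever rightThread is set and otherwise dropping to the leftmost node of the real right subtree.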
