Trie complexity using linked list - data-structures

I implemented a Trie using linked lists (rather than the usual arrays).
My TrieNode stores its children in a linked list (instead of limiting the input type to English letters, digits, etc.).
I wonder what the resulting complexity of my Trie is when linked lists are used for the child nodes.
Thanks!

If you want to check whether a word w of length l is in the trie, you need to check at most l levels of the trie. At each level, you need to check whether the current node has a child with the letter you need next, using linear search. The maximum number of child nodes you need to iterate over is the size of the alphabet you are using.
Therefore I think the answer is O(l * |A|), where A is the alphabet you are using; in the case of lowercase Latin letters A = {a, b, ..., y, z}, so |A| = 26.
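For concreteness, here is a minimal sketch in Java (hypothetical names, not your implementation) of a trie node whose children are kept in a singly linked list; search walks one level per character and scans the sibling list linearly at each level, which is exactly where the O(l * |A|) bound comes from:

    // Sketch only: a trie node whose children live in a singly linked list.
    class TrieNode {
        char letter;          // edge label leading to this node
        boolean isWord;       // true if a word ends at this node
        TrieNode firstChild;  // head of the child list
        TrieNode nextSibling; // next child of this node's parent
    }

    class LinkedListTrie {
        private final TrieNode root = new TrieNode();

        // O(l * |A|): l levels, and at each level a linear scan of at most |A| siblings.
        boolean contains(String word) {
            TrieNode node = root;
            for (int i = 0; i < word.length(); i++) {
                char c = word.charAt(i);
                TrieNode child = node.firstChild;
                while (child != null && child.letter != c) {
                    child = child.nextSibling;   // linear search among siblings
                }
                if (child == null) return false; // no child labelled with c
                node = child;
            }
            return node.isWord;
        }
    }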

I wouldn't use either a linked list or an array* as the child storage at each node of a trie, as either makes search and insertion at each node O(n) in the number of children there, so the complexity for the trie overall is O(n * D), approaching O(A * D), where A and D are the alphabet size and trie depth, respectively.
Comparatively speaking, a hash map at each node strikes me as the least complex/most performant choice for a naive implementation of a trie, as it reduces the complexity of search and insertion to O(1) per node and O(D) overall. The only additional complexity would be in resizing the map, but if space isn't a concern you could pre-size each map to have an initial capacity of A, eliminating the need to ever resize the maps dynamically.
*: This is based on using brute-force search and insertion for an array implementation. If you can create a static one-to-one mapping of characters to zero-based indices, you could achieve the same operational and space complexity, as well as slightly better performance, with an array than with a hash map.
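As an illustration of the hash-map variant, here is a minimal Java sketch (hypothetical names; the map is pre-sized as suggested above, though the exact capacity needed to never resize also depends on the load factor):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: each node keys its children by character.
    class MapTrieNode {
        static final int ALPHABET_SIZE = 26;   // assumption: lowercase Latin letters
        final Map<Character, MapTrieNode> children = new HashMap<>(ALPHABET_SIZE);
        boolean isWord;
    }

    class MapTrie {
        final MapTrieNode root = new MapTrieNode();

        // O(D) overall: one expected-O(1) map insert/lookup per character.
        void insert(String word) {
            MapTrieNode node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new MapTrieNode());
            }
            node.isWord = true;
        }

        boolean contains(String word) {
            MapTrieNode node = root;
            for (char c : word.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return false;
            }
            return node.isWord;
        }
    }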

Related

Comparison of search speed for B-Tree and Trie

I am trying to find out which will be more efficient in terms of speed of search, whether trie or B-Tree. I have a dictionary of English words and I want to locate a word in that dictionary efficiently.
If by "more efficient in time of search" you refer to theoretical time complexity, then B Tree offers O(logn * |S|)1 time complexity for search, while a trie offers O(|S|) time complexity, where |S| is the length of the searched string, and n is the number of elements in dictionary.
If by "more efficient in time of search" you refer to actual real life run time, that depends on the actual implementation, actual data and actual search behavior. Some examples that might influence the answer:
Size of data
Storage system (for example: RAM/flash/disk/distributed filesystem/...)
Distribution of searches
Code optimizations of each implementation
(and much more)
(1) There are O(log n) comparisons, and each comparison takes O(|S|) time, since in the worst case you need to traverse the entire string to decide which one is greater.
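To make footnote (1) concrete, here is a minimal Java sketch (an assumption of mine, not from the answer: the dictionary is held as a sorted array, standing in for the B-Tree's ordered comparisons). Binary search performs O(log n) string comparisons, and each compareTo may scan up to |S| characters, giving O(log n * |S|); a trie walk instead touches each character of the query once, giving O(|S|):

    import java.util.Arrays;

    class DictionaryLookup {
        // O(log n) comparisons, each up to O(|S|) characters => O(log n * |S|).
        static boolean containsSorted(String[] sortedWords, String query) {
            return Arrays.binarySearch(sortedWords, query) >= 0;
        }
    }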
It depends on what your need is. If you want to retrieve a whole subtree, a B+Tree is your best choice because it is space efficient, and the branching factor of the B+Tree affects its performance (the number of intermediary nodes). If h is the height of the tree and b the branching factor, then n_max ≈ b^h, therefore h ≈ log(n_max) / log(b).
With n = 1,000,000,000 and b = 100, we have h ≈ log(10^9) / log(100) = 9 / 2 = 4.5, so about 5. That means only 5 pointer dereferences to go from the root to a leaf, which is more cache-friendly than a Trie.
But if you want to get the first N children of a subtree, then a Trie is the best choice, because you simply visit fewer nodes than in the B+Tree scenario.
Word prefix completion is also handled well by a trie (see the sketch below).
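For example, here is a minimal sketch (hypothetical names) of prefix completion against the map-based trie node shown earlier in this thread: walk down to the node representing the prefix, then depth-first search only that subtree, stopping after the first N words:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Sketch only: methods that could be added to the MapTrie sketch above.
    class PrefixCompletion {
        // Collect up to 'limit' words that start with 'prefix'; only the prefix's subtree is visited.
        static List<String> complete(MapTrieNode root, String prefix, int limit) {
            MapTrieNode node = root;
            for (char c : prefix.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return new ArrayList<>();  // no stored word has this prefix
            }
            List<String> results = new ArrayList<>();
            collect(node, new StringBuilder(prefix), limit, results);
            return results;
        }

        private static void collect(MapTrieNode node, StringBuilder word, int limit, List<String> out) {
            if (out.size() >= limit) return;
            if (node.isWord) out.add(word.toString());
            for (Map.Entry<Character, MapTrieNode> e : node.children.entrySet()) {
                word.append(e.getKey());
                collect(e.getValue(), word, limit, out);
                word.deleteCharAt(word.length() - 1);
                if (out.size() >= limit) return;
            }
        }
    }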

How can you change Trie to support showing results by popularity (certain kind of weights)?

I was wondering how you can change the behavior of the Trie data structure to support autocompletion by popularity of certain words, for example.
I was thinking that each node could hold a linked list sorted by popularity. Is that a good idea? But then how do you know the order of the words that the autocomplete will show?
First, you have to add a popularity attribute to the trie node, and whenever a node is chosen, you increase its popularity by 1.
Then, let's suppose you want to get the suggestions among all words starting with "abc".
First, you walk the trie normally until you reach the node that represents "abc"; then you run a traversal of the subtree starting at that node, and every time you visit a node, you add it to a priority_queue (or heap), which will order the nodes just as you define it to. You can define it so that the greatest popularity gets the highest priority.
After the traversal is done, you can remove the elements from the heap one by one (as many as you want), and each element you remove will be the greatest among all elements left (the first element will be the greatest of all, the second will be the greatest among all elements left, etc.).
You can read this if you are interested in time and space complexity:
this approach takes O(length of the prefix) time to find the node for the prefix, then O(n lg n) to order the n nodes of its subtree in the heap, and then O(k lg n) to get the first k elements you need to show. In a nutshell, the time complexity is O(n lg n) and the space complexity is O(n) because of the heap.
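A minimal Java sketch of this idea (hypothetical names; a popularity counter on each word node and a max-heap ordered by popularity, as described above):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.PriorityQueue;

    // Sketch only: trie node with a popularity counter for ranked autocomplete.
    class RankedTrieNode {
        final Map<Character, RankedTrieNode> children = new HashMap<>();
        String word;        // non-null if a word ends here
        int popularity;     // incremented whenever this word is chosen
    }

    class RankedTrie {
        private final RankedTrieNode root = new RankedTrieNode();

        void insert(String word) {
            RankedTrieNode node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new RankedTrieNode());
            }
            node.word = word;
        }

        // Increase a word's popularity by 1 whenever it is chosen.
        void choose(String word) {
            RankedTrieNode node = root;
            for (char c : word.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return;   // word not present
            }
            node.popularity++;
        }

        // Suggest up to k words under 'prefix', most popular first:
        // walk to the prefix node, traverse its subtree into a max-heap, pop k times.
        List<String> suggest(String prefix, int k) {
            RankedTrieNode node = root;
            for (char c : prefix.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return new ArrayList<>();
            }
            PriorityQueue<RankedTrieNode> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.popularity, a.popularity));
            collect(node, heap);
            List<String> suggestions = new ArrayList<>();
            while (!heap.isEmpty() && suggestions.size() < k) {
                suggestions.add(heap.poll().word);
            }
            return suggestions;
        }

        private void collect(RankedTrieNode node, PriorityQueue<RankedTrieNode> heap) {
            if (node.word != null) heap.add(node);
            for (RankedTrieNode child : node.children.values()) collect(child, heap);
        }
    }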

Find nth Smallest element from Binary Search Tree

How to find nth Smallest element from Binary Search Tree
Constraints are :
time complexity must be O(1)
No extra space should be used
I have already tried 2 approaches.
Doing inorder traversal and finding nth element - Time complexity O(n)
Maintaining, at each node, the number of elements smaller than it, and finding the element that has m smaller elements - time complexity O(log n)
The only way I can think of is to change the data structure that holds the BST in memory. It should be simple: if you treat every node as a structure of its own (value, left_child and right_child), then instead of storing them in an unordered array you can store them in an ordered array. Thus the nth smallest element would be the nth element in your array. The extra computation will be at insertion and deletion. It would still be more effective to use, for example, a C++ set (log(n) for both insertion and deletion).
It mainly depends on your use case.
If you do not use a data structure that keeps the tree ordered by array position, I don't think you can do better than log(n).
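For reference, here is a minimal sketch (hypothetical names) of the second approach mentioned in the question: each node additionally stores the size of its left subtree (the counters must be kept up to date on insertion and deletion), so the kth smallest element can be found in O(height) time, which is O(log n) in a balanced tree:

    // Sketch only: BST node augmented with the size of its left subtree.
    class SizedNode {
        int value;
        int leftCount;        // number of nodes in the left subtree
        SizedNode left, right;

        SizedNode(int value) { this.value = value; }
    }

    class OrderStatisticBst {
        // Returns the k-th smallest value (1-based) in the augmented BST.
        static int kthSmallest(SizedNode root, int k) {
            SizedNode node = root;
            while (node != null) {
                if (k == node.leftCount + 1) return node.value;    // this node is the k-th smallest
                if (k <= node.leftCount) {
                    node = node.left;                               // the answer is in the left subtree
                } else {
                    k -= node.leftCount + 1;                        // skip the left subtree and this node
                    node = node.right;
                }
            }
            throw new IllegalArgumentException("k is larger than the number of nodes");
        }
    }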

data structure similar to array but supporting deletion

I am thinking of the following data structure question:
Given integers between 1 and n in sorted order, every operation queries and then removes (in a single call) the kth smallest number. How can the query and the removal both be made constant-time operations?
It is similar to an array structure, but it requires constant-time removal. An order-statistic balanced binary tree can do this, but at O(lg n) complexity.
Can one take advantage of the range property (numbers only between 1 and n) to make it work?
LinkedHashSet is what you are looking for. If you want an index as in arrays, then use a LinkedHashMap. But you need to insert them in order from 1 to n.
What is the maximal value of N? You mentioned that you are going to work with positive numbers - a Van Emde Boas tree is probably the best choice for you.
Short description:
- It allows storing only non-negative numbers from [0, 2^k), where k is the number of bits required to store the maximal number N.
- All operations (insert, delete, lookup, find_next, find_prev) work in O(log k), not O(log N). So, for 32-bit integers the complexity is log(32) = 5.
- The disadvantage is memory consumption: it requires 2^k ~ O(N) memory, so for storing 32-bit integers you need ~1 GB of RAM. Remember that usually O(N) memory means O(number of elements), but here it means O(maximal stored value).
Note: I'm not sure about support for a k-th element query, but the description looks nice:
FindNext: find the key/value pair with the smallest key at least a given k
FindPrevious: find the key/value pair with the largest key at most a given k
UPDATE
As Dukeling mentioned below, the k-th element query is not supported. The only way I see to implement it:
int x = getMin();                                 // start from the smallest stored key
for (int i = 0; i < k - 1; i++) x = getNext(x);   // step to the successor k-1 times
After this loop, x will hold the k-th element. But the complexity is O(k * log(bits)) - too bad for large values of k.

O(1) extra space lookup data structure

I was wondering if there is a simple data structure that supports amortized log(n) lookup and insertion like a self-balancing binary search tree, but with constant memory overhead. (I don't really care about deleting elements.)
One idea I had was to store everything in one contiguous block of memory divided into two contiguous blocks: an S part where all elements are sorted, and a U that isn't sorted.
To perform an insertion, you add an element to U, and if the size of U exceeds log(size of S), you sort the entire contiguous array (treating both S and U as one contiguous array), so that after the sort everything is in S and U is empty.
To perform lookup run binary search on S and just look through all of U.
However, I am having trouble calculating the amortized insertion time of my algorithm.
Ultimately I would just appreciate some reasonably simple algorithm/data structure with the desired properties, and some guarantee that it runs reasonably fast in amortized time.
Thank you!
If by constant amount of memory overhead you mean that for N elements stored in the data structure the space consumption should be O(N), then any balanced tree will do -- in fact, any n-ary tree storing the elements in external leaves, where n > 1 and every external leaf contains an element, has this property.
This follows from the fact that any tree graph with N nodes has N - 1 edges.
If by constant amount of memory overhead you mean that for N elements the space consumption should be N + O(1), then neither the balanced trees nor the hash tables have this property -- both will use k * N memory, where k > 1 due to extra node pointers in the case of trees and the load factor in the case of hash tables.
I find your approach interesting, but I do not think it will work, even if you only sort U and then merge the two sets in linear time. You would need to do a sort (O(log N * log(log N)) operations) after every log N updates, followed by an O(N) merge of S and U (note that so far nobody actually knows how to do this merge in linear time in place, that is, without an extra array).
The amortized insertion time would be O(N / log N). But you could maybe use your approach to achieve something close to O(√N) if you allow the size of U to grow to the square root of the size of S.
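For illustration, here is a minimal Java sketch of the sorted-part/unsorted-buffer idea discussed above (hypothetical names; it uses the √N buffer threshold from the last paragraph and a full re-sort at rebuild time rather than an in-place merge, so it needs a temporary array and is only a starting point, not a proof of the amortized bound):

    import java.util.Arrays;

    // Sketch only: a sorted array S plus a small unsorted buffer U.
    // Lookup: binary search in S plus a linear scan of U.
    // Insert: append to U; when U grows past ~sqrt(|S|), merge everything and re-sort.
    class SortedWithBuffer {
        private long[] sorted = new long[0];   // S, kept sorted
        private long[] buffer = new long[16];  // U, unsorted
        private int bufferSize = 0;

        boolean contains(long key) {
            if (Arrays.binarySearch(sorted, key) >= 0) return true;
            for (int i = 0; i < bufferSize; i++) {
                if (buffer[i] == key) return true;
            }
            return false;
        }

        void insert(long key) {
            if (bufferSize == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            buffer[bufferSize++] = key;
            if ((long) bufferSize * bufferSize > sorted.length) {   // |U| > sqrt(|S|)
                rebuild();
            }
        }

        private void rebuild() {
            long[] merged = Arrays.copyOf(sorted, sorted.length + bufferSize);
            System.arraycopy(buffer, 0, merged, sorted.length, bufferSize);
            Arrays.sort(merged);    // full sort; an in-place linear merge would be cheaper if one existed
            sorted = merged;
            bufferSize = 0;
        }
    }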
Any hash table will do that. The only tricky part is how you resolve collisions - there are a few ways of doing it; the other tricky part is computing the hash correctly.
See:
http://en.wikipedia.org/wiki/Hash_table
