There are two similar data structures, the Trie and the Hash Array Mapped Trie. Wikipedia says:
A hash array mapped trie (HAMT) is an implementation of an
associative array that combines the characteristics of a hash table
and an array mapped trie.
My problem is this: I can't understand why we need a Hash Array Mapped Trie when we already have a Trie, which searches for elements in O(1). And is a Hash Array Mapped Trie really more efficient than a TrieMap?
I implemented a Trie using linked lists (not arrays, as is usual).
My TrieNode stores its children in a linked list (instead of limiting the input type to English letters, digits, etc.).
I wonder what the resulting complexity of my Trie with linked-list nodes is.
Thanks!
If you want to check whether a word w of length l is in the trie, you need to check at most l levels of the trie. At each level, you check whether the current node has a child with the letter you need next, using linear search. The maximum number of child nodes you need to iterate over is the size of the alphabet you are using.
Therefore I think the answer is O(l * |A|), where A is the alphabet you are using; in the case of lowercase Latin letters, A = {a, b, ..., y, z}, so |A| = 26.
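To make that concrete, here is a minimal sketch of such a linked-list trie in Java (illustrative names, not taken from the question); the nested loops are exactly the l levels times the up-to-|A| sibling scan:

    import java.util.LinkedList;

    // Sketch of a trie that keeps each node's children in a linked list.
    class LinkedListTrie {
        private static class Node {
            char letter;
            boolean isWord;                           // a word ends at this node
            LinkedList<Node> children = new LinkedList<>();
            Node(char letter) { this.letter = letter; }
        }

        private final Node root = new Node('\0');

        private static Node findChild(Node parent, char c) {
            for (Node child : parent.children)        // linear scan: up to |A| siblings
                if (child.letter == c) return child;
            return null;
        }

        // O(l * |A|): l levels, a linear sibling scan at each level.
        boolean contains(String word) {
            Node current = root;
            for (char c : word.toCharArray()) {
                current = findChild(current, c);
                if (current == null) return false;
            }
            return current.isWord;
        }

        void insert(String word) {
            Node current = root;
            for (char c : word.toCharArray()) {
                Node next = findChild(current, c);
                if (next == null) {                   // letter missing: append a new child
                    next = new Node(c);
                    current.children.add(next);
                }
                current = next;
            }
            current.isWord = true;
        }

        public static void main(String[] args) {
            LinkedListTrie t = new LinkedListTrie();
            t.insert("cat");
            t.insert("car");
            System.out.println(t.contains("car"));    // true
            System.out.println(t.contains("cab"));    // false
        }
    }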
I wouldn't use either a linked list or an array* as the node storage for a trie, as either would make both search and insertion at each node O(n), and thus the complexity of the trie overall O(n * D), approaching O(A * D), where A and D are the alphabet size and trie depth, respectively.
Comparatively speaking, a hash map at each node strikes me as the least complex/most performant choice for a naive trie implementation, as it reduces search and insertion to O(1) per node and O(D) overall. The only added complexity is in resizing the maps, but if space isn't a concern you could pre-size each map to an initial capacity of A, eliminating the need to ever resize dynamically.
*: This is based on using brute-force search and insertion for the array implementation. If you can create a static 1-to-1 mapping of characters to zero-based indices, you can achieve the same operational and space complexity, and slightly better performance, with an array than with a hash map.
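For comparison with the linked-list version, a minimal sketch of the hash-map-per-node variant described above (a sketch, assuming Java's HashMap; the pre-sizing is the optional tweak mentioned in the answer):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of a trie with a hash map at each node: O(1) expected work
    // per character, O(D) per search or insertion overall.
    class HashMapTrie {
        private static final int ALPHABET = 26;

        private static class Node {
            boolean isWord;
            // Pre-sized near |A| so the child maps rarely need to grow.
            Map<Character, Node> children = new HashMap<>(ALPHABET);
        }

        private final Node root = new Node();

        boolean contains(String word) {
            Node current = root;
            for (char c : word.toCharArray()) {
                current = current.children.get(c);    // O(1) expected per level
                if (current == null) return false;
            }
            return current.isWord;
        }

        void insert(String word) {
            Node current = root;
            for (char c : word.toCharArray())
                current = current.children.computeIfAbsent(c, k -> new Node());
            current.isWord = true;
        }

        public static void main(String[] args) {
            HashMapTrie t = new HashMapTrie();
            t.insert("stack");
            System.out.println(t.contains("stack"));  // true
            System.out.println(t.contains("stick"));  // false
        }
    }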
I implemented a hash table where the keys are the 26 alphabetic characters and the value for each key is an array containing all the words that start with the corresponding character. So, to search for a specific word in that hash table, I search the keys to find the word's first character and, once it's found, search the corresponding array to find the specific word. Does that take O(n^2), since I search the keys for the character and then the corresponding array for the word, or does it take O(log n)?
You didn't post any code, so I'm going to make some assumptions. You say you have a hash table with 26 keys (known in advance), so I'll assume this is implemented optimally as an array of 26 elements, which gives O(1) access by key. Then you say you have an array at each element, containing all of the words that start with that letter. As hash functions go, it's fairly weak, but it certainly is a valid one.
So, when we want to search for a particular value, we look in the appropriate bin based on the first letter (which takes O(1)) and then search linearly through that bin (O(n)). So overall, the complexity is O(n), assuming again that your top-level hash table is implemented efficiently.
Now, since you say "first letter", I'll assume you're operating on strings, which opens up an optimization. Strings have a nice property in that they are easily ordered. So if you ensure that your bins are always kept in lexicographic order, you can get O(log n) lookup with a binary search rather than a linear one. Note that this change takes your insertion from (amortized) O(1) to O(n), so consider whether you'll be inserting or searching more often.
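A sketch of that structure in Java (illustrative; it assumes lowercase a-z input, as the question implies): the top level is a plain 26-element array, each bin is kept sorted, and lookup uses binary search, which is what moves search from O(n) to O(log n) and insertion from amortized O(1) to O(n):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sketch of the 26-bin table: O(1) to pick the bin by first letter,
    // sorted bins + binary search for O(log n) lookup within a bin.
    class FirstLetterIndex {
        private final List<List<String>> bins = new ArrayList<>();

        FirstLetterIndex() {
            for (int i = 0; i < 26; i++) bins.add(new ArrayList<>());
        }

        private List<String> binFor(String word) {
            return bins.get(word.charAt(0) - 'a');    // assumes lowercase a-z
        }

        // O(n): binary search finds the slot in O(log n), but the
        // insertion shifts later elements, which is linear.
        void insert(String word) {
            List<String> bin = binFor(word);
            int pos = Collections.binarySearch(bin, word);
            if (pos < 0) bin.add(-pos - 1, word);     // keep the bin sorted
        }

        // O(log n) via binary search within the bin.
        boolean contains(String word) {
            return Collections.binarySearch(binFor(word), word) >= 0;
        }

        public static void main(String[] args) {
            FirstLetterIndex idx = new FirstLetterIndex();
            idx.insert("stack");
            idx.insert("overflow");
            System.out.println(idx.contains("stack"));   // true
            System.out.println(idx.contains("stacks"));  // false
        }
    }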
I have some doubts about what the best data structure for this task is.
I have multiple texts with #hashtags, and I want to detect the hashtags in each text and insert them into a good data structure.
small example:
hey #my #name is blah #my name blah blah
then I have
#my #name #my
#my 2
#name 1
I'm thinking about using a hash table, so I can insert and look up a hashtag in O(1). The problem is: if I want to print all the hashtags sorted by repetition count (and then alphabetically to break ties), that takes O(N log N). Also, finding the hashtag with the most repetitions takes O(N).
On the other hand, with a binary tree I get insertion and lookup in O(log N), which is worse than the hash table, but I get O(N) printing in order and O(log N) for finding the max (O(1) with a binary heap?).
Which data structure gives me the fastest solution? A binary tree, because it gives a better average complexity? A binary heap? Is there a better data structure?
but I get O(N) printing in order and O(log N) for finding the max (O(1) with a binary heap?)
If you use a binary tree as your main data structure while counting repetitions of the hashtags, it will need to be sorted alphabetically by the word in question, so it won't help you print "sorted by hashtag repetitions". And you can trivially track the max while populating a hash table; there's no need for another pass after the insertions.
Solution: keep a hash map from hashtag to count. Each time you increment a repetition count, if it's now greater than any you've seen before, remember it as max_count.
Then create an array of max_count (+1 if your language uses 0-based indexing) variable-sized arrays, and iterate over the hash table, appending each hashtag to the array index matching its frequency count. To print your results, iterate over the outer frequency array, sorting the variable-length array of hashtags at each index before printing them.
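A sketch of that whole pipeline in Java (illustrative names, and the tokenizing is deliberately naive), run against the example text from the question:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Count with a hash map (tracking max_count as we go), then bucket
    // hashtags by count and print each bucket in alphabetical order.
    class HashtagCounts {
        public static void main(String[] args) {
            String text = "hey #my #name is blah #my name blah blah";

            Map<String, Integer> counts = new HashMap<>();
            int maxCount = 0;
            for (String token : text.split("\\s+")) {
                if (!token.startsWith("#")) continue;
                int c = counts.merge(token, 1, Integer::sum); // bump repetition count
                maxCount = Math.max(maxCount, c);             // remember running max
            }

            // One bucket per possible count, indices 0..maxCount (0 stays unused).
            List<List<String>> byCount = new ArrayList<>();
            for (int i = 0; i <= maxCount; i++) byCount.add(new ArrayList<>());
            counts.forEach((tag, c) -> byCount.get(c).add(tag));

            // Most frequent first; sort each bucket to break ties alphabetically.
            for (int c = maxCount; c >= 1; c--) {
                List<String> bucket = byCount.get(c);
                Collections.sort(bucket);
                for (String tag : bucket) System.out.println(tag + " " + c);
            }
            // Prints: "#my 2" then "#name 1", matching the example above.
        }
    }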
How is a trie useful if one has to read at least all the characters in the input array of strings to insert the strings one by one into the trie? That alone is O(size_of_array). Suppose the input array of strings is {"hello", "world", "stack", "overflow"} and we want to search for "stack"; then we would have to traverse the whole array just to insert the keys into the trie, so the complexity is O(size_of_array). We could do that without a trie.
The point of a trie is to serve many queries.
A trie offers a relatively cheap insertion of each string.
A trie offers a relatively cheap search for a string.
That means that if you have an array and you need to query the existence of a string in it just once, the trie will probably not be very helpful, since creating it for a single query is wasteful.
However, if you have a dictionary and you are going to query it millions of times, it is significantly more efficient to search the trie than to repeatedly search the array - and that's where you benefit from it.
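For a rough sense of scale (illustrative numbers, not from the question): with a dictionary of n = 10^6 words of average length l = 10, answering m = 10^6 membership queries by scanning the array each time costs on the order of m * n * l = 10^13 character comparisons. Building the trie once costs about n * l = 10^7, and each query afterwards is only O(l), so the same million queries cost roughly 10^7 + m * l = 2 * 10^7 operations in total.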
Given a sorted dict (hash table, map, or whatever key/value structure), you can easily do a binary search to look for an item. If we assume the keys are unique but values can repeat, what data structure can we use to get O(log n) retrieval by key and also an O(log n) query for the count of entries whose value equals something in the given data?
Two binary search trees, one for the keys and one for the values, with mutual pointers between them, will provide the required functionality. The pointers can be many-to-one from keys to values and one-to-many from values to keys.
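One way to sketch the idea in Java (an illustration only, using two balanced-tree maps rather than hand-rolled trees with explicit cross-pointers; the value-side tree stores a count per value, which stands in for the one-to-many links):

    import java.util.TreeMap;

    // Two red-black trees via TreeMap: one maps key -> value (O(log n)
    // retrieval by key), the other maps value -> number of keys holding
    // it (O(log n) count queries).
    class KeyValueIndex<K extends Comparable<K>, V extends Comparable<V>> {
        private final TreeMap<K, V> byKey = new TreeMap<>();
        private final TreeMap<V, Integer> valueCounts = new TreeMap<>();

        void put(K key, V value) {
            V old = byKey.put(key, value);
            if (old != null) {                            // key existed: retire old value
                valueCounts.merge(old, -1, Integer::sum);
                if (valueCounts.get(old) == 0) valueCounts.remove(old);
            }
            valueCounts.merge(value, 1, Integer::sum);
        }

        V get(K key) {                                    // O(log n)
            return byKey.get(key);
        }

        int countOfValue(V value) {                       // O(log n)
            return valueCounts.getOrDefault(value, 0);
        }

        public static void main(String[] args) {
            KeyValueIndex<String, Integer> idx = new KeyValueIndex<>();
            idx.put("a", 7);
            idx.put("b", 7);
            idx.put("c", 3);
            System.out.println(idx.get("b"));             // 7
            System.out.println(idx.countOfValue(7));      // 2
        }
    }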