Data Structure: Dictionary Like Tree - data-structures

I think I've heard that there is a tree-like data structure for storing dictionary entries.
It may look like:
c ┬ a ┬ b
  │   ├ r
  │   ├ s ─ e
  │   └ t
  ├ i ─ ...
  :
Is there any name for this data structure?
I cannot find it...
Thanks in advance for your help!

A trie might be what you're looking for.
A trie, also called digital tree and sometimes radix tree or prefix tree ..., is a kind of search tree - an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. ... [A node's] position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node...
A trie for keys "A","to", "tea", "ted", "ten", "i", "in", and "inn".
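As a rough illustration of the structure described above, here is a minimal sketch in Python. The class and method names (`TrieNode`, `Trie`, `insert`, `has_prefix`) are made up for this example and do not come from any library; each node keeps its children in a plain dict keyed by character.

```python
class TrieNode:
    """One node per character; is_key marks the end of a stored key."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_key = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            # create the child on demand, then descend into it
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def _walk(self, text):
        """Follow text from the root; return the node reached, or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, key):
        node = self._walk(key)
        return node is not None and node.is_key

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


trie = Trie()
for word in ["A", "to", "tea", "ted", "ten", "i", "in", "inn"]:
    trie.insert(word)
```

With these keys, `"tea" in trie` is true, while `"te"` is only a prefix: `trie.has_prefix("te")` is true but `"te" in trie` is false.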

Related

How to reverse MACStringToOUI() in ClickHouse so I get the mac address with empty last 3 octets

SELECT MACStringToOUI('aa:bb:cc:dd:ee:ff')
gives me 11189196, which is the first three octets as a UInt64 number.
I'd like to convert it back to MacAddress so the desired result is aa:bb:cc:00:00:00.
I believe there's no native function for that. Do I have to move bits manually?
Multiply by 256^3, which shifts the value left by three octets:
SELECT MACNumToString(MACStringToOUI('aa:bb:cc:dd:ee:ff')*256*256*256) r;
Query id: 3a3637c3-d068-4b00-9024-01129517c3e2
┌─r─────────────────┐
│ AA:BB:CC:00:00:00 │
└───────────────────┘
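To see why the multiplication works, the same arithmetic can be checked outside ClickHouse. This is only a sketch in Python; `oui_to_mac_string` is a hypothetical helper name, not a ClickHouse function.

```python
def oui_to_mac_string(oui: int) -> str:
    """Format a 24-bit OUI as a MAC string whose last three octets are zero."""
    mac = oui << 24                    # same as oui * 256 * 256 * 256
    octets = mac.to_bytes(6, "big")    # a MAC address is 48 bits, i.e. six bytes
    return ":".join(f"{b:02X}" for b in octets)


print(oui_to_mac_string(11189196))  # AA:BB:CC:00:00:00
```

11189196 is 0xAABBCC, so shifting it left by 24 bits gives 0xAABBCC000000.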

Sorting Dictionary in Julia

I have a dictionary in Julia that I want to sort by values. I found a couple of ways to do it. For instance
dict = Dict(i => sqrt(i*rand()) for i = 1:20)
dict= sort(dict;byvalue = true) # method 1
dict = OrderedDict(i => sqrt(i*rand()) for i = 1:20) #method 2 using ordereddict package [https://github.com/JuliaCollections/DataStructures.j][1]
# I don't want to use collect() method as I do not want the tuples of dictionary.
However, sometimes these two methods lose their order down the line when the dictionaries are passed from function to function. Is there any other approach where the ordered dictionary is immutable in Julia?
According to the docs, for OrderedDict "order" refers to insertion order.
I don't know how this can "lose its order down the line", but maybe you are just mutating stuff?
Probably what you want is closer to a SortedDict; however, this sorts by keys, not values. A dictionary sorted by values is a bit of an unusual application.
If you want a mutable data structure with fast lookup by key and iteration sorted by value, you could emulate this by a two-level approach: a normal dict for storing a mapping between original keys and tokens, and a second SortedMultiDict{ValueType, Nothing} to emulate a sorted multiset into which the tokens index. Then you define your own mechanism for indirect lookup through tokens somehow like this:
function insert!(d::ValueSortedDict, k, v)
    _, token = insert!(d.values, v, nothing)
    d.keys[k] = token
end
getindex(d::ValueSortedDict, k) = deref_key((d.values, d.keys[k]))
And accordingly for other get/set style functions. (I didn't test this, it's just read off the docs.)
OTOH, if you never intend to mutate things, you can just do a very similar thing where you store a Dict{KeyType, Int} and a Vector{ValueType} together and sort! the vector once at the beginning. (Dictionaries.jl, described in @mcabbot's answer, is basically implementing this.)
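The immutable variant (a key-to-index dictionary plus a value vector sorted once) can be sketched as follows. This is Python rather than Julia and is only meant to show the idea; `ValueSortedView` is a hypothetical name, and the sketch assumes the values are mutually comparable.

```python
from collections.abc import Mapping


class ValueSortedView(Mapping):
    """Read-only mapping that iterates in ascending order of value."""

    def __init__(self, mapping):
        pairs = sorted(mapping.items(), key=lambda kv: kv[1])  # sort once
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)
        # key -> position in the sorted value tuple
        self._index = {k: i for i, k in enumerate(self._keys)}

    def __getitem__(self, key):
        return self._values[self._index[key]]

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


view = ValueSortedView({"a": 3.0, "b": 1.0, "c": 2.0})
print(list(view.items()))
```

Because the class defines no `__setitem__`, item assignment raises a `TypeError`, so the order cannot be disturbed after construction.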
You might be interested in Dictionaries.jl:
julia> dict = Dict(i => (i/10 + rand(1:99)) for i = 1:7)
Dict{Int64, Float64} with 7 entries:
5 => 16.5
4 => 23.4
6 => 98.6
7 => 56.7
2 => 7.2
3 => 58.3
1 => 85.1
julia> using Dictionaries
julia> sort(Dictionary(dict))
7-element Dictionary{Int64, Float64}
2 │ 7.2
5 │ 16.5
4 │ 23.4
7 │ 56.7
3 │ 58.3
1 │ 85.1
6 │ 98.6
julia> map(sqrt, ans)
7-element Dictionary{Int64, Float64}
2 │ 2.6832815729997477
5 │ 4.06201920231798
4 │ 4.8373546489791295
7 │ 7.52994023880668
3 │ 7.635443667528429
1 │ 9.22496612459905
6 │ 9.929753269845127

Can an enumeration be a class? - C++ [enum class] [Theory]

My End Goal:
Create the implementation of a hash table from scratch. The twist: if the number of entries in a hash bucket is greater than 10, the bucket is stored as a Binary Search Tree; otherwise it is stored as a Linked List.
To my knowledge, the only way to achieve this is through an
enum class type_name { a, b };
My Question: Can 'a', and 'b' be classes?
Thought Process:
So to implement a hash table, I am thinking of making an array of the enumerated class. This way, as soon as the Linked List at any index of the array grows past 10 entries, it will be replaced with a Binary Search Tree.
If this is not possible, what would be the best way to achieve this? My implementations of Linked List and Binary Search Tree are complete and work perfectly.
Note: I am not looking for a complete implementation or full code. I would like to be able to code it myself, but I think my theory is flawed.
Visualization of My Idea
----------------------------------H A S H T A B L E---------------------------------------
enum class Hash { LinkedList, Tree };
INDEXES: 0 1 2 3 4
Hash eg = new Hash [ LinkedList, LinkedList, LinkedList, LinkedList, LinkedList ]
//11th element is inserted into eg[2]
//Method to Replace Linked List with Binary Search Tree
if (eg[2].getSize() > 10) {
    Tree toReplace; //"Tree toReplace();" would declare a function, not an object
    Node *follow = eg[2].headptr; //Each linked list is made of connected
    //headptr is a pointer to the first element of the linked list
    while (follow != nullptr) {
        toReplace.insert(follow->value);
        follow = follow->next(); //Next is the pointer to the next element in the linked list
    }
}
//Now, the Linked List at eg[2] is replaced with a Binary Search Tree
Hash eg = new Hash [ LinkedList, LinkedList, Tree, LinkedList, LinkedList ]
Short answer: No.
An enumeration is a distinct type whose value is restricted to a range
of values (see below for details), which may include several
explicitly named constants ("enumerators"). The values of the
constants are values of an integral type known as the underlying type
of the enumeration.
http://en.cppreference.com/w/cpp/language/enum
Classes will not be 'values of an integral type'.
You may be able to achieve what you want with a tuple.
http://en.cppreference.com/w/cpp/utility/tuple
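Since enumerators are plain integral values, one possible reading of the idea is to let the enumeration act only as a tag that records which container a bucket currently uses. The following is a rough sketch in Python, not C++. `Kind`, `Bucket` and `THRESHOLD` are hypothetical names, a Python list stands in for the linked list, and a sorted list stands in for the binary search tree.

```python
import bisect
from enum import Enum


class Kind(Enum):
    LINKED_LIST = 1
    TREE = 2


THRESHOLD = 10  # bucket size above which the question switches containers


class Bucket:
    """Stores the entries plus a tag saying which container is in use."""

    def __init__(self):
        self.kind = Kind.LINKED_LIST
        self.items = []    # stand-in for the linked list

    def insert(self, value):
        if self.kind is Kind.LINKED_LIST:
            self.items.append(value)
            if len(self.items) > THRESHOLD:
                # stand-in for rebuilding the entries as a binary search tree
                self.items = sorted(self.items)
                self.kind = Kind.TREE
        else:
            bisect.insort(self.items, value)    # keep the "tree" ordered


table = [Bucket() for _ in range(5)]
for n in range(11):
    table[2].insert(n)
print(table[2].kind)
```

The enumeration never holds the containers itself; the bucket object does, and the tag tells the rest of the code which operations apply.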

How to insert duplicate keys into b trees

Please answer about B-trees, not B+ trees.
I have two questions.
What happens when you insert duplicate keys into a B-tree?
For the following input, what will a B-tree with t=3 look like?
1,1,1,1,1,1,1,1,1,1,1,1,1,1
Can a parent node in a B-tree with t=3 look like this?
1,1,4,10
If so, will the child between the first key "1" and the second key "1" contain only the value "1"?
Just like hash tables, each node in the tree should store a link to a list of items associated with that key. You will store unique keys in the tree but the links will point to a list with possibly multiple items:
[node, key=1, ptr=l], l={1,1,1,1,1,1,1...}
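The "unique key plus list of items" idea can be sketched like this. It is an assumption-laden illustration: a flat sorted list stands in for the B-tree's nodes, and `UniqueKeyIndex` is a hypothetical name. Only the handling of duplicates is the point here.

```python
import bisect


class UniqueKeyIndex:
    """Sorted unique keys, each paired with a list of its duplicate items."""

    def __init__(self):
        self.keys = []     # sorted, no duplicates
        self.items = []    # items[i] holds everything stored under keys[i]

    def insert(self, key, item):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.items[i].append(item)      # duplicate key: extend its list
        else:
            self.keys.insert(i, key)        # new key: a real B-tree would insert or split here
            self.items.insert(i, [item])

    def find(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.items[i]
        return []


index = UniqueKeyIndex()
for _ in range(14):
    index.insert(1, 1)
index.insert(10, 10)
index.insert(4, 4)
print(index.keys, len(index.find(1)))
```

With this layout the tree itself never sees a repeated key, so the usual B-tree invariants are unaffected by duplicates.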

String Algorithm Question - Word Beginnings

I have a problem, and I'm not too sure how to solve it without going down the route of inefficiency. Say I have a list of words:
Apple
Ape
Arc
Abraid
Bridge
Braide
Bray
Boolean
What I want to do is process this list and get what each word starts with up to a certain depth, e.g.
a - Apple, Ape, Arc, Abraid
ab - Abraid
ar - Arc
ap - Apple, Ape
b - Bridge, Braide, Bray, Boolean
br - Bridge, Braide, Bray
bo - Boolean
Any ideas?
You can use a Trie structure.
        (root)
         /
        a - b - r - a - i - d
       / \   \
      p   r   e
     / \   \
    p   e   c
   /
  l
 /
e
Just find the node that you want and get all its descendants, e.g., if I want ap-:
        (root)
         /
        a - b - r - a - i - d
       / \   \
     [p]  r   e
     / \   \
    p   e   c
   /
  l
 /
e
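Finding the prefix node and collecting its descendants might look like the sketch below. It uses nested dicts as the trie, lowercases the words for matching, and stores the original spelling under a marker key. `END`, `build_trie` and `words_with_prefix` are names invented for this example, and the sketch assumes no word contains the marker character.

```python
END = "$"  # hypothetical marker key meaning "a word ends at this node"


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = word    # keep the original spelling at the end node
    return root


def words_with_prefix(root, prefix):
    # step 1: walk down to the node for the prefix
    node = root
    for ch in prefix.lower():
        if ch not in node:
            return []
        node = node[ch]
    # step 2: depth-first walk over everything below that node
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == END:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_trie(["Apple", "Ape", "Arc", "Abraid",
                   "Bridge", "Braide", "Bray", "Boolean"])
print(words_with_prefix(trie, "ap"))
```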
Perhaps you're looking for something like:
#!/usr/bin/env python3
def match_prefix(pfx, seq):
    '''return subset of seq that starts with pfx'''
    results = list()
    for i in seq:
        if i.startswith(pfx):
            results.append(i)
    return results
def extract_prefixes(lngth, seq):
    '''return all prefixes in seq of the length specified'''
    results = dict()
    lngth += 1  # callers pass 0-based values from range(depth)
    for i in seq:
        if i[0:lngth] not in results:
            results[i[0:lngth]] = True
    return sorted(results.keys())
def gen_prefix_indexed_list(depth, seq):
    '''return a dictionary of all words matching each prefix
    up to depth keyed on these prefixes'''
    results = dict()
    for each in range(depth):
        for prefix in extract_prefixes(each, seq):
            results[prefix] = match_prefix(prefix, seq)
    return results
if __name__ == '__main__':
    words = '''Apple Ape Arc Abraid Bridge Braide Bray Boolean'''.split()
    test = gen_prefix_indexed_list(2, words)
    for each in sorted(test.keys()):
        print("%s:\t\t" % each, ' '.join(test[each]))
That is, you want to generate all the prefixes present in a list of words, with lengths between one and some number you specify (2 in this example). Then you want to produce an index of all words matching each of these prefixes.
I'm sure there are more elegant ways to do this. For a quick and easily explained approach, I've just built this from a simple bottom-up functional decomposition of the apparent spec. If the end result values are lists each matching a given prefix, then we start with the function to filter out such matches from our inputs. If the end result keys are all prefixes between 1 and some N that appear in our input, then we have a function to extract those. Then our spec is an extremely straightforward nested loop around that.
Of course this nested loop might be a problem. Such things usually equate to an O(n^2) efficiency. As shown, this does up to roughly C * N * N prefix checks against the original list (C is the constant number representing the prefixes of length 1, 2, etc.; while N is the length of the list).
If this decomposition provides the desired semantics then we can look at improving the efficiency. The obvious approach would be to lazily generate the dictionary keys as we iterate once over the list ... for each word, for each prefix length, generate key ... append this word to the list/value stored at that key ... and continue to the next word.
There's still a nested loop ... but it's the short loop for each key/prefix length. That alternative design has the advantage of allowing us to iterate over lists of words from any iterable, not just an in memory list. So we could iterate over lines of a file, results generated from a database query, etc --- without incurring the memory overhead of keeping the entire original word list in memory.
Of course we're still storing the dictionary in memory. However, we can also change that and decouple the logic from the input and storage. When we append each input to the various prefix/key values, we don't care if they're lists in a dictionary, or lines in a set of files, or values being pulled out of (and pushed back into) a DBM or other key/value store (for example some sort of CouchDB or other "noSQL clustered/database").
The implementation of that is left as an exercise to the reader.
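For reference, the single-pass variant described above (one pass over the words, a short inner loop over prefix lengths) might be sketched as follows. `index_by_prefix` is a hypothetical name, and the dictionary is still held in memory.

```python
from collections import defaultdict


def index_by_prefix(words, depth):
    """Map every prefix of length 1..depth to the words that start with it."""
    index = defaultdict(list)
    for word in words:    # any iterable works, e.g. the lines of a file
        for length in range(1, min(depth, len(word)) + 1):
            index[word[:length]].append(word)
    return dict(index)


words = "Apple Ape Arc Abraid Bridge Braide Bray Boolean".split()
index = index_by_prefix(words, 2)
for prefix in sorted(index):
    print(prefix, index[prefix])
```

Each word is visited once and contributes at most `depth` entries, so the work grows linearly with the number of words for a fixed depth.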
I don't know what you have in mind when you say "route of inefficiency", but a pretty obvious solution (possibly the one you are thinking of) comes to mind. A trie looks like the structure for this kind of problem, but it's costly in terms of memory (there is a lot of duplication) and I'm not sure it makes things faster in your case. The memory usage might pay off if the information were retrieved many times, but your question suggests you want to generate the output file once and store it. In that case the trie would be built just to be traversed once, which I don't think makes sense.
My suggestion is to just sort the list of words in lexical order and then traverse the sorted list once for each beginning length, up to the maximum length.
create a dictionary with keys being strings and values being lists of strings
for(i = 1 to maxBeginningLength)
{
    for(every word in your sorted list)
    {
        if(the word's length is no less than i)
        {
            add the word to the list in the dictionary at a key
            being the beginning of the word of length i
        }
    }
}
store contents of the dictionary to the file
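One way to turn the pseudocode above into something runnable is sketched below. It leans on the sorted order: once the list is sorted, words sharing a beginning sit next to each other, so `itertools.groupby` can cut the list into runs. `beginnings` is a hypothetical name, and writing the result to a file is left out.

```python
from itertools import groupby


def beginnings(words, max_len):
    ordered = sorted(words)    # lexical order puts equal beginnings side by side
    result = {}
    for i in range(1, max_len + 1):
        # skip words shorter than the current beginning length
        long_enough = (w for w in ordered if len(w) >= i)
        for prefix, group in groupby(long_enough, key=lambda w: w[:i]):
            result[prefix] = list(group)
    return result


words = "Apple Ape Arc Abraid Bridge Braide Bray Boolean".split()
table = beginnings(words, 2)
for prefix in sorted(table):
    print(prefix, table[prefix])
```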
Using this PHP trie implementation will get you about 50% there. It's got some stuff you don't need and it doesn't have a "search by prefix" method, but you can write one yourself easily enough.
$trie = new Trie();
$trie->add('Apple', 'Apple');
$trie->add('Ape', 'Ape');
$trie->add('Arc', 'Arc');
$trie->add('Abraid', 'Abraid');
$trie->add('Bridge', 'Bridge');
$trie->add('Braide', 'Braide');
$trie->add('Bray', 'Bray');
$trie->add('Boolean', 'Boolean');
It builds up a structure like this:
Trie Object
(
    [A] => Trie Object
        (
            [p] => Trie Object
                (
                    [ple] => Trie Object
                    [e] => Trie Object
                )
            [rc] => Trie Object
            [braid] => Trie Object
        )
    [B] => Trie Object
        (
            [r] => Trie Object
                (
                    [idge] => Trie Object
                    [a] => Trie Object
                        (
                            [ide] => Trie Object
                            [y] => Trie Object
                        )
                )
            [oolean] => Trie Object
        )
)
If the words were in a Database (Access, SQL), and you wanted to retrieve all words starting with 'br', you could use:
Table Name: mytable
Field Name: mywords
"Select * from mytable where mywords like 'br*'" - For Access - or
"Select * from mytable where mywords like 'br%'" - For SQL

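The database approach can be tried out with Python's built-in `sqlite3` module. This is only a sketch: it uses an in-memory SQLite database rather than Access or a server, reuses the table and field names from the answer, and relies on SQLite's `LIKE` being case-insensitive for ASCII letters by default.

```python
import sqlite3

words = ["Apple", "Ape", "Arc", "Abraid", "Bridge", "Braide", "Bray", "Boolean"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mywords TEXT)")
conn.executemany("INSERT INTO mytable (mywords) VALUES (?)",
                 [(w,) for w in words])

prefix = "br"
rows = conn.execute(
    "SELECT mywords FROM mytable WHERE mywords LIKE ? ORDER BY mywords",
    (prefix + "%",),    # % is the SQL wildcard for any run of characters
).fetchall()
matches = [row[0] for row in rows]
print(matches)
conn.close()
```

Passing the pattern as a parameter avoids building the SQL string by hand.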