I faced a confusing question today: which container would be best suited for storing a mutable data structure as an entry?
Like map, set, tree, list, or an array?
I am researching hash tables and hash maps, and everything I have read or watched gives a very vague description of the differences. From messing around with both in NetBeans, they seem to have the same functions and do the same things. What are the fundamental differences between these two data structures?
There is no fundamental difference; the same thing goes by different names in different programming languages, so what people call it depends on their background and the language they use. For example, C++ calls it std::unordered_map, while Java ships both HashMap and the older, synchronized Hashtable.
Also, one difference could be inferred from the naming alone: a HashTable might store only hashed keys with no associated values, whereas a HashMap lets you retrieve a value by its hashed key. Internally both use the same algorithm and can be considered the same data structure.
HashTable sounds to me like a concrete data structure, although it has numerous variants depending on what happens when a collision occurs, when the table fills up, when it empties.
Map sounds like an abstract data structure, something defined by the available operations (Dictionary would be a potential other name for the same data structure, but I'd not be surprised if some nomenclature defined both with a nuance somewhere).
HashMap sounds like an implementation of the Map abstract data structure using a HashTable concrete data structure.
Again, I'd not be surprised if a language or a library provided both, with a nuance somewhere (HashMap, for instance, could provide only the operations defined for a Map, while HashTable provides everything that makes sense for a HashTable).
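To make the abstract/concrete split above more tangible, here is a minimal Python sketch. The class and method names (Map, HashTable, HashMap, put, get, insert, find, load_factor) are made up for illustration and do not correspond to any particular library. Separate chaining and doubling at a load factor of 0.75 are arbitrary choices among the many variants mentioned above.

```python
from abc import ABC, abstractmethod


class Map(ABC):
    """Abstract data type: defined only by the operations it offers."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key, default=None): ...


class HashTable:
    """Concrete structure: an array of buckets with separate chaining."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # The hash of the key picks the bucket (the array slot).
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return True, v
        return False, None

    def _resize(self, capacity):
        # Rehash every pair into a larger bucket array.
        old = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(capacity)]
        for k, v in old:
            self._bucket(k).append((k, v))

    def load_factor(self):
        # A table-specific detail that the Map interface does not expose.
        return self._size / len(self._buckets)


class HashMap(Map):
    """The Map abstract type implemented on top of a HashTable."""

    def __init__(self):
        self._table = HashTable()

    def put(self, key, value):
        self._table.insert(key, value)

    def get(self, key, default=None):
        found, value = self._table.find(key)
        return value if found else default
```

A caller who only needs a Map never sees buckets or load factors; swapping HashMap for a tree-based Map would not change the calling code.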
I'm looking for a static data structure with amortised constant time associative lookup. The only operations I want to perform are lookup and construction. Also being functional is a must. I've had a look at finger trees, but I can't seem to wrap my head round them. Are there any good docs on them or, better yet, a simpler static functional data structure?
I assume that by "functional" and "static" you mean an immutable structure that cannot be modified after its construction, by "lookup" you mean a dictionary-like key-value lookup, and by "construction" you mean the initial construction of the data structure from a given set of elements.
In that case an immutable, hash-table-based dictionary would work. The disadvantage is that insertions and removals are O(N), because each one has to rebuild the structure, but you state that this is acceptable in your case.
Depending on the programming language you are using, a datatype that could be used to implement this might or might not be available. In Erlang a tuple could be used. Haskell has an immutable array in Data.Array.IArray.
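As a rough illustration of the idea, here is a Python sketch of a hash table frozen into nested tuples at construction time. The function names build_static_dict and lookup are hypothetical, and the sketch assumes the keys are hashable and unique. It is written in Python only for brevity; the same shape should carry over to an Erlang tuple or a Haskell immutable array.

```python
def build_static_dict(pairs):
    """Build an immutable lookup table from (key, value) pairs, once.

    Assumes keys are unique; one bucket per element keeps buckets short.
    """
    pairs = list(pairs)
    n = max(1, len(pairs))
    buckets = [[] for _ in range(n)]
    for key, value in pairs:
        buckets[hash(key) % n].append((key, value))
    # Freeze everything: a tuple of tuples cannot be modified in place.
    return tuple(tuple(bucket) for bucket in buckets)


def lookup(table, key, default=None):
    """Hash to one bucket, then scan that (expected short) bucket."""
    for k, v in table[hash(key) % len(table)]:
        if k == key:
            return v
    return default
```

For example, `lookup(build_static_dict([("a", 1), ("b", 2)]), "b")` finds 2 without ever modifying the table. "Adding" a key means calling build_static_dict again, which is the O(N) cost mentioned above.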
You'd have to look at this from an information-theoretical point of view: The more key/value pairs you store, the more keys you'd have to recognize on an associative lookup. Thus, no matter what you do, the more keys you have, the more complex your lookup will be.
Constant time lookup is only possible when your key directly gives you the address (or something equivalent) of the element you want to look up.
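The case where the key directly gives the address is usually called a direct-address table. A minimal sketch, assuming the keys are small non-negative integers below a known bound (build_direct_table and universe_size are illustrative names):

```python
def build_direct_table(pairs, universe_size):
    """Direct-address table: slot i holds the value for key i, or None."""
    slots = [None] * universe_size
    for key, value in pairs:
        slots[key] = value  # the key itself is the index
    return tuple(slots)     # frozen after construction


table = build_direct_table([(2, "two"), (5, "five")], universe_size=8)
```

Here `table[5]` is a single index operation with no searching, at the price of reserving one slot for every possible key.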
Recently I started playing with MapDB and learning about its interesting properties. As I understand it now, it has three major data types: BTree, HashMap, and HashSet. What is a little obscure to me is when it is better to use HashMap (and HashSet) rather than BTree. Are there any pros and cons to using each data structure compared to the other?
In 1.0, HashMap is better for larger keys; it also has entry expiration based on TTL or maximal size. TreeMap is sorted and has a data pump.
I would recommend HashMap in general.
I'm going through some potential interview questions, one of which is whether you can implement a map or a linked list using a tree.
But even after some time googling I don't have a clear idea of exactly what a map is, or how it differs from an array or a hash table, for example. Could anybody provide a clear description?
Can it, and a linked list, be written as a tree?
A Map, aka Dictionary or associative array, is a data structure that allows you to look up a value using a key.
A Java Map can be implemented as a HashMap or a TreeMap; that suggests that hash map is one possible implementation and yes, you can implement a Map as a tree.
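To show that a map really can be written as a tree, here is a small Python sketch of a map backed by an unbalanced binary search tree. This is not how Java's TreeMap is implemented (that one is a balanced red-black tree); the class below and its put, get, and keys methods are illustrative only, and keys are assumed to be mutually comparable.

```python
class TreeMap:
    """A map (key -> value) stored in an unbalanced binary search tree."""

    class _Node:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def __init__(self):
        self._root = None

    def put(self, key, value):
        if self._root is None:
            self._root = self._Node(key, value)
            return
        node = self._root
        while True:
            if key == node.key:
                node.value = value  # overwrite an existing key
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, self._Node(key, value))
                return
            node = child

    def get(self, key, default=None):
        node = self._root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return default

    def keys(self):
        """In-order traversal returns the keys in sorted order."""
        def walk(node):
            if node is not None:
                yield from walk(node.left)
                yield node.key
                yield from walk(node.right)
        return list(walk(self._root))
```

The same sketch also speaks to the linked-list half of the question: if the keys are inserted in ascending order, every node ends up with only a right child, so the tree degenerates into a singly linked list.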
Can it (a map), and a linked list be written as a tree?
Maps are usually represented with arrays. To determine where an entry goes in a map, you need to compute the hash of its key. Look here for a better explanation.
Trees (with an undetermined number of nodes) can be implemented using lists (see here for further discussion). Lists are not usually implemented as trees.
I'd encourage you to get this book, which is a classic on data structures and will give you a lot of really great information.
I've been working on a toy database in Clojure and wanted to implement a B+ tree. When I started thinking about it, I realised there may not be a way to have something like a pointer/reference to other nodes in Clojure. It doesn't matter for something like a BST or a lot of other tree structures, since all you need is to store a node's children. But what do I do in something like a B+ tree, where I need to be able to refer to a node's sibling?
When looking for solutions, I came across a post in Google Groups about how you don't implement a Doubly linked list in Clojure because there are other ways of doing things in Clojure.
What do I do for a B+ Tree though?
It's not that it's difficult to have references to objects in Clojure, but generally these references are immutable. It's immutability that makes the doubly linked list impossible because, unlike a singly linked list, you can't change any part of it without either mutating or copying every other node.
To see this, suppose I have a singly linked list,
a -> b -> c
and suppose I want to change the head of it. I can do so without changing the entirety of the list. I create a new list by creating a new value for the head and reusing the tail:
a'-> b -> c
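The same head replacement can be sketched in Python with immutable cells. Cell and to_list are names invented for this illustration; Clojure's own lists give you this sharing for free.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """One immutable cons cell: a value plus a reference to the rest."""
    head: str
    tail: Optional["Cell"]


def to_list(cell):
    """Walk the cells and collect their values into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


old = Cell("a", Cell("b", Cell("c", None)))  # a -> b -> c
new = Cell("a'", old.tail)                   # a' -> b -> c, tail reused
```

Both lists remain valid afterwards, and `new.tail is old.tail` holds: the b -> c part exists only once in memory.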
But doubly linked lists are impossible. So in Clojure, and other functional languages, we sometimes use a zipper in such situations.
Now, suppose you really need mutable references in Clojure. How do you do it? Well, depending on what concurrency semantics you need, Clojure has vars, refs, atoms, etc.
Also, with deftype, you can create objects that have mutable fields, and these mutable fields can hold references to other things. You can also use raw Java arrays in Clojure for this same purpose.
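For readers who have not met zippers, here is a minimal list zipper sketched in Python rather than Clojure. It is only an illustration of the idea, not the clojure.zip API, and every name in it is hypothetical. The zipper keeps the items before the focus (nearest first), the focused item, and the items after it, so moving either way or replacing the focus creates a new zipper without mutating the old one.

```python
from typing import Any, NamedTuple


class Zipper(NamedTuple):
    left: Any    # cons list (head, tail) of items before the focus, nearest first
    focus: Any   # the item currently under the cursor
    right: Any   # cons list (head, tail) of items after the focus, nearest first


def from_items(items):
    """Build a zipper focused on the first item (items must be non-empty)."""
    items = list(items)
    right = None
    for x in reversed(items[1:]):
        right = (x, right)
    return Zipper(None, items[0], right)


def forward(z):
    """Move the focus one step right; stay put at the end."""
    if z.right is None:
        return z
    head, tail = z.right
    return Zipper((z.focus, z.left), head, tail)


def backward(z):
    """Move the focus one step left; stay put at the start."""
    if z.left is None:
        return z
    head, tail = z.left
    return Zipper(tail, head, (z.focus, z.right))


def replace(z, value):
    """Return a new zipper with the focused item swapped out."""
    return z._replace(focus=value)


def zipper_to_list(z):
    out = []
    node = z.left
    while node is not None:
        out.append(node[0])
        node = node[1]
    out.reverse()            # left side is stored nearest-first
    out.append(z.focus)
    node = z.right
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out
```

Each move or replacement allocates a constant number of new cells and shares the rest, which is how a zipper gives you "go left / go right / edit here" without back-pointers.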
Is your database going to be an in-memory database, or a disk-backed database? If on disk, I think that the issue of pointer swizzling is trickier than that of having mutable references.
Getting back to the issue of functional data structures, I believe that it is possible to create B-trees which have purely functional semantics. The first clue here is that it's a tree, and trees are the bread, butter, and meat of functional data structures. Secondly, note that there are databases which work in an append-only fashion, CouchDB for instance. This has the benefit that the database is its own log, in a sense. To get more of an idea of the costs and benefits of this approach you might want to watch Slava Akhmechet's presentation. His company, RethinkDB, eventually took a sort of hybrid approach, IIRC.
You may wish to look at Chouser's finger trees in Clojure to see how the functionality of a doubly-linked list may be implemented using functional style.
Alternatively, you may simply want to step back and ask yourself why you believe that B+ is a good choice of data structure for a functional language.
If you are unfamiliar with the alternatives, you may want to look at Chris Okasaki's book "Purely Functional Data Structures."