Is QMap a hash table? - data-structures

I have used QMap many times but have never used QHash. Now I'm reading about hash tables.
Is QMap a hash table?
I presume that somewhere inside a QHash we will find the ideas of hash maps. Should I say QHash is an implementation of a hash map (or hash table) data structure? Is QMap also an implementation of a hash table?
Can I use the terms map and table interchangeably?

No, QMap is not a hash table.
Per the documentation:
The QMap class is a template class that provides a red-black-tree-based dictionary.
In other words, it is a binary search tree that uses the red-black tree algorithm to maintain balance. This means that lookups take O(log n), rather than the O(1) average of QHash.
It also means that QMap will keep the data sorted.
From the documentation you quoted:
QMap and QHash provide very similar functionality. The differences are:
- QHash provides average faster lookups than QMap. (See Algorithmic Complexity for details.)
- When iterating over a QHash, the items are arbitrarily ordered. With QMap, the items are always sorted by key.
- The key type of a QHash must provide operator==() and a global qHash(Key) function. The key type of a QMap must provide operator<() specifying a total order. Since Qt 5.8.1 it is also safe to use a pointer type as key, even if the underlying operator<() does not provide a total order.
QHash is a hash table. QMap is a binary search tree balanced with the red-black tree algorithm.
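The difference can be illustrated outside Qt. In this minimal Python sketch (helper names are my own), a plain dict plays the role of QHash's hash table, and a sorted list of pairs maintained with bisect stands in for QMap's red-black tree: O(log n) lookups, but keys always come out in sorted order.

```python
import bisect

# Hash-based mapping (analogous to QHash): O(1) average lookup, arbitrary order.
hash_map = {}
for key in ["pear", "apple", "mango"]:
    hash_map[key] = len(key)

# Ordered mapping (analogous to QMap): O(log n) lookup, keys kept sorted.
# A sorted list of (key, value) pairs plays the role of the balanced tree.
sorted_pairs = []

def ordered_insert(key, value):
    i = bisect.bisect_left(sorted_pairs, (key,))
    if i < len(sorted_pairs) and sorted_pairs[i][0] == key:
        sorted_pairs[i] = (key, value)        # replace existing key
    else:
        sorted_pairs.insert(i, (key, value))  # O(log n) search (O(n) shift)

def ordered_lookup(key):
    i = bisect.bisect_left(sorted_pairs, (key,))
    if i < len(sorted_pairs) and sorted_pairs[i][0] == key:
        return sorted_pairs[i][1]
    raise KeyError(key)

for key in ["pear", "apple", "mango"]:
    ordered_insert(key, len(key))

print([k for k, _ in sorted_pairs])  # → ['apple', 'mango', 'pear']
```

Iterating the hash-based map gives no guaranteed key order, while the ordered structure always yields keys sorted, just as the Qt documentation describes for QHash versus QMap.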

How do you implement an ordered hashtable?

I am searching for how to implement an ordered hash table but am not finding anything.
I would like to create a hash table that you can iterate over and that gives you the elements in the order in which you defined or inserted the keys. How do you do this generally, in a performant way? How is this implemented? If a language must be chosen for an example, I am thinking about doing this in JavaScript. I know, for example, that JavaScript objects ("hash maps") are ordered hash maps, but I have no idea how they are implemented. I would like to learn how to implement the same thing from scratch for a custom programming language.
For example, say you are listing the native-script version of a language name, like "עִברִית" for "Hebrew", as the key, and you want to create a map from the native language name to the English one, but you want the entries to stay in the order defined. How do you implement that?
The general, performant solution to this problem is to combine a linked list with a hash table. You have a doubly linked list of Nodes, and each one is indexed by its key in the hash table, so you can reach a Node through either structure in constant time. Broadly, the operations are implemented as follows:
Insert - O(1)* - Append a Node to the end of the linked list, and map its key to that Node in the hash table.
Retrieve by key - O(1)* - Using the hash table, find the corresponding Node and return its value.
Delete by key - O(1)* - Using the hash table, find the corresponding Node and unlink it by updating its neighbours' references, then remove its key from the table.
Traverse - O(n) - Traverse the linked list. n in this case is the number of Nodes, not the full capacity of the hash table.
* The actual insert/retrieve/delete times are subject to the worst-case of the hash table's implementation. Typically this is O(n) worst case, O(1) average.
(An alternative to this is to store a "next key" in each Node instead of a direct pointer to the next node. The runtimes are the same, but traversal needs to involve the hash table whereas in the direct pointer implementation it can be bypassed.)
So if you want to implement this on your own, you'll need a hash table and some Nodes to use within it.
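A minimal sketch of this design in Python (the class and method names are my own, not from any particular library; Python's built-in dict is used here purely as the key-to-Node index):

```python
class Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class OrderedHashMap:
    """Hash table + doubly linked list: O(1) put/get/delete, ordered traversal."""
    def __init__(self):
        self._index = {}                 # key -> Node (the hash table part)
        self._head = self._tail = None   # ends of the doubly linked list

    def put(self, key, value):
        if key in self._index:           # update in place, keep original position
            self._index[key].value = value
            return
        node = Node(key, value)
        self._index[key] = node
        if self._tail is None:           # first element
            self._head = self._tail = node
        else:                            # append to the end of the list
            node.prev = self._tail
            self._tail.next = node
            self._tail = node

    def get(self, key):
        return self._index[key].value

    def delete(self, key):
        node = self._index.pop(key)      # unlink by updating neighbour references
        if node.prev: node.prev.next = node.next
        else:         self._head = node.next
        if node.next: node.next.prev = node.prev
        else:         self._tail = node.prev

    def items(self):                     # O(n) traversal in insertion order
        node = self._head
        while node:
            yield node.key, node.value
            node = node.next
```

For the language-name example: `m = OrderedHashMap(); m.put("עִברִית", "Hebrew")` and so on; iterating `m.items()` then yields entries in the order they were defined.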

Data structure to store objects identified by unique 8digit hexadecimals for fast insertion and lookup

I have a bunch of objects with unique 8-digit hexadecimal identifiers, e.g. [fd4786ac], that I need to construct and look up quickly. Deletion is not a priority. These hexadecimal values are currently being stored as strings.
I have considered a trie (or some variation of one), a skip list, and some variation of a hash table. A skip list would be preferable to an AVL tree since these identifiers are likely, but not guaranteed, to be sequential, and tree re-balancing would be frequent. However, I'm open to other data structures if they better suit my needs.
A good choice would be to convert your keys into 32-bit integers, and then use a hash table.
If you want to write your own just for this use case, then:
Instead of hashing keys all the time or storing hash values, use a bijective hash function and use the hashes instead of the keys.
Since your keys are very small you should probably use open addressing -- it will save space and it's a little faster. Wikipedia will give you lots of choices for probing schemes. I currently like robin hood hashing: https://www.sebastiansylvan.com/post/robin-hood-hashing-should-be-your-default-hash-table-implementation/
Your 8-digit hexadecimal identifiers represent a 4-byte (32-bit) integer, so you could use that value as an index into a (quite large) array with 2^32 entries.
If the array contains 8-byte pointers, this would cost 32 GB - most likely too much to keep in RAM.
So if the number of elements is orders of magnitude below 2^32, use a hash map or a sorted list (O(log n) access).

Data Structure to implement a Word Dictionary

Recently, I was asked in an interview about the usage of data structure.
The question was: what data structure would I use to implement an English dictionary? The dictionary will contain a number of words under each letter, and each word will have one meaning. Also, how would I implement the data structure to update, search, and select different words?
What do you suggest, and what is the reason for your suggestion?
A hash table would be the preferred data structure to implement a dictionary with update, search and selection capabilities.
A hash table is a data structure that stores key-value pairs. It is essentially an array containing all of the keys to search on. A hash function h() is used to compute an index into that array at which an element can be inserted or searched for. So when insertion is required, the hash function is used to find the location where the element needs to be inserted.
Insertion under reasonable assumptions is O(1). Each time we insert data, it takes O(1) time to insert it (assuming the hash function is O(1)).
Looking up data is similar. If we need to find the meaning of the word x, we calculate h(x); this tells us where x is located in the hash table. So we can look up words in O(1) as well.
However, O(1) insertion and search do not always hold. Nothing guarantees that the hash function won't produce the same output for two different inputs, in which case there is a collision. Various strategies can be employed to handle this, notably separate chaining and open addressing, but in the worst case search/insertion is then no longer O(1).
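A word dictionary built on separate chaining might look like the following Python sketch (the bucket count and the use of Python's built-in hash() are arbitrary illustrative choices):

```python
class WordDictionary:
    def __init__(self, buckets=64):
        # Each bucket is a list (chain) of (word, meaning) pairs.
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, word):
        # h(word) modulo the bucket count picks the slot; colliding words
        # share a bucket and are distinguished by a linear scan of the chain.
        return self._buckets[hash(word) % len(self._buckets)]

    def update(self, word, meaning):          # insert or overwrite - O(1) average
        bucket = self._bucket(word)
        for i, (w, _) in enumerate(bucket):
            if w == word:
                bucket[i] = (word, meaning)   # word exists: replace its meaning
                return
        bucket.append((word, meaning))

    def search(self, word):                   # O(1) average, O(n) worst case
        for w, meaning in self._bucket(word):
            if w == word:
                return meaning
        raise KeyError(word)

d = WordDictionary()
d.update("ephemeral", "lasting a very short time")
print(d.search("ephemeral"))  # prints "lasting a very short time"
```

The chains are what make collisions safe: two words hashing to the same bucket simply sit in the same list, at the cost of a short scan on lookup.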

How to implement a hash function for a HashSet/HashMap

If I needed to hash an entire HashSet<T> or HashMap<T, U>, where T already has some hash algorithm implemented, how would I do it? Note that I am not asking about hashing the elements of a hash table; I'm talking about hashing the entire data structure itself. This is not too difficult with an ordered set like a TreeSet, but because the order of the elements of a hash table is not well defined, it is trickier. Sorting the elements is infeasible in the general case, as the algorithm should take no more than O(n) time.
I'm looking for a general, language independent example, but you can provide code or links to code from any language.
Your options are to:
1. Enforce an order for the purposes of creating the hash, or
2. Apply a hash-combining algorithm that is commutative (independent of order).
The first option may be viable if the number of elements is relatively small. You can sort the elements, e.g. by their individual hash values, then apply well-known hash-combining techniques such as multiplying each successive element's contribution to the hash by (SomePrime)^n.
For the second option, simply adding together the hashes of the elements may provide a suitable distribution, since the hash of each element should itself already be well distributed.
Introduce a new field in the data structure where you keep a running hashbase.
On each addition of an element to the hash map/hash set, do something like hashbase += element.hash if the element is not already there (and subtract on removal, to keep it consistent). Use this hashbase for the hash calculation.
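The commutative hashbase idea can be sketched in a few lines of Python (the 64-bit mask and the class shape are illustrative assumptions; wrapping the sum keeps the running value bounded while preserving order independence):

```python
MASK = (1 << 64) - 1   # keep the running sum in 64 bits

class HashedSet:
    """A set whose own hash is the (wrapped) sum of its elements' hashes."""
    def __init__(self):
        self._items = set()
        self._hashbase = 0

    def add(self, item):
        if item not in self._items:      # only count each element once
            self._items.add(item)
            self._hashbase = (self._hashbase + hash(item)) & MASK

    def remove(self, item):
        self._items.remove(item)         # subtract to keep hashbase consistent
        self._hashbase = (self._hashbase - hash(item)) & MASK

    def __hash__(self):
        return self._hashbase

a, b = HashedSet(), HashedSet()
for x in [1, 2, 3]: a.add(x)
for x in [3, 1, 2]: b.add(x)   # different insertion order, same contents
assert hash(a) == hash(b)      # addition is commutative, so the hashes match
```

Because addition commutes, two sets with the same contents hash identically no matter the order in which elements were inserted, which is exactly the property an unordered container needs.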

When performing a lookup in a Hash data structure, why is it a fast operation?

When you perform a lookup in a Hashtable, the key is converted into a hash. Now using that hashed value, does it directly map to a memory location, or are there more steps?
Just trying to understand things a little more under the covers.
And what other key based lookup data structures are there and why are they slower than a hash?
Hash tables are not necessarily fast. People consider hash tables a "fast" data structure because the retrieval time does not depend on the number of entries in the table. That is, retrieval from a hash table is an "O(1)" (constant time) operation.
Retrieval time from other data structures can vary depending on the number of entries in the map. For example, for a balanced binary tree, the retrieval time scales with the base-2 logarithm of its size; it's "O(log n)".
However, actually computing a hash code for a single object, in practice, often takes many times longer than comparing two objects of that type. So you may find that for a small map, something like a red-black tree is faster than a hash table. As the maps grow, the hash table's retrieval time stays constant, while the red-black tree's time slowly grows until it is slower than the hash table's.
A Hash (aka Hash Table) implies more than a Map (or Associative Array).
In particular, a Map (or Associative Array) is an Abstract Data Type:
...an associative array (also called a map or a dictionary) is an abstract data type composed of a collection of (key,value) pairs, such that each possible key appears at most once in the collection.
While a Hash table is an implementation of a Map (although it could also be considered an ADT that includes a "cost"):
...a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys [...], to their associated values [...]. Thus, a hash table implements an associative array [or, map].
Thus it is an implementation-detail leaking out: a HashMap is a Map that uses a Hash-table algorithm and thus provides the expected performance characteristics of such an algorithm. The "leaking" of the implementation detail is good in this case because it provides some basic [expected] bound guarantees, such as an [expected] O(1) -- or constant time -- get.
Hint: a hash function is important part of a hash-table algorithm and sets a HashMap apart from other Map implementations such as a TreeMap (that uses a red-black tree) or a ConcurrentSkipListMap (that uses a skip list).
Another form of a Map is an Association List (or "alist", which is common in LISP programming). While association lists are O(n) for get, they can have much less overhead for small n, which brings up another point: Big-Oh describes limiting behavior (as n -> infinity) and does not address the relative performance for a particular [smallish] n:
A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function.
Please refer to the links above (including the javadoc) for the basic characteristics and different implementation strategies -- anything else I say here is already said there (or in other SO answers). If there are specific questions, open a new SO post if warranted :-)
Happy coding.
Here is the source for the HashMap implementation in OpenJDK 7. Looking at the put method shows that it uses simple chaining as a collision-resolution method and that the underlying "bucket array" grows by a factor of 2 on each resize (which is triggered when the load factor is reached). The load factor and amortized performance expectations -- including those of the hashing function used -- are covered in the class documentation.
"Key-based" implies a mapping of some sort. You can implement one in a linked list or array, and it would probably be pretty slow (O(n)) for lookups or deletes.
Hashing takes constant time. More sophisticated implementations typically map the hash to a memory address that stores a list of pointers back to the key objects, in addition to the mapped objects or values, for collision detection and resolution.
The expensive operation is following the list of objects that hashed to this location to figure out which one you are really looking for. In theory, this could be O(n) for each lookup! However, if we use a larger table the probability of this occurring is drastically reduced (although a few collisions are almost inevitable, per the birthday problem).
If you get over a certain threshold of collisions, most implementations will expand the size of the hash table, which takes another O(n) pass. However, this happens on average no more often than once every n inserts, so we have amortized constant time.
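A toy Python sketch of this resize-and-rehash behaviour (the 0.75 load factor and starting capacity are arbitrary illustrative choices): each time the load factor is exceeded, the bucket array doubles and every entry is rehashed, yet the total rehashing work over n inserts stays O(n), so the amortized insert cost is constant.

```python
class ChainedTable:
    def __init__(self):
        self._buckets = [[] for _ in range(8)]
        self._count = 0
        self.rehash_moves = 0          # instrumentation: entries moved by resizes

    def insert(self, key, value):
        if self._count / len(self._buckets) > 0.75:   # load factor threshold
            self._resize()
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._count += 1

    def lookup(self, key):
        for k, v in self._buckets[hash(key) % len(self._buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        entries = [kv for b in self._buckets for kv in b]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in entries:           # O(n) rehash, amortized over ~n inserts
            self._buckets[hash(k) % len(self._buckets)].append((k, v))
            self.rehash_moves += 1

t = ChainedTable()
for i in range(1000):
    t.insert(i, i * i)
# Doublings mean each resize moves roughly twice as many entries as the last,
# so the total moved across all resizes is under 2n: amortized O(1) per insert.
```

Here `rehash_moves` ends up well under 2 x 1000 for 1000 inserts, making the amortization concrete: individual resizes are O(n), but they are rare enough that the average cost per insert stays constant.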