Basic differences between HashTable and HashMap? - data-structures

I am researching about hash tables and hash maps, everything I have read or watched gives a very vague description of the differences. From messing around on Netbeans with them both, they seem to have the same functions and do the same things, what are the fundamental differences between these two data structures?

There are no differences, but you can find that the same thing called differently in different programming languages, so how people call something depends on their background and programming language they use. For example: in c++ it will be HashMap and in java it will be HashTable.
Also, there could be one difference concluded based on the naming: HashTable allows only store hashed keys, but not values whereas HashMap allows to retrieve a value by hashed key. Internally the both will use the same algorithm and can be considered as same data structure.

HashTable sounds to me like a concrete data structure, although it has numerous variants depending on what happens when a collision occurs, when the table fills up, when it empties.
Map sounds like a abstract data structure, something defined by the available operations (Dictionary would be a potential other name for the same data structure, but I'd not be surprised if some nomenclature defined both with a nuance somewhere).
HashMap sounds like an implementation of the Map abstract data structure using an HashTable concrete data structure.
Again, I'd not be surprised if a language or a library provided both, with a nuance somewhere (HashMap for instance could provide only the operations defined for a Map, but HashTable provides everything which make sense for an HashTable).

Related

What is the best data structure to lookup if something exists?

I want to keep a list of things I've already encountered in my recursive looping so I can avoid recalculating the same things over and over again. What is the most efficient data structure for this?
Looking at complexity analysis, Hashtables seem to be the most efficient for lookups. However, it feels inefficient to hold a data structure of key value pairs when all I'm interested in is looking up whether a specified key exists.
I thought that maybe a set would be a good idea, but its complexity doesn't seem to indicate so.
A set probably is the correct abstract data type for this, since really the only information it encodes is whether an item is inside or not.
Using a set doesn't specify a particular implementation though. In most contexts, you should be able to find a set type implemented using hashing, or some type of tree structure. Which one you choose would depend on if the data you're inserting can be efficiently hashed or ordered.
In some libraries you will find set types that are implemented using the library's map types, but where the value is ignored. For example in rust, the standard library's HashSet type is implemented using the HashMap type.

Hashtable with both values as key

Is there a hashing based data structure where I can search an item in O(1) time on both key and value.
This can be achieved by adding duplicate entry in the list for each key value par by reversing key and value, but it will take double the space.
This kind of data structure might be useful in some scenarios: like I want to store opening and closing parenthesis in a map and while parsing the string, I can just check in the map if the key is present without worrying about whether it is opening-closing map or closing-opening map or without storing duplicate.
I hope I am clear enough!!
Data structure that fulfills your needs is called bidirectional map.
I suppose that you are looking for the existing implementation, not for the pointers how to implement it :) Since you didn't specify the programming language, this is the current situation for Java - there is no such data structure in Java API. However, there is Google Guava's bi-directional map interface with several implementations. From the docs:
A bimap (or "bidirectional map") is a map that preserves the
uniqueness of its values as well as that of its keys. This constraint
enables bimaps to support an "inverse view", which is another bimap
containing the same entries as this bimap but with reversed keys and
values.
Alternatively, there is BidiMap from Apache Collections.
For C++, have a look at Boost.Bimap.
For Python, have a look at bidict.
In C#, as well as in other languages, there does not exist an official implementation, but that's where Jon Skeet comes in.
You're searching for a bidirectional map. Here is an article describing the implementation in c++. Note though that a bidirectional map is basically two maps merged into a single object. There isn't any more efficient solution than this though, for a simple reason:
a map is basically an unconnected directed graph of (key,value)-pairs. Each pair is represented by an edge. If you want the map to be bidirectional you'll wind up with twice as many edges, thus doubling the amount of required memory.
C++ and Java STL don't provide any classes for this purpose though. In Java you can use Googles Guava library, in C++ the boost-library provides bi-directional maps.

Hashes: Tables, Lists and Maps, Oh My?

I've been trying to find some concrete (laymen; non super-academic) definitions for the various types of hash data structures, specifically hash tables, hash lists and hash maps. Online searches provide many useful links to all of these, but never give clear definitions of when it is appropriate to use each over the others.
(1) From a practical standpoint, what's the difference between these 3?
(2) How do their operations' run times differ? Are there clear instances when one should be used or avoided over the other types of hashes?
(3) How do each of these relate back to the Map ADT? Are they all just different implementations of it, or different beasts altogether?
Thanks for any insight here!
There's an abstract data structure that contains mapping between keys and values. It has several different names, including Map, Dictionary, Table, Association Table, and more.
The most basic operations that should be supported by this data-structure are adding, removing and retrieving a value, given its associated key. There are variations and additions around this basic concept - for instance, some structures support iterating over all the key-value pairs, some structures support multiple values per key, etc. There's also a difference in time and space complexity between the various implementations.
Of the multiple implementations available for this data structure, some of the most popular ones utilize hash functions for fast access times. Those implementations are sometimes called by the name Hash Table or Hash Map, you can read more about them in Wikipedia. The performance also varies between hash table implementations, with some reaching amortized O(1) insertion and access complexity (for the price of a lot of space used).
A hash list, on the other hand, is a different thing, and is more about the usage of a data structure, than its actual structures. A hash list is usually just a regular list of hash values, nothing special about it. It's used when verifying the integrity of a large piece of data - in that case it allows various data chunks to be verified independently, allowing for fixing or retrieving of just the bad chunks. This is as opposed to using a single hash value to hash the entire piece of data, in which case a failure means all the data has to be fixed or retrieved again.

Iterable O(1) insert and random delete collection

I am looking to implement my own collection class. The characteristics I want are:
Iterable - order is not important
Insertion - either at end or at iterator location, it does not matter
Random Deletion - this is the tricky one. I want to be able to have a reference to a piece of data which is guaranteed to be within the list, and remove it from the list in O(1) time.
I plan on the container only holding custom classes, so I was thinking a doubly linked list that required the components to implement a simple interface (or abstract class).
Here is where I am getting stuck. I am wondering whether it would be better practice to simply have the items in the list hold a reference to their node, or to build the node right into them. I feel like both would be fairly simple, but I am worried about coupling these nodes into a bunch of classes.
I am wondering if anyone has an idea as to how to minimize the coupling, or possibly know of another data structure that has the characteristics I want.
It'd be hard to beat a hash map.
Take a look at tries.
Apparently they can beat hashtables:
Unlike most other algorithms, tries have the peculiar feature that the time to insert, or to delete or to find is almost identical because the code paths followed for each are almost identical. As a result, for situations where code is inserting, deleting and finding in equal measure tries can handily beat binary search trees or even hash tables, as well as being better for the CPU's instruction and branch caches.
It may or may not fit your usage, but if it does, it's likely one of the best options possible.
In C++, this sounds like the perfect fit for std::unordered_set (that's std::tr1::unordered_set or boost::unordered_set to you if you have an older compiler). It's implemented as a hash set, which has the characteristics you describe.
Here's the interface documentation. Note that the hash containers actually offer two sets of iterators, the usual ones and local ones which only go through one bucket.
Many other languages have "hash sets" as well, certainly Java and C#.

External store for complex collections that can be accessed by Key-Value

Problem
I need a key-value store that can store values of the following form:
DS<DS<E>>
where the data structure DS can be
either a List, SortedSet or an Array
and E can be either a String or byte-array.
It is very expensive to generate this data and so once I put it into the store, I will only perform read queries on it. Essentially it is a complex object cache with no eviction.
Example Application
A (possibly bad, but sufficient to clarify) example of an application is storing tokenized sentences from a document where you need to be able to quickly access the qth word of the pth sentence given documentID. In this case, I would be storing it as a K-V pair as follows:
K - docID
V - List<List<String>>
String word = map.get(docID).get(p).get(q);
I prefer to avoid app-integrated Map solutions (such as EhCache within Java).
I have worked with Redis but it doesn't appear to support the second layer of data-structure complexity. Any other K-V solutions that can help my use case?
Update:
I know that I could serialize/deserialize my object but I was wondering if there is any other solution.
In terms of platform choice you have two options - A full document database will support arbitrarily complex objects, but won't have built in commands for working with specific data structures. Something like Redis which does have optimised code for specific data structures can't support all possible data structures.
You can actually get pretty close with Redis by using ids instead of the nested data structure. DS1<DS2<E>> becomes DS1<int> and DS2<E>, with the int from DS1 and a prefix giving you the key holding DS2.
With this structure you can access any E with only two operations. In some cases you will be able to get that down to a single operation by knowing what the id of DS2 will be for a given query.
I hesitate to "recommend" it, but one of the only storage engines I know of which handles multi-dimensional data of this sort efficiently is Intersystems Cache. I had to use it at my last job, mostly coding against it using it's built in MUMPS-based language. I would not recommend the native approach, unless you hate yourself or your developers. However, they do have decent Java adapters, which appears to be what you're using. I've seen it handle billions of records, efficiently stored in nested binary tree tables. There is no practical limit to the depth (number of dimensions) you can use. However, this is very much a proprietary solution. There is an open-source alternative called GT.M, but I don't know how compatible it is with languages that aren't M or C.
Any Key-Value store supports complex values, you just need to serialize/deserialize the data.
If you want fast retrieval only for specific parts of the data, you could use a more complex Key. In your example this would be:
K - tuple(docID, p, q)

Resources