Hashtable with both values as key - algorithm

Is there a hashing-based data structure in which I can look up an item in O(1) time by either key or value?
This can be achieved by adding a duplicate entry for each key-value pair with the key and value reversed, but that takes double the space.
This kind of data structure might be useful in some scenarios. For example, I want to store opening and closing parentheses in a map and, while parsing the string, just check whether a character is present in the map, without worrying about whether it is an opening-to-closing map or a closing-to-opening map, and without storing duplicates.
I hope I am clear enough!!

The data structure that fulfills your needs is called a bidirectional map.
I suppose that you are looking for an existing implementation rather than pointers on how to implement one :) Since you didn't specify a programming language, here is the current situation for Java: there is no such data structure in the Java API. However, Google Guava provides a bidirectional map interface, BiMap, with several implementations. From the docs:
A bimap (or "bidirectional map") is a map that preserves the
uniqueness of its values as well as that of its keys. This constraint
enables bimaps to support an "inverse view", which is another bimap
containing the same entries as this bimap but with reversed keys and
values.
Alternatively, there is BidiMap from Apache Commons Collections.
For C++, have a look at Boost.Bimap.
For Python, have a look at bidict.
In C#, as well as in other languages, there does not exist an official implementation, but that's where Jon Skeet comes in.

You're searching for a bidirectional map. Here is an article describing the implementation in C++. Note, though, that a bidirectional map is basically two maps merged into a single object. There isn't any more efficient solution than this, for a simple reason:
a map is basically an unconnected directed graph of (key, value) pairs, where each pair is represented by an edge. If you want the map to be bidirectional, you wind up with twice as many edges, which doubles the amount of required memory.
Neither the C++ standard library nor the Java standard library provides a class for this purpose. In Java you can use Google's Guava library; in C++ the Boost library provides bidirectional maps.
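The "two maps merged into a single object" idea can be sketched as follows. This is a minimal illustration in Python, not how Guava or Boost.Bimap is implemented; the class name `BiMap` and the methods `put`, `get` and `get_key` are hypothetical names chosen for this example.

```python
class BiMap:
    """Bidirectional map sketch backed by two dicts (forward and inverse)."""

    def __init__(self):
        self._forward = {}  # key -> value
        self._inverse = {}  # value -> key

    def put(self, key, value):
        # Remove any existing pairing that involves this key or this value,
        # so that both directions stay one-to-one.
        if key in self._forward:
            del self._inverse[self._forward[key]]
        if value in self._inverse:
            del self._forward[self._inverse[value]]
        self._forward[key] = value
        self._inverse[value] = key

    def get(self, key):
        # Average O(1) lookup by key.
        return self._forward.get(key)

    def get_key(self, value):
        # Average O(1) lookup by value.
        return self._inverse.get(value)


# The parenthesis example from the question.
brackets = BiMap()
for opening, closing in [("(", ")"), ("[", "]"), ("{", "}")]:
    brackets.put(opening, closing)
```

Both lookups are single dict accesses, at the cost of storing every pair twice, which is the space trade-off described above.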

Related

Basic differences between HashTable and HashMap?

I am researching about hash tables and hash maps, everything I have read or watched gives a very vague description of the differences. From messing around on Netbeans with them both, they seem to have the same functions and do the same things, what are the fundamental differences between these two data structures?
Conceptually there is no difference: both names refer to the same data structure, and which name people use depends on their background and the programming language they work in. For example, C++ calls it std::unordered_map, Python calls it dict, and Java has both a Hashtable and a HashMap class (Hashtable is the older, synchronized class that rejects null keys and values; HashMap is unsynchronized and permits them).
Also, one difference could be inferred from the naming alone: a "HashTable" might store only hashed keys with no associated values, whereas a "HashMap" lets you retrieve a value by its hashed key. Internally both use the same algorithm and can be considered the same data structure.
HashTable sounds to me like a concrete data structure, although it has numerous variants depending on what happens when a collision occurs, when the table fills up, when it empties.
Map sounds like an abstract data structure, something defined by the available operations (Dictionary would be a potential other name for the same data structure, but I'd not be surprised if some nomenclature defined both with a nuance somewhere).
HashMap sounds like an implementation of the Map abstract data structure using a HashTable concrete data structure.
Again, I'd not be surprised if a language or a library provided both, with a nuance somewhere (HashMap, for instance, could provide only the operations defined for a Map, while HashTable provides everything that makes sense for a HashTable).
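To make the distinction between the abstract Map and the concrete hash table more tangible, here is a minimal sketch in Python of a map built on a hash table with separate chaining. The class name `ChainedHashMap`, the initial capacity of 8 and the resize threshold are assumptions made for illustration; real implementations differ in exactly the variants mentioned above (collision handling, growth policy).

```python
class ChainedHashMap:
    """Map operations (put/get) implemented on an array of buckets."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # The hash of the key selects the bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # collision: chain in the same bucket
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the bucket array and redistribute every entry.
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in pairs:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


table = ChainedHashMap()
for n in range(100):
    table.put(f"key{n}", n)
```

The callers only see `put` and `get` (the Map side); the bucket array, chaining and resizing are the hash table side.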

An efficient Javascript set structure

After reading many similar questions:
JavaScript implementation of a set data structure
Mimicking sets in JavaScript?
Node JS, traditional data structures? (such as Set, etc), anything like Java.util for node?
Efficient Javascript Array Lookup
Best way to find if an item is in a JavaScript array?
How do I check if an array includes an object in JavaScript?
I still have a question: suppose I have a large array of strings (several thousands), and I have to make many lookups (i.e. check many times whether a given string is contained in this array). What is the most efficient way to do this in Node.js ?
A. Sort the array of strings, then use binary search? or:
B. Convert the strings to keys of an object, then use the "in" operator?
I know that the complexity of A is O(log N), where N is the number of strings.
But I don't know the complexity of B.
If a JavaScript object is implemented as a hash table, then the complexity of B is, on average, O(1), which is better than A. However, I don't know whether a JavaScript object is really implemented as a hash table!
Update for 2016
Since you're asking about node.js and it is 2016, you can now use either the Set or the Map object, both of which are built into ES6. Both allow you to use any string as a key. The Set object is appropriate when you just want to see whether the key exists, as in:
if (mySet.has(someString)) {
    // code here
}
And, Map is appropriate when you want to store a value for that key as in:
if (myMap.has(someString)) {
    let val = myMap.get(someString);
    // do something with val here
}
Both ES6 features are now built into node.js as of node V4 (the current version of node.js as of this edit is v6).
See this performance comparison to see how much faster the Set operations are than many other choices.
Older Answer
All important performance questions should be tested with actual performance tests in a tool like jsperf.com. In your case, a JavaScript object uses a hash-table-like implementation, because without something that performs pretty well the whole language would be slow, since so much of JavaScript uses objects.
String keys on an object would be the first thing I'd test and would be my guess for the best performer. Since the internals of an object are implemented in native code, I'd expect this to be faster than your own hashtable or binary search implemented in javascript.
But, as I started my answer with, you should really test your specific circumstance with the number and length of strings you are most concerned about in a tool like jsperf.
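The two options from the question (binary search on a sorted array versus a hash-based membership test) can be sketched side by side. The snippet below is a language-neutral illustration written in Python rather than JavaScript; `contains_sorted` and the sample data are hypothetical names and values, and it makes no claim about how any JavaScript engine behaves.

```python
from bisect import bisect_left


def contains_sorted(sorted_words, target):
    """Option A: binary search on a sorted list, O(log N) per lookup."""
    i = bisect_left(sorted_words, target)
    return i < len(sorted_words) and sorted_words[i] == target


words = ["pear", "apple", "fig", "kiwi"]

sorted_words = sorted(words)  # one-time O(N log N) preparation for option A
word_set = set(words)         # one-time O(N) preparation for option B

found_a = contains_sorted(sorted_words, "fig")
found_b = "fig" in word_set   # Option B: hash lookup, O(1) on average
```

As the answer says, which one is faster for a given number and length of strings is something to measure rather than assume.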
For a fixed, large array of strings I suggest using some form of radix search.
Also, take a look at different data structures and algorithms (AVL trees, queues/heaps etc) in this package
I'm pretty sure that using JS object as storage for strings will result in 'hash mode' for that object. Depending on implementation this could be O(log n) to O(1) time. Look at some jsperf benchmarks to compare property lookup vs binary search on sorted array.
In practice, especially if I'm not going to use the code in a browser, I would offload this functionality to something like redis or memcached.

Hashes: Tables, Lists and Maps, Oh My?

I've been trying to find some concrete (laymen; non super-academic) definitions for the various types of hash data structures, specifically hash tables, hash lists and hash maps. Online searches provide many useful links to all of these, but never give clear definitions of when it is appropriate to use each over the others.
(1) From a practical standpoint, what's the difference between these 3?
(2) How do their operations' run times differ? Are there clear instances when one should be used or avoided over the other types of hashes?
(3) How do each of these relate back to the Map ADT? Are they all just different implementations of it, or different beasts altogether?
Thanks for any insight here!
There's an abstract data structure that contains mapping between keys and values. It has several different names, including Map, Dictionary, Table, Association Table, and more.
The most basic operations that should be supported by this data-structure are adding, removing and retrieving a value, given its associated key. There are variations and additions around this basic concept - for instance, some structures support iterating over all the key-value pairs, some structures support multiple values per key, etc. There's also a difference in time and space complexity between the various implementations.
Of the multiple implementations available for this data structure, some of the most popular ones use hash functions for fast access times. Those implementations are sometimes called a Hash Table or Hash Map; you can read more about them on Wikipedia. Performance also varies between hash table implementations, with some reaching amortized O(1) insertion and access complexity (at the price of using a lot of space).
A hash list, on the other hand, is a different thing, and is more about how a data structure is used than about its actual structure. A hash list is usually just a regular list of hash values, with nothing special about it. It's used when verifying the integrity of a large piece of data: it allows the individual data chunks to be verified independently, so that only the bad chunks need to be fixed or retrieved again. This is as opposed to using a single hash value for the entire piece of data, in which case a failure means all the data has to be fixed or retrieved again.
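The hash list idea can be sketched in a few lines of Python. The function names `hash_list` and `bad_chunks`, the choice of SHA-256 and the 4-byte chunk size are assumptions made for this example only.

```python
import hashlib


def hash_list(data: bytes, chunk_size: int):
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def bad_chunks(data: bytes, chunk_size: int, expected):
    """Return the indices of chunks whose hash differs from the expected list.

    Assumes data has the same number of chunks as the expected list.
    """
    actual = hash_list(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


original = b"abcdefghijkl"
expected = hash_list(original, 4)   # three chunks: abcd, efgh, ijkl

corrupted = b"abcdXfghijkl"         # one byte changed in the second chunk
damaged = bad_chunks(corrupted, 4, expected)
```

Here only the chunk at index 1 fails verification, so only that chunk would need to be fetched again.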

External store for complex collections that can be accessed by Key-Value

Problem
I need a key-value store that can store values of the following form:
DS<DS<E>>
where the data structure DS can be
either a List, SortedSet or an Array
and E can be either a String or byte-array.
It is very expensive to generate this data and so once I put it into the store, I will only perform read queries on it. Essentially it is a complex object cache with no eviction.
Example Application
A (possibly bad, but sufficient to clarify) example of an application is storing tokenized sentences from a document where you need to be able to quickly access the qth word of the pth sentence given documentID. In this case, I would be storing it as a K-V pair as follows:
K - docID
V - List<List<String>>
String word = map.get(docID).get(p).get(q);
I prefer to avoid app-integrated Map solutions (such as EhCache within Java).
I have worked with Redis but it doesn't appear to support the second layer of data-structure complexity. Any other K-V solutions that can help my use case?
Update:
I know that I could serialize/deserialize my object but I was wondering if there is any other solution.
In terms of platform choice you have two options. A full document database will support arbitrarily complex objects, but won't have built-in commands for working with specific data structures. Something like Redis, which does have optimised code for specific data structures, can't support all possible data structures.
You can actually get pretty close with Redis by using ids instead of the nested data structure. DS1<DS2<E>> becomes DS1<int> and DS2<E>, with the int from DS1 and a prefix giving you the key holding DS2.
With this structure you can access any E with only two operations. In some cases you will be able to get that down to a single operation by knowing what the id of DS2 will be for a given query.
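The id-based layout described above can be sketched without a real server. In the Python snippet below a plain dict stands in for the key-value store, and the key format `doc:<id>:sentence:<n>`, the id scheme and all function names are hypothetical choices for illustration, not Redis commands.

```python
store = {}  # stand-in for the external key-value store


def put_document(doc_id, sentences):
    """Store DS1<DS2<E>> as DS1<int> plus one DS2<E> entry per id."""
    sentence_ids = []
    for p, words in enumerate(sentences):
        sentence_id = p  # assumed id scheme: the sentence position
        store[f"doc:{doc_id}:sentence:{sentence_id}"] = list(words)
        sentence_ids.append(sentence_id)
    store[f"doc:{doc_id}"] = sentence_ids


def get_word(doc_id, p, q):
    """Two operations: fetch the id from DS1, then the element from DS2."""
    sentence_id = store[f"doc:{doc_id}"][p]
    return store[f"doc:{doc_id}:sentence:{sentence_id}"][q]


def get_word_direct(doc_id, p, q):
    """One operation, valid only because the id is known to equal p here."""
    return store[f"doc:{doc_id}:sentence:{p}"][q]


put_document("d1", [["the", "cat"], ["a", "dog", "barks"]])
word = get_word("d1", 1, 2)
```

With a real store, each dict access would become one round trip, which is where the "two operations, sometimes one" observation comes from.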
I hesitate to "recommend" it, but one of the only storage engines I know of that handles multi-dimensional data of this sort efficiently is InterSystems Caché. I had to use it at my last job, mostly coding against it using its built-in MUMPS-based language. I would not recommend the native approach, unless you hate yourself or your developers. However, they do have decent Java adapters, and Java appears to be what you're using. I've seen it handle billions of records, efficiently stored in nested binary tree tables. There is no practical limit to the depth (number of dimensions) you can use. However, this is very much a proprietary solution. There is an open-source alternative called GT.M, but I don't know how compatible it is with languages that aren't M or C.
Any Key-Value store supports complex values, you just need to serialize/deserialize the data.
If you want fast retrieval only for specific parts of the data, you could use a more complex Key. In your example this would be:
K - tuple(docID, p, q)
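As a rough sketch of the composite-key approach, the nested structure can be flattened so that each word gets its own key. The Python below uses a dict as a stand-in for the store; the function name `flatten` and the sample document are made up for this example, and a real store would need the tuple encoded as a string or byte key.

```python
def flatten(doc_id, sentences):
    """Map (doc_id, p, q) -> q-th word of the p-th sentence."""
    return {
        (doc_id, p, q): word
        for p, words in enumerate(sentences)
        for q, word in enumerate(words)
    }


index = flatten("d1", [["the", "cat"], ["a", "dog", "barks"]])
word = index[("d1", 1, 2)]  # single lookup, no deserialization of the whole value
```

The trade-off is many small entries instead of one large serialized value per document.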

Looking for a good definition of a map, and if maps can be implemented using trees

I'm going through some potential interview questions, one of which is whether you can implement a map or a linked list using a tree.
But even after some time googling, I don't have a clear idea of exactly what a map is, or how it differs from an array or a hash table, for example. Could anybody provide a clear description?
Can a map, or a linked list, be written as a tree?
A Map, aka Dictionary or associative array, is a data structure that allows you to look up a value using a key.
A Java Map can be implemented as a HashMap or a TreeMap; that suggests that hash map is one possible implementation and yes, you can implement a Map as a tree.
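As a rough sketch of the tree-based option, here is a map built on an unbalanced binary search tree, in Python. The class name `TreeMap` and its methods are hypothetical names chosen for illustration; Java's TreeMap uses a self-balancing red-black tree, which this sketch does not attempt, so its worst case here is O(n) per operation.

```python
class TreeMap:
    """Map sketch: keys are kept in a binary search tree, ordered by key."""

    class _Node:
        __slots__ = ("key", "value", "left", "right")

        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def __init__(self):
        self._root = None

    def put(self, key, value):
        if self._root is None:
            self._root = self._Node(key, value)
            return
        node = self._root
        while True:
            if key == node.key:
                node.value = value  # overwrite an existing key
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, self._Node(key, value))
                return
            node = child

    def get(self, key, default=None):
        node = self._root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return default

    def items(self):
        # In-order traversal yields the entries sorted by key.
        stack, node = [], self._root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key, node.value
            node = node.right


ages = TreeMap()
for name, age in [("mia", 31), ("bob", 25), ("zed", 40), ("ann", 19)]:
    ages.put(name, age)
```

Unlike a hash-based map, iterating over `items()` returns the keys in sorted order, which is the main practical reason to choose a tree.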
Can it (a map), and a linked list be written as a tree?
Hash maps are usually backed by arrays. To determine where an entry goes, you compute a hash of its key and use the result as an index into the array. Look here for a better explanation.
Trees (with an undetermined number of nodes) can be implemented using lists (see here for further discussion). Lists are not usually implemented as trees.
I'd encourage you to get this book, which is a classic on data structures and will give you a lot of really great information.

Resources