What is the best data structure to lookup if something exists?

I want to keep a list of things I've already encountered in my recursive looping so I can avoid recalculating the same things over and over again. What is the most efficient data structure for this?
Looking at complexity analysis, hash tables seem to be the most efficient for lookups. However, it feels wasteful to hold a data structure of key-value pairs when all I'm interested in is whether a given key exists.
I thought that maybe a set would be a good idea, but its complexity doesn't seem to indicate so.

A set probably is the correct abstract data type for this, since really the only information it encodes is whether an item is inside or not.
Using a set doesn't dictate a particular implementation, though. In most contexts you should be able to find a set type implemented with hashing or with some kind of tree structure. Which one you choose depends on whether the data you're inserting can be efficiently hashed or ordered.
Some libraries implement their set types on top of their map types, simply ignoring the value. For example, in Rust the standard library's HashSet type is implemented using the HashMap type.
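As an illustration, here is a minimal sketch in JavaScript (the language of the code elsewhere on this page) of the scenario in the question: a recursive traversal that records already-visited items in a Set, so each item is processed only once. The graph and the function name `reachable` are hypothetical.

```javascript
// Hypothetical example: collect every node reachable from `start` in a
// directed graph given as an adjacency list (a Map from node to neighbors).
function reachable(graph, start) {
  const seen = new Set(); // records membership only, no associated values
  function visit(node) {
    if (seen.has(node)) return; // already handled: skip the recomputation
    seen.add(node);
    for (const next of graph.get(node) ?? []) {
      visit(next);
    }
  }
  visit(start);
  return seen;
}

const graph = new Map([
  ["a", ["b", "c"]],
  ["b", ["c"]],
  ["c", ["a"]], // cycle back to "a" is cut off by the Set
  ["d", ["a"]],
]);
console.log([...reachable(graph, "a")]); // [ 'a', 'b', 'c' ]
```

The Set carries exactly the information the question asks for (is this item already done?), and in engines such as V8 it is hash-based, so each check is O(1) on average.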

Related

Basic differences between HashTable and HashMap?

I am researching hash tables and hash maps, and everything I have read or watched gives a very vague description of the differences. From experimenting with both in NetBeans, they seem to have the same functions and do the same things. What are the fundamental differences between these two data structures?

There are no fundamental differences in the data structure; the same thing is simply named differently in different programming languages, so what people call it depends on their background and the language they use. For example, in C++ the standard hash-based map is std::unordered_map, while Java has both HashMap and the older Hashtable class (there, the practical differences are that Hashtable is synchronized and rejects null keys and values, while HashMap is unsynchronized and allows a null key and null values).
One could also read a difference into the names: a "hash table" might store only hashed keys without values (like a set), whereas a "hash map" retrieves a value by its hashed key. Internally both use the same algorithm and can be considered the same data structure.

HashTable sounds to me like a concrete data structure, although it has numerous variants depending on what happens when a collision occurs, when the table fills up, and when it empties.
Map sounds like an abstract data structure, something defined by its available operations (Dictionary would be another possible name for the same data structure, but I would not be surprised if some nomenclature defined both with a nuance somewhere).
HashMap sounds like an implementation of the Map abstract data structure using a HashTable concrete data structure.
Again, I would not be surprised if a language or library provided both with a nuance somewhere (for instance, HashMap could provide only the operations defined for a Map, while HashTable provides everything that makes sense for a hash table).
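To make the "Map implemented with a hash table" reading concrete, here is a minimal JavaScript sketch of a separate-chaining hash table that exposes Map-style get/set/has/delete. The class name `ChainedHashMap`, the string-only keys, the multiply-by-31 hash and the 0.75 load-factor threshold are illustrative assumptions, not how any particular language's HashMap or Hashtable is implemented.

```javascript
// Minimal separate-chaining hash table with a Map-like interface.
// Assumptions (for illustration only): string keys, a simple polynomial
// string hash, and doubling the bucket array when load factor exceeds 0.75.
class ChainedHashMap {
  constructor(capacity = 8) {
    this.buckets = Array.from({ length: capacity }, () => []);
    this.size = 0;
  }

  // Polynomial hash over UTF-16 code units, kept in 32-bit integer range.
  hash(key) {
    let h = 0;
    for (let i = 0; i < key.length; i++) {
      h = (Math.imul(h, 31) + key.charCodeAt(i)) | 0;
    }
    return (h >>> 0) % this.buckets.length;
  }

  bucketFor(key) {
    return this.buckets[this.hash(key)];
  }

  set(key, value) {
    const bucket = this.bucketFor(key);
    const entry = bucket.find((e) => e[0] === key);
    if (entry) {
      entry[1] = value; // overwrite existing key
      return this;
    }
    bucket.push([key, value]);
    this.size++;
    if (this.size > this.buckets.length * 0.75) this.resize(this.buckets.length * 2);
    return this;
  }

  get(key) {
    const entry = this.bucketFor(key).find((e) => e[0] === key);
    return entry ? entry[1] : undefined;
  }

  has(key) {
    return this.bucketFor(key).some((e) => e[0] === key);
  }

  delete(key) {
    const bucket = this.bucketFor(key);
    const i = bucket.findIndex((e) => e[0] === key);
    if (i === -1) return false;
    bucket.splice(i, 1);
    this.size--;
    return true;
  }

  // Rehash every entry into a larger bucket array (size is unchanged).
  resize(capacity) {
    const old = this.buckets;
    this.buckets = Array.from({ length: capacity }, () => []);
    for (const bucket of old) {
      for (const [k, v] of bucket) this.bucketFor(k).push([k, v]);
    }
  }
}

const m = new ChainedHashMap();
for (let i = 0; i < 20; i++) m.set("k" + i, i);
console.log(m.get("k7"), m.has("k42"), m.size); // 7 false 20
```

Whether the collision lists are called a "table" and the get/set interface a "map" is exactly the naming question above; the concrete structure and the abstract operations are separable.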

How to implement an addressable FIFO queue?

I'm looking for a data structure that supports all of the following operations in O(1):
insert(K, V): Insert a value at the end of the queue.
remove_key(K): Remove the value from the queue corresponding to the provided key.
remove_head(): Remove the value from the front of the queue (the oldest one).
The only reasonably easy-to-implement approach I can think of is to use a doubly linked list as the primary data structure and keep pointers to the list nodes in a hash table. That achieves the desired asymptotic behavior, but it might not be the most efficient option in actual runtime.
I found "addressable priority queues" in the literature, but they are rather complicated (and possibly even more expensive) data structures, so I was wondering if someone has a better suggestion. It seems no one has implemented something like this for Rust so far, which is why I'm hoping it doesn't get too complicated.
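The doubly-linked-list-plus-hash-table approach from the question can be sketched as follows. It is written in JavaScript rather than Rust, with the built-in Map standing in for the hash table (average O(1) operations); the class and method names are hypothetical, mirroring insert, remove_key and remove_head.

```javascript
// Sketch of the approach described in the question: a doubly linked list keeps
// FIFO order, and a Map from key to list node makes removal by key O(1).
class AddressableQueue {
  constructor() {
    this.nodes = new Map(); // key -> list node
    this.head = null; // oldest entry
    this.tail = null; // newest entry
  }

  // insert(K, V): append at the tail of the queue.
  insert(key, value) {
    if (this.nodes.has(key)) throw new Error(`duplicate key: ${key}`);
    const node = { key, value, prev: this.tail, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.nodes.set(key, node);
  }

  // Detach a node from the list and the index, returning its value.
  unlink(node) {
    if (node.prev) node.prev.next = node.next;
    else this.head = node.next;
    if (node.next) node.next.prev = node.prev;
    else this.tail = node.prev;
    this.nodes.delete(node.key);
    return node.value;
  }

  // remove_key(K)
  removeKey(key) {
    const node = this.nodes.get(key);
    return node ? this.unlink(node) : undefined;
  }

  // remove_head(): remove the oldest entry.
  removeHead() {
    return this.head ? this.unlink(this.head) : undefined;
  }

  get size() {
    return this.nodes.size;
  }
}

const q = new AddressableQueue();
q.insert("a", 1);
q.insert("b", 2);
q.insert("c", 3);
q.removeKey("b");
console.log(q.removeHead(), q.removeHead(), q.removeHead()); // 1 3 undefined
```

As the question suspects, the per-node allocations and pointer chasing are the main runtime cost here, not the asymptotics.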

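I would use Rust's VecDeque<T> (std::collections::VecDeque) and call pop_front() in place of remove_head(). This covers insert (push_back, amortized O(1)) and remove_head, but VecDeque has no key lookup, so remove_key would still need extra bookkeeping.
See the documentation for VecDeque.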
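For context on why pop_front() is cheap: the Rust documentation describes VecDeque as a double-ended queue implemented with a growable ring buffer. The following JavaScript sketch illustrates that idea (it is not a port of the Rust code, and `RingDeque` is a hypothetical name): elements live in a circular array with a moving head index, so popping from the front is O(1) and pushing at the back is amortized O(1).

```javascript
// Minimal growable ring buffer illustrating the idea behind a VecDeque-style queue.
class RingDeque {
  constructor(capacity = 4) {
    this.buf = new Array(capacity);
    this.head = 0; // index of the front element
    this.len = 0; // number of stored elements
  }

  pushBack(value) {
    if (this.len === this.buf.length) this.grow();
    this.buf[(this.head + this.len) % this.buf.length] = value;
    this.len++;
  }

  popFront() {
    if (this.len === 0) return undefined;
    const value = this.buf[this.head];
    this.buf[this.head] = undefined; // drop the reference
    this.head = (this.head + 1) % this.buf.length;
    this.len--;
    return value;
  }

  // Double the capacity, copying elements so the front lands at index 0.
  grow() {
    const old = this.buf;
    this.buf = new Array(old.length * 2);
    for (let i = 0; i < this.len; i++) {
      this.buf[i] = old[(this.head + i) % old.length];
    }
    this.head = 0;
  }
}

const d = new RingDeque(2);
for (let i = 1; i <= 5; i++) d.pushBack(i);
console.log(d.popFront(), d.popFront()); // 1 2
```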

I have implemented an addressable binary heap in Python, with no third-party dependencies.
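An addressable binary heap typically pairs the heap array with a map from each key to its current array position, updated on every swap, so an arbitrary entry can be located in O(1) and removed in O(log n). Below is a minimal JavaScript min-heap sketch of that idea (it is not the Python implementation mentioned above, and the names are hypothetical). For the FIFO use case in the question, using an increasing insertion counter as the priority would give the same behavior as the linked-list approach, but with O(log n) instead of O(1) operations.

```javascript
// Hypothetical sketch: a min-heap on `priority` whose entries can be removed by key.
// `index` maps each key to its current position in `heap`; swap() keeps it in sync.
class AddressableHeap {
  constructor() {
    this.heap = []; // array of { key, priority }
    this.index = new Map(); // key -> position in this.heap
  }

  get size() {
    return this.heap.length;
  }

  swap(i, j) {
    const h = this.heap;
    [h[i], h[j]] = [h[j], h[i]];
    this.index.set(h[i].key, i);
    this.index.set(h[j].key, j);
  }

  siftUp(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].priority <= this.heap[i].priority) break;
      this.swap(i, parent);
      i = parent;
    }
  }

  siftDown(i) {
    const n = this.heap.length;
    for (;;) {
      const l = 2 * i + 1;
      const r = l + 1;
      let m = i;
      if (l < n && this.heap[l].priority < this.heap[m].priority) m = l;
      if (r < n && this.heap[r].priority < this.heap[m].priority) m = r;
      if (m === i) return;
      this.swap(i, m);
      i = m;
    }
  }

  push(key, priority) {
    if (this.index.has(key)) throw new Error(`duplicate key: ${key}`);
    this.heap.push({ key, priority });
    this.index.set(key, this.heap.length - 1);
    this.siftUp(this.heap.length - 1);
  }

  // Remove the entry at position i: move the last entry into its slot, then restore order.
  removeAt(i) {
    const last = this.heap.length - 1;
    if (i !== last) this.swap(i, last);
    const entry = this.heap.pop();
    this.index.delete(entry.key);
    if (i < this.heap.length) {
      this.siftUp(i);
      this.siftDown(i);
    }
    return entry;
  }

  pop() {
    return this.heap.length ? this.removeAt(0) : undefined;
  }

  remove(key) {
    const i = this.index.get(key);
    return i === undefined ? undefined : this.removeAt(i);
  }
}

const h = new AddressableHeap();
h.push("a", 5);
h.push("b", 1);
h.push("c", 3);
h.push("d", 4);
h.remove("c");
console.log(h.pop().key, h.pop().key, h.pop().key); // b d a
```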

A data structure with certain properties

I want to implement a data structure myself in C++11, with the following properties:
search: O(log n)
insert: O(log n)
delete: O(log n)
iterate: O(n)
What I have been thinking about after research was implementing a balanced binary search tree. Are there other structures that would fulfill my needs? I am completely new to this topic and thought a question here would give me a good jumpstart.

First of all, using the existing standard library data types is definitely the way to go for production code. But since you are asking how to implement such data structures yourself, I assume this is mainly an educational exercise for you.
Binary search trees of some form (https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree#Implementations), B-trees (https://en.wikipedia.org/wiki/B-tree) and hash tables (https://en.wikipedia.org/wiki/Hash_table) are the data structures usually used to achieve efficient insertion and lookup. If you want to go wild, you can combine the two by using a tree instead of a linked list to handle hash collisions (although unless you have made serious mistakes in sizing your hash table or choosing a hash function, this is quite likely to make your implementation slower).
Since I'm assuming you want to learn something, you might want to look at minimal perfect hashing in the context of hash tables (https://en.wikipedia.org/wiki/Perfect_hash_function), although it is only useful in special applications (I have had the opportunity to use a minimal perfect hash function exactly once). But it sure is fascinating. As you can see from the search tree link above, the botany of search trees is virtually limitless, so you can also go wild on that front.
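For comparison, in C++ std::set and std::map already guarantee logarithmic search, insert and erase, with ordered linear-time iteration. As a learning aid, here is a minimal AVL tree (one kind of self-balancing binary search tree) sketched in JavaScript; numeric keys and the function names are assumptions for illustration, and a real implementation would take a comparator and store values.

```javascript
// Minimal AVL tree: search, insert and delete in O(log n) worst case,
// in-order iteration in O(n). Nodes are { key, left, right, h }.
const height = (n) => (n ? n.h : 0);
const update = (n) => {
  n.h = 1 + Math.max(height(n.left), height(n.right));
  return n;
};

function rotateRight(y) {
  const x = y.left;
  y.left = x.right;
  x.right = y;
  update(y);
  return update(x);
}

function rotateLeft(x) {
  const y = x.right;
  x.right = y.left;
  y.left = x;
  update(x);
  return update(y);
}

// Restore the AVL invariant (child heights differ by at most 1) at node n.
function rebalance(n) {
  update(n);
  const bal = height(n.left) - height(n.right);
  if (bal > 1) {
    if (height(n.left.left) < height(n.left.right)) n.left = rotateLeft(n.left); // left-right case
    return rotateRight(n);
  }
  if (bal < -1) {
    if (height(n.right.right) < height(n.right.left)) n.right = rotateRight(n.right); // right-left case
    return rotateLeft(n);
  }
  return n;
}

function insert(n, key) {
  if (!n) return { key, left: null, right: null, h: 1 };
  if (key < n.key) n.left = insert(n.left, key);
  else if (key > n.key) n.right = insert(n.right, key);
  else return n; // already present
  return rebalance(n);
}

function remove(n, key) {
  if (!n) return null;
  if (key < n.key) n.left = remove(n.left, key);
  else if (key > n.key) n.right = remove(n.right, key);
  else {
    if (!n.left || !n.right) return n.left || n.right;
    // Two children: copy the in-order successor, then delete it from the right subtree.
    let s = n.right;
    while (s.left) s = s.left;
    n.key = s.key;
    n.right = remove(n.right, s.key);
  }
  return rebalance(n);
}

function contains(n, key) {
  while (n) {
    if (key === n.key) return true;
    n = key < n.key ? n.left : n.right;
  }
  return false;
}

function* inOrder(n) {
  if (!n) return;
  yield* inOrder(n.left);
  yield n.key;
  yield* inOrder(n.right);
}

let root = null;
for (const k of [5, 2, 8, 1, 9, 3]) root = insert(root, k);
root = remove(root, 2);
console.log([...inOrder(root)], contains(root, 3), contains(root, 2)); // [ 1, 3, 5, 8, 9 ] true false
```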

An efficient Javascript set structure

After reading many similar questions:
JavaScript implementation of a set data structure
Mimicking sets in JavaScript?
Node JS, traditional data structures? (such as Set, etc), anything like Java.util for node?
Efficient Javascript Array Lookup
Best way to find if an item is in a JavaScript array?
How do I check if an array includes an object in JavaScript?
I still have a question: suppose I have a large array of strings (several thousand), and I have to make many lookups (i.e. check many times whether a given string is contained in this array). What is the most efficient way to do this in Node.js?
A. Sort the array of strings, then use binary search, or
B. Convert the strings to keys of an object, then use the "in" operator?
I know that the complexity of A is O(log N), where N is the number of strings.
But I don't know the complexity of B.
IF a Javascript object is implemented as a hash table, then the complexity of B is, on average, O(1), which is better than A. However, I don't know if a Javascript object is really implemented as a hash table!
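The two options in the question, plus the ES6 Set, can be sketched in JavaScript as follows; the helper names are hypothetical. The ECMAScript specification only requires Set lookups to be sublinear on average (engines such as V8 use hash tables), so the actual speed for your particular strings is still worth measuring.

```javascript
// Option A: sorted array + binary search, O(log N) per lookup.
// The default sort() and the < operator both compare strings by UTF-16 code units,
// so the search order agrees with the sort order.
function makeSortedLookup(strings) {
  const sorted = [...strings].sort();
  return (s) => {
    let lo = 0;
    let hi = sorted.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >>> 1;
      if (sorted[mid] === s) return true;
      if (sorted[mid] < s) lo = mid + 1;
      else hi = mid - 1;
    }
    return false;
  };
}

// Option B as in the question: object keys checked with the "in" operator.
// Object.create(null) avoids false positives from inherited keys such as "toString".
function makeObjectLookup(strings) {
  const obj = Object.create(null);
  for (const s of strings) obj[s] = true;
  return (s) => s in obj;
}

// Modern alternative to B: an ES6 Set.
function makeSetLookup(strings) {
  const set = new Set(strings);
  return (s) => set.has(s);
}

const words = ["pear", "apple", "fig", "kiwi"];
for (const has of [makeSortedLookup(words), makeObjectLookup(words), makeSetLookup(words)]) {
  console.log(has("fig"), has("grape"), has("toString")); // each line: true false false
}
```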

Update for 2016
Since you're asking about Node.js and it is 2016, you can now use either the Set or the Map object from ES6, as both are built in. Both allow you to use any string as a key. The Set object is appropriate when you just want to check whether the key exists, as in:
if (mySet.has(someString)) {
    // code here
}
And a Map is appropriate when you want to store a value for that key, as in:
if (myMap.has(someString)) {
    let val = myMap.get(someString);
    // do something with val here
}
Both ES6 features are built into Node.js as of Node v4 (the current version of Node.js as of this edit is v6).
See this performance comparison to see how much faster the Set operations are than many other choices.
Older Answer
All important performance questions should be tested with actual performance tests in a tool like jsperf.com. In your case, a JavaScript object uses a hash-table-like implementation, because so much of JavaScript is built on objects that the whole language would be slow without an implementation that performs well.
String keys on an object would be the first thing I'd test and would be my guess for the best performer. Since the internals of an object are implemented in native code, I'd expect this to be faster than your own hashtable or binary search implemented in javascript.
But, as I started my answer with, you should really test your specific circumstance with the number and length of strings you are most concerned about in a tool like jsperf.

For a large, fixed array of strings I suggest some form of radix search.
Also, take a look at different data structures and algorithms (AVL trees, queues/heaps, etc.) in this package.
I'm pretty sure that using a JS object as storage for strings will put that object into 'hash mode'. Depending on the implementation, lookups could then be anywhere from O(log n) to O(1). Look at some jsperf benchmarks comparing property lookup with binary search on a sorted array.
In practice, especially if I'm not going to use the code in a browser, I would offload this functionality to something like Redis or Memcached.

Iterable O(1) insert and random delete collection

I am looking to implement my own collection class. The characteristics I want are:
Iterable - order is not important
Insertion - either at end or at iterator location, it does not matter
Random Deletion - this is the tricky one. I want to be able to have a reference to a piece of data which is guaranteed to be within the list, and remove it from the list in O(1) time.
I plan on the container only holding custom classes, so I was thinking a doubly linked list that required the components to implement a simple interface (or abstract class).
Here is where I am getting stuck. I am wondering whether it would be better practice to simply have the items in the list hold a reference to their node, or to build the node right into them. I feel like both would be fairly simple, but I am worried about coupling these nodes into a bunch of classes.
I am wondering if anyone has an idea as to how to minimize the coupling, or possibly know of another data structure that has the characteristics I want.
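Because iteration order does not matter here, one option besides a linked list is swap-with-last removal in an array (a technique not named in the answers below). In this minimal JavaScript sketch, with hypothetical names, add() returns a handle that the caller keeps, so the stored items need no interface and no knowledge of nodes; remove(handle) moves the last element into the vacated slot and updates that element's handle, which is O(1).

```javascript
// Unordered collection: O(1) add, O(1) removal given a handle, O(n) iteration.
// Removal fills the hole with the last element, so order is not preserved.
class HandleBag {
  constructor() {
    this.slots = []; // array of { item, handle }
  }

  add(item) {
    const handle = { index: this.slots.length, live: true };
    this.slots.push({ item, handle });
    return handle;
  }

  remove(handle) {
    if (!handle.live) return false; // already removed
    const i = handle.index;
    const last = this.slots.pop();
    if (i < this.slots.length) {
      // Move the former last element into the hole and update its handle.
      this.slots[i] = last;
      last.handle.index = i;
    }
    handle.live = false;
    return true;
  }

  get size() {
    return this.slots.length;
  }

  *[Symbol.iterator]() {
    for (const { item } of this.slots) yield item;
  }
}

const bag = new HandleBag();
const ha = bag.add("a");
bag.add("b");
bag.add("c");
bag.remove(ha);
console.log([...bag]); // [ 'c', 'b' ]
```

Keeping the handle outside the item avoids coupling the stored classes to the container, at the cost of the caller holding one extra object per element.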

It'd be hard to beat a hash map.

Take a look at tries.
Apparently they can beat hash tables:
Unlike most other algorithms, tries have the peculiar feature that the time to insert, or to delete or to find is almost identical because the code paths followed for each are almost identical. As a result, for situations where code is inserting, deleting and finding in equal measure tries can handily beat binary search trees or even hash tables, as well as being better for the CPU's instruction and branch caches.
It may or may not fit your usage, but if it does, it's likely one of the best options possible.
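A minimal trie sketch in JavaScript follows; the class name and the Map-per-node layout are illustrative assumptions. insert, has and delete each walk one node per character, which is why their cost depends on key length rather than on how many keys are stored.

```javascript
// Minimal trie over string keys. Each node: { children: Map<char, node>, end: boolean }.
class Trie {
  constructor() {
    this.root = { children: new Map(), end: false };
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, { children: new Map(), end: false });
      node = node.children.get(ch);
    }
    node.end = true; // mark that a stored word ends here
  }

  has(word) {
    let node = this.root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (!node) return false;
    }
    return node.end;
  }

  // Unmark the word, then prune nodes that no longer lead to any stored word.
  delete(word) {
    const path = [this.root];
    for (const ch of word) {
      const next = path[path.length - 1].children.get(ch);
      if (!next) return false;
      path.push(next);
    }
    const last = path[path.length - 1];
    if (!last.end) return false;
    last.end = false;
    const chars = [...word];
    for (let i = chars.length; i > 0; i--) {
      const node = path[i];
      if (node.end || node.children.size > 0) break;
      path[i - 1].children.delete(chars[i - 1]);
    }
    return true;
  }
}

const t = new Trie();
for (const w of ["car", "cart", "cat"]) t.insert(w);
t.delete("cart");
console.log(t.has("car"), t.has("cart"), t.has("ca")); // true false false
```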

In C++, this sounds like the perfect fit for std::unordered_set (that's std::tr1::unordered_set or boost::unordered_set to you if you have an older compiler). It's implemented as a hash set, which has the characteristics you describe.
Here's the interface documentation. Note that the hash containers actually offer two sets of iterators, the usual ones and local ones which only go through one bucket.
Many other languages have "hash sets" as well, certainly Java and C#.
