Must a hash table be implemented using an array? - data-structures

Must a hash table be implemented using an array? Will an alternative data structure achieve the same efficiency? If so, why? If not, what condition must the data structure satisfy to ensure the same efficiency as provided by arrays?

Must a hash table be implemented using an array?
No. You could implement the hash table interface with other data structures besides the array, e.g. a red-black tree (Java's TreeMap).
This offers O(log N) access time.
But a hash table is expected to offer O(1) access time (in the best case, with no collisions).
That can be achieved only via an array, which offers the possibility of random access in constant time.
What condition must the data structure satisfy to ensure the same efficiency as provided by arrays?
It must offer performance comparable to an array, i.e. sub-linear (better than O(N)) access for the operations involved. A TreeMap, for example, has O(log N) worst-case access time for all operations.
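A minimal Java sketch of the contrast (the class name and the stored values are illustrative, not from the question):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class AccessTimeDemo {
        public static void main(String[] args) {
            // Array-backed: the hash of the key is turned into a bucket
            // index, so lookup is expected O(1).
            Map<String, Integer> hashBacked = new HashMap<>();

            // Red-black-tree-backed: lookup walks the tree by key
            // comparisons, so it is O(log N) even without collisions.
            Map<String, Integer> treeBacked = new TreeMap<>();

            hashBacked.put("apple", 1);
            treeBacked.put("apple", 1);

            // Both satisfy the same Map interface; only the access
            // complexity differs.
            System.out.println(hashBacked.get("apple")); // expected O(1)
            System.out.println(treeBacked.get("apple")); // O(log N)
        }
    }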

Related

Why not use hashing/hash tables for everything?

In computer science, it is said that the insert, delete, and search operations for hash tables have a complexity of O(1), which is the best possible. So I was wondering: since hashing operations are so fast, why do we need other data structures at all? Why can't we simply use hashing/hash tables for everything?
Hash tables, on average, do have excellent time complexity for insertion, retrieval, and deletion. BUT:
Big-O complexity isn't everything. The constant factor is also very important. You could use hashtables in place of arrays, with the array indexes as hash keys. In either case, the time complexity of retrieving an item is O(1). But the constant factor is way higher for the hash table as opposed to the array.
Memory consumption may be much higher. This is certainly true if you use hash tables to replace arrays. (Of course, if the array is sparse, then the hash table may take less memory.)
There are some operations which are not efficiently supported by hash tables, such as iterating over all the elements whose keys are within a certain range, finding the element with the largest key or smallest key, and so on.
The O(1) complexity holds only on average. In some extreme cases (for example, when all the data fall into the same bucket), a hash table becomes inefficient, degrading to O(n).
All of that aside, you do still have a good point. Hashtables have an extraordinarily broad range of suitable use cases. That's why they are the primary built-in data structure in some scripting languages, like Lua.
You may use a hash table to search for an element, but you cannot use it for things like quickly finding the largest number; you should use the data structure suited to the specific problem. Hashing cannot solve every problem.
A hash table is not the answer to everything. If your hash function does not distribute your keys well, a HashMap may turn into a linked list in the worst case, for which insertion, deletion, and search all take O(N).
A HashMap has a significant memory footprint, so there are use cases where memory is more precious than time complexity, and there a HashMap may not be the best choice.
A HashMap is also not an answer for range queries or prefix queries. That is why most database vendors implement indexing with B-trees rather than hashing alone when range or prefix queries must be supported.
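As a small illustration of the range/prefix point, here is a Java sketch using TreeMap (the keys and bounds are made up for the example); a HashMap has no equivalent of subMap and would need to scan every entry:

    import java.util.SortedMap;
    import java.util.TreeMap;

    public class RangeQueryDemo {
        public static void main(String[] args) {
            TreeMap<String, Integer> index = new TreeMap<>();
            index.put("apple", 1);
            index.put("banana", 2);
            index.put("cherry", 3);
            index.put("date", 4);

            // Range query: all keys from "b" (inclusive) to "d" (exclusive).
            // A tree keeps keys ordered, so this costs O(log N) to locate
            // the bounds plus the size of the result.
            SortedMap<String, Integer> range = index.subMap("b", "d");
            System.out.println(range); // {banana=2, cherry=3}

            // Prefix query: keys starting with "ba" form a contiguous run
            // in sorted order.
            System.out.println(index.subMap("ba", "ba\uffff"));
        }
    }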
Hash tables in general exhibit poor locality of reference: the data to be accessed is distributed seemingly at random in memory.
For certain string processing applications, such as spellchecking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.
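A minimal sketch of that direct-addressing idea, assuming keys fit in 8 bits (the table size and values are illustrative):

    public class DirectAddressDemo {
        public static void main(String[] args) {
            // If keys are small integers (here: 0..255), the key can be
            // used directly as an array index. No hash function, and no
            // collisions, since each key owns exactly one slot.
            String[] table = new String[256];

            table[65] = "value for key 65";
            table[66] = "value for key 66";

            // Lookup is a single indexing operation: O(1) with a tiny
            // constant factor.
            System.out.println(table[65]);
        }
    }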
Hash tables are not sorted (use a sorted map)
Hash tables are not best for head/tail inserts (use a linked list/deque)
Hash tables have overhead to support searching (compare a plain vector/array)
The potential security issues of hash tables on the web should also be pointed out. If someone knows the hash function, that person may perform a denial-of-service attack by creating lots of items with the same hashcode.
I don't get it; aren't enum/symbol keys wasteful enough? ;) What about just using the raw string pointer as the key? I must have overlooked some obvious advantage of hashing... but now that I think about it, it makes less and less sense.
It's all just a local representation anyway, right? I mean, I could share the data everywhere, via APIs, IPC, or RPC, but those hashed keys aren't much help unless the full string is embedded too.
Meaning you just spent a lot of time hashing strings back and forth for your own amusement.
I'll just leave this here...

What is the best data structure for fast dictionary search?

As far as I know, hash tables and double-array tries are two of the fastest data structures for searching a dictionary. Are there any other data structures or algorithms that can beat them?
A hash table need not always be a fast data structure for searching. It really depends on how good your hash function is. If your hash function is poor, it may map multiple keys to the same index, causing collisions and making the hash table degenerate to O(n) run time.
Self-balancing trees are considered fast data structures as well, since they guarantee O(log n) operations.
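To make the degenerate-hash point concrete, here is a contrived Java example (the BadKey class is invented for illustration): a hashCode that sends every key to the same bucket forces lookups to distinguish entries by equals(), effectively a linear scan. (Modern Java HashMaps mitigate this by converting large buckets into trees, which helps most when keys are Comparable.)

    import java.util.HashMap;
    import java.util.Map;

    public class BadHashDemo {
        // A key whose hashCode sends every instance to the same bucket,
        // so the hash table's O(1) promise is lost.
        static final class BadKey {
            final String value;
            BadKey(String value) { this.value = value; }

            @Override public int hashCode() { return 42; } // all keys collide
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).value.equals(value);
            }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey("key" + i), i);
            }
            // Correct, but every get() must search one overcrowded bucket.
            System.out.println(map.get(new BadKey("key9999")));
        }
    }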

questions on data structure design for vectors?

While reading some materials on data structure design for sparse vectors, I came across the following statements by the authors.
A hash table could be used to implement a simple index-to-value mapping. Accessing an index value is slower than with direct array access, but not by much.
Why is accessing an index value slower when using a hash table?
Further, the authors state that
The problem with a hash-backed implementation is that it becomes relatively slow to iterate through all values in order by index. An ordered mapping based on a tree structure or similar can address this problem, since it maintains keys in order. The price of this feature is longer access time.
Why does a hash-based implementation perform badly when iterating through all the values? Is that due to the slower operation of accessing an index?
How can a tree structure help with this kind of issue?
Accessing a hash table index is just a bit slower because of the calculation overhead.
In a hash table, if you request item 452345435, that doesn't mean it's in cell 452345435... The hash table performs a series of calculations to find the right cell. This is implementation dependent.
Hash table Performance analysis
Hash tables don't store data in sorted order. So if you want to get the items in the right order, a sorting algorithm will need to be called.
To solve that, you can use a tree or any other sorted data structure.
But that will increase the insertion complexity from O(1) (hash table alone) to O(log n) (inserting into a tree as well), because each index will be added to both data structures and the combined cost is O(1) + O(log n) = O(log n).
It will still take only O(1) to retrieve the data, because it's enough to request it from the hash table.
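One way to read that answer as code is the following Java sketch (the SparseVector class and its method names are illustrative): a HashMap for O(1) random access plus a TreeMap for ordered iteration, with every insert going into both.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class SparseVector {
        private final Map<Integer, Double> byIndex = new HashMap<>();
        private final TreeMap<Integer, Double> ordered = new TreeMap<>();

        // Insert goes into both structures: O(1) + O(log n) = O(log n).
        public void set(int index, double value) {
            byIndex.put(index, value);
            ordered.put(index, value);
        }

        // Random access uses only the hash map: O(1).
        public double get(int index) {
            return byIndex.getOrDefault(index, 0.0);
        }

        // In-order iteration uses only the tree: O(n) over the non-zero
        // entries, already sorted by index.
        public void printInOrder() {
            ordered.forEach((i, v) -> System.out.println(i + " -> " + v));
        }
    }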

Is a data structure implementation with O(1) search possible without using arrays?

I am currently taking a university course in data structures, and this topic has been bothering me for a while now (this is not a homework assignment, just a purely theoretical question).
Let's assume you want to implement a dictionary. The dictionary should, of course, have a search function, accepting a key and returning a value.
Right now, I can only imagine 2 very general methods of implementing such a thing:
Using some kind of search tree, which would (always?) give an O(log n) worst case running time for finding the value by the key, or,
Hashing the key, which essentially returns a natural number which corresponds to an index in an array of values, giving an O(1) worst case running time.
Is O(1) worst case running time possible for a search function, without the use of arrays?
Is random access available only through the use of arrays?
Is it possible through the use of a pointer-based data structure (such as linked lists, search trees, etc.)?
Is it possible when making some specific assumptions, for example, the keys being in some order?
In other words, can you think of an implementation (if one is possible) for the search function and the dictionary that will receive any key in the dictionary and return its value in O(1) time, without using arrays for random access?
Here's another answer I made on that general subject.
Essentially, algorithms reach their results by processing a certain number of bits of information. The length of time they take depends on how quickly they can do that.
A decision point having only 2 branches cannot process more than 1 bit of information. However, a decision point having n branches can process up to log(n) bits (base 2).
The only mechanism I'm aware of, in computers, that can process more than 1 bit of information, in a single operation, is indexing, whether it is indexing an array or executing a jump table (which is indexing an array).
It is not the use of an array as such that makes the lookup O(1); it's the fact that the lookup time is not dependent upon the size of the data storage. Hence any method that accesses data directly, without a search proportional in some way to the data storage size, would be O(1).
You could have a hash map implemented with a trie. The complexity is O(max(length(string))), so if your strings have bounded length, you could say it runs in O(1): it doesn't depend on the number of strings in the structure. http://en.wikipedia.org/wiki/Trie
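A bare-bones trie map along those lines might look like the following Java sketch (the TrieMap class is invented for illustration; for brevity the child pointers live in per-node HashMaps, though a pointer-per-character layout would avoid arrays entirely at the cost of more code):

    import java.util.HashMap;
    import java.util.Map;

    // A minimal trie mapping strings to values. Lookup cost depends on
    // the length of the key, not on how many keys are stored, so for
    // keys of bounded length it can be treated as O(1).
    public class TrieMap<V> {
        private static final class Node<V> {
            final Map<Character, Node<V>> children = new HashMap<>();
            V value; // null if no key ends here
        }

        private final Node<V> root = new Node<>();

        public void put(String key, V value) {
            Node<V> node = root;
            for (char c : key.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node<>());
            }
            node.value = value;
        }

        public V get(String key) {
            Node<V> node = root;
            for (char c : key.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return null; // key not present
            }
            return node.value;
        }
    }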

What data structure would have the fastest time for search and insert functions?

The question pretty much says it all, but I'm building a compiler and am trying to decide on what sort of data structure to use for my symbol table. Considering the only functions the symbol table will need is a search and an insert I'd like to use a data structure that can do those as quickly as possible. Any suggestions?
Hash tables are very commonly used for this. A simple implementation with N bins and a hash function that takes the sum of the letters in each symbol (mod N) should be very close to O(1) on insert and search.
Dictionary/Hashtable has a lookup speed of O(1) if I'm not mistaken.
About the hash table lookup: it's O(1) only if no or few collisions occur. So, assuming you have an appropriate hashing function, it's typically O(1), but in the worst case it could end up at O(N). A good estimation of the data size is crucial.
And you should also consider the time complexity of the hashing function you intend to use.
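A minimal sketch of such a symbol table, assuming a plain HashMap is acceptable (the SymbolTable and SymbolInfo names are placeholders for whatever the compiler actually needs to record):

    import java.util.HashMap;
    import java.util.Map;

    // Insert and search only, both expected O(1) on a HashMap.
    public class SymbolTable {
        public record SymbolInfo(String type, int scopeLevel) {}

        private final Map<String, SymbolInfo> table = new HashMap<>();

        public void insert(String name, SymbolInfo info) {
            table.put(name, info);
        }

        public SymbolInfo search(String name) {
            return table.get(name); // null if the symbol is undeclared
        }

        public static void main(String[] args) {
            SymbolTable symbols = new SymbolTable();
            symbols.insert("count", new SymbolInfo("int", 0));
            System.out.println(symbols.search("count"));
        }
    }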
