Is finding the length of a hash map a costly operation? I understand it depends on the implementation, so how about in these languages:
Javascript
Java
Python
PHP (in PHP we use count($array), if I am correct)
Additional Question
Is there any source where I can learn how to determine the cost of an operation (from primitive data types up to complex structures)?
HashMap in the Java Collections Framework has a size() method that returns an internal count, which is updated whenever elements are added or removed.
If you are asking about computational complexity: the size() method itself clearly runs in constant time.
Computing the size from scratch, i.e. ignoring the count that is already maintained and implementing your own method, would cost the same as visiting every element of the HashMap, which is O(n).
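A minimal sketch of the bookkeeping (a wrapper written purely for illustration, not HashMap's actual source): the count is adjusted on every put/remove, so size() only reads a field.

import java.util.HashMap;
import java.util.Map;

// Illustrative wrapper that maintains its own element count, mimicking
// the way HashMap keeps an internal size field up to date.
class CountedMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private int size = 0; // updated on every structural change

    public V put(K key, V value) {
        if (!delegate.containsKey(key)) {
            size++; // only a new key grows the map
        }
        return delegate.put(key, value);
    }

    public V remove(K key) {
        if (delegate.containsKey(key)) {
            size--;
        }
        return delegate.remove(key);
    }

    public int size() {
        return size; // O(1): reads a field, never iterates the entries
    }
}

The other languages in the question behave similarly in practice: Python's len(dict) and PHP's count($array) read a stored count, and JavaScript's Map.size is likewise cheap, whereas Object.keys(obj).length for a plain object has to build the key array first.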
I'm getting confused about the time complexity of hash tables in general.
I understand that insert, search, etc. are amortized O(1) time due to resizing, the fact that array accesses after computing a hashcode are constant time, that we can always keep the load factor of the hash table to a constant value, and that good hashcodes enable us to have minimal collisions.
However, my question is about the runtime of the hashCode() itself.
My understanding is that insert(), get(), etc. rely on computing the hashcode first and then, conventionally, storing a given item at index i = hashCode() % buckets.length.
This seems to conflict with the amortized O(1) performance of the hashtable, because O(1) performance seems to assume that computing a hashcode is constant time.
However, from my understanding, for many recursive objects (such as a linked list), we compute the hashcode recursively (I'm omitting the base case for simplicity):
public int hashCode() {
    return this.item.hashCode() + 31 * this.next.hashCode();
}
Wouldn't this mean that hashCode() has O(N) runtime and thus insert() for the hashtable would have O(N) runtime?
If this is the case, then wouldn't insert() for hashtables be pretty bad for items which are being inserted which have slow hashCode() functions? Even for something like inserting a String, doesn't this mean that insert() would be O(N) where N is the length of the string, since we compute the hashcode based on each character in the string?
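To make that concrete, here is the kind of per-character hash I mean (a sketch with the same 31-based shape as java.lang.String's documented formula; one multiply-add per character, so a single call is O(L) in the string length):

// Sketch of a polynomial string hash; cost grows linearly with the length.
class StringHashSketch {
    static int hashOf(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one step per character => O(L)
        }
        return h;
    }
}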
I've seen hashcodes get cached before, but caching only applies to a particular instance of an object, which means we would need to insert the same instance multiple times for the cache to help, which doesn't seem particularly useful in practical scenarios.
If what I'm saying is correct, then, does this mean that for some sequential data such as strings, a Trie would have better performance since its insert() and get() would be O(L) worst case where L is the length of the item/sequence being inserted - is this the point of Tries?
Must a hash table be implemented using an array? Will an alternative data structure achieve the same efficiency? If yes, why? If no, what condition must the data structure satisfy to ensure the same efficiency as provided by arrays?
Must a hash table be implemented using an array?
No. You could implement the hash table interface with data structures other than an array, e.g. a red-black tree (Java's TreeMap).
This offers O(logN) access time.
But a hash table is expected to have O(1) access time (in the best case, with no collisions).
This can be achieved only via an array, which offers random access in constant time.
What condition must the data structure satisfy to ensure the same efficiency as provided by arrays?
It must offer performance comparable to an array's, i.e. better than O(N). A TreeMap has O(log N) worst-case access time for all operations.
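To illustrate the point with a sketch (the class below is made up for illustration, not any library's internals): the bucket index is computed arithmetically from the hash, and the jump into the array costs the same no matter how many entries are stored.

import java.util.Objects;

// Illustrative hash table fragment: constant-time jump into the bucket array.
class SimpleTable<K, V> {
    private static final class Bucket<K, V> {
        final K key;
        V value;
        Bucket<K, V> next;
        Bucket(K key, V value, Bucket<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Bucket<K, V>[] buckets = new Bucket[16]; // power-of-two size

    V get(K key) {
        int index = key.hashCode() & (buckets.length - 1); // pure arithmetic
        // One random-access step into the array: O(1) regardless of table size.
        for (Bucket<K, V> b = buckets[index]; b != null; b = b.next) {
            if (Objects.equals(b.key, key)) {
                return b.value;
            }
        }
        return null;
    }
}

Replace that array with, say, a tree keyed by the index and the single step becomes O(log N), which is exactly the difference described above.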
I want to cache 10,000+ key/value pairs (both strings) and started thinking about which .NET (2.0, bound to MS Studio 2005 :( ) structure would be best. All items will be added in one shot, then there will be a few hundred queries for particular keys.
I've read the MSDN descriptions referenced in the other question, but I still miss some details about the implementation/complexity of operations on the various collections.
E.g. in the above-mentioned question, there is a quote from MSDN saying that SortedList is based on a tree and that SortedDictionary "has a similar object model" but different complexity.
The other question: are HashTable and Dictionary implemented in the same way?
For HashTable, they write:
If Count is less than the capacity of the Hashtable, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
But when is the capacity increased? With every "Add"? Then adding a series of key/value pairs would have quadratic complexity, the same as with SortedList.
Not to mention OrderedDictionary, where nothing is said about implementation or complexity.
Maybe someone knows some good article about implementation of .NET collections?
The capacity of the Hashtable is different from the Count.
Normally the capacity, i.e. the maximum number of items that can be stored (usually related to the number of underlying hash buckets), doubles when a "grow" is required, although this is implementation-dependent. The Count simply refers to the number of items actually stored, which must be less than or equal to the capacity but is otherwise not related to it.
Because of the exponentially increasing interval between the O(n) resizes (where n = Count), most hash implementations claim O(1) amortized access. The quote is just saying: "Hey! It's amortized and isn't always true!"
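A rough sketch of the amortized argument with a toy counter (Java here, purely for illustration; it mimics the doubling policy, not any actual .NET class): the total copy work over n adds is bounded by roughly 2n, so the average per add stays constant.

// Counts how many element copies a doubling backing array performs.
class GrowthCost {
    static long copiesForAdds(int adds) {
        int capacity = 4;
        int count = 0;
        long copies = 0;
        for (int i = 0; i < adds; i++) {
            if (count == capacity) {
                copies += count; // a resize copies every existing element
                capacity *= 2;   // then the capacity doubles
            }
            count++;
        }
        return copies;           // stays below 2 * adds => O(1) amortized per add
    }
}

For example, copiesForAdds(1_000_000) comes out just over one million, i.e. roughly one copy per added item on average rather than anything quadratic.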
Happy coding.
If you are adding that many pairs, you can/should use the Dictionary constructor that lets you specify the capacity in advance. Then every add and lookup will be O(1).
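An analogous sketch in Java terms (not the .NET API this answer refers to; just the same presizing idea, with the usual 0.75 load factor accounted for):

import java.util.HashMap;
import java.util.Map;

// Presize the map so no rehashing happens while loading the expected entries.
class PresizedCache {
    static Map<String, String> create(int expectedEntries) {
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }
}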
If you really want to see how these classes are implemented, you can look at the Rotor source or use .NET Reflector to look at System.Collections (not sure of the legality of the latter).
The HashTable and Dictionary are implemented in the same way. Dictionary is the generic replacement for the HashTable.
When the capacity of collections like List and Dictionary has to increase, it grows at a certain rate. For List the rate is 2.0, i.e. the capacity is doubled. I don't know the exact rate for Dictionary, but it works the same way.
For a List, the way the capacity is increased means that each item is copied on average about 1.3 extra times. As that value stays constant while the list grows, the Add method is still an O(1) operation on average.
Dictionary is a kind of hashtable; I never use the original Hashtable since it only holds "objects". Don't worry about the fact that insertion is O(N) when the capacity is increased; Dictionary always doubles the capacity when the hashtable is full, so the average (amortized) complexity is O(1).
You should almost never use SortedList (which is basically an array), since complexity is O(N) for each insert or delete (assuming the data is not already sorted; if it is, inserts are O(1), but then you don't need SortedList anyway because an ordinary List would suffice). Instead of SortedList, use SortedDictionary, which offers O(log N) insert, delete, and search. However, SortedDictionary is slower than Dictionary, so use it only if your data needs to be sorted.
You say you want to cache 10,000 key-value pairs. If you want to do all the inserts before you do any queries, an efficient method is to create an unsorted List, then Sort it, and use BinarySearch for queries. This approach saves a lot of memory compared to using SortedDictionary, and it creates less work for the garbage collector.
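A sketch of that approach in Java terms (illustrative only, not the .NET code): bulk-load the pairs, sort once by key, then answer each query with a binary search.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Load everything, sort once (O(n log n)), then query in O(log n) each.
class SortedPairLookup {
    private final List<Map.Entry<String, String>> pairs = new ArrayList<>();
    private boolean sorted = false;

    void add(String key, String value) {
        pairs.add(new SimpleEntry<>(key, value));
        sorted = false;
    }

    String get(String key) {
        if (!sorted) {
            pairs.sort(Map.Entry.comparingByKey()); // the one-off sort
            sorted = true;
        }
        int index = Collections.binarySearch(
                pairs,
                new SimpleEntry<String, String>(key, null),
                Map.Entry.comparingByKey());        // O(log n) per lookup
        return index >= 0 ? pairs.get(index).getValue() : null;
    }
}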
I am currently taking a university course in data structures, and this topic has been bothering me for a while now (this is not a homework assignment, just a purely theoretical question).
Let's assume you want to implement a dictionary. The dictionary should, of course, have a search function, accepting a key and returning a value.
Right now, I can only imagine 2 very general methods of implementing such a thing:
Using some kind of search tree, which would (always?) give an O(log n) worst case running time for finding the value by the key, or,
Hashing the key, which essentially returns a natural number which corresponds to an index in an array of values, giving an O(1) worst case running time.
Is O(1) worst case running time possible for a search function, without the use of arrays?
Is random access available only through the use of arrays?
Is it possible through the use of a pointer-based data structure (such as linked lists, search trees, etc.)?
Is it possible when making some specific assumptions, for example, the keys being in some order?
In other words, can you think of an implementation (if one is possible) for the search function and the dictionary that will receive any key in the dictionary and return its value in O(1) time, without using arrays for random access?
Here's another answer I made on that general subject.
Essentially, algorithms reach their results by processing a certain number of bits of information. The length of time they take depends on how quickly they can do that.
A decision point having only 2 branches cannot process more than 1 bit of information. However, a decision point having n branches can process up to log(n) bits (base 2).
The only mechanism I'm aware of, in computers, that can process more than 1 bit of information, in a single operation, is indexing, whether it is indexing an array or executing a jump table (which is indexing an array).
It is not the use of an array that makes the lookup O(1); it's the fact that the lookup time is not dependent upon the size of the data storage. Hence any method that accesses data directly, without a search proportional in some way to the size of the data storage, would be O(1).
You could have a hash implemented with a trie. The complexity is O(max(length(string))); if your strings have bounded length, you could say it runs in O(1), since it does not depend on the number of strings in the structure. http://en.wikipedia.org/wiki/Trie
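A minimal trie sketch to go with that (illustrative, not a library implementation; the per-node child lookup below uses a small map for brevity, but in the strictly array-free setting of the question it could just as well be a short linked list of children without changing the bound): insert and lookup each walk one node per character, so the cost is O(length of the string), independent of how many strings are stored.

import java.util.HashMap;
import java.util.Map;

// Minimal trie over strings: cost per operation is O(key length),
// independent of how many keys the trie already holds.
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored key ends at this node
    }

    private final Node root = new Node();

    void insert(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    boolean contains(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }
}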
Say we are traversing a graph and want to quickly determine if a node has been seen before or not. We have a few set preconditions.
Nodes have been marked with integers values 1..N
Graph is implemented with nodes having an adjacency list
Every integer value from 1..N occurs in the graph, which is of size N
Any ideas for doing this in a purely functional way? (No hash tables or arrays allowed.)
I want a data structure with two functions working on it: add (adds an encountered integer) and lookup (checks whether an integer has been added). Both should preferably take O(N) time amortized over N repetitions.
Is this possible?
You can use a Data.Set. You add an element by creating a new set from the old one with insert and pass the new set around. You look up whether an element is a member of the set with member. Both operations are O(log n).
Perhaps, you could consider using a state monad to thread the passing of the set.
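A sketch of the "insert returns a new set" idea, written in Java rather than Haskell since most of this document's examples are Java (illustrative only; Data.Set uses a balanced tree to guarantee O(log n), whereas this unbalanced version is O(log n) only on average):

// Persistent (immutable) unbalanced binary search tree of ints.
// insert() returns a NEW tree that shares most nodes with the old one;
// nothing is ever mutated, which is the purely functional property.
final class PSet {
    final int value;
    final PSet left, right;

    private PSet(int value, PSet left, PSet right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    static PSet insert(PSet t, int x) {
        if (t == null) return new PSet(x, null, null);
        if (x < t.value) return new PSet(t.value, insert(t.left, x), t.right);
        if (x > t.value) return new PSet(t.value, t.left, insert(t.right, x));
        return t; // already present: reuse the old tree untouched
    }

    static boolean member(PSet t, int x) {
        if (t == null) return false;
        if (x < t.value) return member(t.left, x);
        if (x > t.value) return member(t.right, x);
        return true;
    }
}

Because old versions stay valid, you can thread the current set through the traversal (or through a State monad, as suggested above) exactly like the Data.Set value.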
Efficient element lookup in functional languages is quite hard. Data.Set (as shown above) is implemented using a balanced binary tree, which can be built in a purely functional way and provides lookup operations in O(log n). Hash tables (which aren't purely functional) would give O(1).
I believe that Data.BitSet might be O(n).
Take a look at judy hashtables, if you don't mind wrapping your code in the IO monad.