How is a hash mapped to a location in a vector? What happens when the vector is resized? - data-structures

I've been thinking about HashMaps/Dictionaries recently, and I realise there is a gap in my understanding of their implementation.
From my data structures classes, I was told that a hash value will map a key directly to a location in a vector of linked-list buckets. According to wikipedia, MurmurHash creates a 32-bit or 128-bit value. Obviously that value cannot map directly to a location in memory. How is that hash value used to assign a location in the underlying vector to the object being placed in the hash map?
After reading David Robinson's answer I want to expand my question:
If the mapping is based on the size of the underlying vector of lists, what happens when the vector is resized?

Typically when the result of a hash is generated, it is put through modulo N, where N is the size of the allocated vector of linked lists. Pseudocode:
linked_list = lists[MurmurHash(x) % len(lists)]
linked_list.append(x)
This lets the implementer decide on the length of the vector of linked lists (that is, how much he wants to trade space efficiency for time efficiency), while keeping the result pseudorandom.
A common alternative worth mentioning is bit masking- for example, ignoring all but the b least significant bits. (For instance, performing the operation x & 7 ignores all but the 3 least significant bits). This is equivalent to x modulo 2^b, it just happens to be faster on most operating systems.
To answer your second question: if the vector has to be resized, then each value stored in the dictionary does indeed need to be remapped.
There is an excellent blog post on the implementation of dictionaries in Python that explains how that language implements its built in hash table (which it calls dictionaries). In that implementation:
The dictionary is resized (made larger) if more than 2/3 of its slots are being used
The list of slots is resized to 4 times its current size
Every value in the old list of slots is remapped to the new list of slots.
There are many other useful optimizations described in that blog post; it gives an excellent view of the practical aspects of implementing a hash table.

Related

Why are we using linked list to address collisions in hash tables?

I was wondering why many languages (Java, C++, Python, Perl etc) implement hash tables using linked lists to avoid collisions instead of arrays?
I mean instead of buckets of linked lists, we should use arrays.
If the concern is about the size of the array then that means that we have too many collisions so we already have a problem with the hash function and not the way we address collisions. Am I misunderstanding something?
I mean instead of buckets of linked lists, we should use arrays.
Pros and cons to everything, depending on many factors.
The two biggest problem with arrays:
changing capacity involves copying all content to another memory area
you have to choose between:
a) arrays of Element*s, adding one extra indirection during table operations, and one extra memory allocation per non-empty bucket with associated heap management overheads
b) arrays of Elements, such that the pre-existing Elements iterators/pointers/references are invalidated by some operations on other nodes (e.g. insert) (the linked list approach - or 2a above for that matter - needn't invalidate these)
...will ignore several smaller design choices about indirection with arrays...
Practical ways to reduce copying from 1. include keeping excess capacity (i.e. currently unused memory for anticipated or already-erased elements), and - if sizeof(Element) is much greater than sizeof(Element*) - you're pushed towards arrays-of-Element*s (with "2a" problems) rather than Element[]s/2b.
There are a couple other answers claiming erasing in arrays is more expensive than for linked lists, but the opposite's often true: searching contiguous Elements is faster than scanning a linked list (less steps in code, more cache friendly), and once found you can copy the last array Element or Element* over the one being erased then decrement size.
If the concern is about the size of the array then that means that we have too many collisions so we already have a problem with the hash function and not the way we address collisions. Am I misunderstanding something?
To answer that, let's look at what happens with a great hash function. Packing a million elements into a million buckets using a cryptographic strength hash, a few runs of my program counting the number of buckets to which 0, 1, 2 etc. elements hashed yielded...
0=367790 1=367843 2=184192 3=61200 4=15370 5=3035 6=486 7=71 8=11 9=2
0=367664 1=367788 2=184377 3=61424 4=15231 5=2933 6=497 7=75 8=10 10=1
0=367717 1=368151 2=183837 3=61328 4=15300 5=3104 6=486 7=64 8=10 9=3
If we increase that to 100 million elements - still with load factor 1.0:
0=36787653 1=36788486 2=18394273 3=6130573 4=1532728 5=306937 6=51005 7=7264 8=968 9=101 10=11 11=1
We can see the ratios are pretty stable. Even with load factor 1.0 (the default maximum for C++'s unordered_set and -map), 36.8% of buckets can be expected to be empty, another 36.8% handling one Element, 18.4% 2 Elements and so on. For any given array resizing logic you can easily get a sense of how often it will need to resize (and potentially copy elements). You're right that it doesn't look bad, and may be better than linked lists if you're doing lots of lookups or iterations, for this idealistic cryptographic-hash case.
But, good quality hashing is relatively expensive in CPU time, such that general purpose hash-table supporting hash functions are often very weak: e.g. it's very common for C++ Standard library implementations of std::hash<int> to return their argument, and MS Visual C++'s std::hash<std::string> picks 10 characters evently spaced along the string to incorporate in the hash value, regardless of how long the string is.
Clearly implementation's experience has been that this combination of weak-but-fast hash functions and linked lists (or trees) to handle the greater collision proneness works out faster on average - and has less user-antagonising manifestations of obnoxiously bad performance - for everyday keys and requirements.
Strategy 1
Use (small) arrays which get instantiated and subsequently filled once collisions occur. 1 heap operation for the allocation of the array, then room for N-1 more. If no collision ever occurs again for that bucket, N-1 capacity for entries is wasted. List wins, if collisions are rare, no excess memory is allocated just for the probability of having more overflows on a bucket. Removing items is also more expensive. Either mark deleted spots in the array or move the stuff behind it to the front. And what if the array is full? Linked list of arrays or resize the array?
One potential benefit of using arrays would be to do a sorted insert and then binary search upon retrieval. The linked list approach cannot compete with that. But whether or not that pays off depends on the write/retrieve ratio. The less frequently writing occurs, the more could this pay off.
Strategy 2
Use lists. You pay for what you get. 1 collision = 1 heap operation. No eager assumption (and price to pay in terms of memory) that "more will come". Linear search within the collision lists. Cheaper delete. (Not counting free() here). One major motivation to think of arrays instead of lists would be to reduce the amount of heap operations. Amusingly the general assumption seems to be that they are cheap. But not many will actually know how much time an allocation requires compared to, say traversing the list looking for a match.
Strategy 3
Use neither array nor lists but store the overflow entries within the hash table at another location. Last time I mentioned that here, I got frowned upon a bit. Benefit: 0 memory allocations. Probably works best if you have indeed low fill grade of the table and only few collisions.
Summary
There are indeed many options and trade-offs to choose from. Generic hash table implementations such as those in standard libraries cannot make any assumption regarding write/read ratio, quality of hash key, use cases, etc. If, on the other hand all those traits of a hash table application are known (and if it is worth the effort), it is well possible to create an optimized implementation of a hash table which is tailored for the set of trade offs the application requires.
The reason is, that the expected length of these lists is tiny, with only zero, one, or two entries in the vast majority of cases. Yet these lists may also become arbitrarily long in the worst case of a really bad hash function. And even though this worst case is not the case that hash tables are optimized for, they still need to be able to handle it gracefully.
Now, for an array based approach, you would need to set a minimal array size. And, if that initial array size is anything other then zero, you already have significant space overhead due to all the empty lists. A minimal array size of two would mean that you waste half your space. And you would need to implement logic to reallocate the arrays when they become full because you cannot put an upper limit to the list length, you need to be able to handle the worst case.
The list based approach is much more efficient under these constraints: It has only the allocation overhead for the node objects, most accesses have the same amount of indirection as the array based approach, and it's easier to write.
I'm not saying that it's impossible to write an array based implementation, but its significantly more complex and less efficient than the list based approach.
why many languages (Java, C++, Python, Perl etc) implement hash tables using linked lists to avoid collisions instead of arrays?
I'm almost sure, at least for most from that "many" languages:
Original implementors of hash tables for these languages just followed classic algorithm description from Knuth/other algorithmic book, and didn't even consider such subtle implementation choices.
Some observations:
Even using collision resolution with separate chains instead of, say, open addressing, for "most generic hash table implementation" is seriously doubtful choice. My personal conviction -- it is not the right choice.
When hash table's load factor is pretty low (that should chosen in nearly 99% hash table usages), the difference between the suggested approaches hardly could affect overall data structure perfromance (as cmaster explained in the beginning of his answer, and delnan meaningfully refined in the comments). Since generic hash table implementations in languages are not designed for high density, "linked lists vs arrays" is not a pressing issue for them.
Returning to the topic question itself, I don't see any conceptual reason why linked lists should be better than arrays. I can easily imagine, that, in fact, arrays are faster on modern hardware / consume less memory with modern momory allocators inside modern language runtimes / operating systems. Especially when the hash table's key is primitive, or a copied structure. You can find some arguments backing this opinion here: http://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures
But the only way to find the correct answer (for particular CPU, OS, memory allocator, virtual machine and it's garbage collection algorithm, and the hash table use case / workload!) is to implement both approaches and compare them.
Am I misunderstanding something?
No, you don't misunderstand anything, your question is legal. It's an example of fair confusion, when something is done in some specific way not for a strong reason, but, largely, by occasion.
If is implemented using arrays, in case of insertion it will be costly due to reallocation which in case of linked list doesn`t happen.
Coming to the case of deletion we have to search the complete array then either mark it as delete or move the remaining elements. (in the former case it makes the insertion even more difficult as we have to search for empty slots).
To improve the worst case time complexity from o(n) to o(logn), once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree (in java).

What makes table lookups so cheap?

A while back, I learned a little bit about big O notation and the efficiency of different algorithms.
For example, looping through each item in an array to do something with it
foreach(item in array)
doSomethingWith(item)
is an O(n) algorithm, because the number of cycles the program performs is directly proportional to the size of the array.
What amazed me, though, was that table lookup is O(1). That is, looking up a key in a hash table or dictionary
value = hashTable[key]
takes the same number of cycles regardless of whether the table has one key, ten keys, a hundred keys, or a gigabrajillion keys.
This is really cool, and I'm very happy that it's true, but it's unintuitive to me and I don't understand why it's true.
I can understand the first O(n) algorithm, because I can compare it to a real-life example: if I have sheets of paper that I want to stamp, I can go through each paper one-by-one and stamp each one. It makes a lot of sense to me that if I have 2,000 sheets of paper, it will take twice as long to stamp using this method than it would if I had 1,000 sheets of paper.
But I can't understand why table lookup is O(1). I'm thinking that if I have a dictionary, and I want to find the definition of polymorphism, it will take me O(logn) time to find it: I'll open some page in the dictionary and see if it's alphabetically before or after polymorphism. If, say, it was after the P section, I can eliminate all the contents of the dictionary after the page I opened and repeat the process with the remainder of the dictionary until I find the word polymorphism.
This is not an O(1) process: it will usually take me longer to find words in a thousand page dictionary than in a two page dictionary. I'm having a hard time imagining a process that takes the same amount of time regardless of the size of the dictionary.
tl;dr: Can you explain to me how it's possible to do a table lookup with O(1) complexity?
(If you show me how to replicate the amazing O(1) lookup algorithm, I'm definitely going to get a big fat dictionary so I can show off to all of my friends my ninja-dictionary-looking-up skills)
EDIT: Most of the answers seem to be contingent on this assumption:
You have the ability to access any page of a dictionary given its page number in constant time
If this is true, it's easy for me to see. But I don't know why this underlying assumption is true: I would use the same process to to look up a page by number as I would by word.
Same thing with memory addresses, what algorithm is used to load a memory address? What makes it so cheap to find a piece of memory from an address? In other words, why is memory access O(1)?
You should read the Wikipedia article.
But the essence is that you first apply a hash function to your key, which converts it to an integer index (this is O(1)). This is then used to index into an array, which is also O(1). If the hash function has been well designed, there should only be one (or a few items) stored at each location in the array, so the lookup is complete.
So in massively-simplified pseudocode:
ValueType array[ARRAY_SIZE];
void insert(KeyType k, ValueType v)
{
int index = hash(k);
array[index] = v;
}
ValueType lookup(KeyType k)
{
int index = hash(k);
return array[index];
}
Obviously, this doesn't handle collisions, but you can read the article to learn how that's handled.
Update
To address the edited question, indexing into an array is O(1) because underneath the hood, the CPU is doing this:
ADD index, array_base_address -> pointer
LOAD pointer -> some_cpu_register
where LOAD loads data stored in memory at the specified address.
Update 2
And the reason a load from memory is O(1) is really just because this is an axiom we usually specify when we talk about computational complexity (see http://en.wikipedia.org/wiki/RAM_model). If we ignore cache hierarchies and data-access patterns, then this is a reasonable assumption. As we scale the size of the machine,, this may not be true (a machine with 100TB of storage may not take the same amount of time as a machine with 100kB). But usually, we assume that the storage capacity of our machine is constant, and much much bigger than any problem size we're likely to look at. So for all intents and purposes, it's a constant-time operation.
I'll address the question from a different perspective from every one else. Hopefully this will give light to why the accessing x[45] and accessing x[5454563] takes the same amount of time.
A RAM is laid out in a grid (i.e. rows and columns) of capacitors. A RAM can address a particular cell of memory by activating a particular column and row on the grid, so let's say if you have a 16-byte capacity RAM, laid out in a 4x4 grid (insanely small for modern computer, but sufficient for illustrative purpose), and you're trying to access the memory address 13 (1101), you first split the address into rows and column, i.e row 3 (11) column 1 (01).
Let's suppose a 0 means taking the left intersection and a 1 means taking a right intersection. So when you want to activate row 3, you send an army of electrons in the row starting gate, the row-army electrons went right, right to reach row 3 activation gate; next you send another army of electrons on the column starting gate, the column-army electrons went left then right to reach the 1st column activation gate. A memory cell can only be read/written if the row and column are both activated, so this would allow the marked cell to be read/written.
The effect of all this gibberish is that the access time of a memory address depends on the address length, and not the particular memory address itself; if an architecture uses a 32-bit address space (i.e. 32 intersections), then addressing memory address 45 and addressing memory address 5454563 both will still have to pass through all 32 intersections (actually 16 intersections for the row electrons and 16 intersections for the columns electrons).
Note that in reality memory addressing takes very little amount of time compared to charging and discharging the capacitors, therefore even if we start having a 512-bit length address space (enough for ~1.4*10^130 yottabyte of RAM, i.e. enough to keep everything under the sun in your RAM), which mean the electrons would have to go through 512 intersections, it wouldn't really add that much time to the actual memory access time.
Note that this is a gross oversimplification of modern RAM. In modern DRAM, if you want to access subsequent memory addresses you only change the columns and not spend time changing the rows, therefore accessing subsequent memory is much faster than accessing totally random addresses. Also, this description is totally ignorant about the effect of CPU cache (although CPU cache also uses a similar grid addressing scheme, however since CPU cache uses the much faster transistor-based capacitor, the negative effect of having large cache address space becomes very critical). However, the point still holds that if you're jumping around the memory, accessing any one of them will take the same amount of time.
You're right, it's surprisingly difficult to find a real-world example of this. The idea of course is that you're looking for something by address and not value.
The dictionary example fails because you don't immediately know the location of page say 278. You still have to look that up the same as you would a word because the page locations are not in your memory.
But say I marked a number on each of your fingers and then I told you to wiggle the one with 15 written on it. You'd have to look at each of them (assuming its unsorted), and if it's not 15 you check the next one. O(n).
If I told you to wiggle your right pinky. You don't have to look anything up. You know where it is because I just told you where it is. The value I just passed to you is its address in your "memory."
It's kind of like that with databases, but on a much larger scale than just 10 fingers.
Because work is done up front -- the value is put in a bucket that is easily accessible given the hashcode of the key. It would be like if you wanted to look up your work in the dictionary but had marked the exact page the word was on.
Imagine you had a dictionary where everything starting with letter A was on page 1, letter B on page 2...etc. So if you wanted to look up "balloon" you would know exactly what page to go to. This is the concept behind O(1) lookups.
Arbitrary data input => maps to a specific memory address
The trade-off of course being you need more memory to allocate for all the potential addresses, many of which may never be used.
If you have an array with 999999999 locations, how long does it take to find a record by social security number?
Assuming you don't have that much memory, then allocate about 30% more array locations that the number of records you intend to store, and then write a hash function to look it up instead.
A very simple (and probably bad) hash function would be social % numElementsInArray.
The problem is collisions--you can't guarantee that every location holds only one element. But thats ok, instead of storing the record at the array location, you can store a linked list of records. Then you scan linearly for the element you want once you hash to get the right array location.
Worst case this is O(n)--everything goes to the same bucket. Average case is O(1) because in general if you allocate enough buckets and your hash function is good, records generally don't collide very often.
Ok, hash-tables in a nutshell:
You take a regular array (O(1) access), and instead of using regular Int values to access it, you use MATH.
What you do, is to take the key value (lets say a string) calculate it into a number (some function on the characters) and then use a well known mathematical formula that gives you a relatively good distribution on the array's range.
So, in that case you are just doing like 4-5 calculations (O(1)) to get an object from that array, using a key which isn't an int.
Now, avoiding collisions, and finding the right mathematical formula for good distribution is the hard part. That's what is explained pretty well in wikipedia: en.wikipedia.org/wiki/Hash_table
Lookup tables know exactly how to access the given item in the table before hand.
Completely the opposite of say, finding an item by it's value in a sorted array, where you have to access items to check that it is what you want.
In theory, a hashtable is a series of buckets (addresses in memory) and a function that maps objects from a domain into those buckets.
Say your domain is 3 letter words, you'd block out 26^3=17,576 addresses for all the possible 3 letter words and create a function that maps all 3 letter words to those addresses, e.g., aaa=0, aab=1, etc. Now when you have a word you'd like to look up, say, "and", you know immediately from your O(1) function that it is address number 367.

Efficient mapping from 2^24 values to a 2^7 index

I have a data structure that stores amongst others a 24-bit wide value. I have a lot of these objects.
To minimize storage cost, I calculated the 2^7 most important values out of the 2^24 possible values and stored them in a static array. Thus I only have to save a 7-bit index to that array in my data structure.
The problem is: I get these 24-bit values and I have to convert them to my 7-bit index on the fly (no preprocessing possible). The computation is basically a search which one out of 2^7 values fits best. Obviously, this takes some time for a big number of objects.
An obvious solution would be to create a simple mapping array of bytes with the length 2^24. But this would take 16 MB of RAM. Too much.
One observation of the 16 MB array: On average 31 consecutive values are the same. Unfortunately there are also a number of consecutive values that are different.
How would you implement this conversion from a 24-bit value to a 7-bit index saving as much CPU and memory as possible?
Hard to say without knowing what the definition is of "best fit". Perhaps a kd-tree would allow a suitable search based on proximity by some metric or other, so that you quickly rule out most candidates, and only have to actually test a few of the 2^7 to see which is best?
This sounds similar to the problem that an image processor has when reducing to a smaller colour palette. I don't actually know what algorithms/structures are used for that, but I'm sure they're look-up-able, and might help.
As an idea...
Up the index table to 8 bits, then xor all 3 bytes of the 24 bit word into it.
then your table would consist of this 8 bit hash value, plus the index back to the original 24 bit value.
Since your data is RGB like, a more sophisticated hashing method may be needed.
bit24var & 0x000f gives you the right hand most char.
(bit24var >> 8) & 0x000f gives you the one beside it.
(bit24var >> 16) & 0x000f gives you the one beside that.
Yes, you are thinking correctly. It is quite likely that one or more of the 24 bit values will hash to the same index, due to the pigeon hole principal.
One method of resolving a hash clash is to use some sort of chaining.
Another idea would be to put your important values is a different array, then simply search it first. If you don't find an acceptable answer there, then you can, shudder, search the larger array.
How many 2^24 haves do you have? Can you sort these values and count them by counting the number of consecutive values.
Since you already know which of the 2^24 values you need to keep (i.e. the 2^7 values you have determined to be important), we can simply just filter incoming data and assign a value, starting from 0 and up to 2^7-1, to these values as we encounter them. Of course, we would need some way of keeping track of which of the important values we have already seen and assigned a label in [0,2^7) already. For that we can use some sort of tree or hashtable based dictionary implementation (e.g. std::map in C++, HashMap or TreeMap in Java, or dict in Python).
The code might look something like this (I'm using a much smaller range of values):
import random
def make_mapping(data, important):
mapping=dict() # dictionary to hold the final mapping
next_index=0 # the next free label that can be assigned to an incoming value
for elem in data:
if elem in important: #check that the element is important
if elem not in mapping: # check that this element hasn't been assigned a label yet
mapping[elem]=next_index
next_index+=1 # this label is assigned, the next new important value will get the next label
return mapping
if __name__=='__main__':
important_values=[1,5,200000,6,24,33]
data=range(0,300000)
random.shuffle(data)
answer=make_mapping(data,important_values)
print answer
You can make the search much faster by using hash/tree based set data structure for the set of important values. That would make the entire procedure O(n*log(k)) (or O(n) if its is a hashtable) where n is the size of input and k is the set of important values.
Another idea is to represent the 24BitValue array in a bit map. A nice unsigned char can hold 8 bits, so one would need 2^16 array elements. Thats 65536. If the corresponding bit is set, then you know that that specific 24BitValue is present in the array, and needs to be checked.
One would need an iterator, to walk through the array and find the next set bit. Some machines actually provide a "find first bit" operation in their instruction set.
Good luck on your quest.
Let us know how things turn out.
Evil.

What are some common uses for bitarrays?

I've done an example using bitarrays from a newbie manual. I want to know what they can be used for and what some common data structures for them (assuming that "array" is fairly loose terminology.)
Thanks.
There are several listed in the Applications section of the Bit array Wikipedia article:
Because of their compactness, bit arrays have a number of applications in areas where space or efficiency is at a premium. Most commonly, they are used to represent a simple group of boolean flags or an ordered sequence of boolean values.
We mentioned above that bit arrays are used for priority queues, where the bit at index k is set if and only if k is in the queue; this data structure is used, for example, by the Linux kernel, and benefits strongly from a find-first-zero operation in hardware.
Bit arrays can be used for the allocation of memory pages, inodes, disk sectors, etc. In such cases, the term bitmap may be used. However, this term is frequently used to refer to raster images, which may use multiple bits per pixel.
Another application of bit arrays is the Bloom filter, a probabilistic set data structure that can store large sets in a small space in exchange for a small probability of error. It is also possible to build probabilistic hash tables based on bit arrays that accept either false positives or false negatives.
Bit arrays and the operations on them are also important for constructing succinct data structures, which use close to the minimum possible space. In this context, operations like finding the nth 1 bit or counting the number of 1 bits up to a certain position become important.
Bit arrays are also a useful abstraction for examining streams of compressed data, which often contain elements that occupy portions of bytes or are not byte-aligned. For example, the compressed Huffman coding representation of a single 8-bit character can be anywhere from 1 to 255 bits long.
In information retrieval, bit arrays are a good representation for the posting lists of very frequent terms. If we compute the gaps between adjacent values in a list of strictly increasing integers and encode them using unary coding, the result is a bit array with a 1 bit in the nth position if and only if n is in the list. The implied probability of a gap of n is 1/2n. This is also the special case of Golomb coding where the parameter M is 1; this parameter is only normally selected when -log(2-p)/log(1-p) ≤ 1, or roughly the term occurs in at least 38% of documents.

Best Algorithm for key/value pair where key is an int64 in Delphi, pre Delphi 2009?

I need an algorithm to store a key/value pair, where the key is an Int64. I'm currently using a sorted IntList (same as a TStringList, but stores int64s). This gives me O(log n) for search, Insert and delete operations. Since I don't ever need the items sorted, this is a little inefficient. I need some kind of hashtable for O(1) operations. The problem is that most implementations I can find assume the key is a string. Now I could obviously convert the Int64 key to a string, but this does seem wasteful. Any ideas?
I do not know the number of items before they are entered to the data structure.
I also should add that I have implemented the same component in .net, using Dictionary, and it's adding the items that is so much faster in the .net version. Once the data structure is setup, traversals and retrievals are not that bad in comparison, but it's insertion that is killing me.
Delphi 2009 and later has added Generics.
So starting Delphi 2009, you can implement your key/value pair in a similar manner as you do in .NET using a TDICTIONARY.
And TDICTIONARY in Delphi uses a hash table table and has O(1) operations.
You could build a hash-table, where the hash-value is a simple modulo of the Int64 you're adding to the hash.
Any good hash-table implementation will have the generation of the hash-index (by hashing the key) separate from the rest of the logic.
Some implementations are summed up here : Hashtable implementation for Delphi 5
You can compute a hash value directly from the int64 value, but for that you need to find a hash function which distributes the different int64 values evenly, so that you get little to no collisions. This of course depends on the values of those keys. If you don't know the number of items you most probably also don't know how these int64 values are distributed, so coming up with a good hash function will be hard to impossible.
Assuming your keys are not multiples of something (like addresses, which will be multiples of 4, 8, 16 and so on) you could speed things up a little by using a list of several of those IntList objects, and compute first an index into this array of lists. Using the mod operator and a prime number would be an easy way to calculate the list index. As always this is a trade-off between speed and memory consumption.
You might also google for a good implementation of sparse arrays. IIRC the EZDSL library by Julian Bucknall has one.
Some thoughts, not a full blown solution.
Unless there is definite proof that the search itself is the bottleneck (don't use your "feeling" to detect bottlenecks, use a code profiler) I would stick with the IntList... If the time spent in the actual search/insert/delete does not amount for at least 20% of the total processor time, don't even bother.
If you still want a hashtable, then ...
Do not convert to a string. The conversion would allocate a new string from the heap, which is much more costly than doing the search itself. Use the int64 modulo some cleverly chosen prime number as the hash key.
Hashtables will give you O(1) only if they are large enough. Otherwise, you will get a large amount of records that share the same hash key. Make it too short, you'll waste your time searching (linearly !) through the linked list. Make it too large, and you waste memory.
Keep in mind that hash tables require some form of linked list to keep all records sharing the same key. This linked list must be implemented either by adding a "next" pointer in the payload objects (which breaks encapsulation - the object does not have to know it is stored in a hash table) or allocating a small helper object. This allocation is likely to be much more costly than the O(log) of the sorted list.

Resources