I am solving a Project Euler problem that requires dynamic programming, and in this particular instance it is cleaner to use a hash table than a dynamic programming "solutions" table. Using R5RS, what functions are available to me to create my own hash table? How might I best go about constructing and using one? It's a hash table of integers.
Check out SRFI 69: Basic hash tables, which includes a reference implementation that is nearly pure R5RS.
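SRFI 69 is the Scheme route. For comparison, here is the same memoization pattern as a minimal C++ sketch using std::unordered_map from the standard library. The Collatz chain length is only a hypothetical stand-in for whatever quantity your problem computes, and the names memo and collatz_length are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Memo table: integer state -> integer result. It plays the role of the DP
// "solutions" table but only stores the states that are actually visited.
std::unordered_map<std::uint64_t, std::uint64_t> memo;

// Hypothetical example DP: number of terms in the Collatz sequence from n down to 1.
std::uint64_t collatz_length(std::uint64_t n) {
    if (n == 1) return 1;                       // base case
    auto it = memo.find(n);                     // average O(1) lookup
    if (it != memo.end()) return it->second;    // already solved
    std::uint64_t next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    std::uint64_t result = 1 + collatz_length(next);
    memo[n] = result;                           // store for later reuse
    return result;
}
```

For example, collatz_length(13) walks 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 and returns 10; a later call such as collatz_length(6) reuses the memoized value for 10 instead of recomputing it.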
Hash tables map keys to values by using a hashing function: the hash function computes the index (bucket) where the value for a given key is stored. But I just can't get my head around why we use hash tables in the first place. Why do you need a hash table? Are maps/dictionaries not good enough? Why not declare a dictionary ({'key1': 'value1'} in Python) and use it wherever a hash table is required? I have read a lot about it and still don't get it. Can you help me understand this?
Why do you need a hash table? Is a map/dictionary not good enough?
This is like asking why you need an automotive engine, isn't a car good enough? An engine is how a car works; you just don't see the engine when you are driving the car. But if you are learning to become an automotive engineer, then you should learn how engines work and how to design, build and maintain them.
Likewise, a hash table is how a dictionary works; you just don't see the hash table when you are writing code that uses a dictionary. But if you are learning to become a computer scientist, then you should learn how hash tables and other data structures work, and how to design, build and maintain them.
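To make the "engine" visible, here is a toy sketch of what a dictionary does under the hood: hash the key, reduce the hash to a bucket index, then search that bucket's chain. It assumes string keys and int values, uses separate chaining, and has a fixed bucket count with no resizing; real dictionary implementations also grow the table as it fills. The class name ToyDict is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy dictionary from std::string keys to int values, built on a hash table
// with separate chaining. Fixed number of buckets; no resizing.
class ToyDict {
public:
    explicit ToyDict(std::size_t buckets = 16) : table_(buckets) {
        assert(buckets > 0);  // index() divides by the bucket count
    }

    void put(const std::string& key, int value) {
        auto& chain = table_[index(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }  // overwrite existing key
        }
        chain.emplace_back(key, value);                           // new key: append to chain
    }

    std::optional<int> get(const std::string& key) const {
        for (const auto& kv : table_[index(key)]) {
            if (kv.first == key) return kv.second;
        }
        return std::nullopt;                                      // key not present
    }

private:
    // The hash function turns the key into a number; modulo maps it to a bucket.
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % table_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> table_;
};
```

Lookup only scans one short chain instead of every stored key, which is why a dictionary backed by a hash table answers lookups in average constant time.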
I am researching hash tables and hash maps, and everything I have read or watched gives a very vague description of the differences. From experimenting with both in NetBeans, they seem to have the same functions and do the same things. What are the fundamental differences between these two data structures?
There are no differences; the same thing is just called differently in different programming languages, so what people call it depends on their background and the programming language they use. For example, C++ calls it std::unordered_map, Java calls it HashMap, and Python calls it dict.
There could also be one difference implied by the naming: a HashTable stores only hashed keys, not values, whereas a HashMap retrieves a value by its hashed key. Internally, both use the same algorithm and can be considered the same data structure.
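In C++ terms, that key-only versus key-to-value distinction corresponds roughly to std::unordered_set and std::unordered_map, both of which are hash-based containers. A small sketch; the helper names seen_before and count_word are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Key-only hashed container: answers "have I stored this key?"
bool seen_before(std::unordered_set<std::string>& seen, const std::string& word) {
    // insert() returns {iterator, inserted}; inserted == false means the key was already there
    return !seen.insert(word).second;
}

// Key-to-value hashed container: retrieves (and updates) a value by key.
int count_word(std::unordered_map<std::string, int>& counts, const std::string& word) {
    return ++counts[word];  // operator[] value-initializes a missing entry to 0
}
```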
HashTable sounds to me like a concrete data structure, although it has numerous variants depending on what happens when a collision occurs, when the table fills up, and when it empties.
Map sounds like an abstract data structure, something defined by the available operations (Dictionary would be another potential name for the same data structure, but I'd not be surprised if some nomenclature defined both with a nuance somewhere).
HashMap sounds like an implementation of the Map abstract data structure using a HashTable concrete data structure.
Again, I'd not be surprised if a language or a library provided both, with a nuance somewhere (HashMap, for instance, could provide only the operations defined for a Map, while HashTable provides everything that makes sense for a hash table).
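The abstract-versus-concrete distinction can be sketched as an interface plus two implementations. The names IntMap, HashIntMap and TreeIntMap are hypothetical: IntMap is defined only by its operations, HashIntMap backs it with a hash table (std::unordered_map), and TreeIntMap backs the same interface with an ordered tree (std::map, typically a red-black tree).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <unordered_map>

// Abstract data type: defined only by the operations it offers.
class IntMap {
public:
    virtual ~IntMap() = default;
    virtual void put(int key, int value) = 0;
    virtual std::optional<int> get(int key) const = 0;
};

// "HashMap": the Map ADT implemented with a hash table.
class HashIntMap : public IntMap {
public:
    void put(int key, int value) override { table_[key] = value; }
    std::optional<int> get(int key) const override {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<int, int> table_;
};

// The same ADT implemented with an ordered tree instead.
class TreeIntMap : public IntMap {
public:
    void put(int key, int value) override { tree_[key] = value; }
    std::optional<int> get(int key) const override {
        auto it = tree_.find(key);
        if (it == tree_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<int, int> tree_;
};
```

Code written against IntMap works with either implementation; only performance characteristics (average O(1) hashing versus O(log n) ordered lookups) differ.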
I have read many articles on Julia and its performance, but nowhere can I find a clue about why the Julia team decided to use column-major order for matrices. Is it because their way of operating on matrices fits column-major order, or something else?
Thanks in advance.
"Multidimensional arrays in Julia are stored in column-major order. This means that arrays are stacked one column at a time. This can be verified using the vec function or the syntax [:] ..."
"This convention for ordering arrays is common in many languages like Fortran, Matlab, and R (to name a few). The alternative to column-major ordering is row-major ordering, which is the convention adopted by C and Python (numpy) among other languages."
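The two conventions differ only in how an element's (row, column) position maps to an offset in flat memory. A minimal C++ sketch using 0-based indices (Julia itself is 1-based) and assuming a rectangular matrix; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (i, j) in an nrows x ncols matrix, 0-based indices.
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t nrows) {
    return i + j * nrows;  // columns are stored one after another
}

std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t ncols) {
    return i * ncols + j;  // rows are stored one after another
}

// Flatten a matrix given as a list of rows into column-major order,
// the same ordering that Julia's vec(A) or A[:] produces.
std::vector<int> flatten_col_major(const std::vector<std::vector<int>>& rows) {
    std::size_t nrows = rows.size();
    std::size_t ncols = rows.empty() ? 0 : rows[0].size();
    std::vector<int> out(nrows * ncols);
    for (std::size_t i = 0; i < nrows; ++i)
        for (std::size_t j = 0; j < ncols; ++j)
            out[col_major_offset(i, j, nrows)] = rows[i][j];
    return out;
}
```

For the matrix with rows {1, 2, 3} and {4, 5, 6}, column-major flattening gives 1, 4, 2, 5, 3, 6, matching what vec([1 2 3; 4 5 6]) returns in Julia.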
For examples and discussion of performance differences, see the Performance Tips section of Julia's Manual.
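The practical consequence of column-major storage is loop order: traversing along columns (first index varying fastest) touches consecutive memory addresses, which tends to be cache-friendly. A sketch in C++ over a flat column-major buffer; both functions compute the same sum, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum an nrows x ncols matrix stored column-major in a flat vector.
// Inner loop over rows: consecutive iterations read adjacent elements.
double sum_column_order(const std::vector<double>& a, std::size_t nrows, std::size_t ncols) {
    assert(a.size() == nrows * ncols);
    double s = 0.0;
    for (std::size_t j = 0; j < ncols; ++j)
        for (std::size_t i = 0; i < nrows; ++i)
            s += a[i + j * nrows];
    return s;
}

// Same result, but the inner loop jumps nrows elements between reads (strided access).
double sum_row_order(const std::vector<double>& a, std::size_t nrows, std::size_t ncols) {
    assert(a.size() == nrows * ncols);
    double s = 0.0;
    for (std::size_t i = 0; i < nrows; ++i)
        for (std::size_t j = 0; j < ncols; ++j)
            s += a[i + j * nrows];
    return s;
}
```

On large matrices the strided version usually runs slower, though the actual gap depends on the hardware and matrix size, so measure rather than assume.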
I need some advice on implementing a good hash table structure. I've been researching this, but I would like some outside opinions. Thanks!
Whatever hash function you choose, your hash table has to fulfill the following requirements:
Provide a uniform distribution of hash values: a non-uniform distribution increases the number of collisions between mapped values.
Use a good scheme for collision resolution: collisions are almost impossible to avoid, so you will have to implement a strategy such as "separate chaining" or "open addressing". A good starting point is http://task3.cc/44/hash-maps-with-linear-probing-and-separate-chaining/.
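Both requirements can be seen in a small sketch of open addressing with linear probing for integer keys. The bit-mixing function reuses the constants of the SplitMix64 finalizer to spread nearby keys across slots (one possible choice, not the only one), and on a collision the table simply tries the next slot. Simplifications, all assumptions of this sketch: fixed capacity, no deletion, and the class name LinearProbeMap is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Mix the bits of an integer key so that nearby keys land in far-apart slots
// (constants from the SplitMix64 finalizer).
std::uint64_t mix(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Open addressing with linear probing: on a collision, try the next slot.
class LinearProbeMap {
public:
    explicit LinearProbeMap(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Returns false if the table is full and the key is not already present.
    bool put(std::uint64_t key, std::uint64_t value) {
        std::size_t i = mix(key) % slots_.size();
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s = {true, key, value};
                return true;
            }
            i = (i + 1) % slots_.size();  // linear probe to the next slot
        }
        return false;
    }

    std::optional<std::uint64_t> get(std::uint64_t key) const {
        std::size_t i = mix(key) % slots_.size();
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const Slot& s = slots_[i];
            if (!s.used) return std::nullopt;  // an empty slot ends the probe sequence
            if (s.key == key) return s.value;
            i = (i + 1) % slots_.size();
        }
        return std::nullopt;
    }

private:
    struct Slot {
        bool used = false;
        std::uint64_t key = 0;
        std::uint64_t value = 0;
    };
    std::vector<Slot> slots_;
};
```

Because lookups stop at the first empty slot, supporting deletion would need tombstone markers or backward-shift deletion, which this sketch leaves out.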
I am using a dynamic programming approach to solve a problem. Here is a brief overview of the approach:
Each generated value is identified by 25 unique keys.
I use boost::hash_combine to compute the hash value (seed) from these 25 keys.
I store the values in a hash table declared as:
boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;
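A sketch of what such a key and hash functor might look like, assuming (hypothetically) that Key_Object holds 25 int keys and that Data_Object is a plain double. To avoid a Boost dependency it writes out the classic boost::hash_combine mixing step by hand and uses std::unordered_map, whose interface matches boost::unordered_map for these operations; newer Boost releases may use a different mixing step internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Assumed layout of Key_Object: 25 integer keys.
struct Key_Object {
    std::array<int, 25> keys{};
    bool operator==(const Key_Object& other) const { return keys == other.keys; }
};

// Same mixing step as the classic boost::hash_combine, written out by hand.
inline void hash_combine(std::size_t& seed, int v) {
    seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Folds all 25 keys into one hash value, starting from seed 0.
struct HashFunction {
    std::size_t operator()(const Key_Object& k) const {
        std::size_t seed = 0;
        for (int v : k.keys) hash_combine(seed, v);
        return seed;
    }
};

using Data_Object = double;  // placeholder value type for this sketch
using StateTable = std::unordered_map<Key_Object, Data_Object, HashFunction>;
```

Note that every insert or lookup hashes all 25 keys and, on a bucket hit, compares all 25 keys for equality, so both the hash functor and operator== sit on the hot path.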
I profiled my algorithm and found that nearly 95% of the run time is spent retrieving/inserting data into the hash table.
These are the statistics of my hash table:
hashState.size() 1880
hashState.load_factor() 0.610588
hashState.bucket_count() 3079
hashState.max_size() 805306456
hashState.max_load_factor() 1
hashState.max_bucket_count() 805306457
I have the following two questions:
Is there anything I can do to improve the performance of the hash table's insert/retrieve operations?
The C++ STL has hash_multimap, which would also suit my requirements. How does Boost's unordered_map compare with hash_multimap in terms of insert/retrieve performance?
If your hash function is not the culprit, the best you can probably do is use a different map implementation. Since your keys are quite large, using unordered_map from the Boost.Intrusive library might be the best option. Alternatively, you could try closed hashing (open addressing), such as Google SparseHash or MCT, though profiling is certainly needed, because closed hashing is recommended when elements are small enough. (SparseHash is more established and well tested, but MCT doesn't need SparseHash's set_empty_key()/set_deleted_key() calls.)
EDIT:
I just noticed there is no intrusive map in the Boost library I mentioned, only set and multiset. Still, you can try the two closed hashing libraries.
EDIT 2:
hash_map (like hash_multimap) is not part of standard C++; it is a vendor extension that originated in the SGI STL, and it is not portable across compilers.
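Since C++11, the standard library provides std::unordered_multimap as the portable hashed container that allows several values per key, which is what hash_multimap offered. A minimal sketch of retrieving all values for one key; the helper name values_for is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Collect every value stored under `key` in an unordered_multimap.
std::vector<int> values_for(const std::unordered_multimap<std::string, int>& mm,
                            const std::string& key) {
    std::vector<int> out;
    auto range = mm.equal_range(key);  // [first, second) spans all entries with this key
    for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
    return out;
}
```

The order of values within equal_range is not guaranteed to match insertion order, so code should not rely on it.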
Are you sure that the hash function you are using is not the bottleneck?
What percentage of the time does the hash function take?
Can you run the same test, but replace the inserts/retrievals with plain calls to the hash function?
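A minimal sketch of that experiment using std::chrono::steady_clock, assuming a C++11 or later compiler. The function names are hypothetical; in the asker's code Key would be Key_Object and Hash would be HashFunction, while the usage below just uses int keys with std::hash.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Seconds spent only hashing the keys. Results are accumulated into `sink`
// so the compiler cannot discard the hash calls as dead code.
template <class Key, class Hash = std::hash<Key>>
double time_hash_only(const std::vector<Key>& keys, std::size_t& sink) {
    Hash h;
    auto start = std::chrono::steady_clock::now();
    for (const auto& k : keys) sink += h(k);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Seconds spent inserting the same keys into an unordered_map with the same hash.
template <class Key, class Hash = std::hash<Key>>
double time_inserts(const std::vector<Key>& keys, std::unordered_map<Key, int, Hash>& table) {
    auto start = std::chrono::steady_clock::now();
    for (const auto& k : keys) table.emplace(k, 0);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

If the hash-only time is a large fraction of the insert time, the hash function is the likely bottleneck; if it is small, the cost is in the table itself (bucket traversal, key comparisons, allocation).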