Millions of searches in unordered_map, runtime hogger - c++11

I have around 5000 strings (lengths mostly in the range of 50-80). Currently I create an unordered map, insert these keys, and during execution I look them up (using the map's find function) 10-100 million times. I did some profiling around this search, and it turns out to be the runtime hogger.
I searched for better and faster search options, but somehow did not find anything substantial.
Does anyone have an idea how to make it faster? I'm open to a custom-made container as well. I did try std::map, but it did not help. Please share a link if you have one.
One more point to add: I also modify the values for some keys at runtime, but not that often. Mostly it's search.

Having considered a question similar to yours, C++ ~ 1M look-ups in unordered_map with string key works much slower than .NET code, I would guess you have run into an issue caused by the hash function used by std::unordered_map. For strings of length 50-80 that could lead to a lot of collisions, and this would significantly degrade look-up performance.
I would suggest using a custom hash function with std::unordered_map. Or you could give A fast, memory efficient hash map for C++ a try.
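For illustration, here is a minimal sketch of plugging a custom hash into std::unordered_map. FNV-1a is used purely as an example of an alternative string hash, and the key and value contents are hypothetical; reserving capacity for the known key count also avoids rehashing during the fill phase.

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Example custom hash: 64-bit FNV-1a over the string's bytes.
    struct Fnv1aHash {
        std::size_t operator()(const std::string& s) const noexcept {
            std::uint64_t h = 14695981039346656037ULL; // FNV-1a offset basis
            for (unsigned char c : s) {
                h ^= c;
                h *= 1099511628211ULL;                 // FNV-1a prime
            }
            return static_cast<std::size_t>(h);
        }
    };

    int main() {
        std::unordered_map<std::string, int, Fnv1aHash> m;
        m.reserve(5000);                        // ~5000 keys are known up front
        m.emplace("some-50-to-80-character-key", 42);
        auto it = m.find("some-50-to-80-character-key"); // the hot path
        std::cout << (it != m.end() ? it->second : -1) << '\n';
    }

Whether this beats your standard library's default std::hash depends on the implementation, so profile both against your real keys.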

Related

Efficient Data Structures in Maple

I'm working with a large amount of data in Maple and I need to know the most efficient way to store it. I started with lists, but I quickly learned how inefficient those are, so I have since replaced them. Now I'm using a mixture of Arrays (for structures with a fixed length) and tables (for structures with variable length), but my code actually runs significantly slower than it did when I was only using lists.
So here are my questions:
What is the most efficient data structure to use in Maple for a static-length set of data? And for a variable-length set?
Are there any "gotchas" I need to be aware of when using these structures as parameters in a recursive proc? If using Arrays or tables, does each one need to be copied for each iteration to avoid clobbering data?
I think I can wrap this one up now. I made a few performance improvements, mostly just small tweaks that only helped a bit, but I did manage a big improvement by removing as many instances of the copy command as I could (I used it on arrays and tables). It turns out this is what was causing my array/table implementation to be slower than my list-only implementation. But the code still didn't run as fast as I needed, so I re-wrote it in C#. That's probably not the best solution for "how to improve Maple efficiency", but it sure does run a lot faster now.

What are appropriate applications for a linked (doubly as well) list?

I have a question about fundamentals in data structures.
I understand that an array's access time is faster than a linked list's: O(1) for an array vs. O(N) for a linked list.
But a linked list beats an array at removing an element, since no shifting is needed: O(N) for an array vs. O(1) for a linked list.
So my understanding is that if the majority of operations on the data are deletions, then using a linked list is preferable.
But if the use case is:
delete elements but not too frequently
access ALL elements
Is there a clear winner? In the general case I understand that the downside of using a list is that I access each node, which could be on a separate page, while an array has better locality.
But is this a theoretical concern or an actual one that I should have?
And is the mixed type, i.e. building a linked list out of an array (using extra fields as links), a good idea? (See the sketch below.)
Also, does my question depend on the language? I assume that shifting elements in an array has the same cost in all languages (at least asymptotically).
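For reference, the "mixed type" asked about above usually means an array-backed list whose links are indices rather than pointers, which keeps O(1) unlinking while improving locality. A minimal, hypothetical sketch (a real version would also recycle freed slots via a free list):

    #include <iostream>
    #include <vector>

    // Singly linked list whose nodes live in one contiguous array and whose
    // links are indices into that array; -1 marks the end of the chain.
    struct IndexList {
        struct Node { int value; int next; };
        std::vector<Node> nodes;
        int head = -1;

        void push_front(int v) {
            nodes.push_back({v, head});
            head = static_cast<int>(nodes.size()) - 1;
        }

        // Unlink the node after 'prev' (or the head when prev == -1) in O(1).
        // The vacated slot is simply abandoned here for brevity.
        void erase_after(int prev) {
            int victim = (prev == -1) ? head : nodes[prev].next;
            if (victim == -1) return;
            int next = nodes[victim].next;
            if (prev == -1) head = next; else nodes[prev].next = next;
        }
    };

    int main() {
        IndexList l;
        for (int i = 0; i < 5; ++i) l.push_front(i);   // list is 4 3 2 1 0
        l.erase_after(-1);                             // drop the head (4)
        for (int i = l.head; i != -1; i = l.nodes[i].next)
            std::cout << l.nodes[i].value << ' ';      // prints 3 2 1 0
        std::cout << '\n';
    }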
Singly-linked lists are very useful and can be better performance-wise relative to arrays if you are doing a lot of insertions/deletions, as opposed to pure referencing.
I haven't seen a good use for doubly-linked lists for decades.
I suppose there are some.
In terms of performance, never make decisions without understanding relative performance of your particular situation.
It's fairly common to see people asking about things that, comparatively speaking, are like getting a haircut to lose weight.
Before writing an app, I first ask if it should be compute-bound or IO-bound.
If IO-bound I try to make sure it actually is, by avoiding inefficiencies in IO, and keeping the processing straightforward.
If it should be compute-bound then I look at what its inner loop is likely to be, and try to make that swift.
Regardless, no matter how much I try, there will be (sometimes big) opportunities to make it go faster, and to find them I use this technique.
Whatever you do, don't just try to think it out or go back to your class notes.
Your problem is different from anyone else's, and so is the solution.
The problem with a list is not just the fragmentation, but mostly the data dependency. If you access every Nth element of an array you don't have locality, but the accesses can still go to memory in parallel, since you know the addresses up front. In a list each address depends on the data just retrieved, so traversing a list effectively serializes your memory accesses, making it much slower in practice. This is of course orthogonal to asymptotic complexity, and would hurt you regardless of size.
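To make the data-dependency point concrete, here is a rough sketch (not a rigorous benchmark; the size and timing mechanics are illustrative only) comparing a streaming traversal of a std::vector with a pointer-chasing traversal of a std::list:

    #include <chrono>
    #include <iostream>
    #include <list>
    #include <numeric>
    #include <vector>

    int main() {
        const std::size_t n = 1000000;          // arbitrary illustrative size
        std::vector<int> v(n, 1);
        std::list<int> l(v.begin(), v.end());   // same values, one node each

        typedef std::chrono::steady_clock clock_type;

        // Streaming sum: addresses are known up front, loads can overlap.
        clock_type::time_point t0 = clock_type::now();
        long long sum_vec = std::accumulate(v.begin(), v.end(), 0LL);
        clock_type::time_point t1 = clock_type::now();

        // Pointer-chasing sum: each load waits on the previous node's 'next'.
        long long sum_list = std::accumulate(l.begin(), l.end(), 0LL);
        clock_type::time_point t2 = clock_type::now();

        std::chrono::duration<double, std::milli> dv = t1 - t0, dl = t2 - t1;
        std::cout << "vector: " << dv.count() << " ms, list: " << dl.count()
                  << " ms (sums " << sum_vec << ", " << sum_list << ")\n";
    }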

What's the performance difference between DBI's fetchall_hashref and fetchall_arrayref?

I am writing some Perl scripts to manipulate large amounts of data (in total about 42 million rows, though it won't be done in one hit) in two PostgreSQL databases.
For some of my queries it makes good sense to use fetchall_hashref because I have synthetic keys. However, in other instances I'm going to have to use an array of three columns as the unique key.
This has got me wondering about the performance differences between fetchall_arrayref and fetchall_hashref. I know that in both cases everything is going into memory, so selecting several GB of data probably isn't a good idea, but other than that there appears to be very little guidance in the documentation when it comes to performance.
My googling has been unsuccessful so if anyone can point me in the direction of some general performance studies I'd be grateful.
(I know I could benchmark this myself but unfortunately for dev purposes I don't have access to a machine which has identical hardware to production which is why I'm looking for general guidelines or even best practices).
Most of the choices between fetch methods depend on what format you want the data to end up in and how much of the work for that you want DBI to do for you.
My recollection is that iterating with fetchrow_arrayref and using bind_columns is the fastest (least DBI overhead) way to read through returned data.
First question is whether you really need to use a fetchall in the first place. If you don't need all 42 million rows in memory at once, then don't read them all in at once! bind_columns and fetchrow_arrayref are generally the way to go whenever possible, as ysth already pointed out.
Assuming that fetchall really is needed, my gut intuition is that fetchall_arrayref will be marginally faster, since an array is a simpler data structure and doesn't need to compute hashes of the inserted keys, but the savings in time would be dwarfed by database read times, so it's unlikely to be significant.
Memory requirements are another matter entirely, though. The structure returned by fetchall_hashref is a hash of id => row, with each row being represented as a hash of field name => field value. If you get 42 million rows, that means your list of field names is repeated in each of 42 million sets of hash keys... That's going to require a good deal more memory to store than the array of arrays of arrays returned by fetchall_arrayref. (Unless DBI is doing some magic with tie to optimize the fetchall_hashref structure, I suppose.)

Using boost unordered map

Guys, I am using a dynamic programming approach to solve a problem. Here is a brief overview of the approach:
Each value generated is identified using 25 unique keys.
I use boost::hash_combine to generate the seed for the hash table from these 25 keys.
I store the values in a hash table declared as
boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;
I did a time profiling on my algorithm and found that nearly 95% of the run time is spent towards retrieving/inserting data into the hash table.
These were the details of my hash table
hashState.size() 1880
hashState.load_factor() 0.610588
hashState.bucket_count() 3079
hashState.max_size() 805306456
hashState.max_load_factor() 1
hashState.max_bucket_count() 805306457
I have the following two questions:
Is there anything I can do to improve the performance of the hash table's insert/retrieve operations?
C++ STL has hash_multimap, which would also suit my requirement. How does Boost's unordered_map compare with hash_multimap in terms of insert/retrieve performance?
If your hash function is not the culprit, the best you can do is probably to use a different map implementation. Since your keys are quite large, using an unordered map from the Boost.Intrusive library might be the best option. Alternatively, you could try closed hashing: Google SparseHash or MCT, though profiling is certainly needed, because closed hashing is recommended only when elements are small enough. (SparseHash is more established and well tested, but MCT doesn't need those set_empty()/set_deleted() methods.)
EDIT:
I just noticed there is no intrusive map in the Boost library I mentioned, only set and multiset. Still, you can try the two closed hashing libraries.
EDIT 2:
STL hash_map is not standard; it is probably a vendor extension and not portable across compilers.
Are you sure that the hash function you are using is not the bottleneck?
What percentage of the time does the hash function take?
Can you run the same test, replacing the inserts/retrievals with a simple call to the hash function?
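As a rough illustration of that test, here is a minimal sketch with a hypothetical Key type and field layout: a hash_combine-based functor plus a loop that calls only the hash, so its cost can be timed separately from the full map operations:

    #include <boost/functional/hash.hpp>
    #include <array>
    #include <cstddef>
    #include <iostream>

    // Hypothetical key with 25 integer fields, hashed via boost::hash_combine.
    struct Key {
        std::array<int, 25> fields;
    };

    struct HashFunction {
        std::size_t operator()(const Key& k) const {
            std::size_t seed = 0;
            for (std::size_t i = 0; i < k.fields.size(); ++i)
                boost::hash_combine(seed, k.fields[i]); // fold each field in
            return seed;
        }
    };

    int main() {
        HashFunction h;
        Key k = Key();                // zero-initialize all 25 fields
        std::size_t sink = 0;
        // Hash-only loop: compare its time against the time of the full
        // insert/find calls to see where the cost really lies.
        for (int i = 0; i < 1000000; ++i) {
            k.fields[0] = i;
            sink ^= h(k);
        }
        std::cout << sink << '\n';    // keep the loop from being optimized away
    }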

Is it worthwhile to use a bit vector/array rather than a simple array of bools?

When I want an array of flags it has typically pained me to use an entire byte (or word) to store each one, as would be the result if I made an array of bools or some other numeric type that could be set to 0 or 1. But now I wonder whether using a structure that is more space-efficient is worth it given the (albeit hopefully very slight) additional overhead of shifting and bit testing.
In my company we use Rogue Wave tools (though hopefully not for much longer) and it's their RWBitVec that I've used for this purpose up until now.
It's mostly about saving memory. If your array of bools is large enough that an 8x improvement in storage space is meaningful, then by all means use a bit array.
Note that memory access is pretty expensive compared to the shift/and, so the bit-array approach is slightly faster than the array-of-chars one. Basically it comes down to memory versus programmer time. Remember that premature optimization is a waste of time: I'd use whichever approach is easiest to develop, and then refactor only after it shows up as a primary performance bottleneck.
Don't use vector<bool>; it's not really a Container:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
Use std::bitset (for fixed size bitsets) and boost::dynamic_bitset (for resizeable ones) where appropriate. They aren't Containers either, but they don't look as if they ought to be, so are less likely to cause confusion.
Whether the trade-off is worth it depends, obviously, on how big the arrays are in your program. I think you're right that the overhead of bit access is usually negligible, but if the memory overhead is negligible too then you've nothing to go on there either.
bitsets have the advantage that they do exactly what they say on the tin - none of this "declare an array of chars/ints, but the only legal values are 0 and 1" nonsense. Your code will read about the same as if you'd used an array.
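For example, a minimal usage sketch of the two types mentioned above (the sizes are arbitrary):

    #include <bitset>
    #include <iostream>
    #include <boost/dynamic_bitset.hpp>

    int main() {
        std::bitset<1024> flags;           // size fixed at compile time
        flags.set(42);
        flags[7] = true;                   // reads much like array indexing

        boost::dynamic_bitset<> dyn(1024); // size chosen at runtime, resizable
        dyn.set(42);

        std::cout << flags.test(42) << ' ' << flags[7] << ' '
                  << dyn.test(42) << '\n';
    }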
I wrote some code once to unpack a bitmap image line into separate bytes per pixel, then pack it back again after processing. For the code I was benchmarking, it was actually faster to do it that way than to work at the bit level.
I've used a bit array for indexing a HUGE tree. The algorithm was:
Check the bit array to see if the entry exists
if the entry doesn't exist
    return null
else
    do a binary search in the tree
    return the value
The advantage is that the tree was huge enough that searching for a non-existent entry would cause several cache misses before completing. Thus how long the algorithm took depended on whether the value existed.
However, adding that initial bit-array check reduced cache misses and avoided searching the tree at all when the answer wasn't there. By adding this extra step the algorithm became much more robust (the actual running time on a real computer became nearly linear, although Big-O would say differently), and overall performance increased by an order of magnitude.
Like they say, sometimes taking the hardware into consideration is more important than the "ideal" mathematical algorithm.
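A minimal sketch of that pre-filter idea, with a std::map standing in for the huge tree and a hypothetical key range:

    #include <bitset>
    #include <iostream>
    #include <map>

    // Hypothetical setup: keys fall in a known range; 'present' holds one
    // bit per possible key, and 'tree' stands in for the huge tree.
    const std::size_t kKeyRange = 1 << 20;
    std::bitset<kKeyRange> present;
    std::map<unsigned, int> tree;

    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* lookup(unsigned key) {
        if (!present.test(key))   // one cheap probe, typically one cache miss
            return nullptr;       // skip the multi-miss tree walk entirely
        auto it = tree.find(key); // only reached when the key is known to exist
        return it != tree.end() ? &it->second : nullptr;
    }

    int main() {
        tree[12345] = 7;
        present.set(12345);
        const int* hit = lookup(12345);
        std::cout << (hit ? *hit : -1) << '\n';            // prints 7
        std::cout << (lookup(999) != nullptr) << '\n';     // prints 0: no tree walk
    }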
Modern computers have barrel shifters, so a shift by any number of bits up to 31 takes a few cycles (fewer than many other instructions). Compilers take advantage of this, and bit operations are not only space-efficient but in most cases also time-efficient.
But it really depends on how you're using and testing the bits: there are some inefficient methods that would make using a whole integer faster.
-Adam
Is it worth it? Only if you know that you have a problem with memory usage.
But unless you're either:
Working on an embedded processor with very limited resources, or
Storing an astronomical number of bools
then the answer is no. You'll have to work somewhat harder to achieve the same level of readability in your source by using a bitmap than you will using bools, and unless you're operating under either of the previous two conditions you'll likely find that it doesn't make any noticeable difference to your memory footprint.
