Use cases of std::multimap - c++11

I don't quite get the purpose of this data structure. What's the difference between std::multimap<K, V> and std::map<K, std::vector<V>>? The same goes for std::multiset: it could just be std::map<K, int>, where the int counts the number of occurrences of K. Am I missing something about the uses of these structures?

A counter-example seems to be in order.
Consider a PhoneEntry in an AddressList grouped by name.

struct AddressListCompare {
    bool operator()(const PhoneEntry& p1, const PhoneEntry& p2) const {
        return p1.name < p2.name;
    }
};

std::multiset<PhoneEntry, AddressListCompare> addressList;
addressList.insert( PhoneEntry("Cpt.G", "123-456", "Cellular") );
addressList.insert( PhoneEntry("Cpt.G", "234-567", "Work") );
// Getting the entries
addressList.equal_range( PhoneEntry("Cpt.G") ); // All numbers for this name
This would not be feasible with a set+count. Your object+count approach seems to be faster if this behavior is not required; for instance, the multiset::count() member is specified as "Complexity: logarithmic in size + linear in count."
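To actually use the result, walk the iterator pair that equal_range returns (a sketch; PhoneEntry's single-argument constructor and a number member are assumed from the example above):

auto range = addressList.equal_range(PhoneEntry("Cpt.G"));
for (auto it = range.first; it != range.second; ++it)
    std::cout << it->number << '\n'; // print each of Cpt.G's numbers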

You could make the substitutions that you suggest and extract similar behavior. But the interfaces would be very different from those of the regular standard containers. A major design theme of these containers is that they share as much interface as possible, making them as interchangeable as possible, so that the appropriate container can be chosen without having to change the code that uses it.
For instance, std::map<K, std::vector<V>> would have iterators that dereference to std::pair<const K, std::vector<V>> instead of std::pair<const K, V>. std::map<K, std::vector<V>>::count() wouldn't return the correct result, failing to account for the duplicates in the vector. Of course you could change your code to do the extra steps needed to correct for this, but now you are interfacing with the container in a much different way. You can't later drop in unordered_map or some other map implementation to see if it performs better.
In a broader sense, you are breaking the container abstraction by handling container implementation details in your code rather than having a container that handles its own business.
It's entirely possible that your compiler's implementation of std::multimap is really just a wrapper around std::map<K, std::vector<V>>. Or it might not be. It could be more efficient and friendly to object pool allocation (which vectors are not).
Using std::map<K, int> instead of std::multiset is the same case. count() would not return the expected value, iterators would not iterate over the duplicates, and iterators would dereference to std::pair<const K, int> instead of directly to K.
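To see the interface difference concretely, here is a minimal sketch (illustrative values only):

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::multimap<std::string, int> mm;
    mm.insert({"a", 1});
    mm.insert({"a", 2});
    std::cout << mm.count("a") << '\n'; // 2 -- duplicates counted directly

    std::map<std::string, std::vector<int>> mv;
    mv["a"].push_back(1);
    mv["a"].push_back(2);
    std::cout << mv.count("a") << '\n'; // 1 -- counts keys, not the duplicates inside
}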

A multimap or multiset allows you to have elements with duplicate keys.
That is, a set is an unordered collection of unique elements ({A,B,C} == {B,C,A}), whereas a multiset may hold the same element more than once.
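A tiny illustration of the distinction:

#include <iostream>
#include <set>

int main() {
    std::set<int> s{1, 1, 2};       // the duplicate 1 is silently dropped
    std::multiset<int> ms{1, 1, 2}; // both 1s are kept
    std::cout << s.count(1) << ' ' << ms.count(1) << '\n'; // prints "1 2"
}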

Related

Using unordered_map with key only to store pointers (discard the value)

I'm implementing an algorithm that checks nodes in a mesh for a certain value. To store information about which nodes I have already checked, I'd like to use an unordered_map with a pointer to the node as the key. I can then simply use umap.find(pointer) to see whether a node was already checked and skip it. This way I can accomplish the whole pass in O(n) time.
However, I don't need to actually store a value in the map; the key itself is enough information. Is std::unordered_map even the right solution then? If so, what should I put in the "value" field to maximize performance? I have a 32-bit embedded system, so I thought of just putting uint32_t or uint_fast32_t there.
tl;dr:
Is std::unordered_map the right tool to store keys without values?
Will the native hash function work well for pointers? Or would you suggest a different hashing algorithm?
What do I put as "value" for the map if using std::unordered_map to optimize for performance?
Is std::unordered_map the right tool to store keys without values?
I would use a std::unordered_set in these situations.
Will the native hash function work well for pointers?
Yes. It is most likely just a cast from pointer to std::size_t.
What do I put as "value" for the map if using std::unordered_map to optimize for performance?
If you use a std::unordered_set instead, there is no value, only the pointers.
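A minimal sketch of that approach (Node stands in for the question's mesh node type):

#include <unordered_set>

struct Node { /* mesh node data */ };

void process(Node* node, std::unordered_set<const Node*>& visited) {
    if (!visited.insert(node).second)
        return; // already checked: insert() reports the element was present
    // ... check the node's value here ...
}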
Is std::unordered_map the right tool to store keys without values?
No - std::unordered_set is the one to use when you don't have distinct keys and values.
Will the native hash function work well for pointers? Or would you suggest a different hashing algorithm?
The "native" compiler-supplied hash function probably casts the pointer value to size_t - a kind of identity hash. That may or may not work well depending on the compromises your Standard Library has chosen. GCC and clang use prime numbers of buckets in the hash table, so it will work fine. Visual C++ (and many non-Standard hash table implementations) use powers of two (i.e. 128, 256, 512...). Powers of two are used because it's very fast to map them on to buckets - just AND with a bitwise mask (127, 255, 511) to retain however-many less-significant bits you need. The problem with doing that with pointers is that often the pointed-to objects have some alignment, so they may all be multiples of e.g. 4 or 8. A multiple of 8 always has the three least significant bits set to 0: those bits don't contribute to the randomised placement of the value in a bucket. Instead, only every 8th bucket will receive any share of the elements being hashed. If you have an implementation like this, then you're probably better off using a better hash function. At the least, you could say bit-shift the pointer values right by enough to remove the known zeros.
What do I put as "value" for the map if using std::unordered_map to optimize for performance?
Again, you should use a std::unordered_set, so you don't have to worry about a value.

List storing pointers or "plain object"

I am designing a class which tracks user manipulations in a piece of software in order to restore previous application states (i.e. Ctrl+Z/Ctrl+Y). I simply wanted to clarify something about performance.
I am using the std::list container from the STL. This list is not meant to contain really huge objects, but a significant number of them. Should I use pointers or not?
For instance, here are the kinds of objects that will be stored:

struct ImagesState
{
    cv::Mat first;
    cv::Mat second;
};

struct StatusBarState
{
    std::string notification;
    std::string algorithm;
};

For now, I store the whole thing in the form of struct pointers, such as:

std::list<ImagesState*> stereoImages;
I know (I think) that the new and delete operators are time-consuming, but I don't want to encounter a stack overflow with "plain objects". Is it a bad design?
If you are using a list, I would suggest not using pointers. The list items are on the heap anyway, and the pointer just adds an unnecessary layer of indirection.
If you are after performance, using std::list is most likely not the best solution. Using std::vector might boost your performance significantly, since its objects are stored contiguously and are better for your caches.
Even in a vector, the objects would lie on the heap, and therefore the pointers are not needed (they would even harm you more than with a list). You only have to worry about stack space if you make an array on your stack, like so:

Type arrayName[REALLY_HUGE_NUMBER];
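A sketch of the value-based storage being suggested (StatusBarState as in the question; the record() helper is hypothetical):

#include <string>
#include <utility>
#include <vector>

struct StatusBarState {              // as defined in the question
    std::string notification;
    std::string algorithm;
};

std::vector<StatusBarState> history; // values live in the vector's own heap buffer

void record(StatusBarState s) {
    history.push_back(std::move(s)); // no explicit new/delete, no stack overflow risk
}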

What is the fastest way to Initialize a priority_queue from an unordered_set

The documentation on construction of a priority queue lists option (12):
template< class InputIt >
priority_queue( InputIt first, InputIt last,
                const Compare& compare = Compare(),
                Container&& cont = Container() );
But I don't know how to use this.
I have a non-empty std::unordered_set<std::shared_ptr<MyStruct>> mySet, and I want to convert it to a priority queue. I also created a comparator struct, MyComparator:
struct MyComparator {
    bool operator()(const std::shared_ptr<MyStruct>& a,
                    const std::shared_ptr<MyStruct>& b) { ... }
};

Now how can I construct a new priority_queue myQueue in a better way? I used the following, and it works:

std::priority_queue<std::shared_ptr<MyStruct>, std::deque<std::shared_ptr<MyStruct>>, MyComparator>
    myQueue(mySet.begin(), mySet.end());
I benchmarked both vector and deque, and I found that deque outperforms vector when the size is relatively large (~30K).
Since we already know the size of mySet, I should create the deque with that size. But how can I create this priority_queue with my own comparator and a predefined deque, say myDeque?
Since you have already determined that std::deque gives you better performance than std::vector, I don't think there is much more you can do in terms of how you construct the priority_queue. As you have probably seen, there is no std::deque::reserve() method, so it's simply not possible to create a deque with memory allocated ahead of time. For most use cases this is not a problem, because the main feature of deque vs vector is that deque does not need to copy elements as new ones are inserted.
If you are still not achieving the performance you desire, you might consider either storing raw pointers (keeping your smart pointers alive elsewhere), or simply changing your unordered_set to a regular set and relying on the ordering that container provides.
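For completeness, the constructor (12) quoted in the question can be fed a pre-filled deque like this (a sketch; mySet, MyStruct and MyComparator are as defined in the question):

// A predefined deque, possibly already holding elements.
std::deque<std::shared_ptr<MyStruct>> myDeque;

std::priority_queue<std::shared_ptr<MyStruct>,
                    std::deque<std::shared_ptr<MyStruct>>,
                    MyComparator>
    myQueue(mySet.begin(), mySet.end(), MyComparator{}, std::move(myDeque));
// The elements of [first, last) are appended to the moved-in deque, and the
// whole thing is then heapified in one O(n) make_heap pass.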

Dynamically Allocated Jagged Arrays with Smart Pointers

So I've recently become familiar with (and fallen in love with) Boost and C++11 smart pointers. They make memory management SO much easier. And, on top of all that, they can usually still work with legacy code (through the use of the get() call).
However, the big hole I keep running into is multidimensional jagged arrays. The correct way to do it is to have a boost::scoped_array<boost::scoped_array<double>> or vector<vector<double>>, which will clean up nicely. However, you cannot easily get a double** out of either of these to send to legacy code.
Is there any way to do this, or am I stuck with non-smart jagged arrays?
I'd start with std::vector<std::vector<double>> for storage, unless the structure was highly static.
To produce my array-of-arrays, I'd produce a std::vector<double*> via transformation of my above storage, using syntax like transform_to_vector( storage, []( std::vector<double>& v ) { return v.data(); } ) (transform_to_vector left as an exercise to the reader).
Keeping the two in sync would be a simple matter of wrapping it in a small class.
If the jagged array is relatively fixed in size, I'd take a std::vector<std::size_t> to create my buffer (or maybe a std::initializer_list<std::size_t> -- actually, a template<typename Container>, over which I'd just for(:) twice, letting the caller pick what container to provide), then create a single std::vector<double> with the sum of the sizes, then build a std::vector<double*> at the dictated offsets.
Resizing this gets expensive, which is a disadvantage.
A nice property of using std::vector is that newer APIs have full access to the pretty begin() and end() values. If you have a single large buffer, you can expose a range view of the sub-arrays to new code (a structure containing a double* begin() and a double* end(), and while we are at it a double& operator[] and a std::size_t size() const { return end()-begin(); }), so they can bask in the glory of full-on C++ container-style views while keeping C compatibility for legacy interfaces.
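A sketch of such a row view (the names are illustrative, not from any library):

#include <cstddef>

// A non-owning view over one row of the flat buffer.
struct RowView {
    double* first;
    double* last;

    double* begin() const { return first; }
    double* end() const { return last; }
    double& operator[](std::size_t i) const { return first[i]; }
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

New code can range-for over a RowView while legacy interfaces keep receiving the raw double**.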
If you're working in C++11, you should probably work with unique_ptr<T[]> rather than scoped_array<T>. It can do everything that scoped_array can, and then some.
If you want a rectangular array, I recommend using a unique_ptr<double[]> to hold the main data and a unique_ptr<double*[]> to hold the row bases. This would work something like this:
unique_ptr<double[]> data{ new double[5*3] };  // 5 rows of 3 doubles each
unique_ptr<double*[]> rows{ new double*[5] };  // one base pointer per row
rows[0] = data.get();
for ( size_t i = 1; i != 5; ++i )
    rows[i] = rows[i-1] + 3;
Then you can pass rows.get() to a function taking double**. This approach can work for a non-rectangular array as well, provided the geometry of the array is known at array creation time so that you can allocate all the data at once and point rows to the proper offsets. (It may not be as straightforward as a simple loop, though.)
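For example, with per-row sizes known up front, the offsets can be computed in one pass (a sketch; rowSizes and makeJagged are hypothetical names):

#include <memory>
#include <numeric>
#include <vector>

// Allocate a jagged array as one flat buffer plus a row-pointer array.
void makeJagged(const std::vector<std::size_t>& rowSizes,
                std::unique_ptr<double[]>& data,
                std::unique_ptr<double*[]>& rows) {
    std::size_t total = std::accumulate(rowSizes.begin(), rowSizes.end(), std::size_t{0});
    data.reset(new double[total]);
    rows.reset(new double*[rowSizes.size()]);
    double* p = data.get();
    for (std::size_t i = 0; i != rowSizes.size(); ++i) {
        rows[i] = p;      // each row starts at its own offset in the flat buffer
        p += rowSizes[i];
    }
}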
This will also give you better locality of reference and memory usage, since you only perform two allocations. All of your data will be stored together in memory and there won't be additional overhead for the separate allocations.
If you want to change the geometry of the jagged array after creating it, you will need to come up with a principled way of managing the storage for this solution to be applicable. However, since changing the geometry using scoped_array is awkward (requiring specific uses of swap()), I wouldn't be surprised if this isn't an issue for you.
(Note that this approach can work with scoped_array as well as unique_ptr<T[]>; I'm simply illustrating it using unique_ptr since we're in C++11 now.)

Performance of std::vector<Test> vs std::vector<Test*>

In a std::vector of a non-POD data type, is there a difference between a vector of objects and a vector of (smart) pointers to objects? I mean a difference in the implementation of these data structures by the compiler.
E.g.:
class Test {
std::string s;
Test *other;
};
std::vector<Test> vt;
std::vector<Test*> vpt;
Could there be no performance difference between vt and vpt?
In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?
In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?
No, this is not allowed by the C++ standard. The following code is legal C++:
vector<Test> vt;
Test t1; t1.s = "1"; t1.other = NULL;
Test t2; t2.s = "2"; t2.other = NULL;
vt.push_back(t1);
vt.push_back(t2);
Test* pt = &vt[0]; // pointer to the first element
pt++;              // raw pointer arithmetic: now points at the second element
Test q = *pt;      // q is now a copy of t2
In other words, a vector's storage is contiguous like a C array (accessing it through a pointer to the first element is legal), so the compiler effectively has to store the elements internally as an array, and may not just store pointers.
But beware that the array pointer is valid only as long as the vector is not reallocated (which normally only happens when the size grows beyond capacity).
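That contiguity is also what lets you hand the buffer to a C-style API (a sketch; legacy_sum is a hypothetical legacy function):

#include <cstddef>
#include <vector>

extern "C" double legacy_sum(const double* values, std::size_t n); // hypothetical

double total(const std::vector<double>& v) {
    // data() points at the vector's contiguous buffer, valid until reallocation.
    return legacy_sum(v.data(), v.size());
}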
In general, whatever type is being stored in the vector, instances of that type may be copied. This means that if you are storing a std::string, instances of std::string will be copied.
For example, when you push a Type into a vector, the Type instance is copied into an instance housed inside of the vector. Copying a pointer will be cheap, but, as Konrad Rudolph pointed out in the comments, this should not be the only thing you consider.
For simple objects like your Test, copying is going to be so fast that it will not matter.
Additionally, with C++11, move semantics allow you to avoid creating an extra copy when one is not necessary.
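For instance, a sketch of the move-enabled path (Test as in the question, but declared as a struct so its members are public):

#include <string>
#include <utility>
#include <vector>

struct Test {
    std::string s;
    Test* other;
};

void fill(std::vector<Test>& vt) {
    Test t{std::string(1000, 'x'), nullptr};
    vt.push_back(std::move(t)); // t.s's buffer is moved into the vector, not copied
}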
So in short: A pointer will be copied faster, but copying is not the only thing that matters. I would worry about maintainable, logical code first and performance when it becomes a problem (or the situation calls for it).
As for your question about an internal pointer vector: no, vectors are implemented as arrays that are periodically resized when necessary. You can find GNU's libstdc++ implementation of vector online.
The answer gets a lot more complicated at a level lower than C++. Pointers will of course have to be involved, since an entire program cannot fit into registers. I don't know enough about that low a level to elaborate more, though.
