Will a vector move if I don't call push_back? - c++11

I'm working with a function that takes as input an array of structures of the form
struct Info
{
const char* something;
// more stuff
};
Since I didn't want to worry about cleaning up the memory in the char*, my thinking was that I'd
1. Create a std::vector<std::string> of all the something values,
2. loop over the vector and create the Info structures using std::string::c_str(),
3. feed the structures to the function.
That way the vector owns all the strings and I don't have to worry about memory management. My only concern is that the vector could decide to move the strings in memory at some point, which would invalidate the pointers I'm getting from c_str().
So my question: assuming I don't add anything to the vector, can I count on the pointers to stay where they are? As a bonus, can I combine steps 1 and 2 into a single loop assuming I call reserve before so that the vector doesn't have to move?

Related

Partially std::move a vector? Or how to split without new memory allocation?

When we move std::vector we just steal its content. So this code:
std::vector<MyClass> v{ std::move(tmpVec) };
will not allocate new memory and will not call any constructors of MyClass.
But what if I want to split a temporary vector? In theory, I could steal the content as we did before and distribute it among new vectors. In practice I can't do this. The best solution I have found so far is to use std::move() from the <algorithm> header. But then operator new is called for every new vector, and the move constructor (if available) is called for every element moved.
What else can I do (c++17 counts)?
"In theory, I could steal the content as we did before and distribute it among new vectors."
No, you cannot.
A memory allocation cannot be broken up into multiple memory allocations. At least, not without doing multiple memory allocations, then copying/moving the elements from the original into those separate pieces.
You cannot create separate vectors that have different storage without actually copying/moving the elements to those different memory buffers. You can of course take separate ranges of that vector and do whatever you can with such ranges (iterator/pointer pairs, gsl::span, etc). But each range would always be referencing elements ultimately owned by the source vector; they cannot independently own subranges of a vector.
You can write a span class that stores two pointers, and does not own the data between them. It can have many vector-like operations on it.
It should also support slicing itself (without allocation) into sub components.
You can write a shared_span class that has both those two pointers and a shared_ptr which represents (possibly shared) ownership of the underlying buffer. It should support the operations of span, except that functions returning span (like without_front(std::size_t count=1)) should instead return shared_span (with shared ownership).
You can write a move constructor from vector to shared_span easily. You may even be able to write a function from shared_span to vector with a special allocator that doesn't allocate until it grows. Making that fully portable would be very difficult.
If it is possible (I am uncertain), you could take a std::vector, move its storage into a shared_ptr<std::vector>, feed that to an allocator, build two std::vector<T, special_allocator>s that use that memory, and do what you want.
But you could simply replace the code that needs a vector with code that consumes a shared_span. shared_span could even have a concept of extra "dead" memory before/after the buffer it is using, giving it performance approaching std::vector.
There is a span in the gsl library you could possibly use. I am unaware of a publicly available shared_span.
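A rough sketch of what such a shared_span might look like, assuming the buffer is kept alive by a shared_ptr to the moved-from vector. The class and its member names (including subspan) are hypothetical and not taken from any library; bounds are not checked.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical shared_span: two pointers into a buffer plus shared ownership
// of the vector that holds the buffer.
template <typename T>
class shared_span
{
    std::shared_ptr<std::vector<T>> owner_; // keeps the storage alive
    T* first_ = nullptr;
    T* last_ = nullptr;

    shared_span(std::shared_ptr<std::vector<T>> owner, T* first, T* last)
        : owner_(std::move(owner)), first_(first), last_(last) {}

public:
    // Steals the vector's storage; no elements are copied or moved.
    explicit shared_span(std::vector<T>&& v)
        : owner_(std::make_shared<std::vector<T>>(std::move(v))),
          first_(owner_->data()),
          last_(owner_->data() + owner_->size()) {}

    T* begin() const { return first_; }
    T* end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    T& operator[](std::size_t i) const { return first_[i]; }

    // Slices without allocating element storage.
    // Precondition: offset + count <= size().
    shared_span subspan(std::size_t offset, std::size_t count) const
    {
        return shared_span(owner_, first_ + offset, first_ + offset + count);
    }
};
```

Each slice shares ownership of the whole buffer, so the memory is released only when the last slice is destroyed. That is the trade-off for not copying or moving any elements.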

List storing pointers or "plain object"

I am designing a class which tracks user manipulations in a piece of software in order to restore previous application states (i.e. CTRL+Z/CTRL+Y). I simply wanted to clarify something about performance.
I am using the std::list container of the STL. This list is not meant to contain really huge objects, but a significant number of them. Should I use pointers or not?
For instance, here are the kinds of objects which will be stored:
struct ImagesState
{
cv::Mat first;
cv::Mat second;
};
struct StatusBarState
{
std::string notification;
std::string algorithm;
};
For now, I store the whole thing under the form of struct pointers, such as:
std::list<ImagesState*> stereoImages;
I know (I think) that new and delete operators are time consuming, but I don't want to encounter a stack overflow with "plain object". Is it a bad design?
If you are using a list, I would suggest not using pointers. The list items are on the heap anyway, and the pointer just adds an unnecessary layer of indirection.
If you are after performance, std::list is most likely not the best solution. Using std::vector might boost your performance significantly, since its contiguous storage is friendlier to the CPU cache.
Even in a vector, the objects live on the heap, so the pointers are not needed (they would hurt even more than with a list). You only have to worry about stack usage if you create a large array on the stack,
like so:
Type arrayName[REALLY_HUGE_NUMBER];
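As an illustration of storing the states by value, here is a small sketch of an undo history built on std::vector. The class name UndoHistory and its members are hypothetical; only StatusBarState comes from the question.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct StatusBarState
{
    std::string notification;
    std::string algorithm;
};

// Hypothetical undo history that stores states by value. The vector keeps
// its elements on the heap, so the size of the history does not affect
// stack usage.
class UndoHistory
{
    std::vector<StatusBarState> states_;

public:
    // Takes the state by value and moves it into the container.
    void record(StatusBarState s) { states_.push_back(std::move(s)); }

    // Moves the most recent state into `out`; returns false if there is none.
    bool undo(StatusBarState& out)
    {
        if (states_.empty())
            return false;
        out = std::move(states_.back());
        states_.pop_back();
        return true;
    }

    std::size_t size() const { return states_.size(); }
};
```

No new or delete appears in user code, and there is nothing to leak if an exception is thrown.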

Dynamically Allocated Jagged Arrays with Smart Pointers

So I've recently become familiar with (and fell in love with) boost and c++11 smart pointers. It makes memory management SO much easier. And, on top of all that, they can usually still work with legacy code (through the use of the get call)
However, the big hole I keep running into is multidimensional jagged arrays. The correct way to do it is to have a boost::scoped_array<boost::scoped_array<double>> or vector<vector<double>>, which will clean up nicely. However, you cannot get a double** out of this easily to send to legacy code.
Is there any way to do this, or am I stuck with non-smart jagged arrays?
I'd start with std::vector<std::vector<double>> for storage, unless the structure was highly static.
To produce my array-of-arrays, I'd produce a std::vector<double*> via transformation of my above storage, using syntax like transform_to_vector( storage, []( std::vector<double>& v ) { return v.data(); } ) (transform_to_vector left as an exercise to the reader).
Keeping the two in sync would be a simple matter of wrapping it in a small class.
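One possible shape for such a wrapper class, as a sketch only. The name JaggedArray and its members are hypothetical; the row pointer table is rebuilt whenever a row is resized, because resizing may move that row's buffer.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper that owns the rows and keeps a double* table in sync
// so that legacy code can be handed a double**.
class JaggedArray
{
    std::vector<std::vector<double>> storage_;
    std::vector<double*> rows_;

    void sync()
    {
        rows_.clear();
        rows_.reserve(storage_.size());
        for (std::vector<double>& row : storage_)
            rows_.push_back(row.data());
    }

public:
    explicit JaggedArray(std::vector<std::vector<double>> storage)
        : storage_(std::move(storage))
    {
        sync();
    }

    // A copy would carry row pointers into the original's storage, so
    // copying is disabled. Moving keeps the row buffers where they are.
    JaggedArray(const JaggedArray&) = delete;
    JaggedArray& operator=(const JaggedArray&) = delete;
    JaggedArray(JaggedArray&&) = default;
    JaggedArray& operator=(JaggedArray&&) = default;

    void resize_row(std::size_t i, std::size_t n)
    {
        storage_[i].resize(n);
        sync();
    }

    double** data() { return rows_.data(); } // for legacy interfaces
    std::size_t row_count() const { return storage_.size(); }
    std::size_t row_size(std::size_t i) const { return storage_[i].size(); }
};
```

The pointer returned by data() should be fetched again after any call that changes the geometry.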
If the jagged array is relatively fixed in size, I'd take a std::vector<std::size_t> to create my buffer (or maybe a std::initializer_list<std::size_t> -- actually, a template<typename Container>, and I'd just for( : ) over it twice, and let the caller pick what container it provided me), then create a single std::vector<double> with the sum of the sizes, then build a std::vector<double*> at the dictated offsets.
Resizing this gets expensive, which is a disadvantage.
A nice property of using std::vector is that newer APIs have full access to the pretty begin and end values. If you have a single large buffer, you can expose a range view of the sub arrays to new code (a structure containing a double* begin() and double* end(), and while we are at it a double& operator[] and std::size_t size() const { return end()-begin(); }), so they can bask in the glory of full on C++ container-style views while keeping C compatibility for legacy interfaces.
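The single-buffer layout described above could be set up roughly as follows. The function name layout_rows is hypothetical, and the sketch assumes the row sizes are known up front and that the buffer is not resized while the row pointers are in use.

```cpp
#include <cstddef>
#include <vector>

// Sizes `buffer` to hold all rows back to back and returns one pointer per
// row, pointing at that row's first element inside `buffer`.
std::vector<double*> layout_rows(std::vector<double>& buffer,
                                 const std::vector<std::size_t>& sizes)
{
    // First pass: total number of elements.
    std::size_t total = 0;
    for (std::size_t n : sizes)
        total += n;
    buffer.assign(total, 0.0);

    // Second pass: row pointers at the dictated offsets.
    std::vector<double*> rows;
    rows.reserve(sizes.size());
    double* next = buffer.data();
    for (std::size_t n : sizes)
    {
        rows.push_back(next);
        next += n;
    }
    return rows;
}
```

rows.data() can then be passed to a function expecting a double**, while newer code can iterate over the flat buffer directly.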
If you're working in C++11, you should probably work with unique_ptr<T[]> rather than scoped_array<T>. It can do everything that scoped_array can, and then some.
If you want a rectangular array, I recommend using a unique_ptr<double[]> to hold the main data and a unique_ptr<double*[]> to hold the row bases. This would work something like this:
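unique_ptr<double[]> data{ new double[5*3] };  // 5 rows of 3 doubles in one allocation
unique_ptr<double*[]> rows{ new double*[5] };  // one pointer per row
rows[0] = data.get();
for ( size_t i = 1; i != 5; ++i )
    rows[i] = rows[i-1] + 3;                   // each row starts 3 doubles after the previous one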
Then you can pass rows.get() to a function taking double**. This approach can work for a non-rectangular array as well, provided the geometry of the array is known at array creation time so that you can allocate all the data at once and point rows to the proper offsets. (It may not be as straightforward as a simple loop, though.)
This will also give you better locality of reference and memory usage, since you only perform two allocations. All of your data will be stored together in memory and there won't be additional overhead for the separate allocations.
If you want to change the geometry of the jagged array after creating it, you will need to come up with a principled way of managing the storage for this solution to be applicable. However, since changing the geometry using scoped_array is awkward (requiring specific uses of swap()), I wouldn't be surprised if this isn't an issue for you.
(Note that this approach can work with scoped_array as well as unique_ptr<T[]>; I'm simply illustrating it using unique_ptr since we're in C++11 now.)

Performance of std::vector<Test> vs std::vector<Test*>

In an std::vector of a non POD data type, is there a difference between a vector of objects and a vector of (smart) pointers to objects? I mean a difference in the implementation of these data structures by the compiler.
E.g.:
class Test {
std::string s;
Test *other;
};
std::vector<Test> vt;
std::vector<Test*> vpt;
Could there be no performance difference between vt and vpt?
In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?
"In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?"
No, this is not allowed by the C++ standard. The following code is legal C++:
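// Assumes the members of Test are accessible (e.g. Test is declared as a struct).
vector<Test> vt;
Test t1; t1.s = "1"; t1.other = NULL;
Test t2; t2.s = "2"; t2.other = NULL;
vt.push_back(t1);
vt.push_back(t2);
Test* pt = &vt[0];
pt++;          // now points at vt[1], because the elements are contiguous
Test q = *pt;  // q is now a copy of t2 (q.s == "2")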
In other words, the standard requires a vector to store its elements contiguously, so a pointer to the first element can be used like a pointer into a C array. The implementation therefore has to keep the elements themselves in an array and may not just store pointers to them.
But beware that the array pointer is valid only as long as the vector is not reallocated (which normally only happens when the size grows beyond capacity).
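The difference in layout can be observed directly. The following sketch measures the distance in bytes between two neighbouring elements; the helper name stride_bytes is hypothetical, and Test is declared as a struct here so that its members are accessible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Test
{
    std::string s;
    Test* other;
};

// Distance in bytes between the first two elements of a vector.
// Precondition: v.size() >= 2.
template <typename T>
std::ptrdiff_t stride_bytes(const std::vector<T>& v)
{
    return reinterpret_cast<const char*>(&v[1]) -
           reinterpret_cast<const char*>(&v[0]);
}
```

For a std::vector<Test> the stride equals sizeof(Test), meaning the objects themselves sit side by side. For a std::vector<Test*> it equals sizeof(Test*), and the objects live wherever they were allocated.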
In general, whatever the type being stored in the vector is, instances of that may be copied. This means that if you are storing a std::string, instances of std::string will be copied.
For example, when you push a Type into a vector, the Type instance is copied into an instance housed inside the vector. Copying a pointer will be cheap, but, as Konrad Rudolph pointed out in the comments, this should not be the only thing you consider.
For simple objects like your Test, copying is going to be so fast that it will not matter.
Additionally, with C++11, moving allows avoiding creating an extra copy if one is not necessary.
So in short: A pointer will be copied faster, but copying is not the only thing that matters. I would worry about maintainable, logical code first and performance when it becomes a problem (or the situation calls for it).
As for your question about an internal pointer vector: no, vectors are implemented as arrays that are resized when necessary. You can find GNU's libstdc++ implementation of vector online.
The answer gets a lot more complicated below the C++ level. Pointers will of course have to be involved, since an entire program cannot fit into registers. I don't know enough about that low a level to elaborate more, though.

Sorting a Vector Containing Pointer to Struct VS Struct

I am sorting a large vector containing structs using heapsort and the runtime of my code is quite slow. Instead of storing a struct in the vector, I want to store a pointer to the struct now.
My question is, under the hood, what is actually happening when I sort things and would it be faster if I store a pointer to a struct as opposed to storing the struct itself?
Certainly yes. Storing objects by value in STL containers means that the copy constructor of the stored object gets run.
In general, for performance, it is better to store pointers instead. However, you will need to be more careful about leaks and exception safety once you are using pointers.
The basic operation during sorting is a swap, which in its classic form copies the objects:
void swap(T & a, T & b)
{
    T c = a; // copy construction
    a = b;   // copy assignment
    b = c;   // copy assignment
}
It is definitely much faster to copy a pointer than a bigger object.
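One way to get the cheap swaps without giving up value storage is to sort a separate vector of pointers into the original vector. This is only a sketch: Record and sorted_view are hypothetical names, and std::sort stands in for the heapsort mentioned in the question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Record
{
    int key;
    std::string payload;
};

// Returns pointers to the records, ordered by key. Only pointers are swapped
// during the sort; the records themselves stay where they are. The pointers
// are valid only while `records` is alive and not reallocated.
std::vector<const Record*> sorted_view(const std::vector<Record>& records)
{
    std::vector<const Record*> view;
    view.reserve(records.size());
    for (const Record& r : records)
        view.push_back(&r);

    std::sort(view.begin(), view.end(),
              [](const Record* a, const Record* b) { return a->key < b->key; });
    return view;
}
```

Whether this is actually faster depends on the size of the struct: every comparison now goes through a pointer, so it is worth measuring before committing to it.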
