Efficient way to grow a large std::vector - c++14

I'm filling a large (~100 to 1000MiB) std::vector<char> with data using std::ifstream::read(). I know the size of this read in advance, so I can construct the vector to that size.
Afterwards however, I keep reading from the file until I find a particular delimiter. Any data up to that delimiter is to be added to the vector. It's 500KiB at worst, usually much less.
Considering the vector's size, I'm wondering if this causes an expensive growth (and reallocation). Memory is an issue here, as the vector's size is to remain fairly close to that of its construction.
Is it a good solution to extend the vector's capacity slightly beyond its initial size using std::vector::reserve, so that the small amount of extra data doesn't require it to grow? If so, it's probably best to construct the vector empty, reserve the capacity and then resize it to its initial size, right?
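For what it's worth, the reserve-then-resize idea can be sketched as follows. The 512 KiB slack, function name, and file handling are illustrative assumptions, not a definitive implementation:

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Reserve a little headroom beyond the known size so the
// delimiter-terminated tail can be appended without a reallocation.
// The 512 KiB slack matches the stated worst case; adjust as needed.
std::vector<char> read_with_headroom(std::ifstream& in, std::size_t known_size)
{
    constexpr std::size_t slack = 512 * 1024;

    std::vector<char> buf;
    buf.reserve(known_size + slack);  // one allocation, capacity fixed
    buf.resize(known_size);           // value-initializes the known part
    in.read(buf.data(), static_cast<std::streamsize>(known_size));

    // Appends of up to `slack` bytes now cannot reallocate, e.g.
    // buf.push_back(c); or buf.insert(buf.end(), first, last);
    return buf;
}
```

Note that resize() after reserve() still value-initializes (zeroes) the elements before read() overwrites them; that pass over ~1 GiB has a cost of its own, but it avoids the reallocation the question worries about.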

Related

DirectX11 - Buffer size of instanced vertices, with various size

With DirectX/C++, suppose you are drawing the same model many times to the screen. You can do this with DrawIndexedInstanced(). You need to set the size of the instance buffer when you create it:
D3D11_BUFFER_DESC_instance.ByteWidth = sizeof(struct_with_instance_data) * instance_count;
If instance_count can vary between a low and a high value, is it customary to create the buffer with the max value (max_instance_count) and only draw what is required?
Wouldn't that permanently use a lot of memory?
Is recreating the buffer a slow solution?
What are good methods?
Thank you.
All methods have pros and cons.
Create with the max count — as you pointed out, you'll consume extra memory.
Create some initial count and implement exponential growth — you can OOM at runtime, and growing large buffers may cause spikes in the profiler.
There's another way which may or may not work for you, depending on the application. You can create a reasonably large fixed-size buffer; to render more instances, call DrawIndexedInstanced multiple times in a loop, replacing the data in the buffer between calls. This works well if the source data is generated at runtime from something else; you'll need to rework that part to produce fixed-size batches (except the last one) instead of the complete buffer. It won't work if the data in the buffer needs to persist across frames, e.g. if you update it with a compute shader.
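The batching logic of that last approach can be sketched in plain C++, with the D3D11 calls (Map with WRITE_DISCARD, Unmap, DrawIndexedInstanced) replaced by a callback so it stands alone. All names here are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Walk a large instance range in chunks that fit a fixed-size buffer.
// `batch_capacity` is the instance count the fixed buffer can hold.
void draw_in_batches(std::size_t total_instances,
                     std::size_t batch_capacity,
                     const std::function<void(std::size_t, std::size_t)>& draw_batch)
{
    for (std::size_t first = 0; first < total_instances; first += batch_capacity) {
        // The last batch may be smaller than the buffer.
        std::size_t count = std::min(batch_capacity, total_instances - first);
        // Real code would Map() the instance buffer with D3D11_MAP_WRITE_DISCARD,
        // copy instances [first, first + count) into it, Unmap(), and then
        // issue DrawIndexedInstanced with `count` as the instance count.
        draw_batch(first, count);
    }
}
```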

Hashtable for a system with memory constraints

I have read about the variants of hashtables, but it is not clear to me which one is more appropriate for a system that is low on memory (we have a hard memory limit).
Linear/Quadratic probing works well for sparse tables.
I think Double hashing is the same as Quadratic in this aspect.
External chaining does not have issue with clustering.
Most textbooks I have checked seem to assume that extra space will always be available, but in practice most example implementations I have seen take much more space than really needed, since the hashtable is never halved.
So which variant of a hashtable is most efficient when we want to make the best usage of memory?
Update:
So my question is not only about the size of the buckets. My understanding is that both the size of the buckets and the performance under load matter, because if the buckets are small but the table degrades at 50% load, then we need to resize to a larger table often.
See this variant of Cuckoo Hashing.
It will require more hash functions, but that makes sense: you have to pay something for the memory savings.
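To make the bucket-size-versus-load trade-off concrete, here is a back-of-the-envelope comparison, assuming 8-byte keys, 8-byte values, 64-bit pointers, and ignoring allocator overhead (all assumptions for illustration):

```cpp
#include <cstddef>

// Rough bytes-per-entry for two hashtable layouts.
constexpr std::size_t entry = 16;  // 8-byte key + 8-byte value

// Open addressing at load factor lf: slots = entries / lf,
// so each entry effectively costs entry / lf bytes.
constexpr double open_bytes_per_entry(double lf) { return entry / lf; }

// External chaining: one node (entry + next pointer) per element,
// plus one bucket-head pointer per slot at load factor lf.
constexpr double chain_bytes_per_entry(double lf) { return (entry + 8) + 8 / lf; }
```

With these assumptions, open addressing at 50% load costs 32 bytes/entry, about the same as chaining at load 1.0; but open addressing at 90% load drops to roughly 17.8 bytes/entry. That is why schemes that stay fast at very high load factors, like the cuckoo variant mentioned above, are the ones that actually save memory.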

Large 3D volume bad_alloc

I'm developing an application that creates a 3D Voronoi Diagram created from a 3D point cloud using boost multi_array allocated dynamically to store the whole diagram.
One of the test cases I'm using requires a large amount of memory (around [600][600][600]), which is over the limit allowed and results in bad_alloc.
I already tried to separate the diagram in small pieces but also it doesn't work, as it seems that the total memory is already over the limits.
My question is, how can I work with such large 3D volume under the PC constraints?
EDIT
The Element type is a struct as follows:
struct Elem {
    int R[3];
    int d;
    int label;
};
The elements are indexed in the multiarray based on their position in the 3D space.
The multiarray is constructed by setting specific points on the space from a file and then filling the intermediate spaces by passing a forward and a backward mask over the whole space.
You didn't say how you get all your points. If you read them from a file, then don't read them all; if you compute them, then you can probably recompute them as needed. In both cases you can implement a cache that stores the most often used ones. If you know how your algorithm will use the data, then you can predict which values will be needed next. You can even do this in a different thread.
The second solution is to work on your data so it fits in your RAM. You have 216 million points, but we don't know the size of a point. They are 3D, but do they use floats or doubles? Are they classes or simple structs? Do they have vtables? Do you use a Debug build? (In Debug builds objects may be bigger.) Do you allocate the entire array at the beginning or incrementally? I believe there should be no problem storing 216M 3D points on a current PC, but it depends on the answers to all those questions.
The third way that comes to my mind is to use Memory Mapped Files, but I have never used them personally.
Here are few things to try:
Try to allocate in different batches, like 1 * 216M, 1k * 216k, 1M * 216, to see how much memory you can get.
Try changing the boost multi_array to std::vector or even a raw void* buffer, and compare the maximum RAM you can get.
You didn't mention the element type. Given an element of a four-byte float, a 600*600*600 matrix only takes about 820M bytes, which is not very big actually. I'd suggest checking your operating system's limit on memory usage per process. For Linux, check it with ulimit -a.
If you really cannot allocate the matrix in memory, create a file of the desired size on disk, map it into memory using mmap, and pass the address returned by mmap to boost::multi_array_ref.

Heap Type Implementation

I was implementing a heap sort and started wondering about the different implementations of heaps. When you don't need to access the elements by index (as in a heap sort), what are the pros and cons of implementing a heap with an array versus a linked data structure?
I think it's important to take into account the memory wasted by the nodes and pointers vs the memory wasted by empty spaces in an array, as well as the time it takes to add or remove elements when you have to resize the array.
When I should use each one and why?
As far as space is concerned, there's very little issue with using arrays if you know how much is going into the heap ahead of time: your values in the heap can always be pointers to the larger structures. This may allow for better cache locality on the heap itself, but you're still going to have to go out to memory for extra data. Ideally, if your comparison is based on a small morsel of data (often just a 4-byte float or integer), you can store that as the key with a pointer to the full data and achieve good cache coherency.
Heap sorts are already not particularly good on cache hits throughout traversing the heap structure itself, however. For small heaps that fit entirely in L1/L2 cache, it's not really so bad. However, as you start hitting main memory performance will dive bomb. Usually this isn't an issue, but if it is, merge sort is your savior.
The larger problem comes in when you want a heap of undetermined size. However, this still isn't so bad, even with arrays. Nowadays, in non-embedded environments with nice memory systems, growing an array with some calls (e.g. realloc, please forgive my C background) really isn't all that slow, because the data may not need to physically move in memory -- just some address pointer magic for most of it. Add to that the fact that if you use an array-size-doubling strategy (array is too small, double the size in a realloc call), you still end up with O(1) amortized cost per insertion, relatively few reallocs, and at most double the wasted space -- but hey, you'd get that with linked lists anyway if you're using a 32-bit key and 32-bit pointer.
So, in short, I'd stick with arrays for the smaller base data structures. When the heap goes away, so do the pointers I don't need anymore with a single deallocation. However, it's easier to read pointer-based code for heaps in my opinion since dealing with the indexing magic isn't quite as straightforward. If performance and memory aren't a concern, I'd recommend that to anyone in a heartbeat.
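The "indexing magic" mentioned above is small: for a node at index i in an array-backed binary heap, the parent is (i - 1) / 2 and the children are 2*i + 1 and 2*i + 2. A minimal min-heap push/pop sketch (std::push_heap and std::pop_heap do this for you in practice):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sift a newly appended value up toward the root.
void heap_push(std::vector<int>& h, int v) {
    h.push_back(v);
    for (std::size_t i = h.size() - 1; i > 0;) {
        std::size_t parent = (i - 1) / 2;
        if (h[parent] <= h[i]) break;  // min-heap property restored
        std::swap(h[parent], h[i]);
        i = parent;
    }
}

// Remove the root, move the last element there, and sift it down.
int heap_pop(std::vector<int>& h) {
    int top = h[0];
    h[0] = h.back();
    h.pop_back();
    for (std::size_t i = 0;;) {
        std::size_t l = 2 * i + 1, r = 2 * i + 2, s = i;
        if (l < h.size() && h[l] < h[s]) s = l;
        if (r < h.size() && h[r] < h[s]) s = r;
        if (s == i) break;  // neither child is smaller
        std::swap(h[i], h[s]);
        i = s;
    }
    return top;
}
```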

Best heuristic for malloc

Consider using malloc() to allocate x bytes of memory in a fragmented heap. Assume the heap has multiple contiguous locations of size greater than x bytes.
Which is the best (that leads to least heap wastage) heuristic to choose a location among the following?
Select smallest location that is bigger than x bytes.
Select largest location that is bigger than x bytes.
My intuition is smallest location that is bigger than x bytes. I am not sure which is the best in practice.
No, this is not any assignment question. I was reading this How do malloc() and free() work? and this looks like a good follow up question to ask.
In a generic heap where allocations of different sizes are mixed, of the two I'd go for putting the allocation in the smallest block that can accommodate it (to avoid reducing the size of the largest block we can allocate before we need to).
There are other ways of implementing a heap however that would make this question less relevant (such as the popular dlmalloc by Doug Lea - where it pools blocks of similar sizes to improve speed and reduce overall fragmentation).
Which solution is best always comes down to how the application performs its memory allocations. If you know an application's pattern in advance, you should be able to beat the generic heaps in both size and speed.
It's better to select the smallest location. Think about future malloc requests. You don't know what they'll be, and you want to satisfy as many requests as you can. So it's better to find a location that exactly fits your needs, so that bigger requests can be satisfied in the future. In other words, selecting the smallest location reduces fragmentation.
The heuristics you listed are used in the Best Fit and Worst Fit algorithms, respectively. There is also the First Fit algorithm which simply takes the first space it finds that is large enough. It is approximately as good as Best Fit, and much faster.
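The Best Fit and First Fit selections can be sketched as a search over a free list of block sizes. This is a toy illustration only; a real allocator tracks block addresses, splits the chosen block, and coalesces on free:

```cpp
#include <cstddef>
#include <vector>

// Best Fit: the smallest block that still holds x bytes.
// Returns the block's index, or -1 if nothing fits.
int best_fit(const std::vector<std::size_t>& free_blocks, std::size_t x) {
    int best = -1;
    for (std::size_t i = 0; i < free_blocks.size(); ++i)
        if (free_blocks[i] >= x &&
            (best < 0 || free_blocks[i] < free_blocks[best]))
            best = static_cast<int>(i);
    return best;
}

// First Fit: the first block that holds x bytes, no full scan needed.
int first_fit(const std::vector<std::size_t>& free_blocks, std::size_t x) {
    for (std::size_t i = 0; i < free_blocks.size(); ++i)
        if (free_blocks[i] >= x) return static_cast<int>(i);
    return -1;
}
```

Given free blocks {100, 32, 64} and a 30-byte request, Best Fit picks the 32-byte block and leaves the 100-byte block intact for larger future requests, while First Fit carves into the 100-byte block, illustrating the fragmentation argument above.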
