Ring buffer data structure - data-structures

The package GNAT.Bounded_Buffers provides a simple implementation of a bounded buffer but unfortunately insertion into the buffer blocks until space is available in the buffer. I am also missing an entry to extract all data stored in the buffer at once.
Before I start to implement my own ring buffer package with the missing functionality I would like to know if such a package may be already available?

Related

VirtualAlloc a writeable, unbacked "throw-away" garbage range?

Is it possible in Win32 to get a writeable (or write-only) range of "garbage" virtual address space (i.e., via VirtualAlloc, VirtualAlloc2, VirtualAllocEx, or other) that never needs to be persisted, and thus ideally is never backed by physical memory or pagefile?
This would be a "hole" in memory.
The scenario is for simulating a dry-run of a sequential memory writing operation just in order to obtain the size that it actually consumes. You would be able to use the exact same code used for actual writing, but instead pass in an un-backed "garbage" address range that essentially ignores or discards anything that's written to it. In this example, the size of the "void" address range could be 2⁶⁴ = 18.4ᴇʙ (why not? it's nothing, after all), and all you're interested in is the final value of an advancing pointer.
[edit:] see comments section for the most clever answer. Namely: map a single 4K page multiple times in sequence, tiling the entire "empty" range
This isn't possible. If you have code that attempts to write to memory then the virtual memory needs to be backed with something.
However, if you modified your code to use the stream pattern then you could provide a stream implementation that ignored the write and just tracked the size.

glDrawArrays - guaranteed to copy?

Following scenario:
I have a buffer of vertices in system memory whose address I submit to glDrawArrays via glVertexAttribPointer every frame. The rendering API is OpenGL ES 3.0.
Now my question:
Can I assume, that glDrawArrays will create a full copy of the buffer on every draw call? Or is it possible that it will draw from the buffer directly if I'm on a shared memory platform?
Regards
For all porpoises you need to treat it as "that it will draw from the buffer directly".
When setting the pointer you do not tell the openGL about any sizes of the buffer so at that point there is nothing set at all but some pointer on the GPU or rather an integer value since the same procedure is used when using GPU vertex buffers.
So the data size is only determined when calling draw where you say how many vertices are used from the buffer. At this point I would not expect openGL to copy the data into its internal memory but even if it does it is more like a temporary data cache, nothing more. These data are not reused through render calls and access to data must be done again.
You need to persist the data in your memory and they must be accessible. If they are no longer owned you may make the draw call to draw garbage or even receive a crash if you have no access to the memory you inserted. So for instance setting a pointer from a method/function which allocates the data in stack is generally a no-no.
Can I assume, that glDrawArrays will create a full copy of the buffer
on every draw call?
Graphics drivers try very hard not to copy bulk data buffers - it's horribly slow and energy intensive. The entire point of using buffers rather than client-side arrays is that they can be uploaded to the graphics server once and subsequently just referenced by the draw call without needing a copy or (on desktop) a transfer over PCIe into graphics RAM.
Rendering will behave as if the buffer has been copied (e.g. the output must reflect the state of the buffer at the point the draw call was made, even if it is subsequently modified). However, in most cases no copy is actually needed.
A badly written application can force copies to be taken; for example, if you modify a buffer (e.g. calling glBufferSubData) immediately after submitting a draw using it, the driver may need to create a new version of the buffer as the original data is likely still referenced by the draw you just queued. Well written applications try to pipeline their resource updates so this doesn't happen because it is normally fatal for application rendering performance ...
See https://community.arm.com/graphics/b/blog/posts/mali-performance-6-efficiently-updating-dynamic-resources for a longer explanation.

What makes node.js SlowBuffers "slow"?

I'm using node.js to serve some PNG images that are stored in an SQLite database as binary BLOBs. These images are small, on average 9500 bytes.
I'm using the sqlite3 npm package, which appears to return binary BLOB objects as SlowBuffers. My node.js service holds these SlowBuffers in memory to mitigate IO latency, serving them like this:
response.send(slowBuffer);
It appears that SlowBuffer has an interface similar to Buffer; converting to Buffer is trivial:
var f = function(slowBuffer) {
var buffer = new Buffer(slowBuffer.length);
slowBuffer.copy(buffer);
return buffer;
}
Should I convert these SlowBuffers to Buffers?
Help me understand why they are called "slow" buffers.
If you would read the posts:
https://groups.google.com/forum/?fromgroups=#!topic/nodejs-dev/jd52ZsVSZNo
https://groups.google.com/forum/?fromgroups=#!topic/nodejs/s1dnFbb-Rj8
Node provides two types of buffer objects. Buffer is a native Javascript data structure; SlowBuffer is implemented by a C++ module. Using C++ modules from the native Javascript environment costs extra CPU time, hence the "slow". Buffer objects are backed by SlowBuffer objects, but the contents can be read/written directly from Javascript for better performance.
Any Buffer object larger than 8 KB is backed by a single SlowBuffer object. Multiple Buffer objects smaller than 8 KB can be backed by a single SlowBuffer object. When many Buffer objects smaller than 8 KB exist in memory (backed by a single SlowBuffer), the C++ module penalty can be very high if you were to instead use a SlowBuffer for each one. Small Buffers are often used in large quantities.
This class is primarily for internal use means that if you want to manage buffers on the server on your own, then use SlowBuffer (to use in smaller chunks you have to partition SlowBuffer yourself). Unless you want that minute level control in handling buffers, you should be fine with using Buffer objects.

how to know the TCP buffer size dynamically

Is it possible to know the TCP buffer size dynamically on windows.I set the TCP buffer size using SO_SNDBUF,SO_RECVBUF and also can check its allocated buffer size using getsockopt(). But I wanted to know how to get the available buffer size so that if buffer size goes beyond I can take some action.Any utility or api will be equally useful.
My question is specific to windows.Though of anyone knows anything about linux it can also be useful to know for me to get any any parallel.
Buffers are used asynchronously by the kernel. You can not control them. Moreover, the underlying implementation can ignore your SO_SNDBUF/SO_RECVBUF requests, or choose to provide smaller/larger amounts than requested.

How is fseek() implemented in the filesystem?

This is not a pure programming question, however it impacts the performance of programs using fseek(), hence it is important to know how it works. A little disclaimer so that it doesn't get closed.
I am wondering how efficient it is to insert data in the middle of the file. Supposing I have a file with 1MB data and then I insert something at the 512KB offset. How efficient would that be compared to appending my data at the end of the file? Just to make the example complete lets say I want to insert 16KB of data.
I understand the answer varies depending on the filesystem, however I assume that the techniques used in common filesystems are quite similar and I just want to get the right notion of it.
(disclaimer: I want just to add some hints to this interesting discussion)
IMHO there are some things to take into account:
1) fseek is not a primary system service, but a library function. To evaluate its performance we must consider how the file stream library is implemented. In general, the file I/O library adds a layer of buffering in user space, so the performance of fseek may be quite different if the target position is inside or outside the current buffer. Also, the system services that the I/O libary uses may vary a lot. I.e. on some systems the library uses extensively the file memory mapping if possible.
2) As you said, different filesystems may behave in a very different way. In particular, I would expect that a transactional filesystem must do something very smart and perhaps expensive to be prepared to a possible rollback of an aborted write operation in the middle of a file.
3) Modern OS'es have very aggressive caching algorithms. An "fseeked" file is likely to be already present in cache, so operations become much faster. But they may degrade a lot if the overall filesystem activity produced by other processes become important.
Any comments?
fseek(...) is a library call, not an OS system call. It is the run-time library that takes care of the actual overhead involved in making a system call to the OS, technically speaking, fseek is indirectly making a call to the system but really it is not (this brings up a clear distinction between the differences between a library call and a system call). fseek(...) is a standard input-output function regardless of the underlying system...however...and this is a big however...
The OS will more than likely to have cached the file in its kernel memory, that is, the direct offset to the location on the disk on where the 1's and 0's are stored, it is through the OS's kernel layers, more than likely, a top-most layer within the kernel that would have the snapshot of what the file is composed of, i.e. data irrespectively of what it contains (it does not care either way, as long as the 'pointers' to the disk structure for that offset to the lcoation on the disk is valid!)...
When fseek(..) occurs, there would be a lot of over-head, indirectly, the kernel delegated the task of reading from the disk, depending on how fragmented the file is, it could be theoretically, "all over the place", that could be a significant over-head in terms of having to, from a user-land perspective, i.e. the C code doing an fseek(...), it could be scattering itself all over the place to gather the data into a "one contiguous view of the data" and henceforth, inserting into the middle of a file, (remember at this stage, the kernel would have to adjust the location/offsets into the actual disk platter for the data) would be deemed slower than appending to the end of the file.
The reason is quite simple, the kernel "knows" what was the last offset was, and simply wipe the EOF marker and insert more data, behind the scenes, the kernel, is having to allocate another block of memory for the disk-buffer with the adjusted offset to the location on the disk following an EOF marker, once the appending of data is completed.
Let us assume the ext2 FS and the Linux OS as an example. I don't think there will be a significant performance difference between a insert and an append. In both cases the files node and offset table must be read, the relevant disk sector mapped into memory, the data updated and at some later point the data written back to disk. What will make a big performance difference in this example is good temporal and spatial locality when accessing parts of the file since this will reduce the number of load/store combos.
As a previous answers says you may be able to speed up both operations if you deal with data writes that exact multiples of the FS block size, in this case you could skip the load stage and just insert the new blocks into the files inode datastrucure. This would not be practical, as you would need low level access to the FS driver, and using it would be very restrictive and not portable.
One observation I have made about fseek on Solaris, is that each call to it resets the read buffer of the FILE. The next read will then always read a full block (8K by default). So if you have a lot of random access with small reads it's a good idea to do it unbuffered (setvbuf with NULL buffer) or even use direct syscalls (lseek+read or even better pread which is only 1 syscall instead of 2). I suppose this behaviour will be similar on other OS.
You can insert data to the middle of file efficiently only if data size is a multiple of FS sector but OSes doesn't provide such functions so you have to use low-level interface to the FS driver.
Inserting data in the middle of the file is less efficient than appending to the end because when inserting you would have to move the data after the insertion point to make room for the data being inserted. Moving these data would involve reading them from disk, writing the data to be inserted and then writing the old data after the inserted data. So you have at least one extra read and write when inserting.

Resources