How to find the number of I/Os of a program in STXXL?

I am using STXXL. Can somebody help me find the number of I/Os (blocks transferred) performed by my program (or algorithm, or process)? I know how to restrict the memory usage of a particular process, but I don't know how to restrict the block size in STXXL or how to count the number of blocks transferred.

STXXL provides an I/O performance counter (the stxxl::stats class) that stores various measured I/O data, including the number of blocks transferred.

If you are on Linux, blktrace will keep track of block I/O for you. I don't know about other systems.

Related

Is it reasonable to read data from disk in parallel?

In an application, one may need to read data/files from disk and load them into memory. Many programming languages support using multiple CPUs to do the work. I am wondering whether reading from the disk in parallel is a reasonable option. Parallel/concurrent routines will harm the disk, right?
Could you please provide some advice on how to design this kind of system? Thanks in advance.
If you are after performance, reading data in parallel is usually the best thing you can do. The more outstanding requests you give a disk, the more freedom the OS and the drive have to reorder and merge them, and the faster the aggregate set of operations can complete.
The only problem with reading data concurrently is that your application has to handle it correctly. Typically this means using threads, although you can find OS-specific solutions that may help, such as AIO on Linux.
Lastly, the term reasonable is somewhat loaded. While it may be faster to read data concurrently, is there a good use case/does it improve the user experience/is it worth the extra code complexity? In most cases, the answer to that would be no.
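As a rough illustration of the thread-based approach described above, here is a minimal Go sketch that reads one file in fixed-size chunks from several goroutines. The names readParallel and writeTempFile, the chunk size, and the worker count are hypothetical choices for this example, and whether it is actually faster than a sequential read depends on the drive and the OS.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// readParallel reads the file at path in chunkSize-byte pieces using
// `workers` goroutines and returns the reassembled contents.
func readParallel(path string, chunkSize, workers int) ([]byte, error) {
	if chunkSize <= 0 || workers <= 0 {
		return nil, errors.New("chunkSize and workers must be positive")
	}
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	size := info.Size()
	buf := make([]byte, size)

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	offsets := make(chan int64)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for off := range offsets {
				end := off + int64(chunkSize)
				if end > size {
					end = size
				}
				// ReadAt may be called concurrently on one *os.File;
				// each goroutine fills a disjoint part of buf.
				if _, err := f.ReadAt(buf[off:end], off); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
				}
			}
		}()
	}
	for off := int64(0); off < size; off += int64(chunkSize) {
		offsets <- off
	}
	close(offsets)
	wg.Wait()
	return buf, firstErr
}

// writeTempFile is a helper for the demo: it stores data in a new
// temporary file and returns the file's path.
func writeTempFile(data []byte) (string, error) {
	f, err := os.CreateTemp("", "parallel-read-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.Write(data)
	return f.Name(), err
}

func main() {
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i % 251)
	}
	path, err := writeTempFile(data)
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	got, err := readParallel(path, 64*1024, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes read:", len(got), "matches:", string(got) == string(data))
}
```

The extra bookkeeping (offset channel, wait group, error collection) is the kind of added complexity the answer warns about.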

How to limit Hard Drive Disk I/O when reading/writing a file on disk?

I have a few Rust programs that read data from a file, do some operations, and write data on another file.
Simple enough, but I've been having a big issue in that my programs saturate the HDD max I/O and can only be executed when no other process is in use.
To be more precise, I'm currently using BufReader and BufWriter with a buffer size of 64 KB, which is fantastic in and of itself for reading/writing a file as quickly as possible. But reading at 250 MB/s while writing at 250 MB/s tends to exceed what the HDD can manage. I'm all for speed, but I realized that these Rust programs ask for too many resources from the HDD and seem to be stalled by the operating system (Windows) to let other processes work in peace. The files I'm reading/writing are generally a few gigabytes in size.
Now, I know I could just add some form of wait() between read/write operations on the disk, but I don't know how to find out the speed at which I'm currently reading/writing, and I'm looking for a more optimal solution. Plus, even after reading the docs, I still can't find an option on BufReader/BufWriter that would limit HDD I/O to some arbitrary value (say, 100 MB/s).
I looked through the sysinfo crate, but it does not seem to help in finding out the current and maximum I/O rates of the HDD.
Am I out of luck, and should I delve deeper into systems programming to find a solution? Or is there already something that could teach me how to prioritize my calls to the HDD, or simply limit them to some value calculated from the HDD's currently available I/O rate?
After reading a bit more on the subject, it seems that, apart from reading/writing a lot of data and measuring the resulting throughput, you can't find out the HDD's maximum I/O rate while the program is running; you can only guess a constant that the HDD's I/O rate won't exceed (see https://superuser.com/questions/795483/how-to-limit-hdd-write-speed-for-chosen-programs/795488#795488).
But you can still monitor disk activity and, together with the constant guessed earlier, use wait() more precisely than by always limiting yourself to a constant speed (here is a crate for Rust: https://github.com/myfreeweb/systemstat).
Prioritizing the process with the OS might be overkill since I'm trying to slip between other processes and share whatever resources are available at that time.
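The "measure your own rate, then wait" idea can be sketched as a reader wrapper that tracks how many bytes it has delivered and sleeps whenever it is ahead of a target rate. The sketch below is in Go rather than Rust and uses a fixed cap; throttledReader and timedCopy are hypothetical names, and the same pacing logic should carry over to a loop around BufReader/BufWriter.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// throttledReader wraps an io.Reader and sleeps as needed so that the
// average throughput since the first Read stays at or below bytesPerSec.
type throttledReader struct {
	r           io.Reader
	bytesPerSec float64
	start       time.Time
	total       int64
}

func (t *throttledReader) Read(p []byte) (int, error) {
	if t.start.IsZero() {
		t.start = time.Now()
	}
	n, err := t.r.Read(p)
	t.total += int64(n)
	if t.bytesPerSec > 0 {
		// Earliest elapsed time at which t.total bytes are allowed.
		due := time.Duration(float64(t.total) / t.bytesPerSec * float64(time.Second))
		if wait := due - time.Since(t.start); wait > 0 {
			time.Sleep(wait)
		}
	}
	return n, err
}

// timedCopy pushes size bytes of in-memory data through a throttledReader
// and returns the number of bytes copied and the elapsed time in seconds.
func timedCopy(size int, bytesPerSec float64) (int64, float64, error) {
	src := bytes.NewReader(make([]byte, size))
	start := time.Now()
	n, err := io.Copy(io.Discard, &throttledReader{r: src, bytesPerSec: bytesPerSec})
	return n, time.Since(start).Seconds(), err
}

func main() {
	// 256 KiB capped at 1 MiB/s should take roughly a quarter of a second.
	n, secs, err := timedCopy(256*1024, 1024*1024)
	if err != nil {
		panic(err)
	}
	fmt.Printf("copied %d bytes in %.2fs (%.0f bytes/s)\n", n, secs, float64(n)/secs)
}
```

Dividing the bytes transferred by the elapsed time, as main does, also answers the question of how to find the current read/write speed. To adapt to other processes, bytesPerSec could be adjusted at run time from a disk-activity monitor, which this sketch does not attempt.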

Load a file into memory in blocks of 256 KB

I'm studying blocked sort-based indexing, and the algorithm talks about loading files in blocks of 32 or 64 KB, because disks are read block by block, so this is efficient.
My first question is: how am I supposed to load a file by block? With a buffered reader of 64 KB? Or, if I use a Java input stream, has this optimization already been done, so that I can just read from the stream?
I actually use Apache Spark, so does sparkContext.textFile() do this optimization? What about Spark Streaming?
I don't think that on the JVM you have any direct view of the file system that would make it meaningful to align reads with block sizes. Also, there are various kinds of drives and many different file systems now, so block sizes most likely vary and may have little effect on the total I/O time.
The best-performing approach would probably be to use java.nio.FileChannel; you can then experiment with reading ByteBuffers of given block sizes to see whether it makes any performance difference. I would guess that the only effect you will see is that the JVM overhead matters more for very small buffers (in the extreme case, reading byte by byte).
You may also use the file-channel's map method to get hold of a MappedByteBuffer.
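The block-size experiment suggested above is not specific to Java. As a language-neutral illustration, here is a small Go sketch (not the java.nio API) that reads a file in fixed-size blocks and hands each block to a callback. The names readInBlocks and writeTempFile are hypothetical, and the block size is a parameter precisely so that different values can be tried.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
)

// readInBlocks reads the file at path in blockSize-byte blocks and calls
// handle once per block; only the last block may be shorter. It returns
// the number of blocks read. The slice passed to handle is reused.
func readInBlocks(path string, blockSize int, handle func(block []byte)) (int, error) {
	if blockSize <= 0 {
		return 0, errors.New("blockSize must be positive")
	}
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, blockSize)
	blocks := 0
	for {
		// ReadFull keeps reading until buf is full or the file ends.
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			handle(buf[:n])
			blocks++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return blocks, nil
		}
		if err != nil {
			return blocks, err
		}
	}
}

// writeTempFile is a helper for the demo: it stores data in a new
// temporary file and returns the file's path.
func writeTempFile(data []byte) (string, error) {
	f, err := os.CreateTemp("", "block-read-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.Write(data)
	return f.Name(), err
}

func main() {
	path, err := writeTempFile(make([]byte, 150000))
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	for _, size := range []int{4 * 1024, 64 * 1024} {
		total := 0
		blocks, err := readInBlocks(path, size, func(b []byte) { total += len(b) })
		if err != nil {
			panic(err)
		}
		fmt.Printf("block size %d: %d blocks, %d bytes\n", size, blocks, total)
	}
}
```

Timing the loop for several block sizes would show whether the choice matters on a given system.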

How does CUDA handle multiple updates to memory address?

I have written a CUDA kernel in which each thread makes an update to a particular memory address (with int size). Some threads might want to update this address simultaneously.
How does CUDA handle this? Does the operation become atomic? Does this increase the latency of my application in any way? If so, how?
The operation does not become atomic, and it is essentially undefined behavior. When two or more threads write to the same location, one of the values will end up in the location, but there is no way to predict which one.
It can be especially problematic if you are reading and writing, such as to increment a variable.
CUDA provides a set of atomic operations to help.
You may also use other coding techniques such as parallel reductions, to help when there are multiple updates to the same location, such as finding a max or min value.
If you don't care about the order of updates, it should not be a performance issue on newer GPUs, which automatically condense writes or reads to a single location in global or shared memory, but this is also not specified behavior.
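CUDA's atomic functions (for example atomicAdd, atomicMax, and atomicCAS) make each read-modify-write on an address indivisible. The sketch below is not CUDA code. It is a host-side Go analogue using sync/atomic, with goroutines standing in for threads, to show the two patterns mentioned above: a shared counter and a shared maximum. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomically starts `workers` goroutines that each add 1 to a shared
// counter perWorker times. Because every increment is atomic, no update
// is lost, unlike with a plain counter++ from several goroutines.
func countAtomically(workers, perWorker int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	return counter
}

// atomicMax raises *addr to v if v is larger, retrying with
// compare-and-swap when another goroutine changed *addr in between.
func atomicMax(addr *int64, v int64) {
	for {
		old := atomic.LoadInt64(addr)
		if v <= old || atomic.CompareAndSwapInt64(addr, old, v) {
			return
		}
	}
}

// maxOf computes the maximum of a non-empty slice with one goroutine per
// element, all updating the same location through atomicMax.
func maxOf(values []int64) int64 {
	result := values[0]
	var wg sync.WaitGroup
	for _, v := range values[1:] {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			atomicMax(&result, v)
		}(v)
	}
	wg.Wait()
	return result
}

func main() {
	fmt.Println("counter:", countAtomically(8, 10000))
	fmt.Println("max:", maxOf([]int64{3, 9, -1, 7}))
}
```

Atomic updates serialize access to the contended address, so under heavy contention a reduction that combines per-thread partial results is often the cheaper design.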

Limit the memory allocation in Go Language?

I'm looking for a way to limit memory usage in Go. My application, written in Go, has a large data set that must be loaded into main memory, so I want to limit the maximum memory size of the process to a size specified by the user.
In C, I accumulate the sizes of malloc'ed memory to do that, but I don't know how to do the same thing in Go.
Please let me know if there is a way to do it.
Thank you.
The Go garbage collector is not deterministic and it is conservative. Therefore, the statistics in runtime.MemStats are not going to be accurate enough for your purpose.
Instead, bound your approximate memory usage by using the user's input to cap the amount of data you allow to be loaded into the process at one time.
Perhaps you want to use ulimit in conjunction with your Go code?
You can do this via runtime/debug.SetMemoryLimit (available since Go 1.19), which sets a soft memory limit for the Go runtime.
The original proposal and the corresponding GitHub issue describe the design in detail.
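A minimal sketch of using debug.SetMemoryLimit with a user-supplied value follows. The wrapper names and the 512 MiB figure are assumptions for illustration. Note that the limit is soft: the runtime collects garbage more aggressively as it approaches the limit, but it does not fail allocations, so this is not a hard cap on the process.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit sets the Go runtime's soft memory limit to limitBytes
// and returns the limit that was in effect before the call.
func applyMemoryLimit(limitBytes int64) int64 {
	return debug.SetMemoryLimit(limitBytes)
}

// currentMemoryLimit returns the limit currently in effect. A negative
// argument tells SetMemoryLimit to report the limit without changing it.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	// Hypothetical user-specified limit: 512 MiB.
	const userLimit = 512 << 20

	prev := applyMemoryLimit(userLimit)
	fmt.Printf("previous limit: %d bytes\n", prev)
	fmt.Printf("current limit:  %d bytes\n", currentMemoryLimit())
}
```

The same limit can be set without code changes through the GOMEMLIMIT environment variable, for example GOMEMLIMIT=512MiB.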
Besides runtime.MemStats you could use gosigar to monitor system memory.
