Debugging memory usage of external command inside OCaml program - debugging

I'm having memory issues in a program which I cannot isolate. I'm wondering which would be the best strategy to debug it.
My program exhausts available memory when running a line similar to this one:
Sys.command "solver file.in > file.out".
The error message is:
Fatal error: exception Sys_error("solver file.in > file.out: Cannot allocate memory")
Before the error, the program runs for about 15 seconds, consuming over 1 GB of RAM, until it finally dies.
However, running the exact same command line in the shell (with the same input file) only requires 0.7 seconds and uses less than 10 MB of RAM.
It seems something is leaking an absurd amount of memory, but I cannot identify it. Trying to isolate the error by copying the call into a new, minimal OCaml file produces behaviour similar to running the command directly in the shell (i.e. it works fine).
For information, file.in and file.out (the expected resulting file, when running the command in the shell) are both about 200 KB large.
I tried using Unix.system instead of Sys.command, but didn't notice any difference.
I'd like to know if Sys.command has some known limitations concerning memory (e.g. excessive memory usage), and what is the best way to identify why the behavior of the external program changes so drastically.

Sys.command just calls system() from the C library. The chances that the problem is in the thin wrapper around system() are pretty small.
Most likely some aspect of the process context is just different in the two cases.
The first thing I'd try would be to add a small amount of tracing to the solver code to get a feel for what's happening in the failure case.
If you don't have sources to the solver, you could just try re-creating the environment that seems to work. Something like the following might be worth a try:
Sys.command "/bin/bash -l -c 'solver file.in > file.out'"
This depends on the availability of bash. The -l flag tells bash to pretend it's a login shell. If you don't have bash you can try something similar with whatever shell you do have.
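If you want to see what actually differs, one quick check (just a sketch, file names arbitrary) is to dump the environment and resource limits that the child spawned by Sys.command sees, and diff them against those of your interactive shell:

let () =
  (* Sys.command passes the string to /bin/sh, so shell syntax is fine. *)
  ignore (Sys.command "{ env | sort; ulimit -a; } > from_ocaml.txt")

Then run { env | sort; ulimit -a; } > from_shell.txt in the interactive shell and compare the two files with diff.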
Update
OK, it seems that the memory blowup is happening in your OCaml code before you run the solver. So the solver isn't the problem.
It's hard to say without knowing more about your OCaml code whether it's consuming a reasonable amount of memory.
On the face of it, it doesn't sound like you're running out of stack space, so I wouldn't worry about a lack of tail recursion right away, though that is often something to think about.
It actually sounds a lot like you have an unbounded recursion (or loop) that allocates memory along the way. This will eventually exhaust your memory space whether you have swapping turned on or not.
You can rule this out if your code works on a small example of whatever problem you're trying to solve. In that case, you might just have to re-engineer your solution to take less memory.
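For concreteness, here is a minimal OCaml sketch of the shape of bug described above: a recursion with no base case that allocates on every step. Because the call is in tail position it never overflows the stack, it just keeps growing the heap until allocation fails.

(* Illustrative only: never terminates and allocates a fresh array per step. *)
let rec grow acc n =
  grow (Array.make 1024 n :: acc) (n + 1)

If a scaled-down instance of your problem runs fine, this kind of unbounded growth in the full-size run is the first thing to look for.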

After following the advice of Jeffrey Scofield, I realized that the out of memory issue happened before calling the solver, despite the error message.
The following simple OCaml file and Bash script were used to confirm it (matrix dimensions need to be adapted according to the available memory in your system):
test.ml:
let _ =
  let _ = Array.make_matrix 45000 5000 0 in
  Sys.command "./test.sh"
test.sh:
#!/bin/bash
for x in {1..20000};do :;done
Using a script to measure memory usage (such as the one referenced in this question), I've been able to confirm that the Bash script uses no more than 5 MB on my machine while the original program peaks at over 1.7 GB, and that the error message nevertheless points at the Sys.command line. This is presumably because spawning the child process is what fails once the OCaml process has grown so large, even though the external command itself is not the one using the memory.
In other words, to debug memory usage of external commands, it's best to first make sure the external process is actually being launched; otherwise the error message may be misleading.
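As an additional sanity check, one can ask the OCaml runtime itself how big its heap is just before the external command is launched. The following is only a sketch built on the test above, using the standard Gc and Sys modules:

let () =
  let _big = Array.make_matrix 45000 5000 0 in
  let mb_of_words w = float_of_int w *. float_of_int (Sys.word_size / 8) /. 1_000_000. in
  let st = Gc.stat () in
  Printf.printf "OCaml heap before Sys.command: %.1f MB\n%!" (mb_of_words st.Gc.heap_words);
  exit (Sys.command "./test.sh")

If this prints a number in the gigabyte range, the memory is being used by the OCaml program itself, no matter which line the eventual error is attributed to.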

Related

Increase pipe buffer size on macOS

My program on Linux got a drastic speed increase when I wrote fcntl(fd, F_SETPIPE_SZ, size). fd is a pipe to a child process I created with fork+execv. I raised the pipe size from 64K to 1MB, which seems to be the Linux maximum without root permissions.
I wrote a test to see how big the pipe is on macOS; it's also 64K, but I can't seem to figure out how to increase the pipe size. Does anyone know? I'm using an M2 and Ventura.
As far as I know, macOS doesn't have a similar option available. However, if your end goal is to speed up your program, it might be worth taking a closer look at why increasing the pipe capacity on Linux improves performance, as it seems to indicate an underlying problem elsewhere in the program.
An example off the top of my head: if you are sending large chunks of data (>64K), it may be that the reading part of the code doesn't correctly handle truncated data and hangs for some amount of time when a partial chunk is read. With a small buffer this would happen more often, so increasing the buffer size would improve performance, but wouldn't actually fix the root problem.
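To illustrate that point, here is a small sketch of a read loop that tolerates short reads (written in OCaml, like the rest of this page; the Unix module mirrors the underlying system calls). It keeps calling read until the requested number of bytes has arrived, so a partial chunk from a small pipe never leaves the consumer hanging on incomplete data:

(* Keep calling Unix.read until len bytes have arrived or EOF is reached;
   returns the number of bytes actually read. *)
let really_read fd buf len =
  let rec loop off =
    if off >= len then off
    else
      match Unix.read fd buf off (len - off) with
      | 0 -> off (* EOF *)
      | n -> loop (off + n)
  in
  loop 0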

In bash, how are infinite output streams redirected to files handled?

As an experiment, I redirected the output of yes, which is a command that outputs a string repeatedly until killed, as follows:
yes > a.txt
I interrupted the process with Ctrl + C in a split second, but even in that short time a.txt ended up being a couple hundred megabytes in size.
So, I have several questions:
How can such a large file be generated at such a short time? Aren't there writing speed restrictions, especially on HDDs like the one I am using?
How can I avoid this when I unintentionally redirect the output of a program with an endless loop?
Assume the process above works for a long time. Will all the space left in storage be filled eventually or are there any "safety measures" set by bash, OS, etc.?
Output is temporarily kept in kernel buffers. When memory fills up, writing slows down to hard-disk speed, which is still pretty fast.
Most Unix filesystems reserve a percentage of space for root processes. You can see this by comparing the output of df with sudo df. But unless you have user quotas enabled, you can certainly use all your disk space up like that. Fortunately, the rm command-line tool doesn't need to consume disk storage to do its work.

OCaml program execution not producing new output after some time

I have 3 OCaml modules; the last one of the three does the actual computation and uses functions defined in the other 2.
The basic idea of the program is to have a port as initial state, which contains boats, and moving those boats until we reach either a winning situation or no more moves are possible.
The actual code is not really a problem here. It is probably not very efficient, but finds a solution... except when it does not.
While the program takes a few seconds for complex situations, in some cases, when I add a bit more complexity, it takes forever to find a solution.
When I execute the program like this:
$ ocamlc -o test.exe port.ml moves.ml solver.ml
$ ./test.exe > file
The resulting file is really huge, but its size stops increasing after some time.
It seems to me that after a while the program stops making progress without terminating; no Stack_overflow or out-of-memory error is thrown. The program simply does not continue the execution. In other words, the command
$ ./test.exe > file
is still running, but no new lines are added to the file. If I print to the shell itself instead of redirecting to a file, I get the same result: output stops appearing after some time.
What could this be?
The main function (that is responsible for finding a solution) uses a Depth-First-Search algorithm, and contains a lot of List operations, such as List.fold, List.map, List.iter, List.partition, List.filter. I was thinking that maybe these functions have problems dealing with huge lists of complex types at some moment, but again, no error is thrown, the execution just stops.
I explained this very vaguely, but I really don't understand the problem here. I don't know whether the problem has to do with my shell (Ubuntu subsystem on Windows) running out of memory, or with OCaml's List functions being limited at some point... If you have any suggestions, feel free to comment.
To debug such cases you should use diagnostic utilities provided by your operating system and the OCaml infrastructure itself.
First of all, you should look at the state of your process. You can use the top or htop utilities if you're running a Unix machine; otherwise, you can use the task manager.
If the process is running out of physical memory, it may be swapped out by the operating system. In that case, memory operations turn into hard-drive reads and writes, and garbage-collecting a heap that lives on the hard drive takes a very long time. If this is the case, you can use a memory profiler to identify the crux of the problem.
If the process is running steadily without any change in its memory footprint, then it looks like you have either hit a bug in your code, i.e., an infinite loop, or some of your algorithms have exponential complexity, as Konstantin mentioned in the comments. Use debugging output or tracing to identify the location where the program stalls.
Finally, if your program is in the sleeping state, it could be a deadlock. For example, reading and writing the same file can end up in a race condition. In general, if your program is multithreaded or coordinates multiple processes, there are many ways to induce a race condition.
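As a concrete form of the "debugging output or tracing" suggestion, something like the following can be dropped into the search code (the call sites and the message are hypothetical). It writes to stderr, which is not captured by ./test.exe > file, and flushes immediately, so the last trace line shows roughly where the program stopped making progress:

let steps = ref 0

(* Call at the top of each DFS step: prints a progress line every
   10000 states and flushes so nothing is lost if the run stalls. *)
let trace msg =
  incr steps;
  if !steps mod 10_000 = 0 then
    Printf.eprintf "[trace] step %d: %s\n%!" !steps msg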

Shell script memory consumption

I have a ruby script that performs a lot of memory intensive work. The script sometimes fails to complete because of unavailability of memory. Fortunately, there is a way I can split the script into two as there are literally 2 parts to the ruby script. The memory intensive work also gets split into 2 as I split the script into 2 separate scripts. But now I want script 2 to execute after script 1. I intend to write a shell script which looks something like
ruby script1.rb
ruby script2.rb
What I'm concerned about in this approach is that, since both scripts are scheduled in the same shell script, it would defeat the memory separation we are trying to achieve by splitting the Ruby script in two.
Will both script1.rb and script2.rb run in their own memory space if they run as one shell script? And if script1.rb is terminated, would it free up memory that script2.rb might utilize? Does splitting the scripts into two and running them through a shell script sound like an approach for the memory problem?
(I don't really know Ruby, but I'll give it a shot)
Your case sounds as if:
You have a Ruby script that consists of two parts, A and B
A is executed first and uses lots of memory
When A finishes it does not clean up after itself and leaves lots of references to useless objects
B is executed afterwards and also uses lots of memory
B runs out of memory
By splitting the Ruby script into two scripts you allow the memory used by A to be implicitly freed by the operating system when it terminates. Each script is a new process and as long as they are not executed concurrently (in parallel), they do not affect each other.
Therefore, using a shell script to execute A and B consecutively allows B to work as if A had never used any memory, so it is a workaround - a very ugly one.
Since you can apparently run A and B consecutively, you should fix A so that it cleans up after itself and frees any memory. Setting every reference to an object that is no longer needed after A is done to nil will allow the Ruby garbage collector to free the memory it uses.
The best approach would be to go through your Ruby script and set the object references to nil as soon as you are done with each object, rather than doing it all at the end. You may find that your script has significantly reduced memory usage afterwards.
Keeping references to unnecessary objects is a form of memory leak that can lead to a program using inordinate amounts of memory. Lists and trees that grow indefinitely, because objects are added but never removed, are a very common cause of this. It can also significantly reduce the performance of the code, since traversing trees and, especially, lists gets slower as they grow.
A good mental test is: "Does my algorithm need as much memory on paper?"

Why does my Perl script to decompress files run slower when I use threads?

So I'm running Perl 5.10 on a Core 2 Duo MacBook Pro, compiled with threading support: usethreads=define, useithreads=define. I've got a simple script to read 4 gzipped files containing around 750000 lines each. I'm using Compress::Zlib to do the uncompressing and reading of the files. I've got 2 implementations, the only difference between them being that one includes use threads. Other than that, both scripts run the same subroutine to do the reading. Hence, in pseudocode, the non-threading program does this:
read_gzipped(file1);
read_gzipped(file2);
read_gzipped(file3);
read_gzipped(file4);
The threaded version goes like this:
my $thr0 = threads->new(\&read_gzipped, 'file1');
my $thr1 = threads->new(\&read_gzipped, 'file2');
my $thr2 = threads->new(\&read_gzipped, 'file3');
my $thr3 = threads->new(\&read_gzipped, 'file4');
$thr0->join();
$thr1->join();
$thr2->join();
$thr3->join();
Now the threaded version actually runs almost 2 times slower than the non-threaded script. This obviously was not the result I was hoping for. Can anyone explain what I'm doing wrong here?
You're using threads to try and speed up something that's IO-bound, not CPU-bound. That just introduces more IO contention, which slows down the script.
My guess is that the bottleneck for gzip operations is disk access. If you have four threads competing for disk access on a spinning hard disk, that slows things down considerably. The disk head has to move between different files in rapid succession. If you just process one file at a time, the head can stay near that file, and the disk cache will be more effective.
ithreads work well if you're dealing with something that is mostly not CPU-bound. Decompression is CPU-bound.
You can easily alleviate the problem by using the Parallel::ForkManager module.
Generally, threads in Perl are not really good.
I'm not prepared to assume that you're I/O bound without seeing the output of top while this is running. Like depesz, I tend to assume that compression/decompression operations (which are math-heavy) are more likely to be CPU-bound.
When you're dealing with a CPU-bound operation, using more threads/processes than you have processors will almost never[1] improve matters - if the CPU utilization is already at 100%, more threads/processes won't magically increase its capacity - and will most likely make things worse by adding in more context-switching overhead.
[1] I've heard it suggested that heavy compilations, such as building a new kernel, benefit from telling make to use twice as many processes as the machine has processors and my personal experience has been that this seems to be accurate. The explanation I've heard for it is that this allows each CPU to be kept busy compiling in one process while the other process is waiting for data to be fetched from main memory. If you view compiling as a CPU-bound process, this is an exception to the normal rule. If you view it as an I/O bound case (where the I/O is between the CPU and main memory rather than disk/network/user I/O), it is not.
