Shell script memory consumption

I have a Ruby script that does a lot of memory-intensive work. The script sometimes fails to complete because it runs out of memory. Fortunately, I can split it in two, as the script really has two parts, and the memory-intensive work would then be divided between the two resulting scripts. But I want script 2 to run after script 1 finishes, so I intend to write a shell script that looks something like this:
ruby script1.rb
ruby script2.rb
What concerns me about this approach is that, since both scripts are launched from the same shell script, it might do nothing for the memory separation we are trying to achieve by splitting the Ruby script in two.
Will script1.rb and script2.rb each run in their own memory space if they are started from one shell script? And once script1.rb terminates, will its memory be freed so that script2.rb can use it? Does splitting the script in two and running the parts through a shell script sound like a reasonable approach to the memory problem?

(I don't really know Ruby, but I'll give it a shot)
Your case sounds as if:
You have a Ruby script that consists of two parts, A and B
A is executed first and uses lots of memory
When A finishes it does not clean up after itself and leaves lots of references to useless objects
B is executed afterwards and uses even more memory
B runs out of memory
By splitting the Ruby script into two scripts you allow the memory used by A to be implicitly freed by the operating system when it terminates. Each script is a new process and as long as they are not executed concurrently (in parallel), they do not affect each other.
Therefore, using a shell script to execute A and B consecutively allows B to work as if A had never used any memory, so it is a workaround - a very ugly one.
Since you can apparently run A and B consecutively, you should fix A so that it cleans up after itself and frees any memory it no longer needs. Setting every reference to objects that are not needed after A is done to nil allows the Ruby garbage collector to free that memory.
The best approach would be to go through your Ruby script and set each object reference to nil as soon as you are done with that object, rather than doing it all at the end. You may find that your script's memory usage drops significantly as a result.
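A rough sketch of what that looks like in practice (the method names here are placeholders, not anything from your script):
big_data = load_huge_dataset        # placeholder for whatever part A builds up
summary  = summarize(big_data)      # the only result part B actually needs
big_data = nil                      # drop the last reference so the GC can reclaim it
GC.start                            # optional: nudge the garbage collector right away
run_part_b(summary)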
Keeping references to objects you no longer need is a form of memory leak and can drive a program's memory usage to inordinate levels. Lists and trees that grow indefinitely, because objects are added but never removed, are a very common cause. This can also significantly reduce performance, since traversing trees and, especially, lists gets slower as they grow.
A good mental test is: "Does my algorithm need as much memory on paper?"

Related

In bash, how are infinite output streams redirected to files handled?

As an experiment, I redirected the output of yes, which is a command that outputs a string repeatedly until killed, as follows:
yes > a.txt
I interrupted the process with Ctrl+C within a split second, but even in that short time a.txt ended up being a couple hundred megabytes in size.
So, I have several questions:
How can such a large file be generated in such a short time? Aren't there write-speed limits, especially on HDDs like the one I am using?
How can I avoid this when I unintentionally redirect the output of a program with an endless loop?
Assume the process above runs for a long time. Will all the remaining disk space eventually be filled, or are there any "safety measures" imposed by bash, the OS, etc.?
Output is kept temporarily in kernel buffers. Once memory fills up with buffered data, writing slows down to hard-disk speed, which is still pretty fast.
Most Unix filesystems reserve a percentage of space for root processes. You can see this by comparing the output of df with sudo df. But unless you have user quotas enabled, you can certainly use all your disk space up like that. Fortunately, the rm command-line tool doesn't need to consume disk storage to do its work.
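If you want a safety net against accidentally leaving such a redirect running, you can bound it yourself; for example, assuming GNU coreutils:
timeout 5 yes > a.txt        # kill yes after 5 seconds
yes | head -c 100M > a.txt   # or stop after roughly 100 MB of output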

OCaml program stops producing new output after some time

I have 3 OCaml modules; the last of the three does the actual computation and uses functions defined in the other 2.
The basic idea of the program is to start from a port containing boats as the initial state, and to move those boats until we either reach a winning situation or no more moves are possible.
The actual code is not really a problem here. It is probably not very efficient, but finds a solution... except when it does not.
While the program takes a few seconds for complex situations, in some cases, when I add a bit more complexity, it takes forever to find a solution.
When I execute the program like this:
$ ocamlc -o test.exe port.ml moves.ml solver.ml
$ ./test.exe > file
The resulting file is really huge, but its size stops increasing after some time.
It seems to me that after a while the program stops making progress without terminating; no Stack_overflow or out-of-memory error is thrown. The program simply does not continue execution. In other words, the command
$ ./test.exe > file
is still running, but no new lines are added to the file. If I print to the shell itself instead of a file, I get the same result: no new lines appear after some time.
What could this be?
The main function (the one responsible for finding a solution) uses a depth-first search algorithm and contains a lot of List operations, such as List.fold, List.map, List.iter, List.partition and List.filter. I was thinking that maybe these functions have trouble dealing with huge lists of complex types at some point, but again, no error is thrown; the execution just stops.
I have explained this very vaguely, but I really don't understand the problem here. I don't know whether it has to do with my shell (the Ubuntu subsystem on Windows) running out of memory, or with OCaml's List functions being limited at some point... If you have any suggestions, feel free to comment.
To debug such cases you should use diagnostic utilities provided by your operating system and the OCaml infrastructure itself.
First of all, you should look at the state of your process. You can use the top or htop utilities if you're on a Unix machine; otherwise, use the task manager.
If the process is running out of physical memory, it may be swapped out by the operating system. In that case, memory operations turn into hard-drive reads and writes, and garbage-collecting a heap that lives on the hard drive takes a very long time. If this is the case, a memory profiler can help you identify the crux of the problem.
If the process is constantly running without a change in its memory footprint, then it looks like you have either hit a bug in your code, i.e., an infinite loop, or some of your algorithms have exponential complexity, as Konstantin mentioned in the comment. Use debugging output or tracing to identify the place where the program stalls.
Finally, if your program is in the sleeping state, then it could be a deadlock. For example, if you're reading and writing to the same file, this can end up in a race condition. In general, if your program is multithreaded or operates multiple processes, there are lots of possibilities to induce a race condition.
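As a minimal sketch of that kind of tracing (the counter, interval and message are arbitrary), you can print progress to stderr, which is not captured by the > file redirection, and flush so it shows up immediately:
(* call trace () once per explored state; prints a line every 10_000 states *)
let steps = ref 0
let trace () =
  incr steps;
  if !steps mod 10_000 = 0 then
    Printf.eprintf "explored %d states\n%!" !steps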

Debugging memory usage of external command inside OCaml program

I'm having memory issues in a program which I cannot isolate. I'm wondering which would be the best strategy to debug it.
My program exhausts available memory when running a line similar to this one:
Sys.command "solver file.in > file.out"
The error message is:
Fatal error: exception Sys_error("solver file.in > file.out: Cannot allocate memory")
Before the error, the program runs for about 15 seconds, consuming over 1 GB of RAM, until it finally dies.
However, running the exact same command line in the shell (with the same input file) only requires 0.7 seconds and uses less than 10 MB of RAM.
It seems something is leaking an absurd amount of memory, but I cannot identify it. Trying to isolate the error by copying that line into a new OCaml file results in behavior similar to running the command directly in the shell.
For information, file.in and file.out (the expected resulting file, when running the command in the shell) are both about 200 KB large.
I tried using Unix.system instead of Sys.command, but didn't notice any difference.
I'd like to know if Sys.command has some known limitations concerning memory (e.g. excessive memory usage), and what is the best way to identify why the behavior of the external program changes so drastically.
Sys.command just calls system() from the C library. The chances that the problem is in the thin wrapper around system() are pretty small.
Most likely some aspect of the process context is just different in the two cases.
The first thing I'd try would be to add a small amount of tracing to the solver code to get a feel for what's happening in the failure case.
If you don't have sources to the solver, you could just try re-creating the environment that seems to work. Something like the following might be worth a try:
Sys.command "/bin/bash -l -c 'solver file.in > file.out'"
This depends on the availability of bash. The -l flag tells bash to pretend it's a login shell. If you don't have bash you can try something similar with whatever shell you do have.
Update
OK, it seems that the memory blowup is happening in your OCaml code before you run the solver. So the solver isn't the problem.
It's hard to say without knowing more about your OCaml code whether it's consuming a reasonable amount of memory.
It doesn't sound on the face of it like you're running out of stack space, so I wouldn't worry about a lack of tail recursion right off, though that is often something to think about.
It actually sounds a lot like you have an unbounded recursion that allocates memory along the way. That will eventually exhaust your memory space whether you have swapping turned on or not.
You can rule this out if your code works on a small instance of whatever problem you're trying to solve. In that case, you might just have to re-engineer your solution to use less memory.
After following the advice of Jeffrey Scofield, I realized that the out of memory issue happened before calling the solver, despite the error message.
The following simple OCaml file and Bash script were used to confirm it (matrix dimensions need to be adapted according to the available memory in your system):
test.ml:
let _ =
  (* allocate a large matrix to inflate the OCaml heap before the external call *)
  let _ = Array.create_matrix 45000 5000 0 in
  Sys.command "./test.sh"
test.sh:
#!/bin/bash
# do a little trivial work; this script itself uses only a few MB
for x in {1..20000}; do :; done
Using a script to measure memory usage (such as the one referenced in this question), I was able to confirm that the Bash script never uses more than 5 MB on my machine while the original program peaks at over 1.7 GB, and yet the displayed error message still associates the error with the Sys.command line, even though the external command is highly unlikely to be the real culprit.
In other words, to debug the memory usage of external commands, first make sure the external process is actually being called; otherwise the error message may be misleading.

Making ruby program run on all processors

I've been looking at optimizing a Ruby program that does quite calculation-intensive work on a lot of data. I don't know C and have chosen Ruby (not that I know it well either), and I'm quite happy with the results, apart from the time it takes to execute. It is a lot of data, and without spending any money, I'd like to know what I can do to make sure I'm maximizing my own system's resources.
When I run a basic Ruby program, does it use a single processor? If I have not specifically assigned tasks to processors, Ruby won't read my program and magically load each processor to complete the program as fast as possible, will it? I'm assuming not...
I've been reading a bit about speeding up Ruby, and in another thread read that Ruby does not support true multithreading (though it said JRuby does). But if I were to break my program into two chunks that can be run in separate instances and ran these in parallel... would those two chunks run on two separate processors automatically? If I had four processors and opened four shells and ran four separate parts (1/4 each) of the program, would it complete in 1/4 of the time?
Update
After reading the comments I decided to give JRuby a shot. Porting the app over wasn't that difficult. I haven't used "peach" yet, but just by running it in JRuby, the app runs in 1/4 the time!!! Insane. I didn't expect that much of a change. Going to give .peach a shot now and see how that improves things. Still can't believe that boost.
Update #2
Just gave peach a try. Ended up shaving another 15% off the time. So switching to JRuby and using Peach was definitely worth it.
Thanks everyone!
Use JRuby and the peach gem, and it couldn't be easier. Just replace an .each with .peach and voila, you're executing in parallel. And there are additional options to control exactly how many threads are spawned, etc. I have used this and it works great.
You get close to n times speedup, where n is the number of CPUs/cores available. I find that the optimal number of threads is slightly more than the number of CPUs/cores.
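A minimal sketch of what that looks like (work_on stands in for whatever you do per item):
require 'peach'       # gem install peach; run under JRuby for real parallelism
items = (1..100).to_a
items.peach do |item| # drop-in replacement for .each that runs the block in parallel
  work_on(item)       # placeholder for your per-item computation
end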
As others have said, the MRI implementation of Ruby (the one most people use) does not support native threads. Hence you cannot split work between CPU cores by launching more threads under MRI.
However, if your process is IO-bound (limited by disk or network activity, for example), you may still benefit from multiple MRI threads.
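For example (a toy sketch with made-up URLs), several MRI threads can overlap their network waits even though only one of them executes Ruby code at any given moment:
require 'net/http'
require 'uri'
urls = %w[http://example.com http://example.org http://example.net]
threads = urls.map do |u|
  Thread.new { Net::HTTP.get(URI(u)) }   # each thread spends most of its time waiting on the network
end
bodies = threads.map(&:value)            # join the threads and collect the responses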
JRuby on the other hand does support native threads, meaning you can use threads to split work between CPU cores.
But all is not lost. With MRI (and all the other ruby implementations), you can still use processes to split work.
This can be done using Process.fork for example like this:
Process.fork {
  10.times {
    # Do some work in process 1
    sleep 1
    puts "Hello 1"
  }
}

Process.fork {
  10.times {
    # Do some work in process 2
    sleep 1
    puts "Hello 2"
  }
}

# Wait for both child processes to finish
Process.waitall
Using fork will split the processing between CPU cores, so if you can live without threads then separate processes are one way to do it.
As nice as Ruby is, it's not known for its speed of execution. That being said, if, as noted in your comment, you can break the input up into equal-sized chunks, you should be able to start up n instances of the program, where n is the number of cores you have, and the OS will take care of using all the cores for you.
In the best case it would run in 1/n of the time, but this kind of thing can be tricky to get exactly right: some parts of the system, such as memory, are shared between the processes, and contention between them can keep things from scaling linearly. If the split is easy to do, I'd give it a try. You can also just run the same program twice and see how long it takes; if running two takes about as long as running one, you're likely all set. Just split your data and go for it.
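Concretely, "split your data and go for it" can be as simple as a small shell loop; the script and chunk file names below are made up for illustration:
# launch one Ruby instance per chunk; the OS schedules them across the cores
for chunk in chunk1.dat chunk2.dat chunk3.dat chunk4.dat; do
  ruby crunch.rb "$chunk" &
done
wait   # block until all four background jobs have finished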
Trying JRuby and some threads would probably help too, but that adds a fair amount of complexity. (It would probably be a good excuse to learn about threading.)
Threading is usually considered one of Ruby's weak points, but it depends more on which implementation of Ruby you use.
A really good writeup on the different threading models is "Does ruby have real multithreading?".
From my experience, and from what I've gathered from people who know this stuff better, it seems that if you are going to choose a Ruby implementation, JRuby is the way to go. Though if you are learning Ruby, you might want to choose another language such as Erlang, or maybe Clojure, which are popular choices if you want to use the JVM.

Is there a parallel make system that is smart enough to intelligently respond to low-memory/swapping conditions?

I'm a big fan of speeding up my builds using "make -j8" (replacing 8 with whatever my current computer's number of cores is, of course), and compiling N files in parallel is usually very effective at reducing compile times... unless some of the compilation processes are sufficiently memory-intensive that the computer runs out of RAM, in which case all the various compile processes start swapping each other out, and everything slows to a crawl -- thus defeating the purpose of doing a parallel compile in the first place.
Now, the obvious solution to this problem is "buy more RAM" -- but since I'm too cheap to do that, it occurs to me that it ought to be possible to have an implementation of 'make' (or equivalent) that watches the system's available RAM, and when RAM gets down to near zero and the system starts swapping, make would automatically step in and send a SIGSTOP to one or more of the compile processes it had spawned. That would allow the stopped processes to get fully swapped out, so that the other processes could finish their compile without further swapping; then, when the other processes exit and more RAM becomes available, the 'make' process would send a SIGCONT to the paused processes, allowing them to resume their own processing. That way most swapping would be avoided, and I could safely compile on all cores.
Is anyone aware of a program that implements this logic? Or conversely, is there some good reason why such a program wouldn't/couldn't work?
For GNU Make, there's the -l option:
-l [load], --load-average[=load]
    Specifies that no new jobs (commands) should be started if there are other jobs running and the load average is at least load (a floating-point number). With no argument, removes a previous load limit.
I don't think there's a standard option for this, though.
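For example, on an 8-core machine you could combine it with -j like this, so that new jobs are started only while the load average stays below 8:
make -j8 -l 8
Keep in mind that -l throttles on load average rather than on free memory or swap activity, so it only indirectly helps with the swapping scenario you describe.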

Resources