Spawning simultaneous child processes in Ruby

I'm using resque, with a queue processor which, as part of its execution, will start a shell process. Currently, I am using PTY.spawn() to invoke the shell command and handle its output.
I'd like to augment this code so that a quantity (N) can be given (the command executed onboards VMs, I want to be able to start a variable number with one call), and have the shell process be called N times in separate processes, without the Nth call having to wait for call N-1 to finish, and so on. I also want to capture all STDOUT from each invocation, so that I can do work on the output once the call is done.
I have looked at Kernel::fork but the scope of code inside a forked block is not the same as its parent (for pretty obvious reasons).
What tool(s) can I use so that each process can be spawned independently, their output can be captured, and I can still have the parent process wait for them all to finish before moving on?

Here:
require 'pty'

stdouts = []
number_of_processes.times do
  stdouts << PTY.spawn(command_line)[0..1]  # keep the [stdout, stdin] pair
end
That's the basic version: it just spawns them all and collects a bunch of STDOUT/STDIN pairs. If you want to be able to work on each process's output as soon as it is done, try this:
require 'pty'

threads = []
number_of_processes.times do
  threads << Thread.new(command_line) do |cmd|
    stdout, stdin, pid = PTY.spawn(cmd)
    Process.waitpid(pid)
    process_output(stdout.read)
  end
end
threads.each { |t| t.join }
That spawns them in parallel, with each thread waiting for its own instance to finish. When its instance is done, the thread processes the output and returns. The main thread waits for all of the others to finish before moving on.
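If a PTY is not strictly required, the same pattern can be sketched with Process.spawn and ordinary pipes. This is only an assumption on my part that a plain pipe is acceptable; `echo` stands in for the real VM-onboarding command:

```ruby
# Spawn N children in parallel, capture each one's stdout through its own
# pipe, and wait for all of them before processing the output.
# `echo` is a stand-in for the real command.
commands = 3.times.map { |i| ["echo", "vm-#{i}"] }

children = commands.map do |cmd|
  reader, writer = IO.pipe
  pid = Process.spawn(*cmd, out: writer)
  writer.close                    # parent keeps only the read end
  { pid: pid, reader: reader }
end

outputs = children.map do |child|
  Process.waitpid(child[:pid])    # block until this child exits
  text = child[:reader].read
  child[:reader].close
  text.strip
end
```

Note that for commands with large output you would want to read before (or while) waiting; otherwise a full pipe buffer can block the child and deadlock this loop.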

Related

Is it possible to make a console wait on another child process?

Usually when a program is run from the Windows console, the console will wait for the process to exit and then print the prompt and wait for user input. However, if the process starts a child process, the console will still only wait for the first process to exit. It will not wait for the child as well.
Is there a way for the program to get the console to wait on another child process instead of (or as well as) the current process.
I would assume it's impossible because presumably the console is waiting on the process' handle and there's no way to replace that handle. However, I'm struggling to find any confirmation of this.
Is there a way for the program to get the console to wait on another child process instead of (or as well as) the current process.
No. As you noted, as soon as the 1st process the console creates has exited, the console stops waiting. It has no concept of any child processes being created by that 1st process.
So, what you can do instead is either:
simply have the 1st process wait for any child process it creates before then exiting itself.
if that is not an option, then create a separate helper process that creates a Job Object, starts the main process, and assigns it to that job. Any child processes it creates will automatically be put into the same job as well¹. The helper process can then wait for all processes in the job to exit before exiting itself. Then you can have the console run and wait on the helper process rather than the main process.
¹ By default; a process spawner can choose to break a new child process out of the current job, if the job is set up to allow that.

Thread state changes - is there a WinAPI to get callbacks for them?

I have a thread in some console process, whose code is not available to me. Is there a way to get notified when its state changes (e.g. becomes idle)?
Maybe hooks API or even kernel-mode API (if no other way...)?
Expansion:
I have a legacy console app I have to operate from my program. The only way to interact with it, obviously, is via stdin. So I run it as a new Process and send commands to its stdin. I take some text from its stdout, but it is not entirely predictable, so I cannot rely on it to know when the app has finished its current work and is ready to receive the next command. The flow is something like this: I run it, send (e.g.) command1, wait for it to finish its work (some CPU load and some IO operations), and then issue command2, and so on, until the last command. When it has finished working on the last command, I can close it gracefully (send an exit command). The thing is that I have several processes of this console exe working simultaneously.
I can:
use timeouts since the last stdout was received - not good, because the gap could be a millisecond or an hour
parse stdout using (e.g.) a regex to wait for expected outputs - not good... the output is wholly unpredictable. It's almost random
using timers, poll its threads' states and act only when all of them are in a wait state (and not waiting for IO), and at least one is waiting for user input (see this) - not good either: when I use many processes simultaneously, it creates an unnecessary, disproportionate burden on the system.
So I like the last option, except that instead of polling, I'd rather have events fired when these threads become idle.
I hope it explains the issue well...

Does process exit when the main thread ends?

I'm new to Ruby, and quick googling of the question did not give a result. For this case it is relatively easy to code a test; however, it might be worth asking here to get an authoritative answer.
Consider this scenario: in a Ruby application invoked from the command line, the main thread creates and starts worker threads. The worker threads perform long computations. A method of the main thread does not wait for anything and simply completes after spawning the workers.
Will the process exit and worker threads be terminated after the main thread exits?
Is there a documentation describing this behavior?
Yes. In Ruby, when the main thread exits, the process exits, and all other threads are terminated along with it. There is documentation regarding the exiting of threads (although it's short) here. However, if you're looking for work to keep running after the spawning process ends, you should look at a multi-processing gem or library suited to the task.
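A minimal sketch of the difference Thread#join makes; without the join, the process could exit before the worker finishes and its result would be lost:

```ruby
results = []

# Worker thread standing in for a long computation.
worker = Thread.new do
  sleep 0.1
  results << :finished
end

# Without this join, reaching the end of the main thread would
# terminate the process and kill the worker mid-flight.
worker.join
```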

Why is there a timing problem when forking child processes?

When I took a look at the 'Launching Jobs' reference on gnu.org, I didn't get this part.
The shell should also call setpgid to put each of its child processes into the new process group. This is because there is a potential timing problem: each child process must be put in the process group before it begins executing a new program, and the shell depends on having all the child processes in the group before it continues executing. If both the child processes and the shell call setpgid, this ensures that the right things happen no matter which process gets to it first.
There are two methods on the linked page, launch_job() and launch_process().
They both call setpgid in order to prevent the timing problem.
But I didn't get why there is such a problem.
I guess the new program means the result of execvp (p->argv[0], p->argv); in launch_process(). And before execvp runs, setpgid (pid, pgid); is always executed there, yet the same call also appears in launch_job().
So again, why is there such a problem? (Why do we have to call setpgid() in launch_job() as well?)
The problem is that the shell wants the process to be in the right process group. If the shell doesn't call setpgid() on its child process, there is a window of time during which the child process is not part of the process group, while the shell execution continues. (By calling setpgid() the shell can guarantee that the child process is part of the process group after that call).
There is another problem, which is that the child process may execute the new program (via exec) before its process group id has been properly set (i.e. before the parent calls setpgid()). That is why the child process should also call setpgid() (before calling exec()).
The description is admittedly pretty bad. There isn't just one problem being solved here; it's really two separate problems. One - the parent (i.e. the shell) wants to have the child process in the right process group. Two - the new program should begin execution only once its process has already been put into the right process group.
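The same double-call idiom can be sketched in Ruby via Process.setpgid, which may be easier to experiment with than the C original. Here `true` is just a stand-in program, and the rescues cover the benign race where the other side got there first:

```ruby
# Both parent and child call setpgid; whichever runs first does the real
# work, and the other call either succeeds redundantly or fails harmlessly
# (e.g. with EACCES once the child has exec'd), hence the rescues.
pid = fork do
  begin
    Process.setpgid(0, 0)   # child: join our own new group before exec
  rescue SystemCallError
  end
  exec("true")              # stand-in for the real program
end

begin
  Process.setpgid(pid, pid) # parent: guarantee the group before continuing
rescue SystemCallError
end

Process.waitpid(pid)
```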

Synchronize two Ruby scripts on the same computer

What's the best way to synchronize two ruby scripts running on the same computer?
The scripts will be started as separate processes from the shell.
I want the scripts to take turns running. That is, I want script A to run, send a signal to script B, then wait for a signal from script B. When script B gets the signal from script A, it starts running, signals script A when it is finished, and then waits for a signal from A. Basically, I want the two scripts to run interleaved.
What's the best way to implement this synchronization?
Right now all I can come up with is the creation of a file as the signal (each script busy loops waiting for a file to be created). Are there other implementations that are faster/easier/safer?
In case it affects the answer, I'm on OSX.
Probably the easiest way of doing IPC in Ruby is via drb, using a Queue (which lives in the thread standard library):
require 'thread'
queue = Queue.new # provides blocking read
Note, when using drb, you'll want to have the following line near the top of your program:
Socket.do_not_reverse_lookup = true
Without it, things just run extremely slowly (source).
To solve the specific problem described in the question, you can create a Pipe class, which essentially is just two Queue objects, one for the inbox and one for the outbox. The blocking read behavior of the Queue makes it easy to have the processes wait for each other. The Pipe is shared between the two processes via drb.
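A minimal sketch of such a Pipe class, assuming one Queue per direction; the method names here are invented for illustration:

```ruby
# Queue is built into modern Ruby; older versions need require 'thread'.
class Pipe
  def initialize
    @to_a = Queue.new   # messages destined for side A
    @to_b = Queue.new   # messages destined for side B
  end

  def send_to_b(msg)
    @to_b.push(msg)
  end

  def recv_as_a
    @to_a.pop           # blocks until side B sends something
  end

  def send_to_a(msg)
    @to_a.push(msg)
  end

  def recv_as_b
    @to_b.pop           # blocks until side A sends something
  end
end
```

Because Queue#pop blocks, each script simply calls recv after signalling the other, which gives the interleaved, take-turns execution the question asks for.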
The server startup code might look like this:
require 'drb'
Socket.do_not_reverse_lookup = true
uri = "druby://localhost:2250" # you can pick a port to communicate on
pipe = Pipe.new
DRb.start_service uri, pipe
The client startup code would look like:
require 'drb'
Socket.do_not_reverse_lookup = true
uri = "druby://localhost:2250"
DRb.start_service
pipe = DRbObject.new nil, uri
Now the client and server can communicate via the Pipe object.
