Synchronize two Ruby scripts on the same computer

What's the best way to synchronize two ruby scripts running on the same computer?
The scripts will be started as separate processes from the shell.
I want the scripts to take turns running. That is, I want script A to run, send a signal to script B, then wait for a signal from script B. When script B gets the signal from script A, it starts running, signals script A when it is finished, and then waits for a signal from A. Basically, I want the two scripts to run interleaved.
What's the best way to implement this synchronization?
Right now all I can come up with is the creation of a file as the signal (each script busy loops waiting for a file to be created). Are there other implementations that are faster/easier/safer?
In case it affects the answer, I'm on OSX.
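For reference, the file-based handshake described in the question would look something like this on A's side (a busy-wait sketch; the token file names and do_a_work are made up):
loop do
  do_a_work                                # A's unit of work
  File.open('a_done', 'w') {}              # signal B
  sleep 0.01 until File.exist?('b_done')   # busy-wait for B's reply
  File.delete('b_done')
end
It works, but it burns CPU while waiting and leaves token files around if a script crashes, which is what the answer below avoids.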

Probably the easiest way of doing IPC in Ruby is via DRb, combined with Queue (which lives in the standard thread library):
require 'thread'
queue = Queue.new # provides blocking read
Note: when using DRb, you'll want to have the following line near the top of your program:
Socket.do_not_reverse_lookup=true;
Without it, things just run extremely slowly, because every connection triggers a reverse DNS lookup.
To solve the specific problem described in the question, you can create a Pipe class, which essentially is just two Queue objects, one for the inbox and one for the outbox. The blocking read behavior of the Queue makes it easy to have the processes wait for each other. The Pipe is shared between the two processes via drb.
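A minimal Pipe along those lines might look like this (a sketch, not library code; the class and method names are made up for illustration):
require 'thread'
class Pipe
  def initialize
    @to_a = Queue.new
    @to_b = Queue.new
  end
  # script A's end
  def write_to_b(msg)
    @to_b << msg
  end
  def read_from_b
    @to_a.pop          # blocks until B writes
  end
  # script B's end
  def write_to_a(msg)
    @to_a << msg
  end
  def read_from_a
    @to_b.pop          # blocks until A writes
  end
end
The class must be defined in the server process; the client only ever sees a DRb proxy to the shared instance.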
The server startup code might look like this:
require 'drb'
Socket.do_not_reverse_lookup=true;
uri = "druby://localhost:2250" # you can pick a port to communicate on
pipe = Pipe.new
DRb.start_service uri, pipe
The client startup code would look like:
require 'drb'
Socket.do_not_reverse_lookup=true;
uri = "druby://localhost:2250"
DRb.start_service
pipe = DRbObject.new nil, uri
Now the client and server can communicate via the Pipe object.
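With that in place, the turn-taking is just alternating blocking reads (a sketch using the made-up method names from the Pipe above; do_a_work stands in for A's unit of work):
# script A, which runs first:
loop do
  do_a_work
  pipe.write_to_b(:your_turn)   # wake B up...
  pipe.read_from_b              # ...and block until B hands the turn back
end
Script B is the mirror image: it blocks on read_from_a first, does its work, then calls write_to_a.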

Related

Thread state change - is there a WinAPI to get callbacks for them?

I have a thread in some console process, whose code is not available to me. Is there a way to get notified when its state changes (e.g. becomes idle)?
Maybe a hooks API, or even a kernel-mode API (if there's no other way...)?
Expansion:
I have a legacy console app that I have to operate from my program. The only way to interact with it, obviously, is via stdin. So I run it as a new Process and send commands to its stdin. I take some text from its stdout, but it is not entirely predictable, so I cannot rely on it in order to know when it has finished its current work and is ready to receive the next command. The flow is something like this: I run it, send (e.g.) command1, wait for it to finish its work (some CPU load and some IO operations), then issue command2, and so on, until the last command. When it has finished working on the last command, I can close it gracefully (send an exit command). The thing is that I have several processes of this console exe working simultaneously.
I can:
use timeouts since the last stdout was received - not good, because the gap can be a millisecond or an hour
parse stdout using (e.g.) a regex to wait for expected outputs - not good... the output is wholly unpredictable. It's almost random
using timers, poll its threads' states and proceed only when all of them are in a wait state (and not waiting on IO), and at least one is waiting for user input - not good either: if I use many processes simultaneously, it creates an unnecessary, disproportionate burden on the system.
So I like the last option, except that instead of polling I would rather have events fired when these threads become idle.
I hope it explains the issue well...

Send SIGINT using Ruby on Windows

For the purpose of a challenge over on PPCG.SE I want to run programs (submissions) with a strict time limit. In order for those submissions not to waste their precious time with I/O, I want to run the following scheme:
Submissions do their computation in an infinite loop (as far as they get) and register a signal handler.
After 10 minutes SIGINT is sent, and they have one second to produce whatever output is necessary.
SIGKILL is sent to terminate the submission for good.
However, submissions should have the option not to register a signal handler and just produce output in their own time.
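A submission that does register a handler might be structured like this (a sketch; best_so_far and improve are stand-ins for whatever the submission actually computes):
best_so_far = nil
Signal.trap('INT') do
  puts best_so_far     # the one-second grace period is for this output
  $stdout.flush
  exit
end
loop do
  best_so_far = improve(best_so_far)   # hypothetical computation step
end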
I will run the submissions on Windows 8 and wanted to use Ruby to orchestrate this. However, I'm running into some trouble. I thought I'd be able to do this:
solver = IO.popen(solver_command, 'r+')
sleep(10*60)
Process.kill('INT', solver.pid)
sleep(1)
Process.kill('KILL', solver.pid)
puts solver.read
solver.close
However, the moment I send SIGINT, not only the submission process but the controlling Ruby process immediately aborts, too! Even worse, if I do this in PowerShell, PowerShell also shuts down.
A simple Ruby "submission" to reproduce this:
puts "My result"
$stdout.flush
next while true # spin forever (next is the loop body of the modifier while)
What am I doing wrong here? I feel that it's very likely I'm misunderstanding something about processes and signals in general and not Ruby in particular, but I'm definitely missing something.
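One thing the symptoms point to (an observation, not a confirmed explanation): on Windows, the INT appears to be delivered as a console control event to every process attached to the same console, the controller and PowerShell included. A mitigation worth trying is to shield the controlling script around the kill (a sketch):
old_handler = Signal.trap('INT') {}   # ignore the console-wide interrupt in the controller
Process.kill('INT', solver.pid)
sleep(1)
Signal.trap('INT', old_handler)       # restore the previous disposition
This should keep the controller alive; whether the submission still receives the event cleanly needs testing.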

Why does resque use child processes for processing each job in a queue?

We have been using Resque in most of our projects, and we have been happy with it.
In a recent project, we had a situation where we connect to Twitter's live streaming API. Since we have to maintain the connection, we dump each line from the streaming API into a Resque queue, so that the connection is not lost, and process the queue afterwards.
We had a situation where the insertion rate into the queue was on the order of 30-40/second, while the rate at which the queue was popped was only 3-5/second. Because of this, the queue kept growing. When we checked for reasons, we found that Resque has a parent process, and for each job in the queue it forks a child process, and the child process processes the job. Our Rails environment was quite heavy, and forking the child process was taking time.
So, we implemented another rake task of this sort, for the time being:
task :process_queue => :environment do
  while true
    begin
      interaction = Resque.pop("process_twitter_resque")
      if interaction
        ProcessTwitterResque.perform(interaction)
      end
    rescue => e
      puts e.message
      puts e.backtrace.join("\n")
    end
  end
end
and started the task like this:
nohup bundle exec rake process_queue --trace >> log/workers/process_queue/worker.log 2>&1 &
This does not handle failed jobs and all.
But my question is: why does Resque fork a child process to process each job from the queue? The jobs definitely do not need to be processed in parallel (since it is a queue, we expect them to be processed one after the other, sequentially, and I believe Resque also forks only one child process at a time).
I am sure Resque has done it with some purpose in mind. What is the exact purpose behind this parent/child process architecture?
The Ruby process that sits and listens for jobs in Redis is not the process that ultimately runs the job code written in the perform method. It is the “master” process, and its only responsibility is to listen for jobs. When it receives a job, it forks yet another process to run the code. This other “child” process is managed entirely by its master. The user is not responsible for starting or interacting with it using rake tasks. When the child process finishes running the job code, it exits and returns control to its master. The master now continues listening to Redis for its next job.
The advantage of this master-child process organization – and the advantage of Resque processes over threads – is the isolation of job code. Resque assumes that your code is flawed, and that it contains memory leaks or other errors that will cause abnormal behavior. Any memory claimed by the child process will be released when it exits. This eliminates the possibility of unmanaged memory growth over time. It also provides the master process with the ability to recover from any error in the child, no matter how severe. For example, if the child process needs to be terminated using kill -9, it will not affect the master’s ability to continue processing jobs from the Redis queue.
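In code, the heart of that design is roughly the following loop (a simplified sketch, not Resque's actual source; reserve_job stands in for the Redis polling):
loop do
  job = reserve_job      # hypothetical: blocks until Redis yields a job
  pid = fork do
    job.perform          # the (possibly leaky or crashing) job code runs here
  end
  Process.waitpid(pid)   # child exits, its memory is reclaimed, master resumes
end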
In earlier versions of Ruby, Resque’s main criticism was its potential to consume a lot of memory. Creating new processes means creating a separate memory space for each one. Some of this overhead was mitigated with the release of Ruby 2.0 thanks to copy-on-write. However, Resque will always require more memory than a solution that uses threads because the master process is not forked. It’s created manually using a rake task, and therefore must load whatever it needs into memory from the start. Of course, manually managing each worker process in a production application with a potentially large number of jobs quickly becomes untenable. Thankfully, we have pool managers for that.
Resque uses #fork for two reasons (among others): the ability to prevent zombie workers (just kill them) and the ability to use multiple cores (since each worker is a separate process).
Maybe this will help you with your fast-executing jobs: http://thewebfellas.com/blog/2012/12/28/resque-worker-performance

Spawning simultaneous child processes in Ruby

I'm using resque, with a queue processor which, as part of its execution, will start a shell process. Currently, I am using PTY.spawn() to invoke the shell command and handle its output.
I'd like to augment this code so that a quantity (N) can be given (the command being executed onboards VMs; I want to be able to start a variable number with one call), and have the shell process be invoked N times in separate processes, without the Nth call having to wait for call N-1 to finish, and so on. I also want to capture all STDOUT from each invocation, so that I can do work on the output once the call is done.
I have looked at Kernel::fork but the scope of code inside a forked block is not the same as its parent (for pretty obvious reasons).
What tool(s) can I use so that each process can be spawned independently, their output can be captured, and I can still have the parent process wait for them all to finish before moving on?
Here:
require 'pty'

stdouts = []
numberOfProcesses.times do
  stdouts << PTY.spawn(command_line)   # [reader, writer, pid] for each child
end
That's pretty basic if you just spawn them and get a bunch of STDOUT/STDIN pairs. If you want to be able to work on each process's output as soon as it is done, try this:
require 'pty'

threads = []
numberOfProcesses.times do
  threads << Thread.new(command_line) do |cmd|
    stdout, stdin, pid = PTY.spawn(cmd)
    Process.waitpid(pid)           # block this thread until its child exits
    process_output(stdout.read)    # then hand the child's output off
  end
end
threads.each { |t| t.join }
That spawns them in parallel, with each thread waiting for its own instance to finish. When its instance is done, the thread processes the output and returns. The main thread sits waiting for all of the others to finish.

How to run multiple threads at the same time in ruby while working with a file?

I've been messing around with Ruby and threading a little bit today. I have a list of proxies that I want to check. Assuming a timeout of 10 seconds, going through a very large list of proxies will take many hours if I write something like:
proxies.each do |proxy|
  check_proxy(proxy)
end
My first problem with trying to figure out threads is how to START multiple at the same exact time. I found a neat little snippet of code online:
for page in pages
  threads << Thread.new(page) { |myPage|
    puts "Fetching: #{myPage}\n"
    doc = Hpricot(open(myPage.to_s)).to_s
    puts "Got #{myPage}: #{doc.size}"
  }
end
Seems to work nicely as far as starting them all at the same time. So now I can... start checking all 7 thousand records at the same time?
How do I go to a file, take out a line for each thread, run a batch of like 20 and repeat the process?
Can I run a while loop that starts 20 threads at the same time (which remove lines from a file) and keeps going until the file is blank?
I'm a little weak on the logic of what I'm supposed to do.
Thanks guys!
PS.
Another thought: Will there be file access issues if 20 workers are constantly messing with it randomly? What would be a good way around that if this is so?
The keyword you are after is thread pool. You can either try to find one for Ruby (I am sure there are at least a couple on GitHub) or roll your own; there are simple example implementations on SO as well.
Re: the file access, IMO you shouldn't let workers alter the file directly, but do it in your main thread. You don't want to allow simultaneous edits there.
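A hand-rolled pool can be as small as this (a sketch that assumes the check_proxy method and proxies list from the question):
require 'thread'
queue = Queue.new
proxies.each { |proxy| queue << proxy }
workers = Array.new(20) do
  Thread.new do
    # pop(true) raises ThreadError once the queue is empty,
    # which the rescue turns into nil, ending the loop
    while (proxy = queue.pop(true) rescue nil)
      check_proxy(proxy)
    end
  end
end
workers.each(&:join)
Loading the whole list into the queue up front also sidesteps the file question: only the main thread ever touches the file.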
Try the delayed_job gem:
https://github.com/tobi/delayed_job
You don't need to generate that many Threads in order to do this work. In fact generating a lot of Threads can decrease the overall performance of your application. If you handle checking each proxy asynchronously, without blocking, you can get by with far fewer threads.
You'd create a file-manager thread to process the file. Each line gets added as a request to an array (the request queue). On the other end of the request queue you can use eventmachine to send the requests without blocking; eventmachine would also be used to receive the responses and handle the timeout. Each response can then be placed on another array (the response queue), which your file-manager thread polls. The file-manager thread pulls the responses from the response queue and resolves whether each proxy exists or not.
This gets you down to just two threads. One issue you will have is limiting the number of outstanding requests, since this model could send all of the requests in under a second and flood the nearest router. In my experience you should be able to keep around 500 requests outstanding at any one time.
There is more than one way to solve this problem asynchronously but hopefully the above is enough to help get you started with non-blocking I/O.
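To make the shape of that concrete, here is the two-queue skeleton with a plain thread standing in for the EventMachine reactor (a sketch; check_proxy and proxies.txt carry over from the question):
require 'thread'
request_queue  = Queue.new
response_queue = Queue.new
# Stand-in for the reactor: pulls requests, performs the check, pushes
# results back. With eventmachine this side would be non-blocking.
sender = Thread.new do
  while (proxy = request_queue.pop) != :done
    response_queue << [proxy, check_proxy(proxy)]
  end
  response_queue << :done
end
# File-manager side: only this thread touches the file.
File.foreach('proxies.txt') { |line| request_queue << line.strip }
request_queue << :done
while (result = response_queue.pop) != :done
  proxy, alive = result
  puts "#{proxy}: #{alive ? 'up' : 'down'}"
end
sender.join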
