Preferred way to fork / start subprocesses in Cucumber (Ruby)

Let's say I have this scenario:
Scenario: Test LDAP access
  Given that the LDAP dummy server is started
  And the LDAP query is executed
  ...
I wish to start an LDAP server in that step. In my case I use ruby-ldapserver, so in theory I could do this in my step definition:
args = { ... }
@ldap_pid = fork do
  redirect_stdout_stderr_to_logfile()
  wait_for_ldap_requests(args)
  exit # avoid messing with Cucumber/web driver cleanup
end
...
After do
  if @ldap_pid
    Process.kill("HUP", @ldap_pid)
    Process.wait @ldap_pid
  end
end
A totally different approach:
system("some_script_that_starts_ldap_dummy < #{input} >#{tmpfile} 2>&1 &")
This certainly works, but it is rather inelegant: it starts a Ruby program from inside Ruby (unnecessary process creation), and I need to set up the input parameters for that subprogram as well.
All that said, I'm not altogether happy with either approach (the "warm fuzzy feeling" is not there).
What is your standard approach to these things? Is there one to speak of? Does Cucumber bring something to the table that could support me here? Should I run something to tell Cucumber that it has forked and should handle itself like a child process?
Edit: actually, when playing around with the fork approach, I did not notice any problems with the DB at all. I did notice that if I kill the child with SIGINT, it will break the web driver (Poltergeist / PhantomJS in my case). A working workaround is to send SIGHUP, handle it in the child by shutting down gracefully (if needed) but not calling exit, and then, after a few seconds, send SIGKILL (which denies the child any chance to close down its protocols and just rips it away). Not nice... and not free of race conditions, say, if the CI server is under load.
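Sketched out, that HUP-then-KILL teardown looks something like this (a rough sketch; the three-second grace period is arbitrary and, as said, not race-free):
After do
  if @ldap_pid
    Process.kill("HUP", @ldap_pid)    # let the child shut down gracefully
    sleep 3                           # arbitrary grace period
    begin
      Process.kill("KILL", @ldap_pid) # then rip it away
      Process.wait(@ldap_pid)         # reap the child
    rescue Errno::ESRCH, Errno::ECHILD
      # the child is already gone and reaped
    end
  end
end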


Run when you can

In my Sinatra web application, I have a route:
get "/" do
temp = MyClass.new("hello",1)
redirect "/home"
end
Where MyClass is:
class MyClass
  @@instancesArray = []

  attr_reader :string

  def initialize(string, id)
    @string = string
    @id = id
    @@instancesArray[id] = self
  end

  def self.run(id)
    puts @@instancesArray[id].string
  end
end
At some point I would want to run MyClass.run(1), but I wouldn't want it to execute immediately, because that would slow down the server's response to some clients. I would want the server to wait to run MyClass.run(temp) until there was some time with a lighter load. How could I tell it to wait until there is an empty/light load, then run MyClass.run(temp)? Can I do that?
Addendum
Here is some sample code for what I would want to do:
$var = 0

get "/" do
  $var = $var + 1 # each time a request is received, it increments
end
After that I would have a loop that counts requests per minute (after a minute it would reset $var to 0), and if $var was less than some number, it would run tasks until the load increased.
As Andrew mentioned (correctly, and I'm not sure why he was voted down), Sinatra stops processing a route when it sees a redirect, so any subsequent statements will never execute. As you stated, you don't want to put those statements before the redirect, because that would block the request until they complete. You could potentially send the redirect status and header to the client without using the redirect method and then call MyClass#run. This will have the desired effect (from the client's perspective), but the server process (or thread) will block until it completes. This is undesirable because that process (or thread) will not be able to serve any new requests until it unblocks.
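A sketch of that idea, reusing the names from the question (note the handler still doesn't return until MyClass.run finishes):
get "/" do
  temp = MyClass.new("hello", 1)
  status 302
  headers "Location" => "/home"
  MyClass.run(1)  # the process/thread is tied up until this returns
  ""              # empty body; the client just follows the redirect
end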
You could fork a new process (or spawn a new thread) to handle this background task asynchronously from the main process associated with the request. Unfortunately, this approach has the potential to get messy. You would have to code around different situations like the background task failing, or the fork/spawn failing, or the main request process not ending if it owns a running thread or other process. (Disclaimer: I don't really know enough about IPC in Ruby and Rack under different application servers to understand all of the different scenarios, but I'm confident that here there be dragons.)
The most common solution pattern for this type of problem is to push the task onto some kind of work queue to be serviced later by another process. Pushing a task onto the queue is ideally a very quick operation and won't block the main process for more than a few milliseconds. This introduces a few new challenges (where does the queue live? how is the task described so that it can be performed later without any context? how do we maintain the worker processes?), but fortunately a lot of the legwork has already been done by other people. :-)
There is the delayed_job gem, which seems to provide a nice all-in-one solution. Unfortunately, it's mostly geared towards Rails and ActiveRecord, and the efforts people have made in the past to make it work with Sinatra look to be unmaintained. The contemporary, framework-agnostic solutions are Resque and Sidekiq. It might take some effort to get up and running with either option, but it would be well worth it if you have several "run when you can" type functions in your application.
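For illustration, a minimal Sidekiq version might look like this (assuming Redis, the sidekiq gem, and a running worker process; RunLaterWorker is a made-up name):
require 'sidekiq'

class RunLaterWorker
  include Sidekiq::Worker

  def perform(id)
    MyClass.run(id)  # executed later, inside the worker process
  end
end

get "/" do
  temp = MyClass.new("hello", 1)
  RunLaterWorker.perform_async(1)  # enqueueing takes only milliseconds
  redirect "/home"
end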
MyClass.run(temp) is never actually executing. In your current handler for the / path, you instantiate a new instance of MyClass and then immediately redirect to /home. I'm not entirely sure what the question is, though. If you want something to execute after the redirect, that functionality needs to live in the /home route.
get '/home' do
  # some code like MyClass.run(some_arg)
end

How to keep a persistent connection to SQL Server using Ruby Sequel and Tiny_TDS while in a loop

I have a Ruby script that needs to run continually on the server. I've daemonized it using the daemon gem, and in my script I have it running in an infinite loop, since the daemon gem handles starting and stopping of the process that kicks off my script. In my script, I start out by setting up my DB instance using the Sequel gem and tiny_tds, like so:
DB = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase, user: MSSQLUser, password: MSSQLPassword)
Then I have a loop do block that is my infinite loop. Inside it, I test whether I have a connection using DB.test_connection, and then I query the DB every second or so to check whether there is new content, using a query such as:
DB['SELECT * FROM dbo.[MyTable]'].all do |row|
  # My logic here.
  # As part of my logic I test whether I need to delete this row
  # in the table, and if so I use:
  DB.run('DELETE FROM dbo.[MyTable] WHERE some condition')
end
Then at the end of my logic, just before I loop again, I do:
sleep 1
DB.disconnect
All of this works great for about an hour to an hour and a half, with everything checking the table, doing the logic, deleting rows, etc. Then it dies and gives me this error message: TinyTds::Error: Adaptive Server connection timed out
My question: why is that happening? Do I need to structure my code differently? Why doesn't DB.test_connection do what it is advertised to do? The documentation says it checks for a connection in the connection pool, uses it if it finds one, and creates a new one otherwise.
Any help would be much appreciated
DB.test_connection just acquires a connection from the connection pool; it doesn't check that the connection is still valid (it must have been valid at one point or it wouldn't be in the pool). There's no way to know that a connection is still valid without actually sending a query. You can use the connection_validator extension that ships with Sequel if you want to do that automatically.
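Hooking it up is a couple of lines next to the Sequel.connect call (the timeout value here is only an example):
DB = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase, user: MSSQLUser, password: MSSQLPassword)
DB.extension(:connection_validator)
# Validate connections that have been idle for more than an hour;
# use -1 to validate on every checkout (slower, but safest).
DB.pool.connection_validation_timeout = 3600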
If you are loading Sequel before forking, you need to make sure you call DB.disconnect before forking, otherwise you can end up with multiple forked processes sharing the same connection, which can cause many different issues.
I finally ended up just putting a rescue statement in there that caught this and re-ran my line of code to create the DB instance. Yes, it puts a warning in my log about already setting that constant, but I guess I could just make it not a constant and that would go away. Anyway, it appears to be working now, and the times it does time out, I'm recovering gracefully from those. I just wish I could have figured out why it was/is disconnecting like it is.
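In outline, that rescue looks something like the following sketch (disconnecting the pool rather than reassigning the DB constant would avoid the warning I mentioned):
loop do
  begin
    DB.test_connection
    DB['SELECT * FROM dbo.[MyTable]'].all do |row|
      # my logic here
    end
  rescue Sequel::DatabaseError => e
    # TinyTds timeouts surface through Sequel as DatabaseError subclasses
    warn "Lost the connection (#{e.message}); reconnecting..."
    DB.disconnect rescue nil
  end
  sleep 1
end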

How to use ruby fibers to avoid blocking IO

I need to upload a bunch of files in a directory to S3. Since more than 90% of the time required to upload is spent waiting for the HTTP request to finish, I want to execute several of them at once somehow.
Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't think of any way I can do any work while an HTTP call blocks.
Any way I can solve this problem without threads?
I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html
Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
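A bare-bones producer/consumer sketch with Queue (upload_file and the directory name stand in for your actual code):
require 'thread'

queue   = Queue.new
workers = 4.times.map do
  Thread.new do
    while (file = queue.pop)  # pop blocks until something is pushed
      upload_file(file)       # consumer: does the actual upload
    end
  end
end

# Producer: keeps finding new files and handing them to the pool
Dir.glob('to_upload/*').each { |f| queue.push(f) }

workers.size.times { queue.push(nil) }  # a nil tells each consumer to stop
workers.each(&:join)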
If you want to upload multiple files at once, simply launch a new Thread for each file:
t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t
Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):
@all_threads.each do |t|
  t.join if t.alive?
end
The Queue can either be a @member_variable or a $global.
To answer your actual questions:
Can Fibers help me with this at all?
No, they can't. Jörg W Mittag explains why best.
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
When he says that the only concurrency construct in Ruby is Thread, remember that there are many different implementations of Ruby and that they vary in their threading implementations. Jörg once again provides a great answer to these differences, and correctly concludes that only something like JRuby (which uses JVM threads mapped to native threads) or forking your process can achieve true parallelism.
Any way I can solve this problem without threads?
Other than forking your process, I would also suggest that you look at EventMachine and something like em-http-request. It's an event-driven, non-blocking, reactor-pattern-based HTTP client that is asynchronous and does not incur the overhead of threads.
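A sketch of what that looks like with em-http-request (the bucket URL is made up, files is assumed to hold your file names, and error handling is kept minimal):
require 'eventmachine'
require 'em-http-request'

EventMachine.run do
  pending = files.size
  files.each do |file|
    request = EventMachine::HttpRequest.new("http://mybucket.s3.amazonaws.com/#{file}").put(:body => File.read(file))
    finished = proc do
      pending -= 1
      EventMachine.stop if pending.zero?  # leave the reactor when all are done
    end
    request.callback(&finished)
    request.errback(&finished)
  end
end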
Aaron Patterson (@tenderlove) uses an example almost exactly like yours to describe exactly why you can and should use threads to achieve concurrency in your situation.
Most I/O libraries are now smart enough to release the GVL (Global VM Lock, or most people know it as the GIL or Global Interpreter Lock) when doing IO. There is a simple function call in C to do this. You don't need to worry about the C code, but for you this means that most IO libraries worth their salt are going to release the GVL and allow other threads to execute while the thread that is doing the IO waits for the data to return.
If what I just said was confusing, you don't need to worry about it too much. The main thing that you need to know is that if you are using a decent library to do your HTTP requests (or any other I/O operation for that matter... database, interprocess communication, whatever), the Ruby interpreter (MRI) is smart enough to be able to release the lock on the interpreter and allow other threads to execute while one thread awaits IO to return. If the next thread has its own IO to grab, the Ruby interpreter will do the same thing (assuming that the IO library is built to utilize this feature of Ruby, which I believe most are these days).
So, to sum up what I am saying: use threads! You should see the performance benefit. If not, check whether your HTTP library uses the rb_thread_blocking_region() function in C, and if it doesn't, find out why not. Maybe there is a good reason; maybe you need to consider using a better library.
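In other words, something as simple as this already overlaps the network waits (upload_file stands in for your upload code):
threads = files.map do |file|
  Thread.new { upload_file(file) }  # the GVL is released while this thread waits on IO
end
threads.each(&:join)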
The link to the Aaron Patterson video is here: http://www.youtube.com/watch?v=kufXhNkm5WU
It is worth a watch, even if just for the laughs, as Aaron Patterson is one of the funniest people on the internet.
You could use separate processes for this instead of threads:
#!/usr/bin/env ruby

$stderr.sync = true

# Number of children to use for uploading
MAX_CHILDREN = 5

# Hash of PIDs for children that are working, along with which file
# they're working on.
@child_pids = {}

# Keep track of uploads that failed
@failed_files = []

# Get the list of files to upload as arguments to the program
@files = ARGV

### Wait for a child to finish, adding the file to the list of those
### that failed if the child indicates there was a problem.
def wait_for_child
  $stderr.puts " waiting for a child to finish..."
  pid, status = Process.waitpid2( 0 )
  file = @child_pids.delete( pid )
  @failed_files << file unless status.success?
end

### Here's where you'd put the particulars of what gets uploaded and
### how. I'm just sleeping for the file size in bytes * milliseconds
### to simulate the upload, then returning either +true+ or +false+
### based on a random factor.
def upload( file )
  bytes = File.size( file )
  sleep( bytes * 0.00001 )
  return rand( 100 ) > 5
end

### Start a child uploading the specified +file+.
def start_child( file )
  if pid = Process.fork
    $stderr.puts "%s: upload started by child %d" % [ file, pid ]
    @child_pids[ pid ] = file
  else
    if upload( file )
      $stderr.puts "%s: done." % [ file ]
      exit 0 # success
    else
      $stderr.puts "%s: failed." % [ file ]
      exit 255
    end
  end
end

until @files.empty?
  # If there are already the maximum number of children running, wait
  # for one to finish
  wait_for_child() if @child_pids.length >= MAX_CHILDREN

  # Start a new child working on the next file
  start_child( @files.shift )
end

# Now we're just waiting on the final few uploads to finish
wait_for_child() until @child_pids.empty?

if @failed_files.empty?
  exit 0
else
  $stderr.puts "Some files failed to upload:",
    @failed_files.collect {|file| " #{file}" }
  exit 255
end

Using Unix Process Control Methods in Ruby

Ryan Tomayko touched off quite a firestorm with this post about using Unix process control commands.
We should be doing more of this. A lot more of this. I'm talking about fork(2), execve(2), pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so forth. These are our friends. They want so badly just to help us.
I have a bit of code (a delayed_job clone for DataMapper) that I think would fit right in with this, but I'm not clear on how to take advantage of the listed commands. Any ideas on how to improve this code?
def start
  say "*** Starting job worker #{@name}"

  t = Thread.new do
    loop do
      delay = Update.work_off(self)
      break if $exit
      sleep delay
      break if $exit
    end
    clear_locks
  end

  trap('TERM') { terminate_with t }
  trap('INT')  { terminate_with t }
  trap('USR1') do
    say "Wakeup Signal Caught"
    t.run
  end
end
Ahh yes... the dangers of "We should do more of this" without explaining what each of those calls does and in what circumstances you'd use them. For something like delayed_job you may even be using fork without knowing that you're using fork. That said, it really doesn't matter. Ryan was talking about using fork for preforking servers. delayed_job would use fork for turning a process into a daemon. Same system call, different purposes. Running delayed_job in the foreground (without fork) vs. in the background (with fork) will result in a negligible performance difference.
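For the daemonizing case, the classic fork-based recipe looks roughly like this (Ruby 1.9+ also wraps all of it in a single Process.daemon call):
exit if fork              # parent exits; the child carries on
Process.setsid            # start a new session, detaching from the terminal
exit if fork              # second fork: the daemon can never reacquire a terminal
Dir.chdir '/'             # don't keep a mounted directory busy
$stdin.reopen '/dev/null'
$stdout.reopen '/dev/null', 'a'
$stderr.reopen '/dev/null', 'a'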
However, if you write a server that accepts concurrent connections, now Ryan's advice is right on the money.
fork: creates a copy of the original process (see the fork + pipe sketch after this list)
execve: stops executing the current file and begins executing a new file in the same process (very useful in rake tasks)
pipe: creates a pipe (two file descriptors, one for reading, one for writing)
socketpair: like a pipe, but for sockets
select: lets you wait, with a timeout, for one or more of multiple file descriptors to become ready
kill: used to send a signal to a process
sigaction: lets you change what happens when a process receives a signal
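As a quick illustration of fork and pipe working together, here the parent reads a result back from a child:
reader, writer = IO.pipe

pid = fork do
  reader.close                    # the child only writes
  writer.puts "done in child #{Process.pid}"
  writer.close
end

writer.close                      # the parent only reads
puts reader.read
reader.close
Process.wait(pid)                 # reap the child; kill(2) and sigaction(2)
                                  # come into play for longer-lived workers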
5 months later, you can view my solution at http://github.com/antarestrader/Updater. Look at lib/updater/fork_worker.rb

In ruby, how do I attempt a block of code but move on after n seconds?

I have a library method that occasionally hangs on a network connection, and there's no timeout mechanism.
What's the easiest way to add my own? Basically, I'm trying to keep my code from getting indefinitely stuck.
timeout.rb has some problems; basically, it doesn't always work quite right, and I wouldn't recommend using it. Check out System Timer or Terminator instead.
The System Timer page in particular describes why timeout.rb can fail, complete with pretty pictures and everything. The bottom line is:
For timeout.rb to work, a freshly created "homicidal" Ruby thread has to be scheduled by the Ruby interpreter.
MRI 1.8, the interpreter used by most Ruby applications in production, implements Ruby threads as green threads.
It is a well-known limitation of green threads (running on top of a single native thread) that when a green thread performs a blocking system call to the underlying operating system, none of the green threads in the virtual machine will run until the system call returns.
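With the System Timer gem, usage is almost identical to timeout.rb (a sketch; some_blocking_network_call is a stand-in for the hanging library method):
require 'system_timer'

SystemTimer.timeout_after(5) do
  some_blocking_network_call  # raises Timeout::Error after 5 seconds
end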
Answered my own question:
http://www.ruby-doc.org/stdlib/libdoc/timeout/rdoc/index.html
require 'timeout'

status = Timeout::timeout(5) {
  # Something that should be interrupted if it takes too much time...
}
To prevent an ugly error on timeout, I suggest wrapping it in a begin/rescue block like this:
begin
  status = Timeout::timeout(5) do
    # Some stuff that should be interrupted if it takes too long
  end
rescue Timeout::Error
  puts "Error message: It took too long!\n"
end
