(Posted already at https://www.ruby-forum.com/topic/6876320, but crossposted here because I have not received a response so far.)
A question about parallelizing tests in Minitest and/or Test::Unit (i.e. proper use of parallelize_me!):
Assume that I have some helper methods, which are needed by several tests. From my understanding, I could NOT do something like this in such a method (simplified example):
def prep(m, n)
  @pid = m
  @state = n
end

def process
  if @state > 5 && @pid != 0
    ...
  else
    ...
  end
end
I think I can't do this in Minitest and test-unit, because if I call prep and process from several of my test functions, the tests cannot be parallelized anymore - those test functions all set and read the same instance variables. Right?
Now, my question is whether the following approach would be safe for parallelization: I make each of these mutable instance variables a hash, which I initialize in setup like this:
def setup
  @pid ||= {}
  @state ||= {}
end
My "helper methods" receive a key (for example, the name of the test method) and use it to access their "own" hash element:
def prep(key, m, n)
  @pid[key] = m
  @state[key] = n
end

def process(key)
  if @state[key] > 5 && @pid[key] != 0
    ...
  else
    ...
  end
end
It's a bit ugly, but: Is this a reliable approach? Is this way of accessing a hash thread-safe? How can I do it better?
At least in Minitest you can safely do, for example,
setup do
  @form = Form.new
end
without @form getting mixed up between parallel tests, so this approach should be safe too:
def setup
  @state = m
  @pid = n
end
which means that your original approach should be safe as well.
================
UPDATE
Consider the following gist, with a piece of code that defines 100 different tests accessing @random, which is set in setup: https://gist.github.com/bbozo/2a64e1f53d29747ca559
You will notice that state set in setup isn't shared among tests: setup runs before every test, so each test is encapsulated and thread safety isn't an issue.
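Minitest achieves this by creating a fresh instance of the test class for every test method, so instance variables set in setup are per-test state even when tests run in parallel threads. A rough sketch of that mechanism with plain threads (FakeTest is hypothetical, not Minitest's actual implementation):

```ruby
# Each "test" runs against its own object, mimicking how Minitest
# instantiates the test class once per test method.
class FakeTest
  def setup
    @random = rand(1_000_000)
  end

  def test_read
    value = @random
    sleep 0.01        # give other threads a chance to interleave
    value == @random  # still our own value: no other test touched it
  end
end

results = Array.new(100) do
  Thread.new do
    t = FakeTest.new  # fresh instance per test, like Minitest does
    t.setup
    t.test_read
  end
end.map(&:value)

puts results.all?  # => true
```

Because every thread works on its own FakeTest instance, no `@random` is ever visible to another test, which is why per-test instance variables need no synchronization.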
Your approach with the hashes makes sense, and it will work to distinguish between the threads. The problem lies with the Global Interpreter Lock.
Unless your helper methods are IO-bound (make HTTP requests, socket requests, handle local files), you won't see a speed improvement because Ruby will pretty much (to simplify things) run your code sequentially over multiple threads, without a guaranteed run order.
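The IO-bound caveat can be seen directly: operations that wait (here simulated with sleep, which releases the GIL much like waiting on a socket does) overlap across threads, whereas pure-CPU loops would not. A small timing sketch:

```ruby
require "benchmark"

# Four threads each "wait on IO" for 0.2s. Because sleeping releases
# the GIL, the waits overlap instead of running back to back.
elapsed = Benchmark.realtime do
  threads = 4.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts elapsed.round(2)  # close to 0.2, not 4 * 0.2 = 0.8
```

Replace the sleep with a CPU-bound loop and the elapsed time approaches the sequential total, which is the GIL effect described above.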
Good luck!
Related
Short question: I'd like to test this method foo:
class MyClass
  def foo
    # The `@cache_key` is defined per instance but can be the same for
    # different instances (obviously, I guess)
    @foo ||= cache.fetch(@cache_key) do
      # call a private method, tested separately
      generate_foo_value || return
      # The `|| return` trick is one I use to skip the yield result
      # in case of a falsey value. It may not be related to the question,
      # but I left it there because I'm not so sure.
    end
  end
end
The issue here is that I have two kinds of caching.
On one hand, I have @foo ||=, which ensures memoization and which I can test pretty easily - for instance, by calling my method twice and checking whether the result differs.
On the other hand, I have the cache.fetch method, which may be shared between different instances. The trouble for me is here: I don't know the best way to unit test it. Should I mock my cache? Should I watch the cache result? Or should I run plural instances with the same cache key?
And for this last question, I don't know how to run plural instances easily using rspec.
I don't think you need plural instances in this case. It looks like you could just set the cache value in the specs and test that the MyClass instance(s) return it.
before { cache.set(:cache_key, 'this is from cache') }
subject { MyClass.new(:cache_key, cache) } # you didn't provide the
# initializer, so I'll just assume
# that you can set the cache key
# and inject the cache object
specify do
expect(subject.foo).to eq 'this is from cache'
end
You can also set the expectation that generate_foo_value is not run at all in this case.
My reasoning is that: you don't need to fully emulate setting the cache in a separate process. You test only that if the cache with the key is set - your method has to return it instead of doing expensive computation.
The fact that this cache is shared between processes (like it sits in a PostgreSQL DB, or Redis or whatever) is irrelevant for the test.
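The same idea can be exercised without any mocking framework, using a hypothetical in-memory cache (FakeCache and the MyClass initializer signature below are assumptions, since the original initializer wasn't shown):

```ruby
# A hypothetical in-memory cache; `fetch` yields only on a miss.
class FakeCache
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

class MyClass
  def initialize(cache_key, cache)
    @cache_key = cache_key
    @cache = cache
  end

  def foo
    @foo ||= @cache.fetch(@cache_key) { generate_foo_value }
  end

  private

  # If the shared cache is warm, this must never run:
  def generate_foo_value
    raise "expensive computation was not skipped"
  end
end

cache = FakeCache.new
cache.set(:cache_key, "this is from cache")
puts MyClass.new(:cache_key, cache).foo  # => this is from cache
```

If foo ever fell through to the expensive computation despite a warm cache, the raise would fail the test - the same guarantee the should-not-receive expectation gives you in RSpec.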
I'm trying to write a multi-threaded code to achieve parallelism for a task that is taking too much time. Here is how it looks:
class A
  attr_reader :mutex, :logger

  def initialize
    @reciever = ZeroMQ::Queue
    @sender = ZeroMQ::Queue
    @mutex = Mutex.new
    @logger = Logger.new('log/test.log')
  end

  def run
    50.times do
      Thread.new do
        run_parallel(@reciever.get_data)
      end
    end
  end

  def run_parallel(data)
    ## Define some local variables.
    a, b = data
    ## Log some data to file.
    logger.info "Got #{a}"
    output = B.get_data(b)
    ## Send output back to zeromq.
    mutex.synchronize { @sender.send_data(output) }
  end
end
I need to make sure this code is thread-safe. Sharing and changing data (instance variables @, class variables @@, or globals $) across threads without a proper mutex could lead to thread safety issues.
I'm not sure whether passing the data to a method results in a thread safety issue as well. In other words, do I have to wrap the code inside run_parallel in a mutex if I'm not using any @, @@, or $ variables inside the method? Or is the given mutex enough?
mutex.synchronize { @sender.send_data(output) }
Whenever you're running in a threaded context, you've got to be aware (for a simple heuristic) of anything that's not a local variable. I see these potential problems in your code:
run_parallel(@reciever.get_data) - Is get_data threadsafe? You've synchronized send_data, and they're both a ZeroMQ::Queue, so I'm guessing not.
output = B.get_data(b) - Is this call threadsafe? If it just pulls something out of b, you're fine, but if it uses state in B or calls anything else that has state, you're in trouble.
logger.info "Got #{a}" - @coreyward points out that Logger is threadsafe, so this is no trouble. Just make sure to stick with it over puts, which will garble your output.
Once you're inside the mutex for @sender.send_data, you're safe, assuming @sender isn't accessed anywhere else in your code by another thread. Of course, the more synchronize calls you throw around, the more your threads will block on each other and lose performance, so there's a balance you need to find in your design.
Do what you can to make your code functional: try to use only local state and write methods that don't have side effects. As your task gets more complicated, there are libraries like concurrent-ruby with threadsafe data structures and other patterns that can help.
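In that same spirit, the standard library's Queue (Thread::Queue) is already thread-safe, so worker results can be handed back through it with no hand-rolled mutex around a shared sender. A sketch with dummy work standing in for the ZeroMQ calls:

```ruby
# Thread::Queue is thread-safe out of the box: many producer threads
# can push into it without explicit synchronization.
results = Queue.new

threads = 5.times.map do |i|
  Thread.new do
    output = i * 2      # stand-in for B.get_data(b): local state only
    results << output   # safe to call concurrently, no mutex needed
  end
end
threads.each(&:join)

collected = []
collected << results.pop until results.empty?
p collected.sort  # => [0, 2, 4, 6, 8]
```

Each thread touches only its own locals plus the one thread-safe sink, which is exactly the "functional, no shared state" shape recommended above.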
I just re-read Practical Object Oriented Programming in Ruby by Sandi Metz, especially the chapter on testing. Also a very useful talk that I recommend Rubyists watch: http://www.youtube.com/watch?v=URSWYvyc42M
She says to test these cases:
Incoming query messages: Test them by asserting what they return.
Incoming command messages: Test the direct public side effects (I have a question about this)
Query messages sent to self: Don't test them
Command messages sent to self: Don't test them
Outgoing query messages: Don't test them
Outgoing command messages: Test that they are sent
For #2, she provided an example similar to this:
# class
class Gear
  attr_reader :cog

  def set_cog(cog)
    @cog = cog
  end
end
# example spec
it "sets the value of @cog" do
  gear = Gear.new
  gear.set_cog(1)
  expect(gear.cog).to eq(1)
end
So this is simple because it just sets the value of the instance variable so the side effects are obvious. But what if my method calls another command message? For example:
class Gear
  attr_reader :cog, :foo, :bar

  def set_cog(cog)
    reset_other_attributes
    @cog = cog
  end

  def reset_other_attributes
    @foo = nil
    @bar = nil
  end
end
How should I test that? I'm thinking it should be treated like an outgoing command message, where you assert that the message is sent, and have a separate test for the reset_other_attributes method.
it "calls the reset_other_attributes method" do
  gear = Gear.new
  gear.should_receive(:reset_other_attributes)
  gear.set_cog(1)
end
Is this correct?
The real reason this method is hard to test is that it violates the Single Responsibility Principle: it does more than set the value of cog.
Anyway, in this case I would test that the expected changes take effect; it doesn't seem reasonable to test that the reset_other_attributes method is called. From this snippet, it looks like reset_other_attributes shouldn't even be part of the public API.
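Testing the visible effects rather than the message send could look like this (plain assertions stand in for RSpec expectations, and the initial foo/bar values are assumed for illustration, since the original class never sets them):

```ruby
class Gear
  attr_reader :cog, :foo, :bar

  def initialize(foo, bar)   # hypothetical initializer for the example
    @foo = foo
    @bar = bar
  end

  def set_cog(cog)
    reset_other_attributes
    @cog = cog
  end

  def reset_other_attributes
    @foo = nil
    @bar = nil
  end
end

gear = Gear.new(1, 2)
gear.set_cog(3)
p [gear.cog, gear.foo, gear.bar]  # => [3, nil, nil]
```

This asserts every public side effect of set_cog directly, so the spec keeps passing even if reset_other_attributes is later inlined or renamed.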
Setter methods generally should have no hidden side effects.
That is, if a given method represents a PROCESS, and it's understood that the process involves A, B, and C, it's OK to have a single combined function that does all three.
But then you MUST test all the results (e.g. that A was set, B was executed, and C was logged).
It may be, however, that the function actually does two unrelated things. That's bad: users may forget about one side effect or the other, and a nasty bug starts its life.
In that case, still write tests as you would for a normal function - then refactor it into two distinct functions.
I recently ran into an issue where calling load twice on a Ruby class caused bugs (here's a real-world example). This is because there were stateful method invocations taking place in the class body, and load was causing these to be executed twice. A simple example is as follows:
base.rb:
class Base
def foo
puts "BASE"
end
end
derived.rb:
require "./base"
class Derived < Base
alias_method :foo_aliased, :foo
def foo
puts "DERIVED!"
end
end
Execution from a REPL:
$ load './derived.rb'
> true
$ Derived.new.foo
> DERIVED!
> nil
$ Derived.new.foo_aliased
> BASE
> nil
$ load './derived.rb'
> true
$ Derived.new.foo
> DERIVED!
> nil
$ Derived.new.foo_aliased
> DERIVED!
> nil
In this example, the second load causes alias_method to clobber the original alias. As a result, we've broken any code that depends on having an intact alias to the original method.
Brute-force class reloading is common enough (e.g., you see it a lot in the each_run clause of RSpec configurations using Spork) that it's not always easy to forbid the use of load outright. As such, it seems like the only way to prevent bugs is to ensure that class definitions are "idempotent": you have to make sure that methods intended to be called from a class definition produce the same result no matter how many times you call them. The fact that re-loads don't break more code seems to suggest an unspoken convention.
Is there some style guide for Ruby class design that mandates this? If so, what are the common techniques for enforcing these semantics? For some methods you could use memoization to prevent re-execution with the same arguments, but other things, like alias_method, seem harder to work around.
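For alias_method specifically, one common workaround is to guard the alias so the class body becomes idempotent. A sketch that simulates a double load by evaluating the class body twice:

```ruby
class Base
  def foo
    "BASE"
  end
end

# The Derived class body runs twice, simulating a double `load`.
# The guard makes it idempotent: the second pass won't re-alias the
# already-overridden foo.
2.times do
  class Derived < Base
    alias_method :foo_aliased, :foo unless method_defined?(:foo_aliased)

    def foo
      "DERIVED!"
    end
  end
end

puts Derived.new.foo          # => DERIVED!
puts Derived.new.foo_aliased  # => BASE
```

Without the `unless method_defined?` guard, the second pass would alias the overridden foo and foo_aliased would print "DERIVED!", reproducing the bug from the REPL session above.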
You are entirely wrong. Assuming all class methods to be idempotent is neither a sufficient nor a necessary condition for code not to break when loaded multiple times.
Therefore, it naturally follows that there is no style guide to enforce that.
What is a good way to provide an easy to use object-style interface to an HTTP API, while still allowing asynchronous HTTP requests to be made? For example, given the following code:
user = User.find(1) # /api/v1/users/1
f = user.foo # /api/v1/users/1/foo
b = user.bar # /api/v1/users/1/bar
Calls to the foo and bar methods could logically be called in parallel, but if possible I'd like to let the calling user have a clean way to declare their intent for parallel calls without getting into the details of the underlying HTTP lib.
I don't think invisibly automating the parallelization is a good idea; the calling code should be explicit about its expectations. But I do think something like the following block syntax could be extremely useful for front-end developers when they know that a set of requests do not depend on each other.
# possible implementation?
user = User.find(1)
User.parallel do
f = user.foo
b = user.bar
end
Is this possible? How can I accomplish this using Ruby 1.9.2?
Transparent parallelization is generally accomplished through something called a future. In Ruby, both the lazy and promise gems have implementations of futures. Here's an example with the promise gem:
require 'future'
user = User.find(1)
f = future{user.foo}
b = future{user.bar}
A future will run the computation in a block in a background thread, and any attempt to operate on f or b will block unless the background thread has already run to completion.
As a library designer, so long as your library is threadsafe, this should be fine in many cases, though it probably doesn't scale all that well in Ruby.
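The core of a future can even be sketched with nothing but the standard library: Thread#value blocks until the thread's block has finished and then returns its result. (The sleeps below are stand-ins for the HTTP calls behind user.foo and user.bar.)

```ruby
# A minimal future: the block runs in a background thread, and #value
# blocks until the result is ready.
def future(&block)
  Thread.new(&block)
end

start = Time.now
f = future { sleep 0.2; :foo_result }  # stand-in for user.foo
b = future { sleep 0.2; :bar_result }  # stand-in for user.bar
results = [f.value, b.value]           # blocks here, not at creation time
elapsed = Time.now - start

p results            # => [:foo_result, :bar_result]
puts elapsed < 0.38  # true: the two 0.2s waits overlapped
```

A real future implementation adds error propagation and transparent proxying (so f behaves like the result object itself), but the blocking-on-first-use behavior is the same.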
The way this works using asynchronous SQL is something like this:
User.where(:id => 1).async_each do |user|
  Foo.where(:id => user.foo_id).async_each do |foo|
    # ... Do stuff with foo ...
  end
  Bar.where(:id => user.bar_id).async_each do |bar|
    # ... Do stuff with bar ...
  end
end
This is from my experience with asynchronous Sequel, which is similar to ActiveRecord.
Keep in mind this is a lot of callbacks stacked together, so things can get sideways if you're not careful.