Ruby Thread local variables - ruby

Ruby has had native support for thread-local variables since version 2.0. However, active_support/core_ext/thread.rb implements this feature in pure Ruby to support thread locals in earlier versions of Ruby. So I wonder: why do we need a mutex in the _locals method?
https://github.com/rails/rails/blob/ec1227a9cc682ebf796689ef0f329038162c421b/activesupport/lib/active_support/core_ext/thread.rb#L76

_locals does two things:
def _locals
  # 1. Returns the locals hash when it is already defined
  if defined?(@_locals)
    @_locals
  # 2. Otherwise lazily instantiates a locals hash
  else
    LOCK.synchronize { @_locals ||= {} }
  end
end
That synchronization step is required to ensure that @_locals is never cleared out during first access.
Consider the following scenario:
thread = Thread.new { sleep } # Thread.new must be given a block
# Say I run this statement...
thread.thread_variable_set('a', 1)
# In parallel with this statement...
thread.thread_variable_get(:a)
Both of those methods call _locals, and if they execute simultaneously, they may both end up at the lazy assignment step:
@_locals ||= {}
# Expands to...
unless @_locals
  @_locals = {} # <-- We could end up here with both threads at the same time,
end             #     which jeopardizes any value that might have been set.
So imagine we had no mutex, and that the setter completed execution while the getter had already entered the lazy assignment step. The getter would then overwrite the hash, and we would effectively lose any locals we had set because of the thread collision. Calling synchronize on a mutex guarantees that the block executes to completion without any such collisions.
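As a rough illustration of the pattern described above, here is a minimal sketch. The class name LocalsHolder is hypothetical and this is not the ActiveSupport source; it only mirrors the lazy, mutex-guarded initialization of the hash and then writes to it from several threads.

```ruby
# Hypothetical stand-in for an object that lazily creates its locals hash.
# This is NOT the Rails implementation, only the same locking pattern.
class LocalsHolder
  LOCK = Mutex.new

  def thread_variable_set(key, value)
    _locals[key.to_sym] = value
  end

  def thread_variable_get(key)
    _locals[key.to_sym]
  end

  private

  def _locals
    if defined?(@_locals)
      @_locals
    else
      # Only one thread at a time may run the lazy assignment, so a hash
      # that already holds values can never be replaced by a fresh one.
      LOCK.synchronize { @_locals ||= {} }
    end
  end
end

holder = LocalsHolder.new
writers = 10.times.map do |i|
  Thread.new { holder.thread_variable_set("k#{i}", i) }
end
writers.each(&:join)

stored = 10.times.map { |i| holder.thread_variable_get("k#{i}") }
```

After all writers have joined, every value should still be readable, because no thread could swap in an empty hash once another thread had created it.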
Note that the core extension will not be loaded for versions of Ruby which support accessing thread-local variables. See the very last line:
unless Thread.instance_methods.include?(:thread_variable_set)

Related

Test cached (multiprocessed) and memoized method

Short question, I'd like to test this method foo:
class MyClass
  def foo
    # The `@cache_key` is defined per instance but can be the same for
    # different instances (obviously, I guess)
    @foo ||= cache.fetch(@cache_key) do
      # call a private method, tested separately
      generate_foo_value || return
      # The `|| return` trick is one I use to skip the yield result
      # in case of a falsey value. May not be related to the question,
      # but I left it there because I'm not so sure.
    end
  end
end
The issue here is that I have two kinds of caching.
On one hand, I have @foo ||=, which ensures memoization and which I can test pretty easily, for instance by calling my method twice and checking whether the result differs.
On the other hand, I have the cache.fetch method, whose cache may be shared between different instances. This is where my trouble lies: I don't know the best way to unit test it. Should I mock my cache? Should I inspect the cached result? Or should I run multiple instances with the same cache key?
And for this last option, I don't know how to run multiple instances easily using rspec.
I don't think you need multiple instances in this case. It looks like you could just set the cache value in the specs and test that the MyClass instance returns it.
before { cache.set(:cache_key, 'this is from cache') }
subject { MyClass.new(:cache_key, cache) } # you didn't provide the
# initializer, so I'll just assume
# that you can set the cache key
# and inject the cache object
specify do
expect(subject.foo).to eq 'this is from cache'
end
You can also set the expectation that generate_foo_value is not run at all in this case.
My reasoning is that you don't need to fully emulate setting the cache in a separate process. You only test that, if the cache entry for the key is set, your method returns it instead of doing the expensive computation.
The fact that this cache is shared between processes (like it sits in a PostgreSQL DB, or Redis or whatever) is irrelevant for the test.
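To make the idea concrete without depending on RSpec or a real cache store, here is a minimal plain-Ruby sketch. FakeCache, its write method, the initializer signature and the generate_calls counter are all assumptions made for illustration; the original question does not show them, and the `|| return` trick is left out for brevity.

```ruby
# Hypothetical in-memory cache; a real store would have a different API.
class FakeCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  # Returns the stored value, or stores and returns the block's result.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

class MyClass
  attr_reader :generate_calls

  # Assumed initializer: the cache key and the cache object are injected.
  def initialize(cache_key, cache)
    @cache_key = cache_key
    @cache = cache
    @generate_calls = 0
  end

  def foo
    @foo ||= @cache.fetch(@cache_key) { generate_foo_value }
  end

  private

  def generate_foo_value
    @generate_calls += 1
    "freshly generated"
  end
end

# Case 1: the cache is pre-seeded, so the expensive method must not run.
seeded_cache = FakeCache.new
seeded_cache.write(:cache_key, "this is from cache")
cached = MyClass.new(:cache_key, seeded_cache)
cached_result = cached.foo

# Case 2: the cache is empty, so the value is generated once and memoized.
fresh = MyClass.new(:other_key, FakeCache.new)
fresh_results = [fresh.foo, fresh.foo]
```

The counter plays the role of the "generate_foo_value is not run" expectation: it stays at zero for the pre-seeded instance and reaches exactly one for the fresh one.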

In Ruby TracePoint, can you force stop the execution of the call being traced?

The goal of the method I am writing is to run every method called within a block argument in parallel.
I am exploring TracePoint, which I use to detect when each method inside the block is called. At that point I fork a process and call the method there. However, I can't find a way to then stop the default method execution.
I tried re-defining the method being called using define_singleton_method within the TracePoint block, which correctly alters the method for the NEXT execution of it, but the current execution still happens unaltered. Is there a way within TracePoint to stop execution of the method currently being traced? Alternatively, is there a way to accomplish my goal without using TracePoint?
Some simplified code to illustrate:
def my_method(&block)
  trace = TracePoint.new(:call) do |tp|
    tp.disable
    method = tp.self.method(tp.method_id)
    @args = []
    method.parameters.each { |param|
      @args.push(tp.binding.eval("#{param.last}"))
    }
    fork do
      # This executes the method in its own process with the same
      # arguments given in the traced call
      result = method.call(*@args)
    end
    tp.enable
  end
  trace.enable
  block.call
  trace.disable
end
After my code gets executed inside the TracePoint block and before block.call returns, the method within the block gets executed again in the main process. How can I prevent that from happening?
Maybe you could raise an exception inside of your TracePoint callback, and then catch and ignore it outside.

Ensure thread safety in the code

I'm trying to write multi-threaded code to parallelize a task that is taking too much time. Here is how it looks:
class A
  attr_reader :mutex, :logger
  def initialize
    @reciever = ZeroMQ::Queue
    @sender = ZeroMQ::Queue
    @mutex = Mutex.new
    @logger = Logger.new('log/test.log')
  end
  def run
    50.times do
      Thread.new do
        run_parallel(@reciever.get_data)
      end
    end
  end
  def run_parallel(data)
    ## Define some local variables.
    a, b = data
    ## Log some data to file.
    logger.info "Got #{a}"
    output = B.get_data(b)
    ## Send output back to ZeroMQ.
    mutex.synchronize { @sender.send_data(output) }
  end
end
One needs to make sure that the code is thread-safe. Sharing and changing data (such as @, @@ or $ variables) across threads without a proper mutex can lead to thread-safety issues.
I'm not sure whether passing the data to a method causes a thread-safety issue as well. In other words, does the code inside run_parallel have to be wrapped in a mutex if I'm not using any @, @@ or $ variables inside the method? Or is the given mutex usage enough?
mutex.synchronize { @sender.send_data(output) }
Whenever you're running in a threaded context, a simple heuristic is to be wary of anything that's not a local variable. I see these potential problems in your code:
run_parallel(@reciever.get_data) Is get_data threadsafe? You've synchronized send_data, and they're both a ZeroMQ::Queue, so I'm guessing not.
output = B.get_data(b) Is this call threadsafe? If it just pulls something out of b, you're fine, but if it uses state in B or calls anything else that has state, you're in trouble.
logger.info "Got #{a}" @coreyward points out that Logger is threadsafe, so this is no trouble. Just make sure to stick with it over puts, which will garble your output.
Once you're inside the mutex for @sender.send_data, you're safe, assuming @sender isn't accessed anywhere else in your code by another thread. Of course, the more synchronize calls you throw around, the more your threads will block on each other and lose performance, so there's a balance you need to find in your design.
Do what you can to make your code functional: try to use only local state and write methods that don't have side effects. As your task gets more complicated, there are libraries like concurrent-ruby with threadsafe data structures and other patterns that can help.
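As a minimal sketch of that advice, the following uses Ruby's built-in Queue (which is thread-safe) as a stand-in for the receiving side, keeps all per-item state in local variables, and guards the one shared object with a mutex. The inbox, results and the arithmetic are assumptions for illustration; nothing here uses ZeroMQ.

```ruby
inbox = Queue.new          # thread-safe stand-in for the receiver
results = []               # shared state, so it needs a lock
results_lock = Mutex.new

20.times { |i| inbox << [i, i * 2] }
inbox.close                # pop returns nil once the queue is drained

workers = 4.times.map do
  Thread.new do
    while (data = inbox.pop)
      a, b = data          # locals are private to each thread
      output = b + 1       # stand-in for side-effect-free work
      # Only the write to shared state is synchronized.
      results_lock.synchronize { results << output }
    end
  end
end
workers.each(&:join)
```

Keeping the synchronized section this small limits how much the workers block on each other.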

Ruby: Minitest, test-unit and instance variables

(Posted already at https://www.ruby-forum.com/topic/6876320, but crossposted here, because I did not receive a response so far).
A question about parallelizing tests in Minitest and/or Test::Unit (i.e. proper use of parallelize_me!):
Assume that I have some helper methods, which are needed by several tests. From my understanding, I could NOT do something like this in such a method (simplified example):
def prep(m, n)
  @pid = m
  @state = n
end
def process
  if @state > 5 && @pid != 0
    ...
  else
    ...
  end
end
I think I can't do this in Minitest and test-unit, because if I call prep and process from several of my test functions, the tests cannot be parallelized anymore: those test functions all set and read the same instance variables. Right?
Now, my question is whether the following approach would be safe for parallelization: I turn each of these mutable instance variables into a hash, which I initialize in setup like this:
def setup
  @pid ||= {}
  @state ||= {}
end
My "helper methods" receive a key (for example, the name of the test method) and use it to access their "own" hash element:
def prep(key, m, n)
  @pid[key] = m
  @state[key] = n
end
def process(key)
  if @state[key] > 5 && @pid[key] != 0
    ...
  else
    ...
  end
end
It's a bit ugly, but: Is this a reliable approach? Is this way of accessing a hash thread-safe? How can I do it better?
At least in Minitest you can safely do, for example,
setup do
  @form = Form.new
end
without @form getting mixed up between parallel tests, so this approach should be safe too:
def setup
  @stat = m
  @pid = n
end
which means that your original approach should be safe as well.
================
UPDATE
Consider the following gist, which defines 100 different tests that access @random, set in setup: https://gist.github.com/bbozo/2a64e1f53d29747ca559
You will notice that the state set in setup isn't shared among tests. setup is run before every test, so every test is effectively encapsulated and thread safety isn't an issue.
Your approach with the hashes makes sense, and it will work to distinguish between the threads. The problem lies with the Global Interpreter Lock (called the Global VM Lock, or GVL, in MRI).
Unless your helper methods are IO-bound (make HTTP requests, socket requests, handle local files), you won't see a speed improvement because Ruby will pretty much (to simplify things) run your code sequentially over multiple threads, without a guaranteed run order.
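A small timing sketch can illustrate the IO-bound case. Here sleep stands in for waiting on IO (an assumption made to keep the example self-contained), and the elapsed helper is a hypothetical name.

```ruby
# Hypothetical helper: returns the wall-clock seconds the block took.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Four waits one after another.
sequential = elapsed { 4.times { sleep 0.1 } }

# The same four waits in threads; a sleeping thread releases the lock,
# so the waits overlap.
threaded = elapsed do
  4.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end
```

For waiting-style work the threaded run should take noticeably less time than the sequential one. For pure computation under MRI, the two would typically be about the same.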
Good luck!

May a Recursive Function Release its own Mutex?

I have some code, a class method on a Ruby class, FootballSeries.find(123), which performs an API call. Owing to concerns about thread safety, only one thread may enter this method at a time. Due to some recent changes in the API, I also have to support FootballSeries.find('Premiership'). This second variety (see implementation below) simply makes an interim call to see whether an ID can be found, and then recursively calls itself using that ID.
class FootballSeries
  @find_mutex = Mutex.new
  class << self
    def find(series_name_or_id)
      @find_mutex.synchronize do
        if series_name_or_id.is_a?(String)
          if doc = search_xml_document(series_name_or_id)
            if doc.xpath('//SeriesName').try(:first).try(:content) == series_name_or_id
              @find_mutex.unlock
              series = find(doc.xpath('//seriesid').first.content.to_i)
              @find_mutex.lock
              return series
            end
          end
        elsif series_name_or_id.is_a?(Integer)
          if doc = xml_document(series_name_or_id)
            Series.new(doc)
          end
        end
      end
    end
  end
end
Without lines 9 and 11, there's a "recursive mutex lock: deadlock" error (which makes enough sense). Therefore my question is: may I release and re-lock the mutex? (I re-lock so that when synchronize exits, I won't get an error for unlocking a mutex that I don't own, but I haven't tested whether this is required.)
Is this a sane implementation, or would I be better served having find() call two individual methods, each protected with their own mutex? (example find_by_id, and find_by_name)
What I have now works (or at least appears to work).
Finally, bonus points for - how would I test such a method for safety?
This doesn't look good to me, as @find_mutex.unlock will allow other threads to enter the method at the same time. Also, I don't think recursion is usual for this kind of method dispatch; you actually have two methods stuffed into one. I would certainly separate the two, and if you want to be able to call one method with different argument types, just check the argument's type and invoke one or the other. If you don't need to expose find_by_id and find_by_name, you can make them private and put mutex.synchronize only in find.
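A minimal sketch of that restructuring might look as follows. The SERIES hash is a hypothetical in-memory stand-in for the remote API, and the returned hashes stand in for Series objects; only the shape of the dispatch and the single lock are the point.

```ruby
class FootballSeries
  FIND_MUTEX = Mutex.new
  # Hypothetical in-memory data standing in for the remote API.
  SERIES = { 123 => "Premiership" }.freeze

  class << self
    # The only public entry point, and the only place that takes the lock.
    def find(series_name_or_id)
      FIND_MUTEX.synchronize do
        case series_name_or_id
        when String  then find_by_name(series_name_or_id)
        when Integer then find_by_id(series_name_or_id)
        end
      end
    end

    private

    # Resolves the name to an ID, then reuses find_by_id. No recursion
    # through find, so the mutex is never locked twice.
    def find_by_name(name)
      id = SERIES.key(name)
      id && find_by_id(id)
    end

    def find_by_id(id)
      name = SERIES[id]
      name && { id: id, name: name }
    end
  end
end

by_name = FootballSeries.find("Premiership")
by_id = FootballSeries.find(123)
missing = FootballSeries.find("No such series")
```

Because the private helpers never call find, the lock is acquired exactly once per lookup and never has to be released in the middle. If re-entrant locking is really needed, Ruby's standard library also offers Monitor, which the same thread may enter more than once.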
