I'm implementing a kind of write/store buffer in a Redis-backed library to squash multiple hincrby calls into a single call. The buffer needs to be fully atomic and work across multiple threads.
I'm quite new to dealing with thread safety, hence my question: are there any existing libraries or standard ways to implement a global hash-based buffer/queue that works well in threaded environments?
As an example, the buffer hash would work something like this pseudo code:
buffer #=> { :ident1 => { :value_a => 1, :value_b => 4 },
# :ident2 => { :value_a => 2, :value_b => 3 } }
buffer[:ident1][:value_a] #=> 1
# saving merges and increments {:value_a => 2} into buffer[:ident1]
save(:ident1, {:value_a => 2})
buffer[:ident1][:value_a] #=> 3
The idea is that after X number of save calls the buffer is flushed by calling save with each item from the buffer.
In general, the way that you provide access to a global value in a thread-safe manner is to use the built-in Mutex class:
$buffer = {}
$bufflock = Mutex.new
threads = (0..2).map do |i|
  Thread.new do
    puts "Starting Thread #{i}"
    3.times do
      puts "Thread #{i} got: #{$buffer[:foo].inspect}"
      $bufflock.synchronize{ $buffer[:foo] = ($buffer[:foo] || 1) * (i+1) }
      sleep rand
    end
    puts "Ending Thread #{i}"
  end
end
threads.each{ |t| t.join } # Wait for all threads to complete
#=> Starting Thread 0
#=> Thread 0 got: nil
#=> Starting Thread 1
#=> Thread 1 got: 1
#=> Starting Thread 2
#=> Thread 2 got: 2
#=> Thread 1 got: 6
#=> Thread 1 got: 12
#=> Ending Thread 1
#=> Thread 0 got: 24
#=> Thread 2 got: 24
#=> Thread 0 got: 72
#=> Thread 2 got: 72
#=> Ending Thread 0
#=> Ending Thread 2
Code inside a Mutex#synchronize block runs under mutual exclusion: only one thread at a time can hold $bufflock, so no other thread can enter the block until the current holder has finished it.
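Applied to the buffer from the question, the same Mutex pattern might look like the minimal sketch below. IncrementBuffer, flush_every and the flush block are hypothetical names, and the block only stands in for whatever issues the real hincrby calls to Redis.

```ruby
# Hypothetical IncrementBuffer: merges hincrby-style increments in memory and
# hands the merged totals to a flush block after every `flush_every` saves.
class IncrementBuffer
  def initialize(flush_every, &flusher)
    @flush_every = flush_every
    @flusher = flusher            # stand-in for the real Redis hincrby calls
    @lock = Mutex.new
    @saves = 0
    @buffer = new_buffer
  end

  # Merge {field => amount} into the entry for `ident`; safe to call from any thread.
  def save(ident, increments)
    batch = nil
    @lock.synchronize do
      increments.each { |field, amount| @buffer[ident][field] += amount }
      @saves += 1
      if @saves >= @flush_every
        batch = @buffer           # take the full buffer...
        @buffer = new_buffer      # ...and start a fresh one while still locked
        @saves = 0
      end
    end
    # Flush outside the lock so slow I/O does not block other threads' saves.
    batch&.each { |id, fields| @flusher.call(id, fields) }
  end

  # Copy of one entry, taken under the lock.
  def [](ident)
    @lock.synchronize { @buffer.fetch(ident, {}).dup }
  end

  private

  # Outer hash creates inner hashes on demand; inner hashes default to 0.
  def new_buffer
    Hash.new { |hash, ident| hash[ident] = Hash.new(0) }
  end
end

buffer = IncrementBuffer.new(3) { |ident, fields| puts "flush #{ident}: #{fields}" }
buffer.save(:ident1, { value_a: 1, value_b: 4 })
buffer.save(:ident1, { value_a: 2 })
p buffer[:ident1][:value_a] # => 3
```

Because the flush runs outside the lock, two flushes can overlap or finish out of order; for pure increments that is harmless, since addition does not depend on order.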
See also: Pure-Ruby concurrent Hash
According to this post, i += 1 is thread safe in MRI Ruby because preemption only happens at the end of a function call, not somewhere in the middle of i += 1.
The repeatable test below (test one) seems to confirm this.
But why is while true do i += 1 end not thread safe? The second test below shows thread t1 being preempted by thread t2 while t1 is still executing while true do i += 1 end.
Please help.
Below is the code:
test one:
100.times do
  i = 0
  1000.times.map do
    Thread.new {1000.times {i += 1}}
  end.each(&:join)
  puts i
end
test two:
t1 = Thread.new do
  puts "#{Time.new} t1 running"
  i = 0
  while true do i += 1 end
end
sleep 4
t2 = Thread.new do
  puts "#{Time.new} t2 running"
end
t1.join
t2.join
According to this post, i += 1 is thread safe in MRI
Not quite. The blog post states that method invocations are effectively thread-safe in MRI.
The abbreviated assignment i += 1 is syntactic sugar for:
i = i + 1
So we have an assignment i = ... and a method call i + 1. According to the blog post, the latter is thread-safe. But it also says that a thread-switch can occur right before returning the method's result, i.e. before the result is re-assigned to i:
i = i + 1
# ^
# here
Unfortunately, this isn't easy to demonstrate from within Ruby.
We can however hook into Integer#+ and randomly ask the thread scheduler to pass control to another thread:
module Mayhem
  def +(other)
    Thread.pass if rand < 0.5
    super
  end
end
If MRI ensures thread-safety for the whole i += 1 statement, the above shouldn't have any effect. But it does:
Integer.prepend(Mayhem)
10.times do
  i = 0
  Array.new(10) { Thread.new { i += 1 } }.each(&:join)
  puts i
end
Output:
5
7
6
4
4
8
4
5
6
7
If you want thread-safe code, don't rely on implementation details (those can change). In the above example, you could wrap the sensitive part in a Mutex#synchronize call:
Integer.prepend(Mayhem)
m = Mutex.new
10.times do
  i = 0
  Array.new(10) { Thread.new { m.synchronize { i += 1 } } }.each(&:join)
  puts i
end
Output:
10
10
10
10
10
10
10
10
10
10
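The second test in the question points at a different mechanism. Besides switching at method boundaries, MRI's scheduler also preempts a thread that keeps running while others are waiting, checking for pending interrupts at points such as loop iterations, so a tight while loop does not keep other threads out for good. A minimal sketch of that behavior, assuming MRI (the stop and spinner names are only for illustration):

```ruby
stop = false

# Busy loop that never blocks, sleeps, or does I/O.
spinner = Thread.new do
  spins = 0
  spins += 1 until stop
  spins
end

# The main thread still gets scheduled while the spinner is looping,
# so it can flip the flag that ends the loop.
sleep 0.1
stop = true
puts "spinner finished after #{spinner.value} iterations"
```

Thread#value waits for the thread to finish and returns its block's result; if the spinner could never be preempted, the main thread would never get to set the flag.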
I am running multiple threads, and when one of the threads sets the global variable '$trade_executed' to true, I want it to kill all other threads and remove them from the global '$threads' array.
Then I restart the thread creation process.
Below is a simplified version of my codebase.
Three threads are created, and it looks like two of them are deleted, but the third one stays (for reasons unknown).
Ideally this script would never print '2' or '3' because it would always trigger at '1' minute and kill all threads and reset.
thr.exit is preferred. I don't want any code from the other threads to be pushed through by a thr.join after $trade_executed is set.
require 'thread'
class Finnean
  def initialize
    @lock = Mutex.new
  end
  def digger(minute)
    sleep(minute * 60)
    coco(minute)
  end
  def coco(minute)
    @lock.synchronize {
      puts "coco #{minute}"
      $threads.each do |thr|
        next if thr == Thread.current
        thr.exit
      end
      $trade_executed = true
      Thread.current.exit
    }
  end
end
minutes = [1, 2, 3]
$threads = Array.new
$trade_executed = false
abc = Finnean.new
def start_threads(minutes, abc)
  minutes.each do |minute|
    $threads << Thread.new {abc.digger(minute)}
    puts minute
  end
end
start_threads(minutes, abc)
while true
  if $trade_executed != false then
    count = 0
    $threads.map! do |thr|
      count += 1
      puts "#{thr} & #{thr.status}"
      thr.exit
      $threads.delete(thr)
      puts "Iteration #{count}"
    end
    count = 0
    $threads.each do |thr|
      count += 1
      puts "#{thr}" ##{thr.status}
      puts "Threads Still Left: #{count}"
    end
    $trade_executed = false
    abc = Finnean.new
    start_threads(minutes, abc)
  end
end
Why not make a thread killer that you keep locked up until the first one finishes:
# Create two variables that can be passed in to the Thread.new block closure
threads = [ ]
killer = nil
# Create 10 threads, each of which waits a random amount of time before waking up the thread killer
10.times do |n|
  threads << Thread.new do
    sleep(rand(2..25))
    puts "Thread #{n} finished!"
    killer.wakeup
  end
end
# Define a thread killer that will call `kill` on all threads, then `join`
killer = Thread.new(threads) do
  Thread.stop
  threads.each do |thread|
    puts "Killing #{thread}"
    thread.kill
    thread.join
  end
end
# The killer will run last, so wait for that to finish
killer.join
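In Ruby, Thread#exit and Thread#kill are aliases, so thr.exit and thr.kill do the same thing. Killing a thread does not raise an exception that the thread can rescue, but its ensure clauses still run, so that is the place for any cleanup it needs.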
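For the reset step in the question, one plausible reason a thread survives is that the map! loop deletes elements from $threads while iterating over it, which makes the iteration skip elements. A minimal sketch of a reset that iterates a separate copy and only clears the original array afterwards; stop_all is a hypothetical helper name, and sleep stands in for the real work:

```ruby
# Hypothetical helper: stop every thread in `threads` except the caller,
# wait for each one to finish, then empty the array.
def stop_all(threads)
  others = threads.reject { |thr| thr.equal?(Thread.current) } # new array, safe to iterate
  others.each(&:kill)  # Thread#kill and Thread#exit are the same method
  others.each(&:join)  # returns once each killed thread has run its ensure clauses
  threads.clear        # mutate the original array only after iterating the copy
end

cleanup_log = Queue.new
workers = Array.new(3) do |n|
  Thread.new do
    begin
      sleep                              # stands in for real work; sleeps until killed
    ensure
      cleanup_log << "cleaned up #{n}"   # still runs when the thread is killed
    end
  end
end

sleep 0.01 until workers.all? { |t| t.status == "sleep" }
stop_all(workers)
```

Joining a killed thread only waits for its ensure clauses to finish; the code after the interrupted point in the thread's body does not run.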
How can I make these loops run in parallel using Ruby's multithreading capabilities?
1.
from = 'a' * 1
to = 'z' * 3
("#{from}".."#{to}").each do |combination|
# ...
end
2.
@@alphabetSet_tr.length.times do |i|
  @@alphabetSet_tr.length.times do |j|
    @@alphabetSet_tr.length.times do |k|
      combination = @@alphabetSet_tr[i] + @@alphabetSet_tr[j] + @@alphabetSet_tr[k]
    end
  end
end
Note: @@alphabetSet_tr is an array which has 29 items
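If you want to utilize your cores, you can use a Queue to divide the workload between a constant number of threads (on MRI, the global VM lock lets only one thread run Ruby code at a time, so CPU-bound work only runs truly in parallel on implementations such as JRuby or TruffleRuby):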
require 'thread'
from = 'a' * 1
to = 'z' * 3
queue = Queue.new
number_of_cores = 32
# exactly number_of_cores workers (0..n would start one extra thread that never gets a nil)
threads = (0...number_of_cores).map do
  Thread.new do
    combination = queue.pop
    while combination
      # do stuff with combination...
      # take the next combination from the queue
      combination = queue.pop
    end
  end
end
# fill the queue:
("#{from}".."#{to}").each do |combination|
  queue << combination
end
# drain the threads: one nil per worker tells each of them to stop
number_of_cores.times { queue << nil }
threads.each { |t| t.join }
If you fear that the size of the queue itself could become an issue, you can use SizedQueue, which blocks push operations once it holds a certain number of items, capping its memory usage:
queue = SizedQueue.new(10000)
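A complete, bounded version of the queue approach might look like the sketch below. The worker count of 4 and the upcase call are placeholders (assumptions), not recommendations; results go into a second Queue because Queue is thread-safe, so all workers can push to it.

```ruby
from = 'a'
to   = 'zzz'
worker_count = 4                 # assumption: tune to your workload
queue   = SizedQueue.new(10_000) # push blocks while 10_000 items are waiting
results = Queue.new              # thread-safe, shared by all workers

workers = Array.new(worker_count) do
  Thread.new do
    # pop blocks until an item is available; nil is the "no more work" signal
    while (combination = queue.pop)
      results << combination.upcase   # placeholder for the real work
    end
  end
end

(from..to).each { |combination| queue << combination }
worker_count.times { queue << nil }   # exactly one stop signal per worker
workers.each(&:join)

puts results.size # => 18278 ('a'..'z', 'aa'..'zz', 'aaa'..'zzz')
```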
Alternatively, you can simply start one thread per combination:
from = 'a' * 1
to = 'z' * 3
threads = ("#{from}".."#{to}").map do |combination|
  Thread.new do
    # ...
  end
end
# In case you want the main thread to wait for all the child threads:
threads.each(&:join)
I'm surprised that Enumerator#each doesn't start off at the current position in the sequence.
o = Object.new
def o.each
  yield 1
  yield 2
  yield 3
end
e = o.to_enum
puts e.next
puts e.next
e.each{|x| puts x}
# I expect to see 1,2,3 but I see 1,2,1,2,3
# apparently Enumerator's each (inherited from Enumerable) restarts the sequence!
Am I doin' it wrong? Is there a way to maybe construct another Enumerator (from e) that will have the expected each behavior?
You're not doing it wrong, that's just not the semantics defined for Enumerator#each. You could make a derivative enumerator that only iterates from current position to end:
class Enumerator
  def enum_the_rest
    Enumerator.new { |y| loop { y << self.next } }
  end
end
o = Object.new
def o.each
  yield 1
  yield 2
  yield 3
end
e = o.to_enum
=> #<Enumerator: ...>
e.next
=> 1
e2 = e.enum_the_rest
=> #<Enumerator: ...>
e2.each { |x| puts x }
=> 2
=> 3
And, by the way, each doesn't restart the sequence; it just always runs over the entire span. Your enumerator still knows where it is in relation to the next call to next:
e3 = o.to_enum
e3.next
=> 1
e3.next
=> 2
e3.map(&:to_s)
=> ["1", "2", "3"]
e3.next
=> 3
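If reopening Enumerator feels too invasive, the same idea works as a plain helper method. rest_of is a hypothetical name; note that the wrapper shares e's external position, so iterating it consumes e.

```ruby
# Hypothetical helper: wrap an enumerator so that iterating the wrapper
# continues from its current external position.
def rest_of(enum)
  Enumerator.new do |y|
    loop { y << enum.next }   # loop rescues the StopIteration raised at the end
  end
end

o = Object.new
def o.each
  yield 1
  yield 2
  yield 3
end

e = o.to_enum
e.next             # => 1
p rest_of(e).to_a  # => [2, 3]
```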
Enumerator#next and Enumerator#each work on the object differently. Per the documentation for #each (emphasis mine):
Iterates over the block according to how this Enumerable was constructed. If no block is given, returns self.
So #each always behaves based on the original setup, not on the current internal state. If you take a quick peek at the source, you'll see that rb_obj_dup is called to set up a new enumerator.
Let's say I have
some_value = 23
I use the Integer's times method to loop.
Inside the iteration, is there an easy way, without keeping a counter, to see what iteration the loop is currently in?
Yes, just have your block accept an argument:
some_value.times{ |index| puts index }
#=> 0
#=> 1
#=> 2
#=> ...
or
some_value.times do |index|
  puts index
end
#=> 0
#=> 1
#=> 2
#=> ...
3.times do |i|
  puts i*100
end
In this way, you can replace 3 with any integer you like, and manipulate the index i in your looped calculations.
My example will print the following, since the index starts from 0:
# output
0
100
200
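If you would rather have the counter start at 1, Integer#upto is one option. This sketch collects the values instead of printing them:

```ruby
# Integer#upto yields 1, 2, 3 (inclusive), so no "+ 1" is needed in the block.
values = 1.upto(3).map { |n| n * 100 }
p values # => [100, 200, 300]
```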