According to this post, i += 1 is thread safe in MRI Ruby because preemption only happens at the end of a function call, never somewhere in the middle of i += 1.
The repeatable test below (test one) seems to confirm this:
But why is while true do i += 1 end not thread safe, as the second test below shows? There, thread1 is preempted by thread2 while thread1 is still executing while true do i += 1 end.
Please help.
Below is the code for reference:
test one:
100.times do
  i = 0
  1000.times.map do
    Thread.new {1000.times {i += 1}}
  end.each(&:join)
  puts i
end
test two:
t1 = Thread.new do
  puts "#{Time.new} t1 running"
  i = 0
  while true do i += 1 end
end
sleep 4
t2 = Thread.new do
  puts "#{Time.new} t2 running"
end
t1.join
t2.join
According to this post, i += 1 is thread safe in MRI
Not quite. The blog post states that method invocations are effectively thread-safe in MRI.
The abbreviated assignment i += 1 is syntactic sugar for:
i = i + 1
So we have an assignment i = ... and a method call i + 1. According to the blog post, the latter is thread-safe. But it also says that a thread-switch can occur right before returning the method's result, i.e. before the result is re-assigned to i:
i = i + 1
# ^
# here
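One way to see the separate steps is to disassemble the statement. This is a sketch that relies on MRI internals (RubyVM exists only in CRuby, and instruction names can differ between versions): the read of i, the call to +, and the write back to i appear as distinct VM instructions rather than one indivisible operation.

```ruby
# MRI-specific: RubyVM::InstructionSequence exists only in CRuby.
# Compile a tiny snippet and print the VM instructions it turns into.
iseq = RubyVM::InstructionSequence.compile("i = 0\ni += 1")
listing = iseq.disasm
puts listing
# Look for a getlocal (read i), an opt_plus (the + call) and a later
# setlocal (write the result back to i).
```
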
Unfortunately, this isn't easy to demonstrate from within Ruby.
We can however hook into Integer#+ and randomly ask the thread scheduler to pass control to another thread:
module Mayhem
  def +(other)
    Thread.pass if rand < 0.5
    super
  end
end
If MRI ensures thread-safety for the whole i += 1 statement, the above shouldn't have any effect. But it does:
Integer.prepend(Mayhem)
10.times do
  i = 0
  Array.new(10) { Thread.new { i += 1 } }.each(&:join)
  puts i
end
Output:
5
7
6
4
4
8
4
5
6
7
If you want thread-safe code, don't rely on implementation details (those can change). In the above example, you could wrap the sensitive part in a Mutex#synchronize call:
Integer.prepend(Mayhem)
m = Mutex.new
10.times do
  i = 0
  Array.new(10) { Thread.new { m.synchronize { i += 1 } } }.each(&:join)
  puts i
end
Output:
10
10
10
10
10
10
10
10
10
10
I am running multiple threads, and when one of the threads sets the global variable '$trade_executed' to true, I want it to kill all other threads and remove them from the global '$threads' array.
Then I restart the thread creation process.
Below is a simplified version of my codebase.
Three threads are created, and it looks like two of them are deleted, but the third one stays (for reasons unknown).
Ideally, this script would never print '2' or '3', because it would always trigger at minute 1, kill all threads, and reset.
thr.exit is preferred: I don't want any code from the other threads to run (as it would with a thr.join) after $trade_executed is set.
require 'thread'
class Finnean
  def initialize
    @lock = Mutex.new
  end
  def digger(minute)
    sleep(minute * 60)
    coco(minute)
  end
  def coco(minute)
    @lock.synchronize {
      puts "coco #{minute}"
      $threads.each do |thr|
        next if thr == Thread.current
        thr.exit
      end
      $trade_executed = true
      Thread.current.exit
    }
  end
end
minutes = [1, 2, 3]
$threads = Array.new
$trade_executed = false
abc = Finnean.new
def start_threads(minutes, abc)
  minutes.each do |minute|
    $threads << Thread.new {abc.digger(minute)}
    puts minute
  end
end
start_threads(minutes, abc)
while true
  if $trade_executed != false then
    count = 0
    $threads.map! do |thr|
      count += 1
      puts "#{thr} & #{thr.status}"
      thr.exit
      $threads.delete(thr)
      puts "Iteration #{count}"
    end
    count = 0
    $threads.each do |thr|
      count += 1
      puts "#{thr}" ##{thr.status}
      puts "Threads Still Left: #{count}"
    end
    $trade_executed = false
    abc = Finnean.new
    start_threads(minutes, abc)
  end
end
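A likely reason one thread survives (our reading of the posted loop, not something the question confirms): $threads.map! calls $threads.delete(thr) while it is still walking $threads, so each deletion shifts the next thread into a slot the loop has already passed, and that thread is never visited. The sketch below reproduces the effect with a plain array and then shows a safer pattern; the names items and workers are illustrative only.

```ruby
# Deleting from an array while iterating over it skips elements:
# once :a is removed, :b slides into index 0 and the loop moves on to index 1.
items = [:a, :b, :c]
visited = []
items.each do |x|
  visited << x
  items.delete(x)
end
p visited # => [:a, :c]
p items   # => [:b]

# Safer pattern: stop every thread first, then empty the array in one step.
workers = Array.new(3) { Thread.new { sleep } }
stopped = workers.dup   # kept only so we can inspect the threads afterwards
workers.each(&:exit)    # Thread#exit is the same method as Thread#kill
workers.each(&:join)    # wait until each thread has really terminated
workers.clear
p stopped.map(&:alive?) # => [false, false, false]
```
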
Why not create a thread killer that stays stopped until the first worker thread finishes? For example:
# Create two variables that can be passed in to the Thread.new block closure
threads = [ ]
killer = nil
# Create 10 threads, each of which waits a random amount of time before waking up the thread killer
10.times do |n|
  threads << Thread.new do
    sleep(rand(2..25))
    puts "Thread #{n} finished!"
    killer.wakeup
  end
end
# Define a thread killer that will call `kill` on all threads, then `join`
killer = Thread.new(threads) do
  Thread.stop
  threads.each do |thread|
    puts "Killing #{thread}"
    thread.kill
    thread.join
  end
end
# The killer will run last, so wait for that to finish
killer.join
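Note that Thread#kill, Thread#exit and Thread#terminate are the same method, so killing a thread from outside is how you force it to exit. The killed thread still runs its ensure clauses, which is the place for cleanup; if you want the thread to receive an exception it can rescue and deal with as necessary, send one with Thread#raise instead.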
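As a sketch of cleaning up when a thread is stopped from outside: code in an ensure clause runs when the thread is killed, and Thread#raise delivers a specific exception that the thread can rescue. The thread names and messages below are illustrative.

```ruby
# Sketch: two ways a worker thread can react to being stopped from outside.
events = []
events_lock = Mutex.new

killed = Thread.new do
  begin
    sleep # park until another thread stops us
  ensure
    # Runs even though the thread is being killed.
    events_lock.synchronize { events << "ensure ran" }
  end
end

raised = Thread.new do
  begin
    sleep
  rescue RuntimeError => e
    # Thread#raise delivers an ordinary exception that can be rescued.
    events_lock.synchronize { events << "rescued: #{e.message}" }
  end
end

# Wait until both workers are parked in sleep before interrupting them.
Thread.pass until killed.status == "sleep" && raised.status == "sleep"

killed.kill          # same method as killed.exit
raised.raise("stop") # raises RuntimeError("stop") inside that thread
[killed, raised].each(&:join)

p events.sort
```
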
I have a number of ranges that I want to merge together if they overlap. Currently I do this using Sets.
This works. However, when I try the same code with larger ranges, as follows, I get a `stack level too deep (SystemStackError)`:
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example.
EDIT
I have put the full code I’m using here:
Basically, it is used to add HTML tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
  merged_hsps = merge_ranges(hsps)
  sp = format_signalp(merged_hsps, signalp)
  hsp_class = (merged_hsps - sp[1]) - sp[0]
  rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
  set = Set.new
  ranges.each { |r| set << r.to_set }
  set.flatten
end
def format_signalp(merged_hsps, sp)
  sp_class = sp - merged_hsps
  sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
  [sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
  results = []
  results += sets_to_hash(sp[0], 'sp')
  results += sets_to_hash(sp[1], 'sphsp')
  results += sets_to_hash(hsp_class, 'hsp')
  results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
  return nil if set.nil?
  hashes = []
  merged_set = set.divide { |i, j| (i - j).abs == 1 }
  merged_set.each do |s|
    hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
    hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
  end
  hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
                   Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
          Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
  ranges = ranges.sort_by(&:begin)
  t = ranges.each_cons(2).to_a
  t.each do |r1, r2|
    if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
       (r1.cover? r2.begin) || (r1.cover? r2.end)
      ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
      ranges.delete(r1)
      ranges.delete(r2)
      t.delete [r1,r2]
    end
  end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
  range, *rest = ranges
  return if range.nil?
  # Find the index of the first range in `rest` that overlaps this one
  other_idx = rest.find_index do |other|
    range.cover?(other.begin) || other.cover?(range.begin)
  end
  if other_idx
    # An overlapping range was found; remove it from `rest` and merge
    # it with this one
    other = rest.slice!(other_idx)
    merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
    # Try again with the merged range and the remaining `rest`
    merge_ranges(merged, *rest)
  else
    # No overlapping range was found; move on
    [ range, *merge_ranges(*rest) ]
  end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
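Because merge_ranges recurses roughly once per input range, a very long list of ranges could itself run into stack limits. A common alternative is an iterative sort-and-sweep, which is a different technique from the recursive one above; the name merge_ranges_iteratively is just for illustration, and it keeps the same assumption of ascending, inclusive ranges.

```ruby
# Iterative sort-and-sweep merge (illustrative alternative, not the answer's code).
# Assumes ascending, inclusive ranges, like merge_ranges above.
def merge_ranges_iteratively(ranges)
  ranges.sort_by(&:begin).each_with_object([]) do |range, merged|
    last = merged.last
    if last && range.begin <= last.end
      # Overlaps (or shares an endpoint with) the previous merged range: extend it.
      merged[-1] = last.begin..[last.end, range.end].max
    else
      # Starts after the previous merged range ends: begin a new one.
      merged << range
    end
  end
end

p merge_ranges_iteratively([73..856, 82..1145, 116..2914, 3203..3241])
p merge_ranges_iteratively([0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200])
```

Sorting once costs O(n log n), and the single pass uses no recursion, so the input size is limited only by memory.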
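I believe your resulting set has too many items (2881) for divide to handle. With a two-argument block, Set#divide calls the block for every pair of elements (2881 × 2881 ≈ 8.3 million calls) and then groups the elements by searching the resulting graph with TSort, which can recurse very deeply along a long chain of consecutive integers. That deep recursion is the likely source of the stack level too deep error, and even without it the pairwise comparisons make this approach slow.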
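If the goal is only to group the flattened integers into runs of consecutive values (which is what the divide call in the question computes), sorting once and splitting with Enumerable#slice_when avoids divide entirely. A minimal sketch, assuming Ruby 2.2 or later for slice_when:

```ruby
require 'set'

ranges = [73..856, 82..1145, 116..2914, 3203..3241]

# Same flattened set of integers the question builds.
set = Set.new
ranges.each { |r| set.merge(r) }

# Sort once, then start a new group wherever two neighbours are not consecutive.
# No pairwise comparison and no recursion, unlike Set#divide with a 2-arg block.
runs = set.sort.slice_when { |a, b| b != a + 1 }.to_a

p runs.map(&:size)                       # => [2842, 39]
p runs.map { |run| run.first..run.last } # => [73..2914, 3203..3241]
```
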
Without using sets, you can use this code to merge the ranges:
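# Merges two ranges if they overlap. Written as a module function instead of
# being mixed into Range with #extend, because Range objects are frozen in
# Ruby 3.0 and later and cannot be extended.
module RangeMerger
  def self.merge(range_a, range_b)
    if range_a.cover?(range_b.first) && range_a.cover?(range_b.last)
      range_a                        # range_b lies inside range_a
    elsif range_b.cover?(range_a.first) && range_b.cover?(range_a.last)
      range_b                        # range_a lies inside range_b
    elsif range_a.cover?(range_b.first)
      range_a.first..range_b.last    # range_b sticks out on the right
    elsif range_a.cover?(range_b.last)
      range_b.first..range_a.last    # range_b sticks out on the left
    else
      nil                            # Unmergeable
    end
  end
end
module ArrayRangePusher
  def <<(item)
    if item.kind_of?(Range)
      each_with_index do |own_item, idx|
        new_range = RangeMerger.merge(own_item, item)
        if new_range
          self[idx] = new_range
          return self
        end
      end
    end
    super
  end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
  new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect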
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: After your edits, I don't think this answers your current question; it addresses your original problem, which was about merging ranges.
I wrote a test program. In short, it does the following:
executes some random, useless (but long enough) calculations in a single thread and prints the execution time
executes the same program in 6 threads and prints the execution time
Yet the execution time is the same in both cases. What am I doing wrong?
Here are the sources:
def time
  start = Time.now
  yield
  p Time.now - start
end
range_limit = 9999
i_exponent = 9999
time do
  array = (1 .. range_limit).map { |i| i**i_exponent }
  p (array.inject(:+)/2)[0]
end
time do
  # Six non-overlapping slices of 1..range_limit, one per thread
  first_thread = Thread.new do
    arr = (1..range_limit/6).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  second_thread = Thread.new do
    arr = (range_limit/6 + 1..range_limit*2/6).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  third_thread = Thread.new do
    arr = (range_limit*2/6 + 1..range_limit*3/6).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  fourth_thread = Thread.new do
    arr = (range_limit*3/6 + 1..range_limit*4/6).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  fifth_thread = Thread.new do
    arr = (range_limit*4/6 + 1..range_limit*5/6).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  sixth_thread = Thread.new do
    arr = (range_limit*5/6 + 1..range_limit).map { |i| i**i_exponent }
    arr.inject(:+)
  end
  first_thread.join
  second_thread.join
  third_thread.join
  fourth_thread.join
  fifth_thread.join
  sixth_thread.join
  # Combine all six partial sums
  result = first_thread.value + second_thread.value + third_thread.value +
           fourth_thread.value + fifth_thread.value + sixth_thread.value
  p (result/2)[0]
end
The reference interpreters for Ruby (MRI/CRuby) and Python (CPython) do not run threads truly in parallel: to protect the interpreter's state, they have a global VM lock that allows only one thread at a time to execute interpreter code.
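MRI releases the lock around blocking operations such as sleep and most I/O, so threads do overlap while they wait even though they never run Ruby code in parallel. A small timing sketch (the exact numbers will vary by machine):

```ruby
# Measure wall-clock time with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Four 0.2 s sleeps one after another ...
sequential = elapsed { 4.times { sleep 0.2 } }
# ... versus four threads sleeping at the same time. sleep releases the
# VM lock, so the threads' waits overlap.
threaded = elapsed { Array.new(4) { Thread.new { sleep 0.2 } }.each(&:join) }

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

A CPU-bound block such as the i**i_exponent loop in the question gets no such overlap, because it holds the lock while it computes.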
To get the benefit of threads, you need to either execute non-Ruby code that runs without the global lock or use multiple processes instead of threads. Another option is to switch to a Ruby implementation that doesn't have a global lock, such as JRuby or Rubinius.
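For CPU-bound work like the benchmark in the question, a sketch of the multiple-processes option could look like the following. It assumes a Unix-like system running MRI, since fork is not available on Windows or JRuby. The helper name parallel_sum and the i**3 workload are illustrative stand-ins, and each child sends its partial result back through a pipe using Marshal.

```ruby
# Sketch (Unix-like systems only): run CPU-bound work in child processes,
# each of which has its own interpreter and therefore its own VM lock.
def parallel_sum(range, workers)
  slice_size = (range.size + workers - 1) / workers
  readers = range.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    fork do
      reader.close
      partial = slice.sum { |i| i**3 } # stand-in for the expensive computation
      writer.write(Marshal.dump(partial))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads
    reader
  end

  # Reading to EOF waits for each child to finish writing.
  partials = readers.map do |reader|
    value = Marshal.load(reader.read)
    reader.close
    value
  end
  Process.waitall # reap the child processes
  partials.sum
end

puts parallel_sum(1..1000, 4)
```

The trade-off is process startup and copying results through the pipe, which only pays off when each slice does substantial work.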
I'm implementing a kind of write/store buffer in a Redis-backed library to squash multiple hincrby calls into a single call. The buffer needs to be fully atomic and work across multiple threads.
I'm quite new to thread safety, so: are there any existing libraries or standard ways to implement a global Hash-based buffer/queue that works well in threaded environments?
As an example, the buffer hash would work something like this pseudo code:
buffer #=> { :ident1 => { :value_a => 1, :value_b => 4 },
# :ident2 => { :value_a => 2, :value_b => 3 } }
buffer[:ident1][:value_a] #=> 1
# saving merges and increments {:value_a => 2} into buffer[:ident1]
save(:ident1, {:value_a => 2})
buffer[:ident1][:value_a] #=> 3
The idea is that after X number of save calls the buffer is flushed by calling save with each item from the buffer.
In general, the way that you provide access to a global value in a thread-safe manner is to use the built-in Mutex class:
$buffer = {}
$bufflock = Mutex.new
threads = (0..2).map do |i|
  Thread.new do
    puts "Starting Thread #{i}"
    3.times do
      puts "Thread #{i} got: #{$buffer[:foo].inspect}"
      $bufflock.synchronize{ $buffer[:foo] = ($buffer[:foo] || 1) * (i+1) }
      sleep rand
    end
    puts "Ending Thread #{i}"
  end
end
threads.each{ |t| t.join } # Wait for all threads to complete
#=> Starting Thread 0
#=> Thread 0 got: nil
#=> Starting Thread 1
#=> Thread 1 got: 1
#=> Starting Thread 2
#=> Thread 2 got: 2
#=> Thread 1 got: 6
#=> Thread 1 got: 12
#=> Ending Thread 1
#=> Thread 0 got: 24
#=> Thread 2 got: 24
#=> Thread 0 got: 72
#=> Thread 2 got: 72
#=> Ending Thread 0
#=> Ending Thread 2
Code inside a Mutex#synchronize block runs under mutual exclusion: while one thread is inside a $bufflock.synchronize block, any other thread calling $bufflock.synchronize waits until the first thread has left the block.
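Applied to the question's buffer, a minimal sketch might look like the following. The class name IncrementBuffer and the flush callback are hypothetical; the callback stands in for the actual Redis hincrby calls, which are not shown. A single Mutex guards both the increments and the swap to a fresh buffer, so no save can be lost between taking a snapshot and resetting.

```ruby
# Hypothetical IncrementBuffer: merges increments per ident under one Mutex
# and hands a snapshot to a flush callback every `flush_every` saves.
class IncrementBuffer
  def initialize(flush_every, &on_flush)
    @flush_every = flush_every
    @on_flush = on_flush
    @lock = Mutex.new
    @saves = 0
    @data = new_store
  end

  # save(:ident1, value_a: 2) adds 2 to the buffered value_a of :ident1.
  def save(ident, increments)
    snapshot = nil
    @lock.synchronize do
      increments.each { |field, by| @data[ident][field] += by }
      @saves += 1
      if @saves >= @flush_every
        snapshot = @data  # take the whole buffer ...
        @data = new_store # ... and start a fresh one while still locked
        @saves = 0
      end
    end
    # Flush outside the lock so a slow write does not block other threads.
    @on_flush.call(snapshot) if snapshot
  end

  # Returns a copy of the buffered counters for one ident.
  def [](ident)
    @lock.synchronize { @data.key?(ident) ? @data[ident].dup : {} }
  end

  private

  def new_store
    Hash.new { |hash, ident| hash[ident] = Hash.new(0) }
  end
end

buffer = IncrementBuffer.new(3) { |snapshot| puts "flush: #{snapshot}" }
buffer.save(:ident1, value_a: 1, value_b: 4)
buffer.save(:ident1, value_a: 2)
puts buffer[:ident1][:value_a] # prints 3
```

Because the flush runs outside the lock, two flushes from different threads may reach the callback in either order; if the backend needs strict ordering, the callback would need its own synchronization.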
See also: Pure-Ruby concurrent Hash