Loops in multiple threads - ruby

I have the following code (from a Ruby tutorial):
require 'thread'
count1 = count2 = 0
difference = 0
counter = Thread.new do
  loop do
    count1 += 1
    count2 += 1
  end
end
spy = Thread.new do
  loop do
    difference += (count1 - count2).abs
  end
end
sleep 1
puts "count1 : #{count1}"
puts "count2 : #{count2}"
puts "difference : #{difference}"
counter.join(2)
spy.join(2)
puts "count1 : #{count1}"
puts "count2 : #{count2}"
puts "difference : #{difference}"
It's an example for using Mutex.synchronize. On my computer, the results are quite different from the tutorial. After calling join, the counts are sometimes equal:
count1 : 5321211
count2 : 6812638
difference : 0
count1 : 27307724
count2 : 27307724
difference : 0
and sometimes not:
count1 : 4456390
count2 : 5981589
difference : 0
count1 : 25887977
count2 : 28204117
difference : 0
I don't understand how it is possible that the difference is still 0 even though the counts show very different numbers.
The add operation probably looks like this:
val = fetch_current(count1)
add 1 to val
store val back into count1
and something similar for count2. Ruby can switch execution between threads, so it might not finish writing to a variable, but when the CPU gets back to the thread, it should continue from the line where it was interrupted, right?
And there is still just one thread writing to the variables. How is it possible that, inside the loop do block, count2 += 1 is executed so many more times?

Execution of
puts "count1 : #{count1}"
takes some time (although it may be short); it does not happen instantly. Therefore, it is not mysterious that the two consecutive lines:
puts "count1 : #{count1}"
puts "count2 : #{count2}"
show different counts. The counter thread simply went through some loop cycles, incrementing both counts, while the first puts was executing.
Similarly, when
difference += (count1 - count2).abs
is calculated, the counts could in principle be incremented after count1 is referenced but before count2 is referenced. But no other command is executed within that time span, and I guess that the time it takes to refer to count1 is much shorter than the time it takes for the counter thread to go through another loop. Note that the operations done in the former are a proper subset of what is done in the latter. If the difference in time is significant enough, the counter thread will not have gone through a loop cycle while the argument for the - method is being evaluated, and count1 and count2 will appear to have the same value.
One prediction, then, is that if you put some expensive calculation after referencing count1 but before referencing count2, a nonzero difference will show up:
difference += (count1.tap{some_expensive_calculation} - count2).abs
# => larger `difference`
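The tutorial uses this code to motivate Mutex#synchronize, so for comparison here is a sketch of what a synchronized version could look like. It is an assumption that the tutorial's own version has this shape; the names lock, stop, c1 and c2 are introduced here for illustration. Both increments and every read happen while holding the same Mutex, so neither the spy nor the final snapshot can observe count1 and count2 out of step. The threads are stopped with a flag rather than Thread#kill, because kill could interrupt the counter between its two increments.

```ruby
count1 = count2 = 0
difference = 0
lock = Mutex.new # one lock guards count1, count2 and difference
stop = false     # hypothetical stop flag, set once by the main thread (read without the lock; fine on MRI)

counter = Thread.new do
  until stop
    # Both increments happen as one unit with respect to the lock.
    lock.synchronize do
      count1 += 1
      count2 += 1
    end
  end
end

spy = Thread.new do
  until stop
    # The spy reads both counters under the same lock, so it never sees a half-done update.
    lock.synchronize { difference += (count1 - count2).abs }
  end
end

sleep 0.2
stop = true
[counter, spy].each(&:join)

# Take one consistent snapshot instead of reading each counter separately.
c1, c2 = lock.synchronize { [count1, count2] }
puts "count1 : #{c1}"
puts "count2 : #{c2}"
puts "difference : #{difference}"
```

With the lock in place the two printed counts are always equal and difference stays 0, whatever the scheduler does.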

Here's the answer. I think you've made the assumption that the threads stop execution after join(2) returns.
This is not the case! The threads continue to run even though join(2) returns execution (temporarily) back to the main thread.
If you change your code to this you will see what happens:
...
counter.join(2)
spy.join(2)
counter.kill
spy.kill
puts "count1 : #{count1}"
puts "count2 : #{count2}"
puts "difference : #{difference}"
This seems to work a bit differently in Ruby 1.8, where the threads do not seem to get a chance to run while the main thread is executing.
The tutorial was probably written for Ruby 1.8, but the threading model changed in 1.9.
In fact it was pure "luck" that it worked in 1.8, as the threads do not finish execution when join(2) returns in either 1.8 or 1.9.
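A small sketch of the behaviour described above: Thread#join with a limit returns nil when the limit expires, and the thread keeps running until something actually stops it. The busy worker thread here is a stand-in for the counter thread, not the tutorial's code.

```ruby
worker = Thread.new do
  n = 0
  loop { n += 1 } # never finishes on its own, like the counter thread
end

result = worker.join(0.2)     # waits at most 0.2 s; returns nil on timeout
still_running = worker.alive? # join gave up waiting, it did not stop the thread
puts "join returned #{result.inspect}, still running: #{still_running}"

worker.kill # actually stop the thread
worker.join # wait until it has really terminated
puts "after kill, alive? #{worker.alive?}"
```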

Related

Ruby elegant alternative to ++ in nested loops?

Before anything: I have read all the answers to Why doesn't Ruby support i++ or i--? and understood why. Please note that this is not just another discussion topic about whether to have it or not.
What I'm really after is a more elegant solution for the situation that made me wonder about and research ++/-- in Ruby. I've looked up loop, each, each_with_index and the like, but I couldn't find a better solution for this specific situation.
Less talk, more code:
# Does the first request to Zendesk API, fetching *first page* of results
all_tickets = zd_client.tickets.incremental_export(1384974614)
# Initialises counter variable (please don't kill me for this, still learning! :D )
counter = 1
# Loops result pages
loop do
  # Loops each ticket on the paged result
  all_tickets.all do |ticket, page_number|
    # For debug purposes only, I want to see an incrementing number for each ticket
    p "#{counter} P#{page_number} #{ticket.id} - #{ticket.created_at} | #{ticket.subject}"
    counter += 1
  end
  # Fetches next page, if any
  all_tickets.next unless all_tickets.last_page?
  # Breaks outer loop if last_page?
  break if all_tickets.last_page?
end
For now, I need counter for debug purposes only (it's not a big deal at all), but my curiosity prompted this question: is there a better (more beautiful, more elegant) solution for this? Having a whole line just for counter += 1 seems pretty dull. Just as an example, having "#{counter++}" when printing the string would be much simpler (for readability's sake, at least).
I can't simply use .each's index because it's a nested loop, and it would reset at each page (outer loop).
Any thoughts?
BTW: This question has nothing to do with Zendesk API whatsoever. I've just used it to better illustrate my situation.
To me, counter += 1 is a fine way to express incrementing the counter.
You can start your counter at 0 and then get the effect you wanted by writing:
p "#{counter += 1} ..."
But I generally wouldn't recommend this because people do not expect side effects like changing a variable to happen inside string interpolation.
If you are looking for something more elegant, you should make an Enumerator that returns integers one at a time, each time you call next on the enumerator.
nums = Enumerator.new do |y|
  c = 0
  y << (c += 1) while true
end
nums.next # => 1
nums.next # => 2
nums.next # => 3
Instead of using Enumerator.new in the code above, you could just write:
nums = 1.upto(Float::INFINITY)
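Here is a sketch of that enumerator carrying the count across two nested loops like the ones in the question. The pages array is made-up data standing in for the Zendesk result pages; the point is that counter.next keeps advancing the same sequence, so the numbering never resets between pages.

```ruby
# Hypothetical stand-in for the paged API results: one array of ticket ids per page.
pages = [%w[t1 t2 t3], %w[t4 t5], %w[t6]]

counter = 1.upto(Float::INFINITY) # external enumerator yielding 1, 2, 3, ...
lines = []

pages.each_with_index do |tickets, page_number|
  tickets.each do |ticket|
    # counter.next advances the shared sequence, independent of either loop's index
    lines << "#{counter.next} P#{page_number} #{ticket}"
  end
end

puts lines
```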
As mentioned by B Seven, each_with_index will work, and you can keep the page_number as long as all_tickets is a container of tuples, as it must be for your current code to work. Because each_with_index yields the element and the index as two values, the [ticket, page_number] pair has to be destructured with parentheses:
all_tickets.each_with_index do |(ticket, page_number), i|
  #stuff
end
Here i is the index. If each element of all_tickets holds more than ticket and page_number, keep adding names inside the parentheses; just remember that the index is the extra parameter and must stay at the end, outside them.
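To see why the parentheses matter, this sketch runs both block signatures over some made-up [ticket, page_number] pairs. Without parentheses, ticket receives the whole pair, page_number receives the index and i is nil.

```ruby
# Hypothetical data: each element is a [ticket, page_number] pair.
all_tickets = [["t1", 1], ["t2", 1], ["t3", 2]]

wrong = []
all_tickets.each_with_index { |ticket, page_number, i| wrong << [ticket, page_number, i] }

right = []
all_tickets.each_with_index { |(ticket, page_number), i| right << [ticket, page_number, i] }

p wrong # the pair is not split apart
p right # ticket, page_number and the index each get their own name
```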
I may have oversimplified your example, but you could calculate a counter from your inner and outer range like this (it assumes every page holds the same number of tickets):
all_tickets = *(1..10)
inner_limit = all_tickets.size
outer_limit = 5000
1.upto(outer_limit) do |outer_counter|
  all_tickets.each_with_index do |ticket, inner_counter|
    # running count starting at 1: tickets on earlier pages plus position on this page
    p [(outer_counter - 1) * inner_limit + inner_counter + 1, outer_counter, inner_counter, ticket]
  end
  # some conditional to break out, in your case the last_page? method
  break if outer_counter > 3
end
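all_tickets.each.with_index(1) do |ticket, i|
I'm not sure where page_number is coming from...
See Ruby Docs: Enumerator#with_index takes the starting offset, whereas each_with_index(1) would pass the 1 on to each instead.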
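A quick sketch contrasting the two calls on a made-up list of tickets: with_index(1) numbers from 1, while each_with_index forwards its argument to Array#each, which accepts none.

```ruby
tickets = %w[t1 t2 t3] # hypothetical tickets

# with_index takes the starting offset, so numbering begins at 1.
numbered = tickets.each.with_index(1).map { |ticket, i| "#{i}: #{ticket}" }
p numbered

# each_with_index(1) hands the 1 to Array#each, which takes no arguments.
raised = false
begin
  tickets.each_with_index(1) { |ticket, i| }
rescue ArgumentError => e
  raised = true
  puts "each_with_index(1) raised: #{e.message}"
end
```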

How can a value be accumulated when run in parallel/concurrent processes?

I'm running some Ruby scripts concurrently using Grosser/Parallel.
During each concurrent test I want to add up the number of times a particular thing has happened, then display that number.
Let's say:
def main
  $this_happened = 0
  do_this_in_parallel
  puts $this_happened
end
def do_this_in_parallel
  Parallel.each(...) {
    $this_happened += 1
  }
end
The final value after do_this_in_parallel has finished will always be 0.
I'd like to know why this happens.
How can I get the desired result, which would be $this_happened > 0?
Thanks.
This doesn't work because separate processes have separate memory spaces: setting variables etc in one process has no effect on what happens in the other process.
However you can return a result from your block (because under the hood parallel sets up pipes so that the processes can be fed input/return results). For example you could do this
counts = Parallel.map(...) do
  #the return value of the block should
  #be the number of times the event occurred
end
Then just sum the counts to get your total count (e.g. counts.reduce(:+)). You might also want to read up on map-reduce for more information about this way of parallelising work.
I have never used parallel, but the documentation seems to suggest that something like this might work.
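The same idea can be sketched without the gem, using Ruby's own fork and IO.pipe (Unix-only; this is not the parallel gem's actual implementation, only the underlying mechanism the answer describes). The first part shows a child's increment never reaching the parent; the second has each child report its count through a pipe while the parent sums the results. happened? is a hypothetical stand-in for "the thing that happened".

```ruby
# Part 1: a forked child works on its own copy of memory.
$this_happened = 0
pid = fork { $this_happened += 1 } # increments the child's copy only
Process.wait(pid)
puts $this_happened                # still 0 in the parent

# Hypothetical check for "the thing happened" while processing an item.
def happened?(item)
  item.even?
end

# Part 2: each child writes its count to a pipe; the parent reads and sums.
def counts_from_children(items)
  readers = items.map do |item|
    reader, writer = IO.pipe
    fork do
      reader.close
      writer.write((happened?(item) ? 1 : 0).to_s)
      writer.close
    end
    writer.close # the parent keeps only the read end
    reader
  end
  counts = readers.map do |reader|
    count = reader.read.to_i # read returns once the child has closed its write end
    reader.close
    count
  end
  Process.waitall # reap the children
  counts
end

counts = counts_from_children([1, 2, 3, 4, 5, 6])
total = counts.reduce(:+)
puts "counts: #{counts.inspect}, total: #{total}"
```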
Parallel.each(..., :finish => lambda {|*_| $this_happened += 1}) { do_work }

Execute Ruby methods by time

I've never written tasks in plain Ruby (without Rails). Now I need to do the following:
Let's say I have 3 methods in a Ruby class:
a, b, c
In the main method I need to execute the methods according to the time, so that a method runs not when I start the script, but when its time has come, for example:
a must first be executed at 11:59:30,
then at xx-xx-xx, then in a loop
a must first be executed at 12:00:30,
then at xx-xx-xx, then in a loop
etc.
What do I need to write, or google? Or could someone give me a simple example of how I can do this?
Can I do something like sleep until time == 14-50... ?
Use sleep t to sleep for t seconds.
For example, a will run every t_a seconds, b every t_b seconds and c every t_c seconds:
time = 0
while true do
  a if time % t_a == 0
  b if time % t_b == 0
  c if time % t_c == 0
  time += 1
  sleep 1
end
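For the "sleep until 11:59:30" part of the question, one approach is to compute how long it is until the target wall-clock time and sleep for exactly that. This is a minimal sketch: seconds_until and run_at are hypothetical helper names, times are local, and adding 24 hours ignores daylight-saving changes. Unlike the loop above, which counts iterations (and so drifts when a, b or c take noticeable time), this reads the clock each time.

```ruby
# Seconds from `now` until the next hour:min:sec in local time
# (later today if that is still ahead, otherwise tomorrow).
def seconds_until(hour, min, sec, now = Time.now)
  target = Time.new(now.year, now.month, now.day, hour, min, sec)
  target += 24 * 60 * 60 if target <= now # assumes no DST change in between
  target - now                            # Time - Time gives a Float
end

# Sleeps until the given time of day, then runs the block.
def run_at(hour, min, sec)
  sleep seconds_until(hour, min, sec)
  yield
end

# Usage (blocks until 11:59:30): run_at(11, 59, 30) { a }
```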

Generating a race condition with MRI

I was wondering whether it's easy to make a race condition using MRI Ruby (2.0.0) and some global variables, but as it turns out it's not that easy. It looks like it should fail at some point, but it doesn't, and I've been running it for 10 minutes. This is the code I've been using to try to trigger it:
def inc(*)
  a = $x
  a += 1
  a *= 3000
  a /= 3000
  $x = a
end
THREADS = 10
COUNT = 5000
loop do
  $x = 1
  THREADS.times.map do Thread.new { COUNT.times(&method(:inc)) } end.each(&:join)
  break puts "woo hoo!" if $x != THREADS * COUNT + 1
end
puts $x
puts $x
Why am I not able to generate (or detect) the expected race condition, and get the output woo hoo! in Ruby MRI 2.0.0?
Your example does (almost instantly) work in 1.8.7.
The following variation does the trick for 1.9.3+:
def inc
  a = $x + 1
  # Just one microsecond
  sleep 0.000001
  $x = a
end
THREADS = 10
COUNT = 50
loop do
  $x = 1
  THREADS.times.map { Thread.new { COUNT.times { inc } } }.each(&:join)
  break puts "woo hoo!" if $x != THREADS * COUNT + 1
  puts "No problem this time."
end
puts $x
The sleep command is a strong hint to the interpreter that it can schedule another thread, so this is not a huge surprise.
Note if you replace the sleep with something that takes just as long or longer, e.g. b = a; 500.times { b *= 100 }, then there is no race condition detected in the above code. But take it further with b = a; 2500.times { b *= 100 }, or increase COUNT from 50 to 500, and the race condition is more reliably triggered.
The thread scheduling in Ruby 1.9.3 onwards (including 2.0.0) appears to assign CPU time in larger chunks than 1.8.7 did. Opportunities to switch threads can be few in simple code, unless some kind of I/O waiting is involved.
It is even possible that the threads in the OP, each of which performs just a few thousand calculations, are in essence running in series, although increasing the COUNT constant to avoid this still does not trigger additional race conditions.
Generally MRI Ruby does not switch context between threads during atomic processes (e.g. during a Fixnum multiply or division) that occur within its C implementation. This means that, when all methods are calls to Ruby internals without I/O waiting, the only opportunities for a thread context switch are "in between" lines of code. In the original example there are only 4 such fleeting opportunities, and in the scheme of things this seems to be very few for MRI 1.9.3+ (in fact, as the update below suggests, Ruby has probably optimised these opportunities away).
When I/O waits or sleep are involved, it actually gets more complex, as Ruby MRI (1.9+) will allow a little bit of true parallel processing on multi-core CPUs. Although this is not the direct cause of race conditions with threads, it is more likely to result in them, as Ruby will usually make a thread context switch at the same time to take advantage of the parallelism.
Whilst I was researching this rough answer, I found an interesting link: Nobody understands the GIL (part 2 linked, as more relevant to this question)
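For contrast, here is a sketch of the usual fix applied to the sleep variant above: wrapping the read-modify-write of $x in a Mutex. The LOCK constant is introduced here for illustration. With the lock held, the sleep can still hand control to another thread, but that thread cannot get between the read and the write, so no update is lost.

```ruby
LOCK = Mutex.new # guards every read-modify-write of $x

def inc
  LOCK.synchronize do
    a = $x + 1
    sleep 0.000001 # same scheduling hint as in the racy version
    $x = a
  end
end

THREADS = 10
COUNT = 50

$x = 1
THREADS.times.map { Thread.new { COUNT.times { inc } } }.each(&:join)
puts $x == THREADS * COUNT + 1 ? "no lost updates" : "race detected"
puts $x
```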
Update: I suspect that the interpreter is optimising away some potential thread-switching points in the Ruby source. Starting with my sleep version of the code, and setting:
COUNT = 500000
the following variation of inc does not seem to have a race condition affecting $x:
def inc
  a = $x + 1
  b = 0
  b += 1
  $x = a
end
However, these minor changes both trigger a race condition:
def inc
  a = $x + 1
  b = 0
  b = b.send( :+, 1 )
  $x = a
end
def inc
  a = $x + 1
  b = 0
  b += '1'.to_i
  $x = a
end
My interpretation is that the Ruby parser has optimised b += 1 to remove some of the overhead of method despatch. One of the optimised-away steps is likely to include the check for a possible switch to a waiting thread.
If that is the case, then the code in the question may never have the opportunity to switch threads within the inc method, because all the operations inside it can be optimised in the same way.
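One way to look into this hypothesis on MRI is to disassemble the two variants with RubyVM::InstructionSequence (MRI-specific; the exact listing differs between Ruby versions). b += 1 compiles to the specialized opt_plus instruction, while b.send(:+, 1) goes through a general method-call instruction. The listings show that the two take different code paths; whether opt_plus actually skips a thread-switch check remains the answer's hypothesis, not something this output proves.

```ruby
# Compile each snippet to YARV bytecode and get the instruction listing.
plus_listing = RubyVM::InstructionSequence.compile("b = 0; b += 1").disasm
send_listing = RubyVM::InstructionSequence.compile("b = 0; b = b.send(:+, 1)").disasm

puts plus_listing # includes opt_plus: Integer addition handled inside the VM
puts send_listing # includes a send instruction: a full method dispatch
```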

Confused by this unless statement in rubykoans

Lines 4 & 5 are causing me grief:
1 def test_break_statement
2 i = 1
3 result = 1
4 while true
5 break unless i <= 10
6 result = result * i
7 i += 1
8 end
9 assert_equal 3628800, result
10 end
I'm not sure what needs to remain true in the while true statement; however, I believe it is the code that follows it. This leads to further confusion because I am reading the line:
break unless i <= 10 as "break if i is not less than or equal to 10". What procedure is this code going through, i.e. how do the while and break statements interact? I think I am nearly there but can't picture the process in my head. Thanks.
The code will break out of the endless while loop when i is greater than 10.
But I'm not sure why the condition isn't checked in the while statement.
Edit: Had I read the method name I would have understood why the condition isn't checked directly with the while statement. The method's purpose is to test the break statement.
while statements test whatever comes after the word while. If the expression that follows them is true they execute the code within the loop. If the expression is false, they do not.
Thus, as other posters have pointed out, while true will always execute the code within the loop. Luckily for your code there is a break statement within the loop. If there wasn't, the loop would run forever and you'd have to kill the process running your program.
In your code sample the break keyword is followed by unless which means that it will break the loop unless the expression following it is true. Your code will break out of the loop when i is greater than 10.
while true is an infinite loop. break, when executed, will exit it immediately, to continue with the first line after it (assert_equal...).
In this specific case (nothing intervening between while and break unless), it is equivalent to this:
while i <= 10
  result = result * i
  i += 1
end
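A quick way to check the equivalence is to run both forms side by side. The loop bodies are the koan's; the two method names are made up for this sketch.

```ruby
# The koan's form: an endless loop with an explicit exit.
def factorial_with_break
  i = 1
  result = 1
  while true
    break unless i <= 10 # same as: break if i > 10
    result = result * i
    i += 1
  end
  result
end

# The equivalent form with the condition moved into the while.
def factorial_with_condition
  i = 1
  result = 1
  while i <= 10
    result = result * i
    i += 1
  end
  result
end

puts factorial_with_break     # 3628800
puts factorial_with_condition # 3628800
```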
while true is an endless loop.
break unless i <= 10 is the same as break if i > 10: it will break out of the loop once i is greater than 10.
