Getting a Thread not to run until join in Ruby

I am getting into Ruby and have been using threads for a little while now without fully understanding them. I notice that when I add a thread to an array and put a sleep() call as the thread's first statement, the thread does not run until I call join, which is mostly what I want. So I have two questions:
1. Is that supposed to happen?
2. Is there a better way to do that other than the way I'm doing it?
Here is some sample code that shows what I'm talking about.
job = Array.new
10.times do |n|
  job << Thread.new do
    sleep 0.001
    puts "done #{n}"
  end
end

# job.each do |t|
#   t.join
# end
puts "End of script"
Output is
End of script
If I remove the comments, the output is
done 1
done 0
done 7
done 6
done 5
done 4
done 3
done 2
done 9
done 8
End of script
So I use this now, but I don't understand why it behaves that way. Sometimes I notice that even doing something like `echo hi` instead of sleep does the trick.
Thanks in advance.

Thread scheduling isn't defined behavior. Once you put a thread to sleep, it is placed in a queue to be resumed later; you can't ever expect it to run one way or another.
Your main program doesn't take very long to run, so it is likely to finish before your other threads get picked back up to run again. Really, when you think about it, 0.001 seconds is quite a long time to a computer, so spinning off all 10 threads within that time is entirely likely -- but even if it takes longer, there is no guarantee a thread will resume immediately after 0.001 seconds. Often there is no guarantee it won't wake before 0.001 seconds either, but sleep calls usually don't end early.
When you add the join calls, you are introducing additional time into your main thread which allows the other threads time to run, so this behavior is expected.
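If the goal is simply to make sure the threads finish before the script exits, joining them is the idiomatic way; a minimal sketch in plain Ruby (no sleep is needed to hold the threads back):

job = Array.new

10.times do |n|
  job << Thread.new do
    # The thread may start running right away, but that is fine:
    # the join below guarantees it completes before the script ends.
    puts "done #{n}"
  end
end

# Block the main thread until every worker thread has finished.
job.each(&:join)

puts "End of script"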

Related

How does limiting background jobs work?

While looking into how to parallelize bash tasks, I stumbled upon code like this:
for item in "${items[@]}"
do
  ((i=i%THREADS)); ((i++==0)) && wait
  process_item $item &
done
Here process_item is some kind of function/program that works on item, and the THREADS variable contains the maximum number of background processes that may run simultaneously.
Can someone explain to me how this works? I understand that i=i%THREADS keeps i between 0 and THREADS-1, and that i++==0 increments i and checks whether it is 0. But is wait bound to all subprocesses? Or how does it know that it has to wait until the previous batch has stopped processing?
It's an obfuscated way of writing
for item in "${items[@]}"
do
  # Every THREADSth job, stop and wait for everything
  # to complete.
  if (( i % THREADS == 0 )); then
    wait
  fi
  ((i++))
  process_item $item &
done
It also doesn't actually work terribly well. It doesn't ensure that there are always $THREADS jobs running, only that no more than $THREADS jobs are running at once.
i++==0 checks and then increments, not the other way around. wait waits for all currently active child processes. So every THREADS iterations (except the very first batch, thanks to ((i++==0))), the loop first waits for the processes launched by the previous batch and then starts launching a new batch.
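For contrast, if the goal is to keep N jobs busy at all times rather than waiting batch by batch, a worker-pool approach behaves that way. Here is a minimal sketch in Ruby (to stay with the rest of this page); THREADS, the item list, and process_item are all placeholders for the real job:

require "thread"

THREADS = 4                       # stands in for the bash THREADS variable
items   = (1..20).to_a            # placeholder work items

# Hypothetical stub for whatever process_item really does.
def process_item(item)
  sleep 0.1
  puts "processed #{item}"
end

queue = Queue.new
items.each { |item| queue << item }
THREADS.times { queue << nil }    # one stop signal per worker

workers = THREADS.times.map do
  Thread.new do
    # Each worker keeps pulling items until it sees its stop signal,
    # so THREADS jobs stay busy instead of pausing between batches.
    while (item = queue.pop)
      process_item(item)
    end
  end
end

workers.each(&:join)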

How to know that I do not block Ruby EventMachine with a MongoDB operation

I am working on an EventMachine-based application that periodically polls for changes to MongoDB-stored documents.
A simplified code snippet could look like this:
require 'rubygems'
require 'eventmachine'
require 'em-mongo'
require 'bson'

EM.run {
  @db       = EM::Mongo::Connection.new('localhost').db('foo_development')
  @posts    = @db.collection('posts')
  @comments = @db.collection('comments')

  def handle_changed_posts
    EM.next_tick do
      cursor = @posts.find(state: 'changed')
      resp = cursor.defer_as_a
      resp.callback do |documents|
        handle_comments documents.map { |h| h["comment_id"] }.map(&:to_s) unless documents.length == 0
      end
      resp.errback do |err|
        raise *err
      end
    end
  end

  def handle_comments comment_ids
    comment_ids.each do |id|
      cursor = @comments.find({ _id: BSON::ObjectId(id) })
      resp = cursor.defer_as_a
      resp.callback do |documents|
        magic_value = documents.first['weight'].to_i * documents.first['importance'].to_i
      end
      resp.errback do |err|
        raise *err
      end
    end
  end

  EM.add_periodic_timer(1) do
    puts "alive: #{Time.now.to_i}"
  end

  EM.add_periodic_timer(5) do
    handle_changed_posts
  end
}
So every 5 seconds EM iterates over all posts and selects the changed ones. For each changed post it stores the comment_id in an array. When done, that array is passed to handle_comments, which loads every comment and does some calculation.
Now I have some difficulties in understanding:
I know that this load_posts->load_comments->calculate cycle takes 3 seconds in a Rails console with 20000 posts, so it will not be much faster in EM. I schedule the handle_changed_posts method every 5 seconds, which is fine unless the number of posts grows and the calculation takes longer than the 5 seconds after which the same run is scheduled again. In that case I'd have a problem soon. How do I avoid that?
I trust em-mongo but I do not trust my EM knowledge. To monitor that EM is still running, I puts a timestamp every second. This seems to work fine but gets a bit bumpy every 5 seconds when my calculation runs. Is that a sign that I am blocking the loop?
Is there any general way to find out whether I am blocking the loop?
Should I nice my EventMachine process with -19 to always give it top OS priority?
I have been reluctant to answer here since I have no mongo experience so far, but considering no one else is answering and some of this is general EM stuff, I may be able to help:
schedule the next scan at the end of the current scan (resp.callback and resp.errback in handle_changed_posts seem like good candidates to chain the next scan), either with add_timer or with next_tick; see the sketch after this list
probably; try making your mongo trips more frequent so they handle smaller chunks of data. Any CPU-cycle hog inside your reactor makes the reactor loop too busy to accept events such as periodic timer ticks
no simple way, no. One idea would be to measure the diff between Time.now and next_tick { Time.now }, benchmark it, and then trace possible culprits when the diff crosses a threshold. Simulating slow queries (see "Simulate slow query in mongodb?") and many parallel connections is a good idea
I honestly don't know; I've never encountered people doing that, and I expect it depends on what else is running on that server
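A minimal sketch of the first point, assuming the same handle_changed_posts structure as in the question (the 5-second delay is just the question's polling interval): instead of a periodic timer, each run reschedules the next one only once it has finished, so runs can never overlap.

def handle_changed_posts
  cursor = @posts.find(state: 'changed')
  resp = cursor.defer_as_a

  resp.callback do |documents|
    handle_comments documents.map { |h| h["comment_id"] }.map(&:to_s) unless documents.empty?
    # Re-arm only after this scan has finished, so runs cannot pile up.
    EM.add_timer(5) { handle_changed_posts }
  end

  resp.errback do |err|
    # Even on error, schedule the next attempt rather than stopping the cycle.
    EM.add_timer(5) { handle_changed_posts }
  end
end

# Kick off the first scan once, instead of using add_periodic_timer(5).
EM.next_tick { handle_changed_posts }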
To expand upon bbozo's answer, specifically in relation to your second question, there is no time when you run code that you do not block the loop. In my experience, when we talk about 'non-blocking' code what we really mean is 'code that doesn't block very long'. Typically, these are very short periods of time (less than a millisecond), but they still block while executing.
Further, the only thing next_tick really does is to say 'do this, but not right now'. What you really want to do, as bbozo mentioned, is split up your processing over multiple ticks such that each iteration blocks for as little time as possible.
To use your own benchmarks, if 20,000 records take about 3 seconds to process, 4,000 records should take about 0.6 seconds. That would be short enough not to noticeably affect your 1-second heartbeat. You could split the work up even further to reduce blocking and make the reactor run more smoothly, but it really depends on how much concurrency you need from the reactor.
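A rough sketch of that batching idea, hedged: process the documents in slices and hand control back to the reactor between slices via EM.next_tick, so the heartbeat timer can fire in between. The batch size and the per-document calculation are made up for illustration.

def process_in_batches(documents, batch_size = 4_000)
  return if documents.empty?

  batch = documents.first(batch_size)
  rest  = documents.drop(batch_size)

  batch.each do |doc|
    # placeholder for the real per-document calculation
    doc['weight'].to_i * doc['importance'].to_i
  end

  # Yield back to the reactor before touching the next slice,
  # so timers and other events get a chance to run.
  EM.next_tick { process_in_batches(rest, batch_size) }
end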

Watir ... difference between sleep and wait

Is there any notable difference between
sleep 10
and
wait_until(10)
They both seem to do the same thing: wait 10 seconds then proceed to the next step
sleep just does nothing for the specified time. wait_until takes a block; it waits until the block evaluates to true or times out. If no block is given, they act the same.
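To make the difference concrete, here is a rough sketch of the polling behaviour a wait_until-style helper has (an illustrative re-implementation, not Watir's actual source; browser and the page text in the usage line are placeholder examples):

def wait_until(timeout = 10, interval = 0.5)
  deadline = Time.now + timeout
  loop do
    return true if yield               # proceed as soon as the condition holds
    raise "timed out" if Time.now > deadline
    sleep interval                     # otherwise poll again shortly
  end
end

# sleep 10 always pauses the full 10 seconds;
# wait_until(10) { browser.text.include?("Done") } returns as soon as the page is ready.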

Is there a way to call a block every microsecond using celluloid?

I'm using Celluloid's every method to execute a block every microsecond; however, it seems to always call the block every second, even when I specify a decimal.
interval = 1.0 / 2.0
every interval do
  puts "*" * 80
  puts "Time: #{Time.now}"
  puts "*" * 80
end
I would expect this to be called every 0.5 seconds, but it is called every second.
Any suggestions?
You can get fractional-second resolution with Celluloid.
Celluloid uses the Timers gem to manage every; it does proper floating-point time math and relies on Ruby's sleep, which has reasonable sub-second resolution.
The following code works perfectly:
class Bob
  include Celluloid

  def fred
    every 0.5 do
      puts Time.now.strftime "%M:%S.%N"
    end
  end
end
Bob.new.fred
And it produces the following output:
22:51.299923000
22:51.801311000
22:52.302229000
22:52.803512000
22:53.304800000
22:53.805759000
22:54.307003000
22:54.808279000
22:55.309358000
22:55.810017000
As you can see, it is not perfect, but close enough for most purposes.
If you are seeing different results, it is likely because of how long the code in the block you give to every takes, or because other timers are running and starving that particular one. I would approach it by simplifying the situation as much as possible and slowly adding parts back in to determine where the slowdown occurs; one quick check is sketched below.
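For instance, a hedged way to see whether the block itself is the culprit is to time it from inside the timer. This reuses the actor structure from above; do_the_real_work and its sleep are hypothetical stand-ins for whatever the timer body really does:

require 'celluloid'

class TimedBob
  include Celluloid

  def fred
    every 0.5 do
      started = Time.now
      do_the_real_work              # hypothetical stand-in for the timer body
      elapsed = Time.now - started
      # If elapsed regularly approaches or exceeds the interval,
      # the timer cannot possibly fire on schedule.
      puts "block took #{(elapsed * 1000).round(2)} ms"
    end
  end

  def do_the_real_work
    sleep 0.2                       # placeholder workload
  end
end

TimedBob.new.fred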
As for microsecond resolution, I don't think you can hope to get that far down reliably with any non-trivial code.
The trivial example:
def bob
  puts Time.now.strftime "%M:%S.%N"
  sleep 1.0e-6
  puts Time.now.strftime "%M:%S.%N"
end
Produces:
31:07.373858000
31:07.373936000
31:08.430110000
31:08.430183000
31:09.062000000
31:09.062079000
31:09.638078000
31:09.638156000
So as you can see, even plain Ruby on my machine, running nothing but a simple IO line around a microsecond sleep, doesn't reliably give me microsecond speeds.

Ruby, simple "threading" example to update progress in console app

I am trying to implement a simple console app that will do lots of long processes. During these processes I want to update progress.
I cannot find a SIMPLE example of how to do this anywhere!
I am still "young" in terms of Ruby knowledge and all I can seem to find are debates about Thread vs Fibers vs Green Threads, etc.
I'm using Ruby 1.9.2 if that helps.
th = Thread.new do          # Here we start a new thread
  Thread.current['counter'] = 0
  11.times do |i|           # This loops and increases i each time
    Thread.current['counter'] = i
    sleep 1
  end
end

while th['counter'].to_i < 10 do
  # th is the long-running thread, and here we read the same thread-local
  # variable that is written inside the thread. Keep in mind that this is
  # not a safe way of accessing thread variables in general, but for reading
  # status information it works fine. Read about Mutex to get a better
  # understanding.
  puts "Counter is #{th['counter']}"
  sleep 0.5
end

puts "Long running process finished!"
A slightly smaller variation, and you don't need to read about Mutex.
require "thread"
q = Queue.new
Thread.new do # Here we start a new thread
11.times do |i| # This loops and increases i each time
q.push(i)
sleep 1
end
end
while (i = q.pop) < 10 do
puts "Counter is #{i}"
end
puts "Long running process finished!"
