Distributed Julia: parallel map (pmap) with a timeout / time limit for each map task to complete - parallel-processing

My project involves computing in parallel a map using Julia's Distributed's pmap function.
Mapping a given element could take a few seconds, or it could take essentially forever. I want a timeout or time limit for an individual map task/computation to complete.
If a map task finishes in time, great, return the result of the computation. If the task doesn't complete by the time limit, stop computation when the time limit has been reached, and return some value or message indicating a timeout occurred.
A minimal example follows. First are imported modules, and then worker processes are launched:
num_procs = 1
using Distributed
if num_procs > 1
# The main process (no calling addprocs) can be used for `pmap`:
addprocs(num_procs-1)
end
Next, the mapping task is defined for all the worker processes. The mapping task should timeout after 1 second:
#everywhere import Random
#everywhere begin
"""
Compute stuff for `wait_time` seconds, and return `wait_time`.
If `timeout` seconds elapses, stop computation and return something else.
"""
function waitForTimeUnlessTimeout(wait_time, timeout=1)
# < Insert some sort of timeout code? >
# This block of code simulates a long computation.
# (pretend the computation time is unknown)
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
end
# computation completed before time limit. Return wait_time.
round(wait_time, digits=2)
end
end
The function that executes the parallel map (pmap) is defined on the main process. Each map task randomly takes up to 2 seconds to complete, but should time out after 1 second.
function myParallelMapping(num_tasks = 20, max_runtime=2)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# return the parallel computation of the mapping tasks
pmap((runtime)->waitForTimeUnlessTimeout(runtime), runtimes)
end
print(myParallelMapping())
How should this time-limited parallel map be implemented?

You could put something like this inside your pmap body
pmap(runtimes) do runtime
t0 = time()
task = #async waitForTimeUnlessTimeout(runtime)
while !istaskdone(task) && time()-t0 < time_limit
sleep(1)
end
istaskdone(task) && (return fetch(task))
error("time over")
end
Also note that (runtime)->waitForTimeUnlessTimeout(runtime) is the same as just waitForTimeUnlessTimeout .

Following #Fredrik Bagge's very helpful answer, here is the full working example implementation with some extra explanation.
num_procs = 8
using Distributed
if num_procs > 1
addprocs(num_procs-1)
end
#everywhere import Random
#everywhere begin
function waitForTime(wait_time)
# This code block simulates a long computation.
# Pretend the computation time is unknown.
t0 = time()
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
yield() # CRITICAL to release computation to check if task is done.
# If you comment out #yield(), you will see timeout doesn't work!
end
return round(wait_time, digits=2)
end
end
function myParallelMapping(num_tasks = 16, max_runtime=2, time_limit=1)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# parallel compute the mapping tasks. See "do block" in
# the Julia documentation, it's just syntactic sugar.
return pmap(runtimes) do runtime
t0 = time()
task = #async waitForTime(runtime)
while !istaskdone(task) && time()-t0 < time_limit
# releases computation to waitForTime
sleep(0.1)
# nothing past here will run until waitForTime calls yield()
# *and* 0.1 seconds have passed.
end
# equal to if istaskdone(task); return fetch(task); end
istaskdone(task) && (return fetch(task))
return "TimeOut"
# `return error("TimeOut")` halts pmap unless pmap is
# given an error handler argument. See pmap documentation.
end
end
The output is
julia> print(myParallelMapping())
Any["TimeOut", "TimeOut", 0.33, 0.35, 0.56, 0.41, 0.08, 0.14, 0.72,
"TimeOut", "TimeOut", "TimeOut", 0.52, "TimeOut", 0.33, "TimeOut"]
Note that there are two tasks per process in this example. The original task (the "time checker") is checking every 0.1 seconds if the other task has completed computation. The other task (created with #async) is computing something, periodically calling yield() to release control to the time checker; if it doesn't call yield(), time checking cannot occur.

Related

Matlab ROS slow publishing + subscribing

In my experience Matlab performs publish subscribe operations with ROS slow for some reason. I work with components as defined in an object class as shown below, where I made a test-class. Normally objects of comparable structure are used to control mobile robots.
To quantify performance tested required time for an operation and got the following results:
1x publishing a message + 1x simple subscriber callback : 3.7ms
Simply counting in a callback (per count): 2.1318e-03 ms
Creating a new message with msg1 = rosmessage(obj.publisher) adds 3.6-4.3ms per iteration
Pinging myself indicated communication latency of 0.05 ms
The times required for a simple publish + start of a subscribe callback seems oddly slow.
I want to have multiple system components as objects in my workspace such that they respond to ROS topic updates or on timer events. The pc used for testing is not a monster but should not be garbage either.
Do you also think the shown time requirements are unneccesary large? this allows barely to publish a single topic at 200hz without doing anything else. Normally I have multiple lower frequency topics (e.g.20hz) but the total consumed time becomes significant.
Do you know any practices to make the system operate quicker?
What do you think of the OOP style of making control system components in general?
classdef subpubspeedMonitor < handle
% Use: call in matlab console, after initializing ros:
%
% SPM1 = subpubspeedMonitor()
%
% This will create an object which starts a set repetitive task upon creation
% and finally destructs itself after posting results in console.
properties
node
subscriber
publisher
timestart
messagetotal
end
methods
function obj = subpubspeedMonitor()
obj.node = ros.Node('subspeedmonitor1');
obj.subscriber = ros.Subscriber(obj.node,'topic1','sensor_msgs/NavSatFix',{#obj.rosSubCallback});
obj.publisher = ros.Publisher(obj.node,'topic1','sensor_msgs/NavSatFix');
obj.timestart = tic;
obj.messagetotal = 0;
msg1 = rosmessage(obj.publisher);
% Choose to evaluate subscriber + publisher loop or just counting
if 1
send(obj.publisher,msg1);
else
countAndDisplay(obj)
end
end
%% Test method one: repetitive publishing and subscribing
function rosSubCallback(obj,~,msg_) % ~3.7 ms per loop for a simple publish+subscribe action
% Latency to self is 0.05ms on average, according to "pinging" in terminal
obj.messagetotal = obj.messagetotal+1;
if obj.messagetotal <10000
%msg1 = rosmessage(obj.publisher); % this line adds 4.3000ms per loop
msg_.Longitude = 51; % this line adds 0.25000 ms per loop
send(obj.publisher,msg_)
else
% Display some results
timepassed = toc(obj.timestart);
time_per_pubsub = timepassed/obj.messagetotal
delete(obj);
end
end
%% Test method two: simply counting
function countAndDisplay(obj) % this costs 2.1318e-03 ms(!) per loop
obj.messagetotal = obj.messagetotal+1;
if obj.messagetotal <10000
%msg1 = rosmessage(obj.publisher); %adds 3.6ms per loop
%i = 1% adds 5.7532e-03 ms per loop
%msg1 = rosmessage("std_msgs/Bool"); %adds 1.5ms per loop
countAndDisplay(obj);
else
% Display some results
timepassed = toc(obj.timestart);
time_per_count_FCN = timepassed/obj.messagetotal
delete(obj);
end
end
%% Deconstructor
function delete(obj)
delete(obj.subscriber)
delete(obj.publisher)
delete(obj.node)
end
end
end

How to measure execution time for prediction per image (keras)

I have a simple model created with Keras and I need to measure the execution time for prediction per image. Right now I just do this:
start = time.clock()
my_model.predict(images_test)
end = time.clock()
print("Time per image: {} ".format((end-start)/len(images_test)))
But I noticed that the calculated time is bigger when len(images_test) is smaller. For example when len(images_test) = 32 I get: 0.06 and when len(images_test) = 1024 I get: 0.006
Is there a "right" way to do this ?
if use TF it seems no Asynchronous problem
but if use pytorch it has Asynchronous problem.
in TF:
start = time.clock()
result = my_model.predict(images_test)
end = time.clock()
in pytorch:
torch.cuda.synchronize()
start = time.clock()
my_model.predict(images_test)
torch.cuda.synchronize()
end = time.clock()
But i think you can do 10 times Loop model_predict
and print time_list
(computer need load keras model so first time load slower than other times )
in TF:
pred_time_list=[]
for i in range(10):
start = time.clock()
result = my_model.predict(images_test)
end = time.clock()
pred_time_list.append(end-start)
print(pred_time_list)
(print the pred_time_list and you may find out why the times incorrect)
Reference:
[1]
https://discuss.pytorch.org/t/doing-qr-decomposition-on-gpu-is-much-slower-than-on-cpu/21213/6
[2]
https://discuss.pytorch.org/t/is-there-any-code-torch-backends-cudnn-benchmark-torch-cuda-synchronize-similar-in-tensorflow/51484/2

Measure estimated completion time of ruby script

I've been running a lot of scripts lately that iterate over 10k - 300k objects, and I'm thinking of writing some code that estimates the completion time of the script (they take 20-180 minutes). I've got to imagine though that there's something out there that does this already. Is there?
To Clarify (edit):
Were I to write code to do this, it would work by measuring how long it takes to perform "the operation" on a single object, multiplying that amount of time by the number of objects left, and adding it to the current time.
Granted, this would only work in situations where you have a script involving a single loop that takes up 99% of the script's total run time, and in which you could reasonably expect to be able to calculate an semi-accurate average for each iteration of that loop. This is true of the scripts for which I'd like estimate completion time.
Have a look at the ruby-progressbar gem: https://github.com/jfelchner/ruby-progressbar
It generates a nice progressbar and estimates completion time (ETA):
example task: 67% |oooooooooooooooooooooo | ETA: 00:01:15
You can granularity measure the time of each method within your script and then sum the components as described here.
You let your process run, and after a set time of iterations, you measure the elapsed time. You then use that value as an estimation for the time left. This ensures that the time is always dynamically estimated according to the current task.
This example is extra verbose, like a code double whopper with triple cheese:
# Some variables for this test
iterations = 1000
probe_at = (iterations * 0.1).to_i
time_total = 0
#======================================
iterations.times do |i|
time_start = Time.now
#you could yield here if this were a function
5000.times do # <tedius task simulation>
Math.sqrt(rand(200000))
end # <end of tedious task simulation>
time_total += time_taken = Time.now - time_start
if i == probe_at
iteration_cost = (time_total / probe_at)
time_left = iteration_cost * (iterations - probe_at)
puts "Time taken (ACTUAL): #{time_total} | iteration: #{i}"
puts "Time left (ESTIMATE): #{time_left} | iteration: #{i}"
puts "Estimated total: #{time_total + time_left} | iteration: #{i}"
end
if i == 999
puts "Time taken (ACTUAL): #{time_total} | iteration: #{i}"
end
end
You could easily rewrite this into a class or a method.

Rufus Scheduler: Every day, start a job at time X and process in batches with a fixed interval

I am trying to figure out how to be able to do the following.
I want to start a job at 12:00pm every day which computes a list of things and then processes these things in batches of size 'b'. and guarantee an interval of x minutes between the end of one batch and the beginning of another.
schedule.cron '00 12 * * *' do
# things = Compute the list of things
schedule.interval '10m', # Job.new(things[index, 10]) ???
end
I tried something like the following but I hate that I have to pass in the scheduler to the job or access the singleton scheduler.
class Job < Struct.new(:things, :batch_index, :scheduler)
def call
if (batch_index+1) >= things.length
ap "Done with all batches"
else
# Process batch
scheduler.in('10m', Dummy.new(things, batch_index+1, scheduler))
end
end
end
scheduler = Rufus::Scheduler.new
schedule.cron '00 12 * * *' , Dummy.new([ [1,2,3] , [4,5,6,7], [8,9,10]], 0, scheduler)
with a bit of help from https://github.com/jmettraux/rufus-scheduler#scheduling-handler-instances
I'd prepare a class like:
class Job
def initialize(batch_size)
#batch_size = batch_size
end
def call(job, time)
# > I want to start a job (...) which computes a list of things and
# > then processes these things in batches of size 'b'.
things = compute_list_of_things()
loop do
ts = things.shift(#batch_size)
do_something_with_the_batch(ts)
break if things.empty?
# > and guarantee an interval of x minutes between the end of one batch
# > and the beginning of another.
sleep 10 * 60
end
end
end
I'd pass an instance of that class to the scheduler instead of a block:
scheduler = Rufus::Scheduler.new
# > I want to start a job at 12:00pm every day which computes...
scheduler.cron '00 12 * * *', Job.new(10) # batch size of 10...
I don't bother using the scheduler for the 10 minutes wait.

Testing time critical code

I've written a feature for my library Rubikon that displays a throbber (a spinning — as you may have seen in other console apps) as long as some other code is running.
To test this feature I capture the output of the throbber in a StringIO and compare it with the expected value. As the throbber is only displayed as long as the other code is running the content of the IO gets longer when the code runs longer. In my tests I do a simple sleep 1 and should have a constant 1 second delay. This works most of the time, but sometimes (apparently due to external factors like heavy load on the CPU) it fails, because the code doesn't run for 1 second, but for a bit more, so that the throbber prints a few additional characters.
My question is: Is there any possibility to test such time critical features in Ruby?
From your github repository, I found this test for the Throbber class:
should 'work correctly' do
ostream = StringIO.new
thread = Thread.new { sleep 1 }
throbber = Throbber.new(ostream, thread)
thread.join
throbber.join
assert_equal " \b-\b\\\b|\b/\b", ostream.string
end
I'll assume that a throbber iterates over ['-', '\', '|', '/'], backspacing before each write, once per second. Consider the following test:
should 'work correctly' do
ostream = StringIO.new
started_at = Time.now
ended_at = nil
thread = Thread.new { sleep 1; ended_at = Time.now }
throbber = Throbber.new(ostream, thread)
thread.join
throbber.join
duration = ended_at - started_at
iterated_chars = " -\\|/"
expected = ""
if duration >= 1
# After n seconds we should have n copies of " -\\|/", excluding \b for now
expected << iterated_chars * duration.to_i
end
# Next append the characters we'd get from working for fractions of a second:
remainder = duration - duration.to_i
expected << iterated_chars[0..((iterated_chars.length*remainder).to_i)] if remainder > 0.0
expected = expected.split('').join("\b") + "\b"
assert_equal expected, ostream.string
end
The last assignment of expected is a bit unpleasant, but I made the assumption that the throbber would write character/backspace pairs atomically. If this is not true, you should be able to insert the \b escape sequence into the iterated_chars string and remove the last assignment entirely.
This question is similar (I think, altough I'm not completely sure) to this one:
Only real time operating system can
give you such precision. You can
assume Thread.Sleep has a precision of
about 20 ms so you could, in theory
sleep until the desired time - the
actual time is about 20 ms and THEN
spin for 20 ms but you'll have to
waste those 20 ms. And even that
doesn't guarantee that you'll get real
time results, the scheduler might just
take your thread out just when it was
about to execute the RELEVANT part
(just after spinning)
The problem is not rubby (possibly, I'm no expert in ruby), the problem is the real time capabilities of your operating system.

Resources