Matlab ROS slow publishing + subscribing

In my experience, MATLAB performs publish/subscribe operations with ROS slowly, for some reason. I work with components defined in an object class, as shown below, where I made a test class. Normally, objects of comparable structure are used to control mobile robots.
To quantify performance, I measured the time required for each operation and got the following results:
1x publishing a message + 1x simple subscriber callback: 3.7 ms
Simply counting in a callback (per count): 2.1318e-03 ms
Creating a new message with msg1 = rosmessage(obj.publisher): adds 3.6-4.3 ms per iteration
Pinging my own machine: communication latency of 0.05 ms
The time required for a simple publish plus the start of a subscriber callback seems oddly slow.
I want to have multiple system components as objects in my workspace, such that they respond to ROS topic updates or timer events. The PC used for testing is not a monster, but it should not be garbage either.
Do you also think the measured times are unnecessarily large? They barely allow publishing a single topic at 200 Hz without doing anything else. Normally I have multiple lower-frequency topics (e.g. 20 Hz), but the total consumed time becomes significant.
Do you know any practices to make the system operate faster?
What do you think of the OOP style of building control system components in general?
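The 200 Hz remark follows from simple budget arithmetic. The Python sketch below assumes that each message costs the measured 3.7 ms and that messages are handled one at a time, with no overlap. The helper names `cpu_fraction` and `max_rate_hz` are hypothetical:

```python
def cpu_fraction(topics):
    """Fraction of each second spent on messaging.

    `topics` is a list of (rate_hz, cost_ms_per_message) pairs. Assumes
    messages are processed serially, so the per-message costs simply add up.
    """
    return sum(rate_hz * cost_ms for rate_hz, cost_ms in topics) / 1000.0


def max_rate_hz(cost_ms):
    """Highest rate a single topic can reach if nothing else runs."""
    return 1000.0 / cost_ms
```

With the measured numbers:
- One 200 Hz topic at 3.7 ms per message uses about 0.74 of every second.
- Ten 20 Hz topics cost the same 0.74.
- A single topic tops out at roughly 270 Hz.
- Adding the roughly 4.3 ms `rosmessage` call per loop pushes a 200 Hz topic to about 1.6, which is over budget.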
classdef subpubspeedMonitor < handle
    % Use: call in the MATLAB console, after initializing ROS:
    %
    %   SPM1 = subpubspeedMonitor()
    %
    % This creates an object which starts a repetitive task upon creation
    % and finally destructs itself after posting the results in the console.
    properties
        node
        subscriber
        publisher
        timestart
        messagetotal
    end
    methods
        function obj = subpubspeedMonitor()
            obj.node = ros.Node('subspeedmonitor1');
            obj.subscriber = ros.Subscriber(obj.node,'topic1','sensor_msgs/NavSatFix',{@obj.rosSubCallback});
            obj.publisher = ros.Publisher(obj.node,'topic1','sensor_msgs/NavSatFix');
            obj.timestart = tic;
            obj.messagetotal = 0;
            msg1 = rosmessage(obj.publisher);
            % Choose to evaluate the subscriber + publisher loop (1) or just counting (0)
            if 1
                send(obj.publisher,msg1);
            else
                countAndDisplay(obj)
            end
        end
        %% Test method one: repetitive publishing and subscribing
        function rosSubCallback(obj,~,msg_) % ~3.7 ms per loop for a simple publish+subscribe action
            % Latency to self is 0.05 ms on average, according to "pinging" in a terminal
            obj.messagetotal = obj.messagetotal+1;
            if obj.messagetotal < 10000
                %msg1 = rosmessage(obj.publisher); % this line adds 4.3 ms per loop
                msg_.Longitude = 51; % this line adds 0.25 ms per loop
                send(obj.publisher,msg_)
            else
                % Display some results
                timepassed = toc(obj.timestart);
                time_per_pubsub = timepassed/obj.messagetotal
                delete(obj);
            end
        end
        %% Test method two: simply counting
        function countAndDisplay(obj) % this costs 2.1318e-03 ms(!) per loop
            % Note: 10000 nested calls exceed MATLAB's default recursion limit (500),
            % so it must be raised first, e.g. set(0,'RecursionLimit',20000)
            obj.messagetotal = obj.messagetotal+1;
            if obj.messagetotal < 10000
                %msg1 = rosmessage(obj.publisher); % adds 3.6 ms per loop
                %i = 1 % adds 5.7532e-03 ms per loop
                %msg1 = rosmessage("std_msgs/Bool"); % adds 1.5 ms per loop
                countAndDisplay(obj);
            else
                % Display some results
                timepassed = toc(obj.timestart);
                time_per_count_FCN = timepassed/obj.messagetotal
                delete(obj);
            end
        end
        %% Destructor
        function delete(obj)
            delete(obj.subscriber)
            delete(obj.publisher)
            delete(obj.node)
        end
    end
end

Related

Improve code result speed by multiprocessing

I'm studying Python on my own, and this is my first code.
I'm working on analyzing logs from our servers. Usually I need to analyze a full day of logs. I created a script (this is an example with simple logic) just to check the speed. With plain sequential code, analyzing 20 million rows takes about 12-13 minutes. I need to process 200 million rows in 5 minutes.
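The target implies a large speedup. Taking 12.5 minutes as the midpoint of the quoted 12-13 minutes:

```latex
\frac{20\times10^{6}\ \text{rows}}{12.5\ \text{min}} = 1.6\times10^{6}\ \frac{\text{rows}}{\text{min}},
\qquad
\frac{200\times10^{6}\ \text{rows}}{5\ \text{min}} = 4\times10^{7}\ \frac{\text{rows}}{\text{min}},
\qquad
\frac{4\times10^{7}}{1.6\times10^{6}} = 25
```

So the job has to run roughly 25 times faster, or 24-26x over the quoted 12-13 minute range.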
What I tried:
Multiprocessing (I ran into an issue with shared memory, which I think I fixed). But the result is 300K rows = 20 sec, no matter how many processes I use. (PS: I also need to control the processor count in advance.)
Threading (I found that it doesn't give any speedup: 300K rows = 2 sec, the same as plain code.)
asyncio (I thought the script was slow because it needs to read many files). The result is the same as with threading: 300K rows = 2 sec.
In the end, I think all three of my scripts are incorrect and don't work correctly.
PS: I try to avoid specific third-party Python modules (like pandas) because that would make the script more difficult to run on different servers. It's better to use common libraries.
Please help me check the first one: multiprocessing.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager
file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}
def argument(m, a, n):
    proc_num = os.getpid()
    a_temp_m = a["vod_miss"]
    a_temp_h = a["vod_hit"]
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m[n] = a_temp_m[n] + 1
            elif j[3].find('HIT') != -1:
                a_temp_h[n] = a_temp_h[n] + 1
    a["vod_miss"][n] = a_temp_m[n]
    a["vod_hit"][n] = a_temp_h[n]
if __name__ == '__main__':
    procs = []
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    n = 1
    vod_live_cuts[i] = manager.list([0] * cpu)
    vod_live_cuts[ii] = manager.list([0] * cpu)
    for m in file:
        proc = Process(target=argument, args=(m, vod_live_cuts, (n-1)))
        procs.append(proc)
        proc.start()
        if n >= cpu:
            n = 1
            proc.join()
        else:
            n += 1
    [proc.join() for proc in procs]
    [proc.close() for proc in procs]
I expect each file to be processed by argument in an independent process, with all results finally saved in the dict vod_live_cuts. For each process I added an independent list to the dict, thinking it would make it safe for all processes to use this parameter. But maybe this is the wrong approach :(

Using IPC is costly, so only use "shared objects" for saving the final result, not for intermediate results while parsing the file.
Limiting the number of processes is done with a multiprocessing.Pool. The following code uses one to reach the maximum hard-disk speed; you only need to post-process the results.
You can only parse data as fast as your HDD can read it (typically 30-80 MB/s), so to improve performance further you should use an SSD or RAID0 for higher disk speed. You cannot get much faster than this without changing your hardware.
import csv
import os
from multiprocessing import Manager, Pool
file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}
def argument(m, a):
    a_temp_m_n = 0  # keep the counters local to the process,
    a_temp_h_n = 0  # because every access to a shared list is IPC
    with open(os.path.join(os.getcwd(), m), newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m_n = a_temp_m_n + 1
            elif j[3].find('HIT') != -1:
                a_temp_h_n = a_temp_h_n + 1
    # touch the shared lists only once, with the final per-file result
    a["vod_miss"].append(a_temp_m_n)
    a["vod_hit"].append(a_temp_h_n)
if __name__ == '__main__':
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1  # number of worker processes; raise it (e.g. to os.cpu_count()) to parse files in parallel
    vod_live_cuts[i] = manager.list()
    vod_live_cuts[ii] = manager.list()
    with Pool(cpu) as pool:
        tasks = []
        for m in file:
            task = pool.apply_async(argument, args=(m, vod_live_cuts))
            tasks.append(task)
        for task in tasks:
            task.get()  # waits for the task and re-raises any exception from the worker
    # one entry per file, in completion order; sum them when post-processing
    print(list(vod_live_cuts[i]))
    print(list(vod_live_cuts[ii]))
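The same principle (IPC only for final results) can be pushed one step further. Each worker can *return* its counts instead of writing to a `Manager` at all. The sketch below assumes, as in the code above, that the cache status is the 4th space-separated field. `count_file` and `count_logs` are hypothetical names:

```python
import csv
from collections import Counter
from multiprocessing import Pool


def count_file(path):
    """Count MISS/HIT rows in one log file using only process-local state."""
    counts = Counter()
    with open(path, newline='') as fh:
        for row in csv.reader(fh, delimiter=' '):
            if len(row) < 4:
                continue  # skip short/malformed lines instead of raising IndexError
            if 'MISS' in row[3]:
                counts['vod_miss'] += 1
            elif 'HIT' in row[3]:
                counts['vod_hit'] += 1
    return counts  # one small pickled object per file is the only IPC


def count_logs(paths, processes):
    """Spread files over a fixed-size pool and merge the per-file counters."""
    total = Counter()
    with Pool(processes) as pool:
        for counts in pool.imap_unordered(count_file, paths):
            total.update(counts)  # Counter.update adds counts together
    return total
```

Call `count_logs(["hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"], processes=4)` from inside an `if __name__ == '__main__':` block, which is required on platforms that spawn worker processes. `Pool(processes)` sets the number of worker processes in advance, as the question asked. The disk-speed limit from the answer still applies.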

Distributed Julia: parallel map (pmap) with a timeout / time limit for each map task to complete

My project involves computing a map in parallel using the pmap function from Julia's Distributed module.
Mapping a given element could take a few seconds, or it could take essentially forever. I want a timeout, or time limit, for each individual map task to complete.
If a map task finishes in time, great, return the result of the computation. If the task doesn't complete by the time limit, stop computation when the time limit has been reached, and return some value or message indicating a timeout occurred.
A minimal example follows. First, modules are imported and worker processes are launched:
num_procs = 1
using Distributed
if num_procs > 1
    # The main process (no calling addprocs) can be used for `pmap`:
    addprocs(num_procs-1)
end
Next, the mapping task is defined on all worker processes. The mapping task should time out after 1 second:
@everywhere import Random
@everywhere begin
    """
    Compute stuff for `wait_time` seconds, and return `wait_time`.
    If `timeout` seconds elapses, stop computation and return something else.
    """
    function waitForTimeUnlessTimeout(wait_time, timeout=1)
        # < Insert some sort of timeout code? >
        # This block of code simulates a long computation.
        # (pretend the computation time is unknown)
        t0 = time()
        x = 0
        while time()-t0 < wait_time
            x += Random.rand() - 0.5
        end
        # computation completed before time limit. Return wait_time.
        round(wait_time, digits=2)
    end
end
The function that executes the parallel map (pmap) is defined on the main process. Each map task randomly takes up to 2 seconds to complete, but should time out after 1 second.
function myParallelMapping(num_tasks = 20, max_runtime=2)
    # random task runtimes between 0 and max_runtime
    runtimes = Random.rand(num_tasks) * max_runtime
    # return the parallel computation of the mapping tasks
    pmap((runtime)->waitForTimeUnlessTimeout(runtime), runtimes)
end
print(myParallelMapping())
How should this time-limited parallel map be implemented?

You could put something like this inside your pmap body:
pmap(runtimes) do runtime
    t0 = time()
    task = @async waitForTimeUnlessTimeout(runtime)
    while !istaskdone(task) && time()-t0 < time_limit
        sleep(1)
    end
    istaskdone(task) && (return fetch(task))
    error("time over")
end
Also note that (runtime)->waitForTimeUnlessTimeout(runtime) is the same as just waitForTimeUnlessTimeout.

Following Fredrik Bagge's very helpful answer, here is the full working example implementation with some extra explanation.
num_procs = 8
using Distributed
if num_procs > 1
    addprocs(num_procs-1)
end
@everywhere import Random
@everywhere begin
    function waitForTime(wait_time)
        # This code block simulates a long computation.
        # Pretend the computation time is unknown.
        t0 = time()
        x = 0
        while time()-t0 < wait_time
            x += Random.rand() - 0.5
            yield() # CRITICAL: releases control so the time checker can run.
            # If you comment out yield(), you will see that the timeout doesn't work!
        end
        return round(wait_time, digits=2)
    end
end
function myParallelMapping(num_tasks = 16, max_runtime=2, time_limit=1)
    # random task runtimes between 0 and max_runtime
    runtimes = Random.rand(num_tasks) * max_runtime
    # parallel compute the mapping tasks. See "do block" in
    # the Julia documentation, it's just syntactic sugar.
    return pmap(runtimes) do runtime
        t0 = time()
        task = @async waitForTime(runtime)
        while !istaskdone(task) && time()-t0 < time_limit
            # releases computation to waitForTime
            sleep(0.1)
            # nothing past here will run until waitForTime calls yield()
            # *and* 0.1 seconds have passed.
        end
        # equal to if istaskdone(task); return fetch(task); end
        istaskdone(task) && (return fetch(task))
        return "TimeOut"
        # `return error("TimeOut")` halts pmap unless pmap is
        # given an error handler argument. See pmap documentation.
    end
end
The output is
julia> print(myParallelMapping())
Any["TimeOut", "TimeOut", 0.33, 0.35, 0.56, 0.41, 0.08, 0.14, 0.72,
"TimeOut", "TimeOut", "TimeOut", 0.52, "TimeOut", 0.33, "TimeOut"]
Note that there are two tasks per process in this example. The original task (the "time checker") checks every 0.1 seconds whether the other task has finished its computation. The other task (created with @async) computes something and periodically calls yield() to release control to the time checker. If it doesn't call yield(), time checking cannot occur.
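For comparison, the same pattern looks like this in Python's asyncio: a deadline enforced by a checker that only gets control when the worker yields. This is a single-threaded analogue, not a translation of `pmap`. The names `wait_for_time`, `with_time_limit` and `run_all` are hypothetical. `asyncio.wait_for` plays the role of the time-checker loop, and `await asyncio.sleep(0)` plays the role of Julia's `yield()`:

```python
import asyncio
import random
import time


async def wait_for_time(wait_time):
    """Busy 'computation' of unknown length that yields to the event loop."""
    t0 = time.monotonic()
    x = 0.0
    while time.monotonic() - t0 < wait_time:
        x += random.random() - 0.5
        # Like Julia's yield(): give the event loop a chance to run the
        # timeout check. Without an await here, the limit could not fire
        # until the whole computation had finished.
        await asyncio.sleep(0)
    return round(wait_time, 2)


async def with_time_limit(wait_time, time_limit):
    """Return the result, or "TimeOut" if time_limit seconds pass first."""
    try:
        return await asyncio.wait_for(wait_for_time(wait_time), timeout=time_limit)
    except asyncio.TimeoutError:
        return "TimeOut"  # wait_for has already cancelled the computation


def run_all(runtimes, time_limit=1.0):
    """Run every task concurrently on one event loop, each with its own limit."""
    async def main():
        return await asyncio.gather(
            *(with_time_limit(r, time_limit) for r in runtimes)
        )
    return asyncio.run(main())
```

`run_all([0.01, 0.5], time_limit=0.2)` should return `[0.01, 'TimeOut']`. One design difference: `wait_for` cancels the timed-out computation. In the Julia version above, the `@async` task keeps running in the background after the checker gives up.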

Reading in parallel from a generator in Keras

I have a big dataset divided into files.
I would like to read and process my data one file at a time, and for this I have this Keras generator:
import random
def myGenerator():
    # uses an existing Spark `sqlContext`
    while True:
        rnd = random.randint(1, 200)
        rndPadded = str(rnd).rjust(5, '0')
        nSearchesInBatch = 100
        f = "path/part-" + rndPadded + "*"  # read one block of data
        data = sqlContext.read.load(f).toPandas()
        # number of batches that will be created sequentially from the generator
        imax = int(data.shape[0] / nSearchesInBatch)
        for i in range(imax):
            data_batch = data[i*nSearchesInBatch:(i+1)*nSearchesInBatch]
            features = data_batch['features']
            output = data_batch['output']
            yield features, output
The problem is that reading takes most of the time (each file is around 200 MB), and meanwhile the GPU sits idle. Is it possible to pre-read the next file while the GPU is training on the previous one?
At the moment, one file is read and split into steps (the inner loop) while the CPUs are idle and the GPU is training. As soon as the epoch finishes, the GPU goes idle and the CPU starts reading, which takes 20-30 seconds.
Is there any way to parallelize this?
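One common way to overlap reading with training is to run the loading side in a background thread and hand results over through a bounded queue. The sketch below is framework-agnostic and uses only the standard library. `prefetch` is a hypothetical helper, not a Keras API:

```python
import queue
import threading


def prefetch(generator, max_prefetch=1):
    """Yield items from `generator`, producing them in a background thread.

    Up to `max_prefetch` items are buffered, so the producer works on the
    next item while the consumer (e.g. a training step) uses the current one.
    The thread starts on the first next() call.
    """
    buffer = queue.Queue(maxsize=max_prefetch)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def producer():
        try:
            for item in generator:
                buffer.put(item)  # blocks while the buffer is full
        except Exception as exc:  # hand errors over to the consumer
            errors.append(exc)
        finally:
            buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            if errors:
                raise errors[0]
            return
        yield item
```

To hide the file reads, prefetch at the file level rather than the batch level. For example, wrap a hypothetical `load_blocks()` generator that yields one loaded DataFrame per file (`for data in prefetch(load_blocks()):`) and split each DataFrame into batches in the consuming loop. The next file is then read while batches from the current one are trained on. This helps mainly when loading waits on I/O or runs outside the Python interpreter. CPU-heavy Python conversion (such as `toPandas()`) can still compete for the GIL, so it is worth measuring.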

Calculate time remaining with different length of variables

I have to admit the title of this question is poor... I couldn't come up with a better description. Let me try to give an example.
I have about 2,700 customers who at one time had my software installed on their servers; 1,500 or so still do. Basically, I'm running an automatic diagnostic to help weed out people who have uninstalled the software or who have problems with it that we can assist with. Currently, we use cURL to fetch each customer's website for our software and look for a header in the response.
We have 8 different statuses that are returned
GREEN - Everything works (usually pretty quick 0.5 - 2 seconds)
RED - Software not found (usually the longest from 5 - 15 seconds)
BLUE - Software found but not activated (usually from 3 - 9 seconds)
YELLOW - Server IP mismatch (usually from 1 - 3 seconds)
ORANGE - Server IP mismatch and wrong software type (usually 5 - 10 seconds)
PURPLE - Activation key incorrect (usually within 2 seconds)
BLACK - Domain returns 404 - No longer exists (usually within a second)
UNK - Connection failed (usually due to our load balancer -- VERY rare) (never encountered this yet)
Basically, a cron job starts the process by pulling the domain and product type. It then cURLs the domain and starts cycling through the status colors above.
While this is happening, we have an AJAX page that returns the results so we can keep an eye on the status. The major problem is that the Time Remaining is so volatile that it does not give a good estimate. Here is the current math:
# Number of accounts between NOW and when started
$completedAccounts = floor($parseData[2]*($parseData[1]/100));
# Number of seconds between NOW and when started
$completedTime = strtotime("now") - strtotime("$hour:$minute:$second");
# Avg number of seconds per account
$avgPerCompleted = $completedTime / $completedAccounts;
# Total number of remaining accounts to be scanned
$remainingAccounts = $parseData[2] - $completedAccounts;
# The total of seconds remaining for all of the remaining accounts
$remainingSeconds = $remainingAccounts * $avgPerCompleted;
$remainingTime = format_time($remainingSeconds, ":");
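For reference, here is the estimate above restated as a small Python function, so the arithmetic can be checked in isolation. It mirrors the PHP. Reading `$parseData[2]` as the total account count and `$parseData[1]` as the percentage done is an assumption from the code, and the function name is hypothetical:

```python
import math


def remaining_seconds(total_accounts, percent_done, elapsed_seconds):
    """Estimate seconds left as (accounts left) x (average seconds per finished account)."""
    completed = math.floor(total_accounts * (percent_done / 100))
    if completed == 0:
        return None  # nothing finished yet; the PHP version would divide by zero here
    avg_per_completed = elapsed_seconds / completed
    return (total_accounts - completed) * avg_per_completed
```

For example, 10% of 2,700 accounts done after 540 seconds gives 2 seconds per account and 4,860 seconds remaining.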
I could keep a count of all the green, red, blue, etc. results, average how long each color takes, and then use that for the average time, although I don't believe that would give much better results.
With times that vary this much, any suggestions would be appreciated.
Thanks,
Jeff

OK, I believe I have figured it out. I had to create a class so I could calculate a simple linear regression over a period of time.
function calc() {
    $n = count($this->mDatas);
    if ($n == 0) {
        return array(0, 0); // no data yet: avoid dividing by zero below
    }
    $vSumXX = $vSumXY = $vSumX = $vSumY = 0;
    $vCnt = 0; // for time-series, start at t=0
    foreach ($this->mDatas AS $vOne) {
        if (is_array($vOne)) { // x,y pair
            list($x,$y) = $vOne;
        } else { // time-series
            $x = $vCnt; $y = $vOne;
        }
        $vSumXY += $x*$y;
        $vSumXX += $x*$x;
        $vSumX += $x;
        $vSumY += $y;
        $vCnt++;
    }
    // least-squares slope $a and intercept $b of y = a*x + b
    $vTop = ($n*$vSumXY - $vSumX*$vSumY);
    $vBottom = ($n*$vSumXX - $vSumX*$vSumX);
    $a = $vBottom!=0?$vTop/$vBottom:0;
    $b = ($vSumY - $a*$vSumX)/$n;
    return array($a,$b);
}
I take each account and build an array of the time each one takes. The array then runs through this calculation, which builds the x and y time sets. Finally, I run the array through the predict function.
/** given x, return the prediction y */
function calcpredict($x) {
    list($a,$b) = $this->calc();
    $y = $a*$x+$b;
    return $y;
}
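The `calc()`/`calcpredict()` pair is an ordinary least-squares line fit. Below is a Python restatement of its time-series branch (x = 0, 1, 2, ...), useful for checking the formulas on their own. `linear_fit` and `predict` are hypothetical names:

```python
def linear_fit(ys):
    """Least-squares line y = a*x + b through the points (t, ys[t]), t = 0, 1, 2, ...

    Uses the same sums as the PHP calc(): a is the slope, b the intercept.
    """
    n = len(ys)
    if n == 0:
        return 0.0, 0.0  # no data: avoid dividing by n below
    sum_x = sum_y = sum_xy = sum_xx = 0.0
    for x, y in enumerate(ys):
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_xx += x * x
    bottom = n * sum_xx - sum_x * sum_x
    # bottom is 0 only for a single point here; fall back to slope 0 as the PHP does
    a = (n * sum_xy - sum_x * sum_y) / bottom if bottom != 0 else 0.0
    b = (sum_y - a * sum_x) / n
    return a, b


def predict(ys, x):
    """Evaluate the fitted line at x, like calcpredict()."""
    a, b = linear_fit(ys)
    return a * x + b
```

`linear_fit([1, 3, 5, 7])` returns `(2.0, 1.0)`, so `predict([1, 3, 5, 7], 5)` returns `11.0`.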
I put static values in so you could see the results:
$eachTime = array(7,1,.5,12,11,6,3,.24,.12,.28,2,1,14,8,4,1,.15,1,12,3,8,4,5,8,.3,.2,.4,.6,4,5);
$forecastProcess = new Linear($eachTime);
$forecastTime = $forecastProcess->calcpredict(5);
Overall, this system gives me a difference of about 0.003 over 10 accounts and about 2.6 over 2,700 accounts. The next step will be to calculate the accuracy.
Thanks for trying, guys and gals.

Testing time critical code

I've written a feature for my library Rubikon that displays a throbber (a spinning line, as you may have seen in other console apps) as long as some other code is running.
To test this feature, I capture the output of the throbber in a StringIO and compare it with the expected value. Because the throbber is only displayed while the other code is running, the content of the IO gets longer the longer the code runs. In my tests I do a simple sleep 1, which should give a constant 1-second delay. This works most of the time, but sometimes (apparently due to external factors like heavy CPU load) it fails. The code then runs for slightly more than 1 second, so the throbber prints a few additional characters.
My question is: is there any way to test such time-critical features in Ruby?

In your GitHub repository, I found this test for the Throbber class:
should 'work correctly' do
  ostream = StringIO.new
  thread = Thread.new { sleep 1 }
  throbber = Throbber.new(ostream, thread)
  thread.join
  throbber.join
  assert_equal " \b-\b\\\b|\b/\b", ostream.string
end
I'll assume that a throbber iterates over ['-', '\\', '|', '/'] once per second, backspacing before each write. Consider the following test:
should 'work correctly' do
  ostream = StringIO.new
  started_at = Time.now
  ended_at = nil
  thread = Thread.new { sleep 1; ended_at = Time.now }
  throbber = Throbber.new(ostream, thread)
  thread.join
  throbber.join
  duration = ended_at - started_at
  iterated_chars = " -\\|/"
  expected = ""
  if duration >= 1
    # After n seconds we should have n copies of " -\\|/", excluding \b for now
    expected << iterated_chars * duration.to_i
  end
  # Next append the characters we'd get from working for fractions of a second:
  remainder = duration - duration.to_i
  expected << iterated_chars[0..((iterated_chars.length*remainder).to_i)] if remainder > 0.0
  expected = expected.split('').join("\b") + "\b"
  assert_equal expected, ostream.string
end
The last assignment of expected is a bit unpleasant, but I made the assumption that the throbber would write character/backspace pairs atomically. If this is not true, you should be able to insert the \b escape sequence into the iterated_chars string and remove the last assignment entirely.
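To check the expected-string logic on its own, here is the same computation ported to Python. It keeps the answer's assumptions: one full pass through the five characters per second and a backspace after every character. `expected_throbber_output` is a hypothetical name:

```python
def expected_throbber_output(duration, chars=" -\\|/"):
    """Port of the Ruby test's expected-string computation for a measured duration."""
    expected = ""
    whole = int(duration)
    if duration >= 1:
        # After n seconds we should have n full copies of chars
        expected += chars * whole
    remainder = duration - whole
    if remainder > 0.0:
        # Ruby's inclusive range chars[0..k] is chars[:k + 1] in Python
        expected += chars[: int(len(chars) * remainder) + 1]
    # Interleave a backspace after every character, as in split('').join("\b") + "\b"
    return "\b".join(expected) + "\b"
```

`expected_throbber_output(1.0)` reproduces the original fixed string `" \b-\b\\\b|\b/\b"`. A measured duration of 1.5 seconds adds three more character/backspace pairs.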

This question is similar (I think, although I'm not completely sure) to this one:
"Only a real-time operating system can give you such precision. You can assume Thread.Sleep has a precision of about 20 ms, so you could, in theory, sleep until the desired time minus about 20 ms and THEN spin for 20 ms, but you'll have to waste those 20 ms. And even that doesn't guarantee that you'll get real-time results: the scheduler might take your thread out just when it was about to execute the RELEVANT part (just after spinning)."
The problem is probably not Ruby (though I'm no Ruby expert); the problem is the real-time capabilities of your operating system.
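The "sleep, then spin" idea from the quote can be sketched in Python as follows. The 20 ms margin is the quote's rough figure, not a measured value. As the quote warns, this reduces but does not eliminate lateness on a non-real-time OS. `sleep_until` is a hypothetical name:

```python
import time


def sleep_until(deadline, spin_margin=0.02):
    """Sleep coarsely, then busy-wait the last `spin_margin` seconds.

    `deadline` is a time.monotonic() value. Returns how late we woke up, in seconds.
    """
    remaining = deadline - time.monotonic()
    if remaining > spin_margin:
        time.sleep(remaining - spin_margin)  # coarse; may overshoot under load
    while time.monotonic() < deadline:
        pass  # spin: burns CPU but reacts as soon as the deadline passes
    return time.monotonic() - deadline  # lateness (>= 0)
```

In a test, the returned lateness can be logged or checked against a tolerance instead of assuming the delay is exact.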
