Profiling parallel code in Julia

How can we profile parallel code in Julia? This question has been asked before, and I followed the advice given there to call the profiler on each worker, but it does not work:
function profileSlaveTask(param)
    @profile slaveTask(param)
    return Profile.retrieve()
end

rrefs = Array(RemoteRef, length(workers()))
machines = workers()
for i = 1:length(machines)
    rrefs[i] = @spawnat machines[i] profileSlaveTask(initdamp)
end
pres = fetch(rrefs[1])

using ProfileView
ProfileView.view(pres[1], lidict=pres[2])
Using ProfileView I obtain: (screenshot omitted)

Works just fine for me (julia 0.4.0-dev, Ubuntu 14.04):
p = addprocs(1)[1]

@everywhere function profile_svd(A)
    println("starting profile_svd")
    @profile svd(sdata(A))
    println("done with svd")
    Profile.retrieve()
end

println("about to allocate SharedArray")
A = SharedArray(Float64, 1000, 1000)
println("about to fill SharedArray")
rand!(sdata(A))
println("about to call worker")
bt, lidict = remotecall_fetch(p, profile_svd, A)
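Once bt and lidict are back on the master process, they can be visualized like locally collected profile data; a minimal sketch, assuming ProfileView is installed:

using ProfileView
ProfileView.view(bt, lidict=lidict)  # flame graph of the worker's backtraces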

Related

Lua: Why didn't the function get called?

Okay, I am coding a cheat for Roblox in Lua (just for fun).
I created a function, but the function never got called!
I used the Kavo UI library for the window.
Everything in my code looks the same as before; the only change is that I replaced function() with aimbot()!
If I run this, the UI doesn't show.
But if I delete the function and the section, it shows!
How do I fix it?
local Library = loadstring(game:HttpGet("https://raw.githubusercontent.com/xHeptc/Kavo-UI-Library/main/source.lua"))()
local Window = Library.CreateLib("Ghoster", "Synapse")
local Combat = Window:NewTab("Combat")
local Movement = Window:NewTab("Movement")
local Exploit = Window:NewTab("Exploits")
local CombatSection = Combat:NewSection("Combat Options")
local MovementSection = Movement:NewSection("Movement Options")
local ExploitSection = Exploit:NewSection("Exploit Options")
function aimbot()
loadstring(game:HttpGet(('https://gitlab.com/marsscripts/marsscripts/-/raw/master/CCAimbotV2'),true))()
end
CombatSection:NewButton("Aimbot", "Aims your enemies", aimbot()
print("Loaded Aimbot")
end
The problem lies in misunderstanding the following syntax:
CombatSection:NewButton("Aimbot", "Aims your enemies", function()
    print("Loaded Aimbot")
end)
CombatSection:NewButton receives 3 arguments: "Aimbot", "Aims your enemies", and an anonymous function.
Just to illustrate what you did, let's put that anonymous function into a variable instead:
local yourFunction = function()
    print("Loaded Aimbot")
end
And you changed that to this (invalid) code:
local yourFunction = aimbot()
print("Loaded Aimbot")
end
What you actually wanted is to pass aimbot as a function:
CombatSection:NewButton("Aimbot", "Aims your enemies", aimbot)
Or call aimbot from within that anonymous function:
CombatSection:NewButton("Aimbot", "Aims your enemies", function()
    aimbot()
    print("Loaded Aimbot")
end)

In the PyTorch Distributed Data Parallel (DDP) tutorial, how does `setup` know its rank?

In the tutorial Getting Started with Distributed Data Parallel, how does the setup() function know the rank when mp.spawn() doesn't pass it?
def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    # initialize the process group
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

def demo_basic(rank, world_size):
    print(f"Running basic DDP example on rank {rank}.")
    setup(rank, world_size)
    .......

def run_demo(demo_fn, world_size):
    mp.spawn(demo_fn,
             args=(world_size,),
             nprocs=world_size,
             join=True)

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    assert n_gpus >= 2, f"Requires at least 2 GPUs to run, but got {n_gpus}"
    world_size = n_gpus
    run_demo(demo_basic, world_size)
mp.spawn does pass the rank to the function it calls.
From the torch.multiprocessing.spawn docs:
torch.multiprocessing.spawn(fn, args=(), nprocs=1, join=True, daemon=False, start_method='spawn')
...
fn (function) - Function called as the entrypoint of the spawned process. This function must be defined at the top level of a module so it can be pickled and spawned. This is a requirement imposed by multiprocessing. The function is called as fn(i, *args), where i is the process index and args is the passed-through tuple of arguments.
So when spawn invokes fn it passes it the process index as the first argument.
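A minimal sketch of that mechanism (the function work and its msg argument are made up for illustration):

import torch.multiprocessing as mp

def work(i, msg):
    # i is the process index injected by mp.spawn; msg comes from the args tuple
    print(f"rank {i} received: {msg}")

if __name__ == "__main__":
    # with nprocs=2, work is called as work(0, "hello") and work(1, "hello")
    mp.spawn(work, args=("hello",), nprocs=2, join=True)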

How to run a function in parallel with the Julia language?

I am trying to figure out how to work with parallel computing in Julia. The documentation looks great, even for someone like me who has never worked with parallel computing (and who does not understand most of the concepts behind the documentation ;)).
Just to mention: I am working on a PC with Ubuntu, with a 4-core processor.
To run the code described below, I start the Julia terminal as:
$ julia -p 4
I am following the documentation here, and I am facing some problems with the examples described in this section.
I am trying to run the following piece of code:
@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)

function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(advection_shared_chunk!, p, q, u)
        end
    end
    q
end

q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))

# Run once to JIT-compile
advection_shared!(q, u)
But I am facing the following error:
ERROR: MethodError: `remotecall_wait` has no method matching remotecall_wait(::Function, ::Int64, ::SharedArray{Float64,3}, ::SharedArray{Float64,3})
Closest candidates are:
remotecall_wait(::LocalProcess, ::Any, ::Any...)
remotecall_wait(::Base.Worker, ::Any, ::Any...)
remotecall_wait(::Integer, ::Any, ::Any...)
in anonymous at task.jl:447
...and 3 other exceptions.
in sync_end at ./task.jl:413
[inlined code] from task.jl:422
in advection_shared! at none:2
What am I doing wrong here? As far as I know I am just reproducing the example in the docs... or not?
Thanks for any help,
Thanks @Daniel Arndt, you found the trick! I was looking at the docs at http://docs.julialang.org/en/latest/manual/parallel-computing/ and assumed they covered Julia 0.4.x (the latest stable version so far), but it seems they cover Julia 0.5.x (the latest development version).
I made the changes you suggested (swapped the argument order and added the functions that were missing) and everything worked like a charm. I will leave the updated code here:
# Here's the kernel
@everywhere function advection_chunk!(q, u, irange, jrange, trange)
    @show (irange, jrange, trange)  # display so we can see what's happening
    for t in trange, j in jrange, i in irange
        q[i,j,t+1] = q[i,j,t] + u[i,j,t]
    end
    q
end

# This function returns the (irange,jrange) indexes assigned to this worker
@everywhere function myrange(q::SharedArray)
    idx = indexpids(q)
    if idx == 0
        # This worker is not assigned a piece
        return 1:0, 1:0
    end
    nchunks = length(procs(q))
    splits = [round(Int, s) for s in linspace(0, size(q,2), nchunks+1)]
    1:size(q,1), splits[idx]+1:splits[idx+1]
end

@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)

function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(p, advection_shared_chunk!, q, u)
        end
    end
    q
end

q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))

# Run once to JIT-compile
advection_shared!(q, u)
Done!
I don't believe you are doing anything wrong, other than that you are likely using a newer version of the docs (or we're seeing different things!).
Let's make sure you're using Julia 0.4.x and these docs: http://docs.julialang.org/en/release-0.4/manual/parallel-computing/
In Julia v0.5.0 the order of the first two parameters of remotecall_wait was changed. Switch the order to remotecall_wait(p, advection_shared_chunk!, q, u) and you should be on to your next error (myrange is not defined; it can be found earlier in the docs).
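For reference, here are the two call orders side by side (just the changed line, using the names from the example above):

# Julia 0.4.x: process id first
remotecall_wait(p, advection_shared_chunk!, q, u)
# Julia 0.5.x docs: function first
remotecall_wait(advection_shared_chunk!, p, q, u)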

How do I mock AWS SDK (v2) with rspec?

I have a class which reads and processes messages from an SQS queue using the aws-sdk-rails gem (a wrapper around aws-sdk-ruby v2). How do I mock the AWS calls so I can test my code without hitting the external services?
communicator.rb:
class Communicator
  def consume_messages
    sqs_client = Aws::SQS::Client.new
    # consume messages until the queue is empty
    loop do
      r = sqs_client.receive_message({
        queue_url: "https://sqs.region.amazonaws.com/xxxxxxxxxxxx/foo",
        visibility_timeout: 1,
        max_number_of_messages: 1
      })
      break if r.messages.length == 0
      # process r.messages.first.body
      sqs_client.delete_message({
        queue_url: "https://sqs.region.amazonaws.com/xxxxxxxxxxxx/foo",
        receipt_handle: r.messages.first.receipt_handle
      })
    end
  end
end
The AWS SDK already provides stubbing; see the official ClientStubs documentation at http://docs.aws.amazon.com/sdkforruby/api/Aws/ClientStubs.html for more information.
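The built-in stubs avoid any rspec plumbing; a minimal sketch (the stubbed message body is made up for illustration):

require 'aws-sdk'

# stub_responses: true makes the client return canned data instead of calling AWS
sqs = Aws::SQS::Client.new(stub_responses: true)
sqs.stub_responses(:receive_message, messages: [
  { body: '{"example":"body"}', receipt_handle: "receipt_handle" }
])

r = sqs.receive_message(queue_url: "https://sqs.region.amazonaws.com/xxxxxxxxxxxx/foo")
puts r.messages.first.body  # => {"example":"body"}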
I had a hard time finding examples of mocking AWS resources. I spent a few days figuring it out and wanted to share my results on Stack Overflow for posterity. I used rspec-mocks (doubles & verifying doubles). Here's an example built around the communicator.rb class from the question.
communicator_spec.rb:
RSpec.describe Communicator do
  describe "#consume_messages" do
    it "can use rspec doubles & verifying doubles to mock AWS SDK calls" do
      sqs_client = instance_double(Aws::SQS::Client)
      allow(Aws::SQS::Client).to receive(:new).and_return(sqs_client)

      SQSResponse = Struct.new(:messages)
      SQSMessage = Struct.new(:body, :receipt_handle)
      response = SQSResponse.new([SQSMessage.new(File.read('data/expected_body.json'), "receipt_handle")])
      empty_response = SQSResponse.new([])

      allow(sqs_client).to receive(:receive_message).
        and_return(response, empty_response)
      allow(sqs_client).to receive(:delete_message).and_return(nil)

      Communicator.new.consume_messages
    end
  end
end

Using parfor and labSend/labReceive

I want to run two MATLAB scripts in parallel and communicate between them. The purpose is to have one script do image analysis and send the results to the other, which will use them for further calculations (time consuming, but not related to the task of finding things in the images). Since both tasks are time consuming, and should preferably be done in real time, I believe parallelization is necessary.
To get a feel for how this should be done, I created a test script to find out how to communicate between the two scripts.
The first script takes a user input using the built-in function input, and then sends it with labSend to the other, which receives it and prints it.
function [blarg] = inputStuff(blarg)
mpiInit(); % added because of error message, but does not work...
for i = 1:2
    labBarrier; % added because of error message
    inp = input('Enter a number to write');
    labSend(inp);
    if (inp == 0)
        break;
    else
        i = 1;
    end
end
end

function [blarg] = testWrite(blarg)
mpiInit(); % added because of error message, but does not help
par = 0;
if (blarg == 0)
    par = 1;
end
for i = 1:10
    if (par == 1)
        labBarrier
        delta = labReceive();
        i = 1;
    else
        delta = input('Enter number to write');
    end
    if (delta == 0)
        break;
    end
    s = strcat('This lab no', num2str(labindex), '. Delta is = ')
    delta
end
end

%% This is the file test_parfor.m
funlist = {@inputStuff, @testWrite};
matlabpool(2);
mpiInit(); % added because of error message, but does not help
parfor i = 1:2
    funlist{i}(0);
end
matlabpool close;
Then, when the code is run, the following error message appears:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
Error using parallel_function (line 589)
The MPI implementation has not yet been loaded. Please
call mpiInit.
Error stack:
testWrite.m at 11
Error in test_parfor (line 8)
parfor i=1:2
Calling the method mpiInit does not help (called as shown in the code above). None of the examples in the MathWorks documentation, or on their website, show this error or explain what to do about it.
Any help is appreciated!
You would typically use constructs such as labSend, labReceive and labBarrier within an spmd block, rather than a parfor block.
parfor is intended for implementing embarrassingly parallel algorithms, in other words algorithms that consist of multiple independent tasks that can be run in parallel, and do not require communication between tasks.
I'm stretching my knowledge here (perhaps someone more expert can correct me), but as I understand things, parfor does not set up an MPI ring for communication between workers, which is probably the explanation for the (rather uninformative) error message you're getting.
An spmd block enables communication between workers using labSend, labReceive and labBarrier. There are quite a few examples of using them all in the documentation.
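A minimal sketch of point-to-point communication inside spmd (run with an open matlabpool; the value 42 is just for illustration):

spmd
    if labindex == 1
        labSend(42, 2);    % send the value 42 to worker 2
    elseif labindex == 2
        x = labReceive(1); % receive from worker 1
        fprintf('worker 2 got %d\n', x);
    end
end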
Sam is right that the MPI functionality is not enabled during parfor, only during spmd. You need to do something more like this:
spmd
    funlist{labindex}(0);
end
(Sam is also quite right that the error message you saw is pretty unhelpful.)
