Recommendations for workflow when debugging Python scripts employing multiprocessing? - debugging

I use the Spyder IDE. Usually, when I am running non-parallelized scripts, I tend to debug using print statements. Depending on which statements are printed (or not), I can see where errors are occurring.
For example:
print "Started while loop..."
doWhileLoop = False
while doWhileLoop == True:
print "Doing something important!"
time.sleep(5)
print "Finished while loop..."
Above, I am missing a line that changes doWhileLoop to False at some point, so I will be stuck perpetually in the while loop, but my print statements let me see where it is in my code that I have hung up.
However, when running scripts that are parallelized, I get no output to the console until after the process has finished. Normally, what I do in this case is attempt to debug with a single process (i.e. temporarily deparallelize the program by running only one task, for instance), but currently, I am dealing with an error that seems to occur only when I am running more than one task.
So, I am having trouble figuring out what this error is using my usual methods -- how should I change my usual debugging practice in order to efficiently debug scripts employing multiprocessing?

Like #roippi said, debugging parallel things is hard. Another tool is using logging over print. Logging gives you severity, timestamps, and most importantly which process is doing something.
Example code:
import logging, multiprocessing, Queue
def myproc(arg):
return arg*2
def worker(inqueue, outqueue):
mylog = multiprocessing.get_logger()
mylog.info('start')
for job in iter(inqueue.get, 'STOP'):
mylog.info('got %s', job)
try:
outqueue.put( myproc(job), timeout=1 )
except Queue.Full:
mylog.error('queue full!')
mylog.info('done')
def executive(inqueue):
total = 0
mylog = multiprocessing.get_logger()
for num in iter(inqueue.get, 'STOP'):
total += num
mylog.info('got {}\ttotal{}', job, total)
logger = multiprocessing.log_to_stderr(
level=logging.INFO,
)
logger.info('setup')
inqueue, outqueue = multiprocessing.Queue(), multiprocessing.Queue()
if 0: # debug 'queue full!' issues
outqueue = multiprocessing.Queue(maxsize=1)
# prefill with 3 jobs
for num in range(3):
inqueue.put(num)
# signal end of jobs
inqueue.put('STOP')
worker_p = multiprocessing.Process(
target=worker, args=(inqueue, outqueue),
name='worker',
)
worker_p.start()
worker_p.join()
logger.info('done')
Example output:
[INFO/MainProcess] setup
[INFO/worker] child process calling self.run()
[INFO/worker] start
[INFO/worker] got 0
[INFO/worker] got 1
[INFO/worker] got 2
[INFO/worker] done
[INFO/worker] process shutting down
[INFO/worker] process exiting with exitcode 0
[INFO/MainProcess] done
[INFO/MainProcess] process shutting down

Related

Python multiprocessing Pool map and imap

I have a multiprocessing script with pool.map that works. The problem is that not all processes take as long to finish, so some processes fall asleep because they wait until all processes are finished (same problem as in this question). Some files are finished in less than a second, others take minutes (or hours).
If I understand the manual (and this post) correctly, pool.imap is not waiting for all the processes to finish, if one is done, it is providing a new file to process. When I try that, the script is speeding over the files to process, the small ones are processed as expected, the large files (that take more time to process) don't finish until the end (are killed without notice ?). Is this normal behavior for pool.imap, or do I need to add more commands/parameters ? When I add the time.sleep(100) in the else part as test, it is processing more large files but the other processes fall asleep. Any suggestions ? Thanks
def process_file(infile):
#read infile
#compare things in infile
#acquire Lock, save things in outfile, release Lock
#delete infile
def main():
#nprocesses = 8
global filename
pathlist = ['tmp0', 'tmp1', 'tmp2', 'tmp3', 'tmp4', 'tmp5', 'tmp6', 'tmp7', 'tmp8', 'tmp9']
for d in pathlist:
os.chdir(d)
todolist = []
for infile in os.listdir():
todolist.append(infile)
try:
p = Pool(processes=nprocesses)
p.imap(process_file, todolist)
except KeyboardInterrupt:
print("Shutting processes down")
# Optionally try to gracefully shut down the worker processes here.
p.close()
p.terminate()
p.join()
except StopIteration:
continue
else:
time.sleep(100)
os.chdir('..')
p.close()
p.join()
if __name__ == '__main__':
main()
Since you already put all your files in a list, you could put them directly into a queue. The queue is then shared with your sub-processes that take the file names from the queue and do their stuff. No need to do it twice (first into list, then pickle list by Pool.imap). Pool.imap is doing exactly the same but without you knowing it.
todolist = []
for infile in os.listdir():
todolist.append(infile)
can be replaced by:
todolist = Queue()
for infile in os.listdir():
todolist.put(infile)
The complete solution would then look like:
def process_file(inqueue):
for infile in iter(inqueue.get, "STOP"):
#do stuff until inqueue.get returns "STOP"
#read infile
#compare things in infile
#acquire Lock, save things in outfile, release Lock
#delete infile
def main():
nprocesses = 8
global filename
pathlist = ['tmp0', 'tmp1', 'tmp2', 'tmp3', 'tmp4', 'tmp5', 'tmp6', 'tmp7', 'tmp8', 'tmp9']
for d in pathlist:
os.chdir(d)
todolist = Queue()
for infile in os.listdir():
todolist.put(infile)
process = [Process(target=process_file,
args=(todolist) for x in range(nprocesses)]
for p in process:
#task the processes to stop when all files are handled
#"STOP" is at the very end of queue
todolist.put("STOP")
for p in process:
p.start()
for p in process:
p.join()
if __name__ == '__main__':
main()

Is it reasonable to use resque(ruby) to manage external long-running commands (and log tasks)

I have to run bash heavy-job.sh <data-num> (that takes 0.5~2 days) frequently on my computer to process data located at ~/a/data/num . The script call a few sub-processes sequentially and write a log to ~/a/result/num.log . I have done this manually until now.
I wanted to visualize processed tasks and it's status(success or fail), etc as html table. I wrote simple sinatra app to render a table that shows
the list of ~/a/data/num to be processed
~/a/result/num.log exists or not (process not-launched/processing/done)
it's status (the log file contains the word "error" or not)
I found that it would be convenient that if I could launch a bash heavy-job.sh <data-num> from the sinatra app, log the tasks (and info like time,date,etc..) and it's args (heavy-jobs takes some optional args ) and show them as html table.
So I need something that manages jobs and logs to files (or db).
First I wrote a code like below for test (! for test, not integrated with my system yet !), but later I found resque is what i wanted. I am a beginner and not sure if my decision is reasonable or not.
my questions are
is it reasonable to use resque to manage external long-running commands (and log tasks)
or should I use another tool (not necessarily ruby-tool).
(extra;) the task-manager and the sinatra app should work separately (and communicate each other over REST or something) OR not ?
The jobs are not critical since I can retry tasks manually later if failed.
I am not good at English and my question may be misleading. I appreciate any help :) .
class TaskSpawn
def initialize()
#pids = []
end
def spawn(command, options = {})
#opt = {:pgroup => true}
#pids << Kernel.spawn(command, options)
end
def pids()
return #pids.clone
end
def waitany_nohang()
delete_idx = nil
ret = nil
#pids.each_with_index do |p, idx|
pid,status = Process.waitpid2(p, Process::WNOHANG)
unless pid.nil?
delete_idx = idx
ret = [pid,status]
break
end
end
if delete_idx
#pids.delete_at(delete_idx)
return ret
else
# no task fininshed
return nil
end
end
def waitall()
ret = waitall
raise "interal error" if ret.size != pids.size
return ret
end
end

Handling zombie processes when using waitpid

I have the following method, the idea is to run a shell command and both stream output to stdout as its recieved and store the information as a variable so I can return a hash of the information, I found no standard way of doing this (you either get streaming or captured output).
It does this by creating a forks to stream the output and append to an IO pipe that I can read in at a later date.
def self.run_cmd(cmd)
stdout_rd, stdout_wr = IO.pipe
stderr_rd, stderr_wr = IO.pipe
status = Open4::popen4(cmd) do |_pid, _stdin, _stdout, _stderr|
pids = []
pids << fork do
_stdout.each_line do |l|
print l
stdout_wr.puts l
end
end
pids << fork do
_stderr.each_line do |l|
print l
stderr_wr.puts l
end
end
pids.each{|pid| Process.waitpid(pid)}
end
stdout_wr.close
stderr_wr.close
out = stdout_rd.gets
out = '' if out.nil?
err = stderr_rd.gets
err = '' if err.nil?
{ stdout: out, stderr: err, status: status.exitstatus }
end
This works great in most scenarios but specifically unzip doesn't play well with this approach, what happens is after a fixed amount of output from zip it will stall at stdout_wr.puts l
I've observed that when the ruby process has stalled that a zombie unzip is visible when running ps
Is there any way I can make this work?
Is there a better way of doing this? I appreciate that its a complex solution and it must be easier.
My potential idea is that my IO pipe is running out of buffered space but I'm able to print 10,000 lines of output without issue.

Using parfor and labSend/labRecieve

I want to run two matlab scripts in parallel for a project and communicate between them. The purpose of this is to have one script do image analysis and sending the results to the other which will use it for more calculations (time consuming, but not related to the task of finding stuff in the images). Since both tasks are time consuming, and should preferably be done in real time, I believe that parallelization is necessary.
To get a feel for how this should be done I created a test script to find out how to communicate between the two scripts.
The first script takes a user input using the built in function input, and then using labSend sends it to the other, which recieves it, and prints it.
function [blarg] = inputStuff(blarg)
mpiInit(); %added because of error message, but do not work...
for i=1:2
labBarrier; % added because of error message
inp = input('Enter a number to write');
labSend(inp);
if (inp == 0)
break;
else
i = 1;
end
end
end
function [ blarg ] = testWrite( blarg )
mpiInit(); % added because of error message, but does not help
par = 0;
if ( blarg == 0)
par = 1;
end
for i = 1:10
if (par == 1)
labBarrier
delta = labReceive();
i = 1;
else
delta = input('Enter number to write');
end
if (delta == 0)
break;
end
s = strcat('This lab no', num2str(labindex), '. Delta is = ')
delta
end
end
%%This is the file test_parfor.m
funlist = {#inputStuff, #testWrite};
matlabpool(2);
mpiInit(); % added because of error message, but does not help
parfor i=1:2
funlist{i}(0);
end
matlabpool close;
Then, when the code is run, the following error message appears:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
Error using parallel_function (line 589)
The MPI implementation has not yet been loaded. Please
call mpiInit.
Error stack:
testWrite.m at 11
Error in test_parfor (line 8)
parfor i=1:2
Calling the method mpiInit does not help... (Called as shown in the code above.)
And nowhere in the examples that mathworks have in the documentation, or on their website, show this error or what to do with it.
Any help is appreciated!
You would typically use constructs such as labSend, labRecieve and labBarrier within an spmd block, rather than a parfor block.
parfor is intended for implementing embarrassingly parallel algorithms, in other words algorithms that consist of multiple independent tasks that can be run in parallel, and do not require communication between tasks.
I'm stretching my knowledge here (perhaps someone more expert can correct me), but as I understand things, it does not set up an MPI ring for communication between workers, which is probably the explanation for the (rather uninformative) error message you're getting.
An spmd block enables communication between workers using labSend, labRecieve and labBarrier. There are quite a few examples of using them all in the documentation.
Sam is right that the MPI functionality is not enabled during parfor, only during spmd. You need to do something more like this:
spmd
funlist{labindex}(0);
end
(Sam is also quite right that the error message you saw is pretty unhelpful)

How do I block on reading a named pipe in Ruby?

I'm trying to set up a Ruby script that reads from a named pipe in a loop, blocking until input is available in the pipe.
I have a process that periodically puts debugging events into a named pipe:
# Open the logging pipe
log = File.open("log_pipe", "w+") #'log_pipe' created in shell using mkfifo
...
# An interesting event happens
log.puts "Interesting event #4291 occurred"
log.flush
...
I then want a separate process that will read from this pipe and print events to the console as they happen. I've tried using code like this:
input = File.open("log_pipe", "r+")
while true
puts input.gets #I expect this to block and wait for input
end
# Kill loop with ctrl+c when done
I want the input.gets to block, waiting patiently until new input arrives in the fifo; but instead it immediately reads nil and loops again, scrolling off the top of the console window.
Two things I've tried:
I've opened the input fifo with both "r" and "r+"--I have the same problem either way;
I've tried to determine if my writing process is sending EOF (which I've heard will cause the read fifo to close)--AFAIK it isn't.
SOME CONTEXT:
If it helps, here's a 'big picture' view of what I'm trying to do:
I'm working on a game that runs in RGSS, a Ruby based game engine. Since it doesn't have good integrated debugging, I want to set up a real-time log as the game runs--as events happen in the game, I want messages to show up in a console window on the side. I can send events in the Ruby game code to a named pipe using code similar to the writer code above; I'm now trying to set up a separate process that will wait for events to show up in the pipe and show them on the console as they arrive. I'm not even sure I need Ruby to do this, but it was the first solution I could think of.
Note that I'm using mkfifo from cygwin, which I happened to have installed anyway; I wonder if that might be the source of my trouble.
If it helps anyone, here's exactly what I see in irb with my 'reader' process:
irb(main):001:0> input = File.open("mypipe", "r")
=> #<File:mypipe>
irb(main):002:0> x = input.gets
=> nil
irb(main):003:0> x = input.gets
=> nil
I don't expect the input.gets at 002 and 003 to return immediately--I expect them to block.
I found a solution that avoids using Cygwin's unreliable named pipe implementation entirely. Windows has its own named pipe facility, and there is even a Ruby Gem called win32-pipe that uses it.
Unfortunately, there appears to be no way to use Ruby Gems in an RGSS script; but by dissecting the win32-pipe gem, I was able to incorporate the same idea into an RGSS game. This code is the bare minimum needed to log game events in real time to a back channel, but it can be very useful for deep debugging.
I added a new script page right before 'Main' and added this:
module PipeLogger
# -- Change THIS to change the name of the pipe!
PIPE_NAME = "RGSSPipe"
# Constant Defines
PIPE_DEFAULT_MODE = 0 # Pipe operation mode
PIPE_ACCESS_DUPLEX = 0x00000003 # Pipe open mode
PIPE_UNLIMITED_INSTANCES = 255 # Number of concurrent instances
PIPE_BUFFER_SIZE = 1024 # Size of I/O buffer (1K)
PIPE_TIMEOUT = 5000 # Wait time for buffer (5 secs)
INVALID_HANDLE_VALUE = 0xFFFFFFFF # Retval for bad pipe handle
#-----------------------------------------------------------------------
# make_APIs
#-----------------------------------------------------------------------
def self.make_APIs
$CreateNamedPipe = Win32API.new('kernel32', 'CreateNamedPipe', 'PLLLLLLL', 'L')
$FlushFileBuffers = Win32API.new('kernel32', 'FlushFileBuffers', 'L', 'B')
$DisconnectNamedPipe = Win32API.new('kernel32', 'DisconnectNamedPipe', 'L', 'B')
$WriteFile = Win32API.new('kernel32', 'WriteFile', 'LPLPP', 'B')
$CloseHandle = Win32API.new('kernel32', 'CloseHandle', 'L', 'B')
end
#-----------------------------------------------------------------------
# setup_pipe
#-----------------------------------------------------------------------
def self.setup_pipe
make_APIs
##name = "\\\\.\\pipe\\" + PIPE_NAME
##pipe_mode = PIPE_DEFAULT_MODE
##open_mode = PIPE_ACCESS_DUPLEX
##pipe = nil
##buffer = 0.chr * PIPE_BUFFER_SIZE
##size = 0
##bytes = [0].pack('L')
##pipe = $CreateNamedPipe.call(
##name,
##open_mode,
##pipe_mode,
PIPE_UNLIMITED_INSTANCES,
PIPE_BUFFER_SIZE,
PIPE_BUFFER_SIZE,
PIPE_TIMEOUT,
0
)
if ##pipe == INVALID_HANDLE_VALUE
# If we could not open the pipe, notify the user
# and proceed quietly
print "WARNING -- Unable to create named pipe: " + PIPE_NAME
##pipe = nil
else
# Prompt the user to open the pipe
print "Please launch the RGSSMonitor.rb script"
end
end
#-----------------------------------------------------------------------
# write_to_pipe ('msg' must be a string)
#-----------------------------------------------------------------------
def self.write_to_pipe(msg)
if ##pipe
# Format data
##buffer = msg
##size = msg.size
$WriteFile.call(##pipe, ##buffer, ##buffer.size, ##bytes, 0)
end
end
#------------------------------------------------------------------------
# close_pipe
#------------------------------------------------------------------------
def self.close_pipe
if ##pipe
# Send kill message to RGSSMonitor
##buffer = "!!GAMEOVER!!"
##size = ##buffer.size
$WriteFile.call(##pipe, ##buffer, ##buffer.size, ##bytes, 0)
# Close down the pipe
$FlushFileBuffers.call(##pipe)
$DisconnectNamedPipe.call(##pipe)
$CloseHandle.call(##pipe)
##pipe = nil
end
end
end
To use this, you only need to make sure to call PipeLogger::setup_pipe before writing an event; and call PipeLogger::close_pipe before game exit. (I put the setup call at the start of 'Main', and add an ensure clause to call close_pipe.) After that, you can add a call to PipeLogger::write_to_pipe("msg") at any point in any script with any string for "msg" and write into the pipe.
I have tested this code with RPG Maker XP; it should also work with RPG Maker VX and later.
You will also need something to read FROM the pipe. There are any number of ways to do this, but a simple one is to use a standard Ruby installation, the win32-pipe Ruby Gem, and this script:
require 'rubygems'
require 'win32/pipe'
include Win32
# -- Change THIS to change the name of the pipe!
PIPE_NAME = "RGSSPipe"
Thread.new { loop { sleep 0.01 } } # Allow Ctrl+C
pipe = Pipe::Client.new(PIPE_NAME)
continue = true
while continue
msg = pipe.read.to_s
puts msg
continue = false if msg.chomp == "!!GAMEOVER!!"
end
I use Ruby 1.8.7 for Windows and the win32-pipe gem mentioned above (see here for a good reference on installing gems). Save the above as "RGSSMonitor.rb" and invoke it from the command line as ruby RGSSMonitor.rb.
CAVEATS:
The RGSS code listed above is fragile; in particular, it does not handle failure to open the named pipe. This is not usually an issue on your own development machine, but I would not recommend shipping this code.
I haven't tested it, but I suspect you'll have problems if you write a lot of things to the log without running a process to read the pipe (e.g. RGSSMonitor.rb). A Windows named pipe has a fixed size (I set it here to 1K), and by default writes will block once the pipe is filled (because no process is 'relieving the pressure' by reading from it). Unfortunately, the RPGXP engine will kill a Ruby script that has stopped running for 10 seconds. (I'm told that RPGVX has eliminated this watchdog function--in which case, the game will hang instead of abruptly terminating.)
What's probably happening is the writing process is exiting, and as there are no other writing processes, EOF is sent to the pipe which causes gets to return nil, and so your code loops continually.
To get around this you can usually just open the pipe read-write at the reader end. This works for me (on a Mac), but isn't working for you (you've tried "r" and "r+"). I'm guessing this is to due with Cygwin (POSIX says opening a FIFO read-write is undefined).
An alternative is to open the pipe twice, once read-only and once write-only. You don't use the write-only IO for anything, it's just so that there's always an active writer attached to the pipe so it doesn't get closed.
input = File.open("log_pipe", "r") # note 'r', not 'r+'
keep_open = File.open("log_pipe", "w") # ensure there's always a writer
while true
puts input.gets
end

Resources