subprocess32.Popen crashes (cpu 100%) - bash

I have been trying to use subprocess32.Popen, but this causes my system to crash (CPU at 100%). This is the code I have:
import subprocess32 as subprocess

for i in some_iterable:
    output = subprocess.Popen(['/path/to/sh/file/script.sh', i[0], i[1], i[2], i[3], i[4], i[5]],
                              shell=False, stdin=None, stdout=None, stderr=None, close_fds=True)
Before this, I had the following:
import subprocess32 as subprocess

for i in some_iterable:
    output = subprocess.check_output(['/path/to/sh/file/script.sh', i[0], i[1], i[2], i[3], i[4], i[5]])
... and I had no problems with this, except that it was dead slow.
With Popen I see that it is fast, but my CPU goes to 100% within a couple of seconds and the system crashes, forcing a hard reboot.
I am wondering what it is I am doing that makes Popen crash?
This is on Linux, Python 2.7, if that helps at all.
Thanks.

The problem is that you are trying to start 2 million processes at once, which is blocking your system.
A solution would be to use a pool to limit the maximum number of processes that can run at a time, and to wait for each process to finish. For cases like this, where you are starting subprocesses and waiting for them (IO bound), a thread pool from the multiprocessing.dummy module will do:
import multiprocessing.dummy as mp
import subprocess32 as subprocess

def run_script(args):
    args = ['/path/to/sh/file/script.sh'] + args
    process = subprocess.Popen(args, close_fds=True)
    # wait for exit and return the exit code
    # (or use check_output() instead of Popen() if you need to process the output)
    return process.wait()

# use a pool of 10 to allow at most 10 processes to be alive at a time
threadpool = mp.Pool(10)

# pool.imap or pool.imap_unordered should be used to avoid creating a list
# of all 2M return values in memory
results = threadpool.imap_unordered(run_script, some_iterable)
for result in results:
    ...  # process result if needed
I've left out most of the arguments to Popen because you are using the default values anyway. The size of the pool should probably be in the range of your available CPU cores if your script is doing computational work; if it is doing mostly IO (network access, writing files, ...), it can probably be larger.
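If you do need the script's output, here is a minimal sketch of the check_output() variant mentioned in the comment above, with the pool sized from the CPU count (the exact pool size for IO-heavy work is only a rough guess):
import multiprocessing
import multiprocessing.dummy as mp
import subprocess32 as subprocess

def run_script_capture(args):
    # raises CalledProcessError on a non-zero exit code and returns the script's stdout
    return subprocess.check_output(['/path/to/sh/file/script.sh'] + args)

# roughly one worker per core for CPU-bound scripts; a few times more for mostly-IO scripts
threadpool = mp.Pool(multiprocessing.cpu_count())
for output in threadpool.imap_unordered(run_script_capture, some_iterable):
    pass  # process the captured output here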

Related

Memory leak when using threads in ruby

I have a script that should ping hosts in separate threads. A separate thread is used for each specific host because some pings sometimes take longer than others, and if we waited on one ping, the other timeouts would become bigger than they really are.
This code creates a separate thread for each host. I just copy-pasted it from some example, and I'm not sure it is correct. I also have memory leaks.
threads = []
config.each do |array_item|
  host = array_item[0]
  packet_size = array_item[1]
  threads << Thread.new do
    puts "\nCreating a thread for host:#{host} value:#{packet_size}"
    ping(host, packet_size)
  end
end
threads.each(&:join)
I also looked into heap analysis, but cannot understand what is wrong. I just suspect that the threads are not terminating.
Analyzing Heap (Generation: 10)
-------------------------------
allocated by memory (4717121) (in bytes)
==============================
3150072 /usr/lib/ruby/2.3.0/timeout.rb:81
1050024 ./pinger.rb:166
./pinger.rb:166 points to the threads << Thread.new do line.
Do I need to control thread allocation and add some sort of thread-closing control?

MacOS: Why does Multiprocessing Queue.put stop working?

I have a pandas DataFrame with about 45,000 rows similar to:
from numpy import random
from pandas import DataFrame
df = DataFrame(random.rand(45000, 200))
I am trying to break up all the rows into a multiprocessing Queue like this:
from multiprocessing import Queue

rows = [idx_and_row[1] for idx_and_row in df.iterrows()]
my_queue = Queue(maxsize=0)
for idx, r in enumerate(rows):
    # print(idx)
    my_queue.put(r)
But when I run it, only about 37,000 items get put into my_queue and then the program raises the following error:
raise Full
queue.Full
What is happening and how can I fix it?
The multiprocessing.Queue is designed for inter-process communication. It is not intended for storing large amounts of data; for that purpose I'd suggest using Redis or Memcached.
Usually, the queue's maximum size is platform dependent, even if you set it to 0. There is no easy way to work around that.
It seems that on Windows the maximum number of objects in a multiprocessing.Queue is effectively unlimited, but on Linux and macOS the maximum size is 32767, which is 2^15 - 1 (hence the significance of that number).
I solved the problem by making an empty Queue object and passing it to all the processes I wanted to pass it to, plus one additional process. The additional process is responsible for filling the queue with 10,000 rows at a time and checking every few seconds whether the queue has been emptied; when it is empty, another 10,000 rows are added. This way all 45,000 rows get processed.
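A minimal sketch of that refill approach; the batch size, polling interval, and the feeder/consumer helpers here are illustrative assumptions, not the exact code used:
from multiprocessing import Process, Queue
import time

def feeder(rows, q, batch_size=10000, poll_interval=1):
    # keep the queue topped up in batches so it never hits the OS-level limit
    # (q.empty() is only approximate across processes, but good enough here)
    i = 0
    while i < len(rows):
        if q.empty():
            for row in rows[i:i + batch_size]:
                q.put(row)
            i += batch_size
        time.sleep(poll_interval)

def consumer(q, n_expected):
    # placeholder worker: drain the queue and process each row
    for _ in range(n_expected):
        row = q.get()
        # ... do the real work on row here ...

if __name__ == '__main__':
    rows = list(range(45000))   # stand-in for the DataFrame rows
    q = Queue()
    Process(target=feeder, args=(rows, q)).start()
    worker = Process(target=consumer, args=(q, len(rows)))
    worker.start()
    worker.join()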

Python 3.6 multiprocessing on windows 10 using pool.apply_async to create new threads stops working after many iterations

I have recently started working with Python's multiprocessing library and decided that using the Pool() and apply_async() approach is the most suitable for my problem. The code is quite long, but for this question I've compressed everything that isn't related to the multiprocessing into functions.
Background information
Basically, my program is supposed to take some data structure and send it to another program that processes it and writes the results to a txt file. I have several thousand of these structures (N*M), and there are big chunks (M) that are independent and can be processed in any order. I created a worker pool to process these M structures before retrieving the next chunk. In order to process one structure, a new thread has to be created for the external program to run in. The time spent outside the external program during processing is less than 20%, so if I check Task Manager I can see the external program running under Processes.
Actual problem
This works very well for a while, but after many processed structures (anywhere between 5,000 and 20,000) the external program suddenly stops showing up in Task Manager, and the Python children run at their individual peak performance (~13% CPU) without producing any more results. I don't understand what the problem might be. There is plenty of RAM left, and each child only uses around 90 MB. It is also really weird that it works for quite some time and then stops. If I use Ctrl-C, it stops after a few minutes, so it is only semi-responsive to user input.
One thought I had was that when the timed-out external program thread is killed (which happens every now and then), maybe something isn't closed properly, so the child process is waiting for something it cannot find anymore? And if so, is there a better way of handling timed-out external processes?
from multiprocessing import Pool, TimeoutError

N = 500        # Number of chunks of data that can be multiprocessed
M = 80         # Independent chunks of data
timeout = 100  # Higher than any of the values for dataStructures.timeout

if __name__ == "__main__":
    results = [None]*M
    savedData = []
    with Pool(processes=4) as pool:
        for iteration in range(N):
            dataStructures = [generate_data_structure(i) for i in range(M)]
            #---Process data structures---
            for iS, dataStructure in enumerate(dataStructures):
                results[iS] = pool.apply_async(processing_func, (dataStructure,))
            #---Extract processed data---
            for iR, result in enumerate(results):
                try:
                    processedData = result.get(timeout=timeout)
                except TimeoutError:
                    print("Got TimeoutError.")
                if processedData.someBool:
                    savedData.append(processedData)
Here are also the function and class that create the new thread for the external program.
import subprocess as sp
import win32api as wa
import threading

def processing_func(dataStructure):
    # Call another program that processes the data, and wait until it is finished/timed out
    timedOut = RunCmd(dataStructure.command).start_process(dataStructure.timeout)
    # Read the data from the other program, stored in a text file
    if not timedOut:
        processedData = extract_data_from_finished_thread()
    else:
        processedData = 0.
    return processedData

class RunCmd(threading.Thread):
    CREATE_NO_WINDOW = 0x08000000

    def __init__(self, cmd):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.p = None

    def run(self):
        self.p = sp.Popen(self.cmd, creationflags=self.CREATE_NO_WINDOW)
        self.p.wait()

    def start_process(self, timeout):
        self.start()
        self.join(timeout)
        timedOut = self.is_alive()
        # Kills the thread if timeout limit is reached
        if timedOut:
            wa.TerminateProcess(self.p._handle, -1)
            self.join()
        return timedOut
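For reference, on Python 3 the same timeout handling can be done without the helper thread, since Popen.wait() accepts a timeout directly; a rough sketch of that alternative (the creation flag mirrors the one in the class above, the command is whatever dataStructure.command holds):
import subprocess as sp

CREATE_NO_WINDOW = 0x08000000

def run_with_timeout(cmd, timeout):
    # returns True if the process had to be killed because it timed out
    p = sp.Popen(cmd, creationflags=CREATE_NO_WINDOW)
    try:
        p.wait(timeout=timeout)
        return False
    except sp.TimeoutExpired:
        p.kill()   # terminate the child process
        p.wait()   # reap it so no handles are left dangling
        return True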

zeromq and python multiprocessing, too many open files

I have an agent-based model, where several agents are started by a central process and communicate via another central process. Every agent and the communication process communicate via zmq. However, when I start more than 100 agents, standard output shows:
Invalid argument (src/stream_engine.cpp:143) Too many open files
(src/ipc_listener.cpp:292)
and macOS shows a problem report:
Python quit unexpectedly while using the libzmq.5.dylib plug-in.
The problem appears to me to be that too many contexts are opened. But how can I avoid this with multiprocessing?
I attach part of the code below:
class Agent(Database, Logger, Trade, Messaging, multiprocessing.Process):
    def __init__(self, idn, group, _addresses, trade_logging):
        multiprocessing.Process.__init__(self)
        ....

    def run(self):
        self.context = zmq.Context()
        self.commands = self.context.socket(zmq.SUB)
        self.commands.connect(self._addresses['command_addresse'])
        self.commands.setsockopt(zmq.SUBSCRIBE, "all")
        self.commands.setsockopt(zmq.SUBSCRIBE, self.name)
        self.commands.setsockopt(zmq.SUBSCRIBE, group_address(self.group))

        self.out = self.context.socket(zmq.PUSH)
        self.out.connect(self._addresses['frontend'])
        time.sleep(0.1)
        self.database_connection = self.context.socket(zmq.PUSH)
        self.database_connection.connect(self._addresses['database'])
        time.sleep(0.1)
        self.logger_connection = self.context.socket(zmq.PUSH)
        self.logger_connection.connect(self._addresses['logger'])

        self.messages_in = self.context.socket(zmq.DEALER)
        self.messages_in.setsockopt(zmq.IDENTITY, self.name)
        self.messages_in.connect(self._addresses['backend'])

        self.shout = self.context.socket(zmq.SUB)
        self.shout.connect(self._addresses['group_backend'])
        self.shout.setsockopt(zmq.SUBSCRIBE, "all")
        self.shout.setsockopt(zmq.SUBSCRIBE, self.name)
        self.shout.setsockopt(zmq.SUBSCRIBE, group_address(self.group))

        self.out.send_multipart(['!', '!', 'register_agent', self.name])

        while True:
            try:
                self.commands.recv()  # catches the group address.
            except KeyboardInterrupt:
                print('KeyboardInterrupt: %s, self.commands.recv() to catch own address ~1888' % (self.name))
                break
            command = self.commands.recv()
            if command == "!":
                subcommand = self.commands.recv()
                if subcommand == 'die':
                    self.__signal_finished()
                    break
            try:
                self._methods[command]()
            except KeyError:
                if command not in self._methods:
                    raise SystemExit('The method - ' + command + ' - called in the agent_list is not declared (' + self.name)
                else:
                    raise
            except KeyboardInterrupt:
                print('KeyboardInterrupt: %s, Current command: %s ~1984' % (self.name, command))
                break
            if command[0] != '_':
                self.__reject_polled_but_not_accepted_offers()
        self.__signal_finished()
        #self.context.destroy()
The whole code is at http://www.github.com/DavoudTaghawiNejad/abce
Odds are it's not too many contexts, it's too many sockets. Looking through your repo, I see you're (correctly) using IPC as your transport; IPC uses a file descriptor as the "address" to pass data back and forth between different processes. If I'm reading correctly, you're opening up to 7 sockets per process, so that'll add up quickly. I'm betting that if you do some debugging in the middle of your code, you'll see that it doesn't fail when the last context is created, but when the last socket pushes the open file limit over the edge.
My understanding is that the typical user limit for open FDs is around 1000, so at around 100 agents you're pushing 700 open FDs just for your sockets; the rest is probably normal usage. There should be no problem increasing your limit to 10,000, or higher depending on your situation. Otherwise you'll have to rewrite to use fewer sockets per process to support a higher process count.
This has nothing to do with zeromq or Python. It's the underlying operating system, which allows only a certain number of concurrently opened files. This limit includes normal files, but also socket connections.
You can see your current limit using ulimit -n; it will probably default to 1024. Machines running servers or with other needs (like your multiprocessing) often require this limit to be set higher, or simply to unlimited. More info about ulimit.
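For completeness, the limit can also be inspected and raised from inside the Python process itself via the resource module (Linux/macOS only; the 4096 here is just an example value, and the soft limit cannot exceed the hard limit without privileges):
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit: soft=%d hard=%d' % (soft, hard))
# raise the soft limit; an unprivileged process cannot exceed the hard limit
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))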
Additionally, there is another global limit, but it's nothing I've had to adjust yet.
In general, you should ask yourself whether you really need that many agents. Usually, X to 2X worker processes should be enough, where X corresponds to your CPU count.
You should increase the number of allowed open files for the process as in this question:
Python: Which command increases the number of open files on Windows?
The default per process is 512:
import win32file
print win32file._getmaxstdio() #512
win32file._setmaxstdio(1024)
print win32file._getmaxstdio() #1024

Can this implement Matlabpool in parallel?

I'm using the Matpower - Matlab toolbox in parallel computing and building a computer cluster to simulate the program shown below:
matlabpool open job1 5 % matlabpool means computer cluster
spmd % the statement from the Parallel Computing Toolbox
    % Run all the statements in parallel
    % first part of code
    if labindex == 1
        runopf('casea');
    end
    % second part of code
    if labindex == 2
        runopf('caseb');
    end
end
matlabpool close;
When labindex is 1, the first part of the code runs on "computer1" in the cluster, and likewise when labindex is 2, the second part of the code runs on "computer2". My question is: does the main code shown above run in sequence or in parallel?
By which I mean, does the second part of the code have to wait until the first part has executed, or can the two parts be executed in parallel on two different computers in the cluster?
The code between spmd and the corresponding end is sent to all workers (5 in your case), and they execute these instructions in parallel. In your code, you instructed worker #1 to execute runopf('casea'); and worker #2 runopf('caseb');. Workers #3 to #5 will effectively do nothing.
Technically, worker #2 will execute runopf('caseb'); a little later. The delay appears because worker #2 also checks the first if statement (but does not execute the code inside it).