I am using multiprocessing to perform a bunch of serial tasks. Each task is the same, but runs on different files located in different folders, and is made up of calls to several other modules and C++ programs. A high-level wrapper class manages the calls to those modules/functions. At the beginning of the multiprocessing code, a list of IDs and instances of this high-level class is created. Then a pool of processes executes the tasks.
It runs fine until, at some point, this obscure exception is raised:
Traceback (most recent call last):
  File "test_parallel.py", line 197, in <module>
    pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs), )
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
IndexError: tuple index out of range
It is usually raised after many tasks have already completed (around the 100th of the 200 planned).
A shortened version of the code is:
import multiprocessing
import copy

if __name__ == "__main__":
    which_subfields = range(200)
    pool_size = int(multiprocessing.cpu_count())
    run = WrapperAroundModule.run(version="parallel")
    if pool_size == 0:
        pool_size = 1
    list_IDs = list(which_subfields)
    lock = multiprocessing.Lock()
    list_instances = []
    for _ in which_subfields:
        list_instances.append(copy.deepcopy(run))
    pool = multiprocessing.Pool(processes=pool_size)
    pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs))
    pool.close()
    pool.join()
The signature of the do_calculations function is do_calculations((instance, id)).
I made sure that the function do_calculations is thread-safe, but that didn't change the situation. I then wanted to use maxtasksperchild, but unfortunately I must use Python 2.6, and the billiard module can't be installed on the server (running Scientific Linux) I'm using. I therefore wrote a workaround: the tasks to be performed are divided into chunks of length pool_size*maxtasksperchild. The script executes each chunk on a pool using similar code; once a chunk is finished, the pool and all the variables around it are deleted, and a new pool is created for the next chunk, as sketched below. Sadly, the error is still raised at some point. Moreover, I made sure that the two lists passed as arguments are long enough, and do_calculations runs smoothly when called on its own with the exact tasks that fail in the multiprocessing version.
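Roughly, the workaround looks like this (the maxtasksperchild value here is just a constant I picked, since the real Pool argument doesn't exist in Python 2.6):

import multiprocessing

maxtasksperchild = 5                      # arbitrary batch factor, not the Pool argument
chunk_size = pool_size * maxtasksperchild
work = zip(list_instances, list_IDs)      # zip() returns a list on Python 2

pool_outputs = []
for start in range(0, len(work), chunk_size):
    chunk = work[start:start + chunk_size]
    pool = multiprocessing.Pool(processes=pool_size)
    pool_outputs.extend(pool.map(do_calculations, chunk))
    pool.close()
    pool.join()
    del pool                              # drop the old pool before creating the next one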
Any idea on the source of this error and a possible correction?
The raise self._value line means that do_calculations raised an exception in a child process, and multiprocessing re-raises it for you in the main process.
To get rid of the exception, fix the do_calculations() function. Wrap its body in try/except and print the full traceback (and, if needed, the locals) to understand where the error is.
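A minimal sketch of such a wrapper (the call inside the try block is a stand-in for whatever do_calculations really does):

import traceback

def do_calculations((instance, task_id)):      # Python 2 tuple-parameter syntax, as in the question
    try:
        return instance.run_task(task_id)      # stand-in for the real work
    except Exception:
        # Print the full traceback from inside the worker process, where the
        # failing line and its context are still visible.
        traceback.print_exc()
        raise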
I am trying to create an event loop inside a thread, where the thread is started within the constructor of a class. I want to run multiple tasks within the event loop. However, whenever I try to run it with the thread I get the error "NoneType object has no attribute create_task".
Is there something I am doing wrong in calling it?
import asyncio
import threading

Class Test():

    def __init__(self):
        self.loop = None
        self.th = threading.Thread(target=self.create)
        self.th.start()

    def __del__(self):
        self.loop.close()

    def self.create(self):
        self.loop = new_event_loop()
        asyncio.set_event_loop(self.loop)

    def fun(self):
        task = self.loop.create_task(coroutine)
        loop.run_until_complete(task)

    def fun2(self):
        task = self.loop.create_task(coroutine)
        loop.run_until_complete(task)

t = Test()
t.fun()
t.fun2()
It is tricky to combine threading and asyncio, although it can be useful if done properly.
The code you gave has several syntax errors, so obviously it isn't the code you are actually running. Please, in the future, check your post carefully out of respect for the time of those who answer questions here. You'll get better and quicker answers if you spot these avoidable errors yourself.
The keyword "class" should not be capitalized.
The class definition does not need empty parentheses.
The function definition for create should not have self. in front of it.
There is no variable named coroutine defined in the script.
The next problem is the launching of the secondary thread. The method threading.Thread.start() does not wait for the thread to actually start. The new thread is "pending" and will start sometime soon, but you don't have control over when that happens. So start() returns immediately; your __init__ method returns; and your call to t.fun() happens before the thread starts. At that point self.loop is in fact None, as the error message indicates.
A nice way to overcome this is with a threading.Barrier object, which can be used to ensure that the thread has started before the __init__ method returns.
Your __del__ method is probably not necessary, and will normally only get executed during program shutdown. If it runs under any other circumstances, you will get an error if you call loop.close on a loop that is still running. I think it's better to ensure that the thread shuts down cleanly, so I've provided a Test.close method for that purpose.
Your functions fun and fun2 are written in a way that makes them not very useful. You start a task and then you immediately wait for it to finish. In that case, there's no good reason to use asyncio at all. The whole idea of asyncio is to run more than one task concurrently. Creating tasks one at a time and always waiting for each one to finish doesn't make a lot of sense.
Most asyncio functions are not threadsafe. You have to use the two important methods loop.call_soon_threadsafe and asyncio.run_coroutine_threadsafe if you want to run asyncio code across threads. The methods fun and fun2 execute in the main thread, so you should use run_coroutine_threadsafe to launch tasks in the secondary thread.
Finally, with such programs it's usually a good idea to provide a thread shutdown method. In the following listing, close obtains a list of all the running tasks, sends a cancel message to each, and then sends the stop command to the loop itself. Then it waits for the thread to really exit. The main thread will be blocked until the secondary thread is finished, so the program will shut down cleanly.
Here is a simple working program, with all the functionality that you seem to want:
import asyncio
import threading

async def coro(s):
    print(s)
    await asyncio.sleep(3.0)

class Test:
    def __init__(self):
        self.loop = None
        self.barrier = threading.Barrier(2)  # Added
        self.th = threading.Thread(target=self.create)
        self.th.start()
        self.barrier.wait()  # Blocks until the new thread is running

    def create(self):
        self.loop = asyncio.new_event_loop()
        asyncio.set_event_loop(self.loop)
        self.barrier.wait()
        print("Thread started")
        self.loop.run_forever()
        print("Loop stopped")
        self.loop.close()  # Clean up loop resources

    def close(self):  # call this from the main thread
        self.loop.call_soon_threadsafe(self._close)
        self.th.join()  # Wait for the thread to exit (ensures the loop is closed)

    def _close(self):  # Executes in thread self.th
        tasks = asyncio.all_tasks(self.loop)
        for task in tasks:
            task.cancel()
        self.loop.call_soon(self.loop.stop)

    def fun(self):
        return asyncio.run_coroutine_threadsafe(coro("Hello 1"), self.loop)

    def fun2(self):
        return asyncio.run_coroutine_threadsafe(coro("Hello 2"), self.loop)

t = Test()
print("Test constructor complete")
t.fun()
fut = t.fun2()
# Comment out the next line if you don't want to wait here
fut.result()  # Wait for fun2 to finish
print("Closing")
t.close()
print("Finished")
I have the entrance program main.py:
import asyncio
from loguru import logger
from multiprocessing import Process
from app.events import type_a_tasks, type_b_tasks, type_c_tasks

def run_task(task):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(task())
    loop.run_forever()

def main():
    processes = list()
    processes.append(Process(target=run_task, args=(type_a_tasks,)))
    processes.append(Process(target=run_task, args=(type_b_tasks,)))
    processes.append(Process(target=run_task, args=(type_c_tasks,)))

    for process in processes:
        process.start()
        logger.info(f"Started process id={process.pid}, name={process.name}")

    for process in processes:
        process.join()

if __name__ == '__main__':
    main()
where the different types of tasks are similarly defined, for example type_a_tasks are:
import asyncio
from . import business_1, business_2, business_3, business_4, business_5, business_6

async def type_a_tasks():
    tasks = list()
    tasks.append(asyncio.create_task(business_1.main()))
    tasks.append(asyncio.create_task(business_2.main()))
    tasks.append(asyncio.create_task(business_3.main()))
    tasks.append(asyncio.create_task(business_4.main()))
    tasks.append(asyncio.create_task(business_5.main()))
    tasks.append(asyncio.create_task(business_6.main()))

    await asyncio.wait(tasks)
    return tasks
where the main() functions of business_1 through business_6 are awaitables handed to asyncio, in which I implemented my business code.
Is my usage of multiprocessing and asyncio event loops above the correct way of doing it?
I am doing so because I have a lot of asynchronous tasks to perform, but it doesn't seem appropriate to put them all in one event loop, so I divided them into three parts (a, b and c) accordingly, and I hope they can run in three different processes to make use of multiple CPU cores while still taking advantage of asyncio features.
I tried running my code, and the log records show there actually are different processes, but they all report the same thread/event loop (I know this from adding process_id and thread_id to the loguru format).
This seems OK. Just use asyncio.run(task()) inside run_task - it is simpler and there is no need to call run_forever (also, with the run_forever call, your processes will never join the parent one).
IDs for other objects may repeat across processes - if you want, add the result of calling os.getpid() in the body of run_task to your logging.
(If those PIDs are, by chance, the same, that means multiprocessing is somehow using a "dummy" backend due to some configuration in your project - that should not happen anyway.)
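A minimal sketch of the suggested run_task (same signature as in the question):

import asyncio
import os
from loguru import logger

def run_task(task):
    # asyncio.run() creates a fresh event loop in this process, runs the
    # coroutine to completion, and closes the loop, so the process can exit
    # and process.join() in the parent can return.
    logger.info(f"Worker pid={os.getpid()} starting")
    asyncio.run(task())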
The documentation for asyncio.run states:
This function always creates a new event loop and closes it at the end.
It should be used as a main entry point for asyncio programs, and should
ideally only be called once.
But it does not say why. I have a non-async program that needs to invoke something async. Can I just use asyncio.run every time I get to the async portion, or is this unsafe/wrong?
In my case, I have several async coroutines I want to gather and run concurrently to completion. When they are all completed, I want to move on with my synchronous code.
async def my_task(url):
    # request some urls or whatever
    ...

integration_tasks = [my_task(url1), my_task(url2)]

async def gather_tasks(*integration_tasks):
    return await asyncio.gather(*integration_tasks)

def complete_integrations(*integration_tasks):
    return asyncio.run(gather_tasks(*integration_tasks))

print(complete_integrations(*integration_tasks))
Can I use asyncio.run() to run coroutines multiple times?
This actually is an interesting and very important question.
As the asyncio documentation (Python 3.9) says:
This function always creates a new event loop and closes it at the end. It should be used as a main entry point for asyncio programs, and should ideally only be called once.
It does not prohibit calling it multiple times. Moreover, the old way of calling coroutines from synchronous code, which was:
loop = asyncio.get_event_loop()
loop.run_until_complete(coroutine)
is now deprecated because of the get_event_loop() method, whose documentation says:
Consider also using the asyncio.run() function instead of using lower level functions to manually create and close an event loop.
Deprecated since version 3.10: Deprecation warning is emitted if there is no running event loop. In future Python releases, this function will be an alias of get_running_loop().
So in future releases it will not spawn a new event loop if there is no running one! The docs propose using asyncio.run() if you want a new loop to be spawned automatically when none is running.
There is a good reason for that decision. Even if you have an event loop and successfully use it to execute coroutines, there are a few more things you must remember to do:
closing the event loop
consuming unconsumed generators (most important in case of failed coroutines)
...and probably more, which I will not even attempt to list here
Exactly what needs to be done to properly finalize an event loop can be read in the source code of asyncio.run().
Managing an event loop manually (when there is no running one) is a subtle procedure, and it is better not to do it unless you know what you are doing.
So yes, I think the proper way of running an async function from synchronous code is to call asyncio.run(). But it is only suitable in a fully synchronous application. If there is already a running event loop, it will fail, as the quick check below shows. In that case, just await the coroutine from the async code instead.
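A quick self-contained check of that failure mode:

import asyncio

async def outer():
    try:
        # Nested asyncio.run() inside an already running loop is rejected.
        asyncio.run(asyncio.sleep(0))
    except RuntimeError as exc:
        print(exc)   # "asyncio.run() cannot be called from a running event loop"

asyncio.run(outer())    # the outer call, made from synchronous code, is fine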
For such synchronous apps, using asyncio.run() is the safe way, and actually the only safe way, of doing this, and it can be invoked multiple times.
The reason the docs say you should call it only once is that there is usually a single entry point to the whole asynchronous application. That simplifies things and actually improves performance, because setting up an event loop also takes some time. But if there is no single loop available in your application, you can use multiple calls to asyncio.run() to run coroutines multiple times.
Is there any performance gain?
Besides discussing multiple calls to asyncio.run(), I want to address one more concern. In the comments, @jwal says:
asyncio is not parallel processing. Says so in the docs. [...] If you want parallel, run in a separate processes on a computer with a separate CPU core, not a separate thread, not a separate event loop.
This suggests that asyncio is not suitable for parallel processing, which can be misread as implying that it will not result in a performance gain. That conclusion is not always true; in fact, it is usually false!
Any time you can delegate a job to an external process (not only a Python process: a database worker, an HTTP call, really any TCP socket call) you can get a performance gain from asyncio. In the huge majority of cases, when you are using a library that exposes an async interface, the author of that library made the effort to eventually await a result from a network/socket/process call. While the response from such a socket is not ready, the event loop is completely free to do other tasks, and if the loop has more than one such task, it gains performance.
A canonical example is making calls to HTTP endpoints. At some point there will be a network call, so the Python thread is free to do other work while waiting for data to appear in the TCP socket buffer. I have an example!
The example uses the httpx library to compare the performance of making multiple calls to the OpenWeatherMap API. There are two functions:
get_weather_async()
get_weather_sync()
The first one makes 8 requests to the HTTP API, scheduling those requests to run concurrently (cooperatively, not in parallel) on an event loop using asyncio.gather().
The second one performs 8 synchronous requests in sequence.
To call the asynchronous function, I am using asyncio.run(). Moreover, I am using the timeit module to repeat that call to asyncio.run() 4 times. So in a single Python application, asyncio.run() is called 4 times, just to put my earlier claims to the test.
from time import time

import httpx
import asyncio
import timeit

from random import uniform


class AsyncWeatherApi:
    def __init__(
        self, base_url: str = "https://api.openweathermap.org/data/2.5"
    ) -> None:
        self.client: httpx.AsyncClient = httpx.AsyncClient(base_url=base_url)

    async def weather(self, lat: float, lon: float, app_id: str) -> dict:
        response = await self.client.get(
            "/weather",
            params={
                "lat": lat,
                "lon": lon,
                "appid": app_id,
                "units": "metric",
            },
        )
        response.raise_for_status()
        return response.json()


class SyncWeatherApi:
    def __init__(
        self, base_url: str = "https://api.openweathermap.org/data/2.5"
    ) -> None:
        self.client: httpx.Client = httpx.Client(base_url=base_url)

    def weather(self, lat: float, lon: float, app_id: str) -> dict:
        response = self.client.get(
            "/weather",
            params={
                "lat": lat,
                "lon": lon,
                "appid": app_id,
                "units": "metric",
            },
        )
        response.raise_for_status()
        return response.json()


def get_random_locations() -> list[tuple[float, float]]:
    """generate 8 random locations in +/-europe"""
    return [(uniform(45.6, 52.3), uniform(-2.3, 29.4)) for _ in range(8)]


async def get_weather_async(locations: list[tuple[float, float]]):
    api = AsyncWeatherApi()
    return await asyncio.gather(
        *[api.weather(lat, lon, api_key) for lat, lon in locations]
    )


def get_weather_sync(locations: list[tuple[float, float]]):
    api = SyncWeatherApi()
    return [api.weather(lat, lon, api_key) for lat, lon in locations]


api_key = "secret"


def time_async_job(repeat: int = 1):
    locations = get_random_locations()

    def run():
        return asyncio.run(get_weather_async(locations))

    duration = timeit.Timer(run).timeit(repeat)
    print(
        f"[ASYNC] In {duration}s: done {len(locations)} API calls, all"
        f" repeated {repeat} times"
    )


def time_sync_job(repeat: int = 1):
    locations = get_random_locations()

    def run():
        return get_weather_sync(locations)

    duration = timeit.Timer(run).timeit(repeat)
    print(
        f"[SYNC] In {duration}s: done {len(locations)} API calls, all repeated"
        f" {repeat} times"
    )


if __name__ == "__main__":
    time_sync_job(4)
    time_async_job(4)
At the end, a comparison of performance was printed. It says:
[SYNC] In 5.5580058859995916s: done 8 API calls, all repeated 4 times
[ASYNC] In 2.865574334995472s: done 8 API calls, all repeated 4 times
Those 4 repetitions were just to show that you can safely run asyncio.run() multiple times. They actually hurt the measured performance of the asynchronous HTTP calls, because all 32 requests were run in four sequential batches of 8 concurrent tasks. For comparison, here is one batch of 32 requests:
[SYNC] In 4.373898585996358s: done 32 API calls, all repeated 1 times
[ASYNC] In 1.5169846520002466s: done 32 API calls, all repeated 1 times
So yes, it can, and usually will, result in a performance gain, provided a proper async library is used (if a library exposes an async API, it usually does so intentionally, knowing that there will be a network call somewhere).
I have a script that collects data from a database, filters it, and puts it into a list for further processing. I've split the database entries between several processes to make the filtering faster. Here's the snippet:
from multiprocessing import Process, Queue

def get_entry(pN, q, entries_indicies):
    ## collecting and filtering data
    q.put((address, page_text,))
    print("Process %d finished!" % pN)

def main():
    # getting entries
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        q = Queue()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES],))
        procs += [(p, q,)]
        p.start()
    for i in procs:
        i[0].join()
        while not i[1].empty():
            # process returns a tuple (address, full data,)
            data += [i[1].get()]
    print("Finished processing database!")
    # More tasks
    # ................
I've run it on Linux (Ubuntu 14.04) and it went totally fine. The problems start when I run it on Windows 7. The script gets stuck on i[0].join() for the 11th process out of 16 (which looks totally random to me). No error messages, nothing, it just freezes there. At the same time, print("Process %d finished!" % pN) is displayed for all processes, which means they all reach the end, so there should be no problem with the code of get_entry.
I tried commenting out the q.put line in the process function, and it all went through fine (though, of course, data ended up empty).
Does that mean the Queue is to blame here? Why does it make join() get stuck? Is it because of the internal Lock within Queue? And if so, and if Queue renders my script unusable on Windows, is there some other way to pass the data collected by the processes to the data list in the main process?
I came up with an answer to my last question: I use a Manager instead.
from multiprocessing import Process, Manager

def get_entry(pN, q, entries_indicies):
    # processing
    # assignment to a manager list in another process doesn't work, but appending does.
    q += result

def main():
    # blahblah
    # getting entries
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        manager = Manager()
        q = manager.list()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES],))
        procs += [(p, q,)]
        p.start()
    # input("Press enter when all processes finish")
    for i in procs:
        i[0].join()
        data += i[1]
    print("data", data)  # debug
    print("Finished processing database!")
    # more stuff
The nature of the freeze on join() on Windows due to the presence of the Queue still remains a mystery. So the question is still open.
As the docs say:
Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
So, since multiprocessing.Queue is a kind of Pipe, when you call .join() there are still some items in the queue, and you should consume them, or simply .get() them, to make it empty. Then call .close() and .join_thread() for each queue.
You can also refer to this answer.
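A minimal sketch of that idea, keeping the shape of the question's code (the filtering work and the MAX_PROCESSES/entries_indicies values are stand-ins): each worker puts a sentinel when it is done, and the parent drains its queue up to the sentinel before joining it.

from multiprocessing import Process, Queue

MAX_PROCESSES = 4                      # illustrative value
entries_indicies = list(range(100))    # stand-in for the real entry indices
SENTINEL = None                        # marks the end of one worker's output

def get_entry(pN, q, entries_indicies):
    for index in entries_indicies:
        address, page_text = index, "text-%d" % index   # stand-in for the real filtering
        q.put((address, page_text))
    q.put(SENTINEL)                    # tell the parent this worker is finished
    print("Process %d finished!" % pN)

def main():
    data, procs = [], []
    for i in range(MAX_PROCESSES):
        q = Queue()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES]))
        procs.append((p, q))
        p.start()
    for p, q in procs:
        # Drain up to the sentinel *before* joining, so the child's feeder
        # thread can flush its buffer and the child can actually exit.
        for item in iter(q.get, SENTINEL):
            data.append(item)
        p.join()
    print("Finished processing database!")

if __name__ == "__main__":
    main()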
I need to upload a bunch of files in a directory to S3. Since more than 90% of the time required to upload is spent waiting for the http request to finish, I want to execute several of them at once somehow.
Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't think of any way I can do any work while an http call blocks.
Any way I can solve this problem without threads?
I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html
Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
If you want to upload multiple files at once, simply launch a new Thread for each file:
t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t
Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):
@all_threads.each do |t|
  t.join if t.alive?
end
The Queue can either be a @member_variable or a $global.
To answer your actual questions:
Can Fibers help me with this at all?
No they can't. Jörg W Mittag explains why best.
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
When he says that the only concurrency construct in Ruby is Thread, remember that there are many different implementations of Ruby and that they vary in their threading implementations. Jörg once again provides a great answer to these differences, and correctly concludes that only something like JRuby (which uses JVM threads mapped to native threads) or forking your process will achieve true parallelism.
Any way I can solve this problem without threads?
Other than forking your process, I would also suggest that you look at EventMachine and something like em-http-request. It's an event driven, non-blocking, reactor pattern based HTTP client that is asynchronous and does not incur the overhead of threads.
Aaron Patterson (@tenderlove) uses an example almost exactly like yours to describe exactly why you can and should use threads to achieve concurrency in your situation.
Most I/O libraries are now smart enough to release the GVL (Global VM Lock, or most people know it as the GIL or Global Interpreter Lock) when doing IO. There is a simple function call in C to do this. You don't need to worry about the C code, but for you this means that most IO libraries worth their salt are going to release the GVL and allow other threads to execute while the thread that is doing the IO waits for the data to return.
If what I just said was confusing, you don't need to worry about it too much. The main thing that you need to know is that if you are using a decent library to do your HTTP requests (or any other I/O operation for that matter... database, interprocess communication, whatever), the Ruby interpreter (MRI) is smart enough to be able to release the lock on the interpreter and allow other threads to execute while one thread awaits IO to return. If the next thread has its own IO to grab, the Ruby interpreter will do the same thing (assuming that the IO library is built to utilize this feature of Ruby, which I believe most are these days).
So, to sum up what I am saying, use threads! You should see the performance benefit. If not, check to see whether your http library is using the rb_thread_blocking_region() function in C and, if not, find out why not. Maybe there is a good reason, maybe you need to consider using a better library.
The link to the Aaron Patterson video is here: http://www.youtube.com/watch?v=kufXhNkm5WU
It is worth a watch, even if just for the laughs, as Aaron Patterson is one of the funniest people on the internet.
You could use separate processes for this instead of threads:
#!/usr/bin/env ruby

$stderr.sync = true

# Number of children to use for uploading
MAX_CHILDREN = 5

# Hash of PIDs for children that are working along with which file
# they're working on.
@child_pids = {}

# Keep track of uploads that failed
@failed_files = []

# Get the list of files to upload as arguments to the program
@files = ARGV

### Wait for a child to finish, adding the file to the list of those
### that failed if the child indicates there was a problem.
def wait_for_child
  $stderr.puts "  waiting for a child to finish..."
  pid, status = Process.waitpid2( 0 )
  file = @child_pids.delete( pid )
  @failed_files << file unless status.success?
end

### Here's where you'd put the particulars of what gets uploaded and
### how. I'm just sleeping for the file size in bytes * milliseconds
### to simulate the upload, then returning either +true+ or +false+
### based on a random factor.
def upload( file )
  bytes = File.size( file )
  sleep( bytes * 0.00001 )
  return rand( 100 ) > 5
end

### Start a child uploading the specified +file+.
def start_child( file )
  if pid = Process.fork
    $stderr.puts "%s: upload started by child %d" % [ file, pid ]
    @child_pids[ pid ] = file
  else
    if upload( file )
      $stderr.puts "%s: done." % [ file ]
      exit 0 # success
    else
      $stderr.puts "%s: failed." % [ file ]
      exit 255
    end
  end
end

until @files.empty?
  # If there are already the maximum number of children running, wait
  # for one to finish
  wait_for_child() if @child_pids.length >= MAX_CHILDREN

  # Start a new child working on the next file
  start_child( @files.shift )
end

# Now we're just waiting on the final few uploads to finish
wait_for_child() until @child_pids.empty?

if @failed_files.empty?
  exit 0
else
  $stderr.puts "Some files failed to upload:",
    @failed_files.collect {|file| "  #{file}" }
  exit 255
end