process is not spawning with Multiprocessing module in celery - multiprocessing

I have a pretty standard set-up: Django + Rabbitmq + Celery.
I am trying to spawn a process using standard python multiprocessing module in celery.
I have noticed that process itself is not spawning. What could be the reason for not spawning the process. here is the code:
import multiprocessing as mp
from celery.schedules import crontab
from celery.decorators import periodic_task
#periodic_task(run_every=crontab(minute='*/1'), name='test_process_celery')
def main():
data = config_read()
try:
myqueue = mp.Queue(-1)
mylog_process = mp.Process(target=test_logger_process, args=(myqueue,))
mylog_process.start()
. . .
. . .
except Exception as e:
raise
finally:
mylog_process.join()
Can we use multiprocessing module in celery?
Is there any celery limitation with multiprocessing ? if there is no limitation why process is not spawned?
Thanks you in Advance.

try this:
#periodic_task(run_every=crontab(minute='*/1'), name='test_process_celery')
def main():
from multiprocessing import current_process
current_process().daemon = False
your_stuff()
current_process().daemon = True

Related

How to call async method from greenlet (playwright)

My framework (Locust, https://github.com/locustio/locust) is based on gevent and greenlets. But I would like to leverage Playwright (https://playwright.dev/python/), which is built on asyncio.
Naively using Playwrights sync api doesnt work and gives an exception:
playwright._impl._api_types.Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
I'm looking for some kind of best practice on how to use async in combination with gevent.
I've tried a couple different approaches but I dont know if I'm close or if what I'm trying to do is even possible (I have some experience with gevent, but havent really used asyncio before)
Edit: I kind of have something working now (I've removed Locust and just directly spawned some greenlets to make it easier to understan). Is this as good as it gets, or is there a better solution?
import asyncio
import threading
from playwright.async_api import async_playwright
import gevent
def thr(i):
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(do_stuff(i))
loop.close()
async def do_stuff(i):
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=False)
page = await browser.new_page()
await page.wait_for_timeout(5000)
await page.goto(f"https://google.com")
await page.close()
print(i)
def green(i):
t = threading.Thread(target=thr, args=(i,))
t.start()
# t.join() # joining doesnt work, but I couldnt be bothered right now :)
g1 = gevent.spawn(green, 1)
g2 = gevent.spawn(green, 2)
g1.join()
g2.join()
Insipred by #user4815162342 's comment, I went with something like this:
from playwright.async_api import async_playwright # need to import this first
from gevent import monkey, spawn
import asyncio
import gevent
monkey.patch_all()
loop = asyncio.new_event_loop()
async def f():
print("start")
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=True)
context = await browser.new_context()
page = await context.new_page()
await page.goto(f"https://www.google.com")
print("done")
def greeny():
while True: # and not other_exit_condition
future = asyncio.run_coroutine_threadsafe(f(), loop)
while not future.done():
gevent.sleep(1)
greenlet1 = spawn(greeny)
greenlet2 = spawn(greeny)
loop.run_forever()
The actual implementation will end up in Locust some day, probably after some optimization (reusing browser instance etc)
Here's a simple way to integrate asyncio and gevent:
Run an asyncio loop in a dedicated thread
Use asyncio.run_coroutine_threadsafe() to run a coroutine
Use gevent.event.Event to wait until the coroutine resolves
import asyncio
import threading
import gevent
loop = asyncio.new_event_loop()
loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
loop_thread.start()
async def your_coro():
# ...
def wait_until_complete(coro):
future = asyncio.run_coroutine_threadsafe(coro, loop)
event = gevent.event.Event()
future.add_dome_callback(lambda _: event.set())
event.wait()
return future.result()
result = wait_until_complete(your_coro())

Multiprocessing runtime error freeze_support() in Mac 64 bit

I am trying to learn Threading and Multiprocessing on a MacOS. I am unable to launch the processes though, with python giving the following error message.
Error
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
my code:
parallel_processing.py
import multiprocessing
import time
start= time.perf_counter()
def do_something():
print('sleeping 1 sec....')
time.sleep(1)
return('done sleeping...')
# do_something()
p1 = multiprocessing.Process(target = do_something)
p2 = multiprocessing.Process(target = do_something)
p1.start()
p2.start()
finish= time.perf_counter()
print(f'finished in {finish-start} seconds')
It seems that you can resolve the issue by putting your code inside the block if __name__ == '__main__'. For example:
import multiprocessing as mp
import time
def do_something():
...
if __name__ == '__main__':
p1 = mp.Process(target=do_something)
p2 = mp.Process(target=do_something)
p1.start()
p2.start()
...
I'm not sure this is resulted from the recent macOS update or python, but it seems to have to do with how OS creates new threads, i.e. forking versus spwaning. Please see this blog post for details.

How to run perticular code in gpu using PyTorch?

I am using an image processing code in python opencv. Since that process is taking a lot of time to process say 30 images. I tried to process these image parallel using Multiprocessing. The multiprocessing part is working good in CPU but I want to use that multiprocessing thing in GPU(cuda).
I use torch.multiprocessing for running task in parallel. So I am using torch.device('cuda') for our class to run whole thing in to this perticular device. When I run the code it's showing device using "cuda" but not using any GPU processing.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torch.multiprocessing import Process, Pool, Manager, set_start_method
import sys
import os
class RoadShoulderWidth(nn.Module):
def __init__(self):
super(RoadShoulderWidth, self).__init__()
pass
// Want to run below method in parallel for 30 images.
#staticmethod
def get_dim(image, road_shoulder_width_list):
..... code
def get_road_shoulder_width(self, _root_dir, _img_path_list):
manager = Manager()
road_shoulder_width_list = manager.list()
processes = []
for img_path in img_path_list[:30]:
img = cv2.imread(_root_dir + '/' + img_path)
img = img[72 * 5:72 * 6, 0:1280]
# Do work
p = Process(target=self.get_dim,args=(img,road_shoulder_width_list))
p.start()
processes.append(p)
for p in processes:
p.join()
return road_shoulder_width_list
Use below set of code to run your class
if __name__ == '__main__':
root_dir = '/home/nikhil_m/r'
img_path_list = os.listdir(root_dir)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
dataloader_kwargs = {'pin_memory': True}
set_start_method('fork')
obj = RoadShoulderWidth().to(device)
val = obj.get_road_shoulder_width(str(root_dir), img_path_list)
print(val)
print(torch.cuda.is_available())
Can anybody suggest me how to fix this?
Your class RoadShoulderWidth is a nn.Module subclass which lets you use .to(device). This only means that all other nn.Module objects or nn.Parameters that are members of your RoadShoulderWidth object are moved to the device. As from your example, there are none, so nothing happens.
In general PyTorch does not move code to GPU but data. If all data of a pytorch operation are on the GPU (e.g. a + b, a and b are on GPU) then the operation is executed on the GPU. You can move the data with a.to(device), given a is a torch.Tensor object.
PyTorch can only execute its own operations on GPU. It's not able to execute OpenCV code on GPU.

`ProcessPoolExecutor` works on Ubuntu, but fails with `BrokenProcessPool` when running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10

I'm running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10. The following example code fails to run:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np
def do_work(idx1, idx2):
time.sleep(0.2)
return np.mean([idx1, idx2])
with ProcessPoolExecutor(max_workers=4) as executor:
futures = set()
for idx in range(32):
future = winprocess.submit(
executor, do_work, idx, idx * 2
)
futures.add(future)
for future in as_completed(futures):
print(future.result())
... and throws BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
The code works perfectly fine on Ubuntu 14.04.
I've understand that Windows doesn't have os.fork, thus multiprocessing is handled differently, and doesn't always play nice with interactive mode and Jupyter.
What are some workarounds to make ProcessPoolExecutor work in this case?
There are some similar questions, but they relate to multiprocessing.Pool:
multiprocessing.Pool in jupyter notebook works on linux but not windows
Closer inspection shows that a Jupyter notebook can run external python modules which is parallelized using ProcessPoolExecutor. So, a solution is to do the parallelizable part of your code in a module and call it from the Jupyter notebook.
That said, this can be generalized as a utility. The following can be stored as a module, say, winprocess.py and imported by jupyter.
import inspect
import types
def execute_source(callback_imports, callback_name, callback_source, args):
for callback_import in callback_imports:
exec(callback_import, globals())
exec('import time' + "\n" + callback_source)
callback = locals()[callback_name]
return callback(*args)
def submit(executor, callback, *args):
callback_source = inspect.getsource(callback)
callback_imports = list(imports(callback.__globals__))
callback_name = callback.__name__
future = executor.submit(
execute_source,
callback_imports, callback_name, callback_source, args
)
return future
def imports(callback_globals):
for name, val in list(callback_globals.items()):
if isinstance(val, types.ModuleType) and val.__name__ != 'builtins' and val.__name__ != __name__:
import_line = 'import ' + val.__name__
if val.__name__ != name:
import_line += ' as ' + name
yield import_line
Here is how you would use this:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np
import winprocess
def do_work(idx1, idx2):
time.sleep(0.2)
return np.mean([idx1, idx2])
with ProcessPoolExecutor(max_workers=4) as executor:
futures = set()
for idx in range(32):
future = winprocess.submit(
executor, do_work, idx, idx * 2
)
futures.add(future)
for future in as_completed(futures):
print(future.result())
Notice that executor has been changed with winprocess and the original executor is passed to the submit function as a parameter.
What happens here is that the notebook function code and imports are serialized and passed to the module for execution. The code is not executed until it is safely in a new process, thus does not trip up with trying to make a new process based on the jupyter notebook itself.
Imports are handled in such a way as to maintain aliases. The import magic can be removed if you make sure to import everything needed for the function being executed inside the function itself.
Also, this solution only works if you pass all necessary variables as arguments to the function. The function should be static so to speak, but I think that's a requirement of ProcessPoolExecutor as well. Finally, make sure you don't execute other functions defined elsewhere in the notebook. Only external modules will be imported, thus other notebook functions won't be included.

from Gui import * in python 3?

I'm trying this:
import os, sys
from Gui import *
import Image as PIL
import ImageTk
class ImageBrowser(Gui):
def __init__(self):
Gui.__init__(self)
self.button = self.bu(command=self.quit, relief=FLAT)
def image_loop(self, dirname='.'):
files = os.listdir(dirname)
for file in files:
try:
self.show_image(file)
print (file)
self.mainloop()
except IOError:
continue
except:
break
def show_image(self, filename):
image = PIL.open(filename)
self.tkpi = ImageTk.PhotoImage(image)
self.button.config(image=self.tkpi)
def main(script, dirname='.'):
g = ImageBrowser()
g.image_loop(dirname)
if __name__ == '__main__':
main(*sys.argv)
I'm getting an error that says:
from Gui import *
ImportError: No module named Gui
I'm assuming "from Gui import *" doesn't work in python 3, does anyone know how to do this in python 3? Thank you so much (:
If you are talking about the Gui module that comes with Swampy, then
in order to use Gui with Python3, you'll need to install the Python3 version of Swampy.

Resources