Multiprocessing RuntimeError freeze_support() on Mac 64-bit (macOS)

I am trying to learn threading and multiprocessing on macOS. I am unable to launch the processes, though, with Python giving the following error message.
Error
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
my code:
parallel_processing.py
import multiprocessing
import time

start = time.perf_counter()

def do_something():
    print('sleeping 1 sec....')
    time.sleep(1)
    return 'done sleeping...'

# do_something()

p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)

p1.start()
p2.start()

finish = time.perf_counter()
print(f'finished in {finish-start} seconds')

It seems that you can resolve the issue by putting the process-launching code inside an if __name__ == '__main__': block. For example:
import multiprocessing as mp
import time

def do_something():
    ...

if __name__ == '__main__':
    p1 = mp.Process(target=do_something)
    p2 = mp.Process(target=do_something)
    p1.start()
    p2.start()
    ...
I'm not sure whether this results from a recent macOS update or from Python itself, but it seems to have to do with how the OS creates new processes, i.e. forking versus spawning. Please see this blog post for details.
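For background, Python 3.8 switched the default start method on macOS from fork to spawn, which is why the main-module guard became necessary. A minimal sketch of choosing the start method explicitly (whether fork is safe depends on the libraries you load, so treat this as an illustration rather than a recommendation):

import multiprocessing as mp

def do_something():
    print('hello from child')

if __name__ == '__main__':
    # 'spawn' is the macOS default since Python 3.8; 'fork' restores the
    # old behaviour but can crash with libraries that are not fork-safe.
    mp.set_start_method('fork')
    p = mp.Process(target=do_something)
    p.start()
    p.join()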

Related

Why does huggingface hang on list input for pipeline sentiment-analysis?

With Python 3.10 and the latest version of Hugging Face transformers, for simple code like this:
from transformers import pipeline

input_list = ['How do I test my connection? (Windows)',
              'how do I change my payment method?',
              'How do I contact customer support?']
classifier = pipeline('sentiment-analysis')
results = classifier(input_list)
the program hangs and returns error messages:
File ".......env/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
but if you replace the list input with a single string, it works:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('How do I test my connection? (Windows)')
You need to define a main function, guarded by if __name__ == '__main__', to run the multiprocess work that the list input triggers. The following update works:
from transformers import pipeline

def main():
    input_list = ['How do I test my connection? (Windows)',
                  'how do I change my payment method?',
                  'How do I contact customer support?']
    classifier = pipeline('sentiment-analysis')
    results = classifier(input_list)

if __name__ == '__main__':
    main()
The question then reduces to: where should freeze_support() go in a Python script?
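For completeness, a minimal sketch of the usual placement (per the multiprocessing docs, freeze_support() is a no-op unless the script has actually been frozen into an executable):

from multiprocessing import Process, freeze_support

def worker():
    print('working')

if __name__ == '__main__':
    # freeze_support() goes first under the main guard; when the script
    # is not frozen, the call does nothing.
    freeze_support()
    p = Process(target=worker)
    p.start()
    p.join()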

How to create a multiprocess with a recursive function?

I'm trying to build a recursive function that calls itself in a new process. The new process should not stop the parent process nor wait for it to finish, which is why I don't use join(). Do you have another way to create a recursive function with multiprocessing?
I use the following code:
import multiprocessing as mp
import concurrent.futures
import time

def do_something(c, seconds, r_list):
    # c is a counter that all processes should use,
    # such that no more than 20 processes are created.
    c += 1
    print(f"Sleeping {seconds} second(s)...")
    if c < 20:
        P_V = mp.Value('d', 0.0, lock=False)
        p = mp.Process(group=None, target=do_something, args=(c, 1, r_list,))
        p.start()
        if not p.is_alive():
            r_list.append(P_V.value)
    time.sleep(seconds)
    print(f"Done Sleeping...{seconds}")
    return f"Done Sleeping...{seconds}"

if __name__ == '__main__':
    C = 0  # C is a counter that all processes should use,
           # such that no more than 20 processes are created.
    Result_list = []  # results that come from all processes are saved here
    Result_list.append(do_something(C, 1, Result_list))
Notice that the results from all processes should be compared at the end.
In fact, this code runs, but the child processes created in the recursive calls do not print anything, the list Result_list contains only the one item from the first call, and C is still 0 at the end. Any idea why?
Here's a simplified example of what I think you're trying to do (side note: launching processes recursively is a great way to accidentally create a "fork bomb"; it is far more common to create multiple processes in some sort of loop instead, as in the loop-based sketch after the code below):
from multiprocessing import Process, Queue
from time import sleep
from os import getpid

def foo(n_procs, return_Q, arg):
    if __name__ == "__main__":  # don't actually run the body of foo in the "main" process, just start the recursion
        Process(target=foo, args=(n_procs, return_Q, arg)).start()
    else:
        n_procs -= 1
        if n_procs > 0:
            Process(target=foo, args=(n_procs, return_Q, arg)).start()
        sleep(arg)
        print(f"{getpid()} done sleeping {arg} seconds")
        return_Q.put(f"{getpid()} done sleeping {arg} seconds")  # put the result to a queue so we can get it in the main process

if __name__ == "__main__":
    q = Queue()
    foo(10, q, 2)
    sleep(10)  # do something else in the meantime
    results = []
    # while not q.empty():  # usually better to just know how many results you're expecting, as q.empty can be unreliable
    for _ in range(10):
        results.append(q.get())
    print("mp results:")
    print("\n".join(results))
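For comparison, here is the loop-based pattern mentioned above; a minimal sketch, using the same queue-for-results idea:

from multiprocessing import Process, Queue

def worker(return_q, arg):
    # each worker does its piece of work and reports the result on the queue
    return_q.put(f"worker finished with arg {arg}")

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, 2)) for _ in range(10)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # one result per worker
    for p in procs:
        p.join()
    print("\n".join(results))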

`ProcessPoolExecutor` works on Ubuntu, but fails with `BrokenProcessPool` when running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10

I'm running Jupyter 5.0.0 notebook with Python 3.5.3 on Windows 10. The following example code fails to run:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np

def do_work(idx1, idx2):
    time.sleep(0.2)
    return np.mean([idx1, idx2])

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = set()
    for idx in range(32):
        future = executor.submit(do_work, idx, idx * 2)
        futures.add(future)
    for future in as_completed(futures):
        print(future.result())
... and throws BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
The code works perfectly fine on Ubuntu 14.04.
I understand that Windows doesn't have os.fork, so multiprocessing is handled differently there and doesn't always play nicely with interactive mode and Jupyter.
What are some workarounds to make ProcessPoolExecutor work in this case?
There are some similar questions, but they relate to multiprocessing.Pool:
multiprocessing.Pool in jupyter notebook works on linux but not windows
Closer inspection shows that a Jupyter notebook can run external Python modules that are parallelized using ProcessPoolExecutor. So, a solution is to put the parallelizable part of your code in a module and call it from the Jupyter notebook.
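For example, a minimal sketch of that pattern (the module name workers.py is hypothetical):

# workers.py -- a plain module that worker processes can import
import time
import numpy as np

def do_work(idx1, idx2):
    time.sleep(0.2)
    return np.mean([idx1, idx2])

# in a notebook cell
from concurrent.futures import as_completed, ProcessPoolExecutor
from workers import do_work

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(do_work, i, i * 2) for i in range(32)}
    for future in as_completed(futures):
        print(future.result())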
That said, this can be generalized as a utility. The following can be stored as a module, say winprocess.py, and imported by Jupyter.
import inspect
import types

def execute_source(callback_imports, callback_name, callback_source, args):
    # Runs inside the worker process: re-create the imports, re-define the
    # function from its source, then look it up by name and call it.
    for callback_import in callback_imports:
        exec(callback_import, globals())
    exec('import time' + "\n" + callback_source)
    callback = locals()[callback_name]
    return callback(*args)

def submit(executor, callback, *args):
    # Serialize the function's source and the imports it needs, then submit
    # a wrapper that rebuilds and runs it in the worker process.
    callback_source = inspect.getsource(callback)
    callback_imports = list(imports(callback.__globals__))
    callback_name = callback.__name__
    future = executor.submit(
        execute_source,
        callback_imports, callback_name, callback_source, args
    )
    return future

def imports(callback_globals):
    # Yield an import statement for every module visible to the callback,
    # preserving "import x as y" aliases.
    for name, val in list(callback_globals.items()):
        if isinstance(val, types.ModuleType) and val.__name__ != 'builtins' and val.__name__ != __name__:
            import_line = 'import ' + val.__name__
            if val.__name__ != name:
                import_line += ' as ' + name
            yield import_line
Here is how you would use this:
from concurrent.futures import as_completed, ProcessPoolExecutor
import time
import numpy as np
import winprocess

def do_work(idx1, idx2):
    time.sleep(0.2)
    return np.mean([idx1, idx2])

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = set()
    for idx in range(32):
        future = winprocess.submit(
            executor, do_work, idx, idx * 2
        )
        futures.add(future)
    for future in as_completed(futures):
        print(future.result())
Notice that executor.submit has been replaced with winprocess.submit, and the original executor is passed to that function as a parameter.
What happens here is that the notebook's function code and imports are serialized and passed to the module for execution. The code is not executed until it is safely in a new process, so it does not trip over trying to spawn a new process from the Jupyter notebook itself.
Imports are handled in such a way as to maintain aliases. The import magic can be removed if you make sure everything the function needs is imported inside the function itself.
Also, this solution only works if you pass all necessary variables as arguments to the function. The function should be self-contained, so to speak, but I think that's a requirement of ProcessPoolExecutor as well. Finally, make sure you don't call other functions defined elsewhere in the notebook: only external modules will be imported, so other notebook functions won't be available in the worker.

Sqlite3 crashes from Process on Python 3

I am trying to initialize my database from a process, but I get return code -11, which seems to be a segmentation fault.
import requests
import sqlite3
from multiprocessing import Process

def initializer():
    print("Before sqlite connect")
    connection = sqlite3.connect("/tmp/files.db")
    print("Before sqlite cursor")
    connection.cursor()

if __name__ == "__main__":
    response = requests.get("http://google.com")
    init_process = Process(target=initializer)
    init_process.start()
    init_process.join()
    print("Exit code: %d" % init_process.exitcode)
The process crashes before I can create the cursor:
% python3 file-scanner.py
Before sqlite connect
Exit code: -11
It seems to happen only on Python 3 on OS X, and only if I use requests before connecting to sqlite.
I don't have the opportunity to switch to Python 2 -- nor do I really want to.
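This looks like the same fork-versus-spawn issue discussed above, likely because macOS system frameworks touched during the HTTP request are not fork-safe. One thing to try (an assumption, not a confirmed fix) is forcing the spawn start method so the child starts a fresh interpreter:

import requests
import sqlite3
import multiprocessing as mp

def initializer():
    print("Before sqlite connect")
    connection = sqlite3.connect("/tmp/files.db")
    print("Before sqlite cursor")
    connection.cursor()

if __name__ == "__main__":
    # 'spawn' starts a fresh interpreter instead of fork()ing the parent,
    # so the child does not inherit state from non-fork-safe libraries.
    mp.set_start_method("spawn")
    response = requests.get("http://google.com")
    init_process = mp.Process(target=initializer)
    init_process.start()
    init_process.join()
    print("Exit code: %d" % init_process.exitcode)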

Tkinter problems with GUI when entering while loop

I have a simple GUI which runs various scripts from another Python file. Everything works fine until the GUI runs a function that includes a while loop, at which point the GUI seems to crash and become inactive. Does anybody have any ideas as to how this can be overcome? I believe it is something to do with the GUI being updated. Thanks. Below is a simplified version of my GUI.
GUI
#!/usr/bin/env python
# Python 3
from tkinter import *
from tkinter import ttk
from Entry import ConstrainedEntry
import tkinter.messagebox
import functions

AlarmCode = "2222"

root = Tk()
root.title("Simple Interface")

mainframe = ttk.Frame(root, padding="3 3 12 12")
mainframe.grid(column=0, row=0, sticky=(N, W, E, S))
mainframe.columnconfigure(0, weight=1)
mainframe.rowconfigure(0, weight=1)

ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: functions.test()).grid(
    column=5, row=5, sticky=SE)

for child in mainframe.winfo_children():
    child.grid_configure(padx=5, pady=5)

root.mainloop()
functions
import time

def test():
    period = 0
    while True:
        if period <= 100:
            time.sleep(1)
            period += 1
            print(period)
        else:
            print("100 seconds has passed")
            break
What happens in the above is that while the loop is running the application crashes. If I insert a break in the else statement after the period has elapsed, everything works fine. I want users to be able to click things while loops are running, as this GUI will run a number of different functions.
Don't use time.sleep in the same thread as your Tkinter code: it freezes the GUI until the execution of test is finished. To avoid this, you should use the after widget method:
# GUI
ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: functions.test(root)).grid(
    column=5, row=5, sticky=SE)

# functions
def test(root, period=0):
    if period <= 100:
        period += 1
        print(period)
        root.after(1000, lambda: test(root, period))
    else:
        print("100 seconds has passed")
Update:
In your comment you add that your real code won't use time.sleep, so the original example may not be the most appropriate. In that case, you can create a new thread to run your intensive code.
Note that I posted the after alternative first because multithreading should be used only if it is completely necessary: it adds overhead to your application, as well as more difficulty debugging your code.
from threading import Thread

# GUI
ttk.Button(mainframe, width=12, text="ButtonTest",
           command=lambda: Thread(target=functions.test).start()).grid(
    column=5, row=5, sticky=SE)

# functions
def test():
    for x in range(100):
        time.sleep(1)  # Simulate intense task (not real code!)
        print(x)
    print("100 seconds has passed")
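One caveat: Tkinter widgets should only be touched from the main thread. If the worker thread needs to report progress back to the GUI, a common pattern is to push values onto a queue and poll it with after. A minimal sketch, assuming the root window and time import from the GUI example above:

import queue
from threading import Thread

q = queue.Queue()

def test():
    for x in range(100):
        time.sleep(1)  # simulate intense task
        q.put(x)       # hand the value to the GUI thread via the queue

def poll_queue():
    try:
        while True:
            print(q.get_nowait())  # update a label here instead of printing
    except queue.Empty:
        pass
    root.after(100, poll_queue)  # check again in 100 ms

Thread(target=test, daemon=True).start()
poll_queue()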
