How to assign a specific number of CPUs to each process in Python multiprocessing?

I run two processes with multiprocessing in Python, like this:
from multiprocessing import Pool
pool = Pool(processes=2)
pool.apply_async(func1, (args,))  # note: (args,) is a one-element tuple; a bare (args) is not
pool.apply_async(func2, (args,))
pool.close()
pool.join()
func1() and func2() are independent most of the time and sometimes interact with files. func1() needs more resources (more CPU cores) and func2() needs fewer, so I want to assign more CPUs to the func1 process and fewer to the func2 process. However, I have read the multiprocessing docs and cannot find an option to set the number of CPUs per process.
Is there any good solution?
Another question: how do I monitor how much CPU each process uses? Does multiprocessing divide the CPUs equally, like 50-50, 33-33-33, 25-25-25-25?
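There is no such multiprocessing option: the OS scheduler decides how CPU time is divided, so there is no built-in 50-50 or 33-33-33 split. What you can control is which cores each process may run on. A minimal sketch using os.sched_setaffinity, assuming a Linux host with at least four cores (the core sets and the sleep stand-ins are illustrative):
import os
import time
from multiprocessing import Pool

def func1(x):
    # Pin this worker to cores 0-2; os.sched_setaffinity is Linux-only,
    # and pid 0 means "the calling process".
    os.sched_setaffinity(0, {0, 1, 2})
    time.sleep(1)  # stand-in for the heavy work
    return x

def func2(x):
    # Pin this worker to core 3 only.
    os.sched_setaffinity(0, {3})
    time.sleep(1)  # stand-in for the light work
    return x

if __name__ == "__main__":
    pool = Pool(processes=2)
    r1 = pool.apply_async(func1, (1,))
    r2 = pool.apply_async(func2, (2,))
    pool.close()
    pool.join()
    print(r1.get(), r2.get())
For monitoring, the third-party psutil package can report per-process usage, e.g. psutil.Process(p.pid).cpu_percent(interval=1); it also offers a cross-platform cpu_affinity() as an alternative to os.sched_setaffinity.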

Related

linux -- relinquishing your CPU time slice programmatically in user mode

What I'd like to do is to code a process like the following:
while (forever)
{
    do something;
    relinquish current cpu time slice;
}
Is there some call in Linux that simply ends your time slice, so that the forever loop will not hog an entire CPU thread? I'm sure you could make some other system call instead, but that invokes possibly unnecessary kernel/user CPU work; I just want to say "I am done with my time slice, reschedule me."
This type of call could also be very nice in a realtime environment.
The sched_yield(2) system call does exactly what you want. Just #include <sched.h> and call sched_yield();.
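For the curious, a minimal C sketch of the loop from the question (the loop body is a placeholder):
#include <sched.h>

int main(void)
{
    for (;;) {
        /* do something */
        sched_yield();  /* hand back the rest of this time slice */
    }
}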

Addressing multiple B200 devices through the UHD API

I have two B210 radios on USB3 on a Windows 10 system, using the latest release of the UHD USRP C API and Python 3.6 as the programming environment. I can "sort of" run them simultaneously in separate processes, but would like to know whether it is possible to run them in a single thread. How?
1. Happy to move to Linux if it makes things easier; I'm just more familiar with Windows.
2. "Sort of" = I sometimes get errors, which might be the two processes colliding somewhere down the stack.
The code below illustrates the race condition; sometimes one or both processes fail with error code 40 (UHD_ERROR_ASSERTION) or occasionally code 11 (UHD_ERROR_KEY).
from ctypes import (windll, byref, c_void_p, c_char_p)
from multiprocessing import Process, current_process

def pread(argstring):
    # get a handle for the device
    usrp = c_void_p(0)
    uhdapi = windll.uhd
    p_str = c_char_p(argstring.encode("UTF8"))
    errNo = uhdapi.uhd_usrp_make(byref(usrp), p_str)
    if errNo != 0:
        print("\r*****************************************************************")
        print("ERROR: ", errNo, " IN: ", current_process())
        print("=================================================================")
    if usrp.value != 0:
        uhdapi.uhd_usrp_free(byref(usrp))
    return

if __name__ == '__main__':
    while True:
        p2 = Process(target=pread, args=("",))
        p1 = Process(target=pread, args=("",))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("end")
Yes, you can have multiple multi_usrp handles.
By the way, note that UHD is natively C++, and the C API is just a wrapper around it. It's designed for generating scripting interfaces like the Python one you're using (I don't know which interface between Python and the C API you're using – something self-written?).
While it's possible, there's no good reason to call the recv and send functions from the same thread – most modern machines have multiple cores, and you should make use of that. Real-time SDR is a CPU-intensive task, and you should use all the CPU resources available to get data to and from the driver, so as to avoid overflowing buffers.
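For illustration, a minimal single-process sketch with one thread per radio, reusing the ctypes wrappers from the question; the "serial=..." device-address strings are placeholders for your actual serials:
from ctypes import windll, byref, c_void_p, c_char_p
from threading import Thread

def run_device(argstring):
    # Each thread opens, uses, and frees its own multi_usrp handle.
    usrp = c_void_p(0)
    uhdapi = windll.uhd
    err = uhdapi.uhd_usrp_make(byref(usrp), c_char_p(argstring.encode("UTF8")))
    if err == 0:
        pass  # stream with this handle here
    if usrp.value != 0:
        uhdapi.uhd_usrp_free(byref(usrp))

if __name__ == "__main__":
    t1 = Thread(target=run_device, args=("serial=XXXXXXX",))
    t2 = Thread(target=run_device, args=("serial=YYYYYYY",))
    t1.start(); t2.start()
    t1.join(); t2.join()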

How to change process (application) priority from Normal to Low programmatically in Golang

How do I change the process priority class programmatically in Golang?
I have a CPU-intensive task, and I want system and user programs to have higher priority, so that my Go application runs only when the system is idle, or better, uses only free CPU cores.
Like this:
System.Diagnostics.Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.Idle;
https://stackoverflow.com/questions/10391449/set-process-priority-of-an-application
But in Golang.
Thanks in advance.
A process's scheduling priority is an OS- and platform-dependent setting, so you might want to look at the syscall package:
func Setpriority(which int, who int, prio int) (err error)
This works on Linux.
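A minimal Linux-only sketch (19 is the weakest niceness; the work itself is a placeholder):
package main

import (
    "fmt"
    "syscall"
)

func main() {
    // PRIO_PROCESS with who == 0 targets the calling process;
    // prio 19 is the lowest (nicest) priority.
    if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 19); err != nil {
        fmt.Println("setpriority failed:", err)
    }
    // ... CPU-intensive work runs here at low priority ...
}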

Why do goroutines only operate on one core? [duplicate]

I'm testing this Go code on my VirtualBoxed Ubuntu 11.04:
package main

import (
    "big" // pre-Go 1: "big" later became "math/big"
    "fmt"
    "time" // time.Nanoseconds() predates time.Now()
)

var c chan *big.Int

func sum(start, stop, step int64) {
    bigStop := big.NewInt(stop)
    bigStep := big.NewInt(step)
    bigSum := big.NewInt(0)
    for i := big.NewInt(start); i.Cmp(bigStop) < 0; i.Add(i, bigStep) {
        bigSum.Add(bigSum, i)
    }
    c <- bigSum
}

func main() {
    s := big.NewInt(0)
    n := time.Nanoseconds()
    step := int64(4)
    c = make(chan *big.Int, int(step))
    stop := int64(100000000)
    for j := int64(0); j < step; j++ {
        go sum(j, stop, step)
    }
    for j := int64(0); j < step; j++ {
        s.Add(s, <-c)
    }
    n = time.Nanoseconds() - n
    fmt.Println(s, float64(n)/1000000000.)
}
Ubuntu has access to all 4 of my cores; I checked this by running several executables simultaneously and watching System Monitor.
But when I run this code, it uses only one core and gains nothing from parallel processing.
What am I doing wrong?
You probably need to review the Concurrency section of the Go FAQ, specifically these two questions, and work out which (if not both) apply to your case:
Why doesn't my multi-goroutine program use multiple CPUs?
You must set the GOMAXPROCS shell environment variable or use the similarly-named function of the runtime package to allow the run-time support to utilize more than one OS thread. Programs that perform parallel computation should benefit from an increase in GOMAXPROCS. However, be aware that concurrency is not parallelism.
Why does using GOMAXPROCS > 1 sometimes make my program slower?
It depends on the nature of your program. Programs that contain several goroutines that spend a lot of time communicating on channels will experience performance degradation when using multiple OS threads. This is because of the significant context-switching penalty involved in sending data between threads. Go's goroutine scheduler is not as good as it needs to be. In future, it should recognize such cases and optimize its use of OS threads. For now, GOMAXPROCS should be set on a per-application basis.
For more detail on this topic see the talk entitled Concurrency is not Parallelism.
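In this case the fix is a one-liner; a minimal sketch for the Go of that era, where GOMAXPROCS defaulted to 1 (the 4 matches the four cores mentioned above):
package main

import "runtime"

func main() {
    // Allow the runtime to schedule goroutines onto up to 4 OS threads.
    // Equivalently, set the GOMAXPROCS environment variable before running.
    runtime.GOMAXPROCS(4)
    // ... launch the sum goroutines exactly as in the question ...
}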

Poor performance / lockup with STM

I'm writing a program where a large number of agents listen for events and react to them. Since Control.Concurrent.Chan.dupChan is deprecated, I decided to use TChans as advertised.
The performance of TChan is much worse than I expected. I have the following program that illustrates the issue:
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Concurrent.STM
import Control.Concurrent
import System.Random (randomRIO)
import Control.Monad (forever, when)

allCoords :: [(Int,Int)]
allCoords = [(x,y) | x <- [0..99], y <- [0..99]]

randomCoords :: IO (Int,Int)
randomCoords = do
    x <- randomRIO (0,99)
    y <- randomRIO (0,99)
    return (x,y)

main = do
    chan <- newTChanIO :: IO (TChan ((Int,Int),Int))
    let watcher p = do
            chan' <- atomically $ dupTChan chan
            forkIO $ forever $ do
                r@(p',_counter) <- atomically $ readTChan chan'
                when (p == p') (print r)
            return ()
    mapM_ watcher allCoords
    let go !cnt = do
            xy <- randomCoords
            atomically $ writeTChan chan (xy,cnt)
            go (cnt+1)
    go 1
When compiled (-O) and run, the program will at first output something like this:
./tchantest
((0,25),341)
((0,33),523)
((0,33),654)
((0,35),196)
((0,48),181)
((0,48),446)
((1,15),676)
((1,50),260)
((1,78),561)
((2,30),622)
((2,38),383)
((2,41),365)
((2,50),596)
((2,57),194)
((3,19),259)
((3,27),344)
((3,33),65)
((3,37),124)
((3,49),109)
((3,72),91)
((3,87),637)
((3,96),14)
((4,0),34)
((4,17),390)
((4,73),381)
((4,74),217)
((4,78),150)
((5,7),476)
((5,27),207)
((5,47),197)
((5,49),543)
((5,53),641)
((5,58),175)
((5,70),497)
((5,88),421)
((5,89),617)
((6,0),15)
((6,4),322)
((6,16),661)
((6,18),405)
((6,30),526)
((6,50),183)
((6,61),528)
((7,0),74)
((7,28),479)
((7,66),418)
((7,72),318)
((7,79),101)
((7,84),462)
((7,98),669)
((8,5),126)
((8,64),113)
((8,77),154)
((8,83),265)
((9,4),253)
((9,26),220)
((9,41),255)
((9,63),51)
((9,64),229)
((9,73),621)
((9,76),384)
((9,92),569)
...
And then, at some point, it will stop writing anything while still consuming 100% CPU.
((20,56),186)
((20,58),558)
((20,68),277)
((20,76),102)
((21,5),396)
((21,7),84)
With -threaded the lockup comes even faster, after only a handful of lines. It will also consume however many cores are made available through the RTS -N flag.
Additionally, the performance seems rather poor – only about 100 events per second are processed.
Is this a bug in STM, or am I misunderstanding something about the semantics of STM?
The program is going to perform quite badly. You're spawning off 10,000 threads, all of which will queue up waiting for a single TVar to be written to. So once they're all going, you may well get this happening:
1. Each of the 10,000 threads tries to read from the channel, finds it empty, and adds itself to the wait queue for the underlying TVar. So you'll have 10,000 queue-up events, and 10,000 processes in the wait queue for the TVar.
2. Something is written to the channel. This will unqueue each of the 10,000 threads and put them back on the run queue (this may be O(N) or O(1), depending on how the RTS is written).
3. Each of the 10,000 threads must then process the item to see if it's interested in it, which most won't be.
So each item will cause processing O(10,000). If you see 100 events per second, that means that each thread requires about 1 microsecond to wake up, read a couple of TVars, write to one and queue up again. That doesn't seem so unreasonable. I don't understand why the program would grind to a complete halt, though.
In general, I would scrap this design and replace it as follows:
Have a single thread reading the event channel, which maintains a map from coordinate to interested-receiver-channel. The single thread can then pick out the receiver(s) from the map in O(log N) time (much better than O(N), and with a much smaller constant factor involved), and send the event to just the interested receiver. So you perform just one or two communications to the interested party, rather than 10,000 communications to everyone. A list-based form of the idea is written in CHP in section 5.4 of this paper: http://chplib.files.wordpress.com/2011/05/chp.pdf
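A minimal sketch of that dispatcher design, assuming one private TChan per coordinate (names are illustrative, and it needs a reasonably modern GHC; a writer would feed the events channel exactly as go does in the question):
import qualified Data.Map as M
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forever)

dispatcher :: TChan ((Int,Int),Int) -> IO ()
dispatcher events = do
    let coords = [(x,y) | x <- [0..99], y <- [0..99]]
    -- one private channel per coordinate
    receivers <- M.fromList <$> mapM (\p -> (,) p <$> newTChanIO) coords
    -- one reader per coordinate, each blocked only on its own channel
    mapM_ (\(_, ch) -> forkIO $ forever $ atomically (readTChan ch) >>= print)
          (M.toList receivers)
    -- the single dispatching thread: an O(log n) map lookup per event,
    -- instead of waking all 10,000 readers
    forever $ do
        ev@(p, _) <- atomically $ readTChan events
        maybe (return ()) (\ch -> atomically $ writeTChan ch ev) (M.lookup p receivers)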
This is a great test case! I think you've actually created a rare instance of genuine livelock/starvation. We can test this by compiling with -eventlog and running with -vst or by compiling with -debug and running with -Ds. We see that even as the program "hangs" the runtime still is working like crazy, jumping between blocked threads.
The high-level reason is that you have one (fast) writer and many (fast) readers. The readers and writer both need to access the same tvar representing the end of the queue. Let's say that nondeterministically one thread succeeds and all others fail when this happens. Now, as we increase the number of threads in contention to 100*100, then the probability of the reader making progress rapidly goes towards zero. In the meantime, the writer in fact takes longer in its access to that tvar than do the readers, so that makes things worse for it.
In this instance, putting a tiny throttle between each invocation of go for the writer (say, threadDelay 100) is enough to fix the problem. It gives the readers enough time to all block between successive writes, and so eliminates the livelock. However, I do think that it would be an interesting problem to improve the behavior of the runtime scheduler to deal with situations like this.
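Concretely, that throttle is a one-line change to the writer loop from the question:
let go !cnt = do
        xy <- randomCoords
        atomically $ writeTChan chan (xy, cnt)
        threadDelay 100  -- let the readers all block again before the next write
        go (cnt + 1)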
Adding to what Neil said, your code also has a space leak (noticeable with smaller n). After fixing the obvious tuple build-up issue by making the tuples strict, I was left with a profile that points at the channel itself. What's happening here, I think, is that the main thread is writing data to the shared TChan faster than the worker threads can read it (TChan, like Chan, is unbounded). So the worker threads spend most of their time re-executing their respective STM transactions, while the main thread is busy stuffing even more data into the channel; this explains why your program hangs.
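As a sketch of what "making the tuples strict" can look like (the Ev type is illustrative, not from the original code):
module Ev where

-- Strict fields stop unevaluated thunks from piling up
-- inside the unbounded channel.
data Ev = Ev !(Int, Int) !Int deriving Show

-- The channel then becomes TChan Ev, and the writer does:
--   atomically $ writeTChan chan (Ev xy cnt)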
