If I want to set the number of threads and run from a terminal, I use
export JULIA_NUM_THREADS=4
but what do I have to do if I want to parallelise my program in a Jupyter notebook? How can I set the number of threads that I want?
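For what it's worth, one common approach (assuming you use the IJulia kernel) is to register a Jupyter kernel that starts Julia with the environment variable already set, roughly like this:

using IJulia
# Sketch: register a kernel that launches Julia with JULIA_NUM_THREADS preset
installkernel("Julia (4 threads)", env = Dict("JULIA_NUM_THREADS" => "4"))

After selecting that kernel and restarting the notebook, Threads.nthreads() should report 4.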
I'm writing a bash script that essentially fires off a Python script that takes roughly 10 hours to complete, followed by an R script that checks the outputs of the Python script for anything I need to be concerned about. Here is what I have:
ProdRun="python scripts/run_prod.py"
echo "Commencing Production Run"
$ProdRun #Runs python script
wait
DupCompare="R CMD BATCH --no-save ../dupCompareTD.R"
$DupCompare #Runs R script
Now my issue is that the Python script can often generate a whole heap of different processes on our Linux server depending on its input, with lots of different PIDs, AND we have heaps of workers using the same server firing off scripts. As far as I can tell from reading, the wait command waits either for all processes to finish or for a specific PID to finish, but when I cannot tell what or how many PIDs will be assigned or processes run, how exactly do I use it?
EDIT: Thank you to all that helped; here is what caused my dilemma, for anyone Google-searching this. I broke the ProdRun Python script up into the individual scripts it was itself calling, but still had the issue. I then found that one of those scripts was calling yet another, smaller script with an "&" at the end of the command, which made it ignore any attempt to wait on it inside the Python script itself. Simply removing the "&" and invoking the script through a blocking os.system() call allowed all the code to run sequentially.
It sounds like you are trying to implement a job scheduler, possibly with some complex dependencies between tasks. I recommend using a dedicated job scheduler instead. It lets you specify how those jobs should run, while also giving you features like monitoring, error handling, and recovery from exceptional cases.
Examples are the open-source Rundeck (https://github.com/rundeck/rundeck) or the commercial Control-M (http://www.bmcsoftware.uk/it-solutions/control-m.html).
Make your Python program wait on the children it spawns; that's the proper way to fix this scenario. Then you don't have to wait for anything after the Python process itself finishes, because by that point all of its children are done too (see the sketch below).
(Also, don't put your commands in variables.)
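A minimal sketch of that fix inside the Python program, assuming the subprocess module (the inner script path is hypothetical):

import subprocess

# Launch the inner script and block until it exits -- no trailing "&" anywhere
subprocess.run(["python", "scripts/inner_script.py"], check=True)

check=True makes the parent raise an error if the child fails, which is usually what you want in a pipeline like this.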
I have 4 procs in my Tcl script. Each proc contains a while loop that waits for a task to finish and then processes the result files. My goal now is to run these 4 procs in parallel instead of one by one. Does anyone have any ideas?
Background:
The normal way before was to open 4 terminals in KDE/GNOME and execute the different tasks there, so the 4 tasks actually ran together.
Tcl threads can do the job just fine: http://www.tcl.tk/man/tcl8.6/ThreadCmd/thread.htm
Of course you may just leave everything as it is and run your scripts in the background within one terminal, if that's what you are looking for, e.g.
script1.tcl &
script2.tcl &
Threading is the better option for this scenario, and it gives you better control over your subprocesses. See the following link for a simple example: https://www.activestate.com/blog/2016/09/threads-done-right-tcl
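A minimal sketch of the threaded approach, assuming each task lives in its own script file (the file names are illustrative):

package require Thread

# Create one joinable thread per task script
set tids {}
foreach script {task1.tcl task2.tcl task3.tcl task4.tcl} {
    lappend tids [thread::create -joinable "source $script"]
}

# Block until all four threads have finished
foreach tid $tids {
    thread::join $tid
}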
I am an avid csh/tcsh user.
But the current environment I have to work on has all ksh scripts. The team works on k-shell.
So, if I select a seed and run a test in ksh and in csh, would the outcome be the same?
The seed is just one example; I want to know whether using an alternate shell would create any divergence in the end result.
The whole point of a seeding mechanism for random number generators is to be able to reproduce results regardless of other factors. This means that as long as you're running the same compiled code for the simulator (same version, basically) you're going to get the same results when passing in a seed, regardless of what machine you're running on, what shell you use, etc.
Also, the shell you use has no effect on the executable being started, other than setting environment variables that the program might read. That is the one point where you'll have to make sure the two setups don't diverge.
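If you want to verify that, a quick sanity check (a sketch; the file names are arbitrary) is to dump the environment as each shell sees it and diff the two:

csh -c 'env | sort' > env_csh.txt
ksh -c 'env | sort' > env_ksh.txt
diff env_csh.txt env_ksh.txt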
I have a shell script. In that script I am starting 6 new processes. My system has 4 CPUs.
If I run the shell script, the newly spawned processes are automatically allocated to one of the CPUs by the operating system. Now, I want to reduce the total running time of my script. Is there a way to check a processor's utilization and then choose one on which to run my process?
I do not want to run a process on a CPU that is >75% utilized; I would rather wait and run it on a CPU that is <75% utilized.
I need my script to check the utilization of the 4 CPUs and then run each process on the chosen CPU.
Can someone please help me with an example?
I recommend GNU Parallel:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
In addition, use nice.
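As a sketch combining the two (the job names are placeholders), running the six processes at most four at a time, at reduced priority:

# Run at most 4 jobs at once, matching the 4 CPUs, at lowered priority
nice -n 10 parallel -j 4 ::: ./job1.sh ./job2.sh ./job3.sh ./job4.sh ./job5.sh ./job6.sh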
You can tell the scheduler that a certain CPU should be used, using the taskset command:
taskset -c 1 process
will tell the scheduler that process should run on CPU1.
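For instance, a sketch that pins the six jobs round-robin onto CPUs 0-3 (the job names are placeholders):

i=0
for job in ./job1.sh ./job2.sh ./job3.sh ./job4.sh ./job5.sh ./job6.sh; do
    taskset -c $((i % 4)) "$job" &   # pin this job to CPU (i mod 4)
    i=$((i + 1))
done
wait   # block until all six background jobs finish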
However, I think in most cases the built-in Linux scheduler should work well.
I want to increase the throughput of a script which does network I/O (a scraper). Instead of making it multithreaded in Ruby (I use the default 1.9.1 interpreter), I want to launch multiple processes. So, is there a system for doing this where I can track when one process finishes and re-launch it, so that I have X number running at any time? Also, some will run with different command args. I was thinking of writing a bash script, but that sounds like a potentially bad idea if there already exists a method for doing something like this on Linux.
I would recommend not forking, but instead using EventMachine (and the excellent em-http-request if you're doing HTTP). Managing multiple processes can be a bit of a handful, even more so than handling multiple threads, but going down the evented path is, in comparison, much simpler. Since you want to do mostly network IO, which consists mostly of waiting, I think an evented approach would scale as well as, or better than, forking or threading. And most importantly: it will require much less code, and it will be more readable.
Even if you decide on running separate processes for each task, EventMachine can help you write the code that manages the subprocesses using, for example, EventMachine.popen.
And finally, if you want to do it without EventMachine, read the docs for IO.popen, Open3.popen and Open4.popen. All do more or less the same thing but give you access to the stdin, stdout, stderr (Open3, Open4), and pid (Open4) of the subprocess.
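A small sketch of the Open3 variant (the command is a placeholder):

require 'open3'

# Run one subprocess, capturing its output and exit status
Open3.popen3("ruby scraper.rb") do |stdin, stdout, stderr, wait_thr|
  puts stdout.read
  puts "exit status: #{wait_thr.value.exitstatus}"
end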
You can try fork: http://ruby-doc.org/core/classes/Process.html#M003148
You get the child's PID in return, which you can track to decide whether that process needs to be run again or not.
If you want to manage IO concurrency, I suggest you use EventMachine.
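A minimal sketch of that (the scrape method is hypothetical):

pid = fork { scrape(ARGV) }  # child process runs one scraping task
Process.wait(pid)            # parent blocks until that child exits
puts "worker #{pid} finished; relaunch it here if needed"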
You can either
implement (or find an equivalent gem) a ThreadPool (ProcessPool, in your case), or
prepare an array of all, let's say, 1000 tasks to be processed, split it into, say, 10 chunks of 100 tasks (10 being the number of parallel processes you want to launch), and launch 10 processes, each of which immediately receives 100 tasks to process. That way you don't need to launch 1000 processes and make sure that no more than 10 of them work at the same time. A rough sketch of a process pool follows below.
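Here is that pool variant sketched out, keeping at most 10 workers alive and relaunching as they exit (the task list and command names are illustrative):

MAX_WORKERS = 10
tasks = (1..1000).map { |i| "ruby scraper.rb --input #{i}" }  # placeholder commands
running = {}

until tasks.empty? && running.empty?
  # Top up the pool to MAX_WORKERS processes
  while running.size < MAX_WORKERS && !tasks.empty?
    cmd = tasks.shift
    running[Process.spawn(cmd)] = cmd
  end
  # Block until any child exits, then free its slot
  pid = Process.wait
  running.delete(pid)
end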