How to execute multiple procs in Tcl scripting - parallel-processing

I have 4 procs in my Tcl script. Each proc contains a while loop that waits for a task to finish and then processes the result files. My goal now is to run these 4 processes in parallel instead of one by one. Does anyone have any idea?
Background:
Until now I have opened 4 terminals in KDE/GNOME to execute the different tasks, so the 4 tasks actually run together.

Tcl threads can do the job just fine: http://www.tcl.tk/man/tcl8.6/ThreadCmd/thread.htm
Of course you may just leave everything as it is and run your scripts in the background within one terminal, if that's what you are looking for, e.g.
script1.tcl &
script2.tcl &
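If you also want that terminal to block until the backgrounded scripts are done, the shell's built-in wait (with no arguments) waits for all of them:
script1.tcl &
script2.tcl &
wait    # returns once both background jobs have exited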

Threading is a better option for this scenario, and it gives you better control over your subprocesses. You can refer to the following link for a simple example: https://www.activestate.com/blog/2016/09/threads-done-right-tcl
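For illustration only, a minimal sketch of that approach with the Thread package; tasks.tcl, run_task, and the four task names are hypothetical stand-ins for your own procs:
package require Thread

set tids {}
foreach task {task1 task2 task3 task4} {
    # -joinable lets the main thread block on this worker with thread::join
    lappend tids [thread::create -joinable [string map [list %TASK% $task] {
        source tasks.tcl     ;# load your procs in the worker thread
        run_task %TASK%      ;# run one long-running task, while loop and all
    }]]
}
foreach t $tids {
    thread::join $t          ;# wait until every worker has finished
}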

Related

How to monitor and control background processes in shell script

I need to write a shell (bash) script that will be executing several Hive queries.
Each of the queries will produce a directory with a lot of files.
After all queries are finished I need to process all these files in a specific order.
I want to run the Hive queries in parallel as background processes, as each one might take a couple of hours.
I would also like to parallelize the processing of the resulting files, but there are some constraints that I don't know how to handle. For example, I can start processing the results of the first and second queries as soon as those queries are finished, but the third processing step has to wait until the processors for the first two are done. Similarly for the fourth and fifth.
I won't have any problems writing such a program in Java, but how to do it in a shell script beats me.
If someone can give me a hint on how I can monitor the execution of these components in the shell script, I would appreciate it greatly.
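For what it's worth, a hedged sketch of the usual bash pattern for this: run_query*.sh and process_results*.sh are hypothetical wrappers around the Hive invocations and the file processing, and the key idea is to record each background job's PID with $! and pass the relevant PIDs to wait.
#!/bin/bash
# queries 1 and 2 each get their own pipeline: run the query, then process its results
( ./run_query1.sh && ./process_results1.sh ) & job1=$!
( ./run_query2.sh && ./process_results2.sh ) & job2=$!

# query 3 runs in parallel with everything above
./run_query3.sh & q3=$!

# processing step 3 needs query 3 done AND processors 1 and 2 done
wait "$job1" "$job2" "$q3"
./process_results3.sh
The same pattern extends to the fourth and fifth queries by adding more PIDs to the relevant wait calls.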

Correct use of the Bash wait command with an unknown number of processes

I'm writing a bash script that essentially fires off a Python script that takes roughly 10 hours to complete, followed by an R script that checks the outputs of the Python script for anything I need to be concerned about. Here is what I have:
ProdRun="python scripts/run_prod.py"
echo "Commencing Production Run"
$ProdRun #Runs python script
wait
DupCompare="R CMD BATCH --no-save ../dupCompareTD.R" #Runs R script
$DupCompare
Now my issue is that the Python script can often generate a whole heap of different processes on our Linux server depending on its input, with lots of different PIDs, AND we have heaps of workers using the same server firing off scripts. As far as I can tell from reading, the 'wait' command either waits for all of the shell's child processes to finish or for a specific PID to finish, but when I cannot tell what or how many PIDs will be assigned, or which processes will run, how exactly do I use it?
EDIT: Thank you to all that helped; here is what caused my dilemma, for anyone Google-searching this. I broke the ProdRun Python script up into the individual scripts it was itself calling, but still had the issue. I then found that one of these scripts was calling another, smaller script with a "&" at the end of the command, which made it ignore any attempt to wait on it inside the Python script itself. Simply removing the "&" and invoking it with "os.system()" allowed all the code to run sequentially.
It sounds like you are trying to implement a job scheduler, possibly with some complex dependencies between different tasks. I recommend using a dedicated job scheduler instead. It lets you specify the jobs to run while also giving you features like monitoring and handling of exceptional cases and errors.
Examples are the open-source Rundeck (https://github.com/rundeck/rundeck) or the commercial Control-M (http://www.bmcsoftware.uk/it-solutions/control-m.html).
Make your Python program wait on the children it spawns. That's the proper way to fix this scenario. Then you don't have to wait for anything in the shell: once the Python command returns, all of its children are already finished.
(Also, don't put your commands in variables.)
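Putting both pieces of advice together, a hedged sketch of what the calling script can look like once run_prod.py waits for its own children (commands taken from the question):
#!/bin/bash
echo "Commencing Production Run"
python scripts/run_prod.py                 # returns only after all its child processes are done
R CMD BATCH --no-save ../dupCompareTD.R    # then check the outputs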

Passing multiple arguments to external programs in a Pipeline

I'm trying to build a pipeline for NGS data.
I made a small example pipeline for passing commands to the shell. The example pipeline has two scripts, called from the shell, that just concatenate (sumtool.py) and multiply (multool.py) values in many dataframes (10 in this case). My wrapper (wrapper.py) handles the input and issues the commands that run the scripts in order. Here is the relevant part of the code from the wrapper:
from functools import wraps      # needed for the @wraps decorator below
from subprocess import Popen     # already imported near the top of the full wrapper.py (see Note2)

def run_cmd(orig_func):
    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        cmdls = orig_func(*args, **kwargs)
        cmdc = ' '.join(str(arg) for arg in cmdls)
        cmd = cmdc.replace(',', '')
        Popen(cmd, shell=True).wait()   # runs one command and blocks until it finishes
    return wrapper

@run_cmd
def runsumtool(*args):
    return args

for file in getcsv():   # getcsv() and dirlist are defined elsewhere in wrapper.py
    runsumtool('python3', 'sumtool.py', '--infile={}'.format(file),
               '--outfile={}'.format(dirlist[1]))
This works all right, but I want to be able to launch the commands for the first script for all the dataframes at once, wait for them all to finish, and then run the second script the same way, with all its commands at once, one per dataframe. Since Popen(...).wait() blocks on each command in turn, the current version takes much longer than it needs to.
I tried to incorporate luigi as a solution, but I wasn't successful at running external programs or passing multiple inputs/outputs with luigi. Any tip on that is appreciated.
Another solution I'm imagining is passing the samples individually all at once, but I'm not sure how to express that in Python (or any other language, really). This would also solve the I/O problem with luigi.
thanks
Note1: This is a small example pipeline I built. My main purpose is to call programs like bwa and picard in a pipeline ... which I cannot import.
Note2: I'm using Popen from subprocess already. You can find it between lines 4 and 5.
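In case it helps, one way to get that behaviour with plain subprocess is to start every first-stage command without waiting, keep the Popen handles, and only wait on all of them before starting the second stage. A rough sketch under that assumption; the file list, output directory, and second-stage arguments are placeholders:
from subprocess import Popen

def run_stage(cmds):
    # start every command in this stage at once, then wait for all of them
    procs = [Popen(cmd) for cmd in cmds]        # argument lists, so no shell=True needed
    for p in procs:
        p.wait()                                # block until this stage is fully done

files = ['df{}.csv'.format(i) for i in range(10)]   # placeholder input dataframes
outdir = 'results'                                  # placeholder output directory

# stage 1: sumtool.py on every dataframe in parallel
run_stage([['python3', 'sumtool.py', '--infile=' + f, '--outfile=' + outdir] for f in files])

# stage 2: starts only after every stage-1 process has exited
run_stage([['python3', 'multool.py', '--infile=' + f, '--outfile=' + outdir] for f in files])
If the number of dataframes gets large, you would probably want to cap how many run at once, for example with a ThreadPoolExecutor whose workers each call subprocess.run.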

Creating a shell script which can spawn multiple concurrent processes which call a specified web service

I am trying to create a load-testing shell script. Essentially, what I am looking to do is have the script spawn N concurrent processes, each of which calls a specified URL and performs a few basic actions. I am having a hard time figuring this out - any help would be awesome!
If you really need to use the shell, take a look at "Bash: parallel processes". But there are load-testing tools like ab (Apache HTTP server benchmarking) that can do the job for you.
You can use ab as simple as:
ab -n 10 -c 2 -A myuser:mypassword http://localhost:8080/
For more examples, look at Howto: Performance Benchmarks a Webserver.
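If you do want to stay in the shell, a hedged sketch of the spawn-and-wait pattern; the worker count, the URL, and the curl call are placeholders for your own actions:
#!/bin/bash
N=10                             # number of concurrent workers (placeholder)
URL="http://localhost:8080/"     # URL under test (placeholder)

for i in $(seq "$N"); do
    (
        # each worker calls the URL and can do whatever basic checks you need
        curl -s -o /dev/null -w "worker $i: HTTP %{http_code}\n" "$URL"
    ) &
done
wait    # block until all N workers have exited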
Have a look at this article:
http://prll.sourceforge.net/shell_parallel.html
As it describes itself:
"Parallel batch processing in the shell
How to process a large batch job using several concurrent processes in bash or zsh
This article describes three methods of parallel execution: the first is not very performant and the other two are not safe to use. A more complete solution is called prll and can be found here. This article is not meant to be good advice, but instead laments the state of scriptable job control in shells."

Count number of executions of batch-script

This is my problem: I've got a batch script that I can't modify (let's call it foo) and I would like to count how many times per day this script is executed, to keep track of that data.
Preferably, I would like to write the number of executions, with date and exit code, to some kind of log file.
So my question is whether this is possible and, in that case, how - i.e. how to create a batch script or something else that works in the background and writes every execution of foo to a log.
(I know this would be easy if I could modify foo but I can't. Also, everything is running on WinXP machines.)
You could write a wrapper script that does the logging and calls the existing script, then use the wrapper in place of the original script.
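For example, a hedged sketch of such a wrapper as a Windows batch file; foo_wrapper.bat, foo.bat, and the log path are just illustrative names, and callers would invoke the wrapper instead of foo directly:
@echo off
rem foo_wrapper.bat - run the real script, then log date, time and exit code
call foo.bat %*
set RC=%errorlevel%
echo %date% %time% exit=%RC% >> C:\logs\foo_runs.log
exit /b %RC%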
Consider writing a program that interrogates the Task Manager.
See http://www.netomatix.com/ProcDiagnostics.aspx
You could, for example, write a simple console app which runs on a timer; every 5 seconds it checks whether your foo process exists. When it first finds the process, it treats that as the start time of the application; when it no longer finds it, it assumes the application has closed and logs that information. It wouldn't be accurate to the second by any means, but it would give you a rough approximation of when the thing starts and stops.
You might be able to configure Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to capture the information you require.
