I'm trying to build a pipeline for NGS data.
I made a small example pipeline for passing commands to shell. Example pipeline has two scripts thats called from shell that just concatenates(sumtool.py) and multiplies(multool.py) values in many dataframes (10 in this case). My wrapper(wrapper.py) handles with the input and passes the commands that runs the scripts in order. Here is the relevant part of the code from the wrapper:
def run_cmd(orig_func):
#wraps(orig_func)
def wrapper(*args,**kwargs):
cmdls = orig_func(*args,**kwargs)
cmdc = ' '.join(str(arg) for arg in cmdls)
cmd = cmdc.replace(',','')
Popen(cmd,shell=True).wait()
return wrapper
#run_cmd
def runsumtool(*args):
return args
for file in getcsv():
runsumtool('python3','sumtool.py','--infile={}'.format(file),'--outfile={}'.format(dirlist[1]))
This works alright but I want to be able to pass all the commands at once for the first script with all the dataframes wait for it to finish and then run the second script with all commands at once for every dataframe. Since Popen().wait() waits for each command it takes way longer.
I tried to incorporate luigi for a solution but I wasn't successful running external programs or trying to pass multiple I/O's with luigi. Any tip on that is appreciated.
Another solution I'm imagining is passing the samples individually all at once but I'm not sure how to put it in python(or any other language really). This would also solve the I/O problem with luigi.
thanks
Note1: This is a small example pipeline I build. My main purpose is to call programs like bwa, picard in a pipeline ... which i cannot import.
Note2: I'm using Popen from subprocess already. You can find it between lines 4 and 5.
I've been using Julia in parallel on my computer successfully but want to increase the number of processors/workers I use so I plan to use my departmental cluster (UCL Econ). When just using julia on my computer, I have two seperate files. FileA contains all the functions I use, including the main function funcy(x,y,z). FileB calls this function over several processors as follows:
addprocs(4)
require("FileA.jl")
solution = pmap(imw -> funcy(imw,y,z), 1:10)
When I try to run this on the cluster, the require statement does not seem to work (though I don't get an explicit error output which is frustrating). Any advice?
I recognize that it is possible to run a script in the background if the code is encompassed by the syntax:
job1 = fork do
# functional code in script
end
Process.detach(job1)
Now I have attempted this format on many occasions and it has worked on each occasion to successfully place the script in the background, so that it does not terminate when the Terminal window is closed, however for this particular script (too large to include), it stubbornly remains tethered to its window, and terminates when that window is closed.
This makes no sense, so my questions:
Is there any Ruby programming construct which could keep a program tethered to a window and in the foreground? e.g. print function, which requires a window to print to (so may possibly effect things).
Is there a better alternative to the above-mentioned format for placing a script in the background, using code within that script.
A colleague has a MATLAB startup.m file that contains interactive code (it calls the command questdlg to ask him which project directory he wishes to work in).
This works fine for him when running MATLAB directly. However, he also needs to run MATLAB code in parallel, having started up a matlabpool.
When starting up, the workers in the matlabpool are running his startup.m file, getting to the questdlg and then hanging (infinitely, or until Ctrl C).
An easy solution is to just get rid of the interactive code from his startup.m, as it's not really essential.
But is there a way to detect whether this startup.m is being run by a worker starting up - something similar to isdeployed or ismcc? Then he could keep the interactive code that he finds useful, but only execute it when not starting up a worker.
The command getCurrentWorker seemed like it might be what was needed, but I believe that only works during the execution of a task, rather than at startup.
You could use the usejava function to see if the interactive desktop is running, which is probably a good enough approximation unless you frequently use -nodesktop mode.
if usejava('desktop')
questdlg(...);
end
Take a look at labindex and, failing that, labSend and labReceive.
At least for my R2014b,
isempty(getCurrentWorker)
seem to do the job:
>> getCurrentWorker
ans =
[]
>> parfor i=1:2;disp(getCurrentWorker);end
Worker
Host: IMP.OIMRDS
ComputerType: WIN64
ProcessId: 15784
Worker
Host: IMP.OIMRDS
ComputerType: WIN64
ProcessId: 17220
This question already has an answer here:
How to abort a running program in MATLAB?
(1 answer)
Closed 7 years ago.
I write a long running script in Matlab, e.g.
tic;
d = rand(5000);
[a,b,c] = svd(d);
toc;
It seems running forever. Becasue I press F5 in the editor window. So I cannot press C-Break to stop in the Matlab console.
I just want to know how to stop the script. I am current use Task Manager to kill Matlab, which is really silly.
Thanks.
Matlab help says this-
For M-files that run a long time, or that call built-ins or MEX-files that run a long time, Ctrl+C does not always effectively stop execution. Typically, this happens on Microsoft Windows platforms rather than UNIX[1] platforms. If you experience this problem, you can help MATLAB break execution by including a drawnow, pause, or getframe function in your M-file, for example, within a large loop. Note that Ctrl+C might be less responsive if you started MATLAB with the -nodesktop option.
So I don't think any option exist. This happens with many matlab functions that are complex. Either we have to wait or don't use them!.
If ctrl+c doesn't respond right away because your script is too long/complex, hold it.
The break command doesn't run when matlab is executing some of its deeper scripts, and either it won't log a ctrl sequence in the buffer, or it clears the buffer just before or just after it completes those pieces of code. In either case, when matlab returns to execute more of your script, it will recognize that you are holding ctrl+c and terminate.
For longer running programs, I usually try to find a good place to provide a status update and I always accompany that with some measure of time using tic and toc. Depending on what I am doing, I might use run time, segment time, some kind of average, etc...
For really long running programs, I found this to be exceptionally useful
http://www.mathworks.com/matlabcentral/fileexchange/16649-send-text-message-to-cell-phone/content/send_text_message.m
but it looks like they have some newer functions for this too.
MATLAB doesn't respond to Ctrl-C while executing a mex implemented function such as svd. Also when MATLAB is allocating big chunk of memory it doesn't respond. A good practice is to always run your functions for small amount of data, and when all test passes run it for actual scale. When time is an issue, you would want to analyze how much time each segment of code runs as well as their rough time complexity.
Consider having multiple matlab sessions. Keep the main session window (the pretty one with all the colours, file manager, command history, workspace, editor etc.) for running stuff that you know will terminate.
Stuff that you are experimenting with, say you are messing with ode suite and you get lots of warnings: matrix singular, because you altered some parameter and didn't predict what would happen, run in a separate session:
dos('matlab -automation -r &')
You can kill that without having to restart the whole of Matlab.
One solution I adopted--for use with java code, but the concept is the same with mexFunctions, just messier--is to return a FutureValue and then loop while FutureValue.finished() or whatever returns true. The actual code executes in another thread/process. Wrapping a try,catch around that and a FutureValue.cancel() in the catch block works for me.
In the case of mex functions, you will need to return somesort of pointer (as an int) that points to a struct/object that has all the data you need (native thread handler, bool for complete etc). In the case of a built in mexFunction, your mexFunction will most likely need to call that mexFunction in the separate thread. Mex functions are just DLLs/shared objects after all.
PseudoCode
FV = mexLongProcessInAnotherThread();
try
while ~mexIsDone(FV);
java.lang.Thread.sleep(100); %pause has a memory leak
drawnow; %allow stdout/err from mex to display in command window
end
catch
mexCancel(FV);
end
Since you mentioned Task Manager, I'll guess you're using Windows. Assuming you're running your script within the editor, if you aren't opposed to quitting the editor at the same time as quitting the running program, the keyboard shortcut to end a process is:
Alt + F4
(By which I mean press the 'Alt' and 'F4' keys on your keyboard simultaneously.)
Alternatively, as mentioned in other answers,
Ctrl + C
should also work, but will not quit the editor.
if you are running your matlab on linux, you can terminate the matlab by command in linux consule.
first you should find the PID number of matlab by this code:
top
then you can use this code to kill matlab:
kill
example:
kill 58056
To add on:
you can insert a time check within a loop with intensive or possible deadlock, ie.
:
section_toc_conditionalBreakOff;
:
where within this section
if (toc > timeRequiredToBreakOff) % time conditional break off
return;
% other options may be:
% 1. display intermediate values with pause;
% 2. exit; % in some cases, extreme : kill/ quit matlab
end