perl script performance misbehave while calling another script - performance

I have a perlscript which calls another perl script which functions normal database queries.
But sometimes when the inner script calling is so frequent then it will not inserting proper values in database, in debug I found that it was not able to fetch all the records from database which is most important before inserting the new calculated values.
I used system() method to call child script. this will wait till the child process end but then how it is possible to misbehave while the calling of child is frequently. In normal scenario main script will hold for 30 seconds which results proper executions of child script.
can anyone have any suggestion for debugging my code or any solution for this kind of issue.

Running Perl scripts for every DB insert is very ineffective.
You should notice that Perl will first compile called script (every time it has been called) - so it creates significant overhead.
Much better to use OOP and to put DB handling code into separate class - it will be compiled only once - at run time. Or you can use modules and put DB code into functions, they will be compiled only once too. Look at "use" pragma.
For example: simple programm with module
main_file.pl
use strict;
use DB_code;
DB_code::insert($data);
DB_code.pm
package DB_code;
use strict;
sub insert {
my $data = shift;
print "Your data has been inserted!";
}
1;

Related

Passing multiple arguments to external programs in a Pipeline

I'm trying to build a pipeline for NGS data.
I made a small example pipeline for passing commands to shell. Example pipeline has two scripts thats called from shell that just concatenates(sumtool.py) and multiplies(multool.py) values in many dataframes (10 in this case). My wrapper(wrapper.py) handles with the input and passes the commands that runs the scripts in order. Here is the relevant part of the code from the wrapper:
def run_cmd(orig_func):
#wraps(orig_func)
def wrapper(*args,**kwargs):
cmdls = orig_func(*args,**kwargs)
cmdc = ' '.join(str(arg) for arg in cmdls)
cmd = cmdc.replace(',','')
Popen(cmd,shell=True).wait()
return wrapper
#run_cmd
def runsumtool(*args):
return args
for file in getcsv():
runsumtool('python3','sumtool.py','--infile={}'.format(file),'--outfile={}'.format(dirlist[1]))
This works alright but I want to be able to pass all the commands at once for the first script with all the dataframes wait for it to finish and then run the second script with all commands at once for every dataframe. Since Popen().wait() waits for each command it takes way longer.
I tried to incorporate luigi for a solution but I wasn't successful running external programs or trying to pass multiple I/O's with luigi. Any tip on that is appreciated.
Another solution I'm imagining is passing the samples individually all at once but I'm not sure how to put it in python(or any other language really). This would also solve the I/O problem with luigi.
thanks
Note1: This is a small example pipeline I build. My main purpose is to call programs like bwa, picard in a pipeline ... which i cannot import.
Note2: I'm using Popen from subprocess already. You can find it between lines 4 and 5.

Bash scripting, react immediately to signals

I have a shell script which runs very large simulation binaries. This becomes problematic when I want to request some output of variables in the script. For instance, when I run 10 large simulations, I want to be able to print which iteration I am on without having to wait a minute or two for the current simulation to terminate.
Currently, I am using the trap command. However, the script does not react immediately to signals but will only execute the binded function when the current iteration terminates. I will post the code if anyone needs it.
You should start threads for each large thing you're going to run. Have those threads dump results somewhere, then you have your main method free waiting to interrogate the results on the fly.

How to exit the entire call stack of shell scripts if a child script fails?

I have set of shell scripts, around 20-30, that are used for performing one big task as a whole. The wrapper script calls mainly the high-level task scripts but internally those scripts calls other scripts like and the flow goes on in a nested manner.
I want to know if there is a way to exit the entire call stack if some critical script fails. Normally I run exit 125 command and then catch that in caller script and so on but I feel that little complicated. Is there a special exit that will abort the entire call stack? I don't want to use kill command to abort the wrapper script process.
You could have your main wrapper script start every sub-script in its own process group, using e.g. chpst -P.
Then the sub-scripts, as well as their children, could kill their own process group by sending it a KILL signal, and this would not affect the main wrapper script.
I think this would be a bad idea and what you're currently doing is the good way, though (because it makes the code easier to follow).

redis: EVAL and the TIME

I like the Lua-scripting for redis but i have a big problem with TIME.
I store events in a SortedSet.
The score is the time, so that in my application i can view all events in given time-window.
redis.call('zadd', myEventsSet, TIME, EventID);
Ok, but this is not working - i can not access the TIME (Servertime).
Is there any way to get a time from the Server without passing it as an argument to my lua-script? Or is passing the time as argument the best way to do it?
This is explicitly forbidden (as far as I remember). The reasoning behind this is that your lua functions must be deterministic and depend only on their arguments. What if this Lua call gets replicated to a slave with different system time?
Edit (by Linus G Thiel): This is correct. From the redis EVAL docs:
Scripts as pure functions
A very important part of scripting is writing scripts that are pure functions. Scripts executed in a Redis instance are replicated on slaves by sending the script -- not the resulting commands.
[...]
In order to enforce this behavior in scripts Redis does the following:
Lua does not export commands to access the system time or other external state.
Redis will block the script with an error if a script calls a Redis command able to alter the data set after a Redis random command like RANDOMKEY, SRANDMEMBER, TIME. This means that if a script is read-only and does not modify the data set it is free to call those commands. Note that a random command does not necessarily mean a command that uses random numbers: any non-deterministic command is considered a random command (the best example in this regard is the TIME command).
There is a wealth of information on why this is, how to deal with this in different scenarios, and what Lua libraries are available to scripts. I recommend you read the whole documentation!

Count number of executions of batch-script

This is my problem, I've got a batch-script that I can't modify (lets call it foo) and I would like to count how many times/day this script is executed - to keep track of that data.
Preferably, I would like to write the number of executions with date and exit-code to some kind of log file.
So my question is if this is possible and in that case - how? To create a batch-script/something that works in the background and writes every execution of foo to a log.
(I know this would be easy if I could modify foo but I can't. Also, everything is running on WinXP machines.)
You could write a wrapper script that does the logging and calls the existing script. Then use the wrapper in place of the original script
Consider writing a program that interrogates the Task Manager.
See http://www.netomatix.com/ProcDiagnostics.aspx
You could, for example, write a simple Console app which runs on a timer; every 5 seconds it checks that your foo application process exists. If it finds that it does, it assumes that find as the start time of the application; if it doesn't find it, it assumes the application has now closed and logs that information. It wouldn't be accurate to the second by any means, but would give you a rough approximation of when the thing is running and closing.
You might be able to configure Process Monitor
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx to capture the information you require

Resources