Bash script slows down executed program

I have a program where I test different data sets and configurations, and a script to execute all of them.
Imagine my code as:
start = omp_get_wtime()
function()
end = omp_get_wtime()
print(end-start)
and the bash script as
for a in "${first_option[#]}"
do
for b in "${second_option[#]}"
do
for c in "${third_option[#]}"
do
printf("$a $b $c \n")
./exe $a $b $c >> logs.out
done
done
done
Now when I execute the exact same configurations by hand, I get varying results from 10 seconds down to 0.05 seconds. When I execute the script, I get the same results on the high end, but for some reason I can't get any timings lower than about 1 second: all the configurations that run in under a second by hand get written to the file as 1.001, 1.102, 0.999, etc.
Any ideas of what is going wrong?
Thanks

My suggestion would be to remove the ">> logs.out" to see what happens with the speed.
From there you can try several options:
Replace ">> log.out" with "| tee -a log.out"
Investigate stdbuf and if your code is python, look at "PYTHONUNBUFFERED=1" shell variable. See also: How to disable stdout buffer when running shell
Redirect bash print command with ">&2" (write to stderr) and move ">> log.out" or "| tee -a log.out" behind the last "done"
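A minimal sketch of that last option (assuming the same array names as your script): progress messages go to stderr so they stay on the terminal, and the program output is redirected once, after the outermost done, instead of reopening logs.out on every iteration.
for a in "${first_option[@]}"; do
    for b in "${second_option[@]}"; do
        for c in "${third_option[@]}"; do
            printf '%s %s %s\n' "$a" "$b" "$c" >&2   # progress to the terminal (stderr)
            ./exe "$a" "$b" "$c"                     # output collected by the single redirection below
        done
    done
done >> logs.out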
You can probably see what is causing the delay by using:
strace -f -t bash -c "<your bash script>" | tee /tmp/strace.log
With a little luck you will see which system call is causing the delay at the bottom of the screen. But it is a lot of information to process. Alternatively, look for the name of your "./exe" in /tmp/strace.log after tracing is done, and then look at the system calls after its invocation (the process start of ./exe) that eat the most time. It could just be many calls... Don't spend too much time on this if you don't have the stomach for it.
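If the per-second timestamps from -t are too coarse, a variation worth trying (a sketch; run_all.sh stands in for your loop script): -T appends the time spent inside each system call, which makes slow calls around each ./exe start easier to spot.
# -f follows children, -T shows the seconds spent inside each system call
strace -f -T -o /tmp/strace.log bash ./run_all.sh
# find where each run of ./exe begins, then inspect the calls around it
grep -n 'execve(".*exe"' /tmp/strace.log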

Related

Determining all the processes started with a given executable in Linux

I have this need to collect/log all the command lines that were used to start a process on my machine during the execution of a Perl script, which happens to be a test automation script. This Perl script starts the executable in question (MySQL) multiple times with various command lines, and I would like to inspect all of the command lines of those invocations. What would be the right way to do this? One possibility I see is to run something like "ps -aux | grep mysqld | grep -v grep" in a loop in a shell script and capture the results in a file, but then I would have to do some post-processing, remove duplicates, etc., and I could possibly miss some process command lines because of timing issues. Is there a better way to achieve this?
Processing the ps output can always miss some processes. It will only capture the ones currently existing. The best way would be to modify the Perl script to log each command before or after it executes it.
If that's not an option, you can get the child pids of the perl script by running:
pgrep -P $pid -a
-a gives the full process command line. $pid is the PID of the Perl script. Then process just those.
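Since pgrep -P only lists direct children, a small recursive helper can walk the whole tree if MySQL is started through intermediate shells. A sketch (the function name and the way you obtain the Perl script's PID are placeholders):
list_descendants() {
    local pid=$1
    pgrep -P "$pid" -a                    # direct children, with full command lines
    local child
    for child in $(pgrep -P "$pid"); do
        list_descendants "$child"         # recurse into each child
    done
}

list_descendants "$pid_of_perl_script"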
You could use strace to log calls to execve.
$ strace -f -o strace.out -e execve perl -e 'system("echo hello")'
hello
$ egrep ' = 0$' strace.out
11232 execve("/usr/bin/perl", ["perl", "-e", "system(\"echo hello\")"], 0x7ffc6d8e3478 /* 55 vars */) = 0
11233 execve("/bin/echo", ["echo", "hello"], 0x55f388200cf0 /* 55 vars */) = 0
Note that strace.out will also show the failed execs (where execve returned -1), hence the egrep command to find the successful ones. A successful execve call does not return, but strace records it as if it returned 0.
Be aware that this is a relatively expensive solution because it is necessary to include the -f option (follow forks), as perl will be doing the exec call from forked subprocesses. This is applied recursively, so it means that your MySQL executable will itself run under strace. But for a one-off diagnostic test it might be acceptable.
Because -f follows forks recursively, any exec calls done from your MySQL executable will also appear in strace.out, and you will have to filter those out. But the PID is shown for all calls, and if you also log fork and clone calls (i.e. strace -e execve,fork,clone), you will see both the parent and child PIDs in the form <parent_pid> clone(......) = <child_pid>, so you should then have enough information to reconstruct the process tree and decide which processes you are interested in.
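Putting that together, a sketch of the full invocation and a filter for the executable you care about (test_automation.pl and the mysqld path are stand-ins for your actual script and binary):
strace -f -o strace.out -e trace=execve,fork,clone perl test_automation.pl
# keep only the successful execve calls of the MySQL binary
grep 'execve(".*mysqld"' strace.out | grep ' = 0$'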

Stdout based on a PID (OSX)

I am processing many files using many separate scripts. To speed up the processing I placed them in the background using &; however, with this I lost the ability to keep track of what they are doing (I can't see the output).
Is there a simple way of getting the output based on a PID? I found some answers which are based on fg [job number], but I can't figure out the job number from the PID.
You might consider running your scripts from screen then return to them whenever you want:
$ screen ./script.sh
To "detach" and keep the script running press ControlA followed by ControlD
$ screen -ls
Will list your screen sessions
$ screen -r <screen pid number>
Returns to a screen session
The few commands above barely touch on the abilities that screen has, so check out the man pages about it and you might be surprised by all it can do.
A script that is backgrounded will normally just continue to write to standard output; if you run several, they will all be dumping their output intermingled with each other. Dump them to a file instead. For example, generate an output file name using $$ (current process ID) and write to that file.
outfile=process.$$.out
# ...
echo Output >$outfile
will write to, say, process.27422.out.
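Building on that, a sketch of launching several workers, each with its own log file, and then following one of them live (process_one.sh and the input file names are assumptions):
for f in input*.dat; do
    ./process_one.sh "$f" > "worker.$f.out" 2>&1 &
    echo "started PID $! for $f -> worker.$f.out"
done
wait    # block until all background workers finish

# in another terminal, follow one worker's output as it is written:
tail -f worker.input1.dat.out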
The answers by other users are right: exec &>$outfile, exec &>$outfifo, or exec &>$another_tty is what you need to do, and it is the correct way.
However, if you have already started the scripts, then there is a workaround that you can use. I had written this script to redirect the stdout/stderr of any running process to another file/terminal.
$ cat redirect_terminal
#!/bin/bash
PID=$1
stdout=$2
stderr=${3:-$2}
if [ -e "/proc/$PID" ]; then
gdb -q -n -p $PID <<EOF >/dev/null
p dup2(open("$stdout",1),1)
p dup2(open("$stderr",1),2)
detach
quit
EOF
else
echo No such PID : $PID
fi
Sample usage:
./redirect_terminal 1234 /dev/pts/16
Where,
1234 is the PID of the script process.
/dev/pts/16 is another terminal opened separately.
Note that this updated stdout/stderr will not be inherited by the already running children of that process.
Consider using GNU Parallel - it is easily installed on OSX with homebrew. Not only will it tag your output lines, but it will also keep your CPUs busy, scheduling another job immediately after the previous one finishes. You can make up your own tags with substitution parameters.
Let's say you have a set of files called file{10..20}.txt to process:
parallel --tagstring "MyTag-{}" 'echo Start; echo Processing file {}; echo Done' ::: file*txt
MyTag-file15.txt Start
MyTag-file15.txt Processing file file15.txt
MyTag-file15.txt Done
MyTag-file16.txt Start
MyTag-file16.txt Processing file file16.txt
MyTag-file16.txt Done
MyTag-file17.txt Start
MyTag-file17.txt Processing file file17.txt
MyTag-file17.txt Done
MyTag-file18.txt Start
MyTag-file18.txt Processing file file18.txt
MyTag-file18.txt Done
MyTag-file14.txt Start
MyTag-file14.txt Processing file file14.txt
MyTag-file14.txt Done
MyTag-file13.txt Start
MyTag-file13.txt Processing file file13.txt
MyTag-file13.txt Done
MyTag-file12.txt Start
MyTag-file12.txt Processing file file12.txt
MyTag-file12.txt Done
MyTag-file19.txt Start
MyTag-file19.txt Processing file file19.txt
MyTag-file19.txt Done
MyTag-file20.txt Start
MyTag-file20.txt Processing file file20.txt
MyTag-file20.txt Done
MyTag-file11.txt Start
MyTag-file11.txt Processing file file11.txt
MyTag-file11.txt Done
MyTag-file10.txt Start
MyTag-file10.txt Processing file file10.txt
MyTag-file10.txt Done
If you want the output in order, use parallel -k to keep the output order
If you want a progress report, use parallel --progress
If you want a log of when jobs started/ended, use parallel --joblog log.txt
If you want to run 32 jobs in parallel, instead of the default 1 job per CPU core, use parallel -j 32
Example joblog:
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
6 : 1461141901.514 0.005 0 38 0 0 echo Start; echo Processing file file15.txt; echo Done
7 : 1461141901.517 0.006 0 38 0 0 echo Start; echo Processing file file16.txt; echo Done
8 : 1461141901.519 0.006 0 38 0 0 echo Start; echo Processing file file17.txt; echo Done

Storing execution time of a command in a variable

I am trying to write a task-runner for the command line. No rationale. Just wanted to do it. Basically it just runs a command, stores the output in a file (instead of stdout), meanwhile prints a progress indicator of sorts on stdout, and when it's all done, prints Completed ($TIME_HERE).
Here's the code:
#!/bin/bash
task() {
    TIMEFORMAT="%E"
    COMMAND=$1
    printf "\033[0;33m${2:-$COMMAND}\033[0m\n"
    while true
    do
        for i in 1 2 3 4 5
        do
            printf '.'
            sleep 0.5
        done
        printf "\b\b\b\b\b \b\b\b\b\b"
        sleep 0.5
    done &
    WHILE=$!
    EXECTIME=$({ TIMEFORMAT='%E';time $COMMAND >log; } 2>&1)
    kill -9 $WHILE
    echo $EXECTIME
    #printf "\rCompleted (${EXECTIME}s)\n"
}
There are some unnecessarily fancy bits in there I admit. But I went through tons of StackOverflow questions to do different kinds of fancy stuff just to try it out. If it were to be applied anywhere, a lot of fat could be cut off. But it's not.
It is to be called like:
task "ping google.com -c 4" "Pinging google.com 4 times"
What it'll do is print Pinging google.com 4 times in yellow color, then on the next line, print a period. Then print another period every .5 seconds. After five periods, start from the beginning of the same line and repeat this until the command is complete. Then it's supposed to print Complete ($TIME_HERE) with (obviously) the time it took to execute the command in place of $TIME_HERE. (I've commented that part out, the current version would just print the time).
The Issue
The issue is that instead of the execution time, something very weird gets printed. It's probably something stupid I'm doing, but I don't know where the problem originates from. Here's the output.
$ sh taskrunner.sh
Pinging google.com 4 times
..0.00user 0.00system 0:03.51elapsed 0%CPU (0avgtext+0avgdata 996maxresident)k 0inputs+16outputs (0major+338minor)pagefaults 0swaps
Running COMMAND='ping google.com -c 4';EXECTIME=$({ TIMEFORMAT='%E';time $COMMAND >log; } 2>&1);echo $EXECTIME in a terminal works as expected, i.e. prints out the time (3.559s in my case).
I have checked and /bin/sh is a symlink to dash. (However that shouldn't be a problem because my script runs in /bin/bash as per the shebang on the top.)
I'm looking to learn while solving this issue so a solution with explanation will be cool. T. Hanks. :)
When you invoke a script with:
sh scriptname
the script is passed to sh (dash in your case), which will ignore the shebang line. (In a shell script, a shebang is a comment, since it starts with a #. That's not a coincidence.)
Shebang lines are only interpreted for commands started as commands, since they are interpreted by the system's command launcher, not by the shell.
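In other words, either make the file executable and run it directly, or name bash explicitly. A quick illustration:
chmod +x taskrunner.sh   # once
./taskrunner.sh          # the kernel honours the #!/bin/bash shebang
bash taskrunner.sh       # also runs under bash, regardless of the shebang
sh taskrunner.sh         # runs under dash on your system; the shebang is ignored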
By the way, your invocation of time does not correctly separate the output of the time builtin from any output the timed command might send to stderr. I think you'd be better off with:
EXECTIME=$({ TIMEFORMAT=%E; time $COMMAND >log.out 2>log.err; } 2>&1)
but that isn't sufficient. You will continue to run into the standard problem with trying to put commands into string variables, namely that it only works for very simple commands. See the Bash FAQ, or look at some of these answers (a short array-based sketch follows the list):
How to escape a variable in bash when passing to command line argument
bash quotes in variable treated different when expanded to command
Preserve argument splitting when storing command with whitespaces in variable
find command fusses on -exec arg
Using an environment variable to pass arguments to a command
(Or probably hundreds of other similar answers.)
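For completeness, a minimal sketch (not from the answer above) of the array-based approach those links describe: pass the command as separate arguments and run it with "$@", so quoting survives intact.
task() {
    local label=$1; shift                 # first argument is the human-readable label
    printf '\033[0;33m%s\033[0m\n' "$label"
    local exectime
    exectime=$({ TIMEFORMAT=%E; time "$@" >log.out 2>log.err; } 2>&1)
    printf 'Completed (%ss)\n' "$exectime"
}

task "Pinging google.com 4 times" ping google.com -c 4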

Ruby shell script realtime output

script.sh
echo First!
sleep 5
echo Second!
sleep 5
echo Third!
another_script.rb
%x[./script.sh]
I want another_script.rb to print the output of script.sh as it happens. That means printing "First!", waiting five seconds, printing "Second!", waiting five seconds, and so on.
I've read through the different ways to run an external script in Ruby, but none seem to do this. How can I fulfill my requirements?
You can always execute this in Ruby:
system("sh", "script.sh")
Note it's important to specify how to execute this unless you have a proper #!/bin/sh header as well as the execute bit enabled.

How can I have output from one named pipe fed back into another named pipe?

I'm adding some custom logging functionality to a bash script, and can't figure out why it won't take the output from one named pipe and feed it back into another named pipe.
Here is a basic version of the script (http://pastebin.com/RMt1FYPc):
#!/bin/bash
PROGNAME=$(basename $(readlink -f $0))
LOG="$PROGNAME.log"
PIPE_LOG="$PROGNAME-$$-log"
PIPE_ECHO="$PROGNAME-$$-echo"
# program output to log file and optionally echo to screen (if $1 is "-e")
log () {
    if [ "$1" = '-e' ]; then
        shift
        "$@" > $PIPE_ECHO 2>&1
    else
        "$@" > $PIPE_LOG 2>&1
    fi
}
# create named pipes if not exist
if [[ ! -p $PIPE_LOG ]]; then
    mkfifo -m 600 $PIPE_LOG
fi
if [[ ! -p $PIPE_ECHO ]]; then
    mkfifo -m 600 $PIPE_ECHO
fi
# cat pipe data to log file
while read data; do
    echo -e "$PROGNAME: $data" >> $LOG
done < $PIPE_LOG &
# cat pipe data to log file & echo output to screen
while read data; do
    echo -e "$PROGNAME: $data"
    log echo $data                        # this doesn't work
    echo -e $data > $PIPE_LOG 2>&1        # and neither does this
    echo -e "$PROGNAME: $data" >> $LOG    # so I have to do this
done < $PIPE_ECHO &
# clean up temp files & pipes
clean_up () {
    # remove named pipes
    rm -f $PIPE_LOG
    rm -f $PIPE_ECHO
}
#execute "clean_up" on exit
trap "clean_up" EXIT
log echo "Log File Only"
log -e echo "Echo & Log File"
I thought the two marked commands (the log echo $data line and the echo -e $data > $PIPE_LOG line) would take the $data from $PIPE_ECHO and output it to $PIPE_LOG. But it doesn't work. Instead I have to send that output directly to the log file, without going through $PIPE_LOG.
Why is this not working as I expect?
EDIT: I changed the shebang to "bash". The problem is the same, though.
SOLUTION: A.H.'s answer helped me understand that I wasn't using named pipes correctly. I have since solved my problem by not even using named pipes. That solution is here: http://pastebin.com/VFLjZpC3
It seems to me you do not understand what a named pipe really is. A named pipe is not one stream like normal pipes. It is a series of normal pipes, because a named pipe can be closed, and a close on the producer side might be shown as a close on the consumer side.
The "might be" part is this: the consumer will read data until there is no more data. "No more data" means that, at the time of the read call, no producer has the named pipe open. This means that multiple producers can feed one consumer only when there is no point in time without at least one producer. Think of it as a door which closes automatically: if there is a steady stream of people keeping the door open, either by handing the doorknob to the next one or by squeezing multiple people through it at the same time, the door stays open. But once the door is closed, it stays closed.
A little demonstration should make the difference a little clearer:
Open three shells. First shell:
1> mkfifo xxx
1> cat xxx
no output is shown because cat has opened the named pipe and is waiting for data.
Second shell:
2> cat > xxx
no output, because this cat is a producer which keeps the named pipe open until we tell it to close explicitly.
Third shell:
3> echo Hello > xxx
3>
This producer immediately returns.
First shell:
Hello
The consumer received the data, wrote it and, since one producer still keeps the door open, continues to wait.
Third shell
3> echo World > xxx
3>
First shell:
World
The consumer received the data, wrote it and, since one producer still keeps the door open, continues to wait.
Second Shell: write into the cat > xxx window:
And good bye!
(control-d key)
2>
First shell
And good bye!
1>
The ^D key closed the last producer, the cat > xxx, and hence the consumer exits also.
For your case this means:
Your log function will try to open and close the pipes multiple times. Not a good idea.
Both your while loops exit earlier than you think. (Check this with ( while ... done < $PIPE_X; echo FINISHED ) &.)
Depending on the scheduling of your various producers and consumers, the door might slam shut sometimes and sometimes not: you have a race condition built in. (For testing you can add a sleep 1 at the end of the log function.)
Your "testcases" only try each possibility once; try to use them multiple times (you will block, especially with the sleeps), because your producer might not find any consumer.
So I can explain the problems in your code but I cannot tell you a solution because it is unclear what the edges of your requirements are.
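That said, one common way around the automatic-door problem described above (a minimal sketch under my own assumptions, not taken from the question's script) is to keep a file descriptor open on the FIFO for the whole lifetime of the script, so the reader loop never sees end-of-file between individual writes:
#!/bin/bash
# sketch: hold the write end of a FIFO open so a single background reader survives many writes
PIPE=/tmp/demo-$$-log
mkfifo -m 600 "$PIPE"

# background reader: keeps running until every writer has closed the FIFO
while IFS= read -r line; do
    printf '%s\n' "logger: $line" >> demo.log
done < "$PIPE" &

exec 3> "$PIPE"            # hold the write end open on fd 3

echo "first message"  >&3
echo "second message" >&3

exec 3>&-                  # close fd 3: the reader sees EOF and exits
wait                       # wait for the background reader to finish
rm -f "$PIPE"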
It seems the problem is in the "cat pipe data to log file" part.
Let's see: you use a "&" to put the loop in the background; I guess you mean it to run in parallel with the second loop.
But the problem is that you don't even need the "&", because as soon as no more data is available in the fifo, the while..read stops (still, you've got to have some data at first for the first read to work). The next read doesn't hang if no more data is available (which would pose another problem: how does your program stop?).
I guess the while read checks if more data is available in the file before doing the read and stops if it's not the case.
You can check with this sample:
mkfifo foo
while read data; do echo $data; done < foo
This script will hang, until you write anything from another shell (or bg the first one). But it ends as soon as a read works.
Edit:
I've tested on RHEL 6.2 and it works as you say (i.e., badly!).
The problem is that, after running the script (let's say script "a"), you've got an "a" process remaining. So, yes, in some way the script hangs as I wrote before (not as stupid an answer as I thought then :) ). Except if you write only one log (be it log-file-only or echo); in that case it works.
(It's the read loop from PIPE_ECHO that hangs when writing to PIPE_LOG and leaves a process running each time).
I've added a few debug messages, and here is what I see:
only one line is read from PIPE_LOG and after that, the loop ends
then a second message is sent to the PIPE_LOG (after been received from the PIPE_ECHO), but the process no longer reads from PIPE_LOG => the write hangs.
When you ls -l /proc/[pid]/fd, you can see that the fifo is still open (but deleted).
In fact, the script exits and removes the fifos, but there is still one process using it.
If you don't remove the log fifo at the cleanup and cat it, it will free the hanging process.
Hope it will help...
