I'm writing a script that should create a rotating series of debug logs as it runs over a period of time. My current problem is that when I ran it with -vx attached, I can see that it stops during the actual debugging process and doesn't proceed through the loop. This is reflective of how the command would run normally. So I thought to continue the process, I want to run with &.
The problem is that this will become exponentially messier over time (since none of the processes are stopping). So what I'm looking for is a way to parse the PID output of the & command into a variable, and then I will add a kill command at the start of the loop pointed at that variable.
Figuring out how to parse the output of commands will also be useful in the other part of my project, which is to terminate the while loop based on a particular % free in a df -h for a select partition
No parsing needed. The PID of the most recent background process is stored in $!.
command & # run command in background
pid=$! # save pid as $pid
...
kill $pid # kill command
Related
I came across this shell script
bash# while true; do
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %vmtouch
done
and wonder how does the kill %vmtouch portion work?
I normally pass a pid to kill a process but how does %vmtouch resolve to a pid?
I tried to run portions of script seperately but I got
-bash: kill: %vmtouch: no such job error.
%something is not a general shell script feature, but syntax used by the kill, fg and bg builtin commands to identify jobs. It searches the list of the shell's active jobs for the given string, and then signals that.
Here's man bash searching for /jobspec:
The character % introduces a job specification (jobspec).
Job number n may be referred to as %n. A job may also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. [...]
So if you do:
sleep 30 &
cat &
You can use things like %sleep or %sl to conveniently refer to the last one without having to find or remember its pid or job number.
You should look at the Job control section of the man bash page. The character % introduces a job specification (jobspec). Ideally when you have started this background job, you should have seen an entry in the terminal
[1] 25647
where 25647 is some random number I used. The line above means that the process id of the last backgrounded job (on a pipeline, the process id of the last process) is using job number as 1.
The way you are using the job spec is wrong in your case as it does not take process name of the background job. The last backgrounded is referred to as %1, so ideally your kill command should have been written as below, which is the same as writing kill 25647
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %1
But that said, instead of relying the jobspec ids, you can access the process id of the background job which is stored in a special shell variable $! which you can use as
vmtouch -m whatever -l *head* & vmtouch_pid=$!
sleep 10m
kill "$vmtouch_pid"
See Job Control Basics from the GNU bash man page.
How do I start multiple processes in bash and time how long they take?
From this question I know how to start multiple processes in a bash script but using time script.sh doesn't work because the processes spawned end after the script ends.
I tried using wait but that didn't change anything.
Here is the script in its entirety:
for i in `seq $1`
do
( ./client & )
done
wait # This doesn't seem to change anything
I'm trying to get the total time for all the processes to finish and not the time for each process.
Why the parentheses around client invocation? That's going to run the command in a subshell. Since the background job isn't in the top level shell, that's why the wait is ineffective (there's no jobs in this shell to wait for).
Then you can add time back inside the for loop and it should work.
What I usually do is pause my script, run it in the background and then disown it like
./script
^Z
bg
disown
However, I would like to be able to cancel my script at any time. If I have a script that runs indefinitely, I would like to be able to cancel it after a few hours or a day or whenever I feel like cancelling it.
Since you are having a bit of trouble following along, let's see if we can keep it simple for you. (this presumes you can write to /tmp, change as required). Let's start your script in the background and create a PID file containing the PID of its process.
$ ./script & echo $! > /tmp/scriptPID
You can check the contents of /tmp/scriptPID
$ cat /tmp/scriptPID
######
Where ###### is the PID number of the running ./script process. You can further confirm with pidof script (which will return the same ######). You can use ps aux | grep script to view the number as well.
When you are ready to kill the ./script process, you simply pass the number (e.g. ######) to kill. You can do that directly with:
$ kill $(</tmp/scriptPID)
(or with the other methods listed in my comment)
You can add rm /tmp/scriptPID to remove the pid file after killing the process.
Look things over and let me know if you have any further questions.
Ive got a script that takes a quite a long time to run, as it has to handle many thousands of files. I want to make this script as fool proof as possible. To this end, I want to check if the user ran the script using nohup and '&'. E.x.
me#myHost:/home/me/bin $ nohup doAlotOfStuff.sh &. I want to make 100% sure the script was run with nohup and '&', because its a very painful recovery process if the script dies in the middle for whatever reason.
How can I check those two key paramaters inside the script itself? and if they are missing, how can I stop the script before it gets any farther, and complain to the user that they ran the script wrong? Better yet, is there way I can force the script to run in nohup &?
Edit: the server enviornment is AIX 7.1
The ps utility can get the process state. The process state code will contain the character + when running in foreground. Absence of + means code is running in background.
However, it will be hard to tell whether the background script was invoked using nohup. It's also almost impossible to rely on the presence of nohup.out as output can be redirected by user elsewhere at will.
There are 2 ways to accomplish what you want to do. Either bail out and warn the user or automatically restart the script in background.
#!/bin/bash
local mypid=$$
if [[ $(ps -o stat= -p $mypid) =~ "+" ]]; then
echo Running in foreground.
exec nohup $0 "$#" &
exit
fi
# the rest of the script
...
In this code, if the process has a state code +, it will print a warning then restart the process in background. If the process was started in the background, it will just proceed to the rest of the code.
If you prefer to bailout and just warn the user, you can remove the exec line. Note that the exit is not needed after exec. I left it there just in case you choose to remove the exec line.
One good way to find if a script is logging to nohup, is to first check that the nohup.out exists, and then to echo to it and ensure that you can read it there. For example:
echo "complextag"
if ( $(cat nohup.out | grep "complextag" ) != "complextag" );then
# various commands complaining to the user, then exiting
fi
This works because if the script's stdout is going to nohup.out, where they should be going (or whatever out file you specified), then when you echo that phrase, it should be appended to the file nohup.out. If it doesn't appear there, then the script was nut run using nohup and you can scold them, perhaps by using a wall command on a temporary broadcast file. (if you want me to elaborate on that I can).
As for being run in the background, if it's not running you should know by checking nohup.
I have a bash script with a loop that calls a hard calculation routine every iteration. I use the results from every calculation as input to the next. I need make bash stop the script reading until every calculation is finished.
for i in $(cat calculation-list.txt)
do
./calculation
(other commands)
done
I know the sleep program, and i used to use it, but now the time of the calculations varies greatly.
Thanks for any help you can give.
P.s>
The "./calculation" is another program, and a subprocess is opened. Then the script passes instantly to next step, but I get an error in the calculation because the last is not finished yet.
If your calculation daemon will work with a precreated empty logfile, then the inotify-tools package might serve:
touch $logfile
inotifywait -qqe close $logfile & ipid=$!
./calculation
wait $ipid
(edit: stripped a stray semicolon)
if it closes the file just once.
If it's doing an open/write/close loop, perhaps you can mod the daemon process to wrap some other filesystem event around the execution? `
#!/bin/sh
# Uglier, but handles logfile being closed multiple times before exit:
# Have the ./calculation start this shell script, perhaps by substituting
# this for the program it's starting
trap 'echo >closed-on-calculation-exit' 0 1 2 3 15
./real-calculation-daemon-program
Well, guys, I've solved my problem with a different approach. When the calculation is finished a logfile is created. I wrote then a simple until loop with a sleep command. Although this is very ugly, it works for me and it's enough.
for i in $(cat calculation-list.txt)
do
(calculations routine)
until [[ -f $logfile ]]; do
sleep 60
done
(other commands)
done
Easy. Get the process ID (PID) via some awk magic and then use wait too wait for that PID to end. Here are the details on wait from the advanced Bash scripting guide:
Suspend script execution until all jobs running in background have
terminated, or until the job number or process ID specified as an
option terminates. Returns the exit status of waited-for command.
You may use the wait command to prevent a script from exiting before a
background job finishes executing (this would create a dreaded orphan
process).
And using it within your code should work like this:
for i in $(cat calculation-list.txt)
do
./calculation >/dev/null 2>&1 & CALCULATION_PID=(`jobs -l | awk '{print $2}'`);
wait ${CALCULATION_PID}
(other commands)
done