Assigning Process IDs to a variable in a shell script - shell

I want to assign all process ids to a variable.
For example the result for
pgrep abc
29845
29846
I want to assign these 2 ids to a variable like this
a='29845 29846'
The variable a should contain the 2 process ids separated by a space.
The whole purpose of this is to kill all the process ids
Thanks

Something like this:
cat file
29845
29846
var=$(awk '{printf "%s ",$1}' file)
echo $var
29845 29846
You can also pipe pgrep straight into awk instead of reading from a file.

I tested the commands by starting sleep 320 & three times.
You can assign the output of a command like this:
procs=$(pgrep sleep | tr '\n' ' ')
When you want to kill the processes, consider
pgrep sleep | xargs kill -9
or
pkill sleep
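Putting the pieces together, here is a minimal runnable sketch, using sleep as a stand-in for the asker's abc:

```shell
#!/bin/bash
# Start two disposable processes to stand in for "abc"
sleep 300 & sleep 300 &

# Capture all matching PIDs into one space-separated variable
a=$(pgrep -x sleep | tr '\n' ' ')
echo "PIDs: $a"

# Kill them all; $a is deliberately left unquoted so each PID
# becomes a separate argument to kill
[ -n "$a" ] && kill $a
```

Note that `pgrep -x sleep` matches any sleep process owned by any user, so in a real script you would match on your own process name instead.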


Efficiently find PIDs of many processes started by services

I have a file with many service names, some of them are running, some of them aren't.
foo.service
bar.service
baz.service
I would like to find an efficient way to get the PIDs of the running processes started by the services (for the not running ones a 0, -1 or empty results are valid).
Desired output example:
foo.service:8484
bar.service:
baz.service:9447
(bar.service isn't running).
So far I've managed to do the following: (1)
cat t.txt | xargs -I {} systemctl status {} | grep 'Main PID' \
| awk '{print $3}'
With the following output:
8484
9447
But I can't tell which service every PID belongs to.
(I'm not bound to use xargs, grep, or awk; just looking for the most efficient way.)
So far I've managed to do the following: (2)
for f in `cat t.txt`; do
v=`systemctl status $f | grep 'Main PID:'`;
echo "$f:`echo $v | awk '{print \$3}'`";
done;
-- this gives me my desired result. Is it efficient enough?
I ran into a similar problem and found a leaner solution:
systemctl show --property MainPID --value $SERVICE
returns just the PID of the service, so your example can be simplified down to
for f in `cat t.txt`; do
echo "$f:`systemctl show --property MainPID --value $f`";
done
You could also do:
while read -r line; do
statuspid="$(sudo service $line status | grep -oP '(?<=(process|pid)\s)[0-9]+')"
appendline=""
[[ -n $statuspid ]] && appendline="${line}:${statuspid}" || appendline="${line}:"
echo "$appendline" >> services-pids.txt
done < services.txt
To use within a variable, you could also have an associative array:
declare -A servicearray=()
while read -r line; do
statuspid="$(sudo service $line status | grep -oP '(?<=(process|pid)\s)[0-9]+')"
[[ -n $statuspid ]] && servicearray[$line]="$statuspid"
done < services.txt
# Echo output of array to command line
for i in "${!servicearray[@]}"; do # Iterating over the keys of the associative array
# Note key-value syntax
echo "service: $i | pid: ${servicearray[$i]}"
done
Making it more efficient:
To list all processes with their execution commands and PIDs. This may give us more than one PID per command, which might be useful:
ps -eo pid,comm
So:
psidcommand=$(ps -eo pid,comm)
while read -r line; do
# Get all PIDs for commands matching $line
# (double quotes around the pattern so $line actually expands)
statuspids=$(echo "$psidcommand" | grep -oP "[0-9]+(?=\s+$line)" | tr '\n' ' ')
statuspids=${statuspids% } # drop the trailing space
# Note that ${statuspids// /,} replaces spaces with commas
[[ -n $statuspids ]] && appendline="${line}:${statuspids// /,}" || appendline="${line}:"
echo "$appendline" >> services-pids.txt
done < services.txt
OUTPUT:
kworker:5,23,28,33,198,405,513,1247,21171,22004,23749,24055
If you're confident your file has the full name of the process, you can replace the:
grep -oP '[0-9]+(?=\s$line)'
with
grep -oP '[0-9]+(?=\s$line)$' # Note the extra "$" at the end of the regex
to make sure it's an exact match (in the grep without trailing $, line "mys" would match with "mysql"; in the grep with trailing $, it would not, and would only match "mysql").
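A quick way to see the difference; the sample ps-style lines here are made up for illustration, and -P requires GNU grep with PCRE support:

```shell
printf '%s\n' ' 101 mys' ' 202 mysql' > /tmp/pslines.txt

# Unanchored: searching for "mys" also matches the "mysql" line
grep -oP '[0-9]+(?=\s+mys)' /tmp/pslines.txt      # prints 101 and 202

# Anchored with $: only the exact command name "mys" matches
grep -oP '[0-9]+(?=\s+mys$)' /tmp/pslines.txt     # prints 101 only
```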
Building up on Yorik.sar's answer, you first want to get the MainPID of a server like so:
for SERVICE in ...<service names>...
do
MAIN_PID=`systemctl show --property MainPID --value $SERVICE`
if test ${MAIN_PID} != 0
then
ALL_PIDS=`pgrep -g $MAIN_PID`
...
fi
done
So using systemctl gives you the PID of the main process controlled by your daemon. Then the pgrep gives you the daemon and a list of all the PIDs of the processes that daemon started.
Note: if the processes are user processes, you have to use the --user on the systemctl command line for things to work:
MAIN_PID=`systemctl --user show --property MainPID --value $SERVICE`
Now you have the data you are interested in the MAIN_PID and ALL_PIDS variables, so you can print the results like so:
if test -n "${ALL_PIDS}"
then
echo "${SERVICE}: ${ALL_PIDS}"
fi

integer expression expected [bash does not understand .]

I made a small script to kill PID's if they exceed expected cpu usage. It works, but there is a small problem.
Script:
while [ 1 ];
do
cpuUse=$(ps -eo %cpu | sort -nr | head -1)
cpuMax=80
PID=$(ps -eo %cpu,pid | sort -nr | head -1 | cut -c 6-20)
if [ $cpuUse -gt $cpuMax ] ; then
kill -9 "$PID"
echo Killed PID $PID at the usage of $cpuUse out of $cpuMax
fi
exit 0
sleep 1;
done
It works if the integer is three digits long but fails if it drops to two and displays this:
./kill.sh: line 7: [: 51.3: integer expression expected
My question here is, how do I make bash understand the divider so it can kill processes under three digits.
You are probably getting leading space in that variable. Try piping with tr to strip all spaces first:
cpuUse=$(ps -eo %cpu | sort -nr | head -1 | tr -d '[[:space:]]')
Remove text after dot from cpuUse variable:
cpuUse="${cpuUse%%.*}"
Also better to use quotes in if condition:
if [ "$cpuUse" -gt "$cpuMax" ] ; then
OR better use arithmetic operator (( and )):
if (( cpuUse > cpuMax )); then
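A minimal, self-contained illustration of the fix; the value 51.3 stands in for whatever ps reports:

```shell
#!/bin/bash
cpuUse="51.3"            # what ps might report
cpuUse="${cpuUse%%.*}"   # strip everything from the first dot -> "51"
cpuMax=80

if (( cpuUse > cpuMax )); then
    echo "over limit"
else
    echo "under limit"   # this branch runs: 51 is not greater than 80
fi
```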
As you see, bash doesn't grok non-integer numbers. You need to eliminate the decimal point and the following digits from $cpuUse before doing the comparison:
cpuUse=$(sed 's/\..*//' <<<"$cpuUse")
However, this is really a job for awk. It will simplify much of what you're doing. Whenever you find yourself with greps of greps, or head and then cuts, you should be dealing with awk. Awk can easily combine these multiple piped seds, greps, cuts, heads, into a single command.
By the way, the correct ps command is:
$ ps -eo %cpu="",pid=""
Using the ="" will eliminate the heading and simply give you the CPU and PID.
Looking at your program, there's no real need to sort. You're simply looking for all processes above that $cpuMax threshold:
ps -eo %cpu="",pid="" | awk '$1 > 80 {print $2}'
That prints out the PIDs which are over your threshold. Awk automatically loops through your entire input line by line. Awk also automatically divides each line into columns and assigns each one a variable: the first is $1, the second is $2, etc. You can change the field separator with the -F parameter.
The above awk says look for all lines where the first column is above 80%, (the CPU usage) and print out the second column (the pid).
If you want some flexibility and be able to pass in different $cpuMax, you can use the -v parameter to set Awk variables:
ps -eo %cpu="",pid="" | awk -vcpuMax=$cpuMax '$1 > cpuMax {print $2}'
Now you can capture the output of this command and kill all of those processes:
pid=$(ps -eo %cpu="",pid="" | awk -vcpuMax=$cpuMax '$1 > cpuMax {print $2}')
if [[ -n $pid ]]
then
kill -9 $pid
echo "Killed the following processes:" $pid
fi

setting variables in bash script

I have the following command:
echo "exec [loc_ver].[dbo].[sp_RptEmpCheckInOutMissingTime_2]" |
/home/mdland_tool/common/bin/run_isql_prod_rpt_2.sh |
grep "^| Row_2" |
awk '{print $15 }'
which only works with echo at the front. I tried to set this line into a variable. I've tried quotation marks, parentheses, and backticks, with no luck.
May anyone tell me the correct syntax for setting this into a variable?
If you want more columns store in an array you should use this syntax (it is also good if you have only one result):
#!/bin/bash
result=( $( echo "exec [loc_ver].[dbo].[sp_RptEmpCheckInOutMissingTime_2]" |
/home/mdland_tool/common/bin/run_isql_prod_rpt_2.sh |
grep "^| Row_2" |
awk '{print $15 }' ) )
result=$(echo "exec [loc_ver].[dbo].[sp_RptEmpCheckInOutMissingTime_2]" | /home/mdland_tool/common/bin/run_isql_prod_rpt_2.sh | grep "^| Row_2" | awk '{print $15}')
Since you asked for both meanings of your question:
First, imagine a simpler case:
echo asdf
If you want to execute this command and store the result somewhere, you do the following:
$(echo asdf)
For example:
variable=$(echo asdf)
# or
if [ "$(echo asdf)" == "asdf" ]; then echo good; fi
So generally, $(command) executes command and returns the output.
If you want to store the text of the command itself, you can, well, just do that:
variable='echo asdf'
# or
variable="echo asdf"
# or
variable=echo\ asdf
Different formats depending on the content of your command. So now once you have the variable storing the command, you can simply execute the command with the $(command) syntax. However, pay attention that the command itself is in a variable. Therefore, you would execute the command by:
$($variable)
As a complete example, let's combine both:
command="echo asdf"
result=$($command)
echo $result

Why is my code not working as I want it to?

I have this code:
total=0;
ps -u $(whoami) --no-headers | awk '{print $1}' | while read line;
do vrednost=$(pmap $line | tail -n1 | column -t | cut -d" " -f3 | tr "K" " ");
total=$(( vrednost + total ))
echo $total
done
echo total: $total
As you can see, my code sums the memory usage of all my processes. When I echo $total inside the while loop, it has the right value, but after the loop (echo total: $total) it is zero again.
BASH FAQ #24: "I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?"
#!/bin/bash
while read ...
do
...
done < <(ps ...)
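A self-contained demonstration of why the process-substitution form keeps the variable; fixed numbers stand in here for the ps output:

```shell
#!/bin/bash
total=0
while read -r n; do
    total=$(( total + n ))
done < <(printf '%s\n' 10 20 30)   # the loop runs in the current shell

echo "total: $total"   # prints "total: 60", not "total: 0"
```

Had the same loop been written as `printf ... | while read ...`, the loop body would run in a subshell and total would still be 0 afterwards.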
Okay, pick and choose. You can either do it in BASH or AWK, but don't do both. You've seen a BASH example, here's an AWK example:
ps -e -o user -o vsz | awk -v USER="$(whoami)" '
BEGIN {TOTAL = 0}
{
if ($1 == USER) {
TOTAL += $2
}
}
END {print "Total is " TOTAL}
'
Awk is like a programming language that assumes a loop (like perl -n) and processes each line in the file. Each field (normally separated by whitespace) is given a $ variable. The first is $1, the second is $2, etc.
The -v option allows me to define an awk variable (in this case USER) before I run awk.
The BEGIN line is what I want to do before I run my awk script. In this case, initialize TOTAL to zero. (NOTE: This really isn't necessary since undefined variables automatically are given a value of zero). The END line is what I want to do afterwards. In this case, print out my total.
So, if the first field ($1) is equal to my user, I'll add the second field (the vsize) to my total.
Awk actions are surrounded by {...}, and the whole program usually has single quotes around it to prevent shell interpolation of $1, etc.
Try this
#!/bin/bash
total=0;
for line in `ps -u $(whoami) --no-headers | awk '{print $1}'`;
do
vrednost=$(pmap $line | tail -n1 | column -t | cut -d" " -f3 | tr "K" " ");
total=$(( $vrednost + $total ))
echo $total
done
echo "total: $total"
Ignacio's answer is fine, but process substitution is not portable. And there is a simpler solution. You need to echo the total in the same subshell in which it is calculated:
... | { while read line; do ...; done; echo total: $total; }
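The same idea as a runnable sketch, with fixed numbers in place of the pmap output:

```shell
#!/bin/bash
printf '%s\n' 10 20 30 | {
    total=0
    while read -r n; do
        total=$(( total + n ))
    done
    # still inside the same subshell, so total is visible here
    echo "total: $total"   # prints "total: 60"
}
```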
Let's cut down on the number of extra processes you need :)
declare -i total=0
for size in $( ps -u $(whoami) --no-header -o vsz ); do
total+=$size
done
echo $total
First, use various options for ps to generate the desired list of process sizes, in kilobytes. Iterate over that list using a bash for-loop, keeping a running total in a parameter declared with the 'integer' attribute for easy arithmetic. The desired sum is now in total, ready for whatever use you need. The sum includes the memory used by the ps process itself.
Using while (Dennis' suggestion) and avoiding process substitution (William's suggestion):
ps -u $(whoami) --no-header -o vsz | {
while read -r var; do
((total+=$var))
done
echo $total
}
(For real one-liner, here's a dc command that I borrowed from https://stackoverflow.com/a/453290/1126841:
ps -u $(whoami) --no-header -o vsz | dc -f - -e '[+z1<r]srz1<rp'
This sum includes the memory used by the ps and dc commands themselves.)

How to get the PID of a process in a pipeline

Consider the following simplified example:
my_prog|awk '...' > output.csv &
my_pid="$!" #Gives the PID for awk instead of for my_prog
sleep 10
kill $my_pid #my_prog still has data in its buffer that awk never saw. Data is lost!
In bash, $my_pid points to the PID for awk. However, I need the PID for my_prog. If I kill awk, my_prog does not know to flush its output buffer and data is lost. So, how would one obtain the PID for my_prog? Note that ps aux|grep my_prog will not work since there may be several my_prog's going.
NOTE: changed cat to awk '...' to help clarify what I need.
Just had the same issue. My solution:
process_1 | process_2 &
PID_OF_PROCESS_2=$!
PID_OF_PROCESS_1=`jobs -p`
Just make sure process_1 is the first background process. Otherwise, you need to parse the full output of jobs -l.
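A quick check of this behaviour, using sleep for both processes:

```shell
#!/bin/bash
sleep 2 | sleep 2 &
pid_of_process_2=$!            # PID of the last command in the pipe
pid_of_process_1=$(jobs -p)    # PID of the job's process-group leader,
                               # i.e. the first command in the pipe
echo "first: $pid_of_process_1  second: $pid_of_process_2"
kill "$pid_of_process_1" "$pid_of_process_2" 2>/dev/null
```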
I was able to solve it with explicitly naming the pipe using mkfifo.
Step 1: mkfifo capture.
Step 2: Run this script
my_prog > capture &
my_pid="$!" #Now, I have the PID for my_prog!
awk '...' capture > out.csv &
sleep 10
kill $my_pid #kill my_prog
wait #wait for awk to finish.
I don't like the management of having a mkfifo. Hopefully someone has an easier solution.
Here is a solution without wrappers or temporary files. This only works for a background pipeline whose output is captured away from stdout of the containing script, as in your case. Suppose you want to do:
cmd1 | cmd2 | cmd3 >pipe_out &
# do something with PID of cmd2
If only bash could provide ${PIPEPID[n]}!! The replacement "hack" that I found is the following:
PID=$( { cmd1 | { cmd2 0<&4 & echo $! >&3 ; } 4<&0 | cmd3 >pipe_out & } 3>&1 | head -1 )
If needed, you can also close the fd 3 (for cmd*) and fd 4 (for cmd2) with 3>&- and 4<&-, respectively. If you do that, for cmd2 make sure you close fd 4 only after you redirect fd 0 from it.
Add a shell wrapper around your command and capture the pid. For my example I use iostat.
#!/bin/sh
echo $$ > /tmp/my.pid
exec iostat 1
Exec replaces the shell with the new process preserving the pid.
test.sh | grep avg
While that runs:
$ cat /tmp/my.pid
22754
$ ps -ef | grep iostat
userid 22754 4058 0 12:33 pts/12 00:00:00 iostat 1
So you can:
sleep 10
kill `cat /tmp/my.pid`
Is that more elegant?
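You can verify that exec preserves the PID with a sketch like this; sleep stands in for iostat, and the /tmp paths are just for the demo:

```shell
#!/bin/sh
# Same trick as test.sh above, but with sleep instead of iostat
cat > /tmp/wrap.sh <<'EOF'
#!/bin/sh
echo $$ > /tmp/my.pid
exec sleep 30
EOF
chmod +x /tmp/wrap.sh

/tmp/wrap.sh &
wrapper_pid=$!
sleep 1                  # give the wrapper time to write the pid file

# Thanks to exec, the sleep process carries the wrapper's PID,
# so /tmp/my.pid contains the same number as $wrapper_pid
cat /tmp/my.pid
kill "$wrapper_pid"
```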
Improving @Marvin's and @Nils Goroll's answers with a one-liner that extracts the PIDs for all commands in the pipe into a shell array variable:
# run some command
ls -l | rev | sort > /dev/null &
# collect pids
pids=(`jobs -l % | egrep -o '^(\[[0-9]+\]\+| ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'`)
# use them for something
echo pid of ls -l: ${pids[0]}
echo pid of rev: ${pids[1]}
echo pid of sort: ${pids[2]}
echo pid of first command e.g. ls -l: $pids
echo pid of last command e.g. sort: ${pids[-1]}
# wait for last command in pipe to finish
wait ${pids[-1]}
In my solution ${pids[-1]} contains the value normally available in $!. Please note the use of jobs -l % which outputs just the "current" job, which by default is the last one started.
Sample output:
pid of ls -l: 2725
pid of rev: 2726
pid of sort: 2727
pid of first command e.g. ls -l: 2725
pid of last command e.g. sort: 2727
UPDATE 2017-11-13: Improved the pids=... command that works better with complex (multi-line) commands.
Based on your comment, I still can't see why you'd prefer killing my_prog to having it complete in an orderly fashion. Ten seconds is a pretty arbitrary measurement on a multiprocessing system whereby my_prog could generate 10k lines or 0 lines of output depending upon system load.
If you want to limit the output of my_prog to something more determinate try
my_prog | head -1000 | awk
without detaching from the shell. In the worst case, head will close its input and my_prog will get a SIGPIPE. In the best case, change my_prog so it gives you the amount of output you want.
added in response to comment:
In so far as you have control over my_prog give it an optional -s duration argument. Then somewhere in your main loop you can put the predicate:
if (duration_exceeded()) {
exit(0);
}
where exit will in turn properly flush the output FILEs. If desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.
The core of your trouble is that my_prog runs forever. Everything else here is a hack to get around that limitation.
With inspiration from @Demosthenex's answer: using subshells:
$ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -1 &
[1] 17371
$ cat pid1
17370
$ pgrep -fl vmstat
17370 vmstat 1 5
My solution was to query jobs and parse it using perl.
Start two pipelines in the background:
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
Query background jobs:
$ jobs
[1]- Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
[2]+ Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
$ jobs -l
[1]- 6108 Running sleep 600
6109 | sleep 600
6110 | sleep 600
6111 | sleep 600
6112 | sleep 600 &
[2]+ 6114 Running sleep 600
6115 | sleep 600
6116 | sleep 600
6117 | sleep 600
6118 | sleep 600 &
Parse the jobs list of the second job, %2. The parsing is probably error prone, but in these cases it works. We aim to capture the first number followed by a space. It is stored into the variable pids as an array using the parentheses:
$ pids=($(jobs -l %2 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo $pids
6114
$ echo ${pids[*]}
6114 6115 6116 6117 6118
$ echo ${pids[2]}
6116
$ echo ${pids[4]}
6118
And for the first pipeline:
$ pids=($(jobs -l %1 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo ${pids[2]}
6110
$ echo ${pids[4]}
6112
We could wrap this into a little alias/function:
function pipeid() { jobs -l ${1:-%%} | perl -pe '/(\d+) /; $_=$1 . "\n"'; }
$ pids=($(pipeid)) # PIDs of last job
$ pids=($(pipeid %1)) # PIDs of first job
I have tested this in bash and zsh. Unfortunately, in bash I could not pipe the output of pipeid into another command, probably because that pipeline is run in a subshell that is not able to query the job list.
I was desperately looking for good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).
So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:
function last_job_pids {
if [[ -z "${1}" ]] ; then
return
fi
jobs -l | awk '
/^\[/ { delete pids; pids[$2]=$2; seen=1; next; }
// { if (seen) { pids[$1]=$1; } }
END { for (p in pids) print p; }'
}
