See/Count how many subshells a script has opened - shell

I'm having a conundrum with a script (or possibly a WSL2 memory leak).
I'm running a large script whose main loop takes 0.67 seconds per iteration.
My issue is that the loop time is slowly increasing, and so is the memory usage: from 0.67 seconds / 0.9 GB of memory to 1.20 seconds / 1.7 GB after a few hours.
If I restart WSL (stop/start), the speed recovers and the memory usage goes back down to 0.9 GB.
I suspect that my script is leaving subshells running, and I'm wondering if there's any way to see how many subshells are currently running?
Oh, and I'm running this on Windows 10 WSL2 Ubuntu.

Run ps, showing only the parent process ID and process ID of each process. Pipe the output to awk, setting an awk variable pid to the parent process ID in question. Where the first space-delimited field (the parent process ID) equals the passed pid, print the process ID (field 2):
ps -eo ppid,pid | awk -v pid=<pid> '$1==pid { print $2 }'
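Putting that together for the question asked, a sketch of how the script could count its own live children (substituting the shell's own PID, $$, for a hard-coded one):

```shell
# Count how many child processes the current shell has right now.
# $$ is the PID of this shell; wc -l counts the matching children.
ps -eo ppid,pid | awk -v pid=$$ '$1==pid { print $2 }' | wc -l
```

Running this periodically from inside the loop would show whether the number of children keeps growing.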

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with a shorter loop (50 iterations instead of 999), I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one whose slot has been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then, when you call wait, the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit, without you calling wait, before you've filled up all the memory the shell has available for this. I expect that you're hitting this limit somewhere in the several hundred processes in your environment, while other users manage to make it into the several thousand in theirs. Regardless, the outcome is the same: eventually there's nowhere to store information about your children, so that information is lost.
I can reproduce this on Arch Linux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier version. I can't reproduce it with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements to jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. From the look of it, it's an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.
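If you don't need each child's individual exit status, a simple workaround (a sketch, not part of the original answer) is to call wait with no arguments, which sidesteps the per-PID lookup entirely:

```shell
#!/bin/bash
# Workaround sketch: wait with no arguments blocks until all
# background jobs finish, so bash never has to look up a PID that
# may have been evicted from its fixed-size status buffer.
for i in $(seq 1 999); do
    sleep 1 &
done
wait   # no PID given: waits for every remaining child
```

The trade-off is that a bare wait always returns 0 for this form, so failures in individual children go unnoticed.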

Bash script to identify which processes are launched on system start-up

I need a script in Bash that identifies which processes are launched at system start-up and then:
Prints them in the order in which they were launched;
Prints them ordered by the CPU consumption they have accumulated.
What could be the solution?
I tried with the commands
ps -e -opid, lstart, cmd,% cpu
and various combinations, but nothing.
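One likely problem above is that the ps format list must be comma-separated with no spaces. Assuming GNU ps (procps, as found on most Linux systems), a sketch that comes close to both orderings:

```shell
# List processes oldest-first (approximates launch order)...
ps -eo pid,lstart,cmd --sort=start_time
# ...and the same processes ordered by accumulated CPU usage.
ps -eo pid,%cpu,cmd --sort=-%cpu
```

Note that ps can only show processes still alive; short-lived boot-time processes that have already exited won't appear.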

Understanding the behavior of processes - why do all processes run together and sleep together?

I have written a script to launch multiple processes:
for i in $(seq 1 "$1")
do
    /usr/bin/php index.php name &
done
wait
A cron job runs every minute: myscript.sh 3. Three background processes get initiated, and after some time I see the list of processes via the ps command. I see all the processes are together in "Sleep" or "Running" mode. Now I want that when one goes to sleep, the other processes run. How can I achieve that? Or is this normal?
This is normal. A program that can run will be given time by the operating system... when possible. If all three are sleeping, then the system is most likely busy and time is being given to other processes.

Automatically identify (and kill) processes with long processing time

I'm running a script that daily downloads, builds and checks a program I'm contributing to. The "check" part implies performing a suite of test runs and comparing results with the reference.
As long as the program finishes (with or without errors), everything is OK (I mean that I can handle that). But in some cases some test run is apparently stuck in an infinite loop and I have to kill it. This is quite inconvenient for a job that's supposed to run unattended. If this happens at some point, the test will not progress any further and, worse, next day a new job will be launched, which might suffer the same problem.
Manually, I can identify the "stuck" process, for instance, with ps -u username, anything with more than, say, 15 minutes in the TIME column should be killed. Note that this is not just the "age" of the process, but the processing time used. I don't want to kill the wrapper script or the ssh session.
Before trying to write some complicated script that periodically runs ps -u username, parses the output and kills what needs to be killed, is there some easier or pre-cooked solution?
EDIT:
From the replies in the suggested thread, I have added this line to the user's crontab, which seems to work so far:
10,40 * * * * ps -eo uid,pid,time | egrep '^ *`id -u`' | egrep ' ([0-9]+-)?[0-9]{2}:[2-9][0-9]:[0-9]{2}' | awk '{print $2}' | xargs -I{} kill {}
It runs every half hour (at *:10 and *:40), identifies processes belonging to the user (id -u in backticks, because $UID is not available in dash) and with processing time longer than 20 minutes ([2-9][0-9]), and kills them.
The time parsing is not perfect, it would not catch processes that have been running for several hours and less than 20 minutes, but since it runs every 30 minutes that should not happen.
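An alternative worth mentioning (a sketch, assuming GNU coreutils timeout is available; ./run_tests.sh is a hypothetical name for the check step) is to wrap each test run in timeout. Note that timeout limits wall-clock time rather than CPU time as the TIME column does; for a CPU-time limit, ulimit -t in a subshell is closer:

```shell
# Kill the test run if it exceeds 15 minutes of wall-clock time;
# escalate to SIGKILL 30 seconds later if it ignores SIGTERM.
timeout --kill-after=30s 15m ./run_tests.sh
if [ $? -eq 124 ]; then        # 124 means timeout had to intervene
    echo "test run killed after 15 minutes" >&2
fi

# CPU-time variant: the kernel sends SIGXCPU after 900s of CPU time.
( ulimit -t 900; exec ./run_tests.sh )
```

This avoids the cron-based reaper entirely, since each run polices itself.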

How to check CPU Utilization of system calls used in a shell script while script is executing?

I have a shell script which uses a couple of system calls (grep, ps, etc.). I need to find the CPU utilization for each system call used inside the script. I am using AIX Unix version 5.1. Please help.
I have already tried Topas, vmstat , iostat commands, but they display overall cpu utilization of processes.
Use the command below:
ps -aef | grep "process_name"
There is a column 'C' in the output, which displays the CPU utilization for that process.
I'm not sure if it's available on AIX, but on Linux the time command is what you would use:
$ time wc /etc/hosts
  9  26 235 /etc/hosts

real    0m0.075s
user    0m0.002s
sys     0m0.004s
sys is the amount of time spent in system calls; user is the CPU time the process used outside of system calls.
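If the script runs under a shell that supports it (bash does; whether AIX 5.1's default shell does is an assumption), the time keyword can also be placed in front of individual commands inside the script, breaking out user versus sys time per command:

```shell
#!/bin/bash
# Time each external command separately; the real/user/sys lines
# are printed to stderr after each timed command completes.
time grep -c '' /etc/hosts
time ps -ef > /dev/null
```

Because the timing output goes to stderr, it doesn't interfere with the commands' normal output being piped or redirected.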
