Memory usage in a bash script

I've developed a bash script which causes anomalous memory usage when looping over a large number of files. After some time, all memory is exhausted and the system starts swapping, to the point that it becomes unusable.
After placing some sentinels around the code, I think the function causing the issue is _do_cmd(), which is included in the simplified script below.
#!/bin/bash
WORKINGDIR=$(dirname "$0")
SCRIPT=$(basename $0)
LOGFILE=$WORKINGDIR/test-log.txt
FILELIST=$WORKINGDIR/file.list
INDIR=/media/data/incoming
OUTDIR=$WORKINGDIR/Foto/copied
_log() {
    echo -e "[$(date +"%Y-%m-%d %H:%M:%S")]: $*" >> "$LOGFILE"
}
_do_cmd() {
    local LOGFILE_TMP="$LOGFILE.tmp"
    exec 2> >(tee -a "$LOGFILE_TMP")
    _log " ACTION: $*"
    "$@"
    ret=$?
    if [[ $ret -ne 0 ]]; then
        _log "ERROR: Return code $ret"
        grep -v "frame= " "$LOGFILE_TMP" >> "$LOGFILE"
        rm -f "$LOGFILE_TMP"
        exit $ret
    fi
    if [ -f "$LOGFILE_TMP" ]; then rm "$LOGFILE_TMP"; fi
}
while read -r F
do
    echo "Before: $(ps -ef | grep "$SCRIPT" | wc -l)"
    FILE=$(basename "$F")
    _do_cmd cp "$INDIR/$FILE" "$OUTDIR"
    echo "After: $(ps -ef | grep "$SCRIPT" | wc -l)"
done < "$FILELIST"
When I run the script, I see output like the following:
$ ./test-mem.sh
Before: 3
After: 4
Before: 4
After: 5
Before: 5
After: 6
Before: 6
After: 7
Before: 7
After: 8
Before: 8
After: 9
Before: 9
After: 10
Before: 10
After: 11
Before: 11
After: 12
Before: 12
After: 13
Before: 13
After: 14
Before: 14
After: 15
Before: 15
After: 16
Before: 16
After: 17
Before: 17
After: 18
Before: 18
After: 19
Before: 19
After: 20
Before: 20
After: 21
^C
Looking at the running processes during execution, I find that the number of instances of my script grows constantly:
$ watch -n 1 "ps -ef | grep test-mem.sh"
Every 1,0s: ps -ef | grep test-mem.sh Wed Apr 4 10:23:32 2018
user 4117 4104 0 10:23 pts/1 00:00:00 watch -n 1 ps -ef | grep test-mem.sh
user 4877 1309 11 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4885 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4899 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4913 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4927 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4941 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4955 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4969 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4983 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 4997 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5011 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5025 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5043 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5057 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5071 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5085 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5099 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5113 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5127 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5141 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5155 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5169 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5183 4877 0 10:23 pts/0 00:00:00 /bin/bash ./test-mem.sh
user 5304 4117 0 10:23 pts/1 00:00:00 watch -n 1 ps -ef | grep test-mem.sh
user 5305 5304 0 10:23 pts/1 00:00:00 sh -c ps -ef | grep test-mem.sh
user 5307 5305 0 10:23 pts/1 00:00:00 grep test-mem.sh
The purpose of _do_cmd() is to run a command, capture its error output and, only in case of an error, store that output in the log file and exit to the shell.
Can anybody help me understand why, after every _do_cmd execution, there is a new instance of test-mem.sh running on the system?
Thanks in advance.

OK, I've found a solution.
With the revised function below, the "Before" and "After" values remain stable and memory usage no longer keeps growing.
_do_cmd() {
    local LOGFILE_TMP="$LOGFILE.tmp"
    _log " ACTION: $*"
    "$@" 2> >(tee -a "$LOGFILE_TMP")
    ret=$?
    if [[ $ret -ne 0 ]]; then
        _log "ERROR: Return code $ret"
        grep -v "frame= " "$LOGFILE_TMP" >> "$LOGFILE"
        rm -f "$LOGFILE_TMP"
        exit $ret
    fi
    if [ -f "$LOGFILE_TMP" ]; then rm "$LOGFILE_TMP"; fi
}
The issue was probably due to my use of exec, which left the process substitution running in the background after each call.
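For what it's worth, here is a minimal sketch of an alternative that avoids process substitution altogether by redirecting stderr straight into the temporary file. The trade-off is that errors are no longer echoed to the terminal as they happen; it also avoids the small race where tee may still be writing when the file is read back:
_do_cmd() {
    local LOGFILE_TMP="$LOGFILE.tmp"
    _log " ACTION: $*"
    # Append stderr directly to the temp file; no background tee process is spawned.
    "$@" 2>> "$LOGFILE_TMP"
    ret=$?
    if [[ $ret -ne 0 ]]; then
        _log "ERROR: Return code $ret"
        grep -v "frame= " "$LOGFILE_TMP" >> "$LOGFILE"
        rm -f "$LOGFILE_TMP"
        exit $ret
    fi
    rm -f "$LOGFILE_TMP"
}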

Related

grep only files generated in a particular hour

I am trying to grep a pattern in a set of files under a folder, as shown below, and then perform the remaining operations on the output.
The output main.log is huge, almost 50k lines, because there are roughly 30 to 40 files starting with server02.log. The script based on this output is taking forever to complete.
Is there a way to take only the files whose names start with server02.log. and that were generated between 20:00:00 and 21:00:00?
ls -lrth server02.log.*
-rw-r--r-- 1 user user 1.9M Apr 15 20:20 server02.log.2020
-rw-r--r-- 1 user user 1.7M Apr 15 20:30 server02.log.2030
-rw-r--r-- 1 user user 1.6M Apr 15 20:41 server02.log.2041
-rw-r--r-- 1 user user 1.9M Apr 15 20:50 server02.log.2050
-rw-r--r-- 1 user user 2.1M Apr 15 21:00 server02.log.2100
-rw-r--r-- 1 user user 1.4M Apr 15 21:10 server02.log.2110
-rw-r--r-- 1 user user 1.9M Apr 15 21:20 server02.log.2120
-rw-r--r-- 1 user user 656K Apr 15 21:29 server02.log.2129
-rw-r--r-- 1 user user 4.6M Apr 15 21:40 server02.log.2140
-rw-r--r-- 1 user user 1.9M Apr 15 21:50 server02.log.2150
-rw-r--r-- 1 user user 1.7M Apr 15 21:59 server02.log.2159
-rw-r--r-- 1 user user 724K Apr 15 22:09 server02.log.2209
-rw-r--r-- 1 user user 1.3M Apr 15 22:20 server02.log.2220
-rw-r--r-- 1 user user 1.1M Apr 15 22:29 server02.log.2229
-rw-r--r-- 1 user user 1.7M Apr 15 22:41 server02.log.2241
-rw-r--r-- 1 user user 1.5M Apr 15 22:49 server02.log.2249
-rw-r--r-- 1 user user 2.4M Apr 15 23:01 server02.log.2301
-rw-r--r-- 1 user user 1.4M Apr 15 23:10 server02.log.2310
-rw-r--r-- 1 user user 585K Apr 15 23:19 server02.log.2319
-rw-r--r-- 1 user user 858K Apr 15 23:30 server02.log.2330
-rw-r--r-- 1 user user 892K Apr 15 23:40 server02.log.2340
-rw-r--r-- 1 user user 698K Apr 15 23:49 server02.log.2349
grep -E "###Update |###Initiate |###Re-Initiate " server02.log.* >> main.log
From the comments, I made the following change to my code:
#!/bin/bash
DIR="."
d=$(date +%Y-%m-%d);
log_dir="logs/$d"
PREFIX="$log_dir/srv_02.log"
#PREFIX="srv_02.log"
echo "prefix value is $PREFIX"
START_HOUR="06"
for F in "$( find "$DIR" -name "${PREFIX}*" -printf '%Tc %p\n' | grep "\ ${START_HOUR}:" )"; do
echo "F value is $F"
grep -E "###Update |###Initiate |###Re-Initiate" "$F" >> main.log
done
error:
prefix value is logs/2021-04-16/srv_02.log
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `logs/2021-04-16/srv_02.log*'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `logs/2021-04-16/osbpd_srv_02.log*''.
F value is
grep: : No such file or directory
This solution looks for files in the given directory, created during the specified hour, whose names match the given prefix.
#!/bin/bash
d=$(date +%Y-%m-%d)
DIR="logs/$d"
PREFIX="srv_02.log"
#PREFIX=server02.log
echo "prefix value is $PREFIX"
START_HOUR="06"
# Print the modification hour and the path of every matching file,
# then keep only the files modified during START_HOUR.
find "$DIR" -name "${PREFIX}*" -printf '%TH %p\n' |
while read -r HOUR F; do
    [ "$HOUR" = "$START_HOUR" ] || continue
    echo "$F"
    # grep -E "###Update |###Initiate |###Re-Initiate Assignment Milestone|###Complete Assignment Milestone|###Cancel Assignment Milestone|###Suspend Assignment Milestone|###Resume Assignment Milestone" "$F" >> main.log
done
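If the window is a fixed clock range rather than a whole hour, a sketch using GNU find's -newermt can avoid the hour parsing entirely (the date below is only an example and would need to be adjusted):
# Hypothetical window: files modified on 2020-04-15 between 20:00 and 21:00.
find . -name 'server02.log.*' -newermt '2020-04-15 20:00' ! -newermt '2020-04-15 21:00' \
    -exec grep -E "###Update |###Initiate |###Re-Initiate " {} + >> main.log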

Jump to the top parent shell from any arbitrary depth of subshell

I created multiple subshells
$ ps -f
UID PID PPID C STIME TTY TIME CMD
501 2659 2657 0 8:22AM ttys000 0:00.15 -bash
501 2776 2659 0 8:23AM ttys000 0:00.02 bash
501 2778 2776 0 8:23AM ttys000 0:00.09 bash
501 3314 2778 0 9:13AM ttys000 0:00.26 bash
501 8884 3314 0 4:41PM ttys000 0:00.03 /bin/bash
501 8891 8884 0 4:41PM ttys000 0:00.01 /bin/bash
501 8899 8891 0 4:41PM ttys000 0:00.02 /bin/bash
501 423 408 0 7:16AM ttys001 0:00.22 -bash
501 8095 423 0 3:52PM ttys001 0:00.15 ssh root#www.****.com
501 8307 8303 0 4:05PM ttys002 0:00.17 -bash
I'd like to jump back to the topmost one, but I have to exit them one by one.
$ ps -f
UID PID PPID C STIME TTY TIME CMD
501 2659 2657 0 8:22AM ttys000 0:00.17 -bash
501 423 408 0 7:16AM ttys001 0:00.22 -bash
501 8095 423 0 3:52PM ttys001 0:00.15 ssh root#***.com
501 8307 8303 0 4:05PM ttys002 0:00.17 -bash
I checked that there are 3 bashes left, so I continued:
$ exit
logout
Saving session...completed.
[Process completed]
Sadly, this is what happens in most cases I encounter. How can I jump to the top shell from an arbitrary depth of subshells?

Optimizing vncscreenshot scripts

Good Day,
I'm using vncsnapshot (http://vncsnapshot.sourceforge.net/) in a Debian 7 environment to capture screenshots of workstations and monitor staff desktop activity. It captures screenshots via nmap and saves them to a location accessed through an internal web page.
I have scripts like the one below, where x.x.x.x is the IP range of the network, to capture all open workstations.
#!/bin/bash
nmap -v -p5900 --script=vnc-screenshot-it --script-args vnc-screenshot.quality=30 x.x.x.x
It is set up in crontab to run every 5 minutes.
The server ends up with too many running processes because of it. Here is a sample of the ps output:
root 32696 0.0 0.0 4368 0 ? S Feb23 0:00 /bin/bash /var/www/vncsnapshot/.scripts/.account.sh
root 32708 0.0 0.0 14580 4 ? S Feb23 0:00 nmap -v -p5900,5901,5902 --script=vnc-screenshot-mb
root 32717 0.0 0.0 1952 60 ? S Apr10 0:00 sh -c vncsnapshot -cursor -quality 30 x.x.x.x
root 32719 0.0 0.1 11480 4892 ? S Apr10 0:00 vncsnapshot -cursor -quality 30 30 x.x.x.x /var/w
root 32720 0.0 0.0 1952 60 ? S Apr25 0:00 sh -c vncsnapshot -cursor -quality 30 30 x.x.x.x
root 32722 0.0 0.0 1952 4 ? Ss Feb09 0:00 /bin/sh -c /var/www/vncsnapshot/.scripts/.account.sh
root 32723 0.0 0.0 3796 140 ? S Apr25 0:00 vncsnapshot -cursor -quality 30 30 x.x.x.x /var/w
root 32730 0.0 0.0 1952 4 ? Ss Feb08 0:00 /bin/sh -c /var/www/vncsnapshot/.scripts/.account
root 32734 0.0 0.0 4364 0 ? S Feb08 0:00 /bin/bash /var/www/vncsnapshot/.scripts/.account.
root 32741 0.0 0.0 13700 4 ? S Feb08 0:00 nmap -v -p5900 --script=vnc-screenshot-account --
root 32755 0.0 0.0 1952 4 ? Ss Feb08 0:00 /bin/sh -c /var/www/vncsnapshot/.scripts/.account.sh
root 32757 0.0 0.0 1952 4 ? S Feb07 0:00 sh -c vncsnapshot -cursor -quality 30 30 x.x.x.x
root 32760 0.0 0.0 3796 0 ? S Feb07 0:00 vncsnapshot -cursor -quality 30 30 x.x.x.x /var/w
root 32762 0.0 0.0 4368 0 ? S Feb09 0:00 /bin/bash /var/www/vncsnapshot/.scripts/.account.sh
root 32764 0.0 0.0 4368 0 ? S Feb08 0:00 /bin/bash /var/www/vncsnapshot/.scripts/.account.sh
How can I optimize this setup so that unnecessary processes that are still running get closed?
Thanks
I split the processing into two parts: nmap regularly scans the network, and vncsnapshot grabs screenshots of the previously scanned hosts.
In my opinion, things are cleaner this way.
I haven't tested this code.
#!/bin/bash
## capture the list of hosts with the vnc port open
list=/dev/shm/list
port=5900
network="192.168.1.*"
nmap -n -p"${port}" --open "${network}" -oG - | grep 'open/tcp' | awk '{print $2}' > "${list}"
The other script uses a lock file to check whether a grab for a host is still alive and, if it is not, launches the grab command:
#!/bin/bash
list=/dev/shm/list
run=/run/vncscreenshot
mkdir -p "${run}" &>/dev/null
while read -r host
do
    lock="${run}/${host}.lock"
    # Skip this host if the PID recorded in its lock file is still running.
    test -e "${lock}" && ps -p "$(<"${lock}")" &>/dev/null && continue
    # Grab the screenshot directly with vncsnapshot, as in the original setup.
    vncsnapshot -cursor -quality 30 "${host}" &
    echo $! > "${lock}"
done < "${list}"
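A possible crontab layout for the two scripts (the script names, paths and intervals here are only assumptions; adjust them to your setup):
# m  h  dom mon dow  command
*/15 *  *   *   *    /var/www/vncsnapshot/.scripts/scan-hosts.sh
*/5  *  *   *   *    /var/www/vncsnapshot/.scripts/grab-screens.sh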

unable to see command arguments at OS level issued from exec.Command

My routine is supposed to spin up 10 child processes from the same Go executable binary (os.Args[0]), adding some valid command-line arguments. All processes should live for a number of seconds, which is specified in one of the arguments.
func spinChildProcesses() {
	cmdParts := make([]string, 4)
	cmdParts[0] = "-c"
	cmdParts[1] = os.Args[0]
	cmdParts[2] = "--duration"
	cmdParts[3] = "10000"
	for i := 0; i < 10; i++ {
		proc := exec.Command("bash", cmdParts...)
		go proc.Start()
	}
}
func main() {
	// not showing the code that parses the duration arg
	// create 10 child subprocesses
	go spinChildProcesses()
	// set a duration to the process and terminate
	time.Sleep(time.Second * time.Duration(duration))
	fmt.Println(" - process terminating normally")
}
When the above is run, looking at the OS level I can see that the arguments are not carried over. Only the root process has the arguments I typed:
ps -ef | grep my-test-pr
root 3806 14446 0 15:23 pts/1 00:00:00 ./my-test-pr --duration 10000
root 3810 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3811 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3813 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3814 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3818 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3823 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3824 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3829 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3836 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
root 3840 3806 0 15:23 pts/1 00:00:00 ./my-test-pr
Any idea why, and how to ensure the arguments are passed to the child processes?
The -c flag to bash takes a single string argument to interpret as a command. Since the argument to -c here is only the string os.Args[0], that is all bash executes; the remaining arguments merely become $0 and the positional parameters of that command string, so they are effectively ignored.
To pass the arguments to your binary when it is executed by bash -c, join them into a single string:
var args []string
args = append(args, os.Args[0])
args = append(args, "--duration")
args = append(args, "10000")
for i := 0; i < 10; i++ {
	proc := exec.Command("/bin/bash", "-c", strings.Join(args, " "))
	go proc.Start()
}
Or simply exec your binary directly without the extra shell.

How do I put an already running CHILD process under nohup

My question is very similar to that posted in: How do I put an already-running process under nohup?
Say I execute foo.sh from my command line, and it in turn executes another shell script, and so on. For example:
foo.sh
\_ bar.sh
\_ baz.sh
Now I press Ctrl+Z to suspend "foo.sh". It is listed in my "jobs -l".
How do I disown baz.sh so that it is no longer a grandchild of foo.sh? If I type "disown", then only foo.sh is disowned from its parent, which isn't exactly what I want. I'd like to kill off the foo.sh and bar.sh processes and be left with only baz.sh.
My current workaround is to "kill -18" (resume) baz.sh and go on with my work, but I would prefer to kill the aforementioned processes. Thanks.
Use ps to get the PID of bar.sh, and kill it.
imac:barmar $ ps -l -t p0 -ww
UID PID PPID F CPU PRI NI SZ RSS WCHAN S ADDR TTY TIME CMD
501 3041 3037 4006 0 31 0 2435548 760 - Ss 8c6da80 ttyp0 0:00.74 /bin/bash --noediting -i
501 68228 3041 4006 0 31 0 2435544 664 - S 7cbc2a0 ttyp0 0:00.00 /bin/bash ./foo.sh
501 68231 68228 4006 0 31 0 2435544 660 - S c135a80 ttyp0 0:00.00 /bin/bash ./bar.sh
501 68232 68231 4006 0 31 0 2435544 660 - S a64b7e0 ttyp0 0:00.00 /bin/bash ./baz.sh
501 68233 68232 4006 0 31 0 2426644 312 - S f9a1540 ttyp0 0:00.00 sleep 100
0 68243 3041 4106 0 31 0 2434868 480 - R+ a20ad20 ttyp0 0:00.00 ps -l -t p0 -ww
imac:barmar $ kill 68231
./foo.sh: line 3: 68231 Terminated ./bar.sh
[1]+ Exit 143 ./foo.sh
imac:barmar $ ps -l -t p0 -ww
UID PID PPID F CPU PRI NI SZ RSS WCHAN S ADDR TTY TIME CMD
501 3041 3037 4006 0 31 0 2435548 760 - Ss 8c6da80 ttyp0 0:00.74 /bin/bash --noediting -i
501 68232 1 4006 0 31 0 2435544 660 - S a64b7e0 ttyp0 0:00.00 /bin/bash ./baz.sh
501 68233 68232 4006 0 31 0 2426644 312 - S f9a1540 ttyp0 0:00.00 sleep 100
0 68248 3041 4106 0 31 0 2434868 480 - R+ 82782a0 ttyp0 0:00.00 ps -l -t p0 -ww
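The same idea can be scripted; a rough sketch (assuming pkill is available and the script names are unique enough to match safely):
# Ctrl+Z stopped the whole job, so resume the grandchild first,
# then kill bar.sh as shown above; baz.sh is reparented to init and keeps running.
pkill -CONT -f baz.sh
pkill -TERM -f bar.sh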
