How to limit the memory usage of a script? - bash

I use a script that consumes so much RAM that it freezes my computer and throws an error.
How could I limit the memory usage of this particular script?
I use Debian 9 (Linux).
Thanks.
This is the basic script:
path="/home/xxx"
mesMenosUnDia=$(date +%m --date='-1 month')
fecha=$(date +"%Y-$mesMenosUnDia-%d")
echo "find"
list=$(find /home/xxx -type f)
listArray=($list)
for i in "${listArray[#]}"
do
onlyDate=$(echo $i | grep -P '\d{4}\-\d{2}\-\d{2}' -o)
if [[ $onlyDate < $fecha ]];then
rm $i
else
fi
done

You can limit the maximum memory available to the script with ulimit. The drawback is that the script will hit the limit and die, but your computer will not hang: when the script asks for too much memory, it gets killed.
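For example, a minimal sketch (the 500000 KB cap and the script name are only placeholders for illustration): run the script in a subshell with a virtual-memory limit set via ulimit -v, so only that subshell and its children are restricted.
(
    ulimit -v 500000      # virtual memory limit in kilobytes (illustrative value)
    ./your-script.sh      # placeholder name for the script above
)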
Your script is using too much memory because you are storing too many things in memory. Specifically, you grab the output of the find command into a variable and then copy all of that data into an array, so the whole content gets duplicated.
Instead of keeping everything in memory, put it on disk.
path="/home/xxx"
mesMenosUnDia=$(date +%m --date='-1 month')
fecha=$(date +"%Y-$mesMenosUnDia-%d")
echo "find"
find /home/xxx -type f > tmp
for i in $(<tmp)
do
onlyDate=$(echo $i | grep -P '\d{4}\-\d{2}\-\d{2}' -o)
if [[ $onlyDate < $fecha ]];then
rm $i
else
fi
done
rm tmp
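Another sketch, not from the original answer, that avoids holding the whole list anywhere: stream find's output straight into a while read loop. Like the original, it assumes file names contain no newlines.
path="/home/xxx"
mesMenosUnDia=$(date +%m --date='-1 month')
fecha=$(date +"%Y-$mesMenosUnDia-%d")
# process file names one at a time instead of collecting them first
find "$path" -type f | while IFS= read -r i; do
    onlyDate=$(grep -Po '\d{4}-\d{2}-\d{2}' <<< "$i")
    # only act when a date was actually found in the name
    if [[ -n $onlyDate && $onlyDate < $fecha ]]; then
        rm -- "$i"
    fi
done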

Related

Bash script that reads rsync progress and bails on the file if too slow

I have so little experience with bash scripting that it is laughable.
I have spent 3 days transferring files from a failing HDD (1 of 3 in an LVM) on my NAS to a new HDD. Most (percentage wise) of the files transfer fine, but many (thousands) are affected and instead of failing with an i/o error, they drop the speed down to agonizing rates.
I was using a simple cp command but then I switched to rsync and used the --progress option to at least be able to identify when this was happening.
Currently, I'm manually watching the screen (sucks when we're talking DAYS), ^C when there's a hangup, then copy the file name and paste it into an exclude file and restart rsync.
I NEED to automate this!
I know nothing about bash scripting, but I figure I can probably "watch" the standard output, parse the rate info and use some logic like this:
if rate is less than 5Mbps for 3 consecutive seconds, bail and restart
This is the rsync command I'm using:
rsync -aP --ignore-existing --exclude-from=EXCLUDE /mnt/olddisk/ /mnt/newdisk/
And here is a sample output from progress:
path/to/file.ext
3,434,343,343  54%  144.61MB/s  0:00:05 (xfr#1, ir-chk=1024/1405)
So parse the third column of the second line and make sure it isn't too slow; if it is, kill the command, append the file name to EXCLUDE, and give it another go.
Is that something someone can help me with?
This is a horrible approach, and I do not expect it to usefully solve your problem. However, the following is a literal answer to your question.
#!/usr/bin/env bash
[[ $1 ]] || {
    echo "Usage: rsync -P --exclude=exclude-file ... | $0 exclude-file" >&2
    exit 1
}
is_too_slow() {
    local rate=$1
    case $rate in
        *kB/s) return 0 ;;
        [0-4][.]*MB/s) return 0 ;;
        *) return 1 ;;
    esac
}
exclude_file=$1
last_slow_time=0
filename=
too_slow_count=0
while IFS=$'\n' read -r -d $'\r' -a pieces; do
    for piece in "${pieces[@]}"; do
        case $piece in
            "sending incremental file list") continue ;;
            [[:space:]]*)
                read -r size pct rate time <<<"$piece"
                if is_too_slow "$rate"; then
                    if (( last_slow_time == SECONDS )); then
                        continue # ignore multiple slow results in less than a second
                    fi
                    last_slow_time=$SECONDS
                    if (( ++too_slow_count > 3 )); then
                        echo "$filename" >>"$exclude_file"
                        exit 1
                    fi
                else
                    too_slow_count=0
                fi
                ;;
            *) filename=$piece; too_slow_count=0 ;;
        esac
    done
done
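To tie this back to the question, a hedged sketch of a driver loop (watch-slow.sh is a made-up name for the script above; the restart behaviour may need tuning, e.g. rsync failing for other reasons would also end the loop): when the watcher bails with a non-zero status, the pipeline fails, the newly slow file has been appended to EXCLUDE, and rsync is started again.
touch EXCLUDE
until rsync -aP --ignore-existing --exclude-from=EXCLUDE /mnt/olddisk/ /mnt/newdisk/ \
      | ./watch-slow.sh EXCLUDE
do
    echo "slow file excluded, restarting rsync..." >&2
    sleep 1
done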

Optimizing a script that lists available commands with manual pages

I'm using this script to generate a list of the available commands with manual pages on the system. Running this with time shows an average of about 49 seconds on my computer.
#!/usr/local/bin/bash
for x in $(for f in $(compgen -c); do which $f; done | sort -u); do
    dir=$(dirname $x)
    cmd=$(basename $x)
    if [[ ! $(man --path "$cmd" 2>&1) =~ 'No manual entry' ]]; then
        printf '%b\n' "${dir}:\n${cmd}"
    fi
done | awk '!x[$0]++'
Is there a way to optimize this for faster results?
This is a small sample of my current output. The goal is to group commands by directory. This will later be fed into an array.
/bin: # directories generated by $dir
[ # commands generated by $cmd (compgen output)
cat
chmod
cp
csh
date
Going for a complete disregard of built-ins here. That's what which does, anyway. Script not thoroughly tested.
#!/bin/bash
shopt -s nullglob # need this for "empty" checks below
MANPATH=${MANPATH:-/usr/share/man:/usr/local/share/man}
IFS=: # chunk up PATH and MANPATH, both colon-delimited
# just look at the directory!
has_man_p() {
    local needle=$1 manp manpp result=()
    for manp in $MANPATH; do
        # man? should match man0..man9 and a bunch of single-char things
        # do we need 'man?*' for longer suffixes here?
        for manpp in "$manp"/man?; do
            # assumption made for filename formats. section not checked.
            result=("$manpp/$needle".*)
            if (( ${#result[@]} > 0 )); then
                return 0
            fi
        done
    done
    return 1
}
unset seen
declare -A seen # for deduplication
for p in $PATH; do
    printf '%b:\n' "$p" # print the path first
    for exe in "$p"/*; do
        cmd=${exe##*/} # the sloppy basename
        if [[ ! -x $exe || ${seen[$cmd]} == 1 ]]; then
            continue
        fi
        seen["$cmd"]=1
        if has_man_p "$cmd"; then
            printf '%b\n' "$cmd"
        fi
    done
done
Time on Cygwin with a truncated PATH (the full one with Windows has too many misses for the original version):
$ export PATH=/usr/local/bin:/usr/bin
$ time (sh ./opti.sh &>/dev/null)
real 0m3.577s
user 0m0.843s
sys 0m2.671s
$ time (sh ./orig.sh &>/dev/null)
real 2m10.662s
user 0m20.138s
sys 1m5.728s
(Caveat for both versions: most stuff in Cygwin's /usr/bin comes with a .exe extension)

Group usage by user on a shared folder in a Red Hat FS

History: I have a shared folder which can be accessed by all the users of the system. Everyone claims they are not using much, so I decided to check how much of the shared folder each user is using.
I am able to get the total usage with du -sh <path/to/folder>, but not at the individual user level.
I think I am overcomplicating this; there is probably a straightforward way to get it done.
If somebody has already asked a similar question, please share the URL.
Here are a couple of functions that may help:
space() {
    local user=$1
    local space=0
    local tmp=`mktemp`
    find . -user $user -exec stat --printf="%s\n" {} \; 2>/dev/null >> $tmp
    for size in `cat $tmp`; do ((space=space + size)); done
    local humanized=`mb $space`
    echo "`pwd` $user $humanized"
    rm -f $tmp
}
mb() {
    local orig=$1
    if [[ $orig -gt $((2**20)) ]]; then
        echo -n $(($orig / 2**20))
        echo "mb"
    else
        echo -n $(($orig / 2**10))
        echo "kb"
    fi
}
Paste these into your shell and then call it on the command line like
$ space <user>
It will print all the file sizes to a temporary file and then add them up. The mb function makes the total human readable. When I run it I get
/home/me me 377mb
Compared with
du -sh .
399M .
Pretty close ;)
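If the per-file stat calls are too slow on a big tree, here is a minimal one-pass sketch (assuming GNU find; /path/to/shared is a placeholder) that sums the sizes for every user at once:
# print "owner size" for each file, then total per owner in awk
find /path/to/shared -type f -printf '%u %s\n' 2>/dev/null |
    awk '{ total[$1] += $2 }
         END { for (u in total) printf "%-12s %8.1f MB\n", u, total[u] / 2^20 }'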

How to find latest modified files and delete them with SHELL code

I need some help with a shell script. Right now I have this code:
find $dirname -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
This code finds duplicated files (with the same content) in a given directory. What I need to do is update it: find the most recently modified file in each group of duplicates, print that file name, and also offer the option to delete it from the terminal.
Doing this in pure bash is a tad awkward; it would be a lot easier to write in Perl or Python.
Also, if you were looking to do this with a bash one-liner, it might be feasible, but I really don't know how.
Anyhoo, if you really want a pure bash solution, below is an attempt at doing what you describe.
Please note that:
I am not actually calling rm, just echoing it - don't want to destroy your files
There's a "read -u 1" in there that I'm not entirely happy with.
Here's the code:
#!/bin/bash
buffer=''
function process {
    if test -n "$buffer"
    then
        nbFiles=$(printf "%s" "$buffer" | wc -l)
        echo "================================================================================="
        echo "The following $nbFiles files are byte identical and sorted from oldest to newest:"
        ls -lt -c -r $buffer
        lastFile=$(ls -lt -c -r $buffer | tail -1)
        echo
        while true
        do
            read -u 1 -p "Do you wish to delete the last file $lastFile (y/n/q)? " answer
            case $answer in
                [Yy]* ) echo rm $lastFile; break;;
                [Nn]* ) echo skipping; break;;
                [Qq]* ) exit;;
                * ) echo "please answer yes, no or quit";;
            esac
        done
        echo
    fi
}
find . -type f -exec md5sum '{}' ';' |
    sort |
    uniq --all-repeated=separate -w 33 |
    cut -c 35- |
    while read -r line
    do
        if test -z "$line"
        then
            process
            buffer=''
        else
            buffer=$(printf "%s\n%s" "$buffer" "$line")
        fi
    done
process
echo "done"
Here's a "naive" solution implemented in bash (except for two external commands: md5sum, of course, and stat used only for user's comfort, it's not part of the algorithm). The thing implements a 100% Bash quicksort (that I'm kind of proud of):
#!/bin/bash
# Finds similar (based on md5sum) files (recursively) in a given
# directory. If several files with the same md5sum are found, sort
# them by modification date (most recent first) and prompt the user
# for deletion of the oldest.
die() {
    printf >&2 '%s\n' "$@"
    exit 1
}
quicksort_files_by_mod_date() {
    if ((!$#)); then
        qs_ret=()
        return
    fi
    # the return array is qs_ret
    local first=$1
    shift
    local newers=()
    local olders=()
    qs_ret=()
    for i in "$@"; do
        if [[ $i -nt $first ]]; then
            newers+=( "$i" )
        else
            olders+=( "$i" )
        fi
    done
    quicksort_files_by_mod_date "${newers[@]}"
    newers=( "${qs_ret[@]}" )
    quicksort_files_by_mod_date "${olders[@]}"
    olders=( "${qs_ret[@]}" )
    qs_ret=( "${newers[@]}" "$first" "${olders[@]}" )
}
[[ -n $1 ]] || die "Must give an argument"
[[ -d $1 ]] || die "Argument must be a directory"
dirname=$1
shopt -s nullglob
shopt -s globstar
declare -A files
declare -A hashes
for file in "$dirname"/**; do
    [[ -f $file ]] || continue
    read md5sum _ < <(md5sum -- "$file")
    files[$file]=$md5sum
    ((hashes[$md5sum]+=1))
done
has_found=0
for hash in "${!hashes[@]}"; do
    ((hashes[$hash]>1)) || continue
    files_with_same_md5sum=()
    for file in "${!files[@]}"; do
        [[ ${files[$file]} = $hash ]] || continue
        files_with_same_md5sum+=( "$file" )
    done
    has_found=1
    echo "Found ${hashes[$hash]} files with md5sum=$hash, sorted by modification date (most recent first):"
    # sort them by modification date (using quicksort :p)
    quicksort_files_by_mod_date "${files_with_same_md5sum[@]}"
    for file in "${qs_ret[@]}"; do
        printf " %s %s\n" "$(stat --printf '%y' -- "$file")" "$file"
    done
    read -p "Do you want to remove the oldest? [yn] " answer
    if [[ ${answer,,} = y ]]; then
        echo rm -fv -- "${qs_ret[@]:1}"
    fi
done
if ((!has_found)); then
    echo "Didn't find any similar files in directory \`$dirname'. Yay."
fi
I guess the script is self-explanatory (you can read it like a story). It uses the best practices I know of and is 100% safe regarding any silly characters in file names (e.g., spaces, newlines, file names starting with hyphens, file names ending with a newline, etc.).
It uses bash's globs, so it might be a bit slow if you have a bloated directory tree.
There are a few error checks, but many are missing, so don't use it as-is in production! (It's a trivial but rather tedious task to add them.)
The algorithm is as follows: scan each file in the given directory tree; for each file, compute its md5sum and store it in two associative arrays:
files, with the file names as keys and the md5sums as values.
hashes, with the md5sums as keys and, as values, the number of files having that md5sum.
After this is done, we scan through all the md5sums found, select only the ones that correspond to more than one file, then select all files with that md5sum, quicksort them by modification date, and prompt the user.
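As a standalone illustration of those two maps (not part of the answer; the file names and contents are made up):
#!/bin/bash
# build the two associative arrays over three throwaway files, two of them identical
tmpdir=$(mktemp -d)
printf 'same\n'  > "$tmpdir/a.txt"
printf 'same\n'  > "$tmpdir/b.txt"
printf 'other\n' > "$tmpdir/c.txt"
declare -A files hashes
for file in "$tmpdir"/*; do
    read -r sum _ < <(md5sum -- "$file")
    files[$file]=$sum        # file name -> md5sum
    ((hashes[$sum]+=1))      # md5sum    -> number of files with that sum
done
declare -p files hashes      # a.txt and b.txt share one key in hashes, with count 2
rm -rf -- "$tmpdir"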
A sweet effect when no dups are found: the script nicely informs the user about it.
I would not say it's the most efficient way of doing things (might be better in, e.g., Perl), but it's really a lot of fun, surprisingly easy to read and follow, and you can potentially learn a lot by studying it!
It uses a few bashisms and features that are only available in bash version ≥ 4.
Hope this helps!
Remark. If on your system date has the -r switch, you can replace the stat command by:
date -r "$file"
Remark. I left the echo in front of rm. Remove it if you're happy with how the script behaves. Then you'll have a script that uses 3 external commands :).

wrong output because of backgrounded processes

If I run the script with ./test.sh 100 I do not get the output 100, because the increments run as background processes. What do I have to do to get the expected output? (I must not change test.sh, though.)
test.sh
#!/bin/bash
FILE="number.txt"
echo "0" > $FILE
for (( x=1; x<=$1; x++ )); do
    exec "./increment.sh" $FILE &
done
wait
cat $FILE
increment.sh
#!/bin/bash
value=(< "$1")
let value++
echo $value > "$1"
EDIT
Well I tried this:
#!/bin/bash
flock $1 --shared 2>/dev/null
value=(< "$1")
let value++
echo $value > "$1"
Now I get something like 98 or 99 all the time if I use ./test.sh 100.
It is not working very well and I do not know how to fix it.
If test.sh really cannot be improved, then each instance of increment.sh must serialize its own access to $FILE.
Filesystem locking is the obvious solution for this under UNIX. However, there is no shell builtin to accomplish this. Instead, you must rely on an external utility program like flock, setlock, or chpst -l|-L. For example:
#!/bin/bash
(
    flock 100    # lock *exclusively* (not shared)
    value=$(< "$1")
    let value++
    echo $value > "$1"
) 100>>"$1"      # see the note of caution below
A note of caution: using the file you'll be modifying as a lockfile gets tricky quickly — it's easy to truncate in shell when you didn't mean to, and the mixing of access modes above might offend some people — but the above avoids gross mistakes.
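A sketch of the safer variant hinted at above, locking a dedicated file instead of the data file itself (the "$1.lock" name is just a convention chosen for this example):
#!/bin/bash
# increment.sh variant: serialize on a separate lock file rather than on "$1"
(
    flock -x 9                 # take an exclusive lock on file descriptor 9
    value=$(< "$1")
    echo $((value + 1)) > "$1"
) 9> "$1.lock"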
