Reporting with cut and grep - bash

I'm trying to create a script that takes an extension and reports, in two columns, each user and the number of files that user owns with that extension.
The results must be printed in report.txt
Here is my code.
#!/bin/bash
#Uncomment to create /tmp/test/ with 50 txt files
#mkdir /tmp/test/
#touch /tmp/test/arch{01..50}.txt
clear
usage(){
echo "The script needs an extension to search"
echo "$0 <extension>"
}
if [ $# -eq 0 ]; then
usage
exit 1
fi
folder="/tmp/test/"
touch report.txt
count=0
pushd $folder
for file in $(ls -l); do
grep "*.$1" | cut -d " " -f3 >> report.txt
done
popd
The program just runs endlessly. And I'm not even counting the files for each user.
How can I solve this using only grep and cut?

With GNU stat :
stat -c '%U' *."$1" | sort | uniq -c | awk '{print $2,"\t",$1}' > report.txt
As pointed out by mklement0, under BSD/OSX you must use a -f option with stat :
stat -f '%Su' *."$1" | sort | uniq -c | awk '{print $2,"\t",$1}' > report.txt
Edit :
To process many files and avoid argument number limitation, you'd better use a printf piped to the stat command (thanks again mklement0) :
printf '%s\0' *."$1" | xargs -0 stat -c '%U' | sort | uniq -c | awk '{print $2,"\t",$1}'

You don't need a loop for this (unless you later need to loop over several folders), and changing the working directory in a script is rarely necessary. Also, reading ls output is usually not recommended.
Here's a version that replaces the loop, and uses du:
ext="$1"
printf "Folder '%s':\t" "$folder" >>report.txt
du -hc "$folder"/*."$ext" | sed -n '$p' >>report.txt
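For the original goal (owner and file count in two columns), a minimal sketch that avoids parsing ls could look like the following. It assumes GNU find (for -printf); owner_report and its two parameters are hypothetical names chosen for illustration and do not come from any of the answers above.

```shell
# Sketch only: assumes GNU find. owner_report is a hypothetical helper name.
# $1: folder to scan (not recursive), $2: extension without the leading dot
owner_report() {
    local folder="$1" ext="$2"
    # Print the owner of every matching regular file, one per line,
    # then count identical owners and swap the columns to "user<TAB>count".
    find "$folder" -maxdepth 1 -type f -name "*.$ext" -printf '%u\n' |
        sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Possible usage: owner_report /tmp/test txt > report.txt
```

Letting find match the names keeps the glob off the command line, so the argument-length limit mentioned above does not come into play.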

Related

Unexpected Error while Executing Simple grep Script

I'm trying to collect a line from a series of very long files. Unfortunately, I need to extract the same line from an identically named file in 1600 distinct directories. The directory structure is like this.
Directory jan10 contains both the executed bash script, and directories named 18-109. The directories 18-109 each contain directories named 18A, 18B, ..., 18H. Inside each of these directories is the file "target.out" that we want the information from. Here is the code that I wrote to access this information:
for i in $(cat ~/jan10/list.txt);
do
cd $i
cd *A
grep E-SUM-OVERALL target.out | cut -c 17-24 > ../overallenergy.out
cd ../*B
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*C
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*D
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*E
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*F
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*G
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*H
done
In this example, list.txt contains the numbers 18-109 each on a different line. An example of the "list.txt" is shown below:
17
18
19
20
21
22
23
24
25
Unexpectedly, this code simply won't work, it returns the error:
./testscript.sh: line 8: cd: 18: No such file or directory
./testscript.sh: line 11: cd: *A: No such file or directory
It returns this error for every numbered directory and every lettered sub-directory. Does anyone have any insight on what I've done wrong? I'll answer any questions, and I apologize again if this is unclear. The grep command by itself does work, so I imagine it's a problem with one of the "cd" commands, but I'm unsure. The code is being executed in the jan10 directory.
for Dir in $(cat ~/jan10/list.txt)
do
find "$Dir" -type f -name target.out |
while read File
do
grep E-SUM-OVERALL "$File" > "${File%/*/target.out}"/overallenergy.out
done
done
Now that I understand your requirement better (my fault), here's a more fleshed out solution.
prompt$ cat simpleGrepScript.sh
#!/bin/bash
if ${testMode:-true} ; then
echo "processing file $1 into outfile ${1%/*}/../overallenergy.out" 1>&2
else
[[ -f "$1" ]] && grep 'E-SUM-OVERALL' "$1" > ${1%/*}/../overallenergy.out || echo "no file "$1" found" 1>&2
fi
Run
prompt$ find /starting/path -name target.out | xargs /path/to/simpleGrepScript.sh
if the output from the testMode
"processing file $1 into outfile ${1%/*}/../overallenergy.out"
looks OK, then change to ${testMode:-false}.
If it doesn't look right, post the minimum error examples as a comment and I'll see if I can fix it.
If there are spaces in your path name, we'll have to circle back and add some more options to find and xargs.
IHTH.
Define a shell function that, for a given directory, finds all the underlying targets and for each target outputs, on stdout, a suitable command.
% gen_greps () {
find $1 -name target.out | while read fname ; do
printf "grep E-SUM-OVERALL $fname | "
printf "cut -c 17-24 > "
printf "$(dirname $fname)/overallenergy.out\n"
done
}
%
make a dry run
% gen_greps jan10
...
grep E-SUM-OVERALL jan10/29/29H/target.out | cut -c 17-24 > jan10/29/29H/overallenergy.out
...
%
if what we see is what we want, pass the commands to a shell for execution
% gen_greps jan10 | sh
%
That's all (?)
Don't use for in this way. In order for for to execute, it must first process the cat command, and if there are white spaces in the file name, the for will fail. Plus, it's very possible to overload your command line when executing the for.
Instead use a while read loop which is more efficient and more tolerant of file name issues:
while read dir
do
....
done < ~/jan10/list.txt
It is also very dangerous to use glob patterns in the cd command because more than one file could match that pattern, and that could cause cd to fail.
Also, if you find yourself piping to a series of grep, cut, sed commands, you can usually replace that with a single awk command.
If all of your files you need are called target.out and there are no other files called target.out that you want to skip, you can use find to find the various files without changing directories to each one:
Note how much shorter and simpler the entire program is:
while read dir
do
find "$dir" -name "target.out" -type f \
-exec awk '/E-SUM-OVERALL/ {print substr($0, 17, 8)}' {} \;
done < ~/jan10/list.txt > overallenergy.out
I don't have any data, so it's hard to actually test this. It may be possible to simply use a field in awk rather than substr, or my substr arguments could be off.
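If the goal is still one overallenergy.out per numbered directory, as in the question, the while read pattern can be combined with plain grep and cut. This is only a sketch under assumptions: collect_energy is a hypothetical helper, and the layout (numbered directories, each holding lettered sub-directories with a target.out) is taken from the question's description.

```shell
# Sketch only: collect_energy is a hypothetical name; the directory layout
# is assumed from the question (base/18/18A/target.out and so on).
# $1: base directory holding the numbered directories
# $2: file listing one numbered directory per line
collect_energy() {
    local base="$1" list="$2" dir sub
    while IFS= read -r dir; do
        # Skip list entries that have no matching directory.
        [ -d "$base/$dir" ] || continue
        # Start each per-directory result file empty.
        : > "$base/$dir/overallenergy.out"
        # The trailing slash makes the glob match sub-directories only.
        for sub in "$base/$dir"/*/; do
            [ -f "${sub}target.out" ] || continue
            grep 'E-SUM-OVERALL' "${sub}target.out" | cut -c 17-24 \
                >> "$base/$dir/overallenergy.out"
        done
    done < "$list"
}

# Possible usage: collect_energy ~/jan10 ~/jan10/list.txt
```

No cd is involved, so a missing directory only skips one entry instead of derailing every path that follows.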

bash continue execution on command failure

#! /bin/bash
while :
do
filenames=$(ls -rt *.log | tail -n 2)
echo $filenames
cat $filenames > jive_log.txt
sleep 0.1
done
I am trying to read latest 2 files from a directory and join them using bash.
However when no files are present in the current directory with an extension .log the command ls -rt *.log fails with error "ls: cannot access *.log: No such file or directory". After the error it looks like the while loop does not execute.
What do I do so that the infinite loop continues even if one command fails?
I'm not sure what you mean but perhaps:
for (( ;; )); do
while IFS= read -r FILE; do
cat "$FILE"
done < <(exec ls -rt1 *.log | tail -n 2) >> jive_log.txt
sleep 1
done
Note the ls option -1 which prints out files line by line.
Anyhow you can join last two files to jive_log.txt with:
while IFS= read -r FILE; do
cat "$FILE"
done < <(exec ls -rt1 *.log | tail -n 2) >> jive_log.txt
Another way is to save it to an array (e.g. with readarray) then pass the last 2 elements to cat.
readarray -t FILES < <(exec ls -rt1 *.log)
cat "${FILES[@]:(-2)}" > jive_log.txt ## Or perhaps you mean to append it? (>>)
If you want to sort the output of find, you have to add a sort key at the beginning, which can be removed later on.
find . -name \*.log -printf '%T+\t%p\n' |
sort -r |
head -2 |
cut -f 2-
Using head instead of tail is a bit cheaper.
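Returning to the original loop: a likely reason it appears to stall is that cat, when given no file names, reads standard input. The sketch below sidesteps this by doing nothing when there are no .log files. It assumes GNU find and xargs and file names without tabs or newlines; join_latest_logs is a hypothetical name.

```shell
# Sketch only: assumes GNU find/xargs; join_latest_logs is a hypothetical name.
# $1: directory to scan, $2: output file (should not end in .log)
join_latest_logs() {
    local dir="$1" out="$2" list
    # Timestamp first so sort can order by mtime; cut drops it again.
    list=$(find "$dir" -maxdepth 1 -type f -name '*.log' -printf '%T@\t%p\n' |
        sort -n | tail -n 2 | cut -f 2-)
    # No log files yet: succeed without touching the output file.
    [ -n "$list" ] || return 0
    # Convert newlines to NULs so names with spaces reach cat intact.
    printf '%s\n' "$list" | tr '\n' '\0' | xargs -0 cat -- > "$out"
}

# Possible usage inside the loop: join_latest_logs . jive_log.txt
```

Because the function returns 0 when nothing matches, the surrounding while : loop simply moves on to the sleep and tries again.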

Bash: reading files and moving into sub-directory

I have around 1000 files (png) and need to move them into the corresponding directory and their sub-directory.
I do have 26 directories (A - Z) and below each directory the complete alphabet A-Z again. File names are 6 characters/digits long and have a png extension, e.g. AH2BC0.png
I would need to move the file AH2BC0.png into the directory A and within that directory into the sub-directory H, e.g. A->H->AH2BC0.png.
I have created following script which is not really working as expected:
#!/bin/bash
ls >LISTE.txt
for i in LISTE.txt; do
a=$(cat $i | cut -b 1 | tr '[:lower:]' '[:upper:]')
b=$(cat $i | cut -b 2 | tr '[:lower:]' '[:upper:]')
mkdir -p $a/$b
cat $i | xargs mv $a/$b
rm $i
done
Problem is that a) the sub-directory is not created and b) the files are not moved. Any suggestions or better ideas for the script?
Thanks
PS: I guess it's obvious that it's quite some years ago that I have created any bash scripts or coded so please bear with me.
PSS: working on MAC OSX bash 3.2
There's already a post showing a better program to do what you want but I thought I'd show you how to fix yours. Hopefully you'll find it informative.
#!/bin/bash
ls >LISTE.txt
for i in LISTE.txt; do
This loops over the single value LISTE.txt; replace it with:
for i in $(cat LISTE.txt); do
to loop over the contents of the file instead.
a=$(cat $i | cut -b 1 | tr '[:lower:]' '[:upper:]')
b=$(cat $i | cut -b 2 | tr '[:lower:]' '[:upper:]')
You want to use echo rather than cat in the above two lines, as you're after the name of the file not its content.
mkdir -p $a/$b
cat $i | xargs mv $a/$b
I don't think the above line does what you think it does... It will attempt to rename the $a/$b directory to C, where C is the content of file $i. Replace it with:
mv $i $a/$b
The following line is not needed:
rm $i
So simply delete it. It would only be necessary if you had copied the files (with cp) rather than moved them with mv.
done
Here's your complete program after the changes I've suggested.
#!/bin/bash
ls >LISTE.txt
for i in $(cat LISTE.txt); do
a=$(echo $i | cut -b 1 | tr '[:lower:]' '[:upper:]')
b=$(echo $i | cut -b 2 | tr '[:lower:]' '[:upper:]')
mkdir -p $a/$b
mv $i $a/$b
done
#!/bin/bash
for item in *; do
# skip the existing A-Z directories; only regular files are moved
[ -f "$item" ] || continue
first=${item:0:1}
second=${item:1:1}
folder="$first/$second"
mkdir -p "$folder"
mv "$item" "$folder/"
done
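Since the question asks for upper-case directory names and bash 3.2 has no built-in case conversion, the two approaches can be combined: let the shell glob the files and use cut and tr only for the two leading characters. A minimal sketch, in which file_by_prefix is a hypothetical name and the directory argument is an assumption:

```shell
# Sketch only: file_by_prefix is a hypothetical helper name.
# $1: directory containing the .png files and the A-Z directory tree
file_by_prefix() {
    local dir="$1" f name a b
    for f in "$dir"/*.png; do
        # If nothing matches, the glob stays literal; skip it.
        [ -f "$f" ] || continue
        name=${f##*/}
        # First and second character of the file name, upper-cased.
        a=$(printf '%s' "$name" | cut -c 1 | tr '[:lower:]' '[:upper:]')
        b=$(printf '%s' "$name" | cut -c 2 | tr '[:lower:]' '[:upper:]')
        mkdir -p "$dir/$a/$b"
        mv -- "$f" "$dir/$a/$b/"
    done
}

# Possible usage: file_by_prefix .
```

Globbing on *.png means neither the directories nor a helper file such as LISTE.txt ever enters the loop.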

Argument list too long - Unix

This scripts will sort the files by date then move the first 2500 files to another directory.
When I run below scripts, system prompt out Argument list too long msg. Anyone can help me enhance the scripts ? Thanks
NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load
if [ ! -d $DESTINATION_DIRECTORY ]
then
echo "unused_file directory does not exist!"
mkdir $DESTINATION_DIRECTORY
echo "$DESTINATION_DIRECTORY directory created!"
else
echo "$DESTINATION_DIRECTORY exist!"
fi
echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -$NUM_OF_FILES |
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
You didn't say, but I assume this is where the problem occurs:
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
(You can verify it by adding "set -x" to the top of your script.)
The problem is that the kernel imposes a fixed maximum on the total length of the command line given to a new process, and you're exceeding it in the ls command. You can work around it by not using globbing and filtering with grep instead:
ls -tr $FROM_DIRECTORY/ | grep '^MSCERC.*\.Z$' |head -2500 | \
xargs -i sh -c "mv $FROM_DIRECTORY/{} $DESTINATION_DIRECTORY"
(grep uses regular expressions instead of globs, so the pattern looks a little bit different.)
Change
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
to something like the following:
find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%p\t%T@\n' | sort -k2,2n | cut -f1 | head -$NUM_OF_FILES | xargs mv -t "$DESTINATION_DIRECTORY"
This uses find to create a list of files with modification timestamps, sorts by the timestamp, then removes the unneeded field before passing the output to head and xargs
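The same idea can be written with NUL-delimited records so that unusual file names survive the pipeline. This is a sketch under assumptions rather than a drop-in replacement: it needs GNU find, sort, xargs and mv, plus a coreutils version recent enough for head -z and cut -z, and move_oldest is a hypothetical name.

```shell
# Sketch only: assumes GNU find/sort/head/cut/xargs/mv; move_oldest is hypothetical.
# $1: source directory, $2: destination directory, $3: number of files to move
move_oldest() {
    local from="$1" to="$2" count="$3"
    mkdir -p "$to"
    # Timestamp first, then the path, each record terminated by NUL.
    find "$from" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%T@\t%p\0' |
        sort -z -n |              # numeric sort: oldest first
        head -z -n "$count" |     # keep only the oldest $count records
        cut -z -f 2- |            # drop the timestamp column
        xargs -0 -r mv -t "$to" --
}

# Possible usage: move_oldest "$FROM_DIRECTORY" "$DESTINATION_DIRECTORY" "$NUM_OF_FILES"
```

The file names travel through pipes rather than on a single command line, and xargs splits them into as many mv invocations as the argument limit requires.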
EDIT
Another variant, which avoids GNU mv's -t option (note that find -printf is itself a GNU extension):
find "$FROM_DIRECTORY" -type f -name 'MSCERC*.Z' -printf '%p\t%T@\n' |sort -k 2,2n | cut -f1 | head -$NUM_OF_FILES | xargs -I {} mv {} "$DESTINATION_DIRECTORY"
First create a list of the files to be processed, then read that list line by line and move each file. For example:
#!/bin/bash
NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load
if [ ! -d $DESTINATION_DIRECTORY ]
then
echo "unused_file directory does not exist!"
mkdir $DESTINATION_DIRECTORY
echo "$DESTINATION_DIRECTORY directory created!"
else
echo "$DESTINATION_DIRECTORY exist!"
fi
echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -2500 > list
exec 3<list
while read file <&3
do
mv $file $DESTINATION_DIRECTORY
done
A quick way to fix this would be to change to $FROM_DIRECTORY, so that you can refer the files using (shorter) relative paths.
cd $FROM_DIRECTORY &&
ls -tr MSCERC*.Z|head -2500 |xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
This is also not entirely fool-proof, if you have too many files that match.

Delete all but the most recent X files in bash

Is there a simple way, in a pretty standard UNIX environment with bash, to run a command to delete all but the most recent X files from a directory?
To give a bit more of a concrete example, imagine some cron job writing out a file (say, a log file or a tar-ed up backup) to a directory every hour. I'd like a way to have another cron job running which would remove the oldest files in that directory until there are less than, say, 5.
And just to be clear, if there's only one file present, it should never be deleted.
The problems with the existing answers:
inability to handle filenames with embedded spaces or newlines.
in the case of solutions that invoke rm directly on an unquoted command substitution (rm `...`), there's an added risk of unintended globbing.
inability to distinguish between files and directories (i.e., if directories happened to be among the 5 most recently modified filesystem items, you'd effectively retain fewer than 5 files, and applying rm to directories will fail).
wnoise's answer addresses these issues, but the solution is GNU-specific (and quite complex).
Here's a pragmatic, POSIX-compliant solution that comes with only one caveat: it cannot handle filenames with embedded newlines - but I don't consider that a real-world concern for most people.
For the record, here's the explanation for why it's generally not a good idea to parse ls output: http://mywiki.wooledge.org/ParsingLs
ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {}
Note: This command operates in the current directory; to target a directory explicitly, use a subshell ((...)) with cd:
(cd /path/to && ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {})
The same applies analogously to the commands below.
The above is inefficient, because xargs has to invoke rm separately for each filename.
However, your platform's specific xargs implementation may allow you to solve this problem:
A solution that works with GNU xargs is to use -d '\n', which makes xargs consider each input line a separate argument, yet passes as many arguments as will fit on a command line at once:
ls -tp | grep -v '/$' | tail -n +6 | xargs -d '\n' -r rm --
Note: Option -r (--no-run-if-empty) ensures that rm is not invoked if there's no input.
A solution that works with both GNU xargs and BSD xargs (including on macOS) - though technically still not POSIX-compliant - is to use -0 to handle NUL-separated input, after first translating newlines to NUL (0x0) chars., which also passes (typically) all filenames at once:
ls -tp | grep -v '/$' | tail -n +6 | tr '\n' '\0' | xargs -0 rm --
Explanation:
ls -tp prints the names of filesystem items sorted by how recently they were modified, in descending order (most recently modified items first) (-t), with directories printed with a trailing / to mark them as such (-p).
Note: It is the fact that ls -tp always outputs file / directory names only, not full paths, that necessitates the subshell approach mentioned above for targeting a directory other than the current one ((cd /path/to && ls -tp ...)).
grep -v '/$' then weeds out directories from the resulting listing, by omitting (-v) lines that have a trailing / (/$).
Caveat: Since a symlink that points to a directory is technically not itself a directory, such symlinks will not be excluded.
tail -n +6 skips the first 5 entries in the listing, in effect returning all but the 5 most recently modified files, if any.
Note that in order to exclude N files, N+1 must be passed to tail -n +.
xargs -I {} rm -- {} (and its variations) then invokes rm on all these files; if there are no matches at all, xargs won't do anything.
xargs -I {} rm -- {} defines placeholder {} that represents each input line as a whole, so rm is then invoked once for each input line, but with filenames with embedded spaces handled correctly.
-- in all cases ensures that any filenames that happen to start with - aren't mistaken for options by rm.
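The pipeline explained above can be wrapped in a small function that takes the directory and the number of files to keep. This is a minimal sketch, not part of the original answer: keep_newest is a hypothetical name, and a while read loop stands in for xargs so that nothing runs when the list is empty. The caveat about embedded newlines still applies.

```shell
# Sketch only: keep_newest is a hypothetical helper name.
# $1: directory to clean, $2: number of most recent files to keep
keep_newest() {
    local dir="$1" keep="$2"
    (
        # Subshell: the caller's working directory is left untouched.
        cd -- "$dir" || exit 1
        # Newest first, directories marked with / and filtered out,
        # then everything from line keep+1 onward is deleted.
        ls -tp | grep -v '/$' | tail -n +"$((keep + 1))" |
            while IFS= read -r f; do
                rm -- "$f"
            done
    )
}

# Possible usage: keep_newest /path/to 5
```

Like the xargs -I {} form, this invokes rm once per file, so it trades some speed for portability.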
A variation on the original problem, in case the matching files need to be processed individually or collected in a shell array:
# One by one, in a shell loop (POSIX-compliant):
ls -tp | grep -v '/$' | tail -n +6 | while IFS= read -r f; do echo "$f"; done
# One by one, but using a Bash process substitution (<(...),
# so that the variables inside the `while` loop remain in scope:
while IFS= read -r f; do echo "$f"; done < <(ls -tp | grep -v '/$' | tail -n +6)
# Collecting the matches in a Bash *array*:
IFS=$'\n' read -d '' -ra files < <(ls -tp | grep -v '/$' | tail -n +6)
printf '%s\n' "${files[@]}" # print array elements
Remove all but 5 (or whatever number) of the most recent files in a directory.
rm `ls -t | awk 'NR>5'`
(ls -t|head -n 5;ls)|sort|uniq -u|xargs rm
This version supports names with spaces:
(ls -t|head -n 5;ls)|sort|uniq -u|sed -e 's,.*,"&",g'|xargs rm
Simpler variant of thelsdj's answer:
ls -tr | head -n -5 | xargs --no-run-if-empty rm
ls -tr displays all the files, oldest first (-t newest first, -r reverse).
head -n -5 displays all but the 5 last lines (ie the 5 newest files).
xargs rm calls rm for each selected file.
find . -maxdepth 1 -type f -printf '%T@ %p\0' | sort -r -z -n | awk 'BEGIN { RS="\0"; ORS="\0"; FS="" } NR > 5 { sub("^[0-9]*(.[0-9]*)? ", ""); print }' | xargs -0 rm -f
Requires GNU find for -printf, and GNU sort for -z, and GNU awk for "\0", and GNU xargs for -0, but handles files with embedded newlines or spaces.
All these answers fail when there are directories in the current directory. Here's something that works:
find . -maxdepth 1 -type f | xargs -x ls -t | awk 'NR>5' | xargs -L1 rm
This:
works when there are directories in the current directory
tries to remove each file even if the previous one couldn't be removed (due to permissions, etc.)
fails safe when the number of files in the current directory is excessive and xargs would normally screw you over (the -x)
doesn't cater for spaces in filenames (perhaps you're using the wrong OS?)
ls -tQ | tail -n+4 | xargs rm
List filenames by modification time, quoting each filename. Exclude first 3 (3 most recent). Remove remaining.
EDIT after helpful comment from mklement0 (thanks!): corrected -n+3 argument, and note this will not work as expected if filenames contain newlines and/or the directory contains subdirectories.
Ignoring newlines is ignoring security and good coding. wnoise had the only good answer. Here is a variation on his that puts the filenames in an array $x
while IFS= read -rd ''; do
x+=("${REPLY#* }");
done < <(find . -maxdepth 1 -printf '%T@ %p\0' | sort -r -z -n )
For Linux (GNU tools), an efficient & robust way to keep the n newest files in the current directory while removing the rest:
n=5
find . -maxdepth 1 -type f -printf '%T@ %p\0' |
sort -z -nrt ' ' -k1,1 |
sed -z -e "1,${n}d" -e 's/[^ ]* //' |
xargs -0r rm -f
For BSD, find doesn't have the -printf predicate, stat can't output NUL bytes, and sed and awk can't handle NUL-delimited records.
Here's a solution that doesn't support newlines in paths but that safeguards against them by filtering them out:
#!/bin/bash
n=5
find . -maxdepth 1 -type f ! -path $'*\n*' -exec stat -f '%.9Fm %N' {} + |
sort -nrt ' ' -k1,1 |
awk -v n="$n" -F'^[^ ]* ' 'NR > n {printf "%s%c", $2, 0}' |
xargs -0 rm -f
note: I'm using bash because of the $'\n' notation. For sh you can define a variable containing a literal newline and use it instead.
Solution for UNIX & Linux (inspired from AIX/HP-UX/SunOS/BSD/Linux ls -b):
Some platforms don't provide find -printf, nor stat, nor support NUL-delimited records with stat/sort/awk/sed/xargs. That's why using perl is probably the most portable way to tackle the problem, because it is available by default in almost every OS.
I could have written the whole thing in perl but I didn't. I only use it for substituting stat and for encoding-decoding-escaping the filenames. The core logic is the same as the previous solutions and is implemented with POSIX tools.
note: perl's default stat has a resolution of a second, but starting from perl-5.8.9 you can get sub-second resolution with the stat function of the module Time::HiRes (when both the OS and the filesystem support it). That's what I'm using here; if your perl doesn't provide it then you can remove the ‑MTime::HiRes=stat from the command line.
n=5
find . '(' -name '.' -o -prune ')' -type f -exec \
perl -MTime::HiRes=stat -le '
foreach (@ARGV) {
@st = stat($_);
if ( @st > 0 ) {
s/([\\\n])/sprintf( "\\%03o", ord($1) )/ge;
print sprintf( "%.9f %s", $st[9], $_ );
}
else { print STDERR "stat: $_: $!"; }
}
' {} + |
sort -nrt ' ' -k1,1 |
sed -e "1,${n}d" -e 's/[^ ]* //' |
perl -l -ne '
s/\\([0-7]{3})/chr(oct($1))/ge;
s/(["\n])/"\\$1"/g;
print "\"$_\"";
' |
xargs -E '' sh -c '[ "$#" -gt 0 ] && rm -f "$@"' sh
Explanations:
For each file found, the first perl gets the modification time and outputs it along the encoded filename (each newline and backslash characters are replaced with the literals \012 and \134 respectively).
Now each "time filename" record is guaranteed to be on a single line, so POSIX sort and sed can safely work with this stream.
The second perl decodes the filenames and escapes them for POSIX xargs.
Lastly, xargs calls rm for deleting the files. The sh command is a trick that prevents xargs from running rm when there's no files to delete.
I realize this is an old thread, but maybe someone will benefit from this. This command will find files in the current directory :
for F in $(find . -maxdepth 1 -type f -name "*_srv_logs_*.tar.gz" -printf '%T@ %p\n' | sort -r -n | tail -n+5 | awk '{ print $2; }'); do rm $F; done
This is a little more robust than some of the previous answers as it lets you limit your search domain to files matching expressions. First, find files matching whatever conditions you want. Print those files with the timestamps next to them.
find . -maxdepth 1 -type f -name "*_srv_logs_*.tar.gz" -printf '%T@ %p\n'
Next, sort them by the timestamps:
sort -r -n
Then, knock off the 4 most recent files from the list:
tail -n+5
Grab the 2nd column (the filename, not the timestamp):
awk '{ print $2; }'
And then wrap that whole thing up into a for statement:
for F in $(); do rm $F; done
This may be a more verbose command, but I had much better luck being able to target conditional files and execute more complex commands against them.
If the filenames don't have spaces, this will work:
ls -C1 -t| awk 'NR>5'|xargs rm
If the filenames do have spaces, something like
ls -C1 -t | awk 'NR>5' | sed -e "s/^/rm '/" -e "s/$/'/" | sh
Basic logic:
get a listing of the files in time order, one column
get all but the first 5 (n=5 for this example)
first version: send those to rm
second version: gen a script that will remove them properly
With zsh
Assuming you don't care about present directories and you will not have more than 999 files (choose a bigger number if you want, or create a while loop).
[ 6 -le `ls *(.)|wc -l` ] && rm *(.om[6,999])
In *(.om[6,999]), the . means files, the o means sort order up, the m means by date of modification (put a for access time or c for inode change), the [6,999] chooses a range of file, so doesn't rm the 5 first.
Adaptation of @mklement0's excellent answer with some parameters and without needing to navigate to the folder containing the files to be deleted...
TARGET_FOLDER="/my/folder/path"
FILES_KEEP=5
ls -tp "$TARGET_FOLDER"**/* | grep -v '/$' | tail -n +$((FILES_KEEP+1)) | xargs -d '\n' -r rm --
[Ref(s).: https://stackoverflow.com/a/3572628/3223785 ]
Thanks! 😉
I found an interesting command among the sed one-liners (delete the last N lines of a file) and thought it would be another way to skin this cat:
#!/bin/bash
# In the sed command, change the 2 to the number of files you wish to retain
cd /opt/depot
ls -1 MyMintFiles*.zip > BigList
sed -n -e :a -e '1,2!{P;N;D;};N;ba' BigList > DeList
for i in `cat DeList`
do
echo "Deleted $i"
rm -f $i
#echo "File(s) gonzo "
#read junk
done
exit 0
Removes all but the 10 most recent files:
ls -tr1 | head -n $(echo $(ls -1 | wc -l) - 10 | bc) | xargs rm
If less than 10 files no file is removed and you will have :
error head: illegal line count -- 0
To count files with bash
I needed an elegant solution for busybox (on a router); all the xargs or array solutions were useless to me because those commands are not available there. find and mtime is not the proper answer, as we are talking about 10 items and not necessarily 10 days. Espo's answer was the shortest and cleanest and likely the most universal one.
Error with spaces and when no files are to be deleted are both simply solved the standard way:
rm "$(ls -td *.tar | awk 'NR>7')" 2>&-
Bit more educational version: We can do it all if we use awk differently. Normally, I use this method to pass (return) variables from the awk to the sh. As we read all the time that can not be done, I beg to differ: here is the method.
Example for .tar files with no problem regarding the spaces in the filename. To test, replace "rm" with the "ls".
eval $(ls -td *.tar | awk 'NR>7 { print "rm \"" $0 "\""}')
Explanation:
ls -td *.tar lists all .tar files sorted by the time. To apply to all the files in the current folder, remove the "d *.tar" part
awk 'NR>7... skips the first 7 lines
print "rm \"" $0 "\"" constructs a line: rm "file name"
eval executes it
Since we are using rm, I would not use the above command in a script! Wiser usage is:
(cd /FolderToDeleteWithin && eval $(ls -td *.tar | awk 'NR>7 { print "rm \"" $0 "\""}'))
With ls -t, the command does no harm even on such silly examples as touch 'foo " bar' and touch 'hello * world'. Not that we ever create files with such names in real life!
Sidenote. If we wanted to pass a variable to the sh this way, we would simply modify the print (simple form, no spaces tolerated):
print "VarName="$1
to set the variable VarName to the value of $1. Multiple variables can be created in one go. This VarName becomes a normal sh variable and can be normally used in a script or shell afterwards. So, to create variables with awk and give them back to the shell:
eval $(ls -td *.tar | awk 'NR>7 { print "VarName=\""$1"\"" }'); echo "$VarName"
leaveCount=5
fileCount=$(ls -1 *.log | wc -l)
tailCount=$((fileCount - leaveCount))
# avoid negative tail argument
(( tailCount < 0 )) && tailCount=0
ls -t *.log | tail -$tailCount | xargs rm -f
I made this into a bash shell script. Usage: keep NUM DIR where NUM is the number of files to keep and DIR is the directory to scrub.
#!/bin/bash
# Keep last N files by date.
# Usage: keep NUMBER DIRECTORY
echo ""
if [ $# -lt 2 ]; then
echo "Usage: $0 NUMFILES DIR"
echo "Keep last N newest files."
exit 1
fi
if [ ! -e "$2" ]; then
echo "ERROR: directory '$2' does not exist"
exit 1
fi
if [ ! -d "$2" ]; then
echo "ERROR: '$2' is not a directory"
exit 1
fi
pushd "$2" > /dev/null
# tail -n +K starts at line K, so start at N+1 to skip the N newest files
ls -tp | grep -v '/' | tail -n +"$(($1 + 1))" | xargs -I {} rm -- {}
popd > /dev/null
echo "Done. Kept $1 most recent files in $2."
ls $2|wc -l
Modified version of the answer of @Fabien if you want to specify a path. Useful if you're running the script elsewhere.
ls -tr /path/foo/ | head -n -5 | xargs -I% --no-run-if-empty rm /path/foo/%

Resources