Combining Bash command with AWS CLI copy command - bash

I need to copy some files from a Linux machine to an S3 bucket, but only selected files. I am able to get the files using the Bash command below:
ls -1t /var/lib/pgsql/backups/full/backup_daily/test* | tail -n +8
Now I want to combine this Bash command with the AWS S3 cp command. I searched and found the solution below, but it's not working.
ls -1t /var/lib/pgsql/backups/full/backup_daily/test* | tail -n +8 | aws s3 cp - s3://all-postgresql-backup/dev/
How can I make this work?

If you're on a platform with GNU tools (find, sort, tail, sed), and you want to insert all the names in the position where you have the -, doing this reliably (in a manner robust against unexpected filenames) might look like:
find /var/lib/pgsql/backups/full/backup_daily -name 'test*' -type f -printf '%T@ %p\0' |
sort -znr |
tail -z -n +8 |
sed -zEe 's/[^ ]+ //' |
xargs -0 sh -c 'aws s3 cp "$@" s3://all-postgresql-backup/dev/' _
There's a lot there, so let's take it piece-by-piece:
ls does not generate output safe for programmatic use. Thus, we use find instead, with a -printf string that puts a timestamp (in UNIX epoch time, seconds since 1970) before each filename, and terminates each entry with a NUL (a character which, unlike a newline, cannot exist in filenames on UNIX).
sort -z is a GNU extension which delimits input and output by NULs; -n specifies numeric sort (since the timestamps are numeric); -r reverses sort order.
sed -z is a GNU extension which, again, delimits records by NULs rather than newlines; here, we're stripping the timestamp off the records after sorting them.
xargs -0 ... tells xargs to read NUL-delimited records from stdin, and append them to the argument list of ..., splitting into multiple invocations whenever this would go over maximum command-line length.
sh -c '..."$#"...' _ runs a shell -- sh -- with a command that includes "$#", which expands to the list of arguments that shell was passed. _ is a placeholder for $0. xargs will place the names produced by the preceding pipeline after the _, becoming $1, $2, etc, such that they're placed on the aws command line in place of the "$#".
References:
BashFAQ #3 - How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc)?
ParsingLs - Why you shouldn't parse the output of ls
UsingFind - See the "Actions In Bulk" section for discussion of safety precautions necessary to use xargs without introducing bugs (which the above code follows, but other suggestions may not).

You might also want to take a look at aws s3 sync and aws s3 cp with the --exclude option.
aws s3 sync . s3://mybucket --exclude "*.jpg"
You could have a simple cron job that runs in the background every few minutes and keeps the directories in sync.
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
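For instance, a hypothetical cron entry built from the paths and bucket in the question (the schedule is arbitrary, and under cron you may need the full path to the aws binary) might look like:
*/15 * * * * aws s3 sync /var/lib/pgsql/backups/full/backup_daily/ s3://all-postgresql-backup/dev/ --exclude "*" --include "test*"
The --exclude "*" --include "test*" pair limits the sync to files matching test*.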

Related

How do I loop through a certain number of folders

I am trying to loop through a folder that has 744 subdirectories. How do I loop through only a certain number of folders? Since I have 744 subdirectories, I would split this in half and loop through the first 372 directories, and then later loop through the remaining 372. I want to make sure I don't copy directories multiple times. Below is what I tried, but I want to know what would be an effective way of doing this to avoid duplication.
for d in `ls -tr|tail -372`
do
echo $d
done
Since my xargs answer didn't receive any feedback, here's another approach.
printf "%s\n" */ |
awk 'BEGIN { n=1; OFS="\t"
split("first:second:third", destination, /:/) }
  NR > 1 && (NR-1) % 372 == 0 { ++n }
{ print destination[n], $0 }'
This will add a field in front of each directory name, which you can use to process the results further. Sample output:
first directory1/
first directory2/
first directory3/
:
first directory372/
second directory373/
second directory374/
:
second directory743/
second directory744/
So the field value third from the Awk script is never used, but I put it in anyway to demonstrate that this could easily be extended to do three-way partitions, or four-way or what have you.
You would use this e.g. by piping to
while IFS=$'\t' read -r dest dir; do
    echo mv "$dir" "$dest"
done
Unlike the xargs -0 answer, this is not robust against arbitrary file names; in particular, directory names which contain newlines will not work correctly.
Actually, a much better solution would be to split the files the other way -- i.e. for a two-way partition, alternate between first and second on successive lines. Then you don't have to hard-code the number of items, just the number of partitions.
printf "%s\n" */ |
awk 'BEGIN { OFS="\t"
n = split("ernie:bert", host, /:/) }
{ print host[1+((NR-1)% n)], $0 }' |
while IFS=$'\t' read -r server dir; do
mkdir -p "$server"
mv "$dir" "$server/"
done
Regardless of the number of directories, this splits them evenly into the directories ernie and bert, on the optimistic assumption that you (too) might have named your file servers after Sesame Street characters.
If you want to scp the directories instead of mv them, grouping them by server name would be a lot more efficient; but a simple sort takes care of that if necessary. (That's not the only reason we print the destination before each file name; it's also useful because then we don't have to worry that the directory names could contain our field separator.)
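A sketch of that grouped variant (the scp destination path is made up; GNU sort's -s keeps each server's entries together without reordering within a group):
printf "%s\n" */ |
awk 'BEGIN { OFS="\t"
    n = split("ernie:bert", host, /:/) }
  { print host[1+((NR-1)% n)], $0 }' |
sort -s -t $'\t' -k1,1 |
while IFS=$'\t' read -r server dir; do
    echo scp -r "$dir" "$server:/srv/incoming/"   # drop the echo once it looks right
done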
You can use xargs but this requires the 372 directory names to fit in one invocation (i.e. the directory names combined must not exceed ARG_MAX).
printf '%s\0' */ |
xargs -n 372 -r -0 sh -c '
d=dest$$; mkdir "$d"; cp -r "$@" "$d"' _
This will generate a unique new directory with the prefix dest and a number for each batch of directories it copies. There are probably better ways to split the files (and calling sh from xargs is not exactly a newbie-friendly answer) but maybe this should at least give you some ideas.
In some more detail, xargs -n 372 limits the number of arguments that get processed in one go, and the command you pass to xargs could be something a lot simpler; xargs -n 372 cp -rt fnord would copy the first 372 directories to fnord, then another 372; but in order for this to be actually useful, we want the destination directory to change each time we call xargs, and so I put in a simple script which does that.
You also need to understand that 372 is a maximum, and if the directory names are really long, xargs could decide that it needs to pass fewer directories in order to not cross the "argument list too long" limit. But for your use case, on any remotely modern system, we are probably far below that limit anyway.
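For reference, that simpler fixed-destination form might look like this (a sketch; fnord is just a placeholder, kept outside the current directory so the */ glob doesn't pick it up, and -r is added because the arguments are directories):
mkdir -p ../fnord
printf '%s\0' */ | xargs -r -0 -n 372 cp -rt ../fnord/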
xargs -0 and cp -t are GNU extensions, i.e. they should work on Linux out of the box, and you can install them on most other platforms; if you really need to support something like Solaris without installing external tools, that's going to be slightly more challenging.
Addendum: Here's an xargs implementation of the ernie & bert part of the Awk answer:
printf "%s\0" */ |
xargs -r -0 -n 2 sh -c '
mkdir -p ernie bert
mv "$1" ernie
mv "$2" bert' _
There will be an ugly but harmless error message for the last item if you have an uneven number of input directories. There are obvious but inelegant ways to fix that, or elegant but obscure ones; but I prefer to keep this plain for now.
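One of those obvious-but-inelegant fixes is simply guarding the second mv (a sketch):
printf "%s\0" */ |
xargs -r -0 -n 2 sh -c '
    mkdir -p ernie bert
    mv "$1" ernie
    if [ "$#" -ge 2 ]; then mv "$2" bert; fi' _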

Bash: Merging multiple files into a single file after reading the list of files from another external file

I have a file with the name filesList.txt which contains a list of all the files that need to be merged into a single file.
filesList.txt
------------------
../../folder/a.js
../../folder/b.js
../../folder/c.js
../../folder/d.js
Currently I am running the following commands:
cp filesList.txt filesList.sh
chmod 777 filesList.sh
vim filesList.sh
cat
../../folder/a.js
../../folder/b.js
../../folder/c.js
../../folder/d.js
> output.txt
Run a vim join command to turn the multiline file above into a single line like this:
cat ../../folder/a.js ../../folder/b.js ../../folder/c.js ../../folder/d.js > output.txt
Save and quit the file within vim using :wq,
and run ./filesList.sh to create a single output.txt file with the contents in the exact order the files are listed in.
My question is: what command do I need to use in a bash script that reads the external list of files (filesList.txt) line by line and generates a single file from their contents, so that I don't have to convert my filesList.txt into a filesList.sh each time I need to merge files?
A line-oriented file is a bad choice here (in an "any attacker who can control filenames can inject arbitrary files into your output" sense of bad; you probably don't want to risk that someone who figures out how to create new .js files matching your glob can then introduce /etc/passwd to the list by creating ../../$'\n'/etc/passwd$'\n'/hello.js). Instead, separate values by NULs, and use xargs -0 (a non-POSIX extension, but a popular one provided by major OS vendors) to convert those into arguments.
printf '%s\0' ../../folder/*.js >filesList.nsv # generate file w/ null-separated values
xargs -0 cat <filesList.nsv >output.txt # combine to argument list split on NUL
By the way, if you want to generate your list of files recursively, that first part would become:
find ../../folder -name '*.js' -print0 >filesList.nsv
...and if you don't have any other need for filesList.nsv, I'd just avoid it entirely and generate output.txt directly:
find ../../folder -name '*.js' -exec cat '{}' + >output.txt
If you must use newlines, but you have GNU xargs, at least use xargs -d $'\n' to process the list; this avoids the quoting-related bugs you get from stock xargs or from more naive practices in bash:
printf '%s\n' ../../folder/*.js >filesList.txt # generate w/ newline-separated values
xargs -d $'\n' cat <filesList.txt >output.txt # combine on those values
If you don't have GNU xargs, then you can implement this yourself in shell:
# Newline-separated input
while IFS= read -r filename; do
cat "$filename"
done <filesList.txt >output.txt
# ...or NUL-separated input
while IFS= read -r -d '' filename; do
cat "$filename"
done <filesList.nsv >output.txt

Iterate through list of filenames in order they were created in bash

Parsing the output of ls to iterate through a list of files is bad. So how should I go about iterating through a list of files in the order in which they were created? I browsed several questions here on SO and they all seem to be parsing ls.
The embedded link suggests:
Things get more difficult if you wanted some specific sorting that
only ls can do, such as ordering by mtime. If you want the oldest or
newest file in a directory, don't use ls -t | head -1 -- read Bash FAQ
99 instead. If you truly need a list of all the files in a directory
in order by mtime so that you can process them in sequence, switch to
perl, and have your perl program do its own directory opening and
sorting. Then do the processing in the perl program, or -- worst case
scenario -- have the perl program spit out the filenames with NUL
delimiters.
Even better, put the modification time in the filename, in YYYYMMDD
format, so that glob order is also mtime order. Then you don't need ls
or perl or anything. (The vast majority of cases where people want the
oldest or newest file in a directory can be solved just by doing
this.)
Does that mean there is no native way of doing it in bash? I don't have the liberty to modify the filenames to include the time in them. I need to schedule a script in cron that runs every 5 minutes, generates an array containing all the files in a particular directory ordered by their creation time, performs some actions on the filenames, and moves them to another location.
The following worked, but only because I don't have funny filenames. The files are created by a server so they will never have special characters, spaces, newlines, etc.
files=( $(ls -1tr) )
I can write a perl script that would do what I need but I would appreciate if someone can suggest the right way to do it in bash. Portable option would be great but solution using latest GNU utilities will not be a problem either.
sorthelper=();
for file in *; do
# We need something that can easily be sorted.
# Here, we use "<date><filename>".
# Note that this works with any special characters in filenames
sorthelper+=("$(stat -n -f "%Sm%N" -t "%Y%m%d%H%M%S" -- "$file")"); # Mac OS X only
# or
sorthelper+=("$(stat --printf "%Y %n" -- "$file")"); # Linux only
done;
sorted=();
while IFS= read -r -d $'\0' elem; do
# this strips away the first 14 characters (<date>)
sorted+=("${elem:14}");
done < <(printf '%s\0' "${sorthelper[@]}" | sort -z)
for file in "${sorted[#]}"; do
# do your stuff...
echo "$file";
done;
Other than sort and stat, all commands are actual native Bash commands (builtins)*. If you really want, you can implement your own sort using Bash builtins only, but I see no way of getting rid of stat.
The important parts are read -d $'\0', printf '%s\0' and sort -z. All these commands are used with their null-delimiter options, which means that any filename can be processed safely. Also, the use of double-quotes around "$file" and around the array expansions is essential.
*Many people feel that the GNU tools are somehow part of Bash, but technically they're not. So, stat and sort are just as non-native as perl.
With all of the cautions and warnings against using ls to parse a directory notwithstanding, we have all found ourselves in this situation. If you do find yourself needing sorted directory input, then about the cleanest use of ls to feed your loop is ls -opts | while read -r name; do ... This will handle spaces in filenames, etc. without requiring a reset of IFS due to the nature of read itself. Example:
ls -1rt | while read -r fname; do # where '1' is ONE not little 'L'
So do look for cleaner solutions avoiding ls, but if push comes to shove, ls -opts can be used sparingly without the sky falling or dragons plucking your eyes out.
let me add the disclaimer to keep everyone happy. If you like newlines inside your filenames -- then do not use ls to populate a loop. If you do not have newlines inside your filenames, there are no other adverse side-effects.
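A complete, minimal version of that loop (oldest first by modification time; the printf is just a placeholder for your own processing, and the newline caveat above still applies):
ls -1rt | while read -r fname; do
    printf 'processing: %s\n' "$fname"
done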
Contra: TLDP Bash Howto Intro:
#!/bin/bash
for i in $( ls ); do
echo item: $i
done
It appears that SO users do not know what the use of contra means -- please look it up before downvoting.
You can try using the stat command piped to sort:
stat -c '%Y %n' * | sort -t ' ' -nk1 | cut -d ' ' -f2-
Update: To deal with filenames containing newlines we can use the %N format in stat, and instead of cut we can use awk, like this:
LANG=C stat -c '%Y^A%N' *| sort -t '^A' -nk1| awk -F '^A' '{print substr($2,2,length($2)-2)}'
Use of LANG=C is needed to make sure stat uses single quotes only in quoting file names.
^A is the Control-A character, typed by pressing Ctrl-V then Ctrl-A.
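If you'd rather not embed a literal control character, a sketch of the same pipeline using bash's $'\x01' quoting (still assuming GNU stat, sort and awk):
sep=$'\x01'
LANG=C stat -c "%Y${sep}%N" * | sort -t "$sep" -nk1 | awk -F "$sep" '{print substr($2,2,length($2)-2)}'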
How about a solution with GNU find + sed + sort?
As long as there are no newlines in the file name, this should work:
find . -type f -printf '%T@ %p\n' | sort -k 1nr | sed 's/^[^ ]* //'
It may be a little more work to ensure it is installed (it may already be, though), but using zsh instead of bash for this script makes a lot of sense. The filename globbing capabilities are much richer, while still using a sh-like language.
files=( *(oc) )
will create an array whose entries are all the file names in the current directory, but sorted by change time. (Use a capital O instead to reverse the sort order). This will include directories, but you can limit the match to regular files (similar to the -type f predicate to find):
files=( *(.oc) )
find is needed far less often in zsh scripts, because most of its uses are covered by the various glob flags and qualifiers available.
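For example (a sketch), iterating over that array in a zsh script:
files=( *(.oc) )
for f in "${files[@]}"; do
    print -r -- "$f"    # replace with your actual processing
done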
I've just found a way to do it with bash and ls (GNU).
Suppose you want to iterate through the filenames sorted by modification time (-t):
while read -r fname; do
fname=${fname:1:((${#fname}-2))} # remove the leading and trailing "
fname=${fname//\\\"/\"} # removed the \ before any embedded "
fname=$(echo -e "$fname") # interpret the escaped characters
file "$fname" # replace (YOU) `file` with anything
done < <(ls -At --quoting-style=c)
Explanation
Given some filenames with special characters, this is the ls output:
$ ls -A
filename with spaces .hidden_filename filename?with_a_tab filename?with_a_newline filename_"with_double_quotes"
$ ls -At --quoting-style=c
".hidden_filename" " filename with spaces " "filename_\"with_double_quotes\"" "filename\nwith_a_newline" "filename\twith_a_tab"
So you have to process each filename a little to get the actual name. Recalling:
${fname:1:((${#fname}-2))} # remove the leading and trailing "
# ".hidden_filename" -> .hidden_filename
${fname//\\\"/\"} # removed the \ before any embedded "
# filename_\"with_double_quotes\" -> filename_"with_double_quotes"
$(echo -e "$fname") # interpret the escaped characters
# filename\twith_a_tab -> filename with_a_tab
Example
$ ./script.sh
.hidden_filename: empty
filename with spaces : empty
filename_"with_double_quotes": empty
filename
with_a_newline: empty
filename with_a_tab: empty
As seen, file (or the command you want) interprets well each filename.
Each file has three timestamps:
Access time: the file was opened and read. Also known as atime.
Modification time: the file was written to. Also known as mtime.
Inode modification time: the file's status was changed, such as the file had a new hard link created, or an existing one removed; or if the file's permissions were chmod-ed, or a few other things. Also known as ctime.
Neither of these represents the time the file was created; that information is not saved anywhere. At file creation time, all three timestamps are initialized, and then each one gets updated appropriately: when the file is read, when it is written to, when its permissions are chmoded, or when a hard link is created or destroyed.
So, you can't really list the files according to their file creation time, because the file creation time isn't saved anywhere. The closest match would be the inode modification time.
See the descriptions of the -t, -u, -c, and -r options in the ls(1) man page for more information on how to list files in atime, mtime, or ctime order.
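For example, GNU stat will show all three timestamps of a file (somefile is a placeholder; note there is no creation time among them):
stat --printf 'atime: %x\nmtime: %y\nctime: %z\n' somefile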
Here's a way using stat with an associative array.
n=0
declare -A arr
for file in *; do
# modified=$(stat -f "%m" "$file") # For use with BSD/OS X
modified=$(stat -c "%Y" "$file") # For use with GNU/Linux
# Ensure stat timestamp is unique
if [[ $modified == *"${!arr[@]}"* ]]; then
modified=${modified}.$n
((n++))
fi
arr[$modified]="$file"
done
files=()
for index in $(IFS=$'\n'; echo "${!arr[*]}" | sort -n); do
files+=("${arr[$index]}")
done
Since sort sorts lines, $(IFS=$'\n'; echo "${!arr[*]}" | sort -n) ensures the indices of the associative array get sorted by setting the field separator in the subshell to a newline.
The quoting at arr[$modified]="$file" and files+=("${arr[$index]}") ensures that file names with caveats like a newline are preserved.

Commandline find, sed, exec

I have a bunch of files in a folder and its subfolders, and I'm trying to make some kind of one-liner for quick copy/pasting once in a while.
The contents are too long to paste here: http://pastebin.com/4aZCPbwT
I've tried the following commands:
List all files and their directories
find . -name '[!.]*'
Replace all instances of "Namespace" with "Test":
find . -name '[!.]*' -print0 | sed 's/Namespace/Test/gI' | xargs -i -0 echo '{}'
What I need to do is:
Replace folder names as above, and copy the folders (including files) to another location. Create the folders if they don't exist (they most likely won't) - BUT, there are some of them that I don't need, like ./app, as this folder exists. I could use -wholename './app' for that.
When they are copied, I need to replace some text inside each file, same as above (Namespace with Test - it also occurs inside the files), and save them, of course.
Something like this I would imagine:
-print -exec sed -i 's/Namespace/Test/gI' {} \;
Can these 3 things be done in a one-liner? Replace text in files (Namespace <=> Test), copy files including their directories with cp -p (I don't want to overwrite existing folders), but renaming each directory/file as above (Namespace <=> Test).
Thanks a lot :-)
Besides describing the how with painstaking verbosity below, this method may also be unique in that it incorporates built-in debugging. It basically doesn't do anything at all as written except compile and save to a variable all commands it believes it should do in order to perform the work requested.
It also explicitly avoids loops as much as possible. Besides the sed recursive search for more than one match of the pattern there is no other recursion as far as I know.
And last, this is entirely null delimited - it doesn't trip on any character in any filename except the null, and you shouldn't have that in a filename anyway.
By the way, this is REALLY fast. Look:
% _mvnfind() { mv -n "${1}" "${2}" && cd "${2}"
> read -r SED <<SED
> :;s|${3}\(.*/[^/]*${5}\)|${4}\1|;t;:;s|\(${5}.*\)${3}|\1${4}|;t;s|^[0-9]*\(.*\)${5}|\1|p
> SED
> find . -name "*${3}*" -printf "%d\tmv %P ${5} %P\000" |
> sort -zg | sed -nz ${SED} | read -r ${6}
> cat <<EOF
> Prepared commands saved in variable: ${6}
> To view do: printf ${6} | tr "\000" "\n"
> To run do: sh <<EORUN
> $(printf ${6} | tr "\000" "\n")
> EORUN
> EOF
> }
% rm -rf "${UNNECESSARY:=/any/dirs/you/dont/want/moved}"
% time ( _mvnfind ${SRC=./test_tree} ${TGT=./mv_tree} \
> ${OLD=google} ${NEW=replacement_word} ${sed_sep=SsEeDd} \
> ${sh_io:=sh_io} ; printf %b\\000 "${sh_io}" | tr "\000" "\n" \
> | wc - ; echo ${sh_io} | tr "\000" "\n" | tail -n 2 )
<actual process time used:>
0.06s user 0.03s system 106% cpu 0.090 total
<output from wc:>
Lines Words Bytes
115 362 20691 -
<output from tail:>
mv .config/replacement_word-chrome-beta/Default/.../googlestars \
.config/replacement_word-chrome-beta/Default/.../replacement_wordstars
NOTE: The above function will likely require GNU versions of sed and find to properly handle the find printf and sed -z -e and :;recursive regex test;t calls. If these are not available to you the functionality can likely be duplicated with a few minor adjustments.
This should do everything you wanted from start to finish with very little fuss. I did fork with sed, but I was also practicing some sed recursive branching techniques so that's why I'm here. It's kind of like getting a discount haircut at a barber school, I guess. Here's the workflow:
rm -rf ${UNNECESSARY}
I intentionally left out any functional call that might delete or destroy data of any kind. You mention that ./app might be unwanted. Delete it or move it elsewhere beforehand, or, alternatively, you could build in a \( -path PATTERN -exec rm -rf \{\} \) routine to find to do it programmatically, but that one's all yours.
_mvnfind "${#}"
Declare its arguments and call the worker function. ${sh_io} is especially important in that it saves the return from the function. ${sed_sep} comes in a close second; this is an arbitrary string used to reference sed's recursion in the function. If ${sed_sep} is set to a value that could potentially be found in any of your path- or file-names acted upon... well, just don't let it be.
mv -n $1 $2
The whole tree is moved from the beginning. It will save a lot of headache; believe me. The rest of what you want to do - the renaming - is simply a matter of filesystem metadata. If you were, for instance, moving this from one drive to another, or across filesystem boundaries of any kind, you're better off doing so at once with one command. It's also safer. Note the -noclobber option set for mv; as written, this function will not put ${SRC_DIR} where a ${TGT_DIR} already exists.
read -r SED <<HEREDOC
I located all of sed's commands here to save on escaping hassles and read them into a variable to feed to sed below. Explanation below.
find . -name ${OLD} -printf
We begin the find process. With find we search only for anything that needs renaming because we already did all of the place-to-place mv operations with the function's first command. Rather than take any direct action with find, like an exec call, for instance, we instead use it to build out the command-line dynamically with -printf.
%dir-depth :tab: 'mv '%path-to-${SRC}' '${sed_sep}'%path-again :null delimiter:'
After find locates the files we need it directly builds and prints out (most) of the command we'll need to process your renaming. The %dir-depth tacked onto the beginning of each line will help to ensure we're not trying to rename a file or directory in the tree with a parent object that has yet to be renamed. find uses all sorts of optimization techniques to walk your filesystem tree and it is not a sure thing that it will return the data we need in a safe-for-operations order. This is why we next...
sort -general-numerical -zero-delimited
We sort all of find's output based on %directory-depth so that the paths nearest in relationship to ${SRC} are worked first. This avoids possible errors involving mving files into non-existent locations, and it minimizes the need for recursive looping. (In fact, you might be hard-pressed to find a loop at all.)
sed -ex :rcrs;srch|(save${sep}*til)${OLD}|\saved${SUBSTNEW}|;til ${OLD=0}
I think this is the only loop in the whole script, and it only loops over the second %Path printed for each string in case it contains more than one ${OLD} value that might need replacing. All other solutions I imagined involved a second sed process, and while a short loop may not be desirable, certainly it beats spawning and forking an entire process.
So basically what sed does here is search for ${sed_sep}, then, having found it, saves it and all characters it encounters until it finds ${OLD}, which it then replaces with ${NEW}. It then heads back to ${sed_sep} and looks again for ${OLD}, in case it occurs more than once in the string. If it is not found, it prints the modified string to stdout (which it then catches again next) and ends the loop.
This avoids having to parse the entire string, and ensures that the first half of the mv command string, which needs to include ${OLD} of course, does include it, and the second half is altered as many times as is necessary to wipe the ${OLD} name from mv's destination path.
sed -ex...-ex search|%dir_depth(save*)${sed_sep}|(only_saved)|out
The two -e expressions here happen without a second fork. In the first, as we've seen, we modify the mv command as supplied by find's -printf function command as necessary to properly alter all references of ${OLD} to ${NEW}, but in order to do so we had to use some arbitrary reference points which should not be included in the final output. So once sed finishes all it needs to do, we instruct it to wipe out its reference points from the hold-buffer before passing it along.
AND NOW WE'RE BACK AROUND
read will receive a command that looks like this:
% mv /path2/$SRC/$OLD_DIR/$OLD_FILE /same/path_w/$NEW_DIR/$NEW_FILE \000
It will be read into the variable named by ${6} -- here, ${sh_io} -- which can be examined at will outside of the function.
Cool.
-Mike
I haven't tested this, but I think it's what you're after.
find . -name '[!.]*' -print | while IFS= read -r line; do nfile=$(echo "$line" | sed 's/Namespace/Test/gI'); mkdir -p "$(dirname "$nfile")"; cp -p "$line" "$nfile"; sed -i 's/Namespace/Test/gI' "$nfile"; done

bash script to delete old deployments

I have a directory where our deployments go. A deployment (which is itself a directory) is named in the format:
<application-name>_<date>
e.g. trader-gui_20091102
There are multiple applications deployed to this same parent directory, so the contents of the parent directory might look something like this:
trader-gui_20091106
trader-gui_20091102
trader-gui_20091010
simulator_20091106
simulator_20091102
simulator_20090910
simulator_20090820
I want to write a bash script to clean out all deployments except for the most current of each application. (The most current is denoted by the date in the name of the deployment.) So running the bash script on the above parent directory would leave:
trader-gui_20091106
simulator_20091106
Any help would be appreciated.
A quick one-liner:
ls | sed 's/_[0-9]\{8\}$//' | uniq |
while read -r name; do
    rm -r $(ls -dr ${name}_* | tail -n +2)
done
List the files, chop off an underscore followed by eight digits, only keep unique names. For each name, remove everything but the most recent.
Assumptions:
the most recent will be last when sorted alphabetically. If that's not the case, add a sort that does what you want in the pipeline before tail -n +2
no other files in this directory. If there are, limit the output of the ls, or pipe it through a grep to select only what you want.
no weird characters in the filenames. If there are... instead of directly using the output of the inner ls pipeline, you'd probably want to pipe it into another while loop so you can quote the individual lines, or else capture it in an array so you can use the quoted expansion.
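For example, the array variant mentioned in the last point might look like this (a sketch, assuming bash 4+ for mapfile, with rm -r because the deployments are directories; it still won't survive newlines in names):
ls | sed 's/_[0-9]\{8\}$//' | uniq |
while read -r name; do
    mapfile -t old < <(ls -dr "${name}"_* | tail -n +2)
    (( ${#old[@]} )) && rm -r "${old[@]}"
done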
shopt -s extglob
ls|awk -F"_" '{a[$1]=$NF}END{for(i in a)print i"_"a[i]}'|while read -r F
do
rm !($F)
done
Since the date in the filename is already "sortable", the awk command finds the latest deployment of each application. The rm -rf line then removes, for each application, every deployment whose date suffix is not the latest one (!(...) is an extglob pattern meaning "anything except").
You could try find:
# Example: Find and delete all directories in /tmp/ older than 7 days:
find /tmp/ -type d -mtime +7 -exec rm -rf {} \; &>/dev/null
