bash script to delete old deployments

I have a directory where our deployments go. A deployment (which is itself a directory) is named in the format:
<application-name>_<date>
e.g. trader-gui_20091102
There are multiple applications deployed to this same parent directory, so the contents of the parent directory might look something like this:
trader-gui_20091106
trader-gui_20091102
trader-gui_20091010
simulator_20091106
simulator_20091102
simulator_20090910
simulator_20090820
I want to write a bash script to clean out all deployments except for the most current of each application. (The most current denoted by the date in the name of the deployment). So running the bash script on the above parent directory would leave:
trader-gui_20091106
simulator_20091106
Any help would be appreciated.

A quick one-liner:
ls | sed 's/_[0-9]\{8\}$//' | uniq |
while read -r name; do
    rm -r $(ls -rd "${name}"_* | tail -n +2)
done
List the files, chop off an underscore followed by eight digits, only keep unique names. For each name, remove everything but the most recent.
Assumptions:
the most recent will be last when sorted alphabetically. If that's not the case, add a sort that does what you want in the pipeline before tail -n +2
no other files in this directory. If there are, limit the output of the ls, or pipe it through a grep to select only what you want.
no weird characters in the filenames. If there are, then instead of directly using the output of the inner ls pipeline, you'd want to pipe it into another while loop so you can quote the individual lines, or else capture it in an array so you can use the quoted expansion (see the sketch below).
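A rough sketch of that array-based variant (my own variable names; it still assumes the names contain no newlines and follow the <application>_<8-digit-date> format):
shopt -s nullglob
ls | sed 's/_[0-9]\{8\}$//' | sort -u |
while IFS= read -r name; do
    # Glob into an array so each deployment name stays one quotable word;
    # lexicographic glob order puts the oldest first and the newest last.
    deps=( "${name}"_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] )
    (( ${#deps[@]} > 1 )) || continue          # nothing older than the newest
    unset "deps[$(( ${#deps[@]} - 1 ))]"       # drop the newest from the delete list
    rm -r -- "${deps[@]}"
done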

shopt -s extglob
keep=$(ls | awk -F"_" '{a[$1]=$NF} END{for(i in a) print i"_"a[i]}' | paste -sd'|' -)
rm -r !($keep)
Since the date in the filename is already "sortable", the awk command finds the latest deployment of each application, and rm -r !($keep) removes everything whose name is not one of those latest deployments.

You could also try find, though note that -mtime selects by the directory's modification time, not by the date embedded in the name:
# Example: Find and delete all directories in /tmp/ older than 7 days:
find /tmp/ -type d -mtime +7 -exec rm -rf {} \; &>/dev/null
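If you go this route, a slightly tighter sketch (the path is a placeholder) restricts find to the deployment directories themselves; again, it selects by modification time, not by the date in the name:
# Only look at immediate subdirectories of the parent deployment directory.
find /path/to/deployments -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +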

Related

Combining Bash command with AWS CLI copy command

I need to copy some files from a Linux machine to an S3 bucket, but only selected files. I am able to get the files using the Bash command below:
ls -1t /var/lib/pgsql/backups/full/backup_daily/test* | tail -n +8
Now, I want to combine this Bash command with the AWS S3 cp command. I searched and found the solution below, but it's not working.
ls -1t /var/lib/pgsql/backups/full/backup_daily/test* | tail -n +8 | aws s3 cp - s3://all-postgresql-backup/dev/
How can I make this work?
If you're on a platform with GNU tools (find, sort, tail, sed), and you want to insert all the names in the position where you have the -, doing this reliably (in a manner robust against unexpected filenames) might look like:
find /var/lib/pgsql/backups/full/daily_backup -name 'guest*' -type f -printf '%T@ %p\0' |
  sort -znr |
  tail -z -n +8 |
  sed -zEe 's/[^ ]+ //' |
  xargs -0 sh -c 'aws s3 cp "$@" s3://all-postgresql-backup/ncldevshore/' _
There's a lot there, so let's take it piece-by-piece:
ls does not generate output safe for programmatic use. Thus, we use find instead, with a -printf format that puts a timestamp (in UNIX epoch time, seconds since 1970) before each file name and terminates each entry with a NUL (a character which, unlike a newline, cannot exist in filenames on UNIX).
sort -z is a GNU extension which delimits input and output by NULs; -n specifies numeric sort (since the timestamps are numeric); -r reverses sort order.
sed -z is a GNU extension which, again, delimits records by NULs rather than newlines; here, we're stripping the timestamp off the records after sorting them.
xargs -0 ... tells xargs to read NUL-delimited records from stdin, and append them to the argument list of ..., splitting into multiple invocations whenever this would go over maximum command-line length.
sh -c '..."$@"...' _ runs a shell -- sh -- with a command that includes "$@", which expands to the list of arguments that shell was passed. _ is a placeholder for $0. xargs places the names produced by the preceding pipeline after the _, so they become $1, $2, etc., and end up on the aws command line in place of the "$@".
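As a toy illustration of that xargs/sh -c pattern (echo stands in for the aws call):
printf '%s\0' one two three |
  xargs -0 sh -c 'echo would copy: "$@"' _
# prints: would copy: one two three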
References:
BashFAQ #3 - How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc)?
ParsingLs - Why you shouldn't parse the output of ls
UsingFind - See the "Actions In Bulk" section for discussion of safety precautions necessary to use xargs without introducing bugs (which the above code follows, but other suggestions may not).
You might also want to take a look at aws s3 sync and aws s3 cp with the --exclude/--include options.
aws s3 sync . s3://mybucket --exclude "*.jpg"
You could have a simple cron job that runs in the background every few minutes and keeps the directories in sync; a sample crontab entry is sketched after the excerpt below.
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
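For example, a minimal crontab entry along those lines might look like this (the schedule, paths, bucket and filter patterns are placeholders to adapt):
# m h dom mon dow  command
*/15 * * * * aws s3 sync /var/lib/pgsql/backups/full/backup_daily/ s3://all-postgresql-backup/dev/ --exclude "*" --include "test*" >> /var/log/s3-backup-sync.log 2>&1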

How to delete files from directory using CSV in bash

I have 600,000+ images in a directory. The filenames look like this:
1000000-0.jpeg
1000000-1.jpeg
1000000-2.jpeg
1000001-0.jpeg
1000002-0.jpeg
1000003-0.jpeg
The first number is a unique ID and the second number is an index.
{unique-id}-{index}.jpeg
How would I load the unique IDs from a .CSV file and remove each file whose unique ID matches one of the unique IDs in the .CSV file?
The CSV file looks like this:
1000000
1000001
1000002
... or I can have it separated by semicolons like so (if necessary):
1000000;1000001;1000002
You can set the IFS variable to ; and loop over the values read into an array:
#! /bin/bash
while IFS=';' read -r -a ids ; do
    for id in "${ids[@]}" ; do
        rm "$id"-*.jpeg
    done
done < file.csv
Try running the script with echo rm ... first to verify it does what you want.
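If you go with the newline-separated format (one ID per line) instead, a sketch along the same lines might be:
while IFS= read -r id; do
    # Keep the echo until the output looks right, then drop it.
    [ -n "$id" ] && echo rm -- "$id"-*.jpeg
done < file.csv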
If there's exactly one ID per line, this will show you all matching file names:
ls | grep -f unique-ids.csv
If that list looks correct, you can delete the files with:
ls | grep -f unique-ids.csv | xargs rm
Caveat: This is a quick and dirty solution. It'll work if the file names are all named the way you say. Beware it could easily be tricked into deleting the wrong things by a clever attacker or a particularly hapless user.
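If you want to harden it a little, one approach (still assuming one ID per line and no blank lines) is to anchor each ID so that, say, 100 cannot match 1000000-0.jpeg:
# Turn every ID into an anchored regex such as ^1000000-[0-9]*\.jpeg$ ...
sed 's|.*|^&-[0-9]*\\.jpeg$|' unique-ids.csv > patterns.txt
# ...then match and delete (GNU xargs: -d '\n' keeps one name per argument, -r skips empty input).
ls | grep -f patterns.txt | xargs -d '\n' -r rm --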
You could use find and sed:
find dir -regextype posix-egrep \
    -regex ".*($(sed 's/;/|/g' ids.csv))-[0-9][0-9]*\.jpeg"
Replace dir with your search directory and ids.csv with your CSV file (the semicolon-separated variant). To actually delete the files, append the -delete option, as shown below.
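Putting that together, the deleting version might look like this (dir, ids.csv and the semicolon-separated format are the assumptions from above; run it without -delete first to review the matches):
find dir -regextype posix-egrep \
    -regex ".*($(sed 's/;/|/g' ids.csv))-[0-9][0-9]*\.jpeg" \
    -type f -delete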

Shell script to delete files whose names are not in a text file

I have a txt file which contains list of file names
Example:
10.jpg
11.jpg
12.jpeg
...
The files in this list should be protected from deletion, and every other file in the folder should be deleted.
So I want the opposite logic of this question: Shell command/script to delete files whose names are in a text file
How can I do that?
Use extglob and Bash extended pattern matching !(pattern-list):
!(pattern-list)
Matches anything except one of the given patterns
where a pattern-list is a list of one or more patterns separated by a |.
extglob
If set, the extended pattern matching features described above are enabled.
So for example:
$ ls
10.jpg 11.jpg 12.jpeg 13.jpg 14.jpg 15.jpg 16.jpg a.txt
$ shopt -s extglob
$ shopt | grep extglob
extglob on
$ cat a.txt
10.jpg
11.jpg
12.jpeg
$ tr '\n' '|' < a.txt
10.jpg|11.jpg|12.jpeg|
$ ls !(`tr '\n' '|' < a.txt`)
13.jpg 14.jpg 15.jpg 16.jpg a.txt
Those are exactly the files that would be deleted: 13.jpg, 14.jpg, 15.jpg, 16.jpg and a.txt (note that a.txt, the list file itself, is only protected if you add it to the list).
So with extglob and !(pattern-list), we can obtain the files which are excluded based on the file content.
Additionally, if you want to exclude the entries starting with ., then you could switch on the dotglob option with shopt -s dotglob.
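Putting it together, the deletion itself might look like this (as in the listing above, a.txt is not protected unless you add it to the list; use rm -i if you want a confirmation prompt):
shopt -s extglob
# Dry run: show what would go away.
ls -d !($(tr '\n' '|' < a.txt))
# Then delete everything whose name is not in a.txt.
rm -- !($(tr '\n' '|' < a.txt))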
This is one way that will work with bash GLOBIGNORE:
$ cat file2
10.jpg
11.jpg
12.jpg
$ ls *.jpg
10.jpg 11.jpg 12.jpg 13.jpg
$ echo $GLOBIGNORE
$ GLOBIGNORE=$(tr '\n' ':' <file2 )
$ echo $GLOBIGNORE
10.jpg:11.jpg:12.jpg:
$ ls *.jpg
13.jpg
As you can see, globbing ignores whatever (file, pattern, etc.) is included in the GLOBIGNORE bash variable.
This is why the last ls reports only file 13.jpg since files 10,11 and 12.jpg are ignored.
As a result using rm *.jpg will remove only 13.jpg in my system:
$ rm -iv *.jpg
rm: remove regular empty file '13.jpg'? y
removed '13.jpg'
When you are done, you can just set GLOBIGNORE to null:
$ GLOBIGNORE=
It is worth mentioning that GLOBIGNORE also accepts glob patterns rather than just single filenames, like *.jpg or my*.mp3, etc.
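For example:
# Hypothetical patterns: ignore every .jpg and anything starting with "my".
GLOBIGNORE='*.jpg:my*'
ls *            # now omits those entries
GLOBIGNORE=     # reset when done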
Alternative:
We can use text-processing tools (grep, awk, etc.) to compare the file names listed in the ignore file (file2 above) with the files under the current directory:
$ awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n')
13.jpg
$ rm -iv "$(awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n'))"
rm: remove regular empty file '13.jpg'? y
removed '13.jpg'
Note: This also makes use of Bash process substitution. As written, the quoted command substitution hands rm a single argument, so it only behaves as intended when exactly one file is to be removed, and it will break if filenames contain newlines; for several files, feed the list to a while read loop or to xargs instead.
Another alternative to George Vasiliou's answer would be to read the file with the names of the files to keep using the Bash builtin mapfile and then check for each of the files to be deleted whether it is in that list.
#! /bin/bash -eu
mapfile -t keepthose <keepme.txt
declare -a deletethose
for f in "$@"
do
    keep=0
    for not in "${keepthose[@]}"
    do
        [ "${not}" = "${f}" ] && keep=1 || :
    done
    [ ${keep} -gt 0 ] || deletethose+=("${f}")
done
# Remove the 'echo' if you really want to delete files.
echo rm -f "${deletethose[@]}"
The -t option causes mapfile to trim the trailing newline character from the lines it reads from the file. No other white-space will be trimmed, though. This might be what you want if your file names actually contain white-space but it could also cause subtle surprises if somebody accidentally puts a space before or after the name of an important file they want to keep.
Note that I'm first building a list of the files that should be deleted and then delete them all at once rather than deleting each file individually. This saves some sub-process invocations.
The lookup in the list, as coded above, has linear complexity, which gives the overall script quadratic complexity (precisely, N × M where N is the number of command-line arguments and M the number of entries in the keepme.txt file). If you only have a few dozen files, this should be fine. For larger sets, Bash 4's associative arrays can be keyed by the file names themselves (keys may be arbitrary strings), which makes each lookup effectively constant-time; a sketch of that variant follows below. If you are concerned with performance for very many files, a more powerful language like Python might still be worth considering.
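A minimal sketch of the associative-array variant (assumes Bash 4 or newer; same keepme.txt and argument interface as above):
#! /bin/bash -eu
declare -A keep
while IFS= read -r name; do
    keep["$name"]=1                       # file names are used directly as keys
done < keepme.txt
declare -a deletethose=()
for f in "$@"; do
    # One hash lookup per argument instead of a scan over the whole keep list.
    [ -n "${keep[$f]:-}" ] || deletethose+=("$f")
done
if (( ${#deletethose[@]} )); then
    # Remove the 'echo' if you really want to delete files.
    echo rm -f "${deletethose[@]}"
fi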
I would also like to mention that the above example simply compares strings. It will not realize that important.txt and ./important.txt are the same file, and would hence delete it. It would be more robust to convert each file name to a canonical path using readlink -f before comparing.
Furthermore, your users might want to be able to put globbing patterns (like important.*) into the list of files to keep. If you want to handle those, extra logic would be required.
Overall, specifying which files not to delete seems a little dangerous, since mistakes err on the side of deleting too much.
Provided there are no spaces or special characters in the file names, either of these (or variations of them) would work:
rm -v $(stat -c %n * | sort - excluded_file_list | uniq -u)
stat -c %n * | grep -vf excluded_file_list | xargs rm -v

How to get the most recent timestamped file in BASH

I'm writing a deployment script that saves timestamped backup files to a backups directory. I'd like to do a rollback implementation that would roll back to the most recent file.
My backups directory:
$:ls
.                          1341094065_public_html_bu  1341094788_public_html_bu
..                         1341094390_public_html_bu
1341093920_public_html_bu  1341094555_public_html_bu
I want to identify the most recent file (by timestamp in the filename) in the backup directory, and save its name to a variable, then cp it to ../public_html, and so on...
ls -t will sort files by mtime. ls -t | head -n1 will select the newest file. This is independent of any naming scheme you have, which may or may not be a plus.
...and a more "correct" way, which won't break when filenames contain newlines, and which also copes with there being no matching files (the glob stays unexpanded, hence the -e test):
for newestfile in ./* ; do : ; done
if test -e "$newestfile"; then
    : # do something with "$newestfile"
fi
The latest-timestamped filename should sort last alphabetically. So you can then use tail -n1 to extract it.
For files that don't have newlines in their names:
shopt -s nullglob
printf '%s\n' "$buDir"/* | tail -n 1
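Tying it back to the rollback use case, a rough sketch (the directory variable, the *_public_html_bu pattern and the copy target are assumptions taken from the question):
shopt -s nullglob
backups=( "$buDir"/*_public_html_bu )          # glob is sorted, so the newest timestamp comes last
if (( ${#backups[@]} )); then
    newest=${backups[${#backups[@]}-1]}        # Bash 4.3+ also allows ${backups[-1]}
    echo "rolling back to: $newest"
    cp -a "$newest/." ../public_html/          # adapt the copy to whatever your rollback expects
fi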

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I quote it, like this: for f in "$(ls -1)"; do echo "$f"; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string, and I need the two file names separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} \; escapes {} and ; from the shell: {} is the placeholder for the currently processed file, and ; terminates the -exec expression. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here is launched by find and not by the shell, the spaces won't matter.
The second version will be less efficient, since find will launch a new process for each and every file found. xargs is smart enough to batch as many arguments as possible into each invocation of the command and only start a new one when needed. Prefer the xargs version if you have the choice.
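If your find supports it, the -exec ... {} + form gives you much the same batching as xargs without the extra pipeline (same placeholder names as above):
# find appends as many file names as fit onto each command invocation,
# much like xargs does.
find /target/path -type f -exec your_command_here {} +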
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while read -r FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces $IFS delimited output.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while read -r FILE
do
echo 1: "$FILE"
echo 2: "$FILE"
done
Look at ls's --quoting-style option.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
Check out the manpage for xargs. It works like this:
ls -1 /tmp/*.jpeg | xargs rm
(Note that plain xargs splits on whitespace, so for names containing spaces combine it with find -print0 and xargs -0 as shown above.)
