How to select most recent file based off of date in filename - bash

I have a list of files
- backups/
- backup.2017-08-28.zip
- backup.2017-08-29.zip
- backup.2017-09-2.zip
I would like to be able to upload the most recent backup to a server, which I can do with the command:
dobackup ~/backups/backup.2017-09-2.zip
My question is: within a .sh file (so I can run this as an automated cron job), how can I get the latest file name to then run that command?
Limitation: I must use the date in the filename, not the modification metadata.

Adding a couple more files:
backup.2017-08-28.zip
backup.2017-08-29.zip
backup.2017-09-10.zip
backup.2017-09-2.zip
backup.2017-09-28.zip
backup.2017-09-3.zip
How about something like this, though granted, it's a bit convoluted:
ls -1 backup*zip | sed 's/-\([1-9]\)\./-0\1\./g' | sort [-r] | sed 's/-0\([1-9]\)\./-\1\./g'
the first sed looks for a match like -[1-9]\. (a dash, a single non-zero digit, and a literal period), i.e. a single-digit day; single-digit months are not handled
the escaped parens, \( and \), mark the part of the match we want to reference in the replacement
the replacement is -0\1. where \1 refers to the text matched inside the escaped parens (i.e. \1 will be replaced with the single digit that matched [1-9])
the period (.) is escaped so that it's treated as a literal period and not as a single-character wildcard
at this point the ls/sed construct has produced a list of files with 2-digit days
we run the list through sort (or sort -r) as needed
then we run the results back through sed to convert days starting with a 0 back to a single digit
finally, you can use head or tail to pick the first/last line, depending on whether you used sort or sort -r
Running against the sample files:
$ ls -1 backup*zip | sed 's/-\([1-9]\)\./-0\1\./g' | sort | sed 's/-0\([1-9]\)\./-\1\./g'
backup.2017-08-28.zip
backup.2017-08-29.zip
backup.2017-09-2.zip
backup.2017-09-3.zip
backup.2017-09-10.zip
backup.2017-09-28.zip
# reverse the ordering
$ ls -1 backup*zip | sed 's/-\([1-9]\)\./-0\1\./g' | sort -r | sed 's/-0\([1-9]\)\./-\1\./g'
backup.2017-09-28.zip
backup.2017-09-10.zip
backup.2017-09-3.zip
backup.2017-09-2.zip
backup.2017-08-29.zip
backup.2017-08-28.zip
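For the cron script the question asks about, the same idea (zero-pad the date, sort, take the last entry) can be wrapped in a function. This is only a minimal sketch: it assumes names of the form backup.YYYY-M-D.zip with no newlines in them, and latest_backup is a hypothetical helper name, not something from the answer above. Unlike the sed version, it also pads single-digit months.

```shell
#!/bin/sh
# latest_backup DIR: print the path of the backup in DIR whose name carries
# the latest date (hypothetical helper; assumes backup.YYYY-M-D.zip names).
latest_backup() {
    name=$(
        for f in "$1"/backup.*.zip; do
            # Skip the unexpanded pattern when nothing matches.
            [ -e "$f" ] && printf '%s\n' "${f##*/}"
        done |
        # Prefix each name with a zero-padded YYYYMMDD key, e.g. 20170902.
        awk -F'[.-]' '{ printf "%04d%02d%02d %s\n", $2, $3, $4, $0 }' |
        # Plain sort now orders by date; keep the last line, drop the key.
        sort | tail -n 1 | cut -d' ' -f2-
    )
    [ -n "$name" ] && printf '%s\n' "$1/$name"
}

# Possible use in the cron script (dobackup is the command from the question):
#   file=$(latest_backup "$HOME/backups") && dobackup "$file"
```

Sorting on a generated key leaves the original filenames untouched, so there is no need to convert them back afterwards.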

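You can sort on the second dot-delimited field. Use version sort (the V modifier in GNU sort) so that unpadded days such as 2 and 10 compare numerically rather than character by character:
printf '%s\n' backup.* | sort -t '.' -k2,2Vr | head -1
With the three files from the question, this prints:
backup.2017-09-2.zip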

Related

Bash - how to copy latest files by filename to another folder?

Let's say I have these files in folder Test1
AAAA-12_21_2020.txt
AAAA-12_20_2020.txt
AAAA-12_19_2020.txt
BBB-12_21_2020.txt
BBB-12_20_2020.txt
BBB-12_19_2020.txt
I want to copy the latest files below to folder Test2:
AAAA-12_21_2020.txt
BBB-12_21_2020.txt
This code would print the latest file for each prefix:
ls -U "$1" | sort | cut -f 1 -d "-" | uniq | while read -r prefix; do
# sort ascending by year, month, day; the newest file is the last line
ls "$1/$prefix"-* | sort -t '_' -k3,3V -k1,1V -k2,2V | tail -n 1
done
We first iterate over every prefix in the directory given as the first argument; we get the prefixes by sorting the list of files, extracting everything before the -, and removing duplicates. For each prefix, we sort the matching filenames on three fields separated by the _ symbol using the -k option of sort: primarily by year (third field), then by month (first field), and lastly by day (second field). Because the sort is ascending, tail -n 1 picks the newest file. We use version sort (V) so that the text around the numbers is ignored and the numbers are compared numerically rather than lexicographically.
I'm not sure whether this is the best way to do this, as I used only basic shell tools. Because of the date format and the need to differentiate prefixes, you have to parse the string fully, which is a job better suited to AWK or Perl.
Nonetheless, I would suggest using a zero-padded year-month-day format (e.g. 2020-12-21) for machine-readable filenames, since it sorts chronologically with a plain sort.
Using awk:
ls -1 Test1/ | awk -v src_dir="Test1" -v target_dir="Test2" -F '(-|_)' '{p=$4""$2""$3; if(!($1 in b) || b[$1] < p){b[$1]=p; a[$1]=$0}} END {for (i in a) {system ("cp "src_dir"/"a[i]" "target_dir"/")}}'
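The awk approach can also be arranged so that awk only picks the filenames and the shell does the copying, which avoids building command strings for system() out of filenames. The following is a sketch under stated assumptions: names look like PREFIX-MM_DD_YYYY.txt, the prefix itself contains no -, _ or . characters, and no filename contains a newline. copy_latest is a hypothetical function name.

```shell
#!/bin/sh
# copy_latest SRC DST: copy the newest file of each prefix from SRC to DST
# (hypothetical helper; assumes PREFIX-MM_DD_YYYY.txt names).
copy_latest() {
    src=$1
    dst=$2
    for f in "$src"/*-*_*_*.txt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done |
    awk -F'[-_.]' '{
        key = $4 $2 $3                      # YYYYMMDD, compared as a string
        if (!($1 in best) || key > best[$1]) { best[$1] = key; file[$1] = $0 }
    } END { for (p in file) print file[p] }' |
    while IFS= read -r name; do
        cp -- "$src/$name" "$dst/"
    done
}

# Example: copy_latest Test1 Test2
```

Because the key is built as year, month, day, a plain string comparison is enough to find the newest file, even across year boundaries.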

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with SED but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - print out capture group #1
/I - use case-insensitive matching
I think your pattern is OK. With grep -o you get only the matched part of each line instead of the whole matching line. So
grep -Eio 'S[0-9]{2}E[0-9]{2}'
solves your problem (-E is needed for the {2} interval syntax, and -i makes the match case-insensitive). Compared to the S??E?? wildcard in your description, only digits are matched after the S and the E. Maybe you can put it in an if, so that lines without a match show that something is wrong with the filename.
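The suggestion to put the match in an if could look roughly like this. It is a sketch only: episode_tag is a hypothetical function name, and it relies on grep supporting -o, as GNU and BSD grep do.

```shell
#!/bin/sh
# episode_tag NAME: print the first SxxEyy tag found in NAME, or complain on
# stderr and return 1 when there is none (hypothetical helper).
episode_tag() {
    tag=$(printf '%s\n' "$1" | grep -Eio 's[0-9]{2}e[0-9]{2}' | head -n 1)
    if [ -n "$tag" ]; then
        printf '%s\n' "$tag"
    else
        printf 'no episode tag in: %s\n' "$1" >&2
        return 1
    fi
}

# Example: for f in *.mkv *.mp4; do episode_tag "$f"; done
```

The tag is printed in its original case, so S05E11 stays upper case and s01e02 stays lower case.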
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.

How to create argument variable in bash script

I am trying to write a script that reports the number of characters in the n-th largest file in a sub-directory.
I was trying to pass n and the name of the sub-directory as arguments ($1, $2).
Current directory: Greetings
Sub-directory: language_files, others
Sub-directory: English, German, French
Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ….
I would be in the directory “Greetings”; when I indicate a subdirectory (English, German, French), the script should show the nth-largest file in that subdirectory and calculate its number of characters as well.
For instance, if I am trying to figure out number of characters of 2nd largest file in English, I did:
langs=$1
n=$2
for langs in language_files/;
Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"
The result I wanted was:
$ ./script1.sh English 2
The file has 1100 bytes!
The main problem is that I don't understand how variables and loops work in a bash script.
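There is no need for a loop:
find language_files/"$1" -name "*.csv" -exec wc -m {} \; | sort -nr | sed -n "$2{p;q}"
Running wc once per file avoids the "total" line that wc prints when it is given several files, which would otherwise sort to the top. For byte counting you should use -c, since -m counts characters (the two may be the same for your files).
You don't use the loop variable in your script anyway.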
Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:
count=$(stat -c'%s %n' "language_files/$lang"/* | sort -nr | head -n"$n" | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
That should give you the $count you need. Then you can echo it however you like.
EXPLANATION
If you wish to learn how it works:
The stat command outputs various statistics about the named file (or files), in this case %s, the file's size in bytes, and %n, the file's name.
The sort -nr command orders the lines numerically by size, largest first.
The head and tail commands output, respectively, the first and last several lines of their input. Together, they select the nth line.
The sed command extracts the leading size field from the line. (You can use cut instead, if you prefer.)
If you wish to be cleverer, then you can optimize as @karafka has done.
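Put together as a script that takes the sub-directory and n as arguments, the approach might look like the sketch below. nth_largest_size is a hypothetical function name, and the language_files/ layout is taken from the question. It counts bytes with wc -c, to match the expected "bytes" output, and runs wc once per file so that no "total" line appears.

```shell
#!/bin/sh
# nth_largest_size DIR N: print the size in bytes of the N-th largest .csv
# file under DIR (hypothetical helper).
nth_largest_size() {
    find "$1" -type f -name '*.csv' -exec wc -c {} \; |
        sort -nr |                 # largest first
        sed -n "${2}{p;q;}" |      # keep only line N
        awk '{ print $1 }'         # keep only the size field
}

# Possible body of script1.sh, called as: ./script1.sh English 2
#   count=$(nth_largest_size "language_files/$1" "$2")
#   echo "The file has $count bytes!"
```

Inside the script, $1 and $2 are the first and second command-line arguments; inside the function, they are the function's own arguments, which is why the script passes them along explicitly.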

How to read CSV file stored in variable

I want to read a CSV file in a shell script, but for some reason it doesn't work.
I use this to locate the latest added csv file in my csv folder
lastCSV=$(ls -t csv-output/ | head -1)
and this to count the lines.
wc -l $lastCSV
Output
wc: drupal_site_livinglab.csv: No such file or directory
If I echo the file it says: drupal_site_livinglab.csv
Your issue is that you're one directory up from the path you are trying to read. The quick fix would be wc -l "csv-output/$lastCSV".
Bear in mind that parsing ls -t, though convenient, isn't completely robust, so you should consider something like this to protect yourself from awkward file names:
last_csv=$(find csv-output/ -mindepth 1 -maxdepth 1 -printf '%T@\t%p\0' |
sort -znr | head -zn1 | cut -zf2-)
wc -l "$last_csv"
GNU find lists all files along with their last modification time, separating the output using null bytes to avoid problems with awkward filenames.
If you remove -maxdepth 1, this becomes a recursive search.
GNU sort arranges the files from newest to oldest, with -z to accept null byte-delimited input.
GNU head -z returns the first record from the sorted list.
GNU cut -z at the end discards the timestamp, leaving you with only the filename.
You can also replace find with stat (again, this assumes that you have GNU coreutils):
last_csv=$(stat csv-output/* --printf '%Y\t%n\0' | sort -znr | head -zn1 | cut -zf2-)
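If the GNU -z options are not available, a plain loop with the shell's -nt ("newer than") file test avoids parsing any command output at all. This is a sketch, with newest_file as a hypothetical function name. It only looks at regular files directly inside the given directory.

```shell
#!/bin/sh
# newest_file DIR: print the most recently modified regular file in DIR
# (hypothetical helper; uses the -nt test, which bash supports).
newest_file() {
    newest=''
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example: wc -l "$(newest_file csv-output)"
```

Since the function prints the full path (csv-output/name.csv), the result can be passed to wc directly, which also avoids the "No such file or directory" error from the question.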

Colorize output of a command using multiple "sed" calls

I'd like to colorize the output of "history" with its timestamps. Let's say one line would be
2084 10.05.16 17:08:13 history | sed 's/^[ 0-9]*[ ]/\o033[1;32m&\o033[0m/' | tail -n10
So far I have figured out
history | sed 's/^[ 0-9]*[ ]/\o033[1;32m&\o033[0m/' | tail -n10
to print the counter in a yellowish tone. Now I'd like to have the timestamp in another color. I tried
history | sed 's/[ 0-9.:]*[ ]/\o033[1;31m&\o033[0m/' | sed 's/^[ 0-9]*[ ]/\o033[1;32m&\o033[0m/' | tail -n10
but that displays the counter as well as the timestamp in red.
How do I have to write the sed calls to have "2084" in one color and the timestamp "10.05.16 17:13:39" in another?
Thanks in advance!
history on the machine I use doesn't have a timestamp field but if your history command outputs lines like:
2084 10.05.16 17:08:13 history
then you'd want:
history | sed -E 's/^(\s*\S+)(\s+)(\S+\s+\S+)/\o033[1;31m\1\o033[0m\2\o033[1;32m\3\o033[0m/'
The above uses GNU sed for \S and \s - replace them with [^[:blank:]] and [[:blank:]] respectively if your sed doesn't support them.
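A variant that avoids the GNU-only \o033, \S and \s could look like the sketch below. It generates the escape character with printf and uses POSIX character classes; colorize_history is a hypothetical function name, and the leading [[:blank:]]* allows for the spaces that history usually prints before the number.

```shell
#!/bin/sh
# colorize_history: read history lines on stdin, print the counter in bold
# red and the date/time pair in bold green (hypothetical helper).
colorize_history() {
    esc=$(printf '\033')    # literal ESC character for the color codes
    sed -E "s/^([[:blank:]]*[0-9]+)([[:blank:]]+)([^[:blank:]]+[[:blank:]]+[^[:blank:]]+)/${esc}[1;31m\1${esc}[0m\2${esc}[1;32m\3${esc}[0m/"
}

# Example: history | colorize_history | tail -n10
```

Doing both colors in a single substitution sidesteps the problem from the question, where the second sed call matched text that the first call had already wrapped in escape codes.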
I would recommend installing grc (a generic colouriser program) and creating a custom config file, like so:
# conf.history for grc
# sequence number
regexp=^\s+\d+
colours=yellow
=======
# date/time stamp
regexp=\d+\.\d+\.\d+\s+\d+:\d+:\d+
colours=red
count=once
Then you can do this:
history | grcat conf.history
If I set HISTTIMEFORMAT to "%d.%m.%y %H:%M:%S" to match your output, the above works on my machine: I get yellow sequence numbers, red date/time stamps, and everything else in white.
