How do you compress multiple folders at a time, using a shell? - bash

There are n folders in the directory named after the date, for example:
20171002 20171003 20171005 ...20171101 20171102 20171103 ...20180101 20180102
Note: the dates are not contiguous.
I want to compress every three folders within each month into one archive.
For example:
tar jcvf mytar-20171002_1005.tar.bz2 20171002 20171003 20171005
How can I write a shell script to do this?

You need to loop over the directory names and parse the month out of each one.
prev_month=""
first_dir=""
last_dir=""
group=()

# Archive the collected group, e.g. mytar-20171002_1005.tar.bz2, then reset it
compress_group() {
    [ "${#group[@]}" -eq 0 ] && return
    tar jcvf "mytar-${first_dir}_${last_dir:4:4}.tar.bz2" "${group[@]}"
    group=()
}

for i in */; do
    i=${i%/}
    month=${i:0:6} # here month will be year plus month
    # start a new group when the month changes or the group is full
    if [ "$month" != "$prev_month" ] || [ "${#group[@]}" -eq 3 ]; then
        compress_group
        first_dir=$i
    fi
    group+=("$i")
    last_dir=$i
    prev_month=$month
done
compress_group # archive the last, possibly partial, group
This code is not tested and may contain errors. The compress_group function is where each group gets archived; adapt the tar command there if you need a different archive name.

Assuming you don't have too many directories (I think the limit is several hundred), then you can use Bash's array manipulation.
So, you first load all your directory names into a Bash array:
dirs=( $(ls) )
(I'm going to assume files have no spaces in their names, otherwise it gets a bit dicey)
Then you can use Bash's array slice syntax to pop 3 elements at a time from the array:
while [ "${#dirs[@]}" -gt 0 ]; do
dirs_to_compress=( "${dirs[@]:0:3}" )
dirs=( "${dirs[@]:3}" )
# do something with dirs_to_compress
done
The rest should be pretty easy.
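As a rough sketch of that remaining step, the function below slices a list of directory names three at a time and prints the tar command for each group instead of running it. The name print_tar_commands is made up for this illustration, and the archive naming simply follows the example in the question. Like the loop above, it groups purely by position and ignores month boundaries.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): print one dry-run tar command
# per group of up to three directory names passed as arguments.
print_tar_commands() {
    local -a dirs=( "$@" )
    local -a group
    local first last
    while [ "${#dirs[@]}" -gt 0 ]; do
        group=( "${dirs[@]:0:3}" )      # take the first three names
        dirs=( "${dirs[@]:3}" )         # keep the remainder for the next round
        first=${group[0]}
        last=${group[${#group[@]}-1]}
        # ${last:4:4} is the MMDD part of a YYYYMMDD name
        echo tar jcvf "mytar-${first}_${last:4:4}.tar.bz2" "${group[@]}"
    done
}
```

Calling it as print_tar_commands 20171002 20171003 20171005 shows the command that would be run; drop the echo once the output looks right.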

You can achieve this with xargs, a bash while loop, and awk:
ls | xargs -n3 | while read line; do
tar jcvf $(echo $line | awk '{print "mytar-"$1"_"substr($NF,5,4)".tar.bz2"}') $line
done

unset folders
declare -A folders
g=3
for folder in $(ls -d */); do
folders[${folder:0:6}]+="${folder%%/} "
done
for folder in "${!folders[@]}"; do
for((i=0; i < $(echo ${folders[$folder]} | tr ' ' '\n' | wc -l); i+=g)) do
group=(${folders[$folder]})
groupOfThree=(${group[@]:i:g})
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[@]}
done
done
This script finds all folders in the current directory, separates them into groups by month, splits each month into groups of at most three folders, and creates a .tar.bz2 for each group, named as in the question.
I tested it with those folders:
20171101 20171102 20171103 20171002 20171003 20171005 20171007 20171009 20171011 20171013 20180101 20180102
And the created tars are:
mytar-20171002_1005.tar.bz2
mytar-20171007_1011.tar.bz2
mytar-20171013_1013.tar.bz2
mytar-20171101_1103.tar.bz2
mytar-20180101_0102.tar.bz2
Hope that helps :)
EDIT: If you are using a bash version older than 4.2, replace the line:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[@]}
with:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[`expr ${#groupOfThree[@]} - 1`]:4:4}.tar.bz2 ${groupOfThree[@]}
That's because bash versions before 4.2 don't support negative indices for arrays.

Related

BASH: File sorting according to file name

I need to sort 12000 files into 1000 groups according to their names, and create for each group a new folder containing the files of that group. Each file name consists of several columns separated by _, where the second column varies from 1 to 12 (the number of the part) and the last column ranges from 1 to 1000 (the number of the system), indicating that 1000 different systems (last column) were originally split into 12 separate parts (second column).
Here is an example for a small subset of 3 systems divided into 12 parts, 36 files in total.
7000_01_lig_cne_1.dlg
7000_02_lig_cne_1.dlg
7000_03_lig_cne_1.dlg
...
7000_12_lig_cne_1.dlg
7000_01_lig_cne_2.dlg
7000_02_lig_cne_2.dlg
7000_03_lig_cne_2.dlg
...
7000_12_lig_cne_2.dlg
7000_01_lig_cne_3.dlg
7000_02_lig_cne_3.dlg
7000_03_lig_cne_3.dlg
...
7000_12_lig_cne_3.dlg
I need to gather, for each system, the files for all values of the second column (01, 02, 03 .. 12), thus creating 1000 folders, each of which should contain the 12 files of one system, in the following manner:
Folder1, name: 7000_lig_cne_1, it contains 12 files: 7000_{this is from 01 to 12}_lig_cne_1.dlg
Folder2, name: 7000_lig_cne_2, it contains 12 files: 7000_{this is from 01 to 12}_lig_cne_2.dlg
...
Folder1000, name: 7000_lig_cne_1000, it contains 12 files: 7000_{this is from 01 to 12}_lig_cne_1000.dlg
Assuming that all *.dlg files are present within the same directory, I propose a bash loop workflow, which only lacks some sorting function (sed, awk?), organized in the following manner:
#set the name of folder with all DLG
home=$PWD
FILES=${home}/all_DLG/7000_CNE
# set the name of protein and ligand library to analyse
experiment="7000_CNE"
#name of the output
output=${home}/sub_folders_to_analyse
#now here all magic comes
rm -r ${output}
mkdir ${output}
# sed solution
for i in ${FILES}/*.dlg # define this better to suit your needs
do
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
mkdir -p ${output}/"${experiment}_lig$n"
cp "$i" ${output}/"${experiment}_lig$n"
done
Note: here I set the beginning of each folder name to ${experiment}, to which I append the number from the final column, $n. Would it be possible instead to derive the name of each new folder automatically from the names of the copied files? Manually, this could be achieved by skipping the second column in the name of the folder:
cp ./all_DLG/7000_*_lig_cne_987.dlg ./output/7000_lig_cne_987
Iterate over files. Extract the destination directory name from the filename. Move the file.
for i in *.dlg; do
# extract last number with your favorite tool
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
echo mkdir -p "folder$n"
echo mv "$i" "folder$n"
done
Notes:
Do not use upper case variables in your scripts. Use lower case variables.
Remember to quote variables expansions.
Check your scripts with http://shellcheck.net
Tested on repl
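update: for OP's folder naming convention (the .dlg extension is stripped first so that it does not end up in the folder name):
for i in *.dlg; do
base=${i%.dlg}
foldername="$HOME/output/${base%%_*}_${base#*_*_}"
echo mkdir -p "$foldername"
echo mv "$i" "$foldername"
done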
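To see what each parameter expansion contributes, here is a small sketch that derives the folder name for a single file. The function name dlg_folder_name is hypothetical, and it assumes names of the form shown in the question (for example 7000_01_lig_cne_1.dlg).

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the folder name for one .dlg file by
# dropping the extension and the second "_"-separated field.
dlg_folder_name() {
    local base=${1%.dlg}        # 7000_01_lig_cne_1
    local system=${base%%_*}    # 7000       (everything before the first _)
    local rest=${base#*_*_}     # lig_cne_1  (everything after the second _)
    printf '%s_%s\n' "$system" "$rest"
}
```

For example, dlg_folder_name 7000_01_lig_cne_1.dlg prints 7000_lig_cne_1, which matches the folder naming asked for in the question.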
This might work for you (GNU parallel):
ls *.dlg |
parallel --dry-run 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d'
Pipe the output of ls command listing files ending in .dlg to parallel, which creates directories and moves the files to them.
Run the solution as is, and when satisfied the output of the dry run is ok, remove the option --dry-run.
The solution could be one instruction:
parallel 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d' ::: *.dlg
Using POSIX shell's built-in grammar only and sort:
#!/usr/bin/env sh
curdir=
# Create list of files with newline
# Safe since we know there is no special
# characters in name
printf -- %s\\n *.dlg |
# Sort the list by 5th key with _ as field delimiter
sort -t_ -k5 |
# Iterate reading the _ delimited fields of the sorted list
while IFS=_ read -r _ _ c d e; do
# Compose the new directory name
newdir="${c}_${d}_${e%.dlg}"
# If we enter a new group / directory
if [ "$curdir" != "$newdir" ]; then
# Make the new directory current
curdir="$newdir"
# Create the new directory
echo mkdir -p "$curdir"
# Move all its files into it
echo mv -- *_"$curdir.dlg" "$curdir/"
fi
done
Optionally as a sort and xargs arguments stream:
printf -- %s\\n * |
sort -u -t_ -k5 |
xargs -n1 sh -c '
d="lig_cne_${0##*_}"
d="${d%.dlg}"
echo mkdir -p "$d"
echo mv -- *"_$d.dlg" "$d/"
'
Here is a very simple awk script that does the trick in a single sweep.
script.awk
BEGIN{FS="[_.]"} # make field separator "_" or "."
{ # for each filename
dirName=$1"_"$3"_"$4"_"$5; # compute the target dir name from fields
sysCmd = "mkdir -p " dirName"; cp "$0 " "dirName; # prepare bash command
system(sysCmd); # run bash command
}
running script.awk
ls -1 *.dlg | awk -f script.awk
oneliner awk script
ls -1 *.dlg | awk 'BEGIN{FS="[_.]"}{d=$1"_"$3"_"$4"_"$5;system("mkdir -p "d"; cp "$0 " "d);}'

How to rename files with incrementing numbers to files with that number plus 10

Hi I have a list of files ex. 0.png, 1.png ... 60.png, 61.png and I want to rename all the files to 10.png,11.png ... 70.png, 71.png however I do not know how I could do that.
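In bash, you can use a parameter expansion to handle the rename. Work from the highest number down, otherwise renaming 0.png would overwrite the existing 10.png, e.g.
for name in $(printf '%s\n' *.png | sort -rn); do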
val="${name%.png}"
val=$((val+10))
mv "$name" "$val.png"
done
Explanation
val is created from the parameter expansion "${name%.png}" which simply trims ".png" from the right-hand side of the filename.
val=$((val+10)) adds 10 to the number.
mv "$name" "$val.png" moves the file from its original name to the new name with the value increased by 10.
If you want to eliminate the intermediate val variable, you can do it all in a single expression, e.g.
for name in $(printf '%s\n' *.png | sort -rn); do # highest number first, so no file is overwritten
mv "$name" "$((${name%.png} + 10)).png"
done
Look things over and let me know if you have further questions.
Assuming that the filenames are of the form number.ext, this function would do the trick.
#!/bin/bash
function rename_file() {
local file=$1
local fname=$(($(echo "$file" | cut -d. -f1) + 10))
local ext=$(echo "$file" | cut -d. -f2)
mv -- "$file" "$fname.$ext"
}
To rename a file, call rename_file file_name in your shell script.
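Because the old and new name ranges overlap here (0.png becomes 10.png while a 10.png already exists), the order of the renames matters. The sketch below only prints the mv commands in an order that avoids overwriting a file that still has to be renamed: highest number first for a positive offset, lowest first for a negative one. The function name plan_renames and its argument layout are assumptions made for this illustration; it expects names of the form number.ext without spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the mv commands needed to shift numbered
# files (N.ext) by an offset, ordered so that no rename overwrites a
# file that is still waiting to be renamed.
# Usage: plan_renames OFFSET FILE...
plan_renames() {
    local offset=$1; shift
    local order=-n                      # ascending for negative offsets
    [ "$offset" -gt 0 ] && order=-rn    # descending for positive offsets
    local name num ext
    printf '%s\n' "$@" | sort "$order" | while IFS= read -r name; do
        [ -n "$name" ] || continue      # nothing was passed in
        num=${name%%.*}                 # part before the first dot
        ext=${name#*.}                  # part after the first dot
        # 10# forces base 10 so that a name like 08.png is not read as octal
        echo mv -- "$name" "$((10#$num + offset)).$ext"
    done
}
```

Running plan_renames 10 *.png in the directory lists the planned renames; piping that output to sh (or removing the echo) would perform them.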

Finding the file name in a directory with a pattern

I need to find the latest file - filename_YYYYMMDD in the directory DIR.
The command below does not work because the field position shifts from run to run, depending on the number of spaces between columns (mostly around the file size field, whose width differs every time).
Please suggest another way.
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | cut -d " " -f9`
You can use awk to print the last field, like below:
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | awk '{print $NF}'`
cut may not be an option here, because it treats every single space as a field separator.
If I understand correctly, you want to loop through each file in the directory and find the largest 'YYYYMMDD' value and the file name associated with it. You can use simple POSIX parameter expansion with substring removal to isolate the 'YYYYMMDD' part and compare it against a variable initialized to zero, updating that variable (latest) to hold the largest 'YYYYMMDD' seen so far as you loop over all files in the directory. Each time you find a larger 'YYYYMMDD', you store the name of the file.
For example, you could do something like:
#!/bin/sh
name=
latest=0
for i in *; do
test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }
done
printf "%s\n" "$name"
Example Directory
$ ls -1rt
filename_20120615
filename_20120612
filename_20120115
filename_20120112
filename_20110615
filename_20110612
filename_20110115
filename_20110112
filename_20100615
filename_20100612
filename_20100115
filename_20100112
Example Use/Output
$ name=; latest=0; \
> for i in *; do \
> test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }; \
> done; \
> printf "%s\n" "$name"
filename_20120615
Where the script selects filename_20120615 as the file with the greatest 'YYYYMMDD' of all files in the directory.
Since you are using only tools provided by the shell itself, it doesn't need to spawn subshells for each pipe or utility it calls.
Give it a test and let me know if that is what you intended, let me know if your intent was different, or if you have any further questions.
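The loop above looks at every entry in the current directory, so test would complain about any file whose suffix is not a number. A slightly more defensive sketch, written for bash rather than plain sh, restricts the search to filename_* inside a given directory and skips anything without an eight-digit suffix. The function name latest_report and the strict eight-digit check are assumptions added for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of the newest filename_YYYYMMDD
# file in the directory given as $1, or return 1 if there is none.
latest_report() {
    local dir=$1 f stamp latest=0 name=
    for f in "$dir"/filename_*; do
        stamp=${f##*_}                        # text after the last underscore
        # skip anything whose suffix is not exactly eight digits
        [[ $stamp =~ ^[0-9]{8}$ ]] || continue
        if (( 10#$stamp > latest )); then
            latest=$((10#$stamp))
            name=$f
        fi
    done
    [ -n "$name" ] && printf '%s\n' "$name"
}
```

It could be used as report=$(latest_report "$DIR"), with the exit status telling the caller whether any matching file was found.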

Indexing files and parsing name

I have a directory, ./grd_files/lat36/, that has 7 files in it (n36e114.grd, n36e115.grd, n36e116.grd, n36e117.grd, n36e118.grd, n36e119.grd, n36e120.grd). Also beneath ./grd_files/ are other folders named lat37, lat38, lat39. Each contains some files named in the same format as those in lat36, only instead of n36e114.grd, the file for the e114 longitude in the lat37 folder would be called n37e114.grd. Now, not all lat** folders contain all the longitudes, but I need them to.
I have written a part of the script to determine which lat** folder has the most columns in it (it is lat36 with 7 longitudes). I want to compare the longitudes that exist in lat36 folder to the other folders, and if a column is missing in another folder, I will make it. I can handle the if then statement, but I am stumped on how to compare the lists in bash.
I was thinking of making a list of the file names in the row1 folder and comparing that to the files in the other folders, but the names won't and shouldn't match -- only the column part of the name will and should match. So far I have tried to make an array of the file names and then parse it for just the column part of the name. Note that these are actually map tiles, so the names are really in the format of coordinates in northing (row) and easting (col), e.g. n36e114.grd. So I want to isolate all the e114-style parts of the names and make sure that they exist in the other rows. I hope that makes sense. Below is what I attempted, but I am not great at bash syntax, so I'm stumped. Thanks so much for the help.
col_list_raw=( $(find $maxdirectory -name "*.grd" -exec basename {} .grd \;) )
col_list=( for c in ${col_list_raw[@]}; do echo ${col_list_raw[$c]:3:7}; done )
where $maxdirectory is the one with the most columns.
UPDATE: I have removed what I described in italics above and attempted to incorporate the solution from John1024. Below is the code.
cd ./grd_files
for row in lat*/
do
ls "$row" | sed 's/.*lon/lon/' >"${row%/}.tmp"
done
for f in lat*.tmp
do
grep -vFf "$f" ${latXX}.tmp >missing.tmp
[ -s missing.tmp ] && echo ${f%.tmp} is missing $(cat missing.tmp)
done
cd ..
Where latXX is the folder with the most longitudes. John1024's first loop works nicely, and I get the correct lists for each of the lat** folders, but the second loop straight up compares the lists , returning:
lat37 is missing n36e114.grd n36e115.grd n36e116.grd n36e117.grd n36e118.grd n36e119.grd n36e120.grd
lat38 is missing n36e114.grd n36e115.grd n36e116.grd n36e117.grd n36e118.grd n36e119.grd n36e120.grd
lat39 is missing n36e114.grd n36e115.grd n36e116.grd n36e117.grd n36e118.grd n36e119.grd n36e120.grd
I need that loop to compare only part of the file name. ie I want to check each folder for the existence of each longitude. So that if file `n37e114.grd' exists, nothing happens, but if it does not exist, that information is returned and I can execute a command based on the missing file. I hope my edits clear up the naming convention and are understandable. Thanks again for the help. AM
SOLUTION:
Thanks to the help of @John1024 I was able to find a solution. I have reproduced the final solution below. Following this, I read in the *.out files and run my command on each line of them.
cd ./grd_files
for lat in */
do
ls "$lat" | sed 's/[a-z][0-9][0-9].*\([a-z][0-9][0-9]*\)\.grd/\1/' >"${lat%/}.tmp"
done
for file in *.tmp
do
lat=$(echo $file | awk -F "." '{print $1}')
grep -vFf "$file" ${xXX}.tmp >${lat}missing.out
[ -s ${lat}missing.out ] && echo ${file%.tmp} is missing $(cat ${lat}missing.out)
done
The question includes two different naming schemes for the files. Both would work the same, but to keep it simple and intuitive, this answer uses the first scheme.
It is possible to loop through bash arrays to find the missing columns. However, grep is well-suited to this task, greatly simplifies the logic, and, if there are many columns and rows, it is likely much faster. Using grep:
cd ./grd_files
for row in row*/
do
ls "$row" | sed 's/.*col/col/' >"${row%/}.tmp"
done
for f in row*.tmp
do
grep -vFf "$f" row1.tmp >missing.tmp
[ -s missing.tmp ] && echo ${f%.tmp} is missing $(cat missing.tmp)
done
The first loop above creates lists of the columns that exist in each of the rows. These lists are saved in temporary files named row1.tmp, row2.tmp, etc.
The second loop compares each of those lists to the reference row, row1.tmp. The list of columns missing from that row is saved in the temporary file missing.tmp. If missing.tmp has a nonzero size, then there are missing columns and a report is generated.
For cleanup, one might want to delete the tmp files. If so, add this line to the end of the script:
rm row*.tmp missing.tmp
Fancier version
Using process substitution, the need for many of the temporary files can be eliminated:
trap "rm missing.tmp" EXIT
for row in row*/
do
ls row1/ | sed 's/.*col/col/' | grep -vFf <(ls "$row" | sed 's/.*col/col/') >missing.tmp
[ -s missing.tmp ] && echo $row is missing $(cat missing.tmp)
done
This version also uses trap to assure that the sole remaining temporary file is removed when the script is finished.
Using the other naming scheme as per revised question
cd ./grd_files
for row in lat*/
do
ls "$row" | sed 's/.*n[0-9][0-9]e/e/' >"${row%/}.tmp"
done
for f in lat*.tmp
do
grep -vFf "$f" ${latXX}.tmp >missing.tmp
[ -s missing.tmp ] && echo ${f%.tmp} is missing $(cat missing.tmp)
done
cd ..
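The same grep idea can be wrapped in a function that compares one folder against a reference folder without any temporary files. This is only a sketch: the name missing_longitudes is made up, it assumes file names of the form nNNeNNN.grd, and it adds -x to grep so that a pattern such as e11 cannot match e114 by accident.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the longitude parts (e.g. e115) that exist
# in the reference directory $1 but are missing from directory $2.
missing_longitudes() {
    local ref_dir=$1 other_dir=$2
    # n36e114.grd -> e114, for both directories
    ls "$ref_dir" | sed 's/^n[0-9]*\(e[0-9]*\)\.grd$/\1/' |
        grep -vFxf <(ls "$other_dir" | sed 's/^n[0-9]*\(e[0-9]*\)\.grd$/\1/')
}
```

For instance, missing_longitudes lat36 lat37 would list every eNNN present in lat36 that has no counterpart in lat37, one per line, ready to be read by a while loop that creates the missing tiles.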
As I told you in the comment, supplying test data is good practice. In this case you would have got many more answers if you had supplied a script that creates a test case, something like the following:
mkdir grid
cd grid
mkdir lat3{5..9}
#if you don't know the {5..9} expansion, simply write
#mkdir lat35 lat36 lat37 lat38 lat39
touch lat35/n35e111.grd
touch lat36/n36e11{4..9}.grd lat36/n36e120.grd
touch lat37/n37e11{4,6,8}.grd
touch lat38/n38e11{4..9}.grd
#39 missing all files
A script like this, which creates a test case, helps much more than a full page of words. ;) Or, if not a script, at least supply the output of find, like find grid -print. Your first edit helps a bit (I missed it), and +100 to @John1024's work.
Now about the solution.
Your final solution has one problem. What if the directory with the MOST LONGITUDES (your latXX) is missing some grid file that exists in some other directory? E.g. it has the most grid files, but still not all of them. In the above test case, lat36 contains 7 files (the most of all), but is still missing the file n36e111.grd (because 111 exists only in lat35).
Therefore I created an alternative solution, which eliminates this problem and shows the result as the following matrix:
     111  114  115  116  117  118  119  120
35:  +    no   no   no   no   no   no   no   # the 111 is here
36:  no   +    +    +    +    +    +    +    # the dir with the MOST longitudes, but missing 111
37:  no   +    no   +    no   +    no   no
38:  no   +    +    +    +    +    +    no
39:  no   no   no   no   no   no   no   no   # missing all longitudes
the script
start="./test/grid"
cd "$start" || { echo "can't cd to $start" >&2; exit 1; }
known_longs=$(find . -type f -name \*.grd -print | sed 's:.*/n.*e\([0-9][0-9]*\)\.grd:\1:' | sort -u)
known_lats=$(find . -type d -print | grep -oP 'lat\K\d+(?=/?)' | sort -u)
print_matrix() {
echo -ne "\t"
paste -s - <<<"$known_longs"
for lat in $known_lats
do
echo -en "$lat:"
for long in $known_longs
do
[[ -e "./lat${lat}/n${lat}e${long}.grd" ]] && echo -en "\t+" || echo -en "\tno"
done
echo
done
}
print_matrix
The logic is simple:
search for all known longitudes, i.e. for the file names that contain eNNN
search for all known latitudes, i.e. for the directories named latNN
in a loop, test the existence of each file
The matrix printed above is probably not very useful, because you probably want to do something with the found or missing files, so here is an action variant of the script.
start="./test/grid"
cd "$start" || { echo "can't cd to $start" >&2; exit 1; }
known_longs=$(find . -type f -name \*.grd -print | sed 's:.*/n.*e\([0-9][0-9]*\)\.grd:\1:' | sort -u)
known_lats=$(find . -type d -print | grep -oP 'lat\K\d+(?=/?)' | sort -u)
do_if_exists() {
local xlat="$1"
local xlong="$2"
filename="n${xlat}e${xlong}.grd"
#do nothing
}
do_if_missing() {
local xlat="$1"
local xlong="$2"
filename="n${xlat}e${xlong}.grd"
echo "from lat$xlat missing $filename"
}
do_actions() {
for lat in $known_lats
do
for long in $known_longs
do
[[ -e "./lat${lat}/n${lat}e${long}.grd" ]] && do_if_exists $lat $long || do_if_missing $lat $long
done
done
}
do_actions
which performs an action for each missing file (here it echoes what is missing); the output is the following:
from lat35 missing n35e114.grd
from lat35 missing n35e115.grd
from lat35 missing n35e116.grd
from lat35 missing n35e117.grd
from lat35 missing n35e118.grd
from lat35 missing n35e119.grd
from lat35 missing n35e120.grd
from lat36 missing n36e111.grd
from lat37 missing n37e111.grd
from lat37 missing n37e115.grd
from lat37 missing n37e117.grd
from lat37 missing n37e119.grd
from lat37 missing n37e120.grd
from lat38 missing n38e111.grd
from lat38 missing n38e120.grd
from lat39 missing n39e111.grd
from lat39 missing n39e114.grd
from lat39 missing n39e115.grd
from lat39 missing n39e116.grd
from lat39 missing n39e117.grd
from lat39 missing n39e118.grd
from lat39 missing n39e119.grd
from lat39 missing n39e120.grd
Of course, it is possible to optimise further, for example:
run find only once (this helps if the directory tree is large) to create a list of file names
don't test each file on disk, but test for the file name in the previously created list
as in the following:
startdir="./test/grid"
(cd "$startdir" || { echo "can't cd to $startdir" >&2; exit 1; }
gridlist="/tmp/gridlist.$$"
trap "rm -f $gridlist;exit" 0 2
# comm needs sorted input, so sort the list once here
find . -regex '\./lat[0-9][0-9]*.*' -print | sort >$gridlist
known_longs=($(sed -n 's:^.*/n[0-9][0-9]*e\([0-9][0-9]*\)\.grd$:\1:p' $gridlist | sort -u))
known_lats=($(grep -oP '/lat\K\d+((?=/?)|$)' $gridlist | sort -u))
full_list() {
for lat in ${known_lats[@]}
do
for long in ${known_longs[@]}
do
echo "./lat${lat}/n${lat}e${long}.grd"
done
done
}
comm -13 $gridlist <(full_list | sort)) | while read missing
do
#do something with the missing file
echo "$missing"
done

Need to pick Latest File From a Dir Using Shell Script

I am new to shell scripting and I have a requirement to pick the latest files from a directory using a shell script.
Directory name: FTPDIR
Files in this directory will look like:
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
Note: I need to pick the latest one from each group. Below is the output which I need to get after executing the shell script:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
Ex: In the latest file APC5502015VP092020121451.csv, the number 092020121451 is the date part in the format MMDDYYYYHHMM and the string part is APC5502015VP (the length of the string part is not fixed).
I need to pick those three files from the directory using a shell script.
Can you help me to resolve this?
It's going to be really problematic to do this safely in just bash. As Jonathan mentioned, "special" characters like spaces or newlines may bung up your script.
If we can assume that there won't be any of those, then we can do most of job in bash, without involving other tools.
# Make an associative array to record types, in the second loop...
declare -A a
for file in *.csv; do
# First, we convert the filenames into something that can be sorted.
# The next three lines account for your "unknown length" in the first part
# of the filename. We assume the date+time is the 12 chars before ".csv".
new="$(rev <<<"$file")"
new="${new:4:12}"
new="$(rev <<<"$new")"
new="${new:4:4}${new:0:2}${new:2:2}${new:8:4}"
len=$(( ${#file} - 16 ))
echo "$new ${file:0:$len} $file"
done | sort -r | while read date type file; do
# Next, we print only the first (that is, the newest) of each "type"...
if [[ ${a[$type]} -eq 0 ]]; then
a[$type]=1
echo "$file"
fi
# And stop once we have collected three types.
if [[ ${#a[*]} -ge 3 ]]; then
break
fi
done
As I say, this doesn't handle newlines in filenames.
Note also that this uses rev and sort, which are not built in to bash. The rev parts could be done internally, using more code, which might make them execute faster, but you'd only see a difference in very extreme cases. There's not much we can do about sort, since there isn't a built-in within bash.
This Perl script works on the given data. No doubt it could be improved.
#!/usr/bin/env perl
use strict;
use warnings;
my %bases;
while (<>)
{
chomp;
my $name = $_;
my($prefix, $mmdd, $yyyy, $hhmm) = ($name =~ m/(.*)(\d{4})(\d{4})(\d{4})\.csv/);
#print "$name = $prefix $yyyy $mmdd $hhmm\n";
my $stamp = "$yyyy$mmdd$hhmm";
if (!exists($bases{$prefix}) || ($stamp > $bases{$prefix}->{stamp}))
{
$bases{$prefix} = { name => $name, stamp => $stamp };
}
}
foreach my $prefix (sort keys %bases)
{
print "$bases{$prefix}->{name}\n";
}
Output:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
This is the awk solution:
cd FTPDIR
ls -1|awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}'
Tested below:
> cat temp
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
> awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}' temp
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
APC5502015VP092020121451.csv
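The awk one-liner above compares the stamps in their original MMDDYYYYHHMM order, which gives the right answer for the sample data but would rank December 2020 above January 2021. A possible variant, sketched below, reorders each stamp to YYYYMMDDHHMM before comparing. The function name latest_per_group is made up, and it assumes, as the bash answer does, that the stamp is always the 12 characters before .csv.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read file names on stdin and print, for each
# prefix, the name carrying the newest MMDDYYYYHHMM stamp.
latest_per_group() {
    awk '
    {
        base = $0
        sub(/\.csv$/, "", base)
        n = length(base)
        if (n < 12) next                   # too short to hold a stamp
        stamp  = substr(base, n - 11)      # last 12 characters: MMDDYYYYHHMM
        prefix = substr(base, 1, n - 12)   # group name, any length
        # reorder to YYYYMMDDHHMM so that string order equals time order
        key = substr(stamp, 5, 4) substr(stamp, 1, 4) substr(stamp, 9, 4)
        if (!(prefix in best) || key > best[prefix]) {
            best[prefix] = key
            name[prefix] = $0
        }
    }
    END { for (p in name) print name[p] }
    ' | sort
}
```

It can be fed from the directory listing, for example ls FTPDIR | latest_per_group, and the final sort only makes the output order predictable.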
