Need to pick Latest File From a Dir Using Shell Script - bash

I am new to shell scripting and I have a requirement to pick the latest files from a directory using a shell script.
Directory name: FTPDIR
Files in this directory look like this:
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
Note: I need to pick the latest file from each group. Below is the output I need to get after executing the shell script:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
Ex: In the latest file APC5502015VP092020121451.csv, the number 092020121451 is the date part in the format MMDDYYYYHHMM and the string part is APC5502015VP (the length of the string part is not fixed).
I need to pick those three files from the directory using a shell script.
Can you help me to resolve this?

It's going to be really problematic to do this safely in just bash. As Jonathan mentioned, "special" characters like spaces or newlines may bung up your script.
If we can assume that there won't be any of those, then we can do most of the job in bash, without involving other tools.
# Make an associative array to record types, in the second loop...
declare -A a
for file in *.csv; do
    # First, we convert the filenames into something that can be sorted.
    # The next three lines account for your "unknown length" in the first part
    # of the filename. We assume the date+time is the 12 chars before ".csv".
    new="$(rev <<<"$file")"
    new="${new:4:12}"
    new="$(rev <<<"$new")"
    # Reorder MMDDYYYYHHMM into YYYYMMDDHHMM so that it sorts by date.
    new="${new:4:4}${new:0:2}${new:2:2}${new:8:4}"
    len=$(( ${#file} - 16 ))
    echo "$new ${file:0:$len} $file"
done | sort -r | while read -r date type file; do
    # Next, we print only the first of each "type". Because the sort is
    # reversed, the first one seen is the newest.
    if [[ ${a[$type]} -eq 0 ]]; then
        a[$type]=1
        echo "$file"
    fi
    # And stop once we have collected three types.
    if [[ ${#a[*]} -ge 3 ]]; then
        break
    fi
done
As I say, this doesn't handle newlines in filenames.
Note also that this uses rev and sort, which are not built in to bash. The rev parts could be done internally, using more code, which might make them execute faster, but you'd only see a difference in very extreme cases. There's not much we can do about sort, since there isn't a built-in within bash.
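The rev calls mentioned above can be replaced with plain parameter expansion. The following is only a sketch under the same assumptions as the script above (a 12-character MMDDYYYYHHMM stamp directly before ".csv", no whitespace in names); the function name latest_per_prefix is made up for this example, and it still relies on sort and awk for the grouping:

```shell
# Print the newest file of each prefix group, one per line.
# Assumes names look like <prefix><MMDDYYYYHHMM>.csv with no whitespace.
# Each file becomes "YYYYMMDDHHMM prefix filename"; a reverse sort puts the
# newest first, awk keeps the first line per prefix, the last sort tidies up.
latest_per_prefix() {
    for file in "$@"; do
        base=${file%.csv}
        prefix=${base%????????????}      # everything before the last 12 chars
        stamp=${base#"$prefix"}          # the 12-char MMDDYYYYHHMM stamp
        mmdd=${stamp%????????}
        hhmm=${stamp#????????}
        yyyy=${stamp#????}
        yyyy=${yyyy%????}
        echo "$yyyy$mmdd$hhmm $prefix $file"
    done | sort -r | awk '!seen[$2]++ { print $3 }' | sort
}
```

It could be called as: cd FTPDIR && latest_per_prefix *.csv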

This Perl script works on the given data. No doubt it could be improved.
#!/usr/bin/env perl
use strict;
use warnings;
my %bases;
while (<>)
{
    chomp;
    my $name = $_;
    my($prefix, $mmdd, $yyyy, $hhmm) = ($name =~ m/(.*)(\d{4})(\d{4})(\d{4})\.csv/);
    #print "$name = $prefix $yyyy $mmdd $hhmm\n";
    my $stamp = "$yyyy$mmdd$hhmm";
    if (!exists($bases{$prefix}) || ($stamp > $bases{$prefix}->{stamp}))
    {
        $bases{$prefix} = { name => $name, stamp => $stamp };
    }
}
foreach my $prefix (sort keys %bases)
{
    print "$bases{$prefix}->{name}\n";
}
Output:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv

This is the awk solution:
cd FTPDIR
ls -1|awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}'
Tested below:
> cat temp
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
> awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}' temp
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
APC5502015VP092020121451.csv
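Note that the awk command above compares the raw MMDDYYYYHHMM text, which orders files correctly only when they all fall in the same year, and it assumes every prefix ends in "VP". A sketch that reorders the stamp into YYYYMMDDHHMM before comparing, assuming the 12 characters before ".csv" are always the stamp (newest_by_stamp is a hypothetical name):

```shell
# Read filenames on stdin; print the newest file of each prefix, sorted.
# Assumes every name ends in <MMDDYYYYHHMM>.csv.
newest_by_stamp() {
    awk '{
        base   = substr($0, 1, length($0) - 4)       # drop ".csv"
        stamp  = substr(base, length(base) - 11)     # last 12 characters
        prefix = substr(base, 1, length(base) - 12)
        # Reorder MMDDYYYYHHMM into YYYYMMDDHHMM so string order is date order.
        key = substr(stamp, 5, 4) substr(stamp, 1, 4) substr(stamp, 9, 4)
        if (key > best[prefix]) { best[prefix] = key; name[prefix] = $0 }
    }
    END { for (p in name) print name[p] }' | sort
}
```

Usage would be something like: ls -1 | newest_by_stamp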

Related

Bash Bulk Rename Folders with 3-Digit Prefix and Delimiter

I have a series of folders that I'd like to rename with a prefix number and delimited text. For instance:
% ls
blue green keyboard pictures red tango yellow
flyer gum orange pop runner videos
rename to:
% ls
001-blue 002-green 003-keyboard 004-pictures 005-red 006-tango 007-yellow
008-flyer 009-gum 010-orange 011-pop 012-runner 013-videos
I am using the following to rename except that after 009, I then have 0010, 0011, and so on. I would like to keep prefix numbers to 3 digits.
% i=0; for x in *; do; mv "$x" "00$i-$x" ; i=$((i + 1)); done
I know the problem is in the mv command because of the hard-coded 00 in the destination name, but I don't know how to change that to a 3-digit exclusive destination name with the $i variable.
Thanks in advance.
Use this Perl one-liner:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and system $cmd for @ARGV;' *
To do a dry run and print the intended commands without renaming any files, use print instead of system, like so:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and print $cmd for @ARGV;' *
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
See also the docs for sprintf.
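The same zero-padding can be done in the shell itself with printf, which also avoids building a command string and so copes with spaces in names. A minimal sketch, assuming the directories to rename are all the directories in the current directory; number_dirs and its "dry" argument are invented for this example:

```shell
# Prefix every directory in the current directory with a 3-digit counter.
# Pass "dry" as the first argument to print the mv commands instead.
number_dirs() {
    i=0
    for x in */; do
        [ -d "$x" ] || continue          # skips the literal "*/" if nothing matches
        x=${x%/}
        i=$((i + 1))
        new=$(printf '%03d-%s' "$i" "$x")   # %03d pads the counter to 3 digits
        if [ "$1" = dry ]; then
            echo mv -- "$x" "$new"
        else
            mv -- "$x" "$new"
        fi
    done
}
```

The glob is expanded once before the loop starts, so directories that have already been renamed are not picked up again.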

shell script compare file with multiple line pattern

I have a file which is created after some manual configuration.
I need to check this file automatically with a shell script.
The file looks like this:
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
I want to compare it to a pattern like this:
eth0;eth0;*
eth1;eth1;*
eth2;eth2;*
eth3;eth3;*
eth4;eth4;*
eth5;eth5;*
eth6;eth6;*
eth7;eth7;*
If I would only have to check this pattern I could run this loop:
c=0
while [ $c -le 7 ]
do
    if [ "$(grep "eth"${c}";eth"${c}";*" current_mapping)" ];
    then
        echo "eth$c ok"
    fi
    (( c++ ))
done
There are 6 or more different patterns possible. A pattern could also look like this for example (depending and specific configuration requests):
eth4;eth0;*
eth5;eth1;*
eth6;eth2;*
eth7;eth3;*
eth0;eth4;*
eth1;eth5;*
eth2;eth6;*
eth3;eth7;*
So I don't think I can run a standard grep per line command in a loop. The eth numbers are not consistently the same.
Is it possible somehow to compare the whole file to pattern like it would be possible with grep for a single line?
Assuming file is your data file and patt is your file that contains above pattern. You can use this grep -f in conjunction with sed in a process substitution that replaces * with .* and ? with . to make it a workable regex.
grep -f <(sed 's/\*/.*/g; s/?/./g' patt) file
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
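One caveat: grep -f prints a line if it matches any of the patterns, so it does not verify that line N of the file matches line N of the pattern. A sketch of a positional check, assuming the pattern file uses shell-style * and ? wildcards and must have the same number of lines as the data file (matches_pattern is a hypothetical name):

```shell
# Succeed only if line N of the data file matches glob pattern N of the
# pattern file for every N, and both files have the same number of lines.
# The body runs in a subshell, so "exit" only leaves the function.
matches_pattern() (
    data=$1
    patt=$2
    [ $(wc -l < "$data") -eq $(wc -l < "$patt") ] || exit 1
    # Read the two files in lockstep on separate file descriptors.
    while IFS= read -r line <&3 && IFS= read -r pat <&4; do
        case $line in
            $pat) ;;            # unquoted, so * and ? act as wildcards
            *)    exit 1 ;;
        esac
    done 3< "$data" 4< "$patt"
    exit 0
)
```

With one pattern file per allowed configuration, a script could then try each in turn, for example: matches_pattern file patt && echo "mapping ok"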
I wrote this loop now and it does the job (current_mapping being the file with the content in the first code block of the question). I would have to create arrays with different patterns and use a case for every pattern. I was just wondering if there is something like grep for multiple lines that could do the same without writing this loop.
array=("eth0;eth0;*" "eth1;eth1;*" "eth2;eth2;*" "eth3;eth3;*" "eth4;eth4;*" "eth5;eth5;*" "eth6;eth6;*" "eth7;eth7;*")
c=1
while [ $c -le 8 ]
do
    if [ ! "$(sed -n "${c}"p current_mapping | grep "${array[$c-1]}")" ];
    then
        echo "somethings wrong"
    fi
    (( c++ ))
done
Try any:
grep -P '(eth[0-9]);\1'
grep -E '(eth[0-9]);\1'
sed -n '/\(eth[0-9]\);\1/p'
awk -F';' '$1 == $2'
These are the commands only; apply them to a pipe or a file.
Updated the answer after the question was edited.
As we can see the task requirements are as follows:
a file (a set of lines) formatted like ethN;ethM;MAC
examine each line for equality ethN and ethM
if they are equal, output a string ethN ok
If I understand the task correctly we can achieve this using the following code without loops:
awk -F';' '$1 == $2 { print $1, "ok" }'

How do you compress multiple folders at a time, using a shell?

There are n folders in the directory named after the date, for example:
20171002 20171003 20171005 ...20171101 20171102 20171103 ...20180101 20180102
Note: the dates are not contiguous.
I want to compress every three folders within each month into one archive.
For example:
tar jcvf mytar-20171002_1005.tar.bz2 20171002 20171003 20171005
How can I write a shell script to do this?
You need to loop over the directories, parse the month out of each directory name, and start a new archive whenever the month changes or three directories have been collected.
prev_month=""
group=()
# Archive the directories collected in "group", then empty it.
compress_group() {
    [ "${#group[@]}" -eq 0 ] && return
    local first=${group[0]}
    local last=${group[${#group[@]}-1]}
    tar jcvf "mytar-${first}_${last:4:4}.tar.bz2" "${group[@]}"
    group=()
}
for dir in */; do
    dir=${dir%/}
    month=${dir:0:6} # here month will be year plus month
    # Flush the group when the month changes or the group is full.
    if [ "$month" != "$prev_month" ] || [ "${#group[@]}" -eq 3 ]; then
        compress_group
    fi
    group+=("$dir")
    prev_month=$month
done
compress_group # flush the last, possibly shorter, group
This code is not tested and may contain syntax errors. compress_group names each archive after the first and last directory in the group, as in the question.
Assuming you don't have too many directories (I think the limit is several hundred), then you can use Bash's array manipulation.
So, you first load all your directory names into a Bash array:
dirs=( $(ls) )
(I'm going to assume files have no spaces in their names, otherwise it gets a bit dicey)
Then you can use Bash's array slice syntax to pop 3 elements at a time from the array:
while [ "${#dirs[@]}" -gt 0 ]; do
    dirs_to_compress=( "${dirs[@]:0:3}" )
    dirs=( "${dirs[@]:3}" )
    # do something with dirs_to_compress
done
The rest should be pretty easy.
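For the "do something" step, the archive name from the question can be derived from the first and last directory of each group. A small sketch, assuming directory names are always YYYYMMDD; archive_name is a made-up helper, and note that the slicing loop above does not by itself stop a group at a month boundary:

```shell
# Build the archive name used in the question from a group of directories:
# the first directory in full, then the MMDD part of the last one.
archive_name() {
    first=$1
    for last in "$@"; do :; done      # leaves "last" set to the final argument
    mmdd=${last#????}                 # drop the leading YYYY
    echo "mytar-${first}_${mmdd}.tar.bz2"
}
```

Inside the loop this could be used as: tar jcvf "$(archive_name "${dirs_to_compress[@]}")" "${dirs_to_compress[@]}"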
You can achieve this with xargs, a bash while loop, and awk:
ls | xargs -n3 | while read line; do
tar jcvf $(echo $line | awk '{print "mytar-"$1"_"substr($NF,5,4)".tar.bz2"}') $line
done
unset folders
declare -A folders
g=3
for folder in $(ls -d */); do
    folders[${folder:0:6}]+="${folder%%/} "
done
for folder in "${!folders[@]}"; do
    for((i=0; i < $(echo ${folders[$folder]} | tr ' ' '\n' | wc -l); i+=g)) do
        group=(${folders[$folder]})
        groupOfThree=(${group[@]:i:g})
        tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[@]}
    done
done
This script finds all folders in the current directory, separates them into groups by month, makes groups of at most three folders, and creates a .tar.bz2 for each group with the name you used in the question.
I tested it with those folders:
20171101 20171102 20171103 20171002 20171003 20171005 20171007 20171009 20171011 20171013 20180101 20180102
And the created tars are:
mytar-20171002_1005.tar.bz2
mytar-20171007_1011.tar.bz2
mytar-20171013_1013.tar.bz2
mytar-20171101_1103.tar.bz2
mytar-20180101_0102.tar.bz2
Hope that helps :)
EDIT: If you are using bash version < 4.2 then replace the line:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[@]}
by:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[`expr ${#groupOfThree[@]} - 1`]:4:4}.tar.bz2 ${groupOfThree[@]}
That's because bash version < 4.2 doesn't support negative indices for arrays.

Changing words in text files using multiple dictionaries

I have a bunch of files which need to be translated using custom dictionaries. Each file contains a line indicating which dictionary to use. Here's an example:
*A:
!
=1
*>A_intro
1r
=2
1r
=3
1r
=4
1r
=5
2A:maj
*-
In the file above, *A: indicates to use dictA.
I can translate this part easily using the following syntax:
sed -f dictA < myfile
My problem is that some files require a change of dictionary half way in the text. For example:
*B:
1B:maj
2E:maj/5
2B:maj
2E:maj/5
*C:
2F:maj/5
2C:maj
2F:maj/5
2C:maj
*-
I would like to write a script to automate the translation process. Using this example, I would like the script to read the first line, select dictB, use dictB to translate each line until it reads *C:, select dictC, and then keep going.
Thanks @Cyrus. That was useful. Here's what I ended up doing.
#!/bin/bash
key="sedDictNull.txt"
while read -r line || [ -n "$line" ] ## Makes sure that the last line is read. See http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line
do
    if [[ $line =~ ^\*[Aa]:$ ]]
    then
        key="sedDictA.txt"
    elif [[ $line =~ ^\*[Aa]#:$ ]]
    then
        key="sedDictA#.txt"
    fi
    echo "$line" | sed -f "$key"
done < "$1"
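The if/elif chain above needs one branch per dictionary. A sketch that derives the dictionary file name from the header line instead, assuming (this is not stated in the question) that keys are a single letter A to G with an optional "#", and that the dictionaries follow the sedDict<KEY>.txt naming used above; translate is a hypothetical name:

```shell
# Translate a file whose "*X:" header lines select the dictionary for the
# lines that follow. Assumes dictionaries are sed scripts named
# sedDict<KEY>.txt (e.g. sedDictA.txt, sedDictA#.txt) in the current directory.
translate() {
    key=sedDictNull.txt
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '*'[A-Ga-g]: | '*'[A-Ga-g]'#':)
                name=${line#'*'}          # strip the leading "*"
                name=${name%:}            # strip the trailing ":"
                # Upper-case the letter so "*a:" and "*A:" pick the same file.
                key="sedDict$(printf '%s' "$name" | tr 'a-g' 'A-G').txt"
                ;;
        esac
        printf '%s\n' "$line" | sed -f "$key"
    done < "$1"
}
```

As in the script above, sed is started once per line, which is simple but slow on large files.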
I assume your "dictionaries" are really sed scripts that search and replace, like this:
s/2C/nothing/;
s/2B/something/;
You could reorganize these scripts into sections, like this:
/^\*B:/, /^\*[^B]/ {
s/1B/whatever/;
s/2B/something/;
}
/^\*C:/, /^\*[^C]/ {
s/2C/nothing/;
s/2B/something/;
}
And, of course, you could do that on the fly:
for dict in B C
do echo "/^\\*$dict:/, /^\\*[^$dict]/ {"
cat dict.$dict
echo "}"
done | sed -f- dict.in

Using sed on text files with a csv

I've been trying to do bulk find and replace on two text files using a csv. I've seen the questions that SO suggests, and none seem to answer my question.
I've created two variables for the two text files I want to modify. The csv has two columns and hundreds of rows. The first column contains strings (none have whitespaces) already in the text file that need to be replaced with the corresponding strings in same row in the second column.
As a test, I tried the script
#!/bin/bash
test1='long_file_name.txt'
find='string1'
replace='string2'
sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1
This was successful, except that I need to do it once for every row in the csv, using the values given by the csv in each row. My hunch is that my while loop was used wrongly, but I can't find the error. When I execute the script below, I get the command line prompt, which makes me think that something has happened. When I check the text files, nothing's changed.
The two text files, this script, and the csv are all in the same folder (it's also been my working directory when I do this).
#!/bin/bash
textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'
while IFS=, read f1 f2
do
sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
mv $textfile1.tmp $textfile1
sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
mv $textfile2.tmp $textfile2
done <'findreplace.csv'
It seems to me that this code should do what I want it to do (but doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?
The csv looks like this, but with hundreds of rows. All a_i's should be replaced with their counterpart b_i in the next column over.
a_1 b_1
a_2 b_2
a_3 b_3
Something to note: All the strings actually contain underscores, just in case this affects something. I've tried wrapping the variable name in braces a la ${var}, but it still doesn't work.
I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)
If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and creates the Aho-Corasick automaton only once per invocation, which makes this solution pretty efficient (O(M+N) instead of O(M*N) in your solution).
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
        shift;
        my $backup_extension = $1;
        my $backup_name = $backup_extension =~ /\*/
            ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }
            : sub { shift . $backup_extension };
        my $oldargv = '-';
        sub {
            if ( $ARGV ne $oldargv ) {
                rename( $ARGV, $backup_name->($ARGV) );
                open( ARGVOUT, '>', $ARGV );
                select(ARGVOUT);
                $oldargv = $ARGV;
            }
        };
    }
    : sub { };
die "$0: File with replacements required." unless @ARGV;
my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    $re = join '|', map quotemeta, keys %replace;
    $re = qr/($re)/;
};
while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue {print}
Usage:
./replace.pl replace.csv <file.in >file.out
as well as
./replace.pl replace.csv file.in >file.out
or in-place
./replace.pl -i replace.csv file1.csv file2.csv file3.csv
or with backup
./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv
or with a backup using a placeholder
./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
You should convert your CSV file to a sed script with the following command:
awk -F, '{print "s/" $1 "/" $2 "/g";}' replace.csv > sed.script
And then you will be able to do a one-pass replacement:
sed -i -f sed.script longfilename.txt
This will be a faster implementation of what you want to do.
BTW, sorry, but I do not understand what is wrong with your script which should work except if your CSV file has more than 2 columns.
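The generated script breaks if a string contains a character that is special to sed, such as "/", "." or "&". A sketch of a converter that escapes those and also strips carriage returns, in case the CSV was saved with Windows line endings (an assumption; the question does not say so). The name csv_to_sed is invented for this example:

```shell
# Convert "find,replace" CSV lines on stdin into a sed script on stdout.
csv_to_sed() {
    tr -d '\r' |
    while IFS=, read -r find replace || [ -n "$find" ]; do
        [ -n "$find" ] || continue
        # Escape characters that are special in a basic regex or in "s/.../".
        find=$(printf '%s\n' "$find" | sed 's/[][\.*^$/]/\\&/g')
        # Escape characters that are special in the replacement text.
        replace=$(printf '%s\n' "$replace" | sed 's/[\&/]/\\&/g')
        printf 's/%s/%s/g\n' "$find" "$replace"
    done
}
```

It would slot into the commands above as: csv_to_sed < replace.csv > sed.script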

Resources