Match closest filename based on timestamp - shell

We have backups stored in S3 and need to retrieve a backup based on the nearest timestamp. The files are stored in S3 in YYYYMMDD_HHMM.tar.gz format:
20181009_0910.tar.gz
20181004_0719.tar.gz
20180925_0414.tar.gz
20180915_2210.tar.gz
Given the timestamp 20180922_1020, we need to fetch the file 20180925_0414.tar.gz using a shell script.
Thanks.

Using a Perl one-liner (of course, somewhat lengthy!).
I understand "nearest" to mean the shortest distance from the input in either direction. That is, if you have two files, Oct-1st.txt and Oct-30.txt, and the input is Oct-20, then the Oct-30 file will be the output.
$ ls -l *2018*gz
-rw-r--r-- 1 xxxx xxxx 0 Oct 17 00:04 20180915_2210.tar.gz
-rw-r--r-- 1 xxxx xxxx 0 Oct 17 00:04 20180925_0414.tar.gz
-rw-r--r-- 1 xxxx xxxx 0 Oct 17 00:04 20181004_0719.tar.gz
-rw-r--r-- 1 xxxx xxxx 0 Oct 17 00:04 20181009_0910.tar.gz
$ export input=20180922_1020
$ perl -ne 'BEGIN { @VAR=@ARGV; $in=$ENV{input}; $in=~s/_//g;foreach(@VAR) {$x=$_;s/\.tar\.gz//g;s/_//g;s/(\d+)/abs($1-$in)/e;$KV{$_}=$x};$res=(sort {$a<=>$b} keys %KV)[0]; print "$KV{$res}"} ' 2018*gz
20180925_0414.tar.gz
$ export input=20180905_0101
$ perl -ne 'BEGIN { @VAR=@ARGV; $in=$ENV{input}; $in=~s/_//g;foreach(@VAR) {$x=$_;s/\.tar\.gz//g;s/_//g;s/(\d+)/abs($1-$in)/e;$KV{$_}=$x};$res=(sort {$a<=>$b} keys %KV)[0]; print "$KV{$res}"} ' 2018*gz
20180915_2210.tar.gz
$
Hope this helps!
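Since the question asks for a shell script: below is a minimal bash/awk sketch of the same numeric-difference idea, assuming the filenames are available locally via ls; for S3 you would feed it the object names from aws s3 ls instead (adjusting the field extraction). Like the Perl one-liner, it treats YYYYMMDDHHMM as a plain number, which is an approximation rather than a true time difference.
#!/bin/bash
# Sketch: print the filename whose YYYYMMDD_HHMM stamp is numerically
# closest to $input (same approximation as the Perl one-liner above).
input=20180922_1020

ls 2018*.tar.gz | awk -v tgt="${input//_/}" '
{
    ts = $0
    gsub(/\.tar\.gz$/, "", ts)     # strip the extension
    gsub(/_/, "", ts)              # 20180922_1020 -> 201809221020
    d = ts - tgt; if (d < 0) d = -d
    if (best == "" || d < bestd) { best = $0; bestd = d }
}
END { print best }'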

How to automate concatenating several series of files using bash extended globbing and negation patterns?

Thank you so much for any advice and feedback on this matter.
This is my situation:
I have a directory with several hundred files that all start with foo and end with .txt. In between, they differ by a unique identifier of the form "Group#.#", so the files look like this:
foo.Group1.1.txt
foo.Group1.2.txt
foo.Group1.4.txt
foo.Group2.45.txt
.
.
.
foo.Group16.9.txt
The files begin with Group1 and end at Group16. They are simple one-column text files; each file has several thousand lines, and each row is a number.
I want to do a series of concatenations with these files in which I concatenate all but the "Group1" files, then all the files except "Group1" and "Group2", then all the files except "Group1", "Group2", and "Group3", and so on, until I am left with just the last group, "Group16".
In order to do this I use a bash extended globbing expression with a negation syntax to concatenate all files except those that have "Group1" as their ID.
I make a directory "jacks" and output the concatenated file into a txt file within this subdirectory:
cat !(*Group1.*) > jacks/jackknife1.freqs.txt
I can then continue using this command, but adding "Group2" and "Group3" for subsequent concatenations.
cat !(*Group1.*|*Group2.*) > jacks/jackknife2.freqs.txt
cat !(*Group1.*|*Group2.*|*Group3.*) > jacks/jackknife3.freqs.txt
Technically, this works, and with only 16 groups it isn't too terrible to do manually.
But I am wondering if there is a way, perhaps using loops or bash scripting, to automate this process and speed it up?
I would appreciate any advice or leads on this question!
thank you very much,
daniela
Some experiments with bash globbing
Try using echo before cat to preview what the glob matches!
touch foo.Group{1..3}.{1..5}.txt
ls -l
total 0
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.5.txt
Then
echo !(*Group1.*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.3.txt foo.Group3.4.txt foo.Group3.5.txt
Ok, and
echo !(*Group[23].*)
foo.Group1.1.txt foo.Group1.2.txt foo.Group1.3.txt foo.Group1.4.txt foo.Group1.5.txt
Or
echo !(*Group*(1|3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt
Or even
echo !(*Group*(1|*.3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.4.txt foo.Group3.5.txt
and
echo !(*Group*(1|*.[2-4]).*)
foo.Group2.1.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.5.txt
I will let you think about the last two samples! ;-)
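For the automation part, the exclusion pattern can be grown inside a loop. A minimal sketch, assuming bash with extglob, groups numbered 1 through 16, and a directory that only holds the foo.Group*.txt files (the same assumption your own cat commands make):
#!/bin/bash
# Sketch: write jacks/jackknife1..15; jackknife15 ends up containing only Group16.
shopt -s extglob
mkdir -p jacks

pattern="jacks"                               # exclude the output directory itself
for i in {1..15}; do
    pattern+="|*Group${i}.*"                  # grows to jacks|*Group1.*|*Group2.*|...
    cat !($pattern) > "jacks/jackknife${i}.freqs.txt"
done
The unquoted $pattern inside !( ) is expanded before pathname expansion, so the | alternatives stay active as pattern characters.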

Does anybody have a script that counts the number of consecutive files which contain a specific word?

Any resources or advice would help, since I am pretty rubbish at scripting.
So, I need to go to this path: /home/client/data/storage/customer/data/2020/09/15
And check to see if there are 5 or more consecutive files that contain the word "REJECTED":
ls -ltr
-rw-rw-r-- 1 root root 5059 Sep 15 00:05 customer_rlt_20200915000514737_20200915000547948_8206b49d-b585-4360-8da0-e90b8081a399.zip
-rw-rw-r-- 1 root root 5023 Sep 15 00:06 customer_rlt_20200915000547619_20200915000635576_900b44dc-1cf4-4b1b-a04f-0fd963591e5f.zip
-rw-rw-r-- 1 root root 39856 Sep 15 00:09 customer_rlt_20200915000824108_20200915000908982_b87b01b3-a5dc-4a80-b19d-14f31ff667bc.zip
-rw-rw-r-- 1 root root 39719 Sep 15 00:09 customer_rlt_20200915000901688_20200915000938206_38261b59-8ebc-4f9f-9e2d-3e32eca3fd4d.zip
-rw-rw-r-- 1 root root 12829 Sep 15 00:13 customer_rlt_20200915001229811_20200915001334327_1667be2f-f1a7-41ae-b9ca-e7103d9abbf8.zip
-rw-rw-r-- 1 root root 12706 Sep 15 00:13 customer_rlt_20200915001333922_20200915001357405_609195c9-f23a-4984-936f-1a0903a35c07.zip
Example of rejected file:
customer_rlt_20200513202515792_20200513202705506_5b8deae0-0405-413c-9a81-d1cc2171fa51REJECTED.zip
What I have so far:
#!/bin/bash
YYYY=$(date +%Y)
MM=$(date +%m)
DD=$(date +%d)
#Set constants
CODE_OK=0
CODE_WARN=1
CODE_CRITICAL=2
CODE_UNKNOWN=3
#Set Default Values
FILE="/home/client/data/storage/customer/data/${YYYY}/${MM}/${DD}"
if [ ! -d "$FILE" ]
then
    echo "NO TRANSACTIONS FOUND"
    exit $CODE_CRITICAL
fi
You can do something quick in AWK:
$ cat consec.awk
/REJECTED/ {
    if (match_line == NR - 1) {
        consecutives++
    } else {
        consecutives = 1
    }
    if (consecutives == 5) {
        print "5 REJECTED"
        exit
    }
    match_line = NR
}
$ touch 1 2REJECTED 3REJECTED 5REJECTED 6REJECTED 7REJECTED 8
$ ls -1 | awk -f consec.awk
5 REJECTED
$ rm 3REJECTED; touch 3
$ ls -1 | awk -f consec.awk
$
This works by matching lines containing REJECTED, counting consecutive matches (match_line == NR - 1 means "the last matching line was the previous line"), and printing "5 REJECTED" once five consecutive matches have been seen.
I've used ls -1 (note digit 1, not letter l) to sort by filename in this example. You could use ls -1rt (digit 1 again) to sort by file modification time, as in your original post.
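To plug that back into your wrapper, here is a rough sketch using the consec.awk file above and your exit-code constants; it assumes the dated directory layout from your post:
#!/bin/bash
# Sketch: exit CRITICAL if the day's directory is missing or if it contains
# 5 or more consecutive files whose names contain REJECTED.
CODE_OK=0
CODE_CRITICAL=2

DIR="/home/client/data/storage/customer/data/$(date +%Y)/$(date +%m)/$(date +%d)"

if [ ! -d "$DIR" ]; then
    echo "NO TRANSACTIONS FOUND"
    exit $CODE_CRITICAL
fi

# List names oldest-first (like ls -ltr) and scan them with consec.awk.
if ls -1rt "$DIR" | awk -f consec.awk | grep -q REJECTED; then
    echo "5 OR MORE CONSECUTIVE REJECTED FILES"
    exit $CODE_CRITICAL
fi

exit $CODE_OK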

I don't want to print repeated lines based on columns 6 and 7

I don't want to print repeated lines based on columns 6 and 7; sort -u does not seem to help.
cat /tmp/testing :-
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
-rwxrwxr-x. 1 root root 27441394 Feb 27 13:12 /home/something/bin/keychain_exec
-rwxrwxr-x. 1 root root 45570820 Feb 27 13:11 /home/something/bin/wallnut_exec
-rwxrwxr-x. 1 root root 10942993 Feb 27 13:12 /home/something/bin/log_exec
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
When I try cat /tmp/testing | sort -u -k 6,6 -k 7,7 I get :-
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
Desired output is below, as that is the only file that differs from the others in the month and day columns:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
To avoid printing repeated lines based on columns 6 and 7 using awk, you could:
$ awk '
++seen[$6,$7]==1 {        # count seen instances
    keep[$6,$7]=$0        # keep first seen ones
}
END {                     # in the end
    for(i in seen)
        if(seen[i]==1)    # the ones seen only once
            print keep[i] # get printed
}' file                   # from file, or pipe your ls to the awk
Output for given input:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
Notice: All standard warnings against parsing ls output still apply.
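If you want to keep the original line order, another common idiom is a two-pass awk: the first pass counts each month/day key, the second prints only the lines whose key occurred once. A sketch against the same file:
awk 'NR==FNR { cnt[$6,$7]++; next } cnt[$6,$7]==1' /tmp/testing /tmp/testing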
Tried with GNU sed:
sed -E '/^\s*(\S+\s+){5}Feb\s+27/d' testing
Tried with GNU awk (this compares each line's month and day against the first line's, so it assumes the repeated date is the one on line 1):
awk 'NR==1{a=$6$7;next} a!=$6$7{print}' testing

How to get a filename list with ncftp?

So I tried
ncftpls -l
which gives me a list
-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817083 Jan 29 15:52 1548773521.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817085 Jan 29 15:53 1548773582.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817090 Jan 29 15:54 1548773642.tar.gz
But all I want is to check the timestamp (which is the name of the tar.gz).
How do I get only the list of timestamps?
As requested: all I wanted to do was delete old backups, so awk was a good idea (at least it was effective), even if those weren't quite the right parameters. My method of deleting old backups is probably not the best, but it works:
ncftpls *authParams* | (awk '{match($9,/^[0-9]+/, a)}{ print a[0] }') | while read fileCreationDate; do
    VALIDITY_LIMIT="$((`date +%s`-600))"
    a=$VALIDITY_LIMIT
    b=$fileCreationDate
    if [ $b -lt $a ]; then
        deleteFtpFile $b
    fi
done
You can use awk to display only the timestamp-named files from the listing, like so:
ncftpls -l | awk '{ print $9 }'
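If you want just the numeric timestamp without the .tar.gz suffix, a small variation using awk's sub():
ncftpls -l | awk '{ sub(/\.tar\.gz$/, "", $9); print $9 }'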

Using Finger in Bash to display unique names

I'm trying to write a shell script that displays unique names, user names, and dates using the finger command.
Right now when I enter finger, it displays:
Login Name Tty Idle Login Time Office
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
2xxxx xxxx pts/23 2 Dec 2 21:35 (108.252.136.12)
2zzzz zzzz pts/61 13 Dec 2 20:46 (24.4.205.223)
2yyyy yyyy pts/32 57 Dec 2 21:06 (205.154.255.145)
1zzz zzz pts/35 37 Dec 2 20:56 (71.198.36.189)
1zzz zzz pts/48 12 Dec 2 20:56 (71.198.36.189)
I would like the script to eliminate the duplicate values of the user name and display the output like this:
xyz (1xyz) Dec 2 18:24
xxxx (2xxxx) Dec 2 21:35
zzzz (2zzzz) Dec 2 20:46
yyyy (2yyyy) Dec 2 21:06
zzz (1zzz) Dec 2 20:56
The name is in the first column, the user name is in parentheses, and the date is in the last column.
Thanks in Advance!
Ugly but should work.
finger | sed 's/\t/ /' | sed 's/pts\/[0-9]* *[0-9]*//' | awk '{print $2"\t("$1")\t"$3" "$4" "$5}' | sort | uniq
Unique names with sort -u is the easy part.
If you only want to parse the data in your example, you can try matching all the strings in one command:
finger | sed 's/^\([^ ]*\) *\([^ ]*\) *pts[^A-Z]*\([^(]*\).*/\2\t(\1)\t\3/'
However, this is hard work and bound to fail. My finger returns:
Login Name Tty Idle Login Time Where
notroot notroot *:0 - Nov 26 15:30 console
notroot notroot pts/0 7d Nov 26 15:30
notroot notroot *pts/1 - Nov 26 15:30
You can try to improve the sed command; good luck with that!
I think the only way is to look at the columns: read the finger output one line at a time and slice each line into parts with ${line:start:len} (removing the spaces afterwards). Keep a careful count, and watch out for that_user_with_a_long_name.
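A rough sketch of that slicing idea follows; the fixed offsets are guesses based on the sample header above and will almost certainly need adjusting to your actual finger column widths:
finger | tail -n +2 | while IFS= read -r line; do
    login=$(echo "${line:0:9}"  | tr -d ' ')   # Login column  (guessed width)
    name=$(echo "${line:9:13}"  | tr -d ' ')   # Name column   (guessed width)
    when=$(echo ${line:38:16})                 # Login Time    (guessed offset; unquoted echo squeezes spaces)
    printf '%s\t(%s)\t%s\n' "$name" "$login" "$when"
done | sort -u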
