remove files within specific time range - bash

I have a directory of several files, and each file contains a timestamp at the end of its name, in the format .%Y-%m-%dT%H-%M-%S, like the following:
filename.2021-02-12T10-29-59
filename.2021-02-11T05-04-30
filename.2021-02-10T00-12-30
filename.2021-02-10T20-30-30
...
I'm writing a script that deletes every file whose timestamp is more than 3 days older than the current date.
For example, if the current date is 2021-02-10 (from the date '+%Y-%m-%d' command), it should delete all files older than 2021-02-07.

You can do this with GNU awk and xargs:
awk -F\. '{ newdat=gensub("[T-]"," ","g",$2);if (mktime(newdat)<( strftime("%s")-259200)) { print $0 } }' <(for i in filename*;do echo $i;done) | xargs rm
Loop over the files in the directory matching filename* and feed the list into awk. Take the second "."-delimited field and replace all "T" and "-" characters with a space, storing the result in the variable newdat. This is then passed to the mktime function to compare the file's date, in epoch format, to the current epoch date (obtained with strftime) minus 259200 (the number of seconds in 3 days). If the file's timestamp is more than 3 days in the past, print the filename and pipe the output through to xargs rm to remove the file(s).
Use xargs echo as opposed to rm to first verify that the files are listed as expected.
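If you would rather stay in plain bash, a minimal sketch along the same lines (assuming GNU date and filenames exactly in the filename.<timestamp> form shown above) could be:
cutoff=$(date -d '3 days ago' +%s)       # GNU date: epoch seconds three days back
for f in filename.*; do
    stamp=${f#filename.}                 # e.g. 2021-02-12T10-29-59
    d=${stamp%%T*}                       # date part: 2021-02-12
    t=${stamp#*T}; t=${t//-/:}           # time part rewritten as 10:29:59
    if [ "$(date -d "$d $t" +%s)" -lt "$cutoff" ]; then
        echo rm -- "$f"                  # drop the echo once the listed files look right
    fi
done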

Related

Adding part of filename as column to csv files, then concatenate

I have many csv files that look like this:
data/0.Raw/20190401_data.csv
(Only the date in the middle of the filename changes)
I want to concatenate all these files together, but add the date as a new column in the data to be able to distinguish between the different files after merging.
I wrote a bash script that adds the full path and filename as a column in each file, and then merges into a master csv. However, I am having trouble getting rid of the path and the extension so that only the date portion is kept.
The bash script:
#! /bin/bash
mkdir data/1.merged
for i in "data/0.Raw/"*.csv; do
    awk -F, -v OFS=, 'NR==1{sub(/\_data.csv$/, "", FILENAME) } NR>1{ $1=FILENAME }1' "$i" |
        column -t > "data/1.merged/"${i/"data/0.Raw/"/""}""
done
awk 'FNR > 1' data/1.merged/*.csv > data/1.merged/all_files
rm data/1.merged/*.csv
mv data/1.merged/all_files data/1.merged/all_files.csv
using "sub" I was able to remove the "_data.csv" part, but as a result the column gets added as "data/0.Raw/20190401" - that is, I am having trouble removing both the part before the date as well as the part after the date.
I tried replacing sub with gensub to regex match everything except the 8 digits in the middle but that does not seem to work either.
Any ideas on how to solve this?
Thanks!
You can process and concatenate all the files with a single awk call:
awk '
    FNR == 1 {
        date = FILENAME
        gsub(/.*\/|_data\.csv$/, "", date)
        next
    }
    { print date "," $0 }
' data/0.Raw/*_data.csv > all_files.csv
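If you also want to keep a single header row in the merged output, a small variation of the same script might work (a sketch, assuming every input file carries the same header):
awk '
    FNR == 1 {
        date = FILENAME
        gsub(/.*\/|_data\.csv$/, "", date)
        if (NR == 1) print "date," $0    # keep the header from the first file, with a date column name prepended
        next
    }
    { print date "," $0 }
' data/0.Raw/*_data.csv > all_files.csv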
However, I am having trouble getting rid of the path and the extension
to only keep the date portion
Then take a look at the basename command:
basename NAME [SUFFIX]
Print NAME with any leading directory components removed. If
specified, also remove a trailing SUFFIX.
Example
basename 'data/0.Raw/20190401_data.csv' _data.csv
gives output
20190401
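Plugged into the loop from the question, that could look something like this (a sketch that drops the column -t step and otherwise keeps the original structure; d replaces the first column of the data rows, as in the original script):
for i in "data/0.Raw/"*.csv; do
    date=$(basename "$i" _data.csv)      # e.g. 20190401
    awk -F, -v OFS=, -v d="$date" 'NR>1{ $1=d }1' "$i" > "data/1.merged/$(basename "$i")"
done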

Associate all the filenames at different paths along with their time interval in Unix

I have multiple files (with .txt or .ext format) in different directories.
The file paths are stored in a variable, say var.
I want to pick all the filenames as well as the time interval (in hours) since each file was last placed.
The time interval will be current time - the last modification time.
Let's say
The file is in the /Files/New directory with the time shown below:
-rwxrwxrwx 1 ad.sam unx_9998_access 0 Nov 9 08:43 out.txt
I want the file name, i.e. out.txt, and the interval (in hours) together.
I want to do this for all the files in the different paths (in the var variable).
So the expected output is:
out.txt,12
abc.txt,9
pqr.txt,7
I am able to pull those details separately in different variables like below:
Files_in_Path=`ls -ltr | awk '{ print $9 }'`
TIMEDIFF=echo $(( ($(date +%s) - $(stat $Files_in_Path -c %Y)) / 3600 ))
But I am not able to associate them together as filename,interval for all the files.
It's not really clear what your expected output is. If it's enough to print the file name and its age side by side, try
now=$(date +%s)
for file in ./*; do
    then=$(stat "$file" -c '%Y')
    printf '%s,%i\n' "$file" $(( (now - then) / 3600 ))
done
Notice also how we don't use ls in scripts, and, more tangentially, that
TIMEDIFF=echo $((1))
doesn't actually assign the evaluated value of $((1)) to TIMEDIFF -- instead, it temporarily assigns the string echo to TIMEDIFF and attempts to run the expanded value as a command (so you would get 1: command not found unless you happen to have a command named 1).
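If the paths really live in a variable as described, a sketch along the same lines (assuming var holds a whitespace-separated list of directories and GNU stat is available) might be:
now=$(date +%s)
for dir in $var; do                          # var is left unquoted on purpose so it splits into paths
    for file in "$dir"/*; do
        [ -f "$file" ] || continue           # skip anything that is not a regular file
        then=$(stat -c '%Y' "$file")         # modification time in epoch seconds
        printf '%s,%i\n' "$(basename "$file")" $(( (now - then) / 3600 ))
    done
done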

Awk - filter only dates with certain format from text file

I have a .txt file with many lines of text on macOS. I would like to filter only dates and have them saved in order of appearance line by line in a new text file.
I am, however, not interested in all dates, only in those that are complete, looking like 02/03/2019, and those where the day number is below 13, i.e. 01...12.
Then, I would like to have those dates removed where the numbers for the day and month are the same, like 01/01/2019 and 02/02/2019, etc.
How can I achieve this with awk or similar software in bash?
If perl is a choice:
perl -ne 'print if m:(\d\d)/(\d\d)/(\d\d\d\d): && $1 < 13 && $1 != $2' dates.txt >newdates.txt
This assumes the format dd/mm/yyyy.
Note that I am using a m: : notation instead of the usual / / for regex matching. Thus I do not need to escape the / slashes in the date.
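Since the question asks about awk, roughly the same filter can be sketched with GNU awk (gawk is needed for the three-argument match(); like the perl version, it prints the whole matching line and assumes dd/mm/yyyy):
gawk 'match($0, /([0-9][0-9])\/([0-9][0-9])\/([0-9][0-9][0-9][0-9])/, m) && m[1]+0 < 13 && m[1] != m[2]' dates.txt > newdates.txt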
Deleting Dates Inside a Text File
The following command will delete all dates of the form✱ aa/bb/cccc where aa = bb < 13. The original file will be copied to yourFile.txt.bak as a backup and the new text with deleted dates will overwrite the old file.
sed -E -i.bak 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b::g' yourFile.txt
If you want to insert something instead of just deleting the dates you can do so by writing the replacement between the last two colons. For instance sed … 's:…:deleted date:g' … will replace each matching date with the text deleted date.
✱ Note that it doesn't matter for your criterion whether the date format is dd/mm/yyyy or mm/dd/yyyy since you are only interested in dates where dd and mm are equal.
Extracting Specific Dates From A Text File
If you do not want to delete, but only extract specific dates as mentioned in your comment, you can use the following command.
grep -Eo '\b([0-9]{2}/){2}[0-9]{4}\b' yourFile.txt | awk -F/ '$1<13 && $1!=$2'
This will extract all dates in dd/mm/yyyy (!) format where mm ≠ dd < 13. The dates are printed to stdout in order of appearance in the input. If you want to save them to a file, append > yourOutputFile.txt to the end of the command.

How to separate month's worth of timestamped data by day

I have a .log file which restarts at the beginning of each month, each message beginning with the following timestamp format: 01-07-2016 00:00:00:868|
There are thousands of messages per day and I'd like to create a short script which can figure out when the date increments and output each date to a new file with just that day's data. I'm not proficient in bash but I'd like to use sed or awk, as it's very useful for automating processes at my job and creating reports.
The script below will split the input log file into multiple files, with the date added as a suffix to the input file name:
split_logfile_by_date
#!/bin/bash
exec < "$1"
while read -r line
do
    date=$(echo "$line" | cut -d" " -f 1)
    echo "$line" >> "$1.$date"
done
Example:
$ ls
log
$ split_logfile_by_date log
$ ls
log log.01-07-2016 log.02-07-2016 log.03-07-2016
awk '{out = FILENAME "." $1; print > out}' logfile
This will write all the 01-07-2016 records to the file logfile.01-07-2016. (The output file name is kept in a variable called out rather than log, since log is a built-in awk function.)
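On large logs with many distinct days, some awk implementations can run out of open file descriptors; a variant of the same idea that closes each day's file when the date changes might look like this (a sketch):
awk '
    $1 != prev {                    # first record, or the date changed
        if (out != "") close(out)   # close the file for the previous day
        prev = $1
        out = FILENAME "." $1
    }
    { print > out }
' logfile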

Printing the time of files in shell script

I am trying to print the time of all the files using the following shell script. But I see that bytes 42 to 46 are not always the time, as the offset changes due to more/fewer bytes in the username and other details. Is there another way to fetch the time?
#!/bin/sh
for file in `ls `
do
#echo `ls -l $file`
echo `ls -l $file | cut -b 42-46`
done
Use awk.
Try ls -l | awk '{ print $6, $7, $8 }'
This will print the 6th, 7th and 8th fields of the ls -l output, split on whitespace.
If the fields are different for you, change the numbers to select the right fields.
The output from ls varies depending on the age of the files. For files less than about 6 months old, it is the month, day, time (in hours and minutes); for files more than about 6 months old, it prints the month, day, year.
The stat command can be used to get more accurate times.
For example, to print the time of the last modification and file name of some text files, try:
stat -c '%y %n' *.txt
From the manual:
%x Time of last access
%X Time of last access as seconds since Epoch
%y Time of last modification
%Y Time of last modification as seconds since Epoch
%z Time of last change
%Z Time of last change as seconds since Epoch
man stat
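If only the clock time is wanted (what cut -b 42-46 was trying to grab), a small sketch that avoids parsing ls, assuming GNU date with the -r option, could be:
for file in ./*; do
    printf '%s %s\n' "$(date -r "$file" '+%H:%M')" "$file"   # modification time of the file as HH:MM
done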
