I'm developing a bash script that needs to find files within a single directory that are "old", based on a variable that specifies how many days must pass before the threshold is exceeded and the files are marked for action (which could be anything from move to archive to delete, etc.).
The catch is that the modification time of the file is irrelevant in determining how old the files are before action is taken, as the files may be changed infrequently, the execution time of the script can vary, etc.
The time that determines how old the files are is in the actual file name, in the form YYYY-MM-DD (or %F with the date command). Take, for instance, the filename contents-2011-05-23.txt. What command(s) could be run in this directory to find all files that exceed a certain number of days (I have the threshold currently set to 7 days, but it could change) and print out their file names?
Create a bash script isOld.sh like this:
#!/bin/bash
fileName=$1
numDays=$2
# Extract the YYYY-MM-DD part between the first "-" and the extension
fileDt=$(echo "$fileName" | sed 's/^[^-]*-\([^.]*\)\..*$/\1/')
d1=$(date '+%s')               # now, in seconds since the epoch
d2=$(date -d "$fileDt" '+%s')  # the file's date, in seconds since the epoch
diff=$((d1 - d2))
seconds=$((numDays * 24 * 60 * 60))
[[ $diff -ge $seconds ]] && echo "$fileName"
Then give execute permission to the above file by running:
chmod +x ./isOld.sh
And finally, run this find command from the top of your directory to print files older than 7 days:
find . -name "contents-*" -exec ./isOld.sh {} 7 \;
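For comparison, the same check can be done inline with GNU date, without a helper script; a minimal sketch, assuming filenames of exactly the form contents-YYYY-MM-DD.txt in the current directory:
threshold=7
cutoff=$(( $(date '+%s') - threshold * 86400 ))
for f in contents-*.txt; do
    d=${f#contents-}    # strip the "contents-" prefix
    d=${d%.txt}         # strip the ".txt" suffix, leaving YYYY-MM-DD
    (( $(date -d "$d" '+%s') <= cutoff )) && printf '%s\n' "$f"
done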
On BSD, the -j flag prevents date from trying to set the system date, and the -f parameter sets the format of the input date.
First, you need to find today's date as the number of seconds since January 1, 1970:
today=$(date -j +%s)
Now, you can use that to find out the time seven days ago:
((cutoff = $today - 604800))
The number 604800 is the number of seconds in seven days.
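If the threshold needs to stay configurable, the same cutoff can be derived from a day count instead of a hard-coded constant, e.g.:
numDays=7
((cutoff = today - numDays * 24 * 60 * 60))   # 604800 when numDays is 7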
Now, for each file in your directory, you need to find the date part of the string. I don't know of a better way. (Maybe someone knows some Bash magic).
find . -type f | while read -r fileName
do
fileDate=$(echo "$fileName" | sed 's/.*-\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/')
yadda, yadda, yadda #Figure this out later
done
Once we have the file date, we can use the date command to figure out whether that date, in seconds, is less than (and thus older than) the cutoff date:
today=$(date -j +%s)
((cutoff = $today - 604800))
find . -type f | while read -r fileName #Or however you get all the file names
do
fileDate=$(echo "$fileName" | sed 's/.*-\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/')
fileDateInSeconds=$(date -j -f "%Y-%m-%d" "$fileDate" +%s)
if [ "$fileDateInSeconds" -lt "$cutoff" ]
then
rm "$fileName"
fi
done
In Linux, you use the -d parameter to define the date which must be in YYYY-MM-DD format:
today=$(date +"%Y-%m-%d")
Now, you can take that and find the number of seconds:
todayInSeconds=$(date -d "$today" +%s)
Everything else should be more or less the same as above.
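For completeness, a sketch of the assembled GNU/Linux version, using the same sed extraction as above (filenames without a date in them would make date(1) complain, so filter with -name first if needed):
#!/bin/bash
cutoff=$(( $(date +%s) - 604800 ))   # seven days ago, in epoch seconds
find . -type f | while read -r fileName
do
fileDate=$(echo "$fileName" | sed 's/.*-\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/')
fileDateInSeconds=$(date -d "$fileDate" +%s)
if [ "$fileDateInSeconds" -lt "$cutoff" ]
then
rm "$fileName"
fi
done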
If you run the command daily, you could do this:
echo *-`date -d '8 days ago' '+%F'`.txt
Additional wildcards could be added, of course.
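To act on the expanded names rather than just echo them, the same glob works in a loop; a small sketch (the existence test skips the unexpanded pattern when nothing matches):
for f in *-$(date -d '8 days ago' '+%F').txt; do
    [ -e "$f" ] || continue    # no match: the pattern itself comes through
    echo "$f"                  # or mv/rm/archive as needed
done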
find *[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]*.txt -exec bash -c 'dt=`echo $0 | sed -re "s/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/"`; file_time=`date -d $dt +%s`; cutoff_time=`date -d "31 days ago" +%s` ; test $file_time -lt $cutoff_time ' {} \; -print
That's one of my longest one liners :-) Here it is again wrapped:
find *[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]*.txt \
    -exec bash -c '
        dt=$(echo "$0" | sed -re "s/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/")
        file_time=$(date -d "$dt" +%s)
        cutoff_time=$(date -d "31 days ago" +%s)
        test "$file_time" -lt "$cutoff_time"
    ' {} \; -print
I have multiple jobs that run based on file indicators.
I am looking to build a unix script to flag each job based on whether the file is present for the current day or the previous day.
I am maintaining a csv file with the below records in it and an Interval column (which is in hours).
If the difference between the current time and the modification time of the file (in hours) is more than what is in the csv file, then it will be flagged as an old-day file.
-sh-4.2$ cat scorecard_file_details.csv
Scorecard_Name,Path,FileName,Time_Interval_HRS
Scorecard_LDO_ABC_BTS,/NAS/IDQ/Bank_SEN,ABC.EXT,12
Scorecard_LDO_PQR_BTS,/NAS/IDQ/Bank_Prof,PQR.EXT,6
The files arrive at different paths, given by the Path column in the above csv file.
Now, I want to match the file name from the csv with the filename at its corresponding path, and write the result, maybe to another file (filename, path, flag).
I have come up with the below script, but it is currently not returning anything at the first step, the for loop over the file names (it's incomplete as of now).
Can anyone please help with why the below for loop is not returning anything, although the cat works fine?
Also, any help with the logic is appreciated.
set -x
CSV_File_Path=/NAS/Target/DQ
for FileName in $(cat scorecard_file_details.csv | awk -F "," '{ print $3 }'); do
echo $Filename
CURTIME=$(date +%s)
File_Path=`awk '{ print $2 }'` $FileName
cd $File_Path
Files_in_Path=`ls -ltr | awk '{ print $9 }'`
for files in $Files_in_Path ; do
if [[ "$Files_in_Path" = $FileName ]]; then
TIMEDIFF=echo $(( ($(date +%s) - $(stat $files -c %Y)) / 3600 ))
echo $files","$TIMEDIFF >> /NAS/Target/DQ/file_with_difference.txt
else
echo "File is not present"
fi
done
<<Logic to flag based on time difference and interval>>
done
set +x
I highly suggest you redesign your script to use the find command with its -printf format option and a combination of the -mtime | -ctime | -mmin | -cmin filters.
Using the find command you can filter on a combination of time differences, type, name, path and more, and also assign actions to the found files.
Please read an intro tutorial and the detailed manpage.
In short, you can push the time difference calculation into the find command, and then operate on the found files.
For example:
$ find /tmp -mmin +60 -and -mmin -150 -printf "%p %AF %AT \n"
Find files in the /tmp folder.
The -mmin +60 -and -mmin -150 part selects files modified more than 60 minutes ago and less than 150 minutes ago.
Print each found file as file path=%p, last access date=%AF, last access time=%AT.
Output:
/tmp/Low 2020-11-12 20:16:45.1960274000
/tmp/StructuredQuery.log 2020-11-12 20:55:00.3057165000
/tmp/~DF3465C10364E2CAFE.TMP 2020-11-12 20:16:45.7578495000
/tmp/~DFAC3AC652357DBBED.TMP 2020-11-12 20:16:46.1726618000
/tmp/~DFC2B1A30DCA4CA52A.TMP 2020-11-12 20:16:46.3941610000
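Applied to the CSV-driven setup in the question, the per-file interval can feed find's -mmin directly; a sketch, assuming the column layout of scorecard_file_details.csv and whole-hour intervals:
#!/bin/bash
# Read Path, FileName and Time_Interval_HRS (skipping the header line)
# and flag files whose mtime exceeds the interval.
tail -n +2 scorecard_file_details.csv |
while IFS=, read -r scorecard path fname hrs; do
    if find "$path" -maxdepth 1 -name "$fname" -mmin +"$((hrs * 60))" | grep -q .; then
        echo "$fname,$path,OLD"
    else
        echo "$fname,$path,OK"   # note: a missing file also lands here
    fi
done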
I want to use logic that allows the find command to find all files older than today's date.
Using the below gives a rolling 24-hour window from the current time:
find /home/test/ -mtime +1
I am trying to achieve a solution where, no matter what time it executes from cron, it will check for all files older than the start of the day at 00:00. I believe this can be achieved using epoch times, but I am struggling to find the best logic for this.
#!/bin/ksh
touch -t $(date +%Y%m%d0000.00) fence
find /home/test/ ! -newer fence -exec \
sh -c '
for f in "$@"; do
[[ $f -ot fence ]] && printf "%s\n" "$f"
done
' sh {} +
rm fence
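With GNU find, the fence file can be avoided entirely: -newermt accepts a date string, and a bare %F-style date is taken as midnight of that day. A sketch:
find /home/test/ -type f ! -newermt "$(date +%F)"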
Why does find(1) have no -older expression? :-(
UNIX find: opposite of -newer option exists?
This question already has answers here: How can I compare numbers in Bash? (10 answers). Closed 3 years ago.
I have a recurring process that checks whether a file has aged x minutes. In a situation where it has, I move the file over to a new directory.
However, I noticed that files are being moved instantly. Any ideas what could be causing the issue?
# Expected age time = 10 minutes
EXPECTED_AGE_TIME=10
# How long the file has actually aged
ACTUAL_AGE_TIME=$((`date +%s` - `stat -L --format %Y $FILE`))
if [[ $ACTUAL_AGE_TIME > $((EXPECTED_AGE_TIME * 60)) ]]; then
mv $FILE ./loaded/$FILE
fi
Building on the comments above suggesting find: apply find to the single file:
find "$FILE" -mmin +10 -exec mv '{}' ./loaded/ \;
This will eliminate messy date math, formatting of dates, ...
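Since this is a recurring process anyway, the whole job can collapse into one crontab entry; a sketch, with /path/to/watch and its loaded/ subdirectory as placeholders:
*/10 * * * * find /path/to/watch -maxdepth 1 -type f -mmin +10 -exec mv '{}' /path/to/watch/loaded/ \;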
Checking relative age of files can be done by Bash's built-in file date comparison operator -ot.
See help test:
FILE1 -nt FILE2 True if file1 is newer than file2 (according to modification date).
FILE1 -ot FILE2 True if file1 is older than file2.
#!/usr/bin/env bash
declare -- TIME_FILE
TIME_FILE="$(mktemp)" || exit 1 # Failed to create temp-file
trap 'rm -- "$TIME_FILE"' EXIT # Purge the temp-file on exit
declare -i EXPECTED_AGE_TIME=10
# Set the time of the reference $TIME_FILE to $EXPECTED_AGE_TIME minutes ago
touch --date "$((EXPECTED_AGE_TIME)) min ago" "$TIME_FILE"
# If $FILE is older than $TIME_FILE, then move it
[[ "$FILE" -ot "$TIME_FILE" ]] && mv -- "$FILE" "./loaded/$FILE"
This question already has answers here: Delete all files older than 30 days, based on file name as date (3 answers). Closed 3 years ago.
I have CSV files that get updated every day; we process the files and delete those older than 30 days, based on the date in the filename.
Example filenames:
XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
I would like to schedule a job in crontab to delete files older than 30 days, daily.
The path could be /mount/store/
XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
if [ $(date -d '-30 days' +%Y%m%d) -gt $D ]; then
rm -rf $D
fi
The above script doesn't seem to help me. Kindly help me with this.
I have been trying this for the last two days.
Using CENTOS7
Thanks.
For all files:
Extract the date
touch the file with that date
delete files with the -mtime option
Do this in the desired dir for all files:
f=XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
d=$(echo $f | sed -r 's/[^_]+_[^_]+_(20[0-9]{6})\.[0-9]{6}_.\.csv/\1/')
touch -d $d $f
After performing that for the whole dir, delete the older-thans:
find YourDir -type f -mtime +30 -name "*.csv" -delete
GNU find has the -delete option. Other finds might need -exec rm {} + instead.
Test before deleting. Another pitfall is the different kinds of dates touch can affect (mtime, ctime, atime).
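If only the modification time should be rewritten, touch's -m flag restricts the update to mtime; a small sketch:
touch -m -d "$d" "$f"   # sets mtime only; atime is left alone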
Test, manipulating the date with touch:
touch XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
f=XXXXXXXXXXX_xx00xx_20171001.000000_0.csv; d=$(echo $f | sed -r 's/[^_]+_[^_]+_(20[0-9]{6})\.[0-9]{6}_.\.csv/\1/'); touch -d $d $f
ls -l $f
-rw-rw-r-- 1 stefan stefan 0 Okt 1 00:00 XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
An efficient way to extract the date from the filename is to use variable expansions:
f=XXXXXXXXXXX_xx00xx_20171001.000000_0.csv
d=${f%%.*} # removes largest suffix .*
d=${d##*_} # removes largest prefix *_
Or use a bash-specific regex:
if [[ $f =~ [0-9]{8} ]]; then echo "$BASH_REMATCH"; fi
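Combined with a numeric comparison, the expansion approach yields a compact sketch of the whole cleanup (assuming the path /mount/store/ from the question and the same filename layout):
#!/usr/bin/env bash
limit=$(date -d '-30 days' +%Y%m%d)
for f in /mount/store/*.csv; do
    d=${f%%.*}     # strip everything from the first dot
    d=${d##*_}     # strip everything up to the last underscore
    [[ $d =~ ^[0-9]{8}$ ]] || continue   # skip names without a date
    (( d < limit )) && rm -- "$f"
done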
Here is a solution if you have dgrep from dateutils.
ls *.csv | dateutils.dgrep -i '%Y%m%d' --le $(date -d "-30 day" +%F) | xargs -d '\n' rm
First we can use either ls or find to obtain a list of filenames. We then pipe the results to dgrep to filter the filenames containing a date string that matches our condition (in this case, older than 30 days). Finally, we pipe the result to xargs rm to remove all the matched files.
-i '%Y%m%d' input date format as specified in your filename
--le $(date -d "-30 day" +%F) filter dates that are older than 30 days
You can change rm to printf "%s\n" to test the command before actually deleting it.
The following approach does not look at any generation-time information of the file; it assumes the date in the filename is unrelated to the day the file was created.
#!/usr/bin/env bash
d=$(date -d "-30 days" "+%Y%m%d")
for file in /yourdir/*csv; do
date=${file:$((${#file}-21)):8}   # the 8-digit date starts 21 characters from the end
(( date < d )) && rm "$file"
done
I am trying to write a crontab job that checks a specified directory for files that are more than an hour old.
#!/bin/bash
for F in /My/Path/*.txt;do
if [ ***TEST IF FILE WAS OPENED IN THE PAST HOUR *** ]
then
echo "$F"
fi
done
thanks for any help
This can be done with a simple find
find /path/to/directory -type f -newermt "1 hour ago"
Any file modified within the past hour will print to stdout. No need to loop and print.
#!/bin/bash
RECENT_FILES=$(find /path/to/directory -type f -newermt "1 hour ago")
if [[ -n $RECENT_FILES ]]; then
echo "$RECENT_FILES"
else
echo "No recently modified files found in dir"
fi
You can always pipe the results to a log file if you're trying to compile a list as well
find /path/to/directory -type f -newermt "1 hour ago" >> $yourLogFile
A more rigorous approach using GNU date, which has an option -r
-r, --reference=FILE
display the last modification time of FILE
Using the above, incorporated into your script:
#!/bin/bash
for filename in /My/Path/*.txt ; do
if (( (($(date +%s) - $(date -r "$filename" +%s))/60) <= 60 )); then
echo "$filename"
fi
done
The logic is straightforward: we get the file's age in minutes by subtracting the file's modification time (in epoch seconds) from the current time and dividing by 60. If the file was modified within the last 60 minutes, its name is printed.