Monitoring Tool for checking files in a directory - bash

I have four files which come into the directory C:/Desktop/Folder each day, named in the following format:
DD/MM/YYYY/HH/MM/SS/File
Examples:
01/01/2018/01:01:00
01/01/2018/01:02:00
01/01/2018/01:03:00
01/01/2018/01:04:00
I want my script to email me only if, within a 24-hour period, the 4th file
(01/01/2018/01:04:00) does not arrive. I cannot do a simple = comparison, as each day the date increments by 1, for example:
01/01/2018/01:01:00 then
02/01/2018/01:01:00
Code:
#!/bin/bash
monitor_dir=/path/to/dir
email=me#me.com
files=$(find "$monitor_dir" -maxdepth 1 | sort)
IFS=$'\n'
while true
do
    sleep 5s
    newfiles=$(find "$monitor_dir" -maxdepth 1 | sort)
    added=$(comm -13 <(echo "$files") <(echo "$newfiles"))
    [ "$added" != "" ] && find $added -maxdepth 1 -printf '%Tc\t%s\t%p\n' | mail -s "incoming" "$email"
    files="$newfiles"
done
Can I please have some assistance with how I can alter this code to reflect my new requirements?
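Not a finished answer, but as a starting point, a minimal sketch might help: run it once a day (e.g. from cron), count the files that arrived in the monitored directory during the last 24 hours, and mail only when the 4th one is missing. The address and path are placeholders, and it assumes the files' modification times reflect when they arrived:
#!/bin/bash
# Minimal sketch: mail only when fewer than 4 files arrived in the last 24 hours.
monitor_dir=/path/to/dir          # placeholder, same as in the script above
email=me@example.com              # placeholder address

count=$(find "$monitor_dir" -maxdepth 1 -type f -mmin -1440 | wc -l)
if (( count < 4 )); then
    printf 'Only %s file(s) arrived in the last 24 hours\n' "$count" \
        | mail -s "missing 4th daily file" "$email"
fi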

Related

Linux - Finding the max modified date of each set of files in each directory

path/mydir contains a list of directories. The names of these directories tell me which database they relate to.
Inside each directory is a bunch of files, but the filenames tell me nothing of importance.
I'm trying to write a command in linux bash that accomplishes the following:
For each directory in path/mydir, find the max timestamp of the last modified file within that directory
Print the last modified file's timestamp next to the parent directory's name
Exclude any timestamps less than 30 days old
Exclude specific directory names using regex
Order by oldest timestamp
Given this directory structure in path/mydir:
database_1
    table_1.file (last modified 2021-11-01)
    table_2.file (last modified 2021-11-01)
    table_3.file (last modified 2021-11-05)
database_2
    table_1.file (last modified 2021-05-01)
    table_2.file (last modified 2021-05-01)
    table_3.file (last modified 2021-08-01)
database_3
    table_1.file (last modified 2020-01-01)
    table_2.file (last modified 2020-01-01)
    table_3.file (last modified 2020-06-01)
I would want to output:
database_3 2020-06-01
database_2 2021-08-01
This half works, but looks at the modified date of the parent directory instead of the max timestamp of files under the directory:
find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'
I'm very much a novice with bash, so any help and guidance is appreciated!
Would you please try the following
#!/bin/bash
cd "path/mydir/"
for d in */; do
    dirname=${d%/}
    mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
    [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'
Output with the provided example:
database_3 2020-06-01
database_2 2021-08-01
for d in */; do loops over the subdirectories in path/mydir/.
dirname=${d%/} removes the trailing slash, just for printing purposes.
printf "%TY-%Tm-%Td\t%TT\t%p\n" prepends the modification date and time
to the filename delimited by a tab character. The result will look like:
2021-08-01 12:34:56 database_2/table_3.file
sort -rk1,2 sorts the output by the date and time fields in descending order.
head -n 1 picks the line with the latest timestamp.
cut -f1 extracts the first field with the modification date.
[[ -n $mdate ]] skips the empty mdate.
sort -k1,1 just after done performs the global sorting across the
outputs of the subdirectories.
sed -E ... swaps the timestamp and the dirname. It handles the case where
the dirname may contain a tab character; if that cannot happen, you can
omit the sed command by switching the order of timestamp and dirname
in the echo command and changing the sort command to sort -k2,2.
As for the mentioned Exclude specific directory names using regex, add
your own logic to the find command or to the loop itself, for example as in the sketch below.
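One way, sketched here by reusing the loop above, is to match each directory name against a pattern and skip it; name1|name2 is the example pattern from the question, and name-based exclusion is an assumption:
#!/bin/bash
# Sketch: same loop as above, with a regex-based exclusion of directory names.
# 'name1|name2' is the example pattern from the question; adjust as needed.
cd "path/mydir/" || exit 1
exclude='name1|name2'
for d in */; do
    dirname=${d%/}
    [[ $dirname =~ $exclude ]] && continue      # skip excluded directories
    mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 \
        -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
    [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'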
[Edit]
In order to print the directory name when the last modified file in a subdirectory is older than the specified date, please try this instead:
#!/bin/bash
cd "path/mydir/"
now=$(date +%s)
for d in */; do
    dirname=${d%/}
    read -r secs mdate < <(find "$d" -type f -printf "%T#\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1)
    secs=${secs%.*}
    if (( secs < now - 3600 * 24 * 30 )); then
        echo -e "$secs\t$dirname $mdate"
    fi
done | sort -nk1,1 | cut -f2-
now=$(date +%s) assigns the variable now to the current time as
the seconds since the epoch.
for d in */; do loops over the subdirectories in path/mydir/.
dirname=${d%/} removes the trailing slash, just for printing purposes.
-printf "%T#\t%TY-%Tm-%Td\n" prints the modification time as seconds since
the epoch and the modification date, delimited by a tab character.
The result will look like:
1627743600 2021-08-01
sort -nrk1,1 sorts the output by the modification time in descending order.
head -n 1 picks the line with the latest timestamp.
read -r secs mdate < <( stuff ) assigns secs and mdate to the
outputs of the command in order.
secs=${secs%.*} removes the fractional part.
The condition (( secs < now - 3600 * 24 * 30 )) is met if secs
is 30 days or more older than now.
echo -e "$secs\t$dirname $mdate" prints dirname and mdate,
prepending secs for sorting purposes.
sort -nk1,1 just after done performs the global sorting across the
outputs of the subdirectories.
cut -f2- removes the secs portion.

Compare two version of zip file and find which file has been modified within that zip

I have two zip files called 10.88.10 and 10.88.12. One or more files in 10.88.12 have been modified. Is there any way I can find out which file has been modified?
Each zip file contains a directory, a subdirectory, and zip files inside.
Code I've tried (I don't think I am on the right path):
m1=$(md5sum 10.88.10.zip | cut -d' ' -f1)
m2=$(md5sum 10.88.12.zip | cut -d' ' -f1)
if [ "$m1" != "$m2" ]; then
    echo "files are not the same"
    cd "/c/Users/name/Downloads/10.88.10/"
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        cd "/c/Users/name/Downloads/10.88.10/"
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/10.88.12/"
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/"
    find 10.88.10/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file1.txt
    find 10.88.12/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file2.txt
    diff file1.txt file2.txt
else
    echo false
fi
I tried hashing to find the modified files by comparing the values, but unfortunately I only receive the hashes and can't think of a way to get the name of the input file that corresponds to each hash.
Running the hash cmd:
find 10.88.10/* -type f -print0 | xargs -0 sha1sum
Output:
c3f2b563b3cb091e2adsss321221a3d *10.88.12/name.xml
Difference/Modified file in hash:
1c1
< 3c2a991d1231c3eae391fadsdadda19e8f7b85df8caf2d
---
> c3f2b56qwdq2112e375b40fbfd5e60f526da3d1874c1874
< fbdc82dasdaa30538e5adadadada2d9456ff86953fbeeb1
---
> f962e8eqeqeqqe3b65d3ed43559adc879f5600c738e1e1c
Required output:
< 10.88.10/FOLDER/FILE1.XML
---
> 10.88.12/FOLDER1/FILE1.XML
< 10.88.10/FOLDER/FILE2.TXT
---
> 10.88.12/FOLDER/FILE2.TXT
If anyone has a Java or bash solution, please share it.
The following is a shell script that leverages the sqlite3 command-line tool's ability to open zip files, which avoids having to unzip them into a temporary location, and uses some simple SQL to do all the work:
#!/bin/sh
oldfile="$1"
newfile="$2"
sqlite3 -batch -bail <<EOF
.mode tabs
.headers off
CREATE VIRTUAL TABLE oldfile USING zipfile('${oldfile}');
CREATE VIRTUAL TABLE newfile USING zipfile('${newfile}');
-- Show files present in newfile that are absent in oldfile
SELECT 'added', name
FROM (SELECT name FROM newfile EXCEPT SELECT name FROM oldfile)
ORDER BY name;
-- Show files missing from newfile that are present in oldfile
SELECT 'deleted', name
FROM (SELECT name FROM oldfile EXCEPT SELECT name FROM newfile)
ORDER BY name;
-- Show files whose contents differ between the two
SELECT 'modified', of.name
FROM oldfile AS of
JOIN newfile AS nf ON of.name = nf.name
WHERE of.data <> nf.data
ORDER BY of.name;
EOF
Example usage:
$ unzip -l test1.zip
Archive: test1.zip
Length Date Time Name
--------- ---------- ----- ----
0 2020-02-27 04:05 1/
4 2020-02-27 04:05 1/a.txt
4 2020-02-27 04:05 1/b.txt
4 2020-02-27 04:05 a.txt
--------- -------
12 4 files
$ unzip -l test2.zip
Archive: test2.zip
Length Date Time Name
--------- ---------- ----- ----
0 2020-02-27 04:07 1/
4 2020-02-27 04:07 1/a.txt
4 2020-02-27 04:06 a.txt
4 2020-02-27 04:06 b.txt
--------- -------
12 4 files
$ ./cmpzip test1.zip test2.zip
added b.txt
deleted 1/b.txt
modified 1/a.txt
(I'm not sure why you want diff-style output when all you seem to care about is whether a file changed, not what the change is, so this produces TSV output that's easier to understand and work with in further processing.)
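That said, if output keyed by filename is still wanted, here is a minimal sketch in the spirit of the checksum attempt from the question, assuming both archives have already been extracted to 10.88.10/ and 10.88.12/ (the names old.txt and new.txt are arbitrary):
#!/bin/sh
# Hash every file relative to its own tree so the paths line up, then diff
# the two listings; lines marked < / > name the files that differ.
(cd 10.88.10 && find . -type f -print0 | xargs -0 sha1sum | sort -k2) > old.txt
(cd 10.88.12 && find . -type f -print0 | xargs -0 sha1sum | sort -k2) > new.txt
diff old.txt new.txt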

A command to delete files older than X days, but leave the last 2 files

I need help with the command to delete files on the server.
I have an archive folder with
file names of the form app-XXXXXX.tar.gz, where XXXXXX is the backup date. For example, app-231019.tar.gz.
I need to delete files older than 14 days, but not the last 2 files.
I found the command
find /folder -mtime +14 -type f -delete
but it is not suitable for me:
the "older than 14 days" filter should be applied based on the file name, not the date the file was written to the server, and
I cannot find a way to make sure the last 2 files are not deleted, even if they are older than 14 days.
Would you please try the following:
dir="dir" # replace with your pathname
fortnightago=$(awk 'BEGIN {print strftime("%y%m%d", systime() - 86400 * 14)}')
# If your date command supports the -d option, you can also say:
# fortnightago=$(date -d "14 days ago" +%y%m%d)
for i in "$dir"/app-*.tar.gz; do
    if [[ $i =~ app-([0-9]{2})([0-9]{2})([0-9]{2})\.tar\.gz ]]; then
        yy="${BASH_REMATCH[3]}"
        mm="${BASH_REMATCH[2]}"
        dd="${BASH_REMATCH[1]}"
        if (( $yy$mm$dd <= $fortnightago )); then
            printf "%d%d%d%c%s\n" "${yy#0}" "${mm#0}" "${dd#0}" $'\t' "$i"
        fi
    fi
done | sort -rn -k1 | tail -n +3 | cut -f 2 | xargs rm --
[Explanation]
First it extracts the date string from each filename and rearranges it into "%y%m%d" order
for numeric comparison.
It prints the filenames which are 14 days old or older, adding the rearranged date
as the 1st field.
Then it sorts the filenames by the 1st field in descending order (the latest file first).
Then it skips the first two lines so that those two files are kept.
It cuts out the filenames as the removal list.
Finally the filenames are passed to xargs with the rm command.
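To sanity-check the regex and the date rearrangement before letting rm loose, a tiny standalone sketch like the following (with made-up filenames) can be used:
#!/bin/bash
# Quick check of the date-extraction logic on hypothetical filenames:
# the names embed DDMMYY, and YYMMDD is printed for numeric comparison.
for i in app-231019.tar.gz app-011119.tar.gz; do
    if [[ $i =~ app-([0-9]{2})([0-9]{2})([0-9]{2})\.tar\.gz ]]; then
        echo "${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[1]} $i"
    fi
done
# Expected output:
# 191023 app-231019.tar.gz
# 191101 app-011119.tar.gz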
As an alternative, if perl is your option, you can say:
perl -e '
    $dir = "dir";
    @t = localtime(time() - 86400 * 14);
    $fortnightago = sprintf("%02d%02d%02d", $t[5] - 100, $t[4] + 1, $t[3]);
    @ary = map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           grep { $_->[1] <= $fortnightago }
           map  { [ $_, m/app-(\d{2})(\d{2})(\d{2})\.tar\.gz/ && "$3$2$1" ] }
           (<$dir/app-*.tar.gz>);
    unlink splice(@ary, 2);
'
Hope this helps.

Delete all 0-byte files in HDFS which were created within a date range

How do I delete files in HDFS for a date range, i.e. delete 0-byte files created between 150 days ago and yesterday? This is to be done in a shell script.
I am using the command below to delete all 0-byte files, but I need one where I can provide a date range:
hdfs dfs -ls -R $directory/* | grep -Ev "txt|xml|csv|mrc" | awk '$1 !~ /^d/ && $5 == "0" { print $8 }' | xargs -n100 hdfs dfs -rm
Any help?
# Create reference file with the date of today 00:00:00.000000 am
# as our upper date limit (excluded bound)
# that's equal to all yesterday up to 11:59:59.999999 pm
touch -d 'today' /tmp/before.tmp # before today is yesterday
# Create reference file with the date of 150 days ago as our lower date limit
# that's equal to 150 days ago 00:00:00.000000 am
touch -d '150 days ago' /tmp/after.tmp
# Find and delete files
find \
    "$directory" \
    -maxdepth 1 \
    -type f \
    -size 0 \
    -anewer /tmp/after.tmp \
    -not -anewer /tmp/before.tmp \
    -regex '.*/.*\.\(txt\|xml\|csv\|mrc\)' \
    -delete
Breakdown of the find command:
"$directory": find starting in this path from variable $directory
-maxdepth 1: limit search to this directory without descending sub-dirs
-type f: search actual files (no directory, no links...)
-size 0: search files with an actual size of 0
-anewer /tmp/after.tmp: search files that were accessed more recently than this reference file's date /tmp/after.tmp
-not -anewer /tmp/before.tmp: and that were accessed at or before the reference file's date /tmp/before.tmp
-regex '.*/.*\.\(txt\|xml\|csv\|mrc\)': search files whose full name with path matches the POSIX regular expression .*/.*\.\(txt\|xml\|csv\|mrc\)
-delete: delete the files found matching all the previous predicates
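If the files only exist in HDFS (so a local find cannot see them), a hedged sketch that stays close to the command in the question, assuming GNU date and the usual hdfs dfs -ls column layout ($5 size, $6 date, $8 path), would be:
#!/bin/bash
# Sketch only: bound the deletion to 0-byte files whose listing date falls
# between 150 days ago and yesterday, comparing the ISO dates as strings.
directory=/path/in/hdfs                      # placeholder path
start=$(date -d '150 days ago' +%Y-%m-%d)
end=$(date -d 'yesterday' +%Y-%m-%d)
hdfs dfs -ls -R "$directory" \
    | grep -Ev "txt|xml|csv|mrc" \
    | awk -v start="$start" -v end="$end" \
        '$1 !~ /^d/ && $5 == "0" && $6 >= start && $6 <= end { print $8 }' \
    | xargs -n100 hdfs dfs -rm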

bash script to help identify files and path with a count sorted by day of the week recursively

I'm not a scripter really (yet), so apologies in advance.
What I need to do is search a path for files modified within the last 7 days, then count the number of files in each directory for each day of the week (Monday to Sunday).
So for example:
From folder - Rootfiles
Directory 1 :
    Number of files Monday
    Number of files ..n
    Number of files Sunday
Directory 2 :
    Number of files Monday
    Number of files ..n
    Number of files Sunday
So far I have this from my basic command line knowledge and a bit of research.
#!/bin/bash
find . -type f -mtime -7 -exec ls -l {} \; | grep "^-" | awk '{
    key=$6$7
    freq[key]++
}
END {
    for (date in freq)
        printf "%s\t%d\n", date, freq[date]
}'
But there are a couple of problems: I need to print each directory, and then I need to figure out the Monday, Tuesday, Wednesday sorting.
Also, for some reason it works on my test folders with basic folders and names, but not on the production folders.
Even some pointers on where to start thinking would be helpful.
Thanks in advance all, you are all awesome!
Neil
I found some additional code that is helping:
#!/bin/bash
# pass in the directory to search on the command line, use $PWD if no arg received
rdir=${1:-$(pwd)}
# if $rdir is a file, get its directory
if [ -f $rdir ]; then
    rdir=$(dirname $rdir)
fi
# first, find our tree of directories
for dir in $( find $rdir -type d -print ); do
    # get a count of directories within $dir.
    sdirs=$( find $dir -maxdepth 1 -type d | wc -l );
    # only proceed if sdirs is less than 2 ( 1 = self ).
    if (( $sdirs < 2 )); then
        # get a count of all the files in $dir (but not in subdirs of $dir)
        files=$( find $dir -maxdepth 1 -type f | wc -l );
        echo "$dir : $files";
    fi
done
if I could somehow replace the line
sdirs=$( find $dir -maxdepth 1 -type d | wc -l );
with my original code block that would help.
props to
https://unix.stackexchange.com/questions/22803/counting-files-in-leaves-of-directory-tree
for that bit of code
Neat problem.
I think in your find command you will want to add the --time-style=+%w to get the day of the week.
find . -type f -mtime -7 -exec ls -l --time-style=+%w {} \;
I'm not sure why you are grepping for lines that start with a dash (since you're already only finding files.) This is not necessary, so I would remove it.
Then I would get the directory names from this output by stripping the filenames, or everything after the last slash from each line.
| sed -e 's:/[^/]*$:/:'
Then I would cut out all the tokens before the day of the week. Since you're using . as the starting point, you can expect each directory to start with ./.
| sed -e 's:.*\([0-6]\) \./:\1 ./:'
From here you can sort -k2 to sort by directory name and then day of the week.
Eventually you can pipe this into uniq -c to get the counts of days per week by directory, but I would convert it to human readable days first.
| awk '
    /^0/ { $1 = "Sunday   " }
    /^1/ { $1 = "Monday   " }
    /^2/ { $1 = "Tuesday  " }
    /^3/ { $1 = "Wednesday" }
    /^4/ { $1 = "Thursday " }
    /^5/ { $1 = "Friday   " }
    /^6/ { $1 = "Saturday " }
    { print $0 }
'
Putting this all together:
find . -type f -mtime -7 -exec ls -l --time-style=+%w {} \; \
| sed -e 's:/[^/]*$:/:' \
| sed -e 's:.*\([0-6]\) \./:\1 ./:' \
| sort -k2 \
| awk '
    /^0/ { $1 = "Sunday   " }
    /^1/ { $1 = "Monday   " }
    /^2/ { $1 = "Tuesday  " }
    /^3/ { $1 = "Wednesday" }
    /^4/ { $1 = "Thursday " }
    /^5/ { $1 = "Friday   " }
    /^6/ { $1 = "Saturday " }
    { print $0 }
' | uniq -c
On my totally random PWD it looks like this:
1 Monday ./
1 Tuesday ./
5 Saturday ./
2 Sunday ./
17 Monday ./M3Javadoc/
1 Thursday ./M3Javadoc/
1 Saturday ./M3Javadoc/
1 Sunday ./M3Javadoc/
