dynamically scan accessed files, or modified files with AV - bash

I need to set up McAfee AV for Linux to either dynamically scan accessed files, or to perform daily scans on all modified files.
I know how to make a cron job, and to search for last modified files, but I can't find any documentation anywhere on how to do what I need to do, even from McAfee :(
The problem with scanning modified files is that I can't find any find options that select files modified since the last scan date, only within a time frame. If I set McAfee to scan modified files daily and the machine is off for over a day, it won't see those files as modified within the last 24 hours, and thus won't scan them. I also cannot figure out how to make McAfee scan a file when it is accessed. I assume I could write a script that launches a scan whenever a file is opened, but I am not sure how to do this either.
If possible, I'd like to use bash to do this, only because I haven't learned awk or perl yet. Any help or a point in the right direction would be appreciated. Thanks!

This works for me with ClamAV; replace 'clamscan' with the equivalent command provided by McAfee. This loop looks for files in the /root directory that have been modified in the last 2 days and runs a virus scan on each of them:
find /root -type f -mtime -2 -print0 | while IFS= read -r -d '' i; do
    clamscan "$i"
done
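To cover the case where the machine was off for more than a day, one option is to compare against a marker file instead of a fixed time window, using find's -newer test. The following is only a sketch: it assumes a find that supports -newer and -print0 (GNU and BSD find both do), and the names scan_since_last, SCANNER and the marker path are made up for illustration. clamscan again stands in for whatever command McAfee provides.

```shell
#!/usr/bin/env bash
# Sketch: scan only files modified since the previous run.
# SCANNER, scan_since_last and the stamp path are illustrative names.
SCANNER=${SCANNER:-clamscan}

scan_since_last() {
    local dir=$1 stamp=$2 new_stamp
    new_stamp=$(mktemp) || return 1     # marks the start of this run

    if [ -e "$stamp" ]; then
        # Later runs: only files modified after the previous run started.
        find "$dir" -type f -newer "$stamp" -print0
    else
        # First run: no marker yet, so scan everything.
        find "$dir" -type f -print0
    fi | while IFS= read -r -d '' file; do
        "$SCANNER" "$file"
    done

    # Promote the new marker; files changed during this run are newer
    # than it and will be picked up by the next run.
    mv "$new_stamp" "$stamp"
}
```

Called from a daily cron job as, say, scan_since_last /root /var/tmp/last_scan.stamp (both paths are placeholders), it would catch up on everything modified since the last run, however long the machine was off.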


How to compare two directories, and if they're the EXACT SAME, delete the second

I'm trying to set up an automatic backup on a Raspberry Pi system connected to an external hard drive.
Basically, I have shared folders and they're mounted via samba on the rPI under
I will then have the external hard drive plugged in and mounted with two folders under
I will then run a recursive copy from /mnt/Comp1* to /media/external/Comp1/* and the same with Comp2.
What I need help with is at the end of the copies (because it will be a total of 5 computers), I would like to verify that all the files transferred, and if they did and everything is on the external, then I can delete from the local machine automatically. I understand this is risky, because almost inevitably it will delete things that may not be backed up, but I need help knowing where to start.
I've found a lot of information on checking contents of a folder, and I know I can use the diff command, but I don't know how to use it in this pseudocode
use diff on directories /mnt/Comp1/ and /media/external/Comp1
if no differences, proceed to delete /mnt/Comp1/* recursively
if differences, preferably move the files not saved to /media/external/Comp1
repeat checking for differences, and deleting if necessary
Try something like:
diff -r -q d1/ d2/ >/dev/null 2>&1
Check the return value with $?: diff exits with 0 if the directories are identical and 1 if they differ.
Remove d2 only if the return value is 0.
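Put together, those steps might look like the sketch below. It relies on diff exiting with 0 when the two trees are identical and non-zero otherwise. The function name is a placeholder, and, following the question's pseudocode, it empties the source directory only when the backup matches.

```shell
#!/usr/bin/env bash
# Sketch: empty a source directory only if the backup copy is identical.
# remove_if_backed_up is an illustrative name, not an existing tool.
remove_if_backed_up() {
    local src=$1 backup=$2

    # diff exits 0 when the trees match, 1 when they differ, 2 on errors.
    if diff -r -q "$src" "$backup" >/dev/null 2>&1; then
        # Delete the contents of src (including dotfiles) but keep src itself.
        find "$src" -mindepth 1 -delete
        echo "identical: removed contents of $src"
    else
        echo "differences found (or diff failed): leaving $src untouched"
        return 1
    fi
}
```

For the layout in the question this would be called as remove_if_backed_up /mnt/Comp1 /media/external/Comp1. Since anything other than exit status 0 leaves the source alone, a failed mount or an unreadable file errs on the side of not deleting.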

How can I organize recovered files into separate folders by file type?

I've got 218GB of assorted files recovered from a failing hard drive using PhotoRec. The files do not have their original file names and they're not sorted in any manner.
How can I go about sorting the files into separate folders by file type? I've tried searching for .jpg, for example, and I can copy those results into a new folder. But when I search for something like .txt, I get 16GB of text files as the result and there's no way I've found to select them all and copy them into their own folder. The system just hangs.
This is all being done on Windows 10.
Open PowerShell. Change to the recovered data folder: cd c:\...\recovered_files. Make a directory for the text files: mkdir text_files. Do the move: mv *.txt text_files.
You really want to move (cut) the files like this instead of copying them, because moving files within the same drive is just a name change (very fast), whereas copying has to duplicate all of the data (quite slow).
If your files are distributed among many directories, you would need to search them recursively. In Linux, this would be quite simple with the find command. In Windows, I have never tried anything like this. On MSDN there is an article about PowerShell that features an example which seems reminiscent of what you want to do. MSDN Documentation
The gist of it is that you would use the command:
cd <your recovered files directory containing the recup_dir folders>
Get-ChildItem -Path ".\*.txt" -Recurse | Move-Item -Verbose -Destination "Z:\stock_recovered\TXT"
Note that the destination is outside of the search path, which might be important!
Since I have never tried this before, there is NO WARRANTY. Supposing it works, I would be curious to know.
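For comparison, the Linux find approach mentioned above might look roughly like this (it could also be run from WSL on Windows 10). This is a sketch under assumptions: move_by_ext is a made-up name, and it relies on find's -iname and mv's -n, which GNU and BSD tools provide.

```shell
#!/usr/bin/env bash
# Sketch: move every file with a given extension under src into dest.
# move_by_ext is an illustrative name; dest should lie outside src.
move_by_ext() {
    local src=$1 ext=$2 dest=$3
    mkdir -p "$dest"
    # -iname matches the extension case-insensitively (.txt, .TXT, ...).
    # mv -n refuses to overwrite a file that already exists in dest.
    find "$src" -type f -iname "*.$ext" -exec mv -n -- {} "$dest"/ \;
}
```

As with the PowerShell version, the destination is kept outside the search path so that find does not revisit files it has already moved.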

How to determine when a file was created

I have a .jar file that is compiled on a server and is later copied down to a local machine. Doing ls -l on the local machine just gives me the time it was copied down onto the local machine, which could be much later than when it was created on the server. Is there a way to find that time on the command line?
Traditionally, UNIX-like systems do not record file creation time.
Each file has 3 timestamps (stored in its inode), all of which can be shown by running the stat command or by providing options to ls -l:
Last modification time (ls -l)
Last access time (ls -lu)
Last status (inode) change time (ls -lc)
For example, if you create a file, wait a few minutes, then update it, read it, and do a chmod to change its permissions, there will be no record in the file system of the time you created it.
If you're careful about how you copy the file to the local machine (for example, using scp -p rather than just scp), you might be able to avoid updating the modification time. I presume that a .jar file probably won't be modified after it's first created, so the modification time might be good enough.
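The effect of preserving timestamps can be tried locally. The sketch below uses cp -p as a stand-in for scp -p (both ask the copy to keep the original modification time). The helper same_mtime and the file names are invented for this example.

```shell
#!/usr/bin/env bash
# same_mtime is an illustrative helper: true when neither file is newer
# than the other, i.e. their modification times are equal.
same_mtime() {
    ! [ "$1" -nt "$2" ] && ! [ "$1" -ot "$2" ]
}

demo_dir=$(mktemp -d)
touch -t 202001011200 "$demo_dir/app.jar"             # pretend it was built on 2020-01-01
cp    "$demo_dir/app.jar" "$demo_dir/plain.jar"       # plain copy: mtime is the copy time
cp -p "$demo_dir/app.jar" "$demo_dir/preserved.jar"   # -p: mtime is carried over

same_mtime "$demo_dir/app.jar" "$demo_dir/plain.jar"     && echo "plain copy kept mtime"
same_mtime "$demo_dir/app.jar" "$demo_dir/preserved.jar" && echo "cp -p kept mtime"
```

Only the second message should be printed, and ls -l on preserved.jar should show the 2020 date rather than the time of the copy.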
Or, as Etan Reisner suggests in a comment, there might be useful information in the .jar file itself (which is basically a zip file). I don't know enough about .jar files to comment further on that.
wget and curl have options that allow you to preserve the file's modified time stamp. This is close enough to what I was looking for.

How can I find a directory-diff of millions of files to script maintenance?

I have been working on how to verify that millions of files that were on file system A have in fact been moved to file system B. While working on a system migration, it became evident that all the files needed to be audited to prove that they had been moved. The files were initially moved via rsync, which does provide logs, although not in a format that is helpful for doing an audit. So, I wrote this script to index all the files on System A:
# Get directories and file list to be used to verify proper file moves have worked successfully.
LOGDATE=$(/usr/bin/date +%Y-%m-%d)
# The output path is not shown in the original; this name is a placeholder.
FILE_LIST_OUT="file_list_${LOGDATE}.csv"
MOUNT_POINTS="/mounts/AA /mounts/AB"
for directory in $MOUNT_POINTS; do
    # format: type,user,group,bytes,octal mode,file_name
    gfind "$directory" -mount -printf '%y,%u,%g,%s,%m,%p\n' >> "$FILE_LIST_OUT"
done
The file indexing works fine and takes about two hours to index ~30 million files.
On side B is where we run into issues. I have written a very simple shell script that reads the index file, tests to see if each file is there, and then counts up how many files are there, but it's running out of memory while looping through the 30 million lines of indexed file names. Effectively, it runs the bit of code below in a while loop, with counters that increment for files found and not found.
# -e is true for any file type, so $TYPE is not needed for an existence check
if [ -e "$FILENAME" ]; then
    echo "file found"
else
    echo "file not found"
fi
My questions are:
Can a shell script do this type of reporting from such a large list? A 64-bit Unix system ran out of memory while trying to execute this script. I have already considered breaking up the input file into smaller chunks to make it faster. Currently it can
If a shell script is inappropriate, what would you suggest?
You just used rsync, so use it again, this time with --ignore-existing. From the man page:
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn’t affect the data that goes into the file-lists, and thus it doesn’t affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don’t get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
That will actually fix any problems (at least in the same sense that any diff-list on file-exist tests could fix problems). Using --ignore-existing means rsync only does the file-exist tests (so it'll construct the diff list as you request and use it internally). If you just want information on the differences, check --dry-run and --itemize-changes.
Let's say you have two directories, foo and bar. Let's say bar has three files: 1, 2, and 3. Let's say that bar has a directory quz, which has a file 1. The directory foo is empty.
Now, here is the result:
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/
>f+++++++++ 1
>f+++++++++ 2
>f+++++++++ 3
cd+++++++++ quz/
>f+++++++++ quz/1
Note, you're not interested in the cd+++++++++ line -- that's just showing you that rsync would create the directory quz/ (c marks a creation, d a directory). Now, let's add a file in foo called 1, and let's use grep to remove the directory line(s),
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/ | grep -v '^cd'
>f+++++++++ 2
>f+++++++++ 3
>f+++++++++ quz/1
f is for file. The +++++++++ means the file doesn't exist in the DEST dir.
Here is the bonus: remove --dry-run and it'll go ahead and make the changes for you.
Have you considered a solution such as kdiff3, which will diff directories of files?
Note the feature for version 0.9.84
Directory-Comparison: Option "Full Analysis" allows to show the number
of solved vs. unsolved conflicts or deltas vs. whitespace-changes in
the directory tree.
There is absolutely no problem reading a 30 million line file in a shell script. The reason why your process failed was most likely that you tried to read the file entirely into memory, e.g. by doing something wrong like for i in $(cat file).
The correct way of reading a file is:
while IFS= read -r line; do
    echo "Something with $line"
done < someFile
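Applied to the index from the question, a streaming version might look like the sketch below. It assumes the six comma-separated fields that the gfind -printf format writes (type, user, group, bytes, mode, path). The name count_indexed_files and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: stream an index file line by line and count which paths exist.
# count_indexed_files is an illustrative name.
count_indexed_files() {
    local index=$1 found=0 missing=0
    local type user group bytes mode path

    # Only one line is held in memory at a time. read puts the rest of
    # the line into the last variable, so commas inside paths survive.
    while IFS=, read -r type user group bytes mode path; do
        if [ -e "$path" ] || [ -L "$path" ]; then
            found=$((found + 1))
        else
            missing=$((missing + 1))
            printf 'missing: %s\n' "$path" >&2
        fi
    done < "$index"

    printf 'found=%d missing=%d\n' "$found" "$missing"
}
```

Missing paths go to stderr so they can be redirected to a separate report file. Paths containing newlines would break this line-based format.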
A shell script is inappropriate, yes. You should be using a diff tool:
diff -rNq /original /new
If you're not particular about the solution being a script, you could also look into meld, which would let you diff directory trees quite easily and you can also set ignore patterns if you have any.

Conditional action based on whether any file in a directory has a ctime newer than X

I would like to run a backup job on a directory tree from a bash script if any of the files have been modified in the last 30 minutes. I think I can hack together something using find with the -ctime flag, but I'm sure there is a standard way to examine a directory for changes.
I know that I can inspect the ctime of the top level directory to see if files were added, but I need to be able to see changes also.
FWIW, I am using duplicity to backup directories to S3.
For time in minutes, you should use -cmin -n:
find /some/start/dir -cmin -30 -type f
Changes to files already in the directory do not cause a change in the directory's timestamps, so you need to check the files inside (e.g. with find as you suggest).
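Wrapped into a condition for the backup script, that check might look like the following sketch. It assumes a find that supports -quit (GNU find does). changed_within is a made-up name, and the duplicity arguments in the usage comment are placeholders rather than anything from the question.

```shell
#!/usr/bin/env bash
# Sketch: succeed if any regular file under a directory had its status
# (ctime) changed in the last N minutes. changed_within is an illustrative name.
changed_within() {
    local dir=$1 minutes=$2
    # -print -quit stops at the first match instead of walking the whole tree.
    [ -n "$(find "$dir" -type f -cmin "-$minutes" -print -quit)" ]
}

# Usage (BACKUP_TARGET is a placeholder for the S3 URL):
# if changed_within /some/start/dir 30; then
#     duplicity /some/start/dir "$BACKUP_TARGET"
# fi
```

Dropping -type f would also match directories, whose timestamps change when entries are added or removed, so deletions could be detected that way as well.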
