Directory monitoring using fswatch - macOS

I am using fswatch to monitor a directory and run a script when video files are copied into that directory:
fswatch -o /Path/To/Directory/Directory | xargs -n 1 sh /Path/To/Script/Script.sh
The problem is that the file has often not finished copying before the script runs. The files are video files of varying size; small files are fine, larger files are not.
How can I delay the fswatch notification until the file has completed its copy?

First of all, the behaviour of the fswatch "monitors" is OS-specific: when asking a question about fswatch, you should specify the OS you use.
However, there's no way to do that using fswatch alone. A process may open a file for writing and keep it open for an amount of time sufficiently long for the OS to send multiple events. I'm afraid there is nothing fswatch can do about it.
An alternative approach is to use another tool to check whether the modified file is currently open: if it is not, run your script; otherwise skip it and wait for the next event. Such tools are OS-specific: on OS X and Linux you can use lsof. Beware that this approach does not protect you from another process opening the file while your script is running.
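A rough sketch of that idea with fswatch and lsof could look like the following (paths are taken from the question; the skip-and-wait logic is an assumption, not something fswatch provides):
#!/usr/bin/env bash
# Only act on a path once no process has it open any more.
# fswatch -0 prints NUL-separated event paths; lsof exits non-zero
# when no process has the file open.
fswatch -0 /Path/To/Directory/Directory | while IFS= read -r -d '' file; do
  if lsof "$file" >/dev/null 2>&1; then
    continue    # some process still has it open; wait for a later event
  fi
  sh /Path/To/Script/Script.sh "$file"
done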

Related

using entr to watch a directory without any matching files

I want to modify *.ica files (to launch Citrix apps) when they are downloaded (to add a transparent Key Passthrough option for remote desktop), so I settled on using entr to monitor the directory and then call another script (which invokes sed) to update all .ica files.
while true; do
  ls *.ica | entr -d ~/Downloads/./transparentKeyPassthrough-CitrixIca.sh
done
However, this only works when there is already an .ica file in the directory. If the directory has no *.ica files when first executed, entr errors with:
entr: No regular files to match
Putting a dummy .ica file in the directory suffices, in which case the new (real) .ica file will be detected by entr and then acted on.
Is there a better way to do this?
The alternative I can think of is to use entr to watch the whole directory for any changes, check with ls -l *.ica whether the change resulted in a new .ica file, and if so, in turn run the above script.
It seems inelegant and complicated to nest entr that way, so wanted to know if there is some simple option I am missing.
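One possible way to avoid the dummy file is to point entr at the directory itself (a sketch, assuming an entr version that accepts directory names with -d; with that option entr exits when a new file appears, so the loop re-runs the check):
while true; do
  # Watch the Downloads directory itself; when a new file appears, entr -d
  # exits, the loop restarts, and the check below runs again.
  echo ~/Downloads | entr -nd sh -c 'ls ~/Downloads/*.ica >/dev/null 2>&1 && ~/Downloads/transparentKeyPassthrough-CitrixIca.sh'
done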

Zip directory in different batches

I'm trying to zip a massive directory of images that will be fed into a deep learning system. This is incredibly time-consuming, so I would like to stop the zipping process prematurely with Ctrl+C and zip the directory in different "batches".
Currently I'm using zip -r9v folder.zip folder, and I've seen that the -u option allows updating changed files and adding new ones.
I'm worried about some file, or the zip itself, ending up corrupted if I terminate the process with Ctrl+C. From this answer I understand that cp can be terminated safely, and this other answer suggests that gzip is also safe.
Putting it all together: is it safe to end the zip command prematurely? Is the -u option viable for zipping in different batches?
Is it safe to end the zip command prematurely?
In my tests, canceling zip (Info-ZIP, 16 June 2008 (v3.0)) with Ctrl+C did not create a zip archive at all, even when the already-compressed data was 2.5 GB. Therefore, I would say Ctrl+C is "safe" (you won't end up with a corrupted file) but also pointless (you did all the work for nothing).
Is the -u option viable for zipping in different batches?
Yes. Zip archives compress each file individually, so an archive you build up by adding files over several runs is as good as one created in a single run. Just remember that starting zip takes time too, so set the batch size as high as is acceptable to save time.
Here is a script that adds all your files to the zip archive, but gives a chance to stop the compression at every 100th file.
#! /bin/bash
batchsize=100

shopt -s globstar
files=(folder/**)

echo "Press enter to stop compression after this batch."
# Start at 0, or at $startfile if set in the environment (to resume a stopped run).
for ((startfile=${startfile:-0}; startfile<${#files[@]}; startfile+=batchsize)); do
  # Only the very first run creates the archive; later runs update it with -u.
  ((startfile==0)) && u= || u=u
  zip "-r9v$u" folder.zip "${files[@]:startfile:batchsize}"
  if read -t 0; then
    echo "Compression stopped before file $startfile."
    echo "Re-run this script with startfile=$startfile to continue."
    exit
  fi
done
For more speed you might want to look into alternative zip implementations.
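For example, 7-Zip (the p7zip package) can also write standard .zip archives and tends to be faster on multi-core machines (a sketch; double-check the flags against your 7z version):
# 'a' adds to an archive, -tzip selects the zip format, -mx=9 maximum compression.
7z a -tzip -mx=9 folder.zip folder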

macOS with APFS: Copy-On-Write in Terminal

I am writing a little script that assembles backup data into one directory. The directory's content will then be uploaded to a cloud service, and after that we can remove it. I was wondering how one could use APFS's copy-on-write feature with a command like cp in Terminal.
The Finder does a great job, but if I run cp LargeFile LargeFileCopy it takes forever to copy the file and also uses the corresponding amount of space.
I found the answer myself.
On macOS, cp supports the -c option. cp -c LargeFile LargeFileCopy will then use the clonefile(2) system call and return immediately, without using any additional space on the device.
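For the backup-assembly script from the question, a minimal sketch could look like this (the staging path and source folders are made up; it assumes cp -c can be combined with -R and that the volume is APFS):
#!/usr/bin/env bash
# Assemble backup data into one directory using APFS clones.
# -c asks cp to use clonefile(2); clones are created instantly and share
# storage with the originals until one side is modified.
STAGING="$HOME/backup-staging"
mkdir -p "$STAGING"
cp -cR "$HOME/Documents/Projects" "$STAGING/"
cp -c  "$HOME/Movies/LargeFile.mov" "$STAGING/"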

How to monitor file changes on network mapped drives?

From what I see, network mapped drives appear as subfolders of the /Volumes folder.
What is the proper way to get file changes updates (delete/create/update) from this folder?
Would /dev/fsevents work for that?
How does Finder know about the changes?
You're correct: OS X mounts network drives in /Volumes.
The way to get file change updates is to use the File System Events API. It is a C-based API with which you can watch for all changes in specific directories (or even /).
You create the stream with FSEventStreamCreate, schedule it with FSEventStreamScheduleWithRunLoop, and start it with FSEventStreamStart.
Be prepared to dig into the header file, as there is more documentation in it than in the reference documentation.
From what I can tell, Finder probably uses some internal API or kernel queues, which are more complex to set up than the higher-level API of FSEvents.h.
There is a nice GUI for helping you see how the events come in: fseventer by fernlightning (not yet Yosemite-ready).
You can use fswatch, which I find easiest to install via Homebrew. And yes, it does use FSEvents. Then you just do:
fswatch /Volumes/MUSIC
where MUSIC is a Samba-based music server on my network.
Here is how it looks in action (screenshots): first I show the mounted volumes (and that MUSIC is Samba-based) in the top window, then I start fswatch in the bottom-left window, then I make modifications in the filesystem in the top window; you can see them happen in the Finder and also see in the bottom-left window that fswatch tracks all the events.
You can also use it to interact with another program whenever events are detected, like this (extracted from the fswatch manpage):
Probably the simplest way to pipe fswatch to another program in order to respond to an event is using xargs:
$ fswatch -0 [opts] [paths] | xargs -0 -n 1 -I {} [command]
fswatch -0 will split records using the NUL character.
xargs -0 will split records using the NUL character. This is required to correctly match impedance with fswatch.
xargs -n 1 will invoke command every record. If you want to do it every x records, then use xargs -n x.
xargs -I {} will substitute occurrences of {} in command with the parsed argument. If the command you are running does not need the event path name, just delete this option. If you prefer using another replacement string, substitute {} with yours.
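For example, a minimal pipeline that just logs every changed path on the mounted share could be (the volume name is from above; the echo is a stand-in for your own command):
fswatch -0 /Volumes/MUSIC | xargs -0 -n 1 -I {} echo "changed: {}"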

How to keep two folders automatically synchronized?

I would like to have a synchronized copy of one folder with all its subtree.
It should work automatically in this way: whenever I create, modify, or delete something in the original folder, those changes should be automatically applied to the sync folder.
What is the best approach to this task?
BTW: I'm on Ubuntu 12.04
The final goal is to have a separate real-time backup copy, without the use of symlinks or mounts.
I used Ubuntu One to synchronize data between my computers, and after a while something went wrong and all my data was lost during a synchronization.
So I thought to add a step further to keep a backup copy of my data:
I keep my data stored in "folder A".
I need the answer to my current question to create a one-way sync of "folder A" to "folder B" (maybe a cron job running an rsync script?). It must be one-way only, from A to B: any changes to B must not be applied to A.
Then I simply keep "folder B" synchronized with Ubuntu One.
In this manner, any change in A will be applied to B, which will be detected by U1 and synchronized to the cloud. If anything goes wrong and U1 deletes my data on B, I still have it on A.
Inspired by lanzz's comments, another idea could be to run rsync at startup to back up the content of a folder under Ubuntu One, and start Ubuntu One only after rsync has completed.
What do you think about that?
How do I know when rsync ends?
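Something like this is what I have in mind (a sketch only: paths are placeholders and start-ubuntu-one stands in for whatever command launches the Ubuntu One client; since rsync runs in the foreground, chaining with && runs the next command only once it has finished successfully):
#!/bin/sh
# One-way mirror of A to B; start the cloud client only if rsync succeeded.
rsync -a --delete /path/to/folderA/ /path/to/folderB/ && start-ubuntu-one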
You can use inotifywait (with the modify,create,delete,move flags enabled) and rsync.
while inotifywait -r -e modify,create,delete,move /directory; do
  rsync -avz /directory /target
done
If you don't have inotifywait on your system, run sudo apt-get install inotify-tools
You need something like this:
https://github.com/axkibe/lsyncd
It is a tool that combines rsync and inotify: the former mirrors, with the correct options set, a directory down to the last bit; the latter tells the kernel to notify a program of changes to a directory or file.
It says:
It aggregates and combines events for a few seconds and then spawns one (or more) process(es) to synchronize the changes.
But - according to Digital Ocean at https://www.digitalocean.com/community/tutorials/how-to-mirror-local-and-remote-directories-on-a-vps-with-lsyncd - it ought to be in the Ubuntu repository!
I have similar requirements, and this tool, which I have yet to try, seems suitable for the task.
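If it is packaged for your distribution, a minimal invocation might look like this (a sketch; check lsyncd's documentation for the exact syntax of your version):
sudo apt-get install lsyncd
# Watch /directory with inotify and mirror it to /target via rsync,
# batching events for a few seconds before each sync.
lsyncd -rsync /directory /target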
Just a simple modification of @silgon's answer:
while true; do
  inotifywait -r -e modify,create,delete /directory
  rsync -avz /directory /target
done
(@silgon's version sometimes crashes on Ubuntu 16 if you run it from cron.)
Using the cross-platform fswatch and rsync:
fswatch -o /src | xargs -n1 -I{} rsync -a /src /dest
You can take advantage of fschange. It is a Linux filesystem change notification mechanism. The source code is downloadable from the above link, and you can compile it yourself. fschange can be used to keep track of file changes by reading data from a proc file (/proc/fschange). When data is written to a file, fschange reports the exact interval that has been modified instead of just saying that the file has changed.
If you are looking for the more advanced solution, I would suggest checking Resilio Connect.
It is cross-platform and provides extended options for use and monitoring. Since it is BitTorrent-based, it can be faster than other existing sync tools. (Disclosure: this recommendation was written on their behalf.)
I use this free program to synchronize local files and directories: https://github.com/Fitus/Zaloha.sh. The repository contains a simple demo as well.
The good point: it is a bash shell script (one file only), not a black box like other programs, and documentation is there as well. Also, with some technical talent, you can "bend" and "integrate" it to create the final solution you like.
