I monitor a file for changes in a separate thread using kqueue/kevent(2).
(I monitor a Python file so I can reparse it when it changes.)
I subscribe as follows:
EV_SET(&file_change, pyFileP, EVFILT_VNODE,
       EV_ADD | EV_CLEAR,
       NOTE_DELETE | NOTE_WRITE | NOTE_EXTEND |
       NOTE_ATTRIB | NOTE_LINK | NOTE_RENAME | NOTE_REVOKE,
       0, 0);
When I write to the file "/tmp/somefile.py" using Vim, I get two separate kevents:
The flags of these events (event.fflags) are:
NOTE_RENAME
and
NOTE_DELETE | NOTE_LINK
I never get a "NOTE_WRITE" event!
This seems to have something to do with the way Vim writes these files, since if I do
echo "sometext" >> /tmp/somefile.py
I do get the:
NOTE_WRITE|NOTE_EXTEND
event.
Odd, eh? I haven't checked the Vim source code, but it must be doing something unusual, or perhaps it simply uses user-level functions that are implemented that way?
I wasn't really expecting this. Is this a known problem? Do I just have to check for every possible event, or is there a known interface that reliably tells me whether a file has been written?
What is actually happening is that Vim doesn't write over the same file: first it (probably) renames the original to something else, and then it creates another file (link) with the original name.
You can confirm that by doing something like:
$ vim file -c wq
This will open a file and write it. Now check the inode:
$ ls -i
30621217 file
Write the file with Vim again and re-check the inode:
$ vim file -c wq
$ ls -i
30621226 file
It's just different. That means the second file is actually another file
(linked to another inode) with the same name, and the old one was unlinked.
Many editors do that. I can't say exactly why Vim takes this approach. Maybe for safety: if you first rename the file and something goes wrong while writing the new file, you still have the old one. If you start writing over a file and a problem occurs (even something like running out of memory), you'll probably lose part of it. Maybe.
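If all you need is to notice that the path now points at different content, whatever the editor did to produce it, a crude workaround is to poll the inode number yourself. This is only a minimal sketch, assuming a BSD/macOS stat (use stat -c %i on GNU/Linux) and a made-up file name:
file=/tmp/somefile.py
last=$(stat -f %i "$file")                  # inode the path points to now
while sleep 1; do
    cur=$(stat -f %i "$file" 2>/dev/null) || continue
    if [ "$cur" != "$last" ]; then          # path now refers to a new inode,
        echo "$file was replaced"           # i.e. it was renamed/recreated
        last=$cur
    fi
done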
Related
I'm scratching my head about two seemingly different behaviors of bash when editing a running script.
This is not a place to discuss WHY one would do this (you probably shouldn't). I would only like to try to understand what happens and why.
Example A:
$ echo "echo 'echo hi' >> script.sh" > script.sh
$ cat script.sh
echo 'echo hi' >> script.sh
$ chmod +x script.sh
$ ./script.sh
hi
$ cat script.sh
echo 'echo hi' >> script.sh
echo hi
The script edits itself, and the change (extra echo line) is directly executed. Multiple executions lead to more lines of "hi".
Example B:
Create a script infLoop.sh and run it.
$ cat infLoop.sh
while true
do
x=1
echo $x
done
$ ./infLoop.sh
1
1
1
...
Now open a second shell and edit the file changing the value of x. E.g. like this:
$ sed --in-place 's/x=1/x=2/' infLoop.sh
$ cat infLoop.sh
while true
do
x=2
echo $x
done
However, we observe that the output in the first terminal is still 1. Doing the same with only one terminal, interrupting infLoop.sh through Ctrl+Z, editing, and then continuing it via fg yields the same result.
The Question
Why does the change in example A have an immediate effect but the change in example B not?
PS: I know there are questions out there showing similar examples but none of those I saw have answers explaining the difference between the scenarios.
There are actually two different reasons that example B is different, either one of which is enough to prevent the change from taking effect. They're due to some subtleties of how sed and bash interact with files (and how unix-like OSes treat files), and might well be different with slightly different programs, etc.
Overall, I'd say this is a good example of how hard it is to understand & predict what'll happen if you modify a file while also running it (or reading etc from it), and therefore why it's a bad idea to do things like this. Basically, it's the computer equivalent of sawing off the branch you're standing on.
Reason 1: Despite the option's name, sed --in-place does not actually modify the existing file in place. What it actually does is create a new file with a temporary name, then when it's finished that it deletes the original and renames the new file into its place. It has the same name, but it's not actually the same file. You can tell this by looking at the file's inode number with ls -li:
$ ls -li infLoop.sh
88 -rwxr-xr-x 1 pi pi 39 Aug 4 22:04 infLoop.sh
$ sed --in-place 's/x=1/x=2/' infLoop.sh
$ ls -li infLoop.sh
4073 -rwxr-xr-x 1 pi pi 39 Aug 4 22:05 infLoop.sh
But bash still has the old file open (strictly speaking, it has an open file handle pointing to the old file), so it's going to continue getting the old contents no matter what changed in the new file.
Note that this doesn't apply to all programs that edit files. vim, for example, will rewrite the contents of existing files (unless file permissions forbid it, in which case it switches to the delete&replace method). Appending with >> will always append to the existing file rather than creating a new one.
(BTW, if it seems weird that bash could have a file open after it's been "deleted", that's just part of how unix-like OSes treat their files. Files are not truly deleted until their last directory entry is removed and the last open file handle referring to them is closed. Some programs actually take advantage of this for security by opening(/creating) a file and then immediately "deleting" it, so that the open file handle is the only way to reach the file.)
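As a rough shell illustration of that (the descriptor number 3 is arbitrary):
$ echo hello > demo.txt
$ exec 3< demo.txt     # keep a read descriptor open on the file
$ rm demo.txt          # removes the directory entry, not the data
$ cat <&3              # the open descriptor still reaches the contents
hello
$ exec 3<&-            # closing the last descriptor finally frees it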
Reason 2: Even if you used a tool that actually modified the existing file in place, bash still wouldn't see the change. The reason for this is that bash reads from the file (parsing as it goes) until it has something it can execute, runs that, then goes back and reads more until it has another executable chunk, etc. It does not go back and re-read the same chunk to see if it's changed, so it'll only ever notice changes in parts of the file it hasn't read yet.
In example B, it has to read and parse the entire while true ... done loop before it can start executing it. Therefore, changes in that part (or possibly before it) will not be noticed once the loop has been read & started executing. Changes in the file after the loop would be noticed after the loop exited (if it ever did).
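A rough illustration of that read-as-you-go behaviour (file name made up; exact buffering can vary between bash versions, and the interleaved output is tidied up here):
$ cat > demo.sh <<'EOF'
#!/bin/bash
for i in 1 2 3; do sleep 1; echo "loop $i"; done
echo "after the loop"
EOF
$ chmod +x demo.sh
$ ./demo.sh &
$ echo 'echo "appended while it was running"' >> demo.sh
$ wait
loop 1
loop 2
loop 3
after the loop
appended while it was running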
See my answer to this question (and Don Hatch's comment on it) for more info about this.
Recently I tried to list all of the images located in a directory I had (several hundred) and put them into a file. I used a very simple command
ls > "image_names.txt"
I was bored and decided to look inside the file and realized that image_names.txt was itself listed in the file. Then I realized the order of operations was not what I thought. I had read the command left to right, as two separate steps:
ls (First list all the file names)
> "image_names.txt" (Then create this file and pipe it here)
Why is it creating the file first then listing all of the files in the directory, despite the ls command coming first?
When you use output redirection, the shell needs a place to put your output (suppose it were very long: it could be lost when the command terminates, or exhaust all working memory), so the first step is to open the output file so that the executed command's stdout can be streamed into it.
This is especially important to know in this kind of command
cat a.txt | grep "foo" > a.txt
since a.txt is opened first, and not in append mode, it is truncated, meaning there is no input left for cat. So the behaviour you might expect, that the lines are filtered from a.txt and written back to a.txt, does not actually happen. Instead, you just lose the contents of a.txt.
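If the goal was to end up with the filtered lines back in a.txt, one common workaround is to write to a temporary file first and rename it afterwards (the .tmp name here is just an example):
grep "foo" a.txt > a.txt.tmp && mv a.txt.tmp a.txt
Note that if grep matches nothing it exits non-zero, so the mv is skipped and a.txt is left untouched.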
Because the redirection > "image_names.txt" is performed before the ls command runs.
I am copying a LOGFILE to a remote server as it is being created.
tail -f LOGFILE | gzip -c >> /faraway/log.gz
However, when the original LOGFILE is closed, and moved to a storage directory, my tail -f seems to get some odd data.
How can I ensure that tail -f stops cleanly and that the compressed file /faraway/log.gz is a true copy of LOGFILE?
EDIT 1
I did a bit more digging.
/faraway/log.gz terminated badly, halfway through a FIX message. This must be because I Ctrl+C'ed the whole piped command above.
If I ignore this last line, then the original LOGFILE and log.gz match EXACTLY! That's for a 40G file transmitted across the Atlantic.
I am pretty impressed by that, as it does exactly what I want. Does any reader think I was just "lucky" in this case, or is this likely NOT to work in future?
Now, I just need to get a clean close of gzip. Perhaps sending a kill -9 to the tail PID, as suggested below, will allow gzip to finish its compression properly.
To get a full copy, use
tail -n +1 -f yourLogFile
If you don't use the -n +1 option, you only get the tail end of the file.
Yet this does not solve the deleted/moved file problem. In fact, the deleting/moving-file problem is an IPC (inter-process communication) problem, or rather an inter-process co-operation problem: if you don't have a correct model of the other process(es)' behaviour, you can't solve it.
For example, if the other program copies the log file somewhere else, then deletes the current one, and then writes its log output to a new log file, your tail obviously cannot read that output.
A related feature of unix (and unix-like systems) is worth mentioning: when a file is opened for reading by process A but is then deleted by process B, the physical contents are not immediately deleted, since the file's reference count is not zero (someone, namely process A, is still using it). Process A can keep accessing the file until it closes it.
Moving the file is a different question: if process B moves the file within the same physical file system (note that you may have many physical file systems attached to your system), process A can still access the file, even while it is growing. Such a move only changes the name (path name + file name), nothing more; the identity of the file (its "i-node" in unix) does not change. But if the file is moved to another physical file system, local or remote, it is as if the file were copied and then removed, so the removal rule above applies.
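A rough illustration of the rename-on-the-same-file-system case (names made up, output tidied up):
$ echo hello > f
$ tail -f f &          # the reader holds the file open
hello
$ mv f g               # rename on the same file system: same inode
$ echo world >> g
world                  # tail still sees data appended under the new name
$ kill %1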
The missing-lines problem you mentioned is interesting, and may need more analysis of the behavior of the programs/processes that generate and move/delete the log file.
--update--
Happy to see you've made some progress. As I said, on a unix-like system a process like tail can still access a file's data after the file has been deleted.
You can use
( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) | gzip -c - > yourZipFile.gz
to gzip your log file, and then kill the tail process with
kill -TERM `cat /tmp/PID_tail`
gzip should then finish by itself without error. If you are worried that gzip will receive a broken-pipe signal, you can use this alternative to guard against the broken pipe:
( ( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) ; true ) | gzip -c - > yourZipFile.gz
The broken pipe is guarded against by the true, which prints nothing and simply exits.
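If you want to double-check the result afterwards, something along these lines should work (gzip -t tests the integrity of the compressed stream; on some systems you may need gunzip -c instead of zcat):
gzip -t yourZipFile.gz && zcat yourZipFile.gz | cmp - yourLogFile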
From the tail manpage (emphasis mine):
With --follow (-f), tail defaults to following the file
descriptor, which means that even if a tail'ed file is renamed,
tail will continue to track its end. This default behavior is
not desirable when you really want to track the actual name of
the file, not the file descriptor (e.g., log rotation). Use
--follow=name in that case. That causes tail to track the named
file in a way that accommodates renaming, removal and creation.
Therefore the solution to the problem you describe is to use:
tail --follow=name LOGFILE | gzip -c >> /faraway/log.gz
This way, when the file is deleted, tail stops reading it.
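With GNU tail, -F is shorthand for --follow=name --retry, which additionally keeps retrying if the file is removed and later recreated (e.g. by log rotation):
tail -F LOGFILE | gzip -c >> /faraway/log.gz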
I'm trying to concatenate a license to the top of my built sources. I'm using GNU Make. In one of my rules, I have:
cat src/license.txt build/3d-tags.js > build/3d-tags.js
But this seems to be causing an infinite loop. When I kill the cat command, I see that build/3d-tags.js is just the contents of src/license.txt over and over again. What's going on? I would have expected the two files to be concatenated together, and the resulting output from cat to be redirected back into build/3d-tags.js. I'm not looking to append. I'm on OSX, in case the issue is related to GNU cat vs BSD cat.
The shell launches cat as a subprocess. The output redirection (>) is inherited by that subprocess as its stdout (file descriptor 1). Because the subprocess has to inherit the file descriptor at its creation, it follows that the shell has to open the output file before launching the subprocess.
So, the shell opens build/3d-tags.js for writing. Furthermore, since you're not appending (>>), it truncates the file. Remember, this happens before cat has even been launched. At this point, it's impossible to achieve what you wanted, because the original contents of build/3d-tags.js are already gone, and cat hasn't even been launched yet.
Then, when cat is launched, it opens the files named in its arguments. The timing and order in which it opens them isn't terribly important. It opens them both for reading, of course. It then reads from src/license.txt and writes to its stdout. This writing goes to build/3d-tags.js. At this point, it's the only content in that file because it was truncated before.
cat then reads from build/3d-tags.js. It finds the content that was just written there, which is what cat previously read from src/license.txt. It writes that content to the end of the file. It then goes back and tries to read some more. It will, of course, find more to read, because it just wrote more data to the end of the file. It reads this remaining data and writes it to the file. And on and on.
In order for cat to work as you hoped (even ignoring the shell redirection obliterating the contents of build/3d-tags.js), it would have to read and keep in memory the entire contents of build/3d-tags.js, no matter how big it was, so that it could write it after it wrote the contents of src/license.txt.
Probably the best way to achieve what you want is something like this:
cat src/license.txt build/3d-tags.js > build/3d-tags.js.new && mv build/3d-tags.js.new build/3d-tags.js || rm -f build/3d-tags.js.new
That is: concatenate the two files to a new file; if that succeeds, move the new file to the original file name (replacing the original); if either step fails, remove the temporary "new" file so as to not leave junk around.
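If moreutils happens to be installed, its sponge utility does much the same thing in one pipeline: it soaks up all of its input before opening the output file for writing, so the truncation problem never arises:
cat src/license.txt build/3d-tags.js | sponge build/3d-tags.js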
I have a program (that I did not write) which is not designed to read in commands from a file. Entering commands on STDIN is pretty tedious, so I'd like to be able to automate it by writing the commands in a file for re-use. Trouble is, if the program hits EOF, it loops infinitely trying to read in the next command dropping an endless torrent of menu options on the screen.
What I'd like to be able to do is cat a file containing the commands into the program via a pipe, then use some sort of shell magic to have it switch from the file to STDIN when it hits the file's EOF.
Note: I've already considered using cat with the '-' for STDIN. Unfortunately (I didn't know this before), piped commands wait for the first program's output to terminate before starting the second program -- they do not run in parallel. If there's some way to get the programs to run in parallel with that kind of piping action, that would work!
Any thoughts? Thanks for any assistance!
EDIT:
I should note that my goal is not only to prevent the system from hitting the end of the commands file. I would like to be able to continue typing in commands from the keyboard when the file hits EOF.
I would do something like
(cat your_file_with_commands; cat) | sh your_script
That way, when the file with commands is done, the second cat will feed your script with whatever you type on stdin afterwards.
Same as Idelic's answer, with simpler syntax ;)
cat your_file_with_commands - | sh your_script
I would think expect would work for this.
Have you tried using something like tail -f commandfile | command? I think that should pipe the lines of the file to command without closing the file descriptor afterwards. Use -n to specify the number of lines to be piped if tail -f doesn't catch all of them.