How do I tell tail -f it has finished - cleanly? - bash

I am copying a LOGFILE to a remote server as it is being created.
tail -f LOGFILE | gzip -c >> /faraway/log.gz
However, when the original LOGFILE is closed, and moved to a storage directory, my tail -f seems to get some odd data.
How can I ensure that tail -f stops cleanly and that the compressed file /faraway/log.gz is a true copy of LOGFILE?
EDIT 1
I did a bit more digging.
/faraway/log.gz terminated badly - halfway through a FIX message. This must be because I Ctrl-C'ed the whole piped command above.
If I ignore this last line, then the original LOGFILE and log.gz match EXACTLY! That's for a 40G file transmitted across the Atlantic.
I am pretty impressed by that, as it does exactly what I want. Does any reader think I was just "lucky" in this case - is this likely NOT to work in future?
Now I just need to get a clean close of gzip. Perhaps sending a kill -9 to the tail PID, as suggested below, may allow gzip to finish its compression properly.

To get a full copy, use
tail -n +1 -f yourfile
If you don't use the -n +1 option, you only get the tail end of the file.
Yet this does not solve the deleted/moved-file problem. In fact, the deleting/moving-file problem is an IPC (inter-process communication) problem, or rather an inter-process co-operation problem: if you don't have a correct behavior model of the other process(es), you can't resolve it.
For example, if the other program copies the log file somewhere else, deletes the current one, and then logs its output to that new log file, your tail obviously cannot read that output.
A related feature of unix (and unix-like systems) worth mentioning:
When a file is opened for reading by process A but is then deleted by
process B, the physical contents are not immediately deleted, since the
file's reference count is not zero (someone is still using it, i.e.
process A). Process A can still access the file until it closes it.
Moving the file is a different matter: if process B moves the
file within the same physical file system (note: you may have many
physical file systems attached to your system), process A can still
access the file, even while the file is growing. That kind of move
just changes the name (path name + file name), nothing more; the
identity of the file (its "i-node" in unix) does not change. But
if the file is moved to another physical file system, local or remote,
it is as if the file were copied and then removed, so the removal rule
mentioned above applies.
The missing-lines problem you mentioned is interesting, and may need more analysis of the behavior of the programs/processes which generate and move/delete the log file.
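A quick shell demonstration of the reference-count behavior described above (a sketch; the file name is arbitrary):
echo "first line" > /tmp/demo.log
exec 3< /tmp/demo.log   # open the file for reading on descriptor 3
rm /tmp/demo.log        # remove the name; the inode survives while descriptor 3 is open
cat <&3                 # still prints "first line" from the open descriptor
exec 3<&-               # close descriptor 3; only now is the data really released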
--update--
Happy to see you made some progress. As I said, on a unix-like system a process like tail can still access data after the file is deleted.
You can use
( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) | gzip -c - > yourZipFile.gz
to gzip your log file, and kill the tail program by
kill -TERM `cat /tmp/PID_tail`
gzip should then finish by itself without error. If you are worried that gzip will receive a broken
pipe signal, you can use this alternative way to prevent the broken pipe:
( ( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) ; true ) | gzip -c - > yourZipFile.gz
The broken pipe is guarded against by the true, which prints nothing but ends by itself.
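Putting it together, a hedged end-to-end sketch (file names as in the example above): start the capture in the background, stop tail with SIGTERM once the writer is done, and let gzip flush and exit on its own.
( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) | gzip -c - > yourZipFile.gz &

# ... later, once the producer has finished writing yourLogFile ...
kill -TERM "$(cat /tmp/PID_tail)"   # tail exits, the pipe closes, gzip flushes and exits
wait                                # wait for the background pipeline to finish
gzip -t yourZipFile.gz && echo "archive is intact"   # optional integrity check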

From the tail manpage (emphasis mine):
With --follow (-f), tail defaults to following the file
descriptor, which means that even if a tail'ed file is renamed,
tail will continue to track its end. This default behavior is
not desirable when you really want to track the actual name of
the file, not the file descriptor (e.g., log rotation). Use
--follow=name in that case. That causes tail to track the named
file in a way that accommodates renaming, removal and creation.
Therefore the solution to the problem you describe is to use:
tail --follow=name LOGFILE | gzip -c >> /faraway/log.gz
This way, when the file is deleted, tail stops reading it.
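Combining this with the -n +1 trick from the other answer, a hedged version of the original pipeline might look like this (GNU tail assumed):
tail -n +1 --follow=name LOGFILE | gzip -c >> /faraway/log.gz
In my understanding, once LOGFILE is removed, GNU tail reports it as inaccessible and, having no files left to follow, exits, so gzip sees end-of-input and can finish the archive cleanly.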

Related

Why does a bash redirection happen *before* the command starts, if the ">" is after the command?

Recently I tried to list all of the images located in a directory I had (several hundred) and put them into a file. I used a very simple command
ls > "image_names.txt"
I was bored and decided to look inside the file, and realized that image_names.txt was itself listed in the file. Then I realized the order of operations was not what I thought. I had read the command left to right, as two separate steps:
ls (First list all the file names)
> "image_names.txt" (Then create this file and pipe it here)
Why is it creating the file first then listing all of the files in the directory, despite the ls command coming first?
When you use output redirection, the shell needs a place to put your output (if it had to hold it instead, a very long output could be lost when the command terminates, or could exhaust all working memory), so the first step is to open the output file for streaming the executed command's stdout into it.
This is especially important to know in this kind of command
cat a.txt | grep "foo" > a.txt
since a.txt is opened first, and not in append mode, it is truncated, meaning there is no input for cat. So the behaviour you expect - that the matching lines are filtered from a.txt and written back to a.txt - will not actually happen. Instead you will just lose the contents of a.txt.
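A hedged sketch of the usual workaround: write to a temporary file and only then replace the original.
grep "foo" a.txt > a.txt.tmp && mv a.txt.tmp a.txt
If the moreutils package is installed, grep "foo" a.txt | sponge a.txt achieves the same thing, because sponge soaks up all of its input before it opens a.txt for writing.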
Because the redirection > "image_names.txt" is performed before the ls command runs.

Why does grep stop working after "Permission Denied"?

I executed a very simple grep:
grep -r "someSimpleWord" .
Now, there is one directory which contains some files producing some messages like
grep: path/to/some/unpermitted/file: Permission denied
After printing those messages grep just stops doing anything. It does not return nor does it continue searching (looking at top output there is no grep after the messages have been printed).
When I add
--exclude-dir="path/to/some/unpermitted"
grep works as expected again.
Since there is no error message I would consider that a bug in grep but that feels very unlikely. What am I missing here?
I am on Ubuntu 12.02.
Edit: Imagine using the -s option (which suppresses all error messages): that would leave you with an empty line and a grep apparently doing nothing, so you wait and wait because, well, it could just be taking a while.
The name of the file where grep stops, ./path to unpermitted/ptmx, indicates it's perhaps a special device file such as a pty multiplexer, normally only found in the /dev/ directory. grep will open it, but the device doesn't supply any data, so grep blocks until data becomes available (which is never).
Use the -D skip argument to grep.
-D ACTION, --devices=ACTION
If an input file is a device, FIFO or socket, use ACTION to process it.
By default, ACTION is read, which means that devices are read just as if they
were ordinary files. If ACTION is skip, devices are silently skipped.
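Putting the two suggestions together, a version of the original command that skips device files (so grep cannot block on them) and silences the permission errors:
grep -r -D skip -s "someSimpleWord" .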

shell - keep writing to the same file[name] even if externally changed

There's a process, which should not be interrupted, and I need to capture the stdout from it.
prg > debug.log
I can't quite modify the way the process outputs its data, though I'm free to wrap its launch as I see fit. Backgrounding, piping the output to other commands, etc. - that's all fair game. The process, once started, must run till the end of times (and can't be blocked, say, waiting for a fifo to be emptied). The writing isn't very fast, and the file can be cut at an arbitrary place if it exceeds a predefined size.
Now, the problem is the log would grow to fill all available space, and so it must be rotated, oldest instances deleted/overwritten. And now there's the problem...
If I do
mv debug.log debug.log.1
the file debug.log vanishes forever while debug.log.1 keeps growing.
If I do
cp debug.log debug.log.1
rm debug.log
the file debug.log.1 doesn't grow, but debug.log vanishes forever, and all subsequent output from the program is lost.
Is there some way to make the stdout redirect behave like typical log writing - if the file vanished, got renamed or such, create it again?
(this is all working under busybox, so lightweight solutions are preferred.)
If the application in question holds the log file open all the time and cannot be told to close and re-open the log file (as many applications can) then the only option I can think of is to truncate the file in place.
Something like this:
cp debug.log debug.log.1
: > debug.log
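As a sketch, that copy-then-truncate step could be run periodically next to the program (the interval and size threshold here are arbitrary):
max_bytes=1048576                        # rotate once debug.log exceeds ~1 MiB
while sleep 60; do
    if [ "$(wc -c < debug.log)" -gt "$max_bytes" ]; then
        cp debug.log debug.log.1         # keep one older generation
        : > debug.log                    # truncate in place; prg keeps its open descriptor
    fi
done
Note that because prg > debug.log does not open the file in append mode, writes after a truncation continue at the old offset and leave a hole at the start of the file; launching the program as prg >> debug.log avoids that.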
You may use the split command, or any other command which reopens a new output file as it goes:
prg | split --numeric-suffixes --lines=100 - debug.log.
The reason is that the redirected output goes to a file handle, not to the file name,
so the process has to close that file handle and open a new one to start a new file.
You may use rotatelogs to do it in Apache HTTPD style, if you like:
http://httpd.apache.org/docs/2.2/programs/rotatelogs.html
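A hedged usage sketch, assuming rotatelogs is available (the path and size are illustrative):
prg | rotatelogs /var/log/debug.log 1M
Here rotatelogs starts a new file, with a suffix appended to the given name, every time the current one reaches 1 megabyte.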
In bash style, you can use a script:
fn='debug.log'
i=0
prg | while IFS='' read -r in; do
    if ...; then           # time, or number of lines read, or file size, or ...
        i=$((i+1))
        mv "$fn" "$fn.$i"  # rename the file
        : > "$fn"          # make it empty
    fi
    echo "$in" >> "$fn"    # fill the file again
done
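As one concrete (and entirely hypothetical) way to fill in that if ..., the sketch below rotates on a line count:
fn='debug.log'
i=0
n=0
prg | while IFS='' read -r in; do
    n=$((n+1))
    if [ "$n" -ge 10000 ]; then   # rotate every 10000 lines
        i=$((i+1))
        mv "$fn" "$fn.$i"
        : > "$fn"
        n=0
    fi
    echo "$in" >> "$fn"
done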

Infinite loop when redirecting file to itself from Unix cat

I'm trying to concatenate a license to the top of my built sources. I'm using GNU Make. In one of my rules, I have:
cat src/license.txt build/3d-tags.js > build/3d-tags.js
But this seems to be causing an infinite loop. When I kill the cat command, I see that build/3d-tags.js is just the contents of src/license.txt over and over again. What's going on? I would have expected the two files to be concatenated together, and the resulting output from cat to be redirected back into build/3d-tags.js. I'm not looking to append. I'm on OSX, in case the issue is related to GNU cat vs BSD cat.
The shell launches cat as a subprocess. The output redirection (>) is inherited by that subprocess as its stdout (file descriptor 1). Because the subprocess has to inherit the file descriptor at its creation, it follows that the shell has to open the output file before launching the subprocess.
So, the shell opens build/3d-tags.js for writing. Furthermore, since you're not appending (>>), it truncates the file. Remember, this happens before cat has even been launched. At this point, it's impossible to achieve what you wanted because the original contents of build/3d-tags.js is already gone, and cat hasn't even been launched yet.
Then, when cat is launched, it opens the files named in its arguments. The timing and order in which it opens them isn't terribly important. It opens them both for reading, of course. It then reads from src/license.txt and writes to its stdout. This writing goes to build/3d-tags.js. At this point, it's the only content in that file because it was truncated before.
cat then reads from build/3d-tags.js. It finds the content that was just written there, which is what cat previously read from src/license.txt. It writes that content to the end of the file. It then goes back and tries to read some more. It will, of course, find more to read, because it just wrote more data to the end of the file. It reads this remaining data and writes it to the file. And on and on.
In order for cat to work as you hoped (even ignoring the shell redirection obliterating the contents of build/3d-tags.js), it would have to read and keep in memory the entire contents of build/3d-tags.js, no matter how big it was, so that it could write it after it wrote the contents of src/license.txt.
Probably the best way to achieve what you want is something like this:
cat src/license.txt build/3d-tags.js > build/3d-tags.js.new && mv build/3d-tags.js.new build/3d-tags.js || rm -f build/3d-tags.js.new
That is: concatenate the two files to a new file; if that succeeds, move the new file to the original file name (replacing the original); if either step fails, remove the temporary "new" file so as to not leave junk around.
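A variant of the same idea, using mktemp so the temporary name cannot collide with anything else in the build (a sketch; mktemp is assumed to be available, as it is on OSX and most Linux systems):
tmp=$(mktemp build/3d-tags.js.XXXXXX) &&
cat src/license.txt build/3d-tags.js > "$tmp" &&
mv "$tmp" build/3d-tags.js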

kqueues on Mac OS X: strange event order

I monitor a file for changes in a separate thread using kqueues/kevent(2).
(I monitor a Python file for reparsing)
I subscribe as following:
EV_SET(&file_change, pyFileP, EVFILT_VNODE,
       EV_ADD | EV_CLEAR,
       NOTE_DELETE | NOTE_WRITE | NOTE_EXTEND |
       NOTE_ATTRIB | NOTE_LINK | NOTE_RENAME | NOTE_REVOKE,
       0, 0);
When I write to the file "/tmp/somefile.py" using Vim, I get two separate kevents:
The flags of these events (event.fflags) are:
NOTE_RENAME
and
NOTE_DELETE | NOTE_LINK
I never get a "NOTE_WRITE" event!
This seems to have something to do with the way Vim writes these files, since if I do
echo "sometext" >> /tmp/somefile.py
I do get the:
NOTE_WRITE|NOTE_EXTEND
event.
Odd, eh? I haven't checked the Vim source code but it must do something strange, or does it simply use user level functions that are implemented that way?
I wasn't really expecting this. Is this a known problem? Do I just have to check for all possible events, or is there a known interface that really checks whether a file has been written?
What is actually happening is that Vim won't write over the same file: first
it probably renames it to something else and then creates another file (link).
You can confirm that by doing something like:
$ vim file -c wq
This will open a file and write it. Now check the inode:
$ ls -i
30621217 file
Write the file with Vim again and re-check the inode:
$ vim file -c wq
$ ls -i
30621226 file
It's just different. That means the second file is actually another file
(linked to another inode) with the same name, and the old one was unlinked.
Many editors do that. I can't confirm exactly why Vim takes this approach.
Maybe for safety: if you first rename the file and something goes wrong
while writing the new file, you still have the old one, whereas if you start
writing over a file and a problem occurs (even with memory), you'll probably
lose part of it. Maybe.
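A small shell check along the same lines, assuming vim is installed (the file name is arbitrary):
f=/tmp/somefile.py
echo 'print("hi")' > "$f"
ls -i "$f"                # note the inode number
echo '# appended' >> "$f"
ls -i "$f"                # same inode: an in-place write (NOTE_WRITE|NOTE_EXTEND)
vim "$f" -c wq            # save with Vim
ls -i "$f"                # often a different inode: the old file was replaced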

Resources