Why does grep stop working after "Permission Denied"? - bash

I executed a very simple grep:
grep -r "someSimpleWord" .
Now, there is one directory which contains some files producing some messages like
grep: path/to/some/unpermitted/file: Permission denied
After printing those messages, grep just stops doing anything. It does not return, nor does it continue searching (looking at the output of top, there is no grep after the messages have been printed).
When I add
--exclude-dir="path/to/some/unpermitted"
grep works as expected again.
Since there is no error message I would consider that a bug in grep but that feels very unlikely. What am I missing here?
I am on Ubuntu 12.02.
Edit: Imagine using the -s option (which suppresses all error messages); that would leave you with an empty line and a grep that appears to be doing nothing. So you wait and wait, because, well, it just could take a while.

The name of the file where grep stops, ./path to unpermitted/ptmx, indicates it's perhaps a special device file such as a pty mux, normally only found in the /dev/ directory. grep will open that, but the device doesn't supply any data, so grep blocks until data becomes available (which is never).
Use the -D skip argument to grep.
-D ACTION, --devices=ACTION
If an input file is a device, FIFO or socket, use ACTION to process it.
By default, ACTION is read, which means that devices are read just as if they
were ordinary files. If ACTION is skip, devices are silently skipped.
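Assuming the blocked file really is a device node, the original command can be combined with -D skip (and with -s, if you also want to hide the permission-denied noise):
grep -r -D skip -s "someSimpleWord" .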

Related

Run an executable and read logs at the same time

I have a scenario where I run an executable (as an entrypoint) from a docker container.
The problem is, that executable doesn't write logs to stdout, but to a file.
I need a way to run that executable in the foreground (so that if it crashes, it crashes the container as well), but pipe logs from a file to stdout at the same time.
Any suggestion on how to do that?
The Linux environment provides a couple of special files that actually relay to other file descriptors. If you set the log file to /dev/stdout or /dev/fd/1, it will actually appear on the main process's stdout.
The Docker Hub nginx image has a neat variation on this. If you look at its Dockerfile it specifies:
RUN ln -sf /dev/stdout /var/log/nginx/access.log
The Nginx application configuration specifies its log file as /var/log/nginx/access.log. If you do nothing, that is a symlink to /dev/stdout, and so access logs appear in the docker logs output. But if you'd prefer to have the logs in files, you can bind-mount a host directory on /var/log/nginx and you'll get access.log on the host as a file instead.
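The same trick works for any program that insists on writing to a log file path; a minimal sketch, with a hypothetical application and log path:
mkdir -p /var/log/myapp
ln -sf /dev/stdout /var/log/myapp/access.log   # anything written to this path now lands on stdout
exec /usr/local/bin/myapp                      # hypothetical app, assumed to log to that file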
G'day areller!
Well, your question is quite generic/abstract (as David Maze mentioned, you didn't say what the exact commands are or how you're running them), but I think I've got it.
You will do the following:
# if the command logs to stdout, do:
command &> /var/tmp/command.log &
tail -f /var/tmp/command.log
# if the command logs to a specific file in, say, /var/adm, do:
tail -f /specific/file/directory/command.log
Instead of explaining tail -f myself, I will quote the tail(1p) manual page:
-f If the input file is a regular file or if the file operand specifies a FIFO, do not terminate after the last line of the input file
has been copied, but read and copy further bytes from the input file when they become available. If no file operand is specified and
standard input is a pipe, the -f option shall be ignored. If the input file is not a FIFO, pipe, or regular file, it is unspecified
whether or not the -f option shall be ignored.
I hope I've helped you.
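If the two pieces need to be combined into a single container entrypoint (the executable stays in the foreground while its log file is streamed to stdout), a rough sketch could look like this; the program name and log path are hypothetical, and the program is assumed to write its logs to that file:
#!/bin/bash
touch /var/tmp/command.log
tail -n +1 -F /var/tmp/command.log &    # stream the log file to the container's stdout in the background
exec /usr/local/bin/command             # keep the executable in the foreground so its exit/crash ends the container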

What does the `grep -m 1` command mean in UNIX?

I googled this command but couldn't find anything about it.
grep -m 1 "\[{" xxx.txt > xxx.txt
However, when I typed this command, no error occurred.
There was also no output from this command.
Can anyone explain how this command works?
This command reads from and writes to the same file, but not in a left-to-right fashion. In fact > xxx.txt runs first, emptying the file before the grep command starts reading it. Therefore there is no output. You can fix this by storing the result in a temporary file and then renaming that file to the original name.
PS: Some commands work around this issue without relying on shell redirects; sed, for example, can edit the file in place with its -i option.
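A minimal sketch of the fix, keeping the file name from the question (-m 1 simply tells grep to stop after the first matching line):
grep -m 1 "\[{" xxx.txt > xxx.txt.tmp && mv xxx.txt.tmp xxx.txt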

How do I tell tail -f it has finished - cleanly?

I am copying a LOGFILE to a remote server as it is being created.
tail -f LOGFILE | gzip -c >> /faraway/log.gz
However, when the original LOGFILE is closed, and moved to a storage directory, my tail -f seems to get some odd data.
How can I ensure that tail -f stops cleanly and that the compressed file /faraway/log.gz is a true copy of LOGFILE?
EDIT 1
I did a bit more digging.
/faraway/log.gz terminated badly, halfway through a FIX message. This must be because I Ctrl-C'ed the whole piped command above.
If I ignore this last line, then the original LOGFILE and log.gz match EXACTLY! That's for a 40G file transmitted across the Atlantic.
I am pretty impressed by that as it does exactly what I want. Does any reader think I was just "lucky" in this case - is this likely NOT to work in the future?
Now, I just need to get a clean close of gzip. Perhaps sending a kill -9 to the tail PID as suggested below may allow gzip to finish its compression properly.
To get a full copy, use
tail -n +1 -f yourfile
If you don't use the -n +1 option, you only get the tail end of the file.
Yet this does not solve the deleted/moved-file problem. In fact, the deleting/moving-file problem is an IPC (inter-process communication) problem, or rather an inter-process co-operation problem. If you don't have the correct behavior model of the other process(es), you can't resolve it.
For example, if the other program copies the log file somewhere else, then deletes the current one, and then writes its log output to that new log file, your tail obviously cannot read that output.
A related feature of unix (and unix-like systems) is worth mentioning:
When a file is opened for reading by process A but is then deleted by
process B, the physical contents are not immediately removed, since the
file's reference count is not zero (someone is still using it, namely
process A). Process A can keep accessing the file until it closes it.
Moving the file is another matter: if process B moves the file within
the same physical file system (note: you may have many physical file
systems attached to your system), process A can still access the file,
even while the file is growing. That kind of move only changes the name
(path name + file name), nothing more; the identity of the file (its
"i-node" in unix) does not change. But if the file is moved to another
physical file system, local or remote, it is as if the file were copied
and then removed, so the removal rule above applies.
The missing lines problem you mentioned is interesting, and may need more analysis on the behavior of the programs/processes which generate and move/delete the log file.
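A quick way to see that reference-count behavior from a shell, as a throwaway sketch using a file in /tmp:
echo "some log lines" > /tmp/demo.log
exec 3< /tmp/demo.log    # this shell now holds the file open on descriptor 3
rm /tmp/demo.log         # the name is gone, but the data blocks are not reclaimed yet
cat <&3                  # still prints "some log lines"
exec 3<&-                # closing the last open descriptor finally frees the data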
--update--
Happy to see you got some progress. Like I said, a process like tail can still access data after
the file is deleted, in a unix-like system.
You can use
( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) | gzip -c - > yourZipFile.gz
to gzip your log file, and kill the tail program by
kill -TERM `cat /tmp/PID_tail`
The gzip should finish by itself without error. Even if you are worried that gzip will receive a broken-pipe
signal, you can use this alternative to guard against the broken pipe:
( ( echo $BASHPID > /tmp/PID_tail; exec tail -n +1 -f yourLogFile ) ; true ) | gzip -c - > yourZipFile.gz
The broken pipe is absorbed by the true, which prints nothing and simply exits.
From the tail manpage (emphasis mine):
With --follow (-f), tail defaults to following the file
descriptor, which means that even if a tail'ed file is renamed,
tail will continue to track its end. This default behavior is
not desirable when you really want to track the actual name of
the file, not the file descriptor (e.g., log rotation). Use
--follow=name in that case. That causes tail to track the named
file in a way that accommodates renaming, removal and creation.
Therefore the solution to the problem you proposed is to use:
tail --follow=name LOGFILE | gzip -c >> /faraway/log.gz
This way, when the file is deleted, tail stops reading it.

Annoying cat/sed behavior when running through MATLAB on Mac

I am running a script to parse text email files that can be called by MATLAB or run from the command line. The script looks like this:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d" | "$MYSED" "/\-Original Message\-/q"
If I run cat message_file | ./parser.sh in my Terminal window, I get a parsed text file. If I do the same using the system command in MATLAB, I occasionally get the same parsed text followed by the error message
cat: stdout: Broken pipe
When I was using a sed command instead of a cat command, I was getting the same error message. This happens maybe on 1 percent of the files I am parsing, almost always large files where a lot gets deleted after the Original Message line. I do not get the error when I do not include the last pipe, the one deleting everything after 'Original Message'.
I would like to suppress the error message from cat if possible. Ideally, I would also like to understand why running the script through MATLAB gives me an error while running it in Terminal does not. Since it tends to happen on larger files, I am guessing it has to do with a memory limitation, but 'broken pipe' is such a vague error message that I can't be sure. Any hints on either issue would be much appreciated.
I could probably run the script outside of MATLAB and save the processed files, but as some of the files are large I would much rather not duplicate them at this point.
The problem is occurring because of the final gsed command, "$MYSED" "/\-Original Message\-/q". This (obviously) quits as soon as it sees a match, and if the gsed feeding it tries to write anything after that, it'll receive SIGPIPE and quit; if there's enough data the same will happen to the first gsed, and if there's enough data after that, SIGPIPE will be sent to the original cat command, which reports the error. Whether the error makes it back to cat will depend on timing, buffering, the amount of data, the phase of the moon, etc.
My first suggestion would be to put the "$MYSED" "/\-Original Message\-/q" command at the beginning of the pipeline, and have it do the reading from the file (rather than feeding it from cat). This'd mean changing the script to accept the file to read from as an argument:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" "/\-Original Message\-/q" "$#" | "$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d"
...and then run it with ./parser.sh message_file. If my assumptions about the message file format are right, changing the order of the gsed commands this way shouldn't cause trouble. Is there any reason the message file needs to be piped to stdin rather than passed as an argument and read directly?
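Assuming that reordering is acceptable, the pipe from cat disappears entirely and so does its broken-pipe message. If the script must keep reading from stdin instead, a simpler (if blunter) workaround is to silence cat's stderr, which is where that message goes:
cat message_file 2>/dev/null | ./parser.sh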

kqueues on Mac OS X: strange event order

I monitor a file for changes in a separate thread using kqueues/ kevent(2).
(I monitor a Python file for reparsing)
I subscribe as following:
EV_SET(&file_change, pyFileP, EVFILT_VNODE,
EV_ADD | EV_CLEAR,
NOTE_DELETE | NOTE_WRITE | NOTE_EXTEND |
NOTE_ATTRIB | NOTE_LINK | NOTE_RENAME | NOTE_REVOKE,
0, 0);
When I write to the file "/tmp/somefile.py" using Vim, I get two separate kevents:
The flags of these events (event.fflags) are:
NOTE_RENAME
and
NOTE_DELETE | NOTE_LINK
I never get a "NOTE_WRITE" event!
This seems to have something to do with the way Vim writes these files, since if I do
echo "sometext" >> /tmp/somefile.py
I do get the:
NOTE_WRITE|NOTE_EXTEND
event.
Odd, eh? I haven't checked the Vim source code but it must do something strange, or does it simply use user level functions that are implemented that way?
I wasn't really expecting this. Is this a known problem where I just have to check for all possible events, or is there a known interface that really checks whether a file has been written?
What is actually happening is that Vim won't write over the same file: first
it probably renames the original to something else, and then it creates another file (link) with the same name.
You can confirm that by doing something like:
$ vim file -c wq
This will open a file and write it. Now check the inode:
$ ls -i
30621217 file
Write the file with Vim again and re-check the inode:
$ vim file -c wq
$ ls -i
30621226 file
It's just different. That means the second file is actually another file
(linked to another inode) with the same name, and the old one was unlinked.
Many editors do that. I can't confirm why exactly Vim takes this approach.
Maybe for safety: if you first rename the file and something goes wrong
while writing the new file, you still have the old one. If you start writing
over a file and a problem occurs (even with memory) you'll probably lose part
of it. Maybe.
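One way to test that the rename-and-replace write strategy is really what you're seeing (assuming Vim's 'backupcopy' option behaves as documented) is to force an in-place write and compare inodes; with backupcopy=yes Vim overwrites the original file instead of replacing it:
ls -i file
vim -c 'set backupcopy=yes' -c wq file
ls -i file    # the inode should now stay the same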
