A command like top | grep chromium gives me a running time series of CPU and memory load on STDOUT. I am trying to pipe this output to a file with top | grep chromium >> load.log, which fails (no error, but load.log stays empty).
Is the problem the time-varying output of top?
How can I solve this using Bash alone, without external tools?
How can I adjust the update time?
top | grep --line-buffered chromium >> load.log
When grep writes to a tty, its output is line-buffered. When it is redirected to a file, its output is fully buffered, meaning it only flushes once 4096 bytes have accumulated. The --line-buffered option overrides this behavior, forcing grep to flush stdout after every line.
This behavior isn't unique to grep; it can happen with any standard C program that uses libc. You can use stdbuf to force an arbitrary program to be line-buffered.
top | stdbuf -oL grep chromium >> load.log
Related
I am trying to find out when and how the cache is flushed. I planned to use the command redis-cli monitor | grep -iE "del|flush" > redis_log.txt for that, but for some reason the file is empty. If I use the command without the > redis_log.txt part, it shows the correct output in the terminal; if I use redis-cli monitor > redis_log.txt, it also saves the actual output to the file. Together, however, it fails: only an empty file is created. Has anybody met a similar issue before?
As mentioned in the comments, the issue you notice certainly comes from the I/O buffering applied to the grep command, especially when its standard output is not attached to a terminal but redirected to a file or similar.
To be more precise, see e.g. this nice blog article which concludes with this wrap-up:
Here’s how buffering is usually set up:
STDIN is always buffered.
STDERR is never buffered.
if STDOUT is a terminal, line buffering will be automatically selected. Otherwise, block buffering (probably 4096 bytes) will be used.
[…] these 3 points explain all “weird” behaviors.
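To see this in action from a shell, here is a minimal sketch (the sleep/echo loop is just a stand-in for any slow producer such as redis-cli monitor):
while sleep 1; do echo match; done | grep match          # grep's stdout is the terminal: line-buffered, one line per second
while sleep 1; do echo match; done | grep match | cat    # grep's stdout is a pipe: block-buffered, output appears only after ~4 KB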
General solution
To tweak the I/O streams buffering of a program, a very handy program provided by coreutils is stdbuf.
So for your use case:
you may want to replace grep -iE "del|flush" with:
stdbuf -o0 grep -iE "del|flush" to completely disable STDOUT buffering;
or, if you'd like a trade-off and just want STDOUT line-buffering,
you may want to replace grep -iE "del|flush" with:
either stdbuf -oL grep -iE "del|flush",
or grep --line-buffered -iE "del|flush".
Wrap-up
Finally, as suggested by @jetchisel, you'll probably want to redirect STDERR to your log file as well, in order not to miss some error messages… Hence, for example:
redis-cli monitor | stdbuf -o0 grep -iE "del|flush" > redis_log.txt 2>&1
Background
I'm working on a change to my CI build that triggers a command which runs my Xcode unit tests. These tests log an extremely verbose amount of information – so much so that I exceed the 4 MB log capture limit and my build gets terminated. As far as I can tell, there's no way for me to make them less verbose (the one way I came up with would require a change to the command that runs the tests, which I'm working on).
My Workaround
So I decided to get clever and try filtering my output with sed, like so:
test_running_command | sed '/xctest\[/d; /^$/d'
The sed command works when I run it on a file, filtering out lines containing xctest[ and empty lines, as intended. But when I incorporate this into my CI build, I see output streamed to a certain point, and then it just stops. After 10 minutes, my CI build gets killed anyway, before I had the chance to hit the 4 MB limit.
The Question
Why is sed hanging like this?
Troubleshooting Performed
I tried using awk, like so, which similarly hangs.
test_running_command | awk '$0 !~ /xctest\[/ && $0 !~ /^$/ {print}'
I tried a command from this answer to turn on line buffering for the original command, which just hung at a different place.
script -q /dev/null test_running_command | <sed or awk command from above>
As suggested by @CharlesDuffy in a comment, I used tee to write what goes into each side of the pipe to a file, and determined that the left side of the pipe definitely gets much further than the right. I observed (on my local machine) that while the output to the console was frozen, before.txt was continuing to grow. This is the line I used:
test_running_command | tee before.txt | sed '/xctest\[/d; /^$/d' | tee after.txt
As suggested by @LuisMuñoz in a comment, I tried using stdbuf (after installing it with brew install coreutils) to disable buffering into and out of sed. I didn't see a difference in behavior; output still froze at an arbitrary point.
test_running_command | gstdbuf -i0 -o0 sed '/xctest\[/d; /^$/d'
I discovered the -l flag for sed. Apparently it turns on line-buffering mode, which produced the desired effect. I wonder if sed was having problems because its buffer was getting overrun by the incredibly long lines produced. Regardless, this command ultimately worked:
test_running_command | sed -l '/xctest\[/d; /^$/d'
As a side note, I had one test that was taking longer than 10 minutes on the CI server, and without log output, that was causing a timeout. I put a single printf statement inside its outer loop that produced just enough log output so the build didn't get killed. The printf line didn't get filtered by the sed command, so this is perfect.
Running this:
ping google.com | grep -o 'PING'
Will print PING to the terminal, so I assume that means that the stdout of grep was captured by the terminal.
So why doesn't the following command print anything? The terminal just hangs:
ping google.com | grep -o 'PING' | grep -o 'IN'
I would think that the stdout of the first grep command would be redirected to the stdin of the second grep. Then the stdout of the second grep would be captured by the terminal and printed.
This seems to be what happens if ping is replaced with echo:
echo 'PING' | grep -o 'PING' | grep -o 'IN'
IN is printed to the terminal, as I would expect.
So what's special about ping that prevents anything from being printed?
You could try being more patient :-)
ping google.com | grep -o 'PING' | grep -o 'IN'
will eventually display output, but it might take half an hour or so.
Under Unix, the standard output stream handed to a program when it starts up is "line-buffered" if the stream is a terminal; otherwise it is fully buffered, typically with a buffer of 8 kilobytes (8,192 characters). Buffering means that output is accumulated in memory until the buffer is full, or, in the case of line-buffered streams, until a newline character is sent.
Of course, a program can override this setting, and programs which produce only small amounts of output -- like ping -- typically make stdout line-buffered regardless of what it is. But grep does not do so (although you can tell GNU grep to do that by using the --line-buffered command-line option.)
"Pipes" (which are created to implement the | operator) are not considered terminals. So the grep in the middle will have a fully-buffered output, meaning that its output will be buffered until 8k characters are written. That will take a while in your case, because each line contains only five characters (PING plus a newline), and they are produced once a aecond. So the buffer will fill up after about 1640 seconds, which is almost 28 minutes.
Many unix distributions come with a program called stdbuf which can be used to change the buffering of the standard streams before running a program. (If you have stdbuf, you can find out how it works by typing man 1 stdbuf.) Programming languages like Perl generally provide other mechanisms to adjust stream buffering. (In Perl, you can force a flush after every write using the built-in variable $|, or the autoflush(BOOL) IO handle method.)
Of course, when a program successfully terminates, all output buffers are "flushed" (sent to their respective streams). So
echo PING | grep -o 'PING' | grep -o 'IN'
will immediately output its only output line. But ping does not terminate unless you provide a count command-line option (-c N; see man ping). So if you need immediate piped throughput, you may need to modify buffering behaviour.
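For this specific pipeline, either of the following should print IN roughly once per second (a sketch assuming GNU grep and coreutils stdbuf; the last grep writes to the terminal, so it is already line-buffered on its own):
ping google.com | grep --line-buffered -o 'PING' | grep -o 'IN'
ping google.com | stdbuf -oL grep -o 'PING' | grep -o 'IN'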
I have observed a few times now that A | B | C may not lead to immediate output, although A is constantly producing output. I have no idea how this is even possible. From my understanding, all three processes ought to be working at the same time, putting their output into the next pipe (or stdout) and taking from the previous pipe as soon as they are finished with one step.
Here's an example where I am currently experiencing that:
tcpflow -ec -i any port 8340 | tee second.flow | grep -i "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'
What is supposed to happen:
I look at one port for tcp packages. If something comes it should be a certain XML format and I want to grep the Manufacturer and the Serialnumber from these packages. I would also like to get the full, unmodified output in a text file "second.flow", for later reference.
What happens:
Everything as desired, but instead of getting output every 10 seconds (I'm sure I get these outputs every ten seconds!) I have to wait for a long time and then a lot is printed at once. It's like one of the tools gobbles up everything in a buffer and only prints it if the buffer is full. I don't want that. I want to get each line as fast as possible.
If I replace tcpflow ... with a cat second.flow it works immediately. Can someone describe what's going on? And in case that it's obvious would there be another way to achieve the same result?
Every layer in a series of pipes can involve buffering; by default, tools that don't specify buffering behavior for stdout will use line buffering when outputting to a terminal, and block buffering when outputting anywhere else (including piping to another program or a file). In a chained pipe, all but the last stage will see their output as not going to the terminal, and will block buffer.
So in your case, tcpflow might be producing output constantly, and if it's doing so, tee should be producing data almost at the same rate. But grep is going to limit that flow to a trickle, and won't produce output until that trickle exceeds the size of the output buffer. It's already performed the filtering and called fwrite or puts or printf, but the data is waiting for enough bytes to build up behind it before sending it along to awk, to reduce the number of (expensive) system calls.
cat second.flow produces output immediately because as soon as cat finishes producing output, it exits, flushing and closing its stdout in the process. That cascades: when each subsequent step finds its stdin at EOF, it exits, flushing and closing its own stdout. tcpflow isn't exiting, so that cascade of EOFs and flushes isn't happening.
For some programs, in the general case, you can change the buffering behavior by using stdbuf (or unbuffer, though that can't do line buffering to balance efficiency, and has issues with piped input). If the program is using internal buffering, this still might not work, but it's worth a shot.
In your specific case, though, since it's likely grep that's causing the interruption (by only producing a trickle of output that is sticking in the buffer, where tcpflow and tee are producing a torrent, and awk is connected to stdout and therefore line buffered by default), you can just adjust your command line to:
tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'
At least for Linux's grep (not sure if the switch is standard), that makes grep change its own output buffering to line-oriented buffering explicitly, which should remove the delay. If tcpflow itself is not producing enough output to flush regularly (you implied it did, but you could be wrong), you'd use stdbuf on it (but not on tee, which, as the stdbuf man page notes, changes its own buffering manually, so stdbuf doesn't do anything there) to make it line-buffered:
stdbuf -oL tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'
Update from comments: It looks like some flavors of awk block buffer prints to stdout, even when connected to a terminal. For mawk (the default on many Debian based distros), you can non-portably disable it by passing the -Winteractive switch at invocation. Alternatively, to work portably, you can just call system("") after each print, which portably forces output flushing on all implementations of awk. Sadly, the obvious fflush() is not portable to older implementations of awk, but if you only care about modern awk, just use fflush() to be obvious and mostly portable.
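For example, if the awk stage turns out to be the one holding output back, either of these variants keeps it flushing per line (a sketch; -Winteractive is the mawk-specific switch and system("") the portable fallback mentioned above):
... | mawk -Winteractive -F'[<>]' '{print $3}'     # mawk only
... | awk -F'[<>]' '{print $3; system("")}'        # portable: force a flush after every print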
Reduce Buffering
Each application in the pipeline can do its own buffering. You may want to see if you can reduce buffering in tcpflow, as your other commands are line-oriented and unlikely to be the source of your buffering issue. I didn't see any specific options for buffer control in tcpflow, though the -b flag for max_bytes may help in circumstances where the text you want to work with is near the front of the flow.
You can also try modifying the buffering of tcpflow using stdbuf from GNU coreutils. This may help to reduce latency in your pipeline, but the man page provides the following caveats:
NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does for example) then that will override corresponding changes by 'stdbuf'. Also some filters (like 'dd' and 'cat' etc.) don't use streams for I/O, and are thus unaffected by 'stdbuf' settings.
As an example, the following may reduce output buffering of tcpflow:
stdbuf --output=0 tcpflow -ec -i any port 8340 # unbuffered output
stdbuf --output=L tcpflow -ec -i any port 8340 # line-buffered output
unless one of the caveats above applies. Your mileage may vary.
For context, I'm attempting to create a shell script that simplifies the realtime console output of ffmpeg, only displaying the current frame being encoded. My end goal is to use this information in some sort of progress indicator for batch processing.
For those unfamiliar with ffmpeg's output, it outputs encoded video information to stdout and console information to stderr. Also, when it actually gets to displaying encode information, it uses carriage returns to keep the console screen from filling up. This makes it impossible to simply use grep and awk to capture the appropriate line and frame information.
The first thing I've tried is replacing the carriage returns using tr:
$ ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | tr '\r' '\n'
This works in that it displays realtime output to the console. However, if I then pipe that information to grep or awk or anything else, tr's output is buffered and is no longer realtime. For example: $ ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | tr '\r' '\n' > log.txt results in a file that is immediately filled with some information; then, 5-10 secs later, more lines get dropped into the log file.
At first I thought sed would be great for this: $ ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | sed 's/\\r/\\n/', but it gets to the line with all the carriage returns and waits until the processing has finished before it attempts to do anything. I assume this is because sed works on a line-by-line basis and needs the whole line to be complete before it does anything else, and then it doesn't replace the carriage returns anyway. I've tried various regexes for the carriage return and newline, and have yet to find a solution that replaces the carriage return. I'm running OSX 10.6.8, so I am using BSD sed, which might account for that.
I have also attempted to write the information to a log file and use tail -f to read it back, but I still run into the issue of replacing carriage returns in realtime.
I have seen that there are solutions for this in python and perl, however, I'm reluctant to go that route immediately. First, I don't know python or perl. Second, I have a completely functional batch processing shell application that I would need to either port or figure out how to integrate with python/perl. Probably not hard, but not what I want to get into unless I absolutely have to. So I'm looking for a shell solution, preferably bash, but any of the OSX shells would be fine.
And if what I want is simply not doable, well I guess I'll cross that bridge when I get there.
If it is only a matter of output buffering by the receiving application after the pipe, then you could try using gawk (and some BSD awks) or mawk, which can flush their buffers. For example, try:
... | gawk '1;{fflush()}' RS='\r\n' > log.txt
Alternatively, if your awk does not support this, you could force it by repeatedly closing the output file and appending the next line...
... | awk '{sub(/\r$/,x); print>>f; close(f)}' f=log.out
Or you could just use shell, for example in bash:
... | while IFS= read -r line; do printf "%s\n" "${line%$'\r'}"; done > log.out
Libc uses line-buffering when stdout and stderr are connected to a terminal and full-buffering (with a 4KB buffer) when connected to a pipe. This happens in the process generating the output, not in the receiving process—it's ffmpeg's fault, in your case, not tr's.
Try using unbuffer or stdbuf to disable output buffering:
unbuffer ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | tr '\r' '\n'
stdbuf -e0 -o0 ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | tr '\r' '\n'
The buffering of data between processes in a pipe is controlled by some system limits, which, at least on my system (Fedora 17), cannot be modified:
$ ulimit -a | grep pipe
pipe size (512 bytes, -p) 8
$ ulimit -p 1
bash: ulimit: pipe size: cannot modify limit: Invalid argument
$
Although this buffering is mostly related to how much excess data the producer is allowed to produce before it is stopped if the consumer is not consuming at the same speed, it might also affect timing of delivery of smaller amounts of data (not quite sure of this).
That is the buffering of pipe data, and I do not think there is much to tweak here. However, the programs reading/writing the piped data might also buffer stdin/stdout data and this you want to avoid in your case.
Here is a perl script that should do the translation with minimal input buffering and no output buffering:
#!/usr/bin/perl
use strict;
use warnings;
use Term::ReadKey;

my $ReadKeyTimeout = 10;   # seconds to wait for the next input character
$| = 1;                    # OUTPUT_AUTOFLUSH: flush stdout after every print

# Read the input character by character; stop when nothing arrives within the timeout.
while ( my $key = ReadKey($ReadKeyTimeout) ) {
    if ( $key eq "\r" ) {
        print "\n";        # translate a carriage return into a newline
        next;
    }
    print $key;
}
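Saved as, say, cr2nl.pl (a name chosen here just for illustration) and made executable, it slots into the pipeline like this:
ffmpeg -i "ScreeningSchedule-1.mov" -y "test.mp4" 2>&1 | perl cr2nl.pl > log.txt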
However, as already pointed out, you should make sure that ffmpeg does not buffer its output if you want a real-time response.