Displaying stdout on screen and a file simultaneously - bash

I'd like to log standard output from a script of mine to a file, but also have it display to me on screen for realtime monitoring. The script outputs something about 10 times every second.
I tried to redirect stdout to a file and then tail -f that file from another terminal, but for some reason tail is updating the screen significantly slower than the script is writing to the file.
What's causing this lag? Is there an alternate method of getting one standard output stream both on my terminal and into a file for later examination?

I can't say why tail lags, but you can use tee:
Redirect output to multiple files, copies standard input to standard output and also to any files given as arguments. This is useful when you want not only to send some data down a pipe, but also to save a copy.
Example: <command> | tee <outputFile>
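To make the idea behind tee concrete, here is a minimal Python sketch of the same behaviour: every line read from a source is written to the screen and appended to a file, with a flush after each line so neither copy lags. The name tee_lines and its parameters are made up for this illustration; the real tee is a separate program and is not implemented this way.

```python
import sys


def tee_lines(source, log_path, screen=sys.stdout):
    """Copy each line from `source` to `screen` and append it to `log_path`."""
    with open(log_path, "a", encoding="utf-8") as log:
        for line in source:
            screen.write(line)
            screen.flush()  # show the line on screen right away
            log.write(line)
            log.flush()     # make the line visible to anyone reading the file
```

Saved in a script that calls tee_lines(sys.stdin, "out.log"), it could stand at the end of a pipe the same way tee does.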

How much of a lag do you see? A few hundred characters? A few seconds? Minutes? Hours?
What you are seeing is buffering. Almost all file reads and writes are buffered. This includes input and output, and there is also some buffering taking place within pipes. It's just more efficient to pass a packet of data around rather than a byte at a time. I believe data on HFS+ file systems is stored in UTF-16 while Mac OS X normally uses UTF-8 as a default. (NTFS also stores data using UTF-16 while Windows uses code pages for character data by default).
So, if you run tail -f from another terminal, you may be seeing buffering from tail, but when you use a pipe and then tee, you may have a buffer in the pipe and another in the tee command, which may be why you see the lag.
By the way, how do you know there's a lag? How do you know how quickly your program is writing to the disk? Do you print out something in your program to help track the writes to the file?
In that case, you might not be lagging as much as you think. File writes are also buffered. So, it is very possible that the lag isn't from the tail -f, but from your script writing to the file.
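A small Python sketch of that last point, assuming the script writes through an ordinary buffered file object: the data sits in the process's own buffer until it is flushed, so another program reading the file (such as tail -f) cannot see it yet. The file name demo.log is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.log")

with open(path, "w") as log:  # regular files are block-buffered by default
    log.write("tick\n")
    size_before_flush = os.path.getsize(path)  # still in the process's buffer
    log.flush()                                # hand the data to the operating system
    size_after_flush = os.path.getsize(path)

print(size_before_flush, size_after_flush)
```

The first number is 0 and the second is not: a reader of the file sees nothing until the writer flushes. Flushing after each write in the script (or using print(..., flush=True)) removes this source of lag.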

Use tee command:
tail -f /path/logFile | tee outfile

Related

How to get error text in the iperf message? [duplicate]

I am rather confused with the purpose of these three files. If my understanding is correct, stdin is the file in which a program writes into its requests to run a task in the process, stdout is the file into which the kernel writes its output and the process requesting it accesses the information from, and stderr is the file into which all the exceptions are entered. On opening these files to check whether these actually do occur, I found nothing that seemed to suggest so!
What I would want to know is what exactly is the purpose of these files, absolutely dumbed down answer with very little tech jargon!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
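The same wiring can be sketched from Python with subprocess, which makes the three handles explicit. This is only an illustration: my_prog is replaced by a tiny hypothetical child that copies its input and writes one diagnostic, the file contents are invented, and the grep step is done in the parent.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "inputfile")
error_path = os.path.join(workdir, "errorfile")

with open(input_path, "w") as f:
    f.write("abc\nXYZ 1\nXYZ 2\n")

# Hypothetical stand-in for my_prog: copy stdin to stdout, complain on stderr.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "sys.stderr.write('my_prog: done\\n')\n"
)

with open(input_path) as stdin_file, open(error_path, "w") as stderr_file:
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        stdin=stdin_file,        # file handle 0, like <inputfile
        stdout=subprocess.PIPE,  # file handle 1, like the | in the pipeline
        stderr=stderr_file,      # file handle 2, like 2>errorfile
        text=True,
    )

# The parent plays the role of grep XYZ on the piped output.
matches = [line for line in proc.stdout.splitlines() if "XYZ" in line]
with open(error_path) as f:
    diagnostics = f.read()
```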
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather
than files. As you've noticed, these entities do not live in the filesystem. But the
Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice,
that really means that you can use the same library functions and interfaces (printf,
scanf, read, write, select, etc.) without worrying about whether the I/O stream
is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout,
and stderr are predefined for you, as a programming convenience. This is only
a convention, and is not enforced by the operating system.
As a complement of the answers above, here is a sum up about Redirections:
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all, it's passing "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &> however
ls Documents ABC > dirlist 2>&1
#does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file to redirect to, and 2>&1 is simply sending stderr into stdout
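For comparison, Python's subprocess module expresses the 2>&1 idea as stderr=subprocess.STDOUT. In this sketch the child is a hypothetical two-line program standing in for ls; the messages are invented.

```python
import subprocess
import sys

# Hypothetical child: one normal line on stdout, one complaint on stderr.
child_code = (
    "import sys\n"
    "print('listing')\n"
    "print('no such file', file=sys.stderr)\n"
)
command = [sys.executable, "-c", child_code]

# Two separate streams, as with > out 2> err
separate = subprocess.run(
    command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
)

# stderr sent wherever stdout goes, as with 2>&1
merged = subprocess.run(
    command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
```

In the merged case both lines arrive on the one stream, but their relative order is not guaranteed, because the two streams are buffered differently.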
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the program needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out # redirect cat's standard out to /tmp/out
cat /nonexistant 2> /tmp/err # redirect cat's standard error to /tmp/err
cat < /etc/passwd # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references to the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured to keep it simple and use the easier functions, but I thought all the same you should know the alternatives. :)
I think people saying stderr should be used only for error messages is misleading.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (i.e. if you run a shell pipe chaining several commands you do not want informative messages like "getting item 30 of 42424" to appear on stdout as they will confuse the consumer, but you might still want the user to see them).
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
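A short Python sketch of that split between data and informative messages. The function name export_items and the sample items are invented for illustration; the point is only which stream each kind of line goes to.

```python
import sys


def export_items(items, out=sys.stdout, err=sys.stderr):
    """Write data to `out` and progress messages to `err`."""
    total = len(items)
    for number, item in enumerate(items, start=1):
        print(f"getting item {number} of {total}", file=err)  # for the person watching
        print(item, file=out)                                  # for the next program in the pipe
```

If a script like this is piped into another command, only the items travel down the pipe; the progress lines still show up on the terminal.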
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (cat < file.txt), or another program (ps | grep <userid>).
The destinations for stdout, stderr can also be redirected. For example stdout can be redirected to a file: ls . > ls-output.txt, in this case the output is written to the file ls-output.txt. Stderr can be redirected with 2>.
Using ps -aux reveals current processes, all of which are listed in /proc/ as /proc/(pid)/. Calling cat /proc/(pid)/fd/0 reads from whatever the standard input of that process is connected to, I think. So perhaps,
/proc/(pid)/fd/0 - Standard Input File
/proc/(pid)/fd/1 - Standard Output File
/proc/(pid)/fd/2 - Standard Error File
for example
But only worked this well for /bin/bash other processes generally had nothing in 0 but many had errors written in 2
For authoritative information about these files, check out the man pages, run the command on your terminal.
$ man stdout
But for a simple answer, each file is for:
stdout for a stream out
stdin for a stream input
stderr for printing errors or log messages.
Each unix program has each one of those streams.
stderr does not do I/O cache buffering, so if your application needs to print critical messages (errors, exceptions) to the console or to a file, use stderr. Use stdout for general log output: because it uses I/O cache buffering, there is a chance that the application exits before the messages are written to the file, which makes debugging harder.
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device
https://www.mkssoftware.com/docs/man5/stdio.5.asp
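The "interactive device" test in that description can be observed from Python, whose own standard streams make a similar decision. In this sketch a child process reports whether its standard output is a terminal; because the parent connects it to a pipe, it is not.

```python
import subprocess
import sys

# The child only reports whether its stdout refers to a terminal.
child_code = "import sys; print(sys.stdout.isatty())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,  # stdout is a pipe here, not an interactive device
    text=True,
)
stdout_is_terminal = result.stdout.strip()
print(stdout_is_terminal)
```

Run directly in a terminal, the same one-line child would report True instead, which is the case where output is shown line by line.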
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux—like almost everything else—are treated as though
they were files. You can read text from a file, and you can write text
into a file. Both of these actions involve a stream of data. So the
concept of handling a stream of data as a file isn’t that much of a
stretch.
Each file associated with a process is allocated a unique number to
identify it. This is known as the file descriptor. Whenever an action
is required to be performed on a file, the file descriptor is used to
identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
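These numbers are easy to check from Python. The sketch below starts a child process with all three streams explicitly connected and has it print the file descriptor behind each one.

```python
import subprocess
import sys

child_code = (
    "import sys; "
    "print(sys.stdin.fileno(), sys.stdout.fileno(), sys.stderr.fileno())"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdin=subprocess.DEVNULL,  # give the child a valid stdin
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
descriptor_numbers = result.stdout.split()
print(descriptor_numbers)  # ['0', '1', '2']
```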
Ironically I found this question on stack overflow and the article above because I was searching for information on abnormal / non-standard streams. So my search continues.

How to split a real-time stdout stream into several files?

I have a python script which is continuously writing a text stream to stdout.
Something like this (genstream.py):
import time

while 1:
    print(int(time.time()))
    time.sleep(1)
I want a bash script that launches the Python script and saves its output to a set of files, for example splitting the output every hour to avoid creating one huge file that is difficult to manage.
The files created this way will then be processed (one at the end of each hour) by the same bash script, which inserts the values into a database and moves the files to an archive folder.
I did my search in google/stack overflow (e.g. split STDIN to multiple files (and compress them if possible) Bash reading STDOUT stream in real-time or https://unix.stackexchange.com/questions/26175/ ) but I didn't find any solution so far.
I've tried to use also something easy like this (so without taking in account the time but only the number of lines)
python3 ./genstream.py | split -l5 -
but I have no output.
I've tried a combination of (named-)pipes and tee but nothing seems to work.
Try this:
python3 ./genstream.py | while IFS= read -r line; do
    echo "$line" >> "split_$(date +%Y-%m-%d-%H)"
done
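The same hourly split can be sketched in Python. The names hourly_name and append_by_hour and the clock parameter are made up for this illustration; the file naming follows the date format used in the loop above. One likely reason the earlier attempts showed no output is that Python block-buffers stdout when it is a pipe, so the producer may need python3 -u or print(..., flush=True) for lines to arrive promptly.

```python
import os
import time


def hourly_name(timestamp, prefix="split_"):
    """Build a file name such as split_2024-01-31-17 for the given time."""
    return prefix + time.strftime("%Y-%m-%d-%H", time.localtime(timestamp))


def append_by_hour(lines, directory, clock=time.time):
    """Append each incoming line to the file for the hour in which it arrived."""
    for line in lines:
        path = os.path.join(directory, hourly_name(clock()))
        with open(path, "a") as f:  # reopened per line, so the hour can roll over
            f.write(line)
```

Calling append_by_hour(sys.stdin, ".") at the end of the pipe would take the place of the while loop.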

Cannot capture diagnostic output from mpg123 while the program is running

I want to invoke mpg123 from PHP (using exec) and monitor the diagnostic output generated by the program while it is running.
I have been searching the Internet and cannot find any way to see the redirected output of a command line program while it is running.
Instead, the output file is always written out AFTER the process finishes, but I need to access the output while it still running, hence my question.
Testing with:
mpg123.exe http://148.251.184.14:8192/stream | tee.exe streaming.txt
... file streaming.txt is always empty while running the exe.
[Editor's note: and so it would be, mpg123 sends diagnostic output to stderr].
Also, I tested this:
mpg123.exe http://148.251.184.14:8192/stream > streaming.txt
... and still no luck, because again, file streaming.txt is always empty while mpg123 is still running.
[Editor's note: of course, for the same reason as above, the command should be:
mpg123.exe http://148.251.184.14:8192/stream 2> streaming.txt
But still you see nothing in file streaming.txt until the program terminates.
end note]
Is there a way to do this? Seems to be a hard nut or not even possible...
Thank you for any help.
PS:
Using static binary from: https://mpg123.de/download/win64/1.25.10/
Tee.exe: https://sourceforge.net/projects/unxutils/files/unxutils/current/
You could, for example, get tail from GnuWin32 (it's in package coreutils). Then:
In one command prompt window run tail -F output-file. This will initially sit there because there is no output-file yet. Let it sit.
In another command prompt window run your-command > output-file.
In the first command prompt window tail will display the contents of output-file as it is generated.
Note 1: The program your-command may buffer its output, so that it is written in chunks. Some programs have options to minimize output buffering, for example sed -u or grep --line-buffered.
Note 2: tail works as fast as it can, but console output is quite slow on Windows. It is perfectly possible for a program to generate output much faster than tail can display it.
I have tested this procedure with dir /s C:\ > Ls-lR.txt and tail Ls-lR.txt.
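For readers who cannot install tail, the core of what tail -F does can be sketched in a few lines of Python. The helper name read_new_lines is invented for this illustration, and the file is expected to be opened in binary mode; a real tail also handles files that are truncated or recreated, which this sketch does not.

```python
def read_new_lines(f):
    """Return the complete lines appended to binary file `f` since the last call."""
    lines = []
    while True:
        position = f.tell()
        line = f.readline()
        if not line.endswith(b"\n"):
            f.seek(position)  # nothing new, or only part of a line: retry next call
            break
        lines.append(line)
    return lines
```

Calling it in a loop with a short time.sleep between calls, and printing whatever it returns, gives a basic follow mode.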
The quirks of MPG123
The specific program which the querent wants to monitor is MPG123. This program:
Does not normally write to standard output, and it actually closes standard output unless it wants to write WAV data.
Writes diagnostic messages to standard error, but only if standard error is not redirected or the option -v is given.
So...
Open a command prompt window and type tail -F mpg123.out. Since there is no file named mpg123.out, tail will sit and wait. Let it wait.
C> tail -F MPG123.out
Open a second command prompt window, and run mpg123
Redirecting standard error to mpg123.out, and
With the option -v.
C> mpg123.exe 2>MPG123.out -v "\path\to\the\music\file.mp3"
In the first window, watch the diagnostic messages of MPG123.
I have decided to delete my original answer and post a new one, because although the old one was factually correct it didn't answer the question very well. Now that I understand what the OP is actually doing, I can answer this properly.
The issue is actually very simple. Most programs, especially command line programs, on most platforms contain logic to detect if stdout or stderr has been redirected to a file (> file) or a pipe (e.g. | tee). This logic is usually actually buried in the runtime library so programs get it for free, which is why they pretty much all do it, and I'm sure that's true of mpg123 which is a relatively simple beast. What I say below will apply to almost any program.
Now, what this logic does is to decide whether or not to buffer output to stdout / stderr (it may make a different decision for each one). If output is going directly to the console (or, in Unix, the terminal) then it is not buffered at all (or maybe just on a per-line basis). Everything is sent out pretty much as soon as the program generates it.
If, on the other hand, output is redirected then mpg123 detects this and writes the data out in chunks (often 4k chunks), and if the total amount of output generated while the program is running is smaller than the size of the buffer then you won't see anything in the output file or pipe until the program terminates, at which point the buffer is flushed and the file closed (so you see it then, as the OP noted).
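The effect is easy to reproduce without mpg123. In this Python sketch a hypothetical child writes one short line to a redirected stdout and then keeps running; the parent checks the size of the output file while the child is alive and again after it exits. PYTHONUNBUFFERED is removed from the child's environment so that the default buffering applies.

```python
import os
import subprocess
import sys
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Hypothetical child: a little output, a "ready" signal, then wait to be released.
child_code = (
    "import sys\n"
    "print('diagnostic line')\n"
    "print('ready', file=sys.stderr, flush=True)\n"
    "sys.stdin.readline()\n"
)
env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}

with open(out_path, "w") as out_file:
    child = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=out_file,  # redirected to a file, so the child buffers in chunks
        stderr=subprocess.PIPE,
        text=True,
        env=env,
    )
    child.stderr.readline()  # wait until the child has done its print
    size_while_running = os.path.getsize(out_path)
    child.communicate("\n")  # release the child and wait for it to exit

size_after_exit = os.path.getsize(out_path)
print(size_while_running, size_after_exit)
```

The file is empty while the child runs and only receives the line once the child exits and its buffer is flushed.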
Now, knowing all that, we can explain the behaviour that the OP observes when running mpg123. This is not in fact down to any intricate juggling that mpg123 might do with file handles and the change in behaviour when you add in -v is just a side-effect. What you see is a direct result of the different buffer strategy used when the output is redirected.
So, using the binary linked to by the OP, this command:
mpg123 http://148.251.184.14:8192/stream
Generates the following output on the console straightaway (because nothing is buffered):
High Performance MPEG 1.0/2.0/2.5 Audio Player for Layers 1, 2 and 3
version 1.25.10; written and copyright by Michael Hipp and others
free software (LGPL) without any warranty but with best wishes
Directory: http://148.251.184.14:8192/
Playing MPEG stream 1 of 1: stream ...
ICY-NAME: Chroma Metal
ICY-URL: http://chromaradio.com
MPEG 1.0 L III cbr128 44100 j-s
ICY-META: StreamTitle='Avantasia - The Seven Angels';
It then goes on to play the stream through the sound card, which takes quite a while. The above information is written to stderr (and mpg123 always writes diagnostic information to stderr).
This command, however, behaves differently, because the output is buffered (note the redirection of stderr):
mpg123 http://148.251.184.14:8192/stream 2>x.txt
As noted by the OP, this just creates a zero length file while the stream is playing, because the total amount of diagnostic output fits in mpg123's internal buffer so it just stays there until the program terminates, at which point the output duly turns up in the file for the reason given above.
And finally, this command, with the -v parameter added in:
mpg123 -v http://148.251.184.14:8192/stream 2>x.txt
does generate some output in x.txt while the program is running because the buffer fills up with the extra diagnostic information that the -v flag generates and at that point mpg123 has to write it to disk. The -v flag means verbose. That's where the extra output comes from.
Please note though that when you do this the data in the file is still always some way behind (because the next buffer-full is building up and won't be output until it's full), so while adding -v might get you what you want (or at least some of it), it hasn't changed the underlying problem. You can see this quite clearly if you run the above command in one console window and tail -F x.txt in another. When you do that, nothing shows up for the first 5 seconds or so. Then some (partial) output appears, and so it goes on.
So I hope that clears things up. Windows and Unix behave pretty much the same in this regard. I will edit the OP's question to make it a little less confusing. It's a bit untidy at the moment.
Perhaps the "tee" already on the machine could be used. I do not have your mpg123.exe executable, so I cannot test it.
powershell -NoProfile -Command "& mpg123.exe [StreamURL] | Tee-Object -FilePath .\streaming.txt"
Edit
Based on the information from @AlexP that mpg123.exe is writing to stderr, I would try:
powershell -NoProfile -Command "& mpg123.exe [StreamURL] 2>&1 | Tee-Object -FilePath .\streaming.txt"

BASH, cat Buffer

cat report.txt | sed 's/<\/li>/<\/li> \n/g' > report.txt
This obviously results in an empty file.
Is there a mechanism that allows you to store the data before processing it or store the output until the command has finished executing and then write the file?
http://en.wikipedia.org/wiki/Pipeline_(Unix)#Implementation:
"...a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in a queue. When the receiving program is ready to read data, the operating system sends its data from the queue, then removes that data from the queue."
Sounds like there should be a simple trick to load this into a queue instead of writing it immediately to file, then unload it after the command has finished?
Thanks much!
What you need is editing in place:
sed -i 's/<\/li>/<\/li> \n/g' report.txt
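The original command produces an empty file because the shell truncates report.txt for the > redirection before cat gets to read it. The safe pattern is to read first, write the result somewhere else, and only then replace the original. A minimal Python sketch of that pattern follows; the function names are invented for illustration, and the whole file is read into memory, which is fine for a report but not for very large files.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Replace the contents of `path` with transform(old contents)."""
    with open(path, encoding="utf-8") as source:
        text = source.read()  # read everything before touching the file
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as temp:
        temp.write(transform(text))
    os.replace(temp_path, path)  # swap the finished file into place


def break_after_li(text):
    """Same substitution as the sed expression: add ' \\n' after each </li>."""
    return text.replace("</li>", "</li> \n")
```

With these definitions, rewrite_in_place("report.txt", break_after_li) has the same effect as the sed -i command.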

Redirection Doesn't Work

I want to put my program's output into a file. I keyed in the following :
./prog > log 2>&1
But there is nothing in the file "log". I am using the Ubuntu 11.10 and the default shell is bash.
Anybody know the cause of this AND how I can debug this?
There are many possible causes:
The program reads the input from log file while you try to redirect into it with truncation (see Why doesn't "sort file1 > file1" work?)
The output is buffered so that you don't see data in the file until the output buffer is flushed. You can manually call fflush or output std::flush if using C++ I/O stream etc.
The program is smart enough and disables output if the output stream is not a terminal.
You look at the wrong file (i.e. in another directory).
You try to dump file's contents incorrectly.
Your program outputs '\0' as the first character so the output appears to be empty, even though there is some data.
Name your own.
Your best bet is to run this application under a debugger (like gdb) or use strace or ptrace (or both) and see what the program is doing. I mean, really, output redirection has worked for something like the last 40 years, so the problem must be somewhere else.
