I have a Perl script which writes messages to STDOUT and STDERR (via print/croak statements), and I also redirect STDOUT and STDERR to a logfile:
File::Tee::tee STDOUT, ">>", "$logFile" ;
File::Tee::tee STDERR, ">>", "$logFile" ;
Now the logfile has the messages from STDOUT and STDERR out of order, and the actual output on the terminal is out of order as well. I have tried flushing the buffers (as recommended here: https://perl.plover.com/FAQs/Buffering.html), but it doesn't help:
select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
Does anyone know what I must do to see the output in order? (I also tried additionally flushing the filehandle corresponding to $logfile, but it's still the same.)
EDIT:
Thanks to all of you who have replied. A lot of the discussion over this ended up in comments, so I am going to list the few things which I tried based on feedback from all of you.
I was already flushing STDOUT and STDERR before I used File::Tee. As @jimtut suspected, File::Tee was indeed the culprit: removing it restored the ordering on the console. But I still want to redirect STDOUT and STDERR to a logfile.
@mob suggested using IO::Tee instead, but I haven't fully understood how to make that work the way I want in my code.
@briandfoy pointed out that there isn't a reliable way to ensure two separate filehandles are seen in the correct order in real time, and also suggested using a logging routine which is the only place that can write to STDOUT/STDERR. @zimd further pointed out that File::Tee uses fork, which is the heart of the issue, since two processes cannot guarantee any order on output.
Since File::Tee is to blame, I attempted to remove it from the code. I updated my logger function to print to STDOUT/STDERR as well as to the $log filehandle. Further, to capture the warns in the log, I did the following:
sub warning_handler {
    my $msg = $_[0] ;
    print STDERR $msg ;
    print $log $msg if defined $log ;
}
$SIG{__WARN__} = \&warning_handler ;
This worked great for all the code under my control. Everything was now printing in order, both on the console and in the logfile. However, I realized I can't use this solution, since I am also calling someone else's Perl packages for some functionality, and clearly I can't intercept the print/croak etc. which write to STDOUT/STDERR inside those 'off the shelf' packages. So right now I don't have a good solution. However, I suspect that if I can find some way to intercept STDOUT/STDERR within Perl, I might be able to get what I need.
EDIT2:
I added my own answer, which is probably the closest I got to solving the problem, by modifying mob's solution to use IO::Tee instead of File::Tee; but even this misses some messages (though it fixes the ordering).
EDIT3:
Finally found the 'solution'
use IO::Tee ;
use Capture::Tiny qw(capture);
...
...
select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
open (my $log, ">", $logfilename) ;
*REALSTDOUT = *STDOUT ;
*REALSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*REALSTDOUT, $log);
*STDERR = IO::Tee->new(\*REALSTDERR, $log);
# Regular Perl code here which sends output to STDOUT/STDERR
...
...
# Output from system calls / calls into .so libraries needs to be captured
&log_streams(sub { &some_func_which_calls_shared_object() ; }) ;
sub log_streams {
    my ($cr, @args) = @_; # code reference, with its arguments
    my ($out, $err, $exit) = capture { $cr->(@args) };
    if ($out) {
        print STDOUT $out;
    }
    if ($err) {
        print STDERR $err;
    }
}
The use of IO::Tee ensures that all Perl-generated output to the console also goes to the logfile, and this happens immediately, updating the log and the console in real time. Since IO::Tee changes what the STDOUT/STDERR filehandles refer to (they now point to the teed handles), it can only intercept I/O done from Perl statements; it misses system calls, since those bypass Perl's STDOUT/STDERR handles. So we capture the system-call output and then use the log_streams routine to forward it to the now-aliased STDOUT/STDERR streams. This introduces a delay before system-call output shows up in the log/terminal, but there is no delay for Perl-generated output, i.e. the best of both worlds. Do note that the relative ordering of the stderr and stdout generated by an invocation of some_func_which_calls_shared_object is not preserved, since in the log_streams routine we first print all of STDOUT and then all of STDERR; as long as the call is atomic and doesn't do much interleaving of stdout/stderr messages, we should be OK.
I appreciate the solutions from briandfoy, mob and zimd, whose answers I combined to arrive at this! I never thought it would take this much detail to solve what seems like a very simple problem.
With two separate filehandles, there's no contract or guarantee that you'll see their output in real time. Various settings and buffers affect that, which is why you see the autoflush stuff ($|). It's the same idea for files or the terminal.
Realize this is an architectural issue rather than a syntactic one. You have two things competing for the same resource. That usually ends in tears. I hesitate to suggest a solution when I don't know what the problem is, but consider having whatever is trying to write to STDOUT or STDERR write to some sort of message broker that collects all the messages and is the only thing that writes to the final (shared) destination. For example, things that want to add entries to the syslog don't write to the syslog; they send messages to the thing that writes to the syslog.
A more Perly example: in Log4perl, you don't write to the final destinations. You simply log a message, and the logger is the single thing that figures out how to handle it. When I want this sort of behavior, I don't use the output facilities directly:
debug( "Some debug message" );
sub debug {
    my $message = shift;
    output( "DEBUG: $message" );
}

sub output { # single thing that can output message
    ...
}
Then do whatever you need to do in output.
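For instance, output() might funnel every message to both the terminal and a log file. This is only a sketch of that idea, not the original answer's code; the log filename and handle names are illustrative:

use strict;
use warnings;
use IO::Handle;

# Hypothetical single output point: every message goes to the terminal
# and is appended to a log file. Adjust the destinations as needed.
open my $log_fh, '>>', 'myapp.log' or die "Can't open log: $!";
$log_fh->autoflush(1);
STDOUT->autoflush(1);

sub output {
    my $message = shift;
    print STDOUT "$message\n";    # terminal
    print $log_fh "$message\n";   # logfile
}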
But you can't always control that in other code that is also trying to output things. Perl lets you get around this by redefining what warn and friends do, by putting a coderef in $SIG{__WARN__}. You can capture warning messages and do whatever you like with them (such as sending them to standard output). Beyond that is black magic that reopens STDERR onto something you can control. It's not that bad, and it's isolated in one place.
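That "black magic" might look roughly like the following sketch (the log filename is only an example; a dup of the original STDERR is kept around so the terminal is still reachable):

use strict;
use warnings;
use IO::Handle;

# Route warn() and friends through our own handler.
$SIG{__WARN__} = sub { print STDERR "WARN: $_[0]" };

# Keep a duplicate of the original STDERR so we can still reach the terminal.
open my $real_stderr, '>&', \*STDERR or die "Can't dup STDERR: $!";

# Reopen STDERR onto something we control (here, a log file).
open STDERR, '>>', 'myapp.log' or die "Can't reopen STDERR: $!";
STDERR->autoflush(1);

warn "this now ends up in myapp.log\n";
print $real_stderr "this still goes to the terminal\n";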
At some point, someone else won't want merged output, and the intrusive solutions make it impossible to separate the streams again. I'd much prefer flexibility over a hard-coded constraint. If I want just the errors, I want a way to get just the errors. There are many other sorts of workarounds, such as wrappers that collect both output streams (so, not at all intrusive) and various command redirections.
You will have two filehandles writing to $logfile. Unless File::Tee takes care to seek to the end of the filehandle before every write (which it doesn't appear to), you will get a race condition where one filehandle will overwrite the other.
A workaround would be to use the reopen option to the File::Tee::tee function -- that will close the file after each write and reopen it (at the proper end of the file) before the next write. That could hurt your performance though, depending on how often you write to those filehandles.
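A hedged sketch of what that might look like, reusing the $logFile from the question. The hash-of-options form and the value passed to reopen are assumptions here, so check the File::Tee documentation for the exact syntax:

use File::Tee qw(tee);

# Assumed option syntax; consult the File::Tee docs before relying on it.
# 'reopen' closes and reopens the log around writes, so the two teed
# handles don't clobber each other's data.
tee STDOUT, { mode => '>>', reopen => 1 }, $logFile;
tee STDERR, { mode => '>>', reopen => 1 }, $logFile;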
You might also have better luck with IO::Tee, which is a more straightforward implementation (using tied filehandles) than what File::Tee uses (a background process for each File::Tee::tee call), so you may get fewer surprises. Here is how an IO::Tee solution might look:
use IO::Tee;
$| = 1;
open my $stdout, ">&=1"; # to write to original stdout
open my $stderr, ">&=2"; # to write to original stderr
open my $fh_log, ">>", $logfile;
*STDOUT = IO::Tee->new($stdout, $fh_log);
*STDERR = IO::Tee->new($stderr, $fh_log);
...
There are no background processes, extra threads, or anything else to cause a race condition. STDOUT and STDERR will both write to the same log filehandle from the same process.
After taking the hint from @mob's answer to use IO::Tee instead of File::Tee (since the latter uses fork, causing STDERR and STDOUT to appear out of order), I modified mob's original solution a bit and it worked (almost - read on):
use IO::Tee ;
...
...
open (my $log, ">", $logfilename) ;
*MYSTDOUT = *STDOUT ;
*MYSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*MYSTDOUT, $log);
*STDERR = IO::Tee->new(\*MYSTDERR, $log);
This resulted in the correct ordering both on the console and in the logfile. (mob's original solution, using open to dup STDOUT/STDERR, didn't work - it resulted in the correct order in the logfile, but out of order on the console. Using a typeglob alias instead of a dup works, for some strange reason.)
However, as good as this solution sounds, it missed logging some messages from a package that I call (though they do get printed on the console). My original code with File::Tee did show these same messages from the package in the logfile, so there is some voodoo going on somewhere. The specific package in question is a .so file, so I have no visibility into how exactly it prints its messages.
EDIT:
I guess that the .so file is effectively an external system command printing to stdout/stderr. Since it's not going through Perl's I/O, the handles pointed to by the STDOUT/STDERR typeglobs within Perl will not reflect the output of external programs called from Perl.
I guess the best solution would be to use a combination of this solution for messages coming from within the Perl code, and Capture::Tiny::capture, as pointed out by @zdim, for capturing and redirecting messages from system calls / calls going through the SWIG interface.
Note: the first part is done via tie-d handles; the solution in the second part uses Capture::Tiny.
A bare-bones proof-of-concept for an approach using tie-d handles.
The package that ties a handle, so that printing to it goes to a file and to (a copy of) the STDOUT stream:
package DupePrints;
use warnings;
use strict;
use feature 'say';
my $log = 't_tee_log.out';
open my $fh_out, '>', $log or die $!; # for logging
# An independent copy of STDOUT (via dup2), for prints to terminal
open my $stdout, '>&', STDOUT or die $!;
sub TIEHANDLE { bless {} }
sub PRINT {
    my $self = shift;
    print $fh_out @_;
    print $stdout @_;
}
1;
A program that uses it
use warnings;
use strict;
use feature 'say';
use DupePrints;
$| = 1;
tie *STDERR, 'DupePrints';
tie *STDOUT, 'DupePrints';
say "hi";
warn "\t==> ohno";
my $y;
my $x = $y + 7;
say "done";
This prints the following text both to the terminal and to t_tee_log.out:
hi
==> ohno at main_DupePrints.pl line 14.
Use of uninitialized value $y in addition (+) at main_DupePrints.pl line 17.
done
See perltie and Tie::Handle, and this post with related examples, and perhaps this post
Logging the STDOUT and STDERR streams to a file (along with the copied printout to the terminal) works across other modules that may be used in the main program as well.
To also have "clean" prints, that don't get logged, copy the STDOUT handle in the main program, like it's done in the module, and print to that. If you need to use this in a more selective and sophisticated manner please modify as needed -- as it stands it is meant to be only a basic demo.
With the clarification in the question's edit, here is a different approach: wrap a call to Capture::Tiny, which captures all output from any code, and then manage the captured prints as needed
use warnings;
use strict;
use feature qw(say state);
use Capture::Tiny qw(capture);
sub log_streams {
    my ($cr, @args) = @_;  # code reference, with its arguments

    # Initialize "state" variable, so it runs once and stays open over calls
    state $fh_log = do {
        open my $fh, '>', 'tee_log.txt' or die $!;
        $fh;
    };

    my ($out, $err, $exit) = capture { $cr->(@args) };

    if ($out) {
        print $fh_log $out;
        print $out;
    }
    if ($err) {
        print $fh_log $err;
        print $err;
    }
}
log_streams( sub { say "hi" } );
log_streams( sub { warn "==> ohno" } );
log_streams( sub { my $y; my $x = $y + 7; } );
log_streams( sub { system('perl', '-wE', q(say "external perl one-liner")) } );
log_streams( sub { say "done" } );
The downside of all this is that everything needs to run via that sub. But then again, that's actually a good thing, even if sometimes inconvenient.
The state feature is used to "initialize" the filehandle because a variable declared as state is never re-initialized; so the file is opened only once, on the first call, and stays open.
This is also a demo in need of completion.
I am writing a program in which I am taking in a CSV file via the < operator on the command line. After I read in the file, I would also like to ask the user questions and have them input their response via the command line. However, whenever I ask for user input, my program skips right over it.
When I searched Stack Overflow I found what seems to be the Python version of this question, but it doesn't really help me since the methods are obviously different.
I read my file using $stdin.read, and I have tried regular gets, STDIN.gets, and $stdin.gets. However, the program always skips over them.
Sample invocation: ruby ./bin/kata < items.csv
Current file:
require 'csv'
n = $stdin.read
arr = CSV.parse(n)
input = ''
while true
  puts "What is your choice: "
  input = $stdin.gets.to_i
  if input.zero?
    break
  end
end
My expected result is to have "What is your choice: " displayed in the terminal, waiting for user input. Instead, I get that phrase displayed over and over in an infinite loop. Any help would be appreciated!
You can't read both file and user input from stdin. You must choose. But since you want both, how about this:
Instead of piping the file content to stdin, pass just the filename to your script. The script will then open and read the file. And stdin will be available for interaction with the user (through $stdin or STDIN).
Here is a minor modification of your script:
arr = CSV.parse(ARGF) # the important part.
input = ''
while true
  puts "What is your choice: "
  input = STDIN.gets.to_i
  if input.zero?
    break
  end
end
And you can call it like this:
ruby ./bin/kata items.csv
You can read more about ARGF in the documentation: https://ruby-doc.org/core-2.6/ARGF.html
This has nothing to do with Ruby. It is a feature of the shell.
A file descriptor is connected to exactly one file at any one time. The file descriptor 0 (standard input) can be connected to a file or it can be connected to the terminal. It can't be connected to both.
So, therefore, what you want is simply not possible. And it is not just not possible in Ruby, it is fundamentally impossible by the very nature of how shell redirection works.
If you want to change this, there is nothing you can do in your program or in Ruby. You need to modify how your shell works.
I run a script that returns a long JSON on stdout, but apparently it is truncated:
require 'open3'

res = ''
Open3.popen2('node script.js') {|i,o,t|
  while line = o.gets
    res = res + line
  end
}
puts res # it is truncated
Why? How can I avoid it?
You may be running into a known issue with node piping its standard output to another process. This is only an educated guess because I don't know what your node script is doing.
The issue:
If you're trying to write to stdout using fs.open+fs.write or fs.openSync+fs.writeSync, the output may be incomplete, so you may need to add extra logic to write the remainder.
Based on this Node.js GitHub comment, the recommended way to write a buffer fully to stdout is:
process.stdout.write(chunk[, encoding][, callback])
See the Stream API documentation for details on this function. The GitHub link above has more explanation about the issue with using fs.write/fs.writeSync.
I have tested process.stdout.write writing to a piped stdout; even a 16 MB string was written completely.
How can I make a process get its standard input from a socket or a dynamic stream, which I can reference in the future for writing?
The scenario is like this:
Let's say I run a background job like foo &, in bash. I want it to read input from somewhere (a file, a socket, a descriptor, or something else) that I can reference in the future, so that in case I want foo to do something, I can write to its input with something like echo "foos-instruction" >> file-where-foo-gets-its-input-from.
Assuming the program that the process is running is designed to read from stdin, you would use normal shell input redirection (<), for example:
foo < file-where-foo-gets-its-input-from &
or, more likely:
tail -f file-where-foo-gets-its-input-from | foo &
When you want foo to do something, you can do exactly what you described in your question:
echo "foos-instruction" >> file-where-foo-gets-its-input-from
...and foo will do whatever it's supposed to do with foos-instruction.
Et voila.
You are describing named pipes.
mkfifo input
foo < input &
echo foo > input
A named pipe (also called a FIFO, for First-In, First-Out) is a limited-size buffer that the OS maintains and acts much like a file. Data written to the file can be read once, and bytes are read in the same order they are written. If the buffer is empty, a read will block until enough data to finish the read is written. If the buffer is full, a write will block until a read "drains" it enough for the write to complete.
How would you create a variable that could be read from? It would read from a certain file if it exists; otherwise it would read from standard input. Something like:
input = File.open("file.txt") || in
This doesn't work, but I think this must be done pretty often, and I can't find a nice way to do it.
Does this work for you?
input = File.exist?("file.txt") ? File.open("file.txt") : STDIN
See: ...run against stdin if no arg; otherwise input file =ARGV
I think Ruby has the ability to treat arguments that aren't consumed before STDIN is first used as if they were filenames for files piped into standard input.