fork() moving to the start of main() - fork

I have the following code:
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("Hello\n");
    fork();
    return 0;
}
This gives output:
Hello
Which is as expected. But if I modify the code to:
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("Hello");
    fork();
    return 0;
}
Removing the \n gives this output:
HelloHello
Why is printf called twice? Isn't the child process supposed to execute the next instruction, return 0;?

The printf function call places the "Hello" characters into a buffer associated with the stdout stream. The buffer is subsequently flushed when the process exits, and that's when we see the output. You've forked before this happened, so two processes perform this buffer flushing in two separate address spaces when each of them exits. Each process has a copy of the stream, with the buffer and its "Hello" contents.
When the stdout stream is connected to an interactive device (like a TTY on Unix), it is line buffered. Line buffering means that the buffer is flushed whenever a newline character is output.
If we flush the buffer before fork (such as by printing a newline or by calling fflush(stdout)), the flushing takes place in the parent process. The buffer is empty at the time of the fork; though the child process inherits a copy of it, there is nothing left to flush in either process.
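As a minimal sketch of that fix (assuming a POSIX system, since fork is used), flushing before the fork makes the output appear exactly once:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("Hello");   /* "Hello" goes into stdout's buffer; no newline, so no flush yet */
    fflush(stdout);    /* flushed once, in the parent, leaving the buffer empty */
    fork();            /* both processes now inherit an empty stdout buffer */
    return 0;          /* nothing left to flush, so "Hello" appears exactly once */
}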
In the duplicated output case, something is in fact called twice. It's just not printf, but rather the write system call which sends the buffered characters to the output device.

It's not called twice. Remember that a console-connected stdout is line-buffered by default. Since you haven't ended your printf argument with a newline, the text doesn't go to the console yet; it only goes into the buffer. So printf("Hello"); copies "Hello" into the output buffer, fork() creates a process copy with a copy of the address space (which includes stdout's output buffer with the "Hello" string in it), and return 0; returns control back to libc, which flushes the output buffer. Since this flushing happens after fork(), it happens twice -- once in the parent and once in the child -- and so you get "HelloHello" in the final output.
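One way to see that it's the exit-time flush and not printf itself (a sketch, not part of the original code): have the child leave via _exit(), which skips the stdio cleanup, so its copy of the buffer is never written and only one "Hello" comes out.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("Hello");      /* buffered, nothing written yet */
    pid_t pid = fork();   /* the child gets a copy of the buffer too */
    if (pid == 0)
        _exit(0);         /* child bypasses the stdio flush, so its "Hello" is discarded */
    return 0;             /* parent's normal exit flushes once */
}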

Related

Perl STDOUT and STDERR out of order when teed to a logfile

I have a perl script which writes messages to STDOUT and STDERR (via print/croak statements), but I also redirect the STDOUT and STDERR to a logfile:
File::Tee::tee STDOUT, ">>", "$logFile" ;
File::Tee::tee STDERR, ">>", "$logFile" ;
Now the output logfile has messages from STDOUT and STDERR displayed out of order. The actual output on the terminal is also out of order. I have tried flushing the buffers (as recommended here: https://perl.plover.com/FAQs/Buffering.html) but it doesn't help:
select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
Does anyone know what I must do to see the output in order? (I also tried additionally flushing the filehandle corresponding to $logfile, but it's still the same.)
EDIT:
Thanks to all of you who have replied. A lot of the discussion over this ended up in comments, so I am going to list the few things which I tried based on feedback from all of you.
I already was flushing STDOUT and STDERR before I used File::Tee. As @jimtut suspected, File::Tee was indeed the culprit - removing it restored the ordering on the console. But I did want to redirect STDOUT and STDERR.
@mob suggested using IO::Tee instead, but I haven't fully understood how to make that work the way I want in my code.
@briandfoy pointed out that there isn't a reliable way to ensure 2 separate filehandles are seen in the correct order in realtime, and also suggested using a logging routine which is the only place that can write to STDOUT/STDERR. @zimd further pointed out that File::Tee uses fork, which is the heart of the issue, since 2 processes cannot guarantee any order on output.
Since File::Tee is to blame, I attempted to remove it from the code. I updated my logger function to print to STDOUT/STDERR and to additionally print to the $log filehandle. Further, to capture the warns in the log, I did the following:
sub warning_handler {
    my $msg = $_[0] ;
    print STDERR $msg ;
    print $log $msg if defined $log ;
}
$SIG{__WARN__} = \&warning_handler ;
This worked great for all of the code under my control. Everything was printing in order now, both on the console and in the logfile. However, I realized I can't use this solution, since I was also calling someone else's Perl packages for some functionality, and clearly I couldn't intercept the print/croak etc. that write to STDOUT/STDERR within the 'off the shelf' package. So right now I don't have a good solution. However, I suspect that if I can find some way to intercept STDOUT/STDERR within Perl, I might be able to get what I need.
EDIT2:
I added my own answer which is probably the closest I got to solving the problem by modifying mob's solution of using IO::Tee instead of File::Tee, but even this misses some messages (though it fixes ordering).
EDIT3:
Finally found the 'solution'
use IO::Tee ;
use Capture::Tiny qw(capture);
...
...
select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
open (my $log, ">", $logfilename) ;
*REALSTDOUT = *STDOUT ;
*REALSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*REALSTDOUT, $log);
*STDERR = IO::Tee->new(\*REALSTDERR, $log);
# Regular Perl code here which sends output to STDOUT/STDERR
...
...
# system calls / calls to .so need to be captured
&log_streams(sub { &some_func_which_calls_shared_object() ; }) ;

sub log_streams {
    my ($cr, @args) = @_;  # code reference, with its arguments
    my ($out, $err, $exit) = capture { $cr->(@args) };
    if ($out) {
        print STDOUT $out;
    }
    if ($err) {
        print STDERR $err;
    }
}
The use of IO::Tee ensures all Perl-generated output to the console also goes to the logfile, and this happens immediately, thereby updating the log and console in realtime. Since IO::Tee changes the meaning of the STDOUT/STDERR filehandles to now refer to the teed handles, it can only intercept stdio from Perl statements; it misses system calls, since those bypass Perl's STDOUT/STDERR handles. So we capture the syscall output and then use the log_streams routine to forward it to the now-aliased STDOUT/STDERR streams. This creates a delay in the system-call-generated output showing up in the log/terminal, but there is no delay for Perl-generated output - i.e. best of both worlds. Do note that the ordering of stderr and stdout generated by an invocation of the subroutine some_func_which_calls_shared_object is not preserved, since in the log_streams routine we first print to STDOUT and then to STDERR - as long as the system call is atomic and doesn't do much in terms of interleaving stdout/stderr messages, we should be ok.
Appreciate the solutions from briandfoy, mob and zimd, whose answers I combined to arrive at this solution! Never thought it would require going into this much detail for what seems a very simple problem.
With two separate file handles, there's no contract or guarantee that you'll see them in real time. Various settings and buffers affect that, which is why you see the auto flush stuff ($|). It's the same idea for files or the terminal.
Realize this is an architectural issue rather than a syntactic one. You have two things competing for the same resource. That usually ends in tears. I hesitate to suggest a solution when I don't know what the problem is, but consider having whatever is trying to write to STDOUT or STDERR write to some sort of message broker that collects all the messages and is the only thing that writes to the final (shared) destination. For example, things that want to add entries to the syslog don't write to the syslog; they send messages to the thing that writes to the syslog.
A more Perly example: in Log4perl, you don't write to the final destinations. You simply log a message, and the logger is the single thing that figures out how to handle it. When I want this sort of behavior with the module, I don't use the output facilities directly:
debug( "Some debug message" );
sub debug {
my $message = shift;
output( "DEBUG: $message" );
}
sub output { # single thing that can output message
...
}
Then do whatever you need to do in output.
But you can't always control that in other things that are also trying to output things. Perl lets you get around this by redefining what warn and friends do by putting a coderef in $SIG{__WARN__}. You can capture warning messages and do whatever you like with them (such as sending them to standard output). Beyond that is black magic that reopens STDERR onto something you can control. It's not that bad, and it's isolated in one place.
At some point, another person won't want merged output, and the intrusive solutions make it impossible to separate the streams. I'd much prefer flexibility to a hard-coded constraint. If I want just the errors, I want a way to get just the errors. There are many other sorts of workarounds, such as wrappers that collect both output streams (so, not at all intrusive) and various command redirections.
You will have two filehandles writing to $logfile. Unless File::Tee takes care to seek to the end of the filehandle before every write (which it doesn't appear to), you will get a race condition where one filehandle will overwrite the other.
A workaround would be to use the reopen option to the File::Tee::tee function -- that will close the file after each write and reopen it (at the proper end of the file) before the next write. That could hurt your performance though, depending on how often you write to those filehandles.
You might also have better luck with IO::Tee, which is a more straightforward implementation (using tied filehandles) than what File::Tee uses (a background process for each File::Tee::tee call), so you may get fewer surprises. Here is how an IO::Tee solution might look:
use IO::Tee;
$| = 1;
open my $stdout, ">&=1"; # to write to original stdout
open my $stderr, ">&=2"; # to write to original stderr
open my $fh_log, ">>", $logfile;
*STDOUT = IO::Tee->new($stdout, $fh_log);
*STDERR = IO::Tee->new($stderr, $fh_log);
...
There are no background processes, extra threads, or anything else to cause a race condition. Both STDOUT and STDERR will write to the same log filehandle from the same process.
After taking the hint from @mob's answer to use IO::Tee instead of File::Tee (since the latter uses fork, causing out-of-order STDERR vs STDOUT), I modified mob's original solution a bit and it worked (almost - read on):
use IO::Tee ;
...
...
open (my $log, ">", $logfilename) ;
*MYSTDOUT = *STDOUT ;
*MYSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*MYSTDOUT, $log);
*STDERR = IO::Tee->new(\*MYSTDERR, $log);
This resulted in the correct ordering both on the console and in the logfile (mob's original solution, using open to dup the STDOUT/STDERR, didn't work - it resulted in correct order in the logfile, but out of order on the console. Using a typeglob alias instead of dup works for some strange reason).
However, as good as this solution sounds, it missed printing, in the logfile, some messages from a package which I call (though they do get printed on the console). My original code which had File::Tee did result in these same messages from the package being shown in the logfile, so there is some voodoo going on somewhere. The specific package in question is a .so file, so I have no visibility into how exactly it prints its messages.
EDIT:
I guess that the .so file is as good as an external system command printing to stdout/stderr. Since it's not going through Perl IO, the handles pointed to by the STDOUT/STDERR typeglobs within Perl will not reflect the output of external programs called from Perl.
I guess the best solution would be to use a combination of this solution for messages coming from within the Perl code, and Capture::Tiny::capture as pointed out by @zdim for capturing and redirecting messages from system calls/calls going through the SWIG interface.
Note: the first part is done via tie-d handles; the solution in the second part uses Capture::Tiny.
A bare-bones proof-of-concept for an approach using tie-d handles.
The package that ties a handle, so that prints to it go both to a file and to (a copy of) the STDOUT stream:
package DupePrints;
use warnings;
use strict;
use feature 'say';

my $log = 't_tee_log.out';
open my $fh_out, '>', $log or die $!;  # for logging

# An independent copy of STDOUT (via dup2), for prints to terminal
open my $stdout, '>&', STDOUT or die $!;

sub TIEHANDLE { bless {} }

sub PRINT {
    my $self = shift;
    print $fh_out @_;
    print $stdout @_;
}

1;
A program that uses it
use warnings;
use strict;
use feature 'say';
use DupePrints;
$| = 1;
tie *STDERR, 'DupePrints';
tie *STDOUT, 'DupePrints';
say "hi";
warn "\t==> ohno";
my $y;
my $x = $y + 7;
say "done";
This prints to both the terminal and to t_tee_log.out the text
hi
==> ohno at main_DupePrints.pl line 14.
Use of uninitialized value $y in addition (+) at main_DupePrints.pl line 17.
done
See perltie and Tie::Handle, and this post with related examples, and perhaps this post
The logging to a file of the STDOUT and STDERR streams (along with a copied printout) works across other modules that may be used in the main program as well.
To also have "clean" prints that don't get logged, copy the STDOUT handle in the main program, like it's done in the module, and print to that. If you need to use this in a more selective and sophisticated manner, please modify as needed -- as it stands it is meant to be only a basic demo.
With the clarification in the question's edit, here is a different approach: wrap a call to Capture::Tiny, which captures all output from any code, and then manage the captured prints as needed
use warnings;
use strict;
use feature qw(say state);
use Capture::Tiny qw(capture);
sub log_streams {
    my ($cr, @args) = @_;  # code reference, with its arguments

    # Initialize "state" variable, so it runs once and stays open over calls
    state $fh_log = do {
        open my $fh, '>', 'tee_log.txt' or die $!;
        $fh;
    };

    my ($out, $err, $exit) = capture { $cr->(@args) };

    if ($out) {
        print $fh_log $out;
        print $out;
    }
    if ($err) {
        print $fh_log $err;
        print $err;
    }
}
log_streams( sub { say "hi" } );
log_streams( sub { warn "==> ohno" } );
log_streams( sub { my $y; my $x = $y + 7; } );
log_streams( sub { system('perl', '-wE', q(say "external perl one-liner")) } );
log_streams( sub { say "done" } );
The downside of all this is that everything needs to run via that sub. But then again, that's actually a good thing, even if sometimes inconvenient.
The state feature is used to "initialize" the filehandle because a variable declared as state is never re-initialized; so the file is opened only once, on the first call, and stays opened.
This is also a demo in need of completion.

Input stdin of c program using bash

I made a c program which takes two standard inputs automatically and one manually.
#include <stdio.h>

int main()
{
    int i, j;
    scanf("%d %d", &i, &j);
    printf("Automatically entered %d %d\n", i, j);

    int k;
    scanf("%d", &k);
    printf("Manually entered %d", k);
    return 0;
}
I want to run this program using bash script which can input first two inputs automatically and leaves one input that is to be entered manually. This is the script I am using.
#!/bin/bash
./test <<EOF
1
2
EOF
The problem is EOF is passed as third input instead of asking for manual input. I cannot change my c program and I cannot input third input before the two inputs, so how can I do this using bash? I am new to bash scripting, please help.
I made a c program which takes two standard inputs automatically and one manually.
No, you didn't. You made a program that attempts to read three whitespace-delimited decimal integers from the standard input stream. The program cannot distinguish between different origins of those integers.
The problem is EOF is passed as third input instead of asking for manual input.
No, the problem is that you are redirecting the program's standard input to be a shell here document. The here document provides the whole standard input, similar to if your program were reading a file with the here document's contents. When it reaches the end, it does not fall back to reading anything else.
I cannot change my c program and I cannot input third input before the two inputs
I take those two statements to be redundant: you cannot alter the program so that the input you characterize as "manual" is the first one it attempts to read. Not that that would help, anyway.
What you need to do is prepend the fixed input to the terminal input in the test program's standard input stream. There are many ways to do that, but the cat command (mnemonic for concatenate) seems like a natural choice. That would work together with process substitution to achieve your objective. For example,
#!/bin/bash
cat <(echo 1 2) - | ./test
The <(echo 1 2) part executes echo 1 2 and provides its standard output as if it were a file. The cat command concatenates that with its own standard input (represented by -), emitting its result to its standard output. The result is piped into program ./test.
This provides a means to prepend fixed input under your control to arbitrary data read from the standard input. That is, the wrapper script doesn't need to know what input the program expects after the fixed initial part.
Your problem is not caused by EOF being passed as the third input, but actually by stdin for your command being closed before the third call to scanf.
One way to solve this is to read the value inside the script and then pass all three of them.
Something like this:
#!/bin/bash
read value
printf '1 2 %s' "$value" | ./test

Does ruby IO.gets read from a buffer?

Can someone explain to me how pipe.gets works in this instance? Is the IO object (pipe in this example) buffering the output? How is it that I can take my time to read the output from stdout using "gets"? I even put a sleep in there before reading through gets to make sure I'm not crazy.
def run_script(commands)
  raw_output = nil
  IO.popen("./db", "r+") do |pipe|
    commands.each do |command|
      pipe.puts command
    end
    pipe.close_write
    # Read entire output
    raw_output = pipe.gets(nil)
  end
  raw_output.split("\n")
end
The pipe.sync option defaults to false for popen. This means that the commands are buffered by Ruby until you call pipe.close_write, pipe.close, exit the program, or the buffer is full. The buffer is then flushed, and all buffered data is written to the other program.
For more info check out the IO#sync documentation and/or What does “file.sync = true” do?
Based on your comment I'm not sure I understand the question. The only reason I can think of why pipe.gets(nil) would return nil is if the other program has no output.
The other option is that pipe.gets(nil) will block. This could happen if pipe is never flushed, meaning the other program is still waiting for the input. Because the other program now blocks and hasn't closed its standard output, pipe.gets(nil) will also block. pipe.gets(nil) will read everything from the pipe. How does it know everything is read? It knows because a closed connection means no more data can be sent, so it waits for the connection to close.
                                Ruby                               ./db
                                stdout    ------->   stdin
                                stdin     <-------   stdout

                                                                   (blocked because empty)
pipe.puts("a")                  "a\n"     ------->   empty         <read from stdin>
                                empty     <-------   empty
                                                                   (still blocked)
pipe.puts("b")                  "a\nb\n"  ------->   empty         <read from stdin>
                                empty     <-------   empty
(flushing stdout)                                                  (continues consuming "a\nb\n")
pipe.close_write                empty     ---/--->   "a\nb\n"      <read from stdin>
                                empty     <-------   empty
(blocks because empty)
pipe.gets(nil)                  empty     ---/--->   empty         <send data to stdout>
                                empty     <-------   "data\n"
(continues consuming "data\n")                                     (flushing stdout)
pipe.gets(nil)                  empty     ---/--->   empty         <close stdout>
                                "data\n"  <--/----   empty
I hope the above helps demonstrate the process of your current code.

Set timeout for command execution and output results to file

I have a basic script.sh that runs some commands inside. The script is like:
(script.sh)
......
`gcc -o program program.c`
if [ $? -eq 0 ]; then
echo "Compiled successfully....\n" >> out.txt
#set a timeout for ./program execution and append results to file
(gtimeout 10s ./program) 2> out.txt # <-- NOT WORKING
......
I run this script through the terminal like:
#Go to this directory,pass all folders to compile&execute the program.c file
czar#MBP~$ for D in /Users/czar/Desktop/1/*; do sh script.sh $D; done
EDIT: The output I get in the terminal (not so important, though):
# program.c from 1st folder inside the above path
Cycle l=1: 46 46
Cycle l=1: 48 48
Cycle l=2: 250 274 250
Cycle l=1: 896 896
.........
# program.c from 2nd folder inside the above path
Cycle l=1: 46 46
Cycle l=1: 48 48
Cycle l=2: 250 274 250
Cycle l=1: 896 896
.........
The GOAL is to have those go into out.txt.
The output I get is almost what I want: it executes whatever is possible in those 10 seconds, but it doesn't redirect the result to out.txt; it just prints to the terminal.
I have tried every suggestion proposed here but no luck.
Any other ideas appreciated.
EDIT 2: SOLUTION given in the comments.
The basic approach is much simpler than the command you copied from the answer to a completely different question. What you need to do is simply redirect standard output to your file:
# Use gtimeout on systems which rename standard Gnu utilities
timeout 10s ./program >> out.txt
However, that will probably not produce all of the output generated by the program if the program is killed by gtimeout, because the output is still sitting in a buffer inside the standard library. (There is nothing special about this buffer; it's just a block of memory malloc'd by the library functions the first time data is written to the stream.) When the program is terminated, its memory is returned to the operating system; nothing will even try to ensure that standard library buffers are flushed to their respective streams.
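To make that concrete, a hypothetical stand-in for program.c (not the asker's actual source) might look like the sketch below. Run as timeout 10s ./program >> out.txt, it is killed while every line is still sitting in the block buffer, so out.txt stays empty; with stdbuf -oL or an explicit setvbuf/fflush (both shown further down), each line would land in the file as it is produced.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    for (int l = 1; ; l++) {
        /* Each line ends in '\n', but when stdout is redirected to a file
           it is block buffered, so nothing reaches out.txt yet. */
        printf("Cycle l=%d: %d %d\n", l, 46 * l, 46 * l);
        sleep(1);   /* killed by timeout long before the buffer fills */
    }
}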
There are three buffering modes:
Block buffered: no output is produced until the stream's buffer is full. (Usually, the stream's buffer will be around 8kb, but it varies from system to system.)
Line buffered: output is produced when a newline character is sent to the stream. It's also produced if the buffer fills up, but it's rare for a single line to be long enough to fill a buffer.
Unbuffered: No buffering is performed at all. Every character is immediately sent to the output.
Normally, standard output is block buffered unless it is directed to a terminal, in which case it will be line buffered. (That's not guaranteed; the various standards allow quite a lot of latitude.) Line buffering is probably what you want, unless you're in the habit of writing programs which write partial lines. (The oddly-common idiom of putting a newline at the beginning of each output line rather than at the end is a really bad idea, precisely because it defeats line-buffering.) Unbuffered output is another possibility, but it's really slow if the program produces a substantial amount of output.
You can change the buffering mode before you write any data to the stream by calling setvbuf:
/* Line buffer stdout */
setvbuf(stdout, NULL, _IOLBF, 0);
(See man setvbuf for more options.)
You can also tell the library to immediately send any buffered data by calling fflush:
fflush(stdout);
That's an effective technique if you don't want the (slight) overhead of line buffering, but you know when it is important to send data (typically, because the program is about to do some very long computation, or wait for some external event).
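For instance (a sketch, not the asker's program), a partial progress line followed by a long wait is exactly the case where an explicit fflush matters:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("working on step 1...");   /* partial line, so line buffering won't flush it */
    fflush(stdout);                   /* push it out before the long wait */
    sleep(30);                        /* stand-in for a long computation or external event */
    printf(" done\n");
    return 0;
}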
If you can't modify the source code, you can use the Gnu utility stdbuf to change the buffering mode before starting the program. stdbuf will not work with all programs -- for example, it won't have any effect if the program does call setvbuf -- but it is usually effective. For example, to line buffer stdout, you could do this:
timeout 10s stdbuf -oL ./program >> out.txt
# Or: gtimeout 10s gstdbuf -oL ./program >> out.txt
See man stdbuf for more information.

Why can't open4 read from stdout when the program is waiting for stdin?

I am using the open4 gem and having problems reading from the spawned process's stdout. I have a ruby program, test1.rb:
print 'hi.' # 3 characters
$stdin.read(1) # block
And another ruby program in the same directory, test2.rb:
require 'open4'
pid, stdin, stdout, stderr = Open4.popen4 'ruby test1.rb'
p stdout.read(2) # 2 characters
When I run the second program:
$ ruby test2.rb
It just sits there forever without printing anything. Why does this happen, and what can I do to stop it?
I needed to change test1.rb to this. I don't know why.
print 'hi.' # 3 characters
$stdout.flush
$stdin.read(1) # block
By default, everything that you print to stdout or to another file is written into a buffer of Ruby (or the standard C library, which is underneath Ruby). The content of the buffer is forwarded to the OS if one of the following events occurs:
The buffer gets full.
You close stdout.
You have printed a newline sequence ("\n").
You call flush explicitly.
For other files, a flush is done on other occasions, too, like ftell.
If you put stdout in unbuffered mode ($stdout.sync = true), the buffer will not be used.
stderr is unbuffered by default.
The reason for doing buffering is efficiency: aggregating output data in a buffer can save many system calls (calls to the operating system). System calls are very expensive: they take many hundreds or even thousands of CPU cycles. Avoiding them with a little bit of code and some buffers in user space results in a good speedup.
A good reading on buffering: Why does printf not flush after the call unless a newline is in the format string?
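The same default behavior can be seen directly from C stdio, which, as noted above, sits underneath Ruby. This small sketch flushes at the newline when stdout is a terminal (line buffered), but keeps the line queued in the block buffer when stdout is a pipe (e.g. ./demo | cat) until the explicit fflush:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("hi.\n");   /* terminal: line buffered, flushed at the newline      */
                       /* pipe:     block buffered, still queued in the buffer */
    sleep(5);          /* a reader on the other end of a pipe sees nothing yet */
    fflush(stdout);    /* now the data reaches the reader in either case       */
    sleep(5);
    return 0;
}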
I'm not an expert on processes.
From my first look at the API documentation, the sequence for using open4 is like this:
first send text to stdin, then close stdin, and lastly read text from stdout.
So you can change test2.rb like this:
require 'open4'
pid, stdin, stdout, stderr = Open4.popen4 'ruby test1.rb'
stdin.puts "something" # This line is important
stdin.close # It might be optional, open4 might close itself.
p stdout.read(2) # 2 characters
