Bash pipeline execution - bash

The netcat manpage indicates that, in the absence of the -c and -e options, a shell can be served via nc using the following commands.
$ rm -f /tmp/f; mkfifo /tmp/f
$ cat /tmp/f | /bin/sh -i 2>&1 | nc -l 127.0.0.1 1234 > /tmp/f
Now, as I understand it, both reads and writes from fifos are blocking operations. For example, if I run
$ mkfifo foo
$ cat foo
bash will block, because nothing has been written to foo. How does the pipeline in the example from the nc manpage not block? I assume I am misunderstanding how pipelines are executed.

All the commands in the pipeline run concurrently, not sequentially. So cat /tmp/f will indeed block, but /bin/sh and nc will still be started while that happens. nc will write to the FIFO when a client connects to the port and sends a command, and this will allow cat to unblock.

The pipe character in bash does notthing esle then connecting the output stream of the first command to the input stream of the second. echo "123" | cat is essentially the same as cat < <(echo 123) (the latter does only start one subshell though while the first starts one for each command, but this can be ignored here - plus, it's a bashism and does not work in sh).
$ mkfifo foo
$ cat foo
Does indeed block - but not freeze. The moment any other program writes anything to foo, cat will display it.
WHat you are doign in your netcat call is essentially create a cicrle: anything written into the FIFO will be displayed by cat, and, as cat is connected to sh sent to the latter. sh will then execute the code (as sh just executes anything written to it's input stream) and send the output to nc. nc will sent it to the client.
ANything the client sends to nc will be written into the FIFO - and our circle is complete.
The mistake you made (I think) is to assume the second process of a pipe only reads the data once, not continuously, and therefore has to wait for the first process to end. This is not true, because every process in a pipeline is started in a shubshell, so they all run intependent of each other.
You should also be able to change the order of all commands in your pipeline. As long as the first one reads from the FIFO and the last one writes to it (to complete the circle), it should work.

Related

How to make tee in Linux provide screen output line by line, not at the end of execution? [duplicate]

Usually, stdout is line-buffered. In other words, as long as your printf argument ends with a newline, you can expect the line to be printed instantly. This does not appear to hold when using a pipe to redirect to tee.
I have a C++ program, a, that outputs strings, always \n-terminated, to stdout.
When it is run by itself (./a), everything prints correctly and at the right time, as expected. However, if I pipe it to tee (./a | tee output.txt), it doesn't print anything until it quits, which defeats the purpose of using tee.
I know that I could fix it by adding a fflush(stdout) after each printing operation in the C++ program. But is there a cleaner, easier way? Is there a command I can run, for example, that would force stdout to be line-buffered, even when using a pipe?
you can try stdbuf
$ stdbuf --output=L ./a | tee output.txt
(big) part of the man page:
-i, --input=MODE adjust standard input stream buffering
-o, --output=MODE adjust standard output stream buffering
-e, --error=MODE adjust standard error stream buffering
If MODE is 'L' the corresponding stream will be line buffered.
This option is invalid with standard input.
If MODE is '0' the corresponding stream will be unbuffered.
Otherwise MODE is a number which may be followed by one of the following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
In this case the corresponding stream will be fully buffered with the buffer
size set to MODE bytes.
keep this in mind, though:
NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does
for e.g.) then that will override corresponding settings changed by 'stdbuf'.
Also some filters (like 'dd' and 'cat' etc.) dont use streams for I/O,
and are thus unaffected by 'stdbuf' settings.
you are not running stdbuf on tee, you're running it on a, so this shouldn't affect you, unless you set the buffering of a's streams in a's source.
Also, stdbuf is not POSIX, but part of GNU-coreutils.
Try unbuffer (man page) which is part of the expect package. You may already have it on your system.
In your case you would use it like this:
unbuffer ./a | tee output.txt
The -p option is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments.
You can use setlinebuf from stdio.h.
setlinebuf(stdout);
This should change the buffering to "line buffered".
If you need more flexibility you can use setvbuf.
You may also try to execute your command in a pseudo-terminal using the script command (which should enforce line-buffered output to the pipe)!
script -q /dev/null ./a | tee output.txt # Mac OS X, FreeBSD
script -c "./a" /dev/null | tee output.txt # Linux
Be aware the script command does not propagate back the exit status of the wrapped command.
The unbuffer command from the expect package at the #Paused until further notice answer did not worked for me the way it was presented.
Instead of using:
./a | unbuffer -p tee output.txt
I had to use:
unbuffer -p ./a | tee output.txt
(-p is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments)
The expect package can be installed on:
MSYS2 with pacman -S expect
Mac OS with brew install expect
Update
I recently had buffering problems with python inside a shell script (when trying to append timestamp to its output). The fix was to pass -u flag to python this way:
run.sh with python -u script.py
unbuffer -p /bin/bash run.sh 2>&1 | tee /dev/tty | ts '[%Y-%m-%d %H:%M:%S]' >> somefile.txt
This command will put a timestamp on the output and send it to a file and stdout at the same time.
The ts program (timestamp) can be installed with the moreutils package.
Update 2
Recently, also had problems with grep buffering the output, when I used the argument grep --line-buffered on grep to it stop buffering the output.
If you use the C++ stream classes instead, every std::endl is an implicit flush. Using C-style printing, I think the method you suggested (fflush()) is the only way.
The best answer IMO is grep's --line-buffer option as stated here:
https://unix.stackexchange.com/a/53445/40003

pipe operator between two blocks

I've found an interesting bash script that with some modifications would likely solve my use case. But I'm unsure if I understand how it works, in particular the pipe between the blocks.
How do these two blocks work together, and what is the behaviour of the pipe that separates them?
function isTomcatUp {
# Use FIFO pipeline to check catalina.out for server startup notification rather than
# ping with an HTTP request. This was recommended by ForgeRock (Zoltan).
FIFO=/tmp/notifytomcatfifo
mkfifo "${FIFO}" || exit 1
{
# run tail in the background so that the shell can
# kill tail when notified that grep has exited
tail -f $CATALINA_HOME/logs/catalina.out &
# remember tail's PID
TAILPID=$!
# wait for notification that grep has exited
read foo <${FIFO}
# grep has exited, time to go
kill "${TAILPID}"
} | {
grep -m 1 "INFO: Server startup"
# notify the first pipeline stage that grep is done
echo >${FIFO}
}
# clean up
rm "${FIFO}"
}
Code Source: https://www.manthanhd.com/2016/01/15/waiting-for-tomcat-to-start-up-in-a-script/
bash has a whole set of compound commands, which work much like simple commands. Most relevant here is that each compound command has its own standard input and standard output.
{ ... } is one such compound command. Each command inside the group inherits its standard input and output from the group, so the effect is that the standard output of a group is the concatenation of its children's standard output. Likewise, each command inside reads in turn from the group's standard input. In your example, nothing interesting happens, because grep consumes all of the standard input and no other command tries to read from it. But consider this example:
$ cat tmp.txt
foo
bar
$ { read a; read b; echo "$b then $a"; } < tmp.txt
bar then foo
The first read gets a single line from standard input, and the second read gets the second. Importantly, the first read consumes a line of input before the second read could see it. Contrast this with
$ read a < tmp.txt
$ read b < tmp.txt
where a and b will both contain foo, because each read command opens tmp.txt anew and both will read the first line.
The { …; } operations groups the commands such that the I/O redirections apply to all the commands within it. The { must be separate as if it were a command name; the } must be preceded by either a semicolon or a newline and be separate too. The commands are not executed in a sub-shell, unlike ( … ) which also has some syntactic differences.
In your script, you have two such groupings connected by a pipe. Because of the pipe, each group is in a sub-shell, but it is not in a sub-shell because of the braces.
The first group runs tail -f on a file in background, and then waits for a FIFO to be closed so it can kill the tail -f. The second part looks for the first occurrence of some specific information and when it finds it, stops reading and writes to the FIFO to free everything up.
As with any pipeline, the exit status is the status of the last group — which is likely to be 0 because the echo succeeds.

how to use mpiexec in linux shell

I have a file a.txt and each line contains a parameter. Now I want to use mpiexec to call my program such as a.out to calculate with each parameter. So I use linux shell script to handle this. The code is sample
cat a.txt | while read line
do
mpiexec -v -hostfile hosts -np 16 ./a.out ${line}
done
Unexpectedly, the script end after processing only one line of file a.txt. So, it is because of the wrong use of pipe? How can I tackle with this problem?
#!/bin/bash
for LINE in `cat a.txt | xargs -r`; do
mpiexec -v -hostfile hosts -np 16 ./a.out $LINE
done
I had this issue too. Claudio's solution helped set me on the right path to understanding why the loop exits after the first iteration. First off, here is a solution which is pretty close to what you wrote:
cat a.txt | while read line; do
</dev/null mpiexec -np 16 ./a.out ${line}
done
Note that I am just using mpiexec on a local computer, (python's threading situation is bad enough to need this) so I can't test if this works with separate hosts. You can try adding that back in yourself.
The reason that your script didn't work is that mpiexec seems to gobble up whatever is attached to the standard input. I assume it does this so that in case a.out needs that input, it would gobble all the input and send it along with the command to run a.out that gets sent to the other servers. The result is that on the first iteration, read reads the first line from your file. Then mpiexec reads the rest of the lines, even though a.out probably doesn't use them in your case. Then on the second iteration, read tries to read more lines, but since mpiexec already read the rest, read is told that the end of file has been reached, so the loop exits.
Since we want to prevent mpiexec from reading the standard in, we redirect mpiexec's standard in to come from /dev/null. Since /dev/null always contains nothing, mpiexec will read nothing and leave the standard input alone.

Combining pipes and here documents in the shell for SSH

Okay, so I've recently discovered the magic of here documents for feeding stdin style lines into interactive commands. However, I'm trying to use this with SSH to execute a bunch of commands on a remote server, but I also need to pipe in some actual input, before executing the extra commands, to confound matters further I also need to get some results back ;)
Here's what I'm trying to use:
#!/bin/sh
RESULT=$(find -type f "$PATH" | gzip | ssh "$HOST" <<- 'REMOTE_SYNC'
cat > "/tmp/.temp_file"
# Do something with /tmp/.temp_file
REMOTE_SYNC
Is this actually correct? Part of the problem I'm having as well is that I need to pipe the data to that file in /tmp, but I should really be generating a randomly named temp file, but I'm not sure how I could do that, assign the name to a variable (so I can get back to it) and still send stdin into it.
I may also extract the find | gzip part to a separate command run locally first, as the gzipped file will likely be small enough that sending it when ready will result in a much shorter SSH connection then sending it as it's generated, but it still doesn't get around the fact that I need to be able to provide both stdin and my extra commands to SSH.
No, you can't do it like this. Both heredoc and the piped input compete for stdin, and only one wins. Look at this example:
echo test | cat << EOF
TEST
EOF
What will this print? test, TEST or both? It prints TEST, so the heredoc wins (at least in bash).
You don't really need this anyway. Luckily ssh takes a command argument, which will be passed on to the shell on the remote host, so you can just use your command as a string here. So something like this:
echo TEST | ssh user#host 'cat > tempfile; cat tempfile; rm tempfile'
would work (althoug it doesn't make much sense), the output of the left side commands is piped through ssh to the remote host and supplied as stdin there.
If you want the data to be compressed when sending it through ssh, you can just enable compression using the -C option.
edit:
Using linebreaks inside a string is perfectly fine, so this works fine too:
echo TEST | ssh user#host '
cat > tempfile
cat tempfile
rm tempfile
'
The only difference to a heredoc would be that you have to escape quotes.
If you use something like echo TEST | ssh user#host "$(<script.sh)" you can write everything into a file...

Keep a file open forever within a bash script

I need a way to make a process keep a certain file open forever. Here's an example of what I have so far:
sleep 1000 > myfile &
It works for a thousand seconds, but really don't want to make some complicated sleep/loop statement. This post suggested that cat is the same thing as sleep for infinite. So I tried this:
cat > myfile &
It almost looks like a mistake doesn't it? It seemed to work from the command line, but in a script the file connection did not stay open. Any other ideas?
Rather than using a background process, you can also just use bash to open one of its file descriptors:
exec 5>myfile
(The special use of exec here allows changing the current file descriptor redirections - see man bash for details). This will open file descriptor 5 to "myfile" (use >> if you don't want to empty the file).
You can later close the file again with:
exec 5>&-
(One possible downside of this is that the FD gets inherited by every program that the shell runs in the meantime. Mostly this is harmless - e.g. your greps and seds will generally ignore the extra FD - but it could be annoying in some cases, especially if you spawn any processes that stay around (because they will then keep the FD open).
Note: If you are using a newer version of bash (>4.1) you can use a slightly different syntax:
exec {fd}>myfile
This allocates a new file descriptor, and puts it in the variable fd. This can help ensure that scripts don't accidentally overwrite each other's file descriptors. To close the file later, use
exec {fd}>&-
The reason that cat>myfile& works is because it re-directs standard input into a file.
if you launch it with an ampersand (in background), it won't get ANY input, including end-of-file, which means it will forever wait and print nothing to the output file.
You can get an equivalent effect, except WITHOUT dependency on standard input (the latter is what makes it not work in your script), with this command:
tail -f /dev/null > myfile &
On the cat > myfile & issue running in terminal vs not running as part of a script: In a non-interactive shell the stdin of a backgrounded command & gets implicitly redirected from /dev/null.
So, cat > myfile & in a script actually gets translated into cat </dev/null > myfile, which terminates cat immediately.
See the POSIX standard on the Shell Command Language & Asynchronous Lists:
The standard input for an asynchronous list, before any explicit redirections are
performed, shall be considered to be assigned to a file that has the same
properties as /dev/null. If it is an interactive shell, this need not happen.
In all cases, explicit redirection of standard input shall override this activity.
# some tests
sh -c 'sleep 10 & lsof -p ${!}'
sh -c 'sleep 10 0<&0 & lsof -p ${!}'
sh -ic 'sleep 10 & lsof -p ${!}'
# in a script
- cat > myfile &
+ cat 0<&0 > myfile &
tail -f myfile
This 'follows' the file, and outputs any changes to the file. If you don't want to see the output of tail, redirect output to /dev/null or something:
tail -f myfile > /dev/null
You may want to use the --retry option, depending on your specific case. See man tail for more information.

Resources