I've found an interesting bash script that, with some modifications, would likely solve my use case. But I'm not sure I understand how it works, in particular the pipe between the two blocks.
How do these two blocks work together, and what is the behaviour of the pipe that separates them?
function isTomcatUp {
    # Use FIFO pipeline to check catalina.out for server startup notification rather than
    # ping with an HTTP request. This was recommended by ForgeRock (Zoltan).
    FIFO=/tmp/notifytomcatfifo
    mkfifo "${FIFO}" || exit 1

    {
        # run tail in the background so that the shell can
        # kill tail when notified that grep has exited
        tail -f $CATALINA_HOME/logs/catalina.out &
        # remember tail's PID
        TAILPID=$!
        # wait for notification that grep has exited
        read foo <${FIFO}
        # grep has exited, time to go
        kill "${TAILPID}"
    } | {
        grep -m 1 "INFO: Server startup"
        # notify the first pipeline stage that grep is done
        echo >${FIFO}
    }

    # clean up
    rm "${FIFO}"
}
Code Source: https://www.manthanhd.com/2016/01/15/waiting-for-tomcat-to-start-up-in-a-script/
bash has a whole set of compound commands, which work much like simple commands. Most relevant here is that each compound command has its own standard input and standard output.
{ ... } is one such compound command. Each command inside the group inherits its standard input and output from the group, so the effect is that the standard output of a group is the concatenation of its children's standard output. Likewise, each command inside reads in turn from the group's standard input. In your example, nothing interesting happens, because grep consumes all of the standard input and no other command tries to read from it. But consider this example:
$ cat tmp.txt
foo
bar
$ { read a; read b; echo "$b then $a"; } < tmp.txt
bar then foo
The first read gets a single line from standard input, and the second read gets the second. Importantly, the first read consumes a line of input before the second read could see it. Contrast this with
$ read a < tmp.txt
$ read b < tmp.txt
where a and b will both contain foo, because each read command opens tmp.txt anew and both will read the first line.
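The sharing works the same way on the output side: every command in the group writes to the group's single standard output, so a pipe attached to the group sees all of it in order. A quick check:
$ { echo one; echo two; } | wc -l    # counts 2 lines: both echos fed the same pipe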
The { …; } operation groups the commands so that any I/O redirection applies to all the commands inside it. The { must stand alone, as if it were a command name; the } must be preceded by either a semicolon or a newline and must likewise stand alone. The commands are not executed in a sub-shell, unlike with ( … ), which also has some syntactic differences.
In your script, you have two such groupings connected by a pipe. Each group runs in a sub-shell, but that is because of the pipe, not because of the braces.
The first group runs tail -f on a file in the background, and then waits for something to arrive on a FIFO so that it can kill the tail -f. The second part looks for the first occurrence of some specific text and, when it finds it, stops reading and writes to the FIFO to free everything up.
As with any pipeline, the exit status is the status of the last group — which is likely to be 0 because the echo succeeds.
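As an aside, if you ever need the status of an earlier stage (for example, to check whether the first group exited cleanly), bash keeps every stage's exit status in the PIPESTATUS array, which has to be inspected immediately after the pipeline:
true | false | true
echo "${PIPESTATUS[@]}"   # prints: 0 1 0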
Related
I have a few commands I run between curly braces which I then redirect to a named pipe, and I tail the pipe. However, it looks like the redirection only happens after the block has finished executing: I don't see any output from the tail command for a while, and when I finally do, it only shows the last command's output. Any ideas how to view the output of the block in real time?
Example Script
#!/usr/bin/env bash
mkfifo /tmp/why_you_no_out;
trap "rm /tmp/why_you_no_out" 0;
{
    for ((i=1;i<=100;i++)); do
        printf "$i";
    done
    sleep 10s;
    printf "\n12356";
} >> /tmp/why_you_no_out &
printf "here";
tail -n 1 -f /tmp/why_you_no_out
Sounds like the issue is buffering. Most shells don't want to write data a byte at a time because it's wasteful. Instead, they wait until they have a sizable chunk of data before committing it unless the output is connected to your terminal.
If you're looking to unbuffer the output of an arbitrary command, you may find the "unbuffer" utility helpful or any of the solutions mentioned in this question: How to make output of any shell command unbuffered?
If you're dealing with specific applications, they may have options to reduce buffering. For example, GNU's grep includes the --line-buffered option.
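Where a program has no such option, GNU coreutils' stdbuf can often switch a filter back to line buffering. A rough sketch (slow_producer is just a placeholder for whatever generates the output):
# stdbuf -oL asks tr to flush its stdout after every line rather than every
# few kilobytes; this works for filters that use default stdio buffering.
slow_producer | stdbuf -oL tr a-z A-Z
# grep's own switch for the same purpose:
slow_producer | grep --line-buffered "pattern"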
Let me present my findings first and put my questions at the end. (1) applies to zsh only and (2), (3) apply to both zsh and bash.
1. stdin of command substitution
ls | echo $(cat)
ls | { echo $(cat) }
The first one prints cat: -: Input/output error; while the second one produces the output of ls.
2. chained commands after pipe
ls | { head -n1; cat; }
ls | { read a; cat; }
The first command doesn't work properly: cat encounters EOF and exits immediately. But the second form works: the first line is read into a and cat gets the rest.
3. mixed stdin
ls | { python -c 'import sys; print(sys.argv)' $(head -n1); }
ls | { python -c 'import sys; print(sys.argv); print(input())' $(head -n1); }
Inside the {} in the first line, the command is to print the cmdline arguments; in the second form, the command also reads a line from stdin.
The first command runs successfully, while the second form throws because input() hits EOF.
My questions are:
(as in section 1) What is the difference between the form with {} and the form without?
(as in section 2) Is it possible for the head and cat to read the same stdin sequentially? How can the second form succeed while the first form fails?
(as in section 3) How is the stdin of the command inside a command substitution connected to the stdin of the original command (echo here)? Which one reads first? And how can stdin be kept open so that both commands (python and head) can read the same stdin sequentially?
You are not taking input buffering into account and it explains most of your observations.
head reads several kilobytes of input each time it needs data, which makes it much more efficient. So it is likely that it will read all of stdin before any other process has a chance to. That's obvious in case 2, where the execution order is perhaps clearer.
If input were coming from a regular file, head could seek back to the end of the lines it used before terminating. But since a pipe is not seekable, it cannot do that. If you use "here-strings" -- the <<< syntax, then stdin will turn out to be seekable because here-strings are implemented using a temporary file. I don't know if you can rely on that fact, though.
read does not buffer input, at least not beyond the current line (and even then, only if it has no other line end delimiter specified on the command line). It carefully only reads what it needs precisely because it is generally used in a context where its input comes from a pipe and seeking wouldn't be possible. That's extremely useful -- so much so that the fact that it works is almost invisible -- but it's also one of the reasons shell scripting can be painfully slow.
You can see this more clearly by sending enough data into the pipe to satisfy head's initial read. Try this, for example:
seq 1 10000 | { head -n1; head -n2; }
(I changed the second head to head -n2 because the first head happens to leave stdin positioned exactly at the end of a line, so that the second head sees a blank line as the first line.)
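By contrast, because read takes only a single line and leaves everything else in the pipe, a following command in the same group picks up exactly where it left off:
seq 1 5 | { read a; echo "first: $a"; cat; }
# prints "first: 1" followed by 2 3 4 5 - read took one line, cat got the rest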
The other thing you need to understand is what command substitution does, and when it does it. Command substitution reads the entire output of a command and inserts it into the command line. That happens even before the command has been identified, never mind started execution.
Consider the following little snippet:
$(printf %cc%co e h) hello, world
It should be clear from this that the command substitution is fully performed before the echo utility (or builtin) is even started.
Your first scenario triggers an oddity of zsh which is explained by Stéphane Chazelas in this answer on Unix.SE. Effectively, zsh does the command substitution before the pipeline is set up, so cat is reading from the main zsh's standard input. (Stéphane explains why this is and how it leads to an EIO error. Although I think it is dependent on the precise zsh configuration and option settings, since on my default zsh install, it just locks up my terminal. At some point I'll have to figure out why.) If you use braces, then the redirection is set up before the command substitution is performed.
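For comparison, bash does not show this particular oddity: as far as I can tell, bash forks the right-hand side of the pipe (with its stdin already attached to the pipe) before performing the expansions inside it, so cat reads the ls output in both forms:
ls | echo $(cat)        # bash: prints the ls output (joined onto one line)
ls | { echo $(cat); }   # bash: same result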
I'm trying to figure out how to perform the laziest possible processing of a standard UNIX shell pipeline. For example, let's say I have a command which does some calculations and outputting along the way, but the calculations get more and more expensive so that the first few lines of output arrive quickly but then subsequent lines get slower. If I'm only interested in the first few lines then I want to obtain those via lazy evaluation, terminating the calculations ASAP before they get too expensive.
This can be achieved with a straight-forward shell pipeline, e.g.:
./expensive | head -n 2
However this does not work optimally. Let's simulate the calculations with a script which gets exponentially slower:
#!/bin/sh
i=1
while true; do
    echo line $i
    sleep $(( i ** 4 ))
    i=$(( i+1 ))
done
Now when I pipe this script through head -n 2, I observe the following:
line 1 is output.
After sleeping one second, line 2 is output.
Despite head -n 2 having already received two (\n-terminated) lines and exiting, expensive carries on running and now waits a further 16 seconds (2 ** 4) before completing, at which point the pipeline also completes.
Obviously this is not as lazy as desired, because ideally expensive would terminate as soon as the head process has received two lines. However, this does not happen. IIUC it actually terminates only when it tries to write its third line: at that point it writes to its STDOUT, which is connected through a pipe to the STDIN of the head process, and head has already exited and is no longer reading from the pipe. This causes expensive to receive a SIGPIPE, which causes the bash interpreter running the script to invoke its SIGPIPE handler, which by default terminates the script (although this can be changed via the trap command).
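A quick sketch seems to confirm this "dies on the next write" picture: head exits as soon as it has its line, but the left-hand side only dies a few seconds later, when the next write hits the broken pipe, so the final message never appears:
{ echo one; sleep 3; echo two; echo "never reached" >&2; } | head -n 1
# prints "one" immediately; about 3 seconds later "echo two" triggers SIGPIPE
# and kills the left-hand subshell, so "never reached" is never printed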
So the question is: how can I make expensive quit immediately when head quits, not just when expensive tries to write its third line into a pipe that no longer has a listener at the other end? Since the pipeline is constructed and managed by the interactive shell I typed the ./expensive | head -n 2 command into, presumably that interactive shell is where any solution would lie, rather than in any modification of expensive or head. Is there a native trick or extra utility which can construct pipelines with the behaviour I want? Or is it simply impossible to achieve in bash or zsh, so that the only way would be to write my own pipeline manager (e.g. in Ruby or Python) which notices when the reader terminates and immediately kills the writer?
If all you care about is foreground control, you can run expensive in a process substitution; it still blocks until it next tries to write, but head exits immediately (and your script's flow control can continue) after it has received its input:
head -n 2 < <(exec ./expensive)
# expensive still runs 16 seconds in the background, but doesn't block your program
In bash 4.4 and newer, process substitutions store their PIDs in $! and allow process management in the same manner as other background processes.
# REQUIRES BASH 4.4 OR NEWER
exec {expensive_fd}< <(exec ./expensive); expensive_pid=$!
head -n 2 <&"$expensive_fd" # read the content we want
exec {expensive_fd}<&- # close the descriptor
kill "$expensive_pid" # and kill the process
Another approach is a coprocess, which has the advantage of only requiring bash 4.0:
# magic: store stdin and stdout FDs in an array named "expensive", and PID in expensive_PID
coproc expensive { exec ./expensive; }
# read two lines from input FD...
head -n 2 <&"${expensive[0]}"
# ...and kill the process.
kill "$expensive_PID"
I'll answer with a POSIX shell in mind.
What you can do is use a fifo instead of a pipe and kill the first link the moment the second finishes.
If the expensive process is a leaf process or if it takes care of killing its children, you can use a simple kill. If it's a process-spawning shell script, you should run it in a process group (doable with set -m) and kill it with a process-group kill.
Example code:
#!/bin/sh -e
expensive()
{
    i=1
    while true; do
        echo line $i
        sleep 0.$i  # sped it up a little
        echo >&2 slept
        i=$(( i+1 ))
    done
}
echo >&2 NORMAL
expensive | head -n2
#line 1
#slept
#line 2
#slept
echo >&2 SPED-UP
mkfifo pipe
exec 3<>pipe
rm pipe
set -m; expensive >&3 & set +m
<&3 head -n 2
kill -- -$!
#line 1
#slept
#line 2
If you run this, the second run should not have the second slept line, meaning the first link was killed the moment head finished, not when the first link tried to output after head was finished.
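A side note on the exec 3<>pipe line, in case it looks odd: opening the FIFO read-write means the open does not block waiting for a peer (opening a FIFO for reading only, or writing only, blocks until the other end shows up), and the immediate rm pipe just removes the name while descriptor 3 stays usable. Strictly speaking, opening a FIFO read-write is undefined by POSIX, but it behaves this way on Linux and most other systems. A throwaway demonstration (demo.fifo is just a scratch name):
mkfifo demo.fifo
exec 3<>demo.fifo   # returns immediately; no peer needed
rm demo.fifo        # removes the name only - fd 3 keeps working
echo hello >&3
read -r line <&3 && echo "got: $line"
exec 3>&-           # close it when done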
I am trying to come up with a nice and easy way of detecting when there has not been any write activity in a folder I'd like to watch.
Basically, what I'd like to have is something like this:
#!/bin/sh
# colored red text (error)
function errorMsg () {
    echo '\033[0;31m'"$1"'\033[0m'
}

# check for folder to monitor argument
if [ "$#" -ne 1 ]
then
    errorMsg "add experiment (folder) to monitor for activity!"
    exit
fi

# time out time, 3 minutes
TIMEOUT_TIME=180
LAST_CHECKED=0

function refreshTimer () {
    # when called, check seconds since epoch
    CURRENT_TIME=date +%s
    if [ CURRENT_TIME - LAST_CHECKED > TIMEOUT_TIME ]
    then
        echo "file write activity halted!" | mail -s "trouble!" "user#provider.ext"
    fi
    LAST_CHECKED=date +%s
}
# set last checked to now.
LAST_CHECKED=date +%s
# start monitoring for file changes, and update timer when new files are being written.
fswatch -r ${1} | refreshTimer
but all sorts of bash magic is required I presume, since fswatch is a background task and piping its output creates a subshell.
I would also be in need of some timer logic...
I was thinking of something like a setTimeout whose timer keeps getting reset while there IS activity, but I don't know how to write it all in one script.
Bash, Python, Ruby, anything that can be installed using homebrew on OSX is fine but the simpler the better (so I understand what is happening).
Try the following - note that it requires bash:
#!/usr/bin/env bash
# colored red text (error)
function errorMsg () {
    printf '\033[0;31m%s\033[0m\n' "$*" >&2
}

# check for folder to monitor argument
if [[ $# -ne 1 ]]
then
    errorMsg "add experiment (folder) to monitor for activity!"
    exit 2
fi

# time-out: 3 minutes
TIMEOUT_TIME=180

# Read fswatch output as long as it is
# within the timeout value; every change reported
# resets the timer.
while IFS= read -t $TIMEOUT_TIME -d '' -r file; do
    echo "changed: [$file]"
done < <(fswatch -r -0 "${1}")
# Getting here means that read timed out.
echo "file write activity halted!" | mail -s "trouble!" "user#provider.ext"
fswatch indefinitely outputs lines to stdout, which must be read line by line to take timely action on new output.
fswatch -0 terminates lines with NULs (zero bytes), which read then reads one by one by setting the line delimiter (separator) to an empty string (-d '').
< <(fswatch -r -0 "${1}") provides input via stdin < to the while loop, where read consumes the stdin input one NUL-terminated line at a time.
<(fswatch -r -0 "${1}") is a process substitution that forms an "ad-hoc file" (technically, a FIFO or named file descriptor) from the output produced by fswatch -r -0 "${1}", which watches folder ${1}'s subtree (-r) for file changes and reports each change NUL-terminated (-0).
Since the fswatch command runs indefinitely, the "ad-hoc file" will continue to provide input, although typically only intermittently, depending on filesystem activity.
Whenever the read command receives a new line within the timeout period (-t $TIMEOUT_TIME), it terminates successfully (exit code 0), causing the body of the loop to be executed and then read to be invoked again.
Thus, whenever a line has been received, the timeout period is effectively reset, because the new read invocation starts over with the timeout period.
By contrast, if the timeout period expires before another line is received, read terminates unsuccessfully - with a nonzero exit code indicating failure, which causes the while loop to terminate.
Thus, the code after the loop is only reached when the read command times out.
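The reset can be seen in isolation with a toy producer: the first two lines arrive well within a 2-second timeout and keep the loop alive; the third comes too late, so read times out even though the producer is still running. A minimal sketch:
while IFS= read -t 2 -r line; do
    echo "got: $line"
done < <(echo fast1; sleep 1; echo fast2; sleep 5; echo "too late")
echo "no input for 2 seconds - read timed out"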
As for your original code:
Note: Some of the problems discussed could have been detected with the help of shellcheck.net
echo '\033[0;31m'"$1"'\033[0m'
printf is the better choice when it comes to interpreting escape sequences, since echo's behavior varies across shells and platforms; for instance, running your script with bash will not interpret them (unless you also add the -e option).
function refreshTimer ()
The function syntax is nonstandard (not POSIX-compliant), so you shouldn't use it with a sh shebang line (that's what chepner meant in his comment). On OSX, you can get away with it, because bash acts as sh and most bashisms are still available when run as sh, but it wouldn't work on other systems. If you know you'll be running with bash anyway, it's better to use a bash shebang line.
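For reference, the portable form simply drops the function keyword:
# POSIX-compliant function definition - works under sh and bash alike:
refreshTimer() {
    # ... body ...
}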
CURRENT_TIME=date +%s
You can't assign the output from commands to a variable by simply placing the command as is on the RHS of the assignment; instead, you need a command substitution; in the case at hand: CURRENT_TIME=$(date +%s) (the older syntax with backticks - CURRENT_TIME=`date +%s` - works too, but has disadvantages).
[ CURRENT_TIME - LAST_CHECKED > TIMEOUT_TIME ]
An unquoted > inside [ ... ] is treated as a redirection, and even inside bash's [[ ... ]] it performs lexical (string) comparison; also, variable names must be $-prefixed there. For your logic you'd want an arithmetic conditional instead: (( CURRENT_TIME - LAST_CHECKED > TIMEOUT_TIME ))
As an aside, it's better not to use all-uppercase variable names in shell programming, so as to avoid conflicts with environment variables and special shell variables.
fswatch -r ${1} | refreshTimer
refreshTimer will not get called until the pipe fills up (the timing of which you won't be able to predict), because you make no attempt to read line by line.
Even if you fix that problem, the variables inside refreshTimer won't be preserved, because refreshTimer runs in a new subshell every time, due to use of a pipeline (|). In bash, this problem is frequently worked around by providing input via a process substitution (<(...)), which you can see in action in my code above.
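The difference is easy to see in a small sketch; the counter survives only in the process-substitution variant, because that loop runs in the current shell:
count=0
seq 1 3 | while read -r _; do count=$((count+1)); done
echo "$count"   # 0 - the while loop ran in a subshell of the pipeline

count=0
while read -r _; do count=$((count+1)); done < <(seq 1 3)
echo "$count"   # 3 - the while loop ran in the current shell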
I'm trying to call a script deepScript and process its output within another script shallowScript; it looks schematically like the following pieces of code:
shallowScript.sh
#!/bin/zsh
exec 1> >( tr "[a-z]" "[A-Z]" )
print "Hello - this is shallowScript"
. ./deepScript.sh
deepScript.sh
#!/bin/zsh
print "Hello - this is deepScript"
Now, when I run ./shallowScript.sh, the outcome is erratic: either it works as expected (very rarely), or it prints an empty line followed by the two expected lines (sometimes), or it prints the two lines and then hangs until I hit return and give it a newline (most of the time).
So far, I found out the following:
it is probably a race condition, as the two "print"s try to output to stdout at the same time; inserting "sleep 1" before the call to ". ./deepScript.sh" corrects the problem consistently
the problem comes from the process substitution "exec 1> >(tr ...)"; commenting it out also corrects the problem consistently
I've browsed so many forums and posts about process substitution and redirection, but could not find out how to guarantee that my script calls commands synchronously. Ideas?
zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.0)
[EDIT]
As it seems that this strategy is bound to fail or lead to horrible workaround syntax, here is another strategy that seems to work with bearable syntax: I removed all the redirects from shallowScript.sh and created a third script where the output processing happens in a function:
shallowScript.sh
#!/bin/zsh
print "Hello - this is shallowScript"
. ./deepScript.sh
thirdScript.sh
#!/bin/zsh
function _process {
    while read input; do
        echo $input | tr "[a-z]" "[A-Z]"
    done
}
. ./shallowScript.sh | _process
I suppose the problem is that you don't see the prompt after executing the script:
$ ./shallowScript.sh
$ HELLO - THIS IS SHALLOWSCRIPT
HELLO - THIS IS DEEPSCRIPT
(nothing here)
and think it hangs there waiting for a newline. Actually it does not, and the behavior is entirely expected.
Instead of the newline you can enter any shell command e.g. ls and it will be executed.
$ ./shallowScript.sh
$ HELLO - THIS IS SHALLOWSCRIPT <--- note the prompt in this line
HELLO - THIS IS DEEPSCRIPT
echo test <--- my input
test <--- its result
$
What happens here is: the first shell (the one which is running shallowScript.sh) creates a pipe, executes a dup2 call to forward its stdout (fd 1) to the write end of the created pipe and then forks a new process (tr) so that everything the parent prints to stdout is sent to the stdin of tr.
What happens next is that the main shell (the one where you typed the initial command ./shallowScript.sh) has no idea that it should delay printing the next command prompt until the tr process ends. It knows nothing about tr, so it just waits for shallowScript.sh to finish and then prints a prompt. The tr process is still running at that time, which is why its output (two lines) comes after the prompt is printed, and you think the shell is waiting for a newline. It is not; it is ready for the next command. You can see the printed prompt ($ character or whatever) before, inside, or after the output of the script, depending on how fast the tr process finishes.
You see such behavior every time your process forks and the child continues to write to its stdout when the parent is already dead.
Long story short, try this:
$ ./shallowScript.sh | cat
HELLO - THIS IS SHALLOWSCRIPT
HELLO - THIS IS DEEPSCRIPT
$
Here the shell will wait for the cat process to finish before printing the next prompt, and cat will finish only when all of its input (i.e. the output from tr) has been processed, just as you expect.
Update: found a relevant quote in zsh docs here: http://zsh.sourceforge.net/Doc/Release/Expansion.html#Process-Substitution
There is an additional problem with >(process); when this is attached to an external command, the parent shell does not wait for process to finish and hence an immediately following command cannot rely on the results being complete. The problem and solution are the same as described in the section MULTIOS in Redirection. Hence in a simplified version of the example above:
paste <(cut -f1 file1) <(cut -f3 file2) > >(process)
(note that no MULTIOS are involved), process will be run asynchronously as far as the parent shell is concerned. The workaround is:
{ paste <(cut -f1 file1) <(cut -f3 file2) } > >(process)
In your case it will give something like this:
{
print "Hello - this is shallowScript"
. ./deepScript.sh
} 1> >( tr "[a-z]" "[A-Z]" )
which of course works but looks worse than the original.