Stdout race condition between script and subscript - shell

I'm trying to call a script deepScript and process its output within another script shallowScript; schematically, the two look like this:
shallowScript.sh
#!/bin/zsh
exec 1> >( tr "[a-z]" "[A-Z]" )
print "Hello - this is shallowScript"
. ./deepScript.sh
deepScript.sh
#!/bin/zsh
print "Hello - this is deepScript"
Now, when I run ./shallowScript.sh, the outcome is erratic: either it works as expected (very rarely), or it prints an empty line followed by the two expected lines (sometimes), or it prints the two lines and then hangs until I hit return and give it a newline (most of the time).
So far, I found out the following:
it is probably a race condition, as the two "print"s try to output to stdout at the same time; inserting "sleep 1" before the call to ". ./deepScript.sh" corrects the problem consistently
the problem comes from the process substitution "exec 1> >(tr ...)"; commenting it out also corrects the problem consistently
I've browsed so many forums and posts about process substitution and redirection, but could not find out how to guarantee that my script calls commands synchronously. Ideas?
zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.0)
[EDIT]
As it seems that this strategy is bound to fail or to lead to horrible workaround syntax, here is another strategy that seems to work with a bearable syntax: I removed all the redirects from shallowScript.sh and created a third script where the output processing happens in a function:
shallowScript.sh
#!/bin/zsh
print "Hello - this is shallowScript"
. ./deepScript.sh
thirdScript.sh
#!/bin/zsh
function _process {
    while read input; do
        echo "$input" | tr "[a-z]" "[A-Z]"
    done
}
. ./shallowScript.sh | _process

I suppose the problem is that you don't see the prompt after executing the script:
$ ./shallowScript.sh
$ HELLO - THIS IS SHALLOWSCRIPT
HELLO - THIS IS DEEPSCRIPT
(nothing here)
and think it hangs there waiting for a newline. Actually it does not, and the behavior is entirely expected.
Instead of a newline you can enter any shell command, e.g. ls, and it will be executed.
$ ./shallowScript.sh
$ HELLO - THIS IS SHALLOWSCRIPT <--- note the prompt in this line
HELLO - THIS IS DEEPSCRIPT
echo test <--- my input
test <--- its result
$
What happens here is: the first shell (the one which is running shallowScript.sh) creates a pipe, executes a dup2 call to forward its stdout (fd 1) to the write end of the created pipe and then forks a new process (tr) so that everything the parent prints to stdout is sent to the stdin of tr.
What happens next is that the main shell (the one where you type the initial command ./shallowScript.sh) has no idea that it should delay printing the next command prompt until the tr process ends. It knows nothing about tr, so it just waits for shallowScript.sh to finish, then prints a prompt. tr is still running at that time, which is why its output (two lines) comes after the prompt is printed, and you think the shell is waiting for a newline. Actually it is not; it is ready for the next command. You can see the printed prompt ($ character or whatever) somewhere before, inside, or after the output of the script, depending on how fast the tr process finishes.
You see this behavior whenever a process forks and the child keeps writing to its stdout after the parent has already died.
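You can reproduce the effect in a single line, without the two scripts. A minimal sketch (the sleep is only there to widen the window; where the prompt lands will vary from run to run):
$ zsh -c 'exec 1> >( sleep 0.2; tr "[a-z]" "[A-Z]" ); print hi'
$ HI          <--- tr's output arrives after the prompt is already back
The outer shell prints its prompt as soon as zsh -c exits, while the substituted tr process is still alive.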
Long story short, try this:
$ ./shallowScript.sh | cat
HELLO - THIS IS SHALLOWSCRIPT
HELLO - THIS IS DEEPSCRIPT
$
Here the shell will wait for the cat process to finish before printing the next prompt, and cat will finish only when all its input (i.e. the output from tr) has been processed, just as you expect.
Update: found a relevant quote in zsh docs here: http://zsh.sourceforge.net/Doc/Release/Expansion.html#Process-Substitution
There is an additional problem with >(process); when this is attached to an external command, the parent shell does not wait for process to finish and hence an immediately following command cannot rely on the results being complete. The problem and solution are the same as described in the section MULTIOS in Redirection. Hence in a simplified version of the example above:
paste <(cut -f1 file1) <(cut -f3 file2) > >(process)
(note that no MULTIOS are involved), process will be run asynchronously as far as the parent shell is concerned. The workaround is:
{ paste <(cut -f1 file1) <(cut -f3 file2) } > >(process)
In your case it will give something like this:
{
    print "Hello - this is shallowScript"
    . ./deepScript.sh
} 1> >( tr "[a-z]" "[A-Z]" )
which of course works but looks worse than the original.

Related

pipe operator between two blocks

I've found an interesting bash script that with some modifications would likely solve my use case. But I'm unsure if I understand how it works, in particular the pipe between the blocks.
How do these two blocks work together, and what is the behaviour of the pipe that separates them?
function isTomcatUp {
    # Use FIFO pipeline to check catalina.out for server startup notification rather than
    # ping with an HTTP request. This was recommended by ForgeRock (Zoltan).
    FIFO=/tmp/notifytomcatfifo
    mkfifo "${FIFO}" || exit 1
    {
        # run tail in the background so that the shell can
        # kill tail when notified that grep has exited
        tail -f $CATALINA_HOME/logs/catalina.out &
        # remember tail's PID
        TAILPID=$!
        # wait for notification that grep has exited
        read foo <${FIFO}
        # grep has exited, time to go
        kill "${TAILPID}"
    } | {
        grep -m 1 "INFO: Server startup"
        # notify the first pipeline stage that grep is done
        echo >${FIFO}
    }
    # clean up
    rm "${FIFO}"
}
Code Source: https://www.manthanhd.com/2016/01/15/waiting-for-tomcat-to-start-up-in-a-script/
bash has a whole set of compound commands, which work much like simple commands. Most relevant here is that each compound command has its own standard input and standard output.
{ ... } is one such compound command. Each command inside the group inherits its standard input and output from the group, so the effect is that the standard output of a group is the concatenation of its children's standard output. Likewise, each command inside reads in turn from the group's standard input. In your example, nothing interesting happens, because grep consumes all of the standard input and no other command tries to read from it. But consider this example:
$ cat tmp.txt
foo
bar
$ { read a; read b; echo "$b then $a"; } < tmp.txt
bar then foo
The first read gets a single line from standard input, and the second read gets the next one. Importantly, the first read consumes a line of input before the second read can see it. Contrast this with
$ read a < tmp.txt
$ read b < tmp.txt
where a and b will both contain foo, because each read command opens tmp.txt anew and both will read the first line.
The { …; } operation groups the commands so that the I/O redirections apply to all the commands within it. The { must stand alone as if it were a command name; the } must be preceded by either a semicolon or a newline and stand alone too. The commands are not executed in a sub-shell, unlike with ( … ), which also has some syntactic differences.
In your script, you have two such groups connected by a pipe. Each group runs in a sub-shell because of the pipe, not because of the braces.
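A quick way to see which construct creates the sub-shell is a variable assignment. A sketch, assuming bash (zsh would print 2 in the second case too, because it runs the last pipeline stage in the current shell):
$ x=1; { x=2; }; echo "$x"          # 2: the group ran in the current shell
$ x=1; true | { x=2; }; echo "$x"   # 1: the pipe put the group in a sub-shell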
The first group runs tail -f on a file in background, and then waits for a FIFO to be closed so it can kill the tail -f. The second part looks for the first occurrence of some specific information and when it finds it, stops reading and writes to the FIFO to free everything up.
As with any pipeline, the exit status is the status of the last group — which is likely to be 0 because the echo succeeds.
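For example, in bash, where the PIPESTATUS array records the status of every stage:
$ false | true; echo $?
0
$ false | true; echo "${PIPESTATUS[@]}"
1 0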

The stdin of chained commands and commands in command substitution

Let me present my findings first and put my questions at the end. (1) applies to zsh only and (2), (3) apply to both zsh and bash.
1. stdin of command substitution
ls | echo $(cat)
ls | { echo $(cat) }
The first one prints cat: -: Input/output error, while the second one produces the output of ls.
2. chained commands after pipe
ls | { head -n1; cat; }
ls | { read a; cat; }
The first command doesn't work properly: cat encounters EOF and exits immediately. But the second form works: the first line is read into a and cat gets the rest of them.
3. mixed stdin
ls | { python -c 'import sys; print(sys.argv)' $(head -n1); }
ls | { python -c 'import sys; print(sys.argv); print(input())' $(head -n1); }
Inside the {} in the first line, the command just prints its command-line arguments; in the second form, it also reads a line from stdin.
The first command runs successfully, while the second throws because input() hits EOF.
My questions are:
(as in section 1) What is the difference between the form with {} and the one without?
(as in section 2) Is it possible for head and cat to read the same stdin sequentially? Why does the second form succeed while the first fails?
(as in section 3) How is the stdin of the command inside a command substitution connected to the stdin of the original command (echo here)? Which one reads first? And how can stdin be kept open so that both commands (python and head) can read the same stdin sequentially?
You are not taking input buffering into account and it explains most of your observations.
head reads several kilobytes of input each time it needs data, which makes it much more efficient. So it is likely that it will read all of stdin before any other process has a chance to. That's obvious in case 2, where the execution order is perhaps clearer.
If input were coming from a regular file, head could seek back to the end of the lines it used before terminating. But since a pipe is not seekable, it cannot do that. If you use "here-strings" -- the <<< syntax -- then stdin will turn out to be seekable, because here-strings are implemented using a temporary file. I don't know if you can rely on that fact, though.
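For example, in bash (a sketch that relies on the here-string being backed by a seekable temporary file):
$ { head -n1; cat; } <<< $'one\ntwo\nthree'
one
two
three
Because stdin is seekable here, head can reposition the offset to the end of the first line before exiting, so cat still sees the remaining lines.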
read does not buffer input, at least not beyond the current line (and even then, only if it has no other line end delimiter specified on the command line). It carefully only reads what it needs precisely because it is generally used in a context where its input comes from a pipe and seeking wouldn't be possible. That's extremely useful -- so much so that the fact that it works is almost invisible -- but it's also one of the reasons shell scripting can be painfully slow.
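That careful behavior is easy to demonstrate with a small sketch:
$ seq 1 5 | { read a; cat; }
2
3
4
5
read consumed only the first line, so cat received everything after it.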
You can see this more clearly by sending enough data into the pipe to satisfy head's initial read. Try this, for example:
seq 1 10000 | { head -n1; head -n2; }
(I changed the second head to head -n2 because the first head happens to leave stdin positioned exactly at the end of a line, so that the second head sees a blank line as the first line.)
The other thing you need to understand is what command substitution does, and when it does it. Command substitution reads the entire output of a command and inserts it into the command line. That happens even before the command has been identified, never mind started execution.
Consider the following little snippet:
$(printf %cc%co e h) hello, world
Here printf %cc%co e h prints the string echo, which only then becomes the command name: the command substitution is clearly fully performed before the echo utility (or builtin) is started.
Your first scenario triggers an oddity of zsh which is explained by Stéphane Chazelas in this answer on Unix.SE. Effectively, zsh does the command substitution before the pipeline is set up, so cat is reading from the main zsh's standard input. (Stéphane explains why this is and how it leads to an EIO error. Although I think it is dependent on the precise zsh configuration and option settings, since on my default zsh install, it just locks up my terminal. At some point I'll have to figure out why.) If you use braces, then the redirection is set up before the command substitution is performed.

Kill a child-script when it emits a certain string

I have the following setup:
#!/bin/bash
# call.sh
voip_binary $1
Now when a certain event happens (when I hang up), voip_binary emits a certain string: ... [DISCONNCTD] ...
However, the script does not stop, but continues running. Now as soon as DISCONNCTD is discovered in the output, I want to kill the script.
So I could do the following, to get the relevant output:
voip_binary $1 | grep DISCONNCTD
Even though I now only get the relevant output from the binary, I still don't know how to kill it as soon as such a line is emitted.
You can try this:
voip_binary $1 | grep 'DISCONNCTD' | while read line; do
    pkill 'voip_binary'
done
Assuming voip_binary outputs your string without buffering.
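A loop-free sketch of the same idea, with the same buffering caveat: grep -q exits on the first match, and pkill then runs immediately, while the pipeline is still in progress:
voip_binary "$1" | { grep -q 'DISCONNCTD'; pkill voip_binary; }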

Open a shell in the second process of a pipe

I'm having problems understanding what's going on in the following situation. I'm not very familiar with UNIX pipes, or UNIX in general, but I have read the documentation and still can't understand this behaviour.
./shellcode is an executable that successfully opens a shell:
seclab$ ./shellcode
$ exit
seclab$
Now imagine that I need to pass data to ./shellcode via stdin, because it reads some string from the console and then prints "hello " plus that string. I do it in the following way (using a pipe), and the read and write work:
seclab$ printf "world" | ./shellcode
seclab$ hello world
seclab$
However, a new shell is not opened (or at least I can't see it and interact with it), and if I run exit I'm out of the system, so I'm not in a new shell.
Can someone give some advice on how to solve this? I need to use printf because I need to input binary data to the second process and I can do it like this: printf "\x01\x02..."
When you use a pipe, you are telling Unix that the output of the command before the pipe should be used as the input to the command after the pipe. This replaces the default output (screen) and default input (keyboard). Your shellcode command doesn't really know or care where its input is coming from. It just reads the input until it reaches the EOF (end of file).
Try running shellcode and pressing Control-D. That will also exit the shell, because Control-D sends an EOF (your shell might be configured to say "type exit to quit", but it's still responding to the EOF).
There are two solutions you can use:
Solution 1:
Have shellcode accept command-line arguments:
#!/bin/sh
echo "Arguments: $*"
exec sh
Running:
outer$ ./shellcode foo
Arguments: foo
$ echo "inner shell"
inner shell
$ exit
outer$
To feed the argument in from another program, instead of using a pipe, you could:
$ ./shellcode `echo "something"`
This is probably the best approach, unless you need to pass in multi-line data. In that case, you may want to pass in a filename on the command line and read it that way.
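A sketch of that variant, assuming the data sits in a file whose name is passed as the first argument:
#!/bin/sh
# read multi-line data from the file named on the command line
while read input; do
    echo "Input: $input"
done < "$1"
exec sh
Since only the loop's stdin is redirected to the file, exec sh still inherits the terminal, so the inner shell stays interactive.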
Solution 2:
Have shellcode explicitly redirect its input from the terminal after it's processed your piped input:
#!/bin/sh
while read input; do
    echo "Input: $input"
done
exec sh </dev/tty
Running:
outer$ echo "something" | ./shellcode
Input: something
$ echo "inner shell"
inner shell
$ exit
outer$
If you see an error like this after exiting the inner shell:
sh: 1: Cannot set tty process group (No such process)
Then try changing the last line to:
exec bash -i </dev/tty

Why does this echo construct not work?

When I do this:
$ /bin/echo 123 | /bin/echo
I get no output. Why is that?
You ask why it doesn't work. In fact, it does work; it does exactly what you told it to do. Apparently that's not what you expected. I think you expected it to print 123, but you didn't actually say so.
(Note: "stdin" is standard input; "stdout" is standard output.)
/bin/echo 123 | /bin/echo
Here's what happens. The echo command is executed with the argument 123. It writes "123", followed by a newline, to its stdout.
stdout is redirected, via the pipe (|), to the stdin of the second echo command. Since the echo command ignores its stdin, the output of the first echo is quietly discarded. Since you didn't give the second echo command any arguments, it doesn't print anything. (Actually, /bin/echo with no arguments typically prints a single blank line; did you see that?)
Normally pipes (|) are used with filters, programs that read from stdin and write to stdout. cat is probably the simplest filter; it just reads its input and writes it, unchanged, to its output (which means that some-command | cat can be written as just some-command).
An example of a non-trivial filter is rev, which copies stdin to stdout while reversing the characters in each line.
echo 123 | rev
prints
321
rev is a filter; echo is not. echo does print to stdout, so it makes sense to have it on the left side of a pipe, but it doesn't read from stdin, so it doesn't make sense to use it on the right side of a pipe.
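If you really do want the left side's output to become arguments to echo, the usual bridge is xargs, which reads stdin and turns it into command-line arguments:
$ echo 123 | xargs echo
123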
"echo" reads from command line, not standard input. So pipeline is not working here.
From bash manual:
echo [-neE] [arg ...]
Output the args, separated by spaces, terminated with a newline.
So your first echo did print "123" to standard output; however, the second echo didn't make use of it, so "123" was dropped. Then an empty line is printed, as if you had run echo on its own.
You can use cat as Keith Thompson suggested:
echo 123 | cat
A pipe connects the output of one program to the input of the next; note that it is not the same as a file redirection such as /bin/echo 123 < /bin/echo, which merely opens the file /bin/echo as stdin.
From http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-4.html:
Pipes let you use the output of a program as the input of another one
