When does a pipe (|) or a redirection (<, >) take precedence in a command?
Below is my understanding, but I'd like confirmation that this is how it works.
Example 1:
sort < names | head
My guess: the pipe runs first (names | head), and then sort sorts whatever is returned from names | head.
Example 2:
ls | sort > out.txt
This one seems straightforward from testing: ls | sort runs first, then the result is redirected to out.txt.
Example 3:
Fill in the blank: can you have both a < and a > together with a | in one command?
In terms of syntactic grouping, > and < have higher precedence; that is, these two commands are equivalent:
sort < names | head
( sort < names ) | head
as are these two:
ls | sort > out.txt
ls | ( sort > out.txt )
But in terms of sequential ordering, | is performed first; so, this command:
cat in.txt > out1.txt | cat > out2.txt
will populate out1.txt, not out2.txt, because the > out1.txt is performed after the |, and therefore supersedes it (so no output is piped out to cat > out2.txt).
Similarly, this command:
cat < in1.txt | cat < in2.txt
will print in2.txt, not in1.txt, because the < in2.txt is performed after the |, and therefore supersedes it (so no input is piped in from cat < in1.txt).
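A quick way to verify both behaviours yourself (the file names here are just placeholders):

```shell
printf 'one\n' > in1.txt
printf 'two\n' > in2.txt

# The < in2.txt is applied after the pipe, so it wins:
cat < in1.txt | cat < in2.txt    # prints: two

# The > out1.txt is applied after the pipe, so nothing reaches the pipe:
cat in1.txt > out1.txt | cat > out2.txt
cat out1.txt                     # prints: one
# out2.txt exists but is empty
```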
From man bash (as are the other quotes):
SHELL GRAMMAR
Simple Commands
A simple command is a sequence of optional variable assignments followed by
blank-separated words and redirections, and terminated by a control
operator. The first word specifies the command to be executed, and is
passed as argument zero. The remaining words are passed as arguments
to the invoked command.
The return value of a simple command is its exit status, or 128+n if
the command is terminated by signal n.
Pipelines
A pipeline is a sequence of one or more commands separated by one of
the control operators | or |&. The format for a pipeline is:
[time [-p]] [ ! ] command1 [ [| or |&] command2 ... ]
In other words, you can have any number of redirections for a (simple) command; you can also use that as part of a pipeline. Or, put another way, redirection binds more tightly than pipe.
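So, to answer Example 3 directly: yes, you can combine <, >, and | in one pipeline, as long as each redirection is attached to one of the simple commands. A minimal sketch (file names are placeholders):

```shell
printf 'banana\napple\ncherry\n' > names

# Input redirection on the first command, output redirection on the last:
sort < names | head -n 2 > out.txt

cat out.txt    # prints: apple, then banana
```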
There are a couple of ways to work around this (although they're rarely either necessary or aesthetic):
1. You can make a "compound command" and redirect into it:
Compound Commands
A compound command is one of the following:
(list) list is executed in a subshell environment (see
COMMAND EXECUTION ENVIRONMENT below). Variable
assignments and builtin commands that affect the
shell's environment do not remain in effect after the
command completes. The return status is the exit status of list.
{ list; }
list is simply executed in the current shell environment. list
must be terminated with a newline or semicolon. This is known as a
group command. The return status is the exit status of list. Note
that unlike the metacharacters ( and ), { and } are reserved words
and must occur where a reserved word is permitted to be recognized.
Since they do not cause a word break, they must be separated from
list by whitespace or another shell metacharacter.
So:
$ echo foo > input
$ { cat | sed 's/^/I saw a line: /'; } < input
I saw a line: foo
2. You can redirect to a pipe using "process substitution":
Process Substitution
Process substitution is supported on systems that support named pipes
(FIFOs) or the /dev/fd method of naming open files. It takes the form of
<(list) or >(list). The process list is run with its input or output
connected to a FIFO or some file in /dev/fd. The name of this file is
passed as an argument to the current command as the result of the
expansion. If the >(list) form is used, writing to the file will provide
input for list. If the <(list) form is used, the file passed as an argument
should be read to obtain the output of list.
So:
rici#...$ cat > >(sed 's/^/I saw a line: /') < <(echo foo; echo bar)
I saw a line: foo
rici#...$ I saw a line: bar
(Why the prompt appears before the output terminates, and what to do about it are left as exercises).
This is pretty much what I understand after doing some reading (including ruakh's answer)
First of all, if you redirect multiple times, all the redirections are performed, but only the last redirection will take effect (assuming none of the earlier redirections cause error)
e.g. cat < in1.txt < in2.txt is equivalent to cat < in2.txt, unless in1.txt does not exist in which case this command will fail (since < in1.txt is performed first)
Similarly, with cat in.txt > out1.txt > out2.txt, only out2.txt receives the contents of in.txt; but since > out1.txt is performed first, out1.txt is still created (empty) if it doesn't exist.
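This is easy to check (file names are placeholders):

```shell
printf 'hello\n' > in.txt

# Both files are opened (and created/truncated), but only the
# last redirection receives cat's output:
cat in.txt > out1.txt > out2.txt

# out1.txt exists but is empty
cat out2.txt    # prints: hello
```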
What pipe does is connect the stdout of previous command to the stdin of the next command, and that connection comes before any other redirections (from Bash manual).
So you can think of
cat in1.txt > out1.txt | cat > out2.txt
as
cat in1.txt > pipe > out1.txt; cat < pipe > out2.txt
And applying the multiple redirection rule mentioned before, we can simplify this to
cat in1.txt > out1.txt; cat < pipe > out2.txt
Result: The content of in1.txt is copied to out1.txt, since nothing was written to pipe
Using another of ruakh's examples,
cat < in1.txt | cat < in2.txt
is roughly equivalent to
cat > pipe < in1.txt; cat < pipe < in2.txt
which is effectively
cat > pipe < in1.txt; cat < in2.txt
Result: This time something is written to the pipe, but since the second cat reads from in2.txt instead of the pipe, only the content of in2.txt is printed. In general, when the pipe's end of a command is followed by another redirection in the same direction (> or <), the pipe is ignored.
It's a little unorthodox, but perfectly legal, to place the < anywhere you like, so I prefer this as it better illustrates the left-to-right data flow:
<input.txt sort | head >output.txt
The only time you cannot do this is with built-in control structure commands (for, if, while).
# Unfortunately, NOT LEGAL
<input.txt while read line; do ...; done
Note that all of these are equivalent commands, but to avoid confusion you should use only the first or the last one:
<input.txt grep -l foobar
grep <input.txt -l foobar
grep -l <input.txt foobar
grep -l foobar <input.txt
Because the file name must always come directly after the redirection operator, I prefer to leave out the optional space between the < and the file name.
Corrections:
Example 1:
sort < names | head
In this case, input redirect runs first (names are sorted), then the result of that is piped to head.
In general you can read from left to right. The standard idiom works as follows:
Use of input redirection "<" tells the program to read from a file instead of stdin
Use of output redirection ">" tells the program to write to a file instead of stdout
Use of pipe "program_a | program_b" takes everything that would normally be output by program_a to stdout, and feeds it all directly to program_b as if it was read from stdin.
Related
I would like to understand where a stdin redirection goes in an expression like < <(cmd)
As a test to learn more about bash, I tried to write a bash function with while .. do .. done, and to get it working I had to use trial and error, mainly because I did not know the behavior of the first redirection in < <(find ...):
while read L ;do basename "$L";((k++));done < <(
find -maxdepth 1 -type d -name '.*' |
sort
)
With only <(find ... it does not work. I suppose that's because the stdout of the find command line goes to a temporary file (or so I've read); so I added one more < to "push" the copy of stdout further along. That much I can understand, but how can I know that the stdout copied into the temporary file doesn't stop at the first command encountered (basename), and instead goes all the way to the stdin of the while command?
<(...) by itself is a process substitution. It behaves like a file name, except the "contents" of the file it names are the output of the command. For example,
$ echo <(echo foo)
/dev/fd/63
$ cat <(echo foo; echo bar)
foo
bar
A while loop doesn't take arguments, which is why you get a syntax error without the input redirection.
$ while read x; do echo "$x"; done <(echo foo; echo bar)
bash: syntax error near unexpected token `<(echo foo; echo bar)'
With the input redirection, the while loop (which is a command, and like any other command, has its own standard input) uses the process substitution as its standard input.
$ while read x; do echo "$x"; done < <(echo foo; echo bar)
foo
bar
while doesn't actually use its standard input, but any command in the while loop inherits its standard input from the loop. That includes read, so each execution of read gets a different line from the file until the file is exhausted, at which point read has an exit status of 1 and the loop terminates.
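A small demonstration of that inheritance: each read in the loop body takes the next line from the loop's shared standard input, until read fails at end-of-file and the loop stops:

```shell
# Each read consumes one line of the loop's stdin:
printf 'a\nb\nc\n' | while read -r line; do
  echo "got: $line"
done
# got: a
# got: b
# got: c
```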
how far does a redirection go in a command line?
The redirection applies for the whole duration of the command it is attached to.
The shell grammar defines what counts as a "command" on a command line. You can peek at the POSIX shell standard and the Bash documentation.
A command is one of the following:
Simple command (see Simple Commands)
Pipeline (see Pipelines)
List compound-list (see Lists)
Compound command (see Compound Commands)
Function definition (see Function Definition Command)
A command may be a compound command, which may be a looping construct, which may be a while looping construct. A while loop is a single command.
The redirection applies for the whole duration of the command, and is inherited by any commands inside that command.
while
redirected here
do
redirected here
done < redirection
if
redirected here
then
redirected here
else
redirected here
fi < redirection
etc.
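For instance, a single redirection on fi feeds every read inside the if (a sketch; data.txt is a placeholder):

```shell
printf 'first\nsecond\n' > data.txt

# Both reads share the one redirected stdin, so they get
# consecutive lines of data.txt:
if read -r a; then
  read -r b
  echo "$a / $b"
fi < data.txt
# prints: first / second
```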
I've found an interesting bash script that with some modifications would likely solve my use case. But I'm unsure if I understand how it works, in particular the pipe between the blocks.
How do these two blocks work together, and what is the behaviour of the pipe that separates them?
function isTomcatUp {
# Use FIFO pipeline to check catalina.out for server startup notification rather than
# ping with an HTTP request. This was recommended by ForgeRock (Zoltan).
FIFO=/tmp/notifytomcatfifo
mkfifo "${FIFO}" || exit 1
{
# run tail in the background so that the shell can
# kill tail when notified that grep has exited
tail -f $CATALINA_HOME/logs/catalina.out &
# remember tail's PID
TAILPID=$!
# wait for notification that grep has exited
read foo <${FIFO}
# grep has exited, time to go
kill "${TAILPID}"
} | {
grep -m 1 "INFO: Server startup"
# notify the first pipeline stage that grep is done
echo >${FIFO}
}
# clean up
rm "${FIFO}"
}
Code Source: https://www.manthanhd.com/2016/01/15/waiting-for-tomcat-to-start-up-in-a-script/
bash has a whole set of compound commands, which work much like simple commands. Most relevant here is that each compound command has its own standard input and standard output.
{ ... } is one such compound command. Each command inside the group inherits its standard input and output from the group, so the effect is that the standard output of a group is the concatenation of its children's standard output. Likewise, each command inside reads in turn from the group's standard input. In your example, nothing interesting happens, because grep consumes all of the standard input and no other command tries to read from it. But consider this example:
$ cat tmp.txt
foo
bar
$ { read a; read b; echo "$b then $a"; } < tmp.txt
bar then foo
The first read gets a single line from standard input, and the second read gets the second. Importantly, the first read consumes a line of input before the second read could see it. Contrast this with
$ read a < tmp.txt
$ read b < tmp.txt
where a and b will both contain foo, because each read command opens tmp.txt anew and both will read the first line.
The { …; } construct groups the commands so that the I/O redirections apply to all the commands within it. The { must stand alone as if it were a command name; the } must be preceded by either a semicolon or a newline and also stand alone. The commands are not executed in a sub-shell, unlike with ( … ), which also has some syntactic differences.
In your script, you have two such groupings connected by a pipe. Each group runs in a sub-shell because of the pipe, not because of the braces.
The first group runs tail -f on a file in background, and then waits for a FIFO to be closed so it can kill the tail -f. The second part looks for the first occurrence of some specific information and when it finds it, stops reading and writes to the FIFO to free everything up.
As with any pipeline, the exit status is the status of the last group — which is likely to be 0 because the echo succeeds.
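A quick sketch of that exit-status behaviour, together with bash's pipefail option for when you do want failures anywhere in the pipeline to be reported:

```shell
# The pipeline's status is that of its LAST command:
false | echo done
echo "$?"      # prints: 0 (echo succeeded)

# With pipefail (a bash option), a failure anywhere is reported:
set -o pipefail
false | echo done
echo "$?"      # prints: 1
```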
I came across a syntax for "while read" loop in a bash script
$> while read line; do echo $line; done < f1 # f1 is a file in my current directory
will print the file line by line.
My search for "while read" in the GNU bash manual https://www.gnu.org/software/bash/manual/
came up short, and while other "tutorial sites" give some usage examples, I would still like to understand the full syntax options for this construct.
Can it be used with "for" loops as well?
something like
for line in read; do echo $line; done < f1
The syntax for a while loop is
while list-1; do list-2; done
where list-1 is one or more commands (usually one) and the loop continues while list-1 is successful (return value of zero), list-2 is the "body" of the loop.
The syntax of a for loop is different:
for name in word; do list ; done
where word is usually a list of strings, not a command (although it can be hacked to use a command which returns word).
The purpose of a for loop is to iterate through word, the purpose of while is to loop while a command is successful. They are used for different tasks.
Redirection changes a file descriptor to refer to another file or file descriptor.
< changes file descriptor 0 (zero), also known as stdin
> changes file descriptor 1 (one), also known as stdout
So somecommand < foo changes stdin to read from foo rather than the terminal keyboard.
somecommand > foo changes stdout to write to foo rather than the terminal screen (if foo exists it will be overwritten).
In your case somecommand is while, but it can be any other command - note that not all commands read from stdin, yet the command syntax with < is still valid.
A common mistake is:
# WRONG!
while read < somefile
do
....
done
In that case somecommand is read and the effect is that it will read the first line of somefile, then proceed with the body of the loop, come back, then read the first line of the file again! It will continually loop just reading the first line, since while has no knowledge or interest in what read is doing, only its return value of success or fail. (read uses the variable REPLY if you don't specify one)
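The fix is to move the redirection to the done, so the whole loop shares one open file and each read consumes the next line:

```shell
printf 'x\ny\n' > somefile

# Correct: redirect the WHOLE loop, not the read command:
while read -r line
do
  echo "saw: $line"
done < somefile
# saw: x
# saw: y
```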
Redirection examples ($ indicates a prompt):
$ cat > file1
111111
<CTRL+D>
$ cat > file2
222222
<CTRL+D>
cat reads from stdin if we don't specify a filename, so it reads from the keyboard. Instead of writing to the screen we redirect to a file. The <CTRL+D> indicates End-Of-File sent from the keyboard.
This redirects stdin to read from a file:
$ cat < file1
111111
Can you explain this?
$ cat < file1 file2
222222
Tried to keep my code as simple as possible:
1: What are the rules for using echo within a while loop?
All my $a and some of my $word variables are echoed, but not my echo kk?
2: What is the scope of my count variable? Why is it not working within my while loop? Can I extend the variable to make it global?
3: When I use the grep on the final row, the $word variable only prints the first word of the matching rows, while if I remove the grep line at the end, $word works as intended and prints all the words.
count=1
while read a; do
((count=count+1))
if [ $count -le 2 ]
then
echo $a
echo kk
for word in $a; do
echo $word
done
fi
done < data.txt | grep Iteration
Use Process Substitution
In a comment, you say:
I thtought I was using grep on data.txt (sic)
No. Your current pipeline passes the loop's results through grep, not the source file. To do that, you need to rewrite your redirection to use process substitution. For example:
count=1
while read a; do
((count=count+1))
if [ $count -le 2 ]
then
echo $a
echo kk
for word in $a; do
echo $word
done
fi
done < <(fgrep Iteration data.txt)
#CodeGnome answered your question, but there are other problems with your script that will come back to bite you at some point (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for discussion of some of them, and also google "quoting shell variables"). Just don't do it. Shell scripts are for sequencing calls to tools, and the UNIX tool for manipulating text is awk. In this case, all you'd need to do the job robustly, portably and efficiently is:
awk '
/Iteration/ {
if (++count <= 2) {
print
print "kk"
for (i=1; i<=NF; i++) {
print $i
}
}
}' data.txt
and of course it'd be more efficient still if you just stop reading the input when count hits 2:
awk '
/Iteration/ {
print
print "kk"
for (i=1; i<=NF; i++) {
print $i
}
if (++count == 2) {
exit
}
}' data.txt
To complement CodeGnome's helpful answer with an explanation of how your command actually works and why it doesn't do what you want:
In Bash's grammar, an input redirection such as < data.txt is part of a single command, whereas |, the pipe symbol, chains multiple commands, from left to right, to form a pipeline.
Technically, while ... done ... < data.txt | grep Iteration is a single pipeline composed of 2 commands:
a single compound command (while ...; do ...; done) with an input redirection (< data.txt),
and a simple command (grep Iteration) that receives the stdout output from the compound command via its stdin, courtesy of the pipe.
In other words:
only the contents of data.txt is fed to the while loop as input (via stdin),
and whatever stdout output the while loop produces is then sent to the next pipeline segment, the grep command.
By contrast, it sounds like you want to apply grep to data.txt first, and only send the matching lines to the while loop.
You have the following options for sending a command's output to another command:
Note: The following solutions use a simplified while loop for brevity - whether a while command is single-line or spans multiple lines is irrelevant.
Also, instead of using input redirection (< data.txt) to pass the file content to grep, data.txt is passed as a filename argument.
Option 1: Place the command whose output to send to your while loop first in the pipeline:
grep 'Iteration' data.txt | while read -r a; do echo "$a"; done
The down-side of this approach is that your while loop then runs in a subshell (as all segments of a pipeline do by default), which means that variables defined or modified in your while command won't be visible to the current shell.
In Bash v4.2+, you can fix this by running shopt -s lastpipe, which tells Bash to run the last pipeline segment - the while command in this case - in the current shell instead.
Note that lastpipe is a nonstandard bash extension to the POSIX standard.
(To try this in an interactive shell, you must first turn off job control with set +m.)
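A minimal sketch of lastpipe in a script (requires bash 4.2+; in an interactive shell you would also need set +m first):

```shell
#!/bin/bash
shopt -s lastpipe

count=0
# With lastpipe, the while loop (the last pipeline segment) runs in
# the current shell, so the increments to count are not lost in a subshell:
printf 'a\nb\n' | while read -r line; do
  count=$((count + 1))
done

echo "$count"    # prints: 2
```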
Option 2: Use a process substitution:
Loosely speaking, a process substitution <(...) allows you to present command output as the content of a temporary file that cleans up after itself.
Since <(...) expands to the temporary file's (FIFO's) path, and read in the while loop only accepts stdin input, input redirection must be applied as well: < <(...):
while read -r a; do echo "$a"; done < <(grep 'Iteration' data.txt)
The advantage of this approach is that the while loop runs in the current shell, so any variable definitions or modifications remain in scope after the command completes.
The potential down-side of this approach is that process substitutions are a nonstandard bash extension to the POSIX standard (although ksh and zsh support them too).
Option 3: Use a command substitution inside a here-document:
Using the command first in the pipeline (option 1) is a POSIX-compliant approach, but doesn't allow you to modify variables in the current shell (and Bash's lastpipe option is not POSIX-compliant).
The only POSIX-compliant way to send command output to a command that runs in the current shell is to use a command substitution ($(...)) inside a double-quoted here-document:
while read -r a; do echo "$a"; done <<EOF
$(grep 'Iteration' data.txt)
EOF
Streamlining your code and making it more robust:
The rest of your code has some non-obvious pitfalls that are worth addressing:
Double-quote your variable references (e.g., echo "$a" instead of echo $a), unless you specifically want word-splitting and globbing (filename expansion) applied to the values; word splitting and globbing are two kinds of shell expansions.
Similarly, don't use for to iterate over an (of necessity unquoted) variable reference (don't use for word in $a, in your case), unless you want globbing applied to the individual words - see what happens when you run a='one *'; for word in $a; do echo "$word"; done
You could turn globbing off beforehand (set -f) and back on after (set +f), but it's better to use read -ra words ... to read the words into an array first, and then safely iterate over the array elements with for word in "${words[@]}"; ... - note the "..." around the array variable reference.
Always use -r with read; without it, rarely used \-preprocessing is applied, which will "eat" embedded \ chars.
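To see what -r protects you from:

```shell
# Without -r, read strips the backslash ("escape processing"):
printf 'C:\\temp\n' | { read x; echo "$x"; }      # prints: C:temp
# With -r, the line is taken literally:
printf 'C:\\temp\n' | { read -r x; echo "$x"; }   # prints: C:\temp
```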
If we heed the advice above, apply a few additional tweaks, and use a process substitution to feed grep's output to the while loop, we get:
count=1
while read -r a; do # Note the -r
if (( ++count <= 2 )); then
echo "$a"
# Split $a safely into words and store the words in
  # array variable ${words[@]}.
  read -ra words <<<"$a" # Note the -a to read into an *array*.
  # Loop over the words (elements of the array).
  # Note: To simply print the words, you could use
  # `printf '%s\n' "${words[@]}"` instead of the loop.
  for word in "${words[@]}"; do
echo "$word"
done
fi
done < <(grep 'Iteration' data.txt)
Note: As written, you don't really need a loop at all, because the if body only runs on the first iteration; the remaining lines are read but never printed.
Finally, as a general alternative for larger input sets, consider Ed Morton's helpful answer, which is much faster due to using awk to process your input file, whereas looping in shell code is generally slow.
What does this print your shell?
echo foo | while read line; do echo $line; done < <(echo bar)
I would expect it to evaluate to echo foo | bar or foo < <(bar), both of which would result in an error message.
In Bash 4.1.5 it looks like the pipe is simply discarded:
bar
In Dash:
sh: Syntax error: redirection unexpected
Dash doesn't support process substitution (<()).
The behavior you're seeing is consistent if you use syntax that's supported by each of the shells you're comparing. Try this:
echo hello | cat < inputfile
You should see the contents of "inputfile" and not "hello". Of several shells I tried, only Z shell showed both.
This is what POSIX says regarding pipelines and redirection:
The standard output of command1 shall be connected to the standard input of command2. The standard input, standard output, or both of a command shall be considered to be assigned by the pipeline before any redirection specified by redirection operators that are part of the command (see Redirection ).
I interpret this to mean that in the case of the example above, the pipeline assigns stdin to cat then the redirection overrides it.