grep with unassigned variable exits a loop unexpectedly - bash

I have a simple bash script that, so far, just reads each line of a file and prints it. Simple enough:
while read i
do
echo $i
#otherViewDef=`grep -i $currentView $viewssqlfile`
done <$viewsdeffile
This script works as expected, unless the commented line is uncommented. In that case, the loop exits after echoing the first line of the file. I understand that this should not work, as both currentView and viewssqlfile are unset, but what is the justification for this behavior, as opposed to reporting an error and returning a non-zero exit status?

I think there's something different; this can't be the actual script, because the errors would be different. Assuming $currentView is set but $viewssqlfile is not, the assignment executes
grep -i $currentView
which reads from stdin, which means it greps the remaining contents of $viewsdeffile. It finds no matches, so prints nothing. After that, read i has nothing left to read, returns false, and the loop exits.
In other words, if the controlling read of a loop reads from a redirected stdin, make sure no program in the loop body attempts to read from stdin as well; they all share the same stdin.
Placing set -x near the top is likely to provide some insight.
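One common fix (a sketch using the question's variable names) is to cut the loop body off from the loop's stdin, so that even a grep with a missing file argument cannot drain the lines meant for read:

```shell
# Redirect grep's stdin from /dev/null: if $viewssqlfile is unset or
# empty, grep falls back to reading stdin, and this keeps it from
# consuming the loop's input.
while read i
do
    echo "$i"
    otherViewDef=$(grep -i "$currentView" "$viewssqlfile" </dev/null)
done <"$viewsdeffile"
```

With the </dev/null in place, the loop visits every line of $viewsdeffile regardless of what grep does.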

Why does the Bash read command return without any input?

I have a Bash script foo that accepts a list of words on STDIN. The script should read the words into an array, then ask for user input:
while IFS= read -r; do
words+=( "$REPLY" )
done
read -r ans -p "Do stuff? [Yn] "
echo "ans: |$ans|"
The problem is, Bash immediately reads an empty string into the ans variable, without waiting for actual user input. That is, I get the following output (with no prompt or delay):
$ cat words.txt | foo
ans: ||
Since everything piped into STDIN has already been consumed by the first read call, why does the second read call return without actually reading anything?
Judging by your symptoms, it looks like you've redirected stdin to provide the list of words to the while loop either via an input file (foo < file) or via a pipeline (... | foo).
If so, your second read command won't automatically switch back to reading from the terminal; it is still reading from whatever stdin was redirected to, and if that input has been consumed (which is exactly what your while loop does, as chepner points out in a comment), read reads nothing, and returns with exit code 1 (which is what terminated the while loop to begin with).
If you explicitly want the second read command to get user input from the terminal, use:
read -r -p "Do stuff? [Yn] " ans </dev/tty
Note:
Stdin redirected from a (finite) file (or pipeline or process substitution with finite output) is a finite resource that eventually reports an EOF condition once all input has been consumed:
read translates the EOF condition into exit code 1, causing the while loop to exit:
Specifically, if read cannot read any more characters, it assigns the null string (empty string) to the specified variable(s) (or $REPLY if none were specified), and sets the exit code to 1.
Note: read may set exit code 1 even when it does read characters (and stores them in the specified variable(s) / $REPLY), namely if the input ends without a delimiter; the delimiter is \n by default, or whatever was explicitly specified with -d.
Once all input has been consumed, subsequent read commands cannot read anything anymore (the EOF condition persists, and the behavior is as described above).
By contrast, interactive stdin input from a terminal is potentially infinite: additional data is provided by whatever the user types interactively whenever stdin input is requested.
The way to simulate an EOF condition during interactive multiline input (i.e., to terminate an input loop) is to press ^D (Control-D):
When ^D is pressed once at the very start of a line, read returns without reading anything and sets the exit code to 1, just as if EOF had been encountered.
In other words: the way to terminate unbounded interactive input in a loop is to press ^D after having submitted the last line of input.
By contrast, in the interior of an input line, pressing ^D twice is needed to stop reading and set the exit code to 1, but note that the line typed so far is saved to the target variable(s) / $REPLY.[1]
Since the stdin input stream wasn't actually closed, subsequent read commands work normally and continue to solicit interactive user input.
Caveat: If you press ^D at the shell's prompt (as opposed to while a running program is requesting input), you'll terminate the shell itself.
P.S.:
There is one incidental error in the question:
The second read command must place operand ans (the name of the variable to store the input in) after all options in order to work syntactically: read -r -p "Do stuff? [Yn] " ans
[1] As William Pursell points out in a comment on the question: ^D causes the read(2) system call to return with whatever is in the buffer at that point; the direct value returned is the count of characters read.
A count of 0 is how the EOF condition is signaled, and Bash's read translates that into exit code 1, causing termination of the loop.
Thus, pressing ^D at the start of a line, when the input buffer is empty, exits the loop immediately.
By contrast, if characters have already been typed on the line, then the first ^D causes read(2) to return however many characters were typed so far, upon which Bash's read reinvokes read(2), because the delimiter (a newline by default) hasn't been encountered yet.
An immediately following second ^D then causes read(2) to return 0, since no characters were typed, causing Bash's read to set exit code 1 and exit the loop.
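Putting the answer together, a corrected foo might look like the sketch below (foo and the prompt text come from the question; /dev/tty only works when a terminal is attached):

```shell
#!/usr/bin/env bash
# Read the piped-in word list until EOF exhausts the redirected stdin.
words=()
while IFS= read -r; do
    words+=( "$REPLY" )
done

# Re-attach this read to the terminal for the interactive prompt.
# Note the operand 'ans' comes after all options.
read -r -p "Do stuff? [Yn] " ans </dev/tty
echo "ans: |$ans|"
```

Run as `cat words.txt | foo`, the loop consumes the piped words and the prompt then waits for keyboard input.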

Read file as input for a command skipping lines

I'm trying to use the contents of a text file as the input for a command. I know how to read a file just fine. However, when I pass the read line to the command I want to execute, the script starts skipping every other line.
Given a plain text file named queue:
one
two
three
four
This prints out each line as expected:
queue=`pwd`/queue
while read input; do
echo $input
done < $queue
output:
one
two
three
four
However, when I pass $input off to the command, every other line is skipped:
queue=`pwd`/queue
while read input; do
echo $input
transcode-video --dry-run $input
done < $queue
output (transcode-video outputs a bunch of stuff, but I omitted that for brevity. I don't believe it is relevant):
one
three
I managed to get my script working by first reading the whole file into an array and then iterating over the array, but I still don't understand why directly looping over the file doesn't work. I'm assuming the file pointer is getting advanced somehow, but I cannot figure out why. transcode-video is a ruby gem. Is there something I'm not aware of going on behind the scenes when the ruby program is executed? The author of the gem provided a sample script that actually strips lines out of the file using a sed command, and that works fine.
Can someone explain what is going on here?
The launched app tries to process stdin and consumes a line of your input file itself. Try:
transcode-video --dry-run $input </dev/null
Or check the manual for a command-line flag that does the job.
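If the command has no such flag, another common workaround (a sketch, not from the answer) is to feed the loop through a separate file descriptor, so the body's stdin is left untouched:

```shell
queue=$(pwd)/queue
# Feed the loop through fd 3; anything in the body that reads
# fd 0 no longer competes with the loop's read.
while read -u 3 input; do
    echo "$input"
    transcode-video --dry-run "$input"   # free to read its own stdin
done 3< "$queue"
```

This way the loop visits every line even if the body consumes all of stdin.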

Both pipe and redirecting exist in shell

How to explain the output of cat /etc/passwd | cat </etc/issue?
In this case, the second cat receives the contents of /etc/passwd on stdin, and then stdin is redirected again to /etc/issue. Why is only /etc/issue left?
What's more, cat </etc/passwd </etc/issue only outputs the contents of /etc/issue. Is /etc/passwd overwritten?
I am not looking for a solution how to cat two files, but confused with how pipeline works.
Piping and redirection are processed from left to right.
So first the input of cat is redirected to come from the pipe. Then it is redirected to come from /etc/issue. Then the program is run, using the last redirection, which is the file.
When you do cat <file1 <file2, stdin is first redirected to file1, then it is redirected to file2. Then the program is run, and it gets its input from the last redirection.
It's like variable assignments. If you do:
stdin=passwd
stdin=issue
The value of stdin at the end is the last one assigned.
This is explained in the bash documentation, in the first paragraph of the section on Redirection:
Before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. Redirection may also be used to open and close files for the current shell execution environment. The following redirection operators may precede or appear anywhere within a simple command or may follow a command. Redirections are processed in the order they appear, from left to right.
(emphasis mine). I assume it's also in the POSIX shell specification; I haven't bothered to look it up. This is how Unix shells have always behaved.
The pipe is created first: the standard output of cat /etc/passwd is sent to the write side of the pipe, and the standard input of cat </etc/issue is set to the read side of the pipe. Then the command on each half of the pipe is processed. There's no other I/O redirection on the LHS, but on the RHS, the standard input is redirected so it comes from /etc/issue. That means there's nothing actually reading the read end of the pipe, so the LHS cat is terminated with a SIGPIPE (probably; alternatively, it writes data to the pipe but no process ever reads it). The RHS cat never knows about the pipe input; it only has the file input for its standard input.
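The left-to-right rule is easy to observe with two throwaway files (a sketch; f1 and f2 are hypothetical names):

```shell
printf 'one\n' > f1
printf 'two\n' > f2

cat <f1 <f2              # prints "two": the last stdin redirection wins
echo 'piped' | cat <f2   # prints "two": <f2 overrides the pipe's stdin
```

In both cases the final redirection determines what cat actually reads, mirroring the /etc/passwd and /etc/issue behavior in the question.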

why does redirect (<) not create a subshell

I wrote the following code
var=0
cat $file | while read line; do
var=$line
done
echo $var
Now, as I understand it, the pipe (|) will cause a subshell to be created, and therefore the variable var on line 1 will still have its original value on the last line.
However this will solve it:
var=0
while read line; do
var=$line
done < $file
echo $var
My question is why does the redirect not cause a subshell to be created, or if you like why does pipe cause one to be created?
Thanks
The cat command is an external command, which means it runs in its own process and has its own STDIN and STDOUT. You're basically taking the STDOUT produced by the cat command and redirecting it into the process of the while loop.
When you use redirection, you're not using a separate process. Instead, you're merely redirecting the STDIN of the while loop from the console to the lines of the file.
Needless to say, the second way is more efficient. In the old Usenet days, before all of you little whippersnappers got ahold of our Internet (Hey you kids! Get off of my Internet!) and destroyed it with your fancy graphics and all them web pages, some people used to give out the Useless Use of Cat award to people who posted to the comp.unix.shell group with a spurious cat command, because the use of cat is almost never necessary and is usually more inefficient.
If you're using a cat in your code, you probably don't need it. The cat command comes from concatenate and is supposed to be used only to concatenate files together. For example, back when we used SneakerNet on 800K floppies, we would have to split up long files with the Unix split command and then use cat to merge them back together.
A pipe is there to hook the stdout of one program to the stdin of another one. Two processes, possibly two shells. When you do redirection (> and <), all you're doing is remapping stdin (or stdout) to a file. Reading or writing a file can be done without another process or shell.
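Both behaviors can be observed side by side (a sketch; run under bash, where each segment of a pipeline runs in a subshell by default):

```shell
printf 'x\ny\n' > file

var=0
cat file | while read -r line; do var=$line; done
echo "$var"   # prints 0: the loop ran in a subshell, assignments were lost

var=0
while read -r line; do var=$line; done < file
echo "$var"   # prints y: redirection keeps the loop in the current shell
```

Only the redirected form lets assignments made inside the loop survive past done.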

When the input is from a pipe, does STDIN.read run until EOF is reached?

Sorry if this is a naïve question, but let's say I have a Ruby program called processor.rb that begins with data = STDIN.read. If I invoke this program like this
cat textfile.txt | processor.rb
Does STDIN.read wait for cat to pipe the entire textfile.txt in? Or does it assign some indeterminate portion of textfile.txt to the data variable?
I'm asking this because I recently saw a strange bug in one of my programs that suggests that the latter is the case.
The read method should import the entire file, as-is, and return only when the process producing the output has finished and closed its end of the pipe, which is how EOF is signaled. With output from cat, if you call read a subsequent time, it will return 0 bytes.
In simple terms, a process is allowed to append to its output at any time, which is the case of things like 'tail -f', so you can't be assured that you have read all the data from STDIN without actually checking.
Your OS may implement cat or shell pipes slightly differently, though. I'm not familiar with what POSIX dictates for behavior here.
It's probably line-buffered, reading until it encounters a newline or EOF.
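A shell analogue of STDIN.read shows the same blocking-until-EOF behavior: a command substitution over cat returns only once the writer closes its end of the pipe, yielding the complete input:

```shell
# $(cat) blocks until EOF on stdin, then yields the whole input,
# much like Ruby's STDIN.read with no length argument.
data=$(printf 'line1\nline2\n' | cat)
printf '%s\n' "$data"
```

Note that `$( )` strips the trailing newline from the captured output.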