Bash: file descriptors

I am a Bash beginner, but I am trying to learn this tool so that I can get a job in computing one of these days.
I am now trying to teach myself about file descriptors. Let me share some of my experiments:
#!/bin/bash
# Some dummy multi-line content
read -d '' colours <<- 'EOF'
red
green
blue
EOF
# File descriptor 3 produces colours
exec 3< <(echo "$colours")
# File descriptor 4 filters colours
exec 4> >(grep --color=never green)
# File descriptor 5 is an unlimited supply of violet
exec 5< <(yes violet)
echo Reading colours from file descriptor 3...
cat <&3
echo ... done.
echo Reading colours from file descriptor 3 again...
cat <&3
echo ... done.
echo Filtering colours through file descriptor 4...
echo "$colours" >&4
echo ... done. # Race condition?
echo Dipping into some violet...
head <&5
echo ... done.
echo Dipping into some more violet...
head <&5
echo ... done.
Some questions spring to mind as I see the output coming from the above:
1. fd3 seems to get "depleted" after "consumption"; is it also automatically closed after first use?
2. how is fd3 different from a named pipe? (something I have looked at already)
3. when exactly does the command yes start executing? upon fd declaration? later?
4. does yes stop (CTRL-Z or other) and restart when more violet is needed?
5. how can I get the PID of yes?
6. can I get a list of "active" fds?
7. very interesting race condition on filtering through fd4; can it be avoided?
8. will yes only stop when I exec 5>&-?
9. does it matter whether I close with >&- or <&-?
I'll stop here, for now.
Thanks!
PS: partial (numbered) answers are fine... I'll put together the different bits and pieces myself... (although a comprehensive answer from a single person would be impressive!)

1. fd3 seems to get "depleted" after "consumption"; is it also automatically closed after first use?
No, it is not closed. This is due to the way exec works. In the mode in which you have used it (without a command name), exec arranges the shell's own file descriptors as requested by the I/O redirections given to it, and then leaves them that way until the script terminates or they are changed again later.
Later, cat receives a copy of this file descriptor 3 on its standard input (file descriptor 0). cat's standard input is implicitly closed when cat exits (or perhaps, though unlikely, cat closes it before it exits, but that doesn't matter). The original copy, which is the shell's file descriptor 3, remains open, although the underlying stream has reached EOF and nothing further will be read from it.
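A minimal sketch to confirm this, on Linux (the echo string is arbitrary):
exec 3< <(echo hello)
cat <&3                # prints "hello" and reaches EOF
cat <&3                # prints nothing: fd 3 is still open, just at EOF
ls -l /proc/$$/fd/3    # the shell still holds fd 3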
2. how is fd3 different from a named pipe? (something I have looked at already)
The shell's <(some command) syntax (which is not standard Bourne shell syntax and, I believe, is only available in zsh and bash, by the way) might actually be implemented using named pipes. It probably isn't under Linux, because there is a better way (using /dev/fd), but it probably is on other operating systems.
So in that sense, this syntax may or may not be a helper for setting up named pipes.
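On Linux you can observe the /dev/fd mechanism directly, since the construct simply expands to a path:
echo <(true)    # prints something like /dev/fd/63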
3. when exactly does the command yes start executing? upon fd declaration? later?
As soon as the <(yes violet) construct is evaluated (which happens when the exec 5< <(yes violet) is evaluated).
4. does yes stop (CTRL-Z or other) and restart when more violet is needed?
No, it does not stop. However, it will block soon enough, once it has produced more output than whatever is reading the other end of the pipe has consumed. In other words, it blocks when the pipe buffer becomes full.
5. how can I get the PID of yes?
Good question! $! appears to contain it immediately after yes is executed. However, there seems to be an intermediate subshell, so you actually get the PID of that subshell. Try <(exec yes violet) to avoid the intermediate process.
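A quick sketch of that (yes_pid is just an illustrative name):
exec 5< <(exec yes violet)   # exec avoids the intermediate subshell
yes_pid=$!                   # PID of the most recent process substitution
echo "yes is running as PID $yes_pid"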
6. can I get a list of "active" fds?
Not from the shell itself. But if you're using an operating system like Linux that has /proc, you can just consult /proc/self/fd.
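One wrinkle: /proc/self as seen by an external command such as ls is that command's own process, so to inspect the shell's descriptors use the shell's PID instead:
ls -l /proc/$$/fd    # the shell's open fds, including 3, 4 and 5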
7. very interesting race condition on filtering through fd4; can it be avoided?
To avoid it, you presumably want to wait for the grep process to complete before proceeding through the script. If you obtain the process ID of that process (as above), I think you should be able to wait for it.
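A sketch of that, assuming a bash new enough (4.4 or later) that wait accepts the PID of the last process substitution:
exec 4> >(grep --color=never green)
grep_pid=$!
echo "$colours" >&4
exec 4>&-          # close the write end so grep sees EOF and exits
wait "$grep_pid"   # block until grep has flushed all its output
echo ... done.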
8. will yes only stop when I exec 5>&-?
Yes. yes will keep trying to produce output forever, but once the other end of the pipe is closed, its next write will raise a signal (SIGPIPE), which is fatal by default, or, if that signal is ignored, fail with a write error (EPIPE).
9. does it matter whether I close with >&- or <&-?
No. Both syntaxes are available for consistency's sake.
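So, to put yes out of its misery, either spelling works:
exec 5>&-    # or, equivalently: exec 5<&-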

Related

Why use "<&3", "3&-" and "3</file" in a bash while loop? And what does it actually do?

I'm going through a bash script, trying to figure out how it works and potentially patching it. The script in question is this cryptroot script from debian responsible for decrypting block devices at boot. Not being completely at home in bash definitely makes it a challenge.
I found this piece of code and I'm not sure what it does.
if [ -r /conf/conf.d/cryptroot ]; then
    while read mapping <&3; do
        setup_mapping "$mapping" 3<&-
    done 3< /conf/conf.d/cryptroot
fi
My guess is that it reads each line of /conf/conf.d/cryptroot and passes it to setup_mapping. But I don't quite understand how, nor what the significance of <&3, 3<&-, and 3< /conf/conf.d/cryptroot is, and what they do.
When I read lines from a file I usually do something like this:
while read LINE; do
    COMMAND
done < FILE
Where FILE is redirected into read in the while loop, which runs COMMAND for each line until the last one.
I also know a little bit about redirection, as in, I sometimes use it to redirect STDOUT and STDERR to things like /dev/null. But I'm not sure what the redirection to 3 means.
After reading a little bit more about I/O redirection, I've come somewhat closer to an answer. According to tldp.org:
The file descriptors for stdin, stdout, and stderr are 0, 1, and 2,
respectively. For opening additional files, there remain descriptors 3
to 9.
So the 3 is "just" a reference to an open file or:
...simply a number that the operating system assigns to an open file
to keep track of it. Consider it a simplified type of file pointer.
So from what I understand:
The 3< /conf/conf.d/cryptroot opens /conf/conf.d/cryptroot for reading and assigns it to file descriptor 3.
The read mapping <&3 seems to read a line from file descriptor 3, which points to the open /conf/conf.d/cryptroot.
The setup_mapping "$mapping" 3<&- seems to close file descriptor 3. Does this mean that it is opened again on every turn of the loop, pointing to the next line?
If the above is correct my question is why do this rather than the "normal" way? e.g.
while read mapping; do
    setup_mapping "$mapping"
done < /conf/conf.d/cryptroot
What advantage (if any) does the first version provide?
A common problem with
while read LINE; do
    COMMAND
done < FILE
is that people forget that COMMAND is also reading from FILE, and may consume data intended to be read by the read controlling the while loop. To avoid this, a common idiom is to have read use a different file descriptor instead; that is what <&3 accomplishes. But doing so leaves file descriptor 3 open for COMMAND. That may not be an issue, but it's reasonable to explicitly close it with 3<&-. In short, the construction you're seeing is just a way to keep setup_mapping from inadvertently reading data intended for read.
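A hedged demonstration of the failure mode, using head as a stand-in for any COMMAND that reads its standard input:
# The problem: head shares the loop's stdin, so it silently eats a line of FILE
while read line; do
    head -n 1 >/dev/null       # consumes the next line meant for read
done < FILE

# The fix: read takes its input from fd 3, which head knows nothing about
while read line <&3; do
    head -n 1 >/dev/null 3<&-  # reads the script's own stdin, not FILE
done 3< FILE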

Is lockfile necessary for reading and writing the same file of two processes

I'm working with Bash scripts and have run into the following situation:
one bash script will write things into a file, and another bash script will read things from the same file.
In this case, is a lockfile necessary? I think I don't need one, because there is only one reading process and only one writing process, but I'm not sure.
Bash write.sh:
#!/bin/bash
echo 'success' > tmp.log
Bash read.sh:
#!/bin/bash
while :
do
    line=$(head -n 1 ./tmp.log)
    if [[ "$line" == "success" ]]; then
        echo 'done'
        break
    else
        sleep 3
    fi
done
BTW, write.sh could write one of several keywords, such as success, fail, etc.
While many programmers ignore this, you can potentially run into a problem here, because writing to the file is not atomic. When the writer does
echo success > tmp.log
it could be split into two (or more) parts: first it writes suc, then it writes cess\n.
If the reader executes between those steps, it might get just suc rather than the whole success line. Using a lockfile would prevent this race condition.
This is unlikely to happen with short writes from a shell echo command, which is why most programmers don't worry about it. However, if the writer were a C program using buffered output, the buffer could be flushed at arbitrary points, which could easily leave a partial line in the file.
Also, since the reader reads the file from the beginning each time, you don't have to worry about resuming each read where the previous one left off.
Another way to do this is for the writer to write into a file with a different name, then rename the file to what the reader is looking for. Renaming is atomic, so you're guaranteed to read all of it or nothing.
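A minimal sketch of that approach for write.sh (the .new suffix is an arbitrary choice; the temporary file must be on the same filesystem for the rename to be atomic):
echo 'success' > tmp.log.new   # invisible to the reader while being written
mv tmp.log.new tmp.log         # rename() atomically reveals the finished file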
At least from your example, it doesn't look like read.sh really cares about what gets written to tmp.log, only that write.sh has created the file. In that case, all read.sh needs to check is that the file exists.
write.sh can simply be
: > tmp.log
and read.sh becomes
until [ -e tmp.log ]; do
    sleep 3
done
echo "done"

greater than symbol at beginning of line

I've just seen the following in a script and I'm not sure what it means:
.............
started=$STATUSDIR/.$EVENT_ID-started
errs=$STATUSDIR/.$EVENT_ID-errors
# started is used to capture the time we started, so
# that it can be used as the latest-result marker for
# the next run...
>"$started"
>"$errs"
# store STDERR on FD 3, then point STDERR to $errs
exec 3>&2 2>"$errs"
..............
Specifically, the ">" at the beginning of those lines. The script actually fails with "No such file or directory". The vars are all supplied via subsidiary scripts, and there doesn't seem to be any logic to create the directories it's complaining about.
It's not the easiest thing to Google for, so I thought I'd ask it here so that future bash hackers might find the answers you lovely people are able to provide.
This is a redirection. It's the same syntax used for echo hello >file (or its less-conventional but equally-correct equivalent >file echo hello), just without the echo hello. :)
When attached to an empty command, the effect of a redirection is identical to what it would be with a command that ran and immediately exited with no output: it creates the output file if it doesn't exist, truncates it if it does, closes it, and lets the script move on to the next command.
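That also explains your "No such file or directory" failure: a redirection can create or truncate the target file, but it never creates missing parent directories, so if $STATUSDIR doesn't exist, >"$started" fails with exactly that error. For instance:
> /tmp/example.file     # fine: creates (or truncates) the file
> /tmp/no/such/dir/f    # fails: No such file or directory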

Capture stdout for a long time in bash

I'm using a script which calls another, like this:
# stuff...
OUT="$(./scriptB)"
# do stuff with the variable OUT
Basically, the scriptB script displays text a bit at a time: it displays a line, 2s later another, 3s later another, and so on.
With the snippet I use, I only get the first output of my command; I miss a lot.
How can I get the whole output, by capturing stdout for a given time? Something like:
begin capture
./scriptB
stop capture
I don't mind if the output is not shown on screen.
Thanks.
If I understand your question, then I believe you can use the tee command, like
./scriptB | tee $HOME/scriptB.log
It will display the stdout from scriptB and write stdout to the log file at the same time.
Some of your output seems to be coming on the STDERR stream, so we have to redirect that as well. As in my comment, you can do
{ ./scriptB ; } > /tmp/scriptB.log 2>&1
Which can almost certainly be reduced to
./scriptB > /tmp/scriptB.log 2>&1
And in newer versions of bash, can further be reduced to
./scriptB >& /tmp/scriptB.log
AND finally, as your original question involved storing the output in a variable, you can do
OUT=$(./scriptB 2>&1)
(note that redirecting stdout to a file inside the $( ... ), as in the examples above, would leave nothing for the variable to capture).
The notation 2>&1 says: take file descriptor 2 of this process (STDERR) and make it refer (&) to the same place as file descriptor 1 of the process (STDOUT).
The alternate notation shown (... >& file) is shorthand for > file 2>&1.
Personally, I'd recommend using the 2>&1 syntax, as it is understood by all Bourne-derived shells (though not [t]csh).
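One subtlety worth knowing: redirections are processed left to right, so the order matters:
./scriptB > out.log 2>&1   # both streams end up in out.log
./scriptB 2>&1 > out.log   # STDERR goes to the old STDOUT (the terminal)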
As an aside, every process gets 3 file descriptors by default when it is created: 0=STDIN, 1=STDOUT, 2=STDERR. Manipulation of those streams is usually as simple as illustrated here. More advanced (rare) manipulations are possible; post a separate question if you need to know more.
IHTH

why does redirect (<) not create a subshell

I wrote the following code:
var=0
cat $file | while read line; do
    var=$line
done
echo $var
Now, as I understand it, the pipe (|) will cause a subshell to be created, and therefore the variable var from line 1 will have the same value on the last line.
However, this will solve it:
var=0
while read line; do
    var=$line
done < $file
echo $var
My question is: why does the redirect not cause a subshell to be created? Or, if you like, why does the pipe cause one to be created?
Thanks
The cat command is an external command, which means it needs its own process with its own STDIN and STDOUT. You're basically taking the STDOUT produced by the cat command and redirecting it into the process running the while loop.
When you use redirection, you're not using a separate process. Instead, you're merely redirecting the STDIN of the while loop from the console to the file.
Needless to say, the second way is more efficient. In the old Usenet days, before all of you little whippersnappers got ahold of our Internet (Hey you kids! Get off of my Internet!) and destroyed it with your fancy graphics and all them web pages, some people used to give out the Useless Use of Cat award when someone posted to the comp.unix.shell group with a spurious cat command, because the use of cat is almost never necessary and is usually less efficient.
If you're using cat in your code, you probably don't need it. The name cat comes from concatenate, and the command is supposed to be used only to concatenate files together. For example, back when we used SneakerNet on 800K floppies, we would have to split up long files with the Unix split command and then use cat to merge them back together.
A pipe is there to hook the stdout of one program to the stdin of another one. Two processes, possibly two shells. When you do redirection (> and <), all you're doing is remapping stdin (or stdout) to a file. Reading or writing a file can be done without another process or shell.
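A sketch of both behaviors, plus bash's lastpipe escape hatch (bash 4.2+, effective when job control is off, as it is in scripts):
var=0
cat "$file" | while read line; do var=$line; done
echo "$var"        # still 0: the loop ran in a subshell

shopt -s lastpipe  # run the last element of a pipeline in the current shell
cat "$file" | while read line; do var=$line; done
echo "$var"        # now holds the last line of the file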
