Execution order of subshells? - bash

While trying to solve other problems, I came across the following bash script in Alex B's answer to this question:
#!/bin/bash
(
# Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
flock -x -w 10 200 || exit 1
# Do stuff
) 200>/var/lock/.myscript.exclusivelock
I have problems understanding that script. According to flock's manual, the file descriptor (the 200) in flock -x -w 10 200 must relate to an open file.
Where is that descriptor / file opened? If it is the 200>/var/lock/.myscript.exclusivelock that opens the descriptor, that would mean this part is executed before the subshell, which is the opposite of what I thought when I first looked at this script.
This leads me to my question: What is the execution order of subshells in bash, in relation to the main script (i.e. the script opening the subshells) as well as in relation to other subshells which the same main script might spawn?
From reading other articles and the bash manual, I believe I have only learned that subshells are executed "concurrently", but I didn't see any statement explaining whether there are exceptions to this (one obvious exception would be when the main script needs the output of a subshell, as in echo foo $(cat bar)).

200>, the redirection operator, opens the file using descriptor 200. It is indeed processed before the subshell. That file descriptor is then inherited by the subshell.
There is nothing inherently concurrent about subshells. You may be thinking of pipelines, like a | b | c, where a, b, and c are all commands that run concurrently. The fact that each is run in a subshell (usually a subprocess proper, if they are external commands, but even shell built-ins execute in a subshell) is an implementation detail of the pipeline.
To elaborate,
First, the shell parses this command. It identifies the complex command (...) with an output redirection.
It opens /var/lock/.myscript.exclusivelock in write mode on file descriptor 200.
It executes the subshell, which inherits all open file descriptors, including 200.
In the subshell, it executes flock, which inherits all open file descriptors from its parent, the subshell. It does its thing on file descriptor 200, as requested by its argument.
Once the subshell exits, any file opened by one of its redirection operators is closed by the shell.
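The sequence above can be reproduced without flock. The following is only a minimal sketch: it uses a temporary file and file descriptor 9 as stand-ins for the lock file and for 200 (both are assumptions made for portability; the number is arbitrary), and it records what happens in which order.

```shell
# Hypothetical stand-in for /var/lock/.myscript.exclusivelock
lockfile=$(mktemp)

(
    # The redirection after the closing parenthesis has already been
    # processed, so fd 9 is open before the first command in here runs.
    echo "subshell: fd 9 is already open" >&9
    sleep 1
    echo "subshell: finishing" >&9
) 9>"$lockfile"

# The parent shell waited for the subshell; fd 9 is closed again here.
echo "parent: continues after the subshell" >>"$lockfile"

log=$(cat "$lockfile")
rm -f "$lockfile"
```

Printing $log shows the two subshell lines followed by the parent line: the subshell runs to completion before the script moves on, unless it is explicitly put in the background with &.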

Related

Make bash file wait for system calls made in Fortran

My bash script looks something like this
mpiexec ./fortran_bin |& tee text_file
wait
./process_output_files
My MPI-based Fortran program makes several synchronous system calls with call exec_cmd(cmd,wait=.true.).
My problem is that process_output_files only waits for fortran_bin to finish, but some system commands (cmd) are not yet done, and this messes up my output files.
How do I make process_output_files wait for cmd to finish?
NOTES
I'm not sure where best to solve this problem (if there is a solution):
within Fortran, with MPI, within Bash ...
cmd is of the form cat out_{1..n} > out && rm -f out_{1..n}.
I would like it to run asynchronously (wait=.false.), because cmd can be time-consuming, and unrelated to the rest of the Fortran program.
The wait line in the bash script seems to have no effect.
I suppose you could ask the equivalent question for a C/C++ program that calls system(some_script).
But I can only find questions about waiting within a C/C++ program, where the same program needs the result of the called command (e.g., here and here).
From the notes above, it looks like the sub-commands inherit stdout/stderr from the calling process and do not leave any background processes behind.
If those assumptions are true, you can force a wait until no more output can arrive from fortran_bin and its children by piping the output into cat (or similar). cat will not terminate until every child of fortran_bin that still holds the inherited stdout or stderr has finished:
mpiexec ./fortran_bin 2>&1 3>&1 | cat
You can use tee (or another similar program) instead of cat.
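To see why the pipe makes the script wait, here is a small sketch in which a throwaway sh -c command stands in for fortran_bin (a hypothetical stand-in, not the real program): it exits immediately but leaves behind a child that keeps the inherited stdout open for two more seconds.

```shell
out=$(mktemp)

# The stand-in's parent process finishes at once; its background child
# still holds the write end of the pipe and writes later.
sh -c '( sleep 2; echo "child finished" ) & echo "parent finished"' 2>&1 | cat > "$out"

# cat only sees EOF once every process holding the pipe has exited,
# so this line is not reached before the child is done.
captured=$(cat "$out")
rm -f "$out"
```

This only works while the children keep the inherited descriptors open; a child that redirects both stdout and stderr elsewhere would not hold the pipe and would not be waited for.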

What does 200>"$somefile" accomplish? [duplicate]

This question already has an answer here:
How does this canonical flock example work?
(1 answer)
Closed 8 years ago.
I've found boilerplate flock(1) code which looks promising. Now I want to understand the components before blindly using it.
Seems like these functions are using the third form of flock
flock [-sxun] [-w timeout] fd
The third form is convenient inside shell scripts, and is usually used
the following manner:
(
flock -s 200
# ... commands executed under lock ...
) 200>/var/lock/mylockfile
The piece I'm lost on (from the sample wrapper functions) is this notation
eval "exec $LOCKFD>\"$LOCKFILE\""
or in shorthand from the flock manpage
200>/var/lock/mylockfile
What does that accomplish?
I notice that subsequent flock commands that are passed a value other than the one in the initial redirect cause flock to complain:
flock: 50: Bad file descriptor
It seems like flock is using the file descriptors as a map to know which file to operate on. In order for that to work though, those descriptors would have to still be around and associated with the file, right?
After the redirect is finished, and the lock file is created, isn't the file closed, and file descriptors associated with the open file vaporized? I thought file descriptors were only associated with open files.
What's going on here?
200>/var/lock/mylockfile
This creates a file /var/lock/mylockfile which can be written to via file descriptor 200 inside the sub-shell. The number 200 is an arbitrary one. Picking a high number reduces the chance of any of the commands inside the sub-shell "noticing" the extra file descriptor.
(Typically, file descriptors 0, 1, and 2 are used by stdin, stdout, and stderr, respectively. This number could have been as low as 3.)
flock -s 200
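Then flock locks the file via the previously opened file descriptor. The > in 200> opens the file for writing and creates it if it does not exist yet; flock itself writes nothing to it. (On Linux a local file can be locked with flock regardless of the mode it was opened in, but an exclusive lock over NFS does require the file to be open for writing.) Note that this happens after the redirection above.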
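The exec form from the question keeps the descriptor open in the current shell rather than in a subshell. The sketch below assumes file descriptor 9 and a temporary file as the lock file (both arbitrary choices for illustration) and probes the descriptor instead of calling flock, to show when a "Bad file descriptor" error would occur.

```shell
# Hypothetical lock file for illustration
lockfile=$(mktemp)

# With no command, exec applies the redirection to the shell itself,
# so fd 9 stays open until it is closed explicitly or the shell exits.
exec 9>"$lockfile"

# Probe the descriptor in a subshell; "flock -x 9" would work at this point.
if ( : >&9 ) 2>/dev/null; then fd9_open=yes; else fd9_open=no; fi

# Close it again. From here on, anything given fd 9 fails with
# "Bad file descriptor", just like the flock call with fd 50.
exec 9>&-
if ( : >&9 ) 2>/dev/null; then fd9_after_close=yes; else fd9_after_close=no; fi

rm -f "$lockfile"
```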

Bash: file descriptors

I am a Bash beginner but I am trying to learn this tool to have a job in computers one of these days.
I am trying to teach myself about file descriptors now. Let me share some of my experiments:
#!/bin/bash
# Some dummy multi-line content
read -d '' colours <<- 'EOF'
red
green
blue
EOF
# File descriptor 3 produces colours
exec 3< <(echo "$colours")
# File descriptor 4 filters colours
exec 4> >(grep --color=never green)
# File descriptor 5 is an unlimited supply of violet
exec 5< <(yes violet)
echo Reading colours from file descriptor 3...
cat <&3
echo ... done.
echo Reading colours from file descriptor 3 again...
cat <&3
echo ... done.
echo Filtering colours through file descriptor 4...
echo "$colours" >&4
echo ... done. # Race condition?
echo Dipping into some violet...
head <&5
echo ... done.
echo Dipping into some more violet...
head <&5
echo ... done.
Some questions spring to mind as I see the output coming from the above:
fd3 seems to get "depleted" after "consumption", is it also automatically closed after first use?
how is fd3 different from a named pipe? (something I have looked at already)
when exactly does the command yes start executing? upon fd declaration? later?
does yes stop (CTRL-Z or other) and restart when more violet is needed?
how can I get the PID of yes?
can I get a list of "active" fds?
very interesting race condition on filtering through fd4, can it be avoided?
will yes only stop when I exec 5>&-?
does it matter whether I close with >&- or <&-?
I'll stop here, for now.
Thanks!
PS: partial (numbered) answers are fine.. I'll put together the different bits and pieces myself.. (although a comprehensive answer from a single person would be impressive!)
fd3 seems to get "depleted" after "consumption", is it also automatically closed after first use?
No, it is not closed. This is due to the way exec works. In the mode in which you have used exec (without a command argument), its function is to arrange the shell's own file descriptors as requested by the I/O redirections given to it, and then leave them that way until the script terminates or they are changed again later.
Later, cat receives a copy of this file descriptor 3 on its standard input (file descriptor 0). cat's standard input is implicitly closed when cat exits (or perhaps, though unlikely, cat closes it before it exits, but that doesn't matter). The original copy, which is the shell's file descriptor 3, remains open, although the underlying stream has reached EOF and nothing further can be read from it.
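The behaviour described above can be checked with a short sketch. It assumes a temporary regular file in place of the <(echo "$colours") process substitution, which keeps the example portable; the read offset is shared in the same way.

```shell
colours_file=$(mktemp)
printf 'red\ngreen\nblue\n' > "$colours_file"

# Open the file on fd 3 in the shell itself
exec 3< "$colours_file"

first_read=$(cat <&3)     # consumes everything up to EOF
second_read=$(cat <&3)    # fd 3 is still open, but already at EOF

# Probe: the redirection succeeds only if fd 3 is still open
if ( : <&3 ) 2>/dev/null; then fd3_still_open=yes; else fd3_still_open=no; fi

exec 3<&-                 # close it explicitly
rm -f "$colours_file"
```

The second read returns nothing even though the descriptor is still open, which is the "depleted" effect seen in the script's output.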
how is fd3 different from a named pipe? (something I have looked at already)
The shell's <(some command) syntax (which is not standard Bourne shell syntax; as far as I know it is available in bash, zsh, and ksh) might actually be implemented using named pipes. It probably isn't under Linux because there's a better way (using /dev/fd), but it probably is on other operating systems.
So in that sense, this syntax may or may not be a helper for setting up named pipes.
when exactly does the command yes start executing? upon fd declaration? later?
As soon as the <(yes violet) construct is evaluated (which happens when the exec 5< <(yes violet) is evaluated).
does yes stop (CTRL-Z or other) and restart when more violet is needed?
No, it does not stop. However, it will block soon enough when it starts producing more output than anything reading the other end of the pipe is consuming. In other words, the pipe buffer will become full.
how can I get the PID of yes?
Good question! $! appears to contain it immediately after yes is executed. However there seems to be an intermediate subshell and you actually get the pid of that subshell. Try <(exec yes violet) to avoid the intermediate process.
can I get a list of "active" fds?
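Not with a shell built-in. But if you're using an operating system like Linux that has /proc, you can list /proc/$$/fd, where $$ is the shell's own PID. (/proc/self/fd describes whichever process opens it, such as ls, rather than the shell itself, although inherited descriptors show up there too.)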
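As a rough sketch, assuming a system with /proc and falling back to /dev/fd where it is missing, the open descriptors can be listed like this (fd 3 is opened on /dev/null purely so that there is something to find):

```shell
# Open an extra descriptor so the listing has something to show
exec 3< /dev/null

if [ -d "/proc/$$/fd" ]; then
    # Linux: descriptors of the shell itself ($$ is the shell's PID)
    open_fds=$(ls "/proc/$$/fd")
else
    # Fallback: descriptors of ls, which include the inherited ones
    open_fds=$(ls /dev/fd)
fi

exec 3<&-
```

The listing contains 0, 1, 2 and 3, plus any descriptors the shell keeps for internal use.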
very interesting race condition on filtering through fd4, can it be avoided?
To avoid it, you presumably want to wait for the grep process to complete before proceeding through the script. If you obtain the process ID of that process (as above), I think you should be able to wait for it.
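One way to make that wait unambiguous is sketched below. Note that it swaps the >(grep ...) process substitution for an explicit named pipe, so that $! is known to be the PID of grep itself; the paths and variable names are made up for the example.

```shell
workdir=$(mktemp -d)
mkfifo "$workdir/filter"

# Start the filter in the background, reading from the named pipe
grep green < "$workdir/filter" > "$workdir/out" &
grep_pid=$!

# fd 4 feeds the filter
exec 4> "$workdir/filter"
printf 'red\ngreen\nblue\n' >&4

# Closing fd 4 gives grep its EOF; wait then blocks until grep has
# written its output, so nothing later in the script can overtake it.
exec 4>&-
wait "$grep_pid"

filtered=$(cat "$workdir/out")
rm -r "$workdir"
```

The important part is closing the descriptor before waiting: as long as fd 4 is open, grep never sees EOF and wait would block forever.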
will yes only stop when I exec 5>&-?
Yes. What will happen then is that yes will continue to try to produce output forever but when the other end of the file descriptor is closed it will either get a write error (EPIPE), or a signal (SIGPIPE) which is fatal by default.
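The same mechanism is what lets an ordinary pipeline with an endless producer terminate. A tiny sketch:

```shell
# head exits after one line and closes the read end of the pipe;
# the next write by yes then fails, and yes terminates.
first_line=$(yes violet | head -n 1)
```

If yes were not stopped this way, the command substitution would never return.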
does it matter whether I close with >&- or <&-?
No. Both syntaxes are available for consistency's sake.

Piping input to a shell command and keeping the created shell alive

My overarching program is a shell script. This shell script calls a C program that I need to pipe input to, and ultimately the C program will create a shell.
However, when I pipe my input into the C program within the shell script
Do_Other_Stuff
./my_prog < file1
I can't get the shell to stay alive. Running just,
Do_Other_Stuff
./my_prog
works, as I have to input the stdin myself, and the shell correctly spawns when my_prog exits. I'm pretty sure wrapping up the ./my_prog call in a C program, and compiling and running that would work, but I'm curious as to whether there's a cleaner way with shell.
I've tried several combinations of using cat file1 | ./my_prog and using & in different situations, and haven't had any success.
Thanks!
Try:
cat file1 - | ./my_prog
Many programs recognize the "filename" - to mean stdin.
Do you have access to the C program source code? My guess is that the C program is using isatty(0) to determine if stdin is coming from a terminal. It probably only creates an interactive shell when that is the case. Using stdin redirection, whether from a file or a pipe, means that isatty(0) returns false.
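The shell has its own equivalent of that check, [ -t 0 ], which makes the effect easy to observe. In this sketch stdin_kind is a helper name invented for the example:

```shell
# Report whether standard input (fd 0) is attached to a terminal
stdin_kind() {
    if [ -t 0 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

from_file=$(stdin_kind < /dev/null)   # redirected from a file
from_pipe=$(echo hi | stdin_kind)     # fed through a pipe
```

Both redirected cases report "not a terminal"; calling stdin_kind directly from an interactive prompt would report "terminal".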

How does shell execute piped commands?

I want to understand how the shell executes piped commands, e.g. cat | more. I am aware that to execute a normal command the shell forks, executes it in the child, and then the child returns. But how does the shell internally handle the execution of piped commands?
Considering for example cat | grep, the shell first forks itself to start cat, and then forks itself once more to start grep.
Before calling one of the exec* family of functions in the two newly created processes to start the two programs, the tricky part is setting up the pipe and redirecting the descriptors. The pipe(2) system call is used in the shell process before forking to return a pair of descriptors which both children inherit - a reading end and a writing end.
The reading end will be closed in the first process (cat), and stdout will be redirected to the writing end using the dup2(2) system call. Similarly, the writing end in the second process (grep) will be closed and stdin will be redirected to the reading end again using dup2(2).
This way both programs are unaware of a pipe because they just work with the standard input/output.
It sets up a pipe using the pipe system call, forks two processes instead of one and attaches one end of the pipe to the first process' stdout and the other end to the second process' stdin.
The same way, except that the stdout of one program is connected to the stdin of the next. http://unixwiz.net/techtips/remap-pipe-fds.html
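The plumbing the shell does for a | b can be imitated by hand. This is only an approximation: a named pipe created with mkfifo stands in for the anonymous pipe from pipe(2), and ordinary redirections stand in for the dup2(2) calls; the file names are invented for the example.

```shell
workdir=$(mktemp -d)
mkfifo "$workdir/pipe"

# "Left side": a separate process whose stdout is the write end
printf 'apple\nbanana\ncherry\n' > "$workdir/pipe" &
writer_pid=$!

# "Right side": a separate process whose stdin is the read end.
# grep just reads standard input and is unaware that it is a pipe.
matches=$(grep an < "$workdir/pipe")

wait "$writer_pid"
rm -r "$workdir"
```

Both processes run at the same time, and the reader sees EOF once the writer has closed its end, just as in a real pipeline.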
