Command Substitution - order of evaluation in Bash - bash

I was trying to run this seemingly simple script, which should demonstrate the -a flag of touch: diff <(stat file.o) <(touch -a file.o; stat file.o). The output of this command is sporadic; apparently touch sometimes gets executed after everything else has been evaluated. But in an example such as diff <(echo first) <(echo second; echo third), the order is kept. So why doesn't the first command work as well?

The <( command-list ) syntax does the following:
Run command-list asynchronously
Connect the output of command-list to a pipe that has a file name (a /dev/fd/N entry, or a named pipe on systems without /dev/fd)
Replace itself on the command line with that file name
See Process Substitution.
The first point is likely what is tripping you up. There is no guarantee that your first process substitution will run before your second process substitution, therefore touch -a might be executed before either call to stat.
Your second example will always work as expected, because the commands within each individual process substitution run in order. Even if echo second happens before echo first, each substitution writes to its own pipe, and echo third always happens after echo second, so they appear in the correct order in that stream. The overall order of the two process substitutions doesn't really matter.
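A quick way to see both points in Bash: the substitution turns into a path (on Linux typically something like /dev/fd/63, but the exact name is system-dependent), and the commands inside a single substitution keep their order no matter when the substitution itself gets scheduled:

```shell
#!/usr/bin/env bash
# Print the name that <( ) expands to; the exact path varies by system.
echo <(true)
# The commands inside one substitution run in sequence, so this always
# prints "second" followed by "third".
cat <(echo second; echo third)
```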

Both process substitutions run at the same time.
That is to say, touch -a file.o; stat file.o from one process substitution and stat file.o from the other run concurrently.
So sometimes the touch happens before the stat in the other process substitution; in that case both stat commands see the effect of the touch, because the touch happened first.
As an (ugly, bad-practice) example, you can observe that it no longer happens when you add a delay:
diff <(stat file.o) <(sleep 1; touch -a file.o; stat file.o)
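If the goal is just to demonstrate touch -a, one way to remove the race entirely (instead of relying on sleep) is to take the "before" and "after" snapshots sequentially and only compare them afterwards. A sketch; the function name show_touch_a is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show what `touch -a` changes in `stat` output.
show_touch_a() {
  local f=$1 before after
  before=$(stat -- "$f") || return   # 1) snapshot before touching
  touch -a -- "$f"                    # 2) update the access time
  after=$(stat -- "$f")               # 3) snapshot after touching
  # Both substitutions now only print text captured above, so their
  # relative scheduling can no longer affect the result.
  diff <(printf '%s\n' "$before") <(printf '%s\n' "$after")
}
```

Usage: show_touch_a file.o. Note that diff exits with status 1 when it finds differences, which is the expected outcome here.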

Related

variable passing through awk command [duplicate]

This question already has answers here:
How do I set a variable to the output of a command in Bash?
(15 answers)
Here's my issue: I have a bunch of fastq.gz files and I need to determine the number of lines in each (this part is not the issue), and from that line count derive a threshold value to be used as a variable further down in the same loop. I've searched but can't find how to do it. Here's what I have so far:
for file in *R1.fastq*; do
var=echo $(zcat "$file" | $((`wc -l`/400000)))
for i in *Bacter*; do
awk -v var1=$var '{if($2 >= var1) print $0}' ${i} | wc -l >> bacter-filtered.txt
done
done
I get the error message: -bash: 14850508/400000: No such file or directory
Any help would be greatly appreciated!
The problem is in the line
var=echo $(zcat "$file" | $((`wc -l`/400000)))
There are a bunch of shell syntax elements here combined in ways that don't connect up with each other. To keep things straight, I'd recommend splitting it into two separate operations:
lines=$(zcat "$file" | wc -l)
var=$((lines/400000))
(You may also have to do something about the output to bacter-filtered.txt: it's just going to contain a bunch of numbers, with no indication of which ones come from which files. Also, since it always appends, if you run this twice you'll have the output from both runs stuck together. You might want to replace all those appends with a single > bacter-filtered.txt after the last done, so the whole output just gets stored directly.)
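Putting those suggestions together, the loop might look like the sketch below. The wrapper name count_filtered and the tab-separated "fastq file, Bacter file, count" labelling are assumptions added for illustration; the globs, the 400000 divisor and the awk filter come from the question. It assumes a zcat that handles .gz files (as GNU gzip's does).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; run it from the directory that holds the files.
count_filtered() {
  local file i lines var
  for file in *R1.fastq*; do
    [ -e "$file" ] || continue          # skip the literal pattern if nothing matched
    lines=$(zcat -- "$file" | wc -l)    # 1) count decompressed lines
    var=$(( lines / 400000 ))           # 2) derive the threshold (integer division)
    for i in *Bacter*; do
      [ -e "$i" ] || continue
      # 3) count rows whose 2nd column is >= the threshold, labelled by source
      printf '%s\t%s\t%s\n' "$file" "$i" \
        "$(awk -v var1="$var" '$2 >= var1' "$i" | wc -l)"
    done
  done > bacter-filtered.txt            # 4) one redirection, overwritten on each run
}
```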
What's wrong with the original? Well, let's start with this:
zcat "$file" | $((`wc -l`/400000))
Unless I completely misunderstand, the purpose here is to extract $file (with zcat), count lines in the result (with wc -l), and divide that by 400000. But since the output of zcat isn't piped directly to wc but to a complex expression involving wc, it's somewhat ambiguous what should happen, and the behavior actually differs between shells. In zsh, it does something completely different: it lets wc read from the script's stdin (generally your terminal), divides the result of that by 400000, and then pipes the output of zcat to that ... number?
In bash, it does something closer to what you want: wc actually does read from the output of zcat, so the second part of the pipe essentially turns into:
... | $((14850508/400000))
Now, what I'd expect to happen at this point (and happens in my tests) is that it should evaluate $((14850508/400000)) into 37, giving:
... | 37
which will then try to execute 37 as a command (because it's part of a pipeline, and therefore is supposed to be a command). But for some reason it's apparently not evaluating the division and just trying to execute 14850508/400000 as a command. Which doesn't really work any better or worse than 37, so I guess it doesn't matter much.
So that's where the error is coming from, but there's actually another layer of confusion in the original line. Suppose that internal pipeline was fixed so that it properly output "37" (rather than trying to execute it). The outer structure would then be:
var=echo $(cmdthatprints37)
The $( ) basically means "run the command inside, and substitute its output into the command line here", so that would evaluate to:
var=echo 37
...which, in shell syntax, means "run the command 37 with var set to echo in its environment."
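That prefix-assignment rule is easy to check with a harmless command in place of 37. In this sketch sh -c is only a stand-in that prints the variable it receives:

```shell
#!/usr/bin/env bash
unset var
# `NAME=value command` puts NAME only into that command's environment.
var=echo sh -c 'printf "%s\n" "$var"'   # the child process sees var=echo
# Back in the current shell, var is still unset.
printf '%s\n' "${var-unset}"
```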
The solution here would be simple. The echo is messing everything up so remove it:
var=$(cmdthatprints37)
...which evaluates to:
var=37
...which is what you want. Except that, as I said above, it'd be better to split it up and do the command bits and the math separately rather than getting them mixed up.
BTW, I'd also recommend some additional double-quoting of shell variables; shellcheck.net will be happy to point out where.

How to delay `redirection operator` of BASH `>`

First I create 3 files:
$ touch alpha bravo carlos
Then I want to save the list to a file:
$ ls > info.txt
However, info.txt always ends up listing itself:
$ cat info.txt
alpha
bravo
carlos
info.txt
It looks like the redirection operator creates my info.txt first.
So my question is: how can I save my list of files without info.txt being created first?
The main question is about the redirection operator: why does it act first, and how can I delay it so my command completes first? Please use the example above in your answer.
When you redirect a command's output to a file, the shell opens a file handle to the destination file, then runs the command in a child process whose standard output is connected to this file handle. There is no way to change this order, but you can redirect to a file in a different directory if you don't want the ls output to include the new file.
ls >/tmp/info.txt
mv /tmp/info.txt ./
In a production script, you should make sure that the file name is unique and unpredictable.
t=$(mktemp -t lstemp.XXXXXXXXXX) || exit
trap 'rm -f "$t"' INT HUP
ls >"$t"
mv "$t" ./info.txt
Alternatively, capture the output into a variable, and then write that variable to a file.
files=$(ls)
echo "$files" >info.txt
As an aside, probably don't use ls in scripts. If you want a list of files in the current directory
printf '%s\n' *
does that.
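The printf version also happens to sidestep the original problem on a first run: for a simple command, the shell expands the words (including *) before it performs the redirection, so info.txt does not exist yet when the glob is expanded. If info.txt is left over from an earlier run, it will of course be listed. A sketch in a scratch directory:

```shell
#!/usr/bin/env bash
# Work in a fresh scratch directory containing only the three sample files.
cd "$(mktemp -d)" || exit
touch alpha bravo carlos
# * is expanded before info.txt is opened, so info.txt is not in the list.
printf '%s\n' * > info.txt
cat info.txt
```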
One simple approach is to save your command output to a variable, like this:
ls_output="$(ls)"
and then write the value of that variable to the file, using any of these commands:
printf '%s\n' "$ls_output" > info.txt
cat <<< "$ls_output" > info.txt
echo "$ls_output" > info.txt
Some caveats with this approach:
Bash variables can't contain null bytes. If the output of the command includes null bytes, Bash drops them from the captured value (recent versions print a warning when they do).
In the specific case of ls, though, this shouldn't be an issue, because the output of ls should never contain a null byte.
$(...) removes trailing newlines. The above compensates for this by adding a newline while creating info.txt, but if the command output ends with multiple newlines, then the above will effectively collapse them into a single newline.
In the specific case of ls, this could happen if a filename ends with a newline — very unusual, and unlikely to be intentional, but nonetheless possible.
Since the above adds a newline while creating info.txt, it will put a newline there even if the command output doesn't end with a newline.
In the specific case of ls, this shouldn't be an issue, because the output of ls should always end with a newline.
If you want to avoid the above issues, another approach is to save your command output to a temporary file in a different directory, and then move it to the right place; for example:
tmpfile="$(mktemp)"
ls > "$tmpfile"
mv -- "$tmpfile" info.txt
...which obviously has different caveats (e.g., it requires write access to a different directory), but should work on most systems.
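If you prefer to stay with a variable but need the trailing newlines exactly, a common idiom (not part of the answer above) is to append a sentinel character inside the substitution and strip it afterwards. For ls this is overkill; the sketch uses a literal printf as a stand-in for the command:

```shell
#!/usr/bin/env bash
# $(...) strips every trailing newline from the captured output.
plain=$(printf 'a\n\n\n')       # plain is just "a"
# Workaround: append a sentinel character, then remove it afterwards.
kept=$(printf 'a\n\n\n'; printf x)
kept=${kept%x}                  # kept is "a" followed by three newlines
# Write the value back byte for byte (no extra newline added).
printf '%s' "$kept" > info.txt
```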
One way to do what you want is to exclude the info.txt file from the ls output.
If you can rename the list file to .info.txt then it's as simple as:
ls >.info.txt
ls doesn't list files whose names start with . by default.
If you can't rename the list file but you've got GNU ls then you can use:
ls --ignore=info.txt >info.txt
Failing that, you can use:
ls | grep -v '^info\.txt$' >info.txt
All of the above options have the advantage that you can safely run them after the list file has been created.
Another general approach is to capture the output of ls with one command and save it to the list file with a second command. As others have pointed out, temporary files and shell variables are two specific ways to capture the output. Another way, if you've got the moreutils package installed, is to use the sponge utility:
ls | sponge info.txt
Finally, note that you may not be able to reliably extract the list of files from info.txt if it contains plain ls output. See ParsingLs - Greg's Wiki for more information.

does output from LHS of pipe become an arg for RHS of pipe

I'm having difficulty grasping how pipes work. Initially I thought of them as per the title, but I couldn't get a simple example to work, e.g.:
mkdir temp
cd temp
echo "rubbish" > txtfile
ls | cat
I'm wondering why it returns the output from 'ls' rather than the output of 'cat txtfile' (i.e. "rubbish"). I've read many pipe tutorials, but none of them seem to go beyond "STDOUT of LHS becomes STDIN for RHS", and I'm left wondering what the STDIN of the RHS is. Does it become the first argument? Where does it slot in when the RHS of the pipe has options or more than one argument? Is there any kind of macro substitution taking place, or is my thinking wide of the mark?
Edit: I'm still none the wiser 5 comments later. I'll certainly take a look at Roadowl's pv utility but for now if I type
ls | cut -c 2-4
I get
xtf
which I'd expect. So, does cut take its input from stdin but cat doesn't?
Edit2: I stuck the question up on askubuntu (I originally put it up here by mistake). The answer there https://askubuntu.com/questions/1316848/does-output-from-lhs-of-pipe-become-an-arg-for-rhs-of-pipe throws a bit more light on it.
Edit3: While reading the answers here and on Ask Ubuntu and the links therein, it struck me (again) how woeful bash (& cohorts) are. It's almost like they're designed to trip you up. I only started using bash a couple of months back and every time I write a script I have to read endless web pages to get it to work or discover where I'm going wrong. Take a simple [[ $1=="..." ]] condition. You forget the spaces round the operator and the else condition might wipe some files you want without so much as a warning. Yes, you can do great things with it without a lot of typing, but at times it's like using a tightrope to get from skyscraper A to skyscraper B to avoid using 2 lifts. What's wrong with good old C code like cat(ls())? That said, thanks to everyone who contributed.
I guess you expected that, when running
ls | cat
ls would output txtfile, and that would be passed to cat as a file name.
But what actually happens behind the scenes is different:
First, your shell creates a pipe using the pipe(int pipefd[2]) system call. This pipe has two ends: a read end and a write end.
While ls is running, it writes its output to the write end of the pipe, and cat simultaneously reads from the read end.
So here, the STDOUT of ls is the write end, whereas the STDIN of cat is the read end of the pipe.
cat treats the data it reads from the pipe as a stream of bytes, not as a file name.
So basically, cat prints whatever comes in on that stream.
Read more about pipe() in the pipe(2) Linux manual page.
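If you actually want the "output becomes arguments" behavior from the title, that is what xargs does: it reads its standard input and passes the items as arguments to the command it runs. A sketch in a scratch directory; note that plain xargs splits on blanks, so it is only safe here because the file name contains no spaces:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit
echo "rubbish" > txtfile
ls | cat          # cat copies its stdin: prints "txtfile"
ls | xargs cat    # xargs runs `cat txtfile`: prints "rubbish"
```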
ls | cut -c 2-4
Here, cut reads its standard input, gets the line txtfile, takes characters 2 to 4 from it, producing xtf, and prints that on standard output. That's what the command line option tells it to do.
ls | cat
Here, cat reads its standard input, gets the line txtfile, and prints that on standard output, unchanged. That's what cat does. If there were further lines, it would do the same for those.
Both read standard input unless one or more file names are given as arguments. That standard input is connected to the terminal (the same one where you enter the command line), unless you use pipes or redirections to change that.
So, run the command cut -c 2-4, and enter the line abcdefghijkl, and it will print out bcd. Because without any arguments, it reads its standard input, which is the terminal, by default. Similarly for running just cat, you'll get back the same line you entered.
Running ls | cut -c 2-4 changes where the standard input comes from, but it doesn't create any new command line arguments (other than the -c and 2-4 you gave). Command line arguments are not the same as the standard input.
So, echo txtfile | cat is not the same as running cat txtfile, any more than running echo txtfile | cut -c 2-4 is the same as running cut -c 2-4 txtfile. For some reason, you seem to expect the pipe should work differently for cat than it does for cut.
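The same contrast with cut, run in a scratch directory that contains the txtfile from the question:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit
echo "rubbish" > txtfile
echo txtfile | cut -c 2-4   # cut reads the text "txtfile" on stdin: prints "xtf"
cut -c 2-4 txtfile          # cut opens the file named txtfile: prints "ubb"
```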

Append to list of files in bash

So I'm trying to get a simple bash script to continuously read a directory and update a list of files to play through a command. However, I'm having some trouble working out the logic. What I need to do is put the current items in the directory into the list, run each item in the directory through a program, and, when a new item comes in, just append it to the list. I'm attempting to use inotifywait but can't seem to work out the proper logic. I may need it to run in the background, because the program processing these files will run before inotifywait is read again, at which point it will not pick up any new files that have been added, as it only checks when it runs. Here's the code, so hopefully it makes more sense.
#!/bin/bash
#Initial check to see if files are converted.
if [ ! -d "/home/pi/rpitx/converted" ]; then
echo "Converted directory does not exist, cannot play!"
exit 1
fi
CYAN='\e[36m'
NC='\e[39m'
LGREEN='\e[92m'
#iterate through directory first and act upon each item
for f in $FILES
do
echo -e "${CYAN}Now playing ${f##*/}...${NC}"
#Figure out a way to always watch directory even when it is playing
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to |
while read path action file; do
echo -e "${LGREEN}New file found: ${CYAN}${file}${NC}"
FILES+=($file)
done
# take action on each file. $f store current file name
sudo ./rpitx -m RF -i "${f}" -f 101100
done
exit 0
So, for example, if rpitx is currently playing something and a file is converted, the script won't pick up the latest file and add it to the list; nor will it ever get past the watch loop, since inotifywait is always reading. Is there a way to get inotifywait to run in the background of this script somehow? Thanks.
This is actually quite a difficult problem to get 100% perfect, but it is possible to get pretty close.
It is easy to get all the files in a directory, and it is easy to use inotifywait to get iteratively informed of new files being placed into the directory. The issue is getting the two to be consistent. If inotifywait isn't started until all the files have been processed (or even just listed), then you might miss new files created between the listing and the invocation of inotifywait. If, on the other hand, you start inotifywait first, then a file created between the invocation of inotifywait and the extraction of the current file list will be listed twice.
Since it is easier to filter duplicates than notice orphans, the recommended approach is the second one.
As a first approximation, we could ignore the duplicate problem on the assumption that the window of vulnerability is pretty short and so it is probably unlikely to happen. This simplifies the code, but it's not that difficult to track and eliminate duplicates: we could, for example, store each filename as the key in an associative array, ignoring the file if the key already exists.
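The associative-array idea could look like this sketch (bash 4 or later). The filter name dedupe is hypothetical; it keeps the first line seen for each filename and drops later repeats:

```shell
#!/usr/bin/env bash
# Hypothetical filter: pass "action filename" lines through, dropping any
# line whose filename has already been seen.
dedupe() {
  local -A seen=()
  local action file
  while read -r action file; do
    [[ -z ${seen[$file]+x} ]] || continue   # already seen: skip it
    seen[$file]=1
    printf '%s %s\n' "$action" "$file"
  done
}
```

It would sit just before the final loop, e.g. list_new_files | { list_existing_files; pass_through; } | dedupe | while read -r action file; do ...; done. One trade-off of this simple version: a file that is deleted and later recreated under the same name would also be dropped.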
We need three processes: one to execute inotifywait; one to produce the list of initial files; and one to handle each file as it is identified. So the basic structure of the code will be:
list_new_files |
{ list_existing_files; pass_through; } |
while read -r action file; do
handle "$action" "$file"
done
Note that the second process first produces the existing files, and then calls pass_through, which reads from standard input and writes to standard output, thus passing through the files being discovered by list_new_files. Since pipes have a finite capacity, it is possible that the execution of list_existing_files will block a few times (if there are lots of existing files and handling them takes a long time), so when pass_through finally gets executed, it could have quite a bit of queued-up input to pass through. That doesn't matter, unless the first pipe also fills up, which will happen if a large number of new files are created. And that still won't matter as long as inotifywait doesn't lose notifications while it is blocked on a write. (This may actually be a problem, since the manpage for inotifywait on my system includes in the "BUGS" section the note, "It is assumed the inotify event queue will never overflow." We could fix the problem by inserting another process which carefully buffers inotifywait's output, but that shouldn't be necessary unless you intend to flood the directory with lots of files.)
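The ordering guarantee of the middle stage can be checked with stand-ins: here printf plays the part of both list_new_files (upstream) and list_existing_files (inside the group), and cat is the pass-through. The group prints its own lines first and only then forwards whatever arrived on the pipe:

```shell
#!/usr/bin/env bash
# Upstream stand-in for inotifywait events; the group prints "existing"
# entries first, then cat passes the queued events through.
printf 'CREATE new1\nCREATE new2\n' |
  { printf 'EXISTING %s\n' old1 old2; cat; }
```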
Now, let's examine each of the functions in turn.
list_new_files could be just the call to inotifywait from your original script:
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to
Listing existing files is also easy. Here's one simple solution:
printf "%s\n" /home/pi/rpitx/converted/*
However, that will print out the full file path, which is different from the output from inotifywait. To make them the same, we cd into the directory in order to do the listing. Since we might not actually want to change the working directory, we use a subshell by surrounding the commands inside parentheses:
( cd /home/pi/rpitx/converted && printf "%s\n" * )
The printf just prints its arguments each on a separate line. Since glob-expansions are not word-split or recursively glob-expanded, this is safe against whitespace or metacharacters in filenames, except newline characters. Filenames with newline characters are pretty rare; for now, I'll ignore the issue but I'll indicate how to handle it at the end.
Even with the change indicated above, the output from these two commands is not compatible: the first one outputs three things on each line (directory, action, filename), and the second one just one thing (the filename). In the listing below, you'll see how we modify the format to printf and introduce a format for inotifywait in order to make the outputs fully compatible, with the "action" for existing files set to EXISTING.
pass_through could, in theory, just be cat, and that's how I've coded it below. However, it is important that it operate in line-buffered mode; otherwise, nothing will happen until "enough" files have been written by list_existing_files. On my system, cat in this configuration works perfectly; if that doesn't work for you or you don't want to count on it, you could write it explicitly as a while read loop:
pass_through() {
while IFS= read -r line; do printf '%s\n' "$line"; done
}
Finally, handle is essentially the code from the original post, but modified a bit to take the new format into account, and to do the right thing with action EXISTING.
# Colours. Note the use of `$'...'` to actually store the code,
# thereby avoiding the need to later reinterpret backslash sequences
CYAN=$'\e[36m'
NC=$'\e[39m'
LGREEN=$'\e[92m'
converted=/home/pi/rpitx/converted
list_new_files() {
inotifywait -m "$converted" -e create -e moved_to --format "%e %f"
}
# Note the use of ( ) around the body instead of { }
# This is the same as `{ ( ... ); }`; it makes the `cd` local to the function.
list_existing_files() (
cd "$converted" || exit
printf "EXISTING %s\n" *
)
# Invoked as `handle action filename`
handle() {
case "$1" in
EXISTING)
echo "${CYAN}Now playing ${2}...${NC}"
;;
*)
echo "${LGREEN}New file found: ${CYAN}${2}${NC}"
;;
esac
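# $2 is a bare filename, so prefix the directory; </dev/null stops rpitx
# from reading the event stream that feeds the while loop
sudo ./rpitx -m RF -i "$converted/$2" -f 101100 </dev/null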
}
# Put everything together
list_new_files |
{ list_existing_files; cat; } |
while read -r action file; do handle "$action" "$file"; done
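One edge case in list_existing_files: if the directory is empty, the unmatched glob * is passed through literally and the function reports EXISTING *. A sketch of a guarded variant (bash-specific; it assumes $converted is set as in the listing):

```shell
#!/usr/bin/env bash
# Guarded variant; assumes $converted is set as in the listing above.
list_existing_files() (
  shopt -s nullglob                 # an unmatched * expands to nothing
  cd "$converted" || exit
  files=(*)
  (( ${#files[@]} )) || exit 0      # empty directory: print nothing at all
  printf 'EXISTING %s\n' "${files[@]}"
)
```

The explicit count check is needed because printf with a format and no arguments still prints the format once.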
What if we thought a filename might have a newline character in it? There are two "safe" characters which could be used to delimit the filenames, in the sense that they cannot appear inside a filename. One is /, which can obviously appear in a path, but cannot appear in a simple filename, which is what we're working with here. The other one is the NUL character, which cannot appear inside a filename at all, but can sometimes be a bit annoying to deal with.
Normally, faced with this problem, we would use a NUL, but that depends on the various utilities we're using allowing the separation of data with NUL instead of newline. That's not the case for inotifywait, which always outputs a newline after a notification line. So in this case it seems simpler to use a /. First we modify the formats:
inotifywait -m "$converted" -e create -e moved_to --format "%e %f/"
printf "EXISTING %s/\n" *
Now, when we're reading the lines, we need to read until we find a line ending with / (and remember to remove it). read doesn't allow two-character line terminators, so we need to accumulate the lines ourselves:
while read -r action file; do
# If file doesn't end with a slash, we need to read another line
while [[ $file != */ ]] && read -r line; do
file+=$'\n'"$line"
done
# Remember to remove the trailing slash
handle "$action" "${file%/}"
done
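The reassembly loop can be exercised without inotifywait by feeding it hand-made lines. In this sketch handle is a hypothetical stub that shows embedded newlines as |, the loop is wrapped in a hypothetical reassemble function, and the continuation read uses IFS= so leading spaces in a continuation line are kept:

```shell
#!/usr/bin/env bash
# Hypothetical stub: print the action and the filename, newlines shown as |.
handle() { printf '%s [%s]\n' "$1" "${2//$'\n'/|}"; }

reassemble() {
  local action file line
  while read -r action file; do
    # Keep appending lines until the filename ends with the "/" terminator.
    while [[ $file != */ ]] && IFS= read -r line; do
      file+=$'\n'"$line"
    done
    handle "$action" "${file%/}"   # drop the terminator
  done
}

printf 'CREATE a\nb/\nEXISTING c d/\n' | reassemble
```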

bash one-liner for opening `less` on the last screen w/o temporary files

I'm trying to create a one-liner for opening less on the last screen of a multi-screen output coming from standard input. The reason for this is that I am working on a program that produces a long AST and I need to be able to traverse up and down through it, but I would prefer to start at the bottom. I came up with this:
$ python a.py 2>&1 | tee >(lines=+$(( $(wc -l) - $LINES))) | less +$lines
First, I need to compute the number of lines in the output and subtract $LINES from it, so I know which line is the uppermost line of the last screen. I will need to reuse the a.py output later, so I use tee with process substitution for that purpose. As the last step, I point less at the original stdout, opened at a particular line. Of course, it doesn't work in Bash because $lines is not set in the last step, as every subcommand is run in a subshell. In ZSH, even though pipe commands are not run in a subshell, process substitution still is, and therefore it doesn't work either. It's not a homework or a work task; I just wonder whether it's possible to do what I want without creating a temporary file in Bash or ZSH. Any ideas?
less supports this innately. The + syntax you're using accepts any less command you could enter while it's running, including G for go-to-end.
... | less +G
does exactly what you want.
This is actually mentioned explicitly as an example in the man page (search for "+G").
The real answer to your question should be the option +G to less, but you indicated that the problem definition is not representative of the abstract problem you want to solve. Therefore, please consider this alternative problem:
python a.py 2>&1 | \
awk '
{a[NR]=$0}
END{
print NR
for (i=1;i<=NR;i++)print a[i]
}
' | {
read -r l
less -j-1 +$l
}
The awk command is printing the number of lines, and then all the lines in sequence. We define the first line to contain some meta information. This is piped to a group of commands delimited by { and }. The first line is consumed by read, which stores it in variable $l. The rest of the lines are taken by less, where this variable can be used. -j-1 is used, so the matched line is at the bottom of the screen.
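Since less is interactive, the header-line trick is easier to check with cat standing in for less. A sketch; with_count is a hypothetical name for the awk step above. It relies on Bash's read consuming only the first line from a pipe, leaving the rest for the next command in the group:

```shell
#!/usr/bin/env bash
# Same awk as above: print the line count first, then every line.
with_count() {
  awk '{a[NR]=$0} END{print NR; for (i=1;i<=NR;i++) print a[i]}'
}

printf 'one\ntwo\nthree\n' | with_count | {
  read -r l            # consumes only the first (count) line
  echo "total=$l"
  cat                  # the remaining lines are still waiting in the pipe
}
```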
