In bash, is there a way to have multiple pipes to one process?

For example, if I want to do a diff of two files after preprocessing both of them with sed, is there any way to do this without temporary files?
I have tried things like this and (as I expected) it did not work:
(sed "$expr" file1; sed "$expr" file2) | diff - -
I was thinking there might be a way to create pipes explicitly or something.

Try this:
diff <(sed "$expr" file1) <(sed "$expr" file2)
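This uses process substitution. Bash replaces each <( ) with a file name (typically a /dev/fd/N path or a named pipe) that is connected to the command inside, so reading from that name reads the command's output. The output form >( ) works the other way around: whatever is written to the name becomes the command's input. It is often used in combination with file redirection: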
cmd1 2> >(cmd2)
See
http://mywiki.wooledge.org/ProcessSubstitution
http://mywiki.wooledge.org/BashFAQ/024
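As a minimal sketch of the diff case above: the sample files and the sed expression here are made up for illustration, and the script assumes bash, since process substitution is not available in plain sh.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; the file names and the sed expression are illustrative only.
workdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$workdir/file1"
printf 'alpha 9\ngamma 2\n' > "$workdir/file2"
expr='s/ [0-9]*$//'   # strip the trailing number from each line

# Each <( ) expands to a file name that diff opens like an ordinary file.
# diff exits with status 1 when the inputs differ, hence the "|| true".
result=$(diff <(sed "$expr" "$workdir/file1") <(sed "$expr" "$workdir/file2") || true)
printf '%s\n' "$result"

rm -r "$workdir"
```

No temporary files are left behind by the comparison itself; the only files created are the sample inputs.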

Related

How to store the command output in variable and redirect to file in a same line?

a=`cat /etc/redhat-release | awk '{print $2}' > /tmp/a.txt`
The above command is not redirecting the output to file.
A command substitution captures stdout of the command contained. When you redirect that output to a file, it's no longer on stdout, so it's no longer captured.
Use tee to create two copies -- one in a file, one on stdout.
a=$(awk '{print $2}' </etc/redhat-release | tee /tmp/a.txt)
Note also:
cat shouldn't be used when it isn't needed: giving awk a direct handle on the input file saves an extra process and lets it read directly from the file rather than from a FIFO. The same habit yields much larger efficiency gains with programs like sort, shuf, tail, or wc -c, which can use more efficient algorithms when reading from a file.
The modern (and standard-compliant, since 1991) syntax for command substitution is $(...). It nests better than the ancient backtick syntax it replaces, and use of backslashes within is less confusing.
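A small sketch of the tee approach, using a made-up stand-in for /etc/redhat-release so that it runs on any system (the file names and contents below are assumptions, not part of the question):

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Stand-in for /etc/redhat-release; the content is invented for the example.
printf 'Red Hat Enterprise Linux\n' > "$workdir/release"

# tee writes its input to the named file and also passes it through on stdout,
# which is what the command substitution captures.
a=$(awk '{print $2}' < "$workdir/release" | tee "$workdir/a.txt")
saved=$(< "$workdir/a.txt")   # read the file back to confirm both copies exist

printf 'variable: %s\nfile: %s\n' "$a" "$saved"
rm -r "$workdir"
```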

Shell, grep for list of patterns

I have two csv files, a.csv and b.csv. I 'cut' one column from a.csv, and now I want to grep for each of the strings from this column in the second file, b.csv.
Can someone please help me in writing a shell script for this?
You want the -f (and likely -F and possibly -w) flags to grep for this sort of task.
$ cut ... a.csv > tmp
$ grep -Ff tmp b.csv
You can do this without the temporary file on shells that support process substitution.
$ grep -Ff <(cut ... a.csv) b.csv
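To make the flags concrete, here is a sketch with invented CSV data. The cut options (comma delimiter, first column) and the header-skipping tail are assumptions for this sample only, since the question does not say which column is wanted.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical sample files.
printf 'id,name\n17,ann\n42,bob\n' > "$workdir/a.csv"
printf '42,paris\n99,rome\n17,oslo\n142,lima\n' > "$workdir/b.csv"

# -f FILE : read one pattern per line from FILE
# -F      : treat patterns as fixed strings, not regular expressions
# -w      : match whole words only, so the pattern 42 does not match 142
matches=$(grep -Fwf <(cut -d, -f1 "$workdir/a.csv" | tail -n +2) "$workdir/b.csv")
printf '%s\n' "$matches"

rm -r "$workdir"
```

Without -w, the line starting with 142 would also be reported, because 42 occurs inside it.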

Use zcat and sed or awk to edit compressed .gz text file

I am trying to edit compressed fastq.gz text files, by removing the first six characters of lines 2,6,10,14... I have two different ways of doing this right now, either using awk or sed, but these only seem to work if the files are unzipped. I would like to edit the files without unzipping them and tried the following code without getting it to work. Thanks.
Using sed:
zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'
Using awk:
zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'
You can't bypass compression, but you can chain the decompress/edit/recompress together in an automated fashion:
for f in /dir/*; do
cp "$f" "$f~" &&
gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done
If you're quite confident in the operation, you can remove the backup files by adding rm "$f~" to the end of the loop body.
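A self-contained sketch of the same loop on a throwaway file, followed by a check of the result. The sample record is invented, and awk with substr is swapped in for the sed expression because the 2~4 address is a GNU sed extension; the rest follows the loop above.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical one-record FASTQ file standing in for the files under /dir.
printf '@read1\nAAAAAACGTACGT\n+\nIIIIIIIIIIIII\n' | gzip > "$workdir/sample.fastq.gz"

for f in "$workdir"/*.gz; do
    # Keep a backup, then decompress it, drop the first six characters of
    # lines 2, 6, 10, ... and recompress over the original name.
    cp "$f" "$f~" &&
    gzip -cd "$f~" | awk 'NR % 4 == 2 { $0 = substr($0, 7) } 1' | gzip > "$f"
done

# Inspect the edited file without unpacking it to disk.
second_line=$(gzip -cd "$workdir/sample.fastq.gz" | sed -n '2p')
fourth_line=$(gzip -cd "$workdir/sample.fastq.gz" | sed -n '4p')
printf '%s\n%s\n' "$second_line" "$fourth_line"

rm -r "$workdir"
```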
I wrote a script called zawk which can do this natively. It's similar to glenn jackman's answer to a duplicate of this question, but it handles awk options and several different compression mechanisms and input methods while retaining FILENAME and FNR.
You'd use it like:
zawk 'awk logic goes here' log*.gz
This does not address sed's "in-place" flag (-i).

How to ensure file written with sed w command is closed

I'm using the sed 'w' command to get the labels from a TeX document using:
/\\label{[a-zA-Z0-9]*}/w labels.list
This script is part of a pipeline in which, later on, awk reads the file that sed has just written, e.g.:
cat bob | sed -f sedScript | awk -f awkScript labels.list -
Sometimes the pipeline produces the correct output, sometimes it doesn't (for exactly the same input file 'bob'). It's random.
I can only conclude that sometimes awk tries to read the file before sed has closed it properly. Is there any way I can force sed to close the file at the end of the script, or any other suggestions as to what the problem may be?
All stages in a pipeline run in parallel. This is an extremely important and defining feature of pipes, and there is nothing you can or should attempt to do in order to prevent or circumvent that.
Instead, you should rewrite your script so that all data dependencies are executed and finished in the order you need them to be. In the general case, you'd do
cat bob | sed -f sedScript > tempfile
cat tempfile | awk -f awkScript labels.list -
or equivalently in your case:
grep '\\label{[a-zA-Z0-9]*}' bob > labels.list
awk -f awkScript labels.list bob
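A runnable sketch of the two-step version. The input text and the awk logic are invented for illustration (the real awkScript is not shown in the question); the point is only that the first command has finished, and labels.list is closed, before awk starts.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical TeX-like input standing in for the file 'bob'.
printf '%s\n' 'Intro \label{intro}' 'plain text' 'See \label{sec2} here' > "$workdir/bob"

# Step 1: runs to completion, so labels.list is fully written and closed.
grep '\\label{[a-zA-Z0-9]*}' "$workdir/bob" > "$workdir/labels.list"

# Step 2: only now does awk read both files. NR == FNR is true while awk is
# still reading the first file named on the command line.
summary=$(awk 'NR == FNR { labels++; next }
               { lines++ }
               END { printf "%d labels, %d lines", labels, lines }' \
              "$workdir/labels.list" "$workdir/bob")
printf '%s\n' "$summary"

rm -r "$workdir"
```

Because the steps are separate commands rather than pipeline stages, the result no longer depends on scheduling.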

Redirecting the input of two commands via pipe operator in Bash

I have not learned Bash in a formal way so please do give me suggestions for a more descriptive question title.
Instead of creating a temporary file whose lifespan is limited to that of the command it is used by (in this case, command), as in:
zcat input.txt.gz > input.txt
command input.txt
rm input.txt
we can avoid it as follows:
zcat input.txt.gz | command -
Now my question is whether this is possible with two inputs. I wish to avoid creating two temporary files, as in:
zcat input1.txt.gz > input1.txt
zcat input2.txt.gz > input2.txt
command input1.txt input2.txt
rm input1.txt input2.txt
I am guessing that the following solution can remove the need to create one of the two temporary files, as:
zcat input1.txt.gz > input1.txt
zcat input2.txt.gz | command input1.txt -
rm input1.txt
but I wonder if there is a way to completely avoid creating the temporary file.
I hope my question was clear enough. Though I used zcat as an example, the solution I am looking for should be more general. Thanks in advance.
If you're trying to combine the output of multiple commands into a single pipe, use a subshell:
(cat file1.txt; cat file2.txt) | nl
If you want to pass the output of a command where another command expects a file name, use process substitution:
diff <(zcat file1.gz) <(zcat file2.gz)
Command substitution might get you what you want:
command $(zcat input1.txt.gz) $(zcat input2.txt.gz)
This only works as long as the stdout of the two substitutions (above) is meant to make up the arguments for 'command', rather than files for it to read.
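For the two-input case from the question, a sketch along these lines should avoid temporary files altogether. Here paste stands in for 'command', the sample data is invented, and gzip -cd is used in place of zcat because zcat behaves differently on some systems.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical compressed inputs.
printf 'a\nb\n' | gzip > "$workdir/input1.txt.gz"
printf '1\n2\n' | gzip > "$workdir/input2.txt.gz"

# Each <( ) gives paste a file name to open; the decompressed data is
# streamed through a pipe and never written to disk.
joined=$(paste -d, <(gzip -cd "$workdir/input1.txt.gz") <(gzip -cd "$workdir/input2.txt.gz"))
printf '%s\n' "$joined"

rm -r "$workdir"
```

This generalizes to any number of inputs and to any producer command, as long as the consumer reads each input sequentially and does not need to seek in it.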
