How to store the output of a Bash command inside >() in a variable? - bash

I have this:
tee < /some/big/file >(wc -c) >(md5sum) | ...
Instead of writing the results of wc -c and md5sum to stdout, I want to store the results into two variables for later processing. I don't want to read the file more than once. How can I accomplish that?

This doesn't directly answer your question... but you only have to read the file once, and you don't have to use tee.
SIZE=$(wc -c < /some/big/file)
MD5=$(md5sum < /some/big/file)
So, how many times does this read the file? Once (for the md5sum). That's because wc -c doesn't actually read the file; it just asks how long it is and reports back. Here are my tests:
$ time wc -c /big/file >/dev/null
real 0m0.003s
user 0m0.000s
sys 0m0.000s
$ time wc -c </big/file >/dev/null
real 0m0.004s
user 0m0.000s
sys 0m0.000s
$ time cat /big/file | wc -c >/dev/null
real 0m52.945s
user 0m0.160s
sys 0m19.612s
The lesson: don't mix tee (or cat) with wc -c, because it's a big waste of time. Just do the md5 normally and don't worry about wc -c.
Note: The reason why wc -c <file is fast is that it gets an ordinary file descriptor, just as if wc had called open() itself, so it can find the length with a seek instead of reading the data.
Pipe performance
You should almost never use cat in a pipe.
cat file | cmd # slow
cmd <file # fast
Calling cat usually means creating an extra process which serves no purpose. In some cases, as with wc -c, it actually slows down the program after the pipe. I mean, you could stick cat anywhere you like, but it's just silly:
echo 'hello, world' | cat
cat file.txt | less
cat file.txt | cat | less
cat file.txt | cat | sort | cat | cat | uniq | cat >file_unique.txt
This is better:
echo 'hello, world'
less file.txt
sort file.txt | uniq >file_unique.txt

You can do this with a FIFO and temporary files.
input=/some/big/file
mkfifo tmp
wc -l <tmp >wc.out &
md5=$(tee <"$input" tmp | md5sum)
wait  # 'fg' needs job control, which is off in scripts; 'wait' works everywhere
lines=$(cat wc.out)
rm tmp
rm wc.out
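The same FIFO idea can be wrapped up with a temp directory and cleanup so both results land in variables; the file is still read only once, since tee duplicates the stream. This is a sketch: the tiny input file is a stand-in for /some/big/file, and all the path names are arbitrary.

```shell
#!/bin/bash
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

input="$dir/input"                 # stand-in for /some/big/file
printf 'hello world\n' > "$input"

mkfifo "$dir/fifo"
wc -c < "$dir/fifo" > "$dir/wc.out" &          # reader must be started first
md5=$(tee "$dir/fifo" < "$input" | md5sum | cut -d' ' -f1)
wait                                           # let the background wc finish
size=$(< "$dir/wc.out")                        # bash-only $(< file) read

echo "size=$size md5=$md5"
```

The order matters: the reader of the FIFO is started in the background before tee opens it for writing, otherwise the open would block forever.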

Maybe someone can offer something better, but I think your best shot is
LINES=$(wc -l < /some/big/file)
MD5=$(md5sum < /some/big/file)
The reason I think it's the best shot is that, one way or another, you will have to apply two separate operations to the content of the file. So unless you have a command that knows how to do both at the same time, you will have to read it twice.
Beyond reading it twice, that should be a solution to your problem.

I don't think you can set a variable from inside >() (process substitution): it runs in a separate process, so any variable it sets never reaches the parent shell. If you don't mind redirecting the output to temporary files, you can then just read each file into a variable.
Example
tee < /some/big/file >(cmd1 > tmp1) >(cmd2 > tmp2) | ...
CMD1_OUT=$(cat tmp1)
CMD2_OUT=$(cat tmp2)
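Putting that answer together as a runnable sketch, with mktemp for the temp files and the question's wc -c / md5sum as cmd1 and cmd2 (the small input file is a stand-in for /some/big/file). One caveat: the processes inside >() run asynchronously, so the script must make sure they have finished writing before it reads the temp files.

```shell
#!/bin/bash
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

printf 'hello world\n' > "$tmpdir/input"   # stand-in for /some/big/file

# tee reads the input once and feeds both process substitutions.
tee < "$tmpdir/input" \
    >(wc -c > "$tmpdir/size") \
    >(md5sum > "$tmpdir/md5") > /dev/null

# >() is asynchronous: poll until both consumers have written their output.
until [ -s "$tmpdir/size" ] && [ -s "$tmpdir/md5" ]; do sleep 0.01; done

SIZE=$(< "$tmpdir/size")
MD5=$(cut -d' ' -f1 < "$tmpdir/md5")
echo "SIZE=$SIZE MD5=$MD5"
```

Without the polling loop (or some other synchronization) there is a race: tee can exit, and the script can read the temp files, before wc or md5sum has flushed its output.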

Related

Switching between "|" and "&&" symbols in a command based on input

I'm using a bash script to chain together the execution of a series of programs. Each of these programs produces an output and has two flags that can be set as -iInputFile.txt and -oOutputFile.txt; if no flag is set, then standard input and output are automatically selected. Most of the time I simply chain my programs as
./Program1 | ./Program2 | ./Program3
but if I happen to need to save the data to a file, and then also access it from the next file I need to do
./Program1 | ./Program2 -oFile.txt && ./Program3 -iFile.txt
so my question is whether there is a way to provide an input, for example 010, and only convert the symbol between programs 2 and 3 from | to && while leaving everything else untouched. Hard-coding it would be impossible since I have up to 12 programs chained, which means 2^11 possible combinations of separators. It's my first time asking, so if anything is unclear from the question I'll edit to provide any information required; thank you all in advance.
If you are scripting this, you can hard-code in tee between the pipeline and use bash's default value parameter expansion to essentially turn off the 'write to file feature'
./Program1 | tee ${outFile1:- /dev/null} | ./Program2 | tee ${outFile2:- /dev/null} | \
./Program3 | tee ${outFile3:- /dev/null}
Note that the last call to tee might be superfluous
Proof of Concept
$ unset outFile; echo foo | tee ${outFile:- /dev/null} | cat - && cat ./tmp
foo
cat: ./tmp: No such file or directory
$ outFile=./tmp; echo foo | tee ${outFile:- /dev/null} | cat - && cat ./tmp
foo
foo

Process substitution into grep missing expected outputs

Let’s say I have a program which outputs:
abcd
l33t
1234
which I will simulate with printf 'abcd\nl33t\n1234\n'. I would like to give this output to two programs at the same time. My idea would be to use process substitution with tee. Let’s say I want to give a copy of the output to grep:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' >&2) | grep '[0-9]'
I get the following with Bash 4.1.2 (Linux, CentOS 6.5), which is fine:
l33t
1234
abcd
l33t
But if the process substitution is not redirected to stderr (i.e. without >&2), like this:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]') | grep '[0-9]'
Then I get:
l33t
1234
l33t
It’s like the stdout from process substitution (the first grep) is used by the process after the pipe (the second grep). Except the second grep is already reading things by itself, so I guess it's not supposed to take into account things from the first grep. Unless I’m mistaken (which I surely am).
What am I missing?
Explanation
As far as the command line is concerned, process substitution is just a way of making a special filename. (See also the docs.) So the second pipeline actually looks like:
printf 'abcd\nl33t\n1234\n' | tee /dev/fd/nn | grep '[0-9]'
where nn is some file-descriptor number. The full output of printf goes to /dev/fd/nn, and also goes to grep '[0-9]'. Therefore, only the lines containing digits are printed.
As for the process inside the >(), it inherits the stdout of its parent (tee). In this case, that stdout is the write end of the pipe into the second grep. Therefore, the output of grep '[a-z]' goes through the pipeline just like the standard output of tee does. As a result, the pipeline as a whole only passes lines that include digits.
When you write to stderr instead (>&2), you are bypassing the last pipeline stage. Therefore, the output of grep '[a-z]' on stderr goes to the terminal.
A fix
To fix this without using stderr, you can use another alias for your screen. E.g.:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' >/dev/tty ) | grep '[0-9]'
# ^^^^^^^^^
which gives me the output
l33t
1234
abcd
l33t
Testing this
To sort this out, I ran echo >(ps). The ps process was a child of the bash process running the pipeline.
I also ran
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]')
without the | grep '[0-9]' at the end. On my system, I see
abcd <--- the output of the tee
l33t ditto
1234 ditto
abcd <-- the output of the grep '[a-z]'
l33t ditto
All five lines go into the grep '[0-9]'.
After the tee you have two streams of
abcd
l33t
1234
The 1st grep (>(grep '[a-z]' >&2)) selects the
abcd
l33t
and prints the result to its(!!!) stderr - which is still connected to your terminal...
So, another simple demo:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' >&2) | grep '[0-9]'
this prints
l33t
1234
abcd
l33t
now add the wc -l
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' >&2) | grep '[0-9]' | wc -l
and you will get
abcd
l33t
2
where you clearly can see: the
abcd
l33t
is the stderr of the 1st grep but the 2nd grep's stdout is redirected to the wc and prints the
2
Now another test:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' ) | cat -
output
abcd
l33t
1234
abcd
l33t
i.e. the two lines from the grep and the full input from the printf
count:
printf 'abcd\nl33t\n1234\n' | tee >(grep '[a-z]' ) | cat - | wc -l
output
5

Need help writing this specific bash script

Construct the pipe to execute the following job.
"Output of ls should be displayed on the screen and from this output the lines
containing the word ‘poem’ should be counted and the count should be
stored in a file.”
If bash is allowed, use a process substitution as the receiver for tee
ls | tee >( grep -c poem > number.of.poetry.files)
Your attempt was close:
ls | tee /dev/tty | grep poem | wc -l >number_of_poems
The tee /dev/tty copies all ls output to the terminal. This satisfies the requirement that "Output of ls should be displayed on the screen." while also sending ls's output to grep's stdin.
This can be further simplified:
ls | tee /dev/tty | grep -c poem >number_of_poems
Note that neither of these solutions requires bash. Both will work with lesser shells and, in particular, with dash, which is the default /bin/sh on Debian-like systems.
This sounds like a homework assignment :)
#!/bin/bash
ls
ls -l | grep -c poem >> file.txt
The first ls will display the output on the screen
The next line pipes the output of a second ls through grep -c to count the files/directories containing "poem"
If there were 5 files with poem in them, file.txt would read 5. If file.txt already exists, the new count will be appended to the end. If you want to overwrite the file each time, change the line to read ls -l | grep -c poem > file.txt

Can I take an output stream, duplicate it with tee, munge one of them, and pipe BOTH back as input into diff?

As an example, taking one single program's stdout, obtaining two copies of it with tee and sending them both (one or preferably both able to be piped through other programs) back into vimdiff.
Bonus points if it can be done without having to create a file on disk.
I know how to direct input into a program that takes two inputs, like this
vimdiff <(curl http://google.com) <(curl http://archives.com/last_night/google.com)
and with tee for making two output streams
echo "abc" | tee >(sed 's/a/zzz/') >(sed 's/c/zzz/')
but I do not know how to connect the pipes back together into a diamond shape.
It's not so hard if you can use a fifo:
test -e fifo || mkfifo fifo
echo abc | tee >(sed s/a/zzz/ > fifo) | sed s/c/zzz/ | diff - fifo
Just as a side note, to have this work under ZSH an extra ">" is needed after tee (multios option should be set):
$ setopt multios
$ test -e fifo || mkfifo fifo
$ echo abc | tee > >(sed s/a/zzz/ > fifo) | sed s/c/zzz/ | diff - fifo

bash echo number of lines of file given in a bash variable without the file name

I have the following three constructs in a bash script:
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE)
echo $NUMOFLINES" lines"
echo $(wc -l $JAVA_TAGS_FILE)" lines"
echo "$(wc -l $JAVA_TAGS_FILE) lines"
And all three produce identical output when the script is run:
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
I.e. the name of the file is also echoed (which I don't want). Why do these scriptlets fail, and how should I output a clean:
121711 lines
?
An Example Using Your Own Data
You can avoid having your filename embedded in the NUMOFLINES variable by using redirection from JAVA_TAGS_FILE, rather than passing the filename as an argument to wc. For example:
NUMOFLINES=$(wc -l < "$JAVA_TAGS_FILE")
Explanation: Use Pipes or Redirection to Avoid Filenames in Output
The wc utility will not print the name of the file in its output if input is taken from a pipe or redirection operator. Consider these various examples:
# wc shows filename when the file is an argument
$ wc -l /etc/passwd
41 /etc/passwd
# filename is ignored when piped in on standard input
$ cat /etc/passwd | wc -l
41
# unusual redirection, but wc still ignores the filename
$ < /etc/passwd wc -l
41
# typical redirection, taking standard input from a file
$ wc -l < /etc/passwd
41
As you can see, the only time wc will print the filename is when it's passed as an argument, rather than as data on standard input. In some cases, you may want the filename to be printed, so it's useful to understand when it will be displayed.
wc can't get the filename if you don't give it one.
wc -l < "$JAVA_TAGS_FILE"
You can also use awk:
awk 'END {print NR,"lines"}' filename
Or
awk 'END {print NR}' filename
(applies on Mac, and probably other Unixes)
Actually there is a problem with the wc approach: it does not count the last line if the file does not end with a newline character.
Use this instead
nbLines=$(cat -n file.txt | tail -n 1 | cut -f1 | xargs)
or even better (thanks gniourf_gniourf):
nblines=$(grep -c '' file.txt)
Note: The awk approach by chilicuil also works.
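A quick demonstration of the difference; printf here deliberately omits the final newline, and the temp file name is arbitrary:

```shell
f=$(mktemp)
printf 'one\ntwo' > "$f"          # two lines, no trailing newline

wc_count=$(wc -l < "$f")          # 1 — wc -l counts newline characters
grep_count=$(grep -c '' "$f")     # 2 — grep -c '' counts every line, even an unterminated one

echo "wc: $wc_count, grep: $grep_count"
rm -f "$f"
```

The empty pattern '' matches every line, so grep -c '' is effectively a line counter that also sees the unterminated final line.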
It's very simple:
NUMOFLINES=$(cat "$JAVA_TAGS_FILE" | wc -l)
or
NUMOFLINES=$(wc -l "$JAVA_TAGS_FILE" | awk '{print $1}')
I normally use the 'back tick' feature of bash
export NUM_LINES=`wc -l < filename`
Note the 'tick' is the 'back tick' e.g. ` not the normal single quote
