How can I show the results of each step in a pipeline of continuous transformations without looping? - bash

Consider the following example Bash one-liner, in which the letters "h", "e" and "o" are removed from the word "hello" one at a time, in that order, so that only the two "l" letters remain:
$ echo "hello" | tr -d h | tr -d e | tr -d o
ll
I am trying to find a method for displaying the output of each command to the screen within the one-liner, so others running it can see what is going on. Continuing with the above example, I would like output as follows:
$ echo "hello" | tr -d h | tr -d e | tr -d o
hello
ello
llo
ll
Is this possible? As the one-liner above operates, the output is carried from command to command with the vertical pipe, so I assume I would have to break out of the pipe to print to stdout, which would then interrupt the "command chain" I have written. Or perhaps tee can be used here, but I can't seem to achieve the desired effect. UPDATE: tee won't work because its output is still within the boundaries of the pipe, duh!
Many thanks.

This only works on a terminal:
echo hello | tee /dev/tty |
tr -d h | tee /dev/tty |
tr -d e | tee /dev/tty |
tr -d o
Writing to the /dev/tty device sends output to the current (controlling) terminal, no matter where normal output goes.

You can tee the output directly to the terminal:
echo hello | tee /proc/$$/fd/1 |
tr -d h | tee /proc/$$/fd/1 |
tr -d e | tee /proc/$$/fd/1 |
tr -d o
$$ is the shell's PID, so on Linux /proc/$$/fd/1 refers to the shell's standard output via procfs.

IMO, if this is a real-world problem and not just a "can it be done in bash" exercise, then instead of the bash madness of starting X tr and Y tee processes, you can try a simple perl one-liner:
echo hello | perl -nale 'foreach $i (qw(h e o)) {@F=map{s/$i//g;print $_;$_}@F}'
will print:
ello
llo
ll
or if you really want bash (as @chepner suggested)
echo "hello" | while read w; do for c in h e o; do w=${w//$c};echo $w; done;done
or
while read w; do for c in h e o; do w=${w//$c};echo $w; done;done < wordlist
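The loop above can also be wrapped in a small reusable function, so the word and the letters to strip become parameters (show_stages is a hypothetical name, not part of the original answers):

```shell
# hypothetical helper: print the word after each successive deletion
show_stages() {
  local w=$1
  shift
  local c
  for c in "$@"; do
    w=${w//"$c"}   # delete every occurrence of $c
    echo "$w"
  done
}

show_stages hello h e o   # prints: ello, llo, ll (one per line)
```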

With a small little loop:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done
The trailing . is only there as a dummy so the loop also echoes the final result. After reading the question more carefully, I tested and found that it works in a pipe chain too:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done | less
# pure bash builtins, see chepner's comment
w="hello" ; for c in h e o .; do echo $w; w=${w/$c}; done
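One detail worth noting in the builtin version: ${w/$c} removes only the first match, while ${w//$c} removes every match. Either works here because "h", "e" and "o" each occur only once in "hello", but the difference matters for repeated letters. A quick illustration:

```shell
w="hello"
echo "${w/l}"    # first match only: helo
echo "${w//l}"   # every match: heo
```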

But this WILL work:
tmp=`echo "hello"`; echo $tmp; tmp=`echo $tmp | tr -d h`; echo $tmp; tmp=`echo $tmp | tr -d e`; echo $tmp; echo $tmp | tr -d o
It's ugly, but it works. The solution creates variables to store each pass and then show them all:
tmp=`echo "hello"`
echo $tmp
tmp=`echo $tmp | tr -d h`
echo $tmp
tmp=`echo $tmp | tr -d e`
echo $tmp
echo $tmp | tr -d o
I cannot find a better solution.

You could do it using a fifo (or a regular file) on which you listen:
mkfifo tmp.fifo
tail -f tmp.fifo
Then just run your command on a separate shell:
echo hello | tee -a tmp.fifo | tr -d e | tee -a tmp.fifo | tr -d o

You can use tee - as - represents the console:
echo "hello" | tee - | tr -d h | tee - | tr -d e | tee - | tr -d o
Hope that helped!
EDIT:
This solution WILL NOT WORK, because stdout gets blocked and reused by all of the commands that write their output to it.
I will not delete the answer, as it clearly states that it doesn't solve the problem but ALSO SHOWS that this kind of solution is NOT VALID and why. Anyone who finds themselves in the same situation will discover this information, and it will be useful.

Room for one more?
echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) )
or
echo "hello" |
tee >(tr -d h |
tee >(tr -d e |
tee >(tr -d o ) ) )
output is:
hello
ubuntu@ubuntu:~$ ello
llo
ll
A little persuasive coercion is required to eliminate corruption by the prompt (ubuntu@ubuntu:~$):
cat <(echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) ) )
output is:
hello
ello
llo
ll


How to find all non-dictionary words in a file in bash/zsh?

I'm trying to find all words in a file that don't exist in the dictionary. If I look for a single word the following works
b=ther; look $b | grep -i "^$b$" | ifne -n echo $b => ther
b=there; look $b | grep -i "^$b$" | ifne -n echo $b => [no output]
However if I try to run a "while read" loop
while read a; do look $a | grep -i "^$a$" | ifne -n echo "$a"; done < <(tr -s '[[:punct:][:space:]]' '\n' <lotr.txt |tr '[:upper:]' '[:lower:]')
The output seems to contain all (?) words in the file. Why doesn't this loop only output non-dictionary words?
Regarding ifne
If stdin is non-empty, ifne -n reprints stdin to stdout. From the manpage:
-n Reverse operation. Run the command if the standard input is empty
Note that if the standard input is not empty, it is passed through
ifne in this case.
strace on ifne confirms this behavior.
Alternative
Perhaps, as an alternative:
#!/bin/bash -e

export PATH=/bin:/sbin:/usr/bin:/usr/sbin

while read a; do
    look "$a" | grep -qi "^$a$" || echo "$a"
done < <(
    tr -s '[[:punct:][:space:]]' '\n' < lotr.txt \
    | tr '[A-Z]' '[a-z]' \
    | sort -u \
    | grep .
)

Echo the command result in a file.txt

I have a script such as :
cat list_id.txt | while read line; do for ACC in $line;
do
echo -n "$ACC\t"
curl -s "link=fasta&retmode=xml" |\
grep TSeq_taxid |\
cut -d '>' -f 2 |\
cut -d '<' -f 1 |\
tr -d "\n"
echo
sleep 0.25
done
done
This script lets me take a list of IDs in list_id.txt and get the corresponding names from the database at https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACC}&rettype=fasta&retmode=xml
So from this script I get something like
CAA42669\t9913
V00181\t7154
AH002406\t538120
And what I would like is to print or echo this result directly into a file called new_ids.txt. I tried echo >> new_ids.txt but the file is empty.
Thanks for your help.
A minimal refactoring of your script might look like
# Avoid useless use of cat
# Use read -r
# Don't use upper case for private variables
while read -r line; do
for acc in $line; do
echo -n "$acc\t"
# No backslash necessary after | character
curl -s "link=fasta&retmode=xml" |
# Probably use a proper XML parser for this
grep TSeq_taxid |
cut -d '>' -f 2 |
cut -d '<' -f 1 |
tr -d "\n"
echo
sleep 0.25
done
done <list_id.txt >new_ids.txt
This could probably still be simplified significantly, but without knowledge of what your input file looks like exactly, or what curl returns, this is somewhat speculative.
tr -s ' \t\n' '\n' <list_id.txt |
while read -r acc; do
curl -s "link=fasta&retmode=xml" |
awk -v acc="$acc" '/TSeq_taxid/ {
split($0, a, /[<>]/); print acc "\t" a[3] }'
sleep 0.25
done >new_ids.txt

is there a way other than read/echo to read one line from a file descriptor to stdout without closing the fd?

I ran into a situation where I was doing:
outputStuff |
filterStuff |
transformStuff |
doMoreStuff |
endStuff > endFile
I want to be able to insert some debug tracing stuff in a fashion like:
tee debugFile.$((debugNum++)) |
but obviously the pipes create subshells, so I wanted to do this instead.
exec 5< <(seq 1 100)
outputStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff > endFile
i.e., I want each debug line I insert to be identical, so I don't have to worry about stepping on various things. The read/REPLY echo seems really ugly; I suppose I could wrap it in a function. But is there a way to read one line from a file descriptor to stdout without closing the fd (the way head -1 would close the fd if I did head -1 <&3)?
I tried several things, but in the end @Etan Reisner proved to me (unintentionally) that even if there is a way to do what you asked (clever, Etan), it's not what you actually want. If you want to be sure to read the numbers back sequentially, then the reads have to be serialized, which the commands in a pipeline are not.
Indeed, that applies to your original approach as well, since command substitutions are performed in subshells. I think you could reliably do it like this, though:
debugNum=1
eval "
outputStuff |
tee debugFile.$((debugNum++)) |
filterStuff |
transformStuff |
doMoreStuff |
tee debugFile.$((debugNum++)) |
endStuff > endFile
"
That way all the substitutions are performed by the parent shell, on the string, before any of the commands is launched.
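A quick way to convince yourself why this works: the $((debugNum++)) expansions run while the double-quoted string is being built, so eval receives literal file names. A minimal sketch (the echo commands stand in for the real pipeline):

```shell
debugNum=1
cmd="echo debugFile.$((debugNum++)); echo debugFile.$((debugNum++))"
echo "$cmd"    # the string already contains the literal numbers 1 and 2
eval "$cmd"    # prints debugFile.1 then debugFile.2
```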
Put all the reads inside a single compound statement, and redirect input from descriptor 5 for that.
{ outputStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff
} 5< <(seq 1 100) > endFile
Now, file descriptor 5 is opened once (and closed once), and each call to read gets successive lines from that descriptor.
(You can also simplify this a bit; unless you are providing outputStuff with input via the keyboard, there doesn't seem to be a need to use file descriptor 5 instead of standard input, since only outputStuff is reading from standard input. All the other programs are reading their standard input via the pipeline.)
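The property being relied on here is that an open file descriptor keeps its read position, so successive read calls consume successive lines. A minimal sketch, separate from the pipeline above:

```shell
exec 5< <(seq 1 100)   # open fd 5 on a stream of numbers
read -r -u 5 first
read -r -u 5 second
echo "$first $second"  # prints: 1 2
exec 5<&-              # close fd 5
```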
Here's an approach that doesn't need seq at all:
Define a function that constructs your pipeline recursively.
buildWithDebugging() {
local -a nextCmd=( )
local debugNum=$1; shift
while (( $# )) && [[ $1 != '|' ]]; do
nextCmd+=( "$1" )
shift
done
if (( ${#nextCmd[@]} )); then
"${nextCmd[@]}" \
| tee "debugFile.$debugNum" \
| buildWithDebugging "$((debugNum + 1))" "$@"
else
cat # noop
fi
}
...and, to use it:
buildWithDebugging 0 \
outputStuff '|' \
filterStuff '|' \
transformStuff '|' \
doMoreStuff '|' \
endStuff > endFile
A more secure version would use pipeline components done in the style of Pascal strings rather than C strings -- which is to say, instead of using literal |s, preceding each string of commands with its length:
buildWithDebugging 0 \
1 outputStuff \
3 filterStuff filterArg filterSecondArg \
2 transformStuff filterArg \
1 doMoreStuff \
endStuff > endFile
Building this should be a completely trivial exercise for the reader. :)
At the cost of evenly padding your numbers you can do this with dd though you don't end up with a nicer looking command for your trouble. =)
exec 5< <(seq -w 10 -1 1)
echo -n |
{ echo "d 1:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 2:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 3:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 4:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
cat
You can also just use a shorter variable with read: read -u 5 a;echo $a but that only saves you two characters.
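The read/REPLY pair can also be wrapped once in a function so each inserted debug line stays short (nextline is a hypothetical name; the command substitution runs in a subshell, but the subshell inherits fd 5 and consumes a line from the shared pipe):

```shell
# hypothetical helper: emit the next line from fd 5 without closing it
nextline() { local REPLY; IFS= read -r -u 5 REPLY; printf '%s' "$REPLY"; }

exec 5< <(seq 1 100)
echo "debugFile.$(nextline)"   # debugFile.1
echo "debugFile.$(nextline)"   # debugFile.2
exec 5<&-
```

Note this only stays in order because the two echo commands run sequentially; as discussed above, reads issued from concurrent pipeline stages are not serialized.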

$(shell …) command in Makefile is not executed correctly, but works in bash/sh

I want to count the number of nodes in a graphviz file in a Makefile to use it to start a process for each node.
When I run
grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l
in the shell, it prints the number of nodes as expected.
However, when I use it in my Makefile
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)
${NODES} is always 0.
You'll need to escape the $ sign. Say:
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $$line; do echo $$w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)

How to assign output of multiple shell commands to a variable when using tee?

I want to tee and get the results from multiple shell commands connected in a pipeline. I made a simple example to explain the point. Suppose I want to count the numbers of 'a', 'b' and 'c'.
echo "abcaabbcabc" | tee >(tr -dc 'a' | wc -m) >(tr -dc 'b' | wc -m) >(tr -dc 'c' | wc -m) > /dev/null
Then I tried to assign the result from each count to a shell variable, but they all end up empty.
echo "abcaabbcabc" | tee >(A=$(tr -dc 'a' | wc -m)) >(B=$(tr -dc 'b' | wc -m)) >(C=$(tr -dc 'c' | wc -m)) > /dev/null && echo $A $B $C
What is the right way to do it?
Use files. They are the single most reliable solution. Each of the commands may take a different amount of time to run, and there is no easy way to synchronize command redirections, so the most reliable way is to use a separate "entity" to collect all the data:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
tee >(tr -dc 'a' | wc -m > "$tmpa") >(tr -dc 'b' | wc -m > "$tmpb") |
tr -dc 'c' | wc -m > "$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
rm "$tmpa" "$tmpb" "$tmpc"
trap '' EXIT
Second way:
You can prepend the data from each stream with a custom prefix. Then sort all lines (basically, buffer them) on the prefix and then read them. The example script will generate only a single number from each process substitution, so it's easy to do:
read -r A B C < <(
echo "abcaabbcabc" |
tee >(
tr -dc 'a' | wc -m | sed 's/^/A /'
) >(
tr -dc 'b' | wc -m | sed 's/^/B /'
) >(
tr -dc 'c' | wc -m | sed 's/^/C /'
) >/dev/null |
sort |
cut -d' ' -f2 |
paste -sd' '
)
echo A="$A" B="$B" C="$C"
Using temporary files with flock to synchronize the output of child processes could look like this:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
(
flock 3
flock 4
flock 5
tee >(
tr -dc 'a' | wc -m |
{ sleep 0.1; cat; } > "$tmpa"
# unblock main thread
flock -u 3
) >(
tr -dc 'b' | wc -m |
{ sleep 0.2; cat; } > "$tmpb"
# unblock main thread
flock -u 4
) >(
tr -dc 'c' | wc -m |
{ sleep 0.3; cat; } > "$tmpc"
# unblock main thread
flock -u 5
) >/dev/null
# wait for subprocesses to finish
# need to re-open the files to block on them
(
flock 3
flock 4
flock 5
) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
declare -p A B C
You can use this full-featured letter frequency analysis:
#!/usr/bin/env bash
declare -A letter_frequency
while read -r v k; do
letter_frequency[$k]="$v"
done < <(
grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
sort |
uniq -c
)
for k in "${!letter_frequency[@]}"; do
printf '%c = %d\n' "$k" "${letter_frequency[$k]}"
done
Output:
c = 3
b = 4
a = 4
Or to only assign $A, $B and $C as in your example:
#!/usr/bin/env bash
{
read -r A _
read -r B _
read -r C _
}< <(
grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
sort |
uniq -c
)
printf 'a=%d\nb=%d\nc=%d\n' "$A" "$B" "$C"
grep -o '[[:alnum:]]': output each alphanumeric character on its own line
sort: sort the lines of characters
uniq -c: count each distinct character and output the count and the character
< <( command group; ): the output of this command group is fed to the stdin of the preceding compound command
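The sort | uniq -c pair can be tried on its own; here awk just normalizes uniq's padded count column (the sample letters are arbitrary):

```shell
printf '%s\n' b a c a b a |
  sort |
  uniq -c |
  awk '{print $2 "=" $1}'   # prints: a=3, b=2, c=1 (one per line)
```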
If you need to count occurrences of non-printable characters, newlines, spaces, or tabs, you have to make all these commands produce and consume null-delimited lists. It can certainly be done with the GNU versions of these tools. I leave it to you as an exercise.
Solution to count arbitrary characters except null:
As demonstrated, it also works with Unicode.
#!/usr/bin/env bash
declare -A character_frequency
declare -i v
while read -d '' -r -N 8 v && read -r -d '' -N 1 k; do
character_frequency[$k]="$v"
done < <(
grep --only-matching --null-data . <<<$'a¹bc✓ ✓\n\t\t\u263A☺ ☺ aabbcabc' |
head --bytes -2 | # trim the newline added by grep
sort --zero-terminated | # sort null delimited list
uniq --count --zero-terminated # count occurrences of each char (null delim)
)
for k in "${!character_frequency[@]}"; do
printf '%q = %d\n' "$k" "${character_frequency[$k]}"
done
Output:
$'\n' = 1
$'\t' = 2
☺ = 3
\ = 7
✓ = 2
¹ = 1
c = 3
b = 4
a = 4
