Interacting with awk while processing a pipe - bash

Is there a way to make awk interactive when it is processing /dev/stdin via a pipe?
Imagine I have a program which continuously generates data. Example:
$ od -vAn -tu2 -w2 < /dev/urandom
2357
60431
19223
...
This data is being processed by a very advanced awk script by means of a pipe:
$ od -vAn -tu2 -w2 < /dev/urandom | awk '{print}'
Question: Is it possible to make this awk program interactive such that:
The program continuously prints output.
When a single key is pressed (e.g. z), it starts to output only 0 for each line it reads from the pipe.
When the key is pressed again, it switches back to outputting the original data, obviously skipping the records it already printed as 0.
Problems:
/dev/stdin (also referenced as -) is already in use, so the keyboard interaction needs to be picked up via /dev/tty, or is there another way?
getline key < "/dev/tty" waits until RS is encountered, so in the default case you need to press two keys (z and Enter):
$ awk 'BEGIN{ getline key < "/dev/tty"; print key}'
This is acceptable, but I would prefer a single key-press.
So, is it possible to set RS locally such that getline reads a single character? That way we could modify RS just for the getline and reset it afterwards. Another way might be the shell builtin read, but its options are incompatible between bash and zsh (a stty-based sketch follows this list).
getline waits for input indefinitely, so it essentially stops the processing of the pipe. There is a gawk extension which allows you to set a timeout, but that is only available since gawk 4.2. So I believe this could potentially work:
awk '{print p ? 0 : $0 }
     { PROCINFO["/dev/tty", "READ_TIMEOUT"]=1;
       while (getline key < "/dev/tty") p=key=="z"?!p:p
     }'
However, I do not have access to gawk 4.2 (update: this does not work)
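For the single-keypress problem (problem 2 above), one shell-level workaround is to put /dev/tty into non-canonical mode with stty and read exactly one byte with dd. A minimal sketch, using only POSIX tools (not tested in combination with the awk pipeline):
old=$(stty -g < /dev/tty)                      # save current terminal settings
stty -icanon -echo min 1 time 0 < /dev/tty     # non-canonical mode: deliver 1 char at a time
key=$(dd if=/dev/tty bs=1 count=1 2>/dev/null) # read exactly one byte
stty "$old" < /dev/tty                         # restore the terminal
echo "you pressed: $key"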
Requests:
I would prefer a fully POSIX-compliant version, which is either entirely awk or uses POSIX-compliant system calls.
If this is not possible, gawk extensions available before version 3.1.7 and shell-independent system calls can be used.
As a last resort, I would accept any shell-awk construct which would make this possible, under the single condition that the data is read continuously only by awk (so I'm thinking of multiple pipes here).

After some searching, I came up with a Bash script that allows doing this. The idea is to inject a unique identifiable string into the pipe that awk is processing. Both the original program od and the bash script write to the pipe. In order not to mangle that data, I used stdbuf to run the program od line-buffered. Furthermore, since it is the bash-script that handles the key-press, both the original program and the awk script have to run in the background. Therefore a clean exit strategy needs to be in place. Awk will exit when the key q is pressed, while od will terminate automatically when awk is terminated.
In the end, it looks like this:
#!/usr/bin/env bash
# make a fifo which we use to inject the output of data-stream
# and the key press
mkfifo foo
# start the program in line-buffer mode, writing to FIFO
# and run it in the background
stdbuf -o L od -vAn -tu2 -w2 < /dev/urandom > foo &
# run the awk program that processes the identified key-press
# also run it in the background and insert a clear EXIT strategy
awk '/key/{if ($2=="q") exit; else p=!p}
     !p{print}
      p{print 0}' foo &
# handle the key pressing
# if a key is pressed inject the string "key <key>" into the FIFO
# use "q" to exit
while true; do
    read -rsn1 key
    echo "key $key" > foo
    [[ $key == "q" ]] && exit
done
Note: I ignored the requirement that the toggle key has to be z.
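If the toggle really must be the z key, a small variant of the awk program above (an untested sketch) would only flip on z, still quit on q, and skip printing the injected key lines themselves:
awk '/^key /{if ($2=="q") exit; if ($2=="z") p=!p; next}
     !p{print}
      p{print 0}' foo &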
Some useful posts:
Unix/Linux pipe behavior when reading process terminates before writing process
shell script respond to keypress

Related

Avoid capturing input on a script that stores output of command and functions

I have a script that stores the output of commands, functions, and other scripts in a log file.
I want to avoid capturing user input.
The line that is in charge of storing the output of the commands to a logfile is this one:
$command 2>&1 | tee /dev/tty | ruby -pe 'print Time.now.strftime("[%s] ") if !$stdin.tty?' >> "$tempfile"
If the command is a function or a script that asks for user input and prints out that data, the input is stored in the temporary file. I would like to avoid that, since I don't want to capture sensitive data.
I can't modify the commands, functions that I'm wrapping.
Your command only saves program output, not user input. The problem you're seeing is that the command has chosen to output whatever the user inputs, merging it into its own output that is then obviously logged.
There is no good way around this. Please fix your command.
Anyways. Here's a bad, fragile, hacky way around it:
tempfile=test.txt
command='read -ep Enter_some_input: '
$command 2>&1 |
tee /dev/tty |
python3 -c $'import os\nwhile s:=os.read(0, 1024):\n if len(s) > 3: os.write(1, s)' |
ruby -pe 'print Time.now.strftime("[%s] ") if !$stdin.tty?' >> "$tempfile"
The Python command drops all reads of 3 bytes or less. This aims to remove character by character echo as would happen in the most basic cases of a user typing into readline and similar, while hopefully not removing too much intentional output.
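For instance (assuming Python 3.8+, since the one-liner uses the walrus operator), a short read is dropped while a longer chunk passes through:
$ printf 'ab\n' | python3 -c $'import os\nwhile s:=os.read(0, 1024):\n if len(s) > 3: os.write(1, s)'
$ printf 'longer output line\n' | python3 -c $'import os\nwhile s:=os.read(0, 1024):\n if len(s) > 3: os.write(1, s)'
longer output line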

pipe operator between two blocks

I've found an interesting bash script that with some modifications would likely solve my use case. But I'm unsure if I understand how it works, in particular the pipe between the blocks.
How do these two blocks work together, and what is the behaviour of the pipe that separates them?
function isTomcatUp {
    # Use FIFO pipeline to check catalina.out for server startup notification rather than
    # ping with an HTTP request. This was recommended by ForgeRock (Zoltan).
    FIFO=/tmp/notifytomcatfifo
    mkfifo "${FIFO}" || exit 1
    {
        # run tail in the background so that the shell can
        # kill tail when notified that grep has exited
        tail -f $CATALINA_HOME/logs/catalina.out &
        # remember tail's PID
        TAILPID=$!
        # wait for notification that grep has exited
        read foo <${FIFO}
        # grep has exited, time to go
        kill "${TAILPID}"
    } | {
        grep -m 1 "INFO: Server startup"
        # notify the first pipeline stage that grep is done
        echo >${FIFO}
    }
    # clean up
    rm "${FIFO}"
}
Code Source: https://www.manthanhd.com/2016/01/15/waiting-for-tomcat-to-start-up-in-a-script/
bash has a whole set of compound commands, which work much like simple commands. Most relevant here is that each compound command has its own standard input and standard output.
{ ... } is one such compound command. Each command inside the group inherits its standard input and output from the group, so the effect is that the standard output of a group is the concatenation of its children's standard output. Likewise, each command inside reads in turn from the group's standard input. In your example, nothing interesting happens, because grep consumes all of the standard input and no other command tries to read from it. But consider this example:
$ cat tmp.txt
foo
bar
$ { read a; read b; echo "$b then $a"; } < tmp.txt
bar then foo
The first read gets a single line from standard input, and the second read gets the second. Importantly, the first read consumes a line of input before the second read could see it. Contrast this with
$ read a < tmp.txt
$ read b < tmp.txt
where a and b will both contain foo, because each read command opens tmp.txt anew and both will read the first line.
The { …; } operator groups the commands so that I/O redirections apply to all the commands within it. The { must be separated as if it were a command name; the } must be preceded by either a semicolon or a newline and likewise be separate. The commands are not executed in a sub-shell, unlike with ( … ), which also has some syntactic differences.
In your script, you have two such groups connected by a pipe. Because of the pipe, each group runs in a sub-shell, but it is the pipe, not the braces, that causes this.
The first group runs tail -f on a file in background, and then waits for a FIFO to be closed so it can kill the tail -f. The second part looks for the first occurrence of some specific information and when it finds it, stops reading and writes to the FIFO to free everything up.
As with any pipeline, the exit status is the status of the last group — which is likely to be 0 because the echo succeeds.
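A quick way to see the difference in bash:
$ x=1; { x=2; }; echo "$x"
2
$ x=1; { x=2; } | cat; echo "$x"
1
The braces alone leave the assignment in the current shell; as one side of a pipeline, the same group runs in a sub-shell and the assignment is lost.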

send EXIT code as variable to next command in a pipeline [bash]

I have something like this:
while read line
do command1 $line | awk -v l="$line" '
...
... awk program here doing something ...
...
'
done < inputfile.txt
Now, command1 will have three possible exit statuses (0, 1, 2), and depending on which one occurs, the processing in the awk program will be different.
So the exit code serves as the basis of the logic in the awk program.
My problem is, I don't know how to pass that exit code from command1 to my awk program.
I thought maybe there is a way of passing that exit code as a variable along with the line (something like -v l="$line" -v l="$EXITCODE"), but did not find a way to do it.
After exiting the while ... done loop, I still need access to ${PIPESTATUS[0]} (the original exit status of command1), as I do some more work afterwards.
ADDITION (LOGIC CHANGED)
while read line
do
    stdout=$(command $line)
    case $? in
        0 )
            # need to call awk program
            awk -v l="$line" -v stdout="$stdout" -f horsepower
            ;;
        1 )
            # no need to call awk program here..
            ;;
        2 )
            # no need to call awk program here..
            ;;
        * )
            # something unimportant
            ;;
    esac
done < inputfile.txt
So as you can see, I changed the logic a bit: I now handle the exit-status logic outside of the awk program (which I turned into a separate program in its own file called horsepower), and only need to call it in case 0) to process the stdout output generated by the previous command.
horsepower has a few lines like:
#! /bin/awk -f
/some regex/ { some action }
and, based on what it finds, it acts appropriately. But how do I make it act on that stdout now?
Don't Use PIPESTATUS Array
You can use the Bash PIPESTATUS array to capture the exit status of the last foreground pipeline. The trick is knowing which array element you need. However, you can't capture the exit status of the current pipeline this way. For example:
$ false | echo "${PIPESTATUS[0]}"
0
The first time you run this, the exit status will be 0. However, subsequent runs will return 1, because they're showing the status of the previous command.
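Run as a separate command after the pipeline has completed, PIPESTATUS does show what you expect:
$ false | true
$ echo "${PIPESTATUS[@]}"
1 0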
Use Separate Commands
You'd be much better off using $? to display the exit status of the previous command. However, that will preclude using a pipeline. You will need to split your commands, perhaps storing your standard output in a variable that you can then feed to the next command. You can also feed the exit status of the last command into awk in the same way.
Since you only posted pseudo-code, consider this example:
$ stdout=$(grep root /etc/passwd); \
awk -v status=$? \
-v stdout="$stdout" \
'BEGIN {printf "Status: %s\nVariable: %s\n", status, stdout}'
Status: 0
Variable: root:x:0:0:root,,,:/root:/bin/bash
Note the semicolon separating the commands. You wouldn't necessarily need the semicolon inside a script, but it allows you to cut and paste the example as a one-liner with separate commands.
The output of your command is stored in stdout, and the exit status is stored in $?. Both are declared as variables to awk, and available within your awk script.
Updated to Take Standard Input from Variable
The OP updated the original question, and now wants standard input to come from the data stored in the stdout variable that is storing the results of the command. This can be done with a here-string. The case statement in the updated question can be modified like so:
while read line; do
stdout=$(command $line)
case $? in
0) awk -v l="$line" -f horsepower <<< "$stdout" ;;
# other cases, as needed
esac
done
In this example, the stdout variable captures standard output from the command stored in line, and then reuses the variable as standard input to the awk script named "horsepower." There are a number of ways this sort of command evaluation can break, but this will work for the OP's use case as currently posted.
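As a standalone illustration of the here-string feeding awk (the pattern and fields here are made up for the example):
$ stdout=$(printf 'horsepower 450\ntorque 510\n')
$ awk '/horsepower/ {print "hp:", $2}' <<< "$stdout"
hp: 450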

Is it possible to make changes to a line written to STDOUT in shell?

Is it possible to make changes to a line written to STDOUT in shell, similar to the way many programs such as scp do?
The point would be to allow me to essentially have a ticker, or a monitor of some sort, without it scrolling all over the screen.
You can manipulate the terminal with control characters and ANSI escape codes. For example \b returns the cursor one position back, and \r returns it to the beginning of the line. This can be used to make a simple ticker:
for i in $(seq 10)
do
    echo -en "Progress... $i\r" # -e is needed to interpret escape codes
    sleep 1
done
echo -e "\nDone."
With ANSI escape codes you can do even more, like clear part of the screen, jump to any position you want, and change the output color.
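For example, a variation of the ticker above that also clears the rest of the line and prints in green (a sketch; \033[K erases to the end of the line, \033[32m and \033[0m set and reset the color):
for i in $(seq 10)
do
    printf '\r\033[K\033[32mProgress... %d\033[0m' "$i"
    sleep 1
done
printf '\nDone.\n'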
You can overwrite the last printed line by printing the \r character.
For instance this:
for i in `seq 1 10`; do
    echo -n $i;
    sleep 1;
    echo -n -e "\r" ;
done
Will print 1 then update it with 2 and so on until 10.
You can modify the output written to stdout using another program in a pipeline. When you run the program, you use | to pipe its output into the next program. The next program can do whatever it wants with that output. A general-purpose program for modifying the output of a program is sed, or you could write something yourself that modifies the data from the previous program.
A shell program would be something like:
while read line; do
# do something with $line and output the results
done
so you can just:
the_original_program | the_above_program
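For example, a filter along those lines that redraws each incoming line in place instead of letting it scroll (a sketch, reusing the \r trick shown above; \033[K clears the rest of the line):
the_original_program | while IFS= read -r line; do
    printf '\r\033[K%s' "$line"
done
printf '\n'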

How can I send the stdout of one process to multiple processes using (preferably unnamed) pipes in Unix (or Windows)?

I'd like to redirect the stdout of process proc1 to two processes proc2 and proc3:
         proc2 -> stdout
        /
 proc1
        \
         proc3 -> stdout
I tried
proc1 | (proc2 & proc3)
but it doesn't seem to work, i.e.
echo 123 | (tr 1 a & tr 1 b)
writes
b23
to stdout instead of
a23
b23
Editor's note:
- >(…) is a process substitution that is a nonstandard shell feature of some POSIX-compatible shells: bash, ksh, zsh.
- This answer accidentally sends the output process substitution's output through the pipeline too: echo 123 | tee >(tr 1 a) | tr 1 b.
- Output from the process substitutions will be unpredictably interleaved, and, except in zsh, the pipeline may terminate before the commands inside >(…) do.
In Unix (or on a Mac), use the tee command:
$ echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
b23
a23
Usually you would use tee to redirect output to multiple files, but using >(...) you can
redirect to another process. So, in general,
$ proc1 | tee >(proc2) ... >(procN-1) >(procN) >/dev/null
will do what you want.
Under Windows, I don't think the built-in shell has an equivalent. Microsoft's Windows PowerShell has a tee command though.
Like dF said, bash allows you to use the >(…) construct, running a command in place of a filename. (There is also the <(…) construct to substitute the output of another command in place of a filename, but that is irrelevant here; I mention it just for completeness.)
If you don't have bash, or running on a system with an older version of bash, you can do manually what bash does, by making use of FIFO files.
The generic way to achieve what you want, is:
decide how many processes should receive the output of your command, and create as many FIFOs, preferably on a global temporary folder:
subprocesses="a b c d"
mypid=$$
for i in $subprocesses # this way we are compatible with all sh-derived shells
do
    mkfifo /tmp/pipe.$mypid.$i
done
start all your subprocesses waiting input from the FIFOs:
for i in $subprocesses
do
    tr 1 $i </tmp/pipe.$mypid.$i & # background!
done
execute your command teeing to the FIFOs:
proc1 | tee $(for i in $subprocesses; do echo /tmp/pipe.$mypid.$i; done)
finally, remove the FIFOs:
for i in $subprocesses; do rm /tmp/pipe.$mypid.$i; done
NOTE: for compatibility reasons, I would do the $(…) with backquotes, but I couldn't do it writing this answer (the backquote is used in SO). Normally, the $(…) is old enough to work even in old versions of ksh, but if it doesn't, enclose the … part in backquotes.
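Putting the steps together for the echo 123 example from the question (a sketch; >/dev/null discards tee's own copy of the input, and wait makes sure the background tr processes have finished before the FIFOs are removed):
subprocesses="a b"
mypid=$$
for i in $subprocesses; do mkfifo /tmp/pipe.$mypid.$i; done
for i in $subprocesses; do tr 1 $i </tmp/pipe.$mypid.$i & done
echo 123 | tee $(for i in $subprocesses; do echo /tmp/pipe.$mypid.$i; done) >/dev/null
wait   # prints a23 and b23, in either order
for i in $subprocesses; do rm /tmp/pipe.$mypid.$i; done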
Unix (bash, ksh, zsh)
dF.'s answer contains the seed of an answer based on tee and output process substitutions
(>(...)) that may or may not work, depending on your requirements:
Note that process substitutions are a nonstandard feature that (mostly) POSIX-features-only shells such as dash (which acts as /bin/sh on Ubuntu, for instance) do not support. Shell scripts targeting /bin/sh should not rely on them.
echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
The pitfalls of this approach are:
unpredictable, asynchronous output behavior: the output streams from the commands inside the output process substitutions >(...) interleave in unpredictable ways.
In bash and ksh (as opposed to zsh - but see exception below):
output may arrive after the command has finished.
subsequent commands may start executing before the commands in the process substitutions have finished - bash and ksh do not wait for the output process substitution-spawned processes to finish, at least by default.
jmb puts it well in a comment on dF.'s answer:
be aware that the commands started inside >(...) are dissociated from the original shell, and you can't easily determine when they finish; the tee will finish after writing everything, but the substituted processes will still be consuming the data from various buffers in the kernel and file I/O, plus whatever time is taken by their internal handling of data. You can encounter race conditions if your outer shell then goes on to rely on anything produced by the sub-processes.
zsh is the only shell that does by default wait for the processes run in the output process substitutions to finish, except if it is stderr that is redirected to one (2> >(...)).
ksh (at least as of version 93u+) allows use of argument-less wait to wait for the output process substitution-spawned processes to finish.
Note that in an interactive session that could result in waiting for any pending background jobs too, however.
bash v4.4+ can wait for the most recently launched output process substitution with wait $!, but argument-less wait does not work, making this unsuitable for a command with multiple output process substitutions.
However, bash and ksh can be forced to wait by piping the command to | cat, but note that this makes the command run in a subshell. Caveats:
ksh (as of ksh 93u+) doesn't support sending stderr to an output process substitution (2> >(...)); such an attempt is silently ignored.
While zsh is (commendably) synchronous by default with the (far more common) stdout output process substitutions, even the | cat technique cannot make them synchronous with stderr output process substitutions (2> >(...)).
However, even if you ensure synchronous execution, the problem of unpredictably interleaved output remains.
The following command, when run in bash or ksh, illustrates the problematic behaviors (you may have to run it several times to see both symptoms): The AFTER will typically print before output from the output substitutions, and the output from the latter can be interleaved unpredictably.
printf 'line %s\n' {1..30} | tee >(cat -n) >(cat -n) >/dev/null; echo AFTER
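For comparison, the | cat workaround mentioned above forces bash and ksh to wait for the substitutions; the numbered output can still interleave, but AFTER should now come last:
printf 'line %s\n' {1..30} | tee >(cat -n) >(cat -n) >/dev/null | cat; echo AFTER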
In short:
Guaranteeing a particular per-command output sequence:
Neither bash nor ksh nor zsh support that.
Synchronous execution:
Doable, except with stderr-sourced output process substitutions:
In zsh, they're invariably asynchronous.
In ksh, they don't work at all.
If you can live with these limitations, using output process substitutions is a viable option (e.g., if all of them write to separate output files).
Note that tzot's much more cumbersome, but potentially POSIX-compliant solution also exhibits unpredictable output behavior; however, by using wait you can ensure that subsequent commands do not start executing until all background processes have finished.
See bottom for a more robust, synchronous, serialized-output implementation.
The only straightforward bash solution with predictable output behavior is the following, which, however, is prohibitively slow with large input sets, because shell loops are inherently slow.
Also note that this alternates the output lines from the target commands.
while IFS= read -r line; do
    tr 1 a <<<"$line"
    tr 1 b <<<"$line"
done < <(echo '123')
Unix (using GNU Parallel)
Installing GNU parallel enables a robust solution with serialized (per-command) output that additionally allows parallel execution:
$ echo '123' | parallel --pipe --tee {} ::: 'tr 1 a' 'tr 1 b'
a23
b23
parallel by default ensures that output from the different commands doesn't interleave (this behavior can be modified - see man parallel).
Note: Some Linux distros come with a different parallel utility, which won't work with the command above; use parallel --version to determine which one, if any, you have.
Windows
Jay Bazuzi's helpful answer shows how to do it in PowerShell. That said, his answer is the analog of the looping bash answer above: it will be prohibitively slow with large input sets and it also alternates the output lines from the target commands.
bash-based, but otherwise portable Unix solution with synchronous execution and output serialization
The following is a simple, but reasonably robust implementation of the approach presented in tzot's answer that additionally provides:
synchronous execution
serialized (grouped) output
While not strictly POSIX compliant, because it is a bash script, it should be portable to any Unix platform that has bash.
Note: You can find a more full-fledged implementation released under the MIT license in this Gist.
If you save the code below as a script named fanout, make it executable, and put it in your PATH, the command from the question would work as follows:
$ echo 123 | fanout 'tr 1 a' 'tr 1 b'
# tr 1 a
a23
# tr 1 b
b23
fanout script source code:
#!/usr/bin/env bash

# The commands to pipe to, passed as a single string each.
aCmds=( "$@" )

# The script's own name, used below in the temp-dir name.
kTHIS_NAME=${BASH_SOURCE##*/}

# Create a temp. directory to hold all FIFOs and captured output.
tmpDir="${TMPDIR:-/tmp}/$kTHIS_NAME-$$-$(date +%s)-$RANDOM"
mkdir "$tmpDir" || exit

# Set up a trap that automatically removes the temp dir. when this script
# exits.
trap 'rm -rf "$tmpDir"' EXIT

# Determine the number padding for the sequential FIFO / output-capture names,
# so that *alphabetic* sorting, as done by *globbing*, is equivalent to
# *numerical* sorting.
maxNdx=$(( $# - 1 ))
fmtString="%0${#maxNdx}d"

# Create the FIFO and output-capture filename arrays.
aFifos=() aOutFiles=()
for (( i = 0; i <= maxNdx; ++i )); do
    printf -v suffix "$fmtString" $i
    aFifos[i]="$tmpDir/fifo-$suffix"
    aOutFiles[i]="$tmpDir/out-$suffix"
done

# Create the FIFOs.
mkfifo "${aFifos[@]}" || exit

# Start all commands in the background, each reading from a dedicated FIFO.
for (( i = 0; i <= maxNdx; ++i )); do
    fifo=${aFifos[i]}
    outFile=${aOutFiles[i]}
    cmd=${aCmds[i]}
    printf '# %s\n' "$cmd" > "$outFile"
    eval "$cmd" < "$fifo" >> "$outFile" &
done

# Now tee stdin to all FIFOs.
tee "${aFifos[@]}" >/dev/null || exit

# Wait for all background processes to finish.
wait

# Print all captured stdout output, grouped by target command, in sequence.
cat "${aOutFiles[@]}"
Since dF. mentioned that PowerShell has tee, I thought I'd show a way to do this in PowerShell.
PS > "123" | % {
$_.Replace( "1", "a"),
$_.Replace( "2", "b" )
}
a23
1b3
Note that each object coming out of the first command is processed before the next object is created. This can allow scaling to very large inputs.
You can also save the output in a variable and use that for the other processes:
out=$(proc1); echo "$out" | proc2; echo "$out" | proc3
However, that works only if
proc1 terminates at some point :-)
proc1 doesn't produce too much output (don't know what the limits are there but it's probably your RAM)
But it is easy to remember and leaves you with more options on the output you get from the processes you spawned there, e.g.:
out=$(proc1); echo $(echo "$out" | proc2) / $(echo "$out" | proc3) | bc
I had difficulties doing something like that with the | tee >(proc2) >(proc3) >/dev/null approach.
Another way to do it would be:
eval `echo '&& echo 123 |'{'tr 1 a','tr 1 b'} | sed -n 's/^&&//gp'`
output:
a23
b23
No need to create a subshell here.
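To see what eval actually receives, expand the braces on their own; the sed then just strips the leading && so the result is a runnable command line:
$ echo '&& echo 123 |'{'tr 1 a','tr 1 b'}
&& echo 123 |tr 1 a && echo 123 |tr 1 b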
