Piping the same output through one or several grep commands on condition - bash

I am currently writing a bash script to filter the output of my LaTeX compilations so that only what I find relevant is printed to the console. As I would like this script to be thorough, I set up different options to toggle different output filters at the same time, depending on the nature of the information produced during compilation (fatal errors, warnings, over/underfull h/vboxes...).
For those who may not know, we often need to perform several compilations in a row to get a complete LaTeX document with correct labels, page numbering, index, table of contents... plus other commands like bibtex or makeglossaries for the bibliography and, well, glossaries. I therefore have a loop that executes everything and stops if a fatal error is encountered, but should continue if there are only minor warnings.
My main command line pipes the pdflatex output through an inverted grep that finds error lines (starting with !). This way, the script stops only if grep finds a fatal error.
: | pdflatex --halt-on-error "$@" | { ! grep --color=auto '^!.*' -A200; }
But when I activate any other filter (e.g. '.*full.*' for over/underfull lines), I need the compilation to continue so that I can identify whether there is a real need to correct anything (hey, sometimes underfull lines are just not that ugly...).
That means this grep command cannot be inverted as in the first snippet, and I cannot (or don't know how to) use the same grep with a different regex. Note that any additional grep should also read from the pdflatex output, so I cannot simply pipe it after the snippet above.
To sum up, it should roughly look like this:
pdflatex --> grep for fatal errors --> if more filters, grep for those filters
--> pass to next step
I came up with several attempts that did not work properly:
This one works only if I want to compile WITH the warnings. Looking only for errors does not work.
latex_compilation() {
: | pdflatex --halt-on-error "$@" | tee >({ ! grep --color=auto '^!.*' -A200; }) >({ grep --color=auto "$warnings_filter" -A5; }) >/dev/null
}
latex_compilation() {
: | pdflatex --halt-on-error "$@" | tee >({ ! grep --color=auto '^!.*' -A200; }) >/dev/null | { grep --color=auto "$warnings_filter" -A5; }
}
or even desperately
latex_compilation() {
: | pdflatex --halt-on-error "$@" |
if [[ "$warnings_on" = true ]]; then
{ grep --color=auto "$warnings_filter" -A5; }
fi
{ ! grep --color=auto '^!.*' -A200; }
}
This one would work but uses 2 compilation processes for each step (you could easily go up to 7-8 compilation steps for a big and complex document). It should be avoided if possible.
latex_compilation() {
if [[ "$warnings_on" = true ]]; then
: | pdflatex --halt-on-error "$@" | \
{ grep --color=auto "$warnings_filter" -A5 };
fi
: | pdflatex --halt-on-error "$@" | \
{ ! grep --color=auto '^!.*' -A200; }
}
I spent hours looking for solutions online but haven't found any yet.
I really hope this is clear enough, because it is a mess to summarize, let alone to write down. You can find the relevant code here if needed for clarity.

This one would work but uses 2 compilation processes
So let's use one.
latex_compilation() {
local tmp
tmp=$(pdflatex ... <&-)
if [[ "$warnings_on" = true ]]; then
grep --color=auto "$warnings_filter" -A5 <<<"$tmp"
fi
! grep --color=auto '^!.*' -A200 <<<"$tmp"
}
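Note that the function's exit status here comes from the final ! grep, not from pdflatex itself. If you also want pdflatex's own status (nonzero on a fatal error with --halt-on-error), a small extension of the sketch above could capture it:
local tmp status
tmp=$(pdflatex --halt-on-error "$@" <&-)
status=$?  # pdflatex's own exit status, available for an extra check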
Or you can do that asynchronously, by parsing the output in your chosen programming language. For Bash, see https://mywiki.wooledge.org/BashFAQ/001 :
line_is_warning() { .... }
latex_compilation() {
local outputlines=0 failed=0
while IFS= read -r line; do
if "$warnings_on" && line_is_warning "$line"; do
outputlines=5 # will output 5 lines after
fi
if [[ "$line" =~ ^! ]]; then
failed=1
outputlines=200 # will output 200 lines after
fi
if ((outputlines != 0)); then
((outputlines--))
printf "%s\n" "$line"
fi
done < <(pdflatex ... <&-)
if ((failed)); then return 1; fi
}
But Bash will be extremely slow. Consider using AWK or Python or Perl.
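For illustration, a rough AWK equivalent of the loop above might look like this sketch (the Overfull/Underfull regex stands in for your real $warnings_filter):
latex_compilation() {
pdflatex --halt-on-error "$@" <&- |
awk -v warn="$warnings_on" '
warn == "true" && /Overfull|Underfull/ { out = 5 }  # warning: start echoing 5 lines from here
/^!/ { out = 200; failed = 1 }  # fatal error: echo 200 lines from here
out > 0 { print; out-- }
END { exit failed }'
}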
looking for solutions online
Exactly, you have to write a solution yourself, for your specific requirements.
This one works only if I want to compile WITH the warnings. Looking only for errors does not work.
You can write whole code blocks inside >( ... ) and basically anywhere. The exit status of a pipeline is the exit status of its rightmost command (unless set -o pipefail is in effect). Put the failing command as the rightmost one in the pipeline.
latex_compilation() {
pdflatex --halt-on-error "$@" <&- |
tee >(
if "$warnings_on"; then
grep --color=auto "$warnings_filter" -A5
else
cat >/dev/null
fi
) |
! grep --color=auto '^!.*' -A200
}
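Called from the compilation loop, this might be used along these lines (file name, pass count and filter are placeholders):
warnings_on=true
warnings_filter='.*full.*'
for pass in 1 2 3; do
latex_compilation main.tex || { echo 'Fatal error, stopping.' >&2; break; }
done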

Suggesting to use an awk filtering pattern.
Read more about awk filtering patterns here.
With awk you can create complex filtering logic: ! = not, && = and, || = or.
For example, if you have 3 filtering RegExp patterns: Pattern_1, Pattern_2, Pattern_3.
Example 1
You can combine all 3 patterns into a single filter with the following command:
awk '/Pattern_1/ && /Pattern_2/ && /Pattern_3/' scanned_file1 scanned_file2 ...
The result will be printing only lines that match all 3 patterns.
Example 2
You can combine the inverses of all 3 patterns in the following command:
awk '!/Pattern_1/ && !/Pattern_2/ && !/Pattern_3/' scanned_file1 scanned_file2 ...
The result will be printing only lines matching none of the 3 patterns.
Example 3
You can combine an inverse filter on Pattern_1 with a match on Pattern_2 or Pattern_3:
awk '!/Pattern_1/ && (/Pattern_2/ || /Pattern_3/)' scanned_file1 scanned_file2 ...
The result will be printing lines not matching Pattern_1 but matching Pattern_2 or Pattern_3.
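Applied to the LaTeX question above, the same idea gives a single-pass filter; a sketch with illustrative patterns (lines starting with ! are fatal errors, the others over/underfull warnings):
pdflatex --halt-on-error main.tex <&- | awk '/^!/ || /Overfull/ || /Underfull/'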

Related

Create a chained command line in bash function

I have a question: I would like to create a function that (depending on the number of arguments entered) would create a so-called "chained" command line. The current code I wrote looks as follows:
function ignore {
if [ -n "$#" ] && [ "$#" > 0 ]; then
count=$#
if [ ${count} -eq 1 ]; then
return "grep -iv $1"
else
for args in "$@"; do
## Here should be code that would put (using pseudo code) as many "grep -iv $a | grep -iv $(a+1) | ... | grep -iv $(a+n)", where the part "$(a+..) represent the placeholder of next argument"
done
fi
fi
}
Any ideas? Thanks
Update
I would like to clarify the above. The function would be used as follows:
some_bash_function | ignore
example:
apt-cache search apache2 | ignore doc lib
Maybe this will help a bit more.
This seems horribly inefficient. A much better solution would look like grep -ive "${array[0]}" -e "${array[1]}" -e "${array[2]}" etc. Here's a simple way to build that.
# don't needlessly use Bash-only function declaration syntax
ignore () {
local args=()
local t
for t; do
args+=(-e "$t")
done
grep -iv "${args[#]}"
}
In the end, git status | ignore foo bar baz is not a lot simpler than git status | grep -ive foo -e bar -e baz so this function might not be worth these 116 bytes (spaces included). But hopefully at least this can work to demonstrate a way to build command lines programmatically. The use of arrays is important; there is no good way to preserve quoting of already quoted values if you smash everything into a single string.
A more sustainable solution still is to just combine everything into a single regex. You can do that with grep -iv 'foo\|bar\|baz' though personally, I would probably switch to the more expressive regex dialect of grep -E; grep -ivE 'foo|bar|baz'.
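Building that single alternation from the function's arguments is straightforward; a minimal sketch (assuming the arguments are themselves valid ERE fragments):
ignore () {
local IFS='|'  # "$*" joins the arguments with the first character of IFS
grep -ivE "$*"
}
With that, ignore foo bar baz runs a single grep -ivE 'foo|bar|baz'.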
If you really wanted to build a structure of pipes, I guess a recursive function would work.
# FIXME: slow and ugly, prefer the one above
ignore_slowly () {
if [[ $# -eq 1 ]]; then
grep -iv "$1"
else
local t=$1
shift
grep -iv "$t" | ignore_slowly "$#"
fi
}
But generally, you want to minimize the number of processes you create.
Though inefficient, what you want can be done like this:
#!/bin/bash
ignore () {
printf -v pipe 'grep -iv %q | ' "$@"
pipe=${pipe%???} # remove trailing ' | '
bash -c "$pipe"
}
ignore 'regex1' 'regex2' … 'regexn' < file

Calling bash script from bash script

I have made two programs and I'm trying to call one from the other, but this is appearing on my screen:
cp: cannot stat ‘PerShip/.csv’: No such file or directory
cp: target ‘tmpship.csv’ is not a directory
I don't know what to do. Here are the programs. Could somebody help me, please?
#!/bin/bash
shipname=$1
imo=$(grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2)
cp PerShip/$imo'.csv' tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null)
grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2 > IMO.txt
idnumber=$(cut -b 4-10 IMO.txt)
echo $idnumber,$dist
#!/bin/bash
rm -f shipsdist.csv
for ship in $(cat shipsNAME-IMO.txt | cut -d "," -f 1)
do
./FindShipDistance "$ship" >> shipsdist.csv
done
cat shipsdist.csv | sort | head -n 1
The code and error messages presented suggest that the second script is calling the first with an empty command-line argument. That would certainly happen if input file shipsNAME-IMO.txt contained any empty lines or otherwise any lines with an empty first field. An empty line at the beginning or end would do it.
I suggest
using the read command to read the data, and manipulating IFS to parse out comma-delimited fields
validating your inputs and other data early and often
making your scripts behave more pleasantly in the event of predictable failures
More generally, using internal Bash features instead of external programs where the former are reasonably natural.
For example:
#!/bin/bash
# Validate one command-line argument
[[ -n "$1" ]] || { echo empty ship name 1>&2; exit 1; }
# Read and validate an IMO corresponding to the argument
IFS=, read -r dummy imo tail < <(grep -F -- "$1" shipsNAME-IMO.txt)
[[ -f PerShip/"${imo}.csv" ]] || { echo no data for "'$imo'" 1>&2; exit 1; }
# Perform the distance calculation and output the result
cp PerShip/"${imo}.csv" tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null) ||
{ echo "failed to compute ship distance for '${imo}'" 2>&1; exit 1; }
echo "${imo:3:7},${dist}"
and
#!/bin/bash
# Note: the original shipsdist.csv will be clobbered
while IFS=, read -r ship tail; do
# Ignore any empty ship name, however it might arise
[[ -n "$ship" ]] && ./FindShipDistance "$ship"
done < shipsNAME-IMO.txt |
tee shipsdist.csv |
sort |
head -n 1
Note that making the while loop in the second script part of a pipeline will cause it to run in a subshell. That is sometimes a gotcha, but it won't cause any problem in this case.
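A quick demonstration of that gotcha, for reference:
count=0
printf 'a\nb\n' | while read -r line; do ((count++)); done
echo "$count"  # prints 0: the loop ran in a subshell, so its assignment was lost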

Avoiding temporary files while iterating over pipeline results

I've created the following function to display interfaces and the IPs per interface:
network() {
iplst() {
ip a show "$i" | grep -oP "inet\s+\K[\w./]+" | grep -v 127
}
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u >> inf_list
netlist="inf_list"
while read -r i
do
infd=$i
paste <(echo -e $i) <(iplst)
done < $netlist
}
Current output:
ens32 10.0.0.2/24
10.0.0.4/24
10.0.0.20/24
ens33 192.168.1.3/24
ens34 192.168.0.2/24
ens35 192.168.2.149/24
but I would like to avoid the creation of temp files;
I would appreciate suggestions.
In general, temporary files can be replaced with process substitution. For instance, to avoid the inf_list temporary file, one can generate its contents with a build_inf_list function:
build_inf_list() {
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u
}
iplst() {
ip a show "$1" | grep -oP "inet\s+\K[\w./]+" | egrep -v '^127'
}
while read -r i; do
paste <(printf '%s\n' "$i") <(iplst "$i")
done < <(build_inf_list)
Some notes:
Passing (and using) explicit arguments makes a function's behavior much more obvious to a reader than relying on globals set elsewhere in your code, and reduces the chances that functions added in the future will stomp on variable names you're depending on.
A process substitution, <(...), is replaced with a filename which, when read from, returns the stdout of the command inside; thus, since what you're writing to your temporary file comes from such a command, you can simply replace the temporary file with a process substitution invocation.
Any shell where echo -e does not print -e on its output is defying black-letter POSIX. While bash is noncompliant in this manner by default, it's not noncompliant consistently -- if the posix and xpg_echo flags are both set, then bash complies with the letter of the standard. It's much safer to use printf, which is far more robustly defined. See also the APPLICATION USAGE and RATIONALE sections of the linked standard document, which explains how BSD and AT&T UNIX have traditionally incompatible versions of echo, and thus why the POSIX standard is so loose in the behavior it mandates.
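A quick illustration of the difference (bash's default behaviour shown; an XSI-conformant echo would print the -e itself):
echo -e 'a\tb'  # bash prints "a<TAB>b"; an XSI echo would print "-e a<TAB>b"
printf 'a\tb\n'  # portable: always prints "a<TAB>b"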
Final result, thanks @Charles Duffy
#!/bin/bash
build_inf_list() {
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u
}
iplst() {
if [ "$1" = "lo" ]; then
ip a show "$1" | grep -oP "inet\s+\K[\w./]+" | grep -v '^127'
else
ip a show "$1" | grep -oP "inet\s+\K[\w./]+"
fi
}
while read -r i; do
paste <(printf '%s\n' "$i") <(iplst "$i")
done < <(build_inf_list)

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
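It is used exactly like the original, e.g.:
tail /etc/passwd | hilite "\b(bin|[U1])\b"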
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed can read from stdin line-by-line so you don't need read
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
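For instance, one way to compare the two on identical input (GNU grep/sed assumed, /etc/passwd as sample data, the pattern illustrative only):
pattern='bin'
time sed -n "s/$pattern/\x1b[7m&\x1b[0m/gp" /etc/passwd
time { egrep "$pattern" /etc/passwd | sed "s/$pattern/\x1b[7m&\x1b[0m/g"; }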
Well, you could simply do this:
egrep "$1" $line | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
echo -n > $FILE
while read line
do
echo $line >> $FILE
done
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also takes a file/pathname as the second argument, for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"

How can I expand arguments to a bash function into a chain of piped commands?

I often find myself doing something like this a lot:
something | grep cat | grep bat | grep rat
when all I recall is that those three words must have occurred somewhere, in some order, in the output of something... Now, I could do something like this:
something | grep '.*cat.*bat.*rat.*'
but that implies ordering (bat appears after cat). As such, I was thinking of adding a bash function to my environment called mgrep which would turn:
mgrep cat bat rat
into
grep cat | grep bat | grep rat
but I'm not quite sure how to do it (or whether there is an alternative?). One idea would be to loop over the parameters like so:
while (($#)); do
grep $1 some_thing > some_thing
shift
done
cat some_thing
where some_thing is possibly some fifo, like when one does >(cmd) in bash, but I'm not sure. How would one proceed?
I believe you could generate a pipeline one command at a time, by redirecting stdin at each step. But it's much simpler and cleaner to generate your pipeline as a string and execute it with eval, like this:
CMD="grep '$1' " # consume the first argument
shift
for arg in "$@" # Add the rest in a pipeline
do
CMD="$CMD | grep '$arg'"
done
eval $CMD
This will generate a pipeline of greps that always reads from standard input, as in your model. Note that it protects spaces in quoted arguments, so that it works correctly if you write:
mgrep 'the cat' 'the bat' 'the rat'
Thanks to Alexis, this is what I did:
function mgrep() #grep multiple keywords
{
CMD=''
while (($#)); do
CMD="$CMD grep \"$1\" | "
shift
done
eval ${CMD%| }
}
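For example (input and output illustrative only):
printf '%s\n' 'the cat saw a bat and a rat' 'just a cat' | mgrep cat bat rat
# -> the cat saw a bat and a rat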
You can write a recursive function; I'm not happy with the base case, but I can't think of a better one. It seems a waste to need to call cat just to pass standard input to standard output, and the while loop is a bit inelegant:
mgrep () {
local e=$1;
# shift && grep "$e" | mgrep "$#" || while read -r; do echo "$REPLY"; done
shift && grep "$e" | mgrep "$#" || cat
# Maybe?
# shift && grep "$e" | mgrep "$#" || echo "$(</dev/stdin)"
}
