Avoiding temporary files while iterating over pipeline results - bash

I've created this function to display interfaces and the IPs per interface:
network() {
iplst() {
ip a show "$i" | grep -oP "inet\s+\K[\w./]+" | grep -v 127
}
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u >> inf_list
netlist="inf_list"
while read -r i
do
infd=$i
paste <(echo -e $i) <(iplst)
done < $netlist
}
Current output:
ens32 10.0.0.2/24
10.0.0.4/24
10.0.0.20/24
ens33 192.168.1.3/24
ens34 192.168.0.2/24
ens35 192.168.2.149/24
but I would like to avoid the creation of temp files;
I would appreciate suggestions.

In general, temporary files can be replaced with process substitution. For instance, to avoid the inf_list temporary file, one can generate its contents with a build_inf_list function:
build_inf_list() {
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u
}
iplst() {
ip a show "$1" | grep -oP "inet\s+\K[\w./]+" | grep -v '^127'
}
while read -r i; do
paste <(printf '%s\n' "$i") <(iplst "$i")
done < <(build_inf_list)
Some notes:
Passing (and using) explicit arguments makes the data flow much more obvious to a reader than relying on globals set elsewhere in your code, and reduces the chances that functions added in the future will stomp on variable names you're depending on.
A process substitution, <(...), is replaced with a filename which, when read from, returns the stdout of the command ...; thus, since what you were writing to your temporary file comes from such a command, you can simply replace the temporary file with a process substitution.
Any shell where echo -e does not print -e on its output is defying black-letter POSIX. While bash is noncompliant in this manner by default, it's not noncompliant consistently -- if the posix and xpg_echo flags are both set, then bash complies with the letter of the standard. It's much safer to use printf, which is far more robustly defined. See also the APPLICATION USAGE and RATIONALE sections of the linked standard document, which explains how BSD and AT&T UNIX have traditionally incompatible versions of echo, and thus why the POSIX standard is so loose in the behavior it mandates.
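To see the loop-over-a-process-substitution pattern in isolation, here is a runnable sketch in which fake_inf_list is an invented stand-in for build_inf_list, so the demo works on machines without ip:

```shell
#!/usr/bin/env bash
# fake_inf_list stands in for build_inf_list; it emits a deliberately
# unsorted, duplicated interface list and dedupes it, just as the
# real pipeline's `sort -u` does.
fake_inf_list() {
  printf '%s\n' ens33 ens32 ens32 | sort -u
}

# No temporary file: the while loop reads straight from the
# process substitution.
while read -r i; do
  printf 'saw %s\n' "$i"
done < <(fake_inf_list)
```

Because the redirection feeds the loop directly, nothing is ever written to disk, and the loop body runs in the current shell.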

Final result, thanks @Charles Duffy:
#!/bin/bash
build_inf_list() {
ip ntable | grep -oP "dev\s+\K[\w./]+"| grep -v lo | sort -u
}
iplst() {
if [ "$1" = "lo" ]; then
ip a show "$1" | grep -oP "inet\s+\K[\w./]+" | grep -v '^127'
else
ip a show "$1" | grep -oP "inet\s+\K[\w./]+"
fi
}
while read -r i; do
paste <(printf '%s\n' "$i") <(iplst "$i")
done < <(build_inf_list)


Bad Substitution error with pdfgrep as variable?

I'm using a bash script to parse information from a PDF and use it to rename the file (with the help of pdfgrep). However, after some working, I'm receiving a "Bad Substitution" error with line 5. Any ideas on how to reformat it?
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+")
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+")
$({ read dobmonth; read dobday; read dobyear; } < (pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+"))
# Check id1 is found, else do nothing
if [ ${#id1} ]; then
mv "$f" "${id1}_${id2}_${printf '%02d-%02d-%04d\n' "$dobmonth" "$dobday" "$dobyear"}.pdf"
fi
done
There are several unrelated bugs in this code; a corrected version might look like the following:
#!/usr/bin/env bash
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+") || continue
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+") || continue
{ read dobmonth; read dobday; read dobyear; } < <(pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+")
printf -v date '%02d-%02d-%04d' "$dobmonth" "$dobday" "$dobyear"
mv -- "$f" "${id1}_${id2}_${date}.pdf"
done
< (...) isn't meaningful bash syntax. If you want to redirect from a process substitution, you should use the redirection syntax < and the process substitution <(...) separately.
$(...) generates a subshell -- a separate process with its own memory, such that variables assigned in that subprocess aren't exposed to the larger shell as a whole. Consequently, if you want the contents you set with read to be visible, you can't have them be in a subshell.
${printf ...} isn't meaningful syntax. Perhaps you wanted a command substitution? That would be $(printf ...), not ${printf ...}. However, it's more efficient to use printf -v varname 'fmt' ..., which avoids the overhead of forking off a subshell altogether.
Because we put the || continues on the id1=$(... | grep ...) command, we no longer need to test whether id1 is nonempty: The continue will trigger and cause the shell to continue to the next file should the grep fail.
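The printf -v behavior is easy to check in isolation; the date values below are made up for the demo:

```shell
#!/usr/bin/env bash
# printf -v writes the formatted result directly into the named
# variable, with no subshell fork. Sample values are invented.
dobmonth=5 dobday=21 dobyear=1996
printf -v date '%02d-%02d-%04d' "$dobmonth" "$dobday" "$dobyear"
echo "$date"
```

The %02d/%04d conversions zero-pad the fields, so the result here is 05-21-1996.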
Do what Charles suggests wrt creating the new file name, but you might consider a different approach to parsing the PDF file to reduce how many pdfgreps, pipes, and greps you're running on each file. I don't have pdfgrep on my system, nor do I know what your input file looks like, but if we use this input file:
$ cat file
foo
ID #: M13
foo
Date Of Birth: 05 21 1996
foo
Second ID: V27
foo
and grep -E in place of pdfgrep, then here's how I'd get the info: read the input file once and parse that output with awk, instead of reading it multiple times with pdfgrep and using multiple pipes and greps to extract each piece:
$ grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
M13 V27 05 21 1996
Given that, you can use the same read approach to save the output in variables (or an array). You obviously may need to massage the awk command depending on what your pdfgrep output actually looks like.
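For instance, feeding the awk output to read via a process substitution (here using the sample input above in a shell variable, as a stand-in for real pdfgrep output):

```shell
#!/usr/bin/env bash
# Sample input, copied from the file shown above; in real use this
# would come from pdfgrep instead of a here-string.
input='foo
ID #: M13
foo
Date Of Birth: 05 21 1996
foo
Second ID: V27
foo'

# One pass: grep narrows to the interesting lines, awk keys each
# value by its label, read splits the single output line into vars.
read -r id1 id2 dobmonth dobday dobyear < <(
  grep -E -i '(ID #|Second ID|Date Of Birth): ' <<<"$input" |
    awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
)
echo "$id1 $id2 $dobmonth-$dobday-$dobyear"
```

The process substitution keeps read in the current shell, so the variables remain visible afterwards.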

Terminate tail command after timeout

I'm capturing stdout (a log) in a file: using tail -f file_name, I save lines containing a specific string with grep, and use sed to exit the tail:
tail -f log.txt | sed /'INFO'/q | grep 'INFO' > info_file.txt
This works fine, but I want to terminate the command in case it does not find the pattern (INFO) in the log file after some time.
I want something like this (which does not work) to exit the script after a timeout (60sec):
tail -f log.txt | sed /'INFO'/q | grep 'INFO' | read -t 60
Any suggestions?
This seems to work for me...
read -t 60 < <(tail -f log.txt | sed /'INFO'/q | grep 'INFO')
Since you only want to capture one line:
#!/bin/bash
IFS= read -r -t 60 line < <(tail -f log.txt | awk '/INFO/ { print; exit; }')
printf '%s\n' "$line" >info_file.txt
For a more general case, where you want to capture more than one line, the following uses no external commands other than tail:
#!/usr/bin/env bash
end_time=$(( SECONDS + 60 ))
while (( SECONDS < end_time )); do
IFS= read -t 1 -r line && [[ $line = *INFO* ]] && printf '%s\n' "$line"
done < <(tail -f log.txt)
A few notes:
SECONDS is a built-in variable in bash which, when read, retrieves the time in seconds since the shell was started. (It loses this behavior after being the target of any assignment -- avoiding such mishaps is part of why the POSIX variable-naming convention, which reserves names with lowercase characters for application use, is valuable.)
(( )) creates an arithmetic context; all content within is treated as integer math.
<( ) is a process substitution; it evaluates to the name of a file-like object (named pipe, /dev/fd reference, or similar) which, when read from, will contain output from the command contained therein. See BashFAQ #24 for a discussion of why this is more suitable than piping to read.
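Here is a self-contained variant of that deadline loop, with tail -f replaced by a finite printf so the demo terminates on its own; the || break on EOF is added only for that reason and isn't needed with a real tail -f:

```shell
#!/usr/bin/env bash
# Deadline loop: stop after 5 seconds, or (in this finite demo)
# when the input is exhausted.
end_time=$(( SECONDS + 5 ))
matches=0
while (( SECONDS < end_time )); do
  IFS= read -t 1 -r line || break    # EOF/read-timeout ends the demo
  [[ $line = *INFO* ]] && (( ++matches ))
done < <(printf '%s\n' 'DEBUG a' 'INFO b' 'INFO c')
echo "$matches matching lines"
```

Since all three sample lines arrive immediately, the loop counts the two INFO lines and exits on end-of-input, well before the deadline.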
The timeout command (part of GNU coreutils) seems suitable:
timeout 1m tail -f log.txt | grep 'INFO'

How to process values from for loop in shell script

I have below for loop in shell script
#!/bin/bash
#Get the year
curr_year=$(date +"%Y")
FILE_NAME=/test/codebase/wt.properties
key=wt.cache.master.slaveHosts=
prop_value=""
getproperty(){
prop_key=$1
prop_value=`cat ${FILE_NAME} | grep ${prop_key} | cut -d'=' -f2`
}
#echo ${prop_value}
getproperty ${key}
#echo "Key = ${key}; Value="${prop_value}
arr=( $prop_value )
for i in "${arr[@]}"; do
echo $i | head -n1 | cut -d "." -f1
done
The output I am getting is as below.
test1
test2
test3
I want to substitute each value from the results above (test2, for example) into the script below in place of 'ABCD':
grep test12345 /home/ptc/storage/**'ABCD'**/apache/$curr_year/logs/access.log* | grep GET > /tmp/test.access.txt
I tried all the options but could not succeed, as I am new to shell scripting.
Ignoring the many bugs elsewhere and focusing on the one piece of code you say you want to change:
for i in "${arr[@]}"; do
val=$(echo "$i" | head -n1 | cut -d "." -f1)
grep test12345 /dev/null "/home/ptc/storage/$val/apache/$curr_year/logs/access.log"* \
| grep GET
done > /tmp/test.access.txt
Notes:
Always quote your expansions. "$i", "/path/with/$val/"*, etc. (The * should not be quoted on the assumption that you want it to be expanded).
for i in $prop_value would have the exact same (buggy) behavior; using arr buys you nothing. If you want using arr to increase correctness, populate it correctly: read -r -a arr <<<"$prop_value"
The redirection is moved outside the loop -- that way the second iteration through the loop doesn't overwrite the file written by the first one.
The extra /dev/null passed to grep ensures that its behavior is consistent regardless of the number of matches; otherwise, it would display filenames only if more than one matching log file existed, and not otherwise.
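The /dev/null trick is easy to demonstrate on its own: with a single file argument grep omits the "filename:" prefix, while a second (always-empty) file forces it into multi-file mode, so the prefix appears consistently:

```shell
#!/usr/bin/env bash
# Create a throwaway file containing one matching line; the trap
# cleans it up when the script exits.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
echo 'GET /index.html' > "$tmp"

grep GET "$tmp"             # one file: no filename prefix
grep GET /dev/null "$tmp"   # two files: output is prefixed with $tmp:
```

That consistency matters in the loop above because the glob may expand to one or several log files.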

shell script - trying not to use tmp files

How can I do that without tmp1 and tmp2?
(information files are good)
cat information_file1 | sed -e 's/\,/\ /g' >> tmp1
echo Messi >> tmp2
cat tmp1 | grep Ronaldo | cut -d"=" -f2- >> tmp2
rm tmp1
cat information_file2 | fin_func tmp2
rm tmp2
Here's fin_func for your insight. (It's not really the func, and I don't want to change it; it's just so you can see how I use tmp2 and information_file2.)
while read -a line; do
if [[ "`grep $line $1`" != "" ]]; then
echo 1
fi
done
This should work, although it's pretty incomprehensible:
cat information_file2 | fin_func <(cat <(echo Messi) <(cat information_file1 | \
sed -e 's/\,/\ /g' | grep Ronaldo | cut -d"=" -f2-))
The <( … ) syntax is Bash's process substitution: it expands to the name of a file (typically a /dev/fd file descriptor) from which the output of the enclosed command can be read.
The sample fin_func reads through the file given as a command argument multiple times, so unless we are allowed to modify that function at least one temporary file will be necessary. The sample fin_func given in the question can be easily modified so that it does not read the file multiple times, but since you indicate that this is not the real script I will assume it cannot be modified and must take a file as an argument. That said, I would write your script as:
trap 'rm -f $TMPFILE' 0 # in bash, just trapping on 0 will work for SIGINT, etc
TMPFILE=$( mktemp )
{ echo Messi
tr , ' ' < information_file1 |
awk -F= '/Ronaldo/{print $2}' ; } > $TMPFILE
< information_file2 fin_func $TMPFILE
I strongly suspect that fin_func could be rewritten so that it does not require a regular file as input. Also, there's no need for the tr, as you could gsub in awk only on matching lines and save a bit of processing, but that is probably a trivial optimization. Still, using tr instead of sed is aesthetically preferable.
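For example, the tr stage can be folded into awk like so (the sample lines are invented to match the Ronaldo/Messi key=value scheme):

```shell
#!/usr/bin/env bash
# gsub only runs on lines matching /Ronaldo/; modifying $0 causes awk
# to re-split the fields on "=", so $2 is the comma-free value.
printf '%s\n' 'name=Lionel,Messi' 'name=Cristiano,Ronaldo' |
  awk -F= '/Ronaldo/{gsub(/,/," "); print $2}'
```

This does the filtering, the comma replacement, and the field extraction in a single awk pass.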

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
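The grep-and-replace-in-one-pass behavior is easy to verify with a plain substitution (ANSI escapes left out here so the output stays readable):

```shell
#!/usr/bin/env bash
# sed -n suppresses default output; the trailing p prints only lines
# where the s/// substitution actually matched -- grep and replace
# in a single pass over stdin.
printf '%s\n' 'no match here' 'bin is here' |
  sed -n 's/bin/[bin]/gp'
```

Only the second input line survives, with every occurrence of the pattern wrapped.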
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed can read from stdin line-by-line so you don't need read
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
Well, you could simply do this:
egrep "$1" $line | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
echo -n > $FILE
while read line
do
echo $line >> $FILE
done
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also takes a file/pathname as an optional second argument, falling back to stdin for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"
