Bash: reading regexes from a file and substituting them into sed as a variable - bash

I am stuck on how sed interacts with variables. I am reading a list of regexes from a file and substituting each one into sed to mask sensitive information in a log file. If I hard-code the regex, sed works perfectly, but it behaves differently when the regex comes from a variable.
con-list.txt contains the following:
(HTTP\/)(.{2})(.*?)(.{2})(group\.com)
(end\sretrieve\sfacility\s)(.{2})(.*?)(.{3})$
I am not sure whether the dollar sign in the regex is interfering with the sed command.
input="/c/Users/con-list.txt"
inputfiles="/c/Users/test.log"
echo $inputfiles
while IFS= read -r var
do
#echo "Searching $var"
count1=`zgrep -E "$var" "$inputfiles" | wc -l`
if [ ${count1} -ne 0 ]
then
echo "total:${count1} ::: ${var}"
sed -r -i "s|'[$]var'|'\1\2XXXX\4\5'|g" $inputfiles #this doesnt work
sed -r -i "s/(HTTP\/)(.{2})(.*?)(.{2})(group\.com)/'\1\2XXXX\4\5'/g" $inputfiles #This works
egrep -in "${var}" $inputfiles
fi
done < "$input"
I need sed to accept the regex as a variable read from the file, so that I can automate the masking of sensitive information in logs.
$ ./zgrep2.sh
/c/Users/test.log
total:4 ::: (HTTP\/)(.{2})(.*?)(.{2})(group\.comp\.com\#GROUP\.COM)
sed: -e expression #1, char 30: invalid reference \5 on `s' command's RHS

Your idea was right, but $var has to appear as "$var" under double quotes for the shell to expand it. The '[$]var' in your command is a bracket expression that matches a literal $ followed by var, so nothing is expanded and the pattern contains no capture groups, which is consistent with the invalid reference \5 error.
Also, you don't need wc -l to count the matches. The grep family of utilities all implement a -c flag that prints the number of matching lines. That said, you don't even need to count the matches; you can use the exit status of the command (whether a match was found or not) simply as
if zgrep -qE "$var" "$inputfiles" ; then
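A minimal sketch of the difference between the two flags, assuming GNU grep and a throwaway sample file. Plain grep is used here instead of zgrep, and the file contents are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical sample log, created only for this demonstration.
log=$(mktemp)
printf '%s\n' 'HTTP/aa.group.com' 'no match here' 'HTTP/bb.group.com' > "$log"

# -c prints the number of matching lines.
count=$(grep -Ec 'group\.com' "$log")

# -q prints nothing; only the exit status says whether a match exists.
if grep -qE 'group\.com' "$log"; then
    found=yes
else
    found=no
fi

printf 'count=%s found=%s\n' "$count" "$found"
rm -f "$log"
```

The -q form stops at the first match, so it is usually the cheaper test when the count itself is not needed.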
Assuming you need the count for debugging purposes, you can keep your approach with the modifications to your script shown below.
Notice how var is interpolated in the sed substitution: "$var" is expanded under double quotes, while the rest of the sed script stays in single quotes so that the backslashes and back-references are passed to sed literally.
while IFS= read -r var
do
count1=$(zgrep -Ec "$var" "$inputfiles")
if [ "${count1}" -ne 0 ]
then
sed -r -i 's|'"$var"'|\1\2XXXX\4\5|g' "$inputfiles"
sed -r -i "s/(HTTP\/)(.{2})(.*?)(.{2})(group\.com)/'\1\2XXXX\4\5'/g" "$inputfiles"
egrep -in "${var}" "$inputfiles"
fi
done < "$input"
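The quoting can be checked on a single line without touching any file. This is only a sketch: the pattern and the log line are assumptions modelled on the question, and -E is used as the GNU sed equivalent of -r.

```shell
#!/usr/bin/env bash
# Hypothetical pattern and log line, for illustration only.
var='(HTTP\/)(.{2})(.*)(.{2})(\.group\.com)'
line='Got creds for HTTP/PPCKSAPOD81.group.com'

# The single-quoted parts keep \1..\5 literal for sed;
# the double-quoted "$var" in the middle is expanded by the shell.
masked=$(printf '%s\n' "$line" | sed -E 's|'"$var"'|\1\2XXXX\4\5|g')

printf '%s\n' "$masked"
```

One caveat with this sketch: because | is the delimiter, a pattern that itself uses | for alternation would break the command, so a delimiter that cannot occur in the patterns is the safer choice.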

You need:
sed -r -i "s/$var"'/\1\2XXXX\4\5/g' $inputfiles
You also need to provide sample input (a useful bit of the log file) so that we can verify our solutions.
EDIT: a slight change to $var and I think this is what you want:
$ cat ~/tmp/j
Got creds for HTTP/PPCKSAPOD81.group.com
Got creds for HTTP/PPCKSAPOD21.group.com
Got creds for HTTP/PPCKSAPOD91.group.com
Got creds for HTTP/PPCKSWAOD81.group.com
Got creds for HTTP/PPCKSDBOD81.group.com
Got creds for HTTP/PPCKSKAOD81.group.com
$ echo $var
(HTTP\/)(.{2})(.*?)(.{2})(.group\.com)
$ sed -r "s/$var"'/\1\2XXXX\4\5/' ~/tmp/j
Got creds for HTTP/PPXXXX81.group.com
Got creds for HTTP/PPXXXX21.group.com
Got creds for HTTP/PPXXXX91.group.com
Got creds for HTTP/PPXXXX81.group.com
Got creds for HTTP/PPXXXX81.group.com
Got creds for HTTP/PPXXXX81.group.com

Related

'sed: no input files' when using sed -i in a loop

I checked some solutions for this in other questions, but they are not working with my case and I'm stuck so here we go.
I have a CSV file whose contents I want to convert to uppercase. It has to be done with a loop and take up at least 7 lines of code. I have to run the script with this command:
./c_bash.sh student-mat.csv
So I tried this Script:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
I know there may be a couple of unnecessary things in it, but I want to focus on the sed command because it seems to be the problem here.
That script shows this output (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
Now that I see that it works, I want to apply that sed command permanently to the csv file, so I put -i after it:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
But instead of applying the changes, the output shows this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
sed: no input files
sed: no input files
sed: no input files
sed: no input files
sed: no input files
So checking a lot of different solutions on the internet, I also tried to change single quoting to double quoting.
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e "s/\(.*\)/\U\1/"
else
echo "$line"
fi
((c++))
done < student-mat.csv
But in this case, instead of applying the changes, it generates a file with 0 bytes. So there is no output when I do this:
cat student-mat.csv
What I expect is that running this script permanently changes all the data to uppercase. After running the script, cat student-mat.csv should show this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
sed works on files (or streams), not on single lines handed over by read. Do not read the lines in a loop; run sed on the file. sed can skip the first line by itself; see the sed manual.
You want:
sed -i -e '2,$s/\(.*\)/\U\1/' student-mat.csv
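This can be shortened to s/.*/\U&/.
Your code does not work the way you think it does. Without a file argument, sed reads standard input, which inside the loop is the rest of student-mat.csv; with -i it insists on a named file to edit, hence sed: no input files. Note also that your first version drops the second line from the output. Step by step, your code:
reads the first line with read -r line
prints the first line with echo "$line"
increments c
reads the second line with read -r line
then sed processes the rest of the file (from line 3 to the end) from standard input and prints it in upper case
then c is incremented
then read -r line fails because the input is exhausted, and the loop exits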
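The steps above can be reproduced on a tiny file. This is a sketch under assumptions: the four-line CSV is made up, and \U is a GNU sed extension, so other sed implementations may not accept it.

```shell
#!/usr/bin/env bash
# Hypothetical four-line CSV standing in for student-mat.csv.
csv=$(mktemp)
printf '%s\n' 'school,sex' 'gp,f' 'ms,m' 'gp,m' > "$csv"

# The loop from the question: sed inherits the loop's stdin and drains it,
# so the line that read fetched just before sed ran ("gp,f") is never printed.
loop_out=$(
    c=0
    while read -r line; do
        if [ "$c" -gt 0 ]; then
            sed -e 's/.*/\U&/'
        else
            echo "$line"
        fi
        c=$((c + 1))
    done < "$csv"
)

# The one-command version: uppercase every line from line 2 onward.
fixed_out=$(sed -e '2,$s/.*/\U&/' "$csv")

printf 'loop:\n%s\nfixed:\n%s\n' "$loop_out" "$fixed_out"
rm -f "$csv"
```

Adding -i to the second sed command (with the file name kept as its last argument) is what makes the change permanent.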

need to clean file via SED or GREP

I have these files
NotRequired.txt (contains the lines that need to be removed)
Need2CleanSED.txt (big file, needs to be cleaned)
Need2CleanGRP.txt (big file, needs to be cleaned)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the file above and want to remove its lines from Need2Clean???.txt. I tried with sed and grep, but without success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [] characters are causing the problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never do command file > file, by the way: the shell sets up the > file redirection, which truncates file, before command gets a chance to read it. Always do command file > tmp && mv tmp file instead.
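A small self-contained run of that approach, assuming miniature stand-ins for the two files (the file contents and the temporary directory are made up for the demonstration):

```shell
#!/usr/bin/env bash
# Hypothetical miniature versions of the two files from the question.
dir=$(mktemp -d)
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[wat-7600-1_414]' > "$dir/NotRequired.txt"
printf '%s\n' 'keep me' 'x [wat-7600-1_414] y' 'keep me too' '[abc-xyz_pqr-pe2_123]' \
    > "$dir/Need2Clean.txt"

# -F: fixed strings, so [ and ] are literal; -f: one pattern per line; -v: invert.
grep -vF -f "$dir/NotRequired.txt" "$dir/Need2Clean.txt" > "$dir/tmp" &&
    mv "$dir/tmp" "$dir/Need2Clean.txt"

cleaned=$(cat "$dir/Need2Clean.txt")
printf '%s\n' "$cleaned"
rm -rf "$dir"
```

Note that grep exits with a non-zero status when it selects no lines, so if every line of the big file were removed, the mv after && would be skipped and the original file would be left untouched.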
Your assumption is correct. A [...] construct matches any single character in that set, so to match the brackets literally you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can escape the brackets on the fly at the point where you redirect the file into the loop (using bash process substitution):
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")
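Finally, you can run sed on each HKline variable inside the loop (and use double quotes, "/$HKline/d", in the deleting sed command so that the variable is expanded):
HKline=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')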
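A minimal sketch of the per-variable escaping followed by the delete, assuming GNU sed and made-up sample data. Only [ and ] are escaped here; entries containing other regex metacharacters (such as . * ^ $ or the / delimiter) would need more escaping than this shows.

```shell
#!/usr/bin/env bash
# Hypothetical entry and data, for illustration only.
HKline='[wat-7600-1_414]'

# Escape [ and ] so that sed treats them as literal characters.
escaped=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')

# Double quotes let the shell expand $escaped inside the sed script.
kept=$(printf '%s\n' 'a [wat-7600-1_414] b' 'other line' | sed "/$escaped/d")

printf 'pattern: %s\nkept: %s\n' "$escaped" "$kept"
```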
Try GNU sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained by a shell pipe.
The first one reads NotRequired.txt all at once (sed -z), replaces each \n with |, escapes the [ and ] metacharacters as \[ and \], and wraps the result in /.../d. The second process uses that output as its script (-f -) for the input file, i.e. Need2CleanSED.txt. Output of the first process:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option if you want the output flushed as it is produced rather than in batches.

What does the at sign before a dollar sign (@$VAR) do in a sed string in a shell script?

What does @$VAR mean in shell? I don't get the use of @ in this case.
I encountered the following shell file while working on my dotfiles repo
#!/usr/bin/env bash
KEY="$1"
VALUE="$2"
FILE="$3"
touch "$FILE"
if grep -q "$1=" "$FILE"; then
sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE"
else
echo "export $KEY=\"$VALUE\"" >> "$FILE"
fi
and I'm struggling to understand the sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE" line, especially the use of @.
In sed you do not have to use the / character as the delimiter for the substitute command.
The @ or % characters, for example, work just as well:
echo A | sed s/A/B/
echo A | sed s@A@B@
echo A | sed s%A%B%
In the command
sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE"
the character @ is used as a delimiter in the s command of sed. The general form of the s (substitute) command is
s<delim><searchPattern><delim><replaceString><delim>[<flags>]
where the most commonly used <delim> is /, but other characters are sometimes used, especially when either <searchPattern> or <replaceString> contain (or might contain) slashes.
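A short sketch of why an alternative delimiter helps when the values contain slashes. The variable names and path values below are hypothetical and not taken from the script in the question:

```shell
#!/usr/bin/env bash
# Hypothetical values; note the slashes inside them.
old='/usr/local'
new='/opt'
line='PATH=/usr/local/bin'

# With @ as the delimiter, the slashes in $old and $new need no escaping.
with_at=$(printf '%s\n' "$line" | sed "s@$old@$new@")

# The same substitution with | as the delimiter.
with_bar=$(printf '%s\n' "$line" | sed "s|$old|$new|")

printf '%s\n%s\n' "$with_at" "$with_bar"
```

With / as the delimiter the same command would fail, because sed would see the slashes of the expanded values as extra delimiters.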

Output of bash loop iteration into next iteration

I have a list of substitutions that I would like to perform with sed. Instead of combining the substitutions into a single sed command, I would like to perform each substitution in an iteration of a bash loop. For example:
cat ${input} |
for subst in "${substlist}"; do
sed 's/'${subst}'/modified_'${subst}'/g'
done > ${output}
I would expect that each iteration modifies the entire stream but I'm only seeing that the first iteration gets the input.
Is this pattern possible in bash?
Create an array of -e options to pass to sed.
filters=()
for subst in ${substlist}; do
filters+=(-e "s/$subst/modified_$subst/")
done
sed "${filters[@]}" "$input" > "$output"
(The question of iterating over an unquoted parameter expansion and dynamically creating each sed filter is beyond the scope of this answer.)
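A runnable sketch of the array approach on a one-line input, assuming bash (arrays are not available in plain sh) and a made-up word list:

```shell
#!/usr/bin/env bash
# Hypothetical word list and input text, for illustration only.
substlist='foo bar'
text='foo bar baz'

filters=()
for subst in $substlist; do    # unquoted on purpose: split the list on whitespace
    filters+=(-e "s/$subst/modified_$subst/g")
done

# "${filters[@]}" expands to one argument per array element,
# so sed receives: -e s/foo/modified_foo/g -e s/bar/modified_bar/g
result=$(printf '%s\n' "$text" | sed "${filters[@]}")

printf '%s\n' "$result"
```

sed applies the -e expressions in order to each line, so a later substitution also sees text produced by an earlier one, which is the behaviour the question expected from the loop.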
Here is one way to do it as a single stream: build up the sed arguments from ${substlist} and call sed once:
#!/bin/sh
cat ${input} |
sed `for subst in ${substlist}; do
echo " -e s/${subst}/modified_${subst}/g "
done` > ${output}
Depending on what is in ${substlist} you may need to do additional escaping.
Copy the input file to the output file, then perform the sed substitutions using the -i option to keep overwriting that file.
cp "$input" "$output"
for subst in $substlist
do
sed -i "s/$subst/modified_$subst/g" "$output"
done
With BSD/OSX sed that needs to be:
sed -i '' "s/$subst/modified_$subst/g" "$output"

sed or grep to read between a set of parentheses

I'm trying to read a version number from between a set of parentheses, from this output of some command:
Test Application version 1.3.5
card 0: A version 0x1010000 (1.0.0), 20 ch
Total known cards: 1
What I'm looking to get is 1.0.0.
I've tried variations of sed and grep:
command.sh | grep -o -P '(?<="(").*(?=")")'
command.sh | sed -e 's/(\(.*\))/\1/'
and plenty of variations. No luck :-(
Help?
You were almost there! With grep -P, use backslashes, not double quotes, to give the parentheses their literal meaning:
grep -o -P '(?<=\().*(?=\))'
With GNU grep you can also use the \K escape sequence available in Perl mode:
grep -oP '\(\K[^)]+'
\K discards what has been matched so far. In this case the opening ( is dropped from the match.
Alternatively you could use awk:
awk -F'[()]' 'NF>1{print $2}'
The command splits input lines using parentheses as delimiters. Once a line has been split into multiple fields (meaning the parentheses were found), the version number is the second field and gets printed.
Btw, the sed command you've shown should be:
sed -ne 's/.*(\(.*\)).*/\1/p'
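The awk and sed commands can be checked against the sample output from the question. In this sketch the output is stored in a variable, which is an assumption made so that the example runs without command.sh:

```shell
#!/usr/bin/env bash
# The sample output from the question, stored in a variable for the demo.
sample='Test Application version 1.3.5
card 0: A version 0x1010000 (1.0.0), 20 ch
Total known cards: 1'

# awk: split on ( or ) and print the second field of lines that have one.
via_awk=$(printf '%s\n' "$sample" | awk -F'[()]' 'NF>1{print $2}')

# sed: print only lines where the substitution succeeded.
via_sed=$(printf '%s\n' "$sample" | sed -ne 's/.*(\(.*\)).*/\1/p')

printf 'awk: %s\nsed: %s\n' "$via_awk" "$via_sed"
```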
There are a couple of variations that will work. First with grep and sed:
grep '(' filename | sed 's/^.*[(]\(.*\)[)].*$/\1/'
or with a short shell script:
#!/bin/sh
while read -r line; do
value=$(expr "$line" : ".*(\(.*\)).*")
if [ "x$value" != "x" ]; then
printf "%s\n" "$value"
fi
done <"$1"
Both return 1.0.0 for your given input file.
