How to escape a previously unknown string in regular expression?

I need to egrep a string that isn't known before runtime and that I'll get via shell variable (shell is bash, if that matters). Problem is, that string will contain special characters like braces, spaces, dots, slashes, and so on.
If I know the string I can escape the special characters one at a time, but how can I do that for the whole string?
Running the string through a sed script that prefixes each special character with \ could work, but I still need to RTFM on how to write such a script, and I don't know whether there are other, better options.
I did read re_format(7), but there seems to be no such thing as "take the whole next string as a literal"...
EDIT: to avoid false positives, I should also anchor the pattern to the start of the line, e.g. egrep '^myunknownstring'

If you need to embed the string into a larger expression, sed is how I would do it.
s_esc="$(printf '%s\n' "$s" | sed 's/[^-A-Za-z0-9_]/\\&/g')" # backslash every character that is not alphanumeric, - or _
inv_ent="$(egrep "^item [0-9]+ desc $s_esc loc .+$" inventory_list)"
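One caveat with backslashing every non-alphanumeric character: some escaped ordinary characters are operators in their own right (in GNU grep, for example, \< and \> are word boundaries), so a string containing < could stop matching itself. Below is a minimal sketch that escapes only the ERE metacharacters instead. The function name ere_escape and the sample inventory lines are made up for illustration, and strings containing newlines are not handled.

```shell
# Hypothetical helper: backslash-escape only the characters that are
# special in an extended regular expression (ERE).
ere_escape() {
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

s='(.*+[a-z]){3}'              # stands in for a string known only at runtime
s_esc=$(ere_escape "$s")

# Made-up sample data; only the first line contains the literal string,
# although the unescaped string, read as a regex, would also match "aaa".
printf '%s\n' \
    'item 12 desc (.*+[a-z]){3} loc shelf' \
    'item 13 desc aaa loc bin' |
    grep -E "^item [0-9]+ desc $s_esc loc .+$"
```

Only the "item 12" line should be printed.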

Use the -F flag to make the PATTERN a fixed literal string
$ var="(.*+[a-z]){3}"
$ echo 'foo bar (.*+[a-z]){3} baz' | grep -F "$var" -o
(.*+[a-z]){3}
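Note that -F cannot be combined with a ^ anchor, because the anchor would be taken literally too. If the whole line must equal the string, grep -Fx does that. For the "line starts with the string" case from the question's edit, one possible sketch uses awk's index() function, which does a plain substring search. The function name grep_literal_prefix is hypothetical, and the string is passed through the environment so that awk does not process backslashes in it.

```shell
# Hypothetical helper: print the lines of stdin that begin with the
# literal string given as $1 (no regex interpretation at all).
grep_literal_prefix() {
    needle=$1 awk 'index($0, ENVIRON["needle"]) == 1'
}

var='(.*+[a-z]){3}'
printf '%s\n' \
    '(.*+[a-z]){3} at the start' \
    'not at the start: (.*+[a-z]){3}' |
    grep_literal_prefix "$var"
```

Only the first sample line should be printed, since index() returns 1 exactly when the string occurs at the very beginning of the line.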

Are you trying to protect the string from being incorrectly interpreted as bash syntax or are you trying to protect parts of the string from being interpreted as regular expression syntax?
For bash protection:
grep supports the -f switch:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
No escaping is necessary inside the file. Just make it a file containing a single line (and thus one pattern) which can be produced from your shell variable if that's what you need to do.
# example trivial regex
var='^r[^{]*$'
pattern=/tmp/pattern.$$
rm -f "$pattern"
echo "$var" > "$pattern"
egrep -f "$pattern" /etc/passwd
rm -f "$pattern"
Just to illustrate the point.
Try it with -F instead as another poster suggested for regex protection.
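A variation on the same idea, as a sketch only: it combines -f with -F so the pattern file is read as a literal string, and it uses mktemp rather than a predictable /tmp name. The function name grep_literal_via_file is made up, and mktemp is assumed to be installed (it is not part of POSIX, though it is common on Linux and BSD systems).

```shell
# Hypothetical helper: search stdin for the literal string in $1,
# handing the string to grep through a temporary pattern file.
grep_literal_via_file() {
    pattern_file=$(mktemp) || return 1
    printf '%s\n' "$1" > "$pattern_file"
    grep -F -f "$pattern_file"
    status=$?
    rm -f "$pattern_file"
    return "$status"
}

var='(.*+[a-z]){3}'
printf '%s\n' 'foo (.*+[a-z]){3} bar' 'foo aaa bar' |
    grep_literal_via_file "$var"
```

Only the line that contains the string verbatim should come out.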

Related

Replacing a parameter in bash using sed

Trying to clean up several dozen redundant nagios config files, but sed isn't working for me (yes I'm fairly new to bash), here's the string I want to replace:
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
with this:
use my-template-service
host_name myhost
Just the host_name line should stay unchanged, since it will be different for each file. Any help will be greatly appreciated. I tried escaping the ' and !, but got this error: -bash: !'-p: event not found
Thanks

Disclaimer: This question is somewhat light on info and rings a bit like "write my code for me". In good faith I'm assuming that it's not that, so I am answering in hopes that this can be used to learn more about text processing/regex substitutions in general, and not just to be copy-pasted somewhere and forgotten.
I suggest using perl instead of sed. While sed is often the right tool for the job, in this case I think Perl's better, for the following reasons:
Perl lets you easily do multi-line matches on a regex. This is possible with sed, but difficult (see this question for more info).
With multiple lines and complex delimiters and quote characters, sed starts to display different behavior depending on what platform you're using it on. For example, trying to do this with sed in "sorta multiline" mode gave me different results on OSX versus Linux (really GNU sed vs BSD sed). When using semi-advanced functionality like that, I'd stick with a tool that behaves consistently across platforms, which Perl does in this case.
Perl lets you deal with ASCII values and other special characters without a ton of "toothpick tower" escaping or subshelling. Since it's convenient to use ASCII values to match the single quotes in your pattern (we could use mixed double and single quotes instead, but that makes it harder to copy/paste this command into, say, a subshell or an eval'd part of a script), it's better to use a tool that supports this without extra hassle. It's possible with sed, but tricky; see this article for more info.
In sed/BRE, doing something as simple as a "one or more" match usually requires escaping special characters, aka [[:space:]]\{1,\}, which gets tedious. Since it's convenient to use a lot of repetition/grouping characters in this pattern, I prefer Perl for conciseness in this case, since it improves clarity of the matching code.
Perl lets you write comments in regex statements in one-liner mode via the x modifier. For big, multiline patterns like this one, having the pattern broken up and commented for readability really helps if you ever need to go back and change it. sed has comments too, but using them in single-pasteable-command mode (as opposed to a file of sed script code) can be tricky, and can result in less readable commands.
Anyway, following is the matcher I came up with. It's commented inline as much as I can make it, but the non-commented parts are explained here:
The -0777 switch tells perl to consume input files whole before processing them, rather than operating line-by-line. See perlrun for more info on this and the other flags. Thanks to @glennjackman for pointing this out in the comments on the original question!
The -p switch tells Perl to read its input (STDIN here) one record at a time, run the supplied program on each record, and print the record afterwards. Because -0777 makes the whole input a single record, and our "program" is just a substitution that edits the record in place, what gets printed is the text with the substitution applied.
The -e switch tells perl to evaluate the next string argument for a program to run, rather than finding a script file or similar.
Input is redirected from mytext.txt, which stands for a file containing the block you want to replace. You could also pipe input to Perl, e.g. via cat mytext.txt | perl ..., and it would work exactly the same way.
The regex modifiers work as follows: the m modifier makes ^ and $ match at line boundaries (this pattern matches the \n characters explicitly and has no such anchors, so m is harmless here rather than required), and the extended x modifier allows comments and turns off matching of literal whitespace, for clarity. You could get rid of comments and literal whitespace and splat it all into one line if you wanted, but good luck making any changes after you've forgotten what it does. See perlre for more info on these modifiers.
This command will replace the literal string you supplied, in a file that contains it (it can have more than just that string before/after it; only that block of text will be manipulated). It is less than literal in one minor way: it allows any number (one or more) of space characters between the first and second words in each line. If I remember Nagios configs, the number of spaces doesn't particularly matter anyway.
This command will not change the contents of a file it is supplied. If a file does not match the pattern, its contents will be printed out unchanged by this command. If it contains that pattern, the replaced contents will be printed out. You can write those contents to a new file, or do anything you like with them.
perl -0777pe '
# Use the pipe "|" character as an expression delimiter, since
# the pattern contains slashes.
s|
# "use", one or more whitespace characters, and then "store-service",
# on one line. (No single quotes in these comments: the whole program
# sits inside a single-quoted shell string.)
use \s+ store-service \n
# Open a capturing group.
(
# Capture the host name line in its entirety.
host_name \s+ \S+
# Close the group and end the line.
) \n
service_description \s+ HTTP_JVM_SYM_DS \n
# Look for check_command, spaces, and check_http!, but keep matching on the
# same line.
check_command \s+ check_http!
# Look for a single quote character by ASCII value, since shell
# escaping these can be ugly/tricky, and makes your code less copy-
# pasteable in/out of scripts/subcommands.
\047
# Look for the arguments to check_http, delimited by explicit \s
# spaces, since we are in "extended" mode in order to be able to write
# these comments and the expression on multiple lines.
-p \s 8080 \s -N \s -u \s /SymmetricDS/app
# Look for another single quote and the end of the line.
\047 \n
check_interval \s+ 1\n
# Replace all of the matched text with the "use my-template-service" line,
# followed by the contents of the first matching group (the host_name line).
# You could capture the "use" statement in another group, or use e.g.
# sprintf() to align fields here instead of a big literal space line, but
# this is the simplest, most obvious way to get the replacement done.
|use my-template-service\n$1|mx
' < mytext.txt

Assuming you can write a glob that selects the config files of interest, I would first narrow them down to the files that have exactly five lines.
You can do that with Bash and awk:
for fn in *; do # make that glob apply to your files...
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk 'END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
# at this point you only have files with 5 lines of text...
done
Once you have done that, you can add another awk to the loop to make the replacements:
for fn in *; do
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk 'END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
awk 'BEGIN{tgt["use"]="my-template-service"
tgt["host_name"]=""}
$1 in tgt { if (tgt[$1]=="") s=$2
else s=tgt[$1]
printf "%-33s%s\n", $1, s
}
' "$fn"
done

This is a GNU sed solution. Back up your files before testing.
#!/bin/bash
# Escape with a backslash every character in this string that is special in a
# regex (such as $, ^, /, {, }) but that you need interpreted literally.
# Of those, your original string contains only slashes. Instead of escaping
# them, I changed sed's s/pattern/replacement/ command to s|pattern|replacement|.
# You can pick any other delimiter that suits you better.
needle="use\s{1,}store-service\n\
host_name\s{1,}myhost\n\
service_description\s{1,}HTTP_JVM_SYM_DS\n\
check_command\s{1,}check_http!'-p 8080 -N -u /SymmetricDS/app'\n\
check_interval\s{1,}1"
replacement="use my-template-service\n\
host_name myhost"
# This echo command displays the generated substitute command
# that sed will use. Uncomment it to view the command.
# echo "s|$needle|$replacement|"
# To change the file in place, add the -i option.
sed -r "
/use\s{1,}store-service/ {
N;N;N;N;
s|$needle|$replacement|
}" input.txt
Input
one
two
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
three
four
Output
one
two
use my-template-service
host_name myhost
three
four

Extracting snmpdump values (with an exact MIB) from a shell script

I have an SNMP dump:
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.4.0|6|""
1.3.6.1.2.1.1.5.0|6|"hgfdhg-4365.gfhfg.dfg.com"
1.3.6.1.2.1.1.6.0|6|""
1.3.6.1.2.1.1.7.0|2|6
1.3.6.1.2.1.1.8.0|7|0
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129
1.3.6.1.2.1.1.9.1.2.2|5|1.3.6.1.4.1.9.7.115
I need to extract everything that follows 1.3.6.1.2.1.1.2.0|5| on the first line, without including that prefix in the output. So the result should be 1.3.6.1.4.1.9.1.1178. I've tried this regex:
\b1.3.6.1.2.1.1.2.0\|5\|\s*([^\n\r]*)
But without any success. If a regular expression, or grep, is in fact the right tool, can you help me find the right regex? Otherwise, what tools should I consider instead?

With GNU grep built with PCRE support, you can use Perl's \K escape to discard the part of the match that precedes it:
grep -Po "1\.3\.6\.1\.2\.1\.1\.2\.0\|5\|\K.*"
-P enables Perl-compatible regex mode, and -o prints only the matched parts rather than whole lines.
I had to escape the characters that have special meaning in Perl regexes, but as 123 suggests, this can be avoided by enclosing the characters to be taken literally between \Q and \E:
grep -Po "\Q1.3.6.1.2.1.1.2.0|5|\E\K.*"
I would usually solve this with sed as follows:
sed -n 's/1\.3\.6\.1\.2\.1\.1\.2\.0|5|\(.*\)/\1/p'
The -n flag disables implicit output; the substitution removes the searched prefix from the line, and the p flag prints what is left on the lines where it matched.
The characters that have special meaning in GNU Basic Regular Expressions (BRE) must be escaped, which in this case means only the dot (.). Also note that the grouping tokens are \( and \) rather than the usual ( and ).
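Since the dump is pipe-delimited, the extraction can also be sketched without any regex by comparing fields in awk. The function name extract_value is made up for illustration. The OID and type are compared as plain values, so the dots need no escaping.

```shell
# Hypothetical helper: print the third |-separated field of every line
# whose first field equals the OID in $1 and whose second field equals
# the type in $2.
extract_value() {
    awk -F'|' -v oid="$1" -v type="$2" '$1 == oid && $2 == type { print $3 }'
}

# The first three lines of the dump from the question.
printf '%s\n' \
    '1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178' \
    '1.3.6.1.2.1.1.3.0|7|1881685367' \
    '1.3.6.1.2.1.1.4.0|6|""' |
    extract_value '1.3.6.1.2.1.1.2.0' 5
```

For this input the sketch should print 1.3.6.1.4.1.9.1.1178. A value that itself contains a | would be cut at that character, so this only suits dumps where the third field has none.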

An alternate way to do this is in native shell, without any regexes at all. Consider:
prefix='1.3.6.1.2.1.1.2.0|5|'
while read -r line; do
[[ $line = "$prefix"* ]] && printf '%s\n' "${line#"$prefix"}"
done
If your original string is piped into the while read loop, the output is precisely 1.3.6.1.4.1.9.1.1178.

sed - replace patterns with spaces on OS X

I have a bunch of text files (named install*) in which I need to replace the expression curl -L with the expression curl -k -L. I am on OS X 10, Yosemite.
The following attempts don't work:
sed -e "s/'curl -L'/'curl -k -L'/g" install*
sed -i '' -e "s/'curl -L'/'curl -k -L'/g" install*
The contents of the files are shown (as if I had typed cat), but replacement isn't performed.
What am I doing wrong?

Your problem is that you're nesting quoting mechanisms (delimiters) when you shouldn't.
Generally, you should use single quotes to enclose an sed script as a whole:
sed -i '' -e 's/curl -L/curl -k -L/g' install*
By using single quotes, the script is protected from accidental interpretation by the shell.
If you do want the shell to interpret parts of a sed script beforehand, splice those parts in as double-quoted strings.
To sed's s function, it is / that acts as the delimiter - no additional quoting of the strings between / instances should be used, and your use of ' simply causes sed to consider these single quotes part of the regex to match / the replacement string. (As an aside: you're free to choose a delimiter other than /).
That said, if you do need to force interpretation of certain characters as literals, use \-escaping, such as using \. in the regex to represent a literal .
This is the only quoting mechanism that sed itself supports.
However, in your particular case, neither the string used in the regex nor the string used as the replacement string need escaping.
For a generic way to escape strings for use as literals in these situations, see
https://stackoverflow.com/a/29613573/45375.
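To illustrate the splicing mentioned above, here is a small sketch in which most of the script stays single-quoted and only the variable part is double-quoted. The function name add_curl_flag and the sample line are made up, and the spliced value is assumed to contain no /, & or \ characters, which would need escaping first.

```shell
# Hypothetical helper: insert the option given as $1 between "curl"
# and "-L" on every line of stdin.
add_curl_flag() {
    # Single-quoted script pieces, with "$1" spliced in between them.
    sed 's/curl -L/curl '"$1"' -L/g'
}

printf '%s\n' 'curl -L some-url | sh' | add_curl_flag -k
```

With the sample line this should print: curl -k -L some-url | sh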

You need -i.
GNU sed uses -i without arguments to mean "replace in-place".
BSD sed needs an empty argument, so you'll have to use sed -i '' -e '...' install*
There's no portable way to do it with sed...
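If the same script has to run against both GNU and BSD sed, one workaround is to skip -i altogether and go through a temporary file. A rough sketch follows; the function name sed_in_place is made up, and mktemp is assumed to be available. Copying the result back with cat keeps the original file's permissions.

```shell
# Hypothetical helper: apply the sed script in $1 to each remaining
# argument, rewriting the files without relying on sed -i.
sed_in_place() {
    script=$1
    shift
    for file in "$@"; do
        tmp=$(mktemp) || return 1
        if sed -e "$script" "$file" > "$tmp"; then
            cat "$tmp" > "$file"    # overwrite contents, keep permissions
            rm -f "$tmp"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Usage, with the file names from the question:
# sed_in_place 's/curl -L/curl -k -L/g' install*
```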

Search for single quoted grep strings?

I have a string, "$server['fish_stick']" (disregard double quotes)
I don't know how to successfully grep for an exact match for this string. I've tried many ways.
I've tried,
rgrep -i \$'server'\[\''fish'\_'stick'\'\] .
rgrep -i "\$server\[\'fish\_stick\'\]" .
rgrep -i '\$server\[\'fish\_stick\'\]' .
Is it single quotes that are causing my issue?
When I echo the first grep out it shows exactly what I want to search but returns garbage results like anything with $server in it.
Please help and explain, thank you!

The main problem here is that you are not quoting the argument being passed to grep. The only things that need to be escaped are $ (written \$ inside double quotes) and the brackets []. If you want the exact string (not a regex), just use fgrep (grep -F), which does exact string matching:
grep -F "\$server['fish_stick']"
Works on my system:
$ foo="\$server['fish_stick']"
$ echo "$foo" | grep -F "\$server['fish_stick']"
$server['fish_stick']
Using regex:
$ echo "$foo" | grep "\$server\['fish_stick'\]"
$server['fish_stick']
Using regex and handling nested single quotes:
$ echo "$foo" | grep '\$server\['\''fish_stick'\''\]'
$server['fish_stick']
Inside single quotes, a nested single quote cannot be escaped. You have to close the quoted string, add an escaped quote (\'), and then reopen the quotes.
http://mywiki.wooledge.org/Quotes

I don't suppose you're asking how to get that string into a variable without having quoting issues. If you are, here's a way using a here-document:
str=$(cat <<'END'
$foo['bar']
END
)
To address your concern about escaping special characters for grep, you could use sed to put a backslash before any non-alphanumeric character:
grep "$(sed 's/[^[:alnum:]]/\\&/g' <<< "$str")" ...
When used with set -x, the grep command looks like: grep '\$foo\[\'\''bar\'\''\]' ...

How do you escape a user-provided search term that you don't want evaluated for sed?

I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).
I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.
Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.
#!/bin/bash
# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P#$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!##$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches
# Now for each line in the file, do some search and replace
while read line
do
echo "------===[ BEGIN $line ]===------"
# Escape every character in $line (e.g., ab/c becomes \a\b\/\c). I got
# this solution from the accepted answer in the linked SO question.
ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')
# Search for the line we read from the file and replace it with
# the text "replaced"
sed 's/'"$ES"'/replaced/' < my_searches # Does not work
# Search for the text "Jane" and replace it with the line we read.
sed 's/Jane/'"$ES"'/' < my_searches # Works
# Search for the line we read and replace it with itself.
sed 's/'"$ES"'/'"$ES"'/' < my_searches # Does not work
echo "------===[ END ]===------"
echo
done < my_searches
When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.
------===[ BEGIN ~!##$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:
------===[ BEGIN ~!##$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------
So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?
I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.

This is a relatively famous problem—given a string, produce a pattern that matches only that string. It is easier in some languages than others, and sed is one of the annoying ones. My advice would be to avoid sed and to write a custom program in some other language.
You could write a custom C program, using the standard library function strstr. If this is not fast enough, you could use any of the Boyer-Moore string matchers you can find with Google—they will make search extremely fast (sublinear time).
You could write this easily enough in Lua:
local function quote(s) return (s:gsub('%W', '%%%1')) end
local function replace(first, second, s)
return (s:gsub(quote(first), second))
end
for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end
If that is not fast enough, speed things up by applying quote to arg[1] only once and inlining the function replace.

As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it escapes out non-special characters. What you really want to do is perhaps something like:
awk 'gsub(/[^[:alpha:]]/, "\\\\&")'
This will escape out non-alpha characters. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to
\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.
It's quite odd because this works perfectly fine
$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced

This: echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong. Do an echo "$ES" after that and you will see that $ES is \/\u\s\r\/\i\n\c\l\u\d\e. Then, when you pass it to the next sed (below)
sed 's/'"$ES"'/replaced/' my_searches
it will not work, because no line matches the pattern \/\u\s\r\/\i\n\c\l\u\d\e (in many sed implementations, sequences such as \s and \n have a meaning of their own). The correct way is something like:
$ sed 's|\([#$#^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\#\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\#\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|
Put all the characters you want escaped inside [], and choose a delimiter for sed that is not in your character class; I chose "|". Then use the "g" (global) flag.
Tell us what you are actually trying to do, i.e. the actual problem you are trying to solve.
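A caveat about which characters go into that list: in a basic regular expression, a backslash in front of an ordinary character can create an operator instead of a literal (GNU sed treats \{, \}, \+, \? and \| as operators, and \' as an end-of-buffer anchor), which fits the "Invalid content of \{\}" error in the question. Below is a sketch that escapes only what is special on each side of s/search/replace/ when / is the delimiter. The function names bre_escape and repl_escape are made up, and strings containing newlines are not handled.

```shell
# Hypothetical helper: escape a string for the search side of s/// (BRE).
# Only \ / . * ^ $ and [ are escaped; everything else is already literal.
bre_escape() {
    printf '%s\n' "$1" | sed 's|[\\/.*^$[]|\\&|g'
}

# Hypothetical helper: escape a string for the replacement side of s///.
# Only \ / and & are special there.
repl_escape() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# The last sample line from the question.
line='~!##$%^&*()_+-=:'\''}{[]/.,`"\|'
ES=$(bre_escape "$line")
RS=$(repl_escape "$line")

printf '%s\n' "before $line after" | sed "s/$ES/replaced/"
printf '%s\n' 'said Jane' | sed "s/Jane/$RS/"
```

The first command should print "before replaced after", and the second should print "said " followed by the sample line unchanged.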

This seems to work for FreeBSD sed:
# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches

The -E option of FreeBSD sed is used to turn on extended regular expressions.
The same is available for GNU sed via the -r or --regexp-extended options; recent GNU sed versions accept -E as well.
For the differences between basic and extended regular expressions see, for example:
http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
Maybe you can use FreeBSD-compatible minised instead of GNU sed?
# example using FreeBSD-compatible minised,
# http://www.exactcode.de/site/open_source/minised/
# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'
# escapes in regular expression
ES="$(printf "%q" "${line}")" # escape some punctuation characters
ES="${ES//./\\.}" # . -> \.
ES="${ES//\\\\(/(}" # \( -> (
ES="${ES//\\\\)/)}" # \) -> )
# escapes in replacement string
lineEscaped="${line//&/\&}" # & -> \&
minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"

To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:
backSlash='\\'
ES="${ES//${backSlash}(/(}" # \( -> (
ES="${ES//${backSlash})/)}" # \) -> )
(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)

... or to complete the backslash confusion ...
backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}" # double backslashes
lineEscaped="${lineEscaped//&/\&}" # & -> \&

If you have bash, and you're just doing a pattern replacement, just do it natively in bash. The ${parameter/pattern/string} expansion in Bash will work very well for you, since you can just use a variable in place of the "pattern" and replacement "string" and the variable's contents will be safe from word expansion. And it's that word expansion which makes piping to sed such a hassle. :)
It'll be faster than forking a child process and piping to sed anyway. You already know how to do the whole while read line thing, so creatively applying the capabilities in Bash's existing parameter expansion documentation can help you reproduce pretty much anything you can do with sed. Check out the bash man page to start...
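A minimal sketch of that approach, for bash only (the ${var//pattern/string} form is not POSIX sh). The function name replace_literal is made up. One detail matters: the search variable is double-quoted inside the expansion, because an unquoted one would be read as a glob pattern (*, ? and [...] would be special). The replacement is quoted too, since newer bash releases (5.2 and later, as far as I know) can treat an unquoted & there as "the matched text".

```shell
# Hypothetical helper (bash): replace every literal occurrence of $2
# in $1 with $3 and print the result.
replace_literal() {
    local text=$1 search=$2 replacement=$3 result
    # Quoting "$search" makes bash match it character for character.
    result=${text//"$search"/"$replacement"}
    printf '%s\n' "$result"
}

replace_literal 'keep ~!##$%^&*()_+ keep' '~!##$%^&*()_+' 'replaced'
```

This should print "keep replaced keep", with no escaping step needed for the search string.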
