Is there an easy way to pass a "raw" string to grep? - bash

grep can't be fed "raw" strings when used from the command-line, since some characters need to be escaped to not be treated as literals. For example:
$ grep '(hello|bye)' # WON'T MATCH 'hello'
$ grep '\(hello\|bye\)' # GOOD, BUT QUICKLY BECOMES UNREADABLE
I was using printf to auto-escape strings:
$ printf '%q' '(some|group)\n'
\(some\|group\)\\n
This produces a bash-escaped version of the string, and using backticks, this can easily be passed to a grep call:
$ grep `printf '%q' '(a|b|c)'`
However, it's clearly not meant for this: some characters in the output are not escaped, and some are unnecessarily so. For example:
$ printf '%q' '(^#)'
\(\^#\)
The ^ character should not be escaped when passed to grep.
Is there a cli tool that takes a raw string and returns a bash-escaped version of the string that can be directly used as pattern with grep? How can I achieve this in pure bash, if not?

If you want to search for an exact string,
grep -F '(some|group)\n' ...
-F tells grep to treat the pattern as is, with no interpretation as a regex.
(This is often available as fgrep as well.)
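As a quick check of the -F behaviour, here is a minimal sketch. The sample lines and the sample_lines helper are made up for illustration; only the line containing the literal text should be printed.

```shell
# Hypothetical sample input: the first line contains the literal text,
# the second would only match if the pattern were read as a regex.
sample_lines() {
  printf '%s\n' 'x (some|group)\n y' 'some group'
}

# -F makes grep treat the pattern as a fixed string,
# so ( | ) and \ need no escaping.
sample_lines | grep -F '(some|group)\n'
```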

If you are attempting to get grep to use Extended Regular Expression syntax, the way to do that is to use grep -E (aka egrep). You should also know about grep -F (aka fgrep) and, in newer versions of GNU grep, grep -P.
Background: The original grep had a fairly small set of regex operators; it was Ken Thompson's original regular expression implementation. A new version with an extended repertoire was developed later, and for compatibility reasons, got a different name. With GNU grep, there is only one binary, which understands the traditional, basic RE syntax if invoked as grep, and ERE if invoked as egrep. Some constructs from egrep are available in grep by using a backslash escape to introduce special meaning.
Subsequently, the Perl programming language has extended the formalism even further; this regex dialect seems to be what most newcomers erroneously expect grep, too, to support. With grep -P, it does; but this is not yet widely supported on all platforms.
So, in grep, the following characters have a special meaning: ^$[]*.\
In egrep, the following characters also have a special meaning: ()|+?{}. (The braces for repetition were not in the original egrep.) The grouping parentheses also enable backreferences with \1, \2, etc.
In many versions of grep, you can get the egrep behavior by putting a backslash before the egrep specials. There are also special sequences like \<\>.
In Perl, a huge number of additional escapes like \w \s \d were introduced. In Perl 5, the regex facility was substantially extended, with non-greedy matching *? +? etc, non-grouping parentheses (?:...), lookaheads, lookbehinds, etc.
... Having said that, if you really do want to convert egrep regular expressions to grep regular expressions without invoking any external process, try ${regex/pattern/substitution} for each of the egrep special characters; but recognize that this does not handle character classes, negated character classes, or backslash escapes correctly.
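To make the BRE/ERE difference concrete, here is a small sketch using the pattern from the question. The sample helper and its three input lines are invented for the demonstration, and the last command relies on the backslash forms that GNU grep accepts in BRE.

```shell
# Hypothetical sample input.
sample() {
  printf '%s\n' 'hello' 'bye' '(hello|bye)'
}

# BRE (plain grep): ( | ) are ordinary characters,
# so only the literal line '(hello|bye)' matches.
sample | grep '(hello|bye)'

# ERE (grep -E): ( ) group and | alternates,
# so every line containing hello or bye matches (all three here).
sample | grep -E '(hello|bye)'

# BRE with backslashes (accepted by GNU grep): same result as the ERE above.
sample | grep '\(hello\|bye\)'
```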

When I use grep -E with user provided strings I escape them with this
ere_quote() {
sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$*"
}
example run
ere_quote ' \ $ [ ] ( ) { } | ^ . ? + *'
# output
# \\ \$ \[ \] \( \) \{ \} \| \^ \. \? \+ \*
This way you may safely insert the quoted string in your regular expression.
e.g. if you wanted to find each line starting with the user content, with the user providing funny strings as .*
userdata=".*"
grep -E -- "^$(ere_quote "$userdata")" <<< ".*hello"
# if you have colors in grep you'll see only ".*" in red

I think the previous answers are incomplete because they miss one important case: strings that begin with a dash (-). So while this won't work:
echo "A-B-C" | grep -F "-B-"
This one will:
echo "A-B-C" | grep -F -- "-B-"

quote() {
sed 's/[^\^]/[&]/g;s/[\^]/\\&/g' <<< "$*"
}
Usage: grep [OPTIONS] "$(quote [STRING])"
This function has some substantial benefits:
quote is independent from the regex flavor. You can use quote's output in
grep (-G) (BRE, the default)
grep -E (ERE)
grep -P (PCRE)
sed (-E) "s/$(quote [STRING])/.../" (as long as you don't use \, [, or ] instead of /).
quote even works in corner cases that are not directly quoting related, for instance
Leading - are quoted so that they aren't misinterpreted as options by grep.
Trailing spaces are quoted so that they aren't removed by $(...).
quote only fails if [STRING] contains linebreaks. But in general there is no fix for this since tools like grep and sed may not support linebreaks in their search pattern (even if they are written as \n).
Also, there is the drawback that the quoted output usually is three times longer than the unquoted input.
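For illustration, this is what the function produces for a short input, and how the result behaves in two regex flavors. It is only a sketch: the sed program is the one shown above, but it is fed through printf instead of a here-string (my change, so that it also runs in shells without <<<), and the sample strings are made up.

```shell
# Same sed program as above; printf replaces the here-string.
quote() {
  printf '%s\n' "$*" | sed 's/[^\^]/[&]/g;s/[\^]/\\&/g'
}

# Every character except \ and ^ is wrapped in a bracket expression;
# \ and ^ get a backslash. Prints: [a][.][b]\^[c]
quote 'a.b^c'

# The quoted pattern matches only the literal text, in BRE and ERE alike.
printf '%s\n' 'a.b^c' 'axb^c' | grep    "$(quote 'a.b^c')"
printf '%s\n' 'a.b^c' 'axb^c' | grep -E "$(quote 'a.b^c')"
```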

Just a comment on the example below, which shows that the substring "-B" is interpreted by grep as a command-line option, so the command fails.
echo "A-B-C" | grep -F "-B-"
grep has a special option for this case:
-e PATTERNS, --regexp=PATTERNS
Use PATTERNS as the patterns. If this option is used multiple times or is combined with the -f (--file) option,
search for all patterns given. This option can be used to protect a pattern beginning with “-”.
So a fix for the issue is:
echo "A-B-C" | grep -F -e "-B-" -
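Both ways of protecting a pattern that starts with a dash can be compared side by side. A minimal sketch using the same sample string:

```shell
# "--" ends option parsing, so everything after it is a pattern or file name.
echo 'A-B-C' | grep -F -- '-B-'

# -e marks the next argument as a pattern, whatever it starts with.
echo 'A-B-C' | grep -F -e '-B-'
```

Both commands print A-B-C. The -e form also lets you pass several patterns by repeating the option.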

Related

Proper use of capture groups in SED command

I need to convert a string "1,234" =to=> 1234.
this string is just a part of a bigger line. There are thousands of such lines in the file.
I have written a sed command which is not working as I expect it to.
echo \"1,234\" | sed 's/\("\)\([0-9]+\)\(,\)\([0-9]+\)\("\)/\2\4/g'
As far as I understand, in this code,
\1 is "
\2 is the digits before comma
\3 is ,
\4 is the digits after comma
I expect this command to output 1234 which should be \2\4. But it just yields back "1,234". So I think it is not being parsed properly. Some help would be appreciated.
I would suggest you use POSIX Extended Regular Expressions (ERE), where you don't have to escape parentheses and the repetition operator. To enable ERE in sed, you can use the -E switch (or -r in GNU sed). Your expression will then look like this:
$ echo '"1,234"' | sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g'
1234
For completeness, your original BRE expression will function properly if you escape the +:
echo \"1,234\" | sed 's/\("\)\([0-9]\+\)\(,\)\([0-9]\+\)\("\)/\2\4/g'
1234
Your second and fourth groups contain [0-9]+, which matches any digit followed by a plus sign.
It looks like you meant [0-9]\+, to match one or more digits.
In passing: there's no need to group the parts you'll not be using (\1, \3 and \5). You can simplify to:
echo \"1,234\" | sed 's/"\([0-9]\+\),\([0-9]\+\)"/\1\2/g'
If you're finding all those \ hard to handle, you could use Extended Regular Expression syntax, with the -E flag:
echo \"1,234\" | sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g'
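One portability note, as a sketch: \+ in a basic regular expression is a GNU extension, while the interval \{1,\} is the POSIX way to say "one or more" in BRE. The function names and the sample line below are made up for the example.

```shell
# BRE using the POSIX interval \{1,\} instead of the GNU-only \+.
strip_comma_bre() {
  sed 's/"\([0-9]\{1,\}\),\([0-9]\{1,\}\)"/\1\2/g'
}

# ERE: unescaped ( ) and + are operators.
strip_comma_ere() {
  sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g'
}

echo 'total "1,234" units' | strip_comma_bre
echo 'total "1,234" units' | strip_comma_ere
```

Both print: total 1234 units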

Bash - sed syntax with variables

I've got two variables VAR1 and VAR2 that contain strings. What I want to do is go through a list of files that have a .txt extension and change all occurrences of VAR1 to VAR2. So far, it looks like this:
for i in `find . -name "*.txt"`
do
echo $i
sed -i -E "s|\$VAR1|\$VAR2|g" $i
done
I think everything except the sed line is working well. I think it's a syntax issue, but I haven't been able to figure out what it is. Any help would be appreciated
Thanks
You shouldn't need to escape your $ variable. Also make sure to use the lower case -e and quote the filename in case it has spaces:
sed -ri -e "s|$VAR1|$VAR2|g" "$i"
Since sed's "find-and-replace" functionality is oriented to regular expressions rather than literal strings, you might wish to consider an alternative to sed, e.g. using awk as follows:
awk -v from="$VAR1" -v to="$VAR2" '
function replace(a,b,s, n) {
n=index(s,a);
if (n==0) {return s}
return substr(s,1,n-1) b replace(a,b, substr(s,n+length(a)));
}
{print replace(from, to, $0)} '
The above can easily be combined with the find ... | while read f ; do .... done pattern mentioned elsewhere on this page.
GNU awk supports the equivalent of sed's '-i' option, but it's probably better simply to direct the output of awk to a temporary file, and then mv it into place.
You managed to quote the dollar sign from the shell (which would not have been necessary if you had used single quotes instead of double) but this does not change the fact that dollar signs also have a meaning in regular expressions. Double the backslashes to escape from both the shell and sed, or use single quotes so the backslashes get through to sed. Alternatively, use a notation which does not require backslashes.
sed -i -E 's|[$]VAR1|$VAR2|g' "$i"
Incidentally, your loop has a number of problems. Your for loop will not work correctly if there are file names with whitespace in them, and you need to quote the arguments inside the loop. To completely cope with file names with special characters in them, you want to use find -exec instead.
find . -name "*.txt" -exec sed -i -E 's|[$]VAR1|$VAR2|g' {} \;
If your find supports + instead of \;, by all means use that.
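If the goal is the one in the question (replace the contents of one variable with another in every .txt file), the find -exec approach can be sketched as below. This is an illustration under assumptions, not a drop-in tool: replace_in_txt is a made-up helper name, the pattern is still treated as a regular expression, neither string may contain "|", and a temporary file is used instead of sed -i because the -i syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: replace_in_txt DIR PATTERN REPLACEMENT
# PATTERN and REPLACEMENT are pasted into s|...|...|g as-is,
# so they must not contain "|" and PATTERN is read as a BRE.
replace_in_txt() {
  find "$1" -type f -name '*.txt' -exec sh -c '
    from=$1; to=$2; shift 2
    for f in "$@"; do
      tmp=$(mktemp) || exit 1
      # Write to a temporary file, then copy back so the original
      # file keeps its permissions.
      sed -e "s|$from|$to|g" "$f" > "$tmp" && cat "$tmp" > "$f"
      rm -f "$tmp"
    done
  ' sh "$2" "$3" {} +
}

# Usage (not run here): replace_in_txt . "$VAR1" "$VAR2"
```

Because find hands the file names to sh as separate arguments, names containing spaces are handled correctly.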
(1) Using the idiom for i in $(find ....) ; do ...; done will often work as intended, but it is not robust. Significantly better is the pattern:
find ... | while read i ; do ... ; done
(2) If $VAR1 and/or $VAR2 contain characters that have special significance in regular expressions, then some care will be required. For example, parentheses ("(" and ")") have special significance, and so if VAR1 contains these, using the -r option (or on a Mac, the -E option) is probably asking for trouble.
(3) Chances are that sed -i -e "s|$VAR1|$VAR2|g" will do the trick if VAR1 does not contain any of the eight characters: ^$*[]\|. and if VAR2 does not contain "|", "\" or "&".
(4) If you want to prepare your strings ($VAR1 and $VAR2) programmatically for use with sed, then see this SO page; it shows how to munge the strings -- using sed of course!
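As a rough sketch of what such munging can look like (this is my own illustration, not the code from the linked page): escape the BRE metacharacters in the search string, escape \, & and the delimiter in the replacement, then build the s command with / as the delimiter. The three function names are hypothetical, and strings containing newlines are not handled.

```shell
# Escape a literal string for use as a BRE between "/" delimiters.
# Escaped characters: ] [ \ . * ^ $ /
escape_bre() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape a literal string for use as the replacement part of s/.../.../
# Escaped characters: \ & /
escape_repl() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

# Hypothetical helper: replace_literal FROM TO, filtering stdin to stdout.
replace_literal() {
  sed "s/$(escape_bre "$1")/$(escape_repl "$2")/g"
}

# The dot, star, brackets, slash and dollar are all matched literally,
# so only the first word is replaced and "axb" is left alone.
printf '%s\n' 'a.b*[c]/$ axb' | replace_literal 'a.b*[c]/$' 'x&y/\z'
```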

OSX sed newlines - why conversion of whitespace to newlines works, but newlines are not converted to spaces

sed on OSX has some quirks. This resource (http://nlfiedler.github.io/2010/12/05/newlines-in-sed-on-mac.html) contains information on how to convert whitespace into a newline:
echo 'foo bar baz quux' | sed -e 's/ /\'$'\n/g'
OR (@ghoti's suggestion, which does make it easier to read):
echo 'foo bar baz quux' | sed -e $'s/ /\\\n/g'
However, when I try the reverse - converting newlines to whitespace, it doesn't work:
echo -e "foo\nbar" | sed -e 's/\'$'\n/ /g'
A more straightforward approach of just changing \n doesn't work either:
echo -e "foo\nbar" | sed -e 's/\n/ /g'
There's a related answer here: https://superuser.com/questions/307165/newlines-in-sed-on-mac-os-x, with a detailed answer by Spiff (right at the end of the page), however applying the same logic didn't resolve the problem.
Here's one way that does work on OSX (via http://www.benjiegillam.com/2011/09/using-sed-to-replace-newlines/):
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
However, I am still curious why reversing the original approach doesn't work.
UPDATE: here's how to make it work with two lines (the solution is to use N to embed the newline characters):
echo -e "foo\nbar\n" | sed -e 'N;s/\n/ /g'
AN ALTERNATIVE SOLUTION (see full answer by @ghoti for detailed explanation):
echo -e "foo\nbar\n" | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'
However, this solution appears to be a tiny bit slower than the one suggested in the question statement (note order of these commands matters, so it might make sense to try testing them in different orders):
time seq 10000 | sed -n '1h;2,$H;${;x;s/\n/ /gp;}' > /dev/null
time seq 10000 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' > /dev/null
Your question appears to be "why doesn't the reverse of the original approach [of converting spaces to newlines] work?".
In sed, the newline is more of a record separator than part of the line. Consider that $, which matches the empty string at the end of the pattern space, comes after the last character of the line; it does not match a newline at the end of each line.
Sed commands that utilize newlines, like H and N and even s, do so outside the scope of newline-as-record-separator. The records you're substituting are between the newlines.
In order to substitute a newline, then, you need to get it INSIDE the pattern space, using N, H, etc.
So here's an option.
printf 'foo\nbar\nbaz\n' | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'
The idea is that we'll append all our lines to the hold buffer, then at the end of the file, move the hold buffer back to the pattern space for substitution, and replace the newlines with spaces all at once.
The 1h;2,$H construction avoids a blank at the beginning of your output, caused by the newline that is appended before each line of data with H.
The GNU manual page for sed includes:
REGULAR EXPRESSIONS
POSIX.2 BREs should be supported, but they aren't completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.
The Mac OS X manual page for sed includes:
Sed Regular Expressions
The regular expressions used in sed, by default, are basic regular expressions (BREs, see re_format(7) for more information), but extended (modern) regular expressions can be used instead if the -E flag is given. In addition, sed has the following two additions to regular expressions:
In a context address, any character other than a backslash (\) or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an x and the second x stands for itself, so that the regular expression is abcxdef.
The escape sequence \n matches a newline character embedded in the pattern space. You cannot, however, use a literal newline character in an address or in the substitute command.
What these don't say, but what seems to be the case, is that in the s/regex/new/ command, the regex section is a regular expression, but the new section is not. In the replacement material, you have to use \ followed by a newline to embed a newline. In the search material (regex), you can use \n.
Note also that sed works on lines. By default, the newline at the end of the pattern space is pretty much unmatchable except with the regex metacharacter $; you can't simply remove that newline by matching it. You can, however, end up with multiple lines in the pattern space, and then you can match embedded newlines with the \n pattern.
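The difference is easy to see with a two-line input. A minimal sketch:

```shell
# Each line arrives in the pattern space without its trailing newline,
# so \n has nothing to match and the input passes through unchanged.
printf 'foo\nbar\n' | sed 's/\n/ /'

# N appends the next input line to the pattern space, joined by an
# embedded newline, which \n can then match.
printf 'foo\nbar\n' | sed 'N;s/\n/ /'
```

The first command prints foo and bar on separate lines; the second prints foo bar.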
A couple of alternatives, that I tend to fall back on when stymied by OSX sed peculiarities, are tr and perl.
echo -e "foo\nbar" | tr '\n' ' '
foo bar
echo -e "foo\nbar" | perl -pe 's/\n/ /'
foo bar

How to pass special characters through sed

I want to pass this command in my script:
sed -n -e "/Next</a></p>/,/Next</a></p>/ p" file.txt
This command (should) extract all text between the two matched patterns, which are both Next</a></p> in my case. However when I run my script I keep getting errors. I've tried:
sed -n -e "/Next\<\/a\>\<\/p\>/,/Next<\/a\>\<\/p>/ p" file.txt with no luck.
I believe the generic pattern for this command is this:
sed -n -e "/pattern1/,/pattern2/ p" file.txt
I can't get it working for Next</a></p> though and I'm guessing it has something to do with the special characters I am encasing. Is there any way to pass Next</a></p> in the sed command? Thanks in advance guys! This community is awesome!
You don't need to use / as a regular expression delimiter. Using a different character will make quoting issues slightly easier. The syntax is
\cregexc
where c can be any character (other than \) that you don't use in the regex. In this case, : might be a good choice:
sed -n -e '\:Next</a></p>:,\:Next</a></p>: p' file.txt
Note that I changed " to ' because inside double quotes, \ will be interpreted by bash as an escape character, whereas inside single quotes \ is just treated as a regular character. Consequently, you could have written the version with escaped slashes like this:
sed -n -e '/Next<\/a><\/p>/,/Next<\/a><\/p>/ p' file.txt
but I think the version with colons is (slightly) easier to read.
You need to escape the forward slashes inside the regular expressions with a \, since the forward slashes serve as delimiters for the regexes
sed -n -e '/Next<\/a><\/p>/,/Next<\/a><\/p>/p' file.txt
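To try the range address without a real file.txt, here is a small sketch. The sample_html helper and its five lines are invented stand-ins for the file.

```shell
# Hypothetical sample input standing in for file.txt.
sample_html() {
  printf '%s\n' 'before' '<p>one Next</a></p>' 'middle' '<p>two Next</a></p>' 'after'
}

# \: ... : uses ":" as the address delimiter, so "/" needs no escaping.
# Prints from the first matching line through the next matching line.
sample_html | sed -n '\:Next</a></p>:,\:Next</a></p>: p'
```

This prints the second, third and fourth lines of the sample.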

How to insert a newline in front of a pattern?

How to insert a newline before a pattern within a line?
For example, this will insert a newline behind the regex pattern.
sed 's/regex/&\n/g'
How can I do the same but in front of the pattern?
Given this sample input file, the pattern to match on is the phone number.
some text (012)345-6789
Should become
some text
(012)345-6789
This works in bash and zsh, tested on Linux and OS X:
sed 's/regexp/\'$'\n/g'
In general, for $ followed by a string literal in single quotes bash performs C-style backslash substitution, e.g. $'\t' is translated to a literal tab. Plus, sed wants your newline literal to be escaped with a backslash, hence the \ before $. And finally, the dollar sign itself shouldn't be quoted so that it's interpreted by the shell, therefore we close the quote before the $ and then open it again.
Edit: As suggested in the comments by @mklement0, this works as well:
sed $'s/regexp/\\\n/g'
What happens here is: the entire sed command is now a C-style string, which means the backslash that sed requires to be placed before the new line literal should now be escaped with another backslash. Though more readable, in this case you won't be able to do shell string substitutions (without making it ugly again.)
Some of the other answers didn't work for my version of sed.
Switching the position of & and \n did work.
sed 's/regexp/\n&/g'
Edit: This doesn't seem to work on OS X, unless you install gnu-sed.
In sed, you can't add newlines in the output stream easily. You need to use a continuation line, which is awkward, but it works:
$ sed 's/regexp/\
&/'
Example:
$ echo foo | sed 's/.*/\
&/'
foo
See here for details. If you want something slightly less awkward you could try using perl -pe with match groups instead of sed:
$ echo foo | perl -pe 's/(.*)/\n$1/'
foo
$1 refers to the first matched group in the regular expression, where groups are in parentheses.
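Applied to the phone-number sample from the question, the continuation-line form could look like the sketch below. The function name is made up, and the regex is my guess at a reasonable pattern for that sample; in a basic regular expression the parentheses are literal, and \{3\} is an interval.

```shell
# Insert a newline before anything shaped like (012)345-6789.
# The backslash at the end of the first line escapes the literal newline
# in the replacement; & puts the matched text back after it.
newline_before_phone() {
  sed 's/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/\
&/g'
}

echo 'some text (012)345-6789' | newline_before_phone
```

Note that the space before the phone number stays at the end of the first output line.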
On my mac, the following inserts a single 'n' instead of newline:
sed 's/regexp/\n&/g'
This replaces with newline:
sed "s/regexp/\\`echo -e '\n\r'`/g"
echo one,two,three | sed 's/,/\
/g'
You can use perl one-liners much like you do with sed, with the advantage of full perl regular expression support (which is much more powerful than what you get with sed). There is also very little variation across *nix platforms - perl is generally perl. So you can stop worrying about how to make your particular system's version of sed do what you want.
In this case, you can do
perl -pe 's/(regex)/\n$1/'
-pe puts perl into a "execute and print" loop, much like sed's normal mode of operation.
' quotes everything else so the shell won't interfere
() surrounding the regex is a grouping operator. $1 on the right side of the substitution prints out whatever was matched inside these parens.
Finally, \n is a newline.
Regardless of whether you are using parentheses as a grouping operator, you have to escape any parentheses you are trying to match. So a regex to match the pattern you list above would be something like
\(\d\d\d\)\d\d\d-\d\d\d\d
\( or \) matches a literal paren, and \d matches a digit.
Better:
\(\d{3}\)\d{3}-\d{4}
I imagine you can figure out what the numbers in braces are doing.
Additionally, you can use delimiters other than / for your regex. So if you need to match / you won't need to escape it. Either of the below is equivalent to the regex at the beginning of my answer. In theory you can substitute any character for the standard /'s.
perl -pe 's#(regex)#\n$1#'
perl -pe 's{(regex)}{\n$1}'
A couple final thoughts.
using -ne instead of -pe acts similarly, but doesn't automatically print at the end. It can be handy if you want to print on your own. E.g., here's a grep-alike (m/foobar/ is a regex match):
perl -ne 'if (m/foobar/) {print}'
If you are finding dealing with newlines troublesome, and you want it to be magically handled for you, add -l. Not useful for the OP, who was working with newlines, though.
Bonus tip - if you have the pcre package installed, it comes with pcregrep, which uses full perl-compatible regexes.
In this case, I do not use sed. I use tr.
cat Somefile |tr ',' '\012'
This takes the comma and replaces it with a newline (octal 012 is the line feed character, not the carriage return).
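For reference, '\012' is the octal escape for the line feed character, and tr also accepts the more familiar '\n'. A quick sketch showing that both give the same result:

```shell
# Octal escape for line feed.
echo 'one,two,three' | tr ',' '\012'

# Same thing with the \n escape.
echo 'one,two,three' | tr ',' '\n'
```

Each command prints one, two and three on separate lines.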
To insert a newline to output stream on Linux, I used:
sed -i "s/def/abc\\\ndef/" file1
Where file1 was:
def
Before the sed in-place replacement, and:
abc
def
After the sed in-place replacement. Please note the use of \\\n. If the patterns have a " inside it, escape using \".
Hmm, just escaped newlines seem to work in more recent versions of sed (I have GNU sed 4.2.1),
dev:~/pg/services/places> echo 'foobar' | sed -r 's/(bar)/\n\1/;'
foo
bar
echo pattern | sed -E -e $'s/^(pattern)/\\\n\\1/'
worked fine on El Capitan with () support
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be written as:
sed -i 's/playstation/PS4\nplaystation/' input.txt
PS4
playstation
Consider using \\n while using it in a string literal.
sed : the stream editor
-i : edits the source file in place
/ : is the delimiter.
I hope the above information works for you 😃.
in sed you can reference groups in your pattern with "\1", "\2", ....
so if the pattern you're looking for is "PATTERN", and you want to insert "BEFORE" in front of it, you can use, sans escaping
sed 's/(PATTERN)/BEFORE\1/g'
i.e.
sed 's/\(PATTERN\)/BEFORE\1/g'
You can also do this with awk, using -v to provide the pattern:
awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
This checks whether a line contains the given pattern. If so, it inserts a newline in front of each match.
See a basic example:
$ cat file
hello
this is some pattern and we are going ahead
bye!
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
hello
this is some
pattern and we are going ahead
bye!
Note that it affects every occurrence of the pattern in a line:
$ cat file
this pattern is some pattern and we are going ahead
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
this
pattern is some
pattern and we are going ahead
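One caveat with "\n"patt: the replacement is the pattern string itself, which only equals the matched text when the pattern is a plain word. A variant, sketched below, uses & in the replacement so that the text that actually matched is put back. The break_before helper and the sample line are made up; also keep in mind that awk processes backslash escapes in values passed with -v.

```shell
# Hypothetical helper: break_before REGEX, reading lines from stdin.
# In gsub's replacement string, & stands for the matched text.
break_before() {
  awk -v patt="$1" '{ gsub(patt, "\n&") } 1'
}

echo 'call 555-1234 or 555-9876' | break_before '[0-9]+-[0-9]+'
```

Each phone number ends up at the start of its own line.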
sed -e 's/regexp/\0\n/g'
\0 stands for the whole matched text (GNU sed treats it like &), so the match is written back and then...
\n adds the new line
This doesn't work on some flavors of Unix, but I think it's the solution to your problem.
echo "Hello" | sed -e 's/Hello/\0\ntmow/g'
Hello
tmow
This works on a Mac for me:
sed -i.bak -e 's/regex/xregex/g' input.txt
sed -i.bak -e 's/qregex/\'$'\nregex/g' input.txt
I don't know whether it's the perfect solution...
After reading all the answers to this question, it still took me many attempts to get the correct syntax to the following example script:
#!/bin/bash
# script: add_domain
# using fixed values instead of command line parameters $1, $2
# to show typical variable values in this example
ipaddr="127.0.0.1"
domain="example.com"
# no need to escape $ipaddr and $domain values if we use separate quotes.
sudo sed -i '$a \\n'"$ipaddr www.$domain $domain" /etc/hosts
The script appends a newline \n followed by another line of text to the end of a file using a single sed command.
In vi on Red Hat, I was able to insert carriage returns using just the \r character. I believe this internally executes 'ex' instead of 'sed', but it's similar, and vi can be another way to do bulk edits such as code patches. For example, I am surrounding a search term with an if statement that insists on carriage returns after the braces:
:.,$s/\(my_function(.*)\)/if(!skip_option){\r\t\1\r\t}/
Note that I also had it insert some tabs to make things align better.
Just to add to the list of many ways to do this, here is a simple python alternative. You could of course use re.sub() if a regex were needed.
python -c 'print(open("./myfile.txt", "r").read().replace("String to match", "String to match\n"))' > myfile_lines.txt
sed 's/regexp/\'$'\n/g'
works as justified and detailed by mojuba in his answer.
However, this did not work:
sed 's/regexp/\\\n/g'
It added a new line, but at the end of the original line, a \n was added.
