I'm having trouble understanding the different behaviors of grep and egrep when using \$ in a pattern.
To be more specific:
grep "\$this->db" file # works
egrep "\$this->db" file # does not work
egrep "\\$this->db" file # works
Can someone tell me why, or link to an explanation?
Thank you very much.
The backslash is being eaten by the shell's escape processing, so in the first two cases the regexp is just $this->db. The difference is that grep treats a $ that isn't at the end of the regexp as an ordinary character, but egrep treats it as a regular expression that matches the end of the line.
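In the last case, the shell reduces the double backslash to a single backslash, which is passed to egrep. Be aware, though, that the $ is then no longer escaped as far as the shell is concerned, so inside double quotes $this is expanded as a shell variable (normally to an empty string) and egrep really receives \->db, which still matches the line. To get \$ through to egrep, so that the $ is treated as an ordinary character rather than as the end of the line, use single quotes ('\$this->db') or three backslashes ("\\\$this->db").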
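To take the shell out of the picture, here is a small sketch that uses single quotes, so grep sees each pattern exactly as typed. The sample lines are made up for illustration, and grep -E is used in place of egrep; the results in the comments are what I would expect from GNU grep.

```sh
# Two made-up lines; only the first contains the literal text $this->db
sample=$(printf '%s\n' 'return $this->db;' 'return $that->db;')

# Escaped $: a literal dollar sign in both syntaxes
bre=$(printf '%s\n' "$sample" | grep '\$this->db')
ere=$(printf '%s\n' "$sample" | grep -E '\$this->db')

# Unescaped $ in the middle of a basic regex: still an ordinary character
bre_plain=$(printf '%s\n' "$sample" | grep '$this->db')

# Unescaped $ in an extended regex: end-of-line anchor, so nothing can match
ere_plain=$(printf '%s\n' "$sample" | grep -E '$this->db' || true)

printf 'BRE escaped:   %s\n' "$bre"
printf 'ERE escaped:   %s\n' "$ere"
printf 'BRE unescaped: %s\n' "$bre_plain"
printf 'ERE unescaped: %s\n' "$ere_plain"
```

Only the last command comes back empty, which is the difference described above.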
See man grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
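When extended regular expressions are enabled (by using egrep or grep -E), a metacharacter such as $ has to be escaped with a backslash to match literally. Inside double quotes the shell consumes one level of backslashes itself, which is why \\ is needed to get a single backslash through to egrep.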
I expect the following to work:
ls -l | grep '^.{38}<some date>'
It should give me the files which have said date in modification time. But it does not work. The following works:
ls -l | grep '^.\{38\}<some date>'
Isn't '...' supposed to turn off special meaning for all the meta characters? Why should we have to escape braces?
The regular expression .{38}, as interpreted here by grep, matches an arbitrary string of exactly 38 characters. To match literal braces, you need to escape them.
.\{38\}
In order to ensure that that exact 7-character sequence is seen by grep, you need to quote the string so that the shell doesn't perform quote removal and reduce it to .{38} before grep gets a chance to see it.
I misunderstood the question at first. It appears grep is using basic regular expressions, in which unescaped braces are literal characters and escaped ones introduce a brace expression. In extended regular expressions, it's the other way around. In either case, though, the single quotes are protecting all enclosed characters from special treatment by the shell; whether grep treats them specially is another question.
There are many variants of regular expression syntax. By default, grep uses the "basic" ("BRE" or "obsolete") regular expression syntax, in which braces must be escaped to be treated as repetition bounds (what you're trying to do here); without the escapes, they're treated as just literal characters. In the "extended" ("ERE" or "modern"), Perl-compatible ("PCRE"), and ... well, pretty much all other variants, it's the other way around: escaped braces are treated as literal characters, and unescaped ones define repetition bounds.
grep '^.{38}<some date>' # Matches any character followed by literal braces around "38"
grep '^.\{38\}<some date>' # Matches 38 characters
grep -E '^.{38}<some date>' # Matches 38 characters (-E invokes "extended" syntax)
egrep '^.{38}<some date>' # Matches 38 characters (egrep uses "extended" syntax)
BTW, parentheses are the same: literal unless escaped in the basic syntax, literal if escaped in the extended syntax. And there are a few other differences; see the re_format man page. There are also many other syntax variants (Perl-compatible, etc). It's important to know what variant the tool you're using accepts, and format your RE appropriately for it.
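As a quick check of the brace and parenthesis rules, here is a sketch on made-up input lines (aaa, a{3}, fx and f(x) are just illustrative strings). The results in the comments are what I would expect from GNU grep.

```sh
braces=$(printf '%s\n' 'aaa' 'a{3}')
parens=$(printf '%s\n' 'fx' 'f(x)')

# Basic syntax: bare braces are literal, escaped braces are a repetition bound
bre_literal=$(printf '%s\n' "$braces" | grep 'a{3}')        # a{3}
bre_repeat=$(printf '%s\n' "$braces" | grep 'a\{3\}')       # aaa

# Extended syntax: the other way around
ere_repeat=$(printf '%s\n' "$braces" | grep -E 'a{3}')      # aaa
ere_literal=$(printf '%s\n' "$braces" | grep -E 'a\{3\}')   # a{3}

# Parentheses follow the same rule
bre_paren=$(printf '%s\n' "$parens" | grep 'f(x)')          # f(x)
ere_paren=$(printf '%s\n' "$parens" | grep -E 'f(x)')       # fx

printf '%s\n' "$bre_literal" "$bre_repeat" "$ere_repeat" "$ere_literal" "$bre_paren" "$ere_paren"
```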
BTW2, as @Charles Duffy pointed out in a comment, parsing ls output isn't a good idea. In this case, the number of characters before the date depends on the width of the other fields (user, group, size), which is not consistent, so skipping 38 characters might skip part of the date field or not skip enough. You'd be much better off using something like find with the -mtime or -mmin tests, or at least using stat instead of ls: with stat you can control the fields through the format string and, e.g., put the date at the beginning of the line, although stat will still have some of ls's other problems.
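A minimal sketch of the find approach, assuming you want files modified within the last 24 hours. The temporary directory and the file name fresh.txt are hypothetical and exist only to make the example self-contained; in practice you would point find at your own directory and pick the -mtime value you need.

```sh
# Throwaway directory with one freshly created file (names are made up)
dir=$(mktemp -d)
touch "$dir/fresh.txt"

# -mtime -1 selects files modified less than one day ago,
# with no dependence on how wide the ls columns happen to be
recent=$(find "$dir" -type f -mtime -1)
printf '%s\n' "$recent"

rm -rf "$dir"
```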
I have two variables in a bash script:
$string = "The cat is green.\n"
$line = "Sunny day today.\n"
Each of those variables contains a "\n" sequence. How can I use sed to search and replace with them, like this:
sed 's/$string/$line/g' file.txt
This doesn't seem to work; if I remove the "\n" from the strings, sed works properly.
If I had only the text I could escape "\n" by adding a backslash:
sed 's/"The cat is green.\\n"/"Sunny day today.\\n"/g' file.txt
How can I do the search/replace when the variables contain "\n"?
Thank you for the help.
It looks like you are trying to match the two-character sequence \n, as opposed to the single newline character that together they represent in some contexts. There is a tremendous difference between these.
As part of your example, you presented
sed 's/$string/$line/g' file.txt
, but that won't work at all, because variable references are not expanded within single-quoted strings. That has nothing whatever to do with the values of shell variables string and line.
But let's consider those values:
string="The cat is green.\n"
line="Sunny day today.\n"
[Extra spaces and the leading $ removed, so that these are valid shell assignments.]
Of course, the problem you're focusing on is that sed recognizes \n as a code for a newline character, but you also have the problem that in a regular expression, the . character matches any character, so if you want it to be treated as a literal then it, too, needs to be escaped (in the pattern, but not in the replacement). If you're trying to support search and replace for arbitrary text, then there are other characters you'll need to escape, too.
Answering the question as posed (escaping only \n sequences) you might do this:
sed "s/${string//\\n/\\\\n}/${line//\\n/\\\\n}/g"
The ${foo//pat/repl} form of parameter expansion performs pattern substitution on the expanded value, but note well that the pattern (pat) is interpreted according to shell globbing rules, not as a regular expression. That specific form replaces every appearance of the pattern; read the bash manual for alternatives that match only the first appearance and/or that match only at the beginning or the end of the parameter's value. Note, too, the extra doubling of the \ characters in the pattern substitution -- they need to be escaped for the shell, too.
Given your variable definitions, that command would be equivalent to this:
sed 's/The cat is green.\\n/Sunny day today.\\n/g'
In other words, exactly what you wanted. Again, however, be warned: that is not a general solution for arbitrary search & replace. If you want that, then you'll want to study the sed manual to determine which characters need to be escaped in the regex, and which need to be escaped in the replacement. Moreover, I don't see a way to do it with just one pattern substitution for each variable.
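For a quick end-to-end check, here is a sketch that doubles the backslashes before handing the values to sed. Note that it does the escaping with sed itself rather than with the ${foo//pat/repl} expansion, simply to stay independent of the shell's quoting rules; the names pat and rep are made up, and like the command above it only takes care of backslashes, not of other regex metacharacters.

```sh
string='The cat is green.\n'   # ends in a literal backslash followed by n
line='Sunny day today.\n'

# Double every backslash so sed treats \n as two ordinary characters
pat=$(printf '%s\n' "$string" | sed 's/\\/\\\\/g')
rep=$(printf '%s\n' "$line" | sed 's/\\/\\\\/g')

# Double quotes here, so that $pat and $rep are expanded by the shell
result=$(printf '%s\n' 'The cat is green.\n' | sed "s/$pat/$rep/g")
printf '%s\n' "$result"
```

With these values the final sed script is s/The cat is green.\\n/Sunny day today.\\n/g, the same command as shown above.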
I have an SQL file (samplesqlfile) and I want to replace a string that contains backticks with another string. Below is the code.
actualtext="FROM sampledatabase.\`Datatype\`"
replacetext="FROM sampledatabase.\`Datatype_details\`"
sed -i "s/\<${actualtext}\>/${replacetext}/g" samplesqlfile
This is not working. The actual word to be replaced is
FROM sampledatabase.`Datatype`
I added backslashes to escape the backticks, but it still does not work. Please help.
Observe that this does not work:
$ sed "s/\<${actualtext}\>/${replacetext}/g" samplesqlfile
FROM sampledatabase.`Datatype`
But this does:
$ sed "s/\<${actualtext}/${replacetext}/g" samplesqlfile
FROM sampledatabase.`Datatype_details`
The problem was the \>. The string in $actualtext does not end with a word character; it ends with a backtick. Consequently, \> will never match there. The solution is to remove \>.
To clarify, \> matches at the boundary between a word character and a non-word character where the word character appears first. Word characters are alphanumerics and underscores.
\> is a GNU extension. The behavior under BSD/OSX sed will be different.
For purposes of illustration here, I removed the -i option. For your intended use, of course, add it back.
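The effect can be reproduced on a single line without touching any file. This is only a sketch: the input line is taken from the question, the patterns are shortened for illustration, and the third command assumes GNU sed, since \> is a GNU extension.

```sh
line='FROM sampledatabase.`Datatype`'

# \> right after the backtick: the backtick is not a word character,
# so the pattern cannot match and the line comes back unchanged
with_boundary=$(printf '%s\n' "$line" | sed 's/`Datatype`\>/`Datatype_details`/')

# Same pattern without \>: the replacement is applied
without_boundary=$(printf '%s\n' "$line" | sed 's/`Datatype`/`Datatype_details`/')

# Where \> does help (GNU sed): replace the whole word only, not a prefix of a longer word
whole_word=$(printf '%s\n' 'Datatype Datatypes' | sed 's/Datatype\>/X/g')

printf '%s\n' "$with_boundary" "$without_boundary" "$whole_word"
```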
I have a file with comments like this:
\$max_servers = 2;
\#\## BLOCKED ANYWHERE
I'm trying to
Replace all instances of \$ with $.
Replace all instances of \#\## with ###.
I wonder how I can go about doing that with sed or awk.
What I have tried so far, without much success, using vi or vim:
%s/^\//gc
%s/^#/\\/###/gc
Thank you
Another option to replace all [#$] in one pass is to use the following regular expression. The following is VI syntax:
:%s/\\\([$#]\)/\1/g
Replace the characters in the brackets [] with whatever you need if it's more than just # and $.
The first \\ is a backslash, escaped since it's inside a regular expression.
The expression between the \( and \) is saved and later used in the replacement as \1.
Escaping the backslash will work:
#echo "\#\##"| sed "s/\\\\#\\\\##/###/g"
###
# echo "\\$"| sed "s/\\\\\\$/$/g"
$
In order to replace a backslash, you have to double it up, so it can quote itself much the way other special characters must be quoted. You can use sed instead of vim to help automate the process a bit:
$ sed -e 's/^\\\$/$/' -e 's/^\\#\\##/###/' "$file" > "$new_file"
Note that you have to put a backslash in front of dollar signs since they are used to mark an end of line in regular expressions. That's why I have \\\$ in my first expression. One backslash to quote the backslash and another backslash to quote the dollar sign.
By the way, these same sed expressions will also work inside Vim depending upon your Vim settings.
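To see the two expressions at work without creating any files, here is a sketch that feeds the two sample lines from the question through the same sed command. The variable names input and output are made up for the example.

```sh
# The two sample lines from the question, backslashes included
input=$(printf '%s\n' '\$max_servers = 2;' '\#\## BLOCKED ANYWHERE')

# Same expressions as above: strip the backslash before $ and before the # signs
output=$(printf '%s\n' "$input" | sed -e 's/^\\\$/$/' -e 's/^\\#\\##/###/')

printf '%s\n' "$output"
# $max_servers = 2;
# ### BLOCKED ANYWHERE
```

Both patterns are anchored with ^, so only occurrences at the start of a line are changed.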
You escape special characters with the backslash. So, for example, to replace every \$ with $, you would do
%s/\\\$/$/g
sed 's|^\\\([$#]\)\\\{0,1\}|\1|' YourFile
This works for your sample, but it will also remove both backslashes in a line that starts with \$\ ...
How do I use a regex with egrep?
source
exec pro..do_pr_ddd_sum 123039246, 995, 201705848
egrep '*pr_ddd_sum*123039246*995*' *
-- no result found
The egrep command above does not return any result.
Perhaps you mean 'pr_ddd_sum.*123039246.*995'.
You're confusing shell wildcards with regular expression metacharacters. In the shell, a "*" matches any string of characters. In a regex, this metacharacter means zero or more of the preceding character. Look at Michael's suggestion. In it, the dot ('.') stands for any character, so '.*' means zero or more repetitions of any character.
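Here is a short sketch of the difference, using the source line from the question and grep -E in place of egrep. The glob-style pattern is shown without its leading and trailing "*" to keep the comparison simple.

```sh
line='exec pro..do_pr_ddd_sum 123039246, 995, 201705848'

# '.*' means any run of characters, so the gaps between the pieces are bridged
regex_hit=$(printf '%s\n' "$line" | grep -E 'pr_ddd_sum.*123039246.*995')

# A bare '*' only repeats the character before it ("m" and "6" here),
# so the space after "sum" already prevents a match
glob_style=$(printf '%s\n' "$line" | grep -E 'pr_ddd_sum*123039246*995' || true)

printf 'with .*: %s\n' "$regex_hit"
printf 'with * : %s\n' "$glob_style"
```

The first command prints the line back, while the second one prints nothing.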