Brace needs to be escaped with \ inside single quotes - bash

I expect the following to work:
ls -l | grep '^.{38}<some date>'
It should give me the files which have said date in modification time. But it does not work. The following works:
ls -l | grep '^.\{38\}<some date>'
Isn't '...' supposed to turn off special meaning for all the meta characters? Why should we have to escape braces?

The regular expression .{38}, as interpreted here by grep, matches an arbitrary string of exactly 38 characters. To match literal braces, you need to escape them.
.\{38\}
In order to ensure that that exact 7-character sequence is seen by grep, you need to quote the string so that the shell doesn't perform quote removal and reduce it to .{38} before grep gets a chance to see it.
I misunderstood the question at first: grep is using basic regular expressions here, in which unescaped braces are literal characters and escaped ones introduce an interval (repetition) expression. In extended regular expressions, it's the other way around. In either case, though, the single quotes protect all enclosed characters from special treatment by the shell; whether grep treats them specially is another question.
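One way to see which program interprets what is to print the argument the shell actually hands over. This is only a small sketch; printf stands in for grep purely for illustration:

```shell
# What the shell passes on, with and without single quotes.
quoted=$(printf '%s' '.\{38\}')   # single quotes: the backslashes reach the command
unquoted=$(printf '%s' .\{38\})   # no quotes: the shell removes the backslashes

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

The quoted form arrives untouched, so any special meaning the braces or backslashes have from there on is grep's doing, not the shell's.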

There are many variants of regular expression syntax. By default, grep uses the "basic" ("BRE" or "obsolete") regular expression syntax, in which braces must be escaped to be treated as repetition bounds (what you're trying to do here); without the escapes, they're treated as just literal characters. In the "extended" ("ERE" or "modern"), Perl-compatible ("PCRE"), and ... well, pretty much all other variants, it's the other way around: escaped braces are treated as literal characters, and unescaped ones define repetition bounds.
grep '^.{38}<some date>' # Matches any character followed by literal braces around "38"
grep '^.\{38\}<some date>' # Matches 38 characters
grep -E '^.{38}<some date>' # Matches 38 characters (-E invokes "extended" syntax)
egrep '^.{38}<some date>' # Matches 38 characters (egrep uses "extended" syntax)
BTW, parentheses are the same: literal unless escaped in the basic syntax, literal if escaped in the extended syntax. And there are a few other differences; see the re_format man page. There are also many other syntax variants (Perl-compatible, etc). It's important to know what variant the tool you're using accepts, and format your RE appropriately for it.
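The difference is easy to check on a throwaway line. The sketch below uses a 5-character prefix instead of 38 and a made-up helper named count; both are assumptions made only to keep the example short:

```shell
# Sample lines: one with five arbitrary characters, one with literal braces.
five='abcde 2024'
braces='x{5} 2024'

# Hypothetical helper: print the number of matching lines read from stdin.
# grep exits with status 1 on zero matches, which is ignored here.
count() { grep -c "$@" || true; }

bre_plain=$(printf '%s\n' "$five" | count '^.{5} 2024')        # BRE: braces are literal
bre_escaped=$(printf '%s\n' "$five" | count '^.\{5\} 2024')    # BRE: escaped braces repeat
ere_plain=$(printf '%s\n' "$five" | count -E '^.{5} 2024')     # ERE: plain braces repeat
bre_literal=$(printf '%s\n' "$braces" | count '^.{5} 2024')    # BRE: matches the literal {5}
ere_literal=$(printf '%s\n' "$braces" | count -E '^.\{5\} 2024') # ERE: escaped braces are literal

printf '%s %s %s %s %s\n' "$bre_plain" "$bre_escaped" "$ere_plain" "$bre_literal" "$ere_literal"
```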
BTW2, as @Charles Duffy pointed out in a comment, parsing ls output isn't a good idea. In this case, the number of characters before the date will depend on the width of other fields (user, group, size), which will not be consistent, so skipping 38 characters might skip part of the date field or not skip enough. You'd be much better off using something like find with the -mtime or -mmin tests, or at least using stat instead of ls (since you can control the fields with the format string, and e.g. put the date at the beginning of the line) (but stat will still have some of ls's other problems).
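As a rough sketch of the find approach: the example below selects files modified on one particular day. It assumes a find that supports -newermt, which is a GNU/BSD extension rather than POSIX; the directory, file names and dates are made up for the demonstration:

```shell
# Scratch directory with two files whose modification times are set explicitly.
dir=$(mktemp -d)
touch -t 202401011200 "$dir/first.log"    # 2024-01-01 12:00 local time
touch -t 202401021200 "$dir/second.log"   # 2024-01-02 12:00 local time

# Files modified on 2024-01-01: newer than that midnight, not newer than the next one.
matches=$(find "$dir" -type f -newermt '2024-01-01' ! -newermt '2024-01-02')
printf '%s\n' "$matches"

rm -rf "$dir"
```

Unlike counting columns in ls -l output, this does not depend on how wide the owner, group or size fields happen to be.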

Related

Trouble understanding the non-obvious use of backslash inside of backticks

I have read a ton of pages including the bash manual, but still find the "non-obvious" use of backslashes confusing.
If I do:
echo \*
it prints a single asterisk; this is normal, as I am escaping the asterisk, making it literal.
If I do:
echo \\*
it prints \*
This also seems normal, the first backslash escapes the second.
If I do
echo `echo \\*`
It prints the contents of the directory. But in my mind it should print the same as echo \\*, because that is what gets substituted and passed to echo. I understand this is the non-obvious use of backslashes everyone talks about, but I am struggling to understand WHY it happens.
Also the bash manual says
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by ‘$’, ‘`’, or ‘\’.
But it doesn't define what the "literal meaning" of backslash is. Is it as an escape character, a continuation character, or just literally a backslash character?
Also, it says it retains its literal meaning, except when followed by ... So when it's followed by one of those three characters, what does it do? Does it only escape those three characters?
This is mostly for historical interest since `...` command substitution has been superseded by the cleaner $(...) form. No new script should ever use backticks.
Here's how you evaluate a $(command) substitution:
1. Run the command
Here's how you evaluate a `string` command substitution:
1. Determine the span of the string, from the opening backtick to the closing unescaped backtick (behavior is undefined if this backtick is inside a string literal: the shell will typically either treat it as literal backtick or as a closing backtick depending on its parser implementation)
2. Unescape the string by removing backslashes that come before one of the three characters dollar, backtick or backslash. This following character is then inserted literally into the command. A backslash followed by any other character will be left alone.
   - E.g. Hello\\ World will become Hello\ World, because the \\ is replaced with \
   - Hello\ World will also become Hello\ World, because the backslash is followed by a character other than one of those three, and therefore retains its literal meaning of just being a backslash
   - \\\* will become \\* since the \\ will become just \ (since backslash is one of the three), and the \* will remain \* (since asterisk is not)
3. Evaluate the result as a shell command (this includes following all regular shell escaping rules on the result of the now-unescaped command string)
So to evaluate echo `echo \\*`:
1. Determine the span of the string, here echo \\*
2. Unescape it according to the backtick quoting rules: echo \*
3. Evaluate it as a command, which runs echo to output a literal *
4. Since the result of the substitution is unquoted, the output will undergo:
   - Word splitting: * becomes * (since it's just one word)
   - Pathname expansion on each of the words, so * becomes bin Desktop Downloads Photos public_html according to files in the current directory
   Note in particular that this was not the same as replacing the backtick command with its output and rerunning the result. For example, we did not consider escapes, quotes and expansions in the output, which a simple text-based macro expansion would have.
5. Pass each of these as arguments to the next command (also echo): echo bin Desktop Downloads Photos public_html
The result is a list of files in the current directory.
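The difference between the two substitution forms can be observed directly. This is a minimal sketch, assuming bash's builtin echo and a scratch directory (with made-up file names) so that the glob result is predictable:

```shell
# Work in a scratch directory so pathname expansion gives a known result.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt

old=`echo \\*`    # backtick unescaping turns \\* into \*, so the inner echo prints *
new=$(echo \\*)   # no extra unescaping: the inner echo receives \* and prints \*

# Assignments are not glob-expanded, so these show the raw substitution output.
printf 'old=%s new=%s\n' "$old" "$new"

# Unquoted in an argument position, the * from the backtick form is then
# word-split and pathname-expanded before the outer echo runs.
expanded=$(echo `echo \\*`)
printf 'expanded=%s\n' "$expanded"

cd / && rm -rf "$dir"
```

Quoting the substitution, as in echo "`echo \\*`", would suppress the last two steps and print a bare asterisk.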

sed command: search and replace variables with \n character

I have two variables in a bash script:
$string = "The cat is green.\n"
$line = "Sunny day today.\n"
Each of those variables contains a "\n" sequence. How can I use sed to search and replace:
sed 's/$string/$line/g' file.txt
This doesn't seem to work, if I erase the "\n" from the strings sed works properly.
If I had only the text I could escape "\n" by adding a backslash:
sed 's/"The cat is green.\\n"/"Sunny day today.\\n"/g' file.txt
How can I manage to do search/replace when variables contain "\n" in them?
Thank you for the help.
It looks like you are trying to match the two-character sequence \n, as opposed to the single newline character that together they represent in some contexts. There is a tremendous difference between these.
As part of your example, you presented
sed 's/$string/$line/g' file.txt
, but that won't work at all, because variable references are not expanded within single-quoted strings. That has nothing whatever to do with the values of shell variables string and line.
But let's consider those values:
$string="The cat is green.\n"
$line="Sunny day today.\n"
[Extra spaces removed.]
Of course, the problem you're focusing on is that sed recognizes \n as a code for a newline character, but you also have the problem that in a regular expression, the . character matches any character, so if you want it to be treated as a literal then it, too, needs to be escaped (in the pattern, but not in the replacement). If you're trying to support search and replace for arbitrary text, then there are other characters you'll need to escape, too.
Answering the question as posed (escaping only \n sequences) you might do this:
sed "s/${string//\\n/\\\\n}/${line//\\n/\\\\n}/g"
The ${foo//pat/repl} form of parameter expansion performs pattern substitution on the expanded value, but note well that the pattern (pat) is interpreted according to shell globbing rules, not as a regular expression. That specific form replaces every appearance of the pattern; read the bash manual for alternatives that match only the first appearance and/or that match only at the beginning or the end of the parameter's value. Note, too, the extra doubling of the \ characters in the pattern substitution -- they need to be escaped for the shell, too.
Given your variable definitions, that command would be equivalent to this:
sed 's/The cat is green.\\n/Sunny day today.\\n/g'
In other words, exactly what you wanted. Again, however, be warned: that is not a general solution for arbitrary search & replace. If you want that, then you'll want to study the sed manual to determine which characters need to be escaped in the regex, and which need to be escaped in the replacement. Moreover, I don't see a way to do it with just one pattern substitution for each variable.
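For what it's worth, one possible direction for the more general case is to let sed do the escaping itself. The sketch below is not a complete solution: the helper names escape_pattern and escape_replacement are made up, and it assumes single-line values, basic (not extended) regex syntax, and / as the delimiter of the final s command:

```shell
# Hypothetical helper: escape a string for literal use in a sed BRE pattern.
# The bracket expression lists ] [ \ / . ^ $ * ; each match gets a backslash prefix.
escape_pattern() {
    printf '%s' "$1" | sed 's|[][\\/.^$*]|\\&|g'
}

# Hypothetical helper: escape a string for the replacement part of s/// .
# Only backslash, the / delimiter and & are special there.
escape_replacement() {
    printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

string='The cat is green.\n'   # single quotes: a literal backslash followed by n
line='Sunny day today.\n'

result=$(printf '%s\n' 'The cat is green.\n' 'The cat is greenX\n' |
    sed "s/$(escape_pattern "$string")/$(escape_replacement "$line")/g")
printf '%s\n' "$result"
```

Because the dot is escaped too, the second input line (with X in place of the dot) is left alone, which the plain \n-only substitution would not guarantee.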

Delete all lines containing a caret (^)

I tried sed -i '/^/d' myfile and it deleted the entire file. How to avoid this? I want to remove all lines with ^ in it.
sed -i '/\^/d' myfile
You need to escape the ^ special character.
In regular expressions, characters that are "special" lose their special meaning when they exist within a bracket expression (square brackets). So you'd think that a search for [^] would be what you need.
Alas, it turns out that while this works for most special characters, the caret gains a different special meaning when it is the first character of a bracket expression: it negates the expression. So [^] is actually invalid regex syntax, and the caret cannot be the first character inside the brackets.
What you're looking for, in GNU sed, might look like:
sed -i '/[\^]/d' myfile
This looks awkward (especially when compared to @threadp's answer), but I prefer the square bracket approach for escaping specials because it works the same way for most other special characters and its behaviour is consistent across regex parsers. Backslashes are used for other things -- continuing lines in shell scripts, converting characters to specials (\n, \t, etc). Too many backslashes can make things confusing. One caveat: in a POSIX bracket expression the backslash is itself a literal character, so [\^] matches a backslash as well as a caret; if the file may contain backslashes, use \^ as above.
One interesting thing to note is that the caret is only special within a bracket expression if it is the FIRST character. So the following works:
$ printf 'one\ntwo^\n' | sed -ne '/[X^]/p'
two^
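The three spellings discussed in this thread can be compared on a small sample without touching a real file. This is only an illustration; the sample lines are made up, and sed is run without -i so nothing is modified:

```shell
sample=$(printf '%s\n' 'plain' 'has^caret' 'has\backslash')

anchor=$(printf '%s\n' "$sample" | sed '/^/d')      # ^ is an anchor: every line matches
escaped=$(printf '%s\n' "$sample" | sed '/\^/d')    # \^ is a literal caret
bracket=$(printf '%s\n' "$sample" | sed '/[\^]/d')  # [\^] means "backslash or caret"

printf 'anchor:  [%s]\n' "$anchor"
printf 'escaped: [%s]\n' "$escaped"
printf 'bracket: [%s]\n' "$bracket"
```

The last case removes the line containing a backslash as well, because a backslash inside a bracket expression is an ordinary member of the set.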

egrep and grep difference with dollar

I'm having trouble understanding the different behaviors of grep and egrep when using \$ in a pattern.
To be more specific:
grep "\$this->db" file # works
egrep "\$this->db" file # does not work
egrep "\\$this->db" file # works
Can someone tell me why, or link to an explanation?
Thank you very much.
The backslash is being eaten by the shell's escape processing, so in the first two cases the regexp is just $this->db. The difference is that grep treats a $ that isn't at the end of the regexp as an ordinary character, but egrep treats it as a regular expression that matches the end of the line.
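In the last case, the double backslash causes a single backslash to be sent to egrep. Be aware, though, that inside double quotes this leaves $this unescaped as far as the shell is concerned, so the shell expands it as a variable (to nothing, if this is unset) and egrep actually receives \->db. That still matches the line, which is why it appears to work. To send egrep a backslash followed by a literal $, so that the $ is treated as an ordinary character rather than matching the end of the line, use single quotes ('\$this->db') or write "\\\$this->db".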
See man grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
With extended regular expressions (as used by egrep or grep -E), a metacharacter such as $ has to be escaped with a backslash to be matched literally. Hence the need for that backslash to survive the shell's own processing and actually reach egrep.
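To check what each quoting style really delivers, it can help to print the argument before blaming the regex engine. A small sketch, using grep -E in place of egrep (the two accept the same syntax) and assuming the shell variable this is unset:

```shell
unset this                       # make sure $this expands to nothing
line='$this->db->query()'

# What the shell hands to grep for each quoting style.
arg1=$(printf '%s' "\$this->db")    # $this->db
arg2=$(printf '%s' "\\$this->db")   # \->db  (the shell expanded $this)
arg3=$(printf '%s' '\$this->db')    # \$this->db

# Number of matching lines; grep exits 1 on zero matches, which is ignored.
bre=$(printf '%s\n' "$line" | grep -c "\$this->db" || true)          # BRE: leading $ is literal
ere=$(printf '%s\n' "$line" | grep -Ec "\$this->db" || true)         # ERE: $ is an anchor
ere_single=$(printf '%s\n' "$line" | grep -Ec '\$this->db' || true)  # ERE: \$ is literal

printf '%s | %s | %s\n' "$arg1" "$arg2" "$arg3"
printf '%s %s %s\n' "$bre" "$ere" "$ere_single"
```

Single quotes are usually the least surprising choice for patterns, since the shell then leaves both the backslash and the dollar sign alone.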

how to use egrep regex?

how to use egrep regex ?
source
exec pro..do_pr_ddd_sum 123039246, 995, 201705848
egrep '*pr_ddd_sum*123039246*995*' *
-- no result found
In the code above, it can't get any result back.
Perhaps you mean 'pr_ddd_sum.*123039246.*995'.
You're confusing shell wildcards with regular expression metacharacters. In the shell, a "*" matches any string of characters. In a regex, this metacharacter means zero or more of the preceding item. Look at Michael's suggestion. In it, the dot ('.') stands for any character, so '.*' means zero or more repetitions of any character.
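The two notations can be put side by side on the line from the question. This is just an illustrative sketch: a case statement stands in for shell wildcard matching, and grep -E is used in place of egrep:

```shell
line='exec pro..do_pr_ddd_sum 123039246, 995, 201705848'

# As a shell pattern (glob), * matches any string, so this matches.
case $line in
    *pr_ddd_sum*123039246*995*) glob=match ;;
    *) glob=nomatch ;;
esac

# As a regex, * repeats the preceding item: "m*" must be followed directly by "1",
# but the line has a space there, so nothing matches.
star_only=$(printf '%s\n' "$line" | grep -Ec 'pr_ddd_sum*123039246*995*' || true)

# ".*" is the regex spelling of "any string".
dot_star=$(printf '%s\n' "$line" | grep -Ec 'pr_ddd_sum.*123039246.*995' || true)

printf '%s %s %s\n' "$glob" "$star_only" "$dot_star"
```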

Resources