How to grep for filename with literal backslash and apostrophe - bash

I have a script in which I am trying to grep a log file for lines containing filenames whose apostrophes have been escaped with a backslash.
My grep code is:
grep -i saved logfile | grep "/path/to/file/filename contains spaces, apostrophe\'s, and commas"
The apostrophes in the logfile all have a preceding backslash so the following grep command works:
grep -i saved logfile | grep "/path/to/file/filename contains spaces, apostrophe\\\'s, and commas"
However, I am trying to run this in an if statement where the filename is a variable:
if [[ ! $(grep -i saved logfile | grep "$i") ]]
which doesn't return a match.
How can I escape the backslash and the apostrophe to get a match with grep?

There are multiple layers here. The backslash has a special meaning both to grep and in the shell inside double-quoted strings. Things are simpler if you put the regex in single quotes, but then, of course, the regex cannot contain a single quote. But you can have a single quote in double quotes adjacent to the single-quoted string.
grep -i saved logfile |
grep '/path/to/file/filename contains spaces, apostrophe\\'"'"'s, and commas'
The first single-quoted string ends with apostrophe\\' and is followed by "'" -- a double-quoted string containing a single quote. That in turn is followed by another single-quoted string.
Alternatively, add enough backslashes to satisfy both the shell and grep.
grep -i saved logfile |
grep "/path/to/file/filename contains spaces, apostrophe\\\'s, and commas"
Of course, another alternative is to use grep -F which will match the entire string as a literal, i.e. dots will only match dots, not any character, asterisks will only match asterisks, not repetitions of the previous character, etc.
(The correct plural of "apostrophe" is simply "apostrophes", though.)
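A minimal sketch of the grep -F route inside the if test from the question. It assumes $i already holds the filename exactly as it appears in the log (backslash included); the temporary log file and its two lines are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: find a filename containing \' in a log without any regex escaping.
# The log file and its lines are made up for this example.
logfile=$(mktemp)
trap 'rm -f "$logfile"' EXIT
printf '%s\n' \
  "Saved /path/to/file/filename contains spaces, apostrophe\\'s, and commas" \
  "Opened /path/to/file/other.txt" > "$logfile"

# Assumption: $i contains the name as logged, i.e. with the backslash.
# Inside double quotes \\ becomes a single \, and ' needs no escaping.
i="/path/to/file/filename contains spaces, apostrophe\\'s, and commas"

# -F makes grep treat "$i" as a fixed string, so \ ' , . and * are literal.
# -q suppresses output; -- guards against a name that starts with a dash.
if ! grep -i saved "$logfile" | grep -qF -- "$i"; then
  echo "no match"
else
  echo "match"
fi
```

Because nothing in $i is interpreted as regex syntax, the variable can be used as-is, with no extra layer of backslashes.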

grep'ing for "\\'" works for me:
root#ultra:~# a="\\\'"
root#ultra:~# echo -e marley\\\'s\ ghost\\nmarley\'s ghost
marley\'s ghost
marley's ghost
root#ultra:~# echo -e marley\\\'s\ ghost\\nmarley\'s ghost | grep $a
marley\'s ghost
root#ultra:~#

Related

How to remove single quotes from a string using sed

If I need to remove the following line from certain files
ob_start('ffggg_ggg');
I have tried
grep -rl "ob_start('ffggg_ggg');" /pathtosearch | xargs sed -i 's/[ob_start('ffggg_ggg');]//g'
but the rest of the characters have been removed except for single quotes.
How can I remove single quotes from a string using sed command?
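The attempt above fails for two reasons: the single quotes inside the single-quoted sed script end the shell quoting early (so sed never sees them), and [...] is a bracket expression that matches any one of the listed characters rather than the string as a whole. A minimal sketch, assuming GNU sed (BSD/macOS sed needs -i '' instead of -i) and a throwaway sample file whose contents are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU sed; the sample file contents are hypothetical.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' "<?php" "ob_start('ffggg_ggg');" "echo 'hi';" > "$f"

# Double quotes let the single quotes reach sed unchanged.
# In a basic regex ( and ) are literal; /regex/d deletes every matching line.
sed -i "/ob_start('ffggg_ggg');/d" "$f"
cat "$f"

# To strip only the single quotes from a string instead:
echo "ob_start('ffggg_ggg');" | sed "s/'//g"
```

The same double-quoted expression can be dropped into the grep -rl ... | xargs sed -i pipeline from the question.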

Escape "./" when using sed

I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between.
This sed command is correct for this purpose, but how to escape the ./ replacing firstmatch so it can work?
Thanks in advance!
Use bash's Parameter Substitution
lastblock="./2.json"
name="${lastblock##*/}" # strips from the beginning until last / -> 2.json
base="${name%.*}" # strips from the last . to the end -> 2
but I found that grep works only for files, not for stdout output.
Here it is (if your grep supports the -P flag):
lastblock="./2.json"
echo "$lastblock" | grep -Po '(?<=\./).*(?=\.)'
but how to escape the ./
With sed(1), escape it using a backslash \:
lastblock="./2.json"
echo "$lastblock" | sed 's/^\.\///;s/\..*$//'
Or use a different delimiter like a pipe |
sed 's|^\./||;s|\..*$||'
With awk:
lastblock="./2.json"
echo "$lastblock" | awk -F'[./]+' '{print $2}'
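The awk field split assumes the name sits directly under ./. A quick sketch with a hypothetical path that has a directory component (./dir/2.json is made up, not from the question) shows how it differs from the parameter expansions in the first answer, which always work on the last path component:

```shell
#!/usr/bin/env bash
# Hypothetical input with a directory component (not from the question).
lastblock="./dir/2.json"

# Splitting on runs of . and / makes field 2 the first directory name.
echo "$lastblock" | awk -F'[./]+' '{print $2}'    # dir

# The parameter expansions always work on the last path component.
name="${lastblock##*/}"   # 2.json
base="${name%.*}"         # 2
echo "$base"
```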
Starting with bash v3, regular expression pattern matching is supported using the =~ operator inside the [[ ... ]] keyword:
lastblock="./2.json"
regex='^\./([[:digit:]]+)\.json'
[[ $lastblock =~ $regex ]] && echo "${BASH_REMATCH[1]}"
Although a parameter expansion (P.E.) should suffice for this purpose.
I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
Nonsense. grep works the same for the same input, regardless of whether it is from a file or from the standard input.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between. This sed command is correct for this purpose,
That sed command is nowhere near correct for the stated purpose. It has this effect:
delete every line from the very first one up to and including the next subsequent one that matches the regular expression /firstmatch/, AND
delete every line from the first one matching the regular expression /.json/ to the last one of the file (and note that . is a regex metacharacter).
To remove part of a line instead of deleting a whole line, use an s/// command instead of a d command. As for escaping, you can escape a character to sed by preceding it with a backslash (\), which itself must be quoted or escaped to protect it from interpretation by the shell. Additionally, most regex metacharacters lose their special significance when they appear inside a character class, which I find to be a more legible way to include them in a pattern as literals. For example:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\///; s/[.]json$//'
That says to remove the literal characters ./ appearing at the beginning of the (any) line, and, separately, to remove the literal characters .json appearing at the end of the line.
Alternatively, if you want to modify only those lines that both start with ./ and end with .json then you can use a single s command with a capturing group and a backreference:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\/\(.*\)[.]json$/\1/'
That says that on lines that start with ./ and end with .json, capture everything between those two and replace the whole line with the captured part alone.
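To see the difference between the two commands, here is a sketch run on three hypothetical input lines (invented for illustration), of which only the first both starts with ./ and ends with .json:

```shell
#!/usr/bin/env bash
# Hypothetical sample lines: only the first matches at both ends.
input=$(printf '%s\n' './2.json' './notes.txt' 'data.json')

# Two independent substitutions: each end is trimmed wherever it matches.
printf '%s\n' "$input" | sed 's/^[.]\///; s/[.]json$//'
# 2
# notes.txt
# data

# One substitution with a capture group: a line must match at both ends.
printf '%s\n' "$input" | sed 's/^[.]\/\(.*\)[.]json$/\1/'
# 2
# ./notes.txt
# data.json
```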
You can use another character like '#' when you want to avoid slashes.
You can remember a part that matches and use it in the replacement.
Use [.] so that the dot matches only a literal dot, not any character.
echo "$lastblock" | sed -r 's#[.]/(.*)[.]json#\1#'
Solution!
Just discovered today the tr command thanks to this legendary, unrelated answer.
When searching all over Google for how to exclude "." and "/", 100% of StackOverflow answers didn't help.
So, to remove characters from the output of a command, just append this pipe:
| tr -d "{character-emoji-anything-you-want-to-exclude}"
So, a full working and simple sample:
echo "./2.json" | tr -d "/" | tr -d "." | tr -d "json"
And done!
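One caveat: tr -d deletes every occurrence of each listed character, not a substring, so tr -d "json" removes any j, s, o, or n anywhere in the name. With a hypothetical name that contains those letters (./son.json is invented here), the pipeline empties the result, while the parameter expansion from the first answer keeps it:

```shell
#!/usr/bin/env bash
lastblock="./son.json"   # hypothetical name containing s, o, and n

# tr removes the characters j, s, o, n wherever they occur, not the word "json".
echo "$lastblock" | tr -d "/" | tr -d "." | tr -d "json"   # prints an empty line

# Parameter expansion strips the directory prefix and the extension instead.
name="${lastblock##*/}"
echo "${name%.*}"   # son
```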

How to grep information?

What I have:
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
I'd like to filter this text and finally get the following list as .txt file:
user653434
user9659333
It's important to get the names without "#" sign.
Thx for help ;)
Using grep -P (requires GNU grep):
$ grep -oP '(?<=#)\w+' File
user653434
user9659333
-o tells grep to print only the match.
-P tells grep to use Perl-style regular expressions.
(?<=#) tells grep that # must precede the match but the # is not included in the match.
\w+ matches one or more word characters. This is what grep will print.
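A runnable sketch with the sample input from the question, assuming GNU grep built with PCRE support. It writes the names to a .txt file as asked; File and users.txt are placeholder names, created in a temporary directory:

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P (PCRE) support. File and users.txt are placeholders.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
cat > File <<'EOF'
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
EOF

# Print only the word characters that directly follow a #, one per line.
grep -oP '(?<=#)\w+' File > users.txt
cat users.txt
```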
To change the file in place with grep:
grep -oP '(?<=#)\w+' File >tmp && mv tmp File
Using sed
$ sed -En 's/^#([[:alnum:]]+).*/\1/p' File
user653434
user9659333
And, to change the file in place:
sed -En -i.bak 's/^#([[:alnum:]]+).*/\1/p' File
-E tells sed to use the extended form of regular expressions. This reduces the need to use escapes.
-n tells sed not to print anything unless we explicitly ask it to.
-i.bak tells sed to change the file in place while leaving a backup file with the extension .bak.
The leading s in s/^#([[:alnum:]]+).*/\1/p tells sed that we are using a substitute command. The command has the typical form s/old/new/ where old is a regular expression and sed replaces old with new. The trailing p is an option to the substitute command: the p tells sed to print the resulting line.
In our case, the old part is ^#([[:alnum:]]+).*. Starting from the beginning of the line, ^, this matches # followed by one or more alphanumeric characters, ([[:alnum:]]+), followed by anything at all, .*. Because the alphanumeric characters are placed in parens, this is saved as a group, denoted \1.
The new part of the substitute command is just \1, the alphanumeric characters from above which comprise the user name.
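One design difference between the grep and sed answers, shown with a hypothetical extra line (see #issue42 below is not in the question's sample): the grep lookbehind matches a # anywhere on a line, while the sed pattern is anchored with ^ and only picks up names at the start of a line. This sketch assumes GNU grep for -P.

```shell
#!/usr/bin/env bash
# Assumes GNU grep (-P) and a sed with -E. The second line is hypothetical.
sample=$(printf '%s\n' '#user653434 text and so' 'see #issue42 below')

printf '%s\n' "$sample" | grep -oP '(?<=#)\w+'
# user653434
# issue42

printf '%s\n' "$sample" | sed -En 's/^#([[:alnum:]]+).*/\1/p'
# user653434
```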
With GNU grep:
grep -Po '^#\K[^ ]*' file
Output:
user653434
user9659333
See: The Stack Overflow Regular Expressions FAQ

sed ' in replacement text when $ in search pattern

I have a file containing the string "this $c$ is a single quote", created as follows:
%echo "this \$c\$ is a single quote" > test3.txt
%cat test3.txt
this $c$ is a single quote
I would like to replace the letter c by a single quote, but I need to match the $ characters as well (to avoid matching other characters 'c'). I can't seem to do this.
I tried
%sed 's/$c$/$foo$/' test3.txt
this $c$ is a single quote
so obviously I need to escape the $.
%sed 's/\$c\$/$foo$/' test3.txt
this $foo$ is a single quote
But when I try to put an escaped ' in the replacement text I get
%sed 's/$c$/$\'$/' test3.txt
quote>
So I need to use some other quoting method.
%sed "s/\$c\$/$'$/" test3.txt
this $c$ is a single quote
Nothing was replaced, so let's try not escaping the $
%sed "s/$c$/$'$/" test3.txt
this $c$ is a single quote$'$
That was unexpected (to me), so let's try matching just the c as a sanity check.
%sed "s/c/'/" test3.txt
this $'$ is a single quote
I tried a number of other combinations but no luck. How do I do this in sed?
How about this?
!$ echo 'This is $c$ ceee' | sed s/\\\$c\\\$/\'/
This is ' ceee
I do not enclose the whole sed command in quotes, so I need to escape each backslash and each dollar separately (and the quote as well, of course).
Edit
As Chris Lear points out, my replacement string contains no dollars. Here is a fix. Note that these dollars have no special meaning for sed (in the replacement they are just plain characters to be inserted, not anchors), so they need to be escaped only once, for the shell:
!$ echo 'This is $c$ ceee' | sed s/\\\$c\\\$/\\\$\'\\\$/
This is $'$ ceee
!$ echo 'This is $c$ ceee' | sed s/\\\$c\\\$/\$\'\$/
This is $'$ ceee
If you want to quote the sed command, you need to do plenty of escaping. $ is a special character both for the shell and in sed patterns. In the shell it marks the start of a variable name to expand, $c in your case; to sed it means the end of the line. To quote it, you need to escape it from both of those, so you could do:
sed "s/\\\$c\\\$/\$'\$/" test3.txt
or you could mix your quoting styles to use single quotes around the $ expansions and double quotes around your single quote like
sed 's/\$c\$/$'"'"'$/' test3.txt
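Printing the script with printf before handing it to sed shows what sed actually receives. This sketch (with c unset, as in the question, and test3.txt created in a temporary directory) also explains the two surprising results in the question: "s/$c$/$'$/" becomes s/$/$'$/, which matches the end of every line and appends $'$, while "s/\$c\$/$'$/" becomes s/$c$/$'$/, where the final $ is an end-of-line anchor, so $c would have to sit at the very end of the line to match.

```shell
#!/usr/bin/env bash
unset c   # as in the question, c is not a shell variable
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
echo "this \$c\$ is a single quote" > "$dir/test3.txt"

# What sed receives from the question's double-quoted attempts:
printf '%s\n' "s/$c$/$'$/"        # s/$/$'$/    ($c expanded to nothing)
printf '%s\n' "s/\$c\$/$'$/"      # s/$c$/$'$/  (last $ is an end-of-line anchor)

# Both working forms from the answer hand sed the same script: s/\$c\$/$'$/
printf '%s\n' "s/\\\$c\\\$/\$'\$/"
printf '%s\n' 's/\$c\$/$'"'"'$/'

sed "s/\\\$c\\\$/\$'\$/" "$dir/test3.txt"   # this $'$ is a single quote
sed 's/\$c\$/$'"'"'$/' "$dir/test3.txt"
```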
You can use an ANSI-C quoted string ($'...'):
sed $'s/\$c\$/\'/'
This allows single backslash escaping of $s and 's.
If you want to keep the $s:
sed $'s/\$c\$/$\'$/'
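A sketch checking both ANSI-C variants against a test3.txt created in a temporary directory. Inside $'...', \' becomes a plain ' while an escape bash does not recognise, such as \$, is kept as-is, so sed still sees \$ as a literal dollar:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
echo "this \$c\$ is a single quote" > "$dir/test3.txt"

# Show what sed receives: s/\$c\$/'/ and s/\$c\$/$'$/ respectively.
printf '%s\n' $'s/\$c\$/\'/'
printf '%s\n' $'s/\$c\$/$\'$/'

sed $'s/\$c\$/\'/' "$dir/test3.txt"     # this ' is a single quote
sed $'s/\$c\$/$\'$/' "$dir/test3.txt"   # this $'$ is a single quote
```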
Try this:
sed -i -E "s/([$]c[$])/\'/g" sed.txt

Delete flanking uppercase characters in a string

How could I remove the uppercases that start and end in this string (DNA sequence) using the linux terminal?
Input:
TCGTAAATGGTgggggtcagaccctaaggtttccataaagGCTGGtccaaacgcaacttctaattgaatgataaaatactcatgcatgttGTTCGAtaaaacgtaatatttatggcgtgtctacctaccgttccatcttatcgtttaaactttggtacaattctcagttaagtgacgattgctttggaggaagtaatactgtgatcacaatctatgctgtttgcgttgccAAAAAAtttcaatgtaaaaaaaaaTCGAAAATGGT
Desired Output:
gggggtcagaccctaaggtttccataaagGCTGGtccaaacgcaacttctaattgaatgataaaatactcatgcatgttGTTCGAtaaaacgtaatatttatggcgtgtctacctaccgttccatcttatcgtttaaactttggtacaattctcagttaagtgacgattgctttggaggaagtaatactgtgatcacaatctatgctgtttgcgttgccAAAAAAtttcaatgtaaaaaaaaa
Note there are other internal uppercases in the string that must be preserved.
Thanks!
Using sed you can do this, assuming each string is in one line:
sed 's/^[A-Z]*\|[A-Z]*$//g' <<< "$s"
You could use sed with a regular expression:
sed -e 's/^[A-Z]*//' -e 's/[A-Z]*$//'
(It would also be possible to combine these into a single regex, but I wrote it this way for clarity; the first regex strips leading uppercase chars, the second strips trailing uppercase chars.)
[me#localhost ~]$ echo 'TCGTAAATGGTgggggtcagaccctaaggtttccataaagGCTGGtccaaacgcaacttctaattgaatgataaaatactcatgcatgttGTTCGAtaaaacgtaatatttatggcgtgtctacctaccgttccatcttatcgtttaaactttggtacaattctcagttaagtgacgattgctttggaggaagtaatactgtgatcacaatctatgctgtttgcgttgccAAAAAAtttcaatgtaaaaaaaaaTCGAAAATGGT' | sed -e 's/^[A-Z]*//' -e 's/[A-Z]*$//'
gggggtcagaccctaaggtttccataaagGCTGGtccaaacgcaacttctaattgaatgataaaatactcatgcatgttGTTCGAtaaaacgtaatatttatggcgtgtctacctaccgttccatcttatcgtttaaactttggtacaattctcagttaagtgacgattgctttggaggaagtaatactgtgatcacaatctatgctgtttgcgttgccAAAAAAtttcaatgtaaaaaaaaa
Suppose the sequence is stored in a variable:
sequence=TCGTAAATGGTgggggtcagaccctaaggtttccataaagGCTGGtccaaacgcaacttctaattgaatgataaaatactcatgcatgttGTTCGAtaaaacgtaatatttatggcgtgtctacctaccgttccatcttatcgtttaaactttggtacaattctcagttaagtgacgattgctttggaggaagtaatactgtgatcacaatctatgctgtttgcgttgccAAAAAAtttcaatgtaaaaaaaaaTCGAAAATGGT
A pure-bash solution, which requires extended patterns, would be:
shopt -s extglob
tmp1=${sequence##*([TCGA])} # Save the result of stripping the leading capitals
echo "${tmp1%%*([TCGA])}" # Strip the trailing capitals
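A sketch running the three answers side by side on a short made-up sequence (ACGTaaaGGcccTTT is invented, not from the question), so the results are easy to compare by eye. The \| form assumes GNU sed; the other sed form is portable.

```shell
#!/usr/bin/env bash
shopt -s extglob
# Short made-up sequence: leading ACGT and trailing TTT are the flanks.
sequence=ACGTaaaGGcccTTT

# Portable sed: strip leading, then trailing, uppercase runs.
sed -e 's/^[A-Z]*//' -e 's/[A-Z]*$//' <<< "$sequence"   # aaaGGccc

# GNU sed only: \| alternation in a basic regex.
sed 's/^[A-Z]*\|[A-Z]*$//g' <<< "$sequence"              # aaaGGccc

# Pure bash with extended globs; the internal GG is preserved.
tmp1=${sequence##*([TCGA])}
echo "${tmp1%%*([TCGA])}"                                 # aaaGGccc
```

Note that [TCGA] only covers the four bases, so a flanking N (or another uppercase IUPAC code) would survive the pure-bash version, whereas [A-Z] in the sed versions strips it.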
