Replacing "#", "$", "%", "&", and "_" with "\#", "\$", "\%", "\&", and "\_" - bash

I have a plain text document that I want to compile with LaTeX. However, it sometimes contains the characters "#", "$", "%", "&", and "_". To compile properly in LaTeX, I must first replace these characters with "\#", "\$", "\%", "\&", and "\_". I have used these lines in sed:
sed -i 's/\#/\\\#/g' ./file.txt
sed -i 's/\$/\\\$/g' ./file.txt
sed -i 's/\%/\\\%/g' ./file.txt
sed -i 's/\&/\\\&/g' ./file.txt
sed -i 's/\_/\\\_/g' ./file.txt
Is this correct?
Unfortunately, the file is too large to open in any GUI software, so checking if my sed line is correct with a text editor is difficult. I tried searching with grep, but the search does not work as expected (e.g. below, I searched for any lines containing "$"):
grep "\$" file.txt
What is the best way to put "\" in front of these characters?
How can I use grep to successfully check the lines with the replacements?

You can do the replacement with a single call to sed:
sed -i -E 's/([#$%&_\])/\\&/g' file.txt
The & in the replacement text stands for the whole match, which here is whichever single character from the bracket expression was found. Note that since \ is the LaTeX escape character, you'll have to escape it as well in the original file.
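A minimal sketch for trying the idea on a sample string before touching the real file. The function name escape_latex and the sample text are made up for illustration, and the backslash is left out of the bracket expression so the sketch covers only the five characters from the question.

```shell
# Hypothetical helper: escape the five LaTeX-special characters from the
# question. Reads stdin, writes stdout; no file is modified.
escape_latex() {
    # Inside [...] the characters # $ % & _ are all literal.
    # In the replacement, \\ is a literal backslash and & is the match.
    sed 's/[#$%&_]/\\&/g'
}

printf '%s\n' '50% of $10 & a_b #1' | escape_latex
# prints: 50\% of \$10 \& a\_b \#1
```

Once the output looks right, the same sed expression can be run with -i on the file itself.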

sed -i 's/\#/\\\#/g' ./file.txt
sed -i 's/\$/\\\$/g' ./file.txt
sed -i 's/\%/\\\%/g' ./file.txt
sed -i 's/\&/\\\&/g' ./file.txt
sed -i 's/\_/\\\_/g' ./file.txt
You don't need the \ on the first (search) string on most of them, just $ (it's a special character, meaning the end of a line; the rest aren't special). And in the replacement, you only need two \\, not three. Also, you could do it all in one with several -e statements:
sed -i.bak -e 's/#/\\#/g' \
-e 's/\$/\\$/g' \
-e 's/%/\\%/g' \
-e 's/&/\\&/g' \
-e 's/_/\\_/g' file.txt
You don't need to double-escape anything (except the \\) because these are single-quoted. In your grep, bash is interpreting the escape on the $ because it's a special character (specifically, a sigil for variables), so grep is getting and searching for just the $, which is a special character meaning the end of a line. You need to either single-quote it to prevent bash from interpreting the \ ('\$'), or add another pair of backslashes: "\\\$". Presumably, that's where you're getting the \\\ from, but you don't need it in the sed as it's written.

I think your problem is that bash itself is handling those escapes.
What you have looks right to me. But warning: it will also doubly escape e.g. a \# that is already escaped. If that's not what you want, you might want to modify your patterns to check that there isn't a preceding \ already.
$ introduces variable expansion and command substitution in bash. I guess grep "\\$" file.txt should do what you expect.
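One way to skip characters that already have a backslash in front of them is sketched below. The function name escape_unescaped is hypothetical, and the sketch assumes that a backslash directly before the character always means "already escaped" (so \\# is left alone too, which may or may not be what you want).

```shell
# Hypothetical helper: escape # $ % & _ only when the character is not
# already preceded by a backslash. Reads stdin, writes stdout.
escape_unescaped() {
    # The substitution is repeated via a label/branch loop instead of the
    # g flag, because with g the character consumed as "the preceding
    # character" could not itself be escaped (think of "##").
    sed -E -e ':again' \
        -e 's/(^|[^\\])([#$%&_])/\1\\\2/' \
        -e 't again'
}

printf '%s\n' 'a## \# b_c' | escape_unescaped
# prints: a\#\# \# b\_c
```

The loop ends because every pass removes one unescaped character and never creates a new one.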

I won't answer the sed part; the other answers are good enough ;-)
You can use less as a viewer to check your huge file (or more, but less is more comfortable than more).
For searching, you can use fgrep: it ignores regular expressions, so fgrep '\$' really searches for the literal text \$. fgrep is the same as invoking grep -F.
EDIT:
fgrep '\$' and fgrep "\$" are different. In the second case, bash interprets the string and will replace it by a single character: $ (i.e. fgrep will search for $ only).
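To check a file that is too large to open, something along these lines might help. Both function names and the sample lines are invented for the illustration; the second search assumes that "escaped" simply means "preceded by a backslash".

```shell
# Hypothetical helpers; both read the text to check from stdin.

# Lines containing the two-character text \$ (fixed-string search).
find_escaped_dollar() {
    grep -nF '\$'
}

# Lines where # $ % & or _ is NOT preceded by a backslash,
# i.e. characters the sed pass missed.
find_unescaped() {
    grep -nE '(^|[^\\])[#$%&_]'
}

printf '%s\n' 'cost: \$5' 'raw $ sign' 'plain line' | find_escaped_dollar
# prints: 1:cost: \$5
printf '%s\n' 'cost: \$5' 'raw $ sign' 'plain line' | find_unescaped
# prints: 2:raw $ sign
```

On the real file this would be find_unescaped < file.txt | less; if it prints nothing, no unescaped character is left.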

Related

How to grep information?

What I have:
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
I'd like to filter this text and finally get the following list as .txt file:
user653434
user9659333
It's important to get the names without "#" sign.
Thx for help ;)
Using grep -P (requires GNU grep):
$ grep -oP '(?<=#)\w+' File
user653434
user9659333
-o tells grep to print only the match.
-P tells grep to use Perl-style regular expressions.
(?<=#) tells grep that # must precede the match but the # is not included in the match.
\w+ matches one or more word characters. This is what grep will print.
To change the file in place with grep:
grep -oP '(?<=#)\w+' File >tmp && mv tmp File
Using sed
$ sed -En 's/^#([[:alnum:]]+).*/\1/p' File
user653434
user9659333
And, to change the file in place:
sed -En -i.bak 's/^#([[:alnum:]]+).*/\1/p' File
-E tells sed to use the extended form of regular expressions. This reduces the need to use escapes.
-n tells sed not to print anything unless we explicitly ask it to.
-i.bak tells sed to change the file in place while leaving a backup file with the extension .bak.
The leading s in s/^#([[:alnum:]]+).*/\1/p tells sed that we are using a substitute command. The command has the typical form s/old/new/ where old is a regular expression and sed replaces old with new. The trailing p is an option to the substitute command: the p tells sed to print the resulting line.
In our case, the old part is ^#([[:alnum:]]+).*. Starting from the beginning of the line, ^, this matches # followed by one or more alphanumeric characters, ([[:alnum:]]+), followed by anything at all, .*. Because the alphanumeric characters are placed in parens, this is saved as a group, denoted \1.
The new part of the substitute command is just \1, the alphanumeric characters from above which comprise the user name.
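The sed command explained above can be tried without a file by feeding it the sample lines on stdin. The wrapper name extract_users is only for illustration.

```shell
# Hypothetical wrapper around the sed command discussed above.
# Reads stdin and prints only the user names from lines starting with #.
extract_users() {
    # -n suppresses default output; the p flag prints lines where the
    # substitution succeeded, so lines without a leading # vanish.
    sed -En 's/^#([[:alnum:]]+).*/\1/p'
}

printf '%s\n' 'test' 'more text' '#user653434 text and so' \
              'test' 'more text' '#user9659333 text and so' | extract_users
# prints:
# user653434
# user9659333
```

Redirecting the output (extract_users < File > users.txt) gives the .txt list the question asks for.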
With GNU grep:
grep -Po '^#\K[^ ]*' file
Output:
user653434
user9659333
See: The Stack Overflow Regular Expressions FAQ

How to replace double period with single period in a file with SED or AWK?

What's the recommended way to do this and save it to a new file?
I tried
sed -i 's/.././g' /tmp/folder/domains.new
But it replaced single periods as well..
A period is a RE metacharacter so you need to escape it to have it taken literally:
sed 's/\.\././g' oldfile > newfile
Ed Morton has the right answer. Here is another way to ensure that RE metacharacters are taken literally (wrap them inside character classes):
sed -i 's/[.][.]/./g' /tmp/folder/domains.new
or
sed -i 's/[.]\{2\}/./g' /tmp/folder/domains.new
or use the -E option to enable extended regex, which avoids having to escape { and }:
sed -i -E 's/[.]{2}/./g' /tmp/folder/domains.new
Note that in the replacement part, . and other metacharacters are always considered literal and not special.
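A small sketch of the difference, using made-up sample strings; the function name collapse_double_dots is hypothetical.

```shell
# Unescaped: each . matches ANY character, so every pair of characters
# is replaced by a single period.
printf '%s\n' 'ab' | sed 's/.././g'
# prints: .

# Hypothetical helper with the dots escaped: only two literal periods match.
collapse_double_dots() {
    sed 's/\.\././g'
}

printf '%s\n' 'www..example..com' | collapse_double_dots
# prints: www.example.com
```

Writing to a new file, as the question asks, is then collapse_double_dots < oldfile > newfile.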

How to pass special characters through sed

I want to pass this command in my script:
sed -n -e "/Next</a></p>/,/Next</a></p>/ p" file.txt
This command (should) extract all text between the two matched patterns, which are both Next</a></p> in my case. However when I run my script I keep getting errors. I've tried:
sed -n -e "/Next\<\/a\>\<\/p\>/,/Next<\/a\>\<\/p>/ p" file.txt with no luck.
I believe the generic pattern for this command is this:
sed -n -e "/pattern1/,/pattern2/ p" file.txt
I can't get it working for Next</a></p> though and I'm guessing it has something to do with the special characters I am encasing. Is there any way to pass Next</a></p> in the sed command? Thanks in advance guys! This community is awesome!
You don't need to use / as a regular expression delimiter. Using a different character will make quoting issues slightly easier. The syntax is
\cregexc
where c can be any character (other than \) that you don't use in the regex. In this case, : might be a good choice:
sed -n -e '\:Next</a></p>:,\:Next</a></p>: p' file.txt
Note that I changed " to ' because inside double quotes, \ will be interpreted by bash as an escape character, whereas inside single quotes \ is just treated as a regular character. Consequently, you could have written the version with escaped slashes like this:
sed -n -e '/Next<\/a><\/p>/,/Next<\/a><\/p>/ p' file.txt
but I think the version with colons is (slightly) easier to read.
You need to escape the forward slashes inside the regular expressions with a \, since the forward slashes serve as delimiters for the regexes
sed -n -e '/Next<\/a><\/p>/,/Next<\/a><\/p>/p' file.txt
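For what it's worth, here is a small sketch of the colon-delimited variant run on invented sample input. The function name between_markers and the sample HTML are assumptions for the demonstration.

```shell
# Hypothetical wrapper: print everything from the first line containing
# Next</a></p> through the next line that contains it again.
between_markers() {
    # \: starts a regex delimited by : so the / in </a> needs no escaping.
    sed -n -e '\:Next</a></p>:,\:Next</a></p>: p'
}

printf '%s\n' 'header' \
              '<p><a href="x.html">Next</a></p>' \
              'body line' \
              '<p><a href="x.html">Next</a></p>' \
              'footer' | between_markers
# prints the three lines from the first marker through the second one
```

Note that both marker lines themselves are part of the output, since a sed address range is inclusive.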

bash strip line matching wildcard

sed -i -e "/^*google.com*/d" activedomains.txt
What I am trying to do is strip any line containing * google.com * it needs to be the wildcard on both front and rear, can't seem to figure it out :/
sed uses regex, not globbing (although maybe there is something that does). Pretty simple to change, though:
sed -i '/google\.com/d' activedomains.txt
This deletes any line that matches google.com. You could also use
sed -i -e '/^.*google.com.*/d' activedomains.txt
...which is more in line with what you were doing and literally means "the start of the string, then zero or more of any character followed by 'google (one of any character) com' followed by zero or more of any character." Of course, since it is surrounded by "zero or mores," it's just as well to match it directly.
do you mean this?
sed -i -e "/google\.com/d" activedomains.txt
This should work:
sed -i -e "/google.com/d" activedomains.txt
No need for a wildcard here: it works like grep.
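A quick sketch of why the escaped dot matters, using invented domain names; drop_google is a hypothetical name, and the function works on stdin so nothing is edited in place.

```shell
# Hypothetical helper: delete every line containing the literal text
# google.com anywhere in it. No leading or trailing wildcard is needed,
# because a sed address matches anywhere in the line.
drop_google() {
    sed '/google\.com/d'
}

printf '%s\n' 'mail.google.com' 'example.org' 'googleXcom.net' | drop_google
# prints:
# example.org
# googleXcom.net
```

With an unescaped dot (/google.com/), the googleXcom.net line would be deleted as well.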

How to insert a newline in front of a pattern?

How to insert a newline before a pattern within a line?
For example, this will insert a newline behind the regex pattern.
sed 's/regex/&\n/g'
How can I do the same but in front of the pattern?
Given this sample input file, the pattern to match on is the phone number.
some text (012)345-6789
Should become
some text
(012)345-6789
This works in bash and zsh, tested on Linux and OS X:
sed 's/regexp/\'$'\n/g'
In general, for $ followed by a string literal in single quotes bash performs C-style backslash substitution, e.g. $'\t' is translated to a literal tab. Plus, sed wants your newline literal to be escaped with a backslash, hence the \ before $. And finally, the dollar sign itself shouldn't be quoted so that it's interpreted by the shell, therefore we close the quote before the $ and then open it again.
Edit: As suggested in the comments by @mklement0, this works as well:
sed $'s/regexp/\\\n/g'
What happens here is: the entire sed command is now a C-style string, which means the backslash that sed requires to be placed before the new line literal should now be escaped with another backslash. Though more readable, in this case you won't be able to do shell string substitutions (without making it ugly again.)
Some of the other answers didn't work for my version of sed.
Switching the position of & and \n did work.
sed 's/regexp/\n&/g'
Edit: This doesn't seem to work on OS X, unless you install gnu-sed.
In sed, you can't add newlines in the output stream easily. You need to use a continuation line, which is awkward, but it works:
$ sed 's/regexp/\
&/'
Example:
$ echo foo | sed 's/.*/\
&/'
foo
See here for details. If you want something slightly less awkward you could try using perl -pe with match groups instead of sed:
$ echo foo | perl -pe 's/(.*)/\n$1/'
foo
$1 refers to the first matched group in the regular expression, where groups are in parentheses.
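Applied to the phone number from the question, the continuation-line form could look like the sketch below. The function name break_before_phone is hypothetical, and the pattern assumes numbers always have the exact shape (ddd)ddd-dddd.

```shell
# Hypothetical helper: put a newline in front of a phone number.
# Basic regex: ( and ) are literal, \{3\} means "exactly three".
break_before_phone() {
    # The backslash at the end of the first line makes the newline part
    # of the replacement; & then re-inserts the matched number.
    sed 's/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/\
&/'
}

printf '%s\n' 'some text (012)345-6789' | break_before_phone
# prints "some text " (with its trailing space) and then (012)345-6789
# on a line of its own
```

This form avoids \n in the replacement, which is the part that differs between sed implementations.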
On my mac, the following inserts a single 'n' instead of newline:
sed 's/regexp/\n&/g'
This replaces with newline:
sed "s/regexp/\\`echo -e '\n\r'`/g"
echo one,two,three | sed 's/,/\
/g'
You can use perl one-liners much like you do with sed, with the advantage of full perl regular expression support (which is much more powerful than what you get with sed). There is also very little variation across *nix platforms - perl is generally perl. So you can stop worrying about how to make your particular system's version of sed do what you want.
In this case, you can do
perl -pe 's/(regex)/\n$1/'
-pe puts perl into an "execute and print" loop, much like sed's normal mode of operation.
' quotes everything else so the shell won't interfere
() surrounding the regex is a grouping operator. $1 on the right side of the substitution prints out whatever was matched inside these parens.
Finally, \n is a newline.
Regardless of whether you are using parentheses as a grouping operator, you have to escape any parentheses you are trying to match. So a regex to match the pattern you list above would be something like
\(\d\d\d\)\d\d\d-\d\d\d\d
\( or \) matches a literal paren, and \d matches a digit.
Better:
\(\d{3}\)\d{3}-\d{4}
I imagine you can figure out what the numbers in braces are doing.
Additionally, you can use delimiters other than / for your regex. So if you need to match / you won't need to escape it. Either of the below is equivalent to the regex at the beginning of my answer. In theory you can substitute any character for the standard /'s.
perl -pe 's#(regex)#\n$1#'
perl -pe 's{(regex)}{\n$1}'
A couple final thoughts.
Using -ne instead of -pe acts similarly, but doesn't automatically print at the end. It can be handy if you want to print on your own. E.g., here's a grep-alike (m/foobar/ is a regex match):
perl -ne 'if (m/foobar/) {print}'
If you are finding dealing with newlines troublesome, and you want it to be magically handled for you, add -l. Not useful for the OP, who was working with newlines, though.
Bonus tip - if you have the pcre package installed, it comes with pcregrep, which uses full perl-compatible regexes.
In this case, I do not use sed. I use tr.
cat Somefile |tr ',' '\012'
This takes the comma and replaces it with a newline (octal 012 is the line feed character, not the carriage return).
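A tiny sketch of the tr approach without the extra cat; the function name commas_to_lines is made up. Keep in mind that tr replaces the character, so the comma is gone afterwards, which is slightly different from inserting a newline in front of a pattern.

```shell
# Hypothetical helper: turn every comma into a line feed (octal 012).
commas_to_lines() {
    tr ',' '\012'
}

printf '%s\n' 'one,two,three' | commas_to_lines
# prints:
# one
# two
# three
```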
To insert a newline into the output stream on Linux, I used:
sed -i "s/def/abc\\ndef/" file1
Where file1 was:
def
Before the sed in-place replacement, and:
abc
def
After the sed in-place replacement. Please note the use of \\n: inside double quotes the shell reduces \\ to \, so GNU sed receives \n and writes a newline. If the pattern contains a ", escape it as \".
Hmm, just escaped newlines seem to work in more recent versions of sed (I have GNU sed 4.2.1),
dev:~/pg/services/places> echo 'foobar' | sed -r 's/(bar)/\n\1/;'
foo
bar
echo pattern | sed -E -e $'s/^(pattern)/\\\n\\1/'
worked fine on El Capitan with () support
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be written as:
sed -i 's/playstation/PS4\nplaystation/' input.txt
PS4
playstation
Consider using \\n while using it in a string literal.
sed: the stream editor
-i: edits the source file in place
/: the delimiter
I hope the above information works for you 😃.
in sed you can reference groups in your pattern with "\1", "\2", ....
so if the pattern you're looking for is "PATTERN", and you want to insert "BEFORE" in front of it, you can use, sans escaping
sed 's/(PATTERN)/BEFORE\1/g'
i.e.
sed 's/\(PATTERN\)/BEFORE\1/g'
You can also do this with awk, using -v to provide the pattern:
awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
This checks if a line contains the given pattern. If so, it puts a newline in front of each occurrence of it.
See a basic example:
$ cat file
hello
this is some pattern and we are going ahead
bye!
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
hello
this is some
pattern and we are going ahead
bye!
Note that it affects every occurrence of the pattern in a line:
$ cat file
this pattern is some pattern and we are going ahead
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
this
pattern is some
pattern and we are going ahead
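If the pattern is a real regex rather than a fixed word, re-inserting the pattern text would put the regex source into the output. A possible variant, sketched here with a hypothetical function name newline_before, uses & in the gsub replacement, which awk expands to the text that actually matched.

```shell
# Hypothetical helper: $1 is the regex, the text comes from stdin.
# Assumption: the regex contains no backslashes, since awk processes
# escape sequences in values passed with -v.
newline_before() {
    # "\n&" is a newline followed by whatever the regex matched.
    awk -v patt="$1" '{ gsub(patt, "\n&") } 1'
}

printf '%s\n' 'call 555 or 777 now' | newline_before '[0-9]+'
# prints three lines: "call ", "555 or " and "777 now"
```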
sed -e 's/regexp/\0\n/g'
\0 is the whole matched text (GNU sed treats it like &), so your expression is kept as it is and then...
\n is the new line
This doesn't work on some flavors of Unix, but I think it's the solution to your problem.
echo "Hello" | sed -e 's/Hello/\0\ntmow/g'
Hello
tmow
This works on a Mac for me:
sed -i.bak -e 's/regex/xregex/g' input.txt
sed -i.bak -e 's/qregex/\'$'\nregex/g' input.txt
Don't know whether it's a perfect one...
After reading all the answers to this question, it still took me many attempts to get the correct syntax to the following example script:
#!/bin/bash
# script: add_domain
# using fixed values instead of command line parameters $1, $2
# to show typical variable values in this example
ipaddr="127.0.0.1"
domain="example.com"
# no need to escape $ipaddr and $domain values if we use separate quotes.
sudo sed -i '$a \\n'"$ipaddr www.$domain $domain" /etc/hosts
The script appends a newline \n followed by another line of text to the end of a file using a single sed command.
In vi on Red Hat, I was able to insert carriage returns using just the \r character. I believe this internally executes 'ex' instead of 'sed', but it's similar, and vi can be another way to do bulk edits such as code patches. For example, I am surrounding a search term with an if statement that insists on carriage returns after the braces:
:.,$s/\(my_function(.*)\)/if(!skip_option){\r\t\1\r\t}/
Note that I also had it insert some tabs to make things align better.
Just to add to the list of many ways to do this, here is a simple python alternative. You could of course use re.sub() if a regex were needed.
python -c 'print(open("./myfile.txt", "r").read().replace("String to match", "String to match\n"))' > myfile_lines.txt
sed 's/regexp/\'$'\n/g'
works as justified and detailed by mojuba in his answer.
However, this did not work:
sed 's/regexp/\\\n/g'
It added a new line, but at the end of the original line, a \n was added.

Resources