Using the nul character in sed instead of "/" - bash

I want to remove a line in a file containing a path. The path which should be removed is stored in a variable in a bash script.
Somewhere I read that filenames are allowed to contain any characters except "/" and "\0" on *nix systems.
Since I can't use "/" for this purpose (I have paths) I wanted to use the nul character.
What I tried:
#!/bin/bash
var_that_contains_path="/path/to/file.ext"
sed "\\\0$var_that_contains_path"\\0d file.txt > file1.txt #not working
sed "\\0$var_that_contains_path"\0d file.txt > file1.txt #not working
How can I make this work? Thanks in advance!

I think you may be using the wrong tool for the job here. Just use grep:
$ cat file
blah /path/to/file.ext more
some other text
$ var='/path/to/file.ext'
$ grep -vF "$var" file
some other text
As you can see, the line containing the path in the variable is not present in the output.
The -v switch means that grep does an inverse match, so that only lines that don't match the pattern are printed. The -F switch means that grep searches for fixed strings, rather than regular expressions.
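To write the result back to the same file, the grep call can be wrapped in a small helper. This is only a sketch: the function name remove_path_line is made up for illustration, and it assumes mktemp is available.

```shell
# Hypothetical helper: remove_path_line PATH FILE
# Deletes every line of FILE that contains PATH as a literal substring.
remove_path_line() (
    path=$1
    file=$2
    tmp=$(mktemp) || exit 1
    # -F: fixed string, -v: invert the match,
    # -e: protects a pattern that happens to start with "-".
    grep -vF -e "$path" "$file" > "$tmp"
    # grep exits with 1 when it printed nothing; only 2 or more is a real error.
    [ $? -le 1 ] || { rm -f "$tmp"; exit 1; }
    # Copy the content back instead of moving, so FILE keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
)
```

Usage would be remove_path_line "$var" file.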

Since the filename can contain at least a dozen different characters which have special meaning for sed (., ^, [, just to name a few), the right way to do this is to escape them all in the search string:
Escape a string for a sed replace pattern
So for the search pattern (in this case: the path), you need the following expression:
the_path=$(sed -e 's/[]\/$*.^|[]/\\&/g' <<< "$the_path")
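Putting the escaping and the deletion together could look roughly like the sketch below. The function name delete_path_with_sed is invented here, the character class is my own variant of the one above (it uses | as the s delimiter so that the backslash can be listed explicitly), and it assumes the path contains no newline.

```shell
# Hypothetical helper: delete_path_with_sed PATH FILE
# Prints FILE without the lines that contain PATH.
delete_path_with_sed() (
    # Put a backslash in front of every character that is special in a
    # basic regular expression, and in front of "/" because "/" is the
    # delimiter of the address used below.
    escaped=$(printf '%s\n' "$1" | sed 's|[][\\/.*^$]|\\&|g')
    sed "/$escaped/d" "$2"
)
```

For example, delete_path_with_sed "$var_that_contains_path" file.txt > file1.txt.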


Text processing in bash - extracting information between multiple HTML tags and outputting it into CSV format [duplicate]

I can't figure how to tell sed dot match new line:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is an option for this.
You can use h/H(hold), g/G(get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
Maybe you should try vim
:%s/one\_.*two/one/g
If you use GNU sed, you can match any character, including line break chars, with a plain dot. The documentation says:
.
Matches any character, including newline.
All you need to use is a -z option:
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
See the online sed demo.
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the input record separator to the NUL character, so a text file that contains no NUL bytes is read as a whole and not line by line (-0777 is the dedicated slurp mode); -i enables in-place file modification; the s flag makes . match any char including line break chars; and .*? matches any 0 or more chars as few as possible because *? is non-greedy. The -CSD -Mutf8 part makes sure your input is decoded and the output is re-encoded correctly.
You can use python this way:
$ echo -e "one\ntwo\nthree" | python3 -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print(s, end="")'
one
three
$
This reads all of Python's standard input (sys.stdin.read()), then replaces the match of "one.*two" with "one" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (end="" prevents print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including the \n) using a dot . but usually it has already stripped the \n off, as part of the cycle, so it is no longer present in the pattern space to be matched.
Only certain commands (N,H and G) preserve newlines in the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
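A quick way to watch N and G at work is shown below. It is only a sketch and the two function names are invented for the illustration.

```shell
# N pulls the next line into the pattern space, separated by "\n",
# so the newline is now visible to the s command and can be replaced.
join_pairs() {
    sed 'N;s/\n/,/'
}

# G appends "\n" plus the (empty) hold space, i.e. a blank line after each line.
double_space() {
    sed G
}
```

Feeding the four lines 1, 2, 3, 4 to join_pairs gives the two lines 1,2 and 3,4.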
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates in a line-by-line manner, so it is not possible to tell it directly to make the dot match a newline. However, there are some tricks that can achieve this. You can use a loop structure (kind of) to put all the text in the pattern space, and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newline and all characters. See below example:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#

How to grep information?

What I have:
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
I'd like to filter this text and finally get the following list as .txt file:
user653434
user9659333
It's important to get the names without "#" sign.
Thx for help ;)
Using grep -P (requires GNU grep):
$ grep -oP '(?<=#)\w+' File
user653434
user9659333
-o tells grep to print only the match.
-P tells grep to use Perl-style regular expressions.
(?<=#) tells grep that # must precede the match but the # is not included in the match.
\w+ matches one or more word characters. This is what grep will print.
To change the file in place with grep:
grep -oP '(?<=#)\w+' File >tmp && mv tmp File
Using sed
$ sed -En 's/^#([[:alnum:]]+).*/\1/p' File
user653434
user9659333
And, to change the file in place:
sed -En -i.bak 's/^#([[:alnum:]]+).*/\1/p' File
-E tells sed to use the extended form of regular expressions. This reduces the need to use escapes.
-n tells sed not to print anything unless we explicitly ask it to.
-i.bak tells sed to change the file in place while leaving a backup file with the extension .bak.
The leading s in s/^#([[:alnum:]]+).*/\1/p tells sed that we are using a substitute command. The command has the typical form s/old/new/ where old is a regular expression and sed replaces old with new. The trailing p is an option to the substitute command: the p tells sed to print the resulting line.
In our case, the old part is ^#([[:alnum:]]+).*. Starting from the beginning of the line, ^, this matches # followed by one or more alphanumeric characters, ([[:alnum:]]+), followed by anything at all, .*. Because the alphanumeric characters are placed in parens, this is saved as a group, denoted \1.
The new part of the substitute command is just \1, the alphanumeric characters from above which comprise the user name.
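For systems where sed has no -E option, the same substitution can probably be written with basic regular expressions only. A sketch, with a made-up function name:

```shell
# Hypothetical helper: extract_users FILE
# Prints the alphanumeric name that follows a leading "#" on each matching line.
extract_users() {
    # \( \) is the group and \{1,\} means "one or more" in basic regex syntax.
    sed -n 's/^#\([[:alnum:]]\{1,\}\).*/\1/p' "$1"
}
```

The list asked for in the question would then be produced with extract_users File > users.txt (users.txt is just an example name).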
With GNU grep:
grep -Po '^#\K[^ ]*' file
Output:
user653434
user9659333
See: The Stack Overflow Regular Expressions FAQ

Bash sed replace with exact match of a text in a file

I have a file pattern.txt which is composed of one very long line of complicated code (~8200 chars).
This code can be found in multiple files inside multiple directories.
I can easily identify a list of these files using
grep -rli 'uniquepartofthecode' *
My concern is how do I replace it with the exact text from within the file ?
I tried to do:
var=$(cat pattern.txt)
sed -i "s/$var//g" targetfile.txt
but I got the following error :
sed: -e expression #1, char 96: unknown option to `s'
sed is interpreting my $var content as a regular expression, I would like it to just match the exact text.
The pattern.txt content could be more or less any combination of characters, so I'm afraid I cannot escape every character efficiently.
Is there a solution using sed ? Or should I use another tool for that ?
EDIT:
I tried using this solution to make a proper regex pattern from my text file.
Is it possible to escape regex metacharacters reliably with sed
the overall process is:
var=$(cat pattern.txt)
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$var")
sed -n "s/$searchEscaped/foo/p" <<<"$var" # if ok, echoes 'foo'
This last command displays "foo". $searchEscaped seems to be properly escaped.
Though, this is not returning anything (it should display foo + the rest of the file without the matched part):
sed -n "s/$searchEscaped/foo/p" targetfile.txt
I think that the best solution is to not use regular expressions at all and resort to string replacement.
One way to do this is using perl:
$ echo "$string_to_replace"
some other stuff abc$^%!# some more
$ echo "$search"
abc$^%!#
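$ perl -spe '$n = 0;    # start searching at the beginning of every line
$len = length $search;
while (($pos = index($_, $search, $n)) > -1) {
substr($_, $pos, $len) = "replacement";
$n = $pos + length("replacement");    # continue after the inserted text
}' <<<"$string_to_replace" -- -search="$search"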
some other stuff replacement some more
The -p switch tells perl to loop through each line of the variable $string_to_replace (which could easily be replaced by a file). -s allows options to be passed to the script - in this case, I've passed a shell variable containing the search string.
For each line of the file, the while loop runs through all of the matches of the search string. substr is used on the left hand of the assignment to replace a substring of $_, which refers to the current line being processed.

Grep (fgrep) bash exact match end of line

I have the below example file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/apersand $ file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/file[with square brackets]
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/~$tempfile
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/single quote's
I want to grep the last part of the file (the file name) but I'm after an exact match for the last part of the line (the file name)
grep FileThree$ files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
gives back an exact match and doesn't find "FileThreeDays", which is what I'm after. But because some of the file names contain square brackets, I have to use grep -F or fgrep. However, using fgrep like the above doesn't work; it returns nothing.
How can I exact match the last part of the line using fgrep whilst still honoring the special characters above ~ / $ / ' / [ ] etc...or any other method using maybe awk...
Further....
Using fgrep without the $ returns both of these files. I only want an exact match (as with the $ in the grep example above), but fgrep with $ doesn't return anything.
grep -F FileThree files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
I can't tell all the details from your question, but it sounds like you can use grep and just escape the special characters: grep 'File\[Three\]Days$'
If you want to use fgrep, though, you can use some tr tricks to help you. If all you want is the filename (without the directory name), you can do something like
cat files.md5 | tr '/' '\n' | fgrep FileThreeDays
That tr command replaces slashes with newlines, so it will put each filename on its own line. That means that fgrep will only find the filename when it searches for FileThreeDays.
If you want the full filename with directory, it's a little trickier, but a similar approach will work. Assuming that there's always a double space between the SHA and the filename, and that there aren't any filenames with double spaces or tab characters in them, you can try something like this:
sed 's/  /\t/' files.md5 | tr '\t' '\n' | fgrep FileThreeDays
That sed command converts the double spaces to tabs. The tr command turns those tabs into newlines (the same trick as above).
I would use awk:
awk '{$1="";print}' file
$1="" cuts the first column to an empty string, and print prints the modified line - which only contains the filename now.
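However, this leaves a blank space at the start of each line. If you care about it and want to remove it, strip the leading blanks with sub() (setting the output field separator to an empty string would also delete the spaces inside the file names):
awk '{$1="";sub(/^ +/,"");print}' file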
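If the goal is an exact, regex-free match on the file name at the end of the line, a plain string comparison in awk might do. This is a sketch under the assumption that every path ends in /NAME; the function name match_file_name and the NAME environment variable are made up.

```shell
# Hypothetical helper: match_file_name NAME FILE
# Prints the lines of FILE that end in "/NAME", comparing literal strings only.
match_file_name() (
    NAME="/$1" awk '
        # ENVIRON hands the name over without any escape processing.
        BEGIN { s = ENVIRON["NAME"]; n = length(s) }
        # Keep the line when its last n characters are exactly s.
        length($0) >= n && substr($0, length($0) - n + 1) == s
    ' "$2"
)
```

match_file_name FileThree files.md5 would print the FileThree line but not the FileThreeDays one, and names such as file[with square brackets] need no escaping.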

How to insert a newline in front of a pattern?

How to insert a newline before a pattern within a line?
For example, this will insert a newline behind the regex pattern.
sed 's/regex/&\n/g'
How can I do the same but in front of the pattern?
Given this sample input file, the pattern to match on is the phone number.
some text (012)345-6789
Should become
some text
(012)345-6789
This works in bash and zsh, tested on Linux and OS X:
sed 's/regexp/\'$'\n/g'
In general, for $ followed by a string literal in single quotes bash performs C-style backslash substitution, e.g. $'\t' is translated to a literal tab. Plus, sed wants your newline literal to be escaped with a backslash, hence the \ before $. And finally, the dollar sign itself shouldn't be quoted so that it's interpreted by the shell, therefore we close the quote before the $ and then open it again.
Edit: As suggested in the comments by @mklement0, this works as well:
sed $'s/regexp/\\\n/g'
What happens here is: the entire sed command is now a C-style string, which means the backslash that sed requires to be placed before the new line literal should now be escaped with another backslash. Though more readable, in this case you won't be able to do shell string substitutions (without making it ugly again.)
Some of the other answers didn't work for my version of sed.
Switching the position of & and \n did work.
sed 's/regexp/\n&/g'
Edit: This doesn't seem to work on OS X, unless you install gnu-sed.
In sed, you can't add newlines in the output stream easily. You need to use a continuation line, which is awkward, but it works:
$ sed 's/regexp/\
&/'
Example:
$ echo foo | sed 's/.*/\
&/'
foo
See here for details. If you want something slightly less awkward you could try using perl -pe with match groups instead of sed:
$ echo foo | perl -pe 's/(.*)/\n$1/'
foo
$1 refers to the first matched group in the regular expression, where groups are in parentheses.
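Applied to the phone-number sample from the question, the continuation-line form could look like the sketch below. The function name is invented, and the pattern assumes the number always has the shape (ddd)ddd-dddd.

```shell
# Reads stdin and starts a new line in front of every phone number.
newline_before_phone() {
    # In a basic regex the parentheses are literal and \{3\} is a repeat count.
    # The backslash followed by a real line break is the newline in the
    # replacement, and & puts the matched number back.
    sed 's/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/\
&/'
}
```

Piping the sample line through newline_before_phone leaves "some text " (with its trailing space) on the first line and the number on the second.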
On my mac, the following inserts a single 'n' instead of newline:
sed 's/regexp/\n&/g'
This replaces with newline:
sed "s/regexp/\\`echo -e '\n\r'`/g"
echo one,two,three | sed 's/,/\
/g'
You can use perl one-liners much like you do with sed, with the advantage of full perl regular expression support (which is much more powerful than what you get with sed). There is also very little variation across *nix platforms - perl is generally perl. So you can stop worrying about how to make your particular system's version of sed do what you want.
In this case, you can do
perl -pe 's/(regex)/\n$1/'
-pe puts perl into a "execute and print" loop, much like sed's normal mode of operation.
' quotes everything else so the shell won't interfere
() surrounding the regex is a grouping operator. $1 on the right side of the substitution prints out whatever was matched inside these parens.
Finally, \n is a newline.
Regardless of whether you are using parentheses as a grouping operator, you have to escape any parentheses you are trying to match. So a regex to match the pattern you list above would be something like
\(\d\d\d\)\d\d\d-\d\d\d\d
\( or \) matches a literal paren, and \d matches a digit.
Better:
\(\d{3}\)\d{3}-\d{4}
I imagine you can figure out what the numbers in braces are doing.
Additionally, you can use delimiters other than / for your regex. So if you need to match / you won't need to escape it. Either of the below is equivalent to the regex at the beginning of my answer. In theory you can substitute any character for the standard /'s.
perl -pe 's#(regex)#\n$1#'
perl -pe 's{(regex)}{\n$1}'
A couple final thoughts.
using -ne instead of -pe acts similarly, but doesn't automatically print at the end. It can be handy if you want to print on your own. E.g., here's a grep-alike (m/foobar/ is a regex match):
perl -ne 'if (m/foobar/) {print}'
If you are finding dealing with newlines troublesome, and you want it to be magically handled for you, add -l. Not useful for the OP, who was working with newlines, though.
Bonus tip - if you have the pcre package installed, it comes with pcregrep, which uses full perl-compatible regexes.
In this case, I do not use sed. I use tr.
cat Somefile |tr ',' '\012'
This takes the comma and replaces it with a newline (octal 012 is the line feed character, not the carriage return).
To insert a newline to output stream on Linux, I used:
sed -i "s/def/abc\\\ndef/" file1
Where file1 was:
def
Before the sed in-place replacement, and:
abc
def
After the sed in-place replacement. Please note the use of \\\n. If the patterns have a " inside it, escape using \".
Hmm, just escaped newlines seem to work in more recent versions of sed (I have GNU sed 4.2.1),
dev:~/pg/services/places> echo 'foobar' | sed -r 's/(bar)/\n\1/;'
foo
bar
echo pattern | sed -E -e $'s/^(pattern)/\\\n\\1/'
worked fine on El Capitan with () support
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be written as:
sed -i 's/playstation/PS4\nplaystation/' input.txt
PS4
playstation
Consider using \\n while using it in a string literal.
sed : is the stream editor
-i : allows editing the source file in place
/ : is the delimiter
I hope the above information works for you 😃.
in sed you can reference groups in your pattern with "\1", "\2", ....
so if the pattern you're looking for is "PATTERN", and you want to insert "BEFORE" in front of it, you can use, sans escaping
sed 's/(PATTERN)/BEFORE\1/g'
i.e.
sed 's/\(PATTERN\)/BEFORE\1/g'
You can also do this with awk, using -v to provide the pattern:
awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
This checks whether a line contains the given pattern. If so, it inserts a newline before every occurrence of the pattern in that line.
See a basic example:
$ cat file
hello
this is some pattern and we are going ahead
bye!
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
hello
this is some
pattern and we are going ahead
bye!
Note that it affects every occurrence of the pattern in a line:
$ cat file
this pattern is some pattern and we are going ahead
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
this
pattern is some
pattern and we are going ahead
sed -e 's/regexp/\0\n/g'
\0 refers to the whole matched text (GNU sed treats it like &), so the match itself is kept, and then...
\n is the newline.
This is a GNU extension, so it doesn't work on some flavors of Unix, but I think it's the solution to your problem.
echo "Hello" | sed -e 's/Hello/\0\ntmow/g'
Hello
tmow
This works on a Mac for me:
sed -i.bak -e 's/regex/xregex/g' input.txt
sed -i.bak -e 's/qregex/\'$'\nregex/g' input.txt
I don't know whether it's a perfect one...
After reading all the answers to this question, it still took me many attempts to get the correct syntax to the following example script:
#!/bin/bash
# script: add_domain
# using fixed values instead of command line parameters $1, $2
# to show typical variable values in this example
ipaddr="127.0.0.1"
domain="example.com"
# no need to escape $ipaddr and $domain values if we use separate quotes.
sudo sed -i '$a \\n'"$ipaddr www.$domain $domain" /etc/hosts
The script appends a newline \n followed by another line of text to the end of a file using a single sed command.
In vi on Red Hat, I was able to insert carriage returns using just the \r character. I believe this internally executes 'ex' instead of 'sed', but it's similar, and vi can be another way to do bulk edits such as code patches. For example. I am surrounding a search term with an if statement that insists on carriage returns after the braces:
:.,$s/\(my_function(.*)\)/if(!skip_option){\r\t\1\r\t}/
Note that I also had it insert some tabs to make things align better.
Just to add to the list of many ways to do this, here is a simple python alternative. You could of course use re.sub() if a regex were needed.
python -c 'print(open("./myfile.txt", "r").read().replace("String to match", "String to match\n"))' > myfile_lines.txt
sed 's/regexp/\'$'\n/g'
works as justified and detailed by mojuba in his answer.
However, this did not work:
sed 's/regexp/\\\n/g'
It added a new line, but at the end of the original line, a \n was added.
