Bash sed replace with exact match of a text in a file - bash

I have a file pattern.txt which is composed of one very long line of complicated code (~8200 chars).
This code can be found in multiple files inside multiple directories.
I can easily identify a list of these files using
grep -rli 'uniquepartofthecode' *
My concern is how do I replace it with the exact text from within the file ?
I tried to do:
var=$(cat pattern.txt)
sed -i "s/$var//g" targetfile.txt
but I got the following error :
sed: -e expression #1, char 96: unknown option to `s'
sed is interpreting my $var content as a regular expression, I would like it to just match the exact text.
The content of pattern.txt could be more or less any combination of characters, so I'm afraid I cannot reliably escape every character by hand.
Is there a solution using sed ? Or should I use another tool for that ?
EDIT:
I tried using the solution from "Is it possible to escape regex metacharacters reliably with sed" to make a proper regex pattern from my text file.
The overall process is:
var=$(cat pattern.txt)
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$var")
sed -n "s/$searchEscaped/foo/p" <<<"$var" # if ok, echoes 'foo'
This last command displays "foo". $searchEscaped seems to be properly escaped.
Though, this is not returning anything (it should display foo + the rest of the file without the matched part):
sed -n "s/$searchEscaped/foo/p" targetfile.txt

I think that the best solution is to not use regular expressions at all and resort to string replacement.
One way to do this is using perl:
$ echo "$string_to_replace"
some other stuff abc$^%!# some more
$ echo "$search"
abc$^%!#
$ perl -spe '$len = length $search;
while (($pos = index($_, $search, $n)) > -1) {
substr($_, $pos, $len) = "replacement";
$n = $pos + $len;
}' <<<"$string_to_replace" -- -search="$search"
some other stuff replacement some more
The -p switch tells perl to loop through each line of the variable $string_to_replace (which could easily be replaced by a file). -s allows options to be passed to the script - in this case, I've passed a shell variable containing the search string.
For each line of the file, the while loop runs through all of the matches of the search string. substr is used on the left hand of the assignment to replace a substring of $_, which refers to the current line being processed.

Related

How to convert separators using regex in bash

How do I modify my bash file to achieve the expected result shown below ?
#!/bin/bash
filename=$1
var="$(<$filename)" | tr -d '\n'
sed -i 's/;/,/g' $var
Convert this input file
a,b;c^d"e}
f;g,h!;i8j-
To this output file
a,b,c,d,e,f,g,h,i,j
How to convert separators using regex in bash
You would, well, literally, do exactly that - convert any of the separators using regex. This consists of the following steps:
most importantly, figure out the exact definition of what constitutes a "separator"
writing a regex for it
writing an algorithm for it
running and testing the code
For example, assuming a separator is a sequence of any of the \n,;^"}!8- characters, you could do (with GNU sed):
sed -zi 's/[\n,;^"}!8-]\+/,/g; s/,$/\n/' input_file
Alternatively, when -z is not available in your sed, first convert the newlines with tr '\n' ',' and then pass the result of tr to sed. The second regex replaces the trailing , with a trailing newline on the output.
Additionally, in your code:
var is unset on the sed line, because the parts of a | pipeline run in subshells.
var=$(<$filename) contains the contents of the file, whereas sed wants a filename as argument, not file contents.
var=.... | ... is piping the result of the assignment to tr. The output of an assignment is empty, so tr receives nothing, and its output is unused.
Remember to check bash scripts with shellcheck.
For a somewhat portable solution, maybe try
tr -cs A-Za-z , <input_file | sed '$s/,$/\n/' >output_file
The use of \n to force a final newline is still not entirely reliable; there are some sed versions which interpret the sequence as a literal n.
You'd move output_file back on top of input_file after this command if you want to replace the original.
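Putting these pieces together, the original script could be sketched roughly as below. This assumes, as in the tr command above, that every run of non-letter characters counts as one separator; the function name convert_separators is hypothetical. The final newline is written with printf instead of a \n in the sed replacement, which sidesteps the portability concern just mentioned.

```shell
#!/bin/sh
# convert_separators FILE: rewrite FILE as one comma-separated line.
convert_separators() {
    file=$1
    tmp=$(mktemp) || return 1
    # tr: squeeze every run of non-letters into a single comma.
    # sed: drop the trailing comma produced by the final newline.
    joined=$(tr -cs 'A-Za-z' ',' < "$file" | sed 's/,$//')
    # Write exactly one trailing newline, then replace the original file.
    printf '%s\n' "$joined" > "$tmp" && mv "$tmp" "$file"
}

demo=$(mktemp)
printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' > "$demo"
convert_separators "$demo"
converted=$(cat "$demo")
printf '%s\n' "$converted"
```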

Text processing in bash - extracting information between multiple HTML tags and outputting it into CSV format [duplicate]

I can't figure how to tell sed dot match new line:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is an option for this.
You can use h/H(hold), g/G(get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
Maybe you should try vim
:%s/one\_.*two/one/g
If you use GNU sed, you can match any character, including line break chars, with a mere dot. The documentation describes it as follows:
.    Matches any character, including newline.
All you need to use is a -z option:
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
See the online sed demo.
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the input record separator to the NUL character, so a text file without NUL bytes is read as a whole and not line-by-line (-0777 is the explicit slurp mode), -i enables in-place file modification, s makes . match any char including line break chars, and .*? matches any 0 or more chars as few as possible due to the non-greedy *?. The -CSD -Mutf8 part makes sure your input is decoded and the output is re-encoded correctly.
You can use python this way:
$ echo -e "one\ntwo\nthree" | python3 -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print(s, end="")'
one
three
$
This reads the whole of Python's standard input (sys.stdin.read()), then substitutes "one" for "one.*two" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (end="" prevents print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including the \n) using a dot . but usually it has already stripped the \n off, as part of the cycle, so it is no longer present in the pattern space to be matched.
Only certain commands (N,H and G) preserve newlines in the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
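A few small experiments make the behaviour of these commands visible. This is only a sketch with made-up input; the commands are written with explicit semicolons before the closing brace so that they should also work outside GNU sed.

```shell
#!/bin/sh
# N: append the next input line to the pattern space (joined by a newline),
# so the substitution can see and replace that newline.
pairs=$(printf '%s\n' a b c d | sed 'N;s/\n/ /')
printf '%s\n' "$pairs"

# h/H/g: collect every line in the hold space, then on the last line copy
# the hold space back and join the lines with commas.
joined=$(printf '%s\n' a b c d | sed -n '1h;1!H;${g;s/\n/,/g;p;}')
printf '%s\n' "$joined"

# G: append a newline plus the (empty) hold space, i.e. double-space the input.
spaced=$(printf '%s\n' a b | sed G)
printf '%s\n' "$spaced"
```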
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates line by line, so it is not possible to simply tell it to let the dot match a newline. However, there are some tricks that achieve the same effect. You can use a loop structure (kind of) to put all the text in the pattern space, and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newline and all characters. See the example below:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#

Using the nul character in sed instead of "/"

I want to remove a line in a file containing a path. The path which should be removed is stored in a variable in a bash script.
Somewhere I read that filenames are allowed to contain any characters except "/" and "\0" on *nix systems.
Since I can't use "/" for this purpose (I have paths) I wanted to use the nul character.
What I tried:
#!/bin/bash
var_that_contains_path="/path/to/file.ext"
sed "\\\0$var_that_contains_path"\\0d file.txt > file1.txt #not working
sed "\\0$var_that_contains_path"\0d file.txt > file1.txt #not working
How can I make this work? Thanks in advance!
I think you may be using the wrong tool for the job here. Just use grep:
$ cat file
blah /path/to/file.ext more
some other text
$ var='/path/to/file.ext'
$ grep -vF "$var" file
some other text
As you can see, the line containing the path in the variable is not present in the output.
The -v switch means that grep does an inverse match, so that only lines that don't match the pattern are printed. The -F switch means that grep searches for fixed strings, rather than regular expressions.
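To see why -F matters, here is a minimal sketch with made-up file contents: one line differs from the path only in the character where the path has a dot. With a fixed-string match that line is kept, whereas a regular expression would treat the dot as "any character".

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' 'keep /path/to/fileXext' \
              'drop /path/to/file.ext here' \
              'some other text' > "$workdir/file.txt"

var='/path/to/file.ext'

# -v: print non-matching lines; -F: treat the pattern as a literal string.
# "--" guards against a pattern that happens to start with a dash.
grep -vF -- "$var" "$workdir/file.txt" > "$workdir/file1.txt"

remaining=$(cat "$workdir/file1.txt")
printf '%s\n' "$remaining"
```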
Since the filename can contain at least a dozen different characters which have special meaning for sed (., ^, [, just to name a few), the right way to do this is to escape them all in the search string:
Escape a string for a sed replace pattern
So for the search pattern (in this case: the path), you need the following expression:
the_path=$(sed -e 's/[]\/$*.^|[]/\\&/g' <<< "$the_path")
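A sketch of how the escaped value might then be used to delete the line. Two details here are deliberate deviations from the expression above and should be read as assumptions: | is used as the delimiter of the escaping command so that both \ and / inside the bracket expression are taken literally, and | itself is left unescaped, because in GNU sed's basic regular expressions \| means alternation. The file contents are made up.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' 'blah /path/to/file.ext more' \
              'keep /path/to/fileXext' \
              'some other text' > "$workdir/file.txt"

the_path='/path/to/file.ext'

# Put a backslash in front of ] \ / $ * . ^ [ so sed reads them literally.
escaped=$(printf '%s\n' "$the_path" | sed -e 's|[]\/$*.^[]|\\&|g')
printf 'escaped pattern: %s\n' "$escaped"

# Delete every line that contains the (now literal) path.
kept=$(sed "/$escaped/d" "$workdir/file.txt")
printf '%s\n' "$kept"
```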

sed not writing to file

I am having trouble using sed to substitute values and write to a new file. It writes to a new file, but fails to change any values. Here is my code:
cd/mydirectory
echo "Enter file name:"
read file_input
file1= "$file_input"
file1= "$file1.b"
file2= "$file_input"
file2= "${file2}Ins.b"
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
I simply want to substitute whatever text was after cats with the value 300. Whenever I run this script it doesn't overwrite the previous value with 300. Any suggestions?
Try changing
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
to
sed "s/cats.*/cats300/g" $file1 > $file2
To replace text, you typically use sed like sed "s/foo/bar/g" file_in > file_out, which replaces all occurrences of foo with bar in file_in and redirects the output to file_out.
Edit
I noticed that you are redirecting the output to the same file - you can't do that. You have 2 options:
Redirect the results to another file, with a different filename. e.g.:
sed "s/cats.*/cats300/g" $file1 > $file2.tmp
Note the .tmp after $file2
Use the -i flag (if using GNU sed):
sed -i "s/cats.*/cats300/g" $file1
The -i flag edits the file in place.
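If -i is not available, the first option can be completed with a mv, so that the original file ends up modified anyway. A minimal sketch, with a made-up file name and contents:

```shell
#!/bin/sh
workdir=$(mktemp -d)
file1="$workdir/animals.b"
printf '%s\n' 'dogs 12' 'cats 7' > "$file1"

# Write to a temporary file first; only replace the original if sed succeeded.
sed 's/cats.*/cats300/' "$file1" > "$file1.tmp" && mv "$file1.tmp" "$file1"

updated=$(cat "$file1")
printf '%s\n' "$updated"
```

Redirecting straight back into "$file1" would truncate the file before sed gets a chance to read it, which is why the temporary file is needed.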
I think this modified version of your script should work:
echo "Enter file name:"
read file_input
file1="$file_input" # No space after '='
file1="$file1.b" # No space after '='
file2="$file_input" # No space after '='
file2="${file2}Ins.b" # No space after '='
sed 's/!cats!.*/!cats!300!/g' "$file1" > "$file2"
Note the single quotes around the sed expression: with them, there's no need to escape the !s in your expression. Note also the double quotes around "$file1" and "$file2": if one of those variables contains spaces, this will prevent your command from breaking.
Some further remarks:
As pointed out by jim, you may want to use the GNU sed -i option.
Your regex will currently replace everything after !cats! in matching lines. If there are several occurrences of !cats! on a line, only one will remain. If instead you just want to replace the value between two ! delimiters, consider using the following sed command instead:
sed 's/!cats![^!]*/!cats!300/g'
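The difference between the two expressions shows up on a line that has more fields after the cats value. A small sketch with a made-up input line:

```shell
#!/bin/sh
line='a!cats!12!dogs!5!'

# .* swallows everything up to the end of the line.
greedy=$(printf '%s\n' "$line" | sed 's/!cats!.*/!cats!300!/')

# [^!]* stops at the next ! delimiter, so the later fields survive.
bounded=$(printf '%s\n' "$line" | sed 's/!cats![^!]*/!cats!300/')

printf '%s\n%s\n' "$greedy" "$bounded"
```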

How to delete the string which is present in parameter from file in unix

I have redirected some string into one parameter for ex: ab=jyoti,priya, pranit
I have one file : Name.txt which contains -
jyoti
prathmesh
John
Kelvin
pranit
I want to delete the records from the Name.txt file which are contained in the ab parameter.
Please suggest if this can be done ?
If ab is a shell variable, you can easily turn it into an extended regular expression, and use it with grep -E:
grep -E -x -v "${ab//,/|}" Name.txt
The string substitution ${ab//,/|} returns the value of $ab with every , substituted with a | which turns it into an extended regular expression, suitable for passing as an argument to grep -E.
The -v option says to remove matching lines.
The -x option specifies that the match needs to cover the whole input line, so that a short substring will not cause an entire longer line to be removed. Without it, ab=prat would cause prathmesh to be removed.
If you really require a sed solution, the transformation should be fairly trivial. grep -E -v -x 'aaa|bbb|ccc' is equivalent to sed '/^\(aaa\|bbb\|ccc\)$/d' (with some dialects disliking the backslashes, and others requiring them).
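A runnable sketch of this approach on the sample data from the question. Note two assumptions: the value of ab contains a space after one comma, so the spaces are stripped first, and the alternation is built with tr here (equivalent to the ${ab//,/|} expansion for this input) so that the snippet does not depend on bash.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' jyoti prathmesh John Kelvin pranit > "$workdir/Name.txt"

ab='jyoti,priya, pranit'

# Remove stray spaces, then turn every comma into the ERE alternation bar.
pattern=$(printf '%s' "$ab" | tr -d ' ' | tr ',' '|')

# -x: whole-line matches only; -v: print the lines that do NOT match.
kept=$(grep -E -x -v "$pattern" "$workdir/Name.txt")
printf '%s\n' "$kept"

# With -x, a mere substring such as "prat" removes nothing.
partial=$(grep -E -x -v 'prat' "$workdir/Name.txt" | wc -l | tr -d ' ')
```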
To do an in-place edit (modify Name.txt without a temporary file), try this:
sed -i "/^\(${ab//,/\|}\)\$/d" Name.txt
This is not entirely robust against strings containing whitespace or other shell metacharacters, but if you just need
Try with
sed -e 's/\bjyoti\b//g;s/\bpriya\b//g' < Name.txt
(using \b assuming you need word boundaries)
this will do it:
for param in $(echo "$ab" | sed -e 's/ //g' -e 's/,/ /g'); do res=$(sed -e "s/$param//g" < Name.txt); echo "$res" > Name.txt; done
echo "$res"
