Using sed with a regex to replace strings - bash

I want to replace strings that contain specific words with another word.
Here is my code:
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[@]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz

There are several things you can improve. Using the alternate delimiter '|' for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/) complicates the use of '|' as the OR (alternation) regex component. Choose an alternative delimiter that does not also serve as part of the regular expression; '#' works fine.
Next, there is no reason to loop. Simply use a here string to redirect the contents of arr to sed, and place it all in a command substitution passed to printf with the "%s\n" format specifier to provide the newline-separated output. (That's a mouthful, but it is actually nothing more than:)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.
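For reference, the original command most likely failed for two reasons: without -E, sed treats ( and ) as literal characters, and because | was the delimiter, \| was read as a literal | rather than as alternation. A minimal sketch of the same fix using extended regular expressions, assuming a sed that supports -E (GNU and BSD sed both do), might look like this:

```shell
arr='foo/foo/baz foo/bar/baz foo/baz/baz'

# $arr is left unquoted on purpose so the shell splits it into words;
# printf then emits one word per line for sed to process.
result=$(printf '%s\n' $arr | sed -E 's#/(bar|baz)/#/test/#g')

printf '%s\n' "$result"
```

With -E the parentheses and | need no backslashes, and # as the delimiter keeps both / and | free for use inside the pattern.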

How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'

Related

Insert the contents of the variable in SED command [duplicate]

If I run these commands from a script:
#my.sh
PWD=bla
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
xxx
bla
it is fine.
But, if I run:
#my.sh
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
$ sed: -e expression #1, char 8: Unknown option to `s'
I read in tutorials that to substitute environment variables from shell you need to stop, and 'out quote' the $varname part so that it is not substituted directly, which is what I did, and which works only if the variable is defined immediately before.
How can I get sed to recognize a $var as an environment variable as it is defined in the shell?
Your two examples look identical, which makes problems hard to diagnose. Potential problems:
You may need double quotes, as in sed 's/xxx/'"$PWD"'/'
$PWD may contain a slash, in which case you need to find a character not contained in $PWD to use as a delimiter.
To nail both issues at once, perhaps
sed 's#xxx#'"$PWD"'#'
In addition to Norman Ramsey's answer, I'd like to add that you can double-quote the entire string (which may make the statement more readable and less error prone).
So if you want to search for 'foo' and replace it with the content of $BAR, you can enclose the sed command in double-quotes.
sed 's/foo/$BAR/g'
sed "s/foo/$BAR/g"
In the first, $BAR will not expand correctly while in the second $BAR will expand correctly.
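The difference can be checked with a short sketch (the value assigned to BAR is only an illustration):

```shell
BAR='replacement'

# Single quotes: the shell passes $BAR to sed literally.
single=$(printf '%s\n' 'foo' | sed 's/foo/$BAR/g')

# Double quotes: the shell expands $BAR before sed sees the script.
double=$(printf '%s\n' 'foo' | sed "s/foo/$BAR/g")

printf 'single: %s\ndouble: %s\n' "$single" "$double"
```

The first command prints the literal text $BAR, the second prints the value of the variable.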
Another easy alternative:
Since $PWD will usually contain a slash /, use | instead of / for the sed statement:
sed -e "s|xxx|$PWD|"
You can use other characters besides "/" in substitution:
sed "s#$1#$2#g" -i FILE
1. Bad way: change the delimiter
sed 's/xxx/'"$PWD"'/'
sed 's:xxx:'"$PWD"':'
sed 's#xxx#'"$PWD"'#'
These may not be the final answer:
you cannot know which characters will occur in $PWD; it could contain /, : or #.
If the delimiter character appears in $PWD, it will break the expression.
The good way is to escape the special character in $PWD.
2. Good way: escape the delimiter
For example:
try to replace URL with $url (which contains : and /)
x.com:80/aa/bb/aa.js
in the string $tmp
URL
A. Use / as the delimiter
Escape / as \/ in the variable (before using it in the sed expression)
## step 1: try escape
echo ${url//\//\\/}
x.com:80\/aa\/bb\/aa.js #escape fine
echo ${url//\//\/}
x.com:80/aa/bb/aa.js #escape failed
echo "${url//\//\/}"
x.com:80\/aa\/bb\/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s/URL/${url//\//\\/}/"
x.com:80/aa/bb/aa.js
echo $tmp | sed "s/URL/${url//\//\/}/"
x.com:80/aa/bb/aa.js
OR
B. Use : as the delimiter (more readable than /)
Escape : as \: in the variable (before using it in the sed expression)
## step 1: try escape
echo ${url//:/\:}
x.com:80/aa/bb/aa.js #escape failed
echo "${url//:/\:}"
x.com\:80/aa/bb/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s:URL:${url//:/\:}:g"
x.com:80/aa/bb/aa.js
With your question edit, I see your problem. Let's say the current directory is /home/yourname ... in this case, your command below:
sed 's/xxx/'$PWD'/'
will be expanded to
sed 's/xxx//home/yourname/'
which is not valid. You need to put a \ character in front of each / in your $PWD if you want to do this.
Actually, the simplest thing (in GNU sed, at least) is to use a different separator for the sed substitution (s) command. So, instead of s/pattern/'$mypath'/ being expanded to s/pattern//my/path/, which will of course confuse the s command, use s!pattern!'$mypath'!, which will be expanded to s!pattern!/my/path!. I’ve used the bang (!) character (or use anything you like) which avoids the usual, but-by-no-means-your-only-choice forward slash as the separator.
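A small sketch of the alternate-separator approach, with /my/path standing in for a real directory:

```shell
mypath='/my/path'

# '!' is the separator, so the slashes in $mypath are ordinary characters.
# The variable sits in double quotes between the two single-quoted parts,
# which protects it from word splitting.
result=$(printf '%s\n' 'pattern' | sed 's!pattern!'"$mypath"'!')

printf '%s\n' "$result"
```

This only helps as long as the chosen separator does not itself occur in the variable.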
Dealing with VARIABLES within sed
[root@gislab00207 ldom]# echo domainname: None > /tmp/1.txt
[root@gislab00207 ldom]# cat /tmp/1.txt
domainname: None
[root@gislab00207 ldom]# echo ${DOMAIN_NAME}
dcsw-79-98vm.us.oracle.com
[root@gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: ${DOMAIN_NAME}/g'
--- Below is the result -- very funny.
domainname: ${DOMAIN_NAME}
--- You need to close the single quotes around your variable so the shell expands it, like this ...
[root@gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: '${DOMAIN_NAME}'/g'
--- The right result is below
domainname: dcsw-79-98vm.us.oracle.com
VAR=8675309
echo "abcde:jhdfj$jhbsfiy/.hghi$jh:12345:dgve::" |\
sed 's/:[0-9]*:/:'$VAR':/1'
where VAR contains what you want to replace the field with
I had a similar problem: I had a list and had to build a SQL script based on a template (that contained #INPUT# as the element to replace):
for i in LIST
do
awk "sub(/\#INPUT\#/,\"${i}\");" template.sql >> output
done
If your replacement string may contain other sed control characters, then a two-step substitution (first escaping the replacement string) may be what you want:
PWD='/a\1&b$_' # these are problematic for sed
PWD_ESC=$(printf '%s\n' "$PWD" | sed -e 's/[\/&]/\\&/g')
echo 'xxx' | sed "s/xxx/$PWD_ESC/" # now this works as expected
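The two-step idea can be wrapped in a small function. This is only a sketch: escape_replacement is a hypothetical helper name, it assumes / is the delimiter of the final s command, and it assumes the value contains no newlines.

```shell
# Hypothetical helper: escape the characters that are special in the
# replacement part of s/.../.../ (backslash, the / delimiter, and &).
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\/&]/\\&/g'
}

value='/a\1&b'                       # problematic for sed if used as-is
escaped=$(escape_replacement "$value")
result=$(printf '%s\n' 'xxx' | sed "s/xxx/$escaped/")

printf '%s\n' "$result"
```

The output should be the original value, unchanged, because every special character was neutralised before sed interpreted the replacement.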
For me, replacing some text in a file with the value of an environment variable using sed works only with quotes, as follows:
sed -i 's/original_value/'"$MY_ENVIRONMENT_VARIABLE"'/g' myfile.txt
BUT when the value of MY_ENVIRONMENT_VARIABLE contains a URL (e.g. https://andreas.gr), the above does not work.
THEN use a different delimiter:
sed -i "s|original_value|$MY_ENVIRONMENT_VARIABLE|g" myfile.txt

Add prefix to each word of each line in bash

I have a variable called deps:
deps='word1 word2'
I want to add a prefix to each word of the variable.
I tried with:
echo $deps | while read word do \ echo "prefix-$word" \ done
but i get:
bash: syntax error near unexpected token `done'
any help? thanks
With sed :
$ deps='word1 word2'
$ echo "$deps" | sed 's/[^ ]* */prefix-&/g'
prefix-word1 prefix-word2
For well behaved strings, the best answer is:
printf "prefix-%s\n" $deps
as suggested by 123 in the comments to fedorqui's answer.
Explanation:
Without quoting, bash will split the contents of $deps according to $IFS (which defaults to " \n\t") before calling printf
printf evaluates the pattern for each of the provided arguments and writes the output to stdout.
printf is a shell built-in (at least for bash) and does not fork another process, so this is faster than sed-based solutions.
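A short sketch of how printf reuses its format for every argument, including a variant that keeps the words on one line (the variable names result and one_line are just for illustration):

```shell
deps='word1 word2'

# Unquoted $deps is split into two arguments; the format is applied to each.
result=$(printf 'prefix-%s\n' $deps)

# Same idea with a space separator, then strip the single trailing space.
one_line=$(printf 'prefix-%s ' $deps)
one_line=${one_line% }

printf '%s\n' "$result"
printf '%s\n' "$one_line"
```

Because the expansion is unquoted, words containing glob characters such as * would also be subject to pathname expansion, which is why this is only suitable for well-behaved strings.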
In another question I just came across the markers for beginning (\<) and end (\>) of words. With those you can shorten the solution of SLePort above somewhat. The solution also nicely extends to appending a suffix, which I needed in addition to the prefix, but couldn't figure out how to use above solution for it, as the & also includes the possible trailing whitespace after the word.
So my solution is this:
$ deps='word1 word2'
# add prefix:
$ echo "$deps" | sed 's/\</prefix-/g'
prefix-word1 prefix-word2
# add suffix:
$ echo "$deps" | sed 's/\>/-suffix/g'
word1-suffix word2-suffix
Explanation: \< matches the beginning of every word, and \> matches the end of each word. You can simply "replace" these by the prefix/suffix, resulting in them being prepended/appended. There is no need to reference them anymore in the replacement, as these are not "real" characters anyway!
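One caveat when both a prefix and a suffix are needed: chaining the two boundary substitutions can misfire, because a hyphen in the inserted prefix or suffix creates new word boundaries for the second substitution to match. A hedged alternative sketch, assuming words are separated by spaces and a sed that supports -E, handles both in a single pass:

```shell
deps='word1 word2'

# [^ ]+ matches each run of non-space characters; & re-inserts the match,
# so prefix and suffix are added around every word in one substitution.
result=$(printf '%s\n' "$deps" | sed -E 's/[^ ]+/prefix-&-suffix/g')

printf '%s\n' "$result"
```

Since the pattern excludes the separating spaces, & never carries trailing whitespace into the replacement.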
You can read the string into an array and then prepend the string to every item:
$ IFS=' ' read -r -a myarray <<< "word1 word2"
$ printf "%s\n" "${myarray[@]}"
word1
word2
$ printf "prefix-%s\n" "${myarray[@]}"
prefix-word1
prefix-word2

Bash sed replace with exact match of a text in a file

I have a file pattern.txt which is composed of one very long line of complicated code (~8200 chars).
This code can be found in multiple files inside multiple directories.
I can easily identify a list of these files using
grep -rli 'uniquepartofthecode' *
My concern is how do I replace it with the exact text from within the file ?
I tried to do:
var=$(cat pattern.txt)
sed -i "s/$var//g" targetfile.txt
but I got the following error :
sed: -e expression #1, char 96: unknown option to `s'
sed is interpreting my $var content as a regular expression, I would like it to just match the exact text.
The pattern.txt content could be more or less any combination of characters so I'm afraid I cannot escape every characters efficiently.
Is there a solution using sed ? Or should I use another tool for that ?
EDIT:
I tried using this solution to make a proper regex pattern from my text file.
Is it possible to escape regex metacharacters reliably with sed
the overall process is:
var=$(cat pattern.txt)
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$var")
sed -n "s/$searchEscaped/foo/p" <<<"$var" # if ok, echoes 'foo'
This last command displays "foo". $searchEscaped seems to be properly escaped.
Though, this is not returning anything (it should display foo + the rest of the file without the matched part):
sed -n "s/$searchEscaped/foo/p" targetfile.txt
I think that the best solution is to not use regular expressions at all and resort to string replacement.
One way to do this is using perl:
$ echo "$string_to_replace"
some other stuff abc$^%!# some more
$ echo "$search"
abc$^%!#
$ perl -spe '$len = length $search;
while (($pos = index($_, $search, $n)) > -1) {
substr($_, $pos, $len) = "replacement";
$n = $pos + $len;
}' <<<"$string_to_replace" -- -search="$search"
some other stuff replacement some more
The -p switch tells perl to loop through each line of the variable $string_to_replace (which could easily be replaced by a file). -s allows options to be passed to the script - in this case, I've passed a shell variable containing the search string.
For each line of the file, the while loop runs through all of the matches of the search string. substr is used on the left hand of the assignment to replace a substring of $_, which refers to the current line being processed.

Bash - sed syntax with variables

I've got two variables VAR1 and VAR2 that contain strings. What I want to do is go through a list of files that have a .txt extension and change all occurences of VAR1 to VAR2. So far, it looks like this:
for i in `find . -name "*.txt"`
do
echo $i
sed -i -E "s|\$VAR1|\$VAR2|g" $i
done
I think everything except the sed line is working well. I think it's a syntax issue, but I haven't been able to figure out what it is. Any help would be appreciated
Thanks
You shouldn't need to escape your $ variable. Also make sure to use the lower case -e and quote the filename in case it has spaces:
sed -ri -e "s|$VAR1|$VAR2|g" "$i"
Since sed's "find-and-replace" functionality is oriented to regular expressions rather than literal strings, you might wish to consider an alternative to sed, e.g. using awk as follows:
awk -v from="$VAR1" -v to="$VAR2" '
function replace(a,b,s, n) {
n=index(s,a);
if (n==0) {return s}
return substr(s,1,n-1) b replace(a,b, substr(s,n+length(a)));
}
{print replace(from, to, $0)} '
The above can easily be combined with the find ... | while read f ; do .... done pattern mentioned elsewhere on this page.
GNU awk supports the equivalent of sed's '-i' option, but it's probably better simply to direct the output of awk to a temporary file, and then mv it into place.
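One detail worth noting: awk processes backslash escape sequences in values assigned with -v, so a search string containing backslashes may be altered on the way in. A sketch of the same literal replacement that passes the strings through the environment instead (FROM and TO are hypothetical variable names, and the loop is an iterative variant of the recursive function above):

```shell
from='a.b\c'    # contains a regex metacharacter and a backslash
to='X&Y'        # & has no special meaning here

result=$(printf '%s\n' 'a.b\c and axb\c' |
    FROM="$from" TO="$to" awk '
        BEGIN { from = ENVIRON["FROM"]; to = ENVIRON["TO"] }
        {
            out = ""; s = $0
            # index() does a plain substring search, no regex involved
            while (from != "" && (n = index(s, from)) > 0) {
                out = out substr(s, 1, n - 1) to
                s = substr(s, n + length(from))
            }
            print out s
        }')

printf '%s\n' "$result"
```

Only the exact text a.b\c is replaced; axb\c is left alone because the dot is not treated as a wildcard.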
You managed to quote the dollar sign from the shell (which would not have been necessary if you had used single quotes instead of double) but this does not change the fact that dollar signs also have a meaning in regular expressions. Double the backslashes to escape from both the shell and sed, or use single quotes so the backslashes get through to sed. Alternatively, use a notation which does not require backslashes.
sed -i -E 's|[$]VAR1|$VAR2|g' "$i"
Incidentally, your loop has a number of problems. Your for loop will not work correctly if there are file names with whitespace in them, and you need to quote the arguments inside the loop. To completely cope with file names with special characters in them, you want to use find -exec instead.
find . -name "*.txt" -exec sed -i -E 's|[$]VAR1|$VAR2|g' {} \;
If your find supports + instead of \;, by all means use that.
(1) Using the idiom for i in $(find ....) ; do ...; done will often work as intended, but it is not robust. Significantly better is the pattern:
find ... | while read i ; do ... ; done
(2) If $VAR1 and/or $VAR2 contain characters that have special significance in regular expressions, then some care will be required. For example, parentheses ("(" and ")") have special significance, and so if VAR1 contains these, using the -r option (or on a Mac, the -E option) is probably asking for trouble.
(3) Chances are that sed -i -e "s|$VAR1|$VAR2|g" will do the trick if VAR1 does not contain any of the eight characters: ^$*[]\|. and if VAR2 does not contain "|", "\" or "&".
(4) If you want to prepare your strings ($VAR1 and $VAR2) programmatically for use with sed, then see this SO page; it shows how to munge the strings -- using sed of course!
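Putting the points above together, a sketch that expands the shell variables and lets find hand the file names to sed directly might look like this. It assumes GNU sed (for -i without a suffix argument), a find that supports -exec ... {} +, and values of VAR1 and VAR2 that contain none of the special characters listed in (3); the temporary directory and file name are made up for the demonstration.

```shell
# Throwaway directory with a file whose name contains a space.
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/a file.txt"

VAR1='world'
VAR2='there'

# Double quotes let the shell expand $VAR1 and $VAR2; find passes each
# file name to sed as a single argument, so whitespace is not a problem.
find "$workdir" -name '*.txt' -exec sed -i -e "s|$VAR1|$VAR2|g" {} +

result=$(cat "$workdir/a file.txt")
rm -rf "$workdir"

printf '%s\n' "$result"
```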

How to ensure I have exactly 2 spaces before string and zero spaces after

I get a string that can have from zero to multiple leading and trailing spaces.
I'm trying to get rid of them without lot of hackery but my code looks huge.
How to do this in a clean way?
as easy as:
$ src=" some text "
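$ dst="  $(echo $src)"
$ echo ":$dst:"
:  some text:
$(echo $src) will get rid of all the surrounding spaces (because the expansion is unquoted, runs of spaces inside the text are collapsed to a single space as well).
Then you simply add as many spaces as you need before it.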
How are you calling out the string? If it's an echo you can just put
Echo "<2 spaces>". "string";
If it's a normal string, you just put 2 spaces between the first quote and the string.
"<2spaces> string here"
One way using GNU sed:
sed 's/^[ \t]*/  /; s/[ \t]*$//' file.txt
You can apply this to a bash variable like this:
echo "$string" | sed 's/^[ \t]*/  /; s/[ \t]*$//'
And save it like this:
variable=$(echo "$string" | sed 's/^[ \t]*/  /; s/[ \t]*$//')
Explanation:
The first substitution will remove all leading whitespace and replace it with two spaces.
The second substitution will simply remove all trailing whitespace from a line.
The simplest is probably to use an external process.
value=$(echo "$value" | sed 's/^ *\(.*[^ ]\) *$/  \1/')
If you need to transform an empty string into two spaces, you'll need to modify the regex; and if you're not on Linux, your sed dialect may differ slightly. For maximum portability, switch to awk or Perl, or do it all in Bash. That gets a bit more complex, but for a start, trailing=${value##*[! ]} contains any trailing spaces, and you can trim them off with ${value%$trailing}, and similarly for leading spaces. See the section on variable substitution in the Bash manual for details.
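The parameter-expansion approach described above can be sketched as follows; the sample value and the variable names leading and result are illustrative only.

```shell
value='   some  text   '

# Everything after the last non-space character is the trailing run.
trailing=${value##*[! ]}
value=${value%"$trailing"}

# Everything before the first non-space character is the leading run.
leading=${value%%[! ]*}
value=${value#"$leading"}

# Exactly two spaces in front, nothing after.
result="  $value"

printf ':%s:\n' "$result"
```

No external process is started, and spaces inside the text are preserved because the value is never expanded unquoted.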
You can use a regular expression to match everything between the leading and trailing spaces. The matched text is found in the BASH_REMATCH array (the text matching the first parentheses group is in element 1).
spcs='\ *'
text='.*[^ ]'
[[ $src =~ ^$spcs($text)$spcs$ ]]
dst="  ${BASH_REMATCH[1]}"
