How to replace all underscores in a text except the ones that are part of a specific word or pattern in Unix Shell - shell

I have a file that contains a lot of underscores, and I have to replace all of them with the empty string except the ones that are part of a specific string, usr_mstr.
I have tried a sed command; it replaces the underscores and excludes the word I provided, but it also removes the character immediately following each underscore! Any help will be greatly appreciated.
echo "fname_sname_id_usr_mstr" | sed 's/_[^usr_mstr]//g'
Expected output:
fnamesnameidusr_mstr
Actual Output:
fnamenamedusr_mstr
(s and i got replaced)

This might work for you (GNU sed):
sed -r 's/(usr_mstr)|_/\1/g' file
Globally replace usr_mstr by itself, or replace _ by nothing.
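The same substitution can take the protected word as a parameter. This is a minimal sketch: the function name keep_word is made up for illustration, it assumes the word contains no "/" and no characters that are special in an extended regex, and it uses -E rather than -r because both GNU and BSD sed accept it.

```shell
# keep_word WORD: delete every "_" read from stdin except those inside WORD.
# Assumption: WORD contains no "/" and no extended-regex metacharacters.
keep_word() {
  # Match either WORD (captured, then put back via \1) or a lone "_",
  # in which case \1 is empty and the "_" disappears.
  sed -E "s/($1)|_/\1/g"
}

printf '%s\n' 'fname_sname_id_usr_mstr' | keep_word usr_mstr
```

For the sample input this prints fnamesnameidusr_mstr.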

[^usr_mstr] is a character class that matches any character that's not u, s, r, m, t, or _.
Perl supports "look-around" assertions, so you can write:
echo "fname_sname_id_usr_mstr_x_usr_other_mstr_y_usrmstr_z" \
| perl -pe 's/(?<!usr)_//g;s/_(?!mstr)//g'
i.e. remove every _ that is not preceded by usr, then every remaining _ that is not followed by mstr, so only the _ inside usr_mstr survives.

You cannot solve this with standard sed BRE regex alone. With sed you would basically need to replace "usr_mstr" with a placeholder string, then remove all the underscores, and then replace the placeholder string with "usr_mstr" again:
echo "fname_sname_id_usr_mstr" |
{ null="###"; sed "s/usr_mstr/$null/g; s/_//g; s/$null/usr_mstr/g" ;}
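The placeholder approach breaks if the placeholder text already occurs in the input. One way around that, sketched here under the assumption that the byte 0x01 never appears in the data, is to use a control character as the placeholder. The function name strip_underscores is hypothetical.

```shell
# Hypothetical helper: protect usr_mstr with a placeholder, strip "_", restore.
strip_underscores() {
  ph=$(printf '\001')   # assumption: byte 0x01 never occurs in the input
  sed "s/usr_mstr/$ph/g; s/_//g; s/$ph/usr_mstr/g"
}

printf '%s\n' 'fname_sname_id_usr_mstr' | strip_underscores
```

For the sample input this prints fnamesnameidusr_mstr.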
An alternative is to try awk :
echo "fname_sname_id_usr_mstr" |
awk -v s="usr_mstr" 'BEGIN{FS=OFS=s} {for(i=1; i<=NF; i++) gsub("_","",$i)}1'
This should work as long as s does not contain characters that are special in extended regular expressions (awk treats a multi-character FS as a regex).

Related

Replace the matched string in the comma separated string pattern

I have comma-separated strings inside brackets, and I need to replace a string on the rows that match a pattern.
There is an unknown number of strings at the start and at the end. In the example below I need to replace the string c++ with c if the row contains the string ruby.
I tried the sed command below, but it didn't work.
```
("java","php","ruby",".net","scala","c++",...n),
(".net","ruby","php","java","c++",...n),
("java",".net","ruby","php","c++",...n),
("ruby","java",".net","php","c++",...n);
```
```
sed -e "s/(\(.*\),\("ruby"\),\(.*\),"c++",\(.*\))/(\1,\2,\3,"c",\4)/g"
```
("java","php","ruby",".net","scala","c++",...n),
(".net","ruby","php","java","c++",...n),
("java",".net","ruby","php","c++",...n),
("ruby","java",".net","php","c++",...n);
Using mawk, nawk, or gawk (\42 is the octal escape for a double quote):
awk '/\42ruby\42/ ? NF = NF : NF' FS='"c[+][+]"' OFS='"c"'
("java","php","ruby",".net","scala","c",...n),
(".net","ruby","php","java","c",...n),
("java",".net","ruby","php","c",...n),
("ruby","java",".net","php","c",...n);
It seems your sed command does not escape the double quotes inside the double-quoted expression:
sed -e "s/(\(.*\),\("ruby"\),\(.*\),"c++",\(.*\))/(\1,\2,\3,"c",\4)/g"
Change the outer quotes to single quotes:
sed -e 's/(\(.*\),\("ruby"\),\(.*\),"c++",\(.*\))/(\1,\2,\3,"c",\4)/g' file.txt
or, more simply, use this one:
sed -e 's/\("ruby"\),\(.*\),"c++"/\1,\2,"c"/g' my_file.txt
which, for a test file that also holds two non-matching lines (the first and last below), will output:
("jsjs","java",".net","php","c++",...n);
("java","php","ruby",".net","scala","c",...n),
(".net","ruby","php","java","c",...n),
("java",".net","ruby","php","c",...n),
("ruby","java",".net","php","c",...n);
("rubys","java",".net","php","c++",...n);
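Another way to express "only on rows that contain ruby" is a sed address in front of the substitution, which avoids capture groups altogether. A minimal sketch, with a made-up function name; it relies on "+" being an ordinary character in a basic regex.

```shell
# On lines that contain "ruby" (including its quotes), turn every "c++" into "c".
# In a basic regex "+" is an ordinary character, so it needs no escaping.
ruby_cpp_to_c() {
  sed '/"ruby"/ s/"c++"/"c"/g'
}

printf '%s\n' '("java","ruby","c++")' '("java","php","c++")' | ruby_cpp_to_c
```

Only the first of the two sample lines is changed, because the second contains no "ruby" entry.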

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution, with extended regex support enabled (-E) and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - an escaped pipe matches a literal pipe (with -E a bare | means alternation); the pipes inside [^|] and in the replacement text are already literal and need no escaping
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables, mod and vers, to awk from $module and $version. Set the field delimiter to |. Split the second field into the array map using the split function with . as the delimiter. Then strip the leading and trailing "**" from the first element of the array with the substr function to expose the module name as inmod. Compare this to the mod variable and, if there is a match, change the 3rd delimited field to the variable vers. Print each line with the shorthand 1.
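A bare pipe is only special when you're using extended regular expressions (sed -E). In GNU sed's basic regular expressions it is the escaped form \| that means alternation, so the pattern in the question starts with an empty alternative, matches the empty string at the start of the line, and the replacement is inserted in front of the original text.
There's no reason why you need extended regex here; stick with basic regex and leave the pipes unescaped: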
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt
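If the old version number is not known in advance, the same basic-regex idea can replace whatever sits between the last two pipes. This is a sketch under assumptions not in the original: the function name set_version is made up, the module name and version are assumed to contain no "/", "&" or regex metacharacters, and the dot before version is escaped so that it matches only a literal dot.

```shell
# set_version MODULE VERSION: on lines starting with |**MODULE**.version|,
# replace the last pipe-delimited field with VERSION. Reads stdin.
set_version() {
  # Unescaped "|" is literal in a basic regex; \* and \. match literal "*" and ".".
  sed "/^|\*\*$1\*\*\.version|/ s/|[^|]*|\$/|$2|/"
}

printf '%s\n' '|**apple**.version|2001.0132012031539|' \
              '|**barak**.version|2001.0132012031539|' |
  set_version barak 2001.01.2012031541
```

Only the barak line is rewritten; the apple line passes through unchanged.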

How to replace a string with string containing multiple / characters

I am trying to change the following string
FROM java_jre_8@sha256:92f22331226b9b3c43a15eeeb304dd7
to
FROM docker-registry.service.consul:5000/java_jre_8@sha256:92f22331226b9b3c43a15eeeb304dd7
but am having difficulty with sed because of the / character.
This is for a build server.
There are two ways of doing this. The first is to escape each / in the string you're replacing:
sed 's/from/to with \/ ... /'
The other, simpler way is to use a delimiter other than /. While most sed examples use / as a delimiter, you can use any character:
sed 's|from|to with / ...|'
Here, the | is the first character following s, and therefore sed knows to use this as a delimiter.
You can use # as the delimiter as it doesn't appear in your string (you can still use / but then you'll have to escape the actual /s that are part of the string).
sed "s#FROM java_jre_8@#FROM docker-registry.service.consul:5000/java_jre_8@#"
Example:
$ echo "FROM java_jre_8@sha256:92f22331226b9b3c43a15eeeb304dd7" | sed "s#FROM java_jre_8@#FROM docker-registry.service.consul:5000/java_jre_8@#"
FROM docker-registry.service.consul:5000/java_jre_8@sha256:92f22331226b9b3c43a15eeeb304dd7
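On a build server the registry usually comes from a variable rather than being typed into the expression. A minimal sketch of that, assuming the registry value contains no "|" or "&"; the function name add_registry is hypothetical.

```shell
# add_registry REGISTRY: prefix the image on each FROM line with REGISTRY/.
# "|" is the s delimiter, so the "/" in the replacement needs no escaping.
add_registry() {
  sed "s|^FROM |FROM $1/|"
}

printf '%s\n' 'FROM java_jre_8' | add_registry docker-registry.service.consul:5000
```

This prints FROM docker-registry.service.consul:5000/java_jre_8. Because only the "FROM " prefix is matched, anything after the image name is carried over untouched.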

Get string between strings in bash

I want to get the string between <sometag param=' and '>
I tried to use the method from Get any string between 2 string and assign a variable in bash to get the "x":
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'
The problem (apart from low efficiency because I just cannot manage to escape the apostrophe correctly for sed) is that sed matches the maximum, i.e. the output is:
x_><irrelevant stuff=_nonsense
but the correct output would be the minimum-match, in this example just "x"
Thanks for your help
You are probably looking for something like this:
sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Results:
x
Explanation:
sed has no non-greedy quantifiers, so instead of .* use a negated character class: [^']* matches any run of characters other than ', which means it cannot run past the closing quote. To anchor the match, it is followed by '>.
The command above wraps the sed expression in double quotes so that the single quotes need no escaping. If you wanted to wrap it in single quotes instead, you'd do this:
... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'
Notice that the single quotes aren't really escaped. The sed expression is closed, an escaped single quote is inserted, and the sed expression is re-opened. Think of it as a four-character escape sequence.
Personally, I'd use GNU grep. It would make for a slightly shorter solution. Run like:
... | grep -oP "(?<=<sometag param=').*?(?='>)"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"
Results:
x
You don't have to assemble regexes in cases like this; you can just use ' as the field separator:
in="<sometag param='x'><irrelevant stuff='nonsense'>"
IFS="'" read x whatiwant y <<< "$in" # bash
echo "$whatiwant"
awk -F\' '{print $2}' <<< "$in" # awk
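When the value is already in a shell variable, parameter expansion can pick out the shortest match without any external command. A small sketch using the same sample string; the variable names rest and value are made up for illustration.

```shell
in="<sometag param='x'><irrelevant stuff='nonsense'>"

# Drop everything up to and including the opening marker (shortest prefix)...
rest=${in#*"<sometag param='"}
# ...then drop the first "'>" and everything after it (longest suffix).
value=${rest%%"'>"*}

printf '%s\n' "$value"
```

This prints x. The %% form removes the longest matching suffix, which is what stops the result at the first '> instead of the last one.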

insert a string at specific position in a file by SED awk

I have a string which I need to insert at a specific position in a file.
The file contains multiple semicolons (;), and I need to insert the string just before the last ";".
Is this possible with sed?
Please post an explanation with the command, as I am new to shell scripting.
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line; the ; then makes the regexp backtrack to the last ;. The \(.*\) also creates a back reference, \1. Put together in the RHS of the s command, this means: insert jjjjj before the last ;.
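The greedy-match idea works the same way when the text to insert comes from a variable. A minimal sketch, assuming the inserted string contains no "/", "&" or "\" (all three are special in the replacement); the function name insert_before_last is hypothetical.

```shell
# insert_before_last STR: add " STR" before the last ";" of each stdin line.
# Assumption: STR contains no "/", "&" or "\".
insert_before_last() {
  # \(.*\) grabs as much as possible, so the ";" after it is the last one.
  sed "s/\(.*\);/\1 $1;/"
}

printf '%s\n' 'adad;sfs;sdfsf;fsdfs' | insert_before_last jjjjj
```

This prints adad;sfs;sdfsf jjjjj;fsdfs, matching the "after" line in the question.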
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;", repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r option stands for extended regexp
# is the delimiter; the usual / separator can be replaced by any other character
we match the last-but-one ;, then one or more characters that are not ;, then the final ;; $ means end of the line
the non-; part is captured with ()
finally, we replace the match with "; jjjjj;" followed by the captured part
Edit: basic-regex version, without -r (note that \+ is itself a GNU extension; the strictly POSIX spelling is \{1,\}):
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;]\+\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed's .* is greedy by default and will match as many characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
UPDATE: Since the sample input no longer has a ";" at the end, something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts by setting FS (field separator) and OFS (output field separator) to a semicolon. NF in awk stands for the number of fields, so $(NF-1) is the next-to-last field. The assignment $(NF-1)=$(NF-1) " jjjjj" just appends jjjjj to that field, and print outputs the rebuilt line.
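The awk version can also take the inserted text as a parameter and skip lines that have no semicolon at all. A minimal sketch; the function name append_to_penultimate is made up, and note that awk -v interprets backslash escapes in the value it is given.

```shell
# append_to_penultimate STR: append " STR" to the next-to-last ";" field.
append_to_penultimate() {
  # NF > 1 guards lines without any ";" so $(NF-1) never refers to $0.
  awk -v s="$1" 'BEGIN { FS = OFS = ";" } NF > 1 { $(NF-1) = $(NF-1) " " s } 1'
}

printf '%s\n' 'adad;sfs;sdfsf;fsdfs' | append_to_penultimate jjjjj
```

This prints adad;sfs;sdfsf jjjjj;fsdfs. Without the NF > 1 guard, a line with a single field would assign to $0 and get the text appended to the whole line.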

Resources