How to use sed command to add a string before a pattern string? - bash

I want to use sed to modify my file named "baz".
When I search for the pattern foo, and foo is not at the beginning or end of the line, I want to insert bar before foo. How can I do it using sed?
Input file named baz:
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
Output file
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah

You can just use something like:
sed 's/foo/barfoo/g' baz
(the g at the end means global, every occurrence on each line rather than just the first).
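As a small variation (a sketch, not part of the answer above): the replacement can reuse the matched text through &, so the pattern does not have to be typed twice. The sample line comes from the question; the variable names are only for illustration.

```shell
# Sample line from the question; in practice sed would read the file baz.
line='blah_foo_blahblahblah'

# & in the replacement stands for whatever the pattern matched (here: foo).
result=$(printf '%s\n' "$line" | sed 's/foo/bar&/g')

printf '%s\n' "$result"   # prints blah_barfoo_blahblahblah
```

With a fixed pattern this saves little, but it keeps the command correct if the pattern is later changed to something like foo[0-9].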
For an arbitrary (rather than fixed) pattern such as foo[0-9], you could use capture groups as follows:
pax$ echo 'xyz fooA abc
xyz foo5 abc
xyz fooB abc' | sed 's/\(foo[0-9]\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc
xyz fooB abc
The parentheses capture the actual text that matched the pattern and the \1 uses it in the substitution.
You can use arbitrarily complex patterns with this one, including ensuring you match only complete words. For example, only changing the pattern if it's immediately surrounded by a word boundary:
pax$ echo 'xyz fooA abc
xyz foo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc' | sed 's/\(\bfoo[0-9]\b\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc
In terms of how the capture groups work, you can use parentheses to store the text that matches a pattern for later use in the replacement. The captured identifiers are based on the ( characters reading from left to right, so the regex (I've left off the \ escape characters and padded it a bit for clarity):
( ( \S* )   ( \S* ) )
^ ^     ^   ^     ^ ^
| |     |   |     | |
| +--2--+   +--3--+ |
+---------1---------+
when applied to the text Pax Diablo would give you three groups:
\1 = Pax Diablo
\2 = Pax
\3 = Diablo
as shown below:
pax$ echo 'Pax Diablo' | sed 's/\(\(\S*\) \(\S*\)\)/[\1] [\2] [\3]/'
[Pax Diablo] [Pax] [Diablo]
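The same numbering can be tried with extended regular expressions, where the parentheses need no backslashes. This is only a sketch: it assumes a sed that accepts -E, and it swaps \S* for [^ ]* because \S is a GNU extension. The variable name is made up.

```shell
# Groups are still numbered by their opening parenthesis, left to right:
# \1 = the outer group, \2 = first word, \3 = second word.
groups=$(printf '%s\n' 'Pax Diablo' | sed -E 's/(([^ ]*) ([^ ]*))/[\1] [\2] [\3]/')

printf '%s\n' "$groups"   # prints [Pax Diablo] [Pax] [Diablo]
```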

Just substitute the start of the line with something different.
sed '/^foo/s/^/bar/'
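To see what that command does, here is a quick run on two made-up lines (the input and the variable name are assumptions for illustration). The /^foo/ address selects only lines that start with foo, and s/^/bar/ then inserts bar at the start of those lines.

```shell
# Only the first line starts with foo, so only it gets the bar prefix.
prefixed=$(printf '%s\n' 'foo_blah' 'blah_foo' | sed '/^foo/s/^/bar/')

printf '%s\n' "$prefixed"
# prints:
# barfoo_blah
# blah_foo
```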

To replace or modify all "foo" except at beginning or end of line, I would suggest to temporarily replace them at beginning and end of line with a unique sentinel value.
sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
In sed there is no way to specify "cannot match here". In Perl regex and derivatives (meaning languages which borrowed from Perl's regex, not necessarily languages derived from Perl) you have various negative assertions so you can do something like
perl -pe 's/(?!^)foo(?!$)/barfoo/g'
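If the sentinel approach feels heavy, one possible shortcut in plain sed (a sketch with a caveat, not a drop-in replacement) is to require one character on each side of foo and put both back through capture groups. Because those neighbouring characters are consumed by the match, two occurrences that touch each other (foofoo) would not both be handled, so the sentinel version stays the safer general answer. The function name is hypothetical.

```shell
# Insert bar before every foo that has at least one character on both sides.
insert_bar_inside() {
    sed 's/\(.\)foo\(.\)/\1barfoo\2/g'
}

inner=$(printf '%s\n' 'foo_foo_foo' | insert_bar_inside)
printf '%s\n' "$inner"   # prints foo_barfoo_foo
```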

Related

how to remove all whitespaces in front and behind 3 consecutive periods

I'm trying to remove all white space before and after 3 consecutive periods and replace the whole thing with the actual ellipsis symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipsis symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
Bracket classes such as [[:space:]] are valid in both basic and extended regular expressions, so -E is not strictly required for this expression, but you can run it as an ERE (extended regular expression) by adding the -E option as follows:
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
Without -E try:
sed 's/ *\.\.\. */.../g' Input_file
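For what it's worth, here is a quick check of the character-class form with the ellipsis symbol as the replacement, which is what the question asks for. It is a sketch that assumes a sed with POSIX bracket classes and a UTF-8 environment; the variable name is made up.

```shell
# [[:space:]]* swallows any whitespace on either side of the three literal dots.
joined=$(printf '%s\n' 'hello ... world' | sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g')

printf '%s\n' "$joined"   # prints hello…world
```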
Here is another sed
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
With 4 dots, nothing is replaced:
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter substitution. The +( ) pattern below is an extended glob, so extglob has to be enabled first:
shopt -s extglob
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now, echo "$foo", outputs:
hello...world
The syntax for bash pattern substitution (the patterns are globs, not regular expressions) is as follows:
${var-name/search/replace}
A single / replaces only the first occurrence from the left, while a double // replaces every occurrence.
With extglob enabled, one of ?*+@! followed by (pattern-list) matches a specified number of occurrences of the patterns in pattern-list, as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
@ Exactly one occurrence
! Anything that *doesn't* match one of the patterns
The pattern list can be any combination of literal strings or character classes, separated by the pipe character |
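Putting the pieces together, a minimal bash sketch (it assumes bash with extglob available; the variable name is made up). It uses *( ) rather than +( ) so that dots without surrounding spaces are converted as well, which is a choice of this example rather than something from the answer above.

```shell
#!/usr/bin/env bash
# extglob must be on for *( ) to mean "zero or more spaces".
shopt -s extglob

text='hello ... world'
# // replaces every match; a dot is a literal dot in a glob pattern.
text="${text//*( )...*( )/…}"

printf '%s\n' "$text"   # prints hello…world
```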

Word after a particular word in a string

I have a string, e.g. ab_abc_bbb_ccc_ssss_pppp. I want the word after ccc, i.e. ssss. How can I achieve this using a Unix command?
You mean something like this?
echo "ab_abc_bbb_ccc_ssss_pppp" | sed 's/.*ccc_\([^_]*\).*/\1/'
explanation
s/ # substitute
.*ccc_ # match everything up to and including ccc_
\([^_]*\) # save the following run of non-'_' chars as group 1 (\1)
.*/ # match (and so discard) the trailing chars
\1/ # replace the whole match with \1
output
ssss
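If the string is already in a shell variable, the same result can be had without sed at all, using POSIX parameter expansion. This is just a sketch; the variable names are made up, and it assumes ccc_ occurs in the string (otherwise the first expansion leaves the string unchanged).

```shell
s='ab_abc_bbb_ccc_ssss_pppp'

rest=${s#*ccc_}    # drop the shortest prefix ending in ccc_  -> ssss_pppp
word=${rest%%_*}   # drop the longest suffix starting at _    -> ssss

printf '%s\n' "$word"   # prints ssss
```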

Replacing one space with two spaces in Unix

I am trying to replace every single space with two spaces in Unix. We are just reading from standard input and writing to standard output, and I have to avoid using awk and perl. For example, if I read in something like San Diego it should print San  Diego. If there are already multiple spaces, it should just leave them alone.
How about bash only? First test file:
$ cat file
1
2 3
4 5
San Diego NO
Then:
$ cat file |
while IFS= read -r line
do
while [[ "$line" =~ (^|.+[^ ])\ ([^ ].*) ]]
do
# rejoin the two halves with two spaces instead of one
line="${BASH_REMATCH[1]}  ${BASH_REMATCH[2]}"
done
echo "$line"
done
1
2 3
4 5
San Diego NO
You have to be a bit careful here not to forget spaces at the beginning or end.
I present three solutions for educational purposes:
sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g' # solution 1
sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g' # solution 2
sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g' # solution 3
All three solutions make use of subexpressions:
9.3.6 BREs Matching Multiple Characters
A subexpression can be defined within a BRE by enclosing it between
the character pairs \( and \). Such a subexpression shall match
whatever it would have matched without the \( and \), except that
anchoring within subexpressions is optional behavior; see BRE
Expression Anchoring. Subexpressions can be arbitrarily nested.
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character n shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth \( from the beginning of the pattern and ends
with the corresponding paired \) ). The expression is invalid if
less than n subexpressions precede the \n. For example, the
expression "^\(.*\)\1$" matches a line consisting of two adjacent
appearances of the same string, and the expression "\(a\)*\1" fails to
match 'a'. When the referenced subexpression matched more than one
string, the back-referenced expression shall refer to the last matched
string. If the subexpression referenced by the back-reference matches
more than one string because of an asterisk (*) or an interval
expression (see item (5)), the back-reference shall match the last
(rightmost) of these strings.
Solution 1: sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'
Here there are two subexpressions. The first subexpression \(^\|[^ ]\) matches the beginning of the line (^) or (\|) a non-space character ([^ ]). The second subexpression \($\|[^ ]\) is similar but with the end-of-line ($).
Solution 2: sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'
This replaces each run of one or more spaces with the same run plus one extra space. Afterwards we correct the runs that now have 3 spaces or more by removing a single space from each.
Solution 3: sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
This does the same thing as solution 2 but inverts the logic. First remove a space from all sequences that have more than one space, and afterwards add a space to every sequence. This one-liner is just one character shorter than solution 2.
Example: based on solution 1
The following commands are nothing more than echo "string" | sed ..., but wrapped in a printf statement to show the spaces.
# default string
$ printf "|%s|" " foo bar car "
| foo bar car |
# spaces replaced
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|  foo  bar  car  |
# 3 spaces in front and back
$ printf "|%s|" "$(echo "   foo bar car   " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|   foo  bar  car   |
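The two-pass idea of solution 3 can also be written with POSIX interval expressions instead of the GNU \+ operator. The sketch below makes two assumptions of its own: the function name is made up, and spaces are shown as dots via tr so that runs of spaces stay visible. Since it never has to match the characters around a space, it also doubles the spaces between single-character words such as a b c.

```shell
# Double every isolated space; runs of two or more spaces end up unchanged.
double_single_spaces() {
    # Pass 1 shortens every run of 2+ spaces by one space.
    # Pass 2 lengthens every run of spaces by one space.
    sed -e 's/ \( \{1,\}\)/\1/g' -e 's/ \{1,\}/ &/g'
}

# Dots stand in for spaces: one, two and three spaces between the letters.
shown=$(printf 'a.b..c...d\n' | tr . ' ' | double_single_spaces | tr ' ' .)
printf '%s\n' "$shown"   # prints a..b..c...d
```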
note: If you want to replace any form of blank (spaces and tabs in any encoding) by the same blank doubled, you could use:
sed 's/\(^\|[^[:blank:]]\)\([[:blank:]]\)\($\|[^[:blank:]]\)/\1\2\2\3/g'
sed 's/\(^\|[[:graph:]]\)\([[:blank:]]\)\($\|[[:graph:]]\)/\1\2\2\3/g'
Something along the lines of
cat input.txt | sed 's,\([[:alnum:]]\) \([[:alnum:]]\),\1  \2,'
should work for that purpose.
Replace only a single space between two non-whitespace characters with two spaces:
sed 's/\([^ ]\) \([^ ]\)/\1  \2/g' file
1) [^ ] - any character that is not a space
2) \1  \2 - the first parenthesized expression, two spaces, then the second parenthesized expression
3) sed's s///g replaces every match of the regex between the first pair of slashes with the text between the second pair

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
Since "abc" appears only once, I think I can just search for it and do the insertion.
How can I do this in a bash script with a tool such as sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append 123 to the match (& will contain the matched string "abc":")
If you want to take care of space before and after :, you can use:
sed 's/"abc" *: *"/&123/'
For replacing all such patterns, use g with sed.
sed 's/"abc" *: *"/&123/g' File
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") gets :" and put in captured group 1
(.*) gets the remaining portion and put in captured group 2
in the replacement, \1123\2 puts 123 between the groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
In the sub() function, the second ($2) field is being operated on, pattern is used as . (which would match "), and in the replacement the matched portion (&) is followed by 123.
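Note that the output above has lost the colon: modifying $2 makes awk rebuild the line with the output field separator, which defaults to a single space. A possible variant (a sketch; it assumes the value field begins with a double quote, and the variable name is made up) sets OFS to ":" as well, so the separator survives:

```shell
# FS splits on ':' and OFS puts it back when awk rebuilds $0 after $2 changes.
# The trailing 1 is an always-true pattern, so every line is printed.
fixed=$(printf '%s\n' '"abc":"def"' |
    awk 'BEGIN { FS = OFS = ":" } { sub(/"/, "&123", $2) } 1')

printf '%s\n' "$fixed"   # prints "abc":"123def"
```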
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"

sed not working as expected (trying to get value between two matches in a string)

I have a file (/tmp/test) that has the string "aaabbbccc" in it
I want to extract "bbb" from the string with sed.
Doing this returns the entire string:
sed -n '/aaa/,/ccc/p' /tmp/test
I just want to return bbb from the string with sed (I am trying to learn sed so not interested in other solutions for this)
Sed works on a line basis, and a,b{action} will run action on the lines from the one matching a through the one matching b. In your case
sed -n '/aaa/,/ccc/p'
will start printing lines when /aaa/ is matched, and stop when /ccc/ is matched which is not what you want.
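A small multi-line input makes the range behaviour easier to see. The five input lines and the variable name below are made up for illustration.

```shell
# The range opens at the line matching aaa and closes at the next line
# matching ccc; whole lines are printed, never parts of a line.
ranged=$(printf '%s\n' 'x' 'aaa' 'bbb' 'ccc' 'y' | sed -n '/aaa/,/ccc/p')

printf '%s\n' "$ranged"
# prints:
# aaa
# bbb
# ccc
```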
To manipulate a line there are multiple options. One is s/search/replace/, which can be used to remove the leading aaa and trailing ccc:
% sed 's/^aaa\|ccc$//g' /tmp/test
bbb
Breakdown:
s/
^aaa # Match literal aaa in beginning of string
\| # ... or ...
ccc$ # Match literal ccc at the end of the string
// # Replace with nothing
g # Global (keep going until there are no more matches; normally this command
# stops after the first match on a line has been replaced)
If you are not sure how many a's and c's you have you can use:
% sed 's/^aa*\|cc*$//g' /tmp/test
bbb
Which will match literal a followed by zero or more a's at the beginning of the line. Same for the c's but just at the end.
With GNU sed:
sed 's/aaa\(.*\)ccc/\1/' /tmp/test
Output:
bbb
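A variant worth knowing (a sketch, with made-up sample input and variable name) combines -n with the p flag of the s command, so that only lines where the substitution actually happened are printed. The anchors ^ and $ make sure the whole line has the aaa...ccc shape.

```shell
# -n turns off automatic printing; the trailing p prints a line only if
# the substitution succeeded on it.
middle=$(printf '%s\n' 'aaabbbccc' 'no match here' | sed -n 's/^aaa\(.*\)ccc$/\1/p')

printf '%s\n' "$middle"   # prints bbb
```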
See: The Stack Overflow Regular Expressions FAQ
