Empty regular expression in sed script - shell

I found the following sed script, from the famous "sed one-liners", that reverses the characters in each line, and I am not able to follow the //D command in it:
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
Suppose the initial file has two lines to start with, say:
apple
banana
After the first command,
/\n/!G
pattern space would be,
apple
banana
[A newline is introduced after each line. The code formatting removes the last newline here, so it is not shown.]
After the second command,
s/\(.\)\(.*\n\)/&\2\1/
pattern space would be,
apple
pple
a
banana
anana
b
How does the third command work after this? Also, I understand that the empty regular expression (//) matches the previously used regexp. But in this case, which one will it be: the \n from the 1st command, or the regexp of the substitution in the 2nd command? Any help would be much appreciated. Thanks.

Using the suggestion from my own comment above (showing each newline as ¶),
this is what happens:
After /\n/!G pattern space would be
apple¶
banana¶
After s/\(.\)\(.*\n\)/&\2\1/ pattern space would be
apple¶pple¶a
banana¶anana¶b
Then comes //D. The empty regex // reuses the last regex used, which is the one from the s command, \(.\)\(.*\n\). It matches, so D is executed. From man sed:
D Delete up to the first embedded newline in the pattern space.
Start next cycle, but skip reading from the input if there is
still data in the pattern space.
So the first word and the first ¶ are deleted. Then sed starts again from the
1st command without reading new input, but since the pattern space contains a ¶, the address /\n/
matches and, because of the !, the G command is not executed.
The 2nd command leads to
pple¶ple¶pa
anana¶nana¶ab
Can you continue from there?
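The walkthrough can be checked by adding the l command, which prints the pattern space unambiguously (an embedded newline shows as \n and the end as $). This is a minimal sketch assuming GNU sed; the function names reverse_lines and trace_reverse are made up for illustration. Because l uses no regex, the // that follows it still refers to the s command's regex, so the loop behaves as in the original one-liner.

```shell
#!/bin/sh
# Hypothetical helpers around the sed one-liner from the question.
# Assumes GNU sed (for the exact output format of the l command).

# Reverse the characters of every input line.
reverse_lines() {
  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
}

# Same script, but with -n, an l after the substitution and an explicit p:
# every pass through the D loop prints the pattern space that //D is about
# to test, and the final result is printed once.
trace_reverse() {
  sed -n '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;l;//D;s/.//;p'
}

printf 'apple\nbanana\n' | reverse_lines
# elppa
# ananab

printf 'apple\n' | trace_reverse
# apple\npple\na$
# pple\nple\npa$
# ple\nle\nppa$
# le\ne\nlppa$
# e\n\nelppa$
# \nelppa$
# elppa
```

In the last pass the s command no longer matches (there is no character before the only remaining newline), so //D does nothing and s/.// removes the leading newline, leaving the reversed line.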

D means: delete the first line of the pattern space (up to the first \n) and, if something is still left in the buffer, restart the cycle without reading new input.
// is a shortcut for the previous pattern (it reuses the last regex searched for).
$ echo "123" | sed -n 's/2/other/;// p'
$
No output, because the substitution changed the pattern space: 2 no longer occurs in 1other3.
$ echo "123" | sed -n 's/.2/&still/;// p'
12still3
$
The pattern .2 still matches when // p is used, because here // p is equivalent to /.2/ p.
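The same rule applies inside s: an empty regex there reuses the last regex used, typically the address in front of it. A small sketch, assuming any POSIX sed; the input strings are made up.

```shell
#!/bin/sh
# // inside s reuses the last regex, here the address /oo*/,
# so this is the same as: sed '/oo*/s/oo*/0/'
printf 'foo bar\n' | sed '/oo*/s//0/'
# f0 bar

# After s/2/other/ the last regex is 2, which no longer matches "1other3",
# so //p prints nothing.
printf '123\n' | sed -n 's/2/other/;//p'

# After s/.2/&still/ the last regex is .2, which still matches "12still3".
printf '123\n' | sed -n 's/.2/&still/;//p'
# 12still3
```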

Related

SED's Substituted string is considered as one-line string, whereas it contains newline character

I am testing the sed command to substitute one line with 3 lines and then delete the last line. (I could have substituted it with only the first 2 lines, but this is deliberately stated like this to showcase the main issue.)
Let's say that I have the following text:
// ##OPTION_NAME: xxxx
I want to replace the token ##OPTION_NAME with ##OP-NAME and surround it with 2 new lines, like so:
// ##OP-START
// ##OP-NAME: xxxx
// ##OP-END
To illustrate this, I put this text in a code.c file, and the sed commands in a sed script named script.sed.
Then, I call the following shell command:
Shell command
sed -f script.sed code.c
script.sed
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
The problem
Now, I add another sed command in script.sed to delete the line containing ##OP-END. Surprise! All 3 lines are removed!
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
# Last parse; delete ##OP-END
/##OP-END/d
I tried \r\n instead of \n in the substitution command
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\r\n\1##OP-NAME:\2\r\n\1##OP-END/g, but it does not work.
I also tested on ##OP-START to see if it makes a difference,
but alas! All 3 lines were removed too.
It seems that sed is considering it as one line!
This is not a surprise: d operates on the pattern space, not on a per-line basis. After the s command, your pattern space contains 3 lines. Its content matches the expression, so the whole pattern space gets deleted.
To delete this line from the pattern space, you need to use the s command again:
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
s/\n\/\/ ##OP-END//
About pattern and hold space: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html#tag_20_116_13
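Putting both versions side by side shows the difference. A sketch assuming GNU sed, since \n in the replacement is a GNU extension (BSD/macOS sed needs a backslash followed by a literal newline instead); the sample line is the one from the question.

```shell
#!/bin/sh
input='// ##OPTION_NAME: xxxx'

# Wrong: after s, the pattern space holds all three lines, so d deletes everything.
printf '%s\n' "$input" | sed '
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
/##OP-END/d
'

# Right: remove only the embedded "\n// ##OP-END" part with a second s.
printf '%s\n' "$input" | sed '
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
s/\n\/\/ ##OP-END//
'
# // ##OP-START
# // ##OP-NAME: xxxx
```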

sed error unterminated substitute pattern for new line text

I am writing a script to add new dependencies to the watch list. I am putting a placeholder to know where to add the text, for example:
assets = [
"../../new_app/assets"
# [[NEW_APP_ADD_ASSETS]]
]
It is simple to replace just the placeholder, but my problem is adding a comma to the previous line.
That can be done if I search for and replace
"
# [[NEW_APP_ADD_ASSETS]]
i.e. "\n # [[NEW_APP_ADD_ASSETS]]
But I am not able to search for the newline.
One of the solutions I found for adding a new line was
sed -i '' 's/newline/line one\
line two/' filename.txt
But when I do the same for the search string, it returns: unterminated substitute pattern
sed -i '' s/'assets\"\
#'/'some new text'/ filename.txt
PS: I am on macOS.
Sed works on a line-by-line basis, hence it becomes tricky to add the comma to the previous line, as that line has already been processed. It is possible, but the sed syntax quickly becomes messy.
To be a bit more specific:
In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. Reading from input shall be skipped if a <newline> was in the pattern space prior to a D command ending the previous cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits. If no commands explicitly started a new cycle, then at the end of the script the pattern space shall be copied to standard output (except when -n is specified) and the pattern space shall be deleted. Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.
In short, if you do not manipulate the pattern space, you cannot process <newline> characters as they just do not appear!
And even shorter, if you only use the substitute command, sed only processes one line at a time!
This is also why you get "unterminated substitute pattern". You are searching for a newline character, but as sed reads just one line at a time, it never finds it, and it also does not expect a literal newline inside the pattern. The error will vanish if you replace your newline with the two characters \n, although, for the reasons above, the pattern will still not match.
sed -i '' s/'assets\"\n #'/'some new text'/ filename.txt
A better way to achieve your goals would be to make use of awk. It is a bit more readable:
awk '/# \[\[NEW_APP_ADD_ASSETS\]\]/ { print t ","; t = "line1\nline2"; next }
NR > 1 { print t }
{ t = $0 }
END { print t }' <file>
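For completeness, here is a sed-only sketch using the N;P;D sliding-window idiom, a swapped-in technique rather than the awk approach above: the pattern space holds the current line plus the next one, and when the second line is the placeholder, a comma is inserted before the newline. It assumes GNU sed for the \n in the replacement; the function name add_comma_before_placeholder is hypothetical, and the regex allows optional leading spaces before the placeholder.

```shell
#!/bin/sh
# Hypothetical helper. Assumes GNU sed: on macOS/BSD sed, write the newline
# in the replacement as a backslash followed by a real newline.
add_comma_before_placeholder() {
  # $!N  append the next line (except on the last line)
  # s    if the 2nd line is the placeholder, put a comma before the newline
  # P    print the first line of the pattern space
  # D    delete it and restart the cycle with the second line
  sed '$!N
/\n *# \[\[NEW_APP_ADD_ASSETS\]\]/s/\n/,\n/
P
D'
}

printf '%s\n' 'assets = [' '"../../new_app/assets"' '# [[NEW_APP_ADD_ASSETS]]' ']' |
  add_comma_before_placeholder
# assets = [
# "../../new_app/assets",
# # [[NEW_APP_ADD_ASSETS]]
# ]
```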

Bash script output text between first match and 2nd match only [duplicate]

I'm trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn't matter)
I have tried:
sed 's|\(http:\/\/.*?\/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http:\/\/.*\?\/\).*|\1|'
but I cannot seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.
Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a later regex. Fortunately, Perl regex for this context is pretty easy to get:
perl -pe 's|(http://.*?/).*|$1|'
In this specific case, you can get the job done without using a non-greedy regex.
Try this non-greedy regex [^/]* instead of .*?:
sed 's|\(http://[^/]*/\).*|\1|g'
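The difference between .* and [^/]* is easy to see by running both on the URL from the question (a sketch for any POSIX sed):

```shell
#!/bin/sh
url='http://www.suepearson.co.uk/product/174/71/3816/'

# Greedy: .* runs to the LAST slash, so nothing useful is removed.
printf '%s\n' "$url" | sed 's|\(http://.*/\).*|\1|'
# http://www.suepearson.co.uk/product/174/71/3816/

# [^/]* cannot cross a slash, so the group stops at the first slash after the host.
printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|'
# http://www.suepearson.co.uk/
```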
With sed, I usually implement non-greedy search by searching for anything except the separator until the separator:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
Output:
http://www.suon.co.uk
This works as follows:
-n: don't print by default
s/<pattern>/<replace>/p: search, match the pattern, replace and print
use ; as the s command's delimiter instead of / to make it easier to type, so s;<pattern>;<replace>;p
remember the match between brackets \( ... \), later accessible as \1, \2...
match http://
followed by anything in brackets []: [ab/] would mean either a or b or /
a leading ^ in [] means not, so this is followed by anything but the things in the []
so [^/] means any character except /
* repeats the previous item zero or more times, so [^/]* means a run of characters other than /
so far, sed -n 's;\(http://[^/]*\) means search for and remember http:// followed by any characters except /
we want to search until the end of the domain, so stop on the next / by adding another / at the end: sed -n 's;\(http://[^/]*\)/', but we want to match the rest of the line after the domain too, so add .*
now the match remembered in group 1 (\1) is the domain, so replace the matched line with what is saved in group \1 and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'
If you want to include the slash after the domain as well, then add one more slash inside the group to remember:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
output:
http://www.suon.co.uk/
Simulating lazy (un-greedy) quantifier in sed
And all other regex flavors!
Finding first occurrence of an expression:
POSIX ERE (using -r option)
Regex:
(EXPRESSION).*|.
Sed:
sed -r 's/(EXPRESSION).*|./\1/g' # the global g flag must be on
Example (finding first sequence of digits) Live demo:
$ sed -r 's/([0-9]+).*|./\1/g' <<< 'foo 12 bar 34'
12
How does it work?
This regex relies on an alternation |. At each position the engine tries to pick the longest match (this is a POSIX requirement that a couple of other engines follow as well), which means it goes with . until a match is found for ([0-9]+).*. But order is important too.
Since the global flag is set, the engine keeps matching character by character up to the end of the input string or our target. As soon as the capturing group on the left side of the alternation, (EXPRESSION), is matched, the rest of the line is consumed immediately as well by .*. We now hold our value in the first capturing group.
POSIX BRE
Regex:
\(\(\(EXPRESSION\).*\)*.\)*
Sed:
sed 's/\(\(\(EXPRESSION\).*\)*.\)*/\3/'
Example (finding first sequence of digits):
$ sed 's/\(\(\([0-9]\{1,\}\).*\)*.\)*/\3/' <<< 'foo 12 bar 34'
12
This one is like the ERE version but without any alternation. At each position the engine tries to match a digit.
If one is found, the following digits are consumed and captured and the rest of the line is matched immediately; otherwise, since * means
zero or more, it skips the second capturing group \(\([0-9]\{1,\}\).*\)* and arrives at the dot . to match a single character, and this process continues.
Finding first occurrence of a delimited expression:
This approach matches the very first occurrence of a delimited string. We can call it a block of text.
sed 's/\(END-DELIMITER-EXPRESSION\).*/\1/;
s/\(\(START-DELIMITER-EXPRESSION.*\)*.\)*/\1/g'
Input string:
foobar start block #1 end barfoo start block #2 end
-EDE: end
-SDE: start
$ sed 's/\(end\).*/\1/; s/\(\(start.*\)*.\)*/\1/g' <<< 'foobar start block #1 end barfoo start block #2 end'
Output:
start block #1 end
The first regex \(end\).* matches and captures the first end delimiter, end, and substitutes the whole match with the captured text, which
is the end delimiter. At this stage our output is: foobar start block #1 end.
Then the result is passed to the second regex \(\(start.*\)*.\)*, which is the same as the POSIX BRE version above. It matches a single character
if the start delimiter start is not matched; otherwise it matches and captures the start delimiter and the rest of the characters.
Directly answering your question
Using approach #2 (delimited expression) you should select two appropriate expressions:
EDE: [^:/]\/
SDE: http:
Usage:
$ sed 's/\([^:/]\/\).*/\1/g; s/\(\(http:.*\)*.\)*/\1/' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
Output:
http://www.suepearson.co.uk/
Note: this will not work with identical delimiters.
sed does not support a "non-greedy" operator.
You have to use the "[]" operator to exclude "/" from the match.
sed 's,\(http://[^/]*\)/.*,\1,'
P.S. There is no need to backslash "/" here, because , is used as the delimiter.
sed - non greedy matching by Christoph Sieghart
The trick to get non greedy matching in sed is to match all characters excluding the one that terminates the match. I know, a no-brainer, but I wasted precious minutes on it and shell scripts should be, after all, quick and easy. So in case somebody else might need it:
Greedy matching
% echo "<b>foo</b>bar" | sed 's/<.*>//g'
bar
Non greedy matching
% echo "<b>foo</b>bar" | sed 's/<[^>]*>//g'
foobar
Non-greedy solution for more than a single character
This thread is really old, but I assume people still need it.
Let's say you want to kill everything till the very first occurrence of HELLO. You cannot say [^HELLO]...
So a nice solution involves two steps, assuming that you can spare a unique word that you are not expecting in the input, say top_sekrit.
In this case we can:
s/HELLO/top_sekrit/ #will only replace the very first occurrence
s/.*top_sekrit// #kill everything till end of the first HELLO
Of course, with a simpler input you could use a smaller word, or maybe even a single character.
HTH!
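A quick check of the two-step trick, with made-up input and assuming top_sekrit never occurs in it:

```shell
#!/bin/sh
# Delete everything up to and including the FIRST "HELLO" on the line.
# The marker "top_sekrit" is assumed not to appear in the input.
printf 'foo HELLO bar HELLO baz\n' | sed 's/HELLO/top_sekrit/; s/.*top_sekrit//'
# " bar HELLO baz" (note the leading space)
```

The greedy .* is harmless here because, after the first s, there is exactly one top_sekrit on the line.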
This can be done using cut:
echo "http://www.suepearson.co.uk/product/174/71/3816/" | cut -d'/' -f1-3
Another way, without regex, is to use the field/delimiter method, e.g.:
string="http://www.suepearson.co.uk/product/174/71/3816/"
echo "$string" | awk -F"/" '{print $1,$2,$3}' OFS="/"
sed certainly has its place, but this is not one of them!
As Dee has pointed out: just use cut. It is far simpler and much safer in this case. Here's an example where we extract various components from the URL using Bash syntax:
url="http://www.suepearson.co.uk/product/174/71/3816/"
protocol=$(echo "$url" | cut -d':' -f1)
host=$(echo "$url" | cut -d'/' -f3)
urlhost=$(echo "$url" | cut -d'/' -f1-3)
urlpath=$(echo "$url" | cut -d'/' -f4-)
gives you:
protocol = "http"
host = "www.suepearson.co.uk"
urlhost = "http://www.suepearson.co.uk"
urlpath = "product/174/71/3816/"
As you can see this is a lot more flexible approach.
(all credit to Dee)
sed -E 's|(http://[^/]+/).*|\1|'
There is still hope to solve this using pure (GNU) sed. Although this is not a generic solution, in some cases you can use "loops" to eliminate all the unnecessary parts of the string like this:
sed -r -e ":loop" -e 's|(http://.+)/.*|\1|' -e "t loop"
-r: Use extended regex (for + and unescaped parenthesis)
":loop": Define a new label named "loop"
-e: add commands to sed
"t loop": Jump back to label "loop" if there was a successful substitution
The only problem here is it will also cut the last separator character ('/'), but if you really need it you can still simply put it back after the "loop" finished, just append this additional command at the end of the previous command line:
-e "s,$,/,"
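A sketch of the loop applied to the URL from the question, including the command that puts the final slash back. It uses -E, which both GNU sed and macOS sed accept, and keeps the label in its own -e argument so that BSD sed, which reads a label up to the end of the line, sees it terminated.

```shell
#!/bin/sh
# Each pass strips the last "/segment"; t loop repeats while s succeeds.
# When only the host is left, s fails, the loop ends and s,$,/, re-adds "/".
printf '%s\n' 'http://www.suepearson.co.uk/product/174/71/3816/' |
  sed -E -e ':loop' -e 's|(http://.+)/.*|\1|' -e 't loop' -e 's,$,/,'
# http://www.suepearson.co.uk/
```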
sed -E interprets regular expressions as extended (modern) regular expressions.
Update: -E on macOS; GNU sed uses -r, and recent versions accept -E as well.
Because you specifically stated you're trying to use sed (instead of perl, cut, etc.), try grouping. This sidesteps the non-greedy quantifier potentially not being recognized. The first group is the protocol (i.e. 'http://', 'https://', 'tcp://', etc.). The second group is the domain:
echo "http://www.suon.co.uk/product/1/7/3/" | sed "s|^\(.*//\)\([^/]*\).*$|\1\2|"
If you're not familiar with grouping, start here.
I realize this is an old entry, but someone may find it useful.
As the full domain name may not exceed a total length of 253 characters, replace .* with .\{1,255\}
This is how to robustly do non-greedy matching of multi-character strings using sed. Let's say you want to change every foo...bar to <foo...bar>, so for example this input:
$ cat file
ABC foo DEF bar GHI foo KLM bar NOP foo QRS bar TUV
should become this output:
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
To do that you convert foo and bar to individual characters and then use the negation of those characters between them:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/g; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
In the above:
s/#/#A/g; s/{/#B/g; s/}/#C/g is converting { and } to placeholder strings that cannot exist in the input so those chars then are available to convert foo and bar to.
s/foo/{/g; s/bar/}/g is converting foo and bar to { and } respectively
s/{[^{}]*}/<&>/g is performing the op we want - converting foo...bar to <foo...bar>
s/}/bar/g; s/{/foo/g is converting { and } back to foo and bar.
s/#C/}/g; s/#B/{/g; s/#A/#/g is converting the placeholder strings back to their original characters.
Note that the above does not rely on any particular string not being present in the input, as it manufactures such strings in the first step, nor does it care which occurrence of any particular regexp you want to match, since you can use {[^{}]*} as many times as necessary in the expression to isolate the actual match you want and/or use sed's numeric occurrence flag on s, e.g. to only replace the 2nd occurrence:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/2; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC foo DEF bar GHI <foo KLM bar> NOP foo QRS bar TUV
Have not yet seen this answer, so here's how you can do this with vi or vim:
vi -c '%s/\(http:\/\/.\{-}\/\).*/\1/ge | wq' file &>/dev/null
This runs the vi :%s substitution globally (the trailing g), refrains from raising an error if the pattern is not found (e), then saves the resulting changes to disk and quits. The &>/dev/null prevents the GUI from briefly flashing on screen, which can be annoying.
I like using vi sometimes for super complicated regexes, because (1) perl is dying, (2) vim has a very advanced regex engine, and (3) I'm already intimately familiar with vi regexes from my day-to-day document editing.
Since PCRE is also tagged here, we could use GNU grep with the lazy (non-greedy) match .*?, which stops at the nearest match, unlike .* (which is greedy and runs to the last occurrence of the match).
grep -oP '^http[s]?:\/\/.*?/' Input_file
Explanation: grep's -o option prints only the matched part of the line, and -P enables PCRE. The regex matches http or https at the start of the line, followed by ://, up to the next /: because .*? is lazy, it stops at the first / after http(s)://.
echo "/home/one/two/three/myfile.txt" | sed 's|\(.*\)/.*|\1|'
Don't bother, I got it on another forum :)
sed 's|\(http:\/\/www\.[a-z.0-9]*\/\).*|\1|' works too
Here is something you can do with a two-step approach and GNU awk (gensub is a gawk extension):
A=http://www.suepearson.co.uk/product/174/71/3816/
echo $A|awk '
{
var=gensub(/\//,"||",3,$0) ;
sub(/\|\|.*/,"",var);
print var
}'
Output:
http://www.suepearson.co.uk
Hope that helps!
Another sed version:
sed 's|\([^/]\)/[[:alnum:]].*|\1|' file.txt
It matches a / that is preceded by a character other than / (so not the // after http:) and followed by an alphanumeric character, plus the rest of the characters till the end of the line. Afterwards it replaces all of that with the preceding character (i.e. deletes the rest).
@Daniel H (concerning your comment on andcoz' answer, although long ago): deleting trailing zeros works with (extended regex, so use sed -E)
s,([[:digit:]]\.[[:digit:]]*[1-9])[0]*$,\1,g
it's about clearly defining the matching conditions ...
You should also think about the case where there are no matching delimiters: do you want to output the line or not? My examples here do not output anything if there is no match.
You need the prefix up to the 3rd /, so select twice a string of any length not containing / followed by a /, then a string of any length not containing /; then match a / followed by any string, and print the selection. This idea works with any single-character delimiter.
echo http://www.suepearson.co.uk/product/174/71/3816/ | \
sed -nr 's,(([^/]*/){2}[^/]*)/.*,\1,p'
Using sed commands you can do fast prefix dropping or delim selection, like:
echo 'aaa #cee: { "foo":" #cee: " }' | \
sed -r 't x;s/ #cee: /\n/;D;:x'
This is lot faster than eating char at a time.
t x jumps to the label x if a substitution succeeded earlier. s replaces the 1st delimiter with \n. D removes everything up to the first \n and restarts the cycle without reading input. Since a \n was added, t x now jumps to the end and the line is printed.
If there are start and end delimiters, it is easy to remove end delimiters until you reach the (n-2)th element you want and then do the D trick: remove everything after the end delimiter, jump to delete if there is no match, remove everything before the start delimiter, and print. This only works if start/end delimiters occur in pairs.
echo 'foobar start block #1 end barfoo start block #2 end bazfoo start block #3 end goo start block #4 end faa' | \
sed -r 't x;s/end//;s/end/\n/;D;:x;s/(end).*/\1/;T y;s/.*(start)/\1/;p;:y;d'
If you have access to GNU grep, then you can use Perl regexes:
grep -Po '^https?://([^/]+)(?=)' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
http://www.suepearson.co.uk
Alternatively, to get everything after the domain use
grep -Po '^https?://([^/]+)\K.*' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
/product/174/71/3816/
The following solution works for matching / working with multiply present (chained; tandem; compound) HTML or other tags. For example, I wanted to edit HTML code to remove <span> tags, that appeared in tandem.
Issue: regular sed regex expressions greedily matched over all the tags from the first to the last.
Solution: non-greedy pattern matching (per discussions elsewhere in this thread; e.g. https://stackoverflow.com/a/46719361/1904943).
Example:
echo '<span>Will</span>This <span>remove</span>will <span>this.</span>remain.' | \
sed 's/<span>[^>]*>//g' ; echo
This will remain.
Explanation:
s/<span> : find <span>
[^>] : followed by anything that is not >
*> : until you find >
//g : replace any such strings present with nothing.
Addendum
I was trying to clean up URLs, but I was running into difficulty matching / excluding a word - href - using the approach above. I briefly looked at negative lookarounds (Regular expression to match a line that doesn't contain a word) but that approach seemed overly complex and did not provide a satisfactory solution.
I decided to replace href with ` (backtick), do the regex substitutions, then replace ` with href.
Example (formatted here for readability):
printf '\n
<a aaa h href="apple">apple</a>
<a bbb "c=ccc" href="banana">banana</a>
<a class="gtm-content-click"
data-vars-link-text="nope"
data-vars-click-url="https://blablabla"
data-vars-event-category="story"
data-vars-sub-category="story"
data-vars-item="in_content_link"
data-vars-link-text
href="https:example.com">Example.com</a>\n\n' |
sed 's/href/`/g ;
s/<a[^`]*`/\n<a href/g'
apple
banana
Example.com
Explanation: basically as above. Here,
s/href/` : replace href with ` (backtick)
s/<a : find start of URL
[^`] : followed by anything that is not ` (backtick)
*` : until you find a `
/<a href/g : replace each of those found with <a href
Unfortunately, as mentioned, this is not supported in sed.
To overcome this, I suggest using the next best thing (actually even better): vim's sed-like capabilities.
Define in .bash_profile:
vimdo() { vim $2 --not-a-term -c "$1" -es +"w >> /dev/stdout" -cq! ; }
That will create headless vim to execute a command.
Now you can do for example:
echo $PATH | vimdo "%s_\c:[a-zA-Z0-9\\/]\{-}python[a-zA-Z0-9\\/]\{-}:__g" -
to filter out python in $PATH.
Use - to have input from pipe in vimdo.
Most of the syntax is the same, but Vim has more advanced features, and \{-} is its standard non-greedy quantifier. See :help regexp.

How to capitalize first letter of every word using sed in OSX

I'm trying to capitalize the first letter of every word in a string using the following sed command, but it's not working:
echo "my string" | sed 's/\b\(.\)/\u\1/g'
Output:
my string
What am I doing wrong?
Thank you
Given your sample input, this will work in any awk:
$ echo 'my string' | awk '{for (i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) substr($i,2)} 1'
My String
If that doesn't do what you really want then edit your question to show some more truly representative sample input and expected output.
Here is a sed solution that works on OSX:
echo 'my string
ANOTHER STRING
tHiRd StRiNg' | sed -En '
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
:loop
h
s/^(.*[^a-zA-Z0-9])?([a-z]).*$/\2/
t next
b end
:next
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^(.+)\n(.*[^a-zA-Z0-9])?[a-z](.*)$/\2\1\3/
t loop
:end
p
'
Output:
My String
Another String
Third String
The sed command works as follows:
sed inputs a line, and the first y command transforms all uppercase letters to lowercase.
The commands from :loop to t loop form a loop that executes once for
each word in the current line, capitalizing the first letter of each word.
When there are no more words to capitalize for the current line, the p command prints the line, and sed inputs the next line.
Here is how the loop works:
The h command saves the line as it currently stands to the hold
space.
The first s command looks for the first letter of the first
non-capitalized word. If such a word is found, the s command saves
its first letter to the pattern space, and the t command branches to
the :next label. If such a word is not found, which indicates that
there are no more words to capitalize, the b command is executed
instead, branching to the :end label to print out and complete the
processing of the current line.
If a word needing capitalizing was found, execution resumes at the
:next label, and the y command transforms the first letter, which is now in the pattern space, from lowercase to uppercase.
The G command appends the non-transformed version of the current
line from the hold space to the end of the pattern space.
The second s command reconstructs the current line, replacing the first letter of the word currently being processed with its capitalized version.
The t command branches to the :loop label to look for the next word
needing capitalization.
Execution speed testing revealed that the current sed approach executes at approximately the same speed as the awk solution submitted by Ed Morton.
This has already been addressed: Uppercasing First Letter of Words Using SED
I get the correct behavior with GNU sed, but not with the standard BSD sed that ships with OS X. I think \u in the replacement is a GNU extension. How about "port install gsed"?
Edit: if you really want to use BSD sed, which I would not recommend (because the command becomes very ugly), then you can do the following:
sed -E "s:([^[:alnum:]_]|^)a:\1A:g; s:([^[:alnum:]_]|^)b:\1B:g; s:([^[:alnum:]_]|^)c:\1C:g; s:([^[:alnum:]_]|^)d:\1D:g; s:([^[:alnum:]_]|^)e:\1E:g; s:([^[:alnum:]_]|^)f:\1F:g; s:([^[:alnum:]_]|^)g:\1G:g; s:([^[:alnum:]_]|^)h:\1H:g; s:([^[:alnum:]_]|^)i:\1I:g; s:([^[:alnum:]_]|^)j:\1J:g; s:([^[:alnum:]_]|^)k:\1K:g; s:([^[:alnum:]_]|^)l:\1L:g; s:([^[:alnum:]_]|^)m:\1M:g; s:([^[:alnum:]_]|^)n:\1N:g; s:([^[:alnum:]_]|^)o:\1O:g; s:([^[:alnum:]_]|^)p:\1P:g; s:([^[:alnum:]_]|^)q:\1Q:g; s:([^[:alnum:]_]|^)r:\1R:g; s:([^[:alnum:]_]|^)s:\1S:g; s:([^[:alnum:]_]|^)t:\1T:g; s:([^[:alnum:]_]|^)u:\1U:g; s:([^[:alnum:]_]|^)v:\1V:g; s:([^[:alnum:]_]|^)w:\1W:g; s:([^[:alnum:]_]|^)x:\1X:g; s:([^[:alnum:]_]|^)y:\1Y:g; s:([^[:alnum:]_]|^)z:\1Z:g;"
Try:
echo "my string" | sed -r 's/\b(.)/\u\1/g'
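If the same script has to run on both systems, one option is to feature-test for GNU sed and fall back to the awk loop from the first answer otherwise. A sketch; the function name capitalize_words is hypothetical, and it assumes that sed --version succeeds only on GNU sed (BSD/macOS sed rejects the option).

```shell
#!/bin/sh
# Hypothetical helper: GNU sed's \u when available, else portable awk.
capitalize_words() {
  if sed --version >/dev/null 2>&1; then
    # GNU sed: \b marks a word boundary, \u uppercases the next character.
    sed 's/\b\(.\)/\u\1/g'
  else
    # Any awk: uppercase the first character of every field.
    awk '{for (i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) substr($i,2)} 1'
  fi
}

echo "my string" | capitalize_words
# My String
```

Note that the awk branch rebuilds each record when it assigns to $i, so runs of spaces collapse to single spaces.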

sed: c command, change only the first line, text deleted

I'm on my second day ever of shell scripting and I've stumbled into this problem: I want to change an entire line of code, which I identify by one word only, and I would like to do that only for the first occurrence.
I'm using sed and the c command, something that looks like this:
Text in file called "prova":
Apple is red
Apple is green
Banana
Tangerine
sed bit of code:
sed -i.bak '1,/Apple/c\
Apricot
' prova
(I'm using Mac OSX)
Strangely enough, and in agreement with what these guys reported, if I do this, I get the following output for the prova file:
Apricot
Banana
Tangerine
One "Apple" is gone! Is there a way around this? Please, be patient, I'm a beginner...
Thanks in advance!
Try
sed '1cApricot' prova
With 1,/Apple/, you define a range, starting from line 1 and ending at the first occurrence of Apple after line 1. What you want is not a range, though, just a single line. This can be achieved by only using 1 (instead of e.g. 1,2).
The above command works for me, but it depends on the sed version; if it doesn't work, try
sed '1c\
Apricot' prova
With the 1 you tell sed to change the first line.
If you don't necessarily want to change line 1, but the first occurrence of Apple, you can do
sed '0,/Apple/s/.*Apple.*/Apricot/'
I used the substitute command s (frankly, I never use c) here, and it's only applied to the range starting from line 0 to the first occurrence of Apple (the 0,/regex/ address is a GNU sed extension). If it finds Apple, the whole line is replaced with Apricot.
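On macOS, where the 0,/Apple/ address is not available, a common portable idiom (not from the answers here) is to substitute on the first match and then loop with n, which prints each remaining line unchanged until the input runs out. A sketch, written with newlines so that BSD sed accepts the label and branch; the input is the prova file from the question.

```shell
#!/bin/sh
# On the first line matching /Apple/: replace it, then loop forever on n
# (print current line, read next), so no later line is ever substituted.
printf '%s\n' 'Apple is red' 'Apple is green' 'Banana' 'Tangerine' | sed '/Apple/{
s/.*/Apricot/
:a
n
ba
}'
# Apricot
# Apple is green
# Banana
# Tangerine
```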
sed 's/PatternToFind/PatternToReplaceWith/option'
So if you know the word to find, put it in the first part after s/ (PatternToFind). This is a basic regular expression, so be careful with characters like *, . and [ (they should be escaped with \ before them), but alphanumeric characters are literal.
The (whole) matching pattern is replaced with PatternToReplaceWith (here only a few characters, like \ and &, are special and should be escaped with \).
You can also make several substitutions serially, separated by newlines or ;
sed 's/Apple/Pie/;s/Banana/Split/;s/Ice/Cream/g' YourFile
Note the last g, which means every occurrence on the line.
For the first occurrence only, you need to load the full file into the buffer first (append each line to the hold space, then at the last line swap the hold space into the pattern space and make your substitutions):
sed -n '1h;1!H
$ {x
s/Apple/Pie/;s/Banana/Split/;s/Ice/Cream/
p
}' YourFile
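A runnable version of the hold-space approach, with -n and an explicit p so the file is printed exactly once. Without g, each s command changes only the first occurrence in the whole file. The input lines are made up for illustration.

```shell
#!/bin/sh
# 1h / 1!H collect every line in the hold space; on the last line ($),
# x swaps the collected text into the pattern space for the substitutions.
printf '%s\n' 'Apple is red' 'Apple is green' 'Banana' |
  sed -n '1h;1!H
$ {
x
s/Apple/Pie/
s/Banana/Split/
p
}'
# Pie is red
# Apple is green
# Split
```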
