Search and replace in Shell - bash

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two moments that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.

for truncating at the start: ${var#pattern}
truncating at the end ${var%pattern}
${var/pattern/repl} for general replacement
the patterns are 'filename' style expansion, and the last one can be prefixed with # or % to match only at the start or end (respectively)
it's all in the (long) bash manpage. check the "Parameter Expansion" chapter.

amn expression like this
s/match string here$/new string/
should do the trick - s is for sustitute, / break up the command, and the $ is the end of line marker. You can try this in vi to see if it does what you need.

I would look up the man pages for awk or sed.

Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo $MyString | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.

Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt

Related

sed to remove section of text from a variable

So I think I've cracked the regex but just can't crack how to get sed to make the changes. I have a variable which is this:
MAKEVAR = EPICS_BASE=$CI_PROJECT_DIR/3.16/base IPAC=$CI_PROJECT_DIR/3.16/support/ipac SNCSEQ=$CI_PROJECT_DIR/3.16/support/seq
(All one line). But I want to delete the particular section defining IPAC so my regex looks like this:
(IPAC.+\s)
I know from using this tool that that should be correct:
https://www.regextester.com/98103
However when I run different iterations of trying out sed like:
sed 's/(IPAC.+\s)/\&/g' <<< "$MAKEVAR"
And then echo out MAKEVAR, the IPAC section still exists.
How can I update a particular section of text in a shell variable to remove a section beginning with IPAC up until the next space?
Thanks in advance
regextester (or any other online tool) is a great way to verify that a regexp works in that online tool. Unfortunately that doesn't mean it'll work in any given command-line tool. In particular your regexp includes \s which is specific to PCREs and some GNU tools, and uses (...) to delineate capture groups but that's only used in EREs and PCREs, not BREs such as sed supports by default where you'd have to use \(...\), and your replacement text is using '&' which is telling sed you want to replace the string that matches the regexp with a literal \& when in fact you just want to remove it.
This is how to do what I think you're trying to do using any sed:
$ sed 's/IPAC[^ ]* //' <<< "$MAKEVAR"
EPICS_BASE=$CI_PROJECT_DIR/3.16/base SNCSEQ=$CI_PROJECT_DIR/3.16/support/seq
Nevermind, found a workaround:
MAKEVAR=$(sed -E 's/(IPAC.+ipac)//' <<<"$MAKEVAR")
Use a shorter
MAKEVAR=$(sed 's/IPAC.*ipac//' <<< "$MAKEVAR")
IPAC.*ipac matches all the way from first IPAC to last ipac. The matched text is removed from the text.

Combine two expression in Bash

I did check the ABS, but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (Which extracts the first character of OtherVar and then converts MyVar to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third line into one statement?
P.S.: As was pointed out below, not needs to have a named variable. I should add, I would like to not add any sub-shells or so and would also accept a somehow hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking sub-shell or running external command:
printf -v MyChar %1s "${OtherVar^}"
Or:
read -n1 MyChar <<<"${OtherVar^}"
Another option:
declare -u MyChar=${OtherVar:0:1}
But I can't see the point in such optimization in a bash script.
There are more suitable text processing interpreters, like awk, sed, even perl or python if performance matters.
You could use the cut command and put it in a complex expression to get it on one line, but I'm not sure it makes the code too much clearer:
OtherVar=foobar
MyChar=$(echo ${OtherVar^} | cut -c1-1) # uppercase first character and cut string

What result do sed 's;//;/;g' and egrep "\-example (|\:)$variablename:" give?

I'm writing some automation for getting perticular path, I got the code, but I'm unable to understand what exactly the below commands does?
sed 's;//;/;g'
egrep "\-example (|\:)$variablename:"
In sed, it is common to use / as a delimiter for the s command, but any character will do and it is common to choose a character that does not appear in either the pattern or the replacement. In this case, ; was chosen as the delimiter. This command is simply replacing occurrences of // with /. It is equivalent to the (IMO) more readable :s#//#/#g
The egrep is a syntax error which perhaps should have been written: egrep "-example (:)?$variablename:", which will match text with or without the :. (That is, if variablename=foo, then this will match either -example foo: or -example :foo:. That may not be what is intended, but it's hard to say since the given example is not valid syntax. It appears to want to provide alternatives between a : or something else, but the alternative is missing.)

How can I replace a word at a specific line in a file in unix

I've researched other questions on here, but haven't really found one that works for me. I'm trying to select a specific line from a file and replace a string on that line with another string. So I have a file named my_course. I'm trying to modify a line in my_course that starts with "123". on that line I want to replace the string "0," with "1,". Help?
One possibility would be to use sed:
sed '/^123/ s/0/1/' my_course
In the first /../ part you just have to specify the pattern you are looking for ^123 for a line starting with 123.
In the s/from/to/ part you have specify the substitution to be performed.
Note that by default after substitution the file will be written to stdout. You might want to:
redirect the output using ... > my_new_course
perform the substitution "in place" using the -e switch to sed
If you are using the destructive in place variant you might want to use -iEXTENSION in addition to keep a copy with the given EXTENSION of the original version in case something goes wrong.
EDIT:
To match the desired lined with a prefix stored in a variable you have to enclose the sed script with double quotes " as using single qoutes ' will prevent variable expansion:
sed "/^$input/ s/0/1/" my_course
Have you tried this:
sed -e '[line]s/old_string/new_string/' my_course
PS: the [ ] shouldn't be used, is there just to make it clear that you should put the number right before the "s".
Cheers!
In fact, the -e in this case is not necessary, I can write just
sed '<line number>s/<old string>/<new string>/' my_course
This is what worked for me on Fedora 36, GNU bash, version 5.2.15(1)-release (x86_64-redhat-linux-gnu):
sed -i '1129s/additional/extra/' en-US/Design.xml
I know you said you couldn't use line numbers; I don't know how to address that part, but this replaced "additional" with "extra" on line 1129 of that file.

Why does sed not replace overlapping patterns

I have a database unload file with field separated with the <TAB> character. I am running this file through sed to replace any occurences of <TAB><TAB> with <TAB>\N<TAB>. This is so that when the file is loaded into MySQL the \N in interpreted as NULL.
The sed command 's/\t\t/\t\N\t/g;' almost works except that it only replaces the first instance e.g. "...<TAB><TAB><TAB>..." becomes "...<TAB>\N<TAB><TAB>...".
If I use 's/\t\t/\t\N\t/g;s/\t\t/\t\N\t/g;' it replaces more instances.
I have a notion that despite the /g modifier this is something to do with the end of one match being the start of another.
Could anyone explain what is happening and suggest a sed command that would work or do I need to loop.
I know I could probably switch to awk, perl, python but I want to know what is happening in sed.
Not dissimilar to the perl solution, this works for me using pure sed
With #Robin A. Meade improvement
sed ':repeat;
s|\t\t|\t\n\t|g;
t repeat'
Explanation
:repeat is a label, used for branch commands, similar to batch
s|\t\t|\t\n\t|g; - Standard replace 2 tabs with tab-newline-tab. I still use the global flag because if you have, say, 15 tabs, you will only need to loop twice, rather than 14 times.
t repeat means if the "s" command did any replaces, then goto the label repeat, else it goes onto the next line and starts over again.
So it goes like this. Keep repeating (goto repeat) as long as there is a match for the pattern of 2 tabs.
While the argument can be made that you could just do two identical global replaces and call it good, this same technique could work in more complicated scenarios.
As #thorn-blake points out, sed just doesn't support advanced features like lookahead, so you need to do a loop like this.
Original Answer
sed ':repeat;
/\t\t/{
s|\t\t|\t\n\t|g;
b repeat
}'
Explanation
:repeat is a label, used for branch commands, similar to batch
/\t\t/ means match the pattern 2 tabs. If the pattern it matched, the command following the second / is executed.
{} - In this case the command following the match command is a group. So all of the commands in the group are executed if the match pattern is met.
s|\t\t|\t\n\t|g; - Standard replace 2 tabs with tab-newline-tab. I still use the global because if you have say 15 tabs, you will only need to loop twice, rather than 14 times.
b repeat means always goto (branch) the label repeat
Short version
Which can be shortened to
sed ':r;s|\t\t|\t\n\t|g; t r'
# Original answer
# sed ':r;/\t\t/{s|\t\t|\t\n\t|g; b r}'
MacOS
And the Mac (yet still Linux/Windows compatible) version:
sed $':r\ns|\t\t|\t\\\n\t|g; t r'
# Original answer
# sed $':r\n/\t\t/{ s|\t\t|\t\\\n\t|g; b r\n}'
Tabs need to be literal in BSD sed
Newlines need to be both literal and escaped at the same time, hence the single slash (that's \ before it is processed by the $, making it a single literal slash ) plus the \n which becomes an actual newline
Both label names (:r) and branch commands (b r when not the end of the expression) must end in a newline. Special characters like semicolons and spaces are consumed by the label name/branch command in BSD, which makes it all very confusing.
I know you want sed, but sed doesn't like this at all, it seems that it specifically (see here) won't do what you want. However, perl will do it (AFAIK):
perl -pe 'while (s#\t\t#\t\n\t#) {}' <filename>
As a workaround, replace every tab with tab + \N; then remove all occurrences of \N which are not immediately followed by a tab.
sed -e 's/\t/\t\\N/g' -e 's/\\N\([^\t]\)/\1/g'
... provided your sed uses backslash before grouping parentheses (there are sed dialects which don't want the backslashes; try without them if this doesn't work for you.)
Right, even with /g, sed will not match the text it replaced again. Thus, it's read <TAB><TAB> and output <TAB>\N<TAB> and then reads the next thing in from the input stream. See http://www.grymoire.com/Unix/Sed.html#uh-7
In a regex language that supports lookaheads, you can get around this with a lookahead.
Well, sed simply works as designed. The input line is scanned once, not multiple times. Maybe it helps to look at the consequences if sed used rescanning the input line to deal with overlapping patterns by default: in this case even simple substitutions would work quite differently--some might say counter-intuitively--, e.g.
s/^/ / inserting a space at the beginning of a line would never terminate
s/$/foo/ appending foo to each line - likewise
s/[A-Z][A-Z]*/CENSORED/ replacing uppercase words with CENSORED - likewise
There are probably many other situations. Of course these could all be remedied with, say, a substitution modifier, but at the time sed was designed, the current behavior was chosen.

Resources