Sed substitution places characters after back reference at beginning of line - bash

I have a text file that I am trying to convert to a Latex file for printing. One of the first steps is to go through and change lines that look like:
Book 01 Introduction
To look like:
\chapter{Introduction}
To this end, I have devised a very simple sed script:
sed -n -e 's/Book [[:digit:]]\{2\}\s*(.*)/\\chapter{\1}/p'
This does the job, except, the closing curly bracket is placed where the initial backslash should be in the substituted output. Like so:
}chapter{Introduction
Any ideas as to why this is the case?

Your call to sed is fine; the problem is that your file uses DOS line endings (CRLF), but sed does not recognize the CR as part of the line ending, but as just another character on the line. The string Introduction\r is captured, and the result \chapter{Introduction\r} is printed by printing everything up to the carriage return (the ^ represents the cursor position)
\chapter{Introduction
^
then moving the cursor to the beginning of the line
\chapter{Introduction
^
then printing the rest of the result (}) over what has already been printed
}chapter{Introduction
^
The solution is to either fix the file to use standard POSIX line endings (linefeed only), or to modify your regular expression to not capture the carriage return at the end of the line.
sed -n -e 's/Book [[:digit:]]\{2\}\s*(.*)\r?$/\\chapter{\1}/p'

As an alternative to sed, awk using gsub might work well in this situation:
awk '{gsub(/Book [0-9]+/,"\\chapter"); print $1"{"$2"}"}'
Result:
\chapter{Introduction}

A solution is to modify the capture group. In this case, since all book chapter names consist only of alphabetic characters I was able to use [[:alpha:]]*. This gave a revised sed script of:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([[:alpha:]]*\)/\\chapter{\1}/p'.

Related

SED's Substituted string is considered as one-line string, whereas it contains newline character

I am testing the sed command to substitute one line with 3 lines and, then, to delete the last line. (I could have substituted it with only the 2 first lines, but this is deliberately stated like this to showcase the main issue).
Let's say that I have the following text :
// ##OPTION_NAME: xxxx
I want to replace the token ##OPTION_NAME by ##OP-NAME and surround it by 2 new lines; Like so :
// ##OP-START
// ##OP-NAME: xxxx
// ##OP-END
To illustrate this, I put this text in a code.c file, and the sed commands in a sed script named script.sed.
Then, I call the following shell command :
Shell command
sed -f script.sed code.c
script.sed
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
The problem
Now, I add another sed command in script.sed to delete the line containing ##OP-END. Surprise ! all 3 lines are removed !
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
# Last parse; delete ##OP-END
/##OP-END/d
I tried \r\n instead of \n in the sustitution command
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g, but it does not work.
I also tested on ##OP-START to see if it makes some difference,
but alas ! All 3 lines were removed too.
It seems that sed is considering it as one line !
This is not a surprise, d operates on the pattern space, not on a per line basis. After the modification with the s command, your pattern space contains 3 lines. The content of it matches the expression and gets therefore deleted.
To delete this line from the pattern space, you need to use the s command again:
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g$
s/\n\/\/ ##OP-END//
About pattern and hold space: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html#tag_20_116_13

sed error unterminated substitute pattern for new line text

I am writing a script to add new dependencies to the watch list. I am putting a placeholder to know where to add the text, for eg
assets = [
"../../new_app/assets"
# [[NEW_APP_ADD_ASSETS]]
]
It is simple to replace just the place holder but my problem is to add comma in the previous line.
that can be done if I search and replace
"
# [[NEW_APP_ADD_ASSETS]]
ie "\n # [[NEW_APP_ADD_ASSETS]]
I am not able to search for the new line.
One of the solutions I found for adding a new line was
sed -i '' 's/newline/line one\
line two/' filename.txt
But when same way done for the search string it returns :unterminated substitute pattern
sed -i '' s/'assets\"\
#'/'some new text'/ filename.txt
PS: I writing on macos
Sed works on a line-by-line base, hence it becomes tricky to add the coma to the previous line as that line has already been processed. It is possible, but the sed syntax quickly becomes messy.
To be a bit more specific:
In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. Reading from input shall be skipped if a <newline> was in the pattern space prior to a D command ending the previous cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits. If no commands explicitly started a new cycle, then at the end of the script the pattern space shall be copied to standard output (except when -n is specified) and the pattern space shall be deleted. Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.
In short, if you do not manipulate the pattern space, you cannot process <newline> characters as they just do not appear!
And even shorter, if you only use the substitute command, sed only processes one line at a time!
This is also why you suffer from : unterminated substitute pattern. You are searching for a newline character, but as sed just reads one line at a time, it just does not find it and it also does not expect it. The error will vanish if you replace your newline with the symbols \n.
sed -i '' s/'assets\"\n #'/'some new text'/ filename.txt
A better way to achieve your goals would be to make use of awk. It is a bit more readable:
awk '/# [[NEW_APP_ADD_ASSETS]]/{ print t","; t="line1\nline2"; next }
{ print t; t=$0 }
END{ print t }' <file>

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead;
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.

Replacing text with shell script ending with an extension

I need some inputs on how to achieve this:
I need to replace a text in a file, using shell script, the text which i need to replace ends with .ear, for example below:
/home/export/files/list/aa_bb_cc.ear
The shell script should replace the aa_bb_cc.ear with say, replaced.ear.
That is the line after substitution should be:
/home/export/files/list/replaced.ear
I did read online about this, and came to know about sed command. The problem which i have, is that i don't know before hand what the text to be replace would be, that is, i know the text to be replace would be something *.ear (in an attempt to match aa_bb_cc.ear)
Now, how can I do this? I tried to use "*" in sed however it didn't work
With GNU sed:
sed 's|[^/]*\.ear|replaced.ear|' file
or
sed 's|[^/]*\(\.ear\)|replaced\1|' file
If you want to edit your file "in place" use sed's option -i.
See: The Stack Overflow Regular Expressions FAQ
$ echo /home/export/files/list/aa_bb_cc.ear | sed 's/[^/]*\.ear/xxx.ear/'
/home/export/files/list/xxx.ear
since regex match is greedy, you need to specify to match non-slash char. Dot matches any char, so to specify literal dot you have to escape with back-slash.

Ignoring lines with blank or space after character using sed

I am trying to use sed to extract some assignments being made in a text file. My text file looks like ...
color1=blue
color2=orange
name1.first=Ahmed
name2.first=Sam
name3.first=
name4.first=
name5.first=
name6.first=
Currently, I am using sed to print all the strings after the name#.first's ...
sed 's/name.*.first=//' file
But of course, this also prints all of the lines with no assignment ...
Ahmed
Sam
# I'm just putting this comment here to illustrate the extra carriage returns above; please ignore it
Is there any way I can get sed to ignore the lines with blank or whitespace only assignments and store this to an array? The number of assigned name#.first's is not known, nor are the number of assignments of each type in general.
This is a slight variation on sputnick's answer:
sed -n '/^name[0-9]\.first=\(.\+\)/ s//\1/p'
The first part (/^name[0-9]\.first=\(.\+\)/) selects the lines you want to pass to the s/// command. The empty pattern in the s command re-uses the previous regular expression and the replacement portion (\1) replaces the entire match with the contents of the first parenthesized part of the regex. Use the -n and p flags to control which lines are printed.
sed -n 's/^name[0-9]\.\w\+=\(\w\+\)/\1/p' file
Output
Ahmed
Sam
Explainations
the -n switch suppress the default behavior of sed : printing all lines
s/// is the skeleton for a substitution
^ match the beginning of a line
name literal string
[0-9] a digit alone
\.\w\+ a literal dot (without backslash means any character) followed by a word character [a-zA-Z0-9_] al least one : \+
( ) is a capturing group and \1 is the captured group

Resources