unterminated address regex while using sed - bash

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!

Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
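A small sketch of the quoting difference, using the sample text from the question as input (the variable names are only illustrative):

```shell
# Sample text from the question, stored verbatim (single quotes keep backslashes literal).
line='\MP2=-193.0977448\'

# Single quotes: sed receives \\ and looks for one literal backslash.
single=$(printf '%s\n' "$line" | sed -n '/\\MP2=/p')

# Double quotes: the shell halves the backslashes first, so four are needed
# for sed to receive two.
double=$(printf '%s\n' "$line" | sed -n "/\\\\MP2=/p")

printf '%s\n' "$single" "$double"
```

Both commands should print the same line; the single-quoted form is simply easier to read and to get right.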
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.

If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead:
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
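As a quick check, the substitution can be wrapped in a small function and fed a test line. The function name extract_mp2 and the extra fields in the sample line are made up for illustration; only the \MP2=...\ part comes from the question.

```shell
# Hypothetical helper: print only the value between \MP2= and the next backslash.
extract_mp2() {
    sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' "$@"
}

# Made-up sample input: one line containing an MP2 field, one line without.
printf '%s\n' 'State=1-A\HF=-192.5\MP2=-193.0977448\RMSD=1.0e-09\' 'no match here' |
    extract_mp2
```

Lines without a \MP2= field produce no output, because -n suppresses default printing and the p flag fires only after a successful substitution.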

awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.
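A hedged variant of the same idea, passing the record separator with -v so that awk's escape processing turns \\ into a single backslash. The function name mp2_awk and the sample line are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: split the input into records at each backslash and
# into fields at '=', then print the value of the MP2 record.
mp2_awk() {
    awk -F= -v 'RS=\\' '$1 == "MP2" { print $2 }' "$@"
}

# Made-up sample line in the same shape as the question's log excerpt.
printf '%s\n' 'State=1-A\MP2=-193.0977448\RMSD=1.0e-09\' | mp2_awk
```

This relies on the MP2 field sitting entirely on one line of the log; if the log wraps fields across lines, the record would need extra cleanup.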


Remove word from url

I Need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying through sed, but unsuccessfully.
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I get it again:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick rerun on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)
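If the suffix really is always exactly /%(tenant_id)s, a stricter variant could look like the sketch below. The function name strip_tenant and the second sample URL are made up for illustration.

```shell
# Hypothetical helper: on lines mentioning 8776/v3, remove a trailing
# literal /%(tenant_id)s and print the result. Parentheses are literal
# here because sed is used without -E or -r.
strip_tenant() {
    sed -n '\;8776/v3;s;/%(tenant_id)s$;;p'
}

printf '%s\n' \
    'https://ext.an1.test.dev:8776/v3/%(tenant_id)s' \
    'https://ext.an1.test.dev:9292/v2' |
    strip_tenant
```

Only the first URL is printed, with the suffix removed; the second line does not match the address expression and is dropped.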
If you've managed to get the address into a variable, then here is one parameter expansion idea:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3
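Note that ${myaddr%/*} removes whatever follows the last slash, whatever it is. A slightly more defensive sketch, assuming the suffix is always the literal /%(tenant_id)s, keeps the suffix in its own (hypothetically named) variable and quotes it so it is matched literally:

```shell
myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
suffix='/%(tenant_id)s'

# Quoting "$suffix" inside the expansion makes it a literal string, not a glob.
trimmed=${myaddr%"$suffix"}
printf '%s\n' "$trimmed"

# An address without the suffix is left untouched.
other='https://ext.an1.test.dev:8776/v3'
untouched=${other%"$suffix"}
printf '%s\n' "$untouched"
```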

How to filter a file with a regular expression using the sed command?

Can anyone please tell me what these two commands do?
sed -i 's!{[^{]*\;}! !' file.txt
sed -i 's!{[^{]*{! !' file.txt
I found this example and I cannot figure out the result produced when running the code.
sed -i 's!{[^{]*\;}! !' file.txt
sed -i means in place, file.txt might be altered.
's!....!....!' is the substitute command, with its parts split by exclamation marks. Most often you will see slashes used, but sed accepts other delimiter characters; the delimiter is whichever character follows the s. Note that exclamation marks can cause problems in the shell (history expansion, for example, when the command is not single-quoted). Since there is no slash in either the pattern or the replacement, I don't see a reason to use them.
{[^{]*\;} pattern to match
' ' the replacement: apparently a single blank, assuming it was copied faithfully, but it might be a tab, a narrow space, or some other whitespace character too.
Now what is the complicated expression:
{[^{]*\;} a literal pair of curly braces containing ...
[^{] a negated character class (the negation comes from the first character being '^'), so any character that is not an opening curly brace, followed by
the quantifier *, meaning any number of times, including 0, followed by
a backslash, which is an escape character, as so often (here it just makes the next character literal),
and a semicolon.
So
'{aaaaa;}' should match
'{a;}' should match
'{;}' should match
'{};}' should match
'{{;}' should not match as a whole: only the inner '{;}' matches, so the first '{' is left in place
'{a}' should not match
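These cases can be checked without touching a real file by dropping -i and piping test strings through the same substitution. The function name blank_block is made up for this sketch, and the behaviour of \; as a literal semicolon is what GNU sed does; other implementations may differ.

```shell
# Hypothetical helper: apply the first substitution from the question to stdin.
blank_block() {
    sed 's!{[^{]*\;}! !'
}

# Each test string is one line; compare the output line by line.
printf '%s\n' 'a{bbb;}c' '{;}' '{a}' 'x{{;}y' | blank_block
```

The first two lines have the braced block replaced by a single space, the third is unchanged, and in the fourth only the inner {;} is replaced.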

replacing specific characters in a line shell script

I have the following contents in a file
{"Hi","Hello","unix":["five","six"]}
I would like to replace comma within the square brackets only to semi colon. Rest of the comma's in the line should not be changed.
Output should be
{"Hi","Hello","unix":["five";"six"]}
I have tried using sed but it is not working. Below is the command I tried. Kindly help.
sed 's/:\[*\,*\]/;/'
Thanks
If your Input_file is the same as the sample shown, then the following may help.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
Output will be as follows.
{"Hi","Hello","unix":["five";"six"]}
EDIT: Adding an explanation as well. The annotated breakdown below is for explanation only; run the command above to get the output.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
s ## The substitute command in sed.
\([^[]*\) ## First capture group: everything from the start of the line up to (but not including) the first [; referred to later as \1.
\([^,]*\) ## Second capture group: everything from that [ (where the first group stopped) up to the first comma after it; referred to later as \2.
, ## The literal comma that is to be replaced.
\(.*\) ## Third capture group: everything after that comma to the end of the current line; referred to later as \3.
/\1\2;\3/g ## The replacement: the three groups are put back in order, with a ; (semicolon) between \2 and \3 in place of the comma, as the OP requested.
Awk would also be useful here
awk -F'[][]' '{gsub(/,/,";",$2); print $1"["$2"]"$3}' file
by using gsub, you can replace all occurrences of matched symbol inside a specific field
Input File
{"Hi","Hello","unix":["five","six"]}
{"Hi","Hello","unix":["five","six","seven","eight"]}
Output
{"Hi","Hello","unix":["five";"six"]}
{"Hi","Hello","unix":["five";"six";"seven";"eight"]}
You should definitely use RavinderSingh13's answer instead of mine (it's less likely to break or exhibit unexpected behavior given very complex input) but here's a less robust answer that's a little easier to explain than his:
sed -r 's/(:\[.*),(.*\])/\1;\2/g' test
() is a capture group. You can see there are two in the search. In the replacement, they are referred to as \1 and \2. This allows you to put chunks of your search back into the replacement expression. -r keeps the ( and ) from needing to be escaped with a backslash. [ and ] are special and need to be escaped for literal interpretation. Oh, and you wanted .* not *. On its own, * is a glob wildcard in bash and other shells; in a regex it is a quantifier that needs something before it to repeat.
edit: /g lets the substitution apply to every non-overlapping match on a line. Note that because .* is greedy, each match here changes only the last comma inside the brackets.
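For lists with more than two items, one alternative is a small sed loop that keeps replacing the first remaining comma after a [ until none is left. This is a different technique from the answers above, sketched under the assumption that bracketed lists are not nested; the function name commas_in_brackets is made up.

```shell
# Hypothetical helper: inside [...] replace every comma with a semicolon.
# :a defines a label; the s command changes the first comma that follows a [
# without crossing a ]; ta jumps back to :a as long as a substitution happened.
commas_in_brackets() {
    sed -e ':a' -e 's/\(\[[^],]*\),/\1;/' -e 'ta' "$@"
}

printf '%s\n' '{"Hi","Hello","unix":["five","six","seven","eight"]}' |
    commas_in_brackets
```

Commas outside the brackets are never touched, because every match has to start at a literal [ and may not run past the closing ].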

Sed substitution places characters after back reference at beginning of line

I have a text file that I am trying to convert to a Latex file for printing. One of the first steps is to go through and change lines that look like:
Book 01 Introduction
To look like:
\chapter{Introduction}
To this end, I have devised a very simple sed script:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'
This does the job, except, the closing curly bracket is placed where the initial backslash should be in the substituted output. Like so:
}chapter{Introduction
Any ideas as to why this is the case?
Your call to sed is fine; the problem is that your file uses DOS line endings (CRLF), but sed does not recognize the CR as part of the line ending, but as just another character on the line. The string Introduction\r is captured, and the result \chapter{Introduction\r} is printed by printing everything up to the carriage return (the ^ represents the cursor position)
\chapter{Introduction
                     ^
then moving the cursor to the beginning of the line
\chapter{Introduction
^
then printing the rest of the result (}) over what has already been printed
}chapter{Introduction
 ^
The solution is to either fix the file to use standard POSIX line endings (linefeed only), or to modify your regular expression to not capture the carriage return at the end of the line.
sed -n -e 's/Book [[:digit:]]\{2\}\s*\(.*\)\r\?$/\\chapter{\1}/p'
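Another way to sidestep the problem is to strip carriage returns before sed ever sees the line. The sketch below assumes the input has no carriage returns worth keeping; the function name to_chapter is made up, and [[:space:]] is used in place of the GNU-specific \s.

```shell
# Hypothetical helper: drop every CR, then turn "Book NN Title" into \chapter{Title}.
to_chapter() {
    tr -d '\r' |
        sed -n 's/Book [[:digit:]]\{2\}[[:space:]]*\(.*\)/\\chapter{\1}/p'
}

# Simulate one line of a file with DOS (CRLF) line endings.
printf 'Book 01 Introduction\r\n' | to_chapter
```

To confirm that a file really has CRLF endings, piping a line or two through od -c shows the \r characters explicitly.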
As an alternative to sed, awk using gsub might work well in this situation:
awk '{gsub(/Book [0-9]+/,"\\chapter"); print $1"{"$2"}"}'
Result:
\chapter{Introduction}
A solution is to modify the capture group. In this case, since all book chapter names consist only of alphabetic characters I was able to use [[:alpha:]]*. This gave a revised sed script of:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([[:alpha:]]*\)/\\chapter{\1}/p'

Removing Time Stamp With Sed

I have found a couple of examples online, but I could not find a combination that would work, as the syntax for sed is very tricky. If you could kindly point me in the right direction I would be highly grateful.
Here is the timestamp that I would like to remove from the file:
00:02:06.580 --> 00:02:07.380
Here is what I already tried:
cat sometextfile.txt | sed -r 's /\[0-9]{2}:[0-9]{2}:[0-9]{2}\/ g'
But I keep getting an error: sed: -e expression #1, char 34: unterminated `s' command
Thanks!!
The syntax is s/ what to replace / what to replace it with /. You are missing the second part. Even if you want to replace it with nothing, you need all three slashes; just don't put anything between the last two. As it is, you have only one slash, because the second one is quoted with \, meaning sed will treat it as part of the expression and look for a literal / in the input.
The beginning of your regex is also wrong. \[0-9]{2} matches the literal string [0-9 followed by exactly two right brackets (]]). Remove the initial backslash (\) if you want to match "exactly two digits".
Also, you never need to do cat filename |; you can just do < filename. In this specific case, sed takes a filename parameter, so you can do without the <, too.
So it should be something like this:
sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}//' sometextfile.txt
(I used -E because it's more portable than -r, which is a GNUism.)
You don't need the g on the end unless there's more than one timestamp per line.
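The command above removes only the first HH:MM:SS group on a line. If the goal is to drop the whole cue line shown in the question (an assumption; the question does not say so explicitly), a sketch could delete every line that starts with a full "start --> end" pair. The function name strip_cues and the subtitle text are made up.

```shell
# Hypothetical helper: delete lines that begin with a full
# HH:MM:SS.mmm --> HH:MM:SS.mmm timestamp pair.
strip_cues() {
    sed -E '/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}/d' "$@"
}

# Made-up two-line sample: one cue line, one line of subtitle text.
printf '%s\n' '00:02:06.580 --> 00:02:07.380' 'Hello there' | strip_cues
```

Only the subtitle text survives. The pattern is not anchored at the end of the line, so trailing cue settings or a stray carriage return do not prevent the match.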
