How to filter a file with a regular expression using the sed command? - shell

Can anyone please tell me what these two commands do?
sed -i 's!{[^{]*\;}! !' file.txt
sed -i 's!{[^{]*{! !' file.txt
I found this example and I cannot figure out what result the commands produce when they are run.

sed -i 's!{[^{]*\;}! !' file.txt
sed -i means edit in place, so file.txt itself is modified.
's!....!....!' is the substitute command, split by exclamation marks. Slashes are the usual delimiter, but sed accepts other characters: whichever character follows the s becomes the delimiter. Note that exclamation marks can cause trouble in an interactive shell (history expansion) when the script is not single-quoted. Since neither the pattern nor the replacement contains a slash, I don't see a reason to use them here.
{[^{]*\;} is the pattern to match.
' ' is the replacement: presumably a single blank, if it was copied with care, but it might be a tab, a funky half space, or something else too.
Now, what does the complicated expression mean?
{[^{]*\;} a literal opening curly brace and, at the end, a literal closing one, containing ...
[^{] a negated character class (negated because its first character is '^'), so any character which is not an opening curly brace, followed by
the quantifier *, meaning any number of times, including 0, followed by
a backslash, which escapes the following character, as so often,
and a semicolon.
So
'{aaaaa;}' should match
'{a;}' should match
'{;}' should match
'{};}' should match
'{{;}' matches only from its second brace on (the '{;}' part), because the pattern is not anchored; the first brace is left in place
'{a}' should not match
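A quick way to check the list is to run the substitution on each string without -i, so no file is touched. This is only a sketch: try_pattern is a made-up helper name, and it assumes a sed (such as GNU sed) that treats \; as a plain semicolon. Because the pattern is not anchored, sed can find {;} inside {{;} starting at the second brace.

```shell
# Apply the first substitution from the question to a single string.
# try_pattern is a hypothetical helper, not part of the original commands.
try_pattern() {
    printf '%s\n' "$1" | sed 's!{[^{]*\;}! !'
}

# Print each sample with the result in brackets so a blank stays visible.
for s in '{aaaaa;}' '{a;}' '{;}' '{};}' '{{;}' '{a}'; do
    printf '%s -> [%s]\n' "$s" "$(try_pattern "$s")"
done
```

A line printed as `[ ]` was replaced completely, while `[{a}]` means the input came back unchanged.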

Related

Remove word from url

I Need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying with sed, but without success.
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I still get the same result:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick refresher on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)
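To try the filter without the curl and jq stages, the URLs can be fed in with printf. This is a sketch under assumptions: strip_tenant and strip_tenant_exact are made-up helper names, and the second URL is invented only to show that non-matching lines are dropped.

```shell
# The sed command from above: keep only lines containing 8776/v3
# and cut everything from the first "/%" onwards.
strip_tenant() {
    sed -n '\;8776/v3;s;/%.*;;p'
}

# Stricter variant: remove only the exact suffix /%(tenant_id)s.
# In basic regex syntax the parentheses are literal characters.
strip_tenant_exact() {
    sed 's;/%(tenant_id)s$;;'
}

printf '%s\n' \
    'https://ext.an1.test.dev:8776/v3/%(tenant_id)s' \
    'https://ext.an1.test.dev:9292/v2' | strip_tenant   # second URL is hypothetical
```

The stricter variant prints every input line, so it would still need the address expression or a grep in front of it if other endpoints should be filtered out.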
If you've managed to get the address into a variable, parameter expansion is another option:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3

Extract text between two special characters

I'm trying to extract the text between the special characters "\ and \" with sed.
Ex: "\hell##$\"},
expected output : hell##$
You can do it quite easily using a capture group and a backreference with basic regular expressions:
sed 's/^["][\]\([^\]*\).*$/\1/'
Explanation
Normal substitution sed 's/find/replace/', where
find is ^["][\] a double-quote and \ before beginning the capture \(...\) which contains [^\]* (zero or more characters not a \), the closing of the capture \) and then .*$ the remainder of the string;
replace is \1 (the first backreference) containing the text captured between \(...\).
(note: if your "\ doesn't begin the string, remove the first '^' anchor)
Example
$ echo '"\hell##$\"},' | sed 's/^["][\]\([^\]*\).*$/\1/'
hell##$
Look things over and let me know if you have questions.
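If the "\ does not start the line, dropping the ^ alone leaves whatever precedes the match in the output. One possible variant, offered as a sketch only, lets a leading .* swallow the prefix and prints only lines where the substitution happened. The name extract_between and the sample input are made up for illustration.

```shell
# Print the text between "\ and \" wherever the pair sits on the line.
# -n plus the p flag suppresses lines that contain no such pair.
extract_between() {
    sed -n 's/.*"\\\([^\\]*\)\\".*/\1/p'
}

# Hypothetical input with text before the opening "\
printf '%s\n' '{"key": "\hell##$\"},' | extract_between
```

Because .* is greedy, a line with several "\...\" pairs would yield the last one, which may or may not be what you want.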
This might work for you (GNU sed):
sed -nE '/"\\[^\\]*\\+([^\\"][^\\]*\\+)*"/{s/"\\/\n/;s/.*\n//;s/\\"/\n/;P;D}' file
The solution comes in two parts:
Firstly, a regexp to determine whether a pair of two characters exists. This can be tricky as a negated class is insufficient because edge cases can easily defeat a simplistic approach.
Secondly, once a pair of characters does exist, the text between them must be extracted piecemeal.

Removing Time Stamp With Sed

I have found a couple of examples online, but I could not find a combination that would work, as the syntax for sed is very tricky. If you could kindly point me in the right direction, I would be highly grateful.
Here is the time stamp that I would like to remove from the file:
00:02:06.580 --> 00:02:07.380
Here is what I already tried:
cat sometextfile.txt | sed -r 's /\[0-9]{2}:[0-9]{2}:[0-9]{2}\/ g'
But I keep getting an error: sed: -e expression #1, char 34: unterminated `s' command
Thanks!!
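The syntax is s/what to replace/what to replace it with/, with no spaces around the delimiters. Whatever character follows the s becomes the delimiter, so with a space after the s, sed takes the space as the delimiter, reads g as the start of the replacement, and never finds the terminating delimiter. You are also missing the second part: even if you want to replace the match with nothing, you need all three slashes; just don't put anything between the last two. Finally, your last slash is quoted with \, meaning sed will treat it as part of the expression and look for a literal / in the input.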
The beginning of your regex is also wrong. \[0-9]{2} matches the literal string [0-9 followed by exactly two right brackets (]]). Remove the initial backslash (\) if you want to match "exactly two digits".
Also, you never need to do cat filename |; you can just do < filename. In this specific case, sed takes a filename parameter, so you can do without the <, too.
So it should be something like this:
sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}//' sometextfile.txt
(I used -E because it's more portable than -r, which is a GNUism.)
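You don't need the g on the end unless there's more than one timestamp per line. A line such as 00:02:06.580 --> 00:02:07.380 contains two, and the pattern above does not cover the fractional seconds or the arrow, so extend the regex if the whole range should go.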
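To remove the complete range shown in the question, the regex has to cover the milliseconds and the arrow as well. The following is a sketch rather than a definitive answer: strip_timestamps and the ts variable are made-up names, and the surrounding sample lines are assumed, since the question shows only the timestamp itself.

```shell
# Remove a full "HH:MM:SS.mmm --> HH:MM:SS.mmm" range from each line.
strip_timestamps() {
    # One timestamp; kept in a variable so the pattern is written only once.
    ts='[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'
    sed -E "s/$ts --> $ts//"
}

# Hypothetical subtitle-style input; only the middle line is altered.
printf '%s\n' '1' '00:02:06.580 --> 00:02:07.380' 'Hello there' | strip_timestamps
```

This leaves an empty line where the range used to be. If the whole line should disappear instead, the same regex can serve as an address for the d command.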

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
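The quoting point can be checked side by side. This sketch assumes a made-up sample line and made-up helper names (extract_mp2, extract_mp2_dq); both helpers hand sed the same program, and only the shell quoting differs.

```shell
# Single quotes: sed receives \\ and looks for one literal backslash.
extract_mp2() {
    sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p'
}

# Double quotes: the shell turns each \\ into \ first, so four
# backslashes are needed for sed to see two.
extract_mp2_dq() {
    sed -n "s/.*\\\\MP2=\([^\\\\]*\)\\\\.*/\1/p"
}

# Hypothetical log fragment; a real line will contain other fields.
line='\HF=-192.5\MP2=-193.0977448\RMSD=1.0e-09\'
printf '%s\n' "$line" | extract_mp2
printf '%s\n' "$line" | extract_mp2_dq
```

Both calls print just the number, which is why the single-quoted form is usually the easier one to read and maintain.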
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead:
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match by replacing the entire line with it. The escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. (Some sed dialects don't want backslashes before the grouping parentheses.)
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.
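The awk approach can be tried on a single line from standard input. This is a sketch with assumptions: mp2_awk is a made-up name, the sample line is invented, and the separator is written as '\\' because awk processes escape sequences in variable assignments, so two backslashes end up as one.

```shell
# Split the input into records at each backslash and into fields at "=",
# then print the value of the record whose key is exactly MP2.
mp2_awk() {
    awk -F= -v RS='\\' '$1 == "MP2" { print $2 }'
}

# Hypothetical log fragment.
printf '%s\n' '\HF=-192.5\MP2=-193.0977448\RMSD=1.0e-09\' | mp2_awk
```

Since records are split at every backslash, this only finds the entry when MP2= follows the backslash directly, with no line break in between.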

Need to diff two text files in linux with some patterns in filelines

File A contains
Test-1.2-3
Test1-2.2-3
Test2-4.2-3
File B contains
Test1
Expected output should be
Test-1.2-3
Test2-4.2-3
diff A B doesn't work as expected.
Kindly let me know if there is a solution for this.
Using grep:
grep -vf B A
-f FILE, --file=FILE
       Obtain patterns from FILE, one per line. The empty file
       contains zero patterns, and therefore matches nothing.
-v, --invert-match
       Invert the sense of matching, to select non-matching lines.
Edit:
Optionally, you may want to use the -w option for a more precise match on whole "words" only, which seems to fit your example since your match is followed by '-'. As DevSolar points out, you may also want to use the -F option to prevent the patterns in file B from being interpreted as regular expressions.
grep -vFwf B A
-w, --word-regexp
       Select only those lines containing matches that form whole
       words. The test is that the matching substring must either be
       at the beginning of the line, or preceded by a non-word
       constituent character. Similarly, it must be either at the end
       of the line or followed by a non-word constituent character.
       Word-constituent characters are letters, digits, and the
       underscore.
-F, --fixed-strings
       Interpret PATTERN as a list of fixed strings (rather than regular
       expressions), separated by newlines, any of which is to be matched.
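The effect of -w shows up once file A contains a line where the pattern is only part of a longer word. The sketch below recreates the two files in a temporary directory; the extra line Test10-1 is an assumption added for illustration and is not in the question.

```shell
# Recreate the sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Test-1.2-3' 'Test1-2.2-3' 'Test2-4.2-3' 'Test10-1' > "$tmp/A"  # last line is hypothetical
printf '%s\n' 'Test1' > "$tmp/B"

# Plain pattern match: Test1 also matches inside Test10-1, so that line is dropped.
plain=$(grep -vf "$tmp/B" "$tmp/A")

# Fixed-string, whole-word match: Test10-1 is kept, Test1-2.2-3 is dropped.
strict=$(grep -vFwf "$tmp/B" "$tmp/A")

printf 'plain:\n%s\nstrict:\n%s\n' "$plain" "$strict"
rm -rf "$tmp"
```

With only the three lines from the question, both commands give the same output; the difference appears as soon as names share a prefix.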
To complement Julien Lopez's helpful answer:
If you want to ensure that lines from File B only match at the beginning of lines from File A, you can prepend ^ to each line from file B, using sed:
grep -vf <(sed 's/^/^/' fileB) fileA
grep, which by default interprets its search strings as BREs (basic regular expressions), then interprets the ^ as the beginning-of-line anchor.
If the lines in File B may contain characters that are regex metacharacters (such as ^, *, ?, ...) but should be treated as literals, you must escape them first:
grep -vf <(sed 's/[^^]/[&]/g; s/\^/\\^/g; s/^/^/' fileB) fileA
An explanation of this grim-looking, but generically robust, sed command can be found in this answer of mine.
Note:
Assumes bash, ksh, or zsh due to use of <(...), a process substitution, which makes the output from sed act as if it were provided via a file.
The sed command s/^/^/ looks like it won't do anything, but the first ^, in the regex part of the call, is the beginning-of-line anchor[1], whereas the second ^, in the substitution part of the call, is a literal to place at the beginning of the line (which will later itself be interpreted as the beginning-of-line anchor in the context of grep).
[1] Strictly speaking, to sed it is the beginning-of-pattern-space anchor, because it is possible to read multiple lines at once with sed, in which case ^ refers to the beginning of the pattern space (input buffer) as a whole, not to individual lines.
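The difference between anchoring alone and escaping plus anchoring can be seen with a pattern that contains a dot. This sketch writes the generated patterns to temporary files instead of using <(...), so it also runs in a plain POSIX shell; all file contents below are made up for illustration.

```shell
tmp=$(mktemp -d)
# Hypothetical data: a.c should be treated as a literal string.
printf '%s\n' 'Test1-2.2-3' 'MyTest1-0' 'a.c-1' 'abc-1' > "$tmp/A"
printf '%s\n' 'Test1' 'a.c' > "$tmp/B"

# Anchor only: ^a.c is still a regex, so it also removes abc-1.
sed 's/^/^/' "$tmp/B" > "$tmp/anchored"
anchored=$(grep -vf "$tmp/anchored" "$tmp/A")

# Escape every character first, then anchor: ^[a][.][c] matches a.c literally.
sed 's/[^^]/[&]/g; s/\^/\\^/g; s/^/^/' "$tmp/B" > "$tmp/escaped"
escaped=$(grep -vf "$tmp/escaped" "$tmp/A")

printf 'anchored:\n%s\nescaped:\n%s\n' "$anchored" "$escaped"
rm -rf "$tmp"
```

In both runs MyTest1-0 survives because Test1 no longer matches in the middle of a line; only the escaped run keeps abc-1.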

Resources