Extract text between two special characters

Extract text between two special characters - shell

Trying to extract the text between the special characters "\ and \" through sed
Ex: "\hell##$\"},
expected output : hell##$

You can do it quite easily with using a capture-group and backreference with basic regular-expressions:
sed 's/^["][\]\([^\]*\).*$/\1/'
Explanation
Normal substitution sed 's/find/replace/, where
find is ^["][\] a double-quote and \ before beginning the capture \(...\) which contains [^\]* (zero or more characters not a \), the closing of the capture \) and then .*$ the remainder of the string;
replace is \1 (the first backreference) containing the text captured between \(...\).
(note: if your "\ doesn't begin the string, remove the first '^' anchor)
Example
$ echo '"\hell##$\"},' | sed 's/^["][\]\([^\]*\).*$/\1/'
hell##$
Look things over and let me know if you have questions.

This might work for you (GNU sed):
sed -nE '/"\\[^\\]*\\+([^\\"][^\\]*\\+)*"/{s/"\\/\n/;s/.*\n//;s/\\"/\n/;P;D}' file
The solution comes in two parts:
Firstly, a regexp to determine whether a pair of two characters exists. This can be tricky as a negated class is insufficient because edge cases can easily defeat a simplistic approach.
Secondly, once a pair of characters does exist the text between them must be extracted piece meal.

Related

Remove word from url

I Need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying through sed, but unsuccessfully.
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I get it again:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s

You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick rerun on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)

If you've managed to get the address into a variable then one parameter expansion idea:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3

Using sed to substitute text around dynamic filename

I'm trying to figure out the best method for substituting text in a BASH script. Sed seems to be the best option, but correct me if I'm wrong.
What I'd like to do is take every instance of images/< filename >.png in a file, and add surrounding text - {{media("images/.< filename >.png")}}. The following code is the closest I've been able to get:
sed -i -e 's:images/.*.png:{{media("images/.*.png")}}:g' file.html
How can I make this happen?

In sed, & in the substitution will be replaced with the matched string, so if we can assume no spaces in a filename and a word boundary before and after each, this does what you want:
s:\bimages/\S*\.png\b:{{media("&")}}:g
Try it online!
Apart from doing the substitution, there are a couple issues with your code worth mentioning:
images/.*.png will match images/foo.png, but it will also match images/foopng. Don't forget to escape regex characters: images/.*\.png.
sed quantifiers are always greedy. Suppose you had this input:
Foo images/bar.png baz images/qux.png quux
In this case, the expression images/.*\.png would match everything from the first images to the last .png. The solution above avoids this by using \S instead of . to match only non-whitespace characters.

Replace All first 4 spaces with a tab

I am doing some documentation work, and I have a tree structure like this:
A
BB
C C
DD
How can I replace just all the occurrences of 2 spaces in the head of the line with '-', like:
A
--BB
--C C
----DD
I have tried sed 's/ /-/g', but this replaces all occurrences of 2 spaces; also sed 's/^ /-/g', this just replaces the first occurrence of 2 spaces. How can I do this?

The regular expression for four spaces at beginning of line is /^ / where I put the slashes just to demarcate the expression (they are not part of the actual regular expression, but they are used as delimiters by sed).
sed 's/^ /\t/' file
In recent sed versions, you can add an -i option to modify file in-place (that is, sed will replace the file with the modified file); on *BSD (including OSX), you need -i '' with an empty option argument.
The \t escape code for tab is also not universally supported; if that is a problem, your shell probably allows you to type a literal tab by prefixing it with ctrl-V.
(Your question title says "tab" but your question asks about dashes. To replace with two dashes, replace \t in the replacement part of the script with --, obviously.)
If you are trying to generalize to "any groups of two spaces at beginning of line should be replaced by a dash", this is not impossible to do in sed, but I would recommend Perl instead:
perl -pe 's%^((?: )+)% "-" x (length($1) / 2)%e' file
This captures the match into $1; the inner parenthesized expression matches two spaces and the + quantifier says to match that as many times as possible. The /e flag allows us to use Perl code in the replacement; this piece of code repeats the character "-" as many times as the captured expression was repeated, which is conveniently equal to half its length.

Need to diff two text files in linux with some patterns in filelines

File A contains
Test-1.2-3
Test1-2.2-3
Test2-4.2-3
File B contains
Test1
Expected output should be
Test-1.2-3
Test2-4.2-3
diff A B doesn't work as expected.
Kindly let me know if any solutions here.

Using grep:
grep -vf B A
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
Edit:
Optionally, you may want to use the -w option if you want a more precise match on "words" only which seems to be your case from your example since your match is followed by '-'. As DevSolar points out, you may also want to use the -F option to prevent input patterns from your file B to be interpreted as regular expressions.
grep -vFwf B A
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (rather than regular
expressions), separated by newlines, any of which is to be matched.

To complement Julien Lopez's helpful answer:
If you want to ensure that lines from File B only match at the beginning of lines from File A, you can prepend ^ to each line from file B, using sed:
grep -vf <(sed 's/^/^/' fileB) fileA
grep, which by default interprets its search strings as BREs (basic regular expressions), then interprets the ^ as the beginning-of-line anchor.
If the lines in File B may contain characters that are regex metacharacters (such as ^, *,?, ...) but should be treated as literals, you must escape them first:
grep -vf <(sed 's/[^^]/[&]/g; s/\^/\\^/g; s/^/^/' fileB) fileA
An explanation of this grim-looking - but generically robust - sed command can be found in this this answer of mine.
Note:
Assumes bash, ksh, or zsh due to use of <(...), a process substitution, which makes the output from sed act as if it were provided via a file.
sed command s/^/^/ looks like it won't do anything, but the first ^, in the regex part of the call, is the beginning-of-line anchor[1]
, whereas the second ^, in the substitution part of the call, is a literal to place at the beginning of the line (which will later itself be interpreted as the beginning-of-line anchor in the context of grep).
[1] Strictly speaking, to sed it is the beginning-of-pattern-space anchor, because it is possible to read multiple lines at once with sed, in which case ^ refers to the beginning of the pattern space (input buffer) as a whole, not to individual lines.

Ignoring lines with blank or space after character using sed

I am trying to use sed to extract some assignments being made in a text file. My text file looks like ...
color1=blue
color2=orange
name1.first=Ahmed
name2.first=Sam
name3.first=
name4.first=
name5.first=
name6.first=
Currently, I am using sed to print all the strings after the name#.first's ...
sed 's/name.*.first=//' file
But of course, this also prints all of the lines with no assignment ...
Ahmed
Sam
# I'm just putting this comment here to illustrate the extra carriage returns above; please ignore it
Is there any way I can get sed to ignore the lines with blank or whitespace only assignments and store this to an array? The number of assigned name#.first's is not known, nor are the number of assignments of each type in general.

This is a slight variation on sputnick's answer:
sed -n '/^name[0-9]\.first=\(.\+\)/ s//\1/p'
The first part (/^name[0-9]\.first=\(.\+\)/) selects the lines you want to pass to the s/// command. The empty pattern in the s command re-uses the previous regular expression and the replacement portion (\1) replaces the entire match with the contents of the first parenthesized part of the regex. Use the -n and p flags to control which lines are printed.

sed -n 's/^name[0-9]\.\w\+=\(\w\+\)/\1/p' file
Output
Ahmed
Sam
Explainations
the -n switch suppress the default behavior of sed : printing all lines
s/// is the skeleton for a substitution
^ match the beginning of a line
name literal string
[0-9] a digit alone
\.\w\+ a literal dot (without backslash means any character) followed by a word character [a-zA-Z0-9_] al least one : \+
( ) is a capturing group and \1 is the captured group

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Extract text between two special characters - shell

Trying to extract the text between the special characters "\ and \" through sed Ex: "\hell##$\"}, expected output : hell##$

Related

Remove word from url

Using sed to substitute text around dynamic filename

Replace All first 4 spaces with a tab

Need to diff two text files in linux with some patterns in filelines

Ignoring lines with blank or space after character using sed

Categories

Resources