Remove word from url - bash

I need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying to do this with sed, but without success.
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I still get the original string back:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s

You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick rerun on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)
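To experiment with the command without calling the API, a minimal sketch can feed sample lines into the same sed program. Here printf stands in for curl ... | jq -r '.endpoints[].url'; the helper names sample_urls, strip_tenant and strip_exact, and the second URL, are made up for illustration.

```shell
# Sample lines standing in for the jq output (the second URL is hypothetical).
sample_urls() {
  printf '%s\n' \
    'https://ext.an1.test.dev:8776/v3/%(tenant_id)s' \
    'https://ext.an1.test.dev:9292/v2'
}

# Keep only lines matching 8776/v3 and drop everything from the first "/%".
strip_tenant() {
  sed -n '\;8776/v3;s;/%.*;;p'
}

# Variant: remove exactly the literal /%(tenant_id)s. In basic regex syntax
# the parentheses are literal, so nothing needs escaping.
strip_exact() {
  sed -n '\;8776/v3;s;/%(tenant_id)s;;p'
}

result=$(sample_urls | strip_tenant)
exact=$(sample_urls | strip_exact)
printf '%s\n' "$result" "$exact"
```

Both variants should print only the 8776/v3 address with the placeholder removed; the other line is dropped by the address expression.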

If you already have the address in a variable, parameter expansion is one option:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3
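One caveat: ${myaddr%/*} removes the last path component whatever it contains. If some addresses might not carry the placeholder, a slightly stricter pattern could be used. A small sketch, where the variable plain and the result variable names are made up for illustration:

```shell
myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
plain='https://ext.an1.test.dev:8776/v3'   # hypothetical address without the placeholder

# %/* strips the shortest trailing "/..." component, whatever it contains.
loose_myaddr="${myaddr%/*}"
loose_plain="${plain%/*}"       # also strips /v3, which is probably unwanted

# %/%* only strips a trailing component that begins with "/%".
strict_myaddr="${myaddr%/%*}"
strict_plain="${plain%/%*}"     # left unchanged

printf '%s\n' "$loose_myaddr" "$loose_plain" "$strict_myaddr" "$strict_plain"
```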

Related

How to filter a file with a regular expression using the sed command?

Can anyone please tell me what these two commands do?
sed -i 's!{[^{]*\;}! !' file.txt
sed -i 's!{[^{]*{! !' file.txt
I found this example and I cannot figure out what result it produces when run.
sed -i 's!{[^{]*\;}! !' file.txt
sed -i means edit in place, so file.txt itself may be altered.
's!....!....!' is a substitute command whose parts are separated by exclamation marks. Slashes are the most common delimiter, but sed accepts other characters; the delimiter is whichever character follows the s. Note that exclamation marks can cause trouble in an interactive shell (history expansion) unless they are inside single quotes. Since neither the pattern nor the replacement contains a slash, I don't see a reason to use them here.
{[^{]*\;} is the pattern to match
' ' is the replacement: presumably a single blank if it was copied carefully, but it might also be a tab, some unusual narrow space, or something else.
Now what is the complicated expression:
{[^{]*\;} matches a literal pair of curly braces containing ...
[^{] a negated character class (the negation is indicated by ^ as the first character), so any character which is not an opening curly brace, followed by
the quantifier *, meaning any number of times, including 0, followed by
a backslash, which acts as an escape character, as so often,
and a semicolon.
So
'{aaaaa;}' should match
'{a;}' should match
'{;}' should match
'{};}' should match
'{{;}' should not match as a whole, although the inner '{;}' still matches because the pattern is not anchored
'{a}' should not match
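These cases can be checked directly. The following is a minimal sketch, assuming a sed that treats \; as a literal semicolon (GNU sed does); the helper name blank_block is made up for illustration. It pipes each test string through the first substitution and shows the result in brackets so the replacement blank is visible.

```shell
# Run the first substitution from the question on a single string.
blank_block() {
  printf '%s\n' "$1" | sed 's!{[^{]*\;}! !'
}

for s in '{aaaaa;}' '{a;}' '{;}' '{};}' '{{;}' '{a}'; do
  printf '%-10s -> [%s]\n' "$s" "$(blank_block "$s")"
done
```

Because the pattern is not anchored, '{{;}' comes out as '{ ': only the inner '{;}' is replaced. '{a}' is left untouched.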

grep for a variable content with a dot

I found many similar questions about my issue, but I still haven't found the right answer for my case.
I need to grep for the content of a variable plus a dot, but escaping the dot after the variable doesn't work. For example:
The file content is
item.
newitem.
My variable contains item. and I want to grep for the exact word, so I must use -w and not -F, but I can't get the correct output with this command:
cat file | grep -w "$variable\."
Do you have suggestions please?
Hi, I have to correct my scenario. My file contains some FQDNs, and for some reason I have to look for hostname. including the dot.
Unfortunately, grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep could help.
If $variable contains item., you're searching for item.\. which is not what you want. In fact, you want -F which interprets the pattern literally, not as a regular expression.
var=item.
echo $'item.\nnewitem.' | grep -F "$var"
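Note that -F alone still matches substrings, so with the sample data it selects newitem. as well. The options can be combined, though. A small sketch comparing -F, -wF and -xF on the two sample lines (the helper sample and the result variable names are made up for illustration):

```shell
var='item.'

# The two lines from the question.
sample() {
  printf '%s\n' 'item.' 'newitem.'
}

# -F alone: literal substring match, so "newitem." is selected too.
fixed_only=$(sample | grep -F -- "$var")

# -w with -F: literal match that must not be preceded or followed by a word character.
fixed_word=$(sample | grep -wF -- "$var")

# -x with -F: the whole line must equal the literal string.
whole_line=$(sample | grep -xF -- "$var")

printf '%s\n' "-F:" "$fixed_only" "-wF:" "$fixed_word" "-xF:" "$whole_line"
```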
Try:
grep "\b$word\."
\b: word boundary
\.: a literal dot; since the dot is not a word character, it also marks the end of the word
The following awk solution may also help:
awk -v var="item." '$0==var' Input_file
You are expanding the variable and appending \. to it, which results in calling
cat file | grep -w "item.\.".
Since grep accepts files as parameters, calling grep "item\." file should do.
From man grep:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
Because the last character of the pattern is a ., -w requires it to be followed by a non-word character (anything other than [A-Za-z0-9_]); however, the next character is d.
grep '\<hostname1\.'
should work, as \< ensures the previous character is not a word constituent.
You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes tell bash not to interpret special characters in the variable. Double quotes tell bash to allow replacing $rexp with its value. The caret ( ^ ) in the expression tells grep to look for lines starting with 'hostname1.'
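When the host name comes from a variable, the pattern can be built by escaping the value first. The following is a rough sketch, assuming a grep that supports \< as described in the man page excerpt; the third host name and the variable names hosts, name and escaped are made up for illustration.

```shell
# Sample data written to a temporary file; the third entry is hypothetical.
hosts=$(mktemp)
printf '%s\n' \
  'hostname1.domain.com' \
  'hostname2.domain.com' \
  'myhostname1.domain.com' > "$hosts"

name='hostname1.'   # value to search for, including the trailing dot

# Backslash-escape basic-regex metacharacters so the value is matched literally.
escaped=$(printf '%s\n' "$name" | sed 's/[][\.*^$]/\\&/g')

# \< anchors the match at the start of a word, so myhostname1. is rejected.
matches=$(grep "\<$escaped" "$hosts")
echo "$matches"

rm -f "$hosts"
```

Anchoring with \< instead of ^ also works when the host name is not at the start of the line.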

Bash: remove semicolons from a line in a CSV-file

I have a CSV file with a few hundred lines, and many (not all) of these lines contain data (Klas/Lesgroep:;;T2B1) which I want to extract.
e.g. ;;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;
I want to delete the semicolons in front of Klas/Lesgroep, but the number of semicolons varies. How can I delete these semicolons in Bash?
I'm not a native English speaker, so I hope this is clear.
To remove any nonempty run of ; chars. that come directly before literal Klas/Lesgroep:
With GNU or BSD/macOS sed:
$ sed -E 's|;+(Klas/Lesgroep)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
The s function performs string substitution (replacement):
The 1st argument is a regex (regular expression) that specifies what part of the line to match,
and the 2nd argument specifies what to replace the matching part with.
Note how I've chosen | as the regex/argument delimiter instead of the customary /, because that allows unescaped use of / chars. inside the regex.
;+ matches one or more directly adjacent ; chars.
(Klas/Lesgroep) matches literal Klas/Lesgroep and by enclosing it in (...) - making it a capture group - the match is remembered and can be referenced as \1 - the 1st capture group in the regex - in the replacement argument to s.
The net effect is that all ; chars. directly preceding Klas/Lesgroep are removed.
POSIX-compliant form:
$ sed 's|;\{1,\}\(Klas/Lesgroep\)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
POSIX requires the less powerful and antiquated BRE syntax, where duplication symbol + must be emulated as \{1,\}, and, generally, metacharacters (, ), {, } must be \-escaped.
With sed you can search for lines containing at least one semicolon followed by Klas/Lesgroep and, if found, replace the leading ; characters with nothing:
$ sed '/;;*Klas\/Lesgroep/s/^;*//g' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
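The two approaches differ when the semicolons are not at the very start of the line. A minimal sketch comparing them; the second and third sample lines and the names lines, before_label and leading_only are made up for illustration:

```shell
# First line is from the question; the other two are hypothetical.
lines() {
  printf '%s\n' \
    ';;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;' \
    'x;;;Klas/Lesgroep:;;T2B2;;' \
    ';;;other;;data;;'
}

# Variant 1 (POSIX BRE): drop the run of ";" directly before Klas/Lesgroep.
before_label=$(lines | sed 's|;\{1,\}\(Klas/Lesgroep\)|\1|')

# Variant 2: on lines containing ";Klas/Lesgroep", drop the leading run of ";".
leading_only=$(lines | sed '/;;*Klas\/Lesgroep/s/^;*//')

printf '%s\n' "$before_label" '---' "$leading_only"
```

For the second sample line, variant 1 yields xKlas/Lesgroep:;;T2B2;; while variant 2 leaves the line unchanged, since it has no leading semicolons.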
To remove all ";" characters from a file, we can use the sed command, which is used for modifying text.
$ sed 's/find/replace/g' file
The g flag (global replacement) tells the substitute command to replace all occurrences of the string in the line.
So to remove ";", just find it and replace it with nothing.
sed 's/;//g' file.csv

Sed capturing too much during substring extraction

I'm trying to parse a curl response in order to retrieve an img src, identified with the alt tag captcha.
So to test my sed expression I tried the following:
echo 'alt="captcha" src="http://example.com/foo.html" /></p>' | sed -n 's/.*alt="captcha" src="\([^"]*\)/\1/p'
However, this prints
http://example.com/foo.html" /></p>
How can I simply return
http://example.com/foo.html
?
I am new to sed so I would like to know where I'm going wrong.
This answer explains sed's behavior, but 123 - who also gave the right answer to the sed problem succinctly in a comment - points to a potentially better alternative, if you have GNU grep: grep -oP 'alt="captcha" src="\K[^"]*'. GNU grep's -P option supports PCREs, which are more powerful regular expressions than those available in sed.
The issue is not related to greediness, but to the fact that your regex only matches part of the line:
To extract a substring in sed, your regex must match the entire line. Otherwise, any parts not matched by your regex are simply passed through, as happened with the trailing " /></p> in your case; here's a fix:
$ echo 'alt="captcha" src="http://example.com/foo.html" /></p>' |
sed -n 's/.*alt="captcha" src="\([^"]*\).*/\1/p'
http://example.com/foo.html
Note the trailing .* I've added, which ensures that the remainder of the line is matched as well.
Without it, what is left of the input line after the match is simply appended to the result of your substitution; i.e., the " /></p> part. More correctly: the remaining part of the line is simply not replaced.
Therefore, generally, you'd use an approach such as the following (pseudo notation):
sed 's/^...<capture-group>...$/\1/p'
Again, the regex must match the whole line for this to work.
Due to sed's greedy matching, you need neither ^ nor $, though you may choose to add them for clarity of intent.
Caveat: If your capture group has no ambiguity, .* is fine to match the remainder of the line, but .* to match everything before the capture group will not work in all cases - see below.
A simple example to demonstrate the problem:
$ sed -n 's/[^"]*"\([^"]*\)/>>\1<</p' <<<'before"foo"after' # WRONG
>>foo<<"after
Note how \1 does contain the substring of interest captured by \([^"]*\), as intended - the string foo between "..." - but, because the regex stopped matching just before the closing ", the remainder of the line - "after - is still output.
Fixed version, with .* appended to ensure that the whole line matches:
$ sed -n 's/[^"]*"\([^"]*\).*/>>\1<</p' <<<'before"foo"after'
>>foo<<
Also note how [^"]*" is used to match the beginning of the line up to the capture group; .* would not work here, due to sed's greedy matching:
$ sed -n 's/.*"\([^"]*\).*/>>\1<</p' <<<'before"foo"after' # WRONG
>>after<<
.*" greedily matches everything up to the last ", and so the capture group then captures after, which is the run of non-" chars. after the closing ".
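Putting this together for the original problem, the fixed substitution could be wrapped in a small helper. This is only a sketch: the function name captcha_src and the sample HTML (beyond the fragment from the question) are made up for illustration, and it assumes src directly follows alt="captcha" on the same line.

```shell
# Print the src value that directly follows alt="captcha"; print nothing for other lines.
captcha_src() {
  sed -n 's/.*alt="captcha" src="\([^"]*\).*/\1/p'
}

# Two sample lines: one with the captcha image, one without.
html='<p><img class="c" alt="captcha" src="http://example.com/foo.html" /></p>
<p><img alt="logo" src="http://example.com/logo.png" /></p>'

src=$(printf '%s\n' "$html" | captcha_src)
echo "$src"
```

The leading .* is safe here because the literal text alt="captcha" src=" pins down where the capture group starts.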
Use sed grouping. It's always my go-to!
Sed regex:
echo 'alt="captcha" src="http://example.com/foo.html" /></p>' | sed 's/\(^alt.*src=\"\)\(.*\)\(\".*p>\)/\2/g'
Output
http://example.com/foo.html

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
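The quoting difference can be seen side by side in a short sketch. The HF field in the sample line and the variable names are made up for illustration; only the \MP2=-193.0977448\ part comes from the question.

```shell
# Sample line; inside single quotes every backslash is literal.
line='\HF=-192.5364991\MP2=-193.0977448\'

# Single quotes: sed receives \\MP2= and looks for one backslash followed by MP2=.
single=$(printf '%s\n' "$line" | sed -n '/\\MP2=/p')

# Double quotes: the shell turns \\\\ into \\ before sed sees it, so four are needed.
double=$(printf '%s\n' "$line" | sed -n "/\\\\MP2=/p")

printf '%s\n' "$single" "$double"
```

Both commands select the same line; the single-quoted form is simply easier to read.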
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead:
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.
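As a quick check, both approaches can be run against a sample line. This is a sketch under assumptions: the HF and RMSD fields are invented, only the MP2 field comes from the question, and the awk call passes the separator with -v RS='\\', a slight variation on the command above.

```shell
# Made-up log fragment; only \MP2=-193.0977448\ is taken from the question.
log_line='\HF=-192.5364991\MP2=-193.0977448\RMSD=1.2e-09\'

# sed: match the whole line and keep only the captured number.
mp2_sed=$(printf '%s\n' "$log_line" | sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p')

# awk: split records on backslash and fields on "=", then print the MP2 value.
mp2_awk=$(printf '%s\n' "$log_line" | awk -v RS='\\' -F= '$1 == "MP2" { print $2 }')

printf '%s\n' "$mp2_sed" "$mp2_awk"
```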
