Using sed to substitute text around dynamic filename - bash

I'm trying to figure out the best method for substituting text in a BASH script. Sed seems to be the best option, but correct me if I'm wrong.
What I'd like to do is take every instance of images/< filename >.png in a file, and add surrounding text - {{media("images/.< filename >.png")}}. The following code is the closest I've been able to get:
sed -i -e 's:images/.*.png:{{media("images/.*.png")}}:g' file.html
How can I make this happen?

In sed, & in the substitution will be replaced with the matched string, so if we can assume no spaces in a filename and a word boundary before and after each, this does what you want:
s:\bimages/\S*\.png\b:{{media("&")}}:g
Try it online!
Apart from doing the substitution, there are a couple issues with your code worth mentioning:
images/.*.png will match images/foo.png, but it will also match images/foopng. Don't forget to escape regex characters: images/.*\.png.
sed quantifiers are always greedy. Suppose you had this input:
Foo images/bar.png baz images/qux.png quux
In this case, the expression images/.*\.png would match everything from the first images to the last .png. The solution above avoids this by using \S instead of . to match only non-whitespace characters.

Related

Remove word from url

I Need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying through sed, but unsuccessfully.
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I get it again:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick rerun on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)
If you've managed to get the address into a variable then one parameter expansion idea:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3

Extract text between two special characters

Trying to extract the text between the special characters "\ and \" through sed
Ex: "\hell##$\"},
expected output : hell##$
You can do it quite easily with using a capture-group and backreference with basic regular-expressions:
sed 's/^["][\]\([^\]*\).*$/\1/'
Explanation
Normal substitution sed 's/find/replace/, where
find is ^["][\] a double-quote and \ before beginning the capture \(...\) which contains [^\]* (zero or more characters not a \), the closing of the capture \) and then .*$ the remainder of the string;
replace is \1 (the first backreference) containing the text captured between \(...\).
(note: if your "\ doesn't begin the string, remove the first '^' anchor)
Example
$ echo '"\hell##$\"},' | sed 's/^["][\]\([^\]*\).*$/\1/'
hell##$
Look things over and let me know if you have questions.
This might work for you (GNU sed):
sed -nE '/"\\[^\\]*\\+([^\\"][^\\]*\\+)*"/{s/"\\/\n/;s/.*\n//;s/\\"/\n/;P;D}' file
The solution comes in two parts:
Firstly, a regexp to determine whether a pair of two characters exists. This can be tricky as a negated class is insufficient because edge cases can easily defeat a simplistic approach.
Secondly, once a pair of characters does exist the text between them must be extracted piece meal.

Remove "../" from text file using sed

I have a text file containing text such as this
../path-to-image/folder1/image.jpg path-to-another-image/folder2/image.png
I would like to remove the "../" part and obtain
path-to-image/folder1/image.jpg path-to-another-image/folder2/image.png
I have tried using sed with
sed -i 's#../##g' file.txt
But I obtain the following:
path-to-imafoldeimage.jpg path-to-another-imafoldeimage.png
All the slashes and some other characters were removed and thus the path to my images was broken.
I looked up how to make it match exactly the string using
\<\>
sed 's#\<../\>#%%#g' file.txt
But the output is identical to input. Is there a way to remove "../" using sed? I need this from command line since I have about 10 files with similar path structures which I will copy into a bunch of directories. Meaning I can't do this manually.
.s have special meaning in regex syntax, and need to be escaped.
Either [.] (creating a character class of size one) or \. will suffice; I strongly advise the former, as it works properly in a wider array of quoting contexts. Thus:
sed -i 's#[.][.]/##g' file.txt
Dots are special characters in regex. They mean any character (except a newline). So you need to escape them with backslashes in the sed command:
sed -i 's#\.\./##g' file.txt
Do sed -i 's/\.\.\///g' file.txt
's/\.\.\///g' replaces ../ with an empty string, as of the syntax 's/string/replacement/g'
\.\.\/ escapes the dots and the slash, which is necessary because dots and slashes are special characters in regex. After escaping \.\.\/, the string reads ../.
The following two slashes surround the replacement string, which is empty in this case.
Edit:
For easier legibility (and to avoid escaping the slash):
sed -i 's#\.\.\/##g' file.txt. This is much closer to your initial attempt, and as a revised explanation, \.\./ translates to ../, as the slash no longer needs to be escaped. The dots are still special characters and must be escaped with the backslash.

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead;
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.

Print all characters upto a matching pattern from a file

Maybe a silly question but I have a text file that needs to display everything upto the first pattern match which is a '/'. (all lines contain no blank spaces)
Example.txt:
somename/for/example/
something/as/another/example
thisfile/dir/dir/example
Preferred output:
somename
something
thisfile
I know this grep code will display everything after a matching pattern:
grep -o '/[^\n]*' '/my/file.txt'
So is there any way to do the complete opposite, maybe rm everything after matching pattern or invert to display my preferred output?
Thanks.
If you're calling an external command like grep, you can get the same results your require with the sed command, i.e.
echo "something/as/another/example" | sed 's:/.*::'
something
Instead of focusing on what you want to keep, think about what you want to remove, in this case everything after the first '/' char. This is what this sed command does.
The leading s means substitute, the :/.*: is the pattern to match, with /.* meaning match the first /' char and all characters after that. The 2nd half of thesedcommand is the replacement. With::`, this means replace with nothing.
The traditional idom for sed is to use s/str/rep/, using / chars to delimit the search from the replacement, but you can use any character you want after the initial s (substitute) command.
Some seds expect the / char, and want a special indication that the following character is the sub/replace delimiter. So if s:/.*:: doesn't work, then s\:/.*:: should work.
IHTH.
Yu can use a much simpler reg exp:
/[^/]*/
The forward slash after the carat is what you're matching to.
jsFiddle
Assuming filename as "file.txt"
cat file.txt | cut -d "/" -f 1
Here, we are cutting the input line with "/" as the delimiter (-d "/"). Then we select the first field (-f 1).
You just need to include starting anchor ^ and also the / in a negated character class.
grep -o '^[^/]*' file

Resources