How to use sed regex pattern matching - bash

I'm learning bash and I'm trying to parse a webpage (https://chromium-i18n.appspot.com/ssl-address) and extract the href of interest using sed. The pattern I'm using is:
/<a\shref=\'\/ssl-address\/data\/([^\"]*)\'>/siU
However, I can't get the expression to work with sed. When I run:
data=$(wget ${serviceUrl} -q -O -)
parsedData=$(sed '/<a\shref=\'\''\/ssl-address\/data\/([^\"]*)\'\''>/siU/' <<< ${data})
echo ${parsedData}
I get the following error:
sed: 1: "/<a\shref=\'\/ssl-addre ...": unterminated substitute pattern
What am I doing wrong?

Is this what you're trying to do?
$ wget 'https://chromium-i18n.appspot.com/ssl-address' -q -O - |
sed -n 's:.*/ssl-address/data/\([^'\'']*\).*:\1:p'
AC
AD
AD/Canillo
AD/Encamp
I see you're getting some answers that use double quotes instead of single quotes around the sed script, so that you can write "...'..." instead of '...'\''...'. Tempting as that is, and although it would work for this particular example, don't do it. To avoid surprises now or when your requirements change later, always enclose strings and scripts in single quotes unless you need the shell to interpret them; in that case use double quotes, unless you also need the shell to perform globbing and file name expansion on them, in which case use no quotes.
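A minimal offline sketch of the command above, assuming the page's links look like the sample lines below. The sample markup is hypothetical and stands in for the wget output:

```shell
# Hypothetical sample input standing in for the downloaded page.
html="<li><a href='/ssl-address/data/AC'>AC</a></li>
<li><a href='/ssl-address/data/AD/Canillo'>AD/Canillo</a></li>
<p>no link on this line</p>"

# -n suppresses the default output; the p flag prints only the lines
# where the substitution succeeded. '\'' embeds a literal single quote
# in the otherwise single-quoted script.
codes=$(printf '%s\n' "$html" |
  sed -n 's:.*/ssl-address/data/\([^'\'']*\).*:\1:p')

printf '%s\n' "$codes"
```

Using : as the s delimiter means the slashes in the path need no escaping.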

All right, you are trying to parse an entire webpage.
This requires deleting all the lines you don't need.
As @Ed Morton said, you can use something other than sed.
As you told us in a comment, this is your webpage, so you first need to download it.
Note that how you download the page source can change some details (e.g. copy-pasting it from the Firefox console gives you href=", while wget gives you href=').
That said, let's use wget, as you are currently doing in your question.
# This will create the ssl-address file
wget "https://chromium-i18n.appspot.com/ssl-address"
# This will give you a list of all of the links in a href.
sed -e "/<a href='.*/! d" -e "s/<a href='\/ssl-address\/data\/\(.*\)'.*/\1/" ssl-address
EDIT:
Reading your comments, I saw that you would like to filter some of the output (e.g. delete all the example links).
This can be done by adding another sed expression that deletes the lines you don't need.
In your case you just need to add -e "/<a href='\/ssl-address\/examples.*/d", so the whole command becomes:
sed -e "/<a href='.*/! d" -e "/<a href='\/ssl-address\/examples.*/d" -e "s/<a href='\/ssl-address\/data\/\(.*\)'.*/\1/" ssl-address
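The three expressions can be tried without downloading anything. This is only a sketch: the sample markup is hypothetical, and it assumes each link starts its own line, since text before <a on a line would survive the substitution.

```shell
# Hypothetical sample input; the real page's markup may differ.
page="<html>
<a href='/ssl-address/data/AC'>AC</a>
<a href='/ssl-address/examples/AC'>AC example</a>
<a href='/ssl-address/data/AD'>AD</a>
</html>"

# 1st expression: delete every line that has no <a href='...
# 2nd expression: delete the links that point at /ssl-address/examples
# 3rd expression: reduce the remaining links to the part after /data/
links=$(printf '%s\n' "$page" |
  sed -e "/<a href='/!d" \
      -e "/<a href='\/ssl-address\/examples/d" \
      -e "s/<a href='\/ssl-address\/data\/\(.*\)'.*/\1/")

printf '%s\n' "$links"
```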

You probably want something like this, based on that input data:
sed -e "s/.*href='\([^']*\)'.*/\1/"
It says, "match anything .* followed by the literal characters href=' followed by anything other than the ' character [^']* (we capture using the \( ... \) notation) followed by the ' character followed by anything".
Note I used the " to enclose the sed expression, to avoid you having to quote the '.
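One detail worth checking with this form: without -n, lines that don't match are printed unchanged. A small sketch with hypothetical input shows the difference between the plain substitution and the -n ... p variant:

```shell
# Hypothetical input: one line with a link, one without.
input="<a href='/ssl-address/data/AC'>AC</a>
plain text"

# Without -n, the non-matching line passes through untouched.
all=$(printf '%s\n' "$input" | sed -e "s/.*href='\([^']*\)'.*/\1/")

# With -n and the p flag, only substituted lines are printed.
only=$(printf '%s\n' "$input" | sed -n -e "s/.*href='\([^']*\)'.*/\1/p")

printf '%s\n' "$all" "---" "$only"
```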

Related

replace file path name from multiple file using sed

I want to replace <lexicon uri="file://C:/image/png/grammars/custom/image-custom.lex?SWI.type=backup"/><lexicon uri="file://C:/image/jpg/grammars/custom/image-dot-custom.lex?SWI.type=backup"/> with null in multiple files.
The code is given below.
sed -i s|<lexicon uri="file://C:/image/png/grammars/custom/image-custom.lex?SWI.type=backup"/><lexicon uri="file://C:/image/jpg/grammars/custom/image-dot-custom.lex?SWI.type=backup"/>||g *
Here I am getting this error:
< was unexpected at this time.
Please clarify for me what is not working here.
Could you please try the following and let me know if it helps? With # as sed's separator you don't need to escape /; you only need to escape . and ? so that they lose their special meaning.
sed -E 's#<lexicon uri="file://C:/image/png/grammars/custom/image-custom\.lex\?SWI\.type=backup"/><lexicon uri="file://C:/image/jpg/grammars/custom/image-dot-custom\.lex\?SWI\.type=backup"/>##' Input_file
Tested it with:
sed --version
GNU sed version 4.2.1
works with #
sed -i -e 's#<lexicon uri="file://C:/image/png/grammars/custom/image-custom.lex?SWI.type=backup"/><lexicon uri="file://C:/image/jpg/grammars/custom/image-dot-custom.lex?SWI.type=backup"/>##g' test.txt
The pattern contains shell metacharacters, which need to be quoted or escaped. Usually, in Bash, you should use single quotes around strings, unless you need the shell to interpolate variables and command substitutions and interpret backslash sequences (in which case use double quotes) or to also perform whitespace tokenization and wildcard expansion (in which case use no quotes). See also When to wrap quotes around a shell variable?
sed -i 's|<lexicon uri="file://C:/image/png/grammars/custom/image-custom.lex?SWI.type=backup"/><lexicon uri="file://C:/image/jpg/grammars/custom/image-dot-custom.lex?SWI.type=backup"/>||' *
I also took out the g flag, which only makes sense if you expect multiple matches within a single line. (Perhaps you do after all, in which case obviously put it back.)
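Why the -E answer escapes ? while the others don't: in sed's default basic regular expressions ? is an ordinary character, but with -E it becomes a quantifier. A sketch using a shortened, hypothetical URI:

```shell
# Shortened, hypothetical stand-in for one of the lexicon elements.
line='<lexicon uri="a.lex?SWI.type=backup"/>'

# Basic regex (default): ? is literal, so the whole line matches.
bre=$(printf '%s\n' "$line" |
  sed 's|<lexicon uri="a.lex?SWI.type=backup"/>||')

# Extended regex (-E): x? means "optional x", so the literal ? in the
# input is never matched and the line is left unchanged.
ere=$(printf '%s\n' "$line" |
  sed -E 's|<lexicon uri="a.lex?SWI.type=backup"/>||')

# Extended regex with . and ? escaped: matches again.
esc=$(printf '%s\n' "$line" |
  sed -E 's|<lexicon uri="a\.lex\?SWI\.type=backup"/>||')

printf 'BRE: [%s]\nERE: [%s]\nERE escaped: [%s]\n' "$bre" "$ere" "$esc"
```

An unescaped . still matches any character in both modes, so escaping it makes the match stricter even where it isn't required.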

proper syntax for the s command along to the addressing in sed

I want to issue this command from the bash script
sed -e $beginning,$s/pattern/$variable/ file
but every combination of quotes I have tried gives me an error. The only one that works,
sed -e "$beginning,$"'s/pattern/$variable/' file
is also no good, because it does not dereference the variable.
Can my approach be implemented with sed?
Feel free to switch the quotes up. The shell can keep things straight.
sed -e "$beginning"',$s/pattern/'"$variable"'/' file
You can try this:
$ sed -e "$beginning,$ s/pattern/$variable/" file
Example
file.txt:
one
two
three
Try:
$ beginning=1
$ variable=ONE
$ sed -e "$beginning,$ s/one/$variable/" file.txt
Output:
ONE
two
three
There are two types of quotes:
Single quotes preserve their contents (> is the prompt):
> var=blah
> echo '$var'
$var
Double quotes allow for parameter expansion:
> var=blah
> echo "$var"
blah
And two types of $ sign:
One to tell the shell that what follows is the name of a parameter to be expanded
One that stands for "last line" in sed.
You have to combine these so
The shell doesn't think sed's $ has anything to do with a parameter
The shell parameters still get expanded (can't be within single quotes)
The whole sed command is quoted.
One possibility would be
sed "$beginning,\$s/pattern/$variable/" file
The whole command is in double quotes, so the parameters ($beginning and $variable) get expanded. To make sure the shell doesn't try to expand $s, which doesn't exist, the "last line" $ is escaped with a backslash.
Other options are
Double quoting everything but adding a space between $ and s (see Ren's answer)
Mixing quoting types as needed (see Ignacio's answer)
Methods that don't work
sed '$beginning,$s/pattern/$variable/' file
Everything in single quotes: the shell parameters are not expanded (doesn't follow rule 2 above). $beginning is not a valid address, and pattern would be literally replaced by $variable.
sed "$beginning,$s/pattern/$variable/" file
Everything in double quotes: the parameters are expanded, including $s, which isn't supposed to be (doesn't follow rule 1 above).
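The escaped-dollar form and the mixed-quoting form can be compared side by side. This is a sketch with made-up values for beginning and variable, reading input from a pipe instead of a file:

```shell
beginning=2      # hypothetical first line of the range
variable=ONE     # hypothetical replacement text

# \$ reaches sed as a plain $ ("last line"); $beginning and $variable
# are expanded by the shell before sed runs.
escaped=$(printf '%s\n' one two one |
  sed "$beginning,\$s/one/$variable/")

# Same script built by switching quote types: sed's $ sits inside
# single quotes, the shell variables inside double quotes.
mixed=$(printf '%s\n' one two one |
  sed "$beginning"',$s/one/'"$variable"'/')

printf '%s\n' "$escaped" "---" "$mixed"
```

In both cases sed receives the script 2,$s/one/ONE/, so only the third line changes.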
The following form worked for me from within a script:
sed $beg,$ -e s/pattern/$variable/ file
The same form also works when executed from the shell.

How to use bash variables in sed command?

I want to do (in bash script):
NEWBASE=`echo $NAME | sed "s/${DIR}//g" | sed 's/.\///g'`
I read on the net that I have to replace the single quotes with double quotes.
Unfortunately, this is not working. Why? Thanks
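sed is overkill for this. Use parameter expansion:
# Remove every occurrence of $DIR (quoted so it is matched literally)
NEWBASE=${NAME//"$DIR"/}
# Remove every occurrence of ./
NEWBASE=${NEWBASE//.\//}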
It is important to understand that bash and sed are two completely independent things. When you give bash a command, it first processes it according to its rules, in order to come up with a utility name and a set of arguments for that utility (in this case sed), and then calls the utility with the arguments.
Probably $DIR contains a slash character. Perhaps it looks something like /usr/home/codyline/src.
So when bash substitutes that into the argument to the sed command:
"s/${DIR}//g"
the result is
s//usr/home/codyline/src//g
which is what is then passed to sed. But sed can't understand that command: it has far too many / characters.
If you really want to use sed for this purpose, you need to use a delimiter other than /, and it needs to be a character you are confident will never appear in $DIR. Fortunately, the sed s command allows you to use any character as a delimiter: whatever character follows the s is used as the delimiter. But there always must be exactly three of them in the command.
For example, you might believe that no directory path contains a colon (:), in which case you could use:
sed "s:${DIR}::g"
Of course, someday that will fail precisely because you have a directory with a colon in its name. So you could make things more general by using bash's substitute-and-replace feature to backslash-escape all the colons:
sed "s:${DIR//:/\:}::g"
But you could have used this bash feature in order to avoid the use of sed altogether:
NEWBASE=${NAME//$DIR}
Unfortunately, you can't nest bash substitute-and-replaces, so you need to do them sequentially:
NEWBASE=${NEWBASE//.\/}
Note: I used ${var//...}, which is the equivalent of specifying the g flag in a sed s command, but I really don't know if it is appropriate. Do you really expect multiple instances of $DIR in a single path? If there are multiple instances, do you really want to remove all of them? You'll have to decide.
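The delimiter problem described above can be reproduced with the example value of DIR. This is a sketch only: NAME is a hypothetical path, and sed's error message is discarded so that just the outcome is shown.

```shell
DIR=/usr/home/codyline/src              # example value from the text above
NAME=/usr/home/codyline/src/./file.txt  # hypothetical path

# With / as the delimiter, sed receives s//usr/home/codyline/src//g
# and rejects it.
if printf '%s\n' "$NAME" | sed "s/${DIR}//g" >/dev/null 2>&1; then
  slash_status=ok
else
  slash_status=failed
fi

# With : as the delimiter, the slashes in $DIR are ordinary characters.
NEWBASE=$(printf '%s\n' "$NAME" | sed "s:${DIR}::g" | sed 's:\./::g')

printf '%s\n' "$slash_status" "$NEWBASE"
```

The second sed escapes the dot (\./) so that it removes only a literal ./ rather than any character followed by a slash.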

Simple bash script giving me problems

I'm having difficulty getting this bash script to perform the formatting of an input.
It's pretty straight-forward, but when it executes the line that starts with 'newstring=', it doesn't perform the sed operation, it only prints my input (up until the first white-space) then prints my sed command directly after. What am I doing wrong?
#! /bin/bash
##format paths/strings with spaces to escape the spaces with a forward-slash'\'
##then use 'open' to open finder at current-set directory (based on path)
oldstring="$1"
newstring="$oldstring | sed 's/ /\\ /g')"
cd $newstring
open .
You should simply do:
cd "$1"
open .
This avoids running sub-processes and deals with various problems that the sed script doesn't (such as names containing $ symbols, or other shell metacharacters). Generally, if a variable (or positional parameter such as $1) is a file name that could contain spaces, use it surrounded by double quotes every time.
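A quick way to convince yourself that the quotes are enough, sketched with a scratch directory whose name (hypothetical) contains a space:

```shell
# Create a scratch directory with a space in one path component.
base=$(mktemp -d)
mkdir "$base/my folder"
target="$base/my folder"

# Double quotes keep the path as a single argument to cd; no backslash
# escaping is needed. The subshell keeps the current directory unchanged.
here=$(cd "$target" && basename "$PWD")
printf '%s\n' "$here"

# Remove the scratch directory again.
rm -rf "$base"
```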
Try putting the command in a command substitution, like
newstring=$(echo "$oldstring" | sed 's/ /\\ /g')
@Jonathan Leffler's is the correct solution, since adding escapes doesn't actually do what you want but double-quoting does. However, I'll take this opportunity to point out that there's a better way to add escapes, using bash's built-in substitution capability instead of sed (the double slash replaces every space, not just the first):
newstring="${oldstring// /\\ }"
So there you have it, a better way to implement the wrong solution. Personally, I voted for Jonathan's.

Sed not working inside bash script

I believe this may be a simple question, but I've looked everywhere and tried some workarounds, but I still haven't solved the problem.
Problem description:
I have to replace a character inside a file and I can do it easily using the command line:
sed -e 's/pattern1/pattern2/g' full_path_to_file/file
But when I use the same line inside a bash script I can't seem to be able to replace it, and I don't get an error message, just the file contents without the substitution.
#!/bin/sh
VAR1="pattern1"
VAR2="pattern2"
VAR3="full_path_to_file"
sed -e 's/${VAR1}/${VAR2}/g' ${VAR3}
Any help would be appreciated.
Thank you very much for your time.
Try
sed -e "s/${VAR1}/${VAR2}/g" ${VAR3}
Bash reference says:
The characters ‘$’ and ‘`’ retain their special meaning within double quotes
Thus the shell will be able to expand your variables before sed sees the script.
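The difference is easy to see by running both forms on the same input. This is a sketch with the variable names from the question and a single hypothetical input line:

```shell
VAR1=pattern1
VAR2=pattern2

# Single quotes: sed receives the literal text ${VAR1}, which does not
# occur in the input, so nothing is replaced and no error is reported.
single=$(printf 'pattern1\n' | sed -e 's/${VAR1}/${VAR2}/g')

# Double quotes: the shell expands the variables first, so sed receives
# s/pattern1/pattern2/g.
double=$(printf 'pattern1\n' | sed -e "s/${VAR1}/${VAR2}/g")

printf 'single quotes: %s\ndouble quotes: %s\n' "$single" "$double"
```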
I use a script like yours... and mine works as well!
#!/bin/sh
var1='pattern1'
var2='pattern2'
sed -i "s&$var1&$var2&g" *.html
Note that mine uses -i, and that the separator character I use, "&", is different from yours.
The separator can be any character that DOES NOT APPEAR IN THE PATTERN.
You can use:
sed -i "s#$var1#$var2#g" *.html
sed -i "s@$var1@$var2@g" *.html
...
If your pattern is "test@email.com", you must of course use a separator other than "@", such as "#" or "%".

Resources