I'm using sed to parse an HTML page; here is the code:
name=`echo $p | sed -n 's/.*href=\"\([^"]*\)" class=\"alleLink iTitle\"><span>\([^<]*\)<\/span>.*/\1/p'`;
When there is a match it works fine and returns the required substring. But when there is no match, sed just freezes and the script does nothing. I just want to get an empty string or something like that.
Do you know what to do?
Thanks
Roman Zkamene
I recommend installing the Perl module WWW::Mechanize with the command
cpan -i WWW::Mechanize
or search your package manager for perl.*mechanize.
You will then be able to run this command in the shell (interactively or not) to see all the links on a page:
mech-dump --links http://foobar.tld
Moreover, sed is not the right tool to parse HTML. Python, Ruby, or Perl will be your best bet.
For example:
python + lxml or python + Beautiful Soup
perl + WWW::Mechanize
One more thing:
You can use any character you want as the sed delimiter, so escaping / is not necessary, and the command becomes more readable for everyone.
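As a rough sketch of the delimiter tip, the command from the question could be written with | as the delimiter, so that </span> needs no escaping. The sample markup in p and the helper name extract_href are made up for illustration; only the pattern comes from the question. With -n and the p flag, a line that does not match simply yields empty output.

```shell
# Hypothetical sample input; the real page markup is an assumption.
p='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'

# extract_href is an illustrative helper name, not part of the question.
# Using | as the s command delimiter means / needs no escaping.
extract_href() {
    printf '%s\n' "$1" |
        sed -n 's|.*href="\([^"]*\)" class="alleLink iTitle"><span>[^<]*</span>.*|\1|p'
}

name=$(extract_href "$p")              # line matches: the href value
none=$(extract_href 'no link here')    # no match: empty string

printf 'match: [%s]\nno match: [%s]\n' "$name" "$none"
```

Quoting the variable and using printf instead of an unquoted echo $p also keeps the shell from word-splitting or glob-expanding the HTML before sed sees it.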
A couple of points:
This one inevitably has to come first
You can simplify the expression using the -r switch for sed (GNU sed; -E is the more portable spelling)
I want to clean my file before/after saving, so I have to delete some unnecessary characters from it. Sadly, even though my regex works in Regex101, it does not work in the shell script I wrote.
I am getting my list from Kubernetes via
kubectl get pods -n $1 -o jsonpath='{range .items[*]}{#.spec.containers[*].image}{","}{#.status.containerStatuses[*].imageID}{"\n"}{end}'
I then save it to a temp file and use sed to clean it: the regex should match, and sed should delete, any characters between , and # (and the # itself). I am escaping them since they are special characters.
sed -i 's/(?<=\,)(.*?)(?<=\#)//g' temp
The problem is that this regex works fine (for example in Regex101) but does not work with the sed command. I even tried awk but got the same output.
awk '!/(?<=\,)(.*?)(?<=\#)/' temp
Am I missing something or is the regex acting differently somehow in Unix/shell?
Thanks for any input.
Example content of the file (for test):
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,docker-pullable://docker.elastic.co/elasticsearch/elasticsearch#sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,docker-pullable://bitnami/mongodb#sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,docker-pullable://bitnami/postgresql#sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,docker-pullable://bitnami/zookeeper#sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
Expected output:
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
You are trying to use Perl extensions which are not supported by more traditional regex tools like sed and Awk.
Perhaps see also Why are there so many different regular expression dialects? and the Stack Overflow regex tag info page.
If I can guess what you are trying to do, you want simply
sed -i 's/,[^#]*#/,/g' temp
The /g flag is unnecessary if you only expect one match per line.
Neither , nor # is a regex metacharacter; they do not require escaping.
Usually you would want to avoid using a temporary file or sed -i; perhaps simply
kubectl blah blah | sed 's/,[^#]*#/,/' > temp
to create the file, or remove the redirection if you want to pipe the results further.
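A minimal check of that substitution, run against one of the sample lines from the question (the variable names line and cleaned are only for illustration):

```shell
# One of the sample lines from the question.
line='docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662'

# Replace the comma, everything up to the next #, and the # itself
# with a single comma. No lookbehind is needed.
cleaned=$(printf '%s\n' "$line" | sed 's/,[^#]*#/,/')

printf '%s\n' "$cleaned"
```

Because [^#]* cannot run past the first #, the match stops exactly where the lazy .*? in the Perl-style regex was meant to stop.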
I want to parse a file and replace the text between "::" and ":::" with the text already there, just now capitalized.
I've tried using this command:
sed 's/\(::\)\(.*\)\(:::\)/\1\U\2\E\3/' filename
but the output just puts a U at the beginning and an E at the end of the string I want capitalized.
Works for me, which makes me think you may not be on Linux?
echo "This is :: some sample text ::: to test uppercasing" | sed 's/\(::\)\(.*\)\(:::\)/\1\U\2\E\3/'
This is :: SOME SAMPLE TEXT ::: to test uppercasing
If Perl is an option, you can say something like:
echo "This is :: some sample text ::: to test uppercasing" | perl -pe 's/(::)(.*)(:::)/$1\U$2\E$3/'
This is :: SOME SAMPLE TEXT ::: to test uppercasing
gawk '{match($0,/::.*:::/,a) ;gsub(/::.*:::/,toupper(a[0]))}1' input
Here is a slightly less cryptic solution with gawk: match finds the desired string, and gsub then replaces it with its upper-case form, produced by the toupper function.
You are pretty close.
On Mac OS X, you will need to install GNU sed, because the feature you are using - \U - is a GNU extension.
So, start by installing it:
▶ brew install gnu-sed
Then I normally stick in some code like this somewhere:
shopt -s expand_aliases
alias sed='/usr/local/bin/gsed'
And then your GNU sed will work.
Finally, I would simplify that code as:
▶ sed -E 's/(::)(.*)(::)/\1\U\2\E\3/' <<< "foo::bar::baz"
foo::BAR::baz
Noting that -E gives you Extended Regular Expressions, and a cleaner syntax when you are doing captures.
This might work for you (GNU sed):
sed 's/::[^:]*:::/\U&/' file
or perhaps:
sed 's/::[^:]*:::/\n&\n/;h;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;G;s/.*\n\(.*\)\n.*\n\(.*\)\n.*\n/\2\1/' file
The second version uses sed's native y (transliterate) command, pattern matching, and a copy kept in the hold space.
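If installing GNU sed is not an option, one possible alternative is plain awk, using only match, substr and toupper, which POSIX awk provides. This is a sketch, not taken from any of the answers above; the function name upper_between is hypothetical, and it assumes the text between the markers contains no colon.

```shell
# upper_between is an illustrative name. It upper-cases the first
# "::...:::" span on each line and leaves other lines untouched.
upper_between() {
    awk '{
        # match sets RSTART and RLENGTH to the position and length of the span
        if (match($0, /::[^:]*:::/)) {
            $0 = substr($0, 1, RSTART - 1) \
                 toupper(substr($0, RSTART, RLENGTH)) \
                 substr($0, RSTART + RLENGTH)
        }
        print
    }'
}

result=$(printf '%s\n' 'This is :: some sample text ::: to test uppercasing' | upper_between)
printf '%s\n' "$result"
```

Splicing with substr avoids gsub, so an & in the matched text cannot be misread as "the whole match" in a replacement string.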
I have a config.yaml file which contains among other values the following list of kafka brokers which I want to remove from the config using a bash script.
kafka.brokers:
- "node003"
- "node004"
I am doing this currently by invoking vi from inside the script using the command:
vi $CONF_BENCHMARK/config.yaml -c ":%s/kafka.brokers:\(\n\s*-\s".*"\)*/kafka.brokers:/g" -c ':wq!'
I understand that sed is a more appropriate tool to accomplish the same task but when I try to translate the above regex to sed, it does not work.
sed -i -e "s/kafka.brokers:\(\n\s*-\s".*"\)*/kafka.brokers:/g" $CONF_BENCHMARK/config.yaml
Am I doing something wrong?
awk to the rescue!
sed is line based, this should work...
$ awk 's{if(/\s*-\s*"[^"]*"/) next; else s=0} /kafka.brokers:/{s=1}1' file
Explanation
if(/\s*-\s*"[^"]*"/) next    if the pattern matches, skip to the next line
s{if(/\s...                  check the pattern only if s is set
/kafka.brokers:/{s=1}        when the header is seen, set s
1                            shorthand for printing lines (if not skipped)
s{... else s=0}              if s was set but the pattern was not found, reset s
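The same state machine can be spelled out a little more verbosely. The sketch below is a variation on the one-liner above, not the original author's code: it uses [[:space:]] instead of \s, which not every awk supports, and the function name remove_brokers and the surrounding sample keys are made up for the demonstration.

```shell
# remove_brokers is an illustrative name; it reads YAML on stdin.
remove_brokers() {
    awk '
        # While inside the list, drop lines that look like: - "something"
        in_list && /^[[:space:]]*-[[:space:]]*"[^"]*"/ { next }
        # Any other line ends the list
        { in_list = 0 }
        # Seeing the key starts a new list
        /kafka\.brokers:/ { in_list = 1 }
        { print }
    '
}

# Hypothetical sample config: only the kafka.brokers part is from the question.
result=$(printf '%s\n' 'a: 1' 'kafka.brokers:' '  - "node003"' '  - "node004"' 'other: x' | remove_brokers)
printf '%s\n' "$result"
```

List items that do not directly follow the kafka.brokers: header are left alone, because in_list is reset by the first line that is not a list item.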
As others have pointed out, you will need to be explicit to get sed working with multiple lines.
The real answer is to use AWK; a beautiful answer is provided by karakfa. But for educational purposes I will provide a sed answer:
sed '
/kafka.brokers/ {
:a
$be
N
/\n[[:space:]]*-[[:space:]]"[^\n]*"[^\n]*$/ba
s/\n.*\(\n\)/\1/
P;D
:e
s/\n.*//
}
' input
Basically, sed keeps appending lines to the pattern space, starting at kafka.brokers, until \n[[:space:]]*-[[:space:]]"[^\n]*"[^\n]*$ no longer matches.
This leaves the pattern space with one trailing line in it, i.e.:
kafka.brokers:\n - "node003"\n - "node004"\nother stuff$
Replacing everything \n.*\(\n\) with a newline leaves the following pattern space:
kafka.brokers:\nother stuff$
P;D prints the first line of the pattern space and then restarts the cycle with the remaining pattern space. This lets the script handle input such as:
kafka.brokers:
- "node003"
- "node004"
kafka.brokers:
- "node005"
more_input
Your Vim pattern matches across multiple lines, but sed works line-by-line. (That is, it first tries to match your pattern against kafka.brokers: and fails, then it tries to match - "node003", and so on.) Your instinct to use something other than Vim was right, but sed probably isn't the best tool for the job here.
This answer addresses the problem of matching multi-line patterns with sed in more detail.
My personal recommendation would be to use a scripting language like Python or Perl to deal with complicated pattern-matching. You can run a Python command with python -c <command>, for instance, just like you did with Vim, or you could write a small Python script that you call from your Bash script. It's a little more complicated than a sed one-liner, but it will probably save you a lot of debugging and make your script easier to maintain and modify.
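Consider using yq instead of sed or awk. Deleting the key kafka.brokers then becomes as simple as the following (this is the d command of older, v3-era yq; as far as I know, yq v4 uses an expression instead, along the lines of yq 'del(."kafka.brokers")' file):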
yq d $CONF_BENCHMARK/config.yaml '"kafka.brokers"'
The following snippet demonstrates the yq delete feature:
cat <<EOF | yq d - '"kafka.brokers"'
some:
  path: value
kafka.brokers:
  - "node003"
  - "node004"
EOF
... and results in the output
some:
  path: value
For some reason I can't seem to find a straightforward answer to this and I'm on a bit of a time crunch at the moment. How would I go about inserting a choice line of text after the first line matching a specific string using the sed command. I have ...
CLIENTSCRIPT="foo"
CLIENTFILE="bar"
And I want to insert a line after the CLIENTSCRIPT= line, resulting in ...
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Try doing this using GNU sed:
sed '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
If you want to edit the file in place, use:
sed -i '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
Output
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Doc
See the sed documentation and search for the a (append) command.
Note the standard sed syntax (as in POSIX, so supported by all conforming sed implementations around (GNU, OS/X, BSD, Solaris...)):
sed '/CLIENTSCRIPT=/a\
CLIENTSCRIPT2="hello"' file
Or on one line:
sed -e '/CLIENTSCRIPT=/a\' -e 'CLIENTSCRIPT2="hello"' file
(-e expressions (and the contents of -f files) are joined with newlines to make up the sed script that sed interprets.)
The -i option for in-place editing is also a GNU extension; some other implementations (like FreeBSD's) support -i '' for that.
Alternatively, for portability, you can use perl instead:
perl -pi -e '$_ .= qq(CLIENTSCRIPT2="hello"\n) if /CLIENTSCRIPT=/' file
Or you could use ed or ex:
printf '%s\n' /CLIENTSCRIPT=/a 'CLIENTSCRIPT2="hello"' . w q | ex -s file
A sed command that works on macOS (at least OS X 10) and Unix alike (i.e. it doesn't require GNU sed, as Gilles' (currently accepted) answer does):
sed -e '/CLIENTSCRIPT="foo"/a\'$'\n''CLIENTSCRIPT2="hello"' file
This works in bash and possibly other shells that understand the $'\n' quoting style. Everything can be on one line and still work with older/POSIX sed commands. If there might be multiple lines matching CLIENTSCRIPT="foo" (or your equivalent) and you wish to add the extra line only the first time, you can rework it as follows:
sed -e '/^ *CLIENTSCRIPT="foo"/b ins' -e b -e ':ins' -e 'a\'$'\n''CLIENTSCRIPT2="hello"' -e ': done' -e 'n;b done' file
(this creates a loop after the line insertion code that just cycles through the rest of the file, never getting back to the first sed command again).
You might notice I added '^ *' to the matching pattern in case that line shows up in a comment, say, or is indented. It's not 100% perfect but covers some other situations likely to be common. Adjust as required...
These two solutions also get around a problem (for the generic case of adding a line): if your newly inserted line contains unescaped backslashes or ampersands, sed will interpret them and they will likely not come out the same, just as \n is interpreted; e.g. \0 would be the first line matched. This is especially handy if you're adding a line that comes from a variable, where you'd otherwise have to escape everything first using ${var//} or another sed statement.
This solution is a little less messy in scripts (although the quoting and \n are not easy to read) when you don't want to put the replacement text for the a command at the start of a line, say in a function with indented lines. I've taken advantage of the fact that $'\n' is evaluated to a newline by the shell, which is not the case for a regular single-quoted '\n'.
It's getting long enough, though, that I think perl or even awk might win for readability.
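For comparison, here is one way the awk route could look. It is a sketch under a few assumptions: the function name insert_after_first and the environment variable names needle and new_line are hypothetical, the search string is treated as a literal substring rather than a regex, and the line is added only after the first match. Passing the text through ENVIRON means backslashes and ampersands in it are printed as they are.

```shell
# insert_after_first is an illustrative helper name.
# $1 = literal text to look for, $2 = line to insert; input comes from stdin.
insert_after_first() {
    needle=$1 new_line=$2 awk '
        { print }
        # index is a plain substring search, so the needle is not a regex.
        !inserted && index($0, ENVIRON["needle"]) {
            print ENVIRON["new_line"]   # printed verbatim, no escape processing
            inserted = 1
        }
    '
}

# The inserted line deliberately contains & and a backslash.
result=$(printf '%s\n' 'CLIENTSCRIPT="foo"' 'CLIENTFILE="bar"' 'CLIENTSCRIPT="foo"' |
    insert_after_first 'CLIENTSCRIPT="foo"' 'CLIENTSCRIPT2="a&b\n"')
printf '%s\n' "$result"
```

The second CLIENTSCRIPT="foo" line is left alone because the inserted flag is already set by then.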
A POSIX compliant one using the s command:
sed '/CLIENTSCRIPT="foo"/s/.*/&\
CLIENTSCRIPT2="hello"/' file
Maybe a bit late to post an answer for this, but I found some of the above solutions a bit cumbersome.
I tried simple string replacement in sed and it worked:
sed 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
The & stands for the matched string; you then add \n and the new line. (Treating \n in the replacement as a newline is a GNU sed feature.)
As mentioned, if you want to do it in-place:
sed -i 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
Another thing. You can match using an expression:
sed -i 's/CLIENTSCRIPT=.*/&\nCLIENTSCRIPT2="hello"/' file
Hope this helps someone
The awk variant :
awk '1;/CLIENTSCRIPT=/{print "CLIENTSCRIPT2=\"hello\""}' file
I had a similar task, and was not able to get the above perl solution to work.
Here is my solution:
perl -i -pe "BEGIN{undef $/;} s/^\[mysqld\]$/[mysqld]\n\ncollation-server = utf8_unicode_ci\n/sgm" /etc/mysql/my.cnf
Explanation:
This uses a regular expression to search my /etc/mysql/my.cnf file for a line that contains only [mysqld] and replaces it with
[mysqld]
collation-server = utf8_unicode_ci
effectively adding the collation-server = utf8_unicode_ci line after the line containing [mysqld].
I had to do this recently as well, on both macOS and Linux. After browsing through many posts and trying many things, I never quite got what I wanted: a solution that is simple enough to understand, uses well-known standard commands with simple patterns, fits on one line, is portable, and can be extended with more constraints. Then I looked at it from a different perspective and realized I could do without the one-liner requirement if a two-liner met the rest of my criteria. In the end I came up with this solution, which works on both Ubuntu and Mac and which I wanted to share with everyone:
insertLine=$(( $(grep -n "foo" sample.txt | cut -f1 -d: | head -1) + 1 ))
sed -i -e "$insertLine"' i\'$'\n''bar'$'\n' sample.txt
In the first command, grep finds the line numbers of lines containing "foo", cut/head select the first occurrence, and the arithmetic expression increments that line number by 1, since I want to insert after the occurrence.
The second command is an in-place file edit, with "i" for insert: an ANSI-C-quoted newline, "bar", then another newline. The result is a new line containing "bar" after the "foo" line. Each of these two commands can be expanded to more complex operations and matching.
I'm looking through an Oracle script I found online, but it runs a sed command to filter results from a trace file. I'm running Oracle on a Windows server, so the sed command isn't recognized.
host sed -n '/scattered/s/.*p3=//p' &Trace_Name | sort -n | tail -1
I've tried reading the online documentation, but am still not sure how to interpret what this command is trying to filter. Would anyone be so kind as to help me interpret what this command is trying to filter? Or better yet, what I can run from a Windows command prompt to achieve the same result.
Thanks!
It says "on lines that contain 'scattered' replace zero or more of any character followed by 'p3=' with nothing (delete it, in other words) and print the result" (-n says don't print lines unless there's an explicit print command).
For this example input:
abc organized p3=123
def scattered p3=456
ghi ordered p3=789
The output would be:
456
The sed command searches a file for lines that match a pattern. The "-n" option suppresses output that is not explicitly printed. The "p" at the end of the s command prints the line only if the substitution was made.
sort -n does a numerical sort.
tail -1 prints the last line, only.
So, it seems to be searching for scattered disk reads and printing the line with the biggest value.
I think the regular expression pattern is eliminating everything up to and including "p3=". The s/from/to/ is a substitute command.
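To see the whole pipeline at work, here is a small run on invented input; the sample lines are not real Oracle trace output, just stand-ins with a p3= field, and tail -n 1 is the modern spelling of tail -1.

```shell
# Invented sample lines; a real trace file will look different.
sample='abc organized p3=123
def scattered p3=456
ghi scattered p3=16
jkl scattered p3=1024'

# Keep only the p3 values of "scattered" lines, sort them numerically,
# and take the last (largest) one.
largest=$(printf '%s\n' "$sample" |
    sed -n '/scattered/s/.*p3=//p' |
    sort -n |
    tail -n 1)

printf '%s\n' "$largest"
```

The numeric sort matters here: a plain sort would order the values as text and put 456 after 1024.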
The sed command works with cygwin, a unix-like shell for Windows.
To address the other part of your question, the unxutils project ports many GNU utilities to Win32, including sed.