How to use uBlock static filter operator - adblock

uBlock wiki says:
Filters such as ||example.com^ are still considered generic.
What is || operator? What does the above filter do?

uBlock uses/supports some AdBlockPlus syntax, you can refer to docs on it:
https://help.eyeo.com/en/adblockplus/how-to-write-filters#creating-filters
You might want to block http://example.com/banner.gif as well as https://example.com/banner.gif and http://www.example.com/banner.gif. You can do this by putting two pipe symbols in front of the filter. This ensures that the filter matches at the beginning of the domain name: ||example.com/banner.gif, and blocks all of these addresses while not blocking http://badexample.com/banner.gif or http://gooddomain.example/analyze?http://example.com/banner.gif.

Related

How to handle two dashes in ReST

I'm using Sphinx to document a command line utility written in Python. I want to be able to document a command line option, such as --region like this:
**--region** <region_name>
in ReST and then use Sphinx to to generate my HTML and man pages for me.
This works great when generating man pages but in the generated HTML, the -- gets turned into - which is incorrect. I have found that if I change my source ReST document to look like this:
**---region** <region_name>
The HTML generates correctly but now my man pages have --- instead of --. Also incorrect.
I've tried escaping the dashes with a backslash character (e.g. \-\-) but that had no effect.
Any help would be much appreciated.
This is a configuration option in Sphinx that is on by default: the html_use_smartypants option (http://sphinx-doc.org/config.html?highlight=dash#confval-html_use_smartypants).
If you turn off the option, then you will have to use the Unicode character '–' if you want an en-dash.
With
**-\\-region** <region_name>
it should work.
In Sphinx 1.6 html_use_smartypants has been deprecated, and it is no longer necessary to set html_use_smartypants = False in your conf.py or as an argument to sphinx-build. Instead you should use smart_quotes = False.
If you want to use the transformations formerly provided by html_use_smartypants, instead it is recommended to use smart_quotes, e.g., smart_quotes = True.
Note that at the time of this writing Read the Docs pins sphinx==1.5.3, which does not support the smart_quotes option. Until then, you'll need to continue using html_use_smartypants.
EDIT It appears that Sphinx now uses smartquotes instead of docutils smart_quotes. h/t #bad_coder.
To add two dashes, add the following:
.. include:: <isotech.txt>
|minus|\ |minus|\ region
Note the backward-slash and the space. This avoids having a space between the minus signs and the name of the parameter.
You only need to include isotech.txt once per page.
With this solution, you can keep the extension smartypants and write two dashes in every part of the text you need. Not just in option lists or literals.
As commented by #mzjn, the best way to address the original submitter's need is to use Option Lists.
The format is simple: a sequence of lines that start with -, --, + or /, followed by the actual option, (at least) two spaces and then the option's description:
-l long listing
-r reversed sorting
-t sort by time
--all do not ignore entries starting with .
The number of spaces between option and description may vary by line, it just needs to be at least two, which allows for a clear presentation (as above) on the source, as well as on the generated document.
Option Lists have syntax for an option argument as well (just put an additional word or several words enclosed in <> before the two spaces); see the linked page for details.
The other answers on this page targeted the original submitter's question, this one addresses their actual need.

Problem capturing data inside of a capture that is optional

It's best to start with an example and what I've gotten so far.
Sample Data:
FOO foo#acme.com 5545
<Data><Name>tester</Name><Foo>bar</Foo></Data>
Current regex:
/FOO\s(.{1,20}#[^\s]+)\s.{0,20}\s{1,2}(<Data>.{0,100}<Name>(.{0,20})<\/Name>.{0,100}<\/Data>)?/m
Matches from regex:
foo#acme.com
testerbar
tester
I've wrapped the <Data> section in parenthesis followed-by a ? because the entire data section may or may not exist. However, the <Name> section is also optional, it may or may not exist. So I tried putting parenthesis around <Name> with a question mark as well but then I don't get the matches:
/FOO\s(.{1,20}#[^\s]+)\s.{0,20}\s{1,2}(<Data>.{0,100}(<Name>(.{0,20})<\/Name>)?.{0,100}<\/Data>)?/m
I've posted my regex and sample data on a regex site to make it easier to test/validate what I'm trying to do: http://www.rubular.com/r/ZhQzlNp1vv
In the <Data> section there is <Name> and even <Foo>. The point is, there may be many different elements in <Data> and I only care about extracting data from some of them. I need to use regex for my particular situation so please don't suggest using some XML parsing library (thanks!).
Thanks in advance.
/FOO\s(\S+#\S+).*?\n(?:.{0,100}(.{0,20})</Name>.{0,100}</Data>)?/m
http://www.rubular.com/r/IhisH7HYJR
To capture an optional group, use a non-capturing group to indicate the optionality inside a capturing group.
i.e.
((?:content)?)
The outer parentheses form the capturing group - if the optional group doesn't match you get an empty string. The (?:...) is the non-capturing group, which allows you to group the content (so it can all be made optional) without capturing it.
Update:
Whenever you have a complex regex, use free-spacing comment mode (flag=x) to make it readable (and thus far easier to figure out what's going on), like this:
FOO\s(.{1,20}#[^\s]+)\s.{0,20}\s{1,2}
((?:<Data>
# upto 200 chars, excluding captured tags or end tag (repeated below)
(?:(?!<Name>|<Foo>|<Bob>|<\/Data>).){0,200}
# Capture 3:
((?:<Name>.{0,20}<\/Name>)?)
(?:(?!<Name>|<Foo>|<Bob>|<\/Data>).){0,200}
# Capture 4:
((?:<Foo>.{0,20}<\/Foo>)?)
(?:(?!<Name>|<Foo>|<Bob>|<\/Data>).){0,200}
# Capture 5:
((?:<Bob>.{0,20}<\/Bob>)?)
(?:(?!<Name>|<Foo>|<Bob>|<\/Data>).){0,200}
<\/Data>)?)
Which at rubular results in:
1. foo#acme.com
2. <Data><Name>tester</Name><Foo>bar</Foo></Data>
3. <Name>tester</Name>
4. <Foo>bar</Foo>
5.
Annoyingly rubular doesn't seem to provide a multi-line editor when x is turned on, which sucks, and it also doesn't support standard comment syntax, so I had to change those #... to (?#...) which is less readable. Oh well.
If you need the values without the tags, you'll need a separate expression to strip those.
( Or, y'know, use a tool actually designed for the job. ;) )

Regexp in ruby - can I use parenthesis without grouping?

I have a regexp of the form:
/(something complex and boring)?(something complex and interesting)/
I'm interested in the contents of the second parenthesis; the first ones are there only to ensure a correct match (since the boring part might or might not be present but if it is, I'll match it by accident with the regexp for the interesting part).
So I can access the second match using $2. However, for uniformity with other regexps I'm using I want that somehow $1 will contain the contents of the second parethesis. Is it possible?
Use a non-capturing group:
r = /(?:ab)?(cd)/
This is a non-ruby regexp feature. Use /(?:something complex and boring)?(something complex and interesting)/ (note the ?:) to achieve this.
By the way, in Ruby 1.9, you can do /(something complex and boring)?(?<interesting>something complex and interesting)/ and access the group with $~[:interesting] ;)
Yup, use the ?: syntax:
/(?:something complex and boring)?(something complex and interesting)/
I'm not a ruby developer however I know other regex flavors. So I bet you can use a non capturing group
/(?:something complex and boring)?(something complex and interesting)/
There is only one capturing group, hence $1
HTH
Not really, no. But you can use a named group for uniformity, like this:
/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/
You can change the names (the text in the angle brackets) for the uniformity that you want to achieve. You can then access the groups like this:
string.match(/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/) do |m|
# Do something with the match, m['group'] can be used to access the group
end

How can I write a regex to repeatedly capture group within a larger match?

I'm getting a regex headache, so hopefully someone can help me here. I'm doing some file syntax conversion and I've got this situation in the files:
OpenMarker
keyword some expression
keyword some expression
keyword some expression
keyword some expression
keyword some expression
CloseMarker
I want to match all instances of "keyword" inside the markers. The marker areas are repeated and the keyword can appear in other places, but I don't want to match outside of the markers. What I don't seem to be able to work out is how to get a regex to pull out all the matches. I can get one to do the first or the last, but not to get all of them. I believe it should be possible and it's something to do with repeated capture groups -- can someone show me the light?
I'm using grepWin, which seems to support all the bells and whistles.
You could use:
(?<=OpenMarker((?!CloseMarker).)*)keyword(?=.*CloseMarker)
this will match the keyword inside OpenMarker and CloseMarker (using the option "dot matches newline").
sed -n -e '/OpenMarker[[:space:]]*CloseMarker/p' /path/to/file | grep keyword should work. Not sure if grep alone could do this.
There are only a few regex engines that support separate captures of a repeated group (.NET for example). So your best bet is to do this in two steps:
First match the section you're interested in: OpenMarker(.*?)CloseMarker (using the option "dot matches newline").
Then apply another regex to the match repeatedly: keyword (.*) (this time without the option "dot matches newline").

Yahoo Pipes: filter items in a feed based on words in a text file

I have a pipe that filters an RSS feed and removes any item that contains "stopwords" that I've chosen. Currently I've manually created a filter for each stopword in the pipe editor, but the more logical way is to read these from a file. I've figured out how to read the stopwords out of the text file, but how do I apply the filter operator to the feed, once for every stopword?
The documentation states explicitly that operators can't be applied within the loop construct, but hopefully I'm missing something here.
You're not missing anything - the filter operator can't go in a loop.
Your best bet might be to generate a regex out of the stopwords and filter using that. e.g. generate a string like (word1|word2|word3|...|wordN).
You may have to escape any odd characters. Also I'm not sure how long a regex can be so you might have to chunk it over multiple filter rules.
In addition to Gavin Brock's answer the following Yahoo Pipes
filters the feed items (title, description, link and author) according to multiple stopwords:
Pipes Info
Pipes Edit
Pipes Demo
Inputs
_render=rss
feed=http://example.com/feed.rss
stopwords=word1-word2-word3

Resources