Firefox add-on SDK pattern matching not working - firefox

When I use the following code:
var pageMod = require("sdk/page-mod");
pageMod.PageMod({
include: "http://www.page.com/user/*",
contentScript: 'window.alert("user");'
});
I get the alert. But I want to match without hard-coding the "http://www." part, so I tried:
*://*.page.com/user/*
*://page.com/user/*
*.page.com/user/*
and none of those work for me. Examples from developer.mozilla.org indicate that at least one of them should work. What is wrong with those?

I have run into this problem before: as far as I can tell, you cannot use more than one * (wildcard) in the pattern.
You have two options:
Use an array of websites, i.e. ["http://www.page.com/user/*", "https://www.page.com/user/*"]
Use a RegEx (Regular Expression)
Here is how you can use a RegEx to get what you wanted when you tried *://*.page.com/user/*
Use the following RegEx: .+:\/\/(.+\.)?page\.com\/user\/.*
Here is how it works (if you do not know RegEx, I would suggest learning it):
.+ # Any character 1+ times - Selects the Protocol (http, https, ftp)
:\/\/ # :// After Protocol (/ have to be escaped using \/)
(.+\.)? # (Optional) Any characters followed by a . (dot) - (www.)
page # Website Name - (page)
\.com # .com - (Top-Level Domain)
\/user\/ # Folder /user/ (/ have to be escaped using \/)
.* # Any character 0 or more times - (Any folders / files after the /user/ folder)
Here is a good site to learn RegEx if you do not already know them: RegexOne
So, your full include will be:
include: /.+:\/\/(.+\.)?page\.com\/user\/.*/,
Note that in JavaScript you define a RegEx by enclosing it in /s
Here is a Live Demo of the RegEx working
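The pattern can also be checked outside the browser. Below is a minimal sketch in Ruby, used here only as a convenient place to try the regex. The URLs are made up for illustration, and anchoring the pattern with \A and \z (so that the whole URL has to match) is an assumption of this sketch rather than something stated above.

```ruby
# The answer's pattern, anchored so that the whole URL has to match
# (the anchoring is an assumption of this sketch).
USER_PAGE = /\A.+:\/\/(.+\.)?page\.com\/user\/.*\z/

# Hypothetical URLs, chosen only to exercise the pattern.
urls = [
  'http://www.page.com/user/42',
  'https://page.com/user/settings',
  'ftp://sub.page.com/user/',
  'http://www.page.com/admin/',
  'http://www.page.com/user'
]

# Map each URL to true/false depending on whether the pattern matches it.
results = urls.map { |url| [url, url.match?(USER_PAGE)] }.to_h
results.each { |url, matched| puts "#{matched ? 'match   ' : 'no match'}  #{url}" }
```

The last URL is rejected because the pattern requires the trailing slash after /user.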

Related

Regular expression help: how to ignore every path that isn’t a CSS file

I have a CSS framework submodule in my Git repo that includes a bunch of README, component.json and other files. I don’t want to modify or delete the files because I’d imagine it’d cause problems when updates are pushed to the submodule. Yet Middleman wants to process them.
I currently have this in my config.rb file:
# Ignore everything that's not a CSS file inside inuit.css
ignore 'css/inuit.css/*.html'
ignore 'css/inuit.css/*.json'
ignore 'css/inuit.css/LICENSE'
How could I express this with a file pattern or a regex?
I’m not familiar with Middleman, but doesn’t this work?
ignore /^css\/inuit\.css\/.*(?<![.]css)$/
Since ignore can take a regex, pass ignore a Ruby regex // instead of a string "" with a filename glob. In the regex, use negative lookahead (?!) and the end-of-string anchor $ to check that the filename doesn’t end in “.css”.
ignore /^ css\/inuit\.css\/ (?: [^.]+ | .+ \. (?!css) \w+ ) $/ix
This regex correctly handles all of these test cases:
Should match:
css/inuit.css/abc.html
css/inuit.css/thecssthing.json
css/inuit.css/sub/in_a_folder.html
css/inuit.css/sub/crazily.named.css.json
css/inuit.css/sub/crazily.css.named.json
css/inuit.css/LICENSE
Shouldn’t match:
css/inuit.css/realcss.css
css/inuit.css/main.css
css/inuit.css/sub/in_a_folder.css
css/inuit.css/sub/crazily.css.named.css
css/inuit.css/sub/crazily.named.css.css
The first alternation of the (?:) non-capturing group handles the case of files with no extension (no “.”). Otherwise, the second case checks that the last “.” in the path is not followed by “css”, which would indicate a “.css” extension.
I use the x flag to ignore whitespace in the regex, so that I can add spaces in the regex to make it clearer.
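The test cases listed above can be run directly in Ruby. The following is a small sketch that checks both this pattern and the lookbehind pattern from the earlier answer against the same paths; it only exercises the regexes and does not involve Middleman itself. The constant and variable names are chosen for this example.

```ruby
# Pattern from this answer (negative lookahead, free-spacing mode).
LOOKAHEAD = /^ css\/inuit\.css\/ (?: [^.]+ | .+ \. (?!css) \w+ ) $/ix
# Pattern from the earlier answer (negative lookbehind).
LOOKBEHIND = /^css\/inuit\.css\/.*(?<![.]css)$/

# Paths that should be ignored (not CSS files).
should_match = %w[
  css/inuit.css/abc.html
  css/inuit.css/thecssthing.json
  css/inuit.css/sub/in_a_folder.html
  css/inuit.css/sub/crazily.named.css.json
  css/inuit.css/sub/crazily.css.named.json
  css/inuit.css/LICENSE
]

# Paths that should be kept (CSS files).
should_not_match = %w[
  css/inuit.css/realcss.css
  css/inuit.css/main.css
  css/inuit.css/sub/in_a_folder.css
  css/inuit.css/sub/crazily.css.named.css
  css/inuit.css/sub/crazily.named.css.css
]

[LOOKAHEAD, LOOKBEHIND].each do |pattern|
  ignored = should_match.all? { |path| path.match?(pattern) }
  kept = should_not_match.none? { |path| path.match?(pattern) }
  puts "#{pattern.inspect}: ignores non-CSS files? #{ignored}; keeps CSS files? #{kept}"
end
```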

Check if a string contains 'http://' and '.jpg'

I'm new to Ruby and Rails, so forgive me if this is an easy question. I'm trying to check, when a user passes in an image URL in my form, that it is a valid URL. Here is my code:
if params[:url].include? 'http://' && (params[:url].include? '.jpg' || params[:url].include? '.png')
This returns an error. Is this even the best way to go about it? What should I do differently? Thanks.
if my_str =~ %r{\Ahttps?://.+\.(?:jpe?g|png)\z}i
Regex explained:
%r{...} — regex literal similar to /.../, but allows / to be used inside without escaping
\A — the start of the string (^ is just the start of the line)
http — the literal text
s? — optionally followed by an "s" (to allow https://)
:// — the literal text (to prevent something like http-whee.jpg)
.+ — one or more characters (that aren't a newline)
\. — a literal period (make sure this is an extension we're looking at)
(?:aaa|bbb) — allow either aaa or bbb here, but don't capture the result
jpe?g — either "jpg" or "jpeg"
png — the literal text
\z — the end of the string ($ is just the end of the line)
i — make the match case-insensitive (allow for .JPG as well as .jpg)
However, you might be able to get away with just this (more readable) version:
allowed_extensions = %w[.jpg .jpeg .png]
if my_str.start_with?('http://') &&
allowed_extensions.any?{ |ext| my_str.end_with?(ext) }
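Both variants can be wrapped in small predicate methods and compared. This is a sketch under a few assumptions: the method names are invented for the example, and the suffix-based version is extended to accept https:// and upper-case extensions so that it behaves like the regex, which the snippet above does not do.

```ruby
# Regex from the answer: http(s) URL ending in .jpg, .jpeg or .png.
IMAGE_URL = %r{\Ahttps?://.+\.(?:jpe?g|png)\z}i
ALLOWED_EXTENSIONS = %w[.jpg .jpeg .png].freeze

# Hypothetical helper: single-pattern check.
def image_url_by_regex?(str)
  str.match?(IMAGE_URL)
end

# Hypothetical helper: prefix/suffix check, lower-cased first so that
# .JPG is accepted too (an extension of the original snippet).
def image_url_by_suffix?(str)
  down = str.downcase
  down.start_with?('http://', 'https://') &&
    ALLOWED_EXTENSIONS.any? { |ext| down.end_with?(ext) }
end

%w[
  http://example.com/a.jpg
  https://example.com/photo.JPEG
  http-whee.jpg
  http://example.com/a.gif
].each do |candidate|
  puts "#{candidate}: regex=#{image_url_by_regex?(candidate)} suffix=#{image_url_by_suffix?(candidate)}"
end
```

Either check only looks at the shape of the string; neither confirms that the URL actually serves an image.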
@Phrogz's answer is better; I just tried this with some Ruby libraries.
require 'uri'
extensions = %w( .jpg .jpeg .png )
schemes = %w( http https )
string = params[:url]
if schemes.include?(URI.parse(string).scheme) && extensions.include?(File.extname(string))
  # the URL uses http(s) and ends in one of the allowed image extensions
end
While regex will shorten the code, I prefer to not do such a check all in one pattern. It's a self-documentation/maintenance thing. A single regex is faster, but if the needed protocols or image types grow, the pattern will become more and more unwieldy.
Here's what I'd do:
str[%r{^http://}i] && str[/\.(?:jpe?g|png)$/]

How to modify this regex to exclude punctuation in a URL?

I've modified a regex that I found here so that it would accept various UK and second-level TLDs.
/\b((?:^https?:\/\/|^[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!#()\[\]{};:'".,<>?]))/i
However as you can see in my test data here, the regex matches URLs such as www.zapple.#com and https://m!crosoft.com which are not valid.
For some reason # symbols are excluded before the .com but after the . they are not.
Exclamation marks are not excluded at all which is confusing since, as far as I can see, only letters, numbers and dashes are allowed before the period.
The # is matched by
[^\s()<>]+
And the ! mark by
(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+
I'm not sure, but that doesn't look like a good regex for matching URLs.
Try the following, which matches a URL according to RFC 3986.
Both absolute and relative URLs are supported.
Enable case-insensitive matching (and free-spacing mode, since the pattern contains comments).
^
(# Scheme
[a-z][a-z0-9+\-.]*:
(# Authority & path
//
([a-z0-9\-._~%!$&'()*+,;=]+@)? # User
([a-z0-9\-._~%]+ # Named host
|\[[a-f0-9:.]+\] # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPvFuture host
(:[0-9]+)? # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Path
|# Path without authority
(/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
)
|# Relative URL (no scheme or authority)
([a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Relative path
|(/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?) # Absolute path
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
$
Update 1
This does not match m!crosoft.com and #pple.com. It's probably due to something with Rubular.

Catching a string like #.+# with regex

I am building a project in which users should be able to generate links easily by writing: #this is the link#. I am trying to capture the strings between two # symbols with a regex. I have tried
#.+#
It works perfectly if there is only one link in the user's string, but if there is more than one link, as in
#asdfasdf asdf# asdf asfasdfasdf asd fasd fasdf #asdfasdf asdfasdf asdf asdf#
it matches the whole string. But I need them separately, so I can substitute them with tags.
This is called a "greedy" match. By default, a quantifier such as + matches the longest string possible. You can make it non-greedy this way:
/#.+?#/
Demo: http://rubular.com/r/7WWyaUApFt
Use a non-greedy match:
#.+?#
It will catch the individual ones.
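The difference is easy to see with String#scan. Here is a short sketch using the sample string from the question; the bare <a> tag used in the substitution is only a placeholder for whatever markup the project actually needs.

```ruby
text = '#asdfasdf asdf# asdf asfasdfasdf asd fasd fasdf #asdfasdf asdfasdf asdf asdf#'

# Greedy: .+ runs to the last # in the string, so there is one big match.
greedy = text.scan(/#.+#/)

# Non-greedy: .+? stops at the first # it can, so each link is separate.
lazy = text.scan(/#.+?#/)

# Capture the text between the # symbols and wrap it in a placeholder tag.
linked = text.gsub(/#(.+?)#/, '<a>\1</a>')

puts greedy.length  # number of greedy matches
puts lazy.inspect
puts linked
```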

How can I match a URL but exclude terminators from the match?

I want to match urls in text and replace them with anchor tags, but I want to exclude some terminators just like how Twitter matches urls in tweets.
So far I've got this, but it's obviously not working too well.
(http[s]?\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?)
EDIT: Some example urls. In all cases below I only want to match "http://www.example.com"
http://www.example.com.
http://www.example.com:
"http://www.example.com"
http://www.example.com;
http://www.example.com!
[http://www.example.com]
{http://www.example.com}
http://www.example.com*
I looked into this very issue last year and developed a solution that you may want to look at. See: URL Linkification (HTTP/FTP). This link is a test page for the JavaScript solution with many examples of difficult-to-linkify URLs.
My regex solution, written for both PHP and Javascript - (but could easily be translated to Ruby) is not simple (but neither is the problem as it turns out.) For more information I would recommend also reading:
The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber
The comments following Jeff's blog post are a must read if you want to do this right...
Ruby's URI module has an extract method that is used to pull URLs out of text. Parsing the returned values lets you piggyback on the heuristics in the module to extract the scheme and host information from a URL, avoiding reinventing the wheel.
text = '
http://www.example.com.
http://www.example.com:
"http://www.example.com"
http://www.example.com;
http://www.example.com!
[http://www.example.com]
{http://www.example.com}
http://www.example.com*
http://www.example.com/foo/bar?q=foobar
http://www.example.com:81
'
require 'uri'
puts URI::extract(text).map{ |u| uri = URI.parse(u); "#{ uri.scheme }://#{ uri.host[/(^.+?)\.?$/, 1] }" }
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
The only gotcha is that a period '.' is a legitimate character in a host name, so URI#host won't strip it. Those get caught in the map statement where the URL is rebuilt. Note that URI is stripping off the path and query information.
A pragmatic and easily understandable solution is:
regex = %r!"(https?://[-.\w]+\.\w{2,6})"!
Some notes:
With %r we can choose the start and end delimiter. In this case I used exclamation mark, since I want to use slash unescaped in the regex.
The optional quantifier (i.e. '?') binds only to the preceding expression, in this case 's'. There's no need to put the 's' in a character class [s]?. It's the same as s?.
Inside the character class [-.\w] we don't need to escape dash and dot in order to make them match dot and dash literally. Dash should be first, however, to not mean range.
\w matches [A-Za-z0-9_] in Ruby. It's not exactly the full definition of URL characters, but combined with dash and dot it may be enough for our needs.
Many common top-level domains are between 2 and 6 characters long, e.g. '.se' and '.travel'. Longer ones exist, so widen the {2,6} range if you need them.
I'm not sure what you mean by "I want to exclude some terminators", but this regex matches only the wanted one in your example.
We want to use the first capture group, e.g. like this:
if input =~ %r!"(https?://[-.\w]+\.\w{2,6})"!
match = $~[1]
else
match = ""
end
What about this?
%r|https?://[-\w.]*\w|
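This last pattern works because it must end on a word character, so trailing punctuation is left outside the match. The sketch below runs it over the example strings from the question; the constant name and the linkify helper are made up for illustration.

```ruby
# Scheme, then host characters, ending on a word character so that
# trailing punctuation such as . : ; ! ] } * is not part of the match.
URL_PATTERN = %r|https?://[-\w.]*\w|

samples = [
  'http://www.example.com.',
  'http://www.example.com:',
  '"http://www.example.com"',
  'http://www.example.com;',
  'http://www.example.com!',
  '[http://www.example.com]',
  '{http://www.example.com}',
  'http://www.example.com*'
]

# String#[] with a regex returns the first match (or nil).
extracted = samples.map { |sample| sample[URL_PATTERN] }
puts extracted.uniq.inspect

# Hypothetical helper: wrap every URL found in free text in an anchor tag.
def linkify(text)
  text.gsub(URL_PATTERN) { |url| %(<a href="#{url}">#{url}</a>) }
end

puts linkify('See http://www.example.com.')
```

Because / is not in the character class, any path or query string is left out of the match, so this fits the "host only" cases from the question rather than general URL linkification.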

Resources