I've upgraded from CodeIgniter 2.x to 3.x and noticed that URI routing behaves differently when the URL contains multiple segments.
In version 2.x, I was able to pass the following URL variations:
domain.com/function/arg1
domain.com/function/arg1/arg2
domain.com/function/arg1/arg2/arg3
With $route['function/(:any)'] = 'function/$1', all three URL variations worked, because my method signature is function($arg1, $arg2 = 0, $arg3 = 0): arg2 and arg3 are optional, and all remaining segments were passed in through $1.
In order for it to work in version 3.x, I find that I have to set up my routing as:
$route['function/(:any)'] = 'function/$1';
$route['function/(:any)/(:any)'] = 'function/$1/$2';
$route['function/(:any)/(:any)/(:any)'] = 'function/$1/$2/$3';
Is there any way I can simplify the routing so that it will pass all remaining segments without having to create separate routing rules for each variation of the number of possible segments?
(:any) is not supposed to match the / character. In 2.x it did, which was a bug, and that bug was fixed in version 3.
You should thoroughly read the v3.0 upgrade guide...
Quoting the section "Routes containing :any":
There are certainly many developers that have utilized this bug as an actual feature. If you’re one of them and want to match a forward slash, please use the .+ regular expression:
(.+) // matches ANYTHING
(:any) // matches any character, except for '/'
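To see why a single (:any) rule stopped covering all three URLs, here is a small Ruby sketch that imitates the wildcard expansion. As far as I can tell from the 3.x router, CodeIgniter replaces :any with [^/]+ and :num with [0-9]+ before matching, so (:any) can never span a slash while (.+) can. The helper apply_route is hypothetical and only approximates the real router; going by the upgrade guide, a single $route['function/(.+)'] = 'function/$1'; rule should cover all three variants.

```ruby
# Hypothetical helper (not part of CodeIgniter) that imitates how CodeIgniter 3
# expands route wildcards before matching: (:any) => [^/]+, (:num) => [0-9]+.
def apply_route(uri, route_key, route_target)
  pattern = route_key.gsub(':any', '[^/]+').gsub(':num', '[0-9]+')
  regex = Regexp.new("\\A#{pattern}\\z")
  return nil unless regex.match?(uri)
  # CodeIgniter targets use $1, $2 ...; Ruby's sub expects \1, \2 ...
  uri.sub(regex, route_target.gsub(/\$(\d+)/) { "\\#{Regexp.last_match(1)}" })
end

apply_route('function/arg1', 'function/(:any)', 'function/$1')
apply_route('function/arg1/arg2', 'function/(:any)', 'function/$1') # no match: [^/]+ stops at '/'
apply_route('function/arg1/arg2/arg3', 'function/(.+)', 'function/$1')
```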
I have keys that I want to set up routing for in Elasticsearch. Some keys contain special characters (e.g. & and "). How can I set up routing for such keys? Example:
"A&B"
You simply need to URL-encode the routing parameter value, since & is a reserved character in URLs that introduces a new parameter. In your case, routing only receives the part before the &, and the rest (B") is treated as a second parameter that ES doesn't know about, which is why it complains.
/partb-2017-06-*/_search?routing="A&B"
Here the & ends the routing value and introduces a new parameter.
The right way to do it is to URL-encode "A&B" into %22A%26B%22 and ES will not complain anymore.
/partb-2017-06-*/_search?routing=%22A%26B%22
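If you build the request in Ruby, the standard library can do the encoding for you. This is a minimal sketch using URI.encode_www_form_component; the index pattern is the one from the question, and how you then send the request is left out.

```ruby
require 'uri'

routing_key = '"A&B"'
# Percent-encodes everything except letters, digits and *-._ (a space would become "+")
encoded = URI.encode_www_form_component(routing_key)
search_path = "/partb-2017-06-*/_search?routing=#{encoded}"
puts search_path
```

Decoding with URI.decode_www_form_component gives back the original key, which is a quick way to sanity-check the round trip.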
How do I write a regex in Ruby that looks for a "-" and ".org" or ".com", as in:
some-thing.org
some-thing.org.sg
some-thing.com
some-thing.com.sg
some-thing.com.* (there are too many country codes, so for now any suffix is fine; I will deal with that later)
but not:
some-thing
some-thing.moc
I wrote: /.-.(org)?|.*(.com)/i
but it still matches "some-thing" and "some-thing.moc" :(
Support optional hyphen
I came up with this regex:
(https?:\/\/)?(www\.)?[a-z0-9-]+\.(com|org)(\.[a-z]{2,3})?
Keep in mind that I used capturing groups for simplicity, but if you want to avoid capturing the content, you can use non-capturing groups like this:
(?:https?:\/\/)?(?:www\.)?[a-z0-9-]+\.(?:com|org)(?:\.[a-z]{2,3})?
Notice the ?: after each opening parenthesis; it makes the group non-capturing.
Additionally, if you don't need to match the protocol and www prefix, you can use:
[a-z0-9-]+\.(?:com|org)(?:\.[a-z]{2,3})?
Support mandatory hyphen
However, as Greg Hewgill pointed out in his comment, if you want to ensure there is at least one hyphen, you can use this regex:
(?:https?:\/\/)?(?:www\.)?[a-z0-9]+(?:[-][a-z0-9]+)+\.(?:com|org)(?:\.[a-z]{2,3})?
Be aware, though, that this regex may run into heavy backtracking.
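A quick way to check both patterns against the examples from the question is to run them in Ruby. This sketch copies the non-capturing versions above verbatim; the constant names are illustrative, and match? needs Ruby 2.4+ (on older versions, use =~ instead).

```ruby
# Non-capturing patterns from the answer above; constant names are illustrative
OPTIONAL_HYPHEN  = /(?:https?:\/\/)?(?:www\.)?[a-z0-9-]+\.(?:com|org)(?:\.[a-z]{2,3})?/
MANDATORY_HYPHEN = /(?:https?:\/\/)?(?:www\.)?[a-z0-9]+(?:[-][a-z0-9]+)+\.(?:com|org)(?:\.[a-z]{2,3})?/

SHOULD_MATCH     = %w[some-thing.org some-thing.org.sg some-thing.com some-thing.com.sg]
SHOULD_NOT_MATCH = %w[some-thing some-thing.moc]

# Print whether each host is accepted by each pattern
(SHOULD_MATCH + SHOULD_NOT_MATCH + ['something.com']).each do |host|
  printf("%-18s optional=%-5s mandatory=%s\n",
         host, OPTIONAL_HYPHEN.match?(host), MANDATORY_HYPHEN.match?(host))
end
```

Both patterns reject the two negative examples from the question; only the mandatory-hyphen version also rejects something.com.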
This may help:
/[a-z0-9]+-?[a-z0-9]+\.(org|com)(\.[a-z]+)?/i
It matches '-' in the middle optionally, i.e. still matches names without '-'.
I had a similar issue when I was writing an HTTP server...
... I ended up using the following Regexp:
m = url.match(/(([a-z0-9A-Z]+):\/\/)?(([^\/\:]+))?(:([0-9]+))?([^\?\#]*)(\?([^\#]*))?/)
m[2] # => requested protocol (optional), e.g. https, http, ws, ftp etc.
m[4] # => host name (optional), e.g. www.my-site.com
m[6] # => port (optional)
m[7] # => encoded path, e.g. /index.htm
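To see which group holds which part, here is the same Regexp applied to a full URL and to a bare request target of the kind an HTTP server receives. Group 1 includes the :// separator, so the bare protocol name is in group 2. The sample URLs are made up, and values_at is just a convenient way to read several groups at once.

```ruby
# Same pattern as above, stored in a constant for reuse
URL_PARTS = /(([a-z0-9A-Z]+):\/\/)?(([^\/\:]+))?(:([0-9]+))?([^\?\#]*)(\?([^\#]*))?/

full = 'https://www.my-site.com:8080/index.htm?a=1'.match(URL_PARTS)
protocol, host, port, path, query = full.values_at(2, 4, 6, 7, 9)
# protocol => "https", host => "www.my-site.com", port => "8080",
# path => "/index.htm", query => "a=1"

# A request target has no protocol, host or port, so those groups are nil
target = '/index.htm?a=1'.match(URL_PARTS)
```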
If what you are trying to do is validate a host name, you can simply make sure it doesn't contain the few illegal characters (: and /) and that it contains at least one dot.
If you want to validate only .com or .org (+ country codes), you can do something like this:
def is_legit_url?(url)
  allowed_master_domains = %w{com org}
  allowed_country_domains = %w{sg it uk}
  # (?: ) keeps the joined country alternatives grouped behind the \. ; !! turns the match into true/false
  !!url.match(/[^\/\:]+\.(#{allowed_master_domains.join '|'})(\.(?:#{allowed_country_domains.join '|'}))?/i)
end
* Note that some countries use .co, e.g. the UK uses www.amazon.co.uk.
I would convert the Regexp to a constant, for performance reasons:
module MyURLReview
  ALLOWED_MASTER_DOMAINS = %w{com org}
  ALLOWED_COUNTRY_DOMAINS = %w{sg it uk}
  # Built once when the module loads, instead of on every call
  DOMAINS_REGEXP = /[^\/\:]+\.(#{ALLOWED_MASTER_DOMAINS.join '|'})(\.(?:#{ALLOWED_COUNTRY_DOMAINS.join '|'}))?/i
  def self.is_legit_url?(url)
    !!url.match(DOMAINS_REGEXP)
  end
end
Good luck!
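Because DOMAINS_REGEXP is not anchored, the optional country-code group never actually rejects anything: some-thing.com.fr still matches on its some-thing.com prefix. If only the listed suffixes should pass, a stricter sketch anchors the whole host with \A and \z. The module and method names here are hypothetical, and the input is assumed to be a bare host name rather than a full URL.

```ruby
# Hypothetical stricter variant for bare host names
module StrictHostReview
  ALLOWED_MASTER_DOMAINS  = %w{com org}
  ALLOWED_COUNTRY_DOMAINS = %w{sg it uk}

  # \A and \z force the whole string to be <name>.<com|org>[.<country>]
  STRICT_REGEXP = /\A[^\/:]+\.(?:#{ALLOWED_MASTER_DOMAINS.join '|'})(?:\.(?:#{ALLOWED_COUNTRY_DOMAINS.join '|'}))?\z/i

  def self.legit_host?(host)
    STRICT_REGEXP.match?(host)
  end
end

%w[some-thing.com some-thing.org.sg some-thing.com.fr www.amazon.co.uk].each do |host|
  puts "#{host}: #{StrictHostReview.legit_host?(host)}"
end
```

www.amazon.co.uk is rejected here, since .co is not in either list.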
/[a-zA-Z0-9]-[a-zA-Z0-9]+?\.(?:org|com)\.?/
Of course, the above could be simplified depending on how lenient your rules are. The following is a simpler pattern, but it would allow s0me-th1ng.com-plete to pass through (as would the pattern above, since neither is anchored):
/\w-\w+?\.(?:org|com)\b/
You could use a lookahead:
^(?=[^.]+-[^.]+)([^.]+\.(?:org|com).*)
Assuming you are looking for the general pattern of letters-letters where letters could be Unicode, you can do:
^(?=\p{L}+-\p{L}+)([^.]+\.(?:org|com).*)
If you want to add digits:
^(?=[\p{L}0-9]+-[\p{L}0-9]+)([^.]+\.(?:org|com).*)
This lets you match sòme1-thing.com.
(Ruby 2.0+ for \p{L} I think...)
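Here is a quick Ruby check of the ASCII lookahead pattern against the examples from the question, plus the Unicode variant against sòme1-thing.com. It assumes a UTF-8 source file (the default since Ruby 2.0) and Ruby 2.4+ for match?.

```ruby
ASCII_LOOKAHEAD   = /^(?=[^.]+-[^.]+)([^.]+\.(?:org|com).*)/
UNICODE_LOOKAHEAD = /^(?=[\p{L}0-9]+-[\p{L}0-9]+)([^.]+\.(?:org|com).*)/

# The lookahead requires a hyphen before the first dot; the main group requires .org or .com
%w[some-thing.org some-thing.com.sg some-thing some-thing.moc something.com].each do |host|
  puts "#{host}: #{ASCII_LOOKAHEAD.match?(host)}"
end

puts UNICODE_LOOKAHEAD.match?('sòme1-thing.com')
```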
Using regex, how could I remove everything before the first path / in a URL?
Example URL: https://www.example.com/some/page?user=1&email=joe#schmoe.org
From that, I just want /some/page?user=1&email=joe#schmoe.org
In the case that it's just the root domain (ie. https://www.example.com/), then I just want the / to be returned.
The domain may or may not have a subdomain, and it may or may not use a secure protocol. Ultimately, I just want to strip out everything before that first path slash.
In the event that it matters, I'm running Ruby 1.9.3.
Don't use regex for this. Use the URI class. You can write:
require 'uri'
u = URI.parse('https://www.example.com/some/page?user=1&email=joe#schmoe.org')
u.path #=> "/some/page"
u.query #=> "user=1&email=joe"
u.fragment #=> "schmoe.org" (everything after # is the fragment, not part of the query)
# Path and query together; if there is no query (no ?), this returns just the path
u.request_uri #=> "/some/page?user=1&email=joe"
require 'uri'
uri = URI.parse("https://www.example.com/some/page?user=1&email=joe#schmoe.org")
uri.path + '?' + uri.query
# => "/some/page?user=1&email=joe" (the #schmoe.org fragment is not in uri.query, and this raises a TypeError if uri.query is nil)
As Gavin also mentioned, it's not a good idea to use RegExp for this, although it's tempting.
You could have URLs with special characters, even Unicode characters, in them that you did not expect when you wrote the RegExp. This is particularly likely in the query string. Using the URI library is the safer approach.
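For the question as asked (everything from the first path slash, with a bare / for the root), one way with URI is to reassemble the pieces it splits apart. URI treats everything after # as the fragment, so that part has to be re-attached separately. path_and_rest is a hypothetical helper name; the calls it uses also exist in Ruby 1.9.3.

```ruby
require 'uri'

# Hypothetical helper: returns everything from the first path slash onward,
# re-attaching the query and fragment that URI stores separately.
def path_and_rest(url)
  u = URI.parse(url)
  result = (u.path.empty? ? '/' : u.path).dup
  result << '?' << u.query if u.query
  result << '#' << u.fragment if u.fragment
  result
end

path_and_rest('https://www.example.com/some/page?user=1&email=joe#schmoe.org')
path_and_rest('https://www.example.com/')
path_and_rest('http://test.com') # no path at all, so "/" is returned
```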
The same can be done using String#index:
index(substring[, offset])
str = "https://www.example.com/some/page?user=1&email=joe#schmoe.org"
offset = str.index("//") # => 6
str[str.index('/',offset + 2)..-1]
# => "/some/page?user=1&email=joe#schmoe.org"
I strongly agree with the advice to use the URI module in this case, and I don't consider myself great with regular expressions. Still, it seems worthwhile to demonstrate one possible way to do what you ask.
test_url1 = 'https://www.example.com/some/page?user=1&email=joe#schmoe.org'
test_url2 = 'http://test.com/'
test_url3 = 'http://test.com'
regex = /^https?:\/\/[^\/]+(.*)/
regex.match(test_url1)[1]
# => "/some/page?user=1&email=joe#schmoe.org"
regex.match(test_url2)[1]
# => "/"
regex.match(test_url3)[1]
# => ""
Note that in the last case, the URL had no trailing '/' so the result is the empty string.
The regular expression (/^https?:\/\/[^\/]+(.*)/) says the string starts with (^) http (http), optionally followed by s (s?), followed by :// (:\/\/) followed by at least one non-slash character ([^\/]+), followed by zero or more characters, and we want to capture those characters ((.*)).
I hope that you find that example and explanation educational, and I again recommend against actually using a regular expression in this case. The URI module is simpler to use and far more robust.
I'm having a problem getting my RegEx to work with my Ruby script.
Here is what I'm trying to match:
http://my.test.website.com/{GUID}/{GUID}/
Here is the RegEx that I've tested and that should match the string shown above:
/([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/
3 capturing groups:
group 1: ([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])
Ruby is giving me an error when trying to validate a match against this regex:
empty range in char class: (My RegEx goes here) (SyntaxError)
I appreciate any thoughts or suggestions on this.
You could simplify things a bit by using URI to parse the URL, \h (a hex digit) in the regex, and scan to pull out the GUIDs:
require 'uri'
uri = URI.parse(your_url)
path = uri.path
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)
If you need any of the non-path components of the URL, then you can easily pull them out of uri.
You might need to tighten things up a bit depending on your data, or it might be sufficient to check that guids has two elements.
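A minimal sketch of both options, assuming the URL has exactly two GUID path segments; the sample GUIDs are made up. Interpolating the GUID Regexp into %r{...} embeds it as a non-capturing group, which keeps the stricter whole-path pattern readable.

```ruby
require 'uri'

GUID = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/
# Stricter: the whole path must be /<GUID>/<GUID>/
TWO_GUID_PATH = %r{\A/(#{GUID})/(#{GUID})/\z}

url  = 'http://my.test.website.com/3f2504e0-4f89-11d3-9a0c-0305e82c3301/6ba7b810-9dad-11d1-80b4-00c04fd430c8/'
path = URI.parse(url).path

guids  = path.scan(GUID)           # lenient: every GUID anywhere in the path
strict = TWO_GUID_PATH.match(path) # nil unless the path is exactly two GUIDs
```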
You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:
...[\/\/[0-9a-fA-F]....
The first [ does not belong there. Also, having \/\/ inside [] is unnecessary: you only need each character once inside []. Also,
...[-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}...
is greedy, and includes a period - indeed, includes all chars (AFAICS) that can come after it, effectively swallowing the whole string (when you get rid of other bugs). Consider {2,256}? instead.
I'm using Url Rewriter to create user-friendly URLs in my web app and have the following rule set up:
<rewrite url="/(?!Default.aspx).+" to="/letterchain.aspx?ppc=$1"/>
How do I replace $1 so that it is the last part of the URL?
So that the following
www.mywebapp.com/hello
would transform to
/letterchain.aspx?ppc=hello
I've read the docs but can't find anything.
The $1 in the to attribute refers to the first capture group defined (i.e. the part in parentheses).
The part that you actually want injected into $1 is the .+, which isn't in a capture group.
I'm not sure, but I think the (?! ) negative lookahead ("match if suffix is absent") isn't counted as a numbered capture group, so this should work:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$1"/>
If it doesn't, try inserting the second capture group into your to string instead:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$2"/>
Please note that if you are developing for IIS 7+, Microsoft's URL Rewrite module (http://www.iis.net/download/urlrewrite/) performs faster rewrites with a lower footprint.
BTW, your regex has a small problem: you need to escape the dot character, that is "/(?!Default\.aspx)(.+)"
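Both points can be checked with any regex engine that supports lookahead; here is a quick Ruby sketch. It assumes the rewriter's .NET regexes behave the same way for these two features (a lookahead never creates a numbered group, and an unescaped . matches any character); the leading \A stands in for however the rewriter anchors its url pattern.

```ruby
# Same rule as above, with and without the escaped dot
UNESCAPED = %r{\A/(?!Default.aspx)(.+)}
ESCAPED   = %r{\A/(?!Default\.aspx)(.+)}

m = ESCAPED.match('/hello')
# The lookahead adds no group, so (.+) is group 1 and maps to $1
rewritten = "/letterchain.aspx?ppc=#{m[1]}"
puts rewritten

UNESCAPED.match('/DefaultXaspx')   # nil: the unescaped . also matches the X
ESCAPED.match('/DefaultXaspx')[1]  # "DefaultXaspx"
```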