Getting last part of URL into variable with Url Rewriter - url-rewriting

I'm using Url Rewriter to create user-friendly URLs in my web app and have the following rule set up
<rewrite url="/(?!Default.aspx).+" to="/letterchain.aspx?ppc=$1"/>
How do I replace $1 so that it is the last part of the URL?
So that the following
www.mywebapp.com/hello
would transform to
/letterchain.aspx?ppc=hello
I've read the docs but can't find anything.

The $1 in the to attribute refers to the first capture group defined in the pattern (i.e. the first part enclosed in parentheses).
The part that you actually want injected into $1 is the .+, which isn't in a capture group.
The (?! ) construct is a negative lookahead ("match only if what follows is not this"). I'm not sure, but I think it isn't counted as a numbered capture group, so (.+) becomes $1 and this should work:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$1"/>
If it doesn't, then try inserting the second capture group into your to string instead:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$2"/>
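To check how the groups are numbered, the pattern can be run through any regex engine that treats lookaheads the same way. The following is a minimal sketch in Ruby, used here only as a stand-in for the rewriter's own regex engine (an assumption; the sample paths are made up). It suggests that the lookahead does not take a group number, so (.+) is group 1 and there is no group 2.

```ruby
# The rule's pattern, with the dot in Default.aspx escaped.
pattern = %r{/(?!Default\.aspx)(.+)}

m = pattern.match("/hello")
puts m.captures.length   # the lookahead is not counted, so only one group
puts m[1]                # text captured by (.+)

# Ruby writes the back-reference as \1 where the rewriter uses $1.
puts "/hello".sub(pattern, '/letterchain.aspx?ppc=\1')

# The negative lookahead keeps Default.aspx from being rewritten.
puts pattern.match("/Default.aspx").inspect
```

If the rewriter's engine behaves the same way, $1 is the reference to use and $2 would come back empty.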

Please note that if you are developing for IIS 7+, the URL Rewrite module from Microsoft (http://www.iis.net/download/urlrewrite/) performs faster rewrites with a lower footprint.
BTW, your regex has a small problem: you need to escape the dot character, that is, "/(?!Default\.aspx)(.+)"

Related

How to visit a link inside an email using capybara

I am new to Cucumber with Capybara. I have an application to test whose flow is: after submitting a form, an email is sent to the user containing a link to another app. To access that app, we have to open the email and click the link, which redirects to the app. I don't have access to the mailbox. Is there any way to extract that link and continue with the flow?
Please suggest a possible way to do it.
Regards,
Abhisek Das
In your test, use whatever means you need in order to trigger the sending of the email by your application. Once the email is sent, use a regular expression to find the URL from the link within the email body (note this will work only for an email that contains a single link), and then visit the path from that URL with Capybara to continue with your test:
path_regex = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/
email = ActionMailer::Base.deliveries.last
path = email.body.match(path_regex)[1]
visit(path)
Regular expression explained
A regular expression (regex) itself is demarcated by forward slashes, and this regex in particular consists of three groups, each demarcated by pairs of parentheses. The first and third groups both begin with ?:, indicating that they are non-capturing groups, while the second is a capturing group (no ?:). I will explain the significance of this distinction below.
The first group, (?:"https?\:\/\/.*?), is a:
  non-capturing group, ?:
  that matches a single double quote, "
    we match a quote since we anticipate the URL to be in the href="..." attribute of a link tag
  followed by the string http
  optionally followed by a lowercase s, s?
    the question mark makes the preceding match, in this case s, optional
  followed by a colon and two forward slashes, \:\/\/
    note the backslashes, which are used to escape characters that otherwise have a special meaning in a regex
  followed by a wildcard, .*?, which will match any character any number of times up until the next match in the regex is reached
    the period, or wildcard, matches any character
    the asterisk, *, repeats the preceding match up to an unlimited number of times, depending on the successive match that follows
    the question mark makes this a lazy match, meaning the wildcard will match as few characters as possible while still allowing the next match in the regex to be satisfied
The second group, (\/.*?) is a capturing group that:
  matches a single forward slash, \/
    this will match the first forward slash after the host portion of the URL (e.g. the slash at the end of http://www.example.com/) since the slashes in http:// were already matched by the first group
  followed by another lazy wildcard, .*?
The third group, (?:"), is:
  another non-capturing group, ?:
  that matches a single double quote, "
And thus, our second group will match the portion of the URL starting with the forward slash after the host and going up to, but not including, the double quote at the end of our href="...".
When we call the match method using our regex, it returns an instance of MatchData, which behaves much like an array. The element at index 0 is a string containing the entire matched string (from all of the groups in the regex), while elements at subsequent indices contain only the portions of the string matched by the regex's capturing groups (only our second group, in this case). Thus, to get the corresponding match of our second group—which is the path we want to visit using Capybara—we grab the element at index 1.
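The behaviour of MatchData described above can be seen with a small stand-alone sketch. The email body here is a made-up string (an assumption for illustration); in a real test it would come from ActionMailer::Base.deliveries.last.

```ruby
# Hypothetical email body; a real test would read it from
# ActionMailer::Base.deliveries.last instead.
body = '<p>Click <a href="http://www.example.com/users/confirm?token=abc">here</a>.</p>'

path_regex = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/
match = body.match(path_regex)

puts match[0]  # whole match, including both double quotes
puts match[1]  # only the capturing group: the path after the host
```

Index 1 is the value that would be passed to visit.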
You can use Nokogiri to parse the email body and find the link you want to click.
Imagine you want to click a link Change my password:
email = ActionMailer::Base.deliveries.last
html = Nokogiri::HTML(email.html_part.body.to_s)
target_url = html.at("a:contains('Change my password')")['href']
visit target_url
I think this is more semantic and robust than using regular expressions. For example, this would work if the email has many links.
If you're using or willing to use the capybara-email gem, there's now a simpler way of doing this. Let's say you've generated an email to recipient@email.com, which contains the link 'fancy link'.
Then you can just do this in your test suite:
open_email('recipient@email.com') # Allows the current_email method
current_email.click_link 'fancy link'

url rewrite pattern generates too many redirects

I have been struggling with the same issue for a while now and have not found a good answer yet. I'm using rack-rewrite to add some URL rewrite rules to my app's middleware stack.
I have the following rule:
r301 %r{^/([^(docs|help|legal|login|account|apps)])(.+)/$}, '$1'
This does not work as I would expect. I have also tried the answer to one of my previous questions, but that does not work either; it actually produces even weirder behaviour (it redirects to a URL without the domain name, just the path).
What I am trying to do is:
if the user requests http://example.com/ or http://example.com/random-path/, the rewrite rule should strip the trailing slash, so the examples would become http://example.com and http://example.com/random-path respectively;
if the requested path matches any of the paths in the list docs|help|legal|login|account|apps, do not strip the slash at the end of the path if it exists, but add one if it is not there.
I tried two rules: one that ignores the paths listed above and strips trailing slashes, and one that adds the slash if the path matches something from the list and the trailing slash is missing:
r301 %r{^/([^(docs|help|legal|login|account|apps)])(.+)/$}, '/$1'
r301 %r{^/([(docs|help|legal|login|account|apps)])(.+)/$}, '/$1/'
How could I write a rule, or two rules, that would do this? What I tried did not work.
You can do that like so:
r301 %r{^/((?!docs|help|legal|login|account|apps).+)/$}, '/$1'
r301 %r{^/((?=docs|help|legal|login|account|apps).+[^/])$}, '/$1/'
EDIT: stray parentheses.
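To see what the two patterns do without running rack-rewrite, they can be tried against a few paths in plain Ruby. This is a rough sketch: redirect_target is a hypothetical helper that only imitates the '$1' substitution of the two r301 rules, and the sample paths are assumptions.

```ruby
# The two patterns from the rules above.
STRIP_SLASH = %r{^/((?!docs|help|legal|login|account|apps).+)/$}
ADD_SLASH   = %r{^/((?=docs|help|legal|login|account|apps).+[^/])$}

# Hypothetical helper: returns the path a rule would redirect to,
# or nil when neither pattern matches (no redirect).
def redirect_target(path)
  if (m = STRIP_SLASH.match(path))
    "/#{m[1]}"        # unlisted path: drop the trailing slash
  elsif (m = ADD_SLASH.match(path))
    "/#{m[1]}/"       # listed path: add the missing trailing slash
  end
end

%w[/random-path/ /random-path /docs /docs/].each do |path|
  puts "#{path} -> #{redirect_target(path).inspect}"
end
```

Each path is redirected at most once, which is what avoids the redirect loop. Note that the lookaheads test prefixes, so a path such as /docs-old would be treated like /docs.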

url rewriting : can't find rule for /fr/ or /en/

I need to identify URLs that are either "/fr/" or "/en/" (and only these two).
I'm looking for the right regular expression.
Of course it works if I write "/../", but that is too broad.
The best I could find is "/[fe][rn]/", but it also matches /fn/ and /er/.
Simply use a pipe in a group:
/(fr|en)/?
Edit: added the optional trailing slash
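The difference between the alternation and the character-class attempt can be checked with a short Ruby sketch. The \A and \z anchors and the sample paths are additions for this illustration, not part of the answer's pattern.

```ruby
# \A and \z anchor the whole string; they are an addition for this sketch.
locale_path = %r{\A/(fr|en)/?\z}   # alternation: exactly fr or en
loose_path  = %r{\A/[fe][rn]/?\z}  # character classes: any of fr, fn, er, en

%w[/fr/ /en/ /en /fn/ /er/ /de/].each do |path|
  puts "#{path}: group=#{locale_path.match?(path)} class=#{loose_path.match?(path)}"
end
```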

How do I select all the characters after any of these three extensions with Regex?

My test string :
http://website.me/stuffs/5715?vars=
So my url can be website.com, website.me, or website.dev.
And I basically want a regex statement that would capture all the content after this part:
http://website.me:3000/
so that it returns:
stuffs/5715?vars=
You should really use the URI class from Ruby's standard library:
require 'uri'
URI.parse('http://website.me/stuffs/5715?vars=').request_uri
#=> "/stuffs/5715?vars="
http://[^/]+/(.*)
The (.*) part should capture everything after the domain name (and port number if it's included) and store it in capture group 1. How you go about accessing the capture group is language/implementation specific. This regex works by just matching everything after the first / that appears after the initial http://.
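In Ruby, for example, the capture group can be read with String#[]. The sketch below uses the URL with the port from the question as sample input and compares the result with the URI approach.

```ruby
require 'uri'

url = 'http://website.me:3000/stuffs/5715?vars='

# String#[] with a regex and a group index returns that capture group.
rest = url[%r{http://[^/]+/(.*)}, 1]
puts rest

# For comparison, URI#request_uri keeps the leading slash.
puts URI.parse(url).request_uri
```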

How can I write a regex to repeatedly capture group within a larger match?

I'm getting a regex headache, so hopefully someone can help me here. I'm doing some file syntax conversion and I've got this situation in the files:
OpenMarker
keyword some expression
keyword some expression
keyword some expression
keyword some expression
keyword some expression
CloseMarker
I want to match all instances of "keyword" inside the markers. The marker areas are repeated and the keyword can appear in other places, but I don't want to match outside of the markers. What I don't seem to be able to work out is how to get a regex to pull out all the matches. I can get one to do the first or the last, but not to get all of them. I believe it should be possible and it's something to do with repeated capture groups -- can someone show me the light?
I'm using grepWin, which seems to support all the bells and whistles.
You could use:
(?<=OpenMarker((?!CloseMarker).)*)keyword(?=.*CloseMarker)
this will match the keyword inside OpenMarker and CloseMarker (using the option "dot matches newline").
sed -n -e '/OpenMarker[[:space:]]*CloseMarker/p' /path/to/file | grep keyword should work. Not sure if grep alone could do this.
There are only a few regex engines that support separate captures of a repeated group (.NET for example). So your best bet is to do this in two steps:
First match the section you're interested in: OpenMarker(.*?)CloseMarker (using the option "dot matches newline").
Then apply another regex to the match repeatedly: keyword (.*) (this time without the option "dot matches newline").
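Outside of grepWin, the two steps could be scripted. Here is a minimal sketch in Ruby, assuming the file content is already in a string; the sample text is made up, and /m is Ruby's spelling of the "dot matches newline" option.

```ruby
text = <<~TEXT
  keyword outside
  OpenMarker
  keyword a
  keyword b
  CloseMarker
  keyword outside again
  OpenMarker
  keyword c
  CloseMarker
TEXT

# Step 1: grab each marked section; /m lets the dot match newlines in Ruby.
sections = text.scan(/OpenMarker(.*?)CloseMarker/m).flatten

# Step 2: within each section, collect what follows every keyword.
# Without /m the dot stops at the end of the line.
expressions = sections.flat_map { |section| section.scan(/keyword (.*)/).flatten }

p expressions
```

The keyword lines outside the markers are never seen by the second regex, because it only runs on the text captured in step 1.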

Resources