Friendly URL with Coldfusion help required [duplicate] - url-rewriting

I have a requirement to generate user-friendly URLs. I am on IIS.
My dynamic URLs look like this:
www.testsite.com/blog/article.cfm?articleid=4432
The client wants the URLs to look like this:
www.testsite.com/blog/article_title
I know this can easily be done with the IIS URL Rewrite Module 2.0.
But the client wants it done using ColdFusion only. The basic idea he gave is:
The user hits the URL www.testsite.com/blog/article_title
I need to fetch the article ID using the article_title in the URL.
Then I use the ID to call the article.cfm page, load the output into cfsavecontent, and deliver that output to the browser.
But I do not think this is possible at the application server level. How will IIS understand our user-friendly URLs? Or am I missing something important? Is it possible to do this with ColdFusion at the application server level?

First, I hate to recommend reinventing the wheel. Web servers do this, and do it well.
ColdFusion can do something like this with #cgi.path_info#. You can jump through some hoops, as Adam Tuttle explains here: Can I have 'friendly' url's without a URL rewriter in IIS?.
Option #2 (my favorite): onMissingTemplate.
It is only available to users of Application.cfc (I'm pretty sure Application.cfm has no counterpart to onMissingTemplate).
You can use this function within Application.cfc, and all affected pages will send any "missing" URLs to this event. You can then place something like this in Application.cfc:
<cffunction name="onMissingTemplate">
<cfargument name="targetPage" type="string" required=true/>
<!--- Use a try block to catch errors. --->
<cftry>
<cfset local.pagename = listlast(cgi.script_name,"/")>
<cfswitch expression="#listfirst(cgi.script_name,"/")#">
<cfcase value="blog">
<cfinclude template="mt_blog.cfm">
<cfreturn true />
</cfcase>
</cfswitch>
<!--- If no match, return false to pass back to default handler. --->
<cfreturn false />
<cfcatch>
<!--- Do some error logging here --->
<cfreturn false />
</cfcatch>
</cftry>
</cffunction>
mt_blog.cfm can then look something like this, if your URL is, say, /blog/How-to-train-your-flea-circus.cfm:
<!--- get everything after the slash and before the dot --->
<cfset pagename = listfirst(listlast(cgi.script_name,"/"),".")>
<!--- you will probably want to cache blog post queries --->
<cfquery name="getblogpost">
select bBody,bTitle,bID
from Blog
where urlname = <cfqueryparam cfsqltype="cf_sql_varchar" value="#pagename#">
</cfquery>
<!--- This assumes you have a field (e.g. urlname) that stores the title in a url-friendly format to
match against. The trouble is that most blogs generate these titles by changing every special char
to - or _, so it's difficult to change them back for this sort of comparison; an additional
db field is probably best. It also makes it a little easier to make sure no two blog posts have
identical (after url-safe conversion) titles. --->
...
Or, if you use a URL like /blog/173_How-to-train-your-flea-circus.cfm (where 173 is a post ID):
<!--- get everything after the last slash and before the underscore --->
<cfset pageID = listfirst(listlast(cgi.script_name,"/"),"_")>
<!--- you will probably want to cache blog post queries --->
<cfquery name="getblogpost">
select bBody,bTitle,bID
from Blog
where bID = <cfqueryparam cfsqltype="cf_sql_integer" value="#pageID#">
</cfquery>
...
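To make the urlname idea a bit more concrete, here is a rough sketch of how such a url-safe field could be generated, and how the ID-prefixed filename could be split. It is written in Python rather than CFML purely for illustration; slugify and split_id_and_slug are hypothetical helper names, not anything ColdFusion ships with.

```python
import re
import unicodedata
from typing import Tuple


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, dash-separated URL fragment."""
    # Strip accents so "Café" becomes "Cafe" rather than losing the letter.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def split_id_and_slug(filename: str) -> Tuple[int, str]:
    """Split an ID-prefixed filename such as '173_some-title.cfm'."""
    stem = filename.rsplit(".", 1)[0]       # drop the extension
    post_id, _, slug = stem.partition("_")  # ID is everything before the first "_"
    return int(post_id), slug
```

Because the conversion throws information away (different titles can produce the same slug), storing the result in its own column and putting a unique constraint on it is what makes the lookup by urlname reliable.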

I don't recommend using a missing file handler (or CF's onMissingTemplate). With that approach, IIS will return a 404 status code and your page will not be indexed by search engines.
What you need to do is identify a unique prefix pattern you want to use and create a web.config rewrite rule. Example: I sometimes use "/detail_"+id for product detail pages.
You don't need to retain a physical "/blog" sub-directory if you don't want to. Add the following rewrite rule to the web.config file in the web root to accept anything after /blog/ in the URL and interpret it as /?blogtitle=[everythingAfterBlog]. (I've added an additional clause in case you want to continue to support /blog/article.cfm links.)
<rules>
<rule name="Blog" patternSyntax="ECMAScript" stopProcessing="true">
<match url="blog/(.*)$" ignoreCase="true" />
<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
<add input="{SCRIPT_FILENAME}" matchType="IsFile" negate="true" />
<add input="{PATH_INFO}" pattern="^.*(blog/article.cfm).*$" negate="true" />
</conditions>
<action type="Rewrite" url="/?blogtitle={R:1}" appendQueryString="true" />
</rule>
</rules>
I recommend using a "301 Redirect" to the new SEO-friendly URL. I also advise using dashes (-) between word fragments and ensuring that the character case is consistent (i.e., lowercase), or you could get penalized for "duplicate content".
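As a rough illustration of the consistency check (in Python, not something IIS or ColdFusion provides), the idea is to compute the canonical form of the requested path and issue a 301 only when it differs. canonical_redirect is a hypothetical name, and the sketch assumes underscores carry no meaning in your URLs, which would not hold for a prefix like "/detail_".

```python
from typing import Optional


def canonical_redirect(path: str) -> Optional[str]:
    """Return the canonical form of `path` if it differs, else None."""
    # Canonical form here means lowercase with dashes instead of underscores.
    canonical = path.lower().replace("_", "-")
    return canonical if canonical != path else None
```

A caller would send a 301 to the returned path when it is not None, and serve the page normally otherwise.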

To add to what cfqueryparam suggested, this post on Using ColdFusion to Handle 404 errors shows how to replace the web server's 404 handler with a CFM script - giving you full rewrite capabilities. It is for an older version of IIS, but you should be able to find the proper settings in the IIS version you are using.
As Adam and others have said (and the same point is made in the post), this is not something you should do if you can avoid it. Web servers working at the HTTP level are much better equipped to do this efficiently. When you rely on CF to do it, you are intentionally catching errors that are thrown in order to get the behavior you want. That's expensive and unnecessary. Typically the issue with most clients or stakeholders is a simple lack of understanding of, or familiarity with, technology like URL rewriting. See if you can bend them a little. Good luck! :)

Related

'redirect' and 'proxy' vs 'forward' and 'passthrough' in Tuckey URLRewrite

Note that, in my attempt to display code examples, I will redact/edit out any references to the company for whom I work in an effort to obscure their identity, not so much to hide the fact that I'm even asking. It should also be of note that I am very new to this game of UrlRewrite/Tuckey/dotCMS.
I have been having trouble getting a redirect to work. It's using Tuckey URLRewrite through dotCMS. The attempt is to redirect, but as a forward versus a proxy, for SEO purposes.
I've found that the following works ('redirect' and 'proxy' are interchangeable here):
<to type="proxy">http://[redacted]:8080$1$3?%{query-string}</to>
However, the following leads to a 404 ('forward' and 'passthrough' are interchangeable here):
<to type="forward">http://[redacted]:8080$1$3?%{query-string}</to>
The entirety of the rule is as follows:
<!-- EN with Query Params -->
<rule>
<from>^/([^/]+)/en/([^/]+)?$</from>
<to type="proxy" qsappend="true">[redacted]:8080$1$3&%{query-string}</to>
</rule>
<!-- EN without Query Params -->
<rule>
<from>^(.*)(\/en)(\/.*)?$</from>
<to type="proxy">[redacted]:8080$1$3?%{query-string}</to>
</rule>
Some of my initial questions (as many more are likely to arise):
Is there such a difference between 'proxy'/'redirect' and 'forward'/'passthrough' that more specialized efforts to achieve a meaningful redirect need to be implemented?
Am I missing something in other configuration files that may affect the outcomes of these attempts at redirection?
EDIT: The differences in RegEx are me trying things to see if that could possibly be where the disconnect is occurring
Because URLs in dotCMS do not really exist, the servlet RequestDispatcher, which forward rules use, does not work. You need to set a request attribute, CMS_FILTER_URLMAP_OVERRIDE, which dotCMS will respect. In code, this looks like:
NormalRule forwardRule = new NormalRule();
forwardRule.setFrom( "^/example/forwardDotCMS/(.*)$" );
SetAttribute attribute = new SetAttribute();
attribute.setName("CMS_FILTER_URLMAP_OVERRIDE");
attribute.setValue("/about-us/index");
forwardRule.addSetAttribute(attribute);
addRewriteRule( forwardRule );

Translating .htaccess to Tuckey UrlRewriteFilter

I'm building a project using Spring Boot and Angular 1.5.X and I am struggling to handle full page refreshes of Angular routes - typical "404 because the path I made doesn't actually exist" problem. I've done a fair bit of research and the solution that I keep seeing is to implement a .htaccess file with the following snippet in order to redirect all unknown requests back to the index (I pulled the following from this post)
RewriteEngine On
Options FollowSymLinks
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /#/$1 [L]
I have Tuckey's UrlRewriteFilter installed - according to this blog post since I don't have a WEB-INF folder - and it is working. It starts and it reads the urlrewrite.xml successfully. However, I don't know what to put in my urlrewrite.xml - I haven't the slightest clue of how to translate the above into something that the UrlRewriteFilter can understand. I've browsed the manual for the UrlRewriteFilter and I don't really know where / how to start.
Basically, what do I have to put in my urlrewrite.xml so that if I hit F5, my website doesn't puke back 404 errors?
Any help is appreciated.
Edit 1
I should mention that all of my API endpoints are prefaced with /api/** in order to distinguish them from links on my front end - an example would be /api/open/getUser and /api/secured/updateSettings.
Edit 2
A couple of things I've discovered so far. One is that the UrlRewriteFilter can actually support .htaccess files, and I did get it (as far as I can tell) to load by moving the .htaccess into my Resources folder and tweaking the code sample in the above blog post slightly, changing this
private static final String CONFIG_LOCATION = "classpath:/urlrewrite.xml";
to
private static final String CONFIG_LOCATION = "classpath:/.htaccess";
and
Conf conf = new Conf(filterConfig.getServletContext(), resource.getInputStream(), resource.getFilename(), "MyProject");
to
Conf conf = new Conf(filterConfig.getServletContext(), resource.getInputStream(), resource.getFilename(), "MyProject", true);
The addition of the true tells the filter to use a .htaccess file. Awesome, problem solved, right? Not quite - it's hard to explain, but it doesn't seem like the UrlRewriteFilter was/is reading the .htaccess correctly. I was using an .htaccess tester to verify that the regexes and rewrite conditions were working as I expected, and they seemed to be. The tester said that they were fine. However, the UrlRewriteFilter would freak out and get stuck in some kind of loop, to the point that Java would throw a stack overflow exception (as to why, I've no idea - I can't seem to find a way to set the filter's logging level to debug via Java D:< ).
So clearly that didn't work - I am currently attempting to translate the .htaccess into urlrewrite.xml myself, and here is what I've managed to create so far.
<urlrewrite use-query-string="true">
<rule match-type="regex" enabled="true">
<note>
Any URI that ends with one of the following extensions will be allowed to continue on unimpeded.
Buried in the manual was the single line that said a "-" in the "to" will allow the request to
continue on unmodified.
</note>
<from>\.(html|css|jpeg|gif|png|js|ico|txt|pdf)$</from>
<to last="true">-</to>
</rule>
<rule match-type="regex" enabled="true">
<note>
Any URI that is prefaced with "/api/open/**" or "/api/secured/**" will be allowed through unmodified.
</note>
<condition type="request-uri" operator="equal">\/api\/(open|secured)\/([a-zA-Z0-9\/]+)</condition>
<from>^.*$</from>
<to last="true">-</to>
</rule>
<rule match-type="regex" enabled="false">
<note>
This one is supposed to be a "when all else fail" rule - if the other two rules don't match,
forward to the index and let Angular figure out the rest.
!! This one seems to be getting stuck in a loop of sorts !!
</note>
<from>^.*$</from>
<to last="true">/</to>
</rule>
</urlrewrite>
The first two seem to be working splendidly. The third rule (the one with enabled set to false for good reason) does not - it also appears to be getting stuck in the same filter loop (or whatever is happening - the stack trace is so big that IntelliJ is like "nah man") as the .htaccess method. Making progress.
Huzzah, I managed to get it! It was a right pain in the butt since I couldn't figure out how to turn on debugging and see what the filter was actually doing, but alas, I have succeeded!
Spent one metric crap ton of time using a regex tester, and this is what I came up with. I am by no means even remotely close to a regex master, so please try to contain your nausea should you have any.
<urlrewrite use-query-string="true">
<rule match-type="regex" enabled="true">
<note>
- "/post/**" and "/user/.../**" are optional - this is because when you're on, say, "/post/20" and you hit
F5, the browser will attempt to get the static assets from "/post/**"
- the second group is used to see if the request is for a static asset
- take advantage of back references and forward only the part that matches the second group
- i.e. "/post/20" as URI -> hit F5 -> "/post/scripts/mainController.js" request of server -> "/scripts/mainController.js" forwarded
- i.e. "/user/Tester/home" -> hit F5 -> "/user/Tester/scripts/mainController.js" -> "/scripts/mainController.js" forwarded
</note>
<condition type="request-uri" operator="equal">\/?(post\/|user\/[a-zA-Z0-9]+\/)(.*.(html|css|jpe?g|gif|png|js|ico|txt|pdf))</condition>
<from>^.*$</from>
<to last="true">/%2</to>
</rule>
<rule match-type="regex" enabled="true">
<note>
Any URI that is prefaced with "/api/open/**" or "/api/secured/**" will be allowed through unmodified.
</note>
<condition type="request-uri" operator="equal">\/api\/(open|secured)\/([a-zA-Z0-9\/]+)</condition>
<from>^.*$</from>
<to last="true">-</to>
</rule>
<rule match-type="regex" enabled="true">
<note>
- Register, browse, search, and upload are all single level urls - the "\z" is to match the end of the string,
otherwise "/register" would match "/registerController.js"
- Inbox CAN be like "/inbox/favorites" so that's why it has a secondary regex - my Regex-Fu isn't good enough to combine
- Settings always has a secondary level
- User always has either home, gallery (w/ page and number), or favorites (w/ page and number)
- A post will always have a number
</note>
<condition type="request-uri" operator="equal" next="or">\/(register\z|browse\z|search\z|upload\z|inbox\z|tag\z)</condition>
<condition type="request-uri" operator="equal" next="or">\/inbox\/(favorites\z|uploads\z|comments\z)?</condition>
<condition type="request-uri" operator="equal" next="or">\/settings\/[A-Za-z-_0-9]+</condition>
<condition type="request-uri" operator="equal" next="or">\/user\/[A-Za-z-_0-9]+\/(home\z|gallery\/[0-9]+\/[0-9]+|favorites\/[0-9]+\/[0-9]+)</condition>
<condition type="request-uri" operator="equal" next="or">\/post\/[0-9]+</condition>
<from>^.*$</from>
<to last="true">/</to>
</rule>
</urlrewrite>
The rules are not as general as I'd like, but they are functional (I have a sneaking suspicion that those five conditions daisy-chained together are a bit of a performance hit). The rules are pretty much tailored solely to my needs, but hopefully they can at least be a starting point for anybody else who is where I was about 4 days ago.
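For anybody who wants to sanity-check rules like these without a running filter, the decision logic can be approximated outside of Tuckey. The following is only a sketch in Python: it assumes the conditions behave like a partial ("find") match, which is what the "\z" remark in the notes implies, and it swaps Java's "\z" for Python's "\Z". The rewrite function and constant names are made up for this example.

```python
import re

# Rule 1: static assets requested relative to a client-side route.
# The dot before the extension is escaped here; the original leaves it unescaped.
STATIC_ASSET = re.compile(
    r"/?(post/|user/[a-zA-Z0-9]+/)(.*\.(html|css|jpe?g|gif|png|js|ico|txt|pdf))"
)
# Rule 2: API calls.
API = re.compile(r"/api/(open|secured)/([a-zA-Z0-9/]+)")
# Rule 3: the five conditions joined with next="or".
CLIENT_ROUTES = [
    re.compile(r"/(register|browse|search|upload|inbox|tag)\Z"),
    re.compile(r"/inbox/(favorites\Z|uploads\Z|comments\Z)?"),
    re.compile(r"/settings/[A-Za-z\-_0-9]+"),
    re.compile(
        r"/user/[A-Za-z\-_0-9]+/"
        r"(home\Z|gallery/[0-9]+/[0-9]+|favorites/[0-9]+/[0-9]+)"
    ),
    re.compile(r"/post/[0-9]+"),
]


def rewrite(uri: str) -> str:
    """Return the path a request would end up at under the three rules."""
    asset = STATIC_ASSET.search(uri)
    if asset:
        return "/" + asset.group(2)  # mirrors <to>/%2</to>
    if API.search(uri):
        return uri                   # mirrors <to>-</to>: pass through
    if any(route.search(uri) for route in CLIENT_ROUTES):
        return "/"                   # hand the route to Angular
    return uri                       # no rule matched
```

One thing the sketch makes visible: "/" itself matches none of the patterns, so a request forwarded to the index is not rewritten a second time. That may be why this version avoids the loop the catch-all "^.*$" rule ran into, though I have not confirmed that against the filter itself.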
Another important thing to take note of is that in your Angular config (if you're using HTML5 mode - I don't believe that the following is required for hashbang mode), make sure you set requireBase to true, like:
$locationProvider.html5Mode({
enabled: true,
requireBase: true
});
and include a
<base href="/">
in the <head> of your index.html file. If you don't, Angular will get confused and parts of your application might not quite load correctly - parts of my URI were being trimmed, for example.
Also, tip for anybody new to using .htaccess / UrlRewriteFilter, go get yourself Postman in order to test your rules - probably a major "well, duh" for most, but for the rest of us it'll be a life saver :)
If anybody has any tips on how to improve the efficiency / combine the regex's at all, please let me know.

validateRequest=true and requestValidationMode="4.0" lets html through

I have a Web Forms website on IIS7 and .NET 4.5.1 and I want the http requests to be validated using Microsoft's Request validation. The web.config default values for validateRequest and requestValidationMode are supposed to be "true" and "4.0" respectively and that should be what I want (I tried specifying them just in case).
<pages validateRequest="true">
<httpRuntime requestValidationMode="4.0" />
For some reason, when I input an HTML tag (tried < script > and < a >) in a form and then submit it, I get the expected "potentially dangerous request" error, but the tag gets saved in the database. Why did it go through? I simply take the textbox's Text value as is and send it to my DB, but I expect the error to stop that from happening.
When I tried setting:
<httpRuntime requestValidationMode="2.0" />
The error was the same, but this time, the tag didn't end up in the database, which is what I want.
I would like to understand why the less safe validation mode "2.0" is the only one that actually prevents the request from going through in my case, which doesn't seem to make much sense. There must be something I'm missing; please let me know if I should provide other information.
I have found a solution to my own problem. Microsoft's documentation for requestValidationMode states that all values above "4.0" are interpreted as "4.0", but that isn't true. Reading this interesting page, I found out that there is a valid "4.5" value that does exactly what I wanted.

Tuckey UrlRewriteFilter Not Working With Multiple Conditions for Host Not Equal

When my URL is localhost:8080, the rule below for Tuckey UrlRewriteFilter wrongly always results in localhost:8080 redirecting to www.example.com.
That behaviour seems contrary to the Tuckey UrlRewriteFilter reference manual!
What I want is for localhost:8080 to remain unchanged without redirection, to allow testing on local computer.
I wish to avoid unwanted URLs which are NOT at the example.com domain from being indexed by search engines. The unwanted URLs have a different domain but point to the same/duplicate example.com pages.
<urlrewrite>
<rule>
<name>Avoid wrong hostname's pages being indexed by search engines</name>
<condition name="host" operator="notequal" next="and">www.example.com</condition>
<condition name="host" operator="notequal" next="and">localhost:8080</condition>
<from>^/(.*)</from>
<to type="permanent-redirect" last="true">http://www.example.com/$1</to>
</rule>
</urlrewrite>
Alternative:
I also tried it another way: removing all condition elements, and altering "from" to be:
<from>^/(^www.example.com|^localhost:8080)(\?.*)?$</from>
i.e. not equal to example.com and not equal to localhost -- but that has same problem.
I had the same problem as you but couldn't find a solution using Tuckey. I ended up reconciling localhost testing with domain-name consistency by using an interceptor in Spring. My code looks like this:
public boolean preHandle(HttpServletRequest request,
HttpServletResponse response, Object handler) throws Exception {
String url = request.getRequestURL().toString();
if (!url.startsWith("http://localhost") && !url.startsWith("http://www.example.com")){
response.sendRedirect("http://www.example.com"+request.getRequestURI());
return false;
}
return true;
}
The downside is the overhead of running this check on every request. Hope this helps!
Since you are using the regex match-type, try giving the condition a regex too, e.g. <condition name="host" operator="notequal">^www.example.com$</condition>
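To pin down the logic the question is after, here is a small sketch in Python (not Tuckey configuration; the names are made up for illustration). It shows the intended "neither host" test, the same test written as a single regex with a negative lookahead, and why the "^" inside the group in the alternative attempt does not negate anything.

```python
import re

ALLOWED_HOSTS = {"www.example.com", "localhost:8080"}


def needs_redirect(host: str) -> bool:
    """True only when the host is neither the canonical domain nor localhost."""
    return host.lower() not in ALLOWED_HOSTS


# The same test as one regex: a negative lookahead rejects the allowed hosts.
NOT_ALLOWED = re.compile(
    r"^(?!(www\.example\.com|localhost:8080)$)", re.IGNORECASE
)

# Inside a group, "^" is still a start-of-input anchor, not a negation,
# so this pattern matches the allowed hosts instead of excluding them.
ANCHORED = re.compile(r"(^www\.example\.com|^localhost:8080)")
```

Whether a lookahead can be dropped straight into a Tuckey condition is something I have not verified, so treat the regex as a statement of intent rather than a tested rule.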

ASP.NET MVC: how is this possible ? The parameters dictionary contains a null entry for parameter 'x'

I was wondering if someone has a clue of what is happening here, and could point me in the right direction.
OK, let's put the code in context.
I have ajax methods (jquery) like this:
$xmlHttp("/Api/GetWaitingMessages", { count: 20 })
.always(processResult);
($xmlHttp simply wraps a jQuery Deferred and some basic $.ajax options.)
And in our health monitoring back office I see things like this:
Exception information:
Exception type: System.ArgumentException
Exception message: The parameters dictionary contains a null entry for parameter 'count' of non-nullable type 'System.Int32' for method 'System.Web.Mvc.ActionResult GetWaitingMessages(Int32)' in 'AjaxController'. An optional parameter must be a reference type, a nullable type, or be declared as an optional parameter.
Parameter name: parameters
Now the thing is, I placed some traces and try/catches (for testing) to make sure that jQuery never calls GetWaitingMessages with an empty or undefined "count", but as far as the health monitoring exceptions go, GetWaitingMessages was invoked and passed null as a parameter. (From what I understand, MVC invokes action methods via reflection.)
Btw: the error only happens in maybe 1 out of many thousands of requests.
The signature of GetWaitingMessages is:
public virtual ActionResult GetWaitingMessages(int count)
{
....
}
So I suppose MVC shouldn't even hit the method, since there would be no signature match.
Does MVC have problems with high traffic websites (ie. multi-threading problems) ?
The site mentioned above is running on a cluster of 5 web-farm servers with Network Load Balancing and IP affinity.
Each server gets around 1500 request/sec at peak times.
The site is using url rewriting to map domains to areas (ie test.com will simply insert /test into the url) since it's a skinable & multilingual white label site.
Some more details on site configuration:
The controller that serves ajax requests is decorated with
[SessionState(SessionStateBehavior.Disabled)]
HttpModules that were considered useless were removed, since we need to run with runAllManagedModulesForAllRequests="true" in MVC. I could have set runAllManagedModulesForAllRequests="false" and then tried to figure out what to add, and in which order, but found it simpler to just remove what I know is not essential.
<remove name="AnonymousIdentification" />
<remove name="Profile" />
<remove name="WindowsAuthentication" />
<remove name="UrlMappingsModule" />
<remove name="FileAuthorization" />
<remove name="RoleManager" />
<remove name="Session" />
<remove name="UrlAuthorization" />
<remove name="ScriptModule-4.0" />
<remove name="FormsAuthentication" />
The following are all activated and configured in the web.config
<pages validateRequest="false" enableEventValidation="false" enableViewStateMac="true" clientIDMode="Static">
and also:
urlCompression
staticContent
caching
outputCache
EDIT: I just analyzed my trace logs a bit more. When the error occurs, I see (Content-Length: 8), which corresponds to (count=20). However, I do not see any query parameters in the logs. I dumped the HttpInputStream to the logs, and it's completely empty. But as I just mentioned, the logs also say that Content-Length = 8, so something is very wrong here.
Could IIS (or possibly URL rewriting) be mixing things up somewhere along the way?
Any help would be greatly appreciated. I'm ripping my hair out trying to understand what could possibly be going wrong here.
Thanks, Robert
What type of request does your xmlHttp issue to the server (GET, POST, or something else)?
What is the definition of GetWaitingMessages action method?
It might very well be the case of mismatching accepted verbs or argument names.
I have a feeling that this could be a problem with MVC not being able to bind to your 'count' parameter. By default, it expects the parameter to be named 'id'.
You can try the following:
Modify your GetWaitingMessages action to define it with a parameter called 'id' instead of 'count'
Create a custom model binding as described in the accepted answer to the stackoverflow question at asp.net mvc routing id parameter
Hope this helps
EDIT: Just saw your reply to another answer stating that the action is a POST. In which case, binding may not be an issue.
Just for testing, try this to see if there is any problem.
public virtual ActionResult GetWaitingMessages(FormCollection form)
{
var count=Int32.Parse(form["count"]);
....
}
Of course it will throw if the count field isn't set. If it always works correctly then the problem is with routing or model binding.
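The Content-Length observation from the question's EDIT is easy to double-check: "count=20" is exactly 8 bytes, so a declared length of 8 with an empty input stream means the body went missing before binding, not that the client sent a null count. Here is a minimal sketch of that check, in Python rather than C# and with a hypothetical read_count helper:

```python
from typing import Optional
from urllib.parse import parse_qs


def read_count(body: bytes) -> Optional[int]:
    """Return the posted 'count' value, or None when the body lacks it."""
    fields = parse_qs(body.decode("ascii", "replace"))
    values = fields.get("count")
    return int(values[0]) if values else None
```

Logging the declared Content-Length next to the number of bytes actually read for each failing request would show whether the body is being dropped somewhere in front of MVC.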
