Deeplinks with Backbone frontend on S3 - mod-rewrite

I have a single-page JavaScript (Backbone) frontend running on S3, and I'd like a couple of deep links to be redirected to the same index file. You'd normally do this with mod_rewrite in Apache, but there is no way to do that in S3.
I have tried setting the default error document to be the same as the index document. This works on the surface, but if you check the actual response status you'll see the page comes back as a 404. This is obviously not good.
There is another solution; it's ugly, but better than the error-document hack:
It turns out you can create a copy of index.html and name it the same as the subdirectory (minus the trailing slash). For example, if I clone index.html, name the copy 'about', and make sure its Content-Type is set to text/html (in the metadata tab), all requests to /about will return the new 'about' object, which is a copy of index.html.
Obviously this solution is sub-optimal and only works with predefined deep-link targets, but the hassle could be lessened if the step to clone index.html were part of a build process for the frontend. Using Backbone-Boilerplate I could write a grunt task to do just that (a rough sketch is below).
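Something along these lines could work; this is only a minimal sketch, and the deep-link names and the dist/ output directory are assumptions for illustration:

// Hypothetical Gruntfile.js fragment: clone index.html once per known
// deep-link target so S3 can serve it at /about, /contact, etc.
module.exports = function (grunt) {
  grunt.initConfig({
    // deep-link paths the app knows about (assumed names)
    deeplinks: ['about', 'contact']
  });

  grunt.registerTask('deeplinks', 'Copy index.html to each deep-link key', function () {
    grunt.config.get('deeplinks').forEach(function (name) {
      // grunt.file.copy is part of Grunt's core file API
      grunt.file.copy('dist/index.html', 'dist/' + name);
    });
    // The copies still need Content-Type: text/html set when uploaded to S3.
  });
};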
Other than these two hacky workarounds, I don't see a way of doing this short of resorting to hashbangs.
Any suggestions will be greatly appreciated.
UPDATE:
S3 now (and has for a while, actually) supports Index Documents, which solves this problem.
Also, if you use Route 53 for your DNS management, you can set up an alias record pointing to your S3 bucket, so you don't need a subdomain + CNAME anymore :)

Unfortunately, as far as I know (and I use S3 websites quite a bit), you're right on the money. The 404 hack is a really bad idea, as you said, so you basically have these options:
Use a regular backend of some kind and not S3
The Content-Type work-around
Hashbangs
Sorry to be the bearer of bad news :)
For me, the fact that you can't really direct the root of the domain to S3 websites was the deal breaker for some of my stuff. mod_rewrite-type scenarios sound like another good example where it just doesn't work.

Did you try redirecting to a hash? I am not sure if this S3 feature was available when you asked this question, but I was able to fix the problem using these redirection rules in the static website hosting section of the bucket's properties.
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>topic/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyPrefixWith>#topic/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
The rest is handled in the Backbone.js application (a sketch of the router side is below).
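A minimal sketch of that router side, assuming Backbone (and its dependencies) are already loaded; the route and handler names are my own illustration, not the actual application:

// After S3 redirects /topic/foo to /#topic/foo, a hash-based Backbone
// router can pick the fragment up and render the right view.
var AppRouter = Backbone.Router.extend({
  routes: {
    'topic/:name': 'showTopic'
  },
  showTopic: function (name) {
    // render whatever view corresponds to the deep link
    console.log('Rendering topic:', name);
  }
});

new AppRouter();
Backbone.history.start(); // hash-based routing, no pushState needed here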

Related

Encode/Decode SEO urls through platform API?

I am trying to decode and encode Joomla URLs, but Joomla doesn't seem to have a consistent API for that (or so it looks). The main problem comes in when another SEO plugin is installed and the operation is performed as a background process (i.e. not while rendering in a browser through Joomla).
The other big problem is that users copy and paste SEO URLs of their own site directly into the content.
Does anyone know a solution for this? Supporting all sorts of SEO plugins individually is a total no-go and rather impossible.
I actually thought it's the job of the CMS to guarantee, at an API level, that SEO URLs can be decoded and encoded without knowing about the plugins, but no. I also had a look at some plugins and indeed, plugins do handle code for other plugins, which shouldn't be necessary.
Well, thanks.
You can't. JRoute won't work reliably in the administrator; I even tried hacking it, and it's a no-go.
Moreover, sh404 (one of the leading SEF extensions) does a curl call to the frontend in order to get the paths right. You can find in their code a commented-out attempt to route in the backend.
Are you trying to parse content when it's saved, find SEF URLs, and replace them with their non-SEF equivalents? If you create a simple component to handle this in the frontend (just take what you need from xmap), then you can query the frontend from the backend with curl/wget and possibly achieve this with a decent rate of success; but I wouldn't expect it to work 100% of the time (sometimes parameters are added by components, the order of parameters differs from call to call, and the router.php in extensions can be very fragile or even plain wrong).

Does Robots Meta Tag No Index remove indexed URL's

We have an application which has about 15000 pages. For SEO reasons we had to change the URLs. Google had already crawled all of these pages earlier, and due to the change we now see a lot of duplicate titles/meta descriptions in Webmaster Tools. Our impressions on Google have dropped, and we believe this is the reason; correct me if my assumption is incorrect. We are not able to write a regular expression for the URL change using a 301 redirect, because of the nature of the change. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10000 URLs. Can we use a robots meta tag with NOINDEX? My question basically is: if I write a NOINDEX meta tag, will Google remove the already-indexed URLs? If not, what are the other ways to remove the old indexed URLs from Google? Another thing I could do is make all the previous pages return 404 errors to avoid the duplicates, but would that be the right thing to do?
Now we are not able to write a regular expression for the change of URLs using a 301 redirect, because the change was such. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10000 URLs.
Of course you can! I'm rewriting more than 15000 URLs with mod_rewrite and RewriteMap!
This is just a matter of scripting (echoing all the URLs) and mastering vim, but it can be done, and easily. If you need more information, just ask.
What you can do is a RewriteMap file like this:
/baskinrobbins/branch/branch1/ /baskinrobbins/branch/Florida/Jacksonville/branch1
I've made a huge answer here and you can very easily adapt it to your needs (a minimal sketch of the Apache side is below).
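For reference, consuming such a map looks roughly like this; a minimal sketch, assuming the map lives at /etc/apache2/redirects.map (the path is an assumption), and noting that RewriteMap must be declared in the server or vhost config, not in .htaccess:

RewriteEngine On
# plain-text map of old-path -> new-path pairs, one per line
RewriteMap redirects txt:/etc/apache2/redirects.map
# only redirect when the map actually has an entry for the requested path
RewriteCond ${redirects:%{REQUEST_URI}} !=""
RewriteRule ^ ${redirects:%{REQUEST_URI}} [R=301,L]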
I could do that job in 1-2 hours max but I'm expensive ;).
Reindexing is slow
It would take weeks for Google to ignore the older URLs anyways.
Use .htaccess 301 redirects
You can add a file on your Apache server, called .htaccess, that lists all the old URLs and the new URLs and has the user instantly redirected to the new page. Can you generate such a text file? I'm sure you can loop through the sections in your app or whatever and generate a list of URLs.
Use the following syntax.
Redirect 301 /oldpage.html http://www.yoursite.com/newpage.html
Redirect 301 /oldpage2.html http://www.yoursite.com/folder/
This prevents the 404 File Not Found errors, and is better than a meta refresh or redirect tag because the old page is not even served to clients.
I did this for a website that had gone through a recent upgrade, and since Google kept pointing to the older files, we needed to redirect clients to the new content instead.
Where's .htaccess?
Go to your site's root folder, and create/download the .htaccess file to your local computer and edit it with a plain-text editor (e.g. Notepad). If you are using FTP client software and you don't see any .htaccess file on your server, make sure you are viewing invisible/system files.

Use IIS Rewrite Module to redirect to Amazon S3 bucket

My MVC project uses the default location (/Content/...)
So this markup:
<div id="header" style="background-image: url('/Content/images/header_.jpg')">
resolves to www.myDomain.com/content/images/header_.jpg
I'm moving my image files to S3, so now they resolve from 'http://images.myDomain.com'. Do I have to convert all the links in the project to that absolute path?
Is there perhaps an IIS7x property to help here?
EDIT: The question seems to boil down to the specifics of working with IIS's Rewrite Module. The samples I've seen so far show how to manipulate the path and query string of a URI. I need to remap the domain end of the URI:
http://www.myDomain.com/content/images/header_.jpg
needs to become:
http://images.myDomain.com/header_.jpg
thx
I'm not sure I understand you correctly. Do you mean
How do I transparently rewrite image urls like http://www.myDomain.com/Content/myImage.png as http://images.myDomain.com/Content/myImage.png at render time?
Or
How do I serve images like http://images.myDomain.com/Content/myImage.png transparently from S3?
There's a DNS trick to answer the second one.
Create the 'images.myDomain.com' bucket, and put your content in it under the '/Content/' path. Since S3 exposes buckets as domains in their own right, you can now get your content with
http://images.myDomain.com.s3.amazonaws.com/Content/myImage.png
You can then create a CNAME record with your own DNS provider pointing 'images.myDomain.com' to 'images.myDomain.com.s3.amazonaws.com'.
This lets you link to your images as
http://images.myDomain.com/Content/myImage.png
...and yet have them served from S3. (You might also consider a full CDN such as CloudFront.)
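For illustration only, the record looks something like this in BIND zone-file syntax (most DNS providers offer an equivalent form in their control panel; the zone layout is an assumption):

; hypothetical entry in the myDomain.com zone
images    IN    CNAME    images.myDomain.com.s3.amazonaws.com.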

With Google's #! mess, what effect would a redirect on the converted URL have?

So Google takes:
http://www.mysite.com/mypage/#!pageState
and converts it to:
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
...So... Would it be fair game to redirect that with a 301 status to something like:
http://www.mysite.com/mypage/pagestate/
and then return an HTML snapshot?
My thought is that if you have an existing HTML structure and you just want to add AJAX as a progressive enhancement, this would be a fair way to do it, provided Google just skips over _escaped_fragment_ and indexes the redirected URL. Then your AJAX links are wired up by JavaScript, and underneath them are the regular links that go to your regular site structure.
So then, when a user comes in on a static URL (e.g. http://www.mysite.com/mypage/pagestate/), the first link he clicks takes him to the AJAX interface if he has JavaScript, and from then on it's all AJAX.
On a side note, does anyone know if Yahoo/MSN are on board with this 'spec' (loosely used)? I can't seem to find anything that says for sure.
If you redirect the "?_escaped_fragment_" URL, it will likely result in the final URL being indexed (which might result in a suboptimal user experience, depending on how you have your site set up). There might be a reason to do it like that, but it's hard to say in general.
As far as I know, other search engines are not yet following the AJAX-crawling proposal.
You've pretty much got it. I recently did some tests and experimented with sites like Twitter (which uses #!) to see how they handle this. From what I can tell they handle it like you're describing.
If this is your primary URL
http://www.mysite.com/mypage/#!pageState
Google/Facebook will go to
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
You can set up a server-side 301 redirect to a prettier URL, perhaps something like
http://www.mysite.com/mypage/pagestate/
On these HTML snapshot pages you can add a client-side redirect to send most people back to the dynamic version of the page (a sketch is below). This ensures most people share the dynamic URL. For example, if you try to go to http://twitter.com/brettdewoody it'll redirect you to the dynamic (https://twitter.com/#!/brettdewoody) version of the page.
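A minimal sketch of such a client-side redirect, assuming the snapshot lives at /mypage/pagestate/ (the path and state name are assumptions, not anyone's actual code); crawlers fetching the _escaped_fragment_ URL don't run JavaScript, so they still index the snapshot:

(function () {
  // send JavaScript-capable visitors back to the hash-bang version of the page
  var state = 'pagestate'; // hypothetical state for this snapshot
  window.location.replace('/mypage/#!' + state);
})();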
To answer your last question, both Google and Facebook use the _escaped_fragment_ method right now.

websites urls without file extension?

When I look at Amazon.com, I see that their page URLs do not have .htm, .html, or .php at the end.
It is like:
http://www.amazon.com/books-used-books-textbooks/b/ref=topnav_storetab_b?ie=UTF8&node=283155
Why and how? What kind of extension is that?
Your browser doesn't care about the extension of the file, only the content type that the server reports. (Well, unless you use IE because at Microsoft they think they know more about what you're serving up than you do). If your server reports that the content being served up is Content-Type: text/html, then your browser is supposed to treat it like it's HTML no matter what the file name is.
Typically, it's implemented using a URL rewriting scheme of some description. The basic notion is that the web should be moving to addressing resources with proper URIs, not classic old URLs which leak implementation detail, and which are vulnerable to future changes as a result.
A thorough discussion of the topic can be found in Tim Berners-Lee's article Cool URIs Don't Change, which argues in favour of reducing the irrelevant cruft in URIs as a means of helping to avoid the problems that occur when implementations do change, and when resources do move to a different URL. The article itself contains good general advice on planning out a URI scheme, and is well worth a read.
More specifically than most of these answers:
Web clients don't use the file extension to determine what kind of file is being served (unless you're Internet Explorer). Instead, they use the Content-Type HTTP header, which is sent down the wire before the content of the image, HTML page, download, or whatever. For example:
Content-type: text/html
denotes that the page you are viewing should be interpreted as HTML, and
Content-type: image/png
denotes that the page is a PNG image.
Web servers often use the file extension to determine what Content-Type to assign if the file is served directly from disk, but web applications can also generate pages with any Content-Type they like in response to a request. No matter the filename's structure or extension, as long as the actual content of the page matches the declared Content-Type, the data renders as intended.
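To make that concrete, here is a minimal sketch (my own illustration, not taken from any site mentioned here) of a Node.js server that serves an extensionless URL as HTML purely by setting the header:

var http = require('http');

http.createServer(function (req, res) {
  if (req.url === '/about') {
    // no ".html" anywhere: the Content-Type header alone tells the browser what this is
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('<html><body><h1>About</h1></body></html>');
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not found');
  }
}).listen(8080);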
Websites that use Apache are probably using mod_rewrite, which enables them to rewrite URLs (and make them more user- and SEO-friendly); a small example follows the links below.
You can read more here http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
and here http://www.sitepoint.com/article/apache-mod_rewrite-examples/
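As a rough illustration (the internal script name is an assumption, not Amazon's actual setup), an .htaccess-style rule like this maps a friendly URL onto a script with a query string:

RewriteEngine On
# /books/283155 is served internally by /catalog.php?node=283155
RewriteRule ^books/([0-9]+)$ /catalog.php?node=$1 [L,QSA]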
EDIT: There are rewriting modules for IIS as well.
Traditionally the file extension represents the file that is being served.
For example
http://someserver/somepath/image.jpg
Later, that same approach was used to let a script process parameters:
http://somerverser/somepath/script.php?param=1234&other=7890
In this case the file was a PHP script that processed the "request" and presented a dynamically created page.
Nowadays, applications are much more complex than that (namely Amazon, which you mentioned).
There is no longer a single script that handles the request (but a much more complex app with several files/methods/functions/objects, etc.), and the URL is more like the entry point for a web application (it may have a script behind it, but that's another thing). So web apps like Amazon, and yes Stack Overflow, don't show a file in the URL; anything coming in is processed by the app on the server side.
Take this very question's URL as an example: here 'questions' represents the web app and 322747 is the parameter.
I hope this little explanation helps you understand the other answers better.
Well, how about having an index.html file in the directory and then just typing the path into the browser? I see that my Firefox and IE7 both add the trailing slash automatically; I don't have to type it. This is more suited to people like me who do not think every single URL on earth should invoke PHP, Perl, CGI, and 10,000 other applications just in order to send a few kilobytes of data.
A lot of people are using a more "RESTful" type of architecture... or at least REST-looking URLs.
This site (Stack Overflow) doesn't show a file extension... it's using ASP.NET MVC.
Depending on your server's settings you can use (or not use) any extension you want. You could even make your extension ".JamesRocks", but it won't be very helpful :)
Anyway, just in case you're new to web programming: all that gibberish at the end is arguments to a GET request, not the page's extension.
A number of posts have mentioned this, and I'll weigh in. It absolutely is a URL rewriting system, and a number of platforms have ways to implement this.
I've worked for a few larger ecommerce sites, and it is now a very important part of the web presence, and offers a number of advantages.
I would recommend taking the technology you want to work with and researching samples of the URL rewriting mechanism for that platform. For .NET, for example, Google 'asp.net url rewriting', or use an add-on framework like MVC, which provides this functionality out of the box.
In Django (a web application framework for Python), you design the URLs yourself, independent of any file name, or even any path on the server for that matter.
You just say something like "I want /news/<number>/ URLs to be handled by this function".
