Using CDN like Cloudflare with mod_rewrite to save bandwidth - caching

I know CDNs like Cloudflare save bandwidth by caching static files such as images, JS, and CSS files.
I have a script that generates images on the fly and its location looks something like this:
http://domain.com/image.php?id=1
With id being the image ID. Cloudflare won't cache these images because of the dynamic URL. If I add a mod_rewrite rule to rewrite the URL to something like:
http://domain.com/images/1
or
http://domain.com/images/1.jpg
Will Cloudflare cache the images in this case, or do the images have to be actual files that reside in directories?
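For reference, a rewrite rule along the lines described might look like this in an .htaccess file at the web root (a sketch; the exact pattern and flags are assumptions):

RewriteEngine On
# serve /images/1 or /images/1.jpg from the dynamic script without changing the visible URL
RewriteRule ^images/(\d+)(\.jpg)?$ image.php?id=$1 [L,QSA]

The rewrite is internal, so the browser (and the CDN) only ever sees the static-looking URL.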

One way to check is to open a page containing an image and then use the Chrome web inspector. Go to Network > Images, select an image, then under Headers > Response Headers look for cf-cache-status.
If you see cf-cache-status: HIT, then the image or resource is being cached by Cloudflare. I think the alternative is MISS.
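The same header can also be checked from the command line with curl, for example (the URL is only a placeholder):

# -s silences progress output, -I fetches only the response headers
curl -sI http://domain.com/images/1.jpg | grep -i cf-cache-status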
Good luck

This question is a duplicate of one on StackOverflow already: Using CDN like Cloudflare with mod_rewrite to save bandwidth
The short answer, however, is this:
The easiest way to Cache Everything on a given endpoint in Cloudflare is to use a Cache Everything Page Rule; an asterisk acts as a wildcard. So for your first example we can do the following:
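Roughly, the Page Rule would be configured like this in the Cloudflare dashboard (the exact pattern here is an assumption):

URL pattern:  *domain.com/image.php*
Setting:      Cache Level: Cache Everything

With Cache Everything enabled, Cloudflare caches responses for the dynamic URL as well, so the rewrite to /images/1.jpg is not strictly required just to get caching; without a Page Rule, though, Cloudflare only caches recognised static extensions such as .jpg by default, which is why the rewritten form helps.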

Related

Cloudfront, EC2 and Relative URLs

This is probably a simple question, but I can't find a straightforward answer anywhere.
My website is hosted on Amazon EC2.
I want to take advantage of Amazon Cloudfront to speed up the loading of the images, Javascript and CSS on my site.
Throughout my site, I've used relative URLs to point to the images, Javascript and CSS that reside on my EC2 server.
So, now, in order to take advantage of Cloudfront, do I need to change all my relative URLs into absolute URLs which point to Cloudfront, or will this be handled automatically by Amazon/EC2/Cloudfront? Or, maybe a better way to ask the question is, can I leave all my URLs as relative URLs and still get all the advantages of Cloudfront?
The short answer is no, your relative URLs will not work as expected with CloudFront. The exception is the case mentioned by Gio Hunt: once your page loads a CSS file from CloudFront, any relative URL inside that CSS file will itself resolve to CloudFront, but this probably isn't very useful in your case.
See this answer for a solution using SASS that pretty closely matches what I've done in the past:
I used SASS - http://sass-lang.com
I have a mixin called cdn.scss with content like $image_path: "/images/";
Import that file in your Sass stylesheet: @import "cdn.scss";
Update image paths as such: background:url($image_path + "image.png");
On deployment I change the $image_path variable in cdn.scss and then rerun Sass
Basically you regenerate your CSS to use the CDN (CloudFront) base URL by creating a variable that all your pages respect. The difficulty involved will depend on how many references and files you need to change, but a simple search and replace for relative paths is pretty easy to accomplish.
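A minimal sketch of that approach, using the file and variable names from the steps above (the CloudFront hostname is only a placeholder):

// cdn.scss -- the only file touched at deploy time
$image_path: "/images/";
// for production, point at the CDN instead, e.g.
// $image_path: "https://dxxxxexample.cloudfront.net/images/";

// style.scss
@import "cdn.scss";

#header {
  background: url($image_path + "image.png");
}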
Good luck.
If you want to leave everything as is, you could pass everything through CloudFront by setting your site up as a custom origin. This can work pretty well if your site is mostly static.
If you do want to take advantage of CloudFront without sending everything through it, you will need to update your relative URLs to absolute ones. CSS files can keep relative URLs as long as the CSS file itself is served via CloudFront.

Does Robots Meta Tag NOINDEX remove indexed URLs?

We have an application which has about 15000 pages. For SEO reasons we had to change the URLs. Google had already crawled all of these pages earlier, and because of the change we now see a lot of duplicate titles/meta descriptions in Webmaster Tools. Our impressions on Google have dropped and we believe this is the reason; correct me if my assumption is incorrect. Now we are not able to write a regular expression that maps the old URLs to the new ones for a 301 redirect, because of the nature of the change. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10000 URLs. Can we use a robots meta tag with NOINDEX? My question basically is: if I add a NOINDEX meta tag, will Google remove the already indexed URLs? If not, what are the other ways to remove the old indexed URLs from Google? Another thing I could do is make all the previous pages return 404 errors to avoid the duplicates, but would that be the right thing to do?
Now we are not able to write a regular expression that maps the old URLs to the new ones for a 301 redirect, because of the nature of the change. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10000 URLs.
Of course you can! I'm rewriting more than 15000 URLs with mod_rewrite and RewriteMap!
This is just a matter of scripting (echoing out all the URLs) and a bit of vim mastery, but you can do it, and easily. If you need more information, just ask.
What you can do is a RewriteMap file like this:
/baskinrobbins/branch/branch1/ /baskinrobbins/branch/Florida/Jacksonville/branch1
I've made a huge answer here and you can very easily adapt it to your needs.
I could do that job in 1-2 hours max but I'm expensive ;).
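A sketch of how such a map plugs into the Apache configuration (the map file location and the catch-all pattern are assumptions; note that RewriteMap has to be declared in the server or virtual host config, not in .htaccess):

RewriteEngine On
# one "old-path new-path" pair per line, as in the example above
RewriteMap redirects txt:/etc/apache2/redirects.map

# if the requested path has an entry in the map, 301 to the new URL
RewriteCond ${redirects:$1} !=""
RewriteRule ^(.*)$ ${redirects:$1} [R=301,L]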
Reindexing is slow
It would take weeks for Google to ignore the older URLs anyway.
Use Htaccess 301 redirects
You can add a file on your Apache server, called .htaccess, that lists all the old URLs and the new URLs and instantly redirects the user to the new page. Can you generate such a text file? I'm sure you can loop through the sections in your app and generate a list of URLs.
Use the following syntax.
Redirect 301 /oldpage.html http://www.yoursite.com/newpage.html
Redirect 301 /oldpage2.html http://www.yoursite.com/folder/
This prevents the 404 File Not Found errors, and is better than a meta refresh or redirect tag because the old page is not even served to clients.
I did this for a website that had gone through a recent upgrade, and since Google kept pointing to the older files, we needed to redirect clients to view the new content instead.
Where's .htaccess?
Go to your site's root folder, and create/download the .htaccess file to your local computer and edit it with a plain-text editor (e.g. Notepad). If you are using FTP client software and you don't see any .htaccess file on your server, ensure you are viewing invisible/system files.

Use IIS Rewrite Module to redirect to Amazon S3 bucket

My MVC project uses the default location (/Content/...)
So where this code:
<div id="header"style="background-image: url('/Content/images/header_.jpg')">
resolves as www.myDomain.com/content/images/header_.jpg
I'm moving my image files to S3, so now they resolve from http://images.myDomain.com. Do I have to convert all the links in the project to that absolute path?
Is there perhaps an IIS7x property to help here?
EDIT: The question seems to boil down to the specifics of working with IIS's Rewrite Module. The samples I've seen so far show how to manipulate the path and query string of a URI; I need to remap the domain end of the URI:
http://www.myDomain.com/content/images/header_.jpg
needs to become:
http://images.myDomain.com/header_.jpg
thx
I'm not sure I understand you correctly. Do you mean
How do I transparently rewrite image urls like http://www.myDomain.com/Content/myImage.png as http://images.myDomain.com/Content/myImage.png at render time?
Or
How do I serve images like http://images.myDomain.com/Content/myImage.png transparently from S3?
There's a DNS trick to answer the second one.
Create the 'images.myDomain.com' bucket, and put your content in it under the '/Content/' path. Since S3 exposes buckets as domains in their own right, you can now get your content with
http://images.myDomain.com.s3.amazonaws.com/Content/myImage.png
You can then create a CNAME record in your own DNS provider taking 'images.myDomain.com' to 'images.myDomain.com.s3.amazonaws.com'
This lets you link to your images as
http://images.myDomain.com/Content/myImage.png
...and yet have them served from S3. (You might also consider a full CDN such as CloudFront.)
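The answer above covers the DNS approach; for the IIS Rewrite Module part of the question, a redirect rule in web.config might look roughly like this (a sketch following the mapping in the question's edit; a fully transparent rewrite to another host would additionally need Application Request Routing to proxy the requests):

<system.webServer>
  <rewrite>
    <rules>
      <!-- send /content/images/... requests to the S3-backed hostname with a 301 -->
      <rule name="Images to S3" stopProcessing="true">
        <match url="^content/images/(.*)$" />
        <action type="Redirect" url="http://images.myDomain.com/{R:1}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>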

Understanding how images are served and cached

So I'm wondering how browsers treat requests for images. I'm hoping to use a cdn for serving product images on my website. I'd also like to use the cdn for serving button images and images used in my css.
The problem with this is that I don't have control over the Expires headers (Rackspace Cloud Files is what I'm looking into).
See, say I have a large image file as a background on my home page. So the page is accessed often, but the image stays the same. Is the browser going to request this image every time?
Or should I just use a cdn for my product images?
Caching is quite a broad subject. I suggest you start by reading about the different kinds of caching here: http://www.mnot.net/cache_docs/#BROWSER and about how caching works here: http://www.web-caching.com/mnot_tutorial/how.html
Now, to answer your question: assuming the user has caching enabled and the CDN response headers are properly configured, a user visiting your page multiple times will only request that background image once, until the cache expires or those files are cleared.
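"Properly configured" here means the CDN sends caching headers along these lines (the values are only illustrative):

Cache-Control: public, max-age=31536000

max-age is in seconds, so this tells browsers and intermediate caches they may reuse the image for up to a year without requesting it again.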
No, AFAIK you do need to add caching headers (Expires / Cache-Control) to your images to enable browser caching. This is a great tutorial about it.
Additionally you can read this article from Yahoo to get a very brief view of the topics.
Review especially these topics from the article:
Minimize HTTP Requests
Add an Expires or a Cache-Control Header
Use a Content Delivery Network
Hope it helps you

Question about using subdomains to force caching

I haven't had a huge opportunity to research the subject but I figure I'll just ask the question and see if we can create a knowledge base on the subject here.
1) Using subdomains will force a client-side cache. Is this by default, or is there an easy way for a client to disable it? I'm more curious about what percentage of users I should expect this to affect.
2) What all will be cached? Images? Stylesheets? Flash SWFs? Javascripts? Everything?
3) I remember reading that you must use a subdomain or www in your URL for this to work, is this correct? (and does this mean SO won't allow it?)
I plan on integrating this onto all of my websites eventually, but first I am going to try it on a network of Flash game websites. I am thinking www.example.com for the website will remain the same, but instead of using www.example.com/images, www.example.com/stylesheets, www.example.com/javascript, & www.example.com/swfs I will just create subdomains that point to them (img.example.com, css.example.com, js.example.com & swf.example.com respectively). Is this the best course of action?
Using subdomains for content elements isn't so much to force caching, but to trick a browser into opening more connections than it might otherwise do. This can speed up page load time.
Caching of those elements is entirely down to the HTTP headers delivered with that content.
For static files like CSS, JS etc, a server will typically tell the client when the file was modified, which allows a browser to ask for the file "If-Modified-Since" that timestamp. Specifics of how to improve on this by adding some extra caching headers would depend on which webserver you use. For example, with Apache you can use the mod_expires module to set the Expires header, or the Header directive to output other types of cache control headers.
As an example, if you had a subdirectory containing your CSS files and wanted to ensure they were cached for at least an hour, you could place a .htaccess file in that directory with these contents:
ExpiresActive On
ExpiresDefault "access plus 1 hours"
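The Header directive mentioned above would achieve something similar with an explicit Cache-Control value, for example (requires mod_headers; the value is only an illustration):

Header set Cache-Control "public, max-age=3600"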
Check out YSlow's documentation. YSlow is a plugin for Firebug, the amazing Firefox web development plugin. There is lots of good info on a number of ways to speed up your page loads, one of which is using one or more subdomains to encourage the browser to do more parallel object loads.
One thing I've done on two Django sites is to use a custom template tag to create pseudo-paths to images, css, etc. The path contains the time-last-modified as a pseudo directory. This path component is stripped out by an Apache .htaccess mod_rewrite rule. The object is then given a 10 year time-to-live (ExpiresDefault "now plus 10 years") so the browser will only load it once. If the object changes, the pseudo path changes and the browser will fetch the updated object.
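A sketch of the Apache half of that trick, assuming the template tag inserts the modification time as a numeric path segment under /static/ (the exact layout is an assumption):

# /static/1306743544/css/site.css  ->  /static/css/site.css on disk
RewriteEngine On
RewriteRule ^static/\d+/(.*)$ /static/$1 [L]

# the pseudo path changes whenever the file does, so a very long TTL is safe
ExpiresActive On
ExpiresDefault "now plus 10 years"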

Resources