I'm trying to wrap my head around which approach I should use to force CDN refreshes of user profile photos on a website where CloudFront is the CDN serving the profile photos and S3 is the underlying file store.
I need user profile photos to be up to date as soon as a user changes them. I see three options for updating profile photos so that website users get the latest image as soon as it is revised. Is one of these approaches better than the others in terms of content freshness and long-term cost effectiveness? Are there better approaches altogether?
Issue one S3 PutObject request to save the file under its original file name, then issue one CloudFront invalidation request. CloudFront only allows up to 1,000 free invalidation paths per month, which seems a bit on the low side.
Issue one S3 DeleteObject request to delete the original photo, then one S3 PutObject request to save the new photo under a new, unique file name. That is two S3 requests per file update and no CloudFront invalidation request; CloudFront would serve the latest files as soon as they were uploaded, provided the image URLs were automatically updated to the new file names.
Issue one S3 PutObject request to save the file under its original file name, then append a version code to the CDN URLs client-side (e.g. /img/profilepic.jpg?x=timestamp) or something along those lines. I'm not sure how effective this strategy is at invalidating cached CloudFront objects.
Thanks
CloudFront invalidation can take a while to kick in, and is recommended only as a last resort for content that absolutely must be removed (such as a copyright infringement).
The best approach is to use versioned URLs. For profile images I would use a unique ID (such as a GUID) in the file name. Whenever a user uploads a new photo, switch to the new URL (and delete the old photo if you wish).
When you update your database with the new ID of the user's profile photo, CloudFront will pull the new image on the first request, so the change is effectively immediate.
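A minimal sketch of that versioned-URL flow in Python. The `s3` argument is a boto3-style S3 client, and `db.get_photo_key`/`db.set_photo_key` are hypothetical helpers standing in for wherever you store the user's current photo key; the bucket name is also made up:

```python
import uuid

def new_photo_key(user_id: str) -> str:
    # A fresh, globally unique key per upload means a fresh CDN URL,
    # so no invalidation is ever needed.
    return f"profile-photos/{user_id}/{uuid.uuid4().hex}.jpg"

def replace_profile_photo(s3, db, user_id, image_bytes, bucket="my-photos"):
    old_key = db.get_photo_key(user_id)
    new_key = new_photo_key(user_id)
    s3.put_object(Bucket=bucket, Key=new_key, Body=image_bytes,
                  ContentType="image/jpeg",
                  # Safe to cache forever: this exact key's content never changes.
                  CacheControl="public, max-age=31536000, immutable")
    db.set_photo_key(user_id, new_key)  # CloudFront pulls the new key on demand
    if old_key:
        s3.delete_object(Bucket=bucket, Key=old_key)  # optional cleanup
    return new_key
```

Because every key is unique, the long `max-age` is harmless: the CDN and browsers may cache each photo indefinitely, and freshness comes entirely from the URL changing.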
I'm fairly new to .NET Core, but I feel I now have a decent grasp of it. Still, I am struggling with a problem that must have been encountered and solved before, yet I can't find any tutorials, answers, or pages that deal with it.
If I have a website where users can upload their own photos, and those photos need to be private to each user, how do I deal with <img> elements on pages? I can handle uploading and storing the files. My problem is, once I have the files on the server, how do I serve the <img> elements that request those files? There are two parts to my question:
1. Ideally I want a simple URL that (a) doesn't reflect the physical storage location and (b) carries an ID for the image, i.e. /files/images?id=123 rather than a URL mirroring the server's directory structure. The first problem is how to intercept the request, map the image URL to a physical location, and return the actual image file. What does this look like from the server's perspective (how do I receive the request), and what code intercepts it (is it middleware)? Does .NET Core have any built-in mechanisms that can route image requests? It seems like such a common requirement that I would have thought there were features to do it.
2. When doing (1), how do I authenticate requests so that I only deliver images to authorized users, particularly when it may not be a simple 1:1 relationship (a user may grant access to a particular image to multiple other users, so I will need to verify access rights)?
Thanks
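The shape of the access-control part is framework-agnostic; here is a minimal sketch in Python (in ASP.NET Core the same shape would be a controller action that runs the checks and returns `File()`). The `can_view`, `get_path`, and `read_file` callbacks are hypothetical stand-ins for the app's ACL lookup, ID-to-path mapping, and storage access:

```python
def serve_image(user_id, image_id, can_view, get_path, read_file):
    # can_view(user_id, image_id): ACL check, covers many-to-many grants.
    # get_path(image_id): maps the public ID to the private physical path.
    # read_file(path): reads bytes from wherever the files actually live.
    if user_id is None:
        return (401, None)                        # not logged in
    if not can_view(user_id, image_id):
        return (403, None)                        # no grant for this image
    # The URL only ever carried the ID; the physical path stays server-side.
    return (200, read_file(get_path(image_id)))
```

The key point is that the endpoint is just another authenticated route: the <img> tag's src hits it like any other request, and the handler decides per-user whether to stream the bytes.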
S3 presigned URLs are being created for items that do not exist. Is this normal behavior? I would rather find out that the item doesn't exist when creating the link than send users to an error page. Obviously I can check whether the item exists before creating the link, but I'm wondering if I'm doing something wrong.
Yes, this is normal behavior. Generating a pre-signed URL is simply a local calculation that signs the URL; it involves no interaction with the S3 service at all.
If you want to ensure that an object exists before you generate a pre-signed URL for it, issue a HeadObject request for it first.
Note: you can use pre-signed URLs to upload new objects, which obviously don't yet exist at the time you generate the URL. You might also want to use pre-signed URLs to download objects that don't yet exist, but will at some later date (though I admit this is probably not that common a use case).
I have a website forum where users exchange photos and text with one another on the home page. The home page shows the 20 latest objects, be they photos or text; the 21st object is pushed out of view. A new photo is uploaded every 5 seconds and a new text string is posted every second, so within about 20 seconds a photo that appeared at the top has disappeared off the bottom.
My question is: would I get a performance improvement if I introduced a CDN in the mix?
Since the content is constantly changing, it seems I shouldn't use one. However, when I think about it logically, it does seem I'd get a performance improvement from putting my photos behind a CDN. Here's how. Imagine a photo is posted, appearing on the page at t=1 and remaining there until t=20. The first person to access the page (close to t=1) causes the photo to be pulled to an edge server. Thereafter, anyone accessing the photo receives it from the CDN, and this lasts until t=20, after which the photo disappears. That is a veritable performance boost.
Can anyone comment on what are the flaws in my reasoning, and/or what am I failing to consider? Also would be good to know what alternative performance optimizations I can make for a website like mine. Thanks in advance.
You've got it right. As long as someone accesses the photo within the 20 seconds that the image is within view it will be pulled to an edge server. Then upon subsequent requests, other visitors will receive a cached response from the nearest edge server.
As long as you're using the CDN for delivering just your static assets, there should be no issues with your setup.
Additionally, you may want to look into a CDN which supports HTTP/2. This will provide you with improved performance. Check out cdncomparison.com for a comparison between popular CDN providers.
You need to consider all requests hitting your server, which includes the primary dynamically generated HTML document, but also all static assets like CSS files, JavaScript files and, yes, image files (both static and user-uploaded content). An HTML document references several other assets, each of which must be downloaded separately and thus incurs a server hit. Assuming for the sake of argument that each visitor has an empty local cache, a single page load may incur, say, ~50 resource hits on your server.
Probably the only request which actually needs to be handled by your server is the dynamically generated HTML document, if it's specific to the user (because they're logged in). All other 49 resource requests are identical for all visitors and can easily be shunted off to a CDN. Those will just hit your server once [per region], and then be cached by the CDN and rarely bother your server again. You can even have the CDN cache public HTML documents, e.g. for non-logged in users, you can let the CDN cache HTML documents for ~5 seconds, depending on how up-to-date you want your site to appear; so the CDN can handle an entire browsing session without hitting your server at all.
If you have roughly one new upload per second, there are likely an order of magnitude more passive visitors per second. If you can let a CDN handle ~99% of requests, that's a dramatic reduction in actual hits to your server. If you are clever about what you cache and for how long, and depending on your particular mix of anonymous and authenticated users, you can easily reduce server load by an order of magnitude or two. At the same time, you're speeding up page load times accordingly for your visitors.
For every single HTML document and other asset, really think whether this can be cached and for how long:
For HTML documents, is the user logged in? If no, and there's no other specific cookie tracking or similar things going on, then the asset is static and public for all intents and purposes and can be cached. Decide on a maximum age for the document and let the CDN cache it. Even caching it for just a second makes a giant difference when you get 1000 hits per second.
If the user is logged in, set Cache-Control to private, but still let the visitor's browser cache the document for a few seconds. These headers must be decided upon by your forum software while it's generating the document.
For all other assets which aren't access restricted: let the CDN cache it for a long time and you can practically forget about ever having to serve those particular files ever again. These headers can be statically configured for entire directories in the web server.
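The decision rules above can be sketched as a small helper. The concrete max-age values here are illustrative assumptions, not recommendations; tune them to how fresh the site must look:

```python
def cache_control_for(asset_kind: str, logged_in: bool) -> str:
    """Pick a Cache-Control value per the rules above (values are examples)."""
    if asset_kind == "html":
        if logged_in:
            # Per-user page: only the browser may cache it, and only briefly.
            return "private, max-age=5"
        # Public page: let the CDN absorb hits for a few seconds
        # (s-maxage applies to shared caches like a CDN).
        return "public, s-maxage=5, max-age=0"
    # Static, non-restricted assets never change under the same URL.
    return "public, max-age=31536000, immutable"
```

Even the 5-second shared-cache window matters: at 1,000 requests per second, it turns a thousand origin hits into one.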
The first bit before the _ is the id of the pin... what is the HZtfjmFQ used for? I'm assuming the _c probably has something to do with size.
http://media-cache-lt0.pinterest.com/upload/33284484717557666_HZtfjmFQ_c.jpg
I'm building an image upload service in node.js and was curious what other sites do to store the image.
Final images are served from a CDN, as is evident from the subdomain in the URL. The first bit, as you pointed out, is the id of the image; the second bit is a UID that works around cache limitations when an image has multiple versions; and the last bit is the image size.
A limitation of CDNs is the inability to process the image after upload. To get around this, my service uploads the files to my Node.js server, where I then serve the image back to the client. A jQuery script lets the user crop the image and sends the crop coordinates back to the server, where I use ImageMagick to create the various sizes of the uploaded image. You can obviously eliminate the crop step and just use aspect ratios to create the needed image sizes automatically. I then upload the final images to the CDN for hosting to end users.
When a user needs to update a photo already in the CDN, the user uploads to the Node.js server, the images are processed and sized, the UID hash is updated, and the results are uploaded to the CDN. If you want to keep things clean (and cut CDN costs) you can delete the old "version" at this step as well. In my service, however, I give the option to backtrack to an older version if needed.
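A sketch of that naming scheme, assuming the `<id>_<version>_<size>.jpg` pattern visible in the URL above (the size codes here are made up for illustration):

```python
import uuid

def cdn_filename(image_id: str, version_uid: str, size: str) -> str:
    # Mirrors the <id>_<version>_<size>.jpg pattern in the example URL.
    return f"{image_id}_{version_uid}_{size}.jpg"

def bump_version(image_id: str, sizes=("t", "c", "o")) -> dict:
    # A new UID per update means every size gets a URL the CDN has
    # never cached, so no invalidation is required.
    uid = uuid.uuid4().hex[:8]
    return {s: cdn_filename(image_id, uid, s) for s in sizes}
```

Keeping the image id stable while only the version segment changes is what makes backtracking to older versions possible: the database just remembers which UID is current.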
The rest of the CRUD actions are pretty self-explanatory. You can read a list of available images from the CDN using the ID (my service has a user id as well as an image id to allow more robust queries), and deleting is as simple as identifying the image you want to delete.
We're using Azure CDN, but we've stumbled upon a problem. Previously, content was never updated after creation; but we've added the option for users to crop their picture, which changes the thumbnails. The image is not created as a new blob; instead we just update the existing blob's stream.
There doesn't seem to be any method to clear the cache, update any headers or anything else.
Is the only answer here to make a new blob and delete the old?
Thanks.
The CDN will keep serving the cached content until the cache expiry passes or the file name changes.
CDN is best for static content with a high cache hit ratio.
Using a CDN for dynamic content is not recommended, because it makes the user wait for a double hop: from storage to the CDN, and from the CDN to the user.
You also pay twice the bandwidth on the initial load.
I guess the only workaround right now is to pass a dummy query parameter in the request from the client, to force the file to be fetched fresh every time.
http://resourceurl?dummy=dummyval
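For example, a hypothetical client-side helper (shown in Python) that appends a timestamp. Note this only works if the CDN is configured to include query strings in its cache key; otherwise the parameter is ignored and the cached copy is served anyway:

```python
import time

def busted_url(base_url: str) -> str:
    # A changing query value yields a URL the CDN hasn't cached yet,
    # provided query strings are part of the CDN's cache key.
    return f"{base_url}?dummy={int(time.time())}"
```

The trade-off is that this defeats caching entirely for that resource, so every request travels the full double hop; renaming the blob on each update keeps caching intact.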