my company has recently started to get problems with the image handling for our websites.
We have several websites (adult entertainment) that display images like dvd covers, snapshots and similar. We have about 100'000 movies and for each movie we have an average of 30 snapshots + covers. Almost every image has an additional version with blurring and overlay for non-members, this results in about 50 images per movie or a total of 5 million base images. Each of the images is available in several versions, depending on where it's placed on the page (thumbnail, original, small preview, not-so-small preview, small image in the top-list, etc.) which results in more images than i cared to count.
Now i had the idea to use a server to generate the images on-the-fly since it became quite clumsy to generate all the different images for all the different pages (as different pages sometimes even need different image sizes for basically the same task).
Does anyone know of an image processing server that can scale down images on-the-fly so we only need to provide the original images and the web guys can just request whatever size they need?
Requirements:
Very High performance (Several thousand users per day)
On-the-fly blurring and overlay creation
On-the-fly resize (with and without keeping aspect ratio)
Can handle millions of images
Must be able to read JPG, GIF, PNG and BMP and convert between them
Security is not that much of a concern as i.e. the unblurred images can already be reached by URL manipulation and more security would be nice but it's not required and frankly i stopped caring (After failing to get into my coworkers heads why (for our small reseller page) it's a bad idea to use http://example.com/view_image.php?filename=/data/images/01020304.jpg to display the images).
We tried PHP scripts to do this but the performance was too slow for this many users.
Thanks in advance for any suggestions you have.
I suggest you set up a dedicated web server to handle image resize and serve the final result. I have done something similar, although on a much smaller scale. It basically eliminates the process of checking for the cache.
It works like this:
you request the image appending the required size to the filename like http://imageserver/someimage.150x120.jpg
if the image exists, it will be returned with no other processing (this is the main point, the cache check is implicit)
if the image does not exist, handle the 404 not found via .htaccess and reroute the request to the script that generates the image of the required size
in the script specify the list of allowed sizes to avoid attacks like scripts requesting every possible size to shut your server down
keep this on a cookieless domain to minimize unnecessary traffic
EDIT: I don't think that PHP itself would slow the process much, as PHP scripting in this case is reduced to a minimum: the image scaling is done by a builtin library written in C. Whatever you do you'll have to use a library like this (GD or libmagick or so) so that's unavoidable. With my system at least you totally skip the overhead of checking the cache, thus further reducing PHP interaction. You can implement this on your existing server, so I guess it's a solution well suited for your budget.
Based on
We tried PHP scripts to do this but the performance was too slow for this many users.
I'm going to assume you weren't caching the results. I'd recommend caching the resulting images for a day or two (i.e. have your script check to see if the thumbnail has already been generated, if so use it, if it hasn't generate it on the fly).
This would improve performance dramatically as I'd imagine the main/start page probably has a lot more hits than random video X, thus when viewing the main page no images have to be created as they're cached. When User Y views Movie X, they won't notice the delay as much since it just has to generate that one page.
For the "On-the-fly resize" aspect - how important is bandwidth to you? I'd want to assume you're going through so much with movies that a few extra kb in images per request wouldn't do too much harm. If that's the case, you could just use larger images and set the width and height and let the browser do the scaling for you.
The ImageCache and Image Exact Sizes solutions from the Drupal community might do this, and like most solutions OSS use the libraries from ImageMagik
There are some AMI images for Amazons EC2 service to do image scaling. It used Amazon S3 for image storage, original and scales, and could feed them through to Amazons CDN service (Cloud Front). Check on EC2 site for what's available
Another option is Google. Google docs now supports all file types, so you can load the images up to a Google docs folder, and share the folder for public access. The URL's are kind of long e.g.
http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I
Add the =s paramter to scale the image, cool! e.g. for 200 pixels wide
http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I=s200
Google only charge USD5/year for 20GB. There is a full API for uploading docs etc
Other answers on SO
How best to resize images off-server
Ok first problem is that resizing an image with any language takes a little processing time. So how do you support thousands of clients? We'll you cache it so you only have to generate the image once. The next time someone asks for that image, check to see if it has already been generated, if it has just return that. If you have multiple app servers then you'll want to cache to a central file-system to increase your cache-hit ratio and reduce the amount of space you will need.
In order to cache properly you need to use a predictable naming convention that takes into account all the different ways that you want your image displayed, i.e. use something like myimage_blurred_320x200.jpg to save a jpeg that has been blurred and resized to 300 width and 200 height, etc.
Another approach is to sit your image server behind a proxy server that way all the caching logic is done automatically for you and your images are served by a fast, native web server.
Your not going to be able to serve millions of resized images any other way. That's how Google and Bing maps do it, they pre-generate all the images they need for the world at different pre-set extents so they can provide adequate performance and be able to return pre-generated static images.
If php is too slow you should consider using the 2D graphic libraries from Java or .NET as they are very rich and can support all your requirements. To get a flavour of the Graphics API here is a method in .NET that will resize any image to the new width or height specified. If you omit a height or width, it will resize maintaining the right aspect ratio. Note Image can be a created from a JPG, GIF, PNG or BMP:
// Creates a re-sized image from the SourceFile provided that retails the same aspect ratio of the SourceImage.
// - If either the width or height dimensions is not provided then the resized image will use the
// proportion of the provided dimension to calculate the missing one.
// - If both the width and height are provided then the resized image will have the dimensions provided
// with the sides of the excess portions clipped from the center of the image.
public static Image ResizeImage(Image sourceImage, int? newWidth, int? newHeight)
{
bool doNotScale = newWidth == null || newHeight == null; ;
if (newWidth == null)
{
newWidth = (int)(sourceImage.Width * ((float)newHeight / sourceImage.Height));
}
else if (newHeight == null)
{
newHeight = (int)(sourceImage.Height * ((float)newWidth) / sourceImage.Width);
}
var targetImage = new Bitmap(newWidth.Value, newHeight.Value);
Rectangle srcRect;
var desRect = new Rectangle(0, 0, newWidth.Value, newHeight.Value);
if (doNotScale)
{
srcRect = new Rectangle(0, 0, sourceImage.Width, sourceImage.Height);
}
else
{
if (sourceImage.Height > sourceImage.Width)
{
// clip the height
int delta = sourceImage.Height - sourceImage.Width;
srcRect = new Rectangle(0, delta / 2, sourceImage.Width, sourceImage.Width);
}
else
{
// clip the width
int delta = sourceImage.Width - sourceImage.Height;
srcRect = new Rectangle(delta / 2, 0, sourceImage.Height, sourceImage.Height);
}
}
using (var g = Graphics.FromImage(targetImage))
{
g.SmoothingMode = SmoothingMode.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.DrawImage(sourceImage, desRect, srcRect, GraphicsUnit.Pixel);
}
return targetImage;
}
In the time that this question has been asked, a few companies have sprung up to deal with this exact issue. It is not an issue that's isolated to you or your company. Many companies reach the point where they need to look for a more permanent solution for their image processing needs.
Services like imgix serve as a proxy and CDN for image operations like resizing and applying overlays. By manipulating the URL, you can apply different transformations to each image. imgix serves billions of requests per day.
You can also stand up services on your own and put them behind a CDN. Open source projects like imageproxy are good for this. This puts the burden of maintenance on your operations team.
(Disclaimer: I work for imgix.)
What you are looking for is best matched by Thumbor http://thumbor.readthedocs.org/en/latest/index.html , which is open source, backed by a huge company (means it will not disappear tomorrow), and ships with a lot of nice features like detecting what is important on an image when cropping.
For low-cost plus CDN I'd suggest to combine it with Cloudfront and AWS storage, or a comparable solution with a free CDN like Cloudflare. These might not be the best performing CDN providers, but at least still perform better than one server and also offload your image server on the cheap. Plus, it will save you a TON of bandwidth cost.
If each different image is uniquely identifiable by a single URL then I'd simply use a CDN such as AKAMAI. Let your PHP script do the job and let AKAMAI handle the load.
Since this kind of business doesn't usually have budget problems, that'd be the only place I'd look at.
Edit: that works only if you do find a CDN that will serve this kind of content for you.
This exact same problem is now being solved by image resize services dedicated to this task. They provide following features:
In built CDN - you need not worry about image distribution
Image resize on the fly - any size needed is available
No storage needed - you just store base image and all variants are handled by service
Ecosystem libraries - you can just include javascript and your job is done for all devices and all browsers.
One such service is Gumlet. You can also try some open source alternative like nginx plugin which can also resize image on the fly.
(I work for Gumlet.)
Related
I have a website that is going to host thousands of images. Not only that, but each image has different size depending on where you are in the webpage - on the list page, the image is shown as 350x200 rectangle, on the sidebar pictures are 100x100 etc.
So when a user uploads an image to the website, I keep the original and make 4 resized copies for each size. So if 100 users upload an image, the result will be 500 images. I can't event think of what will happen with the different sizes of the different mobile devices...
I started using CloudFront to optimize speed. And when a user uploads an image I upload the original and the resized copies to Amazon S3 bucket.
But if tomorrow I decide to add another size, or change an existing size I have to run a script that deletes the old sized images and to uplaod the newly sized images, which means "get the original from S3, resize it, upload the new size". That is not practical at all. Imagine when I do a complete redesign of the website and the image sizes change completely, I would have to run a script that resizes each image to the new requirements and to delete the old images.
Is there a more practical way to acheive that?
I wanted to acheive the following scenario:
When the user uploads an image, I resize it and upload it to S3
CloudFront will take the images which are directly resized
When I add another size, I want CloudFront to see that this size is missing on S3 and pull it from an origin from my website.
I cannot think of a way to implement this. Any help or sharing best practices will be appretiated.
Rather than doing all this yourself, you could use an image resizing service such as:
Cloudinary
Imgix
They can resize images on-the-fly so you do not have to create and store them yourself. They can also manipulate them (rotate, watermark, colorize) images on-the-fly. Videos too!
If you choose not to use such a service, you could create your own virtual resizing service. The major choice is whether to:
Resize on-the-fly (using CloudFront for caching but not requiring any storage of the resized images), or
Resize upon request and store the result for future access (less processing cost, but involves storage cost)
It is not possible to have CloudFront "see that this size is missing on S3 and pull it from an origin from my website". (You might be able to do fancy stuff with 404 pages, but it wouldn't be worth the effort.)
For anyone doing this now. It's not for everyone, but due cost-saving efforts we just switched from Imgix to using AWS's Serverless Image Handler. Works great if your images are on S3 and maybe a good alternative solution if it lacks the features of Imgix.
how can I set height and width in scaling and can I depend on the image generated (quality and professional scale generation).
how can I set height and width in scaling
You can't. Specify a maxSize for each scaling.sizes entry and Fine Uploader will proportionally scale the image.
can I depend on the image generated (quality
Quality will be limited if you rely on the browser only. There is an entire section in the documentation that explains how you can generate higher-quality resizes by integrating a third-party resize library. I also discuss why you may or may not want to do this. From the documentation:
Fine Uploader's internal image resize code delegates to the drawImage
method on the browser's native CanvasRenderingContext2D object. This
object is used to manipulate a element, which represents a
submitted image File or Blob. Most browsers use linear interpolation
when resizing images. This can lead to extreme aliasing and moire
patterns which is a deal breaker for anyone resizing images for
art/photo galleries, albums, etc. These kinds of artifacts are
impossible to remove after the fact.
If speed is most important, and precise scaled image generation is not
paramount, you should continue to use Fine Uploader's internal scaling
implementation. However, if you want to generate higher quality scaled
images for upload, you should instead use a third-party library to
resize submitted image files, such as pica or limby-resize. As of
version 5.10 of Fine Uploader, it is extremely easy to integrate such
a plug-in into this library. In fact, Fine Uploader will continue to
properly orient the submitted image file and then pass a properly
sized to the image scaling library of your choice to receive
the resized image file, along with the original full-sized image file
drawn onto a for reference. The only caveat is that, due to
issues with scaling larger images in iOS, you may need to continue to
use Fine Uploader's internal scaling algorithm for that particular OS,
as other third-party scaling libraries most likely do not contain
logic to handle this complex case. Luckily, that is easy to account
for as well.
If you'd like to, for example, use pica to generate higher-quality
scaled images, simply pull pica into your project, and contribute a
scaling.customResizer function, like so:
scaling: {
customResizer: !qq.ios() && function(resizeInfo) {
return new Promise(function(resolve, reject) {
pica.resizeCanvas(resizeInfo.sourceCanvas, resizeInfo.targetCanvas, {}, resolve)
})
},
...
}
I am building a web-site where a user will upload an image and then the image has to ultimately be stored in Amazon S3. Now, given the possibility of uploading directly from browser, I considered the following options:
Resize to all three sizes in the browser and then upload to S3 directly - this works but the problem is that I am uploading multiple images, which I doubt is a good way to this.
Upload the original image to S3, resize in node.js - I was thinking of actually uploading only original image but then using node.js to resize.
What is the best way to do 2) so that I can minimize the footprint of the service I need to deploy.
Appreciate any pointers!
Cheers
Vishal
i wouldn't do option 1 just because it's bad for user experience.
i suggest the following: only upload the original image to s3, then when a resized variant gets requested by a client, retrieve it from s3, resize, and deliver to the client through a CDN.
this beats resizing them all at once because:
maybe you want to change the sizes later on. resizing on request lets you change sizes at will. otherwise, you would have to go through every image and redo all the resizes.
the user doesn't have to wait until all the images are resized. instead, the user only waits for the image to be uploaded to s3 from the node.js server, which is fast assuming you're on aws.
resizing all at once creates a memory spike. if you're on a low memory platform like heroku, you can hit the 512mb memory limit pretty quickly and your app will swap and become ridiculously slow. better to spread out the resizing operations.
you only store the originals on S3. the resized versions are just derivatives - no need to store them.
i've written a library https://github.com/mgmtio/simgr to help me do this in my app.
G'day Vishal,
I imagine your largest images are going to be much larger than your smaller sizes, which are probably thumbnails or whatever. Because image file size more-or-less scales as the area of the image, this effect will be even more pronounced.
So any service you create which pulls the largest image back out of S3 to rescale it is likely to use more S3 bandwidth than the version which just uploads all three from the client! So it might be more sensible to just do option 1.
EDIT: If you really want to avoid uploading multiple files, I still wouldn't use the upload-directly-to-S3 option, since a) you'll have to detect new files on S3 and b) the total amount of data getting shipped around will still be greater than if you accept the image into your node app, resize it there and then save the various sizes to S3.
This question discusses image manipulation in Node.
-----Nick
i suggest using an Image CDN to Speed Up Image Delivery. Check here
Amazon S3 is Slow and Images are Heavy
Amazon S3 is known to be slow when serving web content directly. One
reason why it is slow is that a bucket is only located in one
geographical location. The location is selected when you create the
bucket. For example, if your bucket is created on Amazon servers in
California, but your users are in India, then images will still be
served from California. This geographical distance causes slow image
loading on your website.
Further, it is not uncommon to see very heavy images on S3, with large
dimensions and high byte size. One can only speculate on the reasons,
but it is probably related to the publication workflow and the
convenience of S3 as a storage space.
using image delivery like ImageEngine while keeping S3’s workflow
and convenience level.
Trust me when I say its very simple to implement...
very very simple I repeat.
Once you do the nessary settings after creating account all you need is to
Reference images using the ImageEngine domain
http://wq77sh2y.cdn.imgeng.in/path/image.jpg
OR
Reference images using the prefixing feature
http://wq77sh2y.cdn.imgeng.in/http://example.com/path/image.jpeg
this is not a question about a specific programming problem, it's about examining different concepts. If the moderators don't feel this is ok, delete my question.
I need to display 100 png images in a table td, and the images are 75x16 PNGs. In order to reduce the number of HTTP requests, I grouped all 166 images (only roughly 100 are shown at one time) in a big spritesheet, and have used the IMG tag to display the output, one image at a time. This is the code:
CSS:
.sprites {background-image:url('folder/spritesheet.png');background-color:transparent;background-repeat:no-repeat;display:inline;}
#png3 {height:16px;width:75px;background-position:0px 0;}
#png5 {height:16px;width:75px;background-position:-75px 0;}
PHP:
$classy = "png" . $db_field['imageid'];
echo "<td>" , "<img src='transparent.gif' class='sprites' id='$classy' alt='alt' align='absmiddle'/>" , "</td>";
$classy is a variable which is calling the correct image based on the SQL query output. transparent.gif is a 1px transparent gif. And this is the result, images are shown correctly inside a table:
The page loading speed increased significantly, maybe 50-60%. In one of my earlier questions some concerns were raised over this being good practice or not. But it works.
The only other solution I've found is using jar compression, but that concept works for Firefox only. This is the code which is used for displaying these same images using jar compression (PHP, no CSS):
$logo = "jar:http://www.website.com/logos.jar!/" . $db_field['imageid'] . ".png";
echo "<tr>" , "<td>" , "<img src='$logo' alt='alt' width='75' height='16'/>";
All of the 166 images are compressed in a jar archive and uploaded to the server, and as jar is a non-solid archive, only the called image is extracted, not all of them. This solution is lighting fast and I've never seen a faster way of displaying that many images. The concept is here and deserves a link. Another advantage over CSS sprites is that with jar each image can be individually optimized for size (e.g one is optimized to 64 colors, another one to 128, 32...depending on the image) and a large spritesheet can not be optimized as it contains a lot of colors.
So, does anyone know of a solution which would be equally fast as jar? If using CSS sprites to display content is bad practice - what is good practice which gives the same result? The key here is the loading speed of the website with as few HTTP requests as possible.
Not really an expert on this but I had my share in these thing also
HTTP Requests
Ever heard of the "2 concurrent connections" (recent browsers have around 6-8). Loading a lot of stuff means if 2 are loading at the same time, the others have to wait in line. Loading it in one big chunk is better. This is the main reason why spriting is used. Aside from the connection limit, you manage those "same purpose" images in one image.
Cache
Now, one big chunk I say but you might ask "Does that make it even worse?". Nope, becasue I have an ace up my sleeve and that's where "cache" comes in to play. One page load is all you need, and then, poof! The rest of the pages that need that image are like the speed of light and Saves you from another HTTP request. Never underestimate the power of the cache.
Images
Other things you can do is to optimize your images. I have used Fireworks and I really love it's image optimization tools. To optimize, here are personal guidelines i follow which you can use in your situation:
GIFs for icons, JPGs for images, PNGs for transparent stuff.
remove unused colors. yes, you can do this in some tools. cuts down size
never resize images in the html. have resized versions instead.
lose quality. yes, there is such thing. lower your image quality to reasonable limits. losing it too much makes your image too "cloudy" or "blocky"
progressive loading images. What it does is it "fast-loads" a blurred image then clears it up later.
avoid animated images. they are a bloat, not to mention annoying.
Server Tricks
There are connection limits - but that does not prevent you from using other domains or even subdomains! Distribute your content to other domains or subdomains to increase the number your connections. For images, dedicate a subdomain or two for it, like say img1.mysite.com and img2.mysite.com or another domain mysite2.com. not only is it beneficial for your user, it's beneficial to distributing server load.
Another method is using a Content Delivery Network (CDN). CDN has a global network of servers, which contain "cached" versions of your website resources. Like say i'm in Asia, when i view your site with CDN'ed resources, it finds that resource in the server nearest Asia.
Mark-up
Not necessarily related speed and semantics but the use of id should be reserved for more important purposes. If you use ID to mark images for their styles, what if there was another element that needs the same image? IDs need to be unique, they can't be used twice. So i suggest using multiple classes instead.
also, IDs take precedence over classes. to avoid unexpected overrides, use classes. learn more about CSS specificity.
.sprites {
background-image:url('folder/spritesheet.png');
background-color:transparent;
background-repeat:no-repeat;
display:inline;
height:16px; /*same width and heights? place them here instead*/
width:75px;
}
.png3 {
height:16px; /* in cases you need a different dimension, this will override */
width:75px;
background-position:0px 0;
}
.png5 {
background-position:-75px 0;
}
$classy = "png" . $db_field['imageid'];
echo <img src='transparent.gif' class='sprites {$classy}' alt='alt' align='absmiddle'/>";
I embed small images/icons in the style sheet:
.someicon{
background-image:url('....');
}
this works with all modern browsers, and does not require me to create sprites (plus it even saves one additional file to load).
In development, the images are defined in the style sheet normally like so:
.someicon{
background-image:url('../images/someicon.png');
}
and I have a system that generates the final style sheet (including consolidating all CSS into one, minifying and replacing the image reference with the data: url) automatically whenever I make a change to a style sheet.
This works well and saves me a ton of work. When compressed with gzip, the CSS file is not much bigger than the individual files added together. After optimizing the PNG/JPG files, the CSS for my start page is 63K uncompressed. Even with a slightly smaller sprite file, I would probably not save more than a fraction of a second in load time for the average user, so I do not bother with sprites.
This may well be a little of an open-ended question
The site I am working on requires to be optimised for performance. One of the key areas is to optimise the file sizes of the images used upon the site.
Unfortunatley these images are being created by employees who do not have the required knowledge for creating images for the web, and it is my job to produce a set of guidelines for them to use.
I was wondering whether there was any resource/guidlines/literature regarding typical images file sizes for images of different dimensions - as I would like to include something like this to aid them to ensure their images are being created properly.
Any info would be greatly appreciated.
Thanks in advance
I can't answer the opinion question, but I can suggest some guidelines that will keep your images smaller.
First off, if they're using Photoshop to edit their images, it's likely they're storing a whole bunch of crap in the headers (digital papertrail, EXIF data, and such). Also, folks will frequently save in too high a bit depth.
For novice users, trying to explain why they need to use "save for web" is more likely to confuse them. Instead, just point them at:
http://www.smushit.com/ysmush.it/
This site is rather handy - it will compress all the images on a page you specify, or you can upload the images.
You should strongly consider writing some guidelines about where images are stored as well. It's frequently very beneficial to have your static image content stored on several servers, apart from your dynamic content. Most browsers will only download a limited # of files at a time from any given website (usually it's 2).
Unless there's a good reason, all your images should be cached using one of the HTTP cache techniques (expires, etags, etc).
Good luck.
72 dpi as a resolution and either jpeg or png formats work best.
Try to use images at the exact pixel area size they will end up being displayed as. This is specified by the images height and width attributes.
You can set the output quality of a jpeg image which will also save file size although there is a trade off against image quality.
I hope this is of use.