AWS S3 extremely slow, how to improve performance - performance

I am experiencing extremely slow download speeds from AWS S3. I compared the download speed of the same image from AWS S3 and from the Tesla site (where I took the image from).
I noticed that AWS S3 was approximately 10 times slower. Why is this?
Try downloading the image in your browser from these two URLs and time how long each one takes to download fully; you will notice that AWS S3 is extremely slow.
AWS S3 Url: https://s3-us-west-2.amazonaws.com/hey-my-test-bucket/Red_Bay-1440.jpg
Tesla Url (where the image was taken from): https://www.tesla.com/tesla_theme/assets/img/models/v1.0/slideshow/Red_Bay-1440.jpg?20171006

Tesla.com serves its images through the Akamai CDN, whereas your S3 images are not served through any CDN.
https://www.cdnplanet.com/tools/cdnfinder/#site:https://www.tesla.com/tesla_theme/assets/img/models/v1.0/slideshow/Red_Bay-1440.jpg?20171006
As #EleazarEnrique suggests, you could use AWS CloudFront, which is basically a CDN, or some other CDN such as CloudFlare.com or one of the many others.
Both CloudFront and CloudFlare also have "free" pricing plans (with some limitations).
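To make the comparison from the question repeatable, rather than timing it by eye in a browser, the two downloads can be timed from a script. The following is only a minimal sketch using the Python standard library; the function names `time_download`, `throughput_mb_s` and `compare` are illustrative and not part of the original post.

```python
import time
import urllib.request
from typing import List, Tuple


def time_download(url: str, timeout: float = 30.0) -> Tuple[int, float]:
    """Download the whole resource and return (bytes_received, elapsed_seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        size = len(response.read())
    return size, time.perf_counter() - start


def throughput_mb_s(size_bytes: int, seconds: float) -> float:
    """Average throughput in megabytes (10**6 bytes) per second."""
    return size_bytes / 1_000_000 / seconds


def compare(urls: List[str]) -> None:
    """Print the download time and average throughput for each URL."""
    for url in urls:
        size, elapsed = time_download(url)
        print(f"{elapsed:.2f} s  {throughput_mb_s(size, elapsed):.2f} MB/s  {url}")
```

Calling `compare` with the S3 URL and the Tesla URL from the question would print one line per URL. A single run is noisy, so it is worth repeating the measurement a few times and from more than one location before drawing conclusions.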

Related

Slow download speed from Azure Storage

While testing our storage virtualization solution for a client, we found an interesting situation:
When an object is downloaded directly from Azure Blob Storage to outside of Azure, the speed (tested with wget) is around 0.2 to 0.5 MB/s.
When the same object is downloaded to a VM in Azure, the speed is around 60 MB/s.
It may seem like a network issue, but...
When the same object is downloaded via our proxy, which also runs in the same Azure region and downloads the same object from Azure Blob Storage without any caching (it also converts to S3, which is irrelevant here), the speed is 30 MB/s and is basically limited by the client's network.
I tried different regions and the results are similar.
Does Azure somehow throttle traffic going from Blob Storage to outside of Azure?
Your proxies/VMs (or whatever you run) are in the same datacenter, so network requests do not leave the local network and the speed depends only on that infrastructure (routers, firewalls, cables, etc.).
I'm sure they won't limit speed within their own infrastructure, so services hosted on the same network work at full speed.
When you download from outside the datacenter, the speed depends on the outside network path and on more factors, including your own internet download speed. Maybe they also limit upload speed on their side, but 0.5 MB/s would be very low.
EDIT: even between different regions, the two VMs benefit from being connected through fiber-optic backbones, so a high speed is normal. They may also consider that traffic to be "inside" Azure even if it is not the same datacenter, and so not limit its download speed.
Did you try downloading from a fast internet connection outside of the datacenter? For example, setting up a VM on DigitalOcean and running wget from that machine?
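To put the reported speeds in perspective, here is a small worked calculation of how long one object would take to transfer at each rate. The rates are the ones from the question; the 1000 MB object size is purely an assumption for illustration, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time in seconds to move size_mb megabytes at rate_mb_s megabytes per second."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_s


OBJECT_SIZE_MB = 1000  # ASSUMPTION: object size is not given in the question

# Rates reported in the question, in MB/s.
REPORTED_RATES = {
    "direct, outside Azure (low)": 0.2,
    "direct, outside Azure (high)": 0.5,
    "via proxy in the same region": 30.0,
    "to a VM inside Azure": 60.0,
}

for label, rate in REPORTED_RATES.items():
    minutes = transfer_seconds(OBJECT_SIZE_MB, rate) / 60
    print(f"{label}: {minutes:.1f} min")
```

Under that size assumption, the direct download takes on the order of half an hour to over an hour, while the same client going through the proxy finishes in well under a minute. That gap is what makes the client's own connection an unlikely bottleneck.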

Where to store thousands of images in S3 or EFS of AWS?

I will start a project in the not too distant future in which we will store thousands upon thousands of images over time. I'm facing a hard decision on whether to use Amazon S3 or EFS to store those images. I think both are very good options, but my question is: which would be the best service, or what would be the best practice?
My application will be built with Laravel, and I have already done the integration with both services.
The main characteristics of the project are:
About 95% of the files I will store will be photos.
Approximately 1.5k photos would be stored daily.
The photos are very large (professional cameras).
Traffic to the application will not be heavy, approx. 100 users at a time.
Each user would view about 100 photos per day.
What do you recommend?
S3 is absolutely the right answer and practice. I have built numerous applications like the one you describe, some with hundreds of millions of images, and S3 is superior. It also allows for flexibility: your API can return the images as pre-signed URLs, which reduces load on your servers; images can be linked directly via static web hosting; and lifecycle policies can archive less-used data. Additionally, further integration with other AWS services is easy using event triggers.
As for storing/uploading, S3 multipart upload is very useful for increasing both performance and reliability.
EFS would make sense for your type of scenario if you were doing some intensive processing where you had a cluster of servers that needed lower latency with a shared file system - think HPC. EFS would also come at a higher cost and doesn't provide as many extensibility options or built-in features as S3. Your scenario doesn't sound like it requires EFS.
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
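As a rough illustration of the multipart upload mentioned above, a large photo is split into byte ranges that can be uploaded independently (and retried individually if one fails). The sketch below only plans the ranges and performs no upload; `plan_parts` is a hypothetical helper, and the 8 MiB default part size is an assumption. The two limits it enforces (5 MiB minimum for every part except the last, at most 10,000 parts) are the documented S3 multipart limits.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(total_size: int,
               part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each part of an upload."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size when the object is so large that the requested
    # size would need more than MAX_PARTS parts (ceiling division).
    part_size = max(part_size, -(-total_size // MAX_PARTS))
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each planned range would then be sent as one part by whatever SDK the application uses. Because the ranges do not overlap, they can be uploaded in parallel, which is where the performance gain comes from.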
For the scenario you proposed, AWS S3 is the choice. Why?
Since images are mostly added rather than modified, it costs roughly 1/10th of EFS.
Less overhead on your web servers, since files can be uploaded to and downloaded from S3 directly.
You can leverage event-driven processing with Lambda, e.g. generating thumbnails or applying image-processing filters via an S3 Lambda trigger.
Higher level of SLA for availability and durability.
Built-in lifecycle management for archival and cost reduction.
AWS EFS can also be an option if the images are modified frequently (in which case EBS is also an option).
You can also consider using AWS CloudFront with either option to cache images.
Note: In the end, it's not about using a single service. Based on your upcoming requirements you can choose either one of them or both.
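For a sense of scale, the figures from the question can be turned into rough yearly storage growth and daily download volume. This is a back-of-the-envelope sketch only. The 20 MB average photo size is an assumption (the question only says the photos are "very large"), and treating the "100 users at a time" as 100 users per day gives a lower bound.

```python
PHOTOS_PER_DAY = 1_500        # from the question
USERS_PER_DAY = 100           # ASSUMPTION: the question says "100 users at a time"
VIEWS_PER_USER_PER_DAY = 100  # from the question
AVG_PHOTO_MB = 20             # ASSUMPTION: not stated in the question


def yearly_storage_growth_tb(photos_per_day: int, avg_photo_mb: float) -> float:
    """Storage added per year, in terabytes (10**6 MB)."""
    return photos_per_day * 365 * avg_photo_mb / 1_000_000


def daily_egress_gb(users: int, views_per_user: int, avg_photo_mb: float) -> float:
    """Data served per day in gigabytes if every view downloads the full photo."""
    return users * views_per_user * avg_photo_mb / 1_000


print(f"{yearly_storage_growth_tb(PHOTOS_PER_DAY, AVG_PHOTO_MB):.2f} TB/year of new photos")
print(f"{daily_egress_gb(USERS_PER_DAY, VIEWS_PER_USER_PER_DAY, AVG_PHOTO_MB):.0f} GB/day served")
```

Under these assumptions the archive grows by roughly 11 TB per year, so storage cost and lifecycle policies matter more than request throughput. Serving resized copies instead of the originals would cut the daily download volume considerably.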

Redis for caching image files?

I am using Amazon S3 for storing and retrieving images for an image-hosting website.
The trouble is that multiple users have to retrieve the same image multiple times.
Is it advisable to use Redis or memcached to cache image files by storing them directly in it?
Amazon S3 pricing for data transfer is much higher than serving images via a Redis cache. But storing image files directly in Redis seems to be a bad proposition, because I read somewhere that Redis is not good for operating on large data files. Also, I don't understand how Redis, which stores data in memory, could hold so many images (unless I run a great many instances).
Is it advisable to store image files directly in Redis, or is there an alternative for caching these images?
Do Pinterest and Imgur use Redis and memcached for storing images directly? If not, why do they have so many instances?
You get credit for creativity, but you have not found a loophole, here.
First, it's entirely inappropriate to try to serve images from ElastiCache. It's a cache. It's volatile by definition.
Second, it's not a web server.
Third, it's not intended to be exposed to the Internet.
But even if these aren't persuasive, your question seems premised on a misunderstanding of the pricing structure on several levels.
There is no Amazon ElastiCache Data Transfer charge for traffic in or out of the Amazon ElastiCache Node itself.
https://aws.amazon.com/elasticache/pricing/
Technically, this is accurate, but it is not helpful.
This is only relevant to the transfer from ElastiCache to your EC2 instance. You still have to return the data to the browser, across the Internet, and this costs the same whether you return it from/through EC2 or from S3.
Data Transfer OUT From Amazon EC2 To Internet
Up to 10 TB / month $0.09 per GB
https://aws.amazon.com/ec2/pricing/
...or...
Data Transfer OUT From Amazon S3 To Internet
Up to 10 TB / month $0.090 per GB
https://aws.amazon.com/s3/pricing/
Meanwhile, CloudFront is $0.085/GB for traffic sent to browsers that are accessing edge locations in the lowest price class, US and Europe. And you control which edge locations are available when you select a price class other than the global one:
If you choose a price class that does not include all edge locations ... you're charged the rate for the least expensive region in your selected price class.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
That's $0.085 if configured correctly.
There is no charge for transfer from S3 to CloudFront or from EC2 to CloudFront. Only the charge from CloudFront to the Internet.
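The comparison above can be written out as a small calculation. The per-GB rates are the ones quoted in this answer (first 10 TB per month) and may have changed since; `monthly_cost` is a hypothetical helper that ignores higher-volume tiers and per-request charges.

```python
# Rates quoted in the answer, in USD per GB, for the first 10 TB per month.
EC2_OR_S3_TO_INTERNET = 0.09
CLOUDFRONT_LOWEST_PRICE_CLASS = 0.085


def monthly_cost(gb_out: float, rate_per_gb: float) -> float:
    """Data transfer cost in USD for gb_out gigabytes at a flat per-GB rate."""
    if gb_out < 0:
        raise ValueError("gb_out must be non-negative")
    return gb_out * rate_per_gb


def cloudfront_saving(gb_out: float) -> float:
    """USD saved per month by serving through CloudFront instead of EC2/S3."""
    return (monthly_cost(gb_out, EC2_OR_S3_TO_INTERNET)
            - monthly_cost(gb_out, CLOUDFRONT_LOWEST_PRICE_CLASS))


for gb in (100, 1_000, 5_000):
    print(f"{gb} GB: direct ${monthly_cost(gb, EC2_OR_S3_TO_INTERNET):.2f}, "
          f"CloudFront ${monthly_cost(gb, CLOUDFRONT_LOWEST_PRICE_CLASS):.2f}")
```

Putting a Redis cache on EC2 in front of S3 does not change the first figure, because the bytes still leave AWS at the EC2-to-Internet rate. Only the CloudFront path is cheaper per GB at these rates.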

Caching for I/O intensive S3?

I am writing a video messaging service and the videos will be stored on Amazon S3. The nature of video messaging involves a lot of writing to and reading from S3: basically, as soon as a video is written, it will be read by another client. I am worried that S3 cannot keep up with the speed and will delay message delivery. I already have a CloudFront CDN + S3 setup. Is CloudFront enough to serve as a cache, or do I need to set up some sort of memcached layer on top of S3?
CloudFront + S3 should be enough, but do test your assumptions, use multipart upload and measure it all, as this guy did: http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/
At the top, I was pushing more than one gigabyte of data to S3 every second - 1117,9 megs/sec to be precise. That is an awful lot of data, all coming from a single machine. Now imagine you scale this out over multiple machines, and you have the network infrastructure to support it - you can really send a lot of data.

A better solution to host static files besides Amazon S3

I made a mobile application in static HTML that mirrors my WordPress site.
The first version was completely static; all texts were in the mobile HTML application.
Today, I updated my application to pull data from WordPress with AJAX.
The problem is that now, with so many requests being made, the S3 bucket can no longer keep up.
Despite the size having decreased from 6kb to 83kb, it is still slower because of AJAX.
Is it possible to host static applications on some other Amazon service?
For the static content, you should probably be looking at AWS CloudFront instead of S3. As per the page itself:
Amazon CloudFront is a content delivery web service. It integrates with other Amazon Web Services products to give developers and businesses an easy way to distribute content to end users with low latency, high data transfer speeds, and no minimum usage commitments.
Another thing you can leverage is AJAX caching, which will make your pages load much faster on subsequent visits. You may also want to use nginx on your server for caching (this will reduce your server load).
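The idea behind caching AJAX responses is to reuse a recent response instead of repeating the request. In a browser this is normally handled by HTTP cache headers or by the AJAX library itself; the sketch below only illustrates the principle as a small time-to-live cache in Python. `TTLCache` and its `fetch` argument are hypothetical names, not an API from WordPress, nginx, or any AJAX library.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep each response for ttl_seconds, then fetch it again on next use."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return the cached body for url, calling fetch(url) only when needed."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no request is made
        body = fetch(url)
        self._entries[url] = (now, body)
        return body
```

With a TTL of, say, 60 seconds, repeated requests for the same WordPress data within a minute would be answered from memory, and only the first one would reach the server.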

Resources