Adding edge caching for GCE

I'm trying to add some caching for our Compute Engine server using cache-control and max-age, but I don't see any caching happening.
From the description here - http://www.youtube.com/watch?v=uFbfULXoXn8&feature=share&t=2m1s it should work seamlessly.
Looks like only GCS and GAE are supported. Or am I missing something?
It would be really, really great to have support for edge caching in GCE as well.

As noted in the comments, GCE does not provide a CDN or edge-caching solution with its layer 3/layer 4 load balancer. Using cache-control and max-age will still allow other intermediate proxies (e.g. ISP or mobile network caches) to cache your content.
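To illustrate the header side of this, here's a minimal sketch, assuming a Flask app on the GCE instance (the route and payload are hypothetical), of setting cache-control and max-age so that any intermediate proxy is allowed to cache the response:

```python
# Minimal sketch, assuming Flask; the route and payload are hypothetical.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/report")
def report():
    resp = make_response("cacheable payload")
    # "public" lets shared caches (ISP/mobile proxies) store the response;
    # max-age is the freshness lifetime in seconds (here 1 hour).
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```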
At the moment, the best solution I can think of if you need edge caching is to hire one of the companies that specialize in this, such as Akamai. You can then point Akamai at your load-balanced GCE IP address as the content source, the same way you would with an on-premises server.
Sorry I can't point you to a more-integrated solution right now; since this is supported on GAE, I have to imagine that the GCE networking team is aware of the interest.

Related

CloudFront TTL: Setting max TTL to 0 to just get DDoS protection benefits

I've been reading the Cloudfront docs and I want to make sure that my plan is reasonable. I have a backend API structured as an EC2 HTTP server with frequently updating content (several changes per second). This is my understanding:
I shouldn't expose this HTTP server directly to clients, because that makes the EC2 server vulnerable to DDoS attacks
Creating a layer of indirection with CloudFront edge locations helps defend against DDoS because AWS can deploy a firewall at the outer edge of its network rather than right around my EC2 instance
By setting Maximum TTL = 0, I ensure that CloudFront is merely an indirection layer and doesn't try to do any actual caching, so that users always get up-to-date information.
Are these assumptions correct / does my plan sound reasonable? From reading online, it seems this is a nonstandard use of CloudFront.
This is a perfectly reasonable plan.
It isn't the primary use case for which AWS markets CloudFront (as a CDN), but one can hardly argue that the practice isn't within the design scope of the product.
Amazon CloudFront accepts expiration periods as short as 0 seconds (in which case Amazon CloudFront will revalidate each viewer request with the origin). Amazon CloudFront also honors special cache control directives such as private, no-store, etc.; these are often useful when delivering dynamic content that may not be cached at the edge.
https://aws.amazon.com/cloudfront/dynamic-content/
Of course, there's some level of traffic that will still be enough to overload your server, but, yes, this is a solid strategy.
Under the hood, API Gateway Edge-Optimized endpoints and the S3 Transfer Acceleration feature both use CloudFront with caching completely disabled. In both cases, you can't see CloudFront distributions in your console that correlate to these services, but this is how they work.
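For concreteness, here's a hedged sketch (my own, not from the services above) of the cache-behavior fragment you'd splice into a full DistributionConfig for boto3's create_distribution call; the origin id is a placeholder, and the other required fields (origins, caller reference, etc.) are left out:

```python
# Hedged sketch: the DefaultCacheBehavior fragment of a CloudFront
# DistributionConfig that turns the edge into pure indirection.
# "my-ec2-origin" is a placeholder for your actual origin id.
default_cache_behavior = {
    "TargetOriginId": "my-ec2-origin",
    "ViewerProtocolPolicy": "redirect-to-https",
    "MinTTL": 0,
    "DefaultTTL": 0,
    "MaxTTL": 0,  # every request is revalidated with the origin
    # Legacy forwarding settings: pass everything through so responses
    # are treated as per-request dynamic content, never served stale.
    "ForwardedValues": {
        "QueryString": True,
        "Cookies": {"Forward": "all"},
        "Headers": {"Quantity": 1, "Items": ["*"]},
    },
}
```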

Caching proxy solutions for internet bound HTTPS traffic

Sorry if this is inappropriate for SO, but I wasn't sure where best to ask this question!
Background:
Running applications on EC2 Container Service (ECS) inside an AWS VPC.
There's the potential to move the functions making the requests to Lambda functions in the near future (3-6 months).
What I'm functionally looking to achieve:
Cache responses from HTTPS traffic to specific URL patterns (eg. subdomain.example.com) for specified periods (eg. 7 days).
We're hitting API limits for free/paid services and are looking to inject a layer that handles duplicate requests transparently; unfortunately this isn't easy to handle at the application layer.
Have this applied at a VPC level (e.g. the InternetGateway?) or the ECS service level - not too fussed which one.
Ideally this would be transparent to the application itself, but I'm guessing the fact that it's HTTPS traffic may throw a spanner in the works. I was initially thinking this might be possible at the InternetGateway level, but I assume that doesn't have easy access to request headers.
Potential solutions:
Squid? (https://aws.amazon.com/articles/6463473546098546)
Linkerd? - Seems like something should be possible with this (we also don't have a unified service discovery approach at the moment so this might kill 2 birds with 1 stone).
Any suggestions would be greatly appreciated!
Alex
PS. As you can probably tell I'm a little out of my depth in this one, sorry if I'm mixing patterns/solutions!
If I understand your question correctly, you want to cache certain responses to the requests you make to paid/free third-party APIs. I'm wondering whether you're looking for a solution that works inside your VPC, or if it's fine for the solution to sit outside.
If you're OK with a solution running outside of your VPC, CloudFront might be worth looking into. CloudFront can act as a caching layer for any content from any origin, even if the origin connection is using HTTPS. It is even possible to use signed URLs or signed cookies with CloudFront to restrict unwanted access, if that's what you're going for.
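As a hedged sketch of that idea (my own, using boto3's DistributionConfig shapes; the id is a placeholder), the origin side would point CloudFront at the third-party API over TLS, and the 7-day window from the question would go into the cache behavior's TTLs:

```python
# Hedged sketch: CloudFront origin fragment for caching a third-party
# HTTPS API (subdomain.example.com from the question above).
origins = {
    "Quantity": 1,
    "Items": [{
        "Id": "third-party-api",  # placeholder id
        "DomainName": "subdomain.example.com",
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "https-only",  # keep origin traffic on TLS
        },
    }],
}

# The 7-day window from the question goes into the cache behavior's TTLs:
seven_days = 7 * 24 * 60 * 60  # DefaultTTL/MaxTTL, in seconds
```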

How to prevent being affected by data-center DDoS attacks & maintenance-related downtime?

I'm hosting a web application which should be highly available. I'm hosting on multiple Linodes and using a NodeBalancer to distribute the traffic. My question might be stupidly simple, but not long ago I was affected by a DDoS hitting the datacenter. That made me think about how I can be better prepared the next time this happens.
The NodeBalancer and servers are all in the same datacenter, which should, of course, be fixed. But how does one go about doing this? If I have two load balancers in two different datacenters, how can I set up the domain to point to both, but ignore the one affected by the DDoS? Should I look into the DNS manager? Am I making things too complicated?
Really would appreciate some insights.
Thanks everyone...
You have to look at ways to load balance across datacenters. There are a few ways to do this, each with pros and cons.
If you have a lot of DB calls, running two datacenters hot can introduce a lot of latency problems. What I would do is as follows.
Have the second datacenter (DC2) be a warm location. It is configured for everything to work and is constantly getting data from the master DB in DC1, but isn't actively getting traffic.
Use a service like CloudFlare for their extremely fast DNS switching. Have a service in DC2 that constantly pings the load balancer in DC1 to make sure that everything is up and well. When it has trouble contacting DC1, it can connect to CloudFlare via the API and switch the main 'A' record to point to DC2, which then picks up the traffic.
I forget what CloudFlare calls it, but it has a DNS feature that allows you to switch 'A' records almost instantly because the actual IP address given to the public is their own; they just route the traffic for you.
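A minimal sketch of that watcher, assuming the Cloudflare v4 DNS API and the requests library; the zone/record ids, token, health URL, and DC2 IP are all placeholders:

```python
# Hedged sketch of the failover watcher described above: ping DC1's load
# balancer and, on failure, point the 'A' record at DC2 via Cloudflare.
import time
import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = "your-zone-id"        # placeholder
RECORD_ID = "your-record-id"    # placeholder
TOKEN = "your-api-token"        # placeholder
DC1_HEALTH_URL = "https://lb.dc1.example.com/health"  # placeholder
DC2_IP = "203.0.113.10"         # warm standby's public IP (placeholder)

def dc1_is_up() -> bool:
    """Ping DC1's load balancer; any error or non-200 counts as down."""
    try:
        return requests.get(DC1_HEALTH_URL, timeout=5).status_code == 200
    except requests.RequestException:
        return False

def fail_over_to_dc2() -> None:
    """Switch the main 'A' record to DC2 via the Cloudflare v4 API."""
    resp = requests.put(
        f"{CF_API}/zones/{ZONE_ID}/dns_records/{RECORD_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"type": "A", "name": "example.com", "content": DC2_IP},
        timeout=10,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    while True:
        if not dc1_is_up():
            fail_over_to_dc2()
            break
        time.sleep(30)  # poll every 30 seconds
```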
Amazon also has a similar feature, with Route 53 DNS failover I believe.
This plan is costly, however, as you're running much more infrastructure that rarely gets used. Linode is rolling out more network improvements, so hopefully this becomes less necessary.
For more advanced load balancing and HA, you can go with more "cloud" providers but it does come at a cost.
-Ricardo
Developer Evangelist, CircleCI, formerly Linode

Would you use Amazon CloudFront as a cache for a website?

I have been using Amazon CloudFront for a while now as a cache and edge location for my CSS, JS, and image files. I am now thinking about using it to host all of my static HTML files as well. In essence, www.example.com and example.com will be hosted via CloudFront, and I will use a separate Tomcat server at my.example.com for all the dynamic stuff.
Any feedback about this? Suggestions?
Thanks,
Assaf
This is exactly what CloudFront is designed for. I think you will find this approach is typical of many high traffic web sites.
The only downside is added cost...
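As a hedged sketch (not from the answer), the alias fragment of the distribution's config for serving both static hostnames would look roughly like this in boto3's DistributionConfig shape, with dynamic traffic left to the separate Tomcat server at my.example.com:

```python
# Hedged sketch: CNAME aliases on the CloudFront distribution so both
# hostnames from the question are served from the edge.
aliases = {
    "Quantity": 2,
    "Items": ["example.com", "www.example.com"],
}
# Note: you'd also need a matching viewer certificate and DNS records
# pointing both names at the distribution's domain name.
```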
I used CloudFront for some time, but recently switched to Google Page Speed Service. It is a little light on features currently, but it deals with edge locations and all the tricks required to speed up your page.
It is currently in beta, but I've had no problems over the two months I've been using it. The only question is how much it'll cost when it leaves beta.

Should I host Website and REST API on the same server or split?

I have a web application that consists of Website and REST API. Should I host them on the same server or should I host them on different servers? By "server" I mean a server cluster - several servers behind load balancer.
The API is mostly inbound traffic; the website is mostly outbound.
If it matters - hosted on Rackspace and/or AWS.
Here is what I see so far:
Benefits of having Website and REST API on the same server
Simple deployment
Simple scaling - something is slow - just launch another instance
Single load balancer configuration
Simple monitoring
Simple, simple, simple ...
Effective use of the full-duplex network (API: inbound, website: outbound)
Benefits of splitting
API overload will not affect website load time
Detailed monitoring (I will know which component uses resources at this moment)
Any comments?
Thank you
Alexander
Just as you stated, in most situations there are more advantages to hosting the API on the same server as the website, so I would stick with that option.
But if you predict a lot of traffic for either the website or the API, then a separate server might be better suited.
If this is on a load balancer why don't you leave the services and pages on the same site and let the load balancer/cluster do its job?
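To make that concrete, here's a hedged sketch (my own; the ARNs are placeholders) of path-based routing on an AWS Application Load Balancer via boto3, which keeps one entry point while still giving the API and the website separate target groups to scale and monitor:

```python
# Hedged sketch: route /api/* to the API's target group; everything else
# falls through to the listener's default action (the website).
import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...",  # placeholder listener ARN
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
    Actions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:...",  # placeholder
    }],
)
```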
Your list of advantages/disadvantages covers operational considerations, but you should consider application needs as well:
Caching?
Security?
Other resources, e.g. the filesystem
These may or may not apply, but if your application architecture is different between the two, be sure to factor this into your decision.
