I have Google Cloud CDN set up for a Google Cloud Storage bucket, and everything seems to be working correctly. When I check "Network services > Cloud CDN" in the GCP console, I see my load balancer with the correct backend and a "Cache hit ratio". The cache hit ratio moves around quite a lot, ranging between 0 and 90%. While I understand that it will vary, I would still like to track it.
Using Stackdriver, I can monitor the load balancer, but I can't see any metric for "hit ratio" among the load balancer's metrics.
Is there a way to see a time series metric for "cache hit ratio" for a specific load balancer using Stackdriver (or any other method)?
According to this document, cache hits are logged in the field "httpRequest.cacheHit=true". You can create a log-based metric from these logs to use in Stackdriver Monitoring.
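For example, here is a minimal sketch using the google-cloud-logging Python client to create two log-based metrics, one counting cache hits and one counting all load balancer requests (the project ID, metric names, and filters are my own illustration, not from the document):

```python
from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client(project="my-project")  # hypothetical project ID

# Counts only requests that were served from the CDN cache.
hits = client.metric(
    "cdn_cache_hits",
    filter_='resource.type="http_load_balancer" AND httpRequest.cacheHit=true',
    description="Requests served from the CDN cache",
)
# Counts every request through the load balancer, for use as a denominator.
total = client.metric(
    "cdn_requests",
    filter_='resource.type="http_load_balancer"',
    description="All load balancer requests",
)

for metric in (hits, total):
    if not metric.exists():
        metric.create()
```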
I didn't find any built-in option to display the percentage of cache hits in Stackdriver Monitoring. I am sharing this document, which describes how to combine time series using common statistics like max, mean, and standard deviation.
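Failing a built-in ratio view, one workaround is to read both log-based metrics back through the Monitoring API and compute the percentage yourself. A rough sketch, assuming the two hypothetical metrics created above:

```python
import time
from google.cloud import monitoring_v3  # pip install google-cloud-monitoring

client = monitoring_v3.MetricServiceClient()
project = "projects/my-project"  # hypothetical project ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

def count(metric_type: str) -> int:
    """Sum all points of a count metric over the last hour."""
    series = client.list_time_series(request={
        "name": project,
        "filter": f'metric.type = "{metric_type}"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    })
    return sum(point.value.int64_value for ts in series for point in ts.points)

hits = count("logging.googleapis.com/user/cdn_cache_hits")
total = count("logging.googleapis.com/user/cdn_requests")
print(f"cache hit ratio over the last hour: {100 * hits / max(total, 1):.1f}%")
```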
As part of performance testing Cloud Foundry applications, I am now focusing more on the server side (i.e. the containers where applications run) and am interested in pulling out metrics that are useful for finding bottlenecks, such as:
1) CPU consumption
2) disk usage
3) memory usage
4) logs
I searched around the internet but only ended up more confused. Can anyone suggest a framework or tool that can be used to achieve this from a Windows OS?
The proper way to get metrics & logs would be through the firehose.
https://docs.cloudfoundry.org/loggregator/architecture.html#firehose
You use a Nozzle to get the information from the firehose.
https://docs.cloudfoundry.org/loggregator/architecture.html#nozzles
If you just want to experiment and see what information is available, you can use the firehose-plugin for the cf cli.
https://github.com/cloudfoundry-community/firehose-plugin
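If you'd rather see what the plugin is doing under the hood, the firehose is just an authenticated websocket. Here is a rough Python sketch; the doppler URL and subscription ID are placeholders (find your endpoint under "doppler_logging_endpoint" in `cf curl /v2/info`):

```python
import subprocess
import websocket  # pip install websocket-client

# Placeholder endpoint; the path is /firehose/<your-subscription-id>.
DOPPLER = "wss://doppler.sys.example.com:443/firehose/my-subscription-id"

# The firehose requires a valid OAuth token; the cf CLI can provide one.
token = subprocess.check_output(["cf", "oauth-token"], text=True).strip()

ws = websocket.create_connection(DOPPLER, header={"Authorization": token})
try:
    for _ in range(10):
        frame = ws.recv()  # binary protobuf-encoded dropsonde Envelope
        # Decoding the envelopes needs the dropsonde-protocol definitions;
        # this just proves events are flowing.
        print(f"received {len(frame)} bytes")
finally:
    ws.close()
```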
Ideally, you'd end up finding or writing a nozzle to integrate with your metrics and log capturing platform. For example, there is a DataDog nozzle for sending metrics off to DataDog.
https://github.com/cloudfoundry-incubator/datadog-firehose-nozzle
There's also a nozzle for sending logs to a syslog server (like ELK).
https://github.com/cloudfoundry-community/firehose-to-syslog
And there's one for Splunk too.
https://github.com/cloudfoundry-community/splunk-firehose-nozzle
Hope that helps!
Current situation
I have a Java Tomcat application running on Elastic Beanstalk. The application is a webservice that receives search queries and returns the results in XML format. The webservice is only updated with new data once a month, so any query sent at the end of the month will return identical results to the same query sent at the start of the month.
We take advantage of EB's load balancing so that usually just one EC2 instance is running, but at times of peak usage another EC2 instance may get started.
To allow deployment of new versions to Elastic Beanstalk, we have a domain name on Route 53 and a subdomain mapped to the EB application; customers use this subdomain in order to use the webservice.
This is working reasonably well, except that peak usage can be somewhat higher than normal usage, resulting in more instances needing to be started, which increases cost but also slows the response rate even with the extra machine.
Should I use CloudFront?
I was wondering if I could use CloudFront to cache these responses. I'm making these assumptions:
There would be fewer peaks and troughs on EB.
It would save me money, assuming CloudFront requests are cheaper than the extra load on EB.
It would improve response rates for customers not near my EB server, e.g. my EB server is based in the EU but I have many US customers.
If so, how do I do it?
I went to try and create a CloudFront distribution, but the Origin Domain Name field only listed my S3 buckets, not my EB domain, so I haven't gone any further.
I always put CloudFront in front of any solution I deliver on AWS. In response to your specific questions:
Most likely yes; it would offload some of the work that would otherwise go to an EC2 instance, so it might prevent an extra instance from spinning up sometimes.
Maybe, maybe not. It might save you money, but it's also possible that it could end up costing you a fortune. CloudFront can be abused by a hacker, if for no other reason than to run up a huge bill, so you may want to add a billing alert so you are not surprised by this.
Yes, in all likelihood it will improve the responsiveness of your website. That's the prime reason I always use it.
CloudFront does allow you to serve dynamic content (http://aws.amazon.com/cloudfront/dynamic-content/); however, from reading there it seems that it would cache query results based on a URL pattern. Would that be compatible with your site's use?
Information on how to specify an EC2 as a CloudFront origin can be found here: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/CustomOriginBestPractices.html
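If you'd rather script it than fight the console, here is a sketch with boto3 that points a distribution at the EB environment's CNAME as a custom origin; the origin hostname and TTL below are placeholders, not values from the question:

```python
import time
import boto3  # pip install boto3

cloudfront = boto3.client("cloudfront")

# Placeholder: use your EB environment's CNAME, not an S3 bucket.
EB_ORIGIN = "myapp.eu-west-1.elasticbeanstalk.com"

response = cloudfront.create_distribution(DistributionConfig={
    "CallerReference": str(time.time()),  # any unique string
    "Comment": "Cache XML webservice responses",
    "Enabled": True,
    "Origins": {"Quantity": 1, "Items": [{
        "Id": "eb-origin",
        "DomainName": EB_ORIGIN,
        # CustomOriginConfig (rather than S3OriginConfig) is what lets you
        # use an EB/EC2 hostname instead of a bucket.
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "http-only",
        },
    }]},
    "DefaultCacheBehavior": {
        "TargetOriginId": "eb-origin",
        "ViewerProtocolPolicy": "allow-all",
        "MinTTL": 3600,  # a generous TTL is safe if data changes monthly
        # Vary the cache on the query string, since each search query
        # returns different XML.
        "ForwardedValues": {
            "QueryString": True,
            "Cookies": {"Forward": "none"},
        },
        "TrustedSigners": {"Enabled": False, "Quantity": 0},
    },
})
print(response["Distribution"]["DomainName"])
```

You would then point a Route 53 record at the returned distribution domain name so customers hit the cache rather than EB directly.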
I have a simple Django app set up that I have been testing with Blitz.io.
When I test with many dynos I can get thousands of req/s on http://X.com
When I switch to https://X.com I get no more than 72 req/s no matter how many dynos.
And on https://X.herokuapp.com/ I get more, but it still tops out at a few hundred req/s.
Is this a fluke that won't show up with normal use cases? A Blitz issue? A Heroku issue? Would resources just be scaled up with demand?
We had a very similar issue with blitz.io and loader.io. See our blog post at http://making.fiftythree.com/load-testing-an-unexpected-journey/ for more details. It's very possible that blitz.io is the cause of your issue with SSL. We found that BlazeMeter could handle the load quite well.
If cost is a concern you might also want to try open source tools like siege or JMeter.
This answer assumes https://X.com uses the ssl:endpoint Heroku add-on to serve a custom cert.
The ssl:endpoint addon is implemented using an AWS Elastic Load Balancer. ELBs distribute load amongst their nodes using DNS. In my experience, each individual ELB node isn't particularly beefy, and SSL negotiation/decryption is non-trivial from a CPU perspective. So it's important when load testing to:
Re-resolve the hostname with each request to distribute load amongst all the ELB IPs, especially as new ones are added in response to increased traffic (see the sketch after this list).
Ramp up your load test very slowly. Amazon advises increasing load on ELBs by at most 50% every 5 minutes.
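To make the first point concrete, here is a minimal Python sketch that re-resolves DNS on every request while still doing SNI and certificate validation against the real hostname (the hostname is just the placeholder from the question):

```python
import random
import socket
import ssl

def fetch_once(host: str, path: str = "/") -> int:
    """One HTTPS request, re-resolving DNS so load spreads across ELB IPs."""
    # Re-resolve every time: ELB adds and rotates IPs behind the hostname.
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    ip = random.choice([info[4][0] for info in infos])

    context = ssl.create_default_context()
    raw = socket.create_connection((ip, 443), timeout=10)
    # SNI and certificate checks still use the real hostname, not the IP.
    tls = context.wrap_socket(raw, server_hostname=host)
    try:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode())
        status_line = tls.recv(1024).split(b"\r\n", 1)[0]
        return int(status_line.split()[1].decode())  # e.g. 200
    finally:
        tls.close()

print(fetch_once("X.com"))  # placeholder hostname from the question
```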
I'm not particularly surprised if the difference between HTTP and HTTPS capacity, in terms of concurrent connections allowed on a single ELB node, is substantial, which, if you're pinned to one IP, may account for the difference you're observing.
I don't know the details of the https://*.herokuapp.com stack, but I'm not surprised at all that it can service quite a bit more https traffic than a cold ssl:endpoint ELB.
Is there any tool by which I can record all the requests/traffic that currently hits my production website and then replay this load on a different environment to check the performance of the new environment?
Basically, I want to be able to test the performance of my application on the AWS cloud and determine what configuration is required to handle the current production load if it is migrated to AWS.
You could use JMeter's Access Log Sampler (see also Access log replay for load testing? Jmeter Pitfalls and Competitors).
This would allow you to take the logs from your production server, and replay the traffic against your new server. Not sure about it replicating the exact load profile - real traffic tends to be spread over the day, with peaks and troughs in visits depending on your time zone and your users; it also doesn't deal with POST requests.
In fact, for any web app that involves more than retrieving web pages, replaying historical traffic is likely to be problematic. If users have to log in, for instance, you need to know their passwords; if they browse a product catalogue in an ecommerce site, you need to have the right data to reflect the catalogue as it was when you recorded the log file.
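For a site that really is mostly GETs, though, a crude replay doesn't need much more than the log file. A sketch, assuming Common/Combined Log Format and a hypothetical target host:

```python
import re
import requests  # pip install requests

BASE = "http://new-environment.example.com"  # hypothetical new environment
# Pulls the request line out of a Common/Combined Log Format entry.
REQUEST_LINE = re.compile(r'"(GET|HEAD) (\S+) HTTP/[\d.]+"')

with open("access.log") as log:
    for entry in log:
        match = REQUEST_LINE.search(entry)
        if not match:
            continue  # skips POSTs and malformed lines, as noted above
        method, path = match.groups()
        response = requests.request(method, BASE + path, timeout=10)
        print(response.status_code, method, path)
```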
Far more useful, in my view, is to build a performance model based on your current traffic, and understand the peak number of page requests / second you need to be able to support for each (type of) page.
For instance, if you know that today you have 10K visitors/hour, and you know the most common user journeys, you can build a performance model that translates those 10K visitors into "login page requests/second", "product home page requests/second", and "payment page requests/second"; you can then use a tool like JMeter to model those journeys and ramp up the load until you exceed your targets.
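The arithmetic behind such a model is simple enough to script; the journey length and peak factor below are invented for illustration:

```python
# Back-of-envelope peak load model (illustrative numbers only).
visitors_per_hour = 10_000
pages_per_visit = 5   # assumed average journey length
peak_factor = 3       # assumed ratio of peak hour to average hour

avg_pages_per_sec = visitors_per_hour * pages_per_visit / 3600
peak_pages_per_sec = avg_pages_per_sec * peak_factor
print(f"average: {avg_pages_per_sec:.1f} pages/s, "
      f"design target: {peak_pages_per_sec:.1f} pages/s")
# average: 13.9 pages/s, design target: 41.7 pages/s
```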
What analysis do you currently perform to achieve performance metrics that are acceptable? Metrics such as page weight, response time, etc. What are the acceptable metrics that are currently recommended?
This is performance related, so 'it depends' :)
Do you have existing metrics (an existing application) to compare against? Do you have users that are complaining - can you figure out why?
The other major factor will depend on what network sits between the application and the users. On a LAN, page weight probably doesn't matter. On a very slow WAN, page size (especially with respect to TCP windowing) is going to dwarf the impact of server time.
As far as analysis:
1) Server response time, measured by a load test tool on the same network as the app
2) Client response time, as measured by a browser/client on either a real or simulated network
The workload for 1) follows the 80/20 rule in terms of transaction mix. For 2), I look at some subset of pages for a browser app and run empty-cache and full-cache cases to simulate new vs. returning users.
Use webpagetest.org to get waterfall charts.
Do RUM (Real User Monitoring) using the Google Analytics snippet with Site Speed JavaScript enabled.
These are the bare minimum to do.