Azure CDN metrics for monitoring

I have a CDN in Azure and I want to build a graph in a dashboard or workbook that displays the number of requests made to the CDN and, in particular, the number of failed requests (for example, when a request failed because the resource URL is wrong or the asset is no longer in the CDN).
Is it possible to do that?

It should be possible with Azure Diagnostic settings.
Under Logs you have the option to set up custom metrics based on a query/condition.
Here are some of the conditions you can use as metrics:
Top 10 URL request count
Requests per hour
Top 10 client IPs and HTTP versions
4XX error rate by URL.
Ref: https://github.com/microsoft/AzureMonitorCommunity/tree/master/Azure%20Services/CDN%20profiles
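For illustration, here is a rough Go sketch of running the 4XX-error-rate query against a Log Analytics workspace using the Azure Monitor Query client (azquery). The table and column names (AzureDiagnostics, AzureCdnAccessLog, httpStatusCode_s, requestUri_s) and the workspace ID are assumptions based on the linked community samples, so treat this as a starting point rather than a drop-in solution.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/monitor/azquery"
)

func main() {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		log.Fatal(err)
	}
	client, err := azquery.NewLogsClient(cred, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Assumed CDN access-log schema: 4XX error rate per requested URL.
	query := `AzureDiagnostics
| where Category == "AzureCdnAccessLog"
| summarize total = count(), errors = countif(httpStatusCode_s startswith "4") by requestUri_s
| extend errorRate = todouble(errors) / total * 100
| top 10 by errorRate desc`

	// "<workspace-id>" is a placeholder for your Log Analytics workspace ID.
	res, err := client.QueryWorkspace(context.TODO(), "<workspace-id>",
		azquery.Body{Query: to.Ptr(query)}, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, table := range res.Tables {
		for _, row := range table.Rows {
			fmt.Println(row)
		}
	}
}
```

The same query can be pinned to a dashboard or used in a workbook chart directly from the Logs blade; the Go client is only needed if you want to pull the numbers programmatically.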

Related

Does ALB over grpc protocol return network related errors when scaling concurrent load?

We were experimenting with load-balancing strategies for gRPC-based services in the AWS cloud. In addition to the client-side load balancing recommended by the gRPC platform, we also wanted to try the ALB offered by AWS over the gRPC protocol. We created a gRPC service written in golang with two instances and followed all the steps such as creating target groups, configuring an ALB over the gRPC protocol, and health checks. We wrote a load generation tool [in golang] to send concurrent requests to the service. The load generation tool creates a single gRPC client connection and uses the same one to send concurrent requests. When the concurrency [workers] is increased [~1000] and run for a period of time, some requests fail with the below error.
code = Unavailable desc = transport is closing
For 250K requests to the ALB in 20 minutes, around 1k requests failed in small batches with the above error.
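For context, here is a stripped-down Go sketch of the kind of load generator described above: a single shared gRPC client connection with many concurrent workers. The target address is a placeholder and the standard health-check client stands in for the real generated service client.

```go
package main

import (
	"context"
	"log"
	"sync"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// One ClientConn shared by every worker, as in the setup described above.
	conn, err := grpc.Dial("grpc-alb.example.com:50051", // placeholder ALB target
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Stand-in for the real generated service client.
	client := healthpb.NewHealthClient(conn)

	const workers = 1000          // concurrency described in the question
	const requestsPerWorker = 250 // ~250K requests in total
	var failed int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < requestsPerWorker; j++ {
				if _, err := client.Check(context.Background(), &healthpb.HealthCheckRequest{}); err != nil {
					// Errors such as "code = Unavailable desc = transport is closing" land here.
					atomic.AddInt64(&failed, 1)
				}
			}
		}()
	}
	wg.Wait()
	log.Printf("failed requests: %d", failed)
}
```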
Then, to identify the root cause, we used an NLB to test the same load and didn't get any errors.
Note: We are aware that an NLB won't load-balance requests from a single client across multiple instances. This was done just to identify the cause of the error.
We added channelz to the service and monitored the number of failed messages across all channels/sockets. The number of failures is below a hundred [~70] in the channelz stats.
We also noticed that the monitoring stats for the ALB showed 4xx error codes.
Please share suggestions for debugging the failures from the ALB, or articles on the internals of the AWS ALB, to help figure out a solution.

Google OAuth API 503

I have a service which requires the usage of refresh credential API from Gmail and I have recently noticed a surge in HTTP 503 errors for the following API: https://accounts.google.com/o/oauth2/token
This happens for certain periods of time, and twice it coincided with a Gmail downtime according to Google App Status. I have also checked to make sure that no quota limits for the Gmail API were hit from the admin console.
Please advise on how to proceed further on this.
Editing the question to provide further details from comments:
There are separate limits on authentication API (like the token endpoint).
-- Where do I find the limits on authentication API in Google developer console? I could only find the limits for Application APIs like Gmail/Google Calendar.
Questions:
How often are you calling this API/token endpoint?
-- once every ~50-60 mins for a user
Is this for the same user/token? (for the same user, you should try to use the access token until the expiry time that is 1 hour).
-- No this is for different users. For the same user, the same access token is used until its expiry.
If your server is making a lot of requests for different tokens/users, are they coming from the same IP?
-- They are not coming from the same IP, but from a few servers (~5) which make these requests.
What is the max qps you may be hitting?
-- 300 qps on average (aggregated from all our servers); the max would be 450 qps.

Load balancing with nginx

I want to stop serving requests to my back end servers if the load on those servers goes above a certain level. Anyone who is already surfing the site will still get routed, but new connections will be sent to a static "server busy" page until the load drops below a predetermined level.
I can use cookies to let the current customers in, but I can't find information on how to do routing based on a custom load metric.
Can anyone point me in the right direction?
Nginx has an HTTP Upstream module for load balancing. Checking the responsiveness of the backend servers is done with the max_fails and fail_timeout options. Routing to an alternate page when no backends are available is done with the backup option. I recommend translating your load metrics into the options that Nginx supplies.
Let's say, though, that Nginx is still seeing the backend as being "up" when the load is higher than you want. You may be able to adjust that further by tuning the max connections of the backend servers. So, maybe the backend servers can only handle 5 connections before the load is too high, so you tune them to allow only 5 connections. Then on the front end, Nginx will time out immediately when trying to send a sixth connection, and mark that server as inoperative.
Another option is to handle this outside of Nginx. Software like Nagios can not only monitor load, but can also proactively trigger actions based on the monitoring it does.
You can generate your Nginx configs from a template that has options to mark each upstream node as up or down. When a monitor detects that the upstream load is too high, it could re-generate the Nginx config from the template as appropriate and then reload Nginx.
A lightweight version of the same idea could be done with a script that runs on the same machine as your Nagios server and performs simple monitoring as well as the config file updates.
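As a rough illustration of that template-and-reload idea (file paths, node addresses, and thresholds here are placeholders, not recommendations), a small Go program could render the upstream block and ask Nginx to reload:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"text/template"
)

// Backend is one upstream node plus the up/down verdict from your monitor.
type Backend struct {
	Addr string
	Up   bool
}

const upstreamTmpl = `upstream app {
{{- range . }}
    server {{ .Addr }}{{ if not .Up }} down{{ end }} max_fails=3 fail_timeout=10s;
{{- end }}
    server 127.0.0.1:8503 backup; # static "server busy" page
}
`

func main() {
	// In a real setup these flags would come from the load monitor (e.g. Nagios).
	backends := []Backend{
		{Addr: "10.0.0.11:8080", Up: true},
		{Addr: "10.0.0.12:8080", Up: false}, // load too high, so marked down
	}

	f, err := os.Create("/etc/nginx/conf.d/upstream.conf") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := template.Must(template.New("upstream").Parse(upstreamTmpl)).Execute(f, backends); err != nil {
		log.Fatal(err)
	}

	// Ask Nginx to pick up the regenerated config.
	if out, err := exec.Command("nginx", "-s", "reload").CombinedOutput(); err != nil {
		log.Fatalf("reload failed: %v (%s)", err, out)
	}
}
```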

What does the Amazon ELB automatic health check do and what does it expect?

Here is the thing:
We've implemented a C++ RESTful API Server, with built-in HTTP parser and no standard HTTP server like apache or anything of the kind
It has been in use for several months in the Amazon infrastructure, using both plain and SSL communications, and no problems related to the Amazon infrastructure have been identified
We are deploying our first backend using Amazon ELB
Amazon ELB has a customizable health check system but also an automatic one, as stated here
We've found no documentation of what data is sent by the health check system
The backend simply hangs on the socket read instruction and, eventually, the connection is closed
I'm not looking for a solution to the problem, since the backend is not based on a standard web server; I just want to know what kind of message is being sent by the ELB health check system, since we've found no documentation about this anywhere.
Help is much appreciated. Thank you.
Amazon ELB has a customizable health check system but also an automatic one, as stated here
With customizable you are presumably referring to the health check configurable via the AWS Management Console (see Configure Health Check Settings) or via the API (see ConfigureHealthCheck).
The requirements to pass health checks configured this way are outlined in field Target of the HealthCheck data type documentation:
Specifies the instance being checked. The protocol is either TCP,
HTTP, HTTPS, or SSL. The range of valid ports is one (1) through
65535.
Note
TCP is the default, specified as a TCP: port pair, for example
"TCP:5000". In this case a healthcheck simply attempts to open a TCP
connection to the instance on the specified port. Failure to connect
within the configured timeout is considered unhealthy.
SSL is also specified as SSL: port pair, for example, SSL:5000.
For HTTP or HTTPS protocol, the situation is different. You have to
include a ping path in the string. HTTP is specified as a
HTTP:port/PathToPing grouping, for example
"HTTP:80/weather/us/wa/seattle". In this case, a HTTP GET request is
issued to the instance on the given port and path. Any answer other
than "200 OK" within the timeout period is considered unhealthy.
The total length of the HTTP ping target needs to be 1024 16-bit
Unicode characters or less.
[emphasis mine]
With automatic you are presumably referring to the health check described in paragraph Cause within Why is the health check URL different from the URL displayed in API and Console?:
In addition to the health check you configure for your load balancer,
a second health check is performed by the service to protect against
potential side-effects caused by instances being terminated without
being deregistered. To perform this check, the load balancer opens a
TCP connection on the same port that the health check is configured to
use, and then closes the connection after the health check is
completed. [emphasis mine]
The paragraph Solution clarifies that no payload is sent here, i.e. it is similar to the non-HTTP/HTTPS method described for the configurable health check above:
This extra health check does not affect the performance of your
application because it is not sending any data to your back-end
instances. You cannot disable or turn off this health check.
Summary / Solution
Assuming your RESTful API Server with built-in HTTP parser is indeed supposed to serve HTTP only, you will need to handle two health checks:
The first one you configured yourself as an HTTP:port/PathToPing target - you'll receive an HTTP GET request and must answer with 200 OK within the specified timeout period to be considered healthy.
The second one is configured automatically by the service - it will open a TCP connection on the HTTP port configured above, won't send any data, and then close the connection after the health check is completed.
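Illustratively, here is a minimal Go sketch (not the asker's C++ code) of a raw-socket handler that copes with both checks: it answers an HTTP GET on an assumed /health ping path with 200 OK, and treats a connection that sends no data (the automatic TCP probe) as a normal close instead of hanging on the read.

```go
package main

import (
	"bufio"
	"io"
	"log"
	"net"
	"strings"
	"time"
)

func handle(conn net.Conn) {
	defer conn.Close()

	// The automatic ELB check connects and sends nothing; the read ends with
	// EOF (or times out), so close quietly instead of blocking forever.
	conn.SetReadDeadline(time.Now().Add(5 * time.Second))

	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		if err == io.EOF {
			return // zero-byte TCP probe: opened and closed, no data
		}
		return // timeout or other read error: nothing to answer
	}

	// Configured health check, e.g. HTTP:80/health - answer the GET with 200 OK.
	if strings.HasPrefix(line, "GET /health ") { // "/health" is a placeholder ping path
		conn.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nOK"))
		return
	}
	conn.Write([]byte("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
}

func main() {
	ln, err := net.Listen("tcp", ":80")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handle(conn)
	}
}
```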
In conclusion it seems that your server might be behaving perfectly fine already and you are just irritated by the 2nd health check's behavior - does ELB actually consider your server to be unhealthy?
As far as I know it's just an HTTP GET request looking for a 200 OK http response.

Throttle downloads on a server

I am building a download application which would allow clients to download data (documents + images) from a server, which has exposed download functionality through web service APIs. Each client might download anywhere from 1 GB to 10 GB of data. What I am looking for is a possible mechanism to throttle the downloads, so that if too many clients start downloads simultaneously, the server does not go down because of the load.
What are the standard mechanisms for throttling downloads on the server?
We finally decided to go for a download-rate approach, where the client pings the server for a rate of download and the server sends back the rate at which the client should download. The rate of download is calculated on the server based on the number of active clients.
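A rough sketch of that idea in Go (the total bandwidth budget and the endpoints are made-up values for illustration): the server divides a fixed budget by the number of active downloads and tells each client how fast it may go.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
)

const totalBytesPerSec = 100 * 1024 * 1024 // assumed server-wide budget: 100 MB/s

var activeClients int64

func main() {
	// The client pings this endpoint to learn the rate it should download at.
	http.HandleFunc("/rate", func(w http.ResponseWriter, r *http.Request) {
		n := atomic.LoadInt64(&activeClients)
		if n < 1 {
			n = 1
		}
		fmt.Fprintf(w, "%d\n", int64(totalBytesPerSec)/n) // bytes per second per client
	})

	// The download handler only tracks how many clients are active; each client
	// is expected to pace itself to the rate it was last given.
	http.HandleFunc("/download", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&activeClients, 1)
		defer atomic.AddInt64(&activeClients, -1)
		// ... stream the requested document/image here ...
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```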
Throttling is possible at almost any level: you could add it to your code, but it is also possible on any decent firewall.
In between, you could throttle a VM or (if you're talking Linux) you can throttle applications using cgroups.
