Recommended way to measure HTTP requests - metrics

From the OTel Metrics specification, Counter is the recommended instrument for measuring the number of requests completed. This can later be used to calculate the throughput rate.
Example uses for Counter:
count the number of bytes received
count the number of requests completed
However, OTel's semantic conventions do not include a metric for such a use case, with http.server.active_requests being the closest thing.
Name | Instrument | Unit | Unit (UCUM) | Description
http.server.active_requests | Asynchronous UpDownCounter | requests | {requests} | measures the number of concurrent HTTP requests that are currently in-flight
Granted, the Metrics Semantic Conventions are still "Experimental", but this seems like such a basic use case, one that is even mentioned in the API spec.
My questions are:
Is the counting of requests via a counter not recommended?
What is the best way to monitor throughput?
Thanks in advance.

I'm not a spec author, but the number of requests is indeed a typical example for counters, and counting requests this way is recommended pretty much everywhere; see the Prometheus docs, for example.
Why it is not stated in the semantic conventions spec is hard to say. But yes, the conventions are very experimental: there are 400+ open issues in the whole thing, and you may create another one :)
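For illustration, here is a minimal sketch of counting completed requests with the OpenTelemetry Java API. The metric name http.server.requests_completed and the attributes are my own choices for the example, not names from the semantic conventions:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class RequestCounting {
    public static void main(String[] args) {
        // Obtain a Meter from the globally configured SDK.
        Meter meter = GlobalOpenTelemetry.getMeter("my.http.server");

        // Hypothetical metric name; not (yet) part of the semantic conventions.
        LongCounter completedRequests = meter
                .counterBuilder("http.server.requests_completed")
                .setUnit("{requests}")
                .setDescription("Number of HTTP requests completed")
                .build();

        // Call once per finished request, e.g. from a servlet filter;
        // a rate over this counter in your backend gives throughput.
        completedRequests.add(1, Attributes.of(
                AttributeKey.stringKey("http.method"), "GET",
                AttributeKey.longKey("http.status_code"), 200L));
    }
}
```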

Related

Suggestion regarding max concurrent consumer

I have a Spring Integration application and I am using a message-driven adapter to consume messages from external systems. To handle messages concurrently I have set up concurrent consumers (5) and maximum concurrent consumers (20), which is working fine.
But for the production scenario I want to fine-tune this further. Is there any standard guidance on how far the maximum-concurrent-consumers value can be increased? I understand that this is purely dependent on the application and how much traffic is coming to it, but I hope there is some standard process for figuring out this number. If we blindly increase it to a random value like 1000, it might lead to resource starvation, conflicts, etc., so I am trying to understand how to go about tuning this property.
Thanks!
There is no standard process because there is no standard performance requirement. It all depends on your SLA, and a performant system is one that meets your SLA (there is no such thing as beating an SLA).
The main caveat when it comes to concurrent consumers is the order of messages. Once you introduce more than one consumer, you cannot and should not assume any guarantees about message ordering.
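For reference, a minimal sketch of where those two properties live, assuming the adapter is JMS-backed (Spring Integration's JMS message-driven adapter delegates to a DefaultMessageListenerContainer); the queue name is hypothetical, and older Spring versions use javax.jms instead of jakarta.jms:

```java
import jakarta.jms.ConnectionFactory;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class ConsumerTuning {
    static DefaultMessageListenerContainer container(ConnectionFactory connectionFactory) {
        DefaultMessageListenerContainer c = new DefaultMessageListenerContainer();
        c.setConnectionFactory(connectionFactory);
        c.setDestinationName("inbound.queue");  // hypothetical queue name
        c.setConcurrentConsumers(5);            // steady-state consumers
        c.setMaxConcurrentConsumers(20);        // ceiling, reached only under load
        return c;
    }
}
```

A practical approach is to raise the ceiling in small steps under a load test while watching CPU, memory, broker connections, and whatever downstream resources the consumers touch.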

Find the number of requests made in the last minute

Given requests arriving at random times, return the number of requests made in the last minute.
This was a question asked in a Microsoft technical interview. I could not find any more details about the problem. Can anyone suggest how to approach it?
This is really an interesting question, and it serves as a baseline for many cloud services that operate on the idea of throttling. The idea behind throttling is to limit the number of requests per second from a given client depending on the throughput they are paying for. An example of such a service is DynamoDB from AWS.
Since cloud services usually have many clients and heavy traffic, one must design a solution that works at scale. A queue would indeed be a data structure of choice to handle such a scenario. However, would enqueuing and dequeuing millions of transactions per minute be efficient? A general way to avoid a long queue tail is to introduce a precision trade-off through batching, as sketched below.
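As an illustration of that batching idea, here is a hypothetical per-second bucketed counter in Java. It answers "how many requests in the last minute" with one-second precision and constant memory, instead of storing every individual timestamp:

```java
// Sketch of a bucketed sliding-window counter: one bucket per second
// over a 60-second window. Trades exact timestamps for O(1) memory.
public class BucketedMinuteCounter {
    private final int[] counts = new int[60];
    private final long[] bucketEpochSec = new long[60];

    public synchronized void recordRequest(long nowMs) {
        long sec = nowMs / 1000;
        int i = (int) (sec % 60);
        if (bucketEpochSec[i] != sec) {  // slot holds a stale second; reuse it
            bucketEpochSec[i] = sec;
            counts[i] = 0;
        }
        counts[i]++;
    }

    public synchronized int countLastMinute(long nowMs) {
        long sec = nowMs / 1000;
        int total = 0;
        for (int i = 0; i < 60; i++) {
            if (sec - bucketEpochSec[i] < 60) {  // bucket is within the window
                total += counts[i];
            }
        }
        return total;
    }
}
```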
A blog post that explains this concept in depth: https://medium.com/@saisandeepmopuri/system-design-rate-limiter-and-data-modelling-9304b0d18250
Let me know if you need any more explanation about the same. Cheers!
Make a queue.
Add each new request to the queue tail.
After every add, and before every check, remove entries older than one minute from the queue head.
When a check is needed, return the queue size.
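A minimal Java sketch of that queue-based approach (class and method names are my own):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecentRequestCounter {
    private static final long WINDOW_MS = 60_000;  // one minute
    private final Deque<Long> timestamps = new ArrayDeque<>();

    // Record a request arriving at nowMs (epoch milliseconds).
    public synchronized void recordRequest(long nowMs) {
        evictOld(nowMs);
        timestamps.addLast(nowMs);  // newest at the tail
    }

    // Number of requests seen in the last minute.
    public synchronized int countLastMinute(long nowMs) {
        evictOld(nowMs);
        return timestamps.size();
    }

    // Drop entries older than the window from the head.
    private void evictOld(long nowMs) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() < nowMs - WINDOW_MS) {
            timestamps.removeFirst();
        }
    }
}
```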

How can we determine how many web requests per second a machine can handle without load testing?

What are some of the things that determine how many web requests a single machine can handle? In general, what's an average number (requests per second) that most machines should be able to handle? For example, I see some answers saying that 2k requests/s can easily be handled. How about 5k? 10k? etc.
I'm basically trying to do my best at estimating how many machines I'd need to scale to some high throughput, before I dive into load testing.
Yes, that is possible through performance modelling, but the answers will have a 5-10% error margin.
If you know the exact size of a web request, you can find your network limit, which gives you the maximum possible request-acceptance rate. From some exploratory tests on a sample workload you can roughly measure the CPU time required by each request (in terms of response time or throughput). You can then extrapolate the results to higher request counts using queueing theorems, for example Little's law. Using that theorem you can find the maximum number of users (requests, here) that can be supported on given hardware for a given acceptable response time.
But all of this should be done only after tuning your application to the expected level; otherwise you will end up buying a lot of hardware to compensate for the lack of tuning.
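As a back-of-envelope illustration of that extrapolation, here is a sketch in Java; every number in it is an assumption for the example, not a measurement:

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        double cpuSecondsPerRequest = 0.05;  // assumed ~50 ms CPU per request
        int cores = 8;                       // assumed parallel "servers"

        // Throughput ceiling once all cores are busy:
        double maxThroughput = cores / cpuSecondsPerRequest;      // 160 req/s

        // Little's law, L = lambda * W: requests in flight at that
        // throughput for an assumed 200 ms target response time.
        double targetResponseTimeSec = 0.2;
        double inFlight = maxThroughput * targetResponseTimeSec;  // 32 requests

        System.out.printf("max throughput ~ %.0f req/s, ~ %.0f requests in flight%n",
                maxThroughput, inFlight);
    }
}
```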

Maximum number of concurrent requests a webserver can serve, assuming the average service time is known

Is it logical to say: "If the average service time for a request is X and the affordable waiting time for requests is Y, then the maximum number of concurrent requests to serve would be Y / X"?
I think what I'm asking is whether there are any hidden factors that I'm not taking into account.
If you're talking specifically about webservers, then no, your formula doesn't work, because webservers are designed to handle multiple simultaneous requests using forking or threading.
This turns the formula into something far harder to quantify. In my experience, web servers can handle LOTS (i.e. hundreds or thousands) of concurrent requests that consume little or no time, but tend to reduce that concurrency quite dramatically as the requests consume more time.
That means that "average service time" isn't massively useful - it can hide wide variations, and it's actually the outliers that affect you the most.
Broadly yes, but your service provider (the webserver, in your case) is capable of handling more than one request in parallel, so you should take that into account. I assume you measured end-to-end service time and haven't already averaged it over the number of parallel streams. One other thing you didn't, and cannot realistically, measure is the network delay to/from your website.
What you are heading towards is the Erlang unit (not the language of the same name), which is used to describe how much load a system can take. Erlangs are dimensionless (just a number) and originated in old-school telephony (POTS), where they were used to describe how many wires were needed to handle X calls per time period with a low blocking probability. Beyond Erlang there is Engset, which is used more for high-capacity systems, such as mobile networks.
It also gets used in expensive consultant reports on real-time computer systems and databases to describe the point at which performance degradation is likely to occur. Wikipedia has an article on this (http://en.wikipedia.org/wiki/Erlang_(unit)), and the book 'Fixed and Mobile Telecommunications, Network Systems and Services' has a good chapter on performance analysis.
While the theory is aimed at telephone systems, just replace "telephone line" with "webserver" and it behaves the same: load arrives at random intervals at a system with finite parallel capacity. In your case, you can probably measure total load with load tools more easily than parallel capacity, and then back-calculate through the formulas. This is widely done to gain confidence in overall system models.
Erlang/Engset formulas are really useful when you have randomly arriving load over parallel streams (i.e. web requests) and a service time that can only be averaged or estimated (i.e. it varies in real life). You can then calculate the blocking probability, which is the probability that a new request will have to wait while current requests are serviced, and how long it will wait. It also helps you analyse whether you need to handle more requests in parallel or make each request faster ("number of lines" and "holding time" in Erlang-speak).
You will probably look into queueing-systems analysis next, because as soon as requests block (queue), the models change slightly.
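For the curious, here is a small Java sketch of the Erlang B blocking-probability recursion; the offered load E in erlangs is the arrival rate multiplied by the mean service time, and m is the number of parallel servers:

```java
public class ErlangB {
    // Erlang B recursion: B(E, 0) = 1,
    // B(E, k) = E * B(E, k-1) / (k + E * B(E, k-1)).
    static double blockingProbability(double offeredLoadErlangs, int servers) {
        double b = 1.0;
        for (int k = 1; k <= servers; k++) {
            b = (offeredLoadErlangs * b) / (k + offeredLoadErlangs * b);
        }
        return b;
    }

    public static void main(String[] args) {
        // Assumed example: 100 req/s at 50 ms mean service time = 5 erlangs,
        // served by 10 parallel workers -> blocking probability ~ 0.018.
        System.out.println(blockingProbability(100 * 0.05, 10));
    }
}
```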
Many factors are not taken into account:
memory limits
data-locking constraints, such as people wanting to update the same data
application latency
caching mechanisms
different users perform different tasks on the site and generate different loads
That said, one easy way to get a rough estimate is with the Apache ab tool (ApacheBench).
Example: fetch the homepage 1000 times, with 100 concurrent requests at a time:
ab -c 100 -n 1000 http://www.example.com/

Which are the most relevant performance parameters / measures for a web application?

We are re-implementing (yes, from scratch) a web application that is currently in production. We have decided to start doing some performance tests on the new app to get some early information about its capabilities.
As the old application is currently in production and performs well, we would like to extract some performance parameters and then use them as a reference or baseline goal for the performance of the new application.
Which do you think are the most relevant performance parameters we should be obtaining from the current production application?
Thanks!
Determine what pages are used the most.
Measure a latency histogram for the total time it takes to answer a request. Don't just measure the mean; measure a histogram.
From the histogram you can see what percentage of requests fall within which latency in milliseconds. You can choose key performance indicators by taking the values at the 50th and 95th percentiles. These will tell you the median latency and the tail latency (the value that the slowest 5% of requests exceed).
Those two numbers alone will bring you great confidence regarding the experience your users will have.
Throughput does not matter to users, but it matters for capacity planning.
I also recommend that you track the performance values over time and review them twice a year.
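As a sketch, here is one way to compute p50/p95 from a batch of recorded latencies using the nearest-rank method; the sample values are made up:

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // p-th percentile (0 < p <= 100) of the samples, nearest-rank method.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        double[] latenciesMs = {12, 15, 11, 240, 14, 13, 16, 18, 900, 17};
        // Prints p50 = 15 ms, p95 = 900 ms: the mean alone would hide the tail.
        System.out.printf("p50 = %.0f ms, p95 = %.0f ms%n",
                percentile(latenciesMs, 50), percentile(latenciesMs, 95));
    }
}
```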
Just in case you need an HTTP client, there is weighttp, a multi-threaded client written by the guys from Lighttpd.
It uses the same syntax as ApacheBench, but weighttp lets you use several client worker threads (ab is single-threaded, so it cannot saturate a modern SMP web server).
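For example, the equivalent of the ab run above spread over 4 worker threads would look like this (a hypothetical invocation; check weighttp -h for the exact flags your build supports):
weighttp -n 1000 -c 100 -t 4 http://www.example.com/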
The answer from "usr" is valid, but you can also record the minimum, average, and maximum latencies (that's useful to see the range in which they fall). Here is a public-domain C program to automate all this over a given concurrency range.
Disclaimer: I am involved in the development of this project.
