When does nginx $upstream_response_time start/stop specifically?

Does anyone know when, specifically, the clock for $upstream_response_time begins and ends?
The documentation seems a bit vague:
keeps time spent on receiving the response from the upstream server; the time is kept in seconds with millisecond resolution. Times of several responses are separated by commas and colons like addresses in the $upstream_addr variable.
There is also an $upstream_header_time value, which adds more confusion.
I assume $upstream_connect_time stops once the connection is established, but before it is accepted upstream?
After this what does $upstream_response_time include?
Time spent waiting for upstream to accept?
Time spent sending the request?
Time spent sending the response header?

A more specific definition is in their blog:
$request_time – Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
$upstream_connect_time – Time spent establishing a connection with an upstream server
$upstream_header_time – Time between establishing a connection to an upstream server and receiving the first byte of the response header
$upstream_response_time – Time between establishing a connection to an upstream server and receiving the last byte of the response body
So:
$upstream_header_time is included in $upstream_response_time.
Time spent connecting to upstream is not included in either of them.
Time spent sending the response to the client is not included in either of them.
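Not from the original answer: one way to observe these values yourself is a custom log_format that writes them to the access log. A minimal sketch follows; the format name upstream_timing, the ports, and the log path are just example choices.
http {
    log_format upstream_timing 'request_time=$request_time '
                               'upstream_connect_time=$upstream_connect_time '
                               'upstream_header_time=$upstream_header_time '
                               'upstream_response_time=$upstream_response_time';

    server {
        listen 8080;
        access_log /var/log/nginx/upstream_timing.log upstream_timing;

        location / {
            proxy_pass http://127.0.0.1:8081;
        }
    }
}
Given the blog definitions above, $upstream_response_time minus $upstream_header_time is then roughly the time spent receiving the rest of the response after the first header byte.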

I've investigated and debugged the behavior around this, and it turned out as follows:
$upstream_connect_time – starts before nginx establishes the TCP connection with the upstream server; stops before nginx sends the HTTP request to the upstream server.
$upstream_header_time – starts before nginx establishes the TCP connection with the upstream server; stops after nginx receives and processes the headers of the HTTP response from the upstream server.
$upstream_response_time – starts before nginx establishes the TCP connection with the upstream server; stops after nginx receives and processes the HTTP response from the upstream server.
Source code
I'll explain how the values of $upstream_connect_time and $upstream_response_time differ, as that's what I was primarily interested in.
The value of u->state->connect_time, which represents $upstream_connect_time in milliseconds, is set in the following section: https://github.com/nginx/nginx/blob/3334585539168947650a37d74dd32973ab451d70/src/http/ngx_http_upstream.c#L2073
    /* record connect time once, relative to u->start_time */
    if (u->state->connect_time == (ngx_msec_t) -1) {
        u->state->connect_time = ngx_current_msec - u->start_time;
    }
Whereas the value of u->state->response_time, which represents $upstream_response_time in milliseconds, is set in the following section: https://github.com/nginx/nginx/blob/3334585539168947650a37d74dd32973ab451d70/src/http/ngx_http_upstream.c#L4432
    /* record response time once, also relative to u->start_time */
    if (u->state && u->state->response_time == (ngx_msec_t) -1) {
        u->state->response_time = ngx_current_msec - u->start_time;
    }
You can see that both values are calculated from u->start_time, which is recorded just before the connection is established, defined in https://github.com/nginx/nginx/blob/3334585539168947650a37d74dd32973ab451d70/src/http/ngx_http_upstream.c#L1533
(note that ngx_event_connect_peer is the function that establishes the TCP connection between nginx worker processes and upstream servers).
Therefore, both values include the time taken to establish the TCP connection. You can check this by live debugging with, for example, gdbserver.
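Also not from the original answer: a rough sketch of such a live-debug session, assuming an nginx binary built with debug symbols. The port, binary path, and pid lookup are illustrative; the breakpoint line numbers are simply the ones from the links above.
# on the server, attach gdbserver to one nginx worker process
gdbserver --attach :9000 $(pgrep -f 'nginx: worker' | head -n 1)
# from a gdb client with matching sources and symbols
gdb /usr/sbin/nginx
(gdb) target remote localhost:9000
(gdb) break ngx_http_upstream.c:2073
(gdb) break ngx_http_upstream.c:4432
(gdb) continue
# at either breakpoint, the elapsed time since the connection started:
(gdb) print ngx_current_msec - u->start_time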

Related

Flask & NGINX performance tuning

I have a RESTful web service written with the Flask framework.
To make it simple, let's assume that all requests are GET, i.e. only a request line plus headers.
Request buffering usually takes about 100 ms; processing takes about 1 second per request.
During stress tests I found an issue:
When lots of clients (hundreds or more) open a connection at the same time, there's a delay between the connections being opened and the start of processing.
It turned out that Flask reads the header part of each HTTP request upon connection. More connections -> more headers to read -> more network load -> longer request buffering time.
For example: 100 simultaneously opened connections will start buffering together and will take 0.1*100 = 10 seconds to buffer. They will then pass to the processing stage together.
My intention is to reach 2 goals:
Primary: start processing as quickly as possible
Secondary: buffer all requests as quickly as possible
Despite the seeming contradiction, both can be achieved by following one rule:
Buffer fewer connections when processing is starved.
Once again, to make it simple, I want my server to buffer only 10 connections at a time (or per second). All other connections should be accepted by the server socket and wait patiently for their turn. Alternatively, accept only 10 connections per second (and still queue the others rather than discarding bursts).
I tried to do it with an NGINX reverse proxy using limit_req:
http {
    limit_req_zone $server_name zone=processing_limit:10m rate=1r/s;
}

location ~ /process {
    # Limit requests
    limit_req zone=processing_limit burst=1000;

    include proxy_params;
    proxy_pass_header Server;
    proxy_request_buffering on;
    proxy_buffering off;
    client_max_body_size 100m;
    proxy_pass http://127.0.0.1:8081;
}
But NGINX also buffers all connections together and only forwards 1 request/sec to a backend server.
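For comparison (not part of the original question): limit_req limits the request rate, while a cap on concurrency is normally expressed with limit_conn. A minimal sketch follows, with the zone name processing_conn and the limit of 10 as placeholder choices. Note that by default limit_conn rejects excess requests with a 503 rather than queueing them, so it caps concurrency but does not by itself provide the "wait patiently" behavior described above.
http {
    # count concurrent connections per server name
    limit_conn_zone $server_name zone=processing_conn:10m;
}

location ~ /process {
    # allow at most 10 requests to this location to be handled concurrently
    limit_conn processing_conn 10;

    include proxy_params;
    proxy_request_buffering on;
    proxy_pass http://127.0.0.1:8081;
}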

MQTT connection & data publisher misinterpreted in JMeter

I have a JMeter test plan where a single thread contains two MQTT gateway connection samplers, and each sampler has three publishers connected to IoT Hub.
JMeter reference:
When I run the thread in a loop at 6 frames per second for 10 seconds, I can see all 60 frames published successfully in JMeter.
But when I check the data count at IoT Hub, the first gateway has received only 6 frames (some data seems to get lost; a problem with JMeter, I assume) and the second gateway has received 42 frames. The second part caused major confusion, since it should have received a maximum of 30 frames, but received 42.
Diagram reference:
Each gateway (A & B) includes the connection panel with:
IoT Hub URL
MQTT v3.1.1
Username: iothuburl/device ID
Pwd: SAS token (generated from the connection string available on the iothubowner page in the Azure portal)
Each gateway (A & B) also includes three publishers, and each includes 200 JSON objects; the size doesn't exceed 55 KB.
Publisher QoS: 0
Operation:
Every second, each gateway publishes 3 frames (600 JSON objects in total).
As I mentioned, there are 2 gateways, so a total of 6 frames with 1200 JSON objects get published successfully in JMeter.
But the data is missing at IoT Hub.
Note: while running the two gateways in a single thread, I could see both gateway connections sharing the same connection string ID.
Any clue where I missed a major configuration? Any help would be greatly appreciated. Thanks.
Change the QoS to 1 in the publisher panel. Although there is some extra latency while waiting for the acknowledgement, the simulation then works fine without any loss of connection or data.

What do the bytes and duration fields in squid log count for https (CONNECT)?

Standard squid config only logs one CONNECT line for any https transaction. What is being counted/timed by the reported bytes and duration fields in that line?
Got an answer via the squid-users mailing list [1]:
The duration is from the CONNECT message arriving to the time TCP close
is used to end the tunnel. The size should be the bytes sent to the
client (excluding the 200 reply message itself) during that time
(unless you are using SSL-Bump or such to process the contents specially).
[1] http://lists.squid-cache.org/pipermail/squid-users/2016-July/011714.html

Timeout of JMS Point-to-point requests in JMeter does not result in an error

We are using Apache JMeter 2.12 in order to measure the response time of our JMS queue. However, we would like to see how many of those requests take less than a certain time. This, according to the official JMeter site (http://jmeter.apache.org/usermanual/component_reference.html), should be set by the Timeout property. You can see in the photo below what our configuration looks like:
However, setting the timeout does not result in an error after sending 100 requests. We can see that some of them apparently take more than that amount of time:
Is there some other setting I am missing or is there a way to achieve my goal?
Thanks!
The JMeter documentation for JMS Point-to-Point describes the timeout as
The timeout in milliseconds for the reply-messages. If a reply has not been received within the specified time, the specific testcase fails and the specific reply message received after the timeout is discarded. Default value is 2000 ms.
This times not the actual sending of the message but the receipt of a response.
The source for the JMeter Point-to-Point sampler determines whether you have a 'Receive Queue' configured. If you do, it goes through the executor path and uses the timeout value; otherwise it does not use the timeout value.
if (useTemporyQueue()) {
    // no Receive Queue configured: a temporary queue is used and the timeout is ignored
    executor = new TemporaryQueueExecutor(session, sendQueue);
} else {
    // Receive Queue configured: the timeout value is passed to the executor
    producer = session.createSender(sendQueue);
    executor = new FixedQueueExecutor(producer, getTimeoutAsInt(), isUseReqMsgIdAsCorrelId());
}
In your screenshot the JNDI name Receive Queue is not defined, so it uses a temporary queue and does not use the timeout. Whether or not a timeout should be supported in this case is best discussed on the JMeter forum.
Alternatively, if you want to see request times in percentiles/buckets, please read this Stack Overflow Q/A:
I want to find out the percentage of HTTPS requests that take less than a second in JMeter

ZMQ data transfer latency from one process to another?

When using ZMQ to transfer data, the sending side is fast and the data is huge, but the receiving side processes it slowly, so data accumulates between the two processes. Does anyone know how to solve this problem? Thanks.
Instead of sending all the data at once, send it in chunks. Something like this...
Client requests file 'xyz' from server
Server responds with the file size only, e.g. 10 MB
Client sets the chunk size accordingly, e.g. 1024 bytes
Client sends read requests to server for chunks of data:
client -> server: give me 0 to 1023 bytes for file 'xyz'
server -> client: 1st chunk
client -> server: give me 1024 to 2047 bytes for file 'xyz'
server -> client: 2nd chunk
...and so on.
For each response, the client persists the chunk to disk.
This approach allows the client to throttle the rate at which data is transmitted from the server. Also, in case of network failure, since each chunk is persisted, there's no need to read the file from the beginning; the client can start requesting chunks again from the point before the last response failed.
You mentioned nothing about language bindings, but this solution should be trivial to implement in just about any language.
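A minimal sketch of that request/reply chunking idea, using the Python pyzmq binding purely as an example; the endpoint, chunk size, and the tiny "size"/"fetch" wire protocol are assumptions of the sketch, not something from the original answer.
import zmq

CHUNK_SIZE = 1024                      # bytes per chunk (example value)
ENDPOINT = "tcp://127.0.0.1:5555"      # example endpoint

def serve(path):
    """Server: answers 'size' and 'fetch <offset> <count>' requests for one file."""
    sock = zmq.Context.instance().socket(zmq.REP)
    sock.bind(ENDPOINT)
    with open(path, "rb") as f:
        data = f.read()
    while True:
        parts = sock.recv_string().split()
        if parts[0] == "size":
            sock.send_string(str(len(data)))
        else:  # "fetch <offset> <count>"
            offset, count = int(parts[1]), int(parts[2])
            sock.send(data[offset:offset + count])

def fetch(out_path):
    """Client: asks for the size, then pulls and persists one chunk at a time."""
    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.connect(ENDPOINT)
    sock.send_string("size")
    total = int(sock.recv_string())
    with open(out_path, "wb") as out:
        offset = 0
        while offset < total:
            sock.send_string("fetch %d %d" % (offset, CHUNK_SIZE))
            chunk = sock.recv()
            out.write(chunk)           # persist before requesting the next chunk
            offset += len(chunk)
Because the client persists each chunk and only then asks for the next one, the receiver's processing speed naturally throttles the sender, which is the point of the approach described above.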
