Spring Boot Reactive: elapsed time calculation

I am currently using Spring Boot Reactive (WebFlux) to develop a microservice. In it, I implement an elapsed time calculation to determine how long the process took to run.
Basically, when the process starts, I capture the current timestamp to mark the start of the process, as follows:
...
metrics.setStartMillis(System.currentTimeMillis());
...
and then it prints whether the process succeeded or failed, along with the elapsed time, in doOnSuccess() and onErrorResume() respectively:
...
metrics.setStartMillis(System.currentTimeMillis());
return webclientAdapter.getResponse(request)
        .doOnSuccess(success -> {
            metrics.info(true);  // prints a metric log marking a successful call, with the elapsed time since the start timestamp
        })
        .onErrorResume(error -> {
            metrics.info(false); // prints a metric log marking a failed call, with the elapsed time since the start timestamp
            return Mono.error(error); // onErrorResume must return a publisher; re-emit the error here
        });
...
When I test the service by mocking the backend call with a 100 ms delay and hitting it with cURL, the elapsed time is printed correctly (~100 ms). However, during a load test with JMeter, the printed elapsed time becomes very short (roughly 0-20 ms), even though the service is still configured to call the mock backend with the 100 ms delay.
Does this have to do with the nature of Reactive code running on an event loop, and if so, how can I ensure that the elapsed time of the calling process is calculated properly?
Pardon any confusion; feel free to ask for additional information.
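For reference, one common way to make this kind of timing per-request in Reactor is to capture the start timestamp inside Mono.defer, so it is taken when each request subscribes rather than once at assembly time. The following is only a sketch under that assumption, not necessarily the fix for the behaviour above: Request, Response and log are placeholder names, while webclientAdapter.getResponse() is taken from the snippet in the question.
import reactor.core.publisher.Mono;

// Sketch only: per-subscription timing with Mono.defer.
// webclientAdapter and log are assumed fields, as in the question; Request/Response are placeholder types.
Mono<Response> timedCall(Request request) {
    return Mono.defer(() -> {
        long start = System.currentTimeMillis(); // captured when this particular request subscribes
        return webclientAdapter.getResponse(request)
                .doOnSuccess(ok ->
                        log.info("success, elapsed={} ms", System.currentTimeMillis() - start))
                .onErrorResume(error -> {
                    log.info("failure, elapsed={} ms", System.currentTimeMillis() - start);
                    return Mono.error(error); // keep propagating the error downstream
                });
    });
}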

Related

KafkaConsumer poll() behavior understanding

Trying to understand (new to Kafka) how the poll loop in Kafka works.
Use case: 25 records on the topic, max poll size is set to 5.
max.poll.interval.ms = 5000 // 5 seconds by default
max.poll.records = 5
Sequence of tasks
Poll the records from the topic.
Process the records in a for loop.
Some processing logic where the logic would either pass or fail.
If the logic passes, the record (with its offset) will be added to a map.
Then it will be committed using a commitSync call.
If it fails, the loop breaks and whatever succeeded before this point is committed. The problem starts after this.
The next poll just keeps moving in batches of 5 even after an error; is that expected?
What we basically expect is that the loop breaks, the offsets up to the last successfully processed message get committed, and the next poll continues from the failed message.
Example: in the first poll, 5 messages are polled; offsets 1 and 2 succeed and are committed, then offset 3 fails. Yet the poll calls just keep moving to the next batches (5-10, 10-15). If there are any errors in between, we expect polling to stop at that point and the next poll to start from offset 3 in the first case; or, if the failure is at offset 8 in the second batch, the next poll should start from offset 8, not from the next max-poll batch. If it matters, this is a Spring Boot project and enable.auto.commit is false.
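Roughly, the flow described above corresponds to the plain-Java sketch below. This is only an illustration of the expected behaviour; consumer, process() and the topic setup are placeholders, not from the original post.
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Sketch of the poll / process / commit flow described above.
void pollAndCommit(KafkaConsumer<String, String> consumer) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        if (!process(record)) { // business logic either passes or fails
            break;              // stop at the first failure
        }
        // remember the next offset to consume for this record's partition
        toCommit.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
    }
    if (!toCommit.isEmpty()) {
        consumer.commitSync(toCommit); // commit only what succeeded before the failure
    }
}

boolean process(ConsumerRecord<String, String> record) {
    // placeholder for the pass/fail processing logic from the question
    return true;
}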
I have tried finding this in the documentation, but no help. I also tried tweaking max.poll.interval.ms, but no help.
EDIT: Answer not accepted because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is in milliseconds, not seconds, so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
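For completeness, wiring that up in a Spring for Apache Kafka project typically looks roughly like the sketch below; the bean and factory names here are assumptions for illustration, not taken from the question.
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

// Sketch: listener container factory with SeekToCurrentErrorHandler, so unprocessed records
// are re-sought and the failed record is redelivered on the next poll.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}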
If you are using the consumer yourself (which it sounds like), you must do the seeks.
You can manually seek to the beginning offset of the poll for all the assigned partitions on failure. I am not sure about the Spring consumer.
Sample code for seeking the offset back to the beginning for a plain consumer.
In the code below I get the record list per partition and then take the offset of the first record to seek to.
import scala.jdk.CollectionConverters._ // scala.collection.JavaConverters._ on older Scala versions

def seekBack(records: ConsumerRecords[String, String]): Unit = {
  // for each partition in this poll, seek back to the offset of its first record
  records.partitions().asScala.foreach { partition =>
    val partitionedRecords = records.records(partition)
    val offset = partitionedRecords.get(0).offset()
    consumer.seek(partition, offset)
  }
}
One problem: doing this blindly in production is bad, since you do not want to seek back every time, only when there is a transient error; otherwise you will end up retrying infinitely.

SimpleMeterRegistry clears data if data not polled every minute

I have a simple spring boot app with the following config (the project is available here on GitHub):
management:
  metrics:
    export:
      simple:
        mode: step
  endpoints:
    web:
      exposure:
        include: "*"
The above config creates a SimpleMeterRegistry and configures its metrics to be step-based with a 60-second step. I have one script that sends 50-100 requests per second to a dummy endpoint of the service, and another script that polls the data from /actuator/metrics/http.server.requests every X seconds. When the latter script runs every 60 seconds everything works as expected, but when it runs every 120 seconds the response always contains zeros for the TOTAL_TIME and COUNT statistics.
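For illustration, the step behaviour that this YAML enables corresponds roughly to building the registry as in the sketch below. This is only a sketch to make the step mechanics explicit; in the actual app, Spring Boot auto-configures the registry from the properties.
import java.time.Duration;
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.CountingMode;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch: step-mode SimpleMeterRegistry with a 60-second step, i.e. counters/timers roll over each minute.
SimpleConfig stepConfig = new SimpleConfig() {
    @Override public String get(String key) { return null; }            // fall back to defaults for everything else
    @Override public CountingMode mode() { return CountingMode.STEP; }  // step-based counting
    @Override public Duration step() { return Duration.ofSeconds(60); } // the 60s step from the YAML
};
MeterRegistry registry = new SimpleMeterRegistry(stepConfig, Clock.SYSTEM);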
Can anyone explain this behavior?
I have read the documentation here. The picture in the documentation could indicate that a registry will try to aggregate the data for the previous interval only if pollAsRate is called during the current interval. That would explain why it does not work for the 120-second interval. But this is just my assumption; does anyone know what is really happening here?
Spring Boot version: 2.1.7.RELEASE
UPDATE
I did a similar test with management.metrics.export.simple.step=10s: it works fine when the polling interval is 10s and does not work when it is 20s. For a 15s interval it works sporadically. So it is definitely related to the step size and polling frequency.
MAX, TOTAL_TIME, and COUNT are properties of Statistic.
DistributionStatisticConfig has .expiry(Duration.ofMinutes(2)), which resets some measurements to 0 if no request has been made for the last 2 minutes (120 seconds).
Methods such as public TimeWindowMax(Clock clock, ...) and private void rotate() are written for this purpose. You may see the implementation here.
More Detailed Answer
Finally figured out what is happening.
On every request to /actuator/metrics, MetricsEndpoint is going to merge measures (see here). That is done by collecting values for all meters with measurement.getValue(). StepMeasurement.getValue() will not simply return the value; it will update the current and previous intervals and counts, and roll the count (see here and here).
StepMeasurement.getValue
public double getValue() {
    double absoluteCount = (Double) this.f.get();
    double inc = Math.max(0.0D, absoluteCount - this.lastCount.sum());
    this.lastCount.add(inc);
    this.value.getCurrent().add(inc);
    return this.value.poll();
}
StepDouble.poll
public double poll() {
    rollCount(clock.wallTime());
    return previous;
}
How is this related to the polling interval? If you do not poll the /actuator/metrics endpoint, the current and previous intervals are not updated, so the current interval is not up to date and metrics end up being recorded for the "wrong" interval.

Early wakeups in WaitForSingleObject() ...?

Everything I've read, both on the MS docs site (where it's not really addressed) and here on SO, says that Windows WaitForSingleObject() is not subject to spurious wakeups and waits for at least the provided time, maybe longer. However, my testing says this is not true and in fact early wakeups almost always happen. Is the "common wisdom" wrong and I just need to add loops to handle early wakeups, or am I doing something wrong that I need to keep banging my head on to figure out?
Unfortunately the full code is too complex to post here, but I have two different threads each with their own event, created via:
event = CreateEventA(NULL, false, false, NULL);
(event is a thread-local variable). I have a mutex I use to ensure that both threads start running at about the same time.
In each thread I call WaitForSingleObject(). In this specific test, I never call SetEvent() so the only way to finish is via timeout, and the return code shows that's what happens. However, the actual amount of time spent waiting is massively variable and 90% of the time is less than the time I requested. I've instrumented this using QueryPerformanceCounter() to detect how long is spent here and it's just wrong. Here's the instrumented code:
LARGE_INTEGER freq, ctr1, ctr2;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&ctr1);
DWORD ret = WaitForSingleObject(event, tmoutMs);
QueryPerformanceCounter(&ctr2);
uint64_t elapsed = ((uint64_t)ctr2.QuadPart - (uint64_t)ctr1.QuadPart) * 1000000ULL / (uint64_t)freq.QuadPart;
(here elapsed is kept in microseconds, just to be a bit more specific)
Then I print this info out. In one thread tmoutMs is 2, and in the other thread tmoutMs is 100. Almost every time the returned values are too short: the 2ms wait can take anywhere from 700us up, and the 100ms wait takes from about 93ms up. Only once in 7 tries or so will the elapsed time be >100ms. Here are some sample outputs:
event=104: pause(tmoutMs=2) => ret=258 elapsed us=169, ms=0
event=112: pause(tmoutMs=100) => ret=258 elapsed us=93085, ms=93
event=104: pause(tmoutMs=2) => ret=258 elapsed us=427, ms=0
event=112: pause(tmoutMs=100) => ret=258 elapsed us=94002, ms=94
event=104: pause(tmoutMs=2) => ret=258 elapsed us=3317, ms=3
event=112: pause(tmoutMs=100) => ret=258 elapsed us=96840, ms=96
event=104: pause(tmoutMs=2) => ret=258 elapsed us=11461, ms=11
event=112: pause(tmoutMs=100) => ret=258 elapsed us=105189, ms=105
The return code is always WAIT_TIMEOUT as expected.
Is this reasonable, even though it's not documented (or is it documented somewhere that I just can't find), and I just have to loop on my own to handle early wakeups?
FWIW, this is a C++ program compiled with Visual Studio 2017 running on Windows10. It's a unit test program using Google Test, and has no graphical interface: it's command-line only.

Can the Windows system time occasionally move backwards?

Spring Boot project, logging how much time it took to save to the DB:
long start = System.currentTimeMillis();
getDao().batchInsert(batchList);
long end = System.currentTimeMillis();
log.info("Save {} data 2 DB successfully took time: {}", getDescName(), (end - start));
Very strangely, I found there are situations where the time cost is negative; see below:
2019-05-16 14:41:04.420 INFO 3324 --- [ave2db-thread-2] c.c.sz_vss.demo.AbstractSave2DBProcess : Save Stock data 2 DB batch size: 416
2019-05-16 14:41:03.152 INFO 3324 --- [ave2db-thread-2] c.c.sz_vss.demo.AbstractSave2DBProcess : Save Stock data 2 DB successfully took time: -1268
Why does this happen? Is it a bug in the Spring Boot logging system, or can the Windows system time occasionally move backwards?
Network time synchronization can apply a correction in either direction, so yes, the system calendar can move backwards. Also, in timezones that observe daylight saving time, you will see +/- 1 hour discontinuities in local wall-clock time each year.
That's why it is not recommended to use the system calendar for measuring elapsed time. There are monotonic timers (on Windows, QueryPerformanceCounter() combined with QueryPerformanceFrequency(), on POSIX such as Linux it is clock_gettime(CLOCK_MONOTONIC)). Managed frameworks usually wrap these in an object named "Stopwatch".
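In Java, that monotonic clock is System.nanoTime(). The measurement from the question could be written against it roughly as follows; getDao(), batchList, getDescName() and log are taken from the snippet above, and this is only a sketch of the idea.
// Sketch: same measurement, but with the monotonic clock, which never moves backwards.
long start = System.nanoTime();
getDao().batchInsert(batchList);
long elapsedMs = (System.nanoTime() - start) / 1_000_000; // convert nanoseconds to milliseconds
log.info("Save {} data 2 DB successfully took time: {}", getDescName(), elapsedMs);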

Response time different in Postman/Jmeter and web API

I have an MVC Web API and I have trouble comparing the response times of this API. I added some code to calculate the response time:
In the AuthorizationFilterAttribute OnAuthorization, I have the below code:
actionContext.Request.Headers.Add("RequestStartTime", DateTime.Now.ToString());
I have an ActionFilterAttribute, and an OnActionExecuted in which I have the below code:
string strRequestStartTime = actionExecutedContext.Request.Headers.GetValues("RequestStartTime").First();
DateTime dtstartTime = DateTime.Parse(strRequestStartTime);
TimeSpan tsTimeTaken = DateTime.Now.Subtract(dtstartTime);
actionExecutedContext.Response.Headers.Add("RequestProcessingTime", tsTimeTaken.TotalMilliseconds + "ms");
The response has the header "RequestProcessingTime" in milliseconds. The issue is that whenever I try the same request using Postman/JMeter, I see that the response time is less than what I see in my Response. Why is this happening?
I think this is due to the fact that the header does not account for the time for the request to reach the server and for the response to travel back; my expectation is that it shows only the time required to process the request on the server side. JMeter, on the other hand, reports the time as the delta between the moment the request was sent and the moment the last byte was received, which is more correct in terms of real user experience.
See the definitions of "Elapsed Time", "Connect Time" and "Latency" in the JMeter Glossary. You may also be interested in the How to Analyze the Results of a Load Test article, which demonstrates the impact of network capacity on overall performance.
