How to resolve "Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly"? - performance

While doing API Performance Testing [e.g. No of Threads/Users : 50 with Loop Count : 10], 5 to 8 % samples with POST methods are failing due to below exception :
**Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.**
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.PumpAsync()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.GetReadAsyncResult()
at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
at Microsoft.AspNetCore.WebUtilities.FileBufferingReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
at Microsoft.AspNetCore.WebUtilities.StreamHelperExtensions.DrainAsync(Stream stream, ArrayPool`1 bytePool, Nullable`1 limit, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Mvc.Formatters.JsonInputFormatter.ReadRequestBodyAsync(InputFormatterContext context, Encoding encoding)
at Microsoft.AspNetCore.Mvc.ModelBinding.Binders.BodyModelBinder.BindModelAsync(ModelBindingContext bindingContext)
at Microsoft.AspNetCore.Mvc.ModelBinding.ParameterBinder.BindModelAsync(ActionContext actionContext, IModelBinder modelBinder, IValueProvider valueProvider, ParameterDescriptor parameter, ModelMetadata metadata, Object value)
Could you please suggest how i can fix this ?
NOTE :I correctly implemented async/await in POST method.This is
happening to POST method only and out of 500 samples only 20 to
40 samples are failing per POST method. Providing same inputs while
testing.

You could setKestrelServerLimits.MinRequestBodyDataRate property to null in your Program.cs as below:
.UseKestrel(opt =>
{
opt.Limits.MinRequestBodyDataRate = null;
});
Reference:
KestrelServerLimits.MinRequestBodyDataRate Property

Requests should be resolved fast, and this is a clear warning, that they are not. Exception is, that we are debugging/single-stepping the service, in which case I would prefer more tolerant request limits. In live environments, DDoS and even heavy load can render a system or service unstable very quickly, if limits are too high, because a growing request stack can overwhelm the system.
I made a cache service built on Kestrel and I found two scenarios which could cause this error:
The service received a massive amounts of items for a write operation on a quite busy DB table. This happened due to my own poor design. To resolve: either to optimize the update flow or create a per-task based update handler.
Service requests have stalled due to an instability, suspension or shut-down. (f.ex. TCP connections may not always shut down immediately using a static HttpClient, it sometimes fails to exit gracefully on shut-down)
Code below is from a NET 7.0 based kestrel service
// Limits can also be applied within appsettings.json,
appBuilder.WebHost.ConfigureKestrel(serverOptions =>
{
serverOptions.Limits.MaxConcurrentConnections = 100;
serverOptions.Limits.MaxConcurrentUpgradedConnections = 100;
#if !DEBUG
// Live: Demand higher request data-rate limits and low grace period, services can be unreliable.
// May cause requests to stall resulting in BadHttpRequestException.
// Catch them in your error handler
serverOptions.Limits.MinRequestBodyDataRate = new MinDataRate(
bytesPerSecond: 100,
gracePeriod: TimeSpan.FromSeconds(5)
);
#endif
});
Add error handler to app
// global error handler
app.UseMiddleware<ErrorHandlerMiddleware>();
Error handler:
public class ErrorHandlerMiddleware
{
private readonly RequestDelegate _next;
public ErrorHandlerMiddleware(RequestDelegate next)
{
_next = next;
}
public async Task Invoke(HttpContext context)
{
try
{
await _next(context);
}
#if !DEBUG
catch (Microsoft.AspNetCore.Connections.ConnectionResetException)
{
// Not of interest - client has ended the connection prematurely. Happens all the time.
return;
}
#endif
catch (BadHttpRequestException bad) when (bad.Message.Contains("data arriving too slowly"))
{
#if DEBUG
// The original exception lacks insight.
throw new BadHttpRequestException($"Request data arriving too slowly: {context.Request.Path}");
#endif
}
}
}

Related

Immediately return first emitted value from two Monos while continuing to process the other asynchronously

I have two data sources, each returning a Mono:
class CacheCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
class MasterCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
Callers to my application are hitting a Spring WebFlux controller:
#PostMapping
#ResponseStatus(HttpStatus.CREATED)
public Flux<Entity> createCustomer(#RequestBody Customer customer) {
return customerService.createNewCustomer(entity);
}
As long as either data source successfully completes its create operation, I want to immediately return a success response to the caller, however, I still want my service to continue processing the result of the other Mono stream, in the event that an error was encountered, so it can be logged.
The problem seems to be that as soon as a value is returned to the controller, a cancel signal is propagated back through the stream by Spring WebFlux and, thus, no information is logged about a failure.
Here's one attempt:
public Flux<Entity> createCustomer(final Customer customer) {
var cacheCreate = cacheClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in cache"));
var masterCreate = masterClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in master"));
return Flux.firstWithValue(cacheCreate, masterCreate)
.onErrorMap((err) -> new Exception("Customer creation failed in cache and master"));
}
Flux.firstWithValue() is great for emitting the first non-error value, but then whichever source is lagging behind is cancelled, meaning that any error is never logged out. I've also tried scheduling these two sources on their own Schedulers and that didn't seem to help either.
How can I perform these two calls asynchronously, and emit the first value to the caller, while continuing to listen for emissions on the slower source?
You can achieve that by transforming you operators to "hot" publishers using share() operator:
First subscriber launch the upstream operator, and additional subscribers get back result cached from the first subscriber:
Further Subscriber will share [...] the same result.
Once a second subscription has been done, the publisher is not cancellable:
It's worth noting this is an un-cancellable Subscription.
So, to achieve your requirement:
Apply share() on each of your operators
Launch a subscription on shared publishers to trigger processing
Use shared operators in your pipeline (here firstWithValue).
Sample example:
import java.time.Duration;
import reactor.core.publisher.Mono;
public class TestUncancellableMono {
// Mock a mono successing quickly
static Mono<String> quickSuccess() {
return Mono.delay(Duration.ofMillis(200)).thenReturn("SUCCESS !");
}
// Mock a mono taking more time and ending in error.
static Mono<String> longError() {
return Mono.delay(Duration.ofSeconds(1))
.<String>then(Mono.error(new Exception("ERROR !")))
.doOnCancel(() -> System.out.println("CANCELLED"))
.doOnError(err -> System.out.println(err.getMessage()));
}
public static void main(String[] args) throws Exception {
// Transform to hot publisher
var sharedQuick = quickSuccess().share();
var sharedLong = longError().share();
// Trigger launch
sharedQuick.subscribe();
sharedLong.subscribe();
// Subscribe back to get the cached result
Mono
.firstWithValue(sharedQuick, sharedLong)
.subscribe(System.out::println, err -> System.out.println(err.getMessage()));
// Wait for subscription to end.
Thread.sleep(2000);
}
}
The output of the sample is:
SUCCESS !
ERROR !
We can see that error message has been propagated properly, and that upstream publisher has not been cancelled.

MassTransit Mediator MessageNotConsumedException

I Noticed a weird issue in one of our applications, from time to time, we get MessageNotConsumedException errors on API requests which we route via MT's Mediator.
As you will notice below, we have configured a customer LogFilter<T> which implements IFilter<ConsumeContext<T>> which ensure that we log each mediator message before and after consuming, or a 'ConsumeFailed' log in case an exception is thrown in any consumer.
When the error manifests itself, in the logs we see the following sequence of events:
T 0 : PreConsume logged
T +5ms: PostConsume logged
T +6ms: R-FAULT logged (I believe this logging is made by MT's internals?)
T +9ms: API Request 500 response logged, with `MessageNotConsumedException` as internal error
In the production environment, we see these errors with various timings, it happens in requests taking as 'little' as 9ms, over several seconds up to 30+ seconds.
I've trying to reproduce this problem in my local development environment, and did manage to produce the same sequence of events, but only by adding a delay of 35 seconds inside the consumer (see GetSomethingById class below for consumer body)
If I reduce the delay to 30s or less, the reponse will be fine.
Since the production errors are happening with very low handling times in the consumer, I suspect what I'm able to reproduce is not exactly the same.
However I'd still like to understand why I'm getting the MessageNotConsumedException, since while debugging I can easily step through my entire consumer (after the delay has elapsed) and happily reach the context.RespondAsync() call without any problems. Also while stepping through the consumer, the context.CancellationToken has not been cancelled.
I also came across this question, which sounds exactly like what I'm having, however I did add the HttpContext scope as documented. To be fair, I didn't try this change in production yet, but my local issue with the 35s delay remains unchanged.
I have MassTransit medatior configured as follows:
services.AddHttpContextAccessor();
services.AddMediator(x =>
{
x.AddConsumer<GetSomethingByIdHandler>();
x.ConfigureMediator((context, cfg) =>
{
//The order of using the middleware matters, so don't change this
cfg.UseHttpContextScopeFilter(context); // Extension method & friends copy/pasted from https://masstransit-project.com/usage/mediator.html#http-context-scope
cfg.UseConsumeFilter(typeof(LogFilter<>), context);
});
});
The LogFilter which is configured is the following class:
public class LogFilter<T> : IFilter<ConsumeContext<T>> where T : class
{
private readonly ILogger<LogFilter<T>> _logger;
public LogFilter(ILogger<LogFilter<T>> logger)
{
_logger = logger;
}
public void Probe(ProbeContext context) => context.CreateScope("log-filter");
public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
{
LogPreConsume(context);
try
{
await next.Send(context);
}
catch (Exception exception)
{
LogConsumeException(context, exception);
throw;
}
LogPostConsume(context);
}
private void LogPreConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}",
typeof(T).Name,
"PreConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime());
private void LogPostConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}",
typeof(T).Name,
"PostConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime);
private void LogConsumeException(ConsumeContext<T> context, Exception exception) => _logger.LogError(exception,
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with sent time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}"
+ " and message {#message}",
typeof(T).Name,
"ConsumeFailure",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime,
context.Message);
}
I then have a controller method which looks like this:
[Route("[controller]")]
[ApiController]
public class SomethingController : ControllerBase
{
private readonly IMediator _mediator;
public SomethingController(IMediator mediator)
{
_mediator = mediator;
}
[HttpGet("{somethingId}")]
public async Task<IActionResult> GetSomething([FromRoute] int somethingId, CancellationToken ct)
{
var query = new GetSomethingByIdQuery(somethingId);
var response = await _mediator
.CreateRequestClient<GetSomethingByIdQuery>()
.GetResponse<Something>(query, ct);
return Ok(response.Message);
}
}
The consumer which handles this request is as follows:
public record GetSomethingByIdQuery(int SomethingId);
public class GetSomethingByIdHandler : IConsumer<GetSomethingByIdQuery>
{
public async Task Consume(ConsumeContext<GetSomethingByIdQuery> context)
{
await Task.Delay(35000, context.CancellationToken);
await context.RespondAsync(new Something{Name = "Something cool"});
}
}
MessageNotConsumedException is thrown when a message is sent using mediator and that message is not consumed by a consumer. That wouldn't typically be a transient error since one would expect that the consumer remains configured/connected to the mediator for the lifetime of the application.

Stop consumption of message if it cannot be completed

I'm new to mass transit and have a question regarding how I should solve a failure to consume a message. Given the below code I am consuming INotificationRequestContract's. As you can see the code will break and not complete.
public class NotificationConsumerWorker : IConsumer<INotificationRequestContract>
{
private readonly ILogger<NotificationConsumerWorker> _logger;
private readonly INotificationCreator _notificationCreator;
public NotificationConsumerWorker(ILogger<NotificationConsumerWorker> logger, INotificationCreator notificationCreator)
{
_logger = logger;
_notificationCreator = notificationCreator;
}
public Task Consume(ConsumeContext<INotificationRequestContract> context)
{
try
{
throw new Exception("Horrible error");
}
catch (Exception e)
{
// >>>>> insert code here to put message back for later consumption. <<<<<
_logger.LogError(e, "Failed to consume message");
throw;
}
}
}
How do I best handle a scenario such as this where the consumption fails? In my specific case this is likely to occur if a required external service is unavailable.
I can see two solutions.
If there is a way to put the message back, or cancel the consumption so that it will be tried again.
I could store it locally in a database and create my own re-try method to wrap this (but would prefer not to for sake of simplicity).
The exceptions section of the documentation provides sufficient guidance for dealing with consumer exceptions.
There are two retry approaches, which can be used in combination:
Message Retry, which waits while the message is locked, in-process, for the next retry. Therefore, these should be short, to deal with transient issues.
Message Redelivery, which delays the message using either the broker delayed delivery, or a message scheduler, so that it is redelivered to the receive endpoint at some point in the future.
Once all retry/redelivery attempts are exhausted, the message is moved to the _error queue.

What is the purpose of passing CancellationToken to Task.Factory.StartNew()

In the following code a CancellationToken is passed to the .StartNew(,) method as the 2nd parameter, but is only usable by the Action via the closure in the lambda. So, what is the purpose of passing the token through the .StartNew(,) method's 2nd parameter?
var cts = new CancellationTokenSource();
var token = cts.Token;
Task.Factory.StartNew(() =>
{
while (true)
{
// simulate doing something useful
Thread.Sleep(100);
}
}, token);
StartNew method schedules a task in the tread pool, but not necessary start it right at the moment, because threads may be unavailable. During waiting for the start the cancellation request may occur, after which the thread pool wouldn't start a task at all. After task was started it's your job to handle cancellation of the task.
Actually, the purpose of the CancellationToken passed to Task.Run and Taskfactory.StartNew is to allow the task to differentiate being cancelled by an exception thrown from CancellationToken.ThrowIfCancellationRequested and failing because of any other exception.
That is, if the CancellationToken passed at start throws, the task's state is Cancelled while any other exception (even from another CancellationToken) will set it to Faulted.
Also, if the CancellationToken is cancelled before the task actually starts, it won't be started at all.

Should I use ContinueWith() after ReadAsAsync in DelegatingHandler

Say I have a DelegatingHandler that I am using to log api requests. I want to access the request and perhaps response content in order to save to a db.
I can access directly using:
var requestBody = request.Content.ReadAsStringAsync().Result;
which I see in a lot of examples. It is also suggested to use it here by somebody that appears to know what they are talking about. Incidentally, the suggestion was made because the poster was originally using ContinueWith but was getting intermittent issues.
In other places, here, the author explicitly says not to do this as it can cause deadlocks and recommends using ContinueWith instead. This information comes directly from the ASP.net team apparently.
So I am a little confused. The 2 scenarios looks very similar in my eyes so appear to be conflicting.
Which should I be using?
You should use await.
One of the problems with Result is that it can cause deadlocks, as I describe on my blog.
The problem with ConfigureAwait is that by default it will execute the continuation on the thread pool, outside of the HTTP request context.
You can get working solutions with either of these approaches (though as Youssef points out, Result will still have sub-optimal performance), but why bother? await does it all for you: no deadlocks, optimal threading, and resuming within the HTTP request context.
var requestBody = await request.Content.ReadAsStringAsync();
Edit for .NET 4.0: First, I strongly recommend upgrading to .NET 4.5. The ASP.NET runtime was enhanced in .NET 4.5 to properly handle Task-based async operations. So the code below may or may not work if you install WebAPI into a .NET 4.0 project.
That said, if you want to try properly using the old-school ContinueWith, something like this should work:
protected override Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request,
CancellationToken cancellationToken)
{
var context = TaskScheduler.FromCurrentSynchronizationContext();
var tcs = new TaskCompletionSource<HttpResponseMessage>();
HttpResponseMessage ret;
try
{
... // logic before you need the context
}
catch (Exception ex)
{
tcs.TrySetException(ex);
return tcs.Task;
}
request.Content.ReadAsStringAsync().ContinueWith(t =>
{
if (t.Exception != null)
{
tcs.TrySetException(t.Exception.InnerException);
return;
}
var content = t.Result;
try
{
... // logic after you have the context
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
tcs.TrySetResult(ret);
}, context);
return tcs.Task;
}
And now it becomes clear why await is so much better...
Calling .Result synchronously blocks the thread until the task completes. This is not a good thing because the thread is just spinning waiting for the async operation to complete.
You should prefer using ContinueWith or even better, async and await if you're using .NET 4.5. Here's a good resource for you to learn more about that:
http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx

Resources