How to avoid saturation in Akka HTTP (latency spikes)?

I have an akka-http (Scala) API server that serves data to a NodeJS server.
Right after startup everything works fine and latency is low. But suddenly latency increases sharply, the API stops responding, and the website becomes unusable.
The strange thing is that traffic and the request count remain stable; the latency spikes seem uncorrelated with them.
I suspect this saturation comes from all the threads in the Akka thread pool being blocked. Unfortunately, my Akka dispatcher is blocking, because I'm doing a lot of SQL queries (against MySQL) and because I'm not using a reactive library. I'm using Slick 2, which, unlike Slick 3, is blocking-only.
Here's my dispatcher:
http-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 46
  }
  throughput = 1
}
So, my question is: how do I avoid this sort of saturation? How do I keep latency proportional to traffic? Is there a way to evict the requests that cause the saturation, so they don't compromise everything else?
Thank you!

You should not use Akka's own thread pool for running long blocking tasks. Create your own thread pool and run your Slick queries on it, leaving Akka's threads free, as sketched below. That covers your first two questions.
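A minimal sketch of that idea, assuming plain Scala futures around a blocking Slick 2 call (loadUser, the route usage and the pool size of 46 are placeholders, not from the original question): run the blocking work on a dedicated ExecutionContext so Akka's default dispatcher stays free.

import java.util.concurrent.Executors
import scala.concurrent.{ ExecutionContext, Future }

object Db {
  // Dedicated pool for blocking JDBC/Slick work; size it to what MySQL can actually handle.
  private val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(46))

  // Run any blocking Slick 2 call on the dedicated pool and get a Future back.
  def blockingQuery[T](block: => T): Future[T] =
    Future(block)(blockingEc)
}

// In a route, complete with the Future so the route itself never blocks:
// complete(Db.blockingQuery(loadUser(id)))  // loadUser is a hypothetical query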
I don't know of a good answer for the last one. You could look into Slick settings that put a timeout on SQL queries, but I don't know whether such a setting exists. Otherwise, try to analyse why your queries take so much time; could you be missing an index or two?
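If Slick itself offers no timeout, one hedged workaround at the Future level (a general pattern, not a Slick or akka-http feature) is to race the query against akka.pattern.after and fail the caller when the database is too slow. The 5-second limit below is illustrative; note that the blocking thread keeps running until the query actually finishes, so this sheds latency for callers but does not free the pool.

import java.util.concurrent.TimeoutException
import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.Future
import scala.concurrent.duration._

def withTimeout[T](work: Future[T], limit: FiniteDuration)(implicit system: ActorSystem): Future[T] = {
  import system.dispatcher
  // Completes with a failure after `limit` unless `work` finishes first.
  val timeout = after(limit, system.scheduler)(Future.failed(new TimeoutException(s"query exceeded $limit")))
  Future.firstCompletedOf(Seq(work, timeout))
}

// withTimeout(Db.blockingQuery(loadUser(id)), 5.seconds)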

Related

How to implement a cache in a Vertx application?

I have an application that at some point has to perform REST requests towards another (non-reactive) system. It happens that a high number of requests are performed towards exactly the same remote resource (the resulting HTTP request is the same).
I was thinking to avoid flooding the other system by using a simple cache in my app.
I am in full control of the cache and I have proper moments when to invalidate it, so this is not an issue. Without this cache I run into other issues, like connection timeouts or read timeouts, because the other system struggles under the high load.
Map<String, Future<Element>> cache = new ConcurrentHashMap<>();

Future<Element> lookupElement(String id) {
    String key = createKey(id);
    return cache.computeIfAbsent(key, k -> performRESTRequest(id))
                .onSuccess(element -> {
                    // some further processing
                });
}
As I mentioned, lookupElement() is invoked from different worker threads with the same id.
The first thread will enter computeIfAbsent and perform the remote request, while the other threads will be blocked by the ConcurrentHashMap.
However, when the first thread finishes, the waiting threads will receive the same Future object. Imagine 30 "clients" reacting to the same Future instance.
In my case this works quite fine and fast up to a particular load, but when the processing input of the app increases, resulting in even more invocations of lookupElement(), my app becomes slower and slower (although it reports 300% CPU usage, it logs slowly) until it starts to report OutOfMemoryException.
My questions are:
Do you see any Vertx specific issue with this approach?
Is there a more Vertx friendly caching approach I could use when there is a high concurrency on the same cache key?
Is it a good practice to cache the Future?
It's a bit unusual to answer my own question, but I managed to solve the problem.
I was having two dilemmas:
Are ConcurrentHashMap and computeIfAbsent() appropriate for Vertx?
Is it safe to cache a Future object?
I am using this caching approach in two places in my app, one for protecting the REST endpoint, and one for some more complex database query.
What was happening is that for the database query there were up to 1300 "clients" waiting for a response, or in other words 1300 listeners waiting for onSuccess() of the same Future. When that Future completed, strange things were happening, some kind of thread strangulation.
I did a bit of refactoring to eliminate this concurrency on the same resource/key, but I kept both caches, and things went back to normal.
In conclusion, I think my caching approach is safe as long as there is enough spreading, or in other words, as long as there isn't such high concurrency on the same resource. Having 20-30 listeners on the same Future works just fine.

How to use Pomelo.EntityFrameworkCore.MySql provider for ef core 3 in async mode properly?

We are building an asp.net core 3 application which uses ef core 3.0 with Pomelo.EntityFrameworkCore.MySql provider 3.0.
Right now we are trying to convert all database calls from sync to async, like:
//from
dbContext.SaveChanges();
//to
await dbContext.SaveChangesAsync();
Unfortunately, when we do this we experience two issues:
The number of connections to the server grows significantly compared to the same tests with sync calls
The average processing speed of our application drops significantly
What is the recommended way to use EF Core with MySQL asynchronously? Any working example or evidence of using EF Core 3 with MySQL asynchronously would be appreciated.
It's hard to say what the issue here is without seeing more code. Can you provide us with a small sample app that reproduces the issue?
Any DbContext instance uses exactly one database connection for normal operations, independent of whether you call sync or async methods.
Number of connections to the server grows significantly compared to the same tests for sync calls
What kind of tests are we talking about? Are they automated? If so, how many tests are being run? Because of the nature of async calls, if you run 1000 tests in parallel, every test with its own DbContext, you will end up with 1000 parallel connections.
Though with Pomelo, you will not end up additionally with 1000 threads, as you would with using Oracle's provider.
Update:
We test an asp.net core (MVC) call which goes to the db and reads and writes something: 50 threads, using DbContextPool with a limit of 500. If I use dbContext.SaveChanges(), Add(), and all the other context methods synchronously, I end up with around 50 connections to MySQL. Using dbContext.SaveChangesAsync(), AddAsync(), ReadAsync(), etc., I end up seeing up to 250 connections to MySQL, and the average response time of the page drops by a factor of 2 to 3.
(I am talking about ASP.NET and requests below. The same is true for parallel run test cases.)
If you use Async methods all the way, nothing will block, so your 50 threads are free to handle the next 50 requests while the database is still executing the queries for the first 50 requests.
This will happen again and again because ASP.NET might process your requests faster than your database can return its results. So you will end up with a lot of parallel database queries.
This does not happen when executing the Sync methods, because every thread blocks and you end up with a maximum of 50 parallel queries (one per thread).
So this is expected behavior and just a consequence of async method calls.
You can always modify your code or web server configuration to limit the amount of concurrent ASP.NET requests.
50 threads, using DbContextPool with limit 500.
Also be aware that DbContextPool does not limit how many DbContext objects can concurrently exist, but only how many will be kept in the pool. So if you set DbContextPool to 500, you can create more than 500 contexts, but only 500 will be kept alive after using them.
Update:
There is a very interesting low-level talk about lock-free database pool programming from #roji that addresses this behavior, takes your position that there should be an upper limit in the connection pool which results in blocking when exceeded, and makes a great case for this behavior.
According to #bgrainger from MySqlConnector, that is how it is already implemented (the docs did not explicitly state this, but they do now). The MaxPoolSize connection string option has a default value of 100, so if you use connection pooling and if you don't overwrite this value and if you don't use multiple connection pools, you should not have more than 100 connections active at a given time.
From GitHub:
This is a documentation error, if you are interpreting the docs to mean that you can create an unlimited number of connections.
When pooling is true, each connection pool (there is one per unique connection string) only allows MaximumPoolSize connections to be open simultaneously. Each additional call to MySqlConnection.Open will block until a connection is returned to the pool.
When pooling is false, there is no limit to the number of connections that can be opened simultaneously; it's up to the user to manage the concurrency.
Check to see whether you have Pooling=false in your connection string, as mentioned by Bradley Grainger in comments.
After I removed pooling=false from my connection string, my app ran literally 3x faster.
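For reference, a hedged example of a connection string with pooling left on and an explicit cap (server, database and credentials are placeholders, and the option names follow MySqlConnector's documented aliases, so double-check them against the docs for your version):

Server=db.example.com;Database=app;Uid=app_user;Pwd=secret;Pooling=true;Maximum Pool Size=100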

Hard limit connections Spring Boot

I'm working on a simple micro service written in Spring Boot. This service will act as a proxy towards another resource that has a hard concurrent connection limit, and the requests take a while to process.
I would like to impose a hard limit on the concurrent connections allowed to my micro service and reject anything beyond it with either a 503 or at the TCP/IP level. I've tried to look into the different configurations available for Jetty/Tomcat/Undertow but haven't found anything completely convincing yet.
I found some settings regulating thread pools:
server.tomcat.max-threads=0 # Maximum amount of worker threads.
server.undertow.io-threads= # Number of I/O threads to create for the worker.
server.undertow.worker-threads= # Number of worker threads.
server.jetty.acceptors= # Number of acceptor threads to use.
server.jetty.selectors= # Number of selector threads to use.
But if I understand correctly, these all configure thread pool sizes and will just result in connections being queued at some level.
This seems really interesting, but it hasn't been merged yet and is targeted for Spring Boot 1.5: https://github.com/spring-projects/spring-boot/pull/6571
Am I out of luck using a setting for now? I could of course implement a filter, but I would rather block at an earlier level and not reinvent the wheel. I guess putting Apache or something else in front is also an option, but that still feels like overkill.
Take a look at EmbeddedServletContainerCustomizer; this gist could give you an idea of how to do that.
// Cap the number of connections Tomcat will accept concurrently.
TomcatEmbeddedServletContainerFactory factory = ...;
factory.addConnectorCustomizers(connector ->
        ((AbstractProtocol) connector.getProtocolHandler()).setMaxConnections(10000));

Golang app-engine performance parameters

Using stock out-of-the-box configuration on a golang app-engine project, I am getting very disappointing performance. Any hints on what I might be missing? How should a golang google app be optimized?
Sending a few dozen requests, not more than six concurrently, I find only one instance handling all the requests, up to six requests concurrently (not sequentially) on that one instance - where I expected to see up to six instances. Possibly as a result, things seem to be blocking. I am seeing many timeouts, even on administrative functions like blobstore.Create(), which didn't happen when requests were being sent and processed individually.
EDIT1: These three lines
context.Infof("Sending request to blobstore to create %s as %s", Name, MimeType)
blobWriter, err := blobstore.Create(context, MimeType)
if err!=nil {
context.Warningf("Unable to access content store: %v",err)
}
are producing:
I 12:47:36.201 Sending request to blobstore to create download.jpg as application/octet-stream
W 12:47:41.251 Unable to access content store: Canceled: Deadline exceeded (timeout)
On failure here it is always about five seconds in blobstore.Create (a few milliseconds when it passes). Timeouts also occur in blobstore.Write and blobstore.Close and datastore, but with 20 to 30 second delays.
--End EDIT1.
There also seem to be performance issues. There is one computationally intensive bit, taking nearly a second to complete on my home machine (at 1.7GHz). According to the logged time stamps, that same code running on the remote app-engine (at 600MHz) is taking over 30 seconds on average, with a maximum of 109 seconds. That doesn't seem right!
EDIT2: The most computationally intensive bit used the resize function:
https://code.google.com/p/appengine-go/source/browse/example/moustachio/resize/resize.go
(with the obvious bug fixes). Not the most efficient resizer, but fast enough for now in a stand-alone app. However, it runs an order of magnitude slower in appengine (either the local SDK version 1.9 or running on Google's servers). Perhaps Google's version of the image library is slower? It's probably the library: a recursive Fibonacci computation runs inside appengine in the same time as outside (same order of magnitude as C code).
--- End EDIT2
Any hints on how to get google app performance more similar to a multi-threaded stand-alone application? So far these preliminary scaling experiments have been a miserable failure!
UPDATE: Using runtime.GOMAXPROCS(6), for a maximum of 6 concurrent requests, made no measurable difference. Using "manual_scaling" with more instances than requests was helpful; requests usually get assigned to different instances, but sometimes not, which leads to problems.
A partial solution: Segregate computationally intensive requests into a separate module, running on separate instances, so that they do not block smaller, more time-sensitive requests. Next, break down larger functions into several smaller requests, so that several can run "concurrently" on the same instance without timing out? (Make the client send several requests to do one job!)
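A hedged sketch of what such a module's configuration might look like with the legacy appengine-go modules (the module name and instance count are illustrative, and the application/version fields and handlers are omitted; check the modules documentation for the exact fields your SDK version expects):

module: imageproc
runtime: go
api_version: go1
manual_scaling:
  instances: 6

With manual_scaling the instances are always up, so heavy requests are spread across them instead of queuing on a single automatically scaled instance.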
It would be much better if I could ask the appengine just to start new instances for each request when none are available. Experimentally, starting a new instance is much cheaper than running two requests in slow motion on one instance.

Big difference in throughput when using MSMQ as subscription storage for Nservicebus as opposed to RavenDB

Recently we noticed that our Nservicebus subscribers were not able to handle the increasing load. We have a fairly constant input stream of events (measurement data from embedded devices), so it is very important that the throughput follows the input.
After some profiling, we concluded that it was not the handling of the events that was taking a lot of time, but rather the NServiceBus process of retrieving and publishing events. To try to get a better idea of what goes on, I recreated the Pub/Sub sample (http://particular.net/articles/nservicebus-step-by-step-publish-subscribe-communication-code-first).
On my laptop, using all the NServiceBus defaults, the maximum throughput of the Ordering.Server is about 10 events/second. The only thing it does is
class PlaceOrderHandler : IHandleMessages<PlaceOrder>
{
    public IBus Bus { get; set; }

    public void Handle(PlaceOrder message)
    {
        Bus.Publish<OrderPlaced>(
            e => { e.Id = message.Id; e.Product = message.Product; });
    }
}
I then started to play around with configuration settings. None seemed to have any impact on this (very low) performance, except for switching the subscription storage to MSMQ:
Configure.With()
    .DefaultBuilder()
    .UseTransport<Msmq>()
    .MsmqSubscriptionStorage();
With this configuration, the throughput instantly went up to 60 messages/sec.
I have two questions:
When using MSMQ as subscription storage, the performance is much better than with RavenDB. Why does something as trivial as the storage for subscription data have such an impact?
I would have expected a much higher performance. Are there any other configuration settings that I should use to get at least one order of magnitude better than this? On our servers, the maximum throughput when running this sample is about 200 msg/s. This is far from spectacular for a system that doesn't even do anything useful yet.
MSMQ doesn't have native pub/sub capabilities, so NServiceBus adds support for this by storing the list of subscribers and then looping over that list, sending a copy of the event to each subscriber. This translates to X message queuing operations, where X is the number of subscribers. This also explains why RabbitMQ is faster, since it has native pub/sub, so you would only need one operation against the broker.
The reason the storage based on an MSMQ queue is faster is that it's a local storage (it can't be used if you need to scale out the endpoint), which means we can cache the data, since there can't be any other endpoint instances updating the storage. In short, this means we get away with an in-memory lookup, which, as you can see, is the fastest option.
There are plans to add native caching across all storages:
https://github.com/Particular/NServiceBus/issues/1320
200 msg/s sounds quite low; what number do you get if you skip the bus.Publish? (Just to get a baseline.)
Possibility 1: distributed transactions
Distributed transactions are created when processing messages because of the queue-plus-database combination.
Try measuring without transactional handling of the messages. How does that compare?
Possibility 2: msmq might not be the best queueing system for your needs
Have you ever considered switching to RabbitMQ for the transport? I have very good experience with RabbitMQ in combination with MassTransit; it far exceeds the numbers you mention in your question.
