How to implement a cache in a Vertx application? - caching

I have an application that at some point has to perform REST requests towards another (non-reactive) system. It happens that a high number of requests are performed towards exactly the same remote resource (the resulting HTTP request is the same).
I was thinking to avoid flooding the other system by using a simple cache in my app.
I am in full control of the cache and I have proper moments when to invalidate it, so this is not an issue. Without this cache, I'm running into other issues, like connection timeout or read timeout, the other system having troubles with high load.
Map<String, Future<Element>> cache = new ConcurrentHashMap<>();
Future<Element> lookupElement(String id) {
String key = createKey(id);
return cache.computeIfAbsent(key, key -> {
return performRESTRequest(id);
}.onSucces( element -> {
// some further processing
}
}
As I mentioned lookupElement() is invoked from different worker threads with same id.
The first thread will enter in the computeIfAbsent and perform the remote quest while the other threads will be blocked by ConcurrentHashMap.
However, when the first thread finishes, the waiting threads will receive the same Future object. Imagine 30 "clients" reacting to the same Future instance.
In my case this works quite fine and fast up to a particular load, but when the processing input of the app increases, resulting in even more invocations to lookupElement(), my app becomes slower and slower (although it reports 300% CPU usage, it logs slowly) till it starts to report OutOfMemoryException.
My questions are:
Do you see any Vertx specific issue with this approach?
Is there a more Vertx friendly caching approach I could use when there is a high concurrency on the same cache key?
Is it a good practice to cache the Future?

So, a bit unusual to respond to my own question, but I managed to solve the problem.
I was having two dilemmas:
Is ConcurentHashMap and computeIfAbsent() appropriate for Vertx?
Is it safe to cache a Future object?
I am using this caching approach in two places in my app, one for protecting the REST endpoint, and one for some more complex database query.
What was happening is that for the database query there was up to 1300 "clients" waiting for a response. Or 1300 listeners waiting for an onSuccess() of the same Future. When the Future was emitting strange things were happening. Some kind of thread strangulation.
I did a bit of refactoring eliminating this concurrency on the same resource/key, but I did kept both caches and things went back to normal.
In conclusion I think my caching approach is safe as long as we have enough spreading or in other words, we don't have such a high concurrency on the same resource. Having 20-30 listeners on the same Future works just fine.

Related

Task.WhenAll with Select is a footgun - but why?

Consider: you have a collection of user ids and want to load the details of each user represented by their id from an API. You want to bag up all of those users into some kind of collection and send it back to the calling code. And you want to use LINQ.
Something like this:
var userTasks = userIds.Select(userId => GetUserDetailsAsync(userId));
var users = await Task.WhenAll(tasks); // users is User[]
This was fine for my app when I had relatively few users. But, there came a point where it didn't scale. When it got to the point of thousands of users, this resulted in thousands of HTTP requests being fired concurrently and bad things started to happen. Not only did we realise we were launching a denial of service attack on the API we were consuming as did this, we were also bringing our own application to the point of collapse through thread starvation.
Not a proud day.
Once we realised that the cause of our woes was a Task.WhenAll / Select combo, we were able to move away from that pattern. But my question is this:
What is going wrong here?
As I read around on the topic, this scenario seems well described by #6 on Mark Heath's list of Async antipatterns: "Excessive parallelization":
Now, this does "work", but what if there were 10,000 orders? We've flooded the thread pool with thousands of tasks, potentially preventing other useful work from completing. If ProcessOrderAsync makes downstream calls to another service like a database or a microservice, we'll potentially overload that with too high a volume of calls.
Is this actually the reason? I ask as my understanding of async / await becomes less clear the more I read about the topic. It's very clear from many pieces that "threads are not tasks". Which is cool, but my code appears to be exhausting the number of threads that ASP.NET Core can handle.
So is that what it is? Is my Task.WhenAll and Select combo exhausting the thread pool or similar? Or is there another explanation for this that I'm not aware of?
Update:
I turned this question into a blog post with a little more detail / waffle. You can find it here: https://blog.johnnyreilly.com/2020/06/taskwhenall-select-is-footgun.html
N+1 Problem
Putting threads, tasks, async, parallelism to one side, what you describe is an N+1 problem, which is something to avoid for exactly what happened to you. It's all well and good when N (your user count) is small, but it grinds to a halt as the users grow.
You may want to find a different solution. Do you have to do this operation for all users? If so, then maybe switch to a background process and fan-out for each user.
Back to the footgun (I had to look that up BTW 🙂).
Tasks are a promise, similar to JavaScript. In .NET they may complete on a separate thread - usually a thread from the thread pool.
In .NET Core, they usually do complete on a separate thread if not complete and the point of awaiting, for an HTTP request that is almost certain to be the case.
You may have exhausted the thread pool, but since you're making HTTP requests, I suspect you've exhausted the number of concurrent outbound HTTP requests instead. "The default connection limit is 10 for ASP.NET hosted applications and 2 for all others." See the documentation here.
Is there a way to achieve some parallelism and not take exhaust a resource (threads or http connections)? - Yes.
Here's a pattern I often implement for just this reason, using Batch() from morelinq.
IEnumerable<User> users = Enumerable.Empty<User>();
IEnumerable<IEnumerable<string>> batches = userIds.Batch(10);
foreach (IEnumerable<string> batch in batches)
{
Task<User> batchTasks = batch.Select(userId => GetUserDetailsAsync(userId));
User[] batchUsers = await Task.WhenAll(batchTasks);
users = users.Concat(batchUsers);
}
You still get ten asynchronous HTTP requests to GetUserDetailsAsync(), and you don't exhaust threads or concurrent HTTP requests (or at least max out with the 10).
Now if this is a heavily used operation or the server with GetUserDetailsAsync() is heavily used elsewhere in the app, you may hit the same limits when your system is under load, so this batching is not always a good idea. YMMV.
You already have an excellent answer here, but just to chime in:
There's no problem with creating thousands of tasks. They're not threads.
The core problem is that you're hitting the API way too much. So the best solutions are going to change how you call that API:
Do you really need user details for thousands of users, all at once? If this is for a dashboard display, then change your API to enforce paging; if this is for a batch process, then see if you can access the data directly from the batch process.
Use a batch route for that API if it supports one.
Use caching if possible.
Finally, if none of the above are possible, look into throttling the API calls.
The standard pattern for asynchronous throttling is to use SemaphoreSlim, which looks like this:
using var throttler = new SemaphoreSlim(10);
var userTasks = userIds.Select(async userId =>
{
await throttler.WaitAsync();
try { await GetUserDetailsAsync(userId); }
finally { throttler.Release(); }
});
var users = await Task.WhenAll(tasks); // users is User[]
Again, this kind of throttling is best only if you can't make the design changes to avoid thousands of API calls in the first place.
While there is no thread waiting for async operation if the async operation is pure, there is a thread for continuation, so assuming that your GetUserDetailsAsync will await for some IO-bound operation the continuation (parsing output, returning result ...) will need to run on some thread so your Task.Result which was created by GetUserDetailsAsync can be set, so every one of them will wait for a thread from thread pool to finish.

Cache invalidation algorithm

I'm thinking about caching dynamic content in web server. My goal is to bridge the whole processing by returning a cached HTTP response without bothering the DB (or Hibernate). This question is not about choosing between existing caching solutions; my current concern is the invalidation.
I'm sure, a time-based invalidation makes no sense at all: Whenever a user changes anything, they expect to see to the effect immediately rather than in a few seconds or even minutes. And caching for a fraction of a second is useless as there are no repeated requests for the same data in such a short period (since most of the data is user-specific).
For every data change, I get an event and can use it to invalidate everything depending on the changed data. As request happen concurrently, there are two time-related problems:
Invalidation may come too late and stale data may be served even to the client who changed them.
After the invalidation has finished, a long running request may finish and its stale data may get put into the cache.
The two problems are sort of opposite to each other. I guess, the former is easily solved by partially serializing requests from the same client using a ReadWriteLock per client. So let's forget it.
The latter is more serious as it basically means a lost invalidation and serving the stale data forever (or too long).
I can imagine a solution like repeating the invalidation after every request having started before the change happened, but this sounds rather complicated and time-consuming. I wonder if any existing caches do support this, but I'm mainly interested in how this gets done in general.
Clarification
The problem is a simple race condition:
Request A executes a query and fetches the result
Request B does some changes
The invalidation due to B happens
Request A (which was delayed for whatever reason) finishes
The obsolete response by request A gets written into the cache
To solve the race condition, add a timestamp (or a counter) and check this timestamp when setting a new cache entry.
This ensures that obsolete response will not be cached.
Here's a pseudocode:
//set new cache entry if resourceId is not cached
//or if existing entry is stale
function setCache(resourceId, requestTimestamp, responseData) {
if (cache[resourceId]) {
if (cache[resourceId].timestamp > requestTimestamp) {
//existing entry is newer
return;
} else
if (cache[resourceId].timestamp = requestTimestamp) {
//ensure invalidation
responseData = null;
}
}
cache[resourceId] = {
timestamp: requestTimestamp,
response: responseData
};
}
Let's say we got 2 requests for the same resource "foo":
Request A (received at 00:00:00.000) executes a query and fetches the result
Request B (received at 00:00:00.001) does some changes
The invalidation due to B happens by calling setCache("foo", "00:00:00.001", null)
Request A finishes
Request A calls setCache("foo", "00:00:00.000", ...) to write the obsolete response to cache but fails because the existing entry is newer
This is just the basic mechanism, so there is room for improvements.
I think you don't realize (or don't want to explicitly call out) that you are asking about a choice between cache synchronization strategies. There are several well known strategies: "cache aside", "read through", "write through", and "write behind". e.g. read here: A beginner’s guide to Cache synchronization strategies. They offer various levels of cache consistency (invalidation as you call it).
Your choice should depend on your needs and requirements.
It sounds like so far you've chosen "write behind" strategy (queue or defer cache invalidation). But from your concerns it sounds like you've chosen it incorrectly, because you are worried about inconsistent cache reads.
So, you should consider using "cache aside" or "read/write through" strategies, because those offer better cache consistency. They all are different flavors of the same thing - always keep cache consistent. If you don't care about cache consistency, then ok, stay with "write behind", but then this question becomes irrelevant.
Architecture wide, I would never go with raising events to invalidate the cache, because it seems like you've made it part of your business logic, while it's just an infrastructure concern. Invalidate (or queue invalidation of) cache as part of read/write operations, and not somewhere else. That allows cache to become just one aspect of your infrastructure, and not part of everything else.

How to avoid saturation in Akka HTTP (latency spikes)?

I have a akka-http (Scala) API server that serves data to a NodeJS server.
In moments after startup, everything works fine, everything is fast. Latency is low. But suddenly, latency increases fastly. The API no longer responds, and the website becomes unusable.
The strange thing is that the traffic and the requests count remain stable. Latency spikes seem decorrelated from them.
I guess this saturation is achieved with the blocking of all the threads in akka thread pool. Unfortunately, my Akka dispatcher is blocking, because I'm doing a lot of SQL queries (in MySQL) and because I'm not using a reactive library. I'm using Slick 2, which is, contrary to Slick 3, blocking-only.
Here's my dispatcher :
http-blocking-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
fixed-pool-size = 46
}
throughput = 1
}
So, my question is, how to avoid this sort of bottling ? How to keep a latency proportional with the traffic ? Is there a way to evict the requests that cause the saturation in order to prevent them to compromise everything ?
Thank you !
You should not use Akka's own thread pool for running long blocking tasks. Create your own thread pool, and run your slick queries using it, leaving free threads for akka. That's for your first 2 questions.
I don't know of any good answer for the last one. You could maybe look into specific slick settings to set a timeout on sql queries, but I don't know if such things exist. Otherwise try to analyse why your queries takes so much time, could you be missing an index or two?

Golang app-engine performance parameters

Using stock out-of-the-box configuration on a golang app-engine project, I am getting very disappointing performance. Any hints on what I might be missing? How should a golang google app be optimized?
Sending a few dozen requests, not more than six concurrently, I find only one instance handling all the requests, up to six requests concurrently (not sequentially) on that one instance - where I expected to see up to six instances. Possibly as a result, things seem to be blocking. I am seeing many timeouts, even on administrative functions like blobstore.Create(), which didn't happen when requests were being sent and processed individually.
EDIT1: These three lines
context.Infof("Sending request to blobstore to create %s as %s", Name, MimeType)
blobWriter, err := blobstore.Create(context, MimeType)
if err!=nil {
context.Warningf("Unable to access content store: %v",err)
}
are producing:
I 12:47:36.201 Sending request to blobstore to create download.jpg as application/octet-stream
W 12:47:41.251 Unable to access content store: Canceled: Deadline exceeded (timeout)
On failure here it is always about five seconds in blobstore.Create (a few milliseconds when it passes). Timeouts also occur in blobstore.Write and blobstore.Close and datastore, but with 20 to 30 second delays.
--End EDIT1.
There also seem to be performance issues. There is one computationally intensive bit, taking nearly a second to complete on my home machine (at 1.7GHz). According to the logged time stamps, that same code running on the remote app-engine (at 600MHz) is taking over 30 seconds on average, with a maximum of 109 seconds. That doesn't seem right!
EDIT2: The most computationally intensive bit used the resize function:
https://code.google.com/p/appengine-go/source/browse/example/moustachio/resize/resize.go
(with the obvious bug fixes). Not the most efficient resizer, but fast enough for now in a stand-along app. However it runs an order of magnitude slower in appengine (either the local SDK version 1.9 or running on Google's servers). Perhaps Google's version of the image library is slower? Probably the library? - A recursive fibonacci computation runs inside appengine in the same time as outside (same order of magnitude as C code).
--- End EDIT2
Any hints on how to get google app performance more similar to a multi-threaded stand-along application? So far these preliminary scaling experiments have been a miserable failure!
UPDATE: Using runtime.GOMAXPROCS(6), for a maximum of 6 concurrent requests, made no measurable difference. When using "manual_scaling" with more instances that requests was helpful, requests usually get assigned to different instances, but sometimes not - leading to problems.
A partial solution: Segregate computationally intensive requests on a separate module, running on separate instances, so that they do not block smaller more time-sensitive requests. Next, break down larger functions into several smaller requests, so that several can run "concurrently" on the same instance without timing out? (Make the client send several requests to do one job!)
It would be much better if I could ask the appengine just to start new instances for each request when none are available. Experimentally, starting a new instance is much cheaper than running two requests in slow motion on one instance.

Dealing with concurrency issues when caching for high-traffic sites

I was asked this question in an interview:
For a high traffic website, there is a method (say getItems()) that gets called frequently. To prevent going to the DB each time, the result is cached. However, thousands of users may be trying to access the cache at the same time, and so locking the resource would not be a good idea, because if the cache has expired, the call is made to the DB, and all the users would have to wait for the DB to respond. What would be a good strategy to deal with this situation so that users don't have to wait?
I figure this is a pretty common scenario for most high-traffic sites these days, but I don't have the experience dealing with these problems--I have experience working with millions of records, but not millions of users.
How can I go about learning the basics used by high-traffic sites so that I can be more confident in future interviews? Normally I would start a side project to learn some new technology, but it's not possible to build out a high-traffic site on the side :)
The problem you were asked on the interview is the so-called Cache miss-storm - a scenario in which a lot of users trigger regeneration of the cache, hitting in this way the DB.
To prevent this, first you have to set soft and hard expiration date. Lets say the hard expiration date is 1 day, and the soft 1 hour. The hard is one actually set in the cache server, the soft is in the cache value itself (or in another key in the cache server). The application reads from cache, sees that the soft time has expired, set the soft time 1 hour ahead and hits the database. In this way the next request will see the already updated time and won't trigger the cache update - it will possibly read stale data, but the data itself will be in the process of regeneration.
Next point is: you should have procedure for cache warm-up, e.g. instead of user triggering cache update, a process in your application to pre-populate the new data.
The worst case scenario is e.g. restarting the cache server, when you don't have any data. In this case you should fill cache as fast as possible and there's where a warm-up procedure may play vital role. Even if you don't have a value in the cache, it would be a good strategy to "lock" the cache (mark it as being updated), allow only one query to the database, and handle in the application by requesting the resource again after a given timeout
You could probably be better of using some distributed cache repository, as memcached, or others depending your access pattern.
You could use the Cache implementation of Google's Guava library if you want to store the values inside the application.
From the coding point of view, you would need something like
public V get(K key){
V value = map.get(key);
if (value == null) {
synchronized(mutex){
value = map.get(key);
if (value == null) {
value = db.fetch(key);
map.put(key, value);
}
}
}
return value;
}
where the map is a ConcurrentMap and the mutex is just
private static Object mutex = new Object();
In this way, you will have just one request to the db per missing key.
Hope it helps! (and don't store null's, you could create a tombstone value instead!)
Cache miss-storm or Cache Stampede Effect, is the burst of requests to the backend when cache invalidates.
All high concurrent websites I've dealt with used some kind of caching front-end. Bein Varnish or Nginx, they all have microcaching and stampede effect suppression.
Just google for Nginx micro-caching, or Varnish stampede effect, you'll find plenty of real world examples and solutions for this sort of problem.
All boils down to whether or not you'll allow requests pass through cache to reach backend when it's in Updating or Expired state.
Usually it's possible to actively refresh cache, holding all requests to the updating entry, and then serve them from cache.
But, there is ALWAYS the question "What kind of data are you supposed to be caching or not", because, you see, if it is just plain text article, which get an edit/update, delaying cache update is not as problematic than if your data should be exactly shown on thousands of displays (real-time gaming, financial services, and so on).
So, the correct answer is, microcache, suppression of stampede effect/cache miss storm, and of course, knowing which data to cache when, how and why.
It is worse to consider particular data type for caching only if data consumers are ready for getting stale date (in reasonable bounds).
In such case you could define invalidation/eviction/update policy to keep you data up-to-date (in business meaning).
On update you just replace data item in cache and all new requests will be responsed with new data
Example: Stocks info system. If you do not need real-time price info it is reasonable to keep in cache stock and update it every X mils/secs with expensive remote call.
Do you really need to expire the cache. Can you have an incremental update mechanism using which you can always increment the data periodically so that you do not have to expire your data but keep on refreshing it periodically.
Secondly, if you want to prevent too many users from hiting the db in one go, you can have a locking mechanism in your stored proc (if your db supports it) that prevents too many people hitting the db at the same time. Also, you can have a caching mechanism in your db so that if someone is asking for the exact same data from the db again, you can always return a cached value
Some applications also use a third service layer between the application and the database to protect the database from this scenario. The service layer ensures that you do not have the cache miss storm in the db
The answer is to never expire the Cache and have a background process update cache periodically. This avoids the wait and the cache-miss storms, but then why use cache in this scenario?
If your app will crash with a "Cache miss" scenario, then you need to rethink your app and what is cache verses needed In-Memory data. For me, I would use an In Memory database that gets updated when data is changed or periodically, not a Cache at all and avoid the aforementioned scenario.

Resources