Kotlin: Create custom CoroutineContext - JDBC

I'm using Kotlin in my API backend. I don't want to run db queries in the common pool. Basically, I want to create a CoroutineContext that has a number of threads that matches the database maximumPoolSize.
What's the best way to accomplish this (generally and for my specific use case)? I know Kotlin provides contexts out of the box, but what's the best approach to create my own?
Bonus question: If I have a JDBC connection pool size of 3, does it make sense to use a CoroutineContext with a thread pool size of 3? Does this guarantee the best possible concurrency?

The function newFixedThreadPoolContext is considered obsolete as of the current version of Kotlin coroutines (1.3.0): it is annotated with @ObsoleteCoroutinesApi and will produce a warning if you try to use it. The documentation also states that it will be replaced in the future.
The recommended way to create a CoroutineContext is now through:
Executors.newFixedThreadPool(3).asCoroutineDispatcher()
So a complete example with imports, which also creates a CoroutineScope, would look like this:
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.asCoroutineDispatcher
import java.util.concurrent.Executors
import kotlin.coroutines.CoroutineContext
fun coroutineScope(threads: Int): CoroutineScope {
    val context: CoroutineContext = Executors.newFixedThreadPool(threads).asCoroutineDispatcher()
    return CoroutineScope(context)
}
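Usage might then look like the following sketch, where User, dataSource, and the query are hypothetical placeholders:

import javax.sql.DataSource
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.async

data class User(val id: Long, val name: String)

// Reuse one scope for all JDBC work, sized to match the connection pool.
val dbScope = coroutineScope(threads = 3)

// `dataSource` is assumed to be your configured JDBC pool (e.g. HikariCP).
fun fetchUserAsync(dataSource: DataSource, id: Long): Deferred<User> =
    dbScope.async {
        dataSource.connection.use { conn ->
            conn.prepareStatement("SELECT name FROM users WHERE id = ?").use { stmt ->
                stmt.setLong(1, id)
                stmt.executeQuery().use { rs ->
                    rs.next()
                    User(id, rs.getString("name"))
                }
            }
        }
    }

Note that the dispatcher returned by asCoroutineDispatcher owns its threads, so close it when it is no longer needed.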

You can create a CoroutineContext that's backed by a thread pool with a fixed number of threads using newFixedThreadPoolContext:
val myContext = newFixedThreadPoolContext(nThreads = 3, name = "My JDBC context")
And yes, it seems like a good idea to match your thread pool's size to the connection pool's size, because that way your threads (assuming they each use one connection at a time) will always have a database connection ready for them - here's a blog post suggesting the same.

The answer by zsmb13 works perfectly, but Android Studio warns that newFixedThreadPoolContext is a delicate API and should only be used in specific cases.
The recommended alternative (as of Spring 2022) for limited parallelism is limitedParallelism:
Creates a view of the current dispatcher that limits the parallelism to the given value. The resulting view uses the original dispatcher for execution, but with the guarantee that no more than parallelism coroutines are executed at the same time.
This method does not impose restrictions on the number of views or the total sum of parallelism values, each view controls its own parallelism independently with the guarantee that the effective parallelism of all views cannot exceed the actual parallelism of the original dispatcher.
(from https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-coroutine-dispatcher/limited-parallelism.html)
So an alternative solution would be to create a view of the thread pool used for the DB connections, like:
val dbDispatcher = Dispatchers.IO.limitedParallelism(maximumPoolSize)
and then use this as the coroutine dispatcher, for example:
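A minimal sketch of how that could be used (maximumPoolSize mirrors your connection pool setting; depending on your kotlinx.coroutines version, limitedParallelism may require @OptIn(ExperimentalCoroutinesApi::class)):

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

val maximumPoolSize = 3 // match your JDBC pool's maximumPoolSize
val dbDispatcher = Dispatchers.IO.limitedParallelism(maximumPoolSize)

// Funnel all blocking JDBC work through this helper, so at most
// maximumPoolSize coroutines block threads at any one time.
suspend fun <T> dbQuery(block: () -> T): T =
    withContext(dbDispatcher) { block() }

A call site is then as simple as dbQuery { userDao.findById(42) }, where userDao is a placeholder for your data access code.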

Batching stores transparently

We are using the following frameworks and versions:
jOOQ 3.11.1
Spring Boot 2.3.1.RELEASE
Spring 5.2.7.RELEASE
I have an issue where some of our business logic is divided into logical units that look as follows:
Request containing a user transaction is received
This request contains various information, such as the type of transaction, which products are part of this transaction, what kind of payments were done, etc.
These attributes are then stored individually in the database.
In code, this looks approximately as follows:
TransactionRecord transaction = transactionRepository.create();
transaction.create(creationCommand);
In Transaction#create (which runs transactionally), something like the following occurs:
storeTransaction();
storePayments();
storeProducts();
// ... other relevant information
A given transaction can have many different types of products and attributes, all of which are stored. Many of these attributes result in UPDATE statements, while some may result in INSERT statements - it is difficult to fully know in advance.
For example, the storeProducts method looks approximately as follows:
products.forEach(product -> {
    ProductRecord record = productRepository.findProductByX(...);
    if (record == null) {
        record = productRepository.create();
        record.setX(...);
        record.store();
    } else {
        // do something else
    }
});
If the products are new, they are INSERTed. Otherwise, other calculations may take place. Depending on the size of the transaction, this single user transaction could obviously result in up to O(n) database calls/roundtrips, and even more depending on what other attributes are present. In transactions where a large number of attributes are present, this may result in upwards of hundreds of database calls for a single request (!). I would like to bring this down as close as possible to O(1) so as to have more predictable load on our database.
Naturally, batch and bulk inserts/updates come to mind here. What I would like to do is to batch all of these statements into a single batch using jOOQ, and execute after successful method invocation prior to commit. I have found several (SO Post, jOOQ API, jOOQ GitHub Feature Request) posts where this topic is implicitly mentioned, and one user groups post that seemed explicitly related to my issue.
Since I am using Spring together with jOOQ, I believe my ideal solution (preferably declarative) would look something like the following:
@Batched(100) // batch size as parameter, potentially
@Transactional
public void createTransaction(CreationCommand creationCommand) {
    // all inserts/updates above are added to a batch and executed on successful invocation
}
For this to work, I imagine I'd need to manage a scoped (ThreadLocal/Transactional/Session scope) resource which can keep track of the current batch such that:
Prior to entering the method, an empty batch is created if the method is @Batched,
A custom DSLContext (perhaps extending DefaultDSLContext) that is made available via DI has a ThreadLocal flag which keeps track of whether any current statements should be batched or not, and if so
Intercept the calls and add them to the current batch instead of executing them immediately.
However, step 3 would necessitate rewriting a large portion of our code from the (IMO) relatively readable:
records.forEach(record -> {
    record.setX(...);
    // ...
    record.store();
});
to:
userObjects.forEach(userObject -> {
    dslContext.insertInto(...).values(userObject.getX(), ...).execute();
});
which would defeat the purpose of having this abstraction in the first place, since the second form can also be rewritten using DSLContext#batchStore or DSLContext#batchInsert. IMO however, batching and bulk insertion should not be up to the individual developer and should be able to be handled transparently at a higher level (e.g. by the framework).
I find the readability of the jOOQ API to be an amazing benefit of using it, however it seems that it does not lend itself (as far as I can tell) to interception/extension very well for cases such as these. Is it possible, with the jOOQ 3.11.1 (or even current) API, to get behaviour similar to the former with transparent batch/bulk handling? What would this entail?
EDIT:
One possible but extremely hacky solution that comes to mind for enabling transparent batching of stores would be something like the following:
Create a RecordListener and add it as a default to the Configuration whenever batching is enabled.
In RecordListener#storeStart, add the query to the current Transaction's batch (e.g. in a ThreadLocal<List>)
The AbstractRecord has a changed flag which is checked (org.jooq.impl.UpdatableRecordImpl#store0, org.jooq.impl.TableRecordImpl#addChangedValues) prior to storing. Resetting this (and saving it for later use) makes the store operation a no-op.
Lastly, upon successful method invocation but prior to commit:
Reset the changes flags of the respective records to the correct values
Invoke org.jooq.UpdatableRecord#store, this time without the RecordListener or while skipping the storeStart method (perhaps using another ThreadLocal flag to check whether batching has already been performed).
As far as I can tell, this approach should work, in theory. Obviously, it's extremely hacky and prone to breaking, since it depends on library internals (and possibly reflection) that may change at any time.
Does anyone know of a better way, using only the public jOOQ API?
jOOQ 3.14 solution
You've already discovered the relevant feature request #3419, which will solve this on the JDBC level starting from jOOQ 3.14. You can either use the BatchedConnection directly, wrapping your own connection (a sketch follows the code below), or use this API:
ctx.batched(c -> {
    // Make sure all records are attached to c, not ctx, e.g. by fetching from c.dsl()
    records.forEach(record -> {
        record.setX(...);
        // ...
        record.store();
    });
});
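If you prefer to work on the JDBC level directly, a Kotlin sketch of the wrapper usage could look as follows. It assumes the org.jooq.impl.BatchedConnection(Connection, Int) constructor from jOOQ 3.14 and uses a placeholder dialect:

import org.jooq.SQLDialect
import org.jooq.UpdatableRecord
import org.jooq.impl.BatchedConnection
import org.jooq.impl.DSL
import java.sql.Connection

// Stores are buffered by the wrapped connection and flushed in batches of 100.
fun storeAllBatched(delegate: Connection, records: List<UpdatableRecord<*>>) {
    BatchedConnection(delegate, 100).use { conn ->
        val ctx = DSL.using(conn, SQLDialect.POSTGRES) // placeholder dialect
        records.forEach { record ->
            record.attach(ctx.configuration()) // route store() through the batched connection
            record.store()
        }
    }
}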
jOOQ 3.13 and before solution
For the time being, until #3419 is implemented (it will be, in jOOQ 3.14), you can implement this yourself as a workaround; a condensed sketch follows the list below. You'd have to proxy a JDBC Connection and PreparedStatement and ...
... intercept all:
Calls to Connection.prepareStatement(String), returning a cached proxy statement if the SQL string is the same as for the last prepared statement, or batch execute the last prepared statement and create a new one.
Calls to PreparedStatement.executeUpdate() and execute(), and replace those by calls to PreparedStatement.addBatch()
... delegate all:
Calls to other API, such as e.g. Connection.createStatement(), which should flush the above buffered batches, and then call the delegate API instead.
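A condensed Kotlin sketch of that proxying idea, using Kotlin interface delegation to avoid writing out all of java.sql.Connection. Generated keys, update counts, and the remaining flush points are deliberately left out, so treat it as a starting point rather than a finished implementation:

import java.sql.Connection
import java.sql.PreparedStatement
import java.sql.Statement

// Buffers consecutive executions of the same SQL string and flushes them
// as a single JDBC batch.
class BatchingConnection(private val delegate: Connection) : Connection by delegate {
    private var lastSql: String? = null
    private var lastStmt: PreparedStatement? = null

    override fun prepareStatement(sql: String): PreparedStatement {
        if (sql != lastSql) {
            flush() // different statement: batch-execute the previous one
            lastSql = sql
            lastStmt = delegate.prepareStatement(sql)
        }
        return BufferingStatement(lastStmt!!)
    }

    override fun createStatement(): Statement {
        flush() // other API calls must first flush the buffered batch
        return delegate.createStatement()
    }

    fun flush() {
        lastStmt?.executeBatch()
        lastStmt?.close()
        lastStmt = null
        lastSql = null
    }
}

// Replaces executeUpdate()/execute() with addBatch(), as described above.
private class BufferingStatement(private val delegate: PreparedStatement) :
    PreparedStatement by delegate {
    override fun executeUpdate(): Int {
        delegate.addBatch()
        return 0 // real update count is unknown until the batch is flushed
    }

    override fun execute(): Boolean {
        delegate.addBatch()
        return false
    }

    override fun close() {
        // keep the underlying statement open; the connection closes it on flush
    }
}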
I wouldn't recommend hacking your way around jOOQ's RecordListener and other SPIs; I think that's the wrong abstraction level at which to buffer database interactions. Also, you will want to batch other statement types as well.
Do note that by default, jOOQ's UpdatableRecord tries to fetch generated identity values (see Settings.returnIdentityOnUpdatableRecord), which is something that prevents batching. Such store() calls must be executed immediately, because you might expect the identity value to be available.
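For completeness, a Kotlin sketch of turning that setting off so that store() calls become batchable, using the Settings.returnIdentityOnUpdatableRecord flag mentioned above (the connection and dialect are placeholders):

import org.jooq.SQLDialect
import org.jooq.conf.Settings
import org.jooq.impl.DSL
import java.sql.Connection

// Records stored through this context no longer fetch generated identities,
// so their stores can be buffered and batched.
fun batchableContext(connection: Connection) =
    DSL.using(
        connection,
        SQLDialect.POSTGRES, // placeholder dialect
        Settings().withReturnIdentityOnUpdatableRecord(false)
    )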

How do you actually "manage" the max number of web threads using Spring 5's Reactive Programming?

When using a classical Tomcat approach, you can give your server a maximum number of threads it can use to handle web requests from users. Using the Reactive Programming paradigm, and Reactor in Spring 5, we are able to scale better vertically, making sure we are blocked minimally.
It seems to me that it makes this less manageable than the classical Tomcat approach, where you simply define the max number of concurrent requests. When you have a max number of concurrent requests, it's easier to estimate the maximum memory your application will need and scale accordingly. When you use Spring 5's Reactive Programming this seems like more of a hassle.
When I talk about these new technologies to sysadmin friends, they reply with worry about applications running out of RAM, or even threads on the OS level. So how can we deal with this better?
No blocking I/O at ALL
First of all, if you don't have any blocking operations then you should not worry at all about how many threads to provide for managing concurrency. In that case, we have only one worker which processes all connections asynchronously and without blocking. And in that case, we may easily scale connection-serving workers which process all connections without contention or coherence overhead (each worker has its own queue of received connections and works on its own CPU), and we may scale the application better in that case (shared-nothing design).
Summary: in that case you manage the max number of web threads just as before, by configuring the application container (Tomcat, WebSphere, etc.), or similarly for non-Servlet servers like Netty or the hybrid Undertow. The benefit: you may process far more user requests with the same resource consumption.
Blocking Database and Non-Blocking Web API (such as WebFlux over Netty).
In case we have to deal with blocking I/O somehow, for instance communication with a DB over blocking JDBC, the most appropriate way to keep your app as scalable and efficient as possible is to use a dedicated thread pool for I/O.
Thread-pool requirements
First of all, we should create a thread pool with exactly the same number of workers as available connections in the JDBC connection pool. Hence, we will have exactly as many threads as can block waiting for a response, and we utilize our resources as efficiently as possible, so no more memory is consumed for thread stacks than is actually needed (in other words, a thread-per-connection model).
How to configure thread-pool accordingly to size of connection-pool
Since the pool size varies per database and JDBC driver, we may externalize it as a configuration property, which in turn means that it may be configured by devops or a sysadmin.
A configuration of the thread pool (in our example, configuring a Scheduler from Project Reactor 3) may look like this:
@Configuration
public class ReactorJdbcSchedulerConfig {
    @Value("${my.awesome.scheduler-size}")
    int schedulerSize;

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(new ForkJoinPool(schedulerSize));
        // similarly:
        // ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // taskExecutor.setCorePoolSize(schedulerSize);
        // taskExecutor.setMaxPoolSize(schedulerSize);
        // taskExecutor.setQueueCapacity(schedulerSize);
        // taskExecutor.initialize();
        // return Schedulers.fromExecutor(taskExecutor);
    }
}
...
@Autowired
Scheduler jdbcScheduler;

public Mono<?> myJdbcInteractionIsolated(String id) {
    return Mono.fromCallable(() -> jpaRepo.findById(id))
               .subscribeOn(jdbcScheduler)
               .publishOn(Schedulers.single());
}
...
As may be noted, with that technique we may delegate our thread-pool configuration to an external team (sysadmins, for instance) and allow them to manage the memory consumed by the created Java threads.
Keep your blocking I/O thread pool only for I/O work
This means that an I/O thread should be used only for operations which block while waiting. In turn, it means that once the thread is done waiting for the response, you should move result processing to another thread.
That is why in the above code-snippet I put .publishOn right after .subscribeOn.
So, to summarize, with that technique we may let an external team manage application sizing by matching the thread-pool size to the connection-pool size. All result processing will be executed within one thread, so there is no redundant, uncontrolled memory consumption.
Finally, Blocking API (Spring MVC) and blocking I/O (Database access)
In that case, there is no need for the reactive paradigm at all, since you don't profit from it. First of all, Reactive Programming requires a particular mind shift, especially in understanding the use of functional techniques with reactive libraries such as RxJava or Project Reactor. In turn, for unprepared users, it adds complexity and causes more "What ****** is going on here???" moments. So, in the case of blocking operations on both ends, you should think twice about whether you really need Reactive Programming here.
Also, there is no magic for free. Reactive Extensions come with a lot of internal complexity, and using all that magical .map, .flatMap, etc., you may lose overall performance and memory instead of winning, as you would in the case of end-to-end non-blocking, async communication.
That means that good old imperative programming will be more suitable here, and it will be much easier to control your application's memory sizing using good old Tomcat configuration management.
Can you try this:
import java.util.concurrent.Executor;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.AsyncConfigurer;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {
    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(15);
        taskExecutor.setMaxPoolSize(100);
        taskExecutor.setQueueCapacity(100);
        taskExecutor.initialize();
        return taskExecutor;
    }
}
This works for async in Spring 4, but I'm not sure it'll work in Spring 5 with reactive.

Vertx worker verticle pool for jdbc

I'm new to Vert.x and I would like to implement a pool of worker verticles to make database queries using BoneCP. However, I'm a little bit confused about how to 'call' them to work and how to share the BoneCP connection pool between them.
I saw in the Vert.x DeploymentManager source that the start(Future) method is called synchronously and then the verticle is kept in memory until undeployed. After the start method completes, what's the correct way of calling methods on the worker verticle? If I deploy many instances of the verticle (using DeploymentOptions.setInstances()), will Vert.x do load balancing between them?
I saw that Vert.x comes with a JDBC client and a worker pool, but it has limited datatypes I can work with because it uses the EventBus and serializes all data returned by the database. I need to work with many different datatypes (including dates, BigDecimals and binary objects) and I would like to avoid serialization as much as possible, but instead make queries in the worker verticle, process the results and return an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization needed; is this correct?).
Please help me to sort out all these questions :) I would appreciate it a lot if you could give me examples of how to make this work!
Thanks!
I'll try to answer your questions one by one.
how to 'call' them to work
You call your worker verticles using the EventBus. That's the proper way to communicate between them. Please see this example:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/verticle/worker/MainVerticle.java#L27
how to share the BoneCP connection pool between them.
Don't. Instead, create a small connection pool for each. Otherwise, it will cause unexpected behavior.
config.setMinConnectionsPerPartition(1);
config.setMaxConnectionsPerPartition(5);
config.setPartitionCount(1);
will Vert.x do load balancing between them
No. That's the reason @Jochen Bedersdorfer and I suggest using the EventBus. You can have a reference to your worker verticle, as you suggested, but then you're stuck with a 1:1 configuration.
return an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization needed; is this correct?)
This is correct. But again, you're stuck with a 1:1 mapping then, which is a lot worse in terms of performance than serialization (which uses buffers).
If you still do need something like that, maybe you shouldn't use worker verticles at all, but something like .executeBlocking:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/execblocking/ExecBlockingExample.java#L25
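Putting the pieces together, a Kotlin sketch of a worker verticle that owns its own small pool and serves queries over the EventBus might look like this. The "db.query" address, the message format, and runQuery are placeholders; the BoneCP setup from above would go in start():

import io.vertx.core.AbstractVerticle
import io.vertx.core.DeploymentOptions
import io.vertx.core.Vertx

class DbWorkerVerticle : AbstractVerticle() {
    override fun start() {
        // create this instance's own small connection pool here (see config above)
        vertx.eventBus().consumer<String>("db.query") { msg ->
            val result = runQuery(msg.body()) // blocking JDBC is fine on a worker thread
            msg.reply(result)
        }
    }

    private fun runQuery(sql: String): String {
        // placeholder: run the query through this verticle's own pool
        return "result-of:$sql"
    }
}

fun main() {
    val vertx = Vertx.vertx()
    // Several worker instances; Vert.x round-robins "db.query" messages among them.
    vertx.deployVerticle(
        DbWorkerVerticle::class.java.name,
        DeploymentOptions().setWorker(true).setInstances(4)
    )
}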
In your start(...) method, register event listeners with the event bus, as this is how you interact with verticles (worker or not).
Yes, if you deploy many instances, Vert.x will use round-robin to send messages to those instances.
For what you describe, Vert.x might not be the best fit, since it works best with asynchronous I/O.
You might be better off using standard Java concurrency tools to manage the load, i.e. Executor and friends.

What does JMS Session single-threadedness mean?

What is the exact nature of the thread-unsafety of a JMS Session and its associated constructs (Message, Consumer, Producer, etc)? Is it just that access to them must be serialized, or is it that access is restricted to the creating thread only?
Or is it a hybrid case where creation can be distinguished from use, i.e. one thread can create them only and then another thread can be the only one to use them? This last possibility would seem to contradict the statement in this answer which says "In fact you must not use it from two different threads at different times either!"
But consider the "Server Side" example code from the ActiveMQ documentation.
The Server class has data members named session (of type Session) and replyProducer (of type MessageProducer) which are
created in one thread: whichever one invokes the Server() constructor and thereby invokes the setupMessageQueueConsumer() method with the actual creation calls; and
used in another thread: whichever one invokes the onMessage() asynchronous callback.
(In fact, the session member is used in both threads too: in one to create the replyProducer member, and in the other to create a message.)
Is this official example code working by accident or by design? Is it really possible to create such objects in one thread and then arrange for another thread to use them?
(Note: in other messaging infrastructures, such as Solace, it's possible to specify the thread on which callbacks occur, which could be exploited to get around this "thread affinity of objects" restriction, but no such API call is defined in JMS, as far as I know.)
The JMS specification says a session object should not be used across threads except when calling the Session.close() method. Technically speaking, if access to a Session object or its children (producer, consumer, etc.) is serialized, then the Session or its child objects can be accessed across threads. Having said that, since JMS is an API specification, its implementation differs from vendor to vendor. Some vendors might strictly enforce thread affinity while some may not. So it's always better to stick to the JMS specification and write code accordingly.
The official answer appears to be a footnote to section 4.4. "Session" on p.60 in the JMS 1.1 specification.
There are no restrictions on the number of threads that can use a Session object or those it creates. The restriction is that the resources of a Session should not be used concurrently by multiple threads. It is up to the user to insure that this concurrency restriction is met. The simplest way to do this is to use one thread. In the case of asynchronous delivery, use one thread for setup in stopped mode and then start asynchronous delivery. In more complex cases the user must provide explicit synchronization.
Whether a particular implementation abides by this is another matter, of course. In the case of the ActiveMQ example, the code is conforming because all inbound message handling is through a single asynchronous callback.
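The setup-then-start pattern from that footnote, written out as a Kotlin sketch (the broker URL and queue name are placeholders, and the ActiveMQ client is assumed to be on the classpath):

import javax.jms.Message
import javax.jms.Session
import org.apache.activemq.ActiveMQConnectionFactory

// All Session/consumer/producer setup happens on one thread while the
// connection is stopped; after start(), only the provider's delivery
// thread touches these objects.
fun main() {
    val connection = ActiveMQConnectionFactory("tcp://localhost:61616").createConnection()

    val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val queue = session.createQueue("example.queue")
    val replyProducer = session.createProducer(null) // unidentified producer, as in the ActiveMQ example
    session.createConsumer(queue).setMessageListener { message: Message ->
        // from now on, this callback thread is the sole user of session/replyProducer
        replyProducer.send(message.getJMSReplyTo(), session.createTextMessage("ok"))
    }

    connection.start() // hand the session over to asynchronous delivery
}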

Synchronous calls in akka / actor model

I've been looking into Akka lately and it looks like a great framework for building scalable servers on the JVM. However, most of the libraries on the JVM are blocking (e.g. JDBC), so don't you lose out on the performance benefits of using an event-based model because your threads will always be blocked? Does Akka do something to get around this? Or is it just something you have to live with until we get more non-blocking libraries on the JVM?
Have a look at CQRS; it greatly improves scalability by separating reads from writes. This means that you can scale your reads separately from your writes.
With the types of blocking I/O issues you mentioned, Scala provides a language-embedded solution that matches perfectly: Futures. For example:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def expensiveDBQuery(key: Key) = Future {
  //...query the database
}

val dbResult: Future[Result] =
  expensiveDBQuery(...) // non-blocking call
The dbResult returns immediately from the function call. The Result will be available in the "Future". The cool part about a Future is that you can think of it like any old collection, except you can never call .size on the Future. Other than that, all collection-ish functions (e.g. map, filter, foreach, ...) are fair game. Simply think of the dbResult as a list of Results. What would you do with such a list:
dbResult.map(_.getValues)
  .filter(values => someTestOnValues(values))
  ...
That sequence of calls sets up a computation pipeline that is invoked when the Result is actually returned from the database. You can specify a sequence of computing steps before the data has arrived. All asynchronously.
