I want to stream result objects captured by a Spring JDBC RowCallbackHandler via a Kotlin Sequence.
The code looks basically like this:
fun findManyObjects(): Sequence<Thing> = sequence {
    val rowHandler = object : RowCallbackHandler {
        override fun processRow(resultSet: ResultSet) {
            val thing = // create from resultSet
            yield(thing) // ERROR! No coroutine scope
        }
    }
    jdbcTemplate.query("select * from ...", rowHandler)
}
But I get the compilation error:
Suspension functions can be called only within coroutine body.
However, exactly this "coroutine body" should exist, because the whole block is wrapped in a sequence builder. But it doesn't seem to work with a nested object.
Minimal example to show that it doesn't compile with a nested object:
// compiles
sequence {
    yield(1)
}

// doesn't compile
sequence {
    object {
        fun doit() {
            yield(1) // Suspension functions can be called only within coroutine body.
        }
    }
}
How can I pass an object from the ResultSet into the Sequence?
Use Flow for asynchronous data streams
The reason you can't call yield inside your RowCallbackHandler object is twofold.
1. The processRow function isn't a suspending function (and can't be, because it's declared in and called by Java). A suspending function like yield can only be called by another suspending function.
2. A sequence always ends when the sequence { ... } builder returns. Even if you and I know that the query method will invoke the RowCallbackHandler before returning from the sequence, the Kotlin compiler has no way of knowing that. Yielding sequence values from functions and objects other than the body of the sequence itself is never allowed, because there's no way of knowing where or when they will run.
To solve this problem, we need to introduce a different kind of coroutine: one that can suspend itself while it waits for the RowCallbackHandler to be invoked.
Unfortunately, because we're talking about JDBC here, there may not be much to gain by introducing full-blown coroutines. Under the hood, calls to the database will always be made in a blocking way, removing a lot of the benefit. It might well be simpler not to try and 'stream' results, and just iterate over them in a boring, old-fashioned way. But let's explore the possibilities all the same.
The problem with sequences
Sequences are designed for on-demand computation, and are not asynchronous. They can't wait for other asynchronous operations, such as callbacks. The sequence builder's yield function simply suspends while waiting for the caller to retrieve the next item, and it's the only suspending function a sequence is ever allowed to call. You can demonstrate this if you try to use a simple suspending call like delay inside a sequence. You'll get a compile error letting you know that you're operating in a restricted coroutine scope.
sequence<String> { delay(1000) } // doesn't compile
Without the ability to call suspending functions, there's no way to wait for a callback to be invoked. Recognising this limitation, Kotlin provides an alternative mechanism for streams of on-demand values that do provide data in an asynchronous way. It's called a Flow.
Callback flows
The mechanism for using Flows to provide values from a callback interface is described very nicely by Roman Elizarov in his Medium article Callbacks and Kotlin Flows.
If you did want to use a callback flow, you'd simply replace sequence with callbackFlow, and replace yield with sendBlocking.
Your code might look something like this:
fun findManyObjects(): Flow<Thing> = callbackFlow {
    val rowHandler = object : RowCallbackHandler {
        override fun processRow(resultSet: ResultSet) {
            val thing = // create from resultSet
            sendBlocking(thing)
        }
    }
    jdbcTemplate.query("select * from ...", rowHandler)
    close() // the query is finished, so there are no more rows
}
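(A version note: in kotlinx.coroutines 1.5 and later, sendBlocking is deprecated in favour of trySendBlocking, so depending on your version you may need trySendBlocking(thing) instead.)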
A simpler flow
While that's the idiomatic way to stream values provided by a callback, it might not be the simplest approach to this problem. By avoiding callbacks altogether, you can use the much more common flow builder, passing each value to its emit function. But now that you have asynchrony in the form of coroutines, you can't just return a flow and then allow Spring to immediately close the result set. You need to be able to delay the closing of the result set until the flow has actually been consumed. That means peeling back the abstractions provided by RowCallbackHandler or ResultSetExtractor, which expect to process all the results in a blocking way, and instead providing your own implementation.
fun Connection.findManyObjects(): Flow<Thing> = flow {
    prepareStatement("select * from ...").use { statement ->
        statement.executeQuery().use { resultSet ->
            while (resultSet.next()) {
                val thing = // create from resultSet
                emit(thing)
            }
        }
    }
}
Note the use blocks, which will deal with closing the statement and result set. Because we don't reach the end of the use blocks until the while loop has completed and all the values have been emitted, the flow is free to suspend while the result set remains open.
So why use a flow at all?
You might notice that if you do it this way, you can actually replace flow and emit with sequence and yield. So have we come full circle? Well, sort of. The difference is that a flow can only be consumed from a coroutine, whereas with sequence, you can iterate over the resulting values without suspending at all. In this particular case, it's a hard call to make, because JDBC operations are always blocking.
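To make the difference concrete, here's a minimal sketch of what consuming the flow looks like. runBlocking is used purely to bridge from non-coroutine code, and connection stands for a java.sql.Connection obtained elsewhere:

import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.runBlocking

fun main() {
    // A flow can only be collected inside a coroutine;
    // runBlocking provides one from ordinary blocking code.
    runBlocking {
        connection.findManyObjects().collect { thing -> println(thing) }
    }
}

A sequence, by contrast, could be iterated with a plain for loop, no coroutine required.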
If you use a sequence, the calling thread will block as it waits to receive the data. Values in a sequence are always computed by the thing consuming the sequence, so if the sequence invokes a blocking function, the consumer's thread will block waiting for the value. In a non-coroutine application, that might be okay, but if you're using coroutines, you really want to avoid hiding blocking calls inside innocuous-looking sequences.
If you use a flow, you can at least isolate the blocking calls by having the flow run on a particular dispatcher. For example, you could use the built-in IO dispatcher to perform the JDBC call, then switch back to the default dispatcher for any further processing. If you definitely want to stream values, I think this is a better approach than using a sequence.
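As a sketch of that arrangement (the function body is the same as above; createThing is a hypothetical stand-in for your row mapping):

import java.sql.Connection
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn

fun Connection.findManyObjects(): Flow<Thing> = flow {
    prepareStatement("select * from ...").use { statement ->
        statement.executeQuery().use { resultSet ->
            while (resultSet.next()) {
                emit(createThing(resultSet)) // hypothetical row-mapping helper
            }
        }
    }
}.flowOn(Dispatchers.IO) // run the blocking JDBC work on the IO dispatcher

Everything upstream of flowOn runs on the IO dispatcher; collectors downstream of it run in their own context, such as the default dispatcher.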
With all this in mind, you'll need to be careful with your use of coroutines and dispatchers if you do choose one of these solutions. If you'd rather not worry about that, there's nothing wrong with using a regular ResultSetExtractor and forgetting about both sequences and flows for now.
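If you'd rather go that way, a plain, non-streaming version might look something like the sketch below. It uses the RowMapper overload of query rather than a ResultSetExtractor, but the effect is the same: Spring materialises the whole result list before returning. createThing is again a hypothetical stand-in for your row mapping.

import org.springframework.jdbc.core.RowMapper

fun findManyObjects(): List<Thing> =
    jdbcTemplate.query("select * from ...", RowMapper { resultSet, _ ->
        createThing(resultSet) // hypothetical row-mapping helper
    })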
Reference:
websocket_client_async_ssl.cpp
strands
Question 1> Here is my understanding:
Given a few async operations bound with the same strand, the strand
will guarantee that all associated async operations will be executed
as a strictly sequential invocation.
Does this mean that all the above async operations will be executed by the same thread?
Or does it just say that at any time, only one async operation will be executed by any available thread?
Question 2> The boost::asio::make_strand function creates a strand object for an executor or execution context.
session(net::io_context& ioc, ssl::context& ctx)
: resolver_(net::make_strand(ioc))
, ws_(net::make_strand(ioc), ctx)
Here, resolver_ and ws_ each have their own strand, but I have trouble understanding how each strand applies to which async operations.
For example, in the following async operations and handlers, which functions (i.e. async initiations or handlers) are bound to the same strand and will not run simultaneously?
async ================================= handler
run
=> resolver_.async_resolve --> session::on_resolve
=> beast::get_lowest_layer(ws_).async_connect --> session::on_connect
=> ws_.next_layer().async_handshake --> session::on_ssl_handshake
=> ws_.async_handshake --> session::on_handshake
Question 3> How can we retrieve the strand from executor?
Is there any difference between these two?
get_associated_executor
get_executor
io_context::get_executor: Obtains the executor associated with the
io_context.
get_associated_executor: Helper function to obtain an object's
associated executor.
Question 4> Is it correct that I use the following method to bind deadline_timer to io_context to prevent race condition?
All other parts of the code are the same as in the example websocket_client_async_ssl.cpp.
session(net::io_context& ioc, ssl::context& ctx)
    : resolver_(net::make_strand(ioc))
    , ws_(net::make_strand(ioc), ctx)
    , d_timer_(ws_.get_executor())
{ }

void on_heartbeat_write(beast::error_code ec, std::size_t bytes_transferred)
{
    d_timer_.expires_from_now(boost::posix_time::seconds(5));
    d_timer_.async_wait(beast::bind_front_handler(&session::on_heartbeat, shared_from_this()));
}

void on_heartbeat(const boost::system::error_code& ec)
{
    ws_.async_write(net::buffer(text_ping_), beast::bind_front_handler(&session::on_heartbeat_write, shared_from_this()));
}

void on_handshake(beast::error_code ec)
{
    d_timer_.expires_from_now(boost::posix_time::seconds(5));
    d_timer_.async_wait(beast::bind_front_handler(&session::on_heartbeat, shared_from_this()));
    ws_.async_write(net::buffer(text_), beast::bind_front_handler(&session::on_write, shared_from_this()));
}
Note:
I used d_timer_(ws_.get_executor()) to initialize the deadline_timer and hoped that it would make sure they don't write or read the websocket at the same time.
Is this the right way to do it?
Question 1
Does this mean that all above async operations will be executed by a same thread? Or it just says that at any time, only one async operation will be executed by any available thread?
The latter.
Question 2
Here, resolver_ and ws_ each have their own strand,
Let me interject that I think that's unnecessarily confusing in the example. They could (should, conceptually) have used the same strand, but I guess they didn't want to go through the trouble of storing a strand. I'd probably have written:
explicit session(net::io_context& ioc, ssl::context& ctx)
    : resolver_(net::make_strand(ioc))
    , ws_(resolver_.get_executor(), ctx) {}
The initiation functions are called where you decide. The completion handlers are dispatch-ed on the executor that belongs to the IO object that you call the operation on, unless the completion handler is bound to a different executor (e.g. using bind_executor, see get_associated_executor). In the vast majority of circumstances in modern Asio, you will not bind handlers, instead "binding IO objects" to the proper executors. This means less typing, and makes it much harder to forget.
So in effect, all the async initiations in the chain except for the one in run() are on a strand, because the IO objects are tied to strand executors.
You have to keep in mind to dispatch on a strand when some outside user calls into your classes (e.g. often to stop). It is a good idea therefore to develop a convention. I'd personally make all the "unsafe" methods and members private:, so I will often have pairs like:
public:
    void stop() {
        dispatch(strand_, [self = shared_from_this()] { self->do_stop(); });
    }

private:
    void do_stop() {
        beast::get_lowest_layer(ws_).cancel();
    }
Side Note:
In this particular example, there is only one (main) thread running/polling the io service, so the whole point is moot. But as I explained recently (Does multithreaded http processing with boost asio require strands?), the examples are here to show some common patterns that allow one to do "real life" work as well.
Bonus: Handler Tracking
Let's use BOOST_ASIO_ENABLE_HANDLER_TRACKING to get some insight.¹ Running a sample session shows something like
If you squint a little, you can see that all the strand executors are the same:
0*1|resolver#0x559785a03b68.async_resolve
1*2|strand_executor#0x559785a02c50.execute
2*3|socket#0x559785a05770.async_connect
3*4|strand_executor#0x559785a02c50.execute
4*5|socket#0x559785a05770.async_send
5*6|strand_executor#0x559785a02c50.execute
6*7|socket#0x559785a05770.async_receive
7*8|strand_executor#0x559785a02c50.execute
8*9|socket#0x559785a05770.async_send
9*10|strand_executor#0x559785a02c50.execute
10*11|socket#0x559785a05770.async_receive
11*12|strand_executor#0x559785a02c50.execute
12*13|deadline_timer#0x559785a05958.async_wait
12*14|socket#0x559785a05770.async_send
14*15|strand_executor#0x559785a02c50.execute
15*16|socket#0x559785a05770.async_receive
16*17|strand_executor#0x559785a02c50.execute
17*18|socket#0x559785a05770.async_send
13*19|strand_executor#0x559785a02c50.execute
18*20|strand_executor#0x559785a02c50.execute
20*21|socket#0x559785a05770.async_receive
21*22|strand_executor#0x559785a02c50.execute
22*23|deadline_timer#0x559785a05958.async_wait
22*24|socket#0x559785a05770.async_send
24*25|strand_executor#0x559785a02c50.execute
25*26|socket#0x559785a05770.async_receive
26*27|strand_executor#0x559785a02c50.execute
23*28|strand_executor#0x559785a02c50.execute
Question 3
How can we retrieve the strand from executor?
You don't[*]. However make_strand(s) returns an equivalent strand if s is already a strand.
[*] By default, Asio's IO objects use the type-erased executor (asio::executor or asio::any_io_executor depending on version). So technically you could ask it about its target_type() and, after comparing the type id to some expected types use something like target<net::strand<net::io_context::executor_type>>() to access the original, but there's really no use. You don't want to be inspecting the implementation details. Just honour the handlers (by dispatching them on their associated executors like Asio does).
Is there any difference between these two? get_associated_executor get_executor
get_executor gets an owned executor from an IO object. It is a member function.
asio::get_associated_executor gets associated executors from handler objects. You will observe that get_associated_executor(ws_) doesn't compile (although some IO objects may satisfy the criteria to allow it to work).
Question 4
Is it correct that I use the following method to bind deadline_timer to io_context
You will notice that you did the same as I already mentioned above to tie the timer IO object to the same strand executor. So, kudos.
to prevent race condition?
You don't prevent race conditions here. You prevent data races. That is because in on_heartbeat you access the ws_ object which is an instance of a class that is NOT threadsafe. In effect, you're sharing access to non-threadsafe resources, and you need to serialize access, hence you want to be on the strand that all other accesses are also on.
Note: [...] and hoped that it will make sure they don't write or read the websocket at the same time. Is this the right way to do it?
Yes this is a good start, but it is not enough.
Firstly, you can write or read at the same time, as long as:
- write operations don't overlap
- read operations don't overlap
- accesses to the IO object are safely serialized
In particular, your on_heartbeat might be safely serialized so you don't have a data race on calling the async_write initiation function. However, you need more checks to know whether a write operation is already (still) in progress. One way to achieve that is to have a queue with outgoing messages. If you have strict heartbeat requirements and high load, you might need a priority-queue here.
¹ I simplified the example by replacing the stream type with the Asio native ssl::stream<tcp::socket>. This means we don't get all the internal timers that deal with tcp_stream expirations. See https://pastebin.ubuntu.com/p/sPRYh6Xbwz/
I have a simple series of chained operations that retrieve and persist some data using a Panache repository, running in a Quarkus service. Where these operations are parallelised a ContextNotActiveException is thrown. Where the parallelisation is removed, the code works as intended.
This code works:
dataRepository.get()
    .map { convert(it) }
    .forEach { persist(it) }
This code does not:
dataRepository.get()
    .parallelStream()
    .map { convert(it) }
    .forEach { persist(it) }
The Quarkus documentation is pretty limited, only addressing the use of Mutiny or RX.
How can I propagate the context such that parallelStream() will work?
Unfortunately Context Propagation does not play well with parallel Java streams, because making a stream parallel automatically moves the execution to the ForkJoinPool, which means you lose the context. You'll need to handle the parallelism differently, without having the Java streams do it for you - you will probably want to use the org.eclipse.microprofile.context.ManagedExecutor.
Assuming that it's the convert method which, for whatever reason, requires an active request context, you will need to dispatch its invocation into the managed executor. This will make sure that the context is propagated.
In Java code, one close equivalent to your code that I can think of is this:
#Inject
org.eclipse.microprofile.context.ManagedExecutor executor;
(...)
dataRepository.streamAll()
.forEach(i -> {
executor.supplyAsync(() -> {
return convert(i);
}).thenAccept(persist(i));
});
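Since the code in the question is Kotlin, a rough Kotlin equivalent of the same idea might look like the sketch below. The names dataRepository, convert, and persist are the question's own; the DataRepository type and streamAll are assumptions carried over from the Java version, and on newer Quarkus versions the jakarta.inject.Inject import applies instead.

import javax.enterprise.context.ApplicationScoped
import javax.inject.Inject
import org.eclipse.microprofile.context.ManagedExecutor

@ApplicationScoped
class DataProcessor {

    @Inject
    lateinit var executor: ManagedExecutor

    @Inject
    lateinit var dataRepository: DataRepository

    fun processAll() {
        dataRepository.streamAll().forEach { entity ->
            // Dispatching onto the ManagedExecutor propagates the active context
            // to the worker threads, unlike the ForkJoinPool used by parallelStream().
            executor.supplyAsync { convert(entity) }
                .thenAccept { converted -> persist(converted) }
        }
    }
}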
This is more of an opinion question, as in: should this pattern, which many people use, be an Rx use case at all?
In apps there is usually an SQL database, which the UI queries as an observable; the query emits once loaded and again any time the data changes (Room / SqlDelight etc.).
Reads sound okay, however, is it possible to have "pure" writes to the database?
Writing to the database might look like this
fun sync() = Completable.fromCallable {
    // do something
    database.writeSomethingSynchronously()
}

SomeUi {
    init {
        database.someQueryObservable()
            .subscribe { show list }
    }
}
Imagine you want to display a progress bar while this Completable is in flight.
What is effectively happening here is side-effecting to the database. This means the open database observable will re-emit when the data is written, but still before sync() completes (assuming a single thread for simplicity).
So there is a point in time where the new data is already in the UI while the progress bar is still shown (and the timings get worse with multithreading). This is an invalid state.
In the imperative world, sync would provide a completion callback, in which one would reload the query manually and show/hide the progress bar synchronously. (And somehow block the database change listener for the duration of the sync writes?)
Is there a way around this at all?
I have an actor where I want to store my mutable state inside a map.
Clients can send Get(key:String) and Put(key:String,value:String) messages to this actor.
I'm considering the following options.
1. Don't use futures inside the actor's receive method. But this may have a negative impact on both latency and throughput when I have a large number of gets/puts, because all operations will be performed in order.
2. Use java.util.concurrent.ConcurrentHashMap and then invoke the gets and puts inside a Future.
Given that java.util.concurrent.ConcurrentHashMap is thread-safe and provides a finer level of granularity, I was wondering if it is still a problem to close over the ConcurrentHashMap inside a Future created for each put and get.
I'm aware of the fact that it's generally a really bad idea to close over mutable state inside a Future inside an Actor, but I'm still interested to know whether it is correct in this particular case or not.
In general, java.util.concurrent.ConcurrentHashMap is made for concurrent use. As long as you don't try to transport the closure to another machine, and you think through the implications of it being used concurrently (e.g. if you read a value, use a function to modify it, and then put it back, do you want to use the replace(key, oldValue, newValue) method to make sure it hasn't changed while you were doing the processing?), it should be fine in Futures.
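For example, a read-modify-write cycle that guards against concurrent modification might look like the following sketch (written in Kotlin for brevity; the same methods are available from Scala, since ConcurrentHashMap is a plain Java class, and the map and transform here are purely illustrative):

import java.util.concurrent.ConcurrentHashMap

val map = ConcurrentHashMap<String, String>()

fun update(key: String, transform: (String) -> String) {
    while (true) {
        val oldValue = map[key] ?: return // nothing to update
        val newValue = transform(oldValue)
        // replace(key, oldValue, newValue) only succeeds if the current value
        // is still oldValue, so a concurrent modification makes us retry.
        if (map.replace(key, oldValue, newValue)) return
    }
}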
It may be a little late, but still: in the book Reactive Web Applications, the author suggests an indirect approach to this specific problem, using pipeTo as below.
def receive = {
  case ComputeReach(tweetId) =>
    fetchRetweets(tweetId, sender()) pipeTo self
  case fetchedRetweets: FetchedRetweets =>
    followerCountsByRetweet += fetchedRetweets -> List.empty
    fetchedRetweets.retweets.foreach { rt =>
      userFollowersCounter ! FetchFollowerCount(
        fetchedRetweets.tweetId, rt.user
      )
    }
  ...
}
where followerCountsByRetweet is mutable state of the actor. The result of fetchRetweets(), which is a Future, is piped to the same actor as a FetchedRetweets message, which then acts on the message to modify the actor's state. This mitigates any concurrent operations on the state.
Does the post() method of the boost::asio::io_service object use Boost.Coroutine to queue the short tasks performed in the handlers? This could save the resources spent on synchronization when using threads, but would make it impossible to move tasks to another thread. Or does it make no sense?
As best as I could tell, Boost.Asio does not use coroutines.
From an implementation point of view, I would imagine using a coroutine, such as those provided by Boost.Coroutine, would introduce overhead when invoking posted handlers. At the point at which the event loop knows what handlers can be invoked, it could simply invoke the handler, rather than having to hoist the handler into a trampoline function so that it can be transparently invoked within the context of a coroutine.
Boost.Asio does not know the actual or expected runtime duration of handlers, so it must perform the same internal synchronization regardless of the handlers. When the io_service is only being processed by a single thread, then synchronization overhead can be mitigated by providing a concurrency_hint during construction. Other areas, such as the reactor, may still need to perform synchronization.
In the end, rather than imposing context of execution, Boost.Asio provides a robust toolkit and empowers the users to choose the best option for themselves. The current Boost.Asio candidate for Boost 1.54 enhances this experience through its first-class support for:
Stackful Coroutines based on Boost.Coroutine. Here is an example where do_echo executes within the context of my_strand as a coroutine. Each async operation yields control back to the calling thread after initiating the asynchronous operation, and when the completion handler is invoked, control returns to the point immediately following the previous yield.
boost::asio::spawn(my_strand, do_echo);

// ...

void do_echo(boost::asio::yield_context yield)
{
  try
  {
    char data[128];
    for (;;)
    {
      std::size_t length =
        my_socket.async_read_some(
          boost::asio::buffer(data), yield);
      boost::asio::async_write(my_socket,
        boost::asio::buffer(data, length), yield);
    }
  }
  catch (std::exception& e)
  {
    // ...
  }
}
Boost.Asio provides a complete echo_service example that uses Stackful Coroutines.
Stackless Coroutines have been promoted to the documented public API from the HTTP Server 4 example. These are implemented as a variant of Duff's Device, but the details are cleanly hidden through the use of the pseudo-keywords reenter, yield, and fork. The following is roughly the equivalent of the above Stackful Coroutine example:
struct session : boost::asio::coroutine
{
  tcp::socket my_socket_;
  char data_[128];
  // ...

  void operator()(boost::system::error_code ec = boost::system::error_code(),
                  std::size_t length = 0)
  {
    if (!ec) reenter (this)
    {
      for (;;)
      {
        yield my_socket_.async_read_some(
          boost::asio::buffer(data_), *this);
        yield boost::asio::async_write(my_socket_,
          boost::asio::buffer(data_, length), *this);
      }
    }
  }
};
See the boost::asio::coroutine documentation for more details.
While I do not know if there are performance benefits to constructing asynchronous call chains with coroutines, I feel as though their greatest contribution is maintainability and readability. Being able to read and write asynchronous programs in a synchronous manner helps reduce the complexity introduced by inverted flow of control, as it is now possible to remove the spatial separation between operation initiation and completion.