Kotlin coroutines slow start - performance

I've been attempting to do a bit of performance review on an app I have, it's a back end Kotlin app that just pulls in some data, does a bit of data transformation and dumps it out, nothing too fancy. One thing that caught my eye was the final bit of execution where we dump our final data onto a queue, at first I noticed when we start up the app the final network call takes a very long time at first, sometimes over a second. Normally we run this network call in a coroutine to stop that last call blocking everything but I started trying to time the coroutine and the network call separately and got some odd results, from what I can see the coroutine takes can take forever to launch/complete compared to the network call. It's entirely possible I'm not recording things correctly but this is the general timing approach I have:
val coroutineTime - Instant.now().toEpochMillis()
GlobalScope.launch {
executionTime = measureTimeMillis { <--DO Message Sending -->}
totalTime = Instant.now().toEpochMillis() - coroutineTime
// Log out execution Time and total time
}
Now here what I'll see is something like
- totalTime = ~800ms
- executionTime = ~150ms
These aren't one-offs either, I have multiple of these processes going on at once ( up to 10 threads I think) and the first total times will always take significantly longer than the actual executionTime/network call. Eventually after a new dozen messages the overhead will calm down and these times will become equivalent at about 15ms, but having nearly 700ms overhead on coroutine start up seems insane to me.
Is this normal/expected behavior? I've tested this in a separate app and see similar but less extreme results where the first coroutine will take about 70ms to boot up, I'm struggling to find any other examples of this type of discussion outside of kotlin being used in android development.

As a first note, it's almost never a good idea to use the GlobalScope unless you really know what you're doing. This is why it was marked as delicate API. You should instead use a scope that is appropriately closed (following the lifecycle of whatever component launches this work).
Now, AFAIK, this GlobalScope runs on the default dispatcher, so maybe this is due to a cold start of that default thread pool. Later, it could also be a problem to use this dispatcher for network calls depending on the amount of concurrent coroutines you have. It would be more appropriate to use Disptachers.IO instead for IO bound work (or a custom thread pool).
It still doesn't explain the cold start, but I would first change that before investigating.

This is expected behavior if you use coroutines inappropriately ;-)
My guess is that your message sending is a blocking operation. By default GlobalScope.launch() dispatches coroutines with Dispatchers.Default which is designed to perform CPU-intensive operations, it has a limited number of threads and you should never block when using it. If you do you may run out of threads and coroutines will need to wait until some blocking operations will finish.
If you need to run blocking or IO code, you should use Dispatchers.IO instead:
GlobalScope.launch(Dispatchers.IO) {

I was facing similar issue, I have a function that loads some data from shared prefs, makes some calculations on the data (all this done in Dispatcher.Default), and return the result on Dispatcher.Main. I measured how long it took the Coroutine to actually start executing the block inside Dispatchers.Main.launch { } after calculations are done(time from tag2 to tag3 below), and got about 950ms (!!) Here is the function :
fun someName() {
CoroutineScope(Dispatchers.Default).launch {
val time = System.currentTimeMillis()
//load data and calculations
Log.d("tag2", "load and calculations took " + (System.currentTimeMillis() - time))
CoroutineScope(Dispatchers.Main.immediate).launch {
Log.d("tag3", "reached main thread code " + (System.currentTimeMillis() - time))
//do something
Log.d("tag4", "do something took " + (System.currentTimeMillis() - time))
}
}
}
But then I realized this happens while app launch, and main thread is busy creating all the UI, so even with .immediate it takes time until main thread will get to execute the dispatched code... then I tried to run this function after app already started and waiting, and found that from tag2 to tag 3 takes about 1ms (!!) (with .immediate). So looks like when dispatching something on Coroutine, when thread isn't busy it will start immediately

Related

How can I write a save GUI-Aktor for Scalafx?

Basically I want an Aktor to change a scalafx-GUI safely.
I've read many posts describing this, but there where sometimes contradictory and some years old, so some of them might be outdated.
I have a working example code and I basically want to know if this kind of programming is thread-save.
The other question is if I can configure sbt or the compiler or something in a way, that all threads (from the gui, the actors and the futures) are started by the same dispatcher.
I've found some example code "scalafx-akka-demo" on GitHub, which is 4 years old. What I did in the following example is basically the same, just a little simplified to keep things easy.
Then there is the scalatrix-example approximately with the same age. This example really worries me.
In there is a self-written dispatcher from Viktor Klang from 2012, and I have no idea how to make this work or if I really need it. The question is: Is this dispatcher only an optimisation or do I have to use something like it to be thread save?
But even if I don't absolutely need the dispatcher like in scalatrix, it is not optimal to have a dispatcher for the aktor-threads and one for the scalafx-event-threads. (And maybe one for the Futures-threads as well?)
In my actual project, I have some measurement values coming from a device over TCP-IP, going to an TCP-IP actor and are to be displayed in a scalafx-GUI. But this is much to long.
So here is my example code:
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scalafx.Includes._
import scalafx.application.{JFXApp, Platform}
import scalafx.application.JFXApp.PrimaryStage
import scalafx.event.ActionEvent
import scalafx.scene.Scene
import scalafx.scene.control.Button
import scalafx.stage.WindowEvent
import scala.concurrent.ExecutionContext.Implicits.global
object Main extends JFXApp {
case object Count
case object StopCounter
case object CounterReset
val aktorSystem: ActorSystem = ActorSystem("My-Aktor-system") // Create actor context
val guiActor: ActorRef = aktorSystem.actorOf(Props(new GUIActor), "guiActor") // Create GUI actor
val button: Button = new Button(text = "0") {
onAction = (_: ActionEvent) => guiActor ! Count
}
val someComputation = Future {
Thread.sleep(10000)
println("Doing counter reset")
guiActor ! CounterReset
Platform.runLater(button.text = "0")
}
class GUIActor extends Actor {
def receive: Receive = counter(1)
def counter(n: Int): Receive = {
case Count =>
Platform.runLater(button.text = n.toString)
println("The count is: " + n)
context.become(counter(n + 1))
case CounterReset => context.become(counter(1))
case StopCounter => context.system.terminate()
}
}
stage = new PrimaryStage {
scene = new Scene {
root = button
}
onCloseRequest = (_: WindowEvent) => {
guiActor ! StopCounter
Await.ready(aktorSystem.whenTerminated, 5.seconds)
Platform.exit()
}
}
}
So this code brings up a button, and every time it is clicked the number of the button increases. After some time the number on the button is reset once.
In this example-code I tried to bring the scalafx-GUI, the actor and the Future to influence each other. So the button click sends a message to the actor, and then the actor changes the gui - which is what I am testing here.
The Future also sends to the actor and changes the gui.
So far, this example works and I haven't found everything wrong with it.
But unfortunately, when it comes to thread-safety this doesn't mean much
My concrete questions are:
Is the method to change the gui in the example code thread save?
Is there may be a better way to do it?
Can the different threads be started from the same dispatcher?
(if yes, then how?)
To address your questions:
1) Is the method to change the gui in the example code thread save?
Yes.
JavaFX, which ScalaFX sits upon, implements thread safety by insisting that all GUI interactions take place upon the JavaFX Application Thread (JAT), which is created during JavaFX initialization (ScalaFX takes care of this for you). Any code running on a different thread that interacts with JavaFX/ScalaFX will result in an error.
You are ensuring that your GUI code executes on the JAT by passing interacting code via the Platform.runLater method, which evaluates its arguments on the JAT. Because arguments are passed by name, they are not evaluated on the calling thread.
So, as far as JavaFX is concerned, your code is thread safe.
However, potential issues can still arise if the code you pass to Platform.runLater contains any references to mutable state maintained on other threads.
You have two calls to Platform.runLater. In the first of these (button.text = "0"), the only mutable state (button.text) belongs to JavaFX, which will be examined and modified on the JAT, so you're good.
In the second call (button.text = n.toString), you're passing the same JavaFX mutable state (button.text). But you're also passing a reference to n, which belongs to the GUIActor thread. However, this value is immutable, and so there are no threading issues from looking at its value. (The count is maintained by the Akka GUIActor class's context, and the only interactions that change the count come through Akka's message handling mechanism, which is guaranteed to be thread safe.)
That said, there is one potential issue here: the Future both resets the count (which will occur on the GUIActor thread) as well as setting the text to "0" (which will occur on the JAT). Consequently, it's possible that these two actions will occur in an unexpected order, such as button's text being changed to "0" before the count is actually reset. If this occurs simultaneously with the user clicking the button, you'll get a race condition and it's conceivable that the displayed value may end up not matching the current count.
2) Is there may be a better way to do it?
There's always a better way! ;-)
To be honest, given this small example, there's not a lot of further improvement to be made.
I would try to keep all of the interaction with the GUI inside either GUIActor, or the Main object to simplify the threading and synchronization issues.
For example, going back to the last point in the previous answer, rather than have the Future update button.text, it would be better if that was done as part of the CounterReset message handler in GUIActor, which then guarantees that the counter and button appearance are synchronized correctly (or, at least, that they're always updated in the same order), with the displayed value guaranteed to match the count.
If your GUIActor class is handling a lot of interaction with the GUI, then you could have it execute all of its code on the JAT (I think this was the purpose of Viktor Klang's example), which would simplify a lot of its code. For example, you would not have to call Platform.runLater to interact with the GUI. The downside is that you then cannot perform processing in parallel with the GUI, which might slow down its performance and responsiveness as a result.
3) Can the different threads be started from the same dispatcher? (if yes, then how?)
You can specify custom execution contexts for both futures and Akka actors to get better control of their threads and dispatching. However, given Donald Knuth's observation that "premature optimization is the root of all evil", there's no evidence that this would provide you with any benefits whatsoever, and your code would become significantly more complicated as a result.
As far as I'm aware, you can't change the execution context for JavaFX/ScalaFX, since JAT creation must be finely controlled in order to guarantee thread safety. But I could be wrong.
In any case, the overhead of having different dispatchers is not going to be high. One of the reasons for using futures and actors is that they will take care of these issues for you by default. Unless you have a good reason to do otherwise, I would use the defaults.

Understanding Celluloid Pool

I guess my understanding toward Celluloid Pool is sort of broken. I will try to explain below but before that a quick note.
Note: Our system is running against a very fast client passing messages over ZeroMQ.
With the following Vanilla Celluloid app
class VanillaClient
include Celluloid::ZMQ
def read
loop { async.evaluate_response(socket.read_multipart)
end
def evaluate_response(data)
## the reason for using defer can be found over here.
Celluloid.defer do
ExternalService.execute(data)
end
end
end
Our system result in failure after some time, reason 'Can't spawn more thread' (or something like it)
So we intended to use Celluloid Pool(to avoid the above-mentioned problem ) so that we can limit the number of threads that spawned
My Understanding toward Celluloid Pool is
Celluloid Pool maintains a pool of actors for you so that you can distribute your task in parallel.
Hence, I decide to test it, but according to my test cases, it seems to behave serially(i.e thing never get distribute or happen in parallel.)
Example to replicate this.
sender-1.rb
## Send message `1` to the the_client.rb
sender-2.rb
## Send message `2` to the the_client.rb
the_client.rb
## take message from sender-1 and sender-2 and return it back to receiver.rb
## heads on, the `sleep` is introduced to test/replicate the IO block that happens in the actual code.
receiver.rb
## print the message obtained from the_client.rb
If, the sender-2.rb is run before sender-1.rb it appears that the pool gets blocked for 20 sec (sleep time in the_client.rb,can be seen over here) before consuming the data sent by sender-1.rb
It behaves the same in ruby-2.2.2 and under jRuby-9.0.5.0. What could be the possible causes for Pool to act in such manner?
Your pool call is not asynchronous.
Execution of evaluate on #pool needs to be .async still, as in your original example, not using pools. You still want asynchronous behavior, but you als want to have multiple handler actors.
Next you will likely hit the Pool.async bug.
https://github.com/celluloid/celluloid-pool/issues/6
This means after 5 hits to evaluate your pool will become unresponsive until at least one actor in the pool is finished. Worst case scenario, if you get 6+ requests in rapid succession, the 6th will then take 120 seconds, because it will take 5*20 seconds before it executes, then 20 seconds to execute itself.
Depending on what your actual operation is that's causing you delays -- you might need to adjust your pool size down the line.

What is considered overloading the main thread?

I am displaying information from a data model on a user interface. My current approach to doing so is by means of delegation as follows:
#protocol DataModelDelegate <NSObject>
- (void)updateUIFromDataModel;
#end
I am implementing the delegate method in my controller class as follows, using GCD to push the UI updating to the main thread:
- (void)updateUIFromDataModel {
dispatch_async(dispatch_get_main_queue(), ^{
// Code to update various UI controllers
// ...
// ...
});
}
What I am concerned about is that in some situations, this method can be called very frequently (~1000 times per second, each updating multiple UI objects), which to me feels very much like I am 'spamming' the main thread with commands.
Is this too much to be sending to the main thread? If so does anyone have any ideas on what would be the best way of approaching this?
I have looked into dispatch_apply, but that appears to be more useful when coalescing data, which is not what I am after - I really just want to skip updates if they are too frequent so only a sane amount of updates are sent to the main thread!
I was considering taking a different approach and implementing a timer instead to constantly poll the data, say every 10 ms, however since the data updating tends to be sporadic I feel that it would be wasteful to do so.
Combining both approaches, another option I have considered would be to wait for an update message and respond by setting the timer to poll the data at a set interval, and then disabling the timer if the data appears to have stopped changing. But would this be over-complicating the issue, and would the sane approach be to simply have a constant timer running?
edit: Added an answer below showing the adaptations using a dispatch source
One option is to use a Dispatch Source with type DISPATCH_SOURCE_TYPE_DATA_OR which lets you post events repeatedly and have libdispatch combine them together for you. When you have something to post, you use dispatch_source_merge_data to let it know there's something new to do. Multiple calls to dispatch_source_merge_data will be coalesced together if the target queue (in your case, the main queue) is busy.
I have been experimenting with dispatch sources and got it working as expected now - Here is how I have adapted my class implementation in case it is of use to anyone who comes across this question:
#implementation AppController {
#private
dispatch_source_t _gcdUpdateUI;
}
- (void)awakeFromNib {
// Added the following code to set up the dispatch source event handler:
_gcdUpdateUI = dispatch_source_create(DISPATCH_SOURCE_TYPE_DATA_ADD, 0, 0,
dispatch_get_main_queue());
dispatch_source_set_event_handler(_gcdUpdateUI, ^{
// For each UI element I want to update, pull data from model object:
// For testing purposes - print out a notification:
printf("Data Received. Messages Passed: %ld\n",
dispatch_source_get_data(_gcdUpdateUI));
});
dispatch_resume(_gcdUpdateUI);
}
And now in the delegate method I have removed the call to dispatch_async, and replaced it with the following:
- (void)updateUIFromDataModel {
dispatch_source_merge_data(_gcdUpdateUI, 1);
}
This is working absolutely fine for me. Now Even during the most intense data updating the UI stays perfectly responsive.
Although the printf() output was a very crude way of checking if the coalescing is working, a quick scrolling back up the console output showed me that the majority of the messages print outs had a value 1 (easily 98% of them), however there were the intermittent jumps to around 10-20, reaching a peak value of just over 100 coalesced messages around a time when the model was sending the most update messages.
Thanks again for the help!
If the app beach-balls under heavy load, then you've blocked the main thread for too long and you need to implement a coalescing strategy for UI updates. If the app remains responsive to clicks, and doesn't beach-ball, then you're fine.

What is the cost of creating actors in Akka?

Consider a scenario in which I am implementing a system that processes incoming tasks using Akka. I have a primary actor that receives tasks and dispatches them to some worker actors that process the tasks.
My first instinct is to implement this by having the dispatcher create an actor for each incoming task. After the worker actor processes the task it is stopped.
This seems to be the cleanest solution for me since it adheres to the principle of "one task, one actor". The other solution would be to reuse actors - but this involves the extra-complexity of cleanup and some pool management.
I know that actors in Akka are cheap. But I am wondering if there is an inherent cost associated with repeated creation and deletion of actors. Is there any hidden cost associated with the data structures Akka uses for the bookkeeping of actors ?
The load should be of the order of tens or hundreds of tasks per second - think of it as a production webserver that creates one actor per request.
Of course, the right answer lies in the profiling and fine tuning of the system based on the type of the incoming load.
But I wondered if anyone could tell me something from their own experience ?
LATER EDIT:
I should given more details about the task at hand:
Only N active tasks can run at some point. As #drexin pointed out - this would be easily solvable using routers. However, the execution of tasks isn't a simple run and be done type of thing.
Tasks may require information from other actors or services and thus may have to wait and become asleep. By doing so they release an execution slot. The slot can be taken by another waiting actor which now has the opportunity to run. You could make an analogy with the way processes are scheduled on one CPU.
Each worker actor needs to keep some state regarding the execution of the task.
Note: I appreciate alternative solutions to my problem, and I will certainly take them into consideration. However, I would also like an answer to the main question regarding the intensive creation and deletion of actors in Akka.
You should not create an actor for every request, you should rather use a router to dispatch the messages to a dynamic amount of actors. That's what routers are for. Read this part of the docs for more information: http://doc.akka.io/docs/akka/2.0.4/scala/routing.html
edit:
Creating top-level actors (system.actorOf) is expensive, because every top-level actor will initialize an error kernel as well and those are expensive. Creating child actors (inside an actor context.actorOf) is way cheaper.
But still I suggest you to rethink this, because depending on the frequency of the creation and deletion of actors you will also put afditional pressure on the GC.
edit2:
And most important, actors are not threads! So even if you create 1M actors, they will only run on as many threads as the pool has. So depending on the throughput setting in the config every actor will process n messages before the thread gets released to the pool again.
Note that blocking a thread (includes sleeping) will NOT return it to the pool!
An actor which will receive one message right after its creation and die right after sending the result can be replaced by a future. Futures are more lightweight than actors.
You can use pipeTo to receive the future result when its done. For instance in your actor launching the computations:
def receive = {
case t: Task => future { executeTask( t ) }.pipeTo(self)
case r: Result => processTheResult(r)
}
where executeTask is your function taking a Task to return a Result.
However, I would reuse actors from a pool through a router as explained in #drexin answer.
I've tested with 10000 remote actors created from some main context by a root actor, same scheme as in prod module a single actor was created. MBP 2.5GHz x2:
in main: main ? root // main asks root to create an actor
in main: actorOf(child) // create a child
in root: watch(child) // watch lifecycle messages
in root: root ? child // wait for response (connection check)
in child: child ! root // response (connection ok)
in root: root ! main // notify created
Code:
def start(userName: String) = {
logger.error("HELLOOOOOOOO ")
val n: Int = 10000
var t0, t1: Long = 0
t0 = System.nanoTime
for (i <- 0 to n) {
val msg = StartClient(userName + i)
Await.result(rootActor ? msg, timeout.duration).asInstanceOf[ClientStarted] match {
case succ # ClientStarted(userName) =>
// logger.info("[C][SUCC] Client started: " + succ)
case _ =>
logger.error("Terminated on waiting for response from " + i + "-th actor")
throw new RuntimeException("[C][FAIL] Could not start client: " + msg)
}
}
t1 = System.nanoTime
logger.error("Starting of a single actor of " + n + ": " + ((t1 - t0) / 1000000.0 / n.toDouble) + " ms")
}
The result:
Starting of a single actor of 10000: 0.3642917 ms
There was a message stating that "Slf4jEventHandler started" between "HELOOOOOOOO" and "Starting of a single", so the experiment seems even more realistic (?)
Dispatchers was a default (a PinnedDispatcher starting a new thread each and every time), and it seemed like all that stuff is the same as Thread.start() was, for a long long time since Java 1 - 500K-1M cycles or so ^)
That's why I've changed all code inside loop, to a new java.lang.Thread().start()
The result:
Starting of a single actor of 10000: 0.1355219 ms
Actors make great finite state machines so let that help drive your design here. If your request handling state is greatly simplified by having one actor per request then do that. I find that actors are particularly good at managing more than two states as a rule of thumb.
Commonly though, one request handling actor that references request state from within a collection that it maintains as part of its own state is a common approach. Note that this can also be achieved with an Akka reactive stream and the use of the scan stage.

Efficient daemon in Vala

i'd like to make a daemon in Vala which only executes a task every X seconds.
I was wondering which would be the best way:
Thread.usleep() or Posix.sleep()
GLib.MainLoop + GLib.Timeout
other?
I don't want it to eat too many resources when it's doing nothing..
If you spend your time sleeping in a system call, there's won't be any appreciable difference from a performance perspective. That said, it probably makes sense to use the MainLoop approach for two reasons:
You're going to need to setup signal handlers so that your daemon can die instantaneously when it is given SIGTERM. If you call quit on your main loop by binding SIGTERM via Posix.signal, that's probably going to be a more readable piece of code than checking that the sleep was successful.
If you ever decide to add complexity, the MainLoop will make it more straight forward.
You can use GLib.Timeout.add_seconds the following way:
Timeout.add_seconds (5000, () => {
/* Do what you want here */
// Continue this "loop" every 5000 ms
return Source.CONTINUE;
// Or remove it
return Source.REMOVE;
}, Priority.LOW);
Note: The Timeout is set as Priority.LOW as it runs in background and should give priority to others tasks.

Resources