My team is currently facing an issue in our Xamarin.Forms app across all platforms(Android, iOS, and UWP). Realm will frequently become unresponsive, where the only way to use it again is to close the app. Over the past few months it's become more frequent and easy to reproduce, yet we have not been able to determine the cause or a workaround.
We have identified a few patterns that may help identify what's happening. We've noticed that whenever something that needs information from the database, we'll see that worker thread stuck on a Realm.Write() call. This behavior seems almost as if there's a deadlock occuring within the Realm system. It's not consistent as to which Write() call it's stuck on, seeming to be random based on when the Realm fails. At that point, any other attempts to access this realm through any method, such as Find(),All(),Remove(), etc also get stuck. We've also confirmed that the code within the Write() is never being run at this point, since we can put a realm independent logging call on the first line and never see it in our logs.
Once this issue occurs, some other issues can happen in addition to this. We have two other Realms in our app that handle completely separate data, and as such have no overlapping code. These Realms are never the cause of this issue, but when the problem Realm gets stuck, it sometimes causes the other Realms to get stuck on their next calls as well. This issue also sometimes persists between uses of the app, causing the very first call to Realm to get stuck and requires a complete reinstall to fix.
Due to our app using Reactive based programming, we've had to structure how we handle our database a bit differently. For the problem Realm, we have a service that keeps a single instance active in an observable stream, which can then be subscribed to for watching changes. I've added some examples of this architecture at the end of this post. We also route all our other non-observable actions through this stream, however during debugging we've been able to move these calls to their own independent realm instances with little issue/no change to functionality.
Currently, we're thinking it's most likely an issue related either to how we're converting Realm to an observable system, or with our Realms crashing/becoming corrupted somehow.
RealmStream declaration:
_realmStream = Observable
.Start(() => Realm.GetInstance(_dbConfig), _scheduler)
.Do(_ => logger.LogTrace("Realm created"), () => logger.LogTrace("Realm stream completed"))
.Replay()
.AutoConnect();
RealmStream use example:
public IObservable<IChangeSet<TResult>> GetChangeSetStream<TSource, TResult>(Func<Realm, IQueryable<TSource>> selector, Func<TSource, TResult> transformFactory) where TSource : RealmObject
{
return _realmStream
.Select(realm =>
selector(realm)
.AsRealmCollection()
.ToObservableChangeSet<IRealmCollection<TSource>, TSource>()
.SubscribeOn(_scheduler)
.Transform(transformFactory)
.DisposeMany())
.Switch()
.Catch<IChangeSet<TResult>, Exception>(ex =>
{
_logger.LogError(ex, "Error getting property change stream");
return Observable.Return<IChangeSet<TResult>>(default);
})
.SubscribeOn(_scheduler);
}
Non-Observable realm methodss:
public async Task Run(Action<Realm> action)
{
await _realmStream
.Do(action)
.SubscribeOn(_scheduler);
}
public async Task<TResult> Run<TResult>(Func<Realm, TResult> action)
{
return await _realmStream
.Select(action)
.SubscribeOn(_scheduler);
}
So far, we've attempted the following:
Made sure Realm and Xamarin are both on the most recent versions
Reducing the number of Realm.Write()s (Minor improvement)
Moving every Realm function into our observable system (No noticable change, most of our functions already do this)
Attempted moving everything that does not require observables to using independent realm instances (increased frequency of locking)
Attempted to move everything away from our single instance of Realm. We weren't able to do this, as we could not determine how to properly handle some observable events, such as a RealmObject being deleted, without causing major performance issues
realm.Write needs to acquire a write lock and based on your description, it appears that you do get a deadlock where a thread with an open write transaction waits for another thread that is stuck on the realm.Write call. If you're able to reproduce the hand with a debugger attached, you can inspect the threads window and try to pinpoint the offending code.
This article provides some tips about debugging deadlocks. Unfortunately, without the whole project and a repro case, it'd be hard to pinpoint the cause.
Related
I have a Spring Boot application which contains a complex reactive flow (it involves MongoDB and RabbitMQ operations). Most of the time it works, but...
Some of the methods return a Mono<Void>. This is a typical pattern, in multiple layers:
fun workflowStep(things: List<Thing>): Mono<Void> =
Flux.fromIterable(things).flatMap { thing -> doSomethingTo(thing) }.collectList().then()
Let's say doSomethingTo() returns a Mono<Void> (it writes something to the database, sends a message etc). If I just replace it with Mono.empty() then everything works as expected, but otherwise it doesn't. More specifically the Mono never completes, it runs through all processing but misses the termination signal at the end. So the things are actually written in the database, messages are actually sent, etc.
To prove that the lack of termination is the problem, here is a hack that works:
val hackedDelayedMono = Mono.empty<Void>().delayElement(Duration.ofSeconds(1))
return Mono.first(
workflowStep(things),
hackedDelayedMono
)
The question is, what can I do with a Mono that never completes, to figure out what's going on? There is nowhere I could put a logging statement or a brakepoint, because:
there are no errors
there are no signals emitted
How could I check what the Mono is waiting for to be completed?
ps. I could not reproduce this behaviour outside the application, with simple Mono workflows.
You can trace and log events in your stream by using the log() operator in your reactive stream. This is useful for gaining a better understanding about what events are occurring within your app.
Flux.fromIterable(things)
.flatMap(thing -> doSomethingTo(thing))
.log()
.collectList()
.then()
Chained inside a sequence, it peeks at every event of the Flux or Mono
upstream of it (including onNext, onError, and onComplete as well as
subscriptions, cancellations, and requests).
Reactor Reference Documentation - Logging a Sequence
The Reactor reference documentation also contains other helpful advice for debugging a reactive stream and can be found here: Debugging Reactor
(We managed to fix the problem - it was not directly in the code I was working on, but for some reason my changes triggered it. I still don't understand the root cause, but higher up the chain we found a Mono.zip() zipping a Mono<Void>. Although this used to work before, it stopped working at some point. Why is a Mono<Void> even zippable, why don't we get a compiler error, and even worse, why does it work sometimes?)
To answer my own question here, the tool used for debugging was adding the following to all Monos in the chain, until it didn't produce any output:
mono.doOnEach { x ->
logger.info("signal: ${x}")
}
.then(Mono.defer {
logger.info("then()")
Mono.empty<Void>()
})
I also experimented with the .log() - also fine tool, but maybe too detailed, and it is not very easy to understand which Mono produces which log messages - as these are logged with the dynamic scope, not the lexical scope, which the above method gives you unambiguously.
I am writing an application for my own purposes that aims to get play pause events no matter what is going on in the system. I have gotten this much working
let commandCenter = MPRemoteCommandCenter.shared()
commandCenter.togglePlayPauseCommand.isEnabled = true
commandCenter.togglePlayPauseCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("Play Pause Command")
return .success
}
commandCenter.nextTrackCommand.isEnabled = true
commandCenter.nextTrackCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("NextTrackCommand")
return .success
}
commandCenter.previousTrackCommand.isEnabled = true
commandCenter.previousTrackCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("previousTrackCommand")
return .success
}
commandCenter.playCommand.isEnabled = true
commandCenter.playCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("playCommand")
return .success
}
MPNowPlayingInfoCenter.default().playbackState = .playing
Most of those methods are there because apparently you will not get any notifications without having nextTrackCommand or previousTrackCommand or playCommand implemented.
Anyways my one issue is that as soon as you open another application that uses audio these event handlers stop getting called and I cant find a way to detect and fix this.
I would normally try doing AVAudioSession things to state this as a background application however that does not seem to work. Any ideas on how I can get playpause events no matter what state the system is in?
I would like to be able to always listen for these events OR get an indication of when someone else has taken control of the audio? Perhaps even be able to re-subscribe to these play pause events.
There's an internal queue in the system which contains all the audio event subscribers. Other applications get on top of it when you start using them.
I would like to be able to always listen for these events
There's no API for that but there's a dirty workaround. If I understand your issue correctly, this snippet:
MPNowPlayingInfoCenter.default().playbackState = .paused
MPNowPlayingInfoCenter.default().playbackState = .playing
must do the trick for you if you run it in a loop somewhere in your application.
Note that this is not 100% reliable because:
If an event is generated before two subsequent playbackState state changes right after you've switched to a different application, it would still be catched by the application in the active window;
If another application is doing the same thing, there would be a constant race condition in the queue, with unpredictable outcome.
References:
Documentation for playbackState is here;
See also a similar question;
See also a bug report for mpv with a similar
issue (a pre-MPRemoteCommandCenter one, but still very valuable)
OR get an indication of when someone else has taken control of the audio
As far as I know there's no public API for this in macOS.
Basically I want an Aktor to change a scalafx-GUI safely.
I've read many posts describing this, but there where sometimes contradictory and some years old, so some of them might be outdated.
I have a working example code and I basically want to know if this kind of programming is thread-save.
The other question is if I can configure sbt or the compiler or something in a way, that all threads (from the gui, the actors and the futures) are started by the same dispatcher.
I've found some example code "scalafx-akka-demo" on GitHub, which is 4 years old. What I did in the following example is basically the same, just a little simplified to keep things easy.
Then there is the scalatrix-example approximately with the same age. This example really worries me.
In there is a self-written dispatcher from Viktor Klang from 2012, and I have no idea how to make this work or if I really need it. The question is: Is this dispatcher only an optimisation or do I have to use something like it to be thread save?
But even if I don't absolutely need the dispatcher like in scalatrix, it is not optimal to have a dispatcher for the aktor-threads and one for the scalafx-event-threads. (And maybe one for the Futures-threads as well?)
In my actual project, I have some measurement values coming from a device over TCP-IP, going to an TCP-IP actor and are to be displayed in a scalafx-GUI. But this is much to long.
So here is my example code:
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scalafx.Includes._
import scalafx.application.{JFXApp, Platform}
import scalafx.application.JFXApp.PrimaryStage
import scalafx.event.ActionEvent
import scalafx.scene.Scene
import scalafx.scene.control.Button
import scalafx.stage.WindowEvent
import scala.concurrent.ExecutionContext.Implicits.global
object Main extends JFXApp {
case object Count
case object StopCounter
case object CounterReset
val aktorSystem: ActorSystem = ActorSystem("My-Aktor-system") // Create actor context
val guiActor: ActorRef = aktorSystem.actorOf(Props(new GUIActor), "guiActor") // Create GUI actor
val button: Button = new Button(text = "0") {
onAction = (_: ActionEvent) => guiActor ! Count
}
val someComputation = Future {
Thread.sleep(10000)
println("Doing counter reset")
guiActor ! CounterReset
Platform.runLater(button.text = "0")
}
class GUIActor extends Actor {
def receive: Receive = counter(1)
def counter(n: Int): Receive = {
case Count =>
Platform.runLater(button.text = n.toString)
println("The count is: " + n)
context.become(counter(n + 1))
case CounterReset => context.become(counter(1))
case StopCounter => context.system.terminate()
}
}
stage = new PrimaryStage {
scene = new Scene {
root = button
}
onCloseRequest = (_: WindowEvent) => {
guiActor ! StopCounter
Await.ready(aktorSystem.whenTerminated, 5.seconds)
Platform.exit()
}
}
}
So this code brings up a button, and every time it is clicked the number of the button increases. After some time the number on the button is reset once.
In this example-code I tried to bring the scalafx-GUI, the actor and the Future to influence each other. So the button click sends a message to the actor, and then the actor changes the gui - which is what I am testing here.
The Future also sends to the actor and changes the gui.
So far, this example works and I haven't found everything wrong with it.
But unfortunately, when it comes to thread-safety this doesn't mean much
My concrete questions are:
Is the method to change the gui in the example code thread save?
Is there may be a better way to do it?
Can the different threads be started from the same dispatcher?
(if yes, then how?)
To address your questions:
1) Is the method to change the gui in the example code thread save?
Yes.
JavaFX, which ScalaFX sits upon, implements thread safety by insisting that all GUI interactions take place upon the JavaFX Application Thread (JAT), which is created during JavaFX initialization (ScalaFX takes care of this for you). Any code running on a different thread that interacts with JavaFX/ScalaFX will result in an error.
You are ensuring that your GUI code executes on the JAT by passing interacting code via the Platform.runLater method, which evaluates its arguments on the JAT. Because arguments are passed by name, they are not evaluated on the calling thread.
So, as far as JavaFX is concerned, your code is thread safe.
However, potential issues can still arise if the code you pass to Platform.runLater contains any references to mutable state maintained on other threads.
You have two calls to Platform.runLater. In the first of these (button.text = "0"), the only mutable state (button.text) belongs to JavaFX, which will be examined and modified on the JAT, so you're good.
In the second call (button.text = n.toString), you're passing the same JavaFX mutable state (button.text). But you're also passing a reference to n, which belongs to the GUIActor thread. However, this value is immutable, and so there are no threading issues from looking at its value. (The count is maintained by the Akka GUIActor class's context, and the only interactions that change the count come through Akka's message handling mechanism, which is guaranteed to be thread safe.)
That said, there is one potential issue here: the Future both resets the count (which will occur on the GUIActor thread) as well as setting the text to "0" (which will occur on the JAT). Consequently, it's possible that these two actions will occur in an unexpected order, such as button's text being changed to "0" before the count is actually reset. If this occurs simultaneously with the user clicking the button, you'll get a race condition and it's conceivable that the displayed value may end up not matching the current count.
2) Is there may be a better way to do it?
There's always a better way! ;-)
To be honest, given this small example, there's not a lot of further improvement to be made.
I would try to keep all of the interaction with the GUI inside either GUIActor, or the Main object to simplify the threading and synchronization issues.
For example, going back to the last point in the previous answer, rather than have the Future update button.text, it would be better if that was done as part of the CounterReset message handler in GUIActor, which then guarantees that the counter and button appearance are synchronized correctly (or, at least, that they're always updated in the same order), with the displayed value guaranteed to match the count.
If your GUIActor class is handling a lot of interaction with the GUI, then you could have it execute all of its code on the JAT (I think this was the purpose of Viktor Klang's example), which would simplify a lot of its code. For example, you would not have to call Platform.runLater to interact with the GUI. The downside is that you then cannot perform processing in parallel with the GUI, which might slow down its performance and responsiveness as a result.
3) Can the different threads be started from the same dispatcher? (if yes, then how?)
You can specify custom execution contexts for both futures and Akka actors to get better control of their threads and dispatching. However, given Donald Knuth's observation that "premature optimization is the root of all evil", there's no evidence that this would provide you with any benefits whatsoever, and your code would become significantly more complicated as a result.
As far as I'm aware, you can't change the execution context for JavaFX/ScalaFX, since JAT creation must be finely controlled in order to guarantee thread safety. But I could be wrong.
In any case, the overhead of having different dispatchers is not going to be high. One of the reasons for using futures and actors is that they will take care of these issues for you by default. Unless you have a good reason to do otherwise, I would use the defaults.
I'm currently using Calabash framework to automate functional testing for a native Android and IOS application. During my time studying it, I stumbled upon this example project from Xamarin that uses page objects design pattern which I find to be much better to organize the code in a Selenium fashion.
I have made a few adjustments to the original project, adding a file called page_utils.rb in the support directory of the calabash project structure. This file has this method:
def change_page(next_page)
sleep 2
puts "current page is #{current_page_name} changing to #{next_page}"
#current_page = page(next_page).await(PAGE_TRANSITION_PARAMETERS)
sleep 1
capture_screenshot
#current_page.assert_info_present
end
So in my custom steps implementation, when I want to change the page, I trigger the event that changes the page in the UI and update the reference for Calabash calling this method, in example:
#current_page.click_to_home_page
change_page(HomePage)
PAGE_TRANSITION_PARAMETERS is a hash with parameters such as timeout:
PAGE_TRANSITION_PARAMETERS = {
timeout: 10,
screenshot_on_error: true
}
Just so happens to be that whenever I have a timeout waiting for any element in any screen during a test run, I get a generic error message such as:
Timeout waiting for elements: * id:'btn_ok' (Calabash::Android::WaitHelpers::WaitError)
./features/support/utils/page_utils.rb:14:in `change_page'
./features/step_definitions/login_steps.rb:49:in `/^I enter my valid credentials$/'
features/04_support_and_settings.feature:9:in `And I enter my valid credentials'
btn_ok is the id defined for the trait of the first screen in my application, I don't understand why this keeps popping up even in steps ahead of that screen, masking the real problem.
Can anyone help getting rid of this annoyance? Makes really hard debugging test failures, specially on the test cloud.
welcome to Calabash!
As you might be aware, you'll get a Timeout waiting for elements: exception when you attempt to query/wait for an element which can't be found on the screen. When you call page.await(opts), it is actually calling wait_for_elements_exist([trait], opts), which means in your case that after 10 seconds of waiting, the view with id btn_ok can't be found on the screen.
What is assert_info_present ? Does it call wait_for_element_exists or something similar? More importantly, what method is actually being called in page_utils.rb:14 ?
And does your app actually return to the home screen when you invoke click_to_home_page ?
Unfortunately it's difficult to diagnose the issue without some more info, but I'll throw out a few suggestions:
My first guess without seeing your application or your step definitions is that #current_page.click_to_home_page is taking longer than 10 seconds to actually bring the home page back. If that's the case, simply try increasing the timeout (or remove it altogether, since the default is 30 seconds. See source).
My second guess is that the element with id btn_ok is not actually visible on screen when your app returns to the home screen. If that's the case, you could try changing the trait definition from * id:'btn_ok' to all * id:'btn_ok' (the all operator will include views that aren't actually visible on screen). Again, I have no idea what your app looks like so it's hard to say.
My third guess is it's something related to assert_info_present, but it's hard to say without seeing the step defs.
On an unrelated note, I apologize if our sample code is a bit outdated, but at the time of writing we generally don't encourage the use of #current_page to keep track of a page. Calabash was written in a more or less stateless manner and we generally encourage step definitions to avoid using state wherever possible.
Hope this helps! Best of luck.
I recently ran into an instance where I wanted to hit the database from a Task I have running periodically within a web application. I refactored the code to use the ThreadStaticSessionContext so that I could get a session without an HttpContext. This works fine for reads, but when I try to flush an update from the Task, I get the "Index was out of range. Must be non-negative and less than the size of the collection." error. Normally what I see for this error has to do with using a column name twice in the mapping, but that doesn't seem to be the issue here, as I'm able to update that table if the session is associated with a request (and I looked and I'm not seeing any duplicates). It's only when the Task tries to flush that I get the exception.
Does anyone know why it would work fine from a request, but not from a call from a Task?
Could it be because the Task is asynchronous?
Call Stack:
at System.ThrowHelper.ThrowArgumentOutOfRangeException()
at System.Collections.Generic.List`1.System.Collections.IList.get_Item(Int32 index)
at NHibernate.Engine.ActionQueue.ExecuteActions(IList list)
at NHibernate.Engine.ActionQueue.ExecuteActions()
at NHibernate.Event.Default.AbstractFlushingEventListener.PerformExecutions(IEventSource session)
at NHibernate.Event.Default.DefaultFlushEventListener.OnFlush(FlushEvent event)
at NHibernate.Impl.SessionImpl.Flush()
Session Generation:
internal static ISession CurrentSession {
get {
if(HasSession) return Initializer.SessionFactory.GetCurrentSession();
ISession session = Initializer.SessionFactory.OpenSession();
session.BeginTransaction();
CurrentSessionContext.Bind(session);
return session;
}
}
private static bool HasSession {
get { return CurrentSessionContext.HasBind(Initializer.SessionFactory); }
}
Task that I want to access the database from:
_maid = Task.Factory.StartNew(async () => {
while(true) {
if(CleaningSession != null) CleaningSession(Instance, new CleaningSessionEventArgs { Session = UnitOfWorkProvider.CurrentSession });
UnitOfWorkProvider.TransactionManager.Commit();
await Task.Delay(AppSettings.TempPollingInterval, _paycheck.Token);
}
//I know this function never returns, I'm using the cancellation token for that
// ReSharper disable once FunctionNeverReturns
}, _paycheck.Token);
_maid.GetAwaiter().OnCompleted(() => _maid.Dispose());
Edit: Quick clarification about some of the types above. CleaningSession is an event that is fired to run the various things that need to be done, and _paycheck is the CancellationTokenSource for the Task.
Edit 2: Oh yeah, and this is using NHibernate version 4.0.0.4000
Edit 3: I have since attempted this using a Timer, with the same results.
Edit 4: From what I can see of the source, it's doing a foreach loop on an IList. Questions pertaining to an IndexOutOfRangeException in a foreach loop tend to suggest a concurrency issue. I still don't see how that would be an issue, unless I misunderstand the purpose of ThreadStaticSessionContext.
Edit 5: I thought it might be because of requests bouncing around between threads, so I tried creating a new SessionContext that combines the logic of the WebSessionContext and ThreadStaticSessionContext. Still getting the issue, though...
Edit 6: It seems this has something to do with a listener I have set up to update some audit fields on entities just before they're saved. If I don't run it, the commit occurs properly. Would it be better to do this through an event than OnPreInsert, or use an interceptor instead?
After muddling through, I found out exactly where the problem was. Basically, there was a query that was run to load the current user record called from inside of the PreUpdate event in my listener.
I came across two solutions to this. I could cache the user in memory, avoiding the query, but having possibly stale data (not that anything other than the id matters here). Alternatively, I could open a temporary stateless session and use that to look up the user in question.