Core Data and threading - cocoa

What are some of the obscure pitfalls of using Core Data and threads? I've read much of the documentation, and so far I've come across the following either in the docs or through painful experience:
Use a new NSManagedObjectContext for each thread, but a single NSPersistentStoreCoordinator is enough for the whole app.
Before sending an NSManagedObject's objectID back to the main thread (or any other thread), be sure the context has been saved (or at a minimum, it wasn't a newly-inserted-but-not-yet-saved object) - otherwise the objectID will actually be a temporary ID and not a persistent one.
Observe NSManagedObjectContextDidSaveNotification to detect when a save happens in another thread, and pass that notification to mergeChangesFromContextDidSaveNotification: to merge those changes into the current thread's context.
Bonus question/observation: I was led to believe by the wording of some of the docs that mergeChangesFromContextDidSaveNotification: is something only needed by the main thread to merge changes into the "main" context from worker threads - but I don't think that's the case.
I set up my importer to create batches of data which are imported using a subclass of NSOperation that owns its own context. The operations are loaded into an NSOperationQueue that's set to allow the default number of concurrent operations, so it's possible for several import batches to be running at the same time. I would occasionally get very strange validation errors and exceptions (like trying to add nil to a relationship) and other failures that I had never seen when I did all the same stuff on the main thread. It occurred to me (and perhaps this should have been obvious) that maybe the context merging needed to be done for all contexts in every thread - not just the "main" one! I don't know why I didn't think of that before, but I think this helped. (It hasn't been tested well enough yet for me to feel sure, though.) In any case, is it true that you need to observe that notification for ALL import threads that may be working with the same datasets and adding/updating the same entities? If so, this is yet another pitfall bullet point, IMO, although I have yet to be certain that it'll work.
Given how many of these I've run into with Core Data in general (and not all of them just about multi-threading), I have to wonder how many more are lurking. Since multi-threading so often ends up with bugs that are difficult if not impossible to reproduce due to the timing issues, I figured I'd ask if anyone had other important things that I may be missing that I need to concern myself with.

There is an entire rather large bit of documentation devoted to the subject of Core Data and Threading.
It isn't clear from your set of issues what isn't covered by that documentation.

Related

where is fyne's thread safety defined?

I was attracted to Fyne (and hence Go) by a promise of thread safety. But now that I'm getting better at reading Go I'm seeing things that make me believe that the API as a whole is not thread safe and perhaps was never intended to be. So I'm trying to determine what "thread safe" means in Fyne.
I'm looking specifically at
func (l *Label) SetText(text string) {
	l.Text = text
	l.textProvider.SetText(text) // calls refresh
}
and noting that l.Text is also a string. Assignments in Go are not thread safe, so it seems obvious to me that if two threads fight over the text of a label and both call label.SetText at the same time, I can expect memory corruption.
"But you wouldn't do that", one might say. No, but I am worried about the case of someone editing the content of an Entry while an app thread decides it needs to replace all the Entry's text - this is entirely possible in my app because it supports simultaneous editing by multiple users over a network, so updates to all sorts of widgets come in asynchronously. (Note I don't care what happens if two people edit the same Entry at the same time; someone's changes will be lost and I don't care whose. But it must not result in memory corruption.) Note that one approach I could take would be to have the background thread create an entirely new Entry widget, which would then replace the one in the current Box. But is that thread safe?
It's not that I don't know how to serialize things with channels. But I was hoping that Fyne would eliminate the need for it (a blog post claims it does); and even using channels I can't convince myself that a user meddling with a widget in various ways while some background thread is altering it, hiding it, etc, isn't going to result in crashes. Maybe all that is serialized under the covers and is perfectly safe, but I don't want to find out the hard way that it isn't, because I'll have no way to fix it.
Fyne is clearly pretty new and seems to have tons of promise, but documentation seems light on details. Is more information available somewhere? Have people tried this successfully?
You have found some race conditions here. There are plans to improve, but the 1.2 release was required to get a new "BaseWidget" first - and that was only released a few weeks ago.
Setting fields directly is primarily for setup purposes and so not expected to be used in the way you illustrate. That said, we do want to support it. The base widget will soon introduce something akin to SetFieldsAndRefresh(func()) which will ensure the safety of the code passed and refresh the widget afterward.
There is indeed a race currently within Refresh(). The internal use of channels was designed to remove this - but there are some corners, such as multiple goroutines calling it. This is the area that our new BaseWidget code can help with - as widgets built on it can lock internally and automatically. Using this approach will be thread safe with no changes required from the developer in a future release.
The API so far has made it possible for developers to not worry about threading and work from any goroutines - we do need to work internally to make it safer - you are quite right. https://github.com/fyne-io/fyne/issues/506
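For illustration only, here is a minimal sketch of the locking idea described in this answer. The safeLabel type and the setFieldsAndRefresh helper are hypothetical names invented for this example; neither is part of Fyne's API, and the real BaseWidget may well work differently. The point is only that if every field write and the refresh that follows happen under one lock, concurrent SetText calls from several goroutines cannot interleave.

```go
package main

import (
	"fmt"
	"sync"
)

// safeLabel is a hypothetical stand-in for a widget; it is NOT a Fyne type.
type safeLabel struct {
	mu        sync.Mutex
	text      string
	refreshes int // counts "redraws" so the effect is observable
}

// setFieldsAndRefresh (hypothetical) runs f while holding the lock and
// then performs a placeholder refresh, so both happen as one atomic step.
func (l *safeLabel) setFieldsAndRefresh(f func()) {
	l.mu.Lock()
	defer l.mu.Unlock()
	f()
	l.refreshes++ // placeholder for a real redraw
}

// SetText changes the text through the locked helper.
func (l *safeLabel) SetText(text string) {
	l.setFieldsAndRefresh(func() { l.text = text })
}

// Text reads the field under the same lock.
func (l *safeLabel) Text() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.text
}

// hammer starts n goroutines that each call SetText once, waits for them,
// and returns how many refreshes were recorded.
func hammer(l *safeLabel, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			l.SetText(fmt.Sprintf("edit %d", i))
		}(i)
	}
	wg.Wait()
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.refreshes
}

func main() {
	l := &safeLabel{}
	fmt.Println(hammer(l, 50), l.Text())
}
```

Which edit "wins" is still arbitrary, which matches the requirement in the question: a lost update is acceptable, memory corruption is not. Running a sketch like this with the Go race detector (go run -race) is a reasonable way to check that no unsynchronized access remains.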

What's the best practice for NSPersistentContainer newBackgroundContext?

I'm familiarizing myself with NSPersistentContainer. I wonder if it's better to spawn an instance of the private context with newBackgroundContext every time I need to insert/fetch some entities in the background, or to create one private context, keep it, and use it for all background tasks through the lifetime of the app.
The documentation also offers the convenience method performBackgroundTask. Just trying to figure out the best practice here.
I generally recommend one of two approaches. (There are other setups that work, but these are two that I have used, and tested and would recommend.)
The Simple Way
You read from the viewContext and you write to the viewContext and only use the main thread. This is the simplest approach and avoids a lot of the multithreading issues that are common with Core Data. The problem is that the disk access happens on the main thread, and if you are doing a lot of it, it could slow down your app.
This approach is suitable for small, lightweight applications. Any app that has fewer than a few thousand total entities and no bulk changes at once would be a good candidate for this. A simple todo list would be a good example.
The Complex Way
The complex way is to only read from the viewContext on the main thread and do all your writing using performBackgroundTask inside a serial queue. Every block inside the performBackgroundTask refetches any managedObjects that it needs (using objectIDs), and all managedObjects that it creates are discarded at the end of the block. Each performBackgroundTask is transactional, and saveContext is called at the end of the block. A fuller description can be found here: NSPersistentContainer concurrency for saving to core data
This is a robust and functional core-data setup that can manage data at any reasonable scale.
The problem is that you must always make sure that the managedObjects are from the context you expect and are accessed on the correct thread. You also need a serial queue to make sure you don't get write conflicts. And you often need to use fetchedResultsController to make sure entities are not deleted while you are holding pointers to them.
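The serial-queue part of the complex way is independent of Core Data, so here is a rough, language-neutral sketch of just that shape, written in Go rather than Swift or Objective-C. Everything in it (store, writeQueue, submit, closeAndWait, countTo) is a hypothetical name made up for the example, and no Core Data API is used or implied. Each submitted block plays the role of one transactional write, and because a single goroutine runs the blocks one at a time, two writes can never conflict.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for a persistent store.
type store struct {
	rows map[string]int
}

// writeQueue runs submitted write blocks one at a time on a single goroutine.
type writeQueue struct {
	jobs chan func(*store)
	done chan struct{}
}

func newWriteQueue(s *store) *writeQueue {
	q := &writeQueue{jobs: make(chan func(*store)), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for job := range q.jobs {
			job(s) // only this goroutine ever touches s while the queue is open
		}
	}()
	return q
}

// submit hands one write block to the serial worker.
func (q *writeQueue) submit(job func(*store)) { q.jobs <- job }

// closeAndWait stops accepting work and blocks until every block has run.
func (q *writeQueue) closeAndWait() {
	close(q.jobs)
	<-q.done
}

// countTo submits n increment blocks from n goroutines and returns the result.
func countTo(n int) int {
	s := &store{rows: map[string]int{}}
	q := newWriteQueue(s)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			q.submit(func(s *store) { s.rows["counter"]++ })
		}()
	}
	wg.Wait()
	q.closeAndWait()
	return s.rows["counter"] // safe: the worker has finished
}

func main() {
	fmt.Println(countTo(100))
}
```

Callers never hold a reference into the store between blocks, which mirrors the advice above to refetch by objectID inside each block and to discard managed objects when the block ends.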

NSURLDownload delegate methods on a separate thread

Is anyone aware of a way to receive NSURLDownload's delegate methods on a separate thread, i.e. not the main one? I am using an NSOperationQueue to manage them but at the moment I need to use the performSelectorOnMainThread method to get it to work. The problem with this is that it drives the kernel task crazy, reaching about 30% of CPU cycles. Curiously this has only happened since upgrading to SL, when NSOperationQueue changed behaviour (not that I am dissing it, GCD rocks!)
Thanks
Colin
My first question is, what are you using NSURLDownload to do? Are you just downloading a bunch of files to the disk, or do you really want the data in memory?
If you're downloading a bunch of files to the disk and you don't want to do any special processing, I'd first try just firing off all the NSURLDownloads on the main thread, without bothering with an NSOperationQueue... I mean, how many operations are we talking about? Can they all run concurrently? The callbacks on the main thread shouldn't be too much of a problem, unless you are doing something heavyweight when you get notified you got some data, in which case it seems like...
Otherwise, I'd switch to using NSURLConnection. It's specifically documented to call you back on the thread you set it up on, and is more flexible. Of course, it's not as high-level, so if you really want files saved to disk, you're going to have to write the I/O yourself. Shouldn't be a huge hardship - it's like four extra lines of code.
-W
NSOperationQueue changed behaviour because it was buggy. It seems really solid now but yeah, it has a different personality.
Reference (http://www.mikeash.com/?page=pyblog/dont-use-nsoperationqueue.html)
Can you give more info on your problem? Do you only need to notify when the download is finished? Are you doing many downloads at once?

Address Book thread safety and performance

My sense from the Address Book documentation and my understanding of the underlying CoreData implementation suggests that Address Book should be thread safe, and making queries from multiple threads should pose no problems. But I'm having trouble finding any explicit discussion of thread safety in the docs. This raises a few questions:
Is it safe to use +sharedAddressBook on multiple threads for read-only access? I believe the answer is yes.
For write-access on background threads, it appears that you should use +addressBook instead (and save your changes manually). Do I understand this correctly?
Has anyone investigated the performance impact of making multiple simultaneous queries to Address Book on multiple threads? This should be very similar to the performance of making multiple CoreData queries on multiple threads. My sense is that I would gain little by making parallel queries since I assume they will serialize when they hit SQLite, but I'm not certain here.
I need to make dozens of queries (some complex) against AddressBook and am doing so on a background thread using NSOperation to avoid blocking the UI (which it currently does). My underlying question is whether it makes sense to set the max concurrent operations to a value larger than 1, and whether there is any danger in doing so if the application may also be writing to AddressBook at the same time on another thread.
Unless an API says it is threadsafe it is not. Even if the current implementation happens to be thread safe it might not be in the future. In other words, do not use AB from multiple threads.
As an aside, what about it being CoreData based makes you think it would be thread safe? CoreData uses a thread confinement model where it is only safe to access a context on a single thread, all the objects from the context must be accessed on the same thread.
That means that sharedAddressBook will not be thread safe if it keeps an NSManagedObjectContext around to use. It would only be safe if AB creates a new context every time it needs to do something and immediately disposes of it, or if it creates a context per thread and always uses the appropriate context (probably by storing a ref to it in the threadDictionary). In either event it would not be safe to store anything as NSManagedObjects since the contexts would be constantly destroyed, which means every ABRecord would have to store an NSManagedObjectID so it could reconstitute the object in the appropriate context whenever it needed it.
Clearly all of that is possible, it may be what is done, but it is hardly the obvious implementation.

Is it safe to manipulate objects that I created outside my thread if I don't explicitly access them on the thread which created them?

I am working on a Cocoa application, and in order to keep the GUI responsive during a massive data import (Core Data) I need to run the import outside the main thread.
Is it safe to access those objects without using locks, even though I created them in the main thread, if I don't explicitly access those objects while the thread is running?
With Core Data, you should have a separate managed object context to use for your import thread, connected to the same coordinator and persistent store. You cannot simply throw objects created in a context used by the main thread into another thread and expect them to work. Furthermore, you cannot do your own locking for this; you must at minimum lock the managed object context the objects are in, as appropriate. But if those objects are bound to by your views and controls, there are no "hooks" to which you can add that locking of the context.
There's no free lunch.
Ben Trumbull explains some of the reasons why you need to use a separate context, and why "just reading" isn't as simple or as safe as you might think, in this great post from late 2004 on the webobjects-dev list. (The whole thread is great.) He's discussing the Enterprise Objects Framework and WebObjects, but his advice is fully applicable to Core Data as well. Just replace "EC" with "NSManagedObjectContext" and "EOF" with "Core Data" in the meat of his message.
The solution to the problem of sharing data between threads in Core Data, like the Enterprise Objects Framework before it, is "don't." If you've thought about it further and you really, honestly do have to share data between threads, then the solution is to keep independent object graphs in thread-isolated contexts, and use the information in the save notification from one context to tell the other context what to re-fetch. -[NSManagedObjectContext refreshObject:mergeChanges:] is specifically designed to support this use.
I believe that this is not safe to do with NSManagedObjects (or subclasses) that are managed by a CoreData NSManagedObjectContext. In general, CoreData may do many tricky things with the state of managed objects, including firing faults related to those objects in separate threads. In particular, [NSManagedObject initWithEntity:insertIntoManagedObjectContext:] (the designated initializer for NSManagedObjects as of OS X 10.5) does not guarantee that the returned object is safe to pass to another thread.
Using CoreData with multiple threads is well documented on Apple's dev site.
The whole point of using locks is to ensure that two threads don't try to access the same resource. If you can guarantee that through some other mechanism, go for it.
Even if it's safe, it's not best practice to share data between threads without synchronizing access to those fields. It doesn't matter which thread created the object; what matters is whether more than one line of execution (thread/process) accesses the object at the same time, since that can lead to data inconsistency.
If you're absolutely sure that only one thread will ever access this object, then it'd be safe to not synchronize the access. Even then, I'd rather put synchronization in my code now than wait till later, when a change in the application puts a second thread sharing the same data without concern about synchronizing access.
Yes, it's safe. A pretty common pattern is to create an object, then add it to a queue or some other collection. A second "consumer" thread takes items from the queue and does something with them. Here, you'd need to synchronize the queue but not the objects that are added to the queue.
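As a sketch of that queue pattern for plain objects (not NSManagedObjects, which other answers here explain need their own context), something like the following Go program shows the idea. The record type and importAll function are hypothetical names made up for the example. The channel is the only synchronized piece; each record is fully built before it is sent, and the producer never touches it again afterwards.

```go
package main

import "fmt"

// record is a hypothetical imported item; nothing here is a Core Data type.
type record struct {
	name  string
	score int
}

// importAll builds records on a producer goroutine and hands each one to
// the consumer over a channel, then returns what the consumer collected.
func importAll(names []string) []record {
	queue := make(chan *record) // the queue is the only shared, synchronized piece

	go func() {
		defer close(queue)
		for i, n := range names {
			r := &record{name: n, score: i} // fully initialized before it is shared
			queue <- r                      // producer does not touch r after this
		}
	}()

	var out []record
	for r := range queue {
		r.score *= 2 // consumer has exclusive use of r, so no lock is needed
		out = append(out, *r)
	}
	return out
}

func main() {
	for _, r := range importAll([]string{"a", "b", "c"}) {
		fmt.Println(r.name, r.score)
	}
}
```

This only stays safe as long as the "never touch it after handing it over" rule is actually followed by the producer; nothing in the language enforces it.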
It's NOT a good idea to just synchronize everything and hope for the best. You will need to think very carefully about your design and exactly which threads can act upon your objects.
Two things to consider are:
You must be able to guarantee that the object is fully created and initialised before it is made available to other threads.
There must be some mechanism by which the main (GUI) thread detects that the data has been loaded and all is well. To be thread safe this will inevitably involve locking of some kind.
Yes you can do it, it will be safe
...
until the second programmer comes around and does not understand the assumptions you have made. That second (or 3rd, 4th, 5th, ...) programmer is likely to start using the object in an unsafe way (in the creator thread). The problems caused could be very subtle and difficult to track down. For that reason alone, and because it's so tempting to use this object in multiple threads, I would make the object thread safe.
To clarify (thanks to those who left comments):
By "thread safe" I mean programmatically devising a scheme to avoid threading issues. I don't necessarily mean devising a locking scheme around your object. You could find a way in your language to make it illegal (or very hard) to use the object in the creator thread. For example, limit the scope, in the creator thread, to the block of code that creates the object. Once created, pass the object over to the user thread, making sure that the creator thread no longer has a reference to it.
For example, in C++
void CreateObject()
{
    Object* sharedObj = new Object();
    PassObjectToUsingThread(sharedObj); // this function would be system dependent
}
Then in your creating thread, you no longer have access to the object after its creation, responsibility is passed to the using thread.
