Thread-safe unique entity instance in Core Data - Cocoa

I have a Message entity that has a messageID property. I'd like to ensure that there's only ever one instance of a Message entity with a given messageID. In SQL, I'd just add a unique constraint to the messageID column, but I don't know how to do this with Core Data. I don't believe it can be done in the data model itself, so how do you go about it?
My initial thought is to use a validation method to do a fetch on the NSManagedObject's context for the ID, see if it finds anything but itself, and if so, fails the validation. I suspect this will work - but I'm worried about the performance of something like that. I went through a lot of effort to minimize the fetch requests needed for the entire import routine, and having it validate by performing a fetch for every single new message entity seems a bit excessive. I can get all pre-existing objects I need and identify all the new objects I need to insert into the store using just two fetch queries before I do the actual work of importing and connecting everything together. This would add a fetch to every single update or insert in addition to those two - which would seem to eliminate any performance advantage I had by pre-processing the import data in the first place!
The main reason this is an issue is that the importer can (potentially) run several batches concurrently on several threads and may include some overlapping/duplicate data that needs to ultimately result in just one object in the store and not duplicate entries. Is there a reasonable way to do this and does what I'm asking for make sense for Core Data?

The only way to guarantee uniqueness is to do a fetch. Fortunately you can just do a -countForFetchRequest:error: and check to see if it is zero or not. That is the least expensive way to guarantee uniqueness at this time.
You can probably accomplish this in the validation or run it in the loop that is processing the data. Personally I would do it before creating the NSManagedObject so that you do not have the unnecessary allocs when a record already exists.
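For illustration, a minimal Swift sketch of that count-based check, assuming an entity named Message with a messageID string attribute as described in the question (the helper name is hypothetical):

    import CoreData

    // Hypothetical helper: returns true if a Message with this ID already exists.
    // A count request does not materialize any objects, so it stays cheap.
    func messageExists(withID messageID: String, in context: NSManagedObjectContext) throws -> Bool {
        let request = NSFetchRequest<NSFetchRequestResult>(entityName: "Message")
        request.predicate = NSPredicate(format: "messageID == %@", messageID)
        return try context.count(for: request) > 0
    }

Calling something like this before creating the NSManagedObject keeps the check inexpensive and avoids the unnecessary alloc when the record already exists.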

I don't think there is a way to easily guarantee an attribute is unique without doing a lot of work on your own. You can, of course, use CFUUIDCreate to create a globally unique UUID, which should be unique, even in a multithreaded environment. But...
The objectID (type NSManagedObjectID) of all managed objects is guaranteed to be unique within the persistent store coordinator. Since you can add arbitrarily many persistent stores to the coordinator, this effectively makes objectIDs globally unique. Why don't you use the objectID as your messageID? You can't, of course, change the objectID once it's assigned (and it won't get assigned until the context containing the inserted object is saved; until then it will be a temporary but still unique ID).
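As a rough Swift sketch of those two options (Message and messageID come from the question; the Message(context:) initializer assumes an Xcode-generated subclass, and the function name is hypothetical):

    import CoreData

    // Hedged sketch: two ways to obtain a collision-free identifier for a new Message.
    func makeMessage(in context: NSManagedObjectContext) throws -> Message {
        let message = Message(context: context)
        message.messageID = UUID().uuidString        // modern equivalent of CFUUIDCreate; safe on any thread

        try context.save()                           // the objectID becomes permanent once the context is saved
        _ = message.objectID.uriRepresentation()     // store-wide unique URI you could use as the ID instead
        return message
    }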

So you have an NSManagedObjectContext for each thread, backed by the same persistent store, is that correct? And before you save the NSManagedObjectContext, you'd like to make sure the messageID is unique, that is, that you are not updating an existing row and that it is not in one of the other contexts, correct?
Given that model (correct me if I misunderstand), I think you'd be better served having one object that manages access to the persistent store. That way, all threads would update one context and you can do your validation in there, using Marcus's -countForFetchRequest:error: suggestion. Granted, that places a bottleneck on this operation.
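A rough Swift sketch of that gatekeeper idea, using the modern queue-based concurrency API (NSPersistentContainer and perform(_:)) rather than the thread-confinement model this answer assumes; entity and attribute names are taken from the question, everything else is made up:

    import CoreData

    // Hedged sketch: funnel every import through one background context so the
    // existence check and the insert happen serially on that context's queue.
    final class MessageImporter {
        private let context: NSManagedObjectContext

        init(container: NSPersistentContainer) {
            context = container.newBackgroundContext()
        }

        func insertIfAbsent(messageID: String) {
            context.perform {
                let request = NSFetchRequest<NSFetchRequestResult>(entityName: "Message")
                request.predicate = NSPredicate(format: "messageID == %@", messageID)
                let count = (try? self.context.count(for: request)) ?? 0
                if count == 0 {
                    let message = Message(context: self.context)
                    message.messageID = messageID
                }
            }
        }
    }

This does serialize the check-and-insert, which is exactly the bottleneck mentioned above, but the serialization is what makes the uniqueness check race-free.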

Just to add my 2 cents: I think inconsistencies will occur sooner or later anyway, and the only way to mitigate them seems to be to do it at the application level with rather complex code.
So in my case I decided to allow duplicate values for what are supposed to be "unique" fields.
I added code, however, that detects these problems later (e.g. when a fetch that should return 1 object returns more than 1) and fixes them when they occur (usually by deleting).
It's a "go ahead, make a mistake, ill fix it later for you"-strategy.
This is not ideal, of course, but a valid way to attack this problen, imho.
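A minimal Swift sketch of that clean-up pass, assuming the Message/messageID model from the question (the function name is hypothetical):

    import CoreData

    // Hedged sketch: detect duplicates after the fact and keep only one of them.
    func deduplicateMessages(withID messageID: String, in context: NSManagedObjectContext) throws {
        let request = NSFetchRequest<Message>(entityName: "Message")
        request.predicate = NSPredicate(format: "messageID == %@", messageID)
        let matches = try context.fetch(request)
        for extra in matches.dropFirst() {   // keep the first match, delete the rest
            context.delete(extra)
        }
    }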

Related

What should be projection primary key on query side - CQRS, Event Sourcing, Microservices

I have one thing that confuses me.
I have 2 microservices.
One creates commands and the other consumes commands and produces events (events are stored in the Event Store).
In my example, aggregates have a Guid as the entity ID, and the Guid is created when the aggregate is created.
The thing that confuses me is: should that key (generated on the write side) be transferred via the event to the query side (the microservice that created the command)?
Or should the query side (projection) have a separate id in the read DB?
Or maybe I should generate some shared key?
What is the best solution here?
I think it all depends on your setup.
If you are doing CQRS and you have a separate read-side service (within the same bounded context), then it is up to the read-side service to model the data as it wishes, either reusing the same keys or not.
If you are communicating between two different services (separate bounded contexts) then I recommend you create new primary keys in the receiving service and use the incoming key as a foreign key. Just as you would do with relationships between two tables in a SQL-database.
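For example (a hedged sketch written in Swift; all names are made up), the receiving service can own its primary key and keep the incoming write-side key purely as a reference:

    import Foundation

    // Read-side model in the receiving bounded context.
    struct OrderProjection {
        let id: UUID               // primary key owned and generated by the read side
        let writeSideOrderId: UUID // key carried in the event, treated like a foreign key
        let summary: String
    }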
I think this depends on your requirements. Is there a specific reason to have different keys?
Given that you are using Guids as your PK, it seems simplest to reuse the PKs assigned by the write side.
Some reasons you might want to keep the keys consistent:
During command processing, an ID was returned to the client; they may have cached it and will reasonably expect to be able to use that key when querying the read side.
If your write-side data is long-lived and there is a bug in your read-side output, it is going to be much easier to debug what went wrong if your keys are consistent on the write and read sides.
Entities on the write side will use the write-side Guid PK of another entity as an FK. When you emit an event for this new dependent entity, you would want the read side to be able to build the relationship back to the principal.
This is kind of an odd question.
Your primary key on a projection could literally be anything or you might not even have one.
There is no "correct answer" for this question ... It depends entirely on the projection.
What if my projection was, say, just a flattening out of information associated with an aggregate? For example, we have an "order" and we make a row per order showing summary information about that order. Using an "OrderId" here would seemingly make some sense as my primary key.
What if my projection was building out counts of orders by Product? Well then using a "ProductItemId" would make a lot more sense.
What if in either of these cases the Ids themselves ("OrderId" and "ProductItemId") could change? Well then using another key might make a lot of sense.
What if this is an append-only table? I might not even want to have a key.
Again, there is not a ... correct ... answer here; there are many situations that you may run into.

Android room insert/update result

I've been using Room for a while. I'm from a MySQL background where you have to check the values of queries and stuff. In Room, I find this a bit complicated because so far I can either declare the DAO insert query as void or as long, returning the rowId.
If I return a long, I have to write a listener to notify the UI of success/failure.
My question is, is this necessary? Do I need the return value of inserts/updates/deletes or are these queries guaranteed to succeed?
My question is, is this necessary?
This depends and is hopefully better explained (a least a little anyway) below.
Do I need the return value of inserts/updates/deletes or are these queries guaranteed to succeed?
There is no guarantee that they will succeed. However, you may be able to assume they have or use CONFLICT handling.
Much could depend upon how the entities are coded. For example, say you had a simple table (Entity) with an id and a name, and for simplicity you have autoGenerate = true and you never allow the id to be specified when inserting. Unless the database is massive (beyond storage device capacity) or tweaked, a unique id will always result.
If the name needs to be UNIQUE then you are introducing a facet that makes it more likely that the insert will not succeed. If you had the insert's onConflict strategy as IGNORE (OnConflictStrategy.IGNORE), then a duplicate wouldn't fail, but you may want to know if nothing was inserted (-1 returned).
This is just one facet. The answer is really that you need to consider the design of the database and of the app itself. Personally, I'd always go with informing the user at least of the abnormal/unexpected, which probably then means yes, it is necessary (typically it is easier to suppress code than to add new code).

Checking uniqueness on real time application

On a real-time messaging application, I want to check whether an incoming message is unique. For this purpose, I am planning to insert a hash of the incoming message as a unique key in the DB and check if I get a unique key exception (ORA-00001 in Oracle).
Is this an efficient way or is there a better way to consider for this case ?
For those who want to know, the program is written in Java and we use Oracle as the DB.
If you're trying to get around the performance problem of uniqueness tests on very large strings, then this is a decent way of achieving it, yes.
You might need a way to deal with hash collisions, though, as the presence of a unique key would prevent different messages having the same hash from loading. One way would be to check for existing matching hashes and do a comparison test against the full text of the message. It would keep your index size down, as you'd index on the hash rather than the message text, but it would not be completely foolproof: two identical messages could be loaded by different sessions if the timing was exactly right (or wrong, depending on your perspective).

best way to deal with multiple transaction to the database with entity framework

A bit of advice really: I am building an MVC application that takes in feeds for products from multiple sources. This can run into the millions, and despite my best advice to the client to split all his feeds into smaller chunks, I know they will probably try and do a thousand at a go.
Now the main problem is that I don't want to loop through every XML record and do an insert.
What I would rather do is queue up a stack of inserts and then send them into the database in one massive transaction, much like a SQL bulk import of a whole table.
Is this possible? If so, how, and what is it called?
Also, if I did want to re-insert repeated products again and again when nothing has changed, what would be the best practice for this? Could I maybe loop through an already fetched dataset?
I'm not sure what is best to do here, so ask the people, what is the consensus when it comes to a scenario like this.
thanks
With Entity Framework you will get a single DB insert per record you are inserting; there will be no bulk insert (if that is what you were looking for).
However, to enclose this in a transaction, you need to do nothing but add your items to the context.
http://msdn.microsoft.com/en-us/library/bb336792.aspx
This will automatically be wrapped in a transaction when you call SaveChanges. All you need to do is ensure you use a single context instance and call .Add(yourObject) on the context.
So just wait to call SaveChanges until all of the objects have been added to the context.

How can I sort by a transformable attribute in an NSFetchedResultsController?

I'm using NSValueTransformers to encrypt attributes (strings, dates, etc.) in my Core Data model, but I'm pretty sure it's interfering with the sorting in my NSFetchedResultsController.
Does anyone know if there's a way to get around this? I suppose it depends on how the sort is performed; if it's always only performed directly on the database, then I'm probably out of luck. If it sorts on the objects themselves, then perhaps there's a way to activate the transformation before the sort occurs.
I'm guessing it's directly on the database, though, since the sort would be key in grabbing subsets of the collection, which is the main benefit of NSFetchedResultsController anyway.
Note: I should add that there's some strange behavior here... the collection doesn't sort in the first session (the session where the objects are created), but it does sort in subsequent sessions (where the objects already exist and are just being retrieved). So perhaps sorting does work with transformables, but maybe there is a caveat in that they have to be saved first or something like that (?)
If you are sorting within the NSFetchedResultsController then it is against the store (i.e. the database). However, you can perform a "secondary" sort against the results when they are in memory, and therefore decrypted, by calling -sortedArrayUsingDescriptors:
update
I believe your inconsistent behavior is probably based on what is already in memory vs. what is being read directly from disk.
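A rough Swift sketch of that secondary, in-memory sort (the subject attribute is hypothetical; it stands in for one of the transformable, encrypted properties):

    import CoreData

    // Hedged sketch: fetch first, then sort in memory once the transformable
    // attributes have been decrypted into real property values.
    func sortedBySubject(_ objects: [NSManagedObject]) -> [Any] {
        let descriptors = [NSSortDescriptor(key: "subject", ascending: true)]
        return (objects as NSArray).sortedArray(using: descriptors)
    }

Note that this only reorders an array you have already fetched; it does not change how the NSFetchedResultsController itself orders or sections its results.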
