Best way to deal with multiple transactions to the database with Entity Framework - asp.net-mvc-3

A bit of advice, really: I am building an MVC application that takes in feeds for products from multiple sources. This can run into millions, and despite my best advice to the client to split all his feeds into smaller chunks, I know they will probably try to do a thousand at a go.
Now the main problem is that I don't want to loop through every XML record and do an insert.
What I would rather do is queue a stack of inserts and then fly them into the database in one massive transaction, very much like a database SQL import of a whole table.
Is this possible? If so, how, and what is it called?
Also, if I did want to re-insert repeated products again and again when nothing has changed, what would be the best practice for this? Could I maybe loop through an already fetched dataset?
I'm not sure what is best to do here, so I'm asking the people: what is the consensus when it comes to a scenario like this?
Thanks.

With Entity Framework you will get a single DB insert per record you are inserting; there will be no bulk insert (if that is what you were looking for).
However, to enclose this in a transaction, you need to do nothing but add your items to the context class.
http://msdn.microsoft.com/en-us/library/bb336792.aspx
SaveChanges automatically wraps the pending inserts in a transaction when you call it. All you need to do is ensure you use a single context instance and call .Add(yourObject) on it.
So just wait to call SaveChanges until all of the objects have been added to the context.
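
The same "stage everything, commit once" idea is not specific to Entity Framework. As a minimal sketch in Python with the standard-library sqlite3 module (this is not EF; the products table and the import_products helper are hypothetical names for illustration), each record is still its own insert, but the whole batch commits or rolls back together:

```python
import sqlite3

def import_products(conn, products):
    """Insert all products in one transaction; roll back everything on error."""
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.executemany(
            "INSERT INTO products (sku, name) VALUES (?, ?)",
            products,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
conn.commit()

import_products(conn, [("A1", "Widget"), ("B2", "Gadget")])
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# A duplicate key fails the whole batch, so "C3" is not kept either.
try:
    import_products(conn, [("C3", "Gizmo"), ("A1", "Widget again")])
except sqlite3.IntegrityError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

The trade-off is that one bad record rejects the entire feed, which may or may not be what the client wants.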

Related

Spring jpa performance the smart way

I have a service that listens to multiple queues and saves the data to a database.
One queue gives me a person.
Now, if I code it really simply, I just get one message from the queue at a time.
I do the following:
Start transaction
Select from person table to check if it exists.
Either update existing or create a new entity
repository.save(entity)
End transaction
The above is clean and robust. But I get a lot of messages, and it's not fast enough.
To improve performance I have done this:
Fetch 100 messages from queue
then
Start transaction
Select all persons where id in (...) in one query, using the ids from the incoming persons
Iterate over the messages and for each one check if it was selected above. If yes, update it; if not, create a new one
Save all changes with batch update/create
End transaction
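
As a rough sketch of the batched steps above, here is the same flow in Python with the standard-library sqlite3 module rather than Spring Data JPA. The person table and the save_batch helper are hypothetical, and the sketch assumes the ids within one batch are distinct:

```python
import sqlite3

def save_batch(conn, persons):
    """persons: list of (id, name). Update existing rows, insert new ones."""
    if not persons:
        return 0, 0
    ids = [pid for pid, _ in persons]
    placeholders = ",".join("?" * len(ids))
    with conn:  # one transaction for the whole batch
        # One query finds which of the incoming ids already exist.
        existing = {
            row[0]
            for row in conn.execute(
                f"SELECT id FROM person WHERE id IN ({placeholders})", ids
            )
        }
        updates = [(name, pid) for pid, name in persons if pid in existing]
        inserts = [(pid, name) for pid, name in persons if pid not in existing]
        conn.executemany("UPDATE person SET name = ? WHERE id = ?", updates)
        conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)", inserts)
    return len(updates), len(inserts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
conn.commit()

result = save_batch(conn, [(1, "Ann B."), (2, "Bob")])
names = dict(conn.execute("SELECT id, name FROM person"))
```
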
If it's a simple message, the above is really good. It performs. But if the message is complicated, or the logic I need to run when I get the message is, then the above is not so good, since there is a chance some of the messages will result in a rollback, and the code becomes hard to read.
Any ideas on how to make it run fast in a smarter way?
Why do you need to rollback? Can't you just not execute whatever it is that then has to be rolled back?
IMO the smartest solution would be to code this with a single "upsert" statement. Not sure which database you use, but PostgreSQL, for example, has the ON CONFLICT clause for inserts, which can be used to do updates if the row already exists. You could even configure Hibernate to use that on insert by using the @SQLInsert annotation.
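
To illustrate the upsert idea, here is a small sketch using Python's standard-library sqlite3 module. It assumes SQLite 3.24 or newer, which accepts an ON CONFLICT ... DO UPDATE clause similar to the PostgreSQL one mentioned above; the person table is a hypothetical example, and this is not Hibernate code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# "excluded" refers to the row that the INSERT tried to add.
UPSERT = """
INSERT INTO person (id, name) VALUES (?, ?)
ON CONFLICT (id) DO UPDATE SET name = excluded.name
"""

with conn:  # first batch: both rows are new
    conn.executemany(UPSERT, [(1, "Ann"), (2, "Bob")])
with conn:  # second batch: id 2 exists and is updated, id 3 is inserted
    conn.executemany(UPSERT, [(2, "Robert"), (3, "Cy")])

rows = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
```

With this approach there is no separate select step, so the per-message "does it exist?" branching disappears from the application code.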

Using Hibernate / Spring, what's the best way to watch a table for changes to individual records?

Q: What is the proper way to watch a table for record level changes using Hibernate / Spring? The DB is a typical relational database system. Our intent is to move to an in-memory solution some time in the future but we can't do it just yet. Q: Are we on the right track or is there a better approach? Examples?
We've thought of two possibilities. One is to load and cache the whole table and the other is to implement a hibernate event listener. Problem is that we aren't interested in events originating in the current VM. What we are interested in is if someone else changes the table. If we load and cache the entire table we'll still have to figure out an efficient way to know when it changes so we may end up implementing both a cache and a listener. Of course a listener might not help us if it doesn't hear changes external to the VM. Our interest is in individual records which is to say that if a record changes, we want Java to update something else based on that record. Ideally we want to avoid re-loading the entire cache, assuming we use one, from scratch and instead update specific records in the cache as they change.

Entity Framework performance issues

I'm having a problem with performance with the entity framework.
Here's the scenario.
I have an entity called "Segment". Segments are stored in their own table in the DB.
"Segments" have a custom property called "IsHPMSSegment", which is a calculated field. It is calculated by calling a stored procedure in the DB that takes the "ID" of the "Segment" and compares some of its values against values in another table.
One of the queries we need to run is stated as follows: Get me all Segments that are HPMS Segments.
Since the "IsHPMSSegment" value of "Segment" is a custom property, I cannot retrieve its value directly from the DB when the segments are first selected. Instead, as each "Segment" is being created in the result set, Entity Framework queries the DB again to get the value for "IsHPMSSegment". So every time a "Segment" is being filled, it has to query the DB once more.
Example: If I get all "Segments" with an ID greater than 5, and the resultset is 1000 segments, then the DB is hit for a total of 1001 times. Once for the initial select query that gets the 1000 records, and then another 1000 times to fill the "IsHPMSSegment" value of each of the "Segments".
The only workaround I can think of is to create a view in the DB ("vSegments") that contains this extra calculated property, and then link my EF object to this view instead of to the "Segment" table. That way this property would be filled in the first query.
I then have two choices for the remaining functionality (insert, update, delete):
1) wire up my insert, update, and delete functions for the entity to stored procedures
2) make the view updatable
All of this seems like a lot of extra work just to address this performance issue, and I'm left wondering what benefit there is to using EF at all?
Is there a better solution to the "view + stored procedures" idea I stated above (still using EF)?
If not, what benefit does EF provide me? If I were creating my own DAL from scratch, I would still have to create stored procedures and/or views. How much effort am I really saving by using EF and having to program around its limitations?
On top of all this, EF doesn't seem to handle updating multiple records at once in a satisfactory way. It sends a single update statement call for each record you are updating, even if you are updating them all exactly the same. This also seems to be a detractor (unless there is some workaround for this that I am unaware of).
This is entirely subjective. In my opinion, the separation of duties between your layers is getting mixed up and causing you problems.
My suggestion would be to remove your stored procedure and move the logic into your business layer. Creation of your "segments" should start in your business layer and have all the appropriate logic done against it. The final state can then be pushed into your data access layer for persistence.
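
For what it's worth, the view workaround from the question can be sketched outside EF. The following uses Python's standard-library sqlite3 module; the hpms_route table and the rule "a segment is an HPMS segment if its route is listed there" are invented purely for illustration, since the real stored procedure logic is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE segment (id INTEGER PRIMARY KEY, route TEXT);
CREATE TABLE hpms_route (route TEXT PRIMARY KEY);
-- Hypothetical rule: a segment is an HPMS segment if its route is listed.
CREATE VIEW vSegments AS
SELECT s.id, s.route,
       EXISTS (SELECT 1 FROM hpms_route h WHERE h.route = s.route) AS is_hpms
FROM segment s;
INSERT INTO segment VALUES (1, 'I-80'), (2, 'Main St'), (3, 'I-5');
INSERT INTO hpms_route VALUES ('I-80'), ('I-5');
""")

# One query returns the calculated flag with each row,
# instead of one extra query per segment.
hpms_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM vSegments WHERE is_hpms = 1 ORDER BY id"
    )
]
```
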

Can I substitute savepoints for starting new transactions in Oracle?

Right now the process that we're using for inserting sets of records is something like this:
(and note that "set of records" means something like a person's record along with their addresses, phone numbers, or any other joined tables).
Start a transaction.
Insert a set of records that are related.
Commit if everything was successful, roll back otherwise.
Go back to step 1 for the next set of records.
Should we be doing something more like this?
Start a transaction at the beginning of the script
Start a save point for each set of records.
Insert a set of related records.
Roll back to the savepoint if there is an error, go on if everything is successful.
Commit the transaction at the end of the script.
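
The savepoint-based process above can be sketched as follows. This uses Python's standard-library sqlite3 module rather than Oracle, so the savepoint statements are SQLite's and the exact syntax differs between databases; the person and phone tables and the sample data are hypothetical:

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so BEGIN, SAVEPOINT, and COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE phone (person_id INTEGER NOT NULL, number TEXT NOT NULL)")

record_sets = [
    {"person": (1, "Ann"), "phones": ["555-0100"]},
    {"person": (2, "Bob"), "phones": [None]},  # phone insert violates NOT NULL
    {"person": (3, "Cy"), "phones": ["555-0102", "555-0103"]},
]

failed = []
conn.execute("BEGIN")
for rs in record_sets:
    pid = rs["person"][0]
    conn.execute("SAVEPOINT record_set")
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", rs["person"])
        conn.executemany(
            "INSERT INTO phone VALUES (?, ?)",
            [(pid, number) for number in rs["phones"]],
        )
        conn.execute("RELEASE SAVEPOINT record_set")
    except sqlite3.IntegrityError:
        # Undo only this set (including its person row); earlier sets are kept.
        conn.execute("ROLLBACK TO SAVEPOINT record_set")
        conn.execute("RELEASE SAVEPOINT record_set")
        failed.append(pid)
conn.execute("COMMIT")

person_ids = [r[0] for r in conn.execute("SELECT id FROM person ORDER BY id")]
phone_count = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
```
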
After having some issues with ORA-01555 and reading a few Ask Tom articles (like this one), I'm thinking about trying out the second process. Of course, as Tom points out, starting a new transaction is something that should be defined by business needs. Is the second process worth trying out, or is it a bad idea?
A transaction should be a meaningful Unit Of Work. But what constitutes a Unit Of Work depends upon context. In an OLTP system a Unit Of Work would be a single Person, along with their address information, etc. But it sounds as if you are implementing some form of batch processing, which is loading lots of Persons.
If you are having problems with ORA-1555, it is almost certainly because you have a long-running query supplying data which is being updated by other transactions. Committing inside your loop contributes to the cyclical use of UNDO segments, and so will tend to increase the likelihood that the segments you are relying on to provide read consistency will have been reused. So, not doing that is probably a good idea.
Whether using SAVEPOINTs is the solution is a different matter. I'm not sure what advantage that would give you in your situation. As you are working with Oracle10g perhaps you should consider using bulk DML error logging instead.
Alternatively you might wish to rewrite the driving query so that it works with smaller chunks of data. Without knowing more about the specifics of your process I can't give specific advice. But in general, instead of opening one cursor for 10000 records it might be better to open it twenty times for 500 rows a pop. The other thing to consider is whether the insertion process can be made more efficient, say by using bulk collection and FORALL.
Some thoughts...
Seems to me one of the points of the asktom link was to size your rollback/undo appropriately to avoid the 1555's. Is there some reason this is not possible? As he points out, it's far cheaper to buy disk than it is to write/maintain code to handle getting around rollback limitations (although I had to do a double-take after reading the $250 pricetag for a 36Gb drive - that thread started in 2002! Good illustration of Moore's Law!)
This link (Burleson) shows one possible issue with savepoints.
Is your transaction in actuality steps 2,3, and 5 in your second scenario? If so, that's what I'd do - commit each transaction. Sounds a bit to me like scenario 1 is a collection of transactions rolled into one?

Thread-safe unique entity instance in Core Data

I have a Message entity that has a messageID property. I'd like to ensure that there's only ever one instance of a Message entity with a given messageID. In SQL, I'd just add a unique constraint to the messageID column, but I don't know how to do this with Core Data. I don't believe it can be done in the data model itself, so how do you go about it?
My initial thought is to use a validation method to do a fetch on the NSManagedObject's context for the ID, see if it finds anything but itself, and if so, fails the validation. I suspect this will work - but I'm worried about the performance of something like that. I went through a lot of effort to minimize the fetch requests needed for the entire import routine, and having it validate by performing a fetch for every single new message entity seems a bit excessive. I can get all pre-existing objects I need and identify all the new objects I need to insert into the store using just two fetch queries before I do the actual work of importing and connecting everything together. This would add a fetch to every single update or insert in addition to those two - which would seem to eliminate any performance advantage I had by pre-processing the import data in the first place!
The main reason this is an issue is that the importer can (potentially) run several batches concurrently on several threads and may include some overlapping/duplicate data that needs to ultimately result in just one object in the store and not duplicate entries. Is there a reasonable way to do this and does what I'm asking for make sense for Core Data?
The only way to guarantee uniqueness is to do a fetch. Fortunately you can just do a -countForFetchRequest:error: and check to see if it is zero or not. That is the least expensive way to guarantee uniqueness at this time.
You can probably accomplish this in the validation or run it in the loop that is processing the data. Personally I would do it above the creation of the NSManagedObject so that you do not have the unnecessary allocs when a record already exists.
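
The "count first, create only if absent" idea can be sketched outside Core Data as well. This is a Python example with the standard-library sqlite3 module, not Core Data code; the message table deliberately has no unique constraint to mirror the situation in the question, and insert_if_absent is a hypothetical helper:

```python
import sqlite3

def insert_if_absent(conn, message_id, body):
    """Return True if a new row was created, False if the ID already existed."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM message WHERE message_id = ?", (message_id,)
    ).fetchone()
    if count:
        return False  # skip building the new record entirely
    with conn:
        conn.execute(
            "INSERT INTO message (message_id, body) VALUES (?, ?)",
            (message_id, body),
        )
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (message_id TEXT, body TEXT)")
conn.commit()

created = [insert_if_absent(conn, mid, "hello") for mid in ["m1", "m2", "m1"]]
total = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

Note that a check followed by an insert is only safe when one writer runs at a time; with several importers running concurrently, two of them could both see a count of zero before either inserts.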
I don't think there is a way to easily guarantee an attribute is unique without doing a lot of work on your own. You can, of course, use CFUUIDCreate to create a globally unique UUID, which should be unique even in a multithreaded environment. But...
The objectID (type NSManagedObjectID) of all managed objects is guaranteed to be unique within the persistent store coordinator. Since you can add arbitrarily many persistent stores to the coordinator, this guarantee basically guarantees that the objectIDs are globally unique. Why don't you use the objectID as your messageID? You can't, of course, change the objectID once it's assigned (and it won't get assigned until the context containing the inserted object is saved; until then it will be a temporary but still unique ID).
So you have an NSManagedObjectContext for each thread, backed by the same persistent store, is that correct? And before you save the NSManagedObjectContext, you'd like to make sure the messageID is unique, that is, that you are not updating an existing row, and that it is not in one of the other contexts, correct?
Given that model (correct me if I misunderstand), I think you'd be better served having one object that manages access to the persistent store. That way, all threads would update one context and you can do your validation in there, using Marcus's -countForFetchRequest:error: suggestion. Granted, that places a bottleneck on this operation.
Just to add my 2 cents: I think inconsistencies will occur sooner or later anyway, and the only way to mitigate them seems to be to do it on an application-level with rather complex code.
So in my case I decided to allow duplicate values for what are supposed to be "unique" fields.
I added code, however, that detects these problems later (e.g. when a fetch that should return 1 object returns more than 1) and fixes them when they occur (usually by deleting).
It's a "go ahead, make a mistake, I'll fix it later for you" strategy.
This is not ideal, of course, but a valid way to attack this problem, IMHO.

Resources