we have designed a multi threaded server that use linq to sql at each of its threads. the test are not look so good... from reviewing our code, a big question has be raised: does linq to sql supports such environments at all?
if yes, we assume that we should create a dedicated DataContext object for each thread? if yes, what is the cost of such approach?
if it will be to complicated, I guess we will dump the linq to sql and roll back to connected approach.. (?)
similar question was raised from the (Per Call)WCF based API side: we use a singleton DataContext object for all WCF function calls? or should we initiate a DataContext for each WCF API call?
Thanks!
ofer
DataContext
Any instance members are not guaranteed to be thread safe.
Read as: this class is not thread safe and instances should not be shared between threads unless you implement locking.
Be aware that (by default) datacontexts track the record instances that they load - those instances should also not be shared between threads. Those tracked record instances are not refreshed for changes in the database automatically when requeried.
Be aware that calling SubmitChanges on a datacontext sends all modified records that it is tracking back to the database... this could be really bad with multiple users sharing a datacontext even with locking.
Also from the same article:
In general, a DataContext instance is designed to last for one "unit of work" however your application defines that term. A DataContext is lightweight and is not expensive to create. A typical LINQ to SQL application creates DataContext instances at method scope or as a member of short-lived classes that represent a logical set of related database operations.
if it will be to complicated, I guess we will dump the linq to sql and roll back to connected approach..
You should also not share a SqlConnection object between threads without implementing locking.
Threads have their own respective stack of variables and references. So yes, I believe you would need a DataContext for each thread.
Related
When you develop an ASP.NET application using the repository pattern, do each of your methods create a new entity container instance (context) with a using block for each method, or do you create a class-level/private instance of the container for use by any of the repository methods until the repository itself is disposed? Other than what I note below, what are the advantages/disadvantages? Is there a way to combine the benefits of each of these that I'm just not seeing? Does your repository implement IDisposable, allowing you to create using blocks for instances of your repo?
Multiple containers (vs. single)
Advantages:
Preventing connections from being auto-closed/disposed (will be closed at the end of the using block).
Helps force you to only pull into memory what you need for a particular view/viewmodel, and in less round-trips (you will get a connection error for anything you attempt to lazy load).
Disadvantages:
Access of child entities within the Controller/View is limited to what you called with Include()
For pages like a dashboard index that shows information gathered from many tables (many different repository method calls), we will add the overhead of creating and disposing many entity containers.
If you are instantiating your context in your repository, then you should always do it locally, and wrap it in a using statement.
If you're using Dependency Injection to inject the context, then let your DI container handle calling dispose on the context when the request is done.
Don't instantiate your context directly as a class member, since this will not dispose of the contexts resources until garbage collection occurs. If you do, then you will need to implement IDipsosable to dispose the context, and make sure that whatever is using your repository properly disposes of your repository.
I, personally, put my context at the class level in my repository. My primary reason for doing so is because a distinct advantage of the repository pattern is that I can easily swap repositories and take advantage of a different backend. Remember - the purpose of the repository pattern is that you provide an interface that provides back data to some client. If you ever switch your data source, or just want to provide a new data source on the fly (via dependency injection), you've created a much more difficult problem if you do this on a per-method level.
Microsoft's MSDN site has good information the repository pattern. Hopefully this helps clarify some things.
I disagree with all four points:
Preventing connections from being auto-closed/disposed (will be closed
at the end of the using block).
In my opinion it doesn't matter if you dispose the context on method level, repository instance level or request level. (You have to dispose the context of course at the end of a single request - either by wrapping the repository method in a using statement or by implementing IDisposable on the repository class (as you proposed) and wrapping the repository instance in a using statement in the controller action or by instantiating the repository in the controller constructor and dispose it in the Dispose override of the controller class - or by instantiating the context when the request begins and diposing it when the request ends (some Dependency Injection containers will help to do this work).) Why should the context be "auto-disposed"? In desktop application it is possible and common to have a context per window/view which might be open for hours.
Helps force you to only pull into memory what you need for a
particular view/viewmodel, and in less round-trips (you will get a
connection error for anything you attempt to lazy load).
Honestly I would enforce this by disabling lazy loading altogether. I don't see any benefit of lazy loading in a web application where the client is disconnected from the server anyway. In your controller actions you always know what you need to load and can use eager or explicit loading. To avoid memory overhead and improve performance, you can always disable change tracking for GET requests because EF can't track changes on a client's web page anyway.
Access of child entities within the Controller/View is limited to what
you called with Include()
Which is rather an advantage than a disadvantage because you don't have the unwished surprises of lazy loading. If you need to populate child entities later in the controller actions, depending on some condition, you could load them through additional repository methods (LoadNavigationProperty or something) with the same or even a new context.
For pages like a dashboard index that shows information gathered from
many tables (many different repository method calls), we will add the
overhead of creating and disposing many entity containers.
Creating contexts - and I don't think we are talking about hundreds or thousands of instances - is a cheap operation. I would call this a very theoretical overhead which doesn't play a role in practice.
I've used both approaches you mentioned in web applications and also the third option, namely to create a single context per request and inject this same context into every repository/service I need in a controller action. They all three worked for me.
Of course if you use multiple contexts you have to be careful to do all the work in the same unit of work to avoid attaching entities to multiple contexts which will lead to well know exceptions. It's usually not a problem to avoid this situations but requires a bit more attention, especially when processing POST requests.
I lately use contexts per request, because it is easier and I just don't see the benefit of having very narrow contexts and I see no reason to use more than one single unit of work for the whole request processing. If I would need multiple contexts - for whatever reason - I could always create specialized methods which act with their own context instead of the "default context" of the request.
I have a n-tier application based on pretty classic different layers: User Interface, Services (WCF), Business Logic and Data Access.
Database (Sql Server) is obviously quered throught Entity Framework, the problem is basically that every call starts from user interface and go throught all the layers, but doing that I need to create a new ObjectContext each time for every operation and that makes performance very bad because every time I need to reload metadata and recompile the query.
The most suggested pattern it would be the one below and it is what I'm actually doing: creating and passing the new context throught business layer methods each time the service receives a call
public BusinessObject GetQuery(){
using (MyObjectContext context = new MyObjectContext()){
//..do something } }
For easy query I don't see any particular dealy and it works fine but for complex and heavy query it makes a 2 seconds query to keep going for like 15 seconds each call.
I could set the ObjectContext static and it would solve the performance issue but it appears to be not suggested by anyone, also because I won't be able to access the context at the same time from different thread and multiple calls raise an exception. I could make it thread-safe but mantain the same ObjectContext for long time makes it bigger and bigger (and slower) because the reference it imports each query it execute a query.
The architecture I have I think it is the most common so what is the best and known way to implement and use ObjectContext?
Thank you,
Marco
In a Web context, it's best to use a stateless approach and create an ObjectContext for each request.
The cost of ObjectContext construction are minimal. The metadata is loaded from a global cache so only the first call will have to load it.
Static is definitely not a good idea. The ObjectContext is not thread save and this will lead to problems when using it in a WCF service with multiple calls. Making it thread save will result in less performance and it can cause subtle errors when reusing it in multiple requests.
Check this info: How to decide on a lifetime for your ObjectContext
Working with a static object context is not a good idea. A static context will be shared by all users of the web application meaning that when one user makes modifications to a context such as calling saveChanges , all other users using the context will be affected (this would be a problem when supposing they have added or updated data to the context but have not called save changes). The best practice while working with object context is to keep it alive for the period of the request and use if to perform any atomic business operations. You would want to check out the UnitOfWork pattern and repository pattern
uow
uow and repository in EF
If you feel you are having performance issues with your queries and there is a possibility that you would reuse your query , I would recommend you use precompiled linq queries. You can check out the links below for more info
precompiled linq julie lermann
precompiled linq
What you show is the typical pattern to use a context - by request, similar to using a database connection.
What makes you think the bad performance is related to recreating the context? This is very, very likely not the case. How did you measure this impact?
If you have such performance critical code that this overhead truly matters you should not use Entity Framework since there always will be some overhead, even if the overhead should be very little in the general case. I would start focusing on your data model though and the underlying data store which will have a much larger impact on your query performance. Have you optimized your queries? Did you put indexes everywhere you need them? Can you de-normalize the data to remove joins?
Imagine you have large amount of data in database approx. ~100Mb. We need to process all data somehow (update or export to somewhere else). How to implement this task with good performance ? How to setup transaction propagation ?
Example 1# (with bad performance) :
#Singleton
public ServiceBean {
procesAllData(){
List<Entity> entityList = dao.findAll();
for(...){
process(entity);
}
}
private void process(Entity ent){
//data processing
//saves data back (UPDATE operation) or exports to somewhere else (just READs from DB)
}
}
What could be improved here ?
In my opinion :
I would set hibernate batch size (see hibernate documentation for batch processing).
I would separated ServiceBean into two Spring beans with different transactions settings. Method processAllData() should run out of transaction, because it operates with large amounts of data and potentional rollback wouldnt be 'quick' (i guess). Method process(Entity entity) would run in transaction - no big thing to make rollback in the case of one data entity.
Do you agree ? Any tips ?
Here are 2 basic strategies:
JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
There are a few things to keep in mind here:
Loading all entites into memory with a findAll method can lead to OOM exceptions.
You need to avoid attaching all of the entities to a session - since everytime hibernate executes a flush it will need to dirty check every attached entity. This will quickly grind your processing to a halt.
Hibernate provides a stateless session which you can use with a scrollable results set to scroll through entities one by one - docs here. You can then use this session to update the entity without ever attaching it to a session.
The other alternative is to use a stateful session but clear the session at regular intervals as shown here.
I hope this is useful advice.
If I expose IQueryable from my service layer, wouldn't the database calls be less if I need to grab information from multiple services?
For example, I'd like to display 2 separate lists on a page, Posts and Users. I have 2 separate services that provides a list of these. If both provides IQueryable, will they be joint in 1 database call? Each repository creates a context for itself.
It's best to think of an IQueryable<T> as a single query waiting to be run. So if you return 2 IQueryable<T> instances and run them in the controller, it wouldn't be any different than running them separably in their own service methods. Each time you execute the IQuerable<T> to get results, it will run the query by itself independent of other IQuerable<T> objects.
The only time (as far as I know) it will make an impact if there is enough time between the two service calls that the database connection might close, but you would need a considerable amount of time in between the service calls for that to be the case.
Returning IQuerable<T> to the controller still has some usefulness, such as easier handling of paging and sorting (so sorting is done on the controller and is not done on the service layer which doesn't necessarily care about how data is sorted or paged). This isn't a performance concern though, and people will disagree about if it's best to do this in the controller or not (I've seen reputable developers do this and give well thought out reasons why).
No. The best an IQueryable can do is reduce the number of calls within a singular database context. An IQueryable will not cross contexts.
Personally, I don't use IQueryables past the repositories for a number of reasons:
1) I don't use the same domain objects as database objects, and seeing "no translation to SQL" pisses me off ;)
2) I don't like the necessary structure for IQueryables in views: foreach (var item in collection){var tempItem = item; code on tempItem}
3) I've come up with a method of passing generic filters to the data layer (LinqKit and PredicateBuilder are gods)
If these reasons don't apply to you, of course you should feel free to use IQueryables to whichever layer you desire.
Not with two different contexts.
Definitely NO. It's a leaky abstraction.
It allows abominations like this:
q.Where(x=>{Console.WriteLine("fail");return true;});
Thing is - when exposing IQueryable, You are saying that Your data layer fully supports linq to objects.
If you make two method calls you will make two queries.
You can combine the methods into a single method which gets all the data at once.
If you are implementing the repository pattern you will have an easier time if you instantiate one database context per request.
Your service layer is exactly that, a layer which serves up what you need. Often times my service layers are named things like SearchService which has methods for returning every packaged collection I will ever need (the actual view models themselves). And if I ever need a new search, my service layer gets a new method. The backing for your service layer can then contain any data backing or persistence model you would like, be it a repository or Entity Framework provider, etc.
To answer your question though, the line needs to be drawn at the service layer, all queries need to be contained within it and only data returned.
I have a method that returns lot of data, should I use #TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED) for this method. The method perform a JPA query an loads the full content of a table (about 1000 rows).
The client to this method - is that already in a transaction? When you use NotSupported the caller transaction will be suspended. If not I would say, just put Never as the transaction type. Never is better since callers know they are not supposed to call this method from inside a transaction. A more straight forward contract.
We always use Never for methods that do more processing so that developers are aware right off the bat not to call if they are involved in a transaction already. Hope it helps.
I would care to disagree as it seldom happens that user is not in a transaction in almost all the systems. The best approach is to use NOT SUPPORTED so that the transaction is suspended if the caller is in any transaction already. NEVER is troublesome unless you have a series of calls which are all in NO TRANSACTION scope. In short, NOT SUPPORTED is the type one should use.
As far as I know (at least this is the case with Hibernate), you cannot use JPA outside of a transaction as the entity manager's lifecycle is linked to the transaction's lifecycle. So the actual method that does the query must be transactional.
However, you can set it to TransactionAttributeType.REQUIRES_NEW; this would suspend any existing transaction, start a new one, and stop it when the method returns. That means all your entities would be detached by the time they reach the caller, which is what it sounds like you're trying to achieve.
In more complex systems, it pays to completely separate your data layer from your business layer and create a new set of object. Your method will then call the JPA query, then use the entities returned to populate objects from your business layer, and return those. That way the caller can never get their hands on the actual JPA entities and you are free to do in your data layer what you want, since now it's just an implementation detail. (Heck, you could change the database call to a remote API call and your caller wouldn't have to know.)