Entity Framework: Doing large queries - performance

I'm probably addressing one of the bigger usability issues in EF.
I need to perform a calculation on a very big part of a model. For example, say we need a Building with all of its doors and the categories of those doors. But I'd also need the windows, furniture, roof etc.
And imagine that my logic also depends on more coupled tables behind those categories (subcategories etc.).
We need most of this model at a lot of points in the code, so I'd need to have the whole model filled and linked up by EF.
To do this, we are simply querying the ObjectContext and using type-safe includes.
But this gets impractical and error-prone.
Does anyone have suggestions for tackling this kind of problem?

Use projection to get only the values you need, especially if you don't intend to update everything. You probably don't need every property of a piece of furniture, etc. So instead of retrieving the entity itself, project what you want:
from b in Context.Buildings
where b.Id == 123
select new
{
    Name = b.Name,
    Rooms = from r in b.Rooms
            select new
            {
                XDimension = r.XDimension,
                // etc.
            }
    // etc.
}
Now you no longer have to worry about whether something is loaded; the stuff you need is loaded, and the stuff you don't need is not. The generated SQL will be dramatically simpler, as well.
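To show the same idea outside of EF, here is a minimal, language-neutral sketch in Python with the standard sqlite3 module. The schema (buildings, rooms, x_dimension, the unused blueprint and notes columns) and the helper load_building_summary are made up for illustration; the point is only that the query names the columns the calculation needs instead of materialising whole rows.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT, blueprint BLOB);
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, building_id INTEGER,
                        x_dimension REAL, y_dimension REAL, notes TEXT);
    INSERT INTO buildings VALUES (123, 'HQ', x'00');
    INSERT INTO rooms VALUES (1, 123, 4.0, 5.0, 'corner office'),
                             (2, 123, 3.5, 2.0, 'storage');
""")


def load_building_summary(conn, building_id):
    """Fetch only the columns the calculation needs, not whole rows."""
    name_row = conn.execute(
        "SELECT name FROM buildings WHERE id = ?", (building_id,)
    ).fetchone()
    if name_row is None:
        return None
    rooms = conn.execute(
        "SELECT x_dimension FROM rooms WHERE building_id = ? ORDER BY id",
        (building_id,),
    ).fetchall()
    # The large blueprint and notes columns are never read from the database.
    return {"name": name_row[0], "rooms": [{"x_dimension": r[0]} for r in rooms]}


summary = load_building_summary(conn, 123)
```

The result is a plain read-only structure, which matches the "if you don't intend to update everything" condition above.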

Laravel - 2 queries over nothing?

Hello guys,
in the official doc of Eloquent for Laravel, this is the way to make an update to a table :
$flight = App\Flight::find(1);
$flight->name = 'New Flight Name';
$flight->save();
I must say, I don't really understand that. To me, it means that a very basic update takes 2 queries to the database: a select and THEN an update?
Could anyone explain to me why this would be a good solution?
Thanks !
Abstractions pay off in complex applications. This code is so simple that you cannot feel the advantage. Object-Relational Mapping (ORM) is used to hide the details of operating on the database and running SQL queries.
Just like the MVC model, different layers are in charge of different fields.
View: render HTML
Controller: Logical control, like if .. else ..
Model: data access, like data modification and persistence.
The controller layer doesn't care how the model layer works. You just find an object like $flight, change its property, and save(). That is natural and neat. Your controller layer is all about modifying objects, not modifying data.
By separating data and object, you can easily, or at least possibly, change the implementation of data persistence.
If some objects change frequently, you can store them in Redis, Memcached or other NoSQL storage. The controller layer's code needs no change.
If some objects are very large and are not modified often, you can consider using distributed storage or lazy loading techniques. Your controller code stays unchanged here too. You only change the model layer's implementation, and the code above it is not aware of the change.
Decoupling or layering code makes it easy to change any layer's implementation. If you think writing two lines of SQL is quicker than using an ORM, maybe you need to experience larger projects with frequently changing requirements and performance optimization.
It is always good to separate the implementation from the usage.
Edit:
You can use where and update to run the update as a single query without loading the model first (for example, by filtering on id). See http://laravel.com/docs/5.1/eloquent#basic-updates
App\Flight::where('active', 1)
->where('destination', 'San Diego')
->update(['delayed' => 1]);
There is some benefit to the find-and-save approach from the question if you want to ensure that you're modifying only one valid record.
On the other hand there is another way in the documentation:
App\Flight::where('active', 1)
->where('destination', 'San Diego')
->update(['delayed' => 1]);
Whether you need to ensure that only a single record is modified is up to you.
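To make the trade-off concrete, here is a small sketch in Python with sqlite3 rather than Eloquent. The flights table, the CountingConnection wrapper and both rename functions are hypothetical names made up for this example. The first function mirrors find-then-save (a select, then an update), the second mirrors where-then-update (one statement, with the affected row count as the only feedback).

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def rename_via_load(db, flight_id, new_name):
    """Select first, then update: two statements, but we know the row exists."""
    row = db.execute("SELECT id FROM flights WHERE id = ?", (flight_id,)).fetchone()
    if row is None:
        return False
    db.execute("UPDATE flights SET name = ? WHERE id = ?", (new_name, flight_id))
    return True


def rename_direct(db, flight_id, new_name):
    """Single UPDATE: one statement, returns the number of rows changed."""
    cursor = db.execute(
        "UPDATE flights SET name = ? WHERE id = ?", (new_name, flight_id)
    )
    return cursor.rowcount


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, name TEXT)")
raw.execute("INSERT INTO flights VALUES (1, 'Old Name')")

db = CountingConnection(raw)
found = rename_via_load(db, 1, "New Flight Name")
statements_for_load = db.count
changed = rename_direct(db, 1, "Another Name")
statements_for_direct = db.count - statements_for_load
missing = rename_direct(db, 42, "Nobody")
```

With the direct form, a missing record shows up only as a row count of zero, which is the "single valid record" concern mentioned above.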

Memory/Efficiency with Linq and large data sets

So you know the background I'm coming from, I've been a professional programmer for over twelve years. My best language by far is C# but I've done C, C++, and most recently Objective-C. I've done a lot of work accessing data in databases but I haven't done as much UI work as most people (except in iOS).
Recently I've begun using the Entity Framework in C# for a job and I must say I wish I'd discovered it sooner. I wouldn't say it's the best thing since sliced bread but it's pretty damned close. After using it for a while it got me thinking about best practices and usage as compared to the old school method of using IDbConnections and IDbCommands for everything.
I was coding for a situation where I was going to be listing the contents of a table of users from a database in a bound data grid with the intention of giving the user the ability to do standard CRUD stuff. I started off by making a User class and an IUserManager interface with a corresponding implementation. Each user is assigned to a department and naturally there'd need to be a way to perform CRUD on departments too so I added a Department class, an IDepartmentManager interface and an implementation for that too. I set it up so that the grid bound to the results of the .GetAll() method on the IUserManager interface. Then I started filling in the guts.
I don't have the code in front of me any more but I basically used IDbConnection to tap into the datastore with an IDbCommand using a SQL query. Then I called command.ExecuteReader() and iterated the .Read() method on the IDataReader object. Using the ordinal for each column I pulled out the data, validated it, slipped it into a User class and added the instance to a Dictionary that the method would then return. All the DB classes are of course IDisposable so wrapping them in a using takes care of cleaning up the mess.
Pretty standard stuff, I've done it a bazillion times.
That's when I realized that the departmentId I was pulling from the DB wasn't what I wanted to display in my grid. Telling someone 'this guy is in department 7' isn't as useful as saying 'this guy is in accounting'. So I first toyed with modding my query to get both the departmentId and name, and storing the name on the user object for display later. Then I decided to give the user a Department class instance that it would hang onto during its lifetime that would be populated. That's when I converted the guts to LINQ.
public Dictionary<int, User> GetAll()
{
    var result = new Dictionary<int, User>();
    using (var datastore = new myEntities())
    {
        result = (from user in datastore.userInfoes
                  join department in datastore.userDepartmentInfoes on user.departmentID equals department.departmentID
                  select new User()
                  {
                      UserIndex = user.id,
                      FirstName = user.firstName,
                      LastName = user.lastName,
                      Department = new Department()
                      {
                          DepartmentId = user.departmentID.Value,
                          DepartmentName = department.departmentName,
                      },
                      Username = user.userName,
                  }
                 ).ToDictionary(x => x.UserIndex, x => x);
    }
    return result;
}
That's where I started thinking (read: over-analysing probably)
The implementation I had would work just fine. It would even work pretty well for a small dataset. It'll even work fine for a largish dataset (say 10,000). Even if you counted every person in the company I currently work for five times over you'd have less than a thousand people.
But what if for a second I worked for a really big honking company that had 10 million employees? That would result in the departmentName strings being duplicated potentially millions of times.
That also got me thinking that unlike iOS's MVC implementation this particular situation wasn't going to query just enough users to fill the screen and then handle paging and stuff. As soon as the calling code refreshes the data binding it was going to pull all 10 million users at once and pass back the collection. That's going to be slow.
So that leaves me with the idea in my head that this method is both slow and inefficient with larger data sets. Not only that, but with potentially 2 million instances of 'Accounting' held in this data set, it is going to be a major memory hog. We're also kind of defeating the purpose of a relational database here because of the Department class inside the User. In the DB you just have a departmentId int foreign key referencing an entry in another table. The link only occurs when you cross reference to the other table and even then there's really only one 'Accounting' string at any one time. In the above code you're going to have a whole lot of 'Accounting' strings floating around waiting to be cleaned up.
An MVC scenario would basically 'know' that it takes X number of entries to fill the grid's viewable area. It would only query X at a time starting from index Y and as the user navigated it would query and display additional records as needed. That's a heck of a lot better than querying all 10 million and letting them hang out somewhere whether they're displayed or not.
Like I said, I may very well be over-analysing this. I might also be incorrect in some of my assumptions about the way LINQ works. But in the interest of learning I figured I had to ask: What is the best way to do something like this? Is this sort of thing ok for small datasets? Would the whole thing be better off as an MVC implementation rather than pulling in the entire dataset to be displayed in the grid?
If you need the whole set of data in memory, you will have to load it anyway. I am sure you will not list 10 million users in a grid, right? The technique that comes to mind is paging. Check this article from MSDN with examples.
As for the department objects, does your UserInfo have a foreign key to the department? If so, you should just have userInfo.Department available to you and no joins are needed.
If you bind the department data to the grid columns, why have a property of type Department? I assume your User class is something you bind to the UI. Flatten it out into:
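As a rough illustration of paging, here is a language-neutral sketch in Python with sqlite3. The users table and the fetch_page helper are assumptions for this example. In LINQ the same shape is usually expressed with OrderBy followed by Skip and Take; here it is plain LIMIT and OFFSET.

```python
import sqlite3


def fetch_page(conn, page_index, page_size):
    """Return one page of users; only page_size rows ever leave the database."""
    offset = page_index * page_size
    return conn.execute(
        # A stable ORDER BY is required, otherwise pages may overlap or skip rows.
        "SELECT id, first_name FROM users ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


# Hypothetical data set for illustration: 25 users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 26)]
)

first_page = fetch_page(conn, 0, 10)  # rows 1..10
last_page = fetch_page(conn, 2, 10)   # rows 21..25, a partial page
```

The grid then asks for page N when the user scrolls or clicks, instead of holding every row up front.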
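class User
{
    public string Username { get; set; }
    public int UserIndex { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int DepartmentId { get; set; }
    public string DepartmentName { get; set; }
}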
What is the purpose of GetAll()? You return a dictionary and it feels like you need to enable lookups by id. Or do you use the result to enumerate the users?
For lookups, consider asking the database for a single user's data when needed. Add caching afterwards if it makes sense.
For enumeration, do not return a dictionary, since that is an all-in-memory object. Return an IEnumerable with yielded (paged?) results, or even better an IQueryable, so that calling GetAll() doesn't execute the SQL call right away and the calling code can scope the call down by adding the necessary filters.
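The deferred, streamed style can be sketched in Python with a generator, which plays roughly the role of a yielding IEnumerable. The users table, the iter_users function and its department_id filter are hypothetical. Nothing is sent to the database until the caller starts iterating, and rows arrive in small batches instead of one big dictionary.

```python
import sqlite3
from itertools import islice


def iter_users(conn, department_id=None, batch_size=100):
    """Lazily yield user rows; the query runs only when iteration starts."""
    sql = "SELECT id, username, department_id FROM users"
    params = ()
    if department_id is not None:
        # The caller narrows the query instead of filtering in memory.
        sql += " WHERE department_id = ?"
        params = (department_id,)
    sql += " ORDER BY id"
    cursor = conn.execute(sql, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# Hypothetical data: ten users alternating between departments 1 and 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)", [(i, f"u{i}", i % 2) for i in range(1, 11)]
)

first_three = list(islice(iter_users(conn, batch_size=2), 3))
department_one_ids = [row[0] for row in iter_users(conn, department_id=1)]
```

A caller that only needs the first screen of rows stops iterating early and the remaining rows are never fetched.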

Dynamic NSComboBox

I'm creating an application with several entities, and now I need to filter the content of the third combo box dynamically. Let me explain. I have 3 combo boxes (building, floor and department). The first should show all the buildings, the second should show only the floors of the building selected before, and the last should let me select only the departments of the chosen building and floor. How can I do this? To make it simpler, I'm attaching some photos.
You simply drill down with predicates, if you use single fetch requests to Core Data.
However, your relationships are not set up correctly. For example, there is an edificio attribute in Particelle. If it refers to a building, it should be a relationship to an Edifici object, not some kind of foreign key. There are no foreign keys in Core Data, just relationships.
If you do this, everything becomes much easier by using a NSFetchedResultsController. You can now simply traverse the object graph without any specific fetching.
The scheme could be something like this (you may need to change the order):
Anno <--->> Particella <---->> Edificio <---->> AreaRischio
Now you can simply tell the fetched results controller to start fetching all Anno entities. Then you drill down with simple dot notation:
NSSet *listForNextTable = selectedAnnoObject.particelle;
and further with
NSSet *listForNextTable = selectedParticellaObject.edifici;
etc. You see, it gets really simple.

Azure Tables, PartitionKeys and RowKeys functionality

So just getting started with Azure tables- haven't played with them before so wanted to check it out.
My understanding is that I should be thinking of this as object storage, rather than a database, which is cool. But I'm a bit confused on a couple points...
First, if I have one to many object relationships, what should the partitionkey of the root object look like? For example, let's say I have a University object, which is one to many to Student objects, and say Student objects are one to many to Classes. For a new student, should its partitionkey be 'universityId'? Or 'universityId + studentId'? I read in the msdn docs that the RowKey is supposed to be an id specific to the item I am adding, which also sounds like studentId.
And then would both the partitionkey and rowkey for a new University just be universityId?
I also read that Azure Tables are not for storing lists- I take it that does not refer to storing an object that contains a List...?
And does anyone have any links to code samples using ASP.NET MVC 3 or 4 and Razor with Azure Tables? This is my end goal, would be cool to see what someone who actually knows what they are doing does :)
Thanks!
You're definitely right that Azure Tables is closer to an object store than a database. You do have some ability to query on non-key columns, and to do logic in queries. But you shouldn't plan on using those features for anything performance critical.
Because queries are only fast if you specify at least a PartitionKey (and preferably a RowKey or a range of RowKeys), that constraint heavily influences how you lay out your tables. The decisions you make at the beginning will have big performance implications later. As a rough analogy, I like to think about them like a SQL Server table with the primary key as (PartitionKey + RowKey), that can never have another index. That's not completely accurate, but it'll get you thinking in the right direction.
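To make the analogy tangible, here is an in-memory Python sketch. It is not the Azure SDK: InMemoryTable and its methods are invented for illustration and only mimic the access pattern of a table whose sole index is (PartitionKey, RowKey). Lookups by both keys or by partition are direct, and anything else has to walk every entity.

```python
from collections import defaultdict


class InMemoryTable:
    """Toy model of a table indexed only by (partition_key, row_key)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, entity):
        self._partitions[partition_key][row_key] = entity

    def get(self, partition_key, row_key):
        """Point lookup: both keys known, the cheapest possible query."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        """Partition query: touches a single partition, rows ordered by row key."""
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]

    def scan(self, predicate):
        """Filter on a non-key property: has to visit every entity."""
        return [
            entity
            for rows in self._partitions.values()
            for entity in rows.values()
            if predicate(entity)
        ]


students = InMemoryTable()
students.insert("uni-1", "stu-002", {"name": "Bo", "year": 2})
students.insert("uni-1", "stu-001", {"name": "Al", "year": 1})
students.insert("uni-2", "stu-001", {"name": "Cy", "year": 2})

one_student = students.get("uni-1", "stu-001")
uni_one = students.query_partition("uni-1")
second_years = students.scan(lambda entity: entity["year"] == 2)
```

Picking keys then comes down to which of these three shapes your most frequent queries will have.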
First, if I have one to many object relationships, what should the partitionkey of the root object look like?
I would probably use the UniversityId as the PartitionKey. That's generally a safe place to start.
For a new student, should its partitionkey be 'universityId'? Or 'universityId + studentId'?
How do you plan to query the students? If you're always going to have their UniversityId & StudentId I would probably make them the PartitionKey and RowKey, respectively. If you're mostly going to query based on StudentId, I would use that as the PartitionKey instead.
would both the partitionkey and rowkey for a new University just be universityId?
That's a viable choice. You can also use a constant value (eg "UNIVERSITY") for the RowKey, if you've really got nothing else to put there.
I also read that Azure Tables are not for storing lists- I take it that does not refer to storing an object that contains a List...?
I'm not entirely sure what that means. Clearly you can store a collection of objects in a table, that's what they're for. You can't directly store a list in an entity property. So if your Student has a property of type List, that can't be stored directly. But you could serialize it to XML or binary, and store that.
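A minimal sketch of the serialize-then-store idea in Python. It uses JSON instead of the XML or binary formats mentioned above, purely because JSON is in the standard library. The entity layout, the ClassesJson property name and both helper functions are assumptions for this example, not part of any Azure API.

```python
import json


def to_entity(student):
    """Flatten a student into scalar properties; the list becomes one string."""
    return {
        "PartitionKey": student["university_id"],
        "RowKey": student["student_id"],
        "Name": student["name"],
        # A list cannot be stored as a property directly, so serialize it.
        "ClassesJson": json.dumps(student["classes"]),
    }


def from_entity(entity):
    """Rebuild the student, turning the stored string back into a list."""
    return {
        "university_id": entity["PartitionKey"],
        "student_id": entity["RowKey"],
        "name": entity["Name"],
        "classes": json.loads(entity["ClassesJson"]),
    }


student = {
    "university_id": "uni-1",
    "student_id": "stu-001",
    "name": "Al",
    "classes": ["Algebra", "Chemistry"],
}
entity = to_entity(student)
restored = from_entity(entity)
```

The trade-off is that the serialized list is opaque to the table, so you cannot query on individual classes.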
I don't have any code samples handy, unfortunately. This may be a good time to abstract your data logic into its own layer, rather than putting it in your MVC controllers. We've found that a well-abstracted data layer can make unit testing your logic very easy. If you create some interfaces for your tables, it's very easy to create mock objects using just a List and some LINQ.

When would it be worth it to maintain an inverse relationship in Doctrine2?

In the Doctrine manual, under Constrain relationships as much as possible, it gives the advice "Eliminate nonessential associations" and "avoid bidirectional associations if possible". I don't understand what criteria would make an association "essential".
I say this because it seems that you would often want to go from the One side of a One-to-Many association rather than from the Many side. For example, I would want to get all of a User's active PhoneNumbers, rather than get all active PhoneNumbers and their associated User. This becomes more important when you have to traverse multiple One-to-Many relations, e.g. if you wanted to see all Users with a MissedCall from the last two days (MissedCall->PhoneNumber->User).
This is how the simple case would look with an inverse association:
SELECT * FROM User u
LEFT JOIN u.PhoneNumbers p WITH p.active
It would make it more sensible if there were a way to go across a given relation in the opposite direction in DQL, like the following raw SQL:
SELECT * FROM User u
LEFT JOIN PhoneNumber p ON p.User_id = u.id AND p.active
Can someone explain why they give this advice, and in what cases it would be worth ignoring?
-- Edit --
If there are mitigating factors or other workarounds, please give me simple example code or a link.
I do not see any way to traverse a relation's inverse when that inverse is not defined, so I'm going to assume that building custom DQL is not in fact a solution -- there are some joins that are trivial with SQL that are impossible with DQL, and hydration probably wouldn't work anyway. This is why I don't understand why adding inverse relations is a bad idea.
Using Doctrine, I only define relationships when they're needed. This means that all of the relationships defined are actually used in the codebase.
For projects with a large team working on different areas of the project, not everyone will be accustomed to Doctrine, its current configuration, and eager/lazy loading relationships. If you define bi-directional relationships where they aren't essential and possibly don't make sense, it could potentially lead to extra queries for data that:
may not be used
may have been selected previously
Defining only essential relationships gives you greater control over how you and your team traverse your data, and reduces extra or overly large queries.
Updated 22/08/2011
By essential relationships, I mean the ones you use. It doesn't make sense to define a relationship you wouldn't use. For example:
\Entity\Post has a defined relationship to both \Entity\User and \Entity\Comment
Use $post->user to get author
Use $post->comments to get all comments
\Entity\User has a defined relationship to both \Entity\Post and \Entity\Comment
Use $user->posts to get all user posts
Use $user->comments to get all user comments
\Entity\Comment only has a relationship to \Entity\User
Use $comment->user to get author
Cannot use $comment->post as I don't retrieve the post it belongs to in my application
I wouldn't think of them as "Inverse" relationships. Think of them as "Bi-directional", if using the data in both directions makes sense. If it doesn't make sense, or you wouldn't use the data that way around, don't define it.
I hope this makes sense.
I think this is a great question, and am looking forward to others' answers.
Generally, I've boiled the advice you cited down to the following rule of thumb:
If I don't need to access the (inverse) association inside my entity, then I typically make it unidirectional. In your example of users and (missed) calls, I'd probably keep it unidirectional, and let some service class or repository handle putting together custom DQL for the odd occurrence when I needed to get a list of all users with recent missed calls. That's a case I'd consider exceptional -- most of the time, I'm just interested in a particular user's calls, so the unidirectional relationship works (at least until I've got so many records that I feel the need to optimize).
