Are Doctrine relations affecting application performance? - performance

I am working on a Symfony project with a new team, and they have decided to avoid Doctrine relations as much as possible because of performance issues.
For instance, I have to store the id of the related entity instead of using a ManyToOne relation.
But I am wondering if it is a real problem?
The thing is, it changes the way of coding to retrieve information and so on.

The performance issue most likely comes from the fact that the queries are not optimised.
If you let Doctrine (the ORM that Symfony integrates to handle the queries) build the queries itself (by using findBy(), findAll(), findOneBy(), etc.), it will first fetch what you asked for, then run additional queries whenever it needs data from other tables.
Lets take the most common example, a library.
Entities
Book
Author
Shelf
Relations
One Book has one Author, but one Author can have many Books (Book <= ManyToOne => Author)
One Book is stored in one Shelf (Book <= OneToOne => Shelf)
Now if you query a Book, Doctrine may also fetch Shelf right away, since a OneToOne relation cannot always be lazy-loaded (the inverse side in particular).
But it won't fetch Author. In your object, you will only have access to book.author.id, as this information is in the Book itself.
Thus, if you do something like {{ book.author.name }} in your Twig view, the information wasn't fetched in the initial query, so Doctrine will run an extra query to fetch the author of the book.
To prevent this, you have to customise your query so that it gets the required data in one go, like this:
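public function getBookFullData(Book $book): ?Book
{
    // Join and select both relations so they are hydrated by the same query.
    return $this->createQueryBuilder('book')
        ->addSelect('shelf')
        ->addSelect('author')
        ->join('book.shelf', 'shelf')
        ->join('book.author', 'author')
        // Restrict the result to the requested book.
        ->where('book = :book')
        ->setParameter('book', $book)
        ->getQuery()
        ->getOneOrNullResult();
}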
With this custom query, you get all the data of one book in one go, so Doctrine won't have to run an extra query.
The example is rather simple, but I'm sure you can see that in big projects, giving Doctrine free rein will just increase the number of extra queries.
One of my projects, before optimisation, reached 1500 queries per page load...
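To make the extra-query effect concrete, here is a minimal sketch in plain PHP using PDO with an in-memory SQLite database rather than Doctrine itself. The tables, rows and counters are made up for illustration. Unlike an ORM with an identity map, this sketch queries again for an author it has already seen, so treat the counts as an upper bound.

```php
<?php
// Illustrative only: plain PDO with an in-memory SQLite database, not Doctrine.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER REFERENCES author(id))');
$pdo->exec("INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob')");
$pdo->exec("INSERT INTO book (title, author_id) VALUES ('A', 1), ('B', 1), ('C', 2)");

// Lazy style: one query for the books, then one more query per book for its author.
$lazyQueries = 0;
$books = $pdo->query('SELECT id, title, author_id FROM book ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
$lazyQueries++;
$authorStmt = $pdo->prepare('SELECT name FROM author WHERE id = ?');
$lazyNames = [];
foreach ($books as $book) {
    $authorStmt->execute([$book['author_id']]);
    $lazyNames[] = $authorStmt->fetchColumn();
    $lazyQueries++;
}

// Joined style: books and authors come back together in a single query.
$joinedQueries = 0;
$rows = $pdo->query(
    'SELECT book.title, author.name FROM book JOIN author ON author.id = book.author_id ORDER BY book.id'
)->fetchAll(PDO::FETCH_ASSOC);
$joinedQueries++;
$joinedNames = array_column($rows, 'name');

printf("lazy: %d queries, joined: %d query\n", $lazyQueries, $joinedQueries);
```

With three books the lazy style needs four queries and the joined style needs one; the gap grows with every extra row and every extra relation touched in the view.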
On the other hand, it's not good to ignore relations in a database.
In fact, a database is generally faster with foreign keys and the indexes behind them than without.
If you want your app to be as fast as possible, you have to use relations to optimise your database query speed, and optimise your Doctrine queries to avoid an absurd number of extra queries.
Lastly, I will say that order matters.
Using ORDER BY to fetch parents before children can also greatly reduce the number of queries Doctrine runs on its own.
[SIDE NOTE]
You can also change the fetch mode in your entity mapping to "optimise" Doctrine's pre-made queries.
fetch="EXTRA_LAZY"
fetch="LAZY"
fetch="EAGER"
But it's a blunt tool, and it often doesn't provide what we really need.
Thus, custom queries are the best choice.

Related

LatestOfMany() of BelongsToMany() relationship

I've been using latestOfMany() on my hasMany() relations to define them as hasOne() for quite a while now. Lately I've needed something similar for belongsToMany() relationships. Unfortunately, Laravel doesn't have this feature.
My codebase as follows:
Document
id
upload_date
identifier_code
Person
id
name
DocumentPerson (pivot)
id
person_id
document_id
token
My objective is to define a relationship that fetches the first document (according to upload_date) of a Person. As you can see, it's a many-to-many relationship.
What I have tried so far:
// Attempt 1
public function firstDocument()
{
    // This was my safe bet, but oldestOfMany() and ofMany() don't allow aggregating on a relationship column.
    return $this->hasOne(DocumentPerson::class)->oldestOfMany('document.upload_date');
}

// Attempt 2
public function firstDocument()
{
    return $this->belongsToMany(Document::class)->oldestOfMany('upload_date');
}

// Attempt 3
public function firstDocument()
{
    return $this->belongsToMany(Document::class)->oldest()->limit(1);
}

// Attempt 4
public function firstDocument()
{
    return $this->hasOneThrough(Document::class, DocumentPerson::class, 'id', 'document_id', 'id', 'person_id')->latestOfMany('upload_date');
}
At this point I'm almost positive the current relationship base doesn't support something like this, so I'm working out alternative methods to solve it. My two choices:
Add a column called first_document_id on the Person table and go through that with belongsTo(): simple and fast performance-wise. The downside is that I'll have to implement many event listeners to make sure it always stays consistent with the actual relationships (what if a Document's upload_date is updated, etc.), so there is a risk of database inconsistency.
Add an order column on the pivot (document_person) table, which will hold the order of the related Documents by upload_date. This way I can do hasOne(DocumentPerson::class)->oldestOfMany('order'); // or just ofMany() and be done with it. This one also poses the risk of database inconsistency.
It's fair to say I'm at a crossroads here. Any idea and suggestion is welcomed and appreciated. Thank you. Please read the restrictions to prevent suggesting things that are not feasible for my situation.
Restrictions:
(Please)
It should strictly be a relationship. I'll be using it on various places, it definitely has to be relationship so I can eager load and query it. My next objective involves querying by this relationship so it is imperative.
Don't suggest accessors, it won't do well with my case.
Don't suggest collection methods, it needs to be done in query.
Don't suggest ->limit() or ->take() or ->first(), those are prone to cause inconsistent results with eager loading.
Update 1
Q: Why first document of a person has to be a relationship ?
A: Because further down the line I'll be querying it in various different instances. Example queries where it'll be utilized:
Get all the users whose first document (according to upload_date) upload_date between 2022-01-01 and 2022-06-08. (along with 10 other scopes and filters)
Get all the users whose first document (according to upload_date) identifier_code starts with "Lorem" and id bigger than 100.
These are just to name a few; there are many cases where I really need to query it in various ways. This is the reason I desperately need it to be a relationship, so I can query it with ease using Person::whereHas('firstDocument', function ($subQuery) { return $subQuery->someScope1()->anotherScope2()->where(...); });
If I only needed to display it, yeah sure eager loading with closure would do well, or even collection methods, or accessors would suffice. But since ability to query it is the need, relationship is of the essence. Keep in mind Person table has around 500k record, hence the need for querying it on the database layer.
Alright here's the solution I've elected to go with (among my choices, explained in the question). I implemented the "adding order column on pivot" table. Because it scales better and is rather flexible compared to other options. It allows for querying the last document, first document, third document etc. Whilst it doesn't even require any aggregate functions (Max, min like ->latestOfMany() applies) which is a performance boost. Given these constraints this solution was the way to go. Here's how I applied it in case someone else is thinking about something similar.
Currently the only noticeable downside to this approach is inability to access any additional pivot data.
Added new column for order:
//migration
$table->unsignedTinyInteger('document_upload_date_order')->nullable()->after('token');
$table->index('document_upload_date_order');//for performance
Person.php (Model)
//... other stuff
public function personalDocuments()
{
    // My old relationship, which I'll still keep for display/index purposes.
    return $this->belongsToMany(Document::class)->withPivot('token')->where('type_slug', 'personal');
}

// NEW RELATIONSHIP
public function firstDocument()
{
    // Eloquent relationship, allows for querying and eager loading.
    return $this->hasOneThrough(
        Document::class,
        DocumentPerson::class, // pivot class for the pivot table
        'person_id',
        'id',
        'id',
        'document_id'
    )->where('document_upload_date_order', 1); // magic here
}
SomeService.php
public function determineDocumentUploadDateOrders(Person $person)
{
    $sortLogic = [
        ['upload_date', 'asc'],
        ['created_at', 'asc'],
    ];
    // values() re-indexes the collection keys after sorting.
    $documentsOrdered = $person->documents->sortBy($sortLogic)->values();
    foreach ($documentsOrdered as $index => $document) {
        // Updating through the pivot table's ORM model.
        DocumentPerson::where('id', $document->pivot->id)->update([
            'document_upload_date_order' => $index + 1,
            'document_id' => $document->id,
            'person_id' => $document->pivot->person_id,
        ]);
    }
}
I hooked determineDocumentUploadDateOrders() into various event-listeners and model events so whenever association/disassociation occurs, or upload_date of a document changes I simply call determineDocumentUploadDateOrders() with corresponding Person and this way it is always kept in sync with actual.
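For anyone who wants to check the ordering rule on its own, here is a minimal plain-PHP sketch of the same sort (upload_date first, created_at as tie-breaker). It uses hypothetical plain arrays in place of Eloquent models and does not touch the database.

```php
<?php
// Hypothetical input: one person's documents as plain arrays, not Eloquent models.
$documents = [
    ['id' => 7, 'upload_date' => '2022-03-01', 'created_at' => '2022-03-02 10:00:00'],
    ['id' => 3, 'upload_date' => '2022-01-15', 'created_at' => '2022-01-16 09:00:00'],
    ['id' => 9, 'upload_date' => '2022-01-15', 'created_at' => '2022-01-15 08:00:00'],
];

// Sort by upload_date, breaking ties with created_at (both ascending).
usort($documents, fn (array $a, array $b): int =>
    [$a['upload_date'], $a['created_at']] <=> [$b['upload_date'], $b['created_at']]
);

// Map document id => 1-based order; order 1 is the "first document".
$orders = [];
foreach ($documents as $index => $document) {
    $orders[$document['id']] = $index + 1;
}

print_r($orders);
```

Here documents 9 and 3 share an upload_date, so created_at decides that 9 gets order 1 and 3 gets order 2, while 7 ends up with order 3.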
I implemented it fully and it is providing consistent results with great performance. Of course it brought a bit of overhead for keeping it in sync, but it did the job while meeting the requirements. Honestly, I found this approach far more reliable than some unofficial Eloquent relationships and similar alternatives.
I encountered a similar situation years back.
The best workaround in a situation like this is to use staudenmeir's eloquent-eager-limit package.
Load the trait use \Staudenmeir\EloquentEagerLimit\HasEagerLimit; on both models (the parent and the related model),
then try the code below:
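public function firstDocument()
{
    // oldest('upload_date') returns the earliest upload; use latest('upload_date') for the most recent one.
    return $this->documents()->oldest('upload_date')->limit(1);
}

public function documents()
{
    return $this->belongsToMany(Document::class);
}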
Just to add: eager loading with a limit does not work with built-in Laravel Eloquent. You would have to build your own raw queries to achieve it, which can turn into a nightmare. That eager limit package from staudenmeir should have been merged into the Laravel source code 😆

Is Laravel's 'pluck' method cheaper than a general 'get'?

I'm trying to dramatically cut down on pricey DB queries for an app I'm building, and thought I should perhaps just return IDs of a child collection (then find the related object from my React state), rather than returning the children themselves.
I suppose I'm asking, if I use 'pluck' to just return child IDs, is that more efficient than a general 'get', or would I be wasting my time with that?
Yes, the pluck method is just fine if you are trying to retrieve a single column from a table.
If you use the get() method, it will retrieve all the information about the child model, which can make querying and returning the results a little slower.
So in my opinion, you are using a good method for retrieving the result.
Laravel also has different methods for select queries. Have a look at Selects.
Good practice for a DB select query in an application is to select only the columns that are necessary. If only the id column is needed, then only the id column should be selected, instead of all columns. Otherwise, memory is wasted holding unused data. To be clear, the following all give the same result:
Model::pluck('id')
// which is the same as
Model::select('id')->get()->pluck('id');
// which is the same as
Model::get(['id'])->pluck('id');
I know I'm a little late to the party, but I was wondering this myself and decided to research it. It turns out that one method is faster than the other.
Using Model::select('id')->get() is faster than Model::get()->pluck('id').
This is because Model::get() fetches every column and builds a full Model for each row, and Illuminate\Support\Collection::pluck then iterates over each returned Model in a PHP foreach loop to extract the selected column(s). The first method is cheaper in general because the column selection happens in the database query instead.
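The difference can be sketched without Laravel at all. The following uses plain PDO on an in-memory SQLite database; the child table and its columns are made up, and the serialized size is only a rough stand-in for how much data each approach moves.

```php
<?php
// Illustrative only: plain PDO on in-memory SQLite; the `child` table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE child (id INTEGER PRIMARY KEY, name TEXT, body TEXT)');
$insert = $pdo->prepare('INSERT INTO child (name, body) VALUES (?, ?)');
foreach (['a', 'b', 'c'] as $name) {
    $insert->execute([$name, str_repeat('x', 1000)]);
}

// Narrow query: only the id column ever leaves the database.
$idsFromDb = $pdo->query('SELECT id FROM child ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);

// Wide query: every column is transferred, then the ids are extracted in PHP.
$allRows = $pdo->query('SELECT * FROM child ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
$idsFromPhp = array_column($allRows, 'id');

// Same ids either way; the difference is how much data was moved to get them.
$narrowBytes = strlen(serialize($idsFromDb));
$wideBytes = strlen(serialize($allRows));
printf("narrow: %d bytes, wide: %d bytes\n", $narrowBytes, $wideBytes);
```

Both approaches end with the same list of ids, but the wide query drags the 1000-character body of every row through PHP first.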

How to map SQL queries to in-memory model objects?

Let's say we are structuring an application with MVC (plus Stores/Services). SQL is used as the persistence mechanism, and memory efficiency is a major concern.
Obviously, we should take advantage of SQL queries and only ask for the fields of our theoretical model object when they are needed.
For example, a mobile app may need to display a list of article titles, while the body of an article doesn't get displayed until the user taps on a specific title. In this case, we ask SQL for just the titles first.
The question is, what should the model object look like?
The solutions I can think of are:
Enhance the model with some state that indicates which fields are populated. This could also be achieved by using nil/NULL/None values on unpopulated fields of the model object.
Split the theoretical model into multiple classes. Following the previous example, we could have an Article class and an ArticleDetail class, with a one-to-one relation.
Forget the Store object and let each model object lazily evaluate its costly fields. The model would have to know about its persistence mechanism.
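To illustrate the third option, here is a minimal PHP sketch (PHP 8.0 or later). The Article class and its loadBody callback are hypothetical names; passing the query in as a closure is just one way to limit how much the model knows about persistence, not an established pattern from any particular framework.

```php
<?php
// Sketch of lazy evaluation: the body is loaded on first access, not together with the title.
final class Article
{
    private ?string $body = null;

    public function __construct(
        private string $title,
        private Closure $loadBody, // hypothetical callback that runs the costly query
    ) {
    }

    public function title(): string
    {
        return $this->title;
    }

    public function body(): string
    {
        // Run the loader only once; later calls reuse the cached value.
        return $this->body ??= ($this->loadBody)();
    }
}

$queries = 0;
$article = new Article('Hello', function () use (&$queries): string {
    $queries++; // stands in for: SELECT body FROM article WHERE id = ?
    return 'Full text';
});

$article->title(); // no body query yet
$article->body();  // first access triggers the loader
$article->body();  // cached, the loader is not called again
```

Listing titles never runs the body query, and reading the body twice still runs it only once.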
This should be a common problem. How do the ORM in your favorite frameworks/libraries resolve it? Any best practices?

Azure Tables, PartitionKeys and RowKeys functionality

So just getting started with Azure tables- haven't played with them before so wanted to check it out.
My understanding is that I should be thinking of this as object storage, rather than a database, which is cool. But I'm a bit confused on a couple points...
First, if I have one to many object relationships, what should the partitionkey of the root object look like? For example, let's say I have a University object, which is one to many to Student objects, and say Student objects are one to many to Classes. For a new student, should its partitionkey be 'universityId'? Or 'universityId + studentId'? I read in the msdn docs that the RowKey is supposed to be an id specific to the item I am adding, which also sounds like studentId.
And then would both the partitionkey and rowkey for a new University just be universityId?
I also read that Azure Tables are not for storing lists- I take it that does not refer to storing an object that contains a List...?
And anyone have any links to code samples using asp mvc 3 or 4 and razor with azure tables? This is my end goal, would be cool to see what someone who actually knows what they are doing does :)
Thanks!
You're definitely right that Azure Tables is closer to an object store than a database. You do have some ability to query on non-key columns, and to do logic in queries. But you shouldn't plan on using those features for anything performance critical.
Because queries are only fast if you specify at least a PartitionKey (and preferably a RowKey or a range of RowKeys), that heavily influences how you lay out your tables. The decisions you make at the beginning will have big performance implications later. As a rough analogy, I like to think of them as a SQL Server table with the primary key as (PartitionKey + RowKey) that can never have another index. That's not completely accurate, but it'll get you thinking in the right direction.
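The analogy can be pictured with a tiny in-memory stand-in. This is plain PHP, not the Azure SDK, and the keys and entities are invented; it only shows why a lookup by both keys is direct while a filter on any other property has to look at everything.

```php
<?php
// Hypothetical in-memory stand-in for a table: entities indexed by PartitionKey, then RowKey.
$table = [
    'uni-1' => [
        'student-1' => ['name' => 'Ada'],
        'student-2' => ['name' => 'Linus'],
    ],
    'uni-2' => [
        'student-3' => ['name' => 'Grace'],
    ],
];

// Point lookup: both keys are known, so the entity is addressed directly.
$entity = $table['uni-1']['student-2'] ?? null;

// Partition query: only the PartitionKey is known, so one partition is read.
$partition = $table['uni-1'] ?? [];

// Query on a non-key property: every entity in every partition has to be inspected.
$inspected = 0;
$matches = [];
foreach ($table as $rows) {
    foreach ($rows as $rowKey => $row) {
        $inspected++;
        if ($row['name'] === 'Grace') {
            $matches[] = $rowKey;
        }
    }
}
```

Finding Grace by name inspects all three entities, whereas the point lookup touches exactly one.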
First, if I have one to many object relationships, what should the partitionkey of the root object look like?
I would probably use the UniversityId as the PartitionKey. That's generally a safe place to start.
For a new student, should its partitionkey be 'universityId'? Or 'universityId + studentId'?
How do you plan to query the students? If you're always going to have their UniversityId & StudentId I would probably make them the PartitionKey and RowKey, respectively. If you're mostly going to query based on StudentId, I would use that as the PartitionKey instead.
would both the partitionkey and rowkey for a new University just be universityId?
That's a viable choice. You can also use a constant value (eg "UNIVERSITY") for the RowKey, if you've really got nothing else to put there.
I also read that Azure Tables are not for storing lists- I take it that does not refer to storing an object that contains a List...?
I'm not entirely sure what that means. Clearly you can store a collection of objects in a table; that's what they're for. What you can't do is store a list directly in an entity property. So if your Student has a property of type List, that can't be stored directly. But you could serialize it to XML or binary, and store that.
I don't have any code samples handy, unfortunately. This may be a good time to abstract your data logic into its own layer, rather than putting it in your MVC controllers. We've found that a well-abstracted data layer can make unit testing your logic very easy. If you create some interfaces for your tables, it's very easy to create mock objects using just a List and some LINQ.

how to use codeigniter database models

I am wondering how the models in CodeIgniter are supposed to be used.
Let's say I have a couple of tables in a menu items database, and I want to query information from each table in different controllers. Do I make a different model class for each of the tables and lay out the functions within them?
Thanks!
Models should contain all the functionality for retrieving and inserting data into your database. A controller will load a model:
$this->load->model('model_name');
The controller then fetches any data needed by the view through the abstract functions defined in your model.
It would be best to create a different model for each table, although it is not essential.
You should read up about the MVC design pattern, it is used by codeigniter and many other frameworks because it is efficient and allows code reuse. More info about models can be found in the Codeigniter docs:
http://codeigniter.com/user_guide/general/models.html
CodeIgniter is flexible, and leaves this decision up to you. The user's guide does not say one way or the other how you should organize your code.
That said, to keep your code clean and easy to maintain I would recommend an approach where you try to limit each model to dealing with an individual table, or at least a single database entity. You certainly want to avoid having a single model to handle all of your database tables.
For my taste, CodeIgniter is too flexible here - I'd rather call it vague. A CI "model" has no spec, no interface, it can be things as different as:
An entity domain object, where each instance represents basically a record of a table. Sometimes it's an "anemic" domain object, each property maps directly to a DB column, little behaviour and little or no understanding of objects relationships and "graphs" (say, foreign keys in the DB are just integer ids in PHP). Or it can also be a "rich (or true) domain object", with all the business intelligence, and also knows about relations: say instead of $person->getAccountId() (returns int) we have $person->getAccount(); perhaps also knows how to persist itself (and perhaps also the full graph or related object - perhaps some notion of "dirtiness").
A service object, related to objects persistence and/or general DB querying: be a DataMapper, a DAO, etc. In this case we have typically one single instance (singleton) of the object (little or no state), typically one per DB table or per domain class.
When you read, in the CI docs or forums, about, say, the Person model, you can never know what kind of pattern you are dealing with. Worse: frequently it's an ugly mix of those fundamentally different patterns.
This informality/vagueness is not specific to CI, rather to PHP frameworks, in my experience.

Resources