Can I disable autoloading associations in ActiveRecord? - rails-activerecord

Imagine a Rails application with authors and books. On the books index page, we list each book with its author, like book.author.name.
One of two things will happen:
If book.author is not yet loaded, ActiveRecord will query the database to find the author
If book.author was preloaded (eg Book.includes(:author)), the objects will be fetched from memory
In scenario #1, the view causes an N+1 query - every time it tries to render another book, it has to do another SELECT * from authors where author_id = $1 (using the books.author_id).
This is bad for performance and often hard to track down. It would be much better (IMHO) if the controller loaded all the data needed by the view, and the view could not trigger queries at all.
Is there a way to disable ActiveRecord's "on demand" loading of associations? In other words, to make scenario #1 raise an exception saying something like "association not available, please see documentation on preloading associations"?

Related

Are Doctrine relations affecting application performance?

I am working on a Symfony project with a new team, and they decide to stop using Doctrine relations the most they can because of performances issues.
For instance I have to stock the id of my "relation" instead of using a ManyToOne relation.
But I am wondering if it is a real problem?
The thing is, it changes the way of coding to retrieve information and so on.
The performance issue most likely comes from the fact that queries are not optimised.
If you let Doctrine (Symfony component that handle the queries) do the queries itself (by using findBy(), findAll(), findOneBy(), etc), it will first fetch what you asked, then do more query as it will require data from other tables.
Lets take the most common example, a library.
Entities
Book
Author
Shelf
Relations
One Book have one Author, but one Author can have many Books (Book <= ManyToOne => Author)
One Book is stored in one Shelf (Book <= OneToOne => Sheilf)
Now if you query a Book, Doctrine will also fetch Shelf as it's a OneToOne relation.
But it won't fetch Author. In you object, you will only have access to book.author.id as this information is in the Book itself.
Thus, if in your Twig view, you do something like {{ book.author.name }}, as the information wasn't fetched in the initial query, Doctrine will add an extra query to fetch data about the author of the book.
Thus, to prevent this, you have to customize your query so it get the required data in one go, like this:
public function getBookFullData(Book $book) {
$qb=$this->createQueryBuilder('book');
$qb->addSelect('shelf')
->addSelect('author')
->join('book.shelf', 'shelf')
->join('book.author', 'author');
return $qb->getQuery()->getResult();
}
With this custom query, you can get all the data of one book in one go, thus, Doctrine won't have to do an extra query.
So, while the example is rather simple, I'm sure you can understand that in big projects, letting free rein to Doctrine will just increase the number of extra query.
One of my project, before optimisation, reached 1500 queries per page loading...
On the other hand, it's not good to ignore relations in a database.
In fact, a database is faster with foreign keys and indexes than without.
If you want your app to be as fast as possible, you have to use relations to optimise your database query speed, and optimise Doctrine queries to avoid a foul number of extra queries.
Last, I will say that order matter.
Using ORDER BY to fetch parent before child will also greatly reduce the number of query Doctrine might do on it's own.
[SIDE NOTE]
You can also change the fetch method on your entity annotation to "optimise" Doctrine pre-made queries.
fetch="EXTRA_LAZY
fetch="LAZY
fetch="EAGER
But it's not smart, and often don't really provide what we really need.
Thus, custom queries is the best choice.

Avoid query each time the page is loaded

I work on an educational website in which we show dynamic filters. What does this mean? We now have several course categories that will increase in number in the following weeks.
Right now categories are a string field in each course. I'm planning to model this and create a Categories table. Having done this, it would be pretty easy to load the filters based on the categories we have on the database. However, I see the problem that each time the website is loaded, the query will be made.
I thought of caching these categories but I'm guessing is not the best solution.
Any idea on how can I avoid these queries each time but get this information from the database?

How to map SQL queries to in-memory model objects?

Let's say we are structuring an application with MVC (also, Stores/Services). SQL is used as the persistence mechanism. And memory efficiency is a major concern.
Obviously, we should take advantage of SQL queries and only ask for fields of our Model in theory object when they are needed.
For example, an mobile app may need to display a list of title for articles, while the body of the article doesn't get displayed until user taps on a specific title. In this case, we ask SQL for just the titles first.
The question is, what should the model object look like?
The solutions I can think of are:
Enhance the model with some states that indicate which fields are populated. This could also be archived by using nil/NULL/None values on unpopulated fields of the model object.
Split the theoretical model to multiple classes. Following the previous example, we could have an Article class and an ArticleDetail class, with a one-to-one relation.
Forget the Store object, let each model object lazy evaluate it's costly fields. The model would have to know about its persistence mechanism.
This should be a common problem. How do the ORM in your favorite frameworks/libraries resolve it? Any best practices?

How does one validate a partial record when using EF/Data Annotations?

I am updating a record over multiple forms/pages like a wizard. I need to save after each form to the database. However this is only a partial record. The EF POCO model has all data annotations for all the properties(Fields), so I suspect when I save this partial record I will get an error.
So I am unsure of the simplest solution to this.
Some options I have thought of:
a) Create a View Model for each form. Data Annotations on View model instead of EF Domain Model.
b) Save specific properties, rather than SaveAll, in controller for view thereby not triggering validation for non relevant properties.
c) Some other solution...??
Many thanks in Advance,
Option 1. Validation probably (especially in your case) belongs on the view model anyway. If it is technically valid (DB constraint wise) to have a partially populated record, then this is further evidence that the validation belongs on the view.
Additionally, by abstracting the validation to your view, you're allowing other consuming applications to have their own validation logic.
Additional thoughts:
I will say, though, just as a side note that it's a little awkward saving your data partially like you're doing, and unless you have a really good reason (which I originally assumed you did), you may consider holding onto that data elsewhere (session) and persisting it together at the end of the wizard.
This will allow better, more appropriate DB constraints for better data integrity. For example, if a whole record shouldn't allow a null name, then allowing nulls for the sake of breaking your commits up for the wizard may cause more long term headaches.

Should I extract functionality from this model class into a form class? (ActiveRecord Pattern)

I am in the midst of designing an application following the mvc paradigm. I'm using the sqlalchemy expression language (not the orm), and pyramid if anyone was curious.
So, for a user class, that represents a user on the system, I have several accessor methods for various pieces of data like the avatar_url, name, about, etc. I have a method called getuser which looks up a user in the db(by name or id), retrieves the users row, and encapsulates it with the user class.
However, should I have to make this look-up every-time I create a user class? What if a user is viewing her control panel and wants to change avatars, and sends an xhr; isn't it a waste to have to create a user object, and look up the users row when they wont even be using the data retrieved; but simply want to make a change to subset of the columns? I doubt this lookup is negligible despite indexing because of waiting for i/o correct?
More generally, isn't it inefficient to have to query a database and load all a model class's data to make any change (even small ones)?
I'm thinking I should just create a seperate form class (since every change made is via some form), and have specific form classes inherit them, where these setter methods will be implemented. What do you think?
EX: Class: Form <- Class: Change_password_form <- function: change_usr_pass
I'd really appreciate some advice on creating a proper design;thanks.
SQLAlchemy ORM has some facilities which would simplify your task. It looks like you're having to re-invent quite some wheels already present in the ORM layer: "I have a method called getuser which looks up a user in the db(by name or id), retrieves the users row, and encapsulates it with the user class" - this is what ORM does.
With ORM, you have a Session, which, apart from other things, serves as a cache for ORM objects, so you can avoid loading the same model more than once per transaction. You'll find that you need to load User object to authenticate the request anyway, so not querying the table at all is probably not an option.
You can also configure some attributes to be lazily loaded, so some rarely-needed or bulky properties are only loaded when you access them
You can also configure relationships to be eagerly loaded in a single query, which may save you from doing hundreds of small separate queries. I mean, in your current design, how many queries would the below code initiate:
for user in get_all_users():
print user.get_avatar_uri()
print user.get_name()
print user.get_about()
from your description it sounds like it may require 1 + (num_users*3) queries. With SQLAlchemy ORM you could load everything in a single query.
The conclusion is: fetching a single object from a database by its primary key is a reasonably cheap operation, you should not worry about that unless you're building something the size of facebook. What you should worry about is making hundreds of small separate queries where one larger query would suffice. This is the area where SQLAlchemy ORM is very-very good.
Now, regarding "isn't it a waste to have to create a user object, and look up the users row when they wont even be using the data retrieved; but simply want to make a change to subset of the columns" - I understand you're thinking about something like
class ChangePasswordForm(...):
def _change_password(self, user_id, new_password):
session.execute("UPDATE users ...", user_id, new_password)
def save(self, request):
self._change_password(request['user_id'], request['password'])
versus
class ChangePasswordForm(...):
def save(self, request):
user = getuser(request['user_id'])
user.change_password(request['password'])
The former example will issue just one query, the latter will have to issue a SELECT and build User object, and then to issue an UPDATE. The latter may seem to be "twice more efficient", but in a real application the difference may be negligible. Moreover, often you will need to fetch the object from the database anyway, either to do validation (new password can not be the same as old password), permissions checks (is user Molly allowed to edit the description of Photo #12343?) or logging.
If you think that the difference of doing the extra query is going to be important (millions of users constantly editing their profile pictures) then you probably need to do some profiling and see where the bottlenecks are.
Read up on the SOLID principle, paying particular attention to the S as it answers your question.
Create a single class to perform user existence check, and inject it into any class that requires that functionality.
Also, you need to create a data persistence class to store the user's data, so that the database doesn't have to be queried every time.

Resources