Lazy loading vs eager loading in Active Record (Rails)

Reading forums, I gather that for query optimization eager loading is better than lazy loading in Active Record. But in interviews people tell me that lazy loading is useful in some scenarios, and searching Google I can't find enough information about it. Can anybody guide me through this concept: lazy loading vs eager loading?
My understanding:
Eager loading solves the N+1 query problem when retrieving associated records.
Kindly give me some practical scenarios.

When is lazy loading needed?
When you want to retrieve records of one model filtered by conditions (join conditions) on its associated models, but you do not want to load the associated tables' data, which would cost extra time.
Lazy loading saves that time: you can filter through the association without actually loading the associated data from the database, as in the sketch below.
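A minimal Rails sketch of that situation (the IdCard model and its active column are only illustrative):

# Filter users through the id_cards table without loading any IdCard rows:
# joins adds an INNER JOIN for the WHERE clause, but only the users columns
# are selected and instantiated.
users = User.joins(:id_card).where(id_cards: { active: true })

users.each { |user| puts user.name }  # no IdCard objects were built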
When is eager loading needed?
When, with or without conditions, you are also going to read the associated records, it is better to load them eagerly. That way, calling the association on each object does not fire a separate query against the database for every object.
Suppose users is an ActiveRecord::Relation collection of 80 records (each user has_one :id_card) and you loop over it:
users.each do |user|
  user.id_card.name
end
This fires a query against the IdCard table once per user, 80 times in total, so lazy loading is not efficient here; this is where eager loading pays off.
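A minimal sketch of the eager-loaded version of the same loop:

# Two queries in total (users plus id_cards), instead of 1 + 80:
users = User.includes(:id_card).limit(80)

users.each do |user|
  user.id_card.name  # served from the preloaded association, no extra query
end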
Update
includes does not always create two separate queries; when the included association is referenced in the query's conditions, Active Record falls back to a single query with a LEFT OUTER JOIN.


What are the best options for data loading in bigger systems? (Laravel)

As my title says, I would like to know the best options for loading data in bigger systems built with Laravel.
At the moment I use Laravel Eloquent to pull data from the database and in my views I use the dataTables JS library. That is effective for smaller systems or websites. However, in systems that are bigger and more complex, loading that data takes a very long time (sometimes more than 15 seconds).
I have found some possible solutions:
Eager loading relations (helps when each row touches its related models; see the sketch below)
Using the DB facade / query builder instead of Eloquent
Using Laravel pagination instead of dataTables pagination
Loading data into dataTables from an AJAX source
However, pagination has some problems, especially since dataTables offers ordering/searching of the data.
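To illustrate the eager-loading option, a minimal Eloquent sketch (the Post model and its author/comments relations are only examples):

// Lazy (N+1): one query for the posts, plus one per post for its author
$posts = Post::all();

// Eager: a fixed handful of queries, no matter how many posts there are
$posts = Post::with(['author', 'comments'])->get();

foreach ($posts as $post) {
    echo $post->author->name; // no additional query fired here
}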
My question is: do you have any advice on how to load data in the most effective way, so that it is as fast as it can be and the code stays as clean as possible? What do you do when you need to load a ginormous amount of data?
One of the best ways to optimize your queries is to use MySQL indexes. As per the documentation:
Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
The simplest way to create an index is the following:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
If you want the Laravel way of creating an index, you can use the index() method, as per the official documentation:
$table->index('column');
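In a migration that might look like this (table and column names are only examples):

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('orders', function (Blueprint $table) {
    $table->index('customer_id');              // single-column index
    $table->index(['status', 'created_at']);   // composite index for common filters
});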

graphql / strapi: Why does fetching the ids of a model that has large JSON objects take so long?

I have a model named 'exam' and many instances of it. Each instance has a large JSON object called questions. When I fetch only the ids and names of the exams, I can see that their JSON objects make the fetching extremely slow:
query {
  exams {
    name
    _id
  }
}
It seems that just accessing a model that has a large JSON object takes forever, even when NOT fetching the content of the JSON object.
I also notice that if I fetch data from a model that has a relationship with some Exams, fetching that model is very slow too.
Only models related to Exams indirectly, through another related model, are fetched quickly; in other words, only third-degree relationship models can be fetched quickly.
Does that make any sense? How should I restructure my collections so that I can fetch a list of the exams quickly? It seems that even if I moved each exam's JSON object to a related collection, it would still be slow.
Thanks in advance
You have a performance issue in your GraphQL application, but your question doesn't say what is causing it. You should find the bottleneck first.
If the problem is at the database layer, you can upgrade your mLab MongoDB instance to a better plan.
If the problem is at the query layer, you need to use a projection in your resolver; the graphql-fields package is great for that (see the sketch below).
If the problem is a poorly designed data structure, you need to redesign your GraphQL schema or Mongoose models.
If the problem is at the Node.js layer, make sure you are not blocking the event loop.
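A minimal sketch of a projection-based resolver, assuming a Mongoose Exam model and the graphql-fields package (file and field names are illustrative):

const graphqlFields = require('graphql-fields');
const Exam = require('./models/Exam'); // hypothetical Mongoose model

const resolvers = {
  Query: {
    // Select only the fields the client actually asked for, so the large
    // `questions` JSON is never read from MongoDB unless it was requested.
    exams: async (parent, args, context, info) => {
      const requested = Object.keys(graphqlFields(info)); // e.g. ['_id', 'name']
      return Exam.find({}, requested.join(' ')).lean();
    },
  },
};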

Joomla getItems default Pagination

Can anyone tell me whether the getItems() function in the model automatically adds the globally set LIMIT before it runs the query (from getListQuery())? Joomla is really struggling, seemingly trying to cache the entire result set (over 1 million records here!).
After looking in /libraries/legacy/model/list.php AND /libraries/legacy/model/legacy.php it appears that getItems() does add LIMIT to setQuery using $this->getState('list.limit') before it sends the results to the cache - but if this is the case, why is Joomla struggling so much?
So what's going on? How come phpMyAdmin can return the limited results within a second and Joomla just times out?
Many thanks!
If you have one million records, you'll most definitely want to do as Riccardo is suggesting: override and optimize the model.
JModelList runs the query twice, once for the pagination numbers and then for the display query itself. You'll want to inherit carefully from JModelList to avoid the pagination query.
Also, the articles query is notorious for its joins. You can definitely lose some of that slowdown (I doubt you are using the contacts link, for example).
If all articles are visible to the public, you can remove the ACL check - that's pretty costly.
There is no DBA from the West or the East who is able to explain why all of those GROUP BYs are needed, either.
Losing those things will help considerably. In fact, building your query from scratch might be best.
It does add the pagination limit automatically.
Its struggling is most likely due to a large dataset (i.e. 1000+ items returned in the collection) and many lookup fields: the content modules, for example, join as many as 10 tables to get author names etc.
This can be a real killer; I had queries running for over one second with a dedicated server and only 3000 content items. One tag cloud component we found could take as long as 45 seconds to return a keywords list. If this is the situation (a lot of records and many joins), your only way out is to tighten the filters in the options to see if you can get faster results (for example, limiting to articles from the last 3 months can reduce the time needed dramatically).
But if this is not sufficient or not viable, you're left with writing a new optimized query in a new model, which will ultimately bring a bigger performance improvement than any other optimization. In writing the query, leverage database-specific optimizations, i.e. add indexes and full-text indexes, and only use joins if you really need them; a sketch of such a model follows below.
Also, make sure the number of joins does not grow with the number of fields, translations and so on.
A constant query is easy for the db engine to optimize and cache, whereas a dynamic query will never be as efficient.
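A minimal sketch of such an optimized model (Joomla 2.5/3.x style; the component, table and column names are only illustrative):

class MycomponentModelItems extends JModelList
{
    protected function getListQuery()
    {
        $db    = $this->getDbo();
        $query = $db->getQuery(true);

        // Select only the columns the view needs and skip the author,
        // category and contact joins the core articles model adds.
        $query->select($db->quoteName(array('a.id', 'a.title', 'a.created')))
              ->from($db->quoteName('#__content', 'a'))
              ->where($db->quoteName('a.state') . ' = 1')
              ->order($db->quoteName('a.created') . ' DESC');

        return $query;
    }
}

// getItems() still applies the list.limit state automatically, but the
// expensive joins and ACL checks are gone.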

How to access data in Dynamics CRM?

What is the best way in terms of speed of the platform and maintainability to access data (read only) on Dynamics CRM 4? I've done all three, but interested in the opinions of the crowd.
Via the API
Via the webservices directly
Via DB calls to the views
...and why?
My thoughts normally center around DB calls to the views but I know there are purists out there.
Given both requirements I'd say you want to call the views. Properly crafted SQL queries will fly.
Going through the API is required if you plan to modify data, but it isn't the fastest approach around because it doesn't allow deep loading of entities. For instance, if you want to look at customers and their orders, you'll have to load both up individually and then join them manually, whereas a SQL query will already have the data joined.
Never mind that the TDS stream is a lot more efficient than the SOAP messages used by the API and web services.
UPDATE
I should point out, in regard to the views and the CRM database in general: CRM does not optimize the indexes on the tables or views for custom entities (how could it?). So if you have a truckload entity that you look up by destination all the time, you'll need to add an index for that property, as sketched below. Depending upon your application it could make a huge difference in performance.
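For example (the index, table and column names are purely hypothetical; in CRM 4 a custom entity's attributes live on its ExtensionBase table):

-- Speed up lookups of the custom truckload entity by destination
CREATE INDEX IX_truckload_destination
ON new_truckloadExtensionBase (new_Destination);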
I'll add to Jake's comment by saying that querying against the tables directly instead of the views (*Base and *ExtensionBase) will be even faster.
In order of speed it'd be:
direct table query
view query
filtered view query
API call
Direct table updates:
I disagree with Jake that all updates must go through the API. The correct statement is that going through the API is the only supported way to do updates. There are in fact several instances where directly modifying the tables is the most reasonable option:
One time imports of large volumes of data while the system is not in operation.
Modification of specific fields across large volumes of data.
I agree that this sort of direct modification should only be a last resort, used when the performance of the API is unacceptable. However, if you want to modify a boolean field on thousands of records, a direct SQL update against the table is a great option.
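A sketch of what such an update might look like (the table and column names are hypothetical; in CRM 4 custom attributes sit on the ExtensionBase table):

-- Flip a custom boolean flag on every active account in one statement
UPDATE AccountExtensionBase
SET new_IsPreferred = 1
WHERE AccountId IN (SELECT AccountId FROM AccountBase WHERE StateCode = 0);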
Relative Speed
I agree with XVargas as far as relative speed.
Unfiltered Views vs Tables: I have not found the performance advantage to be worth the hassle of manually joining the base and extension tables.
Unfiltered views vs Filtered views: I was recently working with a complicated query that took about 15 minutes to run using the filtered views. After switching to the unfiltered views, the same query ran in about 10 seconds. Looking at the respective query plans, the raw query had 8 operations while the query against the filtered views had over 80.
Unfiltered Views vs API: I have never compared querying through the API against querying views, but I have compared the cost of writing data through the API vs inserting directly through SQL. Importing millions of records through the API can take several days, while the same operation using insert statements might take several minutes. I assume the difference isn't as great during reads but it is probably still large.

Entity Framework associations killing performance

Here is the performance test I am looking at. I have 8 different entities that are table-per-type. Some of the entities contain over 100 thousand rows.
This particular application does several recursive calculations on the client so I think it may be best to preload the data instead of lazy loading.
If there are no associations I can load the entire database in about 3 seconds. As I add associations in any way the performance starts to drastically decline.
I am loading all the data the same way (just calling ToList() on the entity set attached to the context). I ran the test with edmx-generated classes and with self-tracking entities and had similar results.
I am sure that if I were to handle the associations myself, similar to how I would with a DataSet, the performance problem would go away. On the other hand, I am pretty sure this is not how Entity Framework was intended to be used. Any thoughts or ideas?
Loading entities with relationships is going to be much slower than loading entities without them, even if the related entities are not fetched at load time, because EF has to build the complex object used to track the relationship in one case versus perhaps a simple value type like an int in the other. How much slower are you seeing it?
But ...
Preloading 100 thousand rows sounds like a really bad idea. When you call ToList() you eliminate any chance for EF and SQL to run an optimized query against your data. Are your calculations such that you always need to examine all the data? Have you tried it without preloading and examined the queries it generates? Have you tried using .Include to eagerly include just the related objects you know you will need, for example:
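A minimal sketch of that approach (the context, entity and property names are hypothetical):

using (var context = new OrdersContext())
{
    // Let SQL Server do the filtering and eagerly include only the
    // association the calculation actually needs.
    var cutoff = DateTime.UtcNow.AddMonths(-1);

    var orders = context.Orders
        .Include("LineItems")              // eager load one association
        .Where(o => o.CreatedOn >= cutoff) // filter in the database, not in memory
        .ToList();
}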
EF will be smart about caching if you give it the chance.
