I just started playing with Laravel 4 and Eloquent. I have a blog table and a number of tables related to it:
blog <- main info about the blog record
blog_lang <- translations for each blog record
blog_categories <- name speaks for itself
blog_categories_lang <- translations for blog categories titles
blog_to_categories <- pivot table between blog and blog_categories
blog hasMany blog_lang.
blog_categories hasMany blog_categories_lang.
blog belongsToMany blog_categories.
I want to show the following info in one grid: blog_id, blog_title, username, and all categories:
$data['blogs'] = Blog::with(array(
    'translations' => function ($q) {
        $q->where('lang_id', '=', 1);
    },
    'user',
    'categories',
    'categories.translations' => function ($q) {
        $q->where('lang_id', '=', 1);
    }
))->get();
This executes 5 queries... isn't that a few too many? Would it be better to just use Fluent and join all these tables with one bigger query?
Having Eloquent make a lot of small, indexed queries is much better than doing one big query, for a variety of reasons:
MySQL will not have to load multiple tables into temporary memory for each query, and may re-use results between queries
The SQL optimizer will run more swiftly through each query
It allows you to cache your results without having to strip the JOINs (and other similar clauses) out of your data, which makes caching easy
You're not actually noticing it, but the path taken by the SQL optimizer is the same between the following queries:
SELECT a.*, b.* FROM a INNER JOIN b ON (a.id=b.id) WHERE a.id = 1
SELECT a.*, b.* FROM a, b WHERE a.id = b.id AND a.id = 1
Both of them will cause the SQL optimizer to perform these queries under-the-hood:
SELECT a.* FROM a WHERE a.id = 1
SELECT b.* FROM b WHERE b.id = 1
And from there, depending on your indices, the SQL optimizer will perform the matching based on either the indices or the full table data. What is Eloquent doing? Exactly those two queries. You're not gaining anything with one big query - in fact, you're losing out on data reusability. In general, prefer small, optimized, re-usable, cacheable queries over bulky statements.
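For the eager load in the question, the five queries Eloquent issues look roughly like this (a sketch - the foreign key and pivot column names are assumed from the schema described above):
-- one query per with() relation, each filtering with WHERE ... IN on the parent keys
SELECT * FROM blog;
SELECT * FROM blog_lang WHERE lang_id = 1 AND blog_id IN (1, 2, ...);
SELECT * FROM users WHERE id IN (...);
SELECT blog_categories.*, blog_to_categories.blog_id
FROM blog_categories
INNER JOIN blog_to_categories ON blog_categories.id = blog_to_categories.blog_category_id
WHERE blog_to_categories.blog_id IN (1, 2, ...);
SELECT * FROM blog_categories_lang WHERE lang_id = 1 AND blog_category_id IN (...);
Each of these hits an index, and each result set can be cached and re-used independently.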
Related
Let's say we have a table users and we are left joining multiple tables to it.
$users = User::query()
->select('users.id', 'bananas.id as banana_id', 'dogs.id as dog_id')
->leftJoin('bananas', 'bananas.user_id', '=', 'users.id')
->unionAll($usersWithDogs) // a similar query with a left join on `dogs`
->orderByDesc('users.created_at')
->paginate(...);
We end up with a collection of User models with attributes id, dog_id, banana_id.
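The generated SQL is roughly the following sketch (each arm fills the other side's column with NULL; I've added created_at to the select list so the ORDER BY has something to work on - the exact names are assumptions):
(SELECT users.id, bananas.id AS banana_id, NULL AS dog_id, users.created_at
FROM users LEFT JOIN bananas ON bananas.user_id = users.id)
UNION ALL
(SELECT users.id, NULL AS banana_id, dogs.id AS dog_id, users.created_at
FROM users LEFT JOIN dogs ON dogs.user_id = users.id)
ORDER BY created_at DESC;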
Now imagine we want to eager load these, but the Eloquent relations are the usual one-to-many relationships, $user->dogs and $user->bananas.
I'm trying to find a solution that does all of the following:
a) not break pagination
b) allow ordering on the user table
c) allow eager loading
d) use clean code
e) end up with a collection of users
Brainstorming so far has led to the following options:
A union of bananas and dogs, eager load the user relation, then invert the collection (messy code)
Dynamic relationships created on User, possibly with a macro on \Illuminate\Database\Eloquent\Builder. Maybe by leveraging Model::resolveRelationUsing()
Manual eager loading with a union select with a left join to each table in each arm of the union, then a whereIn() to get the related records
Restructure the relations so that there is a polymorphic many-to-many relationship between users and the other entities
e.g.
user_id | model_type | model_id
1 | App\Models\Banana | 2
1 | App\Models\Dog | 5
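A minimal sketch of that pivot table in SQL (the table and column names here are hypothetical):
CREATE TABLE user_relations (
    user_id BIGINT UNSIGNED NOT NULL,
    model_type VARCHAR(255) NOT NULL, -- e.g. 'App\Models\Banana'
    model_id BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, model_type, model_id)
);
This is the same shape Laravel's built-in polymorphic relations use (a *_type column plus a *_id column), so morphToMany/morphedByMany relations could be defined on top of it.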
Maybe I'm missing something obvious...?
The LINQ query below takes around 20 seconds to execute on some of the data. When the LINQ is converted to SQL, there are 3 nested joins that might be what is taking so long. Can the query be optimized?
var query = (from s in this.Items
             where demoIds.Contains(s.Id)
             select s)
    .Include("demo1")
    .Include("demo2")
    .Include("demo3")
    .Include("demo4");
return query;
The expectation is for the query to execute in 3-4 seconds; it currently takes around 20 seconds for 100 demoIds.
As far as your code is concerned, it looks like the best way to get what you want (assuming that Include-ing "demo3" twice was a typo for this example).
However, the database you use will have a way to optimize your queries - or rather, the underlying data structures. Use whatever tool your database provider offers to get an execution plan of the query and see where it spends so much time. You might be missing an index or two.
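For example, on SQL Server (assuming that is the backing database), you can capture the SQL that Entity Framework generates (e.g. via SQL Profiler or by calling ToString() on the query) and run it like this in Management Studio:
-- show I/O and timing statistics for the pasted query
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- <paste the generated query here>
Alternatively, turn on "Include Actual Execution Plan" before running it and look for table scans where you would expect index seeks.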
I'd advise lazy loading or an explicit join query.
The SQL output is probably a query of this shape:
SELECT .. FROM (SELECT .. FROM table1 WHERE Id IN (...)) AS T1
(INNER/FULL) JOIN (SELECT .. FROM table2) AS T2 ON T1.PK = T2.FOREIGNKEY
(INNER/FULL) JOIN (SELECT .. FROM table3) AS T3 ON T1.PK = T3.FOREIGNKEY
(INNER/FULL) JOIN (SELECT .. FROM table4) AS T4 ON T1.PK = T4.FOREIGNKEY
If you can use lazy loading, there is no need for the Include() calls at all, and lazy loading may solve your problem.
Otherwise, you can write it as an explicit join query:
var query = from i in this.Items.Where(w => demoIds.Contains(w.Id))
            join d1 in demo1 on i.Id equals d1.FK
            join d2 in demo2 on i.Id equals d2.FK
            join d3 in demo3 on i.Id equals d3.FK
            select new ... { };
Either of these two solutions should solve your problem. If the problem persists, I strongly recommend a stored procedure.
I had a similar issue with a query that had 15+ Include statements and generated a 2M+ row result in 7 minutes.
The solution that worked for me was:
Disabled lazy loading
Disabled auto detect changes
Split the big query into small chunks
A sample can be found below:
public IQueryable<CustomObject> PerformQuery(int id)
{
    // Read-only query: turn off lazy loading and change tracking
    ctx.Configuration.LazyLoadingEnabled = false;
    ctx.Configuration.AutoDetectChangesEnabled = false;
    IQueryable<CustomObject> customObjectQueryable = ctx.CustomObjects.Where(x => x.Id == id);
    // First chunk: the object graph reachable through YourObject
    var selectQuery = customObjectQueryable.Select(x => x.YourObject)
        .Include(c => c.YourFirstCollection)
        .Include(c => c.YourFirstCollection.OtherCollection)
        .Include(c => c.YourSecondCollection);
    // Second chunk: the remaining related objects, loaded separately
    var otherObjects = customObjectQueryable.SelectMany(x => x.OtherObjects);
    // Execute both chunks; EF fixes up the associations in memory
    selectQuery.FirstOrDefault();
    otherObjects.ToList();
    return customObjectQueryable;
}
IQueryable is needed in order to do all of the filtering on the server side. IEnumerable would perform the filtering in memory, which is a very time-consuming process. Entity Framework will fix up the associations between the chunks in memory.
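To see why that matters, here is roughly what reaches the database in each case (a sketch with assumed names):
-- IQueryable: the Where() is translated into the SQL, so the server filters
SELECT * FROM CustomObjects WHERE Id = @id;
-- IEnumerable: the untranslated query is materialized first, so every row
-- comes back and the filter runs afterwards in application memory
SELECT * FROM CustomObjects;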
I'm seeing significantly worse performance on a particular JOIN between two tables after running ANALYZE.
Suppose the following schema:
CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, text TEXT, value INTEGER, PRIMARY KEY(a, value) );
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id
GROUP BY a.id
ORDER BY a.name;
Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).
Before ANALYZE
Now when I run the following query:
SELECT * FROM ab;
The query plan looks as follows:
1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
This is a good plan: b is larger, and I want it in the outer loop, making use of the primary key index on table a. It finishes well within a second.
After ANALYZE
When I execute the same query again, the query plan results in two table scans:
1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b
This is far from optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minutes to complete.
Should I adapt the index on table b to make it work after ANALYZE? Is there anything else I should change in the indexing/schema?
I'm just trying to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty, and I don't really understand why the planner would choose a plan that is orders of magnitude slower than the un-analyzed one. It seems to be related to the GROUP BY, since without it the query planner puts table b in the outer loop (but that renders the query useless for what I want).
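For reference, the workaround looked roughly like this - in SQLite, the CROSS JOIN keyword is the documented way to force the left-hand table into the outer loop:
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM b CROSS JOIN a ON b.a = a.id
GROUP BY a.id
ORDER BY a.name;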
I accidentally found the answer by adjusting the GROUP BY clause in the view definition. Instead of grouping on a.id, I now group on b.a, although they hold the same values.
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id
GROUP BY b.a -- <== changed this from a.id to b.a
ORDER BY a.name;
I'm still not entirely sure what the difference is, since it groups the same data. My guess is that grouping on b.a lets the planner satisfy the GROUP BY directly from b's primary key order, which keeps b in the outer loop.
I wanted to write a LINQ query based on the SQL below.
Basically this strategy seems really confusing - why start from MerchantGroupMerchant and use two 'from' clauses?
Problem: Is there a simpler way to write this LINQ query?
var listOfCampaignsMerchantIsInvolvedIn =
    (from merchantgroupactivity in uow.MerchantGroupActivities
     from merchantgroupmerchant in uow.MerchantGroupMerchants
     where merchantgroupmerchant.MerchantU.Id == merchantUIDGuid
     select new
     {
         merchantgroupactivity.ActivityU.CampaignU.Id
     }).Distinct();
Here is the table structure (diagram not reproduced here) and the SQL:
SELECT DISTINCT Campaign.ID
FROM Campaign
INNER JOIN Activity
ON ( Campaign.CampaignUID = Activity.CampaignUID )
INNER JOIN MerchantGroupActivity
ON ( Activity.ActivityUID = MerchantGroupActivity.ActivityUID )
INNER JOIN MerchantGroup
ON ( MerchantGroup.MerchantGroupUID = MerchantGroupActivity.MerchantGroupUID )
INNER JOIN MerchantGroupMerchant
ON ( MerchantGroupMerchant.MerchantGroupUID = MerchantGroup.MerchantGroupUID )
INNER JOIN Merchant
ON ( Merchant.MerchantUID = MerchantGroupMerchant.MerchantUID )
WHERE Merchant.ID = 'M1'
No, not really. Even if you use views to partially or completely reduce the query size, the execution plan will still look the same in the end (and execute just as fast/slow). If you have to traverse 5 joins, then you have to traverse 5 joins. The only cure is "shorting" the model by introducing direct links between, say, Merchant and Activity, or Merchant and Campaign. You could accomplish this by introducing an M2M table between them (at the cost of manual maintenance), but I would not recommend it unless retrieval is really an issue. If this query is too slow, you should check for the existence of indexes on all the FK fields used in the joins.
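For example (the index names are mine; the columns come from the SQL above):
CREATE INDEX IX_Activity_CampaignUID ON Activity (CampaignUID);
CREATE INDEX IX_MerchantGroupActivity_ActivityUID ON MerchantGroupActivity (ActivityUID);
CREATE INDEX IX_MerchantGroupActivity_MerchantGroupUID ON MerchantGroupActivity (MerchantGroupUID);
CREATE INDEX IX_MerchantGroupMerchant_MerchantGroupUID ON MerchantGroupMerchant (MerchantGroupUID);
CREATE INDEX IX_MerchantGroupMerchant_MerchantUID ON MerchantGroupMerchant (MerchantUID);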
I have two entities: Master and Details.
When I query them, the resulting query sent to the database is:
SELECT [Extent2]."needed columns listed here", [Extent1]."needed columns listed here"
FROM (SELECT [Details]."all columns listed here"
      FROM [dbo].[Details] AS [Details]) AS [Extent1]
LEFT OUTER JOIN [dbo].[Master] AS [Extent2] ON [Extent1].[key] = [Extent2].[key]
WHERE [Extent1].[filterColumn] = @p__linq__0
My question is: why is the filter not in the inner query? How can I get that query? I've tried a lot of EF and LINQ expressions.
What I need is something like:
SELECT <anything needed>
FROM Master LEFT JOIN Details ON Master.key = Details.Key
WHERE filterColumn = @param
I'm getting a full sequential scan on both tables, and in my production environment I have millions of rows in each table.
Thanks a lot!
Sometimes Entity Framework does not produce the best query. You can do a few of the following to optimize:
Modify the LINQ statement (test with LINQPad)
Create a stored proc and map the stored proc to return an entity
Create a view that handles the join and map the view to a new entity
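A sketch of that last option, using the names from the question (the exact column list is an assumption):
CREATE VIEW MasterWithDetails AS
SELECT Master.[key], Details.[filterColumn] -- plus whatever other columns you need
FROM Master
LEFT JOIN Details ON Master.[key] = Details.[key];
Map the view to a read-only entity and filter on filterColumn; the database can then push the predicate down into the join instead of filtering the derived table afterwards.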