Faster pagination with Doctrine 2.1 EXTRA_LAZY associations - performance

Doctrine 2.1 introduces a new feature, EXTRA_LAZY loading for associations: https://www.doctrine-project.org/projects/doctrine-orm/en/latest/tutorials/extra-lazy-associations.html
This feature adds a method, slice($offset, $length), that queries just one page of the association, which is very useful for paginating large data sets.
However, behind the scenes the SQL query uses the classic LIMIT XX OFFSET XX syntax, which is slow for large data sets (https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/).
Is there a way to use the pagination with a WHERE clause?
If not, how may I extend the instance of Doctrine\ORM\PersistentCollection to create a method sliceWithCursor($columnName, $cursor, $length)?
My main goal is to implement a faster pagination while using the very convenient magic of Doctrine for associations.
Thanks!

You can use the matching() method of Doctrine\ORM\PersistentCollection, passing a Criteria object to filter by, e.g.:
use Doctrine\Common\Collections\Criteria;
$group = $entityManager->find('Group', $groupId);
$userCollection = $group->getUsers();
$criteria = Criteria::create()
    ->where(Criteria::expr()->eq("birthday", "1982-02-17"))
    ->orderBy(array("username" => Criteria::ASC));
$birthdayUsers = $userCollection->matching($criteria);
matching() returns a Doctrine\ORM\LazyCriteriaCollection if your association is defined as EXTRA_LAZY.
You can then paginate the result:
$birthdayUsers->slice($offset, $length);
Using cursor pagination
In some cases, cursor pagination is required. You could do this by extending Doctrine\ORM\PersistentCollection, as suggested:
use Doctrine\Common\Collections\Criteria;
// Method of a class extending Doctrine\ORM\PersistentCollection.
// Assumes the ordered fields and "id" are readable as public properties of $cursorEntity.
public function sliceWithCursor(Criteria $criteria, $cursorEntity, $limit) {
    // Work on a copy so the caller's Criteria object is not modified.
    $criteria = clone $criteria;
    foreach ($criteria->getOrderings() as $columnName => $direction) {
        if ($direction === Criteria::ASC) {
            $criteria->andWhere(Criteria::expr()->gte($columnName, $cursorEntity->{$columnName}));
        } else {
            $criteria->andWhere(Criteria::expr()->lte($columnName, $cursorEntity->{$columnName}));
        }
    }
    // exclude cursor entity from the results
    $criteria->andWhere(Criteria::expr()->neq("id", $cursorEntity->id));
    $criteria->setMaxResults($limit);
    return $this->matching($criteria);
}
The idea of cursor-based pagination is to use a result row as the starting point, instead of an offset, and fetch the rows that follow it. As described in "alternative for using OFFSET", the offset is replaced with conditions derived from the ORDER BY clause.
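To make the "conditions derived from the ORDER BY clause" concrete, here is a minimal plain-PHP sketch of the seek condition, run over an in-memory array rather than through Doctrine. The helper name pageAfterCursor and the sample data are hypothetical. It assumes a single sort column plus a unique id as tie-breaker, and compares the pair (column, id) as a whole, so rows that share the cursor's sort value but come before it are not returned again (a combination of gte and "id != cursor id" would return them).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of Doctrine: returns the page that follows
 * $cursor when rows are ordered by ($column ASC, id ASC).
 *
 * In a database that supports row-value comparison, the same seek condition
 * would look roughly like:
 *   WHERE (username, id) > (:cursorUsername, :cursorId)
 *   ORDER BY username, id LIMIT :length
 */
function pageAfterCursor(array $rows, string $column, ?array $cursor, int $length): array
{
    // Order by the sort column, with the unique id as tie-breaker.
    usort($rows, fn (array $a, array $b): int => [$a[$column], $a['id']] <=> [$b[$column], $b['id']]);

    if ($cursor !== null) {
        // Keep only rows that sort strictly after the cursor row.
        $rows = array_filter(
            $rows,
            fn (array $row): bool => ([$row[$column], $row['id']] <=> [$cursor[$column], $cursor['id']]) > 0
        );
    }

    return array_slice(array_values($rows), 0, $length);
}

$users = [
    ['id' => 1, 'username' => 'alice'],
    ['id' => 2, 'username' => 'bob'],
    ['id' => 3, 'username' => 'bob'],
    ['id' => 4, 'username' => 'carol'],
];

$page1 = pageAfterCursor($users, 'username', null, 2);        // ids 1, 2
$page2 = pageAfterCursor($users, 'username', end($page1), 2); // ids 3, 4
```

The last row of each page serves as the cursor for the next one, so no offset is needed. In a real query the sort column and id would need a suitable composite index for this to be fast.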

Related

Algolia Laravel Scout complex where clauses and eager loading

Laravel Scout doesn't support where clauses more complex than simple numeric comparisons, so I checked the source code and found the following lines:
if (!empty($models = $model->getScoutModelsByIds($builder, $modelKeys))) {
    $instances = $instances->merge($models->load($searchable->getRelations($modelClass)));
}
$instances is what is returned from the Algolia search, so for example the following search essentially returns the $instances variable:
Mode::search('something')->get();
$model is the searchable model, and what getScoutModelsByIds basically does is run a database query like:
public function getScoutModelsByIds($builder, $modelKeys) {
    return $this->whereIn('id', $modelKeys)->get();
}
I was wondering: is it a good idea to apply where clauses, addSelect, or eager loading (with) to the model before actually retrieving the data from the database?
For example
$model->where('some condition')->whereIn('id', $modelKeys)->get();
and instead of using lazy loading
$instances = $instances->merge($models->load($searchable->getRelations($modelClass)));
use the with function before retrieving the data from db.
For example
$model->where('some condition')->whereIn('id', $modelKeys)->with('relationships')->get();

Is Laravel sortBy slower or heavier to run than orderBy?

My concern is that while orderBy is applied to the query, I'm not sure how sortBy is applied.
The reason for using sortBy in my case is that I get the collection via the model (i.e. $user->houses->sortBy('created_at')).
I'm just concerned about the performance: does sortBy simply loop over each object and sort them, or is Laravel smart enough to transform the sortBy into an orderBy executed within the original query?
You need orderBy in order to perform a SQL order.
$user->houses()->orderBy('created_at')->get()
You can also eager load the houses in the right order to avoid N+1 queries.
$users = User::with(['houses' => function ($query) {
    return $query->orderBy('created_at');
}])->get();
$orderedHouses = $users->first()->houses;
The sortBy method is applied to the Collection, so yes, it loops over the already-loaded objects and sorts them in PHP.
The orderBy() method is much more efficient than the sortBy() method when querying databases of a non-trivial size (roughly 1,000+ rows). This is because orderBy() adds an ORDER BY clause to an SQL query that has not yet run, so the database does the sorting, whereas sortBy() sorts the result of a query in PHP after it has been fetched.
For reference, it is important to understand the difference between a Collection object and a Builder object in Laravel.
A builder object is, essentially, an SQL query that has not been run. In contrast, a collection is essentially an array with some extra functionality/methods added. Sorting an array is much less efficient than pulling the data from the DB in the correct format on the actual query.
Example code:
<?php
// Plan out a query to retrieve the posts alphabetized Z-A
// This is still a query and has not actually run
$posts = Posts::select('id', 'created_at', 'title')->orderBy('title', 'desc');
// Now the query has actually run. $posts is now a collection.
$posts = $posts->get();
// If you then want to sort this collection by the created_at timestamp,
// you *could* do this.
// This will run quickly with a small number of rows in the result,
// but can become slow and memory-hungry enough to cause server errors
// if the collection contains many thousands of objects.
$posts = $posts->sortBy('created_at');
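As an illustration of why sorting a collection happens entirely in PHP, here is a simplified stand-in for an in-memory "sort by key". This is a sketch, not Laravel's actual implementation; the function name sortRowsBy and the sample rows are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Simplified, hypothetical stand-in for an in-memory "sort by key".
 * Every row must already be loaded into PHP; the database is not involved.
 */
function sortRowsBy(array $rows, string $key): array
{
    // uasort keeps the original array keys, which is also what
    // Collection::sortBy does (call values() to reset them).
    uasort($rows, fn (array $a, array $b): int => $a[$key] <=> $b[$key]);

    return $rows;
}

$houses = [
    ['id' => 1, 'created_at' => '2021-03-01'],
    ['id' => 2, 'created_at' => '2020-01-15'],
    ['id' => 3, 'created_at' => '2020-07-30'],
];

$sorted = sortRowsBy($houses, 'created_at');
$ids = array_column($sorted, 'id'); // ids in created_at order
```

With orderBy the equivalent work is done by the database, which can use an index on created_at and only send back the rows actually requested.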

Magento search queries yielding empty results in API

I have this chunk of code:
//to-do
public function searchVehicles($terms, $offset=1, $order='ASC')
{
    if (trim($terms) == '') {
        return array();
    }
    $query = $this->_getQuery($terms);
    $query->setStoreId(1);
    if ($query->getId()) {
        $query->setPopularity($query->getPopularity()+1);
    }
    else {
        $query->setPopularity(1);
    }
    $query->prepare();
    $query->save();
    $collection = Mage::getResourceModel('catalog/product_collection');
    $collection->getSelect()->joinInner(
        array('search_result' => $collection->getTable('catalogsearch/result')),
        $collection->getConnection()->quoteInto(
            'search_result.product_id=e.entity_id AND search_result.query_id=?',
            $query->getId()
        ),
        array('relevance' => 'relevance')
    );
    $collection->setStore(1);
    //Mage::getSingleton('catalog/product_status')->addVisibleFilterToCollection($collection);
    //Mage::getSingleton('catalog/product_visibility')->addVisibleInSearchFilterToCollection($collection);
    return $this->_listProductCollection($collection, $offset, $order);
}
This is inside a Resource class and is reachable via SOAP.
Before we start: yes, I remembered to flush the cache and recompile - I mention it because this is a common issue for newbies like me xDDD.
Now: I can call this method, but it returns [].
SPECIAL NOTE: $this->_listProductCollection($collection, $offset, $order); WORKS, since I'm using the same method with other collections fetched by other methods in the same resource, and have no trouble at all.
Let me walk through the intention of my code, since I'm a newbie at Magento (I'm using version 1.6.2).
The code is based on the indexAction() method of the CatalogSearch/ResultController controller, which I tried to learn from.
An empty query will yield an empty result and will not bother the Magento search engine.
There's only a Store (id = 1) in the site and the search query is created like this:
private function _getQuery($terms)
{
    $query = Mage::getModel('catalogsearch/query')->loadByQuery($terms);
    if (!$query->getId()) {
        $query->setQueryText($terms);
    }
    return $query;
}
The query increases its popularity (I took this code from the controller; I assume this is for statistical purposes only).
The query is prepared (I think this means: the MySQL internal query is prepared) so I can fetch it later.
The query is saved - AFAIK this means that the query results are iterated and cached so a subsequent same query will only fetch the stored results instead of processing the search again.
At this point the query will have an ID.
I get the whole Product collection, and join it with the search result table. It SEEMS that the results table has at least (queryId, matchedProductId). I only keep the products having IDs in the matched results, and from store 1.
I list the products.
Note that the filters are currently commented out.
However, the returned list is [] (an empty list) when I hit this API entry point, although searching in the usual search bar gives me the expected result.
Question: What am I missing? What did I misunderstand in the process?

MINUS operation in Eloquent ORM

Is there any equivalent of SQL's MINUS operation in Eloquent ORM?
For example:
$model1 = Model::where('some constraints applied');
$model2 = Model::where('some constraints applied');
I want to get all models that exist in $model1 but not in $model2.
seblaze's answer looks good, though it will run 3 queries. Another option is the diff() method of the Collection object:
$result = $model1->diff($model2);
This runs only 2 queries (unless your 'constraints applied' trigger more), but it fetches both complete result sets from the db before computing the difference.
The easiest way I see is:
// Get the ids of the first set of models as an array
// (lists() was replaced by pluck() in later Laravel versions)
$ids1 = $model1->lists('id');
// Get the ids of the second set of models as an array
$ids2 = $model2->lists('id');
// Get the models that are in the first set but not in the second
$models = Model::whereIn('id',$ids1)->whereNotIn('id',$ids2)->get();
This code is not tested; please read more about Eloquent queries here
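Both answers compute the same set difference by id. As a plain-PHP sketch of that logic (no Eloquent involved; the helper name minusById and the sample rows are hypothetical), it amounts to:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the rows of $first whose 'id'
 * does not appear in $second (the MINUS / EXCEPT semantics, keyed on id).
 */
function minusById(array $first, array $second): array
{
    // Flip the ids into array keys so each lookup is a cheap isset().
    $excluded = array_flip(array_column($second, 'id'));

    return array_values(array_filter(
        $first,
        fn (array $row): bool => !isset($excluded[$row['id']])
    ));
}

$set1 = [
    ['id' => 1, 'name' => 'a'],
    ['id' => 2, 'name' => 'b'],
    ['id' => 3, 'name' => 'c'],
];
$set2 = [
    ['id' => 2, 'name' => 'b'],
    ['id' => 4, 'name' => 'd'],
];

$result = minusById($set1, $set2); // rows with ids 1 and 3
```

Doing this in PHP means both sets have to be loaded first. The whereIn/whereNotIn variant pushes the final filtering to the database, at the cost of an extra query.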

Entity Framework LINQ Query using Custom C# Class Method - Once yes, once no - because executing on the client or in SQL?

I have two Entity Framework 4 LINQ queries that make use of a custom class method; one works and one does not.
The custom method is:
public static DateTime GetLastReadToDate(string fbaUsername, Discussion discussion)
{
    return (discussion.DiscussionUserReads
                .Where(dur => dur.User.aspnet_User.UserName == fbaUsername)
                .FirstOrDefault()
            ?? new DiscussionUserRead { ReadToDate = DateTime.Now.AddYears(-99) }).ReadToDate;
}
The linq query that works calls a from after a from, the equivalent of SelectMany():
from g in oc.Users.Where(u => u.aspnet_User.UserName == fbaUsername).First().Groups
from d in g.Discussions
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > DiscussionRepository.GetLastReadToDate(fbaUsername, p.Discussion)).Count()
};
The query that does not work is more like a regular select:
from d in oc.Discussions
where d.Group.Name == "Student"
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > DiscussionRepository.GetLastReadToDate(fbaUsername, p.Discussion)).Count(),
};
The error I get is:
LINQ to Entities does not recognize the method 'System.DateTime GetLastReadToDate(System.String, Discussion)' method, and this method cannot be translated into a store expression.
My question is: why am I able to use my custom GetLastReadToDate() method in the first query and not the second? I suppose this has something to do with what gets executed on the db server and what gets executed on the client. These queries seem to use the GetLastReadToDate() method so similarly, though, that I'm wondering why it would work for the first and not the second, and, most importantly, whether there's a way to factor common query syntax like what's in the GetLastReadToDate() method into a separate location to be reused in several different other LINQ queries.
Please note all these queries are sharing the same object context.
I think you're better off using a Model Defined Function here.
Define a scalar function in your database which returns a DateTime, pass through whatever you need, map it on your model, then use it in your LINQ query:
from g in oc.Users.Where(u => u.aspnet_User.UserName == fbaUsername).First().Groups
from d in g.Discussions
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > myFunkyModelFunction(fbaUsername, p.Discussion)).Count()
};
"...and most importantly if there's a way to factor common query syntax like what's in the GetLastReadToDate() method into a separate location to be reused in several different other LINQ queries."
A stored procedure would probably be one way to store that "common query syntax". EF, at least 4.0, works very nicely with SPs.
