Limiting Eloquent chunks - laravel

I have a very large result set to process and so I'm using the chunk() method to reduce the memory footprint of the job. However, I only want to process a certain number of total results to prevent the job from running too long.
Currently I'm doing this, but it does not seem like an elegant solution:
$count = 0;
$max = 1000000;
$lists = Lists::whereReady(true);

$lists->chunk(1000, function (Collection $lists) use (&$count, $max) {
    if ($count >= $max) {
        return;
    }
    foreach ($lists as $list) {
        if ($count >= $max) {
            break;
        }
        $count++;
        // ...do stuff
    }
});
Is there a cleaner way to do this?

As of right now, I don't believe so.
There have been some issues and pull requests submitted to have chunk respect previously set skip/limits, but Taylor has closed them, saying that it is expected behavior for chunk to overwrite them.
There is currently an open issue in the laravel/internals repo where he said he'd take a look again, but I don't think it is high on the priority list. I doubt it is something he would work on himself, but he may be more receptive to another pull request now.
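To illustrate what that means in practice, a minimal sketch (reusing the question's Lists model): a limit set before chunking is simply discarded, because chunk() replaces the query's limit/offset with its own page-sized values on every iteration.
$max = 1000000;

// NB: this take() has no effect - chunk() overrides the query's
// limit/offset with its own values (1000 per page) for each chunked query.
Lists::whereReady(true)->take($max)->chunk(1000, function (Collection $lists) {
    foreach ($lists as $list) {
        // ...do stuff
    }
});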
Your solution looks fine, except for one thing. chunk() will end up reading your entire table, unless you return false from your closure. Currently, you are just returning null, so even though your "max" is set to 1000000, it will still read the entire table. If you return false from your closure when $count >= $max, chunk() will stop querying the database. It will cause chunk() to return false itself, but your example code doesn't care about the return of chunk() anyway, so that's okay.
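Applied to your code, that change is small; a minimal sketch of the same loop with the early return false:
$count = 0;
$max = 1000000;

Lists::whereReady(true)->chunk(1000, function (Collection $lists) use (&$count, $max) {
    if ($count >= $max) {
        return false; // tells chunk() to stop issuing further queries
    }
    foreach ($lists as $list) {
        if ($count >= $max) {
            break;
        }
        $count++;
        // ...do stuff
    }
});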
Another option, assuming you're using sequential ids, would be to get the ending id and then add a where clause to your chunked query to get all the records with an id less than your max id. So, something like:
$max = 1000000;
$maxId = Lists::whereReady(true)->skip($max)->take(1)->value('id');
$lists = Lists::whereReady(true)->where('id', '<', $maxId);
$lists->chunk(1000, function (Collection $lists) {
    foreach ($lists as $list) {
        // ...do stuff
    }
});
Code is slightly cleaner, but it is still a hack, and requires one extra query (to get the max id).

Related

Eloquent Model save() run out of memory

I use an Eloquent model to do a complex database migration and run out of memory during processing. Can someone explain the reason? Thank you!
Laravel version: "v8.52.0"
Test code:
public function handle()
{
    for ($i = 0; $i < 100; $i++) {
        Customer::chunkById(1000, function ($customers) use ($i) {
            $this->print_progress();
            foreach ($customers as $customer) {
                $customer->first_name = (string) $i;
                $customer->save();
            }
        });
    }
}
Output (memory usage):
usage: 27MB - peek: 27MB
usage: 33MB - peek: 33MB
usage: 39MB - peek: 39MB
...
...
usage: 491MB - peek: 491MB
usage: 496MB - peek: 496MB
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 20480 bytes) in /home/vagrant/code/billing/vendor/laravel/framework/src/Illuminate/Support/Str.php on line 855
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 32768 bytes) in /home/vagrant/code/billing/vendor/symfony/error-handler/Error/FatalError.php on line 1
Update: The memory leak was caused by Telescope. With Telescope turned off, no memory leak occurs.
You have a memory leak somewhere.
I'll assume the problem is not within the print_progress function, but please double-check it (or edit your question with its content).
It's hard to give you an accurate answer since many things can cause memory leaks, but try using saveQuietly instead of save. Model events will not be dispatched, and they might be the cause of your problem.
Also, check that you are not using Laravel Telescope, and if you are, disable it during these tests.
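A sketch of both suggestions combined, assuming a Laravel version where Model::saveQuietly() is available (it is in recent 8.x releases) and that Telescope is configured with its default TELESCOPE_ENABLED switch:
public function handle()
{
    for ($i = 0; $i < 100; $i++) {
        Customer::chunkById(1000, function ($customers) use ($i) {
            $this->print_progress();
            foreach ($customers as $customer) {
                $customer->first_name = (string) $i;
                $customer->saveQuietly(); // save without dispatching model events
            }
        });
    }
}

// In .env, to rule Telescope out while testing:
// TELESCOPE_ENABLED=false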
First of all, your code looks really strange to me, but I understand if you are just testing things.
So, your logic is this:
Repeat the next step 100 times.
Each iteration goes to the database and gets Customers in chunks of 1000 (but you will always iterate over all of your Customers).
For each chunk, you set the first name of every Customer in it to the current $i index (as a string). This happens for all your Customers, but you are doing it in chunks.
So:
You are not filtering your query in any way, so chunkById may be less performant than a normal chunk; first of all, try that instead.
If you are still running out of memory, reduce the 1000 to 500 or, as a personal recommendation, 200 or 100; never have more than that per chunk. Hydrating 1000 models into a collection is not very performant.
You can make the code a little more readable and Laravel-friendly by using higher order messages:
public function handle()
{
    for ($i = 0; $i < 100; $i++) {
        Customer::chunk(100, function ($customers) use ($i) {
            $this->print_progress();
            $customers->each->update(['first_name' => (string) $i]);
        });
    }
}
But if you want to be as performant as possible, you can disregard chunking and update the table directly; that will be nearly instant compared to letting PHP do the work:
public function handle()
{
    for ($i = 0; $i < 100; $i++) {
        Customer::query()->update(['first_name' => (string) $i]);
        $this->print_progress();
    }
}
But I am not sure if you are just testing performance or what, so maybe this last code is of no use for you.

Laravel - Collection with relations take a lot of time

We are developing an API with LUMEN.
Today we ran into a confusing problem when getting the collection of our "TimeLog" model.
We just wanted to get all time logs with additional information from the board model and the task model.
Each time log row has a board_id and a task_id; both are 1:1 relations.
This was our first code for getting the whole data. This took a lot of time and sometimes we got a timeout:
BillingController.php
public function byYear() {
    $timeLog = TimeLog::get();
    $resp = array();

    foreach ($timeLog->toArray() as $key => $value) {
        if (($timeLog[$key]->board_id && $timeLog[$key]->task_id) > 0) {
            array_push($resp, array(
                'board_title' => isset($timeLog[$key]->board->title) ? $timeLog[$key]->board->title : null,
                'task_title'  => isset($timeLog[$key]->task->title) ? $timeLog[$key]->task->title : null,
                'id'          => $timeLog[$key]->id
            ));
        }
    }

    return response()->json($resp);
}
TimeLog.php, where the relations are defined:
public function board()
{
    return $this->belongsTo('App\Board', 'board_id', 'id');
}

public function task()
{
    return $this->belongsTo('App\Task', 'task_id', 'id');
}
Our new way is like this:
BillingController.php
public function byYear() {
    $timeLog = TimeLog::join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
        ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
        ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
        ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title', 'oc_time_logs.id', 'oc_time_logs.time_used_sec', 'oc_users.id AS user_id')
        ->getQuery()
        ->get();

    return response()->json($timeLog);
}
We deleted the relations in TimeLog.php because we don't need them anymore. Now we have a load time of about 1 second, which is fine!
There are about 20k entries in the time log table.
My questions are:
Why does the first method take so long (what causes the timeout)?
What exactly does getQuery() do?
If you need more information just ask me.
--First Question--
One of the issues you might be facing is holding that huge amount of data in memory, i.e.:
$timeLog = TimeLog::get();
This is already enormous. Then when you are trying to convert the collection to array:
There is a loop through the collection.
Calling $timeLog->toArray() to initialize the loop is, to my understanding, not efficient (I might not be entirely correct about this, though)
Thousands of queries are made to retrieve the related models
So I would propose five methods (one of which saves you from hundreds of queries), the last of which is efficient at returning a customized result:
Since you have a lot of data, chunk the result (ref: Laravel chunk), so you have this instead:
TimeLog::chunk(1000, function ($logs) {
    foreach ($logs as $log) {
        // Do the stuff here
    }
});
Another way is using cursor (it runs only one query matching the conditions); internally, as I understand it, cursor uses generators.
foreach (TimeLog::where([['board_id', '>', 0], ['task_id', '>', 0]])->cursor() as $timelog) {
    // do the other stuff here
}
This looks like the first option, but you have already narrowed your query down to what you need:
TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->get()
Eager loading would give you the relationships you need on the fly, but it might also put more data in memory. So the chunk method would possibly make things easier to manage, even when you eager load related models (see the chunked variant after the next snippet):
TimeLog::with(['board', 'task'])
    ->where([['board_id', '>', 0], ['task_id', '>', 0]])
    ->get();
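A minimal sketch of that combination (same models and conditions as above), eager loading the relations while still processing in chunks:
TimeLog::with(['board', 'task'])
    ->where([['board_id', '>', 0], ['task_id', '>', 0]])
    ->chunk(1000, function ($logs) {
        foreach ($logs as $log) {
            // $log->board and $log->task are already loaded here,
            // so no extra query is fired per row.
        }
    });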
You can simply use a transformer.
With a transformer, you can load related models in an elegant, clean and more controlled way, even if the data set is huge, and one great benefit is that you can transform the result without having to worry about how to loop over it.
You can simply refer to this answer for a simple use of it. However, in case you don't need to transform your response, you can take the other options.
Although this might not entirely solve the problem, the main issue you face is memory management, so the above methods should be useful.
--Second question--
Based on the Laravel API (here) you can see that:
It simply returns the underlying query builder instance. From my observation, it is not needed in your example.
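To illustrate the difference, a minimal sketch (using the question's TimeLog model; the builder classes named in the comments are the standard framework ones):
$eloquent = TimeLog::query();      // Illuminate\Database\Eloquent\Builder
$base     = $eloquent->getQuery(); // Illuminate\Database\Query\Builder (the underlying builder)

// Both can run the query, but the base builder returns plain stdClass rows
// instead of hydrated TimeLog models.
$rows = $base->get();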
UPDATE
For question 1, since it seems you simply want to return the result as a response, it is honestly more efficient to paginate it. Laravel offers pagination; the easiest option is simplePaginate(), which is good. The only thing is that it makes a few more queries on the database, but it keeps track of the last index; I guess it uses a cursor as well, but I'm not sure. Finally, this might be more ideal:
return TimeLog::paginate(1000);
I have faced a similar problem. The main issue here is that Eloquent is really slow at massive tasks because it fetches all the results at the same time, so the short answer would be to fetch the rows one by one using PDO fetch.
Short example:
$db = DB::connection()->getPdo();

$query_sql = TimeLog::join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
    ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
    ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
    ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title', 'oc_time_logs.id', 'oc_time_logs.time_used_sec', 'oc_users.id AS user_id')
    ->toSql();

$query = $db->prepare($query_sql);
$query->execute();

$logs = array();
while ($log = $query->fetch()) {
    $log_filled = new TimeLog();
    // fill your model and push it into an array to parse it to JSON later
    array_push($logs, $log_filled);
}

return response()->json($logs);

Laravel: Global query variable?

I have a query which I use all over my routes.php under almost every get request and also use the results in many of my views. It makes more sense at this point for me to call the query once and be able to use it globally without ever having to call it again.
Here's the query:
$followers = Follower::where('user_id', '1')
->get();
How can I do this?
Why not just execute the query once in an init function and store the result into a global variable?
global $followers;
$followers = Follower::where('user_id', '1')->get();
You can store it in the session every time the user logs in.
Like this example:
$followers = Follower::where('user_id', '1')->get();
Session::put('followers', $followers);
Whenever you want, you can access it like this:
$value = Session::get('followers');
The other answer using the session is a simple solution, but
I would suggest you use the Laravel Cache for this purpose (because this is the standard practice).
The Laravel Cache::remember method accepts three parameters:
key: make an md5 key of 'followers' and the user id
time: the time in minutes you want to cache the values (depending on how frequently your values change)
closure: a function which runs when no value is found for the key (this method will query once, in this case, and store the value in your cache)
Just do the following in your BaseController's constructor:
$id = 1; // User id
$key = md5('followers' . $id);
$minutes = 60; // cache for 1 hour, change it accordingly

$followers = Cache::remember($key, $minutes, function () use ($id) {
    return Follower::where('user_id', $id)->get();
});
Now, of course, to use the cache you need some cache driver like Redis.
If you don't know how to set it up, read my other answer.
Though it may be a slightly longer solution to your problem, and may take you 15-20 minutes to set up and run everything, believe me, once you start using the cache you will love it.

Propel saving tags

I have a Post model and I am inserting the tags for a post like the code below. When I am editing, some tags may be removed. So what is the right way to remove the tags and re-insert them?
$post->setTitle($data['title']);
$post->setBody($data['body']);
$post->setSlug($data['slug']);

$tags = explode(',', $data['tags']);
// Want to remove the tags
foreach ($tags as $tag) {
    $tagobj = TagQuery::create()->findOneByName($tag);
    if (! $tagobj) {
        $tagobj = new Tag();
        $tagobj->setName($tag);
        $tagobj->save();
    }
    $post->addTag($tagobj);
}
$post->save();
Can Propel insert these in a single query, or is this a bad approach?
I have asked the question in propel group, but :-( https://groups.google.com/d/msg/propel-users/x6PH_DwLtVE/H84o1cu4W4kJ
The full source code is here
The goal is to re-save the tags when one tag is removed or one tag is added. What should I do? That is the first priority; optimization is the second priority.
Update 2:
I modified the code to something like the one below, based on the reply I got:
$tags = explode(',', $data['tags']);

foreach ($tags as $tag) {
    $tagobj = TagQuery::create()->findOneByName($tag);
    if (! $tagobj) {
        $tagobj = new Tag();
        $tagobj->setName($tag);
        $tagobj->save();
    }
}

// var_dump($tags);
$tagcollection = TagQuery::create()->findByName($tags);
// var_dump($tagcollection);
// exit;
$post->setTags($tagcollection);
Now I am getting an array to string conversion error:
Notice: Array to string conversion in /var/www/harisample/vendor/propel/propel/src/Propel/Runtime/Connection/StatementWrapper.php on line 171
Call Stack:
0.0001 131940 {main}() /var/www/harisample/web/index.php:0
0.0243 1259056 Aura\Framework\Bootstrap\Web->exec() /var/www/harisample/web/index.php:13
0.0243 1259108 Aura\Framework\Web\Controller\Front->exec() /var/www/harisample/package/Aura.Framework/src/Aura/Framework/Bootstrap/Web.php:71
0.0243 1259436 Aura\Framework\Web\Controller\Front->request() /var/www/harisample/package/Aura.Framework/src/Aura/Framework/Web/Controller/Front.php:168
0.0314 1694584 Aura\Web\Controller\AbstractPage->exec() /var/www/harisample/package/Aura.Framework/src/Aura/Framework/Web/Controller/Front.php:222
0.0316 1699500 Aura\Web\Controller\AbstractPage->action() /var/www/harisample/package/Aura.Web/src/Aura/Web/Controller/AbstractPage.php:168
0.0316 1699576 Aura\Web\Controller\AbstractPage->invokeMethod() /var/www/harisample/package/Aura.Web/src/Aura/Web/Controller/AbstractPage.php:206
0.0316 1699960 ReflectionMethod->invokeArgs() /var/www/harisample/package/Aura.Web/src/Aura/Web/Controller/AbstractPage.php:231
0.0316 1699976 Hari\Sample\Web\Post\Page->actionEdit() /var/www/harisample/package/Aura.Web/src/Aura/Web/Controller/AbstractPage.php:231
0.0856 5802116 1 Hari\Sample\Model\Base\Post->save() /var/www/harisample/package/Hari.Sample/src/Hari/Sample/Web/Post/Page.php:127
0.0874 5808356 1 Hari\Sample\Model\Base\Post->doSave() /var/www/harisample/package/Hari.Sample/src/Hari/Sample/Model/Base/Post.php:930
0.0881 5813420 1 Hari\Sample\Model\Base\PostTagQuery->delete() /var/www/harisample/package/Hari.Sample/src/Hari/Sample/Model/Base/Post.php:1000
0.0881 5813700 1 Propel\Runtime\ActiveQuery\ModelCriteria->delete() /var/www/harisample/package/Hari.Sample/src/Hari/Sample/Model/Base/PostTagQuery.php:557
0.0881 5814628 1 Propel\Runtime\ActiveQuery\Criteria->doDelete() /var/www/harisample/vendor/propel/propel/src/Propel/Runtime/ActiveQuery/ModelCriteria.php:1324
0.0883 5817716 1 Propel\Runtime\Connection\StatementWrapper->execute() /var/www/harisample/vendor/propel/propel/src/Propel/Runtime/ActiveQuery/Criteria.php:2408
0.0883 5817772 1 PDOStatement->execute() /var/www/harisample/vendor/propel/propel/src/Propel/Runtime/Connection/StatementWrapper.php:171
Thanks
I guess your goal is to optimize the number of (MySQL?) requests, isn't it?
I think this is not possible, mainly because Propel object saving relies on other steps - think of preSave(), postSave(), and behaviors too - that need a separate save query for every object.
By trying to make one optimized query, you would lose the benefit of the Propel saving workflow and relation management.
On the other hand, I am not sure how clearTags() really works; I think it just removes object references but does not delete records in the database.
You must have in your BasePost.php file a setTags() method that will actually replace any previous relation with the new object collection you provide.
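For illustration, a minimal sketch of that replace-and-save flow, assuming the Propel 2 generated setTags() and filterByName() methods and that $tagsArray holds the exploded tag names from the form (as in the question's code):
use Propel\Runtime\ActiveQuery\Criteria;

// Fetch the Tag objects that should remain attached to the post.
$tagCollection = TagQuery::create()
    ->filterByName($tagsArray, Criteria::IN)
    ->find();

// setTags() replaces the post's previous tag relations with this collection;
// the obsolete cross-reference rows are rewritten when the post is saved.
$post->setTags($tagCollection);
$post->save();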
I'm doing something similar, but something is still not right. Maybe you can try it; maybe it will work on your project.
$tagNames = $tags->getTags();
$tagsArray = explode(',', $tagNames);

$postTagToDelete = PostTagQuery::create()->filterByPostId($post->getId())->find();
if ($postTagToDelete) {
    $postTagToDelete->delete();
}

foreach ($tagsArray as $tagName) {
    $tag = TagQuery::create()->filterByName($tagName)->findOne();
    // When I find an existing tag, there is no need to create another one;
    // I just simply add it (it's not working here).
    if ($tag != null) {
        $post->addTag($tag);
    } else {
        // when the tag is new
        $tag = new Tag();
        $tag->setName($tagName);
        $post->addTag($tag);
    }
}

$post->save();
See my problem here.
To be more specific, let's pretend that I have four elements in $tagsArray.
[first, second, third, fourth]
Every one of them IS in the database already, so it is going to enter the first if four times.
The problem is that only second, third and fourth will be saved. There will be no first. Why?
Another example: if I have the array [first] and do the same (first is in the database already), it only gets saved every second time. So I get something like: in database, database empty, in database, database empty, [...] on every request attempt.

Does this cause a MongoDB performance issue (when doing the `limit` on the client-side by 'breaking' the `cursor`)?

Though this has nothing to do with PHP specifically, I use PHP in the following examples.
Let's say this is the 'normal' way of limiting results.
$db->users->find()->limit(10);
This is probably the fastest way, but there are some restrictions here... In the following example, I'll filter out all rows that have the same value for a certain column as the previous row:
$cursor = $db->users->find();
$prev = null;
$results = array();

foreach ($cursor as $row) {
    if ($row['coll'] != $prev['coll']) {
        $results[] = $row;
        $prev = $row;
    }
}
But you still want to limit the results to 10, of course. So you could use the following:
$cursor = $db->users->find();
$prev = null;
$results = array();

foreach ($cursor as $row) {
    if ($row['coll'] != $prev['coll']) {
        $results[] = $row;
        if (count($results) == 10) break;
        $prev = $row;
    }
}
Explanation: since the $cursor does not actually load the results from the database, breaking the foreach-loop will limit it just as the limit(...)-function does.
Just to be sure, does this really work as I'm saying, or are there any performance issues I'm not aware of?
Thank you very much,
Tim
Explanation: since the $cursor does not actually load the results from the database, breaking the foreach-loop will limit it just as the limit(...)-function does.
This is not 100% true.
When you do the foreach, you're basically issuing a series of hasNext / getNext that is looping through the data.
However, underneath this layer, the driver is actually requesting and receiving batches of results. When you do a getNext the driver will seamlessly fetch the next batch for you.
You can control the batch size. The details in the documentation should help clarify what's happening.
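For instance, with the legacy Mongo driver used in the examples above (a minimal sketch; the batch size value is arbitrary), the batch size can be set on the cursor before iterating:
$cursor = $db->users->find();
$cursor->batchSize(100); // the driver fetches documents from the server 100 at a time

foreach ($cursor as $row) {
    // ...
}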
In your second example, if you get to 10 and then break there are two side effects:
The cursor remains open on the server (times out in 10 minutes, generally not a big impact).
You may have more data cached in $cursor. This cache will go away when $cursor goes out of scope.
In most cases, these side effects are "not a big deal". But if you're doing lots of this processing in a single process, you'll want to "clean up" to avoid having cursors hanging around.
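A sketch of that cleanup under the same legacy-driver assumption: either tell the server about the limit up front so it can close the cursor itself, or drop the client-side cursor once you have broken out of the loop.
// Preferred when the limit is known in advance:
$cursor = $db->users->find()->limit(10);

// Otherwise, release the cursor (and its cached batch) explicitly after breaking:
unset($cursor);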

Resources