I have a file with over 30,000 records and another with 41,000. Is there a best practice for seeding these using Laravel 4's db:seed command? A way to make the inserts faster?
Thanks for the help.
Don't be afraid, a 40K-row table is a fairly small one. I have a 1 million row table and the seed went smoothly; I just had to add this before doing it:
DB::disableQueryLog();
Before disabling it, Laravel blew through my PHP memory limit no matter how much I gave it.
I read the data from .txt files using fgets(), building the array programmatically and executing:
DB::table($table)->insert($row);
one row at a time, which may be particularly slow.
My database server is PostgreSQL and the inserts took around 1.5 hours to complete, maybe because I was using a VM with little memory. I will run a benchmark on a better machine one of these days.
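For reference, a minimal sketch of the read-and-insert loop described above (the file name, delimiter, table and column names are placeholder assumptions):
$handle = fopen(base_path('seed_data.txt'), 'r'); // hypothetical file name

while (($line = fgets($handle)) !== false) {
    // assume tab-separated values; adjust the delimiter to your file format
    list($col1, $col2) = explode("\t", trim($line));

    DB::table('my_table')->insert([
        'col1' => $col1,
        'col2' => $col2,
    ]);
}

fclose($handle);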
2018 Update
I ran into the same issue, and after 2 days of headaches I could finally write a script that seeds 42K entries in less than 30 seconds!
How, you ask?
1st Method
This method assumes that you already have a database with some entries in it (in my case 42K entries) and you want to import them into another database. Export your database as a CSV file with header names, put the file into the public folder of your project, and then you can parse the file and insert all the entries one by one into the new database via a seeder.
So your seeder will look something like this:
<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class {TableName}TableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        $row = 1;

        if (($handle = fopen(base_path("public/name_of_your_csv_import.csv"), "r")) !== false) {
            while (($data = fgetcsv($handle, 0, ",")) !== false) {
                if ($row === 1) {
                    $row++;
                    continue; // skip the header row
                }
                $row++;

                $dbData = [
                    'col1' => $data[0],
                    'col2' => $data[1],
                    'col3' => $data[2],
                    // ...and so on, for however many columns you have
                ];

                $colNames = array_keys($dbData);
                // Use ? placeholders and pass the values as bindings instead of concatenating them into the SQL
                $placeholders = implode(',', array_fill(0, count($dbData), '?'));
                $createQuery = 'INSERT INTO locations ('.implode(',', $colNames).') VALUES ('.$placeholders.')';

                DB::statement($createQuery, array_values($dbData));
                $this->command->info($row);
            }

            fclose($handle);
        }
    }
}
Simple and Easy :)
2nd Method
If you can modify your PHP settings and allocate a large amount of memory to a particular script, this method will work as well.
Well, basically you need to focus on three major steps:
Allocate more memory to the script
Turn off the query logger
Divide your data into chunks of 1,000
Then iterate through the data and use insert() to write one 1K chunk at a time.
So if I combine all of the above mentioned steps in a seeder, your seeder will look something like this:
<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class {TableName}TableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        ini_set('memory_limit', '512M'); // allocate more memory to this script
        DB::disableQueryLog();           // disable the query log

        // create chunks: an array of chunks, each chunk holding up to 1,000 rows
        $data = [
            [
                ['col1' => 1, 'col2' => 1, 'col3' => 1, 'col4' => 1, 'col5' => 1],
                ['col1' => 1, 'col2' => 1, 'col3' => 1, 'col4' => 1, 'col5' => 1],
                // ...and so on, until 1,000 entries
            ],
            [
                ['col1' => 1, 'col2' => 1, 'col3' => 1, 'col4' => 1, 'col5' => 1],
                ['col1' => 1, 'col2' => 1, 'col3' => 1, 'col4' => 1, 'col5' => 1],
                // ...and so on, until 1,000 entries
            ],
            // ...and so on, for as many entries as you have (I had 42,000)
        ];

        // iterate and insert one chunk per query
        foreach ($data as $key => $d) {
            DB::table('locations')->insert($d);
            // gives you an idea where your iterator is on the command line;
            // best feeling in the world to see it rising, if you ask me :D
            $this->command->info($key);
        }
    }
}
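If your rows come from a CSV or another flat array rather than being written out by hand, array_chunk() can build the 1,000-row groups for you; a rough sketch (here $allRows is a hypothetical flat array of row arrays built elsewhere):
// Hypothetical: $allRows is a flat array of row arrays, e.g. parsed from a CSV
foreach (array_chunk($allRows, 1000) as $key => $chunk) {
    DB::table('locations')->insert($chunk);
    $this->command->info($key);
}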
and VOILA you are good to go :)
I hope it helps
I was migrating from a different database and I had to use raw SQL (loaded from an external file) with bulk insert statements (I exported the structure via Navicat, which has the option to break up your insert statements every 250KiB). E.g.:
$sqlStatements = array(
"INSERT INTO `users` (`name`, `email`)
VALUES
('John Doe','john.doe@gmail.com'),.....
('Jane Doe','jane.doe@gmail.com')",
"INSERT INTO `users` (`name`, `email`)
VALUES
('John Doe2','john.doe2@gmail.com'),.....
('Jane Doe2','jane.doe2@gmail.com')"
);
I then looped through the insert statements and executed using
DB::statement($sql).
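In loop form, that would look something like:
foreach ($sqlStatements as $sql) {
    DB::statement($sql);
}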
I couldn't get insert to work one row at a time. I'm sure there are better alternatives, but this at least worked while letting me keep it within Laravel's migration/seeding workflow.
I had the same problem today. Disabling the query log wasn't enough; it looks like an event also gets fired.
DB::disableQueryLog();
// DO INSERTS
// Reset events to free up memory.
DB::setEventDispatcher(new Illuminate\Events\Dispatcher());
I'm trying to convert our database from IDs to UUIDs. When I run the following code to update the database, it skips random rows.
AppUser::select('id')->orderBy('created_at')->chunk(1000, function ($appUsers) {
    foreach ($appUsers as $appUser) {
        $uuid = Str::orderedUuid();

        DB::table('files')
            ->where('fileable_type', AppUserInfo::class)
            ->where('fileable_id', $appUser->id)
            ->update(['fileable_id' => $uuid]);

        DB::table('app_users')->where('id', $appUser->id)->update(['id' => $uuid]);
    }
});
Last time I checked, ~290 were skipped out of 236,196 total.
I've tried to use chunkById, but the same thing happened.
The update function always returns true, so I must assume that Laravel thinks every row was updated when executed.
There's a big warning in the Laravel documentation on chunking:
When updating or deleting records inside the chunk callback, any changes to the primary key or foreign keys could affect the chunk query. This could potentially result in records not being included in the chunked results.
You'll need to find another way to update your keys in batches. I've used the technique described in an answer to this question: How to chunk results from a custom query in Laravel, when I could not use the callback required by the chunk method, although in that case it was not for an update query, only a select.
This is what I ended up doing:
$appUsers = AppUser::select('id')->get();
$chunkSize = 1000;
$numberOfChunks = ceil($appUsers->count() / $chunkSize);
$chunks = $appUsers->split($numberOfChunks);
foreach ($chunks as $chunk) {
    foreach ($chunk as $appUser) {
        $uuid = Str::orderedUuid();

        DB::table('files')
            ->where('fileable_type', AppUserInfo::class)
            ->where('fileable_id', $appUser->id)
            ->update(['fileable_id' => $uuid]);

        DB::table('app_users')->where('id', $appUser->id)->update(['id' => $uuid]);
    }
}
I'm starting several PHP workers at the same time and each of them takes a job to do. These jobs are written to a database table, and when a worker takes one it deletes the record. My code:
$job = Job::first();

if (!empty($job) and $job->delete() == true) {
    // do something
}
But the problem is that some workers still take the same $job to perform at the same time! How can this happen?
UPDATE
I'm using a Postgres database.
Despite the comments above asking for a better solution, you should be able to solve it this way:
$job = Job::first();
if ($job && Job::where('id', $job->id)->delete()) {
// do something else ...
}
Explanation: Job::where('id', $job->id)->delete() will delete all job records with the given id and return the number of affected records. This will be either 0 or 1, which evaluate to false and true respectively. So this should actually work, provided your database handles the concurrent delete properly.
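As a rough sketch, a worker loop could use that delete count as its claim check (the processing step is a placeholder):
while (true) {
    $job = Job::first();

    if (!$job) {
        break; // queue is empty
    }

    // Only the worker whose DELETE actually removed the row "wins" the job;
    // the others get 0 affected rows and simply move on to the next one.
    if (Job::where('id', $job->id)->delete()) {
        // process $job here
    }
}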
Very simple --- it is a race condition.
In order to avoid it you will need to implement some sort of database locking. You didn't indicate which database you were using, so I'm going to assume MySQL.
Your select statement needs to lock the row you've selected. In MySQL you do this with SELECT ... FOR UPDATE:
DB::beginTransaction();

// Lock the row so no other worker can grab the same job
$job = DB::selectOne('SELECT * FROM queue LIMIT 1 FOR UPDATE');

// Use the id if you got a row; if not, just commit immediately
if ($job) {
    DB::delete('DELETE FROM queue WHERE id = ?', [$job->id]);
}

DB::commit();
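If you prefer to stay with Eloquent rather than raw SQL, the query builder's lockForUpdate() adds the same FOR UPDATE clause; a rough sketch of the same pattern:
DB::transaction(function () {
    // lockForUpdate() appends FOR UPDATE, so concurrent workers block until the row is released
    $job = Job::lockForUpdate()->first();

    if ($job) {
        $job->delete();
        // do something with $job
    }
});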
We are developing an API with Lumen.
Today we had a confusing problem with getting the collection of our "TimeLog" model.
We just wanted to get all time logs with additional information from the board model and the task model.
Each row of the time log has a board_id and a task_id. It is a 1:1 relation on both.
This was our first code for getting the whole data. This took a lot of time and sometimes we got a timeout:
BillingController.php
public function byYear() {
    $timeLog = TimeLog::get();
    $resp = array();

    foreach ($timeLog->toArray() as $key => $value) {
        if ($timeLog[$key]->board_id > 0 && $timeLog[$key]->task_id > 0) {
            array_push($resp, array(
                'board_title' => isset($timeLog[$key]->board->title) ? $timeLog[$key]->board->title : null,
                'task_title'  => isset($timeLog[$key]->task->title) ? $timeLog[$key]->task->title : null,
                'id'          => $timeLog[$key]->id
            ));
        }
    }

    return response()->json($resp);
}
TimeLog.php, where the relations are defined:
public function board()
{
return $this->belongsTo('App\Board', 'board_id', 'id');
}
public function task()
{
return $this->belongsTo('App\Task', 'task_id', 'id');
}
Our new way is like this:
BillingController.php
public function byYear() {
    $timeLog = TimeLog::join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
        ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
        ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
        ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title', 'oc_time_logs.id', 'oc_time_logs.time_used_sec', 'oc_users.id AS user_id')
        ->getQuery()
        ->get();

    return response()->json($timeLog);
}
We deleted the relations in TimeLog.php because we don't need them anymore. Now we have a load time of about 1 second, which is fine!
There are about 20k entries in the time log table.
My questions are:
Why does the first method time out (what causes the timeout)?
What exactly does getQuery() do?
If you need more information just ask me.
--First Question--
One of the issues you might be facing is having that huge amount of data in memory, i.e.:
$timeLog = TimeLog::get();
This is already enormous. Then, when you try to convert the collection to an array:
There is a loop through the collection.
Using $timeLog->toArray() to initialize the loop is, to my understanding, not efficient (I might not be entirely correct about this, though).
Thousands of queries are made to retrieve the related models.
So what I would propose are five methods (one of which saves you from hundreds of queries), the last of which is efficient at returning the result in a customized form:
Since you have a lot of data, chunk the result (ref: Laravel chunk), so you have this instead:
TimeLog::chunk(1000, function ($logs) {
    foreach ($logs as $log) {
        // do the stuff here
    }
});
Another way is using cursor (it runs only one query where the conditions match); internally, cursor uses PHP Generators.
foreach (TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->cursor() as $timelog) {
//do the other stuffs here
}
This looks like the first option, but here you have already narrowed your query down to what you need:
TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->get()
Eager loading would already load the relationships you need up front, but it might lead to more data in memory too. So possibly the chunk method would make things easier to manage (even though you eager-load related models):
TimeLog::with(['board', 'task'])
    ->where([['board_id', '>', 0], ['task_id', '>', 0]])
    ->get();
You can simply use a transformer (a rough sketch follows below).
With a transformer, you can load related models in an elegant, clean and more controlled way even if the data set is huge, and one great benefit is that you can transform the result without having to worry about how to loop over it.
You can simply refer to this answer in order to see a simple use of it. However, in case you don't need to transform your response, you can take one of the other options.
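As a rough illustration only, assuming the league/fractal package (the class and field names are placeholders, not part of the original answer):
use League\Fractal\TransformerAbstract;

// Illustrative only: assumes league/fractal is installed
class TimeLogTransformer extends TransformerAbstract
{
    public function transform(TimeLog $timeLog)
    {
        // Shape each TimeLog row for the API response
        return [
            'id'          => $timeLog->id,
            'board_title' => $timeLog->board ? $timeLog->board->title : null,
            'task_title'  => $timeLog->task ? $timeLog->task->title : null,
        ];
    }
}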
Although this might not entirely solve the problem, since the main issue you face is memory management, the above methods should be useful.
--Second question--
Based on the Laravel API docs here, you can see that:
it simply returns the underlying query builder instance. From my observation, it is not needed in your example.
UPDATE
For question 1, since it seems you simply want to return the result as a response, it is honestly more efficient to paginate it. Laravel offers pagination; the easiest is simplePaginate(), which is good. The only downside is that it makes a few more queries on the database, but it keeps track of the last index; I guess it uses cursor as well, but I'm not sure. Finally, this might be more ideal:
return TimeLog::paginate(1000);
I have faced a similar problem. The main issue here is that Eloquent is really slow at massive tasks because it fetches all the results at the same time, so the short answer would be to fetch the rows one by one using PDO fetch.
Short example:
$db = DB::connection()->getPdo();

$query_sql = TimeLog::join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
    ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
    ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
    ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title', 'oc_time_logs.id', 'oc_time_logs.time_used_sec', 'oc_users.id AS user_id')
    ->toSql();

$query = $db->prepare($query_sql);
$query->execute();

$logs = array();
while ($log = $query->fetch()) {
    $log_filled = new TimeLog();
    // fill your model here and push it into the array, to turn it into JSON later
    array_push($logs, $log_filled);
}

return response()->json($logs);
return response()->json($logs);
My model is something like this:
namespace App;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\SoftDeletes;

class Photo extends Model {

    use SoftDeletes;

    protected $dates = ['deleted_at'];
}
I can soft delete using:
\App\Photo::find(1)->delete();
It does not work when I try to use soft delete on multiple rows:
\App\Photo::whereIn('id', [1,2,3])->delete();
Does any one know why?
No, you can't soft delete multiple rows.
The only Laravel way is the DB facade in this case.
Here is how I would soft delete multiple rows.
DB::table('table_name')->whereIn('id', [/* array of ids */])
    ->update(['deleted_at' => now()]);
or
ModelName::whereIn('id', [/* array of ids */])
    ->update(['deleted_at' => now()]);
Instead of whereIn you can use any where condition you would normally use and update the deleted_at key. Soft deleting is nothing but marking the column as deleted.
This is also a much more efficient solution than running a soft delete for each model inside a loop, which can crash the system if there are too many items in the array.
Hope this helps.
The soft delete functionality only works on an instance of the Eloquent model itself. When you are doing this:
\App\Photo::find(1)->delete();
You are actually first retrieving the Photo with an ID of 1 from the database, which is then prepared and made available as an instance of the Eloquent model (which can then soft delete).
However, when you do this:
\App\Photo::whereIn('id', [1,2,3])->delete();
You are not actually retrieving anything from the database, you are basically just preparing DELETE SQL in a more convenient way. This effectively runs something like:
DELETE FROM `photos` WHERE `id` IN (1,2,3);
This is different from something like:
foreach (\App\Photo::whereIn('id', [1,2,3])->get() as $photo) {
$photo->delete(); # $photo is an eloquent model and can soft-delete
}
Notice the ->get() which is actually grabbing data from the database first and will make it available as a collection of Eloquent models (which then can soft delete).
I don't think you can soft-delete a batch. In my foreach example using ->get() I imagine multiple queries are executed - something like:
UPDATE `photos` SET `deleted_at` = NOW() WHERE `id` = 1;
UPDATE `photos` SET `deleted_at` = NOW() WHERE `id` = 2;
UPDATE `photos` SET `deleted_at` = NOW() WHERE `id` = 3;
Hope that makes sense.
Try this as well:
\App\Photo::whereIn('id', [1, 2, 3])
    ->get()
    ->map(function ($photo) {
        $photo->delete();
    });
I currently parse a CSV file to insert data into a database, but the problem is that with 20,000 rows it takes a very long time. Is there a way to insert more rows at once using Laravel migrations?
This is what I am doing at the moment:
foreach ($towns as $town) {
    DB::table('town')->insert(
        array(
            // data goes here
        )
    );
}
I think maybe my question is a bit vague. I want to know the format for mass inserting multiple items in one query, and whether this will actually make a difference in speed.
You can mass insert by filling an array with your data:
foreach ($towns as $town) {
    $array[] = array( /* ... your data goes here ... */ );
}
And then run it just once
DB::table('town')->insert($array);
But I really don't know how much faster it will be. You can also disable the query log:
DB::disableQueryLog();
It uses less memory and is usually faster.
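One more hedged note: with tens of thousands of rows, a single insert() call can hit the database's packet or placeholder limits, so it may be safer to split the array with array_chunk() and insert one chunk at a time, for example:
// Assumes $array holds all the rows built in the loop above
foreach (array_chunk($array, 1000) as $chunk) {
    DB::table('town')->insert($chunk);
}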