Can I optimize this script updating ~6000 rows with a lot of data - Laravel

I have ~5-6k $items that I need to update in the database. Each item needs an HTTP request to get its data from the page. Each HTTP GET response contains a massive array (~500-2,500 rows), and I need to insert only the rows that are not already in the database. It seems to take a lot of time with my current script (one item every 2-4 minutes) on my Vagrant Scotch Box.
Simplified example:
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use App\Http\Requests;
use GuzzleHttp\Client;
use App\Item;
use App\ItemHistory;
use Carbon\Carbon;
use DB;

class UpdateController extends Controller
{
    public function getStart() {
        // Don't cancel the script
        ignore_user_abort(true);
        set_time_limit(0);

        $client = new Client();
        $items = Item::where('updated_at', '<=', Carbon::now()->subDay())->get();

        foreach ($items as $item) {
            $response = $client->request('GET', 'API_URL');
            // get the body as a string
            $body = $response->getBody()->getContents();
            // Pseudocode: in reality I use regex to extract the "history"
            // array (100 to 5,000 rows) from the body string
            $hugeArray = $body['history'];
            $arrayCollection = collect($hugeArray);

            // I take the last 100 since each row = 1 hour, so I get items from the last 100 hours
            foreach ($arrayCollection->take(-100) as $row) {
                $date = new \DateTime($row['created_at']);

                // Checking if it already exists
                if ( ! ItemHistory::whereItemId($item->id)->whereSoldAt($date)->count()) {
                    // I insert the new rows..
                    $history = new ItemHistory;
                    // ....
                    $history->save();
                }
            }
        }
    }
}
I actually crawl the data and use regex to find the arrays in the body response.
Am I doing something wrong? It takes quite a while before it moves on to the next $item.

I can provide a simplified answer covering three issues: synchronous execution, object hydration, and bulk database queries.
Consider the following example:
$requests = function () use ($items) {
    foreach ($items as $item) {
        // $uri is a placeholder for the API URL built from each $item
        yield new GuzzleHttp\Psr7\Request('GET', $uri);
    }
};

$client = new GuzzleHttp\Client();
$promises = [];

foreach ($requests() as $request) {
    $promises[] = $client->sendAsync($request)
        ->then(
            function (Psr\Http\Message\ResponseInterface $response) {
                // process the response into an array
                return $arrayFromResponse;
            })
        ->then(
            function ($unfilteredArray) {
                // filter the array as necessary
                return $filteredArray;
            })
        ->then(
            function ($filteredArray) {
                // create the array for bulk insert / update
                return $sqlArray;
            })
        ->then(
            function ($sqlArray) {
                // perform bulk db operations
            }
        );
}

// Promises only execute when waited on; settle() waits for all of them,
// whether they fulfill or reject
GuzzleHttp\Promise\Utils::settle($promises)->wait();
Synchronous Http queries - The above example highlights some of Guzzle's asynchronous capabilities while breaking out the processing steps. The code you posted above is synchronous: perform a request, wait for a response, process the response, rinse and repeat. Asynchronous HTTP requests ensure that data is being downloaded while other information is being processed. I should note that your results will vary and, depending on your particular use case, you may see increased resource usage.
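If you also want to bound how many requests are in flight at once, Guzzle ships a Pool for exactly this; a rough sketch reusing the $requests generator above (the handler bodies are placeholders, not your actual processing):
use GuzzleHttp\Pool;

$pool = new Pool($client, $requests(), [
    'concurrency' => 10, // at most 10 requests in flight at a time
    'fulfilled' => function ($response, $index) {
        // process the response, filter, queue rows for a bulk insert
    },
    'rejected' => function ($reason, $index) {
        // log the failed request
    },
]);

// Start the transfers and wait for the pool to drain
$pool->promise()->wait();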
Object Hydration - aka what your ORM does when you perform a query and it returns an object instance (rather than an array) - is time consuming and memory intensive. Ocramius (one of Doctrine's developers) wrote a fairly technical article on the subject. While it is not Eloquent specific, it does provide insight into the operations that go on behind the scenes for all ORMs. The code snippet performs many of these (reference $itemHistory, $history, Item::where).
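To see the difference in your case, compare the hydrated existence check with querying the table directly (the table and column names below are assumed from your models, so adjust as needed):
// Eloquent: every matched row is hydrated into a full ItemHistory model
$count = ItemHistory::whereItemId($item->id)->whereSoldAt($date)->count();

// Query builder: plain result, no model hydration
$exists = DB::table('item_histories')
    ->where('item_id', $item->id)
    ->where('sold_at', $date)
    ->exists();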
Bulk Database Operations - it is a widely known fact that database round trips are slow, and that cost is further increased when coupled with object hydration. It is much better to perform a single insert of 1,000 records than 1,000 single-row inserts. To do this, the code will have to be modified from using the ORM to using the DB facade directly. Bulk inserts can be performed with DB::table('itemHistory')->insert($arrayOfValues), as seen in the docs.
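Combined with the previous point, the per-row check-and-save in your loop could become one read and one write per item; a sketch under the same assumed table/column names (it also assumes the timestamps compare as equal strings):
// Fetch the timestamps already stored for this item in a single query
$existing = DB::table('item_histories')
    ->where('item_id', $item->id)
    ->pluck('sold_at')
    ->all();

// Build rows only for timestamps not yet stored
$rows = [];
foreach ($arrayCollection->take(-100) as $row) {
    if (! in_array($row['created_at'], $existing)) {
        $rows[] = [
            'item_id' => $item->id,
            'sold_at' => $row['created_at'],
            // ... other columns
        ];
    }
}

// One bulk insert instead of up to 100 single-row saves
if ($rows) {
    DB::table('item_histories')->insert($rows);
}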
Update: Although not shown above, then() has a method signature of then(callable $onFulfilled, callable $onRejected). If something goes awry with the request you could do something like:
// promise returned from a request
$p->then(
    function (Psr\Http\Message\ResponseInterface $response) use ($p) {
        if ($response->getStatusCode() >= 400) {
            $p->cancel();
        }
        // perform processing
        return $someArray;
    },
    function (GuzzleHttp\Exception\RequestException $e) {
        echo $e->getMessage() . "\n";
        echo $e->getRequest()->getMethod();
    })
    ->then(
        function ($someArray) use ($p) {
            // filter or other processing
        });
Additional information on Guzzle's promises can be found in the GitHub repo.

Related

Get number of returned rows from a query using DB::listen()

I'm adding some database logging to a Laravel (5.8) application and I have registered a DB::listen callback, but it seems I'm fairly limited to the data the $query object has populated.
It does have the time taken to execute and the statement, so it must be logged after the query is run; it would therefore make sense for it to be possible to return the number of rows impacted/returned.
I've configured a custom channel for the DB logs, and only enabled them when a config value is set.
My implementation looks like the below.
if (config('app.sql_profiler')) {
    DB::listen(function ($query) {
        Log::channel('db')->debug(
            $query->sql,
            [$query->bindings, $query->time]
        );
    });
}
I would like to extend it to look like
if (config('app.sql_profiler')) {
    DB::listen(function ($query) {
        Log::channel('db')->debug(
            $query->sql,
            [
                $query->bindings,
                $query->time,
                // add $query->resultCount.
            ]
        );
    });
}
Any suggestions as to where to begin looking would be very helpful.

Update Laravel model from external API

I have a Coin model with id,name,price.
In a function, I extract all the coins and create a comma separated string with all the ids:
$coins = Coin::all();
$coinsIds = $coins->pluck('id')->toArray();
$coinsIdsString = implode(',', $coinsIds);
After that I make a call to an external API:
$url = 'https://myapi.com?ids=' . $coinsIdsString;
$response = Http::get($url)->json();
$response value is an array of coins, something like:
[
    {
        "id": "1",
        "name": "A",
        "price": "1.2"
    },
    ...
]
What would be the best way to update and save my Coin model with the price value from API?
Unfortunately, you're not going to be able to do anything other than update a single record at a time. That is, loop through the results array and perform a database update on each record. My recommendation is:
$results = ... // Result of API call

foreach ($results as $result) {
    DB::table('coins')
        ->where('id', $result['id'])
        ->update(['price' => $result['price']]);
}
I would then create a scheduled command to perform the update periodically, since it is likely to be resource intensive depending on the volume of calls.
https://laravel.com/docs/8.x/scheduling#scheduling-artisan-commands
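If it helps, a minimal sketch of the scheduling side, assuming a hypothetical coins:update Artisan command that wraps the loop above:
// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // Refresh prices from the external API once an hour
    $schedule->command('coins:update')->hourly();
}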

How to utilize Laravel Cache in API?

In my company we have three user roles: admin, physician and client. All of them can view one of the records tables, which has about 1 million rows, and we need to cache the results from the database.
I've read dozens of posts on Stack Overflow and elsewhere, but I am still trying to figure out the proper way to cache.
What I've read is that the proper way is to cache per page, so I cache page 1, page 2, etc. based on the user's page selection. This all works fine.
BUT each user role sees a different dataset with different filters selected by them, and this is where the problem starts. I cache the results, and then filtering the paginated 10 rows seems kind of redundant.
I don't know if I should cache results for each user role with the selected parameters?
Or should I cache all the results first, then load the needed relationships and filter the collection with the parameters from user and then create pagination?
Or shouldn't I be using cache at all in this example and just use simple pagination?
// Set the cache time
$time_in_minutes = 5 * 60;

// Requested page; if not set then the default page is 1
$page = $paginationObject['page'];

// Set items per page
$per_page = $paginationObject['perpage'] ? $paginationObject['perpage'] : 10;

// Set the cache key based on country
$cache_key = "l04ax_pct_dispensing_forms_{$request->get('country')}_page_{$page}_per_page_$per_page";
// Cache::forget($cache_key);

// Set base query for results
$baseQuery = $this->model->with(['details', 'patient']);

// Assign appropriate relations based on user role
if (Auth::user()->isPhysician()) {
    $baseQuery->physicianData();
} else if (Auth::user()->isManufacturer()) {
    $baseQuery->manufacturerData();
} else if (Auth::user()->isSuperAdmin() || Auth::user()->isAdmin()) {
    $baseQuery->adminData();
}

//--------------------------------------
// Add filtering params from request
//--------------------------------------
$baseQuery->when($request->has('atc_code'), function ($query) use ($request) {
    if ($request->get('atc_code') === NULL) {
        throw new RequestParameterEmpty('atc_code');
    }
    $query->whereHas('details', function ($subQuery) use ($request) {
        $subQuery->where('atc_code', $request['atc_code']);
    });
})
->when($request->has('id'), function ($query) use ($request) {
    if ($request->get('id') === NULL) {
        throw new RequestParameterEmpty('id');
    }
    $query->where('l04ax_dispensing_forms.id', $request['id']);
})
->when($request->has('pct_patients_hematology_id'), function ($query) use ($request) {
    if ($request->get('patient_id') === NULL) {
        throw new RequestParameterEmpty('patient_id');
    }
    $query->where('patient_id', $request['patient_id']);
})
->when($request->has('physician_id'), function ($query) use ($request) {
    if ($request->get('physician_id') === NULL) {
        throw new RequestParameterEmpty('physician_id');
    }
    $query->where('physician_id', $request['physician_id']);
})
->when($request->has('date'), function ($query) use ($request) {
    if ($request->get('date') === NULL) {
        throw new RequestParameterEmpty('date');
    }
    $query->whereDate('created_at', Carbon::parse($request->get('date'))->toDateString());
})
->when($request->has('deleted'), function ($query) use ($request) {
    if ($request->get('only_deleted') === NULL) {
        throw new RequestParameterEmpty('only_deleted');
    }
    $query->onlyTrashed();
})
->when($request->has('withTrashed'), function ($query) use ($request) {
    if ($request->get('withTrashed') === NULL) {
        throw new RequestParameterEmpty('withTrashed');
    }
    $query->withTrashed();
});

// Remember results per page in the cache
return Cache::remember($cache_key, $time_in_minutes, function () use ($baseQuery, $per_page, $page) {
    return new L0axPctDispensingFormsCollection($baseQuery->paginate($per_page, ['*'], 'page', $page));
});
In this example the results are cached per page, but when different user logs in, then the results are wrong.
What would be the best way to approach this?
I wouldn't recommend caching this because of the problem you have already encountered. Caching is massively helpful in some areas (e.g. for reference data like a persistent list of countries or currencies), but I would avoid it for user-specific data.
If you really did want to cache, you could use cache tagging (supported by the Redis and Memcached cache drivers, but not the file or database drivers) to tag by user id. However, as mentioned, I wouldn't recommend it in this scenario!
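For completeness, a tagged-cache sketch (the tag and key names here are illustrative, not taken from your code):
// Store the page under a per-user tag so it can be invalidated per user
$result = Cache::tags(['dispensing_forms', 'user:' . $user->id])
    ->remember($cache_key, $time_in_minutes, function () use ($baseQuery, $per_page, $page) {
        return $baseQuery->paginate($per_page, ['*'], 'page', $page);
    });

// Later, drop everything cached for that user in one go
Cache::tags('user:' . $user->id)->flush();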
If your desire to cache is driven by slow page loads, I would recommend installing Laravel Debugbar and checking how many queries your API calls generate.
If you find that a single API call generates more queries than the number of records you are loading, then you likely have the 'N+1 problem' and need to eager load any nested relationships rather than call them in your resource (see the sketch below).
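A minimal sketch of the difference, using an illustrative Post/comments relationship rather than your actual models:
// N+1: one query for the posts, then one more query per post
$posts = Post::all();
foreach ($posts as $post) {
    $counts[] = $post->comments->count();
}

// Eager loaded: two queries in total, regardless of the number of posts
$posts = Post::with('comments')->get();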
P.S. You can immediately reduce the number of queries generated by this controller method by only calling Auth::user() once, e.g. $user = Auth::user() and then $user->isSuperAdmin().

Accessing parameters in Request

I have a question about obtaining parameters from Request object.
What is the difference between
$name = $request->name;
OR
$name = $request->input("name");
They show the same behavior. I'm asking because, from a typing perspective, the first method is faster to write. But I don't know the difference. Is the first prone to SQL injection?
Basically, the first case is just syntactic sugar for the second. In Laravel, Request implements the __get magic method to access its internal properties:
public function all()
{
    return array_replace_recursive($this->input(), $this->allFiles());
}

public function __get($key)
{
    $all = $this->all();

    if (array_key_exists($key, $all)) {
        return $all[$key];
    } else {
        return $this->route($key);
    }
}
So in the first case, if any files were uploaded, Laravel first looks for the property amongst them; and if there is no such param in files or in input, it also looks for a value amongst the route parameters.
As for SQL injection: to protect your code you have to use prepared statements, the query builder, or the ORM. You should not escape or otherwise change the input, and neither of these accessors does, so neither of them protects you against SQL injection on its own.
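For reference, the protection comes from parameter binding; a minimal sketch:
// Safe: the value travels as a bound parameter, never interpolated into the SQL
$users = DB::select('select * from users where name = ?', [$request->input('name')]);

// Equally safe: the query builder binds values for you
$users = DB::table('users')->where('name', $request->input('name'))->get();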

Caching Eloquent models in Laravel 5.1

I've created an API using Laravel and I'm trying to find out how to cache Eloquent models. Let's take this example as one of the API endpoints, /posts, to get all the posts. Within the method there are various filter options, such as category and search, and there is also an option to expand the user.
public function index()
{
    $posts = Post::active()->ordered();

    if (Input::get('category')) $posts = $posts->category(Input::get('category'));
    if (Input::get('search')) $posts = $posts->search(Input::get('search'));
    if ($this->isExpand('user')) $posts = $posts->with('user');

    $posts = $posts->paginate($this->limit);

    return $this->respondWithCollection($this->postTransformer->transformCollection($posts->all()), $posts);
}
I have been reading up and found in Laravel 4 you could cache a model like this
return Post::remember($minutes);
But I see this has been removed in Laravel 5.1, and now you have to cache using the Cache facade, where an entry is only retrievable by a single key string.
$posts = Cache::remember('posts', $minutes, function () {
    return Post::paginate($this->limit);
});
As you can see, my controller method contains different options, so for the cache to be effective I would have to create a unique key for each combination of options, like posts_category_5, posts_search_search_term, posts_category_5_search_search_term_page_5, and this will clearly get ridiculous.
So either I'm not coming across the right way to do this or the Laravel cache appears to have gone backwards. What's the best solution for caching this API call?
As the search is arbitrary, using a key based on the search options appears to be the only option here. I certainly don't see it as "ridiculous" to add a cache for expensive DB search queries. I may be wrong, as I came by this post looking for a solution to your exact problem. My code:
$itemId = 1;
$platform = Input::get('platform'); // (android|ios|web)
$cacheKey = 'item:' . $itemId . ':' . $platform;
$item = Item::find($itemId);

// Tagged entries must be read back through the same tags they were stored with
if (Cache::tags('items')->has($cacheKey)) {
    $result = Cache::tags('items')->get($cacheKey);
} else {
    $result = $this->response->collection($item, new ItemTransformer($platform));
    Cache::tags('items')->put($cacheKey, $result, 60); // Tagged to be able to clear the lot in one go
}

return $result;
I realise that my example has less complexity but it seems to cover all the bases for me. I then use an observer to clear the cache on update.
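A minimal sketch of that observer wiring, with hypothetical class names (the flush-on-update is the point, not the names):
use Illuminate\Support\Facades\Cache;

class ItemObserver
{
    // Invalidate every tagged entry whenever an item changes
    public function updated(Item $item)
    {
        Cache::tags('items')->flush();
    }
}

// Registered e.g. in a service provider's boot() method:
Item::observe(ItemObserver::class);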
