I'm trying to export a large number of records from my database, but I need relationship data in order to build the export correctly. Ideally I would be able to use cursor() to get a Lazy Collection, but that won't load the relationships. I can't load the relationship within a loop, because that will create N+1 queries, and this could be hundreds of thousands of additional queries, which is unacceptable.
Here's what "works" (but runs out of memory):
Record::with('projects')->get()->map(function ($record) {
    dd($record); // Shows the `projects` relationship
});
But when I use cursor()...
Record::with('projects')->cursor()->map(function ($record) {
    dd($record); // Does NOT show the `projects` relationship
});
Is there a way to get a lazy collection that includes a record's relationship? I have looked in the documentation and it's not clear. Other suggestions have been to use chunk() which is unfortunately not a possibility in this situation.
EDIT: I shouldn't say chunk isn't a possibility, but it's a very expensive re-write. Currently, the data is structured with a lot of variability. So in order to construct the CSV for export, I need (for example) a header for the file. I currently grab that header by looping through all the records (the fields are stored in a JSONB field) and building out an array based on the fields present on those records.
I am also normalizing the data against those headers. So if one record has the field "address-1" but another record doesn't have that, the one that doesn't have it instead shows a blank value in the appropriate column. Otherwise, when inserting the row into the CSV, it doesn't respect the header.
These operations currently grab the entire data set and use a LazyCollection to map the header and normalize the records, and then feed it into the CSV one at a time. It would be ideal if I could grab relationships in a LazyCollection as well rather than having to rewrite the workflow.
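The header-building and normalization steps described above can be sketched in plain PHP as two passes over the records. This is only an illustration under assumptions: collectHeaders() and normalizeRows() are hypothetical names, and each record is assumed to be an associative array decoded from the JSONB field.

```php
<?php
// Hypothetical helpers; neither function is part of Laravel.

/** First pass: gather every field name that appears in any record. */
function collectHeaders(iterable $records): array
{
    $headers = [];
    foreach ($records as $fields) {
        foreach (array_keys($fields) as $name) {
            $headers[$name] = true; // keys act as a set and keep first-seen order
        }
    }
    return array_keys($headers);
}

/** Second pass: lazily yield each record padded to the full header list. */
function normalizeRows(iterable $records, array $headers): Generator
{
    $blank = array_fill_keys($headers, '');
    foreach ($records as $fields) {
        // Keep only known headers, in header order; missing fields become ''.
        yield array_values(array_replace($blank, array_intersect_key($fields, $blank)));
    }
}

// Made-up sample data for illustration.
$records = [
    ['name' => 'Ann', 'address-1' => '1 Main St'],
    ['name' => 'Bob', 'phone' => '555-0100'],
];

$headers = collectHeaders($records);
$rows = iterator_to_array(normalizeRows($records, $headers), false);
```

Because normalizeRows() is a generator, each padded row can be handed to fputcsv() as it is produced instead of being collected into an array first.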
According to the documentation, cursor() works at the database stage, whereas relationships are loaded only after a method such as get() or first() has returned its results.
So the code you run on a cursor operates on each database row, represented as a Model instance, before the overall result exists. The relationship is therefore never loaded; you are simply iterating through your database records one row at a time.
If you can't use chunk(), then I think you could manage the data in MySQL itself using raw expressions.
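One way to keep lazy, one-at-a-time iteration while still avoiding N+1 queries is to fetch records in chunks and load each chunk's relations in a single batched lookup. The following is a plain-PHP sketch of that idea, not Laravel code: the arrays stand in for database tables, and lazyWithProjects() and the $queryCount counter are hypothetical names used only for illustration.

```php
<?php
// In-memory stand-ins for two tables; names and data are assumptions.
$recordsTable = [
    ['id' => 1, 'name' => 'A'],
    ['id' => 2, 'name' => 'B'],
    ['id' => 3, 'name' => 'C'],
];
$projectsTable = [
    ['id' => 10, 'record_id' => 1, 'title' => 'P1'],
    ['id' => 11, 'record_id' => 1, 'title' => 'P2'],
    ['id' => 12, 'record_id' => 3, 'title' => 'P3'],
];

$queryCount = 0;

/**
 * Yield records one at a time with their projects attached.
 * Relations are looked up once per chunk, not once per record.
 */
function lazyWithProjects(array $records, array $projects, int $chunkSize, int &$queryCount): Generator
{
    foreach (array_chunk($records, $chunkSize) as $chunk) {
        $queryCount++; // stands in for one "SELECT ... FROM records" page query
        $ids = array_column($chunk, 'id');

        $queryCount++; // stands in for "SELECT ... FROM projects WHERE record_id IN (...)"
        $byRecord = [];
        foreach ($projects as $project) {
            if (in_array($project['record_id'], $ids, true)) {
                $byRecord[$project['record_id']][] = $project;
            }
        }

        foreach ($chunk as $record) {
            $record['projects'] = $byRecord[$record['id']] ?? [];
            yield $record;
        }
    }
}

$titles = [];
foreach (lazyWithProjects($recordsTable, $projectsTable, 2, $queryCount) as $record) {
    $titles[$record['name']] = array_column($record['projects'], 'title');
}
```

Here three records in chunks of two cost four simulated queries rather than one per record. Newer Laravel releases also provide a lazy() method on the query builder that chunks behind the scenes; whether it is available and suitable depends on your version.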
I found that Laravel's 'pluck' returns a plain array while 'select' returns an object. Can anyone explain whether there are any other differences between the two?
Thank you.
pluck is a Laravel Collections method used to extract certain values from a collection. You will often want to extract particular data from a collection, for example an Eloquent collection.
select, by contrast, is an ordinary selection of one or more specific columns.
With pluck you ask only for the fields you need, whereas a plain get() pulls every column. select also limits the columns; the difference lies in the shape of the result. pluck returns a flat list of the given column's values (or key/value pairs when you pass two arguments), while select returns an array (or object) in which each element holds one row.
$name = DB::table('users')->where('name', 'John')->pluck('name');
Select
EX: DB::table('users')->select('id', 'name', 'email')->get();
select is a query builder method that Laravel translates into the following SQL:
SELECT id, name, email FROM users
This selects the data from the columns you asked for and nothing else. It lets you make your request more efficient by asking only for the data you need. Take the example above and imagine that the user is a Facebook user, with a ton of data on it plus relations to other tables. If you just want to display the name, the email and a link to the user profile, this request fetches exactly that.
For more information about the expected response, visit: https://laravel.com/docs/9.x/queries#select-statements
Pluck
EX:
$users = DB::table('users')->where('roles', '=', 'admin')->get();
$emails = $users->pluck('email');
The pluck method here retrieves values from a collection that you have already fetched from the database and that is now a Laravel Collection. It lets you build a list of the plucked data, but it will not improve the performance of your request: in the example above, $users holds all the data of all the admin users.
Since it does not improve performance, what good is it then?
pluck is useful, for example, for separating some of the data into different variables depending on where it is needed. You might need the full user data for some things, but also want to display a quick list of all the emails together.
For more information about the pluck method, including how to create a keyed array from a second column, see the docs: https://laravel.com/docs/9.x/collections#method-pluck
The pluck function is normally used to pull a single column from the collection, or two columns as key/value pairs; the result is always a one-dimensional array.
select returns all of the columns you specified for an entity in a two-dimensional array, that is, an array of rows, each of which is an array of the selected values.
Note: this describes pluck on a collection, which runs after the data has been fetched (the query builder also has its own pluck(), which selects only the named column). select is a query builder function that builds the query to be executed on the database server.
In other words, select is used inside DB queries, where it can affect performance by limiting the columns pulled. pluck, however, is a Laravel Collection method, so you can use it after you have pulled the data from the DB.
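The difference in result shape described in the answers above can be shown with plain PHP arrays. This is only an analogy, not the Laravel API: array_intersect_key() stands in for a select of two columns and array_column() for pluck, and the sample rows are made up.

```php
<?php
// Rows as a database driver might return them; the data is invented for illustration.
$users = [
    ['id' => 1, 'name' => 'John', 'email' => 'john@example.com', 'bio' => '...'],
    ['id' => 2, 'name' => 'Jane', 'email' => 'jane@example.com', 'bio' => '...'],
];

// "select"-style result: one entry per row, each holding only the requested columns.
$wanted = array_flip(['id', 'name']);
$selected = array_map(fn (array $row) => array_intersect_key($row, $wanted), $users);

// "pluck"-style result: a flat list of one column's values...
$emails = array_column($users, 'email');

// ...or one column keyed by another.
$namesById = array_column($users, 'name', 'id');
```

$selected stays two-dimensional (an array of rows), while $emails and $namesById are one-dimensional.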
Context:
I have a data model in Power Pivot with three tables: tTasks, tCaseworks and tCaseworkStatus. I am attempting to create two calculated columns in tCaseworks which draw on the two data tables. All three tables are linked through the common field casework_id (see illustration below).
The data model is regularly updated with new data. The way I am doing this is as follows:
All three tables are sourced from three corresponding tables in my Excel workbook.
A VBA script deletes all records in the three Excel tables and then refreshes the data model (sidenote: because the data model demands lookup tables to not be empty the VBA code adds one row per table before refreshing).
New data is then added to the Excel tables and the data model is refreshed.
This process works perfectly.
Problem:
The problem arises when I am adding calculated columns to tCaseworks and then attempting to update the data as described above. I have added two calculated columns; has_task and status_now. I am using the following DAX code:
has_task:
has_task =
IF (
    CONTAINS (
        RELATEDTABLE ( tTasks );
        tTasks[casework_id]; tCaseworks[casework_id]
    );
    "Yes";
    "No"
)
status_now:
status_now =
VAR TableX = RELATEDTABLE(tCaseworkStatus)
VAR ResultX = IF(
    CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Completed");"Completed";
    IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Dismissed");"Dismissed";
    IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Begun");"Begun";
    IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Created");"Created";
    "Find no status"))))
RETURN
    ResultX
Both of these calculated columns work as expected as long as I do not delete the data in the model (I do have one hiccup with both columns, as described in this separate question, but I think that is unrelated).
When the data has been deleted and I refresh the model I get the following error message:
"We cannot get the data from the data model. This is the error message we got: A circular dependency was discovered: 'tCaseworks'[status_now],'tCaseworks'[status_now],'tCaseworks'[has_task],'tCaseworks'[has_task],'tCaseworks'[status_now]."
Question:
What is creating this dependency and how can I avoid it?
My attempted solutions:
The problem only arises when both of these calculated columns are present. Either one works perfectly without the other upon refreshing. I know that calculated columns are prone to circular dependencies, but unfortunately I need to use columns and not measures. I suspect that my choice of formula is creating the problem, most likely the CONTAINS function. However, I don't know of any alternative ways of building the formulas I need. Any suggestions?
Edit:
I originally only posted a portion of my data model as I wanted the question to be as concise as possible but I guess it might have been confusing. The whole model concerns five objects from a case handling system: Claims, Cases, Caseworks, Tasks and Action Points. These objects are hierarchical, one claim can have one or more cases, but one case can only have one claim. Similarly, a case can have several caseworks, a casework can have several tasks, a task can have several action points. Additionally, the latter four can have a status attribute which is changed regularly.
I attempted to organize my data model in such a way that I had a lookup table for each object with unique values. I have many attributes for each object in my data that I did not include in the example above, and my goal was to add useful attributes through calculated columns in these tables. The data tables with the changes were intended to provide insight to the lookup tables.
I think your relationship model is a bit unusual. DAX works best with something like a dimensional fact model.
I would consider tCaseworkStatus a fact table, since it is like a log of the changes to your data. tTasks is a dimension, since it just adds an extra dimension to your data.
tCaseworks is not necessary, since it doesn't hold any actual data (only calculated data).
If you want your current model to work, it might fix your problem to delete the relationship between tTasks and tCaseworks and add a new one between tTasks and tCaseworkStatus.
Edit:
It just occurred to me that the reason you have it like this is that you may have a many-to-many relationship between tTasks and tCaseworkStatus. If that is the case, you might have to create a proper many-to-many table, which is more or less what your tCaseworks is, but you can't have a relationship to the same key like you currently have.
Edit 2:
The cause seemed to be that the RELATEDTABLE function, in conjunction with the relationship model, was somehow triggering the error. Using LOOKUPVALUE instead seems to have fixed the issue.
Let's imagine we have a web page whose content includes a table with a couple of columns. It is built with Joomla, so I am basically working with the web page constructor, if I can call it that, and not with code. In the last column I have a link with query parameters, something like this: link?qparam1=sth1&qparam2=sth. The values for these query parameters should be taken from the first and second columns and inserted into this link. Otherwise I need to copy those values manually into each and every link, which is very slow and inefficient, especially because whenever the table values change, the link must be updated as well.
Is it possible to fetch the data from columns and include into the link?
Write a system plugin that gets the data from the table and inserts or updates it in the request:
$app = JFactory::getApplication();
$app->input->set('qparam1','<value from table>');
This will update the request data.
I'm trying to dramatically cut down on pricey DB queries for an app I'm building, and thought I should perhaps just return IDs of a child collection (then find the related object from my React state), rather than returning the children themselves.
I suppose I'm asking, if I use 'pluck' to just return child IDs, is that more efficient than a general 'get', or would I be wasting my time with that?
Yes, the pluck method is just fine if you are trying to retrieve a single column from a table.
If you use the get() method, it will retrieve all the information about the child model, which can make querying and returning results a little slower.
So in my opinion, you are using a good method for retrieving the result.
Laravel also has other methods for select queries; see Selects in the documentation.
Good practice for a DB select query in an application is to select only the columns that are necessary. If only the id column is needed, then the id column should be selected instead of all columns. Otherwise, memory is spent holding unused data. Once that is clear, pluck and get amount to the same thing:
Model::pluck('id');
// which is the same as
Model::select('id')->get()->pluck('id');
// which is the same as
Model::get(['id'])->pluck('id');
I know I'm a little late to the party, but I was wondering about this myself and decided to research it. It turns out that one method is faster than the other.
Using Model::select('id')->get() is faster than Model::get()->pluck('id').
This is because Illuminate\Support\Collection::pluck iterates over every returned model and extracts the requested column(s) in a PHP foreach loop, whereas the first method is generally cheaper because the column selection happens in the database query itself.
I am using Spring + Hibernate, and I will have an HTML form that has 100+ fields, and I must store all these values in the database in a single table.
They are all used in one big, massive calculation.
How should I handle this? I thought about creating an Entity with 100 fields and setters and getters, but is there a nicer solution for it?
EDIT:
Every time someone submits the form, a new row will be added, so eventually there will be tens of thousands of rows.
I believe it's not about the HTML but about the data modeling.
Think about your data, who its consumers are, and how and in which business flows you're going to query it.
In general, an entity with 100 fields is not a good idea, because it has to be mapped to one single table with 100 columns. It's just not maintainable.
Maybe the data should be normalized, so that you store pieces of it in different tables in the DB linked by foreign keys?
I hope this helps, or at least gives you some direction to think about.
I think you could use a Map in this case, because:
You only want to store the fields as key-value elements.
It is more flexible to add/remove fields in the future.
So, instead of having a table with 100 fields you will end with a table with 2 fields (3 if you want to include the form identifier or something like that) and 100 rows.
If many of the form fields are empty (sparse data) you could also save some storage space (it depends on the database you are using).
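The key-value layout suggested above can be sketched as follows. The sketch is in PHP to stay consistent with the other examples here rather than in Java, and it rests on assumptions: formToRows() and the column names form_id, field and value are hypothetical, and empty fields are simply skipped to show the sparse-data saving.

```php
<?php
/**
 * Turn one submitted form into key-value rows, one row per non-empty field.
 * The function name and column names are hypothetical.
 */
function formToRows(int $formId, array $fields): array
{
    $rows = [];
    foreach ($fields as $name => $value) {
        if ($value === '' || $value === null) {
            continue; // sparse data: empty fields produce no row
        }
        $rows[] = ['form_id' => $formId, 'field' => $name, 'value' => (string) $value];
    }
    return $rows;
}

// Made-up submission with one empty field.
$rows = formToRows(42, ['height' => '180', 'weight' => '', 'age' => 31]);
```

The trade-off is that every value is stored as a string, so the calculation code has to convert the values back to numbers when reading them.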