What is the best practice for joining tables during a GraphQL query?

If my GraphQL schema has users with followed posts, and each post in turn has the user who posted it, is there a best practice for fetching the data efficiently without over-requesting from the database?
Should not join the posts table:
{ users { first_name } }
Should join posts, but not back onto users:
{ users { followedPosts { title } } }
Should join posts and back onto users for the poster:
{ users { followedPosts { poster { first_name } } } }
I've thought about inspecting the context in the initial resolver to see which fields are requested, but in the library I'm using that is pretty ugly.
I'd like to know whether good practices exist for solving this by adding the appropriate joins, or whether everyone just queries the heck out of the database and relies on some caching.
Also, if joins aren't common, what about inspecting which posters will be requested and fetching them all in one go, rather than as GraphQL resolves down the chain?

Stick to separate queries for separate tables, and have a look at DataLoader for batching multiple calls to the same table and, if needed, adding some per-request caching.
This is usually more than enough.
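As a rough illustration of the batching idea, here is a minimal sketch in Python. It is not the DataLoader API itself: `BatchLoader`, `fetch_users`, and the in-memory `USERS` table are hypothetical names made up for this example. Keys requested while resolving the posts are queued, fetched in a single lookup, and cached for the rest of the request.

```python
from typing import Dict, List

# Stand-in for a "users" table; the rows are made up for illustration.
USERS = {1: {"first_name": "Ann"}, 2: {"first_name": "Bob"}}
query_log: List[List[int]] = []  # records every simulated database round trip


def fetch_users(ids: List[int]) -> Dict[int, dict]:
    """Simulates one `SELECT ... WHERE id IN (...)` query."""
    query_log.append(sorted(ids))
    return {i: USERS[i] for i in ids if i in USERS}


class BatchLoader:
    """Hypothetical minimal loader: queue keys, fetch them in one batch, cache results."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = set()
        self.cache = {}

    def load(self, key):
        # Only queue keys that were not already fetched during this request.
        if key not in self.cache:
            self.pending.add(key)

    def dispatch(self):
        # One round trip for everything queued so far.
        if self.pending:
            self.cache.update(self.batch_fn(list(self.pending)))
            self.pending.clear()

    def get(self, key):
        return self.cache.get(key)


posts = [
    {"title": "A", "poster_id": 1},
    {"title": "B", "poster_id": 2},
    {"title": "C", "poster_id": 1},
]
loader = BatchLoader(fetch_users)
for post in posts:
    loader.load(post["poster_id"])  # resolvers only register what they need
loader.dispatch()                   # a single batched lookup for all posters
posters = [loader.get(p["poster_id"])["first_name"] for p in posts]
```

Three posts with two distinct posters end up as one lookup against the users table, and asking for an already loaded poster later in the same request triggers no further lookup.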

Related

LatestOfMany() of BelongsToMany() relationship

I've been using latestOfMany() on my hasMany() relations to define them as hasOne() for quite a while now. Lately I've needed something similar for belongsToMany() relationships. Unfortunately, Laravel doesn't have this feature.
My tables are as follows:
Document
    id
    upload_date
    identifier_code
Person
    id
    name
DocumentPerson (pivot)
    id
    document_id
    person_id
    token
My objective is to define a relationship that fetches the first document (according to upload_date) of a Person. As you can see, it's a many-to-many relationship.
What I have tried so far:
public function firstDocument()
{
    // This was my safe bet, but oldestOfMany() and ofMany() don't allow aggregating on a relationship column.
    return $this->hasOne(DocumentPerson::class)->oldestOfMany('document.upload_date');
}

public function firstDocument()
{
    return $this->belongsToMany(Document::class)->oldestOfMany('upload_date');
}

public function firstDocument()
{
    return $this->belongsToMany(Document::class)->oldest()->limit(1);
}

public function firstDocument()
{
    return $this->hasOneThrough(Document::class, DocumentPerson::class, 'id', 'document_id', 'id', 'person_id')->latestOfMany('upload_date');
}
At this point I'm almost positive the current relationship base doesn't support something like this, so I'm considering alternative ways to solve it. My two choices:
Add a column called first_document_id to the Person table and go through it with belongsTo(). This is simple and fast performance-wise. The downside is that I'll have to implement many event listeners to make sure it always stays consistent with the actual relationships, for example when a Document's upload_date is updated (basically a risk of database inconsistency).
Add an order column to the pivot (document_person) table, which will hold the order of the related Documents by upload_date. This way I can do hasOne(DocumentPerson::class)->oldestOfMany('order'); (or just ofMany()) and be done with it. This one also poses a risk of database inconsistency.
It's fair to say I'm at a crossroads here. Any ideas and suggestions are welcome and appreciated. Thank you. Please read the restrictions below so that you don't suggest things that are not feasible for my situation.
Restrictions:
(Please)
It must strictly be a relationship. I'll be using it in various places, so it definitely has to be a relationship that I can eager load and query. My next objective involves querying by this relationship, so it is imperative.
Don't suggest accessors; they won't work well for my case.
Don't suggest collection methods; it needs to be done in the query.
Don't suggest ->limit(), ->take() or ->first(); those are prone to causing inconsistent results with eager loading.
Update 1
Q: Why does the first document of a person have to be a relationship?
A: Because further down the line I'll be querying it in many different situations. Example queries where it'll be used:
Get all the users whose first document (according to upload_date) has an upload_date between 2022-01-01 and 2022-06-08 (along with 10 other scopes and filters).
Get all the users whose first document (according to upload_date) has an identifier_code starting with "Lorem" and an id greater than 100.
These are just a few; there are many cases where I really need to query it in various ways. This is why I desperately need it to be a relationship, so I can query it with ease using Person::whereHas('firstDocument', function ($subQuery) { return $subQuery->someScope1()->anotherScope2()->where(...); });
If I only needed to display it, then sure, eager loading with a closure, collection methods, or accessors would suffice. But since I need to be able to query it, a relationship is essential. Keep in mind that the Person table has around 500k records, hence the need to query it at the database layer.
Alright, here's the solution I've elected to go with (from the choices explained in the question). I implemented the "order column on the pivot table" option because it scales better and is more flexible than the other options. It allows querying the last document, first document, third document, and so on. It also doesn't require any aggregate functions (MAX or MIN, as ->latestOfMany() applies), which is a performance boost. Given these constraints, this solution was the way to go. Here's how I applied it, in case someone else is thinking about something similar.
Currently, the only noticeable downside of this approach is the inability to access any additional pivot data.
Added a new column for the order:
// migration
$table->unsignedTinyInteger('document_upload_date_order')->nullable()->after('token');
$table->index('document_upload_date_order'); // for performance
Person.php (Model)
// ... other stuff
public function personalDocuments()
{
    // My old relationship, which I'll still keep for display/index purposes.
    return $this->belongsToMany(Document::class)->withPivot('token')->where('type_slug', 'personal');
}

// NEW RELATIONSHIP
public function firstDocument()
{
    // Eloquent relationship; allows querying and eager loading.
    return $this->hasOneThrough(
        Document::class,
        DocumentPerson::class, // pivot class for the pivot table
        'person_id',   // foreign key on the pivot table pointing to Person
        'id',          // key on the documents table
        'id',          // local key on the persons table
        'document_id'  // key on the pivot table pointing to Document
    )->where('document_upload_date_order', 1); // magic here
}
SomeService.php
public function determineDocumentUploadDateOrders(Person $person)
{
    $sortLogic = [
        ['upload_date', 'asc'],
        ['created_at', 'asc'],
    ];
    // values() re-indexes the collection keys after sorting
    $documentsOrdered = $person->documents->sortBy($sortLogic)->values();
    foreach ($documentsOrdered as $index => $document) {
        // updating through the pivot table's ORM model
        DocumentPerson::where('id', $document->pivot->id)->update([
            'document_upload_date_order' => $index + 1,
            'document_id' => $document->id,
            'person_id' => $document->pivot->person_id,
        ]);
    }
}
I hooked determineDocumentUploadDateOrders() into various event listeners and model events, so whenever an association or disassociation occurs, or the upload_date of a document changes, I simply call determineDocumentUploadDateOrders() with the corresponding Person. This way the order is always kept in sync with the actual data.
I implemented it fully and it is providing consistent results with great performance. Of course, keeping it in sync adds a bit of overhead. Nonetheless, it did the job while meeting the requirements. Honestly, I found this approach far more reliable than some unofficial Eloquent relationships and similar alternatives.
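To make the ordering rule easier to see in isolation, here is a small language-neutral sketch in Python rather than PHP. The function name `assign_upload_order`, the `pivot_id` key, and the sample rows are assumptions made for this example; it only mirrors the sort-then-number logic of determineDocumentUploadDateOrders() and does not touch a database.

```python
from datetime import date, datetime


def assign_upload_order(rows):
    """Return {pivot_id: 1-based position}, ordered by upload_date, then created_at."""
    ordered = sorted(rows, key=lambda r: (r["upload_date"], r["created_at"]))
    return {row["pivot_id"]: position for position, row in enumerate(ordered, start=1)}


# Made-up pivot rows for a single person; two documents share an upload_date,
# so created_at acts as the tie-breaker.
rows = [
    {"pivot_id": 10, "upload_date": date(2022, 3, 1), "created_at": datetime(2022, 3, 1, 9, 0)},
    {"pivot_id": 11, "upload_date": date(2022, 1, 5), "created_at": datetime(2022, 1, 6, 8, 0)},
    {"pivot_id": 12, "upload_date": date(2022, 1, 5), "created_at": datetime(2022, 1, 5, 8, 0)},
]
orders = assign_upload_order(rows)
```

The row that ends up with position 1 is the one the firstDocument relationship would match through its where('document_upload_date_order', 1) condition.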
I encountered a similar situation years back.
The best workaround in a situation like this is to use staudenmeir's eager limit package.
Load the trait with use \Staudenmeir\EloquentEagerLimit\HasEagerLimit; on both models (the parent and the related model),
then try the code below:
public function firstDocument() {
    return $this->documents()->latest()->limit(1);
}

public function documents() {
    return $this->belongsToMany(Document::class);
}
Just to add: eager loading with a limit does not work with built-in Laravel Eloquent. You would have to write your own raw queries to achieve it, which can turn into a nightmare. That eager limit package from staudenmeir should have been merged into the Laravel source code 😆

Apollo GraphQL cache for common data in different queries?

I have a web-based application with two GraphQL queries that have some data in common. The first query, FullProject, is more or less a very broad "let's pull all the data the client might need" and contains many nested resources. For this question, the important thing is that it also pulls in loads of users:
query FullProject($id: ID!) {
  projects(input: {filter: {id: $id}}) {
    nodes {
      id
      name
      relatedUsers {
        id
        name
      }
      # Many more
    }
  }
}
The second query is used to populate a list of users:
query NameUser($id: ID!) {
  users(input: {filter: {id: $id}}) {
    nodes {
      id
      name
    }
  }
}
When I check the GraphQL cache (using the Apollo Developer tools) after running FullProject I can see that the data has been properly normalized and I have entries like:
User:1
name:A
---
User:2
name:B
However, when I run the NameUser query, this always results in one new request for each user. After the first request for a user the cache properly kicks in, but this still means that I end up with possibly hundreds of queries for data that is technically already part of the cache (albeit via a different query). I was hoping that Apollo Client would be able to leverage the cache even for different top-level queries. Am I doing something wrong, or is my assumption incorrect?

How to create a GraphQL query that returns data from multiple tables/models within one field using Laravel Lighthouse

I'm trying to learn GraphQL with Laravel and Lighthouse, and have a question I'm hoping someone can help me with. I have the following five database tables, which are also defined as Laravel models:
users
books
user_books
book_series
book_copies
I'd like to create a GraphQL endpoint that allows me to get back an array of users and the books they own, where I can pull data from multiple tables into one subfield called "books" like so:
query {
  users {
    name
    books {
      title
      issue_number
      condition
      user_notes
    }
  }
}
This is easy to accomplish in SQL using joins, like this:
$users = User::all();
foreach ($users as $user) {
    // DB::select() runs the raw query and returns an array of result rows.
    $user['books'] = DB::select('SELECT
            book_series.title,
            books.issue_number,
            book_copies.condition,
            user_books.notes AS user_notes
        FROM user_books
        JOIN book_copies ON user_books.book_copy_id = book_copies.id
        JOIN books ON book_copies.book_id = books.id
        JOIN book_series ON books.series_id = book_series.id
        WHERE user_books.user_id = ?', [$user['id']]);
}
How would I model this in my GraphQL schema file when the object type for "books" is a mashup of properties from four other object types (Book, UserBook, BookCopy, and BookSeries)?
Edit: I was able to get all the data I need by doing a query that looks like this:
users {
  name
  userBooks {
    user_notes
    bookCopy {
      condition
      book {
        issue_number
        series {
          title
        }
      }
    }
  }
}
However, as you can see, the data is separated into multiple child objects and is not as ideal as getting it all in one flat "books" object. If anyone knows how I might accomplish getting all the data back in one flat object, I'd love to know.
I also noticed that the field names for the relationships need to match up exactly with my controller method names within each model, which are camelCase as per Laravel naming conventions. My other fields, however, match the database column names, which are lower_underscore. This is a slight nitpick.
OK, since you edited your question, I'll write the answer here to address your new questions.
However, as you can see, the data is separated into multiple child objects and is not as ideal as getting it all in one flat "books" object. If anyone knows how I might accomplish getting all the data back in one flat object, I'd love to know.
The thing is that fetching data this way is a central idea of GraphQL. You have some types, and these types may have relations to each other. So you are able to fetch any relations of an object, to any depth, even circular ones.
Lighthouse gives you out-of-the-box support for Eloquent relations with batch loading, avoiding the N+1 performance problem.
You also have to keep in mind that every field (literally, EVERY field) in your GraphQL definition is resolved on the server. There is a resolve function for each of the fields, so you are free to write your own resolver for particular fields.
You can actually define a type in your GraphQL schema that fits your initial expectation. Then you can define a root Query field, e.g. fetchUsers, and create your custom field resolver. The docs explain how this works and how to implement it: https://lighthouse-php.com/5.2/the-basics/fields.html#hello-world
In this field resolver you can do your own data fetching, even without using any Laravel/Eloquent API. The one thing you have to take care of is returning data with the same structure as the field's return type in GraphQL.
So to sum up: you have the option to do this. But in my opinion you would have to write more of your own code and cover it with tests yourself, which means more work for you. I think it is simpler to use the built-in directives, like @find, @paginate and @all, in combination with the relation directives, which are all covered with tests, and not worry about the implementation.
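For the custom-resolver route, the essential step is reshaping the nested relation data into the flat structure that the "books" type promises. The sketch below shows only that reshaping, in Python rather than PHP, and is not Lighthouse code; `flatten_user_books` and the sample record are hypothetical, with field names taken from the queries in the question.

```python
def flatten_user_books(user):
    """Collapse userBooks -> bookCopy -> book -> series into flat "books" entries."""
    return [
        {
            "title": user_book["bookCopy"]["book"]["series"]["title"],
            "issue_number": user_book["bookCopy"]["book"]["issue_number"],
            "condition": user_book["bookCopy"]["condition"],
            "user_notes": user_book["user_notes"],
        }
        for user_book in user["userBooks"]
    ]


# Made-up record shaped like the nested query result from the question.
nested_user = {
    "name": "Sam",
    "userBooks": [
        {
            "user_notes": "signed copy",
            "bookCopy": {
                "condition": "mint",
                "book": {"issue_number": 7, "series": {"title": "Example Series"}},
            },
        }
    ],
}
flat_user = {"name": nested_user["name"], "books": flatten_user_books(nested_user)}
```

Whatever the resolver returns has to line up key for key with the fields declared on the flat type, which is the point made above about returning the correct structure.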
I also noticed that the field names for the relationships need to match up exactly with my controller method names within each model, which are camelCase as per Laravel naming conventions.
You probably mean methods within the Model class, not the controller.
Lighthouse provides a @rename directive, which you can use to define a different name in GraphQL for your attributes. For the relation directives, you can pass a relation parameter, which will be used to fetch the data. So for your example you can use something like this:
type User {
  # ...
  user_books: [Book!]! @hasMany(relation: "userBooks")
}
But in our project we decided to use snake_case for relations as well, to keep GraphQL clean, with a consistent naming convention and less effort.

I don't get GraphQL. How do you solve the N+1 issue without preloading?

A neighborhood has many homes. Each home is owned by a person.
Say I have this graphql query:
{
  neighborhoods {
    homes {
      owner {
        name
      }
    }
  }
}
I can preload the owners, and that makes the data request a single SQL query. Fine.
But if I don't request the owner in the GraphQL query, the data will still be preloaded.
And if I don't preload, the data will either be fetched one query at a time (N+1), or not at all, since I'm not loading the belongs_to association in the resolver.
I'm not sure if this is a solved issue, or just a pain point one must swallow when working with GraphQL.
I'm using Absinthe, DataLoader and Elixir, by the way.
Most GraphQL implementations, including Absinthe, expose some kind of "info" parameter that contains information specific to the field being resolved and the request being executed. You can parse this object to determine which fields were actually requested and build your SQL query appropriately.
See this issue for a more in-depth discussion.
To complement what Daniel Rearden said, you have to use info.definition to resolve nested includes.
In my application I defined a list of possible values like:
defp relationships do
  [
    {:person, [tasks: [:items]]}
    ...
  ]
end
Then I have logic that iterates over info.definition and uses this function to preload the associations.
You would use DataLoader to lazy-load your resources, usually to make third-party requests or perform a complex database query.
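The idea of walking the requested fields and preloading only the associations that appear can be sketched as follows. This is Python, not Absinthe code: the selection is modelled as plain nested dicts instead of a real info object, and `plan_preloads`, `RELATIONSHIPS`, and the `address` field are hypothetical names for this example.

```python
def plan_preloads(selection, relationships):
    """Return a nested preload spec containing only associations that were requested."""
    preloads = {}
    for field, sub_selection in selection.items():
        if field in relationships:
            # The field is a known association: recurse into its own sub-selection.
            preloads[field] = plan_preloads(sub_selection, relationships[field])
    return preloads


# Known associations, playing the role of the relationships list above.
RELATIONSHIPS = {"homes": {"owner": {}}}

# { neighborhoods { homes { owner { name } } } }
with_owner = plan_preloads({"homes": {"owner": {"name": {}}}}, RELATIONSHIPS)

# { neighborhoods { homes { address } } }: the owner is not requested.
without_owner = plan_preloads({"homes": {"address": {}}}, RELATIONSHIPS)
```

With this shape, owners are only joined or preloaded when the query actually asks for them, which addresses the "data will still be preloaded" concern from the question.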

Am I abusing LINQ to Objects?

I think that queries with LINQ to Objects end up very readable and nice. For example:
from person in db.Persons.ToList()
where person.MessageableBy(currentUser) ...
where MessageableBy is a method that can't be translated into a store expression (SQL):
public bool MessageableBy(Person sender)
{
    // Sender is system admin
    if (sender.IsSystemAdmin())
        return true;

    // Sender is domain admin of this person's domain
    if (sender.Domain.DomainId == this.Domain.DomainId && this.Domain.HasAdmin(sender))
        return true;

    foreach (Group group in this.Groups)
    {
        if (group.MessageableBy(sender))
            return true;
    }

    // The person is attorney of someone messageable
    if (this.IsAttorney)
    {
        foreach (Person pupil in this.Pupils)
            if (pupil.MessageableBy(sender))
                return true;
    }

    return false;
}
The problem is that I don't think this is going to scale. I'm already noticing it with a few entries in the database, so I can't imagine how it would behave with a large database.
So the question is:
Should I mix LINQ to Entities with LINQ to Objects (i.e. apply some of the "where" clauses to the ICollection and some to the .ToList() result)? Or should I only use LINQ to Entities, ending up with a very large statement?
.ToList() will actually execute the query and fetch all the data in that table, which is not something you'd want unless you know for sure there will always be only a few records. So yes, you should do more in the where clause before calling .ToList().
I largely agree with your initial analysis. Mixing Linq to Objects and Linq to Entities is fine, but requires retrieving more data than is necessary, and therefore could lead to scaling problems down the road.
Remember to design your data model to support the critical queries. Perhaps a user could be a person, and a person could have a self-relationship that determines who can message whom. This is just a simple thought, to inspire you to consider other ways of representing your data so that the MessageableBy method can be realized in the query itself.
In the meantime, if it isn't causing performance problems, then I would consider this issue more in terms of model design.
Although this simply paraphrases the statements made by earlier respondents, I believe it is important enough to truly emphasize:
It is critical for DB application performance to perform as much calculation as possible, and particularly as much filtering and aggregation as possible, on the DB server prior to sending the resulting data to the client.
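A minimal sketch of "filter on the server first, then apply the untranslatable predicate in memory", written in Python with the standard-library sqlite3 module instead of LINQ. The table layout, the sample rows, and the simplified `messageable_by` rule are assumptions made for this example, not the model from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, domain_id INTEGER)")
conn.executemany(
    "INSERT INTO persons (name, domain_id) VALUES (?, ?)",
    [("Ann", 1), ("Bob", 1), ("Cid", 2), ("Dee", 3)],
)


def messageable_by(person_row, sender_name):
    """Stand-in for logic that cannot be expressed in SQL (simplified on purpose)."""
    return person_row[1] != sender_name


# Narrow the candidates in the database first...
candidates = conn.execute(
    "SELECT id, name, domain_id FROM persons WHERE domain_id = ? ORDER BY id", (1,)
).fetchall()

# ...then run the in-memory predicate only on the rows that came back.
recipients = [row[1] for row in candidates if messageable_by(row, "Ann")]
```

Only the rows matching the SQL filter cross the wire, so the expensive in-memory check runs on two rows here instead of the whole table.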

Resources