Architecture for avoiding repeated data in GraphQL - graphql

I have an application where the same data appears in many places in the graph, and I need to optimize the data queries to avoid processing and sending the same data too often.
As an example consider the following pseudo schema:
type Group {
  name: String
  members: [Person]
}
type Person {
  name: String
  email: String
  avatar: Avatar
  follows: [Person]
  followedBy: [Person]
  contacts: [Person]
  groups: [Group]
  bookmarks: [Bookmark]
  sentMessages: [Message]
  receivedMessages: [Message]
}
type Message {
  text: String
  author: Person
  recipients: [Person]
}
type Bookmark {
  message: Message
}
Querying a user's data can easily return hundreds, if not thousands, of Person objects, even though the small circle of friends/contacts/follows contains only tens of distinct users.
In my real implementation, about 80% of each GraphQL query (in bytes) is redundant. Considering that the client runs many different queries over the same data, more than 90% of all data transferred and processed is redundant.
How could I improve the model so that I don't have to load the same data again and again without complicating the client too much?
I'm using Apollo for both GraphQL client and server.

Use pagination (instead of plain arrays) for relations. That way you can query for the count/total (and render it without processing an array) plus an array of ids only; usually there is no need to query or join the person table in the database at all.
Render the list of Person components (React?) passing only the id as a prop. Each rendered Person then fetches the details it needs itself, if they are not already cached; use batching to merge those requests.
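A minimal sketch of the batching idea, in TypeScript and without any Apollo API. PersonLoader, fetchMany, and the Person shape are hypothetical names introduced for illustration. Each component asks for a single id; requests made in the same tick are merged into one call, and repeated ids are served from an in-memory cache.

```typescript
// Hypothetical shape of a person record; a list only needs the id to render.
interface Person {
  id: string;
  name: string;
}

// Assumed batch endpoint: takes many ids and returns the matching persons in one round trip.
type BatchFetch = (ids: string[]) => Promise<Person[]>;

interface Pending {
  id: string;
  resolve: (person: Person) => void;
  reject: (error: Error) => void;
}

class PersonLoader {
  private fetchMany: BatchFetch;
  private cache = new Map<string, Promise<Person>>();
  private queue: Pending[] = [];

  constructor(fetchMany: BatchFetch) {
    this.fetchMany = fetchMany;
  }

  // Returns the cached promise if this id was already requested; otherwise queues it.
  load(id: string): Promise<Person> {
    const cached = this.cache.get(id);
    if (cached) return cached;
    const promise = new Promise<Person>((resolve, reject) => {
      this.queue.push({ id, resolve, reject });
      // The first queued id schedules one flush for everything queued in this tick.
      if (this.queue.length === 1) Promise.resolve().then(() => this.flush());
    });
    this.cache.set(id, promise);
    return promise;
  }

  // Sends all queued ids in a single call and settles each waiting promise.
  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const persons = await this.fetchMany(batch.map((entry) => entry.id));
      const byId = new Map<string, Person>();
      for (const person of persons) byId.set(person.id, person);
      for (const entry of batch) {
        const person = byId.get(entry.id);
        if (person) entry.resolve(person);
        else entry.reject(new Error("Person " + entry.id + " not found"));
      }
    } catch (err) {
      for (const entry of batch) entry.reject(err as Error);
    }
  }
}
```

With Apollo, comparable merging is usually done by the normalized cache plus a batching HTTP link rather than by hand; the sketch only illustrates the principle.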

Related

How can I pass arguments to child fields in Apollo?

I'm trying to build a graphql interface that deals with data from different regions and for each region there's a different DB.
What I'm trying to accomplish is:
TypeDefs = gql`
  type Player {
    account_id: Int
    nickname: String
    clan_id: Int
    clan_info: Clan
  }
  type Clan {
    name:
  }
`
So right now I can request player(region, id), and this will pull up the player details no issues there.
But the issue is that the clan_info field also requires the region from the parent, so the resolver would look like clan_info({clan_id}, region).
Is there any way to pass the region down from the parent to the child field? I know I can add it to the details of the player, but I would rather not, since there would be millions of records and every field counts.
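One common approach, sketched here in TypeScript under stated assumptions, is to attach the region to the object the parent resolver returns, in memory only, so nothing extra is stored per record. The in-memory stores playersByRegion and clansByRegion are hypothetical stand-ins for the per-region databases, and the resolver map only follows the usual (parent, args) resolver convention.

```typescript
// Hypothetical rows as stored in each regional database (no region column).
interface PlayerRow {
  account_id: number;
  nickname: string;
  clan_id: number;
}
interface ClanRow {
  name: string;
}

// Stand-ins for the per-region databases.
const playersByRegion: Record<string, Record<number, PlayerRow>> = {
  eu: { 1: { account_id: 1, nickname: "alice", clan_id: 10 } },
};
const clansByRegion: Record<string, Record<number, ClanRow>> = {
  eu: { 10: { name: "Knights" } },
};

// The value handed to child resolvers: the stored row plus the region of this request.
type PlayerParent = PlayerRow & { region: string };

const resolvers = {
  Query: {
    // The region comes from the query arguments and is attached in memory only.
    player: (_parent: unknown, args: { region: string; id: number }): PlayerParent | null => {
      const row = playersByRegion[args.region]?.[args.id];
      return row ? { ...row, region: args.region } : null;
    },
  },
  Player: {
    // The first resolver argument is whatever the parent resolver returned.
    clan_info: (parent: PlayerParent): ClanRow | null =>
      clansByRegion[parent.region]?.[parent.clan_id] ?? null,
  },
};
```

Because region is not declared on the Player type in the schema, clients cannot select it; it exists only on the object passed between resolvers.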

Data modelling for ecommerce website using Amplify + GraphQL + DynamoDB

I'm using Amplify from AWS to build a small ecommerce project using React as frontend.
I'd like to know how I should write the "Product" and "Order" types in the schema in order to be able to write productId's to a product array in the Order table when users complete a purchase.
My schema.graphql file:
type Product @model {
  id: ID!
  name: String!
  price: Int!
  category: String!
  images: [String]!
}
type Order @model {
  id: ID!
  products: [Product] @connection
}
My question is about the last line: do I need to define that [Product] connection there, or can I use [String] to store product ids in a simple string array?
Point 1: In DynamoDB, you only need to define the data types of your partition key and sort key, and these can be string, number, etc. For all other attributes, you don't need to define anything.
Point 2: DynamoDB's designers recommend using a single table per application unless it's impossible to manage the data without multiple tables. Keeping this in mind, your table can be something like this.
Please observe: Only the Id (partition key) and Sk (sort key) columns are fixed here; all other columns can be anything per item. This is the beauty of DynamoDB. Refer to this document for the data types DynamoDB supports.

Apollo query does not return cached data available using readFragment

I have 2 queries: getGroups(): [Group] and getGroup($id: ID!): Group. One component first loads all groups using getGroups(), and later a different component needs to access a specific Group's data by ID.
I'd expect Apollo's normalization to already have the Group data in the cache and to use it when the getGroup($id: ID!) query is executed, but that's not the case.
When I set the fetchPolicy to cache-only, nothing is returned. I can access the data using readFragment, but that's not as flexible as just using a query.
Is there an easy way to make Apollo return the cached data from a different query as I would expect?
It's pretty common to have a query field that returns a list of nodes and another that takes an id argument and returns a single node. However, deciding what specific node or nodes are returned by a field is ultimately part of your server's domain logic.
As a silly example, imagine you had a field like getFavoriteGroup(id: ID!): you may have the group with that id in your cache, but that doesn't necessarily mean it should be returned by the field (it may not be favorited). Any number of factors (other arguments, execution context, etc.) might affect which node(s) are returned by a field. As a client, it's not Apollo's place to make assumptions about your domain logic.
However, you can effectively duplicate that logic by implementing query redirects.
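// Apollo Client 2.x packages
import { InMemoryCache } from 'apollo-cache-inmemory';
import { toIdValue } from 'apollo-utilities';

const cache = new InMemoryCache({
  cacheRedirects: {
    Query: {
      // Point the single-node field at the cache entry the list query already stored.
      group: (_, args) =>
        toIdValue(cache.config.dataIdFromObject({ __typename: 'Group', id: args.id })),
    },
  },
});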

Using GraphQL with conditional related types

I have an app that has a type with many related types. So like:
type Person {
  Name: String!
  Address: Address!
  Family: [Person!]!
  Friends: [Person!]!
  Job: Occupation
  Car: Car
}
type Address {...}
type Occupation {...}
type Car {...}
(don't worry about the types specifically...)
Anyway, this is all stored in a database in many tables.
Some of these queries are seldom used and are slow. Imagine for example there are billions of cars in the world and it takes time to find the one that is owned by the person we are interested in. Any query to "getPerson" must satisfy the full schema and then graphql will pare it down to the fields that are needed. But since that one is slow and could be requested, we have to perform the query even though the data is thrown out most of the time.
I only see 2 solutions to this.
a) Just do the query each time and it will always be slow
b) Make 2 separate Query options. One for "getPerson" and one "getPersonWithCar" but then you're not able to reuse the schema and now a Person is defined twice. Once in terms of the car and once without.
Is there a way to indicate whether a field is present in the Query requested fields? That way we could say like
if (query.isPresent("Car")) {
  car = findCar();
} else {
  car = null;
}

Fetching the data optimally in GraphQL

How can I write the resolvers such that I can generate database sub-query in each resolver and effectively combine all of them and fetch the data at once?
For the following schema :
type Node {
  index: Int!
  color: String!
  neighbors(first: Int = null): [Node!]!
}
type Query {
  nodes(color: String!): [Node!]!
}
schema {
  query: Query
}
To perform the following query :
{
  nodes(color: "red") {
    index
    neighbors(first: 5) {
      index
    }
  }
}
Data store:
In my data store, nodes and neighbors are stored in separate tables. I want to write a resolver so that we can fetch the required data optimally.
If there are any similar examples, please share the details. (It would be helpful to get an answer in reference to graphql-java)
DataFetchingEnvironment provides access to sub-selections via DataFetchingEnvironment#getSelectionSet. This means, in your case, you'd be able to know from the nodes resolver that neighbors will also be required, so you could JOIN appropriately and prepare the result.
One limitation of the current implementation of getSelectionSet is that it doesn't provide info on conditional selections. So if you're dealing with interfaces and unions, you'll have to manually collect the sub-selection starting from DataFetchingEnvironment#getField. This will very likely be improved in the future releases of graphql-java.
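The look-ahead idea can be illustrated independently of graphql-java. The following TypeScript sketch is not the graphql-java API: FieldNodeLike is a hypothetical structural type that mirrors a field node with its sub-selections, and the table and column names in the SQL strings are assumptions. Like the limitation described above, it only inspects direct field selections and ignores fragments.

```typescript
// Hypothetical minimal view of a selected field and its sub-selections.
interface FieldNodeLike {
  kind: "Field";
  name: { value: string };
  selectionSet?: { selections: FieldNodeLike[] };
}

// True if `fieldName` is selected directly under `parent`.
function selects(parent: FieldNodeLike, fieldName: string): boolean {
  const selections = parent.selectionSet ? parent.selectionSet.selections : [];
  return selections.some((selection) => selection.name.value === fieldName);
}

// Builds the SQL for the `nodes` field, joining neighbors only when they were requested.
// Table and column names are assumptions made for this sketch.
function planNodesQuery(nodesField: FieldNodeLike): string {
  if (selects(nodesField, "neighbors")) {
    return (
      "SELECT n.*, nb.neighbor_index FROM nodes n " +
      "LEFT JOIN neighbors nb ON nb.node_index = n.index WHERE n.color = ?"
    );
  }
  return "SELECT n.* FROM nodes n WHERE n.color = ?";
}
```

A resolver for nodes would call planNodesQuery with its own field node, run the resulting statement once, and let the neighbors resolver read from the already joined rows.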
The recommended and most common way is to use a data loader.
A data loader collects the info about which fields to load from which table and which where filters to use.
I haven't worked with GraphQL in Java, so I can only give you directions on how you could implement this yourself.
Create an instance of your data loader and pass it to your resolvers as the context argument.
Your resolvers should pass the table name, a list of field names, and a list of where conditions to the data loader and return a promise.
Once all the resolvers have executed, your data loader should combine those lists so you end up with only one query per table.
You should remove duplicate field names and combine the where conditions using the or keyword.
After the queries have executed, you can return all of this data to your resolvers and let them filter the data (since we combined the conditions using the or keyword).
As an advanced feature your data loader could apply the where conditions before returning the data to the resolvers so that they don't have to filter them.
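The combining step described above can be sketched as follows, in TypeScript rather than Java. LoadRequest and combineRequests are hypothetical names, and the conditions are plain strings only to keep the example short; real code should use parameterized queries instead of concatenating values.

```typescript
// What one resolver hands to the data loader: table, field names, where conditions.
interface LoadRequest {
  table: string;
  fields: string[];
  where: string[]; // conditions of a single request are ANDed together
}

interface TableEntry {
  fields: Set<string>;
  clauses: Set<string>;
  unfiltered: boolean; // true if any request asked for the whole table
}

// Merges all requests so that each table is queried exactly once.
function combineRequests(requests: LoadRequest[]): Map<string, string> {
  const perTable = new Map<string, TableEntry>();
  for (const request of requests) {
    let entry = perTable.get(request.table);
    if (!entry) {
      entry = { fields: new Set<string>(), clauses: new Set<string>(), unfiltered: false };
      perTable.set(request.table, entry);
    }
    // The Set drops duplicate field names.
    for (const field of request.fields) entry.fields.add(field);
    if (request.where.length === 0) entry.unfiltered = true;
    else entry.clauses.add("(" + request.where.join(" AND ") + ")");
  }

  const queries = new Map<string, string>();
  perTable.forEach((entry, table) => {
    const select = "SELECT " + Array.from(entry.fields).join(", ") + " FROM " + table;
    // Conditions from different resolvers are combined with OR.
    const where = entry.unfiltered ? "" : " WHERE " + Array.from(entry.clauses).join(" OR ");
    queries.set(table, select + where);
  });
  return queries;
}
```

Because the conditions are ORed, each resolver still has to pick its own rows out of the combined result, as noted above.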
