Using GraphQL with conditional related types

I have an app that has a type with many related types. For example:
type Person {
Name: String!
Address: Address!
Family: [Person!]!
Friends: [Person!]!
Job: Occupation
Car: Car
}
type Address {...}
type Occupation {...}
type Car {...}
(don't worry about the types specifically...)
Anyway, this is all stored in a database in many tables.
Some of these lookups are seldom used and slow. Imagine, for example, that there are billions of cars in the world and it takes time to find the one owned by the person we are interested in. Any "getPerson" query must satisfy the full schema, and then GraphQL pares the result down to the fields that were requested. So because the Car lookup is slow but could be requested, we have to perform it every time, even though its data is thrown away most of the time.
I only see 2 solutions to this.
a) Just do the query each time and it will always be slow
b) Make 2 separate Query options, one for "getPerson" and one for "getPersonWithCar". But then you can't reuse the schema, and a Person is defined twice: once with the car and once without.
Is there a way to tell whether a field is present in the query's requested fields? That way we could write something like:
if (query.isPresent("Car")) {
car = findCar();
} else {
car = null;
}
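For comparison, GraphQL execution only calls a field's resolver when that field appears in the selection set, so the usual way to get this behaviour is to move the slow lookup into a resolver for `Person.Car` instead of computing it inside `getPerson`. The sketch below is a deliberately tiny, dependency-free imitation of that rule (it is not graphql-js or Apollo; `findCar`, `getPerson`, and the list-of-field-names selection format are hypothetical), just to show the Car lookup being skipped when `Car` is not requested.

```typescript
// Toy illustration only: a hand-rolled "executor" that, like real GraphQL
// execution, calls a field's resolver only if the field was selected.
type PersonRow = { id: string; Name: string };
type CarRow = { model: string };

let findCarCalls = 0; // counts how often the slow lookup runs

// Hypothetical slow lookup (stands in for searching billions of cars).
function findCar(ownerId: string): CarRow | null {
  findCarCalls++;
  return ownerId === "p1" ? { model: "Roadster" } : null;
}

// Hypothetical root lookup: returns only the cheap columns, no car.
function getPerson(id: string): PersonRow {
  return { id: id, Name: "Ada" };
}

// Per-field resolvers for Person; each receives the parent row.
const personResolvers: { [field: string]: (p: PersonRow) => unknown } = {
  Name: (p) => p.Name,
  Car: (p) => findCar(p.id), // only runs when "Car" is selected
};

// Resolve just the requested fields, mirroring how a selection set works.
function executePerson(id: string, selection: string[]): { [field: string]: unknown } {
  const row = getPerson(id);
  const result: { [field: string]: unknown } = {};
  for (const field of selection) {
    result[field] = personResolvers[field](row);
  }
  return result;
}

const withoutCar = executePerson("p1", ["Name"]);
const withCar = executePerson("p1", ["Name", "Car"]);
```

In graphql-js and Apollo, resolvers also receive an `info` argument (for example `info.fieldNodes`) describing the selection, but a dedicated field resolver is usually all that's needed.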

Related

Interface vs Union in GraphQL schema design

Suppose I am building a GraphQL API that serves a timeline of natural disaster events.
There are two different kinds of event right now:
Hurricane
Earthquake
All events have an ID and a date they occurred. I plan to have a paginated query for fetching events using cursors.
I can think of 2 different approaches to modelling my domain.
1. Interface
interface Event {
id: ID!
occurred: String! # ISO timestamp
}
type Earthquake implements Event {
epicenter: String!
magnitude: Int!
}
type Hurricane implements Event {
force: Int!
}
2. Union
type Earthquake {
epicenter: String!
magnitude: Int!
}
type Hurricane {
force: Int!
}
union EventPayload =
| Earthquake
| Hurricane
type Event {
id: ID!
occurred: String! # ISO timestamp
payload: EventPayload!
}
What are the trade-offs between the two approaches?
I believe that:
unions are about providing: a field (or its resolver function) resolves to an object whose type belongs to a specific, known set of types.
interfaces are about requesting: without them, clients would have to repeat the fields they are interested in, in every type fragment.
They serve different purposes, and they can be used together:
interface I {
id: ID!
}
type A implements I {
id: ID!
a: Int!
}
type B implements I {
id: ID!
b: Int!
}
type C implements I {
id: ID!
c: Int!
}
union Foo = A | C
type Query {
foo: Foo!
}
This schema declares that A, B, and C have some fields in common, so that it's easier for the client to request them, and that querying foo can only yield A or C.
Could you write foo: I! instead? While this would work seamlessly, I believe it leads to a bad development experience. If you're saying that foo provides an I object, your clients should be prepared to receive any of the implementing types, including B, and would spend time writing and maintaining code that will never be called. If you know that foo can only yield A or C, tell them so explicitly.
The same holds if foo were to yield A, B, or C. That happens to be exactly the list of types implementing I, so in this case, could you write foo: I!? No! Don't be fooled by that. Why? Because this list can be extended through federation / schema stitching! I believe it's a seldom-used feature of some GraphQL frameworks, but its adoption is growing. If you've never used it, please try it; it will open your mind to new ideas about inter-micro-service communication and other Medium buzzwords. In short, imagine you're building a public API, or even a somewhat-public one within an organization. Someone else could "augment" your API by providing extra stuff, which may include new types implementing your interface. And so we're back to the previous paragraph.
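A rough client-side analogy, written as TypeScript types rather than GraphQL (the `__typename` discriminant mirrors how GraphQL responses identify the concrete type; everything else is illustrative): typing `foo` as the union `A | C` lets the client handle exactly two cases, whereas typing it as "anything implementing I" forces a `B` branch that, by assumption, never runs.

```typescript
// Client-side view of the two schema choices, modeled as TypeScript types.
type A = { __typename: "A"; id: string; a: number };
type B = { __typename: "B"; id: string; b: number };
type C = { __typename: "C"; id: string; c: number };

type Foo = A | C;      // union Foo = A | C: the set of results is closed
type AnyI = A | B | C; // foo: I!: every implementer (as of today) must be handled

// With the union, an exhaustive switch needs exactly two branches.
function describeFoo(foo: Foo): string {
  switch (foo.__typename) {
    case "A":
      return `A(${foo.a})`;
    case "C":
      return `C(${foo.c})`;
  }
}

// With the interface, the client also writes a B branch that never runs,
// and must revisit this code if federation adds another implementer.
function describeI(foo: AnyI): string {
  switch (foo.__typename) {
    case "A":
      return `A(${foo.a})`;
    case "B":
      return `B(${foo.b})`; // dead code if foo can only yield A or C
    case "C":
      return `C(${foo.c})`;
  }
}
```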
So far, it looks like I'm in favor of your first approach.
However, and this may be specific to this scenario, it seems to me that your definition of an event mixes data about its occurrence with physical metrics. Your second approach splits them into two type hierarchies. I like that. It feels more architecture-friendly, and your schema is more open. Imagine your API is about event history, and someone extends it with forecasts: your EventPayload can be reused!
Besides, note that your first example is incomplete. Types implementing an interface must implement, i.e. repeat, every single field of that interface, as I did in the code above. This becomes harder to maintain as the number of fields and the number of implementing types grow.
So the second solution also has some advantages. But with it, my earlier blah-blah about being specific with returned types is hard to apply, because the payload, which is the part to be specific about, is embedded in another type, and GraphQL has no generics.
Here's a proposal to reconcile all of that:
interface HasForce {
force: Int!
}
type Earthquake {
epicenter: String!
magnitude: Int!
}
type Hurricane implements HasForce {
force: Int!
}
type Tsunami implements HasForce {
force: Int!
}
interface Event {
data: EventData!
}
type EventData {
id: ID!
occurred: String!
}
union HistoryMeteorologicalPhenomenon = Earthquake | Hurricane
type HistoryEvent implements Event {
data: EventData!
meteorologicalPhenomenon: HistoryMeteorologicalPhenomenon!
}
type Query {
historyEvents: [HistoryEvent!]!
}
It looks a bit more complex than both of your proposals, but it fulfills my needs. Also, it's rare to look at a schema from this height: more often, we know the entry point and dig down from there. For instance, I open the documentation at historyEvents and see that it yields phenomena of two kinds. Fine; I'm not even aware that other union types and event types exist.
If you were to write a lot of these union + event pairs, you could generate them with code instead, so that one function call declares a pair. Less error-prone, more fun to implement, and with more potential for Medium articles.
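A minimal sketch of that idea, assuming the pairs are emitted as SDL strings (the helper `declareEventPair`, its parameters, and the use of `extend type Query` are hypothetical choices; a real project might build schema objects with its GraphQL library instead):

```typescript
// Hypothetical helper: one call declares a union of phenomena plus the
// event type that wraps it, following the HistoryEvent pattern above.
function declareEventPair(
  prefix: string,      // e.g. "History"
  phenomena: string[], // member types of the union
  rootField: string    // Query field that returns the events
): string {
  const unionName = `${prefix}MeteorologicalPhenomenon`;
  const eventName = `${prefix}Event`;
  return [
    `union ${unionName} = ${phenomena.join(" | ")}`,
    `type ${eventName} implements Event {`,
    `  data: EventData!`,
    `  meteorologicalPhenomenon: ${unionName}!`,
    `}`,
    // Assumes a base Query type is declared elsewhere in the schema.
    `extend type Query {`,
    `  ${rootField}: [${eventName}!]!`,
    `}`,
  ].join("\n");
}

const historySdl = declareEventPair("History", ["Earthquake", "Hurricane"], "historyEvents");
const forecastSdl = declareEventPair("Forecast", ["Hurricane", "Tsunami"], "forecastEvents");
```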
Note that the GraphQL structure is independent of your storage structure. It's possible to have multiple GraphQL objects providing data from the same insert-your-language-here object, e.g. yielded by your DB driver. There may be a tiny overhead that I haven't benchmarked, but providing a cleaner API outweighs that to me. The basic idea is that resolver functions just have to resolve with the same source, so that the resolver functions related to another type will be called with the same source object.
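A sketch of that last point as an Apollo-style resolver map (plain objects keyed by type and field name; the `EventRow` shape, its `kind` column, and the sample values are assumptions). Both `HistoryEvent` fields resolve with the same database row, the nested types read from that row, and `__resolveType` picks the union member from it:

```typescript
// One flat row as a DB driver might return it (hypothetical columns).
interface EventRow {
  id: string;
  occurred: string;
  kind: "earthquake" | "hurricane";
  epicenter?: string;
  magnitude?: number;
  force?: number;
}

const eventResolvers = {
  HistoryEvent: {
    // Both fields hand the same row down; no copying or re-querying.
    data: (row: EventRow) => row,
    meteorologicalPhenomenon: (row: EventRow) => row,
  },
  EventData: {
    id: (row: EventRow) => row.id,
    occurred: (row: EventRow) => row.occurred,
  },
  HistoryMeteorologicalPhenomenon: {
    // Tells the server which union member this row represents.
    __resolveType: (row: EventRow) =>
      row.kind === "earthquake" ? "Earthquake" : "Hurricane",
  },
  Earthquake: {
    epicenter: (row: EventRow) => row.epicenter,
    magnitude: (row: EventRow) => row.magnitude,
  },
  Hurricane: {
    force: (row: EventRow) => row.force,
  },
};

// Illustrative sample data only.
const sampleRow: EventRow = {
  id: "e1",
  occurred: "2020-01-01T00:00:00Z",
  kind: "earthquake",
  epicenter: "Example Ridge",
  magnitude: 7,
};
```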

Perform graphQL query with result from another graphQL query [duplicate]

Hullo everyone,
This has been discussed a bit before, but it's one of those things where there is so much scattered discussion resulting in various proposed "hacks" that I'm having a hard time determining what I should do.
I would like to use the result of a query as an argument for another nested query.
query {
allStudents {
nodes {
courseAssessmentInfoByCourse(courseId: "2b0df865-d7c6-4c96-9f10-992cd409dedb") {
weightedMarkAverage
# getting result for specific course is easy enough
}
coursesByStudentCourseStudentIdAndCourseId {
nodes {
name
# would like to be able to do something like this
# to get a list of all the courses and their respective
# assessment infos
assessmentInfoByStudentId (studentId: student_node.studentId) {
weightedMarkAverage
}
}
}
}
}
}
Is there a way of doing this that is considered to be best practice?
Is there a standard way to do it built into GraphQL now?
Thanks for any help!
The only means to substitute values in a GraphQL document is through variables, and these must be declared in your operation definition and then included alongside your document as part of your request. There is no inherent way to reference previously resolved values within the same document.
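For reference, this is roughly what "declared in your operation definition and included alongside your document" looks like on the wire. The sketch assumes the common GraphQL-over-HTTP convention of a JSON POST body with `query`, `variables`, and an optional `operationName`; the operation name `CourseAverages` and the `$courseId` variable are illustrative. Chaining two dependent lookups this way means two requests, with the client copying a value from the first response into the second request's variables.

```typescript
// Shape of a typical GraphQL-over-HTTP JSON request body.
interface GraphQLRequestBody {
  query: string;
  variables: { [name: string]: unknown };
  operationName?: string;
}

// The variable is declared in the operation definition ($courseId: ID!)...
const COURSE_AVERAGES = `
  query CourseAverages($courseId: ID!) {
    allStudents {
      nodes {
        courseAssessmentInfoByCourse(courseId: $courseId) {
          weightedMarkAverage
        }
      }
    }
  }
`;

// ...and its value travels next to the document, not inside it.
function buildRequest(courseId: string): GraphQLRequestBody {
  return {
    query: COURSE_AVERAGES,
    variables: { courseId: courseId },
    operationName: "CourseAverages",
  };
}

const requestJson = JSON.stringify(buildRequest("2b0df865-d7c6-4c96-9f10-992cd409dedb"));
```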
If you get to a point where you think you need this functionality, it's generally a symptom of poor schema design in the first place. What follows are some suggestions for improving your schema, assuming you have control over that.
For example, minimally, you could eliminate the studentId argument on assessmentInfoByStudentId altogether. coursesByStudentCourseStudentIdAndCourseId is a field on the student node, so its resolver can already access the student's id. It can pass this information down to each course node, which can then be used by assessmentInfoByStudentId.
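A minimal in-memory sketch of that suggestion, with resolvers simplified to a single `parent` parameter (the tables, sample data, and the `studentId` property attached to each course are illustrative assumptions): the student's course resolver tags each course with the student's id, so `assessmentInfoByStudentId` no longer needs an argument.

```typescript
interface StudentRow { studentId: string }
interface CourseRow { courseId: string; name: string }
interface AssessmentRow { studentId: string; courseId: string; weightedMarkAverage: number }

// Hypothetical in-memory tables.
const enrollments: { studentId: string; courseId: string }[] = [
  { studentId: "s1", courseId: "c1" },
  { studentId: "s1", courseId: "c2" },
];
const courses: CourseRow[] = [
  { courseId: "c1", name: "Algebra" },
  { courseId: "c2", name: "Biology" },
];
const assessments: AssessmentRow[] = [
  { studentId: "s1", courseId: "c1", weightedMarkAverage: 81 },
  { studentId: "s1", courseId: "c2", weightedMarkAverage: 74 },
];

// A course as seen through a student: it carries the student's id along.
type CourseInStudentContext = CourseRow & { studentId: string };

const studentCourseResolvers = {
  Student: {
    coursesByStudentCourseStudentIdAndCourseId: (student: StudentRow): CourseInStudentContext[] =>
      enrollments
        .filter((e) => e.studentId === student.studentId)
        .map((e) => {
          const course = courses.filter((c) => c.courseId === e.courseId)[0];
          return { ...course, studentId: student.studentId };
        }),
  },
  Course: {
    // No studentId argument needed: it comes from the parent object.
    assessmentInfoByStudentId: (course: CourseInStudentContext): AssessmentRow | null =>
      assessments.filter(
        (a) => a.studentId === course.studentId && a.courseId === course.courseId
      )[0] || null,
  },
};
```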
That said, you're probably better off totally rethinking how you've got your connections set up. I don't know what your underlying storage layer looks like, or the shape your client needs the data to be in, so it's hard to make any specific recommendations. However, for the sake of example, let's assume we have three types: Course, Student and AssessmentInfo. A Course has many Students, a Student has many Courses, and an AssessmentInfo has a single Student and a single Course.
We might expose all three entities as root level queries:
query {
allStudents {
# fields
}
allCourses {
# fields
}
allAssessmentInfos {
# fields
}
}
Each node could have a connection to the other two types:
query {
allStudents {
courses {
edges {
node {
id
}
}
}
assessmentInfos {
edges {
node {
id
}
}
}
}
}
If we want to fetch all students, and for each student know what courses s/he is taking and his/her weighted mark average for that course, we can then write a query like:
query {
allStudents {
assessmentInfos {
edges {
node {
id
weightedMarkAverage
course {
id
name
}
}
}
}
}
}
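One possible set of resolvers behind that query, sketched with in-memory arrays (the record shapes and the `toConnection` helper that builds the `edges { node }` wrapper are assumptions): `AssessmentInfo` acts as a join record, so each node carries the mark average and resolves its `course` from a stored `courseId`.

```typescript
interface CourseRecord { id: string; name: string }
interface AssessmentInfoRecord { id: string; studentId: string; courseId: string; weightedMarkAverage: number }

// Hypothetical in-memory tables.
const courseTable: CourseRecord[] = [
  { id: "c1", name: "Algebra" },
  { id: "c2", name: "Biology" },
];
const assessmentInfoTable: AssessmentInfoRecord[] = [
  { id: "a1", studentId: "s1", courseId: "c1", weightedMarkAverage: 81 },
  { id: "a2", studentId: "s1", courseId: "c2", weightedMarkAverage: 74 },
  { id: "a3", studentId: "s2", courseId: "c1", weightedMarkAverage: 90 },
];

// Wraps a plain list in the edges { node } shape used by the queries above.
function toConnection<T>(items: T[]): { edges: { node: T }[] } {
  return { edges: items.map((item) => ({ node: item })) };
}

const joinResolvers = {
  Student: {
    assessmentInfos: (student: { id: string }) =>
      toConnection(assessmentInfoTable.filter((a) => a.studentId === student.id)),
  },
  AssessmentInfo: {
    // The join record knows its course id; resolve the full course from it.
    course: (info: AssessmentInfoRecord): CourseRecord | null =>
      courseTable.filter((c) => c.id === info.courseId)[0] || null,
  },
};
```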
Again, this exact schema might not work for your specific use case but it should give you an idea around how you can approach your problem from a different angle. A couple more tips when designing a schema:
Add filter arguments on connection fields, instead of creating separate fields for each scenario you need to cover. A single courses field on a Student type can have a variety of arguments like semester, campus or isPassing; this is cleaner and more flexible than creating different fields like coursesBySemester, coursesByCampus, etc.
If you're dealing with aggregate values like average, min, max, etc., it might make sense to expose those values as fields on each connection type, in the same way a count field is sometimes available alongside the nodes field. There's a proposal for Prisma (https://github.com/prisma/prisma/issues/1312) that illustrates one fairly neat way to handle these aggregate values. Doing something like this would mean that if you already have, for example, an Assessment type, a connection field might be sufficient to expose aggregate data about that type (like grade averages) without needing to expose a separate AssessmentInfo type.
Filtering is relatively straightforward; grouping is a bit tougher. If you do find that you need the nodes of a connection grouped by a particular field, this again may be best done by exposing an additional field on the connection itself, like Gatsby does (https://www.gatsbyjs.org/docs/graphql-reference/#group).
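A sketch of a single `courses` resolver handling any combination of filter arguments, assuming a schema field along the lines of `courses(semester: String, campus: String, isPassing: Boolean)` (the field shape and the `Enrollment` record are illustrative). An omitted or null argument simply means "don't filter on this":

```typescript
// Illustrative record for one of a student's courses.
interface Enrollment {
  courseName: string;
  semester: string;
  campus: string;
  isPassing: boolean;
}

// Optional filters; GraphQL may pass an omitted argument as absent or null.
interface CoursesArgs {
  semester?: string | null;
  campus?: string | null;
  isPassing?: boolean | null;
}

// One resolver for Student.courses covers every combination of filters,
// instead of separate coursesBySemester / coursesByCampus fields.
function resolveCourses(enrollmentsOfStudent: Enrollment[], args: CoursesArgs): Enrollment[] {
  return enrollmentsOfStudent.filter(
    (e) =>
      (args.semester == null || e.semester === args.semester) &&
      (args.campus == null || e.campus === args.campus) &&
      (args.isPassing == null || e.isPassing === args.isPassing)
  );
}

const studentEnrollments: Enrollment[] = [
  { courseName: "Algebra", semester: "2019-fall", campus: "north", isPassing: true },
  { courseName: "Biology", semester: "2019-fall", campus: "south", isPassing: false },
  { courseName: "Chemistry", semester: "2020-spring", campus: "north", isPassing: true },
];
```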
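A minimal sketch of aggregates living on the connection type, in the spirit of a `count` field (the `aggregate` field name, its `avg`/`min`/`max` members, and the `AssessmentGrade` shape are assumptions; this is not the Prisma proposal's actual API):

```typescript
interface AssessmentGrade { studentId: string; grade: number }

// Connection type carrying aggregates next to the nodes.
interface AssessmentConnection {
  nodes: AssessmentGrade[];
  count: number;
  aggregate: { avg: number | null; min: number | null; max: number | null };
}

// Builds the connection once; aggregates are null for an empty page.
function toAssessmentConnection(nodes: AssessmentGrade[]): AssessmentConnection {
  const grades = nodes.map((n) => n.grade);
  const empty = grades.length === 0;
  return {
    nodes: nodes,
    count: nodes.length,
    aggregate: {
      avg: empty ? null : grades.reduce((sum, g) => sum + g, 0) / grades.length,
      min: empty ? null : Math.min(...grades),
      max: empty ? null : Math.max(...grades),
    },
  };
}
```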
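A sketch of a grouping field on a connection, loosely modeled on the `fieldValue` / `totalCount` shape of Gatsby's `group` field (the `CourseNode` type and the `groupNodes` helper are illustrative assumptions):

```typescript
interface CourseNode { name: string; semester: string }

interface CourseGroup {
  fieldValue: string; // the shared value of the grouping field
  totalCount: number;
  nodes: CourseNode[];
}

// Groups a connection's nodes by one field, preserving first-seen order.
function groupNodes(nodes: CourseNode[], field: keyof CourseNode): CourseGroup[] {
  const groups: CourseGroup[] = [];
  const indexByValue: { [value: string]: number } = Object.create(null);
  for (const node of nodes) {
    const value = node[field];
    if (!(value in indexByValue)) {
      indexByValue[value] = groups.length;
      groups.push({ fieldValue: value, totalCount: 0, nodes: [] });
    }
    const group = groups[indexByValue[value]];
    group.nodes.push(node);
    group.totalCount++;
  }
  return groups;
}
```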

Architecture for avoiding repeated data in GraphQL

I have an application where the same data appears in many places in the graph, and I need to optimize the data queries to avoid processing and sending the same data over and over.
As an example consider the following pseudo schema:
type Group {
name: String
members: [Person]
}
type Person {
name: String
email: String
avatar: Avatar
follows: [Person]
followedBy: [Person]
contacts: [Person]
groups: [Group]
bookmarks: [Bookmark]
sentMessages: [Message]
receivedMessages: [Message]
}
type Message {
text: String
author: Person
recipients: [Person]
}
type Bookmark {
message: Message
}
Querying a user's data can easily return hundreds, if not thousands, of Person objects, even though the small circle of friends/contacts/follows contains only tens of distinct users.
In my real implementation about 80% of each GraphQL query result (in bytes) is redundant, and since the client makes many different queries over the same data, more than 90% of all data transferred and processed is redundant.
How could I improve the model so that I don't have to load the same data again and again without complicating the client too much?
I'm using Apollo for both GraphQL client and server.
Use/implement pagination (connections instead of plain arrays) for relations. This way you can query the count/total (and render it without processing an array) plus an array of ids only; usually there is then no need to query/join the person table in the DB at all.
Render the list of Person components (React?) passing only the id prop. Each rendered Person then fetches the details it actually consumes, served from the cache when already present; otherwise, use batching to merge the requests.
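A sketch of what such a relation field might resolve to, assuming a Relay-style connection with `totalCount`, id-only `edges`, and `pageInfo` (the `followRows` join table, the resolver name, and using the id itself as the cursor are simplifying assumptions). The page is answered from the join table alone, without touching the person table:

```typescript
// Hypothetical join table: who follows whom (ids only).
const followRows: { followerId: string; followeeId: string }[] = [
  { followerId: "p2", followeeId: "p1" },
  { followerId: "p3", followeeId: "p1" },
  { followerId: "p4", followeeId: "p1" },
  { followerId: "p1", followeeId: "p2" },
];

interface IdConnection {
  totalCount: number;
  edges: { node: { id: string } }[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Resolver for Person.followedBy(first, after): pages over the join table and
// returns ids only. Cursors are just ids here, which is a simplification.
function resolveFollowedBy(personId: string, first: number, after: string | null): IdConnection {
  const all = followRows.filter((r) => r.followeeId === personId).map((r) => r.followerId);
  const start = after === null ? 0 : all.indexOf(after) + 1;
  const page = all.slice(start, start + first);
  return {
    totalCount: all.length,
    edges: page.map((id) => ({ node: { id: id } })),
    pageInfo: {
      hasNextPage: start + first < all.length,
      endCursor: page.length > 0 ? page[page.length - 1] : null,
    },
  };
}
```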
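A minimal synchronous sketch of the "cache, otherwise batch" idea. `PersonBatchCache` and its `fetchBatch` callback are hypothetical, not the `dataloader` package's API; the real DataLoader collects loads asynchronously within one tick, while this version only shows the de-duplication and caching.

```typescript
interface PersonDetails { id: string; name: string }

// Ids requested together are de-duplicated and fetched in one call;
// ids already seen are served from the cache without another fetch.
class PersonBatchCache {
  private cache: { [id: string]: PersonDetails } = Object.create(null);
  batchCalls = 0;

  constructor(private fetchBatch: (ids: string[]) => PersonDetails[]) {}

  loadMany(ids: string[]): PersonDetails[] {
    const missing: string[] = [];
    for (const id of ids) {
      if (!(id in this.cache) && missing.indexOf(id) < 0) missing.push(id);
    }
    if (missing.length > 0) {
      this.batchCalls++;
      for (const person of this.fetchBatch(missing)) this.cache[person.id] = person;
    }
    return ids.map((id) => this.cache[id]);
  }
}

// Records each batch so the merging is visible; stands in for one DB/API call.
const fetchedIds: string[][] = [];
const loader = new PersonBatchCache((ids) => {
  fetchedIds.push(ids);
  return ids.map((id) => ({ id: id, name: `Person ${id}` }));
});
```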

Is it a bad practice to use an Input Type for a graphql Query?

I have seen that using an Input Type is recommended in the context of mutations, but nothing is said about queries.
For instance, the Learn tutorial only says:
This is particularly valuable in the case of mutations, where you might want to pass in a whole object to be created
I have this query:
type Query {
person(personID: ID!): Person
brazilianPerson(rg: ID!): BrazilianPerson
foreignerPerson(passport: ID!): ForeignerPerson
}
Instead of having a different type just because of the field names (rg, passport), or adding one more argument like type to the query, couldn't I just give Person a documentNr field and use an Input type like this?
input PersonInput {
documentNr : ID!
type: PersonType # Foreign or Brazilian; tells whether documentNr is an rg or a passport
}
PersonType is an enum, and with it I know whether the document is an rg or a passport.
No, there is nothing incorrect about your approach. The GraphQL spec allows any field to have an argument and allows any argument to accept an Input Object Type, regardless of the operation. In fact, the differences between a query and a mutation are largely symbolic.
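A sketch of the query side, assuming a field such as `person(input: PersonInput!): Person` (the question doesn't show this field, so its name and argument name are assumptions, as are the enum values `BRAZILIAN` and `FOREIGN` and the sample data). The input object simply arrives as an ordinary argument, exactly as it would in a mutation:

```typescript
// Assumed enum values; the question only says "Foreign or Brazilian".
type PersonType = "BRAZILIAN" | "FOREIGN";

interface PersonInput {
  documentNr: string;
  type: PersonType;
}

interface PersonRecord { name: string; rg?: string; passport?: string }

// Hypothetical data store.
const people: PersonRecord[] = [
  { name: "Maria", rg: "12.345.678-9" },
  { name: "John", passport: "X1234567" },
];

// Query.person resolver: the enum decides which document column to match.
function resolvePerson(_parent: unknown, args: { input: PersonInput }): PersonRecord | null {
  const documentNr = args.input.documentNr;
  const docType = args.input.type;
  const matches = people.filter((p) =>
    docType === "BRAZILIAN" ? p.rg === documentNr : p.passport === documentNr
  );
  return matches.length > 0 ? matches[0] : null;
}
```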
It's worth pointing out that any field can accept an argument, not just the ones at the root level. So if it suited your needs, you could easily set up a schema that would allow queries like:
query {
person(id: 1) {
powers(onlyMutant: true) {
name
}
}
}
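A sketch of resolvers for that query, in a simplified `(parent, args)` form (the `Hero`/`Power` shapes and sample data are illustrative): `powers` receives its own `onlyMutant` argument, independently of the `id` argument on the root `person` field.

```typescript
interface Power { name: string; mutant: boolean }
interface Hero { id: number; powers: Power[] }

// Hypothetical data store.
const heroes: Hero[] = [
  { id: 1, powers: [{ name: "Telepathy", mutant: true }, { name: "Flight", mutant: false }] },
];

const heroResolvers = {
  Query: {
    person: (_parent: unknown, args: { id: number }): Hero | null =>
      heroes.filter((h) => h.id === args.id)[0] || null,
  },
  Person: {
    // Nested field with its own argument; omitted means "all powers".
    powers: (hero: Hero, args: { onlyMutant?: boolean }): Power[] =>
      args.onlyMutant ? hero.powers.filter((p) => p.mutant) : hero.powers,
  },
};
```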
