How to prevent clients from navigating my GraphQL schema?

Okay, so this is a theoretical question, so I don't have any code to show. Consider that I have GraphQL types like these:
type A {
  field1: String!
  field2: B!
}
type B {
  fieldB1: Int!
  fieldB2: C!
  aRef: A!
}
type C {
  fieldC1: Boolean!
  fieldC2: D!
  bRef: B!
}
So my question is: given a query that can enter my graph at some point, how do I prevent clients from traversing the graph to an arbitrary depth and asking for far more data than they should? Is there any way to restrict how many relationships a query can navigate? Can I control this on the server side? How do I stop queries that navigate from A to B to C to D and so forth without restriction?

The current GraphQL specification (which can be read at https://spec.graphql.org/) does not define any depth-limit feature.
However, many GraphQL server implementations support a depth limit to prevent cyclic or overly deep queries, and there are packages that add one (for example, graphql-depth-limit for Node.js).
The term you want to search for is "GraphQL depth limit".
A good read is the article "GraphQL Cyclic Queries and Depth Limiting":
Many GraphQL implementations provide a specific parameter that you just have to set to a given value so that queries that exceed this depth level are automatically ignored by the GraphQL engine, without even starting the evaluation.
There are several ways to proceed, depending on the GraphQL engine you use:
Apollo, Express GraphQL, GraphQL Node
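As a rough illustration of what such a depth limit does, here is a minimal, self-contained TypeScript sketch. It is not the API of graphql-depth-limit or any other package: QueryField is a hypothetical, simplified stand-in for a parsed selection set (a real validator walks the GraphQL AST and also has to follow fragments), the root field a is assumed, and both the counting convention and the limit are illustrative choices.

```typescript
// Hypothetical, simplified stand-in for a parsed selection set:
// each field may carry its own nested selections.
interface QueryField {
  name: string;
  selections?: QueryField[];
}

// Depth of a selection set: a leaf field counts as 1,
// a field with sub-selections counts as 1 + its deepest child.
function selectionDepth(fields: QueryField[]): number {
  let max = 0;
  for (const field of fields) {
    const childDepth = field.selections ? selectionDepth(field.selections) : 0;
    max = Math.max(max, 1 + childDepth);
  }
  return max;
}

// Reject the query up front, before any resolver runs.
function assertDepthWithin(fields: QueryField[], maxDepth: number): void {
  const depth = selectionDepth(fields);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${maxDepth}`);
  }
}

// { a { field2 { fieldB2 { bRef { aRef { field1 } } } } } }
// (assumes a hypothetical root field a: A)
const cyclicQuery: QueryField[] = [
  { name: "a", selections: [
    { name: "field2", selections: [
      { name: "fieldB2", selections: [
        { name: "bRef", selections: [
          { name: "aRef", selections: [{ name: "field1" }] },
        ] },
      ] },
    ] },
  ] },
];

console.log(selectionDepth(cyclicQuery)); // 6
try {
  assertDepthWithin(cyclicQuery, 5);
} catch (err) {
  console.log((err as Error).message); // Query depth 6 exceeds the limit of 5
}
```

Because the check only inspects the shape of the query, it can run before any data is fetched, which is the behaviour the quoted article describes.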

Related

Where are caller-dependent fields defined in GraphQL

I'm currently writing a POC GraphQL server. However, the client UI needs fields that are transient (not part of the DB model) and are computed or queried on the fly. For example, a Post can be liked, so I would like to put a flag isLiked: Boolean in GraphQL. Depending on the caller, this flag would be true if they have liked the Post, or false if not.
However, this doesn't feel right, as it is strictly speaking not part of the Post type and is a form of UI coupling (something we wanted to solve with GraphQL). I also have the feeling there could be a better way that provides the Like as a type (as it also has a date, for example). Would it be a good idea to have caller-dependent fields defined in the type, which are basically sub-queries?
GraphQL has some limitations, and I don't think it's wrong to add special-case fields to the schema to work around these limitations, particularly if they represent some query some actual application will make (and frequently).
Conversely, since a caller always specifies which specific fields they want out of an object, adding extra fields shouldn't really cost you anything in terms of performance or database queries. So: do both!
scalar DateTime
interface Node { id: ID! }
type Like implements Node {
  id: ID!
  post: Post!
  date: DateTime!
}
type Post implements Node {
  id: ID!
  title: String!
  date: DateTime!
  likes(limit: Int): [Like!]!
  hasLike: Boolean!
}
If you don't do this, then the client's only choice is to query for specific Like objects and pick some field out of them. If you add limit-type parameters to the field you can minimize the cost, but it still feels a little awkward:
query PostSummary($id: ID!) {
  node(id: $id) {
    ... on Post {
      title
      date
      likes(limit: 1) { id }
    }
  }
}
If this is a real use case for your application, just adding the hasLike field seems like a more reasonable API, even if it's somewhat "specialized to the UI".
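On the server, a caller-dependent field is just another resolver that reads the current user from the request context. A minimal TypeScript sketch, assuming hasLike means "the current viewer has liked this post" (as isLiked does in the question), with hypothetical in-memory records and a hypothetical ViewerContext standing in for a real database and auth layer. The resolver signature follows the (parent, args, context) convention common to JavaScript GraphQL servers.

```typescript
// Hypothetical records standing in for database rows.
interface LikeRecord { id: string; postId: string; userId: string; date: string; }
interface PostRecord { id: string; title: string; date: string; }

// Hypothetical per-request context: who is calling, plus data access.
interface ViewerContext { viewerId: string | null; likes: LikeRecord[]; }

const postResolvers = {
  // Caller-independent: the likes on this post, optionally capped by limit.
  likes(post: PostRecord, args: { limit?: number | null }, ctx: ViewerContext): LikeRecord[] {
    const all = ctx.likes.filter((like) => like.postId === post.id);
    return args.limit == null ? all : all.slice(0, args.limit);
  },
  // Caller-dependent: true only if the current viewer liked this post.
  hasLike(post: PostRecord, _args: unknown, ctx: ViewerContext): boolean {
    if (ctx.viewerId === null) return false; // anonymous callers have liked nothing
    return ctx.likes.some((like) => like.postId === post.id && like.userId === ctx.viewerId);
  },
};
```

Since a resolver only runs when the client selects its field, the hasLike lookup costs nothing for queries that don't ask for it.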
I agree with David's answer, but just to offer a different perspective, there's something to be said for keeping user-specific fields out of types that otherwise are not. There's another alternative available, and that's to move such fields into the user or viewer query that already returns the data specific to the logged-in user. For example, you could have
type User {
  id: ID!
  username: String!
  likedPosts: [Post!]!
  # or better yet
  likedPostIds: [ID!]!
}
Of course, the downside to this approach is your client has to be "smart" enough to use the above to then derive whether the post was liked, which adds complexity on the front end.
The upside is that if you log out or switch users, you only have to refetch that one query; you don't have to blow away your entire cache because it's peppered with user-specific data that now has to be refetched.
This kind of approach can also help performance. Any relational field, whether user-specific or not, will incur an additional cost. With this approach, your login query may be bloated and slower, but any subsequent queries will be that much faster. As your data grows, both in terms of breadth and depth, those performance gains can be significant.
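The client-side derivation this approach requires can stay small. A TypeScript sketch, assuming hypothetical result shapes for the viewer query (carrying likedPostIds) and for a list of posts fetched without any user-specific fields:

```typescript
// Hypothetical result shapes: the viewer query and a user-independent post list.
interface ViewerData { id: string; likedPostIds: string[]; }
interface PostSummary { id: string; title: string; }

// Combine the two on the client. With no logged-in viewer, nothing is liked.
function withLikedFlag(
  posts: PostSummary[],
  viewer: ViewerData | null,
): Array<PostSummary & { isLiked: boolean }> {
  // A Set makes each membership check constant-time instead of scanning the array.
  const liked = new Set(viewer ? viewer.likedPostIds : []);
  return posts.map((post) => ({ ...post, isLiked: liked.has(post.id) }));
}
```

On logout or a user switch, only the viewer query is refetched; the cached posts stay valid and withLikedFlag is simply run again.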

Subquery, for lack of a better term, when using an API written in GraphQL

I'm relatively new to GraphQL so please bear with me ...
That said, I'm writing an app in node.js to push/pull data from two disparate systems, one of which has an API written in GraphQL.
For the GraphQL system, I have something like the following types defined for me:
type Time {
  TimeId: Int
  TaskId: Int
  ProjectId: Int
  Project: [Project]
  TimeInSeconds: Int
  Timestamp: Date
}
and
type Task {
  TaskId: Int
  TaskName: String
  TaskDescription: String
}
Where Project is another type whose definition isn't important, only that it is included in the type definition as a field...
What I would like to know is whether there is a way to write a query for Time that includes the Task type's values in my results, in the same way that the Project values are included through the Project field.
I am using someone else's API and do not have the ability to define my own custom types. I can write my own limited queries, but I don't know if the limits are set by the devs that wrote the API or my limited ability with GraphQL.
My suspicion is that I cannot and that I will have to query both separately and combine them after the fact, but I wanted to check here just in case.
Unfortunately, unless the Time type exposes some kind of field to fetch the relevant Task, you won't be able to query for it within the same request. You can request multiple root fields (queries) in a single GraphQL request; however, they are resolved independently and may run in parallel, which means you can't use the TaskId value returned by one of them as a variable in another.
This sort of problem is best solved by modifying the schema, but if that's not an option then unfortunately the only other option is to make each request sequentially and then combine the results client-side.
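A sketch of the sequential approach in TypeScript: the first request fetches the Time entries, their distinct TaskId values become the variables of a second request for tasks, and the two results are joined on TaskId. What the second request looks like depends entirely on the API (it may or may not accept a list of IDs), so the network calls are left out; TimeEntry and TaskEntry are assumed shapes mirroring the types in the question.

```typescript
// Assumed shapes mirroring the question's Time and Task types (only the fields used here).
interface TimeEntry { TimeId: number; TaskId: number; TimeInSeconds: number; }
interface TaskEntry { TaskId: number; TaskName: string; TaskDescription: string; }

// After request 1: the distinct TaskIds to use as variables for request 2.
function distinctTaskIds(times: TimeEntry[]): number[] {
  return Array.from(new Set(times.map((t) => t.TaskId)));
}

// After request 2: attach each Time entry's Task, or null if it was not returned.
function attachTasks(
  times: TimeEntry[],
  tasks: TaskEntry[],
): Array<TimeEntry & { Task: TaskEntry | null }> {
  const tasksById = new Map<number, TaskEntry>();
  for (const task of tasks) tasksById.set(task.TaskId, task);
  return times.map((t) => ({ ...t, Task: tasksById.get(t.TaskId) ?? null }));
}
```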

GraphQL: same query with different arguments

Can the following be achieved with GraphQL?
We have getusers() / getusers(id=3) / getusers(name='John'). Can we use the same query to accept different parameters (arguments)?
I assume you mean something like:
type Query {
  getusers: [User]!
  getusers(id: ID!): User
  getusers(name: String!): User
}
IMHO the first thing to do is try. You should get an error saying that Query.getusers can only be defined once, which would answer your question right away.
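Here's the actual spec saying that such a thing is not valid: http://facebook.github.io/graphql/June2018/#example-5e409
Quote:
Each named operation definition must be unique within a document when referred to by its name.
Strictly speaking, that rule is about operation names within a query document. The rule that rejects the schema above is the Object type validation rule: each field must have a unique name within its Object type, so no two fields may share the same name.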
Solution
From what I've seen, the most GraphQL'y way to create such an API is to define a filter input type, something like this:
input UserFilter {
  ids: [ID]
  names: [String]
}
and then:
type Query {
  users(filter: UserFilter): [User!]!
}
The resolver would check what filters were passed (if any) and query the data accordingly.
This is very simple and yet really powerful as it allows the client to query for an arbitrary number of users using an arbitrary filter. As a back-end developer you may add more options to UserFilter later on, including some pagination options and other cool things, while keeping the old API intact. And, of course, it is up to you how flexible you want this API to be.
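A minimal TypeScript sketch of such a resolver, assuming a hypothetical in-memory usersData array in place of a database. Here an omitted or null filter field means "no constraint", and ids and names are combined with AND; both are design choices you might make differently.

```typescript
// Hypothetical user record; the in-memory array stands in for a database.
interface UserRecord { id: string; name: string; }

// Mirrors the UserFilter input type: every field is optional and may be null.
interface UserFilter { ids?: string[] | null; names?: string[] | null; }

const usersData: UserRecord[] = [
  { id: "1", name: "Alice" },
  { id: "2", name: "Bob" },
  { id: "3", name: "John" },
];

// Resolver for Query.users(filter: UserFilter): apply only the constraints that were passed.
function resolveUsers(_parent: unknown, args: { filter?: UserFilter | null }): UserRecord[] {
  const filter: UserFilter = args.filter ?? {};
  return usersData.filter((user) => {
    const idOk = !filter.ids || filter.ids.includes(user.id);
    const nameOk = !filter.names || filter.names.includes(user.name);
    return idOk && nameOk;
  });
}
```

The three calls from the question then become users, users(filter: { ids: ["3"] }) and users(filter: { names: ["John"] }), each with its own selection set.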
But why is it like that?
Warning! I am assuming some things here and there, and might be wrong.
GraphQL is only a logical API layer, which is supposed to be server-agnostic. However, I believe that the original implementation was in JavaScript (citation needed). If you then consider the technical aspects of implementing a GraphQL API in JS, you might get an idea about why it is the way it is.
Each query points to a resolver function. In JS, resolvers are simple functions stored inside plain objects at paths specified by the query/mutation/subscription name. As you may know, a JS object can't have two properties with the same name. This means that you could only define a single resolver for a given query name, so all three getusers would map to the same function Query.getusers(obj, args, ctx, info) anyway.
So even if GraphQL allowed for fields with the same name, the resolver would have to explicitly check for whatever arguments were passed, i.e. if (args.id) { ... } else if (args.name) { ... }, etc., thus partially defeating the point of having separate endpoints. On the other hand, there is an overall better (particularly from the client's perspective) way to define such an API, as demonstrated above.
Final note
GraphQL is conceptually different from REST, so it doesn't make sense to think in terms of three endpoints (/users, /users/:id and /users/:name), which is what I guess you were doing. A paradigm shift is required in order to unveil the full potential of the language.
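Using aliases, a request like this works, provided getusers is defined once with optional arguments (for example getusers(id: ID, name: String): [User]) and User has an id field:
query {
  first: getusers { id }
  second: getusers(id: 3) { id }
  third: getusers(name: "John") { id }
}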

What is the convention around derivative information?

I am working on a service that provides information about a few related entities, somewhat like a database. Suppose there are calls to retrieve information about a school:
service MySchool {
  rpc GetClassRoom (ClassRoomRequest) returns (ClassRoom);
  rpc GetStudent (StudentRequest) returns (Student);
}
Now, suppose I want to find out a classroom's information. I'd receive a proto that looks like this:
message ClassRoom {
  string id = 1;
  string address = 2;
  string teacher = 3;
}
Sometimes I also want to know all of the students of the classroom. I am struggling to think which is the better design pattern.
Option A) Add an extra rpc like so: rpc GetClassRoomStudents (ClassRoomRequest) returns (ClassRoomStudents), where ClassRoomStudents has a single field repeated Student students. This technique requires more than one call to get all the information that we want (and many if we wanted to know information for more than one classroom).
Option B) Add an extra repeated Student students field to the ClassRoom proto, and B') Fill it up only when necessary, or B") Fill it up whenever the server receives a GetClassRoom call. This may sometimes fetch extra information, or lead to ambiguity according to what fields are filled up.
I am not sure what the best / most conventional way of dealing with this is. How have some of you dealt with this?
There is no simple answer. It's a tradeoff between simplicity (option A) and performance (option B), and it depends on the situation which solution is best.
In general, I'd recommend going with the simple solution first, unless your measurements show that it leads to performance issues. At that point, it's easy to add repeated Student students to ClassRoom and a field bool fetch_students to ClassRoomRequest (in proto2 you can write it as bool fetch_students [default=false]; proto3 doesn't allow explicit defaults, but a bool already defaults to false). Then clients are free to continue using the simple API, or choose to upgrade to the more performant API if they need to.
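The upgrade path described above can be sketched in TypeScript. The interfaces are assumed, hand-written equivalents of the proto messages (with fetch_students spelled fetchStudents, as JavaScript protobuf tooling commonly does), and SchoolDb is a hypothetical stand-in for real storage; no gRPC library is involved.

```typescript
// Assumed TypeScript equivalents of the proto messages.
interface Student { id: string; name: string; }
interface ClassRoom { id: string; address: string; teacher: string; students: Student[]; }
interface ClassRoomRequest { id: string; fetchStudents?: boolean; } // proto: bool fetch_students

// Hypothetical storage: classroom rows and a classroom -> students index.
interface SchoolDb {
  rooms: Map<string, Omit<ClassRoom, "students">>;
  studentsByRoom: Map<string, Student[]>;
}

// GetClassRoom handler: the extra lookup only runs when the client opts in,
// so existing callers keep getting the same, cheaper response.
function getClassRoom(req: ClassRoomRequest, db: SchoolDb): ClassRoom {
  const room = db.rooms.get(req.id);
  if (room === undefined) throw new Error(`Unknown classroom ${req.id}`);
  const students = req.fetchStudents ? (db.studentsByRoom.get(req.id) ?? []) : [];
  return { ...room, students };
}
```

With the flag unset, the repeated field is simply empty, which in proto3 looks the same as a classroom with no students. That is the ambiguity option B mentions, so clients should only rely on students when they asked for it.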
Note that this isn't specific to gRPC; the same issue is seen in REST APIs, and basically almost any request/response model.

What is the point of naming queries and mutations in GraphQL?

Pardon the naive question, but I've looked all over for the answer and all I've found is either vague or makes no sense to me. Take this example from the GraphQL spec:
query getZuckProfile($devicePicSize: Int) {
  user(id: 4) {
    id
    name
    profilePic(size: $devicePicSize)
  }
}
What is the point of naming this query getZuckProfile? I've seen something about GraphQL documents containing multiple operations. Does naming queries affect the returned data somehow? I'd test this out myself, but I don't have a server and dataset I can easily play with to experiment. But it would be good if something in some document somewhere could clarify this; thus far all of the examples are super simple single queries, or are queries that are named but that don't explain why they are (other than "here's a cool thing you can do"). What benefits do I get from naming queries that I don't have when I send a single, anonymous query per request?
Also, regarding mutations, I see in the spec:
mutation setName {
  setName(name: "Zuck") {
    newName
  }
}
In this case, you're specifying setName twice. Why? I get that one of these is the field name of the mutation and is needed to match it to the back-end schema, but why not:
mutation {
setName(name: "Zuck") {
...
What benefit do I get from specifying the same name twice? I get that the first is likely arbitrary, but why isn't it noise? I have to be missing something obvious, but nothing I've found thus far has cleared it up for me.
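The operation name has no effect on the data the server returns. Its role in the spec is to choose which operation to run when a document contains more than one: in that case every operation must be named, and the request's operationName tells the server which one to execute (only one operation runs per request).
In fact, you can use the query shorthand (no query keyword and no name) if that's the only operation in the document and it doesn't declare any variables:
{
  user(id: 4) {
    id
    name
    profilePic(size: 200)
  }
}
The shorthand only works for a query. A mutation always needs the mutation keyword, but it can be anonymous (mutation { ... }) when it is the only operation in the document.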
EDIT:
As @orta notes below, the name could also be used by the server to identify a persisted query. However, this is not part of the GraphQL spec; it's a custom implementation on top.
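The operation name does take part in one spec-defined step: operation selection. The spec's GetOperation algorithm amounts to this: with no operationName, the document must contain exactly one operation; otherwise the operation with that name is chosen, and a missing name is an error. A TypeScript sketch of just that step, over a hypothetical simplified OperationDef rather than a real parsed document:

```typescript
// Hypothetical, simplified view of the operations in a parsed document.
interface OperationDef {
  name: string | null; // null for an anonymous operation
  kind: "query" | "mutation" | "subscription";
}

// Picks the single operation to execute for a request.
function getOperation(ops: OperationDef[], operationName: string | null): OperationDef {
  if (operationName === null) {
    if (ops.length === 1) return ops[0];
    throw new Error("operationName is required when the document has multiple operations");
  }
  const op = ops.find((o) => o.name === operationName);
  if (op === undefined) throw new Error(`No operation named "${operationName}"`);
  return op;
}
```

So a document holding both getZuckProfile and setName can be sent as-is, with operationName deciding which one runs; the response does not include the name either way.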
We use named queries so that they can be monitored consistently, and so that we can do persistent storage of a query. The duplication is there for query variables to fill the gaps.
As an example:
query getArtwork($id: String!) {
  artwork(id: $id) {
    title
  }
}
You can run it against the Artsy GraphQL API here
The advantage is that the query string is the same each time, rather than a different string per request, because the query variables are the only part that differs. This means you can build tools on top of those queries, because you can treat them as immutable.
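A minimal sketch of the persisted-query idea in TypeScript, assuming a hypothetical registry keyed by operation name (real implementations often key by a hash of the query text instead). The client sends only the name and the variables; the server looks up the fixed query text, so every getArtwork request runs the identical, known document, and the name doubles as a stable key for monitoring.

```typescript
// Hypothetical registry of query documents deployed ahead of time, keyed by operation name.
const persistedQueries = new Map<string, string>([
  ["getArtwork", "query getArtwork($id: String!) { artwork(id: $id) { title } }"],
]);

// What a client sends instead of the full query text.
interface PersistedRequest {
  operationName: string;
  variables: Record<string, unknown>;
}

// Per-operation request counts: a fixed name gives monitoring a stable key.
const requestCounts = new Map<string, number>();

function resolvePersisted(req: PersistedRequest): { query: string; variables: Record<string, unknown> } {
  const query = persistedQueries.get(req.operationName);
  if (query === undefined) throw new Error(`Unknown persisted query "${req.operationName}"`);
  requestCounts.set(req.operationName, (requestCounts.get(req.operationName) ?? 0) + 1);
  return { query, variables: req.variables };
}
```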
