Why are edges required in a Relay/GraphQL Connection?

In a Relay/GraphQL schema configuration, one-to-many relationships (with pagination) are specified as in the tutorial example:
type ShipConnection {
  edges: [ShipEdge]
  pageInfo: PageInfo!
}
type ShipEdge {
  cursor: String!
  node: Ship
}
However, the one-to-one connection made by ShipEdge seems redundant. Why can't we move the cursor to ShipConnection and store an array of Ship IDs as edges?
type ShipConnection {
  edges: [Ship]
  pageInfo: PageInfo!
  cursor: String!
}
What were the design decisions to require one extra object for every edge in a one-to-many relationship?

(Updated with more explanations)
There are 3 ways to represent an array of data in GraphQL:
1. List: Use when you have a finite list of associated objects that you're fine fetching all at once. In GraphQL SDL, this is represented as [Ship].
2. Nodes: Use when you need to paginate over a list, usually because there can be thousands of items. Note that this is not part of the Relay specification and as such is not supported by the Relay client (instead, you'd wrap the item in an edge as described in #3), but some other clients such as Apollo are more flexible and support this construct (though you need to provide more boilerplate). In GraphQL, this would be represented as type ShipConnection { nodes: [Ship], pageInfo: PageInfo! }.
3. Edges: Use when, in addition to pagination, you also need to provide extra information for each edge in the connection (read below for more details). In GraphQL, you'd write it as type ShipConnection { edges: [ShipEdge], pageInfo: PageInfo! }.
Note that your GraphQL server might support all three options for a specific association, and the client then selects which field they want. Here's how they'd all look together:
type Query {
  ships: [Ship] # option 1
  shipsConnection: ShipConnection
}
type ShipConnection {
  nodes: [Ship] # option 2
  edges: [ShipEdge] # option 3
  pageInfo: PageInfo!
}
type PageInfo {
  endCursor: String # page-based pagination
  hasNextPage: Boolean!
}
type ShipEdge {
  cursor: String! # edge-based pagination
  node: Ship
  # ... edge attributes
}
type Ship {
  # ... ship attributes
}
Lists (#1) should only ever be used when you know that the number of items won't grow (for example, if you have a Post, you may want to return tags as a List, but you shouldn't do that with comments). To decide between #2 and #3, there are two reasons for using edges over just plain nodes:
It's a place for edge-specific attributes. For example, if you have a User that belongs to many Groups, in a relational database you'd have a UserGroup table with user_id and group_id. This table can have additional attributes like role, joined_at etc. The GroupUserEdge would then be the place where you could access these attributes.
It's a place for the cursor. Relay, in addition to page-based pagination (using pageInfo), supports edge-based pagination. Why does Relay need a cursor for each edge? Because Relay intelligently merges data requirements from your entire app, it may already have a connection with the same parameters you're requesting but not enough records in it. To fetch the missing data, it can ask for data in the connection after some edge's cursor.
I understand this may be confusing, since databases have cursors too, and there is only one cursor per query. A Relay connection is not really a query; it is rather a set of parameters that identify a query. The cursor of a connection's edge is a set of parameters that identify a position within that connection. This is a higher level of abstraction than a plain query cursor (remember that edges need to be able to identify a position even in a connection that is not backed by a DB query, or whose source is hidden behind a third-party system). Because of this required flexibility, one cursor per connection would not be enough.
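To make the per-edge cursor concrete, here is a minimal sketch in Python of a connection built over an in-memory list. The names (paginate, encode_cursor, the offset-based cursor format) are assumptions for illustration and are not part of Relay or any GraphQL library; a real backend would encode whatever identifies a position in its own data source.

```python
import base64
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


def encode_cursor(offset: int) -> str:
    # Opaque token: clients pass it back unchanged and never parse it.
    return base64.b64encode(f"offset:{offset}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.b64decode(cursor).decode().split(":", 1)[1])


@dataclass
class Edge(Generic[T]):
    cursor: str
    node: T


@dataclass
class PageInfo:
    end_cursor: Optional[str]
    has_next_page: bool


@dataclass
class Connection(Generic[T]):
    edges: List[Edge[T]]
    page_info: PageInfo


def paginate(items: List[T], first: int, after: Optional[str] = None) -> Connection[T]:
    # Resume right after the position identified by the given edge cursor.
    start = decode_cursor(after) + 1 if after is not None else 0
    window = items[start:start + first]
    edges = [Edge(encode_cursor(start + i), node) for i, node in enumerate(window)]
    return Connection(
        edges=edges,
        page_info=PageInfo(
            end_cursor=edges[-1].cursor if edges else None,
            has_next_page=start + len(window) < len(items),
        ),
    )


ships = ["X-Wing", "Y-Wing", "A-Wing", "Millennium Falcon", "Home One"]
page = paginate(ships, first=3)
# Continue from the second edge rather than from the end of the page,
# which a single connection-level cursor could not express.
rest = paginate(ships, first=2, after=page.edges[1].cursor)
```

Because every edge carries its own cursor, a client holding only part of a page can resume from any edge it already has, not just from pageInfo's end cursor.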

The edges field provides you with a place to put per-edge data. For example, you might want to put a creator or priority field on there, describing who added the edge and how important the relationship is, respectively.
If you don't require this kind of flexibility (or the other features that you get with connections, such as pagination), you could use a simple GraphQLList type. See this answer for more on the difference between connections and lists.
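As a rough illustration of per-edge data, the following Python sketch attaches creator and priority to each edge rather than to the ship. The Link record and the ship_edges helper are hypothetical names invented for this example; the cursor is simply the ship id as a string.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Ship:
    id: int
    name: str


@dataclass(frozen=True)
class Link:
    # One stored relationship; creator and priority describe the
    # relationship itself, not the ship it points to.
    ship_id: int
    creator: str
    priority: int


@dataclass(frozen=True)
class ShipEdge:
    cursor: str
    node: Ship
    creator: str
    priority: int


def ship_edges(links: List[Link], ships: Dict[int, Ship]) -> List[ShipEdge]:
    # Highest priority first.
    ordered = sorted(links, key=lambda link: -link.priority)
    return [
        ShipEdge(str(link.ship_id), ships[link.ship_id], link.creator, link.priority)
        for link in ordered
    ]


ships = {1: Ship(1, "X-Wing"), 2: Ship(2, "Y-Wing")}
links = [Link(1, "leia", 1), Link(2, "han", 5)]
edges = ship_edges(links, ships)
```

The same Ship object could appear in another connection with a different creator and priority, which is why those fields do not belong on Ship itself.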

We've written a blog article about the differences between a simple GraphQL schema vs a Relay-specific schema:
https://www.prisma.io/blog/connections-edges-nodes-in-relay-758d358aa4c7

Related

GraphQL - limiting the number of subqueries to prevent Batching Attack

I want to understand if there is a mechanism to limit the number of subqueries within the GraphQL query to mitigate against the GraphQL batching attack. (It's possible to send more than one mutation request per HTTP request because of the GraphQL batching feature)
E.g.
mutation {
  first: changeTheNumber(newNumber: 1) {
    theNumber
  }
  second: changeTheNumber(newNumber: 1) {
    theNumber
  }
  third: changeTheNumber(newNumber: 1) {
    theNumber
  }
}
I'm using graphql-java-kickstart.
In graphql-java there are two instrumentations that can check the "depth" or "complexity" of your query:
MaxQueryDepthInstrumentation
MaxQueryComplexityInstrumentation
The first one checks the depth of a query (how many levels are requested); the second one counts the fields. You can configure the expected maximum depth/complexity, and if a query is deeper or more complex than your configured number, it is rejected.
You can customize the behaviour of the MaxQueryComplexityInstrumentation so that some fields count as "more complex" than others (for example, you could say a plain string field is less complex than a field that requires its own database request when processed).
Here is an example that uses a custom directive (Complexity) in a schema description to determine the complexity of a field.
If you only want to prevent a particular field from being requested more than once, you could write your own Instrumentation, or use the DataFetchingEnvironment in your resolver function to count the occurrences of that field in the current query (getSelectionSet() gives access to all fields contained in the current query).
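The three checks described above (depth, weighted complexity, and a per-field repeat limit) can be sketched independently of graphql-java. The Python below works on a hand-rolled Field tree, not on graphql-java's types; all names and the weighting scheme are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Field:
    name: str
    alias: Optional[str] = None
    children: List["Field"] = field(default_factory=list)


def depth(selection: List[Field]) -> int:
    # Number of nested selection levels.
    return 0 if not selection else 1 + max(depth(f.children) for f in selection)


def complexity(selection: List[Field], weights: Optional[Dict[str, int]] = None) -> int:
    # Each field costs its weight (default 1) plus the cost of its children.
    weights = weights or {}
    return sum(weights.get(f.name, 1) + complexity(f.children, weights) for f in selection)


def repeat_violations(selection: List[Field], limits: Dict[str, int]) -> List[str]:
    # Aliases do not hide repeats: counting is done on the field name.
    counts = Counter(f.name for f in selection)
    return [
        f"{name} requested {counts[name]} times (limit {limit})"
        for name, limit in limits.items()
        if counts[name] > limit
    ]


# The aliased request from the question, as a tree.
query = [
    Field("changeTheNumber", alias, [Field("theNumber")])
    for alias in ("first", "second", "third")
]
```

With this model, the aliased request has depth 2, so a depth limit alone would not reject it; the repeat check or a high weight on changeTheNumber would.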

How do I query nodes that are missing a child of a specific type?

I'm new to graphql, and trying to understand how I might fill this use case.
I have thousands of nodes of a specific type/schema.
Some of these nodes have children, some of them don't.
I'd like to query all the nodes, and return only the ones that don't have children.
This might get more specific in the future, where I'd like to query only nodes that don't have children of a specific type.
Is that even possible?
I've seen plenty of query examples that show how to select children nodes, or nested nodes + fields, or nodes with specific values. It's an easy thing with SQL, I'm just having trouble understanding how it's done with graphql.
Thoughts?
As Daniel Rearden said, there is no built-in way in GraphQL to filter or sort the results of a query. We have a few filters in our Gentics Mesh GraphQL API, but it is currently not possible to create a filter involving another list of items (children, in your case).
I've added your case to the issue on GitHub: https://github.com/gentics/mesh/issues/27
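Until a server-side filter exists, one fallback is to request each node together with its children and filter on the client. The Python sketch below assumes a hypothetical response shape in which each node is a dict with a "children" list and each child has a "type" key; it is not the Gentics Mesh API. Note that this still transfers every node, so it only suits result sets of manageable size.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]


def without_children(nodes: List[Node], child_type: Optional[str] = None) -> List[Node]:
    """Keep nodes with no children, or with no children of child_type if given."""
    result = []
    for node in nodes:
        children = node.get("children") or []
        if child_type is not None:
            children = [c for c in children if c.get("type") == child_type]
        if not children:
            result.append(node)
    return result


nodes = [
    {"id": "a", "children": []},
    {"id": "b", "children": [{"id": "b1", "type": "image"}]},
    {"id": "c", "children": [{"id": "c1", "type": "folder"}]},
    {"id": "d"},
]
childless = without_children(nodes)
no_images = without_children(nodes, child_type="image")
```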

GraphQL + Relay Connection Optimization

Using Relay + GraphQL (graphql-relay-js) connections and trying to determine the best way to optimize queries to the data source etc.
Everything is working, though it is inefficient when connection results are sliced. In the query example below, the resolver on item will obtain 200+ records for sale 727506341339, when in reality we only need 1 to be returned.
I should note that in order to fulfill this request we actually make two DB queries:
1. Obtain all item IDs associated with a sale.
2. Obtain item data for each item ID.
From testing and reviewing the graphql-relay-js source, it looks like the slice happens in the final connection resolver.
Is there a method provided, short of nesting connections or mutating the sliced results of connectionFromArray, that would allow us to slice the results provided to the connection (item IDs) and then, in the connection resolver, fetch the item details against the already sliced ID result set? This would optimize the second query so we would only need to query for one item's details, not all items'.
Obviously we can implement something custom or nest connections, but this seems like something that would be available out of the box, so I feel like I am missing something here.
Example Query:
query ItemBySaleQuery {
  viewer {
    item (sale: 727506341339) {
      items (first: 1) {
        edges {
          node {
            dateDisplay,
            title
          }
        }
      }
    }
  }
}
Unfortunately the solution is not documented in the graphql-relay-js lib...
Connections can use resolveNode functions to work directly on an edge node. Example: https://github.com/graphql/graphql-relay-js/blob/997e06993ed04bfc38ef4809a645d12c27c321b8/src/connection/tests/connection.js#L64
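The underlying idea (slice the cheap ID list first, then load details only for the IDs that survive the slice) can be sketched without graphql-relay-js. The Python below uses in-memory stand-ins for the two DB queries; fetch_item_ids, fetch_items and items_connection are hypothetical names, and the index-based cursor is an assumption.

```python
from typing import Dict, List

ITEM_IDS = {727506341339: [f"item-{n}" for n in range(200)]}
fetched: List[str] = []  # records which item details were actually loaded


def fetch_item_ids(sale_id: int) -> List[str]:
    # Stand-in for DB query 1: IDs only.
    return ITEM_IDS[sale_id]


def fetch_items(ids: List[str]) -> List[Dict[str, str]]:
    # Stand-in for DB query 2: full item details for the given IDs.
    fetched.extend(ids)
    return [{"id": i, "title": f"Title of {i}"} for i in ids]


def items_connection(sale_id: int, first: int) -> dict:
    ids = fetch_item_ids(sale_id)
    page_ids = ids[:first]  # slice before touching item details
    by_id = {item["id"]: item for item in fetch_items(page_ids)}
    edges = [{"cursor": str(pos), "node": by_id[i]} for pos, i in enumerate(page_ids)]
    return {"edges": edges, "pageInfo": {"hasNextPage": len(ids) > len(page_ids)}}


connection = items_connection(727506341339, first=1)
```

Here the second query loads a single item even though the sale has 200 IDs, which is the effect the question is after.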

Elastic Search: Modelling data containing variable fields

I need to store data that can be represented in JSON as follows:
Article {
  Id: 1,
  Category: "History",
  Title: "War stories",
  // Comments could be pretty long and also be changed frequently
  Comments: "Nice narration, Reminds me of the difficult Times, Tough Decisions",
  Tags: "truth, reality, history", // Might change frequently
  UserSpecifiedNotes: [
    // The array may contain different users for different articles
    {
      userid: 20,
      note: "Good for work"
    },
    {
      userid: 22,
      note: "Homework is due for work"
    }
  ]
}
From the articles I have read, denormalization is one way to handle this data. But since the common fields can be pretty long and may change frequently, I would rather not repeat them. What other, better ways are there to represent and search this data? Parent-child? Inner object?
Currently, I am dealing with a lot of inserts and updates and few searches. But whenever a search is done, it has to be very fast. I am using NEST (the .NET client) for Elasticsearch. The search query is expected to work as follows:
Input: searchString and a userID
Behavior: Return the articles containing searchString in either Title, Comments, Tags, or the note for the given userID, sorted in order of relevance
In a normal scenario the main contents of the article will be changed very rarely, whereas the "UserSpecifiedNotes"/comments against an article will be generated/added more frequently. This is an ideal use case for implementing a parent-child relation.
With an inner object you still have to reindex all of the "main article" and "UserSpecifiedNotes"/comments every time a new note comes in. With a parent-child relation you will just be adding a new note.
With the details you have specified you can take the approach of 4 indices
Main Article (id, category, title, description etc)
Comments (commented by, comment text etc)
Tags (tags, any other meta tag)
UserSpecifiedNotes (userId, notes)
Having said that, what needs to be kept in mind is your actual requirement. A parent-child relation will need more memory and may slow down search performance a tiny bit, but indexing will be faster.
On the other hand, a nested object will increase your indexing time significantly, as you need to collect all the data related to an article before indexing. You can of course store everything and just add new data as an update. For simpler maintenance and ease of implementation, I would suggest using parent-child.
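As a hedged sketch of what the parent-child approach could look like, the Python below builds the request bodies as plain dicts, assuming Elasticsearch 7 or later, where parent-child is expressed with a join field inside a single index (older versions used a _parent mapping instead). The field names, the relation names and the single article/note relation are assumptions, not taken from the question's setup; child documents must also be indexed with routing set to the parent's ID.

```python
from typing import Any, Dict

# Index mapping: one join field links article documents to note documents.
mapping: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "comments": {"type": "text"},
            "tags": {"type": "text"},
            "note": {"type": "text"},
            "userid": {"type": "integer"},
            "relation": {"type": "join", "relations": {"article": "user_note"}},
        }
    }
}


def note_doc(article_id: int, userid: int, note: str) -> Dict[str, Any]:
    # Adding a note indexes only this small child document.
    return {
        "userid": userid,
        "note": note,
        "relation": {"name": "user_note", "parent": str(article_id)},
    }


def search_body(search_string: str, userid: int) -> Dict[str, Any]:
    # Match the article's own fields, or a note written by the given user.
    note_query = {
        "bool": {
            "must": [{"match": {"note": search_string}}],
            "filter": [{"term": {"userid": userid}}],
        }
    }
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": search_string,
                                     "fields": ["title", "comments", "tags"]}},
                    {"has_child": {"type": "user_note", "score_mode": "max",
                                   "query": note_query}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


doc = note_doc(1, 20, "Good for work")
body = search_body("work", 20)
```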

Is there an efficient index persistent data structure with multiple indexes

I am looking for an efficient indexed persistent data structure. I typically work in .NET and am aware of F#'s Map; however, that implementation and most others I am aware of only provide a single 'index', the left side of the mapping.
Basically, here is the scenario:
public class MyObject
{
    public int Id { get; }
    public int GroupId { get; }
    public string Name { get; }
}
The Id of an object will be globally unique across the set of items added. GroupId may have duplicate values, and I want to be able to query for all values with a matching GroupId. Within a GroupId, names will be unique, but they may be duplicated across different GroupIds. This is not a situation where I can simply create a composite key of the 3 fields, as I need independent access to groups of the items based on particular field values.
I can do this, and have in the past, using dictionaries of dictionaries, which has been recommended in other posts here on Stack Overflow. However, I also want the data structure to be
1) Fully persistent, and everything that means
2) Efficient in memory, meaning that versions need to share as many nodes as possible
3) Efficient in modifications: I would like it to be fast
I realize that I am asking for quite a bit here but I wanted to ask to avoid even trying to re-invent the wheel if it has already been done.
Thanks
I am not sure why elsewhere, and in existing replies to your question, people recommend to imbricate existing structures. Imbricating structures (maps of maps, maps of lists, dictionaries of dictionaries, ...) only works for two indexes if one is looser than the other (two values having the same index for Index1 implies these two values have the same index for Index2), which is an unnecessary constraint.
I would use a record of maps, as many of them as you want different indexes, and I would maintain the invariant that every value that is present in a map is present in all the others in the same record. Adding a value obviously requires adding it to all maps in the record. Similarly for removal. The invariant can be made impossible to transgress from the outside through encapsulation.
If you worry that the values stored in your data structure would be duplicated, don't. Each map would only contain a pointer. They would all point to the same single representation of the value. Sharing will be as good as it already is with simple single-indexed maps.
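A minimal Python sketch of the record-of-maps idea follows. One caveat up front: it uses copied dicts as a stand-in for a persistent map, so every update is O(n) and only the stored values are shared between versions, not map nodes; a real implementation would swap in a structurally shared map such as F#'s Map. The MultiIndex name and its three indexes are assumptions chosen to fit the question's MyObject.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Mapping, Tuple


@dataclass(frozen=True)
class MyObject:
    id: int
    group_id: int
    name: str


@dataclass(frozen=True)
class MultiIndex:
    # Invariant: every stored object appears in all three maps.
    by_id: Mapping[int, MyObject] = field(default_factory=dict)
    by_group: Mapping[int, FrozenSet[MyObject]] = field(default_factory=dict)
    by_group_name: Mapping[Tuple[int, str], MyObject] = field(default_factory=dict)

    def add(self, obj: MyObject) -> "MultiIndex":
        if obj.id in self.by_id or (obj.group_id, obj.name) in self.by_group_name:
            raise ValueError("duplicate key")
        group = self.by_group.get(obj.group_id, frozenset()) | {obj}
        # New maps are built; the old version is left untouched.
        return MultiIndex(
            {**self.by_id, obj.id: obj},
            {**self.by_group, obj.group_id: group},
            {**self.by_group_name, (obj.group_id, obj.name): obj},
        )

    def remove(self, obj_id: int) -> "MultiIndex":
        obj = self.by_id[obj_id]
        group = self.by_group[obj.group_id] - {obj}
        by_group = {k: v for k, v in self.by_group.items() if k != obj.group_id}
        if group:
            by_group[obj.group_id] = group
        key = (obj.group_id, obj.name)
        return MultiIndex(
            {k: v for k, v in self.by_id.items() if k != obj_id},
            by_group,
            {k: v for k, v in self.by_group_name.items() if k != key},
        )


a, b = MyObject(1, 10, "alpha"), MyObject(2, 10, "beta")
v0 = MultiIndex()
v1 = v0.add(a)
v2 = v1.add(b)
v3 = v2.remove(1)
```

Since add and remove are the only ways to produce a new version, the invariant cannot be broken from outside, and each map holds a reference to the same MyObject rather than a copy.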
Just as you could use a Dictionary of Dictionaries, I expect that e.g. an F# Map of Maps may be what you want, e.g.
Map<int, Map<string, MyObject> > // int is groupid, string is name
maybe? I am unclear if you also need fast access by integer id.
You might also check out Clojure's library; I don't know much about Clojure, but a range of efficient persistent data structures seems to be one of Clojure's strengths.
It seems that you are trying to apply OOP principles to your FP application.
If you think in terms of functions, what is it you are trying to do?
If you use a List, for example, you can just tell it you want to pull all the objects that have a certain group value.
If you need fast access by group you could have a Map of Lists so you can pull up all the objects in a group.
There are different data structures and many functions that work on each, but you should first think about your problem from a functional, not object-oriented, POV.
