Is there a good way to combine resolvers for multiple GraphQL objects, preferably using graphql-js? - graphql

Suppose you have a GraphQL layer, written on node.js using graphql-js, that communicates with a SQL database. Suppose you have the following simple types and fields:
Store
A single brick-and-mortar location for a chain of grocery stores.
Fields:
id: GraphQLID
region: StoreRegion
employees: GraphQLList(Employee)
StoreRegion
A GraphQLEnumType containing the list of regions into which the chain divides its stores.
Values:
NORTHEAST
MIDATLANTIC
SOUTHEAST
...
Employee
Represents a single employee working at a store.
Fields:
id: GraphQLID
name: GraphQLString
salary: GraphQLFloat
Suppose the API exposes a store query that accepts a Region and returns a list of Store objects. Now suppose the client sends this query:
{
  store(region: NORTHEAST) {
    employees {
      name
      salary
    }
  }
}
Hopefully this is all pretty straightforward so far.
So here's my question, and I hope (expect, really) that it's something that has a common solution and I'm just having trouble finding it because my Google-Fu is weak today: is there a good way I can write the resolvers for these types such that all the requested fields for all the employees from all the returned stores are wrapped up into a single SQL statement, resulting in one round-trip to the database of the form:
SELECT name,salary FROM employees WHERE id IN (1111, 1133, 2177, ...)
rather than making one request per employee or even one request per store?
This is really a concrete instance of a more general question: is there a good way to combine resolvers to avoid making multiple requests in cases where they could be easily combined?
I'm asking this question in terms of graphql-js because that's what I'm hoping to work with, and since I figure that would allow for more specific answers, but if there's a more implementation-agnostic answer, that would be cool too.

So, basically you are wondering how you can combine multiple resolvers into fewer database queries. This is what's known as the N+1 query problem. Here are at least two ways you can solve it.
DataLoader: This is a more general solution, created by Facebook. You can use it to batch multiple requests that each query a single item of a single type into one query that fetches multiple items of that type. In your example you would batch all the employee lookups into a single DB query, and you would still have a separate query for getting the store. Here's a video by Ben Awad that explains DataLoader.
JoinMonster: Specifically for SQL. It will do JOINs to produce one SQL query per GraphQL query. Here's a video by Ben Awad explaining JoinMonster.
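The batching idea behind DataLoader can be sketched in a few lines with no external dependencies: collect every key requested during the current tick, then issue one query for all of them. (This is a simplified model; the real `dataloader` package adds per-request caching and the `load`/`loadMany` API on top. The SQL string and result shape below are illustrative.)

```javascript
// Minimal batching loader: queue keys, flush once per tick.
function makeBatchLoader(batchFn) {
  let queue = [];
  let scheduled = false;
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (!scheduled) {
        scheduled = true;
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          scheduled = false;
          try {
            // One round-trip for every key gathered in this tick.
            const results = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(results[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}

// Hypothetical batch function: one SQL statement for all collected IDs.
// (Recording the SQL here instead of running it, to keep the sketch standalone.)
const issuedQueries = [];
const loadEmployee = makeBatchLoader(async (ids) => {
  issuedQueries.push(`SELECT name,salary FROM employees WHERE id IN (${ids.join(', ')})`);
  // A real implementation would run the query and map rows back to ids in order.
  return ids.map((id) => ({ id, name: `employee-${id}`, salary: 0 }));
});
```

Each Employee resolver can now call `loadEmployee(id)`; every call made while resolving one level of the query collapses into a single `SELECT ... WHERE id IN (...)`, which is exactly the effect the question asks for.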

Related

best practice for very simple relation on a nosql table

I am using a DynamoDB table with a GraphQL API to store posts. I want a user to be able to mark certain posts as favorites.
I have thought about creating a relation table of user to post, but I also thought about just adding an array of userIds to the post object, containing the ids of all users who have favorited that post.
My understanding is that a UUID is 16 bytes, so even if, say, 10,000 users favorite the post, that array will be 160 KB. Not insignificant, but manageable to pass along with the object each time it is loaded.
Just wondering what the best practice is for this scenario. I'm pretty new to NoSQL.
With dynamoDB you have to think about access patterns first:
To get the favorite posts of a user, store a postsIds array in the user table
To get the users who like a post, store a likerIds array in the post table
To get a bidirectional link, do both of the above
Please also keep in mind that:
You can select fields when getting a document (only select the fields you are interested in)
I don't see a scenario where you would load 10k usernames and display them
The above solution looks pretty good for common scenarios.
More advanced solution:
There could be a more powerful way to do that using range keys. For instance:
Hash key (postID) | Range key (likerID) | title       | ...
post1             |                     | MyFancyPost |
post1             | user1               |             |
post1             | user2               |             |
This structure is more powerful and can store a lot of connections without any "big" field in the post model:
you can easily paginate and count the list of likers
it can handle many more likers for a single post
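The access pattern above can be sketched in miniature, with a plain array standing in for the DynamoDB table (a real Query on just the hash key postID would return all of these items in one call):

```javascript
// In-memory model of the hash + range key layout above. The item with no
// likerID is the post itself; each like is its own item under the same hash key.
const table = [
  { postID: 'post1', title: 'MyFancyPost' },
  { postID: 'post1', likerID: 'user1' },
  { postID: 'post1', likerID: 'user2' },
];

// "Who likes post1?" - filter on the hash key, keep only the like items.
const likers = table
  .filter((item) => item.postID === 'post1' && item.likerID)
  .map((item) => item.likerID);
```

Because the like records sort under the post's hash key, counting and paginating likers falls out of the key layout rather than requiring a large array field on the post.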

Does GraphQL ever redundantly visit fields during execution?

I was reading this article and it used the following query:
{
  getAuthor(id: 5) {
    name
    posts {
      title
      author {
        name # this will be the same as the name above
      }
    }
  }
}
Which was parsed and turned into an AST like the one below:
Clearly it is bringing back redundant information (the author's name is asked for twice), so I was wondering how GraphQL handles that. Does it redundantly fetch that information? Is the diagram a proper depiction of the actual AST?
Any insight into the query parsing and execution process relevant to this would be appreciated, thanks.
Edit: I know this may vary depending on the actual implementation of the GraphQl server, but I was wondering what the standard / best practice was.
Yes, GraphQL may fetch the same information multiple times in this scenario. GraphQL does not memoize the resolver function, so even if it is called with the same arguments and the same parent value, it will still run again.
This is a fairly common problem when working with databases in GraphQL. The most common solution is to utilize DataLoader, which not only batches your database requests, but also provides a cache for those requests for the duration of the GraphQL request. This way, even if a particular record is requested multiple times, it will only be fetched from the database once.
The alternative (albeit more complicated) approach is to compose a single database query, executed at the root level, based on the requested fields. For example, our resolver for getAuthor could construct a single query that returns the author, their posts, and each post's author. With this approach, we can skip writing resolvers for the posts field on the Author type or the author field on the Post type and just rely on the default resolver behavior. However, in order to do this and avoid overfetching, we have to parse the GraphQL request inside the getAuthor resolver to determine which fields were requested and should therefore be included in our database query.
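That parsing step works off the resolver's fourth argument (the GraphQLResolveInfo object in graphql-js), which carries the AST of the current selection. A sketch that pulls out the top-level field names, ignoring fragments and aliases for brevity:

```javascript
// Extract the names of the fields requested directly under the resolved
// field, e.g. ['name', 'posts'] for the getAuthor query above.
// Fragments, aliases and nested selections are ignored in this sketch.
function requestedFields(info) {
  const selections = info.fieldNodes[0].selectionSet.selections;
  return selections.map((selection) => selection.name.value);
}

// A trimmed-down stand-in for the info object graphql-js would pass in:
const fakeInfo = {
  fieldNodes: [{
    selectionSet: {
      selections: [
        { name: { value: 'name' } },
        { name: { value: 'posts' } },
      ],
    },
  }],
};
```

The returned list can then drive the column list of the single root-level database query.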

GraphQL: Can you mutate the results of a query?

In writing this question I realised that there is something very specific I want to be able to do in GraphQL, and I can't see a good way of implementing it. The idea is this:
One of the nice things about GraphQL is that it allows you to make flexible queries. For example, if I want to find all the comments on all the posts of each user in a particular forum then I can make the query
query {
  findForum(id: 7) {
    users {
      posts {
        comments {
          content
        }
      }
    }
  }
}
which is great. Often, you want to collect data with the intention of mutating it. So in this case, maybe I don't want to fetch all of those comments, and instead I want to delete them. A naive suggestion is to implement a deleteComment field on the comment type, which mutates the object it is called on. This is bad because the request is tagged as a query, so it should not mutate data.
Since we're mutating data, we should definitely tag this as a mutation. But then we lose the ability to make the query we wanted to make, because findForum is a query field, not a mutation field. A way around this might be to redefine all the query fields you need inside the mutation type. This is obviously not a good idea, because you repeat a lot of code, and also make the functionality for query a strict subset of that of mutation.
Now, what I regard as the 'conventional' solution is to make a mutation field which does this job and nothing else. So you define a mutation field deleteAllUserPostCommentsByForum which takes an argument, and implement it in the obvious way. But now you've lost the flexibility! If you decide instead that you want to find the user explicitly and delete all their posts, or if you only want to delete some of their posts, you need a whole new mutation field. This feels like precisely the sort of thing I thought GraphQL was useful for when compared to REST.
So, is there a good way to avoid these problems simultaneously?
Under the hood, the only real difference between queries and mutations is that if a single operation includes multiple mutations, they are resolved sequentially (one at a time) rather than concurrently. Queries, and all other fields, are resolved concurrently. That means for an operation like this:
mutation myOperation {
  editComment(id: 1, body: "Hello!")
  deleteComment(id: 1)
}
The editComment mutation will resolve before the deleteComment mutation. If these operations were queries, they would both be run at the same time. Likewise, consider a mutation that returns an object, like this:
mutation myOperation {
  deleteComment(id: 1) {
    id
    name
  }
}
In this case, the id and name fields are also resolved at the same time (because, even though they are returned as part of a mutation, the fields themselves are not mutations).
This difference in behavior between queries and mutations highlights why by convention we define a single mutation per operation and avoid "nesting" mutations like your question suggests.
The key to making your mutations more flexible lies in how you pass inputs to your mutation and, subsequently, how you handle those inputs inside your resolver. Instead of making a deleteAllUserPostCommentsByForum mutation, just make a deleteComments mutation that accepts a more robust input type, for example:
input DeleteCommentsInput {
  forumId: ID
  userId: ID
}
Your resolver then just needs to handle whatever combination of input fields is passed in. If you're using a DB, this sort of input translates very easily to a WHERE clause. If you realize you need additional functionality, for example deleting comments before or after a certain date, you can add those fields to your input type and modify your resolver accordingly -- no need to create a new mutation.
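A sketch of that resolver logic, building a parameterized statement from whichever input fields were provided (the table and column names are assumptions):

```javascript
// Translate a DeleteCommentsInput object into one parameterized DELETE.
function buildDeleteStatement(input) {
  const clauses = [];
  const params = [];
  if (input.forumId != null) {
    clauses.push('forum_id = ?');
    params.push(input.forumId);
  }
  if (input.userId != null) {
    clauses.push('user_id = ?');
    params.push(input.userId);
  }
  if (clauses.length === 0) {
    // Guard against an empty input wiping the whole table.
    throw new Error('DeleteCommentsInput must set at least one field');
  }
  return { sql: `DELETE FROM comments WHERE ${clauses.join(' AND ')}`, params };
}
```

Adding a new filter (say, a before-date) is then just one more `if` block, not a new mutation.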
You can actually handle creates and edits similarly and keep things a little DRY-er. For example, your schema could look like this:
type Mutation {
  createOrUpdateComment(comment: CommentInput): Comment
}

input CommentInput {
  id: ID
  userId: ID
  body: String
}
Your resolver can then check whether an ID was included -- if so, it treats the operation as an update; otherwise it treats it as an insert. Of course, using non-nulls in this case can get tricky (userId might be needed for a create but not an update), so there's something to be said for having separate input types for each kind of operation. However, hopefully this still illustrates how you can leverage input types to make your mutations more flexible.
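That branching can be sketched like this, with a stub data layer standing in for a real database (the `db` helper names are hypothetical):

```javascript
// Stub data layer (assumption) so the sketch is self-contained.
const db = {
  updateComment: (id, fields) => ({ ...fields, id, op: 'update' }),
  insertComment: (fields) => ({ ...fields, id: 101, op: 'insert' }),
};

// One resolver handling both cases: an id means update, no id means insert.
const createOrUpdateComment = (_, { comment }) =>
  comment.id != null
    ? db.updateComment(comment.id, comment)
    : db.insertComment(comment);
```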
IMHO you lose sight of many indirect aspects.
Trying to create a 'flexible' query can result in highly unoptimized server actions.
Queries are resolved structurally, level by level, which may mean processing a lot of unnecessary data (high memory usage). That can't be optimized at the lower layers (e.g. the SQL server) - it results in a naive implementation that fires many separate SQL queries instead of one more complex query with conditions.
In this case, for example, the server doesn't need all the users at all, since a user's post/comment usually contains a user_id (and forum/thread/post ids) field - the deletion can be processed directly on one table (with joined posts). You don't need the whole structure to affect only some of its elements.
The real power and flexibility of GraphQL live in the resolvers.
Notice that deleting all comments versus only some of them can be implemented completely differently. A resolver can choose the better way (driven by parameters, as Daniel wrote), but for simplicity (readability of the API) it would be better to have separate mutations.

Search/retrieve by a large OR query clause with Solr or Elasticsearch

I have a search database of car models: "Nissan GTR", "Hyundai Elantra", "Honda Accord", etc...
Now I also have a user list and the types of cars they like:
user1 likes: carId:1234, carId:5678, etc...
Given user1, I would like to return all the cars he likes; it can be anywhere from 0 to even hundreds.
What's the best way to model this in Solr, or potentially another "nosql" system that can help with this problem?
I'm using Solr, but I have the opportunity to use another system if it makes sense.
EDIT:
The Solr solution is too slow for joins (maybe we can try nested documents), and the current MySQL solution, which uses join tables, has over 2 billion rows.
So, you just want to store a mapping between User->Cars and retrieve the cars based on the user... sounds very simple:
Your docs are Users: they contain an id (indexed) and other fields
one of the fields is 'carsliked', multivalued, which contains the set of car ids the user likes
you have the details about each car in a different collection, for example
given a user id, you retrieve the 'carsliked' field and get the car details with a cross-collection join
You could also use nested objects to store each liked car (with all the info about it) inside each user, but that is a bit more complex. As a plus, you don't need the join at query time.
Solr would allow you many more things - for example, given a car, which users like it? Elasticsearch will work exactly the same way (and probably many other tools, given how simple your use case seems).
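For the lookup itself, the retrieved 'carsliked' IDs can be folded into one request against the cars collection; Solr's {!terms} query parser matches any of a list of values in a single query. A small sketch (the field name `id` and the collection layout are assumptions):

```javascript
// Build the q parameter for one Solr request that fetches every liked car,
// instead of issuing one request per car id.
function carsLikedQuery(carIds) {
  return `{!terms f=id}${carIds.join(',')}`;
}
// e.g. sent as /solr/cars/select?q={!terms f=id}1234,5678
```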

Is there anything wrong with creating Couch DB views with null values?

I've been doing a fair amount of work with CouchDB in my spare time recently and really enjoy using it. I find it to be much more flexible than a relational database, but it's not without its disadvantages.
One big disadvantage is the lack of dynamic queries / view generation... So you have to do a fair amount of work in planning and justifying your views, as you can't put that logic into your application code as you might do with SQL.
For example, I wrote a login scheme based on a JSON document template that looked a little bit like this:
{
  "_id": "blah",
  "type": "user",
  "name": "Bob",
  "email": "bob@theaquarium.com",
  "password": "blah"
}
To prevent the creation of duplicate accounts, I wrote a very basic view to generate a list of user names to lookup as keys:
emit(doc.name, null)
This seemed reasonably efficient to me. I think it's way better than dragging out an entire list of documents (or even just a reduced number of fields for each document). So I did exactly the same thing to generate a list of email addresses:
emit(doc.email, null)
Can you see where I'm going with this question?
In a relational database (with SQL) one would simply make two queries against the same table. Would this technique (of equating a view to the product of an SQL query) be in some way analogous?
Then there's the performance / efficiency issue... Should those two views really be just one? Or is the use of a Couch DB view with keys and no associated value an effective practice? Considering the example above, both of those views would have uses outside of a login scheme... If I ever need to generate a list of user names, I can retrieve them without an additional overhead.
What do you think?
First, you certainly can put the view logic into your application code - all you need is an appropriate build or deploy system that extracts the views from the application and adds them to a design document. What is missing is the ability to generate new queries on the fly.
Your emit(doc.field,null) approach certainly isn't surprising or unusual. In fact, it is the usual pattern for "find document by field" queries, where the document is extracted using include_docs=true. There is also no need to mix the two views into one, the only performance-related decision is whether the two views should be placed in the same design document: all views in a design document are updated when any of them is accessed.
Of course, your approach does not actually guarantee that the e-mails are unique, even if your application tries really hard. Imagine the following circumstances with two client applications A and B:
A: queries view, determines that `test@email.com` does not exist.
B: queries view, determines that `test@email.com` does not exist.
A: creates account with `test@email.com`
B: creates account with `test@email.com`
This is a rare occurrence, but nonetheless possible. A better approach is to keep documents that use the email address as the key, because access to single documents is transactional (it's impossible to create two documents with the same key). Typical example:
{
  "_id": "test@email.com",
  "type": "email",
  "user": "000000001"
}
{
  "_id": "000000001",
  "type": "user",
  "email": "test@email.com",
  "firstname": "Test",
  ...
}
EDIT: a reservation pattern only works if two clients attempting to create an account for a given e-mail will reliably try to access the same document. If you randomly generate a new identifier, then client A will create and reserve document XXXX while client B will create and reserve document YYYY, and you will end up with two different documents that have the same e-mail.
Again, the only way to perform a transactional "check if it exists, create if it does not" operation is to have all clients alter a single document.
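The guarantee that makes this work can be modeled in a few lines. This is only a sketch of the semantics: a real CouchDB PUT to /db/{_id} succeeds for the first writer and answers 409 Conflict for everyone else attempting the same _id.

```javascript
// In-memory stand-in for CouchDB's "one document per _id" rule.
const docs = new Map();
function putDoc(id, doc) {
  if (docs.has(id)) {
    return { ok: false, status: 409 }; // second writer gets a conflict
  }
  docs.set(id, doc);
  return { ok: true, status: 201 };
}

// Both clients try to reserve the same address; only one can win.
const first = putDoc('test@email.com', { type: 'email', user: '000000001' });
const second = putDoc('test@email.com', { type: 'email', user: '000000002' });
```

Because the e-mail address itself is the _id, the uniqueness check and the creation happen as one atomic write, closing the race described above.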
