relay cache invalidation using pattern queries - graphql

I'm using relay for a react-native app and need to force invalidate, or forceFetch my a query at certain points in the app's lifecycle. I've noticed that you can create a query using Relay and then forceFetch it. This updates the cache. Ideally, I'd like to do what is done in the Mutations API. I'd like to provide a fat query that intersects with my existing data, and refetches that intersection. But I can't figure out how to do this. If I attempt to do this:
var query = Relay.createQuery(
Relay.QL`
query Root {
viewer {
unreadNotificationCount,
notifications
}
}
`, {
}
And apply environment.forceFetch({ query })I get validation errors about notifications not including either first or last arguments. But I'd rather not have to include first, or last arguments, because I don't know how many notifications have loaded in the rest of my app.
Any ideas here?

Unfortunately, there's no trivial way for you to do that intersection, at least not just using forceFetch. The mutation API does much more than try to just fetch a pattern query - it performs the intersection against the local graph to compute the actual mutation query that is sent.
You can definitely ask an individual container to refetch it's date using forceFetch, though. So if you have a container that fetches the notifications (I suspect you do), simply trigger forceFetch on it every N seconds.

Related

Child service data access to other services with Apollo Federation configuration

We've been using Apollo Federation for about 1.5 years as our main api. Behind the federation gateway are 6 child graphql services which are all combined at the gateway. This configuration really works excellent when you have a result set of data which spans the different services. E.g. A list of tickets which references the user who purchased and event it is associated with it, etc.
One place we have experienced this breaking down is when a pre-set of data is needed which is already defined in another child service (or across other child services) (resolver/path). There is no way (that has been discovered by us) to query the federation from a child service to get a federated set of data for use by a resolver to work with that data.
For example, say we have a graphql query defined which queries all tickets for an event, and through federation returns the purchaser's data, the event's data and the products data. If I need this data set from a resolver, I would need to make all those queries again myself duplicating dataSource logic and having to match up the data in code.
One crazy thought which came up is to setup apollo-datasource-rest dataSource to make queries against our gateway end point as a dataSource for our resolvers. This way we can request the data we need and let Apollo Federation stitch all the data together as it is designed to do. So instead of the resolver querying the database for all the different pieces of data and then matching them up, we would request the data from our graphql gateway where this query is already defined.
What we are trying to avoid by doing this is having a repeated set of queries in child services to get the details which are already available in (or across) other services.
The question
Is this a really bad idea?
Is it a plausible idea?
Has anyone tried something like this before?
Yes we would have to ensure that there aren't circular dependencies on the resolvers. In our case I see the "dataSource accessing the gateway" utilized in gathering initial data in mutations.
Example of a federated query. In this query, event, allocatedTo, purchasedBy, and product are all types in other services. event is an Event type, allocatedTo and purchasedBy are a Profile type, and product is a Product type. This query provides me with all the data I would use to say, send an email notification to the people in the result set. Though to get this data from a resolver in a mutation to queue up those emails means I need to make many queries and align all the data through code myself instead of using the Gateway/federation which does this already with the already established query. The thought around using apollo-datasource-rest to query our own gateway is get at this data in this form. Not through separate queries and code to align id's etc.
query getRegisteredUsers($eventId: ID!) {
communications {
event(eventId: $eventId) {
registered {
event {
name
}
isAllocated,
hasCheckedIn,
lastUpdatedAt,
allocatedTo {
firstName
lastName
email
}
purchasedBy {
id
firstName
lastName
}
product {
__typename
...on Ticket {
id
name
}
}
}
}
}
}
FYI, I didn't quite understand the question until I looked at your edits, which had some examples.
Is this a really bad idea?
In my experience, yes. Not as an idea, as you're in good company with other very smart people who have done this.
Is it a plausible idea?
Absolutely it's plausible, but I don't recommend it.
Has anyone tried something like this before?
Yes, but I hope you don't.
Your Question
Having resolvers make requests back to the Gateway:
I do not recommend this. I've seen this happen, and I've personally worked to help companies out of the mess this takes you into. Circular dependencies are going to happen. Latency is just going to skyrocket as you have more and more hops, TLS handshakes, etc. Do orchestration instead. It feels weird to introduce non-GraphQL, but IMO in the end it's way simpler, faster, and more maintainable than where "just talk to the gateway" takes you.
What then?
When you're dealing with some mutations which require data from across multiple data sources to be able to process a single thing (like sending a transaction email to a person), you have some choices. Something that helped me figure this out was the question "how would I have done this before GraphQL?"
Orchestration: you have a single "orchestration service", which takes the mutation and makes calls (preferably non-GraphQL, so REST, gRPC, Lambda?) to the owner services to collect the data. The orchestration layer does NOT own data, but it can speak with the other services. It's like Federation, but for sending the data into the request, instead of into the response.
Choreography: you trigger roughly the same thing, but via an event stream. (doesn't work as well with the request / response model of GraphQL)
CQRS (projections): Copies of database data, used for things like reporting. CQRS is basically "the way you read data doesn't have to be the same as the way you write it", and it allows for things like event-sourced data. If all of your data sources actually share the same database, you don't even need "projections" as much as you would just want a read replica. If you're not at enough scale to do replicas, just skip it and promise never to write data that your current domain doesn't own.
What I Do
Where I work, I have gotten us to:
Queries
queries always start with "one database call".
if the "one database call" goes to one domain of data (most often true), that query goes into one service, and Federation fills in the leaves of your tree. If you really follow CQRS, this could go the same way as #3, but we don't.
if your "one database call" needs data from across domains (e.g. get all orders with Product X in it, but sorted by the customer's first name), you need a database projection. Preferably this can be handled by a "reporting service": it doesn't OWN any data, but it READS all data.
Mutations
if your top-level mutation modifies acts only within one domain, the mutation goes in a service, it can use database transactions, and Federation fills in the leaves
if your mutation is required to write across multiple domains and requires immediate consistency (placing an order with inventory, payments, etc), we chose orchestration to write across multiple services (and roll-back when necessary, since we don't have database transactions to do it for us).
if your mutation requires data from many places to send further into the request (like sending an email), we chose orchestration to pull from the multiple services and to push that data down. This feels very much like Federation, but in reverse.

GraphQL Asynchronous query results

I'm trying to implement a batch query interface with GraphQL. I can get a request to work synchronously without issue, but I'm not sure how to approach making the result asynchronous. Basically, I want to be able to kick off the query and return a pointer of sorts to where the results will eventually be when the query is done. I'd like to do this because the queries can sometimes take quite a while.
In REST, this is trivial. You return a 202 and return a Location header pointing to where the client can go to fetch the result. GraphQL as a specification does not seem to have this notion; it appears to always want requests to be handled synchronously.
Is there any convention for doing things like this in GraphQL? I very much like the query specification but I'd prefer to not leave the client HTTP connection open for up to a few minutes while a large query is executed on the backend. If anything happens to kill that connection the entire query would need to be retried, even if the results themselves are durable.
What you're trying to do is not solved easily in a spec-compliant way. Apollo introduced the idea of a #defer directive that does pretty much what you're looking for but it's still an experimental feature. I believe Relay Modern is trying to do something similar.
The idea is effectively the same -- the client uses a directive to mark a field or fragment as deferrable. The server resolves the request but leaves the deferred field null. It then sends one or more patches to the client with the deferred data. The client is able to apply the initial request and the patches separately to its cache, triggering the appropriate UI changes each time as usual.
I was working on a similar issue recently. My use case was to submit a job to create a report and provide the result back to the user. Creating a report takes couple of minutes which makes it an asynchronous operation. I created a mutation which submitted the job to the backend processing system and returned a job ID. Then I periodically poll the jobs field using a query to find out about the state of the job and eventually the results. As the result is a file, I return a link to a different endpoint where it can be downloaded (similar approach Github uses).
Polling for actual results is working as expected but I guess this might be better solved by subscriptions.

Am I misusing GraphQL if I must decompose REST data, then re-aggregate it?

We are considering using GraphQL on top of a REST service (using the
FHIR standard for medical records).
I understand that the pattern with GraphQL is to aggregate the results
of multiple, independent resolvers into the final result. But a
FHIR-compliant REST server offers batch endpoints that already aggregate
data. Sometimes we’ll need à la carte data—a patient’s age or address
only, for example. But quite often, we’ll need most or all of the data
available about a particular patient.
So although we can get that kind of plenary data from a single REST call
that knits together multiple associations, it seems we will need to
fetch it piecewise to do things the GraphQL way.
An optimization could be to eager load and memoize all the associated
data anytime any resolver asks for any data. In some cases this would be
appropriate while in other cases it would be serious overkill. But
discerning when it would be overkill seems impossible given that
resolvers should be independent. Also, it seems bloody-minded to undo
and then redo something that the REST service is already perfectly
capable of doing efficiently.
So—
Is GraphQL the wrong tool when it sits on top of a REST API that can
efficiently aggregate data?
If GraphQL is the right tool in this situation, is eager-loading and
memoization of associated data appropriate?
If eager-loading and memoization is not the right solution, is there
an alternative way to take advantage of the REST service’s ability
to aggregate data?
My question is different from
this
question and
this
question because neither touches on how to take advantage of another
service’s ability to aggregate data.
An alternative approach would be to parse the request inside the resolver for a particular query. The fourth parameter passed to a resolver is an object containing extensive information about the request, including the selection set. You could then await the batched request to your API endpoint based on the requested fields, and finally return the result of the REST call, and let your lower level resolvers handle parsing it into the shape the data was requested in.
Parsing the info object can be a PITA, although there's libraries out there for that, at least in the Node ecosystem.

How do you prevent nested attack on GraphQL/Apollo server?

How do you prevent a nested attack against an Apollo server with a query such as:
{
authors {
firstName
posts {
title
author {
firstName
posts{
title
author {
firstName
posts {
title
[n author]
[n post]
}
}
}
}
}
}
}
In other words, how can you limit the number of recursions being submitted in a query? This could be a potential server vulnerability.
As of the time of writing, there isn't a built-in feature in GraphQL-JS or Apollo Server to handle this concern, but it's something that should definitely have a simple solution as GraphQL becomes more popular. This concern can be addressed with several approaches at several levels of the stack, and should also always be combined with rate limiting, so that people can't send too many queries to your server (this is a potential issue with REST as well).
I'll just list all of the different methods I can think of, and I'll try to keep this answer up to date as these solutions are implemented in various GraphQL servers. Some of them are quite simple, and some are more complex.
Query validation: In every GraphQL server, the first step to running a query is validation - this is where the server tries to determine if there are any serious errors in the query, so that we can avoid using actual server resources if we can find that there is some syntax error or invalid argument up front. GraphQL-JS comes with a selection of default rules that follow a format pretty similar to ESLint. Just like there is a rule to detect infinite cycles in fragments, one could write a validation rule to detect queries with too much nesting and reject them at the validation stage.
Query timeout: If it's not possible to detect that a query will be too resource-intensive statically (perhaps even shallow queries can be very expensive!), then we can simply add a timeout to the query execution. This has a few benefits: (1) it's a hard limit that's not too hard to reason about, and (2) this will also help with situations where one of the backends takes unreasonably long to respond. In many cases, a user of your app would prefer a missing field over waiting 10+ seconds to get a response.
Query whitelisting: This is probably the most involved method, but you could compile a list of allowed queries ahead of time, and check any incoming queries against that list. If your queries are totally static (you don't do any dynamic query generation on the client with something like Relay) this is the most reliable approach. You could use an automated tool to pull query strings out of your apps when they are deployed, so that in development you write whatever queries you want but in production only the ones you want are let through. Another benefit of this approach is that you can skip query validation entirely, since you know that all possible queries are valid already. For more benefits of static queries and whitelisting, read this post: https://dev-blog.apollodata.com/5-benefits-of-static-graphql-queries-b7fa90b0b69a
Query cost limiting: (Added in an edit) Similar to query timeouts, you can assign a cost to different operations during query execution, for example a database query, and limit the total cost the client is able to use per query. This can be combined with limiting the maximum parallelism of a single query, so that you can prevent the client from sending something that initiates thousands of parallel requests to your backend.
(1) and (2) in particular are probably something every GraphQL server should have by default, especially since many new developers might not be aware of these concerns. (3) will only work for certain kinds of apps, but might be a good choice when there are very strict performance or security requirements.
To supplement point (4) in stubailo's answer, here are some Node.js implementations that impose cost and depth bounds on incoming GraphQL documents.
graphql-depth-limit
graphql-validation-complexity
graphql-query-complexity
These are custom rules that supplement the validation phase.
A variation on query whitelisting is query signing.
During the build process, each query is cryptographically signed using a secret which is shared with the server but not bundled with the client. Then at runtime the server can validate that a query is genuine.
The advantage over whitelisting is that writing queries in the client doesn't require any changes to the server. This is especially valuable if multiple clients access the same server (e.g. web, desktop and mobile apps).
Example
In development, you write your queries as usual against your dev server which allows unsigned queries.
Then in your client build step in CI, each query is tagged with its cryptographic signature. This signature is sent by the client as a header to the server when making the request, along with the full GraphQL query string.
Your staging and production servers are configured to require a signed queries. They calculate the signature of the query received in the same way as the CI server did during the build. If the signatures don't match then they don't process the query.
Limitations:
not suitable for public facing APIs since the secret must be shared with developers
clients cannot dynamically build a GraphQL query at runtime using string interpolation, but I've never had a need for this and it is discouraged
For the Query cost limiting you could use graphql-cost-analysis
This is a validation rule which parses the query before executing it. In your GraphQL server you just have to assign a cost configuration for each field of your Schema Type Map you want.
Don't miss graphql-rate-limit 👌a GraphQL directive to add basic but granular rate limiting to your Queries or Mutations.

Parse.com. Execute backend code before response

I need to know the relative position of an object in a list. Lets say I need to know the position of a certain wine of all wines added to the database, based in the votes received by users. The app should be able to receive the ranking position as an object property when retrieving a "wine" class object.
This should be easy to do in the backend side but I've seen Cloud Code and it seems it only is able to execute code before or after saving or deleting, not before reading and giving response.
Any way to do this task?. Any workaround?.
Thanks.
I think you would have to write a Cloud function to perform this calculation for a particular wine.
https://www.parse.com/docs/cloud_code_guide#functions
This would be a function you would call manually. You would have to provide the "wine" object or objectId as a parameter and then get have your cloud function return the value you need. Keep in mind there are limitations on cloud functions. Read the documentation about time limits. You also don't want to make too many API calls every time you run this. It sounds like your computation could be fairly heavy if your dataset is large and you aren't caching at least some of the information.

Resources