Apollo Server Slow Performance when resolving large data - graphql

When resolving large data I notice very slow performance, from the moment my resolver returns the result to the moment it reaches the client.
I assume apollo-server iterates over my result and checks the types... either way, the operation takes too long.
In my product I have to return a large amount of data all at once, since it is all used at once to draw a chart in the UI. Pagination is not an option for me; I can't slice the data.
I suspect the slowness comes from apollo-server, not from the creation of my resolver's result object.
Note that I log the time the resolver takes to create the object; it's fast, and not the bottleneck.
The later operations performed by apollo-server, which I don't know how to measure, take a lot of time.
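One way I might measure those later operations (a sketch I haven't verified, assuming an apollo-server 2.x version whose constructor accepts a plugins option):

const timingPlugin = {
  requestDidStart() {
    const start = Date.now();
    return {
      willSendResponse() {
        // Covers everything apollo-server does after my resolver returns:
        // type checking, sub-selection, serialization.
        console.log(`request pipeline took ${Date.now() - start}ms`);
      },
    };
  },
};

const server = new ApolloServer({ typeDefs, resolvers, plugins: [timingPlugin] });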
Now I have a version where I return a custom scalar type, JSON, and the response is much, much faster. But I'd really prefer to return my Series type.
I measure the difference between the two types (Series and JSON) by looking at the network panel.
When AMOUNT is set to 500 and the type is Series, it takes ~1.5s (that is, seconds!)
When AMOUNT is set to 500 and the type is JSON, it takes ~150ms (fast!)
When AMOUNT is set to 1000 and the type is Series, it's very slow...
When AMOUNT is set to 10000 and the type is Series, I get "JavaScript heap out of memory" (which is, unfortunately, what we experience in our product)
I've also compared apollo-server performance to express-graphql; the latter works faster, yet still not as fast as returning the custom scalar JSON.
when AMOUNT is set to 500, apollo-server, network takes 1.5s
when AMOUNT is set to 500, express-graphql, network takes 800ms
when AMOUNT is set to 1000, apollo-server, network takes 5.4s
when AMOUNT is set to 1000, express-graphql, network takes 3.4s
The Stack:
"dependencies": {
  "apollo-server": "^2.6.1",
  "graphql": "^14.3.1",
  "graphql-type-json": "^0.3.0",
  "lodash": "^4.17.11"
}
The Code:
const _ = require("lodash");
const { performance } = require("perf_hooks");
const { ApolloServer, gql } = require("apollo-server");
const GraphQLJSON = require("graphql-type-json");

// The GraphQL schema
const typeDefs = gql`
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    name: String!
    values: [Unit!]!
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
    hack: String
  }

  type Query {
    complex: Series
  }
`;

const AMOUNT = 500;

// A map of functions which return data for the schema.
const resolvers = {
  Query: {
    complex: () => {
      let before = performance.now();
      const result = {
        data: _.times(AMOUNT, () => ({
          name: "a",
          values: _.times(AMOUNT, () => ({
            name: "a",
            value: "a"
          }))
        })),
        keys: _.times(AMOUNT, () => ({
          name: "a",
          value: "a"
        }))
      };
      let after = performance.now() - before;
      console.log("resolver took: ", after);
      return result;
    }
  }
};

const server = new ApolloServer({
  typeDefs,
  resolvers: _.assign({ JSON: GraphQLJSON }, resolvers),
});

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});
The gql Query for the Playground (for type Series):
query {
  complex {
    data {
      name
      values {
        name
        value
      }
    }
    keys {
      name
      value
    }
  }
}
The gql Query for the Playground (for custom scalar type JSON):
query {
  complex
}
Here is a working example:
https://codesandbox.io/s/apollo-server-performance-issue-i7fk7
Any leads/ideas would be highly appreciated!

There's a related open issue here. Lee Byron summed it up pretty well:
I think the TL;DR of this issue is that GraphQL has some overhead and that reducing that overhead is non-trivial and removing it completely may not be an option. Ultimately GraphQL.js is still responsible for making API boundary guarantees about the shape and type of the returned data and by design does not trust the underlying systems. In other words GraphQL.js does runtime type checking and sub-selection and this has some cost.
The benefits that GraphQL offers (validation, sub-selection, etc.) inevitably incur some overhead as they require additional processing of the data you're returning. And unfortunately, this overhead scales with the size of the data. I imagine if you were to implement a REST endpoint that supported partial responses and did response validation using something like Swagger or Joi, you'd encounter a similar issue.
The "heap out of memory" error means exactly what it says -- you're running out of memory on the heap. You can try to alleviate this by manually increasing the limit.
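For example, to raise Node's heap limit to 4 GB, you can pass the standard --max-old-space-size flag when starting the server (the entry file name here is just an assumption):

node --max-old-space-size=4096 index.js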
Typically, large datasets like this should be broken up by implementing pagination. If that's not an option, utilizing a custom scalar will be the next best approach. The biggest downside to this approach is that clients consuming your API will not be able to request specific fields inside the JSON object you return. Outside of patching GraphQL.js, there's really no other alternative to speed up the responses and reduce your memory usage.

Comment summary
This data structure/these types:
are not individual entities;
are just a series of [grouped] data;
don't need normalization;
won't be normalized properly in the Apollo cache (no id fields).
So this dataset is not what GraphQL was designed for. Of course GraphQL can still be used for fetching this data, but type parsing/matching should be disabled.
Using a custom scalar type (graphql-type-json) can be a solution. If you need a hybrid solution, you can type Group.values as JSON (instead of the entire Series), as sketched below. Groups should still have an id field if you want normalized cache access.
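A minimal sketch of that hybrid shape (my reading of the comment, reusing the question's JSON scalar; the id field is the addition it calls for):

const typeDefs = gql`
  scalar JSON

  type Group {
    id: ID!        # lets the normalized cache identify each group
    name: String!
    values: JSON!  # opaque blob: per-Unit validation is skipped
  }

  type Series {
    data: [Group!]!
  }

  type Query {
    complex: Series
  }
`;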
Alternative
You can use apollo-link-rest for fetching 'pure' JSON data (a file), leaving type parsing/matching to the client side only.
More advanced alternative
If you want to use one graphql endpoint, you can write your own link and use directives: 'ask for JSON, get typed', a mix of the two approaches above. Something like the REST link with de-/serializers; see the sketch below.
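A rough sketch of such a link (hypothetical; assumes the 2019-era apollo-link API and the Series/Group shape from the question above):

const { ApolloLink } = require("apollo-link");

// Hypothetical deserializing link: the server sends Group.values as an
// opaque JSON scalar; we re-tag the entries client side so the rest of
// the app can keep treating them as typed Units.
const deserializeLink = new ApolloLink((operation, forward) =>
  forward(operation).map(response => {
    const series = response.data && response.data.complex;
    if (series) {
      series.data.forEach(group => {
        group.values = group.values.map(v => ({ __typename: "Unit", ...v }));
      });
    }
    return response;
  })
);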
In both alternatives, ask yourself: why do you really need it? Just for drawing? Probably not worth the effort. There's no pagination, but hopefully streaming (live updates?)... no cursors... 'load more' (subscriptions/polling) keyed on... last update time? Doable, but it 'doesn't feel right'.

Related

Apollo graphql client - field policies - No read function for parent types

I can’t seem to find a way to read an entire type without having to resort to individual fieldPolicies for every field in that type.
const cache = new InMemoryCache({
  typePolicies: {
    SomeType: {
      fields: {
        // defining individual field (READ) policies would be insane (at least for my case)
        // is there at least something like a wildcard mechanism?
      },
      merge, // yeah... possible at type level
      read   // ??? not possible (WHYYYY), so... Is there any other way to do this?
    }
  }
})

Any way to split up multiple Fragment expansions for a GraphQL query into multiple calls?

Context
This problem is likely predicated on certain choices, some of which are changeable and some of which are not. We are using the following technologies and frameworks:
Relay / React / TypeScript
ContentStack (CMS)
Problem
I'm attempting to create a highly customizable page that can be built from multiple kinds of UI components based on the data presented to it (to allow pages to be built in a CMS from prefab UI in an unpredictable order).
My first attempt at this was to create a set of fragments for the potential UI components that may be referenced in an array:
query CustomPageQuery {
  title
  description
  customContentConnection {
    edges {
      node {
        ... HeroFragment
        ... TweetBlockFragment
        ... EmbeddedVideoFragment
        # Further fragments are added here as we add more kinds of UI
      }
    }
  }
}
In the CMS we're using (ContentStack), the complexity of this query has grown to the point that it is rejected because it requires too many calls to the database in a single query. For that reason, I'm hoping there's a way I can split up the calls for the fragments so that they are not part of the initial query, or some similar solution that results in splitting up this query into multiple pieces.
I was hoping the @defer directive would solve this for me, but it's not supported by relay-compiler.
Any ideas?
Sadly, @defer is still not part of the standard, so it is not supported by most implementations (you would also need the server to support it).
I am not sure I understand the problem correctly, but you might want to look toward using @skip or @include to fetch only the fragments you need, depending on the type of the thing. It would, however, require the frontend to know what it wants to query beforehand.
query CustomPageQuery($hero: Boolean, $tweet: Boolean, $video: Boolean) {
  title
  description
  customContentConnection {
    edges {
      node {
        ... HeroFragment @include(if: $hero)
        ... TweetBlockFragment @include(if: $tweet)
        ... EmbeddedVideoFragment @include(if: $video)
      }
    }
  }
}
Generally you want to be able to discriminate the type without having to do a database query. So say:
type Hero {
  id: ID
  name: String
}

type Tweet {
  id: ID
  content: String
}

union Content = Hero | Tweet

{
  Content: {
    __resolveType: (parent, ctx) => {
      // That should be able to resolve the type without a DB query
    },
  },
}
Once that is passed, each fragment is then resolved, making more database queries. If those are not properly batched with dataloaders, then you have an N+1 problem (a minimal sketch follows). I am not sure how much control (if at all) you have over the backend, but there is no silver bullet for your problem.
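For reference, a minimal dataloader sketch (batchGetNodes and nodeId are hypothetical names, not from the question):

const DataLoader = require("dataloader");

// One loader per request. All .load(id) calls made in the same tick are
// coalesced into a single call to the batch function below.
const nodeLoader = new DataLoader(async ids => {
  const rows = await batchGetNodes(ids); // hypothetical: one DB query for all ids
  const byId = new Map(rows.map(row => [row.id, row]));
  // DataLoader requires results in the same order as the requested ids.
  return ids.map(id => byId.get(id) || null);
});

// In a resolver: return nodeLoader.load(parent.nodeId);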
If you can't make optimizations on the backend, then I would suggest trying to limit the connection. They seem to be using cursor-based pagination, so you start with, say, first: 10, and once the first batch is returned, you can query the next elements by setting after to the last cursor of the previous batch:
query CustomPageQuery($after: String) {
  customContentConnection(first: 10, after: $after) {
    edges {
      cursor
      node {
        ... HeroFragment
        ... TweetBlockFragment
        ... EmbeddedVideoFragment
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}
As a last resort, you could try first fetching all the IDs and then making subsequent queries to the CMS for each id (using aliases, I guess) or each type (if you can filter on the connection field). But I feel dirty just writing that, so avoid it if you can.
{
  one: node(id: "UUID1") {
    ... HeroFragment
    ... TweetBlockFragment
    ... EmbeddedVideoFragment
  }
  two: node(id: "UUID2") {
    ... HeroFragment
    ... TweetBlockFragment
    ... EmbeddedVideoFragment
  }
}

Apollo GraphQL - How do I use an RxJS Subject as a variable with Apollo Client?

My type-ahead search was working great with REST, but I'm converting to GraphQL, which has its challenges.
As the user types a last name into a form field, the suggested results display in a data table below. Each letter is handled by the RxJS Subject.
The variable searchTerm$ is a kind of RxJS observable called a Subject and binds to the HTML. The following is called from the AfterViewInit lifecycle hook in an Angular app. The search is by the database column last_name.
However, this results in a Bad Request 400 error as the view loads, and search doesn't work. I thought maybe this calls for a subscription, but everything I find on those is about using web sockets to connect to a remote URL and server. Where do I go from here?
I'm using the Angular Apollo client with Apollo Express, but I would be happy with any JS solution and will try to figure it out from there. The server side is NestJS, which just wraps Apollo Server.
const lastNameSearch = gql`
  query ($input: String!) {
    lastNameSearch(input: $input) {
      first_name
      last_name
      user_name
      pitch
      main_skill_title
      skills_comments
      member_status
    }
  }
`;

this.apollo
  .watchQuery({
    query: lastNameSearch,
    variables: {
      last_name: searchTerm$, // Trying to use the observable here.
    },
  })
  .valueChanges
  .subscribe(result => {
    console.log('data in lastNameSearch: ', result);
  });
The schema on the server:
lastNameSearch(input: String!): [Member]
The resolver:
@Query()
async lastNameSearch(@Args('input') input: string) {
  const response = await this.membersService.lastNameSearch(input);
  return response;
}
Edit:
The error from the Network panel in dev tools (the console message was worthless):
{"errors":[{"message":"Variable \"$input\" of required type \"String!\" was not provided.","locations":[{"line":1,"column":8}],"extensions":{"code":"INTERNAL_SERVER_ERROR","exception":{"stacktrace":["GraphQLError: Variable \"$input\" of required type \"String!\" was not provided."," at getVariableValues
And this goes on showing properties and methods in the app for another 300 lines or so.
First, a big thank you to the amazing Daniel Rearden for his help on various questions as I, and lots of others on SO, learn GraphQL! He has patience!
As Daniel pointed out in the comments, I had made a simple mistake; I'll point it out in the commented code below. The bigger issue, though, was trying to use an observable, Subject, or similar as a variable. Even if the RxJS Subject emits a string, GraphQL will hate being handed a large object as a variable. So I had to use a little reactive programming to solve this.
Set up the observable:
public searchTerm$ = new Subject<string>(); // Binds to the html text box element.
Second, let's set this up in a lifecycle hook, where we subscribe to the observable so it emits letters one at a time as they are typed into the input box.
ngAfterViewInit() {
  let nextLetter: string;

  // -------- For Last Name Incremental Query --------- //
  this.searchTerm$.subscribe(result => {
    nextLetter = result;            // Set up a normal variable.
    this.queryLastName(nextLetter); // Call the GraphQL query below.
  });
}
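(An optional refinement, not part of the original fix: debounce the Subject so a query isn't fired for every single keystroke. These are standard RxJS 6 operators:)

import { debounceTime, distinctUntilChanged } from "rxjs/operators";

this.searchTerm$
  .pipe(
    debounceTime(300),     // wait for a short pause in typing
    distinctUntilChanged() // skip duplicate consecutive terms
  )
  .subscribe(term => this.queryLastName(term));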
In the last step we have the GraphQL query and consume the returned data object. This works perfectly: type a 'p' into the form and get back from the db all the last names starting with 'p' or 'P'. Type 'r' and the results narrow to last names starting with 'pr', and so on.
private queryLastName(nextLetter) {
  const lastNameSearch = gql`
    query ($input: String!) {
      lastNameSearch(input: $input) {
        first_name
        last_name
        user_name
        pitch
        main_skill_title
        skills_comments
        member_status
      }
    }
  `;
  this.apollo
    .watchQuery({
      query: lastNameSearch,
      variables: {
        input: nextLetter, // Notice I had used last_name here instead of input.
      },
    })
    .valueChanges
    .subscribe(result => {
      // Put the data into some UI in your app, in this case
      // an Angular Material data table.
      // Notice how we get the data from the returned object.
      // This avoids the dreaded "null" error when the shape of the
      // returned data doesn't match the query. This puts an array
      // of objects into the UI.
      this.dataSource.data = result.data['lastNameSearch'];
    });
}

Can you request a literal value in graphql?

I'm trying to figure out how to mock requests when the client is ahead of the server. I'd like to be able to just request literals, so that I can go back and change them later. Is there a way to do something like this?
query myQuery {
  type {
    fieldName: 42
  }
}
Yes, it is quite easy to set up mock responses if you have at least the server boilerplate code in place. If you are using Apollo, there are built-in tools to facilitate mocking.
https://www.apollographql.com/docs/graphql-tools/mocking.html
From the docs:
The strongly-typed nature of a GraphQL API lends itself extremely well to mocking. This is an important part of a GraphQL-First development process, because it enables frontend developers to build out UI components and features without having to wait for a backend implementation.
Here is an example from the docs:
import { makeExecutableSchema, addMockFunctionsToSchema } from 'graphql-tools';
import { graphql } from 'graphql';

// Fill this in with the schema string
const schemaString = `...`;

// Make a GraphQL schema with no resolvers
const schema = makeExecutableSchema({ typeDefs: schemaString });

// Add mocks, modifies schema in place
addMockFunctionsToSchema({ schema });

const query = `
  query tasksForUser {
    user(id: 6) { id, name }
  }
`;

graphql(schema, query).then((result) => console.log('Got result', result));
This mocking logic simply looks at your schema and makes sure to return a string where your schema has a string, a number for a number, etc. So you can already get the right shape of result. But if you want to use the mocks to do sophisticated testing, you will likely want to customize them to your particular data model.
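For instance, a small sketch of that customization using the same graphql-tools API (the literal values are placeholders):

// Hard-code per-type mock values instead of the random defaults.
addMockFunctionsToSchema({
  schema,
  mocks: {
    Int: () => 42,               // every Int field now returns the literal 42
    String: () => 'placeholder', // every String field returns this literal
  },
});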

Deleting Apollo Client cache for a given query and every set of variables

I have a filtered list of items based on a getAllItems query, which takes a filter and an order by option as arguments.
After creating a new item, I want to delete the cache for this query, no matter what variables were passed. I don't know how to do this.
I don't think updating the cache is an option. The methods mentioned in the Apollo Client documentation (updating the cache after a mutation, refetchQueries, and update) all seem to require a given set of variables, but since the filter is a complex object (with some text information), I would need to update the cache for every set of variables previously submitted, and I don't know how to do that. Plus, only the server knows how the new item impacts pagination and ordering.
I don't think fetch-policy (for instance, setting it to cache-and-network) is what I'm looking for either, because while hitting the network is what I want after creating a new item, when I'm just filtering the list (typing in a string to search) I want to keep the default behavior (cache-first).
client.resetStore would reset the store for all types of queries (not only the getAllItems query), so I don't think it's what I'm looking for either.
I'm pretty sure I'm missing something here.
There's no officially supported way of doing this in the current version of Apollo but there is a workaround.
In your update function, after creating an item, you can iterate through the cache and delete all nodes whose key starts with the typename you are trying to remove from the cache, e.g.:
// Loop through all the data in our cache
// and delete any items whose key starts with "Item".
// This empties the cache of all of our items and
// forces a refetch of the data only when it is next requested.
Object.keys(cache.data.data).forEach(key =>
  key.match(/^Item/) && cache.data.delete(key)
);
This works for queries that exist a number of times in the cache with different variables, i.e. paginated queries.
I wrote an article on Medium that goes into much more detail on how this works, as well as an implementation example and an alternative solution that is more complicated but works better in a small number of use cases. Since the article expands on a concept I have already explained in this answer, I believe it is ok to share it here: https://medium.com/@martinseanhunt/how-to-invalidate-cached-data-in-apollo-and-handle-updating-paginated-queries-379e4b9e4698
This worked for me (it requires Apollo Client 3 for the cache eviction feature) - it clears any query matched by the regexp from the cache.
After clearing the cache, the query will be automatically refetched without the need to trigger a refetch manually (if you are using Angular: gql.watch().valueChanges will perform an xhr request and emit a new value).
export const deleteQueryFromCache = (cache: any, matcher: string | RegExp): void => {
  const rootQuery = cache.data.data.ROOT_QUERY;
  Object.keys(rootQuery).forEach(key => {
    if (key.match(matcher)) {
      cache.evict({ id: "ROOT_QUERY", fieldName: key });
    }
  });
};
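Hypothetical usage from a mutation's update callback (createItemMutation and the getAllItems field name are examples, not from the question):

this.apollo.mutate({
  mutation: createItemMutation, // hypothetical mutation document
  update: cache => deleteQueryFromCache(cache, /^getAllItems/),
});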
An ngrx-like approach:
resolvers = {
  removeTask(
    parent,
    { id },
    { cache, getCacheKey }: { cache: InMemoryCache | any; getCacheKey: any }
  ) {
    const key = getCacheKey({ __typename: "Task", id });
    const { [key]: deleted, ...data } = cache.data.data;
    cache.data.data = { ...data };
    return id;
  }
};
