I'm implementing the client side of a GraphQL API with React and Apollo.
Situation
One of the queries created on the server side is a wrapper object for a group of related queries; it acts purely as a namespace for grouping them:
query Metrics {
  metrics {
    cpu {
      cores
      avgUtilization
    }
    memory {
      size
      max
    }
    disc {
      usage
      freespace
    }
  }
}
so the types cpu, memory and disc are implemented as child/leaf types under the parent type metrics, which serves purely as a namespace.
The problem
We have three separate queries, one for fetching each group. First we run MetricsCPU:
query MetricsCPU {
  metrics {
    cpu {
      cores
      avgUtilization
    }
  }
}
and a while later we run MetricsMemory:
query MetricsMemory {
  metrics {
    memory {
      size
      max
    }
  }
}
After this second query MetricsMemory returns, the first query MetricsCPU is invalidated in the cache and automatically refetched.
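For context, a minimal sketch of how the two queries might be issued with useQuery (assuming Apollo Client 3; the component names are illustrative, not from the real codebase):

import { gql, useQuery } from "@apollo/client";

// Same documents as shown above, inlined here for a self-contained sketch.
const METRICS_CPU = gql`query MetricsCPU { metrics { cpu { cores avgUtilization } } }`;
const METRICS_MEMORY = gql`query MetricsMemory { metrics { memory { size max } } }`;

// Two independent components, each watching one of the queries;
// the observed behavior is that MetricsCPU refetches once MetricsMemory returns.
function CpuPanel() {
  const { data } = useQuery(METRICS_CPU);
  return data ? JSON.stringify(data.metrics.cpu) : "loading";
}

function MemoryPanel() {
  const { data } = useQuery(METRICS_MEMORY);
  return data ? JSON.stringify(data.metrics.memory) : "loading";
}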
Question
How can I prevent MetricsCPU from being invalidated and refetched after MetricsMemory returns?
I do not see an easy solution. One option is to remove the grouping on the backend, which feels a bit odd.
Related
Context
This problem is likely predicated on certain choices, some of which are changeable and some of which are not. We are using the following technologies and frameworks:
Relay / React / TypeScript
ContentStack (CMS)
Problem
I'm attempting to create a highly customizable page that can be built from multiple kinds of UI components based on the data handed to it (so that pages can be assembled in a CMS from prefab UI components in an unpredictable order).
My first attempt at this was to create a set of fragments for the potential UI components that may be referenced in an array:
query CustomPageQuery {
  title
  description
  customContentConnection {
    edges {
      node {
        ... HeroFragment
        ... TweetBlockFragment
        ... EmbeddedVideoFragment
        # Further fragments are added here as we add more kinds of UI
      }
    }
  }
}
In the CMS we're using (ContentStack), the complexity of this query has grown to the point that it is rejected, because it requires too many database calls in a single query. For that reason, I'm hoping there's a way to split the fragment calls out of the initial query, or some similar solution that breaks this query into multiple pieces.
I was hoping the @defer directive would solve this for me, but it's not supported by relay-compiler.
Any ideas?
Sadly, @defer is still not part of the spec, so it is not supported by most implementations (you would also need the server to support it).
I am not sure if I understand the problem correctly, but you might want to look toward using @skip or @include to fetch only the fragments you need, depending on the type of the thing. It would, however, require the frontend to know what it wants to query beforehand.
query CustomPageQuery($hero: Boolean, $tweet: Boolean, $video: Boolean) {
  title
  description
  customContentConnection {
    edges {
      node {
        ... HeroFragment @include(if: $hero)
        ... TweetBlockFragment @include(if: $tweet)
        ... EmbeddedVideoFragment @include(if: $video)
      }
    }
  }
}
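On the client, the only extra work is supplying those booleans as variables. A framework-agnostic sketch (the endpoint path and flag values are illustrative; with Relay you would instead pass the same variables object to your query renderer or hook):

async function loadCustomPage(queryText) {
  // Decide up front which fragments this page needs (values are illustrative).
  const variables = { hero: true, tweet: false, video: false };
  const response = await fetch("/graphql", { // endpoint path is an assumption
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: queryText, variables }),
  });
  const { data } = await response.json();
  return data;
}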
Generally you want to be able to discriminate the type without having to do a database query. So say:
type Hero {
  id: ID
  name: String
}

type Tweet {
  id: ID
  content: String
}

union Content = Hero | Tweet
{
  Content: {
    __resolveType: (parent, ctx) => {
      // That should be able to resolve the type without a DB query
    },
  }
}
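As a hedged sketch of what that resolver could look like (the discriminating fields are made up; in practice you would use whatever the CMS already returns with each entry, such as a content-type field):

const resolvers = {
  Content: {
    // Decide the concrete type from data already present on the parent,
    // so no extra database round trip is needed. The checks below are
    // illustrative; use whatever discriminator your data actually carries.
    __resolveType: (parent) => {
      if (parent.content !== undefined) return "Tweet";
      if (parent.name !== undefined) return "Hero";
      return null; // unknown: GraphQL will surface an error
    },
  },
};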
Once that is resolved, each fragment is then resolved, making more database queries. If those are not properly batched with dataloaders, you have an N+1 problem. I am not sure how much control (if any) you have over the backend, but there is no silver bullet for your problem.
If you can't make optimizations on the backend, then I would suggest trying to limit the connection. They seem to be using cursor-based pagination, so you start with, say, first: 10, and once the first batch is returned you can query the next elements by setting after to the last cursor of the previous batch:
query CustomPageQuery($after: String) {
  customContentConnection(first: 10, after: $after) {
    edges {
      cursor
      node {
        ... HeroFragment
        ... TweetBlockFragment
        ... EmbeddedVideoFragment
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}
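Client side, each subsequent request just feeds the last cursor of the previous page back in as $after. A rough framework-agnostic sketch (executeQuery is a hypothetical helper, not a specific Relay API):

// executeQuery is assumed to send CustomPageQuery with the given variables
// and resolve with the parsed `data` object; swap in your own transport.
async function loadAllContent(executeQuery: (vars: { after: string | null }) => Promise<any>) {
  const nodes: unknown[] = [];
  let after: string | null = null;

  // Keep requesting pages of 10 until the server reports there are no more.
  while (true) {
    const { customContentConnection } = await executeQuery({ after });
    nodes.push(...customContentConnection.edges.map((edge: any) => edge.node));

    if (!customContentConnection.pageInfo.hasNextPage) break;
    // Feed the cursor of the last edge back in as `after` for the next request.
    after = customContentConnection.edges[customContentConnection.edges.length - 1].cursor;
  }
  return nodes;
}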
As a last resort, you could first fetch all the IDs and then make subsequent queries to the CMS for each id (using aliases, I guess) or per type (if you can filter on the connection field). But I feel dirty just writing it, so avoid it if you can.
{
  one: node(id: "UUID1") {
    ... HeroFragment
    ... TweetBlockFragment
    ... EmbeddedVideoFragment
  }
  two: node(id: "UUID2") {
    ... HeroFragment
    ... TweetBlockFragment
    ... EmbeddedVideoFragment
  }
}
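If you do go down that road, note that Relay expects statically compiled queries, so this would have to bypass Relay; still, the aliased document can at least be generated rather than hand-written. A hedged sketch (buildNodeQuery is a made-up helper, not part of any library):

// Hypothetical helper: builds the aliased document for a list of node IDs.
// The fragment definitions still need to be appended before sending.
function buildNodeQuery(ids: string[]): string {
  const fragments =
    "    ... HeroFragment\n    ... TweetBlockFragment\n    ... EmbeddedVideoFragment";
  const aliases = ids
    .map((id, i) => `  n${i}: node(id: "${id}") {\n${fragments}\n  }`)
    .join("\n");
  return `query NodesById {\n${aliases}\n}`;
}

// buildNodeQuery(["UUID1", "UUID2"]) produces a document shaped like the one above.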
Is there a way in Parse Platform to fall back to the Local Datastore if there is no connection?
I understand that there is pin/pinInBackground, so I can pin any object to the Local Datastore.
Then I can query the Local Datastore to get that info.
However, I always want to try to get the server data first and, if that fails, get the local data.
Is there a way to do this automatically?
(Or do I have to pin everything locally, then query the server and, if that fails, query locally?)
Great question.
Parse has the concept of cached queries. https://docs.parseplatform.org/ios/guide/#caching-queries
The interesting feature of cached queries is that you can specify "if no network". However, this only works if you have previously cached the query results. I've also found that the delay between losing network connectivity and the cached query recognising that it has lost network makes the whole capability a bit rubbish.
How I have resolved this issue is by using a combination of the Alamofire library and pinning objects. The reason I chose the Alamofire library is that it's extremely well supported and it spots drops in network connectivity near-immediately. I only have a few hundred records, so I'm not worried about pinning all the objects, and performance definitely does not seem to be affected. So here is how I work this.
Define some properties at the top of the class:
// Network management
private var reachability: NetworkReachabilityManager!
private var hasInternet: Bool = false
Call a method as the view awakes
// View lifecycle
override func awakeFromNib() {
    super.awakeFromNib()
    self.monitorReachability()
}
Update object when network availability changes. I know this method could be improved.
private func monitorReachability() {
    NetworkReachabilityManager.default?.startListening { status in
        if "\(status)" == "notReachable" {
            self.hasInternet = false
        } else {
            self.hasInternet = true
        }
        print("hasInternet = \(self.hasInternet)")
    }
}
Then when I call a query I have a switch as I set up the query object.
// Start setup of query
let query = PFQuery(className: "mySecretClass")
if self.hasInternet == false {
    query.fromLocalDatastore()
}
// Complete rest of query configuration
Of course I pin all the results I ever return from the server.
When resolving a large amount of data, I notice very slow performance between the moment my resolver returns its result and the response reaching the client.
I assume apollo-server iterates over my result and checks the types; either way, the operation takes too long.
In my product I have to return a large amount of data all at once, since it's used all at once to draw a chart in the UI. Pagination is not an option for me; I can't slice the data.
I suspect the slowness comes from apollo-server and not from building my resolver's result object.
Note that I log the time the resolver takes to create the object; it is fast and not the bottleneck.
The later operations performed by apollo-server, which I don't know how to measure, take a lot of time.
Now, I have a version where I return a custom scalar type JSON, and the response is much, much faster. But I really prefer to return my Series type.
I measure the difference between the two types (Series and JSON) by looking at the network panel.
When AMOUNT is set to 500 and the type is Series, it takes ~1.5s (that is, seconds).
When AMOUNT is set to 500 and the type is JSON, it takes ~150ms (fast!).
When AMOUNT is set to 1000 and the type is Series, it's very slow...
When AMOUNT is set to 10000 and the type is Series, I'm getting "JavaScript heap out of memory" (which is unfortunately what we experience in our product).
I've also compared apollo-server performance to express-graphql; the latter works faster, yet still not as fast as returning a custom scalar JSON.
When AMOUNT is set to 500, with apollo-server, the request takes 1.5s.
When AMOUNT is set to 500, with express-graphql, the request takes 800ms.
When AMOUNT is set to 1000, with apollo-server, the request takes 5.4s.
When AMOUNT is set to 1000, with express-graphql, the request takes 3.4s.
The Stack:
"dependencies": {
"apollo-server": "^2.6.1",
"graphql": "^14.3.1",
"graphql-type-json": "^0.3.0",
"lodash": "^4.17.11"
}
The Code:
const _ = require("lodash");
const { performance } = require("perf_hooks");
const { ApolloServer, gql } = require("apollo-server");
const GraphQLJSON = require('graphql-type-json');

// The GraphQL schema
const typeDefs = gql`
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    name: String!
    values: [Unit!]!
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
    hack: String
  }

  type Query {
    complex: Series
  }
`;

const AMOUNT = 500;

// A map of functions which return data for the schema.
const resolvers = {
  Query: {
    complex: () => {
      let before = performance.now();
      const result = {
        data: _.times(AMOUNT, () => ({
          name: "a",
          values: _.times(AMOUNT, () => ({
            name: "a",
            value: "a"
          })),
        })),
        keys: _.times(AMOUNT, () => ({
          name: "a",
          value: "a"
        }))
      };
      let after = performance.now() - before;
      console.log("resolver took: ", after);
      return result;
    }
  }
};

const server = new ApolloServer({
  typeDefs,
  resolvers: _.assign({ JSON: GraphQLJSON }, resolvers),
});

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});
The gql Query for the Playground (for type Series):
query {
  complex {
    data {
      name
      values {
        name
        value
      }
    }
    keys {
      name
      value
    }
  }
}
The gql Query for the Playground (for custom scalar type JSON):
query {
  complex
}
Here is a working example:
https://codesandbox.io/s/apollo-server-performance-issue-i7fk7
Any leads/ideas would be highly appreciated!
There's a related open issue here. Lee Byron summed it up pretty well:
I think the TL;DR of this issue is that GraphQL has some overhead and that reducing that overhead is non-trivial and removing it completely may not be an option. Ultimately GraphQL.js is still responsible for making API boundary guarantees about the shape and type of the returned data and by design does not trust the underlying systems. In other words GraphQL.js does runtime type checking and sub-selection and this has some cost.
The benefits that GraphQL offers (validation, sub-selection, etc.) inevitably incur some overhead as they require additional processing of the data you're returning. And unfortunately, this overhead scales with the size of the data. I imagine if you were to implement a REST endpoint that supported partial responses and did response validation using something like Swagger or Joi, you'd encounter a similar issue.
The "heap out of memory" error means exactly what it says -- you're running out of memory on the heap. You can try to alleviate this by manually increasing the limit.
Typically, large datasets like this should be broken up by implementing pagination. If that's not an option, utilizing a custom scalar will be the next best approach. The biggest downside to this approach is that clients consuming your API will not be able to request specific fields inside the JSON object you return. Outside of patching GraphQL.js, there's really no other alternative to speed up the responses and reduce your memory usage.
Comment summary
This data structure / these types:
are not individual entities;
are just a series of [grouped] data;
don't need normalization;
won't be normalized properly in the Apollo cache (no id fields).
So this dataset is not what GraphQL was designed for. Of course GraphQL can still be used for fetching this data, but type parsing/matching should be disabled.
Using a custom scalar type (graphql-type-json) can be a solution. If you need a hybrid solution, you can type Group.values as JSON (instead of the entire Series). Groups should still have an id field if you want normalized cache access.
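A rough sketch of what that hybrid schema could look like (adding an id to Group is an assumption about your data; only the large values arrays lose per-field typing):

scalar JSON

type Unit {
  name: String!
  value: String!
}

type Group {
  id: ID!        # an id lets the normalized cache identify each group
  name: String!
  values: JSON!  # the large per-group array is returned as an opaque JSON blob
}

type Series {
  data: [Group!]!
  keys: [Unit!]!
}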
Alternative
You can use apollo-link-rest for fetching 'pure' JSON data (a file), leaving type parsing/matching to the client side only.
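A minimal sketch of that setup (import paths assume the Apollo Client 2.x package layout; the endpoint URI is an assumption):

// Assumed imports for an Apollo Client 2.x stack; adjust to your setup.
import { ApolloClient } from "apollo-client";
import { InMemoryCache } from "apollo-cache-inmemory";
import { RestLink } from "apollo-link-rest";

// Point the link at whatever HTTP endpoint serves the raw series JSON;
// queries then mark fields with @rest(type: "...", path: "...") and the
// response body is used as-is, with no per-element type checking.
const restLink = new RestLink({ uri: "https://example.com/api/" });
const client = new ApolloClient({ link: restLink, cache: new InMemoryCache() });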
More advanced alternative
If you want to keep one GraphQL endpoint, write your own link: use directives ('ask for JSON, get typed'), a mix of the two options above. Something like the rest link with de-/serializers.
With both alternatives, ask yourself why you really need this. Just for drawing? It may not be worth the effort. No pagination, but perhaps streaming (live updates?), no cursors, 'load more' (subscriptions/polling) keyed by last update time? Doable, but it doesn't feel right.
I have a filtered list of items based on a getAllItems query, which takes a filter and an order-by option as arguments.
After creating a new item, I want to delete the cache for this query, no matter what variables were passed. I don't know how to do this.
I don't think updating the cache is an option. The methods mentioned in the Apollo Client documentation (updating the cache after a mutation, refetchQueries and update) all seem to need a given set of variables, but since the filter is a complex object (with some text information), I would need to update the cache for every set of variables that was previously submitted, and I don't know how to do that. Plus, only the server knows how this new item impacts pagination and ordering.
I don't think fetch-policy (for instance setting it to cache-and-network) is what I'm looking for either, because while hitting the network is what I want right after creating a new item, when I'm just filtering the list (typing in a search string) I want to keep the default behavior (cache-first).
client.resetStore would reset the store for all types of queries (not only the getAllItems query), so I don't think that's what I'm looking for either.
I'm pretty sure I'm missing something here.
There's no officially supported way of doing this in the current version of Apollo but there is a workaround.
In your update function, after creating an item, you can iterate through the cache and delete all nodes where the key starts with the typename you are trying to remove from the cache, e.g.:
// Loop through all the data in our cache
// and delete any items where the key starts with "Item".
// This empties the cache of all of our items and
// forces a refetch of the data only when it is next requested.
Object.keys(cache.data.data).forEach(key =>
  key.match(/^Item/) && cache.data.delete(key)
)
This works for queries that exist a number of times in the cache with different variables, i.e. paginated queries.
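In practice this loop usually lives in the mutation's update callback, roughly like this (a sketch assuming React hooks; CREATE_ITEM and the Item typename are placeholders):

// Inside a React component; useMutation comes from your Apollo React package.
const [createItem] = useMutation(CREATE_ITEM, {
  update(cache) {
    // Evict every cached entry whose key starts with "Item", regardless of
    // which variables the original getAllItems query was called with.
    Object.keys(cache.data.data).forEach(
      key => key.match(/^Item/) && cache.data.delete(key)
    );
  },
});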
I wrote an article on Medium that goes into much more detail on how this works, as well as an implementation example and an alternative solution that is more complicated but works better in a small number of use cases. Since the article goes into more detail on a concept I have already explained in this answer, I believe it is OK to share it here: https://medium.com/@martinseanhunt/how-to-invalidate-cached-data-in-apollo-and-handle-updating-paginated-queries-379e4b9e4698
This worked for me (it requires Apollo Client 3's InMemoryCache for the cache eviction feature); it clears queries matched by the regexp from the cache.
After clearing the cache, the query will be refetched automatically without needing to trigger the refetch manually (if you are using Angular, gql.watch().valueChanges will perform the XHR request and emit the new value).
export const deleteQueryFromCache = (cache: any, matcher: string | RegExp): void => {
  const rootQuery = cache.data.data.ROOT_QUERY;
  Object.keys(rootQuery).forEach(key => {
    if (key.match(matcher)) {
      cache.evict({ id: "ROOT_QUERY", fieldName: key });
    }
  });
};
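Usage is then a one-liner wherever you have access to the cache, for example inside a mutation's update callback (the regexp is illustrative):

// Drop every cached getAllItems entry, whatever filter/order variables it used.
deleteQueryFromCache(cache, /^getAllItems/);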
Or, ngrx-like:
resolvers = {
  removeTask(
    parent,
    { id },
    { cache, getCacheKey }: { cache: InMemoryCache | any; getCacheKey: any }
  ) {
    const key = getCacheKey({ __typename: "Task", id });
    const { [key]: deleted, ...data } = cache.data.data;
    cache.data.data = { ...data };
    return id;
  }
}
When I insert data through a DAO which references a MyBatis mapper, multiple tables are affected.
public void insertStuff(Collection<Stuff> data) {
    for (Stuff item : data) {
        mapper.insertT1(item.getT1Stuff());
        mapper.insertT2(item.getT2Stuff());
        Collection<MainStuff> mainData = item.getMainStuff();
        for (MainStuff mainItem : mainData) {
            mapper.insertMainData(mainItem);
        }
    }
}
I'm using MyBatis' BATCH executor type, but I'm quickly reaching Oracle's open cursors limit because a new PreparedStatement (and a new cursor) is created for each of the three mapper statements on each iteration through the main loop. I can avoid this by iterating multiple times through the loop:
public void insertStuff(Collection<Stuff> data) {
    for (Stuff item : data) {
        mapper.insertT1(item.getT1Stuff());
    }
    for (Stuff item : data) {
        mapper.insertT2(item.getT2Stuff());
    }
    for (Stuff item : data) {
        Collection<MainStuff> mainData = item.getMainStuff();
        for (MainStuff mainItem : mainData) {
            mapper.insertMainData(mainItem);
        }
    }
}
However, the latter code is less readable, costs a little bit performance-wise, and breaks modularity.
Is there a better way to do this? Do I need to use the SqlSession directly and flush statements after a certain number are queued?
If you want to use batches, you should use the second way. In the first code you don't actually have any batches: a real batch contains N of the same statement. If you execute three different statements and wrap them in a batch, the JDBC driver will split them into three batches of one statement each. In the second code there will be three real batches, which is the fastest option if you have a lot of data.