I have a GraphQL API implemented with Nexus-graphql and Prisma + Postgres.
Prisma has a nice solution for the N+1 issue with query batching when it comes to GraphQL, but it seems to work for only two levels of nesting, e.g. User -> Posts; when we need multiple levels of GraphQL nesting (User -> Posts -> Comments -> Likes -> Users), it seems to execute hundreds of queries.
What are the general best practices for doing query optimization in such cases?
I have two ideas, but maybe there are more:
Using Prisma's "include" option: construct a map of Prisma relations based on the received GraphQL query (from the "info" argument) and query all the relations at once in a single query.
Using a DataLoader: query all the necessary data, save it to the DataLoader, and load models from the DataLoader within the resolvers (see the sketch below).
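A minimal sketch of the DataLoader idea, assuming a Post -> Comments relation and a Prisma Comment model with a postId foreign key (these names are illustrative, not taken from the actual schema). The loader batches every comment lookup made during one resolution tick into a single findMany and fans the results back out per post; in practice you would create one loader per request (e.g. on the GraphQL context) so its cache does not leak between requests.

```typescript
import DataLoader from "dataloader";
import { PrismaClient, Comment } from "@prisma/client";

const prisma = new PrismaClient();

// Batch all requested post IDs into one query, then return the comments
// in the same order as the keys (DataLoader's contract).
export function createCommentsByPostIdLoader() {
  return new DataLoader<number, Comment[]>(async (postIds) => {
    const comments = await prisma.comment.findMany({
      where: { postId: { in: [...postIds] } },
    });
    const byPostId = new Map<number, Comment[]>();
    for (const comment of comments) {
      const bucket = byPostId.get(comment.postId) ?? [];
      bucket.push(comment);
      byPostId.set(comment.postId, bucket);
    }
    return postIds.map((id) => byPostId.get(id) ?? []);
  });
}

// In the Post.comments resolver, instead of one findMany per post:
// return ctx.commentsByPostIdLoader.load(post.id);
```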
Related
1. I want to create a GraphQL query that can filter by multiple fields.
It is about filtering details of statistics. For example, a statistic contains the fields "Number of deaths", "Number of cases", and "Number of recovered".
I have already written queries that can filter by the individual fields. Now I want to write a query that uses multiple filters, or a query in which multiple queries are nested.
I have already tried to define the individual steps of each query in a common query. You can see this in the attached images. The program compiles at first; however, when I execute the query in the GraphQL UI, I get error messages.
2. Unfortunately, I have not yet received any helpful tips regarding my query or my error.
Screenshot
At the top left you can see the individual queries, at the top right the merged query and at the bottom the errors as soon as I try to execute the query.
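Since the actual schema is only visible in the screenshots, the following is only a sketch of the usual pattern: expose one field with several optional filter arguments instead of nesting separate queries. All names (statistics, minDeaths, minCases, minRecovered) are invented for illustration and are not taken from the asker's schema.

```typescript
// Hypothetical combined query: one field, several optional filter arguments,
// any subset of which may be supplied together in a single request.
const COMBINED_FILTER_QUERY = /* GraphQL */ `
  query CombinedFilters($minDeaths: Int, $minCases: Int, $minRecovered: Int) {
    statistics(minDeaths: $minDeaths, minCases: $minCases, minRecovered: $minRecovered) {
      country
      deaths
      cases
      recovered
    }
  }
`;

// Example variables for a single request combining all three filters.
const variables = { minDeaths: 100, minCases: 1000, minRecovered: 500 };
```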
GraphQL related resources explain how query analysis can be done to protect the GraphQL endpoint. Some of the approaches which are being used are query depth analysis, query complexity analysis, etc. The question that I have is, does Query Analysis only refer to Queries? What about Mutations and Subscriptions? Or are all 3 (Query, Mutation, Subscription) included when we talk about query analysis?
Apollo Tracing is one of the query analysis tools, and based on how I have used it in graphql-java, it can be applied to Query, Mutation, and Subscription alike. So I believe the term query analysis can apply to all of them.
After all, all of them are handled in pretty much the same way internally inside a GraphQL engine, as defined by the spec. In the Executing Operations section, you can see that Query and Mutation share the same execution logic. The only difference is that a Query's fields are allowed to execute in parallel, while a Mutation's top-level fields can only execute serially.
Then, in the subscription Response Stream section, it mentions:
The ExecuteSubscriptionEvent() algorithm is intentionally similar to
ExecuteQuery() since this is how each event result is produced.
which means that, in the end, its execution logic is the same as a Query's.
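To make this concrete, here is a minimal depth-analysis sketch in TypeScript using the graphql reference parser; the limit and error text are made up, and fragments are ignored for brevity. The check walks the selection set of any operation definition, so the same analysis covers Query, Mutation, and Subscription.

```typescript
import { parse, SelectionSetNode } from "graphql";

// Recursively compute the depth of a selection set (fragments ignored for brevity).
function selectionDepth(selectionSet: SelectionSetNode | undefined): number {
  if (!selectionSet) return 0;
  return (
    1 +
    Math.max(
      0,
      ...selectionSet.selections.map((sel) =>
        sel.kind === "Field" ? selectionDepth(sel.selectionSet) : 0
      )
    )
  );
}

// Reject any operation (query, mutation, or subscription) that nests too deeply.
export function assertDepthWithinLimit(source: string, maxDepth = 10): void {
  for (const def of parse(source).definitions) {
    if (def.kind === "OperationDefinition") {
      const depth = selectionDepth(def.selectionSet);
      if (depth > maxDepth) {
        // def.operation is "query", "mutation", or "subscription"
        throw new Error(`${def.operation} exceeds max depth ${maxDepth} (got ${depth})`);
      }
    }
  }
}
```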
I am just using GraphDB EE for evaluation.
I intend to migrate my big data from Cassandra to GraphDB, but I read in the docs that it can contain at most 2^40 entities (roughly 1.1 trillion). I have a few questions regarding it:
Is there a way to extend to unlimited entities?
I want to use many repositories to manage my data; is there a way to connect them so they can be used as a single repo?
Is there a way to search on multiple entities and on multiple properties (already indexed in Elasticsearch) per entity?
Do I need to create each ES connector with all properties per entity to get the best performance?
David, please see the quick answers below.
Is there a way to extend to unlimited entities?
2^40 means 1T entities. Do you really need more than this?
Entities in GraphDB are the nodes in the graph: URIs, literals, and blank nodes. On average, you would have multiple edges/statements per node (say 5x).
I want to use many repositories to manage my data; is there a way to connect them so they can be used as a single repo?
Yes, please see the so-called internal federation, which allows you to efficiently do federation in a SPARQL query across repositories in one and the same GraphDB instance.
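For illustration, a rough sketch of what a federated query could look like from a client, assuming two repositories named people and statistics in the same GraphDB instance on localhost:7200 (the repository names, IRIs, and endpoint are made up). The SERVICE <repository:...> clause is the internal-federation form referred to above.

```typescript
// SPARQL query run against the "people" repository; the SERVICE clause pulls
// matching triples from the "statistics" repository in the same GraphDB instance.
const federatedQuery = `
  SELECT ?person ?city WHERE {
    ?person a <http://example.org/Person> .
    SERVICE <repository:statistics> {
      ?person <http://example.org/livesIn> ?city .
    }
  }
`;

// Send the query over the standard SPARQL protocol endpoint of the repository.
async function runFederatedQuery(): Promise<void> {
  const response = await fetch("http://localhost:7200/repositories/people", {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: federatedQuery,
  });
  console.log(await response.json());
}
```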
Is there a way to search on multiple entities and on multiple properties (already indexed in Elasticsearch) per entity?
I am not sure I understand your question. You can definitely embed multiple FTS queries in a single SPARQL query. Those FTS queries can search for different entities using different fields. You can read more on this here.
Do I need to create each ES connector with all properties per entity to get the best performance?
You can have multiple indices for one and the same repo. The best way to boost performance is to have specific indices (on specific properties/fields with specific filters) for those queries which are most critical for you.
The overhead of adding indexes is well-documented, but I have not been able to find good information on when to use multiple indexes with regards to the various document types being indexed.
Here is a generic example to illustrate the question:
Say we have the following entities
Products (Name, ProductID, ProductCategoryID, List-of-Stores)
Product Categories (Name, ProductCategoryID)
Stores (Name, StoreID)
Should I dump these three different types of documents into a single index, each with the appropriate Elasticsearch type?
I am having difficulty establishing where to draw the line on one vs. multiple indexes.
What if we add an unrelated entity, "Webpages"? Definitely a separate index?
A very interesting video explaining Elasticsearch "Data Design Patterns" by Shay Banon:
http://vimeo.com/44716955
This exact question is answered at 13:40, where different data flows are examined by looking at the concepts of Type, Filter, and Routing.
Regards
I was recently modeling an Elasticsearch backend from scratch, and from my point of view, the best option is putting all related document types in the same index.
I read that some people had problems with too many concurrent indexes (one index per type). It's better for performance and robustness to unify related types in the same index.
Besides, if the types are in the same index, you can use the "_parent" field to create hierarchical models, which give you interesting search features such as "has_child" and "has_parent", and of course you do not have to duplicate data in your model.
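For illustration, a rough sketch of the request bodies this describes, written as plain objects. It assumes a pre-6.x Elasticsearch (the _parent mapping was later replaced by the join field type), and the index, type, and field names are made up.

```typescript
// Mapping for a "blog" index where "comment" documents declare "post" as
// their parent type. Sent as the body of PUT /blog.
const createIndexBody = {
  mappings: {
    post: {
      properties: { title: { type: "text" } },
    },
    comment: {
      _parent: { type: "post" },
      properties: { body: { type: "text" } },
    },
  },
};

// Search for posts that have at least one matching comment.
// Sent as the body of POST /blog/post/_search.
const hasChildSearchBody = {
  query: {
    has_child: {
      type: "comment",
      query: { match: { body: "elasticsearch" } },
    },
  },
};
```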
Sorry for the ambiguous title, couldn't think of anything better fitting.
I'm exploring Elasticsearch and it looks very cool. My question is conceptual, since I'm used to SQL.
In SQL, you have different databases and you store the data for each application there. Does the same concept exist in ES? Or is all data from all my applications going to end up in the same place? In that case, what are the best practices to avoid unwanted results from unrelated data?
Schemaless doesn't mean structureless:
In elastic search you can organize your data into document collections
A top-level document collection is roughly equivalent to a database
You can also hierarchically create new document collections inside top-level collections, which is a very rough equivalent of a database table
When you search documents, you search inside specific document collections (such as searching for all posts inside blog1)
Individual documents can be viewed as equivalent to rows in a database table
Also, please note that I say roughly equivalent -- data in SQL is often normalized into tables by relations, while documents in ES often hold large, self-contained chunks of data. For instance, it generally makes sense to embed all comments inside a blog post document, whereas in SQL you would normalize comments and blog posts into separate tables.
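As a small illustration of that embedding (field names are invented, not part of any particular schema):

```typescript
// A denormalized blog-post document: comments that would live in a separate
// SQL table (joined by post_id) are embedded directly in the post document.
interface BlogPostDocument {
  title: string;
  body: string;
  author: string;
  comments: Array<{
    author: string;
    text: string;
    postedAt: string; // ISO date string
  }>;
}

const post: BlogPostDocument = {
  title: "Modeling data in Elasticsearch",
  body: "Schemaless doesn't mean structureless...",
  author: "alice",
  comments: [
    { author: "bob", text: "Nice write-up!", postedAt: "2014-06-01T12:00:00Z" },
  ],
};
```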
For a nice tutorial, I recommend taking a look at the "ElasticSearch in 5 minutes" tutorial.
Switching from SQL to a search engine can be challenging at times. Elasticsearch has a concept of an index, which can be roughly mapped to a database, and a type, which can, again very roughly, be mapped to a table. Elasticsearch has a very powerful mechanism for selecting records (rows) of a single type and combining results from different types and indices (union). However, there is no support for joins at the moment. The only relationship that Elasticsearch supports is has_child, but it's not suitable for modeling many-to-many relationships. So, in most cases, you need to be prepared to denormalize your data so it can be stored in a single type.
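A small sketch of that denormalization for a many-to-many relationship (e.g. products carried by stores; all field names are illustrative):

```typescript
// Instead of a product_stores join table, each product document embeds the
// names of the stores that carry it, so queries need no join at search time.
interface ProductDocument {
  productId: string;
  name: string;
  category: string;  // denormalized from the categories table
  stores: string[];  // denormalized from the product <-> store join table
}

const product: ProductDocument = {
  productId: "p-42",
  name: "Espresso machine",
  category: "Kitchen",
  stores: ["Downtown", "Airport"],
};

// "Products sold in the Downtown store" is then a simple term/match query
// on the embedded `stores` field.
```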