I read the GraphQL spec and could not find a way to avoid 1 + N * number_of_nested calls. Am I missing something?
For example, say a query has a type client with nested orders and addresses. If there are 10 clients, it will make 1 call for the 10 clients, plus 10 calls for client.orders and 10 more for client.addresses.
Is there a way to avoid this? Note that it is not the same as caching a UUID of something; those are all different values. And if your GraphQL server points at a database that can do joins, this is especially wasteful, because you could fetch everything in 3 queries regardless of the number of clients.
I ask because I want to integrate GraphQL with an API that can fetch nested resources efficiently, and if there were a way to resolve the whole graph before executing it, it would be nice to combine some of the nested fetches into a single call.
Or did I get it wrong, and GraphQL is meant to be used only with microservices?
This is one of the difficulties of GraphQL's "resolver architecture": you must avoid incurring a ton of network latency from doing a lot of I/O in each resolver. Apps using a SQL DBMS often grapple with the N + 1 problem at first. You need to use some batching and/or caching techniques to get around this.
If you are using Node.js on the server, I have two tools to recommend:
DataLoader - A database-agnostic tool for batching resolvers for each field and caching individual records.
Join Monster - A SQL-tailored tool that reads each query and your schema and compiles a SQL query for you. It leverages JOINs and DataLoader-style batching to fetch the data from your tables in just a few SQL queries, or even a single one.
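To make the batching concrete, here is a minimal DataLoader sketch in TypeScript for the clients/orders example from the question. The Order shape and the queryOrders helper are hypothetical stand-ins for your own data layer:

import DataLoader from "dataloader";

interface Order { id: string; clientId: string; total: number; }

// Stand-in for your real data access, e.g. one
// SELECT * FROM orders WHERE client_id IN (...) round trip.
async function queryOrders(clientIds: readonly string[]): Promise<Order[]> {
  return [];
}

// Create one loader per request. Every ordersByClient.load(clientId) issued
// while resolving a single query is collected and fired as one batched fetch.
const ordersByClient = new DataLoader<string, Order[]>(async (clientIds) => {
  const rows = await queryOrders(clientIds);
  // DataLoader requires results in the same order as the input keys.
  return clientIds.map((id) => rows.filter((row) => row.clientId === id));
});

const resolvers = {
  Client: {
    // Ten clients still call load() ten times, but DataLoader coalesces
    // them into a single database round trip instead of ten.
    orders: (client: { id: string }) => ordersByClient.load(client.id),
  },
};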
I assume you're talking about using GraphQL with a SQL database backend. The standard itself is database agnostic and doesn't care how you work around potential N+1 SELECT issues in your code. That said, specific server-side GraphQL implementations offer many different ways of mitigating that problem:
AFAIK, the Ruby implementation is able to make use of Active Record and gems such as bullet to apply horizontal batching of the executed database calls.
The JavaScript implementation may make use of the DataLoader library, which applies a similar technique of batching a series of executed promises together. You can see it in action here.
The Elixir and Python implementations have a concept of runtime information about executed subqueries, which can be used to determine which data will be needed later in the query's execution and potentially prefetch it.
The F# implementation works similarly to Elixir's, but the plugin itself can perform live analysis of the execution tree to better describe which fields may be used, allowing an easier split between the GraphQL domain model and the database model.
Many implementations (e.g. PostGraph) tie the underlying database model directly into the GraphQL schema. In this case a GraphQL query is often translated directly into the database query language.
Related
I am currently working on a project where I have to retrieve some rows from the database based on some filters (I also have to paginate them).
My solution was to write a function that generates the queries and to query the database directly (it works and it's fast).
When I presented this solution to the senior programmer he told me this is going to work but it's not a long-term solution and I should rather use Spring Specifications.
Now here come my questions:
Why is Spring Specifications better than generating a query?
Is a query generated by Spring Specifications faster than a normal query?
Is it that big of a deal to use hard-coded queries?
Is there a better approach to this problem?
I should mention that the tables in the database don't store a lot of data; the biggest one (which will be queried the least) has around 134,000 rows one year after the application was launched.
The tables have indexes on the columns that we will use to filter.
A "function that generates the queries" sounds like building query strings by concatenating smaller parts based on conditions. Even presuming this is a JPQL query string and not a native SQL string that would be DB dependent, there are several problems:
you lose the IDE's help if you ever refactor your entities
not easy to modularize and reuse parts of the query-generation logic (e.g. if you want to extract a method that adds the same conditions to a bunch of different queries with different joins and table aliases)
easy to break the syntax of the query with a typo (e.g. "a=b" + "and c=d" silently produces "a=band c=d"; see the sketch after this list)
more difficult to debug
if your queries are native SQL then you also become dependent on a specific database (e.g. maybe you want your integration tests to run on an in-memory DB while production runs on a regular DB)
if all the queries in your project are generated one way but yours is generated differently (without a good reason), then maintenance of the project will be more difficult
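As a framework-agnostic illustration (in TypeScript for brevity; in Java, Spring's Specification API plays the structured-builder role), here is a sketch of the kind of query-generating function under discussion. The table and parameter names are made up:

interface PersonFilter { departmentId?: number; name?: string; }

// Fragile style: a forgotten space silently yields "...where a = 1and c = 2".
const broken = "select * from person where a = 1" + "and c = 2";

// More careful style: collect conditions and join them with a separator,
// so the SQL syntax cannot be broken by a missing space or a dangling AND.
function buildQuery(filter: PersonFilter): string {
  const conditions: string[] = [];
  if (filter.departmentId !== undefined) conditions.push("department_id = :dept");
  if (filter.name !== undefined) conditions.push("name like :name");
  const where = conditions.length ? " where " + conditions.join(" and ") : "";
  return "select * from person" + where;
}

console.log(buildQuery({ departmentId: 3 }));
// -> select * from person where department_id = :dept

Note that even the careful version is still just strings: rename an entity field and no tool will flag these queries, whereas a Specification is composed of objects checked against your entity model. That is the gap the senior programmer is pointing at.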
JPA frameworks generate optimized queries for the most common use cases, so generally speaking you'll get at least the same speed from a Specification query as from a native one. There are times when you need to write native SQL to further optimize a query, but those are exceptional cases.
Yes, it's bad practice that makes maintenance a nightmare
I'm new to GraphQL and am reading about the N+1 issue and the dataloader pattern for increasing performance. I'm looking at starting a new GraphQL project with DynamoDB for the database. I've done some initial research and found a couple of small NPM packages for dataloader and DynamoDB, but they do not seem to be actively supported. So it seems to me, from my initial research, that DynamoDB may not be the best choice for backing an Apollo GraphQL app.
Is it possible to implement the dataloader pattern against a DynamoDB database?
DataLoader doesn't care what kind of database you have. All that really matters is that there's some way to batch up your operations.
For example, for fetching a single entity by its ID, with SQL you'd have some query that's a bit like this:
select * from product where id = SOME_ID_1
The batch equivalent of this might be an IN query as follows:
select * from product where id in (SOME_ID_1, SOME_ID_2, SOME_ID_3)
The actual mechanism for single vs. batch querying will vary depending on what database you're using; it may not always be possible, but it usually is. A quick search shows that DynamoDB has BatchGetItem, which might be what you need.
Batching up queries that take additional parameters (such as pagination, or complex filtering) can be more challenging and may or may not be worth the effort. But batching anything that looks like "get X by ID" is always worth it.
In terms of finding libraries that support DataLoader and DynamoDB in particular, I wouldn't worry about it. You don't need this level of tooling. As long as there's some way of constructing the database query, and you can put it inside a function that takes an array of IDs and returns a result in the right shape, you can do it -- and this usually isn't complicated enough to justify adding another library.
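As a rough sketch of what that function can look like, here is a DataLoader batch function over BatchGetItem using the AWS SDK v3 document client. The table name and key shape are assumptions, and a production version should also retry any UnprocessedKeys returned in the response:

import DataLoader from "dataloader";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { BatchGetCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const TABLE = "Products"; // hypothetical table
const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// BatchGetItem accepts at most 100 keys per call, so cap the batch size.
const productLoader = new DataLoader<string, Record<string, unknown> | undefined>(
  async (ids) => {
    const result = await docClient.send(
      new BatchGetCommand({
        RequestItems: { [TABLE]: { Keys: ids.map((id) => ({ id })) } },
      })
    );
    // The response is unordered; re-align it with the input keys.
    const items = result.Responses?.[TABLE] ?? [];
    const byId = new Map(items.map((item) => [item.id as string, item]));
    return ids.map((id) => byId.get(id));
  },
  { maxBatchSize: 100 }
);

// In a resolver: (parent) => productLoader.load(parent.productId)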
Currently our system is designed in a very ad-hoc manner. There are cases where we have datastore entities designed as
NameSpace: ProjectName
Kind: <SpecificUseCaseLikeSQLTables>
Then there are cases where we have defined our entities such as
Namespace: <SomeKeyWhichUniquelyDefineAnObject>
Kind: <SpecificUseCaseLikeSQLTables>
Now we are in a situation where a single call from a user takes around 10 seconds to respond. Looking into that function, it appears we end up fetching multiple entities for one specific use case. Right now I am trying to see how many of those calls can be avoided (i.e., if those entities haven't changed, they should be passed down to the nested functions rather than fetched again). But besides that, I am wondering whether there is a way to issue only one query to Datastore to fetch data from multiple namespaces/kinds (as described above).
In layman terms, I am asking, is there a concept of joins in Datastore? Or an alternative to it?
Joins are not supported in GAE. You could check the following documentation (http://code.google.com/appengine/docs/java/datastore/jdo/relationships.html).
If you are looking for an RDBMS-style database, you can try using Cloud SQL (https://developers.google.com/cloud-sql/docs/introduction).
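The question is in a GAE context, but as an illustration of the usual join alternative in Datastore (batched key lookups), here is a sketch with the modern Node.js client, @google-cloud/datastore. The Client/Order kinds and the orderKeys property are assumptions about how the entities reference each other:

import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();

// No JOIN exists, but one level of it can be emulated in two round trips:
// batch-fetch the parents, gather the keys they reference, then batch-fetch
// all referenced entities in a single second call.
async function getClientsWithOrders(clientIds: string[]) {
  const clientKeys = clientIds.map((id) => datastore.key(["Client", id]));
  // One batched lookup; note the result order is not guaranteed to match.
  const [clients] = await datastore.get(clientKeys);

  const orderKeys = clients.flatMap((c: any) => c.orderKeys ?? []);
  const [orders] = orderKeys.length ? await datastore.get(orderKeys) : [[]];
  return { clients, orders };
}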
I have to implement some filters, such as getting the persons who are in a given department, and I was wondering about the best way to do it.
Some of them are going to require joining multiple tables.
Does anyone know about the main differences between CDbCriteria and Query Builder? I would particularly like to know about the compatibility with databases.
I found this in the Yii documentation about Query Builder:
It offers certain degree of DB abstraction, which simplifies migration to different DB platforms.
Is it the same for the CDbCriteria objects? Is it better?
The concept of CDbCriteria is used when working with Yii's active record (AR) abstraction (which is usually all of the time). AR requires that you have created models for the various tables in your database.
Query builder is a very different way to access the database; in effect, it is a structured wrapper that allows you to programmatically construct an SQL query instead of just writing it out as a string (as an added bonus, it also offers a degree of database abstraction, as you mention).
In a typical application there would be little to no need to use query builder because AR already provides a great deal of functionality and it also offers the same degree of database abstraction.
In some cases you might want to run a very specific type of query that is not convenient or performant to issue through AR. You then have two options:
If the query is fixed or almost fixed then you can simply issue it through DAO; in fact the query builder documentation mentions that "if your queries are simple, it is easier and faster to directly write SQL statements".
If the query needs to be dynamically constructed then query builder becomes a good fit for the job.
So as you can see, query builder is not all that useful most of the time. Only if you want to write very customized and at the same time dynamically constructed queries does it make sense to use it.
The example feature that you mention can and should be implemented using AR.
I just watched the latest Channel 9 video on the upcoming parallel extensions to .NET.
How would you use this in a web app? I'm specifically thinking of using the parallel LINQ extensions against a SQL DB. Would it make sense as a way to speed up your data access layer in a multi-user server app? What are the issues (aside from the obvious thread-safety issues with static collection types)?
I think this paragraph extracted from this article explains the usage of PLINQ-to-SQL:
LINQ-to-SQL and LINQ-to-Entities queries will still be executed by the respective databases and query providers, so PLINQ does not offer a way to parallelize those queries. If you wish to process the results of those queries in memory, including joining the output of many heterogeneous queries, then PLINQ can be quite useful.
As for the usage of PLINQ in a web app: if the request requires many in-memory calculations for which PLINQ might be useful (for example, if you have several data sources that you'd like to query together), I see no problem in using it.
Parallel LINQ is primarily intended to work against in-memory collections, I believe. How were you anticipating using it against your database?
Given that web apps are naturally pretty parallel (in terms of separate requests executing on separate threads etc.), I suspect that PLINQ won't really apply to them much.