Benefits of using GraphQL in Magento 2

Can someone explain the benefits of using GraphQL on a Magento/Magento 2 site?
Is it really faster than a normal query or using collections? From what I see, you still have to set/fetch all the data in the resolver declared in schema.graphql so that it is available on every request.
Is it faster because each set of data is cached by GraphQL, or is there some other logic behind it that makes it faster?
For example, when you just need the name and description of a product, you would simply call getCollection()->addAttributeToSelect(['name', 'description'])->addAttributeToFilter('entity_id', $id)->getFirstItem() inside your block, whereas in a GraphQL request the data is fetched via the resolver, which fetches all of the product's data.
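For comparison, here is a rough sketch of what the GraphQL side of that example could look like (the /graphql endpoint is standard in Magento 2.3+, but the exact filter fields available, such as the sku filter used here, depend on the Magento version and schema):
// Rough sketch only: fetch the same two attributes through Magento's GraphQL endpoint.
// The store URL and SKU are placeholders; filter fields vary by Magento version.
const query = `
  {
    products(filter: { sku: { eq: "24-MB01" } }) {
      items {
        name
        description { html }
      }
    }
  }`;

const response = await fetch("https://example.com/graphql", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query }),
});
const { data } = await response.json();
console.log(data.products.items[0]?.name);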

Regarding performance: GraphQL is normally faster than the equivalent REST API when the client needs to fetch a graph of data, assuming the GraphQL API is implemented correctly. Otherwise, it can easily run into the N+1 loading problem, which will make the API slow.
It is faster mainly because the client only needs to send one request to get the whole graph of data, while with a REST API the client has to send many separate HTTP requests to assemble that graph. The number of network round trips is reduced to one in the GraphQL case, hence it is faster. (Of course, this assumes there is no single equivalent REST endpoint that returns the whole graph. 😉)
In other words, if you only fetch a single record, you will not see much performance difference between the REST API and the GraphQL API.
But beyond performance, what a GraphQL API offers is letting the client request exactly the fields and exactly the graph of data it needs, which is difficult to achieve with a REST API.
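As a rough illustration, with purely hypothetical endpoints and field names, just to show the difference in round trips:
// REST: one request per resource in the graph (three round trips).
const product = await (await fetch("/api/products/42")).json();
const related = await (await fetch("/api/products/42/related")).json();
const reviews = await (await fetch("/api/products/42/reviews")).json();

// GraphQL: the whole graph, with only the needed fields, in a single request.
const { data } = await (
  await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: `{
        product(id: 42) {
          name
          related { name }
          reviews { summary }
        }
      }`,
    }),
  })
).json();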

Related

Retrieving all fields vs only some in GraphQL, time comparison

I am currently working on a project involving GraphQL, and I was wondering whether retrieving every element of a given type in a query takes significantly more time than retrieving only some of them, or whether the difference is negligible.
Here is an example:
fragment GlobalProtocolStatsFields on GlobalProtocolStats {
  totalProfiles
  totalBurntProfiles
  totalPosts
  totalMirrors
  totalComments
  totalCollects
  totalFollows
  totalRevenue {
    ...Erc20AmountFields
  }
}
vs
fragment GlobalProtocolStatsFields on GlobalProtocolStats {
  totalProfiles
  totalBurntProfiles
  totalPosts
  totalMirrors
}
Thanks in advance!
The answer depends heavily on the implementation on the backend side. Let's look at the three stages the data goes through and how each of them can impact response time.
1. Data fetching from the source
First, the GraphQL server has to fetch the data from the database or another data source. Some data sources allow you to specify which fields you want to receive. If the GraphQL service is optimised to fetch only the data that is needed, some time can be saved here. In my experience, it is often not worth doing this, and it is much easier to just fetch all the fields that could be needed for an object type. Some GraphQL implementations do this automatically, e.g. Hasura, PostGraphile, Pothos with the Prisma plugin. What can be more expensive is resolving relationships between entities: often the GraphQL implementation has to make another round trip to the data source.
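As a side note, one common way to avoid that extra round trip per entity (the N+1 problem) in a JavaScript GraphQL server is to batch the relationship lookups, e.g. with the dataloader package. A minimal sketch, where findUsersByIds stands in for your actual data access:
import DataLoader from "dataloader";

// Hypothetical data-access helper: one query with WHERE id IN (...).
declare function findUsersByIds(ids: string[]): Promise<{ id: string; name: string }[]>;

// Batches all author lookups made during one request into a single extra round trip.
// (In a real app you would create one loader per request.)
const userLoader = new DataLoader(async (ids: readonly string[]) => {
  const users = await findUsersByIds([...ids]);
  const byId = new Map(users.map((u) => [u.id, u] as const));
  return ids.map((id) => byId.get(id) ?? null);
});

export const resolvers = {
  Post: {
    // Called once per post, but DataLoader coalesces the loads into one batch.
    author: (post: { authorId: string }) => userLoader.load(post.authorId),
  },
};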
2. Data transformation and business logic
Sometimes, the data has to be transformed before it is returned from the resolver. The resolver model allows this business logic to be called conditionally. Leaving out a field will skip its resolver. In my experience, most business logic is incredibly fast and does not really impact response time.
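As a small illustration in a JavaScript GraphQL server, the field resolver below only runs when the query actually selects totalRevenue; computeRevenueBreakdown is a hypothetical, expensive piece of business logic:
// Hypothetical, expensive piece of business logic.
declare function computeRevenueBreakdown(statsId: string): Promise<{ currency: string; value: string }[]>;

export const resolvers = {
  GlobalProtocolStats: {
    // Only executed when a query selects totalRevenue; queries that leave the
    // field out (like the second fragment above) never pay this cost.
    totalRevenue: (stats: { id: string }) => computeRevenueBreakdown(stats.id),
  },
};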
3. Data serialisation and network
Once all the data is ready on the server side, it has to be serialised to JSON and sent to the client. Serialising large amounts of data can be expensive, especially because GraphQL responses are hard to serialise in a streaming fashion. Sending the data to the client can also take a while if the connection is slow or the payload is large. This was one of the motivations for GraphQL: let the client select the required fields and so reduce the amount of unused data transferred.
Summary
As you can see, the response time is mostly related to the amount of data returned from the API and the network connection. Depending on the implementation, real savings are only made on the network, but more advanced implementations can drastically reduce the work done on the server as well.

Hasura - to call an HTTP service API and insert the response into PostgreSQL

I've already made an action of type query that calls an HTTP endpoint and returns a list of results.
Then I need to insert this result into PostgreSQL (I suppose through a mutation).
So, how can I chain this insert mutation to the previous query result, and possibly apply some custom logic (e.g. not inserting records that are already present)?
I was looking into this myself a couple of days ago, and my takeaway so far was that this is currently not possible. You would still have to write a small service (e.g. an AWS Lambda) that calls your action and feeds the result into the mutation. That is also where you can apply your business logic.
It would be a great feature to have, in order to connect two APIs directly together or even just to move data from one place to another.
The new REST transformers released in 2.1 at least make it easier and faster to integrate with existing APIs, so all you need to do now is the plumbing.
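For what it's worth, here is a minimal sketch of such a glue service; the endpoint, admin secret, action name, table and constraint names are all placeholders, and the on_conflict clause with an empty update_columns list makes Hasura skip rows that already exist:
// Sketch of the small glue service (e.g. an AWS Lambda) described above.
// URL, secret, action name, table and constraint names are hypothetical.
const HASURA_URL = "https://my-hasura.example.com/v1/graphql";

async function gql(query: string, variables?: Record<string, unknown>) {
  const res = await fetch(HASURA_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET ?? "",
    },
    body: JSON.stringify({ query, variables }),
  });
  return (await res.json()).data;
}

export async function syncExternalResults() {
  // 1. Run the action, which calls the external HTTP API.
  const data = await gql(`{ fetchExternalResults { id name } }`);

  // 2. Apply any custom logic, then insert. An empty update_columns list in
  //    on_conflict means already-present rows are skipped, not updated.
  await gql(
    `mutation ($objects: [results_insert_input!]!) {
      insert_results(
        objects: $objects,
        on_conflict: { constraint: results_pkey, update_columns: [] }
      ) { affected_rows }
    }`,
    { objects: data.fetchExternalResults }
  );
}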

Investigate performance of response serialization with NestJS and GraphQL

I am looking into a performance issue with serialization in a Node.js backend. I would like some suggestions about how to investigate what happens after the app logic in the service has returned its response.
Currently there is a bad query, executed with TypeORM, that returns about 12000 rows. The speed of this query is not a problem, but once the result is returned from the service, it takes about 100 seconds for the API to actually return the response. The application uses NestJS with GraphQL as the API.
I suspect that some heavy serialization is done either in Apollo Server or in NestJS. How do I investigate this further? And is the large size of the database query result the only issue here, or could it be something else?
The real problem is that this blocks the Node.js event loop for about 100 seconds, which freezes the whole backend.
Through console.log debugging I discovered that TypeORM was not the problem. Most of the time was not spent in TypeORM, nor in the service or the resolver. It was spent somewhere after the resolver returned, which led me to suspect Apollo Server itself.
When returning the same data from the same service through a regular REST controller, it only took about a second. What I ended up doing was calling JSON.stringify on the response data inside the resolver and typing the GraphQL response as a string. For this particular case that was fine, since the data was quite isolated from the rest of the system anyway.
The issue was probably in the part of Apollo Server that validates the types of the returned data, but that is mostly a guess.
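For reference, a rough sketch of that workaround in a NestJS code-first resolver; ReportService and reportRowsJson are placeholder names for the service and query in question:
import { Query, Resolver } from "@nestjs/graphql";

// Hypothetical service returning the ~12000-row result set.
// (In a real app this would be an @Injectable() provider.)
class ReportService {
  async findAll(): Promise<unknown[]> {
    return [];
  }
}

@Resolver()
export class ReportResolver {
  constructor(private readonly reportService: ReportService) {}

  // Exposed as a plain String, so Apollo does not walk and type-check
  // every field of every row; the client parses the JSON itself.
  @Query(() => String)
  async reportRowsJson(): Promise<string> {
    const rows = await this.reportService.findAll();
    return JSON.stringify(rows);
  }
}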
Are you sure that the actual query being executed returns only 12000 rows, or is 12000 just the number of rows eventually returned from the API?
It is very likely that you are returning many more rows to the NestJS backend which then need to be normalized into the actual result set you receive from the API. This is an easy problem to run into if you're doing a lot of joins and is related to the concept of Object-relational impedance mismatch.
In the past I had a similar problem where my result set from the API only returned a few thousand rows but over 400 thousand rows were sent back to TypeORM which then had to flatten them appropriately and caused the exact performance issue that you're running into.
I highly recommend that you check the generated SQL for the problematic query and then run it manually on the DB to see how many rows you actually get back.
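With TypeORM you can inspect that SQL (and the raw row count) by enabling query logging on the connection, or by calling getSql() on a query builder. A sketch with placeholder connection details:
import { DataSource } from "typeorm";

// Placeholder connection details; the point is the logging option.
export const dataSource = new DataSource({
  type: "postgres",
  host: "localhost",
  username: "app",
  password: "secret",
  database: "app",
  logging: ["query"], // prints every generated SQL statement
});

// Alternatively, print the SQL of one specific query builder:
// console.log(repository.createQueryBuilder("s").leftJoinAndSelect(...).getSql());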

Reduce data size returned from an API through GraphQL?

We use one endpoint that returns a massive amount of data, and sometimes the page takes 5-10 seconds to load. We don't have control over the backend API.
Is there a way to reduce the size of what is downloaded from the API?
We have already enabled compression.
I heard that GraphQL lets you define a data schema before querying it. Would GraphQL help in this case?
GraphQL could help, assuming:
Your existing API request is doing a lot of overfetching, and you don't actually need a good chunk of the data being returned
You have the resources to set up an additional GraphQL server to serve as a proxy to the REST endpoint
The REST endpoint response can be modeled as a GraphQL schema (this might be difficult or outright impossible if the object keys in the returned JSON are subject to change)
The response from the REST endpoint can be cached
The extra latency introduced by adding the GraphQL server as an intermediary is sufficiently offset by the reduction in response size
Your GraphQL server would have to expose a query that makes a request to the REST endpoint and then caches the response server-side. The cached response would be used for subsequent queries to the server, until it expires or is invalidated. Caching the response is key; otherwise, proxying the request through the GraphQL server will only make your queries slower, since getting the data from the REST endpoint to the GraphQL server will itself take approximately as long as your requests do now.
GraphQL can then be used to cut down the size of your response in two ways:
By not requesting certain fields that aren't needed by your client (or omitting these fields from your schema altogether)
By introducing pagination. If the reason for the bloated size of your response is the sheer number of records returned, you can add pagination to your schema and return smaller chunks of the total list of records one at a time.
Note: the latter can be a significant optimization, but can also be tricky if your cache is frequently invalidated
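To make the idea concrete, here is a minimal sketch of such a caching proxy resolver; the REST URL, the report field and the in-memory TTL cache are placeholders (a shared cache such as Redis would usually be a better fit):
// Sketch: cache the full REST response server-side so clients can request
// only the fields they need.
const REST_URL = "https://legacy-api.example.com/big-report";
const TTL_MS = 5 * 60 * 1000;

let cached: { data: unknown; expiresAt: number } | null = null;

async function fetchReport(): Promise<unknown> {
  if (cached && cached.expiresAt > Date.now()) return cached.data;
  const res = await fetch(REST_URL);
  const data = await res.json();
  cached = { data, expiresAt: Date.now() + TTL_MS };
  return data;
}

export const resolvers = {
  Query: {
    // Only the fields selected by the client are serialized back to it.
    report: () => fetchReport(),
  },
};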

paging the query result with multiple heads

We have a microservice for relationship modeling between objects. A relation is defined between primary and secondary objects with cardinality constraints like 1-1, 1-N, N-N, etc.
The microservice provides API like Create relation, Find relations, Get secondaries, Get primaries, etc.
The query API "Get secondaries" takes a primary object and returns all the related secondary objects. Since the set of related secondary objects could be large, the results are paginated.
We had another microservice that made good use of this relation microservice to work with relations. This consuming service accepted similar pagination options, such as page index and page size, passed them on to the relation service, and returned the resulting page to the calling application. So far so good.
We recently identified that the consuming microservice was a bit chatty with the relation microservice, as it had to call the "Get secondaries" API multiple times when there were multiple primary objects whose secondary objects had to be fetched.
So we thought about making the "Get secondaries" API a bulk API by having it accept multiple primary objects as input. But then we got stuck on how the pagination would work.
The API would return related secondary objects for each primary, but limit the secondary objects to the page size as before.
This seemed fine for the first call, but we are unsure how it would behave for subsequent calls. If there are fewer secondary objects than the page size for one or more primary objects, what should the input for the subsequent calls be? Do we need to pass those primary objects again?
This is where we are looking for suggestions on how to design this bulk API. Any input is welcome.
Basically, you should have some way to ensure that the relationship service knows what the original query was when receiving a paginated request.
A simple and maintainable way for your relationship service to handle this is to preprocess the request by sorting the requested primary objects in some way (ie. sort alphabetically by Id), and then simply iterating through the primary objects, adding secondary objects to the response, until the response is full.
The simplest thing for clients to do is to always use the same batch request and just add an index number or page token to the request.
I'd recommend a page token that records the last seen item, for example lastSeen=primaryId,secondaryId, which you should obfuscate in some way to avoid a leaky abstraction. The service can then look at the original request and know where to resume iterating through the primary objects.
Alternatively, you can encode enough information into the page token to reconstruct whatever you need from the original request. This allows you to make some adjustments to the query on subsequent requests. For example, if the client requests primaries A-Z and you return secondary objects A1-J5 in the first response, you could modify the request to "J-Z, already seen J5", encode it so that you aren't leaking implementation details, and return it to the client as the page token. Then, instead of responding with the original request plus a page number, the client simply responds with the page token.
Either way, clients of the relationship service should never have to "figure out" what the request for the next page should be. The pagination should only require the consumer to increment a number or respond with a page token that was given to it by the relationship service.
Another consideration is the database you are using. For example, in DynamoDB, getting the 100th item of a query like select * from secondaries where primaryId='ABC' requires reading all the items before it. If you have a NoSQL database, or think you might move to one at some point in the future, you may find that a page token makes it much simpler to keep track of where you are in the result set (compared to an index number).
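As a sketch, such an opaque token could be as simple as the following; the field names are illustrative, and base64url keeps the format from leaking to clients while staying cheap to decode on the service side:
// Opaque "last seen" page token, as described above.
interface PageToken {
  lastPrimaryId: string;
  lastSecondaryId: string;
}

export function encodePageToken(token: PageToken): string {
  return Buffer.from(JSON.stringify(token)).toString("base64url");
}

export function decodePageToken(raw: string): PageToken {
  return JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
}

// The service decodes the token, resumes at (lastPrimaryId, lastSecondaryId),
// and fills the page with secondaries, moving to the next primary as needed.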
I found this article to be very helpful when I was learning about pagination myself, and I'd recommend reading it. It primarily deals with pagination concerns for UIs, but the fundamentals are the same.
TLDR: Don't make the consumer do any work. The consumer should repeat the original request with an added index number or page token, or the consumer should send a request containing only a page token.
