I am looking into a performance issue with serialization in a Node.js backend. I would like some suggestions on how to investigate what is happening after the app logic in the service has returned its response.
Currently there is a bad query executed with TypeORM that returns about 12000 rows. The speed of this query is not a problem, but once the result is returned from the service, it takes about 100 seconds for the API to actually return the response. The application uses NestJS with GraphQL as the API.
I guess that there is some heavy serialization being done either in Apollo Server or in NestJS. How do I investigate this further? And is the large result set from the database query the only issue here, or could it be something else?
The real problem here is that this blocks the Node.js event loop for about 100 seconds, which freezes the whole backend.
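A crude way to confirm that the time really is spent blocking the event loop (a minimal sketch, not the exact code I used):

// Crude event-loop stall detector: this interval cannot fire while synchronous
// work (such as response serialization) is blocking the loop, so the logged
// lag approximates how long the loop was blocked.
const intervalMs = 100;
let last = Date.now();
setInterval(() => {
  const lag = Date.now() - last - intervalMs;
  if (lag > 50) {
    console.warn(`event loop was blocked for roughly ${lag} ms`);
  }
  last = Date.now();
}, intervalMs);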
By console.log debugging I discovered that TypeORM was not the problem; in fact, most of the time was not spent in the service or the resolver either. It was spent somewhere after the resolver returned, which led me to suspect Apollo Server itself.
When returning the result from the same service through a regular REST controller instead, it only took about a second. What I ended up doing was to run JSON.stringify on the response data within the resolver and type the GraphQL response as a plain string. For this particular case that was fine, since the data was quite isolated from the rest of the system anyway.
The issue was probably within the part of Apollo Server that validates the types of the returned data, but that is mostly a guess.
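Roughly what the workaround looks like (a sketch; the resolver, service, and query names are placeholders rather than my real code):

import { Resolver, Query } from '@nestjs/graphql';
import { ReportService } from './report.service'; // placeholder path

@Resolver()
export class ReportJsonResolver {
  constructor(private readonly reportService: ReportService) {}

  // Exposed as a plain String in the schema, so Apollo only has to return one
  // scalar instead of type-checking and serializing every field of 12000 rows.
  @Query(() => String)
  async reportRowsJson(): Promise<string> {
    const rows = await this.reportService.findAll();
    return JSON.stringify(rows); // the client parses the string itself
  }
}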
Are you sure that the actual query being executed is returning only 12000 rows, or is 12000 rows just the number that are eventually returned from the API?
It is very likely that you are returning many more rows to the NestJS backend which then need to be normalized into the actual result set you receive from the API. This is an easy problem to run into if you're doing a lot of joins and is related to the concept of Object-relational impedance mismatch.
In the past I had a similar problem: my result set from the API contained only a few thousand rows, but over 400 thousand rows were sent back to TypeORM, which then had to flatten them appropriately, causing the exact performance issue you're running into.
I highly recommend that you check the generated SQL for the problematic query and then run it manually on the DB to see how many rows you actually get back.
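One way to grab that SQL without guessing (a sketch; the entity and relation names are placeholders, and you could equally just turn on logging: ['query'] in your TypeORM DataSource options):

import { DataSource } from 'typeorm';

// Print the SQL TypeORM generates for a suspicious query so it can be run
// manually against the database. "Order" and "order.items" are placeholders.
function logGeneratedSql(dataSource: DataSource): void {
  const qb = dataSource
    .getRepository('Order')
    .createQueryBuilder('order')
    .leftJoinAndSelect('order.items', 'item');

  // Copy this output into your SQL client and check how many rows it really returns.
  console.log(qb.getSql());
}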
Assuming you've identified queries on a relational database that are likely running into the pitfall of sending too many small queries, and you want to figure out where they come from to give the team sending them a heads-up: is there any way to tell which GraphQL query generated a statement from the compiled SQL output?
Doing things the other way around, inspecting the compiled output of a known GraphQL query, is easy. But there doesn't seem to be any easy way of acting on feedback coming from the actual DB.
The Hasura Query log is probably a good place to start. Do you have these logs enabled for your Hasura installation?
If you look for log entries of type query-log you'll get a structured JSON object whose properties include the operation name, the GQL query that was submitted to Hasura, and the generated_sql that was produced.
With that approach you'd be able to match on the generated_sql and then find the actual GQL that caused it.
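A rough sketch of how you could grep those logs programmatically (the exact property paths inside the log entries vary between Hasura versions, so verify them against your own query-log output first):

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Scan a Hasura log file for query-log entries whose generated SQL contains a
// fragment of the statement seen on the database, and print the GraphQL
// operation that produced it.
async function findOperationsForSql(logFile: string, sqlFragment: string): Promise<void> {
  const lines = createInterface({ input: createReadStream(logFile) });

  for await (const line of lines) {
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that aren't JSON
    }
    if (entry?.type !== 'query-log') continue;

    const generatedSql = JSON.stringify(entry.detail?.generated_sql ?? '');
    if (generatedSql.includes(sqlFragment)) {
      console.log(entry.detail?.query?.operationName, entry.detail?.query?.query);
    }
  }
}

// e.g. findOperationsForSql('hasura.log', 'FROM "public"."orders"');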
Can someone explain the benefits of using Graphql in your Magento/Magento 2 site?
Is it really faster than a normal query or using collections? From what I see, you still have to set/fetch all the data in the resolver that was declared in schema.graphql so that it will be available on every request.
Is it faster because each set of data is cached by GraphQL, or is there some logic behind it that makes it faster?
For example, when you just need the name and description of a product, you would just call getCollection()->addAttributeToSelect(['name', 'description'])->addAttributeToFilter('entity_id', $id)->getFirstItem() inside your block, whereas with a GraphQL request the data is fetched via the resolver, where all of the product's data gets fetched.
Regarding performance, GraphQL is normally faster than the equivalent REST API if the client needs to get a graph of data, assuming the GraphQL API is implemented correctly. Otherwise, it can easily run into the N+1 loading problem, which will make the API slow.
It is faster mainly because the client only needs to send one request to get the whole graph of data, while with a REST API the client has to send many separate HTTP requests to assemble that graph. The number of network round trips is reduced to one in the GraphQL case, and hence it is faster (of course, this assumes there is no single equivalent REST endpoint that returns the same graph of data 😉).
In other words, if you only fetch a single record, you will not see much performance difference between the REST API and the GraphQL API.
But besides performance, what a GraphQL API offers is letting the client request the exact fields and the exact graph of data it wants, which is difficult to achieve with a REST API.
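For example, a single request can ask for just the two attributes the page needs. This is only a sketch: the store URL is a placeholder, while products, name and description come from the standard Magento 2 GraphQL schema.

// One round trip, fetching only the fields the page needs. The endpoint URL
// is a placeholder for your store's /graphql endpoint.
async function fetchProductSummary(sku: string) {
  const query = `
    query ProductSummary($sku: String!) {
      products(filter: { sku: { eq: $sku } }) {
        items {
          name
          description { html }
        }
      }
    }
  `;

  const res = await fetch('https://example.com/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables: { sku } }),
  });

  const { data } = await res.json();
  return data.products.items[0];
}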
I need to generate a report from the database with thousands of records. This report is generated on a monthly basis, and at times the user might want a report spanning around 3 months. With the current records, a single month's data set can already reach about 5000 rows.
I am currently using vue-excel, which makes an API call to the Laravel API; the API returns the resource, which is then exported by vue-excel. The resource does not only return the model data; there are related data sets I also need to fetch.
For smaller data sets this works fine, i.e. when I am fetching around 3000 records, but for anything larger the server times out.
I have also tried to use Laravel Excel with the query concern; I actually timed them, and both take the same amount of time, because Laravel Excel was also mapping the data to fetch the relations.
So basically, my question is: is there a better way to do this so I can get this data faster and avoid the timeouts?
Just put this at the start of the function:
ini_set('max_execution_time', 84000); // 84000 is in seconds
This overrides the built-in PHP script runtime limit (max_execution_time) that Laravel otherwise runs under.
We use one endpoint that returns a massive amount of data, and sometimes the page takes 5-10s to load. We don't have control over the backend API.
Is there a way to reduce the size that's going to be downloaded from the API?
We have already enabled compression.
I heard GraphQL lets you define a data schema before querying it. Would GraphQL help in this case?
GraphQL could help, assuming:
Your existing API request is doing a lot of overfetching, and you don't actually need a good chunk of the data being returned
You have the resources to set up an additional GraphQL server to serve as a proxy to the REST endpoint
The REST endpoint response can be modeled as a GraphQL schema (this might be difficult or outright impossible if the object keys in the returned JSON are subject to change)
The response from the REST endpoint can be cached
The extra latency introduced by adding the GraphQL server as an intermediary is sufficiently offset by the reduction in response size
Your GraphQL server would have to expose a query that makes a request to the REST endpoint and then caches the response server-side. The cached response would be used for subsequent queries to the server until it expires or is invalidated. Caching the response is key; otherwise, merely proxying the request through the GraphQL server will make all your queries slower, since getting the data from the REST endpoint to the server will itself take approximately as long as your request does currently.
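A minimal sketch of that proxying query with a server-side cache (the REST URL, the records field, the Record shape and the TTL are all assumptions for illustration, not part of your actual API):

import { ApolloServer, gql } from 'apollo-server';

const REST_URL = 'https://api.example.com/huge-endpoint'; // placeholder
const TTL_MS = 5 * 60 * 1000;

let cache: { data: unknown[]; fetchedAt: number } | null = null;

// Hit the slow REST endpoint at most once per TTL; every other query is
// answered from memory.
async function getRecords(): Promise<unknown[]> {
  if (cache && Date.now() - cache.fetchedAt < TTL_MS) {
    return cache.data;
  }
  const res = await fetch(REST_URL);
  cache = { data: await res.json(), fetchedAt: Date.now() };
  return cache.data;
}

const typeDefs = gql`
  type Record {
    id: ID!
    name: String
    # ...only the fields the client actually needs
  }

  type Query {
    records: [Record!]!
  }
`;

const server = new ApolloServer({
  typeDefs,
  resolvers: { Query: { records: () => getRecords() } },
});

server.listen({ port: 4000 }).then(({ url }) => console.log(`GraphQL proxy ready at ${url}`));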
GraphQL can then be used to cut down the size of your response in two ways:
By not requesting certain fields that aren't needed by your client (or omitting these fields from your schema altogether)
By introducing pagination. If the reason for the bloated size of your response is the sheer number of records returned, you can add pagination to your schema and return smaller chunks of the total list of records one at a time.
Note: the latter can be a significant optimization, but it can also be tricky if your cache is frequently invalidated.
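A sketch of the pagination part (the records field, the argument names and the Record shape are assumptions; the array below stands in for the cached REST response from the earlier sketch):

import { gql } from 'apollo-server';

// Stand-in for the cached full REST response (see the caching sketch above).
const cachedRecords: Array<{ id: string; name: string }> = [];

const typeDefs = gql`
  type Record {
    id: ID!
    name: String
  }

  type Query {
    records(limit: Int = 100, offset: Int = 0): [Record!]!
  }
`;

const resolvers = {
  Query: {
    // Each query now returns (and serializes) only one slice of the full list.
    records: (_parent: unknown, args: { limit: number; offset: number }) =>
      cachedRecords.slice(args.offset, args.offset + args.limit),
  },
};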
I am a newbie to MongoDB and would like to use it in my project, which has millions of records. I would like to know which I should prefer for updates, performance-wise: bulk.find().update() or collection.update() with multi: true.
As far as I know the biggest gains Bulk provides are these:
Bulk operations send only one request to MongoDB for all of the operations in the bulk. Other methods either send a request per document, or cover only a single operation type at a time: one of insert, update, updateOne, upsert (with update operations), and remove.
A bulk can collect many different cases, added at different places in your code, into a single batch.
Bulk operations can work asynchronously. The others cannot.
But nowadays some operations already work in a bulk-based way, for instance insertMany.
If the gains above are taken into account, update() should show the same performance as a bulk.find().update() operation.
That is because update() sends only one query object to MongoDB, and multi: true is just an option specifying that all matched documents have to be updated. This means it makes only one request over the network, just like bulk operations.
So both of them send only one request to MongoDB, and MongoDB evaluates the query clause to find the documents to be updated, then updates them!
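A small sketch of both forms with the Node.js driver (collection and field names are made up; in current drivers updateMany is the equivalent of update() with multi: true):

import { MongoClient } from 'mongodb';

async function main(): Promise<void> {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const orders = client.db('shop').collection('orders');

  // Equivalent of update(query, update, { multi: true }): one request,
  // the server updates every matching document.
  await orders.updateMany({ status: 'pending' }, { $set: { flagged: true } });

  // Bulk API: also one request, but it can mix different operation types.
  const bulk = orders.initializeUnorderedBulkOp();
  bulk.find({ status: 'pending' }).update({ $set: { flagged: true } });
  bulk.find({ status: 'cancelled' }).delete(); // remove() in older drivers
  await bulk.execute();

  await client.close();
}

main().catch(console.error);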
I tried to find an answer to this question on the official MongoDB site, but I could not.
So, an explanation from #AsyaKamsky would be great!