So I have been following Ben Awad's 14-hour GraphQL and React series and I came across field resolvers, and I'm a little confused about their use case. I understand that field resolvers can be used to modify certain data on the server. In this particular case, he uses one to slice the text of huge user posts, so in this context I'm guessing that fetching large chunks of data becomes time-consuming as the data grows, and field resolvers can be used to shape the data appropriately to optimise the server. Am I right? This makes sense to me and it is how I understand it. Is there more to what they offer, or is this it?
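As a rough sketch of the pattern described above, assuming a TypeGraphQL setup like the one used in the series (the Post entity, its text column, the import path, and the textSnippet name are illustrative, not the exact code from the videos):

import { FieldResolver, Resolver, Root } from "type-graphql";
import { Post } from "../entities/Post";

@Resolver(Post)
export class PostResolver {
  // This resolver only runs for queries that actually select textSnippet,
  // so clients listing many posts never pull the full text over the wire.
  @FieldResolver(() => String)
  textSnippet(@Root() post: Post) {
    return post.text.slice(0, 50);
  }
}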
Related
I am currently working on a project involving GraphQL, and I was wondering whether retrieving every field of a given type in a query takes significantly more time than retrieving only some of them, or whether the difference is negligible.
Here is an example:
fragment GlobalProtocolStatsFields on GlobalProtocolStats {
  totalProfiles
  totalBurntProfiles
  totalPosts
  totalMirrors
  totalComments
  totalCollects
  totalFollows
  totalRevenue {
    ...Erc20AmountFields
  }
}
vs
fragment GlobalProtocolStatsFields on GlobalProtocolStats {
  totalProfiles
  totalBurntProfiles
  totalPosts
  totalMirrors
}
Thanks in advance!
The answer highly depends on the implementation on the backend side. Let's look at the three stages the data goes through and how each of them can impact response time.
1. Data fetching from the source
First, the GraphQL server has to fetch the data from the database or another data source. Some data sources allow you to specify which fields you want to receive. If the GraphQL service is optimised to fetch only the data needed, some time can be saved here. In my experience, it is often not worth the effort, and it is much easier to just fetch all fields that could be needed for an object type. Some GraphQL implementations do this automatically, e.g. Hasura, PostGraphile, or Pothos with the Prisma plugin. What can be more expensive is resolving relationships between entities, where the GraphQL implementation often has to do another roundtrip to the data source.
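As a rough sketch of what field-aware fetching can look like in a plain graphql-js resolver (this is not how Hasura or PostGraphile do it internally, and ctx.db.select is a hypothetical data-access helper):

import { FieldNode, GraphQLResolveInfo, Kind } from "graphql";

// Collect the top-level field names the query actually asked for,
// and select only those columns instead of doing SELECT *.
async function userResolver(
  _parent: unknown,
  args: { id: string },
  ctx: { db: { select: (table: string, columns: string[], id: string) => Promise<unknown> } },
  info: GraphQLResolveInfo
) {
  const columns =
    info.fieldNodes[0].selectionSet?.selections
      .filter((sel): sel is FieldNode => sel.kind === Kind.FIELD)
      .map((sel) => sel.name.value) ?? ["id"];
  return ctx.db.select("users", columns, args.id); // hypothetical helper
}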
2. Data transformation and business logic
Sometimes the data has to be transformed before it is returned from the resolver. The resolver model allows this business logic to be called conditionally: leaving out a field skips its resolver entirely. In my experience, most business logic is very fast and does not really impact response time.
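A tiny sketch of that conditional execution with a plain resolver map (the User.reputation field and its helper are made up):

// Stand-in for whatever (possibly expensive) business logic backs the field.
async function computeReputation(userId: string): Promise<number> {
  return userId.length; // placeholder value, illustrative only
}

const resolvers = {
  User: {
    // This resolver only runs for queries that actually select User.reputation;
    // a query that leaves the field out never pays for the computation.
    reputation: (user: { id: string }) => computeReputation(user.id),
  },
};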
3. Data serialisation and network
Once all the data is ready on the server side, it has to be serialised to JSON and sent to the client. Serialising large amounts of data can be expensive, especially because GraphQL responses are hard to serialise as a stream. Sending the data to the client can also take a while if the connection is slow or the payload is large. This was one of the motivations for GraphQL: let the client select only the fields it needs and reduce the amount of unused data transferred.
Summary
As you can see, the response time is mostly related to the amount of data returned from the API and the network connection. Depending on the implementation, real savings are only made on the network, but more advanced implementations can drastically reduce the work done on the server as well.
I'm new to GraphQL and am reading about the N+1 issue and the DataLoader pattern to increase performance. I'm looking at starting a new GraphQL project with DynamoDB as the database. I've done some initial research and found a couple of small npm packages for DataLoader and DynamoDB, but they do not seem to be actively supported. So it seems to me, from my initial research, that DynamoDB may not be the best choice for supporting an Apollo GraphQL app.
Is it possible to implement the DataLoader pattern against a DynamoDB database?
Dataloader doesn't care what kind of database you have. All that really matters is that there's some way to batch up your operations.
For example, for fetching a single entity by its ID, with SQL you'd have some query that's a bit like this:
select * from product where id = SOME_ID_1
The batch equivalent of this might be an IN query as follows:
select * from product where id in (SOME_ID_1, SOME_ID_2, SOME_ID_3)
The actual mechanism for single vs batch querying is going to vary depending on what database you're using; it may not always be possible, but it usually is. A quick search shows that DynamoDB has BatchGetItem, which might be what you need.
Batching up queries that take additional parameters (such as pagination, or complex filtering) can be more challenging and may or may not be worth investing the effort. But batching anything that looks like "get X by ID" is always worth it.
In terms of finding libraries that support Dataloader and DynamoDB in particular, I wouldn't worry about it. You don't need this level of tooling. As long as there's some way of constructing the database query, and you can put it inside a function that takes an array of IDs and returns a result in the right shape, you can do it -- and this usually isn't complicated enough to justify adding another library.
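For illustration, here is a rough sketch of such a batch function against DynamoDB, using the dataloader package and the AWS SDK v3 document client (the Products table name, the id key, and the Product type are assumptions, not a particular library's API):

import DataLoader from "dataloader";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { BatchGetCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

type Product = { id: string; name?: string };

// BatchGetItem accepts at most 100 keys per call, so cap the batch size.
const productLoader = new DataLoader<string, Product | null>(
  async (ids) => {
    const result = await docClient.send(
      new BatchGetCommand({
        RequestItems: {
          Products: { Keys: ids.map((id) => ({ id })) },
        },
      })
    );
    const items = (result.Responses?.Products ?? []) as Product[];
    // BatchGetItem does not guarantee order, so re-align results with the input IDs.
    const byId = new Map<string, Product>();
    for (const item of items) byId.set(item.id, item);
    return ids.map((id) => byId.get(id) ?? null);
  },
  { maxBatchSize: 100 }
);

Individual resolvers then just call productLoader.load(id), and DataLoader coalesces all the loads from one tick of the event loop into a single BatchGetCommand.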
My application has to handle a lot of entities (100,000 or more) with locations and needs to display only those within a given radius. I basically store everything in SQL but use Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities as in the following example (not exactly this; I use the Laravel framework with the built-in Redis facade, but it does the same thing as here under the hood):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is this bad practice? Will it have a negative impact on performance? Should I store only IDs as members and make a follow-up query to get the corresponding entities?
This is a matter of the requirements. There's nothing wrong with storing the raw data as members as long as it is unique (and it is unique, given the "id" field). In fact, this is both simple and performant, as all the data is returned with a single query (assuming that's what is actually needed).
That said, there are at least two considerations for storing the data outside the Geoset, and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to try to update (i.e. ZREM and then GEOADD) small parts of it. Having everything outside the geoset, perhaps in a Hash (or maybe something like RedisJSON), makes more sense then (see the sketch below).
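If you do go the ID-only route, here is a rough sketch of what it could look like from Node/TypeScript with ioredis (key names are illustrative; the Laravel Redis facade maps to the same GEOADD/GEORADIUS/MGET commands):

import Redis from "ioredis";

const redis = new Redis();

// Store only the ID as the geoset member; the full entity lives in its own key.
async function addEntity(entity: { id: number; name: string; lat: number; lon: number }) {
  await redis.geoadd("entities:geo", entity.lon, entity.lat, String(entity.id));
  await redis.set(`entity:${entity.id}`, JSON.stringify(entity));
}

// Radius query first, then a single MGET to hydrate the matching entities.
async function entitiesWithinRadius(lon: number, lat: number, radiusKm: number) {
  const ids = (await redis.georadius("entities:geo", lon, lat, radiusKm, "km")) as string[];
  if (ids.length === 0) return [];
  const raw = await redis.mget(...ids.map((id) => `entity:${id}`));
  return raw.filter((item): item is string => item !== null).map((item) => JSON.parse(item));
}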
I am validating the data from Eloqua Insights against the data I pulled using the Eloqua API. There are some differences in the metrics. So, are there any known issues when pulling the data using the API vs a .csv export from Eloqua Insights?
Absolutely. Besides undocumented data discrepancies that might exist, Insights can aggregate, calculate, and expose various hidden relations between data in Eloqua that are not accessible through an API export definition.
Think of the API as the raw data, with the ability to pick and choose fields and apply a general filter on them, and Insights/OBIEE as a way to do calculations on that data, create those relationships across tables of raw data, and then present it in a consumable manner to the end user. A user has little use for a 1-gigabyte CSV of individual unsubscribes for the past year, but present that in several graphs on a dashboard with running totals, averages, and time series, and it suddenly becomes actionable.
Is there an advantage to storing the metadata (or indexing data) for a document/*LOB separately from the raw data?
For instance, having a table/collection/bucket with an index on (name, school):
ID: 123
name: Johny
School: Harvard
Transcript: /*2MB text/binary*/
vs
Metadata
ID: 123
name: Johny
School: Harvard
Data
ID: 123
Transcripts: /*2MB text/binary*/
Let's assume MongoDB, although it's perhaps really DB-agnostic.
db.firstModel.find({},{transcripts:0}) vs db.secondModel.find()
Additionally, if we aggregate/group on the metadata, would the heavy payload in the transcripts weigh it down (even though the aggregation is on other fields)? Is it better to aggregate on the metadata collection separately and then retrieve by ID from the data collection? Or is it better to respect the database design (keeping everything coupled in a single document)?
In Couchbase, if it works for your use case, an option might be to make the object ID for your 2 MB document something like harvard::johny::123. Every object would have such a pattern for its object ID, used consistently in your application. Your application can therefore easily piece together the object ID, and then you do not have to query or use views. You know it is harvard and johny and his 123rd object, so you can just get it by ID. You already know the answer; with no querying, Couchbase will be very fast.
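A rough sketch of that key-pattern lookup with the Couchbase Node SDK (the bucket name and credentials are placeholders):

import { connect } from "couchbase";

async function getTranscript(school: string, name: string, seq: number) {
  const cluster = await connect("couchbase://localhost", {
    username: "app_user",     // placeholder credentials
    password: "app_password",
  });
  const collection = cluster.bucket("transcripts").defaultCollection();

  // The application already knows the school, name, and sequence number,
  // so it can rebuild the key and fetch by ID with no query or view.
  const key = `${school}::${name}::${seq}`;
  const { content } = await collection.get(key);
  return content;
}

For example, getTranscript("harvard", "johny", 123) fetches the document keyed harvard::johny::123 directly.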
That being said, there may be other metadata that you want to keep in that metadata object and index on, and then yes, in Couchbase it might be better to break out the documents like you suggest. In Couchbase it might even be better to put them in separate buckets so the indexers only look at things they will index.
For an example that may not be entirely applicable to your use case, but should give you an idea of what is possible go here
All of that being said, from experience I do not like keeping large objects like you suggest in a DB long term, regardless of the DB. From an operational perspective it is terrible: you are storing what amounts to static data in a layer that needs to be very performant, usually on expensive storage, and you have to back those objects up over time. They become a boat anchor around your neck after a few months or years. I suggest keeping the metadata in a fast-performing system like Couchbase (cache + persistence with replication, etc.) that also holds a pointer to the large objects in something that is best at dishing out large static objects, like HDFS or Amazon S3.