Paging the query result with multiple heads - algorithm

We have a microservice for relationship modeling between objects. A relation is defined between primary and secondary objects with cardinality constraints like 1-1, 1-N, N-N, etc.
The microservice provides APIs such as Create relation, Find relations, Get secondaries, Get primaries, etc.
The query API "Get secondaries" takes a primary object and returns all of the related secondary objects. Since the set of related secondary objects could be large, the results are paginated.
We had another microservice that was making good use of this relation microservice to work with relations. This consuming service accepted similar pagination options (page index and page size), passed them on to the relation service, and returned to the calling application the page of results obtained from the relation service. So far so good.
We recently identified that the consuming microservice was a bit chatty with the relation microservice, as it had to call the "Get secondaries" API multiple times given that there were multiple primary objects for which secondary objects had to be fetched.
So we thought of making the "Get secondaries" API a bulk API by making it accept multiple primary objects as input. But then we got stuck on how the pagination would work.
The API would return related secondary objects for each primary, but limit the secondary objects to the page size, as before.
This seems fine for the first call, but we are unsure how it would behave for subsequent calls. If one or more primary objects have fewer secondary objects than the page size, what should the input for the subsequent calls be? Do we need to pass those primary objects again?
This is where we are looking for suggestions on how to design this bulk API. Any input is welcome.

Basically, you should have some way to ensure that the relationship service knows what the original query was when receiving a paginated request.
A simple and maintainable way for your relationship service to handle this is to preprocess the request by sorting the requested primary objects in some way (e.g. alphabetically by ID), and then simply iterate through the primary objects, adding secondary objects to the response until the page is full.
The simplest thing for clients to do is to always use the same batch request and just add an index number or page token to the request.
I'd recommend a page token that records the last seen item, for example lastSeen=primaryId,secondaryId (which you should obfuscate in some way to avoid a leaky abstraction). Then the service can look at the original request and know where to resume iterating through the primary objects.
Alternately, you can encode enough information into the page token to reconstruct whatever you need from the original request. This lets you adjust the query on subsequent requests. (For example, if the client requests primaries A-Z and you return secondary objects A1 - J5 in the first response, you could rewrite the request as "primaries J-Z, already seen J5", encode it so that you aren't leaking implementation details, and return it to the client as the page token.) Then, instead of sending the original request plus a page number, the client simply sends back the page token.
Either way, clients of the relationship service should never have to "figure out" what the request for the next page should be. Pagination should only require the consumer to increment a number or to echo back a page token that the relationship service gave it.
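Here is a minimal sketch of that approach in Python. The function name, page size, token format, and the fetch_secondaries helper are illustrative assumptions rather than your actual API; base64 stands in for whatever obfuscation you prefer:

import base64
import json

PAGE_SIZE = 50  # assumed page size

def get_secondaries_bulk(primary_ids, fetch_secondaries, page_token=None):
    # fetch_secondaries(primary_id) is assumed to return that primary's
    # secondaries in a stable, sorted order.
    primaries = sorted(primary_ids)              # same order on every call
    last_primary = last_secondary = None
    if page_token:
        last_primary, last_secondary = json.loads(
            base64.urlsafe_b64decode(page_token).decode())

    results = {}    # primary id -> list of secondary ids
    count = 0
    for pid in primaries:
        if last_primary is not None and pid < last_primary:
            continue    # this primary was fully returned on an earlier page
        for sid in fetch_secondaries(pid):
            if pid == last_primary and sid <= last_secondary:
                continue    # already returned on an earlier page
            results.setdefault(pid, []).append(sid)
            count += 1
            if count == PAGE_SIZE:
                token = base64.urlsafe_b64encode(
                    json.dumps([pid, sid]).encode()).decode()
                return results, token    # more data remains
    return results, None    # final page; no token

The client keeps sending the same list of primaries plus whatever token the previous response returned; the service re-sorts and resumes from the encoded last-seen pair.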
Another consideration is the database you are using. In DynamoDB, for example, getting the 100th item of a query like select * from secondaries where primaryId='ABC' requires reading every item up to the 100th. If you have a NoSQL database, or think you might move to one at some point in the future, you may find that a page token makes it much simpler to keep track of where you are in the result set than an index number does.
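For instance, with DynamoDB through boto3 the natural cursor is LastEvaluatedKey, which maps directly onto a page-token design (the table and attribute names below are hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("secondaries")   # hypothetical table

def get_page(primary_id, page_size=50, exclusive_start_key=None):
    kwargs = {
        "KeyConditionExpression": Key("primaryId").eq(primary_id),
        "Limit": page_size,
    }
    if exclusive_start_key:
        kwargs["ExclusiveStartKey"] = exclusive_start_key
    resp = table.query(**kwargs)
    # LastEvaluatedKey is DynamoDB's own page token; it is absent on the last page.
    return resp["Items"], resp.get("LastEvaluatedKey")

You can serialize LastEvaluatedKey (obfuscated) into your own page token rather than exposing it as-is.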
I found this article to be very helpful when I was learning about pagination myself, and I'd recommend reading it. It primarily deals with pagination concerns for UIs, but the fundamentals are the same.
TLDR: Don't make the consumer do any work. The consumer should either repeat the original request with an added index number or page token, or send a request containing only a page token.

Related

What should be projection primary key on query side - CQRS, Event Sourcing, Microservices

I have one thing that confuses me.
I have 2 microservices.
One creates commands and the other consumes commands and produces events (the events are stored in the Event Store).
In my example, aggregates have a Guid as their entity ID, and the Guid is created when the aggregate is created.
What confuses me is: should that key (generated on the write side) be transferred via the event to the query side (the microservice that created the command)?
Or should the query side (the projection) have a separate ID in the read DB?
Or maybe I should generate some shared key?
What is the best solution here?
I think it all depends on your setup.
If you are doing CQRS and you have a separate read-side service (within the same bounded context), then it is up to the read-side service to model the data as it wishes, either reusing the same keys or not.
If you are communicating between two different services (separate bounded contexts), then I recommend you create new primary keys in the receiving service and use the incoming key as a foreign key, just as you would do with relationships between two tables in a SQL database.
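A tiny sketch of that second case (the event shape, table name, and read_db helper are invented for illustration):

import uuid

def on_order_created(event, read_db):
    # The event carries the write-side Guid; keep it as a foreign/correlation key,
    # but let the read side own its own primary key.
    read_db.insert(
        "order_summary",
        {
            "id": uuid.uuid4(),                        # read-side PK
            "source_order_id": event["aggregate_id"],  # write-side Guid as FK
            "status": event["status"],
        },
    )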
I think this depends on your requirements. Is there a specific reason to have different keys?
Given that you are using Guids as your PK, it seems simplest to reuse the PKs assigned by the write side.
Some reasons you might want to keep the keys consistent:
During command processing, an ID was returned to the client, which they may have cached and can reasonably expect to use when querying the read side.
If your write-side data is long-lived and there is a bug in your read-side output, it is going to be much easier to debug what went wrong if your keys are consistent between the write and read sides.
Entities on the write side will use the write-side Guid PK of another entity as their FK. When you emit an event for such a dependent entity, you want the read side to be able to rebuild the relationship back to the principal.
This is kind of an odd question.
Your primary key on a projection could literally be anything or you might not even have one.
There is no "correct answer" for this question ... It depends entirely on the projection.
What if my projection were, say, just a flattening of the information associated with an aggregate? For example, we have an "order" and we make one row per order showing summary information about that order. Using "OrderId" as my primary key would make some sense here.
What if my projection were building up counts of orders by product? Then using "ProductItemId" would make a lot more sense.
What if in either of these cases the Ids themselves ("OrderId" and "ProductItemId") could change? Well then using another key might make a lot of sense.
What if this is an append-only table? I might not even want to have a key.
Again, there is not a ... correct ... answer here; there are many situations that you may run into.
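To make the two examples above concrete, here is a rough sketch of both projections reacting to the same hypothetical OrderPlaced event, each keyed by whatever is natural for it (all names invented):

from collections import defaultdict

# Projection 1: one row per order, naturally keyed by OrderId.
order_summaries = {}                      # order_id -> summary dict

# Projection 2: order counts per product, naturally keyed by ProductItemId.
orders_per_product = defaultdict(int)     # product_item_id -> count

def apply_order_placed(event):
    order_summaries[event["order_id"]] = {
        "customer": event["customer_id"],
        "total": event["total"],
    }
    for line in event["lines"]:
        orders_per_product[line["product_item_id"]] += 1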

Benefits of using Graphql in Magento 2

Can someone explain the benefits of using Graphql in your Magento/Magento 2 site?
Is it really faster than a normal query or using collections? From what I see, you still have to set/fetch all the data in the resolver declared in schema.graphql so that it will be available on every request.
Is it faster because each set of data is cached by GraphQL, or is there some logic behind it that makes it faster?
For example, when you just need the name and description of a product, you would just call getCollection()->addAttributeToSelect(['name', 'description'])->addAttributeToFilter('entity_id', $id)->getFirstItem() inside your block, whereas in a GraphQL request the data will be fetched via the resolver, which fetches all of the product's data.
Regarding performance, GraphQL is normally faster than the equivalent REST API if the client needs to get a graph of data, assuming the GraphQL API is implemented correctly. Otherwise, it may easily run into the N+1 loading problem, which will make the API slow.
It is faster mainly because the client only needs to send one request to get the whole graph of data, while with a REST API the client needs to send many separate HTTP requests to get that graph. The number of network round trips is reduced to one in the GraphQL case, and hence it is faster (assuming, of course, that there is no single equivalent REST API that returns this graph of data 😉).
In other words, if you only get a single record, you will not find much performance difference between the REST API and the GraphQL API.
But besides performance, what a GraphQL API offers is letting the user request exactly the fields and exactly the graph of data that they want, which is difficult to achieve with a REST API.
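As a rough illustration, this is what the single round trip looks like from a client's point of view; the endpoint, SKU, and exact field names depend on your schema and are only assumptions here:

import requests

QUERY = """
{
  products(filter: { sku: { eq: "24-MB01" } }) {
    items {
      name
      description { html }
    }
  }
}
"""

# One request; only the selected fields need to be resolved and returned.
resp = requests.post("https://example.com/graphql", json={"query": QUERY})
print(resp.json())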

Laravel pagination in Data Table

I am using the DataTables plugin in Laravel. I have about 3000 records in some tables.
But when I load that page, it loads all 3000 records into the browser and then creates the pagination, which slows down the page loading.
How do I fix this, or what is the correct way to do it?
Use server-side processing.
Get help from a Laravel package, such as Yajra's: https://yajrabox.com/docs/laravel-datatables/
Generally you can solve pagination either on the front end, the back end (server or database side), or a combination of both.
Server-side processing, without a package, would mean setting up TOP/FETCH or otherwise limiting the rows being returned from your server.

You could also load a small amount (say 20) and then, when the user scrolls to the bottom of the list, load another 20 or so. I mention front-end processing as well because I'm not sure what your use cases are, but I imagine it's pretty rare that any given user actually needs to see 3000 rows at a time.

Given that Data Tables seems to have built-in functionality for paginating data, I think that #tersakyan is essentially correct — what you want is some form of back-end filtering or paginating of rows of data to limit what’s being sent to the front end.

I don't know whether that package works for you or what your setup looks like, but pagination can also be achieved directly from a database returning data via SQL (using TOP/FETCH, for example), or it could be implemented in a controller or service by tracking pages of data and "loading a page at a time" from the server into the table. All you would need is a unique key to associate each "set of pages" with a specific request.
But for performance, you want to avoid both large data requests and operations on large sets of data. So the more you limit how much data is being grabbed or processed at any stage of your application using it, the more performant your application will be in principle.
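As a bare-bones illustration of the server-side idea (outside of Laravel; the table, columns, and page size are invented), each request pulls only one page from the database:

import sqlite3

PAGE_SIZE = 20

def fetch_page(conn: sqlite3.Connection, page: int):
    # The database does the limiting, not the browser.
    offset = page * PAGE_SIZE
    return conn.execute(
        "SELECT id, name FROM records ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()

Yajra's package essentially builds this kind of query (plus sorting and filtering) from the parameters DataTables sends on each server-side request.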




REST Api for Infinite scrolled query results

I'm building an internal server which contains a database of customer events. The webpage which allows access to the events is going to use an infinite scroll/dynamic loading scheme to display live events as well as to browse the results of queries to the database. So, you might query the database and get maybe 200k results. The webpage would display the 'first' 50 and allow you to scroll and scroll and scroll to see more and more results (loading perhaps 50 more at a time).
I'm supposed to be using a REST API for the database access (a C# server). I'm unsure what the API should look like so that it remains RESTful. I've come up with 3 options. The question is: are any of them RESTful, and which is the most RESTful (is there such a thing? If not, I'll pick one of the RESTful ones).
Option 1:
GET /events?query=asdfasdf&first=1&last=50
This simply does the query and specifies the range of results to return. The server, unable to keep state, would have to requery the database each time the infinite scroll occurs (though perhaps utilizing the first/last hints to stop early). This seems bad, and there isn't any feedback about how many results are forthcoming.
Option 2:
GET /events/?query=asdfasdf
GET /events/details?id1=asdf&id2=qwer&id3=zxcv&id4=tyui&...&id50=vbnm
This option first does a query which returns the list of event ids but no further details. The webpage simply has the list of all the ids (at least it knows the count). The webpage holds onto the event id list and, as the infinite scroll/dynamic load is needed, makes another query for the event details of the specified ids. Each id would nominally be a guid, so about 36 characters per id (plus &id##= for 41 characters). At 50 ids per request, the URL would be quite long, 2000+ characters. The URL limit mentioned elsewhere on SO is around 2k. Maybe if I limit it to 40 ids per query this would be fine. It'd be nice to simply have a comma-separated list instead of all the query parameters. Can you make a query parameter like ?ids=qwer,asdf,zxcv,wert,sdfg,rtyu,gfhj, ... ,vbnm ?
Option 3:
POST /events/?query=asdfasdf
GET /events/results/{id}?first=1&last=50
This would POST the query to the server and cause it to create a results resource. The ID of the results resource would be returned and would then be used to get blocks of the query results, which in turn contain the event details needed for the webpage. The XML returned from the POST could contain the number of records and other useful information besides the ID. Either the webpage would have to delete the resource later, when the query page is closed, or the server would have to clean them up once they expire (days or weeks later).
I am concerned that Option 1, while RESTful, is horrible for the server. I'm not sure requesting so many simultaneous resources, like the second GET in Option 2, is really RESTful or practical (it seems like there has to be a better way). I'm not sure Option 3 is RESTful at all, or if it is, whether it is sort of cheating REST by creating state via a POST (or should that be PUT?).
Option 3 worked out fine. It required the server to maintain the query results, and there was a bit of debate about how many queries (from various users) should be kept at the same time, as there was no way to know when a user was actually done with a query.
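For reference, a minimal sketch of Option 3, using Flask purely for illustration (the run_query helper, expiry policy, and payload shapes are assumptions):

import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
result_sets = {}    # result_id -> list of event rows (a real server would expire these)

@app.post("/events/")
def create_result_set():
    rows = run_query(request.args["query"])    # hypothetical: runs the DB query once
    result_id = str(uuid.uuid4())
    result_sets[result_id] = rows
    return jsonify({"id": result_id, "count": len(rows)}), 201

@app.get("/events/results/<result_id>")
def get_result_page(result_id):
    first = int(request.args.get("first", 1))
    last = int(request.args.get("last", 50))
    return jsonify(result_sets[result_id][first - 1:last])    # 1-based, inclusive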

Redis multiple requests

I am writing a very simple social networking app that uses Redis.
Each user has a sorted set that contains ids of items in their feed. If I want to display their feed, I do the following steps:
use ZREVRANGE to get ids of items in their feed
use HMGET to get the feed (each feed item is a string)
But now, I also want to know whether the user has liked a feed item or not. So I have a set associated with each feed item that contains the ids of the users who have liked it.
If I get 15 feed items, I now have to execute an additional 15 requests to Redis to find out, for each feed item, whether the current user has liked it or not (by checking whether the user's id exists in that feed item's set).
So that will take 15+1 requests.
Is this type of querying considered 'normal' when using Redis? Are there better ways I can structure the data to avoid this many requests?
I am using redis-rb gem.
You can easily refactor your code to collapse the 15 requests into one by using pipelines (which redis-rb supports).
You get the ids from the sorted set with the first request and then use them to fetch the many keys you need based on those results (using the pipeline).
With this approach you should have 2 requests in total instead of 16 and keep your code quite simple.
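A minimal sketch of that two-round-trip flow, using redis-py for illustration (the same calls exist in redis-rb; the key names and user id here are invented):

import redis

r = redis.Redis()
user_id = "42"                                   # hypothetical current user

# Request 1: ids of the 15 most recent feed items.
item_ids = [i.decode() for i in r.zrevrange(f"feed:{user_id}", 0, 14)]

# Request 2: a single pipelined round trip for the item bodies plus the like checks.
pipe = r.pipeline()
pipe.hmget("items", item_ids)                    # the feed item strings
for item_id in item_ids:
    pipe.sismember(f"likes:{item_id}", user_id)  # has this user liked it?
results = pipe.execute()

items, liked_flags = results[0], results[1:]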
As an alternative you can use a lua script and fetch everything in one request.
With this kind of database (a non-relational database), you have to make a trade-off between issuing multiple requests and including some data redundancy.
You should analyze each case separately and consider some aspects, like:
How frequently will this data be accessed?
How much space will this redundancy consume?
How many requests would I have to make in order to have all the data without redundancy?
Is performance an issue?
In your case, I would suggest keeping a set/hash or just JSON-encoded data for each user with a history of all recent user interactions, such as comments, likes, etc. Every time the user accesses the feed, you just have to read the feed and the history; only two requests.
One thing to keep in mind: on every user interaction, you must update all of the redundant data as well.
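A small sketch of that redundant layout with redis-py (key names are invented): the write path updates both the item's likers and the user's own history, and the read path then needs only the feed plus that one extra set:

import redis

r = redis.Redis()

def like_item(user_id, item_id):
    pipe = r.pipeline()
    pipe.sadd(f"likes:{item_id}", user_id)       # who liked this item
    pipe.sadd(f"user:{user_id}:liked", item_id)  # redundant copy kept per user
    pipe.execute()

def feed_with_likes(user_id):
    # Read path: feed ids plus the user's liked set.
    item_ids = [i.decode() for i in r.zrevrange(f"feed:{user_id}", 0, 14)]
    liked = {i.decode() for i in r.smembers(f"user:{user_id}:liked")}
    return [(item_id, item_id in liked) for item_id in item_ids]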
