Infinite/paginated scrolling with caching - ajax

I have a requirement where I need to display a long table. It doesn't have to be displayed all at once, so ajax loading it is (load first 50 recs, then get another 50 rows everytime the user scrolls to/past the tenth row from the last).
But I'm not sure which of the two, pagination and infinite scrolling, is better. I'd like the user to be able to skip to the last scrolled-to point when returning to the page (through Back button, definitely; if I can do that whenever, however user visits the page, even better!) with the previous rows visible as well. At the same time, for performance, I want to restrict the number of ajax calls to as low as I can keep it.
Any thoughts?

To implement such scenerio, first consume an api with page no and number of records as request params in API calls
For Ex- 'www.abc.com/v1/tableData/pageId=1&noOfRecords=50'
Then you will get the first 50 records. Its response should also provide you the total number of recors avaiallbe in database after callling first api .
When you scroll down, increase the pageId with +1
For ex - 'www.abc.com/v1/tableData/pageId=2&noOfRecords=50'
In the same way, you will increase the pageId untill you check the total records you got till now, should be equals to the total records, you are getting from API key.
In this way you can able to impmentent it.
Talking about performance, its up to you whther you are using pagination or scroll, it does not matter, since you are restricting the number of records to display.

Related

How to perform a "from" in Elasticsearch scroll context?

I have a large dataset to query and display in website on an array.
I made a pagination system with a scroll but i can only display a maximum of 100 items at a time so i'm facing issue when i want to display data of page 200 and more because i have to scroll until them and it take too long.
I have check other parts of my code and i didn't find other perf issue, is just the scroll queries which make my api call too long. I tried setting the request size from 100 to 10000 but it doesn't change anything.
I don't think sliced ​​scroll can be a solution or then I didn't understand the functionality.
I'm desperately searching a way to skip the scroll queries before datas that i'm searching even it's not a precise method.
Hoping someone has a solution or at least a clue.
Edit:
More details about what i'm trying to achieve.
I log some actions of my users like calls in Elasticsearch indexes. They do millions of actions per month so Elasticsearch seems like a good option to store them knowing that i don't have to update them after they are stored .
I'm creating a page where my users can search for actions they've performed, but they're doing the "query" themselves. I mean they can select the period and many other parameters, order them by many parameters, etc. The number of result can be 1 or 100,000 items, but I can't show 100,000 items on my page for UI reasons, so I have to manage a pagination and send only part of the result to the page.
I made a scroll query to do it for now with a size of 1000, and i scroll until i'm in the current page of my pagination. I tried to vary the size but it's not really concluent because I can't know the number of result before the query is made.
And the deeper my user go in the pagination, the longer the query take.
I could increase the index.max_result_window with an unreachable number (but I don't know what that implies) make a simple query with a from and a second scroll query for export case but I wonder if they are a way to skip some step in a scroll when i know i'm going to take 100 items after the 1 000 000th item ?
Edit: I watched how google design its pagination and i notice that if you want to go deep in search results you can't unless you go step by step. You can't go directly to the 500th page.
This is how I done mine
So I just redesign my pagination to do the same as Google and force my users to use more precise filters to get less result. Thank you #Val for getting me to ask the right questions :)

REST API - Retrieve previous query in dynamoDB

I have 100 rows of data in DynamoDB and a api with path api/get/{number}
Now when I say number=1 api should return me first 10 values. when I say number=2 it should return next 10 values. I did something like this with query, lastEvaluatedKey and sort by on createdOn . Now the use case is if the user passes number=10 after number=2 the lastEvaluatedKey is still that of page 2 and the result would be data of page 3. How can I get data directly. Also if the user goes from number=3 to number=1 still the data will not be of page 1.
I am using this to make API call based of pagination on HTML.
I am using java 1.8 and aws-java-sdk-dynamodb.
Non-sequential pagination in DynamoDB is tough - you have to design your data model around it, if it's an operation that needs to be efficient at all times. For a recommendation in your specific case I'd need more details about the data and access patterns.
In general you have the option of setting the ExclusiveStartKey attribute in the query call, which is similar to an offset in relational databases, but only similar and not identical. The ExclusiveStartKey is the key after which the query will continue, meaning data from your table and not just a number.
That means you usually can't guess it, unless it's a sequential number - which isn't ideal.
For sequential pagination, i.e. the user goes from page 1 to page 2, page 2 to page 3 etc. you can pass that along in the request as a token, but that won't work if the user moves in the other direction page 3 to page 2 or just randomly navigates to page 14.
In your case you only have a limited amount of data - 100 items, so my solution for your specific case would be to query all items and limit the amount of items in the response to n * 10, where n is the result page. Then you return the last 10 items from that result to your client.
This is a solution that would get expensive at scale (time + cost) though, fortunately not many people will use the pagination to go to page 7 or 8 though (you could bury a body on page 2 of the google search results).
Yan Cui has written an interesting post on this problem on Hackernoon, you might want to check it out.

How to combine multiple page records?

I have a model called DemoModel and contains 1000 records in DB. So i am paginating using paginator in Django(assume that per page 15 records, so i have 67 pages).
So i want to get the records of 3,4 and 5 pages and i have to append the records into list.
So can i get the objects_list based on page range or anything else i want to do?
Example:
records.page(1)
Here i am getting only one page records at a time, but how can i get multiple page records i.e; from fist page to third page
Assuming you are asking about the API request to get the paginated resources, and you are using the default pagination class: rest_framework.pagination.LimitOffsetPagination, then you can make an request as such:
https://api.example.org/accounts/?limit=30&offset=15
which in turns give you the 2nd and 3rd "page".
The limit indicates the maximum number of items to return, and is equivalent to the page_size in other styles. The offset indicates the starting position of the query in relation to the complete set of unpaginated items. doc link

Elasticsearch Scroll

I am little bit confused over Elasticsearch by its scroll functionality.
In elasticsearch is it possible to call search API everytime whenever the user scrolls on the result set?
From documentation
"search_type" => "scan", // use search_type=scan
"scroll" => "30s", // how long between scroll requests. should be small!
"size" => 50, // how many results *per shard* you want back
Is that mean it will perform search for every 30 seconds and returns all the sets of results until there is no records?
For example my ES returns total 500 records. I am getting an data from ES as two sets of records each with 250 records. Is there any way I can display first set of 250 records first, when user scrolls then second set of 250 records.Please suggest
What you are looking for is pagination.
You can achieve your objective by querying for a fixed size and setting the from parameter. Since you want to set display in batches of 250 results, you can set size = 250 and with each consecutive query, increment the value of from by 250.
GET /_search?size=250 ---- return first 250 results
GET /_search?size=250&from=250 ---- next 250 results
GET /_search?size=250&from=500 ---- next 250 results
On the contrary, Scan & scroll lets you retrieve a large set of results with a single search and is ideally meant for operations like re-indexing data into a new index. Using it for displaying search results in real-time is not recommended.
To explain Scan & scroll briefly, what it essentially does is that it scans the index for the query provided with the scan request and returns a scroll_id. This scroll_id can be passed to the next scroll request to return the next batch of results.
Consider the following example-
# Initialize the scroll
page = es.search(
index = 'yourIndex',
doc_type = 'yourType',
scroll = '2m',
search_type = 'scan',
size = 1000,
body = {
# Your query's body
}
)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
# Start scrolling
while (scroll_size > 0):
print "Scrolling..."
page = es.scroll(scroll_id = sid, scroll = '2m')
# Update the scroll ID
sid = page['_scroll_id']
# Get the number of results that we returned in the last scroll
scroll_size = len(page['hits']['hits'])
print "scroll size: " + str(scroll_size)
# Do something with the obtained page
In above example, following events happen-
Scroller is initialized. This returns the first batch of results along with the scroll_id
For each subsequent scroll request, the updated scroll_id (received in the previous scroll request) is sent and next batch of results is returned.
Scroll time is basically the time for which the search context is kept alive. If the next scroll request is not sent within the set timeframe, the search context is lost and results will not be returned. This is why it should not be used for real-time results display for indexes with a huge number of docs.
You are understanding wrong the purpose of the scroll property. It does not mean that elasticsearch will fetch next page data after 30 seconds. When you are doing first scroll request you need to specify when scroll context should be closed. scroll parameter is telling to close scroll context after 30 seconds.
After doing first scroll request you will get back scroll_idparameter in response. For next pages you need to pass that value to get next page of the scroll response. If you will not do the next scroll request within 30 seconds, the scroll request will be closed and you will not be able to get next pages for that scroll request.
What you described as an example use case is actually search results pagination, which is available for any search query and is limited by 10k results. scroll requests are needed for the cases when you need to go over that 10k limit, with scroll query you can fetch even the entire collection of documents.
Probably the source of confusion here is that scroll term is ambiguous: it means the type of a query, and also it is a name of a parameter of such query (as was mentioned in other comments, it is time ES will keep waiting for you to fetch next chunk of scrolling).
scroll queries are heavy, and should be avoided until absolutely necessary. In fact, in the docs
it says:
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, ...
Now regarding your another question:
In elasticsearch is it possible to call search API everytime whenever the user scrolls on the result set?
Yes, even several parallel scroll requests
are possible:
Each scroll is independent and can be processed in parallel like any scroll request.
The documentation of the Scroll API at elastic explains this behaviour also.
The result size of 10k is a default value and can be overwritten during runtime, if necessary:
PUT { "index" : { "max_result_window" : 500000} }
The life time of the scroll id is defined in each scroll request with the parameter "scroll", e.g.
..
"scroll" : "5m"
..
In recent versions of Elasticsearch, you'll use search_after. The keep_alive you set there, much like the timeout in the scroll, is only the time needed for you to process one page.
That's because Elasticsearch will keep your "search context" alive for that amount of time, then removes it. Also, Elasticsearch won't fetch the next page for you automatically, you'll have to do that by sending requests with the ID from the last request.
It is wise to use the scroll api as one can not get more than 10K data at a time in elasticsearch.

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server side database of products containing +- 50000 products (50Mb)
Assume no db type, we interact with it via REST/GraphQL interface
Assume a single product record is < 1kB
Assume a max payload for a resultset of 256kB
Assume max 5MB storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network) efficient way fetch pages from a result set so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
"totalResults": 2458,
"firstResult": 100,
"pageSize": 100,
"results": [
{"some": "item"},
{"some": "other item"},
// 98 more ...
]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. Because by the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of the result set. Even a small change can have a big impact: one item removed from the DB, that happened to be on page 0 of the result set, will change what results we will get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it got influenced by later changes in the DB (I plan to not really delete items, but set a removed flag on them, for this purpose)
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Any way I am hoping it is not too broad or theoretical. I have a web-based DB, just no good idea on how to do paging with it. I am looking for answers that help me in a direction to learn, not full solutions.
Versioning DB is the answer for resultsets consistency.
Each record has primary id, modification counter (version number) and timestamp of modification/creation. Instead of modification of record r you add new record with same id, version number+1 and sysdate for modification.
In fetch response you add DB request_time (do not use client timestamp due to possibly difference in time between client/server). First page is served normally, but you return sysdate as request_time. Other pages are served differently: you add condition like modification_time <= request_time for each versioned table.
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. So now the frontend can request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing DELETE operation to a removed operation because it would make sure that none of the entries from the result set it deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set or you can set a time limit on the lifetime of the cache entry.

Resources