I have 100 rows of data in DynamoDB and an API with the path api/get/{number}.
Now when I say number=1 the API should return the first 10 values, and when I say number=2 it should return the next 10 values. I did something like this with a query, lastEvaluatedKey, and a sort on createdOn. Now the use case is: if the user passes number=10 right after number=2, the lastEvaluatedKey is still that of page 2, so the result would be the data of page 3. How can I get the data directly? Also, if the user goes from number=3 to number=1, the data will still not be that of page 1.
I am using this to make API calls for pagination from HTML.
I am using Java 1.8 and aws-java-sdk-dynamodb.
Non-sequential pagination in DynamoDB is tough - you have to design your data model around it, if it's an operation that needs to be efficient at all times. For a recommendation in your specific case I'd need more details about the data and access patterns.
In general you have the option of setting the ExclusiveStartKey attribute in the query call, which is similar to an offset in relational databases, but only similar and not identical. The ExclusiveStartKey is the key after which the query will continue, meaning it is actual data from your table and not just a number.
That means you usually can't guess it, unless it's a sequential number - which isn't ideal.
For sequential pagination, i.e. the user goes from page 1 to page 2, page 2 to page 3, etc., you can pass that key along in the request as a token, but that won't work if the user moves in the other direction (page 3 to page 2) or just randomly navigates to page 14.
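For illustration, here is a minimal sketch of that token-based approach using the v1 Java SDK the question mentions; the table name, key schema, and key values are assumptions:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;
import java.util.Collections;
import java.util.Map;

public class SequentialPager {

    private final AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

    // Fetches one page of 10 items. "startKey" is the LastEvaluatedKey returned with the
    // previous page (null for the first page); it is the token the client passes back.
    public QueryResult nextPage(Map<String, AttributeValue> startKey) {
        QueryRequest request = new QueryRequest()
                .withTableName("Items")                    // assumed table name
                .withKeyConditionExpression("pk = :pk")    // assumed key schema
                .withExpressionAttributeValues(
                        Collections.singletonMap(":pk", new AttributeValue().withS("user#1")))
                .withScanIndexForward(true)                // ascending by the createdOn sort key
                .withLimit(10);
        if (startKey != null) {
            request.setExclusiveStartKey(startKey);        // continue right after the previous page
        }
        return client.query(request);                      // getLastEvaluatedKey() is the next token
    }
}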
In your case you only have a limited amount of data - 100 items - so my solution for your specific case would be to query all items and limit the number of items in the response to n * 10, where n is the requested result page. Then you return the last 10 items from that result to your client.
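A minimal sketch of that idea, again with assumed table and key names: it always queries from the start with a limit of n * 10 and slices out the last 10 items, so it works for any page the user jumps to (at the cost of re-reading everything before it):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class RandomAccessPager {

    private static final int PAGE_SIZE = 10;
    private final AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

    // "page" is 1-based, i.e. the {number} from api/get/{number}.
    public List<Map<String, AttributeValue>> getPage(int page) {
        QueryRequest request = new QueryRequest()
                .withTableName("Items")                    // assumed table name
                .withKeyConditionExpression("pk = :pk")    // assumed key schema, sorted by createdOn
                .withExpressionAttributeValues(
                        Collections.singletonMap(":pk", new AttributeValue().withS("user#1")))
                .withLimit(page * PAGE_SIZE);              // read everything up to the requested page

        List<Map<String, AttributeValue>> items = client.query(request).getItems();

        int from = (page - 1) * PAGE_SIZE;
        if (from >= items.size()) {
            return Collections.emptyList();                // requested page is past the end
        }
        return items.subList(from, Math.min(from + PAGE_SIZE, items.size()));
    }
}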
This is a solution that would get expensive at scale (time + cost), though fortunately not many people use the pagination to go to page 7 or 8 (you could bury a body on page 2 of the Google search results).
Yan Cui has written an interesting post on this problem on Hackernoon; you might want to check it out.
I have a large dataset to query and display on a website as an array.
I made a pagination system with a scroll, but I can only display a maximum of 100 items at a time, so I'm facing an issue when I want to display data from page 200 onwards, because I have to scroll all the way to it and it takes too long.
I have checked other parts of my code and didn't find any other performance issues; it's just the scroll queries that make my API call too long. I tried setting the request size from 100 to 10000 but it doesn't change anything.
I don't think sliced scroll can be a solution, or else I didn't understand the functionality.
I'm desperately searching for a way to skip the scroll queries that come before the data I'm searching for, even if it's not a precise method.
Hoping someone has a solution or at least a clue.
Edit:
More details about what I'm trying to achieve.
I log some of my users' actions, like calls, in Elasticsearch indexes. They perform millions of actions per month, so Elasticsearch seems like a good option to store them, given that I don't have to update them after they are stored.
I'm creating a page where my users can search for actions they've performed, but they're doing the "query" themselves. I mean they can select the period and many other parameters, order them by many parameters, etc. The number of results can be 1 or 100,000 items, but I can't show 100,000 items on my page for UI reasons, so I have to manage pagination and send only part of the results to the page.
For now I do it with a scroll query with a size of 1000, and I scroll until I reach the current page of my pagination. I tried to vary the size but it's not really conclusive because I can't know the number of results before the query is made.
And the deeper my user goes in the pagination, the longer the query takes.
I could increase index.max_result_window to an unreachable number (but I don't know what that implies), make a simple query with a from, and keep a second scroll query for the export case, but I wonder if there is a way to skip some steps in a scroll when I know I'm going to take 100 items after the 1,000,000th item?
Edit: I looked at how Google designs its pagination and noticed that if you want to go deep into the search results you can't, unless you go step by step. You can't jump directly to the 500th page.
This is how I did mine:
So I just redesigned my pagination to do the same as Google and force my users to use more precise filters to get fewer results. Thank you @Val for getting me to ask the right questions :)
I have a requirement where I need to display a long table. It doesn't have to be displayed all at once, so ajax loading it is (load the first 50 records, then get another 50 rows every time the user scrolls to/past the tenth row from the last).
But I'm not sure which of the two, pagination or infinite scrolling, is better. I'd like the user to be able to skip to the last scrolled-to point when returning to the page (through the Back button, definitely; if I can do that whenever, however the user visits the page, even better!) with the previous rows visible as well. At the same time, for performance, I want to keep the number of ajax calls as low as I can.
Any thoughts?
To implement such a scenario, first consume an API that takes the page number and the number of records as request parameters in the API call.
For example: 'www.abc.com/v1/tableData/pageId=1&noOfRecords=50'
Then you will get the first 50 records. The response should also give you the total number of records available in the database after calling the first API.
When you scroll down, increase the pageId by 1.
For example: 'www.abc.com/v1/tableData/pageId=2&noOfRecords=50'
In the same way, keep increasing the pageId until the number of records you have received so far equals the total number of records reported by the API.
This is how you can implement it; a rough sketch of the client-side loop is below.
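Just to make the stopping condition concrete, here is a rough Java sketch; PageResponse and fetchPage are placeholders (assumptions) for your actual HTTP call and response model, and in a real UI you would fetch one page per scroll event instead of looping:

import java.util.ArrayList;
import java.util.List;

public class TableDataLoader {

    // Minimal holder for the assumed response of the tableData endpoint (field names are illustrative).
    static class PageResponse {
        int totalRecords;
        List<String> records = new ArrayList<>();
    }

    // Placeholder: perform the GET request for one page and parse the JSON response.
    PageResponse fetchPage(int pageId, int noOfRecords) {
        return new PageResponse();
    }

    // Keep requesting pages until the records loaded so far equal the total reported by the API.
    List<String> loadAll(int noOfRecords) {
        List<String> loaded = new ArrayList<>();
        int pageId = 1;
        PageResponse page;
        do {
            page = fetchPage(pageId++, noOfRecords);
            loaded.addAll(page.records);
        } while (!page.records.isEmpty() && loaded.size() < page.totalRecords);
        return loaded;
    }
}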
Talking about performance, it's up to you whether you use pagination or infinite scroll; it doesn't matter much, since you are restricting the number of records displayed either way.
This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server-side database of products containing roughly 50,000 products (50 MB)
Assume no specific DB type; we interact with it via a REST/GraphQL interface
Assume a single product record is < 1 kB
Assume a max payload of 256 kB per result set
Assume max 5 MB of storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network-)efficient way to fetch pages from a result set so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
    "totalResults": 2458,
    "firstResult": 100,
    "pageSize": 100,
    "results": [
        {"some": "item"},
        {"some": "other item"},
        // 98 more ...
    ]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page, because by the time we request the next page the result set may have changed (due to changes in the DB), influencing which items are part of it. Even a small change can have a big impact: one item removed from the DB that happened to be on page 0 of the result set will change what results we get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it would otherwise be influenced by later changes in the DB (for this purpose, I plan to not really delete items but to set a removed flag on them).
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Anyway, I am hoping it is not too broad or theoretical. I have a web-based DB, just no good idea of how to do paging with it. I am looking for answers that point me in a direction to learn, not full solutions.
Versioning the DB is the answer for result-set consistency.
Each record has a primary id, a modification counter (version number), and a timestamp of modification/creation. Instead of modifying record r you add a new record with the same id, version number + 1, and sysdate as the modification time.
In the fetch response you add the DB request_time (do not use a client timestamp due to a possible time difference between client and server). The first page is served normally, but you return sysdate as request_time. Other pages are served differently: you add a condition like modification_time <= request_time for each versioned table.
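For illustration, a hedged JDBC sketch of such a page query (the products table, the column names, and the LIMIT/OFFSET dialect are assumptions); the subquery picks the latest version of each record that already existed at request_time:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class VersionedPageQuery {

    // Returns one page of the result set exactly as it looked at requestTime.
    public void fetchPage(Connection connection, Timestamp requestTime,
                          int page, int pageSize) throws SQLException {
        String sql =
            "SELECT p.* FROM products p " +
            "WHERE p.modification_time <= ? " +
            "  AND p.version = (SELECT MAX(v.version) FROM products v " +
            "                   WHERE v.id = p.id AND v.modification_time <= ?) " +
            "ORDER BY p.id " +
            "LIMIT ? OFFSET ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setTimestamp(1, requestTime);   // the request_time returned with the first page
            ps.setTimestamp(2, requestTime);
            ps.setInt(3, pageSize);
            ps.setInt(4, page * pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // map each row to your product model here
                }
            }
        }
    }
}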
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. Now the frontend can request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing the DELETE operation into a removed flag, because it makes sure that none of the entries from the result set get deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set, or you can set a time limit on the lifetime of the cache entry.
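A minimal sketch of that server-side cache (the Long ID type, the token format, and the eviction policy are all illustrative):

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ResultSetCache {

    // query token -> ordered list of matching product IDs
    private final Map<String, List<Long>> cache = new ConcurrentHashMap<>();

    // First request: run the search, cache the full ID list, hand the token to the client.
    public String storeResultSet(List<Long> matchingIds) {
        String token = UUID.randomUUID().toString();
        cache.put(token, matchingIds);
        return token;
    }

    // Subsequent next_page requests: slice the cached ID list, then load only those rows.
    public List<Long> getPageIds(String token, int page, int pageSize) {
        List<Long> ids = cache.get(token);
        int from = page * pageSize;
        if (ids == null || from >= ids.size()) {
            return Collections.emptyList();
        }
        return ids.subList(from, Math.min(from + pageSize, ids.size()));
    }

    // Drop the entry when the client reaches the end of the result set (or after a TTL).
    public void evict(String token) {
        cache.remove(token);
    }
}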
I'm trying to create a backend in Spring Data MongoDB. I have the following code which works, using the built-in methods I get by extending my repo with the MongoRepository interface:
@RequestMapping(value = "/nextpost", method = RequestMethod.GET)
public List getNextPosts(@RequestParam int next) {
    // page "next" with 5 items per page, sorted by id descending (newest first)
    Pageable pageable = new PageRequest(next, 5, new Sort(new Sort.Order(Direction.DESC, "id")));
    Page page = repo.findAll(pageable);
    return page.getContent();
}
The above code will return the page as per the page number inserted into the "next" variable.
My Android frontend however allows things to be added to and deleted from the database, and this causes problems with this pagination method. Let's take an example:
When my android frontend starts up, it loads the first 5 items by
calling "getNextPosts" with next = 0.
My android frontend also keeps track of the page it is on and
increments it when the user wants to see more items.
Now, we immediately add 5 more items.
When I swipe up to fetch the next 5 items, it calls the "getNextPosts" method passing the "next page" value = 1. The app will load the same 5 items originally displayed when the app was started, as the 5 "NEW" items I have added just pushed the 5 "OLD" items down in the database.
Therefore on the app, we see 15 items comprising of:
5 "NEW" + 5 "OLD" + 5 "OLD"
So if I gave numbers to all my items on my android ListView, I would see:
15
14
13
12
11
// the above would be the new items added
10
9
8
7
6
//the above would be the original items on page 0
10
9
8
7
6
//the above would be still be the original items but now we are on page 1
Does anyone know how one can solve this issue so that when I swipe up, the items would be:
15
14
13
12
11
// the above would be the new items added
10
9
8
7
6
5
//the above would be the original items on page 0
4
3
2
1
0
//the above would be on page 1
tl;dr
That's the nature of the beast. Pagination in Spring Data is defined as retrieving a part of the result set at the time of querying. Especially for remote communication, that kind of statelessness is usually the best tradeoff between keeping state, keeping connections open, scalability etc.
Details
The only way to avoid this would be to capture the state of the database at the time of the first access and only work on that. You can actually build this by retrieving all items and page through the data locally.
Of course hardly anyone does this as it easily gets out of hand for larger data volumes. Also, this would bring up other problems like: when do you actually want to see the items introduced in the meantime? So the definition of "correct content" when paginating a list is not distinct.
Mitigation strategies
If applicable to your scenario you could try to apply a sorting that guarantees new items to be added at the very end and thus basically making this an append-only list. This would naturally sort the most recent items last though, which is contrary to what's needed often times.
If you use the pagination to work down a list of items and process all of them, another approach is to keep track of the identifiers of the items you already have processed. In your particular scenario, you'd be able to detect that the items have already been processed and go on with the next page. This of course only makes sense if you read and process faster than someone else manipulates the list in the backend.
Another solution could be to store an insert timestamp into the db for each entry. This enables you to create deterministic pagination queries:
The moment you initialize pagination (querying the first page) you restrict items to have an insert timestamp lower than or equal to now(). You have to save now() as the pagination timestamp for querying more pages in the future. Since newly added items all get an insert timestamp greater than the pagination timestamp, those items won't affect existing paginations.
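In Spring Data MongoDB terms, a sketch of such a page query could look like the class below; it uses MongoTemplate with a Criteria query instead of the repository's findAll, and the Post class, the "insertedAt" field, and the injected mongoTemplate are assumptions:

import java.util.Date;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public class PostReader {

    @Autowired
    private MongoTemplate mongoTemplate;

    // Fetch page "next" of the result set exactly as it looked at paginationTimestamp,
    // which is captured with new Date() on the first request and passed back for later pages.
    public List<Post> getNextPosts(int next, Date paginationTimestamp) {
        Query query = new Query()
                .addCriteria(Criteria.where("insertedAt").lte(paginationTimestamp))
                .with(new Sort(Sort.Direction.DESC, "insertedAt"))
                .with(new PageRequest(next, 5));
        return mongoTemplate.find(query, Post.class);
    }
}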
Please keep in mind that new items won't show until you re-initialize pagination by refreshing the pagination timestamp. But you can simply check for the existence of new items by counting the number of items with an insertion timestamp greater than the pagination timestamp and, in that case, show a refresh button or something like that.
I am using ASP/VBScript in my project, but I don't know much about pagination in Classic ASP. I have designed a datagrid format using tables and looping. That table is filled by accessing the database. As we have a huge amount of data to display, we need pagination.
Thanks in advance
The pagination problem is not specific to Classic ASP or VBScript. You first need to define which strategy to follow:
In the client:
Ajax-style pagination (you can use a jQuery plugin like SlickGrid)
Linked pagination: your page has links to page 1, page 2, etc.
Infinite scrolling: this is a modern way to do pagination, with more results added to the page via ajax
In the server:
Full DB retrieve, returning only the page asked for. This is sometimes necessary.
Full DB retrieve, but caching the result so subsequent page requests come from the cache, not the DB
Ask the DB for only the page asked for (different techniques depending on the DB engine)
There is an issue you need to be aware of... the built-in ASP recordset will allow paging, however it is not very efficient. The entire result set gets returned, and then the display code locates the appropriate page and displays that data.
Think of it like this... your result set is a 4-shelf bookcase. When you ask for page one, all 4 shelves of books get returned. Then the display code says "Okay, now only show page 1". If you then ask for page two... all four shelves of books get returned again and then the display code says "Okay, give me page 2".
So, you should look for a paging solution that takes place on the server, inside the database. This way if you ask for page 15 of a 50 page result, the database will only return one shelf of books.
This google query should put you on the right track.
Edit: How SQL Paging Works
You must use a stored procedure
One of the input parameters is the page to view
The stored procedure filters the results on the server
Here is the basic concept of what happens inside the proc:
Step 1:
Create a temp table that stores the entire result set. My preference is to store only two values in this temp table. An identity seed value called RowId and the primary key of the result data. (I'm one of those people that believes in non-sensical identity seed keys)
Step 2:
Insert all the PKey values from the select statement into the temp table
Step 3:
Determine the StartRowId and EndRowId based on the input page parameter.
Step 4:
Select from the temp table using an inner join to the datatable on the PKey. In the where clause limit the result so the RowId (of the temp table) is between StartRowId and EndRowId. Make sure to Order By the RowId.
Set the page size:
recordset.PageSize = 100 ' number of records per page
Set the current page:
recordset.AbsolutePage = nPage ' nPage being the page you want to jump to.
Other useful bits:
recordset.RecordCount ' number of records returned
recordset.PageCount ' number of pages based on PageSize and RecordCount
That's the basic info. You'll still need to loop through the appropriate number of records, and check the page number as it is passed back to the page.