How to perform a "from" in Elasticsearch scroll context? - elasticsearch

I have a large dataset to query and display in website on an array.
I made a pagination system with a scroll but i can only display a maximum of 100 items at a time so i'm facing issue when i want to display data of page 200 and more because i have to scroll until them and it take too long.
I have check other parts of my code and i didn't find other perf issue, is just the scroll queries which make my api call too long. I tried setting the request size from 100 to 10000 but it doesn't change anything.
I don't think sliced ​​scroll can be a solution or then I didn't understand the functionality.
I'm desperately searching a way to skip the scroll queries before datas that i'm searching even it's not a precise method.
Hoping someone has a solution or at least a clue.
Edit:
More details about what i'm trying to achieve.
I log some actions of my users like calls in Elasticsearch indexes. They do millions of actions per month so Elasticsearch seems like a good option to store them knowing that i don't have to update them after they are stored .
I'm creating a page where my users can search for actions they've performed, but they're doing the "query" themselves. I mean they can select the period and many other parameters, order them by many parameters, etc. The number of result can be 1 or 100,000 items, but I can't show 100,000 items on my page for UI reasons, so I have to manage a pagination and send only part of the result to the page.
I made a scroll query to do it for now with a size of 1000, and i scroll until i'm in the current page of my pagination. I tried to vary the size but it's not really concluent because I can't know the number of result before the query is made.
And the deeper my user go in the pagination, the longer the query take.
I could increase the index.max_result_window with an unreachable number (but I don't know what that implies) make a simple query with a from and a second scroll query for export case but I wonder if they are a way to skip some step in a scroll when i know i'm going to take 100 items after the 1 000 000th item ?
Edit: I watched how google design its pagination and i notice that if you want to go deep in search results you can't unless you go step by step. You can't go directly to the 500th page.
This is how I done mine
So I just redesign my pagination to do the same as Google and force my users to use more precise filters to get less result. Thank you #Val for getting me to ask the right questions :)

Related

REST API - Retrieve previous query in dynamoDB

I have 100 rows of data in DynamoDB and a api with path api/get/{number}
Now when I say number=1 api should return me first 10 values. when I say number=2 it should return next 10 values. I did something like this with query, lastEvaluatedKey and sort by on createdOn . Now the use case is if the user passes number=10 after number=2 the lastEvaluatedKey is still that of page 2 and the result would be data of page 3. How can I get data directly. Also if the user goes from number=3 to number=1 still the data will not be of page 1.
I am using this to make API call based of pagination on HTML.
I am using java 1.8 and aws-java-sdk-dynamodb.
Non-sequential pagination in DynamoDB is tough - you have to design your data model around it, if it's an operation that needs to be efficient at all times. For a recommendation in your specific case I'd need more details about the data and access patterns.
In general you have the option of setting the ExclusiveStartKey attribute in the query call, which is similar to an offset in relational databases, but only similar and not identical. The ExclusiveStartKey is the key after which the query will continue, meaning data from your table and not just a number.
That means you usually can't guess it, unless it's a sequential number - which isn't ideal.
For sequential pagination, i.e. the user goes from page 1 to page 2, page 2 to page 3 etc. you can pass that along in the request as a token, but that won't work if the user moves in the other direction page 3 to page 2 or just randomly navigates to page 14.
In your case you only have a limited amount of data - 100 items, so my solution for your specific case would be to query all items and limit the amount of items in the response to n * 10, where n is the result page. Then you return the last 10 items from that result to your client.
This is a solution that would get expensive at scale (time + cost) though, fortunately not many people will use the pagination to go to page 7 or 8 though (you could bury a body on page 2 of the google search results).
Yan Cui has written an interesting post on this problem on Hackernoon, you might want to check it out.

Infinite/paginated scrolling with caching

I have a requirement where I need to display a long table. It doesn't have to be displayed all at once, so ajax loading it is (load first 50 recs, then get another 50 rows everytime the user scrolls to/past the tenth row from the last).
But I'm not sure which of the two, pagination and infinite scrolling, is better. I'd like the user to be able to skip to the last scrolled-to point when returning to the page (through Back button, definitely; if I can do that whenever, however user visits the page, even better!) with the previous rows visible as well. At the same time, for performance, I want to restrict the number of ajax calls to as low as I can keep it.
Any thoughts?
To implement such scenerio, first consume an api with page no and number of records as request params in API calls
For Ex- 'www.abc.com/v1/tableData/pageId=1&noOfRecords=50'
Then you will get the first 50 records. Its response should also provide you the total number of recors avaiallbe in database after callling first api .
When you scroll down, increase the pageId with +1
For ex - 'www.abc.com/v1/tableData/pageId=2&noOfRecords=50'
In the same way, you will increase the pageId untill you check the total records you got till now, should be equals to the total records, you are getting from API key.
In this way you can able to impmentent it.
Talking about performance, its up to you whther you are using pagination or scroll, it does not matter, since you are restricting the number of records to display.

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server side database of products containing +- 50000 products (50Mb)
Assume no db type, we interact with it via REST/GraphQL interface
Assume a single product record is < 1kB
Assume a max payload for a resultset of 256kB
Assume max 5MB storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network) efficient way fetch pages from a result set so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
"totalResults": 2458,
"firstResult": 100,
"pageSize": 100,
"results": [
{"some": "item"},
{"some": "other item"},
// 98 more ...
]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. Because by the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of the result set. Even a small change can have a big impact: one item removed from the DB, that happened to be on page 0 of the result set, will change what results we will get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it got influenced by later changes in the DB (I plan to not really delete items, but set a removed flag on them, for this purpose)
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Any way I am hoping it is not too broad or theoretical. I have a web-based DB, just no good idea on how to do paging with it. I am looking for answers that help me in a direction to learn, not full solutions.
Versioning DB is the answer for resultsets consistency.
Each record has primary id, modification counter (version number) and timestamp of modification/creation. Instead of modification of record r you add new record with same id, version number+1 and sysdate for modification.
In fetch response you add DB request_time (do not use client timestamp due to possibly difference in time between client/server). First page is served normally, but you return sysdate as request_time. Other pages are served differently: you add condition like modification_time <= request_time for each versioned table.
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. So now the frontend can request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing DELETE operation to a removed operation because it would make sure that none of the entries from the result set it deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set or you can set a time limit on the lifetime of the cache entry.

Grafana Templating

I am wondering if there is a way to limit the number of rows generated from grafana templating.
I can have a drop down variable "$x"on my grafana page and I can select the row editor and say repeat row for every value under $x.
Based on different criteria, I can have x anywhere between 1 and like 160 rows. I need to only be looking at about 10 at a time. I am wondering if I can control the number of rows shown/change the rows shown somewhere in grafana.
I can manually select items from the $x drop down to show only a few items of course, but its a matter of selecting only say.. 10 items right when the page loads.
Please let me know if I am not describing the problem correctly or if I need to clarify more.
Thanks,
Karan
As far as I know there is no direct option for this but there are some ways you may be able to achieve what you want.
You could select your ~10 default entries and then save the dashboard this will save the selected ones in the dashboards JSON. (or modify the JSON of the dashboard directly)
You could use the regex field in the template settings to filter some of your values and split them in groups this way. (one variable per regex group)
You could change your data in elasticsearch in order to use multiple fields where you can split on.
see PR #5616 as #Daniel Lee mentioned
In general I think you get a faster response to this in grafana directly at github.

How to display large amount of data fast

I have a large database table that I need to display on a Windows Form. The data is some sort of a "category list" which I need to display to the user in a treeview-like structure, there are categories with more sub-categories. Treeview control has delay loading but the problem is there could be hundred thousand root nodes with 4 column string values. I tried adding 100000 nodes to a treeview and it took 5 minutes to complete. Are there any other options for such an operation? Can you give me any ideas? It doesn't have to be a treeview..
That is a lot of nodes, but did you try calling BeginUpdate() and then EndUpdate() when you added the list of nodes to the treeview? That would probably bring your performance up a bit!
http://msdn.microsoft.com/en-us/library/system.windows.forms.treeview.beginupdate.aspx
For applications like this, I tend to implement features such as "search as you type", which would return category names as they are typed into a textbox. For example, on each keystroke, I would go back to the DB and return the top 10 or so values that begin with what's in the text box. If there are more than 10 results, I either indicate that more results are present, or I tell them to refine their search. IMO direct searching always trumps sorting and/or paging. I hate paging. It's almost always an admission that your search functionality is not good enough!
Perhaps implement some sort of paging mechanism. 100,000 items in a TreeView would be very difficult to read from a user's perspective. Providing only 1,000 or even fewer root nodes at a time would certainly cut back on load times.
You can use cache capabilities. 20000 records generally takes 0,2s to load. Check your language support in order to use it.
regards,
TreeView has an event BeforeExpand. You can use it to determine on the fly which node content to load. That is, you first load only the top level nodes in your TreeView.
If the user is about to expand a node you can fetch the required data and fill the subnodes of that node. Use the Tag property of TreeNode to store an ID which data belongs to which node.
Make sure to use BeginUpdate() and EndUpdate() or use AddRange() instead of Add() to add nodes because it is much faster.
I decided to implement a virtual/server mode for the treeview.
Thanks for the answers.

Resources