How to get a previous page from elasticsearch using search_after?

I'm trying to create fast pagination with Elasticsearch. I have read this doc page about the search_after operator. I understand how to build "forward" pagination, but I can't figure out how to move to the previous page in this case.

In a project we are working on, we're going to simply reverse the sort direction and then use search_after as if it were a search_before.
This is a late answer, but it's a little better than having to keep track of results in the application. For that specific scenario the Scroll API (which I don't know whether it was available at the time) may be better suited.
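To illustrate, here is a minimal sketch of the "reverse and flip" approach; the index, the sort fields, and the currentPage variable are hypothetical stand-ins for whatever your forward query uses:
// Forward sort is assumed to be [{ timestamp: "asc" }, { _id: "asc" }].
var prevBody = {
    size: 10,
    sort: [{ timestamp: "desc" }, { _id: "desc" }], // every sort clause reversed
    // Anchor on the FIRST hit of the page you are currently showing:
    search_after: currentPage.hits.hits[0].sort,
    query: { match_all: {} }
};
// POST prevBody to /my-index/_search, then reverse the returned hits
// so they display in the original ascending order.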

Although the API doesn't offer a "search before", there is a workaround.
It's easy to move backward, and I had to do it as well. Just keep track of the sort values in variables in whatever language you're using.
I kept an object alongside the searches, with a pointer to track where I was in the data. For example:
$scope.search.display = 0; // current offset into the results
$scope.searchIndex = {};   // maps a result offset to the sort values that start its page

var data = getElasticSearchQuery(); // this is your data from the elastic query
if (!$scope.searchIndex[$scope.search.display + 10] && data.hits.length > 0) {
    // Remember the sort values of the last hit on this page; they become
    // the search_after anchor for the next page.
    $scope.searchIndex[$scope.search.display + 10] = data.hits[data.hits.length - 1].sort;
}
If you have 'next' and 'previous' buttons, then in your POST request to elastic just assign the search_after parameter from the tracked index:
$scope.prevButton = function() {
    $scope.search.display -= 10;
    if ($scope.search.display < 10) {
        $scope.search.searchAfter = null; // first page needs no search_after
    }
    if ($scope.searchIndex[$scope.search.display]) {
        $scope.search.searchAfter = $scope.searchIndex[$scope.search.display];
    }
    $scope.sendResults(); // send the post in an elastic search query
};

$scope.nextButton = function() {
    $scope.search.display += 10;
    if ($scope.searchIndex[$scope.search.display]) {
        $scope.search.searchAfter = $scope.searchIndex[$scope.search.display];
    }
    $scope.sendResults(); // send the post in an elastic search query
};
That should get you on your feet. The 10 is my page size, meaning I paginate 10 results at a time.
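For completeness, a rough sketch of the body that sendResults() might POST to Elasticsearch; the sort fields here are hypothetical, and the sort must be identical on every request or the saved sort values won't line up:
var body = {
    size: 10,
    sort: [{ timestamp: "desc" }, { _id: "asc" }], // same sort on every page
    query: { match_all: {} }
};
if ($scope.search.searchAfter) {
    // sort values previously saved in $scope.searchIndex
    body.search_after = $scope.search.searchAfter;
}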

Related

How does Apollo paginated "read" and "merge" work?

I was reading through the docs to learn pagination approaches for Apollo. This is the simple example where they explain the paginated read function:
https://www.apollographql.com/docs/react/pagination/core-api#paginated-read-functions
Here is the relevant code snippet:
const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        feed: {
          read(existing, { args: { offset, limit }}) {
            // A read function should always return undefined if existing is
            // undefined. Returning undefined signals that the field is
            // missing from the cache, which instructs Apollo Client to
            // fetch its value from your GraphQL server.
            return existing && existing.slice(offset, offset + limit);
          },
          // The keyArgs list and merge function are the same as above.
          keyArgs: [],
          merge(existing, incoming, { args: { offset = 0 }}) {
            const merged = existing ? existing.slice(0) : [];
            for (let i = 0; i < incoming.length; ++i) {
              merged[offset + i] = incoming[i];
            }
            return merged;
          },
        },
      },
    },
  },
});
I have one major question about this snippet, and about other snippets from the docs that have the same "flaw" in my eyes, but I feel like I'm missing some piece.
Suppose I run a first query with offset=0 and limit=10. The server returns 10 results, which are stored in the cache after passing through the merge function.
Afterwards, I run the query with offset=5 and limit=10. Based on the approach described in the docs and the above code snippet, my understanding is that I will get only the items from 5 through 10 instead of items from 5 to 15, because Apollo will see that the existing variable is present in read (with existing holding the initial 10 items) and will slice the available 5 items for me.
My question is: what am I missing? How will Apollo know to fetch new data from the server? How will new data arrive in the cache after the initial query? Keep in mind keyArgs is set to [], so the results will always be merged into a single item in the cache.
Apollo will not slice anything automatically. You have to define a merge function that keeps the data in the correct order in the cache. One approach is to keep an array with empty slots for data not yet fetched, and place incoming data at their respective indexes. For instance, if you fetch items 30-40 out of a total of 100, your array would have 30 empty slots, then your items, then 60 empty slots. If you subsequently fetch items 70-80, those will be placed at their respective indexes, and so on.
Your read function is where the decision on whether a network request is necessary will be made. If you find all the data in existing, you return them and no request to the server is made. If any items are missing, you need to return undefined, which triggers a network request; your merge function will then run once data is fetched, and finally your read function will run again, only this time the data will be in the cache and it will be able to return them.
This approach is for the cache-first fetch policy, which is the default.
The logic for returning undefined from your read function is implemented by you. There is no Apollo magic under the hood.
If you use the cache-and-network policy, your read doesn't need to return undefined when data is missing, because a network request is made regardless.
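As a minimal sketch (my own illustration, not code from the Apollo docs) of a read function that returns undefined whenever part of the requested page is missing, assuming the same sparse-array merge as in the snippet above:
read(existing, { args: { offset = 0, limit } }) {
    if (!existing) return undefined;
    const page = existing.slice(offset, offset + limit);
    if (page.length < limit) return undefined; // ran past the end of the cache
    for (let i = 0; i < limit; ++i) {
        // Holes in the sparse array read as undefined: never fetched.
        if (page[i] === undefined) return undefined;
    }
    return page;
}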

Simple pagination in datatable using ajax without sending total count from server

I'm using DataTables 1.10.5. My table uses server side processing via ajax.
$('#' + id).dataTable({
    processing: true,
    serverSide: true,
    ajax: 'server-side-php-script-url',
    pagingType: 'simple_incremental_bootstrap'
});
Everything works properly if I send 'recordsTotal' in the server response. But I don't want to count the total entries because of performance issues. So I tried to use the pagination plugin simple_incremental_bootstrap. However, it is not working as expected: the next button always returns the first page itself. If I give 'recordsTotal' in the server response, this plugin works properly. I found out that if we don't give 'recordsTotal', the 'start' param sent by DataTables to the server-side script is always 0, so my server-side script always returns the first page.
According to this discussion, server side processing without calculating total count is not possible because “DataTables uses the record count that is passed back to it to deal with the paging controls”. The suggested workaround is “So the display records are needed, but it would be possible to just pass back a static number (like 1'000'000 or whatever) which would make DataTables think there are a million rows. You could hide the information element if this information is totally bogus!”
I wonder if anybody have a solution for this. Basically I want to have a simple pagination in my datatable with ajax without sending total count from server.
A workaround worth trying...
If we don't send recordsTotal from the server, the pagination won't work properly. If we send a high static number as recordsTotal, the table will show an active Next button even if there is no data on the next page.
So I ended up with a solution that uses two parameters received by the ajax script: 'start' and 'length'.
If the number of rows in the current page is less than the limit, there is no data on the next page, so the total count will be 'start' + the current page count. This disables the Next button on the last page.
If the number of rows in the current page is equal to or greater than the limit, there may be more data on the next pages. Then I fetch the data for the next page. If there is at least one row on the next page, I send a recordsTotal somewhat larger than 'start' + 'limit', which displays an active Next button.
Sample code:
$limit = require_param('length');
$offset = require_param('start');
$current_page_data = fn_to_calculate_data($limit, $offset); // in my case, a mysqli result.
$data = fetch_rows($current_page_data); // placeholder: build the DataTables rows array from the result
$current_page_count = mysqli_num_rows($current_page_data);
if ($current_page_count >= $limit) {
    $next_page_data = fn_to_calculate_data($limit, $offset + $limit);
    $next_page_count = mysqli_num_rows($next_page_data);
    if ($next_page_count >= $limit) {
        // Not the exact count, just an indication that we have more pages to show.
        $total_count = $offset + (2 * $limit);
    } else {
        $total_count = $offset + $limit + $next_page_count;
    }
} else {
    $total_count = $offset + $current_page_count;
}
$filtered_count = $total_count;
send_json(array(
    'draw' => $params['draw'],
    'recordsTotal' => $total_count,
    'recordsFiltered' => $filtered_count,
    'data' => $data
));
However, this solution adds some load on the server, as it additionally counts the rows on the next page. Still, that is a small load compared to counting the total rows.
We need to hide the count information in the table footer and use simple pagination.
dtOptions = {};
dtOptions.pagingType = "simple";
dtOptions.fnDrawCallback = function() {
    $('#' + table_id + "_info").hide();
};
$('#' + table_id).dataTable(dtOptions);

GraphQL Relay hasNextPage

How does GraphQL generate hasNextPage if only the "first" parameter is passed?
I am using
return relay.connectionFromPromisedArray(
    global.app.get('model__user').getUsers(args),
    args
);
and query:
query RootQueryType {
  viewer {
    user(id: 1) {
      id
      email
      friends(first: 5) {
        edges { cursor, node { id, email } }
        pageInfo { hasNextPage }
      }
    }
  }
}
So how can I pass the friends count to graphql / relay so that hasNextPage will be generated correctly?
Relay pagination is not page based, but rather cursor based. So you paginate by saying "I want X items after item Y". Item Y is not pointed to as a page number or an offset, but rather as a pointer to that exact object, a so-called cursor. This model of pagination is nice for, for example, infinite scrolling. "Pages" are also stable after adding or removing items, as they don't depend on number of items.
hasNextPage in Relay GraphQL spec just indicates whether there are more items after the last element that has been retrieved. So in your case, it means there are more than 5 elements in total and you'll get more elements if you do
friends(first: 5, after: "CURSOR_TO_THE_LAST_ELEMENT")
You can retrieve cursor from the edges list, it's one of the elements alongside node there.
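For illustration (variable names are mine; the response shape follows the Relay connection spec), pulling the last cursor out of a response to ask for the next page:
var edges = response.data.viewer.user.friends.edges;
var lastCursor = edges[edges.length - 1].cursor;
// Next query: friends(first: 5, after: lastCursor)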
You can find detailed information on the relay pagination algorithm here: https://facebook.github.io/relay/graphql/connections.htm#sec-Pagination-algorithm.
To answer your specific question about hasNextPage, this is the algorithm:
function hasNextPage(allEdges, before, after, first, last) {
    // If first was not set, return false.
    if (first === null) { return false; }
    // Apply the before & after cursor arguments to the set of edges,
    // i.e. edges is the set of edges between the before and after cursors.
    const edges = ApplyCursorsToEdges(allEdges, before, after);
    // If more edges exist between the before & after cursors than
    // you are asking for, then there is a next page.
    if (edges.length > first) { return true; }
    return false;
}
A quick note on cursor- vs page-based pagination. It is generally a bad idea to paginate using fixed page sizes. A classic example of this is using the OFFSET keyword in SQL to grab the next page. There are many issues with this approach. For example, what happens if a new object is inserted while you are in the middle of paginating the set? If the new object was inserted before the page you are currently grabbing and you use a fixed offset, you will grab an object you have already grabbed, which leads to duplicate data in your presentation layer. Using cursors for pagination fixes this problem by letting you keep track of the objects themselves instead of counts of the objects.
One last thing with Relay pagination specifically: I recommend using only (first & after) OR (last & before) at any given time. Using both in the same query can lead to logical, yet unexpected, results.
Best of luck!

Querying RavenDb with max 30 requests error

Just want to get some ideas from anyone who has encountered similar problems, and how you came up with a solution.
Basically, we have around 10K documents stored in RavenDB, and we need to allow users to filter and search those documents. I am aware that there is a maximum page size of 1024 in RavenDB, so for the filter and search to work I need to do my own paging. But my solution gives me the following error:
The maximum number of requests (30) allowed for this session has been reached.
I have tried many different ways of disposing the session, by wrapping it in a using block and also by explicitly calling Dispose after every call to RavenDB, with no success.
Does anyone know how to get around this issue? What's the best practice for this kind of scenario?
var pageSize = 1024;
var skipSize = 0;
var maxSize = 0;

using (_documentSession)
{
    maxSize = _documentSession.Query<LogEvent>().Count();
}

while (skipSize < maxSize)
{
    using (_documentSession)
    {
        var events = _documentSession.Query<LogEvent>().Skip(skipSize).Take(pageSize).ToList();
        _documentSession.Dispose();

        //building finalPredicate codes..... which i am not providing here....

        results.AddRange(events.Where(finalPredicate.Compile()).ToList());
        skipSize += pageSize;
    }
}
Raven limits the number of requests (Load, Query, ...) to 30 per session. This behavior is documented.
I can see that you dispose of the session in your code, but I don't see where you recreate it. In any case, loading data the way you intend to is not a good idea.
We're using indexes and paging, and we never load more than 1024 documents at once.
If you're expecting thousands of documents, or your predicate logic doesn't work as an index, and you don't care how long your query will take, use the unbounded results API (streaming):
var results = new List<LogEvent>();
var query = session.Query<LogEvent>();

using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        if (predicate(enumerator.Current.Document))
        {
            results.Add(enumerator.Current.Document);
        }
    }
}
Depending on the number of documents, this can use a lot of RAM.

Model records ordering in Spine.js

As far as I can see in the Spine.js sources, the Model.each() function returns the Model's records in the order of their IDs. This is completely unreliable in scenarios where ordering is important: long person lists, etc.
Can you suggest a way to keep the original record ordering (the same order in which the records arrived via refresh() or similar functions)?
P.S.
Things are even worse because by default Spine.js internally uses new GUIDs as IDs, so the record order is completely random, which is unacceptable.
EDIT:
Seems that in last commit https://github.com/maccman/spine/commit/116b722dd8ea9912b9906db6b70da7948c16948a
they made it possible, but I have not tested it myself because I switched from Spine to Knockout.
I bumped into the same problem while learning Spine.js. I'm using pure JS, so I had been neglecting the contacts example (http://spinejs.com/docs/example_contacts), which helped out on this one. As a matter of fact, you can't really keep the ordering from the server this way, but you can do your own ordering in JavaScript.
Notice that I'm using the Element Pattern here (http://spinejs.com/docs/controller_patterns).
First you define the function that will do the sorting inside the model:
/* Extending the Student model */
Student.extend({
    nameSort: function(a, b) {
        if ((a.name || a.email) > (b.name || b.email))
            return 1;
        else
            return -1;
    }
});
Then, in the students controller you set the elements using the sort:
/*Controller that manages the students*/
var Students = Spine.Controller.sub({
/*code ommited for simplicity*/
addOne: function(student){
var item = new StudentItem({item: student});
this.append(item.render());
},
addAll: function(){
var sortedByName = Student.all().sort(Student.nameSort);
var _self = this;
$.each(sortedByName, function(){_self.addOne(this)});
},
});
And that's it.
