How can I use X.PagedList with ElasticSearch Nest? - elasticsearch

Background
I'm using ElasticSearch as the search engine for a new ASP.NET Core 2.1 website I'm working on, and I'm using the Nest API to integrate with it. I want to use X.PagedList to handle the paging for me.
I've used it in other ASP.NET Core projects, where it has worked well when querying data in MS SQL Server.
Code
ISearchResponse<Foo> searchResponse =
    _elasticSearchClient.Search<Foo>(s => s
        .Query(q => q
            .Bool(b => b.Filter(distanceFilters))
        )
        .Source(src => src
            .Includes(i => i
                .Fields(
                    f => f.Field1,
                    f => f.Field2,
                    f => f.Field3
                )
            )
        )
        .From(options.From)
        .Size(options.Size)
    );
var hitsMD = searchResponse.HitsMetadata;
var results = hitsMD?.Hits.Select(s => new Hit()
    {
        Index = s.Index,
        Id = s.Id,
        Score = s.Score,
        Job = s.Source
    }
).ToPagedList(PageNumber, PageSize);
Issue
When I call .ToPagedList on the search results returned by ElasticSearch, it only shows one page of results.
The issue is that ElasticSearch has its own paging mechanism so it's only returning one page of hits.
I had the idea that, because ElasticSearch passes back the total number of hits, I could tell the PagedList how many items there are in total by setting the PagedList.TotalItemCount property. However, I can't do this because the property has a private setter.
I've also tried removing the from and size, but then only 10 hits are returned, which is ElasticSearch's default size (presumably in place for performance reasons).
Question
How can I make use of the X.PagedList package whilst integrating into ElasticSearch using the Nest API?

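You've basically got all the pieces here already. All you're missing is StaticPagedList<T>. Since paging is already being handled by Elasticsearch, you simply need to define a static paging setup, i.e.:
var pagedResults = new StaticPagedList<Foo>(results, PageNumber, PageSize, total);
Here results is the single page of hits projected from the search response (without the ToPagedList call), so the type argument should match its element type, and total is the total hit count that Elasticsearch passes back.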
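For the page metadata to line up, the from offset sent to Elasticsearch has to be derived from the same PageNumber and PageSize that are handed to StaticPagedList<T>. A minimal sketch of that arithmetic, assuming 1-based page numbers; PageWindow, From and PageCount are hypothetical helper names, not part of NEST or X.PagedList:

```csharp
using System;

// Hypothetical helper: keeps Elasticsearch's from/size in step with a 1-based page number.
public static class PageWindow
{
    // Zero-based offset to pass to .From(...) for the requested page.
    public static int From(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (pageNumber - 1) * pageSize;
    }

    // Number of pages implied by the total hit count reported by the search.
    public static int PageCount(long totalHits, int pageSize)
    {
        if (totalHits < 0) throw new ArgumentOutOfRangeException(nameof(totalHits));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (int)((totalHits + pageSize - 1) / pageSize);
    }
}
```

For example, page 3 with a page size of 20 gives PageWindow.From(3, 20) == 40, which would go to .From(...) alongside .Size(20), and 95 total hits give PageWindow.PageCount(95, 20) == 5 pages.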

Related

Uniqueness check in Elasticsearch without constantly refreshing index

I'm indexing a lot of data in Elasticsearch (through NEST) from multiple processes each running multiple threads. Part of indexing a document is finding out if we have seen a similar document before. This feature is implemented by generating a hash of a set of fields on the document and checking if we have documents in Elasticsearch with the same hash. Before indexing a document, I make the following query:
var result = elasticClient
    .Count<MyDocument>(c => c
        .Index(indexName)
        .Query(q => q
            .ConstantScore(qs => qs
                .Filter(f => f
                    .Term(field => field.Hash, hash))))
...
This returns a count of existing documents with the specified hash. So far so good, and things are working. However, if a process indexes two documents with the same hash within the same second, the count check doesn't work, since the first document isn't available for search yet. I'm running with the default refresh interval (1 second). For now I have added a refresh call after indexing each document:
var refreshResponse = client.Refresh(indexName);
This also works but it doesn't scale when indexing large amounts of documents (indexing becomes slow as already pointed out here: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html).
Any ideas for how to avoid having to call Refresh but still be able to perform a uniqueness check? I'm thinking of some sort of local cache, shared between all threads, holding the hashes of documents indexed since the last refresh. I know that this won't work across processes, but that is acceptable for now.
I ended up implementing a write-through cache as suggested by Val. This makes it possible to remove the call to Refresh but still make the count on each iteration. This is implemented using a singleton MemoryCache shared between all threads:
var cache = new MemoryCache("hashes");
When checking for uniqueness I check the cache in case no similar documents are found in Elasticsearch:
var result = elasticClient
    .Count<MyDocument>(c => c
        .Index(indexName)
        .Query(q => q
            .ConstantScore(qs => qs
                .Filter(f => f
                    .Term(field => field.Hash, hash)))));
bool isUnique = false;
if (result.Count == 0)
{
    isUnique = !cache.Contains(hash);
}
In other words, the cache is only consulted when the count for the hash returns 0.
When a document has been successfully indexed, I store the hash in the cache with an expiration:
var policy = new CacheItemPolicy();
policy.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(5);
cache.AddOrGetExisting(hash, string.Empty, policy);
TTL could probably be 1 second as well since that is the refresh interval I currently have configured on the index.
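One caveat with checking Contains before indexing and adding the hash afterwards: two threads can both see the hash as absent and both go on to index. A minimal sketch that narrows that gap by claiming the hash atomically before indexing, assuming the same System.Runtime.Caching MemoryCache as above; HashClaims and TryClaim are hypothetical names:

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical helper: lets exactly one thread "win" a hash within the TTL window.
public static class HashClaims
{
    // One cache instance shared by every indexing thread in the process.
    private static readonly MemoryCache Cache = new MemoryCache("hashes");

    // Returns true only for the first caller presenting this hash before the entry expires.
    public static bool TryClaim(string hash, TimeSpan ttl)
    {
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.UtcNow.Add(ttl)
        };

        // AddOrGetExisting returns null when the key was absent and has now been added;
        // otherwise it returns the existing entry and leaves the cache unchanged.
        return Cache.AddOrGetExisting(hash, string.Empty, policy) == null;
    }
}
```

A thread would call HashClaims.TryClaim(hash, TimeSpan.FromSeconds(5)) after the count query returns 0 and index the document only if the call returns true. The trade-off is that if indexing then fails, the hash stays claimed until the entry expires, so a retry inside that window would be treated as a duplicate.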

Creating Elasticsearch Index using NEST 5.x

I am trying to create an index using a NEST 5.x pre-release version for Elasticsearch 5.x. I have samples from 2.x which show how an index can be created using the ElasticClient.CreateIndex method. Below is my sample code.
ESnode = new Uri("http://localhost:9200");
Nodesettings = new ConnectionSettings(ESnode);
Client = new ElasticClient(Nodesettings);
However, when I type the following, there is no autocomplete available.
Client.CreateIndex( c => c.
I am able to successfully get the health of the node using the code below.
var res = Client.ClusterHealth();
Console.WriteLine("Status:" + res.Status);
I have a complex document mapping for which I have defined the class structure, and I intend to use the AutoMap method. Hence I am trying to create the index programmatically to avoid creating it manually.
I tried a very old version of NEST (1.x) and was able to get autocomplete for CreateIndex, but neither v2.4x nor 5.x provided it. Is there a new way to create an index? Please let me know.
Thanks
You need to supply a name for the index, in addition to the delegate that provides additional index creation options:
var createIndexResponse = client.CreateIndex("index-name", c => c
    .Settings(s => s
        .NumberOfShards(1)
        .NumberOfReplicas(0)
    )
    .Mappings(m => m
        .Map<Conference>(d => d
            .AutoMap()
        )
    )
);

Elasticsearch 2.4: retrieve all records that fulfil all search criteria using scroll

I am using Elasticsearch for the first time and, based on my requirements, I have some questions about scroll. The goal is to retrieve all data that fulfils all the search criteria.
1) I am trying to use scroll, but while reading about it at
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_21_search_changes.html
I found that search type scan is deprecated, although NEST still supports it.
So should I use "search type scan" or "sort by doc"? (I am using Elasticsearch 2.4.)
2) Can I sort on any field when using scrolling?
3) When clearing the scroll with
var test2 = client.ClearScroll(x=>x.ScrollId(results.ScrollId));
I get the error below:
Invalid NEST response built from a unsuccessful low level call on DELETE: /_search/scroll
Audit trail of this API call:
[1] BadResponse: Node: http://mydomain#localhost:9200/ Took: 00:00:00.0160110
OriginalException: System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.HttpConnection.Request[TReturn](RequestData requestData) in C:\Users\russ\source\elasticsearch-net-2.x\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 141
Request:
{"scroll_id":["c2NhbjswOzE7dG90YWxfaGl0czoxMjs="]}
Response:
{}
So is this the correct way of clearing a scroll or not?
Update: below is my code:
List<Object> indexedList = new List<Object>();
ISearchResponse<ListingSearch> listingResult =
    client.Search<ListingSearch>(search => search
        .Index(Constant.ES_INDEX)
        .Type(Constant.ES_TYPE)
        .From(listingSearch.StartIndex)
        .Size(10)
        .Source(s => s.Include(i => i.Fields(outpputFields)))
        .Query(query => query
            .Bool(boolean => boolean
                .Must(must => must.Term(t => t.Field("is_deleted").Value(false)))
            )
        )
        .Sort(x => x.Field("_doc", SortOrder.Ascending))
        .Scroll("60s")
    );
var results = client.Scroll<ListingSearch>("60s", listingResult.ScrollId);
while (results.Documents.Any())
{
    foreach (var doc in results.Hits)
    {
        indexedList.Add(doc);
    }
    results = client.Scroll<ListingSearch>("60s", results.ScrollId);
}
//Clear Scroll
var test2 = client.ClearScroll(x=>x.ScrollId(results.ScrollId));
With the above code I am getting data, but if I change the size from 10 to 1000, I get no records.
I'm not sure whether the issue is the amount of data, because my ES db has only 12-15 documents.
NEST 2.x versions have SearchType.Scan because NEST 2.x versions are compatible with all Elasticsearch 2.x versions, so the search type needs to exist when using NEST 2.x against Elasticsearch 2.0. Sending the search type through in later versions won't have any effect.
The most efficient way of retrieving documents with scroll is sorting by _doc but you can specify any sort parameters when scrolling.
When using the scroll API, you should use the scroll_id from the previous request in the next scroll call to fetch the next set of results. Once you have finished with a scroll, it is a good idea to clear it by calling ClearScroll() as you are doing. Your call looks correct; perhaps the scroll_id has already expired at the point you make the clear call?
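A related point on the loop in the question: when the initial search is an ordinary scrolled search (not the deprecated scan search type), its response already contains the first batch of hits, so that batch needs to be consumed before calling Scroll again. With only 12-15 documents and a size of 1000, everything would likely arrive in that first response, and the following scroll call would come back empty. A minimal sketch of the loop shape, written against plain delegates rather than NEST so that it runs on its own; ScrollPage, ScrollDrain and ReadAll are hypothetical names, and the three delegates stand in for client.Search(...), client.Scroll(...) and client.ClearScroll(...):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parts of a search/scroll response the loop needs.
public sealed class ScrollPage<T>
{
    public ScrollPage(string scrollId, IReadOnlyList<T> hits)
    {
        ScrollId = scrollId;
        Hits = hits;
    }

    public string ScrollId { get; }
    public IReadOnlyList<T> Hits { get; }
}

public static class ScrollDrain
{
    // Collects every hit, starting with the batch returned by the initial search.
    public static List<T> ReadAll<T>(
        Func<ScrollPage<T>> startSearch,          // stands in for the initial Search call with a scroll timeout
        Func<string, ScrollPage<T>> nextPage,     // stands in for Scroll(timeout, scrollId)
        Action<string> clearScroll)               // stands in for ClearScroll
    {
        var all = new List<T>();
        var page = startSearch();
        try
        {
            while (page.Hits.Count > 0)
            {
                // Consume the current batch first, including the very first one.
                all.AddRange(page.Hits);
                // Always continue from the scroll id of the most recent response.
                page = nextPage(page.ScrollId);
            }
        }
        finally
        {
            // Release the scroll context using the last scroll id seen.
            clearScroll(page.ScrollId);
        }
        return all;
    }
}
```

The design choice here is simply to treat the initial response and the subsequent scroll responses uniformly, so that no batch is skipped regardless of the size used.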

Cannot index document

I have written some code using the Elasticsearch.Net & NEST client library that should index a document without using a POCO for mapping fields, as I have many different documents.
Question 1) Is this the correct way to create an index? Does .AddMapping<string>(mapping => mapping.Dynamic(true)) create the mapping based on the document passed in?
var newIndex = client.CreateIndex(indexName, index => index
    .NumberOfReplicas(replicas)
    .NumberOfShards(shards)
    .Settings(settings => settings
        .Add("merge.policy.merge_factor", "10")
        .Add("search.slowlog.threshold.fetch.warn", "1s")
    )
    .AddMapping<string>(mapping => mapping.Dynamic(true))
);
Question 2) Is this possible?
string document = "{\"name\": \"Mike\"}";
var newIndex = client.Index(document, indexSelector => indexSelector
    .Index(indexName)
);
When I run code in "Question 2" it returns:
{"Unable to perform request: 'POST ' on any of the nodes after retrying 0 times."}
NEST only deals with typed objects. In this case, passing a string will cause it to index the document into /{indexName}/string/{id}.
Since it can't infer an id from a string and you do not pass one, it will fail either on that or on the fact that it can't serialize a string. I'll update the client to throw a better exception in this case.
If you want to index a document as a string, use the exposed Elasticsearch.NET client like so:
client.Raw.Index(indexName, typeName, id, stringJson);
If you want Elasticsearch to come up with an id, you can use
client.Raw.Index(indexName, type, stringJson);
client is the NEST client, and the Raw property is an Elasticsearch.Net client with the same connection settings.
Please note that I might rename Raw to LowLevel in the next beta update; I'm still debating that.

FOSElasticaBundle order query

I am integrating FOSElasticaBundle in my Symfony 2.3 project and I need to sort the results by their price property.
Here is my code:
$finder = $this->container->get('fos_elastica.finder.website.product');
$fieldTerms = new \Elastica\Query\Terms();
$fieldTerms->setTerms('taxon_ids', $taxon_ids_array);
$boolQuery->addMust($fieldTerms);
$resultSet = $finder->find($boolQuery);
How can I do this?
Thanks
Try creating a \Elastica\Query object which also contains the sorting information, then send this to the finder:
$finder = $this->container->get('fos_elastica.finder.website.product');
$fieldTerms = new \Elastica\Query\Terms();
$fieldTerms->setTerms('taxon_ids', $taxon_ids_array);
$boolQuery->addMust($fieldTerms);
$finalQuery = new \Elastica\Query($boolQuery);
$finalQuery->setSort(array('price' => array('order' => 'asc')));
$resultSet = $finder->find($finalQuery);
Have a look at the Elasticsearch docs on the sort parameter to see how to use it properly.
NOTE: \Elastica\Query is quite different from \Elastica\Query\AbstractQuery. The first encapsulates everything you could send to the _search API endpoint (facets, sorting, explain, etc.), while AbstractQuery represents a base type for each of the individual query types (range, fuzzy, terms, etc.).

Resources