Does Lucene.Net sort and then filter, or filter and then sort?

We are using the Lucene.Net IndexSearcher.Search method. We are passing both a Filter and a Sort, but we're seeing some strange behaviour. Logic tells me that filtering would be done before sorting, for performance reasons, but I wanted to make sure.

Filter then Sort.
Sorting in Lucene is done by collecting documents into a priority queue. The queue keeps the top X documents, where X is the maximum number of results you asked for. The collectors won't compare documents that don't match both the Filter and the Query.
When you don't specify a Sort, the score is used to prioritize documents in the queue; if you do specify a Sort, a comparator for that Sort is used instead.
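The filter-then-sort behaviour described above can be modelled in a few lines. This is a conceptual Python sketch, not Lucene.Net's API: `collect_top_n`, `matches_query`, `passes_filter` and `sort_key` are hypothetical names. It also simplifies one thing: real Lucene intersects the query and filter iterators, so rejected documents usually never reach the collector at all, whereas here they are simply skipped before any comparison.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Fields = Dict[str, object]
Doc = Tuple[int, float, Fields]  # (doc_id, relevance score, stored fields)


def collect_top_n(
    docs: Iterable[Doc],
    matches_query: Callable[[Fields], bool],
    passes_filter: Callable[[Fields], bool],
    n: int,
    sort_key: Optional[Callable[[Fields], float]] = None,
) -> List[int]:
    """Return the ids of the best n matching docs, best first (higher value wins)."""
    if n <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap: heap[0] is the weakest kept entry
    for doc_id, score, fields in docs:
        # Documents rejected by the query or the filter never reach the comparison step.
        if not (matches_query(fields) and passes_filter(fields)):
            continue
        # Without a sort, relevance decides; with one, the sort value decides.
        priority = score if sort_key is None else sort_key(fields)
        entry = (priority, -doc_id)  # on ties, the lower doc id ranks higher
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the weakest entry, queue stays at size n
    return [-neg_id for _, neg_id in sorted(heap, reverse=True)]
```

Because the queue is bounded at n, the cost of a sorted search grows with the number of matching documents times log n, rather than requiring a full sort of every match.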
If you are more curious, have a look at the different Collector classes in the source code, the Collect() methods have all the info you want.

Related

Strategies to compare performance of two Elasticsearch queries?

Since actual query runtime varies, it's not always useful to just check the runtime of two queries to determine which is generally faster. What are some ways to generally test whether one query is more efficient than another?
As an example of what I'm after: in MongoDB I can run explain() on a query to get the number of documents iterated vs. returned. If the number of documents iterated is several orders of magnitude higher than the number actually returned, I know I have an inefficient query. Since Elasticsearch indexes data very differently from other databases, this may not translate well, but I'm wondering if there's a rough equivalent.
I'm looking at the Profile API which looks like a good starting place. Are fields like next_doc and next_doc_count what I'm after? Are there any others I should look for? Thanks!!
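One rough way to get an "iterated vs. returned" signal is to send the search with `"profile": true` in the request body and compare the per-query iteration counters against the number of hits. The sketch below assumes the response layout described in the Profile API documentation (`profile` → `shards` → `searches` → `query` nodes, each with a `breakdown` and optional `children`) and reads the `next_doc_count` and `advance_count` breakdown keys; verify both against your Elasticsearch version. `iteration_counts` is a hypothetical helper, not part of any client library.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def _walk(nodes: List[Node], depth: int = 0) -> Iterator[Tuple[int, Node]]:
    # Depth-first walk over profiled query nodes and their children.
    for node in nodes:
        yield depth, node
        yield from _walk(node.get("children", []), depth + 1)


def iteration_counts(response: Dict[str, Any]) -> List[Tuple[str, int, str, int, int]]:
    """Return (shard id, depth, description, next_doc_count, advance_count) per query node."""
    rows = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for depth, node in _walk(search.get("query", [])):
                breakdown = node.get("breakdown", {})
                rows.append((
                    shard.get("id", "?"),
                    depth,
                    node.get("description", ""),
                    breakdown.get("next_doc_count", 0),
                    breakdown.get("advance_count", 0),
                ))
    return rows
```

If the counts at depth 0 are orders of magnitude larger than the hit count, the query is likely visiting many documents it ends up discarding; the timing fields in each `breakdown` can then show where that time goes.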

ElasticSearch Search Queries Count

We have a use case for aggregating the count of Elasticsearch search queries/operations. Initially we decided to use the /_stats endpoint to aggregate results on a per-index basis. However, we would also like to explore filtering search operations so that we can distinguish them by origin/source. How can we do this efficiently? Any references to documentation or implementations would be highly appreciated.
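Since /_stats counters are cumulative, per-index counts over a period come from subtracting two snapshots. The sketch below assumes the index stats layout `indices.<name>.total.search.query_total`. For the origin question, Elasticsearch's search API also accepts a `stats` array that tags a request with stat groups, and the index stats API can report per-group counters (e.g. `GET /_stats/search?groups=web`), which the optional `group` argument reads from `search.groups.<group>`. Treat that path as an assumption to check against your version. Note too that `query_total` counts shard-level query executions, so one request against a multi-shard index can add more than 1. `query_total` and `search_deltas` here are hypothetical helper names.

```python
from typing import Any, Dict, Optional

Stats = Dict[str, Any]


def query_total(index_stats: Stats, group: Optional[str] = None) -> int:
    # Read total.search.query_total, or the same counter for one stats group.
    search = index_stats.get("total", {}).get("search", {})
    if group is not None:
        search = search.get("groups", {}).get(group, {})
    return search.get("query_total", 0)


def search_deltas(before: Stats, after: Stats, group: Optional[str] = None) -> Dict[str, int]:
    """Per-index increase in query_total between two /_stats snapshots."""
    deltas = {}
    for name, stats in after.get("indices", {}).items():
        previous = before.get("indices", {}).get(name, {})  # new index: counts from 0
        deltas[name] = query_total(stats, group) - query_total(previous, group)
    return deltas
```

Counters reset when nodes restart, so a negative delta is a signal to discard that interval rather than a real count.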

Sort results by relevance without filtering in Algolia

Is there a way to sort results in Algolia by relevance instead of filtering them? We have quite a few important attributes but only around 700 products, so searches using facets often end up with few or no results.
To avoid this, we are looking for a way to reorder the list by relevance, showing the best results on top while still letting users see the less relevant ones. Basically, we don't want to filter products, just reorder them by relevance based on a combination of attributes we set.
Thanks
When setting filters leads you to few or no results, and you'd like to avoid that by still showing less relevant results, two solutions come to mind:
Use optionalFilters instead of filters. You get the same behavior as with filtering, but the Algolia API also returns results that don't match the filters and ranks them lower. This is the ideal solution, as it takes a single API round trip.
Perform a second search without filters when the first search returns fewer records than a threshold of your choice. This is a more manual approach and takes up to two API calls.
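The second approach can be sketched as follows. `search` is a hypothetical callable standing in for whatever Algolia client call you use (query plus a params dict, returning a list of hits); the `filters` parameter and the `objectID` hit field are Algolia's, but the wrapper itself is an assumption, not part of the Algolia client.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
SearchFn = Callable[[str, Dict[str, Any]], List[Hit]]  # hypothetical client wrapper


def search_with_fallback(search: SearchFn, query: str, filters: str, threshold: int) -> List[Hit]:
    """Filtered search first; if it returns fewer than `threshold` hits, append unfiltered ones."""
    hits = search(query, {"filters": filters})
    if len(hits) >= threshold:
        return hits  # one API call was enough
    # Too few results: run an unfiltered search and add the hits not already shown.
    seen = {hit["objectID"] for hit in hits}
    extra = [hit for hit in search(query, {}) if hit["objectID"] not in seen]
    return hits + extra
```

The filtered hits always stay on top, which mimics what optionalFilters does in a single call (for example passing `{"optionalFilters": ["color:red"]}` instead of `filters`), at the cost of a possible second round trip.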

Does solr support the sorting while creating index?

In my test environment, there are nearly 130,000,000 documents on each server. Searches are fast without sorting by date, but extremely slow when sorting is enabled.
I think that if Solr could sort an indexed field while building the index, searching would be more efficient. So how do I configure Solr to sort some fields at index time?
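The initial query will be slower, but all subsequent queries should be fast.
Solr uses Lucene's field cache for sorting, so the first sorted query pays the cost of populating it and later queries reuse it.
You can also warm the sort fields, for example with a firstSearcher/newSearcher warming query that sorts on them, so users don't hit the cold cache.
Also check whether the overhead really comes from sorting alone, rather than from querying and scoring.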

Filtering the results of a sorted query in Lucene.NET

I'm using Lucene.NET, which is currently up to date with Lucene 2.9. I'm trying to implement a kind of select distinct, but without the need to drill down into any groups. I know that Lucene 3.2 has a faceted search that may solve this, but I don't have the time to port it to 2.9 yet.
I figure that when you perform a paged query with a sort, Lucene has to find all the documents that match the query, sort them, and then take the top N results, where N is the page size. I'd like to build something that is applied after the sorted query has completed and returns the top N unique results. I'm thinking of using a HashSet and one of the indexed fields to determine uniqueness. For performance reasons, I'd rather extend something in Lucene than do this after the results have already been returned.
Custom filters seem to run before the main query is even applied, and custom collectors run before sorting is applied, unless you are sorting by Lucene's document id. So what is the best approach to this problem? A pointer to the right component to extend will get you the answer on this one; an example implementation will most definitely get you the answer. Thanks in advance.
I'd run the search without sorting and, in a custom collector, collect the results into a sorted list of size N, keeping only one result per value of the "uniqueness" field.
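A conceptual Python model of that collector, not Lucene.Net code: each collected hit is assumed to be a `(doc_id, unique_key, sort_value)` tuple, where `unique_key` stands for the indexed field used for uniqueness, and `distinct_top_n` is a hypothetical name. This version keeps the best hit per key in a dictionary and picks the top N at the end, which is simple and clearly correct but uses memory proportional to the number of distinct keys rather than N.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, str, float]  # (doc_id, unique_key, sort_value)


def distinct_top_n(hits: Iterable[Hit], n: int) -> List[int]:
    """Keep the best hit per unique key; return the top n doc ids, highest sort_value first."""
    best: Dict[str, Tuple[float, int]] = {}
    for doc_id, key, value in hits:
        candidate = (value, -doc_id)  # on ties, the lower doc id wins
        if key not in best or candidate > best[key]:
            best[key] = candidate
    top = heapq.nlargest(n, best.values())
    return [-neg_id for _, neg_id in top]
```

A strictly size-N variant is possible: keep N entries plus a key-to-entry map, replace an entry when a better hit for the same key arrives, and when a new key beats the weakest entry, evict that entry and its key. It saves memory at the cost of more bookkeeping.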
