Elasticsearch: Is there a way to turn off scoring, to gain performance?

I have over 33 million records in my Elasticsearch 7.1 index, and when I query it I limit the result size to 20. However, ES still scores the records internally. Scoring isn't important to me; in fact, any 20 matching results will do, so I don't care whether some results are more relevant than others.
My question is, is there a way to turn this behaviour off, and if so, will it improve the performance?
Kind regards,
R.

You can use _doc as a sort field. This will make ES return documents in index order, which is the most efficient sort, and it will not compute relevance scores.
Here is a thread from the forums that explains more:
https://discuss.elastic.co/t/most-efficient-way-to-query-without-a-score/57457/4
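For illustration, such a request might look roughly like this (a minimal sketch assuming a hypothetical my-index index and a status field; putting the query in a bool filter keeps it in filter context, which also avoids computing scores):

GET my-index/_search
{
  "size": 20,
  "sort": [ "_doc" ],
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}

The sort on _doc tells Elasticsearch to return hits in index order, so it does not need to rank them by relevance.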

Related

What should be the value of max_gram and min_gram in Elastic search

I have a question regarding ngram configuration. The Elasticsearch documentation says:
It usually makes sense to set min_gram and max_gram to the same value.
Presumably, too large a difference between min_gram and max_gram will also increase index storage.
But many blog posts use a max_gram of 8 or even 20 to get more accurate results.
I am confused between the two approaches. Which one should I use?
What are the pros and cons of each?
Note: my use case involves indexing articles, and article content is usually around 150 KB.
Thanks
Analyze your search queries. Find out which kinds of partial-match queries come in most frequently, the maximum and minimum lengths of the search phrases, and whether matching needs to be case sensitive. Also look at which fields are involved and how similar their data is; if the data is similar, the ngram index will not take much extra storage.
You need to analyze your data, the relationships within it, and your query behaviour. Once you have all that information, you can make a better decision, or you may find a better way to solve the problem altogether.
This article can help you: https://medium.com/@ashishstiwari/what-should-be-the-value-of-max-gram-and-min-gram-in-elasticsearch-f091404c9a14
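For a concrete picture, here is a minimal sketch of index settings with an ngram tokenizer where min_gram and max_gram are set to the same value (the index name, analyzer name, and the value 3 are placeholders for illustration):

PUT articles
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "trigram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit" ]
        }
      },
      "analyzer": {
        "trigram_analyzer": {
          "type": "custom",
          "tokenizer": "trigram_tokenizer",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

Widening the gap (for example min_gram 3 and max_gram 8) emits many more tokens per document, which is where the extra index storage comes from.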

How to maintain index on incrementing counters in Elasticsearch?

What is the best way to go about implementing counters so that I can sort on them?
These counters are updated quite frequently, and I do not want to reindex the entire document each time. The approaches I know of are:
1) Maintain the counter values in some form of cache, query Elasticsearch, and sort in memory before returning the results.
2) Maintain two indices in Elasticsearch, one for the documents and another for the counters. Issue two separate queries and merge the results.
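For context, the kind of per-increment request I am trying to avoid sending looks roughly like this (a sketch assuming a hypothetical products index, document id 42, and a view_count field, on a recent Elasticsearch version); even such a partial update makes Elasticsearch reindex the document internally:

POST products/_update/42
{
  "script": {
    "source": "ctx._source.view_count += params.delta",
    "params": { "delta": 1 }
  }
}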
Please help.
It seems that updating the index too frequently is not an ideal use of Elasticsearch.
Based on the information in this blog post by Elastic, eventual consistency is the way to go:
https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync
I will be updating my implementation based on the approach suggested in the blog.
Closing the question.

Performance of "function_score"

I'm working on a solution for custom score boosting in Elasticsearch.
I wanted to ask whether using function_score is a good idea, because the index is large but the result set of the query should not be that big.
Does function_score operate on the query result, or is it part of the query logic itself? If it's the former, it might be fast; is it?
P.S. Initially the query boost parameter seemed like the best option, but I can't get it to raise the score much above the normal range for one of the matches. I've checked the _explain API and it says that queryNorm normalizes my boost away, so I still get values below the normal range (0.1 .. 4).
In principle, yes, it will slow down the search. The real penalty will of course depend on the complexity of your script. function_score runs during the so-called query (search) phase, which means it is applied to every matching document.
You could make this faster if your case is suitable for the rescoring functionality, because a rescore query is applied only to the top N results (configurable in the rescore API).
More information about rescoring - https://www.elastic.co/guide/en/elasticsearch/guide/current/_improving_performance.html#rescore-api
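To make that concrete, a rescore request might look roughly like this (a sketch assuming a hypothetical index with a title field and a numeric popularity field; window_size controls how many of the top hits the function_score script is applied to):

GET my-index/_search
{
  "query": {
    "match": { "title": "elasticsearch" }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "function_score": {
          "script_score": {
            "script": {
              "source": "Math.log(2 + doc['popularity'].value)"
            }
          }
        }
      }
    }
  }
}

The main match query still scores every matching document as usual; only the script in the rescore block is limited to the top window_size hits.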

Can Elasticsearch handle structured data, and what are its limits?

I have 5 million objects. Each object has 100 properties: 50 text, 40 numeric, and 10 datetime. I am sending random ad hoc queries with sorting and all kinds of query types. What are Elasticsearch's limits? How big can the cluster go?
I don't think you'll get a precise answer as to how big ES can go - as always it's a question of what kind of documents you are indexing, the complexity of the queries, the frequency with which you update and query, what acceptable response times are, etc.
That said, from my own experience with several live clusters of hundreds of millions of documents each, I'd say you are well below any known limits of ES. I'd also check their list of case studies; it should give you more insight into how others are using ES.
http://www.elasticsearch.org/case-studies/
I would look at the limitations of Apache Lucene rather than Elasticsearch seeing as ES is essentially Lucene in a big wrapper. You can start by having a look here. Hope this helps.

Filtering the results of a sorted query in Lucene.NET

I'm using Lucene.NET, which is currently up to date with Lucene 2.9. I'm trying to implement a kind of select distinct, but without the need to drill down into any groups. I know that Lucene 3.2 has a faceted search that may solve this, but I don't have the time to port it to 2.9 yet.
I figure in any event, when you perform a paged query with a sort operator, Lucene has to find all the documents that match the query, sort them, then take the top N results, where N is the page size. I'd like to build something that is also applied after the sorted query has completed, but takes the top N unique results and returns them. I'm thinking of using a HashSet and one of the indexed fields to determine uniqueness. I'd rather find a way to extend something in Lucene than try and do this once the results are already returned for performance reasons.
Custom filters seem to run before the main query is even applied, and custom collectors run before sorting is applied, unless you are sorting by Lucene's document id. So what is the best approach to this problem? A pointer to the right component to extend will get you the answer on this one; an example implementation will most definitely get you the answer. Thanks in advance.
I'd run the search without sorting and, in a custom collector, collect the results into a sorted list of size N based on "uniqueness".
