Disable scoring in Elastic Search and improve search performance - elasticsearch

I want to improve the speed and performance of my search queries.
Currently, I am using filtered queries and applying the "fuzziness" parameter to some of the fields in my index while searching. I have already set the fields to "not_analyzed" to improve performance.
The query is equivalent to a SQL equality query (SELECT ... FROM ... WHERE ... = ...).
Also, while analyzing my query in QueryProfiler, I found that the boost query consumes a noticeable amount of time. I am not concerned with the scoring of the records and just want to fetch the data as it is matched.
For that, I am planning to set the "norms" parameter to "false" for both indexing and search operations.
What other steps should I take to disable boosting and scoring and to speed up the search operations?
And which properties should I enable or disable to achieve this?
Thanks in advance.
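
For illustration, one common way to skip scoring for an equality match is to run the clause in filter context. The sketch below only builds the request body as a Python dict; the field name `status` and value `active` are hypothetical, and it assumes an Elasticsearch version that supports the `filter` clause of the `bool` query (the older `filtered` query mentioned above plays a similar role).

```python
import json


def build_filter_only_query(field, value):
    """Build a search body that matches `field == value` in filter context."""
    return {
        "query": {
            "bool": {
                # Clauses under "filter" must match, but they do not
                # contribute to the relevance score of the hit.
                "filter": [
                    {"term": {field: value}}
                ]
            }
        }
    }


# Hypothetical field and value, used only for illustration.
body = build_filter_only_query("status", "active")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the JSON body of a search request with whatever client is already in use.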

Related

ElasticSearch Search Queries Count

We have a use case for aggregating the count of Elasticsearch search queries/operations. Initially we decided to use the /_stats endpoint to aggregate results on a per-index basis. However, we would also like to explore filtering search operations so that we can distinguish them by origin/source. I was wondering how we can do this efficiently. Any references to documentation or implementations would be highly appreciated.
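
As a rough sketch of the per-index aggregation described above, the following reads `query_total` for each index out of a `/_stats` style response that has already been parsed into a dict. The index names and counts in `sample` are made up for illustration, and the function assumes the usual `indices` → `total` → `search` nesting of that response.

```python
def query_totals_by_index(stats):
    """Return {index_name: search query_total} from a parsed /_stats response."""
    totals = {}
    for name, data in stats.get("indices", {}).items():
        # Missing sections are treated as zero queries instead of raising.
        search = data.get("total", {}).get("search", {})
        totals[name] = search.get("query_total", 0)
    return totals


# Illustrative sample only; real responses contain many more fields.
sample = {
    "indices": {
        "logs-a": {"total": {"search": {"query_total": 12}}},
        "logs-b": {"total": {"search": {"query_total": 30}}},
    }
}

totals = query_totals_by_index(sample)
print(totals)
print(sum(totals.values()))
```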

Hold Elasticsearch document frequency constant as index changes

I'm using Elasticsearch to retrieve XML documents by terms. I have multiple indexes, one for each day. I have a large collection of documents that is, in some sense, representative. The document frequency of several terms varies from day to day.
The matching I'm doing depends on the inverse document frequency of terms. I'd like to not use the IDF of the indices I'm searching, and instead use the IDF based on the large, representative set. Is there a straightforward way to do this without writing custom scoring functions for large, complex queries?
There is no other way.
FWIW, to access and use IDF you need to write a custom script engine in Elasticsearch, and probably use a script based on that engine for sorting.
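
To make the idea of a fixed, reference-based IDF concrete, here is a minimal sketch that computes IDF values from document frequencies gathered once on the representative collection. It uses the classic Lucene-style formula `1 + ln(numDocs / (docFreq + 1))`, which is an assumption about the similarity in use; the terms and counts are invented, and wiring these values into Elasticsearch scoring is outside the sketch.

```python
import math


def reference_idf(term, doc_freq, num_docs):
    """IDF of `term` based on a fixed reference collection.

    doc_freq: {term: number of reference documents containing the term}
    num_docs: total number of documents in the reference collection
    """
    df = doc_freq.get(term, 0)
    # Classic TF/IDF-style formula; other similarities use different ones.
    return 1.0 + math.log(num_docs / (df + 1))


# Hypothetical reference statistics, computed once and then held constant.
reference_df = {"invoice": 9, "the": 999}
reference_size = 1000

rare = reference_idf("invoice", reference_df, reference_size)
common = reference_idf("the", reference_df, reference_size)
print(rare, common)
```

Because the table is computed once, the values do not change as daily indices are added or removed.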

What should be the value of max_gram and min_gram in Elastic search

I have a question regarding ngram configuration. The Elasticsearch documentation says:
It usually makes sense to set min_gram and max_gram to the same value.
Presumably, too large a difference between min_gram and max_gram will increase index storage.
But there are many blogs that use a max_gram of 8 or 20 to get more accurate results.
I am confused between the two. Which should be the one to use?
What are pros and cons of both?
Note: My use case deals with indexing articles. An article's content is usually about 150KB in size.
Thanks
Analyze your search queries. Find out which kinds of LIKE-style queries arrive most often, what the maximum and minimum lengths of the search phrases are, and whether matching is case sensitive. Also check which fields are searched and which of them hold similar data: if the data is similar, it will not take much more storage.
You need to analyze your data and the relationships within it, and understand how your queries behave. Once you have all this information, you can make a better decision or find a better way to solve the problem.
This article can help you: https://medium.com/#ashishstiwari/what-should-be-the-value-of-max-gram-and-min-gram-in-elasticsearch-f091404c9a14
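
To get a feel for the storage trade-off, the sketch below mimics what an n-gram tokenizer does to a single token: it emits every substring whose length is between min_gram and max_gram. It is a plain Python illustration, not Elasticsearch's own implementation, and the token `quick` is just an example.

```python
def ngrams(token, min_gram, max_gram):
    """Return all substrings of `token` with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            # Skip grams that would run past the end of the token.
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


same = ngrams("quick", 3, 3)   # min_gram == max_gram
wide = ngrams("quick", 1, 5)   # wide range between min_gram and max_gram
print(len(same), same)
print(len(wide), wide)
```

With equal min_gram and max_gram the token yields 3 grams, while the 1 to 5 range yields 15, which is why a wide range inflates the index for large documents.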

How do normalization and internal optimization of boosting work? And how does that affect the relevance?

I'm new to Elasticsearch. I'm having trouble understanding the calibration and scaling of boost values for fields in a document, that is, how we should decide the boost value for each field so that it works as expected. I've gone through some online blogs and the ES docs as well, and they say that ES does normalization and internal optimization of boost values. How does that work?
E.g.: If we have tags, title, name and text fields in our doc, how should we decide the boosting values for these?
Elasticsearch uses a boolean model to match documents, and then a scoring model to determine relevance (i.e. ranking). The scoring model utilizes a TF/IDF score, coupled with some additional features. Those TF/IDF scores are calculated for each matching field within a query, and then aggregated to produce an overall score for a document. To dig into this process, I suggest running explain on your query to see how the score of each field is influencing the overall relevance of your document.
As the expert on your data, you're in the best position to determine which fields should most heavily influence the relevance of your document. Finding the right boost value for a field is about adjusting the levers until you find a formula that best suits your desired outcome. (Also, if you have users, A/B testing can help here.)
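
As a starting point for that experimentation, here is a small sketch that builds a `multi_match` request body with per-field boosts (`field^boost` syntax) and `explain` turned on, so each hit reports how its score was computed. The boost values are arbitrary placeholders, not recommendations, and the search text is hypothetical.

```python
import json


def build_boosted_query(text, boosts):
    """Build a multi_match search body with per-field boosts and explain enabled."""
    # "title^3" means: matches in `title` count three times as much.
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "explain": True,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
            }
        },
    }


# Placeholder boosts for the fields named in the question.
boosts = {"title": 3, "tags": 2, "name": 1.5, "text": 1}
body = build_boosted_query("search performance", boosts)
print(json.dumps(body, indent=2))
```

Changing one boost at a time and comparing the explain output for the same query makes it easier to see which field is driving the ranking.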

Does Solr support sorting while creating the index?

In my test environment, there are nearly 130,000,000 documents on each server. Searching is fast without sorting by date, but extremely slow when sorting is enabled.
I think that if Solr could sort an indexed field while creating the index, searching would be more efficient. So, how can I configure Solr to sort some fields while indexing?
The initial query would be slower but all the subsequent queries should be fast.
Solr should be able to use the Filter Query Cache for sorting.
You can also warm the sort fields.
Also check whether the overhead is really caused by sorting alone, rather than by querying and scoring.
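
A warming request is just an ordinary query that sorts on the field in question, so that the sort data is loaded before real traffic arrives. The sketch below only assembles the query string for such a request; the field names `date` and `type` and the filter value are hypothetical, and where the request is sent (or whether it is registered as a warming query in the Solr configuration) is left out.

```python
from urllib.parse import urlencode, parse_qs


def build_sorted_query(filter_query, sort_field, rows=10):
    """Build Solr query-string parameters for a filtered, sorted request."""
    params = {
        "q": "*:*",                    # match everything; no relevance ranking needed
        "fq": filter_query,            # filter query, eligible for the filter cache
        "sort": f"{sort_field} desc",  # sort on the field to be warmed
        "rows": rows,
    }
    return urlencode(params)


query_string = build_sorted_query("type:article", "date")
print(query_string)
print(parse_qs(query_string))
```

Timing this request against the same request without the `sort` parameter is one way to check how much of the cost comes from sorting.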
