Finding the set "max_result_window" for Elastic Search index? - elasticsearch

So when querying Elasticsearch, I know you can constrain the number of results with the "size" parameter. By default, the maximum is 10,000. I was wondering how to find out what the maximum is, in case it has been changed from 10,000.
I have tried "/index/_settings" in hopes of finding max_result_window, but couldn't find anything. I'm not sure whether that's because there is no limit at all or because I am doing something wrong.
So to rephrase my question: I want to know how to find the largest value I can pass as "size: xx" when querying an Elasticsearch server. If the limit is the default of 10,000, I want to know where I can find this number.
Any tips or guidance?

If the value isn't specified on the index itself (in _settings, where you were looking), then it is the default, 10000. As far as I know, you can change this setting only on the index itself. To apply it automatically to new indices, you can use an index template.
This looks like an oversight by the devs to me. If you use rolling indices by date, for example, there is no single index you can query to find out whether the value has been modified (sure, you could guess one). I think you just have to make sure the assumptions in your query code match your index template. In my opinion, there should be a way to ask for the maximum possible number of results without needing to know that value beforehand.
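The lookup described in this answer can be sketched in Python. This is only an illustration: it assumes the settings were fetched with `GET /<index>/_settings?include_defaults=true` (which also reports settings that were never set explicitly) and already parsed into a dictionary. The function name `effective_max_result_window` and the index name `my-index` are made up for the example.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # documented default for index.max_result_window


def effective_max_result_window(settings_response, index_name):
    """Return max_result_window for one index from a parsed _settings response.

    Assumes the nested (non-flat) response shape:
    {index: {"settings": {"index": {...}}, "defaults": {"index": {...}}}}
    """
    entry = settings_response.get(index_name, {})
    # Explicit settings win; "defaults" is only present with include_defaults=true.
    for section in ("settings", "defaults"):
        value = entry.get(section, {}).get("index", {}).get("max_result_window")
        if value is not None:
            return int(value)  # int() copes with values reported as strings
    return DEFAULT_MAX_RESULT_WINDOW


# Hypothetical responses: one index with an explicit override, one without.
overridden = {"my-index": {"settings": {"index": {"max_result_window": "50000"}}}}
untouched = {"my-index": {"settings": {"index": {"number_of_shards": "1"}}}}

print(effective_max_result_window(overridden, "my-index"))
print(effective_max_result_window(untouched, "my-index"))
```

With rolling indices, the same function could be applied to every index name in the response, since the response is keyed by index.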

You are correct that the default maximum query size in Elasticsearch is 10000. The way to get more is to use the "scroll" API:
https://www.elastic.co/guide/en/elasticsearch/reference/7.3/search-request-body.html#request-body-search-scroll
This essentially uses pagination to split your result into user-defined segments and allows you to "scroll" to the next one using the "_scroll_id" that's returned from the previous response.
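A rough sketch of that loop in Python, under some assumptions: the two callables `first_page` and `next_page` are hypothetical stand-ins for the initial search request (sent with a `scroll` parameter) and the follow-up scroll request, and they are injected so the example runs without a cluster. Only the response fields `_scroll_id` and `hits.hits` are taken from the actual API.

```python
def scroll_all(first_page, next_page):
    """Collect every hit by following scroll IDs until a page comes back empty.

    first_page() performs the initial search; next_page(scroll_id) fetches the
    following batch. Both return the parsed JSON response as a dictionary.
    """
    hits = []
    response = first_page()
    while response["hits"]["hits"]:
        hits.extend(response["hits"]["hits"])
        # Always use the most recently returned scroll ID for the next request.
        response = next_page(response["_scroll_id"])
    return hits


# Stand-in for a cluster: three fake pages, the last one empty.
_pages = [
    {"_scroll_id": "abc", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "abc", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "abc", "hits": {"hits": []}},
]
_cursor = iter(_pages)

all_hits = scroll_all(lambda: next(_cursor), lambda scroll_id: next(_cursor))
print([hit["_id"] for hit in all_hits])
```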

Related

Elasticsearch - How to sum the values from each new document into a separate index?

Example:
My documents:
{"_id":"1", "data_sent":"100"}
{"_id":"2", "data_sent":"110"}
{"_id":"3", "data_sent":"120"}
I would like to take the value of 'data_sent' from every new document and add it to a running sum in another index, let's say
index_name: 'data_sum'
field: 'total_data_sent'='330'
Bonus: I would like to create new indexes automatically for specified time period (for example /week)
I know that aggregations can be used here, but as I understand it, they are computed when the request is sent, and for big data that could take a while. I need to receive this data very fast when it's needed.
Is there anything in Elastic that could help in my case?
I figured it out by diving deeper into the documentation.
'Transforms' was what I was looking for.
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/transform-overview.html

Pagination with multi match query

I'm trying to figure out how to accomplish pagination with a multi match query using elasticsearch.
The scroll and search_after APIs seem like they won't work. According to the documentation, scroll isn't meant for real-time user requests. search_after requires a field with a unique value per document and requires you to sort on that field, but when using a multi-match query you're basically sorting by the score.
So, the only thing I've thought of so far is to do the following:
Send back last document id + score and use the score as the sort field. But, this could potentially return duplicate documents if other documents were added in between two queries.
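The idea of sending back the last score plus a document identifier can be sketched as a search_after request body in Python. This is a sketch under assumptions, not a tested recipe: `next_page_body` is a made-up helper, and `doc_id` is a hypothetical keyword field holding a unique value per document, used as a tie-breaker after `_score`. It does not remove the concern about documents indexed between two requests.

```python
def next_page_body(query, page_size, last_sort_values=None, tiebreaker_field="doc_id"):
    """Build a search body sorted by score, with a unique field as tie-breaker.

    last_sort_values is the "sort" array of the last hit on the previous page,
    or None for the first page.
    """
    body = {
        "size": page_size,
        "query": query,
        "sort": [{"_score": "desc"}, {tiebreaker_field: "asc"}],
    }
    if last_sort_values is not None:
        body["search_after"] = list(last_sort_values)
    return body


# Example multi_match query; the field names and search text are placeholders.
query = {"multi_match": {"query": "quick fox", "fields": ["title", "body"]}}

first = next_page_body(query, 20)
second = next_page_body(query, 20, last_sort_values=[1.23, "doc-42"])
print(second["sort"], second["search_after"])
```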
If you want to paginate, the first option is to use the from and size parameters in your query. The documentation says:
Pagination of results can be done by using the from and size
parameters. The from parameter defines the offset from the first
result you want to fetch. The size parameter allows you to configure
the maximum amount of hits to be returned.
Though from and size can be set as request parameters, they can also
be set within the search body. from defaults to 0, and size defaults
to 10.
Note that from + size can not be more than the index.max_result_window
index setting which defaults to 10,000. See the Scroll or Search After
API for more efficient ways to do deep scrolling.
If you don't need to paginate beyond 10k results, this is your best choice. The max_result_window can be raised, but performance will degrade as the requested page number increases.
Of course, if documents are added while a user is paginating, they will show up in the results and the pagination can become slightly inaccurate.
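As a small illustration of the from/size arithmetic, here is a Python sketch that turns a page number into the two parameters and refuses pages that would exceed the window. The helper name `page_params` is made up, and the window is assumed to be the default of 10,000 unless passed in.

```python
def page_params(page, page_size, max_result_window=10000):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    # Elasticsearch rejects requests where from + size exceeds the window.
    if offset + page_size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": offset, "size": page_size}


print(page_params(1, 25))    # first page
print(page_params(400, 25))  # last page that still fits in the default window
```

The returned dictionary can be merged into the search body next to the query.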

Elasticsearch query on string representation of number

Good day:
I have an indexed field called amount, which is of string type. The value of amount can be either one or 1. Say we have an indexed document with amount=1 and I search for one: Elasticsearch will not return the document unless I use 1 in the search query. Any thoughts on how I can get this to work? I'm thinking a tokenizer is what's needed.
Thanks.
You probably don't want this for sevenmillionfourhundredfifteenthousendtwohundredfourteen and the like, but only for a small number of values.
At index time I would convert everything to a proper number and store it in a numeric field, which also allows you to sort if you need to. Apart from this, I would use synonyms at both index and query time to map everything to the digit strings, but in a general text field that is searched by default.
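The index-time conversion could look roughly like this in Python. The lookup table is deliberately small, in line with the point above about only supporting a handful of values, and `normalize_amount` is a made-up helper that would run in the application before the document is sent to Elasticsearch.

```python
# Small, explicit table: only the number words the application expects.
WORD_TO_NUMBER = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def normalize_amount(raw):
    """Return the integer for a digit string or a known number word."""
    text = str(raw).strip().lower()
    if text in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[text]
    return int(text)  # raises ValueError for anything unrecognized


print(normalize_amount("one"), normalize_amount("1"))
```

The same function can be applied to the user's search term, so that both one and 1 end up querying the numeric field with the same value.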

Change Similarity per query in elasticsearch

I know that we can set similarity in the mapping, but I need to change similarity at query time. I need scores to be calculated in different ways by changing the similarity. Is there any way to do so?
From Official Doc
The similarity can be set on the field level when a field is first created, as follows:
So no, you can't change it at query time. I'm not even sure you can change it by updating your mapping.

Limit the number of results returned by Elastic Search

I am having an issue where I want to limit the number of results from Elasticsearch to 1,000, no matter how many matching results there are, but this should not affect the ranking and scoring.
I was trying terminate_after, but that seems to just tell Elasticsearch to stop after collecting the first N results, without considering the scores. Correct me if I am wrong.
Any help on this?
EDIT:
I am already using pagination, so using size in from/size will only affect the size of the current page. But I want to limit the total number of results to 1,000 and then paginate over that.
How about using from/size to return the required number of results:
GET /_search
{
  "from" : 0, "size" : 1000,
  "query" : {
    //your query
  }
}
You can just specify the size as a parameter.
GET /_search?size=1000
{
  "query" : {
    //your query
  }
}
I know this question has aged a little since it was asked, but I stumbled over it and I am surprised no one has given the correct answer.
Elasticsearch indices have an index setting called max_result_window. You can find it in the documentation under dynamic index settings.
index.max_result_window
The maximum value of from + size for searches to this index. Defaults to 10000. Search requests take heap memory and time proportional to from + size and this limits that memory. See Scroll or Search After for a more efficient alternative to raising this.
So instead of limiting from or size (or a combination of the two) in each request, you can set max_result_window to 1000. ES will then reject any search whose from + size exceeds 1000, so no more than the first 1000 hits can be reached through from/size pagination.
If you are using an index definition in a separate JSON file to create your index, you can set this value there under yourindexname.settings.index.max_result_window.
I hope this helps the folks still looking for a solution to this problem!
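For illustration, the two request bodies involved can be built like this in Python. This is a sketch under assumptions: it uses the plain create-index body (`PUT /<index>`) and the settings-update body (`PUT /<index>/_settings`) rather than any particular index-definition file layout, and the helper names are made up.

```python
import json


def create_index_body(max_result_window):
    """Body for creating a new index with a custom result window."""
    return {"settings": {"index": {"max_result_window": max_result_window}}}


def update_settings_body(max_result_window):
    """Body for changing the window on an existing index (it is a dynamic setting)."""
    return {"index": {"max_result_window": max_result_window}}


print(json.dumps(create_index_body(1000), indent=2))
print(json.dumps(update_settings_body(1000)))
```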
Did you try terminate_after? From the documentation:
terminate_after
The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early. If set, the response will have a boolean field terminated_early to indicate whether the query execution has actually terminated_early. Defaults to no terminate_after.
