OpenSearch Query Intermittent slow response - performance

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 1.1
Describe the issue:
Intermittent slowness on the query response.
Configuration:
Cluster:
Data nodes: instance type c5.large.search (3)
Dedicated master nodes: r4.large.search (3)
Storage type: EBS
EBS volume type: General Purpose (SSD) - gp2
EBS size: 10 GiB
Relevant Logs or Screenshots:
Collection size is 1 MB, with the settings below:
{
  "spec_proc_comb_exp": {
    "settings": {
      "index": {
        "refresh_interval": "86400 s",
        "number_of_shards": "5",
        "plugins": {
          "index_state_management": {
            "rollover_skip": "true"
          }
        },
        "provided_name": "spec_proc_comb_exp",
        "creation_date": "1671629334835",
        "number_of_replicas": "2",
        "uuid": "aht7O4QQTV6WtozcVCfi1A",
        "version": {
          "created": "135227827"
        }
      }
    }
  }
}
Query run:
GET spec_proc_comb_exp/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "dent",
            "fields": ["name", "alias_terms"],
            "fuzziness": "4"
          }
        }
      ],
      "filter": {
        "match_phrase": {
          "category": "Specialty"
        }
      }
    }
  }
}
Problem:
We are using OpenSearch as the backend for exact and fuzzy matching behind a UI search bar. The index currently in use is very small (1 MB). The response time is not stable: most of the time it is around 100 ms, but intermittently (about 10% of requests) it is around 1000 ms or higher.
I would appreciate help troubleshooting this issue. I am new to OpenSearch and to search technology in general.
I have tried reviewing the cluster configuration and need detailed guidance on the troubleshooting steps.

Here are some notes to improve the performance.
Set number_of_shards to 1.
It's recommended to keep the shard size between 10 and 50 GB, so a 1 MB index does not need 5 primary shards. You can reindex the data into a new index with 1 primary shard.
Use a term query rather than match_phrase.
If you don't need full-text search on that field, you can use the term query. The term query is faster than match_phrase.
Decrease fuzziness, or remove it if you can.
The fuzzy query is an expensive query. If you can decrease the fuzziness, the query will return faster.
You can also enable the search slow logs to detect slow queries.
Note: your hardware looks good.
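The query-side suggestions above could be combined into a request body along these lines. This is only a sketch: it assumes that `category` is mapped as a keyword field (so that a term filter matches the exact value), and the function name and the slow log threshold values are illustrative, not taken from the original post.

```python
def build_search_body(text, category):
    """Build a search body with a term filter and automatic fuzziness."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["name", "alias_terms"],
                            # AUTO picks 0, 1 or 2 edits depending on term length
                            "fuzziness": "AUTO",
                        }
                    }
                ],
                # Assumption: `category` is a keyword field, so no analysis is needed
                "filter": {"term": {"category": category}},
            }
        }
    }


# Index-level search slow log thresholds; the 500ms values are placeholders
SLOWLOG_SETTINGS = {
    "index.search.slowlog.threshold.query.warn": "500ms",
    "index.search.slowlog.threshold.fetch.warn": "500ms",
}

body = build_search_body("dent", "Specialty")
```

The body can be sent to `spec_proc_comb_exp/_search`, and the slow log settings can be applied through the index `_settings` endpoint to see which requests exceed the thresholds.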

Related

ElasticSearch retrieves documents slowly

I'm using the Java API to retrieve records from ElasticSearch. It needs approximately 5 seconds to retrieve 100000 documents (records/rows) in a Java application.
Is that slow for ElasticSearch, or is it normal?
Here is the index settings:
I tried to get better performance, but without result. Here is what I did:
Set the ElasticSearch heap space to 3 GB (it was 1 GB by default): -Xms3g -Xmx3g
Migrated ElasticSearch from a 7200 RPM hard drive to an SSD
Retrieved only one field instead of 30
Here is my Java implementation code:
private void getDocuments() {
    int counter = 1;
    try {
        lgg.info("started");
        TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));
        SearchResponse scrollResp = client.prepareSearch("ebpp_payments_union")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.matchAllQuery())
                .setScroll(new TimeValue(1000))
                .setFetchSource(new String[] { "payment_id" }, null)
                .setSize(10000)
                .get();
        do {
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                if (counter % 100000 == 0) {
                    lgg.info(counter + "--" + hit.getSourceAsString());
                }
                counter++;
            }
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute()
                    .actionGet();
        } while (scrollResp.getHits().getHits().length != 0);
        client.close();
    } catch (UnknownHostException e) {
        e.printStackTrace();
    }
}
I know that TransportClient is deprecated. I also tried
RestHighLevelClient, but it does not change anything.
Do you know how to get better performance?
Should I change something in ElasticSearch or modify my Java code?
Performance troubleshooting and tuning is hard to do without understanding everything involved, but that does not seem very fast. Because this is a single-node cluster, you're going to run into some performance issues. In a production cluster you would have at least one replica for each shard, which can also be used for reading.
A few other things you can do:
Index your documents based on your most frequently searched attribute - this will write all of the documents with the same attribute to the same shard so ES does less work reading (This won't help you since you have a single shard)
Add multiple replica shards so you can fan out the reads across nodes in the cluster (once again, need to actually have a cluster)
Don't have the master role on the same boxes as your data - if you have a moderate or large cluster you should have boxes that are neither master nor data but are the boxes your app connects to so they can manage the meta work for the searches and let the data nodes focus on data.
Use "query_then_fetch" - unless you are using weighted searches, then you should probably stick with DFS.
I see three possible axes for optimization:
1/ Sort your documents on _doc. From the documentation:
Scroll requests have optimizations that make them faster when the sort
order is _doc. If you want to iterate over all documents regardless of
the order, this is the most efficient option.
2/ Reduce your page size; 10000 seems a high value. Can you run some tests with smaller values such as 5000 or 1000?
3/ Remove the source filtering:
.setFetchSource(new String[] { "payment_id" }, null)
Source filtering can be heavy, since the Elasticsearch node needs to read the source, parse it into an object, and then filter it. Can you try removing it? The network load will increase, but it's a trade-off :)
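As a rough illustration of points 1 and 2, here is a client-agnostic sketch of a scroll loop that sorts on `_doc` and uses a smaller page size. The function names and the `fetch_next` callable are hypothetical: `fetch_next` stands in for whatever client call sends the scroll id back to the scroll endpoint and returns the parsed JSON response.

```python
def initial_scroll_body(page_size=1000):
    """Request body for the first scroll request, sorted in index order."""
    return {
        "size": page_size,
        "sort": ["_doc"],  # cheapest order when every document is wanted
        "query": {"match_all": {}},
    }


def iter_scroll_hits(first_page, fetch_next):
    """Yield every hit, asking for the next page until one comes back empty."""
    page = first_page
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # fetch_next is an assumed callable: scroll id in, next response out
        page = fetch_next(page["_scroll_id"])


# Usage with canned responses instead of a live cluster
canned = {
    "s1": {"_scroll_id": "s2", "hits": {"hits": [{"_id": 3}]}},
    "s2": {"_scroll_id": "s3", "hits": {"hits": []}},
}
first = {"_scroll_id": "s1", "hits": {"hits": [{"_id": 1}, {"_id": 2}]}}
ids = [hit["_id"] for hit in iter_scroll_hits(first, lambda sid: canned[sid])]
```

Keeping the loop independent of the client makes it easy to time different page sizes against the same index.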

Application-side Joins Elasticsearch

I have two indexes in Elasticsearch, a system index, and a telemetry index. I'd like to perform queries and aggregations on the telemetry index using filters from the systems index. The systems index is relatively small and only receives new documents occasionally, but the telemetry index is much larger and is constantly receiving new documents. This seems like an ideal situation for using an application-side join.
I tried emulating the example query at the previous link, but it turns out the filtered query is deprecated as of ES 5.0. (Why is this example in the current documentation?!)
Here are my queries:
GET /system/_search
{
"query": {
"match": {
"name": "George's system"
}
}
}
GET /telemetry/_search
{
"query": {
"bool":{
"must": {
"multi_match": {
"operator": "and",
"fields": ["systemId"]
, [1] }
}
}
}
}
}
The second one fails with a json_parse_exception because for some reason it doesn't like the [ ] characters after "fields".
Can anyone provide a simple example of using application-side joins?
Once such a query is defined (perhaps in Kibana's Dev Tools console) is there a way to visualize it in Kibana?
With Elasticsearch there is no way to execute two nested queries as in a relational database, where the first query uses the response of the second. The application-side join example means that you actually make two queries (two different requests to Elasticsearch) on the application side.
In the first query, you get the list of ids you need to filter on.
In the second query, you pass the list of ids that you got to the terms filter.
This works when you have no more than 1024 values for systemId, because the terms query has a limit on the number of terms.
Because this cannot be expressed as a single query, you can't visualize it in Kibana.
In that case you have to sacrifice a little space and add the systemId to your mapping.
Good Luck!
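The two steps can be sketched as plain functions that work on the parsed JSON responses. This assumes that each telemetry document stores the `_id` of its system in `systemId`; the function names are illustrative and the HTTP calls themselves are left to whichever client the application uses.

```python
def system_ids(system_response):
    """Step 1: collect the ids of the systems matched by the first query."""
    return [hit["_id"] for hit in system_response["hits"]["hits"]]


def telemetry_query(ids, field="systemId"):
    """Step 2: filter telemetry documents by the ids from step 1."""
    return {"query": {"bool": {"filter": {"terms": {field: ids}}}}}


# Canned response standing in for GET /system/_search
system_response = {"hits": {"hits": [{"_id": "1"}, {"_id": "7"}]}}
body = telemetry_query(system_ids(system_response))
```

The resulting body is what would be sent to `/telemetry/_search`, with any aggregations added alongside `query`.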

ElasticSearch circuit_breaking_exception (Data too large) with significant_terms aggregation

The query:
{
"aggregations": {
"sigTerms": {
"significant_terms": {
"field": "translatedTitle"
},
"aggs": {
"assocs": {
"significant_terms": {
"field": "translatedTitle"
}
}
}
}
},
"size": 0,
"from": 0,
"query": {
"range": {
"timestamp": {
"lt": "now+1d/d",
"gte": "now/d"
}
}
},
"track_scores": false
}
Error:
{
"bytes_limit": 6844055552,
"bytes_wanted": 6844240272,
"reason": "[request] Data too large, data for [<reused_arrays>] would be larger than limit of [6844055552/6.3gb]",
"type": "circuit_breaking_exception"
}
Index size is 5G. How much memory does the cluster need to execute this query?
You can try to increase the request circuit breaker limit to 41% (default is 40%) in your elasticsearch.yml config file and restart your cluster:
indices.breaker.request.limit: 41%
Or if you prefer to not restart your cluster you can change the setting dynamically using:
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"indices.breaker.request.limit" : "41%"
}
}'
Judging by the numbers showing up (i.e. "bytes_limit": 6844055552, "bytes_wanted": 6844240272), you're just missing ~185 KB of heap. The limit is 40% of the heap, so your total heap is ~17 GB, and increasing the limit by 1% to 41% should give the request breaker ~170 MB of additional heap, which should be sufficient.
Just make sure to not increase this value too high, as you run the risk of going OOM since the request circuit breaker also shares the heap with the fielddata circuit breaker and other components.
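The headroom arithmetic can be checked directly from the two numbers in the error message. The only assumption is that the reported limit corresponds to the 40% default mentioned above.

```python
bytes_limit = 6844055552   # from the error message
bytes_wanted = 6844240272  # from the error message

# How far over the limit the request went
shortfall = bytes_wanted - bytes_limit

# If the limit is 40% of the heap, the total heap follows from it
total_heap = bytes_limit * 100 // 40

# Extra room gained by raising the limit by one percentage point
one_percent = total_heap // 100

print(f"shortfall:  {shortfall / 1000:.0f} KB")
print(f"total heap: {total_heap / 1e9:.1f} GB")
print(f"1% of heap: {one_percent / 1e6:.0f} MB")
```

A shortfall of about 185 KB against roughly 171 MB of extra room shows why a single percentage point is enough for this particular request.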
I am not sure what you are trying to do, but I'm curious to find out. Since you get that exception, I can assume the cardinality of that field is not small. You are basically trying to see, I guess, the relationships between all the terms in that field, based on significance.
The first significant_terms aggregation will consider all the terms from that field and establish how "significant" they are (calculating frequencies of that term in the whole index and then comparing those with the frequencies from the range query set of documents).
After doing that (for all the terms), you want a second significant_terms aggregation that repeats the first step, but now for each term, running another significant_terms aggregation for it. That's gonna be painful. Basically, you are computing number_of_terms * number_of_terms significant_terms calculations.
The big question is what are you trying to do?
If you want to see a relationship between all the terms in that field, that's gonna be expensive for the reasons explained above. My suggestion is to run a first significant_terms aggregation, take the first 10 terms or so, and then run a second query with another significant_terms aggregation, limiting the terms by using a parent terms aggregation that includes only those 10 from the first query.
You can also take a look at the sampler aggregation and use that as a parent for a single significant_terms aggregation.
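The two-pass suggestion could look roughly like this. The field name and range query come from the question; the function names, the `top` aggregation name and the choice of 10 terms are illustrative assumptions.

```python
# Same time range as in the original query
RANGE_QUERY = {"range": {"timestamp": {"lt": "now+1d/d", "gte": "now/d"}}}


def first_pass_body(field, size=10):
    """First query: a single significant_terms aggregation, top `size` terms."""
    return {
        "size": 0,
        "query": RANGE_QUERY,
        "aggs": {"sigTerms": {"significant_terms": {"field": field, "size": size}}},
    }


def top_terms(response, agg="sigTerms"):
    """Pull the bucket keys out of the first response."""
    return [bucket["key"] for bucket in response["aggregations"][agg]["buckets"]]


def second_pass_body(field, terms):
    """Second query: significant_terms only under the selected parent terms."""
    return {
        "size": 0,
        "query": RANGE_QUERY,
        "aggs": {
            "top": {
                # `include` restricts the parent buckets to the exact values given
                "terms": {"field": field, "include": terms},
                "aggs": {"assocs": {"significant_terms": {"field": field}}},
            }
        },
    }


# Canned first response standing in for a real search result
response = {"aggregations": {"sigTerms": {"buckets": [{"key": "a"}, {"key": "b"}]}}}
second = second_pass_body("translatedTitle", top_terms(response))
```

This bounds the nested work to the number of selected terms instead of every term in the field.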
Also, I don't think increasing the circuit breaker limit is the real solution. Those limits were chosen with a reason. You can increase that and maybe it will work, but it has to make you ask yourself if that's the right query for your use case (as it doesn't sound like it is). That limit value that it's in the exception might not be the final one... reused_arrays refers to an array class in Elasticsearch that is resizeable, so if more elements are needed, the array size is increased and you may hit the circuit breaker again, for another value.
Circuit breakers are designed to deal with situations where request processing needs more memory than is available. You can set the limit with the following request:
PUT /_cluster/settings
{
"persistent" : {
"indices.breaker.request.limit" : "45%"
}
}
You can get more information on
https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-fielddata.html

Elasticsearch 2.1: Result window is too large (index.max_result_window)

We retrieve information from Elasticsearch 2.1 and allow the user to page thru the results. When the user requests a high page number we get the following error message:
Result window is too large, from + size must be less than or equal
to: [10000] but was [10020]. See the scroll api for a more efficient
way to request large data sets. This limit can be set by changing the
[index.max_result_window] index level parameter
The elastic docu says that this is because of high memory consumption and to use the scrolling api:
Values higher than can consume significant chunks of heap memory per
search and per shard executing the search. It’s safest to leave this
value as it is an use the scroll api for any deep scrolling https://www.elastic.co/guide/en/elasticsearch/reference/2.x/breaking_21_search_changes.html#_from_size_limits
The thing is that I do not want to retrieve large data sets. I only want to retrieve a slice from the data set which is very high up in the result set. Also the scrolling docu says:
Scrolling is not intended for real time user requests https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-request-scroll.html
This leaves me with some questions:
1) Would the memory consumption really be lower (and if so, why) if I use the scrolling api to scroll up to result 10020 (and disregard everything below 10000) instead of doing a "normal" search request for result 10000-10020?
2) It does not seem that the scrolling API is an option for me but that I have to increase "index.max_result_window". Does anyone have any experience with this?
3) Are there any other options to solve my problem?
If you need deep pagination, one possible solution is to increase the value max_result_window. You can use curl to do this from your shell command line:
curl -XPUT "http://localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 500000 } }'
I did not notice increased memory usage, for values of ~ 100k.
The right solution would be to use scrolling.
However, if you want to extend the results search returns beyond 10,000 results, you can do it easily with Kibana:
Go to Dev Tools and post the following to your index (your_index_name), specifying the new max result window:
PUT your_index_name/_settings
{
"max_result_window" : 500000
}
If all goes well, you should see the following success response:
{
"acknowledged": true
}
The following pages in the elastic documentation talk about deep paging:
https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html
Depending on the size of your documents, the number of shards, and the
hardware you are using, paging 10,000 to 50,000 results (1,000 to
5,000 pages) deep should be perfectly doable. But with big-enough from
values, the sorting process can become very heavy indeed, using vast
amounts of CPU, memory, and bandwidth. For this reason, we strongly
advise against deep paging.
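To make the cost in the quoted passage concrete: each shard has to produce its top from + size results, and the coordinating node then sorts the results from all shards. A small sketch of that arithmetic follows; the function names and the shard count of 5 are assumptions for illustration.

```python
def within_result_window(from_, size, max_result_window=10000):
    """Mirror the check behind the error: from + size must not exceed the window."""
    return from_ + size <= max_result_window


def coordinating_node_entries(from_, size, shards):
    """Each shard returns its top from + size entries for the final sort."""
    return shards * (from_ + size)


# The request from the question: 20 results starting at offset 10000
allowed = within_result_window(10000, 20)
entries = coordinating_node_entries(10000, 20, shards=5)
print(allowed, entries)
```

Even though only 20 documents are returned to the user, the cluster has to rank tens of thousands of entries to find them.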
Use the Scroll API to get more than 10000 results.
Scroll example in ElasticSearch NEST API
I have used it like this:
private static Customer[] GetCustomers(IElasticClient elasticClient)
{
    var customers = new List<Customer>();
    var searchResult = elasticClient.Search<Customer>(s => s.Index(IndexAlias.ForCustomers())
        .Size(10000).SearchType(SearchType.Scan).Scroll("1m"));
    do
    {
        var result = searchResult;
        searchResult = elasticClient.Scroll<Customer>("1m", result.ScrollId);
        customers.AddRange(searchResult.Documents);
    } while (searchResult.IsValid && searchResult.Documents.Any());
    return customers.ToArray();
}
If you want more than 10000 results, memory usage on all the data nodes will be very high, because each node has to return more results for every query request. If you also have more data and more shards, merging those results becomes inefficient. Elasticsearch also caches the filter context, which again means more memory. You have to find out by trial and error how much memory you are actually using. If you are getting many requests in a short window, you should run multiple queries of at most 10k results each and merge them yourself in code, which should take less application memory than increasing the window size.
2) It does not seem that the scrolling API is an option for me but that I have to increase "index.max_result_window". Does anyone have any experience with this?
--> You can define this value in an index template. A template applies to new indexes only, so you either have to delete the old indexes after creating the template or wait for new data to be ingested into Elasticsearch.
{
"order": 1,
"template": "index_template*",
"settings": {
"index.number_of_replicas": "0",
"index.number_of_shards": "1",
"index.max_result_window": 2147483647
},
In my case, it looks like limiting the results with the from and size parameters of the query removes the error, as we don't need all the results:
GET widgets_development/_search
{
"from" : 0,
"size": 5,
"query": {
"bool": {}
},
"sort": {
"col_one": "asc"
}
}

Short queries return not enough results

Hey I have a field in elasticsearch that is analyzed with the alphanumeric_analyzer. Then I index data into that field that looks like this:
Test-00001
Test-00002
to
Test-01000
If I execute the following query, I consistently get 250 results. But they aren't necessarily Test-00001 to Test-00250.
{
  "query": {
    "match": {
      "filename_Analyzed": {
        "type": "phrase_prefix",
        "query": "0"
      }
    }
  }
}
I was expecting to get 1000 results, but I only get 250. Are my expectations correct, or is the search incorrect?
EDIT 1:
Gist for the mapping:
https://gist.github.com/goalie7960/8ffd1536269a901f18bc
EDIT 2:
If I double the number of shards, the number of results also doubles. So 5 shards = 250 results, 10 shards = 500 results, etc.
EDIT 3:
Here's a gist for the analyzer I am using. But I can also reproduce with the standard analyzer.
https://gist.github.com/goalie7960/b0bbbddf1cee29b4b5ed
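It turns out the prefix (phrase prefix) query was exceeding the max expansions limit in Elasticsearch. max_expansions defaults to 50 and is applied on each shard, which would be consistent with 5 shards returning 250 results and 10 shards returning 500. A non-trivial solution was to switch to ngram analysis, and it fixed the problem. Yay.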
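To see why ngram analysis avoids the expansion limit, here is a small sketch of what an n-gram tokenizer emits for one token. With n-grams, a short query such as "0" is itself an indexed term, so no prefix expansion is needed at search time. The function and the gram sizes are illustrative and do not reproduce the analyzer from the gists.

```python
def ngrams(token, min_gram=1, max_gram=2):
    """Return every substring of length min_gram..max_gram, in position order."""
    return [
        token[start:start + length]
        for start in range(len(token))
        for length in range(min_gram, max_gram + 1)
        if start + length <= len(token)
    ]


grams = ngrams("001")
print(grams)
```

The trade-off is a larger index, since every token is stored as many shorter terms.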
