How can we get the turnaround time for any Elasticsearch query? - elasticsearch

I am trying to find out the actual time my ES cluster takes to return a query result to the client.
I understand from googling and reading that the took time in the ES response can't be taken as the correct measure. Is there any other way, or any logs I can enable? If I enable the slow logs, they also report only the took time.
I am asking because, when querying ES from my client, the gap between the took time and the time at which I actually receive the response is huge. I am unable to pinpoint the reason: is it lag between the servers, or is ES taking more time to send the response?

Related

Is there a way to profile elasticsearch serialization/deserialization?

I am trying to obtain documents from elasticsearch rapidly. Some of the queries take several seconds to get the results to the python client.
When I tried to run the same query with _count, the results were instant.
I know that we can profile the queries via the Profile API, but is there a way to know how much time it takes to serialize the results on the server and deserialize them on the python client?

Took is having a different value than the time ES is taking to show the result in Kibana

I have a query in Elasticsearch which fetches 10000 records, approximately 8MB of data. The "took" field in the JSON response shows "1071", meaning 1071 ms. However, the response actually appears in Kibana approximately 6 to 7 seconds later. The same thing is observed when I execute it from the Java API. After some googling, I understood that "took" includes only the query execution time inside Elasticsearch and does not include the time for the following:
1) Sending the request to the server
2) Deserializing the request from JSON on the server
3) Serializing the response into JSON on the server
4) Sending the response over the network
So, given the above scenario, what measures can I take to make sure that it appears in a very reasonable time (in 1 or 2 seconds)?

Investigating slow queries in ElasticSearch

We are using Elasticsearch version 5.4.1 in our production environments. The cluster setup is 3 data, 3 query, and 3 master nodes. Of late we have been observing a lot of slow queries on one particular data node, and the [index][shard] entries present on that node are just replicas.
I don't find many deleted docs or memory issues that could directly cause the slowness.
Any pointers on how to go about the investigation here would be helpful.
Thanks!
Many things are happening during one ES query. First, check the took field returned by ElasticSearch.
took – time in milliseconds for Elasticsearch to execute the search
However, the took field is the time that it took ES to process the query on its side. It doesn't include:
serializing the request into JSON on the client
sending the request over the network
deserializing the request from JSON on the server
serializing the response into JSON on the server
sending the response over the network
deserializing the response from JSON on the client
As such, I think you should try to identify the exact step that is slow.
Reference: Query timing: ‘took’ value and what I’m measuring
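To narrow down which step is slow, one rough approach is to compare the wall-clock time measured on the client with the took value in the response. The sketch below is only an illustration under stated assumptions: time_search and run_search are hypothetical names, and run_search stands for whatever callable your client uses to send the query and return the parsed response as a dict.

```python
import time
from typing import Any, Callable, Dict


def time_search(run_search: Callable[[], Dict[str, Any]]) -> Dict[str, float]:
    """Compare client-side wall-clock time with the server-reported `took`.

    `run_search` is a hypothetical callable that sends the query and
    returns the parsed JSON response as a dict containing `took` (in ms).
    """
    start = time.perf_counter()
    response = run_search()
    wall_ms = (time.perf_counter() - start) * 1000.0

    took_ms = float(response["took"])
    return {
        "wall_ms": wall_ms,
        "took_ms": took_ms,
        # Everything `took` does not cover: serialization, network, parsing.
        "overhead_ms": wall_ms - took_ms,
    }
```

A large overhead_ms relative to took_ms suggests the time is going into serialization, the network, or client-side parsing rather than query execution. It does not say which of those steps is responsible, so it is a starting point rather than a full profile.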

magento and solr reindexing issue

I'm having trouble reindexing Magento with Solr. I'm getting the following error via SSH (all other indexes complete successfully):
Error reindexing Solr: Solr HTTP error: HTTP request failed, Operation timed out after 5001 milliseconds with 0 bytes received
Any ideas how to fix this?
Many thanks
It looks like there is a time limit of 5000 milliseconds, whereas your Solr indexing needs more time.
Increase the time limit.
While indexing is running, check the Solr log using the tail command.
Using the Solr interface, query Solr to see whether new products or data updates are in place.
You can also add some logging to the adddoc function in the Solr client.php to check whether it is being called.
Having the same issue... I'm assuming you're using Magento Solarium. I opened an issue on github with the dev, I'll update you if he responds with a solution. In the meanwhile, if you were able to fix it, please let us know.
Since this is the only relevant hit on Google for this issue, I am adding my findings here. The issue arises when you have a large database of products (or many shops with many products). I noticed the Solr index was filling up until the error occurred, and after that the index was empty. I then found in the code that the indexing process ends by committing all the changes. This is where the timeout happens.
Set the timeout in system -> configuration -> catalogus -> Solarium search to a large number (like 500 seconds), do a total re-index, and then put the timeout back to a more reasonable number (2 seconds).
Although there are two options, one for search and a general timeout setting, they do not seem to be independent: changing the search timeout setting still affects the indexing process.
Don't leave the timeout at 500 seconds; this can cause serious performance issues on your server.

Issues with ElasticSearch for real-time geo queries

I'm building a service that will allow users to search for other users who are nearby, based on GPS coordinates. I've tried using ElasticSearch's geo spatial indexes. When a user signs in, he submits his GPS location to an ElasticSearch geo index. Other users periodically poll ElasticSearch, querying for new documents that contain GPS coordinates within a few hundred meters.
The problem is that ElasticSearch either doesn't update its index fast enough, or it caches its results, making it unsuitable for retrieving real-time results. I've tried disabling the cache with index.cache.filter.max_size=-1 and passing "_cache=false" with every query. ElasticSearch still returns stale results when polling with the same query, and it can return stale results for up to a few minutes.
Any idea on what could be happening? Maybe it's because I'm keeping the same connection open during polling, and ElasticSearch caches results for each connection? Still, the results can be out of date with subsequent requests.
Documents indexed into Elasticsearch don't become immediately available for search. They are accumulated in a buffer and become searchable only after an operation called refresh. In other words, search is not a real-time but a "near real-time" operation ("near" because refresh is called every second by default). Please also note that the get operation is real-time: you can get a document immediately after it is indexed.
While you can force a refresh after each document or make refreshes more frequent, that is not the best solution for your problem, because very frequent refreshing can significantly reduce search and indexing performance. Instead, I would advise you to check out Elasticsearch percolators, which were added exactly for use cases such as yours.
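The difference between search and get described in this answer can be pictured with a small toy model. This is not Elasticsearch code and does not use any Elasticsearch API: ToyIndex and all of its methods are hypothetical names, and the sketch only mimics the buffer-then-refresh behaviour.

```python
from typing import Dict, List, Optional


class ToyIndex:
    """Toy model of near-real-time search; not Elasticsearch code."""

    def __init__(self) -> None:
        self._buffer: Dict[str, dict] = {}      # indexed, not yet refreshed
        self._searchable: Dict[str, dict] = {}  # visible to search

    def index(self, doc_id: str, doc: dict) -> None:
        # New documents land in the buffer first.
        self._buffer[doc_id] = doc

    def get(self, doc_id: str) -> Optional[dict]:
        # Get by ID is real time: it also sees buffered documents.
        if doc_id in self._buffer:
            return self._buffer[doc_id]
        return self._searchable.get(doc_id)

    def refresh(self) -> None:
        # A refresh makes all buffered documents visible to search.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value: object) -> List[str]:
        # Search only looks at documents that have been refreshed.
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )
```

In this model, a document indexed just before a poll is returned by get but stays invisible to search until the next refresh, which matches the stale results seen when polling with the same query.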
