Is there a way to profile elasticsearch serialization/deserialization? - elasticsearch

I am trying to retrieve documents from Elasticsearch quickly. Some of the queries take several seconds to get the results to the Python client.
When I tried to run the same query with _count, the results were instant.
I know that we can profile queries via the Profile API, but is there a way to know how much time it takes to serialize the results on the server and deserialize them on the Python client?

Related

Does ElasticSearch Keep Count The Number Of Times A Record Is Returned In A Given Period Of Time?

I have an ElasticSearch instance and it does one type of search - it takes a few parameters and returns the companies in its index that match the parameters given.
I'd like to be able to pull some stats that essentially say "This company has been returned from search queries X number of times in the past week".
Does ElasticSearch store metadata that would allow me to pull this kind of info from it? If this kind of data isn't stored in ES out of the box, is there a way to enable it?
Elasticsearch (not ElasticSearch ;) ) does not do this natively, no. You can build something using the slow log by setting the threshold to 0 so that it logs every query, but the resulting log may be too noisy to be useful.
Things like https://www.elastic.co/enterprise-search, built on top of Elasticsearch, do provide this sort of insight.

How can I find the most used query from Elasticsearch?

I have an Elasticsearch cluster running on an AWS Elasticsearch instance. It has been up and running for a few months. I'd like to know the most used query requests over the last few months. Does Elasticsearch save all queries somewhere I can search? Or do I have to programmatically save the requests for analysis?
As far as I'm aware, Elasticsearch doesn't by default save a record or frequency histogram of all queries. However, there's a way you could have it log all queries, and then ship the logs somewhere to be aggregated/searched for the top results (incidentally this is something you could use Elasticsearch for :D). Sadly, you'll only be able to track queries after you configure this. I doubt that you'll be able to find any record of your historical queries from the last few months.
To do this, you'd take advantage of Elasticsearch's slow query log. The default thresholds are designed to log only slow queries, but if you set those thresholds to 0s then Elasticsearch logs every query as a slow query, giving you a record of all queries. See the slow log documentation for detailed instructions. You could set this for a whole cluster in your YAML configuration file like
index.search.slowlog.threshold.fetch.debug: 0s
or set it dynamically per-index with
PUT /<my-index-name>/_settings
{
"index.search.slowlog.threshold.query.debug": "0s"
}
To be clear, the log level you choose doesn't strictly matter, but using debug for this lets you keep logging actually slow queries at the higher-severity levels like info and warn, which you might find useful.
I'm not familiar with how to configure an AWS Elasticsearch cluster, but as the above are core Elasticsearch settings in all the versions I'm aware of, there should be a way to do it.
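Once every query is being logged, finding the most used ones comes down to counting. Below is a minimal Python sketch of that step. It assumes each slow log line contains the request body as source[{...}], which is how the lines I have seen look, but the exact format varies between versions, so treat the regular expression as a starting point. The function name top_queries and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the request body appears in each line as source[{...json...}].
# Greedy match up to the last "}]" on the line; adjust for your log format.
SOURCE_RE = re.compile(r"source\[(\{.*\})\]")


def top_queries(log_lines, n=10):
    """Return the n most frequent query bodies found in slow log lines."""
    counts = Counter()
    for line in log_lines:
        match = SOURCE_RE.search(line)
        if match:  # lines without a source[...] section are skipped
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical sample lines, shaped like slow log entries.
sample_lines = [
    '[2020-01-01T00:00:00,000][DEBUG][index.search.slowlog.query] [node-1] '
    '[companies][0] took[1ms], source[{"query":{"match_all":{}}}], id[],',
    '[2020-01-01T00:00:01,000][DEBUG][index.search.slowlog.query] [node-1] '
    '[companies][0] took[2ms], source[{"query":{"term":{"city":"oslo"}}}], id[],',
    '[2020-01-01T00:00:02,000][DEBUG][index.search.slowlog.query] [node-1] '
    '[companies][0] took[1ms], source[{"query":{"match_all":{}}}], id[],',
    'a line with no query body',
]

for body, count in top_queries(sample_lines, n=2):
    print(count, body)
```

Counting on the raw body treats queries that differ only in their parameter values as different queries. If you want to group by query shape instead, you would need to normalize the JSON first.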
Happy searching!

Investigating slow queries in ElasticSearch

We are using Elasticsearch version 5.4.1 in our production environments. The cluster setup is 3 data, 3 query, and 3 master nodes. Lately we have been observing a lot of slow queries on one particular data node, and the [index][shard] copies present on that node are only replicas.
I don't find many deleted docs or memory issues that could directly cause the slowness.
Any pointers on how to go about the investigation here would be helpful.
Thanks!
Many things happen during one ES query. First, check the took field returned by Elasticsearch.
took – time in milliseconds for Elasticsearch to execute the search
However, the took field is the time that it took ES to process the query on its side. It doesn't include:
serializing the request into JSON on the client
sending the request over the network
deserializing the request from JSON on the server
serializing the response into JSON on the server
sending the response over the network
deserializing the response from JSON on the client
As such, I think you should try to identify the exact step that is slow.
Reference: Query timing: ‘took’ value and what I’m measuring
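To narrow down which step is slow from the Python side, one option is to time the JSON decoding yourself and compare the total round trip against took. The sketch below assumes you can get hold of the raw, undecoded response body and that you measured the round trip around the HTTP call only, before any JSON parsing. The function name timing_breakdown and the sample payload are hypothetical.

```python
import json
import time


def timing_breakdown(raw_body, round_trip_ms):
    """Split a measured round trip into server time, client parsing, and the rest.

    raw_body: undecoded JSON search response (str or bytes).
    round_trip_ms: wall-clock time around the HTTP call, excluding JSON parsing.
    """
    start = time.perf_counter()
    parsed = json.loads(raw_body)  # deserializing the response on the client
    parse_ms = (time.perf_counter() - start) * 1000.0

    took_ms = parsed["took"]  # query execution time reported by Elasticsearch
    return {
        "took_ms": took_ms,
        "client_parse_ms": parse_ms,
        # Remainder: network transfer plus (de)serialization on the server.
        "other_ms": max(round_trip_ms - took_ms, 0.0),
        "response_size": len(raw_body),
    }


# Hypothetical response body standing in for a real search result.
sample = json.dumps(
    {"took": 12, "hits": {"hits": [{"_id": str(i)} for i in range(1000)]}}
)
report = timing_breakdown(sample, round_trip_ms=250.0)
print(report)
```

A large other_ms together with a large response_size points at payload size rather than query execution, which would also fit a _count request for the same query coming back instantly.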

Can Kibana use another data source (e.g., a cache that contains Elasticsearch results) instead of querying Elasticsearch directly?

I want to use Kibana to visualize data on a dashboard and let many users on the internet access the dashboard.
The problem is that Kibana runs a query every time, but the data only updates about every 30 minutes, so it's a waste of CPU to run the query every time.
So I want to cache the Elasticsearch results somewhere like Redis and let Kibana fetch data from the cache.
So:
Is there any software that acts as a proxy, which can accept a Kibana request, fetch data from the cache, and then send the response to Kibana? In other words, I only want to use Kibana as a UI framework and customize the data source.
Is there any other UI framework that can easily visualize Elasticsearch query results?
There is no need - Elasticsearch will cache the results.

Summarization in Elasticsearch

I am a newbie to Elasticsearch. We are currently using the Splunk platform for our analytics application and are looking to migrate to ELK. Splunk provides options to schedule searches to run in the background periodically and to store the search results in a separate summary index. Is similar functionality available in Elasticsearch? If so, please point me to the documentation describing the process.
Thanks,
Keerthana
This is a great use case. Of course Elasticsearch can perform such tasks, but it is more manual. You have to write your own script. So for example, if you want to summarize data, you can use Elasticsearch aggregations, take the result (which comes in JSON format), and store it back into an index where you keep summary data. This way, even if you delete your raw data, your summary data lives on.
Elasticsearch comes with different clients. I like to use the Python Elasticsearch DSL library.
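As a rough illustration of the "take the result and store it back" step, here is a minimal Python sketch that turns the buckets of an aggregation response into documents for a summary index. It works on a plain dict, so it doesn't depend on any particular client. The aggregation name per_day, the index name daily-summary, and the sample response are all made up. The output uses the {"_index": ..., "_source": ...} action shape that, as far as I know, the bulk helper in the Python client accepts.

```python
def summary_docs(agg_response, agg_name, summary_index):
    """Yield one bulk-style index action per bucket of the named aggregation."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    for bucket in buckets:
        yield {
            "_index": summary_index,
            "_source": {
                # Prefer the formatted key (e.g. a date string) when present.
                "key": bucket.get("key_as_string", bucket["key"]),
                "doc_count": bucket["doc_count"],
            },
        }


# Hypothetical response from a date_histogram aggregation named "per_day".
sample_response = {
    "aggregations": {
        "per_day": {
            "buckets": [
                {"key_as_string": "2020-01-01", "key": 1577836800000, "doc_count": 42},
                {"key_as_string": "2020-01-02", "key": 1577923200000, "doc_count": 17},
            ]
        }
    }
}

actions = list(summary_docs(sample_response, "per_day", "daily-summary"))
for action in actions:
    print(action)
```

You would run the aggregation query on a schedule (cron, for example) and send the resulting actions to the summary index, which is roughly what Splunk's scheduled summary searches do for you.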
