ElasticSearch - Operational Insights and Trends

Given an ElasticSearch installation, I want to know current trends and insights. I am not sure whether aggregations would help here.
What are the top queries for the last 24 hours?
What are the most frequently searched terms in the last 24 hours?
What are the most accessed documents in the last 24 hours?
Is there any way to collect these metrics from ElasticSearch?
A typical use case: when a user visits the homepage, I want to show the trending searches and top content.

One alternative, if you do not want to load your existing ElasticSearch installation with additional metrics, is to send this data to a cloud log management solution such as Loggly or Logentries.
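If the search events are instead written into Elasticsearch itself, a terms aggregation over the last 24 hours is one way to get the top terms. The sketch below only builds the request body and reads the response; it assumes the application already indexes one document per search into a separate index, with a keyword field and a date field. The field names `term` and `@timestamp` and both function names are hypothetical.

```python
def top_terms_query(term_field="term", timestamp_field="@timestamp",
                    size=10, since="now-24h"):
    """Build a search body that returns the most frequent terms since `since`.

    Assumes one logged document per search; field names are placeholders.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {timestamp_field: {"gte": since}}},
        "aggs": {
            "top_terms": {"terms": {"field": term_field, "size": size}}
        },
    }


def top_terms_from_response(response):
    """Turn the aggregation buckets into (term, count) pairs."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```

The body would be sent to the `_search` endpoint of whichever index holds the logged searches. The same shape works for "most accessed documents" if document views are logged with a document id field.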

Related

Implementing popular keywords in ElasticSearch

I'm using ElasticSearch on AWS EC2, and I want to implement a "today's popular keywords" feature in ES.
There are 3 indexes (place, genre, name), and I want to see today's popular keywords for the name index only.
I tried using the ES slowlog with Logstash, but the slowlog writes a separate entry for every shard.
For example, with 5 shards, 5 log entries are saved for a single query.
Is there a good and easy way to implement popular keywords in ES?
As far as I know, this is not supported by Elasticsearch, and you need to build your own custom solution.
The slowlog design you mentioned is not a good fit. As you noted, the slowlog works on a per-shard basis, and even if you do the extra work to merge the per-shard entries into a single search at the index level, problems remain:
You have to change the slowlog configuration, and each index may need a different threshold. You can set the threshold to 0ms to make sure every search query lands in the slowlog, but that would take a huge amount of disk space and would hurt Elasticsearch performance.
You also have to parse the slowlog in your application, and doing that at runtime would be very costly.
I think you can instead maintain a distributed cache in your application that stores the top searched keywords, much like the leaderboard of a multiplayer game. A leaderboard changes very frequently, but in your case you don't even have to update the cache that often. I won't go into implementation details, but a simple hash map with the search term as key and the count as value would solve the issue.
Hope this helps. Let me know if you have questions.
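The hash map idea can be sketched as follows. This is a minimal single-process version, assuming the application calls `record` on every search and that events arrive in time order; a real deployment would back it with a distributed cache as described above. The class name `TrendingTerms` and its methods are hypothetical, not part of any library.

```python
import time
from collections import Counter, deque


class TrendingTerms:
    """Counts search terms seen inside a sliding time window."""

    def __init__(self, window_seconds=24 * 60 * 60):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, term), oldest first
        self.counts = Counter()  # term -> occurrences inside the window

    def record(self, term, now=None):
        """Register one search for `term` (normalized to lowercase)."""
        now = time.time() if now is None else now
        term = term.strip().lower()
        if not term:
            return
        self.events.append((now, term))
        self.counts[term] += 1
        self._expire(now)

    def _expire(self, now):
        """Drop events older than the window and decrement their counts."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            _, old_term = self.events.popleft()
            self.counts[old_term] -= 1
            if self.counts[old_term] <= 0:
                del self.counts[old_term]

    def top(self, n=10, now=None):
        """Return the n most frequent terms as (term, count) pairs."""
        self._expire(time.time() if now is None else now)
        return self.counts.most_common(n)
```

To restrict this to the name index, the application would call `record` only for searches it sends to that index. Memory grows with the number of searches inside the window, so a high-traffic site might prefer per-hour counters over individual events.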

How can I find the most used query from Elasticsearch?

I have an Elasticsearch cluster running on an AWS Elasticsearch instance. It has been up and running for a few months. I'd like to know the most used query requests over the last few months. Does Elasticsearch save all queries somewhere I can search? Or do I have to programmatically save the requests for analysis?
As far as I'm aware, Elasticsearch doesn't by default save a record or frequency histogram of all queries. However, you can have it log all queries and then ship the logs somewhere to be aggregated and searched for the top results (incidentally, this is something you could use Elasticsearch for :D). Sadly, you'll only be able to track queries made after you configure this. I doubt that you'll be able to find any record of your historical queries from the last few months.
To do this, you'd take advantage of Elasticsearch's slow query log. The default thresholds are designed to log only slow queries, but if you set those thresholds to 0s, Elasticsearch logs every query as a slow query, giving you a record of all queries. See the slow log documentation for detailed instructions. You could set this for a whole cluster in your yaml configuration file (on older releases only; since Elasticsearch 5.0, index-level settings are rejected in elasticsearch.yml and have to be applied per index or through an index template) like
index.search.slowlog.threshold.fetch.debug: 0s
or set it dynamically per index with
PUT /<my-index-name>/_settings
{
  "index.search.slowlog.threshold.query.debug": "0s"
}
To be clear, the log level you choose doesn't strictly matter, but using debug for this lets you keep logging genuinely slow queries at the more severe levels like info and warn, which you might find useful.
I'm not familiar with how to configure an AWS Elasticsearch cluster, but since the above are core Elasticsearch settings in all the versions I'm aware of, there should be a way to do it.
Happy searching!
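For scripting the dynamic per-index change, here is a minimal sketch using only the Python standard library. It builds the `PUT /<index>/_settings` request with both the query and fetch thresholds set to the same value. The function name `slowlog_settings_request`, the base URL, and the index name in the usage note are assumptions for illustration; a managed cluster will usually also need authentication, which is not shown.

```python
import json
import urllib.request


def slowlog_settings_request(base_url, index, threshold="0s"):
    """Build (but do not send) a request that lowers the slowlog thresholds."""
    body = {
        "index.search.slowlog.threshold.query.debug": threshold,
        "index.search.slowlog.threshold.fetch.debug": threshold,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Calling `urllib.request.urlopen(slowlog_settings_request("http://localhost:9200", "my-index"))` would apply the change against a local cluster. Remember to raise the thresholds again afterwards, since logging every query costs disk space.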

How to merge old data to save space in Elasticsearch

I tried to find information about this, but I have not found what I was looking for.
I am storing metrics every minute in an Elasticsearch database. My idea is that the per-minute frequency matters only over a short period.
For example, I want to keep per-minute metrics for the past week, but then I would like to merge them so that each earlier week is represented by a single metrics document.
I could achieve this with a stream processing framework such as Spark Streaming or Flink, but my question is: is there a native way, tool, or trick to do this in Elasticsearch?
Thank you. I hope my question is clear enough; otherwise, leave a comment and I will add more details.
One idea would be to have a weekly index in which you store all your per-minute metrics. Once the week has passed, you could run an aggregation query on the past week's index and aggregate all the information at the day or week level. You'd then store that aggregated information as a new document in another historical index that you can query later on. I don't think it's necessary to leverage Spark Streaming for this; ES aggregations can do the job pretty easily.
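The aggregation step could look roughly like this. The sketch builds a `date_histogram` request body with average and maximum sub-aggregations and converts the response buckets into summary documents ready to be indexed elsewhere. The field names `value` and `@timestamp`, the aggregation names, and both function names are assumptions; `calendar_interval` is the parameter name on Elasticsearch 7.x and later, while older versions use `interval`.

```python
def weekly_rollup_query(field="value", timestamp_field="@timestamp"):
    """Build a search body that summarizes a metric per calendar week."""
    return {
        "size": 0,  # skip the raw per-minute hits
        "aggs": {
            "per_week": {
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": "week",  # "interval" before 7.x
                },
                "aggs": {
                    "avg_value": {"avg": {"field": field}},
                    "max_value": {"max": {"field": field}},
                },
            }
        },
    }


def buckets_to_documents(response):
    """Convert weekly buckets into one summary document per week."""
    docs = []
    for bucket in response["aggregations"]["per_week"]["buckets"]:
        docs.append({
            "week_start": bucket["key_as_string"],
            "samples": bucket["doc_count"],
            "avg": bucket["avg_value"]["value"],
            "max": bucket["max_value"]["value"],
        })
    return docs
```

Each returned document would then be indexed into the historical index, after which the per-minute weekly index can be deleted to reclaim space.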

Query regarding Statsd and Collectd

I have a question about the usage of statsd and collectd.
Everywhere I look on the internet, I only find examples where statsd/collectd is used to collect metrics about the application or system.
My question is: can statsd/collectd be used to collect statistics on other datasets that are not related to system performance, for example in e-commerce?
Can we use it to get the top 10 or top 15 users or URLs hitting the website over a time window (say, the last 15 minutes or the last 15 days)?
Any relevant links or documents on this are most welcome.
I would also like to know whether we can store this data in Elasticsearch. Any documents on this are most welcome too.
Thanks

Kibana Dashboard multiple time periods and search terms

Is it possible to give different time periods or different search terms to each Visualization in a Kibana Dashboard?
Currently, no.
This is on the list of enhancements that the 'elastic' team plans to implement, but it doesn't have a due date yet.
You can follow the open issue here: https://github.com/elastic/kibana/issues/3578
I think I've understood your question.
Let's suppose this is your data within Elasticsearch:
timestamp  level  message
19:05:15   error  connection failed
19:06:30   debug  connection successful
You can show the percentage of each level over different time periods (10% debug, 20% error, 14% info, and so on). For instance, you can design one chart for the last hour and another for the last day in the same dashboard, so you don't need to manipulate the date picker in the header.
First, write a query that filters your data by timestamp (e.g. the last day):
@timestamp:[now-1d TO now]
Second, save this search and give it a name.
Finally, design whatever visualization you need based on this saved search, and the results will be bound to it.
Repeat with different time periods.
Hope this helps. Bye.
