Is it possible to run an elasticsearch aggregation query in Kibana? - elasticsearch

I would like to run the following aggregation query in Kibana:
GET _search
{
  "size": 0,
  "aggs": {
    "group_by_host": {
      "terms": {
        "field": "host",
        "size": 20
      }
    }
  }
}
I can run it in the Dev Tools console (what used to be called Sense), but I would like to run it in Kibana proper. I'm having a hard time figuring out how.

Create a chart from the Visualize tab.
Then, under Buckets, pick X-Axis (or Split Rows, or whatever your chart type offers) => Terms => choose your field.
Then click the Advanced link and enter {"size":10} there.
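As far as I understand it, the JSON typed into the Advanced box is merged into the aggregation definition that Kibana generates for the bucket. A rough Python sketch of that merge, where the helper name merge_json_input and the generated size of 5 are assumptions for illustration and not Kibana internals:

```python
import copy
import json


def merge_json_input(agg_params, json_input):
    """Return a copy of agg_params with the parsed JSON input merged on top."""
    merged = copy.deepcopy(agg_params)
    merged.update(json.loads(json_input))
    return merged


# Parameters a Terms bucket on "host" might generate (assumed shape and size).
terms_params = {"field": "host", "size": 5}

# The value typed into the Advanced box overrides the generated "size".
body = {
    "size": 0,
    "aggs": {
        "group_by_host": {"terms": merge_json_input(terms_params, '{"size": 10}')}
    },
}
print(json.dumps(body, indent=2))
```

Entering {"size":20} instead would reproduce the query from the question.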
Hope that helps!

Related

Daterange + top_hits aggregation (as subaggregation) with Elasticsearch Java API Client 7.17.x

I've been at this for a day and I still don't understand how to do it. This is the query I want to recreate with the new Java API Client (using Spring Boot):
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "timestamp",
        "ranges": [
          { "to": "now-2d" }
        ]
      }
    },
    "aggs": {
      "top_hits": {
        "_source": {
          "includes": [ "Id", "timestamp" ]
        }
      }
    }
  }
}
I tried doing it with DateRangeAggregation.of, but I can't seem to get the right results or type. Here's what I have:
SearchResponse<MyDto> response = client.search(b -> b
        .index("test-index")
        .size(0)
        // date_range aggregation with a single open-ended range up to "now-2d"
        .aggregations("range", a -> a.dateRange(DateRangeAggregation.of(d -> d
                .field("timestamp").ranges(r -> r.to(t -> t.expr("now-2d"))))))
        // top_hits aggregation (a sibling of "range" here, not a sub-aggregation)
        .aggregations("hits", a -> a
                .topHits(h -> h.source(SourceConfig.of(c -> c.filter(f -> f.includes(Arrays.asList("Id", "timestamp"))))))),
        MyDto.class
);
I've also tried removing the subaggregation and the query for now, but I don't seem to be on the right track to even get the doc_count from the bucket. I don't really get how to work with dateRange() here.
Edit: I played around a bit and was able to at least get the doc_count. I'm not sure this is a good way to do it, though:
Aggregation agg = Aggregation.of(a -> a
        .dateRange(d -> d.field("timestamp").ranges(r -> r.to(FieldDateMath.of(v -> v.expr("now-2d"))))));
SearchResponse<MyDto> response = client.search(b -> b
        .index("test-index")
        .size(0)
        .aggregations("range", agg),
        MyDto.class
);
return response.aggregations().get("range").dateRange().buckets().array().get(0).docCount();
I also fixed the query above; it had an unnecessary extra query that broke the result.
My thought process was wrong. I wanted the documents that fell within this time range, and I mistakenly thought top_hits would give them to me, but that's not how it works. I wrote a separate range query that actually returns the documents I needed instead.
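For illustration, the separate range query could look roughly like the body built below, shown as a plain Python dict rather than with the Java client. The helper names and the sample response are made up for this sketch; "lt" is used because the "to" bound of a date_range is exclusive.

```python
import json


def older_than_query(field, date_math, source_fields):
    """Build a search body returning documents whose `field` is before `date_math`."""
    return {
        "query": {"range": {field: {"lt": date_math}}},
        "_source": {"includes": list(source_fields)},
    }


def first_bucket_doc_count(response, agg_name):
    """Read doc_count from the first bucket of a date_range aggregation response."""
    return response["aggregations"][agg_name]["buckets"][0]["doc_count"]


body = older_than_query("timestamp", "now-2d", ["Id", "timestamp"])
print(json.dumps(body, indent=2))

# Hand-written response fragment for illustration, not real cluster output.
sample_response = {"aggregations": {"range": {"buckets": [{"doc_count": 3}]}}}
print(first_bucket_doc_count(sample_response, "range"))
```

The second helper mirrors the buckets().array().get(0).docCount() chain from the edit above.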

elasticsearch: count appearance of terms aggregation on other fields

I want to count how many times the unique values returned by a terms aggregation appear in other fields, within the same query. Let's say:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "unique_products": {
      "terms": {
        "field": "products.name.keyword",
        "min_doc_count": 10
      }
    }
  }
}
What I want is to count how many times each of the keys returned in the buckets appears in another field.
My ideal output is:
"aggregations": {
  "product_stat": {
    "key": "<product_name>",
    "sold": "<#>",    # how many times the key appears in another field, such as sold
    "bought": "<#>"
  }
}
Elasticsearch cannot run a terms aggregation over multiple fields. In short, if it could, aggregations would not be as fast as they are.
As documentation suggests, there are two options:
use script terms aggregation (with performance penalty),
change how the documents are indexed so a normal terms aggregation can be used.
Depending on the structure of your data and your use-cases, you might get by with a complex aggregation + some processing on the client side. This can be done with sub aggregations like here, for example.
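One way to read "aggregation + some processing on the client side" is to request one terms aggregation per field and join the buckets by key afterwards. A minimal sketch, assuming hypothetical fields sold.keyword and bought.keyword and a hand-written response:

```python
def build_body(fields):
    """One terms aggregation per field, all sent in a single request."""
    return {
        "size": 0,
        "aggs": {name: {"terms": {"field": field}} for name, field in fields.items()},
    }


def merge_counts(aggregations, key_agg):
    """For each key of `key_agg`, look up its doc_count in every other aggregation."""
    keys = [b["key"] for b in aggregations[key_agg]["buckets"]]
    result = {key: {} for key in keys}
    for name, agg in aggregations.items():
        if name == key_agg:
            continue
        by_key = {b["key"]: b["doc_count"] for b in agg["buckets"]}
        for key in keys:
            # 0 means "not among the returned buckets", not necessarily absent.
            result[key][name] = by_key.get(key, 0)
    return result


# Field names other than products.name.keyword are assumptions for this sketch.
body = build_body({
    "unique_products": "products.name.keyword",
    "sold": "sold.keyword",
    "bought": "bought.keyword",
})

# Hand-written response fragment, not real output.
sample = {
    "unique_products": {"buckets": [{"key": "apple", "doc_count": 12},
                                    {"key": "pear", "doc_count": 10}]},
    "sold": {"buckets": [{"key": "apple", "doc_count": 7}]},
    "bought": {"buckets": [{"key": "apple", "doc_count": 4},
                           {"key": "pear", "doc_count": 2}]},
}
print(merge_counts(sample, "unique_products"))
```

Keep in mind that a terms aggregation only returns the top buckets (10 by default), so the "size" of the secondary aggregations may need to be raised for the join to be complete.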
Hope that helps!

How to use DSL query from Kibana dev-tools in visualisation?

I have successfully aggregated and queried the particular content I needed in Kibana Dev Tools. However, I need this information in tabular form, either as CSV or PDF. For this, I need to run the DSL query I constructed in Dev Tools in Kibana's visualisation tool, but I am not able to do it.
I tried copying the DSL into the Lucene query text box at the top of the visualisation page, and also tried the "add filter" option. Either way, it returns an error.
The query that works in Dev Tools:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "start_datetime": "1569868200" } }
      ]
    }
  },
  "aggs": {
    "state_location": {
      "terms": {
        "field": "state_location"
      },
      "aggs": {
        "stakeholder_category": {
          "terms": {
            "field": "stakeholder_category"
          },
          "aggs": {
            "coverage_category": {
              "terms": {
                "field": "category_paragraph_name.keyword"
              }
            }
          }
        }
      }
    }
  }
}
I expect to get the result on the visualisation screen as a table, so that I can export it to CSV or PDF.
The search bar in Discover does not accept the JSON syntax of a search request sent to the REST API. Instead, it uses a simple Lucene syntax.
However, you can still edit your search in Discover manually:
You should be able to see a button labelled "Inspect".
(Note that the look and feel of Kibana got a significant update, so depending on the version you are using, you may find the Inspect button somewhere else in Discover.)
Clicking the button opens a pane on the right with three tabs (Statistics, Request and Response). In the Request section you can paste your query. Be sure NOT to paste the root "query" node of your JSON.
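If the export has to happen outside Kibana, the nested buckets that Dev Tools returns can also be flattened into CSV with a few lines of code. A minimal Python sketch, assuming the "aggregations" part of the response is available as a dict; the bucket keys and counts in the sample are invented:

```python
import csv
import io


def flatten(agg, levels, prefix=()):
    """Yield one row per innermost bucket of nested terms aggregations."""
    name, rest = levels[0], levels[1:]
    for bucket in agg[name]["buckets"]:
        keys = prefix + (bucket["key"],)
        if rest:
            yield from flatten(bucket, rest, keys)
        else:
            yield keys + (bucket["doc_count"],)


def to_csv(aggregations, levels):
    """Render the flattened buckets as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(levels) + ["doc_count"])
    writer.writerows(flatten(aggregations, levels))
    return out.getvalue()


levels = ["state_location", "stakeholder_category", "coverage_category"]

# Hand-written sample with the same nesting as the query above (values invented).
aggregations = {
    "state_location": {"buckets": [{
        "key": "Kerala", "doc_count": 3,
        "stakeholder_category": {"buckets": [{
            "key": "NGO", "doc_count": 3,
            "coverage_category": {"buckets": [
                {"key": "health", "doc_count": 2},
                {"key": "water", "doc_count": 1},
            ]},
        }]},
    }]},
}
print(to_csv(aggregations, levels))
```

Each row carries the keys of all three levels plus the doc_count of the innermost bucket, which is the same shape a data table visualisation would show.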
Hope this helps :-)

Insert aggregation results into an index

The goal is to build an Elasticsearch index with only the most recent documents in groups of related documents to track the current state of some monitoring counters and states.
I have crafted a simple Elasticsearch aggregation query:
{
  "size": 0,
  "aggs": {
    "group_by_monitor": {
      "terms": {
        "field": "monitor_name"
      },
      "aggs": {
        "get_latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
It groups related documents into buckets and selects the most recent document for each bucket.
Here are the different ideas I had to get the job done:
directly use the aggregation query to push the results into the index, but that does not seem possible: Is it possible to put the results of an ElasticSearch aggregation back into the index?
use the Logstash Elasticsearch input plugin to execute the aggregation query and the Elasticsearch output plugin to push into the index, but the input plugin seems to look only at the hits field and cannot handle aggregation results: Aggregation Query possible input ES plugin!
use the Logstash http_poller plugin to get a JSON document, but it does not seem to allow specifying a body for the HTTP request.
use the Logstash exec plugin to execute cURL commands to get the JSON, but this seems quite cumbersome and is my last resort.
use the NEST API to build a basic application that polls, extracts the results, cleans them and injects the resulting documents into the target index, but I'd like to avoid adding a new tool to maintain.
Is there a reasonably simple way of accomplishing this?
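Edit the logstash.conf file as follows:
input {
  elasticsearch {
    hosts => "localhost"
    index => "source_index_name"
    type => "index_type"
    query => '{Query}'
    size => 500
    scroll => "5m"
    # docinfo copies _index, _type and _id of each hit into [@metadata]
    docinfo => true
  }
}
output {
  elasticsearch {
    index => "target_index_name"
    # reuse the source document id so repeated runs overwrite instead of duplicating
    document_id => "%{[@metadata][_id]}"
  }
}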
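For the polling-application route listed in the question, the conversion step itself is small. A rough sketch in Python (rather than NEST) that turns the top_hits response into a _bulk request body; using the bucket key as the document _id is my assumption, chosen so that each monitor keeps exactly one, most recent document in the target index:

```python
import json


def latest_docs_to_bulk(response, agg_name, hits_name, target_index):
    """Turn the top hit of every bucket into _bulk action/source line pairs."""
    lines = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        for hit in bucket[hits_name]["hits"]["hits"]:
            # Assumption: the bucket key (monitor name) doubles as the document id.
            action = {"index": {"_index": target_index, "_id": bucket["key"]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(hit["_source"]))
    # The bulk format is newline-delimited JSON; every line ends with a newline.
    return "".join(line + "\n" for line in lines)


# Hand-written response fragment shaped like the query above (values invented).
sample_response = {
    "aggregations": {"group_by_monitor": {"buckets": [
        {"key": "cpu", "doc_count": 5, "get_latest": {"hits": {"hits": [
            {"_id": "a1", "_source": {"monitor_name": "cpu", "value": 42}}
        ]}}},
    ]}}
}

bulk_body = latest_docs_to_bulk(
    sample_response, "group_by_monitor", "get_latest", "target_index_name"
)
print(bulk_body)
```

The resulting string can be sent to the _bulk endpoint with any HTTP client; rerunning it periodically overwrites each monitor's document with its latest state.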

Elasticsearch and aggregation of subqueries

I know that Elasticsearch allows sub-aggregations (i.e. nested aggregations); however, I would like to apply an aggregation to the result of a "first" aggregation (or, more generally, of any query, aggregation or not).
Concrete example: I log events about user actions (for simplicity, my documents have user_id and action). I can write a query that counts the number of actions executed by each user. However, I would like to find the percentage (or count) of "active users" (e.g. users that have executed more than 10 actions). The ideal result would be a histogram over all users showing how active they are.
Is there a way to create such a query? Or is there any other approach I can take, other than storing the aggregated results of the subquery and computing the histogram from that?
Note: I have seen the Elastic Search and "sub queries" question, but it was about something else, it is over a year and a half old, and Elasticsearch is being actively developed.
Additionally, it seems that version 1.4 will offer a scripted metric aggregation, but that would still require storing a counter for every user until the reduce phase. An approximate solution is good enough for me, similar to what ES uses internally for its aggregations.
Here is the query I used; notice the "min_doc_count" in the aggregation.
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "term": { "name": "did x" } },
          { "range": { "created_at": { "gte": "now-7d", "lte": "now" } } }
        ]
      }
    }
  },
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "user_id",
        "min_doc_count": 10,
        "size": 0
      }
    }
  }
}
This query returns the list of buckets (users) with at least 10 events in the specified time period. Just count the returned buckets to get the number of active users.
I have tested this approach with thousands of events and it works well. At a much larger scale, you will probably have to move to something like Hadoop.
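The counting step, and the histogram asked for in the question, can both be done on the client from the returned buckets. A minimal Python sketch with invented user ids and counts; for the histogram, the query would need "min_doc_count": 1 so that less active users are returned as well:

```python
from collections import Counter


def active_user_count(buckets, threshold=10):
    """Number of users (buckets) with at least `threshold` events."""
    return sum(1 for b in buckets if b["doc_count"] >= threshold)


def activity_histogram(buckets, interval=10):
    """Map each band start (doc_count rounded down to `interval`) to a user count."""
    hist = Counter((b["doc_count"] // interval) * interval for b in buckets)
    return dict(sorted(hist.items()))


# Hand-written buckets as they would appear under aggregations.my_agg.buckets.
buckets = [
    {"key": "u1", "doc_count": 3},
    {"key": "u2", "doc_count": 12},
    {"key": "u3", "doc_count": 15},
    {"key": "u4", "doc_count": 27},
]
print(active_user_count(buckets))
print(activity_histogram(buckets))
```

Dividing active_user_count by len(buckets) gives the share of active users, as long as every user is present in the buckets.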
