Group-by functionality on multiple fields in Elasticsearch

I need to group status_value by region, and regions by a given date. I have written the query below, but it does not work as expected in Elasticsearch. It would be a great help if someone could look into this and suggest a solution.
Note: I would like to get the result for the previous day.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"UsagePoint_Asset_lifecycle_installationDate": "2014-07-13T16:55:00.0-07:00"
}
}
}
},
"aggs" : {
"product" : {
"terms" : {
"field" : "UsagePoint_ServiceLocation_region"
},
"aggs" : {
"material" : {
"terms" : {
"field" : "UsagePoint_status_value"
}
}
}
}
}
}
My SQL query would look something like this:
select count(status_value)
from products
where date = "yesterday"
group by region , date
The query below works, but I would like to get the values for a specific day or range of dates.
{
"agg1": {
"terms": {
"field":"UsagePoint_Asset_lifecycle_installationDate"
},
"aggs" : {
"product" : {
"terms" : {
"field" : "UsagePoint_ServiceLocation_region"
},
"aggs" : {
"material" : {
"terms" : {
"field" : "UsagePoint_status_value"
}
}
}
}
}
}
}

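If you need the group-by information for yesterday, the following is a solution.
For any other custom dates, change the values of gte (greater than or equal to) and lt (less than). Note that now-1d to now covers the trailing 24 hours; to match the previous calendar day, round with date math by using now-1d/d for gte and now/d for lt.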
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"UsagePoint_Asset_lifecycle_installationDate": {
"gte": "now-1d",
"lt": "now"
}
}
}
}
},
"aggs": {
"product": {
"terms": {
"field": "UsagePoint_ServiceLocation_region"
},
"aggs": {
"material": {
"terms": {
"field": "UsagePoint_status_value"
}
}
}
}
}
}
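A minimal sketch of the same request restricted to the previous calendar day, written in Python so the body can be built and inspected as a dict. It assumes date math rounding (now-1d/d up to now/d, evaluated in UTC unless a time_zone is set on the range query) and uses bool/filter rather than filtered, which newer Elasticsearch versions no longer accept. The function name build_daily_group_by is made up for illustration.

```python
import json


def build_daily_group_by(date_field, outer_field, inner_field,
                         gte="now-1d/d", lt="now/d"):
    """Build a search body that filters on a date range and groups by two fields."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": {
                    # gte rounds down to the start of yesterday,
                    # lt rounds down to the start of today
                    "range": {date_field: {"gte": gte, "lt": lt}}
                }
            }
        },
        "aggs": {
            "product": {
                "terms": {"field": outer_field},
                "aggs": {
                    "material": {"terms": {"field": inner_field}}
                },
            }
        },
    }


body = build_daily_group_by(
    "UsagePoint_Asset_lifecycle_installationDate",
    "UsagePoint_ServiceLocation_region",
    "UsagePoint_status_value",
)
print(json.dumps(body, indent=2))
```

Passing explicit gte and lt values (for example two ISO dates) gives the same grouping for any other day or range.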

Related

In ElasticSearch break down hits per filter?

Given the following query, how can I get the number of hits independently for each range and term query and what are the performance implications for this? As of yet, I can't find anything in the documentation that indicates how to do this. Where can I find the docs for such a feature?
{
"query": {
"bool" : {
"must" : {
"term" : { "user.id" : "kimchy" }
},
"filter": {
"term" : { "tags" : "production" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
You can use a filter aggregation to get the document count for each query clause. Because you are also providing a query, you need to wrap the filter aggregations in a global aggregation. Without the global aggregation, the counts are computed only over documents matching the top-level query, so you will not get the total number of documents matching each individual clause.
Below is a sample query with the aggregation:
{
"query": {
"bool": {
"must": {
"term": {
"user.id": "kimchy"
}
},
"filter": {
"term": {
"tags": "production"
}
},
"must_not": {
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
}
},
"aggs": {
"Total": {
"global": {},
"aggs": {
"user_term": {
"filter": {
"term": {
"user.id": "kimchy"
}
}
},
"tag_term": {
"filter": {
"term": {
"tags": "production"
}
}
},
"age_range_not": {
"filter": {
"bool": {
"must_not": {
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
}
}
},
"age_range": {
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
}
}
}
}
}
You will get a response like the one below:
"aggregations" : {
"Total" : {
"doc_count" : 3,
"age_range" : {
"doc_count" : 2
},
"age_range_not" : {
"doc_count" : 1
},
"tag_term" : {
"doc_count" : 3
},
"user_term" : {
"doc_count" : 2
}
}
}
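Since every clause follows the same pattern (one filter aggregation per clause, all under a single global aggregation), the aggs section can be generated instead of written by hand. This is only a sketch in Python; the helper name per_clause_counts is hypothetical, and the clause names are the ones used in the answer above.

```python
import json


def per_clause_counts(clauses):
    """Wrap each named query clause in a filter aggregation under one global aggregation."""
    return {
        "Total": {
            # global ignores the top-level query, so each count covers the whole index
            "global": {},
            "aggs": {name: {"filter": clause} for name, clause in clauses.items()},
        }
    }


age_range = {"range": {"age": {"gte": 10, "lte": 20}}}
clauses = {
    "user_term": {"term": {"user.id": "kimchy"}},
    "tag_term": {"term": {"tags": "production"}},
    "age_range": age_range,
    # the negated clause is counted by wrapping the range in bool/must_not
    "age_range_not": {"bool": {"must_not": age_range}},
}
aggs = per_clause_counts(clauses)
print(json.dumps({"aggs": aggs}, indent=2))
```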

Elasticsearch querying number of dates in array matching query

I have documents in the following form
PUT test_index/_doc/1
{
"dates" : [
"2018-07-15T14:12:12",
"2018-09-15T14:12:12",
"2018-11-15T14:12:12",
"2019-01-15T14:12:12",
"2019-03-15T14:12:12",
"2019-04-15T14:12:12",
"2019-05-15T14:12:12"],
"message" : "hello world"
}
How do I query for documents such that there are n number of dates within the dates array falling in between two specified dates?
For example: Find all documents with 3 dates in the dates array falling in between "2018-05-15T14:12:12" and "2018-12-15T14:12:12" -- this should return the above document as "2018-07-15T14:12:12", "2018-09-15T14:12:12" and "2018-11-15T14:12:12" fall between "2018-05-15T14:12:12" and "2018-12-15T14:12:12".
I recently faced the same problem and came up with two solutions.
1) If you do not want to change your current mapping, you can query for the documents using query_string. Note that you will have to build the query string from the range you have, for example "2019-04-08" OR "2019-04-09" OR "2019-04-10".
{
"query": {
"query_string": {
"default_field": "dates",
"query": "\"2019-04-08\" OR \"2019-04-09\" OR \"2019-04-10\" "
}
}
}
However, this type of query only makes sense if the range is short.
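For illustration, the OR list for query_string could be generated from a start and end date rather than typed out. A rough Python sketch, assuming day-granularity dates; the helper name date_or_query is made up:

```python
from datetime import date, timedelta


def date_or_query(start, end):
    """Return quoted ISO dates from start to end (inclusive) joined with OR."""
    days = (end - start).days
    return " OR ".join(
        '"{}"'.format((start + timedelta(days=offset)).isoformat())
        for offset in range(days + 1)
    )


query = {
    "query": {
        "query_string": {
            "default_field": "dates",
            "query": date_or_query(date(2019, 4, 8), date(2019, 4, 10)),
        }
    }
}
print(query["query"]["query_string"]["query"])
```

The string grows by one term per day, which is why this approach is only practical for short ranges.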
2) The second way is to use nested documents. You will have to change your current mapping as follows:
{
"properties": {
"dates": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
}
Your query will then look something like this:
{
"query": {
"nested": {
"path": "dates",
"query": {
"bool": {
"must": [
{
"range": {
"dates.key": {
"gte": "2018-04-01",
"lte": "2018-12-31"
}
}
}
]
}
}
}
}
}
You can store the dates as nested documents and use a bucket_selector aggregation. Sample document:
{
"empId":1,
"dates":[
{
"Days":"2019-01-01"
},
{
"Days":"2019-01-02"
}
]
}
Mapping:
"mappings" : {
"properties" : {
"empId" : {
"type" : "keyword"
},
"dates" : {
"type" : "nested",
"properties" : {
"Days" : {
"type" : "date"
}
}
}
}
}
GET profile/_search
{
"query": {
"bool": {
"filter": {
"nested": {
"path": "dates",
"query": {
"range": {
"dates.Days": {
"format": "yyyy-MM-dd",
"gte": "2019-05-01",
"lte": "2019-05-30"
}
}
}
}
}
}
},
"aggs": {
"terms_parent_id": {
"terms": {
"field": "empId"
},
"aggs": {
"availabilities": {
"nested": {
"path": "dates"
},
"aggs": {
"avail": {
"range": {
"field": "dates.Days",
"ranges": [
{
"from": "2019-05-01",
"to": "2019-05-30"
}
]
},
"aggs": {
"count_Total": {
"value_count": {
"field": "dates.Days"
}
}
}
},
"max_hourly_inner": {
"max_bucket": {
"buckets_path": "avail>count_Total"
}
}
}
},
"bucket_selector_page_id_term_count": {
"bucket_selector": {
"buckets_path": {
"children_count": "availabilities>max_hourly_inner"
},
"script": "params.children_count>=19;"
}
},
"hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
In the bucket_selector script, replace 19 with the number of days that should match.
I found my own answer to this, although I'm not sure how efficient it is compared to the other answers:
GET test_index/_search
{
"query":{
"bool" : {
"filter" : {
"script" : {
"script" : {"source":"""
int count = 0;
for (int i=0; i<doc['dates'].length; ++i) {
if (params.first_date < doc['dates'][i].toInstant().toEpochMilli() && doc['dates'][i].toInstant().toEpochMilli() < params.second_date) {
count += 1;
}
}
if (count >= 2) {
return true
} else {
return false
}
""",
"lang":"painless",
"params": {
"first_date": 1554818400000,
"second_date": 1583020800000
}
}
}
}
}
}
}
Here the parameters are the two dates in epoch milliseconds. I've chosen 2 matches here, but obviously you can generalise to any number.
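To sanity-check the counting logic outside Elasticsearch, the same loop can be mirrored in plain Python. This is only a sketch: it assumes the timestamps are UTC, the helper names are made up, and it uses the document and the date range from the question with a threshold of 3.

```python
from datetime import datetime, timezone


def to_epoch_millis(iso_timestamp):
    """Parse an ISO timestamp without offset as UTC and return epoch milliseconds."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


def count_between(dates, first_date, second_date):
    """Count dates strictly between the two epoch-millisecond bounds, as the script does."""
    return sum(1 for d in dates if first_date < to_epoch_millis(d) < second_date)


dates = [
    "2018-07-15T14:12:12",
    "2018-09-15T14:12:12",
    "2018-11-15T14:12:12",
    "2019-01-15T14:12:12",
    "2019-03-15T14:12:12",
    "2019-04-15T14:12:12",
    "2019-05-15T14:12:12",
]
first = to_epoch_millis("2018-05-15T14:12:12")
second = to_epoch_millis("2018-12-15T14:12:12")
matches = count_between(dates, first, second)
print(matches >= 3)  # the document qualifies when at least 3 dates fall in the range
```

The same to_epoch_millis values could be passed as first_date and second_date in the script params.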

How to group events by multiple terms?

How can I group by year and month? My query works if I leave 1 term, for example, Month. But I cannot group by multiple terms.
GET traffic-data/_search?
{
"size":0,
"query": {
"bool": {
"must": [
{ "match": {
"VehiclePlateNumber": "111"
}}
]
} },
"aggs" : {
"years" : {
"terms" : {
"field" : "Year"
},
"aggs" : {
"months" : { "by_month" : { "field" : "Month" } }
}
}
}
}
I think the query in your question is already close. The inner aggregation needs an aggregation type such as terms; try this:
GET traffic-data/_search?
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"VehiclePlateNumber": "111"
}
}
]
}
},
"aggs": {
"years": {
"terms": {
"field": "Year",
"size": 100
},
"aggs": {
"months": {
"terms": {
"size": 12,
"field": "Month"
}
}
}
}
}
}
Edit - I am assuming your month is a string keyword field. Let me know if this is not the case (and please include the mappings) and I will revise.
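Once the nested terms aggregation returns, each year bucket carries its own months buckets, so the result can be flattened into (year, month, count) rows. A small Python sketch; the sample response below is invented for illustration and only mimics the shape of a terms aggregation response, and flatten_year_month is a hypothetical helper name.

```python
def flatten_year_month(response):
    """Turn nested years > months buckets into a flat list of (year, month, count)."""
    rows = []
    for year_bucket in response["aggregations"]["years"]["buckets"]:
        for month_bucket in year_bucket["months"]["buckets"]:
            rows.append(
                (year_bucket["key"], month_bucket["key"], month_bucket["doc_count"])
            )
    return rows


# Made-up response in the shape returned for the "years"/"months" aggregations
sample_response = {
    "aggregations": {
        "years": {
            "buckets": [
                {
                    "key": "2018",
                    "doc_count": 5,
                    "months": {
                        "buckets": [
                            {"key": "01", "doc_count": 3},
                            {"key": "02", "doc_count": 2},
                        ]
                    },
                },
                {
                    "key": "2019",
                    "doc_count": 1,
                    "months": {"buckets": [{"key": "07", "doc_count": 1}]},
                },
            ]
        }
    }
}

for row in flatten_year_month(sample_response):
    print(row)
```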

Getting "Field data loading is forbidden" when trying to aggregate

I'm trying to do a simple unique-count aggregation, but I'm getting this error:
java.lang.IllegalStateException: Field data loading is forbidden on eid
This is my query:
POST /logstash-2016.06.*/Nginx/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"pid": "1"
}
},
{
"term": {
"cvprogress": "0"
}
},
{
"range" : {
"ServerTime" : {
"gte" : "2016-06-28T00:00:00"
}
}
}
]
}
},
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid"
}
}
}
}
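After going through the entire thread at https://github.com/elastic/elasticsearch/issues/15267, what worked was adding .raw to the field name (assuming the default Logstash index template, .raw is the not-analyzed copy of the field), like this: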
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid.raw"
}
}
}

Elasticsearch match list against field

I have a list (or array, whatever it is called in the language you are familiar with), e.g. names : ["John","Bas","Peter"], and I want to query the name field for documents that match any of those names.
One way is with an or filter, e.g.:
{
"filtered" : {
"query" : {
"match_all": {}
},
"filter" : {
"or" : [
{
"term" : { "name" : "John" }
},
{
"term" : { "name" : "Bas" }
},
{
"term" : { "name" : "Peter" }
}
]
}
}
}
Any fancier way? Better if it's a query than a filter.
{
"query": {
"filtered" : {
"filter" : {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Elasticsearch rewrites this as if you had used the following:
{
"query": {
"filtered" : {
"filter" : {
"bool": {
"should": [
{
"term": {
"name": "John"
}
},
{
"term": {
"name": "Bas"
}
},
{
"term": {
"name": "Peter"
}
}
]
}
}
}
}
}
When combining filters, it is usually better to use the bool filter than the and or or filters. The reason is explained on the Elasticsearch blog: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
When I tried the filtered query, I got "no [query] registered for [filtered]". According to another answer, the filtered query was deprecated and then removed in ES 5.0, so I suggest using:
{
"query": {
"bool": {
"filter": {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Example query that filters by a keyword and a list of values:
{
"query": {
"bool": {
"must": [
{
"term": {
"fguid": "9bbfe844-44ad-4626-a6a5-ea4bad3a7bfb.pdf"
}
}
],
"filter": {
"terms": {
"page": [
"1",
"2",
"3"
]
}
}
}
}
}
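If the list of names comes from application code, the bool/filter/terms body can be built directly from it. A minimal Python sketch, assuming the field is not analyzed (terms matches the indexed values exactly, so an analyzed text field may have stored "John" in lowercase); the helper name match_any is hypothetical.

```python
import json


def match_any(field, values):
    """Build a bool/filter/terms query matching documents whose field equals any value."""
    return {"query": {"bool": {"filter": {"terms": {field: list(values)}}}}}


names = ["John", "Bas", "Peter"]
body = match_any("name", names)
print(json.dumps(body))
```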
