Filter in aggregation - elasticsearch

Say I have the following mapping:
{
// ...other fields,
"locations": {
"type": "nested",
"properties": {
"countrySlug": { "type": "keyword" },
"citySlug": { "type": "keyword" }
}
}
}
So this way, each document can have multiple locations:
{
"locations": [
{
"countrySlug": "germany",
"citySlug": "berlin"
},
{
"countrySlug": "germany",
"citySlug": "hamburg"
},
{
"countrySlug": "poland",
"citySlug": "krakow"
},
{
"countrySlug": "italy",
"citySlug": "milan"
}
]
}
Now I want to aggregate the city slugs of only those locations whose countrySlug is "germany".
My query looks like this:
{
"_source": false,
"aggs": {
"cities": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"nested": {
"path": "locations",
"query": {
"bool": {
"must": {
"term": {
"locations.countrySlug": "germany"
}
}
}
}
}
}
]
}
}
]
}
},
"aggs": {
"agg": {
"nested": {
"path": "locations"
},
"aggs": {
"slugs": {
"terms": {
"field": "locations.citySlug",
"size": 5
},
"aggs": {
"top_reverse_nested": {
"reverse_nested": {}
}
}
}
}
}
}
}
},
"size": 0
}
But it returns all city slugs that were found, e.g.:
berlin: 2
krakow: 1
milan: 3
My goal is to get just:
berlin: 2
(or other city slugs that are related to a location with countrySlug = "germany")
Am I missing anything? How can I apply something like a "post filter" to aggregations?
Thanks, PS

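A filter placed outside the nested aggregation only selects whole documents, so every location of a matching document is still counted. Instead, put the filter inside the nested aggregation, so that only the nested location objects whose countrySlug is germany reach the terms aggregation: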
GET /cities/_search
{
"size": 0,
"aggs": {
"cities": {
"nested": {
"path": "locations"
},
"aggs": {
"filter_cities": {
"filter": {
"bool": {
"filter": [
{
"term": {
"locations.countrySlug": "germany"
}
}
]
}
},
"aggs": {
"cities": {
"terms": {
"field": "locations.citySlug"
}
}
}
}
}
}
}
}
The result for the above query:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"cities" : {
"doc_count" : 17,
"filter_cities" : {
"doc_count" : 9,
"cities" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "hamburg",
"doc_count" : 5
},
{
"key" : "berlin",
"doc_count" : 4
}
]
}
}
}
}
}
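To see why the original query returned every city, it can help to model the two scopes in plain Python. This is only a sketch, not Elasticsearch code: the sample documents and both function names are made up for illustration. A document-level filter keeps or drops whole documents, so every location of a kept document is still counted, whereas a filter applied inside the nested scope is evaluated for each location object.

```python
from collections import Counter

# Hypothetical sample data shaped like the mapping in the question.
docs = [
    {"locations": [
        {"countrySlug": "germany", "citySlug": "berlin"},
        {"countrySlug": "poland", "citySlug": "krakow"},
    ]},
    {"locations": [
        {"countrySlug": "germany", "citySlug": "hamburg"},
        {"countrySlug": "italy", "citySlug": "milan"},
    ]},
    {"locations": [
        {"countrySlug": "italy", "citySlug": "milan"},
    ]},
]


def cities_with_document_filter(docs, country):
    """Filter whole documents first, then count every nested location."""
    counts = Counter()
    for doc in docs:
        if any(loc["countrySlug"] == country for loc in doc["locations"]):
            counts.update(loc["citySlug"] for loc in doc["locations"])
    return dict(counts)


def cities_with_nested_filter(docs, country):
    """Enter the nested scope first, then filter individual locations."""
    counts = Counter()
    for doc in docs:
        for loc in doc["locations"]:
            if loc["countrySlug"] == country:
                counts[loc["citySlug"]] += 1
    return dict(counts)
```

The first function mirrors the query from the question (krakow and milan leak into the result), the second mirrors the nested, then filter, then terms layout of the answer.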

Related

How to filter nested aggregations in ElasticSearch?

For example, let's assume we have a product index with the following mapping:
{
"product": {
"mappings": {
"producttype": {
"properties": {
"id": {
"type": "keyword"
},
"productAttributes": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "text",
"analyzer": "keyword"
}
},
"analyzer": "standard"
}
}
}
}
}
}
I am trying to find how many products have specific product attributes, using the following query (I am using a fuzzy query to allow some edit distance):
{
"size": 0,
"query": {
"nested": {
"query": {
"fuzzy": {
"productAttributes.name": {
"value": "SSD"
}
}
},
"path": "productAttributes"
}
},
"aggs": {
"product_attribute_nested_agg": {
"nested": {
"path": "productAttributes"
},
"aggs": {
"terms_nested_agg": {
"terms": {
"field": "productAttributes.name"
}
}
}
}
}
}
But it returns all product attributes of each matched document. Here is the response I get:
"aggregations" : {
"product_attribute_nested_agg" : {
"doc_count" : 6,
"terms_nested_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "SSD",
"doc_count" : 3
},
{
"key" : "USB 2.0",
"doc_count" : 3
}
]
}
}
}
Could you please show me how to filter the buckets so that only the matched attributes are returned?
Edit:
Here are some sample documents:
"hits" : {
"total" : 12,
"max_score" : 1.0,
"hits" : [
{
"_index" : "product",
"_type" : "producttype",
"_id" : "677d1164-c401-4d36-8a08-6aa14f7f32bb",
"_score" : 1.0,
"_source" : {
"title" : "Dell laptop",
"productAttributes" : [
{
"name" : "USB 2.0",
"value" : "4"
},
{
"name" : "SSD",
"value" : "250 GB"
}
]
}
},
{
"_index" : "product",
"_type" : "producttype",
"_id" : "2954935a-7f60-437a-8a54-00da2d71da46",
"_score" : 1.0,
"_source" : {
"productAttributes" : [
{
"name" : "USB 2.0",
"value" : "3"
},
{
"name" : "SSD",
"value" : "500 GB"
}
],
"title" : "HP laptop"
}
},
]
}

To keep only specific attributes in the buckets, you can use a filter aggregation inside the nested aggregation.
Query:
{
"size": 0,
"aggs": {
"product_attribute_nested_agg": {
"nested": {
"path": "productAttributes"
},
"aggs": {
"inner": {
"filter": {
"terms": {
"productAttributes.name": [
"SSD"
]
}
},
"aggs": {
"terms_nested_agg": {
"terms": {
"field": "productAttributes.name"
}
}
}
}
}
}
}
}
This is the part that does the trick:
"filter": {
"terms": {
"productAttributes.name": [
"SSD"
]
}
}
The filter has to be part of the aggregation, not only of the query.
Output:
"aggregations": {
"product_attribute_nested_agg": {
"doc_count": 4,
"inner": {
"doc_count": 2,
"terms_nested_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "SSD",
"doc_count": 2
}
]
}
}
}
}
Filtering with fuzziness:
GET /product/_search
{
"size": 0,
"aggs": {
"product_attribute_nested_agg": {
"nested": {
"path": "productAttributes"
},
"aggs": {
"inner": {
"filter": {
"fuzzy": {
"productAttributes.name": {
"value": "SSt", // one edit away from "SSD", so it matches
"fuzziness": 1 // allowed edit distances are 0, 1 and 2; omit this line to use AUTO
}
}
},
"aggs": {
"terms_nested_agg": {
"terms": {
"field": "productAttributes.name"
}
}
}
}
}
}
}
}
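The fuzzy filter matches terms within a small edit distance of the given value. As a rough illustration, the sketch below computes a plain Levenshtein distance in Python. It is not the Lucene implementation (which, by default, also counts a swap of two adjacent characters as a single edit), and the function name is made up for this example. It shows that `SSt` is one substitution away from `SSD`, while `USB 2.0` is far outside any allowed distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]
```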

Elasticsearch query with filter take more time than query without filter. Why?

I am using Elasticsearch version 7.6.1.
The query with the filter is:
GET mark13/_search
{
"explain": false,
"from": 0,
"size": 500,
"track_scores": true,
"stored_fields": [
"_source"
],
"sort": {
"_script": {
"type": "number",
"script": {
"id": "sorting_algo",
"params": {
"query": "abhinav keshri"
}
},
"order": "desc"
}
},
"script_fields": {
"poca_score": {
"script": {
"id": "field",
"params": {
"query": "abhinav keshri"
}
}
}
},
"query": {
"bool": {
"filter": [
{
"term": {
"class" : "42"
}
}
],
"should": [
{
"match": {
"applied_for": {
"query": "abhinav keshri",
"boost": 118,
"fuzziness": 0
}
}
},
...
...
...
Output of the above query is
{
"took" : 45414,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 4730.905,
"hits" : [
{
...
...
...
It took 45414 ms to execute.
The query without the filter is:
GET mark13/_search
{
"explain": false,
"from": 0,
"size": 500,
"track_scores": true,
"stored_fields": [
"_source"
],
"sort": {
"_script": {
"type": "number",
"script": {
"id": "sorting_algo",
"params": {
"query": "abhinav keshri"
}
},
"order": "desc"
}
},
"script_fields": {
"poca_score": {
"script": {
"id": "script",
"params": {
"query": "abhinav keshri"
}
}
}
},
"query": {
"bool": {
"should": [
{
"match": {
"applied_for": {
"query": "abhinav keshri",
"boost": 118,
"fuzziness": 0
}
}
},
...
...
...
Output of the above query is
{
"took" : 7104,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 9920,
"relation" : "eq"
},
"max_score" : 4730.905,
"hits" : [
{
It took 7104 ms to execute.
My expectation was that the filter query would take less time than the non-filter query, since there are fewer results to apply the bool query to.
I have also tried writing the filter query in a different format (the first one is given above); the following format gives me the same results.
{
"from": 0,
"size": 50,
"track_scores": true,
"stored_fields": [
"_source"
],
"sort": {
"_script": {
"type": "number",
"script": {
"id": "sorting_algo",
"params": {
"query": "abhinav keshri"
}
}
}
},
"script_fields": {
"poca_score": {
"script": {
"id": "script",
"params": {
"query": "abhinav keshri"
}
}
}
},
"query": {
"bool": {
"should": [
],
"filter": [
{
"bool": {
"should": [
{
...
...
}
],
"must": {
"bool": {
"should": [
{
"terms": {
"class": [
"42"
]
}
},
...
...
]
}
}
}
}
]
}
}
}
Question: Why does the filter query take longer than the non-filter query?

group by nested and non nested fields in es

Hi, I am trying to group by nested and non-nested fields. I want to group by one non-nested field (from_district) and one nested field (truck_number), and take the max of a nested field (truck_ranking_document.score).
Requirement: get the max score of each truck in every district in which the truck is present, for a given sp_id.
E.g.:
District1, truck1, 0.9
District2, truck1, 0.8
District1, truck2, 1.8
District2, truck3, 0.7
District3, truck4, 1.7
Below is my mapping
{
"sp_ranked_indent" : {
"mappings" : {
"properties" : {
"from_district" : {
"type" : "keyword"
},
"sp_id" : {
"type" : "long"
},
"to_district" : {
"type" : "keyword"
},
"truck_ranking_document" : {
"type" : "nested",
"properties" : {
"score" : {
"type" : "float"
},
"truck_number" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
Below is the query that I tried, but it does not group by the nested and non-nested fields, and the max truck score is also incorrect:
{
"size": 0,
"query": {
"terms": {
"sp_id": [650128],
"boost": 1.0
}
},
"aggregations": {
"NESTED_AGG": {
"nested": {
"path": "truck_ranking_document"
},
"aggregations": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"truck_numer": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
},
"fromdistrictagg": {
"reverse_nested": {},
"aggregations": {
"fromDistrict": {
"terms": {
"field": "from_district",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
}
}
}
}
}

I think this can be done using terms and nested aggregations. The query below produces output in the following format:
District1
    Truck1
        Max score
    Truck2
        Max score
    Truck3
        Max score
District2
    Truck1
        Max score
    Truck2
        Max score
    Truck3
        Max score
Query:
{
"query": {
"terms": {
"sp_id": [
1
]
}
},
"aggs": {
"district": {
"terms": {
"field": "from_district",
"size": 10
},
"aggs": {
"trucks": {
"nested": {
"path": "truck_ranking_document"
},
"aggs": {
"truck_no": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10
},
"aggs": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"select": {
"bucket_selector": {
"buckets_path": {
"score": "max_score"
},
"script": "params.score > 0"
}
}
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "trucks>truck_no._bucket_count"
},
"script": {
"source": "params.count != 0"
}
}
}
}
}
}
}
Result:
"aggregations" : {
"district" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "District1",
"doc_count" : 1,
"trucks" : {
"doc_count" : 2,
"truck_no" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"max_score" : {
"value" : 2.0
}
},
{
"key" : "3",
"doc_count" : 1,
"max_score" : {
"value" : 3.0
}
}
]
}
}
}
]
}
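The question asked for rows of the form district, truck, max score. One possible way to get there from the nested response is to flatten it on the client side. The Python sketch below assumes the aggregation names used in the query above (`district`, `trucks`, `truck_no`, `max_score`); the function name is made up for illustration.

```python
def flatten_truck_scores(aggregations):
    """Yield (district, truck_number, max_score) from the response shape above."""
    for district in aggregations["district"]["buckets"]:
        for truck in district["trucks"]["truck_no"]["buckets"]:
            yield district["key"], truck["key"], truck["max_score"]["value"]


# Response fragment copied from the result shown above.
aggregations = {
    "district": {
        "buckets": [
            {
                "key": "District1",
                "doc_count": 1,
                "trucks": {
                    "doc_count": 2,
                    "truck_no": {
                        "buckets": [
                            {"key": "1", "doc_count": 1, "max_score": {"value": 2.0}},
                            {"key": "3", "doc_count": 1, "max_score": {"value": 3.0}},
                        ]
                    },
                },
            }
        ]
    }
}

rows = list(flatten_truck_scores(aggregations))
```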
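Composite aggregation
To page through all districts, you can use a composite aggregation instead. Its response contains an after_key:
"after_key" : {
"district" : "District4"
}
Pass that value in the after parameter of the next request to retrieve the next page of buckets. The first request looks like this: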
{
"aggs": {
"my_buckets": {
"composite": {
"size": 100,
"sources": [
{
"district": {
"terms": {
"field": "from_district"
}
}
}
]
},
"aggs": {
"trucks": {
"nested": {
"path": "truck_ranking_document"
},
"aggs": {
"truck_no": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10
},
"aggs": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"select": {
"bucket_selector": {
"buckets_path": {
"score": "max_score"
},
"script": "params.score > 0"
}
}
}
}
}
}
}
}
}
}
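The after_key paging described above can be sketched as a loop in Python. This is a minimal sketch under stated assumptions: `search` is a hypothetical callable that takes a request body as a dict and returns the parsed response as a dict (wire it to whatever client you use), and `my_buckets` is the aggregation name from the request above.

```python
import copy


def iterate_composite_buckets(search, body, agg_name="my_buckets"):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is a hypothetical callable: request body (dict) in,
    parsed response (dict) out.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        result = search(body)["aggregations"][agg_name]
        yield from result["buckets"]
        after_key = result.get("after_key")
        if not result["buckets"] or after_key is None:
            break
        # Ask for the page that starts right after the last returned key.
        body["aggs"][agg_name]["composite"]["after"] = after_key
```

The loop stops when a page comes back empty or without an after_key, so the last request is usually one that returns no buckets.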

Elasticsearch: terms aggregations on doubly nested object

I am trying to do a doubly nested aggregation on a doubly nested object. That is, I have the root document, a child property, and a grand-child property. To be more precise, I have the following mapping:
{
"mappings": {
"root": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"selections": {
"type": "nested",
"properties": {
"value": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
I am trying to aggregate selection value counts per field, or in other words, to count the number of occurrences of each value for each field name, across all root objects.
I have this:
{
"query": {
...
},
"aggregations": {
"fields": {
"nested": {
"path": "fields"
},
"aggregations": {
"name": {
"terms": {
"field": "fields.name"
},
"aggregations": {
"values": {
"nested": {
"path": "selections"
},
"aggregations": {
"value": {
"terms": {
"field": "selections.value"
}
}
}
}
}
}
}
}
}
}
which gets the field names as I want but for each of them I get no doc counts for the values.
What am I doing wrong?

You need to give the full path of the inner nested field. Change "path": "selections" to "path": "fields.selections", and "field": "selections.value" to "field": "fields.selections.value":
{
"size": 0,
"aggregations": {
"fields": {
"nested": {
"path": "fields"
},
"aggregations": {
"name": {
"terms": {
"field": "fields.name"
},
"aggregations": {
"values": {
"nested": {
"path": "fields.selections"
},
"aggregations": {
"value": {
"terms": {
"field": "fields.selections.value"
}
}
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"fields" : {
"doc_count" : 2,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "abc",
"doc_count" : 2,
"values" : {
"doc_count" : 2,
"value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 2
}
]
}
}
}
]
}
}
}
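Since every nested path has to be spelled out from the root, it can be convenient to generate the aggregation body instead of writing the paths by hand. The Python sketch below is one possible helper; the function name and its parameters are made up for illustration, and it only builds the chain of nested aggregations with a terms aggregation at the bottom (it leaves out the intermediate terms aggregation on fields.name used above).

```python
def nested_terms_agg(nested_levels, leaf_field, size=10):
    """Build nested aggregations for a chain of nested objects.

    nested_levels: name of each nested level relative to its parent,
                   e.g. ["fields", "selections"].
    leaf_field:    field name inside the innermost level, e.g. "value".
    """
    full_path = ".".join(nested_levels)
    # Innermost part: a terms aggregation on the fully qualified field name.
    agg = {"terms": {"field": f"{full_path}.{leaf_field}", "size": size}}
    name = leaf_field
    # Wrap from the inside out; every nested path is spelled from the root.
    for depth in range(len(nested_levels), 0, -1):
        path = ".".join(nested_levels[:depth])
        agg = {"nested": {"path": path}, "aggs": {name: agg}}
        name = nested_levels[depth - 1]
    return {name: agg}
```

Calling `nested_terms_agg(["fields", "selections"], "value")` yields an outer nested aggregation on `fields` that contains an inner one on `fields.selections`.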

Model 'or TRUE' in Elasticsearch boolean query for nested type?

I'm trying to build the following query in Elasticsearch:
(query1) AND (query2 OR query3 OR TRUE)
Is the 'OR TRUE' part possible in Elasticsearch, or is there another way of structuring the query that gives the same results?
I have a set of documents, say 10, all matching tag1, some of these 10 documents will also match tag2 and tag3 as well, and if so, I'm using named queries to tell me which documents match tag2 and tag3 (documents matching tag2 and tag3 are subsets of documents matching tag1).
However, even if none match tag2 or tag3, I should still get the results from the initial query matching tag1.
GET /test/_search
{
"query": {
"nested": {
"path": "TAGS",
"query": {
"bool": {
"must": [
{
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
},
{
"bool": {
"should": [
{
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
{
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
},
// OR true here?
]
}
}
]
}
},
"inner_hits": {}
}
}
}
UPDATE: Based on @Val's comment, here is my full test:
PUT /test
PUT /test/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"TAGS": {
"type": "nested"
}
}
}
POST /test/_doc
{
"name" : "doc1",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
}
]
}
# (tag1) and (tag2 or tag3 or true)
GET /test/_search
{
"query": {
"nested": {
"path": "TAGS",
"query": {
"bool": {
"must": [
{
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
}
],
"should": [
{
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
{
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
}
]
}
},
"inner_hits": {}
}
}
}
Running the above query only gives the following results:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "SaOs8G4BbvPS27u-IouS",
"_score" : 0.6931472,
"_source" : {
"name" : "doc1",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
}
]
},
"inner_hits" : {
"TAGS" : {
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "SaOs8G4BbvPS27u-IouS",
"_nested" : {
"field" : "TAGS",
"offset" : 0
},
"_score" : 0.6931472,
"_source" : {
"ID" : "tag1",
"TYPE" : "BASIC"
},
"matched_queries" : [
"tag1-query"
]
}
]
}
}
}
}
]
}
}
I.e. the matched_queries array only reported a match for tag1-query, when I would have expected it to contain tag1-query and tag2-query?

You need two nested queries, because you're checking constraints on two different nested elements (which are two different nested documents under the hood). Try this out:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
}
}
},
"should": [
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
}
}
},
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
}
}
}
]
}
}
}
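To illustrate why the single nested query only reported tag1-query, here is a small Python model of the two query shapes. It is a simplification under stated assumptions: IDs are compared for exact equality rather than through an analyzed match query, and both function names are made up. Inside one nested query, all clauses are evaluated against the same nested object; with separate nested queries, each clause may be satisfied by a different object, and the should clauses stay optional because a must clause is present.

```python
def single_nested_query(tags):
    """One nested query: must tag1, should tag2/tag3, checked per nested object."""
    matched_per_object = []
    for tag in tags:
        if tag["ID"] != "tag1":  # the must clause fails for this object
            continue
        names = ["tag1-query"]
        # The should clauses are tested against the same object, whose ID
        # is already known to be tag1, so they can never match here.
        names += [f"{t}-query" for t in ("tag2", "tag3") if tag["ID"] == t]
        matched_per_object.append(names)
    return matched_per_object


def separate_nested_queries(tags):
    """One nested query per tag: each may be satisfied by a different object."""
    ids = {tag["ID"] for tag in tags}
    if "tag1" not in ids:  # the must clause is still required
        return None
    return [f"{t}-query" for t in ("tag1", "tag2", "tag3") if t in ids]


# The document indexed in the question.
doc1_tags = [
    {"ID": "tag1", "TYPE": "BASIC"},
    {"ID": "tag2", "TYPE": "BASIC"},
]
```

In this model, a document that only has tag1 still matches the second shape, which is the behavior the 'OR TRUE' in the question was after.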
