Elasticsearch facet nested aggregation

Using Elasticsearch 7.0.0.
I am following this link.
I have an index test_products with the following mapping:
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"dynamic_templates": [
{
"search_result_data": {
"mapping": {
"type": "keyword"
},
"path_match": "search_result_data.*"
}
}
],
"properties": {
"search_data": {
"type": "nested",
"properties": {
"full_text": {
"type": "text"
},
"string_facet": {
"type": "nested",
"properties": {
"facet-name": {
"type": "keyword"
},
"facet-value": {
"type": "keyword"
}
}
}
}
}
}
}
}
And a document inserted with the following format:
{
"search_result_data": {
"sku": "wheel-6075-90092",
"gtin": null,
"name": "Matte Black Wheel Fuel Ripper",
"preview_image": "abc.jg",
"url": "9836817354546538796",
"brand": "Fuel Off-Road"
},
"search_data":
{
"full_text": "Matte Black Wheel Fuel Ripper",
"string_facet": [
{
"facet-name": "category",
"facet-value": "Motor Vehicle Rims & Wheels"
},
{
"facet-name": "brand",
"facet-value": "Fuel Off-Road"
}
]
}
}
And one other document.
I am trying to aggregate on string_facet as mentioned in the link.
"aggregations": {
"agg_string_facet": {
"nested": {
"path": "string_facet"
},
"aggregations": {
"facet_name": {
"terms": {
"field": "string_facet.facet-name"
},
"aggregations": {
"facet_value": {
"terms": {
"field": "string_facet.facet-value"
}
}
}
}
}
}
}
But I get all (two) documents returned, with:
"aggregations": {
"agg_string_facet": {
"doc_count": 0
}
}
What am I missing here?
Also why are the docs being returned as a response?

Documents are returned in the response because they match your query. If you don't want them in the response, set "size" to 0 (it defaults to 10):
{
"query": {
...
},
"size": 0
}
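I read the docs: facets have been removed, and the recommendation is to use the terms aggregation instead.
Now, for your question, you have two options.
If you'd like the unique values of facet-value and facet-name separately, you can use terms aggregations. Because string_facet is a nested field inside search_data, they have to sit inside a nested aggregation whose path is the full path (search_data.string_facet), and the field names must use that full path too. Using just "string_facet" as the path is why your agg_string_facet came back with "doc_count": 0. The terms size defaults to 10; the maximum recommended is 10,000:
"aggs": {
"string_facets": {
"nested": {
"path": "search_data.string_facet"
},
"aggs": {
"unique facet-values": {
"terms": {
"field": "search_data.string_facet.facet-value",
"size": 30
}
},
"unique facet-names": {
"terms": {
"field": "search_data.string_facet.facet-name",
"size": 30
}
}
}
}
}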
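To make the counting concrete, here is a plain-Python sketch (not Elasticsearch itself) of what a terms aggregation inside a nested aggregation counts for the sample document in the question. The second document isn't shown, so only the first one is used. Every nested string_facet object is counted, which is why the doc_count of a nested aggregation can differ from the number of matching top-level documents.

```python
from collections import Counter

# First sample document from the question (the second one is not shown).
SAMPLE_DOCS = [
    {
        "search_data": {
            "full_text": "Matte Black Wheel Fuel Ripper",
            "string_facet": [
                {"facet-name": "category", "facet-value": "Motor Vehicle Rims & Wheels"},
                {"facet-name": "brand", "facet-value": "Fuel Off-Road"},
            ],
        }
    }
]


def nested_terms_counts(docs, key):
    """Count values of search_data.string_facet.<key> the way a terms agg
    under a nested agg does: once per nested object, not once per document."""
    counts = Counter()
    for doc in docs:
        for facet in doc.get("search_data", {}).get("string_facet", []):
            if key in facet:
                counts[facet[key]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(nested_terms_counts(SAMPLE_DOCS, "facet-name"))
    print(nested_terms_counts(SAMPLE_DOCS, "facet-value"))
```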
If you'd like the unique combinations of facet-name and facet-value, you can use the composite aggregation, again inside the nested aggregation. The composite size defaults to 10, and whatever size you choose, you can paginate through the rest. If you choose this way, your aggs should look like this:
{
"aggs": {
"string_facets": {
"nested": {
"path": "search_data.string_facet"
},
"aggs": {
"unique-facetvalue-and-facetname-combination": {
"composite": {
"size": 30,
"sources": [
{
"value": {
"terms": {
"field": "search_data.string_facet.facet-value"
}
}
},
{
"name": {
"terms": {
"field": "search_data.string_facet.facet-name"
}
}
}
]
}
}
}
}
}
}
The advantage of composite over terms is that composite lets you paginate through all the buckets with the after key, so you never have to return a very large number of buckets in one response, which keeps the load on your cluster under control.
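Here is a minimal sketch of that pagination loop in Python. The search callable is hypothetical: wrap whichever client you use so it takes a request body and returns the parsed JSON response. The caller also supplies build_body, which puts the previous page's after_key into the composite's after parameter, and get_composite, which picks the composite result out of the response, so the loop doesn't depend on where the composite sits in the request. It stops when a page comes back with no buckets.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Body = Dict[str, Any]


def iter_composite_buckets(
    search: Callable[[Body], Body],
    build_body: Callable[[Optional[Body]], Body],
    get_composite: Callable[[Body], Body],
) -> Iterator[Body]:
    """Yield every bucket of a composite aggregation, one page at a time.

    search        -- sends a request body and returns the parsed response
                     (hypothetical: wrap whatever client you use)
    build_body    -- builds the request body; receives the previous page's
                     after_key, or None for the first page
    get_composite -- picks the composite aggregation out of a response
    """
    after: Optional[Body] = None
    while True:
        agg = get_composite(search(build_body(after)))
        buckets = agg.get("buckets", [])
        if not buckets:
            return  # an empty page means everything has been read
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```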
Hope this is helpful! :D

Related

Aggregate, sort and paginate on nested documents

I'm managing a product index, with product sales and other KPIs under a nested field.
Trying to sort based on nested aggregation, and paginate - with no success.
Below is a simplified version of my mapping, for the sake of the example -
{
"product_type":
{
"type": "keyword"
},
"family":
{
"type": "keyword"
},
"rootdomain":
{
"type": "keyword"
},
"kpis":
{
"type": "nested",
"properties":
{
"sales_1d":
{
"type": "float"
},
"timestamp":
{
"type": "date",
"format": "strict_date_optional_time_nanos"
},
"views_1d":
{
"type": "float"
}
}
}
}
My aggregation is similar to the one below:
{
"aggs": {
"group_by_family": {
"aggs": {
"nested_aggregation": {
"aggs": {
"range_filtered": {
"aggs": {
"sales_1d": {
"sum": {
"field": "kpis.sales_1d"
}
},
"views_1d": {
"sum": {
"field": "kpis.views_1d"
}
},
"reverse_nesting": {
"aggs": {
"docs": {
"top_hits": {
"size": 1,
"sort": [
{
"_id": {
"order": "asc"
}
}
],
"_source": {
"includes": [
"_id",
"family",
"rootdomain",
"product_type"
]
}
}
}
},
"reverse_nested": {}
}
},
"filter": {
"range": {
"kpis.timestamp": {
"format": "basic_date_time_no_millis",
"gte": "20220721T000000Z",
"lte": "20220918T235959Z"
}
}
}
}
},
"nested": {
"path": "kpis"
}
}
},
"terms": {
"field": "family",
"size": 10
}
}
},
"query": {
//some query to filter by product-type and rootdomain
},
"size": 0
}
I'm aware that I can add an order clause to the terms aggregation to order the aggregated results.
My goal, though, is to paginate the aggregated results: I want to retrieve the 1-10 best-selling products, later the 11-20 best-selling products, and so on.
I've tried using bucket_sort under range_filtered, but I'm getting an error:
class org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
I'm not sure how to proceed from here. Is this possible? If not, is there any workaround?
Thanks.
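The cast error says the parent of bucket_sort was expected to be a multi-bucket aggregation, and range_filtered is a single-bucket filter. A sketch of one thing to try, under that reading and not verified against this mapping: order the group_by_family terms aggregation by the nested sum (buckets path nested_aggregation>range_filtered>sales_1d), raise its size to cover the requested page, and attach a bucket_sort with only from/size under the terms aggregation to skip earlier pages. The helper name with_sales_page and the sub-aggregation name sales_page are hypothetical. Ordering terms by a sub-aggregation metric can be approximate when the index has several shards.

```python
import copy

SALES_PATH = "nested_aggregation>range_filtered>sales_1d"


def with_sales_page(body, page, page_size, sales_path=SALES_PATH):
    """Return a copy of `body` asking for one page of family buckets,
    ranked by the summed sales_1d inside the nested/filter aggregations.

    `page` is zero-based: page 0 -> buckets 1-10, page 1 -> 11-20, ...
    """
    new_body = copy.deepcopy(body)
    family = new_body["aggs"]["group_by_family"]
    terms = family["terms"]
    # Rank families by sales instead of by document count.
    terms["order"] = {sales_path: "desc"}
    # bucket_sort only sees the buckets the terms agg returns,
    # so the terms size must reach the end of the requested page.
    terms["size"] = max(terms.get("size", 10), (page + 1) * page_size)
    family.setdefault("aggs", {})["sales_page"] = {
        # No "sort" here: the parent buckets are already in sales order,
        # bucket_sort just skips the earlier pages.
        "bucket_sort": {"from": page * page_size, "size": page_size}
    }
    return new_body


if __name__ == "__main__":
    base = {
        "size": 0,
        "aggs": {"group_by_family": {"terms": {"field": "family", "size": 10}, "aggs": {}}},
    }
    print(with_sales_page(base, page=1, page_size=10))
```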

Multiple concurrent aggregations best practice

I'm considering using Elasticsearch as the backend search engine for a multi-filter utility. For this requirement, multiple aggregation queries will be run against the cluster, with an expected response time of ~5 seconds.
Based on the details below, do you think this approach is valid for my use case?
If yes, what is the suggested cluster sizing?
For sure I'll have to increase default values for parameters such as index.mapping.total_fields.limit and index.mapping.nested_objects.limit.
I'd appreciate feedback on the approach suggested below, and on ways to avoid common pitfalls.
Thanks in advance.
Details
Number of expected documents: ~50m
Number of unique field values (facet_name + facet_value): ~1B
Number of queries per second: ~50 per sec
Mappings:
{
"mappings": {
"properties": {
"customer_id": {
"type": "keyword"
},
"id": {
"type": "keyword"
},
"mi_score_join": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"mi_data": "customer_model"
}
},
"model_id": {
"type": "keyword"
},
"number_facet": {
"type": "nested",
"properties": {
"facet_name": {
"type": "keyword"
},
"facet_value": {
"type": "long"
}
}
},
"score": {
"type": "long"
},
"string_facet": {
"type": "nested",
"properties": {
"facet_name": {
"type": "keyword"
},
"facet_value": {
"type": "keyword"
}
}
}
}
}
}
An example for a document:
{
"id": 33421,
"string_facet":
[
{
"facet_value":"true",
"facet_name": "var_a"
},
{
"facet_value":"dummy_country",
"facet_name": "var_b"
},
{
"facet_value":"dummy_",
"facet_name": "var_c"
},
{
"facet_value":"https://dummy.com/",
"facet_name": "var_d"
},
{
"facet_value":"www.dummy.com",
"facet_name": "var_e"
},
{
"facet_value":"dummy",
"facet_name": "var_f"
}
],
"mi_score_join": "mi_data"
}
An example for an aggregation query to be run:
POST test_index/_search
{
"size":0,
"aggs": {
"facets": {
"nested": {
"path": "string_facet"
},
"aggs": {
"names": {
"terms": { "field": "string_facet.facet_name", "size":???},
"aggs": {
"values": {
"terms": { "field": "string_facet.facet_value" }
}
}
}
}
}
}
}
The "size": ??? will probably need to be the maximum cardinality of the facet_name values.
Filters may be added to the aggregations, based on the filters already applied.
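One way to pick a value for the ??? size, sketched under assumptions (the helper names are hypothetical): first run a cheap cardinality aggregation on string_facet.facet_name inside the same nested path, then feed its value into the terms size of the names/values aggregation. Cardinality is approximate, but counts below its precision_threshold are expected to be close to accurate, and 40000 is the highest threshold Elasticsearch accepts. It costs an extra round trip per query; if the set of facet names changes rarely, caching that number is an option.

```python
NESTED_PATH = "string_facet"


def facet_name_cardinality_body(path=NESTED_PATH):
    """Request body that only counts distinct facet_name values."""
    return {
        "size": 0,
        "aggs": {
            "facets": {
                "nested": {"path": path},
                "aggs": {
                    "distinct_names": {
                        "cardinality": {
                            "field": f"{path}.facet_name",
                            # 40000 is the highest threshold Elasticsearch accepts
                            "precision_threshold": 40000,
                        }
                    }
                },
            }
        },
    }


def names_values_body(cardinality_response, path=NESTED_PATH):
    """Build the names -> values aggregation from the question, using the
    distinct-name count from the first response as the terms size."""
    distinct = cardinality_response["aggregations"]["facets"]["distinct_names"]["value"]
    return {
        "size": 0,
        "aggs": {
            "facets": {
                "nested": {"path": path},
                "aggs": {
                    "names": {
                        # terms size must be at least 1
                        "terms": {"field": f"{path}.facet_name", "size": max(distinct, 1)},
                        "aggs": {
                            "values": {"terms": {"field": f"{path}.facet_value"}}
                        },
                    }
                },
            }
        },
    }
```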

Elasticsearch - Query to Determine All Unique IDs that are distance X away from a particular ID?

I have data in this format, generated from a random walk (to simulate people walking around): { location : { lat: someLat, lon: someLong }, id: uniqueId, date: date }. Given a user's unique ID, I am trying to write a query that finds how many other unique IDs came within X distance of that ID during a certain time range. Any hints on how to accomplish this?
My idea is to have a top-level filter aggregation with a nested geo query of some sort. I think the geo_distance query is the way to go, but I am not sure how to include it in the query below to get all the unique IDs that come within X distance of the ID I am filtering on. The query below is where I am starting from: I am filtering all documents from now - 1 day to now where the document's user ID is the provided value. How would I check all other documents for their distances against the documents that match this query?
{
"aggs" : {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyyy",
"ranges": [
{ "to": "now" },
{ "from": "now-1d" }
]
}
},
"locations" : {
"filter" : {
"term": { "id.keyword": "7a50ab18-886b-42a2-80ad-3d45112e3cfd" }
}
}
}
}
Your hunch is correct. All of this can be done with range and geo_distance filtering plus _geo_distance sorting. You want to filter at the query level, though, not in the aggs:
GET walking/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-1d"
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "20m",
"location": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
}
}
}
]
}
},
"aggs": {
"rings_around_loc": {
"geo_distance": {
"field": "location",
"origin": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
},
"unit": "m",
"keyed": true,
"ranges": [
{
"to": 10
},
{
"from": 10,
"to": 50
},
{
"from": 50
}
]
}
},
"locations": {
"cardinality": {
"field": "id.keyword"
}
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
},
"order": "asc",
"unit": "m",
"mode": "min",
"distance_type": "arc",
"ignore_unmapped": true
}
}
]
}
Not sure what you need the range buckets for, so I left them out.
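If you want to sanity-check results client-side, here is a plain-Python sketch of the question's logic: keep only documents from the time window, then collect the other IDs that were within X metres of any position of the given ID. It uses the haversine formula with a mean Earth radius of 6,371,008.7714 m, an assumption meant to approximate the arc distance type rather than a copy of Elasticsearch's implementation. It checks spatial proximity within the window only, not whether both people were there at the same moment, and it assumes documents are dicts shaped like the question's format with date as a datetime.

```python
import math
from datetime import datetime

# Mean Earth radius in metres (assumption: close to what arc distance uses).
EARTH_RADIUS_M = 6_371_008.7714


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ids_near(docs, target_id, max_m, since):
    """Other IDs seen within max_m metres of any position of target_id,
    using only documents whose date is at or after `since`."""
    recent = [d for d in docs if d["date"] >= since]
    target_points = [d["location"] for d in recent if d["id"] == target_id]
    near = set()
    for d in recent:
        if d["id"] == target_id:
            continue
        loc = d["location"]
        if any(haversine_m(p["lat"], p["lon"], loc["lat"], loc["lon"]) <= max_m
               for p in target_points):
            near.add(d["id"])
    return near


if __name__ == "__main__":
    now = datetime(2022, 9, 18, 12, 0)
    docs = [
        {"id": "a", "date": now, "location": {"lat": 48.2015, "lon": 16.3911}},
        {"id": "b", "date": now, "location": {"lat": 48.20155, "lon": 16.3911}},
        {"id": "c", "date": now, "location": {"lat": 48.2115, "lon": 16.3911}},
    ]
    print(ids_near(docs, "a", 20, datetime(2022, 9, 17, 12, 0)))
```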
Full steps to replicate:
PUT walking
{
"mappings": {
"properties": {
"date": {
"type": "date"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
Then POST this random walk data with _bulk.
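The random walk data itself isn't included here. If you just want something to test the query against, a small generator like the following can produce a _bulk body for the walking index above. It is a hypothetical helper: the step size, the one-minute spacing and the seed are arbitrary choices. The output is newline-delimited JSON, with an action line followed by a source line per document and a trailing newline, which is what _bulk expects.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def random_walk_bulk(person_id, steps, lat, lon, start, step_deg=0.0001, seed=None):
    """Return an NDJSON _bulk body with `steps` positions for one person."""
    rng = random.Random(seed)
    lines = []
    for i in range(steps):
        lines.append(json.dumps({"index": {"_index": "walking"}}))
        lines.append(json.dumps({
            "id": person_id,
            "date": (start + timedelta(minutes=i)).isoformat(),
            "location": {"lat": lat, "lon": lon},
        }))
        # Take one small random step in latitude and longitude.
        lat += rng.uniform(-step_deg, step_deg)
        lon += rng.uniform(-step_deg, step_deg)
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline


if __name__ == "__main__":
    start = datetime(2022, 9, 18, tzinfo=timezone.utc)
    print(random_walk_bulk("7a50ab18-886b-42a2-80ad-3d45112e3cfd", 3, 48.2015, 16.3911, start, seed=42))
```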

ElasticSearch aggregation query with List in documents

I have the following records of car sales of different brands in different cities.
Document -1
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":100,
"sold":80
},{
"name":"Honda",
"purchase":200,
"sold":150
}]
}
Document -2
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":50,
"sold":40
},{
"name":"Honda",
"purchase":150,
"sold":120
}]
}
I am trying to come up with a query to aggregate car statistics for a given city, but I can't get the query right.
Required result:
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":150,
"sold":120
},{
"name":"Honda",
"purchase":350,
"sold":270
}]
}
First you need to map your array as a nested field (a script would be complicated and not performant). Nested fields are indexed, so the aggregation will be pretty fast.
Delete your index or create a new one. Please note I use test as the mapping type.
{
"mappings": {
"test": {
"properties": {
"city": {
"type": "keyword"
},
"cars": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"purchase": {
"type": "integer"
},
"sold": {
"type": "integer"
}
}
}
}
}
}
}
Index your document (same way you did)
For the aggregation:
{
"size": 0,
"aggs": {
"avg_grade": {
"terms": {
"field": "city"
},
"aggs": {
"resellers": {
"nested": {
"path": "cars"
},
"aggs": {
"agg_name": {
"terms": {
"field": "cars.name"
},
"aggs": {
"avg_pur": {
"sum": {
"field": "cars.purchase"
}
},
"avg_sold": {
"sum": {
"field": "cars.sold"
}
}
}
}
}
}
}
}
}
}
result:
"buckets": [
{
"key": "Honda",
"doc_count": 2,
"avg_pur": {
"value": 350
},
"avg_sold": {
"value": 270
}
},
{
"key": "Toyota",
"doc_count": 2,
"avg_pur": {
"value": 150
},
"avg_sold": {
"value": 120
}
}
]
If you have indexed the name/city fields as text (ask yourself first whether this is necessary), use the .keyword sub-field in the terms aggregation ("cars.name.keyword").
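To double-check the expected numbers, here is a plain-Python sketch that sums purchase and sold per car name for one city over the two sample documents. It mirrors what the terms, nested, terms and sum chain above computes, but it is only a client-side illustration, not how Elasticsearch evaluates the aggregation.

```python
from collections import defaultdict

DOCS = [
    {"city": "Delhi", "cars": [
        {"name": "Toyota", "purchase": 100, "sold": 80},
        {"name": "Honda", "purchase": 200, "sold": 150},
    ]},
    {"city": "Delhi", "cars": [
        {"name": "Toyota", "purchase": 50, "sold": 40},
        {"name": "Honda", "purchase": 150, "sold": 120},
    ]},
]


def car_totals(docs, city):
    """Sum purchase and sold per car name across all documents of one city."""
    totals = defaultdict(lambda: {"purchase": 0, "sold": 0})
    for doc in docs:
        if doc["city"] != city:
            continue
        for car in doc["cars"]:
            totals[car["name"]]["purchase"] += car["purchase"]
            totals[car["name"]]["sold"] += car["sold"]
    return {"city": city, "cars": [{"name": n, **v} for n, v in totals.items()]}


if __name__ == "__main__":
    print(car_totals(DOCS, "Delhi"))
```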

Unable to create nested date aggregation query

I am trying to create an ElasticSearch aggregation query which can generate sum or average of value in all my ingested documents.
The documents are of the format -
{
"weather":"cold",
"date_1":"2017/07/05",
"feedback":[
{
"date_2":"2017/08/07",
"value":28,
"comment":"not cold"
},{
"date_2":"2017/08/09",
"value":48,
"comment":"a bit chilly"
},{
"date_2":"2017/09/07",
"value":18,
"comment":"very cold"
}, ...
]
}
I am able to create a sum aggregation of all "feedback.value" using "date_1" by using the following request -
GET _search
{
"query": {
"query_string": {
"query": "cold"
}
},
"size": 0,
"aggs": {
"temperature": {
"date_histogram":{
"field" : "date_1",
"interval" : "month"
},
"aggs":{
"temperature_agg":{
"terms": {
"field": "feedback.value"
}
}
}
}
}
}
However, I need the same aggregation across all documents, bucketed on "feedback.date_2". I am not sure whether Elasticsearch can resolve such an aggregation or how to approach it. Any guidance would be helpful.
[EDIT]
Mapping file (I only define the nested items; ES identifies the other fields on its own):
{
"mappings": {
"catalog_item": {
"properties": {
"feedback":{
"type":"nested",
"properties":{
"date_2":{
"type": "date",
"format":"YYYY-MM-DD"
},
"value": {
"type": "float"
},
"comment": {
"type": "text"
}
}
}
}
}
}
}
You need to map feedback as a nested field and use a nested aggregation with sum (and avg) aggregations inside it.
Here's a working example:
Sample Mapping:
PUT test
{
"mappings": {
"doc": {
"properties": {
"feedback": {
"type": "nested"
}
}
}
}
}
Add Sample document:
PUT test/doc/1
{
"date_1": "2017/08/07",
"feedback": [
{
"date_2": "2017/08/07",
"value": 28,
"comment": "not cold"
},
{
"date_2": "2017/08/09",
"value": 48,
"comment": "a bit chilly"
},
{
"date_2": "2017/09/07",
"value": 18,
"comment": "very cold"
}
]
}
Calculate both the sum and average based on date_2.
GET test/_search
{
"size": 0,
"aggs": {
"temperature_aggregation": {
"nested": {
"path": "feedback"
},
"aggs": {
"temperature": {
"date_histogram": {
"field": "feedback.date_2",
"interval": "month"
},
"aggs": {
"sum": {
"sum": {
"field": "feedback.value"
}
},
"avg": {
"avg": {
"field": "feedback.value"
}
}
}
}
}
}
}
}
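For reference, here is a plain-Python sketch of what that nested date_histogram computes for the sample document: each feedback entry is placed in a calendar-month bucket by date_2, then value is summed and averaged per bucket. It assumes the yyyy/MM/dd date strings from the sample and ignores time zones; it is an illustration of the result, not how Elasticsearch computes it internally.

```python
from collections import defaultdict
from datetime import datetime

FEEDBACK = [
    {"date_2": "2017/08/07", "value": 28, "comment": "not cold"},
    {"date_2": "2017/08/09", "value": 48, "comment": "a bit chilly"},
    {"date_2": "2017/09/07", "value": 18, "comment": "very cold"},
]


def monthly_sum_avg(feedback):
    """Group feedback by calendar month of date_2; return sum and avg of value."""
    buckets = defaultdict(list)
    for entry in feedback:
        month = datetime.strptime(entry["date_2"], "%Y/%m/%d").strftime("%Y-%m")
        buckets[month].append(entry["value"])
    return {
        month: {"sum": sum(values), "avg": sum(values) / len(values)}
        for month, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    for month, stats in monthly_sum_avg(FEEDBACK).items():
        print(month, stats)
```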
