ElasticSearch query with prefix for aggregation - elasticsearch

I am trying to add a prefix condition for my ES query in a "must" clause.
My current query looks something like this:
body = {
"query": {
"bool": {
"must":
{ "term": { "article_lang": 0 }}
,
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I need to add a mandatory condition to my query so that it only matches articles whose id starts with "article-".
So far, I have tried this:
{
"query": {
"bool": {
"should": [
{ "term": { "article_lang": 0 }},
{ "prefix": { "article_id": {"value": "article-"} }}
],
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I am fairly new to ES. From the online documentation, I understand that "should" is used for "OR" conditions and "must" for "AND". The query above returns some data, but because it uses "should", the results match either article_lang=0 or an id starting with article-. When I use "must" instead, it returns nothing.
I am certain that there are articles whose id starts with this prefix, because we currently iterate through this result to filter out such articles. What am I missing here?

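In your prefix query, you need to use the article_id.keyword field, not article_id. The prefix query is not analyzed, and an analyzed text field such as article_id most likely contains no single token that starts with "article-", which would explain why "must" returns nothing. Also, prefer filter over must, since you are only doing yes/no matching (aka filters) and do not need relevance scoring.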
{
"query": {
"bool": {
"filter": [
{
"term": {
"article_lang": 0
}
},
{
"prefix": {
"article_id.keyword": {
"value": "article-"
}
}
},
{
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
]
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
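Since the question builds the request as a Python dict, here is a minimal sketch of the same body with all three yes/no conditions in one filter list. A dict (or JSON object) cannot hold two "filter" keys, so the range clause goes into the same list. The function name build_article_query and its parameters are hypothetical, chosen only for illustration; the field names come from the question.

```python
def build_article_query(prefix="article-", lang=0, since="now-3h"):
    """Build the search body with every yes/no condition in one filter list."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"article_lang": lang}},
                    # prefix runs against the keyword sub-field, not the analyzed text field
                    {"prefix": {"article_id.keyword": {"value": prefix}}},
                    {"range": {"created_time": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "articles": {
                "terms": {
                    "field": "article_id.keyword",
                    "order": {"score": "desc"},
                    "size": 1000,
                },
                "aggs": {"score": {"sum": {"field": "score"}}},
            }
        },
    }


body = build_article_query()
```

The resulting body can be passed to whatever client call you already use for the original query.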

Related

ElasticSearch - Aggregation result not matching total hits

I have a query like the one below. It returns 320 hits:
{
"size": "5000",
"sort": [
{
"errorDateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"errorDateTime": {
"gte": "2021-04-07T20:08:20.516",
"lte": "2021-04-08T00:08:20.516"
}
}
},
{
"bool": {
"should": [
{
"match": {
"businessFunction": "PriceUpdate"
}
},
{
"match": {
"businessFunction": "PriceFeedIntegration"
}
},
{
"match": {
"businessFunction": "StoreConnectivity"
}
},
{
"match": {
"businessFunction": "Transaction"
}
},
{
"match": {
"businessFunction": "SalesSummary"
}
}
]
}
}
]
}
},
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId"
},
"aggs": {
"genres_and_error": {
"terms": {
"field": "errorCode"
},
"aggs": {
"genres_and_business": {
"terms": {
"field": "businessFunction"
}
}
}
}
}
}
}
}
However, the aggregation results do not match. Many stores that appear in the query hits are missing from the aggregation. What am I missing? My documents look like this:
{
"errorDescription": "FTP Service unable to connect to Store to list the files for Store 12345",
"errorDateTime": "2021-04-07T21:01:15.040546",
"readBy": [],
"errorCode": "e004",
"businessFunction": "TransactionError",
"storeId": "12345"
}
Please let me know if I am writing the query wrong. I want to aggregate per store, per errorCode, and per businessFunction.
If no size parameter is set in a terms aggregation, it returns only the top 10 terms by default, ordered by doc_count. Add the size parameter to the terms aggregation so that it returns a bucket for every store in the hits.
Try the query below:
{
"size": "5000",
"sort": [
{
"errorDateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"errorDateTime": {
"gte": "2021-04-07T20:08:20.516",
"lte": "2021-04-08T00:08:20.516"
}
}
},
{
"bool": {
"should": [
{
"match": {
"businessFunction": "PriceUpdate"
}
},
{
"match": {
"businessFunction": "PriceFeedIntegration"
}
},
{
"match": {
"businessFunction": "StoreConnectivity"
}
},
{
"match": {
"businessFunction": "Transaction"
}
},
{
"match": {
"businessFunction": "SalesSummary"
}
}
]
}
}
]
}
},
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId",
"size": 100 // note this
},
"aggs": {
"genres_and_error": {
"terms": {
"field": "errorCode"
},
"aggs": {
"genres_and_business": {
"terms": {
"field": "businessFunction"
}
}
}
}
}
}
}
}
I think I was missing the size parameter inside aggs and was getting only the default 10 buckets:
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId",
"size": 1000
},
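To see why stores go missing, here is a toy, in-memory illustration of the top-N cut-off. It is only a sketch of the truncation effect, not how Elasticsearch computes buckets across shards; top_terms and the store ids are made up for the example.

```python
from collections import Counter


def top_terms(values, size=10):
    """Return the `size` most frequent values, like the bucket list of a terms aggregation."""
    return Counter(values).most_common(size)


# 25 hypothetical store ids, one document each
store_ids = [f"store-{i}" for i in range(25)]

default_buckets = top_terms(store_ids)        # size left at the default of 10
all_buckets = top_terms(store_ids, size=100)  # size large enough for every store

print(len(default_buckets), len(all_buckets))  # 10 25
```

The same default applies to every terms aggregation, so the inner errorCode and businessFunction aggregations may need their own size as well if they can have more than 10 distinct values.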

Performing a text search and filtering on nested terms in elasticsearch

I'm trying to perform a search that, for example, looks for the word "coyote" in the description, but only among products that are red and green and are in the Cartoon category. I think I understand that you can't put match and terms side by side in the same query (the query below doesn't work for this reason), and also that you shouldn't use terms to search on a text field. Can anyone point me in the right direction?
Here's my query:
GET /searchproducts/_search
{
"query": {
"match": {
"description": {
"query": "coyote"
}
},
"bool": {
"should": [{
"terms": {
"colours.name": ["red", "green"]
}
},
{
"terms": {
"categories.name": ["Cartoon"]
}
}
]
}
},
"aggs": {
"colours": {
"terms": {
"field": "colour.name.value",
"size": 100
}
},
"categories": {
"terms": {
"field": "categories.id",
"size": 100
}
}
}
}
You can use a bool query to combine multiple queries. Try out this query:
{
"query": {
"bool": {
"should": [
{
"match": {
"description": {
"query": "coyote"
}
}
},
{
"bool": {
"should": [
{
"terms": {
"colours.name": [
"red",
"green"
]
}
},
{
"terms": {
"categories.name": [
"Cartoon"
]
}
}
]
}
}
]
}
},
"aggs": {
"colours": {
"terms": {
"field": "colour.name.value",
"size": 100
}
},
"categories": {
"terms": {
"field": "categories.id",
"size": 100
}
}
}
}
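One thing to keep in mind: in a bool query that has only should clauses, a document needs to match just one of them, so a should-only query returns products that match the text or the colours or the category. If every condition has to hold, a variant with must for the scored text match and filter for the exact-value conditions may be closer to what the question describes. This is only a sketch: build_product_search is a hypothetical helper, and it assumes colours.name and categories.name are keyword (not analyzed) fields and are not mapped as nested, which the question does not confirm.

```python
def build_product_search(text, colours, categories):
    """Text match is scored; colour and category are required yes/no filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"description": {"query": text}}}],
                "filter": [
                    # terms matches if the field holds ANY of the listed values
                    {"terms": {"colours.name": colours}},
                    {"terms": {"categories.name": categories}},
                ],
            }
        }
    }


query = build_product_search("coyote", ["red", "green"], ["Cartoon"])
```

The aggs section from the question can be added to the returned dict unchanged.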

Elasticsearch - generic facets structure - calculating aggregations combined with filters

In a new project of ours, we were inspired by this article http://project-a.github.io/on-site-search-design-patterns-for-e-commerce/#generic-faceted-search for doing our “facet” structure. And while I have got it working to the extent the article describes, I have run into issues in getting it to work when selecting facets. I hope someone can give a hint as to something to try, so I don’t have to redo all our aggregations into separate aggregation calculations again.
The problem is basically that we are using a single aggregation to calculate all the “facets” at once, but when I add a filter (e.g. checking a brand name), it “removes” all the other brands when returning the aggregates. What I basically want is that it should use that brand as a filter when calculating the other facets, but not when calculating the brand aggregations. This is necessary so the user can, for example, choose multiple brands.
Looking at https://www.contorion.de/search/Metabo_Fein/ou1-ou2?q=Winkelschleifer&c=bovy (which is the site described in the above article), I have selected the “Metabo” and “Fein” manufacturers (Hersteller), and unfolding the Hersteller menu shows all manufacturers and not just the ones selected. So I know it’s possible somehow, and I hope someone out there has a hint as to how to write the aggregations / filters so I get the "correct e-commerce facet behavior".
On the products in ES I have the following structure: (the same as in the original article, though “C#’ified” in naming)
"attributeStrings": [
{
"facetName": "Property",
"facetValue": "Organic"
},
{
"facetName": "Property",
"facetValue": "Without parfume"
},
{
"facetName": "Brand",
"facetValue": "Adidas"
}
]
So the above product has 2 attributes/facet groups – Property with 2 values (Organic, Without parfume) and Brand with 1 value (Adidas).
Without any filters I calculate the aggregations from the following query:
"aggs": {
"agg_attr_strings_filter": {
"filter": {},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
} } } } } } } }
Now if I select Property "Organic" and Brand "Adidas", I build the same aggregation, but with a filter to apply those two constraints (which is where it kind of goes wrong...):
"aggs": {
"agg_attr_strings_filter": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
} } } } } } } }
The only way I can see forward with this model, is to calculate the aggregation for each selected facet and somehow merge the result. But it seems very complex and kind of defeats the point of having the model as described in the article, so I hope there's a more clean solution and someone can give a hint at something to try.
Quoting the question: "The only way I can see forward with this model, is to calculate the aggregation for each selected facet and somehow merge the result."
This is exactly right. If one facet (e.g. brand) is selected, then you cannot apply the brand filter globally if you also want to fetch the other brands for multi-selection. What you can do is, for each selected facet, apply the filters of all the other selected facets, and for the non-selected facets apply all filters. As a result you will have n+1 separate aggregations for n selected facets: the first one covers all facets and the rest cover one selected facet each.
In your case the query might look like:
{
"aggs": {
"agg_attr_strings_filter": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
}
}
},
"special_agg_property": {
"filter": {
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
"aggs": {
"special_agg_property": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"agg_filtered_special": {
"filter": {
"query": {
"match": {
"attributeStrings.facetName": "Property"
}
}
},
"aggs": {
"facet_value": {
"terms": {
"size": 1000,
"field": "attributeStrings.facetValue"
}
}
}
}
}
}
}
},
"special_agg_brand": {
"filter": {
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
"aggs": {
"special_agg_brand": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"agg_filtered_special": {
"filter": {
"query": {
"match": {
"attributeStrings.facetName": "Brand"
}
}
},
"aggs": {
"facet_value": {
"terms": {
"size": 1000,
"field": "attributeStrings.facetValue"
}
}
}
}
}
}
}
}
}
}
This query looks big and scary, but generating it takes only a few dozen lines of code.
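As a rough idea of what such generation code could look like, here is a sketch in Python. The helper names (facet_filter, combine, value_terms, build_facet_aggs) are hypothetical. It keeps the aggregation names used in the query above, but it deviates in two places, both labeled as assumptions: it uses a plain term filter instead of the "query"/"match" wrapper for the inner facet-name filter, and it leaves out the "_term" ordering, because the accepted syntax for both depends on the Elasticsearch version.

```python
NESTED_PATH = "attributeStrings"


def facet_filter(name, values):
    """Nested filter: product has facet `name` with any of `values`."""
    return {
        "nested": {
            "path": NESTED_PATH,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {NESTED_PATH + ".facetName": {"value": name}}},
                        {"terms": {NESTED_PATH + ".facetValue": values}},
                    ]
                }
            },
        }
    }


def combine(filters):
    """AND the filters together; with nothing to apply, match everything."""
    return {"bool": {"filter": filters}} if filters else {"match_all": {}}


def value_terms():
    """Terms aggregation over facet values (ordering omitted, see note above)."""
    return {"terms": {"field": NESTED_PATH + ".facetValue", "size": 1000}}


def build_facet_aggs(selected):
    """`selected` maps facet name -> chosen values, e.g. {"Brand": ["Adidas"]}."""
    every = [facet_filter(n, v) for n, v in selected.items()]
    # General aggregation: all selected filters applied, all facets computed.
    aggs = {
        "agg_attr_strings_filter": {
            "filter": combine(every),
            "aggs": {
                "agg_attr_strings": {
                    "nested": {"path": NESTED_PATH},
                    "aggs": {
                        "attr_name": {
                            "terms": {"field": NESTED_PATH + ".facetName"},
                            "aggs": {"attr_value": value_terms()},
                        }
                    },
                }
            },
        }
    }
    # One special aggregation per selected facet: apply only the OTHER filters.
    for name in selected:
        key = "special_agg_" + name.lower()
        others = [facet_filter(n, v) for n, v in selected.items() if n != name]
        aggs[key] = {
            "filter": combine(others),
            "aggs": {
                key: {
                    "nested": {"path": NESTED_PATH},
                    "aggs": {
                        "agg_filtered_special": {
                            "filter": {"term": {NESTED_PATH + ".facetName": name}},
                            "aggs": {"facet_value": value_terms()},
                        }
                    },
                }
            },
        }
    return aggs


aggs = build_facet_aggs({"Property": ["Organic"], "Brand": ["Adidas"]})
```

For two selected facets this yields three aggregations: agg_attr_strings_filter, special_agg_property and special_agg_brand. Facet names containing spaces or special characters would need a safer key scheme than name.lower().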
When parsing the query results, first parse the general aggregation (the one that uses all filters) and then the special facet aggregations. In the example above, first parse the results from agg_attr_strings_filter. Those results also contain aggregation values for Brand and Property, which should be overwritten by the values from special_agg_brand and special_agg_property.
Also, this query is efficient, since Elasticsearch does a good job of caching separate filter clauses, so applying the same filters in different parts of the query should be cheap.
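The parsing order described above could be sketched like this. merge_facets is a hypothetical helper; it takes the "aggregations" part of the search response and assumes the aggregation names from the example query (including the inner nested aggregation reusing the outer special_agg_* name).

```python
def merge_facets(aggregations, selected_names):
    """Return {facet name: {value: doc_count}} with selected facets overwritten."""
    facets = {}
    # 1. General aggregation: counts for every facet with all filters applied.
    general = aggregations["agg_attr_strings_filter"]["agg_attr_strings"]
    for name_bucket in general["attr_name"]["buckets"]:
        facets[name_bucket["key"]] = {
            b["key"]: b["doc_count"] for b in name_bucket["attr_value"]["buckets"]
        }
    # 2. Special aggregations: overwrite each selected facet with its own counts.
    for name in selected_names:
        key = "special_agg_" + name.lower()
        special = aggregations[key][key]["agg_filtered_special"]
        facets[name] = {
            b["key"]: b["doc_count"] for b in special["facet_value"]["buckets"]
        }
    return facets
```

Called as merge_facets(response["aggregations"], ["Property", "Brand"]), it keeps the general counts for non-selected facets and replaces the Property and Brand entries with the counts from their special aggregations.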
Quoting the question again: "But it seems very complex and kind of defeats the point of having the model as described in the article, so I hope there's a more clean solution and someone can give a hint at something to try."
There is really no way around the fact that you need to apply different filters to different facets and at the same time have different query filters. If you need to support "correct e-commerce facet behavior", you will have a complex query :)
Disclaimer: I'm coauthor of the mentioned article.
The issue comes from the fact that you are adding a filter on Property and Organic inside your aggregation; hence, the more facets you pick, the more you restrict the terms you get back. In that article, the filter they use is in fact a post_filter. Both names were allowed until recently, but filter was removed because it caused ambiguity.
What you need to do is to move that filter outside the aggregations into the post_filter section, so that the results get correctly filtered out by whatever facets have been picked, but all your facets still get computed correctly on the whole document set.
{
"post_filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings_full": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
},
"agg_attr_strings_filtered": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"path": "attributeStrings",
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "attributeStrings",
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
}
}
}
]
}
},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
}
}
}
}
}

ElasticSearch aggregations using filter and without it

I'm building a product list page with filters. There are a lot of filters, and the data for them is computed in ES with aggregations.
The simplest example is min/max price:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
}
}
}
This request returns the minimum and maximum price according to the rules set in the filter (CategoryId 36898, shop_id 44, etc.).
It works perfectly.
The question is: is it possible to extend this request to also get aggregations without the filters? Or is it possible to return aggregation data for another filter in the same request?
So I want:
min_price and max_price for filtered data (query 1)
and min_price and max_price for unfiltered data (or data filtered with query 2).
You can use the global aggregation to compute aggregations that ignore the filters in the query block.
For example, for your query, use the following JSON input:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
},
"without_filter_min": {
"global": {},
"aggs": {
"price_value": {
"min": {
"field": "products_price"
}
}
}
},
"without_filter_max": {
"global": {},
"aggs": {
"price_value": {
"max": {
"field": "products_price"
}
}
}
}
}
}
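A slightly more compact variant, sketched in Python, puts both the min and the max under a single global aggregation, and optionally wraps them in a filter aggregation to cover the "filtered data with query 2" case. The names price_range_aggs, all_products and filtered are hypothetical; only the products_price field comes from the question.

```python
def price_range_aggs(field="products_price", other_filter=None):
    """Min/max for the query's hits, plus min/max computed in a global scope."""
    stats = {
        "min_price": {"min": {"field": field}},
        "max_price": {"max": {"field": field}},
    }
    inner = dict(stats)
    if other_filter is not None:
        # Restrict the global scope with a second, independent filter.
        inner = {"filtered": {"filter": other_filter, "aggs": dict(stats)}}
    return {
        **stats,  # computed on the documents matched by the main query
        "all_products": {"global": {}, "aggs": inner},  # ignores the main query
    }


aggs = price_range_aggs()
aggs_q2 = price_range_aggs(other_filter={"term": {"shop_id": 44}})
```

The returned dict is meant to be used as the "aggs" section of the request, next to the existing "query".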

Using aggregation with filters in elastic search

I have an elastic search running with documents like this one:
{
id: 1,
price: 620000,
propertyType: "HO",
location: {
lat: 51.41999,
lon: -0.14426
},
active: true,
rentOrSale: "S",
}
I'm trying to get statistics about a certain area using aggregations, and the query I'm using is the following:
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"must": [
{
"term": {
"rentOrSale": "s"
}
},
{
"term": {
"active": true
}
}
]
},
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"distance": "15.0mi",
"location": {
"lat": 51.50735,
"lon": -0.12776
}
}
}
]
}
}
},
"aggs": {
"propertytype_agg": {
"terms": {
"field": "propertyType"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
},
"bed_agg": {
"terms": {
"field": "numberOfBedrooms"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
But in the result I can't see the aggregations. As soon as I remove either the bool or the filtered part of the query, I can see the aggregations. I can't figure out why this is happening, nor how to get the aggregations for these filters. I've tried using the answer to this question but I've not been able to solve it. Any ideas?
I think your query needs to be slightly rearranged: move "filtered" further up and repeat the "query" key:
"query": {
"filtered": {
"query" : {
"bool": {
...
}
},
"filter": {
...
}
}
}
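Note that the filtered query and the and filter were removed in Elasticsearch 5.0, so the rearrangement above only applies to older versions. On newer versions, the usual replacement is a bool query with a filter clause. A sketch in Python, where build_area_query and avg_price_by are hypothetical helper names and the field names and values are taken from the question:

```python
def avg_price_by(field):
    """Terms aggregation on `field` with the average price per bucket."""
    return {
        "terms": {"field": field},
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    }


def build_area_query(lat, lon, distance="15.0mi"):
    """Term conditions in must, the geo restriction in filter, one bool query."""
    return {
        "sort": [{"id": "desc"}],
        "query": {
            "bool": {
                "must": [
                    {"term": {"rentOrSale": "s"}},
                    {"term": {"active": True}},
                ],
                "filter": [
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ],
            }
        },
        "aggs": {
            "propertytype_agg": avg_price_by("propertyType"),
            "bed_agg": avg_price_by("numberOfBedrooms"),
        },
    }


body = build_area_query(51.50735, -0.12776)
```

Because "query" now contains a single top-level key, the request is no longer ambiguous and the aggregations are computed over the filtered hits.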
