How to search for an array of terms in Elasticsearch?

Context: I have a query that searches for a term in two fields and returns items whose values match the wildcard pattern. Eventually, though, I will receive a list of search terms instead of a single one.
I use this query when I get only one string:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"wildcard": {
"shortName": "BAN*"
}
},
{
"wildcard": {
"name": "BAN*"
}
}
]
}
},
{
"range": {
"dhCot": {
"gte": "2022-04-11T00:00:00.000Z",
"lt": "2022-04-12T00:00:00.000Z"
}
}
}
]
}
},
"aggs": {
"articles_over_time": {
"date_histogram": {
"field": "dtBuy",
"interval": "1H",
"format": "yyyy-MM-dd:HH:mm:ssZ"
},
"aggs": {
"documents": {
"top_hits": {
"size": 100
}
}
}
}
}
}
But sometimes I will get an array of strings, like this: ["BANANA","APPLE","ORANGE"]
So how do I search for items that exactly match any of the items in the array? Is it possible?
The object indexed in Elasticsearch is this one:
{
"name": "BANANA",
"priceDay": 1,
"priceWeek": 3,
"variation": 2,
"dataBuy":"2022-04-11T11:01:00.585Z",
"shortName": "BAN"
}

If you want to search for items that exactly match the items within the array, you can use the terms query:
{
"query": {
"terms": {
"name": ["BANANA","APPLE","ORANGE"]
}
}
}
You can include the terms query in your existing query, in either the should clause or the must clause, depending on your use case.
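As a rough sketch of that last point, the terms clause could sit next to the date range inside the question's bool filter. The helper name build_query is hypothetical, the field names and dates are taken from the question, and the sketch assumes name stores exact, un-analyzed values (for example a keyword field). It uses filter context, which behaves like must but without scoring.

```python
import json


def build_query(names, start, end):
    """Build a search body matching any of `names` exactly within a date range.

    Hypothetical helper: assumes `name` holds exact values (e.g. a keyword field)
    and that `dhCot` is the date field used in the question's range filter.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # terms matches documents whose field equals ANY listed value
                    {"terms": {"name": list(names)}},
                    {"range": {"dhCot": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = build_query(
    ["BANANA", "APPLE", "ORANGE"],
    "2022-04-11T00:00:00.000Z",
    "2022-04-12T00:00:00.000Z",
)
print(json.dumps(body, indent=2))
```

The same function covers the single-term case by passing a one-element list, as long as an exact match (rather than a wildcard prefix) is what you want.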

Related

Use distinct field for count with significant_terms in Elastic Search

Is there a way to get the significant_terms aggregation to use document counts based on a distinct field?
I have an index with posts and their hashtags, but they come from multiple sources, so several posts can share the same permalink field, and I only want to count unique permalinks per hashtag. I have managed to get the unique totals using the cardinality aggregation (i.e. "cardinality": { "field": "permalink.keyword" }) but can't work out how to do this with the significant_terms aggregation. My query is as follows:
GET /posts-index/_search
{
"aggregations": {
"significant_hashtag": {
"significant_terms": {
"background_filter": {
"bool": {
"filter": [
{
"range": {
"created": {
"gte": 1656414622,
"lte": 1656630000
}
}
}
]
}
},
"field": "hashtag.keyword",
"mutual_information": {
"background_is_superset": false,
"include_negatives": true
},
"size": 100
}
}
},
"query": {
"bool": {
"filter": [
{
"range": {
"created": {
"gte": 1656630000,
"lte": 1659308400
}
}
}
]
}
},
"size": 0
}

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
My problem is that each aggregation has a name that can't be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
My bucket_sort path is missing one level to reach that quantity value.
Here is one Elasticsearch record:
{
"datetime": "2021-12-01",
"item.quantity": 5
}
Note that I have removed most of the request for readability (filter aggregations, etc.).
I tried something with a wildcard:
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the bucket_sort aggregation: it doesn't sort your aggregations, it sorts the buckets produced by a single multi-bucket aggregation. The bucket_sort aggregation also has to be nested under that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you can't know in advance how many of those filter aggregations you will create.
For that you can use the filters aggregation, where you can specify as many filters as you want and each of them creates a bucket.
Under that filters aggregation you can create one single sum aggregation on item.quantity.
Also under the filters aggregation, you then add your bucket_sort aggregation, where you only have to name the sibling sum aggregation.
All in all, it might look like this:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}
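Since the filter names are only known at runtime, the request body above could be assembled programmatically. The following is a minimal sketch that mirrors the structure of the answer; the function name build_sorted_filters_agg and the sample filter names are hypothetical, and it only builds the request body without contacting a cluster.

```python
import json


def build_sorted_filters_agg(named_filters, sum_field="item.quantity"):
    """Build a filters aggregation whose buckets are sorted by a summed field.

    `named_filters` maps a bucket name (not known in advance) to a query clause.
    Hypothetical helper that follows the layout of the answer above.
    """
    if not named_filters:
        raise ValueError("at least one named filter is required")
    return {
        "size": 0,
        "aggs": {
            "your_filters": {
                # one bucket per named filter
                "filters": {"filters": dict(named_filters)},
                "aggs": {
                    "quantity": {"sum": {"field": sum_field}},
                    # bucket_sort is a child of the multi-bucket aggregation
                    # and refers to its sibling metric by name
                    "sorting": {
                        "bucket_sort": {"sort": [{"quantity": {"order": "desc"}}]}
                    },
                },
            }
        },
    }


year_2021 = {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}}
year_2020 = {"range": {"datetime": {"gte": "2020-01-01", "lte": "2020-12-31"}}}
request = build_sorted_filters_agg({"unknown_1": year_2021, "unknown_2": year_2020})
print(json.dumps(request, indent=2))
```

The sort path stays a plain "quantity" no matter how many filters are passed in, which is what removes the need for a wildcard in the path.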

Elasticsearch how can perform a "TERMS" AND "RANGE" query together

In Elasticsearch, the terms query works well for me when searching for multiple IDs in one query.
My original terms query:
{
"query": {
"terms": {
"Id": ["134","156"]
}
}
}
However, I need to add an extra condition, like the following:
{
"query": {
"terms": {
"id": ["163","121","569","579"]
},
"range":{
"age":
{"gt":10}
}
}
}
The "id" field can be a long array.
You can combine both queries using a bool query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"Id": [
"134",
"156"
]
}
},
{
"range": {
"age": {
"gt": 10
}
}
}
]
}
}
}
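Because the ID list can be long, it may help to build the body in code. This is a minimal sketch rather than the answer's exact query: it places both clauses in filter context instead of must, which matches the same documents but skips scoring. The helper name build_ids_age_query is hypothetical, and the limit check assumes the index keeps the default index.max_terms_count of 65536.

```python
import json

# Default value of the index.max_terms_count setting; an assumption about
# the target index, adjust if the setting has been changed.
MAX_TERMS = 65536


def build_ids_age_query(ids, min_age):
    """Combine a terms clause on `Id` with a range clause on `age`.

    Hypothetical helper; field names are taken from the question.
    """
    id_values = [str(i) for i in ids]
    if len(id_values) > MAX_TERMS:
        raise ValueError(f"terms query limited to {MAX_TERMS} values")
    return {
        "query": {
            "bool": {
                # both clauses must match; filter context does not affect scores
                "filter": [
                    {"terms": {"Id": id_values}},
                    {"range": {"age": {"gt": min_age}}},
                ]
            }
        }
    }


ids_query = build_ids_age_query([163, 121, 569, 579], 10)
print(json.dumps(ids_query, indent=2))
```

If relevance scoring matters for your results, swapping "filter" for "must" gives the form shown in the answer.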

Elasticsearch scoped aggregation not desired results

I have the following query, but the aggregation doesn't seem to be acting on top of the query.
The query returns 3 results, but there are 10 items in the aggregation. It looks like the aggregation is acting on all queried results.
Basically, how do I get the aggregation to take the given query as the input?
{
"query": {
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"coordinates": [
-79.3931,
43.6709
],
"distance": "15km"
}
},
{
"term": {
"user.type": "2"
}
}
]
},
"query": {
"match": {
"user.shoes": "314"
}
}
}
},
"aggs": {
"dedup": {
"terms": { "field": "user.id" },
"aggs": {
"dedup_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
As it turns out, I was expecting the aggregation to act on the paginated results returned by the query, and that's incorrect.
The aggregation takes all results of the query as input, not just the current page.
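To illustrate, here is a sketch of the same request with explicit pagination: from and size only trim the hits list, while the dedup aggregation is computed over every document matching the query. It is written with a bool query because the filtered query and the and filter were removed in Elasticsearch 5.0. The helper name build_body and the page size of 3 are assumptions made for illustration.

```python
import json


def build_body(page_from=0, page_size=3):
    """Build the search body for one page of hits.

    Hypothetical helper: `from`/`size` affect only the returned hits,
    never the documents the aggregation sees.
    """
    return {
        "from": page_from,
        "size": page_size,
        "query": {
            "bool": {
                "must": [{"match": {"user.shoes": "314"}}],
                "filter": [
                    {
                        "geo_distance": {
                            "distance": "15km",
                            "coordinates": [-79.3931, 43.6709],
                        }
                    },
                    {"term": {"user.type": "2"}},
                ],
            }
        },
        "aggs": {
            "dedup": {
                "terms": {"field": "user.id"},
                "aggs": {"dedup_docs": {"top_hits": {"size": 1}}},
            }
        },
    }


first_page = build_body(0, 3)
second_page = build_body(3, 3)
print(json.dumps(first_page, indent=2))
```

Both pages send an identical query and aggs section, so the aggregation result is the same for each page; only the window of hits differs.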

Filter/Query support in Elasticsearch Top hits Aggregation

The Elasticsearch documentation states that "the top_hits aggregation returns regular search hits, because of this many per hit features can be supported". Crucially, the list includes named filters and queries.
But trying to add any filter or query throws SearchParseException: Unknown key for a START_OBJECT.
Use case: I have items, each with a list of nested comments:
items{id} -> comments {date, rating}
I want to get the top-rated comment for each item from the last week.
{
"query": {
"match_all": {}
},
"aggs": {
"items": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"comment": {
"nested": {
"path": "comments"
},
"aggs": {
"top_comment": {
"top_hits": {
"size": 1,
//need filter here to select only comments of last week
"sort": {
"comments.rating": {
"order": "desc"
}
}
}
}
}
}
}
}
}
}
So is the documentation wrong, or is there any way to add a filter?
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-top-hits-aggregation.html
Are you sure you have mapped them as nested? I've just tried executing such a query on my data and it worked fine.
If so, you could simply add a filter aggregation right after the nested aggregation (hopefully I haven't messed up the curly brackets):
POST data/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "comments",
"query": {
"range": {
"comments.date": {
"gte": "now-1w",
"lte": "now"
}
}
}
}
}
}
},
"aggs": {
"items": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"nested": {
"nested": {
"path": "comments"
},
"aggs": {
"filterComments": {
"filter": {
"range": {
"comments.date": {
"gte": "now-1w",
"lte": "now"
}
}
},
"aggs": {
"topComments": {
"top_hits": {
"size": 1,
"sort": {
"comments.rating": "desc"
}
}
}
}
}
}
}
}
}
}
}
P.S. Always include the FULL path for nested objects.
So this query will:
Filter documents that have comments newer than one week, to narrow the aggregation down to documents that actually have such comments (filtered query)
Do a terms aggregation on the id field
Open the nested sub-documents (comments)
Filter them by date
Return the most badass one (the highest rated)
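The steps above can be sketched as a request builder. This version swaps the filtered query for a bool query with a filter clause, since filtered was removed in Elasticsearch 5.0; everything else follows the answer. The function name build_top_comment_query and its parameters are hypothetical, and the sketch only builds the body without sending it.

```python
import json


def build_top_comment_query(path="comments", window="now-1w", items=10):
    """Build a body returning the top-rated recent nested comment per item.

    Hypothetical helper; assumes `path` is mapped as a nested field with
    `date` and `rating` sub-fields, as in the question.
    """
    # Full nested path is used in field names, as the answer recommends.
    recent = {"range": {f"{path}.date": {"gte": window, "lte": "now"}}}
    return {
        "size": 0,
        # Step 1: keep only documents with at least one recent comment.
        "query": {"bool": {"filter": [{"nested": {"path": path, "query": recent}}]}},
        "aggs": {
            # Step 2: one bucket per item id.
            "items": {
                "terms": {"field": "id", "size": items},
                "aggs": {
                    # Step 3: step into the nested comments.
                    "nested": {
                        "nested": {"path": path},
                        "aggs": {
                            # Step 4: keep only recent comments.
                            "filterComments": {
                                "filter": recent,
                                "aggs": {
                                    # Step 5: highest-rated comment first.
                                    "topComments": {
                                        "top_hits": {
                                            "size": 1,
                                            "sort": [{f"{path}.rating": {"order": "desc"}}],
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


top_comment_body = build_top_comment_query()
print(json.dumps(top_comment_body, indent=2))
```

The date range appears twice on purpose: once in the query to select parent documents, and once in the filter aggregation to select which nested comments are eligible for top_hits.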

Resources