Elasticsearch aggregation on different search in same query - elasticsearch

I want to make a query to aggregate base only on match no matter what other parameters(terms , term , etc...) are used.
To be more specific I have an online shop where I use multiple filters (color ,size etc..) If I check a field for example color : red the other colors are no longer aggregated.
A solution that I am using is to make 2 separated queries (one for search where filters are applied and other for aggregation. Any idea how can I combine the 2 separated queries ?

You can take advantage of post_filter which will not apply to your aggregations but will only filter the to-be-returned hits. For example:
Create a shop
PUT online_shop
{
"mappings": {
"properties": {
"color": {
"type": "keyword"
},
"size": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Populate it w/ a few products
POST online_shop/_doc
{"color":"red","size":35,"name":"Louboutin High heels abc"}
POST online_shop/_doc
{"color":"black","size":34,"name":"Louboutin Boots abc"}
POST online_shop/_doc
{"color":"yellow","size":36,"name":"XYZ abc"}
Apply a shared query to the hits as well as aggregations and use post_filter to ... post-filter the hits:
GET online_shop/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
}
]
}
},
"aggs": {
"by_color": {
"terms": {
"field": "color"
}
},
"by_size": {
"terms": {
"field": "size"
}
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"color": {
"value": "red"
}
}
}
]
}
}
}
Expected result
{
...
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.11750763,
"hits" : [
{
"_index" : "online_shop",
"_type" : "_doc",
"_id" : "cehma3IBG_KW3EFn1QYa",
"_score" : 0.11750763,
"_source" : {
"color" : "red",
"size" : 35,
"name" : "Louboutin High heels abc"
}
}
]
},
"aggregations" : {
"by_color" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "black",
"doc_count" : 1
},
{
"key" : "red",
"doc_count" : 1
},
{
"key" : "yellow",
"doc_count" : 1
}
]
},
"by_size" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 34,
"doc_count" : 1
},
{
"key" : 35,
"doc_count" : 1
},
{
"key" : 36,
"doc_count" : 1
}
]
}
}
}

Related

How to return hit term in ES ?

I try to return only the terms that were successfully hit instead of the document itself, but I don’t know how to achieve the desired effect。
"es_episode" : {
"aliases" : { },
"mappings" : {
"properties" : {
"endTime" : {
"type" : "long"
},
"episodeId" : {
"type" : "long"
},
"startTime" : {
"type" : "long"
},
"studentIds" : {
"type" : "long"
}
}
}
This is an example:
{
"episodeId":124,
"startTime":10,
"endTime":20,
"studentIds":[200,300]
}
My query:
GET /es_episode/_search
{
"_source": ["studentIds"],
"query": {
"terms": {
"studentIds": [300,400]
}
}
}
The result is
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "es_episode",
"_type" : "episode",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"studentIds" : [
200,
300
]
}
}
]
}
But in fact I only want to know which term hits. For example, the result I want should be studentIds=[300] instead of all studentIds=[200,300] of the returned document. It seems that some additional operations are required, but I don’t know
how.
I try to achieve my goal with the following query
GET /es_episode/_search
{
"_source": ["studentIds"],
"query": {
"terms": {
"studentIds": [300,400]
}
},
"aggs": {
"student_id": {
"terms": {
"field": "studentIds",
"size": 10
},
"aggs": {
"id": {
"terms": {
"field": "episodeId"
}
},
"id_select":{
"bucket_selector": {
"buckets_path": {
"key" : "_key"
},
"script": "params.key==300 || params.key==400"
}
}
}
}
}
}
the result for this is
"aggregations" : {
"student_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 300,
"doc_count" : 1,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 124,
"doc_count" : 1
}
]
}
}
]
}
}
It seems that I successfully filtered out the terms I don’t want, but this doesn’t look pretty, and I need to set my parameters repeatedly in the script

ElasticSearch: How to make an aggregation pipeline?

Imagine the following use case:
We work at Stark Airlines and our marketing team wants to segment our passengers in order to give them discounts or gift cards. They decide that they want two sets of passengers:
Passengers that fly at least 3 times per week
Passenger who have flown at least once but who have not flown for two weeks
With this they can make different marketing campaigns for our passengers!
So, in elastic search we have a trip index that represents a ticket bought by a passenger:
{
"_index" : "trip",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"total_amount" : 300,
"trip_date" : "2020/03/24 13:30:00",
"status" : "completed",
"passenger" : {
"id" : 11,
"name" : "Thiago nunes"
}
}
}
The trip index contains a status field that may have other values like: pending or open or canceled
This means that we can only take into account trips that has the completed status (Meaning the passenger did travel).
So, with all this in mind...How would I get those two sets of passengers with elastic search?
I have been trying for a while but with no success.
What I have done until now:
I have built a query that gets all valid trip (trips with status completed)
GET /trip/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"status": {
"value": "completed"
}
}
}
]
}
},
"aggs": {
"status_viagem": {
"terms": {
"field": "status.keyword"
}
}
}
}
This query returns the following:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 200,
"relation" : "eq"
},
"max_score" : 0.18232156,
"hits" : [...]
},
"aggregations" : {
"status_viagem" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "completed",
"doc_count" : 200
}
]
}
}
}
But I am stuck and can't figure out the next step. I know that the next thing to do should create buckets with passengers and then filter them in two buckets representing our desired data sets. But I don't know how.
Can someone help?
PS.:
I don't exactly need this to be one single query, just a hint about how to build a query like this would be very helpful
THE OUPUT SHOULD BE AN ARRAY of passenger id's
Note: I have shortened the trip index for the sake of simplicity
As per my understanding of your issue.
I have used date_histogram with interval as week to get collection on passengers which week. Only those passengers are kept which have three documents in a week. This will give you all passengers which have traveled thrice in a week.
In another aggregation I have use terms aggregation to get passengers and their last travel date. Using bucket selector have kept passengers whose last travel is not beyond certain date.
Mapping
{
"index87" : {
"mappings" : {
"properties" : {
"passengerid" : {
"type" : "long"
},
"passengername" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"status" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"total_amount" : {
"type" : "long"
},
"trip_date" : {
"type" : "date"
}
}
}
}
}
Query
{
"query": {
"bool": {
"must": [
{
"term": {
"status": {
"value": "completed"
}
}
}
]
}
},
"aggs": {
"travel_thrice_week": {
"date_histogram": {
"field": "trip_date",
"interval": "week"
},
"aggs": {
"passenger": {
"terms": {
"field": "passengername.keyword",
"min_doc_count": 3,
"size": 10
}
},
"select_bucket_with_user": {-->to keep weeks which have a pasenger with thrice
--> a day travel
"bucket_selector": {
"buckets_path": {
"passenger": "passenger._bucket_count"
},
"script": "if(params['passenger']>=1) {return true;} else{ return false;} "
}
}
}
},
"not_flown_last_two_week": {
"terms": {
"field": "passengername.keyword",
"size": 10
},
"aggs": {
"last_travel": {
"max": {
"field": "trip_date" --> most recent travel
}
},
"last_travel_before_two_week": {
"bucket_selector": {
"buckets_path": {
"traveldate": "last_travel"
},
"script":{
"source": "if(params['traveldate']< params['date_epoch']) return true; else return false;",
"params": {
"date_epoch":1586408336000 --> unix epoc of cutt off date
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"not_flown_last_two_week" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Thiago nunes",
"doc_count" : 3,
"last_travel" : {
"value" : 1.5851808E12,
"value_as_string" : "2020-03-26T00:00:00.000Z"
}
},
{
"key" : "john doe",
"doc_count" : 1,
"last_travel" : {
"value" : 1.5799968E12,
"value_as_string" : "2020-01-26T00:00:00.000Z"
}
}
]
},
"travel_thrice_week" : {
"buckets" : [
{
"key_as_string" : "2020-03-23T00:00:00.000Z",
"key" : 1584921600000,
"doc_count" : 3,
"passenger" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Thiago nunes",
"doc_count" : 3
}
]
}
}
]
}
}

Elasticsearch aggregations: how to get bucket with 'other' results of terms aggregation?

I use aggregation to collect data from nested field and stuck a little
Example of document:
{
...
rectangle: {
attributes: [
{_id: 'some_id', ...}
]
}
ES allows group data by rectangle.attributes._id, but is there any way to get some 'other' bucket to put there documents that were not added to any of groups? Or maybe there is a way to create query to create bucket for documents by {"rectangle.attributes._id": {$ne: "{currentDoc}.rectangle.attributes._id"}}
I think bucket would be perfect because i need to do further aggregations with 'other' docs.
Or maybe there's some cool workaround
I use query like this for aggregation
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword"
}
}
}
}
}
And get this result
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 27616,
"attributes" : {
"doc_count" : 45,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45,
"attributeOptionsCount" : {
"value" : 2
}
}
]
}
}
}
]
result like this would be perfect:
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 1000,
"attributes" : {
"doc_count" : 145,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45
},
{
"key" : "other",
"doc_count" : 100
}
]
}
}
}
]
You can make use of missing value parameter. Update aggregation as below:
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword",
"missing": "other"
}
}
}
}
}

multiple fields in aggs elastic query

i have elastic mapped as
"mappings": {
"keywords": {
"properties": {
"Keyword": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"KeywordType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
I trying to retrieve two fields keywords and its keyword type.
{
"query": {
"bool": {
"must": [{
"match": {
"Keyword": TEXT_REQ
}
}]
}
},
"aggs": {
"keywords": {
"terms": {
"field":"Keyword.keyword",
"size": 500
}
}
}
}
It returns all the keywords that are present in the text. I want the keywordtype also along with it, i tried with multiple value scores
{aggs:{
"keywords":{"terms":{"field":"Keyword.keyword"}},
"keywordtype":{"terms":{"field":"KeywordType.keyword"}}
}}
but i don't get the corresponding keywordtype for the keyword. I got the overall keywordtypes present.
{... "aggregations":{"keywords":{... "buckets":[ {"key": "management"}]},
"keywordtype":{... "buckets":[{"key":"Tools"}, {"key":"technology"}]}
i need output to be
bucket:[{"keyword":"management", keywordtype:"Tools"}]
how to modify the elastic query ?
You can use either of the below queries:
Solution 1: Using Composite Aggregation:
You can make use of the below Composite Aggregation as you mentioned that you would want to group the Keyword and KeywordType
Aggregation Query:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"Keyword": "TEXT_REQ"
}
}
]
}
},
"aggs" : {
"my_buckets": {
"composite" : {
"sources" : [
{ "keyword": { "terms" : { "field": "Keyword.keyword" } } },
{ "keywordType": { "terms" : { "field": "KeywordType.keyword" } } }
]
}
}
}
}
Sample Response:
{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_buckets" : {
"after_key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type3"
},
"buckets" : [ <----- Required Results Start
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type1"
},
"doc_count" : 1
},
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type2"
},
"doc_count" : 2
},
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type3"
},
"doc_count" : 1
}
] <----- Required Results End
}
}
}
Solution 2: Using Terms Aggregation
Using Terms Aggregation, I've constructed parent-child(parent being Keyword and child being KeywordType) which would be in below tree structure.
Bool Query
Terms Aggregation on Keyword.keyword
- Terms Aggregation on KeywordType.keyword
Aggregation Query:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"Keyword": "TEXT_REQ"
}
}
]
}
},
"aggs": {
"mykeywords": {
"terms": {
"field": "Keyword.keyword",
"size": 10
},
"aggs": {
"mytypes": {
"terms": {
"field": "KeywordType.keyword",
"size": 10
}
}
}
}
}
}
Sample Response:
{
"took" : 97,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"mykeywords" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "TEXT_REQ", <----- Parent Value i.e Keyword
"doc_count" : 4,
"mytypes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ <----- Children i.e. KeywordType
{
"key" : "TEXT_REQ_Type2",
"doc_count" : 2
},
{
"key" : "TEXT_REQ_Type1",
"doc_count" : 1
},
{
"key" : "TEXT_REQ_Type3",
"doc_count" : 1
}
]
}
}
]
}
}
}
Let me know if this helps!

ElasticSearch get last n distinct records

I am trying to implement a search query over records stored in elasticsearch.
The record structure looks something like this.
{
"_index" : "box_info_store",
"_type" : "boxes",
"_id" : "pWjQLWkBIJk0ORjd0X2P",
"_score" : null,
"_source" : {
"transactionID" : "60ab66cf24c9924f562bf1a2b5d92305d0a6",
"boxNumber" : "Box3",
"createDate" : "2013-09-17T00:00:00",
"itemNumber" : "Item1",
"address" : "Sample Address"
}
}
one box can contain multiple items. For example Box3 can have Item1, Item2 and Item3. So in elasticsearch i will have 3 different documents. Also at the same time, same box and same item can also exist but with different address. The transactionID may or maynot be the same for these documents.
My requirement is to fetch last n recent and distinct transactionIDs, along with their records.
I tried following query to fetch last 7 distinct transactionIDs
GET /box_info_store/boxes/_search?size=7
{
"query": {
"bool": {
"must": [
{"match":{"boxNumber":"Box3"}},
{"match":{"itemNumber":"Item1"}}
]
}
},
"sort": [
{
"createDate": {
"order": "desc"
}
}
],
"aggs": {
"distinct_transactions": {
"terms": { "field": "transactionID"}
}
}
}
This fetched me last 7 documents where boxNumber is Box3 and itemNumber is Item1, but not 7 distinct transactionIDs, two out of these seven documents have the same transactionID(both having separate address though).
But my requirement is to get 7 distinct transactionIds, no matter how many document it returns.
Hope i was able to explain myself.
Appreciate any kind of help here
Thanks
------Edited #gaurav9620, i ran the first query and got count as 32, then i ran the second query with distinct count as 3 i got the following result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 32,
"max_score" : null,
"hits" : [
{
"_index" : "box_info_store",
"_type" : "boxes",
"_id" : "RWjRLWkBIJk0ORjdEX-L",
"_score" : null,
"_source" : {
"transactionID" : "3087e106244f6247a5290fb21ce64254529c",
"boxNumber" : "Box3",
"createDate" : "2017-11-15T00:00:00",
"itemNumber" : "Item1",
"address" : "sampleAddress12",
},
"sort" : [
1510704000000
]
},
{
"_index" : "box_info_store",
"_type" : "boxes",
"_id" : "MGjQLWkBIJk0ORjdwX0M",
"_score" : null,
"_source" : {
"transactionID" : "60ab66cf24c9924f562bf1a2b5d92305d0a6",
"boxNumber" : "Box3",
"createDate" : "2016-04-03T00:00:00",
"itemNumber" : "Item1",
"address" : "sampleAddress321",
},
"sort" : [
1459641600000
]
},
..........
..........
..........
{
"_index" : "box_info_store",
"_type" : "boxes",
"_id" : "AGjRLWkBIJk0ORjdK4CJ",
"_score" : null,
"_source" : {
"transactionID" : "3087e106244f6247a5290fb21ce64254529c",
"boxNumber" : "Box3",
"createDate" : "1996-02-16T00:00:00",
"itemNumber" : "Item1",
"address" : "sampleAddress4324",
},
"sort" : [
824428800000
]
}
]
},
"aggregations" : {
"unique_transactions" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 16,
"buckets" : [
{
"key" : "3087e106244f6247a5290fb21ce64254529c",
"doc_count" : 6
},
{
"key" : "27c5f3422f4482495d29e7b2c15c0e311743",
"doc_count" : 5
},
{
"key" : "c40e53212e74e24bf02a5bd2b134cf92bffb",
"doc_count" : 5
}
]
}
}
}
The size which you have used : represents number of raw documents that are retrieved.
If your case what you need to do is :
Mention size as 0 -> which will return you no raw documents
Include a size parameter in aggregation which will return you unique 7 ids.
GET /box_info_store/boxes/_search?size=7
{
"query": {
"bool": {
"must": [
{
"match": {
"boxNumber": "Box3"
}
},
{
"match": {
"itemNumber": "Item1"
}
}
]
}
},
"sort": [
{
"createDate": {
"order": "desc"
}
}
],
"aggs": {
"distinct_transactions": {
"terms": {
"field": "transactionID",
"size": 7
}
}
}
}
EDIT-------------------------------------
First fire this query
GET /box_info_store/boxes/_search?size=0
{
"query": {
"bool": {
"must": [
{
"match": {
"boxNumber": "Box3"
}
},
{
"match": {
"itemNumber": "Item1"
}
}
]
}
}
}
Here you will find total number of documents matching your query which you can set as n
After this fire your query as below
GET /box_info_store/boxes/_search?size=**n**
{
"query": {
"bool": {
"must": [
{
"match": {
"boxNumber": "Box3"
}
},
{
"match": {
"itemNumber": "Item1"
}
}
]
}
},
"sort": [
{
"createDate": {
"order": "desc"
}
}
],
"aggs": {
"distinct_transactions": {
"terms": {
"field": "transactionID",
"size": NUMBER_OF_UNIQUE_TRANSACTION_IDS_TO_BE_FETCHED
}
}
}
}

Resources