Can I sort grouped search results by a formula? - elasticsearch

I am trying to implement a query that sorts aggregated results by a formula.
For example, we have the following entities:
{
"price":"1000",
"zip":"77777",
"field1":"1",
"field2":"5"
},
{
"price":"2222",
"zip":"77777",
"field1":"2",
"field2":"5"
},
{
"price":"1111",
"zip":"77777",
"field1":"1",
"field2":"5"
}
My query, without sorting, currently looks like this:
POST /entities/_search
{
"size": 0,
"query": {
"term": {
"zip": {
"value": "77777"
}
}
},
"aggs": {
"my composite": {
"composite": {
"size": 500,
"sources": [{
"field1_term": {
"terms": {
"field": "field1"
}
}
},
{
"field2_term": {
"terms": {
"field": "field2"
}
}
}
]
},
"aggs": {
"avg_price_per_group": {
"avg": {
"field": "price"
}
},
"results_per_group": {
"top_hits": {
"size": 100,
"_source": {
"include": ["entity_id", "price"]
}
}
}
}
}
}
}
First, I need to group the results by field1 and field2 and calculate the average price for each group.
Then I need to divide the price of each document by its group's average price and sort the documents by that ratio.
Is it possible to do this?
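One possible fallback, if the ratio cannot be expressed inside the aggregation itself, is to compute it on the client from the response of the query above. The following is only a sketch under assumptions: the function name sort_hits_by_price_ratio is hypothetical, the response fragment is hand-built (the entity_id values are made up), and prices are cast with float() because the sample documents store them as strings.

```python
def sort_hits_by_price_ratio(response, agg_name="my composite"):
    """Return (ratio, source) pairs sorted by price / group average price."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        avg_price = bucket["avg_price_per_group"]["value"]
        if not avg_price:  # skip groups with no average or a zero average
            continue
        for hit in bucket["results_per_group"]["hits"]["hits"]:
            source = hit["_source"]
            ratio = float(source["price"]) / avg_price
            rows.append((ratio, source))
    rows.sort(key=lambda row: row[0])  # ascending by ratio
    return rows


# Hand-built response fragment for the three sample documents.
# The entity_id values are assumptions made for illustration.
sample_response = {
    "aggregations": {
        "my composite": {
            "buckets": [
                {
                    "key": {"field1_term": "1", "field2_term": "5"},
                    "avg_price_per_group": {"value": 1055.5},
                    "results_per_group": {"hits": {"hits": [
                        {"_source": {"entity_id": "a", "price": "1000"}},
                        {"_source": {"entity_id": "c", "price": "1111"}},
                    ]}},
                },
                {
                    "key": {"field1_term": "2", "field2_term": "5"},
                    "avg_price_per_group": {"value": 2222.0},
                    "results_per_group": {"hits": {"hits": [
                        {"_source": {"entity_id": "b", "price": "2222"}},
                    ]}},
                },
            ]
        }
    }
}

ranked = sort_hits_by_price_ratio(sample_response)
for ratio, source in ranked:
    print(source["entity_id"], round(ratio, 4))
```

This only orders the hits that top_hits returned for each group (at most 100 per group in the query above), so it is not a global sort over the whole index.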

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
"name": "abc",
"date": "2022-10-08T21:30:40.000Z",
"rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, if possible in a single request?
You can use either of the options below:
Collapse
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
Top hits aggregation
{
"aggs": {
"group-by-name": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"top_doc": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
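Both options express the same rule: within each name, keep the document with the most recent date. A small sketch of that rule on in-memory documents can help check what the query is expected to return. The function name latest_per_name and the two extra sample documents are assumptions, and the date format is taken from the sample document in the question.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches "2022-10-08T21:30:40.000Z"


def latest_per_name(docs):
    """Return {name: document} keeping the document with the latest date."""
    best = {}
    for doc in docs:
        when = datetime.strptime(doc["date"], DATE_FORMAT)
        current = best.get(doc["name"])
        if current is None or when > current[0]:
            best[doc["name"]] = (when, doc)
    return {name: doc for name, (_, doc) in best.items()}


# The first document is from the question; the others are made up.
docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 1},
    {"name": "xyz", "date": "2022-09-01T12:00:00.000Z", "rank": 7},
]

latest_docs = latest_per_name(docs)
for name, doc in latest_docs.items():
    print(name, doc["rank"])
```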

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using a bucket_script aggregation with buckets_path. As far as I know, it has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. If you have a date field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it doesn't work for you.
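The two scripts in the query are plain percentage arithmetic over the two counts. A minimal sketch of the same calculation, assuming the counts have already been read from the role_type buckets (the function name role_ratios is hypothetical, and the zero-total guard is an addition not present in the scripts):

```python
def role_ratios(manager_count, lead_count):
    """Return (manager %, lead %) using the same formula as the scripts."""
    total = manager_count + lead_count
    if total == 0:
        return None  # no documents of either type in this bucket
    manager_pct = manager_count / total * 100
    lead_pct = lead_count / total * 100
    return manager_pct, lead_pct


ratios = role_ratios(8, 2)
print(ratios)
```

Note that this reports the mix of types found in each histogram bucket; it does not change which hits the search returns.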

Filter based on different values for the same field in different documents

Let's say I have the following data:
{
"id":"1",
"name": "John",
"tag":"x"
},
{
"id": 2,
"name":"John",
"tag":"y"
},
{
"id": 3,
"name":"Jane",
"tag":"x"
}
I want to count the distinct names that have both tag = "x" and tag = "y".
Given the above data, the query should return 1, because only John has documents covering both required tags.
What I have so far is a query that uses OR (either tag = "x" or tag = "y"), which returns 2. For example:
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_cardinality_name": {
"cardinality": {
"field": "name"
}
}
}
}
}
Would it be possible to change it to use AND instead of OR?
Try putting cardinality under a terms agg to get proper distinct counts:
{
"size": 0,
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_terms": {
"terms": {
"field": "name"
},
"aggs": {
"agg_cardinality_name": {
"cardinality": {
"field": "name"
}
}
}
}
}
}
}
}
CORRECTION
You can combine a cardinality aggregation with a bucket_selector that drops every bucket with fewer than 2 unique tags, so only names that have both x and y remain:
{
"size": 0,
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_terms": {
"terms": {
"field": "name"
},
"aggs": {
"agg_cardinality_tag2": {
"bucket_selector": {
"buckets_path": {
"unique_tags_count": "unique_tags_count"
},
"script": "params.unique_tags_count > 1"
}
},
"unique_tags_count": {
"cardinality": {
"field": "tag"
}
},
"unique_names_count": {
"cardinality": {
"field": "name"
}
}
}
}
}
}
}
}
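The logic of the corrected query can be sketched outside Elasticsearch: keep only documents whose tag is in the required set, collect the distinct tags per name, and keep the names that collected all of them. This is an illustration only; the function name count_names_with_all_tags is an assumption. In the query itself, the final number corresponds to how many buckets survive the bucket_selector.

```python
from collections import defaultdict


def count_names_with_all_tags(docs, required_tags):
    """Count names that have at least one document for every required tag."""
    required = set(required_tags)
    tags_by_name = defaultdict(set)
    for doc in docs:
        if doc["tag"] in required:  # same role as the terms filter
            tags_by_name[doc["name"]].add(doc["tag"])
    # same role as the bucket_selector on the tag cardinality
    return sum(1 for tags in tags_by_name.values() if tags == required)


# Sample data from the question.
docs = [
    {"id": "1", "name": "John", "tag": "x"},
    {"id": 2, "name": "John", "tag": "y"},
    {"id": 3, "name": "Jane", "tag": "x"},
]

matching = count_names_with_all_tags(docs, ["x", "y"])
print(matching)
```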

ElasticSearch Query to find intersection of two queries

Records exist in this format: {user_id, state}.
I need to write an Elasticsearch query to find all user_ids that have both states present in the list of records.
For example, if sample records stored are:
{1,a}
{1,b}
{2,a}
{2,b}
{1,a}
{3,b}
{3,b}
The output from running the query for this example would be
{"1", "2"}
I've tried this so far:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"state": [
"a",
"b"
]
}
}
}
},
"aggs": {
"user_id_intersection": {
"terms": {
"field": "user_id",
"min_doc_count": 2,
"size": 100
}
}
}
}
but this will return
{"1", "2", "3"}
Assuming you know the cardinality of the set of states (here 2), you can use the Bucket Selector Aggregation:
GET test/_search
{
"size": 0,
"aggs": {
"user_ids": {
"terms": {
"field": "user_id"
},
"aggs": {
"states_card": {
"cardinality": {
"field": "state"
}
},
"state_filter": {
"bucket_selector": {
"buckets_path": {
"states_card": "states_card"
},
"script": "params.states_card == 2"
}
}
}
}
}
}
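The difference between the two queries is what gets counted per user: min_doc_count looks at the number of documents, while the cardinality aggregation looks at the number of distinct states. A short sketch on the sample records from the question makes the difference visible (the variable names are assumptions):

```python
from collections import Counter, defaultdict

# (user_id, state) pairs from the question.
records = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (1, "a"), (3, "b"), (3, "b")]

# What min_doc_count: 2 checks: documents per user.
doc_counts = Counter(user_id for user_id, _ in records)
by_doc_count = sorted(u for u, n in doc_counts.items() if n >= 2)

# What the cardinality + bucket_selector pair checks: distinct states per user.
states_by_user = defaultdict(set)
for user_id, state in records:
    states_by_user[user_id].add(state)
by_cardinality = sorted(u for u, s in states_by_user.items() if len(s) == 2)

print(by_doc_count)    # user 3 passes because it has two documents
print(by_cardinality)  # user 3 is dropped because both documents share state b
```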

For each country/colour/brand combination, find the sum of the number of items in Elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following groups by the country/colour/brand combination and sums numberOfItems per group.
Make sure scripting is enabled before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['colour'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
To get a single result, you can apply a sum aggregation to a filtered query with a term (or terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries, colours and brands in a single pass over the data, you can use the following query with 3 multi-bucket terms aggregations, each containing a sum metric sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
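For comparison, the result of the three terms aggregations corresponds to three independent group-by sums, one per field. A minimal sketch on in-memory documents, assuming the field names from the sample document (the function name sums_per_dimension and all documents after the first are made up):

```python
from collections import defaultdict


def sums_per_dimension(docs, dimensions=("country", "colour", "brand")):
    """Return {dimension: {value: total numberOfItems}} for each dimension."""
    totals = {dim: defaultdict(int) for dim in dimensions}
    for doc in docs:
        for dim in dimensions:
            totals[dim][doc[dim]] += doc["numberOfItems"]
    return {dim: dict(values) for dim, values in totals.items()}


# The first document is from the question; the others are made up.
docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "black", "brand": "sony", "numberOfItems": 2},
    {"country": "Japan", "colour": "white", "brand": "casio", "numberOfItems": 5},
]

totals = sums_per_dimension(docs)
for dim, values in totals.items():
    print(dim, values)
```

Each document contributes to one bucket in every dimension, so the three totals overlap; they are not sums per country/colour/brand combination.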
