ElasticSearch Query to find intersection of two queries - elasticsearch

Records exist in this format: {user_id, state}.
I need to write an elasticsearch query to find all user_id's that have both states present in the records list.
For example, if sample records stored are:
{1,a}
{1,b}
{2,a}
{2,b}
{1,a}
{3,b}
{3,b}
The output from running the query for this example would be
{"1", "2"}
I've tried this so far:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"state": [
"a",
"b"
]
}
}
}
},
"aggs": {
"user_id_intersection": {
"terms": {
"field": "user_id",
"min_doc_count": 2,
"size": 100
}
}
}
}
but this will return
{"1", "2", "3"}

Assuming you know the cardinality of the states set, here 2, you can use the
Bucket Selector Aggregation
GET test/_search
{
"size": 0,
"aggs": {
"user_ids": {
"terms": {
"field": "user_id"
},
"aggs": {
"states_card": {
"cardinality": {
"field": "state"
}
},
"state_filter": {
"bucket_selector": {
"buckets_path": {
"states_card": "states_card"
},
"script": "params.states_card == 2"
}
}
}
}
}
}

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I want to make a suggestion to use bucket_path aggregation. As I know this aggregation needs to be run in sub-aggs of a histogram aggregation. As you have such field in your mapping so I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.

Filter based on different values for the same field in different documents

Let's say I have the following data:
{
"id":"1",
"name": "John",
"tag":"x"
},
{
"id": 2,
"name":"John",
"tag":"y"
},
{
"id": 3,
"name":"Jane",
"tag":"x"
}
I want to get the count of documents (unique on name) that has both tag = "x" and tag = "y"
Given the above data, the query should return 1, because only John has two documents exists that has the two required tags.
What I am able to do so far is a query that uses OR ( so either tag = "x" or tag = "y") which will return 2. For example:
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_cardinality_name": {
"cardinality": {
"field": "name"
}
}
}
}
}
Would it be possible to change that to use and instead of or?
Try putting cardinality under a terms agg to get proper distinct counts:
{
"size": 0,
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_terms": {
"terms": {
"field": "name"
},
"aggs": {
"agg_cardinality_name": {
"cardinality": {
"field": "name"
}
}
}
}
}
}
}
}
CORRECTION
You can use a combination of cardinality aggs with a bucket_selector which'll rule out buckets where there are fewer than 2 unique tags -- i.e. both x and y:
{
"size": 0,
"aggs": {
"distict_count": {
"filter": {
"terms": {
"tag": [
"x",
"y"
]
}
},
"aggs": {
"agg_terms": {
"terms": {
"field": "name"
},
"aggs": {
"agg_cardinality_tag2": {
"bucket_selector": {
"buckets_path": {
"unique_tags_count": "unique_tags_count"
},
"script": "params.unique_tags_count > 1"
}
},
"unique_tags_count": {
"cardinality": {
"field": "tag"
}
},
"unique_names_count": {
"cardinality": {
"field": "name"
}
}
}
}
}
}
}
}

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and they want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
Try it out
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}

Global term aggregation with filtered count - Elasticsearch 5

I have products stored in ES and I'm trying to aggregate them by their size. I would like to design following behaviour. For each term even outside of query to receive term counts based on query.
So querying for sizes ["S", "M"] I would like to receive:
S: 1
M: 1
L: 0
Is this somehow possible?
Here is my setup where I get following result:
S: 1
M: 1
But L is completely missing.
PUT demo
{
"mappings": {
"product": {
"properties": {
"size": {
"type": "keyword"
}
}
}
}
}
PUT demo/product/1
{
"size": "S"
}
PUT demo/product/2
{
"size": "M"
}
PUT demo/product/3
{
"size": "L"
}
GET demo/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"size": [
"S",
"M"
]
}
}
]
}
},
"aggs": {
"size": {
"terms": {
"field": "size"
}
}
}
}
You can use filter.
{
"size": 0,
"query": {
"bool": {
"must": [
{ "terms": { "field": "size" } }
],
"filter": {
"terms": { "size": [ "S", "M"] }
}
}
},
"aggs": {
"size": {
"terms": { "field": "size" }
}
}
}

For each country/colour/brand combination , find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony"
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should land you straight to the answer.
Make sure you enable scripting before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['color'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
To get a single result you may use sum aggregation applied to a filtered query with term (terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data you may use the following query with 3 multi-bucket aggregations, each of them containing a single-bucket sum sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}

Resources