Elasticsearch - Find all documents with aggregation results included in math operations

I have four different aggregation queries whose results are combined in a math operation to compute the required number (pseudo example below). I need to find all the documents where this number is negative (e.g. -10).
number = agg1 + agg2 - agg3 - agg4
To keep it simple, I will post two abbreviated aggregation queries.
Agg1:
{
  "track_total_hits": true,
  "aggs": {
    "queryAmount_1": {
      "sum": {
        "field": "amount"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "some_field": {
                    "query": "PayoutRequested"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 0
}
Agg2:
{
  "track_total_hits": true,
  "aggs": {
    "queryAmount_2": {
      "sum": {
        "field": "amount"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "some_field": {
                    "query": "DonationRequested"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 0
}
Somehow, I need to combine these into one query that groups by some_id, returns the amount from each aggregation, and keeps only the groups where the resulting number is negative.
I am not sure whether this is really achievable, but ideas are welcome.

A good starting point would be pipeline aggregations; in particular, have a look at Cumulative Sum and Sum Bucket. Hope this helps.
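As a sketch of where pipeline aggregations could lead for this question, the request body below groups by some_id, computes one filtered sum of amount per event type, combines the four sums with a bucket_script, and keeps only negative results with a bucket_selector. It is built in Python as a plain dict and printed as JSON. Assumptions: only PayoutRequested and DonationRequested come from the question; the event names for agg3 and agg4, the aggregation names and the group_size of 1000 are hypothetical. Note that bucket_selector only filters the buckets the terms aggregation has already returned, so its size must cover every some_id you care about.

```python
import json

# Event types for the components of: number = agg1 + agg2 - agg3 - agg4.
# Only PayoutRequested and DonationRequested appear in the question;
# the agg3/agg4 values are hypothetical placeholders.
EVENTS = {
    "agg1": "PayoutRequested",
    "agg2": "DonationRequested",
    "agg3": "HypotheticalEventC",
    "agg4": "HypotheticalEventD",
}


def build_query(events, group_field="some_id", group_size=1000):
    # One filter sub-aggregation per event type, each summing "amount".
    per_bucket = {
        name: {
            "filter": {"match": {"some_field": {"query": event}}},
            "aggs": {"total": {"sum": {"field": "amount"}}},
        }
        for name, event in events.items()
    }
    # bucket_script combines the four sums inside every some_id bucket.
    per_bucket["number"] = {
        "bucket_script": {
            "buckets_path": {name: f"{name}>total" for name in events},
            "script": "params.agg1 + params.agg2 - params.agg3 - params.agg4",
        }
    }
    # bucket_selector drops every bucket whose combined number is not negative.
    per_bucket["only_negative"] = {
        "bucket_selector": {
            "buckets_path": {"number": "number"},
            "script": "params.number < 0",
        }
    }
    return {
        "size": 0,
        "aggs": {
            "by_id": {
                "terms": {"field": group_field, "size": group_size},
                "aggs": per_bucket,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_query(EVENTS), indent=2))
```

Each remaining by_id bucket then exposes agg1 to agg4 (with their total sums) and the computed number, so the amounts per aggregation can be read directly from the response.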

Related

How to return results from elasticsearch after a threshold match

I have two queries as follows:
The first query returns the count of all documents per domain.
The second query returns the count where a field is empty.
Later I filter the results in my backend: if, for a domain, the count of documents missing a field value is above a specific threshold, I keep that domain; otherwise I ignore it. Could these two queries be combined so that the threshold comparison happens in the query and only the matching results are returned?
The first query is as follows:
GET database/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "source": {
              "value": "Web"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "domains": {
      "terms": {
        "field": "domain_id"
      }
    }
  }
}
The second query just applies a should filter as follows:
GET mapachitl/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "source": {
              "value": "Web"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "address.city.keyword": {
              "value": ""
            }
          }
        },
        {
          "term": {
            "address.zip.keyword": {
              "value": ""
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "domains": {
      "terms": {
        "field": "domain_id"
      }
    }
  }
}
Can I return only those domains where the ratio of documents missing a city or zip code is more than 25%? I have read about scripting but am not sure how to use it here.
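One way to express the 25% threshold in a single request, sketched under assumptions: a filter sub-aggregation counts the documents with an empty city or zip code inside each domain_id bucket, and a bucket_selector keeps the bucket only when that count exceeds the threshold share of the bucket's _count. The aggregation names, the threshold parameter and the domain_size of 1000 are hypothetical; the comparison is written as a multiplication so it does not depend on how the script divides.

```python
import json


def build_ratio_query(threshold=0.25, domain_size=1000):
    # Same "missing city or zip" clauses as the second query in the question.
    missing_address = {
        "bool": {
            "should": [
                {"term": {"address.city.keyword": {"value": ""}}},
                {"term": {"address.zip.keyword": {"value": ""}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": 0,
        "query": {"bool": {"must": [{"term": {"source": {"value": "Web"}}}]}},
        "aggs": {
            "domains": {
                # domain_size is a hypothetical value; the default is 10 buckets.
                "terms": {"field": "domain_id", "size": domain_size},
                "aggs": {
                    # Documents in this domain that are missing city or zip.
                    "missing_address": {"filter": missing_address},
                    # Keep the domain only if missing > threshold * total.
                    "over_threshold": {
                        "bucket_selector": {
                            "buckets_path": {
                                "missing": "missing_address>_count",
                                "total": "_count",
                            },
                            "script": {
                                "source": "params.missing > params.threshold * params.total",
                                "params": {"threshold": threshold},
                            },
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ratio_query(), indent=2))
```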

Need aggregation of only the query results

I need to run an aggregation over only the limited results I get from the query, but it is not working: the aggregation also includes results outside the size limit of the query. Here is the query I am running:
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tags.keyword": "possiblePurchase"
          }
        },
        {
          "term": {
            "clientName": "Ci"
          }
        },
        {
          "range": {
            "firstSeenDate": {
              "gte": "now-30d"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "tags.keyword": "skipPurchase"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "firstSeenDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "byClient": {
      "terms": {
        "field": "clientName",
        "size": 25
      },
      "aggs": {
        "byTarget": {
          "terms": {
            "field": "targetName",
            "size": 6
          },
          "aggs": {
            "byId": {
              "terms": {
                "field": "id",
                "size": 5
              }
            }
          }
        }
      }
    }
  }
}
I need the aggregations to consider only the first 500 results of the query, sorted by the field specified in the query. I am completely lost. Thanks for the help.
The scope of the aggregation is all the hits matched by your query; the size parameter only specifies how many hits to fetch and display.
If you want to restrict the scope of the aggregation to the first n hits of a query, I would suggest the sampler aggregation in combination with your query.
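A sketch of the sampler suggestion, reusing the question's query and aggregations unchanged and only wrapping byClient in a sampler (the aggregation name sample and the helper function are hypothetical). Two caveats, as I understand the sampler: shard_size applies per shard, so on an index with several shards the aggregations can see more than 500 documents in total, and the sample is chosen by relevance score rather than by the firstSeenDate sort, so this only approximates "the first 500 hits".

```python
import json

# The bool query and nested terms aggregations from the question, unchanged.
BASE_QUERY = {
    "bool": {
        "must": [
            {"term": {"tags.keyword": "possiblePurchase"}},
            {"term": {"clientName": "Ci"}},
            {"range": {"firstSeenDate": {"gte": "now-30d"}}},
        ],
        "must_not": [{"term": {"tags.keyword": "skipPurchase"}}],
    }
}
BY_CLIENT = {
    "byClient": {
        "terms": {"field": "clientName", "size": 25},
        "aggs": {
            "byTarget": {
                "terms": {"field": "targetName", "size": 6},
                "aggs": {"byId": {"terms": {"field": "id", "size": 5}}},
            }
        },
    }
}


def build_sampled_query(query, inner_aggs, shard_size=500):
    # The sampler limits its sub-aggregations to the top-scoring
    # `shard_size` documents on each shard; "sort" does not affect it.
    return {
        "size": 500,
        "query": query,
        "sort": [{"firstSeenDate": {"order": "desc"}}],
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": inner_aggs,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_sampled_query(BASE_QUERY, BY_CLIENT), indent=2))
```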

Aggregation not taking place on basis of size parameter passed in ES query

My ES query looks like this. I am trying to get the average rating for the results at indexes 0 to 9, but ES is taking the average of all the records.
GET review/analytics/_search
{
  "_source": "r_id",
  "from": 0,
  "size": 9,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "b_id": [
              236611
            ]
          }
        },
        {
          "range": {
            "r_date": {
              "gte": "1970-01-01 05:30:00",
              "lte": "2019-08-13 17:13:17",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        },
        {
          "terms": {
            "s_type": [
              "aggregation",
              "organic",
              "survey"
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "s_id": [
                    392
                  ]
                }
              }
            ]
          }
        },
        {
          "term": {
            "status": 2
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "ba_id": []
                }
              }
            ]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "featured": {
        "order": "desc"
      }
    },
    {
      "r_date": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "avg_rating": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "rtng": 0
              }
            }
          ]
        }
      },
      "aggs": {
        "rtng": {
          "avg": {
            "field": "rtng"
          }
        }
      }
    },
    "avg_rating1": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "rtng": 0
              }
            }
          ]
        }
      },
      "aggs": {
        "rtng": {
          "avg": {
            "field": "rtng"
          }
        }
      }
    }
  }
}
The query result shows the doc_count as 43, whereas I want it to be 9 so that I can calculate the average correctly. I have specified the size above. The hits seem to be returned correctly, but the aggregation result is not what I expect.
from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query on those IDs to your query so that only those nine documents are selected and the average rating is computed over them alone.
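The two-step approach can be sketched as follows: run the original query to get the nine hits, collect their _id values, then send a second request whose terms query on _id restricts the aggregation to exactly those documents. The helper names are hypothetical, and the second body keeps only the avg_rating aggregation from the question.

```python
import json


def ids_from_hits(response):
    # _id of each hit in hits.hits, in the order returned by the first query.
    return [hit["_id"] for hit in response["hits"]["hits"]]


def build_avg_query(doc_ids):
    # size 0: only the aggregation over the selected documents is needed.
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"_id": doc_ids}}]}},
        "aggs": {
            "avg_rating": {
                # Same exclusion of rtng == 0 as in the question.
                "filter": {"bool": {"must_not": [{"term": {"rtng": 0}}]}},
                "aggs": {"rtng": {"avg": {"field": "rtng"}}},
            }
        },
    }


def second_request_body(first_response):
    # Build the second request from the parsed JSON of the first response.
    return json.dumps(build_avg_query(ids_from_hits(first_response)))
```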

elasticsearch filter aggs by doc count

I have a query that counts the number of images per user:
GET images/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "appID.raw": "myApp"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "perDeviceAggregation": {
      "terms": {
        "field": "deviceID"
      }
    }
  }
}
It basically works fine, but I would like to exclude from the aggregation results all users that have fewer than 200 images. How can I tweak the query above to achieve this?
Thanks.
You can achieve this by using the Minimum Document Count (min_doc_count) option of the terms aggregation:
"aggs": {
  "perDeviceAggregation": {
    "terms": {
      "field": "deviceID",
      "min_doc_count": 200
    }
  }
}
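A sketch of the min_doc_count request built in Python, together with a small helper that reads the buckets back. One caveat: the terms aggregation returns only 10 buckets by default, so if more than ten devices may have 200 or more images, size has to be raised as well; the max_devices value of 1000 and the helper names are assumptions.

```python
import json


def build_per_device_query(app_id="myApp", min_images=200, max_devices=1000):
    return {
        "size": 0,
        "query": {"bool": {"must": [{"term": {"appID.raw": app_id}}]}},
        "aggs": {
            "perDeviceAggregation": {
                "terms": {
                    "field": "deviceID",
                    # Buckets with fewer documents than this are dropped.
                    "min_doc_count": min_images,
                    # Hypothetical value; without it only 10 buckets come back.
                    "size": max_devices,
                }
            }
        },
    }


def device_counts(response):
    # Map each deviceID bucket key to its document (image) count.
    buckets = response["aggregations"]["perDeviceAggregation"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


if __name__ == "__main__":
    print(json.dumps(build_per_device_query(), indent=2))
```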
Alternatively, wrap your terms aggregation in a filter aggregation with the query clause (see the Filter Aggregations documentation).
You can modify your query above to look like this:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "appID.raw": "myApp"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filtered_users_with_images_count": {
      "filter": {
        "term": {
          "count": 200
        }
      },
      "aggs": {
        "perDeviceAggregation": {
          "terms": {
            "field": "deviceID"
          }
        }
      }
    }
  }
}
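You can modify the filter inside filtered_users_with_images_count to match documents with more than 200 images. Note that this filter is applied to individual documents, so it only works if each document stores a per-user count field; unlike min_doc_count, it does not look at the number of documents in each bucket.
Please also consider posting your data mappings along with the query to support your questions.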

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, though I need to make it return rows with unique usernames. Currently, it displays relevant posts by users, but it may display the same user twice.
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "gtitle": { "query": "voice", "boost": 1 } } },
        { "match_phrase": { "gdesc": { "query": "voice", "boost": 1 } } },
        { "match": { "city": { "query": "voice", "boost": 2 } } },
        { "match": { "gtags": { "query": "voice", "boost": 1 } } }
      ],
      "must_not": [
        { "term": { "profilepicture": "" } }
      ],
      "minimum_should_match": 1
    }
  }
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You need a terms aggregation to get all unique users, and then a top hits aggregation to get only one result for each user. This is how it looks:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "gtitle": {
              "query": "voice",
              "boost": 1
            }
          }
        },
        {
          "match_phrase": {
            "gdesc": {
              "query": "voice",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "city": {
              "query": "voice",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "gtags": {
              "query": "voice",
              "boost": 1
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "profilepicture": ""
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "unique_user": {
      "terms": {
        "field": "userid",
        "size": 100
      },
      "aggs": {
        "only_one_post": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  },
  "size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only aggregation results are returned. One important thing to remember is that the terms aggregation groups by the exact indexed values of userid, so ABC and abc will be considered different users; you might have to make your userid field not_analyzed (a keyword field in newer versions) so that each ID is indexed as a single, unaltered value.
Hope this helps!
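When reading the result of this request, each unique_user bucket carries its single post under only_one_post.hits.hits. A minimal sketch that collects one post per user from the parsed JSON response (the function name is hypothetical). Because top_hits sorts by score by default, the post kept for each user should be that user's best-matching one.

```python
def one_post_per_user(response):
    """Return {userid: _source of that user's single top hit}."""
    posts = {}
    for bucket in response["aggregations"]["unique_user"]["buckets"]:
        hits = bucket["only_one_post"]["hits"]["hits"]
        # top_hits with size 1 yields at most one hit per bucket.
        if hits:
            posts[bucket["key"]] = hits[0]["_source"]
    return posts
```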
