Aggregation not taking place on basis of size paramter passed in ES query - elasticsearch

My ES query looks like this. I am trying to get average rating for indexes starting from 0 to 9. But ES is taking the average of all the records.
GET review/analytics/_search
{
"_source": "r_id",
"from": 0,
"size": 9,
"query": {
"bool": {
"filter": [
{
"terms": {
"b_id": [
236611
]
}
},
{
"range": {
"r_date": {
"gte": "1970-01-01 05:30:00",
"lte": "2019-08-13 17:13:17",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"s_type": [
"aggregation",
"organic",
"survey"
]
}
},
{
"bool": {
"must_not": [
{
"terms": {
"s_id": [
392
]
}
}
]
}
},
{
"term": {
"status": 2
}
},
{
"bool": {
"must_not": [
{
"terms": {
"ba_id": []
}
}
]
}
}
]
}
},
"sort": [
{
"featured": {
"order": "desc"
}
},
{
"r_date": {
"order": "desc"
}
}
],
"aggs": {
"avg_rating": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
},
"avg_rating1": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
}
}
}
The query results shows the doc_count as 43 . whereas i want it to be 9 so that i can calculate the average correctly. I have specified the size above. The result of query seems to be calculated correctly but aggregation result is not proper.

from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query in your query so that only those 9 documents are selected and so that the average rating is only computed on those 9 documents.

Related

How to return results from elasticsearch after a threshold match

I have two queries as follows:
The first query returns the count of all documents per domain.
The second query returns the count where a field is empty.
Later I filter it in my backend, such that, if for a domain the count of documents missing field value is more than a specific threshold then only consider them else ignore. Could these two queries be combined together, such that I could do the threshold comparison and then return the results.
The first query is as follows:
GET database/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
]
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
The second query just applies a should filter as follows:
GET mapachitl/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
],
"should": [
{
"term": {
"address.city.keyword": {
"value": ""
}
}
},
{
"term": {
"address.zip.keyword": {
"value": ""
}
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
Can I only return those domains where the ratio of documents missing city or zip code is more than 25%? I read about scripting but not sure how can I use it here.

Query to get random n items from top 100 items in Elastic Search

I need to write a query in elasticsearch to get random 12 items in the top 100 sorted items.
I tried something like this, but I am unable to get random 12 items(I can get only the top 12 items).
The query I used:
GET product/_search
{
"sort": [
{
"DateAdded": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"definitionName": {
"value": "ABC"
}
}
},
{
"range": {
"price": {
"gt": 0
}
}
}
]
}
},
"functions": [
{
"random_score": {
"seed": 314159265359
}
}
]
}
},
"size": 12
}
Can anybody guide me where am I going wrong? (I am a beginner in writing ElasticQueries)
Thanks in Advance.
EDIT: doesnot work, window_size recalculate score on the X top results.
Also:
need to set: "track_scores" to true at the top level.
corect syntax is:
"rescore": {
"window_size": 10,
"query": {
"score_mode": "max", //wathever
"rescore_query": {
"bool": {
"should": [
{
//your query here - you can use a function or a script score too
}
]
}
},
"query_weight": 0.7,
"rescore_query_weight": 1.2
}
}
Ok i understand better.
Indeed you have to sort by date (top 100) and rescore with a random function (read https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-post-filter).
Should be something like:
{
"sort": [
{
"DateAdded": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"term": {
"definitionName": {
"value": "ABC"
}
}
},
{
"range": {
"price": {
"gt": 0
}
}
}
]
}
},
"size": 100,
"rescore": {
"window_size": 12,
"query": {
"rescore_query": {
"random_score": {
"seed": 314159265359
}
}
}
}
}

Need aggregation of only the query results

I need to do an aggregation but only with the limited results I get form the query, but it is not working, it returns other results outside the size limit of the query. Here is the query I am doing
{
"size": 500,
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "possiblePurchase"
}
},
{
"term": {
"clientName": "Ci"
}
},
{
"range": {
"firstSeenDate": {
"gte": "now-30d"
}
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "skipPurchase"
}
}
]
}
},
"sort": [
{
"firstSeenDate": {
"order": "desc"
}
}
],
"aggs": {
"byClient": {
"terms": {
"field": "clientName",
"size": 25
},
"aggs": {
"byTarget": {
"terms": {
"field": "targetName",
"size": 6
},
"aggs": {
"byId": {
"terms": {
"field": "id",
"size": 5
}
}
}
}
}
}
}
}
I need the aggregations to only consider the first 500 results of the query, sorted by the field I am requesting on the query. I am completely lost. Thanks for the help
Scope of the aggregation is the number of hits of your query, the size parameter is only used to specify the number of hits to fetch and display.
If you want to restrict the scope of the aggregation on the first n hits of a query, I would suggest the sampler aggregation in combination with your query

Need aggregation on document inner array object - ElasticSearch

I am trying to do aggregation over the following document
{
"pid": 900000,
"mid": 9000,
"cid": 90,
"bid": 1000,
"gmv": 1000000,
"vol": 200,
"data": [
{
"date": "25-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "24-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "23-11-2018",
"gmv": 100000,
"vol": 20
}
]
}
The analysis which needs to be done here is:
Filter on mid or/and cid on all documents
Filter range on data.date for last 7 days and sum data.vol over that range for each pid
sort the documents over the sum obtained in previous step in desc order
Group these results by pid.
This means we are trying to get top products by sum of the volume (quantity sold) within a date range for specific cid/mid.
PID here refers product ID,
MID refers here merchant ID,
CID refers here category ID
Firstly you need to change your mapping to run the query on nested fields.
change the type for field 'data' as 'nested'.
Then you can use the range query in filter along with the terms filter on mid/cid to filter on the data. Once you get the correct data set, then you can aggregate on the pid following the sub aggregation on sum of vol.
Here is the below query.
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"range": {
"data.date": {
"gte": "28-11-2018",
"lte": "25-11-2018"
}
}
},
{
"must": [
{
"terms": {
"mid": [
"9000"
]
}
}
]
}
]
}
}
]
}
},
"aggs": {
"AGG_PID": {
"terms": {
"field": "pid",
"size": 0,
"order": {
"TOTAL_SUM": "desc"
},
"min_doc_count": 1
},
"aggs": {
"TOTAL_SUM": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
You can modify the query accordingly. Hope this will be helpful.
Please find nested aggregation query which sorts by "vol" for each bucket of "pid". You can add any number of filters in the query part.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"mid": "2"
}
}
]
}
},
"aggs": {
"top_products_sorted_by_order_volume": {
"terms": {
"field": "pid",
"order": {
"nested_data_object>order_volume_by_range>order_volume_sum": "desc"
}
},
"aggs": {
"nested_data_object": {
"nested": {
"path": "data"
},
"aggs": {
"order_volume_by_range": {
"filter": {
"range": {
"data.date": {
"gte": "2018-11-26",
"lte": "2018-11-27"
}
}
},
"aggs": {
"order_volume_sum": {
"sum": {
"field": "data.ord_vol"
}
}
}
}
}
}
}
}
}
}

Find distinct/unique people without a birthday or have a birthday earlier than 3/1/1963

We have some employees and needed to find those we haven't entered their birthday or are born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names...we only want 1 Jason or 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering for the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but just get results with duplicate Jasons and Susans. At the bottom it will show me that there are 10 Susans and 12 Jasons. Not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword...can be text or anything else as it is just a field that just gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your field name is not analyzed and able to be used on terms aggregation properly.
I suggest you, use filtered aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
In other hand your query is not correct your applying a should bool filter. Change it by must and the aggregation will return only results from employes with (missing birthday) and (born before date).

Resources