How to return results from elasticsearch after a threshold match - elasticsearch

I have two queries as follows:
The first query returns the count of all documents per domain.
The second query returns the count where a field is empty.
I then filter the results in my backend: a domain is kept only if its count of documents missing the field value exceeds a specific threshold; otherwise it is ignored. Can these two queries be combined so that Elasticsearch does the threshold comparison and returns only the matching results?
The first query is as follows:
GET database/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
]
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
The second query just applies a should filter as follows:
GET mapachitl/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
],
"should": [
{
"term": {
"address.city.keyword": {
"value": ""
}
}
},
{
"term": {
"address.zip.keyword": {
"value": ""
}
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
Can I return only those domains where the ratio of documents missing a city or zip code is more than 25%? I read about scripting, but I am not sure how to use it here.
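One possible way to combine the two requests, sketched under assumptions: nest a filter sub-aggregation (counting documents with an empty city or zip) under the domains terms aggregation, and let a bucket_selector pipeline aggregation drop the domains whose missing/total ratio is not above the threshold. The helper names build_missing_ratio_query and domains_from_response are hypothetical, the body is built as a Python dict only for readability, and it has not been run against the index from the question.

```python
import json


def build_missing_ratio_query(threshold=0.25, source="Web"):
    """Return a search body that keeps only domains above the missing-field ratio."""
    # Same condition as the second query: city OR zip is an empty string.
    missing_city_or_zip = {
        "bool": {
            "should": [
                {"term": {"address.city.keyword": {"value": ""}}},
                {"term": {"address.zip.keyword": {"value": ""}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": 0,
        "query": {"bool": {"must": [{"term": {"source": {"value": source}}}]}},
        "aggs": {
            "domains": {
                "terms": {"field": "domain_id"},
                "aggs": {
                    # Per-domain count of documents missing city or zip.
                    "missing": {"filter": missing_city_or_zip},
                    # Drop buckets whose missing/total ratio is not above the threshold.
                    "over_threshold": {
                        "bucket_selector": {
                            "buckets_path": {
                                "total": "_count",
                                "missing": "missing>_count",
                            },
                            "script": f"params.missing / params.total > {threshold}",
                        }
                    },
                },
            }
        },
    }


def domains_from_response(response):
    """Extract (domain_id, missing, total) tuples from the surviving buckets."""
    buckets = response["aggregations"]["domains"]["buckets"]
    return [(b["key"], b["missing"]["doc_count"], b["doc_count"]) for b in buckets]


if __name__ == "__main__":
    # Print the body so it can be pasted after GET <index>/_search.
    print(json.dumps(build_missing_ratio_query(), indent=2))
```

The terms aggregation returns only the ten largest buckets by default, so a size may need to be set on it if there are more domains than that.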

Related

Elasticsearch: How to search with all inputs only

I am looking for a solution to the following problem.
Problem:
I have two records, A: Trace(id, traceId, Tags) and B: Trace(id, traceId, Tags).
Both records have the same traceId but different tags.
I used a should clause for this, which returns data even if only record A has one of the tags. What I want is an empty response if the query contains tags that are not in the records.
This is the query I actually ran on Zipkin Elasticsearch data:
GET zipkin-span-2021-12-08/_search?size=10
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"_q": "smpp.charged=false"
}
},
{
"term": {
"_q": "connection.type=WEEK"
}
},
{
"term": {
"_q": "connection.type=a"
}
}
]
}
}
]
}
},
"aggs": {
"same_treace_id": {
"terms": {
"field": "traceId",
"size": 10,
"min_doc_count": 2
}
}
},
"fields": [
"traceId"
],
"_source": true
}

Elasticsearch query with Must (and) Should (or) not producing desired results

I'm trying to perform a query of the form X AND (y OR z).
I need to get all the sold properties for which the agent was either the listing agent or the selling agent.
With only the bool must I get 9324 results. When I add the bool should, I get the same result set of 9324. The agent with the ID 140699 should have only about 100 results. I've also tried a bool filter with no success: when I replace the should with a filter, it behaves like another bool must, and I only get results where the agent was both the listing agent AND the selling agent.
GET /property/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"statusCatID": {
"value": "Sold"
}
}
},
{
"range": {
"closingDate": {
"gte": "now-3M"
}
}
}
],
"should": [
{
"term": {
"listAgent1": {
"value": 140699
}
}
},
{
"term": {
"sellingAgent1": {
"value": 140699
}
}
}
]
}
},
"size": 300
}
With your notation, the should clauses are optional because the same bool also contains must clauses; they only influence scoring. The query is effectively:
(statusCatID:Sold AND closingDate:>=now-3M), optionally matching listAgent1:140699 OR sellingAgent1:140699
I suggest reading the official blog post on bool queries to better understand how they work in Elasticsearch. If you want a query like this:
(statusCatID:Sold AND closingDate:>=now-3M) AND (listAgent1:140699 OR sellingAgent1:140699)
you should write it this way:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "statusCatID": "Sold"
          }
        },
        {
          "range": {
            "closingDate": {
              "gte": "now-3M"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "listAgent1": 140699
                }
              },
              {
                "term": {
                  "sellingAgent1": 140699
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 300
}
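An equivalent form, as a hedged sketch: keep the should clauses at the top level and add minimum_should_match, which makes at least one of them required even though must clauses are present. The function name sold_by_agent_query is hypothetical; the field names and values are taken from the question.

```python
import json


def sold_by_agent_query(agent_id, since="now-3M", size=300):
    """Build X AND (y OR z) using top-level should plus minimum_should_match."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"statusCatID": {"value": "Sold"}}},
                    {"range": {"closingDate": {"gte": since}}},
                ],
                "should": [
                    {"term": {"listAgent1": {"value": agent_id}}},
                    {"term": {"sellingAgent1": {"value": agent_id}}},
                ],
                # Without this setting, should clauses next to must are optional.
                "minimum_should_match": 1,
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(sold_by_agent_query(140699), indent=2))
```

Both forms express the same condition; the nested bool keeps the OR group visually separate, while minimum_should_match keeps the query flatter.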

Aggregation not taking place on basis of size parameter passed in ES query

My ES query looks like this. I am trying to get the average rating for only the hits returned by from and size (starting at index 0), but ES is taking the average of all the records.
GET review/analytics/_search
{
"_source": "r_id",
"from": 0,
"size": 9,
"query": {
"bool": {
"filter": [
{
"terms": {
"b_id": [
236611
]
}
},
{
"range": {
"r_date": {
"gte": "1970-01-01 05:30:00",
"lte": "2019-08-13 17:13:17",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"s_type": [
"aggregation",
"organic",
"survey"
]
}
},
{
"bool": {
"must_not": [
{
"terms": {
"s_id": [
392
]
}
}
]
}
},
{
"term": {
"status": 2
}
},
{
"bool": {
"must_not": [
{
"terms": {
"ba_id": []
}
}
]
}
}
]
}
},
"sort": [
{
"featured": {
"order": "desc"
}
},
{
"r_date": {
"order": "desc"
}
}
],
"aggs": {
"avg_rating": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
},
"avg_rating1": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
}
}
}
The query result shows a doc_count of 43, whereas I want it to be 9 so that I can calculate the average correctly. I have specified the size above. The hits seem to be returned correctly, but the aggregation result is not what I expect.
from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query in your query so that only those 9 documents are selected and so that the average rating is only computed on those 9 documents.
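The two-step approach described above could look roughly like this. It is only a sketch: it uses an ids query in place of the terms query mentioned, the function names ids_from_hits and avg_rating_for_ids are hypothetical, and the first response is assumed to be the usual search response with a hits.hits array.

```python
def ids_from_hits(response):
    """Collect the _id of every hit on the page returned by the first search."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def avg_rating_for_ids(doc_ids):
    """Build a second search whose aggregation only sees the given documents."""
    return {
        "size": 0,
        # Restrict the document set, because aggregations ignore from/size.
        "query": {"ids": {"values": list(doc_ids)}},
        "aggs": {
            "avg_rating": {
                # Same filter as in the question: skip ratings equal to 0.
                "filter": {"bool": {"must_not": [{"term": {"rtng": 0}}]}},
                "aggs": {"rtng": {"avg": {"field": "rtng"}}},
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical first-page response, reduced to the fields used here.
    first_page = {"hits": {"hits": [{"_id": "a1"}, {"_id": "b2"}]}}
    print(avg_rating_for_ids(ids_from_hits(first_page)))
```

The first search is the original request with its sort, from and size; the second one is sent with the IDs it returned.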

Find distinct/unique people without a birthday or have a birthday earlier than 3/1/1963

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names...we only want 1 Jason or 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering for the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but just get results with duplicate Jasons and Susans. At the bottom it will show me that there are 10 Susans and 12 Jasons. Not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword...can be text or anything else as it is just a field that just gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your name field is not analyzed and can therefore be used in a terms aggregation properly.
I suggest using a filtered aggregation:
{
  "aggs": {
    "filtered_employees": {
      "filter": {
        "bool": {
          "should": [
            {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "birthday"
                    }
                  }
                ]
              }
            },
            {
              "range": {
                "birthday": {
                  "lte": 19630301
                }
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      "aggs": {
        "filtered_employees_by_name": {
          "terms": {
            "field": "name"
          }
        }
      }
    }
  }
}
Note that the two conditions have to stay combined with should (OR), as in your query: a document cannot both lack a birthday and have one before the cutoff date, so combining them with must would match nothing. The terms aggregation then returns one bucket per distinct name among the employees with a missing birthday or born before the date.
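If the goal is one hit per name rather than one aggregation bucket per name, field collapsing may be an option. The sketch below assumes an Elasticsearch version that supports the collapse search parameter and that name is mapped as keyword, as in the mapping from the question; the function name distinct_names_query is hypothetical.

```python
import json


def distinct_names_query(cutoff=19630301, size=25):
    """Birthday filter from the question, returning at most one hit per name."""
    return {
        "query": {
            "bool": {
                "should": [
                    # No birthday entered at all.
                    {"bool": {"must_not": [{"exists": {"field": "birthday"}}]}},
                    # Birthday on or before the cutoff (basic_date format).
                    {"range": {"birthday": {"lte": cutoff}}},
                ],
                "minimum_should_match": 1,
            }
        },
        # Keep only the top hit for each distinct value of name.
        "collapse": {"field": "name"},
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(distinct_names_query(), indent=2))
```

With collapsing, the total hit count in the response still refers to all matching documents, not to the number of distinct names.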

Elasticsearch: apply a sort to a query, then select top N for aggregate

The query below aggregates over the entire result of the query, and size only affects what is returned rather than what is aggregated.
How would I modify the search so that only the top N results after sort is processed by the average aggregation?
It seems such a simple requirement that I'm expecting it to be possible but so far all my efforts have failed, and similar questions on SO have gone unanswered.
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"jobType": "LiveEventScoring"
}
},
{
"term": {
"host": "MTVMDANS"
}
},
{
"term": {
"dataSourceCode": "AU_VIRT"
}
},
{
"term": {
"measurement": "EventDataLoadFromCacheDuration"
}
}
]
}
}
}
},
"sort": {
"timestamp": {
"order": "desc"
}
},
"aggs": {
"avgDuration": {
"avg": {
"field": "elapsedMs"
}
}
}
}
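Since the aggregation runs over every matching document, one hedged workaround is to let Elasticsearch return only the top N sorted hits and compute the average on the client. This is a sketch, not a built-in "aggregate the top N" feature; the function names top_n_request and average_elapsed are hypothetical, and the filters are the ones from the query above.

```python
def top_n_request(n):
    """Same filters and sort as the question, but return the top n hits themselves."""
    terms = {
        "jobType": "LiveEventScoring",
        "host": "MTVMDANS",
        "dataSourceCode": "AU_VIRT",
        "measurement": "EventDataLoadFromCacheDuration",
    }
    return {
        "size": n,
        # Only the averaged field is needed from each hit.
        "_source": ["elapsedMs"],
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {"must": [{"term": {k: v}} for k, v in terms.items()]}
                }
            }
        },
        "sort": [{"timestamp": {"order": "desc"}}],
    }


def average_elapsed(response):
    """Average elapsedMs over the returned hits; None if there are no hits."""
    values = [hit["_source"]["elapsedMs"] for hit in response["hits"]["hits"]]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical response, reduced to the fields used here.
    sample = {"hits": {"hits": [{"_source": {"elapsedMs": 10}},
                                {"_source": {"elapsedMs": 20}}]}}
    print(average_elapsed(sample))
```

This trades one extra round of client-side arithmetic for not having to know the document IDs in advance.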
