ElasticSearch: Restrict Aggregations to Query String - elasticsearch

I'm struggling to get my aggregations to be restricted to my query.
I, of course, tried:
{
"_source": ["burger.id", "burger.user_name", "burger.timestamp"],
"query": {
"query_string": {
"query": "burger.user_name:Bob"
}
},
"aggs": {
"burger_count": {
"cardinality": {
"field": "burger.id.keyword"
}
},
"min_dtm": {
"min": {
"field": "burger.timestamp"
}
},
"max_dtm": {
"max": {
"field": "burger.timestamp"
}
}
}
}
I am very set on using "query_string" for filtering, as we have a very nice front-end that allows users to easily build queries that are then turned into a "query_string."
Unfortunately, I have not found a way to combine query_string and aggregations so that the aggregations are only over the results of the query!
I've read through many SO posts about doing this, but they are all very old and outdated as they all suggest the deprecated way of Filtered Queries, but even that doesn't implement query_string.
UPDATE
Here are some example documents.
It appears that my results are not being filtered by my query. Is there a setting that I am missing?
I also changed all of the fields to be about burgers...
{
"_index": "burgers",
"_type": "burger",
"_id": "123",
"_score": 5.3759894,
"_source": {
"inference": {
"id": "1",
"user_name": "Jonathan",
"timestamp": 1541521691847
}
}
},
{
"_index": "burgers",
"_type": "burger",
"_id": "456",
"_score": 5.3759894,
"_source": {
"inference": {
"id": "2",
"user_name": "Ryan",
"timestamp": 1542416601153
}
}
},
{
"_index": "burgers",
"_type": "burger",
"_id": "789",
"_score": 5.3759894,
"_source": {
"inference": {
"id": "3",
"user_name": "Grant",
"timestamp": 1542237715511
}
}
}

Found my answer!
It appears that the issue was caused by querying the text field of burger.user_name instead of the keyword field: burger.user_name.keyword.
Changing my query_string to use the keyword for each text field solved my issue.
{
"_source": ["burger.id", "burger.user_name", "burger.timestamp"],
"query": {
"query_string": {
"query": "burger.user_name.keyword:Bob"
}
},
"aggs": {
"burger_count": {
"cardinality": {
"field": "burger.id.keyword"
}
},
"min_dtm": {
"min": {
"field": "burger.timestamp"
}
},
"max_dtm": {
"max": {
"field": "burger.timestamp"
}
}
}
}
This SO answer gives a great, brief explanation why.

Related

Count API: count query field A with distinct field B value

For instance, given this result for a search, reduced to a size of 3 hits for brevity:
{
"hits": {
"total": {
"value": 51812937,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": "    2ª Dose",
"estabelecimento_uf": "BA",
"document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
}
},
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": "    1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
}
},
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": "    1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
}
]
}
}
If I wanted to query for "estabelecimento_uf": "SE" and keep only one result for duplicates of "document_id", I would issue:
{
"_source": ["document_id", "estabelecimento_uf", "vacina_descricao_dose"],
"query": {
"match": {
"estabelecimento_uf": {
"query": "SE"
}
}
},
"collapse": {
"field": "document_id",
"inner_hits": {
"name": "latest",
"size": 1
}
}
}
Is there a way to achieve this with Elasticsearch's Count API? Meaning: count query for field A (estabelecimento_uf) and count for unique values of field B (document_id), knowing that document_id has duplicates of course.
This is a public API: https://imunizacao-es.saude.gov.br/_search
This is the authentication:
User: imunizacao_public
Pass: qlto5t&7r_#+#Tlstigi
You can use a combination of filter aggregation along with cardinality aggregation, to get a count of unique document id based on a filter
{
"size": 0,
"aggs": {
"filter_agg": {
"filter": {
"term": {
"estabelecimento_uf.keyword": "SE"
}
},
"aggs": {
"count_docid": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
}
}
As far as I know, you cannot get the count of distinct field values using count API, you can either use field collapsing feature (as done in the question) OR use cardinality aggregation
Adding a working example with index data, search query and search result
{
"vacina_descricao_dose": " 2ª Dose",
"estabelecimento_uf": "BA",
"document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
Search Query 1:
{
"size": 0,
"query": {
"match": {
"estabelecimento_uf": "SE"
}
},
"aggs": {
"count_doc_id": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
Search Result:
"aggregations": {
"count_doc_id": {
"value": 2 // note this
}
}
Search Query 2:
{
"size": 0,
"aggs": {
"filter_agg": {
"filter": {
"term": {
"estabelecimento_uf.keyword": "SE"
}
},
"aggs": {
"count_docid": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
}
}
Search Result:
"aggregations": {
"filter_agg": {
"doc_count": 3,
"count_docid": {
"value": 2 // note this
}
}
}

conditionally query for fields in elasticsearch

I m new to Elasticsearch and before posting this question I have googled for help but not understanding how to write the query which i wanted to write.
My problem is I have few bunch of documents which i want to query, few of those documents has field "DueDate" and few of those has "PlannedCompletionDate" but not both exist in a single document. So I want to write a query which should conditionally query for a field from documents and return all documents.
For example below I m proving sample documents of each type and my query should return results from both the documents, I need to write query which should check for field existence and return the document
"_source": {
...
"plannedCompleteDate": "2019-06-30T00:00:00.000Z",
...
}
"_source": {
...
"dueDate": "2019-07-26T07:00:00.000Z",
...
}
You can use range query with the combination of the boolean query to achieve your use case.
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"plannedCompleteDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
"dueDate": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
Index Data:
{
"plannedCompleteDate": "2019-05-30"
}
{
"plannedCompleteDate": "2020-06-30"
}
{
"dueDate": "2020-05-30"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"plannedCompleteDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
},
{
"range": {
"dueDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "65808850",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"plannedCompleteDate": "2020-06-30"
}
},
{
"_index": "65808850",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"dueDate": "2020-05-30"
}
}
]

Elasticsearch - Find documents missing two fields

I'm trying to create a query that returns information about how many documents that don't have data for two fields (date.new and date.old). I have tried the query below, but it works as OR-logic, where all documents missing either date.new or date.old are returned. Does anyone know how I can make this only return documents missing both fields?
{
"aggs":{
"Missing_field_count1":{
"missing":{
"field":"date.new"
}
},
"Missing_field_count2":{
"missing":{
"field":"date.old"
}
}
}
}
Aggregations is not the feature to use for this. You need to use the exists query wrapped within a bool/must_not query, like this:
GET index/_count
{
"size": 0,
"bool": {
"must_not": [
{
"exists": {
"field": "date.new"
}
},
{
"exists": {
"field": "date.old"
}
}
]
}
}
hits.total.value indicates the count of the documents that match the search request. The value indicates the number of hits that match and relation indicates whether the value is accurate (eq) or a lower bound (gte)
Index Data:
{
"data": {
"new": 1501,
"old": 10
}
}
{
"title": "elasticsearch"
}
{
"title": "elasticsearch-query"
}
{
"date": {
"new": 1400
}
}
The search query given by #Val answers on how to achieve your use case.
Search Result:
"hits": {
"total": {
"value": 2, <-- note this
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "65112793",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"title": "elasticsearch"
}
},
{
"_index": "65112793",
"_type": "_doc",
"_id": "5",
"_score": 0.0,
"_source": {
"title": "elasticsearch-query"
}
}
]
}

elasticsearch - get intermediate scores within 'function_score'

Here's my index
POST /blogs/1
{
"name" : "learn java",
"popularity" : 100
}
POST /blogs/2
{
"name" : "learn elasticsearch",
"popularity" : 10
}
My search query:
GET /blogs/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": "learn"
}
},
"script_score": {
"script": {
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
}
}
}
}
}
which returns:
[
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxnperVbDy5wjSDBC",
"_score": 0.58024323,
"_source": {
"name": "learn elastic search",
"popularity": 100
}
},
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxqmL8cCMCxtBYOyC",
"_score": 0.43638366,
"_source": {
"name": "learn java",
"popularity": 10
}
}
]
Problem: I need to return an extra field in results which would give me raw score (just tf/idf which doesn't take popularity into account)
Things I have explored: script_fields (which doesn't give access to _score at fetch time.
The problem is in the way you are querying, which over-writes the _score variable. Instead if you use sort then _score isn't changed and can be pulled up within the same query.
You can try querying this way :
{
"query": {
"match": {
"name": "learn"
}
},
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
},
"order": "desc"
}
},
"_score"
]
}

Elastic search ---: MUST_NOT query not working

I have a query in which i want to add a must_not clause that would discard all records that have blank data for a some field. I tried a lot of ways but none worked. when I issue the same query (mentioned below) with other specific fields then it works fine.
this query should get all records that do not have "registrationType1" field empty/blank
query:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1": ""
}
}
]
}
}
}
the results below still contains "registrationType1" with empty values
results:
**"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842002",
"_score": 1,
"_source": {
"registrationType1": "A&R"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842033",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842213",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842963",
"_score": 1,
"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3869063",
"_score": 1,
"_source": {
"registrationType1": ""}}**
PFB mappings for the field above
"registrationType1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
You need to use the keyword subfield in order to do this:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": "" <-- change this
}
}
]
}
}
}
If you do not specify any text value on the text fields, there is basically nothing to analyze and return the documents accordingly.
In similar way, if you remove must_not and replace it with must, it would show empty results.
What you can do is, looking at your mapping, query must_not on keyword field. Keyword fields won't be analysed and in that way your query would return the results as you expect.
Query
POST myemptyindex/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
}
Hope this helps!
I am using elasticsearch version 7.2,
I replicated your data and ingested in my elastic index,and tried querying with and without .keyword.
I am getting the desired result when using the ".keyword" in the field name.It is not returning the docs which have registrationType1="".
Note - The query does not works when not using the ".keyword"
I have added my sample code below, have a look if that helps.
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index="test", ignore=400, body={
"mappings": {
"_doc": {
"properties": {
"registrationType1": {
"type": "text",
"field": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
})
data = {
"registrationType1": ""
}
es.index(index="test",doc_type="_doc",body=data,id=1)
search = es.search(index="test", body={
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
})
print(search)
Executing the above should not return any results as we are inserting empty for the field
There was some issue with the mappings itself, I deleted the index and re-indexed it with new mappings and its working now.

Resources