Query Elasticsearch where a key's value is at least some number

I am processing files to detect whether they contain labels, and with what confidence each label was recognized.
I created a nested mapping called tags, which contains label (text) and confidence (a float between 0 and 100).
Here is an example of how I think the query would work (I know it's invalid). It should be something like "Find documents that have the tags labelled A and B. A must have a confidence of at least 37 and B must have a confidence of at least 80".
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must": [
{
"match": {
"tags.label": "A"
},
"range": {
"tags.confidence": {
"gte": 37
}
}
},
{
"match": {
"tags.label": "B"
},
"range": {
"tags.confidence": {
"gte": 80
}
}
}
]
}
}
}
}
}
Any ideas? I am pretty sure I need to approach it differently (perhaps with a different mapping), but I am not sure how to accomplish this in Elasticsearch. Is this possible?

Let's say your parent document contains two nested documents, something like this:
{
"tags":[
{
"label":"A",
"confidence":40
},
{
"label":"B",
"confidence":85
}
]
}
If that is the case, your query would look like this:
Nested Query:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "tags",
"query": {
"bool": {
"must": [
{
"match": {
"tags.label": "A"
}
},
{
"range": {
"tags.confidence": {
"gte": 37
}
}
}
]
}
}
}
},
{
"nested": {
"path": "tags",
"query": {
"bool": {
"must": [
{
"match": {
"tags.label": "B"
}
},
{
"range": {
"tags.confidence": {
"gte": 80
}
}
}
]
}
}
}
}
]
}
}
}
Note that each nested document is indexed as a separate document. That is why you need two separate nested queries. Otherwise, with your approach, all four conditions would be checked against one single nested document of the parent, and no single tag can be labelled both A and B.
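The difference can be checked with a minimal Python sketch. Assumptions: the helper names (tag_clause, build_query, matches, matches_single_tag) are hypothetical, and the two match functions are only an in-memory approximation of nested semantics that treats the `match` on `tags.label` as exact equality. build_query emits the same body as the query above; matches lets each requirement be met by a different tag, while matches_single_tag demands one tag that meets every requirement, which is what a single combined nested clause would ask for.

```python
import json

def tag_clause(label, min_confidence):
    """One nested clause: a single tag must have this label AND this confidence."""
    return {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"tags.label": label}},
                        {"range": {"tags.confidence": {"gte": min_confidence}}},
                    ]
                }
            },
        }
    }

def build_query(requirements):
    """AND together one nested clause per (label, min_confidence) pair."""
    return {"query": {"bool": {"must": [tag_clause(l, c) for l, c in requirements]}}}

def matches(doc, requirements):
    """Approximation of separate nested clauses: each requirement needs SOME tag."""
    return all(
        any(t["label"] == label and t["confidence"] >= min_conf for t in doc["tags"])
        for label, min_conf in requirements
    )

def matches_single_tag(doc, requirements):
    """Approximation of one combined nested clause: ONE tag must satisfy everything."""
    return any(
        all(t["label"] == label and t["confidence"] >= min_conf
            for label, min_conf in requirements)
        for t in doc["tags"]
    )

doc = {"tags": [{"label": "A", "confidence": 40}, {"label": "B", "confidence": 85}]}
reqs = [("A", 37), ("B", 80)]
print(json.dumps(build_query(reqs), indent=2))
print(matches(doc, reqs))             # True
print(matches_single_tag(doc, reqs))  # False: no tag is labelled both A and B
```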
Hope this helps!

Related

Elasticsearch combine term and range query on nested key/value data

I have ES documents structured in a flat data structure using the nested data type, as they accept arbitrary JSON that we don't control, and we need to avoid a mapping explosion. Here's an example document:
{
"doc_flat":[
{
"key":"timestamp",
"type":"date",
"key_type":"timestamp.date",
"value_date":[
"2023-01-20T12:00:00Z"
]
},
{
"key":"status",
"type":"string",
"key_type":"status.string",
"value_string":[
"warning"
]
},
... more arbitrary fields ...
]
}
I've figured out how to query this nested data set to find matches on this arbitrary nested data, using a query such as:
{
"query": {
"nested": {
"path": "doc_flat",
"query": {
"bool": {
"must": [
{"term": {"doc_flat.key": "status"}},
{"term": {"doc_flat.value_string": "warning"}}
]
}
}
}
}
}
And I figured out how to find documents matching a particular date range:
{
"query": {
"nested": {
"path": "doc_flat",
"query": {
"bool": {
"must": [
{"term": {"doc_flat.key": "timestamp"}},
{
"range": {
"doc_flat.value_date": {
"gte": "2023-01-20T00:00:00Z",
"lte": "2023-01-21T00:00:00Z"
}
}
}
]
}
}
}
}
}
But I'm struggling to combine these two queries in order to search for documents that have nested documents matching both of these conditions:
a doc_flat.key of status, and a doc_flat.value_string of warning
a doc_flat.key of timestamp, and a doc_flat.value_date in a range
Obviously I can't just shove the second set of query filters into the same must array, because then no documents will match. I think I need to go "one level higher" in my query and wrap it in another bool query? But I can't get my head around how that would look.
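Have you tried two nested queries inside a bool query? Each nested clause is evaluated on its own, so different doc_flat entries of the same parent can satisfy each one: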
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "doc_flat",
"query": {
"bool": {
"must": [
{
"term": {
"doc_flat.key": "timestamp"
}
},
{
"range": {
"doc_flat.value_date": {
"gte": "2023-01-20T00:00:00Z",
"lte": "2023-01-21T00:00:00Z"
}
}
}
]
}
}
}
}
],
"must": [
{
"nested": {
"path": "doc_flat",
"query": {
"bool": {
"must": [
{
"term": {
"doc_flat.key": "status"
}
},
{
"term": {
"doc_flat.value_string": "warning"
}
}
]
}
}
}
}
]
}
}
}
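The same pattern can be generated programmatically. A minimal Python sketch, assuming hypothetical helpers kv_clause and build_query: each kv_clause call produces one nested clause that pins doc_flat.key and one value condition to the same nested entry, and build_query ANDs the clauses. Unlike the answer above, which puts one clause in filter and the other in must, this sketch puts both in filter; that is a judgment call that skips scoring, so move one to must if relevance scores matter.

```python
import json

def kv_clause(key, condition):
    """Nested clause: one doc_flat entry must have this key AND satisfy `condition`."""
    return {
        "nested": {
            "path": "doc_flat",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"doc_flat.key": key}},
                        condition,
                    ]
                }
            },
        }
    }

def build_query(clauses):
    """Each nested clause is evaluated independently, so different
    doc_flat entries may satisfy different clauses."""
    return {"query": {"bool": {"filter": clauses}}}

query = build_query([
    kv_clause("status", {"term": {"doc_flat.value_string": "warning"}}),
    kv_clause("timestamp", {"range": {"doc_flat.value_date": {
        "gte": "2023-01-20T00:00:00Z",
        "lte": "2023-01-21T00:00:00Z",
    }}}),
])
print(json.dumps(query, indent=2))
```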

Compound query with Elasticsearch

I'm trying to perform a search with the intended criteria being (activationDate in range 1598889600 to 1602051579) or someFlag=true.
Below is the query I tried, but it does not return any records with someFlag=true (even with a big size, e.g. 5000). My Elasticsearch index does have a lot of records with someFlag=true.
There are about 3000 total documents and this query returns around 280 documents.
{
"query": {
"bool": {
"must": [
{
"range": {
"activationDate": {
"gte": 1598889600
}
}
},
{
"range": {
"activationDate": {
"lte": 1602051579
}
}
}
],
"should": {
"match": {
"someFlag": true
}
}
}
},
"from": 1,
"size": 1000
}
Am I missing something?
This should work:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"range": {
"activationDate": {
"gte": 1598889600,
"lte": 1602051579
}
}
},
{
"term": {
"someFlag": true
}
}
]
}
}
]
}
}
}
In theory this should do the same:
{
"query": {
"bool": {
"should": [
{
"range": {
"activationDate": {
"gte": 1598889600,
"lte": 1602051579
}
}
},
{
"term": {
"someFlag": true
}
}
]
}
}
}
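However, the first query wraps the bool clause in a filter context, so documents don't need to be scored and the query becomes cacheable.
Your bool query did not return the someFlag=true records because the range clauses were in must: when a bool query has a must clause, its should clauses become optional and only affect scoring, so every hit still had to be in the date range. Also, match is normally used for full-text search; for an exact value such as a boolean, term is the usual choice.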
Replace the must with a should and set minimum_should_match to 1, since this is an OR query and a record qualifies if it meets just one of the criteria. Next, reduce the two range criteria to a single one that combines gte and lte.
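Why the original query drops the someFlag=true records can be checked with a small in-memory model of the bool query's matching rule. This is a hypothetical sketch: bool_matches and the sample documents are illustrative, scoring is ignored, and filter clauses would behave like must here. It encodes the documented default that minimum_should_match is 1 only when a bool query has should clauses and no must or filter clauses, and 0 otherwise.

```python
def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Approximate a bool query's matching rule (scoring ignored).
    Default minimum_should_match: 1 if there are should clauses and
    no must clauses, else 0 (should clauses become optional)."""
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must else 0
    if not all(pred(doc) for pred in must):
        return False
    return sum(1 for pred in should if pred(doc)) >= minimum_should_match

def in_range(doc):
    # Combined gte/lte range on activationDate.
    return 1598889600 <= doc["activationDate"] <= 1602051579

def flag_set(doc):
    return doc["someFlag"] is True

docs = [
    {"id": 1, "activationDate": 1600000000, "someFlag": False},  # in range only
    {"id": 2, "activationDate": 1500000000, "someFlag": True},   # flag only
    {"id": 3, "activationDate": 1500000000, "someFlag": False},  # neither
]

# Original query: range in must, flag in should, so the flag is optional.
original = [d["id"] for d in docs if bool_matches(d, must=[in_range], should=[flag_set])]
# Fixed query: both in should, at least one of them must match.
fixed = [d["id"] for d in docs
         if bool_matches(d, should=[in_range, flag_set], minimum_should_match=1)]
print(original)  # [1]
print(fixed)     # [1, 2]
```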

Elasticsearch: execute a filter on nested document only if it exists

I am using ES 2.3 and have a query whose filter section looks as follows:
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
},
{
"exists": {
"field": "employees"
}
},
{
#....other filter here based on root document, not on nested employee document
}
]
}
}
}
}
I have a filter that checks some conditions on the nested "employees" documents inside a bigger document called company. But I want to run this filter only if the "employees" object exists, as some documents may not have that nested document at all. So I added {"exists": {"field": "employees"}},
but this doesn't seem to work. Any idea what I should change to make it work?
You can do it like this. However, if documents don't have the employees field, they will not be picked up anyway, so I'm not sure why you want/need that exists query in the first place.
{
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"exists": {
"field": "employees"
}
}
}
},
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
}
]
}
}
}
}
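A small in-memory model makes the point about the exists check concrete. This Python sketch is hypothetical (nested_any, has_employees, age_window and the sample companies are illustrative, not Elasticsearch APIs) and approximates a nested query as "any object under the path matches". A company with no employees has nothing for the nested range query to match, so the extra exists check changes nothing. If the intent is instead to keep companies that have no employees, the condition has to be an OR, as modelled by the last list.

```python
def nested_any(doc, path, predicate):
    """Approximate a nested query: true if ANY object under `path` satisfies
    `predicate`. A document with no objects under `path` never matches."""
    return any(predicate(obj) for obj in doc.get(path, []))

def has_employees(company):
    # Stand-in for the nested exists query on "employees".
    return nested_any(company, "employees", lambda emp: True)

def age_window(emp):
    # Mirrors the two range clauses: max_age <= 50 and min_age >= 20.
    return emp["max_age"] <= 50 and emp["min_age"] >= 20

companies = [
    {"name": "acme", "employees": [{"min_age": 25, "max_age": 45}]},
    {"name": "globex", "employees": [{"min_age": 18, "max_age": 60}]},
    {"name": "initech"},  # no employees field at all
]

# Nested range filter with and without the extra exists check.
with_exists = [c["name"] for c in companies
               if has_employees(c) and nested_any(c, "employees", age_window)]
without_exists = [c["name"] for c in companies
                  if nested_any(c, "employees", age_window)]
# "Apply the filter only if employees exist": keep companies without employees.
optional = [c["name"] for c in companies
            if not has_employees(c) or nested_any(c, "employees", age_window)]

print(with_exists)     # ['acme']
print(without_exists)  # ['acme']
print(optional)        # ['acme', 'initech']
```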

Elasticsearch: How to AND a nested query

I am trying to figure out how to AND my Elasticsearch query. I've tried a few different variations, but I always hit a parser error.
What I have is a structure like this:
{
"title": "my title",
"details": [
{ "name": "one", "value": 100 },
{ "name": "two", "value": 21 }
]
}
I have defined details as a nested type in my mappings. What I'm trying to achieve is a query where it matches a part of the title and it matches various details by the detail's name and value.
I have the following query which gets me nearly there but I haven't been able to figure out how to AND the details. As an example I'd like to find anything that has:
detail of one with value less than or equal to 100
AND detail of two with value less than or equal to 25
The following query only allows me to search by one detail name/value:
"query" : {
"bool": {
"must": [
{ "match": {"title": {"query": titleQuery, "operator": "and" } } },
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{ "match": {"details.name" : "one"} },
{ "range": {"details.value" : { "lte": 100 } } }
]
}
}
} // nested
}
] // must
}
}
As a second question, would it be better to query the title and then move the nested part of the query into a filter?
You were so close! Just add another "nested" clause in your outer "must":
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "title",
"operator": "and"
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "one" } },
{ "range": { "details.value": { "lte": 100 } } }
]
}
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "two" } },
{ "range": { "details.value": { "lte": 25 } } }
]
}
}
}
}
]
}
}
}
Here is some code I used to test it:
http://sense.qbox.io/gist/1fc30d49a810d22e85fa68d781114c2865a7c92e
EDIT: Oh, the answer to your second question is "yes", though if you're using 2.0 things have changed a little.
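For the second question, here is a minimal Python sketch, assuming Elasticsearch 2.0 or later, where the bool query accepts a filter clause. The helper names detail_clause and build_query are hypothetical. The title match stays in must so it is scored, while each (name, max_value) pair becomes its own nested clause in filter; pass use_filter=False to reproduce the all-must query above.

```python
import json

def detail_clause(name, max_value):
    """One nested clause: a single detail must have this name AND value <= max_value."""
    return {
        "nested": {
            "path": "details",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"details.name": name}},
                        {"range": {"details.value": {"lte": max_value}}},
                    ]
                }
            },
        }
    }

def build_query(title_query, details, use_filter=True):
    """Title match is scored; detail clauses go to `filter` (unscored, cacheable)
    when use_filter is True, or to `must` alongside the title otherwise."""
    title = {"match": {"title": {"query": title_query, "operator": "and"}}}
    nested = [detail_clause(n, v) for n, v in details]
    if use_filter:
        return {"query": {"bool": {"must": [title], "filter": nested}}}
    return {"query": {"bool": {"must": [title] + nested}}}

query = build_query("my title", [("one", 100), ("two", 25)])
print(json.dumps(query, indent=2))
```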

Filter with match_all vs. query

I have 2 types of queries. They are both logically identical however I'm not sure if there is any performance difference between the two.
I will be glad if someone can enlighten me.
Using a term query and a filter:
{
"query": {
"filtered": {
"query": {
"term": {
"user_id": "1234567"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Using match_all and a filter:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Looking at your queries, it seems you don't care how documents are scored on whether user_id is "1234567". In other words, if more than one document has user_id set to "1234567", you don't care about their order in the results. If that is the case, the second option is better for performance: scoring has a computation cost in the first query, while the second query does no scoring at all. By the way, your second query can also be simplified to:
{
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
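The filtered query used in both versions was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions the same idea is expressed with a bool query, putting scored clauses in must and unscored ones in filter. A minimal Python sketch under that assumption (the build_query helper and its score_user flag are hypothetical) generates both variants:

```python
import json

def build_query(user_id, gte, lt, score_user=False):
    """score_user=True mirrors the first variant (the term query is scored);
    False mirrors the second (everything in filter context, no scoring)."""
    term = {"term": {"user_id": user_id}}
    date_range = {"range": {"ephoc_date": {"gte": gte, "lt": lt}}}
    if score_user:
        return {"query": {"bool": {"must": [term], "filter": [date_range]}}}
    return {"query": {"bool": {"filter": [term, date_range]}}}

print(json.dumps(build_query("1234567", 1437026390, 1437033590), indent=2))
```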
