What does a "must" clause with an array of "match" clauses really mean? - elasticsearch

I have an elasticsearch query which looks like this...
"query": {
"bool": {
"must": [{
"match": {"attrs.name": "username"}
}, {
"match": {"attrs.value": "johndoe"}
}]
}
}
... and documents in the index that look like this:
{
"key": "value",
"attrs": [{
"name": "username",
"value": "jimihendrix"
}, {
"name": "age",
"value": 23
}, {
"name": "alias",
"value": "johndoe"
}]
}
Which of the following does this query really mean?
1. The document should contain either attrs.name = username OR attrs.value = johndoe.
2. The document should contain both attrs.name = username AND attrs.value = johndoe, even if they match different elements in the attrs array (this would mean that the document given above matches the query).
3. The document should contain both attrs.name = username AND attrs.value = johndoe, but they must match the same element in the attrs array (which would mean that the document given above does not match the query).
Further, how do I write a query to express #3 from the list above, i.e. the document should match only if a single element inside the attrs array matches both the following conditions:
attrs.name = username
attrs.value = johndoe

"must" stands for "AND", so a document is returned only if it satisfies all the clauses in the must array.
must does not give you point 1 (the document contains either attrs.name = username OR attrs.value = johndoe); for that you need a should clause, which works like "OR".
Whether must gives you point 2 or point 3 depends on the type of the "attrs" field.
If the "attrs" field is of type object, the fields are flattened, that is, no relationship is maintained between the fields of different objects in the array. So the must query will return a document if any attrs.name = "username" and any attrs.value = "johndoe", even if they are not part of the same object in that array.
If you want each object in the array to act like a separate document, you need to map the field as nested and use a nested query to match documents:
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
hits in the response will contain the whole parent document with all of its nested documents; to get only the matched nested documents, inner_hits has to be specified.
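To see the difference between points 2 and 3 without a cluster, here is a minimal Python sketch that models the two behaviours in memory. It is not Elasticsearch code: the helper names matches_as_object and matches_as_nested are made up for illustration, and plain equality stands in for the analyzed match query.

```python
# Simplified in-memory model of the two mappings; not the Elasticsearch API.
doc = {
    "key": "value",
    "attrs": [
        {"name": "username", "value": "jimihendrix"},
        {"name": "age", "value": 23},
        {"name": "alias", "value": "johndoe"},
    ],
}


def matches_as_object(document, name, value):
    """Object mapping: each condition may be met by a different element."""
    attrs = document["attrs"]
    name_found = any(a["name"] == name for a in attrs)
    value_found = any(a["value"] == value for a in attrs)
    return name_found and value_found


def matches_as_nested(document, name, value):
    """Nested mapping: a single element must meet both conditions."""
    return any(
        a["name"] == name and a["value"] == value for a in document["attrs"]
    )


object_result = matches_as_object(doc, "username", "johndoe")
nested_result = matches_as_nested(doc, "username", "johndoe")
print(object_result, nested_result)
```

With the document from the question, the object-style check succeeds (point 2) while the nested-style check fails (point 3), which is the behaviour described above.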

Based on your requirements, you need to define your attrs field as nested; please refer to the nested type in Elasticsearch for more information. Disclaimer: it maintains the relationship but is costly to query.
The answer to your other two questions also depends on which data type you are using; please refer to nested vs object data type for more details.
Edit: solution using sample mapping, example docs and expected result
Index mapping using nested type
{
"mappings": {
"properties": {
"attrs": {
"type": "nested"
}
}
}
}
Index two sample docs, one which satisfies the criteria and another which doesn't
{
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
}
Another which doesn't satisfy the criteria
{
"attrs": [
{
"name": "username",
"value": "jimihendrix"
},
{
"name": "age",
"value": 23
},
{
"name": "alias",
"value": "johndoe"
}
]
}
And search query
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
And Search result
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_score": 1.7509375,
"_source": {
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
},
"inner_hits": {
"attrs": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7509375,
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "attrs",
"offset": 0
},
"_score": 1.7509375,
"_source": {
"name": "username",
"value": "johndoe"
}
}
]
}
}
}
}
]

Related

Query with filter array field

I want to return documents that include only some of array field members.
For example, I have two order documents:
{
"orderNumber":"ORD-111",
"items":[{"name":"part-1","status":"new"},
{"name":"part-2","status":"paid"}]
}
{
"orderNumber":"ORD-112",
"items":[{"name":"part-3","status":"paid"},
{"name":"part-4","status":"supplied"}]
}
I want to create a query so that my result will include all the order documents but only with items that match {"status":"supplied"}.
The result should look like:
{
"orderNumber":"ORD-111",
"items":[]
}
{
"orderNumber":"ORD-112",
"items":[{"name":"part-4","status":"supplied"}]
}
You can use a nested query along with inner_hits to get only matching array values in the result
Adding a working example
Index Mapping:
{
"mappings": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"match": {
"items.status": "supplied"
}
}
]
}
},
"inner_hits": {}
}
}
}
Search Result:
"hits": [
{
"_index": "67890614",
"_type": "_doc",
"_id": "2",
"_score": 1.2039728,
"_source": {
"orderNumber": "ORD-112",
"items": [
{
"name": "part-3",
"status": "paid"
},
{
"name": "part-4",
"status": "supplied"
}
]
},
"inner_hits": {
"items": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2039728,
"hits": [
{
"_index": "67890614",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "items",
"offset": 1
},
"_score": 1.2039728,
"_source": {
"name": "part-4",
"status": "supplied" // note this
}
}
]
}
}
}
}
]
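If the client needs the shape from the question (an order with only the matching items), one option is to rebuild it from inner_hits after the search. The Python sketch below is only an illustration: keep_matching_items is a hypothetical helper, and sample_hit is a trimmed copy of the hit shown above. Note that, as the search result above shows, only orders with at least one matching item come back as hits, so an order like ORD-111 with an empty items list would not be produced this way.

```python
# Hypothetical client-side helper; the response shape follows the result above.
def keep_matching_items(hit):
    """Return the order with only the items that appear in inner_hits."""
    inner = (
        hit.get("inner_hits", {})
        .get("items", {})
        .get("hits", {})
        .get("hits", [])
    )
    return {
        "orderNumber": hit["_source"]["orderNumber"],
        "items": [nested_hit["_source"] for nested_hit in inner],
    }


# Trimmed copy of the hit from the search result above.
sample_hit = {
    "_id": "2",
    "_source": {
        "orderNumber": "ORD-112",
        "items": [
            {"name": "part-3", "status": "paid"},
            {"name": "part-4", "status": "supplied"},
        ],
    },
    "inner_hits": {
        "items": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "items", "offset": 1},
                        "_source": {"name": "part-4", "status": "supplied"},
                    }
                ]
            }
        }
    },
}

order = keep_matching_items(sample_hit)
print(order)
```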
Elasticsearch flattens arrays of objects, so it is unable to tell which element in the array actually matched.
As previously answered, you could use nested queries.
How arrays of objects are flattened
Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values. For instance, consider the following document:
PUT my-index-000001/_doc/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
The user field is dynamically added as a field of type object.
The previous document would be transformed internally into a document that looks more like this:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
The user.first and user.last fields are flattened into multi-value fields, and the association between alice and white is lost. This document would incorrectly match a query for alice AND smith:
GET my-index-000001/_search
{
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "Smith" }}
]
}
}
}
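The flattening described above can be imitated in a few lines of Python. This is a rough model only: flatten is a made-up helper, and lowercasing stands in for what the analyzer does to the values at index time.

```python
# Rough model of how an array of objects is flattened; not Elasticsearch code.
def flatten(document):
    flat = {}
    for field, value in document.items():
        is_object_array = isinstance(value, list) and all(
            isinstance(element, dict) for element in value
        )
        if is_object_array:
            # Every sub-field becomes one multi-value field, so the pairing
            # between values of the same object is lost.
            for element in value:
                for sub_field, sub_value in element.items():
                    key = f"{field}.{sub_field}"
                    flat.setdefault(key, []).append(str(sub_value).lower())
        else:
            flat[field] = value
    return {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in flat.items()
    }


doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}

flat = flatten(doc)
# Both terms are present, although no single user is "Alice Smith".
alice_and_smith = "alice" in flat["user.first"] and "smith" in flat["user.last"]
print(flat, alice_and_smith)
```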
To answer your question:
If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type instead of the object data type.
Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others with the nested query:
PUT my-index-000001
{
"mappings": {
"properties": {
"user": {
"type": "nested"
}
}
}
}
PUT my-index-000001/_doc/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
GET my-index-000001/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "Smith" }}
]
}
}
}
}
}
GET my-index-000001/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "White" }}
]
}
},
"inner_hits": {
"highlight": {
"fields": {
"user.first": {}
}
}
}
}
}
}
The user field is mapped as type nested instead of type object.
This query doesn’t match because Alice and Smith are not in the same nested object.
This query matches because Alice and White are in the same nested object.
inner_hits allow us to highlight the matching nested documents.
Interacting with nested documents
Nested documents can be:
queried with the nested query.
analyzed with the nested and reverse_nested aggregations.
sorted with nested sorting.
retrieved and highlighted with nested inner hits.
Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, the nested/reverse_nested aggregations, or nested inner hits.
Consider performance when taking this approach, as it is significantly more expensive.
For more details, you can check the source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Elasticsearch query match + term boolean

I have documents in elasticsearch index with a "type" field, like this:
[
{
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags":["tag1","tag2"],
"type":"service"
},
{
"id": 2,
"companyDescription": "a bunch of text more",
"companyTitle": "title",
"companyTags":["tag1","tag2"],
"type":"company"
},...
]
I want to run a match query across all docs in my index, like this:
body = {
"query": {
"match": {
"_all":"sequencing"
}
}
}
but add a filter to only return results where the "type" field equals "service".
As far as I understand your question, you want to search for the query string "sequencing" across all the fields. For that
you can use the multi_match query, which builds on the match query to allow multi-field queries.
If no fields are provided, the multi_match query defaults to the
index.query.default_field index setting, which in turn defaults to *.
This extracts all fields in the mapping that are eligible for term queries and filters out the metadata fields. All extracted fields are then
combined to build a query.
Search Query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "bunch of text"
}
}
],
"filter": {
"term": {
"type": "service"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "64867032",
"_type": "_doc",
"_id": "1",
"_score": 0.8630463,
"_source": {
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags": [
"tag1",
"tag2"
],
"type": "service"
}
}
]
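If the query body is built in client code, the same structure can be produced by a small function. This is a sketch under assumptions: build_filtered_search is a hypothetical helper, and the term filter only works as intended if the "type" field is indexed as an exact (keyword) value in your mapping.

```python
import json


def build_filtered_search(text, doc_type):
    """Full-text search over all eligible fields, restricted to one type."""
    return {
        "query": {
            "bool": {
                # Scored full-text part: no "fields" given, so the index
                # default (index.query.default_field) is used.
                "must": [{"multi_match": {"query": text}}],
                # Non-scoring exact match on the "type" field.
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


body = build_filtered_search("sequencing", "service")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the request body to whichever client you use.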

How to search on queried documents in Elasticsearch?

I am a newbie in Elasticsearch and I am facing a problem. My task is searching on a set of documents. For example, I have data with struct like this:
type Doc struct{
id string
project_id string
code string
name string
status string
}
But the difficult thing is that I want to get all the documents with project_id=abc and then search on them by any of the other fields (code, name, status) that match the keyword 'test' (for example). How can I do that in an Elasticsearch query? Please help me!
Thanks.
You can use a boolean query, which matches documents matching
boolean combinations of other queries.
must behaves like the logical AND operator and should behaves like the logical OR operator.
Adding a working example with index data, search query, and search result
Index Data:
{
"id": 2,
"project_id":"abc",
"code": "a",
"name":"bhavya",
"status":"engineer"
}
{
"id": 1,
"project_id":"abc",
"code": "a",
"name":"bhavya",
"status":"student"
}
{
"id": 3,
"project_id":"def",
"code": "a",
"name":"deeksha",
"status":"engineer"
}
Search Query:
The given query satisfies the condition that "project_id" = "abc" AND "name" : "bhavya" AND "status":"student"
{
"query": {
"bool": {
"must": [
{
"match": {
"project_id": "abc"
}
},
{
"match": {
"name": "bhavya"
}
},
{
"match": {
"status": "student"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64274465",
"_type": "_doc",
"_id": "1",
"_score": 1.7021472,
"_source": {
"id": 1,
"project_id": "abc",
"code": "a",
"name": "bhavya",
"status": "student"
}
}
]
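The AND/OR behaviour of must and should can be checked against the three sample documents with a small in-memory model. This is only an illustration in Python, not Elasticsearch code: plain equality stands in for the match query, and the should case assumes a bool query with no must or filter clause, where at least one should clause has to match.

```python
# In-memory model of must (AND) and should (OR); not the Elasticsearch API.
docs = [
    {"id": 2, "project_id": "abc", "code": "a", "name": "bhavya", "status": "engineer"},
    {"id": 1, "project_id": "abc", "code": "a", "name": "bhavya", "status": "student"},
    {"id": 3, "project_id": "def", "code": "a", "name": "deeksha", "status": "engineer"},
]

clauses = [("project_id", "abc"), ("name", "bhavya"), ("status", "student")]


def clause_matches(doc, clause):
    field, value = clause
    return doc.get(field) == value


def must(documents, query_clauses):
    """Keep documents that satisfy every clause."""
    return [
        d["id"]
        for d in documents
        if all(clause_matches(d, c) for c in query_clauses)
    ]


def should(documents, query_clauses):
    """Keep documents that satisfy at least one clause."""
    return [
        d["id"]
        for d in documents
        if any(clause_matches(d, c) for c in query_clauses)
    ]


must_ids = must(docs, clauses)
should_ids = should(docs, clauses)
print(must_ids, should_ids)
```

Only document 1 satisfies all three clauses, which agrees with the search result above; documents 2 and 1 each satisfy at least one.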
I think maybe using Logstash and filtering some data can help you.
Here is my solution
GET /[index_name]/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "some text here"
}
}
],
"filter": [
{
"term": {
"project_id.keyword": "abc"
}
}
]
}
}
}

ElasticSearch rescorer plugin. How to parse inner hits with original scores

I create a query to ES:
GET my-index/_search
{
"query": {
"nested": {
"inner_hits": {},
"score_mode": "max",
"path": "my_nested_field",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"my_nested_field.value.token_analyzed": {
"query": "Looking for something like this"
}
}
}
]
}
}
]
}
}
}
},
"rescore": {
"my_plugin_name": {
}
}
}
Documents in index are something like:
{
"some_field": "some_value",
"some_other_field": "some_other_value",
"my_nested_field": [
{
"value": "some nested value",
"something_else": "something else"
},
{
"value": "some nested value 2",
"something_else": "something else 2"
}
]
}
My custom rescorer plugin is executed and everything is good. I would like to optimize my plugin though. Currently when I hit some document I use every element in my_nested_field to rescore the top level document. I would like to use only the ones that actually caused the hit for rescoring the top level document. But I don't know how to filter out the ones that did not cause the hit in the plugin.
My current code:
public TopDocs rescore(TopDocs topDocs, IndexSearcher searcher, RescoreContext rescoreContext) throws IOException {
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
Document document = searcher.doc(topDocs.scoreDocs[i].doc);
String json = parseSource(document);
}
...
private String parseSource(Document document) {
// utf8ToString() respects the BytesRef offset and length instead of reading the whole backing array
return document.getField("_source").binaryValue().utf8ToString();
}
The thing that I'm looking for is not in the path _source, but the only things I can parse like this are _source and _id. I expect it's because you can only parse stored fields. But surely there must be some way I can parse the inner hits scoring results?
In the actual ES response, right next to each document's source, there is this (but I don't know how to parse this stuff in the plugin):
"inner_hits": {
"my_nested_field": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 4.2184687,
"hits": [ // I NEED THIS STUFF NOT THE _source
{
"_index": "my-index",
"_type": "_doc",
"_id": "8b3d929a-8e90-4ce7-aa1e-7f11ec16de1e",
"_nested": {
"field": "my_nested_field",
"offset": 2
},
"_score": 4.2184687,
"_source": {
"value": "Some value which was actually hit",
}
}
]
}
}
}
Side note: I need the full document after I make the query, not just the nested fields.

ElasticSearch querying nested objects not return result

I have this object stored like this
{
"_index": "sessions_user_dev",
"_type": "user",
"_id": "322",
"_version": 1,
"_source": {
"id": 322,
"createdAt": "2015-07-09T00:12:45+00:00",
"firstName": "Amy",
"lastName": "John",
"openLocations": [
{
"id": 4,
"code": "QLD",
"label": "Queensland",
"country": "AU"
}
]
}
}
And I would like to set a term for the openLocations and here is my code
{
"query": {
"term": {
"user.openLocations.code": {
"value": "QLD"
}
}
}
}
But it always returns zero results. I have also tried changing the field to openLocations.code, without user in front, but still no luck. Also tried:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"openLocations.code": "QLD"
}
}
]
}
}
}
}
}
But still no results. I have tried to use a nested query, but it always says [nested] failed to find nested object under path [user.openLocations].
I am using Elasticsearch 5.4.
Thanks in advance
What is the mapping of openLocations.code? It should not be analyzed (for example, a keyword field) if you want to use a term query, since this type of query looks up the exact value in the index.
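A small Python model shows why the mapping matters here. It rests on assumptions: the actual mapping of openLocations.code is not shown in the question, and analyze is a very rough stand-in for the standard analyzer (lowercase and split on whitespace).

```python
# Rough model of term lookups against analyzed and non-analyzed fields.
def analyze(text):
    """Very rough stand-in for the standard analyzer."""
    return text.lower().split()


def term_query(indexed_terms, value):
    """A term query looks up the exact value, without analyzing it."""
    return value in indexed_terms


# What the index would hold for "QLD" under each kind of mapping.
analyzed_terms = set(analyze("QLD"))  # analyzed text field
keyword_terms = {"QLD"}  # not analyzed, stored verbatim

term_on_analyzed = term_query(analyzed_terms, "QLD")
lowercase_term_on_analyzed = term_query(analyzed_terms, "qld")
term_on_keyword = term_query(keyword_terms, "QLD")
print(term_on_analyzed, lowercase_term_on_analyzed, term_on_keyword)
```

If the field turns out to be analyzed, the exact term "QLD" is not in the index, which would explain the empty result.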
