Query with filter array field - elasticsearch

I want to return documents that include only some of array field members.
For example, I have of two order documents:\
{
"orderNumber":"ORD-111",
"items":[{"name":"part-1","status":"new"},
{"name":"part-2","status":"paid"}]
}
{
"orderNumber":"ORD-112",
"items":[{"name":"part-3","status":"paid"},
{"name":"part-4","status":"supplied"}]
}
I want to create a query so that my result will include all the order documents but only with items that match {"status":"supplied"}.
The result should look like:\
{
"orderNumber":"ORD-111",
"items":[]
}
{
"orderNumber":"ORD-112",
"items":[{"name":"part-4","status":"supplied"}]
}

You can use a nested query along with inner_hits to get only matching array values in the result
Adding a working example
Index Mapping:
{
"mappings": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"match": {
"items.status": "supplied"
}
}
]
}
},
"inner_hits": {}
}
}
}
Search Result:
"hits": [
{
"_index": "67890614",
"_type": "_doc",
"_id": "2",
"_score": 1.2039728,
"_source": {
"orderNumber": "ORD-112",
"items": [
{
"name": "part-3",
"status": "paid"
},
{
"name": "part-4",
"status": "supplied"
}
]
},
"inner_hits": {
"items": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2039728,
"hits": [
{
"_index": "67890614",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "items",
"offset": 1
},
"_score": 1.2039728,
"_source": {
"name": "part-4",
"status": "supplied" // note this
}
}
]
}
}
}
}
]

Elasticsearch flats the matching field so is unable to tell which was the actual element in the array that matches.
As previously answered you could use nested queries.
How arrays of objects are flattened
Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values. For instance, consider the following document:
PUT my-index-000001/_doc/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
The user field is dynamically added as a field of type object.
The previous document would be transformed internally into a document that looks more like this:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
The user.first and user.last fields are flattened into multi-value fields, and the association between alice and white is lost. This document would incorrectly match a query for alice AND smith:
GET my-index-000001/_search
{
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "Smith" }}
]
}
}
}
To answer your question:
If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type instead of the object data type.
Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others with the nested query:
PUT my-index-000001
{
"mappings": {
"properties": {
"user": {
"type": "nested"
}
}
}
}
PUT my-index-000001/_doc/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
GET my-index-000001/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "Smith" }}
]
}
}
}
}
}
GET my-index-000001/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "White" }}
]
}
},
"inner_hits": {
"highlight": {
"fields": {
"user.first": {}
}
}
}
}
}
}
The user field is mapped as type nested instead of type object.
This query doesn’t match because Alice and Smith are not in the same nested object.
This query matches because Alice and White are in the same nested object.
inner_hits allow us to highlight the matching nested documents.
Interacting with nested documents
Nested documents can be:
queried with the nested query.
analyzed with the nested and reverse_nested aggregations.
sorted with nested sorting.
retrieved and highlighted with nested inner hits.
Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, the nested/reverse_nested aggregations, or nested inner hits.
consider performance when taking this approach as it is by magnitudes more expensive.
for more details
ou can check the source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Related

Query that satisfies all conditions in an array

The documents are stored in the form below in Elastic Research index.
mapping
{
"mappings": {
"properties": {
"data": {
"type": "nested"
}
}
}
}
first docs
{
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "b"
}
]
}
second docs
{
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "a"
}
]
}
I want to return the document only when all values in the array are 'a' (second docs)
In this case, how should I make the query condition?
The nested query searches nested field objects as if they were indexed
as separate documents. If an object matches the search, the nested
query returns the root parent document.
When using a combination of bool query with must and must_not, it searches for each individual nested object and eliminates the objects that do not match, but if there are some nested objects left, that match with your query, you will get your results.
Try out this below search query, where all the documents are discarded who have a nested object with the b value.
Search Query:
{
"query": {
"bool": {
"must_not": {
"nested": {
"path": "data",
"query": {
"term": {
"data.value": "b"
}
}
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64329782",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "a"
}
]
}
}
]
Search Query with the combination of multiple bool and nested queries:
The below search query will also give you the required result.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": "a"
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": "b"
}
}
]
}
}
}
}
]
}
}
}

How to search on queried documents in Elasticsearch?

I am a newbie in Elasticsearch and I am facing a problem. My task is searching on a set of documents. For example, I have data with struct like this:
type Doc struct{
id string
project_id string
code string
name string
status string
}
But the difficult thing is I what to get all the documents with project_id=abc then search on them by any other fields (code,name,status) that match the keyword 'test' (for example). How can I do that in Elasticsearch query, please help me!
Thanks.
You can use a boolean query that matches documents matching
boolean combinations of other queries.
must is the same as logical AND operator and should is the same as logical OR operator
Adding a working example with index data, search query, and search result
Index Data:
{
"id": 2,
"project_id":"abc",
"code": "a",
"name":"bhavya",
"status":"engineer"
}
{
"id": 1,
"project_id":"abc",
"code": "a",
"name":"bhavya",
"status":"student"
}
{
"id": 3,
"project_id":"def",
"code": "a",
"name":"deeksha",
"status":"engineer"
}
Search Query:
The given query satisfies the condition that "project_id" = "abc" AND "name" : "bhavya" AND "status":"student"
{
"query": {
"bool": {
"must": [
{
"match": {
"project_id": "abc"
}
},
{
"match": {
"name": "bhavya"
}
},
{
"match": {
"status": "student"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64274465",
"_type": "_doc",
"_id": "1",
"_score": 1.7021472,
"_source": {
"id": 1,
"project_id": "abc",
"code": "a",
"name": "bhavya",
"status": "student"
}
}
]
I think maybe using Logstash and filtering some data can help you.
Here is my solution
GET /[index_name]/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "some text here"
}
}
],
"filter": [
{
"term": {
"project_id.keyword": "abc"
}
}
]
}
}
}

How does "must" clause with an array of "match" clauses really mean?

I have an elasticsearch query which looks like this...
"query": {
"bool": {
"must": [{
"match": {"attrs.name": "username"}
}, {
"match": {"attrs.value": "johndoe"}
}]
}
}
... and documents in the index that look like this:
{
"key": "value",
"attrs": [{
"name": "username",
"value": "jimihendrix"
}, {
"name": "age",
"value": 23
}, {
"name": "alias",
"value": "johndoe"
}]
}
Which of the following does this query really mean?
Document should contain either attrs.name = username OR attrs.value = johndoe
Or, document should contain, both, attrs.name = username AND attrs.value = johndoe, even if they may match different elements in the attrs array (this would mean that the document given above would match the query)
Or, document should contain, both, attrs.name = username AND attrs.value = johndoe, but they must match the same element in the attrs array (which would mean that the document given above would not match the query)
Further, how do I write a query to express #3 from the list above, i.e. the document should match only if a single element inside the attrs array matches both the following conditions:
attrs.name = username
attrs.value = johndoe
Must stands for "And" so a document satisfying all the clauses in match query is returned.
Must will not satisfy point 1. Document should contain either attrs.name = username OR attrs.value = johndoe- you need a should clause which works like "OR"
Whether Must will satisfy Point 2 or point 3 depends on the type of "attrs" field.
If "attr" field type is object then fields are flattened that is no relationship maintained between different fields for array. So must query will return a document if any attrs.name="username" and attrs.value="John doe", even if they are not part of same object in that array.
If you want an object in an array to act like a separate document, you need to use nested field and use nested query to match documents
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {}, --> returns matched nested documents
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
hits in the response will contain all nested documents , to get all matched nested documents , inner_hits has to be specified
Based on your requirements you need to define your attrs field as nested, please refer nested type in Elasticsearch for more information. Disclaimer : it maintains the relationship but costly to query.
Answer to your other two questions also depends on what data type you are using please refer nested vs object data type for more details
Edit: solution using sample mapping, example docs and expected result
Index mapping using nested type
{
"mappings": {
"properties": {
"attrs": {
"type": "nested"
}
}
}
}
Index 2 sample doc one which severs the criteria and other which doesn't
{
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
}
Another which serves criteria
{
"attrs": [
{
"name": "username",
"value": "jimihendrix"
},
{
"name": "age",
"value": 23
},
{
"name": "alias",
"value": "johndoe"
}
]
}
And search query
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
And Search result
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_score": 1.7509375,
"_source": {
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
},
"inner_hits": {
"attrs": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7509375,
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "attrs",
"offset": 0
},
"_score": 1.7509375,
"_source": {
"name": "username",
"value": "johndoe"
}
}
]
}
}
}
}
]

ElasticSearch querying nested objects not return result

I have this object stored like this
{
"_index": "sessions_user_dev",
"_type": "user",
"_id": "322",
"_version": 1,
"_source": {
"id": 322,
"createdAt": "2015-07-09T00:12:45+00:00",
"firstName": "Amy",
"lastName": "John",
"openLocations": [
{
"id": 4,
"code": "QLD",
"label": "Queensland",
"country": "AU"
}
]
}
}
And I would like to set a term for the openLocations and here is my code
{
"query": {
"term": {
"user.openLocations.code": {
"value": "QLD"
}
}
}
}
But it always return zero result. I have also tried to change the field to openLocations.code without user infront but still no luck. Also tried:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"openLocations.code": "QLD"
}
}
]
}
}
}
}
}
But still no result. Have tried to use nested query but it always say [nested] failed to find nested object under path [user.openLocations].
My Elasticsearch 5.4
Thx in advance
What is the mapping of openLocations.code? It should be not analyzed if you want to use a term query, since this type of query will look up the exact value from the index.

Aggregations elasticsearch 5

In my elastic search index has following type of entries.
{
"_index": "employees",
"_type": "employee",
"_id": "10000",
"_score": 1.3640093,
"_source": {
"itms": {
"depc": [
"IT",
"MGT",
"FIN"
],
"dep": [
{
"depn": "Information Technology",
"depc": "IT"
},
{
"depn": "Management",
"depc": "MGT"
},
{
"depn": "Finance",
"depc": "FIN"
},
{
"depn": "Finance",
"depc": "FIN"
}
]
}
}
}
Now I an trying to get unique department list including department code (depc) and department name (depn).
I was trying following but it doesn't give result what I expected.
{
"size": 0,
"query": {},
"aggs": {
"departments": {
"terms": {
"field": "itms.dep.depc",
"size": 10000,
"order": {
"_term": "asc"
}
},
"aggs": {
"department": {
"terms": {
"field": "itms.dep.depn",
"size": 10
}
}
}
}
}
}
Any suggestions are appreciated.
Thanks You
From your agg query, it seems like the mapping type for itms.dep is object and not nested
Lucene has no concept of inner objects, so Elasticsearch flattens
object hierarchies into a simple list of field names and values.
Hence, your doc has internally transformed to :
{
"depc" : ["IT","MGT","FIN"],
"dep.depc" : [ "IT","MGT","FIN"],
"dep.depn" : [ "Information Technology", "Management", "Finance" ]
}
i.e. you have lost the association between depc and depn
To fix this :
You need to change your object type to nested
Use nested aggregation
The structure of your existing agg query seems fine to me but you will have to convert it to a nested aggregation post the mapping update

Resources