ElasticSearch : search and return nested type - elasticsearch

I am pretty new to ElasticSearch and I am having trouble using nested mapping / query.
I have the following data structure added to my index :
{
"_id": "3",
"_rev": "6-e9e1bc15b39e333bb4186de05ec1b167",
"skuCode": "test",
"name": "Dragon vol. 1",
"pages": [
{
"id": "1",
"tags": [
{
"name": "dragon"
},
{
"name": "japonese"
}
]
},
{
"id": "2",
"tags": [
{
"name": "tagforanotherpage"
}
]
}
]
}
This index mapping is defined as bellow :
{
"metabook" : {
"metabook" : {
"properties" : {
"_rev" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"pages" : {
"type" : "nested",
"properties" : {
"tags" : {
"properties" : {
"name" : {
"type" : "string"
}
}
}
}
},
"skuCode" : {
"type" : "string"
}
}
}
}
}
My goal is to search all pages containing a specific tag, and return the book object with the filtered page list (I would like ES to return only pages that match the given tag). Something like (ignoring the second page) :
{
"_id": "3",
"_rev": "6-e9e1bc15b39e333bb4186de05ec1b167",
"skuCode": "test",
"name": "Dragon vol. 1",
"pages": [
{
"id": "1",
"tags": [
{
"name": "dragon"
},
{
"name": "japonese"
}
]
}
]
}
Here is the query I actually use :
{
"from": 0,
"size": 10,
"query" : {
"nested" : {
"path" : "pages",
"score_mode" : "avg",
"query" : {
"term" : { "tags.name" : "japonese" }
}
}
}
}
But it actually returns an empty result. What am I doing wrong ? Maybe I should index my "pages" directly instead of books ? What am I missing ?
Thank you in advance !

Sadly you can't get back only parts of the a document. If the document matches a query, you will get the whole thing back; the root and all nested docs. If you want to get only parts back, then you could look at using parent/child docs.
Also you aren't seeing any hits as you have a small syntax error in the nested query. Look closely at the field name:
{
"from": 0,
"size": 10,
"query" : {
"nested" : {
"path" : "pages",
"score_mode" : "avg",
"query" : {
"term" : { "pages.tags.name" : "japonese" }
}
}
}
}
If you need help with parent child docs feel free to ask! (There should be examples if you do a google search)
Good luck!

Related

ElasticSearch Query fields based on conditions on another field

Mapping
PUT /employee
{
"mappings": {
"post": {
"properties": {
"name": {
"type": "keyword"
},
"email_ids": {
"properties":{
"id" : { "type" : "integer"},
"value" : { "type" : "keyword"}
}
},
"primary_email_id":{
"type": "integer"
}
}
}
}
}
Data
POST employee/post/1
{
"name": "John",
"email_ids": [
{
"id" : 1,
"value" : "1#email.com"
},
{
"id" : 2,
"value" : "2#email.com"
}
],
"primary_email_id": 2 // Here 2 refers to the id field of email_ids.id (2#email.com).
}
I need help to form a query to check if an email id is already taken as a primary email?
eg: If I query for 1#email.com I should get result as No as 1#email.com is not a primary email id.
If I query for 2#email.com I should get result as Yes as 2#email.com is a primary email id for John.
As far as i know with this mapping you can not achive what you are expecting.
But, You can create email_ids field as nested type and add one more field like isPrimary and set value of it to true whenever email is primary email.
Index Mapping
PUT employee
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"email_ids": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"value": {
"type": "keyword"
},
"isPrimary":{
"type": "boolean"
}
}
},
"primary_email_id": {
"type": "integer"
}
}
}
}
Sample Document
POST employee/_doc/1
{
"name": "John",
"email_ids": [
{
"id": 1,
"value": "1#email.com"
},
{
"id": 2,
"value": "2#email.com",
"isPrimary": true
}
],
"primary_email_id": 2
}
Query
You need to keep below query as it is and only need to change email address when you want to see if email is primary or not.
POST employee/_search
{
"_source": false,
"query": {
"nested": {
"path": "email_ids",
"query": {
"bool": {
"must": [
{
"term": {
"email_ids.value": {
"value": "2#email.com"
}
}
},
{
"term": {
"email_ids.isPrimary": {
"value": "true"
}
}
}
]
}
}
}
}
}
Result
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.98082924,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.98082924
}
]
}
}
Interpret Result:
Elasticsearch will not return result in boolean like true or false but you can implement it at application level. You can consider value of hits.total.value from result, if it is 0 then you can consider false otherwise true.
PS: Answer is based on ES version 7.10.

Elasticsearch Nested 2 Step Sorting

Given the following data with nested objects (members within teams), I need to do a 2 step sort:
Return the youngest member of each team.
Sort the teams by the name of that youngest member.
I have a query below that is close: it does get the youngest member of each team, but then it sorts the teams using the names of all the members, not just the one selected per team.
What would the query be to do this?
And would such a query be performant assuming there was a lot of data? (Probably a few million objects each having 1-3 nested objects.)
Note: Although it's not clear in this simple example, I cannot simply store the youngest member, since in my real world case, the sorting of the nested objects is determined by a formula that includes an external parameter. This is just a very simplified example of the many sorts like this I would have to do on a larger data set, where I need to get the single best matching nested document for each outer document sorted in one way, but then sort the outer objects based on some other property of that selected nested object.
Data
PUT nested_test
{
"mappings": {
"dynamic": "strict",
"properties": {
"team": { "type": "keyword", "index": true, "doc_values": true },
"members": {
"type": "nested",
"properties": {
"name": { "type": "keyword", "index": true, "doc_values": true },
"age": { "type": "integer", "index": true, "doc_values": true}
}
}
}
}
}
PUT nested_test/_doc/1
{
"team" : "A" ,
"members" :
[
{ "name" : "Curt" , "age" : "34" } ,
{ "name" : "Dave" , "age" : "33" }
]
}
PUT nested_test/_doc/2
{
"team" : "B" ,
"members" :
[
{ "name" : "Alex" , "age" : "36" } ,
{ "name" : "Earl" , "age" : "32" }
]
}
PUT nested_test/_doc/3
{
"team" : "C" ,
"members" :
[
{ "name" : "Brad" , "age" : "35" } ,
{ "name" : "Gary" , "age" : "31" }
]
}
Attempted Query
GET nested_test/_search?filter_path=hits.hits._source.team,hits.hits.sort.*,hits.hits.inner_hits.members.hits.hits._source.*,hits.hits.inner_hits.members.hits.hits.sort.*
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "members",
"query": {
"match_all" : { }
} ,
"inner_hits": {
"size": 1,
"sort": {
"members.age": { "order": "asc" }
}
}
}
}
]
}
}
,
"sort": [
{ "members.name": {
"order": "asc" ,
"nested": {
"path": "members",
"filter": { "match_all" : { } }
}
} }
]
}
Results (If the query was correct, the teams would be in A, B, C order, but they are B, C, A)
{
"hits" : {
"hits" : [
{
"_source" : {
"team" : "B"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Earl",
"age" : "32"
}
}
]
}
}
}
},
{
"_source" : {
"team" : "C"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Gary",
"age" : "31"
}
}
]
}
}
}
},
{
"_source" : {
"team" : "A"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Dave",
"age" : "33"
}
}
]
}
}
}
}
]
}
}
I not feasable with nested sort. And you cant use the result of the inner_hits to sort your documents.
You could maybe use some runtime field with a complex script to extract the name of the youngest member at search time, but it will certainly be ugly and the performance of the query will be impacted, it will perform poorly at scale.
Since you use a nested model, you have all the data needed during indexation to store the youngest member name in a specific field at the root of the document.
Then you will be able to use a standard sort for this use case.
Its the right way to do it in Elasticsearch it you want to keep the performance.

Can't Highlight Dynamic Template Field Value in Elasticsearch

Follow up to this question.
I have a dynamic template which copies the text of a JSON blob to a single text field, and I'd like to search on that field and highlight matches. Here is my full code for ES 6.5
DELETE /test
PUT /test?include_type_name=true
{
"settings": {"number_of_shards": 1,"number_of_replicas": 1},
"mappings": {
"_doc": {
"dynamic_templates": [
{
"full_name": {
"match_mapping_type": "string",
"path_match": "content.*",
"mapping": {
"type": "text",
"copy_to": "content_text"
}
}
}
],
"properties": {
"content_text": {
"type": "text"
},
"content": {
"type": "object",
"enabled": "true"
}
}
}
}
}
PUT /test/_doc/1?refresh=true
{
"content": {
"a": {
"b": {
"text": "42"
}
}
}
}
GET /test/_search
{
"query": {
"match": {
"content_text": "42"
}
},
"highlight": {
"fields": {
"content_text": {}
}
}
}
The response does not show the highlighted content_text
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"content" : {
"a" : {
"b" : {
"text" : "42"
}
}
}
}
}
]
}
}
As you can see, the content_text field is not highlight. It's also not in the response at all. How do I get highlights for this field to show up?
This is a tricky one, but will make sense once you read what follows.
As per the official documentation on highlighting, the actual content of a field is required to exist somewhere. So if the field is not stored (i.e. the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.
In your case, the content_text field doesn't exist in the _source document (i.e. it is just indexed from other text fields present in content.*) and in the mapping, the store parameter is not set to true (it is false by default).
So you simply need to change your mapping to this:
"content_text": {
"store": true,
"type": "text"
},
And then your query will yield this:
"highlight" : {
"content_text" : [
"<em>42</em>"
]
}

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to make use of nested datatype for price and make use of nested query for your use case.
Please see the below mapping, sample document, query and response:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {} <---- Do note this.
}
}
}
The above query means, I want to return all the documents having price.price range from 0 to 10 and price.membershipLevel as Gold.
Notice that I've made use of inner_hits. The reason is despite being a nested document, ES as response would return the entire set of document instead of only the document specific to where the query clause is applicable.
In order to find the exact nested doc that has been matched, you would need to make use of inner_hits.
Below is how the response would return.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
Hope this helps!
Let me take show you how to do it, using the nested fields and query and filter context. I will take your example to show, you how to define index mapping, index sample documents, and search query.
It's important to note the include_in_parent param in Elasticsearch mapping, which allows us to use these nested fields without using the nested fields.
Please refer to Elasticsearch documentation about it.
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Above query doesn't return any result as there isn't any matching result.
P.S :- this SO answer might help you to understand nested fields and query on them in detail.
You have to use Nested fields and nested query to archive this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define you Price property with type "Nested" and then you will be able to filter by every property of nested object

Search query for elastic search

I have documents in elastic search in the following format
{
"stringindex" : {
"mappings" : {
"files" : {
"properties" : {
"BaseOfCode" : {
"type" : "long"
},
"BaseOfData" : {
"type" : "long"
},
"Characteristics" : {
"type" : "long"
},
"FileType" : {
"type" : "long"
},
"Id" : {
"type" : "string"
},
"Strings" : {
"properties" : {
"FileOffset" : {
"type" : "long"
},
"RO_BaseOfCode" : {
"type" : "long"
},
"SectionName" : {
"type" : "string"
},
"SectionOffset" : {
"type" : "long"
},
"String" : {
"type" : "string"
}
}
},
"SubSystem" : {
"type" : "long"
}
}
}
}
}
}
My requirement is when I search for a particular string (String.string) i want to get only the FileOffSet (String.FileOffSet) for that string.
How do i do this?
Thanks
I suppose that you want to perform a nested query and retrieve only one field as the result, but I see problems in your mapping, hence I will split my answer in 3 sections:
What is the problem I see:
How to query nested fields (this is more ES background):
How to find a solution:
1) What is the problem I see:
You want to query a nested field, but you don't have a nested field.
The nested field part:
The field "Strings" is not nested in the type "files" (nested data without a nested field may bring future problems), otherwise your mapping for the field "Strings" would be something like this:
{
"stringindex" : {
"mappings" : {
"files" : {
"properties" : {
"Strings" : {
"properties" : {
"type" : "nested",
"String" : {
"type" : "string"
}
}
}
}
}
}
}
}
Note: yes, I cut most of the fields, but I did this to easily show that you didn't create a nested field.
With a nested field "in hands", we need a nested query.
The specific field result part:
To retrieve only one field as result, you have to include the property "_source" in your query.
2) How to query nested fields:
This is more for ES background, if you have never worked with nested fields.
Small example:
You define a type with a nested field:
{
"nesttype" : {
"properties" : {
"name" : { "type" : "string" },
"parents" : {
"type" : "nested" ,
"properties" : {
"sex" : { "type" : "string" },
"name" : { "type" : "string" }
}
}
}
}
}
You create some inputs:
{ "name" : "Dan", "parents" : [{ "name" : "John" , "sex" : "m" },
{ "name" : "Anna" , "sex" : "f" }] }
{ "name" : "Lana", "parents" : [{ "name" : "Maria" , "sex" : "f" }] }
Then you query, but only fetch the nested field "parents.name":
{
"query": {
"nested": {
"path": "parents",
"query": {
"bool": {
"must": [
{
"term": {
"sex": "m"
}
}
]
}
}
}
},
"_source" : [ "parents.name" ]
}
The output of this query is "the name of the parents of all people who have a parent of the sex 'm' ". One entry (Dan) has a father, whereas the other (Lana) doesn't. So it only will retrieve Dan's parents names.
3) How to find a solution:
To fix your mapping:
You only need to include the type "nested" in the field "Strings":
{
"files" : {
"properties" : {
...
"Strings" : {
"type" : "nested" ,
"properties" : {
"FileOffset" : { "type" : "long" },
"RO_BaseOfCode" : { "type" : "long" },
...
}
}
...
}
}
}
To query your data:
{
"query": {
"nested": {
"path": "Strings",
"query": {
"bool": {
"must": [
{
"term": {
"String": "my string"
}
}
]
}
}
}
},
"_source" : [ "Strings.FileOffSet" ]
}
Great answer by dan, but I think he didn't mention it all.
His solution don't work for your question, but I guess you even don't know that.
Consider a scenario where data is like ,
doc_1
{
"Id": 1,
"Strings": [
{
"string": "x",
"fileoffset": "f1"
},
{
"string": "y",
"fileoffset": "f2"
}
]
}
doc_2
{
"Id": 2,
"Strings": {
"string": "z",
"fileoffset": "f3"
}
}
When you run the like dan said, like say let's apply filter with Strings.string=x then response is like,
{
"hits": [
{
"_index": "stringindex",
"_type": "files",
"_id": "11961",
"_score": 1,
"_source": {
"Strings": [
{
"fileoffset": "f1"
},
{
"fileoffset": "f2"
}
]
}
}
]
}
This is because, elasticsearch will get hits from documents where any of the object inside nested field (here Strings) pass the filter criteria. (In this case in doc_1, Strings.string=x passed filter, so doc_1 is returned. But we don't know which nested object pass the criteria.
So, you have to use nested_aggregation,
Here is a solution for you..
POST index/type/_search
{
"size": 0,
"aggs": {
"StringsNested": {
"nested": {
"path": "Strings"
},
"aggs": {
"StringFilter": {
"filter": {
"term": {
"Strings.string": "x"
}
},
"aggs": {
"FileOffsets": {
"terms": {
"field": "Strings.fileoffset"
}
}
}
}
}
}
}
}
So, response is like,
"aggregations": {
"StringsNested": {
"doc_count": 2,
"StringFilter": {
"doc_count": 1,
"FileOffsets": {
"buckets": [
{
"key": "f1",
"doc_count": 1
}
]
}
}
}
}
Remember to have mapping of Strings as nested, as dan said.

Resources