Elasticsearch: Query nested object contained within an object - elasticsearch

I'm struggling to build a query where I can do a nested search across a sub-object of a document.
Say I have the following index/mapping:
curl -XPOST "http://localhost:9200/author/" -d '
{
"mappings": {
"item": {
"properties": {
"books": {
"type": "object",
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
}
}
'
And the following 2 documents in the index:
{
"id": 1,
"name": "Robert Louis Stevenson",
"books": {
"count": 2,
"data": [
{
"id": 1,
"label": "Treasure Island"
},
{
"id": 3,
"label": "Dr Jekyll and Mr Hyde"
}
]
}
}
and
{
"id": 2,
"name": "Philip K. Dick",
"books": {
"count": 1,
"data": [
{
"id": 4,
"label": "Do Android Dream of Electric Sheep"
}
]
}
}
I have an array of book IDs, say [1,4]. How would I write a query that does a keyword search on the author name AND only returns authors who wrote one of the books in the array?
I haven't managed to write a query that doesn't cause some sort of parse_exception, but as a starting point, here's the current iteration of my query. Maybe it's obvious where I'm going wrong?
{
"query": {
"bool": {
"must": {
"match": {
"label": "Louis"
}
}
},
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
},
"from": 0,
"size": 8
}
In the above scenario I'd like the document for Mr Robert Louis Stevenson to be returned, as his name contains Louis and he wrote book ID 1.
For what it's worth, the current error I get looks like this:
{
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "author",
"node": "sCk3su4YSnqhvdTGjOztlw",
"reason": {
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
}
]
},
"status": 400
}
This makes me feel like I've got my "nested" object all wrong, but the docs suggest that I'm right!

You almost have it right: the nested query simply needs to be placed inside the bool query, as in the query below. Also, the match query needs to target the name field, since that is where the author name is stored:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Louis"
}
},
{
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
}
]
}
},
"from": 0,
"size": 8
}
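If the list of book IDs changes per request, the same body can be built programmatically. The following is only a minimal sketch: the helper name build_author_query is made up for illustration, and the inner bool wrapper around the terms query is dropped because a single clause does not need it. It only constructs and prints the JSON body; sending it to the author index is left to whichever client you use.

```python
import json


def build_author_query(name_keyword, book_ids, size=8):
    """Build a search body: name must match AND at least one nested book id must match.

    Hypothetical helper for illustration; field names follow the mapping in the question.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Full-text match on the author name.
                    {"match": {"name": name_keyword}},
                    # The nested query is a clause of the bool query, not a sibling of it.
                    {
                        "nested": {
                            "path": "books.data",
                            "query": {"terms": {"books.data.id": list(book_ids)}},
                        }
                    },
                ]
            }
        },
        "from": 0,
        "size": size,
    }


body = build_author_query("Louis", [1, 4])
print(json.dumps(body, indent=2))
```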

Related

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I expect.
Please see the details below.
Indices used
dev-sample-search
user-agents-search
The way the search should work is as follows.
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber, and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers that the user sampleuser#gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid, and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com",
"user": "sampleuser#gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"terms": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}

I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com" should be "id": "sampleuser#gmail.com", because the lookup fetches the source document by its id.
If at least one should clause in the filter must match, set "minimum_should_match": 1 on the bool query that contains the should clauses.
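To make the two suggestions concrete, here is a rough sketch of the filter part with both changes applied, written as a Python dict so it can be printed and inspected. The function name build_agent_filter is hypothetical. As a simplification that is not in the original query, the three must_not clauses are folded into one terms query with a list of values. This only builds the body; it does not talk to a cluster.

```python
import json


def build_agent_filter(user_id, lookup_index="user-agents-search"):
    """Build the filter clause with the terms lookup keyed by document id.

    Hypothetical helper; index and field names are taken from the question.
    """
    return {
        "bool": {
            "must": [
                {
                    # Terms lookup: read the agentNumber values from the user's document.
                    "terms": {
                        "agentNumber": {
                            "index": lookup_index,
                            "type": "_doc",
                            "id": user_id,  # the question used "user" here
                            "path": "agentNumber",
                        }
                    }
                },
                {
                    "bool": {
                        # Assumption: one terms query replaces the three separate clauses.
                        "must_not": [
                            {"terms": {"status": ["pending", "cancelled", "app cancelled"]}}
                        ],
                        "should": [
                            {"term": {"status": {"value": "active"}}},
                            {"term": {"status": {"value": "terminated"}}},
                        ],
                        # Require at least one of the should clauses to match.
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }


agent_filter = build_agent_filter("sampleuser#gmail.com")
print(json.dumps(agent_filter, indent=2))
```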

ElasticSearch Nested Query Does not work as expected

I need some quick help: I want to fetch the entire document only when certain conditions are all fulfilled within the same array element.
Example
The document should match only when all three of these conditions are fulfilled within one array block, i.e.
"profile.bud.buddies.code": "1"
"profile.bud.buddies.moredata.key":"one"
"profile.bud.buddies.moredata.val": "0"
Unfortunately, right now the query goes through the entire document and tries to match the values across all of those arrays, so code=1 may match in one array, key=one in another, and val=0 in a third. In that case the entire document is returned, even though the conditions were not fulfilled within a single array, so it should not have been returned.
I mapped moredata as a nested type but still cannot get it to work. Please help.
Query I am using
"query": {
"bool": {
"should": [
{
"match": {
"profile.bud.buddies.code": "1"
}
}
]
},
"nested": {
"path": "profile.bud.buddies.moredata",
"query": {
"bool": {
"must": [
{
"match": {
"profile.bud.buddies.moredata.key": "one"
}
},
{
"match": {
"profile.bud.buddies.moredata.val": "0"
}
}
]
}
}
}
}
Document Structure
"profile": {
"x":{},
"y":{},
"a":{},
"b":{},
"bud":{
"buddies": [
{
"code":"1",
"moredata": [
{
"key": "one",
"val": 0,
"setup": "2323",
"data": "myid"
},
{
"key": "two",
"val": 1,
"setup": "23",
"data": "id"
}]
},
{
"code":"2",
"moredata": [
{
"key": "two",
"val": 0,
"setup": "2323",
"data": "myid"
},
{
"key": "three",
"val": 1,
"setup": "23",
"data": "id"
}]
}]
}
This is how I have defined the mappings:
"profile": {
"bug": {
"properties": {
"buddies": {
"properties": {
"moredata": {
"type": "nested",
"properties": {
"key": {"type": "string"},
"val": {"type": "string"}

Your query structure is incorrect; it should be something like:
"query": {
"bool": {
"must": [{
"match": {
"profile.bud.buddies.code": "1"
}
},
{
"nested": {
"path": "profile.bud.buddies.moredata",
"query": {
"bool": {
"must": [{
"match": {
"profile.bud.buddies.moredata.key": "one"
}
},
{
"match": {
"profile.bud.buddies.moredata.val": "0"
}
}
]
}
}
}
}
]
}
}
where the nested query is inside of the array of must clauses of the outer bool query. Note that profile.bud.buddies.moredata must be mapped as a nested data type.
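One caveat, as far as I understand the nested semantics: with only moredata mapped as nested, key and val are guaranteed to match inside a single moredata entry, but code is still matched against the flattened buddies array, so it may come from a different buddies element. If all three conditions must hold within one buddy, buddies would also need to be mapped as nested, which the mapping in the question does not show. Under that assumption, a sketch of the two-level nested query looks like this (build_buddy_query is a made-up helper name):

```python
import json


def build_buddy_query(code, key, val):
    """Build a nested-inside-nested query.

    Assumes BOTH profile.bud.buddies and profile.bud.buddies.moredata are
    mapped as nested; this mapping is an assumption, not taken from the question.
    """
    moredata_query = {
        "nested": {
            "path": "profile.bud.buddies.moredata",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"profile.bud.buddies.moredata.key": key}},
                        {"match": {"profile.bud.buddies.moredata.val": val}},
                    ]
                }
            },
        }
    }
    return {
        "query": {
            "nested": {
                # The outer nested query scopes every clause to one buddies element.
                "path": "profile.bud.buddies",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"profile.bud.buddies.code": code}},
                            moredata_query,
                        ]
                    }
                },
            }
        }
    }


buddy_query = build_buddy_query("1", "one", "0")
print(json.dumps(buddy_query, indent=2))
```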

update a particular field of elasticsearch document

Hi, I am trying to update the documents in Elasticsearch that meet specific criteria. I am using Google Sense (a Chrome extension) to make the request. The request I am making is shown below:
GET styling_rules2/product_line_filters/_update
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{"term":{"product_line_attribute": "brand"}}
],
"minimum_should_match": 1
}
},
"filter": {
"term": {
"product_line_name": "women_skirts"
}
}
}
},
"script" : "ctx._source.brand=brands"
}
A sample document is shown below:
{
"product_line_attribute_db_path": "product_filter.brand",
"product_line_attribute": "brand",
"product_line_name": "women_skirts",
"product_line_attribute_value_list": [
"vero moda",
"faballey",
"only",
"rider republic",
"dorothy perkins"
]
}
Desired result: update all documents that have product_line_attribute="brand" and product_line_name="women_skirts" so that product_line_attribute="brands".
Problem: I am getting the following error:
{
"error": {
"root_cause": [
{
"type": "search_parse_exception",
"reason": "failed to parse search source. unknown search element [script]",
"line": 18,
"col": 4
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "styling_rules2",
"node": "2ijp1pXwT46FN4on4-JPlg",
"reason": {
"type": "search_parse_exception",
"reason": "failed to parse search source. unknown search element [script]",
"line": 18,
"col": 4
}
}
]
},
"status": 400
}
thanks in advance!
You should use the _update_by_query endpoint and not _update. Also the script section is not correct, which is probably why you're getting a class_cast_exception.
Try this instead:
POST styling_rules2/product_line_filters/_update_by_query
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"term": {
"product_line_attribute": "brand"
}
}
],
"minimum_should_match": 1
}
},
"filter": {
"term": {
"product_line_name": "women_skirts"
}
}
}
},
"script": {
"inline": "ctx._source.product_line_attribute = \"brands\""
}
}
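For reference, a hedged sketch of how the same request body might look on newer Elasticsearch versions, where (to my knowledge) the filtered query no longer exists and the script key is source rather than inline. The helper name build_update_by_query is made up, and passing the new value through params is a design choice so the script text stays constant. The code only builds and prints the body that would be POSTed to the _update_by_query endpoint.

```python
import json


def build_update_by_query(attribute, line_name, new_value):
    """Build an _update_by_query body using bool/filter and a parameterised script.

    Hypothetical helper; assumes a version where "filtered" is gone and
    scripts take "source" plus "params".
    """
    return {
        "query": {
            "bool": {
                # Both conditions are exact-match filters; scoring is not needed.
                "filter": [
                    {"term": {"product_line_attribute": attribute}},
                    {"term": {"product_line_name": line_name}},
                ]
            }
        },
        "script": {
            "lang": "painless",
            "source": "ctx._source.product_line_attribute = params.new_value",
            "params": {"new_value": new_value},
        },
    }


update_body = build_update_by_query("brand", "women_skirts", "brands")
print(json.dumps(update_body, indent=2))
```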

Elasticsearch unexpected results when sorting against deeply nested attributes

I'm trying to perform some sorting based on the attributes of a document's deeply nested children.
Let's say we have an index filled with publisher documents. A publisher has a collection of books, and
each book has a title, a published flag, and a collection of genre scores. A genre_score represents how well
a particular book matches a particular genre, or in this case a genre_id.
First, let's define some mappings (for simplicity, we will only be explicit about the nested types):
curl -XPUT 'localhost:9200/book_index' -d '
{
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested"
}
}
}
}
}
}
}'
Here are our two publishers:
curl -XPUT 'localhost:9200/book_index/publisher/1' -d '
{
"name": "Best Books Publishing",
"books": [
{
"name": "Published with medium genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 50 },
{ "genre_id": 2, "score": 15 }
]
}
]
}'
curl -XPUT 'localhost:9200/book_index/publisher/2' -d '
{
"name": "Puffin Publishers",
"books": [
{
"name": "Published book with low genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 10 },
{ "genre_id": 4, "score": 10 }
]
},
{
"name": "Unpublished book with high genre_id of 1",
"published": false,
"genre_scores": [
{ "genre_id": 1, "score": 100 },
{ "genre_id": 2, "score": 35 }
]
}
]
}'
And here is the final definition of our index & mappings...
curl -XGET 'localhost:9200/book_index/_mappings?pretty=true'
...
{
"book_index": {
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested",
"properties": {
"genre_id": {
"type": "long"
},
"score": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"published": {
"type": "boolean"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Now suppose we want to query for a list of publishers and have them sorted by how well their books perform
in a particular genre. In other words, sort the publishers by the genre_score.score of one of their books
for the target genre_id.
We might write a search query like this...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"term": {
"books.genre_scores.genre_id": 1
}
}
}
}
],
"_source":false,
"query": {
"nested": {
"path": "books",
"query": {
"bool": {
"must": []
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
This correctly returns Puffin first (with a sort value of [100]) and Best Books second (with a sort value of [50]).
But suppose we only want to consider books for which published is true. This would change our expectation to have Best Books first (with a sort value of [50]) and Puffin second (with a sort value of [10]).
Let's update our nested_filter and query to the following...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"bool": {
"must": [
{
"term": {
"books.genre_scores.genre_id": 1
}
}, {
"term": {
"books.published": true
}
}
]
}
}
}
}
],
"_source": false,
"query": {
"nested": {
"path": "books",
"query": {
"term": {
"books.published": true
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Suddenly, the sort values for both publishers have become [-9223372036854775808].
Why does adding an additional term to the nested_filter in the top-level sort have this impact?
Can anyone provide some insight into why this happens, and suggest any viable solutions for the proposed query/sort?
This occurs in both ES 1.x and ES 5.
Thanks!

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I get the wrong result.
ES returns
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red from the first record and the property position=P from the second record.
How can I specify that the search must match both terms in the same record (whether or not that record is part of a list of nested records)?
In fact, the only correct answer is document 1, with John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance

When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this example) and flattens it to a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the association between the code and position of each team entry. To avoid this, you need to explicitly define a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
POST so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return empty hits, because no single team entry for Sydney is both red and P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
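The difference between the two behaviours can be illustrated without a cluster. The following is a small simulation, not Elasticsearch code: the function names are invented, the data is the dataset from the question, and text analysis is ignored in favour of plain string comparison. matches_flattened mimics the default object mapping, where each field becomes one bag of values per document, and matches_nested mimics a nested query, where both conditions must hold inside the same team entry.

```python
people = [
    {"id": 1, "name": "John", "team": {"code": "red", "position": "P"}},
    {"id": 2, "name": "Jack", "team": {"code": "red", "position": "S"}},
    {"id": 3, "name": "Emily", "team": {"code": "green", "position": "P"}},
    {"id": 4, "name": "Grace", "team": {"code": "green", "position": "P"}},
    {"id": 5, "name": "Steven", "team": [
        {"code": "green", "position": "S"},
        {"code": "red", "position": "S"}]},
    {"id": 6, "name": "Josephine", "team": {"code": "red", "position": "S"}},
    {"id": 7, "name": "Sydney", "team": [
        {"code": "red", "position": "S"},
        {"code": "green", "position": "P"}]},
]


def as_list(value):
    """Treat a single object and an array of objects the same way."""
    return value if isinstance(value, list) else [value]


def matches_flattened(doc, code, position):
    """Default object mapping: values of each field are pooled per document."""
    teams = as_list(doc["team"])
    codes = {t["code"] for t in teams}
    positions = {t["position"] for t in teams}
    return code in codes and position in positions


def matches_nested(doc, code, position):
    """Nested mapping: both conditions must hold within one team entry."""
    return any(
        t["code"] == code and t["position"] == position
        for t in as_list(doc["team"])
    )


flattened_hits = [d["name"] for d in people if matches_flattened(d, "red", "P")]
nested_hits = [d["name"] for d in people if matches_nested(d, "red", "P")]

print(flattened_hits)  # ['John', 'Sydney']
print(nested_hits)     # ['John']
```

The flattened result reproduces the unwanted Sydney hit from the question, while the nested variant returns only John.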
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
