Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results. However, if I remove the second term clause and keep only the language_id clause, it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance is hugely appreciated. I am using Elasticsearch 6.1.1.

I'm not sure whether you need the entire record or not. This solution returns every record that has language_id: 1 and at least one office.data entry with office.city_id: 1.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put three different records in my test index to guard against false hits: one with a different language_id, one with different office ids, and one that matches both conditions. Only the matching one was returned.
If you only need the office data, that's a bit different but still solvable.
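To illustrate why the plain term clause finds nothing: objects under a nested mapping are indexed as separate hidden documents, so their fields are not visible on the parent document that an ordinary term query inspects. The following is a minimal plain-Python sketch of that idea, not Elasticsearch code. The helper names (split_nested, term_on_parent, nested_term) are hypothetical, and the sample document is trimmed to the fields the query uses.

```python
# Toy model: objects under a nested path are stored apart from the parent,
# so only a nested-style lookup can see their fields.

def split_nested(doc, nested_path):
    """Return (parent_fields, nested_objects) for a two-part dotted nested_path."""
    outer, inner = nested_path.split(".")
    parent = {k: v for k, v in doc.items() if k != outer}
    # Non-nested siblings (e.g. office.count) stay on the parent document.
    for key, value in doc[outer].items():
        if key != inner:
            parent[f"{outer}.{key}"] = value
    return parent, doc[outer][inner]


def term_on_parent(parent, field, value):
    """A plain term query only sees fields stored on the parent document."""
    return parent.get(field) == value


def nested_term(nested_objects, rel_path, value):
    """A nested query checks each hidden nested object on its own."""
    def lookup(obj, dotted):
        for part in dotted.split("."):
            obj = obj[part]
        return obj
    return any(lookup(obj, rel_path) == value for obj in nested_objects)


consultant = {
    "id": 123,
    "language_id": 1,
    "office": {
        "count": 2,
        "data": [
            {"id": 1234, "office": {"id": 1, "city_id": 1}},
            {"id": 5678, "office": {"id": 2, "city_id": 2}},
        ],
    },
}

parent, nested_objects = split_nested(consultant, "office.data")
plain_hit = term_on_parent(parent, "office.data.office.city_id", 1)
nested_hit = nested_term(nested_objects, "office.city_id", 1)
print(plain_hit, nested_hit)
```

In this model the parent document simply has no office.data.office.city_id field, which mirrors why the original bool query with two term clauses returned nothing while the nested query matched.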

Related

How to filter query based on a field value

I'm working with the Elasticsearch Query DSL, and I can't find a way to express the following:
Return results that have has_price=true and min budget < "price" < max budget, and also return all results that have has_price=false.
In other words, I would like to apply a range filter only to results whose has_price field is true; results with has_price set to false should be returned regardless of the filter.
Here's the mapping:
{
"formations": {
"mappings": {
"properties": {
"code": {
"type": "text"
},
"date": {
"type": "date",
"format": "dd/MM/yyyy"
},
"description": {
"type": "text"
},
"has_price": {
"type": "boolean"
},
"place": {
"type": "text"
},
"price": {
"type": "float"
},
"title": {
"type": "text"
}
}
}
}
}
The following query combines the two scenarios as two should clauses in a bool query. Because the bool query contains only should clauses, minimum_should_match defaults to 1, meaning that at least one should clause has to match:
Abstract Code Snippet
GET formations/_search
{
"query": {
"bool": {
"should": [
{ <1st scenario: has_price = false> },
{ <2nd scenario: has_price = true AND price IN budget_range> }
]
}
}
}
Actual Sample Code Snippets
# 1. Create the index and populate it with some sample documents
POST formations/_bulk
{"index": {"_id": 1}}
{"has_price": true, "price": 2.0}
{"index": {"_id": 2}}
{"has_price": true, "price": 3.0}
{"index": {"_id": 3}}
{"has_price": true, "price": 4.0}
{"index": {"_id": 4}}
{"has_price": false, "price": 2.0}
{"index": {"_id": 5}}
{"has_price": false, "price": 3.0}
{"index": {"_id": 6}}
{"has_price": false, "price": 4.0}
# 2. Query assuming min_budget = 2.0 and max_budget = 4.0
GET formations/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": {
"term": {
"has_price": false
}
}
}
},
{
"bool": {
"filter": [
{
"term": {
"has_price": true
}
},
{
"range": {
"price": {
"gt": 2,
"lt": 4
}
}
}
]
}
}
]
}
}
}
# 3. Result Snippet (4 hits: 3 from 1st scenario & 1 from 2nd scenario)
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
...
Don't forget to add the clause "minimum_should_match": 1 to your bool query if you add a non-should clause (such as must or filter) to it.
Let me know if this answers your question & solves your issue.
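As a quick sanity check on the expected hit count, the should logic can be emulated in plain Python over the same six sample documents. This is only a sketch of the boolean logic, not an Elasticsearch client call; the function name matches and the variable names are made up for illustration.

```python
# Plain-Python emulation of the two should clauses over the sample documents.

docs = {
    1: {"has_price": True, "price": 2.0},
    2: {"has_price": True, "price": 3.0},
    3: {"has_price": True, "price": 4.0},
    4: {"has_price": False, "price": 2.0},
    5: {"has_price": False, "price": 3.0},
    6: {"has_price": False, "price": 4.0},
}


def matches(doc, min_budget, max_budget):
    # 1st scenario: documents without a price always match.
    no_price = doc["has_price"] is False
    # 2nd scenario: priced documents must fall strictly inside the budget (gt / lt).
    in_budget = doc["has_price"] is True and min_budget < doc["price"] < max_budget
    # "should" with minimum_should_match = 1 behaves like a logical OR.
    return no_price or in_budget


hit_ids = sorted(doc_id for doc_id, doc in docs.items() if matches(doc, 2.0, 4.0))
print(hit_ids)  # [2, 4, 5, 6]
```

The four ids agree with the result snippet: three hits from the first scenario (4, 5, 6) and one from the second (2). Because gt and lt are exclusive, prices exactly equal to 2.0 or 4.0 are left out; gte and lte would include them.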

ElasticSearch Nested Query Does not work as expected

I need some quick help. I need to fetch the entire document only when certain conditions are all fulfilled within the same array element.
Example
The document should match only when all three of these conditions are fulfilled in one array block, i.e.:
"profile.bud.buddies.code": "1"
"profile.bud.buddies.moredata.key":"one"
"profile.bud.buddies.moredata.val": "0"
Unfortunately, right now the query goes through the entire document and matches the values across all of those arrays, so code=1 could be matched in one array, key=one in another array, and val=0 in a third. In that case it returns the entire document, even though the conditions were not all fulfilled within a single array, so the document should not have been returned.
I made moredata a nested type but still cannot get it to work. Please help.
Query I am using
"query": {
"bool": {
"should": [
{
"match": {
"profile.bud.buddies.code": "1"
}
}
]
},
"nested": {
"path": "profile.bud.buddies.moredata",
"query": {
"bool": {
"must": [
{
"match": {
"profile.bud.buddies.moredata.key": "one"
}
},
{
"match": {
"profile.bud.buddies.moredata.val": "0"
}
}
]
}
}
}
}
Document Structure
"profile": {
"x":{},
"y":{},
"a":{},
"b":{},
"bud":{
"buddies": [
{
"code":"1",
"moredata": [
{
"key": "one",
"val": 0,
"setup": "2323",
"data": "myid"
},
{
"key": "two",
"val": 1,
"setup": "23",
"data": "id"
}]
},
{
"code":"2",
"moredata": [
{
"key": "two",
"val": 0,
"setup": "2323",
"data": "myid"
},
{
"key": "three",
"val": 1,
"setup": "23",
"data": "id"
}]
}]
}
This is how I have defined the mappings:
"profile": {
"bug": {
"properties": {
"buddies": {
"properties": {
"moredata": {
"type": "nested",
"properties": {
"key": {"type": "string"},
"val": {"type": "string"}
Your query structure is incorrect. It should be something like:
"query": {
"bool": {
"must": [{
"match": {
"profile.bud.buddies.code": "1"
}
},
{
"nested": {
"path": "profile.bud.buddies.moredata",
"query": {
"bool": {
"must": [{
"match": {
"profile.bud.buddies.moredata.key": "one"
}
},
{
"match": {
"profile.bud.buddies.moredata.val": "0"
}
}
]
}
}
}
}
]
}
}
where the nested query is inside of the array of must clauses of the outer bool query. Note that profile.bud.buddies.moredata must be mapped as a nested data type.
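The structural point is that a query object holds a single query type, so bool and nested cannot sit side by side under "query"; the nested clause has to become one element of the must array. A small Python sketch that assembles the request body this way may make the shape easier to see. The helpers nested_clause and must_all are hypothetical names used only for this illustration.

```python
import json


def nested_clause(path, inner_query):
    """Wrap inner_query so it is evaluated against each nested object under path."""
    return {"nested": {"path": path, "query": inner_query}}


def must_all(*clauses):
    """Combine clauses so that every one of them has to match."""
    return {"bool": {"must": list(clauses)}}


moredata_path = "profile.bud.buddies.moredata"

query = must_all(
    {"match": {"profile.bud.buddies.code": "1"}},
    nested_clause(
        moredata_path,
        must_all(
            {"match": {f"{moredata_path}.key": "one"}},
            {"match": {f"{moredata_path}.val": "0"}},
        ),
    ),
)

body = {"query": query}
print(json.dumps(body, indent=2))
```

Building the body from small functions keeps the brackets balanced by construction, which avoids the kind of misplaced brace that is easy to introduce when editing deeply nested JSON by hand.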

ElasticSearch Return only if Nested Date in range and occurs more than X times

I am trying to write a query that returns only the documents that have X number of visits in a specific date range, i.e. only return guid-docA if it has 2 visits that are between "2017-01-01" and "2017-12-31".
**Example Data Documents**
{
"_id": "guid-docA",
"visits": [
{
"name": "Sarah",
"visitDate": "2013-02-27T00:00:00.000Z"
},
{
"name": "John",
"visitDate": "2017-02-27T00:00:00.000Z"
},
{
"name": "Jim",
"visitDate": "2017-12-27T00:00:00.000Z"
}
]
},
{
"_id": "guid-docB",
"visits": [
{
"name": "Brian",
"visitDate": "2013-02-27T00:00:00.000Z"
},
{
"name": "Kerri",
"visitDate": "2016-02-27T00:00:00.000Z"
},
{
"name": "Julia",
"visitDate": "2017-12-27T00:00:00.000Z"
}
]
}
I have tried script filter and can return the data if Visits count is greater than 2 but I haven't figured out how to get it if they have more than 1 (or X) visits in given year.
"filter": {
"script": {
"script": "_source.visits.size() > 1"
}
}
I'm not sure how to apply the date logic to it for the specified count.
Thank you in advance.
First of all, you need to make sure that the visits field is mapped with the nested datatype.
If so, you can use a nested bool query with a must clause and minimum_should_match.
Something like this:
"query": {
"nested": {
"path": "visits",
"query": {
"bool": {
"must": [
{
"range": {
"visits.visitDate": {
"gte": "2017-01-01",
"lte": "2017-12-31"
}
}
}
],
"minimum_number_should_match": 2
}
}
}
}
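Since a nested query evaluates its inner query against each nested object separately, it helps to pin down the target semantics independently of any particular query. The following plain-Python sketch states what the question asks for (at least X visits inside the date range) and can serve as a reference when checking query results against the two example documents. It is not an Elasticsearch query; visit_day and has_min_visits are hypothetical helper names, and the range is treated as inclusive on both ends at day granularity.

```python
from datetime import date, datetime


def visit_day(visit):
    """Parse the ISO timestamp used in the example data and keep only the date."""
    return datetime.strptime(visit["visitDate"], "%Y-%m-%dT%H:%M:%S.%fZ").date()


def has_min_visits(doc, start, end, min_count):
    """True when at least min_count visits fall within [start, end]."""
    in_range = [v for v in doc["visits"] if start <= visit_day(v) <= end]
    return len(in_range) >= min_count


docs = [
    {
        "_id": "guid-docA",
        "visits": [
            {"name": "Sarah", "visitDate": "2013-02-27T00:00:00.000Z"},
            {"name": "John", "visitDate": "2017-02-27T00:00:00.000Z"},
            {"name": "Jim", "visitDate": "2017-12-27T00:00:00.000Z"},
        ],
    },
    {
        "_id": "guid-docB",
        "visits": [
            {"name": "Brian", "visitDate": "2013-02-27T00:00:00.000Z"},
            {"name": "Kerri", "visitDate": "2016-02-27T00:00:00.000Z"},
            {"name": "Julia", "visitDate": "2017-12-27T00:00:00.000Z"},
        ],
    },
]

selected = [
    d["_id"]
    for d in docs
    if has_min_visits(d, date(2017, 1, 1), date(2017, 12, 31), 2)
]
print(selected)  # ['guid-docA']
```

guid-docA has two visits in 2017 and is selected; guid-docB has only one and is not.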

Elasticsearch: Query nested object contained within an object

I'm struggling to build a query where I can do a nested search across a sub-object of a document.
Say I have the following index/mapping:
curl -XPOST "http://localhost:9200/author/" -d '
{
"mappings": {
"item": {
"properties": {
"books": {
"type": "object",
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
}
}
'
And the following 2 documents in the index:
{
"id": 1,
"name": "Robert Louis Stevenson",
"books": {
"count": 2,
"data": [
{
"id": 1,
"label": "Treasure Island"
},
{
"id": 3,
"label": "Dr Jekyll and Mr Hyde"
}
]
}
}
and
{
"id": 2,
"name": "Philip K. Dick",
"books": {
"count": 1,
"data": [
{
"id": 4,
"label": "Do Android Dream of Electric Sheep"
}
]
}
}
I have an array of book IDs, say [1,4]; how would I write a query which does a keyword search of the author name AND only returns them if they wrote one of the books in the array?
I haven't managed to get a query which doesn't cause some sort of query parse_exception, but as a starting block, here's the current iteration of my query - maybe it's obvious where I'm going wrong?
{
"query": {
"bool": {
"must": {
"match": {
"label": "Louis"
}
}
},
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
},
"from": 0,
"size": 8
}
In the above scenario I'd like the document for Mr Robert Louis Stevenson to be returned, as his name contains Louis and he wrote book ID 1.
For what it's worth, the current error I get looks like this:
{
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "author",
"node": "sCk3su4YSnqhvdTGjOztlw",
"reason": {
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
}
]
},
"status": 400
}
This makes me feel like I've got my "nested" object all wrong, but the docs suggest that I'm right!
You have it almost right. The nested query simply needs to be located inside the bool query, as in the query below. Also, the match query needs to be made on the name field, since this is where the author name is stored:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Louis"
}
},
{
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
}
]
}
},
"from": 0,
"size": 8
}
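To check the expected outcome against the two sample documents, the combined condition can be emulated in plain Python. This is only a sketch of the intended logic: the whitespace tokenisation below is a rough stand-in for Elasticsearch's text analysis (an assumption, not how the analyzer actually works), and the function name matches is made up for the example.

```python
# Emulates: name matches the keyword AND at least one book id is in the given list.

authors = [
    {
        "id": 1,
        "name": "Robert Louis Stevenson",
        "books": {
            "count": 2,
            "data": [
                {"id": 1, "label": "Treasure Island"},
                {"id": 3, "label": "Dr Jekyll and Mr Hyde"},
            ],
        },
    },
    {
        "id": 2,
        "name": "Philip K. Dick",
        "books": {
            "count": 1,
            "data": [{"id": 4, "label": "Do Android Dream of Electric Sheep"}],
        },
    },
]


def matches(author, name_term, book_ids):
    # Simplified "match": lowercase the name and compare whole words.
    name_tokens = author["name"].lower().split()
    name_ok = name_term.lower() in name_tokens
    # "terms" inside "nested": any single book whose id is in the list.
    book_ok = any(book["id"] in book_ids for book in author["books"]["data"])
    return name_ok and book_ok


hits = [a["id"] for a in authors if matches(a, "Louis", {1, 4})]
print(hits)  # [1]
```

Philip K. Dick wrote book 4 but fails the name condition, so only Robert Louis Stevenson is returned, as the question expects.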

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents in ES:
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I get the wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red from the first record and the property position=P from the second record.
How can I specify that the search must match both terms in the same record (whether or not it is within a list of nested records)?
In fact, the correct answer is only document 1, with John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this example) and flattens it to a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the association between each code and its position. To avoid this, you need to explicitly define a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
PUT so/nest/1
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
returns no hits, because no single team entry is both red and P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
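The difference between the flattened and the nested view can be reproduced in plain Python with the seven documents from the question. This is only a model of the behaviour described above, not Elasticsearch code; the function names (team_entries, flatten, flattened_match, nested_match) are hypothetical.

```python
people = [
    {"id": 1, "name": "John", "team": {"code": "red", "position": "P"}},
    {"id": 2, "name": "Jack", "team": {"code": "red", "position": "S"}},
    {"id": 3, "name": "Emily", "team": {"code": "green", "position": "P"}},
    {"id": 4, "name": "Grace", "team": {"code": "green", "position": "P"}},
    {"id": 5, "name": "Steven", "team": [
        {"code": "green", "position": "S"},
        {"code": "red", "position": "S"}]},
    {"id": 6, "name": "Josephine", "team": {"code": "red", "position": "S"}},
    {"id": 7, "name": "Sydney", "team": [
        {"code": "red", "position": "S"},
        {"code": "green", "position": "P"}]},
]


def team_entries(doc):
    """Treat a single object and a list of objects uniformly."""
    team = doc["team"]
    return team if isinstance(team, list) else [team]


def flatten(doc):
    """Object mapping: values of all entries are merged per field name."""
    flat = {}
    for entry in team_entries(doc):
        for key, value in entry.items():
            flat.setdefault(f"team.{key}", []).append(value)
    return flat


def flattened_match(doc, code, position):
    flat = flatten(doc)
    return code in flat["team.code"] and position in flat["team.position"]


def nested_match(doc, code, position):
    """Nested mapping: both conditions must hold within the same entry."""
    return any(
        e["code"] == code and e["position"] == position for e in team_entries(doc)
    )


flat_hits = [p["id"] for p in people if flattened_match(p, "red", "P")]
nested_hits = [p["id"] for p in people if nested_match(p, "red", "P")]
print(flat_hits, nested_hits)  # [1, 7] [1]
```

The flattened view returns John and Sydney, which is the wrong result from the question, while the per-entry check returns only John.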
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}

Resources