How do I search an array that's nested in an array of objects in Elastic? - elasticsearch

I have an Elastic index that contains objects structured like this:
{
dogs: [
{
name: 'wiener dog',
id: 2,
cats: [
{
name: 'mean cat',
id: 5,
},
...
],
},
...
],
...
}
My question is: How do I search against this index for all documents that include a particular id in cats? A single match is fine.
What I have tried: I have tried many different queries, including nesting on dogs, and nesting on both dogs and cats. I have tried accessing the property directly via dogs.cats.id, and all combinations of the above. Here is an example in NEST:
query &= mst.Nested(n => n
.Path("dogs")
.Query(q => q
.Nested(n => n
.Path("dogs.cats")
.Query(q => q
.Terms(t => t
.Field("dogs.cats.id")
.Terms(catIds.ToList())
)
)
)
)
);
I have also tried with a single Nested with Field set to cats.id with no luck.
Any help here would be greatly appreciated. Changing the data structure at this point would be a much larger effort, and would be avoided if possible. Thanks!

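Based on what you describe, a nested query looks like the right tool, provided both dogs and dogs.cats are mapped as nested. Here is a complete example with mapping, two sample documents, and a search: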
PUT bug_reports
{
"mappings": {
"properties": {
"dogs": {
"type": "nested",
"properties": {
"cats": {
"type": "nested"
}
}
}
}
}
}
POST bug_reports/_doc/1
{
"dogs": [
{
"name": "wiener dog",
"id": 1,
"cats": [
{
"name":"red cat",
"id": 4
},
{
"name":"mean cat",
"id": 5
}
]
}
]
}
POST bug_reports/_doc/2
{
"dogs": [
{
"name": "none dog",
"id": 2,
"cats": [
{
"name":"mean cat",
"id": 5
}
]
}
]
}
GET bug_reports/_search?filter_path=hits.hits
{
"query": {
"nested": {
"path": "dogs",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "dogs.cats",
"query": {
"terms": {
"dogs.cats.id": [
4
]
}
}
}
},
{
"nested": {
"path": "dogs.cats",
"query": {
"terms": {
"dogs.cats.id": [
5
]
}
}
}
}
]
}
}
}
}
}
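Note that the search above puts two nested clauses in must, so it only matches documents where the same dog has both cat 4 and cat 5 (document 1 here). Since the question says a single match is enough, one terms clause carrying all the ids should be sufficient. A minimal Python sketch that builds such a request body, assuming the mapping above (the function name cats_query is just illustrative):

```python
import json


def cats_query(cat_ids):
    """Build a search body matching documents where any dog has
    at least one cat whose id is in cat_ids."""
    return {
        "query": {
            "nested": {
                "path": "dogs",
                "query": {
                    "nested": {
                        "path": "dogs.cats",
                        # terms matches if the field equals ANY of the listed values
                        "query": {"terms": {"dogs.cats.id": list(cat_ids)}},
                    }
                },
            }
        }
    }


body = cats_query([4, 5])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of GET bug_reports/_search.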

Related

Elasticsearch filter by nested fields

I have a problem with creating a query to Elasticsearch with many conditions. My model looks like:
data class Product(
@Id
val id: String? = null,
val category: String,
val imagesUrls: List<String>,
@Field(type = FieldType.Double)
val price: Double?,
@Field(type = FieldType.Nested)
val parameters: List<Parameter>?
)
data class Parameter(
val key: String,
val values: List<String>
)
I would like to query products by:
category (for example cars)
price (between 20k $ and 50k $)
parameters (a product can have many; for example, key capacity with values 4L or 5L, and a second parameter with key gear transmission and value manual)
My current query looks like this:
GET data/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{"term": {
"parameters.key.keyword": {
"value": "Capacity"
}
}},
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
]
}
}
}
}
]
}
}
}
Could you tell me how to filter products where the parameter key equals Capacity and the values list contains at least one of the given values?
How can I combine several conditions of this kind in one query?
Example data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
The search query shown below queries the data based on:
category (for example cars)
parameters, for example products with several parameters, such as key capacity with values 4L, 5L and a second parameter gear transmission with values manual
Here is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"parameters": {
"type": "nested"
}
}
}
}
Index Data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"gear transmission",
"values":["4L","5L"]
},
{
"key":"capacity",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":["4L","5L"]
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "capacity"
}
},
{
"terms": {
"parameters.values": [
"4l",
"5l"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "gear transmission"
}
},
{
"match": {
"parameters.values": "automcatic"
}
}
]
}
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "bstof",
"_type": "_doc",
"_id": "1",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": "4L"
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
},
{
"_index": "bstof",
"_type": "_doc",
"_id": "2",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": [
"4L",
"5L"
]
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
}
]
When you need to match any one value from a list, you can use a terms query instead of term. Change this part of the query from:
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
to:
{
"terms": {
"parameters.values": [
"4L",
"5L"
]
}
}
Note that if parameters.values is an analysed field and a keyword sub-field exists for it, you should use that instead, e.g. parameters.values.keyword
You can read more on terms query here.
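To combine several parameter conditions with the category and the price range from the question, the clauses can be generated rather than written by hand. This is a minimal Python sketch, not a definitive implementation: the helper names are made up, it assumes the mapping from the working example above (parameters is nested, the other fields are dynamically mapped), and the values are lowercased because an analysed text field is assumed.

```python
import json


def parameter_clause(key, values):
    """One nested clause: a single parameter object must have this key
    AND at least one of the given values."""
    return {
        "nested": {
            "path": "parameters",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"parameters.key": key}},
                        {"terms": {"parameters.values": list(values)}},
                    ]
                }
            },
        }
    }


def product_query(category, price_min, price_max, parameters):
    """Combine category, price range, and one nested clause per parameter."""
    must = [
        {"term": {"category.keyword": {"value": category}}},
        {"range": {"price": {"gte": price_min, "lte": price_max}}},
    ]
    must += [parameter_clause(key, values) for key, values in parameters.items()]
    return {"query": {"bool": {"must": must}}}


body = product_query(
    "cars",
    20000,
    50000,
    {"capacity": ["4l", "5l"], "gear transmission": ["manual"]},
)
print(json.dumps(body, indent=2))
```

Each parameter gets its own nested clause so that the key and the values are checked against the same parameter object.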

Elasticsearch: don't return document if any of nested object field matches term value

I'm struggling to write a query that does not return a document if a field value in any of its nested objects matches a term value passed in the query.
Document sample:
{
"id": 1,
"test": "name",
"rules": [
{
"id": 2,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 1
},
{
"questionDetailId": 2
}
]
},
{
"id": 3,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 4
},
{
"questionDetailId": 5
}
]
}
]
}
The rules field has the nested type.
My nested search query is:
{
"query": {
"nested": {
"path": "rules",
"query": {
"bool": {
"must_not": [
{
"terms": {
"rules.questionDetailConditionalRules.questionDetailId": [
1
]
}
}
]
}
}
}
}
}
Expected result: the document should not be returned
Actual result: document is returned.
Am I missing anything in my query?
I was able to reproduce your issue and fix it; the step-by-step solution follows. You need to move the nested query inside the must_not block and make a few modifications to your query.
Index def
{
"mappings" :{
"properties" :{
"rules" :{
"type" : "nested"
}
}
}
}
Index your sample doc
{
"rules": [
{
"id": 2,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 1
},
{
"questionDetailId": 2
}
]
},
{
"id": 3,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 4
},
{
"questionDetailId": 5
}
]
}
]
}
Search Query
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "rules", --> note `nested` is inside the `must_not` block.
"query": {
"bool": {
"filter": [
{
"term": {
"rules.questionDetailConditionalRules.questionDetailId": 1
}
}
]
}
}
}
}
]
}
}
}
Search result
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
Note: you can find more info in this link.
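The reason the placement matters is that each nested object is evaluated on its own, and a nested query matches the parent as soon as any one nested object satisfies the inner query. The following Python sketch only simulates that matching logic in memory on the sample document (it does not talk to Elasticsearch, and the function names are made up for illustration):

```python
def rule_contains(rule, question_detail_id):
    """True if this single rule references the given questionDetailId."""
    return any(
        cond["questionDetailId"] == question_detail_id
        for cond in rule["questionDetailConditionalRules"]
    )


def nested_then_must_not(doc, question_detail_id):
    # nested { bool { must_not { terms } } }:
    # the document matches if ANY single rule does not contain the id.
    return any(not rule_contains(rule, question_detail_id) for rule in doc["rules"])


def must_not_then_nested(doc, question_detail_id):
    # bool { must_not { nested { term } } }:
    # the document matches only if NO rule contains the id.
    return not any(rule_contains(rule, question_detail_id) for rule in doc["rules"])


doc = {
    "rules": [
        {"id": 2, "questionDetailConditionalRules": [
            {"questionDetailId": 1}, {"questionDetailId": 2}]},
        {"id": 3, "questionDetailConditionalRules": [
            {"questionDetailId": 4}, {"questionDetailId": 5}]},
    ]
}

print(nested_then_must_not(doc, 1))  # True: rule 3 has no id 1, so the document is returned
print(must_not_then_nested(doc, 1))  # False: rule 2 has id 1, so the document is excluded
```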

Full-text search through complex structure Elasticsearch

I have the following issue in case of a full-text search in Elasticsearch. I would like to search for all indexed attributes. However, one of my Project attributes is a very complex array of hashes/objects:
[
{
"title": "Group 1 title",
"name": "Group 1 name",
"id": "group_1_id",
"items": [
{
"pos": "1",
"title": "Position 1 title"
},
{
"pos": "1.1",
"title": "Position 1.1 title",
"description": "<p>description</p>",
"extra_description": {
"rotation": "2 years",
"amount": "1.947m²"
},
"inputs": {
"unit_price": true,
"total_net": true
},
"additional_inputs": [
{
"name": "additonal_input_name",
"label": "Additional input label:",
"placeholder": "Additional input placeholder",
"description": "Additional input description",
"type": "text"
}
]
}
]
}
]
My mappings look like this:
{:title=>{:type=>"text", :analyzer=>"english"},
:description=>{:type=>"text", :analyzer=>"english"},
:location=>{:type=>"keyword"},
:company=>{:type=>"keyword"},
:created_at=>{:type=>"date"},
:due_date=>{:type=>"date"},
:specification=>
{:type=>:nested,
:properties=>
{:id=>{:type=>"keyword"},
:title=>{:type=>"text"},
:items=>
{:type=>:nested,
:properties=>
{:pos=>{:type=>"keyword"},
:title=>{:type=>"text"},
:description=>{:type=>"text", :analyzer=>"english"},
:extra_description=>{:type=>:nested, :properties=>{:rotation=>{:type=>"keyword"}, :amount=>{:type=>"keyword"}}},
:additional_inputs=>
{:type=>:nested,
:properties=>
{:label=>{:type=>"keyword"},
:placeholder=>{:type=>"text"},
:description=>{:type=>"text"},
:type=>{:type=>"keyword"},
:name=>{:type=>"keyword"}
}
}
}
}
}
}
}
The question is: how do I search through it properly? For non-nested attributes it works like a charm, but when I search by title in the specification, for instance, no result is returned. I tried both:
query:
{ nested:
{
multi_match: {
query: keyword,
fields: ['title', 'description', 'company', 'location', 'specification']
}
}
}
Or
{
nested: {
path: 'specification',
query: {
multi_match: {
query: keyword
}
}
}
}
Without any result.
Edit:
It's with elasticsearch-ruby for Ruby.
I am trying to query by: MODEL_NAME.all.search(query: with_specification("Group 1 title")) where with_specification is:
def with_specification(keyword)
{
bool: {
should: [
{
nested: {
path: 'specification',
query: {
bool: {
should: [
{
match: {
'specification.title': keyword,
}
},
{
multi_match: {
query: keyword,
fields: [
'specification.title',
'specification.id'
]
}
},
{
nested: {
path: 'specification.items',
query: {
match: {
'specification.items.title': keyword,
}
}
}
}
]
}
}
}
}
]
}
}
end
Querying on multi-level nested documents must follow a certain schema.
You cannot multi-match on nested & non-nested fields at the same time and/or query on nested fields under different paths.
You can wrap your queries in a bool-should but keep the 2 rules above in mind:
GET your_index/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "specification",
"query": {
"bool": {
"should": [
{
"match": {
"specification.title": "TEXT" <-- standalone match
}
},
{
"multi_match": { <-- multi-match but 1st level path
"query": "TEXT",
"fields": [
"specification.title",
"specification.id"
]
}
},
{
"nested": {
"path": "specification.items", <-- 2nd level path
"query": {
"match": {
"specification.items.title": "TEXT"
}
}
}
}
]
}
}
}
}
]
}
}
}
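To apply the two rules systematically, the fields can be grouped by the chain of nested paths they live under, with one clause generated per group. A minimal Python sketch under that assumption (the helper names and the groups list are illustrative; the field names come from the mapping in the question):

```python
import json


def wrap_nested(paths, query):
    """Wrap query in one nested clause per path, outermost path first in `paths`."""
    for path in reversed(paths):
        query = {"nested": {"path": path, "query": query}}
    return query


def full_text_query(text, groups):
    """groups: list of (nested_paths, fields); all fields of a group share one path."""
    should = [
        wrap_nested(paths, {"multi_match": {"query": text, "fields": list(fields)}})
        for paths, fields in groups
    ]
    return {"query": {"bool": {"should": should}}}


groups = [
    ([], ["title", "description"]),  # root-level fields, no nested wrapper
    (["specification"], ["specification.title", "specification.id"]),
    (
        ["specification", "specification.items"],
        ["specification.items.title", "specification.items.description"],
    ),
]

body = full_text_query("TEXT", groups)
print(json.dumps(body, indent=2))
```

Each multi_match only ever sees fields under a single path, so nested and non-nested fields are never mixed in one clause.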

ElasticSearch - Get only matching nested objects with All Top level fields in search response

Let's say I have the following document:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
},
{
name: 'xyz',
surname: 'wef'
},
{
name: 'defg',
surname: 'pqr'
}
]
}
I want to get only the matching nested objects, together with all top-level fields, in the search response.
That is, if I search/filter for users with name 'abc', I want the response below:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
}
]
}
How can I do that?
Reference : select matching objects from array in elasticsearch
If you're ok with having all root fields except the nested one and then only the matching inner hits in the nested field, then we can re-use the previous answer like this by specifying a slightly more involved source filtering parameter:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": { <---- this is where the magic happens
"_source": [
"name", "surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Maybe late, but I use nested sorting to limit the elements of my nested relation. Here is an example:
"sort": {
"ouverture.periodesOuvertures.dateDebut": {
"order": "asc",
"mode": "min",
"nested_filter": {
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
"nested_path": "ouverture.periodesOuvertures"
}
},
Since ES 5.5 (I think) you can use a filter on a nested query.
Here is an example of the nested query filter I use:
{
"nested": {
"path": "ouverture.periodesOuvertures",
"query": {
"bool": {
"must": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
],
"filter": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
Hope this can help ;)
Also, if your ES is not on the latest version (5.5), inner_hits could slow your query down; see "Including inner hits drastically slows down query results".
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-inner-hits.html#nested-inner-hits-source
"inner_hits": {
"_source" : false,
"stored_fields" : ["name", "surname"]
}
but you may need to change the mapping so that those fields are stored ("store": true); otherwise you can use
"inner_hits": {}
to get a result that is not quite as tidy.
You can make such a request, but the response will have internal fields starting with _
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {},
"query": {
"bool": {
"must": [
{ "match": { "users.name": "abc" }}
]
}
}
}
}
}
In one of my projects, I needed to retrieve the unique conversation message texts (inner fields like messages.text) that have specific tags. So instead of using inner_hits, I used an aggregation like the one below:
final NestedAggregationBuilder aggregation = AggregationBuilders.nested("parentPath", "messages").subAggregation(AggregationBuilders.terms("innerPath").field("messages.tag"));
final NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.addAggregation(aggregation).build();
final Aggregations aggregations = elasticsearchOperations.search(searchQuery, Conversation.class).getAggregations();
final ParsedNested parentAgg = (ParsedNested) aggregations.asMap().get("parentPath");
final Aggregations childAgg = parentAgg.getAggregations();
final ParsedStringTerms childParsedNested = (ParsedStringTerms) childAgg.asMap().get("innerPath");
// Here you will get unique expected inner fields in key part.
Map<String, Long> agg = childParsedNested.getBuckets().stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));
I use the following body to get that result (I have set the full path to the values):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Also another way exists:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": false,
"docvalue_fields": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
See results in inner_hits of the result hits.
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/inner-hits.html#nested-inner-hits-source
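Because the matching nested objects come back under inner_hits rather than inside _source, the response shape asked for in the question has to be assembled on the client. A minimal Python sketch, assuming the _source-based inner_hits variants above (not the docvalue_fields one, which returns a fields section instead); the sample hit is hand-written and trimmed, not real output:

```python
def merge_inner_hits(hit, field="users"):
    """Return the root _source with `field` replaced by the matching nested objects."""
    merged = dict(hit["_source"])
    inner = (
        hit.get("inner_hits", {})
        .get(field, {})
        .get("hits", {})
        .get("hits", [])
    )
    merged[field] = [inner_hit["_source"] for inner_hit in inner]
    return merged


# Hand-written, trimmed example of one element of hits.hits (an assumption).
hit = {
    "_id": "1",
    "_source": {"id": 1, "name": "xyz"},
    "inner_hits": {
        "users": {
            "hits": {
                "hits": [
                    {"_source": {"name": "abc", "surname": "def"}},
                ]
            }
        }
    },
}

print(merge_inner_hits(hit))
# {'id': 1, 'name': 'xyz', 'users': [{'name': 'abc', 'surname': 'def'}]}
```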

Combining and with or conditions

I am stuck on a query that has to combine several conditions.
The properties of the catalog are the following:
_id:integer
parentID: integer
path: string
level: integer
I have absolutely no clue how to combine them so that the query returns what I need:
a) _id has to be one of a given list ("_id": ["7","10"]) OR
b) parentID has to equal a given integer ("_parentID": "1") OR
c) path has to match a special pattern ("regexp": {"path": "/foobar.*"}) AND level has to be between two integers ("range": {"level": {"gte": 2, "lte": 3 } })
Additionally, all entries have to be from one defined catalog.
I will not write down all my attempts. I tried to use a bool query with must and should, but this does not cover c):
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"type": {
"value": "category"
}
}
],
"should": [
{
"regexp": {
"path": "/foobar.*"
}
},
{
"range": {
"level": {
"gte": 2,
"lte": 3
}
}
},
{
"term": {
"_id": [
"7",
"10"
]
}
}
]
}
}
}
}
}
What is the best way to combine AND and OR conditions? I am kind of lost.
I think this should be pretty darn close to what you need.
GET devdev/alert/_search
{
"filter": {
"or": {
"filters": [
{
"terms": {
"_id": [
"eee75eJpRua4HasVzz0PeA",
"VALUE2"
]
}
},
{
"term": {
"_parentID": "SE.SE.0000"
}
},
{
"and": {
"filters": [
{
"regexp": {
"path": "/foobar.*"
}
},
{
"range": {
"level": {
"from": 2,
"to": 3
}
}
}
]
}
}
]
}
}
}
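On newer Elasticsearch versions, where the and/or filters and the filtered query are no longer available, the same logic can be expressed with nested bool queries: (a OR b OR (c1 AND c2)) AND catalog. A minimal Python sketch that builds such a body; the function name and the catalog_filter argument are assumptions, since the right clause for restricting to one catalog depends on your mapping:

```python
import json


def catalog_query(catalog_filter, ids, parent_id, path_pattern, level_min, level_max):
    """(ids OR parent OR (path AND level)) AND catalog_filter."""
    return {
        "query": {
            "bool": {
                # Always required: restrict to the one catalog.
                "filter": [catalog_filter],
                # At least one of the three alternatives must match.
                "should": [
                    {"terms": {"_id": list(ids)}},
                    {"term": {"parentID": parent_id}},
                    {
                        "bool": {
                            "must": [
                                {"regexp": {"path": path_pattern}},
                                {"range": {"level": {"gte": level_min, "lte": level_max}}},
                            ]
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


# Hypothetical catalog clause; the field name "catalog" is an assumption.
body = catalog_query({"term": {"catalog": "category"}}, ["7", "10"], 1, "/foobar.*", 2, 3)
print(json.dumps(body, indent=2))
```

minimum_should_match is set explicitly because should clauses become optional once a bool query also has a filter or must clause.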
