Elasticsearch - Get only matching nested objects with all top-level fields in search response

Let's say I have the following document:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
},
{
name: 'xyz',
surname: 'wef'
},
{
name: 'defg',
surname: 'pqr'
}
]
}
I want to get only the matching nested objects, together with all top-level fields, in the search response.
That is, if I search/filter for users with name 'abc', I want the response below:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
}
]
}
How can I do that?
Reference: Select matching objects from array in Elasticsearch

If you're OK with getting all root-level fields except the nested one in _source, and only the matching nested objects in inner_hits, then we can reuse the previous answer by specifying a slightly more involved source filtering parameter. The inner_hits section of the nested query is where the magic happens:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"name", "surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
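With this request, each hit carries the top-level fields in _source and the matching nested objects under inner_hits.users.hits.hits. Here is a minimal sketch, assuming the search response has already been parsed into a Python dict, of stitching the two back into the shape asked for in the question. The helper name merge_inner_hits and the sample hit are illustrative only; they are not part of any Elasticsearch client.

```python
def merge_inner_hits(hit, path="users"):
    """Return the hit's top-level fields plus only the matching nested objects."""
    doc = dict(hit.get("_source", {}))
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    # Each inner hit has its own _source holding one matching nested object.
    doc[path] = [h["_source"] for h in inner]
    return doc


# Hand-written sample shaped like a single search hit; not real cluster output.
sample_hit = {
    "_id": "1",
    "_source": {"id": 1, "name": "xyz"},
    "inner_hits": {
        "users": {
            "hits": {
                "hits": [
                    {"_source": {"name": "abc", "surname": "def"}}
                ]
            }
        }
    },
}

merged = merge_inner_hits(sample_hit)
print(merged)
```

The merge happens on the client side; Elasticsearch itself still returns the two parts separately.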

Maybe late, but I use nested sorting to limit the elements of my nested relation. Here is an example:
"sort": {
"ouverture.periodesOuvertures.dateDebut": {
"order": "asc",
"mode": "min",
"nested_filter": {
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
"nested_path": "ouverture.periodesOuvertures"
}
},
Since ES 5.5 (I think), you can use filter in a nested query.
Here is an example of a nested query filter that I use:
{
"nested": {
"path": "ouverture.periodesOuvertures",
"query": {
"bool": {
"must": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
],
"filter": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
Hope this helps ;)
Also, if your ES is not on the latest version (5.5), inner_hits could slow your query down; see "Including inner hits drastically slows down query results".

https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-inner-hits.html#nested-inner-hits-source
"inner_hits": {
"_source" : false,
"stored_fields" : ["name", "surname"]
}
but you may need to change the mapping to mark those fields as stored ("store": true). Otherwise, you can use
"inner_hits": {}
to get a result that is not quite as clean.

You can make a request like this, but the hits in the response will carry internal fields starting with _ (such as _index, _id and _source):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {},
"query": {
"bool": {
"must": [
{ "match": { "users.name": "abc" }}
]
}
}
}
}
}

In one of my projects, I needed to retrieve the unique text of conversation messages (inner fields like messages.text) having specific tags. So instead of using inner_hits, I used an aggregation like the one below:
final NestedAggregationBuilder aggregation = AggregationBuilders.nested("parentPath", "messages").subAggregation(AggregationBuilders.terms("innerPath").field("messages.tag"));
final NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.addAggregation(aggregation).build();
final Aggregations aggregations = elasticsearchOperations.search(searchQuery, Conversation.class).getAggregations();
final ParsedNested parentAgg = (ParsedNested) aggregations.asMap().get("parentPath");
final Aggregations childAgg = parentAgg.getAggregations();
final ParsedStringTerms childParsedNested = (ParsedStringTerms) childAgg.asMap().get("innerPath");
// Here you will get unique expected inner fields in key part.
Map<String, Long> agg = childParsedNested.getBuckets().stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));

I use the following body to get that result (I have set the full path to the values):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Another way also exists, using docvalue_fields:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": false,
"docvalue_fields": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
See the results in the inner_hits of each hit in the response.
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/inner-hits.html#nested-inner-hits-source

Related

How to group by field in nested path and sort the group using field in outer level and get the all the level fields in result?

I'm new to Elasticsearch and I have a scenario where I need to group by a field in a nested path and sort each group based on a field at the outer level.
Example: the data format looks like this.
`
[
{
"percent": "1.0481265",
"subject": {
"id": "1234"
},
"updatedDate": "2016-12-31T00:00:00.000Z",
"id": "test"
},
{
"percent": "1.0581265",
"subject": {
"id": "1234"
},
"updatedDate": "2017-12-31T00:00:00.000Z",
"id": "test"
}
]
`
I need to group by subject.id, sort by updatedDate within each group, and get the most recently updated entry as the result, with fields from all levels.
`
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "subject",
"ignore_unmapped": true,
"query": {
"bool": {
"filter": [
{
"terms": {
"subject.id": [
"1234"
]
}
}
]
}
}
}
}
]
}
},
"sort": [
{
"updatedDate": {
"order": "desc",
"unmapped_type": "long"
}
}],
"aggs": {
"agg_ids": {
"nested": {
"path": "subject"
},
"aggs": {
"by_id": {
"terms": {
"field": "subject.id"
}
}
}
}
}
}
`
I'm able to group the data by id, but I'm not able to sort each group by updated date and return all the fields.

Query that satisfies all conditions in an array

The documents are stored in the form below in an Elasticsearch index.
Mapping:
{
"mappings": {
"properties": {
"data": {
"type": "nested"
}
}
}
}
First document:
{
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "b"
}
]
}
Second document:
{
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "a"
}
]
}
I want to return a document only when all values in the array are 'a' (the second document).
How should I write the query condition in this case?
As the nested query documentation puts it: the nested query searches nested field objects as if they were indexed as separate documents. If an object matches the search, the nested query returns the root parent document.
When you combine must and must_not in a bool query around nested queries, each nested object is checked individually: must_not discards every parent document that has at least one matching nested object, and must keeps only parents that have at least one matching nested object.
Try the search query below, which discards all documents that have a nested object with the value b.
Search Query:
{
"query": {
"bool": {
"must_not": {
"nested": {
"path": "data",
"query": {
"term": {
"data.value": "b"
}
}
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64329782",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"data": [
{
"value": "a"
},
{
"value": "a"
},
{
"value": "a"
}
]
}
}
]
Search query combining multiple bool and nested queries:
The search query below will also give you the required result.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": "a"
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": "b"
}
}
]
}
}
}
}
]
}
}
}
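Both queries rely on the same rule: a nested clause matches a parent document as soon as any one of its nested objects matches. The toy model below reproduces that rule in plain Python (no Elasticsearch involved) so the difference between the two queries is easy to see. The function names and the extra empty document are illustrative assumptions, not part of the original example.

```python
def nested_matches(doc, path, predicate):
    """Model of a nested query: true if ANY nested object satisfies the predicate."""
    return any(predicate(obj) for obj in doc.get(path, []))


def must_not_b(doc):
    # Models the first query: must_not nested(data.value == "b")
    return not nested_matches(doc, "data", lambda o: o["value"] == "b")


def must_a_and_must_not_b(doc):
    # Models the second query: must nested(a) combined with must_not nested(b)
    return nested_matches(doc, "data", lambda o: o["value"] == "a") and must_not_b(doc)


first = {"data": [{"value": "a"}, {"value": "a"}, {"value": "b"}]}
second = {"data": [{"value": "a"}, {"value": "a"}, {"value": "a"}]}
empty = {"data": []}  # hypothetical document with no nested objects

for name, doc in [("first", first), ("second", second), ("empty", empty)]:
    print(name, must_not_b(doc), must_a_and_must_not_b(doc))
```

In this model the must_not-only form also lets through a document with no nested objects at all, which is one reason to prefer the combined form. Both forms only exclude the value b; if other values can occur, the excluded condition would have to be "not a" rather than "is b".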

Elasticsearch filter by nested fields

I have a problem with creating a query to Elasticsearch with many conditions. My model looks like:
data class Product(
@Id
val id: String? = null,
val category: String,
val imagesUrls: List<String>,
@Field(type = FieldType.Double)
val price: Double?,
@Field(type = FieldType.Nested)
val parameters: List<Parameter>?
)
data class Parameter(
val key: String,
val values: List<String>
)
I would like to query products by:
category (for example cars)
price (between 20k $ and 50k $)
parameters: for example, products with several parameters, such as key capacity with values 4L, 5L and a second parameter gear transmission with value manual
My current query looks like this:
GET data/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{"term": {
"parameters.key.keyword": {
"value": "Capacity"
}
}},
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
]
}
}
}
}
]
}
}
}
Could you tell me how to filter products where the parameter key equals Capacity and the values list contains one of the given values?
How can I combine many operations of this kind in one query?
Example data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
The search query shown below queries the data based on:
category (for example cars)
parameters: for example, products with several parameters, such as key capacity with values 4L, 5L and a second parameter gear transmission with value manual
Here is a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"parameters": {
"type": "nested"
}
}
}
}
Index Data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"gear transmission",
"values":["4L","5L"]
},
{
"key":"capacity",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":["4L","5L"]
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
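Search Query (the terms query is not analyzed, so its values are written in lowercase to match the tokens that the default analyzer indexes for parameters.values):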
{
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "capacity"
}
},
{
"terms": {
"parameters.values": [
"4l",
"5l"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "gear transmission"
}
},
{
"match": {
"parameters.values": "automcatic"
}
}
]
}
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "bstof",
"_type": "_doc",
"_id": "1",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": "4L"
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
},
{
"_index": "bstof",
"_type": "_doc",
"_id": "2",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": [
"4L",
"5L"
]
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
}
]
When you need to match any one value from a list, you can use the terms query instead of term. Update this part of the query from:
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
to the following:
{
"terms": {
"parameters.values": [
"4L",
"5L"
]
}
}
Note that if parameters.values is an analyzed field and it has a keyword sub-field, use that instead, e.g. parameters.values.keyword.
You can read more in the terms query documentation.
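To combine several parameter conditions, one option is to generate one nested clause per parameter and put them all in the outer must, as the longer answer above does by hand. A minimal sketch in Python that only builds the request body as a dict follows. It assumes that parameters.key.keyword and parameters.values.keyword sub-fields exist (for instance from default dynamic mapping); the helper names parameter_clause and build_query are hypothetical, and the price range clause comes from the question, since neither answer shows it.

```python
import json


def parameter_clause(key, values):
    """One nested clause: same nested object must have this key and any of the values."""
    return {
        "nested": {
            "path": "parameters",
            "query": {
                "bool": {
                    "must": [
                        # Assumed keyword sub-fields for exact, case-sensitive matching.
                        {"term": {"parameters.key.keyword": key}},
                        {"terms": {"parameters.values.keyword": list(values)}},
                    ]
                }
            },
        }
    }


def build_query(category, price_min, price_max, parameters):
    """Combine category, price range and one nested clause per parameter."""
    must = [
        {"term": {"category.keyword": category}},
        {"range": {"price": {"gte": price_min, "lte": price_max}}},
    ]
    must.extend(parameter_clause(k, v) for k, v in parameters.items())
    return {"query": {"bool": {"must": must}}}


body = build_query(
    "cars",
    20000,
    50000,
    {"capacity": ["4L", "5L"], "gear transmission": ["manual"]},
)
print(json.dumps(body, indent=2))
```

Each parameter gets its own nested clause so that the key and its values are checked against the same nested object, while different parameters may be satisfied by different objects.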

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I am expecting.
Please see the details below.
Indices used:
dev-sample-search
user-agents-search
The search should work as follows:
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers the user sampleuser@gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser@gmail.com",
"user": "sampleuser@gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser@gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"terms": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser@gmail.com" should be "id": "sampleuser@gmail.com".
If at least one should clause in the filter clause must match, set "minimum_should_match": 1 on the bool query containing the should clauses.
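Applied to the filter from the question, the two suggestions could look like the sketch below. It only builds the filter clause as a Python dict and does not talk to a cluster. The helper name build_access_filter is hypothetical, and for brevity the status conditions are all written as term clauses, which is a simplification of the original query.

```python
import json


def build_access_filter(user_id):
    """Build the filter clause with both suggested corrections applied."""
    excluded = ["pending", "cancelled", "app cancelled"]
    allowed = ["active", "terminated"]
    return {
        "bool": {
            "must": [
                {
                    "terms": {
                        "agentNumber": {
                            "index": "user-agents-search",
                            "type": "_doc",
                            # Terms lookup addresses the lookup document by its id.
                            "id": user_id,
                            "path": "agentNumber",
                        }
                    }
                },
                {
                    "bool": {
                        "must_not": [{"term": {"status": {"value": s}}} for s in excluded],
                        "should": [{"term": {"status": {"value": s}}} for s in allowed],
                        # Require at least one of the should clauses to match.
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }


access_filter = build_access_filter("sampleuser@gmail.com")
print(json.dumps(access_filter, indent=2))
```

This assumes the lookup document in user-agents-search is indexed with the user's email as its document id, as the sample data suggests.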

How to query multiple parameters in a nested field in elasticsearch

I'm trying to search for a keyword and then add nested queries for amenities, which is a nested field holding an array of objects.
With the query below I am able to search when I'm matching only one amenity id, but when I have more than one it doesn't return anything.
Does anyone have an idea what is wrong with my query?
{
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"_geo_distance": {
"geolocation": [
100,
10
],
"order": "asc",
"unit": "m",
"mode": "min",
"distance_type": "sloppy_arc"
}
}
],
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"name^2",
"city",
"state",
"zip"
],
"fuzziness": 5,
"query": "complete"
}
},
{
"nested": {
"path": "amenities",
"query": {
"bool": {
"must": [
{
"term": {
"amenities.id": "1"
}
},
{
"term": {
"amenities.id": "2"
}
}
]
}
}
}
}
]
}
}
}
When you do:
"must": [
{
"term": {
"amenities.id": "1"
}
},
{
"term": {
"amenities.id": "2"
}
}]
What you're actually saying is: find me any document where "amenities.id"="1" and "amenities.id"="2". Inside a nested query each amenity object is checked on its own, so unless "amenities.id" is a list of values within a single object, this cannot match.
What you probably want to say is: find me any document where "amenities.id"="1" or "amenities.id"="2".
To do that, use should instead of must:
"should": [
{
"term": {
"amenities.id": "1"
}
},
{
"term": {
"amenities.id": "2"
}
}]
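The difference can be checked with a toy model in plain Python, where a nested query is treated as "any single nested object satisfies the condition". This is only an illustration of the matching rule, not Elasticsearch code, and the names are made up for the example. The third variant, one nested query per id combined in an outer must, is an assumption about what to use when a document has to have both amenities rather than either one.

```python
def nested_any(doc, predicate):
    """Model of a nested query on amenities: true if ANY one object matches."""
    return any(predicate(a) for a in doc["amenities"])


doc = {"amenities": [{"id": "1"}, {"id": "2"}]}

# must with both terms inside ONE nested query:
# a single amenity object would need id "1" and id "2" at the same time.
both_in_one_object = nested_any(doc, lambda a: a["id"] == "1" and a["id"] == "2")

# should inside one nested query: some amenity has id "1" or id "2".
either = nested_any(doc, lambda a: a["id"] == "1" or a["id"] == "2")

# Two separate nested queries in an outer must:
# some amenity has id "1" AND some (possibly different) amenity has id "2".
each_present = nested_any(doc, lambda a: a["id"] == "1") and nested_any(
    doc, lambda a: a["id"] == "2"
)

print(both_in_one_object, either, each_present)
```

So should answers "has amenity 1 or amenity 2", while separate nested clauses answer "has amenity 1 and amenity 2".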
