Searching objects having all nested children matching a given query in Elasticsearch

Given an object with the following mapping:
{
"a": {
"properties": {
"id": {"type": "string"},
"b": {
"type": "nested",
"properties": {
"key": {"type": "string"}
}
}
}
}
}
I want to retrieve all the instances of this object having all nested children matching a given query.
For example, suppose I want to retrieve all the instances having all children with "key" = "yes".
Given the following instances:
{
"id": "1",
"b": [
{
"key": "yes"
},
{
"key": "yes"
}
]
},
{
"id": "2",
"b": [
{
"key": "yes"
},
{
"key": "yes"
},
{
"key": "no"
}
]
},
I want to retrieve only the first one (the one with "id" = "1").
Using either filters or queries is fine with me.
I already tried the "not filter" and the "must_not bool filter". The idea was to use a double negation to extract only the objects that don't have any nested child whose field differs from the given value.
However, I was not able to write down this query correctly.
I realize that this is not a common query for a search engine, but, in my case, it can be useful.
Is it possible to write this query ("forall nested query") using nested objects?
In case it is not, would it be possible to write this query using parent-child?
Update
Andrei Stefan gave a good answer for the case where we know all the values of "key" that we want to avoid ("no", in the example).
I am also interested in the case where you don't know the values you want to avoid, and you just want to match parents whose nested objects all have "key" = "yes".
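The double-negation idea described in the question can be sketched as follows. This is only a sketch, not something verified against a particular Elasticsearch version: the helper names `forall_nested_query` and `forall_locally` are hypothetical, and the query assumes that wrapping a `nested` query in an outer `must_not` excludes every parent that has at least one child matching the inner (negated) clause. Note that parents with no children at all would also pass this condition.

```python
import json


def forall_nested_query(path, field, value):
    """Build a double-negation query: exclude parents that have at least
    one nested child which does NOT match field == value."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must_not": [
                                        {"term": {f"{path}.{field}": value}}
                                    ]
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


def forall_locally(doc, path, field, value):
    """Plain-Python statement of the same condition, to check the intent."""
    return not any(child.get(field) != value for child in doc.get(path, []))


# The two instances from the question.
docs = [
    {"id": "1", "b": [{"key": "yes"}, {"key": "yes"}]},
    {"id": "2", "b": [{"key": "yes"}, {"key": "yes"}, {"key": "no"}]},
]

query = forall_nested_query("b", "key", "yes")
matching_ids = [d["id"] for d in docs if forall_locally(d, "b", "key", "yes")]

print(json.dumps(query, indent=2))
print(matching_ids)
```

The local check only states the intended semantics; the real query still goes through the field's analyzer, so an analyzed string field may behave differently for values that are not single lowercase tokens.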

You need a flattened data structure for this: an array of values. The simplest way to get one without changing the current mapping too much is to use the include_in_parent property and, for this particular requirement, to query the field that is included in the parent:
{
"mappings": {
"a": {
"properties": {
"id": {
"type": "string"
},
"b": {
"type": "nested",
"include_in_parent": true,
"properties": {
"key": {
"type": "string"
}
}
}
}
}
}
}
And then your query would look like this:
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "b.key:(yes NOT no)"}
}
}
]
}
}
}
}
The alternative is to change the type of the field from nested to object, but this way you'll lose the advantages of using nested fields:
{
"mappings": {
"a": {
"properties": {
"id": {
"type": "string"
},
"b": {
"type": "object",
"properties": {
"key": {
"type": "string"
}
}
}
}
}
}
}
The query remains the same.

I encountered the same problem, though my values were not limited to yes/no variants.
As per Clinton Gormley's answer in https://github.com/elastic/elasticsearch/issues/19166:
"You can't do it any efficient way. You have to count all children and compare that to how many children match. The following will return all parents where all children match but it is a horrible inefficient solution and I would never recommend using it in practice":
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "b",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"weight": -1
},
{
"filter": {
"match": {
"b.key": "yes"
}
},
"weight": 1
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
}
]
}
}
}
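To make the arithmetic of the quoted query easier to follow, here is a small local emulation in Python. It is a sketch under assumptions: the function name `parent_score` is hypothetical, field matching is reduced to plain equality, and the final comparison with 0 is my reading of "count all children and compare that to how many children match", since the quoted query does not show how the resulting score is turned into a filter.

```python
def parent_score(children, field, value):
    """Emulate the scoring: every child gets -1, plus +1 if it matches,
    and the per-child scores are summed for the parent."""
    total = 0
    for child in children:
        score = -1                      # unconditional function, weight -1
        if child.get(field) == value:
            score += 1                  # filtered function, weight 1
        total += score                  # nested score_mode "sum"
    return total


# The two instances from the question, keyed by their "id".
docs = {
    "1": [{"key": "yes"}, {"key": "yes"}],
    "2": [{"key": "yes"}, {"key": "yes"}, {"key": "no"}],
}

scores = {doc_id: parent_score(children, "key", "yes")
          for doc_id, children in docs.items()}

# A parent whose children all match ends up with a sum of exactly 0.
all_children_match = [doc_id for doc_id, s in scores.items() if s == 0]

print(scores)
print(all_children_match)
```

Every non-matching child pulls the sum below zero, which is why only parents with a sum of exactly 0 have all children matching.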

Related

Search-as-you-type inside arrays

I am trying to implement a search-as-you-type query inside an array.
This is the structure of the documents:
{
"guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
"labels":
[
{
"name": "London",
"lang": "en"
},
{
"name": "Llundain",
"lang": "cy"
},
{
"name": "Lunnainn",
"lang": "gd"
}
]
}
and so far this is what I came up with:
{
"query": {
"multi_match": {
"fields": ["labels.name"],
"query": name,
"type": "phrase_prefix"
}
}
}
which works exactly as requested.
The problem is that I would like to search also by language.
What I tried is:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
but these queries act on separate values of the array.
So, for example, I would like to search only the Welsh language (cy). That means the part of my query that contains the city name should match only array elements whose "lang" field is "cy".
How do I write this kind of query?
Internally, Elasticsearch flattens arrays of JSON objects, so it can't correlate the lang and name of a specific element in the labels array. If you want this kind of correlation, you'll need to index your documents differently.
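The effect of this flattening can be illustrated locally. The snippet below is a simplified sketch, not Elasticsearch's actual implementation: the `flatten` helper is hypothetical, and prefix matching is approximated with a lowercase `startswith` instead of going through an analyzer.

```python
def flatten(doc, array_field):
    """Collapse an array of objects into one list of values per sub-field,
    which is roughly how a plain object field ends up being indexed."""
    flat = {}
    for item in doc.get(array_field, []):
        for key, val in item.items():
            flat.setdefault(f"{array_field}.{key}", []).append(val)
    return flat


doc = {
    "guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
    "labels": [
        {"name": "London", "lang": "en"},
        {"name": "Llundain", "lang": "cy"},
        {"name": "Lunnainn", "lang": "gd"},
    ],
}

flat = flatten(doc, "labels")

# Flattened view: the two conditions are checked against independent lists.
flat_match = (
    any(n.lower().startswith("london") for n in flat["labels.name"])
    and "gd" in flat["labels.lang"]
)

# Per-element view: both conditions must hold for the same label.
per_element_match = any(
    item["name"].lower().startswith("london") and item["lang"] == "gd"
    for item in doc["labels"]
)

print(flat)
print(flat_match, per_element_match)
```

The flattened check succeeds even though no single label is both "London" and "gd", which is the false positive described in the question.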
The usual way to do this is to use the nested data type with a matching nested query.
The query would end up looking something like this:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}
But note that you'll need to also specify nested mappings for your labels, e.g.:
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
/* you might want to add other mapping-related configuration here */
},
"lang": {
"type": "keyword"
}
}
}
}
Other ways to do this include:
Indexing each label as a separate document, repeating the guid field
Using parent/child documents
You should use the nested datatype in the mapping instead of the object datatype. For a detailed explanation, refer to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
So, you should define mapping of your field something like this:
{
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"lang": {
"type": "keyword"
}
}
}
}
}
After this you could query using Nested Query as:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}
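Since the question asks for Welsh (cy) labels while the example queries hard-code "london" and "gd", it may help to build the request body from parameters. The helper below is a hypothetical sketch (the name `label_query` and the search text "llun" are mine); it only constructs the same nested query shape shown above and assumes the nested mapping is already in place.

```python
def label_query(text, lang, path="labels"):
    """Build a nested query requiring the prefix match and the language
    term to hold on the same element of the array."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {
                                "multi_match": {
                                    "fields": [f"{path}.name"],
                                    "query": text,
                                    "type": "phrase_prefix",
                                }
                            },
                            {"term": {f"{path}.lang": lang}},
                        ]
                    }
                },
            }
        }
    }


# Example: search-as-you-type restricted to Welsh labels.
welsh_query = label_query("llun", "cy")
print(welsh_query)
```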

Multiple (AND) queries for a nested index structure in Elasticsearch

I have an index with the below mapping
{
"mappings": {
"xxxxx": {
"properties": {
"ID": {
"type": "text"
},
"pairs": {
"type": "nested"
},
"xxxxx": {
"type": "text"
}
}
}
}
}
The pairs field is essentially an array of objects, and each object has a unique ID associated with it.
What I'm trying to do is to get only one object from the pairs field so that I can update it. To that end, I've tried this:
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"match": {
"pairs.id": "c1vNGnnQLuk"
}
}
]
}
},
"_source": "pairs"
}
but this just returns an empty object even though both IDs are valid. If I remove the pairs.id rule, I get the entire array of objects.
What do I need to add or edit so that I can query by both IDs (the top-level one and the nested one)?
Since pairs is of nested type, you need to use a nested query. You will probably also want to use nested inner hits, so that only the matching nested object is returned:
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"nested": {
"path": "pairs",
"query": {
"match": {
"pairs.id": "c1vNGnnQLuk"
}
},
"inner_hits": {}
}
}
]
}
},
"_source": false
}
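With inner hits enabled, the matching nested objects are returned per hit rather than in the top-level `_source`. The sketch below shows one way to pull them out of a response. The `sample_response` is hand-written and hypothetical (including the `note` field and the `_id`), and it assumes the default layout in which inner hits are keyed by the nested path name.

```python
def matched_pairs(response, path="pairs"):
    """Collect the _source of every inner hit under the given nested path."""
    matched = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matched.append(inner_hit.get("_source"))
    return matched


# Hypothetical response fragment, trimmed to the fields used here.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "doc-1",
                "inner_hits": {
                    "pairs": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "id": "c1vNGnnQLuk",
                                        "note": "hypothetical field",
                                    }
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

pairs = matched_pairs(sample_response)
print(pairs)
```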

Elasticsearch filter on nested set

I'm having trouble figuring out how to filter on nested sets. I have this in my index:
PUT /testing
PUT /testing/_mapping/product
{
"product": {
"properties": {
"features": { "type": "nested" }
}
}
}
POST /testing/product
{
"productid": 123,
"features": [
{
"name": "Weight",
"nameslug": "weight",
"value": "10",
"valueslug": "10-kg"
},
{
"name": "Weight",
"nameslug": "weight",
"value": "12",
"valueslug": "12-kg"
}
]
}
I need to filter on value but I get the valueslug from the url. So far I have the following code:
POST _search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "features",
"query": {
"bool": {
"filter": [
{
"range": {
"features.value": { "gte": ??? }
}
}
]
}
}
}
}
]
}
}
}
The difficult part is resolving the valueslug to the actual value. I have looked into Script Query using doc_value, but the problem is that the script is executed within the current nested document. It would be possible by executing two queries, but I am trying to avoid that if possible.
I get the feeling that the solution lies in the way the documents should be structured, but I have no clue how I could structure this any different...
I hope someone can point me in the right direction.
Thanks in advance!

Elasticsearch sorting by matching array item

I have the following structure in indexed documents:
document1: "customLists":[{"id":8,"position":8},{"id":26,"position":2}]
document2: "customLists":[{"id":26,"position":1}]
document3: "customLists":[{"id":8,"position":1},{"id":26,"position":3}]
I am able to search matching documents that belong to a given list with match query "customLists.id = 26". But I need to sort the documents based on the position value within that list and ignore positions of the other lists.
So the expected results would be in order of document2, document1, document3
Is the data structure suitable for this kind of sorting and how to handle this?
One way to achieve this would be to set the mapping type of customLists to nested and then sort by nested fields.
Example:
1) Create Index & Mapping
put test
put test/test/_mapping
{
"properties": {
"customLists": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"position": {
"type": "integer"
}
}
}
}
}
2) Index Documents :
put test/test/1
{
"customLists":[{"id":8,"position":8},{"id":26,"position":2}]
}
put test/test/2
{
"customLists":[{"id":26,"position":1}]
}
put test/test/3
{
"customLists":[{"id":8,"position":1},{"id":26,"position":3}]
}
3) Query to sort by position for a given id
post test/_search
{
"filter": {
"nested": {
"path": "customLists",
"query": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
},
"sort": [
{
"customLists.position": {
"order": "asc",
"mode": "min",
"nested_filter": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
}
]
}
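The intended ordering can be checked locally with a few lines of Python. This is only an emulation of what the filtered nested sort is meant to compute, under the assumption that non-matching list entries are ignored and the minimum remaining position is used as the sort key; the function name `position_in_list` is hypothetical.

```python
docs = {
    "document1": [{"id": 8, "position": 8}, {"id": 26, "position": 2}],
    "document2": [{"id": 26, "position": 1}],
    "document3": [{"id": 8, "position": 1}, {"id": 26, "position": 3}],
}


def position_in_list(custom_lists, list_id):
    """Smallest position among the entries that belong to list_id
    (mirrors "mode": "min" combined with the nested filter)."""
    return min(e["position"] for e in custom_lists if e["id"] == list_id)


list_id = 26

# Keep only documents that belong to the list, then sort by their position in it.
matching = {
    name: lists
    for name, lists in docs.items()
    if any(e["id"] == list_id for e in lists)
}
ordered = sorted(matching, key=lambda name: position_in_list(matching[name], list_id))

print(ordered)
```

Positions belonging to list 8 play no part in the ordering, which is the behavior the question asks for.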

How to query for two fields in one and the same tuple in an array in Elasticsearch?

Let's say there are some documents in my index which look like this:
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"2"
},
{
"name":"boo",
"value":"2"
}
]
},
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"8"
},
{
"name":"boo",
"value":"2"
}
]
}
I'd like to query the index in a way that returns only those documents that match "foo":"2" but not "boo":"2".
I tried to write a query that matches both properties.name and properties.value, but then I'm getting false positives. I need a way to tell Elasticsearch that name and value have to be part of the same properties tuple.
How can I do that?
You need to map properties as a nested type. So your mapping would look similar to this:
{
"your_type": {
"properties": {
"category": {
"type": "string"
},
"properties": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
Then, your query to match documents having "foo=2" in the same tuple but not "boo=2" in the same tuple would need to use the nested query accordingly, like the one below.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "foo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "boo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
]
}
}
}
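The per-tuple semantics of this query can be sketched locally. The code below is an emulation with plain equality, not the query itself; the helper `has_tuple` is hypothetical, and the third document is a hypothetical addition, because neither of the two documents from the question satisfies "foo=2 but not boo=2".

```python
def has_tuple(doc, name, value):
    """True if a single element of properties has both the name and the value."""
    return any(
        p["name"] == name and p["value"] == value
        for p in doc["properties"]
    )


docs = [
    # The two documents from the question.
    {"category": "2020", "properties": [{"name": "foo", "value": "2"},
                                        {"name": "boo", "value": "2"}]},
    {"category": "2020", "properties": [{"name": "foo", "value": "8"},
                                        {"name": "boo", "value": "2"}]},
    # Hypothetical third document, added only for illustration.
    {"category": "2020", "properties": [{"name": "foo", "value": "2"},
                                        {"name": "boo", "value": "5"}]},
]

# must: a (foo, 2) tuple exists; must_not: a (boo, 2) tuple exists.
result_indexes = [
    i for i, doc in enumerate(docs)
    if has_tuple(doc, "foo", "2") and not has_tuple(doc, "boo", "2")
]

print(result_indexes)
```

The first document is excluded by the must_not clause and the second fails the must clause, so only the illustrative third document remains.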
@Val's answer is as good as it gets. One thing I would add, though, since it marks the difference between this type of query and others that might benefit from the "opposite" behavior of nested:
In Elasticsearch, the default type used to auto-create a field such as "properties":[{"name":"foo","value":"2"},{"name":"boo","value":"2"}] is object. The object type has the drawback that it doesn't associate one sub-field's value with another sub-field's value, meaning foo is not necessarily associated with 2. name is just an array of values, and value is again another array of values, with no association between the two.
If one needs the above association to work then nested is a must.
But, I have encountered situations where both these features were needed. If you need both of these, you can set include_in_parent: true for the mapping so that you can take advantage of both. One of the situations that I have seen is here.
"properties": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string"
...
