Elasticsearch filter on nested set - elasticsearch

I'm having trouble figuring out how to filter on nested sets. I have this in my index:
PUT /testing
PUT /testing/_mapping/product
{
"product": {
"properties": {
"features": { "type": "nested" }
}
}
}
POST /testing/product
{
"productid": 123,
"features": [
{
"name": "Weight",
"nameslug": "weight",
"value": "10",
"valueslug": "10-kg"
},
{
"name": "Weight",
"nameslug": "weight",
"value": "12",
"valueslug": "12-kg"
}
]
}
I need to filter on value but I get the valueslug from the url. So far I have the following code:
POST _search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "features",
"query": {
"bool": {
"filter": [
{
"range": {
"features.value": { "gte": ??? }
}
}
]
}
}
}
}
]
}
}
}
The difficult part is resolving the valueslug to the actual value. I have looked into Script Query using doc_value, but the problem with that is that it is executed within the current nested document. It would be possible by execution two queries, but I am trying to avoid that (if possible).
I get the feeling that the solution lies in the way the documents should be structured, but I have no clue how I could structure this any different...
I hope anyone can point me in the right direction.
Thanks in advance!

Related

Elastic Search query for an AND condition on two properties of a nested object

I have the post_filter as below, Where I am trying to filter records where the school name is HILL SCHOOL AND containing a nested child object with name JOY AND section A.
school is present in the parent object, Which is holding children list of nested objects.
All of the above are AND conditions.
But the query doesn't seem to work. Any idea why ? And is there a way to combine the two nested queries?
GET /test_school/_search
{
"query": {
"match_all": {}
},
"post_filter": {
"bool": {
"must_not": [
{
"bool": {
"must": [
{
"term": {
"schoolname": {
"value": "HILL SCHOOL"
}
}
},
{
"nested": {
"path": "children",
"query": {
"bool": {
"must": [
{
"match": {
"name": "JACK"
}
}
]
}
}
}
},
{
"term": {
"children.section": {
"value": "A"
}
}
}
]
}
}
]
}
}
}
The schema is as below:
PUT /test_school
{
"mappings": {
"_doc": {
"properties": {
"schoolname": {
"type": "keyword"
},
"children": {
"type": "nested",
"properties": {
"name": {
"type": "keyword",
"index": true
},
"section": {
"type": "keyword",
"index": true
}
}
}
}
}
}
}
Sample data as below:
POST /test_school/_doc
{
"schoolname":"HILL SCHOOL",
"children":{
"name":"JOY",
"section":"A"
}
}
second record
POST /test_school/_doc
{
"schoolname":"HILL SCHOOL",
"children":{
"name":"JACK",
"section":"B"
}
}
https://stackoverflow.com/a/17543151/183217 suggests special mapping is needed to work with nested objects. You appear to be falling foul of the "cross object matching" problem.

Multiple (AND) queries for a nested index structure in Elasticsearch

I have an index with the below mapping
{
"mappings": {
"xxxxx": {
"properties": {
"ID": {
"type": "text"
},
"pairs": {
"type": "nested"
},
"xxxxx": {
"type": "text"
}
}
}
}
}
the pairs field is essentially an array of objects - each object has a unique ID associated with it
What i'm trying to do is to get only one object from the pairs field for updates. To that extent , i've tried this
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"match": {
"pairs.id": "c1vNGnnQLuk"
}
}
]
}
},
"_source": "pairs"
}
but this just returns an empty object despite them being valid IDs. If i remove the pairs.id rule - i get the entire array of objects .
What do i need to add/edit to ensure that i can query via both IDS (original and nested)
Since pairs is of nested type, you need to use a nested query. Also you might probably want to leverage nested inner-hits as well:
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"nested": {
"path": "pairs",
"query": {
"match": {
"pairs.id": "c1vNGnnQLuk"
}
},
"inner_hits": {}
}
}
]
}
},
"_source": false
}

geo_distance doesn't return any hit Elasticsearch

Have a problem with this query, when I use geo_distance filter, nothing returned from query. When I remove it I get proper results. Query is bellow:
GET _search
{
"query": {
"bool": {
"filter": {
"geo_distance": {
"distance": 20,
"distance_unit": "km",
"coordinates": [48.8488576, 2.3354223]
}
},
"must": {
"term": {
"_type": {
"value": "staff"
}
}
},
"must_not": [
{
"term": {
"cabinet.zipcode": {
"value": "75006"
}
}
},
{
"term": {
"next_availability_in_days": {
"value": "-1"
}
}
}
]
}
}
}
I would appreciate if someone gives me a hint.
UPDATE
When I run Elasticsearch Ruby DSL with same query logic, I get proper results:
<Elasticsearch::Model::Searching::SearchRequest:0x007ff335763560
#definition=
{:index=>["development_app_scoped_index_20170428134744",
"development_app_scoped_index_20170428134744"], :type=>["staff", "light_staff"],
:body=>
{:query=>
{:bool=>
{:must_not=>[
{:term=>{"cabinet.zipcode"=>75006}},
{:term=> {:next_availability_in_days=>-1}}
],
:must=>[
{:term=>{:_type=>"staff"}}
],
:filter=>{:geo_distance=>
{:coordinates=>
{:lat=>48.8488576, :lon=>2.3354223},
:distance=>"6km"
}
}}},
:sort=>[
{:type=>{:order=>"desc"}},
{"_geo_distance"=>{"coordinates"=>"48.8488576,2.3354223", "order"=>"asc",
"unit"=>"km"}},
{:next_availability_in_days=>{:order=>"asc"}},
{:priority=>{:order=>"asc"}}
]
}}
So this is really weird and I'm not sure what's going wrong in ES syntax, but it definitely should work as expected.
Thanks.
There is probably nothing in the range that you have entered.
Try to increase the "distance": 20 field to "distance": 500 and check the results then. For example the distance between these two geo points [0,0] and [0,1] is ~138.3414KM .
Another suggestion is to get rid of the "distance_unit" field and put the
and put the KM inside the "distance" field as following:
{
"query": {
"bool": {
"filter": {
"geo_distance": {
"distance": "20km",
"coordinates": [
48.8488576,
2.3354223
]
}
}
}
}
}

How to query for two fields in one and the same tuple in an array in ElasticSearch?

Let's say there are some documents in my index which look like this:
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"2"
},
{
"name":"boo",
"value":"2"
}
]
},
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"8"
},
{
"name":"boo",
"value":"2"
}
]
}
I'd like to query the index in a way to return only those documents that match "foo":"2"but not "boo":"2".
I tried to write a query that matches both properties.name and properties.value, but then I'm getting false positives. I need a way to tell ElasticSearch that name and value have to be part of the same properties tuple.
How can I do that?
You need to map properties as a nestedtype. So your mapping would look similar to this:
{
"your_type": {
"properties": {
"category": {
"type": "string"
},
"properties": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
Then, your query to match documents having "foo=2" in the same tuple but not "boo=2" in the same tuple would need to use the nested query accordingly, like the one below.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "foo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "boo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
]
}
}
}
#Val's answer is as good as it gets. One thing I would add, though, since it makes the difference between one type of query and others that might benefit from nesteds "opposite" feature.
In Elasticsearch, the default type for "properties":[{"name":"foo","value":"2"},{"name":"boo","value":"2"}] that is used to auto-create such a field is object. The object has the drawback that it doesn't associate one sub-field's value with another sub-field's value, meaning foo is not necessarily associated with 2. name is just an array of values and value is the again another array of values with not association between the two.
If one needs the above association to work then nested is a must.
But, I have encountered situations where both these features were needed. If you need both of these, you can set include_in_parent: true for the mapping so that you can take advantage of both. One of the situations that I have seen is here.
"properties": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string"
...

Searching objects having all nested children matching a given query in Elasticsearch

Given an object with the following mapping:
{
"a": {
"properties": {
"id": {"type": "string"}
"b": {
"type": "nested",
"properties": {
"key": {"type": "string"}
}
}
}
}
}
I want to retrieve all the instances of this object having all nested children matching a given query.
For example, suppose I want to retrieve all the instances having all children with "key" = "yes".
Given the following instances:
{
"id": "1",
"b": [
{
"key": "yes"
},
{
"key": "yes"
}
]
},
{
"id": "2",
"b": [
{
"key": "yes"
},
{
"key": "yes"
},
{
"key": "no"
}
]
},
I want to retrieve only the first one (the one with "id" = "1").
Both using filters or queries is fine to me.
I already tried to use the "not filter" and the "must_not bool filter". The idea was to use a double negation to extract only objects that doesn't have fields that are different to the given one.
However, I was not able to write down this query correctly.
I realize that this is not a common query for a search engine, but, in my case, it can be useful.
Is it possible to write this query ("forall nested query") using nested objects?
In case it is not, would it be possible to write this query using parent-child?
Update
Andrei Stefan gave a good answer in case we know all the values of "key" that we want to avoid, ("no", in the example).
I am interested also in the case you don't know the values you want to avoid, and you just want to match nested object with "key"="yes".
You need a flattened data structure for this - an array of values. The simplest way and not to change the current mapping too much, is to use include_in_parent property and to query the field that's being included in the parent for this particular requirement:
{
"mappings": {
"a": {
"properties": {
"id": {
"type": "string"
},
"b": {
"type": "nested",
"include_in_parent": true,
"properties": {
"key": {
"type": "string"
}
}
}
}
}
}
}
And then your query would look like this:
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "b.key:(yes NOT no)"}
}
}
]
}
}
}
}
The alternative is to change the type of the field from nested to object but in this way you'll loose the advantages of using nested fields:
{
"mappings": {
"a": {
"properties": {
"id": {
"type": "string"
},
"b": {
"type": "object",
"properties": {
"key": {
"type": "string"
}
}
}
}
}
}
}
The query remains the same.
Encountered the same problem, though didn't have just yes/no variants.
As per Clinton Gormley's answer in https://github.com/elastic/elasticsearch/issues/19166:
"You can't do it any efficient way. You have to count all children and compare that to how many children match. The following will return all parents where all children match but it is a horrible inefficient solution and I would never recommend using it in practice":
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "b",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"weight": -1
},
{
"filter": {
"match": {
"b.key": "yes"
}
},
"weight": 1
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
}
]
}
}
}

Resources