How to add properties from a root object in a nested object for sorting? - elasticsearch

A simplified example of the kind of document in our index:
{
"organisation" : {
"code" : "01310"
},
"publications" : [
{
"dateEnd" : 1393801200000,
"dateStart" : 1391986800000,
"code" : "PUB.02"
},
{
"dateEnd" : 1401055200000,
"dateStart" : 1397512800000,
"code" : "PUB.06"
}
]
}
Note that publications are mapped as nested objects because we need to filter based on a combination of the dateEnd, dateStart and code (publication status) properties.
The PUB.02 status code is special. It states: 'this publication period is valid if the current user is a member of the organisation'.
I have a problem when I want to sort on 'most recent':
{
"sort": {
"publications.dateStart" : {
"mode" : "min",
"order" : "desc",
"nested_filter" : {
"or" : [
{
"and" : [
{ "term" : { "organisation.code" : "01310" } },
{ "term" : { "publications.code" : "PUB.02" } }
]
},
{ "term" : { "publications.code" : "PUB.06" } }
]
}
}
}
}
No error is given, but the PUB.02 entry is ignored. I tried to use copy_to in my mapping to copy the value of organisation.code to the nested object, but that did not help.
Is there a way to reach for the parent document inside a nested sort?
Alternatively, is there a way to copy data from parent to the nested document?
I am currently using version 1.7 of Elasticsearch without the ability to use scripts. Upgrading to a newer version could be done if that would help the situation.
This gist shows that the sort is performed on the PUB.06 publications: https://gist.github.com/EECOLOR/2db9a1ec9d6d5c791ea6

Although the documentation does not explicitly mention it, it does look like we cannot access a parent field in a nested filter context.
I also wasn't able to use copy_to to add data from a root/parent field to the nested document. I would suggest asking on the Elasticsearch discuss forum; you would have more luck finding out the reasons for this there.
Before some trigger-happy bloke downvotes this answer, I would like to add that the query and the results the OP wanted from the sort can be achieved with a function_score workaround.
One implementation to achieve this is as follows:
1) Start off with a bool query that has two should clauses
2) In the first should clause
a) use a filtered query to select documents with `organisation.code : 01310`
b) then score these documents based on the max value of the reciprocal of the nested document **dateStart**, restricted to the codes **PUB.02** and **PUB.06**
3) In the second should clause
a) use a filtered query to select documents with `organisation.code` not equal to `01310`
b) as before, score these documents based on the max value of the reciprocal of the nested document **dateStart**, restricted to the code **PUB.06** only
Example Query:
POST /testindex/testtype/_search
{
"query": {
"bool": {
"should": [
{
"filtered": {
"filter": {
"term": {
"organisation.code": "01310"
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.02",
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
},
{
"filtered": {
"filter": {
"not": {
"term": {
"organisation.code": "01310"
}
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
}
]
}
}
}
I'm the first to admit it is not the most readable, and if there were a way to 'copy_to' into a nested document it would be much more ideal.
If not, simulating copy_to by injecting the data into the source on the client before indexing would be simpler and more flexible.
But the above is an example of how it could be done using function scores.
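The client-side alternative (injecting the parent value into each nested object before indexing) could look roughly like the Python sketch below. The function name `denormalise_org_code` and the target field name `organisationCode` are assumptions made for illustration; neither appears in the original mapping.

```python
import copy


def denormalise_org_code(doc, target_field="organisationCode"):
    """Return a copy of doc in which every publication carries the parent's organisation code.

    `target_field` is a hypothetical field name chosen for this sketch.
    """
    enriched = copy.deepcopy(doc)  # leave the caller's document untouched
    org_code = enriched.get("organisation", {}).get("code")
    for publication in enriched.get("publications", []):
        publication[target_field] = org_code
    return enriched


# The simplified document from the question
doc = {
    "organisation": {"code": "01310"},
    "publications": [
        {"dateEnd": 1393801200000, "dateStart": 1391986800000, "code": "PUB.02"},
        {"dateEnd": 1401055200000, "dateStart": 1397512800000, "code": "PUB.06"},
    ],
}

enriched = denormalise_org_code(doc)
```

With the value stored on the nested object itself, the nested_filter in the sort should be able to test `publications.organisationCode` instead of `organisation.code`, since it then only touches fields inside the nested document.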

Related

Add condition to filter aggregation in elastic search

I want the count of each value of a field, restricted by a filter, in Elasticsearch. For example, I want the counts for all age groups, but only for students who are from California.
The age group is a text field and contains an array like this:
"age_group": ["5-6-years", "6-7-years"]
I kinda want a query like this but this ain't working. It throws an error saying
unable to parse BaseAggregationBuilder with name [count]: parser not found
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.place_of_birth": "California" }
},
"aggs" : {
"age_group" : { "count" : { "field" : "students.age_group" } }
}
}
}
}
Request help from you troops.
That's because there's no metric aggregation called count but value_count instead:
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.gender": "boys" }
},
"aggs" : {
"age_group" : { "value_count" : { "field" : "students.age_group" } }
^^^
|||
}
}
}
}
UPDATE:
After discussion, a terms aggregation turned out to be more appropriate than value_count. After fixing the mapping (the field was text instead of keyword), the query worked correctly.
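For reference, the terms-based variant described in the update might be built as follows. This is only a sketch that constructs the request body in Python; it assumes `students.age_group` and `students.place_of_birth` are mapped as keyword, and the helper name `age_group_counts_agg` is made up for illustration.

```python
import json


def age_group_counts_agg(place_of_birth):
    """Build a nested aggregation that counts age groups for students born in one place."""
    return {
        "student_aggregation": {
            "nested": {"path": "students"},
            "aggs": {
                "available": {
                    # restrict the nested students before bucketing
                    "filter": {"term": {"students.place_of_birth": place_of_birth}},
                    "aggs": {
                        # one bucket per distinct age group value
                        "age_group": {"terms": {"field": "students.age_group"}}
                    },
                }
            },
        }
    }


# size 0 because only the aggregation result is of interest here
body = {"size": 0, "aggs": age_group_counts_agg("California")}
print(json.dumps(body, indent=2))
```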

Elasticsearch - Aggregations on part of bool query

Say I have this bool query:
"bool" : {
"should" : [
{ "term" : { "FirstName" : "Sandra" } },
{ "term" : { "LastName" : "Jones" } }
],
"minimum_should_match" : 1
}
meaning I want to match all the people with first name Sandra OR last name Jones.
Now, is there any way that I can perform an aggregation on only the documents that matched the first term?
For example, I want to get all of the unique values of "Prizes" that anybody named Sandra has. Normally I'd just do:
"query": {
"match": {
"FirstName": "Sandra"
}
},
"aggs": {
"Prizes": {
"terms": {
"field": "Prizes"
}
}
}
Is there any way to combine the two so I only have to perform a single query which returns all of the people with first name Sandra or last name Jones, AND an aggregation only on the people with first name Sandra?
Thanks a lot!
Use post_filter.
Please refer to the following query. post_filter makes sure that your bool should clause doesn't affect the scope of your aggregation.
Aggregations are scoped by the main query, but they are unaffected by post_filter. Please refer to the link
{
"from": 0,
"size": 20,
"aggs": {
"filtered_lastname": {
"filter": {
"query": {
"match": {
"FirstName": "sandra"
}
}
},
"aggs": {
"prizes": {
"terms": {
"field": "Prizes",
"size": 10
}
}
}
}
},
"post_filter": {
"bool": {
"should": [{
"term": {
"FirstName": "Sandra"
}
}, {
"term": {
"LastName": "Jones"
}
}],
"minimum_should_match": 1
}
}
}
Running a filter inside the aggs before aggregating on prizes can help you achieve your desired use case.
Thanks
Hope this helps
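To make the scoping concrete, here is a toy in-memory model in Python of what the request above does. It does not talk to Elasticsearch; the sample people and the helper names are invented for illustration only.

```python
from collections import Counter

# Made-up sample data, not from the question
people = [
    {"FirstName": "Sandra", "LastName": "Smith", "Prizes": ["Nobel"]},
    {"FirstName": "Sandra", "LastName": "Jones", "Prizes": ["Oscar", "Nobel"]},
    {"FirstName": "Peter", "LastName": "Jones", "Prizes": ["Emmy"]},
    {"FirstName": "Peter", "LastName": "Smith", "Prizes": ["Tony"]},
]


def is_sandra(person):
    return person["FirstName"] == "Sandra"


def is_jones(person):
    return person["LastName"] == "Jones"


# There is no top-level query, so the aggregation scope is every document;
# the filter aggregation then narrows that scope to Sandras only.
prizes = Counter(
    prize for person in people if is_sandra(person) for prize in person["Prizes"]
)

# post_filter narrows only the returned hits (Sandra OR Jones),
# after the aggregations have been computed.
hits = [person for person in people if is_sandra(person) or is_jones(person)]
```

In this model Peter Jones appears in `hits` but contributes nothing to `prizes`, which is the split the question asks for.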

Finding docs that do not contain a given user in all nested fields

I have the following mapping for a nested field called ratings:
"ratings" : {
"type" : "nested",
"properties" : {
"rating" : {
"type" : "double"
},
"user_id" : {
"type" : "long"
}
}
}
I'm attempting to find all records where a user_id does not exist in the nested field.
Here's what I have, but it's failing when there are multiple nested docs and any of the docs are not user_id 1.
{
"nested": {
"path": "ratings",
"query": {
"bool": { "must_not": [
{ "term": { "ratings.user_id": 1}}
]}}}}
If I'm understanding you correctly, and what you are trying to do is find documents for which NONE of the nested documents have a specific user_id, then this query seems to do what you want (assuming you want docs that have not been rated by user 2):
POST /test_index/_search
{
"query": {
"constant_score": {
"filter": {
"not": {
"filter": {
"nested": {
"path": "ratings",
"filter": {
"term": {
"ratings.user_id": 2
}
}
}
}
}
}
}
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/afd319e64403a7f995cbf1e9f40e5c5948729193
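The difference between the two queries comes down to where the negation sits. The toy Python model below (no Elasticsearch involved; function names and sample ratings are invented for illustration) mimics both placements: negating inside the nested query asks "is there any nested doc that is not by this user", while negating around it asks "is there no nested doc by this user".

```python
def must_not_inside_nested(ratings, user_id):
    """Mimics the question's query: a parent matches if ANY nested doc is not by user_id."""
    return any(r["user_id"] != user_id for r in ratings)


def not_around_nested(ratings, user_id):
    """Mimics the answer's query: a parent matches only if NO nested doc is by user_id."""
    return not any(r["user_id"] == user_id for r in ratings)


# Made-up sample ratings
rated_by_1_and_2 = [{"user_id": 1, "rating": 4.0}, {"user_id": 2, "rating": 3.5}]
rated_by_3_only = [{"user_id": 3, "rating": 5.0}]

# The document rated by users 1 and 2 wrongly matches the first form for user 1,
# because user 2's rating satisfies "not user 1".
wrong_match = must_not_inside_nested(rated_by_1_and_2, 1)
correct_exclusion = not_around_nested(rated_by_1_and_2, 1)
```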

Elastic Search Nested Object mapping and Query for search

I am trying to use Elastic Search and I am stuck trying to query for the nested object.
Basically my object is of the following format
{
"name" : "Some Name",
"field2": [
{
"prop1": "val1",
"prop2": "val2"
},
{
"prop1": "val3",
"prop2": "val4"
}
]
}
Mapping I used for the nested field is the following.
PUT /someval/posts/_mapping
{
"posts": {
"properties": {
"field2": {
"type": "nested"
}
}
}
}
Say I now insert documents at /field/posts/1, /field/posts/2 and so on. I have K values for field2.prop1, and I want a query that returns the posts sorted by how many of those K values match field2.prop1. What would be the appropriate query for that?
I also tried a simple filter, but even that doesn't seem to work right.
GET /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
},
"filter" : {
"nested" : {
"path" : "field2",
"filter" : {
"bool" : {
"must" : [
{
"term" : {"field2.prop1" : "val1"}
}
]
}
},
"_cache" : true
}
}
}
}
The above query should match at least the first post, but it returns no match. Can anyone help to clarify what's wrong here?
There was a problem in your JSON structure: you used a filtered query, but the filter object was at a different level than the query. It has to sit inside filtered, next to query.
Compare the difference:
POST /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "field2",
"filter": {
"bool": {
"must": [
{
"term": {
"field2.prop1": "val1"
}
}
]
}
},
"_cache": true
}
}
}
}
}

Elasticsearch multi term filter

I'm quite new to Elasticsearch, so here's my question.
I wanna do a search query with elasticsearch and wanna filter with multiple terms.
If I want to search for a user 'tom', then I would like to have all the matches where the user 'isActive = 1', 'isPrivate = 0' and 'isOwner = 1'.
Here's my search query
"query":{
"filtered": {
"query": {
"query_string": {
"query":"*tom*",
"default_operator": "OR",
"fields": ["username"]
}
},
"filter": {
"term": {
"isActive": "1",
"isPrivate": "0",
"isOwner": "1"
}
}
}
}
When I use 2 terms, it works like a charm, but when I use 3 terms it doesn't.
Thanks for the help!!
You should use bool filter to AND all your terms:
"query":{
"filtered": {
"query": {
"query_string": {
"query":"*tom*",
"default_operator": "OR",
"fields": ["username"]
}
},
"filter": {
"bool" : {
"must" : [
{"term" : { "isActive" : "1" } },
{"term" : { "isPrivate" : "0" } },
{"term" : { "isOwner" : "1" } }
]
}
}
}
}
For version 2.x+ you can use bool query instead of filtered query with some simple replacement: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-filtered-query.html
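As a rough illustration of that replacement, the Python sketch below builds a bool query in which the former filtered.query goes under must and each field gets its own term clause under filter. The helper names `term_filters` and `search_body` are hypothetical and exist only for this example.

```python
def term_filters(fields):
    """Turn {"field": "value", ...} into one term clause per field."""
    return [{"term": {name: value}} for name, value in fields.items()]


def search_body(username_pattern, fields):
    """Build a bool query: scoring part in must, non-scoring term clauses in filter."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": username_pattern,
                            "default_operator": "OR",
                            "fields": ["username"],
                        }
                    }
                ],
                # every clause in filter must match, so the terms are ANDed
                "filter": term_filters(fields),
            }
        }
    }


body = search_body("*tom*", {"isActive": "1", "isPrivate": "0", "isOwner": "1"})
```

Generating the clauses from a mapping keeps the one-field-per-term rule intact however many conditions are added.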
As one of the comments says, the syntax has changed in recent ES versions. If you are using Elasticsearch 6.+, and you want to use a wildcard and a sequence of terms in your query (such as in the question), you can use something like this:
GET your_index/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"your_field_name_1": {
"value": "tom*"
}
}
},
{
"term": {
"your_field_name_2": {
"value": "US"
}
}
},
{
"term": {
"your_field_name_3": {
"value": "Michigan"
}
}
},
{
"term": {
"your_field_name_4": {
"value": "0"
}
}
}
]
}
}
}
Also, from the documentation about wildcard queries:
Note that this query can be slow, as it needs to iterate over many
terms. In order to prevent extremely slow wildcard queries, a wildcard
term should not start with one of the wildcards * or ?.
I hope this helps.
