Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
"organization" : {
"dateUpdated" : 1395211600000,
"publications" : [
{
"dateCreated" : 1393801200000
},
{
"dateCreated" : 1401055200000
}
]
}
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
"query": {
"nested": {
"path": "publications",
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
}
}
]
}
}
}
}
}
My problem is that a nested query does not have access to the root document's values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if possible.
Thanks.

You cannot access root values from a nested query context, because nested objects are indexed as separate documents. From the documentation:
The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
You can get the desired results with the copy_to parameter. Another option is include_in_parent or include_in_root, but these might be deprecated in the future, and they increase the index size because every field of the nested type is included in the root document. In this case copy_to is the better choice.
This is a sample index:
PUT nested_index
{
"mappings": {
"blogpost": {
"properties": {
"rootdate": {
"type": "date"
},
"copy_of_nested_date": {
"type": "date"
},
"comments": {
"type": "nested",
"properties": {
"nested_date": {
"type": "date",
"copy_to": "copy_of_nested_date"
}
}
}
}
}
}
}
Here every value of nested_date is copied to copy_of_nested_date, so copy_of_nested_date will look something like [1401055200000,1393801200000,1221542100000]. You can then use a simple query like this to get the results:
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
}
}
]
}
}
}
You don't have to change your nested structure, but you will have to reindex the documents after adding copy_to to the publications dateCreated field.
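To see which values the script above ends up comparing, here is a toy model in plain Python (not Elasticsearch code). It assumes that the doc values of a multi-valued date field are kept in ascending order, so that doc['copy_of_nested_date'].value yields the smallest copied date. The helper names are made up for illustration.

```python
# Toy model only: plain Python, not Elasticsearch code.
# Assumption: doc values of a multi-valued date field are sorted ascending,
# so doc['field'].value yields the smallest copied value.

def copied_values(doc):
    """Mimic copy_to: gather every comments.nested_date into one sorted root-level list."""
    return sorted(c["nested_date"] for c in doc.get("comments", []))


def root_before_all_nested(doc):
    """Model of: doc['rootdate'].value < doc['copy_of_nested_date'].value"""
    values = copied_values(doc)
    return bool(values) and doc["rootdate"] < values[0]


def some_nested_before_root(doc):
    """Model of the swapped comparison:
    doc['copy_of_nested_date'].value < doc['rootdate'].value"""
    values = copied_values(doc)
    return bool(values) and values[0] < doc["rootdate"]


# Dates taken from the Organization document in the question.
sample = {
    "rootdate": 1395211600000,
    "comments": [
        {"nested_date": 1401055200000},
        {"nested_date": 1393801200000},
    ],
}

print(copied_values(sample))
print(root_before_all_nested(sample))
print(some_nested_before_root(sample))
```

Under that assumption, the comparison as written matches only when rootdate is earlier than every nested date. To find documents where at least one nested date is earlier than the root date, which is what the question asks for, the operands would be swapped.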

Related

Elasticsearch - object type, search nested elements

I'm using Elasticsearch version 7.2.0 and have the following type in my mapping:
"status_change": {
"type": "object",
"enabled": false
},
Example of the data inside this field:
"before": {
"status_type": "Status One"
},
"after": {
"status_type": "Status Two"
}
There are various status_types, and I am attempting to create a query that identifies changes from a specified before status_type to a specified after status_type. I'm struggling to query these nested elements, and the following query fails:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"nested": {
"path": "status_change",
"query": {
"term": {
"status_change.before": "the_before_status"
}
}
}
}
}
}
}
With:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "[nested] nested object under path [status_change] is not of nested type"
}
This is self-explanatory, since status_change is not a 'nested' field. I have looked into searching objects by their nested elements but have not found a solution.
Is there a way for me to search the nested object elements like this?
You are using the object type, not the nested type. There is a difference, and I'd suggest going through the documentation for both.
If you are using Nested Type
Change your mapping to the below:
"status_change": {
"type": "nested"
}
Once you do that and reindex the documents, the query you have will simply return the documents containing status_change.before: the_before_status, irrespective of the value of status_change.after in those documents.
To filter further on status_change.after, add another must clause on that field to return the documents you are looking for.
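As a rough sketch of that two-clause nested query, the Python snippet below builds the search body as a dict. The function name is hypothetical. The field paths follow the example data in the question, and the .keyword sub-fields are an assumption: they exist only if status_type was dynamically mapped as text with a keyword sub-field. Adjust the paths to your actual mapping.

```python
import json


def status_change_query(before, after):
    """Build a search body requiring both statuses on the same nested status_change object."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        "path": "status_change",
                        "query": {
                            "bool": {
                                "must": [
                                    # Assumed field names; .keyword gives an exact match.
                                    {"term": {"status_change.before.status_type.keyword": before}},
                                    {"term": {"status_change.after.status_type.keyword": after}},
                                ]
                            }
                        },
                    }
                }
            }
        }
    }


body = status_change_query("Status One", "Status Two")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the reindexed index.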
If you are using Object Type
Simply remove the enabled: false setting from your mapping and change your query to the following:
POST <your_index_name>/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"status_change.before.status_type": "the_before_status"
}
}
]
}
}
}
Hope that helps!

Elasticsearch - Script Filter over a list of nested objects

I am trying to figure out how to solve these two problems that I have with my ES 5.6 index.
"mappings": {
"my_test": {
"properties": {
"Employee": {
"type": "nested",
"properties": {
"Name": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"Surname": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
}
I need to create two separate scripted filters:
1 - Filter documents where size of employee array is == 3
2 - Filter documents where the first element of the array has "Name" == "John"
I tried to make some first steps, but I am unable to iterate over the list. I always get a null pointer exception.
{
"bool": {
"must": {
"nested": {
"path": "Employee",
"query": {
"bool": {
"filter": [
{
"script": {
"script" : """
int array_length = 0;
for(int i = 0; i < params._source['Employee'].length; i++)
{
array_length +=1;
}
if(array_length == 3)
{
return true
} else
{
return false
}
"""
}
}
]
}
}
}
}
}
}
As Val noticed, you can't access the _source of documents in script queries in recent versions of Elasticsearch.
But Elasticsearch does allow you to access _source in the "score context".
So a possible workaround (but you need to be careful about performance) is to use a scripted score combined with min_score in your query.
You can find an example of this behavior in the Stack Overflow post "Query documents by sum of nested field values in elasticsearch".
In your case a query like this can do the job:
POST <your_index>/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": """
if (params["_source"]["Employee"].length === params.nbEmployee) {
def firstEmployee = params._source["Employee"].get(0);
if (firstEmployee.Name == params.name) {
return 1;
} else {
return 0;
}
} else {
return 0;
}
""",
"params": {
"nbEmployee": 3,
"name": "John"
}
}
}
}
]
}
}
}
The number of employees and the first name should be passed in params to avoid recompiling the script for every variation of its values.
But remember that this can be very heavy on your cluster, as Val already mentioned. You should narrow the set of documents the script is applied to by adding filters to the function_score query (in place of match_all in my example).
In any case, this is not the way Elasticsearch is meant to be used, and you can't expect great performance from such a hacked query.
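To make the interplay between the scripted score and min_score concrete, here is a small plain-Python model of the query above (not Elasticsearch code). It assumes that match_all gives every document a score of 1.0 and that function_score combines it with the script result by multiplication, which is its default boost_mode. The function names are invented for this sketch.

```python
def script_score(source, nb_employee, name):
    """Python rendering of the script: 1 when the count and the first Name match, else 0."""
    employees = source.get("Employee") or []
    if employees and len(employees) == nb_employee and employees[0].get("Name") == name:
        return 1
    return 0


def search(sources, nb_employee, name, min_score=0.1):
    """Keep only the documents whose final score reaches min_score."""
    hits = []
    for source in sources:
        # Assumed: match_all score (1.0) multiplied by the script score.
        score = 1.0 * script_score(source, nb_employee, name)
        if score >= min_score:
            hits.append(source)
    return hits


docs = [
    {"id": 1, "Employee": [{"Name": "John"}, {"Name": "Ann"}, {"Name": "Bob"}]},
    {"id": 2, "Employee": [{"Name": "Ann"}, {"Name": "John"}, {"Name": "Bob"}]},
    {"id": 3, "Employee": [{"Name": "John"}, {"Name": "Ann"}]},
]

print([d["id"] for d in search(docs, nb_employee=3, name="John")])
```

Every document still has to be scored before min_score can discard it, which is why the pre-filtering advice above matters.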
1 - Filter documents where size of employee array is == 3
For the first problem, the best thing to do is to add another root-level field (e.g. NbEmployees) that contains the number of items in the Employee array so that you can use a range query and not a costly script query.
Then, whenever you modify the Employee array, you also update that NbEmployees field accordingly. Much more efficient!
2 - Filter documents where the first element of the array has "Name" == "John"
Regarding this one, you need to know that nested fields are separate (hidden) documents in Lucene, so there is no way to get access to all the nested docs at once in the same query.
If you know you need to check the first employee's name in your queries, just add another root-level field FirstEmployeeName and run your query on that one.
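One way to keep those two root-level fields in sync is to derive them in application code just before indexing. The sketch below is only an illustration: the helper name is hypothetical, and it assumes NbEmployees is mapped as an integer and FirstEmployeeName as a plain keyword.

```python
def with_employee_summary(doc):
    """Return a copy of doc with NbEmployees and FirstEmployeeName derived from Employee."""
    employees = doc.get("Employee") or []
    enriched = dict(doc)
    enriched["NbEmployees"] = len(employees)
    # Leave FirstEmployeeName absent when the array is empty.
    if employees:
        enriched["FirstEmployeeName"] = employees[0].get("Name")
    return enriched


doc = {
    "Employee": [
        {"Name": "John", "Surname": "Doe"},
        {"Name": "Ann", "Surname": "Lee"},
        {"Name": "Bob", "Surname": "Ray"},
    ]
}
enriched = with_employee_summary(doc)
print(enriched["NbEmployees"], enriched["FirstEmployeeName"])

# Search bodies that use the derived fields instead of scripts (assumed mappings).
size_filter = {"query": {"bool": {"filter": [{"term": {"NbEmployees": 3}}]}}}
first_name_filter = {"query": {"bool": {"filter": [{"term": {"FirstEmployeeName": "John"}}]}}}
```

Call the helper again whenever the Employee array changes, so the derived fields never drift from the array they summarize.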

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I’ll describe my use case, because I’m not sure this is a good approach. I’m trying to automatically index a large collection of typed data. That means I’m trying to generate mappings, and queries on those mappings, automatically from information about my data. A lot of my data is relational, and I’m interested in being able to search across the relations, so I’m also interested in using nested data types.
However, many of these types have on the order of 10 relations, and I have a feeling it’s not a good idea to pass 10 identical copies of a nested query to Elasticsearch just to query 10 different nested paths the same way. So I’m wondering whether it’s possible to pass multiple paths into a single query instead. Better yet, is it possible to search over all fields in the current document and in all its nested documents in a single query? I’m aware of object fields, but they’re not a good fit because I want to retrieve some data from the matched nested documents.
In this example, I create an index with multiple nested types and some fields of its own, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested path, or is that actually a performant way to do this? Thanks
PUT /my_index
{
"mappings": {
"type1" : {
"properties" : {
"obj1" : {
"type" : "nested",
"properties": {
"name": {
"type":"text"
},
"number": {
"type":"text"
}
}
},
"obj2" : {
"type" : "nested",
"properties": {
"color": {
"type":"text"
},
"food": {
"type":"text"
}
}
},
"lul":{
"type": "text"
},
"pucci":{
"type": "text"
}
}
}
}
}
PUT /my_index/type1/1
{
"obj1": [
{ "name":"liar", "number":"deer dog"},
{ "name":"one two three", "number":"you can call on me"},
{ "name":"ricky gervais", "number":"user 123"}
],
"obj2": [
{ "color":"red green blue", "food":"meatball and spaghetti"},
{ "color":"orange", "food":"pineapple, fish, goat"},
{ "color":"none", "food":"none"}
],
"lul": "lul its me user123",
"field": "one dog"
}
POST /my_index/_search
{
"query": {
"nested": {
"path": ["obj1", "obj2"],
"query": {
"query_string": {
"query": "ricky",
"all_fields": true
}
}
}
}
}

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to", but it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
You can then query one of these fields or several of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

Elastic search multiple terms in a dictionary

I have mapping like:
"profile": {
"properties": {
"educations": {
"properties": {
"university": {
"type": "string"
},
"graduation_year": {
"type": "string"
}
}
}
}
}
which obviously holds the education history of people. Each person can have multiple educations. What I want to do is search for people who graduated from "SFU" in "2012". To do that I am using a filtered search:
"filtered": {
"filter": {
"and": [
{
"term": {
"educations.university": "SFU"
}
},
{
"term": {
"educations.graduation_year": "2012"
}
}
]
}
}
But this query finds documents that have "SFU" and "2012" anywhere in their educations, so this document would match, which is wrong:
educations[0] = {"university": "SFU", "graduation_year": 2000}
educations[1] = {"university": "UBC", "graduation_year": 2012}
Is there any way I could apply both terms to the same education entry?
You need to define educations as a nested type and use a nested filter to query it. Otherwise Elasticsearch internally flattens the inner objects into a single object and returns the wrong results.
You can refer to these posts for detailed explanations and samples:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
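The flattening effect can be illustrated with a small plain-Python model (not Elasticsearch code). It ignores text analysis and compares values as plain strings. The function names are invented for this sketch.

```python
def flatten(educations):
    """Model of object-type indexing: each sub-field becomes one multi-valued field."""
    flat = {}
    for entry in educations:
        for key, value in entry.items():
            flat.setdefault("educations." + key, set()).add(str(value))
    return flat


def matches_flattened(educations, university, year):
    """Both terms are checked against the merged value sets, so they may come from different entries."""
    flat = flatten(educations)
    return (university in flat.get("educations.university", set())
            and year in flat.get("educations.graduation_year", set()))


def matches_nested(educations, university, year):
    """Both terms must hold inside the same entry, as with a nested type."""
    return any(
        str(e.get("university")) == university and str(e.get("graduation_year")) == year
        for e in educations
    )


# The document from the question.
educations = [
    {"university": "SFU", "graduation_year": 2000},
    {"university": "UBC", "graduation_year": 2012},
]

print(flatten(educations))
print(matches_flattened(educations, "SFU", "2012"))
print(matches_nested(educations, "SFU", "2012"))
```

In the flattened model the two terms are satisfied by different entries, which is the false match described in the question. The per-entry check rejects the document.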

Resources