Elasticsearch: Deleting all nested objects with a specific datetime

I'm using Elasticsearch 5.6 and I have a nested schedule field whose objects look like this:
{
  "status": "open",
  "starts_at": "2020-10-13T17:00:00-05:00",
  "ends_at": "2020-10-13T18:00:00-05:00"
},
{
  "status": "open",
  "starts_at": "2020-10-13T18:00:00-05:00",
  "ends_at": "2020-10-13T19:30:00-05:00"
}
What I'm looking for is a Painless script that deletes every nested object whose starts_at equals a given value. I've tried multiple approaches, but none worked: they run without errors yet don't delete the targeted objects.

I was able to do this by looping over the array (in reverse, so removing an element doesn't skip the next one) and using SimpleDateFormat:
POST index/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "SimpleDateFormat sdf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss\"); for (int i = ctx._source.schedule.length - 1; i >= 0; i--) { boolean equalDateTime = sdf.parse(ctx._source.schedule[i].starts_at).compareTo(sdf.parse(params.starts_at)) == 0; if (equalDateTime) { ctx._source.schedule.remove(i); } }",
    "params": {
      "starts_at": "2020-10-13T17:00:00-05:00"
    }
  },
  "query": {
    "bool": {
      "must": [
        { "terms": { "_id": ["12345"] } }
      ]
    }
  }
}

You can use the Update By Query API for this.
POST <indexName>/<type>/_update_by_query
{
  "query": {   // <======== filter the parent documents containing the specified nested date
    "match": {
      "schedule.starts_at": "2020-10-13T17:00:00-05:00"
    }
  },
  "script": {  // <======== use the script to remove the schedule entries with that start date
    "inline": "ctx._source.schedule.removeIf(e -> e.starts_at == '2020-10-13T17:00:00-05:00')"
  }
}
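If you run this for several different dates, passing the date through params instead of hard-coding it avoids recompiling the script for each value. A minimal sketch of the same removeIf call, parameterized (the index and type placeholders are unchanged from above):
POST <indexName>/<type>/_update_by_query
{
  "query": {
    "match": {
      "schedule.starts_at": "2020-10-13T17:00:00-05:00"
    }
  },
  "script": {
    "lang": "painless",
    "inline": "ctx._source.schedule.removeIf(e -> e.starts_at == params.starts_at)",
    "params": {
      "starts_at": "2020-10-13T17:00:00-05:00"
    }
  }
}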

Related

Elasticsearch: count query in _search script

I'm trying to make a single query that updates one field's value in an ES index.
I have an index pages which contains information about pages (id, name, time, parent_page_id, child_count, etc.).
I want to update the field child_count with the number of documents that have this page's id as their parent_page_id.
I can update the field with a static value like this:
PUT HOST_ADDRESS/pages/_update_by_query
{
  "script": {
    "source": "def child_count = 0; ctx._source.child_count = child_count;",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
I'm trying to get the child count with this code, but it's not working:
"source": "def child_count = 0; client.prepareSearch('pages').setQuery(QueryBuilders.termQuery("parent_page_id", "ctx._source.id")).get().getTotal().getDocs().getCount(); ctx._source.child_count = child_count;",
"lang": "painless"
My question is: how can I run a sub count-query inside the script so that the variable child_count holds the real child count?
Scripting doesn't work like this: you cannot use the Java client DSL there. There is no concept of client, QueryBuilders, etc. in the Painless contexts.
As such, you'll need to obtain the counts before you proceed to update the doc(s) with a script.
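For example, a sketch of how you could fetch the count first with the _count API (the field name and id are taken from the question) and then feed the result into the update script as a param:
GET HOST_ADDRESS/pages/_count
{
  "query": {
    "term": {
      "parent_page_id": 123
    }
  }
}
The returned count is what you would pass as params.child_count in the update below.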
Tip: scripts are reusable when you store them:
POST HOST_ADDRESS/_scripts/update_child_count
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.child_count = params.child_count"
  }
}
and then apply via the id:
PUT HOST_ADDRESS/pages/_update_by_query
{
  "script": {
    "id": "update_child_count",   <-- no need to write the Painless code again
    "params": {
      "child_count": 987
    }
  },
  "query": {
    "term": {
      "parent_page_id": 123
    }
  }
}

Elasticsearch - Script Filter over a list of nested objects

I am trying to figure out how to solve these two problems that I have with my ES 5.6 index.
"mappings": {
"my_test": {
"properties": {
"Employee": {
"type": "nested",
"properties": {
"Name": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"Surname": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
}
I need to create two separate scripted filters:
1 - Filter documents where size of employee array is == 3
2 - Filter documents where the first element of the array has "Name" == "John"
I tried some first steps, but I am unable to iterate over the list; I always get a null pointer exception.
{
  "bool": {
    "must": {
      "nested": {
        "path": "Employee",
        "query": {
          "bool": {
            "filter": [
              {
                "script": {
                  "script": """
                    int array_length = 0;
                    for (int i = 0; i < params._source['Employee'].length; i++) {
                      array_length += 1;
                    }
                    if (array_length == 3) {
                      return true
                    } else {
                      return false
                    }
                  """
                }
              }
            ]
          }
        }
      }
    }
  }
}
As Val noticed, you can't access the _source of documents in script queries in recent versions of Elasticsearch.
But Elasticsearch does allow you to access _source in the score context.
So a possible workaround (but you need to be careful about performance) is to use a script score combined with a min_score in your query.
You can find an example of this behavior in the Stack Overflow post "Query documents by sum of nested field values in elasticsearch".
In your case, a query like this can do the job:
POST <your_index>/_search
{
  "min_score": 0.1,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                if (params["_source"]["Employee"].length === params.nbEmployee) {
                  def firstEmployee = params._source["Employee"].get(0);
                  if (firstEmployee.Name == params.name) {
                    return 1;
                  } else {
                    return 0;
                  }
                } else {
                  return 0;
                }
              """,
              "params": {
                "nbEmployee": 3,
                "name": "John"
              }
            }
          }
        }
      ]
    }
  }
}
The employee count and the first name should be passed in params to avoid script recompilation for every variant of this script.
But remember it can be very heavy on your cluster, as Val already mentioned. You should narrow the set of documents the script is applied to by adding filters to the function_score query (match_all in my example).
And in any case, this is not how Elasticsearch is meant to be used, and you can't expect great performance from such a hacked query.
1 - Filter documents where size of employee array is == 3
For the first problem, the best thing to do is to add another root-level field (e.g. NbEmployees) that contains the number of items in the Employee array so that you can use a range query and not a costly script query.
Then, whenever you modify the Employee array, you also update that NbEmployees field accordingly. Much more efficient!
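A sketch of the query this enables, assuming a root-level integer field named NbEmployees that you keep in sync whenever the Employee array changes (a term query for the exact count; a range query works the same way for other comparisons):
POST <your_index>/_search
{
  "query": {
    "term": {
      "NbEmployees": 3
    }
  }
}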
2 - Filter documents where the first element of the array has "Name" == "John"
Regarding this one, you need to know that nested fields are separate (hidden) documents in Lucene, so there is no way to get access to all the nested docs at once in the same query.
If you know you need to check the first employee's name in your queries, just add another root-level field FirstEmployeeName and run your query on that one.
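Again a sketch, assuming a root-level keyword field named FirstEmployeeName (with the same lowercase normalizer as the nested Name field) that you fill with the first employee's name at index time:
POST <your_index>/_search
{
  "query": {
    "term": {
      "FirstEmployeeName": "john"
    }
  }
}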

Elasticsearch 5.4: Use normal and nested fields in same Painless script query?

I have a mapping like this:
{
  printings: {
    type: "nested",
    properties: {
      prop2: {type: "number"}
    }
  },
  prop1: {type: "number"}
}
I then want to build a Painless query like this:
"script": {
"lang": "painless",
"inline": "doc['prop1'] > (3 * doc['printings.prop2'])"
}
However testing this in Sense doesn't work. If I replace the nested prop2 with a simple number then it does work. Is there a way to access both root and nested props in a single scripted query?
You can try the query below.
{
  "query": {
    "script": {
      "script": {
        "lang": "painless",
        "inline": "params['_source']['prop1'] > (2 * params['_source']['printings']['prop2'])"
      }
    }
  }
}
But please keep in mind that accessing _source this way is very slow; see the Elasticsearch scripting documentation for details.
Unfortunately, you cannot access the nested context from the root, and you cannot access the root context from the nested one, because nested documents are separate documents, even though they are stored close to the parent. But you can solve this with a different mapping using the copy_to field feature. Here is a mapping:
{
  "mappings": {
    "sample": {
      "properties": {
        "printings": {
          "type": "nested",
          "properties": {
            "prop2": {
              "type": "integer",
              "copy_to": "child_prop2"
            }
          }
        },
        "prop1": {
          "type": "integer"
        },
        "child_prop2": {
          "type": "integer"
        }
      }
    }
  }
}
In this case the values from the nested documents will be copied to the parent. You don't have to explicitly fill this new field; here is an example of bulk indexing:
POST http://localhost:9200/_bulk HTTP/1.1
{"index":{"_index":"my_index","_type":"sample","_id":null}}
{"printings":[{"prop2":1},{"prop2":4}],"prop1":2}
{"index":{"_index":"my_index","_type":"sample","_id":null}}
{"printings":[{"prop2":0},{"prop2":1}],"prop1":2}
{"index":{"_index":"my_index","_type":"sample","_id":null}}
{"printings":[{"prop2":1},{"prop2":0}],"prop1":2}
After that, you can use this query:
{
  "query": {
    "script": {
      "script": {
        "inline": "doc['prop1'].value > (3 * doc['child_prop2'].value)",
        "lang": "painless"
      }
    }
  }
}
The first document won't match. The second one will match by the first subdocument. The third one will match by the second subdocument.

Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
  "organization": {
    "dateUpdated": 1395211600000,
    "publications": [
      {
        "dateCreated": 1393801200000
      },
      {
        "dateCreated": 1401055200000
      }
    ]
  }
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
  "query": {
    "nested": {
      "path": "publications",
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
              }
            }
          ]
        }
      }
    }
  }
}
My problem is that when I perform a nested query, the nested query does not have access to the root document values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if possible.
Thanks.
You cannot access root values from the nested query context; nested objects are indexed as separate documents. From the documentation:
The nested clause “steps down” into the nested comments field. It no
longer has access to fields in the root document, nor fields in any
other nested document.
You can get the desired results with the help of the copy_to parameter. Another way would be to use include_in_parent or include_in_root, but these might be deprecated in the future, and they also increase index size because every field of the nested type is included in the root document, so copy_to is the better option here.
This is a sample index
PUT nested_index
{
  "mappings": {
    "blogpost": {
      "properties": {
        "rootdate": {
          "type": "date"
        },
        "copy_of_nested_date": {
          "type": "date"
        },
        "comments": {
          "type": "nested",
          "properties": {
            "nested_date": {
              "type": "date",
              "copy_to": "copy_of_nested_date"
            }
          }
        }
      }
    }
  }
}
Here every value of nested_date will be copied to copy_of_nested_date, so copy_of_nested_date will look something like [1401055200000, 1393801200000, 1221542100000], and then you can use a simple query like this to get the results:
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
          }
        }
      ]
    }
  }
}
You don't have to change your nested structure, but you do have to reindex the documents after adding copy_to to the publication dateCreated field.
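Applied to the organization mapping from the question, that could look roughly like this (a sketch: copy_of_dateCreated is a made-up field name, and the index and type names are assumptions):
PUT organizations
{
  "mappings": {
    "organization": {
      "properties": {
        "dateUpdated": {
          "type": "date"
        },
        "copy_of_dateCreated": {
          "type": "date"
        },
        "publications": {
          "type": "nested",
          "properties": {
            "dateCreated": {
              "type": "date",
              "copy_to": "copy_of_dateCreated"
            }
          }
        }
      }
    }
  }
}
After reindexing, the script filter can compare doc['copy_of_dateCreated'].value with doc['dateUpdated'].value on the root document, just like the blogpost example above.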

Casting when querying ElasticSearch data

Is there a way in Elasticsearch to cast a string to a long value at query time?
I have something like this in my document:
"attributes": [
{
"key": "age",
"value": "23"
},
{
"key": "name",
"value": "John"
},
],
I would like to write a query to get all the persons that have an age > 23. For that I need to cast the value to an int such that I can compare it when the key is age.
The above document is an example very specific to this problem.
I would greatly appreciate your help.
Thanks!
You can use scripting for that
POST /index/type/_search
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "foreach(attr : _source['attributes']) { if (attr['key'] == 'age') { return Integer.parseInt(attr['value']) > ageValue; } } return false;",
          "params": {
            "ageValue": 23
          }
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}
Update: note that dynamic scripting must be enabled in elasticsearch.yml.
Also, I suppose you can achieve better query performance by refactoring your document structure and applying an appropriate mapping for the age field.
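For example, a minimal sketch of that refactoring (the index and type names and the ES 5.x-style field types are assumptions): with age mapped as a numeric field, the script filter becomes a plain range query.
PUT persons
{
  "mappings": {
    "person": {
      "properties": {
        "age": { "type": "integer" },
        "name": { "type": "keyword" }
      }
    }
  }
}

GET persons/_search
{
  "query": {
    "range": {
      "age": { "gt": 23 }
    }
  }
}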
