Elastic search script sort text as Number - sorting

I have an Elasticsearch field, ID, that holds numbers but is mapped as "text" in the index. I cannot change the mapping because the volume of data to reload is huge.
I am writing a script to do this sorting, but I am getting a "Bad_request" error.
Script script = new Script(ScriptType.INLINE, "painless", "ctx._source.ID.keyword", Collections.emptyMap());
ScriptSortBuilder.ScriptSortType sortType = ScriptSortBuilder.ScriptSortType.NUMBER;
builder.sort(SortBuilders.scriptSort(script, sortType).order(SortOrder.DESC));
searchRequest.source(builder);
response = restsearchClient.search(searchRequest, RequestOptions.DEFAULT);
I have tried the following idorcode values: doc['ID'], doc['ID.keyword'], ctx._source.ID, ctx._source.ID.keyword.
Please advise!

If I understand correctly, you want to sort on a number that is stored as a string rather than as an integer in Elasticsearch.
Below is a sample Elasticsearch query:
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "Number",
"order": "desc",
"script": {
"lang": "painless",
"source": """
String s = doc['ID.keyword'].value;
int idvalue = Integer.parseInt(s);
return idvalue;
"""
}
}
}
}
Below is Java code:
Script script = new Script(ScriptType.INLINE, "painless", "String s = doc['ID.keyword'].value;int idvalue = Integer.parseInt(s);return idvalue;", Collections.emptyMap());
ScriptSortBuilder.ScriptSortType sortType = ScriptSortBuilder.ScriptSortType.NUMBER;
builder.sort(SortBuilders.scriptSort(script, sortType).order(SortOrder.DESC));
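To see why the script parses the value before returning it, here is a minimal, self-contained Java sketch that contrasts text ordering with numeric ordering for string IDs. It does not use the Elasticsearch client; the class and method names are hypothetical. It uses Long.parseLong rather than Integer.parseInt on the assumption that some IDs may not fit in an int.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumericIdSortSketch {

    // Sorts the IDs as plain text, descending (character-by-character comparison).
    static List<String> sortAsText(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    // Parses each ID to a long first, then sorts descending, like the script sort does.
    static List<String> sortAsNumber(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Comparator<String> byNumericValue = Comparator.comparingLong(Long::parseLong);
        copy.sort(byNumericValue.reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("10", "9", "100", "2");
        System.out.println(sortAsText(ids));   // [9, 2, 100, 10]
        System.out.println(sortAsNumber(ids)); // [100, 10, 9, 2]
    }
}
```

If your IDs can be that large, the same substitution (Long.parseLong in place of Integer.parseInt) can be made in the Painless source.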

Related

Elasticsearch conditional sort on date field

I am trying to sort an Elastic Search query result on a date field, registeredAt. However, registeredAt doesn't exist in all documents returned. In that case, I would want the sort to look for the date on an alternative field, invitedAt.
If we have 3 hits which look like this:
hits = [
{
id: 'hit2',
registeredAt: '2021-06-01T23:00:00.000Z',
invitedAt: '2021-05-31T23:00:00.000Z'
},
{
id: 'hit3',
invitedAt: '2021-05-31T23:00:00.000Z'
},
{
id: 'hit1',
invitedAt: '2021-06-04T23:00:00.000Z'
},
],
then I would want the sort to return them in order from most recent to least recent: [hit1, hit2, hit3].
In each document, the sort script should look for the registeredAt field and take that date as the sort value and, if that field does not exist, look at the value for invitedAt and take that as the sort value.
In that sense, hit1 does not have a registeredAt and has the most recent date for invitedAt and, as such, should come first. hit2 has a registeredAt field, and the date for that field is more recent than the invitedAt date of hit3 (which doesn't have a registeredAt field).
I have written the query as such:
client.search({
index: 'users',
track_total_hits: true,
sort: {
_script: {
type: 'number',
script: {
lang: 'painless',
source:
"if (!doc.containsKey('registeredAt') || doc['registeredAt'].empty) { return doc['invitedAt'].value; } else { return doc['registeredAt'].value }",
},
order: 'desc',
},
},
body: {
from: skip,
size: limit,
query: {...},
},
})
The query runs without errors but the sorting does not work and the documents are returned in the order that they were indexed in.
I assume that registeredAt and invitedAt are mapped as date fields.
This query should work. The only change is calling .getMillis() on the value, so that the script returns a number for the numeric sort.
{
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": """
if (!doc.containsKey('registeredAt') || doc['registeredAt'].empty) {
return doc['invitedAt'].value.getMillis();
}
else {
return doc['registeredAt'].value.getMillis();
}
"""
},
"order": "desc"
}
}
]
}
Edit: .getMillis() is deprecated in version 7.x. .toInstant().toEpochMilli() should be used instead.
This is the query:
{
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": """
if (!doc.containsKey('registeredAt') || doc['registeredAt'].empty) {
return doc['invitedAt'].value.toInstant().toEpochMilli();
}
else {
return doc['registeredAt'].value.toInstant().toEpochMilli();
}
"""
},
"order": "desc"
}
}
]
}
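For reference, the fallback logic of the sort key can be checked outside Elasticsearch. The following Java sketch is only an illustration (the class and method names are hypothetical, and hits are modelled as plain maps); it applies the same rule to the three hits from the question.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class DateFallbackSortSketch {

    // Sort key: registeredAt when present, otherwise invitedAt, as epoch milliseconds.
    static long sortKey(Map<String, String> hit) {
        String date = hit.containsKey("registeredAt") ? hit.get("registeredAt") : hit.get("invitedAt");
        return Instant.parse(date).toEpochMilli();
    }

    // Returns the hit ids ordered from the most recent to the least recent sort key.
    static List<String> sortedIds(List<Map<String, String>> hits) {
        List<Map<String, String>> copy = new ArrayList<>(hits);
        Comparator<Map<String, String>> byKey = Comparator.comparingLong(DateFallbackSortSketch::sortKey);
        copy.sort(byKey.reversed());
        List<String> ids = new ArrayList<>();
        for (Map<String, String> hit : copy) {
            ids.add(hit.get("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
            Map.of("id", "hit2", "registeredAt", "2021-06-01T23:00:00.000Z", "invitedAt", "2021-05-31T23:00:00.000Z"),
            Map.of("id", "hit3", "invitedAt", "2021-05-31T23:00:00.000Z"),
            Map.of("id", "hit1", "invitedAt", "2021-06-04T23:00:00.000Z"));
        System.out.println(sortedIds(hits)); // [hit1, hit2, hit3]
    }
}
```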

Elasticsearch: count query in _search script

I'm trying to write a single query that updates one field value in an ES index.
I have an index, pages, which contains information about the pages (id, name, time, parent_page_id, child_count, etc.).
I want to update the field child_count with the number of documents that have this page's id as their parent_page_id.
I can update the field with a default single value like this:
PUT HOST_ADDRESS/pages/_update_by_query
{
"script": {
"source": "def child_count = 0; ctx._source.child_count = child_count;",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
I tried the following code to get the child count, but it's not working:
"source": "def child_count = 0; client.prepareSearch('pages').setQuery(QueryBuilders.termQuery("parent_page_id", "ctx._source.id")).get().getTotal().getDocs().getCount(); ctx._source.child_count = child_count;",
"lang": "painless"
My question is: how can I run a count sub-query inside the script to get the real child count into the variable child_count?
Scripting doesn't work like this: you cannot use the Java DSL in there. There's no concept of client, QueryBuilders, etc. in the Painless contexts.
As such, you'll need to obtain the counts before you proceed to update the doc(s) with a script.
Tip: scripts are reusable when you store them:
POST HOST_ADDRESS/_scripts/update_child_count
{
"script": {
"lang": "painless",
"source": "ctx._source.child_count = params.child_count"
}
}
and then apply via the id:
POST HOST_ADDRESS/pages/_update_by_query
{
"script": {
"id": "update_child_count", <-- no need to write the Painless code again
"params": {
"child_count": 987
}
},
"query": {
"term": {
"parent_page_id": 123
}
}
}
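Since the counts have to be obtained outside the script, one option is to count the children per parent first (on the client side, or with a terms aggregation on parent_page_id) and then send one update per page, passing the count through params. The Java sketch below only illustrates the counting and the request body. The class and method names are hypothetical, the body is built as a plain string rather than with a client library, and selecting the page with a term query on id is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ChildCountSketch {

    // Counts how many pages reference each parent_page_id.
    static Map<Long, Long> countChildren(List<Long> parentPageIds) {
        Map<Long, Long> counts = new TreeMap<>();
        for (Long parentId : parentPageIds) {
            counts.merge(parentId, 1L, Long::sum);
        }
        return counts;
    }

    // Builds an _update_by_query body that applies the stored script to one page.
    // Assumption: the page to update is selected by a term query on its id field.
    static String updateBody(long pageId, long childCount) {
        return "{\"script\":{\"id\":\"update_child_count\",\"params\":{\"child_count\":" + childCount
            + "}},\"query\":{\"term\":{\"id\":" + pageId + "}}}";
    }

    public static void main(String[] args) {
        // parent_page_id values of the existing pages (hypothetical sample data)
        Map<Long, Long> counts = countChildren(List.of(123L, 123L, 456L, 123L));
        for (Map.Entry<Long, Long> entry : counts.entrySet()) {
            System.out.println(updateBody(entry.getKey(), entry.getValue()));
        }
    }
}
```

Each printed body would then be sent to the _update_by_query endpoint of the pages index.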

Elasticsearch Deleting all nested object with a specific datetime

I'm using Elasticsearch 5.6 and I have a schedule nested field with nested objects that look like this
{
"status": "open",
"starts_at": "2020-10-13T17:00:00-05:00",
"ends_at": "2020-10-13T18:00:00-05:00"
},
{
"status": "open",
"starts_at": "2020-10-13T18:00:00-05:00",
"ends_at": "2020-10-13T19:30:00-05:00"
}
What I'm looking for is a Painless script that deletes every nested object whose starts_at equals a given value. I've tried multiple approaches, but none worked: they run without errors but don't delete the targeted objects.
I was able to do this by looping over the array and using SimpleDateFormat. The loop runs backwards so that removing an element does not shift the elements still to be checked:
POST index/_update_by_query
{
"script": {"source": "for(int i = ctx._source.schedule.length - 1; i >= 0; i--){
SimpleDateFormat sdformat = new SimpleDateFormat('yyyy-MM-dd\\'T\\'HH:mm:ss');
boolean equalDateTime = sdformat.parse(ctx._source.schedule[i].starts_at).compareTo(sdformat.parse(params.starts_at)) == 0;
if(equalDateTime) {
ctx._source.schedule.remove(i);
}
}",
"params": {
"starts_at": "2020-10-13T17:00:00-05:00"
},
"lang": "painless"
},
"query":{
"bool": {"must":[
{"terms":{"_id":["12345"]}}
]}}
}
You can use _update_by_query with removeIf to do the same:
POST <indexName>/<type>/_update_by_query
{
"query":{ // <======== Filter out the parent documents containing the specified nested date
"match": {
"schedule.starts_at": "2020-10-13T17:00:00-05:00"
}
},
"script":{ // <============ use the script to remove the schedule containing specific start date
"inline": "ctx._source.schedule.removeIf(e -> e.starts_at == '2020-10-13T17:00:00-05:00')"
}
}
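The removeIf call behaves like Java's Collection.removeIf, which avoids index bookkeeping: removing by index while iterating shifts the later elements, which is easy to get wrong. Here is a small Java sketch of the same idea on a list of maps; the class and method names are hypothetical and the second 17:00 entry is made-up sample data. Note that Painless compares strings with ==, while plain Java needs equals().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveScheduleSketch {

    // Removes every schedule entry whose starts_at equals the given value.
    static void removeByStart(List<Map<String, String>> schedule, String startsAt) {
        schedule.removeIf(e -> startsAt.equals(e.get("starts_at")));
    }

    // Helper that builds one schedule entry with status "open".
    static Map<String, String> entry(String startsAt, String endsAt) {
        Map<String, String> e = new HashMap<>();
        e.put("status", "open");
        e.put("starts_at", startsAt);
        e.put("ends_at", endsAt);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, String>> schedule = new ArrayList<>();
        schedule.add(entry("2020-10-13T17:00:00-05:00", "2020-10-13T18:00:00-05:00"));
        schedule.add(entry("2020-10-13T17:00:00-05:00", "2020-10-13T18:30:00-05:00"));
        schedule.add(entry("2020-10-13T18:00:00-05:00", "2020-10-13T19:30:00-05:00"));
        removeByStart(schedule, "2020-10-13T17:00:00-05:00");
        System.out.println(schedule.size()); // 1
    }
}
```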

Elasticsearch painless scripting with dates

I have a document which has a date field. I'd like to sort documents by this date ASC, but with the ones whose date is in the past at the end.
In the end, it's as if I want to map the document value to a new value:
- If date is > "utc now", then assign value to whatever the date is
- If date is < "utc now", then assign value to max date
Then, I can sort by this field ASC.
So, it seems the only way to achieve this is with Painless scripting.
This is what I've got so far. It works, but I'm not sure if it's the correct approach.
GET /listings/_search
{
"track_total_hits": true,
"from": 0,
"query": {
"match_all": {}
},
"size": 48,
"sort": [
{
"_script" : {
"type": "string",
"script": {
"lang": "painless",
"source": "if (doc['auctionOn.utc'].size() == 0) { return params['maxTimestamp'].toString(); } else { long timestampDoc = doc['auctionOn.utc'].value.toInstant().toEpochMilli();long timestampNow = new Date().getTime();if (timestampDoc > timestampNow) { return timestampDoc.toString(); } else { return params['maxTimestamp'].toString(); } }",
"params": {
"maxTimestamp": 9223372036854776000
}
},
"order": "asc"
}
}
]
}
Can someone please advise if this is the correct/performant approach?
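One observation on the rule itself: it can be expressed as a numeric key rather than a string, so that values compare numerically instead of character by character. The Java sketch below only illustrates that key; the class and method names are hypothetical, and whether a numeric script sort is faster here is not something this thread establishes. It uses Long.MAX_VALUE (9223372036854775807) as the "max date", which is slightly below the maxTimestamp literal in the query above.

```java
class FutureFirstSortKeySketch {

    // Returns the document's timestamp when it lies in the future,
    // and Long.MAX_VALUE when it is missing or not after "now".
    static long sortKey(Long timestampDoc, long timestampNow) {
        if (timestampDoc == null || timestampDoc <= timestampNow) {
            return Long.MAX_VALUE;
        }
        return timestampDoc;
    }

    public static void main(String[] args) {
        long now = 1_000L; // hypothetical "now" in epoch milliseconds
        System.out.println(sortKey(1_500L, now));                 // 1500
        System.out.println(sortKey(500L, now) == Long.MAX_VALUE); // true
        System.out.println(sortKey(null, now) == Long.MAX_VALUE); // true
    }
}
```

Sorting ascending on this key puts upcoming dates first, nearest first, and pushes past or missing dates to the end.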

Aggregations on nested documents with painless scripting

Using Elasticsearch 6.2.
So I have a deeply nested document structure which has all the proper mapping (nested, text, keyword, etc). A sample document is as follows:
{
"type": "Certain Type",
"lineItems": [
{
"lineValue": 10,
"events": [
{
"name": "CREATED",
"timeStamp": "TIME VALUE"
},
{
"name": "ENDED",
"timeStamp": "TIME VALUE"
}
]
}
]
}
What I want to do is find out the average time required for all lines to go from CREATED to ENDED.
I created the following query
GET /_search
{
"size": 0,
"query": {
"match": {
"type": "Certain Type"
}
},
"aggs": {
"avg time": {
"nested": {
"path": "lineItems.events"
},
"aggs": {
"avg time": {
"avg": {
"script": {
"lang": "painless",
"source": """
long timeDiff = 0;
long fromTime = 0;
long toTime = 0;
if(doc['lineItems.events.name.keyword'] == "CREATED"){
fromTime = doc['lineItems.events.timeValue'].value.getMillis();
}
else if(doc['lineItems.events.name.keyword'] == "ENDED"){
toTime = doc['lineItems.events.timeValue'].value.getMillis();
}
timeDiff = toTime-fromTime;
return (timeDiff)
"""
}
}
}
}
}
}
}
The aggregation returned 0, which is wrong.
Is there any way to achieve this?
Using doc[...] in a script on nested objects does not work, because Elasticsearch indexes each nested object as a separate document.
Use params._source instead (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html). Note that access to the source is really slow; if you have a lot of documents or need to run this query often, consider adding this field to the main document.
I assume all values exist; add checks for robustness if needed. This should work:
long toTime = 0;
long fromTime = 0;
// Read both timestamps from the source, then return their difference.
toTime = params['_source']['ENDED'];
fromTime = params['_source']['CREATED'];
return (toTime - fromTime);
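The key point is that both events of a line item have to be visible to the same computation; a script that is evaluated once per nested event never sees CREATED and ENDED together. As an illustration only, here is the intended calculation in plain Java over the document structure from the question. The class and method names are hypothetical, and the timestamps are made up because the question only shows "TIME VALUE" placeholders.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;

class AverageLineDurationSketch {

    // Returns ENDED minus CREATED in milliseconds for one line item's events.
    // Assumes both events exist, as the answer does.
    static long durationMillis(List<Map<String, String>> events) {
        long fromTime = 0;
        long toTime = 0;
        for (Map<String, String> event : events) {
            long millis = Instant.parse(event.get("timeStamp")).toEpochMilli();
            if ("CREATED".equals(event.get("name"))) {
                fromTime = millis;
            } else if ("ENDED".equals(event.get("name"))) {
                toTime = millis;
            }
        }
        return toTime - fromTime;
    }

    // Averages the per-line durations across all line items.
    static double averageMillis(List<List<Map<String, String>>> lineItems) {
        return lineItems.stream()
            .mapToLong(AverageLineDurationSketch::durationMillis)
            .average()
            .orElse(0.0);
    }

    public static void main(String[] args) {
        List<Map<String, String>> line1 = List.of(
            Map.of("name", "CREATED", "timeStamp", "2020-01-01T00:00:00Z"),
            Map.of("name", "ENDED", "timeStamp", "2020-01-01T00:00:10Z"));
        List<Map<String, String>> line2 = List.of(
            Map.of("name", "CREATED", "timeStamp", "2020-01-01T00:00:00Z"),
            Map.of("name", "ENDED", "timeStamp", "2020-01-01T00:00:30Z"));
        System.out.println(averageMillis(List.of(line1, line2))); // 20000.0
    }
}
```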
