Proper Groovy script for sum of fields in Elasticsearch documents

This question is a follow-up to an earlier question.
If my documents look like so:
{"documentid":1,
"documentStats":[ {"foo_1_1":1}, {"foo_2_1":5}, {"boo_1_1":3} ]
}
What would be the correct Groovy script to use in a script_field to return, per document, the sum of all documentStats values whose key matches a particular pattern (e.g., contains _1_)?

As in the referenced question, there's a one-liner that does the same thing with your new structure:
{
  "query" : {
    ...
  },
  "script_fields" : {
    "sum" : {
      "script" : "_source.documentStats.findAll{ it.keySet()[0] =~ '_1_' }.collect{ it.values() }.flatten().sum()"
    }
  }
}

I don't know ES, but in pure Groovy you would do:
document.documentStats.collectMany { Map entry ->
    // assumes each entry has a single key and a single int value
    def item = entry.entrySet()[0]
    item.key.contains('_1_') ? [item.value] : []
}.sum()
Hope this helps.
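For readers without an ES cluster handy, the same filter-and-sum logic can be sketched in plain Python (the function name is mine, and each documentStats entry is assumed to be a single-key map, as in the question):

```python
# Sketch of the filter-and-sum logic outside of Elasticsearch,
# assuming each documentStats entry is a single-key dict.
doc = {
    "documentid": 1,
    "documentStats": [{"foo_1_1": 1}, {"foo_2_1": 5}, {"boo_1_1": 3}],
}

def sum_matching_stats(document, pattern):
    """Sum the values of stats entries whose key contains `pattern`."""
    return sum(
        value
        for entry in document["documentStats"]
        for key, value in entry.items()
        if pattern in key
    )

print(sum_matching_stats(doc, "_1_"))  # foo_1_1 (1) + boo_1_1 (3) -> 4
```

Note that foo_2_1 does not match, since its key contains _2_ but not _1_ as a substring.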

Related

How to save a specific array element in Logstash

I am receiving a JSON object with an array property. I would like to search the array and save only the element that matches my criteria. My input looks like this:
{
  "identifier": [
    { "system" : "Source1", "value" : "TheValueIDontWant" },
    { "system" : "Source2", "value" : "TheValueIWant" }
  ]
}
and I would like my output to look like this:
{
  "SourceID": "TheValueIWant"
}
So in this case, I want to search the identifier array for the element which has Source2 as the system and save its corresponding value to my new property.
Is there a way to do this in Logstash?
Thanks
I got this answer from someone on the Elastic forum. Using Ruby was indeed the answer, and this is how:
ruby {
  code => '
    ids = event.get("identifier")
    if ids.is_a? Array
      ids.each { |x|
        if x["system"] == "Source2"
          event.set("SourceID", x["value"])
        end
      }
    end
  '
}
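Outside of Logstash, the selection logic of that filter can be sketched in plain Python, with an ordinary dict standing in for the event (the function name is mine):

```python
# Plain-Python sketch of what the ruby filter above does, using a dict
# in place of the Logstash event (field names mirror the question).
def extract_source_id(event, wanted_system="Source2"):
    """Copy the value of the identifier entry whose system matches."""
    ids = event.get("identifier")
    if isinstance(ids, list):
        for entry in ids:
            if entry.get("system") == wanted_system:
                event["SourceID"] = entry["value"]
    return event

event = {
    "identifier": [
        {"system": "Source1", "value": "TheValueIDontWant"},
        {"system": "Source2", "value": "TheValueIWant"},
    ]
}
print(extract_source_id(event)["SourceID"])  # TheValueIWant
```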

Elasticsearch partial update script: Clear array and replace with new values

I have documents like:
{
  MyProp: ["lorem", "ipsum", "dolor"]
  ... lots of stuff here ...
}
My documents can be quite big (though these MyProp fields are not) and are expensive to generate from scratch.
Sometimes I need to update batches of these, so it would be beneficial to do a partial update (to save "indexing client" processing power and bandwidth, and thus time) and replace the MyProp values with new ones.
Example of original document:
{
  MyProp: ["lorem", "ipsum", "dolor"]
  ... lots of stuff here ...
}
Example of updated document (or rather how it should look):
{
  MyProp: ["dolor", "sit"]
  ... lots of stuff here ...
}
From what I have seen, this involves scripting.
Can anyone enlighten me with the remaining bits of the puzzle?
Bounty added:
I'd like to also have some instructions of how to make these in a batch statement, if possible.
You can use the update by query API to do batch updates. This works out of the box from ES 2.3 onwards; for earlier versions you need to install a plugin.
POST index/_update_by_query
{
  "script": {
    "inline": "ctx._source.MyProp += newProp",
    "params": {
      "newProp": "sit"
    }
  },
  "query": {
    "match_all": {}
  }
}
You can of course use whatever query you want in order to select the documents on which MyProp needs to be updated. For instance, you could have a query to select documents having some specific MyProp values to be replaced.
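For instance, a possible selection query (the terms values here are purely illustrative) could restrict the update to documents whose MyProp still contains values you want to replace:

```
POST index/_update_by_query
{
  "script": {
    "inline": "ctx._source.MyProp += newProp",
    "params": {
      "newProp": "sit"
    }
  },
  "query": {
    "terms": {
      "MyProp": ["lorem", "ipsum"]
    }
  }
}
```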
The above will only add a new value to the existing array. If you need to completely replace the MyProp array, then you can also change the script to this:
POST index/_update_by_query
{
  "script": {
    "inline": "ctx._source.MyProp = newProps",
    "params": {
      "newProps": ["dolor", "sit"]
    }
  },
  "query": {
    "match_all": {}
  }
}
Note that you also need to enable dynamic scripting in order for this to work.
UPDATE
If you simply want to update a single document you can use the partial document update API, like this:
POST test/type1/1/_update
{
  "doc" : {
    "MyProp" : ["dolor", "sit"]
  }
}
This will effectively replace the MyProp array in the specified document.
If you want to go the bulk route, you don't need scripting to achieve what you want:
POST index/type/_bulk
{ "update" : {"_id" : "1"} }
{ "doc" : {"MyProp" : ["dolor", "sit"] } }
{ "update" : {"_id" : "2"} }
{ "doc" : {"MyProp" : ["dolor", "sit"] } }
Would a _bulk update work for you?
POST test/type1/_bulk
{"update":{"_id":1}}
{"script":{"inline":"ctx._source.MyProp += new_param","params":{"new_param":"bla"},"lang":"groovy"}}
{"update":{"_id":2}}
{"script":{"inline":"ctx._source.MyProp += new_param","params":{"new_param":"bla"},"lang":"groovy"}}
{"update":{"_id":3}}
{"script":{"inline":"ctx._source.MyProp += new_param","params":{"new_param":"bla"},"lang":"groovy"}}
....
You would also need to enable inline scripting for Groovy. What the above does is add the value bla to the MyProp field of the listed documents. Of course, depending on your requirements, many other changes can be performed in that script.
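If you generate such a request programmatically, note that the bulk body is NDJSON: one JSON object per line, with a trailing newline after the last line. A minimal sketch in Python (the function name and document IDs are illustrative):

```python
import json

# Sketch: build the NDJSON body for a _bulk partial update, assuming
# illustrative document IDs and the MyProp replacement from the question.
def build_bulk_body(doc_ids, new_props):
    lines = []
    for doc_id in doc_ids:
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {"MyProp": new_props}}))
    # The bulk API requires a newline after the last line.
    return "\n".join(lines) + "\n"

body = build_bulk_body(["1", "2"], ["dolor", "sit"])
print(body)
```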

Elasticsearch filter by length of a string field

I am trying to get records that have more than X characters in 'title'.
NOTE: not all records contain a title field.
I have tried:
GET books/_search
{
  "filter" : {
    "script" : {
      "script" : "_source.title.length() > 10"
    }
  }
}
As a result, I get this error:
GroovyScriptExecutionException[NullPointerException[Cannot invoke method length() on null object
How can I solve it?
You need to take into account that some documents might have a null title field. So you can use the groovy null-safe operator. Also make sure to use the POST method instead:
POST books/_search
{
  "filter" : {
    "script" : {
      "script" : "_source.title?.size() > 10"
    }
  }
}
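The null-safe operator matters because a document without a title must simply not match, rather than raise an exception. The same guard can be sketched over plain dicts in Python (function name and sample data are mine):

```python
# Sketch of the null-safe length filter: documents without a title
# must not raise an error, they simply don't match.
books = [
    {"title": "A Very Long Book Title Indeed"},
    {"title": "Short"},
    {},  # no title field at all
]

def titles_longer_than(docs, min_len):
    """Keep documents whose title exists and exceeds min_len characters."""
    return [
        d for d in docs
        if d.get("title") is not None and len(d["title"]) > min_len
    ]

print(len(titles_longer_than(books, 10)))  # 1
```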
You can also use custom tokenizers to count the number of characters. Check this answer for a possible help: https://stackoverflow.com/a/47556098/463846

Can you refer to and filter on a script field in a query expression, after defining it?

I'm new to Elasticsearch and was wondering: once you define a script field with MVEL syntax, can you subsequently filter on it, or refer to it in the query body, as if it were any other field?
I can't find any examples of this, and at the same time I don't see any mention of whether it's possible on the docs pages:
http://www.elasticsearch.org/guide/reference/modules/scripting/
http://www.elasticsearch.org/guide/reference/api/search/script-fields/
The book ElasticSearch Server doesn't mention whether this is possible either.
As of 2018 and Elastic 6.2, it is still not possible to filter by fields defined with script_fields; however, you can define a custom script filter for the same purpose. For example, let's assume that you've defined the following script field:
{
  "script_fields" : {
    "some_date_fld_year" : {
      "script" : "doc['some_date_fld'].empty ? null : doc['some_date_fld'].date.year"
    }
  }
}
you can filter by it with
{
  "query": {
    "bool" : {
      "must" : {
        "script" : {
          "script" : {
            "source": "(doc['some_date_fld'].empty ? null : doc['some_date_fld'].date.year) >= 2017",
            "lang": "painless"
          }
        }
      }
    }
  }
}
It's not possible for one simple reason: script_fields are calculated during the final stage of a search (the fetch phase), and only for the records you actually retrieve (the top 10 by default). A script filter, on the other hand, is applied to all records that were not filtered out by preceding filters, and that happens during the query phase, which precedes the fetch phase. In other words, by the time filters are applied, the script_fields don't exist yet.

Sorting results based on location using elastica

I am trying to learn Elasticsearch, using Elastica to connect to it, and I'm finding it hard to get the information I need to understand how to query it.
Basically, I have inserted data into Elasticsearch, including geo coordinates, and now I need to run a query that sorts the results from closest to farthest.
I want to find all the stores in my state, then order them by which one is closest to my current location.
So, given a field called "state" and a field called "point", which is an array holding long/lat, what would the query be using Elastica?
Thanks for any help that you can give me.
First, you need to map your location field as type geo_point (this needs to be done before inserting any data)
{
  "stores" : {
    "properties" : {
      "point" : {
        "type" : "geo_point"
      }
    }
  }
}
After that, you can simply sort your search by _geo_distance
{
  "sort" : [
    {
      "_geo_distance" : {
        "stores.point" : [-70, 40], // <- reference starting position (lon, lat)
        "order" : "asc",
        "unit" : "km"
      }
    }
  ],
  "query" : {
    "match_all" : {}
  }
}
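To illustrate what a _geo_distance sort computes, here is a sketch of the underlying idea in Python, sorting points by great-circle (haversine) distance from the reference position; the store names and coordinates are made up:

```python
import math

# Sketch: sort stores by haversine distance from a reference point,
# mirroring what the _geo_distance sort does (coordinates are made up).
def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

reference = (-70, 40)  # lon, lat, as in the sort example above
stores = [
    {"name": "far", "point": (-74, 47)},
    {"name": "near", "point": (-70.5, 40.2)},
]
stores.sort(key=lambda s: haversine_km(*reference, *s["point"]))
print([s["name"] for s in stores])  # ['near', 'far']
```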
For Elastica, have a look at their docs regarding mapping and query building, and read the unit tests.
For those wanting to sort by distance, this cut-down snippet details how to use a custom score:
$q = new BoolQuery();
$subQ = new MultiMatchQuery();
$subQ->setQuery('find me')->setFields(array('foo', 'bar'));
$q->addShould($subQ);
$cs = new CustomScore();
$cs->setScript('-doc["location"].distanceInKm(lat,lon)');
$cs->addParams(array(
'lat' => -33.882583,
'lon' => 151.209737
));
$cs->setQuery($q);
Hope it helps.
