Elasticsearch: filter by length of a string field

I am trying to get the records whose 'title' has more than X characters.
NOTE: not all records contain a title field.
I have tried:
GET books/_search
{
  "filter" : {
    "script" : {
      "script" : "_source.title.length() > 10"
    }
  }
}
As a result, I get this error:
GroovyScriptExecutionException[NullPointerException[Cannot invoke method length() on null object
How can I solve it?

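You need to take into account that some documents might not have a title field, in which case _source.title is null. You can use the Groovy null-safe operator (?.) so that the call returns null instead of throwing; the comparison null > 10 then evaluates to false, so those documents are simply excluded. Also make sure to use the POST method instead: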
POST books/_search
{
  "filter" : {
    "script" : {
      "script" : "_source.title?.size() > 10"
    }
  }
}
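To see what the null-safe check does without a running cluster, here is a minimal Python sketch that emulates the filter on in-memory dictionaries. The function name `title_longer_than` and the sample documents are hypothetical; the code only mirrors the logic of the Groovy expression and does not call Elasticsearch.

```python
def title_longer_than(doc, min_chars):
    """Emulate `_source.title?.size() > min_chars` for a single document."""
    title = doc.get("title")  # None when the field is missing or null
    # Like Groovy's `?.`, skip the length call when there is no title.
    return title is not None and len(title) > min_chars


# Hypothetical sample documents; the second one has no title at all.
books = [
    {"title": "A Brief History of Time"},
    {"author": "unknown"},
    {"title": "Dune"},
]

matches = [b for b in books if title_longer_than(b, 10)]
print(matches)
```

Documents without a title fall out of the result instead of raising an error, which is the behaviour the null-safe operator gives the script filter.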

You can also use custom tokenizers to count the number of characters. See this answer for one possible approach: https://stackoverflow.com/a/47556098/463846

Related

Use of field and script in Facet

I am new to Elasticsearch. I am trying to understand this query, but I could not work out a few things, such as field and script. I read the official documentation and understand that facets have been replaced by aggregations and that AttributeLabels is the facet name, but I could not understand the full query. Can anyone explain it to me?
Thank you
{
  "size" : 0,
  "facets" : {
    "AttributeLabels" : {
      "terms" : {
        "field" : "field",
        "size" : 50,
        "script" : "scriptName",
        "lang" : "lang"
      }
    }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-facets-terms-facet.html
Field - The field that the facet is evaluated on
Size - The number of top terms that are returned
Script - A script that returns either a string, e.g. "term + 'aaa'", which becomes the term that is counted, or a boolean, e.g. "term == 'aaa' ? true : false", which includes or excludes the term from the facet
Lang - The scripting language used
But if you can, I recommend upgrading to Elasticsearch 2. :)
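The two script behaviours described above can be illustrated with a loose Python emulation. This is only a sketch based on the description in this answer, not the actual facet implementation; `terms_facet` and the sample labels are hypothetical names.

```python
from collections import Counter


def terms_facet(values, size, script=None):
    """Count terms; `script` may rewrite a term (str) or keep/drop it (bool)."""
    counts = Counter()
    for term in values:
        if script is not None:
            result = script(term)
            if isinstance(result, bool):
                if not result:
                    continue  # a boolean False excludes the term
            else:
                term = result  # a string result replaces the term
        counts[term] += 1
    # `size` limits the output to the most frequent terms.
    return counts.most_common(size)


labels = ["red", "blue", "red", "green", "red", "blue"]
rewritten = terms_facet(labels, 50, script=lambda t: t + "aaa")
only_red = terms_facet(labels, 50, script=lambda t: t == "red")
top_two = terms_facet(labels, 2)
print(rewritten, only_red, top_two)
```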

Proper groovy script for sum of fields in Elasticsearch documents

This question is a followup to this question.
If my documents look like so:
{"documentid":1,
"documentStats":[ {"foo_1_1":1}, {"foo_2_1":5}, {"boo_1_1":3} ]
}
What would be the correct Groovy script to use in a script_field for returning the sum of all documentStats per document whose keys match a particular pattern (e.g., contain _1_)?
Similar to the referred question, there's a one-liner that does the same thing with your new structure:
{
  "query" : {
    ...
  },
  "script_fields" : {
    "sum" : {
      "script" : "_source.documentStats.findAll{ it.keySet()[0] =~'_1_' }.collect{it.values()}.flatten().sum()"
    }
  }
}
I don't know ES, but in pure Groovy you would do:
document.documentStats.collectMany { Map entry ->
// assumes each entry has a single key and a single int value
def item = entry.entrySet()[0]
item.key.contains('_1_') ? [item.value] : []
}.sum()
Hope this helps.
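For comparison, the same computation can be sketched in Python on the sample document from the question. The function name `sum_matching_stats` is hypothetical, and a plain substring test stands in for the Groovy `=~` regex match, which is equivalent for a literal pattern such as _1_.

```python
def sum_matching_stats(document, pattern):
    """Sum the values in documentStats whose key contains `pattern`."""
    total = 0
    for entry in document["documentStats"]:
        for key, value in entry.items():
            if pattern in key:  # substring test, standing in for Groovy's =~
                total += value
    return total


document = {
    "documentid": 1,
    "documentStats": [{"foo_1_1": 1}, {"foo_2_1": 5}, {"boo_1_1": 3}],
}

# foo_1_1 and boo_1_1 contain "_1_"; foo_2_1 does not.
print(sum_matching_stats(document, "_1_"))
```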

How to return all documents where a string occurs in the document at least N times

If I wanted to return all documents that contain the term beetlejuice, I could use a query like
{
  "bool":{
    "should":[
      {
        "term":{
          "description":"beetlejuice"
        }
      }
    ]
  }
}
What's not clear is how to return all documents where the description field contains the string beetlejuice at least 3 times within it. I see minimum_should_match, but I think that is to be used for separate queries in a bool. How can I craft a query to match when a word occurs at least N times within the document's description field?
You can use scripting to achieve this.
Basically, all you need is the term frequency of the desired term in the document field and you can access the value using scripts.
_index['FIELD']['TERM'].tf()
Sample filter script (with N set to 2, the condition tf() > N matches documents containing the term at least 3 times):
"filter" : {
  "script" : {
    "script" : "_index['description']['beetlejuice'].tf() > N",
    "params" : {
      "N" : 2
    }
  }
}
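The effect of the term-frequency condition can be sketched in plain Python. This is a minimal emulation, assuming lowercase whitespace tokenization as a stand-in for whatever analyzer the description field really uses; `term_frequency` and the sample descriptions are hypothetical.

```python
def term_frequency(text, term):
    """Count occurrences of `term` among lowercase whitespace tokens."""
    tokens = text.lower().split()  # assumption: a very simple analyzer
    return tokens.count(term)


# Hypothetical sample descriptions.
descriptions = [
    "Beetlejuice Beetlejuice Beetlejuice",
    "beetlejuice is a film",
    "say beetlejuice then beetlejuice again",
]

N = 2  # tf > 2 means "at least 3 times"
matching = [d for d in descriptions if term_frequency(d, "beetlejuice") > N]
print(matching)
```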

elastic search filter by documents count in nested document

I have this schema in elastic search.
[
'ID' : '1233',
Geomtries:[{
'doc1' : 'F1',
'doc2' : 'F2'
},
(optional for some of the documents)
{
'doc2' : 'F1',
'doc3' : 'F2'
}]
]
Geometries is a nested element.
I want to get all of the documents that have exactly one object inside Geometries.
Tried so far:
"script" : {"script" : "if (Geomtries.size < 2) return true"}
But I get an exception: no such property GEOMTRIES
If the field is mapped as type nested, the typical doc[fieldkey].values.size() approach does not seem to work. I found the following script to work:
{
  "from" : 0,
  "size" : <SIZE>,
  "query" : {
    "filtered" : {
      "filter" : {
        "script" : {
          "script" : "_source.containsKey('Geomtries') && _source['Geomtries'].size() == 1"
        }
      }
    }
  }
}
NB: You must use _source instead of doc.
The problem is the way you access fields in your script. Use:
doc['Geometry'].size()
or
_source.Geometry.size()
By the way, for performance reasons I would denormalize and add a GeometryNumber field. You can use the transform mapping to compute the size at index time.
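The denormalization idea can be sketched as follows. This is a minimal illustration that computes the count on the client before indexing, rather than with the transform mapping mentioned above; the helper name `with_geometry_count` is hypothetical, and the field name Geomtries is spelled as in the question.

```python
def with_geometry_count(doc):
    """Return a copy of `doc` with a precomputed GeometryNumber field."""
    enriched = dict(doc)
    # Documents without the nested field get a count of 0.
    enriched["GeometryNumber"] = len(doc.get("Geomtries", []))
    return enriched


# Hypothetical documents shaped like the schema in the question.
docs = [
    {"ID": "1233", "Geomtries": [{"doc1": "F1", "doc2": "F2"}]},
    {"ID": "1234", "Geomtries": [{"doc1": "F1"}, {"doc2": "F1", "doc3": "F2"}]},
    {"ID": "1235"},
]

enriched_docs = [with_geometry_count(d) for d in docs]
single = [d["ID"] for d in enriched_docs if d["GeometryNumber"] == 1]
print(single)
```

With the count stored as an ordinary numeric field, the query no longer needs a script to find documents with exactly one geometry.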

Can you refer to and filter on a script field in a query expression, after defining it?

I'm new to ElasticSearch and was wondering: once you define a script field with MVEL syntax, can you subsequently filter on it, or refer to it in the query body as if it were any other field?
I can't find any examples of this, and at the same time I don't see any mention of whether this is possible on the docs pages
http://www.elasticsearch.org/guide/reference/modules/scripting/
http://www.elasticsearch.org/guide/reference/api/search/script-fields/
The book ElasticSearch Server doesn't mention whether this is possible either.
As of 2018 and Elasticsearch 6.2, it is still not possible to filter by fields defined with script_fields; however, you can define a custom script filter for the same purpose. For example, let's assume that you've defined the following script field:
{
  "script_fields" : {
    "some_date_fld_year" : {
      "script" : {
        "lang" : "painless",
        "source" : "doc['some_date_fld'].empty ? null : doc['some_date_fld'].date.year"
      }
    }
  }
}
You can filter by it with:
{
  "query": {
    "bool" : {
      "must" : {
        "script" : {
          "script" : {
            "source": "(doc['some_date_fld'].empty ? null : doc['some_date_fld'].date.year) >= 2017",
            "lang": "painless"
          }
        }
      }
    }
  }
}
It's not possible for one simple reason: the script_fields are calculated during the final stage of the search (the fetch phase) and only for the records that you retrieve (the top 10 by default). The script filter is applied to all records that were not filtered out by preceding filters, and this happens during the query phase, which precedes the fetch phase. In other words, when the filters are applied, the script_fields don't exist yet.
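The ordering of the two phases can be illustrated with a toy Python model. This is only a sketch of the explanation above, not how Elasticsearch is implemented; `search`, `year_of`, `year_field`, and the sample documents are hypothetical.

```python
from datetime import date


def search(docs, query_filter, script_fields, size=10):
    """Toy two-phase search: filter everything, then enrich only one page."""
    # Query phase: every candidate document is tested by the filter.
    hits = [d for d in docs if query_filter(d)]
    # Fetch phase: script fields are computed only for the returned page.
    page = hits[:size]
    return [
        {"_source": d, "fields": {name: fn(d) for name, fn in script_fields.items()}}
        for d in page
    ]


def year_of(doc):
    value = doc.get("some_date_fld")
    return None if value is None else value.year


computed_for = []  # records which documents the script field ran on


def year_field(doc):
    computed_for.append(doc["id"])
    return year_of(doc)


# Twelve hypothetical documents dated 2010 through 2021.
docs = [{"id": i, "some_date_fld": date(2010 + i, 1, 1)} for i in range(12)]

results = search(
    docs,
    query_filter=lambda d: (year_of(d) or 0) >= 2017,
    script_fields={"some_date_fld_year": year_field},
    size=3,
)
print([r["fields"]["some_date_fld_year"] for r in results], computed_for)
```

The filter sees all twelve documents, while the script field runs only on the three that are returned, so the filter has to repeat the expression rather than refer to the script field.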
