Elasticsearch - Create field using script if doesn't exist

Is there a way to dynamically add fields using scripts? I am running a script that checks whether a field exists and, if it does not, creates it.
I'm trying out:
script: 'if (ctx._source.attending == null) { ctx._source.attending = { events: newField } } else if (ctx._source.attending.events == null) { ctx._source.attending.events = newField } else { ctx._source.attending.events += newField }'
However, unless my _source already has a field explicitly named attending, I get:
[Error: ElasticsearchIllegalArgumentException[failed to execute script];
nested: PropertyAccessException[
[Error: could not access: attending; in class: java.util.LinkedHashMap]

To check whether a field exists, use the ctx._source.containsKey function, e.g.:
curl -XPOST "http://localhost:9200/myindex/message/1/_update" -d'
{
"script": "if (!ctx._source.containsKey(\"attending\")) { ctx._source.attending = newField }",
"params" : {"newField" : "blue" },
"myfield": "data"
}'
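As a rough illustration of the same branching outside Elasticsearch, here is a plain-Java sketch that treats _source as a Map and uses containsKey for all three cases from the question. The class and method names are made up for this example, and it assumes events is meant to be a list of strings, which the question does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.LinkedHashMap;

class AttendingUpdate {
    // Hypothetical helper: mirrors the three branches of the update script on a plain Map.
    @SuppressWarnings("unchecked")
    static void addEvent(Map<String, Object> source, String newField) {
        if (!source.containsKey("attending")) {
            // "attending" is absent: create the object together with its events list.
            List<String> events = new ArrayList<>();
            events.add(newField);
            Map<String, Object> attending = new LinkedHashMap<>();
            attending.put("events", events);
            source.put("attending", attending);
            return;
        }
        Map<String, Object> attending = (Map<String, Object>) source.get("attending");
        if (!attending.containsKey("events")) {
            // "attending" exists but has no events yet.
            List<String> events = new ArrayList<>();
            events.add(newField);
            attending.put("events", events);
        } else {
            // Both exist: append to the existing list (assumed to be a List<String>).
            ((List<String>) attending.get("events")).add(newField);
        }
    }
}
```

The point of the sketch is only that the existence check happens through containsKey before any property is read, which is what avoids the "could not access" error.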

Consider whether you really need to check that the field exists at all. You can simply apply the new mapping to ES: it adds the field if it is missing and does nothing if it already exists.
Our system re-applies the mappings on every application startup.

Related

Elasticsearch - how to check for key exist script query parameters?

Is there any way to return null if there is no corresponding value?
For example, since params has no field4 value, I want null to be assigned to the syn variable.
"aggs": {
  "countries": {
    "terms": {
      "script": {
        "params": {
          "field1": "country",
          "field2": "test",
          "field3": "test"
        },
        "inline": "def syn = field4; if (syn == null) doc[country].value;"
      }
    }
  }
}
Currently, errors always occur if there is no corresponding value.
"caused_by": {
"type": "missing_property_exception",
"reason": "No such property: field1 for class: e5ce2464b456f9c0fa360269abc927e65998ecf7"
}
I am using Groovy and Elasticsearch version 2.2.
I can't use Python or JavaScript, which require additional plug-ins to be installed.
How can I get a null value without causing an error if there is no value?
Thanks
Each doc field exposes a boolean value called empty that indicates whether the document has such a field or not.
So you should do it this way:
"inline": "def syn = field4 ?: 'dummy'; return (doc[syn].empty) ? null : doc[syn].value;"
UPDATE: Detecting a missing parameter variable in Groovy is trivial if we know the script class name. But since the script class is created dynamically (e.g. e5ce2464b456f9c0fa360269abc927e65998ecf7), the process is not trivial at all. One way around this is to wrap your code in a try/catch block, so that the lookup can fail but the failure is caught and null is returned instead, basically like this:
"inline": "try { def syn = field4; return (doc[syn].empty) ? null : doc[syn].value; } catch (e) { return null } "
However, ES introduced some security hardening and class whitelisting for scripting in 2.2, so the try/catch above may be rejected unless the exception classes are permitted. One way to make it work is to whitelist a few exception classes in your .java.policy file, like this:
grant {
permission org.elasticsearch.script.ClassPermission "java.lang.Throwable";
permission org.elasticsearch.script.ClassPermission "java.lang.Exception";
permission org.elasticsearch.script.ClassPermission "groovy.lang.GroovyException";
};
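For reference, the behaviour the question asks for (null when either the parameter or the field is missing) can be sketched in plain Java with two maps standing in for the script's params and the document's values. This is only an illustration of the intended logic, not Groovy and not the Elasticsearch scripting API; OptionalParam and valueOrNull are hypothetical names.

```java
import java.util.Map;

class OptionalParam {
    // Hypothetical helper: params and doc stand in for the script's params and doc values.
    static Object valueOrNull(Map<String, Object> params, Map<String, Object> doc, String paramName) {
        Object fieldName = params.get(paramName); // null when the parameter was not passed
        if (fieldName == null) {
            return null;
        }
        return doc.get(fieldName); // null when the document has no such field
    }
}
```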

Add an object value to a field to Elastic Search during ingest and drop empty valued fields all during ingest

I am ingesting CSV data into Elasticsearch using the append processor. I already have two fields that are objects (object1 and object2), and I want to append them both into an array in a different field (mainlist), so that it comes out as mainlist:[ {object1}, {object}]. I have tried the set processor with the copy_from parameter, but I get an error saying I am missing the required property "value", even though the Elasticsearch documentation does not use the "value" property when it uses "copy_from": {"set": {"field": "mainlist","copy_from": ["object1", "object"]}}. My syntax is copied exactly from the documentation. Please help.
Furthermore, I need to drop empty fields at the ingest level so they are not returned; I don't want "fieldname": "" returned to the user. What is the best way to do that? I am new to Elasticsearch and it has not been going well.
To drop the empty fields at the ingest level, set up a pipeline:
PUT _ingest/pipeline/no_empty_fields
{
  "description": "Removes empty-ish fields from a doc",
  "processors": [
    {
      "script": {
        "source": """
          def keys_to_remove = ctx.keySet()
            .stream()
            .filter(field -> ctx[field] == null ||
                             ctx[field] == "")
            .collect(Collectors.toList());
          for (key in keys_to_remove) {
            ctx.remove(key);
          }
        """
      }
    }
  ]
}
and apply it when indexing:
POST myindex/_doc?pipeline=no_empty_fields
{
"fieldname23": 123,
"fieldname": null,
"fieldname123": ""
}
You can of course extend the conditions to drop fields with other unwanted values, such as "undefined", "Infinity" and others.
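The pipeline script collects the keys first and removes them afterwards. A plain-Java sketch of the same pattern, with a Map standing in for ctx, shows why. EmptyFieldFilter and removeEmpty are names invented for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class EmptyFieldFilter {
    // Hypothetical helper: removes keys whose value is null or the empty string.
    static void removeEmpty(Map<String, Object> doc) {
        // Collect the keys first; removing entries while iterating over keySet()
        // can throw ConcurrentModificationException on a HashMap or LinkedHashMap.
        List<String> keysToRemove = doc.keySet().stream()
            .filter(key -> doc.get(key) == null || "".equals(doc.get(key)))
            .collect(Collectors.toList());
        for (String key : keysToRemove) {
            doc.remove(key);
        }
    }
}
```

Extending the filter condition here corresponds to extending the condition inside the script's filter call.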

ElasticSearch: Partial Update a document or remove it. (Opposite of upsert)

In ElasticSearch I'm using upsert to update a document that may not exist:
POST /website/pageviews/1/_update
{
"script" : "ctx._source.online+=1",
"upsert": {
"online": 1
}
}
Since my data are going to change frequently I want to remove my document if online == 0.
It would be useless to use update if I need to get the document and check online value every time, and I don't want to accumulate a lot of trash documents.
What is the best way to remove my document when online == 0? Something like:
POST /website/pageviews/1/_update
{
"script" : "ctx._source.online-=1",
"remove_doc": "ctx._source.online == 0"
}
You can use the delete operation like this:
POST /website/pageviews/1/_update
{
"script" : "if (ctx._source.online == 0) { ctx.op = 'delete' } else { ctx._source.online += 1 }",
"upsert": {
"online": 1
}
}
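The question actually asks for the decrementing case: subtract one and drop the document once the counter reaches zero. A plain-Java sketch of that decision, with a Map standing in for ctx._source and the returned string standing in for ctx.op, might look like the following. PageviewUpdate and decrement are hypothetical names, and treating "index" as the keep-and-reindex operation is an assumption of this sketch.

```java
import java.util.Map;

class PageviewUpdate {
    // Hypothetical helper: returns the operation the update should perform,
    // mirroring ctx.op in the script ("delete" drops the document).
    static String decrement(Map<String, Object> source) {
        int online = ((Number) source.get("online")).intValue() - 1;
        if (online <= 0) {
            return "delete";
        }
        // Counter is still positive: store it and keep the document.
        source.put("online", online);
        return "index";
    }
}
```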

How to remove field from document which matches a pattern in elasticsearch using Java?

I have crawled a few documents and created an index in Elasticsearch. I am using Sense to run queries.
This is my query in Elasticsearch:
POST /index/_update_by_query
{
  "script": {
    "inline": "ctx._source.remove(\"home\")"
  },
  "query": {
    "wildcard": {
      "url": {
        "value": "http://search.com/*"
      }
    }
  }
}
This is my Java program:
Client client = TransportClient.builder().addPlugin(ReindexPlugin.class)
.build().addTransportAddress(new InetSocketTransportAddress(
InetAddress.getByName("127.0.0.1"), 9300));
UpdateByQueryRequestBuilder ubqrb = UpdateByQueryAction.INSTANCE
.newRequestBuilder(client);
Script script1 = new Script("ctx._source.remove" +FieldName);
BulkIndexByScrollResponse r = ubqrb.source("index").script(script1)
.filter(wildcardQuery("url", patternvalue)).get();
FieldName (which holds the string home) is the name of the field I want to remove from my documents, and patternvalue holds the pattern "http://search.com/*". When I run this Java program, it doesn't remove the home field from my documents. Instead, it adds a new field called remove. I might be missing something. Any help would be appreciated.
If FieldName is the string home, then the expression "ctx._source.remove" +FieldName evaluates to "ctx._source.removehome", which is not the correct script. The correct code for that line is:
Script script1 = new Script("ctx._source.remove(\"" + FieldName + "\")");
This way the script will be:
ctx._source.remove("home")
That is the same as what you wrote in JSON:
"inline": "ctx._source.remove(\"home\")"
(\" in that JSON is just a " escaped in JSON syntax)
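To make the difference concrete, here is a small self-contained sketch that builds the script string and can be checked without a cluster. RemoveFieldScript and build are names invented for this example, and escaping quotes and backslashes in the field name is an extra precaution that the answer above does not include.

```java
class RemoveFieldScript {
    // Hypothetical helper: builds the script source for removing one field.
    static String build(String fieldName) {
        // Escape backslashes and double quotes so the name stays inside the string literal.
        String escaped = fieldName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "ctx._source.remove(\"" + escaped + "\")";
    }
}
```

With FieldName set to home, build(FieldName) yields ctx._source.remove("home"), which could then be passed to new Script(...) as in the corrected line.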

ElasticSearch: Using an existing field in script param

I am trying to create a nested object and set its field value to a document field's value. I can create a non-nested field from my script logic, and I can create a nested field with a hard-coded value, but I cannot get the two to work together.
Here is what I have so far.
Create a nested field:
{
"script": "ctx._source.displayFields = displayField",
"params": {
"displayField": {
"displayField": 11
}
}
}
Or I can use a script to fetch the value and set a field like this:
{
  "script": "if (ctx._source['fielda'] == 'term1') { ctx._source['displayField'] = ctx._source['field2']; } else if (ctx._source['fielda'] == 'term2') { ctx._source['displayFields.displayPrice'] = ctx._source['fieldb']; }"
}
But if I try to put a script in the params field, as below, I always get an error. Any advice would be greatly appreciated.
What I have tried without success:
{
  "script": "ctx._source.displayFields = displayField",
  "params": {
    "displayField": {
      "displayField": "tag"
    },
    "tag" : {
      "script": "ctx._source['numberField']"
    }
  }
}
I have also tried assigning a script as its subfield and putting the script directly as the value.
