How to rename a nested field containing dots with the Elasticsearch rename processor and ingest pipeline

I have a field in elasticsearch (5.5.1) which I need to rename because the name contains a '.' and it is causing various problems. The field I want to rename is nested inside another field.
I am trying to use a Rename Processor in an Ingest Pipeline to do a Reindex as described here: https://stackoverflow.com/a/43142634/5114
Here is my pipeline simulation request (you can copy this verbatim into the Dev Tools utility in Kibana to test it):
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "rename nested fields to remove dot",
    "processors": [
      {
        "rename": {
          "field": "message.message.group1",
          "target_field": "message_group1"
        }
      },
      {
        "rename": {
          "field": "message.message.group2",
          "target_field": "message.message_group2"
        }
      }
    ]
  },
  "docs": [
    {
      "_type": "status",
      "_id": "1509533940000-m1-bfd7183bf036bd346a0bcf2540c05a70fbc4d69e",
      "_version": 5,
      "_score": null,
      "_source": {
        "message": {
          "_job-id": "AV8wHJEaa4J0sFOfcZI5",
          "message.group1": 0,
          "message.group2": "foo"
        },
        "timestamp": 1509533940000
      }
    }
  ]
}
The problem is that I get an error when trying to use my pipeline:
{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "exception",
            "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
            "header": {
              "processor_type": "rename"
            }
          }
        ],
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "field [message.message.group1] doesn't exist"
          }
        },
        "header": {
          "processor_type": "rename"
        }
      }
    }
  ]
}
I think the problem is caused by the field "message.group1" being inside another field ("message"). I'm not sure how to refer to the field I want in the context of the processor. It seems that there could be ambiguity between cases of nested fields, fields containing dots and nested fields containing dots.
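To illustrate the ambiguity, these two documents have different shapes, but both would naturally be addressed by the path message.group1:

{ "message": { "group1": 0 } }

{ "message.group1": 0 }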
I'm looking for the correct way to reference these fields, or, if Elasticsearch cannot do what I want, confirmation that it is not possible. If Elasticsearch can do this, the reindex should go very fast; otherwise I will have to write an external script to pull the documents, transform them, and re-save them to the new index.

OK, after investigating the Elasticsearch code, I think I know why this won't work.
First we look at the Elasticsearch Rename Processor:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RenameProcessor.java#L76-L84
Object value = document.getFieldValue(field, Object.class);
document.removeField(field);
try {
    document.setFieldValue(targetField, value);
} catch (Exception e) {
    // setting the value back to the original field shouldn't fail, as we just fetched it from that field:
    document.setFieldValue(field, value);
    throw e;
}
What this is doing is looking for the field to rename, getting its value, then removing the field and adding a new field with the same value but with the new name.
Now we look at what happens in document.getFieldValue:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L101-L108
public <T> T getFieldValue(String path, Class<T> clazz) {
    FieldPath fieldPath = new FieldPath(path);
    Object context = fieldPath.initialContext;
    for (String pathElement : fieldPath.pathElements) {
        context = resolve(pathElement, path, context);
    }
    return cast(path, context, clazz);
}
Notice it uses a FieldPath object to represent the path to the field in the document.
Now look at how the FieldPath represents the path:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L688
this.pathElements = newPath.split("\\.");
This is splitting the path on any "." character, because that is the delimiter between path elements in field names.
The problem is that the source document has a field named "message.group1", so we need to be able to reference that. Just splitting the path on "." does not account for field names that themselves contain a ".". We would need a syntax more like JavaScript's, where brackets and quotes could make the dot mean something different.
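Concretely, for the simulated document above, the lookup proceeds roughly like this (an illustrative trace, not actual Elasticsearch output):

"message.message.group1".split("\\.")  ->  ["message", "message", "group1"]
resolve "message"  ->  finds the "message" object in _source
resolve "message"  ->  fails: that object only has keys like "message.group1", not "message"
                   ->  IllegalArgumentException: field [message.message.group1] doesn't exist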
If the source documents were all transformed so that a "." in a field name turned that field into an object before saving, then this path scheme would work. But with source documents whose field names contain ".", we cannot reference those fields in certain contexts.
To solve my problem and reindex my index, I wrote a Python script which pulled batches of documents, transformed them, and bulk-inserted them into a new index. This is basically what the Elasticsearch reindex API does, but I did it in Python instead.
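For reference, here is a minimal sketch of that kind of script using the elasticsearch-py helpers (the index names are placeholders, and the transform mirrors the renames attempted above):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

def transform(hit):
    source = hit["_source"]
    message = source.get("message", {})
    # Pop the dotted keys and re-add them under dot-free names,
    # mirroring what the rename processors were meant to do.
    if "message.group1" in message:
        source["message_group1"] = message.pop("message.group1")
    if "message.group2" in message:
        message["message_group2"] = message.pop("message.group2")
    return {
        "_index": "new-index",  # placeholder destination index
        "_type": hit["_type"],
        "_id": hit["_id"],
        "_source": source,
    }

# Stream every document from the old index and bulk-insert the
# transformed copies into the new index.
helpers.bulk(es, (transform(hit) for hit in helpers.scan(es, index="old-index")))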

More than two years later, I came across the same issue. You can manage to have your dotted properties expanded to real nested objects with the dot_expander processor:
Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can't be accessed by any processor.
Issue 37507 on Elasticsearch's Github pointed me in the right direction.
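For example, prepending dot_expander processors to the pipeline from the original question should let the rename processors resolve the fields (a sketch I have not verified against 5.5.1; the path option is needed because the dotted fields live inside the message object):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "expand dotted fields, then rename them",
    "processors": [
      {
        "dot_expander": {
          "path": "message",
          "field": "message.group1"
        }
      },
      {
        "dot_expander": {
          "path": "message",
          "field": "message.group2"
        }
      },
      {
        "rename": {
          "field": "message.message.group1",
          "target_field": "message_group1"
        }
      },
      {
        "rename": {
          "field": "message.message.group2",
          "target_field": "message.message_group2"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": {
          "message.group1": 0,
          "message.group2": "foo"
        },
        "timestamp": 1509533940000
      }
    }
  ]
}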

Related

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. The index name is health_gateway.
I have made some changes in the Python code which generates the data, so this field Data.CPU has now become an integer. But Elasticsearch is still showing it as string. How can I update its data type?
I tried running the below command in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc": {
    "properties": {
      "Data.CPU": { "type": "integer" }
    }
  }
}
But it gave me the below error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status": 400
}
There is also this document which says the data type can be converted using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, as I have created a visualization based on it which would be deleted as well. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a literal dot in the field name there; dotted names in documents get expanded into nested objects, so Data.CPU is most likely a CPU property inside a Data object.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
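For example, a scripted field along these lines would expose the value as a number (assuming the stored values are numeric strings and that a Data.CPU.keyword sub-field exists, as it would under default dynamic mapping):

Integer.parseInt(doc['Data.CPU.keyword'].value)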
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal dot:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
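For example (the target index name and its mapping here are illustrative):

PUT health_gateway_v2
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": { "type": "integer" }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "health_gateway" },
  "dest": { "index": "health_gateway_v2" }
}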
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
Using the below mapping, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
Or you can create a new index with the correct mapping and reindex the data into it using the reindex API.

Is mapping property names with a dot allowed in Elasticsearch Index Management?

For example, consider a JSON file with a key-value pair where the key name has a dot in the middle of it. When this file is uploaded, the dot is treated as a path separator and the name gets split into two nested properties. I tried to use mapper.allow_dots_in_name=true in the settings but it had no effect.
A similar question was posted by someone else, but got no reply: https://discuss.elastic.co/t/disable-expansion-of-field-names-with-dots-in-mapping/84761
I'd appreciate it if anyone could help.
Elasticsearch 2.4 includes a property that allows field names to include a dot, without the field getting converted to an object-style mapping.
This setting can be enabled by
export ES_JAVA_OPTS="-Dmapper.allow_dots_in_name=true"
But from 5.x onwards, it is not possible to have a field with dots in its name without it being converted into an object mapping. If you index a field like abc.foo.bar (with no explicit mapping), it will get converted to:
{
  "mappings": {
    "properties": {
      "abc": {
        "properties": {
          "foo": {
            "properties": {
              "bar": {
                "type": "long"
              }
            }
          }
        }
      }
    }
  }
}
It is best to avoid dots in field names. You can refer to this documentation to learn more.

How to use aggregation value on document update using script in Elasticsearch

I am trying to update a document field, selected by id, using a script. The value of that field should be MAX(field) * 2. For example, consider the following index:
PUT /my-index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "cost": {
        "type": "integer"
      }
    }
  }
}
A document will be created with only the name field value:
POST /my-index/_doc/sp1
{
  "name": "Shirt"
}
Once this document is created, I want to update it with a cost value equal to the maximum cost in the index times two (max(cost) * 2). I tried this logic using the update API as follows:
POST /my-index/_doc/sp1
{
  "script": {
    "source": "ctx._source.cost = Math.max(doc['cost'].value) * 2"
  }
}
But I wasn't able to achieve this and encountered the following error:
"caused_by" : {
  "type" : "illegal_argument_exception",
  "reason" : "static method [java.lang.Math, max/1] not found"
}
How can I achieve this scenario?
It doesn't work that way. The _update API (which you're not using in your example, by the way) only allows you to update a document in its own context. You don't have access to any other document, only to the document itself (via ctx._source or doc) and the script parameters (via params).
There's no way to perform an aggregation on the whole index and update a specific document with the result in a single call. You need to do this in two steps from your client application (first query for the aggregation result, then index that result into the document), or via the transform API, though the latter works in its own way.
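Sketched as two requests (your client computes max * 2 between the calls; the value 84 below is just a placeholder for that computed result):

POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "max_cost": {
      "max": { "field": "cost" }
    }
  }
}

POST /my-index/_update/sp1
{
  "doc": {
    "cost": 84
  }
}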

Error when trying to rename a nested object name in elasticsearch

I'm trying to rename a nested object field, Customer ImpactedNested, using this:
POST r_xair_signals-2020-06/_update/2020-06-15T22:23:00Z_-1344027716
{
  "doc": {
    "Customer ImpactedNested": "CustomerImpactedNested"
  }
}
But I'm getting:
"type": "mapper_parsing_exception",
"reason": "object mapping for [Customer ImpactedNested] tried to parse field [Customer ImpactedNested] as object, but found a concrete value"
I've confirmed that the type of Customer ImpactedNested is nested. I see info online about people getting this error, but not when trying to rename, and I don't see any solutions. I saw one article indicating that it occurs when the new name conflicts with an existing name, so I tried renaming to CustomerImpactedNested11111 as a test (sure to be unique), but got the same result.
Any ideas would be great!
There are actually two problems here:
1. Your query is not renaming the field.
2. Renaming the nested field requires a reindex.
First, here is what is actually happening in the request from the question:
POST r_xair_signals-2020-06/_update/2020-06-15T22:23:00Z_-1344027716
{
  "doc": {
    "Customer ImpactedNested": "CustomerImpactedNested"
  }
}
It updates the value of the field Customer ImpactedNested to the string "CustomerImpactedNested" in the document whose id is 2020-06-15T22:23:00Z_-1344027716.
And since Customer ImpactedNested is a nested object, you are trying to set a string value on an object field. Hence you are getting the error. Refer this.
Coming to your original problem, you need to do the rename via reindex. Refer this, this also:
POST _reindex
{
  "source": {
    "index": "r_xair_signals-2020-06"
  },
  "dest": {
    "index": "<some_new_index_name>"
  },
  "script": {
    "source": """ctx._source['CustomerImpactedNested'] = ctx._source.remove("Customer ImpactedNested")"""
  }
}
Please try the above and let me know about any errors, as I haven't tried this query myself.

Why are fields specified by type rather than index in Elasticsearch?

If multiple types in an Elasticsearch index have fields with the same name, those fields must have the same mapping.
For example, if you try to PUT the following index mapping, which tries to create a "foobar" property as both string and long...
{
  "mappings": {
    "type_one": {
      "properties": {
        "foobar": {
          "type": "string"
        }
      }
    },
    "type_two": {
      "properties": {
        "foobar": {
          "type": "long"
        }
      }
    }
  }
}
...the following error will be returned
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [type_one]: mapper [foobar] cannot be changed from type [long] to [string]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [type_one]: mapper [foobar] cannot be changed from type [long] to [string]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "mapper [foobar] cannot be changed from type [long] to [string]"
    }
  },
  "status": 400
}
The following is from the Elasticsearch site:
Conflicts between fields in different types
Fields in the same index with the same name in two different types
must have the same mapping, as they are backed by the same field
internally. Trying to update a mapping parameter for a field which
exists in more than one type will throw an exception, unless you
specify the update_all_types parameter, in which case it will update
that parameter across all fields with the same name in the same index.
If fields with the same name must have the same mapping for all types in the index, then why are field mappings specified per type? Why not specify the fields for the entire index, and then identify which fields are assigned to each type?
For example something like this:
{
  "fields": {
    "PropA": {
      "type": "string"
    },
    "PropB": {
      "type": "long"
    },
    "PropC": {
      "type": "boolean"
    }
  },
  "types": {
    "foo": [
      "PropA",
      "PropB"
    ],
    "bar": [
      "PropA",
      "PropC"
    ],
    "baz": [
      "PropA",
      "PropB",
      "PropC"
    ]
  }
}
Wouldn't a mapping format like this be more succinct, and a better representation of what is actually allowed?
The reason I ask is because I'm working on creating an index template JSON file with about 80 different fields used across 15 types. Many of the fields are used across several, if not all, of the types. So any time I need to update a field, I have to make sure I update it for every type where it's used.
Looks like I'm not the only one who found this confusing.
Remove support for types? #15613
Sounds like removing support for multiple types per index and specifying fields at the index level is on the roadmap for future versions.
