Why are fields specified by type rather than index in Elasticsearch?

If multiple types in an Elasticsearch index have fields with the same name, those fields must have the same mapping.
For example, if you try to PUT the following index mapping, which tries to create a "foobar" property as both string and long:
{
"mappings": {
"type_one": {
"properties": {
"foobar": {
"type": "string"
}
}
},
"type_two": {
"properties": {
"foobar": {
"type": "long"
}
}
}
}
}
...the following error will be returned
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [type_one]: mapper [foobar] cannot be changed from type [long] to [string]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [type_one]: mapper [foobar] cannot be changed from type [long] to [string]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "mapper [foobar] cannot be changed from type [long] to [string]"
}
},
"status": 400
}
The following is from the elasticsearch site:
Conflicts between fields in different types
Fields in the same index with the same name in two different types
must have the same mapping, as they are backed by the same field
internally. Trying to update a mapping parameter for a field which
exists in more than one type will throw an exception, unless you
specify the update_all_types parameter, in which case it will update
that parameter across all fields with the same name in the same index.
If fields with the same name must have the same mapping for all types in the index, then why are field mappings specified per type? Why not specify the fields for the entire index, and then identify which fields are assigned to each type?
For example something like this:
{
"fields":{
"PropA":{
"type":"string"
},
"PropB":{
"type":"long"
},
"PropC":{
"type":"boolean"
}
},
"types":{
"foo":[
"PropA",
"PropB"
],
"bar":[
"PropA",
"PropC"
],
"baz":[
"PropA",
"PropB",
"PropC"
]
}
}
Wouldn't a mapping format like this be more succinct, and a better representation of what is actually allowed?
The reason I ask is because I'm working on creating an index template JSON file with about 80 different fields used across 15 types. Many of the fields are used across multiple if not all the types. So anytime I need to update a field I have to make sure I update it for every type where it's used.
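Until the mapping format changes, one way to avoid repeating field definitions is to generate the per-type mappings from a single list of fields. The following is only a minimal sketch of that idea: the field names come from the example above, while the type names and the `build_mappings` helper are hypothetical and not part of Elasticsearch.

```python
import json

# Field definitions, declared once for the whole index (names from the example above).
FIELDS = {
    "PropA": {"type": "string"},
    "PropB": {"type": "long"},
    "PropC": {"type": "boolean"},
}

# Hypothetical type names and the fields each type uses.
TYPES = {
    "foo": ["PropA", "PropB"],
    "bar": ["PropA", "PropC"],
}


def build_mappings(fields, types):
    """Expand index-level field definitions into per-type mappings."""
    mappings = {}
    for type_name, field_names in types.items():
        unknown = [name for name in field_names if name not in fields]
        if unknown:
            raise KeyError(f"{type_name} uses undefined fields: {unknown}")
        # Copy each definition so the generated types do not share dict objects.
        mappings[type_name] = {
            "properties": {name: dict(fields[name]) for name in field_names}
        }
    return {"mappings": mappings}


template = build_mappings(FIELDS, TYPES)
print(json.dumps(template, indent=2))
```

Because every type pulls its definition from the same dictionary, a field that appears in several types cannot end up with conflicting mappings, and a change to a field only has to be made in one place.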

Looks like I'm not the only one that found this confusing.
Remove support for types? #15613
Sounds like removing support for multiple types per index and specifying fields at the index level is on the roadmap for future versions.

Related

How to update data type of a field in elasticsearch

I am publishing a data to elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. Index name is health_gateway
I have made some changes in the Python code that generates the data, so the Data.CPU field is now an integer. But Elasticsearch is still showing it as a string. How can I update its data type?
I tried running below commands in kibana dev tools:
PUT health_gateway/doc/_mapping
{
"doc" : {
"properties" : {
"Data.CPU" : {"type" : "integer"}
}
}
}
But it gave me below error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
}
],
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
},
"status" : 400
}
There is also this document, which says the data type can be converted using mutate, but I was not able to understand it properly.
I do not want to delete the index and recreate it, because I have created a visualization based on this index, and it would be deleted along with the index. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
Add a new multi-field
You could add a new multifield. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "keyword",
"fields": {
"int": {
"type": "short"
}
}
}
}
}
}
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can extend the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
With the mapping below, Data.CPU.raw will be of integer type:
{
"mappings": {
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "text",
"fields": {
"raw": {
"type": "integer"
}
}
}
}
}
}
}
}
OR you can create a new index with correct index mapping, and reindex the data in it using the reindex API
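For the reindex route, the request itself is small. The sketch below only builds the body for a `POST _reindex` call. It assumes the target index, here given the hypothetical name `health_gateway_v2`, has already been created with the corrected mapping. The `build_reindex_body` helper is illustrative, not part of any client library.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for a POST _reindex request."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# "health_gateway_v2" is a hypothetical name for the new, correctly mapped index.
body = build_reindex_body("health_gateway", "health_gateway_v2")
print("POST _reindex")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into Kibana Dev Tools once the target index exists.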

How to rename a nested field containing dots with elasticsearch rename processor and ingest pipeline

I have a field in elasticsearch (5.5.1) which I need to rename because the name contains a '.' and it is causing various problems. The field I want to rename is nested inside another field.
I am trying to use a Rename Processor in an Ingest Pipeline to do a Reindex as described here: https://stackoverflow.com/a/43142634/5114
Here is my pipeline simulation request (you can copy this verbatim into the Dev Tools utility in Kibana to test it):
POST _ingest/pipeline/_simulate
{
"pipeline" : {
"description": "rename nested fields to remove dot",
"processors": [
{
"rename" : {
"field" : "message.message.group1",
"target_field" : "message_group1"
}
},
{
"rename" : {
"field" : "message.message.group2",
"target_field" : "message.message_group2"
}
}
]
},
"docs":[
{
"_type": "status",
"_id": "1509533940000-m1-bfd7183bf036bd346a0bcf2540c05a70fbc4d69e",
"_version": 5,
"_score": null,
"_source": {
"message": {
"_job-id": "AV8wHJEaa4J0sFOfcZI5",
"message.group1": 0,
"message.group2": "foo"
},
"timestamp": 1509533940000
}
}
]
}
The problem is that I get an error when trying to use my pipeline:
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"header": {
"processor_type": "rename"
}
}
],
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "field [message.message.group1] doesn't exist"
}
},
"header": {
"processor_type": "rename"
}
}
}
]
}
I think the problem is caused by the field "message.group1" being inside another field ("message"). I'm not sure how to refer to the field I want in the context of the processor. It seems that there could be ambiguity between cases of nested fields, fields containing dots and nested fields containing dots.
I'm looking for the correct way to reference these fields, or if Elasticsearch can not do what I want, confirmation that this is not possible. If Elasticsearch can do this, then it will probably go very fast, else I have to write an external script to pull the documents, transform them, and re-save them to the new index.
Ok, investigating in the Elasticsearch code, I think I know why this won't work.
First we look at the Elasticsearch Rename Processor:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RenameProcessor.java#L76-L84
Object value = document.getFieldValue(field, Object.class);
document.removeField(field);
try {
document.setFieldValue(targetField, value);
} catch (Exception e) {
// setting the value back to the original field shouldn't as we just fetched it from that field:
document.setFieldValue(field, value);
throw e;
}
What this is doing is looking for the field to rename, getting its value, then removing the field and adding a new field with the same value but with the new name.
Now we look at what happens in document.getFieldValue:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L101-L108
public <T> T getFieldValue(String path, Class<T> clazz) {
FieldPath fieldPath = new FieldPath(path);
Object context = fieldPath.initialContext;
for (String pathElement : fieldPath.pathElements) {
context = resolve(pathElement, path, context);
}
return cast(path, context, clazz);
}
Notice it uses a FieldPath object to represent the path to the field in the document.
Now look at how the FieldPath represents the path:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L688
this.pathElements = newPath.split("\\.");
This is splitting the path on any "." character, because that is the delimiter between path elements in field names.
The problem is that the source document has a field named "message.group1", so we need to be able to reference that. Just splitting the path on "." does not account for field names containing a "." in the name. We would need a syntax more like javascript for that, where we could use brackets and quotes to make the dot mean something different.
If the source documents were all transformed so that a "." in the field name would turn that field into an object before saving, then this path scheme would work. But with source documents having field names containing "." we can not reference them in certain contexts.
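The effect of that path scheme can be reproduced outside Elasticsearch. The snippet below is a simplified Python imitation of the split-and-resolve logic quoted above, not the actual Java implementation; `get_field_value` here is a stand-in written for illustration.

```python
def get_field_value(document, path):
    """Resolve a path by splitting on "." and walking one element at a time."""
    context = document
    for element in path.split("."):
        if not isinstance(context, dict) or element not in context:
            raise KeyError(f"field [{path}] doesn't exist")
        context = context[element]
    return context


# A key that literally contains a dot, as in the source documents.
flat = {"message": {"message.group1": 0}}
# The same data stored as real nested objects.
nested = {"message": {"message": {"group1": 0}}}

try:
    get_field_value(flat, "message.message.group1")
    flat_found = True
except KeyError:
    flat_found = False

nested_value = get_field_value(nested, "message.message.group1")
print(flat_found, nested_value)
```

The lookup walks `message`, then `message`, then `group1`, so it can only succeed on the nested form. No path string can name the literal key `message.group1`.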
To solve my problem and reindex my index, I wrote a python script which pulled a batch of documents, transformed them and bulk inserted them in a new index. This is basically what the Elasticsearch reindex api does, but I did it in python instead.
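The original script is not shown, so the following is only a sketch of what the transform step of such a script might look like. It assumes the goal is to replace every "." in a key with "_", and the helper name `rename_dotted_keys` is made up for this example. Fetching the batches and bulk-inserting them is left out.

```python
def rename_dotted_keys(value, replacement="_"):
    """Return a copy of value with every "." in dict keys replaced."""
    if isinstance(value, dict):
        return {
            key.replace(".", replacement): rename_dotted_keys(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [rename_dotted_keys(item, replacement) for item in value]
    return value


# The _source of the sample document from the question.
source = {
    "message": {
        "_job-id": "AV8wHJEaa4J0sFOfcZI5",
        "message.group1": 0,
        "message.group2": "foo",
    },
    "timestamp": 1509533940000,
}

transformed = rename_dotted_keys(source)
print(transformed)
```

Note that this simple version would silently merge two keys that differ only in "." versus "_", so a real script may want to check for collisions first.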
More than two years later, I came across the same issue. You can have your dotted properties expanded into real nested objects with the dot_expander processor. From the documentation:
Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor.
Issue 37507 on Elasticsearch's Github pointed me in the right direction.
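To make the expansion concrete, here is a rough Python imitation of what such an expansion does to the sample document. It is a sketch, not the processor's real code. The `expand_dots` helper is hypothetical, and its `field` and `path` arguments are modelled on the processor's options of the same names as I understand them, so check the processor documentation before relying on them.

```python
def expand_dots(document, field, path=None):
    """Expand a key containing dots into nested dicts, modifying document in place."""
    parent = document
    if path:
        # Walk to the object that holds the dotted key.
        for element in path.split("."):
            parent = parent[element]
    value = parent.pop(field)
    node = parent
    *prefix, leaf = field.split(".")
    for element in prefix:
        # Reuse an existing object if an earlier expansion already created it.
        node = node.setdefault(element, {})
    node[leaf] = value
    return document


doc = {"message": {"message.group1": 0, "message.group2": "foo"}}
expand_dots(doc, "message.group1", path="message")
expand_dots(doc, "message.group2", path="message")
print(doc)
```

After the expansion the values live at the real nested path message.message.group1, which a path split on "." can resolve, so a rename processor placed later in the pipeline can reach them.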

Is it possible to update an existing field in an index through mapping in Elasticsearch?

I've already created an index, and it contains data from my MySQL database. I've got a few fields which are strings in my table, but I need them as different types (integer and double) in Elasticsearch.
So I'm aware that I could do it through mapping as follows:
{
"mappings": {
"my_type": {
"properties": {
"userid": {
"type": "text",
"fielddata": true
},
"responsecode": {
"type": "integer"
},
"chargeamount": {
"type": "double"
}
}
}
}
}
But I've only tried this when creating a new index. What I want to know is how I can update an existing field (i.e. chargeamount in this scenario) using a PUT mapping request.
Is this possible? Any help would be appreciated.
Once a mapping type has been created, you're very constrained on what you can update. According to the official documentation, the only changes you can make to an existing mapping after it's been created are the following, but changing a field's type is not one of them:
In general, the mapping for existing fields cannot be updated. There
are some exceptions to this rule. For instance:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.

Elasticsearch 2.3 put mapping (Attempting to override date field type) error

I have some birth_dates that I want to store as a string. I don't plan on doing any querying or analysis on the data, I just want to store it.
The input data I have been given is in lots of different random formats and some even include strings like (approximate). Elastic has determined that this should be a date field with a date format which means when elastic receives a date like 1981 (approx) it freaks out and says the input is in an invalid format.
Instead of changing input dates I want to change the date type to string.
I have looked at the documentation and have been trying to update the mapping with the PUT mapping API, but elastic keeps returning a parsing error.
based on the documentation here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
I have tried:
PUT /sanctions_lists/eu_financial_sanctions/_mapping
{
"mappings":{
"eu_financial_sanctions":{
"properties": {
"birth_date": {
"type": "string", "index":"not_analyzed"
}
}
}
}
}
but returns:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [mappings : {eu_financial_sanctions={properties={birth_date={type=string, index=not_analyzed}}}}]"
}
],
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [mappings : {eu_financial_sanctions={properties={birth_date={type=string, index=not_analyzed}}}}]"
},
"status": 400
}
Question Summary
Is it possible to override elasticsearch's automatically determined date field, forcing string as the field type?
NOTE
I'm using the google chrome sense plugin to send the requests
Elastic search version is 2.3
Just remove the type reference and _mapping from the URL; you already have them inside the request body.
PUT /sanctions_lists
{
"mappings":{
"eu_financial_sanctions":{
"properties": {
"birth_date": {
"type": "string", "index":"not_analyzed"
}
}
}
}
}

Unanalyzed fields on Kibana

I need help correcting a Kibana field. When I try to visualize the fields, the following warning is shown:
Careful! The field selected contains analyzed strings. Analyzed
strings are highly unique and can use a lot of memory to visualize.
Values such as foo-bar will be broken into foo and bar. See Mapping
Core Types for more information on setting this field as
not_analyzed
By default, Elasticsearch's dynamic mapping analyzes every string field, breaking it into tokens; for instance, aaa-bbb-ccc is broken down into aaa, bbb and ccc.
If you do not want this behavior, you must change the mapping settings before any document is pushed into the index.
You have two options to do that:
Change the mapping for a particular index using the mapping API, in a static or a dynamic way (dynamic means that the mapping will also be applied to fields that do not yet exist in the index).
Change the behavior of any index whose name matches a pattern, using the template API.
This example shows a template that changes the mapping for any index whose name starts with "myindciesprefix", applying "not_analyzed" to every string field in every type and making sure "timestamp" is a date (useful in cases in which the timestamp is represented as a number of seconds since 1970):
{
"template": "myindciesprefix*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
},
{
"timestamp_field": {
"match": "timestamp",
"mapping": {
"type": "date"
}
}
}
]
}
}
}
You don't really have a problem; this is only an informational message. But if you don't want analyzed fields, you must declare the field as not_analyzed in the mapping when you build your index in Elasticsearch.

Resources