Specifying Field Types When Indexing from Logstash to Elasticsearch

I have successfully ingested data using the XML filter plugin from Logstash to Elasticsearch, however all the field types are of the type "text."
Is there a way to manually or automatically specify the correct type?

I found the following technique worked well for my use case:
Logstash can filter the data and convert a field from the default text type to whatever type you want. The documentation can be found here. The example given in the documentation is:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
Add this to the filter section of your /etc/logstash/conf.d/02-... file. The downside of this practice, as I understand it, is that altering data on its way into Elasticsearch is generally discouraged.
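As a concrete illustration, here is a minimal sketch of such a 02-... filter file; the xml filter options and the field name some_numeric_field are hypothetical and would need to match your actual document structure:
filter {
  # Parse the raw XML payload into fields (source/target names are hypothetical)
  xml {
    source => "message"
    target => "parsed_xml"
  }
  # Convert a specific field from the default text type to integer
  mutate {
    convert => { "some_numeric_field" => "integer" }
  }
}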
After you do this you will probably run into this problem. If you have that problem and your DB is a test DB whose old data you can erase, just DELETE the index so that there is no conflict (for example, if a field was text until now and is suddenly received as a date, the old and new data would conflict). If you can't simply erase the old data, read the answer in the link above.
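For example, assuming a date-rotated Logstash index (the index name here is hypothetical), deleting the conflicting old index from the Dev Console is just:
DELETE /logstash-2018.04.23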

What you want to do is specify a mapping template.
PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}
Change the settings to match your needs, for example by listing under properties the fields you want to map and the types they should be mapped to.
Setting index_patterns is especially important because it tells Elasticsearch which indices this template applies to. You can set an array of index patterns and use * as a wildcard. For example, Logstash's default is to rotate indices by date, so they will look like logstash-2018.04.23; your pattern could then be logstash-*, and any index matching the pattern will receive the template.
If you want to map fields based on a naming pattern rather than listing them explicitly, you can use dynamic templates.
Edit: adding a little update here. If you want Logstash to apply the template for you, here is a link to the settings you'll want to be aware of; a sketch of the relevant output options follows below.
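For reference, a minimal sketch of the relevant elasticsearch output options in the Logstash pipeline config; the template file path and template name are hypothetical:
output {
  elasticsearch {
    hosts              => ["localhost:9200"]
    index              => "logstash-%{+YYYY.MM.dd}"
    # Let Logstash upload and manage the template itself
    manage_template    => true
    template           => "/etc/logstash/templates/my_template.json"
    template_name      => "my_template"
    template_overwrite => true
  }
}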

Related

Elastic Beats - Changing the Field Type of Default Fields in Beats Documents?

I'm still fairly new to the Elastic Stack and I'm still not seeing the entire picture from what I'm reading on this topic.
Let's say I'm using the latest version of Filebeat or Metricbeat, for example, and pushing that data to a Logstash output (which is then configured to push to ES). I want an "out of the box" field from one of these Beats to have its field type changed (for example, change beat.hostname from its current default text type to keyword). What is the best place/practice for configuring this? This kind of change is something I would want consistent across multiple hosts running the same Beat.
I wouldn't change any existing fields, since Kibana builds a lot of visualizations, dashboards, SIEM, etc. on the expected fields and data types.
Instead extend (add, don't change) the default mapping if needed. On top of the default index template, you can add your own and they will be merged. Adding more fields will require some more disk space (and probably memory when loading), but it should be manageable and avoids a lot of drawbacks of other approaches.
Agreed with @xeraa. It is not advised to change the default template, since that field might be used in the default visualizations.
Create a new template; you can have multiple templates for the same index pattern, and all the mappings will be merged. The order of the merging can be controlled using the order parameter, with lower orders being applied first and higher orders overriding them.
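As a sketch of such an additive template (assuming Elasticsearch 7.x, with a hypothetical index pattern and field name), it only adds a field and leaves the Beat's default template untouched; the two templates merge because they share the index pattern:
PUT _template/my_custom_fields
{
  "index_patterns": ["metricbeat-*"],
  "order": 1,
  "mappings": {
    "properties": {
      "my_custom_label": { "type": "keyword" }
    }
  }
}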
For your case, probably create a multi-field for any field that needs to be changed. E.g., as shown here, create a new keyword multi-field, and then you can refer to the new field as fieldname.raw.
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
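As a usage example (index name is hypothetical), the keyword sub-field can then be used for exact-value operations such as aggregations:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_cities": {
      "terms": { "field": "city.raw" }
    }
  }
}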
The other answers are correct, but here is what I did in the Dev Console to give the message field a keyword sub-field alongside its text mapping:
PUT /index_name/_mapping
{
  "properties": {
    "message": {
      "type": "match_only_text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 10000
        }
      }
    }
  }
}
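One caveat worth noting: adding a sub-field to an existing mapping only affects documents indexed afterwards. To make the new message.keyword sub-field searchable for documents already in the index, they have to be reindexed, for example with an in-place update-by-query (a sketch; run it against your own index name):
POST /index_name/_update_by_query?conflicts=proceed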

ElasticSearch 6, copy_to with dynamic index mappings

Maybe I'm missing something simple, but I still could not figure out the following:
As of ES 6.x the _all field is deprecated, and instead it's suggested to use the copy_to instruction (https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html).
However, I got the impression that you need to explicitly specify the fields you want copied to the custom _all field. But if I use dynamic mappings, I don't know the fields in advance, and therefore cannot use copy_to?
Any way I can tell ES to copy all encountered fields to the custom _all field so that I can search across all fields?
Thanks in advance!
You could use dynamic templates. Basically, create an index, add the custom catch_all field, and then specify that property for all fields that are strings. (I haven't done this before, but I believe this is the only way now. Since the catch_all field will already be present when you put the dynamic template, it should not match the template, meaning catch_all will not copy to itself, but check it yourself to make sure.)
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "copy_to": "catch_all"
            }
          }
        }
      ]
    }
  }
}
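Once documents are indexed with that mapping in place, a search against the catch-all field would look something like this (a sketch; the query text is arbitrary):
GET my_index/_search
{
  "query": {
    "match": { "catch_all": "quick brown fox" }
  }
}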

Unanalyzed fields on Kibana

I need help correcting a Kibana field. When I try to visualize the fields, it shows me the following warning:
Careful! The field contains analyzed strings. Analyzed strings are highly unique and can use a lot of memory to visualize. Values such as foo-bar will be broken into foo and bar. See Core Mapping Types for more information on setting this field as not analyzed.
Elasticsearch's default dynamic mapping is to analyze any string field (break the field into tokens; for instance, aaa_bbb_ccc will be broken down into aaa, bbb and ccc).
If you do not want that behavior, you must change the mapping settings before any document is pushed into the index.
You have two options to do that:
Change the mapping for a particular index using the mapping API, in a static or a dynamic way (dynamic means the mapping will also be applied to fields that do not yet exist in the index); a sketch of this option is shown below.
Change the behavior of any index matching a pattern, using the template API.
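For the first option, a minimal sketch of setting a single string field to not_analyzed through the mapping API (index, type and field names are hypothetical, and this only works for fields that do not yet exist in the index):
PUT my_index/_mapping/my_type
{
  "properties": {
    "status": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}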
This example shows a template that changes the mapping for any index whose name matches the given prefix, applying not_analyzed to every string field in any type and making sure timestamp is mapped as a date (good for cases in which the timestamp is represented as a number of seconds since 1970):
{
  "template": "myindciesprefix*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        {
          "timestamp_field": {
            "match": "timestamp",
            "mapping": {
              "type": "date"
            }
          }
        }
      ]
    }
  }
}
Really, you don't have a problem; it is only an informational message. But if you don't want analyzed fields, you must indicate that a field is not analyzed when you build your index in Elasticsearch.

ElasticSearch Mapping: is it possible to auto-truncate a date to fit its format?

On our project we're using NEST to insert data into ElasticSearch (1.7). We'd like to be able to force ES to truncate all dates towards the mapped format.
Mapping example:
"dateFrom" : {
"type": "date",
"format": "dateHourMinute" // Or yyyy-MM-dd'T'HH:mm
}
Data example:
{
  "dateFrom": "2015-12-21T15:55:00.000Z"
}
Inserting this data throws an IllegalArgumentException:
Invalid format: "2015-12-21T15:55:00.000Z" is malformed at ":00.000Z"
Obviously we don't need the last part of the date. Can't we configure ES to just truncate it instead of erroring out?
Keep in mind we're using 1.7 right now, since date formatting seems to have changed in recent versions...
In order to get the data to index correctly, I could change the date format to date_optional_time (supported in 1.7):
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "date_optional_time"
        }
      }
    }
  }
}
This will allow you to submit dates with the time portion being optional, such as:
PUT /my_index/my_type/1
{
  "date": "2015-12-21"
}
or as you have it
PUT /my_index/my_type/2
{
  "date": "2015-12-21T15:55:00.000Z"
}
Both are now valid submissions. I don't know of any transformation approaches within ES to support a truncation or transformation of field data at time of index. I would think if you want to parse the data and remove the time pre-submission you will need to do that outside of ES when you create the JSON object.
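One alternative worth mentioning: rather than switching entirely to date_optional_time, the mapping can list several accepted formats separated by ||, so the original strict format keeps working alongside the full timestamps. A sketch, using the field from the question:
"dateFrom": {
  "type": "date",
  "format": "dateHourMinute||date_optional_time"
}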
It appears ES is currently not capable of editing dates through a custom mapping. We ended up using JsonConverters (like this) to drop seconds and millis before inserting them into ES.

Elasticsearch change field date format

When I created my index and type a while ago, I specified the date format of a field in the mapping as:
{"type": "date","format" : "dd/MM/yyyy HH:mm:ss"}
Is there a way to change the format of the field, given that I now have more than 6000 docs indexed in my index? I want the format to be:
{"type": "date","format" : "dd-MM-yyyy HH:mm:ss"}
You can update format mapping on an existing date field with the PUT mapping API:
PUT twitter/_mapping/user
{
  "properties": {
    "myDate": {
      "type": "date",
      "format": "dd-MM-yyyy HH:mm:ss"
    }
  }
}
format is one of the few mappings that can be updated on existing fields without losing data
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/mapping-date-format.html
You cannot change field mappings after you have indexed documents into Elasticsearch. You can add new fields but you cannot change existing fields.
You could create a new index with the new mappings and then re-index all the data into it. You could then delete the old index and create a new index alias with the old name pointing to the new index.
There are a few strategies documented for minimizing downtime when changing mappings in the Elasticsearch blog: https://www.elastic.co/blog/changing-mapping-with-zero-downtime
Overall I'd highly suggest using index aliases - they provide a high level of abstraction and flexibility over using index names directly within your application. Perfect for situations like this where you want to make a change to the underlying index: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html
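To make that concrete, here is a sketch of the reindex-and-alias approach (assuming a version with the _reindex API, 2.3 or later, and 7.x-style typeless mappings; all index and alias names are hypothetical). The new mapping lists both date formats so that existing documents, whose stored values still use slashes, can be copied across without parse errors:
PUT my_index_v2
{
  "mappings": {
    "properties": {
      "myDate": { "type": "date", "format": "dd-MM-yyyy HH:mm:ss||dd/MM/yyyy HH:mm:ss" }
    }
  }
}

POST _reindex
{
  "source": { "index": "my_index_v1" },
  "dest": { "index": "my_index_v2" }
}

POST _aliases
{
  "actions": [
    { "remove": { "index": "my_index_v1", "alias": "my_index" } },
    { "add": { "index": "my_index_v2", "alias": "my_index" } }
  ]
}
Because the _aliases call applies both actions atomically, the application can keep querying my_index without interruption while the switch happens. The stored date strings themselves are copied as-is; rewriting them to the new format would require a script during the reindex.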
For Elasticsearch versions below 7.0, where mapping types are not yet deprecated, you can use something like this:
PUT inf/_mapping/_doc
{
  "properties": {
    "ChargeDate": {
      "type": "date",
      "format": "dd-MM-yyyy HH:mm:ss"
    }
  }
}
where inf is your index and _doc is the mapping type (which is deprecated in versions 7.0 and later),
or
PUT inf
{
  "mappings": {
    "_doc": {
      "properties": {
        "ChargeDate": {
          "type": "date",
          "format": "dd-MM-yyyy HH:mm:ss"
        }
      }
    }
  }
}
