Dynamic Template not working for short, byte & float - elasticsearch

I am trying to create an index template that uses dynamic mapping.
Here is what I wrote. Since in 6.2.1 only boolean, date, double, long, object and string are detected automatically, I am having trouble mapping float, short and byte.
With this template, if I index 127 it is mapped to short by the short_fields rule, which is fine. But when I index something like 325566, I get the exception "Numeric value (325566) out of range of Java short". I want to suppress this and have long_fields take care of it, so that the field is mapped to long. I have tried coerce: false and ignore_malformed: true, but neither worked as expected.
"dynamic_templates": [
{
"short_fields": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "short",
"doc_values": true
}
}
},
{
"long_fields": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "long",
"doc_values": true
}
}
},
{
"byte_fields": {
"match": "*",
"match_mapping_type": "byte",
"mapping": {
"type": "byte",
"doc_values": true
}
}
}
]

Unfortunately, it is not possible to make Elasticsearch choose the smallest data type possible for you. There are plenty of workarounds, but let me first explain why it does not work.
Why doesn't it work?
Dynamic mapping templates allow you to override the default dynamic type matching in three ways:
by matching the name of the field,
by matching the type Elasticsearch has guessed for you,
and by a path in the document.
Elasticsearch picks the first matching rule that works. In your case, the first rule, short_fields, always works for any integer, because it accepts any field name and a guessed type long.
That's why it works for 127 but doesn't work for 325566.
To illustrate this point better, let's change "match_mapping_type" in the first rule like this:
"match_mapping_type": "short",
Elasticsearch does not accept it and returns an error:
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [doc]: No field type matched on [short], \
possible values are [object, string, long, double, boolean, date, binary]"
}
But how can we make Elasticsearch pick the right types?
Here are some of the options.
Define strict mapping manually
This gives you full control over the selection of types.
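A minimal sketch of such an explicit mapping for Elasticsearch 6.x, assuming a single mapping type doc and made-up field names:
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "customerAge": { "type": "byte" },
        "itemCount":   { "type": "short" },
        "revenue":     { "type": "long" },
        "price":       { "type": "float" }
      }
    }
  }
}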
Use the default long
Postpone "shrinking" data until it starts being a performance problem.
In fact, using smaller data types will only affect searching/indexing performance, not the storage required. As long as you are fine with dynamic mappings, Elasticsearch manages them for you pretty well.
Mark field names with type information
Since Elasticsearch is not able to tell a byte from a long, you can determine the type beforehand and add type information to the field name, like customerAge_byte or revenue_long.
Then you will be able to use a prefix/suffix match like this:
{
  "bytes_as_longs": {
    "match_mapping_type": "long",
    "match": "*_byte",
    "mapping": {
      "type": "byte"
    }
  }
}
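For example, a rough sketch of such a rule in use (the index name and field name are made up for illustration):
PUT my_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "bytes_as_longs": {
            "match_mapping_type": "long",
            "match": "*_byte",
            "mapping": {
              "type": "byte"
            }
          }
        }
      ]
    }
  }
}

PUT my_index/doc/1
{
  "customerAge_byte": 42
}
The value 42 is guessed as long, matches the *_byte pattern, and is therefore mapped as byte.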
Please choose the approach that fits your needs best.
Why Elasticsearch takes longs
The reason why Elasticsearch picks long for any integer input probably comes from the JSON definition of a number type (as shown at json.org).
From a single document it is impossible to tell whether a number like 0 or 1 will stay small across the entire dataset. Elasticsearch has to guess the correct type from the first value it sees, so it takes the safest shot possible.
Hope that helps!

Related

ElasticSearch: when to use multi-field

We have an index with a keyword field that is very often an ip address, but not always. We'd like to be able to search this index on that field using not just keywords but also CIDR notation, which is supported only for fields of type 'ip'. On the surface, this looks like a use case for multi-fields.
From https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html:
It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields
So it seems like the following mapping would make sense for us:
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "keyword",
        "fields": {
          "ip": {
            "type": "ip",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}
So, when our application has a mixed set of non-ip addresses, ip addresses, and CIDR-notation blocks/ranges and needs to query by them, I assume it would split that set into one set of non-ip addresses and another of ip addresses/CIDR blocks, and build two separate terms filters from them in my query, like so:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "my_field.ip": [
              "123.123.123.0/24",
              "192.168.0.1",
              "192.168.16.255",
              "192.169.1.0/24"
            ]
          }
        },
        {
          "terms": {
            "my_field": [
              "someDomain.com",
              "notAnIp.net"
            ]
          }
        }
      ]
    }
  }
}
Is this a proper use of multi-fields? Should we be achieving this some other way? It's unlike the examples given for using multi-fields in that it's really a subset of the values for the field, not all, because I'm using ignore_malformed to discard the non-ip addresses from the sub-field. If there's a better way, what is it?
Yes, your understanding of multi-fields is correct. You just need to define the sub-field explicitly (its data type and, where relevant, its analyzer) in the mapping so that Elasticsearch indexes it with the definition you intend.
Once the data is indexed in the format you want, you can include or exclude the sub-fields based on your use case.
Multi-fields with multiple analyzers, which are very commonly used to implement multilingual search, are another good example you can refer to; see the sketch below.
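A minimal sketch of such a multi-analyzer multi-field, assuming made-up index and field names and the built-in english and french analyzers:
PUT multilingual_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "english"
          },
          "french": {
            "type": "text",
            "analyzer": "french"
          }
        }
      }
    }
  }
}
Queries can then target title, title.english, or title.french depending on the language of the search terms.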

Elastic Beats - Changing the Field Type of Default Fields in Beats Documents?

I'm still fairly new to the Elastic Stack and I'm still not seeing the entire picture from what I'm reading on this topic.
Let's say I'm using the latest versions of Filebeat or Metricbeat, pushing that data to a Logstash output (which is then configured to push to ES). I want an "out of the box" field from one of these Beats to have its field type changed (for example, change beat.hostname from its current default "text" type to "keyword"). What is the best place/practice for configuring this? This kind of change is something I would want consistent across multiple hosts running the same Beat.
I wouldn't change any existing fields, since Kibana builds a lot of visualizations, dashboards, SIEM,... on the expected fields and data types.
Instead extend (add, don't change) the default mapping if needed. On top of the default index template, you can add your own and they will be merged. Adding more fields will require some more disk space (and probably memory when loading), but it should be manageable and avoids a lot of drawbacks of other approaches.
Agreed with @xeraa. It is not advised to change the default template, since that field might be used in the default visualizations.
Create a new template instead; you can have multiple templates for the same index pattern, and all the mappings will be merged. The order of the merging can be controlled using the order parameter, with lower orders being applied first and higher orders overriding them.
For your case, probably create a multi-field for any field that needs to be changed. E.g., as shown here, create a new keyword multi-field; then you can refer to the new field as fieldname.raw.
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
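A rough sketch of an additional legacy template layered on top of the default Filebeat template (the template name, index pattern, and order value here are assumptions for illustration):
PUT _template/filebeat-custom
{
  "index_patterns": ["filebeat-*"],
  "order": 1,
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Assuming the default Beats template uses a lower order value, this mapping is merged on top of it for any new filebeat-* index.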
The other answers are correct, but I did the below in the Dev Console to update the message field from text to text & keyword:
PUT /index_name/_mapping
{
  "properties": {
    "message": {
      "type": "match_only_text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 10000
        }
      }
    }
  }
}
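Note that updating the mapping only affects documents indexed afterwards. To make the new message.keyword sub-field available on documents that are already in the index, you can reindex them in place, for example:
POST /index_name/_update_by_query?conflicts=proceed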

can we mandate elastic search to treat all numeric field as double

I am using dynamic mapping while indexing my data. For example,
{ "a" : 10 }
will create the mapping for the field as long. The second time, the data may be a double, e.g. { "a" : 10.10 }, but since the mapping is already defined as long, the data will be indexed as long. The only way to fix this is to define the mapping in advance, which I don't want to do for various reasons.
So my question: is there a way to mandate Elasticsearch to treat all numeric fields as double?
You can use a dynamic mapping template: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
If the field is detected as long, map it to double:
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "double"
            }
          }
        }
      ]
    }
  }
}
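With this template in place, the first integer value already creates a double mapping, so later floating-point values are indexed without loss. A quick sketch (the document IDs are arbitrary):
PUT my_index/my_type/1
{
  "a": 10
}

PUT my_index/my_type/2
{
  "a": 10.10
}

GET my_index/_mapping
The mapping returned for the field a is double.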

Unanalyzed fields on Kibana

I need help correcting a Kibana field. When I try to visualize the field, it shows me the following warning:
Careful! The field selected contains analyzed strings. Analyzed
strings are highly unique and can use a lot of memory to visualize.
Values such as foo-bar will be broken into foo and bar. See Core
Mapping Types for more information on setting this field as not
analyzed.
Elasticsearch's default dynamic mapping is to analyze any string field (break the field into tokens; for instance, aaa_bbb_ccc will be broken down into the tokens aaa, bbb and ccc).
If you do not want this behavior, you must change the mapping settings before any document is pushed into the index.
You have two options to do that:
Change the mapping for a particular index using the mapping API, either statically or dynamically (dynamic means the mapping will also be applied to fields that do not yet exist in the index)
Change the behavior of any index matching a name pattern, using the template API
This example shows a template that changes the mapping for any index whose name matches the pattern "myindciesprefix*", applying not_analyzed to every string field in any type and making sure "timestamp" is mapped as a date (good for cases in which the timestamp is represented as a number of seconds since 1970):
{
  "template": "myindciesprefix*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        {
          "timestamp_field": {
            "match": "timestamp",
            "mapping": {
              "type": "date"
            }
          }
        }
      ]
    }
  }
}
Really, you don't have a problem; this is only an informational message. But if you don't want analyzed fields, then when you build your index in Elasticsearch you must indicate that the field is not analyzed, as in the sketch below.
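A minimal sketch of marking a single field as not_analyzed at index creation time (the index, type, and field names are made up; this uses the pre-5.x string type, as in the template above):
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}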

ElasticSearch Bool Query or Finding Exact Values

I am a bit confused about Bool Query vs. Finding Exact Values in Elasticsearch. Specifically, I have a title_field and a post_field that I want to search on. All of my other fields (like url or username) I only use to look up whether a value exists, or how many times, so they must match exactly.
So I can see from the docs that I can do a multimatch query on the title_field and post_field.
But what about the other fields that I want an exact response from? Do I do a boolean query (using must)? Or do I need to remap all of those fields as not_analyzed? Or do I need to map them as not_analyzed first and then do a boolean query?
Indeed, you should map the fields you want to do exact matches on as not_analyzed, which means they are treated as a single token instead of broken into several tokens.
Then you should use a term query or filter to exactly match against the token. If you are using a filter, you can use and, or, and not filters as well (more convenient than bool).
Since mapping all fields is a bit tedious, you could instead use a dynamic template to map all string fields as not_analyzed and then simply add a mapping for those fields you do want analyzed:
"dynamic_templates": [
{
"non_analyzed_string": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
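Once the fields are mapped as not_analyzed, an exact-match lookup with a term query might look like this (the index and field names are made up for illustration):
POST my_index/_search
{
  "query": {
    "term": {
      "username": "john_doe"
    }
  }
}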

Resources