Hello Elasticsearch Gurus out there.
Given the following index and doctype:
localhost:9200/myindex/mydoctype
I currently have this index definition:
{
  "myindex": {
    "aliases": {},
    "mappings": {
      "mydoctype": {
        "properties": {
          "theNumber": {
            "type": "integer"
          },
          "theString": {
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1487158714808",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "version": {
          "created": "1070599"
        },
        "uuid": "cm2OtivhTO-RjuZPeHvL1w"
      }
    },
    "warmers": {}
  }
}
And I was able to add this document:
{
  "theNumber" : 0,
  "theString" : "zero"
}
But what I wasn't expecting is that I am also able to add this document:
{
  "theNumber" : 3.1418,
  "theString" : 3,
  "fiefoe" : "fiefoe"
}
... where the field types don't match.
There is also a new field/column being introduced.
I wasn't expecting this kind of behaviour because of the Mappings I have defined for my index.
Does this have something to do with Elasticsearch being schema-less?
Is it possible to configure Elasticsearch to accept only the mapped types and fields for every document added to this index?
Is this how Elasticsearch mappings work in the first place? (Maybe I just didn't know hehehe)
Thanks =)
Elasticsearch uses dynamic mapping, so when it finds a field that doesn't exist in the mapping, it tries to index it by guessing its type.
What you can do is disable this behavior by setting dynamic: false on the root object of the mapping. In that case Elasticsearch will simply ignore any unmapped field.
{
  "myindex": {
    "aliases": {},
    "mappings": {
      "mydoctype": {
        "dynamic": false,        <-----
        "properties": {
          "theNumber": {
            "type": "integer"
          },
          "theString": {
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1487158714808",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "version": {
          "created": "1070599"
        },
        "uuid": "cm2OtivhTO-RjuZPeHvL1w"
      }
    },
    "warmers": {}
  }
}
Alternatively, you can use dynamic: strict if you want an exception to be thrown when an unmapped field is about to be indexed.
The documentation for this is here: https://www.elastic.co/guide/en/elasticsearch/guide/1.x/dynamic-mapping.html
Kindly allow me to answer my own question...
This setting worked for me in this case:
API URL Request: PUT localhost:9200/myindex/_mapping/mydoctype
HTTP Body:
{
  "mydoctype" : {
    "dynamic": "strict",
    "properties" : {
      "theNumber" : {"type" : "integer"},
      "theString" : {"type" : "string"},
      "stash": {
        "type": "object",
        "dynamic": false
      }
    }
  }
}
Then I tried adding this object:
{
  "theNumber" : 5.55555,
  "theString" : 5,
  "fiefoe" : "fiefoe"
}
I got this response:
{
  "error": "StrictDynamicMappingException[mapping set to strict, dynamic introduction of [fiefoe] within [mydoctype] is not allowed]",
  "status": 400
}
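As an aside, the stash sub-object in the mapping above is itself set to dynamic: false, so unmapped fields should still be accepted under it, just silently ignored (kept in _source, but not indexed or searchable). A hypothetical document like this one would presumably go through without an error:
{
  "theNumber" : 7,
  "theString" : "seven",
  "stash" : { "some_unmapped_field" : "anything goes" }
}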
Thanks =)!
P.S.
Reference:
https://www.elastic.co/guide/en/elasticsearch/guide/1.x/dynamic-mapping.html
I have an object in Elasticsearch which may contain different fields. In my app this object is an Enum, so it can't actually contain more than one field at the same time. But when I do an update in Elasticsearch, it appends the fields instead of overwriting the whole object.
For example - the document may be public or accessible only to a group of users:
PUT _index_template/test_template
{
  "index_patterns": [
    "test*"
  ],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "id": {
          "type": "keyword"
        },
        "users": {
          "type": "object",
          "properties": {
            "permitted": {
              "type": "keyword"
            },
            "public": {
              "type": "boolean"
            }
          }
        }
      }
    },
    "aliases": {
      "test-alias": {}
    }
  }
}
POST test_doc/_doc/1
{
  "id": "1",
  "users": {
    "permitted": [
      "1", "2"
    ]
  }
}
POST _bulk
{"update":{"_index":"test_doc","_type":"_doc","_id":1}}
{"doc":{"id":"1","users":{"public": true}},"doc_as_upsert":true}
GET test-alias/_search
I am expecting this result:
{
  "id": "1",
  "users": {
    "public": true
  }
}
But the actual result is:
{
  "id": "1",
  "users": {
    "permitted": [
      "1",
      "2"
    ],
    "public": true
  }
}
At the same time it overwrites fields with the same name perfectly (I can change the permitted array or the public field to false). How do I stop object fields from being merged like this?
You need to change the action in the bulk request from update to index; the correct request would be:
{"index":{"_index":"71908768","_id":1}}
{"doc":{"id":"1","users":{"public": true}}}
Refer to the actions and what they do in detail in the official Elasticsearch bulk documentation. In short, update partially updates (merges into) the existing document, while the index action indexes the specified document as-is; if the document exists, it replaces the whole document and increments the version.
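The same distinction holds outside of _bulk; a minimal sketch against the index and id from the question:
POST test_doc/_doc/1
{
  "id": "1",
  "users": { "public": true }
}
replaces document 1 wholesale, so users.permitted is gone afterwards, whereas
POST test_doc/_update/1
{
  "doc": { "users": { "public": true } }
}
merges the partial document into the existing _source, which is exactly the appending behavior you are seeing.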
According to Elasticsearch's roadmap, mapping types are going to be completely removed in 7.x.
How are we going to give a schema structure to Documents without mapping?
For example, how would we replace this (a doc/mapping type with 3 fields of specific data types):
PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    }
  }
}
They are going to remove types (user in your example) from the mapping, because there is only 1 type per index now; the rest will stay the same:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    }
  }
}
As you can see, there is no user type anymore.
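For what it's worth, once types are removed entirely in 7.x (where include_type_name defaults to false), even the _doc level disappears from the request body, along these lines:
PUT twitter
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "user_name": { "type": "keyword" },
      "email": { "type": "keyword" }
    }
  }
}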
I'm running a small ELK 5.4.0 stack server on a single node. When I started, I just took all the defaults, which meant 5 shards for each index. I didn't want the overhead of all those shards, so I created an index template like so:
PUT /_template/logstash
{
  "template": "logstash*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
This worked fine, but I just realized that all my raw fields are now missing in ES. For example, "uri" is one of my indexed fields, and I used to get "uri.raw" as an unanalyzed version of it. But since I updated the template, those raw fields are gone. Looking at the current template shows:
GET /_template/logstash
Returns:
{
  "logstash": {
    "order": 0,
    "template": "logstash*",
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  }
}
It seems that the mappings have gone missing. I can pull the mappings off an earlier index
GET /logstash-2017.03.01
and compare it with a recent one
GET /logstash-2017.08.01
Here I see that back in March there was a mapping structure like
mappings: {
  "logs": {
    "_all": {...},
    "dynamic_templates": {...},
    "properties": {...}
  },
  "_default_": {
    "_all": {...},
    "dynamic_templates": {...},
    "properties": {...}
  }
}
and now I have only
mappings: {
  "logs": {
    "properties": {...}
  }
}
The dynamic_templates hash holds the information about creating "raw" fields.
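(For reference, the stock pre-5.x Logstash template created those .raw fields with a dynamic template roughly like this; the exact stock definition differed slightly:)
"string_fields" : {
  "match" : "*",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "string", "index" : "analyzed",
    "fields" : {
      "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
    }
  }
}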
My guess is that I need to update my index template to:
PUT /_template/logstash
{
  "template": "logstash*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "logs": {
      "_all": {...},
      "dynamic_templates": {...}
    },
    "_default_": {
      "_all": {...},
      "dynamic_templates": {...},
      "properties": {...}
    }
  }
}
In other words, everything but logs.properties (which holds the current list of fields being sent over by Logstash).
But I'm not an ES expert and now I'm a bit worried. My original index template didn't work out the way I thought it would. Is my above plan going to work? Or am I going to make things worse? Must you always include everything when you create an index template? And where did the mappings for the older indexes, before I had a template file, come from?
When Logstash first starts, the elasticsearch output plugin installs its own index template with the _default_ mapping and dynamic_templates, as you correctly figured out.
Every time Logstash creates a new logstash-* index (i.e. every day), the template is leveraged and the index is created with the proper mapping(s) present in the template.
What you need to do now is simply to take the official logstash template that you have overridden and reinstall it like this (but with the modified shard settings):
PUT /_template/logstash
{
  "template" : "logstash-*",
  "version" : 50001,
  "settings" : {
    "index.refresh_interval" : "5s",
    "index.number_of_shards" : 1,
    "index.number_of_replicas" : 0
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true, "norms" : false },
      "dynamic_templates" : [ {
        "message_field" : {
          "path_match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text",
            "norms" : false
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text", "norms" : false,
            "fields" : {
              "keyword" : { "type" : "keyword", "ignore_above" : 256 }
            }
          }
        }
      } ],
      "properties" : {
        "@timestamp" : { "type" : "date", "include_in_all" : false },
        "@version" : { "type" : "keyword", "include_in_all" : false },
        "geoip" : {
          "dynamic" : true,
          "properties" : {
            "ip" : { "type" : "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "half_float" },
            "longitude" : { "type" : "half_float" }
          }
        }
      }
    }
  }
}
Another way you could have done it is to not overwrite the logstash template, but use any other id, such as _template/my_logstash. At index creation time, both templates would then have kicked in, giving you the mappings from the official logstash template and the shard settings from your own template, as shown in the sketch below.
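A minimal sketch of that second template (the name my_logstash is arbitrary, and the order value is an assumption: templates with a higher order override same-named settings from lower-order templates such as the stock one):
PUT /_template/my_logstash
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
With both templates in place, every new logstash-* index would get the official mappings plus your single-shard, zero-replica settings.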
I am using this request when creating my index:
PUT some_name
{
  "mappings": {
    "_default_": {
      "_timestamp" : {
        "enabled": true,
        "store": true
      },
      "properties": {
        "properties": {
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
However, the _timestamp field is not being returned when I add a document (without any time field) and request it back. I am running Elasticsearch 1.5, and I have tried "store": "yes" and "store": "true".
What am I doing wrong? Thanks.
You need to specifically ask for that field to be returned with "fields": ["_timestamp"], because it's not a commonly returned field and it's not included in _source (which is what's returned by default):
GET /some_name/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["_timestamp"]
}
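Each hit should then carry the timestamp under a fields section rather than in _source, roughly along these lines (values made up):
{
  "_index": "some_name",
  "_id": "1",
  "_score": 1.0,
  "fields": {
    "_timestamp": 1487158714808
  }
}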
I have a bunch of documents coming in from fluentd, and I'm saving them to Elasticsearch with fluent-plugin-elasticsearch.
Some of those documents have a string under the name key and some have an object.
Example
{
  "name": "foo"
}
and
{
  "name": {
    "en": "foo",
    "fi": "bar"
  }
}
These documents are the same type in terms of my application, and they are saved to the same Elasticsearch index.
But Elasticsearch has an issue with this. When the second document is saved, it throws this error:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [name]
This seems to happen because Elasticsearch has set the key name to be of type string. I can see this using curl http://localhost:9200/fluentd-[tagname]/_mapping, and it obviously doesn't like it when I try to save an object to that field afterwards.
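The dynamically created mapping at that point presumably looks something like this (the index and type names here are hypothetical):
{
  "fluentd-tagname": {
    "mappings": {
      "mytype": {
        "properties": {
          "name": { "type": "string" }
        }
      }
    }
  }
}
so any later document where name is an object fails to parse against the string mapping.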
So is there any way to work around this in Elasticsearch?
I cannot control the incoming documents, and there are multiple keys with variable types - not just name. So I cannot make a single hack for that key only.
This is pretty annoying, since those documents are completely left out of Elasticsearch and sent to /dev/null.
If this is completely impossible - is it at least possible to save those documents to a file or something so I wouldn't lose them?
Here's my template for the fluentd-* indices:
{
  "fluentd_template": {
    "template": "fluentd-*",
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index": {
        "query": {
          "default_field": "msg"
        },
        "analysis" : {
          "analyzer" : {
            "default" : {
              "type" : "keyword"
            }
          }
        }
      }
    },
    "mappings": {
      "_default_": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "compress": true
        },
        "properties": {
          "@timestamp": {
            "type": "date",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}