Elasticsearch - template matching based on field value

Imagine this document:
{
  "_index": "project.datasync.20180101",
  "_type": "com.redhat.viaq.common",
  "service": "data-sync-server",
  "data": {
    "foo": "bar"
  }
  ...
}
I would like to define a mapping for the "data.foo" field (imagine I need to change how it is indexed, etc.).
I know I can match indices like this:
{
  "template": "project.datasync.*",
  "order": 100,
  "mappings": {
    "data": {
      "enabled": true,
      "properties": {
        "foo": {"type": "string", "index": "not_analyzed", ...}
      }
    }
  }
}
However, the datasync part of the index name comes from somewhere else, and there's no guarantee that it will be datasync or anything similar that matches a pattern.
So my index template wouldn't match if the index is project.thedatasync.20180101.
I know I can use project.* in my index template for matching, but that is too generic and matches irrelevant indices.
So, I would like to have this mapping active only when service is data-sync-server, which is always true for the documents I am interested in.
Any ideas? This seems like something fundamentally at odds with how Elasticsearch works, and if so, I would like to have that confirmed.
Please note that the documents are sent to Elasticsearch with Fluentd, and I don't have access to the Fluentd config to change the index name there.
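For what it's worth, index templates match on the index name only, never on document contents, so one hedged workaround (a sketch, not a confirmed answer; the template name is made up, and the com.redhat.viaq.common type name is taken from the _type in the document above) is to keep the broad project.* pattern but put nothing in the template except the data.foo mapping, so the only thing it can change on unrelated indices is a mapping for a field they never use:
PUT _template/datasync-data-foo
{
  "template": "project.*",
  "order": 100,
  "mappings": {
    "com.redhat.viaq.common": {
      "properties": {
        "data": {
          "properties": {
            "foo": {"type": "string", "index": "not_analyzed"}
          }
        }
      }
    }
  }
}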

Related

Can Elasticsearch index a field without saving the original value, just like Lucene's Field.Store.NO?

I have a large field in MySQL and do not want to save the original value to Elasticsearch. Is there a method just like Lucene's Field.Store.NO?
Thanks.
You just need to define the "store" mapping accordingly, e.g.:
PUT your-index
{
  "mappings": {
    "properties": {
      "some_field": {
        "type": "text",
        "index": true,
        "store": false
      }
    }
  }
}
You may also want to disable the _source field. From the docs:
The _source field contains the original JSON document body that was passed at index time [...] Though very handy to have around, the source field does incur storage overhead within the index.
For this reason, it can be disabled as follows:
PUT your-index
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
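A quick way to see the consequence (the field name follows the snippet above; the query text is illustrative): with "store": false and _source disabled, the field stays fully searchable because the inverted index is untouched, but the original value can no longer be retrieved:
GET your-index/_search
{
  "query": {
    "match": { "some_field": "anything that was indexed" }
  }
}
Matching hits come back with their _id but with no _source body, so make sure anything you ever need to display is either stored or kept elsewhere (e.g. in MySQL, as in the question).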

Elasticsearch Nested-field vs Depth? Check for document depth via Kibana?

I'm reading about mapping in Elasticsearch and I see these two terms: nested field and depth. I think the two terms are roughly equivalent, and I'm currently confused by them. Can anyone clear this up for me? Thank you.
And by the way, are there any ways to check a document's depth via Kibana?
The source of confusion is probably that in Elasticsearch the term nested can be used in two different contexts:
"nested" as regular JSON nesting, i.e. a JSON object within a JSON object;
"nested" as the Elasticsearch nested data type.
In the mappings documentation page, when they mention "depth" they refer to the first meaning: the setting index.mapping.depth.limit defines how deeply nested your JSON documents can be.
How is JSON depth interpreted by Elasticsearch mapping?
Here is an example of JSON document with depth 1:
{
  "name": "John",
  "age": 30
}
Now with depth 2:
{
  "name": "John",
  "age": 30,
  "cars": {
    "car1": "Ford",
    "car2": "BMW",
    "car3": "Fiat"
  }
}
By default (as of ES 6.3) the depth cannot exceed 20.
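The limit is configurable per index via that setting; a minimal sketch (the index name is illustrative):
PUT my_index
{
  "settings": {
    "index.mapping.depth.limit": 2
  }
}
With a limit of 2, the second example above still indexes fine, but one more level of inner objects would be rejected when the mapping is updated.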
What is a nested data type and why isn't it the same as a document with depth > 1?
The nested data type allows you to index arrays of objects and query their items individually via the nested query. This means Elasticsearch will index a document with such fields differently (see the page Nested Objects of the Definitive Guide for more explanation).
For instance, if in the following example we do not define "user" as a nested field in the mapping, a query for user.first: John and user.last: White will return a match, which is wrong (no single user is named John White):
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}
If we do, Elasticsearch will index each item of the "user" list as an implicit sub-document and thus use more resources, more disk, and more memory. This is why there is also another mapping setting: index.mapping.nested_fields.limit regulates how many different nested fields one can declare (the default is 50). To customize this you can see this answer.
So, Elasticsearch documents with depth > 1 are not indexed as nested unless you explicitly ask for it, and that's the difference.
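To make this concrete, here is a sketch using the example above (the _doc type matches the mapping shown further below; exact request syntax varies a bit across ES versions). Once "user" is mapped as nested, the cross-object false positive disappears:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": { "type": "nested" }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "John" } },
            { "match": { "user.last": "White" } }
          ]
        }
      }
    }
  }
}
This query returns no hits for the document above, because the nested query matches each item of "user" on its own, so John Smith and Alice White no longer combine into a phantom John White.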
Can I have nested fields inside nested?
Yes, you can! Just to put the confusion to rest: you can define a nested field inside another nested field in a mapping. It will look something like this:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "cars": {
              "type": "nested",
              "properties": {
                "brand": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
But keep in mind that the number of implicit documents to be indexed multiplies, so this is simply not very efficient.
Can I get the depth of my JSON objects from Kibana?
Most likely you can do it with scripts, check this blog post for further details: Using Painless in Kibana scripted fields.

Excluding field from _source causes aggregation to not work

We're using Elasticsearch 1.7.2 and trying to use the "include/exclude from _source" feature as it's described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
We have a field types that's 'pretty' and that we would like to return to the client, but it's not well suited to aggregations; and a field types_int (and also a types_string, but that's not relevant now) that's 'ugly' but optimized for search/aggregations, which we don't want to return to the client but do want to aggregate/filter on.
The types_int field doesn't need to be stored anywhere; it just needs to be indexed. We don't want to waste bandwidth returning it to the client either, so we don't want to include it in _source.
The mapping for it looks like this:
"types_int": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"value_int": {
"type": "integer"
}
}
}
However, after we add the exclude, our filters/aggregations on it stop working.
The excludes looks like this:
"_source": {
"excludes": [
"types_int"
]
}
Without that in the mapping, everything works fine.
An example of a filter:
POST my_index/my_type/_search
{
  "filter": {
    "nested": {
      "path": "types_int",
      "filter": {
        "term": {
          "types_int.name": "<something>"
        }
      }
    }
  }
}
Again, removing the excludes and everything works fine.
Thinking it might have something to do with nested types, since they're separate documents and perhaps handled differently from normal fields, I added an exclude mapping for a 'normal' value-type field, and then my filter also stopped working.
"publication": {
"type": "string",
"index": "not_analyzed"
}
"_source": {
"excludes": [
"publication"
]
}
So my conclusion is that after you exclude something from _source, you can no longer filter on it? That doesn't make sense to me, so I'm thinking there's something we're doing wrong here. The _source include/exclude is just a post-processing step that manipulates the stored JSON string, right?
I understand that we can also use source filtering to exclude specific fields at query time, but it's simply unnecessary to store the field at all. If anything, I would just like to understand why this doesn't work :)
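For reference, the query-time source filtering mentioned above would look roughly like this in 1.x (a sketch combining it with the filter from the question; note that 1.x spells the keys include/exclude):
POST my_index/my_type/_search
{
  "_source": {
    "exclude": ["types_int"]
  },
  "filter": {
    "nested": {
      "path": "types_int",
      "filter": {
        "term": { "types_int.name": "<something>" }
      }
    }
  }
}
This keeps types_int indexed and stored in _source (so filters keep working) and only trims it from each hit in the response; it just doesn't save the storage, which is what the question is after.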

Elasticsearch search yields no results, analyzers might be the issue

Elasticsearch version: 1.6.0
I've been using Elasticsearch for the last few months (I just started) and now I'm running into problems with it. Here is some info about my database:
The index I'm using has the default dynamic mapping (i.e. I haven't tinkered with it). My objects should be schema-free. The index also uses the default analyzer (I haven't touched that either), so index/_settings looks like this:
{
  "default": {
    "settings": {
      "index": {
        "creation_date": "1441808338958",
        "uuid": "34Yn1_ixSqOzp9UotOE_4g",
        "number_of_replicas": "1",
        "number_of_shards": "1",
        "version": {
          "created": "1060099"
        }
      }
    }
  }
}
Here's the issue I'm having: on some field values the search does not work as expected (I concluded it's because of the analyzer). Example: the field email has the value user#example.com; {"query":{"bool":{"must":[{"term":{"user.email":"user#example.com"}}]}}} won't work, but using just "user" as the term value works (because the address gets tokenized, and there is no token with the full email address).
Here's what I want: I want both wildcard text searches (finding a bad word in a comment's text) AND strict searches (on email, for example) on any field; I'll then be using bool and should with either term or wildcard.
The problem is I just can't tell it "OK, on this field you should use the X analyzer", because all my fields are dynamic.
What I've tried: on the index's settings I PUT this: {"analysis":{"analyzer":{"default":{"type":"keyword"}}}}; it doesn't work: nothing changed (I did close the index before doing so and reopened it afterwards).
Is this issue even related to analyzers?
This query won't work:
{"query":{"bool":{"must":[{"term":{"user.email":"user#example.com"}}]}}}
term is an exact match: the value you supply ("user#example.com" in your case) must match one of the tokens ES has indexed for that field.
When you don't assign any analyzer to a field, ES assumes the standard analyzer. When "user#example.com" is indexed, it gets tokenized into smaller terms like "user" and "example.com", and no token equals the full address.
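You can check this yourself with the _analyze API (a sketch in the 5.x+ request style; on 1.x you would pass analyzer and text as query-string parameters instead):
POST _analyze
{
  "analyzer": "standard",
  "text": "user#example.com"
}
The response lists the tokens actually stored, and a term query only matches if its value equals one of them exactly.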
To solve your problem you have to map the email field as "not_analyzed" in your index's mapping.
With the help of Ryan Huynh I've solved my issue:
Use dynamic templates; create the index like so:
PUT /index
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_template": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ]
    }
  }
}
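With that template in place, every dynamically mapped string field is indexed verbatim, so the exact-match query from the question should now behave (a sketch; the type name mytype is illustrative):
POST /index/mytype/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "user.email": "user#example.com" } }
      ]
    }
  }
}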

How to define a mapping in elasticsearch that doesn't accept fields other that the mapped ones?

OK, in my Elasticsearch I am using the following mapping for an index:
{
  "mappings": {
    "mytype": {
      "type": "object",
      "dynamic": "false",
      "properties": {
        "name": {
          "type": "string"
        },
        "address": {
          "type": "string"
        },
        "published": {
          "type": "date"
        }
      }
    }
  }
}
It works. In fact, if I put a malformed date in the "published" field, it complains and fails.
Also I've the following configuration:
...
node.name : node1
index.mapper.dynamic : false
index.mapper.dynamic.strict : true
...
And without the mapping, I can't really use the type. The problem is that if I insert something like:
{
  "name": "boh58585",
  "address": "hiohio",
  "published": "2014-4-4",
  "test": "hophiophop"
}
it will happily accept it. This is not the behaviour I expect, because the field test is not in the mapping. How can I restrict the fields of a document to only those that are in the mapping?
The use of "dynamic": false tells Elasticsearch to never allow the mapping of an index to be changed. If you want an error thrown when you try to index new documents with fields outside of the defined mapping, use "dynamic": "strict" instead.
From the docs:
"The dynamic parameter can also be set to strict, meaning that not only new fields will not be introduced into the mapping, parsing (indexing) docs with such new fields will fail."
Since you've defined this in the settings, I would guess that leaving out the dynamic from the mapping definition completely will default to "dynamic": "strict".
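A sketch of the adjusted mapping (same fields as in the question, with only the dynamic value changed):
{
  "mappings": {
    "mytype": {
      "dynamic": "strict",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" },
        "published": { "type": "date" }
      }
    }
  }
}
Indexing the document with the extra test field is then rejected (in 1.x with a StrictDynamicMappingException) instead of being silently accepted.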
Is your problem with the malformed date field?
I would fix the date issue and continue to use dynamic: false.
You can read about the ways to set up the date field mapping for a custom format here.
Stick the date format string in a {type: date, format: ?} mapping, as sketched below.
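For instance, a sketch with a format string chosen as an assumption to match the unpadded "2014-4-4" value from the question:
"published": {
  "type": "date",
  "format": "yyyy-M-d"
}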
