Elasticsearch: multifield mapping of multiple fields

I will have a document with multiple fields, say 'title', 'meta1', 'meta2' and 'full_body'.
I want to index each of them in a few different ways (raw, stemming without stop words, shingles, synonyms, etc.), so I will end up with fields like title.stemming, title.shingles, meta1.stemming, meta1.shingles and so on.
Do I have to copy and paste the mapping definition for each field? Or is it possible to define all the ways of indexing/analysing once and then apply that definition to each of the 4 top-level fields? If so, how?
mappings:
  my_type:
    properties:
      title:
        type: string
        fields:
          shingles:
            type: string
            analyzer: my_shingle_analyzer
          stemming:
            type: string
            analyzer: my_stemming_analyzer
      meta1:
        ... <-- do I have to repeat everything here?
      meta2:
        ... <-- and here?
      full_body:
        ... <-- and here?

In your case, you could use dynamic templates with the match_mapping_type setting so that you can apply the same setting to all your string fields:
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "shingles": {
                  "type": "string",
                  "analyzer": "my_shingle_analyzer"
                },
                "stemming": {
                  "type": "string",
                  "analyzer": "my_stemming_analyzer"
                }
                , ... other sub-fields and analyzers
              }
            }
          }
        }
      ]
    }
  }
}
As a result, whenever a new string field is indexed, its mapping will be created according to the defined template. You can also use the match setting to restrict the template to specific field names.
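If the set of fields is fixed and known up front, another option is to generate the explicit mapping on the client before creating the index, so that the sub-field definitions are written only once. The following is a minimal Python sketch under that assumption. The analyzer names come from the question, the helper name build_properties is hypothetical, and on Elasticsearch 5.x and later the field type would be text rather than string:

```python
import copy
import json

# Sub-field definitions shared by every top-level field (names taken from the question).
SUB_FIELDS = {
    "shingles": {"type": "string", "analyzer": "my_shingle_analyzer"},
    "stemming": {"type": "string", "analyzer": "my_stemming_analyzer"},
}


def build_properties(field_names, sub_fields, field_type="string"):
    """Return a 'properties' dict that gives each field the same multi-field setup."""
    return {
        # deepcopy so that later edits to one field do not leak into the others
        name: {"type": field_type, "fields": copy.deepcopy(sub_fields)}
        for name in field_names
    }


properties = build_properties(["title", "meta1", "meta2", "full_body"], SUB_FIELDS)
body = {"mappings": {"my_type": {"properties": properties}}}
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the index creation request. Unlike the dynamic template, it only affects the four listed fields.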

Related

How to specify a field which should not be indexed?

As mentioned in the title, I want to disable indexing of a specific field in Elasticsearch. For example, I have a field named #fields which contains three sub-fields: name, age and salary. I do not want to index #fields.age. How can I achieve that? I have tried the include_in_all parameter, but it doesn't work. My mapping configuration looks like this:
"mappings": {
  "fluentd": {
    "properties": {
      "#fields": {
        "properties": {
          "age": {
            "type": "text",
            "include_in_all": false,
            "index": "no"
          }
        }
      }
    }
  }
}
When I use the mapping configuration above, I can only see #fields.age in the index's mapping. I expected #fields.name and #fields.salary to appear in the mapping, not #fields.age. How can this happen? Any answers will be appreciated.

Kibana visualization not showing analyzed fields

I am working on a Facebook comments dashboard, based on the Facebook Graph API, using Elasticsearch 5 and Kibana 5. I added some analyzed fields; they appear in the Discover section of Kibana, but when I go to the visualization part I can't find those fields.
My Facebook comments index:
PUT fb_comments
{
  "settings": {
    "analysis": {},
    "mapping.ignore_malformed": true
  },
  "mappings": {
    "fb_comment": {
      "dynamic_templates": [
        {
          "created_time": {
            "match": "created_time",
            "mapping": {
              "type": "date",
              "format": "epoch_second"
            }
          }
        },
        {
          "message": {
            "match": "message",
            "mapping": {
              "type": "string",
              "analyzer": "simple"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
The analyzed field, message, appears in Discover.
The analyzed field, message, does not appear in the visualization part.
I think it might be related to a memory limitation. According to the Kibana 5 help, analyzed fields might require more memory.
I checked my memory and it is indeed used at its maximum capacity.
I finally found the solution.
In Elasticsearch 2.x we had the string type, and you specified an analyzer if you wanted the field to be analyzed. In Elasticsearch 5.x there are two types instead: keyword, which is not analyzed and can be aggregated by default, and text, which is analyzed and cannot be aggregated by default. If you want a field that is both analyzed and aggregatable, add the property "fielddata": true to the text field.
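Here is a minimal sketch of what the message mapping from the question might look like on Elasticsearch 5.x with this fix applied. It only builds the request body as a Python dict. The helper name is hypothetical and the body has not been run against a cluster:

```python
import json


def analyzed_aggregatable_field(analyzer="simple"):
    """Build a 5.x 'text' field mapping that is analyzed and usable in aggregations."""
    return {
        "type": "text",       # replaces the old analyzed "string"
        "analyzer": analyzer,
        "fielddata": True,    # enables in-memory fielddata so the field can be aggregated
    }


body = {
    "mappings": {
        "fb_comment": {
            "properties": {"message": analyzed_aggregatable_field()}
        }
    }
}
print(json.dumps(body, indent=2))
```

Fielddata is held in heap memory, so enabling it on a large text field can add to the memory pressure described above.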

How to map dynamic field value in elasticsearch?

I'm mapping a Couchbase gateway document and I'd like to tell Elasticsearch to avoid indexing the internal attributes added by the gateway, such as "_sync". This object contains another object named "channels", which has the following form:
"channels": {
  "i7de5558-32ad-48ca-bf91-858c3a1e4588": 12
}
So I guess the mapping of this object would be like:
"channels": {
  "type": "object",
  "properties": {
    "i7de5558-32ad-48ca-bf91-858c3a1e4588": {
      "type": "integer",
      "index": "not_analyze"
    }
  }
}
The problem is that the keys are always changing, so I don't know whether I should use a wildcard like "*": {"type": "integer", "index": "not_analyze"} for this property or do something else.
Any advice please?
If the fields are integers, you don't have to declare them explicitly in the mapping. You can create an empty mapping and index documents with these fields; Elasticsearch will infer the field type and update the mapping dynamically. You can also use dynamic templates for this.
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "analysed_string_template": {
            "path_match": "channels.*",
            "mapping": {
              "type": "integer"
            }
          }
        }
      ]
    }
  }
}
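Note that path_match is compared against the full dotted path of a field. If channels really sits inside the _sync object, as the question describes, the pattern would presumably need to be _sync.channels.* instead. Here is a small Python sketch that builds the template for a given parent path. The function name and the template name are hypothetical:

```python
import json


def integer_children_template(parent_path, name="channels_as_integers"):
    """Build a dynamic template that maps every field under parent_path as integer."""
    return {
        name: {
            "path_match": parent_path + ".*",
            "mapping": {"type": "integer"},
        }
    }


body = {
    "mappings": {
        "my_type": {
            "dynamic_templates": [integer_children_template("_sync.channels")]
        }
    }
}
print(json.dumps(body, indent=2))
```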
There's a dynamic way to do what you need; it is called a dynamic template.
Using templates you are able to create rules like this:
PUT /my_index
{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}
In your case you could create a template to set all new fields inside the channels object as not_analyzed.
Hope this helps.

Elasticsearch UTF-8 characters in search

I have an indexed record:
"žiema"
Elasticsearch settings:
index:
  cmpCategory: {type: string, analyzer: like_analyzer}
Analyzer:
analysis:
  char_filter:
    lt_characters:
      type: mapping
      mappings: ["ą=>a","Ą=>a","č=>c","Č=>c","ę=>e","Ę=>e","ė=>e","Ė=>e","į=>i","Į=>i","š=>s","Š=>s","ų=>u","Ų=>u","ū=>u","ž=>z", "Ū=>u"]
  analyzer:
    like_analyzer:
      type: snowball
      tokenizer: standard
      filter : [lowercase,asciifolding]
      char_filter : [lt_characters]
What I want:
Searching for the keyword "žiema" should find the record "žiema", and searching for "ziema" should also find the record "žiema". How can I do that?
I tried a character replacement (the mapping char filter) and the asciifolding filter.
What am I doing wrong?
You can try indexing your field twice, as shown in the documentation.
PUT /my_index/_mapping/my_type
{
  "properties": {
    "cmpCategory": {
      "type": "string",
      "analyzer": "standard",
      "fields": {
        "folded": {
          "type": "string",
          "analyzer": "like_analyzer"
        }
      }
    }
  }
}
This way the cmpCategory field is indexed with the standard analyzer, keeping diacritics, and the cmpCategory.folded field is indexed without diacritics.
When searching, you query both fields like this:
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "žiema",
      "fields": [ "cmpCategory", "cmpCategory.folded" ]
    }
  }
}
Also, I'm not sure if the char_filter is necessary since the asciifolding filter already does that transformation.
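To see why a folded sub-field can match both spellings, here is a rough Python approximation of diacritic folding. It uses Unicode decomposition from the standard library, not Elasticsearch's actual asciifolding implementation, so treat it only as an illustration of the idea:

```python
import unicodedata


def fold_diacritics(text):
    """Decompose characters (NFKD) and drop the combining marks, so 'ž' becomes 'z'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Both the accented and the plain spelling end up as the same folded term.
for term in ("žiema", "ziema"):
    print(term, "->", fold_diacritics(term).lower())
```

In this approximation, every Lithuanian letter listed in the lt_characters mapping folds to its plain ASCII counterpart, which is consistent with the suggestion that the extra char_filter may be redundant.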

How to not-analyze in ElasticSearch?

I've got a field in an ElasticSearch index which I do not want to have analyzed, i.e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.
If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.
Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?
I only create the index, I don't do anything else. I can use analyzers like "english" for other fields which seems to be built-in names for pre-configured analyzers. Is there a list of other names? Maybe there's one fitting my needs (namely doing nothing with the input).
This is my mapping currently:
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string" }
    }
  }
}
my_field1 is language-dependent; this seems to work. my_field2 shall be verbatim. I'd like to give an analyzer there which simply does not do anything.
A sample value for my_field2 would be "B45c 14/04".
"my_field2": {
  "properties": {
    "title": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
Check here, https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html, for further info.
This is no longer true because the string type has been removed (replaced by keyword and text), as described here. Instead, you should use the keyword type with "index": true | false.
For example, OLD:
{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}
becomes NEW:
{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
This means the field is indexed but, because it is typed as keyword, it is implicitly not analyzed. If you would like the field to be analyzed, use the text type.
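The OLD to NEW rewrite above can be sketched as a small Python helper for converting existing field definitions. The function name is hypothetical, and mapping "index": "no" to a text field with "index": false is an assumption made for this sketch, not something stated in the answer:

```python
def migrate_string_field(old):
    """Convert a pre-5.x 'string' field mapping into a 'keyword' or 'text' mapping."""
    if old.get("type") != "string":
        return dict(old)  # not a string field, nothing to migrate

    # Keep every other setting (analyzer, fields, ...) as it was.
    new = {k: v for k, v in old.items() if k not in ("type", "index")}
    index = old.get("index", "analyzed")  # "analyzed" was the old default

    if index == "not_analyzed":
        new.update({"type": "keyword", "index": True})
    elif index == "no":
        new.update({"type": "text", "index": False})  # assumption, see lead-in
    else:
        new.update({"type": "text", "index": True})
    return new


print(migrate_string_field({"type": "string", "index": "not_analyzed"}))
print(migrate_string_field({"type": "string", "analyzer": "english"}))
```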
The keyword analyzer can also be used.
// don't actually use this, use "index": "not_analyzed" instead
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string", "analyzer": "keyword" }
    }
  }
}
As noted here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark those fields as not_analyzed.
But the keyword analyzer can be useful when it is set as the default for a whole index.
UPDATE: As noted in the comments, string is no longer supported in 5.x.
