Elasticsearch - Setting up default analyzers on all fields

I have an index where the mappings vary drastically. For example, say I'm indexing the Wikipedia infobox data of various articles. Infobox data is neither structured nor uniform, so documents can look like:
Data1:
{
  "title": "Sachin",
  "Age": 41,
  "Occupation": "Cricketer"
}
Data2:
{
  "title": "India",
  "Population": "23456987654",
  "GDP": "23",
  "NationalAnthem": "Jan Gan Man"
}
Since the fields differ from document to document and I want completion suggestions on the relevant ones, I'm thinking of applying analyzers to all fields.
How can I apply an analyzer to every field by default while indexing?

You need a _default_ template for your index, so that whenever new fields are added to it, those string fields will take the mapping from the _default_ template:
{
  "template": "infobox*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "my_completion_analyzer",
              "fielddata": {
                "format": "disabled"
              },
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
Or if your index is not a daily/weekly one, you can just create it once with the _default_ mapping defined:
PUT /infobox
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "my_completion_analyzer",
              "fielddata": {
                "format": "disabled"
              },
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
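Note that my_completion_analyzer is not a built-in analyzer; it is assumed to be defined in the index settings when the index is created (alongside the mappings shown above). A minimal sketch of such a definition, with an illustrative edge n-gram setup commonly used for completion-style matching (the filter name and gram sizes are just examples):
PUT /infobox
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "my_completion_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram_filter"]
        }
      }
    }
  }
}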

Related

Elasticsearch - dynamic mapping with multi-field support

Is it possible to add new fields with multi-field support dynamically?
My index has properties that will only be known at indexing time. So these fields will be included with dynamic mapping.
But, when a new field is added dynamically, I need it to be mapped as text and with three sub-fields: keyword, date (if it fits with dynamic_date_formats) and long.
With these three sub-fields I will be able to search and aggregate many queries with maximum performance.
I know I could "pre-map" my index for these dynamic fields using a nested field with key and value properties, so that the value property carries these three sub-fields. But I don't want a nested key/value field, because it is slow when performing aggregations over a lot of documents.
I found it: dynamic templates are the answer. Very simple :)
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "objs": {
            "match_mapping_type": "object",
            "mapping": {
              "type": "{dynamic_type}"
            }
          }
        },
        {
          "attrs": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword"
                },
                "long": {
                  "type": "long",
                  "ignore_malformed": true
                },
                "double": {
                  "type": "double",
                  "ignore_malformed": true
                },
                "date": {
                  "type": "date",
                  "format": "dd/MM/yyyy||dd/MM/yyyy HH:mm:ss||dd/MM/yyyy HH:mm",
                  "ignore_malformed": true
                }
              }
            }
          }
        }
      ],
      "dynamic": "strict",
      "properties": {
        "fixed": {
          "properties": {
            "aaa": {
              "type": "text"
            },
            "bbb": {
              "type": "long"
            },
            "ccc": {
              "type": "date",
              "format": "dd/MM/yyyy"
            }
          }
        },
        "dyn": {
          "dynamic": true,
          "properties": {}
        }
      }
    }
  }
}
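With this in place, any field added dynamically under dyn gets the typed sub-fields automatically, so you can aggregate or compute on them right away. A sketch of what that enables (the field name price is illustrative, not from the mapping above):
GET /myindex/_search
{
  "size": 0,
  "aggs": {
    "top_values": {
      "terms": {
        "field": "dyn.price.raw"
      }
    },
    "avg_value": {
      "avg": {
        "field": "dyn.price.long"
      }
    }
  }
}
Values that don't parse as long/double/date are simply skipped in those sub-fields thanks to ignore_malformed, while the keyword raw sub-field always works.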

Kibana does not search on nested field

I'm working with Elasticsearch/Kibana and trying to search on a field in a nested object, but it does not seem to work. Here's the mapping I use in a template:
{
  "order": 0,
  "template": "ss7_signaling*",
  "settings": {
    "index": {
      "mapping.total_fields.limit": 3000,
      "number_of_shards": "5",
      "refresh_interval": "30s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "no",
              "fielddata": {
                "format": "disabled"
              }
            }
          }
        }
      ],
      "properties": {
        "message": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Protocol": {
          "type": "string",
          "index": "not_analyzed"
        },
        "IMSI": {
          "type": "string",
          "index": "not_analyzed"
        },
        "nested": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "Timestamp": {
          "format": "strict_date_optional_time||epoch_millis",
          "type": "date"
        },
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "string",
          "index": "not_analyzed"
        }
      },
      "_all": {
        "norms": false,
        "enabled": false
      }
    }
  },
  "aliases": {
    "signaling": {}
  }
}
When I search in Kibana on single fields, everything works fine. However, I cannot search on nested fields like 'nested.name'.
Example of my query in Kibana: nested.name:hi
Thanks.
Kibana uses the query_string query underneath, and the latter does not support querying on nested fields.
It's still being worked on but in the meantime you need to proceed differently.
UPDATE:
As of ES 7.6, it is now possible to search on nested fields
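In recent Kibana versions with KQL enabled, a nested field can be queried with the dedicated nested syntax. A sketch against the mapping above (field names taken from the question; exact behavior depends on your Kibana version):
nested:{ name: "hi" }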

Inconsistent MapperParsingException on ElasticSearch 2.4.1

I'm using dynamic mapping for my indices and found that one document consistently works while the other doesn't, even though they are basically the same.
MapperParsingException[failed to parse]; nested: IllegalArgumentException[mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]];
This one works
{
  "_id": "581b883cfb54c66569adfc6c",
  "$source": {
    "attributes": {
      "th": [
        {
          "values": [
            {
              "display": "13.726133,100.5731003",
              "value": "13.726133,100.5731003"
            }
          ],
          "_v": 4,
          "type": "geo",
          "_dt": "com.7leaf.framework.Attribute",
          "slug": "lat-long",
          "key": "Lat / Long"
        },
        {
          "values": [
            {
              "display": 34,
              "value": 34
            }
          ],
          "_v": 4,
          "type": "number",
          "_dt": "com.7leaf.framework.Attribute",
          "slug": "number-of-floors",
          "key": "จำนวนชั้น"
        }
      ]
    }
  }
}
This doesn't
{
  "_id": "5824bce9fb54c611b092eec6",
  "$source": {
    "attributes": {
      "th": [
        {
          "values": [
            {
              "display": "13.726133,100.5731003",
              "value": "13.726133,100.5731003"
            }
          ],
          "type": "geo",
          "_dt": "com.7leaf.framework.Attribute",
          "_v": 4,
          "slug": "lat-long",
          "key": "Lat / Long"
        },
        {
          "values": [
            {
              "display": 34,
              "value": 34
            }
          ],
          "type": "number",
          "_dt": "com.7leaf.framework.Attribute",
          "_v": 4,
          "slug": "number-of-floors",
          "key": "จำนวนชั้น"
        }
      ]
    }
  }
}
What could have possibly gone wrong? The "display" and "value" fields can be of any type. I just don't get how it works for the first document and not the second. It doesn't make much sense. Any pointer is appreciated.
This is what the mapping looks like for the one that worked. It's automatically generated.
"values": {
"properties": {
"display": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
},
"value": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
For those who don't believe the first one actually worked: it did, and I have a lot of documents in that index.
Here's the Java code I use for bulk reindexing. Nothing special really.
public BulkResponse bulkIndex(List<JSONObject> entries) {
    if (client == null) return null;
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    for (JSONObject document : entries) {
        String indexName = getIndexName(
            document.getString(Constants.DATABASE), document.getString(Constants.COLLECTION));
        String id = document.getString(Constants.ID + "#$oid");
        bulkRequest.add(client.prepareIndex(
                indexName, document.getString(Constants.COLLECTION), id)
            .setSource(document.toMap()));
    }
    return bulkRequest.get();
}
Here's the stacktrace from ElasticSearch:
MapperParsingException[failed to parse]; nested: IllegalArgumentException[mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]];
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:580)
at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:559)
at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:211)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:327)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:120)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]
at org.elasticsearch.index.mapper.FieldMapper.doMerge(FieldMapper.java:378)
at org.elasticsearch.index.mapper.core.StringFieldMapper.doMerge(StringFieldMapper.java:382)
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:364)
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:53)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:60)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:60)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:271)
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:328)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:124)
... 18 more
Workaround
I added the mapping I mentioned above as part of default template and I was able to get around it. However, I have no idea why it works. I can now store any kind of properties in the same field.
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                },
                "english": {
                  "type": "string",
                  "analyzer": "english"
                }
              }
            }
          }
        }
      ],
      "properties": {
        "$source": {
          "properties": {
            "attributes": {
              "properties": {
                "en": {
                  "properties": {
                    "values": {
                      "properties": {
                        "display": {
                          "type": "string",
                          "fields": {
                            "english": {
                              "type": "string",
                              "analyzer": "english"
                            },
                            "raw": {
                              "type": "string",
                              "index": "not_analyzed",
                              "ignore_above": 256
                            }
                          }
                        },
                        "value": {
                          "type": "string",
                          "fields": {
                            "english": {
                              "type": "string",
                              "analyzer": "english"
                            },
                            "raw": {
                              "type": "string",
                              "index": "not_analyzed",
                              "ignore_above": 256
                            }
                          }
                        }
                      }
                    }
                  }
                },
                "th": {
                  "properties": {
                    "values": {
                      "properties": {
                        "display": {
                          "type": "string",
                          "fields": {
                            "english": {
                              "type": "string",
                              "analyzer": "english"
                            },
                            "raw": {
                              "type": "string",
                              "index": "not_analyzed",
                              "ignore_above": 256
                            }
                          }
                        },
                        "value": {
                          "type": "string",
                          "fields": {
                            "english": {
                              "type": "string",
                              "analyzer": "english"
                            },
                            "raw": {
                              "type": "string",
                              "index": "not_analyzed",
                              "ignore_above": 256
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Any pointer on how to make this dynamic so I don't need to specify it for every language? The example above works for English (en) and Thai (th), but I plan to support 40+ languages and don't want to add a mapping for each one.
It matters which of your documents created the index first! If the first document to reach the index is a "correct" one, with no confusion between the types of a field, then the index is created with, say, a string mapping for display. After that you can index the "problematic" document without issues. Index the documents one by one into a non-existent index (so that it's created automatically) and observe the difference in behavior. I really doubt that your bulk indexing code is inserting one document at a time.
If you want to send a bulk of documents, you can't tell which actually will trigger the creation of the index. Some documents will be sent to one shard, some to another and so on. There will be a mix of messages from the shards to the master node with the mapping and the first one that "wins" can be the "wrong" one or the "correct" one. You need a dynamic mapping template to control this, as you tested already.
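You can see this order dependence by indexing the two conflicting value shapes one at a time into a fresh index (index/type/field names here are illustrative). If the string arrives first, the later number can be coerced into a string; if the number arrives first, the coordinate string cannot be parsed as a number and indexing fails:
PUT /repro/doc/1
{
  "display": "13.726133,100.5731003"
}
PUT /repro/doc/2
{
  "display": 34
}
The sequence above succeeds (34 is indexed as the string "34"). Delete the index and reverse the order, and the second request should fail with a MapperParsingException, because display was dynamically mapped as a numeric field by the first document.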
My suggestion is to use a more targeted template that uses wildcards to cover all the languages you have. If the only thing that differs between your languages is the name of one field (th, en, etc.), then use path_match with the template you already have, but in a more concise form, without duplicating the same mapping over and over:
PUT /_template/language_control
{
  "template": "language_control*",
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "display_values": {
            "path_match": "$source.attributes.*.values.*",
            "mapping": {
              "type": "string",
              "fields": {
                "english": {
                  "type": "string",
                  "analyzer": "english"
                },
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
If you look at path_match that will cover $source.attributes.th.values.display, $source.attributes.en.values.display and also the other values from the languages like $source.attributes.th.values.value, $source.attributes.en.values.value.
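For instance, a document for a brand-new language would be caught by the same template with no mapping changes (the language key fr and the values here are illustrative):
PUT /language_control_test/my_type/1
{
  "$source": {
    "attributes": {
      "fr": [
        {
          "values": [
            {
              "display": "quarante-deux",
              "value": "42"
            }
          ]
        }
      ]
    }
  }
}
Both $source.attributes.fr.values.display and $source.attributes.fr.values.value match the path_match pattern and get the string mapping with english/raw sub-fields.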
One thing you can do is augment your dynamic templates so that all types are mapped to strings, since you probably won't query on them.
It basically boils down to modifying your template to the one below, which will work with the documents you've shown above. Note that I've only added a dynamic mapping for the long type, but you can add a similar mapping for other types as well. The bottom line is to make sure the mapper won't find any ambiguities when indexing your documents.
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                },
                "english": {
                  "type": "string",
                  "analyzer": "english"
                }
              }
            }
          }
        },
        {
          "longs": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "string"
            }
          }
        }
      ]
    }
  }
}
Also note that I've never been able to index the first document; it always fails for me. As @Andrei rightly said, when indexing documents in bulk you have no guarantee of which document is indexed first into which shard, and thus no guarantee of which field will "win", i.e. be indexed first and set the field type.

Elasticsearch, how to check if my dynamic mapping works?

I'm providing a default dynamic mapping template at index creation in Elasticsearch and want to check that it works as expected. It has me stumped: how can I verify that it works?
(Working with ES 2.2.2)
"mappings": {
"_default_": {
"dynamic_templates": [
{
"no_date_detection": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"date_detection": false
}
}
},
{
"language_de": {
"match_mapping_type": "*",
"match": "*_de",
"mapping": {
"type": "string",
"analyzer": "german"
}
}
},
{
"language_es": {
"match": "*_es",
"match_mapping_type": "*",
"mapping": {
"type": "string",
"analyzer": "spanish"
}
}
},
{
"language_en": {
"match": "*_en",
"match_mapping_type": "*",
"mapping": {
"type": "string",
"analyzer": "english"
}
}
}
]
}
}
It's pretty straightforward, like in the examples provided in the documentation.
GETting the mapping shows that the dynamic templates are handed down to new types
"testobject": {
"dynamic_templates": [
{
"no_date_detection": {
"mapping": {
"type": "string",
"date_detection": false
},
"match_mapping_type": "string"
}
},
{
"language_de": {
...
But when I create an object with new fields like
"description_en": "some english text"
and GET the mapping it just shows
"description_en": {
"type": "string"
}
Shouldn't this have
"analyzer": "english"
in it?
What did I do wrong, and if my dynamic mapping is correct, how can I verify that it gets applied?
Thanks in advance /Carsten
As my question "how can I verify that it gets applied?" seems unclear, I'll try to simplify:
I create an index with default dynamic mapping.
I create a type "testobject".
"GET /myindex/testobject/_mappings" verifies that, as expected, the dynamic templates are handed down to the type.
I create a new field in an object of type testobject.
"GET /myindex/testobject/_mappings" shows the new field but without say '"date_detection": false'. It shows it just as a simple string (see above).
How can I verify if/that the dynamic template got applied to a newly created field?
Simplified example:
PUT /myindex
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "no_date_detection": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "date_detection": false
            }
          }
        }
      ]
    }
  }
}
PUT /myindex/gardeners/1
{
  "name": "gary"
}
GET /myindex/_mapping
{
  "myindex": {
    "mappings": {
      "gardeners": {
        "dynamic_templates": [
          {
            "no_date_detection": {
              "mapping": {
                "type": "string",
                "date_detection": false
              },
              "match_mapping_type": "string"
            }
          }
        ],
        "properties": {
          "name": {
            "type": "string"
          }
        }
      },
      "_default_": {
        "dynamic_templates": [
          {
            "no_date_detection": {
              "mapping": {
                "type": "string",
                "date_detection": false
              },
              "match_mapping_type": "string"
            }
          }
        ]
      }
    }
  }
}
The mapping for my new field "name"
"properties": {
"name": {
"type": "string"
}
}
doesn't contain
"date_detection": false
Why doesn't it get handed down?
The dynamic templates are checked in the order they are defined, and the first one that matches is the one that gets applied.
In your case no_date_detection template matches your field description_en because it's a string.
If you want that field to be used with the english analyzer, then you need to change the order of the templates:
"mappings": {
"_default_": {
"dynamic_templates": [
{
"language_de": {
"match_mapping_type": "*",
"match": "*_de",
"mapping": {
"type": "string",
"analyzer": "german"
}
}
},
{
"language_es": {
"match": "*_es",
"match_mapping_type": "*",
"mapping": {
"type": "string",
"analyzer": "spanish"
}
}
},
{
"language_en": {
"match": "*_en",
"match_mapping_type": "*",
"mapping": {
"type": "string",
"analyzer": "english"
}
}
},
{
"no_date_detection": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"date_detection": false
}
}
}
]
}
}
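As for verifying which analyzer a dynamically created field actually ended up with: besides inspecting the mapping, you can ask the _analyze API to resolve the analyzer from the field's mapping once the field exists (a sketch; the field name is from the question above):
GET /myindex/_analyze?field=description_en&text=some+english+text
If the english analyzer was applied, the returned tokens will show English stemming rather than plain standard-analyzer output.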
I found the error in my assumption: "date_detection": false doesn't work that way with dynamic_templates.
You have to specify the date_detection directly on the mapping (not at the level of a specific type).
https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
If you want this to get automatically added to new indices, you can use index templates.
Thanks to Yannick for the hint (https://discuss.elastic.co/t/mappings--default--dynamic-templates-doesnt-show-up-in-resulting-mapping/59030)

elasticsearch index_name with multi_field

I have 2 separate indexes, each containing a different type
I want to get combined records from both.
The problem is that one type has field 'email', the other has 'work_email'. However I want to treat them as the same thing for sorting purposes.
That is why I try to use index_name in one of the types.
Here are mappings:
Index1:
"mappings": {
"people": {
"properties": {
"work_email": {
"type": "string",
"index_name": "email",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Index2:
"mappings": {
"companies": {
"properties": {
"email": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I expect this to work:
GET /index1,index2/people,companies/_search
{
  "sort": [
    {
      "email.raw": {
        "order": "asc"
      }
    }
  ]
}
But, I get an error that there is no such field in the 'people' type.
Am I doing something wrong, or is there a better way to achieve what I need?
Here you can find a recreation script that illustrates the problem: https://gist.github.com/pmishev/11375297
There is a problem in the way you map the multi-field. Check out the mapping below and try to index; you should get the results:
"mappings": {
"people": {
"properties": {
"work_email":{
"type": "multi_field",
"fields": {
"work_email":{
"type": "string",
"index_name": "email"
},
"raw":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Specify the type as multi_field and list the required fields under fields. (Note that multi_field has been deprecated in favor of the plain fields syntax since Elasticsearch 1.0.)
I ended up adding a 'copy_to' property in my mapping:
"mappings": {
"people": {
"properties": {
"work_email": {
"type": "string",
"copy_to": "email",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
So now I can address both fields as email.
It's not ideal, as this means that the email field is actually indexed twice, but that was the only thing that worked.
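One caveat with this approach (a sketch of the assumption involved): the copy_to target email is created by dynamic mapping unless you declare it, so to guarantee that an email.raw sub-field exists for sorting, it is safer to map the target explicitly alongside work_email:
"mappings": {
  "people": {
    "properties": {
      "email": {
        "type": "string",
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      },
      "work_email": {
        "type": "string",
        "copy_to": "email"
      }
    }
  }
}
Values copied via copy_to are indexed into the target field and its multi-fields, so email.raw then lines up with the companies index for sorting.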
