I'm doing dynamic mapping for my indices and found that one index consistently works while the other one doesn't, even though they are basically the same:
MapperParsingException[failed to parse]; nested: IllegalArgumentException[mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]];
This one works
{
"_id": "581b883cfb54c66569adfc6c",
"$source": {
"attributes": {
"th": [
{
"values": [
{
"display": "13.726133,100.5731003",
"value": "13.726133,100.5731003"
}
],
"_v": 4,
"type": "geo",
"_dt": "com.7leaf.framework.Attribute",
"slug": "lat-long",
"key": "Lat / Long"
},
{
"values": [
{
"display": 34,
"value": 34
}
],
"_v": 4,
"type": "number",
"_dt": "com.7leaf.framework.Attribute",
"slug": "number-of-floors",
"key": "จำนวนชั้น"
}
]
}
}
}
This doesn't
{
"_id": "5824bce9fb54c611b092eec6",
"$source": {
"attributes": {
"th": [
{
"values": [
{
"display": "13.726133,100.5731003",
"value": "13.726133,100.5731003"
}
],
"type": "geo",
"_dt": "com.7leaf.framework.Attribute",
"_v": 4,
"slug": "lat-long",
"key": "Lat / Long"
},
{
"values": [
{
"display": 34,
"value": 34
}
],
"type": "number",
"_dt": "com.7leaf.framework.Attribute",
"_v": 4,
"slug": "number-of-floors",
"key": "จำนวนชั้น"
}
]
}
}
}
What could have possibly gone wrong? The "display" and "value" fields can be of any type. I just don't get how it works for the first index and not the second; it doesn't make much sense. Any pointer is appreciated.
This is what the mapping looks like for the one that worked. It's automatically generated.
"values": {
"properties": {
"display": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
},
"value": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
For those who didn't believe the first one actually worked: here's the screenshot. I have a lot of documents in that index.
Here's the Java code I use for bulk reindexing. Nothing special, really.
public BulkResponse bulkIndex(List<JSONObject> entries) {
    if (client == null) return null;
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    for (JSONObject document : entries) {
        // Target index is derived from the source database/collection names
        String indexName = getIndexName(
            document.getString(Constants.DATABASE), document.getString(Constants.COLLECTION));
        String id = document.getString(Constants.ID + "#$oid");
        bulkRequest.add(client.prepareIndex(
                indexName, document.getString(Constants.COLLECTION), id)
            .setSource(document.toMap()));
    }
    return bulkRequest.get();
}
Here's the stacktrace from ElasticSearch:
MapperParsingException[failed to parse]; nested: IllegalArgumentException[mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]];
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:580)
at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:559)
at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:211)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:327)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:120)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: mapper [$source.attributes.th.values.display] of different type, current_type [string], merged_type [double]
at org.elasticsearch.index.mapper.FieldMapper.doMerge(FieldMapper.java:378)
at org.elasticsearch.index.mapper.core.StringFieldMapper.doMerge(StringFieldMapper.java:382)
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:364)
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:53)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:60)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:60)
at org.elasticsearch.index.mapper.object.ObjectMapper.doMerge(ObjectMapper.java:528)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:271)
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:328)
at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:124)
... 18 more
Workaround
I added the mapping I mentioned above as part of a default template and was able to get around it. However, I have no idea why it works. I can now store any kind of property in the same field.
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
},
"english": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
],
"properties": {
"$source": {
"properties": {
"attributes": {
"properties": {
"en": {
"properties": {
"values": {
"properties": {
"display": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
},
"value": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
}
},
"th": {
"properties": {
"values": {
"properties": {
"display": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
},
"value": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Any pointer on how to make this dynamic so I don't need to specify it for every language? In the example above it's working for English (en) and Thai (th), but I'm planning to support 40+ languages and I don't want to have to add more mappings for each language.
It matters which of your documents creates the index first! If the first document that gets into the index is a "correct" one, where there is no confusion between the types of a field, then the index gets created with, say, a string field for display. After that you can index the "problematic" documents without issues. Try indexing the documents one by one into a non-existent index (so that it's automatically created) and observe the difference in behavior. I really doubt that your bulk indexing code is inserting one document at a time.
If you send a bulk of documents, you can't tell which one will actually trigger the creation of the index. Some documents will be sent to one shard, some to another, and so on. There will be a mix of messages from the shards to the master node with mapping updates, and the first one that "wins" can be either the "wrong" one or the "correct" one. You need a dynamic mapping template to control this, as you've already tested.
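The race can be sketched as a toy model in plain Python (this is not Elasticsearch code; `infer_type`, `shard_mapping`, and `merge` are hypothetical stand-ins): each shard proposes a mapping derived from the first document it happens to index, and the master rejects merges whose types conflict.

```python
# Toy model (not Elasticsearch code) of the mapping-update race: each shard
# proposes a mapping from the first document it sees, and the master node
# rejects merges whose field types conflict.
def infer_type(value):
    return "double" if isinstance(value, (int, float)) else "string"

def shard_mapping(doc):
    # The mapping a shard would propose from the document it indexed first.
    return {field: infer_type(value) for field, value in doc.items()}

def merge(master, proposal):
    for field, proposed in proposal.items():
        current = master.setdefault(field, proposed)
        if current != proposed:
            raise ValueError("mapper [%s] of different type, current_type "
                             "[%s], merged_type [%s]" % (field, current, proposed))
    return master

master = {}
# One shard happens to see the geo attribute first: display becomes a string.
merge(master, shard_mapping({"values.display": "13.726133,100.5731003"}))
# Another shard sees the number attribute first and proposes double.
try:
    merge(master, shard_mapping({"values.display": 34}))
    conflicted = False
except ValueError:
    conflicted = True
```

Which order the proposals arrive in is what decides whether the bulk succeeds or fails.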
My suggestion is to use a more targeted template that uses wildcards to cover all the languages you have. If the only thing that differs between your languages is the name of one field (th, en, etc.), then use path_match with the template you already have, in a more concise form that doesn't duplicate the same mapping over and over:
PUT /_template/language_control
{
"template": "language_control*",
"mappings": {
"my_type": {
"dynamic_templates": [
{
"display_values": {
"path_match": "$source.attributes.*.values.*",
"mapping": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
If you look at the path_match, it covers $source.attributes.th.values.display and $source.attributes.en.values.display, and also the value fields of the other languages, like $source.attributes.th.values.value and $source.attributes.en.values.value.
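As a rough sanity check, Python's fnmatch can approximate which field paths the wildcard pattern covers (fnmatch is only an approximation of Elasticsearch's path_match semantics, but it illustrates the idea: one pattern covers every language):

```python
from fnmatch import fnmatch

# Approximate the dynamic template's path_match wildcard with fnmatch.
pattern = "$source.attributes.*.values.*"
paths = [
    "$source.attributes.th.values.display",
    "$source.attributes.en.values.display",
    "$source.attributes.th.values.value",
    "$source.attributes.fr.values.value",   # future languages match as well
    "$source.attributes.th.slug",           # outside values: no match
]
matches = [p for p in paths if fnmatch(p, pattern)]
```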
One thing you can do is augment your dynamic templates so that all types are mapped to strings, since you probably won't query on them.
It basically boils down to modifying your template to the one below, which works with the documents you've shown above. Note that I've only added a dynamic mapping for the long type, but you can add similar mappings for other types as well. The bottom line is to make sure the mapper won't find any ambiguities when indexing your documents.
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
},
"english": {
"type": "string",
"analyzer": "english"
}
}
}
}
},
{
"longs": {
"match_mapping_type": "long",
"mapping": {
"type": "string"
}
}
}
]
}
}
}
Also note that I've never been able to index the first document; it always fails for me. As @Andrei rightly said, when indexing documents in bulk you have no guarantee of which document is indexed first into which shard, and thus no guarantee of which field will "win", i.e. be indexed first and set the field type.
Related
I am implementing the Completion Suggester in my application, and here is my requirement:
I want to use the Completion Suggester on only two types of fields, but my index looks like this (because of some other requirements, the fields are not indexed as they come; instead, the data is flattened to match this mapping):
"flatData": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"key_type": {
"type": "keyword"
},
"value_string": {
"type": "text",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
      }
    }
  }
}
So basically, it indexes data in this format:
{
"key": "idBook",
"type": "long",
"key_type": "idBook.long",
"value_string": "67d25bce-39b5-4069-b137-0698286f50a4"
},
{
"key": "bookName",
"type": "string",
"key_type": "bookName.string",
"value_string": "A Song Of Ice And Fire"
},
{
"key": "numPages",
"type": "string",
"key_type": "numPages.string",
"value_string": "8000"
}
In my case, I want to add the completion suggester only when the value of key is bookName or authorName, for example. What I thought to do is add an _all-style field into which I copy the values of these keys, in order to have something like this:
"value_string": {
  "type": "text",
  "copy_to": "my_all",
  "fielddata": true,
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
"my_all": {
  "type": "completion"
}
Is there a way to copy_to the field my_all only if the value of key is bookName, for example?
Tldr;
No, copy_to does not support conditions.
But that does not mean you cannot achieve what you want.
You should look toward an ingest pipeline, which does support conditions.
Solution
You start with this doc
{
"key": "bookName",
"type": "string",
"key_type": "bookName.string",
"value_string": "A Song Of Ice And Fire"
}
Create this pipeline
PUT _ingest/pipeline/my-pipeline
{
"processors": [
{
"set": {
"description": "If 'key' is 'bookName', copy 'value_string' to 'my_all'",
"if": "ctx.key == 'bookName'",
"field": "my_all",
"value": "{{{value_string}}}"
}
}
]
}
You end up with this document
{
"key": "bookName",
"type": "string",
"key_type": "bookName.string",
"value_string": "A Song Of Ice And Fire",
"my_all": "A Song Of Ice And Fire"
}
Just create the mapping accordingly, with the field my_all of type completion, and you should be all set.
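For illustration, here is a plain-Python sketch (not Painless, and `apply_pipeline` is a hypothetical helper) of what the conditional set processor does to each document:

```python
# Plain-Python sketch (not Painless) of the pipeline's conditional `set`
# processor: when the condition holds, copy value_string into my_all.
def apply_pipeline(doc):
    ctx = dict(doc)                            # processors mutate a ctx map
    if ctx.get("key") == "bookName":           # the processor's "if"
        ctx["my_all"] = ctx["value_string"]    # the {{{value_string}}} template
    return ctx

book = apply_pipeline({
    "key": "bookName",
    "type": "string",
    "key_type": "bookName.string",
    "value_string": "A Song Of Ice And Fire",
})
other = apply_pipeline({
    "key": "numPages",
    "type": "string",
    "key_type": "numPages.string",
    "value_string": "8000",
})
```

Only documents whose key matches get the extra my_all field; the rest pass through untouched.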
I'm currently having an issue with being unable to return hits for a particular search term, and it's a bit perplexing to me:
Term: navy flower
The query ends up looking like:
(name: "navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
No hits.
If I change the term to: navy flowers
I get 3 hits with it:
The mappings I currently have set up on the index are as follows:
{
"mappings": {
"_doc": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
I must be missing something obvious for the match not to work on the singular vs. plural form of the word.
As per your index mapping, you have not specified any analyzer, which means Elasticsearch uses the standard analyzer by default. The standard analyzer doesn't do stemming, as it has only two token filters by default:
Lower Case Token Filter
Stop Token Filter (disabled by default)
To support your use case, you need the stemmer token filter in your analyzer. You can create a custom analyzer and configure it on the required field:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text",
"analyzer": "my_analyzer"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
After this, you can search with the query below:
GET test/_search?q=(name:"navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
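The effect can be sketched outside Elasticsearch. The strip-a-trailing-"s" rule below is only a crude illustration, not the real Porter-style stemmer the stemmer filter uses; the point is that without stemming the query token "flower" never equals the indexed token "flowers":

```python
# Crude illustration of why the standard analyzer misses the match:
# it only lowercases, so "flower" != "flowers" at token level.
def standard_tokens(text):
    # Approximation of the standard analyzer: split on whitespace, lowercase.
    return [t.lower() for t in text.split()]

def crude_stem(token):
    # Toy stand-in for a stemmer token filter: strip one trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

indexed = standard_tokens("Navy Flowers")   # tokens stored in the index
query = standard_tokens("navy flower")      # tokens produced at search time

no_match_without_stemming = indexed != query
stemmed_indexed = [crude_stem(t) for t in indexed]
stemmed_query = [crude_stem(t) for t in query]
```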
Hi, I'm trying to create an index in Elasticsearch without defining the mapping, so what I did was this:
PUT my_index1/my_type/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith",
"age" : "1",
"enabled": false
},
{
"first" : "Alice",
"last" : "White",
"age" : "10",
"enabled": true
}
]
}
If I do this, Elasticsearch will create a mapping for the index, and the result is:
{
"my_index1": {
"mappings": {
"my_type": {
"properties": {
"group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
If you look closely, the user property doesn't have a type of nested, while the other properties have their types defined by Elasticsearch. Is there a way to do this automatically? The mapping should look like this for the user property:
"user": {
  "type": "nested",
  "properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
which is missing. I'm currently using NEST.
Is there a way to define a dynamic mapping that detects whether newly added data in the index is nested?
By default, Elasticsearch/Lucene has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.
The above document would be converted internally into a document that looks more like this: (See Nested field type for more details)
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
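That flattening can be sketched with a hypothetical helper (analysis and lowercasing omitted): inner objects collapse into parallel arrays, so the association between first and last names is lost.

```python
# Sketch of the flattening Elasticsearch applies without `nested`: inner
# objects collapse into dotted-path arrays, losing which first name goes
# with which last name.
def flatten(doc):
    flat = {}
    for key, value in doc.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            for obj in value:
                for inner_key, inner_value in obj.items():
                    flat.setdefault(key + "." + inner_key, []).append(inner_value)
        else:
            flat[key] = value
    return flat

flat = flatten({
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
})
# A query for first == Alice AND last == Smith now "matches" the document,
# even though no single user has that combination.
false_positive = "Alice" in flat["user.first"] and "Smith" in flat["user.last"]
```

This cross-object false positive is exactly what the nested type exists to prevent.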
There is no beautiful answer here. A common approach is to use a dynamic template to convert objects to nested (a side effect, however, is that all object-type fields will be changed to nested type):
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match": "*",
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
]
}
}
Another approach is to specify the mapping for the field before inserting data.
PUT <your index>
{
"mappings": {
"properties": {
"user": {
"type": "nested"
}
}
}
}
You can define a dynamic template with your own custom mapping, which will then be used when you index documents into the index.
Here is a step-by-step procedure with which the user field will automatically be mapped to nested type.
First, define a dynamic template for the index as shown below. It has a match parameter that matches field names following the pattern user* and maps them to nested type:
PUT /<index-name>
{
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "user*",
"mapping": {
"type": "nested"
}
}
}
]
}
}
After creating this template, you need to index the documents into it
POST /<index-name>/_doc/1
{
"group": "fans",
"user": [
{
"first": "John",
"last": "Smith",
"age": "1",
"enabled": false
},
{
"first": "Alice",
"last": "White",
"age": "10",
"enabled": true
}
]
}
Now when you look at the mapping of the index, using the Get Mapping API, it is what you'd expect to see:
GET /<index-name>/_mapping?pretty
{
"index-name": {
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "user*",
"mapping": {
"type": "nested"
}
}
}
],
"properties": {
"group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested", // note this
"properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Or, as @Jacky1205 mentioned, if it is not field-specific, you can use the below template that matches all object-type fields and makes them nested:
{
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "*",
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
]
}
}
I'm using the latest version of elasticsearch.js and trying to create a custom path analyzer when indexing and creating the mapping for some posts.
The goal is to create keywords out of each segment of the path. However, as a start, I'm simply trying to get the analyzer working.
Here is the elasticsearch.js create_mapped_index.js, you can see the custom analyzer near the top of the file:
var client = require('./connection.js');
client.indices.create({
index: "wcm-posts",
body: {
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"tokenizer": "wcm_path_tokenizer",
"type": "custom"
}
},
"tokenizer": {
"wcm_path_tokenizer": {
"type": "pattern",
"pattern": "/"
}
}
}
},
"mappings": {
"post": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"titles": {
"type": "object",
"properties": {
"main": { "type": "string" },
"subtitle": { "type": "string" },
"alternate": { "type": "string" },
"concise": { "type": "string" },
"seo": { "type": "string" }
}
},
"tags": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string" }
},
},
"main_taxonomies": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string", "index": "not_analyzed" },
"path": { "type": "string", "index": "wcm_path_analyzer" }
},
},
"categories": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string", "index": "not_analyzed" },
"path": { "type": "string", "index": "wcm_path_analyzer" }
},
},
"content_elements": {
"dynamic": "true",
"type": "nested",
"properties": {
"content": { "type": "string" }
}
}
}
}
}
}
}, function (err, resp, respcode) {
console.log(err, resp, respcode);
});
If the path fields are set to "not_analyzed", or the index property is omitted, then the index creation, mapping, and insertion of posts all work.
As soon as I try to use the custom analyzer on the main_taxonomies and categories path fields, as shown in the JSON above, I get this error:
response: '{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"wrong value for index [wcm_path_analyzer] for field [path]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [post]: wrong value for index [wcm_path_analyzer] for field [path]","caused_by":{"type":"mapper_parsing_exception","reason":"wrong value for index [wcm_path_analyzer] for field [path]"}},"status":400}',
toString: [Function],
toJSON: [Function] } { error:
{ root_cause: [ [Object] ],
type: 'mapper_parsing_exception',
reason: 'Failed to parse mapping [post]: wrong value for index [wcm_path_analyzer] for field [path]',
caused_by:
{ type: 'mapper_parsing_exception',
reason: 'wrong value for index [wcm_path_analyzer] for field [path]' } },
status: 400 } 400
Here is an example of the two objects that need the custom analyzer on the path field. I pulled this example, after inserting 15 posts into the elasticsearch index when not using the custom analyzer:
"main_taxonomies": [
{
"id": "123",
"type": "category",
"name": "News",
"slug": "news",
"path": "/News/"
}
],
"categories": [
{
"id": "157",
"name": "Local News",
"slug": "local-news",
"path": "/News/Local News/",
"main": true
},
Up to this point, I had googled similar questions, and most said people were missing the analyzers in settings or weren't adding the parameters to the body. I believe I've done that correctly.
I have also reviewed the elasticsearch.js documentation and tried to create a:
client.indices.putSettings({})
But for this to be used, the index needs to already exist with the mappings; otherwise it throws a 'no indices found' error.
Not sure where to go from here. Your suggestions are appreciated.
So the final analyzer is:
var client = require('./connection.js');
client.indices.create({
index: "wcm-posts",
body: {
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"type" : "pattern",
"lowercase": true,
"pattern": "/"
}
}
}
},
"mappings": {
"post": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"client_id": { "type": "string", "index": "not_analyzed" },
"license_id": { "type": "string", "index": "not_analyzed" },
"origin_id": { "type": "string" },
...
...
"origin_slug": { "type": "string" },
"main_taxonomies_path": { "type": "string", "analyzer": "wcm_path_analyzer", "search_analyzer": "standard" },
"categories_paths": { "type": "string", "analyzer": "wcm_path_analyzer", "search_analyzer": "standard" },
"search_tags": { "type": "string" },
// See the custom analyzer set here --------------------------^
I did determine that, at least for the path or pattern analyzers, complex nested objects cannot be used. Flattened fields set to "type": "string" were the only way to get this to work.
I ended up not needing a custom tokenizer as the pattern analyzer is full featured and already includes a tokenizer.
I chose the pattern analyzer because it breaks on the pattern, leaving individual terms, whereas the path hierarchy tokenizer segments the path in different ways but does not create individual terms (I hope I'm correct in saying this; I base it on the documentation).
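That distinction can be roughly sketched with plain regex splitting (the real tokenizers also apply lowercasing and other filters, and `pattern_tokens`/`path_hierarchy_tokens` are hypothetical helpers): a pattern tokenizer splits on "/" into individual terms, while a path_hierarchy-style tokenizer emits cumulative prefixes of the path.

```python
import re

# Rough sketch: pattern tokenizer -> individual segments;
# path_hierarchy-style tokenizer -> growing prefixes of the path.
def pattern_tokens(path, pattern="/"):
    return [t for t in re.split(pattern, path) if t]

def path_hierarchy_tokens(path, delimiter="/"):
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter + delimiter.join(parts[:i + 1]) for i in range(len(parts))]

individual_terms = pattern_tokens("/News/Local News/")
prefix_terms = path_hierarchy_tokens("/News/Local News/")
```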
Hope this helps someone else!
Steve
So I got it working... I think the JSON objects were too complex, or it was the change of adding the analyzer to the field mappings that did the trick.
First I flattened the data out to:
"main_taxonomies_path": "/News/",
"categories_paths": [ "/News/Local/", "/Business/Local/" ],
"search_tags": [ "montreal-3","laval-4" ],
Then I updated the analyzer to:
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"tokenizer": "wcm_path_tokenizer",
"type": "custom"
}
},
"tokenizer": {
"wcm_path_tokenizer": {
"type": "pattern",
"pattern": "/",
"replacement": ","
}
}
}
},
Notice that the analyzer 'type' is set to custom.
Then when mapping theses flattened fields:
"main_taxonomies_path": { "type": "string", "analyzer": "wcm_path_analyzer" },
"categories_paths": { "type": "string", "analyzer": "wcm_path_analyzer" },
"search_tags": { "type": "string" },
which when searching yields for these fields:
"main_taxonomies_path": "/News/",
"categories_paths": [ "/News/Local News/", "/Business/Local Business/" ],
"search_tags": [ "montreal-2", "laval-3" ],
So the custom analyzer does what it was set to do in this situation.
I'm not sure if I could apply type object to the main_taxonomies_path and categories_paths, so I will play around with this and see.
I will be refining the pattern searches to format the results differently but happy to have this working.
For completeness I will put my final custom pattern analyzer, mapping and results, once I've completed this.
Regards,
Steve
I have an index where the mappings vary drastically. Consider, for example, that I'm indexing the Wikipedia infobox data of every other article. The data in an infobox is neither structured nor uniform, so it can take forms like:
Data1:
{
  "title": "Sachin",
  "Age": 41,
  "Occupation": "Cricketer"
}
Data2:
{
  "title": "India",
  "Population": "23456987654",
  "GDP": "23",
  "NationalAnthem": "Jan Gan Man"
}
Since all the fields are different and I want to apply a completion field on the relevant ones, I'm thinking of applying analyzers on all the fields.
How can I apply analyzers on every field by default while indexing?
You need a _default_ template for your index, so that whenever new fields are added to it, those string fields will take the mapping from the _default_ template:
{
"template": "infobox*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "my_completion_analyzer",
"fielddata": {
"format": "disabled"
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
Or if your index is not a daily/weekly one, you can just create it once with the _default_ mapping defined:
PUT /infobox
{
"mappings": {
"_default_": {
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "my_completion_analyzer",
"fielddata": {
"format": "disabled"
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}