I'm trying to get ES's (using I'm using ES v1.4.1) dynamic templating to work on my local machine and for some reason the "mappings" are not being included? I first create the index with a simple
PUT /bigtestindex (I'm using Sense plugin, not curl),
then I follow that with
PUT /_template/bigtestindex_1
{
"template": "big*",
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"autocomplete_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
},
"mappings": {
"doc": {
"properties": {
"anchor": {
"type": "string"
},
"boost": {
"type": "string"
},
"content": {
"type": "string",
"analyzer": "whitespace_analyzer"
},
"digest": {
"type": "string"
},
"host": {
"type": "string"
},
"id": {
"type": "string"
},
"metatag.description": {
"type": "string",
"analyzer": "standard"
},
"metatag.keywords": {
"type": "string",
"analyzer": "standard"
},
"segment": {
"type": "string"
},
"title": {
"type": "string",
"index": "not_analyzed",
"fields": {
"autocomplete": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "whitespace_analyzer"
}
}
},
"tstamp": {
"type": "date",
"format": "dateOptionalTime"
},
"url": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I'm not receiving any errors and the syntax looks to be correct but when I do something like
GET /bigtestindex/_mappings
in Sense, I get
{
"bigtestindex": {
"mappings": {}
}
}
First you need to create template then create index. You can find the same from elasticsearch documentation.
Templates are only applied at index creation time. Changing a template will have no impact on existing indices.
It seems my Sense command was a bit off, should have been
PUT /bigtestindex/_template/bigtesttemplate_1 (creates index and template in one command
OR
PUT /_template/bigtesttemplate_1 (creates just template) thanks to #avr for pointing out my incorrect command (needed some fresh eyes)
instead of
PUT /bigtestindex/_template/bigtesttemplate_1
discovered this after trying several things, hth someone else
UPDATE
As #avr stated, you do need to create the template first and then the index, you can create the index and the template in the same PUT statement as well.
It has everything to do with making sure your JSON is setup properly to match the right API endpoints. "mappings" should be separate from settings i.e.
{
"settings" {
...
},
"mappings" {
...
}
}
NOT
{
"settings" {
...
"mappings" {
}
}
"mappings" should NOT be included in the `"settings"` - needs to be separate.
hth, anyone else having the same problem
Related
I am looking for a way to make ES search the data with multiple analyzers.
NGram analyzer and one or few language analyzers.
Possible solution will be to use multi-fields and explicitly declare which analyzer to use for each field.
For example, to set the following mappings:
"mappings": {
"my_entity": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"ngram": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"spanish": {
"type": "string",
"analyzer": "spanish"
},
"english": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
The problem with that is that I have explicitly write every field and its analyzers to a search query.
And it will not allow to search with "_all" and use multiple analyzers.
Is there a way to make "_all" query use multiple analyzers?
Something like "_all.ngram", "_all.spanish" and without using copy_to do duplicate the data?
Is it possible to combine ngram analyzer with a spanish (or any other foreign language) and make a single custom analyzer?
I have tested the following settings but these did not work:
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
}
},
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": ["ejemplo"]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"fields": {
"analyzer1": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"analyzer2": {
"type": "string",
"analyzer": "spanish"
},
"analyzer3": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "_all",
"text": "Hola, me llamo Juan."
}
returns: just ngram results, without Spanish analysis
where
GET /ngrams_index/_analyze
{
"field": "my_field.analyzer2",
"text": "Hola, me llamo Juan."
}
properly analyzes the search string.
Is it possible to build a custom analyzer which combine Spanish and ngram?
There is a way to create a custom ngram+language analyzer:
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": [
"ejemplo"
]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer",
"ngram_filter"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "my_field",
"text": "Hola, me llamo Juan."
}
If the query was Brid I want to get <em>Brid</em>gitte in highlighted fields, not the whole word <em>Bridgitte</em>
My index looks like this (I've added ngram analyzer as was suggested here Highlighting part of word in elasticsearch)
{
"myindex": {
"aliases": {},
"mappings": {
"mytype": {
"properties": {
"myarrayproperty": {
"properties": {
"mystringproperty1": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
},
"mystringproperty2": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
}
},
"mylongproperty": {
"type": "long"
},
"mydateproperty": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"mystringproperty3": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
},
"mystringproperty4": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
}
}
}
},
"settings": {
"index": {
"creation_date": "1498030893611",
"analysis": {
"analyzer": {
"search_term_analyzer": {
"filter": "lowercase",
"type": "custom",
"tokenizer": "ngram_tokenizer"
},
"index_ngram_analyzer": {
"filter": ["lowercase"],
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"ngram_tokenizer": {
"token_chars": ["letter", "digit"],
"min_gram": "1",
"type": "nGram",
"max_gram": "15"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "e5kBX-XRTKOqeAScO1Fs0w",
"version": {
"created": "2040499"
}
}
},
"warmers": {}
}
}
}
This is embedded Elasticsearch instance, not sure if it's relevant.
My query looks like this
MatchQueryBuilder queryBuilder = matchPhrasePrefixQuery("_all", query).maxExpansions(50);
final SearchResponse response = client.prepareSearch("myindex")
.setQuery(queryBuilder)
.addHighlightedField("mystringproperty3", 0, 0)
.addHighlightedField("mystringproperty4", 0, 0)
.addHighlightedField("myarrayproperty.mystringproperty1", 0, 0)
.setHighlighterRequireFieldMatch(false)
.execute().actionGet();
And it doesn't work. I've tried to change query to queryStringQuery but it seems like it doesn't support search by part of the word. Any suggestions?
It's not possible. Elastic search does the indexing of a word. From tokenization perspective, you can not do much here.
You may need to write the wrapper over a search result. (Not elastic search specific)
I use ElasticSearch-2.3.5. I want to add my custom analyzer to mapping while index creating.
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
I take a template example from official guide.
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "string", "index" : "not_analyzed" }
}
}
}
}
But I get an error trying to execute the first part of code. There is my error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [book]: analyzer [search_term_analyzer] not found for field [Title]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
},
"status": 400
}
I can do it if I put my mappings inside of settings, but I think that it is wrong way. So I try to find my book by using a part of title. I have the "King Arthur" book for example. My query looks like this:
POST /library/book/_search
{
"query": {
"match": {
"Title": "kin"
}
}
}
Nothing will be found. What I do wrong? Could you help me? It seems my analyzer and tokenizer don't work. How can I get the terms "k", "i", "ki", "king" etc.? Because I think that I have only two terms right now. There are 'king' and 'arthur'.
You have misplaced the search_term_analyzer analyzer, it should be inside the analyzer section
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long", <---- you probably need to make this a string or remove the analyzers
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
Also make sure to use analyzer instead of index_analyzer, the latter as been deprecated in ES 2.x
i have edge_ngram configured for a filed.
suppose the word is indexed in edge_ngram is : quick
and its analyzing as : q,qu,qui,quic,quick
when i am tring to search quickfull the words contaning quick is also coming in results.
i want words only containing quickfull comes else it gives no results.
this is my mapping :
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
any help would be appreciated
It's because your name field has "analyzer": "autocomplete", which means that the autocomplete analyzer will also be applied at search time, hence the search term quickfull will be tokenized to q, qu, qui, quic, quick, quickf, quickfu, quickful and quickfull and that matches quick as well.
In order to prevent this, you need to set "search_analyzer": "standard" on the name field to override the index-time analyzer.
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},
I made a _mapping request to elasticsearch and see that for one field custom analyzer is used. The output for field like that:
"myFieldName": {
"type": "string",
"analyzer": "someCustomAnalyzer"
}
So is there are a way to get source for that someCustomAnalyzer? I have tried request curl -XGET localhost:9200/_analyze?analyzer=someCustomAnalyzer
and got:
{
"error": "ElasticsearchIllegalArgumentException[text is missing]",
"status": 400
}
If I add text argument for query string I got analyzing result for analyzing, but I need analyzer definition.
You can see it with settings. It's more readable now in 1.5 than it used to be.
So if I create an index with a non-trivial analyzer:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edge_ngram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"_all": {
"enabled": true,
"index_analyzer": "edge_ngram_analyzer",
"search_analyzer": "standard"
},
"properties": {
"first_name": {
"type": "string",
"include_in_all": true
},
"last_name": {
"type": "string",
"include_in_all": true
},
"ssn": {
"type": "string",
"index": "not_analyzed",
"include_in_all": true
}
}
}
}
}
I can get the index settings with:
GET /test_index/_settings
...
{
"test_index": {
"settings": {
"index": {
"creation_date": "1430394627755",
"uuid": "78oYlYU9RS6LZ5YFyeaMRQ",
"analysis": {
"filter": {
"edge_ngram_filter": {
"min_gram": "2",
"type": "edge_ngram",
"max_gram": "20"
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"edge_ngram_filter"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"number_of_shards": "1",
"version": {
"created": "1050099"
}
}
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/4a38bdb0cb7d381caa29b9ce2c3c154b63cdc1f8