How to get a custom analyzer's source from Elasticsearch?

I made a _mapping request to Elasticsearch and saw that one field uses a custom analyzer. The output for that field looks like this:
"myFieldName": {
"type": "string",
"analyzer": "someCustomAnalyzer"
}
Is there a way to get the definition of someCustomAnalyzer? I tried the request curl -XGET localhost:9200/_analyze?analyzer=someCustomAnalyzer
and got:
{
"error": "ElasticsearchIllegalArgumentException[text is missing]",
"status": 400
}
If I add a text argument to the query string, I get the analysis result, but I need the analyzer definition.

You can see it in the index settings. The output is more readable in 1.5 than it used to be.
For example, if I create an index with a non-trivial analyzer:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edge_ngram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"_all": {
"enabled": true,
"index_analyzer": "edge_ngram_analyzer",
"search_analyzer": "standard"
},
"properties": {
"first_name": {
"type": "string",
"include_in_all": true
},
"last_name": {
"type": "string",
"include_in_all": true
},
"ssn": {
"type": "string",
"index": "not_analyzed",
"include_in_all": true
}
}
}
}
}
I can get the index settings with:
GET /test_index/_settings
...
{
"test_index": {
"settings": {
"index": {
"creation_date": "1430394627755",
"uuid": "78oYlYU9RS6LZ5YFyeaMRQ",
"analysis": {
"filter": {
"edge_ngram_filter": {
"min_gram": "2",
"type": "edge_ngram",
"max_gram": "20"
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"edge_ngram_filter"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"number_of_shards": "1",
"version": {
"created": "1050099"
}
}
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/4a38bdb0cb7d381caa29b9ce2c3c154b63cdc1f8
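If you need the definition programmatically, it sits under <index>.settings.index.analysis.analyzer in the _settings response. The minimal Python sketch below walks the parsed response (a plain dict, trimmed from the output above) and returns one analyzer together with the custom filters it references. get_analyzer_definition is a hypothetical helper name; it only reads the JSON shape shown above and does not talk to Elasticsearch.

```python
import json

def get_analyzer_definition(settings_response, index_name, analyzer_name):
    """Return an analyzer definition plus the custom filter definitions it uses.

    settings_response is the parsed JSON body of GET /<index>/_settings.
    Built-in filters such as "lowercase" have no entry under analysis.filter,
    so they are skipped.
    """
    analysis = settings_response[index_name]["settings"]["index"].get("analysis", {})
    analyzer = analysis.get("analyzer", {}).get(analyzer_name)
    if analyzer is None:
        raise KeyError(f"analyzer {analyzer_name!r} is not defined on {index_name!r}")
    custom_filters = analysis.get("filter", {})
    filter_names = analyzer.get("filter", [])
    # A single filter may be given as a plain string instead of a list
    if isinstance(filter_names, str):
        filter_names = [filter_names]
    used = {name: custom_filters[name] for name in filter_names if name in custom_filters}
    return {"analyzer": analyzer, "filters": used}

# Trimmed copy of the GET /test_index/_settings response shown above
response = json.loads("""
{"test_index": {"settings": {"index": {
  "analysis": {
    "filter": {"edge_ngram_filter": {"min_gram": "2", "type": "edge_ngram", "max_gram": "20"}},
    "analyzer": {"edge_ngram_analyzer": {"type": "custom",
      "filter": ["lowercase", "edge_ngram_filter"], "tokenizer": "standard"}}
  },
  "number_of_shards": "1"}}}}
""")

definition = get_analyzer_definition(response, "test_index", "edge_ngram_analyzer")
print(json.dumps(definition, indent=2))
```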

Related

How to add filters and mappings to an Elasticsearch schema while creating the index?

I want to use filters such as synonyms and stopwords along with mapping types in an Elasticsearch schema when creating the index. Below is the JSON I am using. When I use it, I get the mappings, but the filters are lost. What could be the reason? (I am using Elasticsearch 6.2.)
nlp_settings = {
"settings": {
"index" : {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["synonym", "stop_words", "lowercase",
"stop_words_user", "synonym_user"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
},
"stop_words_user": {
"type": "stop",
"stopwords": "_none_"
},
"synonym_user": {
"type": "synonym",
"synonyms": default_synonym
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"section":{"type": "text"},
"document_name": {"type": "text"},
"dir_path_info": {"type": "text"},
"nlu_raw": {
"noun_list": {"type": "nested"},
"verb_list": {"type": "nested"},
},
"nlu": {
"noun": {"type": "nested"},
"verb": {"type": "nested"}
}
}
}
}
}
When I use the mappings along with the filters, a GET request to http://localhost:9233/test/_settings returns the following JSON:
{
"test": {
"settings": {
"index": {
"creation_date": "1523962921677",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "FevdHGZjQm6ke2FgeNdnMQ",
"version": {
"created": "6020199"
},
"provided_name": "test"
}
}
}
}
However, what I actually want is:
{
"test": {
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "test",
"creation_date": "1523963029203",
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"synonym_user": {
"type": "synonym",
"synonyms": [
"a, a"
]
},
"stop_words_user": {
"type": "stop",
"stopwords": [
"please",
"help"
]
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym",
"stop_words",
"lowercase",
"stop_words_user",
"synonym_user"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "CiBBgngdR_aNHkY1m0EtXw",
"version": {
"created": "6020199"
}
}
}
}
}
I get this when I remove the mappings from the schema.
settings and mappings should be on the same level, and the sub-fields of object fields such as nlu_raw and nlu must be wrapped in "properties". So:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": [
"synonym",
"stop_words",
"lowercase",
"stop_words_user",
"synonym_user"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
},
"stop_words_user": {
"type": "stop",
"stopwords": "_none_"
},
"synonym_user": {
"type": "synonym",
"synonyms": "default_synonym"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"section": {
"type": "text"
},
"document_name": {
"type": "text"
},
"dir_path_info": {
"type": "text"
},
"nlu_raw": {
"properties": {
"noun_list": {
"type": "nested"
},
"verb_list": {
"type": "nested"
}
}
},
"nlu": {
"properties": {
"noun": {
"type": "nested"
},
"verb": {
"type": "nested"
}
}
}
}
}
}
}
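Comparing the two bodies, the corrected one also wraps the sub-fields of nlu_raw and nlu in "properties". In the question those two entries have neither a "type" nor "properties". Elasticsearch most likely rejects them as unsupported mapping parameters, so the whole create-index request would fail. If the index were then created by some other request, it would get default settings, which is consistent with the 5-shard output above. The Python sketch below flags such entries before the body is sent. find_untyped_objects is a hypothetical helper, and it checks only this one mistake.

```python
def find_untyped_objects(properties, path=""):
    """Return dotted paths of fields that have neither "type" nor "properties".

    Such entries usually mean sub-fields were written directly under an
    object field instead of inside a "properties" block.
    """
    problems = []
    for name, definition in properties.items():
        full_name = f"{path}{name}"
        if "properties" in definition:
            # Object or nested field: recurse into its sub-fields
            problems.extend(find_untyped_objects(definition["properties"], full_name + "."))
        elif "type" not in definition:
            problems.append(full_name)
    return problems

# "properties" from the question's mapping
question_properties = {
    "section": {"type": "text"},
    "document_name": {"type": "text"},
    "dir_path_info": {"type": "text"},
    "nlu_raw": {"noun_list": {"type": "nested"}, "verb_list": {"type": "nested"}},
    "nlu": {"noun": {"type": "nested"}, "verb": {"type": "nested"}},
}

# "properties" from the corrected mapping in the answer
answer_properties = {
    "section": {"type": "text"},
    "document_name": {"type": "text"},
    "dir_path_info": {"type": "text"},
    "nlu_raw": {"properties": {"noun_list": {"type": "nested"}, "verb_list": {"type": "nested"}}},
    "nlu": {"properties": {"noun": {"type": "nested"}, "verb": {"type": "nested"}}},
}

print(find_untyped_objects(question_properties))  # ['nlu_raw', 'nlu']
print(find_untyped_objects(answer_properties))    # []
```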

Elasticsearch: Run multiple analyzers on the same data

I am looking for a way to make ES search the data with multiple analyzers:
an NGram analyzer and one or a few language analyzers.
One possible solution is to use multi-fields and explicitly declare which analyzer to use for each sub-field.
For example, to set the following mappings:
"mappings": {
"my_entity": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"ngram": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"spanish": {
"type": "string",
"analyzer": "spanish"
},
"english": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
The problem with that is that I have to explicitly write every field and its analyzer into the search query.
It also doesn't allow searching "_all" with multiple analyzers.
Is there a way to make an "_all" query use multiple analyzers?
Something like "_all.ngram" and "_all.spanish", without using copy_to to duplicate the data?
Is it possible to combine the ngram analyzer with a Spanish (or any other language) analyzer into a single custom analyzer?
I tested the following settings, but they did not work:
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
}
},
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": ["ejemplo"]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"fields": {
"analyzer1": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"analyzer2": {
"type": "string",
"analyzer": "spanish"
},
"analyzer3": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "_all",
"text": "Hola, me llamo Juan."
}
returns just ngram results, without Spanish analysis, whereas
GET /ngrams_index/_analyze
{
"field": "my_field.analyzer2",
"text": "Hola, me llamo Juan."
}
properly analyzes the search string.
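Is it possible to build a custom analyzer which combines Spanish and ngram?
Yes. Use a word tokenizer such as standard and put the ngram step last, as a token filter after the Spanish stop, keyword-marker and stemmer filters, so the language filters still see whole words: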
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": [
"ejemplo"
]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer",
"ngram_filter"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "my_field",
"text": "Hola, me llamo Juan."
}
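Why the order matters: an analyzer runs its tokenizer first and then each token filter in the order listed. In the question, the nGram tokenizer had already cut the text into 3-character grams, so the Spanish stop and stemmer filters only saw fragments. The toy Python model below is not Elasticsearch code; it contrasts the two orders. Its regex tokenizer, its two-word stopword set and the missing stemmer are simplifying assumptions. The real _spanish_ stopword list and the light_spanish stemmer do more.

```python
import re

SPANISH_STOP = {"me", "la"}  # tiny illustrative subset, not the real _spanish_ list

def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: runs of word characters
    return re.findall(r"\w+", text)

def ngrams(token, n=3):
    # Fixed-size grams, like min_gram = max_gram = 3
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def stop(tokens):
    return [t for t in tokens if t not in SPANISH_STOP]

def tokenizer_first(text):
    # Question's setup: nGram tokenizer on the raw text, then language filters
    return stop(lowercase(ngrams(text)))

def filter_last(text):
    # Answer's setup: standard tokenizer, language filters, then the nGram filter
    tokens = stop(lowercase(standard_tokenize(text)))
    return [g for t in tokens for g in ngrams(t)]

text = "Hola, me llamo Juan."
print(tokenizer_first(text))  # 18 grams such as ' me'; no gram equals 'me', so nothing is removed
print(filter_last(text))      # ['hol', 'ola', 'lla', 'lam', 'amo', 'jua', 'uan']
```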

Not able to search a phrase in Elasticsearch 5.4

I am searching for a phrase in an email body and need exact filtering. For example, if I search for 'Avenue New', it should return only results that contain the phrase 'Avenue New', not 'Avenue Street', 'Park Avenue', etc.
My mapping is:
{
"exchangemailssql": {
"aliases": {},
"mappings": {
"email": {
"dynamic_templates": [
{
"_default": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"doc_values": true,
"type": "keyword"
}
}
}
],
"properties": {
"attachments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"body": {
"type": "text",
"analyzer": "keylower",
"fielddata": true
},
"count": {
"type": "short"
},
"emailId": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "3s",
"number_of_shards": "1",
"provided_name": "exchangemailssql",
"creation_date": "1500527793230",
"analysis": {
"filter": {
"nGram": {
"min_gram": "4",
"side": "front",
"type": "edge_ngram",
"max_gram": "100"
}
},
"analyzer": {
"keylower": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
},
"email": {
"filter": [
"lowercase",
"unique",
"nGram"
],
"type": "custom",
"tokenizer": "uax_url_email"
},
"full": {
"filter": [
"lowercase",
"snowball",
"nGram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "0",
"uuid": "2XTpHmwaQF65PNkCQCmcVQ",
"version": {
"created": "5040099"
}
}
}
}
}
My search query is:
{
"query": {
"match_phrase": {
"body": "Avenue New"
}
},
"highlight": {
"fields" : {
"body" : {}
}
}
}
The problem is that your keylower analyzer tokenizes the full body with the keyword tokenizer, so the whole body becomes one big lowercase token and you cannot search inside it.
If you change the analyzer of your body field from keylower to standard, the match_phrase query will find what you need. You will have to reindex, because the analyzer of an existing field cannot be changed in place.
"body": {
"type": "text",
"analyzer": "standard", <---change this
"fielddata": true
},
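To see why, here is a toy Python model of match_phrase. Both the stored text and the query go through the same analyzer, and the query tokens must appear at consecutive positions. The two analyzer functions are simplified stand-ins (assumptions), not the real keyword and standard analyzers, and the sample email texts are made up for illustration.

```python
import re

def keylower(text):
    # keyword tokenizer + lowercase filter: the whole input is one token
    return [text.lower()]

def standard_like(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens
    return re.findall(r"\w+", text.lower())

def match_phrase(body, phrase, analyzer):
    doc = analyzer(body)
    query = analyzer(phrase)
    n = len(query)
    # The phrase matches only if the query tokens occur consecutively in the document
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

email_1 = "The office moved to Avenue New York last week."
email_2 = "We walked down Park Avenue."

print(match_phrase(email_1, "Avenue New", keylower))       # False: one token for the whole body
print(match_phrase(email_1, "Avenue New", standard_like))  # True
print(match_phrase(email_2, "Avenue New", standard_like))  # False
```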

Elasticsearch edge_ngram issue?

I have edge_ngram configured for a field.
Suppose the word indexed with edge_ngram is: quick
and it is analyzed as: q, qu, qui, quic, quick
When I search for quickfull, documents containing quick also come back in the results.
I want only documents containing quickfull to be returned; otherwise there should be no results.
This is my mapping:
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
Any help would be appreciated.
It's because your name field has "analyzer": "autocomplete" and no search_analyzer, so the autocomplete analyzer is also applied at search time. With min_gram set to 3, the search term quickfull is tokenized into qui, quic, quick, quickf, quickfu, quickful and quickfull, and those grams match the grams indexed for quick.
To prevent this, set "search_analyzer": "standard" on the name field so that a different analyzer is used at search time than at index time:
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},
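Here is a small Python model of this behaviour, using the autocomplete_filter settings from the question (min_gram 3, max_gram 100). It is a simplification: a document counts as matching when any search token equals any indexed token, which roughly mirrors a match query's default OR operator. The helper names are hypothetical.

```python
def edge_ngrams(word, min_gram=3, max_gram=100):
    # Prefixes of the lowercased word, from min_gram up to max_gram characters
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def autocomplete(text):
    # standard-like split + lowercase + edge_ngram filter
    return [g for w in text.split() for g in edge_ngrams(w)]

def standard(text):
    # Rough stand-in for the standard analyzer
    return text.lower().split()

def matches(indexed_text, query, search_analyzer):
    indexed_tokens = set(autocomplete(indexed_text))  # index-time analyzer
    return any(t in indexed_tokens for t in search_analyzer(query))

print(edge_ngrams("quick"))                          # ['qui', 'quic', 'quick']
print(matches("quick", "quickfull", autocomplete))   # True: 'qui' etc. overlap
print(matches("quick", "quickfull", standard))       # False with search_analyzer standard
print(matches("quick", "quic", standard))            # True: prefix search still works
```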

Elasticsearch dynamic templating

I'm trying to get ES's dynamic templating to work on my local machine (I'm using ES v1.4.1), and for some reason the "mappings" are not being included. I first create the index with a simple
PUT /bigtestindex (I'm using the Sense plugin, not curl),
then I follow that with
PUT /_template/bigtestindex_1
{
"template": "big*",
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"autocomplete_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
},
"mappings": {
"doc": {
"properties": {
"anchor": {
"type": "string"
},
"boost": {
"type": "string"
},
"content": {
"type": "string",
"analyzer": "whitespace_analyzer"
},
"digest": {
"type": "string"
},
"host": {
"type": "string"
},
"id": {
"type": "string"
},
"metatag.description": {
"type": "string",
"analyzer": "standard"
},
"metatag.keywords": {
"type": "string",
"analyzer": "standard"
},
"segment": {
"type": "string"
},
"title": {
"type": "string",
"index": "not_analyzed",
"fields": {
"autocomplete": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "whitespace_analyzer"
}
}
},
"tstamp": {
"type": "date",
"format": "dateOptionalTime"
},
"url": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I'm not receiving any errors and the syntax looks correct, but when I run something like
GET /bigtestindex/_mappings
in Sense, I get
{
"bigtestindex": {
"mappings": {}
}
}
You need to create the template first and then create the index. From the Elasticsearch documentation:
Templates are only applied at index creation time. Changing a template will have no impact on existing indices.
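Here is a toy Python model of that rule; it is not the Elasticsearch implementation. Templates are matched against the index name only when an index is created, so an index that already exists keeps the empty mappings it was created with. The class and method names are hypothetical, fnmatch stands in for the template's wildcard matching, and the template body here keeps "mappings" at the top level.

```python
import copy
from fnmatch import fnmatch

class ToyCluster:
    def __init__(self):
        self.templates = {}
        self.indices = {}

    def put_template(self, name, body):
        # Storing a template does not touch indices that already exist
        self.templates[name] = body

    def create_index(self, index):
        # Matching templates are applied only here, at creation time
        mappings = {}
        for body in self.templates.values():
            if fnmatch(index, body["template"]):
                mappings.update(copy.deepcopy(body.get("mappings", {})))
        self.indices[index] = {"mappings": mappings}

    def get_mappings(self, index):
        return {index: self.indices[index]}

template = {"template": "big*",
            "mappings": {"doc": {"properties": {"title": {"type": "string"}}}}}

# Order used in the question: index first, template second
wrong = ToyCluster()
wrong.create_index("bigtestindex")
wrong.put_template("bigtestindex_1", template)
print(wrong.get_mappings("bigtestindex"))  # {'bigtestindex': {'mappings': {}}}

# Template first, then the index
right = ToyCluster()
right.put_template("bigtestindex_1", template)
right.create_index("bigtestindex")
print(right.get_mappings("bigtestindex"))
```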
It seems my Sense command was a bit off. It should have been
PUT /bigtestindex/_template/bigtesttemplate_1 (creates the index and the template in one command)
OR
PUT /_template/bigtesttemplate_1 (creates just the template). Thanks to @avr for pointing out my incorrect command (I needed some fresh eyes)
instead of
PUT /bigtestindex/_template/bigtesttemplate_1
I discovered this after trying several things; hope this helps someone else.
UPDATE
As @avr stated, you do need to create the template first and then the index; you can also create the index and the template in the same PUT statement.
It all comes down to making sure your JSON is set up properly for the right API endpoint. "mappings" should be separate from "settings", i.e.
{
"settings": {
...
},
"mappings": {
...
}
}
NOT
{
"settings": {
...
"mappings": {
}
}
}
"mappings" should NOT be included in the `"settings"` - needs to be separate.
Hope this helps anyone else having the same problem.
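If a body was generated with "mappings" nested inside "settings", as in the template from the question, a small Python helper can move the key back to the top level before the body is sent. This is a sketch: hoist_mappings is a hypothetical name, and it handles only a "mappings" key that sits directly under "settings".

```python
import copy

def hoist_mappings(body):
    """Return a copy of body with settings.mappings moved to the top level."""
    fixed = copy.deepcopy(body)  # leave the caller's dict untouched
    settings = fixed.get("settings", {})
    if "mappings" in settings:
        if "mappings" in fixed:
            raise ValueError("mappings defined both inside and outside settings")
        fixed["mappings"] = settings.pop("mappings")
    return fixed

# Shape of the question's template, trimmed
broken = {
    "template": "big*",
    "settings": {
        "index": {"number_of_shards": 1, "number_of_replicas": 1},
        "analysis": {"analyzer": {}},
        "mappings": {"doc": {"properties": {"url": {"type": "string", "index": "not_analyzed"}}}},
    },
}

fixed = hoist_mappings(broken)
print(sorted(fixed))              # ['mappings', 'settings', 'template']
print(sorted(fixed["settings"]))  # ['analysis', 'index']
```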
