How to get the definition of a search analyzer of an index in elasticsearch - elasticsearch

The mapping of my Elasticsearch index has a custom analyzer attached to it. How can I read the definition of that custom analyzer?
http://localhost:9200/test_namespace/test_namespace/_mapping
"matchingCriteria": {
"type": "string",
"analyzer": "custom_analyzer",
"include_in_all": false
}
My search is not working with the analyzer, which is why I need to know exactly what this analyzer is doing.
The docs explain how to modify an analyzer or attach a new analyzer to an existing index, but I didn't find a way to see what an existing analyzer does.

Use the _settings API:
curl -XGET 'http://localhost:9200/test_namespace/_settings?pretty=true'
It should generate a response similar to:
{
  "test_namespace" : {
    "settings" : {
      "index" : {
        "creation_date" : "1418990814430",
        "routing" : {
          "allocation" : {
            "disable_allocation" : "false"
          }
        },
        "uuid" : "FmX9NrSNSTO2bQM5pd-iQQ",
        "number_of_replicas" : "2",
        "analysis" : {
          "analyzer" : {
            "edi_analyzer" : {
              "type" : "custom",
              "char_filter" : [ "my_pattern" ],
              "filter" : [ "lowercase", "length" ],
              "tokenizer" : "whitespace"
            },
            "xml_analyzer" : {
              "type" : "custom",
              "char_filter" : [ "html_strip" ],
              "filter" : [ "lowercase", "length" ],
              "tokenizer" : "whitespace"
            },
            ...
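
Once you can see its definition, you can also test what the analyzer actually does to a given string with the _analyze API. A minimal sketch for a 1.x-era index like this one; the sample text is just a placeholder:
curl -XGET 'http://localhost:9200/test_namespace/_analyze?analyzer=custom_analyzer&pretty' -d 'some sample text'
The response lists the tokens the analyzer emits, which is usually the quickest way to see why a search does or does not match. (Newer Elasticsearch versions take a JSON body with "analyzer" and "text" fields instead of query parameters.)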

Related

Autocompletion with whitespace tokenizer in elasticsearch. Tokenize whitespaces correctly

I have an Elasticsearch index I want to do autocompletion with.
Therefore I have a suggestField of type completion into which I put the text that should be autocompleted.
"suggestField" : {
"type" : "completion",
"analyzer" : "IndexAnalyzer",
"search_analyzer" : "SearchAnalyzer",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50
},
With these analyzers:
"IndexAnalyzer" : {
"filter" : [
"lowercase",
"stop",
"stopGerman",
"EdgeNGramFilter"
],
"type" : "custom",
"tokenizer" : "MyTokenizer"
},
"SearchAnalyzer" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "MyTokenizer"
},
Filters and Tokenizer:
"filter" : {
"EdgeNGramFilter" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "50"
},
"stopGerman" : {
"type" : "stop",
"stopwords" : "_german_"
}
},
"tokenizer" : {
"MyTokenizer" : {
"type" : "whitespace"
}
}
My problem now is that if I query that field, the autocompletion only works if I start at the beginning of the text, not for every word.
E.g. I have one value in my suggest field that looks like: "123-456-789 thisisatest"
If I search my suggest field for 123- I get that value as a result.
But if I search for thisis I do not get a result.
This is my query:
POST myindex/_search?typed_keys=true
{
  "suggest": {
    "completion-term": {
      "completion" : {
        "field" : "suggestField"
      },
      "prefix" : "thisis"
    }
  }
}
The question: how do I have to change the above setup to get the given value as a result when I search for thisis?
FYI: if I use the IndexAnalyzer in Kibana with an _analyze query for 123-456-789 thisisatest, I get the (from my point of view correct) tokens:
1
12
123
123-
123-4
123-45
123-456
123-456-7
123-456-78
123-456-789
t
th
thi
this
thisi
thisis
thisisa
thisisat
thisisate
thisisates
thisisatest
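
For reference, the _analyze request behind a token list like that presumably looks something like this (index and analyzer names taken from the mapping above):
POST myindex/_analyze
{
  "analyzer": "IndexAnalyzer",
  "text": "123-456-789 thisisatest"
}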

How to highlight ngram tokens in a word using Elasticsearch

I would like to highlight just the ngrams which match, not the whole word.
Example:
term: "Wo"
highlight should be: "<em>Wo</em>nderfull world!"
currently it is: "<em>Wonderfull</em> world!"
Mapping is:
{
  "global_search_1495732922733" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
          ...
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor_index_analyzer",
            "search_analyzer" : "meeteor_search_term_analyzer"
          },
          ...
        }
      }
    }
  }
}
Analyzers are:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
},
"meeteor_ngram" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"meeteor_ngram"
],
"tokenizer" : "standard"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
Concrete example:
curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": "Me"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
'
The result is:
"...highlight" : {
"name" : [
"Sad <em>Meeting</em>"
]
}
The correct way to achieve what you want is to use the ngram as a tokenizer rather than a filter. You can do something like this:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
}
},
"tokenizer" : {
"meeteor_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "meeteor_ngram_tokenizer"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
It will generate the highlighting per ngram for you, like this:
"...highlight" : {
  "name" : [
    "Sad <em>Me</em>eting"
  ]
}
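
Note that analysis settings cannot be changed on an open index, so you will need to recreate the index (or close it, update the settings, and reopen it) and reindex your documents. After that, a quick sanity check with the _analyze API, sketched here with the index and analyzer names from above, shows the grams that will be highlighted:
POST global_search_1495732922733/_analyze
{
  "analyzer": "meeteor_index_analyzer",
  "text": "Sad Meeting"
}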

Why doesn't Elasticsearch match any number in the keyphrase?

I am searching for the keyphrase "xiaomi redmi note 3" in my Elasticsearch database. I am making the following bool query:
"filtered" : {
"query" : {
"match" : {
"name" : {
"query" : "xiaomi redmi note 3",
"type" : "boolean",
"operator" : "AND"
}
}
}
}
However, no matches are found, even though Elasticsearch contains the following document:
xiaomi redmi note 3 16GB 4G Phablet
Why doesn't ES match this document?
What I noticed in general is that ES doesn't match any numbers in the keyphrase. Does it have to do with the analyzer I'm using?
EDIT
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"autocomplete_filter"
]
},
And the mapping for my field is:
"name" : {
"type" : "string",
"index_analyzer" : "second",
"search_analyzer" : "standard",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
},
"search_quote_analyzer" : "second"
},
Autocomplete_filter:
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
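
Very likely, yes, and the edge_ngram filter is the culprit: with min_gram set to 2, the single-character token 3 yields no grams at all, so 3 is never indexed. The standard search analyzer does emit 3 as a query token, and with operator AND every token must match, so the document is rejected. A way to confirm this, sketched with a placeholder index name and the 1.x-style _analyze API this mapping implies:
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=second&pretty' -d 'xiaomi redmi note 3'
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=standard&pretty' -d 'xiaomi redmi note 3'
If the first call produces no token for 3, lowering min_gram to 1 (or relaxing the operator) should fix the query.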

How to create and add a custom analyzer in Elasticsearch?

I have a batch of "smartphones" products in my ES and I need to be able to query them with the text "smart phone". So I'm looking into the compound word token filter. Specifically, I'm planning to use a custom filter like this:
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "analysis" : {
    "analyzer" : {
      "second" : {
        "type" : "custom",
        "tokenizer" : "standard",
        "filter" : ["myFilter"]
      }
    },
    "filter" : {
      "myFilter" : {
        "type" : "dictionary_decompounder",
        "word_list" : ["smart", "phone"]
      }
    }
  }
}
'
Is this the correct approach? Also, I'd like to ask how I can create and add the custom analyzer to ES. I looked into several links but couldn't figure out how to do it. I guess I'm looking for the correct syntax.
Thank you.
EDIT
I'm running version 1.4.5.
I verified that the custom analyzer was added successfully:
{
  "test_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1453761455612",
        "analysis" : {
          "filter" : {
            "myFilter" : {
              "type" : "dictionary_decompounder",
              "word_list" : [ "smart", "phone" ]
            }
          },
          "analyzer" : {
            "second" : {
              "type" : "custom",
              "filter" : [ "lowercase", "myFilter" ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040599"
        },
        "uuid" : "xooKEdMBR260dnWYGN_ZQA"
      }
    }
  }
}
Your approach looks good. I would also consider adding the lowercase token filter, so that even Smartphone (note the uppercase 'S') will be split into smart and phone.
Then you could create the index with the analyzer like this:
curl -XPUT 'localhost:9200/your_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "second": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "myFilter"
          ]
        }
      },
      "filter": {
        "myFilter": {
          "type": "dictionary_decompounder",
          "word_list": [
            "smart",
            "phone"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "second"
        }
      }
    }
  }
}
'
Here you are creating an index named your_index with a custom analyzer named second, and applying it to the name field.
You can check whether the analyzer works as expected with the _analyze API. On 1.x (the version in use here) the analyzer goes in the query string and the text in the raw request body:
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=second&pretty' -d 'LG Android smartphone'
(Newer Elasticsearch versions accept a JSON body with "analyzer" and "text" fields instead.)
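If the decompounder is working, the original token is kept and the dictionary subwords are added alongside it, so you should see roughly these tokens (an expectation, not output captured from a live cluster):
lg
android
smartphone
smart
phone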
Hope this helps!!

Why is Elasticsearch not finding my term?

I just installed Elasticsearch and am testing it; it looks great. There is something I need to know. I have a configuration file elasticsearch.json in the config directory:
{
  "network" : {
    "host" : "127.0.0.1"
  },
  "index" : {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval" : "2s",
    "analysis" : {
      "analyzer" : {
        "index_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        },
        "search_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        }
      },
      "// you'll need lucene dep for this: filter" : {
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      }
    }
  }
}
I have inserted a doc that contains the word searching, but if I search for the keyword search it says nothing is found.
Won't it stem before indexing, or did I miss something in the config?
What does your query look like?
Your config does not look good. Try:
...
"index_analyzer" : {
"tokenizer" : "nGram",
"filter" : ["lowercase", "snowball"]
},
"search_analyzer" : {
"tokenizer" : "nGram",
"filter" : ["lowercase", "snowball"]
}
},
"filter" : {
"snowball": {
"type" : "snowball",
"language" : "English"
}
}
I've had trouble overriding the "default_search" and "default_index" analyzers as well, but this works. You can add "index_analyzer" to default all string fields with unspecified analyzers within a type, if need be.
curl -XDELETE localhost:9200/twitter
curl -XPOST localhost:9200/twitter -d '
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      },
      "analyzer": {
        "a2" : {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/twitter/tweet/_mapping -d '
{
  "tweet" : {
    "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
    "properties" : {
      "user" : {"type" : "string"},
      "message" : {"type" : "string", "analyzer" : "a2"}
    }
  }
}'
curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{ "user": "kimchy", "post_date": "2009-11-15T13:12:00", "message": "Trying out searching teaching, so far so good?" }'
curl -XGET localhost:9200/twitter/tweet/_search?q=message:search
curl -XGET localhost:9200/twitter/tweet/_search?q=message:try
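
Both test queries should now return the tweet: the snowball filter stems searching and search (and likewise trying and try) to the same root at both index and search time. A quick way to inspect what the a2 analyzer emits, using the 1.x-style _analyze API with the text as the raw body:
curl -XGET 'localhost:9200/twitter/_analyze?analyzer=a2&pretty' -d 'Trying out searching'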
