Multi field analyzer not working as expected - elasticsearch

I'm confused. I have the following document indexed:
POST test/topic
{
"title": "antiemetics"
}
With the following query:
{
"query": {
"query_string" : {
"fields" : ["title*"],
"default_operator": "AND",
"query" :"anti emetics",
"use_dis_max" : true
}
},
"highlight" : {
"fields" : {
"*" : {
"fragment_size" : 200,
"pre_tags" : ["<mark>"],
"post_tags" : ["</mark>"]
}
}
}
}
and the following settings and mappings:
POST test
{
"settings":{
"index":{
"number_of_shards":1,
"analysis":{
"analyzer":{
"merge":{
"type":"custom",
"tokenizer":"keyword",
"filter":[
"lowercase"
],
"char_filter":[
"hyphen",
"space",
"html_strip"
]
}
},
"char_filter":{
"hyphen":{
"type":"pattern_replace",
"pattern":"[-]",
"replacement":""
},
"space":{
"type":"pattern_replace",
"pattern":" ",
"replacement":""
}
}
}
}
},
"mappings":{
"topic":{
"properties":{
"title":{
"analyzer":"standard",
"search_analyzer":"standard",
"type":"string",
"fields":{
"specialised":{
"type":"string",
"index":"analyzed",
"analyzer":"standard",
"search_analyzer":"merge"
}
}
}
}
}
}
}
I know my use of a multi-field doesn't make sense here, since the sub-field uses the same index analyzer as title, so please ignore that; I'm more interested in checking my understanding of analyzers. I was expecting the merge analyzer to turn the query "anti emetics" into "antiemetics", and I was hoping the multi-field that has this search analyzer applied would then match against the indexed token "antiemetics". However, I don't get any results back, even though I have verified with the analyze API that the analyzer does strip the whitespace from the query. Any idea why?
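For reference, this is the sort of _analyze check I mean (a sketch; the request-body form below is ES 5+ syntax, while older versions take analyzer and text as URL parameters):
POST test/_analyze
{
"analyzer": "merge",
"text": "anti emetics"
}
This returns the single token antiemetics.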

This seems to work with your setup:
POST /test_index/_search
{
"query": {
"match": {
"title.specialised": "anti emetics"
}
}
}
The likely reason query_string fails here is that it splits the query text on whitespace (and operators) before the per-field analyzers run, so the merge analyzer only ever sees anti and emetics individually and its space-stripping char_filter has nothing to remove; a match query passes the whole string to the search analyzer intact. Here's some code I set up to play with it:
http://sense.qbox.io/gist/3ef6926644213cf7db568557a801fec6cb15eaf9

Related

ElasticSearch accented and no accented words management

I created an index:
PUT members
{
"settings":{
"number_of_shards":1,
"analysis":{
"analyzer":{
"accentedNames":{
"tokenizer":"standard",
"filter":[
"lowercase",
"asciifolding"
]
},
"standardNames":{
"tokenizer":"standard",
"filter":[
"lowercase"
]
}
}
}
},
"mappings":{
"member":{
"properties":{
"id":{
"type":"text"
},
"name":{
"type":"text",
"analyzer":"standardNames",
"fields":{
"accented":{
"type":"text",
"analyzer":"accentedNames"
}
}
}
}
}
}
}
Assume that some documents are in this set (EDIT):
{"1", "Maéllys Macron"};
{"2", "Maêllys Alix"};
{"3", "Maëllys Rosa"};
{"4", "Maèllys Alix"};
{"5", "Maellys du Bois"};
I wanted this result: when I search for documents named "Maéllys", I expect the exact-accent match ("Maéllys Macron") to score best, and all the others to share the same score.
What I did was use my analyzers in a request like this:
GET members/member/_search
{
"query":{
"multi_match" : {
"query" : "Maéllys",
"fields" : [ "name", "name.accented" ]
}
}
}
"Maéllys Richard" has the best score. The documents "Ma(ê|ë|é|è)llys Richard have the same score that is higher than "Maellys Richard" document.
Can someone help me ?
Thanks.

Elasticsearch not analyzed and lowercase

I'm trying to make a field lowercase and not analyzed in Elasticsearch 5+, so that I can run lowercase searches against strings that contain spaces (the strings being indexed in mixed case).
Before Elasticsearch 5 we could use an analyzer like this one to accomplish it:
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
}
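For illustration, that analyzer emits the whole input as one lowercased token, which you can check with the _analyze API (a sketch; assumes an index named test created with these settings):
POST test/_analyze
{
"analyzer": "analyzer_keyword",
"text": "City Test"
}
This returns the single token city test.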
However, this no longer works for me, and I believe the problem is that "string" is deprecated and automatically converted to either keyword or text.
Does anyone here know how to accomplish this? I thought about adding a "fields" section to my mapping, along the lines of:
"fields": {
"lowercase": {
"type": "string"
**somehow convert to lowercase**
}
}
This would make working with it slightly more challenging and I have no idea how to convert it to lowercase either.
Below you'll find a test setup which reproduces my exact problem.
create index:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"test":{
"properties":{
"name":{
"analyzer":"analyzer_keyword",
"type":"string"
}
}
}
}
}
Add a test record:
{
"name": "city test"
}
Query that should match:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"must": [{
"bool": {
"should": [{
"wildcard": {
"name": "*city t*"
}
}]
}
}]
}
}
}
When creating your index, you need to make sure that the analysis section is right under the settings section and not inside the settings > index section, otherwise it won't work.
Then you also need to use the text data type for your field instead of the string one. Wipe your index, make those two changes, and it will work:
{
"settings":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
},
"mappings":{
"test":{
"properties":{
"name":{
"analyzer": "analyzer_keyword",
"type": "text"
}
}
}
}
}
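To verify, reusing the test record and the wildcard query from the question (a sketch; test is the index name assumed here):
PUT test/test/1
{
"name": "city test"
}
POST test/_search
{
"query": {
"wildcard": {
"name": "*city t*"
}
}
}
The wildcard now runs against the single lowercased token city test, and the document is returned.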

Elasticsearch search pattern with Start string

I am new to Elasticsearch and trying to implement search. Below are my index settings and mappings:
curl -XPUT localhost:9200/rets_data/ -d '{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_startswith":{
"tokenizer":"keyword",
"filter":"lowercase"
},
"analyzer_whitespacewith":{
"tokenizer":"whitespace",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"city":{
"properties":{
"CityName":{
"analyzer":"analyzer_startswith",
"type":"string"
}
}
},
"rets_aux_subdivision":{
"properties":{
"nn":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"field_LIST_77":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"SubDivisionName":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"SubDivisionAlias":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
}
}
},
"rental_aux_subdivision":{
"properties":{
"nn":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"field_LIST_77":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"SubDivisionName":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
},
"SubDivisionAlias":{
"analyzer":"analyzer_whitespacewith",
"type":"string"
}
}
}
}
}'
Below is my search request:
curl -XGET localhost:9200/rets_data/rets_aux_subdivision/_search?pretty -d '{
"query":{
"match_phrase_prefix":{
"nn":{
"query":"boca w",
"max_expansions":50
}
}
},
"sort":{
"total":{
"order":"desc"
}
},
"size":100
}'
When I search for text like "Boca r" or "Boca w", it gives me no results.
My expected result is as follows:
"Boca w" should give me results starting with "Boca w", i.e. "Boca west", "Boca Woods", "Boca Winds".
Please help me on this.
Thanks
You should use edge n-grams; check out the edge_ngram filter in the Elasticsearch documentation. The edge_ngram filter prepares multiple prefix tokens from each word, like this:
Woods -> [W, Wo, Woo, Wood, Woods]
It makes the index bigger, but searching will be more efficient than any other option such as wildcards. Here is my simple index creation with edge n-grams on the title field:
{
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"ngram_analyzer" : {
"type" : "custom",
"tokenizer" : "standard",
"filter" : ["lowercase","my_ngram"]
}
},
"filter" : {
"my_ngram" : {
"type" : "edge_ngram",
"min_gram" : 1,
"max_gram" : 50
}
}
}
}
},
"mappings":
{
"post":
{
"properties":
{
"id":
{
"type": "integer",
"index":"no"
},
"title":
{
"type": "text",
"analyzer":"ngram_analyzer"
}
}
}
}
}
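You can sanity-check what gets indexed with the _analyze API (a sketch; idx is a placeholder for whatever name you create the index above under):
POST idx/_analyze
{
"analyzer": "ngram_analyzer",
"text": "Boca West"
}
This returns the prefix tokens b, bo, boc, boca, w, we, wes, west, so a standard-analyzed query for "boca w" matches directly.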
And search query:
{
"from" : 0,
"size" : 10,
"query" : {
"match" : {
"title":
{
"query":"press key han",
"operator":"or",
"analyzer":"standard"
}
}
}
}
What if you change your match to something like this:
"query": {
"match_phrase": {
"text": {
"query": "boca w"
}
}
},
"sort":{
"total":{
"order":"desc"
}
},
"size":100
Or you could use the wildcard query:
"query": {
"wildcard" : {
"yourfield" : "boca w*"
}
}
This SO question could be helpful. Hope it helps!

How do I search for partial accented keyword in elasticsearch?

I have the following elasticsearch settings:
"settings": {
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":["lowercase", "asciifolding"]
}
}
}
}
}
The above works fine for the following keywords:
Beyoncé
Céline Dion
The above data is stored in elasticsearch as beyonce and celine dion respectively.
I can search for Celine or Celine Dion without the accent and I get the same results. However, the moment I search for Céline, I don't get any results. How can I configure elasticsearch to search for partial keywords with the accent?
The query body looks like:
{
"track_scores": true,
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["name"],
"type": "phrase",
"query": "Céline"
}
}
]
}
}
}
and the mapping is
"mappings" : {
"artist" : {
"properties" : {
"name" : {
"type" : "string",
"fields" : {
"orig" : {
"type" : "string",
"index" : "not_analyzed"
},
"simple" : {
"type" : "string",
"analyzer" : "analyzer_keyword"
}
}
}
}
}
}
I would suggest this mapping and then go from there:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
}
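Because both lowercase and asciifolding run at index and search time here, the accented and unaccented forms normalise to the same tokens, which you can confirm with the _analyze API (a sketch; assumes the index was created as test with the settings above):
POST test/_analyze
{
"analyzer": "analyzer_keyword",
"text": "Céline Dion"
}
This returns the tokens celine and dion, so Céline and Celine both match.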
Confirm that the same analyzer is getting used at query time. Here are some possible reasons why that might not be happening:
you specify a separate analyzer at query time on purpose that is not performing similar analysis
you are using a term or terms query for which no analyzer is applied (See Term Query and the section title "Why doesn’t the term query match my document?")
you are using a query_string query (e.g. see Simple Query String Query) - I have found that if you specify multiple fields with different analyzers, the analyzer you expect may not be applied, so I have needed to separate the fields into separate queries and specify the analyzer parameter explicitly (working with version 2.0)

Optimising ElasticSearch aggregated search suggestions

I'm working on implementing an autocomplete field where the suggestions also contain the number of matching documents.
I have implemented this simply using a terms aggregation with include filter. So for instance given a user typing 'Chrysler' the following query may be generated:
{
"size": 0,
"query": {
"bool": {
"must": [
...
]
}
},
"aggs": {
"filtered": {
"filter": {
...
},
"aggs": {
"suggestions": {
"terms": {
"field": "prefLabel",
"include": "Chry.*",
"min_doc_count": 0
}
}
}
}
}
}
This works fine and I am able to get the data I need. However, I am concerned that this is not very well optimised and more could be done when the documents are indexed.
Currently we have the following mapping:
{
...
"prefLabel":{
"type":"string",
"index":"not_analyzed"
}
}
And I am wondering whether to add an analysed field, like so:
{
...
"prefLabel":{
"type":"string",
"index":"not_analyzed",
"copy_to":"searchLabel"
},
"searchLabel":{
"type":"string",
"analyzer":"???"
}
}
So my question is: what is the best index-time analyser for this? (Or is this just crazy?)
I think that edge ngram tokenizer would speed things up:
curl -XPUT 'localhost:9200/test_ngram' -d '{
"settings" : {
"analysis" : {
"analyzer" : {
"suggester_analyzer" : {
"tokenizer" : "ngram_tokenizer"
}
},
"tokenizer" : {
"ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "7",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
...
"searchLabel": {
"type": "string",
"index_analyzer": "suggster_analyzer",
"search_analyzer": "standard"
}
...
}
}'
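With that in place, the suggestion query could drop the include regex and instead match on the analysed field, keeping the terms aggregation on the raw prefLabel for the counts (a sketch, assuming the field names from the question):
{
"size": 0,
"query": {
"match": {
"searchLabel": "Chry"
}
},
"aggs": {
"suggestions": {
"terms": {
"field": "prefLabel"
}
}
}
}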
