Elastic Search Analyzer - Stemmer Not Working - elasticsearch

The settings for one of my indexes is as follows, however the stemmer isn't being applied. For example a search for fox will not pick up articles that include the term foxes. I can't see why as the order of the filters is correct (lowercase precedes the stemmer).
{
"articles": {
"settings": {
"index": {
"creation_date": "1436255268907",
"analysis": {
"filter": {
"filter_stemmer": {
"type": "stemmer",
"language": "english"
},
"kill_filters": {
"pattern": ".*_.*",
"type": "pattern_replace",
"replacement": ""
},
"filter_stop": {
"type": "stop"
},
"filter_shingle": {
"min_shingle_size": "2",
"max_shingle_size": "5",
"type": "shingle",
"output_unigrams": "true"
},
"filter_stemmerposs": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"tags_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"filter_stemmerposs",
"filter_stemmer"
],
"tokenizer": "patterntoke"
},
"shingles_analyzer": {
"filter": [
"standard",
"lowercase",
"filter_stop",
"filter_shingle",
"kill_filters",
"filter_stemmerposs",
"filter_stemmer"
],
"char_filter": [
"html_strip"
],
"type": "custom",
"tokenizer": "standard"
}
},
"tokenizer": {
"patterntoke": {
"type": "pattern",
"pattern": ","
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"version": {
"created": "1060099"
},
"uuid": "H2NsE3eKT1y_ArPOPbjT6w"
}
}
}
}
And below is the mapping:
{
"articles": {
"mappings": {
"article": {
"properties": {
"accountid": {
"type": "double",
"include_in_all": false
},
"article": {
"type": "string",
"index_analyzer": "shingles_analyzer"
},
"articleid": {
"type": "double",
"include_in_all": false
},
"categoryid": {
"type": "double",
"include_in_all": false
},
"draftflag": {
"type": "double",
"include_in_all": false
},
"files": {
"type": "string",
"index_analyzer": "tags_analyzer"
},
"tags": {
"type": "string",
"index_analyzer": "tags_analyzer"
},
"title": {
"type": "string",
"index_analyzer": "shingles_analyzer"
},
"topicid": {
"type": "double",
"include_in_all": false
}
}
}
}
}
}
The sample documents are varied but for example 1 contains the token fox and another foxes (both derived from the article field) but each document is only found when the search is fox or foxes and not either which is what I'd expect. The search used Is a fuzzylikethis search (I'm using Nest .net to execute the query)

Related

Elasticsearch multilingual Search

I am working on a project to perform multilingual full-text search using Elasticsearch.
one field can contain a word combination of different languages or transliteration. for example in the English text may contain Armenian words. or Russian words in the Armenian text.
and i am trying now to configure text analysis with language analyzer.
How correct is my analyzer, And will it work at all ?
PUT /example{
"settings": {
"analysis": {
"filter": {
"armenian_stop": {
"type": "stop",
"stopwords": "_armenian_"
},
"armenian_keywords": {
"type": "keyword_marker",
"keywords": ["օրինակ"]
},
"armenian_stemmer": {
"type": "stemmer",
"language": "armenian"
},
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": ["пример"]
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
},
"graph_synonyms": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt"
}
},
"analyzer": {
"rebuilt_armenian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"armenian_stop",
"armenian_keywords",
"armenian_stemmer",
"russian_stop",
"russian_keywords",
"russian_stemmer",
"graph_synonyms"
]
}
}
}},"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text", "analyzer": "rebuilt_armenian" } ,
"location": {
"type": "geo_point"
}
}}}
I work in a different way with multilinguals. It seems that in your case you don't know what language it is before indexing. In my current scenario, for each language I create a field, using "fields", and for each field I use the language-specific analyzer.
{
"settings": {
"analysis": {
"filter": {
"armenian_stop": {
"type": "stop",
"stopwords": "_armenian_"
},
"armenian_keywords": {
"type": "keyword_marker",
"keywords": [
"օրինակ"
]
},
"armenian_stemmer": {
"type": "stemmer",
"language": "armenian"
},
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": [
"пример"
]
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
},
"graph_synonyms": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt"
}
},
"analyzer": {
"rebuilt_armenian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"armenian_stop",
"armenian_keywords",
"armenian_stemmer"
]
},
"rebuilt_russian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"age": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"ar": {
"type": "text",
"analyzer": "rebuilt_armenian"
},
"ru": {
"type": "text",
"analyzer": "rebuilt_russian"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
And during the indexing and during the search I don't know what language the text is in.
and as far as I understand it is necessary to search for specific fields, if you search for example by "name" then the standard parser will work
{
"query": {
"bool": {
"must": [
{
"query_string": {
"fields": [ "name.ar", "name.ru"],
"query": "phone"
}
}
],
"filter": [
{
"geo_distance": {
"distance": "25km",
"location": {
"lat": 40.79420000 ,
"lon": 43.84528000
}
}
}
]
}
}
}
You can try to check your analyzer with the analyzer API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
Enter some mixed text and see if the result is what you want.
Sometimes it is also ok to just use standard analyzer and forget about eliminating language-specific stopwords or stemming.

Elastic Search analyzer and synonym not working

Here is my mapping and properties
POST hr-profile/employee-type
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"synonym": {
"type": "synonym",
"synonyms_path": "analysis/names.txt"
}
},
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase",
"my_metaphone"
],
"char_filter": [
"my_pattern"
],
"tokenizer": "standard"
},
"synonym": {
"filter": [
"synonym"
],
"char_filter": [
"my_pattern"
],
"tokenizer": "whitespace"
}
},
"char_filter": {
"my_pattern": {
"pattern": "\\.|\\;|\\,",
"type": "pattern_replace",
"replacement": " "
}
}
}
},
"mappings": {
"properties": {
"companyid": {
"type": "integer"
},
"emailaddress": {
"type": "text"
},
"employeeid": {
"type": "text"
},
"firstname": {
"type": "text",
"analyzer": "my_analyzer"
},
"lastname": {
"type": "text",
"analyzer": "my_analyzer"
},
"phonenumber": {
"type": "text"
},
"profileid": {
"type": "text"
}
}
}
}
I have data in the index but getting error
[match] analyzer [synonym] not found"
Help needed pls.

Not able to search a phrase in elasticsearch 5.4

I am searching for a phrase in a email body. Need to get the exact data filtered like, if I search for 'Avenue New', it should return only results which has the phrase 'Avenue New' not 'Avenue Street', 'Park Avenue'etc
My mapping is like:
{
"exchangemailssql": {
"aliases": {},
"mappings": {
"email": {
"dynamic_templates": [
{
"_default": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"doc_values": true,
"type": "keyword"
}
}
}
],
"properties": {
"attachments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"body": {
"type": "text",
"analyzer": "keylower",
"fielddata": true
},
"count": {
"type": "short"
},
"emailId": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "3s",
"number_of_shards": "1",
"provided_name": "exchangemailssql",
"creation_date": "1500527793230",
"analysis": {
"filter": {
"nGram": {
"min_gram": "4",
"side": "front",
"type": "edge_ngram",
"max_gram": "100"
}
},
"analyzer": {
"keylower": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
},
"email": {
"filter": [
"lowercase",
"unique",
"nGram"
],
"type": "custom",
"tokenizer": "uax_url_email"
},
"full": {
"filter": [
"lowercase",
"snowball",
"nGram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "0",
"uuid": "2XTpHmwaQF65PNkCQCmcVQ",
"version": {
"created": "5040099"
}
}
}
}
}
I have given the search query like:
{
"query": {
"match_phrase": {
"body": "Avenue New"
}
},
"highlight": {
"fields" : {
"body" : {}
}
}
}
The problem here is that you're tokenizing the full body content using the keyword tokenizer, i.e. it will be one big lowercase string and you cannot search inside of it.
If you simply change the analyzer of your body field to standard instead of keylower, you'll find what you need using the match_phrase query.
"body": {
"type": "text",
"analyzer": "standard", <---change this
"fielddata": true
},

Why is this term query not returning any results?

I am currently implementing a simple person search in elastic search. I did some research and found quite a lot content about how to implement features as full text search and so on.
The problem is, that some queries just don't return any results.
I have the following index template:
PUT /_template/template_hca_bp
{
"template": "test",
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
},
"search_ngram": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"persons": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"search_analyzer": "standard",
"analyzer": "autocomplete"
},
"countryCode": {
"type": "keyword"
},
"doorNumber": {
"type": "keyword"
},
"id": {
"type": "text",
"index": "no",
"include_in_all": false
},
"stairwayNumber": {
"type": "keyword"
},
"street": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
},
"streetNumber": {
"type": "keyword"
},
"zipCode": {
"type": "keyword"
}
}
},
"id": {
"type": "keyword",
"index": "no",
"include_in_all": false
},
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard",
"boost":2
},
"personType": {
"type": "keyword",
"index": "no",
"include_in_all": false
},
"title": {
"type": "text"
}
}
}
}
}
My query looks like the following:
POST test/_search
{
"query": {
"multi_match": {
"query": "Maria",
"type":"cross_fields",
"fields": [
"name^2", "city", "street", "streetNumber", "zipCode"
]
}
}
}
If I now search e.g. for "Maria" then I get a result. But if I'm searching for a zipCode (e.g. 12345) than I don't get any result.
The analyze api has the following response:
"detail": {
"custom_analyzer": false,
"analyzer": {
"name": "default",
"tokens": [
{
"token": "12345",
"start_offset": 0,
"end_offset": 5,
"type": "<NUM>",
"position": 0,
"bytes": "[31 32 33 34 35]",
"positionLength": 1
}
]
}
}
I'm not getting any response. I have tried term, and match queries and all other kind of stuff, but I can't get it working?
The desired document:
"id": "V2718984F3A0ADA95176424457A068F9DC93FC8BDA0898A4E8248F194AE1AF4FCE04C29F46367DDEC33721C15C2679B7BB",
"name": "Maria Smith",
"personType": "APO",
"address": {
"countryCode": "A",
"city": "Testcity",
"zipCode": "12345",
"street": "Avenue",
"streetNumber": "2"
}

elastic search edge_ngram issue?

i have edge_ngram configured for a filed.
suppose the word is indexed in edge_ngram is : quick
and its analyzing as : q,qu,qui,quic,quick
when i am tring to search quickfull the words contaning quick is also coming in results.
i want words only containing quickfull comes else it gives no results.
this is my mapping :
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
any help would be appreciated
It's because your name field has "analyzer": "autocomplete", which means that the autocomplete analyzer will also be applied at search time, hence the search term quickfull will be tokenized to q, qu, qui, quic, quick, quickf, quickfu, quickful and quickfull and that matches quick as well.
In order to prevent this, you need to set "search_analyzer": "standard" on the name field to override the index-time analyzer.
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},

Resources