I have edge_ngram configured for a field.
Suppose the word indexed with edge_ngram is: quick
and it is analyzed as: q, qu, qui, quic, quick
When I try to search for quickfull, words containing quick also come back in the results.
I want only words containing quickfull to be returned; otherwise there should be no results.
This is my mapping:
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
Any help would be appreciated.
It's because your name field has "analyzer": "autocomplete", which means the autocomplete analyzer is also applied at search time, so the search term quickfull is tokenized into qui, quic, quick, quickf, quickfu, quickful and quickfull (with your min_gram of 3), and that matches quick as well.
To prevent this, you need to set "search_analyzer": "standard" on the name field to override the index-time analyzer.
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},
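Once the mapping is updated, a quick sanity check (a sketch, assuming the index and type names john_search/drugs from the mapping above):
GET /john_search/drugs/_search
{
  "query": {
    "match": {
      "name": "quickfull"
    }
  }
}
With the standard search analyzer, quickfull stays a single token and no edge-ngram of quick matches it, so this should return no hits, while searching for quick still matches as before.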
Related
I am looking for a way to make ES search the data with multiple analyzers: an nGram analyzer plus one or more language analyzers.
A possible solution is to use multi-fields and explicitly declare which analyzer to use for each field.
For example, set the following mappings:
"mappings": {
"my_entity": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"ngram": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"spanish": {
"type": "string",
"analyzer": "spanish"
},
"english": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
The problem with that is that I have to explicitly list every field and its analyzers in a search query,
and it does not allow searching "_all" with multiple analyzers.
Is there a way to make an "_all" query use multiple analyzers?
Something like "_all.ngram", "_all.spanish", without using copy_to to duplicate the data?
Is it possible to combine the ngram analyzer with Spanish (or any other language) into a single custom analyzer?
I have tested the following settings, but they did not work:
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
}
},
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": ["ejemplo"]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"fields": {
"analyzer1": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"analyzer2": {
"type": "string",
"analyzer": "spanish"
},
"analyzer3": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "_all",
"text": "Hola, me llamo Juan."
}
returns just ngram results, without any Spanish analysis, whereas
GET /ngrams_index/_analyze
{
"field": "my_field.analyzer2",
"text": "Hola, me llamo Juan."
}
properly analyzes the search string.
Is it possible to build a custom analyzer which combines Spanish and ngram?
There is a way to create a custom ngram+language analyzer:
PUT /ngrams_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"ngram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": [
"ejemplo"
]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer",
"ngram_filter"
]
}
}
}
},
"mappings": {
"my_entity": {
"_all": {
"enabled": true,
"analyzer": "ngram_analyzer"
},
"properties": {
"my_field": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
}
GET /ngrams_index/_analyze
{
"field": "my_field",
"text": "Hola, me llamo Juan."
}
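Since _all is mapped to ngram_analyzer here, the _analyze check from the question should now show the Spanish filters taking effect as well (a quick verification sketch against the same index):
GET /ngrams_index/_analyze
{
  "field": "_all",
  "text": "Hola, me llamo Juan."
}
Because the language filters sit before ngram_filter in the chain, the output should be trigrams of the lowercased, stop-filtered, stemmed tokens, e.g. hol and ola from hola, with the stopword me dropped.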
I am searching for a phrase in an email body. I need exact matches: if I search for 'Avenue New', it should return only results containing the phrase 'Avenue New', not 'Avenue Street', 'Park Avenue', etc.
My mapping is like:
{
"exchangemailssql": {
"aliases": {},
"mappings": {
"email": {
"dynamic_templates": [
{
"_default": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"doc_values": true,
"type": "keyword"
}
}
}
],
"properties": {
"attachments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"body": {
"type": "text",
"analyzer": "keylower",
"fielddata": true
},
"count": {
"type": "short"
},
"emailId": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "3s",
"number_of_shards": "1",
"provided_name": "exchangemailssql",
"creation_date": "1500527793230",
"analysis": {
"filter": {
"nGram": {
"min_gram": "4",
"side": "front",
"type": "edge_ngram",
"max_gram": "100"
}
},
"analyzer": {
"keylower": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
},
"email": {
"filter": [
"lowercase",
"unique",
"nGram"
],
"type": "custom",
"tokenizer": "uax_url_email"
},
"full": {
"filter": [
"lowercase",
"snowball",
"nGram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "0",
"uuid": "2XTpHmwaQF65PNkCQCmcVQ",
"version": {
"created": "5040099"
}
}
}
}
}
My search query looks like this:
{
"query": {
"match_phrase": {
"body": "Avenue New"
}
},
"highlight": {
"fields" : {
"body" : {}
}
}
}
The problem here is that you're tokenizing the full body content using the keyword tokenizer, i.e. it will be one big lowercase string and you cannot search inside of it.
If you simply change the analyzer of your body field to standard instead of keylower, you'll find what you need using the match_phrase query.
"body": {
"type": "text",
"analyzer": "standard", <---change this
"fielddata": true
},
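You can see the difference with the _analyze API (a quick check, assuming the index name exchangemailssql from your settings):
GET /exchangemailssql/_analyze
{
  "analyzer": "keylower",
  "text": "Park Avenue New York"
}
keylower returns the whole input as a single token, park avenue new york, whereas the standard analyzer emits four separate tokens with positions, which is exactly what match_phrase needs in order to match 'Avenue New'.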
If the query is Brid, I want to get <em>Brid</em>gitte in the highlighted fields, not the whole word <em>Bridgitte</em>.
My index looks like this (I've added an ngram analyzer as was suggested in Highlighting part of word in elasticsearch):
{
"myindex": {
"aliases": {},
"mappings": {
"mytype": {
"properties": {
"myarrayproperty": {
"properties": {
"mystringproperty1": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
},
"mystringproperty2": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
}
},
"mylongproperty": {
"type": "long"
},
"mydateproperty": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"mystringproperty3": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
},
"mystringproperty4": {
"type": "string",
"term_vector": "with_positions_offsets",
"analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
}
}
}
},
"settings": {
"index": {
"creation_date": "1498030893611",
"analysis": {
"analyzer": {
"search_term_analyzer": {
"filter": "lowercase",
"type": "custom",
"tokenizer": "ngram_tokenizer"
},
"index_ngram_analyzer": {
"filter": ["lowercase"],
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"ngram_tokenizer": {
"token_chars": ["letter", "digit"],
"min_gram": "1",
"type": "nGram",
"max_gram": "15"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "e5kBX-XRTKOqeAScO1Fs0w",
"version": {
"created": "2040499"
}
}
},
"warmers": {}
}
}
}
This is an embedded Elasticsearch instance; not sure if that's relevant.
My query looks like this:
MatchQueryBuilder queryBuilder = matchPhrasePrefixQuery("_all", query).maxExpansions(50);
final SearchResponse response = client.prepareSearch("myindex")
.setQuery(queryBuilder)
.addHighlightedField("mystringproperty3", 0, 0)
.addHighlightedField("mystringproperty4", 0, 0)
.addHighlightedField("myarrayproperty.mystringproperty1", 0, 0)
.setHighlighterRequireFieldMatch(false)
.execute().actionGet();
And it doesn't work. I've tried changing the query to queryStringQuery, but it doesn't seem to support searching by part of a word. Any suggestions?
It's not possible. Elasticsearch indexes whole words; from a tokenization perspective, you cannot do much here.
You may need to write a wrapper over the search results (this is not Elasticsearch-specific).
I am currently implementing a simple person search in Elasticsearch. I did some research and found quite a lot of content about how to implement features such as full-text search.
The problem is that some queries just don't return any results.
I have the following index template:
PUT /_template/template_hca_bp
{
"template": "test",
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
},
"search_ngram": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"persons": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"search_analyzer": "standard",
"analyzer": "autocomplete"
},
"countryCode": {
"type": "keyword"
},
"doorNumber": {
"type": "keyword"
},
"id": {
"type": "text",
"index": "no",
"include_in_all": false
},
"stairwayNumber": {
"type": "keyword"
},
"street": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
},
"streetNumber": {
"type": "keyword"
},
"zipCode": {
"type": "keyword"
}
}
},
"id": {
"type": "keyword",
"index": "no",
"include_in_all": false
},
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard",
"boost":2
},
"personType": {
"type": "keyword",
"index": "no",
"include_in_all": false
},
"title": {
"type": "text"
}
}
}
}
}
My query looks like the following:
POST test/_search
{
"query": {
"multi_match": {
"query": "Maria",
"type":"cross_fields",
"fields": [
"name^2", "city", "street", "streetNumber", "zipCode"
]
}
}
}
If I now search e.g. for "Maria", I get a result. But if I search for a zipCode (e.g. 12345), I don't get any results.
The analyze API gives the following response:
"detail": {
"custom_analyzer": false,
"analyzer": {
"name": "default",
"tokens": [
{
"token": "12345",
"start_offset": 0,
"end_offset": 5,
"type": "<NUM>",
"position": 0,
"bytes": "[31 32 33 34 35]",
"positionLength": 1
}
]
}
}
I get no results. I have tried term and match queries and all kinds of other things, but I can't get it working.
The desired document:
"id": "V2718984F3A0ADA95176424457A068F9DC93FC8BDA0898A4E8248F194AE1AF4FCE04C29F46367DDEC33721C15C2679B7BB",
"name": "Maria Smith",
"personType": "APO",
"address": {
"countryCode": "A",
"city": "Testcity",
"zipCode": "12345",
"street": "Avenue",
"streetNumber": "2"
}
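One thing worth checking here (a debugging sketch, not a full diagnosis): in the template, zipCode is nested under address, so its fully qualified name is address.zipCode. Querying that field directly shows whether the field itself is searchable:
POST test/_search
{
  "query": {
    "match": {
      "address.zipCode": "12345"
    }
  }
}
If this returns the document, the multi_match above is simply not resolving the bare names city, street and zipCode to their address.* counterparts.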
The settings for one of my indexes are shown below; however, the stemmer isn't being applied. For example, a search for fox will not pick up articles that include the term foxes. I can't see why, as the order of the filters is correct (lowercase precedes the stemmer).
{
"articles": {
"settings": {
"index": {
"creation_date": "1436255268907",
"analysis": {
"filter": {
"filter_stemmer": {
"type": "stemmer",
"language": "english"
},
"kill_filters": {
"pattern": ".*_.*",
"type": "pattern_replace",
"replacement": ""
},
"filter_stop": {
"type": "stop"
},
"filter_shingle": {
"min_shingle_size": "2",
"max_shingle_size": "5",
"type": "shingle",
"output_unigrams": "true"
},
"filter_stemmerposs": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"tags_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"filter_stemmerposs",
"filter_stemmer"
],
"tokenizer": "patterntoke"
},
"shingles_analyzer": {
"filter": [
"standard",
"lowercase",
"filter_stop",
"filter_shingle",
"kill_filters",
"filter_stemmerposs",
"filter_stemmer"
],
"char_filter": [
"html_strip"
],
"type": "custom",
"tokenizer": "standard"
}
},
"tokenizer": {
"patterntoke": {
"type": "pattern",
"pattern": ","
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"version": {
"created": "1060099"
},
"uuid": "H2NsE3eKT1y_ArPOPbjT6w"
}
}
}
}
And below is the mapping:
{
"articles": {
"mappings": {
"article": {
"properties": {
"accountid": {
"type": "double",
"include_in_all": false
},
"article": {
"type": "string",
"index_analyzer": "shingles_analyzer"
},
"articleid": {
"type": "double",
"include_in_all": false
},
"categoryid": {
"type": "double",
"include_in_all": false
},
"draftflag": {
"type": "double",
"include_in_all": false
},
"files": {
"type": "string",
"index_analyzer": "tags_analyzer"
},
"tags": {
"type": "string",
"index_analyzer": "tags_analyzer"
},
"title": {
"type": "string",
"index_analyzer": "shingles_analyzer"
},
"topicid": {
"type": "double",
"include_in_all": false
}
}
}
}
}
}
The sample documents are varied, but for example one contains the token fox and another foxes (both derived from the article field), yet each document is only found when the search term matches it exactly (fox or foxes respectively) rather than by either term, which is what I'd expect from stemming. The search used is a fuzzy_like_this query (I'm using NEST .NET to execute it).
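One way to see what the index actually stores for the article field is to run its analyzer directly (a verification sketch; this index was created on ES 1.x, where _analyze takes query-string parameters rather than the JSON body shown elsewhere in this thread):
GET /articles/_analyze?analyzer=shingles_analyzer&text=foxes
If the stemming chain runs as configured, fox should appear among the emitted tokens; comparing this with the tokens produced at search time usually reveals which side of the match is skipping the stemmer.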