I have read that in previous versions of ES (< 2) the "token_analyzer" key needed to be changed to "analyzer". But no matter what I do, I am still getting this error:
"type": "mapper_parsing_exception",
"reason": "analyzer on field [email] must be set when search_analyzer is set"
Here is what I am passing to ES via a PUT request when I get the error:
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings" : {
"uuser": {
"properties": {
"email": {
"type": "text",
"search_analyzer": "my_email_analyzer",
"fields": {
"email": {
"type": "text",
"analyzer": "my_email_analyzer"
}
}
},
"facebookId": {
"type": "text"
},
"name": {
"type": "text"
},
"profileImageUrl": {
"type": "text"
},
"signupDate": {
"type": "date"
},
"username": {
"type": "text"
},
"phoneNumber": {
"type": "text"
}
}
}
}
}
Any ideas what is wrong?
Because you have specified a search_analyzer for the field, you also have to specify the analyzer to be used at indexing time. For example, add this line under where you specify the search_analyzer:
"analyzer": "standard",
To give you this:
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings" : {
"uuser": {
"properties": {
"email": {
"type": "text",
"search_analyzer": "my_email_analyzer",
"analyzer": "standard",
"fields": {
"email": {
"type": "text",
"analyzer": "my_email_analyzer"
}
}
},
"facebookId": {
"type": "text"
},
"name": {
"type": "text"
},
"profileImageUrl": {
"type": "text"
},
"signupDate": {
"type": "date"
},
"username": {
"type": "text"
},
"phoneNumber": {
"type": "text"
}
}
}
}
}
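As a quick sanity check (a sketch; substitute your actual index name for myindex), you can run the _analyze API against the custom analyzer to confirm it keeps an email address as a single token:
GET myindex/_analyze
{
  "analyzer": "my_email_analyzer",
  "text": "john.doe@example.com"
}
With the uax_url_email tokenizer the whole address comes back as one lowercased token instead of being split on the @ and the dots.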
See also: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
I am currently using Elasticsearch's phonetic analyzer. I want the query to give a higher score to exact matches than to phonetic ones. Here is the query I am using:
{
"query": {
"multi_match" : {
"query" : "Abhijeet",
"fields" : ["content", "title"]
}
},
"size": 10,
"_source": [ "title", "bench", "court", "id_" ],
"highlight": {
"fields" : {
"title" : {},
"content":{}
}
}
}
When I search for Abhijeet, the top results are for Abhijit, and Abhijeet only appears later. I want the exact matches to always appear first, followed by the phonetic ones. Can this be done?
Edit:
Mappings
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Here is the code I used to set up the phonetic analyzer:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
Now, I want to query only the title and content fields. Here, I want the exact matches to appear first and then the phonetic ones.
The general solution approach is:
to use a bool query,
with your phonetic query/queries in the must clause,
and the non-phonetic query/queries in the should clause.
I can update the answer if you include the mappings and settings of your index in your question.
Update: Solution Approach
A. Expand your mapping to use multi-fields for title and content:
"title": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
...
"content": {
"type": "text",
"analyzer": "my_analyzer"
"fields" : {
"standard" : {
"type" : "text"
}
}
},
B. Get the new sub-fields populated, e.g. by re-indexing all documents in place so that title.standard and content.standard are filled:
POST courts_2/_update_by_query
C. Adjust your query to leverage the newly introduced fields:
GET courts_2/_search
{
"_source": ["title","bench","court","id_"],
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title", "content"]
}
},
"should": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title.standard", "content.standard"]
}
}
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
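If the exact matches still don't rank high enough relative to the phonetic ones, one variation (just a sketch on top of the query above, not something from the original question) is to boost the should clause so that matches on the standard sub-fields carry more weight:
"should": {
  "multi_match": {
    "query": "Abhijeet",
    "fields": ["title.standard", "content.standard"],
    "boost": 2
  }
}
Since both clauses run in the same bool query, documents that match the exact (standard-analyzed) form accumulate a higher score and sort above the purely phonetic matches.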
I am trying to create an index with a mapping of text and keyword fields and an analyzer (normalizer) defined. Here is what I have tried so far:
{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 1
},
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
,
"mappings": {
"properties": {
"question": {
"type":"text",
"fields": {
"keyword": {
"type": "keyword"
},
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
}
I have tried this, but I am getting this error:
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "unknown key [analysis] for create index"
}
],
"type": "parse_exception",
"reason": "unknown key [analysis] for create index"
},
"status": 400
}
question is the field where I need to add this mapping.
I am trying this on the AWS Elasticsearch service.
Great start, you're almost there!
The analysis section needs to be located inside the top-level settings section, like this:
{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"question": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
},
"answer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
}
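As a usage sketch (the index name my_index and the search term are made up for illustration), the normalized sub-field then gives you case- and accent-insensitive exact matching with a term query, because the same lowercase and asciifolding steps are applied to the keyword:
GET my_index/_search
{
  "query": {
    "term": {
      "question.normalize": "que sera sera"
    }
  }
}
A document whose question was indexed as "Qué Será Será" should match here, whereas question.keyword would require an exact, byte-for-byte match.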
I tried to debug my synonym search. It seems that when I use the wordnet format with the wn_s.pl file it doesn't work, but when I use a custom synonym.txt file it works. Please let me know what I am doing wrong. Please find my index below:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"format": "wordnet",
"synonyms_path": "analysis/wn_s.pl"
}
},
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["lowercase",
"synonym"
]
}
},
"mappings": {
"properties": {
"firebaseId": {
"type": "text"
},
"name": {
"fielddata": true,
"type": "text",
"analyzer": "standard"
},
"name_auto": {
"type": "text"
},
"category_name": {
"type": "text",
"analyzer": "synonym"
},
"sku": {
"type": "text"
},
"price": {
"type": "text"
},
"magento_id": {
"type": "text"
},
"seller_id": {
"type": "text"
},
"square_item_id": {
"type": "text"
},
"square_variation_id": {
"type": "text"
},
"typeId": {
"type": "text"
}
}
}
}
}
}
}
I am trying to do a synonym search on category_name; I have items like shoes and dress, etc. When I search for boots, flipflop, or slipper, nothing comes back.
Here is my search query:
{
"query": {
"match": {
"category_name": "flipflop"
}
}
}
Your WordNet synonym format is not correct. Please have a look here.
For a quick implementation, please look at the synonyms.json.
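For reference, the WordNet Prolog format that "format": "wordnet" expects consists of s(...) entries that tie each word to a synset ID, roughly like this (an illustrative snippet, not your actual wn_s.pl contents):
s(100000001,1,'abstain',v,1,0).
s(100000001,2,'refrain',v,1,0).
s(100000001,3,'desist',v,1,0).
By contrast, the plain synonym.txt format you got working is just comma-separated lists (e.g. boots, flipflop, slipper, shoes), so the filter needs wn_s.pl to follow the s(...) structure exactly before it will expand anything.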
I am searching for a phrase in an email body. I need the results filtered exactly: if I search for 'Avenue New', it should return only results that contain the phrase 'Avenue New', not 'Avenue Street', 'Park Avenue', etc.
My mapping is:
{
"exchangemailssql": {
"aliases": {},
"mappings": {
"email": {
"dynamic_templates": [
{
"_default": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"doc_values": true,
"type": "keyword"
}
}
}
],
"properties": {
"attachments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"body": {
"type": "text",
"analyzer": "keylower",
"fielddata": true
},
"count": {
"type": "short"
},
"emailId": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "3s",
"number_of_shards": "1",
"provided_name": "exchangemailssql",
"creation_date": "1500527793230",
"analysis": {
"filter": {
"nGram": {
"min_gram": "4",
"side": "front",
"type": "edge_ngram",
"max_gram": "100"
}
},
"analyzer": {
"keylower": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
},
"email": {
"filter": [
"lowercase",
"unique",
"nGram"
],
"type": "custom",
"tokenizer": "uax_url_email"
},
"full": {
"filter": [
"lowercase",
"snowball",
"nGram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "0",
"uuid": "2XTpHmwaQF65PNkCQCmcVQ",
"version": {
"created": "5040099"
}
}
}
}
}
Here is my search query:
{
"query": {
"match_phrase": {
"body": "Avenue New"
}
},
"highlight": {
"fields" : {
"body" : {}
}
}
}
The problem here is that you're tokenizing the full body content using the keyword tokenizer, i.e. it will be one big lowercase string and you cannot search inside of it.
If you simply change the analyzer of your body field to standard instead of keylower, you'll find what you need using the match_phrase query.
"body": {
"type": "text",
"analyzer": "standard", <---change this
"fielddata": true
},
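Note that the analyzer of an existing field cannot be changed in place; a sketch of one way to apply the fix (the new index name exchangemailssql_v2 is just an example) is to create a new index with the corrected mapping and copy the data over with the _reindex API:
POST _reindex
{
  "source": { "index": "exchangemailssql" },
  "dest": { "index": "exchangemailssql_v2" }
}
After reindexing, the match_phrase query on body should return only documents containing the exact phrase "Avenue New".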
I have edge_ngram configured for a field.
Suppose the word indexed with edge_ngram is: quick
and it is analyzed as: q, qu, qui, quic, quick.
When I try to search for quickfull, words containing quick also come up in the results.
I want results only for words containing quickfull; otherwise it should return nothing.
This is my mapping:
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
Any help would be appreciated.
It's because your name field has "analyzer": "autocomplete", which means that the autocomplete analyzer will also be applied at search time, hence the search term quickfull will be tokenized to q, qu, qui, quic, quick, quickf, quickfu, quickful and quickfull and that matches quick as well.
In order to prevent this, you need to set "search_analyzer": "standard" on the name field to override the index-time analyzer.
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},
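To verify the effect, you can compare what the two analyzers do with the search term (the index name john_search is taken from the mapping above):
GET john_search/_analyze
{
  "analyzer": "standard",
  "text": "quickfull"
}
With the standard analyzer this produces the single token quickfull, so after adding search_analyzer only documents whose name actually starts with quickfull (and therefore has quickfull among its indexed edge n-grams) will match.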