I am trying to add autocomplete based on what the user searches.
Currently, I have the following mapping:
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Below is the code I used for the settings:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"verdict": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"standard": {
"type": "text"
}
}
},
"content": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"standard": {
"type": "text"
}
}
}
}
}
}
Here's what I would like to implement:
I would like to collect and store all of the queries made to the endpoint and use autocomplete on that. For example, to date, all of the users have made the following queries -
Real Madrid v/s Barcelona
Real Madrid Team
Real Madrid Coach
Barcelona v/s Man City
Sevilla Home Ground
Man Utd. recent results
Now, if anyone searches Rea then the following autocomplete queries should be suggested:
Real Madrid v/s Barcelona
Real Madrid Team
Real Madrid Coach
This is based on the searches made by all users to date, not by a single user. Further, I would like to analyze which queries were made most often in, say, the past month.
I am using ElasticSearch version 7.1 on AWS Elasticsearch service.
Edit: I have considerably digressed from the initial question as my need has evolved a bit. I apologise if this has caused any troubles.
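One possible way to approach this, sketched in Python as plain request bodies: log every search into a separate index and query that index for suggestions and statistics. The index layout, the field names (query, suggest, timestamp) and the function names are assumptions made for illustration, not something taken from the mapping above. The sketch relies on the completion suggester and a terms aggregation, both of which should be available in 7.1.

```python
from typing import Any, Dict


def query_log_index_body() -> Dict[str, Any]:
    """Index definition for a hypothetical index that logs every search."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "keyword"},        # exact text, used for counting
                "suggest": {"type": "completion"},   # used for prefix suggestions
                "timestamp": {"type": "date"},       # when the search was made
            }
        }
    }


def query_log_document(text: str, timestamp: str) -> Dict[str, Any]:
    """One logged search, to be indexed each time a user searches."""
    return {"query": text, "suggest": {"input": [text]}, "timestamp": timestamp}


def suggest_body(prefix: str, size: int = 5) -> Dict[str, Any]:
    """Completion-suggester request for everything starting with `prefix`."""
    return {
        "suggest": {
            "past_queries": {
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "skip_duplicates": True,  # the same query is logged many times
                },
            }
        }
    }


def top_queries_body(since: str = "now-1M/d", size: int = 10) -> Dict[str, Any]:
    """Most frequent logged queries since `since` (default: the past month)."""
    return {
        "size": 0,
        "query": {"range": {"timestamp": {"gte": since}}},
        "aggs": {"top_queries": {"terms": {"field": "query", "size": size}}},
    }
```

With this layout, suggest_body("Rea") would be sent to the _search endpoint of the log index, and top_queries_body() returns the counts in the top_queries buckets.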
I am currently using Elasticsearch's phonetic analyzer. I want the query to give a higher score to exact matches than to phonetic ones. Here is the query I am using:
{
"query": {
"multi_match" : {
"query" : "Abhijeet",
"fields" : ["content", "title"]
}
},
"size": 10,
"_source": [ "title", "bench", "court", "id_" ],
"highlight": {
"fields" : {
"title" : {},
"content":{}
}
}
}
When I search for Abhijeet, the top results are Abhijit, and Abhijeet only comes later. I want the exact matches to always appear first, followed by the phonetic ones. Can this be done?
Edit:
Mappings
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Here is the code I used to set up the phonetic analyzer:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
Now, I want to query only the title and the content field. Here, I want the exact matches to appear first and then the phonetic ones.
The general solution approach is:
to use a bool-query,
with your phonetic query/queries in the must clause,
and the non-phonetic query/queries in the should clause
I can update the answer if you include the mappings and settings of your index to your question.
Update: Solution Approach
A. Expand your mapping to use multi-fields for title and content:
"title": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
...
"content": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
B. Get the fields populated (e.g. by re-indexing everything):
POST courts_2/_update_by_query
C. Adjust your query to leverage the newly introduced fields:
GET courts_2/_search
{
"_source": ["title","bench","court","id_"],
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title", "content"]
}
},
"should": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title.standard", "content.standard"]
}
}
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
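If you build the request in application code, the same query can be generated for any search term. Below is a minimal Python sketch; the function name and the exact_boost parameter are my own additions for illustration, and the boost is optional (it only makes the exact-match clause weigh more relative to the phonetic one).

```python
from typing import Any, Dict


def exact_first_query(term: str, exact_boost: float = 2.0) -> Dict[str, Any]:
    """Phonetic match is required (must); exact match only adds score (should)."""
    return {
        "_source": ["title", "bench", "court", "id_"],
        "size": 10,
        "query": {
            "bool": {
                # Phonetic fields: decide WHICH documents match.
                "must": {
                    "multi_match": {"query": term, "fields": ["title", "content"]}
                },
                # Non-phonetic sub-fields: push exact matches to the top.
                "should": {
                    "multi_match": {
                        "query": term,
                        "fields": ["title.standard", "content.standard"],
                        "boost": exact_boost,
                    }
                },
            }
        },
        "highlight": {"fields": {"title": {}, "content": {}}},
    }
```

A document that matches only phonetically still satisfies the must clause, while a document that also matches the .standard sub-fields collects the extra score from the should clause.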
I have an index with movie franchises in it, and I would like exact word/phrase matches to score higher. For example, if I search for "Star Trek" I want "Star Trek" to score highest (first result), followed by "Star Trek Beyond" and "Star Trek Into Darkness". Currently when I search for "Star Trek", the titles with additional words score higher. Is this possible, and how?
Also, is it possible to get the same results as described above if there is some additional unmatched text around the search term, for example: "(randomText) Star Trek (randomText)"?
Here are my settings/mappings:
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"title_english": {
"type": "text",
"fields": {
"raw": { "type": "keyword" },
"space": { "type": "text", "analyzer": "whitespace" }
},
"analyzer": "autocomplete"
},
"title_native": {
"type": "text",
"fields": {
"raw": { "type": "keyword" },
"space": { "type": "text", "analyzer": "whitespace" }
},
"analyzer": "autocomplete"
},
"title_romaji": {
"type": "text",
"fields": {
"raw": { "type": "keyword" },
"space": { "type": "text", "analyzer": "whitespace" }
},
"analyzer": "autocomplete"
},
"title_synonyms": {
"type": "text",
"fields": {
"raw": { "type": "keyword" },
"space": { "type": "text", "analyzer": "whitespace" }
},
"analyzer": "autocomplete"
}
}
}
}
And here is my query:
'query': {
'bool': {
'must': {
'multi_match': {
'query': request.args.get('query'),
'analyzer': 'standard',
'fields': ['title_*']
},
},
'should': [{
'term': {
'title_*.raw': {
'value': request.args.get('query'),
'boost': 3
}
}
},
{
'prefix': {
'title_*.raw': {
'value': request.args.get('query'),
'boost': 2
}
}
}]
}
}
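One thing that may be contributing here: as far as I know, term and prefix queries take a single concrete field name and do not expand a wildcard such as title_*.raw the way the fields list of multi_match does, so those two should clauses may never match anything. A minimal sketch that spells the fields out instead is shown below. The helper name and the TITLE_FIELDS list are introduced for illustration (the field names come from the mapping above). Note also that the .raw sub-fields are keyword fields, so the term clause only matches when the text is identical to the stored title, including case.

```python
from typing import Any, Dict, List

# Concrete field names taken from the mapping; the list itself is a helper.
TITLE_FIELDS = ["title_english", "title_native", "title_romaji", "title_synonyms"]


def title_query(text: str) -> Dict[str, Any]:
    """Autocomplete match is required; exact and prefix matches on .raw add score."""
    should: List[Dict[str, Any]] = []
    for field in TITLE_FIELDS:
        raw = field + ".raw"
        # Whole title equals the search text: strongest boost.
        should.append({"term": {raw: {"value": text, "boost": 3}}})
        # Title starts with the search text: smaller boost.
        should.append({"prefix": {raw: {"value": text, "boost": 2}}})
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        "analyzer": "standard",
                        "fields": ["title_*"],
                    }
                },
                "should": should,
            }
        }
    }
```

In the Flask handler this would be called as title_query(request.args.get('query')).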
I tried to debug my synonym search. It seems that when I use the wordnet format with the wn_s.pl file it doesn't work, but when I use a custom synonym.txt file it works. Please let me know what I am doing wrong. My index is below:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"format": "wordnet",
"synonyms_path": "analysis/wn_s.pl"
}
},
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["lowercase",
"synonym"
]
}
},
"mappings": {
"properties": {
"firebaseId": {
"type": "text"
},
"name": {
"fielddata": true,
"type": "text",
"analyzer": "standard"
},
"name_auto": {
"type": "text"
},
"category_name": {
"type": "text",
"analyzer": "synonym"
},
"sku": {
"type": "text"
},
"price": {
"type": "text"
},
"magento_id": {
"type": "text"
},
"seller_id": {
"type": "text"
},
"square_item_id": {
"type": "text"
},
"square_variation_id": {
"type": "text"
},
"typeId": {
"type": "text"
}
}
}
}
}
}
}
I am trying to do a synonym search on category_name. I have items like shoes and dress, etc. When I search for boots, flipflop or slipper, nothing comes back.
Here is my search query:
{
"query": {
"match": {
"category_name": "flipflop"
}
}
}
Your wordnet synonym format is not correct. Please have a look here
For a fast implementation please look at the synonyms.json
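For reference, entries in the WordNet prolog file wn_s.pl have the shape s(synset_id,w_num,'word',ss_type,sense_number,tag_count). and words that share a synset_id are synonyms. If the custom synonym.txt route already works for you, one option is to convert the prolog entries into comma-separated synonym lines. The Python sketch below shows the idea; the function name is hypothetical, the sample entries in the usage note are made up, and this is not a claim about what exactly is wrong in your setup.

```python
import re
from collections import OrderedDict
from typing import Iterable, List

# s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
_S_LINE = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def wordnet_to_synonym_lines(lines: Iterable[str]) -> List[str]:
    """Group words by synset id and emit one 'a, b, c' line per group."""
    groups = OrderedDict()
    for line in lines:
        match = _S_LINE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        synset_id = match.group(1)
        # Apostrophes inside a word are written as '' in the prolog file.
        word = match.group(3).replace("''", "'").lower()
        words = groups.setdefault(synset_id, [])
        if word not in words:
            words.append(word)
    # A synset with a single word has no synonyms to declare.
    return [", ".join(words) for words in groups.values() if len(words) > 1]
```

For example, the two made-up entries s(100000001,1,'shoe',n,1,0). and s(100000001,2,'footwear',n,1,0). would produce the single line "shoe, footwear".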
I have read about previous version of ES (< 2) where the "token_analyzer" key needs to be changed to "analyzer". But no matter what I do I am still getting this error:
"type": "mapper_parsing_exception",
"reason": "analyzer on field [email] must be set when search_analyzer is set"
Here is what I am passing into ES via a PUT function when I get the error:
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings" : {
"uuser": {
"properties": {
"email": {
"type": "text",
"search_analyzer": "my_email_analyzer",
"fields": {
"email": {
"type": "text",
"analyzer": "my_email_analyzer"
}
}
},
"facebookId": {
"type": "text"
},
"name": {
"type": "text"
},
"profileImageUrl": {
"type": "text"
},
"signupDate": {
"type": "date"
},
"username": {
"type": "text"
},
"phoneNumber": {
"type": "text"
}
}
}
}
}
Any ideas what is wrong?
Because you have specified a search_analyzer for the field, you also have to specify the analyzer to be used at indexing time. For example, add this line under where you specify the search_analyzer:
"analyzer": "standard",
To give you this:
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings" : {
"uuser": {
"properties": {
"email": {
"type": "text",
"search_analyzer": "my_email_analyzer",
"analyzer": "standard",
"fields": {
"email": {
"type": "text",
"analyzer": "my_email_analyzer"
}
}
},
"facebookId": {
"type": "text"
},
"name": {
"type": "text"
},
"profileImageUrl": {
"type": "text"
},
"signupDate": {
"type": "date"
},
"username": {
"type": "text"
},
"phoneNumber": {
"type": "text"
}
}
}
}
}
See also: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
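If you want to catch this before sending the mapping, a small check along these lines can list the offending fields. This is only a sketch in Python; the function name is hypothetical, and it simply encodes the rule from the error message (a field that sets search_analyzer must also set analyzer).

```python
from typing import Any, Dict, List


def fields_missing_analyzer(properties: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return paths of fields that set search_analyzer without analyzer."""
    missing: List[str] = []
    for name, spec in properties.items():
        path = prefix + name
        if "search_analyzer" in spec and "analyzer" not in spec:
            missing.append(path)
        # Multi-fields ("fields") and object fields ("properties") nest further.
        for key in ("fields", "properties"):
            missing.extend(fields_missing_analyzer(spec.get(key, {}), path + "."))
    return missing
```

Run against the properties of the original mapping, it reports email; after adding "analyzer": "standard" it returns an empty list.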
I have edge_ngram configured for a field.
Suppose the word indexed with edge_ngram is: quick
and it is analyzed as: q, qu, qui, quic, quick
When I search for quickfull, documents containing quick also come up in the results.
I want only documents containing quickfull to be returned; otherwise there should be no results.
This is my mapping:
{
"john_search": {
"aliases": {},
"mappings": {
"drugs": {
"properties": {
"chemical": {
"type": "string"
},
"cutting_allowed": {
"type": "boolean"
},
"id": {
"type": "long"
},
"is_banned": {
"type": "boolean"
},
"is_discontinued": {
"type": "boolean"
},
"manufacturer": {
"type": "string"
},
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete"
},
"price": {
"type": "string",
"index": "not_analyzed"
},
"refrigerated": {
"type": "boolean"
},
"sell_freq": {
"type": "long"
},
"xtra_name": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1475061490060",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "false",
"type": "phonetic",
"encoder": "metaphone"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "100"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"dbl_metaphone": {
"filter": "my_metaphone",
"tokenizer": "standard"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "qoRll9uATpegMtrnFTsqIw",
"version": {
"created": "2040099"
}
}
},
"warmers": {}
}
}
Any help would be appreciated.
It's because your name field has "analyzer": "autocomplete", which means that the autocomplete analyzer will also be applied at search time. Hence the search term quickfull will be tokenized into edge n-grams (with your min_gram of 3: qui, quic, quick, quickf, quickfu, quickful and quickfull), and those tokens match the ones indexed for quick as well.
In order to prevent this, you need to set "search_analyzer": "standard" on the name field to override the index-time analyzer.
"name": {
"type": "string",
"boost": 2,
"fields": {
"exact": {
"type": "string",
"boost": 4,
"analyzer": "standard"
},
"phenotic": {
"type": "string",
"analyzer": "dbl_metaphone"
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard" <--- add this
},
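To see why the search analyzer makes the difference, here is a small Python simulation of the token overlap. It only imitates the edge_ngram filter on a single lowercase word with the min_gram of 3 and max_gram of 100 from the settings above; the function names are made up for the illustration and nothing here calls Elasticsearch.

```python
from typing import List, Set


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Prefixes of `token` from min_gram up to max_gram characters."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def shared_tokens(indexed_word: str, search_tokens: Set[str]) -> List[str]:
    """Tokens the search has in common with what was indexed for the word."""
    indexed = set(edge_ngrams(indexed_word, 3, 100))
    return sorted(indexed & search_tokens)


# Search analyzed with "autocomplete": the search term is n-grammed too.
autocomplete_search = set(edge_ngrams("quickfull", 3, 100))
# Search analyzed with "standard": the search term stays one token.
standard_search = {"quickfull"}

print(shared_tokens("quick", autocomplete_search))  # ['qui', 'quic', 'quick']
print(shared_tokens("quick", standard_search))      # []
```

With the standard search analyzer, a document containing only quick shares no token with the search term, while a document containing quickfull still matches because quickfull is one of its indexed n-grams.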