Elasticsearch ngram analyzer returns unexpected results - elasticsearch

I'm using an ngram analyzer for indexing and the standard analyzer for queries.
Currently I have indexed two documents: multiphone and iphone.
When I search for iphone, multiphone gets a higher score, and therefore ranks as more relevant, than iphone.
How should I build the query so that iphone gets the higher score?
The query that I execute is:
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
What I need is for iphone to score higher than multiphone.
What about performance?

I have answered a similar question here.
Basically, you need to add a raw version of the field to your mapping. You could use the keyword analyzer with a lowercase filter, set "index": "not_analyzed", or even use the default standard analyzer.
Then you run a bool query with an additional clause for the exact match, and documents that match exactly will be scored higher.
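To see why the ngram-indexed field alone cannot separate the two documents, here is a rough Python sketch. The ngrams helper and the min_gram/max_gram values are assumptions for illustration (the actual ngram_analyzer settings are not shown in the question), and it only mimics character n-gram tokenization, not Elasticsearch's scoring.

```python
def ngrams(text, min_gram=3, max_gram=10):
    """Return the set of character n-grams of text (hypothetical helper)."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


query_term = "iphone"
for doc in ("iphone", "multiphone"):
    # The n-gram field holds every substring in the configured length range.
    in_ngram_field = query_term in ngrams(doc)
    # A raw (not_analyzed) field holds the whole value as a single term.
    in_raw_field = query_term == doc
    print(doc, in_ngram_field, in_raw_field)
```

Under these assumed settings both documents contain the term iphone in the n-gram field, while only the iphone document matches on the raw value, which is what the extra exact-match clause exploits.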
EDIT: Example
You could map your englishName field as follows, adding a not_analyzed raw sub-field:
"englishName": {
"type": "string",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
You could do the same with aliasName.
Then your query would look something like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName.raw",
"aliasName.raw"
],
"boost": 5
}
}
]
}
}
}
With this query, the iphone document will be scored higher.
Hope this helps.
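As a minimal sketch, the same request body can be assembled in Python. The function name exact_boost_query is hypothetical, and the sketch assumes every searched field has a .raw sub-field as in the mapping above; it only builds the JSON and does not talk to a cluster.

```python
import json


def exact_boost_query(text, fields, boost=5):
    """Build a bool/should query: partial match plus a boosted exact match.

    Assumes each field has a not_analyzed ".raw" sub-field.
    """
    raw_fields = [field + ".raw" for field in fields]
    return {
        "query": {
            "bool": {
                "should": [
                    # Partial match against the ngram-indexed fields.
                    {"multi_match": {"query": text, "fields": list(fields)}},
                    # Exact match against the raw sub-fields, boosted.
                    {
                        "multi_match": {
                            "query": text,
                            "fields": raw_fields,
                            "boost": boost,
                        }
                    },
                ]
            }
        }
    }


body = exact_boost_query("iphone", ["englishName", "aliasName"])
print(json.dumps(body, indent=2))
```

The boost of 5 is the value from the answer; it may need tuning against real data.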

Related

Elastic search Match query with comma value not working

Hi, we wanted to support both partial search and exact match for one field, category.
Here is the mapping for category; we achieved this with fields.raw:
"category": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Everything works as expected: I am able to do both exact and partial searches.
But when the data contains a comma (","), the exact match does not work.
I am searching on category.name.raw, which is a not_analyzed field:
{ "query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "",
"type": "cross_fields",
"fields": [
"filed1",
"field2^12"
]
}
},
{
"match": {
"category.name.raw": " Poverty, Poor and Hunger"
}
}
]
}
}
}}}
I am not getting any results, and I am not sure what I am doing wrong. Please help me fix this.
Thanks in advance
Try using the analyzer below:
"lower_whitespace" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
For more details, see the documentation below:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/analysis-whitespace-analyzer.html
It also seems you are using an old version of Elasticsearch; consider upgrading.
The problem is:
{
"match": {
"category.name.raw": " Poverty, Poor and Hunger"
}
}
Whilst the targeted field is mapped as not_analyzed (aka keyword in newer versions of Elasticsearch), the query input here will be analyzed. I think it'll inherit the standard analyzer defined for the search_analyzer on category.name.
If you need an exact match, use a term query instead of the match query.
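A small Python sketch of that suggestion follows. The helper names term_clause and raw_matches are hypothetical; raw_matches only imitates how a term query compares the input against the single stored term of a not_analyzed field. Whether the leading space in the question's value is part of the stored data is not known, so the example treats it as an assumption.

```python
def term_clause(field, value):
    """Build a term query clause; the value is not analyzed at search time."""
    return {"term": {field: value}}


def raw_matches(stored, value):
    """Imitate a term lookup on a not_analyzed field: exact string equality."""
    return stored == value


stored = "Poverty, Poor and Hunger"  # assumed stored value, without leading space
clause = term_clause("category.name.raw", stored)
print(clause)

# The comma is harmless, but any extra whitespace makes the comparison fail.
print(raw_matches(stored, "Poverty, Poor and Hunger"))
print(raw_matches(stored, " Poverty, Poor and Hunger"))
```

Because the comparison is exact, the search input has to equal the indexed value character for character, including case, punctuation and whitespace.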

Adding fuzziness conditionally in ElasticSearch

I have ten or so fields in all my documents: One in particular is product_code which is unique per document and of type string.
I have a match query on _all that works well, but I would like to perform a "fuzzy match" while preserving the ability to search for exact product_code
Here's what I've attempted:
"query": {
"bool": {
"should": [
{
"match": {
"product_code": {
"query": searchString,
"operator": "AND"
}
}
},
{
"match": {
"_all": {
"query": searchString,
"operator": "AND",
"fuzziness": 2,
"prefix_length": 2
}
}
}
]
}
}
The problem with this approach is that the fuzziness is being applied to searches for product_code as well because it's included in _all.
Is there a way to either perform the search on product_code first and if no results are found, perform the search on _all, or exclude product_code from the _all query?
Any help is greatly appreciated.
Yes, you can exclude product_code from _all using the following mapping:
PUT index_name
{
"settings": {
"analysis": {
"analyzer": {},
"filter": {}
}
},
"mappings": {
"type_name": {
"properties": {
"product_code": {
"type": "string",
"include_in_all": false
}
}
}
}
}
Alternatively, you can use a query_string search, which also offers fuzziness.
The following query uses query_string with the AND operator and fuzziness settings:
{
"query": {
"bool": {
"should": [{
"query_string": {
"fields": ["product_code", "other_field"],
"query": "this is my string",
"default_operator": "AND",
"fuzziness": 2,
"fuzzy_prefix_length": 2
}
}, {
"match": {
"product_code": {
"query": "this is my string",
"operator": "AND"
}
}
}]
}
}
}
Hope this helps
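The other option raised in the question, searching product_code first and falling back to _all only when nothing is found, can be done on the client side. The sketch below is one possible shape, not a definitive implementation: search is a hypothetical callable that sends a request body to Elasticsearch and returns a list of hits, and the function names are made up for illustration.

```python
def exact_query(text):
    """Exact (non-fuzzy) match on product_code."""
    return {
        "query": {
            "match": {"product_code": {"query": text, "operator": "AND"}}
        }
    }


def fuzzy_all_query(text):
    """Fuzzy match on _all, using the settings from the question."""
    return {
        "query": {
            "match": {
                "_all": {
                    "query": text,
                    "operator": "AND",
                    "fuzziness": 2,
                    "prefix_length": 2,
                }
            }
        }
    }


def search_exact_then_fuzzy(search, text):
    """Run the exact query first; fall back to the fuzzy one if it is empty.

    `search` is a hypothetical callable: request body in, list of hits out.
    """
    hits = search(exact_query(text))
    if hits:
        return hits
    return search(fuzzy_all_query(text))
```

This costs a second round trip when the exact search is empty, but it keeps fuzziness away from product_code without changing the mapping.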

Elastic : search two terms, one on _all, other one on a field

I would like to mix a search on a whole document (eg "developer") and a search on some field for another term (eg "php").
I can do each search separately but I can't mix them.
Here is my example (simplified to show only my issue):
{
"query": {
"function_score": {
"query": {
"match": {
"_all": "developer"
},
"multi_match": {
"query": "php",
"fields": [
"skills.description",
"skills.description",
"skills.details"
],
"operator": "or",
"type": "most_fields"
}
}
}
}
If I run this example I get an error:
Parse Failure [Failed to parse source
Is there a way to search on both _all and specific fields with two terms?
Thanks.
Yes, you're almost there, you need to combine them into a bool/must query:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"_all": "developer"
}
},
{
"multi_match": {
"query": "php",
"fields": [
"skills.description",
"skills.description",
"skills.details"
],
"operator": "or",
"type": "most_fields"
}
}
]
}
}
}
}
}
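The original request failed because a single "query" object can hold only one query type, so match and multi_match cannot sit side by side in it. As a rough sketch, a small Python helper (must_all is a hypothetical name) can wrap any number of clauses in bool/must:

```python
import json


def must_all(*clauses):
    """Wrap query clauses in function_score > bool > must."""
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"must": list(clauses)}}
            }
        }
    }


body = must_all(
    {"match": {"_all": "developer"}},
    {
        "multi_match": {
            "query": "php",
            "fields": ["skills.description", "skills.details"],
            "operator": "or",
            "type": "most_fields",
        }
    },
)
print(json.dumps(body, indent=2))
```

Every clause in must has to match, so a document is returned only if it mentions developer somewhere and php in one of the skills fields.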

ElasticSearch multi_match query over multiple fields with Fuzziness

How can I add fuzziness to a multi_match query? So if someone is to search for 'basball' it would still find 'baseball' articles. Currently my query looks like this:
POST /newspaper/articles/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "baseball",
"type": "phrase",
"fields": [
"subject^3",
"section^2.5",
"article^2",
"tags^1.5",
"notes^1"
]
}
}
}
}
}
One option I was looking at is to do something like this, just don't know if this is the best option. It's important to keep the sorting based on the scoring:
"query" : {
"query_string" : {
"query" : "subject:basball^3 section:basball^2.5 article:basball^2",
"fuzzy_prefix_length" : 1
}
}
Suggestions?
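To add fuzziness to a multi_match query, add the fuzziness property as described here. Note that fuzziness is not supported together with "type": "phrase" (or phrase_prefix), so the example below uses the default best_fields type instead:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "baseball",
"type": "best_fields",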
"fields": [
"subject^3",
"section^2.5",
"article^2",
"tags^1.5",
"notes^1"
],
"fuzziness" : "AUTO",
"prefix_length" : 2
}
}
}
}
}
Note that prefix_length is explained in the docs as:
The number of initial characters which will not be “fuzzified”. This helps to reduce the number of terms which must be examined. Defaults to 0.
To check the possible values of fuzziness please visit the ES docs.
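As an illustration of how "fuzziness": "AUTO" and prefix_length interact, here is a simplified Python model. It assumes the documented default AUTO thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two) and uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition as a single edit. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Allowed edits under the default AUTO thresholds (assumed 3 and 6)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def would_match(query_term, indexed_term, prefix_length=2):
    """Simplified check: the prefix must be identical, the rest within AUTO."""
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(would_match("basball", "baseball"))
print(would_match("casball", "baseball"))
```

In this model basball reaches baseball with a single insertion, while casball is rejected because the typo falls inside the two-character prefix that is never fuzzified.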

Elasticsearch: how to disable scoring on a field?

I am new to Elasticsearch and please forgive me if the answer is obvious.
Here is what I have for the mapping of the field in question:
"condition" : { "type" : "string", "store" : "no", "index": "not_analyzed", "omit_norms" : "true" }
I need search on this field, but I need 100% string match (no stemming, etc.) on a sub-string (blank separated). An example of this field in a document is as follows:
{
"condition": "abc xyz"
}
An example query is:
/_search?q=condition:xyz
Is the above mapping correct? I also used omit_norms (true). Is this a correct thing to do in my case?
How can I disable scoring on this field? Can I do it in mapping? What is the best way of doing it? (Actually I need to disable scoring on more than one. I do have fields that need scoring)
Thanks and regards!
Setting omit_norms to true means the length of the field is not taken into account for scoring, because Elasticsearch won't index the norms information. So if you don't need scoring on this field, it is a good thing to do, and it will also save you some disk space.
If you're not interested in scoring in your queries, use a filtered query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": {
"term": {
"condition": "abc xyz"
}
}
}
}
}
}
}
The filtered query is deprecated in newer versions; the equivalent syntax is now:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"condition": "abc xyz"
}
}
}
}
}
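One caveat for the blank-separated case in the question: a not_analyzed field is indexed as a single term, so a term lookup for xyz alone would not find "abc xyz". The Python sketch below imitates this under simplifying assumptions; index_terms and term_matches are hypothetical helpers, and str.split() only approximates a whitespace tokenizer.

```python
def index_terms(value, whitespace_analyzed):
    """Terms stored for a field value.

    not_analyzed: the whole value is one term.
    whitespace analyzed (approximated with str.split): one term per word.
    """
    return value.split() if whitespace_analyzed else [value]


def term_matches(value, query_term, whitespace_analyzed):
    """Imitate a term query: the query term must equal a stored term."""
    return query_term in index_terms(value, whitespace_analyzed)


value = "abc xyz"
print(term_matches(value, "xyz", whitespace_analyzed=False))
print(term_matches(value, "abc xyz", whitespace_analyzed=False))
print(term_matches(value, "xyz", whitespace_analyzed=True))
```

So matching individual blank-separated words exactly, without stemming, would need the field analyzed with something like a whitespace tokenizer rather than left as not_analyzed.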

Resources