Fuzzy score to include only best match found in query text - Elasticsearch - elasticsearch

Let's say the query text contains multiple words that are close to the word "Raquel", which is indexed in field1.
The problem is that the score increases as the number of terms in the query text increases.
To elaborate:
if query text is "Raquel", score is 5 (let's assume)
if query text is "Raquei", score is 4.9 (let's assume)
if query text is "Raquel Raquei Raque", the score increases to, let's say, 15. I need this score to be 5: just the best of the scores evaluated for each query term against a specific term in a field. Is there any way I can achieve this?
Here's the query:
"query": {
  "bool": {
    "must": [{
      "multi_match": {
        "query": "Raquel Raquei Raque",
        "fields": ["filed1", "filed2"],
        "fuzziness": "AUTO",
        "minimum_should_match": "1"
      }
    }]
  }
}
Mappings for the fields used in query:
"filed1": {
  "type": "text",
  "analyzer": "standard_rebuilt",
  "index_options": "docs"
},
"filed2": {
  "type": "text",
  "analyzer": "standard_rebuilt",
  "index_options": "docs"
}
where the standard_rebuilt analyzer uses the unique token filter.
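One possible approach, which is not from the original post, is to stop sending all the words through a single multi_match and instead wrap one fuzzy multi_match per word in a dis_max query with a tie_breaker of 0. dis_max then keeps only the highest-scoring clause, so extra near-duplicate words should not add to the score. The sketch below builds such a query body in Python. The helper name best_term_fuzzy_query is hypothetical, and splitting on whitespace is an assumption that only approximates what the standard_rebuilt analyzer would do.

```python
def best_term_fuzzy_query(query_text, fields, fuzziness="AUTO"):
    """Build a dis_max body whose score is the best single-term score.

    Hypothetical helper: terms are split on whitespace, which only
    approximates the tokens the real analyzer would produce.
    """
    # De-duplicate terms while keeping their original order.
    terms = list(dict.fromkeys(query_text.split()))
    per_term = [
        {
            "multi_match": {
                "query": term,
                "fields": fields,
                "fuzziness": fuzziness,
            }
        }
        for term in terms
    ]
    # tie_breaker 0.0: only the highest-scoring clause contributes.
    return {"query": {"dis_max": {"queries": per_term, "tie_breaker": 0.0}}}


query = best_term_fuzzy_query("Raquel Raquei Raque", ["filed1", "filed2"])
```

The resulting dict can be sent as the search request body. Whether the final score is exactly the single-term score depends on the similarity settings, so treat this as a starting point to test against your index.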

Related

Getting results for multi_match cross_fields query in elasticsearch with custom analyzer

I have an Elasticsearch 5.3 server with products.
Each product has a 14 digit product code that has to be searchable by the following rules: the complete code should match, and so should a search term containing only the last 9, the last 6, the last 5 or the last 4 digits.
To achieve this I created a custom analyzer which creates the appropriate tokens at index time using the pattern capture token filter. This seems to be working correctly: the _analyze API shows that the correct terms are created.
To fetch the documents from Elasticsearch I'm using a multi_match cross_fields bool query to search a number of fields simultaneously.
When I have a query string with one part that matches a product code and another part that matches any of the other fields, no results are returned, but when I search for each part separately the appropriate results are returned. Also, when I have multiple parts spanning any of the fields except the product code, the correct results are returned.
My mapping and analyzer:
PUT /store
{
"mappings": {
"products":{
"properties":{
"productCode":{
"analyzer": "ProductCode",
"search_analyzer": "standard",
"type": "text"
},
"description": {
"type": "text"
},
"remarks": {
"type": "text"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"ProductCodeNGram": {
"type": "pattern_capture",
"preserve_original": "true",
"patterns": [
"\\d{5}(\\d{9})",
"\\d{8}(\\d{6})",
"\\d{9}(\\d{5})",
"\\d{10}(\\d{4})"
]
}
},
"analyzer": {
"ProductCode": {
"filter": ["ProductCodeNGram"],
"type": "custom",
"preserve_original": "true",
"tokenizer": "standard"
}
}
}
}
}
The query
GET /store/products/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "[query_string]",
"fields": ["productCode", "description", "remarks"],
"type": "cross_fields",
"operator": "and"
}
}
]
}
}
}
Sample data
POST /store/products
{
"productCode": "999999123456789",
"description": "Foo bar",
"remarks": "Foobar"
}
The following query strings all return one result:
"456789", "foo", "foobar", "foo foobar".
But the query_string "foo 456789" returns no results.
I am very curious as to why the last search does not return any results. I am convinced that it should.
The problem is that you are doing a cross_fields over fields with different analysers. Cross fields only works for fields using the same analyser. It in fact groups the fields by analyser before doing the cross fields. You can find more information in this documentation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#_literal_cross_field_literal_and_analysis
Although cross_fields needs the same analyzer across the fields it operates on, I've had luck using the tie_breaker parameter to allow other fields (that use different analyzers) to be weighed for the total score.
This has the added benefit of allowing per-field boosting to be calculated in the final score, too.
Here's an example using your query:
GET /store/products/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "[query_string]",
"fields": ["productCode", "description", "remarks"],
"type": "cross_fields",
"tie_breaker": 1 # You may need to tweak this
}
}
]
}
}
}
I also removed the operator field, as I believe using the "AND" operator will cause fields that don't have the same analyzer to be scored inappropriately.
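Another workaround, which is not from the answers above, is to require each search word separately: one multi_match per word inside bool/must, so that every word has to match in at least one field, whichever analyzer that field uses. A minimal sketch in Python follows. The helper name per_term_query is hypothetical, and splitting on whitespace is an assumption about how the query string should be broken up.

```python
def per_term_query(query_string, fields):
    """Build a bool/must body with one multi_match clause per word.

    Hypothetical helper: each whitespace-separated word must match in
    at least one of the given fields.
    """
    terms = query_string.split()
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": term, "fields": fields}}
                    for term in terms
                ]
            }
        }
    }


body = per_term_query("foo 456789", ["productCode", "description", "remarks"])
```

With the sample document, "foo" would be expected to match description and "456789" to match productCode, so both must clauses can be satisfied. Scoring differs from cross_fields, so the ranking may need tuning.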

How to force certain fields in multi_match to have exact match

I am trying to match the title of a product listing to a database of known products. My first idea was to put the known products and their metadata into elasticsearch and try to find the best match with multi_match. My current query is something like:
{
  "query": {
    "multi_match": {
      "query": "Men's small blue cotton pants SKU123",
      "fields": ["sku^2", "title", "gender", "color", "material", "size"],
      "type": "cross_fields"
    }
  }
}
The problem is that it sometimes returns products with the wrong color. Is there a way I could modify the above query to only score items in my index that have a color field equal to a word that exists in the query string? I am using Elasticsearch 5.1.
If you want elasticsearch to score only items that meet certain criteria then you need to use the terms query in a filter context.
Since the terms query does not analyze your query, you'll have to do that yourself. Something simple would be to tokenize by whitespace and lowercase and generate a query that looks like this:
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "color": ["men's", "small", "blue", "cotton", "pants", "sku123"]
        }
      },
      "must": {
        "multi_match": {
          "query": "Men's small blue cotton pants SKU123",
          "fields": [
            "sku^2",
            "title",
            "gender",
            "material",
            "size"
          ],
          "type": "cross_fields"
        }
      }
    }
  }
}
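The "tokenize by whitespace and lowercase" step can be done on the client before the query is sent. Here is a rough sketch in Python that generates the query above from the raw title. The function name color_filtered_query is hypothetical, and it assumes the color field is indexed as lowercase terms.

```python
def color_filtered_query(text):
    """Build the bool query above from a raw product title.

    Hypothetical helper: tokens are produced by lowercasing and
    splitting on whitespace, since the terms query is not analyzed.
    """
    tokens = text.lower().split()
    return {
        "query": {
            "bool": {
                # Filter context: restricts the hits without scoring.
                "filter": {"terms": {"color": tokens}},
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": ["sku^2", "title", "gender", "material", "size"],
                        "type": "cross_fields",
                    }
                },
            }
        }
    }


body = color_filtered_query("Men's small blue cotton pants SKU123")
```

Only documents whose color is one of the tokens pass the filter, and the multi_match then scores them on the remaining fields.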

ElasticSearch Multi-match and scoring

I'm using the following query on Elasticsearch 2.3.3:
es_query = {
    "fields": ["title", "content"],
    "query": {
        "multi_match": {
            "query": "potato tomato",
            "type": "best_fields",
            "fields": ["title_cuis", "content_cuis"]
        }
    }
}
I would like the results to be scored so that the first document returned is the one that contains the highest occurrence of the words "tomato" and "potato", but this doesn't seem to happen and I was wondering how I can modify the query to get that without re-indexing.
You're using best_fields, which takes only the highest score produced by either title_cuis or content_cuis on its own, rather than combining them.
Take a look at the cross_fields type.
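Since the type is a query-time setting, trying another one does not need a re-index. As a small sketch, the Python helper below copies the query from the question and swaps the multi_match type, which makes it easy to compare cross_fields with the other types. The helper name with_match_type is made up for illustration.

```python
import copy

es_query = {
    "fields": ["title", "content"],
    "query": {
        "multi_match": {
            "query": "potato tomato",
            "type": "best_fields",
            "fields": ["title_cuis", "content_cuis"],
        }
    },
}


def with_match_type(query, match_type):
    """Return a copy of the query with a different multi_match type."""
    # Deep copy so the original query dict is left untouched.
    changed = copy.deepcopy(query)
    changed["query"]["multi_match"]["type"] = match_type
    return changed


cross_fields_query = with_match_type(es_query, "cross_fields")
```

Comparing the ranking for a few known documents is probably the quickest way to see which type behaves as you want.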

ElasticSearch sorting document based on fields that the phrase is found

In Elasticsearch, how do I sort documents based on which of the following fields a phrase is found in?
Search Phrase: Miami
Fields: Title, Content, Topics
A document where the phrase is found in Title, Content and Topics should show before documents where the phrase is only found in Content.
Maybe there is a way to say:
if phrase found in Title then weight 2
if phrase found in Content then weight 1.5
if phrase found in Topics then weight 1
and the final ordering would use sum(weight) together with _score.
My current query looks like this:
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title",
"content",
"topics",
"destinations"
]
}
}
}
}
You can boost fields with the caret ^ notation to score them higher than other matching fields:
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title^10",
"content^3",
"topics",
"destinations"
]
}
}
}
}
Here I have applied a weight of 10 to title and weight of 3 to content. Documents will be returned in decreasing _score order so you need to boost scores in fields that you consider more important; the values to use for boosting are up to you and may require a little trial and improvement to return documents in your preferred order.
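Because finding good boost values takes some experimenting, it can help to keep the weights in one place and generate the fields list from them. A minimal sketch in Python, where the helper name boosted_fields is hypothetical and the weights are the ones from this answer:

```python
def boosted_fields(weights):
    """Turn {"title": 10, ...} into ["title^10", ...] for multi_match.

    Hypothetical helper: a weight of 1 is the default boost, so the
    plain field name is emitted for it.
    """
    return [
        name if weight == 1 else f"{name}^{weight}"
        for name, weight in weights.items()
    ]


fields = boosted_fields({"title": 10, "content": 3, "topics": 1, "destinations": 1})

body = {
    "query": {
        "multi_match": {"query": "miami", "type": "phrase", "fields": fields}
    }
}
```

Changing a single number in the dict and re-running the search makes it easier to compare orderings.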

elasticsearch scoring unique terms vs ngram terms

I've figured out how to return results for a partial word using ngrams, but now I'd like to arrange (score or sort) my results so that full-term matches come first, followed by partial-term matches.
For example, the user searches a movie db for 'we'. I want 'we are marshall' and similar titles to show up at the top, and not 'north by northwest' (the 'we' is in 'northwest').
Currently this is my mapping for the title field:
"title": {
"type": "string",
"analyzer": "ngramAnalyer",
"fields": {
"term": {
"type": "string",
"analyzer": "fullTermCaseInsensitive"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
I've created a multifield where ngramAnalyzer is a custom ngram analyzer, term uses a keyword tokenizer with a standard filter, and raw is not_analyzed.
My query is as follows:
"query": {
  "function_score": {
    "functions": [
      {
        "script_score": {
          "script": "_score * (1+ (1 / doc['salesrank'].value) )"
        }
      }
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match_phrase": {
              "title": {
                "query": "we",
                "max_expansions": 10
              }
            }
          }
        ],
        "should": {
          "term": {
            "title.term": {
              "value": "we",
              "boost": 10
            }
          }
        }
      }
    }
  }
}
I'm basically requiring that the ngram must be matched, and saying that the term 'we' should be matched and, if it is, boosted.
This isn't working, of course.
Any ideas?
Edit
To add further complexity: how would I match first on the exact title, then on a custom score?
I've taken some stabs at it, but they don't seem to work.
for example:
input: 'game'
results should be ordered by exact match 'game'
followed by a custom score based on a sales rank (integer)
so that the next results after 'game' might be something like 'hunger games'
What about a bool combination of boosted queries, where the first clause matches against the full term with a 10x boost factor and another clause matches against the ngram field with the standard boost factor?
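One way to read that suggestion is a bool query with two should clauses, sketched below in Python. The function name exact_then_ngram_query is hypothetical, and the field names are taken from the mapping in the question. Since title.term is described as using a keyword tokenizer, the boosted clause would only match when the whole title equals the search text, which fits the "exact title first" case from the edit.

```python
def exact_then_ngram_query(text, exact_boost=10):
    """Build a bool/should body: boosted full-term clause plus ngram clause.

    Hypothetical helper: "title.term" and "title" are the sub-field and
    ngram field from the question's mapping.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-term match, boosted so it ranks first.
                    {"match": {"title.term": {"query": text, "boost": exact_boost}}},
                    # Partial (ngram) match with the default boost.
                    {"match": {"title": {"query": text}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = exact_then_ngram_query("game")
```

This bool query could then replace the inner query of the function_score in the question if the sales rank should still influence the order of the remaining hits.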
