Elasticsearch match string to field with fuzziness

I'm trying to match a whole string against a whole field, allowing only for fuzziness (typos).
For example, with these documents:
{ title: "replace oilfilter" }, { title: "replace motoroil" }
The following queries should match only the first document:
"Replace oilfilter", "Replace oilsfilter", "Replaze oilfilter"
The following queries should NOT match any document:
"replace", "oilfilter", "motoroil"
What I got so far is the following:
index
I'm using the keyword analyzer so that the (potential) phrase is treated as a single term. This way a search for "replace" does not match a document, but a search for the exact term "Replace oilfilter" does.
"mappings": {
"blacklist": {
"properties": {
"title": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
search
I've tried multiple queries to search the documents. I got close with the following query:
"query": {
"query_string": {
"default_field": "title",
"fuzziness": "3",
"query": query
}
}
results
With this query the following are the results:
> "Replace oilfilter" (exact words)
< doc: { title: "replace oilfilter" }, score: 0.5753..
< doc: { title: "replace motoroil" }, score: 0.2876..
> "Replace iolfilter" (typo)
< doc: { title: "replace oilfilter" }, score: 0.2876..
> "oilfilter" (other term)
< doc: { title: "replace oilfilter" }, score: 0.2876..
problem
The results aren't that bad, but I need the scores to be more accurate. The second query with only the simple typo should get a much higher score than the second result in the first query and the only result in the third query.
What I'm trying to achieve is that it matches the whole query against the whole field in the document, that's why I'm using keyword analyzer. On top of that I only want to apply some fuzziness.
Hope someone can shed some light on this issue.
Thanks!
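To make the requirement concrete, here is a minimal sketch of what "whole query against whole field, plus some fuzziness" means in terms of edit distance. It uses a plain Levenshtein distance written from scratch and assumes the query has already been lowercased; it is an illustration, not how Elasticsearch is implemented. Elasticsearch's fuzzy matching by default also counts a swap of two adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


indexed = "replace oilfilter"  # the whole field value, kept as one term
queries = [
    "replace oilfilter", "replace oilsfilter", "replaze oilfilter",  # should match
    "replace", "oilfilter", "motoroil",                              # should not
]
distances = {q: levenshtein(q, indexed) for q in queries}
for query, distance in distances.items():
    print(f"{query!r}: distance {distance}")
```

The three queries that should match are within one edit of the indexed value, while the three that should not match are many edits away, so a small fuzziness applied to the whole term separates the two groups.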

The following search should achieve what you want:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "replace oilfliter",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
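You can increase the minimum_should_match to 100% if you want to require a match on all the query terms, no matter how long the query string is. Note that fuzzy matching is capped at an edit distance of 2, so depending on the Elasticsearch version a fuzziness of "3" may be rejected or treated as 2; "2" or "AUTO" is the safer value.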
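The effect of the percentage depends on how many terms the query has. The following sketch assumes the documented rule that the number computed from a positive percentage is rounded down; the helper name required_clauses is made up for illustration.

```python
def required_clauses(num_terms: int, percent: int) -> int:
    """Optional clauses that must match for a positive percentage (rounded down)."""
    return num_terms * percent // 100


for terms in (2, 3, 4):
    print(terms, "terms at 75% ->", required_clauses(terms, 75), "required")
print(2, "terms at 100% ->", required_clauses(2, 100), "required")
```

Under that assumption, a two-term query such as "replace oilfliter" at 75% only needs one matching term, so "replace motoroil" can still match on "replace". With 100% both terms are required.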

Related

increase score of query where all text match and not repeating words

I'm using the following query, but it gives a higher score to documents that repeat a subset of the words I typed than to documents that match the entire sentence.
For example:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
A document whose field value is "test test test" gets a higher score than a document whose field value is "test in maths".
How can I get a higher score for the exact sentence match rather than for repeated words?
Thanks in advance.
If you want to search exact sentences/phrases you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should-clause that contains the match-phrase query to boost the score of exact phrases to your current query.
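A minimal sketch of that combination, written as a Python function that builds the request body. The function name, the boost of 2.0 and the fuzziness of 2 (instead of the "3" in the question, since edit distance is capped at 2) are assumptions for illustration.

```python
import json


def build_query(text: str, field: str = "title") -> dict:
    """Keep the fuzzy multi_match as a requirement and reward exact phrases."""
    return {
        "query": {
            "bool": {
                # Required: the original fuzzy multi_match from the question.
                "must": {
                    "multi_match": {
                        "query": text,
                        "fuzziness": 2,  # assumed value, see lead-in
                        "fields": [field],
                        "minimum_should_match": "75%",
                        "type": "most_fields",
                    }
                },
                # Optional: documents containing the exact phrase score higher.
                "should": [
                    {"match_phrase": {field: {"query": text, "boost": 2.0}}}
                ],
            }
        }
    }


body = build_query("test in maths")
print(json.dumps(body, indent=2))
```

Because the phrase clause sits in should, it never excludes a document; it only adds to the score of documents where the words appear in the same order as in the query.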
You can use the match_phrase query for an exact match. match_phrase matches only when the query terms occur in the field in the same sequence as in the query.
For example:
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"title": "test in maths"
}
}]
}
}
}
Edit after a comment:
Use the following mapping:
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
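and then you can use a normal match query. With index_options set to docs, Elasticsearch does not index term frequencies for the title field, so repeated words no longer raise the score. Note that phrase queries need positions, so match_phrase will not work on a field mapped this way.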

multi_match fuzzy query across multiple fields

I am trying to match a 'term' against multiple fields (or the _all field).
I want to do a fuzzy match with cross_fields, but that is not supported.
Any ideas how to do it, or other ways to achieve the same result?
query: {
multi_match: {
query: term,
type: "cross_fields",
fields: ['_all']
}
}
When trying the solution from "ElasticSearch multi_match query over multiple fields with Fuzziness", I get this error:
[parsing_exception] Fuziness not allowed for type [cross_fields], with
{ line=1 & col=128 }
Elasticsearch version: 5.0
Edit:
Here is the query I am building:
bool: {
must: [
{
fuzzy: {
_all: term
}
},
{
fuzzy: {
"location.country": country
}
},
{
fuzzy: {
"location.city": city
}
}
]
}
cross_fields works by looking for each term across all of the listed fields, as if they were one combined field. Since fuzziness isn't supported for cross_fields, you have to write the query in a different way.
One possibility is to implement your own "cross_fields" with should clauses and add the fuzziness there.
Say your search text is "term1 term2": split it on word boundaries (regex \b), then add one should clause per term and field, in this form:
{
"query": {
"bool": {
"should": [{
"match": {
"field1": {
"query": "term1",
"fuzziness": 1
}
}
},{
"match": {
"field1": {
"query": "term2",
"fuzziness": 1
}
}
},{
"match": {
"field2": {
"query": "term1",
"fuzziness": 1
}
}
},{
"match": {
"field2": {
"query": "term2",
"fuzziness": 1
}
}
}
]
}
}
}
This is probably less than optimal if you have many fields, because the query becomes a cartesian product of the terms and the fields.
Important note: you're using the _all field, which is a single field that all other fields are indexed into. Maybe you don't even need cross_fields?
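The term-by-field expansion described in this answer is tedious to write by hand, so it can be generated. This is a sketch only: the function name is made up, and it splits on whitespace rather than on the \b regex mentioned above.

```python
import json
from itertools import product


def fuzzy_cross_fields(text: str, fields: list, fuzziness: int = 1) -> dict:
    """Build one fuzzy match clause per (field, term) pair inside a should."""
    terms = text.split()  # simple whitespace split, an assumption
    should = [
        {"match": {field: {"query": term, "fuzziness": fuzziness}}}
        for field, term in product(fields, terms)
    ]
    return {"query": {"bool": {"should": should}}}


body = fuzzy_cross_fields("term1 term2", ["field1", "field2"])
print(json.dumps(body, indent=2))
```

Two terms and two fields give four clauses; the count grows as terms times fields. All clauses are optional here, so a document matching any single clause is returned, which is looser than cross_fields with an "and" operator.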

How to add fuzziness to search query in elasticsearch?

I'm trying to implement fuzziness on a particular field in a cross-fields query. It's a bit difficult though.
So the query should:
Match phrases across fields.
Match an exact match against partNumber and barcode (no fuzziness)
Match fuzzy terms against title and subtitle.
The query that I have so far is below. Note that the fuzziness isn't working at all in this query.
It should match one result, which has "Amazing t-Shirt" in the title and "Blue" in the subtitle (note the spelling error in the query).
Is it possible to implement the fuzziness at the index mapping level instead? Title and subtitle are quite short in the data set - maybe 30 - 40 characters combined maximum.
Otherwise how can I add fuzziness to the title and subtitle in the query?
{
"query": {
"multi_match": {
"query": "Bleu Amazing T-Shirt",
"fuzziness": "auto",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
],
"type": "cross_fields"
}
},
"fields": [
"identity.partNumber",
"identity.barcode",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
]
}
Well, fuzzy search doesn't seem to be supported with cross_fields; there were a few related issues. So instead of a cross_fields search, I copied the title and subtitle to a new field at index time and split the query as below. It seems to work for my test cases at least.
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "{{searchTerm}}",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn"
],
"type": "best_fields"
}
},
{
"match": {
"fuzzyFields": {
"query": "{{searchTerm}}",
"operator": "and",
"fuzziness": "auto"
}
}
}
]
}
}
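The answer does not say how the title and subtitle were copied into fuzzyFields. One way this could be done is with the copy_to mapping parameter; the sketch below builds such a mapping as a Python dict. The field types and the typeless mapping layout are assumptions, not taken from the answer.

```python
import json


def build_mapping() -> dict:
    """Mapping that copies title and subtitle into one combined text field."""
    copied_text = {"type": "text", "copy_to": "fuzzyFields"}  # assumed types
    return {
        "mappings": {
            "properties": {
                "descriptions": {
                    "properties": {
                        "title": dict(copied_text),
                        "subtitle": dict(copied_text),
                    }
                },
                # Target field that the fuzzy match clause searches.
                "fuzzyFields": {"type": "text"},
            }
        }
    }


mapping = build_mapping()
print(json.dumps(mapping, indent=2))
```

With a combined field like this, the fuzzy match clause with operator "and" sees the title and subtitle terms together, which is roughly what cross_fields provided for the non-fuzzy case.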

Finding an exact phrase in multiple fields with Elasticsearch

I want to find an exact phrase (for instance, "the quick brown fox") across multiple fields in a document.
Right now, I'm using something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox"
}
},
"filter": {
"and": [
{
"term": {
"priority": "high"
}
}
...more ands
]
}
}
}
}
The question is: how can I do this correctly? Right now I'm getting the best match first, which tends to be the entire phrase, but I'm also getting a load of near matches.
If you are using an Elasticsearch cluster with version >= 1.1.0, you can set the type of your multi_match query to phrase:
...
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox",
"type": "phrase"
}
...
It replaces the match query generated for each field with a match_phrase query, which returns only the documents containing the full phrase (you can find details in the documentation).
How are you analyzing the subject/comments fields? If you want an exact match, you'll need to use the keyword tokenizer at both index and search time.
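For reference, the phrase-type multi_match and the filter from the question can be put together in one request. The filtered query used in the question was removed in Elasticsearch 5.0 in favour of a bool query with a filter clause, so this sketch uses that form; the function name is made up for illustration.

```python
import json


def phrase_query(phrase: str, fields: list, filters: list) -> dict:
    """Phrase search over several fields, restricted by non-scoring filters."""
    return {
        "query": {
            "bool": {
                # Scored part: each field is searched with a match_phrase.
                "must": {
                    "multi_match": {
                        "query": phrase,
                        "fields": list(fields),
                        "type": "phrase",
                    }
                },
                # Filter part: must match, but does not affect the score.
                "filter": list(filters),
            }
        }
    }


body = phrase_query(
    "the quick brown fox",
    ["subject", "comments"],
    [{"term": {"priority": "high"}}],
)
print(json.dumps(body, indent=2))
```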

elasticsearch scoring unique terms vs ngram terms

I've figured out how to return results for a partial word using ngrams, but now I'd like to arrange (score or sort) my results so that whole-term matches come first and partial-term matches come after.
For example, the user searches a movie db for 'we'. I want 'we are marshall' and similar titles to show up at the top, and not 'north by northwest' (the 'we' is in 'northwest').
Currently this is my mapping for the title field:
"title": {
"type": "string",
"analyzer": "ngramAnalyzer",
"fields": {
"term": {
"type": "string",
"analyzer": "fullTermCaseInsensitive"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
I've created a multi-field where ngramAnalyzer is a custom ngram analyzer, term uses a keyword tokenizer with a standard filter, and raw is not_analyzed.
My query is as follows:
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "_score * (1+ (1 / doc['salesrank'].value) )"
}
}
],
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": {
"query": "we",
"max_expansions": 10
}
}
}
],
"should":{
"term" : {
"title.term" : {
"value" : "we",
"boost" : 10
}
}
}
}
}
}
}
I'm basically requiring that the ngram must match, and saying that the term 'we' should match and, if it does, be boosted.
This isn't working, of course.
Any ideas?
Edit
To add further complexity: how would I match first on the exact title, and then on a custom score?
I've taken some stabs at it, but it doesn't seem to work.
for example:
input: 'game'
results should be ordered by exact match 'game'
followed by a custom score based on a sales rank (integer)
so that the next results after 'game' might be something like 'hunger games'
What about a bool query that combines boosted clauses, where the first clause matches the full term with a 10x boost factor and another clause matches the ngram field with the standard boost factor?
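A rough sketch of that suggestion, using the field names from the question's mapping (title for the ngram field, title.term for the full-term sub-field). The function name and the use of match clauses are assumptions. Because title.term is described as using a keyword tokenizer, the boosted clause only fires when the query equals the whole title, which fits the 'game' example in the edit but would not lift 'we are marshall' for the query 'we'.

```python
import json


def title_query(text: str, exact_boost: float = 10.0) -> dict:
    """Combine a boosted full-term clause with a plain ngram clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-term sub-field, boosted so exact titles rank first.
                    {"match": {"title.term": {"query": text, "boost": exact_boost}}},
                    # Ngram-analyzed field with the default boost.
                    {"match": {"title": {"query": text}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = title_query("game")
print(json.dumps(body, indent=2))
```

A word-level boost (any title containing the whole word) would need an additional sub-field with a word tokenizer, which the question's mapping does not have.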

Resources