How to add fuzziness to search query in elasticsearch? - elasticsearch

I'm trying to implement fuzziness on a particular field in a cross-fields query. It's a bit difficult though.
So the query should:
Match phrases across fields.
Match an exact match against partNumber and barcode (no fuzziness)
Match fuzzy terms against title and subtitle.
The query that I have so far is below - note the fuzziness isn't working at all in query so far.
So this should match 1 result which is "Amazing t-Shirt" in the title, and Blue in the subtitle. (note the spelling error).
Is it possible to implement the fuzziness at the index mapping level instead? Title and subtitle are quite short in the data set - maybe 30 - 40 characters combined maximum.
Otherwise how can I add fuzziness to the title and subtitle in the query?
{
"query": {
"multi_match": {
"query": "Bleu Amazing T-Shirt",
"fuzziness": "auto",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
],
"type": "cross_fields"
}
},
"fields": [
"identity.partNumber",
"identity.barcode",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
]
}

well it doesn't seem to be supported to fuzzy search using cross_fields, there was a few related issues. So instead of crossfield search, I copied the title & subtitle to a new field at index time and split the query like below. Seems to work for my test cases at least....
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "{{searchTerm}}",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn"
],
"type": "best_fields"
}
},
{
"match": {
"fuzzyFields": {
"query": "{{searchTerm}}",
"operator": "and",
"fuzziness": "auto"
}
}
}
]
}
}

Related

Elastic Search Query (a like x and y) or (b like x and y)

Some background info: In the bellow example user searched for "HTML CSS". I split each word from the search string and created the SQL query seen bellow.
Now I am trying to make an elastic search query that has the same logic as the following SQL query:
SELECT
title, description
FROM `classes`
WHERE
(`title` LIKE '%html%' AND `title` LIKE '%css%') OR
(description LIKE '%html%' AND description LIKE '%css%')
Currently, half way there but can't seem to get it right yet.
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "html"
}
},
{
"term": {
"title": "css"
}
}
]
}
},
"_source": [
"title"
],
"size": 30
}
Now I need to find how to add follow logic
OR (description LIKE '%html%' AND description LIKE '%css%')
One important point is that I need to only fetch documents that have both words in either title or disruption. I don't want to fetch documents that have only 1 word.
I will update questions as I find more info.
Update: The chosen answer also provides a way to boost scoring based on the field.
Can you try following query. You can use should for making or operation
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": { // Go for term if your field is analyzed
"title": {
"query": "html css",
"operator": "and",
"boost" : 2
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"description": {
"query": "html css",
"operator": "and"
}
}
}
]
}
}
],
"minimum_number_should_match": 1
}
},
"_source": [
"title",
"description"
]
}
Hope this helps!!
I feel most appropriate query to be used in this case is multi_match.
multi_match query is convenient way of running the same query on
multiple fields.
So your query can be written as:
GET /_search
{
"_source": ["title", "description"],
"query": {
"multi_match": {
"query": "html css",
"fields": ["title^2", "description"],
"operator":"and"
}
}
}
_source filters the dataset so that only fields mentioned in array
will be displayed in results.
^2 denotes boosting title field with the number 2
operator:and makes sure that all terms in query must be matched
in either fields
From the elasticsearch 5.2 doc:
One option is to use the nested datatype instead of the object datatype.
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/nested.html
Hope this helps

How to force certain fields in mult_match to have exact match

I am trying to match the title of a product listing to a database of known products. My first idea was to put the known products and their metadata into elasticsearch and try to find the best match with multi_match. My current query is something like:
{
"query": {
"multi_match" : {
"query": "Men's small blue cotton pants SKU123",
"fields": ["sku^2","title","gender","color", "material","size"],
"type" : "cross_fields"
}
}
}
The problem is sometimes it will return products with the wrong color. Is there a way i could modify the above query to only score items in my index that have a color field equal to a word that exists in the query string? I am using elasticsearch 5.1.
If you want elasticsearch to score only items that meet certain criteria then you need to use the terms query in a filter context.
Since the terms query does not analyze your query, you'll have to do that yourself. Something simple would be to tokenize by whitespace and lowercase and generate a query that looks like this:
{
"query": {
"bool": {
"filter": {
"terms": {
"color": ["men's", "small", "blue", "cotton", "pants", "sku123"]
}
},
"must": {
"multi_match": {
"query": "Men's small blue cotton pants SKU123",
"fields": [
"sku^2",
"title",
"gender",
"material",
"size"
],
"type": "cross_fields"
}
}
}
}
}

ElasticSearch sorting document based on fields that the phrase is found

In ElasticSearch how do i sort documents based on finding a phrase in the following order of fields.
Search Phrase: Miami
Fields: Title, Content, Topics
If found in Title, Content and in Topics it will show before other documents that the phrase is only found in Content.
Maybe there is a way to say:
if phrase found in Title then weight 2
if phrase found in Content then weight 1.5
if phrase found in Topics then weight 1
and this will be sum(weight) with _score
My Current query looks like
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title",
"content",
"topics",
"destinations"
]
}
}
}
}
You can use boosting on fields with the caret ^ notation to score them higher than other matching fields
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title^10",
"content^3",
"topics",
"destinations"
]
}
}
}
}
Here I have applied a weight of 10 to title and weight of 3 to content. Documents will be returned in decreasing _score order so you need to boost scores in fields that you consider more important; the values to use for boosting are up to you and may require a little trial and improvement to return documents in your preferred order.

Finding an exact phrase in multiple fields with Elasticsearch

I'm wanting to find an exact phrase (for instance, "the quick brown fox") across mutliple fields in a document.
Right now, I'm using something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox"
}
},
"filters": {
"and": [
{
"term": {
"priority": "high"
}
}
...more ands
]
}
}
}
}
Question is, how can I do this correctly. Right now I'm getting the best match first, which tends to be the entire phrase, but I'm getting a load of almost matches too.
If you are using an ElasticSearch cluster with version >= 1.1.0, you could set the mode of your multi-match query to phrase :
...
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox",
"type": "phrase"
}
...
It will replace the match query generated for each field by a match_phrase one, which will return only the documents containing the full phrase (you can find details in the documentation)
how are you analyzing the subject/comments fields? if you want exact match, you'll need to use the keyword tokenizer for both index/search.

elasticsearch scoring unique terms vs ngram terms

i've figured out how to return results on a partial word result using ngrams. but now i'd like to arrange (score or sort) my results based on the term first and then a partial term.
for example, the user searches a movie db for 'we'. i want 'we are marshall' and similar to show up at the top, and not 'north by northwest'. (the 'we' is in 'northwest').
currently this is my mapping for this title field:
"title": {
"type": "string",
"analyzer": "ngramAnalyer",
"fields": {
"term": {
"type": "string",
"analyzer": "fullTermCaseInsensitive"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
i've created a multifield where ngramAnalyzer is a custom ngram, term is using a keyword tokenizer with a standard filter, and raw is not_indexed.
my query is as follows:
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "_score * (1+ (1 / doc['salesrank'].value) )"
}
}
],
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": {
"query": "we",
"max_expansions": 10
}
}
}
],
"should":{
"term" : {
"title.term" : {
"value" : "we",
"boost" : 10
}
}
}
}
}
}
i'm basically requiring that the ngram must be matched, and the term 'we' should be matched, and if so, boost it.
this isn't working of course.
any ideas?
edit
to add further complexity ... how would i match first on exact title, then on a custom score?
i've taken some stabs at it, but doesn't seem to work.
for example:
input: 'game'
results should be ordered by exact match 'game'
followed by a custom score based on a sales rank (integer)
so that the next results after 'game' might be something like 'hunger games'
what about bool combination of boosting query, where first match about full term with 10x boost factor, and another matches against ngram term with standard boost factor?

Resources