Finding an exact phrase in multiple fields with Elasticsearch

I want to find an exact phrase (for instance, "the quick brown fox") across multiple fields in a document.
Right now, I'm using something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox"
}
},
"filter": {
"and": [
{
"term": {
"priority": "high"
}
}
...more ands
]
}
}
}
}
The question is: how can I do this correctly? Right now I'm getting the best match first, which tends to be the entire phrase, but I'm also getting a lot of near matches.

If you are using an Elasticsearch cluster with version >= 1.1.0, you can set the type of your multi_match query to phrase:
...
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox",
"type": "phrase"
}
...
This replaces the match query generated for each field with a match_phrase query, which returns only the documents containing the full phrase (you can find details in the documentation).
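As a rough illustration of that substitution, the sketch below builds the multi_match body in Python and, next to it, the per-field form it should be roughly equivalent to. The helper names are made up for this example, and the dis_max expansion is an assumption based on how the documentation describes the phrase type (it behaves like best_fields, but with a match_phrase query per field), not something read off the generated Lucene query.

```python
import json


def phrase_multi_match(text, fields):
    """Build a multi_match query that requires the whole phrase in a field."""
    return {
        "multi_match": {
            "query": text,
            "fields": list(fields),
            "type": "phrase",
        }
    }


def expanded_form(text, fields):
    """Roughly equivalent form: one match_phrase per field, best score wins."""
    return {
        "dis_max": {
            "queries": [{"match_phrase": {field: text}} for field in fields]
        }
    }


fields = ["subject", "comments"]
print(json.dumps({"query": phrase_multi_match("the quick brown fox", fields)}, indent=2))
print(json.dumps({"query": expanded_form("the quick brown fox", fields)}, indent=2))
```

Either body can take the place of the inner "query" in the original filtered query, leaving the filter part as it was.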

How are you analyzing the subject and comments fields? If you want an exact match, you'll need to use the keyword tokenizer at both index time and search time.
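For reference, index settings with such an analyzer might look like the sketch below. The analyzer name exact_value is made up, and the lowercase filter is an optional extra that is not part of the suggestion above. How the analyzer is attached to subject and comments depends on the Elasticsearch version's mapping syntax, so that part is left out.

```python
import json

# Index settings defining an analyzer that keeps the whole field value
# as a single token. "exact_value" is a hypothetical analyzer name;
# "keyword" is the built-in tokenizer that emits the input unchanged.
exact_analysis_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "exact_value": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    # Optional: makes the comparison case-insensitive.
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

print(json.dumps(exact_analysis_settings, indent=2))
```

Keep in mind that with this tokenizer a query only matches when it equals the entire field value, so it will not find a phrase that sits inside a longer subject or comment.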

Related

Filter results from Elasticsearch if only a specific field matches

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.
There are different ways to handle this problem, depending on the scoring you require from your results. For instance:
Use a bool query with two parts:
Must query: include queries that must match for the document to be in the result set.
Should query: include queries that should match (and affect scoring) but do not decide whether a document is in the result set.
Put the multi_match query without the car_country field in the must clause, and a match query for the car_country field in the should clause.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}

Elasticsearch doesn't return results on a multi match query

I'm wondering why Elasticsearch doesn't give me any results for the following Multi Match Query:
GET /stag/_search
{
"query": {
"multi_match": {
"type": "phrase_prefix",
"query": "ferran ma",
"fields": [ "fullName", "fullName.folded" ]
}
}
}
But it gives me results on:
GET /stag/_search
{
"query": {
"multi_match": {
"type": "phrase_prefix",
"query": "ferran may",
"fields": [ "fullName", "fullName.folded" ]
}
}
}
I thought that maybe there was a minimum character length per word, but then I saw that the following query:
GET /stag/_search
{
"query": {
"multi_match": {
"type": "phrase_prefix",
"query": "ignasi t",
"fields": [ "fullName", "fullName.folded" ]
}
}
}
is giving me results. So I have no idea what's going on.
It seems the problem is explained here:
The match_phrase_prefix query is a poor-man’s autocomplete. It is very
easy to use, which lets you get started quickly with
search-as-you-type but its results, which usually are good enough, can
sometimes be confusing.
Consider the query string quick brown f. This query works by creating
a phrase query out of quick and brown (i.e. the term quick must exist
and must be followed by the term brown). Then it looks at the sorted
term dictionary to find the first 50 terms that begin with f, and adds
these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox so
the phrase quick brown fox will not be found. This usually isn’t a
problem as the user will continue to type more letters until the word
they are looking for appears.
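The "first 50 terms" in the quoted passage corresponds, as far as I can tell, to the max_expansions parameter, whose documented default is 50. One thing to try is raising it, as in the sketch below. The helper name and the value 200 are arbitrary choices for illustration, and a larger value makes the query more expensive.

```python
import json


def phrase_prefix_query(text, fields, max_expansions=50):
    """Build a phrase_prefix multi_match with an explicit expansion limit."""
    return {
        "query": {
            "multi_match": {
                "type": "phrase_prefix",
                "query": text,
                "fields": list(fields),
                # How many terms starting with the last word's prefix are tried.
                "max_expansions": max_expansions,
            }
        }
    }


body = phrase_prefix_query("ferran ma", ["fullName", "fullName.folded"], max_expansions=200)
print(json.dumps(body, indent=2))
```

If "ferran ma" starts returning hits with the higher limit, the missing results were most likely cut off by the expansion limit rather than by anything in the mapping.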

Elastic Search Query (a like x and y) or (b like x and y)

Some background info: in the example below, the user searched for "HTML CSS". I split the search string into words and created the SQL query shown below.
Now I am trying to write an Elasticsearch query that has the same logic as the following SQL query:
SELECT
title, description
FROM `classes`
WHERE
(`title` LIKE '%html%' AND `title` LIKE '%css%') OR
(description LIKE '%html%' AND description LIKE '%css%')
Currently I'm halfway there, but I can't seem to get it right yet.
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "html"
}
},
{
"term": {
"title": "css"
}
}
]
}
},
"_source": [
"title"
],
"size": 30
}
Now I need to find out how to add the following logic:
OR (description LIKE '%html%' AND description LIKE '%css%')
One important point is that I only need to fetch documents that have both words in either title or description. I don't want to fetch documents that have only one of the words.
I will update the question as I find more info.
Update: The chosen answer also provides a way to boost scoring based on the field.
Can you try the following query? You can use should to express the OR operation.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": { // Use term instead if your field is not analyzed
"title": {
"query": "html css",
"operator": "and",
"boost" : 2
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"description": {
"query": "html css",
"operator": "and"
}
}
}
]
}
}
],
"minimum_should_match": 1
}
},
"_source": [
"title",
"description"
]
}
Hope this helps!!
I feel the most appropriate query to use in this case is multi_match.
multi_match query is convenient way of running the same query on
multiple fields.
So your query can be written as:
GET /_search
{
"_source": ["title", "description"],
"query": {
"multi_match": {
"query": "html css",
"fields": ["title^2", "description"],
"operator":"and"
}
}
}
_source filters the response so that only the fields listed in the array
are returned in the results.
^2 boosts the title field by a factor of 2.
operator: and requires all terms in the query to match
within at least one of the fields.
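If it helps to see the SQL logic spelled out for any number of words and fields, here is a minimal Python sketch that generates the explicit bool form: (all words in field 1) OR (all words in field 2), and so on. The function name is hypothetical. Note that match works on analyzed tokens, so unlike LIKE '%html%' it will not match "html" inside a longer word such as "xhtml".

```python
import json


def all_words_in_any_field(words, fields):
    """Return a bool query: at least one field must contain every word."""
    per_field = [
        # One group per field; every word must match within that field.
        {"bool": {"must": [{"match": {field: word}} for word in words]}}
        for field in fields
    ]
    return {"bool": {"should": per_field, "minimum_should_match": 1}}


search = "HTML CSS"
body = {
    "_source": ["title", "description"],
    "query": all_words_in_any_field(search.lower().split(), ["title", "description"]),
}
print(json.dumps(body, indent=2))
```

The multi_match version above is shorter. The explicit form is mainly useful when each field needs different handling, such as a per-field boost or analyzer.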
From the elasticsearch 5.2 doc:
One option is to use the nested datatype instead of the object datatype.
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/nested.html
Hope this helps

How to force certain fields in multi_match to have exact match

I am trying to match the title of a product listing to a database of known products. My first idea was to put the known products and their metadata into elasticsearch and try to find the best match with multi_match. My current query is something like:
{
"query": {
"multi_match" : {
"query": "Men's small blue cotton pants SKU123",
"fields": ["sku^2","title","gender","color", "material","size"],
"type" : "cross_fields"
}
}
}
The problem is that it sometimes returns products with the wrong color. Is there a way I could modify the above query to only score items in my index that have a color field equal to a word that exists in the query string? I am using Elasticsearch 5.1.
If you want elasticsearch to score only items that meet certain criteria then you need to use the terms query in a filter context.
Since the terms query does not analyze your query, you'll have to do that yourself. Something simple would be to tokenize by whitespace and lowercase and generate a query that looks like this:
{
"query": {
"bool": {
"filter": {
"terms": {
"color": ["men's", "small", "blue", "cotton", "pants", "sku123"]
}
},
"must": {
"multi_match": {
"query": "Men's small blue cotton pants SKU123",
"fields": [
"sku^2",
"title",
"gender",
"material",
"size"
],
"type": "cross_fields"
}
}
}
}
}
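Generating that body from the raw listing title could look something like the sketch below. The function name and parameters are made up for illustration, and splitting on whitespace plus lowercasing only approximates what an analyzer would do.

```python
import json


def build_color_filtered_query(text, scored_fields, color_field="color"):
    """Filter on color using hand-made tokens, then score with multi_match."""
    # terms queries are not analyzed, so mimic a simple analyzer by hand:
    # lowercase the text and split it on whitespace.
    tokens = text.lower().split()
    return {
        "query": {
            "bool": {
                "filter": {"terms": {color_field: tokens}},
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": list(scored_fields),
                        "type": "cross_fields",
                    }
                },
            }
        }
    }


body = build_color_filtered_query(
    "Men's small blue cotton pants SKU123",
    ["sku^2", "title", "gender", "material", "size"],
)
print(json.dumps(body, indent=2))
```

One limitation: punctuation stays attached to tokens, so a title containing "blue," would produce the token "blue," and miss a color stored as "blue". Stripping punctuation before splitting may be worth adding if your titles look like that.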

How to add fuzziness to search query in elasticsearch?

I'm trying to implement fuzziness on a particular field in a cross-fields query. It's a bit difficult though.
So the query should:
Match phrases across fields.
Match an exact match against partNumber and barcode (no fuzziness)
Match fuzzy terms against title and subtitle.
The query I have so far is below. Note that the fuzziness isn't working at all yet.
This should match one result, which has "Amazing t-Shirt" in the title and "Blue" in the subtitle (note the spelling error in the query).
Is it possible to implement the fuzziness at the index mapping level instead? Title and subtitle are quite short in the data set, maybe 30 to 40 characters combined at most.
Otherwise, how can I add fuzziness to the title and subtitle in the query?
{
"query": {
"multi_match": {
"query": "Bleu Amazing T-Shirt",
"fuzziness": "auto",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
],
"type": "cross_fields"
}
},
"fields": [
"identity.partNumber",
"identity.barcode",
"identity.ppn",
"descriptions.title",
"descriptions.subtitle"
]
}
Well, fuzzy search doesn't seem to be supported with cross_fields; there were a few related issues. So instead of a cross_fields search, I copied the title and subtitle to a new field at index time and split the query as shown below. It seems to work for my test cases at least.
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "{{searchTerm}}",
"operator": "and",
"fields": [
"identity.partNumber^4",
"identity.altIdentifier^4",
"identity.barcode",
"identity.mpn",
"identity.ppn"
],
"type": "best_fields"
}
},
{
"match": {
"fuzzyFields": {
"query": "{{searchTerm}}",
"operator": "and",
"fuzziness": "auto"
}
}
}
]
}
}
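The answer doesn't say how the copy at index time was done. One option, sketched below as an assumption, is the copy_to mapping parameter, which has Elasticsearch fill fuzzyFields from the two source fields. The "text" type assumes Elasticsearch 5 or later (older versions use "string"), and older versions also expect the properties to be nested under a mapping type name.

```python
import json


def copied_text_field(target):
    """Field definition that also copies its value into `target`."""
    return {"type": "text", "copy_to": target}


# Hypothetical mapping fragment: only the fields relevant to the fuzzy match.
mapping = {
    "properties": {
        "descriptions": {
            "properties": {
                "title": copied_text_field("fuzzyFields"),
                "subtitle": copied_text_field("fuzzyFields"),
            }
        },
        # Combined field that the fuzzy match query runs against.
        "fuzzyFields": {"type": "text"},
    }
}

print(json.dumps(mapping, indent=2))
```

The alternative is to concatenate title and subtitle in the application before indexing, which works the same way from the query's point of view.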
