I'm looking for a way to do fuzzy, partial matching on the words in a field, while also enforcing strict phrase matching.
For example, say I have documents with field values such as:
foo bar
bar foo
I would like to achieve the following search behaviour:
If I search foo, I would like to return back both results.
If I search ba, I would like to return back both results.
If I search bar foo, I would like to only return back one result.
If I search bar foo foo, I don't want to return any results.
I would also like to add in single character fuzziness matching, so if a foo is mistyped as fbo then it would return back both results.
My current search and index analyzer uses an edge_ngram tokenizer and works fairly well, except that if any gram matches, the document is returned regardless of whether the following words match. For example, searching for bar foo buzz returns both of the following results:
foo bar
bar foo
My tokenizer:
ngram_tokenizer: {
type: "edge_ngram",
min_gram: "2",
max_gram: "15",
token_chars: ['letter', 'digit', 'punctuation', 'symbol'],
},
My analyzer:
nGram_analyzer: {
filter: [
"lowercase",
"asciifolding"
],
type: "custom",
tokenizer: "ngram_tokenizer"
},
My field mapping:
type: "search_as_you_type",
doc_values: false,
max_shingle_size: 3,
analyzer: "nGram_analyzer"
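One way to meet all of these requirements is to use the span_near query.
Span near queries are much more verbose, but they are suitable for phrase matching combined with a fuzziness parameter (each word is wrapped in a span_multi around a fuzzy query). The examples use fuzziness 2; set it to 1 if only a single-character edit should be tolerated.
Here is a working example with index mapping, index data, search queries and search results.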
Index Mapping:
{
"mappings": {
"properties": {
"title": {
"type": "text"
}
}
}
}
Index Data:
{
"title":"bar foo"
}
{
"title":"foo bar"
}
Search Queries:
If I search foo, I would like to return back both results.
{
"query": {
"bool": {
"must": [
{
"span_near": {
"clauses": [
{
"span_multi": {
"match": {
"fuzzy": {
"title": {
"value": "foo",
"fuzziness": 2
}
}
}
}
}
],
"slop": 0,
"in_order": true
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "67205552",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "bar foo"
}
},
{
"_index": "67205552",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"title": "foo bar"
}
}
]
If I search ba, I would like to return back both results.
{
"query": {
"bool": {
"must": [
{
"span_near": {
"clauses": [
{
"span_multi": {
"match": {
"fuzzy": {
"title": {
"value": "ba",
"fuzziness": 2
}
}
}
}
}
],
"slop": 0,
"in_order": true
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "67205552",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "bar foo"
}
},
{
"_index": "67205552",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"title": "foo bar"
}
}
]
If I search bar foo foo, I don't want to return any results.
{
"query": {
"bool": {
"must": [
{
"span_near": {
"clauses": [
{
"span_multi": {
"match": {
"fuzzy": {
"title": {
"value": "bar",
"fuzziness": 2
}
}
}
}
},
{
"span_multi": {
"match": {
"fuzzy": {
"title": {
"value": "foo",
"fuzziness": 2
}
}
}
}
},
{
"span_multi": {
"match": {
"fuzzy": {
"title": {
"value": "foo",
"fuzziness": 2
}
}
}
}
}
],
"slop": 0,
"in_order": true
}
}
]
}
}
}
Search Result will be empty
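The three queries above differ only in the list of words, so the body can be generated from the search phrase. The following Python sketch is one way to do that; the function name fuzzy_phrase_query and the default fuzziness of 1 (taken from the single-character requirement in the question) are assumptions, not part of the original answer. It lowercases the input because fuzzy is a term-level query that is not analyzed, while the default analyzer lowercases the indexed terms.

```python
import json


def fuzzy_phrase_query(field, phrase, fuzziness=1):
    """Build a span_near body: every word of `phrase` must appear in order,
    adjacent to the previous one, with up to `fuzziness` edits per word."""
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": word, "fuzziness": fuzziness}}
                }
            }
        }
        for word in phrase.lower().split()
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "span_near": {
                            "clauses": clauses,
                            "slop": 0,         # no gaps allowed between the words
                            "in_order": True,  # words must appear in the typed order
                        }
                    }
                ]
            }
        }
    }


# Body for the "bar foo" search from the question (should match only "bar foo").
print(json.dumps(fuzzy_phrase_query("title", "bar foo"), indent=2))
```

The printed body can be sent as the request body of a _search call against the index from the example.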
If a user searches for the word covid, I want the results whose title starts with covid to come first, followed by the items that contain covid elsewhere in the title. How can I achieve this?
I want a more specific answer: how can I do that with searchkick?
If you are using the default mapping, you can use a bool query with two should clauses: a match on the text field and a prefix query on the .keyword subfield, as shown in the example below.
Index sample documents
{
"name" : "foo bar"
}
{
"name" : "bar foo"
}
Search query
{
"query": {
"bool": {
"should": [
{
"match": {
"name" : "foo"
}
},
{
"prefix": {
"name.keyword": "foo"
}
}
]
}
}
}
Search results
"hits": [
{
"_index": "71998426",
"_id": "1",
"_score": 1.1823215,
"_source": {
"name": "foo bar"
}
},
{
"_index": "71998426",
"_id": "2",
"_score": 0.18232156,
"_source": {
"name": "bar foo"
}
}
]
Note: the first result, foo bar, matches both should clauses, so it is scored much higher and comes first in the search hits.
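The match-plus-prefix body above can be built for any search term. The Python sketch below is a minimal illustration; the function name starts_with_first_query and the prefix_boost parameter are hypothetical. It assumes the default dynamic mapping, where a text field gets a .keyword subfield; since that subfield is not analyzed, the prefix is compared against the original casing of the value.

```python
import json


def starts_with_first_query(field, text, prefix_boost=1.0):
    """Match `text` anywhere in `field`, and score documents higher when the
    raw value (the .keyword subfield) starts with `text`."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches the word anywhere in the analyzed text field.
                    {"match": {field: {"query": text}}},
                    # Adds extra score when the whole value starts with the text.
                    {
                        "prefix": {
                            f"{field}.keyword": {
                                "value": text,
                                "boost": prefix_boost,
                            }
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(starts_with_first_query("name", "foo"), indent=2))
```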
You can also use boost, with a small modification to @Amit's query:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "covid",
"boost": 0.5
}
}
},
{
"prefix": {
"name.keyword": {
"value": "covid",
"boost": 1.0
}
}
}
]
}
}
}
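I have a pattern ".TP-V." which returns strings like "SSTP-VPN". But the pattern ".SSH." does not return anything, although there are values like "core:Login:SSH:Cisco". I have no idea what pattern is needed.
You need to use ".*SSH.*" instead of ".SSH.". A regexp query has to match the whole term, and a single . matches exactly one character, so ".SSH." only matches five-character values.
Here is a working example: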
Index Data:
{
"name":"core:Login:SSH:Cisco"
}
{
"name":"SSTP-VPN"
}
Search Query:
{
"query": {
"regexp": {
"name.keyword": {
"value": ".*SSH.*"
}
}
}
}
Search Result:
"hits": [
{
"_index": "68015371",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "core:Login:SSH:Cisco"
}
}
]
Search Query:
{
"query": {
"regexp": {
"name.keyword": {
"value": ".*TP-V.*"
}
}
}
}
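The difference between the two patterns can be reproduced outside Elasticsearch. The sketch below uses Python's re module as a stand-in for the Lucene regular expression engine (an assumption: the two syntaxes differ in places, but not for these simple patterns), with re.fullmatch mimicking the fact that a regexp query must match the entire term. The helper name regexp_matches is hypothetical.

```python
import re

values = ["core:Login:SSH:Cisco", "SSTP-VPN"]


def regexp_matches(pattern, value):
    """Return True if `pattern` matches the whole of `value`,
    the way an Elasticsearch regexp query treats a keyword term."""
    return re.fullmatch(pattern, value) is not None


for pattern in (".SSH.", ".*SSH.*"):
    hits = [v for v in values if regexp_matches(pattern, v)]
    print(pattern, "->", hits)

# .SSH. -> []
# .*SSH.* -> ['core:Login:SSH:Cisco']
```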
I have the following query:
{
size: 6,
query: {
multi_match: {
query,
type: 'bool_prefix',
fields: ['recommendation', 'recommendation._2gram', 'recommendation._3gram'],
},
},
highlight: {
fields: {
recommendation: {},
},
},
}
I want to add fuzziness: 1 to this query, but it has issues with the type: 'bool_prefix'. I need the type: 'bool_prefix' to remain there because it's integral to how the query works, but I'd also like to add some fuzziness to it. Any ideas?
As mentioned in the official Elasticsearch documentation for bool_prefix:
The fuzziness, prefix_length, max_expansions, fuzzy_rewrite, and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
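To make the quoted behaviour concrete, the Python sketch below approximates how a bool_prefix query with fuzziness is broken down: every term except the last becomes a fuzzy term query, and the last term becomes a plain prefix query that ignores the fuzziness setting. This is only an illustration of the documented behaviour, not Elasticsearch's actual implementation; the function name expand_bool_prefix is hypothetical, and whitespace splitting plus lowercasing stands in for the field's real analyzer.

```python
import json


def expand_bool_prefix(field, query, fuzziness=1):
    """Approximate the bool query that bool_prefix builds from `query`."""
    terms = query.lower().split()
    if not terms:
        return {"bool": {"should": []}}
    *leading, last = terms
    # Fuzziness applies to every complete term before the final one.
    clauses = [
        {"fuzzy": {field: {"value": term, "fuzziness": fuzziness}}}
        for term in leading
    ]
    # The final, possibly unfinished term is matched as a prefix, without fuzziness.
    clauses.append({"prefix": {field: {"value": last}}})
    return {"bool": {"should": clauses}}


print(json.dumps(expand_bool_prefix("recommendation", "goof q"), indent=2))
```

For the query "goof q", only "goof" is matched fuzzily (so it can still find "good"), while "q" is treated as a literal prefix.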
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"recommendation": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"recommendation":"good things"
}
{
"recommendation":"good"
}
Search Query:
You can add fuzziness parameter with bool_prefix, as shown below
{
"size": 6,
"query": {
"multi_match": {
"query": "goof q",
"type": "bool_prefix",
"fields": [
"recommendation",
"recommendation._2gram",
"recommendation._3gram"
],
"fuzziness": 1
}
},
"highlight": {
"fields": {
"recommendation": {}
}
}
}
Search Result:
"hits": [
{
"_index": "65817192",
"_type": "_doc",
"_id": "2",
"_score": 1.1203322,
"_source": {
"recommendation": "good things"
},
"highlight": {
"recommendation": [
"<em>good</em> things"
]
}
},
{
"_index": "65817192",
"_type": "_doc",
"_id": "1",
"_score": 0.1583319,
"_source": {
"recommendation": "good"
},
"highlight": {
"recommendation": [
"<em>good</em>"
]
}
}
]
I ended up combining an additional fuzzy query with the multi_match in a bool query. In your case it would look like this:
{
"size": 6,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "goof q",
"type": "bool_prefix",
"fields": [
"recommendation",
"recommendation._2gram",
"recommendation._3gram"
]
}
},
{
"fuzzy": {
"recommendation": {
"value": "goof q",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"recommendation": {}
}
}
}
I am trying to match dashes (and other symbols) in my Elasticsearch query.
It is a fuzzy search on all fields using the whitespace analyzer.
My query:
function_score: {
query: {
multi_match: {
query: string,
analyzer: "whitespace",
fuzziness: 1
}
}
}
However, this gives unexpected results with dash characters. E.g. Central-Park doesn't work with this.
Dashes only work well when I use a phrase match and strip out the double quotes, but then there is no fuzziness.
Does anyone know how I can get fuzzy search to work normally with dashes, please?
Adding a working example with index mapping, index data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"place": {
"type": "text",
"analyzer":"whitespace"
}
}
}
}
Index Data:
{
"place": "Cwntral-Park"
}
{
"place": "Central-Park"
}
{
"place": "Central-Area"
}
Search Query:
{
"query": {
"bool": {
"should": {
"match": {
"place": {
"query": "Central-Park",
"fuzziness": 1
}
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65605120",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"place": "Central-Park"
}
},
{
"_index": "65605120",
"_type": "_doc",
"_id": "3",
"_score": 0.8990934,
"_source": {
"place": "Cwntral-Park"
}
}
]
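Why Cwntral-Park is returned and Central-Area is not can be checked by hand. With the whitespace analyzer each of these values is indexed as a single token, dash included, so fuzziness 1 compares whole strings. The Python sketch below uses a plain Levenshtein distance as a stand-in for the edit distance Elasticsearch computes (an assumption: by default Elasticsearch also counts a transposition as one edit, which makes no difference for the one-edit match shown here); the function name edit_distance is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


query = "Central-Park"
for value in ["Cwntral-Park", "Central-Park", "Central-Area"]:
    # The whitespace analyzer splits on spaces only, so this is one token.
    token = value.split()[0]
    distance = edit_distance(query, token)
    print(token, distance, "match" if distance <= 1 else "no match")
```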
This works:
GET /bitbucket$$pull-request-activity/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"prid": "12343"
}
},
{
"match": {
"repoSlug": "com.xxx.vserver"
}
}
]
}
}
}
But I would like to capture multiple prids in one call.
This does not work however:
GET /bitbucket$$pull-request-activity/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"prid": "[12343, 11234, 13421]"
}
},
{
"match": {
"repoSlug": "com.xxx.vserver"
}
}
]
}
}
}
Any hints?
Because you are using must in your bool query, the clauses are combined with a logical AND, so every document that matches on the prid field must also match "repoSlug": "com.xxx.vserver".
If none of the documents match "repoSlug": "com.xxx.vserver", no results are returned.
And if only 2 documents match both clauses, only those 2 are returned in the search result, not every document that matches on prid.
Here is a working example with sample docs, a search query and the search result.
Index Sample Data :
{
"id":"1",
"message":"hello"
}
{
"id":"2",
"message":"hello"
}
{
"id":"3",
"message":"hello-bye"
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"match": {
"id": "[1, 2, 3]"
}
},
{
"match": {
"message": "hello"
}
}
]
}
}
}
Search Result :
"hits": [
{
"_index": "foo14",
"_type": "_doc",
"_id": "1",
"_score": 1.5924306,
"_source": {
"id": "1",
"message": "hello"
}
},
{
"_index": "foo14",
"_type": "_doc",
"_id": "3",
"_score": 1.4903541,
"_source": {
"id": "3",
"message": "hello-bye"
}
},
{
"_index": "foo14",
"_type": "_doc",
"_id": "2",
"_score": 1.081605,
"_source": {
"id": "2",
"message": "hello"
}
}
]
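The same body can be produced from a list of ids. The Python sketch below is a minimal illustration with a hypothetical function name, match_any_id_query. It joins the ids into one space-separated string, relying on the match query's default OR operator so that a document matching any one id satisfies the clause. It assumes the id field is an analyzed text field, as in the sample data above, where the ids are indexed as strings.

```python
import json


def match_any_id_query(id_field, ids, other_field, other_value):
    """Require `other_field` to match AND `id_field` to match any of `ids`."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Analyzed into one token per id; the default operator is OR.
                    {"match": {id_field: " ".join(str(i) for i in ids)}},
                    {"match": {other_field: other_value}},
                ]
            }
        }
    }


body = match_any_id_query("prid", [12343, 11234, 13421],
                          "repoSlug", "com.xxx.vserver")
print(json.dumps(body, indent=2))
```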