Get ElasticSearch simple_query_string to support fuzzy - elasticsearch

I have a record in my ElasticSearch index with the term "cleveland". When I do this search:
"query": {
"multi_match": {
"fields": [
"firstname^3",
"lastname^3",
"home_address",
"home_city"
],
"query": "clevela",
"fuzziness": "AUTO"
}
},
it successfully finds the term. The two missing characters are within the fuzziness threshold. But I'd like to support the extended query syntax of simple_query_string (+, -, phrase search, etc.), so I tried this syntax:
"query": {
"simple_query_string": {
"query": "clevela",
"fields": [
"firstname^3",
"lastname^3",
"home_address",
"home_city"
],
"lenient": true
}
},
and it does not find the term. Fuzziness appears to be turned off. How do I turn it on?

In a simple_query_string query, you specify fuzziness in the query text itself, by appending ~N (where N is the maximum edit distance) to the search term. Modify your search query as follows (note the ~2 in the query value):
{
"query": {
"simple_query_string": {
"query": "clevela~2",
"fields": [
"firstname^3",
"lastname^3",
"home_address",
"home_city"
],
"lenient": true
}
}
}
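When the query text comes from end users, the ~N suffix usually has to be added by the application. The sketch below is one hypothetical way to do that in Python. The names add_fuzziness and build_body are illustrative and not part of any Elasticsearch client, and the rule applied here (only bare or +-prefixed words get the suffix; negated terms, phrases and wildcards are left alone) is an assumption about what you might want, not something simple_query_string requires.

```python
import json
import re

# A bare word, optionally prefixed by the + (must match) operator.
# Negated terms (-word), wildcards (word*) and phrases do not match this pattern.
_FUZZABLE = re.compile(r"^\+?\w+$")


def add_fuzziness(query: str, max_edits: int = 2) -> str:
    """Append ~N to each fuzzable word of a simple_query_string query text."""
    out = []
    in_phrase = False
    for token in query.split():
        was_in_phrase = in_phrase
        # An odd number of double quotes in a token opens or closes a phrase.
        if token.count('"') % 2 == 1:
            in_phrase = not in_phrase
        if not was_in_phrase and '"' not in token and _FUZZABLE.match(token):
            out.append(f"{token}~{max_edits}")
        else:
            out.append(token)
    return " ".join(out)


def build_body(user_query: str, fields: list, max_edits: int = 2) -> dict:
    """Build a search body shaped like the one in the answer above."""
    return {
        "query": {
            "simple_query_string": {
                "query": add_fuzziness(user_query, max_edits),
                "fields": fields,
                "lenient": True,
            }
        }
    }


if __name__ == "__main__":
    fields = ["firstname^3", "lastname^3", "home_address", "home_city"]
    print(json.dumps(build_body("clevela", fields), indent=2))
```

Phrases are skipped on purpose: in simple_query_string, ~N after a quoted phrase is interpreted as slop rather than as an edit distance.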

Related

Elasticsearch searching across fields with boosting and fuzziness

I am creating an index in Elasticsearch and I want the ability to search across multiple fields, i.e. have those fields be treated as one big search field. I've done some research and came across two different ways to do this:
The first is the cross_fields multi_match query. This allows searching across multiple fields as one big field, with the ability to boost certain fields, but it does not allow fuzziness to be added.
The second is copy_to: I can copy fields to an 'all' field so that all the searchable terms are in one big field. This allows fuzzy search but does not allow me to boost specific fields.
Is there another cross_fields or search option I'm unaware of that will allow me to fuzzy search as well as boost specific fields?
I think you could add fuzziness to multi_match, but it will be applied to all fields.
Here is an example with boost and fuzziness:
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "bjorn borg schoenen",
"fields": [
"title^5.0",
"brand^2.0"
],
"type": "best_fields",
"operator": "and",
"fuzziness": "auto"
}
}
]
}
}
}
If you want to be more granular, you can use a bool query with should clauses and minimum_should_match:
{
"query": {
"bool": {
"should": [
{
"match": {
"brand": {
"query": "my query",
"fuzziness": "auto",
"boost": 2
}
}
},
{
"match": {
"title": {
"query": "my query",
"fuzziness": "auto",
"boost": 5
}
}
}
],
"minimum_should_match": 1
}
}
}
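If the list of fields and boosts lives in application code, the per-field bool/should body above can be generated rather than written by hand. This is a minimal sketch, assuming the boosts are kept in a plain dict; per_field_fuzzy_query is a hypothetical helper name, not an Elasticsearch client function.

```python
import json


def per_field_fuzzy_query(text: str, boosts: dict, fuzziness: str = "AUTO") -> dict:
    """Build one fuzzy match clause per field, each with its own boost."""
    should = [
        {"match": {field: {"query": text, "fuzziness": fuzziness, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    # At least one of the per-field clauses has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    body = per_field_fuzzy_query("my query", {"brand": 2, "title": 5})
    print(json.dumps(body, indent=2))
```

Because each clause carries its own fuzziness value, the same helper could be extended to use a different fuzziness per field, which the single multi_match form does not allow.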
And if the query becomes too complicated, I suggest using a search template to keep integration easy on the app side:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html

MUST and MUST_NOT query in Elasticsearch

I have indexed documents with a metadata field "User_Id" containing the values "A" and "B". I'm trying to find documents matching "A NOT B", but I am not getting the desired output. I cannot use the query string query or its "NOT" operator.
Doesn't must_not support multi_match?
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "A",
"fields": ["User_Id"],
"fuzziness": "AUTO"
}
}
],
"must_not" :[
{
"multi_match": {
"query": "B",
"fields": ["User_Id"],
"fuzziness": "AUTO"
}
}
]
}
}
}
You need to remove the "fuzziness": "AUTO" setting. Fuzziness allows approximate string matching: a query for "AUTO13273" with fuzziness "AUTO" will match both AUTO13272 and AUTO13273, since the edit distance between those two strings is only 1. A fuzzy clause can therefore match values you meant to treat as different.
See the fuzziness documentation for details.
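To see why the two IDs collide, it can help to compute the edit distance by hand. The sketch below uses plain Levenshtein distance and the length thresholds documented for "AUTO" (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). It is only an approximation of what Elasticsearch does: by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, which this code does not model. All function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def auto_max_edits(term: str) -> int:
    """Maximum edits allowed by fuzziness AUTO for a query term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if indexed_term is within the AUTO edit budget of query_term."""
    return levenshtein(query_term, indexed_term) <= auto_max_edits(query_term)


if __name__ == "__main__":
    print(levenshtein("AUTO13273", "AUTO13272"))
    print(fuzzy_matches("AUTO13273", "AUTO13272"))
```

Under these assumptions, a must clause for one ID and a must_not clause for a neighbouring ID both match the same documents, so the must_not clause removes the results the must clause was supposed to return.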

elasticsearch ngram analyzer return unexpected result

I'm using an ngram analyzer for indexing and the standard analyzer for querying.
Currently I have indexed multiphone and iphone.
When I search for iphone, the score (and therefore the relevance) of multiphone is higher than that of iphone.
How should I build the query in order to get a higher score for iphone?
The query that I execute is:
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
What I need is for iphone to score higher than multiphone.
What about performance?
I have answered a similar question before.
Basically you need to add a raw version of the field to your mapping. You could use the keyword analyzer with a lowercase filter, or you can make it "index": "not_analyzed", or even use the default standard analyzer.
Then you do a bool query and add a clause for the exact match, and it will be scored higher.
EDIT: Example
You could map your englishName field as follows (the "raw" sub-field with "index": "not_analyzed" is the addition):
"englishName": {
"type": "string",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
You could do the same with aliasName.
Then your query would look something like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName.raw",
"aliasName.raw"
],
"boost": 5
}
}
]
}
}
}
iphone will score higher with this query.
Hope this helps.

elasticsearch multi_match vs should

Can someone tell me the difference between
"query": {
"bool": {
"should": [
{ "match": {"title": keyword} },
{ "match": {"description": keyword} }
]
}
}
and
"query": {
"multi_match": {
"query": keyword,
"fields": [ "title", "description" ]
}
}
Is there any performance difference if I choose one of the two above?
It depends on the type parameter of your multi_match. In your example, since you didn't specify a type, best_fields is used. That makes use of a Dis Max query, which basically uses the _score from the best matching field.
On the other hand, your example with should combines the _score from each field, so it is equivalent to multi_match with type most_fields.
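The difference between the two ways of combining scores can be shown with a little arithmetic. This is a simplified sketch, not Elasticsearch's scoring code: the per-field scores are made-up numbers, the function names are illustrative, and details such as the coordination factor used by older versions are ignored. The tie_breaker parameter is the one that dis_max and multi_match expose; with its default of 0.0 only the best field counts.

```python
def dis_max_score(field_scores, tie_breaker: float = 0.0) -> float:
    """best_fields style: best field score plus tie_breaker times the others."""
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def bool_should_score(field_scores) -> float:
    """bool/should (most_fields style): scores of all matching clauses are added."""
    return sum(s for s in field_scores if s > 0)


if __name__ == "__main__":
    # Hypothetical scores for the title and description fields of one document.
    scores = [2.0, 1.0]
    print(dis_max_score(scores))        # only the best field counts
    print(bool_should_score(scores))    # both fields contribute
    print(dis_max_score(scores, 0.5))   # tie_breaker lets other fields count partially
```

In practice this means a document matching the keyword in both title and description ranks relatively higher with the should form than with the default multi_match.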

How can Elasticsearch search characters like #

I have a problem writing an Elasticsearch query.
My query is shown below:
{
"query": {
"query_string": {
"default_field": "content",
"query": "#lin1"
}
},
"from": 0,
"size": 1000,
"sort": [
{
"time": "desc"
}
]
}
I am using query_string:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
But the # character is not matched.
The results look like this: lin1 or 「lin1」
So how should I write the Elasticsearch query to match #lin1?
It all depends on the analyzer you are using. For all we know, you are using the standard analyzer, which discards the '#' symbol at index time. In that case, you'll never be able to search for the '#' symbol. But if that is not the case and you do have '#' indexed, you can change the query_string section of your query to the following:
"query_string": {
"default_field": "content",
"query": "#lin1",
"analyzer": "whitespace"
}
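The effect of the analyzer on '#' can be illustrated without a cluster. The two functions below are crude stand-ins written for this example, not the real Lucene analyzers: standard_like_tokens only mimics the fact that the standard analyzer lowercases and drops punctuation such as '#', and whitespace_tokens mimics splitting on whitespace only.

```python
import re


def standard_like_tokens(text: str) -> list:
    """Rough approximation: lowercase, keep runs of word characters only."""
    return re.findall(r"\w+", text.lower())


def whitespace_tokens(text: str) -> list:
    """Rough approximation: split on whitespace, keep every other character."""
    return text.split()


if __name__ == "__main__":
    print(standard_like_tokens("#lin1"))  # the '#' is gone
    print(whitespace_tokens("#lin1"))     # the '#' is kept
```

If the field was indexed with an analyzer that behaves like the first function, the index contains lin1 and no query-time setting can bring the '#' back. To check what your real analyzer produces, you can send sample text to the _analyze API.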
