I am new to ES and have a multi_match query, and I want to consider a field based on its availability

{
  "multi_match": {
    "query": "TEST",
    "fields": [
      "description.regexkeyword^1.0",
      "logical_name.regexkeyword^1.0",
      "logical_table_name.regexkeyword^1.0",
      "physical_name.regexkeyword^1.0",
      "presentation_name.regexkeyword^1.0",
      "table_name.regexkeyword^1.0"
    ],
    "type": "best_fields",
    "operator": "AND",
    "slop": 0,
    "prefix_length": 0,
    "max_expansions": 50,
    "lenient": false,
    "zero_terms_query": "NONE",
    "boost": 1
  }
}
There is another field, edited_description. If edited_description exists in a document, then edited_description.regexkeyword^1.0 should be considered; otherwise description.regexkeyword^1.0 should be used.

You can't define an if condition in a multi_match query. But what you can do is look at your problem statement differently: if both edited_description and description exist, then a match in the edited_description field should be given higher preference.
This can be achieved by setting a slightly higher boost value for the edited_description field.
{
  "multi_match": {
    "query": "TEST",
    "fields": [
      "description.regexkeyword^1.0",
      "edited_description.regexkeyword^1.2",
      "logical_name.regexkeyword^1.0",
      "logical_table_name.regexkeyword^1.0",
      "physical_name.regexkeyword^1.0",
      "presentation_name.regexkeyword^1.0",
      "table_name.regexkeyword^1.0"
    ],
    "type": "best_fields",
    "operator": "AND",
    "slop": 0,
    "prefix_length": 0,
    "max_expansions": 50,
    "lenient": false,
    "zero_terms_query": "NONE",
    "boost": 1
  }
}
This will cause documents with a match in edited_description to rank higher. You can adjust the boost value to your needs.
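If boosting alone is not enough and edited_description should strictly win whenever it is present, a bool query can express the preference more directly. A sketch, using the field names from the question and untested against your mapping:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "edited_description.regexkeyword": {
              "query": "TEST",
              "boost": 1.2
            }
          }
        },
        {
          "match": {
            "description.regexkeyword": {
              "query": "TEST"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Documents that match the edited_description clause collect its boosted score on top of any description match, so they rank above documents that match only description.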

Related

Elasticsearch exception for a multi_match query of type phrase when using a combination of a number and letters without a space

I am getting an exception for the query below:
"multi_match": {
  "query": "\"73a\"",
  "fields": [],
  "type": "phrase",
  "operator": "AND",
  "analyzer": "custom_analyzer",
  "slop": 0,
  "prefix_length": 0,
  "max_expansions": 50,
  "zero_terms_query": "NONE",
  "auto_generate_synonyms_phrase_query": true,
  "fuzzy_transpositions": true,
  "boost": 1.0
}
The exception I am getting:
"error": {
  "root_cause": [
    {
      "type": "illegal_state_exception",
      "reason": "field \"log_no.keyword\" was indexed without position data; cannot run SpanTermQuery (term=73)"
    },
    {
      "type": "illegal_state_exception",
      "reason": "field \"airplanes_data.keyword\" was indexed without position data; cannot run SpanTermQuery (term=73)"
    }
  ],
Note:
1) When I change the type from "phrase" to "best_fields", I get no error and proper results for "query": "\"73a\"".
2) Using type "phrase" with a space between the number and the letters, e.g. "query": "\"73 a\"", also returns results without error.
My question is: why, with type "phrase", do I get an error when there is no space between the number and letter combination in the query, e.g. "query": "\"443abx\"" or "query": "\"73222aaa\""?
I am new to Elasticsearch. Any help is appreciated. Thanks :)
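One detail worth checking: the error names the .keyword sub-fields (log_no.keyword, airplanes_data.keyword). keyword fields store no position data, and a phrase query needs positions as soon as the analyzer splits "73a" into more than one token. A workaround to try (a sketch; whether lenient suppresses this particular exception depends on the ES version) is to set lenient, or to point fields at the text fields instead of their keyword sub-fields:

```json
"multi_match": {
  "query": "\"73a\"",
  "fields": [
    "log_no",
    "airplanes_data"
  ],
  "type": "phrase",
  "analyzer": "custom_analyzer",
  "lenient": true
}
```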

ElasticSearch: Ignore frequency in multi_match query

I have a query like this:
"body": {
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "query": "iphone",
            "fields": [
              "primary.titles^2",
              "primary.descriptions^1"
            ],
            "fuzziness": 1,
            "prefix_length": 5,
            "max_expansions": 25,
            "operator": "and"
          }
        }
      ],
      "must": []
    }
  }
}
The problem is that docs with title "iphone/iphone" score much higher than docs with title "iphone". Is there any way to ignore term repetition in search scoring?
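As far as I know there is no per-query switch in multi_match that drops term frequency, but Elasticsearch lets you disable it per field in the mapping via the built-in boolean similarity, which scores a field on match/no-match only. A sketch (the index name is hypothetical, field names follow the question, and changing similarity requires reindexing):

```json
PUT my_index
{
  "mappings": {
    "properties": {
      "primary": {
        "properties": {
          "titles": {
            "type": "text",
            "similarity": "boolean"
          },
          "descriptions": {
            "type": "text",
            "similarity": "boolean"
          }
        }
      }
    }
  }
}
```

With boolean similarity, a title containing "iphone" twice contributes the same score as a title containing it once.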

Fuzzy Matching in Elasticsearch gives different results in two different versions

I have a mapping in Elasticsearch with a field analyzer that has this tokenizer:
"tokenizer": {
  "3gram_tokenizer": {
    "type": "nGram",
    "min_gram": "3",
    "max_gram": "3",
    "token_chars": [
      "letter",
      "digit"
    ]
  }
}
Now I am trying to search for the name "avinash" in Elasticsearch with the query "acinash".
The query formed is:
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "acinash",
            "fields": [
              "name"
            ],
            "type": "best_fields",
            "operator": "AND",
            "slop": 0,
            "fuzziness": "1",
            "prefix_length": 0,
            "max_expansions": 50,
            "zero_terms_query": "NONE",
            "auto_generate_synonyms_phrase_query": false,
            "fuzzy_transpositions": false,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}
In ES version 6.8 I get the desired result, "avinash", when querying "acinash" (because of fuzziness), but in ES version 7.1 I get no result.
The same happens when searching for "avinash" with "avinaah": in 6.8 I get results, but in 7.1 I do not.
ES converts the query into the tokens [aci, cin, ina, nas, ash], which ideally should match the tokenized inverted index entries [avi, vin, ina, nas, ash].
But why is it not matching in 7.1?
It's not related to the ES version.
Update max_expansions to more than 50.
max_expansions: the maximum number of variations created.
With 3-grams over letters and digits as token_chars, an ideal max_expansions would be (26 letters + 10 digits) * 3.
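Following that formula, (26 + 10) * 3 = 108, so the fix amounts to raising the parameter in the query from the question. A sketch showing only the relevant parts; everything except max_expansions stays as before:

```json
"multi_match": {
  "query": "acinash",
  "fields": [
    "name"
  ],
  "fuzziness": "1",
  "max_expansions": 108
}
```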

Elasticsearch wildcard query on numeric fields without using mapping

I'm looking for a creative solution because I can't change the mapping, as the solution is already in production.
I have this query:
{
  "size": 4,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "time": {
              "from": 1597249812405,
              "to": null
            }
          }
        },
        {
          "query_string": {
            "query": "*181*",
            "fields": [
              "deId^1.0",
              "deTag^1.0"
            ],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "sort": [
    {
      "time": {
        "order": "asc"
      }
    }
  ]
}
The "deId" field is an integer in Elasticsearch, and the query returns nothing (though it should).
Is there a solution for wildcard searches on numeric fields without using the multi-field option, which requires a mapping change?
Once you index an integer, ES does not treat the individual digits as position-sensitive tokens. In other words, it's not directly possible to perform wildcards on numeric datatypes.
There are some sub-optimal ways of solving this (think scripting & String.substring) but the easiest would be to convert those integers to strings.
Let's look at an example deId of 123181994:
POST prod/_doc
{
  "deId_str": "123181994"
}
then
GET prod/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "query_string": {
            "query": "*181*",
            "fields": [
              "deId_str"
            ]
          }
        }
      ]
    }
  }
}
works like a charm.
Since your index/mapping is already in production, look into _update_by_query and stringify all the necessary numbers in a single call. After that, if you don't want to (and/or cannot) pass the strings at index time, use ingest pipelines to do the conversion for you.
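For example, a single _update_by_query call with a small Painless script can backfill the string copy for every existing document. A sketch; the index name and the deId_str field name are the ones used in the example above:

```json
POST prod/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.deId_str = ctx._source.deId.toString()"
  },
  "query": {
    "exists": {
      "field": "deId"
    }
  }
}
```

The exists filter restricts the update to documents that actually carry a deId, so documents without the field are left untouched.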

elasticsearch cross fields query alternative for fuzziness?

I have a cross_fields query, and I already understand that you can't use fuzziness with cross_fields queries, but I don't understand the alternative...
This is my simple query:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "John Legend",
            "fields": [
              "fname^-4.0",
              "lname^-1.0",
              "city^-1.0"
            ],
            "type": "cross_fields",
            "lenient": "true",
            "operator": "AND"
          }
        }
      ],
      "minimum_should_match": "1"
    }
  },
  "from": 0,
  "size": 20
}
I want to be able to find:
John Legend
Joh
John Lege
Is that possible?
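The examples above are prefixes of "John Legend" rather than misspellings, so one option is to pair the cross_fields clause with a second multi_match of type bool_prefix, which treats the last term of the query as a prefix. A sketch (bool_prefix requires ES 7.2+; field names follow the question, boosts omitted for brevity):

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "John Lege",
            "fields": ["fname", "lname", "city"],
            "type": "cross_fields",
            "operator": "AND"
          }
        },
        {
          "multi_match": {
            "query": "John Lege",
            "fields": ["fname", "lname", "city"],
            "type": "bool_prefix"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```

The bool_prefix clause matches "Joh" and "John Lege" as the user types, while the cross_fields clause keeps full-name matches ranked highest.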
