multi_match fuzzy query across multiple fields - elasticsearch

I am working to match a 'term' to multi fields (or _all field)
I want to do a fuzzy match on cross_fields but it is not supported.
any ideas how to do it or any other ways to do it ?
query: {
multi_match: {
query: term,
type: "cross_fields",
fields: ['_all']
}
}
when trying the solution here
ElasticSearch multi_match query over multiple fields with Fuzziness
I get this error
[parsing_exception] Fuziness not allowed for type [cross_fields], with
{ line=1 & col=128 }
elasticsearch version 5.0
edit:
here is the query I am building
bool: {
must: [
{
fuzzy: {
_all: term
}
},
{
fuzzy: {
"location.country": country
}
},
{
fuzzy: {
"location.city": city
}
}
]
}

cross_fields works by searching the term on your multiple fields. Since fuzziness isn't supported for cross_fields you have to write the query in a different way.
One possible is: implement your own "cross_fields" with shoulds and add there the fuzziness.
Say your term is: "term1 term2", you can split by word boundary (Regex \b) then should them in this form:
{
{
"query": {
"bool": {
"should": [{
"match": {
"field1": "term",
"fuzziness": 1
}
},{
"match": {
"field1": "term",
"fuzziness": 1
}
},{
"match": {
"field2": "term1",
"fuzziness": 1
}
},{
"match": {
"field2": "term12",
"fuzziness": 1
}
}
]
}
}
}
}
This is probably less the optimal if you have many fields, the query will become a cartesian product of the terms and fields.
Important note You're using _all field which is one field. which all other fields are indexed into. Maybe you don't even need cross_fields?

Related

multi fields search query for elasticsearch golang

I have a situation where I need to do elastic search based on multi-field. For Example: I have multiple fields in my postindex and I want to apply condition on four these fields (i.e. userid, channelid, createat, teamid) to meet my search requirement. When value of all these fields matched then search query displays results and if one of these is not match with values in postindex then it display no result.
I am trying to make a multifield search query for go-elasticsearch to search data from my post index. For the searcquery result four field must match otherwise it display 0 hit/no-result.
So, I think you need to write a following query :
GET postindex/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"must": [
{
"term": {
"userid": {
"value": "mcqmycxpyjrddkie9mr13txaqe"
}
}
},
{
"term": {
"channelid": {
"value": "dnoihmrinins3qrm6bb9175ume"
}
}
},
{
"range": {
"createat": {
"gt": 1672909114890
}
}
}
]
}
},
{
"term": {
"teamid": {
"value": "qomrg11o8b8ijxoy8hrcnweoay"
}
}
}
]
}
}
}
In here, there is a bool query with should in parent scope, which is like OR. And inside the should there is another bool query with must which is like AND. We can also write the query shorter, but this will be better for you to understand.

Stop elastic search tokenizing a query

I'm trying to filter out some documents in elastic search 8.4. The issue I'm having is something like this...
must_not: [
match: { ingredients: { query : 'peanut butter' } }
]
seems to break the query into 'peanut' and 'butter'. Then, documents which contain the ingredient 'butter' get incorrectly filtered. Is there a way to prevent this tokenizing without defining a custom analyzer? Or perhaps a different way to search to get that result?
If you don't want to filter documents with just "peanut" or "butter" you need to use the "and" operator. In this way only documents with "peanut butter" will be filtered.
{
"query": {
"bool": {
"must_not": [
{
"match": {
"ingredients": {
"query": "peanut butter",
"operator": "and"
}
}
}
]
}
}
}

increase score of query where all text match and not repeating words

I'm using the following query but it gets higher score for words which are repeated and is a subset of the words typed but not the entire sentence match.
For Eg:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
If the field value contains : test test test
has higher score than the field value : test in maths
How can I get the higher score for the exact words match and not repeated words?
Thanks in Advance.
If you want to search exact sentences/phrases you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should-clause that contains the match-phrase query to boost the score of exact phrases to your current query.
you can use match_phrase query for an exact match. match_phrase matches for exact occurrence in the sequence of the query provided.
e.g
{
'query': {
'bool': {
'must': [{
'match_phrase': {
'title': 'test in maths'
}
}]
}
}
}
Editing after comment:
Use
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
and then you can use normal match type query, the elastisearch won't consider repetition of the words in the index for the title field.

Query string query with keyword and text fields in the same search

Upgrading from Elasticsearch 5.x to 6.x. We make extensive use of query string queries and commonly construct queries which used fields of different types.
In 5.x, the following query worked correctly and without error:
{
"query": {
"query_string": {
"query": "my_keyword_field:\"Exact Phrase Here\" my_text_field:(any words) my_other_text_field:\"Another phrase here\" date_field:[2018-01-01 TO 2018-05-01]",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
In 6.x, this query will return the following error:
{
"type": "illegal_state_exception",
"reason": "field:[my_keyword_field] was indexed without position data; cannot run PhraseQuery"
}
If I wrap the phrase in parentheses instead of quotes, the search will return 0 results:
{
"query": {
"query_string": {
"query": "my_keyword_field:(Exact Phrase Here)",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
I guess this is because there is a conflict between the way the analyzer stems the incoming query and how the data is stored in the keyword field, but the phrase approach (my_keyword_field:"Exact Phrase Here") did work in 5.x.
Is this no longer supported in 6.x? And if not, what is the migration path and/or a good workaround?
It would be better to rephrase the query by using different type of queries available for different use cases. For example use term query for exact search on keyword field. Use range query for ranges etc.
You can rephrase query as below:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "my_text_field:(any words) my_other_text_field:\"Another phrase here\"",
"default_operator": "AND",
"analyzer": "custom_text"
}
},
{
"term": {
"my_keyword_field": "Exact Phrase Here"
}
},
{
"range": {
"date_field": {
"gte": "2018-01-01",
"lte": "2018-05-01"
}
}
}
]
}
}
}

ElasticSearch: Labelling documents with matching search term

I'm using elasticsearch 1.7 and am in need of a way to label documents with what part of a query_string query they match.
I've been experimenting with highlighting, but found that it gets a bit messy with some cases. I'd love to have the document tagged with matching search terms.
Here is the query that I'm using: ( note this is a ruby hash that later gets encoded to JSON )
{
query: {
query_string: {
fields: ["title^10", "keywords^4", "content"],
query: query_string,
use_dis_max: false
}
},
size: 20,
from: 0,
sort: [
{ pub_date: { order: :desc }},
{ _score: { order: :desc }}
]
}
The query_string variable is based off user followed topics and might look something like this: "(the AND walking AND dead) OR (iphone) OR (video AND games)"
Is there any option I can use so that documents returned would have a property matching a search term like the walking dead or (the AND walking AND dead)
If you're ready to switch to using bool/should queries, you can split the match on each field and use named queries, then in the results you'll get the name of the query that matched.
It goes basically like this: in a bool/should query, you add one query_string query per field and name the query so as to identify that field (e.g. title_query for the title field, etc)
{
"query": {
"bool": {
"should": [
{
"query_string": {
"fields": [
"title^10"
],
"query": "query_string",
"use_dis_max": false,
"_name": "title_query"
}
},
{
"query_string": {
"fields": [
"keywords^4"
],
"query": "query_string",
"use_dis_max": false,
"_name": "keywords_query"
}
},
{
"query_string": {
"fields": [
"content"
],
"query": "query_string",
"use_dis_max": false,
"_name": "content_query"
}
}
]
}
}
}
In the results, you'll then get below the _source another array called matched_queries which contains the name of the query that matched the returned document.
"_source": {
...
},
"matched_queries": [
"title_query"
],

Resources