Elasticsearch: how to write a bool query with multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
At search time, the value aaa.bbb is split into two terms, aaa and bbb.
My question is: how do I write a bool query that applies multiple conditions to the same term?
For example, I want to get all docs whose field contains a term that matches a fuzzy search for gmail, but that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: it matches the fuzzy search but equals gamil
NOTE: the following query does NOT appear to work: it looks as if a document is considered a hit when one term matches one condition and a different term matches the other.
{
...
"body": {
"query": {
"bool": {
"must": [
{
"fuzzy": {
"my_field": {
"value": "gmail",
"fuzziness": 1,
"max_expansions": 2100000000
}
}
},
{
"bool": {
"must_not": [
{
"query_string": {
"default_field": "my_field",
"query": "*gamil*",
"analyzer": "keyword"
}
}
]
}
}
]
}
}
}
}

I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results by the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query clause limit.
(see more details in the provided link)
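A minimal sketch of the highlight-based filtering described above, in Python. It assumes the default highlight tags (<em> and </em>) and uses hand-written sample hits shaped like a search response; the helper names (highlighted_terms, keep_hit, filter_hits) are hypothetical and not part of any Elasticsearch client.

```python
import re

# Assumption: highlighting uses the default <em>...</em> tags.
# Adjust the pattern if pre_tags/post_tags are customised.
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Collect every highlighted term for `field` from one search hit."""
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(EM_PATTERN.findall(fragment))
    return terms


def keep_hit(hit, field, forbidden):
    """Keep the hit if at least one term that matched the fuzzy query
    does not contain the forbidden substring."""
    return any(forbidden not in term for term in highlighted_terms(hit, field))


def filter_hits(hits, field="my_field", forbidden="gamil"):
    return [hit for hit in hits if keep_hit(hit, field, forbidden)]


# Hand-written sample hits (not real cluster output), one per example above.
sample_hits = [
    {"_source": {"my_field": "bmail"},
     "highlight": {"my_field": ["<em>bmail</em>"]}},
    {"_source": {"my_field": "gamil.bmail"},
     "highlight": {"my_field": ["<em>gamil</em>.<em>bmail</em>"]}},
    {"_source": {"my_field": "gamil"},
     "highlight": {"my_field": ["<em>gamil</em>"]}},
]

kept = [hit["_source"]["my_field"] for hit in filter_hits(sample_hits)]
```

Both conditions are checked against the same highlighted term, which is the per-term behaviour the bool query could not express.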

Related

How to use wildcards in an Elasticsearch query to skip some prefix values

I am searching an Elasticsearch cluster with a GET request on the basis of the sourceID tag, whose value is "/A/B/C/UniqueValue.xml", and the search query looks like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"source_id": {
"value": "/A/B/C/UniqueValue.xml"
}
}
}
]
}
}
}
How can I replace "/A/B/C" with a wildcard, or handle it some other way, given that I only have "UniqueValue.xml" as the input for this query? Can someone please provide the modified search query for this requirement? Thanks.
The following search returns documents where the source_id field contains a term that ends with UniqueValue.xml.
{
"query": {
"wildcard": {
"source_id": {
"value": "*UniqueValue.xml"
}
}
}
}
Note that wildcard queries are expensive. If you need fast suffix search, you could add a multi-field to your mapping which includes a reverse token filter. Then you can use prefix queries on that reversed field.
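The reversed-field idea can be sketched as follows, in Python. The analyzer name reversed_keyword, the sub-field name reversed, and the typeless mappings layout (as used by recent Elasticsearch versions) are assumptions for illustration. Because prefix queries are not analyzed, the sketch reverses the input on the client side before querying.

```python
def reversed_suffix_index_body():
    """Index settings and mapping: source_id stays a keyword, and a
    sub-field stores the whole value reversed (keyword tokenizer + reverse filter)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name.
                    "reversed_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["reverse"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "source_id": {
                    "type": "keyword",
                    "fields": {
                        # Hypothetical sub-field name.
                        "reversed": {"type": "text", "analyzer": "reversed_keyword"}
                    },
                }
            }
        },
    }


def suffix_query(suffix):
    """Suffix search expressed as a prefix query on the reversed sub-field.
    The input is reversed here because the prefix query does not analyze it."""
    return {"query": {"prefix": {"source_id.reversed": {"value": suffix[::-1]}}}}


query = suffix_query("UniqueValue.xml")
```

The trade-off is extra index size for the sub-field in exchange for avoiding a leading wildcard at query time.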

Return matched input from an elasticsearch query as they were typed

Here is an example:
"query": {
"bool": {
"should": [
{
"match_phrase_prefix": {
"keyword":{
"query": "pokemno", //approximative term
"fuzziness": 1
}
}
},
//... multiple should closures there
]
}
}
This query matches a document with the "Pokémon" keyword; however, I want to get back the exact search input term(s) that matched each document (here, "pokemno").
AFAIK the highlight feature only returns the matched values as they are stored in the document, not the ones from the search input.
Is there any way to get those values back from the search?

AND between tokens in elasticsearch

When I search for documents with the following query (the field is indexed with the standard analyzer):
"query": {
"match": {
"Book": "OG/44"
}
}
I get the terms 'OG' and '44', and the result set contains documents with either of these terms. Which analyzer/tokenizer should I use to get only results in which both terms are present?
You can set the operator in the match query (by default it is or):
"query": {
"match": {
"Book": {
"query": "OG/44",
"operator" : "and"
}
}
}
You have two tokens because the standard analyzer splits the value on the slash, so if you do not need this behaviour you can escape it.
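To illustrate the difference between the two operators, here is a small Python simulation. It is only a sketch: tokenize is a rough stand-in for the standard analyzer (split on non-alphanumeric characters, lowercase), not the real implementation, and match is a hypothetical helper that ignores scoring.

```python
import re


def tokenize(text):
    # Rough approximation of the standard analyzer (an assumption):
    # split on anything that is not a letter or digit, then lowercase.
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def match(doc_text, query_text, operator="or"):
    """Return True if the document would match the analyzed query terms."""
    doc_terms = set(tokenize(doc_text))
    query_terms = tokenize(query_text)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)


query_terms = tokenize("OG/44")
only_og_with_or = match("OG/12", "OG/44")            # one shared term is enough
only_og_with_and = match("OG/12", "OG/44", "and")    # 44 is missing
both_with_and = match("OG/44 vol 2", "OG/44", "and")
swapped_with_and = match("44 and OG", "OG/44", "and")
```

Note that the and operator only requires both terms to be present; it does not require them to be adjacent or in the original order.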

Filter Then Sort Results By Query in ElasticSearch

Is there a way in Elasticsearch to run a boolean filter and then, without refining the search further, sort/order the results based on a multi_field query?
E.g.: get all items with status_id = 1 (the filter), then order those documents using the keywords "red car" (documents whose name and description contain those keywords come first, documents without them come last).
You can use a bool query.
The documentation describes should as follows:
The clause (query) should appear in the matching document. In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
In our case, the must clause is an exact match on a number, so it contributes the same score to every matching document and does not affect the ordering. The conditions in should are then the only thing that differentiates the scores:
{
"query": {
"bool": {
"must": [
{
"match": {
"status_id": 1
}
}
],
"should": [
{
"multi_match": {
"query": "red car",
"fields": [
"subject",
"message"
]
}
}
]
}
}
}
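As a variation on the query above, the status condition can be placed in the bool query's filter clause, which runs in filter context and is not scored at all. The Python sketch below builds such a request body; it assumes an Elasticsearch version whose bool query supports filter, and the function name filtered_then_scored is hypothetical.

```python
def filtered_then_scored(status_id, keywords, fields):
    """Build a bool query: `filter` restricts the result set without scoring,
    `should` only influences the order of the documents that pass the filter."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"status_id": status_id}}],
                "should": [
                    {"multi_match": {"query": keywords, "fields": list(fields)}}
                ],
            }
        }
    }


body = filtered_then_scored(1, "red car", ["subject", "message"])
```

Because a filter clause is present, the should clause stays optional: documents that match the filter but none of the keywords are still returned, just ranked last.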

query_string vs group match in elasticsearch

What is the difference between this query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test"
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), so querying multiple fields with a single query doesn't necessarily mean the query is analyzed only once.
Keep in mind that query_string supports the Lucene query syntax: AND and OR operators, querying on specific fields, wildcards, phrase queries, etc. It therefore needs to be parsed, which I don't think makes a lot of difference here in terms of performance, but it is error prone: malformed input can make the query fail. If you don't need all that power, stick to the match query, and if you want to run the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores may be quite different if you compare the output of the multiple match queries with that of your query_string. With a bool query you effectively build a Lucene boolean query, while query_string defaults to "use_dis_max": true, which means it uses a dis_max query internally. The same happens with the multi_match query. If you set use_dis_max to false, a bool query is used internally instead.
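The scoring difference can be sketched with plain arithmetic in Python. This is a simplified model, not Lucene's implementation: it assumes a bool query adds up the scores of its matching should clauses (ignoring the coordination factor that older Lucene versions applied), while dis_max takes the best clause score plus tie_breaker times the remaining scores. The per-field scores are made up for illustration.

```python
def bool_should_score(clause_scores):
    """Simplified bool/should scoring: sum of the matching clause scores."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Simplified dis_max scoring: best clause wins, the others contribute
    only through the tie_breaker."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Made-up scores for "test" in description, address and city
# (country did not match, so it contributes no clause score).
scores = [2.0, 0.5, 1.0]

as_bool = bool_should_score(scores)
as_dis_max = dis_max_score(scores)
as_dis_max_tie = dis_max_score(scores, tie_breaker=0.5)
```

Under this model a document matching weakly in many fields ranks higher with the bool query than with dis_max, which favours the single best field.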
In terms of performance, I would say that the second query will have performance benefits, because the first query requires the query string to be analyzed for each of the four match sections, while in the second there is only one query string that needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevance differences, but you can always run these two queries and see whether the results differ in relevance.
