Elasticsearch boolean query doesn't work with filter

I'm not very experienced with Elasticsearch. I'm trying to set up search in my app and have run into some strange problems. I have two documents:
{
"title": "Second insight",
"content": "Bla bla bla",
"library": "workspace"
}
{
"title": "Test source",
"content": "Bla bla bla",
"library": "workspace"
}
I want to search by text fields such as title and content, and apply filters on fields such as library. I have this query:
{
"query": {
"bool": {
"should": [
{ "match": { "title": "insight" }}
],
"filter": [
{
"term": {
"library": "workspace"
}
}
]
}
}
}
Even though I clearly specified that title should match insight, the query above returns both documents, not only the first one.
If I remove the filter block:
{
"query": {
"bool": {
"should": [
{ "match": { "title": "insight" }}
]
}
}
}
the query returns the correct results.
Then I also tried to do a partial search. For some reason, the query below, which uses ins instead of insight, doesn't work and returns an empty list:
{
"query": {
"bool": {
"should": [
{ "match": { "title": "ins" }}
]
}
}
}
How should I do a partial search? And how can I set up filters correctly? In other words, how can I run a partial search on some fields while filtering on other fields?
Thanks.

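You need to set minimum_should_match in your first query. When a bool query contains a filter (or must) clause, minimum_should_match defaults to 0, so the should clause only affects scoring and every document that passes the filter is returned. Without the filter the default is 1, which is why your second query behaves as expected.
With the following query I got only a single document (your desired outcome):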
POST test_things/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"title": "insight"
}
}
],
"filter": [
{
"term": {
"library": "workspace"
}
}
]
}
}
}
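As for why ins doesn't work, it depends on your mapping and the analyzer being used. A match query is compared against the analyzed terms in the index. With the standard analyzer, for example, the title is indexed as whole words such as second and insight, so ins matches nothing. To match on ins you need to change your analyzer (possibly using the ngram tokenizer) or use a wildcard query.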
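The default that causes the surprise in the first query can be written down as a small Python sketch. This is only a toy model of the documented rule, not Elasticsearch code; the function name is hypothetical and only integer values of minimum_should_match are handled.

```python
def effective_minimum_should_match(bool_query):
    """Return the minimum_should_match a bool query would effectively use.

    Toy model of the documented default: 1 when there is at least one
    should clause and no must or filter clause, otherwise 0.
    An explicit integer setting always wins.
    """
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 1 if has_should and not has_must_or_filter else 0


should_clause = [{"match": {"title": "insight"}}]
filter_clause = [{"term": {"library": "workspace"}}]

# should + filter: the should clause becomes optional, so both documents match
with_filter = {"should": should_clause, "filter": filter_clause}
# should only: at least one should clause has to match
without_filter = {"should": should_clause}
# explicit setting, as in the answer above
fixed = {"minimum_should_match": 1, "should": should_clause, "filter": filter_clause}
```

Here with_filter evaluates to 0, while without_filter and fixed both evaluate to 1, which mirrors the behaviour seen in the question.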

Related

How to make a match query on an array field more accurate

Example:
Here is a document:
{
"_source": {
"name": [
"beef soup",
"chicken rice"
]
}
}
It is matched by the query below:
{
"match": {
"name": {
"query": "soup chicken noodle",
"minimum_should_match": "67%"
}
}
}
But I only want it to be matched by keywords such as hot beef soup or rice chicken hainan. Is there any way to do this other than a nested or span query? Thanks.
My ES query is complex. Does anyone know how to rewrite it as a span query?
{
"query": {
"bool": {
"filter": [
...
],
"must": {
"dis_max": {
"queries": [
{
"match": {
"array_field_3": {
"boost": 2,
"minimum_should_match": "67%",
"query": "keyword aa bb"
}
}
},
......
{
"nested": {
"path": "path_1",
"query": {
"must": {
"match": {
"array_field_6": {
......
"query": "keyword aa bb"
}
}
}
}
}
}
}
],
"tie_breaker": 0.15
}
}
}
}
}
You can use match_phrase, but it only works for an entire phrase. If you want to do keyword matching on each element of the array separately, that is not possible without nested or span queries, as mentioned in the documentation:
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
When you get a document back from Elasticsearch, any arrays will be in the same order as when you indexed the document. The _source field that you get back contains exactly the same JSON document that you indexed.
However, arrays are indexed — made searchable — as multi-value fields, which are unordered. At search time you can’t refer to “the first element” or “the last element”.
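To see why the multi-value behaviour quoted above lets "soup chicken noodle" match, here is a rough Python model. It is a simplification under stated assumptions: whitespace splitting stands in for the real analyzer, the percentage is rounded down as the minimum_should_match documentation describes, and both function names are hypothetical.

```python
def required_matches(num_terms, percent):
    # The count computed from a percentage is rounded down (3 terms at 67% -> 2)
    return num_terms * percent // 100


def matches_flattened(values, query, percent):
    """Model of a plain array field: all elements share one bag of terms."""
    indexed = {tok for value in values for tok in value.lower().split()}
    terms = query.lower().split()
    hits = sum(1 for term in terms if term in indexed)
    return hits >= required_matches(len(terms), percent)


def matches_per_element(values, query, percent):
    """Model of the desired behaviour: each element is checked on its own."""
    terms = query.lower().split()
    need = required_matches(len(terms), percent)
    for value in values:
        tokens = set(value.lower().split())
        if sum(1 for term in terms if term in tokens) >= need:
            return True
    return False


names = ["beef soup", "chicken rice"]
```

With names as above, matches_flattened(names, "soup chicken noodle", 67) is true because soup and chicken come from different elements, while matches_per_element rejects that query and still accepts "hot beef soup" and "rice chicken hainan". Getting the per-element behaviour in Elasticsearch itself is what the nested type is for.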
Please try a match_phrase query:
POST index1/_search
{
"query": {
"match_phrase": {
"text": {
"query": "chicken soup"
}
}
}
}

Elasticsearch search query: nested query with OR-gates & AND-gates

I have documents as follows:
{
"name": "...",
"country": "..."
}
I need to find documents that match either one of the following criteria:
name=John AND country=US
name=Andy AND country=UK
How should I write this nested query?
Assuming the default (dynamic) field mapping is used, so that each text field has a .keyword sub-field, you can use bool queries as follows:
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"term": {
"name.keyword": "John"
}
},
{
"term": {
"country.keyword": "US"
}
}
]
}
},
{
"bool": {
"filter": [
{
"term": {
"name.keyword": "Andy"
}
},
{
"term": {
"country.keyword": "UK"
}
}
]
}
}
]
}
}
}
You should use must instead of filter if you want the query to contribute to the score.
must
The clause (query) must appear in matching documents and will
contribute to the score.
filter
The clause (query) must appear in matching documents. However unlike
must the score of the query will be ignored. Filter clauses are
executed in filter context, meaning that scoring is ignored and
clauses are considered for caching.
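If the list of (name, country) pairs comes from application code, the same structure can be generated instead of written by hand. The following Python sketch builds the request body as a plain dict; the helper name any_of_all is hypothetical, the .keyword suffix assumes the default dynamic mapping, and minimum_should_match is set explicitly only to make the OR semantics visible.

```python
def any_of_all(groups):
    """Build a bool query: OR across groups, AND across fields in a group."""
    should = []
    for group in groups:
        # One inner bool per group; every term in it must match (filter = AND)
        terms = [{"term": {f"{field}.keyword": value}} for field, value in group.items()]
        should.append({"bool": {"filter": terms}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = any_of_all([
    {"name": "John", "country": "US"},
    {"name": "Andy", "country": "UK"},
])
```

The resulting body has the same shape as the query above and could be passed to whatever client is in use.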

Is there any option to minimize this Elasticsearch must_not match query?

I'm trying to exclude some text values from a field, and for that I have used a must_not condition. However, it is static and takes a lot of lines. Please let me know if there is any other option to optimize this query.
Here is the query:
"must_not": [
{
"match": {
"field.keyword": "welcome"
}
},
{
"match": {
"field.keyword": "Welcome"
}
},
{
"match": {
"field.keyword": "entry_point"
}
},
{
"match": {
"field.keyword": "Entry point"
}
}
]
Thanks,
If the search text is the same, you can use multi_match, which searches for the text in multiple fields:
"bool": {
"must_not": [
{
"multi_match": {
"query": "text",
"fields": ["field1.keyword","field2.keyword"]
}
}
]
}
If the field is the same and the texts are different, you can use a terms query:
"must_not": [
{
"terms": {
"field.keyword": [
"VALUE1",
"VALUE2"
]
}
}
]
If both the fields and the texts are different, you will have to use the query in your question.
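Applied to the four values from the question, the terms variant could be produced like this. It is a minimal Python sketch that only builds the request fragment; exclude_values is a hypothetical helper, and the values are kept exactly as written because a terms query on a keyword field matches them verbatim.

```python
def exclude_values(field, values):
    """Return a bool fragment that rejects documents whose field equals any value."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep original order
    return {"bool": {"must_not": [{"terms": {field: unique}}]}}


fragment = exclude_values(
    "field.keyword",
    ["welcome", "Welcome", "entry_point", "Entry point"],
)
```

This replaces the four separate match clauses with a single terms clause.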
As you said you are not looking for an exact match, I would just use query_string for single words and match_phrase for phrases:
"must_not": [
{
"query_string": {
"query": "welcome OR Welcome"
}
},
{
"match_phrase": {
"title": {
"query": "entry point"
}
}
}
]
I'm not sure which analyzer you use, but if you use, for example, a lowercase, alphanumeric-only analyzer, you won't need "duplicate" queries like "welcome" and "Welcome".

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
The resulting query (built with NEST, but that doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elasticsearch offers a lot of full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I tried running only them.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I'm looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
An ngram analyzer splits your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
Worl
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub-terms are generated, you can use a match query on the subfield.
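The sliding window is easy to reproduce outside Elasticsearch. The Python sketch below is only an illustration of the idea: it splits on whitespace first, which is an assumption (the real tokenizer's handling of spaces depends on its token_chars setting), and the function name and the 3 to 5 window are chosen just to match the example above.

```python
def word_ngrams(text, min_gram=3, max_gram=5):
    """Generate n-grams per whitespace-separated word, window sizes min..max."""
    grams = []
    for word in text.split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    grams.append(word[start:start + size])
    return grams


terms = word_ngrams("Hello World")
# ['Hel', 'Hell', 'Hello', 'ell', 'ello', 'llo',
#  'Wor', 'Worl', 'World', 'orl', 'orld', 'rld']
```

Because every substring of length 3 to 5 becomes its own term, a plain match query for a fragment of a name finds a hit without wildcards, at the cost of a larger index.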
Another point: it is odd to use must within a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.

Exact and fuzzy search

My setup:
I have some documents with name "Apple", "Apple delicous", ...
This is my query:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "apple"
}},
{ "fuzzy": {
"name": "apple"
}}
]
}
}
}
I want to achieve that the exact match is shown first and then the fuzzy one:
apple
apple delicous
Second, I am wondering why I do not get any result if I enter only app in the search:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "app"
}},
{ "fuzzy": {
"name": "app"
}}
]
}
}
}
There are two problems here.
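1) To give a higher score to an exact match, you could add a not_analyzed sub-field (raw below) to your name field. Note that string and not_analyzed are the mapping syntax from before Elasticsearch 5.0; newer versions use a keyword sub-field for the same purpose.
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}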
After that your query would look like this
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "apple"
}
},
{
"match": {
"name.raw": {
"query": "apple",
"boost": 5
}
}
}
]
}
}
}
This will give a higher score to the document with "apple" than to "apple delicous".
2) To better understand fuzziness, you should go through this and this article.
From the Docs
The fuzziness parameter can be set to AUTO, which results in the
following maximum edit distances:
0 for strings of one or two characters
1 for strings of three, four, or five characters
2 for strings of more than five characters
So the reason your fuzzy query did not return apple for app is that the edit distance between those two words is 2, while "app" is only three letters long, so the maximum edit distance allowed under AUTO is 1. You could get the desired result with the following query:
{
"query": {
"fuzzy": {
"name": {
"value": "app",
"fuzziness": 2
}
}
}
}
I would seriously not recommend using this query, because it will return bizarre results: it will also match cap, arm, pip and a lot of other words, as they all fall within an edit distance of 2.
This would be a better query:
{
"query": {
"fuzzy": {
"name": {
"value": "appl"
}
}
}
}
It will return apple.
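The numbers in this explanation can be checked with a few lines of Python. This is a sketch, not Elasticsearch's implementation: auto_fuzziness encodes the AUTO thresholds quoted above, and edit_distance is plain Levenshtein distance (Elasticsearch additionally counts a transposition as one edit by default, which makes no difference for these words). Both function names are hypothetical.

```python
def auto_fuzziness(term):
    """Maximum edit distance allowed under AUTO for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query_term, indexed_term, fuzziness=None):
    allowed = auto_fuzziness(query_term) if fuzziness is None else fuzziness
    return edit_distance(query_term, indexed_term) <= allowed
```

With these definitions, fuzzy_matches("app", "apple") is false under AUTO, fuzzy_matches("appl", "apple") is true, and forcing fuzziness=2 for "app" also lets in "cap", "arm" and "pip".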
I hope this helps.
I think this will help you:
{"query":{"bool":{"must":[{"function_score":{"query":{"multi_match":{"query":"airetl","fields":["brand_lower"],"boost":1,"fuzziness":"AUTO","prefix_length":1}}}}]}}}

Resources