Elasticsearch post_filter on nested types not filtering on aggregations - elasticsearch

Hello Elasticsearch gurus, I need your help!
We're using Elasticsearch v5.4, and I have multiple nested types in Elasticsearch in the form of:
"testLookup" : [
{
"id": 1001,
"name": "test1"
},
{
"id": 1002,
"name": "test2"
}
]
I'm trying to display the names as checkbox options in a filter.
I was trying to use the top-level post_filter element to filter the aggregations, so that I can update and display ONLY the affected filter options, kind of like how it is used on this demo site: http://demo.searchkit.co
"post_filter": {
"bool":{
"must": [
{
"nested" : {
"path" : "testLookup",
"query": {
"bool": {
"filter":{
"bool":{
"should":[
{ "term": { "perTestLookup.name.keyword": "Adele"}}
]
}
}
}
}
}
},
{
"nested" : {
"path" : "testLookup2",
"query": {
"bool": {
"filter":{
"bool":{
"should":[
{ "term": { "perTestLookup2.name.keyword": "Gene" }}
]
}
}
}
}
}
}
]
}
}
If I'm not mistaken, post_filter applies the filter to each of the aggregations before the search request is sent, which you can observe by looking at the request payload of the search after you click one of the filter checkboxes.
However, I cannot get post_filter to apply the filters to the aggregations; the filters are just not being applied. Why is this the case? Is post_filter not supported for nested types?
Any tips or guidance will be greatly appreciated. Thank you!

Note: I do not have any searchkit experience.
The post_filter in an Elasticsearch request is executed after the aggregations are calculated, to narrow the hits further. See the docs about post_filter.
The idea behind this is that in a faceted search (imagine filtering by category), you still want the aggregations to return the counts for all categories, but you want the returned documents filtered to the category that was clicked.
So this is expected behaviour from the Elasticsearch perspective.
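If the aggregations themselves should be narrowed, one common approach is to wrap each facet in a filter aggregation that applies the filters of the other facets, and keep post_filter for the hits only. Below is a minimal Python sketch that just builds such a request body. The field names (testLookup.name.keyword, testLookup2.name.keyword), the aggregation names and the helper functions are assumptions for illustration; they are not taken from the question's mapping.

```python
import json


def nested_term_filter(path, field, value):
    """Nested query matching documents that have a `path` object with field == value."""
    return {"nested": {"path": path, "query": {"term": {field: value}}}}


def facet_aggregation(path, field, other_filters):
    """Terms facet on a nested field, restricted by the filters of the OTHER facets."""
    return {
        # The filter aggregation narrows the documents this facet is counted over.
        "filter": {"bool": {"filter": other_filters}},
        "aggs": {
            "inner": {
                # Step into the nested objects before running the terms aggregation.
                "nested": {"path": path},
                "aggs": {"names": {"terms": {"field": field}}},
            }
        },
    }


# Hypothetical field names, chosen to line up with the nested paths.
f1 = nested_term_filter("testLookup", "testLookup.name.keyword", "Adele")
f2 = nested_term_filter("testLookup2", "testLookup2.name.keyword", "Gene")

body = {
    "query": {"match_all": {}},
    # post_filter only narrows the returned hits, never the aggregations.
    "post_filter": {"bool": {"filter": [f1, f2]}},
    "aggs": {
        # Each facet sees the other facet's selection, but not its own.
        "testLookup_names": facet_aggregation("testLookup", "testLookup.name.keyword", [f2]),
        "testLookup2_names": facet_aggregation("testLookup2", "testLookup2.name.keyword", [f1]),
    },
}

print(json.dumps(body, indent=2))
```

Leaving a facet's own selection out of its filter keeps the unchecked options of that facet visible with their counts, which is the usual faceted-navigation behaviour.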

Related

Stop elastic search tokenizing a query

I'm trying to filter out some documents in Elasticsearch 8.4. The issue I'm having is that something like this...
must_not: [
match: { ingredients: { query : 'peanut butter' } }
]
seems to break the query into 'peanut' and 'butter'. Then, documents which contain only the ingredient 'butter' get incorrectly filtered out. Is there a way to prevent this tokenizing without defining a custom analyzer? Or perhaps a different way to search to get that result?
If you don't want to filter out documents that contain just "peanut" or just "butter", you need to use the "and" operator. That way only documents that contain both "peanut" and "butter" will be filtered out.
{
"query": {
"bool": {
"must_not": [
{
"match": {
"ingredients": {
"query": "peanut butter",
"operator": "and"
}
}
}
]
}
}
}
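To see the difference between the default "or" operator, the "and" operator and a match_phrase query (which additionally requires the terms to be adjacent and in order), here is a small pure-Python sketch. It does not talk to Elasticsearch; the tokens function is a rough stand-in for the standard analyzer (lowercase, split on whitespace), and the sample ingredient strings are made up.

```python
def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_or(doc, query):
    """match with the default operator: any query term may match."""
    return any(t in tokens(doc) for t in tokens(query))


def match_and(doc, query):
    """match with operator "and": every query term must be present."""
    return all(t in tokens(doc) for t in tokens(query))


def match_phrase(doc, query):
    """match_phrase: the query terms must appear adjacent and in order."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Made-up ingredient values for illustration.
docs = ["butter", "peanut butter", "butter and peanut oil"]
query = "peanut butter"

or_hits = [match_or(d, query) for d in docs]
and_hits = [match_and(d, query) for d in docs]
phrase_hits = [match_phrase(d, query) for d in docs]

for d, o, a, p in zip(docs, or_hits, and_hits, phrase_hits):
    print(f"{d!r}: or={o} and={a} phrase={p}")
```

Note that "and" still matches a document where the two words appear far apart, so if only the exact ingredient "peanut butter" should be excluded, match_phrase inside must_not may be the closer fit.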

Not getting where data with filter (elastic search 6.4)

elasticsearch version: 6.4
Here is my current data:
I want to search for products which have Xbox in the name. I am using a match query, but that is not working.
Below is my Elasticsearch query:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "xbox"
}
}
},
{
"terms": {
"deep_sub": [
"Konsol Oyunları",
"Konsol Aksesuarları"
]
}
}
]
}
},
"from": 0,
"size": 50
}
Whenever you face this kind of issue, try breaking down the query. You have a Match Query and a Terms Query. Run each of them individually and see which one is not working.
From what I understand, it looks like your field deep_sub is of text type, which would explain why the Terms Query is not returning results.
You would need to create a sibling field of keyword type and then run the Terms Query on it for exact matches.
From the documentation on the keyword type:
Keyword fields are only searchable by their exact value.
If you do not control the mapping, meaning your mapping is dynamic, then the keyword sibling field should already be available as deep_sub.keyword
You can check by running GET <your_index_name>/_mapping
Your query would then be as follows; the only change is that the terms clause now targets deep_sub.keyword:
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"match":{
"name":{
"query":"xbox"
}
}
},
{
"terms":{
"deep_sub.keyword":[
"Konsol Oyunları",
"Konsol Aksesuarları"
]
}
}
]
}
},
"from":0,
"size":50
}
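The reason the text field fails can be shown with a toy model of the index. This is only a sketch: analyze_text is a simplified stand-in for the standard analyzer (lowercase, split on whitespace), and the function names are made up for the example.

```python
def analyze_text(value):
    """Simplified text-field analysis: lowercase and split into separate tokens."""
    return set(value.lower().split())


def analyze_keyword(value):
    """A keyword field indexes the whole value as one unchanged term."""
    return {value}


def terms_query(indexed_terms, wanted):
    """A terms query does not analyze its input; it looks up each value as-is."""
    return any(w in indexed_terms for w in wanted)


value = "Konsol Oyunları"
wanted = ["Konsol Oyunları", "Konsol Aksesuarları"]

# The text field only holds the lowercased tokens, so the full value is never found.
text_hit = terms_query(analyze_text(value), wanted)
# The keyword field holds the exact original string.
keyword_hit = terms_query(analyze_keyword(value), wanted)

print("text field:", text_hit)
print("keyword field:", keyword_hit)
```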
Let me know if this helps!

ElasticSearch query with MUST and SHOULD

I have this query to get data from AWS elasticSearch instance v6.2
{
"query": {
"bool": {
"must": [
{
"term": {"logLevel": "error"}
},
{
"bool": {
"should": [
{
"match": {"EventCategory": "Home Management"}
}
]
}
}
],
"filter": [{
"range": { "timestamp": { "gte": 155254550880 }}
}
]
}
},
"size": 10,
"from": 0
}
My data has multiple EventCategories, for example 'Home Management' and 'User Account Management'. The problem is that the match inside should returns all the data, because the word 'Management' is in both categories. If I use term instead of match, it doesn't return anything at all, even when the given value is exactly the same as in the document.
I need to get data when any of the given categories matches, together with the rest of the filters.
EDIT:
None, one, or more than one EventCategory may be passed to the should clause.
I'm not sure why you added a should within a must. Do you expect to have more than one should clause? It looks a bit odd.
As for your question, you can't use the term query on an analysed field, but only on keyword typed fields. If your EventCategory field has the default mapping, you can run the term query against the default non-analysed multi-field of EventCategory as follows:
...
{
"term": { "EventCategory.keyword": "Home Management" }
}
...
Furthermore, if you just want to filter documents in or out without caring about their relevance, I'd recommend moving all the conditions into the filter block, to speed up your query and make better use of the cache.
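Since none, one or several categories may be passed, the request body could be built programmatically with everything in filter context. The following is a minimal Python sketch, not a definitive implementation: the build_query helper is hypothetical, and it assumes EventCategory has the default dynamic mapping with a .keyword sub-field.

```python
import json


def build_query(categories, since):
    """Build a filter-only bool query; the category clause is added only when needed."""
    filters = [
        {"term": {"logLevel": "error"}},
        {"range": {"timestamp": {"gte": since}}},
    ]
    if categories:
        # terms matches documents whose field equals ANY of the given exact values.
        filters.append({"terms": {"EventCategory.keyword": list(categories)}})
    return {"query": {"bool": {"filter": filters}}, "size": 10, "from": 0}


no_category = build_query([], 155254550880)
two_categories = build_query(["Home Management", "User Account Management"], 155254550880)

print(json.dumps(two_categories, indent=2))
```

With an empty list the category clause is simply left out, so the remaining filters still apply on their own.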
The query below should work.
I've just removed should and created two must clauses, one for each of home and management. Note that the query is meant for text datatypes.
{
"query":{
"bool":{
"must":[
{
"term":{
"logLevel":"error"
}
},
{
"match":{
"EventCategory":"home"
}
},
{
"match":{
"EventCategory":"management"
}
}
],
"filter":[
{
"range":{
"timestamp":{
"gte":155254550880
}
}
}
]
}
},
"size":10,
"from":0
}
Hope it helps!

How to boost individual documents

I have a pretty complex query and now I want to boost some documents that fulfill some criteria. I have the following simplified document structure and I try to give some documents a boost based on the id, genre, tag.
{
"id": 123,
"genres": ["ACTION", "DRAMA"],
"tags": ["For kids", "Romantic", "Nature"]
}
What I want to do is for example
id: 123 boost: 5
genres: ACTION boost: 3
tags: Romantic boost: 0.2
and boost all documents that are contained in my query and fit the criteria but I don't want to filter them out. So query clause boosting is not of any help I guess.
Edit: To make it easier to understand what I want to achieve (I'm not sure if it is possible with Elasticsearch; no is also a valid answer).
I want to search with a query and get a result set. In this set I want to boost some documents. But I don't want to enlarge the result set or filter it. The boost should be independent from the query.
For example I search for a specific tag and want to boost all documents with category 'ACTION' in the result set. But I don't want all documents with category 'ACTION' in the result set and also I don't want only documents with the specific tag AND category 'ACTION'.
I think you need dynamic boosting at query time.
In the query below, the first clause matches the title field with a boost of 5, and the second one matches 'Action' in the content field.
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "id",
"boost": 5
}
}
},
{
"match": {
"content": "Action"
}
}
]
}
}
}
If you want a multi_match with per-field boosts based on your query:
{
"multi_match" : {
"query": "some query terms here",
"fields": [ "id^5", "genres^3", "tags^0.2" ]
}
}
Note: the ^5 suffix means a boost of 5 for that field.
Edit:
Maybe you are asking about the different types of multi_match queries (at least for ES 5.x). From the ES reference guide:
best_fields
(default) Finds documents which match any field, but uses
the _score from the best field. See best_fields.
most_fields
Finds documents which match any field and combines the _score from
each field. See most_fields.
cross_fields
Treats fields with the same analyzer as though they were one big
field. Looks for each word in any field. See cross_fields.
phrase
Runs a match_phrase query on each field and combines the _score from
each field. See phrase and phrase_prefix.
phrase_prefix
Runs a match_phrase_prefix query on each field and combines the _score
from each field. See phrase and phrase_prefix.
More at: ES 5.4 ElasticSearch reference
I found a solution and it was pretty simple. I use a boosting query. I now just nest the different boosting criteria, and my original query is now the base query.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-boosting-query.html
For example:
{
"query": {
"boosting": {
"positive": {
"boosting": {
"positive": {
"match": {
"director": "Spielberg"
}
},
"negative": {
"term": {
"genres": "DRAMA"
}
},
"negative_boost": 1.3
}
},
"negative": {
"term": {
"tags": "Romantic"
}
},
"negative_boost": 1.2
}
}
}
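How the nested boosting queries combine can be sketched in a few lines of Python. As far as I know, a document that matches the negative query gets its score multiplied by negative_boost, so nesting two boosting queries multiplies the factors. The base score of 2.0 and the sample documents are made up, and boosted_score is a hypothetical helper, not Elasticsearch code.

```python
def boosted_score(base_score, doc, rules):
    """Multiply base_score by the factor of every (field, value) rule the doc matches."""
    score = base_score
    for field, value, factor in rules:
        if value in doc.get(field, []):
            score *= factor
    return score


# Mirrors the example above: DRAMA -> 1.3, Romantic -> 1.2.
rules = [("genres", "DRAMA", 1.3), ("tags", "Romantic", 1.2)]

doc_a = {"genres": ["ACTION", "DRAMA"], "tags": ["Romantic"]}
doc_b = {"genres": ["ACTION"], "tags": ["Nature"]}

# Made-up base score standing in for the score of the innermost positive query.
score_a = boosted_score(2.0, doc_a, rules)  # both rules apply: 2.0 * 1.3 * 1.2
score_b = boosted_score(2.0, doc_b, rules)  # no rule applies: unchanged

print(score_a, score_b)
```

Neither rule adds or removes documents; only the ordering changes, which is what the question asked for. The current reference describes negative_boost as a value between 0 and 1 meant for demoting matches, so it is worth checking that values above 1 behave as expected on your version.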

what's difference between simple_query_string and query_string?

I have a nested field source in my index that looks like this:
"source": [
{
"name": "source_c","type": "type_a"
},
{
"name": "source_c","type": "type_b"
}
]
I used a query_string query and a simple_query_string query to query type_a and got two different results.
query_string
{
"size" : 3,
"query" : {
"bool" : {
"filter" : {
"query_string" : {
"query" : "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163459 hits in 294088 docs.
simple_query_string
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163505 hits in 294088 docs.
I only generated three different types (type_a, type_b, type_c) at random, so 163459 and 163505 are a very small difference in 294088 docs.
I only found one piece of information, in the Elasticsearch Reference [2.1]:
Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query.
I don't think that's the reason for the difference.
I want to know what causes the slightly different results between query_string and simple_query_string.
As far as I know, nested query syntax is not supported by either query_string or simple_query_string. It is an open issue, and this is the PR regarding that issue.
Then how are you getting results? Here the Explain API will help you understand what is going on. Run it on this query:
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
and have a look at the output. You will see
"description": "ConstantScore(QueryWrapperFilter(_all:source _all:source.type _all:type_a))",
so what is happening here is that ES is looking for the terms source, source.type or type_a; it finds type_a and returns the result.
You will find something similar for query_string using the Explain API.
Also, query_string and simple_query_string have different syntax; e.g. field_name:search_text is not supported in simple_query_string.
The correct way to query nested objects is to use a nested query.
EDIT
This query will give you desired results.
{
"query": {
"nested": {
"path": "source",
"query": {
"term": {
"source.type": {
"value": "type_a"
}
}
}
}
}
}
Hope this helps!!
According to the documentation, simple_query_string is meant to be used with unsafe input.
Users can enter anything, and it will not throw an exception if the input is invalid; it will simply discard the invalid parts.
