Elasticsearch query with should returning 10,000 results but nothing matches

I have an index holding about 60 GB of data, and I want to retrieve one specific product by its reference number.
Here is my query:
GET myindex/_search
{
"_source": [
"product.ref",
"product.urls.*",
"product.i18ns.*.title",
"product_sale_elements.quantity",
"product_sale_elements.prices.*.price",
"product_sale_elements.listen_price.*",
"product.images.image_url",
"product.image_count",
"product.images.visible",
"product.images.position"
],
"size": "6",
"from": "0",
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "product.sales_count",
"missing": 0,
"modifier": "log1p"
}
},
{
"field_value_factor": {
"field": "product.image_count",
"missing": 0,
"modifier": "log1p"
}
},
{
"field_value_factor": {
"field": "featureCount",
"missing": 0,
"modifier": "log1p"
}
}
],
"query": {
"bool": {
"filter": [
{
"term": {
"product.is_visible": true
}
}
],
"should": [
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
]
}
}
}
},
"aggs": {
"by_categories": {
"terms": {
"field": "categories.i18ns.de_DE.title.raw",
"size": 100
}
}
}
}
My question is: why does this query return 10k results when I only want the single product with that reference number?
If I do:
GET my-index/_search
{
"query": {
"match": {
"product.ref": "13141000"
}
}
}
it matches correctly. How is should different from a normal match query?

If you have must or filter clauses, as you do, then a document that matches must or filter does not have to match your should clause, since should is then considered optional and only affects the score.
You can either move the query_string from your should clause into filter, or set minimum_should_match to 1, like this:
...
"should": [
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
],
"minimum_should_match" : 1,
...
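The rule above can be sketched as a toy model in Python. This is not how Elasticsearch is implemented; it only mimics the documented matching rule, where minimum_should_match defaults to 1 when a bool query has should clauses and no must or filter clauses, and to 0 otherwise. The function name bool_matches, the predicates, and the sample documents are all made up for illustration.

```python
def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Toy model of bool-query matching; every clause is a predicate on doc."""
    if minimum_should_match is None:
        # Documented default: 1 only when there are should clauses
        # and no must/filter clauses; otherwise 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


# Hypothetical documents and clauses mirroring the question.
docs = [
    {"ref": "13141000", "is_visible": True},
    {"ref": "999", "is_visible": True},
    {"ref": "13141000", "is_visible": False},
]
is_visible = lambda d: d["is_visible"]
ref_matches = lambda d: d["ref"] == "13141000"

# filter + should: every visible document matches, should is optional.
loose = [d for d in docs if bool_matches(d, filter_=[is_visible], should=[ref_matches])]

# Same query with minimum_should_match = 1: only the wanted product matches.
strict = [d for d in docs
          if bool_matches(d, filter_=[is_visible], should=[ref_matches],
                          minimum_should_match=1)]
```

In this model, loose contains both visible documents, which corresponds to the 10k hits in the question, while strict contains only the visible document with the requested reference.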

Must - The condition must match.
Should - If the condition matches, it improves the score in a non-filter context (as long as minimum_should_match is not declared explicitly).
Must is similar to filter but also contributes to scoring; filter does not provide any scoring.
You can put this clause inside a new must clause:
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
Boost will not affect scoring if you put the above inside the filter clause.
Read more in the Elasticsearch documentation on bool queries.

Related

Elasticsearch Ranking on aggregation based on AND and OR

I have a query with multiple keywords with an aggregation on author ID.
I want the ranking to be based on combining must and should.
For example for query 'X', 'Y' the authors containing both 'X' and 'Y' in the document field should be ranked higher, followed by authors who have either 'X' or 'Y'.
Doing each of them (AND/OR) separately is easy; I need an idea or direction for how to achieve both in one ES query.
The current query I have for both X and Y is:
GET /docs/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "X",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
},
{
"query_string": {
"query": "Y",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
Changing must to should turns it into an OR, but I want a combination of both, where authors matching all terms rank higher in the aggregation.
The usual way of boosting results is to add a should clause that looks for both terms, like this:
GET /docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "AND"
}
}
}
],
"must": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "OR"
}
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
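The query above boosts the score of hits that contain both terms, but a terms aggregation orders its buckets by document count by default, not by relevance. One possible way to carry the ranking into the aggregation is to order the buckets by a max sub-aggregation over _score, a pattern shown in the Elasticsearch top_hits documentation. The sketch below only builds the request body as a Python dict; the helper name author_ranking_request and the sub-aggregation name best_score are assumptions for illustration.

```python
import json


def author_ranking_request(text, field="fulltext", author_field="author.id.keyword"):
    """Build a search body that ranks author buckets by their best hit score."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Any of the terms is required ...
                "must": [{"match": {field: {"query": text, "operator": "or"}}}],
                # ... and documents containing all of them get a higher score.
                "should": [{"match": {field: {"query": text, "operator": "and"}}}],
            }
        },
        "aggs": {
            "search-users": {
                "terms": {
                    "field": author_field,
                    "size": 200,
                    # Order buckets by the sub-aggregation defined below.
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # Highest _score among the documents of each author.
                    "best_score": {"max": {"script": {"source": "_score"}}},
                    "top-docs": {"top_hits": {"size": 100}},
                },
            }
        },
    }


body = author_ranking_request("X Y")
print(json.dumps(body, indent=2))
```

With this ordering, an author who has at least one document containing both X and Y should come before authors who only match one of the terms, assuming the should clause lifts those documents' scores above the rest.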

Query on multiple range of document

I want to search within a certain set of documents, not across all documents. I already know the ids of those documents from a different process. For example, I want to match some sentences against the query field 'pLabel', but only among the documents whose ids I know. My attempt is below, but it returns a bunch of documents that differ from what I expect.
For example, given the groups eid1, eid2, and so on, I want to get only the matching documents that belong to those groups (eid1, eid2, eid3, ...). The query is shown below.
How do I fix the query to get the right search result?
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "pLabel" ,
"query": "search words here"
}
}
] ,
"must_not": [] ,
"should": [
{
"term": {
"eid": "eid1"
}
} ,
{
"term": {
"eid": "eid2"
}
}
]
}
} ,
"size": 0 ,
"_source": [
"eid"
] ,
"aggs": {
"eids": {
"terms": {
"field": "eid" ,
"size": 1000
}
}
}
}
You need to move the should clause for the doc IDs inside the must clause.
Right now the query can return any document that matches the query_string clause; it only prefers docs that match the doc IDs.
Also, you should use a terms query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "pLabel",
"query": "search words here"
}
},
{
"terms": {
"eid": ["eid1", "eid2"]
}
}
]
}
},
"size": 0,
"_source": [
"eid"
],
"aggs": {
"eids": {
"terms": {
"field": "eid",
"size": 1000
}
}
}
}
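Since the list of ids comes from a separate process, the request body is likely to be built in code. A minimal sketch in Python follows, assuming the field names from the question (pLabel and eid); the helper name search_within_ids is made up. It places the terms clause in filter rather than must, which restricts the result set the same way but does not contribute to the score.

```python
def search_within_ids(text, ids, text_field="pLabel", id_field="eid"):
    """Build a search body limited to documents whose id_field is in ids."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [
                    {"query_string": {"default_field": text_field, "query": text}}
                ],
                # Required but unscored restriction to the known ids.
                "filter": [{"terms": {id_field: list(ids)}}],
            }
        },
        "aggs": {"eids": {"terms": {"field": id_field, "size": 1000}}},
    }


body = search_within_ids("search words here", ["eid1", "eid2"])
```

Either placement (must or filter) makes the id restriction mandatory; only a should clause leaves it optional.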

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need it to return rows with unique usernames. Currently, it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You would need to use terms aggregation to get all unique users and then use top hits aggregation to get only one result for each user. This is how it looks.
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only aggregation results are returned. One important thing to remember is that the terms aggregation groups on the exact indexed values, i.e. ABC and abc will be considered different users, so you might have to make your userid field not_analyzed to be sure it is not tokenized.
Hope this helps!!
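With the outer size set to zero, the posts are no longer in the top-level hits but nested inside each bucket. A minimal Python sketch of reading them back is shown below, assuming the aggregation names used in the answer (unique_user and only_one_post). The helper name one_post_per_user and the sample response are hypothetical, and the response is trimmed to the fields the function reads.

```python
def one_post_per_user(response, agg="unique_user", hits_agg="only_one_post"):
    """Return {userid: source of that user's top hit} from a search response."""
    posts = {}
    for bucket in response["aggregations"][agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:
            posts[bucket["key"]] = hits[0]["_source"]
    return posts


# Hypothetical, trimmed response in the shape Elasticsearch returns
# for a terms aggregation with a top_hits sub-aggregation.
sample_response = {
    "aggregations": {
        "unique_user": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 3,
                    "only_one_post": {"hits": {"hits": [
                        {"_score": 2.1, "_source": {"userid": "u1", "gtitle": "voice lessons"}}
                    ]}},
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "only_one_post": {"hits": {"hits": [
                        {"_score": 1.4, "_source": {"userid": "u2", "gtitle": "voice coach"}}
                    ]}},
                },
            ]
        }
    }
}

posts = one_post_per_user(sample_response)
```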

Minimum should match on filtered query

Is it possible to have a query like this
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and"
}
}
}
}
With the "minimum_should_match": "2" statement?
I know that I can use a simple query (I've tried, it works) but I don't need the score to be computed. My goal is just to filter documents which contain 2 of the values.
Does scoring generally have a heavy impact on the time needed to retrieve documents?
Using this query:
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and",
"minimum_should_match": "2"
}
}
}
}
I got this error:
QueryParsingException[[my_db] [terms] filter does not support [minimum_should_match]]
Minimum should match is not a parameter for the terms filter. If that is the functionality you are looking for, I might rewrite your query like this, to use the bool query wrapped in a query filter:
{
"filter": {
"query": {
"bool": {
"should": [
{
"term": {
"names": "Anna"
}
},
{
"term": {
"names": "Mark"
}
},
{
"term": {
"names": "Joe"
}
}
],
"minimum_should_match": 2
}
}
}
}
This matches documents that contain at least two of the three names, including those that contain all three. (A must clause, by contrast, would be an implicit AND.) No score is computed, because the query is executed as a filter.
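The filtered query and the query filter used above were deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions, the same idea can be expressed by nesting the should clauses inside the filter of a bool query, which also runs without scoring. The Python sketch below only builds that request body; the helper name at_least_n_of is an assumption for illustration.

```python
def at_least_n_of(field, values, n):
    """Build an unscored query matching docs with at least n of the values."""
    return {
        "query": {
            "bool": {
                # Filter context: clauses here restrict results without scoring.
                "filter": [
                    {
                        "bool": {
                            "should": [{"term": {field: v}} for v in values],
                            "minimum_should_match": n,
                        }
                    }
                ]
            }
        }
    }


body = at_least_n_of("names", ["Anna", "Mark", "Joe"], 2)
```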

query not applying custom score

I'm running the following query; my problem is that the custom score (script_score) is not being applied. Am I doing something wrong?
{
"query": {
"bool": {
"must": [
{
"terms": {
"tactics": [
"user_id",
"type_user",
"browser_plugins",
"cashback"
]
}
}
]
},
"script_score": {
"script": "type_user === 2 ? 1 : 2"
}
},
"from": "0",
"size": 50,
"sort": {
"name": {
"order": "desc",
"ignore_unmapped": true
}
}
}
The script_score section in your query gets ignored. If you want it to be taken into account, you need to wrap your existing bool query in a function_score query, where you can use the script_score part as well.
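A rough sketch of that wrapping, built as a Python dict, is shown below. The helper name with_script_score is made up, and the script is an assumption: it is written in Painless for newer Elasticsearch versions and presumes type_user is a numeric field with doc values, which the question does not confirm. Note also that the original request sorts by name, so even a correctly applied score would not change the order of the hits unless _score is part of the sort.

```python
def with_script_score(query, script_source):
    """Wrap an existing query in function_score with a script_score function."""
    return {
        "function_score": {
            "query": query,
            "script_score": {"script": {"source": script_source}},
        }
    }


# The bool query from the question, unchanged.
bool_query = {
    "bool": {
        "must": [
            {
                "terms": {
                    "tactics": ["user_id", "type_user", "browser_plugins", "cashback"]
                }
            }
        ]
    }
}

body = {
    # Assumed Painless equivalent of the question's "type_user === 2 ? 1 : 2".
    "query": with_script_score(bool_query, "doc['type_user'].value == 2 ? 1 : 2"),
    "from": 0,
    "size": 50,
}
```

By default function_score multiplies the script's value with the query score; setting boost_mode to replace would use the script's value alone.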

Resources