How to calculate score after filter in Elasticsearch - elasticsearch

I want the score to be calculated after the filter is applied.
Right now, the score is calculated over all the documents in the index, so documents belonging to other entities interfere with the score calculation, which I don't want.
E.g., I have multiple stores, and each store's items should be scored independently.
I tried the score_function, but could not make it work.
Indexing each store's items in a separate index would solve the problem, but there can be thousands of stores.
Each store can have from a few thousand up to 200,000 documents.
There can be up to 10,000 stores, possibly more, since customers can create as many as they want.
{
"bool": {
"should": [{
"match": {
"normalized_tokens": {
"query": "Where can i find the chocolate",
"analyzer": "custom_analyzed_document_search",
"operator": "or"
}
}
}, {
"match": {
"normalized_tokens": {
"query": "find chocolate",
"analyzer": "custom_analyzed_document_search",
"operator": "or"
}
}
}, {
"term": {
"document": "Where can i find the chocolate"
}
}],
"filter": [{
"bool": {
"should": [{
"terms": {
"department_id": []
}
}, {
"terms": {
"store_id": [1,2,3]
}
}]
}
}]
}
}

Related

Can ElasticSearch perform multiple aggregations with different query conditions in a single request?

I am looking for a way to get one aggregation per field, with a different query condition applied to each aggregation.
I have a collection of products with the attributes type, color, and brand.
The user selected brand=Gap, color=White, and type=Sandal. To display the counts of similar products for each aggregation:
Query condition for brand aggregation: color=White and type=Sandal
Query condition for color aggregation: brand=Gap and type=Sandal
Query condition for type aggregation: brand=Gap and color=White
Can this be done in a single Elasticsearch query?
You'd create three filter aggregations, one per field, and put the queries you want inside each. I used the simplest form (bool with term) just to show the high-level approach:
"aggs": {
"brand_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"color": "white"
}
},
{
"term": {
"type": "sandal"
}
}
]
}
}
},
"color_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"brand": "gap"
}
},
{
"term": {
"type": "sandal"
}
}
]
}
}
},
"type_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"color": "white"
}
},
{
"term": {
"brand": "gap"
}
}
]
}
}
}
}
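The filter aggregations above return only a doc_count for each filtered set. To get a count per brand, color, or type value, one would presumably nest a terms aggregation inside each filter. The following Python sketch builds such a request body; the helper name facet_aggs, the sub-aggregation name values, and the assumption that the three fields are indexed as lowercase keyword values are illustrative and not taken from the answer.

```python
import json

def facet_aggs(selected):
    """Build one filter aggregation per facet field (hypothetical helper).

    Each facet is filtered by every *other* selected value, then a nested
    terms aggregation counts the values of the facet field itself.
    """
    aggs = {}
    for field in selected:
        others = [
            {"term": {other: value}}
            for other, value in selected.items()
            if other != field
        ]
        aggs[f"{field}_agg"] = {
            "filter": {"bool": {"must": others}},
            # Assumed sub-aggregation: bucket the facet's own values.
            "aggs": {"values": {"terms": {"field": field}}},
        }
    return aggs

selected = {"brand": "gap", "color": "white", "type": "sandal"}
body = {"size": 0, "aggs": facet_aggs(selected)}
print(json.dumps(body, indent=2))
```

For the counts to be meaningful, the top-level query of the request should not already apply all three selections, since aggregations run over the documents matched by that query.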

How can we use exists query in tandem with the search query?

I have a scenario in Elasticsearch where my indexed docs look like this:
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123", "note": "imp"}
The requirement is to run a term match query and score the results by relevance, which is straightforward. The additional requirement is that any document in the search results that has a note field should get a higher relevance score. How can we achieve this with a DSL query? Using exists we can check which docs contain a note, but how do we combine it with the match query? I have tried many approaches, but none worked.
With ES 5, you could boost your exists query to give a higher score to documents with a note field. For example,
{
"query": {
"bool": {
"must": {
"match": {
"name": {
"query": "your term"
}
}
},
"should": {
"exists": {
"field": "note",
"boost": 4
}
}
}
}
}
With ES 2, you could try a boosted filtered subset:
{
"query": {
"function_score": {
"query": {
"match": { "name": "your term" }
},
"functions": [
{
"filter": { "exists" : { "field" : "note" }},
"weight": 4
}
],
"score_mode": "sum"
}
}
}
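To see what weight 4 with score_mode sum does to the final score, here is a simplified Python model of the function_score query above. It is not Elasticsearch code. It assumes the default boost_mode (multiply) and that a document matching no function keeps a neutral function value of 1, which is my understanding of function_score's behaviour rather than something stated in the answer.

```python
def function_score(query_score, has_note, weight=4.0):
    """Simplified model of the function_score query above (illustrative only).

    score_mode "sum" adds the weights of all functions whose filter matches.
    When no function matches, the function value is assumed neutral (1.0).
    The default boost_mode "multiply" then multiplies it with the query score.
    """
    matched_weights = [weight] if has_note else []
    function_value = sum(matched_weights) if matched_weights else 1.0
    return query_score * function_value

print(function_score(0.5, has_note=True))   # 2.0
print(function_score(0.5, has_note=False))  # 0.5
```

Under these assumptions, documents with a note field end up with four times the score of an otherwise identical document without one.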
I believe you are looking for the boosting query feature:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-boosting-query.html
{
"query": {
"boosting": {
"positive": {
<put your original query here>
},
"negative": {
"filtered": {
"filter": {
"exists": {
"field": "note"
}
}
}
},
"negative_boost": 4
}
}
}

Elasticsearch outputs the score of 1.0 for all results when searching for a single "starred" term

We are using Elasticsearch to search for the most relevant companies in a specific catalog. When we use a normal search term like lettering, we get reasonable scores and can sort the results by score.
However, when we modify the search term before querying and make a "starred" version of it (e.g., *lettering*) so that we can search for substrings, we get a score of 1.0 for every result. Searching for substrings is a requirement in the project.
Any ideas on what could cause this relevance computation? The problem occurs only when a single term is used. We get comprehensible scores when we combine two starred terms (e.g., *lettering* *digital*).
EDIT 1:
Example mapping (YAML; other properties are mapped in the same way, except for boost, which differs per property):
elasticSearchMapping:
type: object
include_in_all: true
enabled: true
properties:
'keywords':
type: string
include_in_all: true
boost: 50
Query:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [{
"match_all": []
}, {
"query_string": {
"query": "*lettering*"
}
}]
}
},
"filter": {
"bool": {
"must": [{
"term": {
"__parentPath": "/sites/industrycatalog"
}
}, {
"terms": {
"__workspace": ["live"]
}
}, {
"term": {
"__dimensionCombinationHash": "d751713988987e9331980363e24189ce"
}
}, {
"term": {
"__typeAndSupertypes": "IndustryCatalog:Entry"
}
}],
"should": [],
"must_not": [{
"term": {
"_hidden": true
}
}, {
"range": {
"_hiddenBeforeDateTime": {
"gt": "now"
}
}
}, {
"range": {
"_hiddenAfterDateTime": {
"lt": "now"
}
}
}]
}
}
}
},
"fields": ["__path"],
"script_fields": {
"distance": {
"script": "doc['coordinates'].distanceInKm(51.75631079999999,14.332867899999997)"
}
},
"sort": [{
"customer.featureFlags.industrycatalog": {
"order": "asc"
}
}, {
"_geo_distance": {
"coordinates": {
"lat": "51.75631079999999",
"lon": "14.332867899999997"
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}],
"size": 999999
}
What you are doing is a wildcard query. Wildcard queries are term-level queries, and by default a constant score is applied to them.
Check the Lucene documentation: WildcardQuery extends MultiTermQuery.
You can also verify this with the explain API; you will see something like this:
"_explanation": {
"value": 1,
"description": "ConstantScore(company:lettering), product of:",
"details": [{
"value": 1,
"description": "boost"
}, {
"value": 1,
"description": "queryNorm"
}]
}
You can change this behavior with the rewrite parameter.
Try this (rewrite also works with the query string query):
{
"query": {
"wildcard": {
"company": {
"value": "digital*",
"rewrite": "scoring_boolean"
}
}
}
}
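Since the question uses query_string rather than wildcard, the same idea applied there might look like the Python sketch below, which only builds the request body. The helper name starred_query is hypothetical, and the other parts of the original request (filters, sort, script_fields) are left out for brevity.

```python
import json

def starred_query(term, rewrite="scoring_boolean"):
    """Build a query_string body for a starred term (hypothetical helper).

    The rewrite parameter asks Elasticsearch to score the expanded terms
    instead of applying a constant score.
    """
    return {
        "query": {
            "query_string": {
                "query": f"*{term}*",
                "rewrite": rewrite,
            }
        }
    }

body = starred_query("lettering")
print(json.dumps(body, indent=2))
```

One caveat, as far as I know: scoring_boolean turns every matching term into a clause of a boolean query, so a wildcard that expands to many terms can exceed the clause limit. The top_terms_N rewrite variants cap the number of terms and may be a safer choice in that case.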
There are various rewrite options for scoring; see which one fits your requirement.
Regarding EDIT 1: the reason you see scores other than 1 for *lettering* *digital* is queryNorm. You can check this with the explain API as well. If you look closely, all documents matching both terms share one score, and all documents matching a single term share another.
P.S.: A leading wildcard is not recommended at all. It causes performance issues because it has to be checked against every single term in the inverted index. You might want to look at the edge ngram or ngram filter instead.
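To illustrate why n-grams can replace a leading wildcard, here is a small pure-Python model of what ngram and edge ngram filters emit for a term. It is a sketch of the idea, not the Elasticsearch implementation; the function names and the min_gram/max_gram defaults are chosen for the example.

```python
def ngrams(term, min_gram=3, max_gram=3):
    """Return every substring of term with length min_gram..max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams

def edge_ngrams(term, min_gram=1, max_gram=5):
    """Return only the prefixes of term with length min_gram..max_gram."""
    longest = min(max_gram, len(term))
    return [term[:size] for size in range(min_gram, longest + 1)]

print(ngrams("letter"))             # ['let', 'ett', 'tte', 'ter']
print(edge_ngrams("letter", 2, 4))  # ['le', 'let', 'lett']

# A substring search becomes an exact lookup of an indexed gram:
print("ter" in ngrams("lettering"))  # True
```

Note that edge n-grams cover only prefixes, so a true substring search such as *lettering* would need the plain ngram variant, at the cost of a larger index.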
Hope this helps!

May I search among some fields, but use another field's matching score for sorting?

I have some documents like this
{"id":1,"city":"London","content":"soccer","continent":"Europe"},
{"id":2,"city":"New York","content":"basketball","continent":"North America"},
{"id":3,"city":"Tokyo","content":"baseball","continent":"Asia"},
...
I need to search keywords in some fields (excluding the city field), e.g. with a query like
{
"query": {
"bool": {
"should": [ //SHOULD_CLAUSE
{
"match": {
"continent": "America"
}
},
{
"term": {
"content": "soccer"
}
}
]
}
}
}
To make the results more "personalized", I want matched documents whose city field is the same as the visiting user's city property to rank higher.
However, if I add city as a query field (something like "match":{"city":"Tokyo"}) to the should clause, it may return documents that match only the city field and none of the fields I actually need to search. Using boost to make the city field more "important" for sorting makes things even worse.
How can I achieve my goal?
One possible way seems to be to write the SHOULD_CLAUSE part twice and combine one copy with the city clause using and:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"bool": {
SHOULD_CLAUSE
}
}, {
"match": {
"city": {
"query": "Tokyo",
"boost": 4.0
}
}
}]
}
}, {
"bool": {
SHOULD_CLAUSE
}
}]
}
}
}
But in real circumstances the SHOULD_CLAUSE part may be more complicated, and the whole query becomes too long to write. I wonder if there is a better way.
If you want only results that match your user's city, you should wrap your should query in a must query, something like:
{
"query": {
"bool": {
"must": [{
"bool": {
"should": [{
SHOULD_CLAUSE_1
}, {
SHOULD_CLAUSE_2
}]
}
}, {
"match": {
"city": "Tokyo"
}
}]
}
}
}
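Note that the query above excludes documents from other cities. If the goal is only to rank same-city documents higher without dropping the rest, as the question describes, one alternative (not from the answers here) is to keep the search clauses in must and move the city clause to an outer should: when a bool query has a must clause, its should clauses are optional and only add to the score. The Python sketch below builds such a body; the helper name personalized_query and the boost value are assumptions.

```python
import json

def personalized_query(search_clauses, user_city, city_boost=4.0):
    """Require a search match; let the city only raise the score (sketch)."""
    return {
        "query": {
            "bool": {
                # At least one search clause must match for a hit.
                "must": [{"bool": {"should": search_clauses}}],
                # Optional clause: matching it adds to the score only.
                "should": [
                    {"match": {"city": {"query": user_city, "boost": city_boost}}}
                ],
            }
        }
    }

search_clauses = [
    {"match": {"continent": "America"}},
    {"term": {"content": "soccer"}},
]
body = personalized_query(search_clauses, "Tokyo")
print(json.dumps(body, indent=2))
```

This way the SHOULD_CLAUSE part is written only once, and a document that matches nothing but the city is not returned.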

Why script in custom_filters_score behaves as boost?

{
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [
{
"filter": {
"term": {
"subject": "math"
}
},
"script": "_score + doc['subject_score'].value"
}
]
}
}
}
With the script as above, it gives the error: unresolvable property or identifier: _score
If the script is "script": "doc['subject_score'].value", it multiplies the _score the same way boost does. I want to replace the Elasticsearch _score with a custom score.
If I understood you correctly, you would like to use Elasticsearch scoring if the subject is not math, and custom scoring if the subject is math. If you are using Elasticsearch v0.90.4 or higher, this can be achieved using the new function_score query:
{
"query": {
"function_score": {
"query": {
"term": {
"name": "user1234"
}
},
"functions": [{
"filter": {
"term": {
"subject": "math"
}
},
"script_score": {
"script": "doc[\"subject_score\"].value"
}
}, {
"boost_factor": 0
}],
"score_mode": "first",
"boost_mode": "sum"
}
}
}
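The interplay of score_mode first and boost_mode sum in the query above can be traced with a simplified Python model. This is only an illustration of the arithmetic, not Elasticsearch code, and the sample scores are made up.

```python
def final_score(query_score, subject, subject_score):
    """Simplified model of the function_score query above (illustrative)."""
    # Functions in declaration order: (filter matches?, function value).
    functions = [
        (subject == "math", subject_score),  # script_score, filtered on math
        (True, 0.0),                         # boost_factor 0, no filter
    ]
    # score_mode "first": take the first function whose filter matches.
    function_value = next(value for matches, value in functions if matches)
    # boost_mode "sum": add the function value to the query score.
    return query_score + function_value

print(final_score(1.5, "math", 80.0))     # 81.5
print(final_score(1.5, "history", 80.0))  # 1.5
```

In this model, math documents end up with _score plus subject_score, while all other documents fall through to the boost_factor 0 function and keep their original _score.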
Prior to v0.90.4 you would have to resort to using a combination of custom_score and custom_filters_score:
{
"query": {
"custom_score": {
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [{
"filter": {
"term": {
"subject": "math"
}
},
"script": "-1.0"
}]
}
},
"script": "_score < 0.0 ? _score * -1.0 + doc[\"subject_score\"].value : _score"
}
}
}
or, as @javanna suggested, use multiple custom_score queries combined by a bool query:
{
"query": {
"bool": {
"disable_coord": true,
"should": [{
"filtered": {
"query": {
"term": {
"name": "user1234"
}
},
"filter": {
"bool": {
"must_not": [{
"term": {
"subject": "math"
}
}]
}
}
}
}, {
"filtered": {
"query": {
"custom_score": {
"query": {
"term": {
"name": "user1234"
}
},
"script": "doc['subject_score'].value"
}
},
"filter": {
"term": {
"subject": "math"
}
}
}
}]
}
}
}
Firstly, I'd like to say that there are many ways of customising the scoring in Elasticsearch, and it seems you may have accidentally picked the wrong one. I will summarize just two, and you will see what the problem is:
Custom Filters Score
If you read the docs (carefully) on custom_filters_score, you will see that it is there for performance reasons: it lets scoring use the faster filter machinery of Elasticsearch. (Filters are faster because scoring is not calculated when computing the hit set, and they are cached between requests.)
At the end, the docs mention that custom_filters_score can take a "script" parameter instead of a "boost" parameter per filter. The best way to think of this is as calculating a number that is passed up to the parent query, where it is combined with the other sibling queries to calculate the total score for the document.
Custom Score Query
According to the docs, this is used when you want to customise the score from the query and change it however you wish. A _score variable is available in your "script"; it is the score of the query inside the custom_score query.
Try this:
"query": {
"filtered": {
"query": {
"custom_score": {
"query": {
"match_all": {}
},
"script": "doc['subject_score'].value" //*see note below
}
},
"filter": {
"and": [
{
"term": {
"subject": "math"
}
},
{
"term": {
"name": "user1234"
}
}
]
}
}
}
*NOTE: If you wanted to, you could use _score here. Also, I moved both your "term" parts to filters, as any match of a term would get the same score and filters are faster.
Good luck!
