how to use boost or weight in more_like_this - elasticsearch

I have the following Elasticsearch query:
more_like_it_match = {
"min_score": 5,
"query":
{"filtered": {
"query": {
"bool": {
"must": {
"more_like_this": {
"fields": ["title","desc","cat_id","user_id"],
"like": {
"doc": {
"title": item["title"],
"desc": item["desc"],
"cat_id": item["cat_id"],
"user_id": item["user_id"],
},
},
"min_term_freq": 1,
"max_query_terms": 100,
"min_doc_freq": 0
}
}
}
},
"filter": {
"not": {
"term": {
"id": item["id"]
}
}
}
}
}
}
It works correctly, but I'm looking for a way to set a boost or weight for each field. For example, I want to tell Elasticsearch that a match on the title field is three times more important than a match on the category field. Is there any way to achieve this?
Note: I found the following query as a possible solution, but it is not what I'm looking for.
{
"min_score" : 5,
"query": {
"dis_max": {
"queries": [
{
"more_like_this" : {
"fields" : ["title"],
"like_text" : item["title"],
"min_term_freq" : 1,
"max_query_terms" : 100,
"boost": 100
}
},
{
"more_like_this" : {
"fields" : ["desc"],
"like_text" : item["desc"],
"min_term_freq" : 1,
"max_query_terms" : 100,
"boost": 100,
}
}
]
}
},
"filter":{
"not":{
"term" :{
"id": item["id"]
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
Dis Max Query
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.
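As a rough sketch of the per-field boost idea described above, the helper below builds one more_like_this clause per field and wraps them in a dis_max query. The function name build_boosted_mlt, the sample item, and the boost values 3 and 1 are assumptions for illustration; the like_text syntax simply mirrors the query in the question and may differ in other Elasticsearch versions.

```python
def build_boosted_mlt(item, field_boosts):
    """Build a dis_max query with one more_like_this clause per field.

    field_boosts maps a field name to the boost for that field
    (hypothetical helper, not part of any Elasticsearch client).
    """
    clauses = []
    for field, boost in field_boosts.items():
        clauses.append({
            "more_like_this": {
                "fields": [field],
                "like_text": str(item[field]),  # syntax taken from the question
                "min_term_freq": 1,
                "max_query_terms": 100,
                "boost": boost,
            }
        })
    return {"query": {"dis_max": {"queries": clauses}}}


# Hypothetical sample item: title weighted three times as much as cat_id.
item = {"id": 7, "title": "red running shoes", "cat_id": 12}
body = build_boosted_mlt(item, {"title": 3, "cat_id": 1})
```

Because dis_max scores a document by its best-matching clause, the relative boosts decide which field dominates; whether that behaviour suits the original question is left open here.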

Related

elasticsearch - can you give weight to newer documents?

If we have 10,000 documents with the same score, but we limit the search to 1,000, is there a way to give more weight to newer documents so the newer 1,000 show up?
If all the documents have the same score then the most straightforward way to go is just sorting by creation date:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html
Example with _score as the first criterion and the date as a tiebreaker:
GET /my-index-000001/_search
{
"sort" : [
"_score",
{ "post_date" : {"order" : "desc"} }
],
"query" : {
"term" : { "user" : "kimchy" }
}
}
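To illustrate what this two-level sort does, here is a small client-side sketch in Python. The hits list is made-up sample data, and sort_hits is a hypothetical helper that only mimics the ordering; the real sorting happens inside Elasticsearch.

```python
def sort_hits(hits):
    """Order by _score descending, then by post_date descending.

    ISO-8601 date strings compare correctly as plain strings.
    """
    return sorted(hits, key=lambda h: (h["_score"], h["post_date"]), reverse=True)


# Made-up hits: "a" and "b" tie on score, so the newer one wins.
hits = [
    {"id": "a", "_score": 1.0, "post_date": "2020-01-01"},
    {"id": "b", "_score": 1.0, "post_date": "2021-06-01"},
    {"id": "c", "_score": 2.0, "post_date": "2019-01-01"},
]
ordered_ids = [h["id"] for h in sort_hits(hits)]
```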
If you want to add a score on top of the query score, you can use a distance_feature query on the creation date field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-distance-feature-query.html
PUT /items
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"creation_date": {
"type": "date"
}
}
}
}
PUT /items/_doc/1?refresh
{
"name" : "chocolate",
"creation_date": "2018-02-01"
}
PUT /items/_doc/2?refresh
{
"name" : "chocolate",
"creation_date": "2018-01-01"
}
PUT /items/_doc/3?refresh
{
"name" : "chocolate",
"creation_date": "2017-12-01"
}
GET /items/_search
{
"query": {
"bool": {
"must": {
"match": {
"name": "chocolate"
}
},
"should": {
"distance_feature": {
"field": "creation_date",
"pivot": "7d",
"origin": "now"
}
}
}
}
}
origin defines the reference point for the boost: the closer a document's creation_date is to the origin ("now" in this example), the more weight it receives.
pivot is the distance from the origin at which a document receives half of the maximum boost.
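The effect of origin and pivot can be checked with a few lines of Python. This sketch assumes the scoring formula given in the distance_feature documentation, boost * pivot / (pivot + distance); the function name and the fixed origin date are illustrative only (the query above uses "now").

```python
from datetime import datetime, timedelta


def distance_feature_score(doc_date, origin, pivot, boost=1.0):
    """Score added by distance_feature: boost * pivot / (pivot + distance)."""
    distance = abs((origin - doc_date).total_seconds())
    pivot_s = pivot.total_seconds()
    return boost * pivot_s / (pivot_s + distance)


origin = datetime(2018, 2, 8)  # fixed stand-in for "now"
pivot = timedelta(days=7)

at_origin = distance_feature_score(origin, origin, pivot)
at_pivot = distance_feature_score(datetime(2018, 2, 1), origin, pivot)  # 7 days old
older = distance_feature_score(datetime(2017, 12, 1), origin, pivot)
```

A document exactly one pivot away from the origin gets half of the boost, and the contribution keeps shrinking as the document gets older.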

How to filter with multiple fields and values in elasticsearch?

I've been reading through the docs and trying to filter results on multiple fields and values, but I keep getting "malformed query" errors.
I want to filter the result with exact equal values, such as the following:
is_active: true
category_id: [1,2,3,4]
brand: "addidas"
gender: "male"
To make it more clear what I intend to do, this is how I'd like it to run if it would be written in SQL:
SELECT .... WHERE
is_active= 1 AND category_id IN(1,2,3,4)
AND brand='addidas' AND gender='male'
My query in DSL goes as following:
{
"body": {
"query": {
"nested": {
"query": {
"bool": {
"must": {
"terms": {
"category_id": [
1,
2,
3
]
},
"term": {
"is_active": true
},
"term": {
"brand": "addidas"
}
}
}
}
}
}
}
}
How do I filter on multiple fields and values as described in Elasticsearch?
If you need extra information from me that is required to answer the question, leave a comment. If you add a link to the docs, please also provide an example (with query dsl) of how my current, or similar situations should be solved.
Use one of the following queries.
With must, every clause has to match and each one contributes to the score:
"query": {
"bool": {
"must" : [
{"term" : { "is_active" : true}},
{"term" : { "gender" : "male"}},
{"term" : { "brand" : "addidas"}},
{"terms": { "category_id": [1,2,3,4]}}
]
}
}
With filter, every clause still has to match, but the clauses have no effect on scoring:
"query": {
"bool": {
"filter" : [
{"term" : { "is_active" : true}},
{"term" : { "gender" : "male"}},
{"term" : { "brand" : "addidas"}},
{"terms": { "category_id": [1,2,3,4]}}
]
}
}
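If the filter values come from application code, the clauses can be generated instead of written by hand. The sketch below is one possible approach: build_filter_query is a hypothetical helper that emits a terms clause for list values and a term clause otherwise, using the field names from the question.

```python
import json


def build_filter_query(criteria):
    """Turn {field: value} into a bool/filter query body.

    List or tuple values become "terms" clauses (SQL IN),
    scalar values become "term" clauses (SQL =).
    """
    clauses = []
    for field, value in criteria.items():
        if isinstance(value, (list, tuple)):
            clauses.append({"terms": {field: list(value)}})
        else:
            clauses.append({"term": {field: value}})
    return {"query": {"bool": {"filter": clauses}}}


body = build_filter_query({
    "is_active": True,
    "category_id": [1, 2, 3, 4],
    "brand": "addidas",
    "gender": "male",
})
print(json.dumps(body, indent=2))
```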

Exclude documents from aggregation

I am trying to get a filtered result set from my index.
{"group_id": 123, "type" : 1},
{"group_id": 123, "type" : 3},
{"group_id": 123, "type" : 2},
{"group_id": 423, "type" : 3},
{"group_id": 423, "type" : 1},
{"group_id": 231, "type" : 1}
Now I want to get all documents but exclude the ones with group_id that contains type = 2. So, in this case, I want to get all documents with group_id = 423 and group_id = 231, but exclude all documents with group_id = 123.
I was experimenting with filtered bool query:
{
"query": {
"bool": {
"must_not": [
{
"term": {
"type": 2
}
}
]
}
}
}
but that only excludes one document.
Any hints are welcome!
You can achieve this using two Elasticsearch search requests:
First, get all values of "group_id" for which corresponding value of "type" is 2. You need to use Terms Aggregation for this.
POST <index name>/<type name>/_search
{
"size": 0,
"query": {
"filtered": {
"filter": {
"term": {
"type": 2
}
}
}
},
"aggs": {
"group_ids_type_2": {
"terms": {
"field": "group_id",
"size": 0
}
}
}
}
Save the list of values of "group_id" fields received from the above request.
Now, use a query with must_not filter to get all documents such that the value of their "group_id" is not present in the list obtained above. You need to use Terms Filter here.
POST <index name>/<type name>/_search
{
"query": {
"bool": {
"must_not": [
{
"terms": {
"group_id": [
"123" <-- Replace this with a comma separated list of all group_id values received from first search request
]
}
}
]
}
}
}
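The glue between the two requests can be sketched in Python as follows. The helper names and sample_response are assumptions: the sample is a hand-written fragment in the usual shape of a terms aggregation response (a list of buckets, each with a key), not real output, and sending the requests is left to whichever client you use.

```python
def extract_group_ids(agg_response, agg_name="group_ids_type_2"):
    """Collect the bucket keys returned by the terms aggregation."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def build_exclusion_query(group_ids):
    """Build the second request: everything except the given group_ids."""
    return {
        "query": {
            "bool": {
                "must_not": [{"terms": {"group_id": group_ids}}]
            }
        }
    }


# Hand-written stand-in for the first response (only group 123 has type = 2).
sample_response = {
    "aggregations": {
        "group_ids_type_2": {"buckets": [{"key": 123, "doc_count": 1}]}
    }
}
excluded = extract_group_ids(sample_response)
body = build_exclusion_query(excluded)
```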

Elasticsearch: how to filter by summed values in nested objects?

I have the following products structure in the elasticsearch:
POST /test/products/1
{
"name": "product1",
"sales": [
{
"quantity": 10,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-02"
},
{
"quantity": 5,
"customer": "customer2",
"date": "2013-12-30"
}
]
}
POST /test/products/2
{
"name": "product2",
"sales": [
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 15,
"customer": "customer1",
"date": "2014-02-01"
},
{
"quantity": 1,
"customer": "customer2",
"date": "2014-01-21"
}
]
}
The sales field is a nested object. I need to filter products like this:
"get all products which have total quantity >= 16 and sales.customer = 'customer1'".
The total quantity is sum(sales.quantity) where sales.customer = 'customer1'.
Therefore the search results should contain only 'product2'.
I tried to use aggs but I didn't understand how to filter in this case.
I haven't found any information about it in the elasticsearch documentation.
Is it possible?
I would welcome any ideas, thanks!
First of all, be clear about what you want as a result: counts or document fields? Aggregations only return computed values such as counts, while returning document fields requires a filter in the query. If you want fields, you cannot filter on sum(sales.quantity) >= 16. If you want counts, you can get close with a range aggregation, but as far as I know a range can only be applied to fields stored in the documents, not to computed values.
The closest solution I can offer is below:
{
"size" : 0,
"query" :{
"filtered" : {
"query" :{ "match_all": {} },
"filter" : {
"nested": {
"path": "sales",
"filter" : {"term" : {"sales.customer" : "customer1"}}
}
}
}
},
"aggregations" :{
"salesNested" : {
"nested" : {"path" : "sales"},
"aggregations" :{
"aggByrange" : {
"numeric_range": {
"field": "sales.quantity",
"ranges": [
{
"from": 16
}]
}
},
"quantityStats" : {
"stats" : { "field" : "sales.quantity" }
}
}
}
}
}
The query above uses "field": "sales.quantity". For your case you would need to replace sales.quantity with the sum computed by the quantityStats aggregation, which I don't think Elasticsearch provides.
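To make the requested condition concrete, here is a client-side sketch in Python that applies it to the two sample products from the question. It is not an Elasticsearch feature: products_with_min_quantity is a hypothetical helper that post-filters documents after they have been fetched.

```python
def products_with_min_quantity(products, customer, min_total):
    """Return names of products whose summed sales.quantity for
    the given customer is at least min_total."""
    matches = []
    for product in products:
        total = sum(
            sale["quantity"]
            for sale in product["sales"]
            if sale["customer"] == customer
        )
        if total >= min_total:
            matches.append(product["name"])
    return matches


# The two documents from the question.
products = [
    {"name": "product1", "sales": [
        {"quantity": 10, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 1, "customer": "customer1", "date": "2014-01-02"},
        {"quantity": 5, "customer": "customer2", "date": "2013-12-30"},
    ]},
    {"name": "product2", "sales": [
        {"quantity": 1, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 15, "customer": "customer1", "date": "2014-02-01"},
        {"quantity": 1, "customer": "customer2", "date": "2014-01-21"},
    ]},
]
result = products_with_min_quantity(products, "customer1", 16)
```

For customer1 the totals are 11 for product1 and 16 for product2, so only product2 passes the threshold, matching the result the question expects.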

filter by child frequency in ElasticSearch

I currently have parents (documents) indexed in Elasticsearch, along with children (comments) related to these documents.
My first objective was to search for a document with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
"min_score": 0,
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
}
}
I used score to calculate the amount of comments a document has and then I filtered the documents by this amount, using "min_score".
Now, my objective is to search not just comments but several other child documents related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
"query": {
"match_all": {
}
},
"filter" : {
"and" : [{
"query": {
"has_child" : {
"type" : "comment",
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
},
{
"or" : [
{"query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "Finally"
}
}
}
}
},
{ "query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "several"
}
}
}
}
}
]
}
]
}
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?
There is no score at all associated with filters. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries.
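One way this suggestion might look is sketched below as a Python dict. It assumes the has_child syntax from the question (including score_type) and maps the "and" filter to must and the "or" filter to should with minimum_should_match; the has_child helper is hypothetical, and how the combined scores relate to a min_score threshold would still need to be checked against real data.

```python
def has_child(child_type, query, score_type="sum"):
    """Wrap a query in a has_child clause (syntax as used in the question)."""
    return {
        "has_child": {
            "type": child_type,
            "score_type": score_type,
            "query": query,
        }
    }


date_range = {"range": {"date": {"gte": 20130201, "lte": 20130204}}}

body = {
    "query": {
        "bool": {
            # Former "and" branch: the date-range child query is required.
            "must": [has_child("comment", date_range)],
            # Former "or" branch: at least one text match is required.
            "should": [
                has_child("comment", {"match": {"text": "Finally"}}),
                has_child("comment", {"match": {"text": "several"}}),
            ],
            "minimum_should_match": 1,
        }
    }
}
```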

Resources