Elasticsearch - can you give weight to newer documents?

If we have 10,000 documents with the same score, but we limit the search to 1,000, is there a way to give more weight to newer documents so the newer 1,000 show up?

If all the documents have the same score then the most straightforward way to go is just sorting by creation date:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html
Example with _score as the first criterion and the date as a tiebreaker:
GET /my-index-000001/_search
{
"sort" : [
"_score",
{ "post_date" : {"order" : "desc"} }
],
"query" : {
"term" : { "user" : "kimchy" }
}
}
If you want to add a recency score on top of the query score, you can use a distance_feature query on the creation date field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-distance-feature-query.html
PUT /items
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"creation_date": {
"type": "date"
}
}
}
}
PUT /items/_doc/1?refresh
{
"name" : "chocolate",
"creation_date": "2018-02-01"
}
PUT /items/_doc/2?refresh
{
"name" : "chocolate",
"creation_date": "2018-01-01"
}
PUT /items/_doc/3?refresh
{
"name" : "chocolate",
"creation_date": "2017-12-01"
}
GET /items/_search
{
"query": {
"bool": {
"must": {
"match": {
"name": "chocolate"
}
},
"should": {
"distance_feature": {
"field": "creation_date",
"pivot": "7d",
"origin": "now"
}
}
}
}
}
origin defines the reference point from which distance is measured: the closer a document's creation_date is to origin ("now" in this example), the more weight it receives.
pivot is the distance from origin at which a document receives half of the boost score.
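To make the effect of pivot concrete, here is a small Python sketch of the formula the Elasticsearch documentation gives for distance_feature: relevance score = boost * pivot / (pivot + distance). The function name and the fixed value used for "now" are illustrative assumptions, not part of any client library.

```python
from datetime import datetime, timedelta

def distance_feature_score(doc_date, origin, pivot, boost=1.0):
    """Score contributed by distance_feature: boost * pivot / (pivot + distance)."""
    distance = abs(origin - doc_date)
    # Dividing one timedelta by another yields a plain float.
    return boost * (pivot / (pivot + distance))

# Assumed fixed "now" so the numbers are reproducible.
now = datetime(2018, 2, 8)
pivot = timedelta(days=7)

for date in ("2018-02-01", "2018-01-01", "2017-12-01"):
    doc_date = datetime.strptime(date, "%Y-%m-%d")
    print(date, round(distance_feature_score(doc_date, now, pivot), 3))
```

With these assumed dates, the document created exactly 7 days before "now" gets half of the boost, and older documents get progressively less without ever dropping to zero.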

Related

How to boost certain documents if the search query contains a certain term/text in elastic

If the search query contains "fruits", I want to boost the products from a certain category.
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 2,
"filter": {
"term": { "categories": "3" }
}
}
}
]
}
}
}
I have the above query, which gives a constant score to items in category 3. I want to apply this constant score (boosting, increased relevancy) only when a certain text (for example, fruits) is present in the search term.
Sample Elasticsearch document (categories holds category IDs, because the same product is shown in multiple categories):
{
"id" : 1231,
"name" : {
"ar" : "Arabic fruit name",
"en" : "english fruit name"
},
"categories" : [3,1,3]
}
How do I achieve this? I use Elasticsearch 7.2.
Original answer:
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 2,
"filter": {
"bool": {
"filter": [
{
"term": {
"categories": "3"
}
}
],
"should": [
{
"match": {
"name.ar": "fruit"
}
},
{
"match": {
"name.en": "fruit"
}
}
],
"minimum_should_match": 1
}
}
}
}
]
}
}
}
This should do it, if I understand correctly what you're looking for.
Btw, I suggest using "match_phrase" instead of "match" if you want to match "fruit name" exactly and not "fruit" or "name".
Update: (based on the comment)
In that case I'd suggest reorganizing your schema in the following manner:
"item": {
"properties": {
"name": {
"type": "text"
},
"language": {
"type": "keyword"
}
}
}
So your sample would become:
{
"id" : 1231,
"item" : [
{"name": "Arabic fruit name", "language": "ar"},
{"name": "english fruit name", "language": "en"}
],
"categories" : [3,1,3]
}
And then you can match against "item.name".
Why? Because Elasticsearch (at least by default) flattens the array when indexing, so internally item.name looks like ["Arabic fruit name", "english fruit name"].
With your original sample, two different fields are created in the schema (name.ar and name.en), which is not a great design if you need to scale.
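The condition in the question (boost only when the user's search text contains a word such as "fruits") can also be decided on the client before the request is sent. This is a different technique from the query in the answer above, shown here only as a minimal Python sketch: the function name, the trigger word handling, and the use of multi_match as the main query are all assumptions for illustration.

```python
def build_query(search_text, trigger="fruits", category="3", boost=2):
    """Add the category boost only when the search text contains the trigger word."""
    should = []
    if trigger in search_text.lower().split():
        # Same constant_score clause as in the question.
        should.append({
            "constant_score": {
                "boost": boost,
                "filter": {"term": {"categories": category}},
            }
        })
    return {
        "query": {
            "bool": {
                # Assumed main query: search both name fields.
                "must": [{
                    "multi_match": {
                        "query": search_text,
                        "fields": ["name.ar", "name.en"],
                    }
                }],
                "should": should,
            }
        }
    }

print(build_query("fresh fruits"))
```

For a search text without the trigger word, the should list stays empty and no category boost is applied.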

Filtering, sorting and paginating by sub-aggregations in ElasticSearch 6

I have a collection of documents, where each document indicates the available rooms for a given hotel and day, and their cost for that day:
{
"hotel_id": 2016021519381313,
"day": "20200530",
"rooms": [
{
"room_id": "00d70230ca0142a6874358919336e53f",
"rate": 87
},
{
"room_id": "675a5ec187274a45ae7a5fdc20f72201",
"rate": 53
}
]
}
The mapping is:
{
"properties": {
"day": {
"type": "keyword"
},
"hotel_id": {
"type": "long"
},
"rooms": {
"type": "nested",
"properties": {
"rate": {
"type": "long"
},
"room_id": {
"type": "keyword"
}
}
}
}
}
I am trying to figure out how to write a query that returns the rooms available for a set of days whose total cost is less than a given amount, ordered by total cost in ascending order and paginated.
So far I have come up with a way of getting the rooms available for the set of days and their total cost: filter by the days, group by hotel and room IDs, and require that the minimum document count in the aggregation equals the number of days I am looking for.
{
"size" : 0,
"query": {
"bool": {
"must": [
{
"terms" : {
"day" : ["20200423", "20200424", "20200425"]
}
}
]
}
} ,
"aggs" : {
"hotel" : {
"terms" : {
"field" : "hotel_id"
},
"aggs" : {
"rooms" : {
"nested" : {
"path" : "rooms"
},
"aggs" : {
"rooms" : {
"terms" : {
"field" : "rooms.room_id",
"min_doc_count" : 3
},
"aggs" : {
"sum_price" : {
"sum" : { "field" : "rooms.rate" } }
}
}
}
}
}
}
}
}
So now I am interested in ordering the result buckets in descending order at the "hotel" level based on the value of the sub-aggregation under "rooms", and also in filtering out the buckets that do not contain enough documents or whose "sum_price" is bigger than a given budget. But I cannot work out how to do it.
I have been looking at "bucket_sort", but I cannot find a way to sort based on a sub-aggregation. I have also been looking at "bucket_selector", but it gives me empty buckets when they do not fit the predicate. I am probably not using them correctly in my case.
What would be the right way to accomplish this?
Here is the query without pagination:
{
"size":0,
"query":{
"bool":{
"must":[
{
"terms":{
"day":[
"20200530",
"20200531",
"20200601"
]
}
}
]
}
},
"aggs":{
"rooms":{
"nested":{
"path":"rooms"
},
"aggs":{
"rooms":{
"terms":{
"field":"rooms.room_id",
"min_doc_count":3,
"order":{
"sum_price":"asc"
}
},
"aggs":{
"sum_price":{
"sum":{
"field":"rooms.rate"
}
},
"max_price":{
"bucket_selector":{
"buckets_path":{
"var1":"sum_price"
},
"script":"params.var1 < 100"
}
}
}
}
}
}
}
}
Please note that the following values should be changed to get the desired results:
day (the days to search)
min_doc_count (the number of days)
script in max_price (the budget threshold)
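For the pagination part, one option is a bucket_sort pipeline aggregation next to sum_price, since bucket_sort accepts sort, from and size. The following Python sketch builds such a request body under stated assumptions: the function name, the "page" aggregation name and the terms size of 10000 are illustrative choices, and bucket_sort only pages over the buckets that the terms aggregation actually returns.

```python
def paginated_room_query(days, budget, page=0, page_size=10):
    """Build a search body that pages room buckets by ascending total price."""
    room_aggs = {
        "sum_price": {"sum": {"field": "rooms.rate"}},
        # Drop rooms whose total price is not below the budget.
        "max_price": {
            "bucket_selector": {
                "buckets_path": {"var1": "sum_price"},
                "script": f"params.var1 < {budget}",
            }
        },
        # Assumed name "page": sort the surviving buckets and slice one page.
        "page": {
            "bucket_sort": {
                "sort": [{"sum_price": {"order": "asc"}}],
                "from": page * page_size,
                "size": page_size,
            }
        },
    }
    return {
        "size": 0,
        "query": {"bool": {"must": [{"terms": {"day": days}}]}},
        "aggs": {
            "rooms": {
                "nested": {"path": "rooms"},
                "aggs": {
                    "rooms": {
                        "terms": {
                            "field": "rooms.room_id",
                            "min_doc_count": len(days),  # room available every day
                            "size": 10000,  # assumed upper bound on candidate rooms
                        },
                        "aggs": room_aggs,
                    }
                },
            }
        },
    }

body = paginated_room_query(["20200530", "20200531", "20200601"], budget=100, page=2, page_size=5)
print(body["aggs"]["rooms"]["aggs"]["rooms"]["aggs"]["page"])
```

Deriving min_doc_count from the number of days keeps the two values in sync, so only the day list, the budget and the page need to change between requests.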

Elasticsearch geo_shape query giving wrong results

I am facing an issue. I know how to find all geo_points within a particular radius, but I need to find how many regions (geo_shapes) a particular point lies in. To solve this, I have created the following index:
PUT /users
And this mapping:
PUT /users/_mapping/_doc
{
"properties": {
"radius": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "100m"
},
"point":{
"type":"geo_point"
}
}
}
The following is a sample document:
POST /users/_doc
{
"radius":{
"type" : "circle",
"coordinates" : [28.363157, 77.287550],
"radius" : "100km"
},
"point":{
"lat" : 28.363157,
"lon": 77.287550
}
}
The query I am making is:
POST /users/_search
{
"query":{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"radius": {
"shape": {
"type": "point",
"coordinates" : [29.363157, 77.28755]
},
"relation": "contains"
}
}
}
}
}
}
Now, the distance between the lat/longs in the query and the document is about 110-112 km, hence the above query returns the exact result. But when I query [30.363157, 77.28755], it still returns the document, even though the distance is over 220 km.
What am I doing wrong?
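One thing worth checking, offered as a possible explanation rather than a confirmed diagnosis: geo_shape coordinates follow the GeoJSON order [longitude, latitude], while the arrays above look like they were written as [latitude, longitude] (the geo_point object in the same document uses lat 28.36, lon 77.29). The Python sketch below compares the distance under both readings using the haversine formula; the mean Earth radius and the helper name are assumptions of the sketch.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sanity check

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

center = (28.363157, 77.287550)  # circle center as written in the document
query = (30.363157, 77.28755)    # second query point from the question

# Intended reading: first value is the latitude.
intended = haversine_km(center[0], center[1], query[0], query[1])
# GeoJSON reading: first value is the longitude.
geojson = haversine_km(center[1], center[0], query[1], query[0])

print(f"intended: {intended:.0f} km, GeoJSON order: {geojson:.0f} km")
```

Under the GeoJSON reading both points sit near latitude 77, where two degrees of longitude cover far less than 100 km, which would be consistent with the document still matching. If that is the cause, writing the coordinates as [77.287550, 28.363157] in both the document and the query should restore the expected behaviour.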

Range Query on a score returned by match Query in Elastic Search

Suppose I have a set of documents like:
{
"Name":"Random String 1",
"Type":"Keyword",
"City":"Lousiana",
"Quantity":"10"
}
Now I want to implement full-text search using an N-gram analyzer on the fields Name and City.
After that, I want to keep only the results whose
"_score" :<Query Score Returned by ES>
is greater than 1.2 (maybe with a range query or an aggregation).
After that, I want to apply a terms aggregation on the property "Type" and return the top results in each bucket using the "top_hits" aggregation.
How can I do so?
I've been able to implement everything apart from the Range Query on score returned by a search query.
If you want to score the documents organically, you can use min_score in the search request to filter the matched documents by score.
For the n-gram analyzer I used a whitespace tokenizer and a lowercase filter.
Mappings
PUT index1
{
"settings": {
"analysis": {
"analyzer": {
"edge_n_gram_analyzer": {
"tokenizer": "whitespace",
"filter" : ["lowercase", "ednge_gram_filter"]
}
},
"filter": {
"ednge_gram_filter" : {
"type" : "ngram",
"min_gram" : 2,
"max_gram": 10
}
}
}
},
"mappings": {
"document_type" : {
"properties": {
"Name" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"City" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"Type" : {
"type": "keyword"
}
}
}
}
}
Index Document
POST index1/document_type
{
"Name":"Random String 1",
"Type":"Keyword",
"City":"Lousiana",
"Quantity":"10"
}
Query
POST index1/_search
{
"min_score": 1.2,
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"Name": {
"value": "string"
}
}
},
{
"term": {
"City": {
"value": "string"
}
}
}
]
}
},
"aggs": {
"type_terms": {
"terms": {
"field": "Type",
"size": 10
},
"aggs": {
"type_term_top_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
Hope this helps

filter by child frequency in ElasticSearch

I currently have parent documents (documents) indexed in Elasticsearch, and child documents (comments) related to them.
My first objective was to search for a document with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
"min_score": 0,
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
}
}
I used the score to count the number of comments a document has, and then I filtered the documents by this number using "min_score".
Now my objective is to search not just comments but several other child documents related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
"query": {
"match_all": {
}
},
"filter" : {
"and" : [{
"query": {
"has_child" : {
"type" : "comment",
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
},
{
"or" : [
{"query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "Finally"
}
}
}
}
},
{ "query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "several"
}
}
}
}
}
]
}
]
}
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?
There is no score at all associated with filters. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries together.
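As a rough sketch of that suggestion, the filters from the question can be rewritten as scoring has_child clauses inside a single bool query. The Python below only builds the request body; the helper name is hypothetical, and "score_type" is kept because the question uses it (later Elasticsearch versions name this parameter score_mode).

```python
def has_child_comment(inner_query, score_type="sum"):
    """Wrap a query in a scoring has_child clause on the comment type."""
    return {
        "has_child": {
            "type": "comment",
            "score_type": score_type,  # sums the scores of matching children
            "query": inner_query,
        }
    }

date_range = {"range": {"date": {"gte": 20130201, "lte": 20130204}}}

body = {
    "query": {
        "bool": {
            # Former "and" branch: must have comments in the date range.
            "must": [has_child_comment(date_range)],
            # Former "or" branch: at least one of the text matches.
            "should": [
                has_child_comment({"match": {"text": "Finally"}}),
                has_child_comment({"match": {"text": "several"}}),
            ],
            "minimum_should_match": 1,
        }
    }
}

print(body)
```

Because every clause now contributes to the score, a min_score threshold applies to the combined score again; it does not threshold each child query separately.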
