Elasticsearch - bump individual result to the top - sorting

I'm working with Elasticsearch. I have an array of documents, and I'm trying to sort documents by the property price, except that I'd like a particular document to be the first result no matter what.
Below is the "sort" array I am using in an attempt to put the document with ID 1213 first, followed by all other documents ordered by price ascending.
[
{
"id": {
"mode": "max",
"order": "desc",
"nested_filter": {
"term": {
"id": 1213
}
},
"missing": "_last"
}
},
{
"price": {
"order": "asc"
}
}
]
This doesn't appear to be working, though—document 1213 doesn't appear first. What am I doing wrong here?
As an example—the ideal returned result:
[{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
Instead, I get:
[{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]

As others have already asked, what is the reason for the nested_filter?
There are many ways to do what you need. Here is one that fits the simple requirements you have mentioned so far:
{
"query" : {
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : {
"term" : {
"id" : "1213"
}
},
"boost" : 2
}
]
}
},
"sort" : [
"_score",
"price"
]
}
The assumption here is that your query is as simple as the match_all query and does not affect the scores in any way. If your query is more complicated, you can try wrapping it in a constant_score query so that it does not affect the scores. Ideally, you end up with the document set you want, where all documents have the same score, and the custom_filters_score query then boosts the score of the document you want. You can do this for any number of documents by adding further filters or, if the documents should all get the same boost, by using a terms filter. Finally, sort by the score and then by the price.
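The effect of that two-level sort can be sketched client-side. This is a plain Python simulation, not Elasticsearch code: it assumes every hit starts with the same base score of 1.0 and that the boosted document ends up with that score multiplied by 2. The names PINNED_ID, BASE_SCORE and BOOST are made up for the illustration; the documents are the ones from the question.

```python
# Sample documents from the question.
docs = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]

PINNED_ID = 1213   # document that should come first (hypothetical name)
BASE_SCORE = 1.0   # assumed constant score of every match_all hit
BOOST = 2.0        # assumed multiplier applied to the pinned document


def score(doc):
    """Return the simulated relevance score of a document."""
    return BASE_SCORE * BOOST if doc["id"] == PINNED_ID else BASE_SCORE


# Mirrors "sort": ["_score", "price"]: score descending, then price ascending.
ranked = sorted(docs, key=lambda d: (-score(d), d["price"]))
ranked_ids = [d["id"] for d in ranked]
print(ranked_ids)
```

Because every other document shares the same score, the price only decides the order among the non-boosted documents.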

In this case you need to use function_score to modify score of each doc.
{
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
},
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
],
"score_mode": "sum",
"boost_mode" : "replace",
"query" : {
//YOUR QUERY GOES HERE
}
}
}
}
Explanation:
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
Computes a score from the price. For prices above 1 the value is < 1, and the higher the price, the smaller the score, so documents come back in ascending price order. To switch to descending order, replace the script with:
"script": "(1 - (1 / doc['price'].value))"
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
}
This adds an extra 1 to the score of any document with "id" = 1213. The final score is the sum of the two functions.
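To see how the two functions combine, here is a small Python sketch that recomputes the summed score by hand for the sample documents from the question. It only mimics the arithmetic ("score_mode": "sum" with "boost_mode": "replace"); the helper name function_score is chosen for the illustration and nothing here talks to Elasticsearch.

```python
# Sample documents from the question.
docs = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]

PINNED_ID = 1213  # id used in the term filter of the first function


def function_score(doc):
    """Sum of the two functions: weight 1 for the pinned id plus 1 / price."""
    weight = 1.0 if doc["id"] == PINNED_ID else 0.0
    price_score = 1.0 / doc["price"]  # assumes price > 0
    return weight + price_score


scores = {d["id"]: function_score(d) for d in docs}

# Elasticsearch returns hits by descending score when no sort is given.
ranked_ids = sorted(scores, key=scores.get, reverse=True)
print(ranked_ids)
```

Note the assumption baked into this approach: a price of 0 would break the division, and a price below 1 gives a price score above 1, which could outrank the pinned document.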

Related

Sorting by a nested field in elasticsearch

If I had a data structure that looked like this
[{"_id": 1,
"scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}
]},
{"_id": 2,
"scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}
]}]
Would it be possible to sort this dataset by student_1's score or by student_2's score?
For example if I sorted descending by student 1's score, I would get document 1,2, but if I sorted descending by student 2's score, I would get 2,1.
I could re-arrange the data, but I don't want to use another index because there's a bunch of metadata not included above for brevity. Thanks!
Yes, it is possible. You must use the "nested" field type for your scores; that way the relation between each student_id and its score is kept.
You can read an article I wrote about that subject:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Now the example:
Mappings
PUT test_students
{
"mappings": {
"properties": {
"scores": {
"type": "nested",
"properties": {
"student_id": {
"type": "keyword"
},
"score": {
"type": "long"
}
}
}
}
}
}
Documents
PUT test_students/_doc/1
{
"scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}]
}
PUT test_students/_doc/2
{
"scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}]
}
Query
POST test_students/_search
{
"sort" : [
{
"scores.score" : {
"mode" : "max",
"order" : "desc",
"nested": {
"path": "scores",
"filter": {
"term" : { "scores.student_id" : "2" }
}
}
}
}
]
}
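What the nested sort computes can be sketched in plain Python: for each document, keep only the nested entries that pass the filter, take the max of their score, and order the documents by that value. This is an illustration of the logic only; the function name sort_by_student is made up, and student_id is written as a string to match the keyword mapping.

```python
# The two documents from the example, keyed by document id.
docs = {
    1: [{"student_id": "1", "score": 100}, {"student_id": "2", "score": 80}],
    2: [{"student_id": "1", "score": 20}, {"student_id": "2", "score": 90}],
}


def sort_by_student(docs, student_id):
    """Return document ids ordered by the given student's score, descending."""

    def sort_value(doc_id):
        # Equivalent of the nested "filter": keep only this student's entries.
        matching = [s["score"] for s in docs[doc_id] if s["student_id"] == student_id]
        # "mode": "max" picks the highest remaining value; documents without
        # a matching entry are pushed to the end.
        return max(matching) if matching else float("-inf")

    return sorted(docs, key=sort_value, reverse=True)


print(sort_by_student(docs, "2"))
print(sort_by_student(docs, "1"))
```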

Elasticsearch - Filter on string array and then aggregate on relevant keywords only

I have an index with an attribute containing a list of keywords.
Let's say my documents look like this :
{
"product_name": "Iphone",
"keywords" : ["Best seller", "Apple", "Black", "Awesome"]
}
{
"product_name": "Galaxy S21",
"keywords" : ["Awesome", "Android"]
}
I want to let my users get autocompletions on the keywords (like suggestions), but I also want to run aggregations on the suggestions to let them know how many documents match each one.
So if a user types "A", we should return 3 results :
{"expression": "Android", "count": 1}
{"expression": "Apple", "count": 1}
{"expression": "Awesome", "count": 2}
"Best seller" / "Black" should not be returned as results by Elasticsearch.
There's no mapping constraint.
I've tried queries like the one below but unexpected keywords are returned in the aggregations :
{
"query": {
"multi_match": {
"query": "a",
"fields": ["keywords"],
"type": "bool_prefix"
}
},
"size": 0,
"aggs": {
"matched_keywords": {
"terms": {
"field": "keywords",
"size": 10
}
}
}
}
Any solution / advice would be helpful.
Thanks.
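For reference, the expected result can be written down as a small client-side Python sketch. It is not an Elasticsearch solution, only a precise statement of the desired counting: a keyword is counted once per document, and only keywords starting with the typed prefix are kept. The function name suggest and the case-insensitive matching are assumptions.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"product_name": "Iphone", "keywords": ["Best seller", "Apple", "Black", "Awesome"]},
    {"product_name": "Galaxy S21", "keywords": ["Awesome", "Android"]},
]


def suggest(docs, prefix):
    """Count, per matching keyword, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # set() so a keyword repeated inside one document is counted once.
        for keyword in set(doc["keywords"]):
            if keyword.lower().startswith(prefix.lower()):
                counts[keyword] += 1
    return [{"expression": k, "count": counts[k]} for k in sorted(counts)]


for row in suggest(docs, "A"):
    print(row)
```

The query in the question probably returns extra keywords because the terms aggregation buckets every keyword of each matching document, not just the ones that matched. The terms aggregation's include parameter (a regular expression applied to bucket keys) may be one way to restrict the buckets, though that is untested here.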

Aggregator of type top_hits cannot accept sub-aggregations with Percentiles

I have the following documents:
{"id": 1, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 500}
{"id": 2, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 450}
{"id": 3, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 420}
{"id": 4, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 5, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 6, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 100}
{"id": 7, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 400}
{"id": 8, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 350}
{"id": 9, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 300}
I am looking to write a query that returns the percentiles of prices for the top 2 documents for each condition. In other words, I want to perform some calculation after getting the 2 best-scoring documents for each item condition (new, like new, used). I have tried the query below, but I get the error Aggregator of type top_hits cannot accept sub-aggregations:
{
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"top_hits": {
"size": 2
},
"aggs": {
"top_two_percentiles": {
"percentiles": {
"field": "price"
}
}
}
}
}
}
}
}
Is there another way to achieve this, or do I have to do some post-processing myself after getting the results back from ES? The end result I want is to be able to supply this data to charts to make it look like this: https://ibb.co/y5FpV80
"... the percentiles of prices for the top two documents ..." is somewhat arbitrary. What's the metric that determines the score? A terms aggregation would score the buckets equally. The only differentiating factor would be the bucket count... What I'm saying is, you'll need to first determine what puts a given bucket in the top 2 and go from there.
In any event, you can:
Order any terms aggregation by the result of one of its numeric child aggregations.
After that, you can limit it to 2 buckets.
When that's done, you can use a percentiles bucket aggregation to calculate the percentiles of the two top prices.
In concrete terms:
POST your-index/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.doc_count,aggregations.*.buckets.percentiles_top_two_prices
{
"size": 0,
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"terms": {
"field": "price",
"size": 2,
"order": {
"max_score": "desc" <-- here's how you enforce the top 2 docs
}
},
"aggs": {
"max_score": {
"max": {
"script": "_score" <-- how you determine what happens here is up to you. _score will be equal across all buckets (I believe) so pick some other metric.
}
},
"just_the_price": {
"min": {
"field": "price" <-- there's no "identity" agg in ES so I'm using min. There will be only one price per bucket because you're already under the parent, which aggregates on the price.
}
}
}
},
"percentiles_top_two_prices": {
"percentiles_bucket": {
"buckets_path": "top_two>just_the_price"
}
}
}
}
}
}
yielding something along the lines of:
{
"aggregations" : {
"item_conditions" : {
"buckets" : [
{
"key" : "like new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 100.0,
"5.0" : 100.0,
"25.0" : 100.0,
"50.0" : 150.0,
"75.0" : 150.0,
"95.0" : 150.0,
"99.0" : 150.0
}
}
},
{
"key" : "new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 420.0,
"5.0" : 420.0,
"25.0" : 420.0,
"50.0" : 450.0,
"75.0" : 450.0,
"95.0" : 450.0,
"99.0" : 450.0
}
}
},
{
"key" : "used",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 300.0,
"5.0" : 300.0,
"25.0" : 300.0,
"50.0" : 350.0,
"75.0" : 350.0,
"95.0" : 350.0,
"99.0" : 350.0
}
}
}
]
}
}
}
I'm frankly not sure what these stats would bring you (when based on only two values) but this is how it could be done 😉
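If post-processing on the client is acceptable, the same numbers can be reproduced with a few lines of Python. This is a sketch under two stated assumptions: "top two" is taken to mean the two highest distinct prices (the query above leaves that metric open), and the percentile rule is a simple nearest-rank pick with half-up rounding, chosen because it reproduces the "like new" values shown above. The function names are made up.

```python
import math

# Documents from the question, reduced to the fields used here.
docs = [
    {"id": 1, "condition": "new", "price": 500},
    {"id": 2, "condition": "new", "price": 450},
    {"id": 3, "condition": "new", "price": 420},
    {"id": 4, "condition": "like new", "price": 150},
    {"id": 5, "condition": "like new", "price": 150},
    {"id": 6, "condition": "like new", "price": 100},
    {"id": 7, "condition": "used", "price": 400},
    {"id": 8, "condition": "used", "price": 350},
    {"id": 9, "condition": "used", "price": 300},
]

PERCENTS = [1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0]


def top_two_prices(docs, condition):
    """Two highest distinct prices for a condition (ranking by price is an assumption)."""
    prices = {d["price"] for d in docs if d["condition"] == condition}
    return sorted(prices, reverse=True)[:2]


def percentiles(values, percents=PERCENTS):
    """Nearest-rank style percentiles over a short list of values."""
    ordered = sorted(values)
    result = {}
    for p in percents:
        # Round half up; Python's round() rounds halves to even, so avoid it.
        index = math.floor(p / 100.0 * (len(ordered) - 1) + 0.5)
        result[p] = float(ordered[index])
    return result


like_new = percentiles(top_two_prices(docs, "like new"))
print(like_new)
```

For "new" and "used" this sketch picks the two highest prices, so its output differs from the sample response above, where the two buckets were chosen by an equal _score.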

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the detail records are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record share the same index position in all 5 arrays. I want to extract the record with the lowest price among those whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is none with 3, take one with 4; and so on), and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should contain the extracted record (in this case the second element of every array) plus the general information (fields "last_updated" and "country").
Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with the Nested Datatype.
Besides making queries easier, it is easier to read and to understand the connections between objects that are currently scattered across different arrays.
Yes, if you choose this approach you will have to edit your mapping and re-index all of your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "string"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "string"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "string"
},
"availability": {
"type": "long"
},
"size": {
"type": "string"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
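Getting from the old layout to this one is mostly a matter of zipping the parallel arrays. A minimal Python sketch of that conversion, assuming all five arrays always have the same length (the function name to_nested and the constant ARRAY_FIELDS are made up for the illustration):

```python
# Original _source from the question, with parallel arrays.
source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

# Fields whose values are spread over parallel arrays.
ARRAY_FIELDS = ["price", "max_occupancy", "type", "availability", "size"]


def to_nested(source):
    """Turn the parallel arrays into a list of per-record objects."""
    columns = [source[field] for field in ARRAY_FIELDS]
    # zip(*columns) yields one tuple per index position, i.e. one record.
    records = [dict(zip(ARRAY_FIELDS, row)) for row in zip(*columns)]
    # Keep the general fields (last_updated, country) unchanged.
    doc = {k: v for k, v in source.items() if k not in ARRAY_FIELDS}
    doc["records"] = records
    return doc


nested = to_nested(source)
print(nested["records"][1])
```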
Now it is easier to query for any specific condition with a Nested Query and Inner Hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price ascending order and get the first result.
Hope you'll find it useful
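The selection that the inner hits perform can be checked client-side with a short Python sketch. It assumes prices always look like "€ 125" and parses them to integers before comparing; the function names parse_price and cheapest_for are made up.

```python
# Nested records from the new document structure.
records = [
    {"price": "€ 139", "max_occupancy": 2, "type": "Type 1", "availability": 10, "size": "26 m²"},
    {"price": "€ 125", "max_occupancy": 2, "type": "Type 1 - (Tag)", "availability": 10, "size": "35 m²"},
    {"price": "€ 120", "max_occupancy": 1, "type": "Type 2", "availability": 10, "size": "47 m²"},
    {"price": "€ 108", "max_occupancy": 1, "type": "Type 2 (Tag)", "availability": 10, "size": "31 m²"},
]


def parse_price(text):
    """Turn a string such as '€ 125' into the integer 125 (assumed format)."""
    return int(text.replace("€", "").strip())


def cheapest_for(records, min_occupancy=2):
    """Return the cheapest record with max_occupancy >= min_occupancy, or None."""
    candidates = [r for r in records if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: parse_price(r["price"]))


best = cheapest_for(records)
print(best)
```

One caveat: when price is indexed as a string, a sort on it compares text rather than numbers, so "€ 99" would sort after "€ 125". Storing the amount in a numeric field avoids that.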

Elasticsearch: how to filter by summed values in nested objects?

I have the following products structure in the elasticsearch:
POST /test/products/1
{
"name": "product1",
"sales": [
{
"quantity": 10,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-02"
},
{
"quantity": 5,
"customer": "customer2",
"date": "2013-12-30"
}
]
}
POST /test/products/2
{
"name": "product2",
"sales": [
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 15,
"customer": "customer1",
"date": "2014-02-01"
},
{
"quantity": 1,
"customer": "customer2",
"date": "2014-01-21"
}
]
}
The sales field is a nested object. I need to filter products like this:
"get all products which have total quantity >= 16 and sales.customer = 'customer1'".
The total quantity is sum(sales.quantity) where sales.customer = 'customer1'.
Therefore the search results should contain only 'product2'.
I tried to use aggs but I didn't understand how to filter in this case.
I haven't found any information about it in the elasticsearch documentation.
Is it possible?
I would welcome any ideas, thanks!
First of all, be clear about what you want as the result: counts or document fields? Aggregations only give counts; to get document fields back you need a filter in the query. If you want fields, you cannot filter on sum(sales.quantity) >= 16. If you want a count, you can get it with a range aggregation, but I think a range can only be applied to fields of the Elasticsearch document, not to computed values.
The nearest solution I can give you is below:
{
"size" : 0,
"query" :{
"filtered" : {
"query" :{ "match_all": {} },
"filter" : {
"nested": {
"path": "sales",
"filter" : {"term" : {"sales.customer" : "customer1"}}
}
}
}
},
"aggregations" :{
"salesNested" : {
"nested" : {"path" : "sales"},
"aggregations" :{
"aggByrange" : {
"numeric_range": {
"field": "sales.quantity",
"ranges": [
{
"from": 16
}]
}
},
"quantityStats" : {
"stats" : { "field" : "sales.quantity" }
}
}
}
}
}
The query above uses "field": "sales.quantity". For your use case, you would need to replace sales.quantity with the sum computed by the quantityStats aggregation, which I think Elasticsearch does not provide.
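If doing the final step on the client is an option, the filter itself is only a few lines. Below is a Python sketch of that post-processing, assuming the products (or at least their sales for the customer) have already been fetched; the function name products_with_min_quantity is made up and nothing here calls Elasticsearch.

```python
# The two products from the question.
products = [
    {
        "name": "product1",
        "sales": [
            {"quantity": 10, "customer": "customer1", "date": "2014-01-01"},
            {"quantity": 1, "customer": "customer1", "date": "2014-01-02"},
            {"quantity": 5, "customer": "customer2", "date": "2013-12-30"},
        ],
    },
    {
        "name": "product2",
        "sales": [
            {"quantity": 1, "customer": "customer1", "date": "2014-01-01"},
            {"quantity": 15, "customer": "customer1", "date": "2014-02-01"},
            {"quantity": 1, "customer": "customer2", "date": "2014-01-21"},
        ],
    },
]


def products_with_min_quantity(products, customer, min_total):
    """Names of products whose summed quantity for one customer reaches min_total."""
    matching = []
    for product in products:
        # sum(sales.quantity) restricted to sales.customer == customer
        total = sum(s["quantity"] for s in product["sales"] if s["customer"] == customer)
        if total >= min_total:
            matching.append(product["name"])
    return matching


print(products_with_min_quantity(products, "customer1", 16))
```

Here product1 sums to 11 for customer1 and product2 sums to 16, so only product2 passes the threshold.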

Resources