Sorting by a nested field in elasticsearch - elasticsearch

If I had a data structure that looked like this
[{"_id" 1
"scores" [{"student_id": 1, "score": 100"}, {"student_id": 2, "score": 80"}
]},
{"_id" 2
"scores" [{"student_id": 1, "score": 20"}, {"student_id": 2, "score": 90"}
]}]
Would it be possible to sort this dataset by student_1's score or by student_2's score?
For example if I sorted descending by student 1's score, I would get document 1,2, but if I sorted descending by student 2's score, I would get 2,1.
I could re-arrange the data, but I don't want to use another index because there's a bunch of metadata not included above for brevity. Thanks!

Yes, it is possible. You must use "nested" field type for your scores, that way you can keep the relation between each student_id and its score.
You can read an article I wrote about that subject:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Now the example:
Mappings
PUT test_students
{
"mappings": {
"properties": {
"scores": {
"type": "nested",
"properties": {
"student_id": {
"type": "keyword"
},
"score": {
"type": "long"
}
}
}
}
}
}
Documents
PUT test_students/_doc/1
{
"scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}]
}
PUT test_students/_doc/2
{
"scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}]
}
Query
POST test_students/_search
{
"sort" : [
{
"scores.score" : {
"mode" : "max",
"order" : "desc",
"nested": {
"path": "scores",
"filter": {
"term" : { "scores.student_id" : "2" }
}
}
}
}
]
}

Related

How to filter query based on a field value

I'm working with elasticsearch Query dsl, and I can't find a way to express the following:
Return results that have the field "price" > min budget and have "price" < max Budget and have has_price=true and also return all results that have "has_price=false"
In other words, I would like to use a range filter on results only that have has_price field set to true, otherwise, on results that have has_price set to false don't take in consideration the filter
Here's the mapping:
{
"formations": {
"mappings": {
"properties": {
"code": {
"type": "text"
},
"date": {
"type": "date",
"format": "dd/MM/yyyy"
},
"description": {
"type": "text"
},
"has_price": {
"type": "boolean"
},
"place": {
"type": "text"
},
"price": {
"type": "float"
},
"title": {
"type": "text"
}
}
}
}
}
The following query combines the 2 scenarios as 2 should clauses in a bool-query. And as there are only should clauses, minimum_should_match will be 1, meaning that at least one should-clause has to match:
Abstract Code Snippet
GET formations/_search
{
"query": {
"bool": {
"should": [
{ <1st scenario: has_price = false> },
{ <2nd scenario> has_price = true AND price IN budget_range}
]
}
}
}
Actual Sample Code Snippets
# 1. Create the index and populate it with some sample documents
POST formations/_bulk
{"index": {"_id": 1}}
{"has_price": true, "price": 2.0}
{"index": {"_id": 2}}
{"has_price": true, "price": 3.0}
{"index": {"_id": 3}}
{"has_price": true, "price": 4.0}
{"index": {"_id": 4}}
{"has_price": false, "price": 2.0}
{"index": {"_id": 5}}
{"has_price": false, "price": 3.0}
{"index": {"_id": 6}}
{"has_price": false, "price": 4.0}
# 2. Query assuming min_budget = 2.0 and max_budget = 4.0
GET formations/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": {
"term": {
"has_price": false
}
}
}
},
{
"bool": {
"filter": [
{
"term": {
"has_price": true
}
},
{
"range": {
"price": {
"gt": 2,
"lt": 4
}
}
}
]
}
}
]
}
}
}
# 3. Result Snippet (4 hits: 3 from 1st scenario & 1 from 2nd scenario)
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
...
Don't forget to add the Claus "minimum_should_match": 1 to your bool-query in case you add another non-should-clause to your bool-query.
Let me know if this answers your question & solves your issue.

Sorting result based on dynamic terms

Imagine that I have an index with the following three documents representing images and their colors.
[
{
"id": 1,
"intensity": {
"red": 0.6,
"green": 0.1,
"blue": 0.3
}
},
{
"id": 2,
"intensity": {
"red": 0.5,
"green": 0.6,
"blue": 0.0
}
},
{
"id": 3,
"intensity": {
"red": 0.98,
"green": 0.0,
"blue": 0.0
}
}
]
It the user wants a "red image" (selected in a dropdown or in a “tag cloud”), it is very convenient to do a range query over the floats (maybe intensity.red > 0.5). I can also use the score of that query to get the "red-est" image ranked highest.
However, if I would like to offer free-text search, it gets harder. My solution to that would to index the documents as the following (eg use the if color > 0.5 then append(colors, color_name) at index time):
[
{
"id": 1,
"colors": ["red"]
},
{
"id": 2,
"colors": ["green", "red"]
}
{
"id": 3,
"colors": ["red"]
}
]
I could now use a query_string or a match on the colors field and then search for "red", but all of a sudden I lost my ranking possibilities. ID 3 is far more red than ID 1 (0.98 vs 0.6) but the score will be similar?
My question is: Can I have the cake and eat it too?
One solution I see is to have one index that turns free text into "keywords" which I later use in the actual search.
POST image_tag_index/_search {query: "redish"} -> [ "red" ]
POST images/_search {query: {"red" > 0.5}} -> [ {id: 1}, {id: 3}]
But then I need to fire two searches for every search, but maybe that is the only option?
You can make use of nested data type along with function_score query to get the desired result.
You need to change the way you are storing image data. The mapping will be as below:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "integer"
},
"image": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"intensity": {
"type": "float"
}
}
}
}
}
}
}
Index the image data as below:
PUT test/_doc/1
{
"id": 1,
"image": [
{
"color": "red",
"intensity": 0.6
},
{
"color": "green",
"intensity": 0.1
},
{
"color": "blue",
"intensity": 0.3
}
]
}
The above corresponds to the first image data the you posted in the question. Similarly you can index other images data.
Now when a user search for red the query should be build as below:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "image",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"image.color": "red"
}
},
{
"range": {
"image.intensity": {
"gt": 0.5
}
}
}
]
}
},
"field_value_factor": {
"field": "image.intensity",
"modifier": "none",
"missing": 0
}
}
}
}
}
]
}
}
}
You can see in the above query that I have used the field value of image.intensity to calculate the score.

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)",
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the details records are split in 5 arrays, and fields of the same record have the same index position in the 5 arrays. As can be seen in the example data there are 5 array(price, max_occupancy, type, availability, size) that are containing values related to the same element. I want to extract the element that has max_occupancy field greater or equal than 2 (if there is no record with 2 grab a 3 if there is no 3 grab a four, ...), with the lower price, in this case the record and place the result into a new JSON object like the following :
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price: ": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically the result structure should show the extracted record(that in this case is the second index of all array), and add the general information to it(fields : "last_updated", "country").
Is it possible to extract such a result from elastic search? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with Nested Datatype
Except for easier querying, it easier to read and understand the connections between those objects that are, currently, scattered in different arrays.
Yes, if you'll decide this approach you will have to edit your mapping and re-index your entire data.
How would the mapping is going to look like? something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "string"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "string"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "string"
},
"availability": {
"type": "long"
},
"size": {
"type": "string"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
Now, its more easy to query for any specific condition with Nested Query and Inner Hits. for example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: Italia AND max_occupancy > 2.
Inner hits: sort by price ascending order and get the first result.
Hope you'll find it useful

ElasticSearch order by number of matches in nested fields

Complete beginner here, quite possibly trying to do the impossible.
I have the following structure that I would like to store in Elasticsearch:
{
"id" : 1,
"code" : "03f3301c-4089-11e7-a919-92ebcb67fe33",
"countries" : [
{
"id" : 1,
"name" : "Netherlands"
},
{
"id" : 2,
"name" : "United Kingdom"
}
],
"tags" : [
{
"id" : 1,
"name" : "Scanned"
},
{
"id" : 2,
"name" : "Secured"
},
{
"id" : 3,
"name" : "Cleared"
}
]
}
I have complete control over how it will be stored, so the structure can change, but it should contain all these fields in some form.
I’d like to be able to query this data over countries and tags in such way that all those items having at least one match are returned, ordered by number of matches. If at all possible I’d prefer not to do a full text search.
For example:
id, code, country ids, tag ids
1, ..., [1, 2, 3], [1]
2, ..., [1], [1, 2, 3]
For the question: "which of these was in country 1 or has tag 1 or has tag 2", should return:
2, ..., [1], [1, 2, 3]
1, ..., [1, 2, 3], [1]
In this order, because the second row matches more sub-queries in the above disjunction.
In essence, I’d like to replicate this SQL query:
SELECT p.id, p.code, COUNT(p.id) FROM packages p
LEFT JOIN tags t ON t.package_id = p.id
LEFT JOIN countries c ON c.package_id = p.id
WHERE t.id IN (1, 2, 3) OR c.id IN (1, 2, 3)
GROUP BY p.id
ORDER BY COUNT(p.id);
I’m using ElasticSearch 2.4.5 if that matters.
Hopefully I was clear enough. Thank you for your help!
You need countries and tags to be of type nested. Also, you need to take control of the scoring with function_score give a weight of 1 for the queries inside the function_score and also play with boost_mode and score_mode. In the end you can use this query:
GET /nested/test/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.id": 1
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.id": 2
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "countries",
"query": {
"term": {
"countries.id": 1
}
}
}
},
"weight": 1
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
}
For a more complete test case, I am also providing the mapping and test data:
PUT nested
{
"mappings": {
"test": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"countries": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST nested/test/_bulk
{"index":{"_id":1}}
{"name":"Foo Bar","tags":[{"id":2,"name":"My Tag 5"},{"id":3,"name":"My Tag 7"}],"countries":[{"id":1,"name":"USA"}]}
{"index":{"_id":2}}
{"name":"Foo Bar","tags":[{"id":3,"name":"My Tag 6"}],"countries":[{"id":1,"name":"USA"},{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}
{"index":{"_id":3}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 4"},{"id":3,"name":"My Tag 1"}],"countries":[{"id":3,"name":"UAE"}]}
{"index":{"_id":4}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 1"},{"id":2,"name":"My Tag 4"},{"id":3,"name":"My Tag 2"}],"countries":[{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}

Elasticsearch - bump individual result to the top

I'm working with Elasticsearch. I have an array of documents, and I'm trying to sort documents by the property price, except that I'd like a particular document to be the first result no matter what.
The below is what I'm using as my "sort" array as my attempt to order documents by ID 1213, and then all following documents ordered by price descending.
[
{
"id": {
"mode": "max",
"order": "desc",
"nested_filter": {
"term": {
"id": 1213
}
},
"missing": "_last"
}
},
{
"price": {
"order": "asc"
}
}
]
This doesn't appear to be working, though—document 1213 doesn't appear first. What am I doing wrong here?
As an example—the ideal returned result:
[{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price: 4},
{"id": 5923, "name": "Yellow Sunglasses, "price": 18}]
Instead, I get:
[{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price: 4},
{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 5923, "name": "Yellow Sunglasses, "price": 18}]
As others have already asked, what is the reason for the nested_filter?
There's many possible ways to do what you need. Here is one possible way which fits with the simple requirements you mentioned so far:
{
"query" : {
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : {
"term" : {
"id" : "1213"
}
},
"boost" : 2
}
]
}
},
"sort" : [
"_score",
"price"
]
}
The assumption here is that your query is simple like the match_all query and does not affect the scores in anyway. If you do have something more complicated for the queries, to not affect the scores, you can try wrapping with a constant_score query. But ideally you get the document set you want where all the documents have the same score and then custom_filters_score query will boost the score of the document you want. You can do this for any number of documents adding further filters or if the documents are equal, use a terms filter. In the end the sort by the score and then the price.
In this case you need to use function_score to modify score of each doc.
{
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
},
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
],
"score_mode": "sum",
"boost_mode" : "replace",
"query" : {
//YOUR QUERY GOES HERE
}
}
}
}
Explanation:
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
Compute score based on price and give a value < 1. The higher the price the smaller the score (ascending). If you want to switch to descending then just replace it with
"script": "(1 - (1 / doc['price'].value))"
{
"filter": {
term": {
"id": "1213"
}
},
"weight": 1
}
This will give any docs with "id" = 1213 an extra 1 score. The total score at the end will be the sum of those 2 functions.

Resources