Number of nested objects in Elasticsearch - elasticsearch

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How to filter by the number of users (e.g. query fetching all documents with more than XX users).
I was thinking to using a runtime_field but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible perhaps using aggregations?
Would also be nice to know if I can sort the results (e.g. all documents with more than XX and sorted desc by XX).
Thanks.

You cannot query this efficiently
It is possible to use this hack for it, but I would only do it if you need to do some one-time fetching, not for a regular use case as it uses params._source and is therefore really slow when you have a lot of docs
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, where the score is equal to the length of the users array, and then removes all docs under min_score from returning

The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user.size()
"""
}
}

Related

Elasticsearch - extract values from a flattened field inside a nested type using a runtime field script

I have this mapping, a big document actually, a lot of fields excluded for brevity
{
"items": {
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword"
},
"costs": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"costs_samples": {
"type": "flattened"
}
}
}
}
}
}
}
costs_samples is a flattened field type huge collection of possible costs (sometimes more than 10k entries), based on some dynamic dimensions. Have to highlight that costs_sample cannot live outside costs, since at query time some conditions from costs should be composed with should or match clauses, like (costs.country=this_value AND some_other costs_samples_condition.
I would like to be able to extract and eventually inject a new field at costs level as a runtime field, and then use that field to sort, filter and aggregate.
Something like this
{
"runtime_mappings": {
"costs.selected_cost": {
"type": "long",
"script": {
"source":
"for (def cost : doc['costs.costs_samples']) { if(cost.values!= null) {emit(Long.parseLong(cost.values.some_dynamic_identity_known_at_query_time.last))} }"
}
}
},
"query":{
"nested": {
"path": "costs",
"query": {
"bool": {
"filter": [
{
"terms": {
"costs.id": ["id-1","id-2"]
}
},
{
"term": {
"costs.selected_cost": 10
}
}
]
}
}
}
},
"fields": ["costs.selected_cost"]
}
The trouble is selected_cost it is not created / returned by ES. No error message.
Where did I do wrong? Documentation was not helpful.
Maybe worth mention that I've also tried with 2 different documents, like items and costs and then execute kind of a join operation, but the performance tests were really poor.
Thanks!
Long story short, there is no way to use a runtime field inside a nested field type.

Elasticsearch multiple index query

I have a following index which stores the course details (I have truncated some attributes for brevity):
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course": {
}
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"id": {
"type": "integer"
},
"max_per_user": {
"type": "integer"
}
}
}
}
Here max_per_user is number of times a user can complete the course. A user is allowed through a course multiple times but not more than max_per_user for a course
I want to track user interactions with courses. I have created following index to track interaction events. event_type_id represents a type of interaction
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course_events": {
}
},
"mappings": {
"properties": {
"user_progress": {
"dynamic": "true",
"properties": {
"current_count": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"events": {
"dynamic": "true",
"properties": {
"event_type_id": {
"type": "integer"
},
"event_timestamp": {
"type": "date",
"format": "strict_date_time"
}
}
}
}
},
"created_at": {
"type": "date",
"format": "strict_date_time"
},
"course_id": {
"type": "integer"
}
}
}
}
Where current_count is number of times the user has gone through the complete course
Now when I run a search on course index, I also want to be able to pass in the user_id and get only those courses where the current_count for the given user is less than max_per_user for the course
My search query for course index is something like this (truncated some filters for brevity). This query is executed when a user searches for a course, so basically at the time of executing this I will have user_id.
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"filter": [
{
"range": {
"end_date": {
"gte": "2020-09-28T12:27:55.884Z"
}
}
},
{
"range": {
"start_date": {
"lte": "2020-09-28T12:27:55.884Z"
}
}
}
],
"must": [
{
"term": {
"is_active": true
}
}
]
}
}
}
I am not sure how to construct my search query such that I am able to filter out courses where max_per_user has been achieved for a given user_id.
If I understood the question correctly you want to find the courses where max_per_user limit isn't exceeded. My answer is on the same basis:
Considering your current Schema way to find what you want is:
For the given user_id find all the course_ids and their corresponding completion count
Using the data fetched in #1 find out the courses where-in max_per_user limit is not exceeded.
Now comes the problem:
In a relational database such use case can be solved using table join and checks
Elastic Search doesn't support joins and can't be done here.
Poor solution with current schema:
For each course check whether it is applicable or not. For n courses number of queries to E.S will be proportional to N.
Solution with current schema:
With-in the user-course-completion index (second index you mentioned), track max_per_user as well and use a simple query like below, to get the required course ids :
{
"size": 10,
"query": {
"script": {
"script": "doc['current_usage'].value<doc['max_per_user'].value &&
doc['u_id'].value==1" // <======= 1 is the user_id here
}
}
}

ElasticSearch sort by value

I have ElasticSearch 5 and I would like to do sorting based on field value. Imagine having document with category e.g. genre which could have values like sci-fi, drama, comedy and while doing search I would like to order values so that first comes comedies then sci-fi and drama at last. Then of course I will order within groups by other criteria. Could somebody point me to how do this ?
Elasticsearch Sort Using Manual Ordering
This is possible in elasticsearch where you can assign order based on particular values of a field.
I've implemented what you are looking for using script based sorting which makes use of painless script. You can refer to the links I've mentioned to know more on these for below query would suffice what you are looking for.
Also assuming you have genre and movie as keyword with the below mapping.
PUT sampleindex
{
"mappings": {
"_doc": {
"properties": {
"genre": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"movie": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
You can make use of below query to get what you are looking for
GET sampleindex/_search
{
"query": {
"match_all": {}
},
"sort": [{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "if(params.scores.containsKey(doc['genre.raw'].value)) { return params.scores[doc['genre.raw'].value];} return 100000;",
"params": {
"scores": {
"comedy": 0,
"sci-fi": 1,
"drama": 2
}
}
},
"order": "asc"
}
},
{ "movie.raw": { "order": "asc"}
}]
}
Note that I've included sort for both genre and movie. Basically sort would happen on genre and post that further it would sort based on movie for each genre.
Hope it helps.

Elasticsearch sorting by matching array item

I have a following structure in indexed documents:
document1: "customLists":[{"id":8,"position":8},{"id":26,"position":2}]
document2: "customLists":[{"id":26,"position":1}]
document3: "customLists":[{"id":8,"position":1},{"id":26,"position":3}]
I am able to search matching documents that belong to a given list with match query "customLists.id = 26". But I need to sort the documents based on the position value within that list and ignore positions of the other lists.
So the expected results would be in order of document2, document1, document3
Is the data structure suitable for this kind of sorting and how to handle this?
One way to achieve this would be to set mapping type of customLists as nested and then use sorting by nested fields
Example :
1) Create Index & Mapping
put test
put test/test/_mapping
{
"properties": {
"customLists": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"position": {
"type": "integer"
}
}
}
}
}
2) Index Documents :
put test/test/1
{
"customLists":[{"id":8,"position":8},{"id":26,"position":2}]
}
put test/test/2
{
"customLists":[{"id":26,"position":1}]
}
put test/test/3
{
"customLists":[{"id":8,"position":1},{"id":26,"position":3}]
}
3) Query to sort by positon for given id
post test/_search
{
"filter": {
"nested": {
"path": "customLists",
"query": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
},
"sort": [
{
"customLists.position": {
"order": "asc",
"mode": "min",
"nested_filter": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
}
]
}

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you see this is nested, and the last_purchase field is a unixtimestamp value. We would like to query from all nested elements the most recent entry defined by the last_purchase field AND see if in the last entry there is some product id is in products.
You can achieve this using a nested query with inner_hits. In the query part, you can specify the product id you want to match and then using inner_hits you can sort by decreasing last_purchase timestamp and only take the first one using size: 1
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}

Resources