Elasticsearch multiple index query - elasticsearch

I have a following index which stores the course details (I have truncated some attributes for brevity):
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course": {
}
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"id": {
"type": "integer"
},
"max_per_user": {
"type": "integer"
}
}
}
}
Here max_per_user is number of times a user can complete the course. A user is allowed through a course multiple times but not more than max_per_user for a course
I want to track user interactions with courses. I have created following index to track interaction events. event_type_id represents a type of interaction
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course_events": {
}
},
"mappings": {
"properties": {
"user_progress": {
"dynamic": "true",
"properties": {
"current_count": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"events": {
"dynamic": "true",
"properties": {
"event_type_id": {
"type": "integer"
},
"event_timestamp": {
"type": "date",
"format": "strict_date_time"
}
}
}
}
},
"created_at": {
"type": "date",
"format": "strict_date_time"
},
"course_id": {
"type": "integer"
}
}
}
}
Where current_count is number of times the user has gone through the complete course
Now when I run a search on course index, I also want to be able to pass in the user_id and get only those courses where the current_count for the given user is less than max_per_user for the course
My search query for course index is something like this (truncated some filters for brevity). This query is executed when a user searches for a course, so basically at the time of executing this I will have user_id.
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"filter": [
{
"range": {
"end_date": {
"gte": "2020-09-28T12:27:55.884Z"
}
}
},
{
"range": {
"start_date": {
"lte": "2020-09-28T12:27:55.884Z"
}
}
}
],
"must": [
{
"term": {
"is_active": true
}
}
]
}
}
}
I am not sure how to construct my search query such that I am able to filter out courses where max_per_user has been achieved for a given user_id.

If I understood the question correctly you want to find the courses where max_per_user limit isn't exceeded. My answer is on the same basis:
Considering your current Schema way to find what you want is:
For the given user_id find all the course_ids and their corresponding completion count
Using the data fetched in #1 find out the courses where-in max_per_user limit is not exceeded.
Now comes the problem:
In a relational database such use case can be solved using table join and checks
Elastic Search doesn't support joins and can't be done here.
Poor solution with current schema:
For each course check whether it is applicable or not. For n courses number of queries to E.S will be proportional to N.
Solution with current schema:
With-in the user-course-completion index (second index you mentioned), track max_per_user as well and use a simple query like below, to get the required course ids :
{
"size": 10,
"query": {
"script": {
"script": "doc['current_usage'].value<doc['max_per_user'].value &&
doc['u_id'].value==1" // <======= 1 is the user_id here
}
}
}

Related

Number of nested objects in Elasticsearch

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How to filter by the number of users (e.g. query fetching all documents with more than XX users).
I was thinking to using a runtime_field but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible perhaps using aggregations?
Would also be nice to know if I can sort the results (e.g. all documents with more than XX and sorted desc by XX).
Thanks.
You cannot query this efficiently
It is possible to use this hack for it, but I would only do it if you need to do some one-time fetching, not for a regular use case as it uses params._source and is therefore really slow when you have a lot of docs
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, where the score is equal to the length of the users array, and then removes all docs under min_score from returning
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user.size()
"""
}
}

Elasticsearch - Mapping fields from other indices

How can I define mapping in Elasticsearch 7 to index a document with a field value from another index? For example, if I have a users index which has a mapping for name, email and account_number but the account_number value is actually in another index called accounts in field number.
I've tried something like this without much success (I only see "name", "email" and "account_id" in the results):
PUT users/_mapping
{
"properties": {
"name": {
"type": "text"
},
"email": {
"type": "text"
},
"account_id": {
"type": "integer"
},
"accounts": {
"properties": {
"number": {
"type": "text"
}
}
}
}
}
The accounts index has the following mapping:
{
"properties": {
"name": {
"type": "text"
},
"number": {
"type": "text"
}
}
}
As I understand it, you want to implement field joining as is usually done in relational databases. In elasticsearch, this is possible only if the documents are in the same index. (Link to doc). But it seems to me that in your case you need to work differently, I think your Account object needs to be nested for User.
PUT /users/_mapping
{
"mappings": {
"properties": {
"account": {
"type": "nested"
}
}
}
}
You can further search as if it were a separate document.
GET /users/_search
{
"query": {
"nested": {
"path": "account",
"query": {
"bool": {
"must": [
{ "match": { "account.number": 1 } }
]
}
}
}
}
}

How can I get parent with all children in one query

I have following mapping:
PUT /test_products
{
"mappings": {
"_doc": {
"properties": {
"type": {
"type": "keyword"
},
"name": {
"type": "text"
},
"entity_id": {
"type": "integer"
},
"weighted": {
"type": "integer"
}
"product_relation": {
"type": "join",
"relations": {
"window": "simple"
}
}
}
}
}
}
I want to get "window" products with all "simple"s but only where one or more "simple"s have property "weighted" = 1
I wrote following query:
GET test_products/_search
{
"query": {
"has_child": {
"type": "simple",
"query": {
"term": {
"weighted": 1
}
},
"inner_hits": {}
}
}
}
But I've got "window"s with "simple"s which are match to the term. In other words I want to filter "window"s list by "simple"'s option and get all matched "window"s with all their "simple"s. Is it possible without "nested" in one query? Or I have to do some queries?
OK. Luckily, I need to get only one "window" product with all it's children by it's ID, so I found parent_id query which can helps me with this task.
Now I have following query:
GET test_products/_search
{
"query": {
"parent_id": {
"type": "simple",
"id": "window-1"
}
}
}
Unfortunately, I have to execute 2 queries (has_child and then parent_id) instead of one but it's OK for me.

Elastic Search- Distinct elements from multiple nested fields

I have created mapping using elasticsearch. Here is the mapping properties
"properties": {
"userPermissions": {
"type": "nested",
"properties": {
"prm": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"pSPermissions": {
"type": "nested",
"properties": {
"prm": {
"type": "string"
},
"id": {
"type": "string"
}
}
}
}
I want to retrieve overall distinct items from these fields: userPermissions.id, pSPermissions.id.
I can achieve distinct values of multiple fields under a single path. We need to use a script to retrieve terms from multiple fields.
GET /permissions/perm/_search?pretty=true&search_type=count
{
"aggs": {
"Parents": {
"nested": {
"path": "userPermissions"
},
"aggs": {
"permCount": {
"terms": {
"script": "[doc['userPermissions.id'].value,doc['userPermissions.prm'].value]",
"size": 5000
}
}
}
}
}
}
But I have no idea how to achieve across different paths userPermissions and pSPermissions. Is it achievable?
You can achieve it using the following script:
"script": "doc['userPermissions.id'].values + doc['userPermissions.prm'].values",
What it does is retrieve all the userPermissions.id values and all the userPermissions.prm values and then concatenate them into a single array using groovy's + operator.
UPDATE
Following up on your comment, you can achieve what you want with this script in a similar way as you're already doing for fields under the same path:
"script": "doc['userPermissions.id'].values + doc['pSPermissions.id'].values",

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you see this is nested, and the last_purchase field is a unixtimestamp value. We would like to query from all nested elements the most recent entry defined by the last_purchase field AND see if in the last entry there is some product id is in products.
You can achieve this using a nested query with inner_hits. In the query part, you can specify the product id you want to match and then using inner_hits you can sort by decreasing last_purchase timestamp and only take the first one using size: 1
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}

Resources