Elastic Search- Distinct elements from multiple nested fields - elasticsearch

I have created mapping using elasticsearch. Here is the mapping properties
"properties": {
"userPermissions": {
"type": "nested",
"properties": {
"prm": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"pSPermissions": {
"type": "nested",
"properties": {
"prm": {
"type": "string"
},
"id": {
"type": "string"
}
}
}
}
I want to retrieve overall distinct items from these fields: userPermissions.id, pSPermissions.id.
I can achieve distinct values of multiple fields under a single path. We need to use a script to retrieve terms from multiple fields.
GET /permissions/perm/_search?pretty=true&search_type=count
{
"aggs": {
"Parents": {
"nested": {
"path": "userPermissions"
},
"aggs": {
"permCount": {
"terms": {
"script": "[doc['userPermissions.id'].value,doc['userPermissions.prm'].value]",
"size": 5000
}
}
}
}
}
}
But I have no idea how to achieve across different paths userPermissions and pSPermissions. Is it achievable?

You can achieve it using the following script:
"script": "doc['userPermissions.id'].values + doc['userPermissions.prm'].values",
What it does is retrieve all the userPermissions.id values and all the userPermissions.prm values and then concatenate them into a single array using groovy's + operator.
UPDATE
Following up on your comment, you can achieve what you want with this script in a similar way as you're already doing for fields under the same path:
"script": "doc['userPermissions.id'].values + doc['pSPermissions.id'].values",

Related

Elasticsearch multiple index query

I have a following index which stores the course details (I have truncated some attributes for brevity):
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course": {
}
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"id": {
"type": "integer"
},
"max_per_user": {
"type": "integer"
}
}
}
}
Here max_per_user is number of times a user can complete the course. A user is allowed through a course multiple times but not more than max_per_user for a course
I want to track user interactions with courses. I have created following index to track interaction events. event_type_id represents a type of interaction
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course_events": {
}
},
"mappings": {
"properties": {
"user_progress": {
"dynamic": "true",
"properties": {
"current_count": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"events": {
"dynamic": "true",
"properties": {
"event_type_id": {
"type": "integer"
},
"event_timestamp": {
"type": "date",
"format": "strict_date_time"
}
}
}
}
},
"created_at": {
"type": "date",
"format": "strict_date_time"
},
"course_id": {
"type": "integer"
}
}
}
}
Where current_count is number of times the user has gone through the complete course
Now when I run a search on course index, I also want to be able to pass in the user_id and get only those courses where the current_count for the given user is less than max_per_user for the course
My search query for course index is something like this (truncated some filters for brevity). This query is executed when a user searches for a course, so basically at the time of executing this I will have user_id.
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"filter": [
{
"range": {
"end_date": {
"gte": "2020-09-28T12:27:55.884Z"
}
}
},
{
"range": {
"start_date": {
"lte": "2020-09-28T12:27:55.884Z"
}
}
}
],
"must": [
{
"term": {
"is_active": true
}
}
]
}
}
}
I am not sure how to construct my search query such that I am able to filter out courses where max_per_user has been achieved for a given user_id.
If I understood the question correctly you want to find the courses where max_per_user limit isn't exceeded. My answer is on the same basis:
Considering your current Schema way to find what you want is:
For the given user_id find all the course_ids and their corresponding completion count
Using the data fetched in #1 find out the courses where-in max_per_user limit is not exceeded.
Now comes the problem:
In a relational database such use case can be solved using table join and checks
Elastic Search doesn't support joins and can't be done here.
Poor solution with current schema:
For each course check whether it is applicable or not. For n courses number of queries to E.S will be proportional to N.
Solution with current schema:
With-in the user-course-completion index (second index you mentioned), track max_per_user as well and use a simple query like below, to get the required course ids :
{
"size": 10,
"query": {
"script": {
"script": "doc['current_usage'].value<doc['max_per_user'].value &&
doc['u_id'].value==1" // <======= 1 is the user_id here
}
}
}

Elasticsearch - Mapping fields from other indices

How can I define mapping in Elasticsearch 7 to index a document with a field value from another index? For example, if I have a users index which has a mapping for name, email and account_number but the account_number value is actually in another index called accounts in field number.
I've tried something like this without much success (I only see "name", "email" and "account_id" in the results):
PUT users/_mapping
{
"properties": {
"name": {
"type": "text"
},
"email": {
"type": "text"
},
"account_id": {
"type": "integer"
},
"accounts": {
"properties": {
"number": {
"type": "text"
}
}
}
}
}
The accounts index has the following mapping:
{
"properties": {
"name": {
"type": "text"
},
"number": {
"type": "text"
}
}
}
As I understand it, you want to implement field joining as is usually done in relational databases. In elasticsearch, this is possible only if the documents are in the same index. (Link to doc). But it seems to me that in your case you need to work differently, I think your Account object needs to be nested for User.
PUT /users/_mapping
{
"mappings": {
"properties": {
"account": {
"type": "nested"
}
}
}
}
You can further search as if it were a separate document.
GET /users/_search
{
"query": {
"nested": {
"path": "account",
"query": {
"bool": {
"must": [
{ "match": { "account.number": 1 } }
]
}
}
}
}
}

nested terms aggregation on object containing a string field

I like to run a nested terms aggregation on string field which is inside an object.
Usually, I use this query
"terms": {
"field": "fieldname.keyword"
}
to enable fielddata
But I am unable to do that for a nested document like this
{
"nested": {
"path": "objectField"
},
"aggs": {
"allmyaggs": {
"terms": {
"field": "objectField.fieldName.keyword"
}
}
}
}
The above query is just returning an empty buckets array
Is there a way this can be done without enabling field-data by default during index mapping.
Since that will take a large heap memory and I have already loaded a huge data without it
document mapping
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text"
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
sample document
{
"productname": [
{
"productlineseqno": "1.58",
"iscontrolleddrug": "false",
"productlinename": "Consultations",
"productlinedescription": "Consultations",
"isprescribable": "false",
"invoiceitemname": "invoice name"
}
]
}
Fixed
By changing the mapping to enable field data
Nested query is used to access nested fields similarly nested aggregation is needed to aggregation on nested fields
{
"aggs": {
"fieldname": {
"nested": {
"path": "objectField"
},
"aggs": {
"fields": {
"terms": {
"field": "objectField.fieldname.keyword",
"size": 10
}
}
}
}
}
}
EDIT1:
If you are searching for productname.invoiceitemname.keyword then it will give empty bucket as no field exists with that name.
You need to define your mapping like below
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text",
"fields":{ --> note
"keyword":{
"type":"keyword"
}
}
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Fields
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
When mapping is not explicitly provided, keyword fields are created by default. If you are creating your own mapping(which you need to do for nested type), you need to provide keyword fields in mapping, wherever you intend to use them

Search for documents in elasticsearch and then query the nested fields

I have an index like this:
{
"rentals": {
"aliases": {},
"mappings": {
"rental": {
"properties": {
"address": {
"type": "text"
},
"availability": {
"type": "nested",
"properties": {
"chargeBasis": {
"type": "text"
},
"date": {
"type": "date"
},
"isAvailable": {
"type": "boolean"
},
"rate": {
"type": "double"
}
}
}
}
And this is my use case:
I need to search for all the "rentals" that have a given address.
This is easy and done
I need to get "availability" data for all those "rentals" searched; only for today's date.
This is the part where I'm stuck at, how do I query the nested documents of all the "rentals"?
You need to use the nested query:
Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them.
Try something like:
{
"query": {
"nested": {
"path": "availability",
"query": {
"term": {
"availability.date": "2015-01-01"
}
}
}
}
}

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you see this is nested, and the last_purchase field is a unixtimestamp value. We would like to query from all nested elements the most recent entry defined by the last_purchase field AND see if in the last entry there is some product id is in products.
You can achieve this using a nested query with inner_hits. In the query part, you can specify the product id you want to match and then using inner_hits you can sort by decreasing last_purchase timestamp and only take the first one using size: 1
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}

Resources