Describing object's keys with MSON - apiblueprint

I have an API whose response looks like this.
The products object maps product IDs (keys) to their amounts in the basket (values).
{
"id": "0a4d44aa-2ace-11e7-93ae-92361f002671",
"products": {
"4": 3, // product with ID 4 is 3x in basket
"10": 1, // product with ID 10 is 1x in basket
...
},
// some other values...
}
a) How can I describe this API with MSON?
b) Is this schema correct?
{
"type": "object",
"properties": {
"id": {
"type": "string",
},
"products": {
"type": "object",
"properties": {
"productId": {
"type": "int",
}
}
},
// some other values
}
}
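On question (b): as written, the schema describes a single literal key named productId, and "int" is not a JSON Schema type name (the integer type is spelled "integer"). One possible way to describe arbitrary product-ID keys in JSON Schema is patternProperties. The Python sketch below is an assumption about the intended shape, not a confirmed answer; BASKET_SCHEMA and products_match are hypothetical names, and the checker only covers the products part of this one schema.

```python
import re

# Assumed schema: every key made only of digits maps to an integer amount.
BASKET_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "products": {
            "type": "object",
            "patternProperties": {"^[0-9]+$": {"type": "integer"}},
            "additionalProperties": False,
        },
    },
}


def products_match(schema, basket):
    """Hand-rolled check of the 'products' part only (not a full validator)."""
    rules = schema["properties"]["products"]
    patterns = [re.compile(p) for p in rules["patternProperties"]]
    for key, amount in basket.get("products", {}).items():
        if not any(p.search(key) for p in patterns):
            return False  # key is not a numeric product ID
        if isinstance(amount, bool) or not isinstance(amount, int):
            return False  # amount must be an integer
    return True
```

With this sketch, a basket such as {"products": {"4": 3, "10": 1}} passes, while a non-numeric key or a string amount is rejected.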

Related

Elasticsearch range filter for multiplication of two numeric fields

I have an index that I need to filter so that the product of two fields falls within a range.
First, here's the mapping for my "items" index:
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "float"
},
"discount": {
"type": "float"
}
}
}
}
An item's actual price would be its price multiplied by its discount.
I need a query for items whose actual price lies between two numbers: X <= price * discount <= Y
I have looked at the documentation for Elasticsearch, but the range query seems to only take into account the value of a single field, not the multiplicative product of two fields:
{
"query": {
"range": {
"price": { // only price
"gte": 10, // X
"lte": 200 // Y
}
}
}
}
I wonder if there is any solution besides adding another field that stores the multiplied value for use in the query.
Thank you.
You have two alternatives:
1. Add the field at index time.
2. Use a runtime field.
Option 1 is self-explanatory, and it is the recommended one in most cases because storage is cheaper than compute: if you don't store the value, you have to compute it on every query.
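Option 1 (adding the field at index time) could look roughly like the following Python sketch, which enriches a document before it is sent to Elasticsearch. The helper name with_discount_price is hypothetical, and the field name discount_price is borrowed from the runtime-field examples in this answer.

```python
def with_discount_price(doc):
    """Return a copy of doc with a precomputed discount_price field."""
    enriched = dict(doc)  # shallow copy so the caller's document is not modified
    # float() tolerates both numeric values and numeric strings such as "10.0".
    enriched["discount_price"] = float(doc["price"]) * float(doc["discount"])
    return enriched
```

The enriched document can then be indexed as usual, and a plain range query on discount_price needs no script.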
You can define a runtime field either in the mappings or in the query itself.
I will show you both ways.
In the mappings:
PUT test_product
{
"mappings": {
"runtime": {
"discount_price": {
"type": "double",
"script": {
"source": "emit(doc['price'].value * doc['discount'].value )"
}
}
},
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "double"
},
"discount": {
"type": "double"
}
}
}
}
Ingest a document:
POST test_product/_doc
{
"name": "Orange",
"price": "10.0",
"discount": "0.5"
}
Run a query:
GET test_product/_search
{
"query": {
"range": {
"discount_price": {
"gte": 5,
"lte": 5
}
}
}
}
Now without defining the runtime field in the mappings:
GET test_product/_search
{
"runtime_mappings": {
"discount_price": {
"type": "double",
"script": {
"source": "emit(doc['price'].value * doc['discount'].value )"
}
}
},
"query": {
"range": {
"discount_price": {
"gte": 5,
"lte": 5
}
}
}
}
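To plug in the X and Y bounds from the question instead of the fixed value 5, the query-time variant can be generated from code. This is a minimal Python sketch; the function name discount_price_query is hypothetical, and the body it returns mirrors the runtime_mappings request shown above.

```python
def discount_price_query(low, high):
    """Build a search body filtering on price * discount within [low, high]."""
    script = "emit(doc['price'].value * doc['discount'].value)"
    return {
        # Runtime field computed per document at query time.
        "runtime_mappings": {
            "discount_price": {"type": "double", "script": {"source": script}}
        },
        # Ordinary range query against the computed field.
        "query": {"range": {"discount_price": {"gte": low, "lte": high}}},
    }
```

For example, discount_price_query(10, 200) produces the body for X = 10 and Y = 200, which can be serialized with json.dumps and sent to the _search endpoint.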

Elasticsearch multiple index query

I have the following index, which stores course details (some attributes are truncated for brevity):
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course": {
}
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"id": {
"type": "integer"
},
"max_per_user": {
"type": "integer"
}
}
}
}
Here max_per_user is the number of times a user can complete the course. A user may go through a course multiple times, but not more than max_per_user times for that course.
I want to track user interactions with courses, so I have created the following index to track interaction events. event_type_id represents the type of interaction:
{
"settings": {
"index": {
"number_of_replicas": "1",
"number_of_shards": "1"
}
},
"aliases": {
"course_events": {
}
},
"mappings": {
"properties": {
"user_progress": {
"dynamic": "true",
"properties": {
"current_count": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"events": {
"dynamic": "true",
"properties": {
"event_type_id": {
"type": "integer"
},
"event_timestamp": {
"type": "date",
"format": "strict_date_time"
}
}
}
}
},
"created_at": {
"type": "date",
"format": "strict_date_time"
},
"course_id": {
"type": "integer"
}
}
}
}
Here current_count is the number of times the user has gone through the complete course.
Now, when I run a search on the course index, I also want to be able to pass in the user_id and get only those courses where the current_count for the given user is less than max_per_user for the course.
My search query for the course index looks something like this (some filters are truncated for brevity). This query is executed when a user searches for a course, so I will have the user_id at execution time.
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"filter": [
{
"range": {
"end_date": {
"gte": "2020-09-28T12:27:55.884Z"
}
}
},
{
"range": {
"start_date": {
"lte": "2020-09-28T12:27:55.884Z"
}
}
}
],
"must": [
{
"term": {
"is_active": true
}
}
]
}
}
}
I am not sure how to construct my search query so that it filters out courses where max_per_user has been reached for a given user_id.
If I understood the question correctly, you want to find the courses where the max_per_user limit hasn't been reached. My answer is based on that.
With your current schema, the way to find what you want is:
1. For the given user_id, find all the course_ids and their corresponding completion counts.
2. Using the data fetched in step 1, find the courses where the max_per_user limit hasn't been reached.
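The two steps above amount to a join done on the application side. A minimal Python sketch of that logic follows, assuming both result sets have already been fetched from Elasticsearch, that there is one event document per user and course, and that a course the user never started counts as zero completions. The function name courses_below_limit is hypothetical.

```python
def courses_below_limit(courses, progress, user_id):
    """Return IDs of courses the user may still take.

    courses:  list of course documents with "id" and "max_per_user"
    progress: list of course_events documents with "course_id" and "user_progress"
    """
    # Step 1: completion count per course for this user.
    counts = {}
    for event in progress:
        user_progress = event["user_progress"]
        if user_progress["user_id"] == user_id:
            counts[event["course_id"]] = user_progress["current_count"]
    # Step 2: keep courses whose limit has not been reached yet.
    return [c["id"] for c in courses if counts.get(c["id"], 0) < c["max_per_user"]]
```

This needs one query per index rather than one per course, but the comparison happens outside Elasticsearch, so paging and sorting on the course search have to be handled in application code as well.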
Now comes the problem:
In a relational database, such a use case can be solved with a table join and checks.
Elasticsearch doesn't support joins across indices, so that approach can't be used here.
Poor solution with current schema:
For each course, check whether it is applicable or not. For N courses, the number of queries to Elasticsearch will be proportional to N.
Solution with current schema:
Within the user-course-completion index (the second index you mentioned), track max_per_user as well, and use a simple query like the one below to get the required course IDs:
{
"size": 10,
"query": {
"script": {
"script": "doc['current_usage'].value < doc['max_per_user'].value && doc['u_id'].value == 1" // 1 is the user_id here
}
}
}
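A variation on the query above, sketched in Python: the user check is moved into a term filter and only the field-to-field comparison stays in the script. The field names current_usage, max_per_user and u_id are the ones used in this answer, not the ones in the question's mapping, so they would need adjusting; the function name remaining_courses_query is hypothetical.

```python
def remaining_courses_query(user_id, size=10):
    """Build a search body for the event index, assuming it also stores max_per_user."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # Exact-match filter on the user.
                    {"term": {"u_id": user_id}},
                    # Compare the two per-document fields in a script.
                    {
                        "script": {
                            "script": {
                                "source": "doc['current_usage'].value < doc['max_per_user'].value"
                            }
                        }
                    },
                ]
            }
        },
    }
```

Keeping the user ID out of the script source means the same script text is reused for every user.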

Parent Child Relation In Elastic Search 7.5

I am new to Elasticsearch and am trying to understand how ES maintains a parent-child relationship. I started with the following article:
https://www.elastic.co/blog/managing-relations-inside-elasticsearch
But the article is based on an old version of ES, and I am currently using ES 7.5, whose documentation states that:
The _parent field has been removed in favour of the join field.
I am now following this documentation page:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/parent-join.html
However, I am not able to get the desired result.
I have a scenario with two indices, "Person" and "Home". Each "Person" can have multiple "Home" documents, which is basically a one-to-many relation. The problem is that when I query for all homes whose parent is the "XYZ" person, the result is empty.
Below are my index structures and search query:
Person Index:
Request URL: http://hostname/person
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Home Index:
Request URL: http://hostname/home
{
"mappings": {
"properties": {
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Adding data in person Index
Request URL: http://hostname/person/_doc/1
{
"name": "shujaat",
"person_home": {
"name": "person"
}
}
Adding data in home index
Request URL: http://hostname/home/_doc/2?routing=1&refresh
{
"state": "ontario",
"person_home": {
"name": "home",
"parent": "1"
}
}
Query to fetch data (all the records whose parent is the person with ID "1"):
Request URL: http://hostname/person/_search
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"name": "shujaat"
}
}
}
}
}
OR
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"_id": "1"
}
}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
I am unable to understand what I am missing here or what is wrong with the above query, as it is not returning any data.
You should put the parent and child documents in the same index:
"The join datatype is a special field that creates parent/child relation within documents of the same index."
So the mapping would look like the following:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Notice that it has both fields from your original person and home indexes.
The rest of your code should work just fine. Try inserting the person and home documents into the same index person_home and use the queries as you posted in the question.
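As an illustration of inserting both document types into the single person_home index, here is a small Python sketch that builds the request path and body for a child (home) document. The helper name child_request is hypothetical; the routing parameter and the person_home join field follow the requests shown in the question.

```python
def child_request(index, doc_id, parent_id, fields):
    """Return (path, body) for indexing a 'home' child under the given parent."""
    body = dict(fields)
    # The join field names the relation and points at the parent document.
    body["person_home"] = {"name": "home", "parent": str(parent_id)}
    # The child is routed by the parent ID so both end up on the same shard.
    path = f"/{index}/_doc/{doc_id}?routing={parent_id}&refresh"
    return path, body
```

For example, child_request("person_home", 2, 1, {"state": "ontario"}) yields the same request as the question's home document, but targeted at the shared index.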
What if person and home objects have overlapping field names?
Let's say both object types have a field called name, but we want to index and query them separately. In this case we can come up with a mapping like this:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"person": {
"properties": {
"name": {
"type": "text"
}
}
},
"home": {
"properties": {
"name": {
"type": "keyword"
},
"state": {
"type": "text"
}
}
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Now, we should change the structure of the objects themselves:
PUT http://hostname/person_home/_doc/1
{
"person": {
"name": "shujaat"
},
"person_home": {
"name": "person"
}
}
PUT http://hostname/person_home/_doc/2?routing=1&refresh
{
"home": {
"name": "primary",
"state": "ontario"
},
"person_home": {
"name": "home",
"parent": "1"
}
}
If you have to migrate old data from the two old indexes into the new merged one, the reindex API may be of use.
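A rough Python sketch of what the reindex requests could look like: one body per old index, each meant to be sent as POST _reindex. The function name reindex_bodies is hypothetical. It assumes the simple merged mapping (fields at the top level) and that document IDs do not clash between the old indexes, since a clashing ID in the destination would be overwritten by default.

```python
def reindex_bodies(source_indices, dest_index):
    """Build one _reindex request body per source index."""
    return [
        {"source": {"index": src}, "dest": {"index": dest_index}}
        for src in source_indices
    ]
```

Calling reindex_bodies(["person", "home"], "person_home") gives two bodies, one copying each old index into person_home.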

Separation of hits returned from elastic by nested field value

I have an index with products. I'm trying to separate the hits returned from Elasticsearch by a nested field value. Here's my shortened index mapping:
{
"mapping": {
"product": {
"properties": {
"id": {
"type": "integer"
},
"model_name": {
"type": "text"
},
"variants": {
"type": "nested",
"properties": {
"attributes": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"product_attribute_id": {
"type": "integer"
},
"value": {
"type": "text"
}
}
},
"id": {
"type": "integer"
},
"product_id": {
"type": "integer"
}
}
}
}
}
}
}
And a product example (there are more variants and attributes in the product; I just cut them off):
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
},
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}
The field in question is the attribute id. Let's say I want to separate products by the size attribute, id 29. It would be perfect if the response looked like:
"hits" : [
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
}
]
},
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}]
I thought about fetching all variants separately in the Elasticsearch request and then grouping them on the application side by that attribute, but I think that's not the most elegant and, above all, not the most efficient way.
What are the Elasticsearch keywords that I should be interested in?
Thank you in advance for your help.
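For reference, the application-side fallback mentioned above can be sketched in a few lines of Python. It takes one product document as returned by Elasticsearch and emits one copy per variant that carries the given attribute id, which reproduces the desired response shape for the example product. The function name split_by_attribute is hypothetical.

```python
def split_by_attribute(product, attribute_id):
    """Return one copy of product per variant that has the given attribute id."""
    hits = []
    for variant in product.get("variants", []):
        attributes = variant.get("attributes", [])
        if any(attr["id"] == attribute_id for attr in attributes):
            # Copy the top-level fields, then attach only this variant.
            hit = {key: value for key, value in product.items() if key != "variants"}
            hit["variants"] = [variant]
            hits.append(hit)
    return hits
```

This is the grouping the question would prefer to avoid, so it is only a baseline to compare a query-side solution against.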

ElasticSearch - Range query on min value in array

Given the following mapping:
"item": {
"properties": {
"name": {
"type": "string",
"index": "standard"
},
"state": {
"type": "string",
"index": "not_analyzed"
},
"important_dates": {
"properties": {
"city_id": {
"type": "integer"
},
"important_date": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
And given the following items in an index:
{
"_id": 1,
"name": "test data 1",
"state": "california",
"important_dates": [
{
"city_id": 100,
"important_date": "2016-01-01T00:00:00"
},
{
"city_id": 200,
"important_date": "2016-05-15T00:00:00"
}
]
},
{
"_id": 2,
"name": "test data 2",
"state": "wisconsin",
"important_dates": [
{
"city_id": 300,
"important_date": "2016-04-10T00:00:00"
},
{
"city_id": 400,
"important_date": "2016-05-20T00:00:00"
}
]
}
Is it possible to do a range filter on important_dates, but only filter using the min date in the important_dates array? Could this also be expanded to only use the date for a specific city if a city_id was given as a parameter?
Example Queries:
If I have a range filter of 4/9/2016 to 5/17/2016 on important_dates, I only want to get back item 2, since the min date in item 1 doesn't fall within the given range.
If I have a range filter of 4/9/2016 to 5/17/2016 on important_dates and pass in city_id 400, I should not get any results.
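To pin down the intended semantics, here is a small Python sketch of the matching rule applied to already-fetched items. It is not an Elasticsearch query, only a reference for what such a query should return; the function name matches is hypothetical, and the range is treated as inclusive on both ends.

```python
from datetime import datetime


def matches(item, start, end, city_id=None):
    """True if the earliest relevant important_date lies within [start, end]."""
    dates = [
        datetime.fromisoformat(entry["important_date"])
        for entry in item["important_dates"]
        # With a city_id, only that city's dates are considered.
        if city_id is None or entry["city_id"] == city_id
    ]
    # No relevant dates means no match; otherwise test only the minimum.
    return bool(dates) and start <= min(dates) <= end
```

Applied to the two items above with the range 4/9/2016 to 5/17/2016, only item 2 matches; with city_id 400, neither item matches.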
