'Associated' data in ElasticSearch - elasticsearch

For an ecommerce platform, we're looking to index products. The default fields are simple: name_en, name_de, name_fr, description. But price and stock depend on another value, the webshop:
Product A, for webshop 1, has price = 1.99, stock = 10, and fits under categories 1, 10, and 50.
Product A, for webshop 2, has price = 5.99, stock = 5, and categories 9, 90, and 500.
I was thinking of nested objects, but is that even an option?
- name_en: Product A
- description_en: Product A description
- webshops: [{
- key: webshop_id
value: 1
- key: price
value: 1.99
- key: stock
value: 10
- key: categories
value: [1, 10, 50]
},{
- key: webshop_id
value: 2
- key: price
value: 5.99
- key: stock
value: 5
- key: categories
value: [9, 90, 500]
}
]
Is it easy to query a structure like this? Can we easily get the entire document, together with the webshop values, where webshop.key.webshop_id.value = 1 or webshop.key.categories.value = 500?
Is my thinking wrong? Any pointers in the right direction?

You can nest as you did, but it will get difficult to update the price or stock of a product in a single webshop, because you'll have to reindex the whole webshops array. There are ways around it, but they're convoluted.
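For reference, here is a rough sketch of what the nested variant could look like. It assumes each webshop entry is stored as a plain object (webshop_id, price, stock, categories) rather than as key/value pairs, that webshops is mapped with type nested, and that a typeless mapping (Elasticsearch 7.x style) is in use. The function name is made up for illustration, and the code only builds request bodies as Python dicts, so it runs without a cluster.

```python
# Assumed mapping: "webshops" is a nested field, one object per webshop.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "name_en": {"type": "text"},
            "description_en": {"type": "text"},
            "webshops": {
                "type": "nested",
                "properties": {
                    "webshop_id": {"type": "integer"},
                    "price": {"type": "float"},
                    "stock": {"type": "integer"},
                    "categories": {"type": "integer"},
                },
            },
        }
    }
}


def nested_webshop_query(webshop_id=None, category=None):
    """Search body matching products where a single nested webshop entry
    satisfies every condition that was passed in."""
    filters = []
    if webshop_id is not None:
        filters.append({"term": {"webshops.webshop_id": webshop_id}})
    if category is not None:
        filters.append({"term": {"webshops.categories": category}})
    return {
        "query": {
            "nested": {
                "path": "webshops",
                "query": {"bool": {"filter": filters}},
                # inner_hits reports which nested entries matched
                "inner_hits": {},
            }
        }
    }
```

nested_webshop_query(webshop_id=1) returns the whole product document, and the inner_hits section of the response tells you which webshop entry matched.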
Instead of having a nested structure, you can also denormalize the webshop part and simply include the price, stock and categories fields in the documents, like this:
Document 1:
- name_en: Product A
- description_en: Product A description
- webshop_id: 1
- price: 1.99
- stock: 10
- categories: [1, 10, 50]
Document 2:
- name_en: Product A
- description_en: Product A description
- webshop_id: 2
- price: 5.99
- stock: 5
- categories: [9, 90, 500]
Then in your queries you can simply add a constraint for webshop_id = 1 or webshop_id = 2 (or both), depending on which webshop you're querying against. It's also much easier to update the price, stock and categories of a product in a specific shop: all you have to do is update the corresponding document.
This means that your product data (name, description, etc.) will be copied once per webshop, but that's usually not a big deal (it's pretty common in the NoSQL world). You just have to update two documents instead of one, and _bulk will help there. At least when you add new webshops you don't need to reindex all your data (!!!), and you can change the prices and stocks in one webshop without interfering with the others.
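A minimal sketch of the denormalized approach, building the documents and request bodies as plain Python dicts. The index name "products" and the document ids "A-1" / "A-2" used in the usage note are assumptions for illustration, not something prescribed above.

```python
import json


def denormalize(product, webshops):
    """Return one flat document per webshop, copying the shared product fields."""
    return [{**product, **shop} for shop in webshops]


def webshop_query(webshop_id, category=None):
    """Search body restricted to one webshop, optionally to one category."""
    filters = [{"term": {"webshop_id": webshop_id}}]
    if category is not None:
        filters.append({"term": {"categories": category}})
    return {"query": {"bool": {"filter": filters}}}


def bulk_update_lines(index, doc_ids, shared_fields):
    """NDJSON body for _bulk applying the same partial update to every copy."""
    lines = []
    for doc_id in doc_ids:
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": shared_fields}))
    return "\n".join(lines) + "\n"


product = {"name_en": "Product A", "description_en": "Product A description"}
webshops = [
    {"webshop_id": 1, "price": 1.99, "stock": 10, "categories": [1, 10, 50]},
    {"webshop_id": 2, "price": 5.99, "stock": 5, "categories": [9, 90, 500]},
]
docs = denormalize(product, webshops)
```

Changing the shared name would then be one _bulk call, e.g. bulk_update_lines("products", ["A-1", "A-2"], {"name_en": "Product A (new)"}), while a price change touches only the document of the affected webshop.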

You can also use the parent/child relationship capability.
You must define two document types: product and webshop.
Note that the _parent mapping shown here applies to older Elasticsearch versions; from 6.0 onward the join field type replaces it.
In the mapping, define the relationship as follows (see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-parent-field.html):
{
"webshop" : {
"_parent" : {
"type" : "product"
}
}
}
This way you can index all the products in the product type, then index the webshop details related to each product in the webshop type.
You can use has_child/has_parent queries and filters to retrieve the webshop details related to a product.
The result is genuinely separate documents that can be queried separately.
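As a hedged illustration of the queries mentioned above, the sketch below builds has_child and has_parent search bodies as Python dicts. The type names product and webshop come from the mapping above; the field names webshop_id and price are assumed to live on the webshop documents, and the function names are invented for this example.

```python
def products_with_webshop(webshop_filter):
    """Parents (products) that have at least one matching webshop child."""
    return {
        "query": {
            "has_child": {
                "type": "webshop",
                "query": webshop_filter,
                # also return the matching webshop documents
                "inner_hits": {},
            }
        }
    }


def webshops_of_product(product_filter):
    """Children (webshop details) whose parent product matches."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "product",
                "query": product_filter,
            }
        }
    }


# Products that cost at most 2.00 in webshop 1.
cheap_in_shop_1 = products_with_webshop(
    {"bool": {"filter": [
        {"term": {"webshop_id": 1}},
        {"range": {"price": {"lte": 2.0}}},
    ]}}
)
```

webshops_of_product({"match": {"name_en": "Product A"}}) goes the other way and returns the per-webshop documents for a product.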

Related

Is there a way to create a runtime field in Elasticsearch that is equal to a 'Value'/'Sum of Value across index'?

I have a task to show the percent of value a set of filtered documents represents vs the entire value represented across a whole year. For example:
[{
name: 'Foo',
value: 12,
year: 2021
},
{
name: 'Bar',
value: 2,
year: 2021
},
{
name: 'Car',
value: 10,
year: 2021
},
{
name: 'Lar',
value: 4,
year: 2022
}]
I'd like to create a runtime field that would equal .5 for 'Foo' (12/(12+2+10)), .42 for 'Car' (10/(12+2+10)) and 1 for 'Lar' (4/4). Is this possible? Is there a better way to achieve this result? The ultimate goal is that if someone creates a query that returns 'Foo' and 'Car' they could sum the runtime field to get .92 (.5+.42) and that such a result could be used in a Kibana Lens visualization.
I've tried creating queries that return the above results, and that is easy enough, but those queries aren't usable inside Kibana which also has global filters to account for. That's why I thought a calculated field that represents the ratio of a document's value in relation to the sum of all documents' values would be useful.
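One possible workaround, offered as an assumption rather than a confirmed answer: a runtime field script is evaluated per document and cannot see a sum over other documents, so the ratio could instead be precomputed before indexing and stored as an ordinary numeric field. The field name share_of_year is hypothetical. The obvious cost is that every document of a year has to be recomputed whenever that year's total changes.

```python
from collections import defaultdict


def add_share_of_year(docs):
    """Return copies of docs with value / (sum of value for the same year)."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["year"]] += doc["value"]
    result = []
    for doc in docs:
        total = totals[doc["year"]]
        # Guard against a year whose values sum to zero.
        share = doc["value"] / total if total else 0.0
        result.append({**doc, "share_of_year": share})
    return result


docs = [
    {"name": "Foo", "value": 12, "year": 2021},
    {"name": "Bar", "value": 2, "year": 2021},
    {"name": "Car", "value": 10, "year": 2021},
    {"name": "Lar", "value": 4, "year": 2022},
]
enriched = add_share_of_year(docs)
```

Because the stored field is a plain number, a sum over any filtered subset (for example 'Foo' and 'Car') gives that subset's share of the year.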

Adding a New field too slow

I want to add a new field to my index, which contains more than 20m documents. I have a dictionary that follows this template: [Catalog_id : {Keyword: Sold Count}]
Sold Counts = {1234: {Apple:50}, 3242: {Banana:20}, 3423: {Apple:23}, ...}
In the index, there are many documents which share the same catalog_id. According to each document's catalog_id, I want to add a new field.
_id: 12323423423, catalog_id: 1234, name: '....', **Sold Count: [Apple,50]**
What is the best way to insert a new field in this situation?
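One option to consider is the _update_by_query API, so that documents are updated on the server side per catalog_id instead of one by one. The sketch below only builds the request bodies. It assumes catalog_id is indexed as an exact-match (numeric or keyword) field, and the target field name sold_count and its shape (a list of keyword/count objects) are hypothetical choices. Each body would be sent as POST <index>/_update_by_query.

```python
def update_requests(sold_counts):
    """Yield (catalog_id, body) pairs: one _update_by_query body per catalog_id."""
    for catalog_id, counts in sold_counts.items():
        sold = [{"keyword": k, "count": c} for k, c in counts.items()]
        yield catalog_id, {
            # Only touch documents that share this catalog_id.
            "query": {"term": {"catalog_id": catalog_id}},
            "script": {
                "lang": "painless",
                "source": "ctx._source.sold_count = params.sold_count",
                "params": {"sold_count": sold},
            },
        }


sold_counts = {1234: {"Apple": 50}, 3242: {"Banana": 20}, 3423: {"Apple": 23}}
requests = dict(update_requests(sold_counts))
```

Passing the value through params keeps the script source identical across requests. Whether this is fast enough for 20m documents depends on the cluster and on how many distinct catalog_ids there are.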

Q: Structuring elasticsearch indexes for query optimization

So I'm working on setting up Elasticsearch/OpenSearch in order to build analytics dashboards.
The data that I have is:
Product > Date > Customer > Variables:Data (e.g, revenue: 100)
{
"_id": X,
"_type": ["Date"],
"_index": ["Product A"],
"_CustomerXYZ": {
"revenue": 100,
"name": ["ABC Inc."],
"usage": 200
}
}
I was thinking of setting up an index for each product, with a document for each date and, inside it, a JSON map for each customer that holds each of the variables.
I essentially want to be able to easily query and graph customer variables over time for a particular product. E.g., for product A over the last 90 days, plot the revenue of customer B.
As I will have millions of customers, 2+ years of data and multiple products, I'm looking at hundreds of millions if not billions of records. What is the best way to set up my Elasticsearch cluster to ensure scalability and sub-second latencies?
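For comparison with the per-date map described above, here is a sketch of a flatter alternative: one document per (product, date, customer), queried with filters plus a date_histogram aggregation. This is an assumption about a workable layout, not a tuned design. The field names product, customer and date are hypothetical, with product and customer assumed to be keyword fields and date a date field.

```python
def flatten(product, date, customers):
    """One document per (product, date, customer) instead of one map per date."""
    return [
        {"product": product, "date": date, "customer": name, **variables}
        for name, variables in customers.items()
    ]


def revenue_over_time(product, customer, days=90):
    """Search body: daily revenue for one customer of one product."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"bool": {"filter": [
            {"term": {"product": product}},
            {"term": {"customer": customer}},
            {"range": {"date": {"gte": f"now-{days}d/d"}}},
        ]}},
        "aggs": {"per_day": {
            "date_histogram": {"field": "date", "calendar_interval": "day"},
            "aggs": {"revenue": {"sum": {"field": "revenue"}}},
        }},
    }


docs = flatten(
    "Product A",
    "2021-06-01",
    {"CustomerXYZ": {"revenue": 100, "name": ["ABC Inc."], "usage": 200}},
)
```

With this layout the customer becomes a field value rather than a field name, so the mapping does not grow with the number of customers.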

Filtering products index in elasticsearch by user

I have an index of products. They have regular fields such as id, name, brand etc. Querying this index is working great, however I want to limit the products which are returned for specific users.
Say I have 5 products whose IDs go from 1 to 5:
Id: 1, name: “Product One”, brand: “Fake Brand”
Id: 2, name: “Product Two”, brand: “Fake Brand”
Id: 3, name: “Product Three”, brand: “Fake Brand”
Id: 4, name: “Product Four”, brand: “Fake Brand”
Id: 5, name: “Product Five”, brand: “Fake Brand”
If there’s no filter, and I search for brand: “Fake Brand”, I get 5 results.
But I want to add this functionality: Say I have two users. User 1 is only able to “see” product IDs 1, 2, and 5. And User 2 is only able to “see” a different subset, say, product IDs 1, 2, 4 and 5.
So if user 1 searches for brand: “Fake Brand”, he only gets back products with IDs 1, 2, and 5, whereas if user 2 searches for brand: “Fake Brand”, he only gets back products with IDs 1, 2, 4 and 5.
Is there a way to add a “user id” to this products query and then store somewhere else what products a user is able to see?
In SQL I would probably have a different table storing what products each user can see and then just do a join. But using ES I think I either have to have two separate indexes or to use nested or has_child/has_parent queries but I’m not entirely sure how to implement it.
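One way to get close to the SQL join described above is a terms lookup: the allowed product IDs are stored in a separate index, one document per user, and the terms query fetches them at search time. A minimal sketch follows. The index name user_products, the field product_ids, and using the user id as the document id are all assumptions made for this example.

```python
def visible_products_query(user_id, brand, permissions_index="user_products"):
    """Search body: products of a brand, limited to what the user may see."""
    return {
        "query": {"bool": {
            "must": [{"match": {"brand": brand}}],
            "filter": [{"terms": {"id": {
                # Terms lookup: read the allowed ids from another document.
                "index": permissions_index,
                "id": str(user_id),
                "path": "product_ids",
            }}}],
        }}
    }


# Assumed permission documents, e.g. stored at user_products/_doc/1 and /2.
permissions = {
    "1": {"product_ids": [1, 2, 5]},
    "2": {"product_ids": [1, 2, 4, 5]},
}
query_user_1 = visible_products_query(1, "Fake Brand")
```

Changing what a user can see then means updating one small document in the permissions index, without touching the products index.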

Fetch a particular number of documents satisfying multiple conditions - Elasticsearch

I have an Elasticsearch index holding information about fruits, as below:
GET fruits/fruits_data/_search
[{ id: 1,
name: apple},
{ id: 2,
name: mango},
{ id: 3,
name: apple},
{ id: 4,
name: banana},
{ id: 5,
name: apple},
{ id: 6,
name: mango},
{ id: 7,
name: pineapple},
{ id: 8,
name: jackfruit}]
Now I need to fetch 7 fruits as per the priority (below):
{"apple": 3, "banana": 3, "mango": 2, "guava": 2, "pineapple": 1, "jackfruit": 1}
Here the key indicates the fruit to be fetched and the value indicates the maximum number of documents to be fetched for it.
This means I need to fetch at most 3 apples, 3 bananas and 1 mango, and I can ignore the rest of the priority hash once I have the required number of fruits. But since the index contains only 1 banana, I need to fetch at most 3 apples, 1 banana, 2 mangoes and 1 pineapple (guava is not present in the index, so it is ignored).
Is there a way to fetch fruits like this in ES in a single query? I don't want to use multiple queries.
Thanks
It is not possible to fetch these results directly. Try using aggregations in Elasticsearch; you can refer to the link below:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
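To make the aggregation suggestion concrete, here is one possible sketch: a single search request with one filter aggregation per fruit, each capped by a top_hits sub-aggregation, followed by a small client-side step that walks the priority order until the required total is reached. It assumes name is an exact-match (keyword) field; the function names are invented for this example, and the final trimming happens in the client, not in Elasticsearch.

```python
def build_request(priority):
    """One search body: a filter aggregation per fruit, capped by top_hits."""
    return {
        "size": 0,
        "aggs": {
            fruit: {
                "filter": {"term": {"name": fruit}},
                "aggs": {"docs": {"top_hits": {"size": max_count}}},
            }
            for fruit, max_count in priority.items()
        },
    }


def pick(priority, docs_by_fruit, total):
    """Take up to max_count docs per fruit, in priority order, until total."""
    picked = []
    for fruit, max_count in priority.items():
        room = total - len(picked)
        if room <= 0:
            break
        picked.extend(docs_by_fruit.get(fruit, [])[:min(max_count, room)])
    return picked


priority = {"apple": 3, "banana": 3, "mango": 2, "guava": 2,
            "pineapple": 1, "jackfruit": 1}
# Document ids per fruit, as they would come back from the top_hits buckets.
docs_by_fruit = {"apple": [1, 3, 5], "mango": [2, 6], "banana": [4],
                 "pineapple": [7], "jackfruit": [8]}
selected = pick(priority, docs_by_fruit, total=7)
```

In the real response, the documents for each fruit sit under aggregations.<fruit>.docs.hits.hits. With the sample index from the question, the selection is 3 apples, 1 banana, 2 mangoes and 1 pineapple.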

Resources