Adding a new field is too slow - Elasticsearch

I want to add a new field to my index, which contains more than 20 million documents. I have a dictionary that follows this template: {catalog_id: {keyword: sold count}}
Sold Counts = {1234: {Apple:50}, 3242: {Banana:20}, 3423: {Apple:23}, ...}
In the index, there are many documents which share the same catalog_id. According to each document's catalog_id, I want to add a new field.
_id: 12323423423, catalog_id: 1234, name: '....', **Sold Count: [Apple,50]**
What is the best way to insert a new field in this situation?
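One possible approach, sketched under assumptions: build one update-by-query request body per catalog_id, so that each request updates every matching document on the server instead of needing one round trip per document. The function name `build_update_requests` and the target field name `sold_count` are hypothetical; each yielded body would be sent to the index's `_update_by_query` endpoint by whatever client you already use.

```python
from typing import Dict, Iterator


def build_update_requests(sold_counts: Dict[int, Dict[str, int]]) -> Iterator[dict]:
    """Yield one _update_by_query body per catalog_id.

    Assumes each catalog_id maps to exactly one {keyword: count} pair,
    as in the example dictionary from the question.
    """
    for catalog_id, counts in sold_counts.items():
        keyword, count = next(iter(counts.items()))
        yield {
            # Match every document that shares this catalog_id.
            "query": {"term": {"catalog_id": catalog_id}},
            # Painless script that sets the new field from the parameters.
            "script": {
                "lang": "painless",
                "source": "ctx._source.sold_count = params.sold_count",
                "params": {"sold_count": [keyword, count]},
            },
        }


sold_counts = {1234: {"Apple": 50}, 3242: {"Banana": 20}, 3423: {"Apple": 23}}
requests = list(build_update_requests(sold_counts))
```

Passing the value through `params` keeps the script source identical across requests, so only the parameters change from one catalog_id to the next.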

Related

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10000 documents, and I need to calculate an aggregated sum for each group of those documents, and then use the sum to enrich the same index....
Let me try to explain. My case can be simplified to the one as below:
My elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update documents with an sku field. If timestamp1 is between [date1:date2] and the total hours_spent for the identity_id is < a_limit, I need to enrich the document with the additional field sku=A; otherwise with the field sku=B.
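The rule described above can be sketched in plain Python on documents that have already been fetched. This is only an illustration of the logic, not an Elasticsearch feature: the function name `assign_sku` and the `timestamp` key are assumptions, and the total is taken over all documents passed in, which may not match the intended grouping window.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def assign_sku(docs: List[dict], date1: datetime, date2: datetime,
               a_limit: float) -> List[dict]:
    """Return copies of docs with an added 'sku' field."""
    # First pass: total hours_spent per identity_id.
    totals: Dict[str, float] = defaultdict(float)
    for doc in docs:
        totals[doc["identity_id"]] += doc["hours_spent"]

    # Second pass: apply the rule to each document.
    enriched = []
    for doc in docs:
        in_window = date1 <= doc["timestamp"] <= date2
        under_limit = totals[doc["identity_id"]] < a_limit
        enriched.append({**doc, "sku": "A" if in_window and under_limit else "B"})
    return enriched


docs = [
    {"timestamp": datetime(2024, 1, 1, 10), "identity_id": "u1", "hours_spent": 2},
    {"timestamp": datetime(2024, 1, 1, 11), "identity_id": "u1", "hours_spent": 3},
    {"timestamp": datetime(2024, 1, 1, 12), "identity_id": "u2", "hours_spent": 9},
]
result = assign_sku(docs, datetime(2024, 1, 1), datetime(2024, 1, 2), a_limit=8)
```

In a real deployment the per-identity totals would more likely come from an aggregation on the index, with the resulting sku values written back in a separate update step.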

Update the value of a field in index based on its value in another index

There's an index_A that contains say about 10K docs. It has many fields like field_1, field_2, ...field_n and one of the fields is product_name.
Then there's another index_B that contains about 10 docs only and is a master catalogue sort of index. It has 2 fields: product_name and product_description.
e.g.
{
"product_name" : "EES",
"product_desc" : "Elastic Enterprise Search"
}
{
"product_name" : "EO",
"product_desc" : "Elastic Observability"
}
index_A contains many fields, one of which is product_name. index_A does not have the field product_desc.
I want to insert product_desc field into each document in index_A such that the value of product_name in index_A matches value of product_name in index_B.
i.e. something like set index_A.prod_desc = index_B.prod_desc where index_A.prod_name = index_B.prod_name
How can I achieve that?
Elasticsearch cannot do joins like that.
The best approach would be to do this during indexing, using something like an ingest pipeline, Logstash, or some other piece of code that pulls the description into the product document.
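As a minimal sketch of the "some other piece of code" option, the lookup can be done client-side before indexing: load the small catalogue into a dictionary and copy the description into each product document. The function name `add_descriptions` is hypothetical, and the documents are plain dictionaries standing in for what would be read from index_B and written to index_A.

```python
from typing import Dict, Iterable, Iterator


def add_descriptions(docs_a: Iterable[dict], catalogue_b: Iterable[dict]) -> Iterator[dict]:
    """Yield copies of docs_a with product_desc filled in from catalogue_b."""
    # Build the lookup once; the catalogue is small (about 10 docs).
    desc_by_name: Dict[str, str] = {
        entry["product_name"]: entry["product_desc"] for entry in catalogue_b
    }
    for doc in docs_a:
        enriched = dict(doc)
        desc = desc_by_name.get(doc.get("product_name"))
        if desc is not None:
            # Only add the field when the catalogue has a matching name.
            enriched["product_desc"] = desc
        yield enriched


catalogue = [
    {"product_name": "EES", "product_desc": "Elastic Enterprise Search"},
    {"product_name": "EO", "product_desc": "Elastic Observability"},
]
products = [{"product_name": "EO", "field_1": 1}, {"product_name": "XX", "field_1": 2}]
enriched_products = list(add_descriptions(products, catalogue))
```

Documents whose product_name has no match in the catalogue are passed through unchanged, which is a design choice of this sketch rather than something the original answer specifies.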

Index DynamoDB streams to elastic search

I have a requirement to implement the following entities in a DynamoDB table.
I have stored these entities in DynamoDB as below.
Partition Key : PROJ#ProjectId:CountryId
Sort Key : Project Name
Company : company data as JSON document
Since this is a one-to-many relationship, N projects of the same company will create N project records, and the same company details will be stored in each record's Company attribute. The reason for doing this is that the most critical data access point is via ProjectId and CountryId. (Assume that I can't change this DB design.)
I have a requirement to implement search functionality that supports filtering the table by company name, address, project name, country, etc. (using a single filter or any combination of these filters). I'm using DynamoDB Streams to feed an Elasticsearch cluster, applying every creation, deletion, or update there, and I use the Elasticsearch API to query the data.
But I need to index this data in the following format, so that the details I receive from Elasticsearch are not duplicated:
{
"id" : 1,
"name" : "ABC",
"description" : "description",
"address" : "address",
"projects" : [
{
"id" : 10,
"name" : "project 1",
"countryId" : 10
},
{
"id" : 20,
"name" : "project 1",
"countryId" : 10
}
]
}
At record creation time, since Project records are created as single records, is there any recommended or standard way to grab all the Project records of a Company, create the above JSON document, and index it in Elasticsearch?
This is how I would approach it:
In Elasticsearch, the document ID will be the companyID.
You can create a Lambda that is triggered by the change stream and uses Elasticsearch's update by query to find the document, with a Painless script to update the projects section of the document. This will work for less frequent changes.
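A rough sketch of what the Lambda could send is shown below. Note that it swaps in the single-document update API with an `upsert` instead of update by query, on the assumption that the document ID is the company ID and is therefore known when the stream record arrives. The function name `build_project_upsert` and the script constant are hypothetical.

```python
# Painless script: replace any existing project with the same id, then append.
UPSERT_PROJECT_SCRIPT = (
    "if (ctx._source.projects == null) { ctx._source.projects = []; } "
    "ctx._source.projects.removeIf(p -> p.id == params.project.id); "
    "ctx._source.projects.add(params.project);"
)


def build_project_upsert(company: dict, project: dict) -> dict:
    """Build an update body for the document whose ID is the company ID."""
    return {
        "script": {
            "lang": "painless",
            "source": UPSERT_PROJECT_SCRIPT,
            "params": {"project": project},
        },
        # Used only when the company document does not exist yet.
        "upsert": {**company, "projects": [project]},
    }


company = {"id": 1, "name": "ABC", "description": "description", "address": "address"}
project = {"id": 10, "name": "project 1", "countryId": 10}
body = build_project_upsert(company, project)
```

With this shape, the first project record for a company creates the company document, and later records for the same company only touch the projects array.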

Filtering all field values per row

I have a table called 'sample'. Based on which algorithm is used, each sample may have different field (property) names.
I need to be able to retrieve all samples which have field values that contain/match a user filter value.
So for instance, if a sample has the following properties:
example 1: "name", "gender", "state"
and another had properties:
example 2: "name", "gender", "rate"
and there would be thousands of such samples with more variation.
If a user is looking at a table with a set of samples from the second example above ("name", "gender", "rate") and uses the filter "foo", I need to query the table "sample" for all rows where any of the property values contains/matches "foo" (for instance, a value of "foobar").
If they were looking at a set of samples that had the properties that example 1 has ("name", "gender", "state"), then I need to do the same, however, I cannot hard code the properties of either.
In SQL I would get the field names and dynamically build a SQL query string, but with ReQL object dot notation, I am struggling with how to do it.
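The underlying idea, checking every value of a row without naming its properties, can be illustrated in plain Python on in-memory rows. This is not ReQL, only a sketch of the matching logic; the function name `rows_matching` is hypothetical and the match is a case-sensitive substring test.

```python
from typing import Iterable, List


def rows_matching(rows: Iterable[dict], needle: str) -> List[dict]:
    """Return rows where any property value contains needle."""
    return [
        row for row in rows
        # Iterate over values only, so property names are never hard-coded.
        if any(needle in str(value) for value in row.values())
    ]


samples = [
    {"name": "foobar", "gender": "f", "state": "NY"},
    {"name": "alice", "gender": "f", "rate": 3},
    {"name": "bob", "gender": "m", "rate": "foo"},
]
matches = rows_matching(samples, "foo")
```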

Elasticsearch - Unique values in a field of an index

I have an index of the following type:
{
company: {
watchlist: [ {id: 1}, {id: 2}, {id: 1} ]
}
}
The watchlist array in the indexed documents contains duplicate values. I don't want the index to store duplicates, as they increase the size of my index.
I know that I can get unique values with an aggregation, but what I want here is to store only unique values in the index.
I am using elasticsearch-rails, which indexes data according to the JSON returned from the 'as_indexed_json' method. The data for the above index is in a SQL database, which I cannot change. I can only create indexes from that database, so I need some 'uniqueness' constraint on the field 'watchlist'.
Is there a way to do it?
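One option to consider is removing duplicates from the array before the document is serialized for indexing. The sketch below shows that step in Python for illustration only; in the setup described above, the equivalent logic would live in the 'as_indexed_json' method. The function name `unique_watchlist` is hypothetical.

```python
from typing import List


def unique_watchlist(watchlist: List[dict]) -> List[dict]:
    """Drop entries whose id was already seen, keeping first-seen order."""
    seen = set()
    unique = []
    for entry in watchlist:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            unique.append(entry)
    return unique


document = {"company": {"watchlist": unique_watchlist([{"id": 1}, {"id": 2}, {"id": 1}])}}
```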
