Update the value of a field in index based on its value in another index - elasticsearch

There's an index_A that contains say about 10K docs. It has many fields like field_1, field_2, ...field_n and one of the fields is product_name.
Then there's another index_B that contains about 10 docs only and is a master catalogue sort of index. It has 2 fields: product_name and product_description.
e.g
{
"product_name" : "EES",
"product_desc" : "Elastic Enterprise Search"
}
{
"product_name" : "EO",
"product_desc" : "Elastic Observability"
}
index_A contains many fields, from that one of the fields is product_name. index_A does not have the field product_desc
I want to insert product_desc field into each document in index_A such that the value of product_name in index_A matches value of product_name in index_B.
i.e. something like set index_A.prod_desc = index_B.prod_desc where index_A.prod_name = index_B.prod_name
How can I achieve that?

Elasticsearch cannot do joins like that
the best approach would be to do this during indexing, using something like an ingest pipeline, or Logstash, or some other piece of code that pulls the description into the product document

Related

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in the index based on the data from the same index ? Like if the source index has 10000 documents, and I need to calculate aggregated sum from each group of those documents, and then use the sum to enrich same index....
Let me try to explain. My case can be simplified to the one as below:
My elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update documents with SKU field. If the timestamp1 is between [date1:date2] and total amount of hours_spent by indetity_id < a_limit I need to enrich the document with additional field sku=A otherwise with field sku=B.

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. Knowing that the product_id in both of them is the same, in products each products_id is unique but in user they're not as a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library has support for Multi Search API, The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.
The format of the requests is similar to the bulk API, The first line
is header part that includes which index / indices to search on, The second line includes the typical search body requests.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "uesrs", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
Check test case in Multi/SearchTest.php to see how to use.
Basically you want to join two indexes based on a common field as in sql.
What you can do is model you data in the same index using join datatype
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index ,
Make all product documents - parent.
Make all user documents as child
And the use parent-child aggregations and queries
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: make sure of the performance implication of parent-child mapping
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
One more thing you can do is put all the information of the product with every user that buys it.
But this can unnecessarily waste you space and is not a good practice as per data rules are concerned.
But since this is a search engine and elasticsearch suggests that best is to normalise and duplicate data rather that using parent-child.
you can try the following:
1- naming indexes with specific name like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- that's give me the ability using * in the field of indexes to search because it accepts wildcard so i can search across multiple fields like this using kibana Dev Tools
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} },
}
this will search on each index includes -myProjectName.
You can't query two indices with different mappings. Best way to solve your problem is to just do two queries (application-side joins). First query you do the aggregations on the user and the second you get the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.

How to update a document using index alias

I have created an index "index-000001" with primary shards = 5 and replica = 1. And I have created two aliases
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching purposes. When I do a rollover on alias-write when it reaches its maximum capacity, it creates a new "index-000002" and updates aliases as
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update/delete a document existing in index-000001(what if in case all I know is the document id but not in which index the document resides) ?
Thanks
Updating using an index alias is not directly possible, the best solution for this is to use a search query using the document id or a term and get the required index. Using the index you can update your document directly.
GET alias-read/{type}/{doc_id} will get the required Document if doc_id is known.
If doc_id is not known, then find it using a unique id reference
GET alias-read/_search
{
"term" : { "field" : "value" }
}
In both cases, you will get a single document as a response.
Once the document is obtained, you can use the "_index" field to get the required index.
PUT {index_name}/{type}/{id} {
"required_field" : "new_value"
}
to update the document.

Automatically indexing by a field name as desc

i have index type of book story that every week wants to put some books.
in this index i want to have always query by sorting a field name(in this case is "price" ) as desc so it's have some overhead on ES (cause of data volume)
in this service we always shows to user books by maximum to minimum price
is possible to have this feature automatically or manually for sorting document of book type in index always by price as desc and then when to want to query them it's always sorted by price as desc and dont need to give it by:
"sort" : { "price" { "order" : "desc" } }
No, you can not keep your data ordered based on a field. Elasticsearch keeps the data as Lucene segments inside. Take a look here to better understand internal structure of ES: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

How to get the total documents count, containing a specific field, using aggregations?

I am moving from ElasticSearch 1.7 to 2.0. Previously while calculating Term Facets I got the Total Count as well. This will tell in how many documents that field exists. This is how I was doing previously.
TermsFacet termsFacet = (TermsFacet) facet;
termsFacet.getTotalCount();
It worked with Multivalue field as well.
Now in current version for Term Aggregation we don't have anything as Total Count. I am getting DocCount inside Aggregation bucket. But that will not work for muti-valued fields.
Terms termsAggr = (Terms) aggr;
for (Terms.Bucket bucket : termsAggr.getBuckets()) {
String bucketKey = bucket.getKey();
totalCount += bucket.getDocCount();
}
Is there any way I can get Total count of the field from term aggregation.
I don't want to fire exists Filter query. I want result in single query.
I would use the exists query:
https://www.elastic.co/guide/en/elasticsearch/reference/2.x/query-dsl-exists-query.html
For instance to find the documents that contain the field user you can use:
{
"exists" : { "field" : "user" }
}
There is of course also a java API:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-term-level-queries.html#java-query-dsl-exists-query
QueryBuilder qb = existsQuery("name");

Resources