Automatically indexing by a field name as desc - elasticsearch

i have index type of book story that every week wants to put some books.
in this index i want to have always query by sorting a field name(in this case is "price" ) as desc so it's have some overhead on ES (cause of data volume)
in this service we always shows to user books by maximum to minimum price
is possible to have this feature automatically or manually for sorting document of book type in index always by price as desc and then when to want to query them it's always sorted by price as desc and dont need to give it by:
"sort" : { "price" { "order" : "desc" } }

No, you can not keep your data ordered based on a field. Elasticsearch keeps the data as Lucene segments inside. Take a look here to better understand internal structure of ES: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I use sorting as an array of 2 attributes:
First attribute is what client wants to sort by. It can be a simple
sorting object or some sort query with nested filter.
Second
attribute is basically a default sorting by id, needed for sorting
the documents which have identical values for the column customer
wants to sort by.
I checked multiple topics/issues on github and here
on elastic forum to understand how to implement search_after
mechanism for back sorting but it's not working for all the cases I
need.
Please have a look at the image:
Imagine there is a limit = 3, the customer right now is on the 3d page of a table and all the data is sorted by name asc, _id asc
The names are: A, B, C, D, E on the image.
The ids are numeric parts of the Doc word.
When customer wants to go back to the previous page, which is a page #2 on my picture, what I do is pass the following to elastic:
sort: [
{
name: 'desc'
},
{
_id: 'desc'
}
],
search_after: [null, Doc7._id]
As as result, I get only one document, which is Doc6: null on my image. It seems to be logical, because I ask elastic to search by desc after null and id 7 and I have only 1 doc corresponding this..it's Doc6 but it's not what I need.
I can't make up the solution to get the data that I need.
Could anyone help, please?

Elastic Search - Sorting & Filtering on nested Documents

I am working on an E-Commerce application. Catalog Data is being served by Elastic Search.
I have document's for Product which is already indexed in Elastic Search.
Document Looks something like this (Excluded few fields for the purpose of better readability):
{
"title" : "Product Name",
"volume" : "200gm",
"brand" : {
"brand_code" : XXXX,
"brand_name" : "Brand Name"
},
"#timestamp" : "2021-08-26T08:08:11.319Z",
"store" : [
{
"physical_unit" : 0,
"default_price" : 115.0,
"_id" : "1234_111",
"product_code" : "1234",
"warehouse_code" : 111,
"available_unit" : 100
}
],
"category" : {
"category_code" : 987,
"category_name" : "CategoryName",
"category_url_link" : "CategoryName",
"super_category_name" : "SuperCategoryName",
"parent_category_name" : "ParentCategoryName"
}
}
store object in the above document is the one where ES Query will look for price and to decide if item is in stock or Out Of Stock.
I would like to add more child objects to store (Basically data from multiple inventory). This can go up to more than 150 child objects for each product.
Eventually, A product document will look something like this with multiple inventory's data mapped to a particular document.
{
"title" : "Product Name",
"volume" : "200gm",
"brand" : {
"brand_code" : XXXX,
"brand_name" : "Brand Name"
},
"#timestamp" : "2021-08-26T08:08:11.319Z",
"store" : [
{
"physical_unit" : 0,
"default_price" : 115.0,
"_id" : "1234_111",
"product_code" : "1234",
"warehouse_code" : 111,
"available_unit" : 100
},
{
"physical_unit" : 0,
"default_price" : 125.0,
"_id" : "1234_112",
"product_code" : "1234",
"warehouse_code" : 112,
"available_unit" : 100
},
{
"physical_unit" : 0,
"default_price" : 105.0,
"_id" : "1234_113",
"product_code" : "1234",
"warehouse_code" : 113,
"available_unit" : 100
}
Upto N no of stores
],
"category" : {
"category_code" : 987,
"category_name" : "CategoryName",
"category_url_link" : "CategoryName",
"super_category_name" : "SuperCategoryName",
"parent_category_name" : "ParentCategoryName"
}
}
Functional Requirement :
For any product, we should show lowest price across all warehouse.
For EX: If a particular product has 50 store mapped to it, Elastic Search query should look into the nested object and get the value which is lowest in all 50 stores if item is available.
Performance should not be degraded.
Challenges :
If we start storing those many stores for each product, data will go considerably high. Will that be a problem ?
What would be the efficient way to extract the lowest price from nested document?
How would facets work within nested document ? Like if i apply price range filter ES picks up the data which was not showed earlier. (It might pick the data from other store which matches the range)
We are using template to query ES and the Version of the Elastic Search is 6.0.
Thanks in Advance!!
First there are improvements to nested document search in version 7.x that are worth the upgrade.
As for version 6.x, there are a lot of factors there that I could not give you a concrete answer. It also seems you may not be understanding the way that nested documents work, they are not relational.
In particular when you say that each product might have 50 stores mapped to it that sounds like you are implying a relationship, which will not exist with a nested document. However, the values from those 50 stores would be stored within an index nested under the parent document. Having 50 stores under a product or category does not sound concerning.
ElasticSearch has not really talked in terms of facets since the introduction of the aggregation framework. Its not that they dont exist, just not how they are discussed.
So lets try this. ElasticSearch optimizes its search and query through a divide and conquer mechanism. The data is spread across several shards, a configurable number, and each shard is responsible for reviewing its own data. Further, those shards can be distributed across many machines so that there are many cpus and lots of memory for the search. So growing the data doesn't matter if you are willing to grow the cluster, as it is possible to maintain a situation where each machine is doing the same amount of work as it was doing before.
Unlike a relational database, filters search terms allow Elastic to drastically reduce the data that it is looking at and a larger number of filters will improve performance where on a relational database performance declines.
Now back to nested documents. They are stored as a separate index, but instead of mapping the results to the nested doc, the results map to the parent doc id. So you're nested docs arent exactly in the same index as the rest of the document, though they are not truly separate either. But that does mean that the nested documents should have minimal impact the performance of the queries against the parent documents. But if your data size grows beyond the capacity of your current system you will still need to increase its size.
As to how you would query, you would use Elastic aggregations. These will allow you to calculate your "facet" counts and identify the best prices. The Elastic aggregations are very powerful and very fast. There are caveats that are well documented, but in general they will work as you expect.
In version 6.x query string queries cannot access the search criteria in a nested document, and a complex query must be used.
To recap
Functional Requirement :
For any product, we should show lowest price across all warehouse.
For EX: If a particular product has 50 store mapped to it,
ElasticSearch query should look into the nested object and get the
value which is lowest in all 50 stores if item is available.
Yes a nested aggregation will do this.
Performance should not be degraded.
Performance will continue to depend on the ratio of the size of the data to the overall cluster size.
Challenges :
If we start storing those many stores for each product, data will go considerably high. Will that be a problem ?
No this should not be a problem
What would be the efficient way to extract the lowest price from nested document?
Elastic Aggregations
How would facets work within nested document ? Like if i apply price range filter ES picks up the data which was not showed earlier. (It might pick the data from other store which matches the range)
Yes filtering can work with Aggregations very well. The aggregation will be based on the filtered data. In fact you could have an aggregation based on just minimum price, and in the same query then have an aggregation using your price ranges, which will give you the count of documents that have a store within that price range, and you could have a sub aggregation showing the stores under each price range.
We are using template to query ES and the Version of the Elastic Search is 6.0. Thanks in Advance!!
I know nothing about template. The ElasticSearch API is so dead simple I do not know why anyone uses additional tools on top of the API, they just add weight, and increase complexity and make key features not available because the wrapper author did not pass through the feature.

Sorting by product price considering special prices (client, group, country)

we have a shop with a few products (~ 5000).
There are, of course, category overview sites which show all products that are in the current category. A requirement is that all products can be sorted by price (ASC and DESC).
This already works (partially), because the problem is, in our Elasticsearch, we currently only have the "original" price, so any product discounts are not considered and therefore the sorting does not work correctly.
My task is it now to fix that.
But I am already struggling with "how to" persist the "special prices" into Elasticsearch.
The problem is every product can be discounted in general, on a customer level, on a customer group level and on a country level.
So I imagine a structure like this would be a start:
# current
{
"articleNumber": "12345",
...
"price": 9.99,
...
}
# new
{
"articleNumber": "12345",
...
"price": 9.99,
...
"special_prices": [
{
"customer": "123456",
"client_price": 5.99,
"client_group_price": null,
"country_de": null
"country_es": null,
...
},
...
]
}
Following thoughts:
The specials prices could be stored as a nested object inside the product index (but I am not sure how to do the sorting on it later)
Maybe I could create a second index with prices, then I would have two queries, but I guess that would be ok? Because I have to build a whole matrix with every customer we have (also ~5000), with every product with every possible price. But if I would have a second index then I would have to join and maybe the sorting is incorrect then
If possible, I would like to only persist any prices if a product has a special price and if not, I don't want to blow up the index
I tried something with painless to return the special price if one exists for the product and customer, but this gives me this:
...
"script": "if (doc['special_prices.customer'] != null && doc['special_prices.customer'].value == '123456') { return 12.45; } else { return doc['price']; }",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [special_prices.customer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
...
Maybe something like SQL ORDER BY CASE WHEN would be an option?
Any ideas on how I should model and persist the special prices? And how can I achieve the sorting?
Is joining a second index a good idea?
Best regards
The error you see is because special_prices.customer is not indexed as keyword, and instead is a text (which allows full-text search). If you didn't specify mapping explicitly, Elasticsearch most likely created a keyword for you. Just try to replace special_prices.customer with special_prices.customer.keyword in your script.
The idea of using a script for sorting is good, given that you only have 5000 documents. Scripts do not have good performance, but in your case this might not matter.
In general this looks like a tough case, because you need some kind of joining between products and prices, and Elasticsearch is not good at joins. It has got some joining options: nested datatype, join datatype (a.k.a. parent-child), and denormalization. The last one you have already considered - when you put different prices in the original product document.
Unfortunately I can't recommend one over another, because there is no single recipe. I would try with scripts, and if performance is not good enough consider remodelling the data.
Hope that helps!

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. Knowing that the product_id in both of them is the same, in products each products_id is unique but in user they're not as a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library has support for Multi Search API, The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.
The format of the requests is similar to the bulk API, The first line
is header part that includes which index / indices to search on, The second line includes the typical search body requests.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "uesrs", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
Check test case in Multi/SearchTest.php to see how to use.
Basically you want to join two indexes based on a common field as in sql.
What you can do is model you data in the same index using join datatype
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index ,
Make all product documents - parent.
Make all user documents as child
And the use parent-child aggregations and queries
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: make sure of the performance implication of parent-child mapping
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
One more thing you can do is put all the information of the product with every user that buys it.
But this can unnecessarily waste you space and is not a good practice as per data rules are concerned.
But since this is a search engine and elasticsearch suggests that best is to normalise and duplicate data rather that using parent-child.
you can try the following:
1- naming indexes with specific name like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- that's give me the ability using * in the field of indexes to search because it accepts wildcard so i can search across multiple fields like this using kibana Dev Tools
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} },
}
this will search on each index includes -myProjectName.
You can't query two indices with different mappings. Best way to solve your problem is to just do two queries (application-side joins). First query you do the aggregations on the user and the second you get the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.

elastic search get distinct random field values

We have elastic search document that has following fields:
{
"stockId": 1
"sellerId": 100
}
Multiple stockId can be mapped to single sellerId but one stock can only be mapped to a single dealer. There are around 10K stocks mapped to 1K sellers. But each sellerId might have different number of stocks i.e. few might have 100 while others have only 1.
Problem Statement: We want to select 'N' random documents out of all these documents indexed. The condition is that each of these 'N' document should belong to different seller i.e. distinct "sellerId". (We need to give award to these sellers).
What I have tried: I am trying to solve this by elastic query that fetches 'N' random distinct 'sellerId'. (and then elastic query to fetch 1 document of each of these 'N' sellers). One way could be to aggregate on 'sellerId' and then pick random 'N' keys but this is not desirable approach performance wise. Can someone help with better query?
I would rebuild my mapping to create a nested document type, with seller being the parent and stockid being the nested object:
{
"sellerid" : {"type" : "integer" },
"stock_obj" : {
"type" : "nested",
"properties" : {
"stockid" : { "type" : "integer" }
}
}
When you rebuild your index, you would create only one object per seller. Each seller would have all of their stock ids. It seems like there are about 10 stocks per seller, elasticsearch can handle this fine. (If there are thousands of stocks per seller, I would do this differently)
Then, I would do a search for N sellers, sorted randomly, and then as a second sort field, you would sort the stock ids randomly. Not the simplest mapping, but the query is easy and should be fast.
Also, separately, if you're just dealing with ~10k seller/stock data points that are integers, using elasticsearch is probably overkill. It can do what you want, but its main purpose is for searching large amounts of text.

Resources