I'm adding a script_score to my DSL query to boost results based on the price of a source product, something like this:
"functions": [
  {
    "script_score": {
      "script": "if (doc['price'].size() > 0 && doc['price'].value == 1000.0) { return 15; } return 1;"
    }
  }
],
Here "1000.0" is the price of one document that is not part of the response. To achieve this I currently have to query twice: first to get the price of that document, then to build the second query with that price so it boosts the results. This double round trip is causing performance degradation.
There should be some way to do this using Painless scripting to look up the value by document ID, but I haven't been able to get it working. It would be great if someone could help with this.
TIA!
Actually, there is a way. I've tried to reproduce your case below, I hope it's close enough to what you have.
Say you have one index with prices, i.e. the index from which you first fetch the document whose price should be boosted.
PUT prices/_doc/1
{
"price": 1000
}
Then, your main index is the one that contains the documents on which the second query runs. Say it's for products, and we have one product with price 1000 and another with price 500.
PUT products/_doc/q
{
"name": "product q",
"price": 1000
}
PUT products/_doc/2
{
"name": "product 2",
"price": 500
}
Now, the query would look like this. In the function_score query we give a weight of 15 (i.e. the hardcoded value from your script) to documents whose price matches the price of the document with ID 1 in the prices index. The terms lookup will fetch the price of 1000 from that document, and the products having that price will then be boosted with a weight of 15.
GET products/_search
{
"query": {
"function_score": {
"functions": [
{
"weight": 15,
"filter": {
"terms": {
"price": {
"index": "prices",
"id": "1",
"path": "price"
}
}
}
}
]
}
}
}
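For reference, the terms lookup is resolved on the server side: Elasticsearch fetches document 1 from the prices index, reads the values at the path price, and rewrites the filter as if those values had been written inline. Conceptually, the filter above expands to something like:

```json
{
  "terms": {
    "price": [1000]
  }
}
```

So only a single search request is needed from the client, which removes the extra round trip described in the question.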
I hope this solves your problem.
Related
I want to boost all documents from certain countries, let's say UAE and Egypt, by a factor of 500. Note that this factor has to be multiplied, not added, so I can't use bq.
My current solution is to use:
&boost=map(sum(termfreq(countryname,UAE),termfreq(countryname,Egypt)),0,0.1,1,500)
If the document is from UAE or Egypt, termfreq returns a value greater than 0, sum returns a value greater than 0 and map returns a boost value of 500.
However, with this I am having trouble boosting countries whose names contain a space, for example Saudi Arabia.
&boost=map(sum(termfreq(countryname,Saudi Arabia)),0,0.1,1,500)
&boost=map(sum(termfreq(countryname,Saudi+Arabia)),0,0.1,1,500)
&boost=map(sum(termfreq(countryname,Saudi%20Arabia)),0,0.1,1,500)
All the above give errors.
I also tried
&boost=map(sum(termfreq(countryname,Arabia)),0,0.1,1,500)
but that did not boost the documents from Saudi Arabia.
Kindly suggest a solution here. Any help would be appreciated.
Tldr;
It sounds like you need the function_score query.
The function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.
Solution
The following request should help you meet your needs.
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"functions": [
{
"filter": { "terms": { "countryname": ["Saudi Arabia", "UAE"] } },
"script_score": {
"script": {
"source": "500"
}
}
}
],
"score_mode": "multiply"
}
}
}
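Since the script only returns a constant, the same effect can be had more cheaply with a weight function instead of a script_score; a sketch, equivalent under the same score_mode:

```json
GET /_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "terms": { "countryname": ["Saudi Arabia", "UAE"] } },
          "weight": 500
        }
      ],
      "score_mode": "multiply"
    }
  }
}
```

Note that for the terms filter to match multi-word values like "Saudi Arabia" exactly, countryname needs to be a keyword (non-analyzed) field; on an analyzed text field you would be matching individual tokens instead.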
I want to use Elasticsearch to improve performance on product search (duh) in an e-commerce solution. We have a data model where a product can have multiple variants and each variant can have one or more prices (sometime quite a substantial number of prices).
At query time, the user chooses whether to return products or variants, and only one price should be returned (the lowest valid price; each price has a number of fields such as valid from/to and valid customer groups).
My first approach was to denormalize products/variants and keep prices as nested fields, but this was quite slow and I had a few problems sorting (I think on price, but the exact details elude me right now).
My second approach was to denormalize completely, so that every product/variant/price combination is represented as its own document. This approach is much faster (obviously); I can aggregate on productId or variantId and get the lowest price, but the problem is that I cannot sort the aggregates on non-numeric or non-aggregate fields.
Denormalized documents (productId and variantId are keyword fields, price is numeric, validFrom/validTo are dates and the rest is text):
[
{
"productId": "111-222-333",
"variantId": "aaa-bbb-ccc",
"product_title": "Mega-product",
"product_description": "This awesome piece of magic will change your life",
"variant_title": "Green mega-product",
"variant_description": "Behold the awesomeness of the green magic mega-product",
"color": [
"blue",
"green"
],
"brand": "DaBrand",
"validFrom": "2019-06-01T00:00:00Z",
"validTo": null,
"price": 399
},
{
"productId": "111-222-333",
"variantId": "aaa-bbb-ddd",
"product_title": "Mega-product",
"product_description": "This awesome piece of magic will change your life",
"variant_title": "Blue mega-product",
"variant_description": "Behold the awesomeness of the blue magic mega-product",
"color": [
"blue",
"green"
],
"brand": "DaBrand",
"validFrom": "2019-06-01T00:00:00Z",
"validTo": null,
"price": 499
},
{
"productId": "111-222-333",
"variantId": "aaa-bbb-ddd",
"product_title": "Mega-product",
"product_description": "This awesome piece of magic will change your life",
"variant_title": "Blue mega-product",
"variant_description": "Behold the awesomeness of the blue magic mega-product",
"color": [
"blue",
"green"
],
"brand": "DaBrand",
"validFrom": "2019-06-05T00:00:00Z",
"validTo": "2019-06-10T00:00:00Z",
"price": 399
}
]
An example of a working query where I sort on the aggregated price.
{
"size": 1,
"sort": {
"product_name_text_en.keyword": "asc"
},
"query": {
// All the query and filtering
},
"aggs": {
"by_product_id": {
"terms": {
"field": "product_id_string",
"order": {
"min_price": "desc"
}
},
"aggs": {
"min_price": {
"min": {
"field": "price_decimal"
}
}
}
}
}
}
However, using this approach I cannot find a way to sort on document fields. It is possible (I think) on numeric, boolean and date fields using bucket_sort, but I need to be able to sort on, for example, the brand or title field (which are text). If it were possible to order on a top_hits aggregation I would be home free, but unfortunately that's not possible as I understand from the docs (I've also tried it, just to make sure).
Can anyone guide me to a better solution? I don't mind doing the query in two steps, but to make that work for sorting I would likely need a few different "document types", like Product, Variant, ProductPrice and VariantPrice, to use depending on the requested sort order. I'm not that far gone, so remodelling is definitely on the table; I've considered using join fields, but I'm not sure that would be performant.
Since the number of products, variants and prices can be significant - a million products is definitely on the table - I think I would have problems getting IDs from one query (for example filtering on brand and sorting on title) and then feeding them into a get-best-price query.
I figured this out by accident when I was reading the docs for another case. It all became very simple when I found out about Field collapsing. I feel like I should've known about this...
The index has the same model as in my initial question, but the query became much simpler:
{
"size": 10,
"query": {
// filter/match stuff, including filtering valid prices.
},
"collapse": {
"field": "productId",
"inner_hits": {
"name": "least_price",
"collapse": {
"field": "price"
},
"size": 1,
"sort": [
{
"price": "asc"
}
]
}
},
"sort": [
{
"brand.keyword": "asc"
}
]
}
And to return variants instead of products, I just collapse on variantId instead.
The collapsing is based on productId or variantId, and the least_price inner_hits returns the lowest-priced document (sorted ascending by price and picking the first) among the documents matching my criteria. Works like a charm.
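For completeness, the variant-level version only differs in the outer collapse field; a sketch based on the query above (the inner sort already picks the cheapest price doc per variant, so the second-level collapse isn't needed):

```json
{
  "size": 10,
  "query": {
    // filter/match stuff, including filtering valid prices.
  },
  "collapse": {
    "field": "variantId",
    "inner_hits": {
      "name": "least_price",
      "size": 1,
      "sort": [ { "price": "asc" } ]
    }
  },
  "sort": [ { "brand.keyword": "asc" } ]
}
```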
In Elasticsearch I am trying to filter employees with more than 80% attendance in a given date range.
The model is:
{
  "userId": 1,
  "availableDays": ["2019-05-10", "2019-05-11", "2019-05-12", ......, "2019-12-30"]
}
The availability days can span 5 years of data, and I need to fetch all employees with more than 80% availability in the date range "2019-01-01" to "2019-12-30".
I've come up with the solution below, where I've made use of the aggregation queries listed underneath. Note the tree structure of the query, which helps in understanding the parent/sibling aggregations.
Range Query
Terms Aggregation
Cardinality Aggregation on date field
Top Hits Aggregation (to retrieve the document)
Bucket Selector Aggregation
Now, I've simply used the range query first to filter the documents that fall within that range.
For the sake of simplicity, I've used the query below, which returns the list of employees whose attendance is greater than or equal to 80% from 1st Jan 2019 to 10th Jan 2019, i.e. only for 10 days.
Note that I've added comments wherever the query needs to be changed depending on your use case.
Aggregation Query
POST <your_index_name>/_search
{
"size": 0,
"query":{
"range": {
"availabilityDates": {
"gte": "2019-01-01",
"lte": "2019-01-10"
}
}
},
"aggs":{
"student":{
"terms":{
"field":"userId.keyword"
},
"aggs":{
"count_dates_attendance":{
"cardinality":{
"field":"availabilityDates"
}
},
"hits": {
"top_hits": {
"size": 10 <---- Returns only 10 students. Change to see more students
}
},
"myfinal":{
"bucket_selector":{
"buckets_path":{
"attendanceCount":"count_dates_attendance"
},
"script": {
"params": {
"count_days": 10 <----- Change this to 365 if your range is for an entire year
},
"inline": "params.attendanceCount/params.count_days >= 0.8"
}
}
}
}
}
}
}
The only thing you need to do is manually calculate the number of days between the two dates and update count_days accordingly. I've used 10 because that's the range in my query.
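To avoid counting the days by hand, here is a small sketch (plain Python, nothing Elasticsearch-specific) that computes the inclusive day count for a range, to plug into the count_days script parameter:

```python
from datetime import date

def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days in [start, end], both endpoints included."""
    return (end - start).days + 1

# The 10-day range used in the query above.
print(inclusive_days(date(2019, 1, 1), date(2019, 1, 10)))   # 10

# The full range from the question ("2019-01-01" to "2019-12-30").
print(inclusive_days(date(2019, 1, 1), date(2019, 12, 30)))  # 364
```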
Hope this helps!
Let's say I have a simple product index with docs like this:
{
  "product_name": "some_product",
  "category": "some_category",
  "price": "200",
  "sold_times": "5",
  "store": "store1"
}
and I want to get the most expensive products in their category and per store that have been sold fewer than 3 times, and I want them ordered by store, category and price.
I can use two terms aggregations and a top_hits aggregation to get the most expensive products per category and store, but how do I sort and filter these top-hits results? I really need to filter the results after the top_hits aggregation is performed, so a filter query is not the solution. How can I do this? Thx
EDIT:
Long story short - I need elastic equivalent for SQL:
SELECT p.*
FROM products AS p
INNER JOIN (
SELECT max(price) AS price, category, store
FROM products
GROUP BY category, store
) AS max_prices ON p.price = max_prices.price AND p.category = max_prices.category AND p.store = max_prices.store
WHERE p.sold_times < 3;
You could filter the search to only return products sold less than 3 times, then aggregate those by store and category, then finally apply a top hits aggregation to get the most expensive item in the category (for that store). Something like
{
"size": 0,
"query": {
"range": {
"sold_times": {
"lt": 3
}
}
},
"aggs": {
"store": {
"terms": {
"field": "store",
"size": 10
},
"aggs": {
"category": {
"terms": {
"field": "category",
"size": 10
},
"aggs": {
"most_expensive": {
"top_hits": {
"size": 1,
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
Well, after some searching, I have found a "possible" solution. I could use the Bucket Selector aggregation together with a script that makes the top-hits properties accessible for filtering, and a similar approach for sorting using the Bucket Sort aggregation (some info can be found here: How do I filter top_hits metric aggregation result [Elasticsearch]).
But I'm facing another issue with aggregations. Because of the large number of categories I want pagination (like "scroll" or "size and from" in a common search query), but that cannot be done easily with aggregations. There's a Composite Aggregation which could do something similar, but the resulting query would be so complicated that I decided to give up and do the grouping outside of Elasticsearch.
It is sad that there is no easy way to do such a common analytic query in Elasticsearch.
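For the record, the Composite Aggregation alternative mentioned above would look roughly like this (a sketch using the field names from the question): it pages through (store, category) pairs, keeping one top hit per bucket, with the sold_times filter applied up front:

```json
{
  "size": 0,
  "query": { "range": { "sold_times": { "lt": 3 } } },
  "aggs": {
    "store_category": {
      "composite": {
        "size": 100,
        "sources": [
          { "store": { "terms": { "field": "store" } } },
          { "category": { "terms": { "field": "category" } } }
        ]
      },
      "aggs": {
        "most_expensive": {
          "top_hits": {
            "size": 1,
            "sort": [ { "price": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
```

Each response includes an after_key; passing it back as "after" in the composite body fetches the next page of buckets.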
I currently try to prototype a product recommendation system using the Elasticsearch Significant Terms aggregation. So far, I didn't find a good example yet which deals with "flat" JSON structures of sales (here: The itemId) coming from a relational database, such as mine:
Document 1
{
"lineItemId": 1,
"lineNo": 1,
"itemId": 1,
"productId": 1234,
"userId": 4711,
"salesQuantity": 2,
"productPrice": 0.99,
"salesGross": 1.98,
"salesTimestamp": 1234567890
}
Document 2
{
"lineItemId": 1,
"lineNo": 2,
"itemId": 1,
"productId": 1235,
"userId": 4711,
"salesQuantity": 1,
"productPrice": 5.99,
"salesGross": 5.99,
"salesTimestamp": 1234567890
}
I have around 1.5 million of these documents in my Elasticsearch index. A lineItem is part of a sale (identified by itemId), and a sale can consist of one or more lineItems. What I would like to receive are, say, the 5 most uncommonly common products which were bought in conjunction with the sale of one specific productId.
The MovieLens example (https://www.elastic.co/guide/en/elasticsearch/guide/current/_significant_terms_demo.html) deals with data in the structure of
{
"movie": [122,185,231,292,
316,329,355,356,362,364,370,377,420,
466,480,520,539,586,588,589,594,616
],
"user": 1
}
so it's unfortunately not really useful to me. I'd be very glad for an example or a suggestion using my "flat" structures. Thanks a lot in advance.
It sounds like you're trying to build an item-based recommender. Apache Mahout has tools to help with collaborative filtering (formerly the Taste project).
There is also a Taste plugin for Elasticsearch 1.5.x which I believe can work with data like yours to produce item-based recommendations.
(Note: This plugin uses Rivers which were deprecated in Elasticsearch 1.5, so I'd check with the authors about plans to support more recent versions of Elasticsearch before adopting this suggestion.)
Since I don't have the amount of data that you do, try this:
First, get the list of itemIds for the sales that contain the productId you want to find "stuff" for:
{
"query": {
"filtered": {
"filter": {
"term": {
"productId": 1234
}
}
}
},
"fields": [
"itemId"
]
}
Then, using this list, create this query:
GET /sales/sales/_search?search_type=count
{
"query": {
"filtered": {
"filter": {
"terms": {
"itemId": [1,2,3,4,5,6,7,11]
}
}
}
},
"aggs": {
"most_sig": {
"significant_terms": {
"field": "productId",
"size": 0
}
}
}
}
If I understand correctly, you have one doc per order line item. What you want is a single doc per order. The order doc should have an array of productIds (or an array of line-item objects that each include a productId field).
That way, when you query for orders containing product X, the significant_terms aggregation should find that product Y is uncommonly common in those orders.
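A sketch of that reshaping with the field names from the question (the index name and the productIds array field are assumptions):

```json
PUT sales/_doc/1
{
  "itemId": 1,
  "userId": 4711,
  "productIds": [1234, 1235]
}

GET sales/_search
{
  "size": 0,
  "query": { "term": { "productIds": 1234 } },
  "aggs": {
    "bought_together": {
      "significant_terms": { "field": "productIds", "size": 5 }
    }
  }
}
```

The significant_terms buckets will include product 1234 itself, so in practice you'd skip it (or use an exclude) when reading the top results.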