Elasticsearch searching and sorting across 2 models

I have 2 models: Products and Skus, where a Product has one or more Skus, and a Sku belongs to exactly one Product. They have the following columns:
Product: id, title, content, category_id
Sku: id, product_id, price
I'd like to be able to display 48 products per page across various search and sort configurations, but I'm having trouble translating this to Elasticsearch.
For example, it's not clear to me how I would search on title while sorting the relevant results by the lowest-priced Sku of each Product. I've tried a few different things, and the closest I've come has been to index everything as belonging to the Sku, then search like so:
size: '48',
aggs: {
  group_by_product: {
    terms: { field: 'product_id' }
  }
},
filter: {
  and: [{
    bool: {
      must: { range: { price: { gte: 0, lte: 50 } } }
    }
  }, {
    bool: {
      must: { terms: { category_id: [ 1, 2, 3, 4, 5, 6 ] } }
    }
  }]
},
query: {
  fuzzy_like_this: {
    fields: [ 'title', 'content' ],
    like_text: 'Chair',
    fuzziness: 1
  }
}
But this gives 48 matching Skus, many of which belong to the same Product, so my pagination is off if I try to combine them after the search.
What would be the best way to handle this use case?
Update
Trying with the nested method, using the following structure:
{
  size: '48',
  query: {
    bool: {
      should: {
        fuzzy_like_this: {
          fields: [ 'title' ],
          like_text: 'chair',
          fuzziness: 1
        }
      },
      must: {
        nested: {
          path: 'skus',
          query: {
            bool: {
              must: { range: { price: { gte: 0, lte: 100 } } }
            }
          }
        }
      }
    }
  },
  sort: {
    _score: 'asc',
    'skus.price': {
      nested_path: 'skus',
      nested_filter: {
        range: { 'skus.price': { gte: 0, lte: 100 } }
      },
      order: 'asc',
      mode: 'min'
    }
  }
}
This is likely closer, but I'm still not sure how to structure it. The above returns products ordered by price, but it seems to completely disregard the search field.

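Since terms aggregation results can't be paginated, grouping the Skus with an aggregation won't give you clean pages of 48 products. Including the Skus inside each product is the right idea, and depending on what your queries need, I would map them as nested objects so that every hit is a product. Note that in your update the fuzzy query sits in a should clause next to a must clause, which makes it optional (it only affects the score); put the text query and the nested price query in the same must array instead, and sort by the minimum Sku price.
As an example query: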
GET /product/test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "whatever",
              "fuzziness": 1,
              "prefix_length": 3
            }
          }
        },
        {
          "nested": {
            "path": "skus",
            "query": {
              "range": {
                "skus.price": {
                  "gte": 11,
                  "lte": 50
                }
              }
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "skus.price": {
        "nested_path": "skus",
        "order": "asc",
        "mode": "min"
      }
    }
  ]
}
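The same idea can be wrapped in a small request-body builder. This is a sketch, not a tested client: it assumes Elasticsearch 7.x (typeless mappings, and the nested sort option that replaced nested_path/nested_filter in 6.1), a skus field mapped as nested, and the hypothetical names productMapping, buildProductSearch and minPriceInRange. It repeats the price range inside the sort's nested filter so the sort key is the cheapest Sku within the range rather than the cheapest Sku overall, and it uses from/size so that pages are counted in products, not Skus.

```javascript
// Hypothetical mapping: each product document embeds its SKUs as nested objects.
const productMapping = {
  mappings: {
    properties: {
      title: { type: 'text' },
      content: { type: 'text' },
      category_id: { type: 'integer' },
      skus: {
        type: 'nested',
        properties: {
          id: { type: 'keyword' },
          price: { type: 'double' }
        }
      }
    }
  }
};

// Builds one page of products matching `text` that have at least one SKU priced
// in [minPrice, maxPrice], sorted by their cheapest SKU inside that range.
function buildProductSearch({ text, minPrice, maxPrice, page = 0, perPage = 48 }) {
  const priceRange = { range: { 'skus.price': { gte: minPrice, lte: maxPrice } } };
  return {
    from: page * perPage,
    size: perPage,
    query: {
      bool: {
        // Both clauses are required; a text clause in `should` next to a `must`
        // would be optional and only influence the score.
        must: [
          { match: { title: { query: text, fuzziness: 1, prefix_length: 3 } } },
          { nested: { path: 'skus', query: priceRange } }
        ]
      }
    },
    sort: [
      {
        'skus.price': {
          order: 'asc',
          mode: 'min',
          // Only SKUs inside the price range contribute to the sort value.
          nested: { path: 'skus', filter: priceRange }
        }
      }
    ]
  };
}

// Local illustration of the sort key: the lowest price among in-range SKUs.
function minPriceInRange(product, minPrice, maxPrice) {
  const prices = product.skus
    .map((sku) => sku.price)
    .filter((p) => p >= minPrice && p <= maxPrice);
  return prices.length ? Math.min(...prices) : undefined;
}

const body = buildProductSearch({ text: 'chair', minPrice: 11, maxPrice: 50, page: 2 });
console.log(JSON.stringify(body, null, 2));
```

Without the filter inside the sort, a product with a $5 Sku and a $20 Sku would be sorted at $5 even though only the $20 Sku satisfies the price query.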

Related

Elasticsearch - Sort query based on collapse results

I'm trying to group/stack items based on their SKU.
Currently, when sorting from high to low, an item that is sold for both $10 and $1 shows the $1 item first (because it is also sold for $10, it is of course placed at the front of the results). The sorting should only use the lowest_price of that specific SKU.
Is there a way to sort based on the lowest_price of every SKU and return only 1 item per SKU?
If the results from the collapse could be used as a variable for the sorting, this could be solved, but I haven't been able to find out how that works.
My item object looks like this:
{
  itemId: String,
  sku: String,
  price: Number
}
This is my query:
let itemsPerPage = 25;
let searchQuery = {
  from: itemsPerPage * page,
  size: itemsPerPage,
  _source: ['itemId'],
  sort: [{ 'sale.price': 'desc' }],
  query: {
    bool: {
      must: [],
      must_not: []
    }
  },
  collapse: {
    field: 'sku',
    inner_hits: [{
      name: 'lowest_price',
      size: 1,
      _source: ['itemId'],
      sort: [{
        'price': 'asc'
      }]
    }],
  }
};
You need to add the sort next to collapse, at the top level of the request body.
For example:
GET /test/_search
{
  "query": {
    "function_score": {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                {
                  "match" : {
                    "job_status" : "SUCCESS"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "collapse": {
    "field": "run_id.keyword"
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
This may solve your issue.
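Field collapsing keeps only the top hit of each group according to the request's own sort, and the groups are ordered by that same hit. The local simulation below is plain JavaScript, not an Elasticsearch API; the collapse helper and the sample items are hypothetical. It shows why sorting by price ascending gives each SKU its lowest price, while sorting descending picks each SKU's highest price. Note also that the question's top-level sort uses 'sale.price', while the item object only has a price field.

```javascript
// Mimics collapse: walk the hits in sort order and keep the first hit seen
// for each value of `field`.
function collapse(hits, field, compare) {
  const seen = new Set();
  const result = [];
  for (const hit of [...hits].sort(compare)) {
    if (!seen.has(hit[field])) {
      seen.add(hit[field]);
      result.push(hit);
    }
  }
  return result;
}

const byPriceAsc = (a, b) => a.price - b.price;
const byPriceDesc = (a, b) => b.price - a.price;

// Hypothetical sample data: SKU "A" is sold for $10 and $1, SKU "B" for $5.
const items = [
  { itemId: 'a1', sku: 'A', price: 10 },
  { itemId: 'a2', sku: 'A', price: 1 },
  { itemId: 'b1', sku: 'B', price: 5 }
];

const asc = collapse(items, 'sku', byPriceAsc);
const desc = collapse(items, 'sku', byPriceDesc);
console.log(asc.map((h) => `${h.sku}:${h.price}`));  // [ 'A:1', 'B:5' ]
console.log(desc.map((h) => `${h.sku}:${h.price}`)); // [ 'A:10', 'B:5' ]
```

As sketched here, the single top-level sort both chooses each group's representative and orders the groups, so the two cannot be set independently; the inner_hits sort only affects the hits returned inside each group.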

Elasticsearch: Order by date field (descending): gauss or field_value_factor?

I have an issue with modifying a document's score according to its creation date. I have tried the gauss function and field_value_factor.
The first one is (the whole query clause):
@search_definition[:query] = {
  function_score: {
    query: {
      bool: {
        must: [
          {
            query_string: {
              query: <query_term>,
              fields: %w( field_1^2
                          field_2^3
                          ...
                          field_n^2),
              analyze_wildcard: true,
              auto_generate_phrase_queries: false,
              analyzer: 'brazilian',
              default_operator: 'AND'
            }
          }
        ],
        filter: {
          bool: {
            should: [
              { term: { "boolean_field": false } },
              { terms: { "array_field_1": options[:key].ids } },
              { term: { "array_field_2.id": options[:key].id } }
            ]
          }
        }
      }
    },
    gauss: {
      date_field: {
        scale: "1d",
        decay: "0.5"
      }
    }
  }
}
With this configuration, I am telling Elasticsearch that the most recent documents must have a higher score. When I execute the query, the result is exactly the opposite! The oldest documents are returned first. Even if I change the origin to
origin: "2010-05-01 00:00:00"
which is the date of the first document, the oldest ones are still retrieved first. What am I doing wrong?
With field_value_factor, things are better, but still not what I am expecting (the whole query clause is):
@search_definition[:query] = {
  function_score: {
    query: {
      bool: {
        must: [
          {
            query_string: {
              query: <query_term>,
              fields: %w( field_1^2
                          field_2^3
                          ...
                          field_n^2),
              analyze_wildcard: true,
              auto_generate_phrase_queries: false,
              analyzer: 'brazilian',
              default_operator: 'AND'
            }
          }
        ],
        filter: {
          bool: {
            should: [
              { term: { "boolean_field": false } },
              { terms: { "array_field_1": options[:key].ids } },
              { term: { "array_field_2.id": options[:key].id } }
            ]
          }
        }
      }
    },
    field_value_factor: {
      field: "date_field",
      factor: 100,
      modifier: "sqrt"
    }
  }
}
With this other configuration, the documents from 2016 and 2015 are returned first. However, lots of documents from 2016 receive a lower score than others from 2015, even with modifier "sqrt" and factor: 100!
I suppose the gauss function would be the appropriate solution. How can I invert this gauss result? Or how can I increase the field_value_factor so that 2016 comes before 2015?
Thanks a lot,
Guilherme
You might want to try putting the gauss function inside the functions param and giving it a weight, as in the following query. I also think the scale is too low, which could be pushing the score of a lot of documents to (nearly) zero. I have also increased the decay to 0.8 and given a higher weight to recent documents. You could also use the explain API to see how the scoring is done.
{
  "function_score": {
    query: {
      bool: {
        must: [{
          query_string: {
            query: <query_term>,
            fields: %w(field_1^2 field_2^3 ... field_n^2),
            analyze_wildcard: true,
            auto_generate_phrase_queries: false,
            analyzer: 'brazilian',
            default_operator: 'AND'
          }
        }],
        filter: {
          bool: {
            should: [{
              term: {
                "boolean_field": false
              }
            }, {
              terms: {
                "array_field_1": options[:key].ids
              }
            }, {
              term: {
                "array_field_2.id": options[:key].id
              }
            }]
          }
        }
      }
    },
    "functions": [{
      "gauss": {
        "date_field": {
          "origin": "now",
          "scale": "30d",
          "decay": "0.8"
        }
      },
      "weight": 20
    }]
  }
}
Also, the origin should be the latest date (the point that gets the highest score), so rather than origin: "2010-05-01 00:00:00", try
origin: "2016-05-01 00:00:00"
Does this help?
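The effect of scale can be checked with a little arithmetic. The sketch below evaluates the Gaussian decay formula from the function_score documentation locally (no Elasticsearch call; the function names are hypothetical) for the question's 1d/0.5 settings and the answer's 30d/0.8, and also computes how much a sqrt field_value_factor separates early 2016 from early 2015, assuming the date is read as epoch milliseconds.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Gaussian decay: exp(-max(0, |distance| - offset)^2 / (2 * sigma^2)),
// with sigma^2 = -scale^2 / (2 * ln(decay)), so the result equals `decay`
// at a distance of `scale` (plus offset) from the origin.
function gaussDecay(distance, scale, decay, offset = 0) {
  const sigmaSq = -(scale * scale) / (2 * Math.log(decay));
  const d = Math.max(0, Math.abs(distance) - offset);
  return Math.exp(-(d * d) / (2 * sigmaSq));
}

// field_value_factor with modifier "sqrt": sqrt(factor * value).
function fieldValueFactorSqrt(value, factor) {
  return Math.sqrt(factor * value);
}

for (const ageDays of [0, 1, 10, 30, 365]) {
  const tight = gaussDecay(ageDays * DAY_MS, 1 * DAY_MS, 0.5); // scale 1d, decay 0.5
  const wide = gaussDecay(ageDays * DAY_MS, 30 * DAY_MS, 0.8); // scale 30d, decay 0.8
  console.log(`${ageDays}d old: 1d/0.5 -> ${tight.toExponential(2)}, 30d/0.8 -> ${wide.toFixed(3)}`);
}

const jan2015 = Date.UTC(2015, 0, 1);
const jan2016 = Date.UTC(2016, 0, 1);
const ratio = fieldValueFactorSqrt(jan2016, 100) / fieldValueFactorSqrt(jan2015, 100);
console.log(`sqrt boost ratio 2016/2015: ${ratio.toFixed(4)}`);
```

With scale "1d" and decay 0.5 the factor works out to 2^-(days^2), so a ten-day-old document is multiplied by about 1e-30 and, by roughly two weeks of age, the product is likely too small for a 32-bit float score and becomes 0, leaving those documents tied. With 30d/0.8 the factor falls off gradually. The sqrt boost differs by only about 1% between the two dates, which text relevance easily outweighs.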

Elasticsearch custom function score

I am trying to run a search with custom functions that modify the document score.
I have a mapping with specialities stored inside a hospital, and every speciality has a priority:
Something like:
hospital: {
  name: 'Fortis',
  specialities: [
    {
      name: 'Cardiology',
      priority: 10
    },
    {
      name: 'Oncology',
      priority: 15
    }
  ]
}
Now I have a function score:
functions: [{
  filter: { terms: { 'specialities.name' => params[:search_text] } },
  script_score: { script: "_score * doc['specialities.priority'].value" }
},
I have a filter query to match the search text against any speciality.
For example, if I search for Oncology, it matches, and the script_score should then take the priority of that speciality and add it to the final score of the document.
But it takes the priority of the first speciality it encounters, which is 10, plus a score of 1 for the filter match, so the final score is 11 rather than 16 (the priority of Oncology plus 1 for the filter match).
I solved it using nested mapping in elasticsearch.
Lucene has no concept of inner objects, so by default Elasticsearch flattens an array of objects into one list of values per field, and the link between each speciality's name and its priority is lost. To store a priority for every speciality, I needed a nested mapping like this:
hospital: {
  properties: {
    specialities: {
      type: 'nested',
      properties: {
        name: {
          type: 'string'
        },
        priority: {
          type: 'long'
        }
      }
    }
  }
}
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/nested.html
After that, I was able to define function score with nested query and my query looks like this:
"query": {
  "filtered": {
    "query": {
      "bool": {
        "must": [
          {
            "nested": {
              "path": "specialities",
              "query": {
                "function_score": {
                  "score_mode": "sum",
                  "boost_mode": "sum",
                  "filter": {
                    "terms": {
                      "specialities.name.raw": ["Oncology"]
                    }
                  },
                  "functions": [
                    {
                      "field_value_factor": {
                        "field": "specialities.priority"
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
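The flattening that broke the original script can be illustrated locally. The sketch below is plain JavaScript; flattenObjectArray and nestedPriority are hypothetical helpers that only approximate how an object array is indexed with and without a nested mapping. Once the specialities are flattened, name and priority are independent value lists, whereas per-object (nested) evaluation keeps each priority attached to its own name.

```javascript
// Without a nested mapping, an array of objects is indexed as one list of
// values per field, so a name no longer knows which priority belongs to it.
function flattenObjectArray(objects, prefix) {
  const flat = {};
  for (const obj of objects) {
    for (const [key, value] of Object.entries(obj)) {
      const path = `${prefix}.${key}`;
      (flat[path] = flat[path] || []).push(value);
    }
  }
  return flat;
}

const hospital = {
  name: 'Fortis',
  specialities: [
    { name: 'Cardiology', priority: 10 },
    { name: 'Oncology', priority: 15 }
  ]
};

const flat = flattenObjectArray(hospital.specialities, 'specialities');

// Numeric doc values of a multi-valued field are kept sorted, so `.value`
// yields the lowest one here, whichever name matched the filter.
const flatPriority = Math.min(...flat['specialities.priority']);

// With nested documents each speciality is scored on its own, so the
// priority comes from the object whose name matched.
function nestedPriority(doc, searchName) {
  const match = doc.specialities.find((s) => s.name === searchName);
  return match ? match.priority : undefined;
}

console.log(flatPriority, nestedPriority(hospital, 'Oncology')); // 10 15
```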

Elastic search returning wrong results

I am running a query against Elasticsearch, but the results returned are wrong. The idea is that I can check a range of fields with individual queries. But when I pass the following query, events whose lineup doesn't include that act are also returned.
query: {
  bool: {
    must: [
      { match: { "lineup.name": { query: "The 1975" } } }
    ]
  }
}
The objects are events, which look like this:
{
  title: 'Glastonbury',
  country: 'UK',
  lineup: [
    {
      name: 'The 1975',
      genre: 'Indie',
      headliner: false
    }
  ]
},
{
  title: 'Reading',
  country: 'UK',
  lineup: [
    {
      name: 'The Strokes',
      genre: 'Indie',
      headliner: true
    }
  ]
}
In my case both of these events are returned.
The mapping can be seen here:
https://jsonblob.com/567e8f10e4b01190df45bb29
You need to use a match_phrase query. A match query looks for either "The" or "1975", finds "The" in "The Strokes", and therefore returns that event as well.
Try:
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "lineup.name": {
              "query": "The 1975"
            }
          }
        }
      ]
    }
  }
}
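A rough local model of the difference, in plain JavaScript. The analyze function is a hypothetical stand-in that only approximates the standard analyzer (lowercase, split on non-alphanumeric characters, no stop-word removal, which is why "the" is still searchable here).

```javascript
// Approximate standard analyzer: lowercase and split on non-alphanumerics.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// match with the default "or" operator: any query term is enough.
function matchesAnyTerm(query, field) {
  const terms = new Set(analyze(field));
  return analyze(query).some((t) => terms.has(t));
}

// match_phrase: all query terms must appear consecutively and in order.
function matchesPhrase(query, field) {
  const q = analyze(query);
  const f = analyze(field);
  for (let i = 0; i + q.length <= f.length; i++) {
    if (q.every((t, j) => f[i + j] === t)) return true;
  }
  return false;
}

for (const name of ['The 1975', 'The Strokes']) {
  console.log(name, matchesAnyTerm('The 1975', name), matchesPhrase('The 1975', name));
}
// The 1975 true true
// The Strokes true false
```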

How to filter terms aggregation

Currently I have something like this:
aggs: {
  categories: {
    terms: {
      field: 'category'
    }
  }
}
This gives me the number of products in each category. But I have an additional condition: I need the number of products in each category that are not sold yet, so I need to filter the terms somehow.
Is there an elegant way to do this with the aggregation framework, or do I need to write a filtered query?
Thank you
You can combine a terms aggregation with a filter aggregation by nesting the terms aggregation inside the filter. This is how it should look (tested):
aggs: {
  categories: {
    filter: { term: { sold: false } },
    aggs: {
      names: {
        terms: { field: 'category' }
      }
    }
  }
}
You can also add more conditions to the filter. I hope this helps.
Just to add to the other answer, you can also use a nested aggregation. This is similar to what I had to do; I'm using Elasticsearch 5.2.
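Spelled out as a full request body, plus a local check of what it counts. This is a sketch: the boolean sold field (false meaning not yet sold), the aggregation names and the sample products are assumptions for illustration. size: 0 skips the hits, since only the buckets are needed.

```javascript
// The filter aggregation narrows the documents; the terms aggregation inside
// it buckets only the documents that passed the filter.
const body = {
  size: 0,
  aggs: {
    unsold: {
      filter: { term: { sold: false } },
      aggs: {
        categories: { terms: { field: 'category' } }
      }
    }
  }
};

// Local equivalent of the bucket counts the aggregation should return.
function countUnsoldByCategory(products) {
  const counts = {};
  for (const p of products.filter((x) => !x.sold)) {
    counts[p.category] = (counts[p.category] || 0) + 1;
  }
  return counts;
}

const products = [
  { category: 'chairs', sold: false },
  { category: 'chairs', sold: true },
  { category: 'tables', sold: false },
  { category: 'chairs', sold: false }
];
console.log(countUnsoldByCategory(products)); // { chairs: 2, tables: 1 }
console.log(JSON.stringify(body));
```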
From the docs, here is the basic syntax:
"aggregations" : {
  "<aggregation_name>" : {
    "<aggregation_type>" : {
      <aggregation_body>
    }
    [,"aggregations" : { [<sub_aggregation>]+ } ]?
  }
  [,"<aggregation_name_2>" : { ... } ]*
}
This is how I implemented it:
GET <path> core_data/_search
{
  "aggs": {
    "NAME": {
      "nested": {
        "path": "ATTRIBUTES"
      },
      "aggs": {
        "NAME": {
          "filter": {
            "term": {
              "ATTRIBUTES.ATTR_TYPE": "EDUCATION_DEGREE"
            }
          },
          "aggs": {
            "NAME": {
              "terms": {
                "field": "ATTRIBUTES.DESCRIPTION",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}
This filtered the data down to one bucket, which is what I needed.
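A local model of what this nested, filter, terms chain counts (plain JavaScript; the sample documents and the degreeBuckets helper are hypothetical). One detail worth knowing: inside a nested aggregation, doc_count refers to nested objects rather than top-level documents, so a document with two matching attributes contributes two; the reverse_nested aggregation exists for counting parent documents instead.

```javascript
// Hypothetical documents whose ATTRIBUTES array is mapped as nested.
const docs = [
  { id: 1, ATTRIBUTES: [
    { ATTR_TYPE: 'EDUCATION_DEGREE', DESCRIPTION: 'BSc' },
    { ATTR_TYPE: 'EDUCATION_DEGREE', DESCRIPTION: 'MSc' },
    { ATTR_TYPE: 'LANGUAGE', DESCRIPTION: 'English' }
  ] },
  { id: 2, ATTRIBUTES: [
    { ATTR_TYPE: 'EDUCATION_DEGREE', DESCRIPTION: 'BSc' }
  ] }
];

// nested -> filter -> terms: bucket the DESCRIPTION of matching nested objects,
// ordered by descending count (ties are not handled in this sketch).
function degreeBuckets(documents) {
  const counts = new Map();
  for (const doc of documents) {
    for (const attr of doc.ATTRIBUTES) {
      if (attr.ATTR_TYPE !== 'EDUCATION_DEGREE') continue;
      counts.set(attr.DESCRIPTION, (counts.get(attr.DESCRIPTION) || 0) + 1);
    }
  }
  return [...counts]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count);
}

console.log(degreeBuckets(docs));

// The filter bucket counts matching nested objects (3 here), while the number
// of top-level documents involved is 2.
const matchingAttrs = docs.flatMap((d) => d.ATTRIBUTES.filter((a) => a.ATTR_TYPE === 'EDUCATION_DEGREE'));
const parentDocs = docs.filter((d) => d.ATTRIBUTES.some((a) => a.ATTR_TYPE === 'EDUCATION_DEGREE'));
console.log(matchingAttrs.length, parentDocs.length); // 3 2
```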
