ElasticSearch: get the bucket key inside bucket scripted_metric

I am trying to run this query in Elasticsearch: a custom scripted_metric aggregation on my buckets. Within the metric script, I want access to the bucket key that it is aggregated on.
My documents in ES look like this:
{
  user_id: 5,
  data: {
    5: 200,
    8: 300
  }
},
{
  user_id: 8,
  data: {
    5: 889,
    8: 22
  }
}
My aggregation query looks like this:
aggs = {
  approvers: {
    terms: {
      field: 'user_id'
    },
    aggs: {
      new_metric: {
        scripted_metric: {
          map_script: `
            // IS IT POSSIBLE TO GET THE BUCKET KEY HERE?
            // The bucket key here would be the user_id
            // so i can do stuff like
            doc['data'][**_term**]....
          `
        }
      }
    }
  }
}

I did some digging and probably hit the same difficulty you did in finding a way to retrieve parent values. The only thing I could find concerned a special "_count" value on the child agg, but nothing related to the parent bucket names/keys.
If using a scripted_metric child agg is not a strict requirement, I found a way that at least gives you access to the bucket key within the parent terms agg. Maybe this can get you started in the direction of a solution:
aggs = {
  approvers: {
    terms: {
      field: 'user_id',
      script: '"There seems to be a magic value here: " + _value'
    }
  }
}
Sample adapted from this
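Another angle, offered as a hedged sketch rather than a confirmed solution: every document that reaches the map_script already belongs to the current bucket, so for a single-valued user_id field the script can read the key back from the document itself via doc['user_id'].value. The helper name buildApproversAggs, the running total kept in state.total, and the field names data.5 / data.8 (which assume those sub-fields are mapped as numeric fields with doc values) are all assumptions, and the Painless below has not been run against a cluster.

```javascript
// Hypothetical helper: builds the aggregation body from the question.
// Assumption: `keyField` is single-valued, so the value read from the
// document equals the key of the terms bucket the document fell into.
function buildApproversAggs(keyField) {
  return {
    approvers: {
      terms: { field: keyField },
      aggs: {
        new_metric: {
          scripted_metric: {
            init_script: 'state.total = 0L',
            map_script: `
              // Recover the bucket key from the document itself.
              def key = doc['${keyField}'].value;
              // Assumed mapping: data.5, data.8, ... are numeric sub-fields.
              def field = 'data.' + key;
              if (doc.containsKey(field) && doc[field].size() > 0) {
                state.total += doc[field].value;
              }
            `,
            combine_script: 'return state.total',
            reduce_script: 'long sum = 0; for (t in states) { sum += t } return sum'
          }
        }
      }
    }
  };
}

const aggs = buildApproversAggs('user_id');
console.log(JSON.stringify(aggs, null, 2));
```

If user_id were multi-valued, a document could land in several buckets and doc['user_id'].value would no longer identify the bucket, so this only fits the single-valued case shown in the question.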

Related

Can't sort buckets based on specific fields of complex key

New to OpenSearch and couldn't really find an answer that worked for this use case. Essentially, my query uses scripts to access document field values within a multi_terms aggregation, then aggregates them into buckets reflecting certain metrics. The bucket key is an array of strings in the format ['val1', 'val2', 'val3'], with an associated key_as_string of 'val1|val2|val3'.
My goal is to be able to sort these buckets after aggregation based on any of these 3 values. The problem is that I can't seem to get sorting to work outside of a root "order" entry that sorts by the entire key (I think). The query is here:
aggregations: {
  plans: {
    multi_terms: {
      size: 10000,
      terms: [
        {
          script: "doc['plan.title.keyword'].value"
        },
        {
          script: "doc['plan.type.keyword'].value"
        },
        {
          script: "doc['plan.id.keyword'].value"
        }
      ],
      order: { _key: order } // This orders buckets by entire key?
    },
    aggregations: {
      completed: {
        filter: {
          term: { 'status.keyword': 'Completed' }
        }
      },
      in_progress: {
        filter: {
          term: { 'status.keyword': 'Started' }
        }
      },
      stopped: {
        filter: {
          term: { 'status.keyword': 'Stopped' }
        }
      },
      assigned: {
        filter: {
          term: { 'status.keyword': 'Assigned' }
        }
      },
      my_bucket: {
        bucket_sort: {
          sort: [{_key: {order: 'asc'}}] // Breaks sort
        }
      }
    }
  }
},
The output of the query is correct, but the order of buckets output is not and I can't seem to get it right. I've attempted various ways of implementing bucket_sort to no avail. Feels like there is an easy solution to this and I'm just not finding it. My end goal is to be able to sort the buckets returned by a specified index of the key.
Can anyone tell me what I'm doing wrong here?
Note: Using OpenSearch v2.3
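One possible fallback, sketched here under assumptions and not as a server-side fix: since the bucket key comes back as an array, the buckets can be re-sorted in the client by any position of that array. The helper name sortBucketsByKeyPart and the sample buckets are made up for illustration; only the key / key_as_string shape is taken from the description above.

```javascript
// Hypothetical helper: sorts multi_terms buckets by one element of their
// array key. `index` selects the key part; `order` is 'asc' or 'desc'.
function sortBucketsByKeyPart(buckets, index, order = 'asc') {
  const direction = order === 'desc' ? -1 : 1;
  // Copy first so the response object itself is left untouched.
  return [...buckets].sort((a, b) => {
    const x = String(a.key[index]);
    const y = String(b.key[index]);
    return direction * (x < y ? -1 : x > y ? 1 : 0);
  });
}

// Made-up buckets in the shape described above.
const buckets = [
  { key: ['Plan B', 'basic', 'id-2'], key_as_string: 'Plan B|basic|id-2', doc_count: 3 },
  { key: ['Plan A', 'premium', 'id-3'], key_as_string: 'Plan A|premium|id-3', doc_count: 5 },
  { key: ['Plan C', 'advanced', 'id-1'], key_as_string: 'Plan C|advanced|id-1', doc_count: 1 }
];

// Sort by the third key part (plan id), ascending.
const sortedIds = sortBucketsByKeyPart(buckets, 2).map((b) => b.key[2]);
console.log(sortedIds); // [ 'id-1', 'id-2', 'id-3' ]
```

This only reorders the buckets that were actually returned, so it matches a server-side sort only when size is large enough to include every bucket.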

$elemMatch with $in SpringData Mongo Query

I am trying to write a method that composes a query using Spring Data, and I have a couple of questions. I want to query on top-level attributes of a document (i.e. the id field) as well as attributes of a subarray.
To do so I am using a query similar to this:
db.getCollection("journeys").find({ "_id._id": "0104", "journeyDates": { $elemMatch: { "period": { $in: [ 1,2 ] } } } })
As you can see, I would also like to filter on the values of the subarray using $in. Running the above query, though, returns wrong results, as if the $elemMatch were ignored completely.
Running a similar but slightly different query like this:
db.getCollection("journeys").find({ "_id._id": { $in: [ "0104" ] } }, { journeyDates: { $elemMatch: { period: { $in: [ 1, 2 ] } } } })
does seem to yield better results, but it returns only the first element of the subarray that matches the $in filter.
Now my question is: how can I query on both top-level attributes and subarrays using $in? Preferably I would like to avoid aggregations. Secondly, how can I translate this native Mongo query into a Spring Data Query object?

Unable to sort aggregation bucket results in elasticsearch

I am trying to execute a query in Elasticsearch to get a list of products with the largest sales change percentage. The aggregation results should be grouped by productId and sorted by salesChangePercent. I have searched around and tried solutions such as sorting Elasticsearch top hits results, but I am not able to sort the aggregation buckets by salesChangePercent. The following query is the only one that works; however, it does not seem right to me, as I am using "max_salesChangePercent" to do the sorting.
Am I doing something wrong here? Is there a better or cleaner way to get the aggregation buckets sorted? Really appreciate any help I can get to improve the query.
GET product_sales/_search
{
  "size": 0,
  "query": {
    "range": {
      "salesChangePercent": { "gte": 50 }
    }
  },
  "aggs": {
    "unique_products": {
      "terms": {
        "field": "productId",
        "order": {
          "max_salesChangePercent": "desc"
        }
      },
      "aggs": {
        "top-sales": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": [
                "productId",
                "productName",
                "salesChangePercent"
              ]
            }
          }
        },
        "max_salesChangePercent": {
          "max": {
            "field": "salesChangePercent"
          }
        }
      }
    }
  }
}

Sorting ElasticSearch query by multiple fields

I have some data that I'm trying to sort in a very specific order.
I've looked over a few questions here on SO and Elasticsearch sort on multiple queries was pretty helpful. From what I can tell I'm getting the data back in the correct order but it's not always the same data and appears to be very random as to what is returned from the query.
My question is, how do I get my data sorted correctly and get the expected data each time?
Example Data
[
  {
    id: 00,
    ...
    current_outage: {
      device_id: 00,
      ....
    },
    forecasted_outages: [
      {
        device_id: 00
      }
    ]
  },
  {
    id: 01,
    ...
    current_outage: {
      device_id: 01,
      ....
    },
    forecasted_outages: []
  },
  {
    id: 02,
    ...
    current_outage: null,
    forecasted_outages: [
      {
        device_id: 02
      }
    ]
  },
  {
    id: 03,
    ...
    current_outage: null,
    forecasted_outages: []
  },
]
Current Query
bool: {
  should: [
    {
      constant_score: {
        boost: 6,
        filter: {
          nested: {
            path: 'current_outage',
            query: {
              exists: {
                field: 'current_outage'
              }
            }
          }
        }
      }
    },
    {
      nested: {
        path: 'forecasted_outages',
        query: {
          exists: {
            field: 'forecasted_outages'
          }
        }
      }
    }
  ]
}
Just to reiterate, the above query returns the data in the format/sorted method I expect but it does NOT return the data that I expect each time. The returned data is very random as far as I can tell.
Sort Criteria:
First: Data with both current_outage and one or more forecasted_outages
Second: Data with only current_outage
Third: Data with only forecasted_outages
Edit
The data returning can be anything from zero to thousands of results depending on a user. The user has an option to paginate the data or return all of their relevant data.
Edit 2
The data returned will be anywhere from zero to 1,000 hits.
If the search returns more than 10 hits (the default result size) and all documents have the same score (which could be the case here, since you use constant_score), the data returned can differ from run to run, which makes it look random.
The reason is that the results from the different shards are merged only until the hit count reaches 10; the rest are ignored. When scores are tied, which documents make the cut depends on the order in which the shard results are merged.
Increasing the result size so that it includes all search results gives the same data on every run.
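The effect described above can be reproduced with a small, deliberately simplified model of shard merging (not Elasticsearch's actual implementation). The function topHits, the made-up shard data, and the byId tie-breaker are all illustrative assumptions. The sketch shows that with tied scores the first page depends on arrival order, and that a tie-breaker on a unique field removes that dependency.

```javascript
// Simplified model of merging per-shard hits into one page of results.
// `tieBreak` is an optional comparator used only when scores are equal.
function topHits(shards, size, tieBreak) {
  const merged = shards.flat();
  // Array.prototype.sort is stable, so tied hits keep their arrival order
  // unless a tie-breaker is supplied.
  merged.sort((a, b) => (b.score - a.score) || (tieBreak ? tieBreak(a, b) : 0));
  return merged.slice(0, size).map((hit) => hit.id);
}

// Made-up data: three shards, every hit has the same constant score.
const shardA = [1, 4, 7].map((id) => ({ id, score: 6 }));
const shardB = [2, 5, 8].map((id) => ({ id, score: 6 }));
const shardC = [3, 6, 9].map((id) => ({ id, score: 6 }));

const byId = (a, b) => a.id - b.id;

// Without a tie-breaker, the page depends on which shard is merged first.
console.log(topHits([shardA, shardB, shardC], 3)); // [ 1, 4, 7 ]
console.log(topHits([shardC, shardB, shardA], 3)); // [ 3, 6, 9 ]

// With a tie-breaker on a unique field, the page is the same either way.
console.log(topHits([shardA, shardB, shardC], 3, byId)); // [ 1, 2, 3 ]
console.log(topHits([shardC, shardB, shardA], 3, byId)); // [ 1, 2, 3 ]
```

In a real request the analogous step would be a secondary sort on a unique field, for example sort: ['_score', { id: 'asc' }], assuming id is mapped as a sortable field in your index.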
UPDATE
Using a single shard might help. Note that number_of_shards can only be set when an index is created, so an existing index has to be recreated (or reindexed into a new index) with this setting:
PUT /twitter
{
  "settings": {
    "index": {
      "number_of_shards": 1
    }
  }
}

ElasticSearch | Randomize results with same score

In ElasticSearch is it possible to randomize the order of search results with equal score without losing pagination?
I'm hosting a database with thousands of job candidates. When a company searches for a particular skill (or a combination of skills), the results always come back in the same order, so the candidates at the top of the search results have a huge advantage.
Example for a search query:
let params = {
  index: 'candidates',
  type: 'candidate',
  explain: true,
  size: size,
  from: from,
  body: {
    _source: {
      includes: ['firstName', 'middleName', 'lastName']
    },
    query: {
      bool: {
        must: [/* Left out */],
        should: [/* Left out */],
      }
    }
  }
};
Henry's answer is good, but I think it is easier to do:
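function_score: {
  query: {
    ...
  },
  functions: [
    {
      random_score: {
        seed: 12345678910,
        field: '_seq_no'
      },
      // weight sits next to random_score; random_score itself only takes seed and field
      weight: 0.0001
    }
  ],
  boost_mode: 'sum'
}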
So there is no need to boost the original score; just weight the random score down so that it contributes little (but still enough to break ties).
I do dislike this approach to breaking ties, though: even if the random part contributes only a little to the score, it can still change the order of results whose scores are not equal but very close. This is why I opened this feature request.
You could use a function_score query: wrap your bool query in it and add a random_score function. The next step is to find a weighting that matches your needs using "boost" and "boost_mode" or "weight".
Note that if you use filters the output score will be 0, so you will need to change the "boost_mode" from "multiply" to "replace", "sum" or something else.
Finally, don't forget to add a seed (and a field, as of ES 7.0) to the random_score to keep pagination nearly consistent.
From your example I would suggest something like:
let params = {
  ...
  body: {
    ...
    query: {
      function_score: {
        query: {
          bool: {
            must: [/* Left out */],
            should: [/* Left out */],
            boost: 100
          }
        },
        random_score: {
          seed: 12345678910,
          field: '_seq_no'
        },
        boost_mode: 'sum'
      }
    }
  }
};
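The two suggestions above could be folded into one small helper, sketched here under assumptions. The name withRandomTieBreak, its parameters, and the default weight are hypothetical. The functions array form is used so that weight can sit alongside random_score. Since random_score produces values from 0 up to (but not including) 1, the amount added under boost_mode 'sum' stays below weight.

```javascript
// Hypothetical helper: wraps any query so that equal scores are ordered
// pseudo-randomly but repeatably for a given seed.
function withRandomTieBreak(query, seed, weight = 0.0001) {
  return {
    function_score: {
      query,
      functions: [
        {
          // Same seed + same field value => same random number per document.
          random_score: { seed, field: '_seq_no' },
          // Scales the random value down so it mostly just breaks ties.
          weight
        }
      ],
      // Add the scaled random value to the original query score.
      boost_mode: 'sum'
    }
  };
}

const body = {
  _source: { includes: ['firstName', 'middleName', 'lastName'] },
  query: withRandomTieBreak({ bool: { must: [], should: [] } }, 42)
};
console.log(JSON.stringify(body, null, 2));
```

One design option is to derive the seed per user session: pagination then stays stable within a session while different sessions see different orders. As noted above, scores that differ by less than weight can still swap places.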
