I have products, and some of them are reduced in price for a specific date range.
(simplified) example products:
{
"id": 1,
"price": 2.0,
"specialPrice": {
"fromDate": null,
"tillDate": null,
"value": 0,
},
},
{
"id": 2,
"price": 4.0,
"specialPrice": {
"fromDate": 1540332000,
"tillDate": 1571781600,
"value": 2.5,
},
},
{
"id": 3,
"price": 3.0,
"specialPrice": {
"fromDate": null,
"tillDate": null,
"value": 0,
},
}
Filtering by price was no problem; I could do that with a simple bool query.
But I have not yet found a good example of an Elasticsearch script that points me in the right direction, even though it should be quite simple once you know the syntax.
My pseudocode: price = ('now' between specialPrice.fromDate and specialPrice.tillDate) ? specialPrice.value : price
Is there a way to translate this into something that works as an Elasticsearch sort?
To clarify further: by default, all products are already sorted by several conditions. The user can also search for any terms and filter the results, while being able to select multiple sorting parameters. Items can, for example, be sorted by tags and then by price; it's all very dynamic, and the results are still sorted by other properties (including the _score) afterwards.
So just changing the _score would be bad, since it is already calculated in a complex manner to show the best results for the given search terms.
Here is my current script, which fails at the first params.currentDate:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if(doc['specialPrice.tillDate'] > params.currentDate) {params.currentPrice = doc['specialPrice.value']} return params.currentPrice",
"params": {
"currentDate": "now",
"currentPrice": "doc['price']"
}
}
}
}
How it works now:
One problem was the nesting of some of the properties, so one of my steps was to duplicate their content into new top-level fields on the product (which I'm not that happy about, but whatever).
In my mapping, I created new product properties (specialFrom, specialTill, specialValue) and gave the corresponding fields inside specialPrice "copy_to" settings pointing at those new property names.
This part is in PHP array syntax, since I'm using ruflin/elastica:
'specialPrice' => [
'type' => 'nested',
'properties' => [
'fromDate' => [
'type' => 'date',
'format' => 'epoch_second',
'copy_to' => 'specialFrom',
],
'tillDate' => [
'type' => 'date',
'format' => 'epoch_second',
'copy_to' => 'specialTill',
],
'value' => [
'type' => 'float',
'copy_to' => 'specialValue',
],
],
],
'specialFrom' => [
'type' => 'date',
'format' => 'epoch_second',
],
'specialTill' => [
'type' => 'date',
'format' => 'epoch_second',
],
'specialValue' => [
'type' => 'float',
],
Now my sorting script looks like this (in my testing client; I'm still working on implementing it with Elastica):
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "params.param = ((doc['specialTill'].value - new Date().getTime()) > 0 && (new Date().getTime() - doc['specialFrom'].value) > 0) ? doc['specialValue'].value : doc['price'].value; return params.param;",
"params": {
"param": 0.0
}
}
}
}
I'm not 100% happy with this because I have redundant data and redundant calls (new Date().getTime() is evaluated twice in the script), but it works, and that is the most important thing for now :)
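Just as a hedged sketch of how the two new Date().getTime() calls (and the params mutation) could be avoided: the current time can be passed in once from the client as an epoch-millisecond parameter and kept in a local variable. Here params.now is such a client-supplied value (the number below is only an example), not something Elasticsearch fills in:
"sort": {
  "_script": {
    "type": "number",
    "script": {
      "lang": "painless",
      "source": "def now = params.now; return (doc['specialFrom'].value <= now && now <= doc['specialTill'].value) ? doc['specialValue'].value : doc['price'].value;",
      "params": {
        "now": 1540332000000
      }
    }
  }
}
Passing the timestamp from the client also keeps the sort value identical across shards for the whole request; on newer Elasticsearch versions you would additionally want to guard with doc['specialFrom'].size() != 0 for products that have no special price at all.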
I've updated the query below following your clarifications. Let me know if that works!
POST dateindex/_search
{
"query":{
"match_all":{ // you can ignore this, I used this to test at my end
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":" params.param = ((doc['specialPrice.tillDate'].value - new Date().getTime()) > 0) ? doc['specialPrice.value'].value : doc['price'].value; return params.param;",
"params":{
"param":0.0
}
},
"order":"asc"
}
}
}
You can try using source instead of inline in the above query, as I've been testing on an ES 5.x version on my machine.
Hope it helps!
I have a problem with my API response: I want to change it because my mobile app needs a feature that filters objects by date. I hope you can help me solve this.
I want to change the response of my API.
before:
{
"tasks": [
{
"id": 7,
"user_id": 1,
"title": "gh",
"date": "2022-02-10 13:05:00",
"deskripsi": "gfh",
"created_at": "2022-02-09T06:05:56.000000Z",
"updated_at": "2022-02-09T06:05:56.000000Z"
},
{
"id": 5,
"user_id": 1,
"title": "ghf",
"date": "2022-02-17 16:05:00",
"deskripsi": "fghf",
"created_at": "2022-02-09T06:05:12.000000Z",
"updated_at": "2022-02-09T06:05:12.000000Z"
},
{
"id": 6,
"user_id": 1,
"title": "fgh",
"date": "2022-02-17 18:05:00",
"deskripsi": "gh",
"created_at": "2022-02-09T06:05:40.000000Z",
"updated_at": "2022-02-09T06:05:40.000000Z"
}
]
}
Here is the code that produces the response above:
return response([
'tasks' => Task::where('user_id', auth()->user()->id)->where('date','>',NOW())->orderBy('date','asc')->get(),
],200);
And I want to change my API response into this:
{
"tasks": [
{
"date": "2022-02-10",
"task": [
{
"id": 7,
"user_id": 1,
"title": "gh",
"date": "2022-02-10 13:05:00",
"deskripsi": "gfh",
"created_at": "2022-02-09T06:05:56.000000Z",
"updated_at": "2022-02-09T06:05:56.000000Z"
},
{
"id": 7,
"user_id": 1,
"title": "gh",
"date": "2022-02-10 15:05:00",
"deskripsi": "gfh",
"created_at": "2022-02-09T06:05:56.000000Z",
"updated_at": "2022-02-09T06:05:56.000000Z"
}
]
},
{
"date": "2022-02-12",
"task": [
{
"id": 7,
"user_id": 1,
"title": "gh",
"date": "2022-02-12 13:05:00",
"deskripsi": "gfh",
"created_at": "2022-02-09T06:05:56.000000Z",
"updated_at": "2022-02-09T06:05:56.000000Z"
}
]
}
]
}
Do groupBy on the resulting Collection from the query (see docs: https://laravel.com/docs/9.x/collections#method-groupby)
For example, you could do:
$tasksGroupedByDate = Task::where(.......)
->get()
->groupBy(fn (Task $task) => $task->date->format('Y-m-d'));
(Note: the above uses PHP 7.4 arrow functions. Also, add a date cast on the date column in your Task model to be able to call ->format() directly on the date field; see the sketch below.)
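A minimal sketch of that cast, assuming a standard Eloquent Task model (namespace and class names here are just illustrative):
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Task extends Model
{
    // Cast the `date` column to a Carbon instance so ->format('Y-m-d') can be called directly.
    protected $casts = [
        'date' => 'datetime',
    ];
}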
The above code results in:
{
'2022-01-01' => [
{ Task object },
{ Task object },
{ Task object },
],
'2022-01-02' => [
{ Task object },
{ Task object },
{ Task object },
],
}
(used Task object for brevity, but that will be ['id' => 1, 'title' => 'Task name', .....])
To morph that to the structure you want, you can use map and then values to remove the keys and turn it back to an ordered array:
$tasksGroupedByDate->map(fn ($tasksInGroup, $date) => [
'date' => $date,
'task' => $tasksInGroup
])->values();
If you want to combine everything into one method chain:
return [
'tasks' => Task::where(......)
->get()
->groupBy(fn (Task $task) => $task->date->format('Y-m-d'))
->map(fn ($tasksInGroup, $date) => [
'date' => $date,
'task' => $tasksInGroup
])
->values(),
];
It sounds like you want to create a human-friendly date field based on the date column, then group by it.
While solutions do exist to accomplish this at the database level, I believe you'd still need to loop over the results again afterwards to get the hierarchical structure you're looking for. It isn't too complicated to do that loop in PHP.
My suggestion is as follows:
Before:
return response([
'tasks' => Task::where('user_id', auth()->user()->id)
->where('date','>',NOW())->orderBy('date','asc')->get(),
],200);
After:
$out = [];
$tasks = Task::where('user_id', auth()->user()->id)
->where('date','>',NOW())->orderBy('date','asc')->get();
foreach($tasks as $task) {
$date = strtok((string)$task->date, ' ');
if (empty($out[$date])) {
$out[$date] = (object)['date' => $date, 'task' => []];
}
$out[$date]->task[] = $task;
}
$out = array_values($out);
return response(['tasks' => $out], 200);
Note that in the above I'm using the function strtok. This function might look new even to the most senior of PHP developers. It's a lot like explode, except it grabs only the first part before the token you're splitting on. While I could have used explode, since the part after the token isn't needed, strtok is better suited for the job here.
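A quick illustration (the value is just an example):
$date = '2022-02-10 13:05:00';
echo strtok($date, ' ');      // "2022-02-10" (only the part before the first space)
echo explode(' ', $date)[0];  // same result, but explode builds the full array first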
$tasks = Task::where('user_id', auth()->user()->id)->where('date', '>', NOW())->orderBy('date', 'asc')->get();
$out = [];
foreach ($tasks as $item) {
    // group key: just the Y-m-d part of the datetime
    $date = date('Y-m-d', strtotime((string) $item->date));
    $out[$date] = [
        'date' => $date,
        // all of this user's tasks that fall on the same calendar date
        'task' => Task::where('user_id', auth()->user()->id)->whereDate('date', $date)->get(),
    ];
}
return response([
    'tasks' => array_values($out),
], 200);
maybe something like this
The issue I am facing is that I need to aggregate CSV inputs by ID, and the result requires multiple levels of nesting. I can get a single level of nesting to work, but for deeper nesting I am not able to write the correct syntax.
INPUT:
input {
generator {
id => "first"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,HCC,0.83,COMMUNITYID1'
count => 1
}
generator {
id => "second"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,LACE,12,COMMUNITYID1'
count => 1
}
generator {
id => "third"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,CCI,0.23,COMMUNITYID1'
count => 1
}
}
filter
{
csv {
columns => ['id', 'reference', 'occurrenceDateTime', 'code', 'probabilityDecimal', 'comment']
}
mutate {
rename => {
"reference" => "[subject][reference]"
"code" => "[prediction][outcome][coding][code]"
"probabilityDecimal" => "[prediction][probabilityDecimal]"
}
}
mutate {
add_field => {
"[resourceType]" => "RiskAssessment"
"[prediction][outcome][text]" => "Member HCC score based on CMS HCC V22 risk adjustment model"
"[status]" => "final"
}
}
mutate {
update => {
"[subject][reference]" => "Patient/%{[subject][reference]}"
"[comment]" => "CommunityId/%{[comment]}"
}
}
mutate {
remove_field => [ "#timestamp", "sequence", "#version", "message", "host", "type" ]
}
}
filter {
aggregate {
task_id => "%{id}"
code => "
map['resourceType'] = event.get('resourceType')
map['id'] = event.get('id')
map['status'] = event.get('status')
map['occurrenceDateTime'] = event.get('occurrenceDateTime')
map['comment'] = event.get('comment')
map['[reference]'] = event.get('[subject][reference]')
map['[prediction]'] ||= []
map['[prediction]'] << {
'code' => event.get('[prediction][outcome][coding][code]'),
'text' => event.get('[prediction][outcome][text]'),
'probabilityDecimal'=> event.get('[prediction][probabilityDecimal]')
}
event.cancel()
"
push_previous_map_as_event => true
timeout => 3
}
mutate {
remove_field => [ "#timestamp", "tags", "#version"]
}
}
output{
elasticsearch {
template => "templates/riskFactor.json"
template_name => "riskFactor"
action => "index"
hosts => ["localhost:9201"]
index => ["deepak"]
}
stdout {
codec => json{}
}
}
OUTPUT:
{
"reference": "Patient/GSIH1",
"comment": "CommunityId/COMMUNITYID1",
"id": "829cd0e0-8d24-4f25-92e1-724e6bd811e0",
"status": "final",
"resourceType": "RiskAssessment",
"occurrenceDateTime": "2017-10-10 00:00:00.000",
"prediction": [
{
"probabilityDecimal": "0.83",
"code": "HCC",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
{
"probabilityDecimal": "0.23",
"code": "CCI",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
{
"probabilityDecimal": "12",
"code": "LACE",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
}
]
}
REQUIRED OUTPUT:
{
"resourceType": "RiskAssessment",
"id": "829cd0e0-8d24-4f25-92e1-724e6bd811e0",
"status": "final",
"subject": {
"reference": "Patient/GSIH1"
},
"occurrenceDateTime": "2017-10-10 00:00:00.000",
"prediction": [
{
"outcome": {
"coding": [
{
"code": "HCC"
}
],
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
"probabilityDecimal": 0.83
},
{
"outcome": {
"coding": [
{
"code": "CCI"
}
],
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
"probabilityDecimal": 0.83
}
],
"comment": "CommunityId/COMMUNITYID1"
}
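For what it's worth, here is a hedged sketch of how the deeper nesting could be built directly inside the aggregate code block, using plain Ruby hashes and arrays instead of the flattened rename targets. The field paths come from the filter config above; the exact hash layout is only an assumption modelled on the required output, and the .to_f call is just there to turn the CSV string into a number:
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      # scalar fields: set once per task_id
      map['resourceType']       ||= event.get('resourceType')
      map['id']                 ||= event.get('id')
      map['status']             ||= event.get('status')
      map['occurrenceDateTime'] ||= event.get('occurrenceDateTime')
      map['comment']            ||= event.get('comment')
      map['subject']            ||= { 'reference' => event.get('[subject][reference]') }
      # one nested prediction entry per CSV line
      map['prediction'] ||= []
      map['prediction'] << {
        'outcome' => {
          'coding' => [ { 'code' => event.get('[prediction][outcome][coding][code]') } ],
          'text'   => event.get('[prediction][outcome][text]')
        },
        'probabilityDecimal' => event.get('[prediction][probabilityDecimal]').to_f
      }
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}
Because the map values are ordinary Ruby hashes and arrays, they come out as nested JSON objects and arrays when push_previous_map_as_event fires.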
Given a database which contains a list of people, where they live, and their wealth/income/tax level, I've given my Elasticsearch 5.6.2 this mapping:
mappings => {
person => {
properties => {
name => {
type => 'text',
fields => {
raw => {
type => 'keyword',
},
},
},
county => {
type => 'text',
fields => {
raw => {
type => 'keyword',
},
},
},
community_name => {
type => 'text',
fields => {
raw => {
type => 'keyword',
},
},
},
wealth => {
type => 'long',
},
income => {
type => 'long',
},
tax => {
type => 'long',
},
},
},
},
One county can have several communities, and I want to do an aggregation that creates an overview of the average wealth/income/tax for each county and for each county's communities.
This seems to work:
aggs => {
counties => {
terms => {
field => 'county.raw',
size => 100,
order => { _term => 'asc' },
},
aggs => {
communities => {
terms => {
field => 'community_name.raw',
size => 1_000,
order => { _term => 'asc' },
},
aggs => {
avg_wealth => {
avg => {
field => 'wealth',
},
},
avg_income => {
avg => {
field => 'income',
},
},
avg_tax => {
avg => {
field => 'tax',
},
},
},
},
avg_wealth => {
avg => {
field => 'wealth',
},
},
avg_income => {
avg => {
field => 'income',
},
},
avg_tax => {
avg => {
field => 'tax',
},
},
},
},
},
However, the "county" and "community_name" aren't sorted correctly because some of them have Norwegian characters in them, meaning that ES sorts "Ål" before "Øvre Eiker", which is wrong.
How can I achieve correct Norwegian sorting?
EDIT: I tried changing the "community_name" field to use "icu_collation_keyword" instead of "keyword":
community_name => {
type => 'text',
fields => {
raw => {
type => 'icu_collation_keyword',
index => 'false',
language => 'nb',
},
},
},
But this results in garbled output:
Akershus - 276855 - 229202 - 80131
ᦥ免⡠႐ - 314430 - 243684 - 87105
↘卑◥猔᠈〇㠖 - 202339 - 225665 - 78186
⚞乀⃠᷀ - 306985 - 237405 - 83186
⦘卓敫တ倎瀤 - 218060 - 218407 - 75602
⸳䄓†怜〨 - 271174 - 216843 - 75257
If the field on which you want to do the aggregation (community_name in your example) always has only one value, then I think you might try the following approach, which is an extension of what you already have.
Basically, you can add another sub-aggregation on the original, non-garbled value, and fetch it on the client side for display.
I will show it on a simplified mapping:
PUT /icu_index
{
"mappings": {
"my_type": {
"properties": {
"community": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"norwegian": {
"type": "icu_collation_keyword",
"index": false,
"language": "nb"
}
}
},
"wealth": {
"type": "long"
}
}
}
}
}
We store community name as:
unchanged as community;
as keyword in community.raw;
as icu_collation_keyword in community.norwegian.
Then we index a couple of documents (note: community holds a single string, not a list of strings):
PUT /icu_index/my_type/2
{
"community": "Ål",
"wealth": 10000
}
PUT /icu_index/my_type/3
{
"community": "Øvre Eiker",
"wealth": 5000
}
Now we can do the aggregation:
POST /icu_index/my_type/_search
{
"size": 0,
"aggs": {
"communities": {
"terms": {
"field": "community.norwegian",
"order": {
"_term": "asc"
}
},
"aggs": {
"avg_wealth": {
"avg": {
"field": "wealth"
}
},
"community_original": {
"terms": {
"field": "community.raw"
}
}
}
}
}
}
We are still sorting by community.norwegian, but we also add sub-aggregation on community.raw. Let's see the result:
"aggregations": {
"communities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "⸳䃔楦၃瓅ᘂก捡㜂\u0000\u0001",
"doc_count": 1,
"community_original": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Øvre Eiker",
"doc_count": 1
}
]
},
"avg_wealth": {
"value": 5000
}
},
{
"key": "⸳䄏怠怜〨\u0000\u0000",
"doc_count": 1,
"community_original": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Ål",
"doc_count": 1
}
]
},
"avg_wealth": {
"value": 10000
}
}
]
}
}
Now the buckets are sorted by ICU collation of community name. The first bucket with key "⸳䃔楦၃瓅ᘂก捡㜂\u0000\u0001" has its original value in community_original.buckets[0].key, which is "Øvre Eiker".
NB: This hack will of course not work if community_name can be a list of values.
Hope this hack helps!
I am using Elasticsearch 5.5.0.
In my index I have data of type attraction; part of the JSON in Elasticsearch looks like:
"directions": "Exit the M4 at Junction 1",
"phoneNumber": "03333212001",
"website": "https://www.londoneye.com/",
"postCode": "SE1 7PB",
"categories": [
{
"id": "ce4cf4d0-6ddd-49fd-a8fe-3cbf7be9b61d",
"name": "Theater"
},
{
"id": "5fa1a3ce-fd5f-450f-92b7-2be6e3d0df90",
"name": "Family"
},
{
"id": "ed492986-b8a7-43c3-be3d-b17c4055bfa0",
"name": "Outdoors"
}
],
"genres": [],
"featuredImage": "https://www.daysoutguide.co.uk/media/1234/london-eye.jpg",
"images": [],
"region": "London",
My NEST query looks like:
var query2 = Query<Attraction>.Bool(
bq => bq.Filter(
fq => fq.Terms(t => t.Field(f => f.Region).Terms(request.Region.ToLower())),
fq => fq.Terms(t => t.Field(f => f.Categories).Terms(request.Category.ToLower()))));
The query generated looks like:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"region": [
"london"
]
}
},
{
"terms": {
"categories": [
"family"
]
}
}
]
}
}
}
That returns no results. If I take out the categories bit, I get results. So I am trying to do a terms filter on categories, which is an array of objects. It looks like I am doing this query wrong. Does anyone have any hints on how to get this to work?
Regards
Ismail
You can still use strongly typed properties access by using:
t.Field(f => f.Categories.First().Name)
NEST's property inferrer will read over .First() and yield categories.name.
t.Field(f => f.Categories[0].Name) works as well.
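Plugged back into the query from the question, that would look something like this (a sketch only; request and Attraction are as defined on your side):
var query2 = Query<Attraction>.Bool(
    bq => bq.Filter(
        fq => fq.Terms(t => t.Field(f => f.Region).Terms(request.Region.ToLower())),
        // property inference reads over First() and yields "categories.name"
        fq => fq.Terms(t => t.Field(f => f.Categories.First().Name).Terms(request.Category.ToLower()))));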
I can't find results when filtering by category. Removing the category filter works.
After much experimentation, this is my query:
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "*",
"zero_terms_query": "all",
"operator": "and",
"fields": [
"individual_name.name^1.3",
"organisation_name.name^1.8",
"profile",
"accreditations"
]
}
},
"filter": {
"bool": {
"must": [{
"term": { "categories" : "9" }
}]
}
}
}
}
This is some sample data:
{
_index: providers
_type: provider
_id: 3
_version: 1
_score: 1
_source: {
locations:
id: 3
profile: <p>Dr Murray is a (blah blah)</p>
cost_id: 3
ages: null
nationwide: no
accreditations: null
service_types: null
individual_name: Dr Linley Murray
organisation_name: Crawford Medical Centre
languages: {"26":26}
regions: {"1":"Auckland"}
districts: {"8":"Manukau City"}
towns: {"2":"Howick"}
categories: {"10":10}
sub_categories: {"47":47}
funding_regions: {"7":7}
}
}
These are my indexing settings:
$index_settings = array(
'number_of_shards' => 5,
'number_of_replicas' => 1,
'analysis' => array(
'char_filter' => array(
'wise_mapping' => array(
'type' => 'mapping',
'mappings' => array('\'=>', '.=>', ',=>')
)
),
'filter' => array(
'wise_ngram' => array(
'type' => 'edgeNGram',
'min_gram' => 5,
'max_gram' => 10
)
),
'analyzer' => array(
'index_analyzer' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'char_filter' => array('html_strip', 'wise_mapping'),
'filter' => array('standard', 'wise_ngram')
),
'search_analyzer' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'char_filter' => array('html_strip', 'wise_mapping'),
'filter' => array('standard', 'wise_ngram')
),
)
)
);
Is there a better way to filter/search this? The filter worked when I used snowball instead of nGram. Why is this?
You are querying the categories field looking for the term 9, but the categories field is actually an object:
{ "categories": { "10": 10 }}
So your filter would have to look like this instead:
{ "term": { "categories.9": 9 }}
Why are you specifying the category in this way? You'll end up with a new field for every category, which you don't want.
There's another problem with the query part. You are querying multiple fields with multi_match and setting operator to and. A query for "brown fox":
{ "multi_match": {
"query": "brown fox",
"fields": [ "foo", "bar"]
}}
would be rewritten as:
{ "dis_max": {
"queries": [
{ "match": { "foo": { "query": "brown fox", "operator": "and" }}},
{ "match": { "bar": { "query": "brown fox", "operator": "and" }}}
]
}}
In other words: all terms must be present in the same field, not in any of the listed fields! This is clearly not what you are after.
This is quite a hard problem to solve. In fact, in v1.1.0 we will be adding new functionality to the multi_match query which will greatly help in this situation.
You can read about the new functionality on this page.
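For reference (this is only a sketch of that feature, not taken from the linked page): the functionality added in 1.1.0 is the cross_fields type for multi_match, which treats the listed fields as one combined field, so the and operator applies across fields rather than per field:
{ "multi_match": {
    "query": "brown fox",
    "type": "cross_fields",
    "operator": "and",
    "fields": [ "foo", "bar" ]
}}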