Aggregation on time interval - elasticsearch

Let's say a document has
{startDate, endDate}
fields among other fields.
Now we want to create a range aggregation based on these fields,
so a document must appear in every bucket that overlaps its start-end interval.
I found that aggregations are not yet supported on range fields: https://github.com/elastic/elasticsearch/issues/34644
As a workaround I can use a script: since I know the size (time interval) of the buckets, I could generate an array of values between the start and end dates:
aggregation_field : [startDate + bucketSize, startDate + bucketSize * 2, ..., endDate]
But in some cases this array could be huge.
Are there other workarounds?
Thanks!
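The array-generation workaround described in the question can be sketched in Python as follows. This is only an illustration of the idea, not an Elasticsearch script: the function name, the `origin` parameter, and the choice to align values to bucket boundaries (rather than stepping from startDate itself) are assumptions made for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for bucket boundaries


def overlapping_bucket_starts(start, end, bucket_size, origin=EPOCH):
    """Return the start of every fixed-size bucket that overlaps [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    # Floor `start` to the bucket boundary at or before it.
    first = origin + ((start - origin) // bucket_size) * bucket_size
    starts = []
    current = first
    while current <= end:
        starts.append(current)
        current += bucket_size
    return starts


if __name__ == "__main__":
    buckets = overlapping_bucket_starts(
        datetime(2020, 1, 1, 22, 30),
        datetime(2020, 1, 2, 1, 10),
        timedelta(hours=1),
    )
    for b in buckets:
        print(b.isoformat())
```

Indexing such a list as a multi-valued date field would let an ordinary date histogram place the document in each overlapping bucket. It does not remove the size concern raised above: the list has roughly (endDate - startDate) / bucketSize entries.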

Related

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10000 documents and I need to calculate an aggregated sum for each group of those documents, and then use the sum to enrich the same index....
Let me try to explain. My case can be simplified as follows:
My Elasticsearch index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update the documents with an SKU field. If timestamp1 is in [date1:date2] and the total hours_spent for the identity_id is less than a_limit, I need to enrich the document with an additional field sku=A; otherwise with sku=B.
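The rule described above can be sketched in plain Python over in-memory documents, just to make the logic concrete. This is not an Elasticsearch enrich or transform configuration. The function name and the `timestamp` key are hypothetical, and the sketch assumes the total is computed over all documents of an identity_id, which the question does not state explicitly.

```python
from collections import defaultdict
from datetime import date


def assign_sku(docs, date1, date2, a_limit):
    """Return copies of docs with a 'sku' field set to 'A' or 'B'."""
    # Step 1: aggregate hours_spent per identity_id (assumed: over all docs).
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["identity_id"]] += doc["hours_spent"]

    # Step 2: apply the rule to each document.
    enriched = []
    for doc in docs:
        in_window = date1 <= doc["timestamp"] <= date2
        under_limit = totals[doc["identity_id"]] < a_limit
        enriched.append({**doc, "sku": "A" if in_window and under_limit else "B"})
    return enriched


if __name__ == "__main__":
    docs = [
        {"timestamp": date(2024, 1, 5), "identity_id": "u1", "hours_spent": 2},
        {"timestamp": date(2024, 1, 6), "identity_id": "u1", "hours_spent": 3},
        {"timestamp": date(2024, 1, 7), "identity_id": "u2", "hours_spent": 20},
    ]
    for d in assign_sku(docs, date(2024, 1, 1), date(2024, 1, 31), a_limit=10):
        print(d["identity_id"], d["sku"])
```

In Elasticsearch itself the two steps would typically be a terms plus sum aggregation followed by an update of the matching documents.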

Elasticsearch query with sorting by array of values

Is it possible in ES to query and sort by an array of values? For example:
Give me all results, but results with "country_code" = [ 'de', 'au', 'es'] should be prioritized in the order given in the array.
Is it possible?
Elasticsearch does not really handle arrays; internally it's just the same field having 3 different values: country_code = "de" AND country_code = "au" AND country_code = "es", all at the same time. You can, though, use script-based sorting and handle arrays in Painless.
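A minimal sketch of what such a script-based sort could look like, written as a Python function that builds the search request body. The function name is hypothetical, and the script assumes country_code is a single-valued keyword field that exists on every document; documents whose value is not in the priority list are ranked after all listed ones.

```python
import json


def priority_sort_body(field, priority):
    """Build a search body that ranks documents by position in `priority`."""
    # Painless: position of the field value in the list, or list size if absent.
    source = (
        "int i = params.order.indexOf(doc[params.field].value); "
        "return i < 0 ? params.order.size() : i;"
    )
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "asc",
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {"field": field, "order": list(priority)},
                    },
                }
            },
            "_score",  # tie-break within the same priority rank
        ],
    }


if __name__ == "__main__":
    print(json.dumps(priority_sort_body("country_code", ["de", "au", "es"]), indent=2))
```

Passing the priority list through params rather than inlining it in the script source keeps the script text constant across requests.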

Get final score by sum of multiple fields boost

I want to build a search that prioritizes the number of field matches instead of one field over another. All the fields would have the same boost value, and the final score should be calculated by summing the boosts of the matched fields. If the full text matches two fields and each field has boost 1, the final score would be 1 + 1 = 2.
Let's use an example:
class Event < ApplicationRecord
  searchable do
    text :title
    text :category
    text :artist_name
  end
end
Suppose I have two events:
Event 1: Name: "Christmas festival" Artist name: "AC/DC"
Event 2: Name: "New year festival" Artist name: "Queen"
So, if the user searches just "festival", both events are returned with the same score because it matches both events' names.
But if the user searches "festival AC/DC", I want to return Event 1 first, or only Event 1, because it matches both the event name (festival) and the artist name (AC/DC), while Event 2 matches only the event name (festival). Event 1's score should be 2 while Event 2's score should be 1.
Any suggestions on how I can do that? Is this even possible?
It seems you are mixing up scoring and boosting. I think your question should be titled "Compute total score by summing each field score (regardless of the boosts)".
Field scores are computed based on field matches, and an arbitrary set of additive or multiplicative boosts (functions and/or matching subqueries) can be applied to them. But in the end what you want is to compute the global score by summing each field score, not the boosts themselves.
The DisMax query parser, for example, allows you to control precisely how the final score is computed, using the tie (Tie Breaker) parameter:
The tie parameter specifies a float value (which should be something
much less than 1) to use as tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields,
more than one field may match. If so, each field will generate a
different score based on how common that word is in that field (for
each document relative to all other documents). The tie parameter lets
you control how much the final score of the query will be influenced
by the scores of the lower scoring fields compared to the highest
scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction
max query": that is, only the maximum scoring subquery contributes to
the final score. A value of "1.0" makes the query a pure "disjunction
sum query" where it doesn’t matter what the maximum scoring sub query
is, because the final score will be the sum of the subquery scores.
Typically a low value, such as 0.1, is useful.
In your situation you need a disjunction sum query, so you might want to set tie to 1.0.
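The effect of tie described in the quoted documentation can be illustrated with a few lines of Python. This is a sketch of the combination rule only (the best field score plus tie times the sum of the other field scores), not Solr code; the function name and the per-field scores of 1.0 are made up for the example, and real scores depend on term statistics.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: max + tie * (sum of the remaining scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


if __name__ == "__main__":
    event1 = [1.0, 1.0]  # matches title and artist_name
    event2 = [1.0]       # matches title only
    for tie in (0.0, 1.0):
        print(tie, dismax_score(event1, tie), dismax_score(event2, tie))
```

With tie = 0.0 both events combine to the same score, while with tie = 1.0 the event matching two fields gets the sum of both field scores.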

Kibana. Data tables. Exclude terms depending on the length

I'm storing sentences in Elasticsearch.
Example:
this is a sentence
this is a second sentence
And I want to show a data table with the most used terms in my Kibana 4.3.1, selecting:
Metric = count
Split rows
Aggregation = terms
Field = input
Order by = metric count
Order descending. Size 5
This is what I'm getting in the table:
this 2
is 2
a 2
sentence 2
second 1
And I want to remove the short words, those with fewer than 3 characters. In this example, "is" and "a".
How can I achieve this?
Thanks!
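It works after adding the following Exclude Pattern. The pattern is matched against the whole term, so {0,3} also excludes three-character terms; use {0,2} to drop only terms shorter than 3 characters: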
[a-zA-Z0-9]{0,3}
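To see which terms such a pattern removes, here is a small Python sketch. It uses Python's re module with fullmatch to imitate whole-term matching; this is an approximation for illustration, not the regex engine Elasticsearch uses, and the function name is hypothetical.

```python
import re


def apply_exclude_pattern(terms, pattern):
    """Keep only the terms that the pattern does not match in full."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t) is None]


if __name__ == "__main__":
    terms = ["this", "is", "a", "sentence", "second"]
    print(apply_exclude_pattern(terms, r"[a-zA-Z0-9]{0,3}"))
    # prints ['this', 'sentence', 'second']
```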

Sum of total tokens in array

I have a document as below -
{
"array" : [ "Aone" , "Btwo" , "Aone" ]
}
I need to compute the total number of elements in the array using an aggregation.
value_count is giving me the unique tokens, but that is not what I am looking for.
First you need to make array a multi-field with a new sub-field called numOfTokens. Declare this sub-field as a token count field.
You can find more about it here - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
This will create an additional field called array.numOfTokens per document that holds the number of tokens for that field.
Next you can do a simple sum aggregation on that field - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#search-aggregations-metrics-sum-aggregation
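The two steps could look roughly like this, written as Python functions that build the mapping and the search request body. This is a sketch under assumptions: it uses the text type and an explicit standard analyzer as in recent Elasticsearch versions (older versions used string), and the function names and the total_tokens aggregation name are hypothetical.

```python
import json


def token_count_mapping(field="array", sub_field="numOfTokens", analyzer="standard"):
    """Mapping with a token_count sub-field under `field`."""
    return {
        "properties": {
            field: {
                "type": "text",  # assumed; older versions used "string"
                "fields": {
                    sub_field: {"type": "token_count", "analyzer": analyzer}
                },
            }
        }
    }


def sum_tokens_request(field="array", sub_field="numOfTokens"):
    """Search body that sums the token counts across matching documents."""
    return {
        "size": 0,  # only the aggregation result is needed
        "aggs": {"total_tokens": {"sum": {"field": f"{field}.{sub_field}"}}},
    }


if __name__ == "__main__":
    print(json.dumps(token_count_mapping(), indent=2))
    print(json.dumps(sum_tokens_request(), indent=2))
```

Note that the count is of analyzed tokens, so a value containing several words would count more than once.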
