Sum of total tokens in array - elasticsearch

I have a document like the one below:
{
"array" : [ "Aone" , "Btwo" , "Aone" ]
}
I need an aggregation that returns the sum of the number of elements in array.
value_count is giving me the unique tokens, but that is not what I am looking for.

First, make array a multi-field with a new sub-field called numOfTokens, and declare that sub-field with the token_count type.
You can find more about it here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
This creates an additional field, array.numOfTokens, in each document, holding the number of tokens for that field.
Next, run a simple sum aggregation on that field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#search-aggregations-metrics-sum-aggregation
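The two steps above could look roughly like the following request bodies, written here as Python dicts. This is a sketch under assumptions: it uses the typeless mapping syntax of recent Elasticsearch versions, the "standard" analyzer and the aggregation name total_tokens are arbitrary choices, and expected_total is only a local stand-in for what the sum should come to when every array element is a single word.

```python
import json

# Mapping sketch: "array" is a text field with a token_count sub-field.
# The analyzer ("standard") is an assumption; token_count needs one to be set.
mapping = {
    "mappings": {
        "properties": {
            "array": {
                "type": "text",
                "fields": {
                    "numOfTokens": {"type": "token_count", "analyzer": "standard"}
                },
            }
        }
    }
}

# Sum aggregation over the sub-field; "size": 0 skips returning the hits.
# "total_tokens" is a hypothetical aggregation name.
search_body = {
    "size": 0,
    "aggs": {"total_tokens": {"sum": {"field": "array.numOfTokens"}}},
}


def expected_total(docs):
    """Local approximation: count whitespace-separated tokens in every array element."""
    return sum(len(value.split()) for doc in docs for value in doc["array"])


docs = [{"array": ["Aone", "Btwo", "Aone"]}]
print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
print(expected_total(docs))
```

Because the count is per token rather than per unique value, the repeated "Aone" is counted twice, which is what the question asks for.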

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I sort by an array of 2 attributes:
The first attribute is what the client wants to sort by. It can be a simple sorting object or a sort query with a nested filter.
The second attribute is a default sort by id, needed to order the documents that have identical values in the column the customer wants to sort by.
I checked multiple topics/issues on GitHub and here on the Elastic forum to understand how to implement the search_after mechanism for backward pagination, but it's not working for all the cases I need.
Please have a look at the image:
Imagine there is a limit = 3, the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names are A, B, C, D, E in the image.
The ids are the numeric parts of the Doc labels.
When customer wants to go back to the previous page, which is a page #2 on my picture, what I do is pass the following to elastic:
sort: [
  { name: 'desc' },
  { _id: 'desc' }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is Doc6: null in my image. That seems logical, because I ask Elasticsearch to search in desc order after null and id 7, and only one document matches that: Doc6. But it's not what I need.
I can't come up with a solution that returns the data I need.
Could anyone help, please?
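One thing that may be worth checking here: the sort option missing defaults to _last for both asc and desc, so flipping only the order does not fully reverse a sort that contains null values. Below is a minimal sketch of building the reversed request body with missing flipped as well. The helper names, the nullable parameter and the sample id "7" are hypothetical, and whether search_after accepts a null sort value depends on the Elasticsearch version, so treat this as something to test rather than a confirmed fix.

```python
def reverse_sort(sort, nullable=()):
    """Flip the order of every sort clause; for nullable fields also flip 'missing'."""
    flipped = []
    for clause in sort:
        (field, spec), = clause.items()  # each clause holds exactly one field
        spec = {"order": spec} if isinstance(spec, str) else dict(spec)
        # Assumes "asc" when no order is given.
        spec["order"] = "desc" if spec.get("order", "asc") == "asc" else "asc"
        if field in nullable:
            current = spec.get("missing", "_last")  # _last is the default placement
            if current == "_last":
                spec["missing"] = "_first"
            elif current == "_first":
                spec["missing"] = "_last"
            # Custom missing values are left untouched.
        flipped.append({field: spec})
    return flipped


def previous_page_body(forward_sort, first_hit_sort_values, limit, nullable=()):
    """Build a search body that pages backward from the first hit of the current page."""
    return {
        "size": limit,
        "sort": reverse_sort(forward_sort, nullable),
        "search_after": list(first_hit_sort_values),
    }


forward = [{"name": "asc"}, {"_id": "asc"}]
# Sort values of the first document on the current page (null name, hypothetical id "7").
body = previous_page_body(forward, [None, "7"], limit=3, nullable={"name"})
print(body)
```

The hits for such a request come back in reverse order, so the client would reverse them again before rendering the previous page.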

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10000 documents, I need to calculate an aggregated sum for each group of those documents and then use that sum to enrich the same index.
Let me try to explain. My case can be simplified to the one as below:
My elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update the documents with an SKU field. If the document's timestamp is between [date1:date2] and the total hours_spent for its identity_id is less than a_limit, I need to enrich the document with the additional field sku=A, otherwise with the field sku=B.
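To make the rule concrete, here is a plain-Python sketch of it applied to in-memory documents. It is not an Elasticsearch mechanism, only an illustration of the logic. It assumes that the total is computed over the documents inside the date window only, and that documents outside the window get sku=B. The function name assign_sku and the field name timestamp are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def assign_sku(docs, date1, date2, limit):
    """Return copies of docs with an 'sku' field set according to the rule."""
    # First pass: total hours per identity_id, counting only in-window documents.
    totals = defaultdict(float)
    for doc in docs:
        if date1 <= doc["timestamp"] <= date2:
            totals[doc["identity_id"]] += doc["hours_spent"]

    # Second pass: tag each document using its group's total.
    enriched = []
    for doc in docs:
        in_window = date1 <= doc["timestamp"] <= date2
        under_limit = totals.get(doc["identity_id"], 0.0) < limit
        enriched.append({**doc, "sku": "A" if in_window and under_limit else "B"})
    return enriched


sample = [
    {"timestamp": datetime(2024, 1, 5), "identity_id": "u1", "hours_spent": 3},
    {"timestamp": datetime(2024, 1, 6), "identity_id": "u1", "hours_spent": 4},
    {"timestamp": datetime(2024, 1, 7), "identity_id": "u2", "hours_spent": 20},
    {"timestamp": datetime(2024, 2, 7), "identity_id": "u1", "hours_spent": 1},
]
for doc in assign_sku(sample, datetime(2024, 1, 1), datetime(2024, 1, 31), limit=10):
    print(doc["identity_id"], doc["sku"])
```

The two passes mirror the two steps in the question: an aggregation per identity_id first, then an update of each document using the aggregated value.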

query the first element of a list in ElasticSearch

In my Elasticsearch index I have fields that are lists of strings:
"city" = ["Boston","NY","Chicago"]
I need to write a query that searches only the first element of the list.
I have accomplished this by adding a new field that contains only the first element.
"city_first"="Boston"
I would like to avoid creating a new field. Is there a way to write a query that searches only the 1st element of the list in Elasticsearch?

Aggregation on time interval

Let's say a document has
{startDate,endDate}
fields among other fields.
Now we want to create a range aggregation based on these fields, so a document must appear in every bucket that overlaps its start-end interval.
I found that aggregations on range fields are not supported yet: https://github.com/elastic/elasticsearch/issues/34644
As a workaround, I can use a script: since I know the size (time interval) of the buckets, I could generate an array of values between the start and end dates:
aggregation_field : [startDate + bucketSize, startDate + bucketSize * 2, .... endDate]
But in some cases this array could be huge.
Are there other workarounds ?
Thanks !
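The workaround described in the question can be sketched in Python as follows. This is an illustration only, assuming integer timestamps (for example epoch milliseconds) and buckets aligned to multiples of the bucket size, which differs slightly from the startDate + bucketSize offsets written above. The function name bucket_keys is hypothetical.

```python
def bucket_keys(start, end, bucket_size):
    """Return the start key of every fixed-size bucket that [start, end] overlaps."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if end < start:
        raise ValueError("end must not precede start")
    # Align the first key down to the bucket boundary that contains 'start'.
    first = (start // bucket_size) * bucket_size
    return list(range(first, end + 1, bucket_size))


# An interval from 5 to 25 with buckets of width 10 touches three buckets.
print(bucket_keys(5, 25, 10))
```

The list has roughly (end - start) / bucket_size + 1 entries, which is where the size concern comes from when intervals are long and buckets are small.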

Searching for multiple values in a String array in Elastic

I have a field that I am indexing into Elasticsearch that is an array of strings. So, for example, here is what the string array will look like in two records:
Record 1: {"str1", "str2", "str3", "str4", "str5"}
Record 2: {"str1", "str2", "str6", "str7", "str8"}
Question 1: I want to be able to query for multiple strings in this array. For example, my query has "str1", "str2", "str3" as the search parameters. I want to search for records where the string array has any of these three strings.
Question 2: For the scenario above will Record 1 return with a higher score than record 2 (since all three strings are in the array for record 1 but only two are there in record 2).
Is this possible at all? Can you please help with what the query should look like and if the scoring works the way I stated.
You can index them as an array, such as:
{
"myArrayField": [ "str1", "str2", "str3", "str4", "str5" ],
...
}
You would then be able to query a number of ways, the simplest for your case being a match query (which is analyzed):
{
"match" : {
"myArrayField" : "str1 str2 str3"
}
}
Or a terms query (which is not analyzed):
{
"terms" : {
"myArrayField" : [ "str1", "str2", "str3" ]
}
}
And yes, with the match query, documents that match more of the query terms receive a higher score, so Record 1 would be scored higher than Record 2.
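For reference, the two queries above can be built as full request bodies like this. It is a sketch in Python, and the overlap helper is hypothetical: it does not reproduce Elasticsearch scoring, it only counts how many of the search terms each record contains, to show why Record 1 matches more of them than Record 2.

```python
terms_to_find = ["str1", "str2", "str3"]

# Analyzed query: the terms are passed as one space-separated string.
match_body = {"query": {"match": {"myArrayField": " ".join(terms_to_find)}}}

# Non-analyzed query: the terms are passed as an exact list.
terms_body = {"query": {"terms": {"myArrayField": terms_to_find}}}


def overlap(record, wanted):
    """Count how many of the wanted strings appear in the record's array."""
    return len(set(record) & set(wanted))


record1 = ["str1", "str2", "str3", "str4", "str5"]
record2 = ["str1", "str2", "str6", "str7", "str8"]

print(match_body)
print(terms_body)
print(overlap(record1, terms_to_find), overlap(record2, terms_to_find))
```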
