Elasticsearch aggregation bucket

In Kibana (or potentially even elasticsearch), is there a way to sort documents into buckets based on a field, and then compute statistics on the generated buckets themselves? Here is a simplified example of my problem:
I have logs with structure:
{
  user_id: [string],
  post_id: [string]
}
that signal that a user with ID user_id has viewed the post with ID post_id. I would like to:
bucket the logs by matching user_id
count the number of logs per bucket
compute the 75th percentile of these per-bucket counts
Is this possible in Kibana?
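In plain Elasticsearch (and therefore from Kibana's Dev Tools) this maps to a terms aggregation on user_id plus a percentiles_bucket pipeline aggregation over the per-bucket document counts. A minimal sketch, assuming the logs live in an index called logs and user_id is a keyword field:

POST logs/_search
{
  "size": 0,
  "aggs": {
    "per_user": {
      "terms": { "field": "user_id", "size": 10000 }
    },
    "count_percentiles": {
      "percentiles_bucket": {
        "buckets_path": "per_user._count",
        "percents": [75]
      }
    }
  }
}

Here per_user._count points at each bucket's document count. Note that terms returns at most size buckets, so for very high user cardinality the percentile is computed over a truncated bucket list.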

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I use sorting as an array of 2 attributes:
The first attribute is what the client wants to sort by. It can be a simple sorting object or a sort query with a nested filter.
The second attribute is basically a default sort by id, needed for ordering documents that have identical values for the column the customer wants to sort by.
I have checked multiple topics/issues on GitHub and on the Elastic forum to understand how to implement the search_after mechanism for backward pagination, but it does not work for all the cases I need.
Please have a look at the image:
Imagine there is a limit = 3; the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names are: A, B, C, D, E on the image.
The ids are numeric parts of the Doc word.
When the customer wants to go back to the previous page, which is page #2 in my picture, I pass the following to Elasticsearch:
sort: [
  { name: 'desc' },
  { _id: 'desc' }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is Doc6: null in my image. It seems logical, because I ask Elasticsearch to search in descending order after null and id 7, and only one document corresponds to this: Doc6. But it's not what I need.
I can't come up with a solution that returns the data I need.
Could anyone help, please?
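One workaround worth sketching here (not a confirmed fix from this thread, just the usual approach): avoid null sort keys entirely by giving missing fields an explicit sentinel via the sort option missing, so the page boundary passed to search_after never has to carry null. The sentinel 'zzz' below is an arbitrary placeholder that must sort after every real name:

sort: [
  { name: { order: 'desc', missing: 'zzz' } },
  { _id: 'desc' }
],
search_after: ['zzz', Doc7._id]

With concrete sort keys, backward pagination then follows the usual recipe: reverse both sort orders, search_after from the first row of the current page, and reverse the returned page on the client.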

Avoid ranking all matching documents in elasticsearch search query

I have an Elasticsearch index with multiple millions of documents. I am running the following search query.
POST testIndex/_search?size=200
{
  "query": {
    "query_string": {
      "query": "(title:QA Manager OR title:QA Lead) AND (skills:JIRA OR skills:Software Development OR skills:Test Case)"
    }
  }
}
Even though we have passed the limit with size=200, it seems Elasticsearch ranks all the matching documents and returns the top 200 with the highest scores.
Is there a way to limit the ranking, i.e. rank at most 1000 matching documents?
ES will consider all your data for search and ranking; that is how Elasticsearch works. Basically, it executes your query in two phases: query and fetch.
In the query phase, it executes your query on all shards, gets document IDs and scores from each shard, and returns them to the requesting node. So in your scenario, as size is set to 200, it will get 200 document IDs from each shard and return them to the requesting node.
On the requesting node, all the document IDs and scores are merged and sorted by score, and the top documents are selected based on the size param.
In the fetch phase, the actual documents are retrieved, based on the IDs selected in the query phase, from the individual shards where they reside, and the results are returned to the client.
If you don't want to calculate scores for some parts of your query, you can move those parts into the filter clause of a bool query.
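For example, if scoring is not needed at all, the whole query_string from the question can run inside the filter clause, which skips score calculation entirely; a sketch:

POST testIndex/_search?size=200
{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "(title:QA Manager OR title:QA Lead) AND (skills:JIRA OR skills:Software Development OR skills:Test Case)"
        }
      }
    }
  }
}

If some scoring is still wanted but the per-shard work should be capped, the request-level terminate_after parameter stops collecting after a given number of documents per shard, at the cost of potentially incomplete results.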

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10,000 documents, and I need to calculate an aggregated sum from each group of those documents, and then use that sum to enrich the same index...
Let me try to explain. My case can be simplified to the one below:
My Elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update documents with an SKU field: if timestamp1 is between [date1:date2] and the total hours_spent per identity_id is < a_limit, I need to enrich the document with an additional field sku=A, otherwise with sku=B.
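Elasticsearch cannot join an aggregation result back onto documents in a single request, so the usual pattern is a small scheduled job that runs in two steps: aggregate the sums, then write them back with _update_by_query. A sketch of the first step, using the field names from the question (the index name and the 10000 bucket size are assumptions):

POST index-a/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp1": { "gte": "date1", "lte": "date2" } }
  },
  "aggs": {
    "per_identity": {
      "terms": { "field": "identity_id", "size": 10000 },
      "aggs": {
        "total_hours": { "sum": { "field": "hours_spent" } }
      }
    }
  }
}

The job then splits the buckets by whether total_hours is below a_limit and issues one _update_by_query per group, e.g. a terms query on the matching identity_id values with a script that sets ctx._source.sku = 'A' (or 'B').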

Limit facets and random count to value specified by user

Using Solr 6.4
When running a query over a set of documents I need to be able to return a random set of results that are limited to a number the customer requests.
e.g. The customer running the search wants 100 random documents from the 1,000,000 they have in their index.
Of course I could &fq={!frange incu=false l=0 u=1}mod(random_1927377943, 1)&pageSize=100... problem solved.
Not really, because I also need the facets for the types of documents they are searching for in their index. However, the facets are computed over numFound, which could be any number between 0 and 1,000,000.
"response":{"numFound":1000000,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"documentTypeId":[
"78",500000,
"3",250000,
"2",150000,
"1",100000,
How do I limit the random numFound results to the specific number the customer is asking for (100), such that the facet counts for each documentTypeId reflect only that random set of results?
"response":{"numFound":100,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"documentTypeId":[
"78",50,
"3",25,
"2",15,
"1",10,

Filter ES query based on aggregation results

We have an index with the following document structure:
{
  email: "test@test.com",
  stuff...
},
{
  email: "test@test.com",
  stuff...
},
{
  email: "anotherEmail@test.com",
  stuff...
}
We need to get all records where the number of records sharing the same email is > 2, for example. I know I can use a terms aggregation with a minimum doc count to find all the emails that have at least 2 records.
But what we actually need is to get all the records whose email appears more than X times. So we need our query to constrain the results to only those records that match an aggregation.
I know that we can have a nested top_hits aggregation, but that is not good enough for us, because we need to be able to page through these results; an email could have 10k records, for example. We need to get these results in the hits collection so that we can page them.
How would we go about doing something like that?
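Elasticsearch cannot filter hits by an aggregation result in one request, so a common two-step workaround (a sketch; the index name, threshold, and the assumption that email is a keyword field are mine) is to first collect the qualifying emails, then page through the hits with a terms filter on that list:

POST my-index/_search
{
  "size": 0,
  "aggs": {
    "emails": {
      "terms": { "field": "email", "min_doc_count": 3, "size": 10000 }
    }
  }
}

min_doc_count: 3 corresponds to "more than 2 records per email". The returned bucket keys then drive an ordinary paged query, e.g. a bool filter with { "terms": { "email": [ ...bucket keys... ] } } plus from/size or search_after, so the matching records land in the hits collection and can be paged normally.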
