ElasticSearch Aggregating Against FIRST/Max Nested Document - elasticsearch

I'm using Elastic Search trying to get an aggregation of "last login country" for a set of users, and am not sure whether ES supports this type of aggregation? Here's a rough picture of the mapping:
User
  UserId
  Sessions (array)
    Session1 - CreateDate, Country
    Session2 - CreateDate, Country
What I want to do is pass in a date range and get the logins by country, counting ONLY a single session per user. In other words, if a user logged in 3 times during the date range, only 1 of those sessions would count towards the overall count.
The output would look something like the following:
Country Aggregations
USA, Count: 10
Japan, Count: 15
Spain, Count: 23
I've been looking over nested aggregations, but I'm not sure they can give me what I need. The main problem is that if a User has multiple Sessions during the date range, each of those sessions contributes to the overall country count. Is there a way to filter this inner list of nested documents down so that only 1 session per User contributes to the aggregation?

I posted this question on ElasticSearch's github forum, and apparently this functionality is not available in the current ES version (1.4.2):
https://github.com/elasticsearch/elasticsearch/issues/9536
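Since the aggregation is not available server-side in that version, one possible workaround is to do the reduction client-side after fetching the user documents. The following is a minimal sketch of that idea, not an Elasticsearch feature: the sample users are made up for illustration, and the field names simply follow the rough mapping above.

```python
from collections import Counter
from datetime import date


def last_login_country_counts(users, start, end):
    """Count one session per user: the latest session inside [start, end]."""
    counts = Counter()
    for user in users:
        in_range = [s for s in user["Sessions"] if start <= s["CreateDate"] <= end]
        if in_range:
            latest = max(in_range, key=lambda s: s["CreateDate"])
            counts[latest["Country"]] += 1
    return counts


# Hypothetical sample data shaped like the mapping in the question.
users = [
    {"UserId": 1, "Sessions": [
        {"CreateDate": date(2015, 1, 5), "Country": "USA"},
        {"CreateDate": date(2015, 1, 20), "Country": "Japan"},
    ]},
    {"UserId": 2, "Sessions": [
        {"CreateDate": date(2015, 1, 10), "Country": "Japan"},
    ]},
    {"UserId": 3, "Sessions": [
        {"CreateDate": date(2014, 12, 1), "Country": "Spain"},
    ]},
]

counts = last_login_country_counts(users, date(2015, 1, 1), date(2015, 1, 31))
print(dict(counts))
```

User 1 logged in twice during the range but contributes only the later session, and user 3 has no session in the range and is not counted at all.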

Related

Laravel elastic search display relevant data in top order

This is about ordering Elasticsearch results in a custom order.
I have city ids (integers) in my Elasticsearch index, and the search should take the user's city selection into account.
For example:
Consider that the id of Chennai is 1 and the id of Mumbai is 2.
Suppose we have 10 records for Chennai and 20 records for Mumbai in the index. If the user chooses Chennai, we should display the 10 records belonging to Chennai at the top and then display the remaining items.
If the user chooses Mumbai, we should display the 20 records belonging to Mumbai at the top and then display the remaining items.
I am using the sleimanx2/plastic Laravel package for search. I would appreciate it if anyone could help me achieve this.
Is there any specific reason that you wish to achieve this with Elasticsearch?
This case looks to me like something I would solve with two queries: one for the promoted results, and one that matches everything else, excluding whatever the first query matched.
Then I would display each result set in its own area of the page.
There might be a way to merge those queries and get your results as buckets that you can later use to build your markup, but honestly I am not sure there is a reason to do it that way.
I hope I do not misunderstand your question,
Best Regards.
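The two-query idea in the answer can be sketched as plain Elasticsearch query DSL bodies. This is only an illustration under stated assumptions: the field name `city_id` is hypothetical, and the bodies are raw request JSON built as Python dicts, not the query builder API of the sleimanx2/plastic package.

```python
def promoted_and_rest_queries(city_id, field="city_id"):
    """Build two search bodies: records for the chosen city, then all others."""
    # First query: only the records that belong to the selected city.
    promoted = {"query": {"term": {field: city_id}}}
    # Second query: everything except the records matched by the first query.
    rest = {"query": {"bool": {"must_not": [{"term": {field: city_id}}]}}}
    return promoted, rest


# Chennai has id 1 in the example from the question.
promoted, rest = promoted_and_rest_queries(1)
print(promoted)
print(rest)
```

The application would run the first body, render those hits at the top, and then render the hits from the second body underneath.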

Advice on ElasticSearch query design

I've got ES documents that look like this:
{
  "auctionOn": "2018-01-01",
  "inspections": [
    {
      "startsOn": "2018-01-02 09:00",
      "endsOn": "2018-01-02 10:00"
    }
  ]
}
I need the following answers from a search (or multiple searches):
number of documents with an auctionOn in the future (i.e. > now)
number of documents with an inspections.startsOn in the future (i.e. > now)
date histogram (day breakdown) of the next 7 days, with # of documents with an auctionOn on that day
date histogram (day breakdown) of the next 7 days, with # of documents with an inspections.startsOn on that day
So, I'm trying to figure out how to get these answers efficiently. I know I can/should test out the different approaches, but I'm relatively new to ES, so that's easier said than done.
Can someone give me some advice (or ideally, a query) on how to get these 4 values?
Ideas I had:
Query for all documents with an inspection/auction in the future. Create date histogram aggregations filtered to the next 7 days for both auction and inspections. Use range aggregations to get number of docs with auction/inspection > today.
Pros: one search for all answers. Cons: lots of documents to aggregate over?
Create separate searches (e.g. msearch) for:
query all documents with an inspection in the next 7 days. aggregate by day.
query all documents with an auction in the next 7 days. aggregate by day.
query all documents with an inspection in the future. use hits to get total
query all documents with an auction in the future. use hits to get total.
Pros: queries are simpler, possibly more cache hits? Cons: 4 separate searches.
Can someone please guide me down the right path, and give me hints on how to do the query/aggregations?
Thanks
1. Use a range query on the auctionOn field, with the current date as the lower bound and no upper bound.
2. Use a range query inside a nested query on the inspections.startsOn field, with the same bounds.
3. Use a date histogram aggregation with an interval of one day.
4. Same as 3, but inside a nested aggregation.
You can combine all of these in one query.
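The four steps above could be combined into a single request body roughly as follows. This is a hedged sketch rather than a tested query: it assumes `inspections` is mapped as a `nested` field, the aggregation names are made up, and the `interval` parameter follows the wording of the answer (newer Elasticsearch releases use `calendar_interval` instead).

```python
import json


def build_auction_overview_body(interval="day"):
    """Build one search body that returns all four values as aggregations."""
    future = {"gt": "now"}
    next_week = {"gte": "now/d", "lt": "now+7d/d"}
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            # 1. documents with an auction in the future
            "future_auctions": {"filter": {"range": {"auctionOn": future}}},
            # 2. documents with at least one inspection in the future
            "future_inspections": {"filter": {"nested": {
                "path": "inspections",
                "query": {"range": {"inspections.startsOn": future}},
            }}},
            # 3. auctions per day over the next 7 days
            "auctions_next_7_days": {
                "filter": {"range": {"auctionOn": next_week}},
                "aggs": {"per_day": {"date_histogram": {
                    "field": "auctionOn", "interval": interval}}},
            },
            # 4. inspections per day over the next 7 days (nested context)
            "inspections_next_7_days": {
                "nested": {"path": "inspections"},
                "aggs": {"in_window": {
                    "filter": {"range": {"inspections.startsOn": next_week}},
                    "aggs": {"per_day": {"date_histogram": {
                        "field": "inspections.startsOn", "interval": interval}}},
                }},
            },
        },
    }


body = build_auction_overview_body()
print(json.dumps(body, indent=2))
```

Note that the buckets under the nested aggregation count inspection entries rather than parent documents; if per-document counts are required, a `reverse_nested` sub-aggregation inside each bucket is one option to explore.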

Elasticsearch: group into buckets, reduce to one document per bucket, group these documents

I'm looking for a way to compute the bounce rate of web pages with Elasticsearch.
We collect data in the following simplified structure:
{"id":"1", "timestamp":"2017-01-25:15:23", "sessionid":"s1", "page":"index"}
{"id":"2", "timestamp":"2017-01-25:15:24", "sessionid":"s1", "page":"checkout"}
{"id":"3", "timestamp":"2017-01-25:15:25", "sessionid":"s1", "page":"confirm"}
{"id":"4", "timestamp":"2017-01-25:15:26", "sessionid":"s2", "page":"index"}
{"id":"5", "timestamp":"2017-01-25:15:27", "sessionid":"s2", "page":"checkout"}
{"id":"6", "timestamp":"2017-01-25:15:26", "sessionid":"s3", "page":"product_a"}
{"id":"7", "timestamp":"2017-01-25:15:28", "sessionid":"s3", "page":"checkout"}
For this sample the result of the analysis should be:
2/3 of the users get lost at the checkout page.
1/3 of the users get lost at the confirm page
More formally, I'm looking for a generic way to implement the following algorithm in an Elasticsearch query:
group documents by a field
sort each group (bucket) by a second field and reduce to the topmost document
group all these remaining documents by a third field
sort groups by number of documents
My first attempt was to solve this with a terms aggregation followed by a top_hits aggregation, and finally use a terms_pipeline aggregation to group the pages.
(simplified aggregation structure)
aggs
  terms
    field: sessionid
    aggs
      top_hits
        sort: timestamp desc
        size: 1
  terms_pipeline
    bucket_path: terms>top_hits
    field: page
... but unfortunately there is no such thing as a terms_pipeline aggregation. My bad.
Any ideas for an alternative approach?
Maybe I misunderstood something, but if you want to know where your users are bouncing, and all pages are visited in a sequence, you could simply use a terms aggregation on the page field (to know which pages were visited) and a cardinality aggregation on the sessionid field (to know how many unique sessions you have). In this case, cardinality(sessionid) would yield 3.
Then again, since all pages are in a sequence, I think you don't really need to know what happened within a given session.
In your example, the terms(page) aggregation would tell you that 3 users landed on the checkout page but only one went on to the confirm page. Combined with the cardinality of the sessions, this implies that 2 users (3 total sessions - 1 confirm page hit) bounced on the checkout page.
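As a cross-check of the numbers above, the four-step algorithm from the question can be run client-side on the sample events. This is only a sketch of the reduction in plain Python, not an Elasticsearch aggregation; the timestamps are compared as strings, which works here because they all share the same zero-padded format.

```python
from collections import Counter

# The sample events from the question.
events = [
    {"id": "1", "timestamp": "2017-01-25:15:23", "sessionid": "s1", "page": "index"},
    {"id": "2", "timestamp": "2017-01-25:15:24", "sessionid": "s1", "page": "checkout"},
    {"id": "3", "timestamp": "2017-01-25:15:25", "sessionid": "s1", "page": "confirm"},
    {"id": "4", "timestamp": "2017-01-25:15:26", "sessionid": "s2", "page": "index"},
    {"id": "5", "timestamp": "2017-01-25:15:27", "sessionid": "s2", "page": "checkout"},
    {"id": "6", "timestamp": "2017-01-25:15:26", "sessionid": "s3", "page": "product_a"},
    {"id": "7", "timestamp": "2017-01-25:15:28", "sessionid": "s3", "page": "checkout"},
]


def exit_page_counts(events):
    """Group by session, keep the latest event, then count by page."""
    last_by_session = {}
    for event in events:
        current = last_by_session.get(event["sessionid"])
        if current is None or event["timestamp"] > current["timestamp"]:
            last_by_session[event["sessionid"]] = event
    # most_common() sorts the pages by number of sessions, descending.
    return Counter(e["page"] for e in last_by_session.values()).most_common()


exits = exit_page_counts(events)
print(exits)  # [('checkout', 2), ('confirm', 1)]
```

With 3 sessions in total, this matches the expected result: 2/3 of the sessions end on the checkout page and 1/3 on the confirm page.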

Elasticsearch aggregation on latest documents

I have a document which can be modified any number of times a day.
I've organized these documents as a time series, creating one index per day.
Each day's index can hold multiple versions of the same document, each with a different modified date.
Document sample:
{
  id: 1234,
  user: kc,
  subscriptions: [
    'paper1',
    'paper2'
  ],
  modified_date: 1466697434020
}
What I'm looking for is a way to get the latest document in a particular time range for each user and to apply an aggregation on top of that set.
That would give a result such as how many people were subscribed to each of the papers in the last week/month.
Using top_hits I was able to get the latest document for each user in a time range, but I cannot apply further aggregations to this set of data.
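One way to work around the limitation described above is to let Elasticsearch return the latest document per user and do the final count in the client. The sketch below makes several assumptions: `user` is a field that a terms aggregation can bucket on, the aggregation names `by_user` and `latest` are made up, and `sample_response` is a hand-written illustration of the response shape, not real output.

```python
from collections import Counter


def latest_per_user_body(start_ms, end_ms, max_users=1000):
    """Search body: one bucket per user holding that user's newest document."""
    return {
        "size": 0,
        "query": {"range": {"modified_date": {"gte": start_ms, "lte": end_ms}}},
        "aggs": {"by_user": {
            "terms": {"field": "user", "size": max_users},
            "aggs": {"latest": {"top_hits": {
                "size": 1,
                "sort": [{"modified_date": {"order": "desc"}}],
            }}},
        }},
    }


def count_subscriptions(response):
    """Count subscribers per paper from the latest document of each user."""
    counts = Counter()
    for bucket in response["aggregations"]["by_user"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            counts.update(hits[0]["_source"].get("subscriptions", []))
    return counts


# Hand-written response fragment, for illustration only.
sample_response = {"aggregations": {"by_user": {"buckets": [
    {"key": "kc", "latest": {"hits": {"hits": [
        {"_source": {"user": "kc", "subscriptions": ["paper1", "paper2"]}}]}}},
    {"key": "jd", "latest": {"hits": {"hits": [
        {"_source": {"user": "jd", "subscriptions": ["paper2"]}}]}}},
]}}}

subscription_counts = count_subscriptions(sample_response)
print(dict(subscription_counts))
```

The trade-off is that the terms aggregation has to return one bucket per user, so this only stays practical while the number of users in the time range is moderate.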

elasticsearch: How to set custom record count based on a type in search result

I have a search request like this:
My docs have a type, such as "sport", "health", "news", etc., and I want the result count to be split by type according to fixed percentages:
sport 10%, health 30%, news 60%.
E.g. if I search for 200 records, I expect those 200 records to include 20 sport records, 60 health records and 120 news records.
Thanks for any suggestions!
johnson
the percent like
Based on your comment, I think the best approach to this is to Facet on the doc types, using a Terms Facet. You can then calculate the percentages based on count for each facet (doc type) in combination with the total hits and this calculated percentage will be valid across all paged hits. Only when the query itself is updated will you need to update the percentages. Hope this makes sense.
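The percentage calculation described in the answer is simple arithmetic over the per-type counts and the total hit count. A minimal sketch, assuming the counts have already been read from the facet result; the function name and the sample counts are illustrative only.

```python
def type_percentages(type_counts, total_hits):
    """Turn per-type counts into percentages of the total hit count."""
    if total_hits == 0:
        return {}
    return {doc_type: 100.0 * count / total_hits
            for doc_type, count in type_counts.items()}


# Illustrative counts matching the 200-record example in the question.
percentages = type_percentages({"sport": 20, "health": 60, "news": 120}, 200)
print(percentages)  # {'sport': 10.0, 'health': 30.0, 'news': 60.0}
```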
