Elastic Search Aggregation and Details - elasticsearch

I am trying to get the teacher's name as well in this query.
With the query below I can loop over the teachers and get the number of classes each one works for, along with the amount of money she earns each year.
But I can't get the full details from this query; I also want to display the teacher's name.
Here is my current query:
{
"aggs": {
"teacher": {
"terms": {
"field": "teacher_id",
"size": 10
},
"aggs": {
"academic_year": {
"date_histogram": {
"field": "acc_year",
"interval": "year"
},
"aggs": {
"income": {
"stats": {
"field": "teacher_hourly_fee"
}
}
}
}
}
}
},
"size": 0
}

The most straightforward approach may be to combine the teacher ID and name into a single generated term using a script:
{
"aggs" : {
"teacher" : {
"terms" : {
"script" : "_source.teacher_id + '-' + _source.teacher_name",
"size": 10
}
}
}
}
Adjust the script particulars (field names, separator, scripting syntax) to match your actual schema and Elasticsearch version.
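Another option, if you would rather keep teacher_id as the bucket key, is a top_hits sub-aggregation that returns one document per teacher bucket, from which the name can be read. A minimal Python sketch, assuming every document carries a `teacher_name` field (that field name, the `teacher_details` aggregation name, and the `teacher_names` helper are all made up for illustration). The query is built as a plain dict, so it can be sent with whatever client you use:

```python
import json

# Same terms aggregation as in the question, plus a top_hits sub-aggregation
# that returns one source document per teacher bucket.
# "teacher_name" is an assumed field name; adjust it to the real mapping.
query = {
    "size": 0,
    "aggs": {
        "teacher": {
            "terms": {"field": "teacher_id", "size": 10},
            "aggs": {
                "teacher_details": {
                    "top_hits": {"size": 1, "_source": ["teacher_name"]}
                }
            },
        }
    },
}


def teacher_names(response):
    """Map each teacher_id bucket key to the name found in its top hit."""
    names = {}
    for bucket in response["aggregations"]["teacher"]["buckets"]:
        hits = bucket["teacher_details"]["hits"]["hits"]
        if hits:
            names[bucket["key"]] = hits[0]["_source"].get("teacher_name")
    return names


print(json.dumps(query, indent=2))
```

The academic_year sub-aggregation from the question can sit next to teacher_details under the same "aggs" key.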

Related

Query returns result with small size that is not my intention in elasticsearch

I am using the REST API to query results from Elasticsearch.
Below is the query.
GET /..../_search
{
"size":0,
"query": {
"bool": {
"must": [
{ "range": {
"#timestamp": {
"time_zone": "+09:00",
"gte": "2023-01-24T00:00:00.000Z",
"lt": "2023-01-24T03:03:00.000Z" } } },
{
"term" : {
"serviceid.keyword" : {
"value" : "430011397"
}
}
}
]
}
},
"aggs": {
"by_day": {
"auto_date_histogram": {
"field": "#timestamp",
"minimum_interval":"minute"
},
"aggs": {
"agg-type": {
"terms": {
"field": "nxlogtype.keyword",
"size": 100000
},
"aggs": {
"my-sub-agg-name": {
"avg": {
"field": "size"
}
}
}
}
}
}
}
}
As you can see, I specified a time range of about three hours in the gte and lt fields.
However, the result contains only 6 buckets, each covering a 30-minute interval.
I expected many buckets at one-minute intervals across the time range I specified, but the result is always the same, even when I extend the time range.
Since I am quite new to Elasticsearch, I am not familiar with the query usage.
How can I resolve this issue?
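A likely explanation, offered as a sketch rather than a confirmed diagnosis: auto_date_histogram treats its `buckets` parameter (10 by default) as a target and picks an interval that keeps the bucket count at or below it, while `minimum_interval` only sets a lower bound. For a three-hour range that would lead to 30-minute buckets. If fixed one-minute buckets are what you want, a regular date_histogram with `fixed_interval` is one way to get them. The helper name `per_minute_histogram` and the aggregation name `by_minute` below are made up; field names are copied from the question:

```python
import json


def per_minute_histogram(field, sub_aggs=None):
    """Build a date_histogram aggregation with fixed one-minute buckets."""
    agg = {"date_histogram": {"field": field, "fixed_interval": "1m"}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


# Field and sub-aggregation names are taken from the question as posted.
by_minute = per_minute_histogram(
    "#timestamp",
    {
        "agg-type": {
            "terms": {"field": "nxlogtype.keyword", "size": 100000},
            "aggs": {"my-sub-agg-name": {"avg": {"field": "size"}}},
        }
    },
)

print(json.dumps({"by_minute": by_minute}, indent=2))
```

The printed object would replace the "by_day" entry under "aggs" in the original request. Raising `buckets` on auto_date_histogram is the other route, if you want to keep the automatic interval selection.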

Elasticsearch question, How to order by inner aggregate of date_histogram aggregation?

Thanks for checking the question.
My query is below:
"size":0,
"aggs": {
"result" : {
"date_histogram": {
"field": "time",
"calendar_interval": "day"
},
"aggs": {
"user": {
"terms": {
"field": "user.number"
},
"aggs" : {
"privacy_types": {
"nested": {
"path": "list"
},
"aggs": {
"totalCnt": {
"sum": {
"field": "list.count"
}
}
}
}
}
}
}
}
}
My result was attached as an image (not reproduced here).
I want to group by date and user.number, and sort by totalCnt.
My query does not return the desired result.
How can I get it to work properly?
I've been struggling with this for 3 days, please help :(
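One approach that may work here, as a sketch under assumptions: a terms aggregation can be ordered by a metric that sits below a single-bucket sub-aggregation (nested is one of those) using the `>` path syntax, which sorts the users inside each day. If the sort has to run across all days, the rows can be flattened and sorted on the client. The query mirrors the one in the question; the helper name `rows_sorted_by_total` is made up:

```python
# Query from the question, with an "order" added to the terms aggregation.
query = {
    "size": 0,
    "aggs": {
        "result": {
            "date_histogram": {"field": "time", "calendar_interval": "day"},
            "aggs": {
                "user": {
                    "terms": {
                        "field": "user.number",
                        # Sort user buckets by the nested sum, highest first.
                        "order": {"privacy_types>totalCnt": "desc"},
                    },
                    "aggs": {
                        "privacy_types": {
                            "nested": {"path": "list"},
                            "aggs": {
                                "totalCnt": {"sum": {"field": "list.count"}}
                            },
                        }
                    },
                }
            },
        }
    },
}


def rows_sorted_by_total(response):
    """Flatten (date, user, totalCnt) rows and sort them across all days."""
    rows = []
    for day in response["aggregations"]["result"]["buckets"]:
        for user in day["user"]["buckets"]:
            total = user["privacy_types"]["totalCnt"]["value"]
            rows.append((day["key_as_string"], user["key"], total))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The server-side order keeps the day grouping intact; the client-side helper trades that grouping for one global ranking.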

Elasticsearch - get N top items in group

I keep data in Elasticsearch with the following structure:
"_source" : {
"artist" : "Roger McGuinn",
"track_id" : "TRBIACM128F930021A",
"title" : "The Bells Of Rhymney",
"score" : 0,
"user_id" : "61583201a0b70d3f7ed79b60",
"timestamp" : 1634991817
}
How can I get the top N songs with the best score for each user? If a user has rated a song several times, I would like to take only the most recent rating into account.
This is what I have so far, but instead of the top 10 songs for the user, I just get the first 10 songs found, without the score being taken into account:
{
"size": 0,
"aggs": {
"group_by_user": {
"terms": {
"field": "user_id.keyword",
"size": 1
},
"aggs": {
"group_by_track": {
"terms": {
"field": "track_id.keyword"
},
"aggs": {
"take_the latest_score": {
"terms": {
"field": "timestamp",
"size": 1
},
"aggs": {
"take N tracks": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
}
}
As I understand it, you want to return a list of valid users with their highest-rated track, based on date/time.
You can use a Date Histogram aggregation followed by a Terms aggregation, under which you can add a Top Hits aggregation:
Aggregation query:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"songs_over_time": {
"date_histogram": {
"field": "timestamp",
// Note this: change to "1d" to return results on a daily basis
"fixed_interval": "1h",
"min_doc_count": 1
},
"aggs": {
"group_by_user": {
"terms": {
"field": "user_id.keyword",
// Note this: returns 10 users
"size": 10
},
"aggs": {
"take N tracks": {
"top_hits": {
"sort": [
{
"score": {
// Also note this: sorts by score
"order": "desc"
}
}],
"_source": {
// Returns only track_id and score
"includes": ["track_id", "score"]
},
"size": 1
}
}
}
}
}
}
}
}
For example, since I'm using a fixed_interval of 1h, this returns, for every hour, the highest-rated track of each valid user in that hour.
Feel free to filter the documents with a Range query before running the aggregation above.
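If the "only the most recent rating counts" rule from the question also has to hold, one option is to post-process the matching documents on the client. A minimal Python sketch, assuming `docs` is a list of `_source` dicts shaped like the one in the question (the function name `top_tracks_per_user` is made up):

```python
from collections import defaultdict


def top_tracks_per_user(docs, n):
    """Keep each user's latest rating per track, then return the n best."""
    latest = {}
    for doc in docs:
        key = (doc["user_id"], doc["track_id"])
        # A newer timestamp replaces an older rating of the same track.
        if key not in latest or doc["timestamp"] > latest[key]["timestamp"]:
            latest[key] = doc

    per_user = defaultdict(list)
    for doc in latest.values():
        per_user[doc["user_id"]].append(doc)

    # Highest score first, truncated to n tracks per user.
    return {
        user: sorted(items, key=lambda d: d["score"], reverse=True)[:n]
        for user, items in per_user.items()
    }
```

This pulls every matching document to the client, so it only suits result sets that fit comfortably in memory.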

How to detect the number of days that a person passed in a city?

I have the following mapping in Elasticsearch:
PUT /traffic-data
{
"mappings": {
"traffic-entry": {
"_all": {
"enabled": false
},
"properties": {
"CameraId": {
"type":"keyword"
},
"VehiclePlateNumber": {
"type":"keyword"
},
"DateTime": {
"type":"date"
}
}
}
}
}
I want to calculate how many days per month a vehicle has stayed. A unique vehicle is identified by VehiclePlateNumber.
So I want to get a result something like this:
VehiclePlateNumber  Month  StayDays
111                 1      5
222                 1      1
...
How can I do this with an Elasticsearch query?
This is what I tried:
GET traffic-data/_search
{
"size": 0,
"aggs":{
"by_district":{
"terms": {
"field": "VehiclePlateNumber",
"size": 100000
},
"aggs": {
"by_month": {
"terms": {
"field": "DateTime",
"size": 12
}
}
}
}
}
}
You can do a terms aggregation on the vehicle plate number, then a terms sub-aggregation on the month, then a sum sub-aggregation on the days.
Something like:
GET traffic-data/_search
{
"size": 0,
"aggs":{
"by_district":{
"terms": {
"field": "VehiclePlateNumber",
"size": 100000
},
"aggs": {
"by_month": {
"terms": {
"field": "DateTime",
"size": 12
},
"aggs": {
"days": {
"sum": {
"field": "days"
}
}
}
}
}
}
}
}
The month should be a scripted field, but it would be better to compute it at index time.
That should work.
Alternatively, you can use an entity-centric design and regularly index the computed value. See https://www.elastic.co/elasticon/2015/sf/building-entity-centric-indexes
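If there is no precomputed days field, a variation that might work is to nest a daily date_histogram under a monthly one and count the non-empty day buckets on the client. This is only a sketch: it assumes that "stay days" means distinct calendar days with at least one sighting, it uses the older `interval` key to match the mapping style in the question (newer versions use `calendar_interval`), and the aggregation names and the `stay_days` helper are made up:

```python
# terms on plate -> monthly histogram -> daily histogram (non-empty days only)
query = {
    "size": 0,
    "aggs": {
        "by_vehicle": {
            "terms": {"field": "VehiclePlateNumber", "size": 100000},
            "aggs": {
                "by_month": {
                    "date_histogram": {
                        "field": "DateTime",
                        "interval": "month",
                        "format": "yyyy-MM",
                    },
                    "aggs": {
                        "by_day": {
                            "date_histogram": {
                                "field": "DateTime",
                                "interval": "day",
                                # Skip days without any sighting.
                                "min_doc_count": 1,
                            }
                        }
                    },
                }
            },
        }
    },
}


def stay_days(response):
    """Return (plate, month, number of days with sightings) rows."""
    rows = []
    for vehicle in response["aggregations"]["by_vehicle"]["buckets"]:
        for month in vehicle["by_month"]["buckets"]:
            days = len(month["by_day"]["buckets"])
            if days:  # months with no sightings are left out
                rows.append((vehicle["key"], month["key_as_string"], days))
    return rows
```

With 100000 plates this produces a large response, so filtering by date range or paging through plates first is probably wise.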

Elasticsearch derivative of a deep metric

I have a web crawler that collects data and stores snapshots several times a day. My query has some aggregations that group the snapshots together per day and return the last snapshot of each day using top_hits.
The documents look like this:
"_source": {
"taken_at": "2016-02-01T11:27:09.184-03:00",
... ,
"my_metric": 113
}
I'd like to be able to calculate the derivative of a certain metric, say my_metric, of the documents returned by top_hits (i.e., the derivative of the last snapshots of each day's my_metric).
Here's what I have so far:
{
"aggs": {
"filtered_snapshots": {
"filter": {
// ...
},
"aggs" : {
"grouped_data": {
"date_histogram": {
"field": "taken_at",
"interval": "day",
"format": "YYYY-MM-dd",
"order": { "_key" : "asc" }
},
"aggs": {
"resource_by_date": {
"terms": { "field": "remote_id" },
"aggs": {
"latest_snapshots": {
"top_hits": {
"sort": { "taken_at": { "order": "asc" }},
"size" : 1
}
}
}
},
"my_metric_deriv": {
"derivative": {
"buckets_path": "resource_by_date>latest_snapshots>my_metric"
}
}
}
}
}
}
}
}
I get a "No aggregation [my_metric] found for path ..." error with the query above.
Am I using the wrong buckets_path? I've read through the buckets_path and derivative documentation and haven't found much that could help.
The documentation mentions briefly "deep metrics", stating that they can be limited in some ways, which I couldn't quite understand. I'm not sure how or if the limitations affect my case.
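As far as I can tell, buckets_path is meant to point at numeric metric aggregations, not at fields inside the documents returned by top_hits, which would be consistent with the error above. One possible workaround, sketched here under that assumption, is to read the last snapshot's my_metric for each day from the response and compute the differences on the client. The function name `derivative` and the (date, value) input shape are made up for illustration:

```python
def derivative(points):
    """Return (date, value - previous value) pairs for the given points.

    `points` is an iterable of (date_string, value) pairs, one per day,
    with dates in a sortable format such as YYYY-MM-dd.
    """
    ordered = sorted(points)
    return [
        (curr_date, curr_val - prev_val)
        for (_, prev_val), (curr_date, curr_val) in zip(ordered, ordered[1:])
    ]
```

Like the derivative aggregation, this yields no value for the first day, since there is nothing to subtract from.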
