Elasticsearch stats aggregation group by date on timeseries - elasticsearch

I am having some trouble getting a query to work. I want to aggregate a weather station's time series data in Elasticsearch. I have a value (double) for each day of the year. I would like a query that returns the sum, min, and max of my value field, grouped by month.
My document has a stationid field and a timeseries object array:
PUT /stations/rainfall/2
{
"stationid":"5678",
"timeseries": [
{
"value": 91.3,
"date": "2016-05-01"
},
{
"value": 82.2,
"date": "2016-05-02"
},
{
"value": 74.3,
"date": "2016-06-01"
},
{
"value": 34.3,
"date": "2016-06-02"
}
]
}
So I am hoping to be able to query by stationid "5678" (or by document ID 2)
and see: stationid: 5678, monthlystats: [ month:5, avg:x, sum:y, max:z ]
Many thanks in advance for any help. Also happy to take any advice on my document structure too.
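One way to approach this, sketched under assumptions rather than taken from the thread: if `timeseries` is mapped as a `nested` field (the question does not show a mapping), a `nested` aggregation wrapping a `date_histogram` with a `stats` sub-aggregation returns min, max, sum, and avg per month. With the default object mapping, the dates and values of the array are flattened and lose their pairing, so per-month stats would mix readings. The function names below are hypothetical, and `interval` is the older spelling of the key that newer versions call `calendar_interval`. The plain-Python function only mirrors what the buckets should contain for the sample document.

```python
from collections import defaultdict


def monthly_stats_request(station_id):
    """Build a search body; assumes `timeseries` is mapped as type `nested`."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {"stationid": station_id}},
        "aggs": {
            "readings": {
                "nested": {"path": "timeseries"},
                "aggs": {
                    "by_month": {
                        # Newer Elasticsearch versions spell this key
                        # "calendar_interval" instead of "interval".
                        "date_histogram": {
                            "field": "timeseries.date",
                            "interval": "month",
                        },
                        "aggs": {
                            "monthly_stats": {
                                "stats": {"field": "timeseries.value"}
                            }
                        },
                    }
                },
            }
        },
    }


def monthly_stats_local(timeseries):
    """Reference computation in plain Python, grouped by "YYYY-MM"."""
    groups = defaultdict(list)
    for point in timeseries:
        groups[point["date"][:7]].append(point["value"])
    return {
        month: {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
        for month, values in sorted(groups.items())
    }


sample = [
    {"value": 91.3, "date": "2016-05-01"},
    {"value": 82.2, "date": "2016-05-02"},
    {"value": 74.3, "date": "2016-06-01"},
    {"value": 34.3, "date": "2016-06-02"},
]
body = monthly_stats_request("5678")
stats = monthly_stats_local(sample)
```

If changing the mapping is not an option, storing one document per daily reading (stationid, date, value) would allow the same `date_histogram` and `stats` combination without the `nested` wrapper.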

Related

ElasticSearch. How can I get one document without counting all documents by filter?

I want to get any one document matching a filter, if one exists, and I do not want Elasticsearch to count how many documents match the filter.
For example, given these docs:
{"name": "dima", "age": 15},
{"name": "amid", "age": 15}
I want one document (size=1) where age is 15, and I don't want Elasticsearch to waste time counting all the matches.
I do not need this:
"hits": {
"total": {
"value": 2,
...
},
You can add a size field to tell Elasticsearch how many docs to return. If you only want one, set size to 1. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
For example:
GET my-index/_search
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"term": {
"age": "15"
}
}
]
}
}
}
Alternatively, you can specify the size as a query string parameter:
my-index/_search?size=1
I understand that this is not exactly what you want, because you'd rather Elasticsearch stop looking as soon as it finds the first match, but I don't think that is a possibility. Check out the conversation on the Elastic forum here: https://discuss.elastic.co/t/how-to-stop-searching-on-first-match/15507/3
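Depending on the Elasticsearch version, two search-body options may get closer to what the question asks. Setting `track_total_hits` to `false` (available from 7.0, which matches the `"total": {"value": ...}` response shape shown in the question) skips the exact hit count, and `terminate_after` caps how many documents each shard collects before stopping early. A minimal sketch: the function name is hypothetical, and it only builds the request body, so sending it is left to whichever client is in use.

```python
import json


def first_match_request(field, value):
    """Build a body that asks for one hit and no exact total count."""
    return {
        "size": 1,
        "track_total_hits": False,  # do not compute the exact number of matches
        "terminate_after": 1,       # each shard stops after collecting one doc
        "query": {"bool": {"filter": [{"term": {field: value}}]}},
    }


body = first_match_request("age", 15)
payload = json.dumps(body)  # JSON text to send to my-index/_search
```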

Elastic search - Retrieve data from multiple types

I am working with elasticsearch and I have two types which hold my data.
Now I need to retrieve data from both types by using a single query.
Please see my use case
I have two types called basic and marks, and I saved documents as follows:
myindex/basic
{ "id": "100", "name": "Tom" }
myindex/basic
{ "id": "101", "name": "John" }
myindex/marks
{ "id": "100", "mark": "300" }
myindex/marks
{ "id": "101", "mark": "500" }
Now I need to get the name and mark of the student whose id is 100.
Is there any way to get a result like this?
I understand that this kind of data model is not a good fit for NoSQL, but I need it here because these records are replicated from an RDBMS.
Any suggestions, please? Thanks in advance.
You can query both types in a single query by listing them in the URL:
POST myindex/basic,marks/_search
You can also filter all of them by id:
POST myindex/basic,marks/_search
{
"query": {
"bool": {
"must": [
{"term": {
"id": {
"value": 100
}
}}
]
}
}
}
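The search above returns one hit per type, so the name and the mark still arrive as separate documents. A minimal sketch of combining them on the client side, assuming the usual `hits.hits[]._source` response layout; the `merge_hits_by_id` helper and the sample response are hypothetical.

```python
def merge_hits_by_id(response):
    """Merge the _source of every hit that shares the same "id" value."""
    merged = {}
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        merged.setdefault(source["id"], {}).update(source)
    return merged


# Hypothetical response for id 100: one hit from each type.
sample_response = {
    "hits": {
        "hits": [
            {"_type": "basic", "_source": {"id": "100", "name": "Tom"}},
            {"_type": "marks", "_source": {"id": "100", "mark": "300"}},
        ]
    }
}
students = merge_hits_by_id(sample_response)
```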

Checking "never seen" values in Elasticsearch

I'm using ES 5.x to index syslog messages with a timestamp.
At the end of the day I need to run a query to find out, for a given field, which values have never been seen before in my index history.
Any ideas on how to achieve this efficiently?
As an example,
suppose that on 2017/06/19 the following document was indexed:
{
"text": "hello",
"date": "2017/06/19"
}
Now, on 2017/06/20 the following documents were indexed:
{
"text": "hello",
"date": "2017/06/20"
}
{
"text": "world",
"date": "2017/06/20"
}
{
"text": "from",
"date": "2017/06/20"
}
{
"text": "Europe",
"date": "2017/06/20"
}
At 23:59 on 2017/06/20 I want to know which new values for the text field were discovered today. I'm wondering if there is a better solution than taking each value and querying the text field with a range filter.
The query should return "world", "from", "Europe".
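One possible approach, sketched here rather than taken from the thread: ask for the earliest date per distinct value with a `terms` aggregation and a `min` sub-aggregation, then keep only the values whose earliest date falls on the current day. This assumes `text` is indexed as a keyword (not analyzed) field and `date` as a date field; the function names and the `max_values` limit are hypothetical, and a field with very many distinct values would make this aggregation expensive. The second function does the same grouping in plain Python on the example documents.

```python
def first_seen_request(field="text", date_field="date", max_values=10000):
    """Build a body that returns the earliest date for each distinct value."""
    return {
        "size": 0,
        "aggs": {
            "values": {
                "terms": {"field": field, "size": max_values},
                "aggs": {"first_seen": {"min": {"field": date_field}}},
            }
        },
    }


def new_values_on(docs, day):
    """Return values whose earliest occurrence is on `day` ("YYYY/MM/DD")."""
    first_seen = {}
    for doc in docs:
        text, date = doc["text"], doc["date"]
        # "YYYY/MM/DD" strings sort chronologically, so string comparison works.
        if text not in first_seen or date < first_seen[text]:
            first_seen[text] = date
    return sorted(value for value, date in first_seen.items() if date == day)


docs = [
    {"text": "hello", "date": "2017/06/19"},
    {"text": "hello", "date": "2017/06/20"},
    {"text": "world", "date": "2017/06/20"},
    {"text": "from", "date": "2017/06/20"},
    {"text": "Europe", "date": "2017/06/20"},
]
new_today = new_values_on(docs, "2017/06/20")
```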

Elasticsearch arithmetic and nested aggregation

I have objects like this in Elasticsearch:
"myobject": {
"type": "blah",
"events": [
{
"code": "code1",
"date": "2016-08-03 18:00:00"
},
{
"code": "code2",
"date": "2016-08-03 20:00:00"
}
]
}
I'd like to compute the average time spent between events with code "code1" and events with code "code2". Basically, I need to subtract the date of "code1" from the date of "code2" for each object and then compute the average.
Thanks for your help!
Plan B is definitely MUCH better. Anything you can do at indexing time, you should do. If you know you'll need that date difference, then you should compute it at indexing time and store it into another field.
You should definitely not worry about storing redundant data; Elasticsearch doesn't really care. Your cluster will be much better off storing a few more fields than doing heavy scripting during each query. Your users will appreciate it too, as they won't have to wait ages for an answer as your data grows.
So store this instead (time_spent is the number of milliseconds between the second and the first event):
"myobject": {
"type": "blah",
"time_spent": 7200000,
"events": [
{
"code": "code1",
"date": "2016-08-03 18:00:00"
},
{
"code": "code2",
"date": "2016-08-03 20:00:00"
}
]
}
Then you'll be able to run a simple aggregation query like this:
{
"size": 0,
"aggs": {
"avg_duration": {
"avg": {
"field": "time_spent"
}
}
}
}
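The index-time computation described above can be sketched as follows. This is a minimal illustration, assuming each object has exactly one event per code and that dates use the `YYYY-MM-DD HH:MM:SS` format from the example; the `with_time_spent` helper is hypothetical and would run in whatever process prepares documents for indexing.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def with_time_spent(doc, start_code="code1", end_code="code2"):
    """Return a copy of `doc` with `time_spent` in milliseconds added."""
    dates = {
        event["code"]: datetime.strptime(event["date"], DATE_FORMAT)
        for event in doc["events"]
    }
    delta = dates[end_code] - dates[start_code]
    enriched = dict(doc)  # shallow copy so the input is left unchanged
    enriched["time_spent"] = int(delta.total_seconds() * 1000)
    return enriched


myobject = {
    "type": "blah",
    "events": [
        {"code": "code1", "date": "2016-08-03 18:00:00"},
        {"code": "code2", "date": "2016-08-03 20:00:00"},
    ],
}
enriched = with_time_spent(myobject)
```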

Elasticsearch - how to do field collapsing and get Distinct results? (actual records, not just counters)

In our relational database, the data looks like this:
Company -> Department -> Office
Elasticsearch version of the same data (flattened):
{
"officeID": 123,
"officeName": "office 1",
"state": "CA",
"department": {
"departmentID": 456,
"departmentName": "Department 1",
"company": {
"companyID": 789,
"companyName": "Company 1"
}
}
},{
"officeID": 124,
"officeName": "office 2",
"state": "CA",
"department": {
"departmentID": 456,
"departmentName": "Department 1",
"company": {
"companyID": 789,
"companyName": "Company 1"
}
}
}
We need to find a department (or company) by providing office information (such as state).
For example, since all I need is department info, I can specify it like this (we are using NEST):
searchDescriptor = searchDescriptor.Source(x => x.Include("department"));
and get all departments with qualifying offices.
The problem is that I am getting multiple "department" records with the same id (one for each office).
We are using paging and sorting.
Is it possible to get paged and sorted distinct results?
I have spent a few days trying to find an answer (exploring options like facets, aggregations, top_hits, etc.), but so far the only working option I see is a manual one: get results from Elasticsearch, group the data manually, and pass it to the client. The problem with this approach is obvious: every time I fetch the next portion, I have to fetch X extra records in case some of them are duplicates. Since I don't know X in advance (and the number of such records could be huge), I am forced either to fetch lots of data unnecessarily (every time I search) or to hit the search engine several times until I have the required number of records.
So far I have been unable to achieve my goal using aggregations. All I get is a document count, but I want the actual data. When I try top_hits, I get data, but those are really top hits (sorted by number of offices per department, ignoring the sort I specified in the query). Here is an example of the code I tried:
searchDescriptor = searchDescriptor.Aggregations(a => a
.Terms("myunique",
t =>
t.Field("department.departmentID")
.Size(10)
.Aggregations(
x=>x.TopHits("mytophits",
y=>y.Source(true)
.Size(1)
.Sort(k => k.OnField("department.departmentName").Ascending())
)
)
)
);
Does anyone know if Elasticsearch can perform operations like Distinct and get unique records?
Update:
I can get results using top_hits (see below), but in this case I won't be able to use paging (it looks like the Elasticsearch aggregations feature doesn't support paging), so I am back to square one...
{
"from": 0,
"size": 33,
"explain": false,
"sort": [
{
"departmentID": {
"order": "asc"
}
}
],
"_source": {
"include": [
"department"
]
},
"aggs": {
"myunique": {
"terms": {
"field": "department.departmentID",
"order": {
"mytopscore": "desc"
}
},
"aggs": {
"mytophits": {
"top_hits": {
"size": 5,
"_source": {
"include": [
"department.departmentID"
]
}
}
},
"mytopscore": {
"max": {
"script": "_score"
}
}
}
}
},
"query": {
"wildcard" : { "officeName" : "some office*" }
}
}
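Later Elasticsearch releases (5.3 onward) added a `collapse` search option that keeps one hit per distinct field value while still honoring `from`, `size`, and `sort`, which may cover the paging requirement described here. A minimal sketch: it assumes `department.departmentID` is a single-valued keyword or numeric field with doc values and that `department.departmentName` is sortable (for example, a keyword field); the function name and parameters are hypothetical. Note that the reported total still counts the uncollapsed documents, so the number of distinct departments is not known up front.

```python
def distinct_departments_request(state, page=0, page_size=10):
    """Build a body that returns one hit per department, paged and sorted."""
    return {
        "from": page * page_size,
        "size": page_size,
        "_source": ["department"],  # return only the department object
        "query": {"match": {"state": state}},
        # Keep a single hit for each distinct department ID.
        "collapse": {"field": "department.departmentID"},
        "sort": [{"department.departmentName": {"order": "asc"}}],
    }


body = distinct_departments_request("CA", page=2, page_size=10)
```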
