Elasticsearch aggregation with date comparison and calculation - elasticsearch

Hi, I am fairly new to Elasticsearch. I need an aggregation with a date comparison and a dynamic range filter.
Specifically, I need the count of documents whose created_at is within 1 week before their identification_date.
So I tried something like this, but my date param seems to be unused: changing it never changes my results.
"aggs": {
  "identified": {
    "terms": {
      "script": "doc['created_at'].value > (doc['identification_date'].value - diff_date) && doc['created_at'].value < doc['identification_date'].value",
      "params": {
        "diff_date": 604800
      }
    }
  }
}
Thank you for taking the time to help.

At an abstract level, you just want the count of documents created within the 7 days before some date. You don't need an aggregation for this unless you want to group the result set further by some field. You can simply use a range filter/query on your date field:
{
  "range" : {
    "date" : {
      "gte" : "now-7d/d",
      "lt" : "now"
    }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
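For the field-to-field comparison in the question, one thing worth checking is the unit of the offset. The sketch below assumes that date values reach the script as epoch milliseconds (true of older Elasticsearch versions, but an assumption here), in which case one week is 604800000 rather than 604800. The helper names and sample documents are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 604800000 ms; 604800 would be one week in seconds


def to_epoch_ms(iso_utc: str) -> int:
    """Convert a UTC string such as 2015-12-21T10:00:00Z to epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def created_within_week_before(created_ms: int, identified_ms: int, diff_ms: int = WEEK_MS) -> bool:
    """Same comparison as the script in the question, with every value in milliseconds."""
    return identified_ms - diff_ms < created_ms < identified_ms


# Hypothetical documents, for illustration only.
docs = [
    {"created_at": "2015-12-18T10:00:00Z", "identification_date": "2015-12-21T10:00:00Z"},  # 3 days earlier
    {"created_at": "2015-12-01T10:00:00Z", "identification_date": "2015-12-21T10:00:00Z"},  # 20 days earlier
]

count = sum(
    created_within_week_before(to_epoch_ms(d["created_at"]), to_epoch_ms(d["identification_date"]))
    for d in docs
)
print(count)
```

With the offset given in seconds instead (604800), the window shrinks to roughly ten minutes, so the first sample document would no longer match.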

Related

@timestamp range query in Elasticsearch

Can I make a range query on the default timestamp field that ignores the date and uses only the time of day, say 2 hours of each day?
My intention is to search all documents but exclude those indexed between 9 PM and 12 AM (I have only seen examples of filtering with date ranges).
An example timestamp looks like this:
"@timestamp": [
  "2015-12-21T15:18:17.120Z"
]
Elasticsearch version: 1.5.2
My first idea would be to use the date math in Elasticsearch query, e.g. if you run your query at 1PM, this would work:
{
  "query": {
    "range" : {
      "@timestamp" : {
        "gte": "now-16h/h",
        "lte": "now-1h/h"
      }
    }
  }
}
(Watch out for the timezone, though.)
As far as I know, the only other possibility would be to use scripting.
Please also note that you are running a very old version of Elasticsearch.
Edit: If you simply need absolute dates, check how your @timestamp field looks and use the same format. For instance, on my Elasticsearch it would be:
{
  "query": {
    "range" : {
      "@timestamp" : {
        "gte": "2015-03-20T01:21:00.01Z",
        "lte": "2015-03-21T01:12:00.04Z"
      }
    }
  }
}
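To make the time-of-day idea concrete, here is a small client-side sketch of the check that a script-based approach would have to perform: look only at the hour of each timestamp and drop anything between 9 PM and midnight. It is plain Python rather than an Elasticsearch script, the function name is hypothetical, and the hours are taken as-is in UTC, so the timezone caveat above still applies.

```python
from datetime import datetime


def in_excluded_window(timestamp: str, start_hour: int = 21, end_hour: int = 24) -> bool:
    """True when the time-of-day part of the timestamp falls in [start_hour, end_hour)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return start_hour <= parsed.hour < end_hour


# Sample values in the same format as the @timestamp shown in the question.
timestamps = [
    "2015-12-21T15:18:17.120Z",
    "2015-12-21T21:30:00.000Z",
    "2015-12-22T23:59:59.999Z",
    "2015-12-23T00:00:00.000Z",
]

# Keep everything except documents indexed between 9 PM and 12 AM, on any day.
kept = [t for t in timestamps if not in_excluded_window(t)]
print(kept)
```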

How to calculate difference between two datetime in ElasticSearch

I'm working with ES and I need a query that returns the difference between two datetimes (like MySQL's TIMEDIFF), but I have not found any ES function that does that. Can someone help me?
MySQL Query
SELECT SEC_TO_TIME(
  AVG(
    TIME_TO_SEC(
      TIMEDIFF(r.acctstoptime, r.acctstarttime)
    )
  )
) AS average_access
FROM radacct r
Thanks!
Your best bet is scripted fields. The search query below should work, provided you have enabled dynamic scripting and these date fields are defined as date in the mapping.
{
  "script_fields": {
    "test1": {
      "script": "doc['acctstoptime'].value - doc['acctstarttime'].value"
    }
  }
}
Note that the result is a difference between two epoch values (milliseconds for date fields), which you need to convert to the unit you want.
You can read about scripted fields here and see some examples here.
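The conversion step mentioned above could be done client-side. The sketch below assumes the scripted field returns non-negative differences in milliseconds, averages them, and formats the result as HH:MM:SS, roughly what SEC_TO_TIME(AVG(...)) produces in the MySQL query. The function names and sample values are hypothetical.

```python
def ms_to_hms(milliseconds: int) -> str:
    """Format a non-negative duration in milliseconds as HH:MM:SS (fractions of a second dropped)."""
    total_seconds = milliseconds // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def average_access(diffs_ms: list) -> str:
    """Average a non-empty list of millisecond differences and format the result."""
    return ms_to_hms(sum(diffs_ms) // len(diffs_ms))


# Hypothetical values of the scripted field: sessions of 1 hour and 1.5 hours.
diffs = [3_600_000, 5_400_000]
print(average_access(diffs))  # 01:15:00
```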
Here is another example using scripts. It converts the dates to milliseconds since the epoch, subtracts the two, and converts the result into the number of days between the two dates.
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "priorTransactionDate"
          }
        },
        {
          "script": {
            "script": "(doc['transactionDate'].date.millis - doc['priorTransactionDate'].date.millis)/1000/86400 < 365"
          }
        }
      ]
    }
  }
}

Elastic Search filter with aggregate like Max or Min

I have simple documents with a ScheduleId. I would like to get the count of documents for the most recent ScheduleId. Assuming the max ScheduleId is the most recent, how would we write that query? I have been searching and reading for a few hours and could not get it to work.
{
  "aggs": {
    "max_schedule": {
      "max": {
        "field": "ScheduleId"
      }
    }
  }
}
That gets me the max ScheduleId, plus the total count of documents outside of that aggregation.
I would appreciate it if someone could help me take this aggregate value and apply it as a filter (like a subquery in SQL).
This should do it:
{
  "aggs": {
    "max_ScheduleId": {
      "terms": {
        "field": "ScheduleId",
        "order" : { "_term" : "desc" },
        "size": 1
      }
    }
  }
}
The terms aggregation will give you document counts for each term, and it works for integers. You just need to order the results by the term instead of by the count (the default). And since you only want the highest ScheduleId, "size": 1 is adequate.
Here is the code I used to test it:
http://sense.qbox.io/gist/93fb979393754b8bd9b19cb903a64027cba40ece
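As a rough picture of what that aggregation computes, the sketch below does the same thing in plain Python over a hypothetical list of documents: count documents per ScheduleId, order by the term descending, and keep only the first bucket. It assumes the list is non-empty and is not a substitute for running the aggregation on the cluster.

```python
from collections import Counter


def count_for_max_schedule(docs: list) -> tuple:
    """Return (highest ScheduleId, number of documents with that ScheduleId)."""
    counts = Counter(doc["ScheduleId"] for doc in docs)  # one bucket per term, like a terms aggregation
    top_term = max(counts)                               # order by term descending, size 1
    return top_term, counts[top_term]


# Hypothetical documents, for illustration only.
docs = [
    {"ScheduleId": 1},
    {"ScheduleId": 3},
    {"ScheduleId": 2},
    {"ScheduleId": 3},
    {"ScheduleId": 1},
]

print(count_for_max_schedule(docs))  # (3, 2)
```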

Elastic Search Date Range

I have a query that properly parses date ranges. However, all dates in my database have a default time of 00:00:00. This means that items that are still valid today are shown as expired. How can I adjust the following to look at just the date, and not the time, of the item (expirationDate)?
{
  "range": {
    "expirationDate": {
      "gte": "now"
    }
  }
}
An example of the data is:
"expirationDate": "2014-06-24T00:00:00.000Z",
Did you look into the different format options for dates stored in Elasticsearch? If that does not work for you, or you don't want to store dates without the time, you can try this query, which I guess will work for your exact use case:
{
  "range": {
    "expirationDate": {
      "gt": "now-1d"
    }
  }
}
You can also round down the time so that your query returns anything that occurred since the beginning of the day:
Assuming that
now is 2017-03-07T07:00:00.000,
now/d is 2017-03-07T00:00:00.000
Your query would be:
{
  "range": {
    "expirationDate": {
      "gte": "now/d"
    }
  }
}
elastic search documentation on rounding times
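The effect of rounding down can be checked with the values from the example above. This is a plain Python sketch of the comparison, not Elasticsearch code; the function name is hypothetical and timezones are ignored.

```python
from datetime import datetime


def round_down_to_day(moment: datetime) -> datetime:
    """Equivalent in spirit to the /d rounding: drop the time-of-day part."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2017, 3, 7, 7, 0, 0)          # "now" from the example above
start_of_day = round_down_to_day(now)        # corresponds to now/d
expiration = datetime(2017, 3, 7, 0, 0, 0)   # an item that expires today, stored at 00:00:00

print(expiration >= now)           # False: "gte": "now" treats the item as expired
print(expiration >= start_of_day)  # True: "gte": "now/d" keeps it
```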

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
  "facets": {
    "notfound": {
      "query": {
        "term": {
          "statusCode": {
            "value": 404
          }
        }
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2014-04-05T05:25:37",
              "to": "2014-04-07T05:25:37"
            }
          }
        }
      ]
    }
  }
}
In this specific case, the total hits of the search is 21 documents, which matches the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which matches the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collect data from within the search. In this case, the "notfound" facet should never be able to return a count higher than 21.
What am I doing wrong here?
There are distinct differences between a filter, a query, a filtered query, and a facet filter that are good to know.
Top level filter
{
  "filter": {}
}
This acts as a post-filter, meaning it filters the results after the query phase has ended. Since facets are part of the query phase, top-level filters do not influence the documents that are faceted over. Filters do not alter the score and are therefore very cacheable.
Top level query
{
  "query": {}
}
Queries influence the score of a document and are therefore less cacheable than filters. Queries run in the query phase and thus also influence the documents that are faceted over.
Filtered query
{
  "query": {
    "filtered": {
      "filter": {},
      "query": {}
    }
  }
}
This allows you to run filters in the query phase, taking advantage of their better cacheability while having them influence the documents that are faceted over.
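The difference between the three placements described so far can be simulated in a few lines of Python. This is only a toy model of the behaviour as explained here (facets are computed over the query-phase documents, a top-level filter is applied afterwards); the documents, the day field, and the function names are hypothetical.

```python
# Hypothetical documents: a status code and a day number standing in for the time field.
docs = [
    {"statusCode": 404, "day": 1},
    {"statusCode": 404, "day": 5},
    {"statusCode": 200, "day": 5},
    {"statusCode": 404, "day": 6},
    {"statusCode": 404, "day": 9},
]


def in_range(doc: dict) -> bool:
    return 5 <= doc["day"] <= 7


def is_404(doc: dict) -> bool:
    return doc["statusCode"] == 404


def search(all_docs: list, query=None, post_filter=None) -> tuple:
    """Return (hit count, facet count) for a toy two-phase search."""
    matched = [d for d in all_docs if query is None or query(d)]          # query phase
    facet_count = sum(1 for d in matched if is_404(d))                     # facets see query-phase docs
    hits = [d for d in matched if post_filter is None or post_filter(d)]   # top-level filter runs last
    return len(hits), facet_count


print(search(docs, post_filter=in_range))  # (3, 4): the facet ignores the time range
print(search(docs, query=in_range))        # (3, 2): the facet respects the time range
```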
Facet filter
"facets" : {
  "<FACET NAME>" : {
    "<FACET TYPE>" : {
      ...
    },
    "facet_filter" : {
      "term" : { "user" : "kimchy" }
    }
  }
}
This allows you to apply a filter to the documents that the facet is run over. Remember that the facet then sees the combination of the query phase and the facet filter, unless you also specify global: true on the facet.
Query Facet/Filter Facet
{
  "facets" : {
    "wow_facet" : {
      "query" : {
        "term" : { "tag" : "wow" }
      }
    }
  }
}
This is the one that @thomasardal is using in this case, which is perfectly fine: it's a facet type that returns a single value, the query hit count.
Your query facet returns 38 and not 21 because you use a top-level filter for your time range.
You can fix this either by moving the filter into a filtered query in the query phase, or by applying a facet filter (not a filter facet) to your query facet. Because filters are cached better, you are better off using a facet filter inside a filter facet.
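As a sketch of the first option, the request from the question could be rebuilt with the time range moved into a filtered query. The Python below only assembles and prints the request body. It assumes an older cluster where facets and the filtered query are still available, and the function name is hypothetical; the field names and values are taken from the question.

```python
import json


def build_request(time_from: str, time_to: str) -> dict:
    """Request body with the range filter inside a filtered query, so the facet sees it."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {"time": {"from": time_from, "to": time_to}}
                }
            }
        },
        "facets": {
            "notfound": {
                "query": {"term": {"statusCode": {"value": 404}}}
            }
        },
    }


body = build_request("2014-04-05T05:25:37", "2014-04-07T05:25:37")
print(json.dumps(body, indent=2))
```

Because the range now runs in the query phase, the "notfound" count should be limited to documents inside the time window.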
Confusingly Filter Facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly: .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
    .FacetTerm(ft => ft
        .OnField("myfield")
        .FacetFilter(f => f.Term("filter_facet_on_this_field", "value"))
    )
);
Your issue here is that you are performing a Filter Facet and not a normal facet on your query (which would follow the restrictions applied via the query filter). In the JSON, the issue is the "query" between the facet name "notfound" and the "term" entry. This tells Elasticsearch to run this as a separate query and facet on the results of that separate query, not on your main query with the date range filter. So your JSON should look like the following:
{
  "facets": {
    "notfound": {
      "term": {
        "statusCode": {
          "value": 404
        }
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2014-04-05T05:25:37",
              "to": "2014-04-07T05:25:37"
            }
          }
        }
      ]
    }
  }
}
Since I see you have this tagged with NEST as well: in your call using NEST, you are probably using FacetFilter on your search request. Switch this to just Facet to get the desired result.
