Elasticsearch - Remove duplicate results in search

I don't know how to remove duplicate results that share the same value in one field.
My search query:
{
"query": {
"range": {
"endtime": {
"lt": "2017-02-09T20:00:00",
"gt": "2017-02-09T01:00:00"
}
}
}
}
In my results there is a field called "link" that often has the same value (e.g. https://www.facebook.com).
I would prefer a solution that works with my existing query.
Thanks.
Greetings!

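You can use a terms aggregation on the "link" field. Set "size" to 0 so that only the aggregation buckets come back, one per distinct link (search_type=count was removed in Elasticsearch 5.0). Replace /cars/transactions with your own index and type.
GET /cars/transactions/_search
{
"size": 0,
"query": {
"range": {
"endtime": {
"gt": "2017-02-09T01:00:00",
"lt": "2017-02-09T20:00:00"
}
}
},
"aggs": {
"distinct_links": {
"terms": {
"field": "link",
"size": 100
}
}
}
}
If "link" is mapped as an analyzed text field, aggregate on its keyword sub-field instead (for example "link.keyword", if your mapping has one).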
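If you need one full document per distinct link rather than only the link values, one possible approach is a top_hits sub-aggregation under the terms aggregation. The Python sketch below only builds the request body and reads a response. The function names, the "sample" aggregation name, and the response fragment are illustrative assumptions, and the body would be sent with whatever client you already use.

```python
def build_distinct_link_body(start, end, field="link", max_links=100):
    """Build a search body that returns one sample hit per distinct field value."""
    return {
        "size": 0,  # suppress the regular hits; only the buckets are needed
        "query": {"range": {"endtime": {"gt": start, "lt": end}}},
        "aggs": {
            "distinct_links": {
                "terms": {"field": field, "size": max_links},
                # top_hits with size 1 keeps a single document per bucket
                "aggs": {"sample": {"top_hits": {"size": 1}}},
            }
        },
    }


def one_doc_per_link(response):
    """Extract the single sample document of each bucket from a search response."""
    buckets = response["aggregations"]["distinct_links"]["buckets"]
    return [b["sample"]["hits"]["hits"][0]["_source"] for b in buckets]


# Hypothetical response fragment, shaped like an Elasticsearch aggregation result.
fake_response = {
    "aggregations": {
        "distinct_links": {
            "buckets": [
                {
                    "key": "https://www.facebook.com",
                    "doc_count": 3,
                    "sample": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "link": "https://www.facebook.com",
                                        "endtime": "2017-02-09T10:00:00",
                                    }
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}

body = build_distinct_link_body("2017-02-09T01:00:00", "2017-02-09T20:00:00")
docs = one_doc_per_link(fake_response)
```

Note that only the top max_links buckets are returned, so links beyond that limit are silently dropped.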

Related

ElasticSearch: Sort Aggregations by Filtered Average

I have an ElasticSearch index with documents structured like this:
"created": "2019-07-31T22:44:41.437Z",
"id": "2956",
"rating": 1
If I wish to create an aggregation of the id fields which is sorted on the average of the rating, that could be handled by:
{
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}
However, I'm looking to only factor in documents which have a created value that was within the last week (and then take the average of those rating fields).
My naive thoughts on this would be to apply a filter or range within the sort aggregation, but an aggregation cannot have multiple types, and looking through the avg documentation, I don't see a means to put it in the avg. Optimistically attempting to put range fields in the avg regardless of what the documentation says yielded no results (as expected).
How would I go about achieving this?
Try adding a bool query with a range clause to the request body, alongside the aggregation. Here <one_week_ago> is a placeholder for a date string that you compute in your client code:
{
"query": {
"bool": {
"must": {
"range": {
"created": {
"gte": "<one_week_ago>"
}
}
}
}
},
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}
You can also let Elasticsearch compute the date for you. Use the same query as in Tom's answer, but with the date math expression "now-7d/d":
{
"query": {
"bool": {
"must": {
"range": {
"created": {
"gte": "now-7d/d"
}
}
}
}
}
}
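If you prefer to compute the cutoff in client code instead of using date math, a minimal Python sketch could look like the following. The helper names are hypothetical, and the timestamp format is assumed to match the "created" values shown in the question (UTC with millisecond precision).

```python
from datetime import datetime, timedelta, timezone


def one_week_ago_iso(now=None):
    """Return the UTC timestamp seven days before `now`, e.g. 2019-07-31T22:44:41.437Z."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=7)
    # %f yields microseconds; drop the last three digits to get milliseconds
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_recent_average_body(cutoff):
    """Restrict documents to created >= cutoff, then sort id buckets by average rating."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"bool": {"must": {"range": {"created": {"gte": cutoff}}}}},
        "aggs": {
            "sorted": {
                "terms": {"field": "id", "order": {"sort": "asc"}},
                "aggs": {"sort": {"avg": {"field": "rating"}}},
            }
        },
    }


body = build_recent_average_body(one_week_ago_iso())
```

Because the range sits in the query, ids with no documents in the last week produce no bucket at all.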

How to group documents by hours in elastic search aggregation?

I tried to group my documents by hour for one day with an aggregation, but I always get the exception "expected field name but got [START_OBJECT]". What is the problem?
{
"query" : {
"bool" : {
"must" : {
"range" : {
"timestamp" : {
"from" : "2017-08-14 00:00:00",
"to" : "2017-08-15 00:00:00",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"aggs": {
"result_by_hours": {
"histogram": {
"script": "doc.timestamp.date.getHourOfDay()",
"interval": 1
}
}
}
}
What I expect is the number of documents for each hour of yesterday. How can I use a dynamic, relative time range instead of the fixed "2017-08-14 - 2017-08-15"?
Thanks in advance:)
Depending on your ES version, you can use a range filter/query relative to "now"; for example, now-1d/d goes one day back and rounds down to the start of that day.
See the examples at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
As for the aggs, you can group by an interval of, for instance, one hour using date_histogram with "interval".
In ES 5.5 (the upper bound uses "lt", because "lte" with now/d rounds up to the end of the current day and would include today):
Query:
"range" : {
"timestamp" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
Aggs:
{
"aggs" : {
"values_over_time" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "1h"
}
}
}
}
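Putting the two fragments together, a request for yesterday's hourly counts can be sketched in Python as follows. The function names and the interval_key parameter are illustrative assumptions. The body uses the "interval" key from the ES 5.5 example above, and newer Elasticsearch versions use "fixed_interval" or "calendar_interval" instead.

```python
from datetime import datetime, timezone


def build_hourly_body(field="timestamp", interval_key="interval"):
    """Count documents per hour for the whole of yesterday."""
    return {
        "size": 0,  # only the histogram buckets are needed
        "query": {"range": {field: {"gte": "now-1d/d", "lt": "now/d"}}},
        "aggs": {
            "values_over_time": {
                "date_histogram": {field_key: value for field_key, value in (
                    ("field", field),
                    (interval_key, "1h"),
                    ("min_doc_count", 0),  # keep hours without documents
                )}
            }
        },
    }


def counts_by_hour(response):
    """Map the UTC hour of each bucket key (epoch milliseconds) to its doc_count."""
    buckets = response["aggregations"]["values_over_time"]["buckets"]
    return {
        datetime.fromtimestamp(b["key"] / 1000, tz=timezone.utc).hour: b["doc_count"]
        for b in buckets
    }


body = build_hourly_body()
```

The hours in counts_by_hour are in UTC; if the histogram is computed with a time_zone, convert accordingly.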

Elasticsearch - EXISTS syntax + Filter not working

I am trying to query for a date range where a particular field exists. This seems like it would be easy but I am sensing that the keyword "exists" has changed per the documentation. I am on 5.4. https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-exists-filter.html
I use @timestamp for dates, and the field "error_data" is in the mapping but only appears in a document when an error condition is found.
Here is my query....
GET /filebeat-2017.07.25/_search
{
"query": {
"bool" : {
"filter" : {
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
},
"exists": {
"field": "error_data"
}
}
}
}
This fails with "[bool] query does not support [exists]". The following variant does not work either; it fails with the parsing error "[exists] malformed query, expected [END_OBJECT] but found [FIELD_NAME]" at line 6, column 9. Thanks for your help.
GET /filebeat-2017.07.25/_search
{
"query": {
"exists": {
"field": "error_data"
},
"bool" : {
"filter" : {
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
}
}
}
}
You're almost there. Try like this:
GET /filebeat-2017.07.25/_search
{
"query": {
"bool" : {
"filter" : [
{
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
},
{
"exists": {
"field": "error_data"
}
}
]
}
}
}
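That is, "exists" is a query clause of its own, not a parameter of "bool", so it has to go inside "filter"; and the bool/filter clause must be an array if you want to put several clauses in it.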
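When building such requests in client code, a small helper that always emits the filter clauses as a list avoids this mistake. This is a minimal Python sketch; the helper name bool_filter is hypothetical.

```python
def bool_filter(*clauses):
    """Wrap any number of query clauses in a bool query's filter array."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = bool_filter(
    {"range": {"@timestamp": {"gte": "now-5m", "lte": "now-1m"}}},
    {"exists": {"field": "error_data"}},
)
```

Each clause stays a separate object in the array, so adding a third condition is just another argument.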

Converting SQL query to ElasticSearch Query

I want to convert the following SQL query to an Elasticsearch query. Can anyone help with this?
select csgg, sum(amount) from table1
where type in ('a','b','c') and year=2016 and fc="33" group by csgg having sum(amount)=0
I tried the following:
{
"size": 500,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{"term" : {"fc" : "33"}},
{"term" : {"year" : 2016}}
],
"should" : [
{"terms" : {"type" : ["a","b","c"] }}
]
}
}
}
},
"aggs": {
"group_by_csgg": {
"terms": {
"field": "csgg"
},
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
}
}
}
}
}
But I am not sure whether this is right, as I have not been able to validate the results.
It seems the query needs to be added inside the aggregation.
Assuming that you use Elasticsearch 2.x, it is possible to get HAVING semantics in Elasticsearch.
I'm not aware of a way to do this prior to 2.0.
You can use the bucket selector aggregation, one of the new pipeline aggregations, which keeps only the buckets that meet a given criterion:
POST test/test/_search
{
"size": 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{"term" : {"fc" : "33"}},
{"term" : {"year" : 2016}},
{"terms" : {"type" : ["a","b","c"] }}
]
}
}
}
},
"aggs": {
"group_by_csgg": {
"terms": {
"field": "csgg",
"size": 100
},
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
},
"no_amount_filter": {
"bucket_selector": {
"buckets_path": {"sumAmount": "sum_amount"},
"script": "sumAmount == 0"
}
}
}
}
}
}
However, there are two caveats. Depending on your configuration, it might be necessary to enable scripting like this:
script.aggs: true
script.groovy: true
Moreover, because the selector works on the buckets of its parent aggregation, it is not guaranteed that you get all buckets with amount = 0. The terms aggregation returns only its top "size" terms, and if all of those have a sum of amounts != 0, you will get no result.
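If enabling scripting is not an option, the HAVING condition can instead be applied in client code to the buckets of the plain terms and sum aggregation. This is a different technique from bucket_selector and it has the same limitation regarding truncated term lists. The Python sketch below assumes the aggregation names used above; the function name and the response fragment are made up for illustration.

```python
def buckets_with_zero_sum(response, agg_name="group_by_csgg", sum_name="sum_amount"):
    """Return the keys of all terms buckets whose summed amount equals zero."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if b[sum_name]["value"] == 0]


# Hypothetical response fragment, shaped like a terms + sum aggregation result.
fake_response = {
    "aggregations": {
        "group_by_csgg": {
            "buckets": [
                {"key": "x1", "doc_count": 4, "sum_amount": {"value": 0.0}},
                {"key": "x2", "doc_count": 2, "sum_amount": {"value": 12.5}},
            ]
        }
    }
}

zero_groups = buckets_with_zero_sum(fake_response)
```

With floating-point amounts, an exact comparison with zero may miss sums that are only approximately zero, so a small tolerance may be more appropriate.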

Range query in elasticsearch does not work properly

I have an index that contains objects with eventvalue and eventtime fields. I want to write a query that returns the event count aggregated by eventvalue for the last 30 seconds. I also need empty buckets for any second in which there were no events, because I need to display this data on a graph.
So I wrote the following query:
{
"query" : {
"bool" : {
"must" : [
{
"range" : {
"eventtime" : {
"gte" : "now-30s/s",
"lte" : "now/s",
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
},
{
"range" : {
"eventvalue" : {
"lte" : 3
}
}
}
]
}
},
"aggs": {
"values_agg": {
"terms": {
"field": "eventvalue",
"min_doc_count" : 0,
"order": {
"_term": "asc"
}
},
"aggs": {
"events_over_time" : {
"date_histogram" : {
"field" : "eventtime",
"interval" : "1s",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "now-30s/s",
"max" : "now/s"
},
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
}
}
}
}
This query is not working properly and I don't know why. Specifically, the first "range" query gives me the desired interval (if I remove it, I get values from all time), but the second "range" query seems to have no effect. Eventvalue can be anywhere from 1 to 10, and the desired effect is three buckets for eventvalues 1-3. However, I get all 10 buckets with all events.
How can I fix this query so that it still returns empty buckets, but only for the selected eventvalues?
I believe you need to remove the "min_doc_count": 0 from your terms aggregation. To achieve the empty buckets you're aiming for, you need only use min_doc_count in the date_histogram aggregation.
Per the documentation for the terms aggregation:
"Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit."
This explains why you are seeing buckets for eventvalues that are greater than 3. They were filtered out by the query, but brought back in by the terms aggregation.
UPDATE
Since a given eventvalue may not occur anywhere in the 30-second time slice, the other approach I would recommend is to specify the discrete values you want as buckets explicitly, using a filters aggregation. See the documentation for the filters aggregation.
Try using this for your aggregations:
"aggs": {
"values_agg": {
"filters": {
"filters": {
"1": { "term": { "eventvalue": 1 }},
"2": { "term": { "eventvalue": 2 }},
"3": { "term": { "eventvalue": 3 }}
}
},
"aggs": {
"events_over_time" : {
"date_histogram" : {
"field" : "eventtime",
"interval" : "1s",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "now-30s/s",
"max" : "now/s"
},
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
}
}
}
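For a longer or variable list of eventvalues, the named filters can be generated instead of written by hand. The following Python sketch builds the same aggregation as above; the function name is hypothetical, and the defaults are taken from the question.

```python
def filters_agg_for_values(values, field="eventvalue", time_field="eventtime"):
    """Build a filters aggregation with one named bucket per requested value."""
    return {
        "values_agg": {
            # one named term filter per value, so every bucket always exists
            "filters": {
                "filters": {str(v): {"term": {field: v}} for v in values}
            },
            "aggs": {
                "events_over_time": {
                    "date_histogram": {
                        "field": time_field,
                        "interval": "1s",
                        "min_doc_count": 0,  # keep empty seconds for the graph
                        "extended_bounds": {"min": "now-30s/s", "max": "now/s"},
                        "format": "yyyy-MM-dd HH:mm:ss",
                        "time_zone": "+03:00",
                    }
                }
            },
        }
    }


aggs = filters_agg_for_values([1, 2, 3])
```

The resulting dictionary goes under the "aggs" key of the request body, next to the existing "query".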
