Elasticsearch - EXISTS syntax + Filter not working

I am trying to query for a date range where a particular field exists. This seems like it should be easy, but judging by the documentation I suspect that the exists syntax has changed. I am on 5.4. https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-exists-filter.html
I use #timestamp for dates, and the field "error_data" is in the mapping but only appears when an error condition is found.
Here is my query:
GET /filebeat-2017.07.25/_search
{
  "query": {
    "bool" : {
      "filter" : {
        "range" : {
          "#timestamp" : {
            "gte" : "now-5m",
            "lte" : "now-1m"
          }
        }
      },
      "exists": {
        "field": "error_data"
      }
    }
  }
}
but it returns the error "[bool] query does not support [exists]". The following variant does not work either; it fails with the parsing error "[exists] malformed query, expected [END_OBJECT] but found [FIELD_NAME]" at line 6, column 9. Thanks for your help.
GET /filebeat-2017.07.25/_search
{
  "query": {
    "exists": {
      "field": "error_data"
    },
    "bool" : {
      "filter" : {
        "range" : {
          "#timestamp" : {
            "gte" : "now-5m",
            "lte" : "now-1m"
          }
        }
      }
    }
  }
}

You're almost there. Try this:
GET /filebeat-2017.07.25/_search
{
  "query": {
    "bool" : {
      "filter" : [
        {
          "range" : {
            "#timestamp" : {
              "gte" : "now-5m",
              "lte" : "now-1m"
            }
          }
        },
        {
          "exists": {
            "field": "error_data"
          }
        }
      ]
    }
  }
}
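That is, the bool filter clause must be an array when it holds several clauses. In your first attempt, exists sat directly inside bool, which only accepts clauses such as must, filter, should and must_not; in the second, query held two queries side by side, but it accepts exactly one.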
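If you build the request from code, the same body can be produced as a plain Python dict. This is a minimal sketch: the helper name `bool_filter` is hypothetical (it is not part of any Elasticsearch client), and it only illustrates that several clauses go into one `filter` list.

```python
import json


def bool_filter(*clauses):
    """Wrap any number of query clauses in the filter array of a bool query."""
    return {"bool": {"filter": list(clauses)}}


body = {
    "query": bool_filter(
        {"range": {"#timestamp": {"gte": "now-5m", "lte": "now-1m"}}},
        {"exists": {"field": "error_data"}},
    )
}

# Serialize to the JSON that would be sent to GET /filebeat-2017.07.25/_search
print(json.dumps(body, indent=2))
```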

Related

How to group documents by hour in an Elasticsearch aggregation?

I tried to group my documents by hour for a single day with an aggregation, but I always get the exception "expected field name but got [START_OBJECT]". What's the problem?
{
  "query" : {
    "bool" : {
      "must" : {
        "range" : {
          "timestamp" : {
            "from" : "2017-08-14 00:00:00",
            "to" : "2017-08-15 00:00:00",
            "include_lower" : true,
            "include_upper" : true
          }
        }
      }
    }
  },
  "aggs": {
    "result_by_hours": {
      "histogram": {
        "script": "doc.timestamp.date.getHourOfDay()",
        "interval": 1
      }
    }
  }
}
What I expect is the number of documents for each hour of yesterday. How can I use a dynamic, relative time range instead of the hard-coded "2017-08-14" to "2017-08-15"?
Thanks in advance :)
Depending on your ES version, you can use a range filter/query relative to "now"; for example, now-1d/d goes back one day and rounds down to midnight.
See the examples at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
As for the aggregation, you can group by a fixed interval, for instance one hour, using a date_histogram with interval instead of a scripted histogram:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
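In ES 5.5:
Query (lt rather than lte on the upper bound: lte now/d rounds up to the end of today, while lt now/d stops at midnight, so only yesterday is covered):
"range" : {
  "timestamp" : {
    "gte" : "now-1d/d",
    "lt" : "now/d"
  }
}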
Aggs:
{
  "aggs" : {
    "values_over_time" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval" : "1h"
      }
    }
  }
}
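Putting the query and the aggregation together, and checking locally what the buckets should contain, could look like this minimal Python sketch. It assumes the field is called timestamp as in the question, adds "size": 0 because only the buckets are needed, and uses lt now/d so the range ends at midnight today. The helper names and sample timestamps are hypothetical; the naive datetimes stand in for UTC (Elasticsearch rounds now in UTC unless the range query sets time_zone), and 7.x replaces interval with calendar_interval or fixed_interval.

```python
from collections import Counter
from datetime import datetime, timedelta

# Request body: relative "yesterday" range plus hourly buckets (ES 5.x syntax).
body = {
    "size": 0,  # only the aggregation buckets are needed, not the hits
    "query": {"range": {"timestamp": {"gte": "now-1d/d", "lt": "now/d"}}},
    "aggs": {
        "values_over_time": {
            "date_histogram": {"field": "timestamp", "interval": "1h"}
        }
    },
}


def yesterday_window(now):
    """Local equivalent of gte now-1d/d, lt now/d: [start of yesterday, start of today)."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today


def hourly_counts(timestamps, now):
    """Count timestamps per hour of yesterday, mimicking the date_histogram buckets."""
    start, end = yesterday_window(now)
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
        if start <= ts < end
    )


now = datetime(2017, 8, 15, 10, 30)
timestamps = [
    datetime(2017, 8, 14, 0, 5),
    datetime(2017, 8, 14, 0, 45),
    datetime(2017, 8, 14, 13, 0),
    datetime(2017, 8, 15, 9, 0),  # today: excluded by lt now/d
]
print(hourly_counts(timestamps, now))
```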

Return every nth record in Elasticsearch

I have time series data, and I want to query Elasticsearch by time range while getting back a fixed set of 2000 records.
I have this query:
GET http://IP:9200/MYINDEX/_search
{
  "_source": ["XXX1", "XXX2", "timestamp"],
  "sort" : { "#timestamp" : {"order" : "asc"} },
  "query" : {
    "range" : {
      "#timestamp" : {
        "gte" : "2017-02-10T10:55:31,259Z",
        "lte" : "2017-02-10T10:55:32,272Z"
      }
    }
  }
}
Is it possible to return only every 5th or 10th record?
I found some filter scripts but none of them seems to work.
Since there could be millions of records in one index, it's crucial to limit the number of returned values!
EDIT: reworked the query because filtered was replaced by bool:
{
  "_source":[
    "XXX1",
    "XXX2",
    "timestamp"
  ],
  "sort":{
    "#timestamp":{
      "order":"asc"
    }
  },
  "query":{
    "bool":{
      "must":{
        "range":{
          "#timestamp":{
            "gte":"2017-02-10T10:55:31,259Z",
            "lte":"2017-02-10T10:55:32,272Z"
          }
        }
      },
      "filter":{
        "script":{
          "script":"doc['#timestamp'].value % 5 == 0"
        }
      }
    }
  }
}
One way to do it is to add a field that behaves like an auto-increment column in a database.
Then you can add a script filter on that field to the query you want to run:
"filter": {
  "script": {
    "script": "doc['auto_increment'].value % n == 0",
    "params" : {
      "n" : 5
    }
  }
}
This should work for indexes that hold time series data and are searched by range. It will not work properly if you add a text search on top of the range, because the matching documents then no longer have consecutive counter values.
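Why the counter has to be consecutive within the searched range can be seen in a small local Python simulation of the condition doc['auto_increment'].value % n == 0. No Elasticsearch is involved; assign_counter and every_nth are hypothetical helpers, and records are assumed to be indexed in timestamp order.

```python
def assign_counter(records):
    """Attach a consecutive counter, like the suggested auto-increment field.

    Assumes records arrive in timestamp order when they are indexed.
    """
    return [dict(rec, auto_increment=i) for i, rec in enumerate(records)]


def every_nth(docs, n):
    """Keep documents whose counter is a multiple of n (the script filter's condition)."""
    return [doc for doc in docs if doc["auto_increment"] % n == 0]


indexed = assign_counter([{"timestamp": t} for t in range(12)])

# A pure range over consecutive counters: every 5th document is kept.
print([doc["auto_increment"] for doc in every_nth(indexed, 5)])

# An extra condition (stand-in for a text search) leaves gaps in the counters,
# so which documents survive depends on where the gaps fall, not on their
# position among the matches.
matched = [doc for doc in indexed if doc["timestamp"] % 2 == 1]
print([doc["auto_increment"] for doc in every_nth(matched, 5)])
```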
Applied to your query, it would look something like this:
GET http://IP:9200/MYINDEX/_search
{
  "_source": ["XXX1", "XXX2", "timestamp"],
  "sort" : { "#timestamp" : {"order" : "asc"} },
  "query" : {
    "filtered": {
      "query": {
        "range" : {
          "#timestamp" : {
            "gte" : "2017-02-10T10:55:31,259Z",
            "lte" : "2017-02-10T10:55:32,272Z"
          }
        }
      },
      "filter": {
        "script": {
          "script": "doc['auto_increment'].value % 5 == 0"
        }
      }
    }
  }
}
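The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, which is why the question's EDIT switched to bool. A minimal Python sketch of the same request in bool form, assuming an ES 5.x cluster with Painless and a numeric auto_increment field; the function name sampled_range_query is hypothetical, and 6.x and later expect "source" instead of "inline" inside script.

```python
import json


def sampled_range_query(field, gte, lte, n):
    """Range query plus a script filter that keeps every n-th document by counter.

    Assumes ES 5.x script syntax ("inline", Painless, params.n) and a numeric
    "auto_increment" field; 6.x and later expect "source" instead of "inline".
    """
    return {
        "sort": {field: {"order": "asc"}},
        "query": {
            "bool": {
                "must": {"range": {field: {"gte": gte, "lte": lte}}},
                "filter": {
                    "script": {
                        "script": {
                            "inline": "doc['auto_increment'].value % params.n == 0",
                            "lang": "painless",
                            "params": {"n": n},
                        }
                    }
                },
            }
        },
    }


body = sampled_range_query(
    "#timestamp", "2017-02-10T10:55:31,259Z", "2017-02-10T10:55:32,272Z", 5
)
print(json.dumps(body, indent=2))
```

Passing n through params rather than concatenating it into the script string lets Elasticsearch compile the script once and reuse it for different sampling rates.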
For reference do look into this

Elasticsearch - Remove duplicate results in search

I don't know how to remove duplicate results that have the same value in one field.
My search query:
"query": {
  "range": {
    "endtime": {
      "lt": "2017-02-09T20:00:00",
      "gt": "2017-02-09T01:00:00"
    }
  }
}
In my results there's one field called "link" that often has the same value (e.g. https://www.facebook.com).
I would prefer a solution within the query itself; that would be great.
Thanks.
Greetings!
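You can use a terms aggregation on that field; it returns each distinct value once, together with the number of documents that have it. The field needs to be not_analyzed (a keyword field in 5.x and later) so that the buckets hold whole URLs rather than analyzed tokens. "size": 0 in the body replaces search_type=count, which was deprecated in 2.0 and removed in 5.0.
GET /cars/transactions/_search
{
  "size": 0,
  "query": {
    "range" : {
      "endtime" : {
        "gt" : "2017-02-09T01:00:00",
        "lt" : "2017-02-09T20:00:00"
      }
    }
  },
  "aggs": {
    "distinct_links": {
      "terms": {
        "field": "link",
        "size": 100
      }
    }
  }
}
Something like this.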
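The terms aggregation gives you the distinct values and their counts, not de-duplicated hits. If you need the hits themselves, one option is to drop repeats on the client. A local Python sketch of both ideas; the sample hits are made up and the helper names are hypothetical.

```python
from collections import Counter


def distinct_link_counts(hits, field="link"):
    """Count hits per distinct value, as the terms aggregation buckets do."""
    return Counter(hit[field] for hit in hits)


def first_hit_per_link(hits, field="link"):
    """Keep only the first hit for each distinct value (client-side de-duplication)."""
    seen = set()
    unique = []
    for hit in hits:
        if hit[field] not in seen:
            seen.add(hit[field])
            unique.append(hit)
    return unique


hits = [
    {"endtime": "2017-02-09T02:00:00", "link": "https://www.facebook.com"},
    {"endtime": "2017-02-09T03:00:00", "link": "https://example.org"},
    {"endtime": "2017-02-09T04:00:00", "link": "https://www.facebook.com"},
]
print(distinct_link_counts(hits))
print([hit["link"] for hit in first_hit_per_link(hits)])
```

Client-side de-duplication only sees the page of hits you fetched, so duplicates spread across pages are not removed.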

elasticsearch query on all array elements

How can I search for documents that have all of the specified tags in the following query? I tried minimum_should_match and "execution": "and", but neither is supported in my query.
GET products/fashion/_search
{
  "query": {
    "constant_score": {
      "filter" : {
        "bool" : {
          "must" : [
            {"terms" : {
              "tags" : ["gucci", "dresses"]
            }},
            {"range" : {
              "price.value" : {
                "gte" : 100,
                "lt" : 1000
              }
            }}
          ]
        }
      }
    }
  },
  "sort": { "date": { "order": "desc" }}
}
====== UPDATE
I found a way to build my queries. The task was to reproduce the following MongoDB query in Elasticsearch:
{
  "tags": {
    "$all": ["gucci", "dresses"]
  },
  "price.value": {"$gte": 100, "$lte": 1000}
}
And here is my Elasticsearch query, with one term filter per required tag:
GET products/fashion/_search
{
  "query": {
    "bool" : {
      "filter" : [
        {"term" : {
          "tags" : "gucci"
        }},
        {"term" : {
          "tags" : "dresses"
        }},
        {"range" : {
          "price.value" : {
            "gte" : 100,
            "lte" : 1000
          }
        }}
      ]
    }
  }
}
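A single terms clause matches documents that have any of the listed tags, which is why it behaved like $in rather than $all. A minimal Python sketch (hypothetical helper names) that generates one term clause per tag, using lte so the upper bound matches $lte, and checks the same "all tags" condition locally for comparison:

```python
def all_tags_query(tags, price_gte, price_lte):
    """Translate MongoDB's {"tags": {"$all": tags}} plus a price range into a bool filter."""
    # $all means every tag must be present, so each tag becomes its own term clause.
    clauses = [{"term": {"tags": tag}} for tag in tags]
    clauses.append({"range": {"price.value": {"gte": price_gte, "lte": price_lte}}})
    return {"query": {"bool": {"filter": clauses}}}


def matches_all(doc_tags, tags):
    """Local check of the same condition: every requested tag is among the document's tags."""
    return set(tags) <= set(doc_tags)


body = all_tags_query(["gucci", "dresses"], 100, 1000)
print(body)
print(matches_all(["gucci", "dresses", "summer"], ["gucci", "dresses"]))
print(matches_all(["gucci", "shoes"], ["gucci", "dresses"]))
```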
Do you have a mapping defined for your index? By default, Elasticsearch analyzes string fields. If you want to find exact terms as you do above, you need to map those fields as not_analyzed (or, in 5.x and later, as keyword).
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html#_term_filter_with_text

Converting SQL query to ElasticSearch Query

I want to convert the following SQL query to an Elasticsearch query. Can anyone help with this?
select csgg, sum(amount) from table1
where type in ('a','b','c') and year=2016 and fc='33'
group by csgg having sum(amount)=0
I tried the following:
{
  "size": 500,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            {"term" : {"fc" : "33"}},
            {"term" : {"year" : 2016}}
          ],
          "should" : [
            {"terms" : {"type" : ["a","b","c"] }}
          ]
        }
      }
    }
  },
  "aggs": {
    "group_by_csgg": {
      "terms": {
        "field": "csgg"
      },
      "aggs": {
        "sum_amount": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}
but I'm not sure I'm doing it right, as I can't validate the results.
It seems the query needs to be added inside the aggregation.
Assuming that you use Elasticsearch 2.x, it is possible to get HAVING semantics in Elasticsearch.
I'm not aware of a way to do this prior to 2.0.
You can use the new bucket_selector pipeline aggregation, which keeps only the buckets that meet a certain criterion. Note also that the terms filter on type moves from should into must: next to must clauses, a should clause is optional and does not restrict the results.
POST test/test/_search
{
  "size": 0,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            {"term" : {"fc" : "33"}},
            {"term" : {"year" : 2016}},
            {"terms" : {"type" : ["a","b","c"] }}
          ]
        }
      }
    }
  },
  "aggs": {
    "group_by_csgg": {
      "terms": {
        "field": "csgg",
        "size": 100
      },
      "aggs": {
        "sum_amount": {
          "sum": {
            "field": "amount"
          }
        },
        "no_amount_filter": {
          "bucket_selector": {
            "buckets_path": {"sumAmount": "sum_amount"},
            "script": "sumAmount == 0"
          }
        }
      }
    }
  }
}
However, there are two caveats. First, depending on your configuration, it might be necessary to enable scripting in elasticsearch.yml like this:
script.aggs: true
script.groovy: true
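Moreover, because bucket_selector works on the buckets returned by its parent terms aggregation, you are not guaranteed to get all buckets with a sum of 0. The terms aggregation returns only the top size terms (100 here, ranked by document count), so if none of those has a sum of 0, you will get no result.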
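To make both the HAVING semantics and the truncation caveat concrete, here is a local Python simulation of terms + sum + bucket_selector. The rows and the function name are made up, no Elasticsearch is involved, and the terms step is simplified to "top size keys by document count, ties broken by key".

```python
from collections import defaultdict


def group_by_having_zero(rows, size):
    """Mimic terms(csgg, size) + sum(amount) + bucket_selector(sumAmount == 0)."""
    # WHERE type IN ('a','b','c') AND year = 2016 AND fc = '33'
    matching = [
        r for r in rows
        if r["type"] in ("a", "b", "c") and r["year"] == 2016 and r["fc"] == "33"
    ]

    doc_count = defaultdict(int)
    sum_amount = defaultdict(int)
    for r in matching:
        doc_count[r["csgg"]] += 1
        sum_amount[r["csgg"]] += r["amount"]

    # The terms aggregation keeps only the `size` buckets with the most documents.
    top = sorted(doc_count, key=lambda k: (-doc_count[k], k))[:size]

    # bucket_selector then drops buckets whose sum is not 0 (HAVING sum(amount) = 0).
    return {k: sum_amount[k] for k in top if sum_amount[k] == 0}


rows = [
    {"type": "a", "year": 2016, "fc": "33", "csgg": "x", "amount": 10},
    {"type": "b", "year": 2016, "fc": "33", "csgg": "x", "amount": -10},
    {"type": "a", "year": 2016, "fc": "33", "csgg": "x", "amount": 5},
    {"type": "c", "year": 2016, "fc": "33", "csgg": "y", "amount": 7},
    {"type": "c", "year": 2016, "fc": "33", "csgg": "y", "amount": -7},
    {"type": "d", "year": 2016, "fc": "33", "csgg": "z", "amount": 3},  # excluded by WHERE
]

print(group_by_having_zero(rows, size=2))  # "y" survives the HAVING condition
print(group_by_having_zero(rows, size=1))  # "y" was cut by size before the selector ran
```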
