ElasticSearch date range - elasticsearch

I have the following query:
{
"query": {
"query_string": {
"query": "searchTerm",
"default_operator": "AND"
}
},
"facets": {
"counts": {
"date_histogram": {
"field": "firstdate",
"interval": "hour"
}
}
}
}
and I would like to add a date range to it, so that it only retrieves documents whose firstdate field falls within a specific from/to interval. Any suggestions on how to do it? Many thanks!

You just need to add a range filter to your query:
{
"query":{
"filtered": {
"query": {
"query_string": {"query": "searchTerm", "default_operator": "AND" }
},
"filter" : {
"range": {"firstdate": {"gte": "2014-10-21T20:03:12.963","lte": "2014-11-24T20:03:12.963"}}
}
}
},
"facets": {
"counts": {
"date_histogram": {
"field": "firstdate",
"interval": "hour"
}
}
}
}
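For readers on newer Elasticsearch versions, where the filtered query and facets are no longer available, the same idea can be sketched as a bool query with the range in filter context plus a date_histogram aggregation. This is only a sketch under assumptions: the helper name build_hourly_histogram_query is made up for illustration, and "calendar_interval" is the parameter used by recent 7.x releases (older releases use "interval" as in the question).

```python
import json


def build_hourly_histogram_query(search_term, start, end, date_field="firstdate"):
    """Build a search body: scored query_string, unscored date range, hourly buckets.

    Hypothetical helper for illustration; it only assembles the request body.
    """
    return {
        "query": {
            "bool": {
                # Scored part: the original query_string search.
                "must": [
                    {"query_string": {"query": search_term, "default_operator": "AND"}}
                ],
                # Filter context: the date range does not affect scoring.
                "filter": [
                    {"range": {date_field: {"gte": start, "lte": end}}}
                ],
            }
        },
        "aggs": {
            "counts": {
                # Assumption: recent versions; older ones use "interval": "hour".
                "date_histogram": {"field": date_field, "calendar_interval": "hour"}
            }
        },
    }


body = build_hourly_histogram_query(
    "searchTerm", "2014-10-21T20:03:12.963", "2014-11-24T20:03:12.963"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request; the range bounds are the same example timestamps used in the answer above.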

A bool query will work too. Note that the must clauses go into a single array, because a JSON object cannot hold two "must" keys:
{
"query" :{
"bool" : {
"must" : [
{
"range": {"firstdate": {"gte": "2014-10-21T20:03:12.963","lte": "2014-11-24T20:03:12.963"}}
},
{
"query_string": {
"query": "searchTerm",
"default_operator": "AND"
}
}
]
}
},
"facets": {
"counts": {
"date_histogram": {
"field": "firstdate",
"interval": "hour"
}
}
}
}

This query returns the documents that fall within the given date range. "date_field_name" is the name of the field on which you want to apply the date range filter.
GET index_name/_search
{
"query": {
"bool": {
"must":[
{
"range": {
"date_field_name": {
"gte": "2019-09-23 18:30:00",
"lte": "2019-09-24 18:30:00"
}
}
}
]
}
},
"size": 10
}

https://your_elasticsearch/your_index PUT
{
"mappings": {
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
https://your_elasticsearch/your_index/_search POST
{
"query": {
"bool": {
"filter": [
{
"range": {
"created_at": {
"gte": "2020-04-01 08:03:12",
"lte": "2020-04-01 20:03:12"
}
}
}
]
}
}
}
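If the range bounds are computed in application code, they need to be rendered in the same pattern as the mapping ("yyyy-MM-dd HH:mm:ss"). A minimal sketch, assuming Python on the client side; the function name range_filter_body and the "last N hours" window are made up for illustration:

```python
from datetime import datetime, timedelta

# Python strftime equivalent of the mapping format "yyyy-MM-dd HH:mm:ss".
ES_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def range_filter_body(field, end, hours):
    """Build a bool/filter range body covering the `hours` before `end`."""
    start = end - timedelta(hours=hours)
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            field: {
                                "gte": start.strftime(ES_DATE_FORMAT),
                                "lte": end.strftime(ES_DATE_FORMAT),
                            }
                        }
                    }
                ]
            }
        }
    }


# Reproduces the window used in the request above.
body = range_filter_body("created_at", datetime(2020, 4, 1, 20, 3, 12), hours=12)
print(body["query"]["bool"]["filter"][0]["range"]["created_at"])
```

The resulting dict can be serialized with json.dumps and posted to the _search endpoint shown above.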

Related

Combine multiple individual queries into one to get aggregated result in Elasticsearch

I have built two queries in Elasticsearch to get the count for each error message. For example, the first query gets the number of error messages related to the "was not found" error:
GET /logstash*/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"match": {
"kubernetes.pod_name": "api"
}
},
{
"match": {
"log": "error"
}
},
{
"match": {
"log": {
"query": "was not found",
"operator": "and"
}
}
},
{
"range": {"@timestamp": {
"time_zone": "CET",
"gt": "now-7d",
"lte": "now"}}
}
]
}
}
}
},
"aggs" : {
"type_count" : {
"value_count" : {
"script" : {
"source" : "doc['log.keyword'].value"
}
}
}
}
}
The second query gets the count of error messages related to the "Duplicate entry" error:
GET /logstash*/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"match": {
"kubernetes.pod_name": "api"
}
},
{
"match": {
"log": "error"
}
},
{
"match": {
"log": {
"query": "Duplicate entry",
"operator": "and"
}
}
},
{
"range": {"@timestamp": {
"time_zone": "CET",
"gt": "now-7d",
"lte": "now"}}
}
]
}
}
}
},
"aggs" : {
"type_count" : {
"value_count" : {
"script" : {
"source" : "doc['log.keyword'].value"
}
}
}
}
}
My boss really wants me to combine these individual queries into one big query and get the list of counts for each error message in one output. We have a lot of error messages, so at the moment we have to write one query per error message and run each of them to get the counts. Is there a way to get the list of counts in a single run?
I have tried the query string query and looked for solutions both on Stack Overflow and in the documentation, but with no luck.
You can use the filter aggregation along with the value_count aggregation to combine these two queries. Of the four clauses inside the must clause, only one differs between the two queries. You can take that clause out of the query and turn it into two filter aggregations, as below:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"match": {
"kubernetes.pod_name": "api"
}
},
{
"match": {
"log": "error"
}
},
{
"range": {
"@timestamp": {
"time_zone": "CET",
"gt": "now-7d",
"lte": "now"
}
}
}
]
}
}
}
},
"aggs": {
"not_found_count": {
"filter": {
"match": {
"log": {
"query": "was not found",
"operator": "and"
}
}
},
"aggs": {
"count": {
"value_count": {
"script": {
"source": "doc['log.keyword'].value"
}
}
}
}
},
"duplicate_entry_count": {
"filter": {
"match": {
"log": {
"query": "Duplicate entry",
"operator": "and"
}
}
},
"aggs": {
"count": {
"value_count": {
"script": {
"source": "doc['log.keyword'].value"
}
}
}
}
}
}
}
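Since the question mentions many error messages, the aggregation section above can also be generated from a list instead of being written by hand. The following is only a sketch: the helper names are hypothetical, and it relies on the doc_count that each filter aggregation bucket reports rather than on the nested value_count, which is a simplification of the answer above.

```python
def build_error_count_aggs(phrases, field="log"):
    """Return one filter aggregation per (name, phrase) pair."""
    return {
        name: {"filter": {"match": {field: {"query": phrase, "operator": "and"}}}}
        for name, phrase in phrases.items()
    }


def build_request(shared_filters, phrases):
    """Combine the shared filter clauses with the per-message aggregations."""
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {"bool": {"filter": shared_filters}},
        "aggs": build_error_count_aggs(phrases),
    }


def extract_counts(response):
    """Map each aggregation name to the doc_count of its filter bucket."""
    return {name: agg["doc_count"] for name, agg in response["aggregations"].items()}


phrases = {
    "not_found_count": "was not found",
    "duplicate_entry_count": "Duplicate entry",
}
shared = [
    {"match": {"kubernetes.pod_name": "api"}},
    {"match": {"log": "error"}},
]
request = build_request(shared, phrases)

# Hypothetical response fragment, for illustration only (not real data).
sample_response = {
    "aggregations": {
        "not_found_count": {"doc_count": 3},
        "duplicate_entry_count": {"doc_count": 5},
    }
}
print(extract_counts(sample_response))
```

The time range clause from the query above would be appended to the shared filters in the same way; it is left out here only to keep the sketch short.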

Find distinct/unique people without a birthday or with a birthday earlier than 3/1/1963

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names...we only want 1 Jason or 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering for the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but just get results with duplicate Jasons and Susans. At the bottom it will show me that there are 10 Susans and 12 Jasons. Not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword...can be text or anything else as it is just a field that just gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your name field is not analyzed and can therefore be used properly in a terms aggregation.
I suggest you use a filter aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
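On the other hand, check which bool clause you really need. Your query uses should, which returns employees matching either condition. With must, as in the filter above, the aggregation only counts employees that match both (missing birthday) and (born before the date); since no document can satisfy both at once, keep should if you want either group.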
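If the goal is one hit per name rather than bucket counts, field collapsing is another option worth trying. This is a sketch under assumptions: it needs an Elasticsearch version that supports the "collapse" search parameter and a keyword (or numeric) name field as in the mapping from the question; the helper name distinct_names_body is made up.

```python
import json


def distinct_names_body(cutoff=19630301, size=25):
    """Build a search body returning at most one hit per distinct name."""
    return {
        "query": {
            "bool": {
                # Either condition is enough for a match.
                "should": [
                    {"bool": {"must_not": [{"exists": {"field": "birthday"}}]}},
                    {"range": {"birthday": {"lte": cutoff}}},
                ],
                "minimum_should_match": 1,
            }
        },
        # Keep only the top hit for each value of "name".
        "collapse": {"field": "name"},
        "from": 0,
        "size": size,
    }


body = distinct_names_body()
print(json.dumps(body, indent=2))
```

With this body the hits list itself should contain a single Jason and a single Susan, while a terms aggregation remains the way to get the per-name counts.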

Must not query elasticsearch

I have my request:
{
"size": 10,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{"term": {"event": "matchmaking_done"}}
]
}
},
"filter": {
"range": {
"@timestamp": {
"gt" : "2016-06-01T00:00:00.000Z",
"lte" : "2016-06-01T00:05:00.000Z"
}
}
}
}
},
"aggs" : {
"user-ids" : {
"terms" : { "field" : "user_id",
"size": 0
}
}
}
}
I need to add one more condition to this request: the document must not contain the field pvp_league. I tried adding must_not but can't work out how to do it correctly.
Help please!
You answered it yourself, but the ES 2.x way to do this is to avoid the filtered query, because it has been deprecated and will be removed in ES 5.0. ES 2.x introduces the concept of a "filter" context: instead of every clause being strictly a query or strictly a filter, every query can now act as either a filter or a scored query, depending on the context it's used in.
For your query, this therefore becomes a little simpler because of the simplified bool / filter syntax:
{
"size":10,
"query":{
"bool":{
"must":[
{
"term":{
"event":"matchmaking_done"
}
}
],
"must_not":[
{
"exists":{
"field":"pvp_league"
}
}
],
"filter":[
{
"range":{
"@timestamp":{
"gt":"2016-06-01T00:00:00.000Z",
"lte":"2016-06-01T00:05:00.000Z"
}
}
}
]
}
},
"aggs":{
"user-ids":{
"terms":{
"field":"user_id",
"size":0
}
}
}
}
As a very big aside, by specifying "size" : 0 for the terms aggregation you are requesting all unique terms, up to INT_MAX. That is not a scalable request (it works great with 10 user_ids, or even 100, but not with 10000 users).
As a not-so-bad aside, your request doesn't need a query context at all because nothing about the search side of it cares about relevance. Your term query ("event" : "matchmaking_done") either matches or it doesn't. Since you either want it to match or not, but you don't really care about order inherently, you should use this in the filter context. This changes the request to:
{
"size": 10,
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "pvp_league"
}
}
],
"filter":[
{
"range": {
"@timestamp": {
"gt":"2016-06-01T00:00:00.000Z",
"lte":"2016-06-01T00:05:00.000Z"
}
}
},
{
"term": {
"event": "matchmaking_done"
}
}
]
}
},
"aggs": {
"user-ids": {
"terms": {
"field": "user_id",
"size": 0
}
}
}
}
I've found a solution! It looks like this:
{
"size": 10,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{"term": {"event": "matchmaking_done"}}
],
"must_not": [
{"filtered": {
"filter": {
"exists": {
"field": "pvp_league"
}
}
}
}
]
}
},
"filter": {
"range": {
"@timestamp": {
"gt" : "2016-06-01T00:00:00.000Z",
"lte" : "2016-06-01T00:05:00.000Z"
}
}
}
}
},
"aggs" : {
"user-ids" : {
"terms" : { "field" : "user_id",
"size": 0
}
}
}
}

Elasticsearch Aggregation Word Count with using Stopwords

I'm using Elasticsearch to store my data. I want to count the words in my documents, but I want to see the result without the stopwords. For example, in my current result 'and' is the top word, and I want to remove it. Currently I have 3802 stopwords in my stopword.txt, and I don't want any of them to be shown in the aggregation result. How can I do that? My current query:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
The way I want the query to work is:
{
"aggs": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"filter": {
"my_stop": {
"type": "stop",
"stopwords_path": "/work/projects/stop_words.txt"
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
}
}
By the way, I have my stopwords list in my custom analyzer, but it doesn't work the way I want.
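One thing that might be worth trying is the "exclude" parameter of the terms aggregation, which accepts an array of exact terms to leave out of the buckets. The sketch below assumes the stopword file has one word per line and that the indexed tokens are lowercased; the helper names and the bucket size of 50 are made up for illustration.

```python
import json


def load_stopwords(lines):
    """Normalize an iterable of lines into a sorted list of unique stopwords."""
    return sorted({line.strip().lower() for line in lines if line.strip()})


def word_count_body(stopwords, field="text", size=50):
    """Build a terms aggregation over `field` that skips the given stopwords."""
    return {
        "size": 0,  # only the aggregation is of interest
        "query": {
            "bool": {"must": [{"range": {"date": {"gte": "now-0d/d"}}}]}
        },
        "aggs": {
            "words": {
                "terms": {
                    "field": field,
                    "size": size,
                    # Exact terms that must not appear as buckets.
                    "exclude": stopwords,
                }
            }
        },
    }


# In practice the lines would come from the stopword file, for example
# open("stopword.txt", encoding="utf-8"); a short list is used here instead.
stopwords = load_stopwords(["and\n", "The\n", "\n", "and\n"])
body = word_count_body(stopwords)
print(json.dumps(body, indent=2))
```

If the custom analyzer already removed the stopwords at index time they would never reach the aggregation, so seeing 'and' in the results may also mean the field was indexed with a different analyzer than expected.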

Aggregate Date Range filtered values in elastic search

I need to filter a group of values based on date (the added field here) and then group them by device_id. So I am using the following:
{
"aggs":{
"dates_between":{
"filter": {
"range" : {
"added" : {
"gte": "2014-07-01 00:00:00",
"lte" :"2014-08-01 00:00:00"
}
}
},
"aggs": {
"group_by_device_id": {
"terms": {
"field": "device_id"
}
}
}
}
}
}
This gives me a "Failed to parse source" error when executing the query. Is this the right way of doing it?
If I execute only the date aggregation, it returns values that are not in the specified date range.
It is the other way around: dates_between should be a nested aggregation of group_by_device_id:
{
"aggs": {
"group_by_device_id": {
"terms": {
"field": "device_id"
},
"aggs": {
"dates_between": {
"filter": {
"range": {
"added": {
"gte": "2014-07-01 00:00:00",
"lte": "2014-08-01 00:00:00"
}
}
}
}
}
}
}
}
You could also move the filter into the query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"added": {
"gte": "2014-07-01 00:00:00",
"lte": "2014-08-01 00:00:00"
}
}
}
}
},
"aggs": {
"group_by_device_id": {
"terms": {
"field": "device_id"
}
}
}
}
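On versions where the filtered query is no longer available, the same request can be sketched with a bool filter. The range query also accepts a "format" parameter, which makes the expected date pattern explicit when the field mapping uses a different one. The helper name device_counts_body is made up, and the default pattern is an assumption based on the dates in the question.

```python
import json


def device_counts_body(start, end, date_format="yyyy-MM-dd HH:mm:ss"):
    """Filter documents by the `added` date, then count them per device_id."""
    return {
        "size": 0,  # hits are not needed, only the buckets
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "added": {
                                "gte": start,
                                "lte": end,
                                # Tells Elasticsearch how to parse the bounds.
                                "format": date_format,
                            }
                        }
                    }
                ]
            }
        },
        "aggs": {"group_by_device_id": {"terms": {"field": "device_id"}}},
    }


body = device_counts_body("2014-07-01 00:00:00", "2014-08-01 00:00:00")
print(json.dumps(body, indent=2))
```

Because the range sits in the query, every bucket of group_by_device_id only counts documents inside the date window.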

Resources