Difference between the result of two date fields then getting average - elasticsearch

I am looking to get the average of the difference between two date fields in an Elasticsearch index. I have been able to write a query that returns 1000 results; however, I am not sure how to get the difference for each result and then an overall average.
Elastic query below:
POST my_index/_search
{
  "size": 1000,
  "_source": ["date.time.received", "date.time.sent"],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date.time.received": {
              "gte": "2019-06-19"
            }
          }
        },
        {
          "range": {
            "date.time.sent": {
              "gte": "2019-06-19"
            }
          }
        }
      ]
    }
  }
}
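If you only need the hits returned by the query above, the average can also be computed on the client. This is a minimal Python sketch, assuming the hits are the list found at res["hits"]["hits"] in the Python client's response, that `_source` returns the dotted paths as nested objects (date, then time, then received/sent), and that the values are ISO-8601 strings. The helper names are hypothetical.

```python
from datetime import datetime
from typing import Any


def parse_iso(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def get_path(source: dict, dotted: str) -> Any:
    # Assumes nested objects, e.g. {"date": {"time": {"sent": "..."}}}
    node: Any = source
    for key in dotted.split("."):
        node = node[key]
    return node


def average_diff_days(hits: list) -> float:
    """Average of (received - sent) across the returned hits, in days."""
    if not hits:
        raise ValueError("no hits to average")
    total_seconds = 0.0
    for hit in hits:
        src = hit["_source"]
        received = parse_iso(get_path(src, "date.time.received"))
        sent = parse_iso(get_path(src, "date.time.sent"))
        total_seconds += (received - sent).total_seconds()
    return total_seconds / len(hits) / 86400
```

Bear in mind that this only averages the hits you fetched (here at most 1000), whereas an aggregation averages over every matching document.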

I am using an avg aggregation with a script (the result is converted to days):
POST testindex5/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date.time.received": {
              "gte": "2019-06-19"
            }
          }
        },
        {
          "range": {
            "date.time.sent": {
              "gte": "2019-06-19"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "avg_resp": {
      "avg": {
        "script": "(doc['date.time.received'].value.toInstant().toEpochMilli() - doc['date.time.sent'].value.toInstant().toEpochMilli()) / 86400000.0"
      }
    }
  }
}
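The same aggregation can be sent from the Python client. A minimal sketch, assuming the field names from the question; `build_avg_request` and `avg_days_from_response` are hypothetical helpers. It sets "size": 0 because only the aggregation value is needed, puts the ranges in filter context because they do not need to be scored, and divides by 86400000.0 so that Painless uses floating-point division; dividing two longs would use integer division and truncate each difference to whole days.

```python
from typing import Any, Optional

MS_PER_DAY = 86_400_000


def build_avg_request(since: str) -> dict:
    # Painless script: difference in epoch milliseconds, converted to fractional days
    script = (
        "(doc['date.time.received'].value.toInstant().toEpochMilli()"
        " - doc['date.time.sent'].value.toInstant().toEpochMilli())"
        f" / {MS_PER_DAY}.0"
    )
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {
            "bool": {
                "filter": [
                    {"range": {"date.time.received": {"gte": since}}},
                    {"range": {"date.time.sent": {"gte": since}}},
                ]
            }
        },
        "aggs": {"avg_resp": {"avg": {"script": {"source": script}}}},
    }


def avg_days_from_response(resp: dict) -> Optional[float]:
    # "value" is null (None) when no document matched
    return resp["aggregations"]["avg_resp"]["value"]
```

With older elasticsearch-py clients the request could be sent as `es.search(index="testindex5", body=build_avg_request("2019-06-19"))`; 8.x clients also accept the top-level keys as keyword arguments (`**request`).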

Related

need something like coalesce in elasticsearch

My current Elasticsearch query is:
{
  "must": [
    {
      "range": {
        "firstClosedAt": {
          "gte": 1667948400000,
          "lte": 1668034800000
        }
      }
    },
    {
      "term": {
        "status": "CLOSED"
      }
    }
  ]
}
I want to modify it so that if "firstClosedAt" is null or not present, the range is checked against "closedAt" instead, just like coalesce("firstClosedAt", "closedAt") in SQL.
Help would be appreciated.
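There's no coalesce equivalent in Elasticsearch, but you can write the query below, which reads as: "either use firstClosedAt, OR use closedAt if firstClosedAt does not exist". By default a null value is not indexed, so the exists query treats a null firstClosedAt the same as a missing one: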
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status": "CLOSED"
          }
        },
        {
          "bool": {
            "minimum_should_match": 1,
            "should": [
              {
                "range": {
                  "firstClosedAt": {
                    "gte": 1667948400000,
                    "lte": 1668034800000
                  }
                }
              },
              {
                "bool": {
                  "must_not": {
                    "exists": {
                      "field": "firstClosedAt"
                    }
                  },
                  "filter": {
                    "range": {
                      "closedAt": {
                        "gte": 1667948400000,
                        "lte": 1668034800000
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
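The same pattern extends to more than two fields: the clause for each field requires that field to be in range and every earlier field to be missing. A Python sketch that builds this clause; `coalesce_range` is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Any


def coalesce_range(fields: list, gte: Any, lte: Any) -> dict:
    """Emulate range(coalesce(fields...)) with bool, range and exists clauses."""
    if not fields:
        raise ValueError("need at least one field")
    should = []
    for i, field in enumerate(fields):
        clause = {"bool": {"filter": [{"range": {field: {"gte": gte, "lte": lte}}}]}}
        earlier = fields[:i]
        if earlier:
            # Only fall back to this field when every earlier field is missing
            clause["bool"]["must_not"] = [{"exists": {"field": f}} for f in earlier]
        should.append(clause)
    return {"bool": {"should": should, "minimum_should_match": 1}}
```

For the question, the result would go next to the status term, e.g. `{"query": {"bool": {"filter": [{"term": {"status": "CLOSED"}}, coalesce_range(["firstClosedAt", "closedAt"], 1667948400000, 1668034800000)]}}}`.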
You could, however, write a much simpler query if you create another date field at indexing time that takes the value of firstClosedAt, or of closedAt if firstClosedAt does not exist.
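A sketch of that idea done on the client just before indexing, in Python. The extra field name `effectiveClosedAt` and both helpers are hypothetical; an ingest pipeline could do the same on the server side. The new field would need a date mapping for the range query to behave as before.

```python
from typing import Any


def with_effective_closed_at(doc: dict) -> dict:
    """Return a copy of doc with effectiveClosedAt = coalesce(firstClosedAt, closedAt)."""
    out = dict(doc)  # do not mutate the caller's document
    first: Any = doc.get("firstClosedAt")
    out["effectiveClosedAt"] = first if first is not None else doc.get("closedAt")
    return out


def closed_between(gte: int, lte: int) -> dict:
    # With the extra field, one range clause replaces the should/must_not logic
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "CLOSED"}},
                    {"range": {"effectiveClosedAt": {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }
```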

Compound query with Elasticsearch

I'm trying to perform a search with the intended criteria being (activationDate in range 1598889600 to 1602051579) or someFlag=true.
Below is the query I tried, but it does not return any records with someFlag=true, even with a large size (e.g. 5000). My index does have a lot of records with someFlag=true.
There are about 3000 total documents and this query returns around 280 documents.
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "activationDate": {
              "gte": 1598889600
            }
          }
        },
        {
          "range": {
            "activationDate": {
              "lte": 1602051579
            }
          }
        }
      ],
      "should": {
        "match": {
          "someFlag": true
        }
      }
    }
  },
  "from": 1,
  "size": 1000
}
Am I missing something?
This should work:
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "range": {
                  "activationDate": {
                    "gte": 1598889600,
                    "lte": 1602051579
                  }
                }
              },
              {
                "term": {
                  "someFlag": true
                }
              }
            ]
          }
        }
      ]
    }
  }
}
In theory this should do the same:
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "activationDate": {
              "gte": 1598889600,
              "lte": 1602051579
            }
          }
        },
        {
          "term": {
            "someFlag": true
          }
        }
      ]
    }
  }
}
However, the first query wraps the bool clause in a filter context, so documents do not need to be scored and the query can be cached.
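Your bool query most likely did not return the someFlag=true documents because the range clauses were under must: when a bool query has must (or filter) clauses, its should clauses become optional and only affect scoring. Also, match is normally used for full-text search; term is the usual choice for exact values such as booleans.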
Replace the must with a should and set minimum_should_match to 1, as this is an OR query and a record only needs to meet one of the criteria. Next, reduce the two range criteria to just one that combines gte and lte; otherwise a document matching only one side of the date range would count as a match.
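Why the original query dropped the someFlag=true documents can be illustrated with a simplified local model of bool matching, in Python. The model follows the documented minimum_should_match default: 1 when a bool query has should clauses but no must or filter clauses, 0 otherwise. It ignores scoring entirely and is only an illustration; `bool_matches`, `in_range`, `flag_set` and the sample documents are hypothetical.

```python
from typing import Callable, Optional, Sequence

Doc = dict
Clause = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Sequence[Clause] = (),
    filter_: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Simplified model of whether a bool query matches a document (no scoring)."""
    if not all(c(doc) for c in list(must) + list(filter_)):
        return False
    if minimum_should_match is None:
        # Default: 1 if there are should clauses and no must/filter clauses, else 0
        minimum_should_match = 1 if should and not must and not filter_ else 0
    return sum(c(doc) for c in should) >= minimum_should_match


START, END = 1598889600, 1602051579


def in_range(doc: Doc) -> bool:
    return START <= doc.get("activationDate", -1) <= END


def flag_set(doc: Doc) -> bool:
    return doc.get("someFlag") is True
```

With `must=[in_range], should=[flag_set]` (the original shape), a flagged document outside the date window never matches; with `should=[in_range, flag_set]` it does.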

Putting two queries together

How can I put these two queries together? Query one brings back all the data from today, and query two brings back data for all users whose name contains "test".
I want to bring back today's data only for names that contain "test".
Could someone show me how this is done, please?
Query one:
{
  "_source": ["VT"],
  "query": {
    "range": {
      "VT": {
        "gte": "now/d",
        "lt": "now/d+13h"
      }
    }
  }
}
Query two:
from elasticsearch import Elasticsearch

# The host is given only as "9200" here; the client expects a full URL such as "http://<host>:9200"
es = Elasticsearch(["9200"])
res = es.search(index="search", body={
    "_source": ["DTDT", "TRDT"],
    "query": {
        "bool": {
            "should": [
                {"wildcard": {"N": "TEST*"}}
            ]
        }
    }
}, size=10)

# Print each returned hit
for doc in res["hits"]["hits"]:
    print(doc)
You can use a bool query with two must clauses, like this:
{
  "_source": ["DTDT", "TRDT", "VT"],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "VT": {
              "gte": "now/d",
              "lt": "now/d+13h"
            }
          }
        },
        {
          "wildcard": {
            "N": "TEST*"
          }
        }
      ]
    }
  }
}
Check out the docs for the bool query.
This will help you:
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "VT": {
              "gte": "now/d",
              "lt": "now/d+13h"
            }
          }
        },
        {
          "match": {
            "N": {
              "query": "TEST",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
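Since the question uses the Python client, here is a sketch of the combined query built in Python. The index name "search" and the field names come from the question; `build_today_test_query` and `print_hits` are hypothetical helpers, and the `body=` argument follows the older elasticsearch-py style used in the question. Both clauses are placed under filter because neither needs to contribute to scoring; must works as well.

```python
from typing import Any


def build_today_test_query(name_pattern: str = "TEST*") -> dict:
    """Hits from the start of today until 13:00 whose N field matches name_pattern."""
    # Date math rounding ("now/d") uses UTC unless a time_zone is given
    return {
        "_source": ["DTDT", "TRDT", "VT"],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"VT": {"gte": "now/d", "lt": "now/d+13h"}}},
                    {"wildcard": {"N": name_pattern}},
                ]
            }
        },
    }


def print_hits(es: Any, index: str = "search", size: int = 10) -> None:
    # es is an elasticsearch.Elasticsearch instance created by the caller
    res = es.search(index=index, body=build_today_test_query(), size=size)
    for doc in res["hits"]["hits"]:
        print(doc)
```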

Filter with match_all VS query

I have two queries. They are logically identical, but I'm not sure whether there is any performance difference between the two.
I would be glad if someone could enlighten me.
Using a term query and a filter:
{
  "query": {
    "filtered": {
      "query": {
        "term": {
          "user_id": "1234567"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "ephoc_date": {
                  "lt": 1437033590,
                  "gte": 1437026390
                }
              }
            }
          ]
        }
      }
    }
  }
}
Using match_all and a filter:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "user_id": "1234567"
              }
            },
            {
              "range": {
                "ephoc_date": {
                  "lt": 1437033590,
                  "gte": 1437026390
                }
              }
            }
          ]
        }
      }
    }
  }
}
Looking at your queries, it seems you don't care how documents are scored on the user_id field being "1234567". In other words, if more than one document has user_id set to "1234567", you don't care about the order of those documents in the result. If that is the case, the 2nd option is better for performance: the term query in the 1st query has to compute a score for every matching document, while the 2nd query only filters and does no scoring. By the way, your 2nd query can also be simplified to:
{
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "user_id": "1234567"
          }
        },
        {
          "range": {
            "ephoc_date": {
              "lt": 1437033590,
              "gte": 1437026390
            }
          }
        }
      ]
    }
  }
}
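The filtered query used in these examples was deprecated in Elasticsearch 2.0 and removed in 5.0; in current versions the non-scoring form is a bool query with filter clauses. A Python sketch that builds it; the helper name `user_in_window` is hypothetical, and the field names (including the ephoc_date spelling) are kept from the question.

```python
from typing import Any


def user_in_window(user_id: str, gte: int, lt: int) -> dict:
    """Non-scoring equivalent of the match_all + filter query for Elasticsearch 5.0+."""
    return {
        "query": {
            "bool": {
                # Clauses in filter context are not scored and can be cached
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"range": {"ephoc_date": {"gte": gte, "lt": lt}}},
                ]
            }
        }
    }
```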

Elasticsearch count number of occurrences

I am trying to write an Elasticsearch query that will show me the number of returning users to a site. The following query returns all unique users for a day, by site. I am looking for the number of users that landed on a site only once in the time period.
GET 2015.*/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "#timestamp": {
                  "gte": "now-1d/d",
                  "lte": "now-1d/d"
                }
              }
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "event:script_initiated"
                  }
                }
              }
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "session_depth:0"
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "Site Name": {
      "terms": {
        "field": "site_name",
        "size": 1
      },
      "aggs": {
        "uniques": {
          "cardinality": {
            "field": "user_id"
          }
        }
      }
    }
  }
}
You will need to use a scripted metric aggregation for this, which means writing a script.
In the script, you can check whether the same user name appears in multiple documents and so get the number of occurrences of each user.
Or you can wait for the issue referred to in this bug to be resolved.
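A sketch of the scripted-metric approach for recent Elasticsearch versions (Painless scripts using the state and states variables), written as a Python request fragment. It counts how many distinct user_id values occur exactly once among the matching documents. This is an untested illustration, not taken from the answer: the aggregation name `single_visit_users` and the function `count_single_visit_users` are hypothetical, and user_id is assumed to be a keyword field. The Python function mirrors the same map, combine and reduce phases locally so the logic can be checked without a cluster.

```python
from collections import Counter

# Hypothetical aggregation; Painless syntax for Elasticsearch 7+ scripted_metric
SINGLE_VISIT_AGG: dict = {
    "single_visit_users": {
        "scripted_metric": {
            "init_script": "state.counts = [:]",
            "map_script": (
                "if (doc['user_id'].size() != 0) {"
                " def u = doc['user_id'].value;"
                " state.counts[u] = state.counts.getOrDefault(u, 0) + 1; }"
            ),
            "combine_script": "return state.counts",
            "reduce_script": (
                "Map total = [:];"
                " for (def s : states) { if (s == null) { continue; }"
                " for (def e : s.entrySet()) {"
                " total[e.getKey()] = total.getOrDefault(e.getKey(), 0) + e.getValue(); } }"
                " int once = 0;"
                " for (def v : total.values()) { if (v == 1) { once++; } }"
                " return once"
            ),
        }
    }
}


def count_single_visit_users(shards: list) -> int:
    """Local mirror of the scripts above: map/combine per shard, then reduce."""
    states = []
    for shard_docs in shards:
        counts = Counter()  # init_script + map_script for one shard
        for doc in shard_docs:
            if "user_id" in doc:
                counts[doc["user_id"]] += 1
        states.append(counts)  # combine_script returns the per-shard map
    total = Counter()  # reduce_script merges all shard states
    for state in states:
        total.update(state)
    return sum(1 for v in total.values() if v == 1)
```

The fragment could sit under "aggs" of the "Site Name" terms aggregation to get a per-site count. Note that it keeps every user_id in memory on each shard, so it may be expensive for large user bases.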

Resources