Building backend for reporting using elasticsearch: Is this query possible?

Is it possible to query elasticsearch to sum the number of minutes an entry is in a given status based on the datetimes for a month?
For example, entries would be of the form:
Datetime       Cluster  Hosts_on  Hosts_off  Hosts_on_percentage
Oct 10 12:01   c101     10        2         .8333
Oct 10 12:02   c101     10        2         .8333
Oct 10 12:03   c101     10        2         .8333
Is it possible to sum the number of minutes c101 has had greater than 60% hosts based on the datetime?

Not exactly, but you can get pretty close with something like this:
POST /test_index/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "Cluster": "c101"
              }
            },
            {
              "range": {
                "Hosts_on_percentage": {
                  "gt": 0.6
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "min_datetime": {
      "min": {
        "field": "Datetime"
      }
    },
    "max_datetime": {
      "max": {
        "field": "Datetime"
      }
    }
  }
}
With the data you posted, this query returns:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "max_datetime": {
      "value": 820980000,
      "value_as_string": "Jan 10 12:03"
    },
    "min_datetime": {
      "value": 820860000,
      "value_as_string": "Jan 10 12:01"
    }
  }
}
So then you could calculate the difference between the min and max times client-side.
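For example, using the values above and assuming the Datetime values are epoch milliseconds: (820980000 - 820860000) / 60000 = 2 minutes between the first and last matching samples (add one if you count the first minute itself, since there are 3 one-minute entries).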
Or, if you just want a count of the documents returned, you can get it from:
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
Here is some code I used to test it (getting the date mapping right is important here):
http://sense.qbox.io/gist/c62289926a18e34b1b1b31e3643f36cbe5a7b4cf

You can definitely sum up minutes if they are stored in a field for each cluster and datetime. You would use a bucket aggregation together with a metrics aggregation; the condition could be handled with a range aggregation.
I've put some links below; I hope they give you an idea of how to solve this task :-)
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html
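As a rough sketch of that idea (an illustration only, using the field names from the question and assuming each document represents one minute of data): a range aggregation buckets the samples by Hosts_on_percentage, and a metrics sub-aggregation such as value_count counts the matching minutes.
{
  "size": 0,
  "query": {
    "term": {
      "Cluster": "c101"
    }
  },
  "aggs": {
    "pct_above_60": {
      "range": {
        "field": "Hosts_on_percentage",
        "ranges": [
          { "from": 0.6 }
        ]
      },
      "aggs": {
        "minutes": {
          "value_count": {
            "field": "Datetime"
          }
        }
      }
    }
  }
}
Each bucket's doc_count (and the minutes value) is then the number of one-minute samples above 60% for that cluster.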

Related

Elasticsearch aggregation shows incorrect total

Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}
It returns:
"aggregations": {
"total": {
"value": 429896
}
}
So ~430k results, but in pagination we stop getting results around the 426k mark. Meaning, when I run the query with
{
  "size": 10,
  "from": 427000,
  ...
}
I get:
{
  "took": 2215,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "total": {
      "value": 429896
    }
  }
}
But if I change from to 426000, I still get results.
You are comparing the cardinality aggregation value of your field lastName.keyword to the total number of documents matching your query, which are two different things.
You can check the total number of documents in your index using the count API. The from/size you define at the query level only controls which matching documents are returned, and since you don't set track_total_hits, hits.total shows 10000 with relation gte, meaning there are more than 10,000 documents matching your search query.
As for your aggregation, it returns 429896 in both cases because the aggregation does not depend on the from/size you specify for your query.
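If you need hits.total to be exact rather than a lower bound, one option (a sketch, assuming Elasticsearch 7.x as stated in the question) is to set track_total_hits in the request body:
{
  "size": 10,
  "from": 0,
  "track_total_hits": true,
  ...
}
With track_total_hits: true, hits.total.value reports the exact number of documents matching the query instead of capping at 10000 with "relation": "gte".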
I was surprised when I found out that the cardinality aggregation has precision control (the precision_threshold parameter).
Setting it to the maximum value was the solution for me.
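A minimal sketch of that change (40000 is the documented maximum for precision_threshold; counts below the threshold are close to exact, above it they become approximate):
"aggs": {
  "total": {
    "cardinality": {
      "field": "lastName.keyword",
      "precision_threshold": 40000
    }
  }
}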

elasticsearch: how to find the percentage of documents that have a lower price than a number

Let's say I want to allow users to enter the name of a city and the price of a thing (anything).
I need to know the percentage of (things) in that city that have a lower value for a field than the entered value.
I can search for a city like this:
"query": {
"filtered": {
"query": {
"match": {
"city": "Paris"
}
}
}
},
but I don't know how to express the other requirements. Could you help me, please?
Supposedly the percentile-ranks-aggregation was intended as a means to achieve this.
Example:
POST <index>/<type>/_search
{
  "filter": {
    "term": {
      "city": "blore"
    }
  },
  "aggs": {
    "rank": {
      "percentile_ranks": {
        "values": [
          31
        ],
        "field": "price"
      }
    }
  },
  "size": 0
}
But when testing I found that it is buggy; I believe it is related to an existing issue.
So the workaround would be to calculate the percentage on the client side once the document counts have been acquired, using a query similar to the following:
POST _search
{
  "filter": {
    "term": {
      "city": "blore"
    }
  },
  "aggs": {
    "less_price_filter": {
      "filter": {
        "range": {
          "price": {
            "lt": 60
          }
        }
      }
    }
  },
  "size": 0
}
Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "less_price_filter": {
      "doc_count": 2
    }
  }
}
The percentage can then be calculated on the client side by computing doc_count * 100 / total.
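For the response above, for example, that works out to 2 * 100 / 3 ≈ 66.7% of the things in that city priced below 60.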

ElasticSearch count multiple fields grouped by

I have documents like
{"domain":"US", "zipcode":"11111", "eventType":"click", "id":"1", "time":100}
{"domain":"US", "zipcode":"22222", "eventType":"sell", "id":"2", "time":200}
{"domain":"US", "zipcode":"22222", "eventType":"click", "id":"3","time":150}
{"domain":"US", "zipcode":"11111", "eventType":"sell", "id":"4","time":350}
{"domain":"US", "zipcode":"33333", "eventType":"sell", "id":"5","time":225}
{"domain":"EU", "zipcode":"44444", "eventType":"click", "id":"5","time":120}
I want to filter these documents by eventType=sell and time between 125 and 400, group by domain followed by zipcode and count the documents in each bucket. So my output would be like (first and last docs would be ignored by the filters)
US, 11111, 1
US, 22222, 1
US, 33333, 1
In SQL, this should have been straightforward. But I am not able to get this to work on ElasticSearch. Could someone please help me out here?
How do I write ElasticSearch query to accomplish the above?
This query seems to do what you want:
POST /test_index/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "eventType": "sell"
              }
            },
            {
              "range": {
                "time": {
                  "gte": 125,
                  "lte": 400
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "zipcode_terms": {
      "terms": {
        "field": "zipcode"
      }
    }
  }
}
returning
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "zipcode_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "11111",
          "doc_count": 1
        },
        {
          "key": "22222",
          "doc_count": 1
        },
        {
          "key": "33333",
          "doc_count": 1
        }
      ]
    }
  }
}
(Note that there is only 1 "sell" at "22222", not 2).
Here is some code I used to test it:
http://sense.qbox.io/gist/1c4cb591ab72a6f3ae681df30fe023ddfca4225b
You might want to take a look at terms aggregations, the bool filter, and range filters.
EDIT: I just realized I left out the domain part, but it should be straightforward to add in a bucket aggregation on that as well if you need to.
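As a sketch of what that nesting could look like (an illustration only, assuming domain is indexed as a single term so it buckets cleanly), keep the same filtered query and nest the zipcode terms aggregation inside a domain terms aggregation:
"aggs": {
  "domain_terms": {
    "terms": {
      "field": "domain"
    },
    "aggs": {
      "zipcode_terms": {
        "terms": {
          "field": "zipcode"
        }
      }
    }
  }
}
Each domain bucket then contains per-zipcode sub-buckets whose doc_count gives the count per (domain, zipcode) pair.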

Elastic search find total hits for a date

I have a requirement to find the total number of records in my user table for a particular date. I can find the total hits, but cannot work out a query that returns the count for a particular date.
Query
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "daily_team": {
      "filter": {
        "range": {
          "created_date": {
            "from": "2015-01-02",
            "to": "2015-01-02"
          }
        }
      }
    }
  }
}
Result
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 33,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "daily_team": {
      "doc_count": 1
    }
  }
}
Here "total": 33, but its for total number of records in my database. I have only 22 records from "starting date" to "2015-01-02". Could you please help me to find query for the same. Thanks
I found a solution, just removed "from" parameter from range, now i can get the size from doc_count.
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "daily_team": {
      "filter": {
        "range": {
          "created_date": {
            "to": "2015-01-02"
          }
        }
      }
    }
  }
}
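An alternative sketch (not part of the original answer, and assuming created_date is a date field): put the date range into the query itself, so that hits.total directly reflects the count for just that day.
{
  "size": 0,
  "query": {
    "range": {
      "created_date": {
        "gte": "2015-01-02",
        "lt": "2015-01-03"
      }
    }
  }
}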

Elasticsearch Cardinality Aggregation giving completely wrong results

I am saving each page view of a website in an ES index, where each page is recognized by an entity_id.
I need to get the total count of unique page views since a given point in time.
I have the following mapping:
{
  "my_index": {
    "mappings": {
      "page_views": {
        "_all": {
          "enabled": true
        },
        "properties": {
          "created": {
            "type": "long"
          },
          "entity_id": {
            "type": "integer"
          }
        }
      }
    }
  }
}
According to the Elasticsearch docs, the way to do that is using a cardinality aggregation.
Here is my search request:
GET my_index/page_views/_search
{
  "filter": {
    "bool": {
      "must": [
        [
          {
            "range": {
              "created": {
                "gte": 9999999999
              }
            }
          }
        ]
      ]
    }
  },
  "aggs": {
    "distinct_entities": {
      "cardinality": {
        "field": "entity_id",
        "precision_threshold": 100
      }
    }
  }
}
Note that I have used a timestamp in the future, so no results are returned.
And the result I'm getting is:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "distinct_entities": {
      "value": 116
    }
  }
}
I don't understand how the unique page visits could be 116, given that there are no page visits at all for the search query. What am I doing wrong?
Your aggregation is returning the global value for the cardinality. If you want it to return only the cardinality of the filtered set, one way you could do that is to use a filter aggregation, then nest your cardinality aggregation inside that. Leaving out the filtered query for clarity (you can add it back in easily enough), the query I tried looks like:
curl -XPOST "http://localhost:9200/my_index/page_views/_search" -d'
{
  "size": 0,
  "aggs": {
    "filtered_entities": {
      "filter": {
        "bool": {
          "must": [
            [
              {
                "range": {
                  "created": {
                    "gte": 9999999999
                  }
                }
              }
            ]
          ]
        }
      },
      "aggs": {
        "distinct_entities": {
          "cardinality": {
            "field": "entity_id",
            "precision_threshold": 100
          }
        }
      }
    }
  }
}'
which returns:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filtered_entities": {
      "doc_count": 0,
      "distinct_entities": {
        "value": 0
      }
    }
  }
}
Here is some code you can play with:
http://sense.qbox.io/gist/bd90a74839ca56329e8de28c457190872d19fc1b
I used Elasticsearch 1.3.4, by the way.
