Finding the max date in elastic search query - elasticsearch

Can you please help me convert this SQL query to an Elasticsearch query?
SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group

Assuming you send the query as an HTTP POST, you can use the max aggregation to get the maximum value, nested inside a terms aggregation that does the job of GROUP BY.
Request:
yourhost:9200/your_index/_search
Request Body:
{
"query": {
"query_string": {
"query": "checks:>0"
}
},
"aggs": {
"groupby_group": {
"terms": {
"field": "group"
},
"aggs": {
"maximum": {
"max": {
"script": "doc['date'].value"
}
}
}
}
}
}
For checks > 0, you could use a range query inside the query instead, which could look like:
"range" : {
"checks" : {
"gt" : 0
}
}
This should help you run the aggregation. Because the max aggregation above uses an inline script, make sure you've enabled scripting in your elasticsearch.yml before you try querying:
script.inline: on
Hope this helps!
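As a rough sketch of how the pieces above fit together, the following Python builds a comparable request body as a plain dict. It makes two assumptions that are not in the answer: it uses a range query with a strict gt bound for the WHERE clause, and it points the max aggregation at the field directly instead of using a script, which should work when date is mapped as a date field. The function name build_max_date_query and its parameters are illustrative only.

```python
import json


def build_max_date_query(group_field="group", date_field="date", checks_field="checks"):
    """Build a search body roughly equivalent to:
    SELECT group, MAX(date) FROM table WHERE checks > 0 GROUP BY group
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"range": {checks_field: {"gt": 0}}},  # WHERE checks > 0
        "aggs": {
            "groupby_group": {
                "terms": {"field": group_field},  # GROUP BY group
                "aggs": {"maximum": {"max": {"field": date_field}}},  # MAX(date)
            }
        },
    }


body = build_max_date_query()
print(json.dumps(body, indent=2))
```

Note that a terms aggregation returns only the top 10 buckets by default, so you may need to set its size if you expect more groups.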

Related

How to limit search results from each index in a multi index search query?

I am using Elasticsearch version 6.3 and I want to make queries across multiple indices. Elasticsearch has support for this: I can give multiple indices as comma-separated values in the URL with one query in the request body, and also give a size parameter to limit the number of search results returned. However, this limits the size of the overall search results and might lead to no results from some indexes, so instead I want to fetch the first n results from each index.
I tried using the multi search API (_msearch), but with that it seems I have to give the same query and size for all indexes. That works, but I am not able to get a single aggregation over the entire result. Is there any way to address both issues?
Solution 1:
You're on the right path with the _msearch query. What I would do is to issue one query per index (no aggregations!) with the size you want for that index, as well as another query just for the aggregations, like this:
{ "index": "index1" }
{ "size": 5, "query": { ... }}
{ "index": "index2" }
{ "size": 5, "query": { ... }}
{ "index": "index3" }
{ "size": 5, "query": { ... }}
{ "index": "index1,index2,index3" }
{ "size": 0, "query": { ... }, "aggs": { ... } }
So the first three queries will return document hits from each of the three indexes and the last query will return the aggregation computed on all indexes, but no documents.
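If you generate the request from code, the _msearch body above could be assembled along these lines. This is only a sketch: build_msearch_body is a hypothetical helper, and the match_all query and the by_index aggregation are placeholders standing in for the elided "query" and "aggs" parts.

```python
import json


def build_msearch_body(indices, query, size_per_index, aggs):
    """Return an NDJSON _msearch body: one search per index plus one aggregation-only search."""
    lines = []
    for index in indices:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({"size": size_per_index, "query": query}))  # body line
    # The final search spans all indices and returns aggregations only.
    lines.append(json.dumps({"index": ",".join(indices)}))
    lines.append(json.dumps({"size": 0, "query": query, "aggs": aggs}))
    # The _msearch body must end with a newline character.
    return "\n".join(lines) + "\n"


ndjson = build_msearch_body(
    ["index1", "index2", "index3"],
    {"match_all": {}},  # placeholder query
    5,
    {"by_index": {"terms": {"field": "_index"}}},  # placeholder aggregation
)
print(ndjson)
```

The resulting string would be sent as the body of a POST to _msearch with the Content-Type header set to application/x-ndjson.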
Solution 2:
Another way to tackle this, if the size you need is small, is to have a single query in the query part and then aggregate on the index name and retrieve hits from each index using top_hits, like this:
POST index1,index2,index3/_search
{
"size": 0,
"query": { ... },
"aggs": {
"indexes": {
"terms": {
"field": "_index",
"size": 50
},
"aggs": {
"hits": {
"top_hits": {
"size": 5
}
}
}
}
}
}
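With this approach the documents come back inside the aggregation buckets rather than in the top-level hits. A minimal sketch of how the response might be unpacked is shown below; hits_by_index is a hypothetical helper, and sample_response is a hand-written, trimmed stand-in for a real response (the ids are made up).

```python
def hits_by_index(response):
    """Map each index name to the list of _source documents returned by top_hits."""
    grouped = {}
    for bucket in response["aggregations"]["indexes"]["buckets"]:
        # First "hits" is the aggregation name, then the usual hits.hits array.
        inner_hits = bucket["hits"]["hits"]["hits"]
        grouped[bucket["key"]] = [hit["_source"] for hit in inner_hits]
    return grouped


# Hand-written sample shaped like a search response; the values are made up.
sample_response = {
    "aggregations": {
        "indexes": {
            "buckets": [
                {"key": "index1", "doc_count": 2,
                 "hits": {"hits": {"hits": [{"_source": {"id": 1}}, {"_source": {"id": 2}}]}}},
                {"key": "index2", "doc_count": 1,
                 "hits": {"hits": {"hits": [{"_source": {"id": 3}}]}}},
            ]
        }
    }
}

grouped = hits_by_index(sample_response)
print(grouped)
```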

"Filter then Aggregation" or just "Filter Aggregation"?

I have been working with ES recently and found that I could achieve almost the same result in two ways, but I have no clear idea of the DIFFERENCE between them.
"Filter then Aggregation"
POST kibana_sample_data_flights/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"term": {
"DestCountry": "CA"
}
}
}
},
"aggs": {
"ca_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
"Filter Aggregation"
POST kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"ca": {
"filter": {
"term": {
"DestCountry": "CA"
}
},
"aggs": {
"_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
}
}
My Questions
Why are there two similar functions? I believe I am wrong about it, but what's the difference then?
(please do ignore the result format, it's not the question I am asking ;p)
Which is better if I want to filter out the unrelated/unmatched and start the aggregation on lots of documents?
When you use it in "query", you're creating a context on ALL the docs in your index. In this case, it acts like a normal filter, like: SELECT * FROM index WHERE (my_filter_condition1 AND my_filter_condition2 OR my_filter_condition3...).
When you use it in "aggs", you're creating a context on all the docs that may or may not have been filtered previously by the query. Say you have a structure like:
#OPTION A
{
"aggs":{
"t_shirts" : {
"filter" : { "term": { "type": "t-shirt" } }
}
}
}
Without a "query", this is exactly the same as having
#OPTION B
{
"query":{
"bool": { "filter" : { "term": { "type": "t-shirt" } } }
}
}
BUT the results will be returned in different fields.
In the Option A, the results will be returned in the aggregations field.
In the Option B, the results will be returned in the hits field.
I would recommend always applying your filters in the query part, so that subsequent aggregations work on the already filtered docs. Aggregations also cost more than queries in terms of performance.
Hope this is helpful! :D
Both filters, used in isolation, are equivalent. If you load no results (hits), then there is no difference. But you can combine listing and aggregations: you can query or filter your docs for listing, and calculate aggregations on a bucket further limited by the aggs filter. Like this:
POST kibana_sample_data_flights/_search
{
"size": 100,
"query": {
"bool": {
"filter": {
"term": {
... some other filter
}
}
}
},
"aggs": {
"ca_filter": {
"filter": {
"term": {
"DestCountry": "CA"
}
},
"aggs": {
"ca_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
}
}
But more likely you will need it the other way around, i.e. compute aggregations over all docs to display summary information, while listing only the docs from a specific query. In that case you need to combine aggregations with post_filter.
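A post_filter request of that kind might look like the sketch below, written as a Python dict for illustration. The helper name build_post_filter_search, the all_weathers aggregation name and the match_all query are assumptions; only the field names come from the examples in this thread.

```python
import json


def build_post_filter_search(field, value, agg_field, size=100):
    """Aggregate over every document the query matches, but list only the post-filtered hits."""
    return {
        "size": size,
        "query": {"match_all": {}},  # aggregations see everything this query matches
        "aggs": {"all_weathers": {"terms": {"field": agg_field}}},
        # post_filter runs after the aggregations are computed, so it narrows the hits only
        "post_filter": {"term": {field: value}},
    }


body = build_post_filter_search("DestCountry", "CA", "DestWeather")
print(json.dumps(body, indent=2))
```

Here the weather buckets would cover all flights, while the listed hits would be limited to flights with DestCountry equal to CA.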
Answer from @Val's comment, which I quote here for reference:
In option A, the aggregation will be run on ALL documents. In option B, the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100, it's pretty evident that option B will always be faster.

Finding unique documents in an index in elastic search

I have duplicate entries in my index and I want to find only the unique documents in the index. The top hits aggregation solves this problem, but my other requirement is to support sorting on the results (across buckets). Hence I can't use the top hits aggregation.
Other options I can think of is to write a plugin or use painless script.
I need help solving this. It would be great if you could point me to some examples.
The top hits aggregation finds values from the complete result set, while the cardinality aggregation gives you only the filtered result set.
You can use cardinality aggregation like below:
{
"aggs" : {
"UNIQUE_COUNT" : {
"cardinality" : {
"field" : "your_field"
}
}
}
}
This aggregation comes with a caveat: the counts it returns are approximate. See the Elasticsearch documentation below to understand it better.
Link: Cardinality Aggregation
For sorting, you can refer to the example below, where the sub-aggregation is passed in the order of the terms aggregation that creates your buckets:
{
"aggs": {
"AGG_NAME": {
"terms": {
"field": "your_field",
"size": 10,
"order": {
"UNIQUE_COUNT": "asc"
},
"min_doc_count": 1
},
"aggs": {
"UNIQUE_COUNT": {
"cardinality": {
"field": "your_field"
}
}
}
}
}
}

Elasticsearch - Search by a field but distinct another field

What I mean would be equivalent of this SQL query:
SELECT distinct fieldA
from DB
where fieldB like '%value%'
What is the Term Aggregation of this query in elastic search?
You can use a wildcard query in conjunction with a terms aggregation to fetch the distinct values of the field, using the following query:
POST test/_search
{
"query": {
"wildcard": {
"fieldB": {
"value": "*ali*"
}
}
},
"aggs": {
"distinct_fieldA": {
"terms": {
"field": "fieldA",
"size": 10
}
}
}
}
Hope this works
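The distinct values come back as bucket keys of the terms aggregation. A small sketch of reading them out is shown below; distinct_values is a hypothetical helper and sample_response is a hand-written stand-in for a real response with made-up values.

```python
def distinct_values(response, agg_name="distinct_fieldA"):
    """Return the bucket keys of a terms aggregation, i.e. the distinct field values."""
    return [bucket["key"] for bucket in response["aggregations"][agg_name]["buckets"]]


# Hand-written sample shaped like a search response; the values are made up.
sample_response = {
    "aggregations": {
        "distinct_fieldA": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "alpha", "doc_count": 3},
                {"key": "beta", "doc_count": 1},
            ],
        }
    }
}

values = distinct_values(sample_response)
print(values)
```

The terms aggregation returns at most "size" buckets (10 in the query above), so a sum_other_doc_count greater than 0 in the response suggests there are more distinct values than were returned.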

how to achieve an exists filter on ES5.0?

The exists filter has been replaced by an exists query in ES5.0.
So how can we achieve the equivalent within the same query? In other words, we don't want to run two queries but just one for various aggregations, including the exists count.
So I want to count the number of times the field "the_field" exists (or is not null):
"aggs":{
"exists_count":{
"filter":{
"exists":{
"field":"the_field"
}
}
}
}
I think you can use the extended stats aggregation; the count it returns is the number of values found for the field:
{ "aggs" :
{ "time_stats" :
{ "extended_stats" :
{ "field" : "time" }
}
}
}
Look at elastic stats doc
With Elastic 5.0, filters didn't so much get replaced by queries, but combined. Syntactically they look the same, but the context in which you use it determines if it gets interpreted as a query (factors into scoring) or as a filter to simply weed out documents. The below code should achieve exactly what you want:
{
"query": {
"match_all": {}
},
"aggs": {
"field_exists": {
"filter": {
"exists": {
"field": "name"
}
}
}
}
}
The aggregation returned will look something like this, with the doc_count representing the number of documents where the "name" field exists. Hope this helps!
{
"aggregations": {
"field_exists": {
"doc_count": 11984
}
}
}
