Faster query filter for getting a document with "greatest" field value? - performance

In an Elasticsearch index I have documents with the fields fooId and fooField.
I would like to fetch the document with a given fooId value and the largest value of fooField. Right now, I have a filtered query with an aggregation like this one:
"aggs": {
"topHits_agg": {
"top_hits": {
"sort": [{
"fooField": {
"order": "desc"
}
}],
"size": 1
}
}
}
However, the performance is not good. Is there any way to make this better?

If I understand correctly, you do not need an aggregation. You can sort on fooField directly and return only the first hit, like this:
GET your_index/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"fooId": "your_specific_id"
}
}
}
},
"sort": [
{
"fooField": {
"order": "desc"
}
}
],
"size": 1
}
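
For readers who script this request, here is a minimal Python sketch that builds the same search body as a plain dict. The helper name build_top_doc_query and its use_bool flag are hypothetical. The flag is an assumption for clusters where the filtered query is unavailable: it was deprecated in Elasticsearch 2.0 and removed in 5.0 in favor of a bool query with a filter clause.

```python
import json


def build_top_doc_query(foo_id, use_bool=False):
    """Build a search body returning the one document with the largest fooField."""
    term = {"term": {"fooId": foo_id}}
    if use_bool:
        # Assumption: a newer cluster where "filtered" no longer exists.
        query = {"bool": {"filter": [term]}}
    else:
        # Same shape as the request shown in the answer.
        query = {"filtered": {"filter": term}}
    return {
        "query": query,
        "sort": [{"fooField": {"order": "desc"}}],
        "size": 1,
    }


if __name__ == "__main__":
    print(json.dumps(build_top_doc_query("your_specific_id"), indent=2))
```

The resulting dict can be sent as the request body by whichever HTTP or Elasticsearch client you already use.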

Related

Aggregated results show less items than doc_count?

I have an Elasticsearch query which aggregates the results on a certain field, called _aggregate. I ran into a strange situation with this query:
{
"size": 100,
"aggregations": {
"results": {
"terms": {
"field": "_aggregate",
"size": 1000,
"order": {
"_count": "desc"
}
},
"aggregations": {
"bundled": {
"top_hits": {
"sort": [
{
"_weight": "asc"
}
]
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"_aggregate": "5713618784853"
}
}
]
}
}
}
When I run this search, it returns 8 hits (as expected). However, when I look at the aggregated results, I see a doc_count of 8 (so far so good), but the bucket only contains 3 hits.
Increasing the size of the terms aggregation on the _aggregate field does not have any effect.
Does anyone know how this is possible, or what can possibly cause this?
This is because the top_hits metric aggregation returns 3 hits per bucket by default. You can override this with its size parameter:
"aggregations": {
"bundled": {
"top_hits": {
"size": 10, <--- add this
"sort": [
{
"_weight": "asc"
}
]
}
}
}
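
To make the difference between doc_count and the number of returned hits concrete, here is a small Python simulation. It does not call Elasticsearch. The top_hits function and the eight sample documents are hypothetical stand-ins, and the only behavior borrowed from Elasticsearch is the default size of 3.

```python
def top_hits(docs, sort_key, size=3):
    """Mimic top_hits: sort the bucket's docs ascending and keep the first `size`."""
    return sorted(docs, key=lambda d: d[sort_key])[:size]


# Eight hypothetical documents that all fall into the same terms bucket.
bucket = [{"_weight": w} for w in (5, 1, 8, 3, 7, 2, 6, 4)]

doc_count = len(bucket)                           # what the bucket reports
default_hits = top_hits(bucket, "_weight")        # no size given
all_hits = top_hits(bucket, "_weight", size=10)   # size raised to 10

print(doc_count, len(default_hits), len(all_hits))  # 8 3 8
```

The bucket count is unaffected by the size of top_hits. Only the number of documents returned inside the bucket changes.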

How to sort query result with hit count

Hi, I've indexed some info into Elasticsearch like
{"info":"002345 Groot 7AP"}
and set up a query template:
GET _search?size=5
{"query": {
"match_phrase_prefix": {
"info": "%s"
}
}
}
so I can search info by any term.
The default order is "_score": "desc".
Now I want to sort the query results by hit count, so that frequently used entries show up first.
I read about the aggregation API on elastic.co, but I don't know how to write the query body.
Thanks.
Try this and see if it works:
{
"aggs": {
"top_tags": {
"terms": {
"field": "type",
"size": 3
},
"aggs": {
"top_sales_hits": {
"top_hits": {
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"size" : 1
}
}
}
}
}
}

Elasticsearch scoped aggregation not desired results

I have the following query, but the aggregation doesn't seem to be acting on the results of the query.
The query returns 3 results, but there are 10 items in the aggregation. It looks like the aggregation is acting on all documents matched by the query.
Basically, how do I get the aggregation to take the output of the given query as its input?
{
"query": {
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"coordinates": [
-79.3931,
43.6709
],
"distance": "15km"
}
},
{
"term": {
"user.type": "2"
}
}
]
},
"query": {
"match": {
"user.shoes": "314"
}
}
}
},
"aggs": {
"dedup": {
"terms": { "field": "user.id" },
"aggs": {
"dedup_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
As it turns out, I was expecting the aggregation to act only on the page of results returned by the query, and that is incorrect.
The aggregation takes all documents matching the query as input, not just the current page.
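
The following Python simulation illustrates that conclusion without touching Elasticsearch. The search function and the twelve sample documents are hypothetical. The sketch only shows that the returned page is a slice of the matches while the bucket counts are computed over all of them.

```python
from collections import Counter

# Hypothetical documents matched by the query: 12 docs from 10 distinct users.
matches = [{"user_id": i % 10} for i in range(12)]


def search(matches, size, from_=0):
    """Return one page of hits plus terms-style counts over ALL matches."""
    page = matches[from_:from_ + size]
    buckets = Counter(doc["user_id"] for doc in matches)
    return page, buckets


page, buckets = search(matches, size=3)
print(len(page), len(buckets))  # 3 10
```

Changing size or from_ changes only the page. To shrink the aggregation input, the query itself has to match fewer documents.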

Elasticsearch Ordering terms aggregation buckets after field in top hits sub aggregation

I would like to order the buckets from a terms aggregation based on a property possessed by the first element in a top hits aggregation.
My best effort query looks like this (with syntax errors):
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topAnswer._source.id": "asc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Does anyone know how to accomplish this?
Example:
{
"a":1,
"b":2,
"id":4
}
{
"a":1,
"b":3,
"id":1
}
{
"a":2,
"b":4,
"id":3
}
Grouping by "a" and ordering the buckets by "id" (desc) and sorting the top hits on "b" (desc) would give:
{2:{
"a":2,
"b":4,
"id":3
},1:{
"a":1,
"b":3,
"id":1
}}
You can do it with the following query. The idea is to show, for each parent_uuid bucket, the top hit when sorting on b in descending order, and to order the parent_uuid buckets by their largest id value using a max sub-aggregation. Note that the max is computed over every document in the bucket, so it is not necessarily the id of the top hit.
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topSort": "desc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1,
"sort": {
"b": "desc"
}
}
},
"topSort": {
"max": {
"field": "id"
}
}
}
}
}
}
Try it out and report if this works out for you.
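
As a sanity check, here is a Python simulation (not an Elasticsearch call) that applies the same three steps to the three example documents from the question: group by "a", take the top hit by "b" descending, and order buckets by the maximum "id". The variable names are hypothetical. On this sample the max-based order differs from an order based on the top hit's own id, so it is worth verifying against your data.

```python
from collections import defaultdict

docs = [
    {"a": 1, "b": 2, "id": 4},
    {"a": 1, "b": 3, "id": 1},
    {"a": 2, "b": 4, "id": 3},
]

# Group by "a", like the terms aggregation.
groups = defaultdict(list)
for doc in docs:
    groups[doc["a"]].append(doc)

# top_hits with size 1 and sort on "b" descending.
top_hit = {key: max(group, key=lambda d: d["b"]) for key, group in groups.items()}
# max sub-aggregation on "id", computed over every doc in the bucket.
max_id = {key: max(d["id"] for d in group) for key, group in groups.items()}

by_max_id = sorted(groups, key=lambda k: max_id[k], reverse=True)
by_top_hit_id = sorted(groups, key=lambda k: top_hit[k]["id"], reverse=True)

print(by_max_id)      # [1, 2]
print(by_top_hit_id)  # [2, 1]
```

Bucket 1 contains a document with id 4 that is not its top hit, which is why the two orders disagree here. If both sorts use the same field, the two orders coincide.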

ElasticSearch filtering by field1 THEN field2 THEN take max of field3

I am struggling to get the information that I need from ElasticSearch.
My log statements are like this:
field1: Example
field2: Example2
field3: Example3
I would like to search a timeframe (using last 24 hours) to find all data that has this in field1 and that in field2.
There then may be multiple this.that.[field3] entries, so I want to only return the maximum of that field.
In fact, in my data, field3 is actually the key of the entry.
What is the best way of retrieving the information I need? I have managed to get the results returned using aggs, but the data is in buckets, and I am only interested in the data with the max value of field3.
I have added an example of the query that I am looking to do: https://jsonblob.com/54535d49e4b0d117eeaf6bb4
{
"size": 0,
"aggs": {
"agg_129": {
"filters": {
"filters": {
"CarName: Toyota": {
"query": {
"query_string": {
"query": "CarName: Toyota"
}
}
}
}
},
"aggs": {
"agg_130": {
"filters": {
"filters": {
"Attribute: TimeUsed": {
"query": {
"query_string": {
"query": "Attribute: TimeUsed"
}
}
}
}
},
"aggs": {
"agg_131": {
"terms": {
"field": "#timestamp",
"size": 0,
"order": {
"_count": "desc"
}
}
}
}
}
}
}
},
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2014-10-27T00:00:00.000Z",
"lte": "2014-10-28T23:59:59.999Z"
}
}
}
],
"must_not": []
}
}
}
}
}
The example above shows only the documents that have CarName = Toyota and Attribute = TimeUsed.
My data is as follows:
There are x cars (CarName), each car has y Attributes, and each of those Attributes has documents with a timestamp.
To begin with, I was looking for a query for CarName.Attribute.timestamp (latest). However, if I could use just ONE query to get the latest timestamp for EVERY Attribute of EVERY CarName, that would reduce the number of query calls from ~50 to one.
If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with size: 1 and a descending sort on field3.
This will return the whole document with the maximum value of that field, as you wish.
This example in the documentation might do the trick.
Edit:
OK, it seems you don't need the whole document, only the maximum timestamp value. You can use a max aggregation instead of top_hits.
The following query (not tested) should give you, in a single request, the maximum timestamp for each of the top 10 Attribute values within each of the top 10 CarName values.
A terms aggregation is like a GROUP BY clause, so you should not have to query 50 times to retrieve the values for each CarName/Attribute combination: this is the point of nesting a terms aggregation on Attribute inside the CarName aggregation.
Note that, to work properly, the CarName and Attribute fields should be not_analyzed. Otherwise, you will get "funny" results in your buckets. The problem (and a possible solution) is very well described here.
Feel free to change the size parameter of the terms aggregations to fit your case.
{
"size": 0,
"aggs": {
"by_carnames": {
"terms": {
"field": "CarName",
"size": 10
},
"aggs": {
"by_attribute": {
"terms": {
"field": "Attribute",
"size": 10
},
"aggs": {
"max_timestamp": {
"max": {
"field": "#timestamp"
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2014-10-27T00:00:00.000Z",
"lte": "2014-10-28T23:59:59.999Z"
}
}
}
]
}
}
}
}
}
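
To show what the nested terms plus max aggregation computes, here is a Python simulation of the same GROUP BY logic. It does not call Elasticsearch. The sample log entries, the plain "timestamp" key, and the function name are all hypothetical. It relies on the fact that ISO-8601 timestamps in one fixed format compare chronologically as strings.

```python
from collections import defaultdict

# Hypothetical log entries inside the queried time range.
logs = [
    {"CarName": "Toyota", "Attribute": "TimeUsed", "timestamp": "2014-10-27T08:00:00.000Z"},
    {"CarName": "Toyota", "Attribute": "TimeUsed", "timestamp": "2014-10-28T09:30:00.000Z"},
    {"CarName": "Toyota", "Attribute": "Fuel", "timestamp": "2014-10-27T12:00:00.000Z"},
    {"CarName": "Honda", "Attribute": "TimeUsed", "timestamp": "2014-10-28T01:15:00.000Z"},
]


def latest_per_car_attribute(entries):
    """Nested group-by: CarName -> Attribute -> maximum timestamp."""
    result = defaultdict(dict)
    for entry in entries:
        per_car = result[entry["CarName"]]
        current = per_car.get(entry["Attribute"])
        if current is None or entry["timestamp"] > current:
            per_car[entry["Attribute"]] = entry["timestamp"]
    return dict(result)


latest = latest_per_car_attribute(logs)
print(latest["Toyota"]["TimeUsed"])  # 2014-10-28T09:30:00.000Z
```

One pass over the data yields one value per CarName/Attribute pair, which is the result the single aggregated request is meant to replace the ~50 separate queries with.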
