Elasticsearch: counting terms not included in the current search

I'm trying to use a single ES request (or a couple) to count the terms I have not included in my current search.
Let me elaborate. My front-end looks like this:
I have Closed currently selected, so the other items should show how many results they would add if I were to include that term.
Assume that Closed == 500 and Rejected == 100.
While I have Closed selected, the Rejected item should have the number 100 appended to it. If I deselect Closed, Closed should show the number 500. If I select Rejected and not Closed, Closed should also show 500.
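The counting rule described above can be sketched in plain Python, assuming each document has exactly one status: the number next to a status is how many documents with that status pass every other filter of the search (such as fromPlace), with the status selection itself ignored. The function name facet_counts and the sample data are hypothetical.

```python
from collections import Counter


def facet_counts(docs, status_field, other_filters):
    """Count documents per status, applying every filter except the status one.

    docs          -- list of dicts standing in for indexed documents
    status_field  -- the field shown as checkboxes, e.g. "status"
    other_filters -- predicates for the non-status parts of the search
    """
    counts = Counter()
    for doc in docs:
        if all(pred(doc) for pred in other_filters):
            counts[doc[status_field]] += 1
    return counts


# Hypothetical data matching the example: 500 Closed and 100 Rejected documents.
docs = [{"status": "Closed"}] * 500 + [{"status": "Rejected"}] * 100
counts = facet_counts(docs, "status", other_filters=[])
print(counts["Rejected"])  # shown next to Rejected while Closed is selected
print(counts["Closed"])    # shown next to Closed once it is deselected
```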
Easy enough, right? We just add a terms aggregation on the status field, which returns a bucket for each of these items; we then read the count from each bucket and display it.
That part I got :) However, when I actually add a term (for example, one that filters on NoOffer), the buckets no longer include the other statuses.
This is what my query looks like (the global aggregation was suggested by ChintanShah25):
{
"size": 50,
"from": 1,
"sort": [
{
"createdAt": "desc"
}
],
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"fromPlace": "*rotter*"
}
}
]
}
},
{
"bool": {
"should": [
{
"wildcard": {
"status": "closed"
}
}
]
}
}
]
}
},
"aggs": {
"status": {
"global": {},
"aggs": {
"all_status": {
"terms": {
"field": "status.raw",
"size": 10
}
}
}
}
}
}
The global aggregation now shows all the different status codes, but it ignores the rest of the query: the "fromPlace" filter doesn't get applied.

I guess you are looking for the global aggregation, which includes all documents regardless of the query. You could also use a filter aggregation if you want selective stats.
{
"query": {
"term": {
"status": {
"value": "closed"
}
}
},
"size": 0,
"aggs": {
"everything": {
"global": {},
"aggs": {
"all_status": {
"terms": {
"field": "status.raw",
"size": 10
}
}
}
}
}
}
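Following the filter-aggregation suggestion above, one way to make the status counts respect fromPlace is to nest a filter aggregation inside the global one and repeat only the fromPlace clause there, because global ignores the whole query. The Python sketch below only builds the request body (the query part is simplified); the function name status_facets_query is hypothetical, and it assumes status is analyzed with a not_analyzed status.raw sub-field, as in the question.

```python
import json


def status_facets_query(from_place_pattern, selected_status):
    """Hits honour both filters; the status counts honour only fromPlace."""
    from_place = {"wildcard": {"fromPlace": from_place_pattern}}
    return {
        "size": 50,
        "sort": [{"createdAt": "desc"}],
        "query": {
            "bool": {
                "must": [
                    from_place,
                    {"term": {"status": selected_status}},
                ]
            }
        },
        "aggs": {
            "status": {
                # global: start from every document in the index ...
                "global": {},
                "aggs": {
                    # ... re-apply only the non-status filter ...
                    "same_place": {
                        "filter": from_place,
                        "aggs": {
                            # ... and count documents per status.
                            "all_status": {
                                "terms": {"field": "status.raw", "size": 10}
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(status_facets_query("*rotter*", "closed"), indent=2))
```

Another commonly used option is to keep fromPlace in query and move the status clause into post_filter, which filters the returned hits after the aggregations have been computed, so no global aggregation is needed.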

Related

Elasticsearch - Find all documents with aggregations results included in math operations

I have 4 different aggregation queries whose results are combined in a math operation to compute the required total (pseudo example below). I need to find all the documents where this number is negative (e.g. -10).
number = agg1 + agg2 - agg3 - agg4
To keep it simple, I will post two abbreviated aggregation queries.
Agg1:
{
"track_total_hits": true,
"aggs": {
"queryAmount_1": {
"sum": {
"field": "amount"
}
}
},
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"match": {
"some_field": {
"query": "PayoutRequested"
}
}
}
]
}
}
]
}
},
"size": 0
}
Agg2:
{
"track_total_hits": true,
"aggs": {
"queryAmount_2": {
"sum": {
"field": "amount"
}
}
},
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"match": {
"some_field": {
"query": "DonationRequested"
}
}
}
]
}
}
]
}
},
"size": 0
}
Somehow, I need to combine these into one query and, grouped by some_id, get the amount from each aggregation, keeping only the groups where the resulting number is negative.
Not sure if this can really be achieved, but ideas are welcome.
The starting point would be pipeline aggregations; in particular, have a look at Cumulative Sum and Sum Bucket. Hope this helps.
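As a sketch of the pipeline idea, swapping in two different pipeline aggregations than the ones named above: group by some_id with a terms aggregation, compute each original sum in a filter sub-aggregation, combine them with bucket_script, and keep only negative results with bucket_selector. This assumes some_id is a not_analyzed (keyword) field; the agg3/agg4 match values and the group size of 100 are hypothetical placeholders.

```python
import json

# Agg1 and Agg2 match the question; the agg3/agg4 values are hypothetical placeholders.
TERMS = {
    "agg1": "PayoutRequested",
    "agg2": "DonationRequested",
    "agg3": "HYPOTHETICAL_VALUE_3",
    "agg4": "HYPOTHETICAL_VALUE_4",
}


def negative_number_query(terms=TERMS, group_field="some_id", group_size=100):
    per_group = {}
    for name, value in terms.items():
        # One filter sub-aggregation per original query, each summing "amount".
        per_group[name] = {
            "filter": {"match": {"some_field": value}},
            "aggs": {"amount": {"sum": {"field": "amount"}}},
        }
    # number = agg1 + agg2 - agg3 - agg4, evaluated per some_id bucket.
    # The script assumes the keys of `terms` are exactly agg1..agg4.
    per_group["number"] = {
        "bucket_script": {
            "buckets_path": {name: f"{name}>amount" for name in terms},
            "script": "params.agg1 + params.agg2 - params.agg3 - params.agg4",
        }
    }
    # Drop every some_id bucket whose number is not negative.
    per_group["only_negative"] = {
        "bucket_selector": {
            "buckets_path": {"number": "number"},
            "script": "params.number < 0",
        }
    }
    return {
        "size": 0,
        "aggs": {
            "by_id": {
                "terms": {"field": group_field, "size": group_size},
                "aggs": per_group,
            }
        },
    }


print(json.dumps(negative_number_query(), indent=2))
```

bucket_selector runs after the terms aggregation has chosen its buckets, so only the some_id values that made it into the top group_size are checked.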

Need aggregation of only the query results

I need to run an aggregation over only the limited results I get from the query, but it is not working: it also aggregates results outside the size limit of the query. Here is the query I am running:
{
"size": 500,
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "possiblePurchase"
}
},
{
"term": {
"clientName": "Ci"
}
},
{
"range": {
"firstSeenDate": {
"gte": "now-30d"
}
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "skipPurchase"
}
}
]
}
},
"sort": [
{
"firstSeenDate": {
"order": "desc"
}
}
],
"aggs": {
"byClient": {
"terms": {
"field": "clientName",
"size": 25
},
"aggs": {
"byTarget": {
"terms": {
"field": "targetName",
"size": 6
},
"aggs": {
"byId": {
"terms": {
"field": "id",
"size": 5
}
}
}
}
}
}
}
}
I need the aggregations to consider only the first 500 results of the query, sorted by the field I sort on in the query. I am completely lost. Thanks for the help!
The scope of an aggregation is all the hits matched by your query; the size parameter only specifies how many of those hits to fetch and return.
If you want to restrict the aggregation to the first n hits of a query, I would suggest the sampler aggregation in combination with your query.
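A minimal sketch of both options, assuming the field names from the question (the query is trimmed). wrap_in_sampler, a hypothetical helper, moves the existing aggs under a sampler aggregation. Note that sampler keeps the top-scoring shard_size documents per shard, not the first 500 by firstSeenDate, and with several shards the sample can exceed 500 documents. If the aggregation must follow the sort exactly, counting the 500 returned hits client-side (count_locally, also hypothetical) is a simple alternative.

```python
import copy
from collections import Counter, defaultdict


def wrap_in_sampler(body, shard_size=500):
    """Return a copy of a search body whose top-level aggs run under a sampler."""
    wrapped = copy.deepcopy(body)
    inner_aggs = wrapped.pop("aggs", {})
    wrapped["aggs"] = {
        "sample": {
            # Keeps the top-scoring shard_size documents on each shard.
            "sampler": {"shard_size": shard_size},
            "aggs": inner_aggs,
        }
    }
    return wrapped


def count_locally(hits):
    """Count client > target > id over the returned hits only (respects the sort)."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for hit in hits:
        src = hit["_source"]
        counts[src["clientName"]][src["targetName"]][src["id"]] += 1
    return counts


body = {
    "size": 500,
    "sort": [{"firstSeenDate": {"order": "desc"}}],
    "query": {"term": {"tags.keyword": "possiblePurchase"}},
    "aggs": {"byClient": {"terms": {"field": "clientName", "size": 25}}},
}
sampled = wrap_in_sampler(body)
```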

Elasticsearch Remove duplicate results if greater than some value

I have news articles from multiple sources saved, and each source has a different category. I need to write a query that sorts the articles in reverse chronological order, in chunks of 15 at a time, and I don't need more than 3 articles from any particular source. I am using the query below, but the results are wrong. Can anyone tell me what I am doing wrong?
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"category": "Digital"
}
},
{
"match_phrase": {
"type": "Local"
}
}
]
}
},
"collapse": {
"field": "source.keyword",
"max_concurrent_group_searches": 3
},
"sort": [
{
"pub_date": {
"order": "desc"
}
}
]
}
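In the collapse settings, max_concurrent_group_searches only controls how many inner-hit requests run concurrently; it does not cap the number of articles per source. A possible fix, sketched as a Python request builder under the assumptions of the question (source.keyword is a keyword field, pub_date is a date): collapse on the source and ask for up to 3 newest articles per source via inner_hits. The function name and the inner_hits name are hypothetical.

```python
import json


def latest_articles_page(page, page_size=15, per_source=3):
    """One page of collapsed results: one top-level hit per source, with up to
    `per_source` newest articles from that source in inner_hits."""
    return {
        "from": page * page_size,
        "size": page_size,
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"category": "Digital"}},
                    {"match_phrase": {"type": "Local"}},
                ]
            }
        },
        "collapse": {
            "field": "source.keyword",
            # The per-source cap comes from inner_hits.size.
            "inner_hits": {
                "name": "latest_per_source",  # hypothetical name
                "size": per_source,
                "sort": [{"pub_date": {"order": "desc"}}],
            },
        },
        "sort": [{"pub_date": {"order": "desc"}}],
    }


print(json.dumps(latest_articles_page(0), indent=2))
```

With collapse, from and size paginate over sources rather than articles, so each page holds up to 15 sources with up to 3 articles each; the client would flatten the inner hits (and re-sort them by pub_date) to show them as one list.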

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need it to return rows with unique usernames. Currently, it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried using aggs, but that didn't work either). Any help is appreciated.
You would need a terms aggregation to get all unique users, and then a top_hits aggregation to get only one result per user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only aggregation results are returned. One important thing to remember is that the terms aggregation groups by exact indexed terms: if userid is analyzed, buckets are built from its tokens rather than from the whole id, so you might have to make userid not_analyzed to be sure each id gets its own bucket. With a not_analyzed userid, ABC and abc will be considered different users.
Hope this helps!
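On the client side, the single post for each user sits inside each bucket's top_hits result. A sketch of extracting it in Python, using the aggregation names from the answer; the sample response is hypothetical and trimmed. Without a sort, top_hits returns the highest-scoring post in each bucket, while the buckets themselves are ordered by doc_count by default.

```python
def one_post_per_user(response):
    """Map each userid bucket key to the _source of its single top_hits document."""
    posts = {}
    for bucket in response["aggregations"]["unique_user"]["buckets"]:
        top = bucket["only_one_post"]["hits"]["hits"]
        if top:
            posts[bucket["key"]] = top[0]["_source"]
    return posts


# Hypothetical, trimmed response for illustration only.
sample = {
    "aggregations": {
        "unique_user": {
            "buckets": [
                {"key": "u1", "doc_count": 2,
                 "only_one_post": {"hits": {"hits": [{"_source": {"gtitle": "voice lessons"}}]}}},
                {"key": "u2", "doc_count": 1,
                 "only_one_post": {"hits": {"hits": [{"_source": {"gtitle": "voice acting"}}]}}},
            ]
        }
    }
}
print(one_post_per_user(sample))
```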

ElasticSearch filtering by field1 THEN field2 THEN take max of field3

I am struggling to get the information that I need from ElasticSearch.
My log statements are like this:
field1: Example
field2: Example2
field3: Example3
I would like to search a time frame (the last 24 hours) to find all data that has "this" in field1 and "that" in field2.
There may then be multiple this.that.[field3] entries, so I want to return only the maximum of that field.
In fact, in my data, field3 is actually the key of the entry.
What is the best way to retrieve the information I need? I have managed to get results back using aggs, but the data is in buckets, and I am only interested in the data with the max value of field3.
I have added an example of the query I am trying to run: https://jsonblob.com/54535d49e4b0d117eeaf6bb4
{
"size": 0,
"aggs": {
"agg_129": {
"filters": {
"filters": {
"CarName: Toyota": {
"query": {
"query_string": {
"query": "CarName: Toyota"
}
}
}
}
},
"aggs": {
"agg_130": {
"filters": {
"filters": {
"Attribute: TimeUsed": {
"query": {
"query_string": {
"query": "Attribute: TimeUsed"
}
}
}
}
},
"aggs": {
"agg_131": {
"terms": {
"field": "#timestamp",
"size": 0,
"order": {
"_count": "desc"
}
}
}
}
}
}
}
},
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2014-10-27T00:00:00.000Z",
"lte": "2014-10-28T23:59:59.999Z"
}
}
}
],
"must_not": []
}
}
}
}
}
So the example above shows only the documents that have CarName = Toyota and Attribute = TimeUsed.
My data is as follows:
There are x cars (CarName), each car has y Attributes, and each of those Attributes has documents with a timestamp.
To begin with, I was looking for a query for CarName.Attribute.timestamp (latest). However, if I can use just ONE query to get the latest timestamp for EVERY attribute of EVERY CarName, that would reduce the number of query calls from ~50 to one.
If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with size: 1 and a descending sort on the field3 value.
This returns the whole document with the maximum value of that field, as you want.
This example in the documentation might do the trick.
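A sketch of that top_hits variant as a Python request builder, assuming the CarName/Attribute fields from the question and using #timestamp as the field to sort on (substitute field3 if that is the field whose maximum you need). The function name is hypothetical, and the query part (the time range filter) is omitted for brevity.

```python
import json


def latest_doc_per_attribute(sort_field="#timestamp", size=10):
    """For each CarName/Attribute pair, return the single newest document."""
    return {
        "size": 0,
        "aggs": {
            "by_carnames": {
                "terms": {"field": "CarName", "size": size},
                "aggs": {
                    "by_attribute": {
                        "terms": {"field": "Attribute", "size": size},
                        "aggs": {
                            # size 1 + descending sort = document with the max value
                            "latest_doc": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{sort_field: {"order": "desc"}}],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(latest_doc_per_attribute(), indent=2))
```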
Edit:
OK, it seems you don't need the whole document, only the maximum timestamp value. You can use a max aggregation instead of a top_hits one.
The following query (not tested) should give you the maximum timestamp for each of the top 10 Attribute values within each of the top 10 CarName values, in a single request.
A terms aggregation is like a GROUP BY clause, so you should not have to query 50 times to retrieve the values for each CarName/Attribute combination: that is the point of nesting a terms aggregation on Attribute inside the CarName aggregation.
Note that, to work properly, the CarName and Attribute fields should be not_analyzed; otherwise you will get "funny" results in your buckets. The problem (and a possible solution) is described very well here.
Feel free to change the size parameter of the terms aggregations to fit your case.
{
"size": 0,
"aggs": {
"by_carnames": {
"terms": {
"field": "CarName",
"size": 10
},
"aggs": {
"by_attribute": {
"terms": {
"field": "Attribute",
"size": 10
},
"aggs": {
"max_timestamp": {
"max": {
"field": "#timestamp"
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2014-10-27T00:00:00.000Z",
"lte": "2014-10-28T23:59:59.999Z"
}
}
}
]
}
}
}
}
}
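The buckets in the response to this query are nested two levels deep; a small Python sketch can flatten them into one lookup per CarName/Attribute pair. The sample response is hypothetical. For a date field, the max aggregation returns epoch milliseconds in value and usually a formatted value_as_string as well, which the sketch prefers when present.

```python
def max_timestamps(response):
    """Flatten by_carnames > by_attribute > max_timestamp into a dict
    keyed by (CarName, Attribute)."""
    result = {}
    for car in response["aggregations"]["by_carnames"]["buckets"]:
        for attr in car["by_attribute"]["buckets"]:
            mt = attr["max_timestamp"]
            # Prefer the formatted date string; fall back to epoch millis.
            result[(car["key"], attr["key"])] = mt.get("value_as_string", mt["value"])
    return result


# Hypothetical, trimmed response for illustration only.
sample = {
    "aggregations": {
        "by_carnames": {
            "buckets": [
                {
                    "key": "Toyota",
                    "by_attribute": {
                        "buckets": [
                            {
                                "key": "TimeUsed",
                                "max_timestamp": {
                                    "value": 1414540799999.0,
                                    "value_as_string": "2014-10-28T23:59:59.999Z",
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}
print(max_timestamps(sample))
```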
