How to filter terms aggregation - elasticsearch

Currently I have something like this:
aggs: {
categories: {
terms: {
field: 'category'
}
}
}
and this gives me the number of products in each category. But I have an additional condition: I need the number of products in each category that have not been sold yet, so I need to filter the terms aggregation somehow.
Is there an elegant way of doing this with the aggregation framework, or do I need to write a filtered query?
Thank you

You can combine a Terms Aggregation with a Filter Aggregation by nesting the terms aggregation inside the filter. This is how it should look (tested):
aggs: {
categories: {
filter: {term: {sold: false}},
aggs: {
names: {
terms: {field: 'category'}
}
}
}
}
You can also add more conditions to the filter. I hope this helps.
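As a rough sketch of the same idea in Python, the request body can be built as a plain dict and the per-category counts read back from the response. The helper names (`build_unsold_by_category`, `category_counts`) are hypothetical, and the code assumes unsold products are stored with a boolean `sold: false`. The response layout assumed here is the usual one for a filter aggregation wrapping a terms aggregation.

```python
def build_unsold_by_category(sold_field="sold", category_field="category"):
    """Build a search body that counts unsold products per category.

    The field names are assumptions; adjust them to the actual mapping.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "categories": {
                # The filter aggregation narrows the document set first...
                "filter": {"term": {sold_field: False}},
                "aggs": {
                    # ...and the terms aggregation buckets what is left.
                    "names": {"terms": {"field": category_field}}
                },
            }
        },
    }


def category_counts(response):
    """Map each category key to its doc_count from a search response dict."""
    buckets = response["aggregations"]["categories"]["names"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The dict returned by `build_unsold_by_category()` can be sent as the JSON body of a `_search` request by whatever HTTP or Elasticsearch client is already in use.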

Just to add to the other answer, you can also use a nested aggregation when the field you filter on lives inside nested objects. This is similar to what I had to do. I'm using Elasticsearch 5.2.
From the docs, here is the basic syntax:
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
}
[,"aggregations" : { [<sub_aggregation>]+ } ]?
}
[,"<aggregation_name_2>" : { ... } ]*
}
This is how I implemented it:
GET <path> core_data/_search
{
"aggs": {
"NAME": {
"nested": {
"path": "ATTRIBUTES"
},
"aggs": {
"NAME": {
"filter": {
"term": {
"ATTRIBUTES.ATTR_TYPE": "EDUCATION_DEGREE"
}
},
"aggs": {
"NAME": {
"terms": {
"field": "ATTRIBUTES.DESCRIPTION",
"size": 100
}
}
}
}
}
}
}
}
This filtered the data down to one bucket, which is what I needed.
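The three levels in the request above can be easier to follow when each aggregation has its own name. The Python sketch below builds the same nested, filter, terms chain from the inside out. The names `attributes`, `degrees_only` and `descriptions` and the `wrap` helper are hypothetical; the path and field names are taken from the example.

```python
def wrap(name, agg_type, agg_body, sub_aggs=None):
    """Return {name: {agg_type: agg_body, "aggs": sub_aggs}} for one level."""
    level = {agg_type: agg_body}
    if sub_aggs:
        level["aggs"] = sub_aggs
    return {name: level}


def build_degree_descriptions(size=100):
    # Innermost level: bucket the descriptions.
    descriptions = wrap(
        "descriptions", "terms",
        {"field": "ATTRIBUTES.DESCRIPTION", "size": size},
    )
    # Middle level: keep only nested objects of the wanted type.
    degrees_only = wrap(
        "degrees_only", "filter",
        {"term": {"ATTRIBUTES.ATTR_TYPE": "EDUCATION_DEGREE"}},
        descriptions,
    )
    # Outermost level: step into the nested ATTRIBUTES objects.
    attributes = wrap("attributes", "nested", {"path": "ATTRIBUTES"}, degrees_only)
    return {"aggs": attributes}
```

With these names, the buckets would be found under `aggregations.attributes.degrees_only.descriptions.buckets` in the response.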

Related

ElasticSearch sorting based with wildcard in key

I get a data structure like this back from a query. I want to sort on the date inside the object values.
{
users: {
"1234": {
name: "User 1",
joining_date: "2022-12-28T11:37:00.000Z"
},
"3456": {
name: "User 2",
joining_date: "2022-12-18T11:37:00.000Z"
}
}
}
This is my query so far.
GET /_search
{
"sort" : [ {
"users.*.joining_date": {
"order": "desc",
"format": "date",
"unmapped_type": "long"
} }
],
"query": {
"query_string": {
"query": "_schema:users"
}
}
}
The problem is with using a wildcard in the key. I have tried multiple combinations from the documentation but nothing worked so far. I will be grateful for any help.

Create different sub-aggregations depending on a top-level filters aggregation

I'm using a filters aggregation with ElasticSearch, and I was wondering whether I could create different sub-aggregations for the different filter buckets. In my case, I'm aggregating over two sources, chrome and electron, and I want to run different sub-aggregations for each source. Below is my current aggregation hash:
aggs: {
**top_level_agg,
sources: {
filters: {
filters: {
chrome: { term: { source: TrackingEvent::CHROME_SOURCE } },
electron: { term: { source: TrackingEvent::ELECTRON_SOURCE } }
}
},
aggs: {
**chrome_specific_agg,
**electron_specific_agg
}
}
}
This works but isn't ideal, because the results for the chrome and electron filter buckets each contain both the chrome-specific and the electron-specific aggs. It would be better if I could do something like this (I know this doesn't work):
aggs: {
**top_level_agg,
sources: {
filters: {
filters: {
chrome: {
term: { source: TrackingEvent::CHROME_SOURCE },
aggs: {
**chrome_specific_agg
}
},
electron: {
term: { source: TrackingEvent::ELECTRON_SOURCE },
aggs: {
**electron_specific_agg
}
}
}
}
}
}
I'm not sure if this is possible with ES but I thought I'd ask. Any ideas on how to make this work?
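Sure you can. Just gotta adjust the nested-ness a bit: give each source its own filter aggregation and put the source-specific sub-aggregations inside it: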
{
"aggs": {
"top_level_agg": {},
"chrome_specific_agg_name": {
"filter": {
"term": {
"source": "TrackingEvent::CHROME_SOURCE"
}
},
"aggs": {
"chrome_specific_agg": {}
}
},
"electron_specific_agg_name": {
"filter": {
"term": {
"source": "TrackingEvent::ELECTRON_SOURCE"
}
},
"aggs": {
"electron_specific_agg": {}
}
}
}
}
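The layout above, one filter aggregation per source with its own sub-aggregations, can be generated rather than written by hand. The following Python sketch is one way to do that. The source values, the sub-aggregation names and the fields `tab_count` and `app_version` are made up for illustration and are not taken from the question.

```python
def per_source_aggs(source_specs, field="source"):
    """Build one filter aggregation per source, each with its own sub-aggs.

    source_specs maps an aggregation name to (source_value, sub_aggs).
    """
    aggs = {}
    for name, (source_value, sub_aggs) in source_specs.items():
        aggs[name] = {
            "filter": {"term": {field: source_value}},
            "aggs": sub_aggs,
        }
    return aggs


# Hypothetical values and sub-aggregations, for illustration only.
specs = {
    "chrome": ("chrome", {"tabs_opened": {"sum": {"field": "tab_count"}}}),
    "electron": ("electron", {"versions": {"terms": {"field": "app_version"}}}),
}
body = {"size": 0, "aggs": per_source_aggs(specs)}
```

Each bucket in the response then carries only the sub-aggregations that were defined for its own source.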

"Filter then Aggregation" or just "Filter Aggregation"?

I have been working with ES recently and found that I can get almost the same result in two ways, but I have no clear idea of the DIFFERENCE between them.
"Filter then Aggregation"
POST kibana_sample_data_flights/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"term": {
"DestCountry": "CA"
}
}
}
},
"aggs": {
"ca_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
"Filter Aggregation"
POST kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"ca": {
"filter": {
"term": {
"DestCountry": "CA"
}
},
"aggs": {
"_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
}
}
My Questions
Why are there two similar functions? I assume I am misunderstanding something, so what is the difference?
(please ignore the result format, it's not what I am asking about ;p)
Which is better if I want to filter out the unrelated/unmatched documents and then run the aggregation over a large number of documents?
When you use the filter in "query", it is applied to ALL the docs in your index. In this case it acts like a normal WHERE clause: SELECT * FROM index WHERE (my_filter_condition1 AND my_filter_condition2 OR my_filter_condition3...).
When you use it in "aggs", it is applied to the docs that reach the aggregation, which may or may not have been filtered by a query already. Say you have a structure like:
#OPTION A
{
"aggs":{
"t_shirts" : {
"filter" : { "term": { "type": "t-shirt" } }
}
}
}
Without a "query", this selects exactly the same docs as
#OPTION B
{
"query":{
"bool": {
"filter" : { "term": { "type": "t-shirt" } }
}
}
}
BUT the results will be returned in different fields.
In Option A, the results will be returned in the aggregations field.
In Option B, the results will be returned in the hits field.
I would recommend always applying your filters in the query part, so that subsequent aggregations work on the already filtered docs. Also, aggregations cost more performance than queries.
Hope this is helpful! :D
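To make the two placements concrete, here is a small Python sketch that builds both request bodies from the flights example in the question and reads the buckets back from either response shape. The function names are hypothetical. Note that the buckets sit one level deeper when a filter aggregation is used.

```python
def filter_in_query(term_filter, aggs):
    """Filter in the query: hits and every aggregation see only matching docs."""
    return {"query": {"bool": {"filter": term_filter}}, "aggs": aggs}


def filter_in_aggs(name, term_filter, aggs):
    """Filter aggregation: hits are unaffected, only this branch is narrowed."""
    return {"aggs": {name: {"filter": term_filter, "aggs": aggs}}}


ca = {"term": {"DestCountry": "CA"}}
weathers = {"weathers": {"terms": {"field": "DestWeather"}}}

option_query = filter_in_query(ca, weathers)
option_aggs = filter_in_aggs("ca", ca, weathers)


def weather_buckets(response, filter_name=None):
    """Read the weather buckets; they sit one level deeper for a filter agg."""
    aggs = response["aggregations"]
    if filter_name is not None:
        aggs = aggs[filter_name]
    return aggs["weathers"]["buckets"]
```

With `size` set to 0 in both bodies, the bucket contents should be the same either way; the difference only shows once hits or additional, unfiltered aggregations are requested alongside.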
Both filters, used in isolation, are equivalent. If you load no results (hits), then there is no difference. But you can combine listing and aggregations: query or filter your docs for the listing, and calculate aggregations on a bucket further limited by the aggs filter. Like this:
POST kibana_sample_data_flights/_search
{
"size": 100,
"query": {
"bool": {
"filter": {
"term": {
... some other filter
}
}
}
},
"aggs": {
"ca_filter": {
"filter": {
"term": {
"DestCountry": "CA"
}
},
"aggs": {
"ca_weathers": {
"terms": { "field": "DestWeather" }
}
}
}
}
}
But more likely you will need it the other way round, i.e. run aggregations on all docs to display summary information while listing only the docs from a specific query. In that case you need to combine aggregations with post_filter.
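A minimal sketch of that second pattern in Python, assuming the flights sample index from the question: the aggregation runs on every document matched by the query, while `post_filter` narrows only the hits that are returned. The function name, the aggregation name and the choice of `match_all` are illustrative.

```python
def summary_with_post_filter(post_filter, aggs, size=100):
    """Aggregate over all matched docs but list only those passing post_filter."""
    return {
        "size": size,
        "query": {"match_all": {}},   # aggregations are computed on this set
        "aggs": aggs,
        "post_filter": post_filter,   # applied to hits after aggregations run
    }


body = summary_with_post_filter(
    post_filter={"term": {"DestCountry": "CA"}},
    aggs={"all_weathers": {"terms": {"field": "DestWeather"}}},
)
```

Here the weather summary covers all flights, while the listed hits are restricted to flights with `DestCountry` equal to `CA`.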
An answer from @Val's comment, quoted here for reference:
In option A, the aggregation will be run on ALL documents. In option B, the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100, it's pretty evident that option B will always be faster.

Search for documents with exactly different fields values

I'm adding documents with the following structure
{
"proposta": {
"matriculaIndicacao": 654321,
"filial": 100,
"cpf": "12345678901",
"idStatus": "3",
"status": "Reprovada",
"dadosPessoais": {
"nome": "John Five",
"dataNascimento": "1980-12-01",
"email": "fulanodasilva@fulano.com.br",
"emailValidado": true,
"telefoneCelular": "11 99876-9999",
"telefoneCelularValidado": true,
"telefoneResidencial": "11 2211-1122",
"idGenero": "1",
"genero": "M"
}
}
}
I'm trying to perform a search on multiple field values.
I can successfully search for a document with a specific cpf attribute using the following search
{
"query": {
"term" : {
"proposta.cpf" : "23798770823"
}
}
}
But now I need to add an AND clause, like
{
"query": {
"term" : {
"proposta.cpf" : "23798770823"
,"proposta.dadosPessoais.dataNascimento": "1980-12-01"
}
}
}
but it's returning an error message.
P.S: If possible I would like to perform a search where if the field doesn't exist, it returns the document that matches only the proposta.cpf field.
I really appreciate any help.
The idea is to combine your constraints within a bool/should query:
{
"query": {
"bool": {
"should": [
{
"term": {
"proposta.cpf": "23798770823"
}
},
{
"term": {
"proposta.dadosPessoais.dataNascimento": "1980-12-01"
}
}
]
}
}
}
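A bool query that contains only should clauses returns documents matching either term. If the cpf must always match and the birth date should only be checked when the field is present, as the P.S. in the question asks, one possible variation is sketched below in Python. The function name is hypothetical, and the query uses the standard `bool`, `term` and `exists` clauses; treat it as a starting point rather than a tested solution for this index.

```python
def cpf_with_optional_birth_date(cpf, birth_date):
    """Require the cpf; require the birth date only when the field exists."""
    birth_field = "proposta.dadosPessoais.dataNascimento"
    return {
        "query": {
            "bool": {
                "filter": [
                    # Clause 1: the cpf must always match.
                    {"term": {"proposta.cpf": cpf}},
                    # Clause 2: either the birth date matches,
                    # or the document has no birth date at all.
                    {
                        "bool": {
                            "should": [
                                {"term": {birth_field: birth_date}},
                                {"bool": {"must_not": {"exists": {"field": birth_field}}}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }
```

Calling `cpf_with_optional_birth_date("23798770823", "1980-12-01")` yields a body that can be posted to `_search` as-is.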

Elasticsearch match nested field against array of values

I'm trying to apply a terms query on a nested field using mongoid-elasticsearch and ElasticSearch 2.0. This has become quite frustrating, since trial and error didn't pay off much and the docs on the subject are rather sparse.
Here is my query:
{
"query": {
"nested": {
"path": "awards",
"query": {
"bool": {
"must": [
{ "match": { "awards.year": "2010"}}
]
}
}
},
"nested":{
"path": "procuring_entity",
"query": {
"bool": {
"must": [
{ "terms": { "procuring_entity.country": ["ES", "PL"]}}
]
}
}
}
}
}
While "match" and "term" work just fine, when combined with the "terms" query it returns no results, even though it should. My mappings look like this:
elasticsearch!({
prefix_name: false,
index_name: 'documents',
index_options: {
mappings: {
document: {
properties: {
procuring_entity: {
type: "nested"
},
awards: {
type: "nested"
}
}
}
}
},
wrapper: :load
})
If "nested" doesn't count as an analyzer (which, as far as I know, it doesn't), then there's no problem with that. As for the second example, I don't think it's the case, since the array of values that it's matched against comes from outside.
Is a terms query possible on nested fields? Am I doing something wrong?
Is there any other way to match a nested field against multiple values?
Any thoughts would be much appreciated.
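I think you would need to change the mappings for your nested types for this. The terms query does not analyze the values you pass in, so it only matches reliably on not_analyzed fields; an analyzed string field would typically index "ES" as "es". If you update your mapping to something like: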
elasticsearch!({
prefix_name: false,
index_name: 'documents',
index_options: {
mappings: {
document: {
properties: {
procuring_entity: {
type: 'nested',
properties: {
country: {
'type': 'string',
'index': 'not_analyzed'
}
}
},
awards: {
type: 'nested'
}
}
}
}
},
wrapper: :load
})
I think the query should work if you do that.
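Separately from the mapping, the query in the question repeats the `nested` key inside one JSON object, so depending on the parser one of the two clauses may be dropped or the request rejected. A possible restructuring, sketched in Python with hypothetical helper names, wraps each nested clause separately and combines them under `bool`/`must`:

```python
def nested_clause(path, query):
    """Wrap a query so it runs against the nested objects under `path`."""
    return {"nested": {"path": path, "query": query}}


def awards_and_countries(year, countries):
    """Match documents with an award in `year` and an entity in `countries`."""
    return {
        "query": {
            "bool": {
                "must": [
                    nested_clause("awards", {"match": {"awards.year": year}}),
                    nested_clause(
                        "procuring_entity",
                        {"terms": {"procuring_entity.country": countries}},
                    ),
                ]
            }
        }
    }


body = awards_and_countries("2010", ["ES", "PL"])
```

This keeps both conditions in the request; whether the terms clause then matches still depends on how `procuring_entity.country` is indexed, as discussed above.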
