Aggregating properties in Elasticsearch - elasticsearch

I have indexed entries with optional properties. For example, I have entries like this:
{
"id":1,
"field1":"XYZ"
},
{
"id":2,
"field2":"XYZ"
},
{
"id":3,
"field1":"XYZ"
}
I would like to write an aggregation that tells me how many entries have field1 populated and how many have field2 populated.
The expected result should be:
{
"field1":2,
"field2":1
}
Is this even possible with Elasticsearch?

Yes, you can do it like this:
POST myindex/_search
{
"size": 0,
"aggs": {
"field_exists": {
"filters": {
"filters": {
"field1": {
"exists": {
"field": "field1"
}
},
"field2": {
"exists": {
"field": "field2"
}
}
}
}
}
}
}
You'll get an answer like this one:
"aggregations" : {
"field_exists" : {
"buckets" : {
"field1" : {
"doc_count" : 2
},
"field2" : {
"doc_count" : 1
}
}
}
}
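
If you want exactly the flat shape from the question, one option is to flatten the buckets on the client. The following is a minimal Python sketch, not part of the original answer: the helper names build_exists_counts_body and flatten_exists_counts are hypothetical, and the sample response is the one shown above.

```python
from typing import Any, Dict, Iterable


def build_exists_counts_body(fields: Iterable[str]) -> Dict[str, Any]:
    """Build a search body with one named `exists` filter per field."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "field_exists": {
                "filters": {
                    "filters": {f: {"exists": {"field": f}} for f in fields}
                }
            }
        },
    }


def flatten_exists_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Reduce the named buckets to a plain {field: doc_count} mapping."""
    buckets = response["aggregations"]["field_exists"]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


# Response copied from the answer above, used here as sample input.
sample_response = {
    "aggregations": {
        "field_exists": {
            "buckets": {
                "field1": {"doc_count": 2},
                "field2": {"doc_count": 1},
            }
        }
    }
}

print(flatten_exists_counts(sample_response))  # {'field1': 2, 'field2': 1}
```

The body returned by build_exists_counts_body(["field1", "field2"]) is the same request as above, so it can be sent with whatever client you already use.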

Related

Elasticsearch aggregation query with filters

I wrote an Elasticsearch query to get the aggregated doc count for the keyword "webserver1". Below is the query:
POST _search?filter_path=aggregations.*.buckets
{
"query": {
"bool": {
"must": [
{
"match": {
"hostname": "webserver1"
}
}
]
}
},
"aggs": {
"webserver1": {
"terms": {
"field": "webserver1"
}
}
}
}
Response:
{
"aggregations" : {
"webserver1" : {
"buckets" : [
{
"key" : "webserver1",
"doc_count" : 36715
}
]
}
}
}
Is there a way to keep only the wanted values and display them like this:
{
"webserver1" : 36715
}
I have checked multiple resources, but I couldn't find any filter or option that does this.
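
Note that filter_path only trims the response; it does not reshape it. One possible workaround, sketched here in Python under the assumption that post-processing on the client is acceptable, is to collapse the buckets into a key-to-count mapping. The function name flatten_terms_buckets is hypothetical, and the sample input is the response shown above.

```python
from typing import Any, Dict


def flatten_terms_buckets(response: Dict[str, Any], agg_name: str) -> Dict[str, int]:
    """Turn the bucket list of a terms aggregation into {key: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {str(bucket["key"]): bucket["doc_count"] for bucket in buckets}


# Response copied from the question above.
sample_response = {
    "aggregations": {
        "webserver1": {
            "buckets": [
                {"key": "webserver1", "doc_count": 36715},
            ]
        }
    }
}

print(flatten_terms_buckets(sample_response, "webserver1"))  # {'webserver1': 36715}
```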

Search by internal field in Elasticsearch

Structure:
{
.................
"mp": "CAR",
"nPhoto": 1,
"items": [
{
"availableQuantity": 3
},
{
"availableQuantity": 0
},
{
"availableQuantity": 0
}
],
............................
}
When I filter by the mp field, I generate the following query:
GET catalog/_search
{
"from" : 0,
"size" : 0,
"aggregations" : {
"brand" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"mp" : "CAR"
}
}
}
},
"aggregations" : {
"photosQuantity" : { "sum" : { "field" : "nPhoto" } }
}
}
}
}
But how do I write the query if I need to filter on availableQuantity, keeping only documents where at least one of the items has availableQuantity > 0?
What you probably want is a nested query in the filter part (this assumes items is mapped as a nested field).
Something along the lines of this (gt excludes items with a quantity of 0):
{
"from": 0,
"size": 0,
"aggregations": {
"brand": {
"filter": {
"nested": {
"path": "items",
"query": {
"range": {
"items.availableQuantity": {
"gt": 0
}
}
}
}
},
"aggregations": {
"photosQuantity": {
"sum": {
"field": "nPhoto"
}
}
}
}
}
}
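
To make the intended semantics concrete, here is a small Python sketch. It builds the request body with a strict gt bound (the question asks for availableQuantity > 0) and mirrors the "at least one item matches" rule locally. The helper names and the second sample document are assumptions made for illustration; only the first sample document comes from the question.

```python
from typing import Any, Dict, List


def build_in_stock_photo_sum_body() -> Dict[str, Any]:
    """Sum nPhoto over documents with at least one item in stock."""
    return {
        "from": 0,
        "size": 0,
        "aggregations": {
            "brand": {
                "filter": {
                    "nested": {
                        "path": "items",  # assumes `items` is mapped as nested
                        "query": {
                            "range": {"items.availableQuantity": {"gt": 0}}
                        },
                    }
                },
                "aggregations": {
                    "photosQuantity": {"sum": {"field": "nPhoto"}}
                },
            }
        },
    }


def has_available_item(doc: Dict[str, Any]) -> bool:
    """Local mirror of the nested filter: any item with quantity > 0."""
    return any(item.get("availableQuantity", 0) > 0 for item in doc.get("items", []))


def sum_photos(docs: List[Dict[str, Any]]) -> int:
    """Local mirror of the sum aggregation over the filtered documents."""
    return sum(doc["nPhoto"] for doc in docs if has_available_item(doc))


sample_docs = [
    # Document from the question: one of three items is in stock.
    {"mp": "CAR", "nPhoto": 1, "items": [
        {"availableQuantity": 3}, {"availableQuantity": 0}, {"availableQuantity": 0}]},
    # Hypothetical document with nothing in stock; it must not be counted.
    {"mp": "CAR", "nPhoto": 5, "items": [{"availableQuantity": 0}]},
]

print(sum_photos(sample_docs))  # 1
```

With gte instead of gt, the second document would also pass the filter, which is why the strict bound matters here.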

How to aggregate nested fields to include null values?

I'm having trouble aggregating my nested data to include null values as well.
I'm using Elasticsearch version 6.8.
To simplify the problem: I have a nested field that looks like this:
PUT test/doc/_mapping
{
"properties": {
"fields": {
"type" : "nested",
"properties" : {
"name" : {
"type" : "keyword"
},
"value" : {
"type" : "long"
}
}
}
}
}
I created 3 documents:
PUT test/doc/1
{
"fields" : {
"name" : "aaa",
"value" : 1
}
}
PUT test/doc/2
{
"fields" : [{
"name" : "aaa",
"value" : 1
},
{
"name" : "bbb",
"value" : 2
}]
}
PUT test/doc/3
{
"fields" : [
{
"name" : "bbb",
"value" : 2
}]
}
Now I want to count the documents that have name="bbb", grouped by value.
For the above data I want to get:
2 – 2 documents
N/A – 1 document (the first document, where bbb is missing)
The problem is the null values: I cannot find a way to match the documents where "bbb" is missing and put them in an N/A bucket.
So far I have written a query that matches the values where "bbb" exists:
GET test/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"my_agg": {
"nested": {
"path": "fields"
},
"aggs": {
"my_filter": {
"filter": {
"term": {
"fields.name": "bbb"
}
},
"aggs": {
"my_term": {
"terms": {
"field": "fields.value"
}
}
}
}
}
}
}
}
And the response is:
"aggregations" : {
"my_agg" : {
"doc_count" : 4,
"my_filter" : {
"doc_count" : 2,
"my_term" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2,
"doc_count" : 2
}
]
}
}
}
}
I also want to get:
"key" : 0 (for N/A)
"doc_count" : 1
What am I missing?
If I understand this correctly, you want to see the buckets with zero/null/no matches. You can set min_doc_count to 0 on the terms aggregation, which includes all the buckets whose count is zero or that have no match. As the terms field you can also use "_id" to get a count based on each document.
GET test/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"my_agg": {
"nested": {
"path": "fields"
},
"aggs": {
"my_filter": {
"filter": {
"term": {
"fields.name": "bbb"
}
},
"aggs": {
"my_term": {
"terms": {
"field": "fields.value",
"min_doc_count": 0
}
}
}
}
}
}
}
}
You could also use inner_hits to find the hit in each document, or use _id in the aggregation query above.
POST test/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"nested": {
"path": "fields",
"query": {
"match": {
"fields.name": "bbb"
}
},
"inner_hits": {}
}
}
]
}
}
}
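
Another way to arrive at the N/A count, sketched here in Python and not taken from the answer above, is to add a reverse_nested sub-aggregation that counts the parent documents containing "bbb", then subtract that from the total hit count. The names parent_docs, build_bbb_body and count_missing are hypothetical, and the sample response is hand-written to match the three documents in the question; it is not real Elasticsearch output.

```python
from typing import Any, Dict


def build_bbb_body() -> Dict[str, Any]:
    """Question's aggregation plus a parent-document count for name == bbb."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "my_agg": {
                "nested": {"path": "fields"},
                "aggs": {
                    "my_filter": {
                        "filter": {"term": {"fields.name": "bbb"}},
                        "aggs": {
                            # Step back from nested objects to their parent documents.
                            "parent_docs": {"reverse_nested": {}},
                            "my_term": {"terms": {"field": "fields.value"}},
                        },
                    }
                },
            }
        },
    }


def count_missing(response: Dict[str, Any]) -> int:
    """Documents without any 'bbb' entry = total hits - parents that have one."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # Newer versions report {"value": n, "relation": ...};
        # this sketch assumes the value is an exact count.
        total = total["value"]
    with_bbb = response["aggregations"]["my_agg"]["my_filter"]["parent_docs"]["doc_count"]
    return total - with_bbb


# Hand-written sample shaped like a 6.8 response for the three documents above.
sample_response = {
    "hits": {"total": 3},
    "aggregations": {
        "my_agg": {
            "doc_count": 4,
            "my_filter": {
                "doc_count": 2,
                "parent_docs": {"doc_count": 2},
                "my_term": {"buckets": [{"key": 2, "doc_count": 2}]},
            },
        }
    },
}

print(count_missing(sample_response))  # 1
```

The subtraction only makes sense when the query matches every document you want to classify, as match_all does here.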

Getting "Field data loading is forbidden" when trying to aggregate

I'm trying to do a simple unique-count aggregation, but I'm getting this error:
java.lang.IllegalStateException: Field data loading is forbidden on eid
This is my query:
POST /logstash-2016.06.*/Nginx/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"pid": "1"
}
},
{
"term": {
"cvprogress": "0"
}
},
{
"range" : {
"ServerTime" : {
"gte" : "2016-06-28T00:00:00"
}
}
}
]
}
},
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid"
}
}
}
}
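After going through the entire thread at https://github.com/elastic/elasticsearch/issues/15267, what worked was adding .raw to the field name. Presumably this is because the default Logstash index template disables fielddata on analyzed string fields and provides a not_analyzed .raw sub-field for aggregations.
Like this: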
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid.raw"
}
}
}

Elasticsearch match list against field

I have a list (or array, whatever your language calls it), e.g. names : ["John","Bas","Peter"], and I want to query the name field for documents that match any of those names.
One way is with the OR filter, e.g.
{
"filtered" : {
"query" : {
"match_all": {}
},
"filter" : {
"or" : [
{
"term" : { "name" : "John" }
},
{
"term" : { "name" : "Bas" }
},
{
"term" : { "name" : "Peter" }
}
]
}
}
}
Is there a fancier way? Preferably a query rather than a filter.
{
"query": {
"filtered" : {
"filter" : {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Elasticsearch rewrites this as if you had used this one:
{
"query": {
"filtered" : {
"filter" : {
"bool": {
"should": [
{
"term": {
"name": "John"
}
},
{
"term": {
"name": "Bas"
}
},
{
"term": {
"name": "Peter"
}
}
]
}
}
}
}
}
When using a boolean filter, it is usually better to use the bool filter than the "and" or "or" filters. The reason is explained on the Elasticsearch blog: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
When I tried the filtered query, I got "no [query] registered for [filtered]". Based on the answer here, it seems the filtered query was deprecated and then removed in ES 5.0, so I suggest using:
{
"query": {
"bool": {
"filter": {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
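
If the list of names comes from application code, the body can be generated rather than written by hand. This is a minimal Python sketch: terms_filter_query and matches_any are hypothetical helper names, the sample documents are invented, and it assumes name is indexed as an exact (not analyzed) value, since term and terms do not analyze their input.

```python
from typing import Any, Dict, Iterable, List


def terms_filter_query(field: str, values: Iterable[str]) -> Dict[str, Any]:
    """Build a bool/filter query matching documents whose field equals any value."""
    return {"query": {"bool": {"filter": {"terms": {field: list(values)}}}}}


def matches_any(doc: Dict[str, Any], field: str, values: Iterable[str]) -> bool:
    """Local mirror of the terms filter for a single-valued exact field."""
    return doc.get(field) in set(values)


names = ["John", "Bas", "Peter"]

# Invented documents, used only to illustrate the matching rule.
sample_docs: List[Dict[str, Any]] = [
    {"name": "John"},
    {"name": "Alice"},
    {"name": "Peter"},
]

print(terms_filter_query("name", names))
print([d["name"] for d in sample_docs if matches_any(d, "name", names)])  # ['John', 'Peter']
```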
Example query: filter by a keyword and a list of values
{
"query": {
"bool": {
"must": [
{
"term": {
"fguid": "9bbfe844-44ad-4626-a6a5-ea4bad3a7bfb.pdf"
}
}
],
"filter": {
"terms": {
"page": [
"1",
"2",
"3"
]
}
}
}
}
}
