Return Documents with Null Field in Multi Level Aggregation - elasticsearch

We are using a multi-level aggregation: we have buckets of City, and each City bucket has sub-buckets of Class.
For a few documents Class is null, and in those cases an empty list of Class buckets is returned for the City. Please refer to the response below:
Sample output:
"aggregations":
{
"CITY":{
"buckets":[
{
"key":"CITY 1",
"doc_count":2,
"CLASS":{
"buckets":[
{
"key":"CLASS A",
"top_tag_hits":{
}
}
]
}
},
{
"key":"CITY 2",
"doc_count":2,
"CLASS":{
"buckets":[
]
}
}
]
}
}
Here the CITY 2 bucket has an empty list of CLASS buckets, because all documents under CITY 2 have a null CLASS field. Yet the CITY 2 bucket still reports a doc_count.
How can we return these documents under a bucket when the terms field is null?
Update:
Field Mapping for CLASS:
"CLASS":
{
"type": "string",
"index_analyzer": "text_with_autocomplete_analyzer",
"search_analyzer": "text_standard_analyzer",
"fields": {
"raw": {
"type": "string",
"null_value" : "na",
"index": "not_analyzed"
},
"partial_matching": {
"type": "string",
"index_analyzer": "text_with_partial_matching_analyzer",
"search_analyzer": "text_standard_analyzer"
}
}
}
Please take the mapping above into account when suggesting a solution.

You can use the missing setting of the terms aggregation to handle buckets with missing values. In your case, you'd do it like this:
{
"aggs": {
"CITY": {
"terms": {
"field": "city_field"
},
"aggs": {
"CLASS": {
"terms": {
"field": "class_field",
"missing": "NO_CLASS"
}
}
}
}
}
}
With this setup, all documents that don't have a class_field field (or have a null value in it) will land in the NO_CLASS bucket.
PS: Note that the missing setting is only available since ES 2.0, not in earlier releases.
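To make the behaviour concrete, here is a small pure-Python simulation of the two-level terms aggregation above. It is only an illustration, not how Elasticsearch computes buckets; city_field and class_field are the same placeholder names used in the query, and the sample documents are made up. Documents whose class is absent or null are counted under the NO_CLASS key instead of leaving the City bucket with an empty CLASS list:

```python
from collections import Counter, defaultdict


def city_class_buckets(docs, missing="NO_CLASS"):
    """Group docs by city, then by class, mimicking a terms aggregation
    with a "missing" value on the inner level (illustration only)."""
    by_city = defaultdict(Counter)
    for doc in docs:
        city = doc.get("city_field")
        if city is None:
            continue  # the outer terms agg in the example has no "missing" setting
        cls = doc.get("class_field")
        # Absent or null class values fall into the "missing" bucket.
        by_city[city][cls if cls is not None else missing] += 1
    return {
        city: {"doc_count": sum(counts.values()), "CLASS": dict(counts)}
        for city, counts in by_city.items()
    }


docs = [
    {"city_field": "CITY 1", "class_field": "CLASS A"},
    {"city_field": "CITY 1", "class_field": "CLASS A"},
    {"city_field": "CITY 2", "class_field": None},  # explicit null
    {"city_field": "CITY 2"},                       # field absent
]
print(city_class_buckets(docs))
```

Without the missing setting, the two CITY 2 documents would still be counted in the CITY 2 doc_count but would produce no CLASS bucket, which is exactly the empty list in the question.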

Related

Elasticsearch - Mapping fields from other indices

How can I define a mapping in Elasticsearch 7 to index a document with a field value taken from another index? For example, I have a users index with a mapping for name, email and account_number, but the account_number value actually lives in another index, accounts, in the field number.
I've tried something like this without much success (I only see "name", "email" and "account_id" in the results):
PUT users/_mapping
{
"properties": {
"name": {
"type": "text"
},
"email": {
"type": "text"
},
"account_id": {
"type": "integer"
},
"accounts": {
"properties": {
"number": {
"type": "text"
}
}
}
}
}
The accounts index has the following mapping:
{
"properties": {
"name": {
"type": "text"
},
"number": {
"type": "text"
}
}
}
As I understand it, you want to join fields the way it is usually done in relational databases. In Elasticsearch, this is only possible when the documents are in the same index (for example with the join field type). In your case, though, I think you need a different approach: your Account object should be a nested field of User.
PUT /users/_mapping
{
"properties": {
"account": {
"type": "nested"
}
}
}
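Each user document must then carry its own account data (for example an account.number value) at index time; Elasticsearch will not copy it over from the accounts index. You can then query the nested objects as if they were separate documents: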
GET /users/_search
{
"query": {
"nested": {
"path": "account",
"query": {
"bool": {
"must": [
{ "match": { "account.number": 1 } }
]
}
}
}
}
}
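Because the account data has to be embedded at index time, the "join" happens on the client before indexing. A minimal Python sketch, assuming (hypothetically) that each user's account_id equals the _id of a document in the accounts index and that both sets of documents have already been fetched into memory; all sample values are made up. The resulting documents could then be sent to the users index, for example with the bulk API:

```python
import json

# Hypothetical sample data: accounts keyed by their document _id,
# users referencing an account through account_id.
accounts = {
    1: {"name": "main", "number": "1001"},
    2: {"name": "savings", "number": "1002"},
}
users = [
    {"name": "Ann", "email": "ann@example.com", "account_id": 1},
    {"name": "Bob", "email": "bob@example.com", "account_id": 2},
    {"name": "Cid", "email": "cid@example.com", "account_id": 99},  # no such account
]


def embed_accounts(users, accounts):
    """Copy each user's account into a nested "account" object, since
    Elasticsearch will not join across indices at query time."""
    docs = []
    for user in users:
        doc = dict(user)  # don't mutate the input
        account = accounts.get(user["account_id"])
        if account is not None:
            doc["account"] = dict(account)
        docs.append(doc)
    return docs


for doc in embed_accounts(users, accounts):
    # Each line is a user document ready to be indexed into "users".
    print(json.dumps(doc))
```

Denormalizing like this means an account change has to be re-applied to every user document that embeds it; that is the usual trade-off of nested objects versus a relational join.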

nested terms aggregation on object containing a string field

I'd like to run a nested terms aggregation on a string field that is inside an object.
Usually I aggregate on the .keyword sub-field, so that I don't need to enable fielddata:
"terms": {
"field": "fieldname.keyword"
}
But I am unable to do that for a nested document, like this:
{
"nested": {
"path": "objectField"
},
"aggs": {
"allmyaggs": {
"terms": {
"field": "objectField.fieldName.keyword"
}
}
}
}
The above query just returns an empty buckets array.
Is there a way to do this without enabling fielddata in the index mapping? Fielddata takes a lot of heap memory, and I have already loaded a huge amount of data without it.
Document mapping:
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text"
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Sample document:
{
"productname": [
{
"productlineseqno": "1.58",
"iscontrolleddrug": "false",
"productlinename": "Consultations",
"productlinedescription": "Consultations",
"isprescribable": "false",
"invoiceitemname": "invoice name"
}
]
}
Fixed by changing the mapping to enable fielddata.
Just as a nested query is needed to search nested fields, a nested aggregation is needed to aggregate on nested fields:
{
"aggs": {
"fieldname": {
"nested": {
"path": "objectField"
},
"aggs": {
"fields": {
"terms": {
"field": "objectField.fieldname.keyword",
"size": 10
}
}
}
}
}
}
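As a rough mental model, the nested aggregation treats every object in the nested array as its own document, and the terms aggregation under it counts those objects. The pure-Python sketch below imitates that counting on the shape of the sample document; it is illustrative only (no analyzers, no shards), and the second invoiceitemname value is made up:

```python
from collections import Counter


def nested_terms(docs, path, field, size=10):
    """Count values of `field` across every nested object under `path`,
    roughly what a nested + terms aggregation reports (illustration only)."""
    counts = Counter()
    for doc in docs:
        for obj in doc.get(path, []):  # each nested object counts separately
            value = obj.get(field)
            if value is not None:
                counts[value] += 1
    # Return at most `size` buckets, highest count first.
    return [{"key": k, "doc_count": c} for k, c in counts.most_common(size)]


docs = [
    {"productname": [{"invoiceitemname": "invoice name"},
                     {"invoiceitemname": "other"}]},
    {"productname": [{"invoiceitemname": "invoice name"}]},
]
print(nested_terms(docs, "productname", "invoiceitemname"))
```

The key point it mirrors is that the path ("productname") must match the nested field; aggregating on a field name that does not exist under that path yields no buckets, which is the empty result described below.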
EDIT1:
If you are aggregating on productname.invoiceitemname.keyword, you will get an empty bucket list, because no field with that name exists in your mapping.
You need to define your mapping as below, adding a keyword sub-field to invoiceitemname:
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
From the Elasticsearch documentation on multi-fields ("fields"): "It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations."
When no mapping is provided explicitly, dynamic mapping creates a keyword sub-field for string values by default. If you create your own mapping (which you need to do for the nested type), you have to add the keyword sub-fields yourself, wherever you intend to use them.
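If the mapping has many text fields, adding the keyword sub-fields by hand is error-prone. Below is a hypothetical Python helper (not part of any Elasticsearch client) that walks a mapping's "properties", including nested and object fields, and adds a "keyword" multi-field to every text field. The ignore_above value of 256 mirrors what dynamic mapping generates, as seen in the id field of the diff_list mapping further down; adjust or drop it as needed:

```python
import json


def add_keyword_subfields(properties, ignore_above=256):
    """Return a copy of a mapping's "properties" in which every text field
    gets a "keyword" multi-field (hypothetical helper, not an ES API)."""
    result = {}
    for name, spec in properties.items():
        spec = dict(spec)  # shallow copy so the input is not modified
        if spec.get("type") == "text":
            fields = dict(spec.get("fields", {}))
            # Keep an existing "keyword" sub-field if one is already defined.
            fields.setdefault("keyword", {"type": "keyword", "ignore_above": ignore_above})
            spec["fields"] = fields
        if "properties" in spec:
            # Recurse into object and nested fields.
            spec["properties"] = add_keyword_subfields(spec["properties"], ignore_above)
        result[name] = spec
    return result


mapping = {"mappings": {"properties": {"productname": {
    "type": "nested",
    "properties": {
        "invoiceitemname": {"type": "text"},
        "isprescribable": {"type": "boolean"},
    },
}}}}
mapping["mappings"]["properties"] = add_keyword_subfields(mapping["mappings"]["properties"])
print(json.dumps(mapping, indent=2))
```

Note that this only produces the mapping body; existing documents still have to be reindexed before the new keyword sub-fields contain any values.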

How can I get the distinct result?

I am trying to query Elasticsearch (version 6.4) to get unique search results (the eid values). My query is below. I'd like to first run a text search over the two fields eLabel and pLabel, and then get the distinct eid values. But the result is not aggregated: it shows redundant ids, from 0 to over 20. How can I adjust the query?
{
"query": {
"multi_match": {
"query": "Brazil Capital",
"fields": [
"eLabel",
"pLabel"
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
My current mapping is as follows.
eid: ID of the entity
eLabel: entity label (e.g. Brazil)
prop_id: property ID of the entity (eid)
pLabel: label of the property (e.g. is the capital of, is located at ...)
"mappings": {
"entity": {
"properties": {
"eLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Based on your comment, you can use the bool query below. I don't think anything is wrong with the aggregation itself; just replace your query with this bool query and it should suffice.
With a multi_match query, a document matches if any of the query terms matches in any of the listed fields, so it would be returned even if it has eLabel = "Rio is capital of brazil" and pLabel = "something else entirely here".
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"eLabel": "capital"
}
},
{
"match": {
"pLabel": "brazil"
}
}
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
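The difference between the two queries can be illustrated with a toy Python model. It is a simplification and an assumption throughout: lowercase-and-split stands in for my_analyzer, scoring is ignored, and multi_match is modelled with its default behaviour (best_fields, operator "or"), where any query term in any field is enough. The bool/must version requires every clause to match its own field:

```python
def tokens(text):
    # Very rough stand-in for an analyzer: lowercase + whitespace split.
    return set(text.lower().split())


def multi_match_any(doc, query, fields):
    """Default multi_match: matches if any query term appears in any field."""
    terms = tokens(query)
    return any(terms & tokens(doc.get(f, "")) for f in fields)


def bool_must(doc, clauses):
    """bool/must of match clauses: every (field, text) clause must match."""
    return all(tokens(text) & tokens(doc.get(f, "")) for f, text in clauses)


doc = {"eLabel": "Rio is capital of brazil",
       "pLabel": "something else entirely here"}
print(multi_match_any(doc, "Brazil Capital", ["eLabel", "pLabel"]))  # True
print(bool_must(doc, [("eLabel", "capital"), ("pLabel", "brazil")]))  # False
```
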
Note that if you only want the eid values and not the documents, you can set "size": 0 in the bool query above. That way only the aggregation results are returned.
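The distinct eid values are the keys of the buckets in the "eids" terms aggregation, not the hits. A short Python sketch of reading them from the response body; the response below is an illustrative shape with made-up values. Keep in mind that a terms aggregation returns only the top 10 buckets by default, so set "size" inside the terms aggregation if you expect more distinct eids:

```python
def distinct_eids(response):
    """Pull the distinct eid values out of the "eids" terms aggregation."""
    buckets = response["aggregations"]["eids"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Illustrative response shape only (values are made up).
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "eids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "Q155", "doc_count": 21},
                {"key": "Q2", "doc_count": 3},
            ],
        }
    },
}
print(distinct_eids(sample_response))  # ['Q155', 'Q2']
```

A non-zero sum_other_doc_count in a real response is a hint that some eids did not fit into the requested number of buckets.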
Let me know if this helps!

ElasticSearch - Bucket average with Nested fields aggregation

I am trying to execute the following query in Elasticsearch. The scenario: one field in the document, diff_list, is an array of objects, each with three subfields: time1, time2 and id.
I want to calculate the average of the difference between time2 and time1 over all the items.
The query being executed is:
{
"query":{"match_all":{}},
"aggs":{
"total_time_diff":{
"nested":{"path":"diff_list"},
"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
},
// Here I need average of the sum which is calculated in total_time_diff "sum" aggregation
"avg_diff":{
"avg_bucket":{"buckets_path":"total_time_diff"}
}
}
}
I am getting the following error:
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "class_cast_exception",
"reason": "org.elasticsearch.search.aggregations.bucket.nested.InternalNested cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
}
},
"status": 503
}
Index mapping:
{
"my_index": {
"mappings": {
"response_index": {
"date_detection": false,
"properties": {
"diff_list": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"time2": {
"type": "date"
},
"time1": {
"type": "date"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Thank you in advance.
"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
is not a bucket aggregation, and total_time_diff itself produces only a single bucket, so it won't work as the buckets_path of the last aggregation (avg_diff).
Use a script instead, for example:
"script": "doc['time2'].date.getMillis() - doc['time1'].date.getMillis()"
Let us know if it works.
I found a different solution for your problem. Instead of computing the sum in a script and then looking for a bucket aggregation to work on it, I used an avg aggregation with a script.
The avg_bucket sibling aggregation will not work here, because its buckets_path must point to a multi-bucket aggregation, whereas total_time_diff is a nested aggregation, which produces a single bucket. That is what the class_cast_exception in the error is saying.
I also changed the script so that it computes the difference between the two date fields of each nested object. The following query should work for you:
{
"size": 0,
"aggs": {
"total_time_diff": {
"nested": {
"path": "diff_list"
},
"aggs": {
"diff_r": {
"avg": {
"script": {
"source": "doc['diff_list.time2'].value.millis - doc['diff_list.time1'].value.millis"
}
}
}
}
}
}
}
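To sanity-check the number Elasticsearch returns, the same calculation can be done in plain Python on a few documents: one difference per nested diff_list entry, averaged over all entries of all documents, as the avg aggregation inside the nested aggregation does. This is only a cross-checking sketch; it assumes the dates are ISO-8601 strings without a timezone suffix, and the sample values are made up:

```python
from datetime import datetime


def avg_time_diff_millis(docs):
    """Average of (time2 - time1) in milliseconds over every nested
    diff_list entry in all documents; None if there are no entries."""
    diffs = []
    for doc in docs:
        for item in doc.get("diff_list", []):
            t1 = datetime.fromisoformat(item["time1"])
            t2 = datetime.fromisoformat(item["time2"])
            diffs.append((t2 - t1).total_seconds() * 1000)
    return sum(diffs) / len(diffs) if diffs else None


docs = [{"diff_list": [
    {"id": "a", "time1": "2020-01-01T00:00:00", "time2": "2020-01-01T00:00:02"},
    {"id": "b", "time1": "2020-01-01T00:00:00", "time2": "2020-01-01T00:00:04"},
]}]
print(avg_time_diff_millis(docs))  # 3000.0
```

Averaging per nested entry (rather than first summing per document) is what makes the plain avg aggregation sufficient here, with no pipeline aggregation needed.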
Hope this works for you.

Elasticsearch Aggregation - Unable to perform aggregation to object

I have a mapping with an inner object as follows:
{
"mappings": {
"_all": {
"enabled": false
},
"properties": {
"foo": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"address": {
"type": "object",
"properties": {
"address": {
"type": "string"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
When I try the following aggregation it does not return any data:
post data:*/foo/_search?search_type=count
{
"query": {
"match_all": {}
},
"aggs": {
"unique": {
"cardinality": {
"field": "address.city"
}
}
}
}
When I set the field to city or address.city, the aggregation returns zero, but if I use foo.address.city, Elasticsearch returns the correct response. This also affects Kibana's behavior.
Any ideas why this is happening? I saw there is a mapping refactoring that might affect this. I am using Elasticsearch version 1.7.1.
To add to this: if I use the relative path in a search query, as follows, it works normally:
"query": {
"filtered": {
"filter": {
"term": {
"address.city": "london"
}
}
}
}
It seems to be this same issue.
It shows up when the type name and a field name are the same.
