How to get fields in top_hits aggregation - elasticsearch

I have an Elasticsearch cluster (version 1.3.0). The documents in its index don't have _source enabled, so when retrieving hits I usually request values through the "fields" parameter.
Now I am implementing a top_hits aggregation for a duplicate-grouping feature. I would like to get fields in the top_hits results, which I currently can't do because _source is disabled in my mapping. Could you suggest an option or workaround to achieve this without changing the existing mapping?
I couldn't find anything about this in the top_hits aggregation documentation. Any help is much appreciated.
Thanks!

Use script fields:
"aggs": {
"sample": {
"top_hits": {
"size": 1,
"script_fields": {
"field1": {
"script": "doc['field1']"
},
"field2": {
"script": "doc['field2']"
}
...
}
}
}
}
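With many fields, writing each script_fields entry by hand gets repetitive. Below is a minimal Python sketch that generates the same aggregation body; the helper name top_hits_with_script_fields is hypothetical, and the script string simply mirrors the doc['field'] form used in the aggregation above.

```python
import json


def top_hits_with_script_fields(fields, size=1):
    """Build a top_hits aggregation that returns each field through script_fields.

    Hypothetical helper: the "sample" aggregation name and the doc['...'] script
    form are copied from the example above.
    """
    script_fields = {name: {"script": f"doc['{name}']"} for name in fields}
    return {
        "aggs": {
            "sample": {
                "top_hits": {
                    "size": size,
                    "script_fields": script_fields,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the request fragment for two fields, ready to paste into a search body
    print(json.dumps(top_hits_with_script_fields(["field1", "field2"]), indent=2))
```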
However, if field1 or field2 is analyzed, you need a sub-field that keeps a not_analyzed version of the field. Why? Because doc['field'] reads the indexed terms: if the normal field is analyzed in any way, doc['field'] returns the analyzed terms, not the original content that was indexed.
Something like this:
"mappings": {
"test": {
"_source": {
"enabled": false
},
"properties": {
"field1": {
"type": "string",
"fields": {
"notAnalyzed": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
And the query:
"aggs": {
"sample": {
"top_hits": {
"size": 1,
"script_fields": {
"field1": {
"script": "doc['field1.notAnalyzed']"
}
}
}
}
}
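To see why the not_analyzed sub-field matters, here is a toy Python model of what ends up in the index for the same value. The analyzer below (lowercase, split on non-word characters, sorted unique terms) is a deliberate simplification, not Elasticsearch's actual standard analyzer; it only illustrates that an analyzed field yields tokens while a not_analyzed field keeps the whole value as a single term.

```python
import re


def toy_analyzed_terms(text):
    """Rough stand-in for an analyzed string field (NOT the real standard analyzer):
    lowercase, split on non-word characters, return the sorted unique terms."""
    return sorted({token for token in re.split(r"\W+", text.lower()) if token})


def toy_not_analyzed_terms(text):
    """A not_analyzed field indexes the whole value as one term."""
    return [text]


value = "New York City"
print(toy_analyzed_terms(value))      # ['city', 'new', 'york']
print(toy_not_analyzed_terms(value))  # ['New York City']
```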

Related

How to exclude certain fields from the _source field in ElasticSearch

I have the following simple snippet:
PUT /lib36
{
"mappings": {
"_source": {"enabled": false},
"properties": {
"name": {
"type": "text"
},
"description":{
"type": "text"
}
}
}
}
PUT /lib36/_doc/1
{
"name":"abc",
"description":"xyz"
}
POST /lib36/_search
{
"query": {
"match": {
"name": "abc"
}
}
}
With "_source": {"enabled": false}, the search results don't include the _source field.
I would like to know how to write the query so that the results include the _source field, but containing only the name field and not the description field.
Thanks!
You can use Elasticsearch's source filtering, but for that you first need to have _source enabled in your mapping.
Add the key-value pair below to your search query JSON. This search excludes all fields except name.
{
"query": {
"match": {
"name": "abc"
}
},
"_source": [
"name"
]
}
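Source filtering also accepts an object with includes and excludes lists, which can be handier when you want to drop specific fields. As a rough illustration of the semantics, here is a toy Python model that filters a stored _source on top-level keys with wildcard patterns; filter_source is a hypothetical helper, and real Elasticsearch also handles nested paths. Either way, filtering only works when _source is actually stored.

```python
from fnmatch import fnmatch


def filter_source(source, includes=None, excludes=None):
    """Toy model of _source filtering on top-level keys only (hypothetical helper).

    A key is kept if it matches at least one include pattern (when includes are
    given) and matches no exclude pattern.
    """
    result = {}
    for key, value in source.items():
        if includes and not any(fnmatch(key, pattern) for pattern in includes):
            continue
        if excludes and any(fnmatch(key, pattern) for pattern in excludes):
            continue
        result[key] = value
    return result


doc = {"name": "abc", "description": "xyz"}
print(filter_source(doc, includes=["name"]))         # {'name': 'abc'}
print(filter_source(doc, excludes=["description"]))  # {'name': 'abc'}
```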

Aggregate objects in ElasticSearch by IP Prefix

I have an Elasticsearch index where I store internet traffic flow objects, each containing an IP address. I want to aggregate the data so that all objects with the same IP prefix are collected in the same bucket, without specifying the prefixes in advance. Something like a histogram aggregation. Is this possible?
I have tried this:
GET flows/_search
{
"size": 0,
"aggs": {
"ip_ranges": {
"histogram": {
"field": "ipAddress",
"interval": 256
}
}
}
}
But this doesn't work, probably because histogram aggregations aren't supported for ip type fields. How would you go about doing this?
First, as suggested here, the best approach would be to:
categorize the IP address at index time, store the class C information in a simple keyword field, and then use a terms aggregation on that field to do the count.
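A minimal sketch of the index-time approach in Python, using the standard ipaddress module to compute the /24 (class C) network for each IPv4 address before indexing. The ipPrefix field name and the enrich helper are hypothetical; ipPrefix would be mapped as keyword, and the terms aggregation would then run on it.

```python
import ipaddress


def class_c_prefix(ip):
    """Return the /24 network containing an IPv4 address, e.g. '192.168.1.0/24'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def enrich(doc):
    """Hypothetical pre-indexing step: add an 'ipPrefix' keyword value to a flow doc."""
    return {**doc, "ipPrefix": class_c_prefix(doc["ipAddress"])}


# Aggregation to run once ipPrefix is indexed as a keyword field
prefix_terms_agg = {"size": 0, "aggs": {"prefixes": {"terms": {"field": "ipPrefix", "size": 10}}}}

print(enrich({"ipAddress": "192.168.1.37"}))
# {'ipAddress': '192.168.1.37', 'ipPrefix': '192.168.1.0/24'}
```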
Alternatively, you could simply add a multi-field keyword mapping:
PUT myindex
{
"mappings": {
"properties": {
"ipAddress": {
"type": "ip",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and then extract the prefix at query time (⚠️ highly inefficient!):
GET myindex/_search
{
"size": 0,
"aggs": {
"my_prefixes": {
"terms": {
"script": "/\\./.split(doc['ipAddress.keyword'].value)[0]",
"size": 10
}
}
}
}
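Note that the script keeps only element [0] of the dotted address, i.e. the first octet, so each bucket corresponds to a /8 prefix rather than a class C network. A toy Python model of what the terms aggregation returns for that script; first_octet and terms_on_script are hypothetical names, and the model ignores Elasticsearch's tie-breaking and per-shard details.

```python
from collections import Counter


def first_octet(ip_keyword):
    # Same idea as the script: split on "." and keep element [0]
    return ip_keyword.split(".")[0]


def terms_on_script(values, size=10):
    """Toy terms aggregation over script output: count keys, keep the top `size`."""
    return Counter(first_octet(v) for v in values).most_common(size)


ips = ["10.0.0.1", "10.2.3.4", "192.168.1.5", "10.9.9.9"]
print(terms_on_script(ips))  # [('10', 3), ('192', 1)]
```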
As a final option, you could define the intervals of interest in advance and use an ip_range aggregation:
{
"size": 0,
"aggs": {
"my_ip_ranges": {
"ip_range": {
"field": "ipAddress",
"ranges": [
{ "to": "192.168.1.1" },
{ "from": "192.168.1.1" }
]
}
}
}
}
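To make the bucket boundaries concrete: like the other range aggregations, ip_range includes the from value and excludes the to value, so the two ranges above split addresses exactly at 192.168.1.1. A toy Python model, assuming IPv4 input only (ip_range_counts is a hypothetical helper):

```python
import ipaddress


def ip_range_counts(ips, ranges):
    """Count addresses per range: 'from' is inclusive, 'to' is exclusive,
    and a missing bound is open-ended. Ranges are evaluated independently."""
    addresses = [ipaddress.ip_address(ip) for ip in ips]
    counts = []
    for r in ranges:
        low = ipaddress.ip_address(r["from"]) if "from" in r else None
        high = ipaddress.ip_address(r["to"]) if "to" in r else None
        counts.append(sum(
            1 for a in addresses
            if (low is None or a >= low) and (high is None or a < high)
        ))
    return counts


ranges = [{"to": "192.168.1.1"}, {"from": "192.168.1.1"}]
print(ip_range_counts(["192.168.1.0", "192.168.1.1", "192.168.1.200"], ranges))  # [1, 2]
```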

ElasticSearch Failing to Sort Nested Object in order

Elasticsearch 6.5.2. Given the mapping and query below, the document order is not affected by changing 'desc' to 'asc' or vice versa. There are no errors, just sort: [Infinity] in the results.
Mapping:
{
"mappings": {
"_doc": {
"properties": {
"tags": {
"type": "keyword"
},
"metrics": {
"type": "nested",
"dynamic": true
}
}
}
}
}
Query:
{
"query": {
"match_all": {
}
},
"sort": [
{
"metrics.http.test.value": {
"order": "desc"
}
}
]
}
Document structure:
{
"tags": ["My Tag"],
"metrics": {
"http.test": {
"updated_at": "2018-12-08T23:22:07.056Z",
"value": 0.034
}
}
}
When sorting by a nested field, you need to specify the path of the nested field with the nested parameter.
One more thing missing from your query is the field on which to sort. Assuming you want to sort on updated_at, the query would be:
{
"query": {
"match_all": {}
},
"sort": [
{
"metrics.http.test.updated_at": {
"order": "desc",
"nested": {
"path": "metrics"
}
}
}
]
}
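The nested sort semantics can be modelled in a few lines of Python. This sketch assumes Elasticsearch's documented defaults: when a parent has several nested values, a desc sort uses the maximum and an asc sort the minimum, and parents without a value are placed last. The document shape is simplified (a plain value key instead of http.test.value) and nested_sort is a hypothetical name.

```python
def nested_sort(docs, field, order="desc"):
    """Toy model of sorting parent documents by a field of their nested objects."""
    pick = max if order == "desc" else min

    def sort_value(doc):
        values = [m[field] for m in doc.get("metrics", []) if field in m]
        return pick(values) if values else None

    present = [d for d in docs if sort_value(d) is not None]
    missing = [d for d in docs if sort_value(d) is None]  # default missing: _last
    present.sort(key=sort_value, reverse=(order == "desc"))
    return present + missing


docs = [
    {"id": 1, "metrics": [{"value": 0.034}]},
    {"id": 2, "metrics": [{"value": 0.5}, {"value": 0.05}]},
    {"id": 3, "metrics": []},
]
print([d["id"] for d in nested_sort(docs, "value", "desc")])  # [2, 1, 3]
print([d["id"] for d in nested_sort(docs, "value", "asc")])   # [1, 2, 3]
```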
One more thing to keep in mind when sorting on a nested field is the filter clause inside the nested sort options. Read more about it here.
Apparently, changing the mapping to this:
"metrics": {
"dynamic": true,
"properties": {}
}
fixed it and made sorting work in the correct order.

Elasticsearch nested significant terms aggregation with background filter

I'm having a hard time applying a background filter to a nested significant_terms aggregation: the bg_count is always 0.
I'm indexing article views that have IDs and timestamps, and I have multiple applications on a single index. I want the foreground and background sets to relate to the same application, so I'm trying to apply a term filter on the app_id field both in the bool query and in the background filter. article_views is a nested object because I also want to be able to query views with a range filter on timestamp, but I haven't gotten to that yet.
Mapping:
{
"article_views": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
}
},
"app_id": {
"type": "string",
"index": "not_analyzed"
}
}
Query:
{
"aggregations": {
"articles": {
"nested": {
"path": "article_views"
},
"aggs": {
"articles": {
"significant_terms": {
"field": "article_views.id",
"size": 5,
"background_filter": {
"term": {
"app_id": "17"
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"app_id": "17"
}
},
{
"nested": {
"path": "article_views",
"query": {
"terms": {
"article_views.id": [
"1",
"2"
]
}
}
}
}
]
}
}
}
As I said, the bg_count in my results is always 0, which worries me. If the significant_terms aggregation is on other, non-nested fields, the background_filter works fine.
Elasticsearch version is 2.2.
Thanks
You seem to be hitting the following issue: in your background filter you'd need to "go back" to the parent context in order to filter on a field of the parent document.
You'd need a reverse_nested query at that point, but that doesn't exist.
One way to circumvent this is to add the app_id field to your nested documents so that you can simply use it in the background filter context.
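A minimal Python sketch of that workaround: copy app_id into each nested article_views entry before indexing, then point the background filter at the nested copy. The helper names are hypothetical, and the mapping would also need an article_views.app_id field (a not_analyzed string, like the parent app_id).

```python
import copy


def denormalize_app_id(doc):
    """Copy the parent's app_id into every article_views entry (hypothetical pre-indexing step)."""
    out = copy.deepcopy(doc)
    for view in out.get("article_views", []):
        view["app_id"] = out["app_id"]
    return out


def significant_articles_agg(app_id):
    """Nested significant_terms whose background filter uses the copied nested field."""
    return {
        "articles": {
            "nested": {"path": "article_views"},
            "aggs": {
                "articles": {
                    "significant_terms": {
                        "field": "article_views.id",
                        "size": 5,
                        "background_filter": {"term": {"article_views.app_id": app_id}},
                    }
                }
            },
        }
    }
```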

Nested Objects aggregations (with Kibana)

We have an Elasticsearch index containing documents with an arbitrary number of nested objects called devices. Each of those devices has a key called "aw".
What I'm trying to accomplish is to get the average of the aw key for each device type.
When I try to aggregate and visualize this average, I don't get the average aw of each device type, but the average over all devices within the documents that contain the specific device.
So instead of fetching all documents where device.id=7 and aggregating the aw per device.id, Elasticsearch / Kibana fetches all documents containing device.id=7 but then builds its average using all devices within those documents.
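The difference described above can be reproduced with a toy Python model. Because devices uses include_in_parent in the mapping shown below, a parent-level terms aggregation on devices.name buckets whole documents, and the avg under it sees every aw value in those documents; a nested aggregation only sees the matching device objects. The sample data and function names are made up for illustration.

```python
from statistics import mean

docs = [
    {"devices": [{"name": "7", "aw": 10}, {"name": "8", "aw": 100}]},
    {"devices": [{"name": "7", "aw": 20}]},
]


def flattened_avg(docs, name):
    """Parent-level terms + avg: every aw in each document that contains the device."""
    values = [d["aw"] for doc in docs if any(x["name"] == name for x in doc["devices"])
              for d in doc["devices"]]
    return mean(values)


def per_device_avg(docs, name):
    """Nested aggregation: only the aw values of the matching device objects."""
    return mean(d["aw"] for doc in docs for d in doc["devices"] if d["name"] == name)


print(flattened_avg(docs, "7"))   # about 43.33, polluted by device 8
print(per_device_avg(docs, "7"))  # 15
```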
Our index mapping looks like this (only the important parts):
"mappings" : {
"devdocs" : {
"_all": { "enabled": false },
"properties" : {
"cycle": {
"type": "object",
"properties": {
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
}
}
},
"devices": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"aw": {
"type": "long"
},
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
}
}
}
}
}
Kibana generates the following query:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"cycle.t": {
"gte": 1290760324744,
"lte": 1448526724744,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"terms": {
"field": "devices.name",
"size": 35,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"avg": {
"field": "devices.aw"
}
}
}
}
}
}
Is there a way to aggregate the average aw on device level, or what am I doing wrong?
Kibana doesn't support nested aggregations yet; see the Nested Aggregations Issue.
I had the same issue and solved it by building Kibana from source using this fork by user ppadovani (branch: nestedAggregations).
See the instructions for building Kibana from source here.
After building, when you run Kibana it will have a Nested Path text box and a reverse nested checkbox in the advanced options for buckets and metrics.
Here is an example of a nested terms aggregation on lines.category_1, lines.category_2 and lines.category_3, with lines being of nested type, using the above with three buckets.
I would suggest adding a filter aggregation to keep only the entries with aw: 7.
Defines a single bucket of all the documents in the current document set context that match a specified filter. Often this will be used to narrow down the current aggregation context to a specific set of documents.
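Outside Kibana, the per-device average can be requested directly with a nested aggregation, optionally narrowed with a filter aggregation as suggested above. A sketch of the request bodies built in Python; the aggregation names (devices, by_name, only_this_device, avg_aw) are hypothetical, and the field names come from the mapping in the question.

```python
import json


def per_device_avg_body(size=35):
    """nested(devices) -> terms(devices.name) -> avg(devices.aw): one average per device."""
    return {
        "size": 0,
        "aggs": {
            "devices": {
                "nested": {"path": "devices"},
                "aggs": {
                    "by_name": {
                        "terms": {"field": "devices.name", "size": size,
                                  "order": {"avg_aw": "desc"}},
                        "aggs": {"avg_aw": {"avg": {"field": "devices.aw"}}},
                    }
                },
            }
        },
    }


def single_device_avg_body(name):
    """nested(devices) -> filter(devices.name == name) -> avg(devices.aw)."""
    return {
        "size": 0,
        "aggs": {
            "devices": {
                "nested": {"path": "devices"},
                "aggs": {
                    "only_this_device": {
                        "filter": {"term": {"devices.name": name}},
                        "aggs": {"avg_aw": {"avg": {"field": "devices.aw"}}},
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(per_device_avg_body(), indent=2))
```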
Kibana does not support nested JSON.
