Elasticsearch nested significant terms aggregation with background filter - elasticsearch

I am having a hard time applying a background filter to a nested significant terms aggregation: the bg_count is always 0.
I'm indexing article views that have ids and timestamps, and I have multiple applications on a single index. I want the foreground and background sets to relate to the same application, so I'm trying to apply a term filter on the app_id field both in the bool query and in the background filter. article_views is a nested object because I also want to be able to query views with a range filter on timestamp, but I haven't got to that yet.
Mapping:
{
"article_views": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
}
},
"app_id": {
"type": "string",
"index": "not_analyzed"
}
}
Query:
{
"aggregations": {
"articles": {
"nested": {
"path": "article_views"
},
"aggs": {
"articles": {
"significant_terms": {
"field": "article_views.id",
"size": 5,
"background_filter": {
"term": {
"app_id": "17"
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"app_id": "17"
}
},
{
"nested": {
"path": "article_views",
"query": {
"terms": {
"article_views.id": [
"1",
"2"
]
}
}
}
}
]
}
}
}
As I said, in my result the bg_count is always 0, which has me worried. If the significant terms aggregation is on other fields that are not nested, the background_filter works fine.
Elasticsearch version is 2.2.
Thanks

You seem to be hitting a known issue: in your background filter you'd need to "go back" to the parent context in order to filter on a field of the parent document.
You'd need a reverse_nested query at that point, but that doesn't exist.
One way to circumvent this is to add the app_id field to your nested documents, so that you can simply use it in the background filter context.
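As a rough sketch of that workaround, the Python below copies the parent's app_id into each nested view before indexing and builds a request body whose background filter targets the nested copy. The field name article_views.app_id and both helper functions are assumptions made for illustration; the nested mapping would also need a matching not_analyzed app_id property.

```python
import copy


def denormalize_app_id(doc):
    """Return a copy of doc with the parent's app_id copied into every nested view."""
    out = copy.deepcopy(doc)
    for view in out.get("article_views", []):
        # Assumption: the nested copy is also called app_id (article_views.app_id).
        view["app_id"] = out["app_id"]
    return out


def significant_articles_body(app_id, seed_article_ids, size=5):
    """Build the search body from the question, with the background filter
    pointed at the nested copy of app_id instead of the parent field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"app_id": app_id}},
                    {
                        "nested": {
                            "path": "article_views",
                            "query": {"terms": {"article_views.id": seed_article_ids}},
                        }
                    },
                ]
            }
        },
        "aggregations": {
            "articles": {
                "nested": {"path": "article_views"},
                "aggs": {
                    "articles": {
                        "significant_terms": {
                            "field": "article_views.id",
                            "size": size,
                            # Filters on the nested documents themselves,
                            # so no step back to the parent is needed.
                            "background_filter": {
                                "term": {"article_views.app_id": app_id}
                            },
                        }
                    }
                },
            }
        },
    }
```

Existing documents would have to be reindexed through something like denormalize_app_id before the nested field is available to the filter.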

Related

ElasticSearch Failing to Sort Nested Object in order

ElasticSearch 6.5.2. Given the mapping and query below, the document order is not affected by changing 'desc' to 'asc' and vice versa. I'm not seeing any errors, just sort: [Infinity] in the results.
Mapping:
{
"mappings": {
"_doc": {
"properties": {
"tags": {
"type": "keyword"
},
"metrics": {
"type": "nested",
"dynamic": true
}
}
}
}
}
Query
{
"query": {
"match_all": {
}
},
"sort": [
{
"metrics.http.test.value": {
"order": "desc"
}
}
]
}
Document structure:
{
"tags": ["My Tag"],
"metrics": {
"http.test": {
"updated_at": "2018-12-08T23:22:07.056Z",
"value": 0.034
}
}
}
When sorting by a nested field, you must specify the path of the nested field using the nested parameter.
The other thing missing from your query is the field on which to sort. Assuming you want to sort on updated_at, the query will be:
{
"query": {
"match_all": {}
},
"sort": [
{
"metrics.http.test.updated_at": {
"order": "desc",
"nested": {
"path": "metrics"
}
}
}
]
}
One more thing to keep in mind when sorting on a nested field is the filter clause in sort. Read more about it here.
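To illustrate the filter clause, here is a small sketch that builds a nested sort with an optional filter restricting which nested objects contribute a sort value. The nested_sort helper and the date in the range filter are hypothetical; only the path and filter keys inside nested come from the Elasticsearch 6.x sort syntax.

```python
def nested_sort(field, path, order="desc", nested_filter=None):
    """Build one sort entry for a field that lives inside a nested object."""
    nested = {"path": path}
    if nested_filter is not None:
        # Only nested objects matching this filter are considered for sorting.
        nested["filter"] = nested_filter
    return {field: {"order": order, "nested": nested}}


body = {
    "query": {"match_all": {}},
    "sort": [
        nested_sort(
            "metrics.http.test.value",
            "metrics",
            order="desc",
            # Hypothetical filter: ignore metrics last updated before this date.
            nested_filter={
                "range": {"metrics.http.test.updated_at": {"gte": "2018-12-01"}}
            },
        )
    ],
}
```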
Apparently changing the mapping to this:
"metrics": {
"dynamic": true,
"properties": {}
}
fixed it and allowed sorting to happen in the correct order.

Unwind in ElasticSearch

I am currently having the below index in ElasticSearch
PUT my_index
{
"mappings": {
"doc": {
"properties": {
"type" : {
"type": "text",
"fielddata": true
},
"id" : {
"type": "text",
"fielddata": true
},
"nestedTypes": {
"type": "nested",
"properties": {
"nestedTypeId":{
"type": "integer"
},
"nestedType":{
"type": "text",
"fielddata": true
},
"isLead":{
"type": "boolean"
},
"share":{
"type": "float"
},
"amount":{
"type": "float"
}
}
}
}
}
}
}
I need the nested types to be displayed in a HTML table along with the id and type fields in each row.
I am trying to achieve something similar to unwind in MongoDB.
I have tried the reverse nested aggregation as below
GET my_index/_search
{
"size": 0,
"aggs": {
"NestedTypes": {
"nested": {
"path": "nestedTypes"
},
"aggs": {
"NestedType": {
"terms": {
"field": "nestedTypes.nestedType",
"order": {
"_key": "desc"
}
},
"aggs": {
"Details": {
"reverse_nested": {},
"aggs": {
"type": {
"terms": {
"field": "type"
}
},
"id": {
"terms": {
"field": "id"
}
}
}
}
}
}
}
}
}
}
However, the above returns only one field from nestedTypes, and I need all of them.
I also need sorting and pagination for this table. Could you please let me know how this can be achieved in ElasticSearch?
ElasticSearch does not support this operation out of the box. When a request to implement it was raised on GitHub, the response below was given:
We discussed it in Fixit Friday and agreed that we won't try to
implement it due to the fact that we can't think of a way to support
such operations efficiently.
The only ideas that we thought were reasonable boiled down to having
another index that stores the same data but flattened. Depending on
your use-case, you might be able to maintain those two views in
parallel or would only maintain the one you have today, then
materialize a flattened view of the data when you need it and throw it
away after you are done querying. In both cases, this requires
client-side logic.
The link to the request is here
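The client-side logic mentioned in that response could look roughly like the sketch below: each document's _source is flattened into one row per nested entry, and the rows are then sorted and paged in memory. The helper names unwind and table_page are made up for this example, and it assumes the full result set fits in memory.

```python
def unwind(doc, path="nestedTypes"):
    """Return one flat row per nested entry, each carrying the parent fields."""
    parent = {k: v for k, v in doc.items() if k != path}
    # Nested keys win on a name clash; the mapping above has no clashes.
    return [{**parent, **nested} for nested in doc.get(path, [])]


def table_page(docs, sort_key, page, page_size, reverse=False):
    """Flatten all docs, sort the rows by sort_key and return one page (0-based)."""
    rows = [row for doc in docs for row in unwind(doc)]
    rows.sort(key=lambda r: r[sort_key], reverse=reverse)
    start = page * page_size
    return rows[start:start + page_size]
```

The same unwind function could also feed a second, flattened index, where sorting and pagination would again be handled by ElasticSearch itself.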

Nested Objects aggregations (with Kibana)

We have an Elasticsearch index containing documents with a subset of arbitrary nested objects called devices. Each of those devices has a key called "aw".
What I am trying to accomplish is to get an average of the aw key for each device type.
When trying to aggregate and visualize this average, I don't get the average aw of each device type, but of all devices within the documents containing the specific device.
So instead of fetching all documents where device.id=7 and aggregating the aw per device.id, Elasticsearch / Kibana fetches all documents containing device.id=7 but then builds its average using all devices within the documents.
Our index mapping looks like this (only important parts):
"mappings" : {
"devdocs" : {
"_all": { "enabled": false },
"properties" : {
"cycle": {
"type": "object",
"properties": {
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
}
}
},
"devices": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"aw": {
"type": "long"
},
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
}
}
}
}
}
Kibana generates the following query:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"cycle.t": {
"gte": 1290760324744,
"lte": 1448526724744,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"terms": {
"field": "devices.name",
"size": 35,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"avg": {
"field": "devices.aw"
}
}
}
}
}
}
Is there a way to aggregate the average aw on device level, or what am I doing wrong?
Kibana doesn't support nested aggregations yet; see the Nested Aggregations issue.
I had the same issue and solved it by building Kibana from source from this fork by user ppadovani (branch: nestedAggregations).
See instructions to build Kibana from source here.
After building, Kibana will contain a Nested Path text box and a reverse nested checkbox in the advanced options for buckets and metrics.
Here is an example of a nested terms aggregation on lines.category_1, lines.category_2 and lines.category_3, with lines being of nested type, using the above with three buckets:
I would suggest adding a filter aggregation to leave everything with aw: 7.
Defines a single bucket of all the documents in the current document
set context that match a specified filter. Often this will be used to
narrow down the current aggregation context to a specific set of
documents.
Kibana does not support nested JSON.
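Outside Kibana, the per-device average can be requested from Elasticsearch directly by wrapping the terms and avg aggregations in a nested aggregation on devices, so that each bucket only sees the aw values of its own device objects. A minimal sketch of that request body follows; the aggregation names devices, by_name and avg_aw are arbitrary labels chosen for this example.

```python
def avg_aw_per_device(size=35):
    """Build a search body that averages devices.aw per devices.name
    inside the nested context, one bucket per device name."""
    return {
        "size": 0,
        "aggs": {
            "devices": {
                # Switch the aggregation context to the nested device objects.
                "nested": {"path": "devices"},
                "aggs": {
                    "by_name": {
                        "terms": {
                            "field": "devices.name",
                            "size": size,
                            "order": {"avg_aw": "desc"},
                        },
                        "aggs": {"avg_aw": {"avg": {"field": "devices.aw"}}},
                    }
                },
            }
        },
    }
```

The range filter on cycle.t from the Kibana-generated query can be kept in the query part unchanged, since it applies to the parent documents.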

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you can see, this is nested, and the last_purchase field is a Unix timestamp value. From all nested elements, we would like to query the most recent entry as defined by the last_purchase field AND check whether a given product id is in products for that entry.
You can achieve this using a nested query with inner_hits. In the query part, you can specify the product id you want to match, and then using inner_hits you can sort by decreasing last_purchase timestamp and only take the first one using size: 1:
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}
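Note that inner_hits only returns nested entries that match the nested query, so the query above yields the most recent entry among those containing the product, which is not necessarily the most recent entry overall. If the check must apply strictly to the latest entry, one option is to verify it client-side on the document's _source. The helper below is a hypothetical sketch of that check, not part of any Elasticsearch client.

```python
def latest_entry_has_product(achievements, product_id):
    """Return True if the entry with the highest last_purchase lists product_id."""
    if not achievements:
        return False
    latest = max(achievements, key=lambda entry: entry["last_purchase"])
    products = latest.get("products", [])
    # A field holding a single value comes back as a scalar, not a list.
    if not isinstance(products, list):
        products = [products]
    return product_id in products
```

It would be called with hit["_source"]["product_achievement"] for each hit returned by the search.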

I don't get any documents back from my elasticsearch query. Can someone point out my mistake?

I thought I had figured out Elasticsearch but I suspect I have failed to grok something, and hence this problem:
I am indexing products, which have a huge number of fields, but the ones in question are:
{
"show_in_catalogue": {
"type": "boolean",
"index": "no"
},
"prices": {
"type": "object",
"dynamic": false,
"properties": {
"site_id": {
"type": "integer",
"index": "no"
},
"currency": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "float"
},
"gross_tax": {
"type": "integer",
"index": "no"
}
}
}
}
I am trying to return all documents where "show_in_catalogue" is true, and there is a price with site_id 1:
{
"filter": {
"term": {
"prices.site_id": "1",
"show_in_catalogue": true
}
},
"query": {
"match_all": {}
}
}
This returns zero results. I also tried an "and" filter with two separate terms - no luck.
A subset of one of the documents returned if I have no filters looks like:
{
"prices": [
{
"site_id": 1,
"currency": "GBP",
"value": 595,
"gross_tax": 1
},
{
"site_id": 2,
"currency": "USD",
"value": 745,
"gross_tax": 0
}
]
}
I hope I am OK to omit so much of the document here; I don't believe it to be contingent but I cannot be certain, of course.
Have I missed a vital piece of knowledge, or have I done something terminally thick? Either way, I would be grateful for an expert's knowledge at this point. Thanks!
Edit:
At the suggestion of J.T. I also tried reindexing the documents so that prices.site_id was indexed - no change. Also tried the bool/must filter below to no avail.
To clarify, the reason I'm using an empty query is that the web interface may supply a query string, but the same code is used to simply filter all products. Hence I left in the query, but empty, since that's what Elastica seems to produce with no query string.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}
}
}
You have site_id set as {"index": "no"}. This tells ElasticSearch to exclude the field from the index which makes it impossible to query or filter on that field. The data will still be stored. Likewise, you can set a field to only be in the index and searchable, but not stored.
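A quick way to spot such fields is to walk the mapping and list every property declared with "index": "no". The sketch below does that for a properties dictionary shaped like the one in the question; unindexed_fields is a helper written for this answer, not an ElasticSearch API. On the mapping in the question it reports show_in_catalogue as well as prices.site_id, so both term filters are affected.

```python
def unindexed_fields(properties, prefix=""):
    """Return the dotted paths of all fields mapped with "index": "no"."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("index") == "no":
            found.append(path)
        # Recurse into object fields such as "prices".
        found.extend(unindexed_fields(spec.get("properties", {}), path + "."))
    return found
```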
I'm new to ElasticSearch as well and can't always grok the questions! I'm actually confused by your query. If you are going to "just filter" then you don't need a query. What I don't understand is your use of two fields inside the term filter. I've never done this. I guess it acts as an OR? Also, if nothing matches, it seems to return everything. If you wanted a query with the results of that query filtered, then you would want to use a filtered query:
-d '{
"query": {
"filtered": {
"query": {},
"filter": {}
}
}
}'
If you just want to apply filters, this is the filter that should work without any "query" necessary:
-d '{
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}'
