How can I get ElasticSearch aggregations to count the parent documents instead of the nested documents - elasticsearch

My ElasticSearch index has nested documents to indicate the places where various events occurred related to the document. I am using aggregations to get facets of the places. The count returned is the count of the number of occurrences of the place. For example, if a document has a birth and death place of California, the aggregation count for California is 2. I would like the aggregation count to be the number of documents containing a particular place, rather than the number of child documents containing the place. The relevant part of my schema looks like this:
"mappings": {
"document": {
"properties": {
"docId" : { "type": "keyword" },
"place": {
"type": "nested",
"properties": {
"id": { "type": "keyword" },
"type": { "type": "keyword" },
"loc": { "type" : "geo_point" },
"text": {
"type": "text",
"analyzer": "english",
"copy_to" : "text"
}
},
"dynamic": false
}
}
}
}
I can get facets with a simple aggregation like this, which retrieves the places with type place.vital.* (e.g. place.vital.birth, place.vital.death, etc), but counts the number of nested documents, not the number of parent documents.
"aggs": {
"place.vital": {
"aggs": {
"types": {
"aggs": {
"values": {
"terms": {
"field": "place.id"
}
}
},
"terms": {
"field": "place.type",
"include": "place\\.vital\\..*"
}
}
},
"nested": {
"path": "place"
}
}
Is it possible to tweak my aggregation so that it only counts each parent document once?

Use reverse nested aggregation. This will then create an aggregation with the nested counts and a sub aggregation with the parent counts.
See how to return the count of unique documents by using elasticsearch aggregation for more detail.

I'm sure you can do it with nested fields, but not with parent child relationships. If you are looking for places Why don't you search on places index and filter by child?
Has child query

Related

Nested Fields, Wildcard Queries and Aggregations in Elasticsearch

I have an index that collects web redirects data for various sites. I am using a nested field to collect the data as shown in the mapping below:
"chain": {
"type": "nested",
"properties": {
"url.position": {
"type": "long"
},
"url.full": {
"type": "text"
},
"url.domain": {
"type": "keyword"
},
"url.path": {
"type": "keyword"
},
"url.query": {
"type": "text"
}
}
}
As you can imagine, each document contains an array of url chains, the size of the array being equal to number of web redirects. I want to get aggregations based on wildcard/regexp matches to url.query field. Here is a sample query:
GET push_url_chain/_search
{
"query": {
"nested": {
"path": "chain",
"query": {
"regexp": {
"chain.url.query": "aff_c.*"
}
}
}
},
"size": 0,
"aggs": {
"dataFields": {
"nested": {
"path": "chain"
},
"aggs": {
"offers": {
"terms": {
"field": "chain.url.domain",
"size": 30
}
}
}
}
}
}
The above query does produce aggregated results but not the way I want.
I want to see chain.url.domain aggregations for the urls that contain the aff_c.* phrase. Right now it is looking at all the urls in the chain and then aggregating the buckets by doc_count regardless of whether that url/domain has the particular phrase. I hope I have been able to explain this clearly. How do I get my results to show bucket aggregations that contain domains that have aff_c.* phrase match to the query field of the url.
I would also like to know how I can use = or / in my wildcard or regexp queries. It is not producing any results if I use the above symbols in my queries.
Tha
Nested query returns all documents where a nested document matches the condition, you get matched nested docs only in inner_hits.
Aggregation is applied on top of these documents, so all domains are coming in terms
You need to use nested aggregation to gets only matching terms.
{
"size": 0,
"aggs": {
"Name": {
"nested": {
"path": "chain"
},
"aggs": {
"matched_doc": {
"filter": { --> filter for url
"match_phrase_prefix": {
"chain.url.query": "abc"
}
},
"aggs": {
"domain": {
"terms": {
"field": "chain.url.domain", -- terms for matched url
"size": 10
}
}
}
}
}
}
}
}
You can use match_phrase_prefix instead of regex. It has better performance.
Standard analyzer while generating tokens removes "/","=". So if you want to use regex or wildcard and look for these , you need to use keyword field not text field.

Elasticsearch query to find documents with number of values of a term equal to a specified number

I have an Elasticsearch index "library" with below mapping:
{
"mappings": {
"book": {
"properties": {
"title": { "type": "text", "index": "not_analyzed" },
"author": { "type": "text", "index": "not_analyzed" },
"price": { "type": "integer" },
}
}
}
Now I want make a query to find all documents(book) where number of author is equal to 3. i.e. I want to make a query which will match
curl -XGET "http://localhost:9200/library/_search?pretty=true" -d '{
"query": {
"match": {
Number of values of term "author" = 3.
}
}
}'
Is there any way to make such an query without adding an extra term?
[I know the aggregation to find all possible values of a term in search result but wasn't able to convert that aggregation in according to above criteria.]
Can't find a way to get exactly each author with 3 documents.
Aggregation will give you all possible values. But, it also show you the doc_count - and there we can find our way:
{
"size": 0,
"aggregations": {
"authors": {
"terms": {
"field": "author",
"min_doc_count": 3,
"size": 5
}
}
}
}
min_doc_count - will filter only buckets with, at least, 3 documents.
size - will give you only first 5 documents (remember that, by default, buckets are sorted by 'doc_count' ascending).
Now you can adjust size to get exactly those authors with 3 documents.

Aggregation in elastic search

Need help with aggregation in elastic search. Is it possible to agreggate values of a particular field as an array or list - This is more of a grouping for example instead of getting the result as
{"Book_Id":"102","Review_Text":"DescentRead"},{"Book_Id":"102","Review_Text":"For Kids."},{"Book_Id":"103","Review_Text":"Great"},{"Book_Id":"103","Review_Text":"Excellent"}
can i get all the reviews of each book as a list ?
[ { Book_Id: 102, Review_Text: [ "DescentRead", "For Kids"], { Book_Id: 103, reviews: [ "Great","Excellent"] } ]
Tried some trail with aggs but not able to get it. Any pointers would help!!
Could aggregations with top hits work? The limitation is that you need to specify a max amount of hits per aggregation (will return the top 100 results per book ID in the example ordered by the review text), but apart from that you can do run it as a normal query and specify which fields to return, how they should be sorted (to get the top hits), etc.
Example aggs query:
POST
http://myserver:9200/books/book/_search
{
"size": 0,
"aggs": {
"BookReviews": {
"terms": {
"field": "Book_Id.keyword"
},
"aggs": {
"top_reviews": {
"top_hits": {
"sort": [ { "Review_Text.keyword": { "order": "desc" } } ],
"size": 100,
"_source": {
"includes": [ "Review_Text" ]
}
}
}
}
}
}
}
Note that the name for the aggregations ("BookReviews" and "top_reviews") you can use any name you choose, and that same name will appear in the resulting aggregation tree. You can do multi level aggregations on terms in your index, and inclute top hits on any level, typically for drill-down reporting or similar cases.
Mapping used:
{
"books": {
"mappings": {
"book": {
"properties": {
"Book_Id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Review_Text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
"size": 0 in the root node will omit any hits for the search and only return the aggs trees.
You can also add a normal "query": {} block on the same level as size and aggs if you need to filter the results before elastic starts aggregating.
Read more in the elasticsearch documentation pages:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
(If you provide a more complete example dataset, we can give a more realistic example query, as there isn't a lot of data in the example for sorting or scoring the results)

Nested Objects aggregations (with Kibana)

We got an Elasticsearch index containing documents with a subset of arbitrary nested object called devices. Each of those devices has a key call "aw".
What I try to accomplish, is to get an average of the aw key for each device type.
When trying to aggregate and visualize this average I don't get the average of the aw of every device type, but of all devices within the documents containing the specific device.
So instead of fetching all documents where device.id=7 and aggregating the awper device.id, Elasticsearch / Kibana fetches all documents containing device.id=7 but then builds it's average using all devices within the documents.
Out index mapping looks like this (only important parts):
"mappings" : {
"devdocs" : {
"_all": { "enabled": false },
"properties" : {
"cycle": {
"type": "object",
"properties": {
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
}
}
},
"devices": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"aw": {
"type": "long"
}
"t": {
"type": "date",
"format": "dateOptionalTime||epoch_second"
},
}
}
}
}
Kibana generates the following query:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"cycle.t": {
"gte": 1290760324744,
"lte": 1448526724744,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"terms": {
"field": "devices.name",
"size": 35,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"avg": {
"field": "devices.aw"
}
}
}
}
}
}
Is there a way to aggregate the average aw on device level, or what am I doing wrong?
Kibana doesn't support nested aggregations yet , Nested Aggregations Issue.
I had the same issue and solved it by building kibana from src from this fork by user ppadovani. [branch : nestedAggregations]
See instructions to build kibana from source here.
After building when you run kibana now it will contain a Nested Path text box and a reverse nested checkbox in advanced options for buckets and metrics.
Here is an example of nested terms aggregation on lines.category_1, lines.category_2, lines.category_3 and lines being of nested type. using the above with three buckets, :
I would suggest adding filter aggregation to leave everything with aw: 7.
Defines a single bucket of all the documents in the current document
set context that match a specified filter. Often this will be used to
narrow down the current aggregation context to a specific set of
documents.
Kibana does not support Nested json.

Returning a partial nested document in ElasticSearch

I'd like to search an array of nested documents and return only those that fit a specific criteria.
An example mapping would be:
{"book":
{"properties":
{
"title":{"type":"string"},
"chapters":{
"type":"nested",
"properties":{"title":{"type":"string"},
"length":{"type":"long"}}
}
}
}
}
}
So, say I want to look for chapters titled "epilogue".
Not all the books have such a chapter, but If I use a nested query I'd get, as a result, all the chapters in a book that has such a chapter. While all I'm interested is the chapters themselves that have such a title.
I'm mainly concerned about i/o and net traffic since there might be a lot of chapters.
Also, is there a way of retrieving ONLY the nested document, without the containing doc?
This is a very old question I stumbled upon, so I'll show two different approaches to how this can be handled.
Let's prepare index and some test data first:
PUT /bookindex
{
"mappings": {
"book": {
"properties": {
"title": {
"type": "string"
},
"chapters": {
"type": "nested",
"properties": {
"title": {
"type": "string"
},
"length": {
"type": "long"
}
}
}
}
}
}
}
PUT /bookindex/book/1
{
"title": "My first book ever",
"chapters": [
{
"title": "epilogue",
"length": 1230
},
{
"title": "intro",
"length": 200
}
]
}
PUT /bookindex/book/2
{
"title": "Book of life",
"chapters": [
{
"title": "epilogue",
"length": 17
},
{
"title": "toc",
"length": 42
}
]
}
Now that we have this data in Elasticsearch, we can retrieve just the relevant hits using an inner_hits. This approach is very straightforward, but I prefer the approach outlined at the end.
# Inner hits query
POST /bookindex/book/_search
{
"_source": false,
"query": {
"nested": {
"path": "chapters",
"query": {
"match": {
"chapters.title": "epilogue"
}
},
"inner_hits": {}
}
}
}
The inner_hits nested query returns documents, where each hit contains an inner_hits object with all of the matching documents, including scoring information. You can see the response.
My preferred approach to this type of query is using a nested aggregation with filtered sub aggregation which contains top_hits sub aggregation. The query looks like:
# Nested and filter aggregation
POST /bookindex/book/_search
{
"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "chapters"
},
"aggs": {
"filter": {
"filter": {
"match": { "chapters.title": "epilogue" }
},
"aggs": {
"t": {
"top_hits": {
"size": 100
}
}
}
}
}
}
}
}
The top_hits sub aggregation is the one doing the actual retrieving
of nested documents and supports from and size properties among
others. From the documentation:
If the top_hits aggregator is wrapped in a nested or reverse_nested
aggregator then nested hits are being returned. Nested hits are in a
sense hidden mini documents that are part of regular document where in
the mapping a nested field type has been configured. The top_hits
aggregator has the ability to un-hide these documents if it is wrapped
in a nested or reverse_nested aggregator. Read more about nested in
the nested type mapping.
The response from Elasticsearch is (IMO) prettier (and it seems to return it faster (though this is not a scientific observation)) and "easier" to parse.

Resources