I have a document structure like the one below. Each of the two documents contains nested documents under interactionInfo. I need to get only the documents that have a nested entry whose title is duration and whose value is greater than 60.
[
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "11"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "145"
}
]
},
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "120"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "60"
}
]
}
]
Is it possible to get only the documents that have a nested entry with title duration and a value greater than 60? The value property in the nested document is mapped as text and keyword.
There are a few basic mistakes in your solution. To use a range query (i.e. to find documents with a value greater than 60), you need to store the values as integers rather than strings.
Also, please refer to this official guide, which has a similar example.
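To illustrate why the string values get in the way: range comparisons on string terms are lexicographic, not numeric. Here is a minimal sketch in plain Python (not Elasticsearch itself) using the values from the question:

```python
# Values as they appear in the question: quoted, so they are strings.
values = ["11", "120", "9", "60", "145"]

# Strings are compared character by character, so "9" > "60" because "9" > "6".
greater_as_strings = [v for v in values if v > "60"]

# Converting to integers first gives the numeric comparison that was intended.
greater_as_integers = [v for v in values if int(v) > 60]

print(greater_as_strings)
print(greater_as_integers)
```

The string comparison picks up "9" and misses "120" and "145", which is why the values should be indexed as numbers.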
Let me show you a step-by-step example on how to do it.
Index def
{
"mappings" :{
"properties" :{
"interactionInfo" :{
"type" : "nested"
},
"key" : {
"type" : "keyword"
}
}
}
}
Index sample docs (note that the values are not wrapped in double quotes, so they are stored as integers)
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 11
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 145
}
]
}
Search query
{
"query": {
"nested": {
"path": "interactionInfo",
"query": {
"bool": {
"must": [
{
"match": {
"interactionInfo.title": "duration"
}
},
{
"range": {
"interactionInfo.value": {
"gt": 60
}
}
}
]
}
}
}
}
}
And your expected search result
"hits": [
{
"_index": "nestedsoint",
"_type": "_doc",
"_id": "2",
"_score": 2.0296195,
"_source": {
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
}
]
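The reason only one document comes back is that a nested query evaluates each nested object on its own, so both conditions must hold within the same object. Here is a rough Python sketch of that matching rule. It illustrates the semantics only, not how Elasticsearch is implemented, and `matches_nested` is a hypothetical helper name:

```python
def matches_nested(doc, title, greater_than):
    """True if one single nested object has the title AND a large enough value."""
    return any(
        item["title"] == title and item["value"] > greater_than
        for item in doc["interactionInfo"]
    )

# The two sample documents from above (keys omitted for brevity).
docs = [
    {"interactionInfo": [
        {"title": "duration", "value": 120},
        {"title": "timetaken", "value": 9},
        {"title": "talk_time", "value": 60},
    ]},
    {"interactionInfo": [
        {"title": "duration", "value": 11},
        {"title": "timetaken", "value": 9},
        {"title": "talk_time", "value": 145},
    ]},
]

# The second document has a value above 60, but on talk_time, not on duration.
results = [matches_nested(doc, "duration", 60) for doc in docs]
print(results)
```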
Related
I am working with an Elasticsearch index with data like this:
"_source": {
"article_number": "123456",
"title": "Example item #1",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Grey"
}
]
},
"_source": {
"article_number": "654321",
"title": "Example item #2",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Red"
}
]
}
The goal is to dynamically generate search inputs in a page where there is one search input for each unique value of attributes.key and within that input one value for each corresponding value of attributes.value. So in this case I would want to render a "Type" input offering only the value "Bag" and a "Color" input offering the values "Grey" and "Red."
I am trying to accomplish this with an aggregation that will give me a unique set of all values of attributes.key along with an array of all the values of attributes.value that are associated with each key. An example of a result that would fit what I am hoping for would be this:
{
[
{
"key": "Type",
"values": [{
"name": "Bag",
"doc_count": 2
}]
},
{
"key": "Color",
"values": [{
"name": "Grey",
"doc_count": 1
},
{
"name": "Red",
"doc_count": 1
}]
}
]
}
I have tried nested and reverse nested aggregations, as well as composite aggregations, but so far without success.
Assuming your index mapping looks like this:
PUT attrs
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
you can achieve the desired results with the following combination of a nested terms aggregation and its sub-aggregation:
POST attrs/_search
{
"size": 0,
"aggs": {
"nested_context": {
"nested": {
"path": "attributes"
},
"aggs": {
"by_keys": {
"terms": {
"field": "attributes.key",
"size": 10
},
"aggs": {
"by_values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
}
}
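As a rough illustration of what the two levels of terms aggregation compute, here is a plain-Python sketch over the two sample documents from the question. It mimics the grouping client-side and does not reproduce the Elasticsearch response format. The counts are per nested attribute object, which is what the nested context counts:

```python
from collections import Counter, defaultdict

docs = [
    {"attributes": [{"key": "Type", "value": "Bag"}, {"key": "Color", "value": "Grey"}]},
    {"attributes": [{"key": "Type", "value": "Bag"}, {"key": "Color", "value": "Red"}]},
]

# Outer grouping by attributes.key, inner grouping by attributes.value.
by_keys = defaultdict(Counter)
for doc in docs:
    for attr in doc["attributes"]:
        by_keys[attr["key"]][attr["value"]] += 1

result = {key: dict(counts) for key, counts in by_keys.items()}
print(result)
```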
I have been struggling for a week trying to get correct data out of an Elasticsearch nested aggregation. Below are my index mapping and two sample documents. What I want to do is:
Match all documents with the field xforms.sentence.tokens.value equal to 24
Within the matched set of documents do a count of matches grouped by xforms.sentence.tokens.tag where xforms.sentence.tokens.value equal to 24
So, as an example, for the inserted documents below the output I expect is:
{"JJ": 1, "NN": 1}
{
"_doc": {
"_meta": {},
"_source": {},
"properties": {
"originalText": {
"type": "text"
},
"testDataId": {
"type": "text"
},
"xforms": {
"type": "nested",
"properties": {
"sentence": {
"type": "nested"
},
"predicate": {
"type": "nested"
}
}
},
"corpusId": {
"type": "text"
},
"row": {
"type": "text"
},
"batchId": {
"type": "text"
},
"processor": {
"type": "text"
}
}
}
}
A sample doc inserted is as follows:
{
"_id": "28",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fdf15",
"originalText": "Some text with the word 24",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "Some",
"index": 1,
"after": " ",
"tag": "JJ",
"value": "Some"
},
{
"lemma": "text",
"index": 2,
"after": " ",
"tag": "NN",
"value": "text"
},
{
"lemma": "with",
"index": 3,
"after": " ",
"tag": "NN",
"value": "with"
},
{
"lemma": "the",
"index": 4,
"after": "",
"tag": "CD",
"value": "the"
},
{
"lemma": "word",
"index": 5,
"after": " ",
"tag": "CC",
"value": "word"
},
{
"lemma": "24",
"index": 6,
"after": " ",
"tag": "JJ",
"value": "24"
}
],
"type": "RAW"
},
"originalSentence": "Some text with the word 24 in it",
"id": "e724611d8c024bcb8f0158b60e3df87e"
}]
}
},
{
"_id": "56",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fad15",
"originalText": "24 word",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "24",
"index": 1,
"after": " ",
"tag": "NN",
"value": "24"
},
{
"lemma": "word",
"index": 2,
"after": " ",
"tag": "JJ",
"value": "word"
}
],
"type": "RAW"
},
"originalSentence": "24 word",
"id": "e724611d8c024bcb8f0158b60e3d123"
}]
}
}
Expanding on @Gibbs's answer: @N Kiram, you'll need to set the tokens as nested too:
{
"xforms":{
"type":"nested",
"properties":{
"sentence":{
"type":"nested",
"properties":{
"tokens":{ <----
"type":"nested"
}
}
},
"predicate":{
"type":"nested"
}
}
}
}
Then and only then will your aggs yield the correct counts:
{
"aggregations":{
"xforms":{
"doc_count":8,
"inner":{
"doc_count":2,
"tag_count":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"JJ",
"doc_count":1
},
{
"key":"NN",
"doc_count":1
}
]
}
}
}
}
}
Side note: you'll have to reindex in order for the changed mapping to apply.
{
"aggs": {
"xforms": {
"nested": { //Nested aggregation
"path": "xforms.sentence"
},
"aggs": {
"inner": { //Counting only within the matching doc
"filter": {
"bool": {
"filter": { //Filtering docs with value=24
"terms": {
"xforms.sentence.tokens.value": [
"24"
]
}
}
}
},
"aggs" : {
"tag_count":{ //On filtered doc, doing terms aggregation on tag's keyword version as tag is of type text
"terms":{
"field":"xforms.sentence.tokens.tag.keyword"
}
}
}
}
}
}
}
}
It produces the output below:
"aggregations": {
"xforms": {
"doc_count": 2,
"inner": {
"doc_count": 2,
"tag_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "JJ",
"doc_count": 2
},
{
"key": "NN",
"doc_count": 2
},
{
"key": "CC",
"doc_count": 1
},
{
"key": "CD",
"doc_count": 1
}
]
}
}
}
}
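The difference between the two outputs (per-token counts versus per-sentence counts) can be reproduced outside Elasticsearch. The Python sketch below is only a model of the two mappings, with hypothetical function names. When tokens are nested, each token is matched on its own. When they are not, the whole sentence is the unit, so every tag in a matching sentence is counted once:

```python
from collections import Counter

# (tag, value) pairs for the tokens of each sample document's sentence.
sentences = [
    [("JJ", "Some"), ("NN", "text"), ("NN", "with"),
     ("CD", "the"), ("CC", "word"), ("JJ", "24")],
    [("NN", "24"), ("JJ", "word")],
]

def count_tags_tokens_nested(sentences, value):
    """tokens mapped as nested: only the tokens whose value matches are counted."""
    return Counter(tag for tokens in sentences for tag, v in tokens if v == value)

def count_tags_tokens_flattened(sentences, value):
    """tokens not nested: every distinct tag of a matching sentence is counted once."""
    counts = Counter()
    for tokens in sentences:
        if any(v == value for _, v in tokens):
            counts.update({tag for tag, _ in tokens})
    return counts

nested_counts = dict(count_tags_tokens_nested(sentences, "24"))
flattened_counts = dict(count_tags_tokens_flattened(sentences, "24"))
print(nested_counts)
print(flattened_counts)
```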
I'm on Elasticsearch 7.7 and I'm using the official PHP client to interact with the server.
My issue was somewhat solved here: https://discuss.elastic.co/t/need-to-return-part-of-a-doc-from-a-search-query-filter-is-parent-child-the-way-to-go/64514/2
However "Types are deprecated in APIs in 7.0+" https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html
Here is my document:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox",
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "boo"
},
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
},
{
"event_id": "9999",
"start_date": "2020-08-11 11:30:00",
"registration_count": "41",
"description": "test"
}
]
}
Notice how the object may have one or many "events".
Searching based on event data is the most common use case.
For example:
Find events that start before 12pm
Find events with a description of "xyz"
Find events with a start date in the next 10 days
I would like to NOT return any events that didn't match the query!
So, for example, find events with a description of "xyz" for a given service:
{
"query": {
"bool": {
"must": {
"match": {
"events.description": "xyz"
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"service_id": 20087
}
}
]
}
}
}
}
}
I would want the result to look like this:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox",
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
]
}
However, instead it just returns the ENTIRE document, with all events.
Is it even possible to return only a subset of the data? Maybe with Aggregations?
Right now, we're doing an "extra" set of filtering on the result set in the application (php in this case) to strip out event blocks that don't match the desired results.
It would be nice to just have elastic give directly what's needed instead of doing extra processing on the result to pull out the applicable event.
Thought about restructuring the data to instead have it based around "events" but then I would be duplicating data since every offering will have the parent data too.
This used to be in SQL, where there was a relation instead of having the data nested like this.
A subset of the nested data can be returned using a nested aggregation along with a filter aggregation.
To learn more about these aggregations, refer to the official documentation:
Filter Aggregation
Nested Aggregation
Index Mapping:
{
"mappings": {
"properties": {
"offering_id": {
"type": "integer"
},
"account_id": {
"type": "integer"
},
"service_id": {
"type": "integer"
},
"title": {
"type": "text"
},
"slug": {
"type": "text"
},
"summary": {
"type": "text"
},
"header_thumb_path": {
"type": "keyword"
},
"duration": {
"type": "integer"
},
"alter_ids": {
"type": "integer"
},
"premium": {
"type": "text"
},
"featured": {
"type": "text"
},
"events": {
"type": "nested",
"properties": {
"event_id": {
"type": "integer"
},
"registration_count": {
"type": "integer"
},
"description": {
"type": "text"
}
}
}
}
}
}
Search Query :
{
"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "events"
},
"aggs": {
"filter": {
"filter": {
"match": { "events.description": "xyz" }
},
"aggs": {
"total": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
Search Result :
"hits": [
{
"_index": "foo21",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "events",
"offset": 1
},
"_score": 1.0,
"_source": {
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
}
]
Second Method :
{
"query": {
"bool": {
"must": [
{
"match": {
"service_id": "20087"
}
},
{
"nested": {
"path": "events",
"query": {
"bool": {
"must": [
{
"match": {
"events.description": "xyz"
}
}
]
}
},
"inner_hits": {
}
}
}
]
}
}
}
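With inner_hits, each top-level hit carries its matching nested objects alongside the full _source. Below is a small sketch of pulling them out of a response that has already been decoded into a dict. It is in Python for brevity, and the same traversal applies to the array the PHP client returns. The `sample_response` is a hand-trimmed illustration of the response shape, not real output, and `matched_events` is a hypothetical helper. It assumes the inner hits are keyed by the nested path (`events`), which is the default when no name is set:

```python
def matched_events(response, name="events"):
    """Map each top-level hit's _id to the _source of its matching nested events."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        inner_hits = inner.get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"] for h in inner_hits]
    return result

# Illustrative, trimmed response shape (assumption, not captured from a server).
sample_response = {
    "hits": {"hits": [{
        "_id": "1",
        "_source": {"offering_id": "1190"},
        "inner_hits": {"events": {"hits": {"hits": [{
            "_nested": {"field": "events", "offset": 1},
            "_source": {"event_id": "9999", "description": "xyz"},
        }]}}},
    }]}
}

events_by_id = matched_events(sample_response)
print(events_by_id)
```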
You can also go through these SO answers:
How to filter nested aggregation bucket?
Returning a partial nested document in ElasticSearch
I have the following dataset
[
{
"rating": "10",
"subject": "maths"
},
{
"rating": "9",
"subject": "physics"
},
{
"rating": "10",
"subject": "chemistry"
},
{
"rating": "5",
"subject": "physics"
},
{
"rating": "2",
"subject": "geography"
},
{
"rating": "5",
"subject": "maths"
},
{
"rating": "1",
"subject": "geography"
},
{
"rating": "5",
"subject": "maths"
},
{
"rating": "8",
"subject": "chemistry"
}
]
What I need to do is find the average rating for each subject, and then count the number of subjects in each rating range (0-2, 2-5, 5-8, 8-10) with an Elasticsearch query.
The query I have so far creates buckets for each subject calculating the avg of each bucket. But I can't find how to do a range aggregation on the result of the composite aggregation. Is it even possible? Is there an alternative?
Here is my query that buckets the data according to the subject and calculates the avg rating.
GET kibana_sample/_search
{
"size":0,
"aggs" : {
"my_buckets": {
"composite" : {
"sources" : [
{ "subject": { "terms" : { "field": "subject" } } }
]
},
"aggs": {
"avg_rating": {
"avg" : { "field" : "rating" }
}
}
}
}
}
It results in the following.
"aggregations": {
"my_buckets": {
"buckets": [
{
"key": {
"subject": "maths"
},
"doc_count": 3,
"avg_rating": {
"value": 6.66666667
}
},
{
"key": {
"subject": "physics"
},
"doc_count": 2,
"avg_rating": {
"value": 7
}
},
{
"key": {
"subject": "chemistry"
},
"doc_count": 2,
"avg_rating": {
"value": 9
}
},
{
"key": {
"subject": "geography"
},
"doc_count": 2,
"avg_rating": {
"value": 1.5
}
}
]
}
}
It's all good, but now I need to perform a range aggregation on top of this result to get the number of subjects in ranges of ratings
e.g.:
ratings range: {0-2}: 1 subject, {2-5}: 0 subjects, {5-8}: 2 subjects, {8-10}: 1 subject
You can use pipeline aggregations to feed the results of one aggregation into further aggregations. You can also use scripts in the pipeline to keep only the relevant results.
Check out the script examples here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
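If the range counts turn out to be awkward to express in the query itself, one fallback is to bucket the averages client-side from the composite aggregation's response. This is a sketch in Python using the buckets shown in the question. It is post-processing rather than an Elasticsearch feature, and it assumes half-open ranges (lower bound included, upper bound excluded):

```python
# Buckets as returned by the composite aggregation in the question.
buckets = [
    {"key": {"subject": "maths"}, "avg_rating": {"value": 6.66666667}},
    {"key": {"subject": "physics"}, "avg_rating": {"value": 7}},
    {"key": {"subject": "chemistry"}, "avg_rating": {"value": 9}},
    {"key": {"subject": "geography"}, "avg_rating": {"value": 1.5}},
]

# Half-open ranges: an average of exactly 10 would need the last bound raised.
ranges = [(0, 2), (2, 5), (5, 8), (8, 10)]

counts = {f"{low}-{high}": 0 for low, high in ranges}
for bucket in buckets:
    avg = bucket["avg_rating"]["value"]
    for low, high in ranges:
        if low <= avg < high:
            counts[f"{low}-{high}"] += 1
            break

print(counts)
```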
I have product items indexed in Elasticsearch where the common information contains a list of parameters (the list varies from 3 to 20 params depending on the specific product). The params have been indexed as nested values; see the product JSON example below.
{
"sku": "asd8574fdf",
"title": "Test product",
"params": [
{
"unit": "mm",
"value": 100,
"name": "width"
}
,
{
"unit": "mm",
"value": 60,
"name": "height"
}
]
}
Now I don't know how to query product items properly based on params, for example how to query products with 'width = 60'. When I run a query like the one below, it finds any item that has a param with name 'width' and any param with value=60, even if that value belongs to a param other than width. Is there a way to do this properly, or to index the params in some other way?
"bool": {
"must": [
{
"term": {
"product.params.name": "width"
}
}
,
{
"term": {
"product.params.value": "60"
}
}
]
}
EDIT #1
I have the nested mapping defined correctly, but the problem is that the query above still finds a product item like the one above, because it finds the item with name 'width' but matches value=60 on the 'height' param. A query like this can't say that value=60 must belong to the param with name 'width', so it searches values across all params.
Make sure your mapping contains "type": "nested", and then you can set up your query with a nested query or nested filter. Be careful that you have the "path" parameter and field names set correctly. That trips a lot of people up.
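The cross-matching described in the question is what happens when an array of objects is not queried as nested: the object fields are flattened into independent lists of values, so the link between a name and its value is lost. Here is a plain-Python model of the two behaviours, as an illustration only (the function names are hypothetical):

```python
# The product from the question: width is 100, height is 60.
product = {"params": [
    {"unit": "mm", "value": 100, "name": "width"},
    {"unit": "mm", "value": 60, "name": "height"},
]}

def matches_flattened(doc, name, value):
    """Flattened view: names and values are separate bags with no pairing."""
    names = {p["name"] for p in doc["params"]}
    values = {p["value"] for p in doc["params"]}
    return name in names and value in values

def matches_nested(doc, name, value):
    """Nested view: both conditions must hold inside the same param object."""
    return any(p["name"] == name and p["value"] == value for p in doc["params"])

flattened_hit = matches_flattened(product, "width", 60)
nested_hit = matches_nested(product, "width", 60)
print(flattened_hit, nested_hit)
```

The flattened check wrongly reports a match for width = 60, while the nested check does not.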
To test, I set up an index with a mapping appropriate for what you posted here, and three docs, each of which have a nested doc with "name": "width" and a nested doc with "value": 60, but only one that should match your query:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"params": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"unit": {
"type": "string"
},
"value": {
"type": "long"
}
}
},
"sku": {
"type": "string"
},
"title": {
"type": "string"
}
}
}
}
}
PUT /test_index/doc/1
{
"sku": "asd8574fdf",
"title": "Test product 1",
"params": [
{
"unit": "mm",
"value": 100,
"name": "width"
}
,
{
"unit": "mm",
"value": 60,
"name": "height"
}
]
}
PUT /test_index/doc/2
{
"sku": "asd8574ghg",
"title": "Test product 2",
"params": [
{
"unit": "mm",
"value": 100,
"name": "width"
}
,
{
"unit": "mm",
"value": 60,
"name": "depth"
}
]
}
PUT /test_index/doc/3
{
"sku": "asd8574ghg",
"title": "Test product 3",
"params": [
{
"unit": "mm",
"value": 60,
"name": "width"
}
,
{
"unit": "mm",
"value": 10,
"name": "height"
}
]
}
Then I can get back the correct document with this query:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "params",
"query": {
"bool": {
"must": [
{
"term": {
"params.name": "width"
}
},
{
"term": {
"params.value": 60
}
}
]
}
}
}
}
}
}
}
...
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"sku": "asd8574ghg",
"title": "Test product 3",
"params": [
{
"unit": "mm",
"value": 60,
"name": "width"
},
{
"unit": "mm",
"value": 10,
"name": "height"
}
]
}
}
]
}
}
Here is all the code I used:
http://sense.qbox.io/gist/2a299224da805e64932a2fec6990f2e173820b3b
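Note that the filtered query used above belongs to older Elasticsearch releases. It was removed in 5.0 in favour of a bool query with a filter clause. Here is a hedged Python sketch that builds the equivalent request body as a dict and prints it as JSON. The helper name `nested_term_filter` is made up for this example, and the field names come from the mapping above:

```python
import json

def nested_term_filter(path, terms):
    """Build a bool/filter search body with one term clause per nested field."""
    must = [{"term": {f"{path}.{field}": value}} for field, value in terms.items()]
    return {
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        "path": path,
                        "query": {"bool": {"must": must}},
                    }
                }
            }
        }
    }

body = nested_term_filter("params", {"name": "width", "value": 60})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST /test_index/_search. On recent versions the string type in the mapping would also need to become text or keyword, and the doc type level would be dropped.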