Add a condition to filter an aggregation in Elasticsearch

I want the count of each value of a field, restricted by a filter, in Elasticsearch. For example, I want the counts for all age groups, but only for students from California.
The age group is a text field and contains an array like this:
"age_group": ["5-6-years", "6-7-years"]
I want a query like the one below, but it is not working. It throws an error saying
unable to parse BaseAggregationBuilder with name [count]: parser not found
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.place_of_birth": "California" }
},
"aggs" : {
"age_group" : { "count" : { "field" : "students.age_group" } }
}
}
}
}
Request help from you troops.

That's because there is no metric aggregation called count; the one you are looking for is value_count:
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.gender": "boys" }
},
"aggs" : {
"age_group" : { "value_count" : { "field" : "students.age_group" } }
}
}
}
}
UPDATE:
After discussion, it turned out that the terms aggregation was more appropriate than value_count, since the goal is a count per age group. After fixing the mapping (the field was text instead of keyword), the query worked correctly.
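For reference, the shape described in the update can be sketched as below. This is only an illustration, not the exact query from the discussion: it assumes students.age_group and students.place_of_birth are mapped as keyword, and the helper name age_group_counts and the "size": 0 setting are additions made here for the example.

```python
import json


def age_group_counts(place: str) -> dict:
    """Build a search body: nested -> filter -> terms on students.age_group.

    Assumes both student fields are mapped as keyword (hypothetical helper).
    """
    return {
        "size": 0,  # assumption: only the aggregation result is wanted, not hits
        "aggs": {
            "student_aggregation": {
                "nested": {"path": "students"},
                "aggs": {
                    "available": {
                        # keep only students born in the given place
                        "filter": {
                            "term": {"students.place_of_birth": place}
                        },
                        "aggs": {
                            # terms returns one bucket per distinct value
                            "age_group": {
                                "terms": {"field": "students.age_group"}
                            }
                        },
                    }
                },
            }
        },
    }


body = age_group_counts("California")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Each bucket of the age_group aggregation then carries a doc_count for one age group.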

Related

Multi-query match_phrase_prefix elasticsearch

I would like to query two different prefixes for the same field. The code below works exactly how I would like it to when working with one field:
GET /logstash-*/_search
{
"query": {
"match_phrase_prefix" : {
"type" : {
"query" : "job-source"
}
}
}
}
I could not find in the docs how to do this with two queries (I only found how to search in multiple fields). I have tried a boolean should and the snippet below, but neither gives me the results I am looking for.
GET /logstash-*/_search
{
"query": {
"match_phrase_prefix" : {
"type" : {
"query" : ["job-source","job-find"]
}
}
}
}
How do I query for only documents that have type:job-source or type:job-find as the prefix?
Thank you in advance,
You can combine two match_phrase_prefix queries in the should clause of a bool query and set minimum_should_match to 1.
Sample Query:
{
"query":
{
"bool":
{
"should": [
{
"match_phrase_prefix":
{
"type": "job-source"
}
},
{
"match_phrase_prefix":
{
"type": "job-find"
}
}],
"minimum_should_match": 1
}
}
}
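If the list of prefixes varies, the same body can be generated rather than written by hand. A minimal sketch, assuming the request is built on the client side; the helper name any_phrase_prefix is hypothetical and no Elasticsearch client is involved:

```python
import json
from typing import Sequence


def any_phrase_prefix(field: str, prefixes: Sequence[str]) -> dict:
    """Build a bool query matching documents where `field` starts with any prefix."""
    # one match_phrase_prefix clause per prefix
    should = [{"match_phrase_prefix": {field: prefix}} for prefix in prefixes]
    return {
        "query": {
            "bool": {
                "should": should,
                # at least one of the prefix clauses has to match
                "minimum_should_match": 1,
            }
        }
    }


body = any_phrase_prefix("type", ["job-source", "job-find"])
print(json.dumps(body, indent=2))
```

For the two prefixes from the question this produces the same structure as the sample query above.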

Aggregation of fields inside nested type field

I want to aggregate on a keyword field that lies inside a nested field. The mapping for the nested field is as follows:
"Nested_field" : {
"type" : "nested",
"properties" : {
"Keyword_field" : {
"type" : "keyword"
}
}
}
And the part of the query I am using to aggregate is as follows:
"aggregations": {
"Nested_field": {
"aggregations": {
"Keyword_field": {
"terms": {
"field": "Nested_field.Keyword_field"
}
}
},
"filter": {
"bool": {}
}
}
}
But this is not returning the correct aggregation. Even though documents with Keyword_field values exist, the query returns 0 buckets. So there is something wrong in my aggregation query. Can anyone help me find what's wrong?
I think you need to provide a nested path in there. This worked in ES 5, but it looks like you're using 6 based on the "aggregations" vs "aggs", so let me know if it doesn't work and I'll scrap this answer. Give this a try:
{
"aggregations": {
"nested_level": {
"nested": {
"path": "Nested_field"
},
"aggregations": {
"keyword_field": {
"terms": {
"field": "Nested_field.Keyword_field"
}
}
}
}
}
}

How to add properties from a root object in a nested object for sorting?

A simplified example of the kind of document in our index:
{
"organisation" : {
"code" : "01310"
},
"publications" : [
{
"dateEnd" : 1393801200000,
"dateStart" : 1391986800000,
"code" : "PUB.02"
},
{
"dateEnd" : 1401055200000,
"dateStart" : 1397512800000,
"code" : "PUB.06"
}
]
}
Note that publications are mapped as nested objects because we need to filter based on a combination of the dateEnd, dateStart and publicationStatus properties.
The PUB.02 status code is special. It states: 'this publication period is valid if the current user is a member of the organisation'.
I have a problem when I want to sort on 'most recent':
{
"sort": {
"publications.dateStart" : {
"mode" : "min",
"order" : "desc",
"nested_filter" : {
"or" : [
{
"and" : [
{ "term" : { "organisation.code" : "01310" } },
{ "term" : { "publications.code" : "PUB.02" } }
]
},
{ "term" : { "publications.code" : "PUB.06" } }
]
}
}
}
}
No error is given, but the PUB.02 entry is ignored. I tried to use copy_to in my mapping to copy the value of organisation.code to the nested object, but that did not help.
Is there a way to reach for the parent document inside a nested sort?
Alternatively, is there a way to copy data from parent to the nested document?
I am currently using version 1.7 of Elasticsearch without the ability to use scripts. Upgrading to a newer version could be done if that would help the situation.
This gist shows that the sort is performed on the PUB.06 publications: https://gist.github.com/EECOLOR/2db9a1ec9d6d5c791ea6
Although the documentation does not explicitly mention it, it does look like we cannot access a parent field in a nested filter context.
I also wasn't able to use copy_to to add data from a root/parent field to the nested document. I would suggest asking on the Elasticsearch discuss forum; you may have more luck there finding out the reasons for this.
Before some trigger-happy bloke downvotes this answer, I would like to add that the results the OP wanted from the sort can be achieved with a function_score workaround.
One implementation to achieve this is as follows:
1) Start off with a bool should query
2) In the first should clause
a) use a filtered query to select documents with `organisation.code : 01310`
b) then score these documents on the max value of the reciprocal of the nested document's **dateStart**, restricted to the terms **PUB.02** and **PUB.06**
3) In the second should clause
a) use a filtered query to select documents with `organisation.code` not equal to `01310`
b) as before, score these documents on the max value of the reciprocal of the nested document's **dateStart**, restricted to the term **PUB.06** only
Example Query:
POST /testindex/testtype/_search
{
"query": {
"bool": {
"should": [
{
"filtered": {
"filter": {
"term": {
"organisation.code": "01310"
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.02",
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
},
{
"filtered": {
"filter": {
"not": {
"term": {
"organisation.code": "01310"
}
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
}
]
}
}
}
I'm the first to admit it is not the most readable, and if there were a way to copy_to into a nested object, that would be far better.
If not, simulating copy_to by having the client inject the data into the source before indexing would be simpler and more flexible.
But the above is an example of how it could be done using function scores.
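The client-side alternative could look roughly like this. It is a sketch under assumptions: the helper name add_org_code_to_publications and the target field organisationCode are made up for illustration, and the mapping for the new nested field is not shown.

```python
import copy


def add_org_code_to_publications(doc: dict) -> dict:
    """Return a copy of `doc` with organisation.code copied into each publication.

    `organisationCode` is a hypothetical field name chosen for this example.
    """
    enriched = copy.deepcopy(doc)  # leave the caller's document untouched
    org_code = enriched.get("organisation", {}).get("code")
    for publication in enriched.get("publications", []):
        publication["organisationCode"] = org_code
    return enriched


doc = {
    "organisation": {"code": "01310"},
    "publications": [
        {"dateEnd": 1393801200000, "dateStart": 1391986800000, "code": "PUB.02"},
        {"dateEnd": 1401055200000, "dateStart": 1397512800000, "code": "PUB.06"},
    ],
}

to_index = add_org_code_to_publications(doc)
print(to_index["publications"][0])
```

With the organisation code stored on every nested publication, a nested_filter could presumably test publications.organisationCode instead of the root field, which is what the original sort was trying to do.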

Elastic Search Nested Object mapping and Query for search

I am trying to use Elasticsearch and I am stuck trying to query a nested object.
Basically, my object has the following format:
{
"name" : "Some Name",
"field2": [
{
"prop1": "val1",
"prop2": "val2"
},
{
"prop1": "val3",
"prop2": "val4"
}
]
}
Mapping I used for the nested field is the following.
PUT /someval/posts/_mapping
{
"posts": {
"properties": {
"field2": {
"type": "nested"
}
}
}
}
Say I now insert documents at /field/posts/1, /field/posts/2, etc. I have K values for field2.prop1, and I want a query that returns the posts sorted by how many of those K values match field2.prop1. What would be the appropriate query for that?
I also tried a simple filter, but even that doesn't seem to work right.
GET /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
},
"filter" : {
"nested" : {
"path" : "field2",
"filter" : {
"bool" : {
"must" : [
{
"term" : {"field2.prop1" : "val1"}
}
]
}
},
"_cache" : true
}
}
}
}
The above query should match at least the first post, but it returns no match. Can anyone help clarify what's wrong here?
There was a problem in your JSON structure: you used a filtered query, but the filter object was at a different level from the query object. It has to sit inside filtered, next to query.
Compare:
POST /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "field2",
"filter": {
"bool": {
"must": [
{
"term": {
"field2.prop1": "val1"
}
}
]
}
},
"_cache": true
}
}
}
}
}

filter by child frequency in ElasticSearch

I currently have parents (documents) indexed in Elasticsearch, along with children (comments) related to these documents.
My first objective was to search for documents with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
"min_score": 0,
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
}
}
I used the score to calculate the number of comments a document has, and then I filtered the documents by this number using "min_score".
Now, my objective is to search not just comments but several other child documents related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
"query": {
"match_all": {
}
},
"filter" : {
"and" : [{
"query": {
"has_child" : {
"type" : "comment",
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
},
{
"or" : [
{"query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "Finally"
}
}
}
}
},
{ "query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "several"
}
}
}
}
}
]
}
]
}
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?
There is no score at all associated with filters. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries.
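A rough sketch of what that restructuring might look like, built as a plain dict. It reuses the has_child clauses and the score_type setting from the question; the helper name has_child_comment is hypothetical, the split into must and should is one possible reading of the original and/or filter, and how a per-clause frequency threshold would be enforced is left open.

```python
import json


def has_child_comment(inner_query: dict, score_type: str = "sum") -> dict:
    """Wrap a query in a has_child clause on the comment type (hypothetical helper)."""
    return {
        "has_child": {
            "type": "comment",
            "score_type": score_type,  # as in the first query of the question
            "query": inner_query,
        }
    }


date_range = {"range": {"date": {"lte": 20130204, "gte": 20130201}}}

body = {
    "query": {
        "bool": {
            # was the first element of the "and" filter
            "must": [has_child_comment(date_range)],
            # was the "or" filter
            "should": [
                has_child_comment({"match": {"text": "Finally"}}),
                has_child_comment({"match": {"text": "several"}}),
            ],
            "minimum_should_match": 1,
        }
    }
}

print(json.dumps(body, indent=2))
```

Because every clause now lives in the query part, each has_child contributes a score, so a score-based cut-off such as min_score has something to act on. Note that it would apply to the combined score of the bool query rather than to each clause separately.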

Resources