Filter by child frequency in Elasticsearch

I currently have parent documents indexed in Elasticsearch, along with child documents (comments) related to them.
My first objective was to search for documents with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
"min_score": 0,
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
}
}
I used the score to count the comments a document has, and then filtered the documents by that count using "min_score".
Now my objective is to search not just comments but several other child documents related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
"query": {
"match_all": {
}
},
"filter" : {
"and" : [{
"query": {
"has_child" : {
"type" : "comment",
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
},
{
"or" : [
{"query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "Finally"
}
}
}
}
},
{ "query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "several"
}
}
}
}
}
]
}
]
}
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?

There is no score at all associated with filters. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries.
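A minimal sketch of that suggestion, written in Python so the request body can be built and inspected. The helper names `has_child_sum` and `build_search_body` are hypothetical; the field names, child type and `score_type` spelling are taken from the question. Two caveats, both assumptions on my side: `min_score` applies to the combined score of the whole bool query, not to each has_child clause, and the summed score only behaves like a comment count when the child query scores every match equally (as the question reports for its range query). The match clauses produce relevance scores, so their sums are not plain counts.

```python
import json


def has_child_sum(child_type, child_query):
    """Wrap a child query so the parent score is the sum of its matching children's scores."""
    return {
        "has_child": {
            "type": child_type,
            # Spelling used in the question; later versions name this option score_mode.
            "score_type": "sum",
            "query": child_query,
        }
    }


def build_search_body(min_score):
    """Same conditions as the filter version, but expressed as scoring queries."""
    date_range = {"range": {"date": {"gte": 20130201, "lte": 20130204}}}
    text_queries = [{"match": {"text": word}} for word in ("Finally", "several")]
    return {
        # Threshold on the combined score of the bool query below.
        "min_score": min_score,
        "query": {
            "bool": {
                "must": [
                    has_child_sum("comment", date_range),
                    # A bool with only should clauses requires at least one of them to match.
                    {"bool": {"should": [has_child_sum("comment", q) for q in text_queries]}},
                ]
            }
        },
    }


print(json.dumps(build_search_body(3), indent=2))
```

The printed body can be sent to documents/document/_search in place of the filter-based request.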

Related

Add condition to filter aggregation in elastic search

I want the count of each value of a field, with a filter applied, in Elasticsearch. For example, I want all the age groups, but only for students who are from California.
The age_group field is a text field and contains an array like this:
"age_group": ["5-6-years", "6-7-years"]
I want a query like the one below, but it isn't working. It throws an error saying
unable to parse BaseAggregationBuilder with name [count]: parser not found
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.place_of_birth": "California" }
},
"aggs" : {
"age_group" : { "count" : { "field" : "students.age_group" } }
}
}
}
}
Request help from you troops.
That's because there's no metric aggregation called count but value_count instead:
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.gender": "boys" }
},
"aggs" : {
"age_group" : { "value_count" : { "field" : "students.age_group" } }
}
}
}
}
UPDATE:
After discussion, the terms aggregation turned out to be more appropriate than value_count. After fixing the mapping (the field was mapped as text instead of keyword), the query worked correctly.
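The thread does not show the final query, so the following is only a sketch of what the terms version might look like, built as a Python dict. It assumes `students` is a nested field and that `students.age_group` is now mapped as keyword. The filter uses the place_of_birth condition from the question (the answer's snippet filters on students.gender instead), and the top-level `"size": 0` and the helper name `age_group_counts_body` are my additions.

```python
import json


def age_group_counts_body(place_of_birth):
    """Count students per age group, restricted to one place of birth."""
    return {
        "size": 0,  # assumption: only the aggregation buckets are wanted, not the hits
        "aggs": {
            "student_aggregation": {
                "nested": {"path": "students"},
                "aggs": {
                    "available": {
                        "filter": {
                            "term": {"students.place_of_birth": place_of_birth}
                        },
                        "aggs": {
                            # terms returns one bucket per distinct value with its doc count,
                            # whereas value_count returns a single total.
                            "age_group": {"terms": {"field": "students.age_group"}}
                        },
                    }
                },
            }
        },
    }


print(json.dumps(age_group_counts_body("California"), indent=2))
```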

Elasticsearch Query to show results of field values that appear in two date ranges?

I'm storing documents related to events that happen daily. They have a name field that denotes the event.
For tracking purposes, I want to track events that happened within the last week or month, and also whether those same events ALSO happened in the week or month before that period.
For example, with the time being "now", I want to grab the documents which have names that appear in both:
{
"range": { "last_seen" : { "gte" : "now-1w/d", "lte" : "now" } }
}
and
{
"range": { "last_seen" : { "gte" : "now-2w/d", "lte" : "now-1w/d" } }
}
So if a document with name "visitor" appears in both ranges, it is counted. If a document with name "shutdown" appears in only one range, it is not counted.
Currently I'm only able to get all unique names that exist between one large range that encompasses both ranges that I want. It is aggregated in a daily date histogram and it lists the unique names for each day.
{
"query": {
"bool": {
"filter": [
{
"term": {
"_type": "events"
}
}
],
"must": {
"range": {
"last_seen": {
"gte": "now-2w/d"
}
}
}
}
},
"size":0,
"aggregations": {
"per_day_events": {
"date_histogram": {
"field" : "last_seen",
"interval" : "day",
"format" : "date",
"time_zone" : "America/New_York"
},
"aggregations" : {
"daily_events": {
"date_range" : {
"field": "last_seen",
"format": "date",
"ranges": [
{ "from" : "now-1w/d" }
]
},
"aggregations" : {
"unique_events": {
"cardinality": {
"field": "name.keyword"
}
}
}
}
}
}
}
}
name is a text field and last_seen is a date field.
Is what I want to do possible in a single Elasticsearch query?
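No answer is recorded here, so what follows is only one possible approach, not something confirmed by the thread: bucket the documents by name, count each name's documents in the two ranges with filter sub-aggregations, and keep only the names seen in both with a bucket_selector pipeline aggregation. Everything beyond the question's field names and date math is an assumption: the helper name, the aggregation names, the terms size of 1000, the Painless script syntax (which needs a version that supports it), and the switch from lte to lt on the older range so the two ranges do not overlap.

```python
import json


def names_in_both_ranges_body():
    """Keep only the names that have documents in both the recent and the previous week."""
    this_week = {"range": {"last_seen": {"gte": "now-1w/d", "lte": "now"}}}
    # lt instead of lte (my change) so a document cannot fall into both ranges.
    prev_week = {"range": {"last_seen": {"gte": "now-2w/d", "lt": "now-1w/d"}}}
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"_type": "events"}},
                    {"range": {"last_seen": {"gte": "now-2w/d"}}},
                ]
            }
        },
        "aggs": {
            "by_name": {
                # Assumed cap: only the first 1000 names are examined.
                "terms": {"field": "name.keyword", "size": 1000},
                "aggs": {
                    "this_week": {"filter": this_week},
                    "prev_week": {"filter": prev_week},
                    "in_both": {
                        "bucket_selector": {
                            "buckets_path": {
                                "cur": "this_week._count",
                                "prev": "prev_week._count",
                            },
                            "script": "params.cur > 0 && params.prev > 0",
                        }
                    },
                },
            }
        },
    }


print(json.dumps(names_in_both_ranges_body(), indent=2))
```

With this shape, the buckets left under by_name would be the names that occur in both ranges, and the number of buckets is the count being asked for.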

Multi-query match_phrase_prefix elasticsearch

I would like to query two different prefixes for the same field. The code below works exactly how I would like it to when working with one field:
GET /logstash-*/_search
{
"query": {
"match_phrase_prefix" : {
"type" : {
"query" : "job-source"
}
}
}
}
I could not find in the docs how to do this with two queries (I only found how to search in multiple fields). I have tried a boolean should and the snippet below, but neither gives me the results I am looking for.
GET /logstash-*/_search
{
"query": {
"match_phrase_prefix" : {
"type" : {
"query" : ["job-source","job-find"]
}
}
}
}
How do I query for only documents that have type:job-source or type:job-find as the prefix?
Thank you in advance,
You can combine two match_phrase_prefix queries using should and set minimum_should_match to 1.
Sample Query:
{
"query":
{
"bool":
{
"should": [
{
"match_phrase_prefix":
{
"type": "job-source"
}
},
{
"match_phrase_prefix":
{
"type": "job-find"
}
}],
"minimum_should_match": 1
}
}
}
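The same pattern extends to any number of prefixes. Here is a small sketch that builds the request body in Python; the helper name `any_prefix_query` is hypothetical, while the field and prefix values are the ones from the question.

```python
import json


def any_prefix_query(field, prefixes):
    """Match documents whose field starts with at least one of the given phrase prefixes."""
    return {
        "query": {
            "bool": {
                # One match_phrase_prefix clause per prefix.
                "should": [{"match_phrase_prefix": {field: p}} for p in prefixes],
                # Require at least one of the should clauses to match.
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(any_prefix_query("type", ["job-source", "job-find"]), indent=2))
```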

Elasticsearch Filter Query

I am using Elasticsearch 1.5.2. I stored some products with a field named "allergic" and others without it. The values of this field can be fish, milk, nuts, etc. I want to get only the products that do not have the "allergic" field at all, and to combine this with an aggregation query. I want to make just one query: first eliminate the products that have the "allergic" field, then run the aggregation query from the second block.
How can I integrate this:
{
"constant_score" : {
"filter" : {
"missing" : { "field" : "allergic" }
}
}
}
to this aggregation query:
POST tes1/_search?search_type=count
{
"aggs" : {
"fruits" : {
"filter" : {
"query":{
"query_string": {
"query": "Fruits",
"fields": [
"category"
]
}
}},
"aggs" : {
"minprice": {
"top_hits": {
"sort": [
{
"prix en €/kg": {
"order": "asc"
}
}
], "size":400
}
}
}
}} }
You need to add the query part before the aggregation call. This filters the results first and then runs the aggregation on the result set.
POST tes1/_search
{
"_source": false,
"size": 1000,
"query":
{ "constant_score" : {
"filter" : {
"missing" : { "field" : "allergic" }
}
}
},
"aggs" : {
"fruits" : {
"filter" : {
"query":{
"query_string": {
"query": "Fruits",
"fields": [
"category"
]
}
}},
"aggs" : {
"minprice": {
"top_hits": {
"sort": [
{
"prix en €/kg": {
"order": "asc"
}
}
], "size":400
}
}
}
}} }
On a side note, please consider upgrading Elasticsearch to the latest version, as 1.x is no longer supported.
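For readers who do upgrade, here is a hedged sketch of how the same request might look on a newer version, built as a Python dict. To my knowledge the missing filter, the "query" wrapper inside a filter, and search_type=count are no longer available there, so this sketch uses a bool must_not exists clause, passes the query_string directly to the filter aggregation, and sets "size": 0. The helper name is hypothetical, and the top_hits size is lowered from 400 to 100 on the assumption that newer versions cap it by default.

```python
import json


def fruits_without_allergens_body():
    """Aggregate only over products that have no 'allergic' field."""
    return {
        "size": 0,  # stands in for search_type=count
        "query": {
            "bool": {
                # Replacement for the missing filter: documents where the field does not exist.
                "must_not": [{"exists": {"field": "allergic"}}]
            }
        },
        "aggs": {
            "fruits": {
                "filter": {
                    "query_string": {"query": "Fruits", "fields": ["category"]}
                },
                "aggs": {
                    "minprice": {
                        "top_hits": {
                            "sort": [{"prix en €/kg": {"order": "asc"}}],
                            "size": 100,  # assumed default cap; the original used 400
                        }
                    }
                },
            }
        },
    }


print(json.dumps(fruits_without_allergens_body(), indent=2, ensure_ascii=False))
```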

Highlighting on has_child query

In some of our types we have a parent-child setup, and we want to search on both parent fields and child fields (and return the parent), using a query like the one below. When there is a has_child match, is there any way to get highlighting information from the child match even though the parent is being returned? As an example, if we have a mapping like the following:
PUT nested2
{
"mappings":{
"discussion":{
"properties" : {
"title":{
"type":"string"
}
}
},
"discussionPost":{
"_parent":{
"type" : "discussion"
},
"properties" : {
"post" : {
"type" : "string"
}
}
}
}
}
And we issue a query like below, highlight information is returned if there is a match on parent field but not if the parent is being returned due to a has_child match:
POST nested2/discussion/_search
{
"query": {
"bool": {
"should": [
{
"prefix": {
"_all" : "cat"
}
},
{
"has_child" : {
"type" : "discussionPost",
"score_mode" : "sum",
"query" : {
"prefix": {
"_all" : "cat"
}
}
}
}
],
"minimum_should_match": 1
}
},
"highlight":{
"fields":{
"*":{}
}
}
}
Is it possible to get highlight information on what matched in the child when has_child query is being issued on the parent?
Regards
LT
It is possible to do this using inner_hits inside the has_child query clause:
{
"query": {
"bool": {
"should": [
{
"has_child" : {
"inner_hits": {
"_source": false,
"highlight":{
"order": "score",
"fields": {"*":{}}
}
},
"type" : "discussionPost",
"score_mode" : "sum",
"query" : {
"prefix": {
"_all" : "cat"
}
}
}
}
],
"minimum_should_match": 1
}
}
}
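With inner_hits, the child highlights come back nested under each parent hit rather than in the parent's own highlight section. The sketch below shows one way to collect them in Python. It assumes the inner hits are keyed by the child type name (discussionPost) because the query gives them no explicit name. The helper name `child_highlights` and the sample response are made up for illustration and are not output from a real cluster.

```python
def child_highlights(response, inner_hits_name="discussionPost"):
    """Map each parent _id to the highlight fragments found in its matching children."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        fragments = []
        for child in inner.get("hits", {}).get("hits", []):
            # highlight maps each field name to a list of fragments.
            for field_fragments in child.get("highlight", {}).values():
                fragments.extend(field_fragments)
        if fragments:
            result[hit["_id"]] = fragments
    return result


# Hypothetical, trimmed-down response used only to exercise the helper.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "discussionPost": {
                        "hits": {
                            "hits": [
                                {"_id": "10", "highlight": {"post": ["<em>cats</em> are great"]}}
                            ]
                        }
                    }
                },
            },
            {"_id": "2"},
        ]
    }
}

print(child_highlights(sample_response))
```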

Resources