Elasticsearch facets filters - elasticsearch

I've created a facet using elasticsearch but I want to filter it just for specific words.
{
...
"facets": {
"my_facets": {
"terms": {
"field": "description",
"size": 1000
}
}
}
}
The result contains every word from description:
{
"my_facet": {
"_type": "terms",
"missing": 0,
"total": 180,
"other": 0,
"terms": [
{
"term": "și",
"count": 1
},
{
"term": "światłowska",
"count": 1
},
{
"term": "łódź",
"count": 1
}
]
}
}
I want the facet to count only specific words, not every word found in description.
I've already tried a query facet with a match query, but it only returns one overall count, as follows:
{
"query_Facet_test": {
"query": {
"match": {
"description": "word1 word2"
}
}
}
}
and the result I get:
{
"query_Facet_test": {
"_type": "query",
"count": 1
}
}

You can use a bool query like this, so that the facet is computed only over documents that match both words:
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "word1"
}
},
{
"match": {
"description": "word2"
}
}
]
}
},
"facets": {
"my_facets": {
"terms": {
"field": "description",
"size": 1000
}
}
}
}
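On versions where facets have been replaced by aggregations (they were removed in Elasticsearch 2.0), the terms aggregation accepts an include list of exact terms, which limits the buckets to specific words. The following is only a sketch in Python that builds the request body; the helper name terms_agg_for_words is hypothetical, and the field name description is taken from the question. The words are assumed to be given in their analyzed (for example lowercased) form.

```python
import json


def terms_agg_for_words(field, words, size=1000):
    """Build a search body whose terms aggregation only counts the given words."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {
            "my_facets": {
                "terms": {
                    "field": field,
                    "include": list(words),  # exact terms to keep as buckets
                    "size": size,
                }
            }
        },
    }


body = terms_agg_for_words("description", ["word1", "word2"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request; only word1 and word2 would appear as buckets.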

Related

How to write a conditional in a search query?

I am searching among documents in a particular district. Documents have various statuses. The aim is to return all documents, except when document's status code is ABCD - such documents should only be returned if their ID is greater than 100. I have tried writing multiple queries, including the one below, which returns only the ABCD documents with ID greater than 100, and none of the other documents. What is wrong here? How can I get the non-ABCD documents as well?
{
"_source": true,
"from": 0,
"size": 50,
"sort": [
{
"firstStamp": "DESC"
}
],
"query": {
"bool": {
"must": [
{
"term": {
"districtId": "3755"
}
},
{
"bool": {
"must": [
{
"terms": {
"documentStatus.code.keyword": [
"ABCD"
]
}
},
{
"bool": {
"must": {
"script": {
"script": "doc['id'].value > 100"
}
}
}
}
]
}
}
]
}
}
}
Since you have not posted the index mapping, I am assuming from your search query that documentStatus is an object field. As far as I understand, you want to return all documents, except that documents with status code ABCD should only be returned if their ID is greater than 100.
Adding a working example with index data, search query, and search result
Index Data:
{
"id":200,
"documentStatus":{
"code":"DEF"
}
}
{
"id":200,
"documentStatus":{
"code":"ABCD"
}
}
{
"id":100,
"documentStatus":{
"code":"ABCD"
}
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"terms": {
"documentStatus.code.keyword": [
"ABCD"
]
}
},
{
"bool": {
"must": {
"script": {
"script": "doc['id'].value > 100"
}
}
}
}
]
}
},
{
"bool": {
"must_not": {
"terms": {
"documentStatus.code.keyword": [
"ABCD"
]
}
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64351595",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"id": 200,
"documentStatus": {
"code": "ABCD"
}
}
},
{
"_index": "stof_64351595",
"_type": "_doc",
"_id": "3",
"_score": 0.0,
"_source": {
"id": 200,
"documentStatus": {
"code": "DEF"
}
}
}
]
You need to use must_not in your query if you want documents whose status code is not ABCD. So your query would be something like this:
{
"from": 0,
"size": 50,
"sort": [
{
"firstStamp": "DESC"
}
],
"query": {
"bool": {
"must": [
{
"term": {
"districtId": "3755"
}
},
{
"range": {
"id": {
"gt": 100
}
}
}
],
"must_not": [
{
"terms": {
"documentStatus.code.keyword": [
"ABCD"
]
}
}
]
}
}
}
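The rule in the question (every non-ABCD document, plus ABCD documents only when their ID is greater than 100, all within one district) can also be written with a range clause instead of a script. The sketch below builds such a request body in Python and checks the rule against the three sample documents with a plain-Python predicate. The function names build_query and matches are hypothetical, and the range clause assumes id is mapped as a numeric field.

```python
def build_query(district_id, status="ABCD", min_id=100):
    """All documents of a district; `status` documents only when id > min_id."""
    status_field = "documentStatus.code.keyword"
    return {
        "query": {
            "bool": {
                # Always required, does not affect scoring.
                "filter": [{"term": {"districtId": district_id}}],
                "should": [
                    # Branch 1: any document whose status is not `status`.
                    {"bool": {"must_not": {"term": {status_field: status}}}},
                    # Branch 2: `status` documents with a large enough id.
                    {"bool": {"filter": [
                        {"term": {status_field: status}},
                        {"range": {"id": {"gt": min_id}}},
                    ]}},
                ],
                # With a filter clause present, should clauses are optional
                # unless at least one is explicitly required.
                "minimum_should_match": 1,
            }
        }
    }


def matches(doc, status="ABCD", min_id=100):
    """Plain-Python version of the same rule, for checking expectations."""
    return doc["documentStatus"]["code"] != status or doc["id"] > min_id


docs = [
    {"id": 200, "documentStatus": {"code": "DEF"}},
    {"id": 200, "documentStatus": {"code": "ABCD"}},
    {"id": 100, "documentStatus": {"code": "ABCD"}},
]
kept = [d for d in docs if matches(d)]
print(len(kept))  # 2
```

The ABCD document with id 100 is the only one dropped, which matches the search result shown in the first answer.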

Global term aggregation with filtered count - Elasticsearch 5

I have products stored in ES and I'm trying to aggregate them by size. I would like the following behaviour: for every term, including terms that the query excludes, return a count based on the query.
So querying for sizes ["S", "M"] I would like to receive:
S: 1
M: 1
L: 0
Is this somehow possible?
Here is my setup where I get following result:
S: 1
M: 1
But L is completely missing.
PUT demo
{
"mappings": {
"product": {
"properties": {
"size": {
"type": "keyword"
}
}
}
}
}
PUT demo/product/1
{
"size": "S"
}
PUT demo/product/2
{
"size": "M"
}
PUT demo/product/3
{
"size": "L"
}
GET demo/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"size": [
"S",
"M"
]
}
}
]
}
},
"aggs": {
"size": {
"terms": {
"field": "size"
}
}
}
}
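You can move the terms condition into a filter and set min_doc_count to 0 on the aggregation, so that terms without any matching document are still returned with a count of 0.
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": { "size": [ "S", "M"] }
}
}
},
"aggs": {
"size": {
"terms": { "field": "size", "min_doc_count": 0 }
}
}
}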

Document count aggregation via query in Elasticsearch (like facet.query in solr)

I have a main query and I need the number of matches for a couple of sub-queries.
In Solr terms, I need a facet.query. What I am missing is a simple doc_count aggregation, analogous to the value_count aggregation.
Any suggestions?
I found two possible solutions, neither of which I like:
Solution 1: use a filter aggregation with a value_count metric on _id.
Example:
GET _search
{
"query": {
"match_main": {}
},
"aggs": {
"facetvalue1": {
"filter": {
"bool": {
"should": [
{"match": { "name": "fred" }},
{"term": { "lastname": "krueger" }}
]
}
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"facetvalue2": {
"filter": {
"term": { "name": "freddy" }
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
}
}
}
Solution 2: use the Multi Search API.
Example:
GET _msearch
{"index":"myindex"}
{"query":{"match_main": {}}}
{"index":"myindex"}
{"size": 0, "query":{"match_main": {}}, "filter": {"bool": {"should":[{"match": { "name": "fred" }},{"term": { "lastname": "krueger" }}]}}}
{"index":"myindex"}
{"size": 0, "query":{"match_main": {}},"filter": {"term": { "name": "freddy" }}}
I see that solution 2 is faster, but imagine match_main as a complex query!
So I would prefer solution 1 if there were a doc_count:{} instead of value_count:{"field":"_id"}.
But back to my basic question: what is the counterpart of Solr's facet.query in Elasticsearch?
You can use a filters aggregation for this. Note the additional s, that is different from the filter aggregation you already mentioned.
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"values": {
"filters": {
"filters": {
"value1": {
"bool": {
"should": [
{
"match": {
"name": "fred"
}
},
{
"term": {
"lastname": "krueger"
}
}
]
}
},
"value2": {
"term": {
"name": "freddy"
}
}
}
}
}
}
}
This will return something like
"aggregations": {
"values": {
"buckets": {
"value1": {
"doc_count": 4
},
"value2": {
"doc_count": 1
}
}
}
}
Edit: As a general note, you don't have to use a metric aggregation on your bucket aggregations. If you don't provide any subaggregations, you will just get the document count. In this case, filters will provide the buckets, but multiple filter aggregations should work as well.
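As a rough illustration of the filters aggregation described above, the Python sketch below builds the request body from a mapping of bucket names to queries and reads the per-bucket doc_count out of a response shaped like the one shown. The helper names filters_agg and bucket_counts are hypothetical, and the sample response is copied from the answer rather than fetched from a cluster.

```python
def filters_agg(named_queries, agg_name="values"):
    """Build a body with one named bucket per sub-query (like Solr's facet.query)."""
    return {
        "query": {"match_all": {}},
        "size": 0,
        "aggs": {agg_name: {"filters": {"filters": dict(named_queries)}}},
    }


def bucket_counts(response, agg_name="values"):
    """Map each named bucket to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


body = filters_agg({
    "value1": {"bool": {"should": [
        {"match": {"name": "fred"}},
        {"term": {"lastname": "krueger"}},
    ]}},
    "value2": {"term": {"name": "freddy"}},
})

# Response fragment as shown in the answer above.
sample_response = {
    "aggregations": {
        "values": {
            "buckets": {
                "value1": {"doc_count": 4},
                "value2": {"doc_count": 1},
            }
        }
    }
}
counts = bucket_counts(sample_response)
print(counts)  # {'value1': 4, 'value2': 1}
```

No metric sub-aggregation is needed, since every bucket already carries its doc_count.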

Elastic search find sum of two fields in single query

I need to find the sum of two fields in a single query. I have managed to find the sum of one field, but I am having difficulty adding two aggregations to a single query.
My JSON looks like this:
{
"_index": "outboxprov1",
"_type": "message",
"_id": "JXpDpNefSkKO-Hij3T9m4w",
"_score": 1,
"_source": {
"team_id": "1fa86701af05a863f59dd0f4b6546b32",
"created_user": "1a9d05586a8dc3f29b4c8147997391f9",
"created_ip": "192.168.2.245",
"folder": 1,
"post_count": 5,
"sent": 3,
"failed": 2,
"status": 6,
"message_date": "2014-08-20T14:30Z",
"created_date": "2014-06-27T04:34:30.885Z"
}
}
My search query
{
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1fa86701af05a863f59dd0f4b6546b32"
}
}
},
"filter": {
"and": [
{
"term": {
"status": "6"
}
}
]
}
}
},
"aggs": {
"intraday_return": {
"sum": {
"field": "sent"
}
}
},
"aggs": {
"intraday_return": {
"sum": {
"field": "failed"
}
}
}
}
How can I put two aggregations in one query? Please help me solve this issue. Thank you.
You can compute the sum using a script.
Example:
{
"size": 0,
"aggregations": {
"age_ranges": {
"range": {
"script": "DateTime.now().year - doc[\"birthdate\"].date.year",
"ranges": [
{
"from": 22,
"to": 25
}
]
}
}
}
}
Your query should contain:
"script" : "doc['sent'].value+doc['failed'].value"
You can define multiple aggregations under a single aggs key, each with its own name:
{
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1fa86701af05a863f59dd0f4b6546b32"
}
}
},
"filter": {
"and": [
{
"term": {
"status": "6"
}
}
]
}
}
},
"aggs": {
"intraday_return_sent": {
"sum": {
"field": "sent"
}
},
"intraday_return_failed": {
"sum": {
"field": "failed"
}
}
}
}
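One reason the query in the question cannot return both sums is that it repeats the aggs key (and the intraday_return name) in the same JSON object. The Python sketch below shows that Python's json module keeps only the last value for a repeated key; how Elasticsearch itself handles the duplicate depends on its parser and version, so treat this as an illustration only. The helper sum_aggs is hypothetical and simply gives each sum its own name under a single aggs key.

```python
import json

duplicated = """
{
  "aggs": {"intraday_return": {"sum": {"field": "sent"}}},
  "aggs": {"intraday_return": {"sum": {"field": "failed"}}}
}
"""
parsed = json.loads(duplicated)
# Python's json module keeps the last value for a repeated key,
# so the "sent" aggregation is silently lost.
print(parsed["aggs"]["intraday_return"]["sum"]["field"])  # failed


def sum_aggs(fields, prefix="intraday_return_"):
    """One uniquely named sum aggregation per field, under a single aggs key."""
    return {prefix + field: {"sum": {"field": field}} for field in fields}


aggs = sum_aggs(["sent", "failed"])
print(sorted(aggs))  # ['intraday_return_failed', 'intraday_return_sent']
```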

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it, for example, with the Postgres documentation).
By trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries: at least an aggregate plus multiple filters?
I ended up using a filter aggregation, not a filtered query. So now I have 3 nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
More on @geekQ's answer: to support filter strings that contain spaces when filtering on multiple terms, use match_phrase as below:
{
"aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
}
},
"size": 0
}
Just for reference, as of version 7.2, I achieved multiple filters for an aggregation as follows: a filter aggregation restricts the documents being aggregated, and a bool query inside it sets up the compound condition.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
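The pattern used in these answers (one filter aggregation holding a bool query, with the real aggregations nested inside it) can be generated rather than written by hand. The following Python sketch is one possible way to do that; the helper name filtered_aggs is hypothetical, the movies fields are taken from the 7.2 example, and placing every condition in the bool filter clause assumes that scoring is irrelevant, which is the case inside an aggregation.

```python
import json


def filtered_aggs(filters, sub_aggs, name="filtered"):
    """Wrap `sub_aggs` in a filter aggregation that applies every clause in `filters`."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            name: {
                "filter": {"bool": {"filter": list(filters)}},
                "aggs": sub_aggs,
            }
        },
    }


body = filtered_aggs(
    filters=[
        {"term": {"genre": "action"}},
        {"range": {"year": {"gte": 1800, "lte": 3000}}},
    ],
    sub_aggs={"year_hist": {"histogram": {"field": "year", "interval": 50}}},
    name="test",
)
print(json.dumps(body, indent=2))
```

Adding another condition then only means appending one more clause to the filters list.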
