Elasticsearch agg filter using an array of values - elasticsearch

{ "colors":["red","black","blue"] }
{ "colors":["red","black"] }
{ "colors":["red"] }
{ "colors":["orange, green"] }
{ "colors":["purple"] }
How can I run an agg that filters for specific values contained in the array field?
For example, I only want the count of "red" and wish to exclude its other siblings from the aggregation result.
Note: I cannot use an "include" pattern for "red". This example is simplistic, the real-world example has a long list of string values that are unique.
I would like to filter the agg using an array of string values.

From docs
For matching based on exact values the include and exclude parameters can simply take an array of strings that represent the terms as they are found in the index:
{
"aggs": {
"colors": {
"terms": {
"field": "colors",
"include": [ "red","black" ]
}
}
}
}

Related

How can I find entries in elastic where a specific value is present in array?

Ok, this is the schema:
id
generated{
status{
myStatuses[]
}
}
So having now these entries:
id=1
generated.status.myStatuses=['busy', 'free']
id=2
generated.status.myStatuses=['busy']
id=3
generated.status.myStatuses=['free']
I want to match all the documents where "generated.status.myStatuses" contains the word "free".
In the example above I would find id=1 and id=3.
There's no dedicated array datatype in ES so you can treat your keyword arrays as keywords. This means either
GET generated/_search
{
"query": {
"match": {
"generated.status.myStatuses": "free"
}
}
}
or, for exact matches,
GET generated/_search
{
"query": {
"term": {
"generated.status.myStatuses.keyword": "free"
}
}
}
{
"query":{
"match":{
"generated.status.myStatuses":"free"
}
}
}
You need to use match or term queries based on your data mapping.

Aggregate over top hits ElasticSearch

My documents are structured in the following way:
{
"chefInfo": {
"id": int,
"employed": String
... Some more recipe information ...
}
"recipe": {
... Some recipe information ...
}
}
If a chef has multiple recipes, the nested chefInfo block will be identical in each document. My problem is that I want to do an aggregation of a field in the chefInfo part of the document. However, this doesn't take into account for the fact that the chefInfo block is a duplicate.
So, if the chef with the id of 1 is on 5 recipes and I am aggregating on the employed field then this particular chef, will represent 5 of the counts in the aggregation, whereas, I want them to only count a single one.
I thought about doing a top_hits aggregation on the chef_id and then I wanted to do a sub-aggregation over all of the buckets but I can't work out how to do the counts over the results of all the buckets.
Is it possible what I want to do?
For elastic every document in itself is unique. In your case you want to define uniqueness based on a different field, here chefInfo.id. To find unique count based on this field you have to make use of cardinality aggregation.
You can apply the aggregation as below:
{
"aggs": {
"employed": {
"nested": {
"path": "chefInfo"
},
"aggs": {
"employed": {
"terms": {
"field": "chefInfo.employed.keyword"
},
"aggs": {
"employed_unique": {
"cardinality": {
"field": "chefInfo.id"
}
}
}
}
}
}
}
}
In the result employed_unique give you the expected count.

Elasticsearch order by type

I'm searching an index with multiple types by simply using 'http://es:9200/products/_search?q=sony'. This will return a lot of hits with many different types. The hits array contains all the results but not in the order I want it to; i want the 'television' type to always show before the rest. Is it possible at all to order by type?
You can achieve this by sorting on the pre-defined field _type. The query below sorts results in ascending order of document types.
POST <indexname>/_search
{
"sort": [
{
"_type": {
"order": "asc"
}
}
],
"query": {
<query goes here>
}
}
I do it by adding a numeric field _is_OF_TYPE to the indexed documents and set it to 1 for those docs that are of the given type. Then just sort on those fields in any order you want.
For example:
Document A:
{
_is_television: 1,
... some television props here ...
}
Document B:
{
_is_television: 1,
... another television props here ...
}
Document C:
{
_is_radio: 1,
... some radio props here ...
}
and so on...
Then in ElasricSearch query:
POST radio,television,foo,bar,baz/_search
{
"sort": [
{"_is_television": {"unmapped_type" : "long"}}, // television goes first
{"_is_radio": {"unmapped_type" : "long"}}, // then radio
{"_is_another_type": {"unmapped_type" : "long"}} // ... and so on
]
}
The benefit of this solution is speed. You simply sort on numeric fields. No script sorting required.

Elastic Search filter with aggregate like Max or Min

I have simple documents with a scheduleId. I would like to get the count of documents for the most recent ScheduleId. Assuming Max ScheduleId is the most recent, how would we write that query. I have been searching and reading for few hours and could get it to work.
{
"aggs": {
"max_schedule": {
"max": {
"field": "ScheduleId"
}
}
}
}
That is getting me the Max ScheduleId and the total count of documents out side of that aggregate.
I would appreciate if someone could help me on how take this aggregate value and apply it as a filter (like a sub query in SQL!).
This should do it:
{
"aggs": {
"max_ScheduleId": {
"terms": {
"field": "ScheduleId",
"order" : { "_term" : "desc" },
"size": 1
}
}
}
}
The terms aggregation will give you document counts for each term, and it works for integers. You just need to order the results by the term instead of by the count (the default). And since you only want the highest ScheduleID, "size":1 is adequate.
Here is the code I used to test it:
http://sense.qbox.io/gist/93fb979393754b8bd9b19cb903a64027cba40ece

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
"facets": {
"notfound": {
"query": {
"term": {
"statusCode": {
"value": 404
}
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
In the specific case, the total hits of the search is 21 documents, which fits the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which fits the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collects data from withing the search. In this case, the "notfound" facet should never be able to return a count higher that 21.
What am I doing wrong here?
There's a distinct difference between filter/query/filtered_query/facet filter which is good to know.
Top level filter
{
filter: {}
}
This acts as a post-filter, meaning it will filter the results after the query phase has ended. Since facets are part of the query phase filters do not influence the documents that are facetted over. Filters do not alter score and are therefor very cacheable.
Top level query
{
query: {}
}
Queries influence the score of a document and are therefor less cacheable than filters. Queries run in the query phase and thus also influence the documents that are facetted over.
Filtered query
{
query: {
filtered: {
filter: {}
query: {}
}
}
}
This allows you to run filters in the query phase taking advantage of their better cacheability and have them influence the documents that are facetted over.
Facet filter
"facets" : {
"<FACET NAME>" : {
"<FACET TYPE>" : {
...
},
"facet_filter" : {
"term" : { "user" : "kimchy"}
}
}
}
this allows you to apply a filter to the documents that the facet is run over. Remember that the it'll be a combination of the queryphase/facetfilter unless you also specify global:true on the facet as well.
Query Facet/Filter Facet
{
"facets" : {
"wow_facet" : {
"query" : {
"term" : { "tag" : "wow" }
}
}
}
}
Which is the one that #thomasardal is using in this case which is perfectly fine, it's a facet type which returns a single value: the query hit count.
The fact that your Query Facet returns 38 and not 21 is because you use a filter for your time range.
You can fix this by either doing the filter in a filtered_query in the query phase or apply a facet filter(not a filter_facet) to your query_facet although because filters are cached better you better use facet filter inside you filter facet.
Confusingly Filter Facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly: .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
.FacetTerm(ft=>ft
.OnField("myfield")
.FacetFilter(f=>f.Term("filter_facet_on_this_field", "value"))
)
);
You issue here is that you are performing a Filter Facet and not a normal facet on your query (which will follow the restrictions applied via the query filter). In the JSON, the issue is because of the "query" between the facet name "notfound" and the "terms" entry. This is telling Elasticsearch to run this as a separate query and facet on the results of this separate query and not your main query with the date range filter. So your JSON should look like the following:
{
"facets": {
"notfound": {
"term": {
"statusCode": {
"value": 404
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
Since I see you have this tagged with NEST as well, in your call using NEST, you are probably using FacetFilter on your search request, switch this to just Facet to get the desired result.

Resources