Elasticsearch, get average document length

Is there any better way in elasticsearch (other than issuing a match all query and manually averaging over the length of all returned documents) to get the average document length for a specific index?

The _size mapping field, if enabled, should give you the size of each document for free. Combining this with the avg aggregation should get you what you want. Something like:
{
"query" : {"match_all" : {}},
"aggs" : {"avg_size" : {"avg" : {"terms" : {"field" : "_size"}}}}
}

I used this query (with _source enabled):
{
"query" : {"match_all" : {}},
"aggs":{
"avg_length" : { "avg" : { "script" : "_source.toString().length()"}}
}
}
That counts characters. If the strings are UTF-8 and you want the length in bytes:
{
"query" : {"match_all" : {}},
"aggs":{
"avg_length" : { "avg" : { "script" : "_source.toString().getBytes(\"UTF-8\").length"}}
}
}
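The difference between the two scripts above (character count versus UTF-8 byte count) can be illustrated outside Elasticsearch. This is a minimal Python sketch; the function names and the sample string are made up for illustration and are not part of either answer.

```python
def char_length(text: str) -> int:
    """Number of characters (Unicode code points) in the text."""
    return len(text)


def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical document source containing one non-ASCII character.
sample = '{"name": "café"}'
print(char_length(sample), utf8_byte_length(sample))
```

For pure ASCII documents the two numbers are identical; they diverge only when the source contains multi-byte characters.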

Shot in the dark, but facets or aggregations combined with a script might do it.
{
...,
"aggs" : {
"avg_length" : { "avg" : { "script" : "doc['_all'].length" } }
}
}

In Elasticsearch 6.2 you can apply the avg aggregation directly to the field (no need to add 'terms'):
"aggs" :
{"avg_size" :
{"avg" :
{"field" : "_size"}}}
See details here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html
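As a rough sketch of how a client might use this, the Python below builds the request body and reads the average out of a response. The helper names are hypothetical, "size": 0 is an added assumption to suppress hits, and the sample response is hand-written to match the documented shape of an avg aggregation result rather than captured from a real cluster.

```python
from typing import Any, Dict, Optional


def avg_size_request(agg_name: str = "avg_size") -> Dict[str, Any]:
    """Build a search body that averages the _size field over all documents."""
    return {
        "size": 0,  # assumption: we only want the aggregation, not the hits
        "query": {"match_all": {}},
        "aggs": {agg_name: {"avg": {"field": "_size"}}},
    }


def read_avg(response: Dict[str, Any], agg_name: str = "avg_size") -> Optional[float]:
    """Extract the average from a search response; None if no documents matched."""
    return response["aggregations"][agg_name]["value"]


# Hand-written response fragment, for illustration only.
fake_response = {"aggregations": {"avg_size": {"value": 128.5}}}
print(read_avg(fake_response))
```

The body returned by avg_size_request would be sent to the index's _search endpoint with whichever HTTP or Elasticsearch client is already in use.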

Related

Sum of all the values of a field in Kibana using elasticsearch query DSL

I have seen quite a few similar questions answered but they are all for older versions of Kibana, or do not actually help with my particular question.
I want to find the sum of all values in a specific field. The Kibana docs give the following example code for summing a field:
POST /sales/_search?size=0
{
"query" : {
"constant_score" : {
"filter" : {
"match" : { "type" : "hat" }
}
}
},
"aggs" : {
"hat_prices" : { "sum" : { "field" : "price" } }
}
}
Based on this, the following should sum all the values in the field "tweetSentiment.polarity"
(POST /sales/_search?size=0 was removed because the UI gives an "unexpected 'p'" error with that line in.)
{
"query" : {
"constant_score" : {
"filter" : {
"match" : { "type" : "number" }
}
}
},
"aggs" : {
"hat_prices" : { "sum" : { "field" : "tweetSentiment.polarity" } }
}
}
Changing around the values for "type" and "field" between all the possible combinations of things they could be did not solve the issue either. My best guess is that this is not actually the code I want, especially after digging deep into how to create the query I am looking for.

elasticsearch percolator filter fails

I'm percolating a document against a percolator, and that works fine. But when I try to restrict which registered percolator queries the document is matched against by filtering on query ids, no results are returned. For example:
{
"doc" : {
"text" : "This is the text within my document"
},
"highlight" : {
"order" : "score",
"pre_tags" : ["<example>"],
"post_tags" : ["</example>"],
"fields" : {
"text" : { "number_of_fragments" : 0 }
}
},
"filter":{"ids":{"values":[11,15]}}
,
"size" : 100
}
I know for sure that those ids are correct, but I always get "matches" : [ ]. When I don't use the filter, ES returns the correct matches.
Thanks for your help.
I think I've solved it. It seems that the filter only works on the "metadata" fields, meaning that you have to add custom fields to the queries indexed in the percolator if you want to filter on them later.
Using my previous example, I would have to index percolator queries like this:
{
"query" : {
"match_phrase" : {
"text" : "document"
}
},
"id" : 11
}
This manually adds a redundant id field so that it can be referenced later in a filter.
At percolation time, you have to use something like:
{
"doc" : {
"text" : "This is the text within my document"
},
"filter":{"match":{"id":11}},
"highlight" : {
"order" : "score",
"pre_tags" : ["<example>"],
"post_tags" : ["</example>"],
"fields" : {
"text" : { "number_of_fragments" : 0 }
}
},
"size" : 100
}
This restricts percolation to that one registered query. Complementary information can be found here.
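The two request bodies above always travel as a pair: the metadata field written at registration time is the one the filter reads at percolation time. A small Python sketch of that pairing follows; the helper names are hypothetical, and the bodies simply mirror the JSON shown in this answer (highlighting omitted for brevity).

```python
from typing import Any, Dict


def percolator_query(query: Dict[str, Any], query_id: int) -> Dict[str, Any]:
    """Body to register in the percolator: the query plus a redundant id metadata field."""
    return {"query": query, "id": query_id}


def percolate_request(doc: Dict[str, Any], query_id: int, size: int = 100) -> Dict[str, Any]:
    """Body to send at percolation time, filtering on the metadata field set above."""
    return {
        "doc": doc,
        "filter": {"match": {"id": query_id}},
        "size": size,
    }


registered = percolator_query({"match_phrase": {"text": "document"}}, 11)
request = percolate_request({"text": "This is the text within my document"}, 11)
print(registered)
print(request)
```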

Do query results impact elasticsearch phrase suggestions?

I'd like to know whether Elasticsearch uses query results to populate phrase suggestions for the direct generator.
Or does it simply pick tokens from the given index?
My queries are restricted by permission sets.
For instance, this is my query:
{
"size" : 0,
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"bool" : {
"must" : [{
"terms" : {
"Permissions" : ["permission1", "permission2", "permission3"
]
}
}
]
}
}
}
},
"suggest" : {
"DidYouMean" : {
"text" : "{{SearchPhrase}}",
"phrase" : {
"field" : "_all",
"analyzer" : "simple",
"size" : 1,
"real_word_error_likelihood" : 0.96,
"max_errors" : 5,
"gram_size" : 3,
"direct_generator" : [{
"field" : "_all",
"suggest_mode" : "popular",
"min_word_length" : 3
}
]
}
}
}
}
How can I ensure that the direct generator only creates suggestions that don't violate my permissions clause?
Is this even possible?
The term suggester and the phrase suggester generate their results from the indexed tokens. The query does not affect the suggest results: the suggesters work directly on the inverted index and take their tokens from it. Their scope is therefore the whole index, never the query results.

ElasticSearch using wildcard and term queries

I'm new to Elasticsearch, and I have never used Lucene either.
I built this query:
{
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"query" : {
"term" : { "first" : "1" }
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}
The wildcard is working great, but the term filter is ignored. What did I do wrong?
I need to filter the results with both the wildcard and the term.
Thanks!
Assuming you want to apply the filter to the results of the wildcard query, you can use a FilteredQuery. However, your case might be a better fit for a pure filter.
You are using a query filter. Instead of building a filter out of a TermQuery, you can use a TermFilter directly in a FilteredQuery. TermFilter should be faster because it uses the TermsEnum directly.
Note that filter results are cached in a FilterCache, and filters are faster because they do not score documents. In your case, even though the filter part of the FilteredQuery will be fast, the wildcard query will still score documents unnecessarily. You could try an AND filter that combines a query filter (wrapping the wildcard query) with the term filter instead of a FilteredQuery.
To make just the filter work as you require, try something like the following (not tested myself):
{
"query" : {
"filtered" : {
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"term" : { "first" : "1" }
}
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}
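The AND-filter alternative mentioned in this answer could look roughly like the body built below. This is an untested sketch for the legacy (pre-2.0) filter DSL: the helper name is hypothetical, and wrapping the "and" filter in a filtered query over match_all is an assumption made here so that the facets are computed on the filtered documents.

```python
from typing import Any, Dict


def and_filter_body(wildcard_field: str, pattern: str,
                    term_field: str, term_value: str) -> Dict[str, Any]:
    """Combine a wildcard query filter and a term filter with a legacy "and" filter."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},  # no scoring work is needed
                "filter": {
                    "and": [
                        # a "query" filter wraps a query so it can act as a filter
                        {"query": {"wildcard": {wildcard_field: pattern}}},
                        {"term": {term_field: term_value}},
                    ]
                },
            }
        }
    }


body = and_filter_body("referer", "*.domain.com*", "first", "1")
# The facets section from the question can be added alongside "query".
body["facets"] = {"site_id": {"terms": {"field": "site", "size": 70}}}
print(body)
```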

elasticsearch offset and limit facets

I'm trying to run a search that both limits and offsets the facet result set (the offset keyword in Elasticsearch is "from"), so something like:
'{
"query" : {
"nested" : {
"_scope" : "my_scope",
"path" : "related_award_vendors",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : {
"text" : {"related_award_vendors.title" : "inc"}
}
}
}
}
},
"facets" : {
"facet1" : {
"terms_stats" : {
"key_field" : "related_award_vendors.django_id",
"value_field" : "related_award_vendors.award_amount",
"order":"term",
"size": 5,
"from":2
},
"scope" : "my_scope" }
}
}'
With the query above, it returns IDs 1,2,3,4,5, and if I remove "from" it still returns 1,2,3,5 in the result set.
The "size" is working correctly. In this case, it's returning five items in the result set.
My understanding is that Solr can do this. Can this be done in Elasticsearch?
The terms stats facet doesn't support the from parameter. The only way to achieve what you want is to set size to size + offset and ignore the first offset entries on the client side. In your example, that means requesting 7 entries and ignoring the first 2.
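The client-side workaround described above can be sketched in a few lines of Python. The function names are hypothetical, and the entries list stands in for the facet's returned terms; it is not real response data.

```python
from typing import List, TypeVar

T = TypeVar("T")


def facet_request_size(size: int, offset: int) -> int:
    """Value to send as the facet's "size" so that enough entries come back."""
    return size + offset


def apply_offset(entries: List[T], size: int, offset: int) -> List[T]:
    """Drop the first `offset` entries and keep at most `size` of the rest."""
    return entries[offset:offset + size]


# Stand-in for the facet entries returned when requesting size + offset = 7.
entries = [1, 2, 3, 4, 5, 6, 7]
print(facet_request_size(5, 2))
print(apply_offset(entries, size=5, offset=2))
```

Note that the cost grows with the offset, since every skipped entry is still computed and transferred.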
