Elastic Search: get documents where value > AVG

Hi, I would like to understand how far you can go with Elasticsearch queries.
I have an index with a property "Price", and I would like to extract every document where Price > AVG(Price).
For example, if I have 6 documents with these prices:
532, 400, 299, 100, 100, 33
it should return the documents priced 299, 400 and 532, because they are above the average price (244).
Can I achieve this with a simple Elasticsearch query, or do I need something else, for example scripting (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html), a custom script in another language (Node.js, Python, .NET, etc.), or an ETL tool like Logstash?
I am having difficulty identifying the right approach. I tried to use a subquery in ES, but it is not supported:
({ "query" : "select * from myIndex where Price > (select avg('Price') from myIndex) "})

I think this query will solve your problem in an efficient way:
{
  "aggs": {
    "price_gte_244": {
      "filter": {
        "range": {
          "price": {
            "gte": 244
          }
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
Here 244 can be replaced by any value or variable you want.
If you want to read more about the filter aggregation, see: Filter aggregation docs

Depending on how your data is structured, you can get the avg value across all of your documents through an AVG aggregation. Then, use that value in a Range query to find the documents greater than this value. You can also look into the Bucket Selector Aggregation which has a lot of similarity with a SQL HAVING clause.
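The two-step approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: it builds the two request bodies as Python dictionaries and checks the logic locally on the six prices from the question. The helper names (build_avg_request, build_range_request, above_average) and the aggregation name avg_price are made up for the example, and sending the bodies to a cluster is left to whatever client you use.

```python
def build_avg_request(field="Price"):
    """Step 1: request body that asks only for the average of `field`.

    In the response, the value is read from
    response["aggregations"]["avg_price"]["value"].
    """
    return {"size": 0, "aggs": {"avg_price": {"avg": {"field": field}}}}


def build_range_request(avg_value, field="Price"):
    """Step 2: request body matching documents strictly above the average."""
    return {"query": {"range": {field: {"gt": avg_value}}}}


def above_average(prices):
    """Local stand-in for the two requests, to check the expected result."""
    avg_value = sum(prices) / len(prices)
    return [p for p in prices if p > avg_value]


prices = [532, 400, 299, 100, 100, 33]
print(sum(prices) / len(prices))   # 244.0
print(above_average(prices))       # [532, 400, 299]
```

Note that the average is computed at one point in time, so documents indexed between the two requests are not reflected in the threshold.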

Related

Does aggregation in ES work on all the docs (with a match_all query) in an ES index?

Let's suppose I have 10^6 docs in my ES index. Will the aggregation run over all the docs to get results?
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "Unique_term": {
      "terms": {
        "field": "category",
        "size": "10000"
      }
    }
  }
}
Also, there are fewer than 10k distinct terms that I want to get.
Yes, the terms aggregation will give you that result: a bucket is built for each unique value of the category field.
If no size parameter is set, it returns only 10 buckets by default.

elasticsearch: get random distinct field values?

We have Elasticsearch documents with a "dealerId" field. Multiple documents can have the same "dealerId". We want to pick "N" random dealers from them.
What I have done so far: the following query returns at most 1000 "dealerId" values and their counts, in descending order. We then randomly pick "N" records client-side.
{
  "from": 0,
  "size": 0,
  "aggs": {
    "CityIdCount": {
      "terms": {
        "field": "dealerId",
        "order": { "_term": "desc" },
        "size": 1000
      }
    }
  }
}
The downsides of this approach are:
If we have more than 1K unique dealers in the future, this approach will fail, because it picks only the top 1K dealerId occurrences. What should we set "size" to?
We fetch all the data from the Elasticsearch server to the client even though we only need "N" random values, i.e. 3 or 4 random "dealerId" values. Can we somehow do this randomization in the Elasticsearch query itself, e.g. order: "random"?
I have read something similar here, but I want to check whether there is a solution for this now.
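For reference, the client-side step described in the question (pick "N" random dealers out of the returned buckets) could look something like the sketch below. It assumes the usual terms aggregation response shape, with a list of buckets that each carry a key and a doc_count. The helper name pick_random_dealers and the sample response are hypothetical. It does not address the 1K limit, since it can only sample from the buckets that were actually returned.

```python
import random


def pick_random_dealers(response, n, agg_name="CityIdCount", rng=None):
    """Return up to n distinct dealerId values chosen at random from the buckets."""
    rng = rng or random.Random()
    buckets = response["aggregations"][agg_name]["buckets"]
    keys = [bucket["key"] for bucket in buckets]
    # sample() never repeats an element; cap n so it cannot exceed the bucket count.
    return rng.sample(keys, min(n, len(keys)))


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "aggregations": {
        "CityIdCount": {
            "buckets": [
                {"key": 905, "doc_count": 3},
                {"key": 812, "doc_count": 7},
                {"key": 640, "doc_count": 1},
                {"key": 377, "doc_count": 4},
                {"key": 101, "doc_count": 2},
            ]
        }
    }
}

chosen = pick_random_dealers(sample_response, 3, rng=random.Random(42))
print(chosen)  # three distinct dealerId values from the buckets above
```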

Exclude results from Elasticsearch / Kibana based on aggregation value

Is it possible to exclude results based on the outcome of an aggregation?
In other words, I have aggregated on a term, and a whole bunch of results appear in a data table sorted in descending order by count. Is it possible to configure Kibana / Elasticsearch to exclude results where the count is 1 or less? (Here, count is an aggregation.)
I realise I can export the raw data from the data table visualization and delete those records manually in a text editor or Excel. But I am trying to convince my organization that Elasticsearch is a cool new thing, and this is one of their first requirements...
You can exclude results from the search by applying a filter. Here is a sample that may be helpful:
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "Your_term": {
            "gte": 1
          }
        }
      }
    }
  }
}
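Since the question is about dropping buckets whose count is 1 or less, one option not mentioned in the answer above is the terms aggregation's min_doc_count parameter, which makes the aggregation return only buckets with at least that many documents. The sketch below is an assumption-laden illustration rather than a drop-in fix: the helper names, the aggregation name by_term, and the reuse of the field name Your_term are all made up for the example.

```python
def build_terms_request(field="Your_term", min_count=2, size=1000):
    """Terms aggregation that only returns buckets with >= min_count documents."""
    return {
        "size": 0,
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,
                    "size": size,
                    "min_doc_count": min_count,
                }
            }
        },
    }


def drop_rare_buckets(buckets, min_count=2):
    """Client-side equivalent, useful for checking the expected effect."""
    return [b for b in buckets if b["doc_count"] >= min_count]


# Hypothetical buckets as they would appear in a terms aggregation response.
buckets = [
    {"key": "a", "doc_count": 5},
    {"key": "b", "doc_count": 1},
    {"key": "c", "doc_count": 2},
]
print([b["key"] for b in drop_rare_buckets(buckets)])  # ['a', 'c']
```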

Querying large amounts of terms without expanding maxClauseCount

In a data flow of mine, I am trying to retrieve a subset of documents from a previous terms aggregation, but I am hitting the maxClauseCount limit within my ES cluster. The follow-up query is along these lines:
GET dataset/_search
{
  "size": 2000,
  "query": {
    "bool": {
      "must": [
        (a filter or two)...,
        {
          "terms": {
            "otherid": [
              "789e18f2-bacb-4e38-9800-bf8e4c65c206",
              "8e6967aa-5b98-483e-b50f-c681c7396a6a",
              ...
            ]
          }
        }
      ]
    }
  }
}
In my research I've come across a lookup - which sadly we can't use - as well as the ids query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
From experimentation, it appears that the ids query doesn't share the limit the terms query has (potentially it's not converted into terms clauses). Does anyone know of a good way to achieve functionality similar to the ids query without using the ID fields?
My version of ES is 5.0.
Thanks!
Instead of the terms query, use the terms filter; that should solve the issue.
OR
Increase index.query.bool.max_clause_count to a higher value (not recommended).
http://george-stathis.com/2013/10/18/setting-the-booleanquery-maxclausecount-in-elasticsearch/
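A further workaround, not proposed in the answer above, is to split the ID list on the client and send several smaller requests, so that no single terms clause exceeds the limit. The sketch below only builds the request bodies. The helper names are hypothetical, the chunk size of 1000 is an assumption chosen to stay under a limit that commonly defaults to 1024 (check your cluster's setting), and the elided "(a filter or two)" from the question is left out.

```python
def chunked(values, chunk_size):
    """Yield consecutive slices of `values` with at most chunk_size items each."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def build_chunked_requests(ids, field="otherid", chunk_size=1000, size=2000):
    """One request body per chunk; the caller merges the hits afterwards."""
    return [
        {"size": size, "query": {"bool": {"must": [{"terms": {field: chunk}}]}}}
        for chunk in chunked(ids, chunk_size)
    ]


# Hypothetical IDs standing in for the values from the previous aggregation.
ids = ["id-%d" % i for i in range(2500)]
requests = build_chunked_requests(ids)
print(len(requests))  # 3
```

Each request returns up to size hits on its own, so the merged result can be larger than a single query with "size": 2000 would have been; trimming or re-sorting after the merge is left to the caller.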

Elastic Search Distinct values

I want to know how to get the distinct values of a field in Elasticsearch. I read an article here that shows how to do it with facets, but I have read that facets are deprecated:
http://elasticsearch-users.115913.n3.nabble.com/Getting-Distinct-Values-td3830953.html
Is there any other way to do that? If so, could you tell me how? It's a bit hard to understand solutions like this one: Elastic Search - display all distinct values of an array
Use aggregations:
GET /my_index/my_type/_search?search_type=count
{
  "aggs": {
    "my_fields": {
      "terms": {
        "field": "name",
        "size": 1000
      }
    }
  }
}
You can use the cardinality metric, which returns the number of distinct values rather than the values themselves.
Although the counts returned aren't guaranteed to be 100% accurate, they almost always are for low cardinality terms and the precision is configurable via the precision_threshold param.
http://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
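As a rough sketch of the cardinality approach, the request body and the way the count is read from the response might look like this. The helper names and the aggregation name distinct_count are assumptions for the example, the field name is taken from the terms example above, and the precision_threshold value of 1000 is an arbitrary choice.

```python
def build_cardinality_request(field="name", precision_threshold=1000):
    """Request body asking for the approximate number of distinct values."""
    return {
        "size": 0,
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def distinct_count(response, agg_name="distinct_count"):
    """Read the count out of a cardinality aggregation response."""
    return response["aggregations"][agg_name]["value"]


# Hypothetical response, trimmed to the field the helper reads.
sample_response = {"aggregations": {"distinct_count": {"value": 42}}}
print(distinct_count(sample_response))  # 42
```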
