Division of two aggregation metrics in Kibana - Elasticsearch

I want to divide two aggregation metrics in Kibana. I am getting the counts of two values and I want to divide one by the other.
Is there any way to do it?
Kibana is generating this Elasticsearch request:
{
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"1": {
"sum_bucket": {
"buckets_path": "1-bucket>_count"
}
},
"2": {
"cardinality": {
"field": "ms.keyword"
}
},
"1-bucket": {
"terms": {
"field": "ms.keyword",
"size": 10000,
"order": {
"_count": "desc"
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {
"indiviualCount": {
"script": {
"inline": "(doc['campaign'].empty) ? 0 : ((1.0/doc['campaign'].value) * 100)",
"lang": "painless"
}
}
},
"docvalue_fields": [
"edrTimestamp",
"timestamp"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
}
Is there any way we can achieve this in Kibana? I was thinking of using a scripted field, but where would it be written? Someone recommended using a pipeline aggregation, but I have not been able to get it working.

I have been searching for an answer for a while, and my workaround is to use the vertical bar chart.
You need to add 2 metrics, each set to "sum" of your field (you can manage your fields under the Management side tab -> "Scripted fields" to add or subtract fields so the division comes out right).
Switch to the Metrics & Axes tab, set both metrics to "stacked" mode, and set your y-axis to percentage mode.
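For reference, the pipeline-aggregation route mentioned in the question would roughly be a bucket_script that divides two sibling metrics. This is only a sketch, not a drop-in answer: the field ms.keyword is taken from the question, but the single-bucket filters wrapper (bucket_script needs a multi-bucket parent), the aggregation names, and the direction of the division are assumptions.
{
  "size": 0,
  "aggs": {
    "all_docs": {
      "filters": { "filters": { "everything": { "match_all": {} } } },
      "aggs": {
        "total_count": { "value_count": { "field": "ms.keyword" } },
        "distinct_ms": { "cardinality": { "field": "ms.keyword" } },
        "ratio": {
          "bucket_script": {
            "buckets_path": {
              "count": "total_count",
              "distinct": "distinct_ms"
            },
            "script": "params.count / params.distinct"
          }
        }
      }
    }
  }
}
Classic aggregation-based Kibana visualizations generally don't expose bucket_script directly, so a request like this usually has to go through Dev Tools, Vega, or (depending on the version) TSVB's Math aggregation.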

Related

Aggregation Query in Discover not returning expected result (Kibana / Elasticsearch)

I have set up a Kibana / Elasticsearch instance to analyze some data that I'm scraping.
I am analyzing news articles from different websites, and I want to use a query / filter that only shows me each article once, by using a cardinality aggregation on the field "article_id".
To do so I set up a Lens visualization, added it to a dashboard, and got the request from the visualization via the "Inspect" option. Then I tried to use that request as a filter in the Discover tab ("Edit as Query DSL"). The only thing that seems to be affected by the query is the time range. When I run the query in the Dev Tools section it works just fine.
My Request looks like this:
{
"aggs": {
"696b506b-2d7f-4bfc-9fab-704ca6e95d5c": {
"terms": {
"field": "article_title.keyword",
"order": {
"acbaafc6-829d-4c65-9b6b-cbca538c938e": "desc"
},
"size": 100
},
"aggs": {
"acbaafc6-829d-4c65-9b6b-cbca538c938e": {
"cardinality": {
"field": "article_id.keyword"
}
}
}
}
},
"size": 0,
"fields": [
{
"field": "run_date",
"format": "date_time"
},
{
"field": "scrape_date",
"format": "date_time"
}
],
"script_fields": {},
"stored_fields": [
"*"
],
"runtime_mappings": {},
"_source": {
"excludes": []
},
"query": {
"bool": {
"must": [],
"filter": [
{
"match_all": {}
},
{
"match_all": {}
},
{
"range": {
"run_date": {
"gte": "2021-04-02T23:49:43.440Z",
"lte": "2021-04-17T23:49:43.440Z",
"format": "strict_date_optional_time"
}
}
}
],
"should": [],
"must_not": []
}
}
}
Any help is greatly appreciated as this has been driving me insane the last few hours...

How to convert ElasticSearch query to ES7

We are having a tremendous amount of trouble converting an old ElasticSearch query to a newer version of ElasticSearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the filtered query in 6.8 (the latest version of the docs I can find that still has the page) state that you should move the query and filter to the must and filter parameters of the bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set the size to the maximum value (2147483647) explicitly. However, the documentation for the size parameter recommends using the composite aggregation instead and paginating; a sketch of that follows the converted query below.
Below is the closest query I could make to the original that will work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
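For completeness, here is roughly what the composite-aggregation alternative mentioned above could look like for the status field. This is a sketch, not part of the original question: the aggregation name and page size are arbitrary.
{
  "size": 0,
  "aggs": {
    "status_pages": {
      "composite": {
        "size": 1000,
        "sources": [
          { "status": { "terms": { "field": "status" } } }
        ]
      }
    }
  }
}
Each response carries an after_key; passing it back as "after": { "status": "<last key>" } in the next request fetches the following page, until no buckets are returned.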

Filter Elasticsearch Aggregation by Bucket Key Value

I have an Elasticsearch index of documents in which there is a field that contains a list of URLs. Aggregating on this field gives me the count of unique URLs, as expected.
GET models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"size": 10
}
}
}
}
I then want to filter out the buckets whose keys do not contain a certain string. I've tried doing so with the Bucket Selector Aggregation.
This attempt:
GET models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"size": 10
}
},
"links_key_filter": {
"bucket_selector": {
"buckets_path": {
"key": "links"
},
"script": "!key.contains('foo')"
}
}
}
}
Fails with:
Invalid pipeline aggregation named [links_key_filter] of type
[bucket_selector]. Only sibling pipeline aggregations are allowed at
the top level
Putting the bucket selector inside the links aggregation, like so:
GET models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"size": 10
},
"bucket_selector": {
"buckets_path": {
"key": "links"
},
"script": "!key.contains('foo')"
}
}
}
}
fails with:
Found two aggregation type definitions in [links]: [terms] and [bucket_selector]
I'm going to keep tinkering but am a bit stuck at the moment :(
You won't be able to use the bucket_selector here because its buckets_path
must reference either a number value or a single value numeric metric aggregation [source]
and what a terms aggregation produces is denoted as StringTerms, which simply won't work, regardless of whether you force a placeholder multi-bucket aggregation in between or not.
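For contrast, a bucket_selector is fine when its buckets_path points at something numeric, such as the bucket's document count. A small illustration, with an arbitrary threshold and aggregation name:
GET models*/_search
{
  "size": 0,
  "aggs": {
    "links": {
      "terms": { "field": "links.keyword", "size": 10 },
      "aggs": {
        "popular_only": {
          "bucket_selector": {
            "buckets_path": { "count": "_count" },
            "script": "params.count > 5"
          }
        }
      }
    }
  }
}
This works because _count resolves to a number, unlike the string bucket keys.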
Having said that, each terms aggregation supports the exclude filter.
Assuming that your links are arrays of keywords:
POST models/_doc/1
{
"links": [
"google.com",
"wikipedia.org"
]
}
POST models/_doc/2
{
"links": [
"reddit.com",
"google.com"
]
}
and you'd like to group everything except reddit, you can use the following regex:
POST models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"exclude": ".*reddit.*", <--
"size": 10
}
}
}
}
BTW, there are some non-trivial implications arising from the use of such regexes, especially when you imagine a case-sensitive scenario in which you'd need a query-time-generated regex, as discussed in How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?
GET models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"size": 10
}
},
"bucket_selector": {
"buckets_path": {
"key": "links"
},
"script": "!key.contains('foo')"
}
}
}
Your selector should come a level up: it should sit directly under aggs, parallel to your terms aggregation.
I am not sure about the key filtering, but you can use "_key" to reference the bucket keys (nesting the bucket_selector as a sub-aggregation, here named key_filter, so it doesn't clash with the terms definition):
GET models*/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"links": {
"terms": {
"field": "links.keyword",
"size": 10
},
"bucket_selector": {
"buckets_path": {
"key": "_key"
},
"script": "!params.key.contains('foo')"
}
}
}
}

Elasticsearch outputs the score of 1.0 for all results when searching for a single "starred" term

We are using Elasticsearch to search for the most relevant companies in a specific catalog. When we use the normal search term like lettering we get reasonable scores and can sort the results according to the score.
However, when we modify the search term before querying and make the "starred" version of it (e.g., *lettering*) to be able to search for substrings we get a score of 1.0 for every result. The search for substrings is a requirement in the project.
Any ideas on what could cause this relevance computation? The problem occurs only when a single term is used. We get comprehensible scores when we use two starred terms in combination (e.g., *lettering* *digital*).
EDIT 1:
Example mapping (YAML; other properties are mapped in the same way, except boost, which differs per property):
elasticSearchMapping:
  type: object
  include_in_all: true
  enabled: true
  properties:
    'keywords':
      type: string
      include_in_all: true
      boost: 50
Query:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [{
"match_all": []
}, {
"query_string": {
"query": "*lettering*"
}
}]
}
},
"filter": {
"bool": {
"must": [{
"term": {
"__parentPath": "/sites/industrycatalog"
}
}, {
"terms": {
"__workspace": ["live"]
}
}, {
"term": {
"__dimensionCombinationHash": "d751713988987e9331980363e24189ce"
}
}, {
"term": {
"__typeAndSupertypes": "IndustryCatalog:Entry"
}
}],
"should": [],
"must_not": [{
"term": {
"_hidden": true
}
}, {
"range": {
"_hiddenBeforeDateTime": {
"gt": "now"
}
}
}, {
"range": {
"_hiddenAfterDateTime": {
"lt": "now"
}
}
}]
}
}
}
},
"fields": ["__path"],
"script_fields": {
"distance": {
"script": "doc['coordinates'].distanceInKm(51.75631079999999,14.332867899999997)"
}
},
"sort": [{
"customer.featureFlags.industrycatalog": {
"order": "asc"
}
}, {
"_geo_distance": {
"coordinates": {
"lat": "51.75631079999999",
"lon": "14.332867899999997"
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}],
"size": 999999
}
What you are doing is a wildcard query. These fall under term-level queries, and by default a constant score is applied.
Check the Lucene documentation: WildcardQuery extends MultiTermQuery.
You can also verify this with the explain API; you will see something like this:
"_explanation": {
"value": 1,
"description": "ConstantScore(company:lettering), product of:",
"details": [{
"value": 1,
"description": "boost"
}, {
"value": 1,
"description": "queryNorm"
}]
}
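That explanation can be obtained by adding "explain": true to the search request, for example (the index name here is made up for illustration):
GET industrycatalog/_search
{
  "explain": true,
  "query": {
    "query_string": { "query": "*lettering*" }
  }
}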
You can change this behavior with rewriting.
Try this; rewrite also works with the query_string query:
{
"query": {
"wildcard": {
"company": {
"value": "digital*",
"rewrite": "scoring_boolean"
}
}
}
}
It has various options for scoring; see what fits your requirement.
EDIT 1: the reason you see a score other than 1 for *lettering* *digital* is queryNorm. You can again check this with the explain API; if you look closely, all documents matching both terms will have the same score, and all documents with a single match will likewise share the same score.
P.S.: a leading wildcard is not recommended at all. You will get performance issues, since it has to check against every single term in the inverted index. You might want to look at the edge_ngram or ngram token filters instead; a rough sketch follows below.
Hope this helps!
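As an illustration of that edge_ngram alternative, here is a sketch in current Elasticsearch syntax; the index name, analyzer/filter names, and the company field mapping are assumptions, not taken from the question.
PUT companies
{
  "settings": {
    "analysis": {
      "filter": {
        "prefix_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "prefix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "prefix_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "company": {
        "type": "text",
        "analyzer": "prefix_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
With this in place, a plain match query on company matches partial terms without wildcards and produces normal relevance scores. Note that edge_ngram only covers prefixes of each token; for arbitrary substrings the ngram filter would be needed instead.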

query not applying custom score

I'm making the following query; my problem is that the custom score (script_score) is not being applied. Am I doing something wrong?
{
"query": {
"bool": {
"must": [
{
"terms": {
"tactics": [
"user_id"
"type_user",
"browser_plugins",
"cashback"
]
}
}
]
},
"script_score": {
"script": "type_user === 2 ? 1 : 2"
}
},
"from": "0",
"size": 50,
"sort": {
"name": {
"order": "desc",
"ignore_unmapped": true
}
}
}
The script_score section in your query gets ignored. If you want it to be taken into account, you need to wrap your existing bool query in a function_score query, where the script_score part can be used as well.
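A minimal sketch of that rewrite, assuming type_user is a numeric field on the documents (the doc['type_user'].value access is an assumption; the original script referenced the variable directly):
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "tactics": ["user_id", "type_user", "browser_plugins", "cashback"]
              }
            }
          ]
        }
      },
      "script_score": {
        "script": "doc['type_user'].value == 2 ? 1 : 2"
      }
    }
  },
  "size": 50
}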
