Percentage of documents falling under the time filter to the total number of documents in a Kibana index pattern - elasticsearch

I have a time-based index pattern in Kibana. I want to show the following metric on a Kibana dashboard:
The percentage of the number of documents falling under the global time
filter to the total number of documents present in the index pattern.
For example:
The percentage of the number of documents in the last 2 days to the total number of documents present in the index pattern. Here the user has applied the time filter to the last 2 days.
Can it, at all, be visualized using Kibana? How?
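Whether this can be built purely inside Kibana depends on the version, but the two numbers behind the ratio are easy to get from Elasticsearch itself. Below is a minimal sketch (not a Kibana visualization) that computes the percentage with two `_count` requests; the index pattern `my-index-*`, the time field `@timestamp`, and the 2-day window are assumptions for illustration.

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth
INDEX = "my-index-*"           # assumption: the Kibana index pattern
TIME_FIELD = "@timestamp"      # assumption: the pattern's time field

def count(body=None):
    """Return the document count for INDEX, optionally restricted by a query."""
    resp = requests.post(f"{ES}/{INDEX}/_count", json=body)
    resp.raise_for_status()
    return resp.json()["count"]

# Total number of documents in the index pattern (no time filter).
total = count()

# Documents falling inside the global time filter (here: the last 2 days).
filtered = count({"query": {"range": {TIME_FIELD: {"gte": "now-2d", "lte": "now"}}}})

percentage = 100.0 * filtered / total if total else 0.0
print(f"{filtered} of {total} documents ({percentage:.2f}%) are within the last 2 days")
```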

Related

How to calculate the size of an elasticsearch node?

How can I calculate the required size of the Elasticsearch node for my Shopware 6 instance when I know some KPIs?
For example:

KPI            Value
Customer       5,000
Products       10,000
SalesChannel   2
Languages      1
Categories     20
Is there a (rough) formula to calculate the number of documents or the required size of a node?
https://www.elastic.co/blog/benchmarking-and-sizing-your-elasticsearch-cluster-for-logs-and-metrics is relevant; however, what you have provided there is only the logical sizing of the data. You will need to figure out what this all means once you start putting documents into Elasticsearch.
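As a sketch of that "put documents in and measure" approach (not a formula from the blog post), one can index a representative sample into a throwaway index and extrapolate linearly from the `_stats` store size. The index name, the sample, and the one-document-per-KPI assumption below are all illustrative.

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth
INDEX = "sizing-test"          # hypothetical throwaway index used for the experiment

# Assumes a representative sample of real documents (e.g. a few thousand
# products) has already been bulk-indexed into INDEX.
requests.post(f"{ES}/{INDEX}/_refresh")

sample_docs = requests.get(f"{ES}/{INDEX}/_count").json()["count"]
stats = requests.get(f"{ES}/{INDEX}/_stats/store").json()
sample_bytes = stats["indices"][INDEX]["primaries"]["store"]["size_in_bytes"]

# Assumption for illustration: roughly one Elasticsearch document per KPI entity.
expected_docs = 5_000 + 10_000 + 2 + 1 + 20

estimated_bytes = sample_bytes / sample_docs * expected_docs
print(f"~{estimated_bytes / 1024 / 1024:.1f} MiB of primary storage for "
      f"{expected_docs} documents (before replicas and overhead)")
```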

How to accommodate minutely and hourly data in the same visualisation?

Current scenario -
The current dashboard is set to a Sum aggregation at the minute level, and it currently works only when the interval is set to minutely. If I change the interval, the graph shows incorrect values. This happens because more than one document is generated per minute, and the correct value per minute is the sum of the field values at the minute level.
So even today we are obliged to use a minute interval, but I'm fine with this.
Now the hourly documents are designed to be ingested after doing all the math (and we have validated the ingestion logic), so there is one document per hour. This is why the visualisation cannot accommodate both types of data.
If I had a scenario like one document per minute and one document per hour, I could have used an Average or perhaps a Max metric, but at present the problem is that I have to sum the document values for each minute (mandatory); therefore, whatever interval logic applies to the minutely data also gets applied to the hourly data.
Is there a way to show both types of data in the same graph?
Mathematically, the approach is wrong.
Having n documents per minute (where n depends on the number of hosts in that cluster) and then one document per hour per type is illogical from a visualisation perspective, because the value actually needed is the sum of all n documents generated per minute, and so the Sum metric applied at the minute level also gets applied to the hourly data. To accommodate both types of data in the same graph there needs to be uniformity: aggregate the data at the minute level on the other end and then send the aggregated data to Elasticsearch.
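A minimal sketch of that pre-aggregation step, assuming a Python shipper and made-up field names (this is not the poster's actual pipeline): sum the per-minute values on the producer side and send one document per minute through the bulk API, mirroring what the hourly feed already does.

```python
import json
from collections import defaultdict

import requests

ES = "http://localhost:9200"   # assumption: local cluster
INDEX = "metrics-per-minute"   # hypothetical index for pre-aggregated documents

# raw_records stands in for whatever the hosts currently emit several times a minute.
raw_records = [
    {"timestamp": "2024-01-01T10:00:05Z", "value": 3},
    {"timestamp": "2024-01-01T10:00:40Z", "value": 7},
    {"timestamp": "2024-01-01T10:01:12Z", "value": 5},
]

# Sum the values per minute so that, like the hourly feed, exactly one
# document per bucket reaches Elasticsearch.
per_minute = defaultdict(float)
for rec in raw_records:
    minute = rec["timestamp"][:16] + ":00Z"   # truncate the timestamp to the minute
    per_minute[minute] += rec["value"]

# Ship one pre-aggregated document per minute via the bulk API.
lines = []
for minute, total in per_minute.items():
    lines.append(json.dumps({"index": {"_index": INDEX}}))
    lines.append(json.dumps({"@timestamp": minute, "value": total}))

requests.post(f"{ES}/_bulk",
              data="\n".join(lines) + "\n",
              headers={"Content-Type": "application/x-ndjson"})
```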

Elasticsearch Document Count Doesn't Reflect Indexing Rate

I've been indexing data from Spark into Elasticsearch, and according to Kibana, I'm indexing at a rate of 6k/s for the primary shards. However, if you look at the Document Count graph in the lower right, you'll see that it doesn't increase proportionately. How can this index have only 1.3k documents when it's indexing at five times that per second?
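One way to investigate the mismatch outside the Monitoring charts is to compare the index's cumulative indexing operations with its current document count via the `_stats` API. This is a diagnostic sketch with `my-index` and the local host as placeholders:

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster
INDEX = "my-index"             # placeholder for the index in question

stats = requests.get(f"{ES}/{INDEX}/_stats/docs,indexing").json()
primaries = stats["indices"][INDEX]["primaries"]

docs_count = primaries["docs"]["count"]             # documents currently in the index
index_total = primaries["indexing"]["index_total"]  # cumulative indexing operations

print(f"documents in index:  {docs_count}")
print(f"indexing operations: {index_total}")
# If index_total keeps climbing while docs.count stays flat, the incoming
# operations are not all creating new documents (e.g. they overwrite existing _ids).
```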

Bigdesk charts explanation

I don't understand what Search time per second (Δ) means. Is it the delta of the number of milliseconds that the search requests took between the previous and the current refresh interval? Also, there is a Query and Fetch time below the chart; I'm not sure what those represent.
Attached is a screenshot:
A query in Elasticsearch is actually a two-phase process:
Query Phase :
During the initial query phase, the query is broadcast to a shard copy (a primary or replica shard) of every shard in the index. Each shard executes the search locally and builds a priority queue of matching documents.
And
Fetch Phase :
The query phase identifies which documents satisfy the search request, but we still need to retrieve the documents themselves. This is the job of the fetch phase.
And that mailing-list post explains the Search time per second (Δ) part in detail:
Here is an example for "Search requests per second (Δ)":
- You do some "_search" request
- It hits 15 shards of some indices on that node, so the value of indices -> search -> "query_total" in the nodes stats API response increases by 15
- Bigdesk refresh value is 5000 (5 sec)
As a result the chart should display a peak of 3 (15/5) in the Query line. So if the value is ~1500 in your case, it means that on average X shards are hit by search requests per second, where X = 1500 * refresh (does that make sense?).
You can see the chart is really only informative (it depends on the refresh interval and the number of shards). But the cumulative "query_total" value is displayed as well in the web UI.
Similarly, the second chart "Search time per second (Δ)" displays the average time (in millis) spent in the query or fetch phase on the node. Again, this value includes all involved shards on that node.
Search time per second (Δ) is based on two series, series1 and series2; they are explained here.
It looks like the chart shows these metrics per time unit.
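To make the delta arithmetic concrete, here is a small sketch (not Bigdesk's own code) that samples the nodes stats API twice, `REFRESH` seconds apart, and derives the same per-second figures the charts plot; a single local node and a 5-second refresh are assumptions.

```python
import time

import requests

ES = "http://localhost:9200"   # assumption: single local node
REFRESH = 5                    # Bigdesk-style refresh interval in seconds

def search_stats():
    """Return (query_total, query_time_in_millis) for this node's shards."""
    stats = requests.get(f"{ES}/_nodes/stats/indices/search").json()
    node = next(iter(stats["nodes"].values()))
    search = node["indices"]["search"]
    return search["query_total"], search["query_time_in_millis"]

total_1, millis_1 = search_stats()
time.sleep(REFRESH)
total_2, millis_2 = search_stats()

# "Search requests per second (Δ)": shard-level query operations per second.
print("queries/sec :", (total_2 - total_1) / REFRESH)
# "Search time per second (Δ)": milliseconds spent in the query phase per second.
print("query ms/sec:", (millis_2 - millis_1) / REFRESH)
```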

Logstash + ElasticSearch + Kibana combine results from different fields in different documents

We have Apache logs analyzed by Elasticsearch (2.1.0) and Kibana (4.3.0).
Logs are parsed and shipped to Elasticsearch by Logstash running on the web servers and reading the Apache combined log format.
All works well, but now we need to analyze a more complicated pattern.
We have documents with a field “purchase_id” that holds an integer value (like 130012, 130016, 133552, etc.).
We have OTHER documents with an integer field “view_id” that holds the same values (like 130012, 130016, 133552, etc.).
Both fields never appear in the same document, because they are extracted from different URIs in the Apache log.
Our goal is to calculate and visualize, for a given time frame, the percentage of appearances of values in “purchase_id” compared to values in “view_id”.
For example, let's say we want to see the current purchase rate of item 130012. In the last 30 seconds it may appear 1000 times in documents with the field “purchase_id”, and in the same 30 seconds it may appear 40000 times in documents with the field “view_id”.
This is obvious, because only a small share of people buy an item compared to the number of people exposed to the product. I need to calculate and visualize that, in the time frame, there were 1000 occurrences of purchase_id for item 130012 and 40000 occurrences of view_id for item 130012, then divide 1000 by 40000 and multiply by 100%, so I get 2.5% visualized on the dashboard (for item 130012).
Of course I have many such purchase_id = view_id = (some integer) pairs, so I need to calculate the percentage for all of them and display, let's say, the 20 with the highest percentage.
This will allow me to know the best-selling items relative to the advertising we invest in.
I would track this issue for Kibana.
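In the meantime, the ratio can be computed outside Kibana with two terms aggregations and a small script. This is a sketch only; the index pattern, the `@timestamp` field (the Logstash default), and the 30-second window are assumptions.

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster
INDEX = "logstash-*"           # assumption: the Logstash index pattern

def top_counts(field, window="now-30s"):
    """Return {item_id: doc_count} for the most frequent values of `field` in the window."""
    body = {
        "size": 0,
        "query": {"bool": {"filter": [
            {"exists": {"field": field}},
            {"range": {"@timestamp": {"gte": window}}},
        ]}},
        "aggs": {"ids": {"terms": {"field": field, "size": 1000}}},
    }
    resp = requests.post(f"{ES}/{INDEX}/_search", json=body).json()
    return {b["key"]: b["doc_count"] for b in resp["aggregations"]["ids"]["buckets"]}

purchases = top_counts("purchase_id")
views = top_counts("view_id")

# Purchase rate per item = purchases / views, for items seen in both feeds.
rates = {item: 100.0 * purchases[item] / views[item]
         for item in purchases if views.get(item)}

# The 20 items with the highest purchase rate.
for item, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f"item {item}: {rate:.1f}%")
```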
