Kusto query: selecting 5-minute intervals and calculating the average - performance

I'm fairly new to the Kusto Query Language, so perhaps this is something very common, but I really can't find my answer. So here goes.
I've enabled performance gathering with Azure Log Analytics on some of our servers and would like to achieve the following:
From the Perf dataset, select all the CPU data from the previous day and display the average CPU utilization per 5 minutes. I've figured out the first part, which was really easy to do. However, I can't figure out how to do the per-5-minute grouping in Kusto. I'm guessing something with summarize? Can anyone share some insights?
Perf
| where Computer == "servername.domain.internal"
| where TimeGenerated > ago(1d)
| where CounterName == "% Processor Time"
| where ObjectName == "Processor Information"

Try adding | summarize avg(CounterValue) by bin(TimeGenerated, 5m) to your query.
For charting, you can also append | render timechart to the query.
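Put together with the query from the question, the whole thing would look something like this:
Perf
| where Computer == "servername.domain.internal"
| where TimeGenerated > ago(1d)
| where CounterName == "% Processor Time"
| where ObjectName == "Processor Information"
| summarize avg(CounterValue) by bin(TimeGenerated, 5m)
| render timechart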

Related

Creating advanced SUMIF() calculations in Quicksight

I have a couple of joined Athena tables in Quicksight. The data looks something like this:
Ans_Count | ID | Alias
10 | 1 | A
10 | 1 | B
10 | 1 | C
20 | 2 | D
20 | 2 | E
20 | 2 | F
I want to create a calculated field such that it sums the Ans_Count column based on distinct IDs only. i.e., in the example above the result should be 30.
How do I do that?? Thanks!
Are you looking for the sum before or after applying a filter?
Sumif(Ans_Count,ID) may be what you're looking for.
If you need to always return the result of the sum, regardless of the filter on the visual, look at the sumOver() function.
You can use distinctCountOver at PRE_AGG level to count unique number of values for a given partition. You could use that count to drive the sumIf condition as well.
Example : distinctCountOver(operand, [partition fields], PRE_AGG)
More details about what the visual's group-by specification will be, and an example where there are duplicate IDs, will help give a specific solution.
It might even be as simple as minOver(Ans_Count, [ID], PRE_AGG) and using SUM aggregation on top of it in the visual.
If you want another column with the values repeated, use sumOver(Ans_Count, [ID], PRE_AGG). Or, if you want to aggregate via QuickSight, you would use sumOver(sum(Ans_Count), [ID]).
I agree with the above suggestions to use sumOver(sum(Ans_Count), [ID]).
I have yet to understand the use cases for pre_agg, so if anyone has concrete examples please share them!
Another suggestion would be to do a sumOver + partition by in your table (if possible) before uploading the dataset (see the example below), then check whether the results match QuickSight's aggregations. I find QuickSight can be tricky with calculated fields, aggregations, and nested ifs, so I've been doing calculations in SQL where possible before bringing the data into QuickSight, to have a better grasp of what the outputs should look like. This obviously is an extra step, but it can help in understanding how QuickSight pulls off calcs and brings up figures (as the documentation doesn't always give much), and in spotting things that don't look right (I've had a few) before you share your analysis with a wider group.
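For example (assuming a source table named my_table with the columns from the question), the pre-computed column could be something like SUM(Ans_Count) OVER (PARTITION BY ID) in standard SQL, which mirrors QuickSight's sumOver(Ans_Count, [ID], PRE_AGG) and makes it easy to compare the two outputs row by row.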

Kibana gauge with dynamic maximum value

I have data coming from Logstash that shows how much space is used on a table in a database and the maximum allocated capacity for that table. I want to create gauges in Kibana for every table that show how much space is currently occupied.
The problem is that the maximum available space sometimes changes, so the limit for a gauge has to be set as a variable, and I can't figure out how to do this. I also don't know how to show only data from the current day on a dashboard for a time range. The data coming from Logstash looks like this:
time | table_name | used_gb | max_gb
---------+------------+---------+--------
25.04.18 | table_1 | 1.2 | 10.4
25.04.18 | table_2 | 4.6 | 5.0
26.04.18 | table_1 | 1.4 | 14.6
26.04.18 | table_2 | 4.9 | 5.0
I want my gauge for every table to look something like this:
This problem can be solved using Time Series Visual Builder.
Choose Gauge, then under Panel options you can specify 1 as your max value. Then in your gauge data settings you can compute the dynamic ratio per table (sketched below). Here's a screenshot of a similar setup:
In older versions of Kibana, use the Calculation aggregation instead of Bucket Script.
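For the dynamic ratio itself, a minimal sketch (assuming the fields are named used_gb and max_gb as in the question): add two aggregations to the gauge's data series, for example Max of used_gb and Max of max_gb, then add a Bucket Script aggregation whose variables (say, used and max) point at those two aggregations and whose script is simply params.used / params.max. With the panel max set to 1, the gauge then shows the fraction of the allocated space that is in use.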
Reference:
https://discuss.elastic.co/t/gauge-with-dynamic-maximum-value/130634/2

Calculate Session Duration based on LogFiles in Kibana

I set up an Elastic Stack and imported millions of log entries. Each log entry contains a timestamp and a session ID. Each session produces multiple log entries, so I have the following information available:
SessionID | Timestamp
1234 | stamp1
1234 | stamp2
2223 | stamp3
1234 | stamp4
5566 | stamp5
5566 | stamp6
2223 | stamp7
Now I would like to calculate the average/minimum/maximum session duration.
Does anyone know how to achieve this?
Thanks in advance
Doing exactly what you want isn't going to be simple; I'm not even convinced it's possible with your data in its current form.
I'm also not sure what having the average, minimum and maximum session lengths actually gives you in terms of actionable information - why do you need the max/min/avg session times?
Something that could be easily visualised using your data would be session count against a date histogram. From Kibana, create a line graph visualisation. On the y-axis do a unique count of the session ID; on the x-axis select date histogram and use your timestamp field...
I would have thought that knowing the session count over a period of time would give you a better idea for capacity planning than knowing max/min session times - perhaps you have already done this? This assumes each session is logging regularly... If you zoom in too far (i.e. between log events) the graph will look choppy, but it should smooth out as you zoom out, and there are options available for smoothing.

GSA - Determining which queries need higher Click Rank from ASR

I have been analyzing click data from our Google Search Appliance (GSA) Advanced Search Reports (ASR), and I have run into a bit of an issue. I am trying to generate a .csv report that is ordered by a "priority" that determines which queries would benefit from a manual boost in Click Rank. An example entry in the report looks like this:
| Query | Avg Start Page | Avg Click Rank | Total Clicks | Unique Users | Attention Indicator |
---------------------------------------------------------------------------------------------------
| transfers | 0 | 5.5 | 9 | 4 | 88.72 |
My current indicator follows this formula:
Priority = ((Unique Users^2)*Avg Click Rank)+(Unique Users/Avg Click Rank)
In my formula, I am trying to lower the priority of cases where one user has many clicks (e.g. a user clicks every link on a page, skewing the results with higher clicks and click rank), and also to lower the priority of cases where only 1-2 users are searching for a query.
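For the sample row above, and assuming the Attention Indicator column is this Priority value, the calculation works out as Priority = (4^2 * 5.5) + (4 / 5.5) = 88 + 0.727 ≈ 88.73, which matches the 88.72 shown up to rounding.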
Is there a better way to analyze GSA click data based on a similar Priority metric?
There is no manual boost in click rank (other than faking the clicks). You do have source biasing and also metadata biasing which could feed into that.
Click data should be used to judge the general performance of the system. We generally aren't circling back to circumvent the self-learning scorer.

SSRS report based on a query from SSAS runs much slower than browsing the cube in SSAS

I have a report in SQL Server Reporting Services (SSRS) that pulls data from a SQL Server Analysis Services (SSAS) cube. The cube has two important dimensions, Time and Activity, that are related (it's a report on activity over time). The Activity dimension has a single unique key, and attributes to indicate who performed the activity. Measures are simple counts and percentages of types of activity and their results.
The report looks something like this:
Report for user: xyx
Report Period: 1/1/2011 - 3/1/2011
Type of Activity | Submitted | Completed | Success Rate
Type 1 | 50 | 20 | 40%
--------------------------------------------------------
Type 2 | 50 | 20 | 40%
--------------------------------------------------------
Type 3 | 50 | 20 | 40%
--------------------------------------------------------
Type 4 | 50 | 20 | 40%
--------------------------------------------------------
Type 5 | 50 | 20 | 40%
--------------------------------------------------------
Total | 250 | 100 | 40%
If I browse the cube in SQL Server Management Studio, I get the results in a fraction of a second. In SSRS it takes upwards of 7 minutes to generate. The Execution Log for SSRS shows the time pretty evenly split between retrieval/processing/rendering at:
> TimeDataRetrieval | TimeProcessing | TimeRendering
> 170866 | 142324 | 154689
I suspect it has to do with how the report is filtered, but I don't know how to investigate that.
What should I look at next to figure out why SSRS takes so long when browsing in SSAS is really fast (and the actual reports aren't much bigger than my sample, 3 more rows and a few more columns)?
Have you compared the queries being generated by SSMS and SSRS to see if they are the same? That would be my next step. SSRS has been known to generate terribly inefficient queries on occasion, when a dataset is built via the drag-and-drop designer.
