I have a matrix and I need to calculate the average of rows in the matrix for the past 12 months.
The average for 'Actual Exp' will be different from the 'Actual Min' values, and RAG will be calculated based on the average value of 'Actual Exp'.
This is how it should look with the calculated averages.
I don't know how to get the average for 'Actual Exp' and 'Actual Min' in a matrix.
Thanks, guys
To calculate the separate averages for 'Actual Exp' and 'Actual Min', you can create a table with measures.
The formula for the measure is:
Avg_Actual Exp = AVERAGEX({Table Name}, {Column Name})
This formula considers only the valid numeric values and omits text or null values.
For rows that contain only null or text values, the formula is:
Avg_Actual Exp = IF(ISNUMBER(AVERAGEX({Table Name}, {Column Name})), AVERAGEX({Table Name}, {Column Name}), "N/A")
This will give you "N/A" if the whole row has only null or text values.
You can replicate the same thing for the Actual Min.
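If it helps to see the intent outside DAX, here is a small Python sketch (purely illustrative, not Power BI code, with made-up values) of the same logic: average only the numeric entries and fall back to "N/A" when a row has none.

# Illustration of the measure's logic: average only numeric values,
# return "N/A" when the row contains nothing numeric.
def avg_or_na(values):
    nums = [v for v in values if isinstance(v, (int, float))]
    return sum(nums) / len(nums) if nums else "N/A"

print(avg_or_na([10, None, "text", 20]))  # 15.0
print(avg_or_na([None, "text"]))          # 'N/A'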
How can I define a measure to calculate the same values for each id once for summation in Power BI?
For example, in this picture the sum of num ( sum(num) ) should be 5.
I used the code below, but it returns a summation of all the numbers, and its result for this example is 11 instead of 5!
Measure = CALCULATE(SUM('table1'[num]), ALLEXCEPT('table1', 'table1'[order_id]))
First, calculate a virtual table that contains only the rows that match your condition; then you can easily count its rows with COUNTROWS and display the result in a card visual.
CountOF =
VAR __temp = CALCULATETABLE(VALUES(Sheet2[user_id]), Sheet2[num] > 0)
RETURN
    CALCULATE(COUNTROWS(__temp))
I have a measure that calculates Moving Average for 3 months:
Moving_Avg_3_Months =
AVERAGEX(
    DATESINPERIOD('Calendar FY'[Date], LASTDATE('Calendar FY'[Date]), -3, MONTH),
    [CUS Revenue Credible All])
Is it possible to create a measure that would calculate the moving average for my [CUS Revenue Credible All], but for N months, where N = 3, N = 6, or whatever number I'd like?
If you create a new table with the different moving-average windows you want to use, e.g. TableMovingAverage: [-3, -6, -12, -24, ..., N],
and modify your DAX formula like this:
Moving_Avg_3_Months =
AVERAGEX(
    DATESINPERIOD(
        'Calendar FY'[Date],
        LASTDATE('Calendar FY'[Date]),
        SELECTEDVALUE('TableMovingAverage'[Value], -3),  -- [Value] = whichever column in TableMovingAverage holds the window sizes
        MONTH),
    [CUS Revenue Credible All])
SELECTEDVALUE returns the value if exactly one value is selected in the specified column; otherwise it returns the default, -3 in this case.
If you filter TableMovingAverage (for example with a slicer), you can switch between the different moving averages.
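As a plain-Python illustration of the idea (not DAX; monthly_revenue and its values are made up for the example), a parameterised window size behaves like this:

# Hypothetical monthly totals, oldest first; n is the window the user picks.
def moving_average(monthly_revenue, n=3):
    window = monthly_revenue[-n:]        # last n months
    return sum(window) / len(window)

revenue = [10, 12, 11, 14, 13, 15]
print(moving_average(revenue, n=3))      # 14.0
print(moving_average(revenue, n=6))      # 12.5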
I want to simulate BigQuery's QUANTILES function in Hive.
Data set: 1,2,3,4
BigQuery's query returns the value 2:
select nth(2, quantiles(col1, 3))
But in Hive:
select percentile(col1, 0.5)
I get 2.5.
Note: I get the same result for an odd number of records.
Is there an adequate Hive UDF for this?
I guess what you are looking for is the percentile_approx UDF.
This page gives you the list of all built-in UDFs in Hive.
percentile_approx(DOUBLE col, p [, B])
Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value.
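For comparison, here is a small Python sketch (not Hive or BigQuery) of the two median definitions on the data set 1, 2, 3, 4: an interpolated median gives 2.5, like Hive's percentile(col, 0.5), while a nearest-rank (lower) median gives 2, which matches the BigQuery result in the question.

import statistics

data = sorted([1, 2, 3, 4])

# Interpolated median, analogous to percentile(col1, 0.5)
print(statistics.median(data))       # 2.5

# Nearest-rank (lower) median, analogous to nth(2, quantiles(col1, 3))
print(data[(len(data) - 1) // 2])    # 2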
I am wondering how Oracle sums NUMBER(9,2) with SUM(numWithScale/7).
This is because I am wondering how the error will propagate with a large number of records.
Let's say I have a table EMP_SAL with columns EMP_ID and numWithScale, where numWithScale is a salary.
To keep it simple, let us make the numWithScale column NUMBER(9,2): 9 digits of precision, with 2 digits after the decimal point to round to. All of the numbers in the table are random values from 10.00 to 20.00 (e.g. 10.12, 20.00, 19.95).
I divide by 7 in my calculation to produce digits at the end that round up or down.
Now, I sum all of the employees salaries with SUM(numWithScale/7).
Will the sum round each time it adds a record, or does Oracle round after the calculation is complete? I.e. the error from rounding can be +/- 0.01, and with many additions followed by roundings the error adds up. Or does it round at the end, so I don't have to worry about the error adding up (unless I use the result in many more calculations)?
Also, will Oracle return the sum as the more precise NUMBER (38 digits of precision, floating point), or will it round to two decimal places, NUMBER(9,2), when returning the value?
Will MSSQL behave pretty much the same way (even though the syntax is different)?
Oracle performs operations in the order you specify.
So, if you write this query:
select SUM(numWithScale/7) from some_table -- (1)
each value is divided by 7 and rounded to the maximum available precision: NUMBER with 38 significant digits. After that, all the quotients are summed.
In case of this query:
select SUM(numWithScale)/7 from some_table -- (2)
all numWithScale values are summed first and only after that divided by 7. In this case there is no precision loss for each record; only the result of dividing the sum by 7 is rounded to 38 significant digits.
This problem is common in numerical algorithms. Each time you divide a value by 7 you produce a small calculation error because of the limited number of digits representing the number:
numWithScale/7 => quotient + delta.
When summing these values you get
sum(quotient) + sum(delta).
If numWithScale follows an ideal uniform distribution and some_table contains an infinite number of records, then sum(delta) tends to zero. But that happens only in theory. In practical cases sum(delta) grows and introduces a significant error. This is the case for query (1).
On the other hand, summing can't introduce a rounding error if implemented properly. So for query (2) the rounding error is introduced only in the last step, when the whole sum is divided by 7. Therefore the delta for this query is not affected by the number of records.
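A rough way to see the effect is to simulate both queries with Python's decimal module at 38 significant digits. This is an illustration of the idea with made-up salaries, not Oracle's actual arithmetic.

import random
from decimal import Decimal, getcontext

getcontext().prec = 38   # mimic NUMBER's 38 significant digits

# random salaries between 10.00 and 20.00, two decimal places
rows = [Decimal(random.randrange(1000, 2001)) / 100 for _ in range(100_000)]

per_row_division = sum(r / 7 for r in rows)   # like query (1): each quotient is rounded
single_division = sum(rows) / 7               # like query (2): one rounding at the end

print(per_row_division - single_division)     # tiny, but typically non-zero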
Number scale and precision are only relevant as column or variable constraints.
When you attempt to store a number that exceeds the defined precision, an exception is raised:
create table num (a number(5,2));
insert into num values (123456.789);
=> ORA-01438: value larger than specified precision allowed for this column
When you attempt to store a number that exceeds the defined scale, it will be rounded:
insert into num values (123.456789);
select a from num;
=> 123.46
Precision and scale do not matter when you read data and perform any calculations on it...
select 100000 + a / 100 from num;
=> 100001.2346
...unless you want to store the result back into a column with constraints, in which case the above rules apply:
update num set a = a / 100;
select a from num;
=> 1.23
numWithScale/7 will be converted to NUMBER (i.e. it will not be rounded to number(9,2)).
For my work, I need some kind of algorithm with the following input and output:
Input: a set of dates (from the past). Output: a set of weights - one weight per given date (the sum of all weights = 1).
The basic idea is that the closest date to today's date should receive the highest weight, the second closest date will get the second highest weight, and so on...
Any ideas?
Thanks in advance!
First, assign to each date in your input set the amount of time between that date and today.
For example: the following date set {today, tomorrow, yesterday, a week from today} becomes {0, 1, 1, 7}. Formally: val[i] = abs(today - date[i]).
Second, invert the values in such a way that their relative weights are reversed. The simplest way of doing so would be: val[i] = 1/val[i].
Other suggestions:
val[i] = 1/val[i]^2
val[i] = 1/sqrt(val[i])
val[i] = 1/log(val[i])
The hardest and most important part is deciding how to invert the values. Think about what the nature of the weights should be. Do you want noticeable differences between two far-away dates, or should two far-away dates have roughly equal weights? Do you want a date that is very close to today to have a vastly bigger weight or only a moderately bigger weight?
Note that you should come up with an inverting procedure that cannot divide by zero. In the example above, dividing by val[i] results in a division by zero (today's distance is 0). One method to avoid division by zero is called smoothing. The most trivial way to "smooth" your data is add-one smoothing, where you just add one to each value (so today becomes 1, tomorrow becomes 2, next week becomes 8, etc.).
Now the easiest part is to normalize the values so that they'll sum up to one.
sum = val[1] + val[2] + ... + val[n]
weight[i] = val[i]/sum for each i
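A minimal Python sketch of these steps, using 1/(x+1) (add-one smoothing followed by inversion) as one possible choice of inverting procedure; the dates are made up for the example:

from datetime import date

def date_weights(dates, today=None):
    today = today or date.today()
    # distance in days, add-one smoothing to avoid dividing by zero, then invert
    vals = [1.0 / (abs((today - d).days) + 1) for d in dates]
    total = sum(vals)
    return {d: v / total for d, v in zip(dates, vals)}   # normalize so weights sum to 1

weights = date_weights([date(2024, 1, 1), date(2024, 3, 1), date(2024, 3, 10)],
                       today=date(2024, 3, 11))
print(weights)   # the date closest to 'today' gets the largest weight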
Sort the dates and remove duplicates.
Assign values (maybe starting from the farthest date in steps of 10 or whatever you need - these values can be arbitrary, they just reflect order and distance).
Normalize the weights to add up to 1.
Executable pseudocode (tweakable):
#!/usr/bin/env python
import random, pprint
from operator import itemgetter

# for simplicity's sake dates are integers here ...
pivot_date = 1000
past_dates = set(random.sample(range(1, pivot_date), 5))
weights, stepping = [], 10

for date in sorted(past_dates):
    weights.append((date, stepping))
    stepping += 10

sum_of_steppings = sum(itemgetter(1)(x) for x in weights)
normalized = [(d, w / float(sum_of_steppings)) for d, w in weights]
pprint.pprint(normalized)
# Example output
# The 'date' closest to 1000 (here: 889) has the highest weight,
# 703 the second highest, and so forth ...
# [(151, 0.06666666666666667),
# (425, 0.13333333333333333),
# (571, 0.2),
# (703, 0.26666666666666666),
# (889, 0.3333333333333333)]
How to weight: just compute the difference between each date and the current date:
x(i) = abs(date(i) - current_date)
You can then use different expressions to assign weights:
w(i) = 1/x(i)
w(i) = exp(-x(i))
w(i) = exp(-x(i)^2)
use a Gaussian distribution - more complicated, not recommended
Then use normalized weights: w(i)/sum(w(i)) so that the sum is 1.
(Note that exponential functions are commonly used by statisticians in survival analysis.)
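A quick Python sketch of those three expressions with normalization; the distances in x are made up, and 1 is added in the first expression so that x = 0 does not divide by zero, as discussed above:

import math

x = [0, 1, 7, 30]   # days between each date and today (example values)

expressions = [
    lambda v: 1.0 / (v + 1),        # 1/x(i), shifted by 1 so x = 0 is safe
    lambda v: math.exp(-v),         # exp(-x(i))
    lambda v: math.exp(-v ** 2),    # exp(-x(i)^2)
]

for f in expressions:
    w = [f(v) for v in x]
    total = sum(w)
    print([round(v / total, 4) for v in w])   # normalized weights, sum to 1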
The first thing that comes to my mind is to use a geometric series:
http://en.wikipedia.org/wiki/Geometric_series
(1/2) + (1/4) + (1/8) + (1/16) + (1/32) + (1/64) + (1/128) + (1/256) + ... sums to one.
Yesterday would be 1/2
2 days ago would be 1/4
and so on
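As a small Python sketch of this idea: with a finite number of dates the series no longer sums exactly to one, so the weights still need to be renormalized at the end.

# Geometric weights: most recent date gets 1/2, next 1/4, and so on,
# then renormalized so a finite list sums exactly to 1.
def geometric_weights(n):
    raw = [0.5 ** (i + 1) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

print(geometric_weights(4))   # [0.5333..., 0.2666..., 0.1333..., 0.0666...]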
Let i be the index of the i-th date.
Assign weights equal to Ni / D, where:
D0 is the first (earliest) date.
Ni is the difference in days between the i-th date and the first date D0.
D is the normalization factor (the sum of all Ni), so that the weights add up to 1.
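A short Python sketch of this scheme with made-up dates (note that with this definition the earliest date itself ends up with weight 0):

from datetime import date

dates = sorted([date(2024, 1, 1), date(2024, 2, 15), date(2024, 3, 10)])

d0 = dates[0]                             # D0, the earliest date
n = [(d - d0).days for d in dates]        # Ni, days between each date and D0
total = sum(n)                            # D, the normalization factor
weights = [v / total for v in n]

print(weights)   # later (closer to today) dates get larger weights; they sum to 1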
Convert the dates to yyyymmddhhmiss format (24-hour clock), add up all of these values to get a total, divide each value by that total, and sort by this value.
declare @Data table
(
    Date bigint,
    Weight float
)

declare @sumTotal decimal(18,2)

insert into @Data (Date)
select top 100
    replace(replace(replace(convert(varchar, Datetime, 20), '-', ''), ':', ''), ' ', '')
from Dates

select @sumTotal = sum(Date)
from @Data

update @Data set
    Weight = Date / @sumTotal

select * from @Data order by 2 desc