How to get an hour equally weighted average in Amazon QuickSight? - amazon-quicksight

I want to aggregate the value as an hour equally weighted average (each day contributing equally within an hour), not as a general average.
Let's say I have the following dataset:
datetime            HH    value
2022-01-01T14:20    14    100
2022-01-01T14:30    14    80
2022-01-02T14:50    14    30
2022-01-01T15:00    15    60
2022-01-01T15:10    15    60
2022-01-01T15:15    15    60
2022-01-02T15:25    15    0
And I want to get this result:
HH    hour equally weighted average
14    60
15    30
Calculation example for HH = 14
General average: (100 + 80 + 30) / 3 = 70
But I want the hour equally weighted average: ((100 + 80) / 2 + 30 / 1) / 2 = 60
Calculation example for HH = 15
General average: (60 + 60 + 60 + 0) / 4 = 45
Hour equally weighted average: ((60 + 60 + 60) / 3 + 0 / 1) / 2 = 30
I tried to use avgOver and sumOver, but could not get this result.
Please help.
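For what it's worth, here is a minimal Python sketch, outside QuickSight, of the calculation being asked for: average the values per day within each hour first, then average those daily averages. In QuickSight itself this kind of two-level aggregation is typically approached with level-aware calculations over a day-level field, but the sketch below only illustrates the arithmetic.
# Hour equally weighted average: average per (hour, day) first,
# then average the daily averages within each hour.
from collections import defaultdict
from statistics import mean

rows = [
    ("2022-01-01T14:20", 14, 100),
    ("2022-01-01T14:30", 14, 80),
    ("2022-01-02T14:50", 14, 30),
    ("2022-01-01T15:00", 15, 60),
    ("2022-01-01T15:10", 15, 60),
    ("2022-01-01T15:15", 15, 60),
    ("2022-01-02T15:25", 15, 0),
]

per_day = defaultdict(list)                # (HH, date) -> values
for dt, hh, value in rows:
    per_day[(hh, dt[:10])].append(value)

daily_avg = defaultdict(list)              # HH -> list of daily averages
for (hh, day), values in per_day.items():
    daily_avg[hh].append(mean(values))

result = {hh: mean(avgs) for hh, avgs in daily_avg.items()}
print(result)   # hour 14 -> 60, hour 15 -> 30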

Related

Cumulative sum by category with DAX (Without Date dimension)

This is the input data (let's suppose that I have 14 different products). I need to calculate, with DAX, a cumulative total of products by Status:
ProductID    Days Since Last Purchase    Status
307255900    76                          60 - 180 days
525220000    59                          30 - 60 days
209500000    20                          < 30 days
312969600    151                         60 - 180 days
249300000    52                          30 - 60 days
210100000    52                          30 - 60 days
304851400    150                         60 - 180 days
304851600    150                         60 - 180 days
314152700    367                         > 180 days
405300000    90                          60 - 180 days
314692300    90                          60 - 180 days
314692400    53                          30 - 60 days
524270000    213                         > 180 days
524280000    213                         > 180 days
Desired output:
Status           Cumulative Count
< 30 days        1
> 180 days       4
30 - 60 days     8
60 - 180 days    14
That's trivial: just use the built-in quick measure "Running total" (see screenshot).
The resulting table will look like this:
However, when you think about it, from a data point of view a sort order like the following makes more sense than ordering Status alphabetically,
and finally you can take it straight away without any crude categorization.
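For reference, here is the arithmetic behind that running total, sketched in Python rather than DAX, using the bucket counts from the sample data and the day-range sort order suggested above:
# Cumulative count of products per status bucket, accumulated in a
# chosen sort order (here by day range rather than alphabetically).
from itertools import accumulate

counts = {                       # products per bucket in the sample data
    "< 30 days": 1,
    "30 - 60 days": 4,
    "60 - 180 days": 6,
    "> 180 days": 3,
}

order = ["< 30 days", "30 - 60 days", "60 - 180 days", "> 180 days"]
cumulative = dict(zip(order, accumulate(counts[s] for s in order)))
print(cumulative)
# {'< 30 days': 1, '30 - 60 days': 5, '60 - 180 days': 11, '> 180 days': 14}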

How can I modify a 2d matrix so that predefined sums in both dimensions are satisfied?

I'm working on optimizing production of a number of related widgets over a number of weeks. The total quantity of each widget, and the total quantity of widgets produced each week, is fixed. By default, a few of each widget are produced each week.
For example:
Week: 1 2 3 4 5 Total
Widget A: 10 10 20 10 10 60
Widget B: 20 20 40 20 20 120
Widget C: 15 10 5 15 15 60
Totals: 45 40 65 45 45 240
However, due to overhead and setup time in the factory, I'd like the ability to reduce the number of types of widgets produced each week. For example, I'd like the user to be able to delete a number of the weekly widget runs, like this:
Week: 1 2 3 4 5 Total
Widget A: 10 __ 20 10 10 60
Widget B: 20 20 40 __ 20 120
Widget C: 15 __ 5 __ 15 60
Totals: 45 40 65 45 45 240
Given the above input, how could I code a solution to modify the numbers produced per widget per week, so that the total quantity produced per week, and the total quantity produced per widget overall, still satisfies the original totals?
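The question doesn't prescribe a method, but one common way to rebalance a matrix to fixed row and column totals is iterative proportional fitting (also known as RAS or biproportional fitting). A rough Python sketch, assuming the row targets and column targets still add up to the same grand total, and accepting that the adjusted quantities may come out non-integer (round and redistribute the remainders, or switch to an integer program, if whole widgets are required):
import numpy as np

# Iterative proportional fitting (RAS): alternately rescale rows and
# columns so their sums hit the fixed targets. Cells zeroed out by the
# user (deleted runs) stay zero, because scaling never changes a zero.
def balance(matrix, row_totals, col_totals, iterations=500):
    m = matrix.astype(float)
    r = np.asarray(row_totals, dtype=float)
    c = np.asarray(col_totals, dtype=float)
    for _ in range(iterations):
        m *= (r / m.sum(axis=1))[:, None]   # match widget (row) totals
        m *= (c / m.sum(axis=0))[None, :]   # match week (column) totals
        if np.allclose(m.sum(axis=1), r) and np.allclose(m.sum(axis=0), c):
            break
    return m

# The example from the question, with the deleted runs set to zero.
start = np.array([
    [10,  0, 20, 10, 10],   # Widget A
    [20, 20, 40,  0, 20],   # Widget B
    [15,  0,  5,  0, 15],   # Widget C
])
print(balance(start, [60, 120, 60], [45, 40, 65, 45, 45]).round(1))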

Basic Velocity Algorithm?

Given the following dataset for a single article on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine if this article is trending upwards or downwards for the most recent 3 days?
Caveats: an article will not know the total number of pageviews across the site, only its own totals. Ideally the result would be a number between 0 and 1. Any pointers to what this class of algorithms is called?
Thanks!
update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 80 / (22,666 + 2/3)
~ 0.0035294
~ 0.35294%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3)/(22,666 + 2/3)
= 1
= 100%
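A short Python sketch of the velocity and relative-velocity calculation above, using the two example articles:
# Average pageview velocity over the 3-day window, then scale by the
# maximum velocity to get relative velocities between 0 and 1.
views = {
    "article 1": [100, 80, 60],
    "article 2": [20000, 25000, 23000],
}

velocity = {name: sum(v) / len(v) for name, v in views.items()}   # pageviews/day
max_velocity = max(velocity.values())
relative = {name: v / max_velocity for name, v in velocity.items()}

print(velocity)   # article 1: 80.0, article 2: ~22666.7
print(relative)   # article 1: ~0.0035, article 2: 1.0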
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation:
Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note the data for "2/2/2010" was not used. An alternate method is to calculate three PV_accelerations (using a date range that goes back only a single day) and average them. There is not enough data in your example to do this for three days, but here is how to do it for the last two days:
PV_acceleration("2/3/2010", "2/2/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/2/2010", "2/1/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/3/2010", "2/2/2010") = (-20 + -20) / 2
= -20 pageviews per day per day
This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days, but it will make a difference for article 2.
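Here is the averaged one-day acceleration as a small Python sketch, applied to both example articles:
# Pageview acceleration: change in daily velocity between consecutive
# days (pageviews per day per day), averaged over the window.
def pv_acceleration_average(daily_views):
    deltas = [b - a for a, b in zip(daily_views, daily_views[1:])]
    return sum(deltas) / len(deltas)

print(pv_acceleration_average([100, 80, 60]))          # article 1: -20.0
print(pv_acceleration_average([20000, 25000, 23000]))  # article 2: 1500.0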
Just a link to an article about the 'trending' algorithms that Reddit, StumbleUpon, and Hacker News use, among others:
http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed

Shortest path algorithm for an analog clock

I've got a C homework problem that is doing my head in, and I'd be grateful if anyone can point me in the right direction.
If I have two minute marks on an analog watch, such as t1 (55 minutes) and t2 (7 minutes), I need to calculate the shortest number of steps between the two points.
What I've come up with so far is these two equations:
-t1 + t2 + 60 =
-55 + 7 + 60
= 12
t1 - t2 + 60 =
55 - 7 + 60
= 108
12 is lower than 108, therefore 12 steps is the shortest distance.
This appears to work fine if I compare the two results and use the lowest. However, if I pick out another two points for example let t1 = 39 and t2 = 34 and plug them into the equation:
-t1 + t2 + 60 = -39 + 34 + 60 = 55
t1 - t2 + 60 = 39 - 34 + 60 = 65
55 is lower than 65, therefore 55 steps would be the shortest distance.
However, 55 isn't the correct answer. 5 steps is the shortest distance (39 - 34 = 5).
My brain is a little fried, and I know I am missing something simple. Can anyone help?
What you want is addition and subtraction modulo 60. Check out the % operator. Make sure you handle negatives correctly.
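For example, a sketch of that modulo-60 approach in Python (in C, keep in mind that % with a negative operand can itself be negative, so add 60 before taking the remainder, e.g. ((t2 - t1) % 60 + 60) % 60):
# Shortest number of steps between two minute marks on a 60-step dial:
# compute both directions modulo 60 and take the smaller one.
def shortest_steps(t1, t2):
    clockwise = (t2 - t1) % 60          # going forward, possibly past 0
    counter_clockwise = (t1 - t2) % 60  # going the other way
    return min(clockwise, counter_clockwise)

print(shortest_steps(55, 7))    # 12
print(shortest_steps(39, 34))   # 5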
If you don't want to use the % operator, try to think of it this way:
for each pair of points (t1, t2), there are two ways to connect them: one path crosses 0 (12 o'clock), and the other doesn't.
Provided that t2 >= t1, the second distance is easy to calculate: it's t2 - t1.
The other distance is t1 + 60 - t2.
I think your mistake was adding 60 in the first expression.

Clustering with a distance matrix

I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,
A B C D E F G H I J K L
A 0 20 20 20 40 60 60 60 100 120 120 120
B 20 0 20 20 60 80 80 80 120 140 140 140
C 20 20 0 20 60 80 80 80 120 140 140 140
D 20 20 20 0 60 80 80 80 120 140 140 140
E 40 60 60 60 0 20 20 20 60 80 80 80
F 60 80 80 80 20 0 20 20 40 60 60 60
G 60 80 80 80 20 20 0 20 60 80 80 80
H 60 80 80 80 20 20 20 0 60 80 80 80
I 100 120 120 120 60 40 60 60 0 20 20 20
J 120 140 140 140 80 60 80 80 20 0 20 20
K 120 140 140 140 80 60 80 80 20 20 0 20
L 120 140 140 140 80 60 80 80 20 20 20 0
Is there any method to extract clusters from M (if needed, the number of clusters can be fixed), such that each cluster contains nodes with small distances between them? In the example, the clusters would be (A, B, C, D), (E, F, G, H) and (I, J, K, L).
Thanks a lot :)
Hierarchical clustering works directly with the distance matrix instead of the actual observations. If you know the number of clusters, you already know your stopping criterion (stop when there are k clusters). The main trick here will be choosing an appropriate linkage method. Also, this paper (PDF) gives an excellent overview of all kinds of clustering methods.
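To make the hierarchical-clustering suggestion concrete, here is one possible sketch using SciPy in Python (the question doesn't name a language, so this is only an assumption): convert the distance matrix to condensed form, build the linkage, and cut the tree at three clusters; "average" linkage is just one choice of linkage method.
# Hierarchical clustering straight from the precomputed distance matrix.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

names = list("ABCDEFGHIJKL")
M = np.array([
    [  0,  20,  20,  20,  40,  60,  60,  60, 100, 120, 120, 120],
    [ 20,   0,  20,  20,  60,  80,  80,  80, 120, 140, 140, 140],
    [ 20,  20,   0,  20,  60,  80,  80,  80, 120, 140, 140, 140],
    [ 20,  20,  20,   0,  60,  80,  80,  80, 120, 140, 140, 140],
    [ 40,  60,  60,  60,   0,  20,  20,  20,  60,  80,  80,  80],
    [ 60,  80,  80,  80,  20,   0,  20,  20,  40,  60,  60,  60],
    [ 60,  80,  80,  80,  20,  20,   0,  20,  60,  80,  80,  80],
    [ 60,  80,  80,  80,  20,  20,  20,   0,  60,  80,  80,  80],
    [100, 120, 120, 120,  60,  40,  60,  60,   0,  20,  20,  20],
    [120, 140, 140, 140,  80,  60,  80,  80,  20,   0,  20,  20],
    [120, 140, 140, 140,  80,  60,  80,  80,  20,  20,   0,  20],
    [120, 140, 140, 140,  80,  60,  80,  80,  20,  20,  20,   0],
])

Z = linkage(squareform(M), method="average")    # linkage wants the condensed form
clusters = fcluster(Z, t=3, criterion="maxclust")
print(dict(zip(names, clusters)))   # A-D, E-H and I-L end up in three groups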
One more possible way is to use Partitioning Around Medoids, often called K-Medoids. If you look at the R cluster package, you will see the pam function, which takes a distance matrix as input data.
Well, it is possible to perform K-means clustering on a given similarity matrix. First you need to centre the matrix and then take its eigenvalues and eigenvectors. The final and most important step is multiplying the first two eigenvectors by the square root of the corresponding eigenvalues to get the coordinate vectors, and then moving on with K-means. The code below shows how to do it; you can change the similarity matrix. fpdist is the similarity matrix.
mds.tau <- function(H) {
  # Double-centre the matrix, as in classical MDS (Torgerson scaling).
  n <- nrow(H)
  P <- diag(n) - 1/n           # centring matrix
  return(-0.5 * P %*% H %*% P)
}

B <- mds.tau(fpdist)
eig <- eigen(B, symmetric = TRUE)

v <- eig$values[1:2]
v[v < 0] <- 0                  # convert negative eigenvalues to 0

# Coordinates: first two eigenvectors scaled by the square roots of the eigenvalues.
X <- eig$vectors[, 1:2] %*% diag(sqrt(v))

library(vegan)                 # not strictly required for kmeans/cmdscale below
km <- kmeans(X, centers = 5, iter.max = 1000, nstart = 10000)

# Embedding using classical MDS directly
cmd <- cmdscale(fpdist)
