Transition matrix in Tableau

I want to create a transition matrix in Tableau, using the following data as an example.
IDKEY Status Date
A 1 2013
B 1 2013
C 2 2013
D 1 2013
A 1 2014
B 1 2014
C 1 2014
D 2 2014
From this data I want to build a simple matrix. Rows are the current status and columns are the previous status. For example, [1,2] means that 33% (C) of the IDs with a current status of 1 had a previous status of 2. The only ID with a current status of 2 (D) had a previous status of 1.
     [,1] [,2]
[1,] 0.67 0.33
[2,] 1    0
Any ideas on this? I tried several calculated fields but they do not seem to work.
For additional information, see the attached screenshot.
Loans A and B remain at status 3 during 2014, so I want to give those loans a value like "33". Loan I goes from 1 to 2, so I want that loan to get the value "12", for example in a new column.
This example has more levels than the first one.

I just bumped into this and wrote a note on the solution for this exact problem:
http://vizdiff.blogspot.com/2016/10/creating-markov-chain-transition-matrix.html
The calculation in Tableau is actually very easy: we just need to compute the row totals along the new states. You can use either WINDOW_SUM() or a level of detail (LOD) expression. See the article above for details.
SUM({FIXED [Year13]: SUM([Number of Records])}) or WINDOW_SUM(SUM([Number of Records])) computed using Year14.
See if this answer, two years late, works for you.
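To check the numbers outside Tableau, here is a minimal Python sketch of the same idea: count the IDs for each (current, previous) pair and divide by the row total for the current status. The function name `transition_matrix` and the dictionary layout are illustrative choices, not something from the linked article.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (IDKEY, Status, Date).
rows = [
    ("A", 1, 2013), ("B", 1, 2013), ("C", 2, 2013), ("D", 1, 2013),
    ("A", 1, 2014), ("B", 1, 2014), ("C", 1, 2014), ("D", 2, 2014),
]

def transition_matrix(rows, previous_year, current_year):
    """Return {current_status: {previous_status: share}}; each row sums to 1."""
    status = defaultdict(dict)  # status[id][year] = status in that year
    for id_key, st, year in rows:
        status[id_key][year] = st

    counts = Counter()      # (current, previous) -> number of IDs
    row_totals = Counter()  # current -> number of IDs (the "row total")
    for by_year in status.values():
        if previous_year in by_year and current_year in by_year:
            cur, prev = by_year[current_year], by_year[previous_year]
            counts[(cur, prev)] += 1
            row_totals[cur] += 1

    states = sorted({st for _, st, _ in rows})
    return {
        cur: {prev: counts[(cur, prev)] / row_totals[cur] for prev in states}
        for cur in states
        if row_totals[cur]
    }

matrix = transition_matrix(rows, 2013, 2014)
for cur, shares in matrix.items():
    print(cur, [round(v, 2) for v in shares.values()])
# 1 [0.67, 0.33]
# 2 [1.0, 0.0]
```

The division by `row_totals[cur]` plays the role that the row-total calculation plays in the Tableau answer.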

Related

How to deal with reporting slowly changing dimensions

For a client I am creating a data warehouse in which we have some slowly changing dimensions (or facts, if that is even a thing?). For example, we want to report the annual recurring revenue (ARR) for subscriptions, and we want to include both the currently active and the expired subscriptions so that we can see the ARR over a timeline.
The data we retrieve looks like this:
subscription_id  account_id  ARR  start_date  end_date
1                1           10   01-01-2022  31-03-2022
2                2           20   01-01-2022  31-12-2022
3                1           5    01-04-2022  31-11-2022
So in this case the same account (account_id 1) renewed a subscription on 01-04-2022. In the 2022 report we want to see the ARR for all months in 2022. I've looked into slowly changing dimensions, but what I cannot see in that concept is how to report both the currently active license and the history in a dashboard. For example, if we want to visualize the ARR per month for all of 2022 in a dashboarding tool, we want to see both subscriptions for account_id 1 over the course of the year, not just the currently active one. This seems to be very tricky to do in most dashboarding tools.
To overcome this I've done the following. I created a calendar table with an interval of 1 month and I cross join it with the table above to generate a fact table. The end result would look like:
timestamp   account_id  ARR
01-01-2022  1           10
01-01-2022  2           20
01-02-2022  1           10
...         ...         ...
01-11-2022  1           10
01-11-2022  2           20
01-11-2022  2           20
This makes it really easy for the user of the reporting tool to filter on a specific month and show the ARR between the dates and over multiple subscriptions. It does however generate a lot of extra data, but at the moment the storage space is not an issue. And it makes it more of a transactional style table, but the ARR is not really a transaction (i.e. it is not really a sold product on a specific date).
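The calendar cross join described above can be sketched in plain Python as follows. This is only an illustration under two assumptions: a subscription counts for a month when it is active on the first day of that month, and the end date of subscription 3 is taken as 30-11-2022, because the 31-11-2022 in the source data is not a valid date. The helper names `month_starts` and `monthly_arr_facts` are made up for this example.

```python
from datetime import date

# Subscriptions from the question: (subscription_id, account_id, arr, start, end).
# Assumption: 31-11-2022 is not a real date, so 30-11-2022 is used for subscription 3.
subscriptions = [
    (1, 1, 10, date(2022, 1, 1), date(2022, 3, 31)),
    (2, 2, 20, date(2022, 1, 1), date(2022, 12, 31)),
    (3, 1, 5, date(2022, 4, 1), date(2022, 11, 30)),
]

def month_starts(year):
    """Calendar table with a one-month interval: the first day of each month."""
    return [date(year, month, 1) for month in range(1, 13)]

def monthly_arr_facts(subscriptions, calendar):
    """Cross join calendar x subscriptions, keeping only the rows where the
    month start falls inside the subscription's [start, end] range."""
    return [
        (month, account_id, arr)
        for month in calendar
        for _sub_id, account_id, arr, start, end in subscriptions
        if start <= month <= end
    ]

facts = monthly_arr_facts(subscriptions, month_starts(2022))

# Total ARR per month, as a dashboard filtering on `timestamp` would aggregate it.
arr_by_month = {}
for month, _account_id, arr in facts:
    arr_by_month[month] = arr_by_month.get(month, 0) + arr

print(arr_by_month[date(2022, 1, 1)])  # 30
print(arr_by_month[date(2022, 4, 1)])  # 25
```

Keeping the range condition separate from the calendar makes it easy to switch to a different rule, for example counting a subscription in every month it overlaps.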
My question is: Are there better ways of generating a fact table where the source data contains a date range?

AWS Quicksight aggregate data

I have a dataset like this:
Order
id  expected date
1   11-04-2022
2   10-04-2022
2   14-04-2022
Order Event
Id  Order Id  Order status  Date
1   1         created       01-04-2022
2   1         completed     12-04-2022
3   2         created       01-04-2022
4   2         in progress   07-04-2022
5   2         completed     10-04-2022
6   3         created       10-04-2022
I need to create a graph that shows, for all orders with a completed status, the difference between the expected date and the actual order date.
How can I achieve that?
First, you have to join both tables into one, because QuickSight can only work with multiple data files if they are merged. You can apply an inner join on the order ID.
Then you can calculate the difference between the expected date and the order date, and add an if-statement to filter out the orders that are not completed yet. You do this by adding a calculated field to your dataset with the following code:
ifelse(
{Order_status}="completed",
dateDiff({expected_date},{Date},"DD"),
0
)
You can also modify this field. Here I wrote "DD" to get the date difference in days; you can also select hours and other units. If the order is not completed, I used 0 as the default value. To find out more about the functions used in this calculated field, see these AWS docs pages:
If-Else Command
Date-Diff Command
Now that the calculated field is created, you can plot it together with the order ID.
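As a cross-check outside QuickSight, the join-then-difference logic can be sketched in Python. The sketch uses the two tables exactly as posted in the question (including the third Order row, which repeats id 2). It differs from the calculated field in one respect, which is a deliberate assumption here: orders without a completed event are dropped instead of receiving 0. The difference is computed as completion date minus expected date; `completion_delays` is a hypothetical helper name.

```python
from datetime import date

# Order table from the question: (id, expected date).
orders = [
    (1, date(2022, 4, 11)),
    (2, date(2022, 4, 10)),
    (2, date(2022, 4, 14)),  # id 2 is repeated in the question; kept as-is
]

# Order Event table from the question: (id, order id, status, date).
order_events = [
    (1, 1, "created", date(2022, 4, 1)),
    (2, 1, "completed", date(2022, 4, 12)),
    (3, 2, "created", date(2022, 4, 1)),
    (4, 2, "in progress", date(2022, 4, 7)),
    (5, 2, "completed", date(2022, 4, 10)),
    (6, 3, "created", date(2022, 4, 10)),
]

def completion_delays(orders, order_events):
    """Inner join on the order id, keep only completed events, and return
    (order_id, expected_date, days) with days = completion date - expected date."""
    return [
        (order_id, expected, (event_date - expected).days)
        for order_id, expected in orders
        for _event_id, event_order_id, status, event_date in order_events
        if event_order_id == order_id and status == "completed"
    ]

delays = completion_delays(orders, order_events)
for order_id, expected, days in delays:
    print(order_id, expected.isoformat(), days)
```

A positive number means the order was completed after its expected date, a negative number means it was completed early.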
BR mylosf

Google Sheets: Data Validation - Unique row values across multiple columns

Good day,
I have seen a solution here for preventing duplicate entries in a single column. A data validation rule with this custom formula works well for one column.
I would like to achieve the same effect over multiple columns ... i.e. unique row entries across multiple columns. Take for example below three columns A-C. Only when values {1,2,1} are entered for the second time will the input be rejected.
A B C
1 1 1
1 2 1
1 2 2
2 2 2
1 2 1 X Entry should be rejected.
Is there a quick way to do this using Data Validation - custom formulae?
Use this custom formula for data validation:
=INDEX(COUNTIF($A$1:$A&"×"&$B$1:$B&"×"&$C$1:$C, $A1&"×"&$B1&"×"&$C1)<2)
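The idea behind the formula is to join the three cells of each row into one key and reject an entry when that key would occur twice. A minimal Python sketch of the same check, with the hypothetical helper names `row_key` and `is_valid_entry`, looks like this:

```python
SEPARATOR = "×"  # same delimiter the formula puts between the cell values

def row_key(row):
    """Join the cells of one row into a single comparable key."""
    return SEPARATOR.join(str(value) for value in row)

def is_valid_entry(existing_rows, new_row):
    """Mirror of COUNTIF(...) < 2: once the new row is added, its key
    must occur fewer than two times in the whole range."""
    keys = [row_key(row) for row in existing_rows] + [row_key(new_row)]
    return keys.count(row_key(new_row)) < 2

# Rows A-C from the question.
existing = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 2, 2)]

print(is_valid_entry(existing, (1, 2, 1)))  # False, the row already exists
print(is_valid_entry(existing, (2, 1, 1)))  # True
```

The separator matters: without it, the rows (1, 12, 1) and (11, 2, 1) would both collapse to the key "1121" and be treated as duplicates.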

Month slicer and filter not working properly on rolling data Power BI

Hi all,
I have created the measure below so that it always shows figures for the last 3 months when the month slicer is used.
3R =
CALCULATE(COUNT('Order'[Order/ not ordered]),DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH))
However, when I add the salesman as a filter with the formula below and click the month slicer, it shows only the chosen month's figures, not 3 months.
3R John =
CALCULATE(COUNT('Order'[Order/ not ordered])
,DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH),FILTER('Order','Order'[Salesman]="John"))
I have shared a sample at the link below. It contains 2 tables and 1 matrix.
The matrix is named "working" and one of the tables is named "not working properly". When no value is selected in the slicer, all visuals show the same data. When I click the month slicer, the unnamed main table changes, which is correct, and the matrix is also correct, but the table I am trying to build does not work.
What I am trying to achieve is 3-month rolling data by customer and salesman. For example, when I click 1 on the month slicer, the table should show the figures for January 2020, December 2019 and November 2019.
https://drive.google.com/file/d/1LoqSiKhHMFn_OioI2RnXOzjcIL9dPRjS/view?usp=sharing
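Below is the solution that worked for me.
3R John =CALCULATE(COUNT('Order'[Order/ not ordered]),DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH),'Order'[Salesman]="John")
Just remove the FILTER('Order', ...) wrapper and pass the condition directly. FILTER('Order', ...) is evaluated in the current filter context, so it returns only the orders of the month selected in the slicer, and that table filter then limits the count to the selected month.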
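To make the intended result concrete, here is a small Python sketch of a 3-month rolling count with an optional salesman condition. The order rows are hypothetical sample data, not taken from the shared file, and `rolling_count` is an illustrative helper, not a Power BI function. It assumes the slicer selects one whole calendar month.

```python
from datetime import date

# Hypothetical order rows: (order_date, salesman). Not from the shared sample file.
orders = [
    (date(2019, 10, 5), "John"),
    (date(2019, 11, 12), "John"),
    (date(2019, 12, 3), "Mary"),
    (date(2019, 12, 20), "John"),
    (date(2020, 1, 8), "John"),
    (date(2020, 1, 15), "Mary"),
]

def month_index(d):
    """Map a date to a running month number so month windows are simple ranges."""
    return d.year * 12 + d.month

def rolling_count(orders, year, month, months=3, salesman=None):
    """Count orders in the `months` calendar months ending with (year, month),
    optionally restricted to one salesman."""
    last = year * 12 + month
    first = last - months + 1
    return sum(
        1
        for order_date, who in orders
        if first <= month_index(order_date) <= last
        and (salesman is None or who == salesman)
    )

# Selecting January 2020 covers November 2019, December 2019 and January 2020.
print(rolling_count(orders, 2020, 1))                   # 5
print(rolling_count(orders, 2020, 1, salesman="John"))  # 3
```

The salesman condition narrows which rows are counted but leaves the three-month window untouched, which is the behaviour the corrected measure is meant to have.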

Getting the latest of multiple values from DynamoDB

If I have a table like below:
ID A B C D(ate)
ASDF 1 2 1 12/12/2016
ASEF 1 2 3 12/13/2016
AFDS 2 3 1 12/13/2016
ASFA 2 3 4 12/14/2016
And I want to get the latest based on a pair (A, B) and the latest date (D), how would I set up a table to work well with this request, and what kind of query would I use to retrieve that?
The output would give me values for the IDs ASEF and ASFA.
I'm using the aws-go-sdk to run this, but I should be able to convert any solution to that SDK.
You should use attribute A as the hash key (partition key) and Date as the range key (sort key). Only an equality condition (=) can be used on the partition key; the range key supports <, >, =, and BETWEEN comparisons.
Check this link
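The selection logic can be sketched in plain Python without touching AWS. Two points here are assumptions that go beyond the answer: the partition key combines A and B into one string (so the lookup is by the pair), and the date is stored in ISO 8601 form so that string order matches chronological order. In DynamoDB itself, a Query on one partition with ScanIndexForward set to false and Limit set to 1 returns the item with the highest sort key; the loop below only imitates that per partition.

```python
from datetime import datetime

# Items from the question's table.
items = [
    {"ID": "ASDF", "A": 1, "B": 2, "C": 1, "D": "12/12/2016"},
    {"ID": "ASEF", "A": 1, "B": 2, "C": 3, "D": "12/13/2016"},
    {"ID": "AFDS", "A": 2, "B": 3, "C": 1, "D": "12/13/2016"},
    {"ID": "ASFA", "A": 2, "B": 3, "C": 4, "D": "12/14/2016"},
]

def partition_key(item):
    """Assumed composite key: the pair (A, B) joined into one string."""
    return f"{item['A']}#{item['B']}"

def sort_key(item):
    """ISO 8601 date string, so lexicographic order equals date order."""
    return datetime.strptime(item["D"], "%m/%d/%Y").date().isoformat()

def latest_per_partition(items):
    """For each partition key, keep the item with the greatest sort key."""
    latest = {}
    for item in items:
        pk = partition_key(item)
        if pk not in latest or sort_key(item) > sort_key(latest[pk]):
            latest[pk] = item
    return latest

latest = latest_per_partition(items)
print(sorted(item["ID"] for item in latest.values()))  # ['ASEF', 'ASFA']
```

Storing the date as MM/DD/YYYY would break the ordering across years, which is why the sketch converts it first.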
