Show daily count and the total count up to that day in Quicksight - amazon-quicksight

I want to create a table analysis in AWS Quicksight that shows the number of new user per day and also the total number of user that has registered up until that day for the specified month.
The following sample table is what I want to achieve in Quicksight.
It shows the daily register count for March:
+-----------+----------------------+----------------------+
| | Daily Register Count | Total Register Count |
+-----------+----------------------+----------------------+
| March 1st | 2 | 42 |
+-----------+----------------------+----------------------+
| March 2nd | 5 | 47 |
+-----------+----------------------+----------------------+
| March 3rd | 3 | 50 |
+-----------+----------------------+----------------------+
| March 4th | 8 | 58 |
+-----------+----------------------+----------------------+
| March 5th | 2 | 60 |
+-----------+----------------------+----------------------+
The "Total Register Count" column above should show the total count of users registered from the beginning up until March 1st, and then for each row it should be incremented with the value from "Daily Register Count"
I'm absolutely scratching my head trying to implement the "Total Register Count". I have found some form of success using runningSum function however I need to be able to filter my dataset by month, and the runningSum function won't count the number outside of the filtered date.
My dataset is very simple, it looks like this:
+----+-------------+---------------+
| id | email | registered_at |
+----+-------------+---------------+
| 1 | aaa#aaa.com | 2020-01-01 |
+----+-------------+---------------+
| 2 | bbb#aaa.com | 2020-01-01 |
+----+-------------+---------------+
| 3 | ccc#aaa.com | 2020-01-03 |
+----+-------------+---------------+
| 4 | abc#aaa.com | 2020-01-04 |
+----+-------------+---------------+
| 5 | def#bbb.com | 2020-02-01 |
+----+-------------+---------------+
I hope someone can help me with this.
Thank you!

I am new to QuickSight but the way I was able to get Total Register Count is by creating a calculated field called count and assigned it the fixed value of 1.
Then I created a second calculated field "Total Register Count" with the following formula
runningSum(sum(count), [{ registered_at} ASC], [])

It sounds as if the CountOver function would work well for you. You'll need to partition your count by the day of the month (using the extract function). Here is a link related to the CountOver function.
https://docs.aws.amazon.com/quicksight/latest/user/countOver-function.html
This is called a Level Aware Aggregation in QuickSight. Here is additional information on that:
https://docs.aws.amazon.com/quicksight/latest/user/level-aware-aggregations.html
Here is information on the extract function:
https://docs.aws.amazon.com/quicksight/latest/user/extract-function.html
If I were to take a stab at your formula, it would look like this:
countover(ID,[extract('DD',registered_at)],PRE_FILTER)
Your table would have the registered_at field as the date.

Related

How can i merge multiple columns from two different files in talend

Lets say i have multiple columns coming from two different files like that :
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
| | |
And another one like this :
USERNAME | AGE |
Jonathan | 33 |
Mike | 41 |
And i want to merge the data of the columns that have the same name into one like this while keeping the data of the columns that are unique at each field:
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
Jonathan | 33 | |
Mike | 41 | |
Sorry if the answer is obvious, im new to talend, thanks.
What tool is available toy you?
The Append function in SAS for example can do this for you.
You can use the append approach in Python, R or other language you intend using.
For Talen:
Copy the complete subjob1 – copy me sub job and paste it to create a second sub job.
Link the two sub jobs using an onSubjobOK link.
Open tFixedFlowInput, and change Records from first subjob to Records from second subjob.
Open tFileOutputDelimited on the new sub job, and tick Append, as shown in the following screenshot:
use a tUnite component to accomplish that
here is the link of the documentation : https://help.talend.com/r/fr-FR/8.0/orchestration/tunite
your flow would be
tFileInput1(excel or csv ) ----------------------------------------------
|
| ->tUnite -> tLogRow
tFileInput2(excel or csv )->tMap (add to empty fields GENDER & Children )|

AWS QuickSight filtering based on result of a query or other dataset

I want to create an analysis table in AWS Quicksight that shows the quantity sold in a given month and it's subsequent month based on users who made a purchase on the given month.
Let's say I have a dataset called user_orders with the following data:
+---------+----------+------------+
| user_id | quantity | order_date |
+---------+----------+------------+
| 1 | 2 | 2020-04-01 |
+---------+----------+------------+
| 1 | 3 | 2020-04-02 |
+---------+----------+------------+
| 1 | 1 | 2020-05-23 |
+---------+----------+------------+
| 1 | 2 | 2020-06-02 |
+---------+----------+------------+
| 2 | 1 | 2020-05-03 |
+---------+----------+------------+
| 2 | 1 | 2020-05-04 |
+---------+----------+------------+
| 3 | 2 | 2020-04-07 |
+---------+----------+------------+
| 3 | 1 | 2020-04-10 |
+---------+----------+------------+
| 3 | 1 | 2020-06-23 |
+---------+----------+------------+
For example, using the table above I want to be able to show how many quantities sold in April, May, June, and so on (max 12 months) by users who made a purchase in April.
The resulting table should look like this:
+-----------+----------+
| | quantity |
+-----------+----------+
| 04-2020 | 8 |
+-----------+----------+
| 05-2020 | 1 |
+-----------+----------+
| 06-2020 | 3 |
+-----------+----------+
8 sold in April because user_id 1 made 5 purchase and user_id 3 made 3 purchase while user_id 2 did not make any purchase.
There is only 1 item sold in May because only user_id 1 made the purchase in May, but also made a purchase in April. user_id 2 also made a purchase in May but didn't in April so it's not counted.
I can make the table above using PHP and MySQL fairly easily using the following code:
# first get all the user ids who made a purchase in April
$user_ids = sql_query("SELECT DISTINCT user_id FROM user_orders WHERE order_date BETWEEN '2020-04-01' AND '2020-04-30'");
# get the quantity sold for each month by users who made a purchase in April
$purchases = sql_query("SELECT MONTH(order_date), SUM(quantity) FROM user_orders WHERE user_id IN ({$user_ids}) AND order_date BETWEEN '2020-04-01' AND '2021-03-31' GROUP BY MONTH(order_date);")
(Obviously, April is just an example, I'd like to be able to change the starting month dynamically using QuickSight control)
As my above example shown, it requires two queries to perform this analysis. First, is to get the user_ids of the users, and the next is to actually get the quantity sold by the users.
I have been trying to achieve this using Quicksight for the last 3 days but hasn't found any way yet.
I hope someone can point me in the right direction.
Thank you!
You can achieve this by creating a calculated field like this and filtering on it
distinctCountOver(ifelse(truncDate('MM', {order_Date}) = parseDate('2020-04-01'), 1, NULL), [{user_id}], PRE_AGG)
(ofcourse, you can change the parseDate portion to be your date parameter)
Now, lets say the name of the above calculated field is SpecificMonthUser. You can add a filter sum(SpecificMonthUser) != 0.
And then create a pivot table visualization with OrderDate, user id in the rows and sum(quantity) in the values. You should get the desired result.

Is there a way to rank multiple columns in power query?

I am setting up a query where I need to rank multiple columns. I was able to sort the first column in descending order and inserted an index column. However, I am not able to rank the other columns.
I have included an example below:
Table to show agent performance
Agent | surveys | rank | outcalls |total calls |outcalls/total calls |rank
Dallas | 80% | 1 | 50 | 80 | 62.5% | ?
May | 75% | 2 | 90 | 100 | 90.0% | ?
Summer | 60% | 3 | 60 | 75 | 80.0% | ?
So basically from the example above, I was able to add an index column that ranked the surveys. How can I rank the outcalls/total calls column while still maintaining the rank in the other columns?
In this case, a simple approach would be to sort on outcalls/total calls, add another index column, and then sort the first rank column if you want to revert back to your starting order.

Get 1 value of each date SSRS

Ussing SSRS, I have data with duplicate values in Field1. I need to get only 1 value of each month.
Field1 | Date |
----------------------------------
30 | 01.01.1990 |
30 | 01.01.1990 |
30 | 01.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
It should be ssrs expression with average value of each month or mb there are other solutions to get requested data by ssrs expression. Requested data in table:
30 | 01.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
Hope for help.
There is no SumDistinct function in SSRS, and it is real lack of it (CountDistinct exist although). So you obviously can't achieve what you want easy way. You have two options:
Implement a new stored procedure with select distinct, returning reduced set of fields to avoid repeated data that you need. You then need to use this stored procedure to build new dataset and use in your table. But this way obviously may be not applicable in your case.
The other option is to implement your own function, which will save state of aggregation and perform distinct sum. Take a look at this page, it contains examples of code that you need.

Select only the rows which timestamp correspond to the current month

I am starting to try some experiments using Google SpreadSheets as a DB and for that I am collecting data from different sources and inserting them via spreadsheets API into a sheet.
Each row has a value (Column B) and a timestamp (Column A).
+---------------------+------+
| ColA | ColB |
+---------------------+------+
| 13/10/2012 00:19:01 | 42 |
| 19/10/2012 00:29:01 | 100 |
| 21/10/2012 00:39:01 | 23 |
| 22/10/2012 00:29:01 | 1 |
| 23/10/2012 00:19:01 | 24 |
| 24/10/2012 00:19:01 | 4 |
| 31/10/2012 00:19:01 | 2 |
+---------------------+------+
What I am trying to do is to programmatically add the sum of all rows in Column B where Column A is equal to the current month into a different cell.
Is there any function that I can use for that? Or anyone can point me to the right direction on how can I create a custom function which might do something like this? I know how to do this using MySQL but I couldn't find anything for Google SpreadSheets
Thanks in advance for any tip in the right direction.
Would native spreadsheet functions do?
=ArrayFormula(SUMIF(TEXT(A:A;"MM/yyyy");TEXT(GoogleClock();"MM/yyyy");B:B))

Resources